From nemesis!uhclem@fw.ast.com  Sun Feb 25 18:40:54 1996
Received: from fw.ast.com (fw.ast.com [165.164.6.25])
        by freefall.freebsd.org (8.7.3/8.7.3) with SMTP id SAA13538
        for ; Sun, 25 Feb 1996 18:40:54 -0800 (PST)
Received: from nemesis by fw.ast.com with uucp (Smail3.1.29.1 #2)
        id m0tqsoT-00084iC; Sun, 25 Feb 96 20:37 CST
Received: by nemesis.lonestar.org (Smail3.1.27.1 #20)
        id m0tqslH-000CKVC; Sun, 25 Feb 96 20:34 WET
Message-Id:
Date: Sun, 25 Feb 96 20:34 WET
From: uhclem@nemesis.lonestar.org
Reply-To: uhclem
To: FreeBSD-gnats-submit@freebsd.org
Subject: Warning from sio driver reports wrong device FDIV045
X-Send-Pr-Version: 3.2

>Number:         1042
>Category:       i386
>Synopsis:       Warning from sio driver reports wrong device FDIV045
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    bde
>State:          closed
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sun Feb 25 18:50:01 PST 1996
>Closed-Date:    Fri Jul 3 02:18:40 PDT 1998
>Last-Modified:  Fri Jul 3 02:19:45 PDT 1998
>Originator:     Frank Durda IV
>Release:        FreeBSD 2.1-STABLE i386
>Organization:
None
>Environment:

FreeBSD 2.1 system running a 486DX-33MHz 128K L2 Cache, 12Meg RAM,
four 16550A serial ports (one NS16552 (dual port 16550A), two Startech
16550A), 1540B SCSI controller, WD8013 Ethernet (inactive).

Ports sio0, sio2, sio3 connected to modems, sio1 not connected to
anything for this test. Two of the ports connected to Telebit
WorldBlazers configured at fixed DTE 57600. Also a Cardinal V.34 modem
at DTE 57600. All modems have hardware flow control enabled.

>Description:

When the DTE speed of the WorldBlazers is increased from 38400 to
57600, the above system experiences "tty-level buffer overflows".
As a symptom of the problem, UUCP sessions end up receiving corrupted
files (this should not happen but it does), and the kernel reports
messages like:

Feb 25 19:48:00 nemesis /kernel: sio1: 247 more tty-level buffer overflows (total 3100)

Note that the system reports the problem on sio1, when there is nothing
connected to that port. That actual overrun probably occurred on sio0
or sio3.

Another interesting thing is that the Cardinal modem is V.34 and receives
compressed news at rates up to 3100CPS, but never appears to cause
these overruns. The Telebits (Turbo PEP or PEP) only manage between
1600 and 2100 CPS and they do experience these overruns when the DTE
is set to 57600. There are no overruns when the Worldblazers are fixed
at 38400.

Hardware flow control is set on all devices and uucico is patched to
force RTSCTS flow control on incoming and outgoing UUCP sessions, and
this can be verified by stty -a < /dev/tty[Dd]3.

Modifying the sio.c driver to trigger at 8 instead of 14 reduces but
does not eliminate the above error messages. Only reducing the DTE on
the WorldBlazers back to 38400 eliminates the problem.

I have also swapped ports in case the NS16552 and Startech parts were
performing differently. The problem follows the ports used by the
WorldBlazers.

So the problems appear to be:

1. Faulty reporting of the guilty device in the kernel warning message.
   It seems to always blame sio1 regardless of what lines are active.

2. There doesn't appear to be any documentation on what the kernel
   error message is trying to report.

   Reducing the FIFO interrupt trigger did not help, implying a
   different type of overrun in the kernel instead of a hardware FIFO
   overrun. Because PEP tends to return data in bursts of 64 bytes,
   perhaps some software-based buffer is being overrun.
   Since there appears to be code in sio.c that would detect overruns
   in the hardware FIFO, report this and lower the trigger value
   automatically, either this code isn't working or this isn't the
   type of overrun the kernel is trying to report. Again, no
   documentation.

3. When the kernel message is displayed, it usually is displayed three
   times in a row, all with the same timestamp. It only appears once
   in /var/log/messages.

>How-To-Repeat:

Here, simply establish a protocol g or i UUCP session using Telebit
WorldBlazers and receive data from a remote system with the DTE fixed
at 57600. If the connection is at 22000bps or faster, failure is
likely. Failures only appear during PEP/Turbo PEP sessions.

>Fix:

Workarounds: By reducing the hardware interrupt trigger to 8 (from 14),
the error count was reduced, but not eliminated. The only sure-fire
workaround is to lower the DTE speeds to 38400.

*END*
>Release-Note:
>Audit-Trail:

From: Bruce Evans
To: FreeBSD-gnats-submit@FreeBSD.ORG, uhclem@freefall.freebsd.org
Cc:
Subject: Re: i386/1042: Warning from sio driver reports wrong device FDIV045
Date: Mon, 26 Feb 1996 18:33:51 +1100

>...
>Ports sio0, sio2, sio3 connected to modems, sio1 not connected to anything
                                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>...
>Feb 25 19:48:00 nemesis /kernel: sio1: 247 more tty-level buffer overflows (total 3100)

>Note that the system reports the problem on sio1, when there is nothing
>connected to that port. That actual overrun probably occurred on sio0
>or sio3.

This may be caused by sio1 picking up radiation from the other ports.
It shouldn't occur if sio1 isn't open, however (then the UART may be
kept busy by the radiation but the driver ignores it). The radiation
problem can usually be fixed by connecting the port to something (even
something inactive).

The verbose error reporting can take long enough to interfere with the
reception of further data :-(.
Errors were once reported every clock tick (the rc driver still does
this) and slow machines take more than one clock tick to report an
error so the first error triggered an endless cascade of errors.

>Another interesting thing is that the Cardinal modem is V.34 and receives
>compressed news at rates up to 3100CPS, but never appears to cause
>these overruns. The Telebits (Turbo PEP or PEP) only manage between
>1600 and 2100 CPS and they do experience these overruns when the DTE
>is set to 57600. There are no overruns when the Worldblazers are fixed
>at 38400.

Do the Telebits honour flow control?

>So the problems appear to be:

>1. Faulty reporting of the guilty device in the kernel warning message.
>   It seems to always blame sio1 regardless of what lines are active.

Probably not.

>2. There doesn't appear to be any documentation on what the kernel
>   error message is trying to report.

See the sio man page.

>   Reducing the FIFO interrupt trigger did not help, implying a
>   different type of overrun in the kernel instead of a hardware FIFO
>   overrun. Because PEP tends to return data in bursts of 64 bytes,
>   perhaps some software-based buffer is being overrun.

The raw queue has a size of only 1024 at all baud rates so it is quite
easy to overrun at high baud rates. At 115200 bps, 1024 bytes may
arrive in less than one process scheduling quantum (100 msec) so there
the buffer is too small if there are 2 hog processes. Flow control had
better work.

>   Since there appears to be code in sio.c that would detect overruns
>   in the hardware FIFO, report this and lower the trigger value
>   automatically, either this code isn't working or this isn't the
>   type of overrun the kernel is trying to report. Again, no
>   documentation.

That code has almost always been disabled and doesn't exist in -current.
It tended to drop the trigger level to 1 for transient errors.

>3. When the kernel message is displayed, it usually is displayed three
>   times in a row, all with the same timestamp.
>   It only appears once in /var/log/messages.

Messages are normally repeated for each root login.

Bruce

Responsible-Changed-From-To: freebsd-bugs->bde
Responsible-Changed-By: scrappy
Responsible-Changed-When: Wed Apr 10 11:33:54 PDT 1996
Responsible-Changed-Why:
another one that falls under Bruce's domain

State-Changed-From-To: open->closed
State-Changed-By: phk
State-Changed-When: Fri Jul 3 02:18:40 PDT 1998
State-Changed-Why:
As part of our PR audition campaign, this PR has been closed.
The subject seems to be in the category of pilot error or
misunderstanding or alternatively of insufficient significance to
draw any developer attention. We apologize for late response to this PR.
>Unformatted: