From wh@helbig.informatik.ba-stuttgart.de Sat Dec 28 04:00:47 1996 Received: from helbig.informatik.ba-stuttgart.de (helbig.informatik.ba-stuttgart.de [141.31.166.22]) by freefall.freebsd.org (8.8.4/8.8.4) with ESMTP id EAA09328 for ; Sat, 28 Dec 1996 04:00:35 -0800 (PST) Received: (from root@localhost) by helbig.informatik.ba-stuttgart.de (8.8.4/8.7.3) id NAA22867; Sat, 28 Dec 1996 13:00:22 +0100 (MET) Message-Id: <199612281200.NAA22867@helbig.informatik.ba-stuttgart.de> Date: Sat, 28 Dec 1996 13:00:22 +0100 (MET) From: Wolfgang Helbig Reply-To: wh@helbig.informatik.ba-stuttgart.de To: FreeBSD-gnats-submit@freebsd.org Subject: system hangs with second Ide controller X-Send-Pr-Version: 3.2 >Number: 2305 >Category: kern >Synopsis: system hangs with second Ide controller >Confidential: no >Severity: critical >Priority: high >Responsible: se >State: closed >Quarter: >Keywords: >Date-Required: >Class: sw-bug >Submitter-Id: current-users >Arrival-Date: Sat Dec 28 04:10:01 PST 1996 >Closed-Date: Sat Apr 12 14:04:25 PDT 1997 >Last-Modified: Sat Apr 12 14:08:48 PDT 1997 >Originator: Wolfgang Helbig >Release: FreeBSD 3.0-CURRENT i386 >Organization: >Environment: Processor: i486DX2 Motherboard: UM8810P AIO Chipsets: UMC UM8881 (Host Bridge) UMC UM8886 (ISA Brigde Controller) CMD PCIO640B (PCI Bus IDE Controller) BIOS: Phoenix - BIOS Disks: Conner Peripherals 850 MB CFS850A Seagate 1275 MB ST31276A CD-ROM: Mitsumi ATAPI FX400 Software: Generic Kernel FreeBSD 2.1.5 Generic Kernel FreeBSD 2.2 Alpha Kernel FreeBSD - current (22. Dec ) >Description: Whenever wdc1 and wdc0 are used concurrently (copying one file from a disk connected to wdc1 to a disk connected to wdc0) the system hangs after about 2 seconds. >How-To-Repeat: Attach one disk as master to wdc0 the other as master to wdc1 and copy a big file from one disk to the other. OR: Attach one disk as master to wdc2, the other as master to wdc0 and the CD-ROM as slave to wdc0. Try to install from CD-ROM to the disk on wdc2. (FreeBSD 2.1.5-boot.flp) Everything works fine till the binaries are installed. The system hangs. >Fix: Attach both disks to wdc0 and forget the CD-ROM. >Release-Note: >Audit-Trail: From: Helbig@MX.BA-Stuttgart.De To: freebsd-gnats-submit@freebsd.org, wh@helbig.informatik.ba-stuttgart.de Cc: Subject: Re: kern/2305: system hangs with second Ide controller Date: Wed, 1 Jan 1997 12:48:11 +0100 (MET) In the meantime I have been pointed to some URL's (thanks to everyone !) from which I learned that I have a "well known problem", the IDE-Controller CMD 640B, which just doesn't support concurrently use of both channels. Seems that other operating systems either don't use primary and secondary IDE port simultaneously (like DOS ...) or have an option to prevent this (like Linux). I wonder why it's worthwhile to implement a workaround for this hardware flaw in the FreeBSD-Kernel. The URLS are: http://thef-nym.sci.kun.nl/cgi-pieterh/atazip/atafq-3.html#ss3.2 http://www.phyttp://www.physics.ohio-state.edu/~jrdemers/ibm/eide.txt Of the three 640B-flaws mentioned in the second URL only the third seems to apply to my configuration. (lucky me!): Does not support simultaneous I/O on primary and secondary EIDE ports. If someone patched the wd-controller to avoid simultaneous I/O please let me know! Happy New Year, Wolfgang Helbig From: Mike Pritchard To: freebsd-gnats-submit Cc: Subject: Re: kern/2305 Date: Tue, 18 Feb 1997 23:25:19 -0800 (PST) Here is a work around posted to freebsd-hackers resent here to get into the audit trail of this PR: > From: Wolfgang Helbig > Subject: CMD640b workaround - final(?) version Hi, this is my final patch to work around the CMD640b hardware bug. This bug does not allow you to use the primary and secondary IDE-channel simultaneously, since there is only one hardware buffer for both channels. Other known bugs of the CMD640-series and the RZ1000-IDE-Chip are not addressed by this patch. Those bugs (bad prefetch buffer handling and problems resulting from interference with the floppy-controller) apparently did not occure on my system. The workaround consists mainly in serializing access to the two ide- channels. This job is done very similar to the serializing of master and slave device of one channel. It is tested by Jason and Nadav. Jason provided a version for FreeBSD 2.1.5 for the test. Thanx!!! This final version detects the presence of the CMD640-chip and enables accordingly the workaround code. This detection takes place in the pci.c file and the result is communicated to the wd.c file by the global external variable cmd640_detected. (I know this is dirty, but better ways that are still easier did not come to my mind, sorry!) For all this to take place, you still have to add the options "CMD640" line to the kernel configuration file. Without this option, the old code in wd.c and pci.c is active. With this option, the detection is made, and -- if the chip is detected -- the serialization of primary and secondary channel takes place. If you use the options "CMD640" you also *have* to put controller pci0 into your configuration file! Maybe someone can put the right clauses in /sys/conf/files to automaticly resoving this (ugly) dependency! This code is tested with the GENERIC- and a customized kernel, but it might need more testing, before it is committed to -current. I think it is too late for FreeBSD 2.2 Have fun! Wolfgang Helbig helbig@ba-stuttgart.de Following are the patches of wd.c and pci.c Index: wd.c =================================================================== RCS file: /usr/cvsroot/src/sys/i386/isa/wd.c,v retrieving revision 1.122 diff -c -r1.122 wd.c *** wd.c 1997/01/14 06:41:40 1.122 --- wd.c 1997/02/11 20:57:49 *************** *** 128,133 **** --- 128,135 ---- #define RECAL 2 /* doing restore */ #define OPEN 3 /* done with open */ + #define PRIMARY 0 + /* * Disk geometry. A small part of struct disklabel. * XXX disklabel.5 contains an old clone of disklabel.h. *************** *** 238,243 **** --- 240,250 ---- { wdopen, wdclose, wdstrategy, wdioctl, /*0*/ wddump, wdsize, 0, "wd", &wd_cdevsw, -1 }; + #ifdef CMD640 + static int atapictrlr; + extern int cmd640_detected; /* defined in pci.c */ + #endif + /* * Probe for controller. */ *************** *** 360,366 **** --- 367,384 ---- if (dvp->id_unit >= NWDC) return (0); + #ifdef CMD640 + if (cmd640_detected) { + if (dvp->id_unit == PRIMARY) { + printf("wdc0: CMD640b workaround enabled\n"); + TAILQ_INIT( &wdtab[PRIMARY].controller_queue); + } + } else + TAILQ_INIT( &wdtab[dvp->id_unit].controller_queue); + + #else TAILQ_INIT( &wdtab[dvp->id_unit].controller_queue); + #endif for (wdup = isa_biotab_wdc; wdup->id_driver != 0; wdup++) { if (wdup->id_iobase != dvp->id_iobase) *************** *** 467,480 **** if (wddrives[lunit]->dk_ctrlr == dvp->id_unit && wddrives[lunit]->dk_unit == unit) goto next; ! atapi_attach (dvp->id_unit, unit, dvp->id_iobase); next: } #endif /* * Discard any interrupts generated by wdgetctlr(). wdflushirq() * doesn't work now because the ambient ipl is too high. */ wdtab[dvp->id_unit].b_active = 2; return (1); } --- 485,510 ---- if (wddrives[lunit]->dk_ctrlr == dvp->id_unit && wddrives[lunit]->dk_unit == unit) goto next; ! if (atapi_attach (dvp->id_unit, unit, dvp->id_iobase)) { ! #ifdef CMD640 ! if (cmd640_detected) ! atapictrlr = dvp->id_unit; ! #endif ! } next: } #endif /* * Discard any interrupts generated by wdgetctlr(). wdflushirq() * doesn't work now because the ambient ipl is too high. */ + #ifdef CMD640 + if (cmd640_detected) + wdtab[PRIMARY].b_active = 2; + else + wdtab[dvp->id_unit].b_active = 2; + #else wdtab[dvp->id_unit].b_active = 2; + #endif return (1); } *************** *** 490,495 **** --- 520,528 ---- struct disk *du; int lunit = dkunit(bp->b_dev); int s; + #ifdef CMD640 + int ctrlr; + #endif /* valid unit, controller, and request? */ if (lunit >= NWD || bp->b_blkno < 0 || (du = wddrives[lunit]) == NULL *************** *** 548,555 **** --- 581,598 ---- wdsleep(du->dk_ctrlr, "wdlab"); du->dk_state = WANTOPEN; } + #ifdef CMD640 + if (cmd640_detected) + ctrlr = PRIMARY; + else + ctrlr = du ->dk_ctrlr; + #endif + #ifdef CMD640 + if (wdtab[ctrlr].b_active == 0) + #else if (wdtab[du->dk_ctrlr].b_active == 0) + #endif wdstart(du->dk_ctrlr); /* start controller */ if (du->dk_dkunit >= 0) { *************** *** 611,617 **** --- 654,669 ---- TAILQ_REMOVE( &drive_queue[du->dk_lunit], bp, b_act); /* link onto controller queue */ + #ifdef CMD640 + if (cmd640_detected) { + TAILQ_INSERT_TAIL( &wdtab[PRIMARY].controller_queue, bp, b_act); + } else { + TAILQ_INSERT_TAIL( &wdtab[ctrlr].controller_queue, bp, b_act); + } + + #else TAILQ_INSERT_TAIL( &wdtab[ctrlr].controller_queue, bp, b_act); + #endif /* mark the drive unit as busy */ wdutab[du->dk_lunit].b_active = 1; *************** *** 636,655 **** int lunit; int count; #ifdef ATAPI if (wdtab[ctrlr].b_active == 2) wdtab[ctrlr].b_active = 0; if (wdtab[ctrlr].b_active) return; - #endif /* is there a drive for the controller to do a transfer with? */ bp = wdtab[ctrlr].controller_queue.tqh_first; if (bp == NULL) { #ifdef ATAPI ! if (atapi_start && atapi_start (ctrlr)) /* mark controller active in ATAPI mode */ wdtab[ctrlr].b_active = 3; #endif return; } --- 688,737 ---- int lunit; int count; + #ifdef CMD640 + int c; + + if (cmd640_detected) + c = PRIMARY; + else + c = ctrlr; + #endif + #ifdef ATAPI + #ifdef CMD640 + if (wdtab[c].b_active == 2) + wdtab[c].b_active = 0; + if (wdtab[c].b_active) + return; + #else if (wdtab[ctrlr].b_active == 2) wdtab[ctrlr].b_active = 0; if (wdtab[ctrlr].b_active) return; /* is there a drive for the controller to do a transfer with? */ + #endif + #endif + #ifdef CMD640 + bp = wdtab[c].controller_queue.tqh_first; + #else bp = wdtab[ctrlr].controller_queue.tqh_first; + #endif if (bp == NULL) { #ifdef ATAPI ! #ifdef CMD640 ! if (cmd640_detected) { ! if (atapi_start && atapi_start (atapictrlr)) ! wdtab[c].b_active = 3; ! } else { ! if (atapi_start && atapi_start (ctrlr)) ! wdtab[ctrlr].b_active = 3; ! } ! #else /* mark controller active in ATAPI mode */ + if (atapi_start && atapi_start (ctrlr)) wdtab[ctrlr].b_active = 3; #endif + #endif return; } *************** *** 705,711 **** --- 787,797 ---- blknum - ds_offset) + ds_offset; } + #ifdef CMD640 + wdtab[c].b_active = 1; /* mark controller active */ + #else wdtab[ctrlr].b_active = 1; /* mark controller active */ + #endif /* if starting a multisector transfer, or doing single transfers */ if (du->dk_skip == 0 || (du->dk_flags & DKFL_SINGLE)) { *************** *** 717,723 **** --- 803,813 ---- head = (blknum % secpercyl) / secpertrk; sector = blknum % secpertrk; + #ifdef CDM640 + if (wdtab[c].b_errcnt && (bp->b_flags & B_READ) == 0) + #else if (wdtab[ctrlr].b_errcnt && (bp->b_flags & B_READ) == 0) + #endif du->dk_bc += DEV_BSIZE; count = howmany( du->dk_bc, DEV_BSIZE); *************** *** 863,872 **** --- 953,973 ---- { register struct disk *du; register struct buf *bp; + #ifdef CMD640 + int c ; + if (cmd640_detected) + c = PRIMARY; + else + c = unit; + if (wdtab[c].b_active == 2) + return; /* intr in wdflushirq() */ + if (!wdtab[c].b_active) { + #else if (wdtab[unit].b_active == 2) return; /* intr in wdflushirq() */ if (!wdtab[unit].b_active) { + #endif #ifdef WDDEBUG /* * These happen mostly because the power-mgt part of the *************** *** 878,895 **** --- 979,1020 ---- return; } #ifdef ATAPI + #ifdef CMD640 + if (wdtab[c].b_active == 3) { + #else if (wdtab[unit].b_active == 3) { + #endif /* process an ATAPI interrupt */ + #ifdef CMD640 + if (cmd640_detected) { + if (atapi_intr && atapi_intr (atapictrlr)) + /* ATAPI op continues */ + return; + } else { + if (atapi_intr && atapi_intr (unit)) + /* ATAPI op continues */ + return; + } + #else if (atapi_intr && atapi_intr (unit)) /* ATAPI op continues */ return; + #endif /* controller is free, start new op */ + #ifdef CMD640 + wdtab[c].b_active = 0; + #else wdtab[unit].b_active = 0; + #endif wdstart (unit); return; } #endif + #ifdef CMD640 + bp = wdtab[c].controller_queue.tqh_first; + #else bp = wdtab[unit].controller_queue.tqh_first; + #endif du = wddrives[dkunit(bp->b_dev)]; du->dk_timeout = 0; *************** *** 900,906 **** --- 1025,1035 ---- /* is it not a transfer, but a control operation? */ if (du->dk_state < OPEN) { + #ifdef CMD640 + wdtab[c].b_active = 0; + #else wdtab[unit].b_active = 0; + #endif switch (wdcontrol(bp)) { case 0: return; *************** *** 942,949 **** --- 1071,1083 ---- bp->b_error = EIO; bp->b_flags |= B_ERROR; } else if (du->dk_status & WDCS_ERR) { + #ifdef CMD640 + if (++wdtab[c].b_errcnt < RETRIES) { + wdtab[c].b_active = 0; + #else if (++wdtab[unit].b_errcnt < RETRIES) { wdtab[unit].b_active = 0; + #endif } else { wderror(bp, du, "hard error"); bp->b_error = EIO; *************** *** 957,963 **** --- 1091,1101 ---- * If this was a successful read operation, fetch the data. */ if (((bp->b_flags & (B_READ | B_ERROR)) == B_READ) + #ifdef CMD640 + && wdtab[c].b_active) { + #else && wdtab[unit].b_active) { + #endif int chk, dummy, multisize; multisize = chk = du->dk_currentiosize * DEV_BSIZE; if( du->dk_bc < chk) { *************** *** 999,1016 **** --- 1137,1168 ---- } outt: + #ifdef CMD640 + if (wdtab[c].b_active) { + #else if (wdtab[unit].b_active) { + #endif if ((bp->b_flags & B_ERROR) == 0) { du->dk_skip += du->dk_currentiosize;/* add to successful sectors */ + #ifdef CMD640 + if (wdtab[c].b_errcnt) + wderror(bp, du, "soft error"); + wdtab[c].b_errcnt = 0; + #else if (wdtab[unit].b_errcnt) wderror(bp, du, "soft error"); wdtab[unit].b_errcnt = 0; + #endif /* see if more to transfer */ if (du->dk_bc > 0 && (du->dk_flags & DKFL_ERROR) == 0) { if( (du->dk_flags & DKFL_SINGLE) || ((bp->b_flags & B_READ) == 0)) { + #ifdef CMD640 + wdtab[c].b_active = 0; + #else wdtab[unit].b_active = 0; + #endif wdstart(unit); } else { du->dk_timeout = 1 + 3; *************** *** 1021,1027 **** --- 1173,1183 ---- du->dk_skip = 0; du->dk_flags &= ~DKFL_ERROR; du->dk_flags |= DKFL_SINGLE; + #ifdef CMD640 + wdtab[c].b_active = 0; + #else wdtab[unit].b_active = 0; + #endif wdstart(unit); return; /* redo xfer sector by sector */ } *************** *** 1030,1037 **** --- 1186,1198 ---- done: ; /* done with this transfer, with or without error */ du->dk_flags &= ~DKFL_SINGLE; + #ifdef CMD640 + TAILQ_REMOVE(&wdtab[c].controller_queue, bp, b_act); + wdtab[c].b_errcnt = 0; + #else TAILQ_REMOVE(&wdtab[unit].controller_queue, bp, b_act); wdtab[unit].b_errcnt = 0; + #endif bp->b_resid = bp->b_bcount - du->dk_skip * DEV_BSIZE; wdutab[du->dk_lunit].b_active = 0; wdutab[du->dk_lunit].b_errcnt = 0; *************** *** 1044,1057 **** --- 1205,1226 ---- } /* controller idle */ + #ifdef CMD640 + wdtab[c].b_active = 0; + #else wdtab[unit].b_active = 0; + #endif /* anything more on drive queue? */ wdustart(du); /* anything more for controller to do? */ #ifndef ATAPI /* This is not valid in ATAPI mode. */ + #ifdef CMD640 + if (wdtab[c].controller_queue.tqh_first) + #else if (wdtab[unit].controller_queue.tqh_first) + #endif #endif wdstart(unit); } *************** *** 1065,1070 **** --- 1234,1242 ---- register unsigned int lunit; register struct disk *du; int error; + #ifdef CMD640 + int c; + #endif lunit = dkunit(dev); if (lunit >= NWD || dktype(dev) != 0) *************** *** 1074,1081 **** --- 1246,1263 ---- return (ENXIO); /* Finish flushing IRQs left over from wdattach(). */ + #ifdef CMD640 + if (cmd640_detected) + c = PRIMARY; + else + c = du->dk_ctrlr; + + if (wdtab[c].b_active == 2) + wdtab[c].b_active = 0; + #else if (wdtab[du->dk_ctrlr].b_active == 2) wdtab[du->dk_ctrlr].b_active = 0; + #endif du->dk_flags &= ~DKFL_BADSCAN; *************** *** 1238,1251 **** --- 1420,1446 ---- { register struct disk *du; int ctrlr; + #ifdef CMD640 + int c; + #endif du = wddrives[dkunit(bp->b_dev)]; ctrlr = du->dk_ctrlr; + #ifdef CMD640 + if (cmd640_detected) + c = PRIMARY; + else + c = ctrlr; + #endif switch (du->dk_state) { case WANTOPEN: tryagainrecal: + #ifdef CMD640 + wdtab[c].b_active = 1; + #else wdtab[ctrlr].b_active = 1; + #endif if (wdcommand(du, 0, 0, 0, 0, WDCC_RESTORE | WD_STEP) != 0) { wderror(bp, du, "wdcontrol: wdcommand failed"); goto maybe_retry; *************** *** 1259,1271 **** --- 1454,1474 ---- if (du->dk_status & WDCS_ERR) wdunwedge(du); du->dk_state = WANTOPEN; + #ifdef CMD640 + if (++wdtab[c].b_errcnt < RETRIES) + #else if (++wdtab[ctrlr].b_errcnt < RETRIES) + #endif goto tryagainrecal; bp->b_error = ENXIO; /* XXX needs translation */ bp->b_flags |= B_ERROR; return (2); } + #ifdef CMD640 + wdtab[c].b_errcnt = 0; + #else wdtab[ctrlr].b_errcnt = 0; + #endif du->dk_state = OPEN; /* * The rest of the initialization can be done by normal *************** *** 1373,1379 **** --- 1576,1589 ---- error = 1; } if (error) { + #ifdef CMD640 + if (cmd640_detected) + wdtab[PRIMARY].b_errcnt += RETRIES; + else + wdtab[du->dk_ctrlr].b_errcnt += RETRIES; + #else wdtab[du->dk_ctrlr].b_errcnt += RETRIES; + #endif return (1); } if (wdcommand(du, du->dk_dd.d_ncylinders, du->dk_dd.d_ntracks - 1, 0, *************** *** 1750,1759 **** if (wddoingadump) return (EFAULT); - #if 0 - /* Mark controller active for if we panic during the dump. */ - wdtab[du->dk_ctrlr].b_active = 1; - #endif wddoingadump = 1; /* Recalibrate the drive. */ --- 1960,1965 ---- *************** *** 1900,1909 **** --- 2106,2125 ---- static void wdflushirq(struct disk *du, int old_ipl) { + #ifdef CMD640 + int c = du->dk_ctrlr; + if (cmd640_detected) + c = PRIMARY; + wdtab[c].b_active = 2; + splx(old_ipl); + (void)splbio(); + wdtab[c].b_active = 0; + #else wdtab[du->dk_ctrlr].b_active = 2; splx(old_ipl); (void)splbio(); wdtab[du->dk_ctrlr].b_active = 0; + #endif } /* *************** *** 1943,1950 **** --- 2159,2175 ---- wdsleep(int ctrlr, char *wmesg) { int s = splbio(); + #ifdef CMD640 + int c = ctrlr; + + if (cmd640_detected) + c = PRIMARY; + while (wdtab[c].b_active) + tsleep((caddr_t)&wdtab[PRIMARY].b_active, PZERO - 1, wmesg, 1); + #else while (wdtab[ctrlr].b_active) tsleep((caddr_t)&wdtab[ctrlr].b_active, PZERO - 1, wmesg, 1); + #endif splx(s); } ----------------------------------------------------------------------------- Index: pci.c =================================================================== RCS file: /usr/cvsroot/src/sys/pci/pci.c,v retrieving revision 1.65 diff -c -r1.65 pci.c *** pci.c 1997/02/05 07:23:56 1.65 --- pci.c 1997/02/11 20:27:40 *************** *** 148,153 **** --- 148,156 ---- * for the Orion host to PCI bridge * UGLY hack ... :( Will be changed :) */ + #ifdef CMD640 + int cmd640_detected = 0; + #endif /*-------------------------------------------------------- ** ** Local variables. *************** *** 728,733 **** --- 731,740 ---- if ((!type) || (type==0xfffffffful)) continue; + #ifdef CMD640 + if (type == 0x06401095) + cmd640_detected = 1; + #endif /* ** lookup device in ioconfiguration: */ -- Mike Pritchard mpp@FreeBSD.org "Go that way. Really fast. If something gets in your way, turn" Responsible-Changed-From-To: freebsd-bugs->se Responsible-Changed-By: mpp Responsible-Changed-When: Wed Feb 19 22:58:09 PST 1997 Responsible-Changed-Why: I've seen Stefans name mentioned a few times in looking into this PR, so he gets resposibility for it. State-Changed-From-To: open->closed State-Changed-By: se State-Changed-When: Sat Apr 12 14:04:25 PDT 1997 State-Changed-Why: Fixed in rev. 1.126 of /sys/i386/isa/wd.c: The problem was caused by a chip bug, a workaround will compiled into the kernel, if 'options "CMD640"' is specified in the config file. >Unformatted: