From nobody@FreeBSD.org Wed Nov 12 23:36:03 2008
Return-Path:
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
    by hub.freebsd.org (Postfix) with ESMTP id 68DE71065670
    for ; Wed, 12 Nov 2008 23:36:03 +0000 (UTC)
    (envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (www.freebsd.org [IPv6:2001:4f8:fff6::21])
    by mx1.freebsd.org (Postfix) with ESMTP id 530B88FC14
    for ; Wed, 12 Nov 2008 23:36:03 +0000 (UTC)
    (envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (localhost [127.0.0.1])
    by www.freebsd.org (8.14.3/8.14.3) with ESMTP id mACNa29c027465
    for ; Wed, 12 Nov 2008 23:36:02 GMT
    (envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
    by www.freebsd.org (8.14.3/8.14.3/Submit) id mACNa2nj027464;
    Wed, 12 Nov 2008 23:36:02 GMT
    (envelope-from nobody)
Message-Id: <200811122336.mACNa2nj027464@www.freebsd.org>
Date: Wed, 12 Nov 2008 23:36:02 GMT
From: Aurélien Méré
To: freebsd-gnats-submit@FreeBSD.org
Subject: Network packets corrupted when bge card is in 64-bit PCI slot
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number: 128833
>Category: kern
>Synopsis: [bge] Network packets corrupted when bge card is in 64-bit PCI slot
>Confidential: no
>Severity: non-critical
>Priority: medium
>Responsible: freebsd-net
>State: closed
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Wed Nov 12 23:40:00 UTC 2008
>Closed-Date: Mon Dec 15 21:10:51 UTC 2008
>Last-Modified: Mon Dec 15 21:10:51 UTC 2008
>Originator: Aurélien Méré
>Release: 7.1-STABLE
>Organization: AMC-OS Development Team
>Environment:
FreeBSD vodka.adriana.amc-os.com 7.1-PRERELEASE FreeBSD 7.1-PRERELEASE #2: Tue Nov 11 19:41:15 CET 2008 root@vodka.adriana.amc-os.com:/mnt/usr/obj/mnt/usr/src/sys/VODKA-1.0.5 i386
>Description:
Server is based on an Asus A7M766-D motherboard with 2 Athlon MP processors.
When plugged in a 66MHz 64-bit port, the 3Com 3C996 1000-SX PCI-X (bge) network card works incorrectly, while everything is fine in a classic 32-bit port. I tried disabling specific card features, like TX and RX checksums but the result is worse, no traffic goes outside.

Problem is that some packets have their content changed, leading to incorrect decoding by applications mostly ending with "protocol error". In a tcpdump -XX, that's the kind of change that is observable, just in the middle of the packet (here in an SSH initiation), at 0x78 for example :

SERVER OK SEND :
0x0000: 000a 5e62 3282 0060 97a0 11cd 0800 4500 ..^b2..`......E.
0x0010: 0314 8b24 4000 4006 1ea1 c0a8 0606 c0a8 ...$@.@.........
0x0020: 06c8 0016 febc 4f32 b113 8da3 9cf9 8018 ......O2........
0x0030: 2086 4bf8 0000 0101 080a 2c06 63de 0004 ..K.......,.c...
0x0040: 7de2 0000 02dc 0a14 871a a2ba 5a09 5823 }...........Z.X#
0x0050: 7431 4bb8 32f0 08a3 0000 007e 6469 6666 t1K.2......~diff
0x0060: 6965 2d68 656c 6c6d 616e 2d67 726f 7570 ie-hellman-group
0x0070: 2d65 7863 6861 6e67 652d 7368 6132 3536 -exchange-sha256
0x0080: 2c64 6966 6669 652d 6865 6c6c 6d61 6e2d ,diffie-hellman-
0x0090: 6772 6f75 702d 6578 6368 616e 6765 2d73 group-exchange-s
0x00a0: 6861 312c 6469 6666 6965 2d68 656c 6c6d ha1,diffie-hellm
0x00b0: 616e 2d67 726f 7570 3134 2d73 6861 312c an-group14-sha1,
0x00c0: 6469 6666 6965 2d68 656c 6c6d 616e 2d67 diffie-hellman-g
0x00d0: 726f 7570 312d 7368 6131 0000 0007 7373 roup1-sha1....ss
0x00e0: 682d 6473 7300 0000 9d61 6573 3132 382d h-dss....aes128-
0x00f0: 6362 632c 3364 6573 2d63 6263 2c62 6c6f cbc,3des-cbc,blo
0x0100: 7766 6973 682d 6362 632c 6361 7374 3132 wfish-cbc,cast12
0x0110: 382d 6362 632c 6172 6366 6f75 7231 3238 8-cbc,arcfour128

SERVER NOK RECEIVE :
0x0000: 000a 5e62 3282 0060 97a0 11cd 0800 4500 ..^b2..`......E.
0x0010: 0314 8b24 4000 4006 1ea1 c0a8 0606 c0a8 ...$@.@.........
0x0020: 06c8 0016 febc 4f32 b113 8da3 9cf9 8018 ......O2........
0x0030: 2086 4bf8 0000 0101 080a 2c06 63de 0004 ..K.......,.c...
0x0040: 7de2 0000 02dc 0a14 871a a2ba 5a09 5823 }...........Z.X#
0x0050: 7431 4bb8 32f0 08a3 0000 007e 6469 6666 t1K.2......~diff
0x0060: 6965 2d68 656c 6c6d 616e 2d67 726f 7570 ie-hellman-group
0x0070: 2d65 7863 6861 6e67 2c64 6966 6669 652d -exchang,diffie-
0x0080: 2c64 6966 6669 652d 6865 6c6c 6d61 6e2d ,diffie-hellman-
0x0090: 6772 6f75 702d 6578 6368 616e 6765 2d73 group-exchange-s
0x00a0: 6861 312c 6469 6666 6965 2d68 656c 6c6d ha1,diffie-hellm
0x00b0: 616e 2d67 726f 7570 3134 2d73 6861 312c an-group14-sha1,
0x00c0: 6469 6666 6965 2d68 656c 6c6d 616e 2d67 diffie-hellman-g
0x00d0: 726f 7570 312d 7368 6131 0000 0007 7373 roup1-sha1....ss
0x00e0: 682d 6473 7300 0000 9d61 6573 3132 382d h-dss....aes128-
0x00f0: 6362 632c 3364 6573 7766 6973 682d 6362 cbc,3deswfish-cb
0x0100: 7766 6973 682d 6362 632c 6361 7374 3132 wfish-cbc,cast12
0x0110: 382d 6362 632c 6172 6366 6f75 7231 3238 8-cbc,arcfour128

There seems to be some buffer issue on the reception side. The problems appear only on the RX, packets are correctly received (well identical at the tcpdump level anyway) on the destination server.
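
The corruption pattern in the two dumps can be checked mechanically. What follows is only a minimal Python sketch, assuming nothing beyond the rows at 0x0070 and 0x0080 copied from the dumps above; the helper names parse_row and corrupted_chunks are hypothetical and not part of any tool used in this report. It compares the sent and received bytes in 8-byte chunks and reports whether each damaged chunk is a copy of the chunk that follows it:

```python
def parse_row(row: str) -> bytes:
    """Convert the hex column of one tcpdump -XX row into raw bytes."""
    return bytes.fromhex(row.replace(" ", ""))


def corrupted_chunks(good: bytes, bad: bytes, base: int, width: int = 8):
    """Return (offset, duplicates_next_chunk) for every differing chunk."""
    result = []
    for off in range(0, min(len(good), len(bad)), width):
        chunk = bad[off:off + width]
        if chunk != good[off:off + width]:
            # Does the damaged chunk repeat the data that comes right after it?
            following = bad[off + width:off + 2 * width]
            result.append((base + off, chunk == following))
    return result


# Rows 0x0070 and 0x0080 as they appear in the report (hypothetical names).
BASE = 0x70
ROW_0080 = "2c64 6966 6669 652d 6865 6c6c 6d61 6e2d"
sent = parse_row("2d65 7863 6861 6e67 652d 7368 6132 3536") + parse_row(ROW_0080)
received = parse_row("2d65 7863 6861 6e67 2c64 6966 6669 652d") + parse_row(ROW_0080)

for offset, is_duplicate in corrupted_chunks(sent, received, BASE):
    print(f"chunk at {offset:#06x} differs, duplicates next chunk: {is_duplicate}")
```

For these rows the only damaged chunk is the one at 0x78, and it repeats the 8 bytes at 0x80. The second corrupted span in the dump, at 0xf8, likewise repeats the first 8 bytes of row 0x0100.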
Follows ifconfig, dmesg and pciconf Thanks for your help, Aurélien -- bge0: flags=8843 metric 0 mtu 1500 options=9b ether 00:0a:5e:62:32:82 inet 192.168.6.200 netmask 0xffffff00 broadcast 192.168.6.255 media: Ethernet autoselect (1000baseSX ) status: active FreeBSD 7.1-PRERELEASE #2: Tue Nov 11 19:41:15 CET 2008 root@vodka.adriana.amc-os.com:/mnt/usr/obj/mnt/usr/src/sys/VODKA-1.0.5 Timecounter "i8254" frequency 1193182 Hz quality 0 CPU: AMD Athlon(TM) MP 2000+ (1666.74-MHz 686-class CPU) Origin = "AuthenticAMD" Id = 0x662 Stepping = 2 Features=0x383fbff AMD Features=0xc0480800 real memory = 805306368 (768 MB) avail memory = 778317824 (742 MB) MPTable: FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs cpu0 (BSP): APIC ID: 0 cpu1 (AP): APIC ID: 1 ioapic0: Assuming intbase of 0 ioapic0 irqs 0-23 on motherboard kbd1 at kbdmux0 smbios0: at iomem 0xf3b60-0xf3b7e on motherboard smbios0: Version: 2.3, BCD Revision: 2.3 cryptosoft0: on motherboard pcib0: pcibus 0 on motherboard pci0: on pcib0 agp0: on hostb0 pcib1: at device 1.0 on pci0 pci1: on pcib1 vgapci0: port 0xd800-0xd8ff mem 0xf4000000-0xf5ffffff,0xf8000000-0xf9ffffff irq 16 at device 5.0 on pci1 isab0: at device 7.0 on pci0 isa0: on isab0 atapci0: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xb800-0xb80f at device 7.1 on pci0 ata0: on atapci0 ata0: [ITHREAD] ata1: on atapci0 ata1: [ITHREAD] amdpm0: port 0xe4e0-0xe4ff at device 7.3 on pci0 smbus0: on amdpm0 smb0: on smbus0 bge0: <3Com Gigabit Fiber-SX Server NIC, ASIC rev. 0x105> mem 0xf3800000-0xf380ffff irq 16 at device 8.0 on pci0 bge0: Ethernet address: 00:0a:5e:62:32:82 bge0: [ITHREAD] re0: port 0xb400-0xb4ff mem 0xf3000000-0xf30000ff irq 17 at device 9.0 on pci0 re0: Chip rev. 0x04000000 re0: MAC rev. 
0x00000000 miibus0: on re0 rgephy0: PHY 1 on miibus0 rgephy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto re0: Ethernet address: 00:18:4d:79:72:65 re0: [FILTER] pcib2: at device 16.0 on pci0 pci2: on pcib2 atapci1: port 0xa800-0xa807,0xa400-0xa403,0xa000-0xa007,0x9800-0x9803,0x9400-0x940f mem 0xf2000000-0xf20003ff irq 18 at device 5.0 on pci2 atapci1: [ITHREAD] ata2: on atapci1 ata2: [ITHREAD] ata3: on atapci1 ata3: [ITHREAD] ata4: on atapci1 ata4: [ITHREAD] ata5: on atapci1 ata5: [ITHREAD] atapci2: port 0x9000-0x900f,0x8800-0x880f,0x8400-0x840f,0x8000-0x800f,0x7800-0x781f,0x7400-0x74ff irq 17 at device 6.0 on pci2 atapci2: [ITHREAD] ata6: on atapci2 ata6: [ITHREAD] ata7: on atapci2 ata7: [ITHREAD] ata8: on atapci2 ata8: [ITHREAD] atapci3: port 0x7000-0x7007,0x6800-0x6803,0x6400-0x6407,0x6000-0x6003,0x5800-0x580f mem 0xf1800000-0xf1803fff irq 19 at device 8.0 on pc i2 atapci3: [ITHREAD] ata9: on atapci3 ata9: [ITHREAD] ata10: on atapci3 ata10: [ITHREAD] cpu0 on motherboard cpu1 on motherboard pmtimer0 on isa0 orm0: at iomem 0xc0000-0xc7fff,0xc8000-0xcc7ff,0xd0000-0xd27ff pnpid ORM0000 on isa0 atkbdc0: at port 0x60,0x64 on isa0 atkbd0: irq 1 on atkbdc0 kbd0 at atkbd0 atkbd0: [GIANT-LOCKED] atkbd0: [ITHREAD] fdc0: at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on isa0 fdc0: [FILTER] fd0: <1440-KB 3.5" drive> on fdc0 drive 0 ppc0: parallel port not found. 
sc0: at flags 0x100 on isa0 sc0: VGA <16 virtual consoles, flags=0x300> sio0: configured irq 4 not in bitmap of probed irqs 0 sio0: port may not be enabled sio0: configured irq 4 not in bitmap of probed irqs 0 sio0: port may not be enabled sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0 sio0: type 16550A sio0: [FILTER] sio1: configured irq 3 not in bitmap of probed irqs 0 sio1: port may not be enabled vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0 unknown: can't assign resources (port) unknown: can't assign resources (memory) unknown: can't assign resources (port) unknown: can't assign resources (port) Timecounters tick every 1.000 msec ipfw2 (+ipv6) initialized, divert loadable, nat loadable, rule-based forwarding disabled, default to deny, logging unlimited ad0: 15488MB at ata0-master WDMA2 ad0: FAILURE - READ_DMA status=51 error=10 LBA=31719407 ad0: FAILURE - READ_DMA status=51 error=10 LBA=31719420 ad0: FAILURE - READ_DMA status=51 error=10 LBA=31719423 ad4: 305245MB at ata2-master SATA150 ad8: 238475MB at ata4-master SATA150 ad10: 305245MB at ata5-master SATA150 ad12: 953869MB at ata6-master SATA150 ad14: 476940MB at ata7-master SATA150 ad18: 190782MB at ata9-master UDMA100 ad20: 19077MB at ata10-master UDMA100 ad0: FAILURE - READ_DMA status=51 error=10 LBA=31719423 ad0: FAILURE - READ_DMA status=51 error=10 LBA=31719423 ad0: FAILURE - READ_DMA status=51 error=10 LBA=31719423 ad0: FAILURE - READ_DMA status=51 error=10 LBA=31719423 SMP: AP CPU #1 Launched! 
hostb0@pci0:0:0:0: class=0x060000 card=0x00000000 chip=0x700c1022 rev=0x11 hdr=0x00 vendor = 'Advanced Micro Devices (AMD)' device = 'AMD-762 CPU to PCI Bridge (SMP chipset)' class = bridge subclass = HOST-PCI cap 02[a0] = AGP 2x 1x SBA disabled pcib1@pci0:0:1:0: class=0x060400 card=0x00000000 chip=0x700d1022 rev=0x00 hdr=0x01 vendor = 'Advanced Micro Devices (AMD)' device = 'AMD-762 CPU to PCI Bridge (AGP 4x)' class = bridge subclass = PCI-PCI isab0@pci0:0:7:0: class=0x060100 card=0x80441043 chip=0x74401022 rev=0x05 hdr=0x00 vendor = 'Advanced Micro Devices (AMD)' device = 'AMD-768 (Opus) PCI to ISA/LPC Bridge' class = bridge subclass = PCI-ISA atapci0@pci0:0:7:1: class=0x01018a card=0x74411022 chip=0x74411022 rev=0x04 hdr=0x00 vendor = 'Advanced Micro Devices (AMD)' device = 'AMD-768 (Opus) EIDE Controller' class = mass storage subclass = ATA amdpm0@pci0:0:7:3: class=0x068000 card=0x80441043 chip=0x74431022 rev=0x03 hdr=0x00 vendor = 'Advanced Micro Devices (AMD)' device = 'AMD-768 (Opus) ACPI Controller' class = bridge bge0@pci0:0:8:0: class=0x020000 card=0x100410b7 chip=0x164514e4 rev=0x15 hdr=0x00 vendor = 'Broadcom Corporation' device = 'BCM5701 NetXtreme BCM5701 Gigabit Ethernet' class = network subclass = ethernet cap 07[40] = PCI-X 64-bit supports 133MHz, 512 burst read, 1 split transaction cap 01[48] = powerspec 2 supports D0 D3 current D0 cap 03[50] = VPD cap 05[58] = MSI supports 8 messages, 64 bit re0@pci0:0:9:0: class=0x020000 card=0x311a1385 chip=0x816910ec rev=0x10 hdr=0x00 vendor = 'Realtek Semiconductor' device = 'RTL8110SB Single-Chip Gigabit LOM Ethernet Controller' class = network subclass = ethernet cap 01[dc] = powerspec 2 supports D0 D1 D2 D3 current D0 pcib2@pci0:0:16:0: class=0x060400 card=0x00000000 chip=0x74481022 rev=0x05 hdr=0x01 vendor = 'Advanced Micro Devices (AMD)' device = 'AMD-768 (Opus) PCI Bridge' class = bridge subclass = PCI-PCI vgapci0@pci0:1:5:0: class=0x030000 card=0x0030121a chip=0x0005121a rev=0x01 hdr=0x00 vendor = 
'3dfx Interactive Inc' device = 'Voodoo3 All Voodoo3 chips, 3000' class = display subclass = VGA cap 02[54] = AGP 2x 1x SBA disabled cap 01[60] = powerspec 1 supports D0 D3 current D0 atapci1@pci0:2:5:0: class=0x010400 card=0x61141095 chip=0x31141095 rev=0x02 hdr=0x00 vendor = 'Silicon Image Inc (Was: CMD Technology Inc)' device = 'Sil 3114 SATALink/SATARaid Controller' class = mass storage subclass = RAID cap 01[60] = powerspec 2 supports D0 D1 D2 D3 current D0 atapci2@pci0:2:6:0: class=0x010400 card=0x32491106 chip=0x32491106 rev=0x50 hdr=0x00 vendor = 'VIA Technologies Inc' device = 'VT6421 VIA VT6421 RAID Controller' class = mass storage subclass = RAID cap 01[e0] = powerspec 2 supports D0 D3 current D0 atapci3@pci0:2:8:0: class=0x018085 card=0x4d68105a chip=0x4d68105a rev=0x02 hdr=0x00 vendor = 'Promise Technology Inc' device = 'PDC20268 Ultra100 TX2 EIDE Controller' class = mass storage cap 01[60] = powerspec 1 supports D0 D1 D3 current D0 >How-To-Repeat: Any kind of protocol with large packets (ICMP seems to look fine for example) but dns, http, ssh communication mostly fails with "protocol error", "packet error" or so. Other example with HTTP (Location URL corrupted) : root@vodka:~> telnet www.freebsd.org 80 Trying 69.147.83.33... Connected to www.freebsd.org. Escape character is '^]'. GET / HTTP/1.0 HTTP/1.0 301 Moved Permanently Connection: close Locttp://wwttp://www.freebsd.org/ Content-Length: 0 Date: Wed, 12 Nov 2008 23:33:39 GMT Server: httpd/1.4.x LaHonda Connection closed by foreign host. >Fix: Putting the PCI card in a classic 32 bit port fixes the problem. 
Here are dmesg and pciconf changes : bge0@pci0:2:8:0: class=0x020000 card=0x100410b7 chip=0x164514e4 rev=0x15 hdr=0x00 vendor = 'Broadcom Corporation' device = 'BCM5701 NetXtreme BCM5701 Gigabit Ethernet' class = network subclass = ethernet cap 07[40] = PCI-X 64-bit supports 133MHz, 512 burst read, 1 split transaction cap 01[48] = powerspec 2 supports D0 D3 current D0 cap 03[50] = VPD cap 05[58] = MSI supports 8 messages, 64 bit re0: port 0xb400-0xb4ff mem 0xf3800000-0xf38000ff irq 17 at device 9.0 on pci0 re0: Chip rev. 0x04000000 re0: MAC rev. 0x00000000 miibus0: on re0 rgephy0: PHY 1 on miibus0 rgephy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto re0: Ethernet address: 00:18:4d:79:72:65 re0: [FILTER] pcib2: at device 16.0 on pci0 pci2: on pcib2 atapci1: port 0xa800-0xa807,0xa400-0xa403,0xa000-0xa007,0x9800-0x9803,0x9400-0x940f mem 0xf2800000-0xf28003ff irq 18 at device 5.0 on pci2 atapci1: [ITHREAD] ata2: on atapci1 ata2: [ITHREAD] ata3: on atapci1 ata3: [ITHREAD] ata4: on atapci1 ata4: [ITHREAD] ata5: on atapci1 ata5: [ITHREAD] atapci2: port 0x9000-0x900f,0x8800-0x880f,0x8400-0x840f,0x8000-0x800f,0x7800-0x781f,0x7400-0x74ff irq 17 at device 6.0 on pci2 atapci2: [ITHREAD] ata6: on atapci2 ata6: [ITHREAD] ata7: on atapci2 ata7: [ITHREAD] ata8: on atapci2 ata8: [ITHREAD] bge0: <3Com Gigabit Fiber-SX Server NIC, ASIC rev. 0x105> mem 0xf2000000-0xf200ffff irq 19 at device 8.0 on pci2 bge0: Ethernet address: 00:0a:5e:62:32:82 bge0: [ITHREAD] >Release-Note: >Audit-Trail: Responsible-Changed-From-To: freebsd-bugs->freebsd-net Responsible-Changed-By: linimon Responsible-Changed-When: Thu Nov 13 01:09:00 UTC 2008 Responsible-Changed-Why: Over to maintainer(s). 
http://www.freebsd.org/cgi/query-pr.cgi?pr=128833

From: Marius Strobl
To: bug-followup@FreeBSD.org, freebsd@amc-os.com
Cc:
Subject: Re: kern/128833: [bge] Network packets corrupted when bge card is in 64-bit PCI slot
Date: Thu, 13 Nov 2008 23:14:46 +0100

 Hrm, it could be that the BCM5701 data corruption bug actually is
 64-bit rather than only PCI-X bus specific. Could you please give
 the patch at:
 http://people.freebsd.org/~marius/bge_5701.diff
 a try?

From: Aurélien Méré
To: "Marius Strobl" ,
Subject: Re: kern/128833: [bge] Network packets corrupted when bge card is in 64-bit PCI slot
Date: Sat, 15 Nov 2008 02:01:50 +0100

 Hi,

 I installed the patch; the same issues are still there:

 0x0060: 6965 2d68 656c 6c6d 616e 2d67 726f 7570 ie-hellman-group
 0x0070: 2d65 7863 6861 6e67 2c64 6966 6669 652d -exchang,diffie-
 0x0080: 2c64 6966 6669 652d 6865 6c6c 6d61 6e2d ,diffie-hellman-
 0x0090: 6772 6f75 702d 6578 6368 616e 6765 2d73 group-exchange-s
 0x00a0: 6861 312c 6469 6666 6965 2d68 656c 6c6d ha1,diffie-hellm
 0x00b0: 616e 2d67 726f 7570 3134 2d73 6861 312c an-group14-sha1,
 0x00c0: 6469 6666 6965 2d68 656c 6c6d 616e 2d67 diffie-hellman-g

 0x0060: 6965 2d68 656c 6c6d 616e 2d67 726f 7570 ie-hellman-group
 0x0070: 2d65 7863 6861 6e67 652d 7368 6132 3536 -exchange-sha256
 0x0080: 2c64 6966 6669 652d 6865 6c6c 6d61 6e2d ,diffie-hellman-
 0x0090: 6772 6f75 702d 6578 6368 616e 6765 2d73 group-exchange-s
 0x00a0: 6861 312c 6469 6666 6965 2d68 656c 6c6d ha1,diffie-hellm
 0x00b0: 616e 2d67 726f 7570 3134 2d73 6861 312c an-group14-sha1,
 0x00c0: 6469 6666 6965 2d68 656c 6c6d 616e 2d67 diffie-hellman-g

 I will try to add some debug output. Don't hesitate to tell me if you
 have other ideas so I can do more tests.

 Thanks,
 Aurélien

From: Marius Strobl
To: Aurélien Méré
Cc: bug-followup@FreeBSD.org
Subject: Re: kern/128833: [bge] Network packets corrupted when bge card is in 64-bit PCI slot
Date: Sun, 16 Nov 2008 22:45:33 +0100

 Ok, thanks for testing anyway.
 I still think that this isn't really a driver bug, though; rather, you
 are hitting some hardware-related problem such as a silicon bug, and
 the question is how to work around it. Looking at the bge(4) versions
 of the other BSDs and the corresponding Linux and OpenSolaris drivers
 I can't spot such a workaround apart from the already known PCI-X
 issue, unfortunately.
 The only other thing that comes to my mind is that you might suffer
 from sort of the opposite of the problem worked around by
 ti_64bitslot_war() (the NICs driven by ti(4) are the predecessors of
 those supported by bge(4)). Given that this also involves the BIOS,
 that could then explain why you're the first person to hit this
 problem. Could you please instrument bge(4) to print the content of
 the BGE_PCI_PCISTATE register and report back which values it's
 initialized to depending on which type of slot the card is plugged
 into?

 Marius

From: Aurélien Méré
To: "Marius Strobl"
Cc:
Subject: Re: kern/128833: [bge] Network packets corrupted when bge card is in 64-bit PCI slot
Date: Mon, 17 Nov 2008 02:37:51 +0100

 Hi,

 As a first check, here are the values reported when the device is
 being attached, and the diff:

 in 32 bit slot :
   BGE_PCI_PCISTATE = 0x96 (0x86 | BGE_PCISTATE_32BIT_BUS)
   bge_flags = 0x120E (0x100E | BGE_FLAG_PCIX)

 in 64 bit slot :
   BGE_PCI_PCISTATE = 0x8E (0x86 | BGE_PCISTATE_PCI_BUSSPEED)
   bge_flags = 0x1A0E (0x100E | BGE_FLAG_PCIX | BGE_FLAG_64BIT)

 Seems logical so far, I'll try to look further.

 Thanks for your help,
 Aurélien

From: Marius Strobl
To: Aurélien Méré
Cc: bug-followup@FreeBSD.org
Subject: Re: kern/128833: [bge] Network packets corrupted when bge card is in 64-bit PCI slot
Date: Tue, 18 Nov 2008 23:46:21 +0100

 On Mon, Nov 17, 2008 at 02:37:51AM +0100, Aurélien Méré wrote:
 > Seems logical so far, I'll try to look further.
 Apart from the problem described by davidch@ (I'm not sure you
 actually have a BCM5701 A3 though, at least bge(4) doesn't seem to be
 aware of that revision), the BGE_PCI_PCISTATE and bge_flags pairs you
 reported don't match; according to BGE_PCI_PCISTATE the card isn't in
 a PCI-X slot in either case (BGE_PCISTATE_PCI_BUSMODE is always set,
 which means PCI) and AFAICT your motherboard chipset also doesn't
 support PCI-X. However, as you noted, BGE_FLAG_PCIX is set for
 whatever reason in both cases, which leads to some inappropriate
 initialization of the controller. As a quick test, could you please
 check whether replacing the "#if __FreeBSD_version > 602101" in the
 driver with an "#if 0" makes any difference to your problem?

 Marius

From: Aurélien Méré
To: "Marius Strobl"
Cc:
Subject: Re: kern/128833: [bge] Network packets corrupted when bge card is in 64-bit PCI slot
Date: Wed, 19 Nov 2008 02:27:43 +0100

 Concerning the problem described by davidch@, my chip is reported as
 a B5 revision (01050000), so it might not be the case here.

 You're right, the M/B doesn't support PCI-X at all. As detailed in the
 manual, the NB chipset (AMD762) provides support for 2x66 MHz 64-bit
 PCI 2.2 masters, and the SB chipset (AMD768) provides a secondary
 33 MHz PCI 2.2 bridge.

 I tried the patch but it didn't solve the problem, although
 BGE_FLAG_PCIX was no longer in the flags, which seems much more
 correct anyway. At the end of the initialization the values are as
 expected:
   bge_flags = 0x0010180F (0x10100F | BGE_FLAG_64BIT)
   BGE_PCI_PCISTATE is unchanged

 Note I forgot to mention the BGE_FLAG_RXALIGN_BUG (100000) and
 BGE_FLAG_TBI (1) last time, as I tested before the reset of the chip
 where these flags are set.
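
The register and flag values reported in this exchange can be decoded with a short script. This is a minimal Python sketch, not driver code. The masks for BGE_PCISTATE_32BIT_BUS (0x10), BGE_PCISTATE_PCI_BUSSPEED (0x08), BGE_FLAG_PCIX (0x200), BGE_FLAG_64BIT (0x800), the RX alignment flag (0x100000) and BGE_FLAG_TBI (0x1) follow from the values quoted above. The 0x04 mask for BGE_PCISTATE_PCI_BUSMODE is an assumption that is not stated in this thread and should be checked against the driver's register definitions. The decode helper is hypothetical.

```python
# Masks derived from the values quoted in this report, except where noted.
PCISTATE_BITS = {
    0x04: "BGE_PCISTATE_PCI_BUSMODE",   # ASSUMPTION: mask not given in the thread
    0x08: "BGE_PCISTATE_PCI_BUSSPEED",  # 0x8E == 0x86 | 0x08
    0x10: "BGE_PCISTATE_32BIT_BUS",     # 0x96 == 0x86 | 0x10
}

FLAG_BITS = {
    0x000001: "BGE_FLAG_TBI",
    0x000200: "BGE_FLAG_PCIX",          # 0x120E == 0x100E | 0x200
    0x000800: "BGE_FLAG_64BIT",         # 0x1A0E == 0x100E | 0x200 | 0x800
    0x100000: "BGE_FLAG_RX_ALIGNBUG",   # written BGE_FLAG_RXALIGN_BUG above
}


def decode(value: int, table: dict) -> list:
    """Return the names of all known bits set in value, lowest bit first."""
    return [name for bit, name in sorted(table.items()) if value & bit]


reported = [
    ("32-bit slot, PCISTATE", 0x96, PCISTATE_BITS),
    ("64-bit slot, PCISTATE", 0x8E, PCISTATE_BITS),
    ("64-bit slot, bge_flags after #if 0", 0x0010180F, FLAG_BITS),
]
for label, value, table in reported:
    print(f"{label}: {value:#x} -> {', '.join(decode(value, table))}")
```

Under these assumptions the bus-mode bit is set in both PCISTATE values, which matches the observation that the card is in plain PCI mode in either slot, and only the 32-bit and bus-speed bits differ between the slots.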

From: Marius Strobl
To: Aurélien Méré
Cc: bug-followup@FreeBSD.org
Subject: Re: kern/128833: [bge] Network packets corrupted when bge card is in 64-bit PCI slot
Date: Wed, 19 Nov 2008 22:42:31 +0100

 > Note I forgot to mention the BGE_FLAG_RXALIGN_BUG (100000) and BGE_FLAG_TBI
 > (1) last time, as I tested before reset of the chip where these flags are
 > set.

 Okay. Have you tried the workaround described by davidch@ anyway?
 If that also doesn't make a difference I'm unfortunately out of
 ideas regarding your corruption problem.

 Marius

From: Aurélien Méré
To: "Marius Strobl"
Subject: Re: kern/128833: [bge] Network packets corrupted when bge card is in 64-bit PCI slot
Date: Thu, 20 Nov 2008 00:06:44 +0100

 The "workaround" works. By forcing the 32 bit mode during
 bge_chipinit, starting from the last stable version of if_bge.c, it
 works correctly. The flags reported after bge_attach are 0x0010120F,
 logically the same as in a 32 bit slot.

 Doesn't sound like a real solution to me though, as we still don't
 really know what the problem is :)

 Thx,
 Aurélien

From: Marius Strobl
To: David Christensen
Cc: bug-followup@FreeBSD.org
Subject: Re: kern/128833: [bge] Network packets corrupted when bge card is in 64-bit PCI slot
Date: Sun, 23 Nov 2008 16:34:59 +0100

 David,

 could it be that the bug doesn't only affect 5701 A3 but also B5
 (i.e. chipid 0x01050000) as in this case, or even all 5701
 revisions? How does the problem you describe relate to the
 5701 PCI-X issue, which we align the RX buffer differently
 for as a workaround; would that problem also be avoided
 by limiting 5701 to 32-bit operations? Or is the A3 erratum
 you described an entirely different issue and limited to 5701
 in a 64-bit non-PCI-X slot, or would 5701 in a PCI-X slot
 even require both workarounds?

 Marius

From: "David Christensen"
To: "Marius Strobl"
Cc: "bug-followup@FreeBSD.org"
Subject: RE: kern/128833: [bge] Network packets corrupted when bge card is in 64-bit PCI slot
Date: Mon, 24 Nov 2008 11:28:10 -0800

 I checked the assembly instructions for the 5701 and even though
 the ASIC ID decodes as B5, the revision of the chip is actually
 A3. (You should be able to verify this as the silkscreen on the
 part should show "P13".) Unfortunately the "friendly" revision
 of the chip doesn't match the "ASIC" revision of the chip for the
 5701 and the errata references the "friendly" name. The result
 is that the part you know as B5 is affected by this errata. Other
 versions of the chip (A2 which you know as B2 and A1 which you
 know as B1) are not subject to this errata.

 > How does the problem you describe relate to the
 > 5701 PCI-X issue, which we align the RX buffer differently
 > for as a workaround, would that problem also be avoided
 > by limiting 5701 to 32-bit operations? Or is the A3-errata
 > you described an entirely different issue and limited to 5701
 > in a 64-bit non-PCI-X slot, or would 5701 in a PCI-X slot
 > even require both workarounds?

 Which PCI-X issue are you referring to? Can you point me to
 the line number on http://fxr.watson.org/fxr/source/dev/bge/if_bge.c?

 Dave

From: Marius Strobl
To: David Christensen
Cc: "bug-followup@FreeBSD.org"
Subject: Re: kern/128833: [bge] Network packets corrupted when bge card is in 64-bit PCI slot
Date: Mon, 24 Nov 2008 22:40:36 +0100

 > I checked the assembly instructions for the 5701 and even though
 > the ASIC ID decodes as B5, the revision of the chip is actually
 > A3. (You should be able to verify this as the silkscreen on the
 > part should show "P13".) Unfortunately the "friendly" revision
 > of the chip doesn't match the "ASIC" revision of the chip for the
 > 5701 and the errata references the "friendly" name. The result
 > is that the part you know as B5 is affected by this errata.
 > Other versions of the chip (A2 which you know as B2 and A1 which you
 > know as B1) are not subject to this errata.

 Ah, this explains it. Thanks for looking it up!

 > Which PCI-X issue are you referring to? Can you point me to
 > the line number on http://fxr.watson.org/fxr/source/dev/bge/if_bge.c?

 I was referring to BGE_FLAG_RX_ALIGNBUG; the lines dealing with it
 are 874-875, 933-934, 2698-2708 and 3112-3122. The Linux tg3
 driver does pretty much the same via rx_offset.

 Marius

From: "David Christensen"
To: "Marius Strobl"
Cc: "bug-followup@FreeBSD.org"
Subject: RE: kern/128833: [bge] Network packets corrupted when bge card is in 64-bit PCI slot
Date: Mon, 24 Nov 2008 14:38:35 -0800

 > I was refering to BGE_FLAG_RX_ALIGNBUG, the lines dealing with it
 > are 874-875, 933-934, 2698-2708 and 3112-3122. The Linux tg3
 > driver does pretty much the same via rx_offset.

 It's a different problem. The RX_ALIGNBUG is described in the
 errata as follows:

 "Description: In PCI-X mode, on rare instances, the DMA write engine
 can incorrectly DMA duplicate data to the host if the first word of
 the data being transferred is to a non-zero-offset address (offset
 from the 8-byte boundary).

 Workaround: Align buffers to zero offset in the driver."

 I suppose you could force PCI mode for this problem too, though at
 the expense of possibly reduced performance.

 Dave

From: Aurélien Méré
To:
Subject: Re: kern/128833: [bge] Network packets corrupted when bge card is in 64-bit PCI slot
Date: Wed, 26 Nov 2008 00:02:51 +0100

 Just to clarify/acknowledge some points, as it seems some of my posts
 were not displayed:

 - Forcing 32 bit mode in bge_chipinit fixed the problem as you supposed.
 - The chip, even if reported as B5, consequently seems to have the
   supposed bug, and the chip is actually a BCM5701TKHB TK0525 P13, so
   it is listed as affected (A3).
 - Clearly it doesn't work at all without the RX_ALIGNBUG fix, even
   where PCI-X is presumably not enabled, but I didn't check again
   after we found the bug in the PCI-X detection.

 But now that it is forced like this, there is no longer much point in
 having this card... except that the mobo has a separate PCI bus for
 the 2 64-bit slots, so there should be less impact. I hope you'll
 find a better solution anyway :)

 Thanks,
 Aurélien

From: Marius Strobl
To: freebsd@amc-os.com
Cc: "bug-followup@FreeBSD.org"
Subject: Re: kern/128833: [bge] Network packets corrupted when bge card is in 64-bit PCI slot
Date: Sun, 7 Dec 2008 18:36:26 +0100

 Aurélien,

 could you please verify that the patch at:
 http://people.freebsd.org/~marius/bge_5701b5_pcix.diff
 solves your problem?

 Thanks,
 Marius

From: Aurélien Méré
To: "Marius Strobl"
Cc:
Subject: Re: kern/128833: [bge] Network packets corrupted when bge card is in 64-bit PCI slot
Date: Sun, 7 Dec 2008 23:01:50 +0100

 Hi Marius,

 The patch works fine with this card.

 Thanks,
 Aurélien

State-Changed-From-To: open->closed
State-Changed-By: marius
State-Changed-When: Mon Dec 15 21:08:02 UTC 2008
State-Changed-Why:
Close, a workaround for the hardware bug was committed to head
(r185812), stable/7 (r186134), releng/7.1 (r186135) and stable/6
(r186136).

http://www.freebsd.org/cgi/query-pr.cgi?pr=128833
>Unformatted: