From nobody@FreeBSD.org Wed Jan 24 06:28:25 2001
Return-Path:
Received: from freefall.freebsd.org (freefall.FreeBSD.org [216.136.204.21])
    by hub.freebsd.org (Postfix) with ESMTP id BA4DD37B69E
    for ; Wed, 24 Jan 2001 06:28:24 -0800 (PST)
Received: (from nobody@localhost)
    by freefall.freebsd.org (8.11.1/8.11.1) id f0OESO551048;
    Wed, 24 Jan 2001 06:28:24 -0800 (PST)
    (envelope-from nobody)
Message-Id: <200101241428.f0OESO551048@freefall.freebsd.org>
Date: Wed, 24 Jan 2001 06:28:24 -0800 (PST)
From: myleal@spliceip.com.br
To: freebsd-gnats-submit@FreeBSD.org
Subject: FreeBSD 4.2 Panics in Realtek rl driver
X-Send-Pr-Version: www-1.0

>Number:         24608
>Category:       kern
>Synopsis:       FreeBSD 4.2 Panics in Realtek rl driver
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    freebsd-bugs
>State:          closed
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Wed Jan 24 06:30:01 PST 2001
>Closed-Date:    Sat Apr 21 13:37:00 PDT 2001
>Last-Modified:  Sat Apr 21 13:38:58 PDT 2001
>Originator:     Marcus Yuri Maranhao Leal
>Release:        4.2-RELEASE
>Organization:
Splice IP
>Environment:
FreeBSD pajeus.spliceip.com.br 4.2-RELEASE FreeBSD 4.2-RELEASE #0: Wed Dec 6 12:05:51 BRST 2000 root@piranhas.spliceip.com.br:/usr/src/sys/compile/PIRANHAS i386
>Description:
The machine is configured as a firewall, with 4 RealTek 8139 10/100BaseTX
network interfaces. The machine eventually panics.
The following is the kernel debugger log:

bash-2.04# pwd
/var/crash
bash-2.04# ls -l
total 829274
-rw-r--r--  1 root  wheel          2 Jan 23 09:37 bounds
-rw-r--r--  1 root  wheel    1779263 Jan 23 09:25 kernel.0
-rw-r--r--  1 root  wheel    1779263 Jan 23 09:38 kernel.3
-rw-r--r--  1 root  wheel          5 Nov 20 10:03 minfree
-rw-------  1 root  wheel  268423168 Jan 23 09:25 vmcore.0
-rw-------  1 root  wheel  174063616 Jan 23 09:29 vmcore.1
-rw-------  1 root  wheel  134217728 Jan 23 09:33 vmcore.2
-rw-------  1 root  wheel  268423168 Jan 23 09:38 vmcore.3
bash-2.04# gdb -k kernel.0 vmcore.0
GNU gdb 4.18
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-unknown-freebsd"...
(no debugging symbols found)...
IdlePTD 3031040
initial pcb at 263c80
panicstr: page fault
panic messages:
---
Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0xa4c07000
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc01a8968
stack pointer           = 0x10:0xc0242368
frame pointer           = 0x10:0xc0242374
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = Idle
interrupt mask          = net tty
trap number             = 12
panic: page fault

syncing disks... 5 done
Uptime: 10d21h4m25s

dumping to dev #ad/0x20001, offset 533096
dump ata0: resetting devices ..
done 255 254 253 252 251 250 249 248 247 246 245 244 243 242 241 240 239 238 237 236 235 234 233 232 231 230 229 228 227 226 225 224 223 222 221 220 219 218 217 216 215 214 213 212 211 210 209 208 207 206 205 204 203 202 201 200 199 198 197 196 195 194 193 192 191 190 189 188 187 186 185 184 183 182 181 180 179 178 177 176 175 174 173 172 171 170 169 168 167 166 165 164 163 162 161 160 159 158 157 156 155 154 153 152 151 150 149 148 147 146 145 144 143 142 141 140 139 138 137 136 135 134 133 132 131 130 129 128 127 126 125 124 123 122 121 120 119 118 117 116 115 114 113 112 111 110 109 108 107 106 105 104 103 102 101 100 99 98 97 96 95 94 93 92 91 90 89 88 87 86 85 84 83 82 81 80 79 78 77 76 75 74 73 72 71 70 69 68 67 66 65 64 63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
---
#0  0xc013cef6 in dumpsys ()
(kgdb) where
#0  0xc013cef6 in dumpsys ()
#1  0xc013cd17 in boot ()
#2  0xc013d094 in poweroff_wait ()
#3  0xc0203561 in trap_fatal ()
#4  0xc0203239 in trap_pfault ()
#5  0xc0202e1f in trap ()
#6  0xc01a8968 in rl_encap ()
#7  0xc01a8b37 in rl_start ()
#8  0xc017655c in ether_output_frame ()
#9  0xc01764ca in ether_output ()
#10 0xc0190b5f in ip_output ()
#11 0xc018fd99 in ip_forward ()
#12 0xc018ee16 in ip_input ()
#13 0xc018f183 in ipintr ()

------------

The other core (vmcore.3) has the same trace, so I think the problem
is in the RealTek rl driver.

We have another FreeBSD 4.2 firewall with 4 NICs and the same
configuration (except that its NICs are 3COM 10/100 FastEtherlink XL),
and it does not panic.
The Realtek NICs are:

bash-2.04$ dmesg | grep -i realtek
rl0: port 0xa400-0xa4ff mem 0xdb800000-0xdb8000ff irq 11 at device 9.0 on pci0
rlphy0: on miibus0
rl1: port 0xa000-0xa0ff mem 0xdb000000-0xdb0000ff irq 9 at device 10.0 on pci0
rlphy1: on miibus1
rl2: port 0x9800-0x98ff mem 0xda800000-0xda8000ff irq 5 at device 11.0 on pci0
rlphy2: on miibus2
rl3: port 0x9400-0x94ff mem 0xda000000-0xda0000ff irq 11 at device 12.0 on pci0
rlphy3: on miibus3

---------------------------------------------------------
>How-To-Repeat:
Install 4 Realtek NICs in a FreeBSD 4.2 machine and let it run as a firewall.
>Fix:
Not known. Maybe an rl driver fix?
>Release-Note:
>Audit-Trail:

From: Stas Kisel
To: freebsd-gnats-submit@FreeBSD.org
Cc: myleal@spliceip.com.br
Subject: Re: kern/24608: FreeBSD 4.2 Panics in Realtek rl driver
Date: Sun, 11 Feb 2001 16:11:40 +0200

Hi.

It looks like I've hit the same trouble.

I upgraded a 4.1-RELEASE router to 4.2-RELEASE yesterday.
It has rebooted several times during the past 24 hours.
I erroneously decided that it was an IPSEC code problem, and started
to rebuild the kernel without IPSEC.
When I got a crash again after rebooting with the new kernel, I decided
to write a PR or look for an appropriate one (and found kern/24608).
Crashes are located in 4 places:
at ../../kern/uipc_mbuf2.c:270
at ../../pci/if_rl.c:1314 (this one originally reported in this PR)
at ../../kern/uipc_socket.c:558
at ../../kern/uipc_mbuf.c:621

#6  0xc0161624 in m_aux_add (m=0xc05a7100, af=2, type=50)
    at ../../kern/uipc_mbuf2.c:270
#7  0xc01bf290 in ipsec_setsocket (m=0xc05a7100, so=0xc6df2a80)
--
#6  0xc01fe56c in rl_encap (sc=0xc0d29a00, m_head=0xc05a7800)
    at ../../pci/if_rl.c:1314
#7  0xc01fe73b in rl_start (ifp=0xc0d29a00) at ../../pci/if_rl.c:1367
--
#6  0xc01620a8 in sosend (so=0xc6df1840, addr=0xc0da0ae0, uio=0xc7806ed0,
    top=0x0, control=0x0, flags=0, p=0xc7326f60)
    at ../../kern/uipc_socket.c:558
--
#6  0xc01fe56c in rl_encap (sc=0xc0d29800, m_head=0xc05a7600)
    at ../../pci/if_rl.c:1314
#7  0xc01fe73b in rl_start (ifp=0xc0d29800) at ../../pci/if_rl.c:1367
--
#6  0xc0161624 in m_aux_add (m=0xc05a7400, af=2, type=50)
    at ../../kern/uipc_mbuf2.c:270
#7  0xc01bf290 in ipsec_setsocket (m=0xc05a7400, so=0xc6df5000)
--
#6  0xc01fe56c in rl_encap (sc=0xc0d29a00, m_head=0xc05b1500)
    at ../../pci/if_rl.c:1314
#7  0xc01fe73b in rl_start (ifp=0xc0d29a00) at ../../pci/if_rl.c:1367
--
#6  0xc016004c in m_copym (m=0xc05b1c00, off0=2920, len=872, wait=1)
    at ../../kern/uipc_mbuf.c:621
#7  0xc01ab330 in tcp_output (tp=0xc6f7a2e0)
    at ../../netinet/tcp_output.c:590
--
#6  0xc016004c in m_copym (m=0xc05a9700, off0=1460, len=872, wait=1)
    at ../../kern/uipc_mbuf.c:621
#7  0xc01ab330 in tcp_output (tp=0xc6f760c0)
    at ../../netinet/tcp_output.c:590
--
#6  0xc016004c in m_copym (m=0xc05b5c00, off0=7300, len=1156, wait=1)
    at ../../kern/uipc_mbuf.c:621
#7  0xc01ab330 in tcp_output (tp=0xc6f7c940)
    at ../../netinet/tcp_output.c:590

Here is my dmesg with IPSEC compiled:

Copyright (c) 1992-2000 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
    The Regents of the University of California. All rights reserved.
FreeBSD 4.2-RELEASE #0: Sat Feb 10 15:05:08 EET 2001
    stask@btr.unisquad.com:/usr/src/sys/compile/btr
Timecounter "i8254" frequency 1193182 Hz
CPU: Pentium II/Pentium II Xeon/Celeron (501.14-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0x665  Stepping = 5
  Features=0x183fbff
real memory  = 67108864 (65536K bytes)
avail memory = 61898752 (60448K bytes)
Preloaded elf kernel "kernel" at 0xc033d000.
Preloaded userconfig_script "/boot/kernel.conf" at 0xc033d09c.
Pentium Pro MTRR support enabled
npx0: on motherboard
npx0: INT 16 interface
pcib0: on motherboard
pci0: on pcib0
pcib1: at device 1.0 on pci0
pci1: on pcib1
isab0: at device 7.0 on pci0
isa0: on isab0
atapci0: port 0xffa0-0xffaf at device 7.1 on pci0
ata0: at 0x1f0 irq 14 on atapci0
ata1: at 0x170 irq 15 on atapci0
pci0: at 7.2 irq 10
chip1: port 0x440-0x44f at device 7.3 on pci0
pci0: at 15.0
rl0: port 0xe400-0xe4ff mem 0xfebeff00-0xfebeffff irq 9 at device 16.0 on pci0
rl0: Ethernet address: 00:50:ba:83:7a:09
miibus0: on rl0
rlphy0: on miibus0
rlphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
rl1: port 0xe000-0xe0ff mem 0xfebefe00-0xfebefeff irq 7 at device 17.0 on pci0
rl1: Ethernet address: 00:50:ba:83:99:c7
miibus1: on rl1
rlphy1: on miibus1
rlphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
fdc0: at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on isa0
fdc0: FIFO enabled, 8 bytes threshold
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
atkbdc0: at port 0x60,0x64 on isa0
atkbd0: flags 0x1 irq 1 on atkbdc0
kbd0 at atkbd0
psm0: irq 12 on atkbdc0
psm0: model Generic PS/2 mouse, device ID 0
vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
sc0: at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
sio0: type 16550A
sio1 at port 0x2f8-0x2ff irq 3 on isa0
sio1: type 16550A
ppc0: parallel port not found.
IP packet filtering initialized, divert enabled, rule-based forwarding enabled, default to accept, logging limited to 100 packets/entry by default
DUMMYNET initialized (000608)
IPsec: Initialized Security Association Processing.
IP Filter: v3.4.8 initialized. Default = pass all, Logging = enabled
ad0: 6149MB [13328/15/63] at ata0-master UDMA33
Mounting root from ufs:/dev/ad0s1a
WARNING: / was not properly dismounted
ipfw: Accounting cleared.
uhci0: port 0xef80-0xef9f irq 10 at device 7.2 on pci0
usb0: on uhci0
usb0: USB revision 1.0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered

Here is the kgdb output for a core from the kernel without IPSEC.
I've recently got one more crash; its kgdb output is almost the same.
I'll post it if needed, and I'll post as much of this stuff as needed :)

Script started on Sun Feb 11 14:56:39 2001
btr# gdb -k
GNU gdb 4.18
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-unknown-freebsd".
(kgdb) symbol-file /sys/compile/btr/kernel.debug
Reading symbols from /sys/compile/btr/kernel.debug...done.
(kgdb) exec-file /var/crash/kernel.42
(kgdb) core-file /var/crash/vmcore.42
IdlePTD 3305472
initial pcb at 2a60e0
panicstr: page fault
panic messages:
---
Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x5ac0ac00
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc01e8b20
stack pointer           = 0x10:0xc02850a4
frame pointer           = 0x10:0xc02850b0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 3
current process         = Idle
interrupt mask          = net tty
trap number             = 12
panic: page fault

syncing disks... 5 3 done
Uptime: 33m9s

dumping to dev #ad/0x20001, offset 380928
dump ata0: resetting devices .. done 64 63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
---
#0  dumpsys () at ../../kern/kern_shutdown.c:469
469             if (dumping++) {
(kgdb) bt
#0  dumpsys () at ../../kern/kern_shutdown.c:469
#1  0xc013e397 in boot (howto=256) at ../../kern/kern_shutdown.c:309
#2  0xc013e72d in panic (fmt=0xc027a0af "page fault")
    at ../../kern/kern_shutdown.c:556
#3  0xc02451b2 in trap_fatal (frame=0xc0285064, eva=1522576384)
    at ../../i386/i386/trap.c:951
#4  0xc0244e65 in trap_pfault (frame=0xc0285064, usermode=0, eva=1522576384)
    at ../../i386/i386/trap.c:844
#5  0xc0244a07 in trap (frame={tf_fs = 16, tf_es = -1071120368,
      tf_ds = -1820065776, tf_edi = 1, tf_esi = 6754970,
      tf_ebp = -1071099728, tf_isp = -1071099760, tf_ebx = 1,
      tf_edx = 1522576384, tf_ecx = 0, tf_eax = 6754970, tf_trapno = 12,
      tf_err = 0, tf_eip = -1071740128, tf_cs = 8, tf_eflags = 78342,
      tf_esp = -1067788544, tf_ss = -1067788544})
    at ../../i386/i386/trap.c:443
#6  0xc01e8b20 in rl_encap (sc=0xc0d29800, m_head=0xc05ad700)
    at ../../pci/if_rl.c:1314
#7  0xc01e8cef in rl_start (ifp=0xc0d29800) at ../../pci/if_rl.c:1367
#8  0xc0181aac in ether_output_frame (ifp=0xc0d29800, m=0xc05ad700)
    at ../../net/if_ethersubr.c:401
#9  0xc0181a1a in ether_output (ifp=0xc0d29800, m=0xc05ad700,
    dst=0xc0d9c130, rt0=0xc0ec8400) at ../../net/if_ethersubr.c:354
#10 0xc019f697 in ip_output (m0=0xc05ad700, opt=0x0, ro=0xc6fb9d08, flags=0,
    imo=0x0) at ../../netinet/ip_output.c:787
#11 0xc01a43da in tcp_output (tp=0xc6fb9d80)
    at ../../netinet/tcp_output.c:859
---Type to continue, or q to quit---
#12 0xc01a31ad in tcp_input (m=0xc05aa700, off0=20, proto=6)
    at ../../netinet/tcp_input.c:2220
#13 0xc019df03 in ip_input (m=0xc05aa700) at ../../netinet/ip_input.c:731
#14 0xc019df77 in ipintr () at ../../netinet/ip_input.c:759
(kgdb) up 6
#6  0xc01e8b20 in rl_encap (sc=0xc0d29800, m_head=0xc05ad700)
    at ../../pci/if_rl.c:1314
1314                    return(1);
(kgdb) l
1309             */
1310
1311            MGETHDR(m_new, M_DONTWAIT, MT_DATA);
1312            if (m_new == NULL) {
1313                    printf("rl%d: no memory for tx list", sc->rl_unit);
1314                    return(1);
1315            }
1316            if (m_head->m_pkthdr.len > MHLEN) {
1317                    MCLGET(m_new, M_DONTWAIT);
1318                    if (!(m_new->m_flags & M_EXT)) {
(kgdb) p *sc
$1 = {arpcom = {ac_if = {if_softc = 0xc0d29800, if_name = 0xc0265d76 "rl",
      if_link = {tqe_next = 0xc02a6ae0, tqe_prev = 0xc0d29a08}, if_addrhead = {
        tqh_first = 0xc0d32f00, tqh_last = 0xc0d7d690}, if_pcount = 0,
      if_bpf = 0xc0595760, if_index = 2, if_unit = 1, if_timer = 0,
      if_flags = -30717, if_ipending = 0, if_linkmib = 0x0,
      if_linkmiblen = 0, if_data = {ifi_type = 6 '\006',
        ifi_physical = 0 '\000', ifi_addrlen = 6 '\006',
        ifi_hdrlen = 14 '\016', ifi_recvquota = 0 '\000',
        ifi_xmitquota = 0 '\000', ifi_mtu = 1500, ifi_metric = 0,
        ifi_baudrate = 10000000, ifi_ipackets = 9556, ifi_ierrors = 0,
        ifi_opackets = 9758, ifi_oerrors = 0, ifi_collisions = 0,
        ifi_ibytes = 1958413, ifi_obytes = 975722, ifi_imcasts = 3,
        ifi_omcasts = 0, ifi_iqdrops = 0, ifi_noproto = 0, ifi_hwassist = 0,
        ifi_unused = 0, ifi_lastchange = {tv_sec = 0, tv_usec = 0}},
      if_multiaddrs = {lh_first = 0xc0595000}, if_amcount = 0,
      if_output = 0xc0181708 , if_start = 0xc01e8ccc , if_done = 0,
      if_ioctl = 0xc01e9164 , if_watchdog = 0xc01e9250 , if_poll_recv = 0,
      if_poll_xmit = 0, if_poll_intren = 0, if_poll_slowinput = 0,
      if_init = 0xc01e8e8c , if_resolvemulti = 0xc0181ddc , if_snd = {
        ifq_head = 0x0, ifq_tail = 0x0, ifq_len = 0, ifq_maxlen = 50,
        ifq_drops = 0}, if_poll_slowq = 0x0, if_prefixhead = {tqh_first = 0x0,
        tqh_last = 0xc0d298d0}}, ac_enaddr = "\000Pº\203\231Ç",
---Type to continue, or q to quit---
    ac_multicnt = 0, ac_netgraph = 0x0}, rl_bhandle = 57344, rl_btag = 0,
  rl_res = 0xc0d2d780, rl_irq = 0xc0d2d700, rl_intrhand = 0xc0595860,
  rl_miibus = 0xc0d30400, rl_unit = 1 '\001', rl_type = 2 '\002',
  rl_stats_no_timeout = 0 '\000', rl_txthresh = 96, rl_cdata = {cur_rx = 0,
    rl_rx_buf = 0xc6417008 "ataID = A33D1B4B5493F0AEF66DE545547781EF, maxResults = 4, TTL = 1, serverIP=213.73.176.103\n\216\022\f8",
    rl_rx_buf_ptr = 0xc6417000 "\017·", rl_tx_chain = {0x0, 0x0, 0x0, 0x0},
    last_tx = 2 '\002', cur_tx = 2 '\002'}, rl_stat_ch = {
    callout = 0xc2154588}}
(kgdb) p sc->rl_unit
$2 = 1 '\001'
(kgdb) p m_new
$3 = (struct mbuf *) 0x1
(kgdb) p *m_new
cannot read proc at 0
(kgdb)
Script done on Sun Feb 11 15:38:55 2001

Thank you for your attention.

\bye
Stas

From: Stas Kisel
To: freebsd-gnats-submit@FreeBSD.org
Cc: myleal@spliceip.com.br
Subject: Re: kern/24608: FreeBSD 4.2 Panics in Realtek rl driver
Date: Mon, 12 Feb 2001 14:23:37 +0200

Hi.

Using gdb a bit more, I've found that the crashes actually occur not on
the lines it reports:
at ../../kern/uipc_mbuf2.c:270
at ../../pci/if_rl.c:1314 (this one originally reported in this PR)
at ../../kern/uipc_socket.c:558
at ../../kern/uipc_mbuf.c:621
but in the MGET() and MGETHDR() calls preceding these lines, at sys/mbuf.h
lines 287 and 317.

Commented assembly code proving this is at
http://tiger.unisquad.com/~stask/rl/typescript.detailed.44-46
BTW, there are other typescripts.

MGET()/MGETHDR() deal with 0x5aXXXXXX and 0x5bXXXXXX instead of 0xc0XXXXXX
(other valid mbufs are at 0xc0XXXXXX).
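
A fault that surfaces inside the allocation macros rather than at the reported source lines is consistent with a free list whose link pointer has been damaged. The following is a minimal sketch of that effect in C, not the actual MGET()/MGETHDR() macros from sys/mbuf.h: `struct node`, `free_head`, `node_free()` and `node_alloc()` are hypothetical stand-ins for an mbuf, the free-list head, and the free and allocate operations.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an mbuf: only the free-list link is modelled. */
struct node {
    struct node *next;
};

/* Hypothetical stand-in for the global free-list head. */
static struct node *free_head;

/* Freeing pushes the node onto the front of the free list. */
static void node_free(struct node *n)
{
    assert(n != NULL);
    n->next = free_head;
    free_head = n;
}

/*
 * Allocation pops the head and installs head->next as the new head.
 * The link is trusted, not validated, so a damaged link is only
 * noticed when a later allocation dereferences it.
 */
static struct node *node_alloc(void)
{
    struct node *n = free_head;

    if (n != NULL) {
        free_head = n->next;
        n->next = NULL;
    }
    return n;
}
```

If the `next` field of a node is overwritten while the node sits on the free list, the allocation that pops that node still succeeds. It is the following allocation, possibly in unrelated code, that receives or dereferences the bad pointer, which would explain faults reported at several different call sites.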
I've added some logging to the kernel (m_mballoc()) to check which addresses
are usually used in the kernel for mbufs. I'll report the results.

It looks like m_mballoc() puts a wrong value into mmbfree. Looking at it
(kern/uipc_mbuf.c), I came to the conclusion that it is kern_malloc()
returning a wrong value.

It looks like I should stop here, because I am not familiar with the kernel
and I don't understand kern_malloc()'s comment at all :(
But I'll try to make a couple of guesses :)

There are many restrictions on the use of kern_malloc(), and probably it may
not be used here.
Guess 1. kern_malloc() should work at splhigh, while MGET()/MGETHDR() use
splimp.
Guess 2. kern_malloc() should be only called from kern/kern_malloc.c.

PS. When searching for a quick fix yesterday, I tried to use the old version
of if_rl.c (from 4.1-RELEASE), but this didn't help, of course.

\bye
Stas

From: Stas Kisel
To: freebsd-gnats-submit@FreeBSD.org
Cc: myleal@spliceip.com.br
Subject: Re: kern/24608: FreeBSD 4.2 Panics in Realtek rl driver
Date: Mon, 12 Feb 2001 14:31:56 +0200

Oops, sorry, in the previous post I should have said "kmem_malloc()"
instead of "kern_malloc()".

Once more:
kern_malloc() (vm/vm_kern.c) is referenced from m_mballoc()
(kern/uipc_mbuf.c) and probably returns a wrong value.
1. kern_malloc() should work at splhigh.
2. kern_malloc() should be called only from malloc() (kern/kern_malloc.c)

\bye
Stas

From: Stas Kisel
To: freebsd-gnats-submit@FreeBSD.org
Cc: myleal@spliceip.com.br
Subject: Re: kern/24608: FreeBSD 4.2 Panics in Realtek rl driver
Date: Mon, 12 Feb 2001 14:56:32 +0200

Oh, I'm really not in my best mood today...
In the previous post I again should have said "kmem_malloc()" instead of
"kern_malloc()".

Once more:
kmem_malloc() (vm/vm_kern.c) is referenced from m_mballoc()
(kern/uipc_mbuf.c) and probably returns a wrong value.
1. kmem_malloc() should work at splhigh.
2. kmem_malloc() should be called only from malloc() (kern/kern_malloc.c)

Sorry for the inconvenience.
\bye
Stas

From: Stas Kisel
To: freebsd-gnats-submit@FreeBSD.org
Cc: myleal@spliceip.com.br
Subject: Re: kern/24608: FreeBSD 4.2 Panics in Realtek rl driver
Date: Mon, 12 Feb 2001 21:18:23 +0200

Hi.

It looks like kmem_malloc() is not the one that returns a wrong pointer
to m_mballoc(), but it is one of MGET()/MGETHDR()/MFREE() spoiling mmbfree,
and thus MGET()/MGETHDR() return a wrong pointer in various places.

I've added even more logging to see which of them is the killer, and I'll
post results tomorrow.

\bye
Stas

From: Stas Kisel
To: freebsd-gnats-submit@FreeBSD.org
Cc: myleal@spliceip.com.br
Subject: Re: kern/24608: FreeBSD 4.2 Panics in Realtek rl driver
Date: Thu, 12 Apr 2001 17:52:24 +0300

Hi.

You probably forgot about this issue. Think it is gone? Negative.

BTW, I've reinstalled 4.2-RELEASE (because of an HDD fault; I didn't post
results that day because of it, sorry). There were no reboots during the
whole month when the machine was a proxy server. And the machine reboots
daily again now that it has become a router, firewall, NAT and traffic
shaper.

So, I've added several panic() calls to the mbuf code. Don't laugh at it,
please! I'm not familiar with the kernel and I'm just poking around in the
hope that someone will take all this data and fix the bug.

Panics are designed to occur when someone tries to assign a wrong value to
either mmbfree or mclfree. I believe values between 0x5a000000 and
0x63000000 to be wrong. Correct me please.
Code is at http://tiger.unisquad.com/~stask/rl/mbuf_panics.diff

And one of these panics was triggered!
The whole typescript is at
http://tiger.unisquad.com/~stask/rl/typescript.19-20, along with another
case when some_mbuf->m_next was invalid too.

So, I have almost nothing interesting:
1. This time mmbfree is spoiled in MGETHDR(), in sosend() at
   ../../kern/uipc_socket.c:555
2. The wrong value for mmbfree was taken from mmbfree->m_next.
3. Here is the offending mbuf:

(kgdb) p *mmbfree
$3 = {m_hdr = {mh_next = 0x5ac08d00, mh_nextpkt = 0x0,
                         ^^^^^^^^^^ - here is wrong value
    mh_data = 0xc05c95f0 "", mh_len = 14, mh_type = 0, mh_flags = 2}, M_dat = {
    MH = {MH_pkthdr = {rcvif = 0xc05a6400, len = 1490, header = 0xc6dfc780,
        csum_flags = 0, csum_data = 16, aux = 0x0}, MH_dat = {MH_ext = {
          ext_buf = 0xc05e0000 "k/clickbanner.asp?id=88&page=7&btype=2&bstype=1&rnd='+rnd+'\" target=_top>'+\r\n'\"SLE')"..., ext_free = 0, ext_size = 2048, ext_ref = 0},
        MH_databuf = "\000\000^ I skip the rest of junk because it might kill someone's mailreader. You can find it at above URL"...}},
    M_databuf = "\000dZ And this I skip too"...}}

A simple grep for '->m_next = ' on the kernel source gave me 177 results,
and I'm afraid I have to add 177 panics to the kernel. And I'm afraid even
more that my grep missed some, say '->m_next += ' or '->m_next = '.

Any advice is welcome. Even advice to post all data here instead of URLs :)

Thank you.

\bye
Stas

On Mon, Feb 12, 2001 at 06:18:23PM +0000, stask@tiger.unisquad.com wrote:
>
> Hi.
>
> It looks like kmem_malloc() is not the one that returns a wrong pointer
> to m_mballoc(), but it is one of MGET()/MGETHDR()/MFREE() spoiling mmbfree,
> and thus MGET()/MGETHDR() return a wrong pointer in various places.
>
> I've added even more logging to see which of them is the killer, and I'll
> post results tomorrow.
>
> \bye
> Stas
>

From: Stas Kisel
To: freebsd-gnats-submit@FreeBSD.org
Cc:
Subject: Re: kern/24608: FreeBSD 4.2 Panics in Realtek rl driver
Date: Sat, 21 Apr 2001 18:39:45 +0300

Thank you, it looks like this patch fixes the problem.
My router has not experienced a panic since I applied the patch
(more than a week ago).

\bye
Stas

On Thu, Apr 12, 2001 at 04:33:11PM +0100, Ian Dowse wrote:
>
> This looks like the symptoms of the icmp_error problem that was
> fixed recently. This bug caused the two upper bytes of mh_next to
> get swapped, i.e. 0xc05a8d00->0x5ac08d00.
Try either updating to
> a more recent -stable, or apply the following patch in
> /usr/src/sys/netinet:
>
> Ian
>
> --- ip_icmp.c   2001/02/23 20:51:46     1.53
> +++ ip_icmp.c   2001/03/08 19:03:26     1.54
> @@ -164,6 +164,8 @@
>          if (m == NULL)
>                  goto freeit;
>          icmplen = min(oiplen + 8, oip->ip_len);
> +        if (icmplen < sizeof(struct ip))
> +                panic("icmp_error: bad length");
>          m->m_len = icmplen + ICMP_MINLEN;
>          MH_ALIGN(m, m->m_len);
>          icp = mtod(m, struct icmp *);
> @@ -189,7 +191,7 @@
>          }
>
>          icp->icmp_code = code;
> -        bcopy((caddr_t)oip, (caddr_t)&icp->icmp_ip, icmplen);
> +        m_copydata(n, 0, icmplen, (caddr_t)&icp->icmp_ip);
>          nip = &icp->icmp_ip;
>
>          /*
> --- ip_input.c  2001/03/05 22:40:27     1.161
> +++ ip_input.c  2001/03/08 19:03:26     1.162
> @@ -1563,12 +1563,21 @@
>          }
>
>          /*
> -         * Save at most 64 bytes of the packet in case
> -         * we need to generate an ICMP message to the src.
> -         */
> -        mcopy = m_copy(m, 0, imin((int)ip->ip_len, 64));
> -        if (mcopy && (mcopy->m_flags & M_EXT))
> -                m_copydata(mcopy, 0, sizeof(struct ip), mtod(mcopy, caddr_t));
> +         * Save the IP header and at most 8 bytes of the payload,
> +         * in case we need to generate an ICMP message to the src.
> +         *
> +         * We don't use m_copy() because it might return a reference
> +         * to a shared cluster. Both this function and ip_output()
> +         * assume exclusive access to the IP header in `m', so any
> +         * data in a cluster may change before we reach icmp_error().
> +         */
> +        MGET(mcopy, M_DONTWAIT, m->m_type);
> +        if (mcopy != NULL) {
> +                M_COPY_PKTHDR(mcopy, m);
> +                mcopy->m_len = imin((IP_VHL_HL(ip->ip_vhl) << 2) + 8,
> +                    (int)ip->ip_len);
> +                m_copydata(m, 0, mcopy->m_len, mtod(mcopy, caddr_t));
> +        }
>
>  #ifdef IPSTEALTH
>          if (!ipstealth) {
> @@ -1715,8 +1724,6 @@
>                  m_freem(mcopy);
>                  return;
>          }
> -        if (mcopy->m_flags & M_EXT)
> -                m_copyback(mcopy, 0, sizeof(struct ip), mtod(mcopy, caddr_t));
>          icmp_error(mcopy, type, code, dest, destifp);
>  }
>
>

State-Changed-From-To: open->closed
State-Changed-By: iedowse
State-Changed-When: Sat Apr 21 13:37:00 PDT 2001
State-Changed-Why:
Submitter says this issue has been resolved. Thanks for the bug report!

http://www.freebsd.org/cgi/query-pr.cgi?pr=24608
>Unformatted:
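
The byte swap described in the audit trail (0xc05a8d00->0x5ac08d00) can be checked with plain arithmetic. The sketch below is illustrative only and is not kernel code: `swap_upper_bytes()`, `in_suspect_range()` and `check_thread_values()` are hypothetical helper names, and treating the 0x5a000000 to 0x63000000 range as inclusive is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Exchange the two most significant bytes of a 32-bit value. */
uint32_t swap_upper_bytes(uint32_t v)
{
    uint32_t top = (v >> 24) & 0xffu;
    uint32_t second = (v >> 16) & 0xffu;

    return (second << 24) | (top << 16) | (v & 0xffffu);
}

/*
 * Range the submitter's added panic() calls treated as wrong.
 * Inclusive bounds are an assumption made for this sketch.
 */
int in_suspect_range(uint32_t addr)
{
    return addr >= 0x5a000000u && addr <= 0x63000000u;
}

/* Values quoted in the thread, checked against the helpers above. */
void check_thread_values(void)
{
    /* The example from the diagnosis: a valid pointer and its damaged form. */
    assert(swap_upper_bytes(0xc05a8d00u) == 0x5ac08d00u);
    assert(in_suspect_range(0x5ac08d00u));
    assert(!in_suspect_range(0xc05a8d00u));

    /* eva=1522576384 in the trap_pfault frame is the fault address 0x5ac0ac00. */
    assert(1522576384u == 0x5ac0ac00u);
    assert(swap_upper_bytes(0x5ac0ac00u) == 0xc05aac00u);
}
```

Applying the same swap to the fault address in the second crash dump gives 0xc05aac00, which lies in the 0xc0XXXXXX range where valid mbufs were observed. The fault address in the original report, 0xa4c07000, maps to 0xc0a47000 under the same swap; the thread does not confirm whether that crash had the same cause.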