From nobody@FreeBSD.org Tue Oct 24 18:36:01 2006
Return-Path:
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 2B93B16A412 for ; Tue, 24 Oct 2006 18:36:01 +0000 (UTC) (envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (www.freebsd.org [216.136.204.117]) by mx1.FreeBSD.org (Postfix) with ESMTP id 4CB5D43DC0 for ; Tue, 24 Oct 2006 18:35:46 +0000 (GMT) (envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (localhost [127.0.0.1]) by www.freebsd.org (8.13.1/8.13.1) with ESMTP id k9OIZj8W048674 for ; Tue, 24 Oct 2006 18:35:45 GMT (envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost) by www.freebsd.org (8.13.1/8.13.1/Submit) id k9OIZjBR048673; Tue, 24 Oct 2006 18:35:45 GMT (envelope-from nobody)
Message-Id: <200610241835.k9OIZjBR048673@www.freebsd.org>
Date: Tue, 24 Oct 2006 18:35:45 GMT
From: Kai Gallasch
To: freebsd-gnats-submit@FreeBSD.org
Subject: kernel panic 6.2 prerelease-20061017 amd64
X-Send-Pr-Version: www-3.0

>Number: 104765
>Category: kern
>Synopsis: kernel panic 6.2 prerelease-20061017 amd64
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: rwatson
>State: closed
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Tue Oct 24 18:40:17 GMT 2006
>Closed-Date: Wed Dec 06 12:44:45 GMT 2006
>Last-Modified: Wed Dec 06 12:44:45 GMT 2006
>Originator: Kai Gallasch
>Release: 6.2 prerelease (checkout 20061017)
>Organization: FREE!
>Environment:
FreeBSD geldkraft.free.de 6.2-PRERELEASE FreeBSD 6.2-PRERELEASE #0: Sun Oct 22 13:36:38 CEST 2006 houdini@geldkraft.free.de:/usr/obj/usr/src/sys/SMP amd64

>Description:
Kernel panics after 1-3 days uptime with trap number 12 - page fault.
kernel config: -------------- GENERIC (SMP) with "makeoptions DEBUG=-g" $FreeBSD: src/sys/amd64/conf/GENERIC,v 1.439.2.14 2006/10/09 18:41:36 simon Exp $ Hardware: --------- HP/Compaq DL385 Dual Opteron (Dual Core) with ServeRaid 6 (Raid 5) and 1G RAM. dmesg: ------ Copyright (c) 1992-2006 The FreeBSD Project. Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. FreeBSD is a registered trademark of The FreeBSD Foundation. FreeBSD 6.2-PRERELEASE #0: Sun Oct 22 13:36:38 CEST 2006 houdini@geldkraft.free.de:/usr/obj/usr/src/sys/SMP ACPI APIC Table: Timecounter "i8254" frequency 1193182 Hz quality 0 CPU: AMD Opteron(tm) Processor 280 (2405.47-MHz K8-class CPU) Origin = "AuthenticAMD" Id = 0x20f12 Stepping = 2 Features=0x178bfbff Features2=0x1 AMD Features=0xe2500800 AMD Features2=0x2 Cores per package: 2 real memory = 1073709056 (1023 MB) avail memory = 1023938560 (976 MB) FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs cpu0 (BSP): APIC ID: 0 cpu1 (AP): APIC ID: 1 cpu2 (AP): APIC ID: 2 cpu3 (AP): APIC ID: 3 MADT: Forcing active-low polarity and level trigger for SCI ioapic0 irqs 0-23 on motherboard ioapic1 irqs 24-27 on motherboard ioapic2 irqs 28-31 on motherboard ioapic3 irqs 32-35 on motherboard ioapic4 irqs 36-39 on motherboard kbd1 at kbdmux0 ath_hal: 0.9.17.2 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413) acpi0: on motherboard acpi0: Power Button (fixed) Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000 acpi_timer0: <32-bit timer at 3.579545MHz> port 0x908-0x90b on acpi0 cpu0: on acpi0 cpu1: on acpi0 cpu2: on acpi0 cpu3: on acpi0 pcib0: on acpi0 pci0: on pcib0 pcib1: at device 3.0 on pci0 pci1: on pcib1 ohci0: mem 0xf7df0000-0xf7df0fff irq 19 at device 0.0 on pci1 ohci0: [GIANT-LOCKED] usb0: OHCI version 1.0, legacy support usb0: SMM does not respond, resetting usb0: on ohci0 usb0: USB revision 1.0 uhub0: AMD OHCI root hub, class 9/0, rev 1.00/1.00, addr 1 
uhub0: 3 ports with 3 removable, self powered ohci1: mem 0xf7de0000-0xf7de0fff irq 19 at device 0.1 on pci1 ohci1: [GIANT-LOCKED] usb1: OHCI version 1.0, legacy support usb1: SMM does not respond, resetting usb1: on ohci1 usb1: USB revision 1.0 uhub1: AMD OHCI root hub, class 9/0, rev 1.00/1.00, addr 1 uhub1: 3 ports with 3 removable, self powered pci1: at device 2.0 (no driver attached) pci1: at device 2.2 (no driver attached) pci1: at device 3.0 (no driver attached) isab0: at device 4.0 on pci0 isa0: on isab0 atapci0: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0x2000-0x200f at device 4.1 on pci0 ata0: on atapci0 ata1: on atapci0 pci0: at device 4.3 (no driver attached) pcib2: at device 7.0 on pci0 pci2: on pcib2 ciss0: port 0x5000-0x50ff mem 0xf7ef0000-0xf7ef1fff,0xf7e80000-0xf7ebffff irq 24 at device 4.0 on pci2 ciss0: [GIANT-LOCKED] pci0: at device 7.1 (no driver attached) pcib3: at device 8.0 on pci0 pci3: on pcib3 bge0: mem 0xf7ff0000-0xf7ffffff irq 28 at device 6.0 on pci3 miibus0: on bge0 brgphy0: on miibus0 brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto bge0: Ethernet address: 00:17:a4:8f:27:68 bge1: mem 0xf7fe0000-0xf7feffff irq 29 at device 6.1 on pci3 miibus1: on bge1 brgphy1: on miibus1 brgphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto bge1: Ethernet address: 00:17:a4:8f:27:67 pci0: at device 8.1 (no driver attached) pcib4: on acpi0 pci4: on pcib4 pcib5: at device 9.0 on pci4 pci5: on pcib5 pci4: at device 9.1 (no driver attached) pcib6: at device 10.0 on pci4 pci6: on pcib6 pci4: at device 10.1 (no driver attached) atkbdc0: port 0x60,0x64 irq 1 on acpi0 atkbd0: flags 0x1 irq 1 on atkbdc0 kbd0 at atkbd0 atkbd0: [GIANT-LOCKED] psm0: irq 12 on atkbdc0 psm0: [GIANT-LOCKED] psm0: model IntelliMouse, device ID 3 sio0: port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0 sio0: type 16550A, console fdc0: port 0x3f2-0x3f5 irq 6 drq 2 on acpi0 fdc0: [FAST] orm0: at iomem 
0xc0000-0xc7fff,0xc8000-0xcbfff,0xee000-0xeffff on isa0 ppc0: cannot reserve I/O port range sc0: at flags 0x100 on isa0 sc0: VGA <16 virtual consoles, flags=0x300> sio1 at port 0x2f8-0x2ff irq 3 on isa0 sio1: type 16550A vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0 Timecounters tick every 1.000 msec acd0: CDROM at ata0-master PIO4 SMP: AP CPU #1 Launched! SMP: AP CPU #3 Launched! SMP: AP CPU #2 Launched! da0 at ciss0 bus 0 target 0 lun 0 da0: Fixed Direct Access SCSI-0 device da0: 135.168MB/s transfers da0: 17200MB (35226720 512 byte sectors: 255H 32S/T 4317C) da1 at ciss0 bus 0 target 1 lun 0 da1: Fixed Direct Access SCSI-0 device da1: 135.168MB/s transfers da1: 17200MB (35226720 512 byte sectors: 255H 32S/T 4317C) da2 at ciss0 bus 0 target 2 lun 0 da2: Fixed Direct Access SCSI-0 device da2: 135.168MB/s transfers da2: 69499MB (142334880 512 byte sectors: 255H 32S/T 17443C) da3 at ciss0 bus 0 target 3 lun 0 da3: Fixed Direct Access SCSI-0 device da3: 135.168MB/s transfers da3: 69499MB (142334880 512 byte sectors: 255H 32S/T 17443C) da4 at ciss0 bus 0 target 4 lun 0 da4: Fixed Direct Access SCSI-0 device da4: 135.168MB/s transfers da4: 139799MB (286309920 512 byte sectors: 255H 32S/T 35087C) backtrace: ---------- [GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"] GNU gdb 6.1.1 [FreeBSD] Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "amd64-marcel-freebsd". 
Unread portion of the kernel message buffer: d, page not present instruction pointer = 0x8:0xffffffff803eea47 stack pointer = 0x10:0xffffffffa814a8b0 frame pointer = 0x10:0x4 code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = resume, IOPL = 0 current process = 27596 (tcpserver) trap number = 12 panic: page fault cpuid = 3 Uptime: 2h12m0s Dumping 1023 MB (2 chunks) chunk 0: 1MB (156 pages) ... ok chunk 1: 1023MB (261880 pages) 1008 992 976 960 944 928 912 896 880 864 848 832 816 800 784 768 752 736 720 704 688 672 656 640 624 608 592 576 560 544 528 512 496 480 464 448 432 416 400 384 368 352 336 320 304 288 272 256 240 224 208 192 176 160 144 128 112 96 80 64 48 32 16 #0 doadump () at pcpu.h:172 172 pcpu.h: No such file or directory. in pcpu.h (kgdb) quit geldkraft:/etc # mount /usr/src/ geldkraft:/etc # cd /usr/src/sys/amd64/conf/ geldkraft:/usr/src/sys/amd64/conf # kgdb SMP /var/crash/vmcore.0 kgdb: bad namelist - no kernbase geldkraft:/usr/src/sys/amd64/conf # kgdb /boot/kernel/kernel /var/crash/vmcore.0 [GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"] GNU gdb 6.1.1 [FreeBSD] Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "amd64-marcel-freebsd". 
Unread portion of the kernel message buffer: d, page not present instruction pointer = 0x8:0xffffffff803eea47 stack pointer = 0x10:0xffffffffa814a8b0 frame pointer = 0x10:0x4 code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = resume, IOPL = 0 current process = 27596 (tcpserver) trap number = 12 panic: page fault cpuid = 3 Uptime: 2h12m0s Dumping 1023 MB (2 chunks) chunk 0: 1MB (156 pages) ... ok chunk 1: 1023MB (261880 pages) 1008 992 976 960 944 928 912 896 880 864 848 832 816 800 784 768 752 736 720 704 688 672 656 640 624 608 592 576 560 544 528 512 496 480 464 448 432 416 400 384 368 352 336 320 304 288 272 256 240 224 208 192 176 160 144 128 112 96 80 64 48 32 16 #0 doadump () at pcpu.h:172 172 pcpu.h: No such file or directory. in pcpu.h (kgdb) list *0xffffffff803eea47 0xffffffff803eea47 is in _mtx_lock_sleep (/usr/src/sys/kern/kern_mutex.c:548). 543 * If the current owner of the lock is executing on another 544 * CPU, spin instead of blocking. 545 */ 546 owner = (struct thread *)(v & MTX_FLAGMASK); 547 #ifdef ADAPTIVE_GIANT 548 if (TD_IS_RUNNING(owner)) { 549 #else 550 if (m != &Giant && TD_IS_RUNNING(owner)) { 551 #endif 552 turnstile_release(&m->mtx_object); (kgdb) backtrace #0 doadump () at pcpu.h:172 #1 0x0000000000000004 in ?? 
() #2 0xffffffff803f8fd7 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409 #3 0xffffffff803f9671 in panic (fmt=0xffffff0002116980 "X?J:") at /usr/src/sys/kern/kern_shutdown.c:565 #4 0xffffffff80618b3f in trap_fatal (frame=0xffffff0002116980, eva=18446742975175902040) at /usr/src/sys/amd64/amd64/trap.c:660 #5 0xffffffff80619066 in trap (frame= {tf_rdi = 11, tf_rsi = -1099476932224, tf_rdx = 6, tf_rcx = 0, tf_r8 = 4, tf_r9 = -1098475933086, tf_rax = 1, tf_rbx = -1099415090280, tf_rbp = 4, tf_r10 = 4, tf_r11 = 4, tf_r12 = -1099476932224, tf_r13 = -1098728017152, tf_r14 = 0, tf_r15 = 1, tf_trapno = 12, tf_addr = 396, tf_flags = -2141616351, tf_err = 0, tf_rip = -2143360441, tf_cs = 8, tf_rflags = 65538, tf_rsp = -1475041088, tf_ss = 16}) at /usr/src/sys/amd64/amd64/trap.c:238 #6 0xffffffff8060442b in calltrap () at /usr/src/sys/amd64/amd64/exception.S:168 #7 0xffffffff803eea47 in _mtx_lock_sleep (m=0xffffff0005c10b98, tid=18446742974232619392, opts=6, file=0x0, line=4) at /usr/src/sys/kern/kern_mutex.c:546 #8 0xffffffff804bb51d in ip_ctloutput (so=0xb, sopt=0xffffffffa814ab30) at /usr/src/sys/netinet/ip_output.c:1193 #9 0xffffffff804ccad5 in tcp_ctloutput (so=0xffffff0024a0d268, sopt=0xffffffffa814ab30) at /usr/src/sys/netinet/tcp_usrreq.c:1038 #10 0xffffffff804416b8 in sosetopt (so=0xffffff0024a0d268, sopt=0xffffffffa814ab30) at /usr/src/sys/kern/uipc_socket.c:1563 #11 0xffffffff80447b93 in kern_setsockopt (td=0xffffff0002116980, s=616888072, level=4, name=0, val=0x4, valseg=1035694690, valsize=11) at /usr/src/sys/kern/uipc_syscalls.c:1351 #12 0xffffffff80447bfe in setsockopt (td=0xb, uap=0xffffff0002116980) at /usr/src/sys/kern/uipc_syscalls.c:1307 #13 0xffffffff80619991 in syscall (frame= {tf_rdi = 0, tf_rsi = 0, tf_rdx = 1, tf_rcx = 0, tf_r8 = 0, tf_r9 = 140737488350072, tf_rax = 105, tf_rbx = 0, tf_rbp = 3, tf_r10 = -3689348814741910323, tf_r11 = 514, tf_r12 = 140737488350480, tf_r13 = 34368406752, tf_r14 = 0, tf_r15 = 0, tf_trapno = 12, tf_addr = 
5283944, tf_flags = 12, tf_err = 2, tf_rip = 34366834188, tf_cs = 43, tf_rflags = 518, tf_rsp = 140737488350184, tf_ss = 35}) at /usr/src/sys/amd64/amd64/trap.c:792 #14 0xffffffff806045c8 in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:270 #15 0x00000008006c460c in ?? () Previous frame inner to this frame (corrupt stack?) (kgdb)

>How-To-Repeat:
The problem occurs after 1-3 days of server uptime.

>Fix:
Raising some sysctl values seems to lengthen the intervals between crashes, although I may be mistaken in thinking that tweaking them has any effect on the problem.

# default was 12328
#kern.maxfiles=80000
# default 128
#kern.ipc.somaxconn=384
# default was 11095
#kern.maxfilesperproc=50000

>Release-Note:

>Audit-Trail:

From: Kai Gallasch
To: bug-followup@FreeBSD.org, gallasch@free.de
Cc:
Subject: Re: kern/104765: kernel panic 6.2 prerelease-20061017 amd64
Date: Wed, 25 Oct 2006 11:49:33 +0200

Here 1*) is another backtrace from a new kernel panic. It looks very similar to the one I submitted previously; even the current process involved in the panic is the same, "tcpserver", which shows up every time the kernel panics.

At first I thought it was always 'tcpserver' simply because that process is very active on a busy mail server running qmail. But maybe the panics I am seeing with 6.2-PRE are related to the following thread on freebsd-stable:

http://lists.freebsd.org/pipermail/freebsd-stable/2006-October/029433.html

and especially (in that thread):

http://lists.freebsd.org/pipermail/freebsd-stable/2006-October/029487.html

If so, snippet 2*) may be helpful to someone; there I tried to follow what Gleb Smirnoff advised in:

http://lists.freebsd.org/pipermail/freebsd-stable/2006-October/029452.html

Cheers,
K.
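The kgdb listing of kern_mutex.c in the original report places the faulting instruction in _mtx_lock_sleep() at lines 546-548, where the owner thread pointer is recovered from the lock word with v & MTX_FLAGMASK and then dereferenced by TD_IS_RUNNING(owner). The user-space C sketch below isolates that decoding step. It assumes the flag values of FreeBSD 6.x <sys/mutex.h>, and the helpers lock_owner(), field_address() and lock_word_examples() are hypothetical. It shows one possible reading of the traces, not a diagnosis: if the lock word held only flag bits (for example because the mutex had already been freed and its memory reused), the decoded owner would be NULL and the read would fault at a small address, which would be consistent with the tf_addr = 396 (0x18c) in the first two trap frames. The offset of the field TD_IS_RUNNING() reads in struct thread has not been checked; 0x18c is only an illustrative input.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Lock-word flag bits, assumed to match FreeBSD 6.x <sys/mutex.h>.
 * All other bits of the word hold the owning thread's address.
 */
#define MTX_RECURSED  ((uintptr_t)0x00000001)
#define MTX_CONTESTED ((uintptr_t)0x00000002)
#define MTX_UNOWNED   ((uintptr_t)0x00000004)
#define MTX_FLAGMASK  (~(MTX_RECURSED | MTX_CONTESTED | MTX_UNOWNED))

/*
 * Hypothetical helper: the decoding done at kern_mutex.c:546,
 *     owner = (struct thread *)(v & MTX_FLAGMASK);
 * expressed on plain integers so it can run in user space.
 */
static uintptr_t lock_owner(uintptr_t v)
{
    return v & MTX_FLAGMASK;
}

/*
 * Hypothetical helper: the address touched when a field at field_offset
 * is read through the decoded owner pointer. If the lock word held only
 * flag bits, the owner is 0 and the result is the bare field offset.
 */
static uintptr_t field_address(uintptr_t v, size_t field_offset)
{
    return lock_owner(v) + field_offset;
}

/* Example checks; call them from main() or a test harness. */
static void lock_word_examples(void)
{
    /* A real owner address survives the mask; the flag bits are dropped. */
    assert(lock_owner((uintptr_t)0x1000 | MTX_CONTESTED) == 0x1000);
    /* Flag bits only: the decoded owner is NULL. */
    assert(lock_owner(MTX_CONTESTED) == 0);
    /* A read through that NULL owner lands at the field offset itself. */
    assert(field_address(MTX_CONTESTED, 0x18c) == 0x18c);
}
```

For instance, the lock word 0x1002 (owner 0x1000, contested bit set) decodes to the owner 0x1000, while a word of just 0x2 decodes to a NULL owner, so any field read through it faults at the raw offset.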
--- 1*) backtrace - 20061025 --- Unread portion of the kernel message buffer: sor read, page not present instruction pointer = 0x8:0xffffffff803eea47 stack pointer = 0x10:0xffffffffa7e548b0 frame pointer = 0x10:0x4 code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = resume, IOPL = 0 current process = 8013 (tcpserver) trap number = 12 panic: page fault cpuid = 2 Uptime: 10h10m5s Dumping 1023 MB (2 chunks) chunk 0: 1MB (156 pages) ... ok chunk 1: 1023MB (261880 pages) 1008 992 976 960 944 928 912 896 880 864 848 832 816 800 784 768 752 736 720 704 688 672 656 640 624 608 592 576 560 544 528 512 496 480 464 448 432 416 400 384 368 352 336 320 304 288 272 256 240 224 208 192 176 160 144 128 112 96 80 64 48 32 16 #0 doadump () at pcpu.h:172 172 pcpu.h: No such file or directory. in pcpu.h (kgdb) list *0xffffffff803eea47 0xffffffff803eea47 is in _mtx_lock_sleep (/usr/src/sys/kern/kern_mutex.c:548). 543 * If the current owner of the lock is executing on another 544 * CPU, spin instead of blocking. 545 */ 546 owner = (struct thread *)(v & MTX_FLAGMASK); 547 #ifdef ADAPTIVE_GIANT 548 if (TD_IS_RUNNING(owner)) { 549 #else 550 if (m != &Giant && TD_IS_RUNNING(owner)) { 551 #endif 552 turnstile_release(&m->mtx_object); (kgdb) bt #0 doadump () at pcpu.h:172 #1 0x0000000000000004 in ?? 
() #2 0xffffffff803f8fd7 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409 #3 0xffffffff803f9671 in panic (fmt=0xffffff0010624720 "?\226\230\017") at /usr/src/sys/kern/kern_shutdown.c:565 #4 0xffffffff80618b3f in trap_fatal (frame=0xffffff0010624720, eva=18446742974459582128) at /usr/src/sys/amd64/amd64/trap.c:660 #5 0xffffffff80619066 in trap (frame= {tf_rdi = 123, tf_rsi = -1099236751584, tf_rdx = 6, tf_rcx = 0, tf_r8 = 0, tf_r9 = 0, tf_rax = 1, tf_rbx = -1099331437672, tf_rbp = 4, tf_r10 = -2050201464, tf_r11 = -1099236751584, tf_r12 = -1099236751584, tf_r13 = -1098723105024, tf_r14 = 0, tf_r15 = 1, tf_trapno = 12, tf_addr = 396, tf_flags = -2141616351, tf_err = 0, tf_rip = -2143360441, tf_cs = 8, tf_rflags = 65538, tf_rsp = -1478145856, tf_ss = 16}) at /usr/src/sys/amd64/amd64/trap.c:238 #6 0xffffffff8060442b in calltrap () at /usr/src/sys/amd64/amd64/exception.S:168 #7 0xffffffff803eea47 in _mtx_lock_sleep (m=0xffffff000abd7b98, tid=18446742974472800032, opts=6, file=0x0, line=0) at /usr/src/sys/kern/kern_mutex.c:546 #8 0xffffffff804bb51d in ip_ctloutput (so=0x7b, sopt=0xffffffffa7e54b30) at /usr/src/sys/netinet/ip_output.c:1193 #9 0xffffffff804ccad5 in tcp_ctloutput (so=0xffffff0033fe14d0, sopt=0xffffffffa7e54b30) at /usr/src/sys/netinet/tcp_usrreq.c:1038 #10 0xffffffff804416b8 in sosetopt (so=0xffffff0033fe14d0, sopt=0xffffffffa7e54b30) at /usr/src/sys/kern/uipc_socket.c:1563 #11 0xffffffff80447b93 in kern_setsockopt (td=0xffffff0010624720, s=586531656, level=-2050201464, name=0, val=0x0, valseg=UIO_USERSPACE, valsize=123) at /usr/src/sys/kern/uipc_syscalls.c:1351 #12 0xffffffff80447bfe in setsockopt (td=0x7b, uap=0xffffff0010624720) at /usr/src/sys/kern/uipc_syscalls.c:1307 #13 0xffffffff80619991 in syscall (frame= {tf_rdi = 0, tf_rsi = 0, tf_rdx = 1, tf_rcx = 0, tf_r8 = 0, tf_r9 = 140737488350072, tf_rax = 105, tf_rbx = 0, tf_rbp = 3, tf_r10 = -3689348814741910323, tf_r11 = 514, tf_r12 = 140737488350480, tf_r13 = 34368406752, tf_r14 = 0, tf_r15 
= 0, tf_trapno = 12, tf_addr = 5283944, tf_flags = 12, tf_err = 2, tf_rip = 34366834188, tf_cs = 43, tf_rflags = 518, tf_rsp = 140737488350184, tf_ss = 35}) at /usr/src/sys/amd64/amd64/trap.c:792 #14 0xffffffff806045c8 in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:270 #15 0x00000008006c460c in ?? () Previous frame inner to this frame (corrupt stack?) (kgdb) --- 2*) kgdb session on latest crashdump - 20061025 --- instruction pointer = 0x8:0xffffffff803eea47 stack pointer = 0x10:0xffffffffa7e548b0 frame pointer = 0x10:0x4 code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = resume, IOPL = 0 current process = 8013 (tcpserver) trap number = 12 panic: page fault cpuid = 2 Uptime: 10h10m5s Dumping 1023 MB (2 chunks) chunk 0: 1MB (156 pages) ... ok chunk 1: 1023MB (261880 pages) 1008 992 976 960 944 928 912 896 880 864 848 832 816 800 784 768 752 736 720 704 688 672 656 640 624 608 592 576 560 544 528 512 496 480 464 448 432 416 400 384 368 352 336 320 304 288 272 256 240 224 208 192 176 160 144 128 112 96 80 64 48 32 16 #0 doadump () at pcpu.h:172 172 pcpu.h: No such file or directory. in pcpu.h (kgdb) where #0 doadump () at pcpu.h:172 #1 0x0000000000000004 in ?? 
() #2 0xffffffff803f8fd7 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409 #3 0xffffffff803f9671 in panic (fmt=0xffffff0010624720 "?\226\230\017") at /usr/src/sys/kern/kern_shutdown.c:565 #4 0xffffffff80618b3f in trap_fatal (frame=0xffffff0010624720, eva=18446742974459582128) at /usr/src/sys/amd64/amd64/trap.c:660 #5 0xffffffff80619066 in trap (frame= {tf_rdi = 123, tf_rsi = -1099236751584, tf_rdx = 6, tf_rcx = 0, tf_r8 = 0, tf_r9 = 0, tf_rax = 1, tf_rbx = -1099331437672, tf_rbp = 4, tf_r10 = -2050201464, tf_r11 = -1099236751584, tf_r12 = -1099236751584, tf_r13 = -1098723105024, tf_r14 = 0, tf_r15 = 1, tf_trapno = 12, tf_addr = 396, tf_flags = -2141616351, tf_err = 0, tf_rip = -2143360441, tf_cs = 8, tf_rflags = 65538, tf_rsp = -1478145856, tf_ss = 16}) at /usr/src/sys/amd64/amd64/trap.c:238 #6 0xffffffff8060442b in calltrap () at /usr/src/sys/amd64/amd64/exception.S:168 #7 0xffffffff803eea47 in _mtx_lock_sleep (m=0xffffff000abd7b98, tid=18446742974472800032, opts=6, file=0x0, line=0) at /usr/src/sys/kern/kern_mutex.c:546 #8 0xffffffff804bb51d in ip_ctloutput (so=0x7b, sopt=0xffffffffa7e54b30) at /usr/src/sys/netinet/ip_output.c:1193 #9 0xffffffff804ccad5 in tcp_ctloutput (so=0xffffff0033fe14d0, sopt=0xffffffffa7e54b30) at /usr/src/sys/netinet/tcp_usrreq.c:1038 #10 0xffffffff804416b8 in sosetopt (so=0xffffff0033fe14d0, sopt=0xffffffffa7e54b30) at /usr/src/sys/kern/uipc_socket.c:1563 #11 0xffffffff80447b93 in kern_setsockopt (td=0xffffff0010624720, s=586531656, level=-2050201464, name=0, val=0x0, valseg=UIO_USERSPACE, valsize=123) at /usr/src/sys/kern/uipc_syscalls.c:1351 #12 0xffffffff80447bfe in setsockopt (td=0x7b, uap=0xffffff0010624720) at /usr/src/sys/kern/uipc_syscalls.c:1307 #13 0xffffffff80619991 in syscall (frame= {tf_rdi = 0, tf_rsi = 0, tf_rdx = 1, tf_rcx = 0, tf_r8 = 0, tf_r9 = 140737488350072, tf_rax = 105, tf_rbx = 0, tf_rbp = 3, tf_r10 = -3689348814741910323, tf_r11 = 514, tf_r12 = 140737488350480, tf_r13 = 34368406752, tf_r14 = 0, tf_r15 
= 0, tf_trapno = 12, tf_addr = 5283944, tf_flags = 12, tf_err = 2, tf_rip = 34366834188, tf_cs = 43, tf_rflags = 518, tf_rsp = 140737488350184, tf_ss = 35}) at /usr/src/sys/amd64/amd64/trap.c:792 #14 0xffffffff806045c8 in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:270 #15 0x00000008006c460c in ?? () Previous frame inner to this frame (corrupt stack?)
(kgdb) frame 12
#12 0xffffffff80447bfe in setsockopt (td=0x7b, uap=0xffffff0010624720) at /usr/src/sys/kern/uipc_syscalls.c:1307
1307 return (kern_setsockopt(td, uap->s, uap->level, uap->name,
(kgdb) p *sopt
No symbol "sopt" in current context.
(kgdb) p *kern_setsockopt
$1 = {int (struct thread *, int, int, int, void *, enum uio_seg, socklen_t)} 0xffffffff80447a80
(kgdb) frame 12
#12 0xffffffff80447bfe in setsockopt (td=0x7b, uap=0xffffff0010624720) at /usr/src/sys/kern/uipc_syscalls.c:1307
1307 return (kern_setsockopt(td, uap->s, uap->level, uap->name,
(kgdb) p td->td_proc->p_comm
Cannot access memory at address 0x7b
----------

Remko: Adding to PR (misfiled from 105389):

The kernel panics at random, always while running tcpserver, a component used with qmail. This is the same issue as kern/104765 and possibly "freebsd panic on HP Proliant DL360"; however, here we have i386. The mail server with this issue handles 8000 emails a day. Panics occur anywhere from every few hours to every few days. Please see 104765 for traces.
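The ip_ctloutput() patch from PR 102412 that Robert Watson posts further down in this PR appears to close a check-then-lock window: the code tests so->so_pcb == NULL and then calls INP_LOCK(inp), and with no lock held across both steps the inpcb could be detached in between. The patch holds INP_INFO_WLOCK(pcbinfo) across the check, takes INP_LOCK(inp), and only then drops the global lock. The user-space sketch below mirrors that ordering with POSIX mutexes. The struct and function names are hypothetical stand-ins for struct inpcbinfo, struct inpcb and the socket, and the detach side is an assumption about how such code has to cooperate (same two locks, same order). One difference is deliberate: here the global lock is reached through a socket field that never becomes NULL, whereas the posted patch reads inp->inp_pcbinfo at the top of the function, before the so_pcb check. Whether that ordering has anything to do with the later crashes reported at ip_output.c:1157 is not established here.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins: pcbinfo ~ struct inpcbinfo, pcb ~ struct inpcb. */
struct pcbinfo { pthread_mutex_t info_lock; };
struct pcb     { pthread_mutex_t lock; int opts; };
struct sock    { struct pcb *so_pcb; struct pcbinfo *info; };

/*
 * Set an option on the socket's pcb using the patched ordering:
 * global lock -> NULL check -> pcb lock -> drop global lock.
 * Returns 0 on success, -1 (standing in for EINVAL) if the pcb is gone.
 */
static int sock_setopt(struct sock *so, int value)
{
    struct pcb *pcb;

    assert(so != NULL && so->info != NULL);
    pthread_mutex_lock(&so->info->info_lock);      /* INP_INFO_WLOCK(pcbinfo) */
    pcb = so->so_pcb;
    if (pcb == NULL) {                              /* so->so_pcb == NULL */
        pthread_mutex_unlock(&so->info->info_lock); /* INP_INFO_WUNLOCK */
        return -1;
    }
    pthread_mutex_lock(&pcb->lock);                 /* INP_LOCK(inp) */
    pthread_mutex_unlock(&so->info->info_lock);     /* INP_INFO_WUNLOCK */
    pcb->opts = value;                              /* ip_pcbopts() analogue */
    pthread_mutex_unlock(&pcb->lock);               /* INP_UNLOCK(inp) */
    return 0;
}

/*
 * Assumed detach side: takes the same locks in the same order, so it
 * cannot clear so_pcb while a setter is between the check and the pcb
 * lock. Returns the detached pcb, which the caller may then free.
 */
static struct pcb *sock_detach(struct sock *so)
{
    struct pcb *pcb;

    assert(so != NULL && so->info != NULL);
    pthread_mutex_lock(&so->info->info_lock);
    pcb = so->so_pcb;
    if (pcb != NULL) {
        pthread_mutex_lock(&pcb->lock);   /* wait for a setter holding it */
        so->so_pcb = NULL;
        pthread_mutex_unlock(&pcb->lock);
    }
    pthread_mutex_unlock(&so->info->info_lock);
    return pcb;
}
```

Called sequentially, sock_setopt() returns 0 while the pcb is attached and -1 after sock_detach(); the lock ordering only matters when the two run concurrently in separate threads.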
Nov 9 12:50:33 whitepine kernel: Fatal trap 12: page fault while in kernel mode
Nov 9 12:50:33 whitepine kernel: fault virtual address = 0x78
Nov 9 12:50:33 whitepine kernel: fault code = supervisor read, page not present
Nov 9 12:50:33 whitepine kernel: instruction pointer = 0x20:0xc06807e1
Nov 9 12:50:33 whitepine kernel: stack pointer = 0x28:0xeaf2aab8
Nov 9 12:50:33 whitepine kernel: frame pointer = 0x28:0xeaf2aabc
Nov 9 12:50:33 whitepine kernel: code segment = base 0x0, limit 0xfffff, type 0x1b
Nov 9 12:50:33 whitepine kernel: = DPL 0, pres 1, def32 1, gran 1
Nov 9 12:50:33 whitepine kernel: processor eflags = resume, IOPL = 0
Nov 9 12:50:33 whitepine kernel: current process = 15690 (tcpserver)
Nov 9 12:50:33 whitepine kernel: trap number = 12
Nov 9 12:50:33 whitepine kernel: panic: page fault
Nov 9 12:50:33 whitepine kernel: Uptime: 51m25s
Nov 9 12:50:33 whitepine kernel: Cannot dump. No dump device defined.

From: Kai Gallasch
To: bug-followup@FreeBSD.org, gallasch@free.de
Cc:
Subject: Re: kern/104765: [hp] kernel panic 6.2 prerelease-20061017 amd64
Date: Mon, 13 Nov 2006 12:49:12 +0100

The server has now been running stably for about 10 days (no crash) with FreeBSD 6.2-BETA3 (CVS checkout 2006/11/01) and the GENERIC SMP kernel.

debug.mpsafenet=0

We set debug.mpsafenet=0 in /boot/loader.conf, which seems to help here.

The subject of PR 104765 has been changed from "kern/104765: kernel panic 6.2 prerelease-20061017 amd64" to "kern/104765: [hp] kernel panic 6.2 prerelease-20061017 amd64". Does this mean the bug is related to HP hardware? Is there no feedback in the PR yet?

Responsible-Changed-From-To: freebsd-bugs->rwatson
Responsible-Changed-By: rwatson
Responsible-Changed-When: Tue Nov 14 10:05:50 UTC 2006
Responsible-Changed-Why:
Claim ownership, since I've been looking at issues similar or identical to this. Some questions:

(1) Could you let me know what versions of ip_output.c and tcp_usrreq.c you're running with?

(2) Could you try the most recent patch attached to PR 102412?
This is a patch to ip_ctloutput(). I've attached it below, but the chances are good that GNATS will mangle the patch.

Index: ip_output.c
===================================================================
RCS file: /home/ncvs/src/sys/netinet/ip_output.c,v
retrieving revision 1.242.2.16
diff -u -r1.242.2.16 ip_output.c
--- ip_output.c	24 Oct 2006 13:23:03 -0000	1.242.2.16
+++ ip_output.c	26 Oct 2006 18:20:55 -0000
@@ -1155,6 +1155,7 @@
 	struct sockopt *sopt;
 {
 	struct inpcb *inp = sotoinpcb(so);
+	struct inpcbinfo *pcbinfo = inp->inp_pcbinfo;
 	int error, optval;
 
 	error = optval = 0;
@@ -1190,12 +1191,15 @@
 				m_free(m);
 				break;
 			}
+			INP_INFO_WLOCK(pcbinfo);
 			if (so->so_pcb == NULL) {
+				INP_INFO_WUNLOCK(pcbinfo);
 				m_free(m);
 				error = EINVAL;
 				break;
 			}
 			INP_LOCK(inp);
+			INP_INFO_WUNLOCK(pcbinfo);
 			error = ip_pcbopts(inp, sopt->sopt_name, m);
 			INP_UNLOCK(inp);
 			return (error);

http://www.freebsd.org/cgi/query-pr.cgi?pr=104765

From: Robert Watson
To: Kai Gallasch
Cc: bug-followup@FreeBSD.org
Subject: Re: kern/104765: kernel panic 6.2 prerelease-20061017 amd64
Date: Tue, 14 Nov 2006 15:49:07 +0000 (GMT)

On Tue, 14 Nov 2006, Kai Gallasch wrote:

> Robert Watson wrote:
>> Synopsis: kernel panic 6.2 prerelease-20061017 amd64
>>
>> Responsible-Changed-From-To: freebsd-bugs->rwatson
>> Responsible-Changed-By: rwatson
>> Responsible-Changed-When: Tue Nov 14 10:05:50 UTC 2006
>> Responsible-Changed-Why:
>> Claim ownership, since I've been looking at issues similar or identical
>> to this. Some questions:
>>
>> (1) Could you let me know what versions of ip_output.c and tcp_usrreq.c
>> you're running with?
>
> /usr/src/sys/netinet/ip_output.c
>
> * @(#)ip_output.c 8.3 (Berkeley) 1/21/94
> * $FreeBSD: src/sys/netinet/ip_output.c,v 1.242.2.16 2006/10/24 13:23:03
> rwatson Exp $

Sounds good. I particularly wanted to make sure you had the most recent revision of this file.
> /usr/src/sys/netinet/tcp_usrreq.c
>
> * From: @(#)tcp_usrreq.c 8.2 (Berkeley) 1/3/94
> * $FreeBSD: src/sys/netinet/tcp_usrreq.c,v 1.124.2.3 2006/09/27 09:24:44
> mux Exp $
>
>> (2) Could you try the most recent patch attached to PR 102412? This is
>> a patch to ip_ctloutput(). I've attached it below, but the chances
>> are good that GNATS will mangle the patch.
>
> ok, I will apply the patch and rebuild.
>
> # cat /boot/loader.conf
> debug.mpsafenet=0
>
> If I recompile and reboot - Should I set debug.mpsafenet=1 ?(which is its
> default value) Since I set this value to 0 the server didn't crash and
> reached 10 days uptime.

Yes, please. The race in question does exist with debug.mpsafenet=0, but it would only occur during very heavy paging, in which case Giant gets dropped during copyin/copyout. Otherwise, it doesn't.

Thanks,

Robert N M Watson
Computer Laboratory
University of Cambridge

From: Kai Gallasch
To: Robert Watson
Cc: bug-followup@FreeBSD.org
Subject: Re: kern/104765: kernel panic 6.2 prerelease-20061017 amd64
Date: Thu, 23 Nov 2006 00:34:19 +0100

Robert Watson wrote:
>
> On Tue, 14 Nov 2006, Kai Gallasch wrote:
>
>> Robert Watson wrote:
>>> Synopsis: kernel panic 6.2 prerelease-20061017 amd64
>>>
>>> Responsible-Changed-From-To: freebsd-bugs->rwatson
>>> Responsible-Changed-By: rwatson
>>> Responsible-Changed-When: Tue Nov 14 10:05:50 UTC 2006
>>> Responsible-Changed-Why:
>>> Claim ownership, since I've been looking at issues similar or identical
>>> to this. Some questions:
>>>
>>> (1) Could you let me know what versions of ip_output.c and tcp_usrreq.c
>>> you're running with?
>>
>> /usr/src/sys/netinet/ip_output.c
>>
>> * @(#)ip_output.c 8.3 (Berkeley) 1/21/94
>> * $FreeBSD: src/sys/netinet/ip_output.c,v 1.242.2.16 2006/10/24 13:23:03
>> rwatson Exp $
>
> Sounds good. I particularly wanted to make sure you had the most recent
> revision of this file.
>
>> /usr/src/sys/netinet/tcp_usrreq.c
>>
>> * From: @(#)tcp_usrreq.c 8.2 (Berkeley) 1/3/94
>> * $FreeBSD: src/sys/netinet/tcp_usrreq.c,v 1.124.2.3 2006/09/27 09:24:44
>> mux Exp $
>>
>>> (2) Could you try the most recent patch attached to PR 102412? This is
>>> a patch to ip_ctloutput(). I've attached it below, but the chances
>>> are good that GNATS will mangle the patch.
>>
>> ok, I will apply the patch and rebuild.

Another crash, following the two previous crashes since I applied your patch. Here is the kgdb output. To keep bug-followup@freebsd.org up to date for kern/104765, I am also attaching the output of the previous two crash dumps.

-K.

--- kgdb output, kernel panic 20061123 - kern/104765 ---
panic: page fault cpuid = 3 Uptime: 21h6m5s Dumping 1023 MB (2 chunks) chunk 0: 1MB (156 pages) ... ok chunk 1: 1023MB (261880 pages) 1008 992 976 960 944 928 912 896 880 864 848 832 816 800 784 768 752 736 720 704 688 672 656 640 624 608 592 576 560 544 528 512 496 480 464 448 432 416 400 384 368 352 336 320 304 288 272 256 240 224 208 192 176 160 144 128 112 96 80 64 48 32 16 #0 doadump () at pcpu.h:172 172 pcpu.h: No such file or directory. in pcpu.h (kgdb) bt #0 doadump () at pcpu.h:172 #1 0x0000000000000004 in ??
() #2 0xffffffff803f9557 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409 #3 0xffffffff803f9bf1 in panic (fmt=0xffffff00011924c0 "?6\236 ") at /usr/src/sys/kern/kern_shutdown.c:565 #4 0xffffffff8061935f in trap_fatal (frame=0xffffff00011924c0, eva=18446742974745163440) at /usr/src/sys/amd64/amd64/trap.c:660 #5 0xffffffff8061967f in trap_pfault (frame=0xffffffffa7ff6820, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:573 #6 0xffffffff80619933 in trap (frame= {tf_rdi = -1099441690208, tf_rsi = -1476433104, tf_rdx = -2138554176, tf_rcx = -2138553872, tf_r8 = -1099493202752, tf_r9 = -1098475947368, tf_rax = 22, tf_rbx = -1476433104, tf_rbp = -1098734529648, tf_r10 = -2138030408, tf_r11 = 0, tf_r12 = -1099441690208, tf_r13 = 0, tf_r14 = 0, tf_r15 = 1, tf_trapno = 12, tf_addr = 88, tf_flags = -2143224709, tf_err = 0, tf_rip = -2142524282, tf_cs = 8, tf_rflags = 66118, tf_rsp = -1476433696, tf_ss = 16}) at /usr/src/sys/amd64/amd64/trap.c:352 #7 0xffffffff80604b2b in calltrap () at /usr/src/sys/amd64/amd64/exception.S:168 #8 0xffffffff804bac86 in ip_ctloutput (so=0xffffff00042b29a0, sopt=0xffffffffa7ff6b30) at /usr/src/sys/netinet/ip_output.c:1157 #9 0xffffffff804cd1c5 in tcp_ctloutput (so=0xffffff00042b29a0, sopt=0xffffffffa7ff6b30) at /usr/src/sys/netinet/tcp_usrreq.c:1038 #10 0xffffffff80441c38 in sosetopt (so=0xffffff00042b29a0, sopt=0xffffffffa7ff6b30) at /usr/src/sys/kern/uipc_socket.c:1563 #11 0xffffffff80448113 in kern_setsockopt (td=0xffffff00011924c0, s=774342528, level=-2138030408, name=-2138553872, val=0xffffff00011924c0, valseg=1035680408, valsize=69937568) at /usr/src/sys/kern/uipc_syscalls.c:1351 #12 0xffffffff8044817e in setsockopt (td=0xffffff00042b29a0, uap=0xffffffffa7ff6b30) at /usr/src/sys/kern/uipc_syscalls.c:1307 #13 0xffffffff8061a1b1 in syscall (frame= {tf_rdi = 0, tf_rsi = 0, tf_rdx = 1, tf_rcx = 0, tf_r8 = 0, tf_r9 = 140737488350072, tf_rax = 105, tf_rbx = 0, tf_rbp = 3, tf_r10 = -3689348814741910323, tf_r11 = 514, tf_r12 = 
140737488350480, tf_r13 = 34368406752, tf_r14 = 0, tf_r15 = 0, tf_trapno = 12, tf_addr = 5283944, tf_flags = 12, tf_err = 2, tf_rip = 34366834188, tf_cs = 43, tf_rflags = 518, tf_rsp = 140737488350184, tf_ss = 35}) at /usr/src/sys/amd64/amd64/trap.c:792 #14 0xffffffff80604cc8 in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:270 #15 0x00000008006c460c in ?? () Previous frame inner to this frame (corrupt stack?)

--- kgdb output, kernel panic 20061120 - kern/104765 ---
#0 doadump () at pcpu.h:172 172 pcpu.h: No such file or directory. in pcpu.h (kgdb) bt #0 doadump () at pcpu.h:172 #1 0x0000000000000004 in ?? () #2 0xffffffff803f9557 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409 #3 0xffffffff803f9bf1 in panic (fmt=0xffffff000f456980 "\b??\"") at /usr/src/sys/kern/kern_shutdown.c:565 #4 0xffffffff8061935f in trap_fatal (frame=0xffffff000f456980, eva=18446742974782290440) at /usr/src/sys/amd64/amd64/trap.c:660 #5 0xffffffff8061967f in trap_pfault (frame=0xffffffffa7d4e820, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:573 #6 0xffffffff80619933 in trap (frame= {tf_rdi = -1098782579504, tf_rsi = -1479218384, tf_rdx = -2138554176, tf_rcx = -2138553872, tf_r8 = -1099255420544, tf_r9 = -1098475947368, tf_rax = 22, tf_rbx = -1479218384, tf_rbp = -1099106545056, tf_r10 = -2138030408, tf_r11 = 0, tf_r12 = -1098782579504, tf_r13 = 0, tf_r14 = 0, tf_r15 = 1, tf_trapno = 12, tf_addr = 88, tf_flags = -2143224709, tf_err = 0, tf_rip = -2142524282, tf_cs = 8, tf_rflags = 66118, tf_rsp = -1479218976, tf_ss = 16}) at /usr/src/sys/amd64/amd64/trap.c:352 #7 0xffffffff80604b2b in calltrap () at /usr/src/sys/amd64/amd64/exception.S:168 #8 0xffffffff804bac86 in ip_ctloutput (so=0xffffff002b7464d0, sopt=0xffffffffa7d4eb30) at /usr/src/sys/netinet/ip_output.c:1157 #9 0xffffffff804cd1c5 in tcp_ctloutput (so=0xffffff002b7464d0, sopt=0xffffffffa7d4eb30) at /usr/src/sys/netinet/tcp_usrreq.c:1038 #10 0xffffffff80441c38 in sosetopt (so=0xffffff002b7464d0,
sopt=0xffffffffa7d4eb30) at /usr/src/sys/kern/uipc_socket.c:1563 #11 0xffffffff80448113 in kern_setsockopt (td=0xffffff000f456980, s=232190912, level=-2138030408, name=-2138553872, val=0xffffff000f456980, valseg=1035680408, valsize=729048272) at /usr/src/sys/kern/uipc_syscalls.c:1351 #12 0xffffffff8044817e in setsockopt (td=0xffffff002b7464d0, uap=0xffffffffa7d4eb30) at /usr/src/sys/kern/uipc_syscalls.c:1307 #13 0xffffffff8061a1b1 in syscall (frame= {tf_rdi = 0, tf_rsi = 0, tf_rdx = 1, tf_rcx = 0, tf_r8 = 0, tf_r9 = 140737488350072, tf_rax = 105, tf_rbx = 0, tf_rbp = 3, tf_r10 = -2138030408, tf_r11 = 518, tf_r12 = 140737488350480, tf_r13 = 34368406752, tf_r14 = 0, tf_r15 = 0, tf_trapno = 12, tf_addr = 34366834176, tf_flags = 12, tf_err = 2, tf_rip = 34366834188, tf_cs = 43, tf_rflags = 518, tf_rsp = 140737488350184, tf_ss = 35}) at /usr/src/sys/amd64/amd64/trap.c:792 #14 0xffffffff80604cc8 in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:270 #15 0x00000008006c460c in ?? () Previous frame inner to this frame (corrupt stack?) --- kgdb output, kernel panic 20071116 - kern/104765 --- (kgdb) bt #0 doadump () at pcpu.h:172 #1 0x0000000000000004 in ?? 
() #2 0xffffffff803f9557 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409 #3 0xffffffff803f9bf1 in panic (fmt=0xffffff0022251260 "") at /usr/src/sys/kern/kern_shutdown.c:565 #4 0xffffffff8061935f in trap_fatal (frame=0xffffff0022251260, eva=18446742975006236672) at /usr/src/sys/amd64/amd64/trap.c:660 #5 0xffffffff8061967f in trap_pfault (frame=0xffffffffa7c8b820, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:573 #6 0xffffffff80619933 in trap (frame= {tf_rdi = -1099263329888, tf_rsi = -1480017104, tf_rdx = -2138554176, tf_rcx = -2138553872, tf_r8 = -1098938772896, tf_r9 = -1098475947368, tf_rax = 22, tf_rbx = -1480017104, tf_rbp = -1099024018240, tf_r10 = -2138030408, tf_r11 = 0, tf_r12 = -1099263329888, tf_r13 = 0, tf_r14 = 0, tf_r15 = 1, tf_trapno = 12, tf_addr = 88, tf_flags = -2143224709, tf_err = 0, tf_rip = -2142524282, tf_cs = 8, tf_rflags = 66118, tf_rsp = -1480017696, tf_ss = 16}) at /usr/src/sys/amd64/amd64/trap.c:352 #7 0xffffffff80604b2b in calltrap () at /usr/src/sys/amd64/amd64/exception.S:168 #8 0xffffffff804bac86 in ip_ctloutput (so=0xffffff000eccb9a0, sopt=0xffffffffa7c8bb30) at /usr/src/sys/netinet/ip_output.c:1157 #9 0xffffffff804cd1c5 in tcp_ctloutput (so=0xffffff000eccb9a0, sopt=0xffffffffa7c8bb30) at /usr/src/sys/netinet/tcp_usrreq.c:1038 #10 0xffffffff80441c38 in sosetopt (so=0xffffff000eccb9a0, sopt=0xffffffffa7c8bb30) at /usr/src/sys/kern/uipc_socket.c:1563 #11 0xffffffff80448113 in kern_setsockopt (td=0xffffff0022251260, s=421725600, level=-2138030408, name=-2138553872, val=0xffffff0022251260, valseg=1035680408, valsize=248297888) at /usr/src/sys/kern/uipc_syscalls.c:1351 #12 0xffffffff8044817e in setsockopt (td=0xffffff000eccb9a0, uap=0xffffffffa7c8bb30) at /usr/src/sys/kern/uipc_syscalls.c:1307 #13 0xffffffff8061a1b1 in syscall (frame= {tf_rdi = 0, tf_rsi = 0, tf_rdx = 1, tf_rcx = 0, tf_r8 = 0, tf_r9 = 140737488350072, tf_rax = 105, tf_rbx = 0, tf_rbp = 3, tf_r10 = -3689348814741910323, tf_r11 = 514, tf_r12 = 140737488350480, 
tf_r13 = 34368406752, tf_r14 = 0, tf_r15 = 0, tf_trapno = 12, tf_addr = 5283944, tf_flags = 12, tf_err = 2, tf_rip = 34366834188, tf_cs = 43, tf_rflags = 518, tf_rsp = 140737488350184, tf_ss = 35}) at /usr/src/sys/amd64/amd64/trap.c:792 #14 0xffffffff80604cc8 in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:270 #15 0x00000008006c460c in ?? () Previous frame inner to this frame (corrupt stack?) From: Robert Watson To: FreeBSD-gnats-submit@FreeBSD.org Cc: Subject: Re: kern/104765: kernel panic 6.2 prerelease-20061017 amd64 (fwd) Date: Fri, 24 Nov 2006 14:05:14 +0000 (GMT) Append follow-up to PR. Robert N M Watson Computer Laboratory University of Cambridge ---------- Forwarded message ---------- Date: Thu, 16 Nov 2006 11:47:29 +0100 From: Kai Gallasch To: Robert Watson Subject: Re: kern/104765: kernel panic 6.2 prerelease-20061017 amd64 Robert Watson wrote: > > On Tue, 14 Nov 2006, Kai Gallasch wrote: > >> Robert Watson wrote: >>> Synopsis: kernel panic 6.2 prerelease-20061017 amd64 >>> >>> Responsible-Changed-From-To: freebsd-bugs->rwatson >>> Responsible-Changed-By: rwatson >>> Responsible-Changed-When: Tue Nov 14 10:05:50 UTC 2006 >>> Responsible-Changed-Why: >>> Claim ownership, since I've been looking at issues similar or identical >>> to this. Some questions: >>> >>> (1) Could you let me know what versions of ip_output.c and tcp_usrreq.c >>> you're running with? >> >> /usr/src/sys/netinet/ip_output.c >> >> * @(#)ip_output.c 8.3 (Berkeley) 1/21/94 >> * $FreeBSD: src/sys/netinet/ip_output.c,v 1.242.2.16 2006/10/24 13:23:03 >> rwatson Exp $ > > Sounds good. I particularly wanted to make sure you had the most recent > revision of this file. > >> /usr/src/sys/netinet/tcp_usrreq.c >> >> * From: @(#)tcp_usrreq.c 8.2 (Berkeley) 1/3/94 >> * $FreeBSD: src/sys/netinet/tcp_usrreq.c,v 1.124.2.3 2006/09/27 09:24:44 >> mux Exp $ >> >>> (2) Could you try the most recent patch attached to PR 102412? This is >>> a patch to ip_ctloutput(). 
>>> I've attached it below, but the chances
>>> are good that GNATS will mangle the patch.
>>
>> ok, I will apply the patch and rebuild.
>>
>> # cat /boot/loader.conf
>> debug.mpsafenet=0
>>
>> If I recompile and reboot - Should I set debug.mpsafenet=1 ?(which is
>> its default value) Since I set this value to 0 the server didn't crash
>> and reached 10 days uptime.
>
> Yes, please. The race in question does exist with debug.mpsafenet=0,
> but it would only occur during very heavy paging, in which case Giant
> gets dropped during copyin/copyout. Otherwise, it doesn't.

Hi Robert.

After 1d 7h the server crashed again. Here is the backtrace.

# kgdb /usr/obj/usr/src/sys/SMP/kernel.debug /var/crash/vmcore.4
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd".

Unread portion of the kernel message buffer:
panic: page fault
cpuid = 3
Uptime: 1d7h39m57s
Dumping 1023 MB (2 chunks)
  chunk 0: 1MB (156 pages) ... ok
  chunk 1: 1023MB (261880 pages) 1008 992 976 960 944 928 912 896 880 864 848 832 816 800 784 768 752 736 720 704 688 672 656 640 624 608 592 576 560 544 528 512 496 480 464 448 432 416 400 384 368 352 336 320 304 288 272 256 240 224 208 192 176 160 144 128 112 96 80 64 48 32 16

#0 doadump () at pcpu.h:172
172 __asm __volatile("movq %%gs:0,%0" : "=r" (td));
(kgdb) bt
#0 doadump () at pcpu.h:172
#1 0x0000000000000004 in ?? ()
#2 0xffffffff803f9557 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
#3 0xffffffff803f9bf1 in panic (fmt=0xffffff0022251260 "") at /usr/src/sys/kern/kern_shutdown.c:565
#4 0xffffffff8061935f in trap_fatal (frame=0xffffff0022251260, eva=18446742975006236672) at /usr/src/sys/amd64/amd64/trap.c:660
#5 0xffffffff8061967f in trap_pfault (frame=0xffffffffa7c8b820, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:573
#6 0xffffffff80619933 in trap (frame=
    {tf_rdi = -1099263329888, tf_rsi = -1480017104, tf_rdx = -2138554176, tf_rcx = -2138553872, tf_r8 = -1098938772896, tf_r9 = -1098475947368, tf_rax = 22, tf_rbx = -1480017104, tf_rbp = -1099024018240, tf_r10 = -2138030408, tf_r11 = 0, tf_r12 = -1099263329888, tf_r13 = 0, tf_r14 = 0, tf_r15 = 1, tf_trapno = 12, tf_addr = 88, tf_flags = -2143224709, tf_err = 0, tf_rip = -2142524282, tf_cs = 8, tf_rflags = 66118, tf_rsp = -1480017696, tf_ss = 16}) at /usr/src/sys/amd64/amd64/trap.c:352
#7 0xffffffff80604b2b in calltrap () at /usr/src/sys/amd64/amd64/exception.S:168
#8 0xffffffff804bac86 in ip_ctloutput (so=0xffffff000eccb9a0, sopt=0xffffffffa7c8bb30) at /usr/src/sys/netinet/ip_output.c:1157
#9 0xffffffff804cd1c5 in tcp_ctloutput (so=0xffffff000eccb9a0, sopt=0xffffffffa7c8bb30) at /usr/src/sys/netinet/tcp_usrreq.c:1038
#10 0xffffffff80441c38 in sosetopt (so=0xffffff000eccb9a0, sopt=0xffffffffa7c8bb30) at /usr/src/sys/kern/uipc_socket.c:1563
#11 0xffffffff80448113 in kern_setsockopt (td=0xffffff0022251260, s=421725600, level=-2138030408, name=-2138553872, val=0xffffff0022251260, valseg=1035680408, valsize=248297888) at /usr/src/sys/kern/uipc_syscalls.c:1351
#12 0xffffffff8044817e in setsockopt (td=0xffffff000eccb9a0, uap=0xffffffffa7c8bb30) at /usr/src/sys/kern/uipc_syscalls.c:1307
#13 0xffffffff8061a1b1 in syscall (frame=
    {tf_rdi = 0, tf_rsi = 0, tf_rdx = 1, tf_rcx = 0, tf_r8 = 0, tf_r9 = 140737488350072, tf_rax = 105, tf_rbx = 0, tf_rbp = 3, tf_r10 = -3689348814741910323, tf_r11 = 514, tf_r12 = 140737488350480, tf_r13 = 34368406752, tf_r14 = 0, tf_r15 = 0, tf_trapno = 12, tf_addr = 5283944, tf_flags = 12, tf_err = 2, tf_rip = 34366834188, tf_cs = 43, tf_rflags = 518, tf_rsp = 140737488350184, tf_ss = 35}) at /usr/src/sys/amd64/amd64/trap.c:792
#14 0xffffffff80604cc8 in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:270
#15 0x00000008006c460c in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb)

BTW, make.conf:

# cat /etc/make.conf
CFLAGS= -O -pipe
CPUTYPE= opteron
NO_PROFILE= true

From: Robert Watson
To: FreeBSD-gnats-submit@FreeBSD.org
Cc:
Subject: Re: kern/104765: kernel panic 6.2 prerelease-20061017 amd64 (fwd)
Date: Fri, 24 Nov 2006 14:05:43 +0000 (GMT)

Append followup to the PR.

Robert N M Watson
Computer Laboratory
University of Cambridge

---------- Forwarded message ----------
Date: Mon, 20 Nov 2006 17:06:23 +0100
From: Kai Gallasch
To: Robert Watson
Subject: Re: kern/104765: kernel panic 6.2 prerelease-20061017 amd64

Robert Watson wrote:
>
> On Tue, 14 Nov 2006, Kai Gallasch wrote:
>
>> Robert Watson wrote:
>>> Synopsis: kernel panic 6.2 prerelease-20061017 amd64
>>>
>>> Responsible-Changed-From-To: freebsd-bugs->rwatson
>>> Responsible-Changed-By: rwatson
>>> Responsible-Changed-When: Tue Nov 14 10:05:50 UTC 2006
>>> Responsible-Changed-Why:
>>> Claim ownership, since I've been looking at issues similar or identical
>>> to this. Some questions:
>>>
>>> (1) Could you let me know what versions of ip_output.c and tcp_usrreq.c
>>> you're running with?
>>
>> /usr/src/sys/netinet/ip_output.c
>>
>> * @(#)ip_output.c 8.3 (Berkeley) 1/21/94
>> * $FreeBSD: src/sys/netinet/ip_output.c,v 1.242.2.16 2006/10/24 13:23:03
>> rwatson Exp $
>
> Sounds good. I particularly wanted to make sure you had the most recent
> revision of this file.
>
>> /usr/src/sys/netinet/tcp_usrreq.c
>>
>> * From: @(#)tcp_usrreq.c 8.2 (Berkeley) 1/3/94
>> * $FreeBSD: src/sys/netinet/tcp_usrreq.c,v 1.124.2.3 2006/09/27 09:24:44
>> mux Exp $
>>
>>> (2) Could you try the most recent patch attached to PR 102412? This is
>>> a patch to ip_ctloutput(). I've attached it below, but the chances
>>> are good that GNATS will mangle the patch.
>>
>> ok, I will apply the patch and rebuild.
>>
>> # cat /boot/loader.conf
>> debug.mpsafenet=0
>>
>> If I recompile and reboot - Should I set debug.mpsafenet=1 ?(which is
>> its default value) Since I set this value to 0 the server didn't crash
>> and reached 10 days uptime.
>
> Yes, please. The race in question does exist with debug.mpsafenet=0,
> but it would only occur during very heavy paging, in which case Giant
> gets dropped during copyin/copyout. Otherwise, it doesn't.

Hi.

Again a kernel panic. This is the second one with your patch applied.
Attached is the backtrace of the crash.

Must the debug kernel "kernel.debug" be installed as running kernel on the
server, or is it sufficient for debugging purposes and kgdb usage to have it
available in /usr/obj/usr/src/sys/SMP/kernel.debug?

Cheers,
Kai.

-- backtrace crash 20061120 --

# kgdb /usr/obj/usr/src/sys/SMP/kernel.debug /var/crash/vmcore.5
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd".

Unread portion of the kernel message buffer:
panic: page fault
cpuid = 3
Uptime: 4d8h19m7s
Dumping 1023 MB (2 chunks)
  chunk 0: 1MB (156 pages) ... ok
  chunk 1: 1023MB (261880 pages) 1008 992 976 960 944 928 912 896 880 864 848 832 816 800 784 768 752 736 720 704 688 672 656 640 624 608 592 576 560 544 528 512 496 480 464 448 432 416 400 384 368 352 336 320 304 288 272 256 240 224 208 192 176 160 144 128 112 96 80 64 48 32 16

#0 doadump () at pcpu.h:172
172 pcpu.h: No such file or directory.
        in pcpu.h
(kgdb) bt
#0 doadump () at pcpu.h:172
#1 0x0000000000000004 in ?? ()
#2 0xffffffff803f9557 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
#3 0xffffffff803f9bf1 in panic (fmt=0xffffff000f456980 "\b??\"") at /usr/src/sys/kern/kern_shutdown.c:565
#4 0xffffffff8061935f in trap_fatal (frame=0xffffff000f456980, eva=18446742974782290440) at /usr/src/sys/amd64/amd64/trap.c:660
#5 0xffffffff8061967f in trap_pfault (frame=0xffffffffa7d4e820, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:573
#6 0xffffffff80619933 in trap (frame=
    {tf_rdi = -1098782579504, tf_rsi = -1479218384, tf_rdx = -2138554176, tf_rcx = -2138553872, tf_r8 = -1099255420544, tf_r9 = -1098475947368, tf_rax = 22, tf_rbx = -1479218384, tf_rbp = -1099106545056, tf_r10 = -2138030408, tf_r11 = 0, tf_r12 = -1098782579504, tf_r13 = 0, tf_r14 = 0, tf_r15 = 1, tf_trapno = 12, tf_addr = 88, tf_flags = -2143224709, tf_err = 0, tf_rip = -2142524282, tf_cs = 8, tf_rflags = 66118, tf_rsp = -1479218976, tf_ss = 16}) at /usr/src/sys/amd64/amd64/trap.c:352
#7 0xffffffff80604b2b in calltrap () at /usr/src/sys/amd64/amd64/exception.S:168
#8 0xffffffff804bac86 in ip_ctloutput (so=0xffffff002b7464d0, sopt=0xffffffffa7d4eb30) at /usr/src/sys/netinet/ip_output.c:1157
#9 0xffffffff804cd1c5 in tcp_ctloutput (so=0xffffff002b7464d0, sopt=0xffffffffa7d4eb30) at /usr/src/sys/netinet/tcp_usrreq.c:1038
#10 0xffffffff80441c38 in sosetopt (so=0xffffff002b7464d0, sopt=0xffffffffa7d4eb30) at /usr/src/sys/kern/uipc_socket.c:1563
#11 0xffffffff80448113 in kern_setsockopt (td=0xffffff000f456980, s=232190912, level=-2138030408, name=-2138553872, val=0xffffff000f456980, valseg=1035680408, valsize=729048272) at /usr/src/sys/kern/uipc_syscalls.c:1351
#12 0xffffffff8044817e in setsockopt (td=0xffffff002b7464d0, uap=0xffffffffa7d4eb30) at /usr/src/sys/kern/uipc_syscalls.c:1307
#13 0xffffffff8061a1b1 in syscall (frame=
    {tf_rdi = 0, tf_rsi = 0, tf_rdx = 1, tf_rcx = 0, tf_r8 = 0, tf_r9 = 140737488350072, tf_rax = 105, tf_rbx = 0, tf_rbp = 3, tf_r10 = -2138030408, tf_r11 = 518, tf_r12 = 140737488350480, tf_r13 = 34368406752, tf_r14 = 0, tf_r15 = 0, tf_trapno = 12, tf_addr = 34366834176, tf_flags = 12, tf_err = 2, tf_rip = 34366834188, tf_cs = 43, tf_rflags = 518, tf_rsp = 140737488350184, tf_ss = 35}) at /usr/src/sys/amd64/amd64/trap.c:792
#14 0xffffffff80604cc8 in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:270
#15 0x00000008006c460c in ?? ()
Previous frame inner to this frame (corrupt stack?)

-- backtrace crash 20061120 --

From: Robert Watson
To: FreeBSD-gnats-submit@FreeBSD.org
Cc:
Subject: Re: kern/104765: kernel panic 6.2 prerelease-20061017 amd64 (fwd)
Date: Fri, 24 Nov 2006 14:06:13 +0000 (GMT)

Append followup to PR.

Robert N M Watson
Computer Laboratory
University of Cambridge

---------- Forwarded message ----------
Date: Thu, 23 Nov 2006 00:34:19 +0100
From: Kai Gallasch
To: Robert Watson
Cc: bug-followup@FreeBSD.org
Subject: Re: kern/104765: kernel panic 6.2 prerelease-20061017 amd64

Robert Watson wrote:
>
> On Tue, 14 Nov 2006, Kai Gallasch wrote:
>
>> Robert Watson wrote:
>>> Synopsis: kernel panic 6.2 prerelease-20061017 amd64
>>>
>>> Responsible-Changed-From-To: freebsd-bugs->rwatson
>>> Responsible-Changed-By: rwatson
>>> Responsible-Changed-When: Tue Nov 14 10:05:50 UTC 2006
>>> Responsible-Changed-Why:
>>> Claim ownership, since I've been looking at issues similar or identical
>>> to this. Some questions:
>>>
>>> (1) Could you let me know what versions of ip_output.c and tcp_usrreq.c
>>> you're running with?
>>
>> /usr/src/sys/netinet/ip_output.c
>>
>> * @(#)ip_output.c 8.3 (Berkeley) 1/21/94
>> * $FreeBSD: src/sys/netinet/ip_output.c,v 1.242.2.16 2006/10/24 13:23:03
>> rwatson Exp $
>
> Sounds good. I particularly wanted to make sure you had the most recent
> revision of this file.
>
>> /usr/src/sys/netinet/tcp_usrreq.c
>>
>> * From: @(#)tcp_usrreq.c 8.2 (Berkeley) 1/3/94
>> * $FreeBSD: src/sys/netinet/tcp_usrreq.c,v 1.124.2.3 2006/09/27 09:24:44
>> mux Exp $
>>
>>> (2) Could you try the most recent patch attached to PR 102412? This is
>>> a patch to ip_ctloutput(). I've attached it below, but the chances
>>> are good that GNATS will mangle the patch.
>>
>> ok, I will apply the patch and rebuild.

Another crash. (following the previous two crashes after applying your patch)
Here is the output of kgdb.

To keep bug-followup@freebsd.org for kern/104765 up to date I am attaching
output of the previous two crashdumps also.

-K.

--- kgdb output, kernel panic 20061123 - kern/104765 ---
panic: page fault
cpuid = 3
Uptime: 21h6m5s
Dumping 1023 MB (2 chunks)
  chunk 0: 1MB (156 pages) ... ok
  chunk 1: 1023MB (261880 pages) 1008 992 976 960 944 928 912 896 880 864 848 832 816 800 784 768 752 736 720 704 688 672 656 640 624 608 592 576 560 544 528 512 496 480 464 448 432 416 400 384 368 352 336 320 304 288 272 256 240 224 208 192 176 160 144 128 112 96 80 64 48 32 16

#0 doadump () at pcpu.h:172
172 pcpu.h: No such file or directory.
        in pcpu.h
(kgdb) bt
#0 doadump () at pcpu.h:172
#1 0x0000000000000004 in ?? ()
#2 0xffffffff803f9557 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
#3 0xffffffff803f9bf1 in panic (fmt=0xffffff00011924c0 "?6\236 ") at /usr/src/sys/kern/kern_shutdown.c:565
#4 0xffffffff8061935f in trap_fatal (frame=0xffffff00011924c0, eva=18446742974745163440) at /usr/src/sys/amd64/amd64/trap.c:660
#5 0xffffffff8061967f in trap_pfault (frame=0xffffffffa7ff6820, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:573
#6 0xffffffff80619933 in trap (frame=
    {tf_rdi = -1099441690208, tf_rsi = -1476433104, tf_rdx = -2138554176, tf_rcx = -2138553872, tf_r8 = -1099493202752, tf_r9 = -1098475947368, tf_rax = 22, tf_rbx = -1476433104, tf_rbp = -1098734529648, tf_r10 = -2138030408, tf_r11 = 0, tf_r12 = -1099441690208, tf_r13 = 0, tf_r14 = 0, tf_r15 = 1, tf_trapno = 12, tf_addr = 88, tf_flags = -2143224709, tf_err = 0, tf_rip = -2142524282, tf_cs = 8, tf_rflags = 66118, tf_rsp = -1476433696, tf_ss = 16}) at /usr/src/sys/amd64/amd64/trap.c:352
#7 0xffffffff80604b2b in calltrap () at /usr/src/sys/amd64/amd64/exception.S:168
#8 0xffffffff804bac86 in ip_ctloutput (so=0xffffff00042b29a0, sopt=0xffffffffa7ff6b30) at /usr/src/sys/netinet/ip_output.c:1157
#9 0xffffffff804cd1c5 in tcp_ctloutput (so=0xffffff00042b29a0, sopt=0xffffffffa7ff6b30) at /usr/src/sys/netinet/tcp_usrreq.c:1038
#10 0xffffffff80441c38 in sosetopt (so=0xffffff00042b29a0, sopt=0xffffffffa7ff6b30) at /usr/src/sys/kern/uipc_socket.c:1563
#11 0xffffffff80448113 in kern_setsockopt (td=0xffffff00011924c0, s=774342528, level=-2138030408, name=-2138553872, val=0xffffff00011924c0, valseg=1035680408, valsize=69937568) at /usr/src/sys/kern/uipc_syscalls.c:1351
#12 0xffffffff8044817e in setsockopt (td=0xffffff00042b29a0, uap=0xffffffffa7ff6b30) at /usr/src/sys/kern/uipc_syscalls.c:1307
#13 0xffffffff8061a1b1 in syscall (frame=
    {tf_rdi = 0, tf_rsi = 0, tf_rdx = 1, tf_rcx = 0, tf_r8 = 0, tf_r9 = 140737488350072, tf_rax = 105, tf_rbx = 0, tf_rbp = 3, tf_r10 = -3689348814741910323, tf_r11 = 514, tf_r12 = 140737488350480, tf_r13 = 34368406752, tf_r14 = 0, tf_r15 = 0, tf_trapno = 12, tf_addr = 5283944, tf_flags = 12, tf_err = 2, tf_rip = 34366834188, tf_cs = 43, tf_rflags = 518, tf_rsp = 140737488350184, tf_ss = 35}) at /usr/src/sys/amd64/amd64/trap.c:792
#14 0xffffffff80604cc8 in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:270
#15 0x00000008006c460c in ?? ()
Previous frame inner to this frame (corrupt stack?)

From: Robert Watson
To: Kai Gallasch
Cc: bug-followup@FreeBSD.org
Subject: Re: kern/104765: kernel panic 6.2 prerelease-20061017 amd64
Date: Fri, 24 Nov 2006 14:16:50 +0000 (GMT)

On Thu, 23 Nov 2006, Kai Gallasch wrote:

> Another crash. (following the previous two crashes after applying your
> patch) Here is the output of kgdb.
>
> To keep bug-followup@freebsd.org for kern/104765 up to date I am attaching
> output of the previous two crashdumps also.

Hmm. This is unfortunate, as it suggests that finding a non-disruptive fix
for this will be difficult. I'm not sure how you feel about running a
-CURRENT kernel, but the architectural change that fixes this whole class of
race conditions is present there. There have been some recent hitches in
7-CURRENT due to introducing MSI support, so if you are willing to give this
a try you may also want to add the following to your /boot/loader.conf
before starting:

hw.pci.enable_msi="0"
hw.pci.enable_msix="0"

Otherwise the 7-CURRENT kernel is in quite good shape. Running with it for a
few days to see if the crash problem "goes away" would be quite useful. In
the meantime I'll explore another workaround to use as a substitute for the
architectural fix during the release cycle.

Robert N M Watson
Computer Laboratory
University of Cambridge

From: Robert Watson
To: Kai Gallasch
Cc: bug-followup@FreeBSD.org
Subject: Re: kern/104765: kernel panic 6.2 prerelease-20061017 amd64
Date: Sat, 25 Nov 2006 00:57:11 +0000 (GMT)

On Thu, 23 Nov 2006, Kai Gallasch wrote:

> Another crash.
> (following the previous two crashes after applying your
> patch) Here is the output of kgdb.
>
> To keep bug-followup@freebsd.org for kern/104765 up to date I am attaching
> output of the previous two crashdumps also.

The attached patch may provide a more substantive solution for this problem,
at least until 7.x. I've booted and tested this, but since I don't have a
reproduction scenario for the specific bug you're running into right now,
I've not managed to test those particular cases. If GNATS/etc mangle the
patch, you can also download it from:

  http://www.watson.org/~robert/freebsd/netperf/20061124-ip_ctloutput.diff

It appears to apply without problems against a stock RELENG_6
src/sys/netinet directory, so you may need to remove the current patch
you're running with before proceeding.

Robert N M Watson
Computer Laboratory
University of Cambridge

Index: ip_output.c
===================================================================
RCS file: /data/fbsd-cvs/ncvs/src/sys/netinet/ip_output.c,v
retrieving revision 1.242.2.16
diff -u -r1.242.2.16 ip_output.c
--- ip_output.c	24 Oct 2006 13:23:03 -0000	1.242.2.16
+++ ip_output.c	24 Nov 2006 15:47:13 -0000
@@ -1148,15 +1148,29 @@
 
 /*
  * IP socket option processing.
+ *
+ * There are two versions of this call in order to work around a race
+ * condition in TCP in FreeBSD 6.x. In the TCP implementation, so->so_pcb
+ * can become NULL if the pcb or pcbinfo lock isn't held. However, when
+ * entering ip_ctloutput(), neither lock is held, and finding the pointer to
+ * either lock requires following so->so_pcb, which may be NULL.
+ * ip_ctloutput_pcbinfo() accepts the pcbinfo pointer so that the lock can be
+ * safely acquired. This is not required in FreeBSD 7.x because the
+ * invariants on so->so_pcb are much stronger, so it cannot become NULL
+ * while the socket is in use.
  */
 int
-ip_ctloutput(so, sopt)
+ip_ctloutput_pcbinfo(so, sopt, pcbinfo)
 	struct socket *so;
 	struct sockopt *sopt;
+	struct inpcbinfo *pcbinfo;
 {
 	struct inpcb *inp = sotoinpcb(so);
 	int error, optval;
 
+	if (pcbinfo == NULL)
+		pcbinfo = inp->inp_pcbinfo;
+
 	error = optval = 0;
 	if (sopt->sopt_level != IPPROTO_IP) {
 		return (EINVAL);
@@ -1190,12 +1204,15 @@
 				m_free(m);
 				break;
 			}
+			INP_INFO_WLOCK(pcbinfo);
 			if (so->so_pcb == NULL) {
+				INP_INFO_WUNLOCK(pcbinfo);
 				m_free(m);
 				error = EINVAL;
 				break;
 			}
 			INP_LOCK(inp);
+			INP_INFO_WUNLOCK(pcbinfo);
 			error = ip_pcbopts(inp, sopt->sopt_name, m);
 			INP_UNLOCK(inp);
 			return (error);
@@ -1217,10 +1234,14 @@
 			if (error)
 				break;
 
+			INP_INFO_WLOCK(pcbinfo);
 			if (so->so_pcb == NULL) {
+				INP_INFO_WUNLOCK(pcbinfo);
 				error = EINVAL;
 				break;
 			}
+			INP_LOCK(inp);
+			INP_INFO_WUNLOCK(pcbinfo);
 			switch (sopt->sopt_name) {
 			case IP_TOS:
 				inp->inp_ip_tos = optval;
@@ -1277,6 +1298,7 @@
 				OPTSET(INP_DONTFRAG);
 				break;
 			}
+			INP_UNLOCK(inp);
 			break;
 
 #undef OPTSET
@@ -1295,11 +1317,13 @@
 			if (error)
 				break;
 
+			INP_INFO_WLOCK(pcbinfo);
 			if (so->so_pcb == NULL) {
 				error = EINVAL;
 				break;
 			}
 			INP_LOCK(inp);
+			INP_INFO_WUNLOCK(pcbinfo);
 			switch (optval) {
 			case IP_PORTRANGE_DEFAULT:
 				inp->inp_flags &= ~(INP_LOWPORT);
@@ -1480,6 +1504,15 @@
 	return (error);
 }
 
+int
+ip_ctloutput(so, sopt)
+	struct socket *so;
+	struct sockopt *sopt;
+{
+
+	return (ip_ctloutput_pcbinfo(so, sopt, NULL));
+}
+
 /*
  * Set up IP options in pcb for insertion in output packets.
  * Store in mbuf with pointer in pcbopt, adding pseudo-option
Index: ip_var.h
===================================================================
RCS file: /data/fbsd-cvs/ncvs/src/sys/netinet/ip_var.h,v
retrieving revision 1.95
diff -u -r1.95 ip_var.h
--- ip_var.h	2 Jul 2005 23:13:31 -0000	1.95
+++ ip_var.h	24 Nov 2006 15:32:53 -0000
@@ -144,6 +144,7 @@
 
 struct ip;
 struct inpcb;
+struct inpcbinfo;
 struct route;
 struct sockopt;
 
@@ -164,6 +165,8 @@
 extern struct pr_usrreqs rip_usrreqs;
 
 int ip_ctloutput(struct socket *, struct sockopt *sopt);
+int ip_ctloutput_pcbinfo(struct socket *, struct sockopt *sopt,
+	    struct inpcbinfo *pcbinfo);
 void ip_drain(void);
 void ip_fini(void *xtp);
 int ip_fragment(struct ip *ip, struct mbuf **m_frag, int mtu,
Index: tcp_usrreq.c
===================================================================
RCS file: /data/fbsd-cvs/ncvs/src/sys/netinet/tcp_usrreq.c,v
retrieving revision 1.124.2.3
diff -u -r1.124.2.3 tcp_usrreq.c
--- tcp_usrreq.c	27 Sep 2006 09:24:44 -0000	1.124.2.3
+++ tcp_usrreq.c	24 Nov 2006 14:59:41 -0000
@@ -1035,7 +1035,7 @@
 			error = ip6_ctloutput(so, sopt);
 		else
 #endif /* INET6 */
-		error = ip_ctloutput(so, sopt);
+		error = ip_ctloutput_pcbinfo(so, sopt, &tcbinfo);
 		return (error);
 	}
 	tp = intotcpcb(inp);

State-Changed-From-To: open->feedback
State-Changed-By: rwatson
State-Changed-When: Sat Nov 25 11:01:09 UTC 2006
State-Changed-Why:
Change to feedback state, waiting for feedback on a new patch.

http://www.freebsd.org/cgi/query-pr.cgi?pr=104765

From: Kai Gallasch
To: Robert Watson
Cc: Johannes 5 Joemann
Subject: Re: kern/104765: kernel panic 6.2 prerelease-20061017 amd64
Date: Sun, 26 Nov 2006 03:23:54 +0100

Robert Watson wrote:
>
> On Thu, 23 Nov 2006, Kai Gallasch wrote:
>
>> Another crash. (following the previous two crashes after applying your
>> patch) Here is the output of kgdb.
>>
>> To keep bug-followup@freebsd.org for kern/104765 up to date I am
>> attaching output of the previous two crashdumps also.
>
> The attached patch may provide a more substantive solution for this
> problem, at least until 7.x. I've booted and tested this, but since I
> don't have a reproduction scenario for the specific bug you're running
> into right now, I've not managed to test those particular cases. If
> GNATS/etc mangle the patch, you can also download it from:
>
>   http://www.watson.org/~robert/freebsd/netperf/20061124-ip_ctloutput.diff
>
> It appears to apply without problems against a stock RELENG_6
> src/sys/netinet directory, so you may need to remove the current patch
> you're running with before proceeding.

Hi.

I just rebuilt the server with your patch 20061124-ip_ctloutput.diff
applied to a fresh checkout of RELENG_6.

Thanks for all your effort and time debugging this problem, especially
with an upcoming 6.2 release in the queue.

--Kai.

>
> Robert N M Watson
> Computer Laboratory
> University of Cambridge
>
> Index: ip_output.c
> ===================================================================
> RCS file: /data/fbsd-cvs/ncvs/src/sys/netinet/ip_output.c,v
> retrieving revision 1.242.2.16
> diff -u -r1.242.2.16 ip_output.c
> --- ip_output.c	24 Oct 2006 13:23:03 -0000	1.242.2.16
> +++ ip_output.c	24 Nov 2006 15:47:13 -0000
> @@ -1148,15 +1148,29 @@

From: Robert Watson
To: Kai Gallasch
Cc: Johannes 5 Joemann, bug-followup@FreeBSD.org
Subject: Re: kern/104765: kernel panic 6.2 prerelease-20061017 amd64
Date: Tue, 28 Nov 2006 15:01:10 +0000 (GMT)

On Sun, 26 Nov 2006, Kai Gallasch wrote:

>> The attached patch may provide a more substantive solution for this
>> problem, at least until 7.x. I've booted and tested this, but since I
>> don't have a reproduction scenario for the specific bug you're running into
>> right now, I've not managed to test those particular cases.
>> If GNATS/etc mangle the patch, you can also download it from:
>>
>>   http://www.watson.org/~robert/freebsd/netperf/20061124-ip_ctloutput.diff
>>
>> It appears to apply without problems against a stock RELENG_6
>> src/sys/netinet directory, so you may need to remove the current patch
>> you're running with before proceeding.
>
> I just rebuilt the server with your patch 20061124-ip_ctloutput.diff applied
> to a fresh checkout of RELENG_6.
>
> Thanks for all your effort and time debugging this problem, especially with
> an upcoming 6.2 release in the queue.

Any luck with this patch? I'd love to get this fix merged into the stable
and release branches, but don't want to do that without confirmation it
helps.

Thanks,

Robert N M Watson
Computer Laboratory
University of Cambridge

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:
Subject: Re: kern/104765: commit references a PR
Date: Tue, 28 Nov 2006 21:41:32 +0000 (UTC)

rwatson     2006-11-28 21:41:12 UTC

  FreeBSD src repository

  Modified files:        (Branch: RELENG_6)
    sys/netinet          ip_output.c ip_var.h tcp_usrreq.c
  Log:
  Reformulate ip_ctloutput() and tcp_ctloutput() to work around the fact
  that so_pcb can be invalidated at any time due to an untimely reset.
  Move the body of ip_ctloutput() to ip_ctloutput_pcbinfo(), which
  accepts a pcbinfo argument, and wrap it with ip_ctloutput(), which
  passes a NULL. Modify tcp_ctloutput() to directly invoke
  ip_ctloutput_pcbinfo() and pass tcbinfo. Hold the pcbinfo lock when
  dereferencing so_pcb and acquiring the inpcb lock in order to prevent
  the inpcb from being freed; the pcbinfo lock is then immediately
  dropped.

  This is required as TCP may free the inpcb and invalidate so_pcb due
  to a reset at any time in the RELENG_6 network stack, which otherwise
  leads to a panic. This panic might be frequently seen on highly
  loaded IRC and Samba servers, which have long-lasting TCP connections,
  query socket options frequently, and see a significant number of
  reset connections.

  This change has been merged directly to RELENG_6 as the problem does
  not exist in HEAD, where the invariants for so_pcb are much stronger;
  the architectural changes in HEAD avoid the need to acquire a global
  lock in the socket option path. This change will be merged to
  RELENG_6_2.

  PR:             102412, 104765
  Reviewed by:    Diane Bruce
  Tested by:      Daniel Austin, Kai Gallasch

  Revision        Changes    Path
  1.242.2.17      +34 -1     src/sys/netinet/ip_output.c
  1.95.2.1        +3 -0      src/sys/netinet/ip_var.h
  1.124.2.4       +1 -1      src/sys/netinet/tcp_usrreq.c
_______________________________________________
cvs-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/cvs-all
To unsubscribe, send any mail to "cvs-all-unsubscribe@freebsd.org"

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:
Subject: Re: kern/104765: commit references a PR
Date: Tue, 28 Nov 2006 23:19:35 +0000 (UTC)

rwatson     2006-11-28 23:19:18 UTC

  FreeBSD src repository

  Modified files:        (Branch: RELENG_6_2)
    sys/netinet          ip_output.c ip_var.h tcp_usrreq.c
  Log:
  Merge ip_output.c:1.242.2.17, ip_var.h:1.95.2.1, tcp_usrreq.c:1.124.2.4
  from RELENG_6 to RELENG_6_2:

  Reformulate ip_ctloutput() and tcp_ctloutput() to work around the fact
  that so_pcb can be invalidated at any time due to an untimely reset.
  Move the body of ip_ctloutput() to ip_ctloutput_pcbinfo(), which
  accepts a pcbinfo argument, and wrap it with ip_ctloutput(), which
  passes a NULL. Modify tcp_ctloutput() to directly invoke
  ip_ctloutput_pcbinfo() and pass tcbinfo. Hold the pcbinfo lock when
  dereferencing so_pcb and acquiring the inpcb lock in order to prevent
  the inpcb from being freed; the pcbinfo lock is then immediately
  dropped.

  This is required as TCP may free the inpcb and invalidate so_pcb due
  to a reset at any time in the RELENG_6 network stack, which otherwise
  leads to a panic.
  This panic might be frequently seen on highly loaded IRC and Samba
  servers, which have long-lasting TCP connections, query socket options
  frequently, and see a significant number of reset connections.

  This change has been merged directly to RELENG_6 as the problem does
  not exist in HEAD, where the invariants for so_pcb are much stronger;
  the architectural changes in HEAD avoid the need to acquire a global
  lock in the socket option path. This change will be merged to
  RELENG_6_2.

  PR:             102412, 104765
  Reviewed by:    Diane Bruce
  Tested by:      Daniel Austin, Kai Gallasch
  Approved by:    re (kensmith)

  Revision            Changes    Path
  1.242.2.16.2.1      +34 -1     src/sys/netinet/ip_output.c
  1.95.8.1            +3 -0      src/sys/netinet/ip_var.h
  1.124.2.3.2.1       +1 -1      src/sys/netinet/tcp_usrreq.c

State-Changed-From-To: feedback->closed
State-Changed-By: rwatson
State-Changed-When: Wed Dec 6 12:42:54 UTC 2006
State-Changed-Why:
As there have been no further reports of panic after a week and patches
have been merged to appropriate branches, assume that the problem is
resolved. If this is not the case, please let me know. Thanks for the
report, patch testing, and patience!

http://www.freebsd.org/cgi/query-pr.cgi?pr=104765
>Unformatted:
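The locking pattern described in the commit log (take the global pcbinfo lock, re-check so_pcb, take the per-connection inpcb lock, then immediately drop the global lock) can be illustrated with a minimal user-space sketch using POSIX threads. This is not kernel code: struct pcbinfo, struct pcb, struct sock, pcb_alloc(), sock_reset() and sock_set_tos() are hypothetical stand-ins for struct inpcbinfo, struct inpcb, struct socket and the TCP reset and IP_TOS paths, and pthread mutexes stand in for INP_INFO_WLOCK()/INP_LOCK(). The sketch also assumes, as the log implies, that whatever frees the pcb does so while holding the global lock.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/*
 * Hypothetical user-space stand-ins, NOT the kernel structures:
 *   struct pcbinfo ~ struct inpcbinfo (global lock, like INP_INFO_WLOCK)
 *   struct pcb     ~ struct inpcb     (per-connection lock, like INP_LOCK)
 *   struct sock    ~ struct socket    (so_pcb may be cleared by a reset)
 */
struct pcbinfo {
	pthread_mutex_t	 lock;
};

struct pcb {
	pthread_mutex_t	 lock;
	int		 tos;		/* an option value, like inp_ip_tos */
};

struct sock {
	struct pcb	*so_pcb;
};

struct pcb *
pcb_alloc(void)
{
	struct pcb *p = calloc(1, sizeof(*p));

	assert(p != NULL);
	pthread_mutex_init(&p->lock, NULL);
	return (p);
}

/*
 * Reset path (assumed): detach and free the pcb while holding the global
 * lock, after waiting for any thread that currently holds the pcb lock.
 */
void
sock_reset(struct sock *so, struct pcbinfo *info)
{
	struct pcb *p;

	pthread_mutex_lock(&info->lock);
	p = so->so_pcb;
	if (p != NULL) {
		pthread_mutex_lock(&p->lock);
		so->so_pcb = NULL;
		pthread_mutex_unlock(&p->lock);
		pthread_mutex_destroy(&p->lock);
		free(p);
	}
	pthread_mutex_unlock(&info->lock);
}

/* Socket option path following the pattern described in the log. */
int
sock_set_tos(struct sock *so, struct pcbinfo *info, int tos)
{
	struct pcb *p;

	pthread_mutex_lock(&info->lock);	/* pcb cannot be freed now */
	p = so->so_pcb;
	if (p == NULL) {
		pthread_mutex_unlock(&info->lock);	/* release on error too */
		return (EINVAL);
	}
	pthread_mutex_lock(&p->lock);		/* pin this pcb ... */
	pthread_mutex_unlock(&info->lock);	/* ... then drop global lock */
	p->tos = tos;
	pthread_mutex_unlock(&p->lock);
	return (0);
}
```

Both paths take the global lock before the per-connection lock, so the lock order is consistent. A setter that has already dropped the global lock still holds the per-connection lock, and sock_reset() blocks on that lock before freeing, so the pcb cannot be freed while it is being modified. A caller would initialize the global mutex with pthread_mutex_init(), attach a pcb from pcb_alloc(), and then call sock_set_tos() and sock_reset() from any thread.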