From rivers@dignus.com Wed Jul 22 07:51:15 1998
Received: from elvis.vnet.net (elvis.vnet.net [166.82.1.5])
    by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id HAA14157
    for ; Wed, 22 Jul 1998 07:51:11 -0700 (PDT)
    (envelope-from rivers@dignus.com)
Received: from dignus.com (ponds.vnet.net [166.82.177.48])
    by elvis.vnet.net (8.8.8/8.8.4) with ESMTP id KAA01885
    for ; Wed, 22 Jul 1998 10:50:47 -0400 (EDT)
Received: from lakes.dignus.com (lakes [10.0.0.3])
    by dignus.com (8.8.8/8.8.5) with ESMTP id LAA01815
    for ; Wed, 22 Jul 1998 11:22:44 -0400 (EDT)
Received: (from rivers@localhost)
    by lakes.dignus.com (8.8.8/8.6.9) id KAA00527;
    Wed, 22 Jul 1998 10:54:52 -0400 (EDT)
Message-Id: <199807221454.KAA00527@lakes.dignus.com>
Date: Wed, 22 Jul 1998 10:54:52 -0400 (EDT)
From: Thomas David Rivers
Reply-To: rivers@dignus.com
To: FreeBSD-gnats-submit@freebsd.org
Subject: panic: malloc: wrong bucket
X-Send-Pr-Version: 3.2

>Number:         7367
>Category:       kern
>Synopsis:       panic: malloc: wrong bucket
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          closed
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Wed Jul 22 09:00:01 PDT 1998
>Closed-Date:    Sat Feb 5 05:56:47 PST 2000
>Last-Modified:  Sat Feb 5 05:57:06 PST 2000
>Originator:     Thomas David Rivers
>Release:        FreeBSD 2.2.6-RELEASE i386
>Organization:
Dignus LLC
>Environment:

FreeBSD 2.2.6; 32 MB machine, XFreeBSD, Matrox Millennium II card.

>Description:

I get "mysterious" panics/crashes after upgrading to 2.2.6 (from 2.2.5). Sometimes I get a panic with a nice savecore; sometimes I don't.
Here's the traceback of the latest one (from gdb -k):

#0  boot (howto=256) at ../../kern/kern_shutdown.c:266
#1  0xf0112882 in panic (fmt=0xf010f01b "malloc: wrong bucket")
    at ../../kern/kern_shutdown.c:390
#2  0xf010f364 in malloc (size=264, type=41, flags=0)
    at ../../kern/kern_malloc.c:226
#3  0xf010c992 in fork1 (p1=0xf0b04000, flags=20, retval=0xefbfff84)
    at ../../kern/kern_fork.c:170
#4  0xf010c870 in fork (p=0xf0b04000, uap=0xefbfff94, retval=0xefbfff84)
    at ../../kern/kern_fork.c:91
#5  0xf01c853f in syscall (frame={tf_es = 39, tf_ds = 39, tf_edi = 368640,
    tf_esi = 337812, tf_ebp = -272639004, tf_isp = -272629788, tf_ebx = 1,
    tf_edx = 368640, tf_ecx = 337812, tf_eax = 2, tf_trapno = 12,
    tf_err = 7, tf_eip = 168389, tf_cs = 31, tf_eflags = 514,
    tf_esp = -272639028, tf_ss = 39}) at ../../i386/i386/trap.c:918
#6  0x291c5 in ?? ()
#7  0x2e49 in ?? ()
#8  0x2399 in ?? ()
#9  0x2148 in ?? ()
#10 0x909e in ?? ()
#11 0x107e in ?? ()

This seems to be an issue with the kernel malloc routines. The call in fork1 looks like:

    169         /* Allocate new proc. */
    170         MALLOC(newproc, struct proc *, sizeof(struct proc), M_PROC, M_WAITOK);

I'd guess that the malloc chains had already been corrupted, and that this call is not the culprit but merely the one that discovered the corruption.

I have the kernel (a 2.2.6-RELEASE kernel configured with debugging) and the core file if anyone is interested.

The panic call looks like this (from kern_malloc.c):

    221                 freep->spare0 = 0;
    222 #endif /* DIAGNOSTIC */
    223 #ifdef KMEMSTATS
    224         kup = btokup(va);
    225         if (kup->ku_indx != indx)
    226                 panic("malloc: wrong bucket");
    227         if (kup->ku_freecnt == 0)
    228                 panic("malloc: lost data");
    229         kup->ku_freecnt--;
    230         kbp->kb_totalfree--;

And the problem here is that kup is NULL! (So the dereference kup->ku_indx gets a bogus value.) va is 0xf0b46c00, and *va is NULL.

>How-To-Repeat:

Hmm... for me, it's boot up and wait a few days.

>Fix:

As a diagnostic, perhaps a panic in kern_malloc if kup is NULL?
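The check that fired compares the bucket index malloc() computed for the request with the index recorded for va's page. Assuming 4.4BSD-style power-of-two buckets with a 16-byte minimum (MINBUCKET = 4), the 264-byte struct proc in frame #2 falls in bucket 9 (512 bytes), so the panic means the page record did not say 9. What follows is a hypothetical user-space model of that check with the NULL test proposed under >Fix: placed first. The arena, page size, and the names bucket_index, set_page, btokup_checked, check_free_object and page_addr are illustrative assumptions, not kernel API, and the model returns a code where the kernel would call panic(). Since btokup() is, as far as this model assumes, plain index arithmetic, the NULL test is done as a range check on va.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical user-space model of kern_malloc.c's per-page bookkeeping.
 * ku_indx and ku_freecnt follow the snippet above; the arena size, page
 * size, MINBUCKET value and every function name here are assumptions.
 */
#define PAGE_SHIFT 12                 /* 4 KB pages, as on i386 */
#define NPAGES     8                  /* tiny stand-in for the kmem arena */
#define MINBUCKET  4                  /* assumed smallest bucket: 16 bytes */

struct kmemusage {
    short          ku_indx;           /* bucket index of objects on this page */
    unsigned short ku_freecnt;        /* free objects left on this page */
};

static char             kmembase[NPAGES << PAGE_SHIFT];
static struct kmemusage kmemusage[NPAGES];

enum kmem_check { KMEM_OK, KMEM_NULL_KUP, KMEM_WRONG_BUCKET, KMEM_LOST_DATA };

/* Smallest power-of-two bucket index whose size is >= size. */
static int bucket_index(unsigned long size)
{
    int indx = MINBUCKET;

    while ((1UL << indx) < size)
        indx++;
    return indx;
}

/* Record which bucket a page serves and how many free objects it holds. */
static void set_page(size_t page, int indx, unsigned short freecnt)
{
    assert(page < NPAGES);
    kmemusage[page].ku_indx = (short)indx;
    kmemusage[page].ku_freecnt = freecnt;
}

/* Like btokup(), but yields NULL for an address outside the arena
 * instead of indexing past the end of kmemusage[]. */
static struct kmemusage *btokup_checked(uintptr_t va)
{
    uintptr_t base = (uintptr_t)kmembase;

    if (va < base || va - base >= sizeof(kmembase))
        return NULL;
    return &kmemusage[(va - base) >> PAGE_SHIFT];
}

/* The checks from kern_malloc.c lines 224-228, preceded by the proposed
 * NULL check; returns a code where the kernel would call panic(). */
static enum kmem_check check_free_object(uintptr_t va, int indx)
{
    struct kmemusage *kup = btokup_checked(va);

    if (kup == NULL)
        return KMEM_NULL_KUP;         /* proposed diagnostic */
    if (kup->ku_indx != indx)
        return KMEM_WRONG_BUCKET;     /* "malloc: wrong bucket" */
    if (kup->ku_freecnt == 0)
        return KMEM_LOST_DATA;        /* "malloc: lost data" */
    return KMEM_OK;
}

/* Address of the first byte of page `page` in the model arena. */
static uintptr_t page_addr(size_t page)
{
    return (uintptr_t)kmembase + ((uintptr_t)page << PAGE_SHIFT);
}
```

For example, after set_page(2, 0, 3) marks page 2 with a stale bucket index, check_free_object(page_addr(2), bucket_index(264)) reports KMEM_WRONG_BUCKET, the same condition as the panic above, while an address outside the arena reports KMEM_NULL_KUP instead of reading a bogus record.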
>Release-Note:
>Audit-Trail:

State-Changed-From-To: open->feedback
State-Changed-By: phk
State-Changed-When: Fri Jul 24 00:21:43 PDT 1998
State-Changed-Why:
Thomas, I hate to say this, but my initial reaction is "hardware!"

It could be, though, that you are onto a very elusive bug in the kernel, but in that case I doubt anybody but you will be in a position to find it.

The diagnostic you propose sounds like a good first step, and all I can suggest is that you use the salami method from there.

You may want to transplant another motherboard into the machine for a week or so, just to rule out the hardware.

From: Brett Glass
To: freebsd-gnats-submit@freebsd.org, rivers@dignus.com, hackers@freebsd.com
Cc:
Subject: Re: kern/7367: panic: malloc: wrong bucket
Date: Wed, 05 Aug 1998 10:43:05 -0600

FreeBSD 2.2.7-RELEASE:

We've been getting kernel panics, with spontaneous reboots, during periods of heavy memory and CPU activity (e.g. kernel rebuilds and gzip -9 on large files). dmesg output follows.

Sometimes the system just reboots without displaying a message, but we have seen an error message that complains of a "recursive call" to malloc(), and one panic reboot displayed a screen that mentioned a fatal "page not found" error.

The problem seems to occur when physical memory is fully committed and we're swapping. I suspect the bug will be easier to flush out if the system has less memory, i.e. 8 MB or 16 MB of RAM rather than 32 MB.

We'd suspect hardware, except that the RAM all tests good and we've changed nothing but the OS version. What's more, the system ONLY CRASHES when specific programs are run, e.g. when we do a kernel recompile or run that backup.
System stats:

  Intel 486DX4/100
  Zeos "Rattler" motherboard with integrated IDE
  16 MB RAM
  Mitsumi CD-ROM with proprietary interface
  10 serial ports w/multiport cards
  Western Digital Caviar 2.5 GB IDE drive
  Artisoft AE-3 NE2000 clone
  Diamond PCI graphics adapter (not running X, so VGA mode used always)

dmesg output follows:

Copyright (c) 1992-1998 FreeBSD Inc.
Copyright (c) 1982, 1986, 1989, 1991, 1993
        The Regents of the University of California. All rights reserved.
FreeBSD 2.2.7-RELEASE #0: Mon Aug 3 03:43:28 MDT 1998
    root@lariat.lariat.org:/usr/src/sys/compile/LARIAT
CPU: i486 DX4 (486-class CPU)
  Origin = "GenuineIntel"  Id = 0x480  Stepping=0
  Features=0x3
real memory  = 16777216 (16384K bytes)
avail memory = 15089664 (14736K bytes)
Probing for devices on PCI bus 0:
chip0 rev 4 on pci0:0:0
chip1 rev 3 on pci0:2:0
vga0 rev 0 on pci0:13:0
Probing for devices on the ISA bus:
sc0 at 0x60-0x6f irq 1 on motherboard
sc0: VGA color <4 virtual consoles, flags=0x0>
ed0 at 0x320-0x33f irq 5 on isa
ed0: address 00:00:6e:24:e4:15, type NE2000 (16 bit)
sio0 at 0x1e0-0x1e7 flags 0x185 on isa
sio0: type 16550A (multiport)
sio1 at 0x280-0x287 irq 9 flags 0x185 on isa
sio1: type 16550A (multiport master)
sio2 at 0x3f8-0x3ff flags 0x585 on isa
sio2: type 16550A (multiport)
sio3 at 0x2f8-0x2ff flags 0x585 on isa
sio3: type 16550A (multiport)
sio4 at 0x3e8-0x3ef flags 0x585 on isa
sio4: type 16550A (multiport)
sio5 at 0x2e8-0x2ef irq 4 flags 0x585 on isa
sio5: type 16550A (multiport master)
sio6 at 0x1f8-0x1ff flags 0x985 on isa
sio6: type 16550A (multiport)
sio7 at 0x1e8-0x1ef flags 0x985 on isa
sio7: type 16550A (multiport)
sio8 at 0x2a8-0x2af flags 0x985 on isa
sio8: type 16550A (multiport)
sio9 at 0x1a8-0x1af irq 3 flags 0x985 on isa
sio9: type 16550A (multiport master)
lpt0 at 0x3bc-0x3c3 irq 7 on isa
lpt0: Interrupt-driven port
lp0: TCP/IP capable interface
fdc0 at 0x3f0-0x3f7 irq 6 drq 2 on isa
fdc0: FIFO enabled, 8 bytes threshold
fd0: 1.44MB 3.5in
wdc0 at 0x1f0-0x1f7 irq 14 on isa
wdc0: unit 0 (wd0):
wd0: 2441MB (4999680 sectors), 4960 cyls, 16 heads, 63 S/T, 512 B/S
mcd0: type Mitsumi FX001D, version info: D 2
mcd0 at 0x310-0x313 irq 10 on isa
npx0 flags 0x1 on motherboard
npx0: INT 16 interface
WARNING: / was not properly dismounted.

State-Changed-From-To: feedback->closed
State-Changed-By: asmodai
State-Changed-When: Sat Feb 5 05:56:47 PST 2000
State-Changed-Why:
Feedback time-out.

>Unformatted: