From stijn@sandcat.nl Mon Nov 28 09:41:15 2005 Return-Path: Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id D367816A420; Mon, 28 Nov 2005 09:41:15 +0000 (GMT) (envelope-from stijn@sandcat.nl) Received: from kweetal.tue.nl (kweetal.tue.nl [131.155.3.6]) by mx1.FreeBSD.org (Postfix) with ESMTP id 1AD8B43D73; Mon, 28 Nov 2005 09:41:13 +0000 (GMT) (envelope-from stijn@sandcat.nl) Received: from localhost (localhost [127.0.0.1]) by kweetal.tue.nl (Postfix) with ESMTP id 0038413B94B; Mon, 28 Nov 2005 10:41:11 +0100 (CET) Received: from kweetal.tue.nl ([127.0.0.1]) by localhost (kweetal.tue.nl [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 60654-01; Mon, 28 Nov 2005 10:41:11 +0100 (CET) Received: from umta.win.tue.nl (umta.win.tue.nl [131.155.71.100]) by kweetal.tue.nl (Postfix) with ESMTP id 25BD713B943; Mon, 28 Nov 2005 10:41:11 +0100 (CET) Received: from sandcat.nl (pcwin165 [131.155.69.121]) by umta.win.tue.nl (Postfix) with ESMTP id 20D9231401C; Mon, 28 Nov 2005 10:41:11 +0100 (CET) Received: by sandcat.nl (Postfix, from userid 1001) id 3D52DAC823; Mon, 28 Nov 2005 10:43:38 +0100 (CET) Message-Id: <20051128094338.3D52DAC823@sandcat.nl> Date: Mon, 28 Nov 2005 10:43:38 +0100 (CET) From: Stijn Hoop Reply-To: Stijn Hoop To: FreeBSD-gnats-submit@freebsd.org Cc: Lukas Ertl Subject: panic due to g_malloc returning null in gv_drive_done X-Send-Pr-Version: 3.113 X-GNATS-Notify: >Number: 89660 >Category: kern >Synopsis: [vinum] [patch] [panic] due to g_malloc returning null in gv_drive_done >Confidential: no >Severity: serious >Priority: medium >Responsible: jh >State: closed >Quarter: >Keywords: >Date-Required: >Class: sw-bug >Submitter-Id: current-users >Arrival-Date: Mon Nov 28 09:50:02 GMT 2005 >Closed-Date: Fri Feb 26 11:21:05 UTC 2010 >Last-Modified: Fri Feb 26 11:21:05 UTC 2010 >Originator: Stijn Hoop >Release: FreeBSD 6.0-RELEASE i386 >Organization: >Environment: System: FreeBSD 6.0-RELEASE #1: Sun Nov 27 14:48:26 CET 2005 stijn@pcwin002.win.tue.nl:/net/freebsd/6.0-SECURITY/obj/net/freebsd/6.0-SECURITY/src/sys/SANDCAT i386 >Description: - This machine panics every night on the daily maintenance time, with the following backtrace (hand transcribed because of a lack of serial cable): Fatal trap 12: page fault while in kernel mode fault virtual address = 0x0 fault code = supervisor write, page not present instruction pointer = 0x20:0xc0711a41 stack pointer = 0x28:0xe3245ccc frame pointer = 0x28:0xe3245cd8 code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, pres 1, def32 1, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 3 (g_up) [thread pid 3 tid 100021 ] Stopped at gv_drive_done+0x29: movl %ebx,0(%eax) db> bt Tracing pid 3 tid 100021 td 0xc1e8f300 gv_drive_done(c6bcc630) at gv_drive_done+0x29 biodone(c6bcc630) at biodone+0x8b g_io_schedule_up(c1e8f300) at g_io_schedule_up+0x86 g_up_procbody(0,e3245d38) at g_up_procbody+0x6e fork_exit(c0482c04,0,e3245d38) at fork_exit+0x70 fork_trampoline() at fork_trampoline+0x8 --- trap 0x1, eip = 0, esp = 0xe3245d6c, ebp = 0 --- db> Luckily I was able to obtain a crash dump, which yielded the following information (after loading the geom_vinum symbols from a debug module): #0 doadump () at pcpu.h:165 #1 0xc0431953 in db_fncall (dummy1=-1066867712, dummy2=0, dummy3=-1067549157, dummy4=0xe3245af8 "$[$ã\210e^À\020[$ã\024[$ã\220\a") at /net/freebsd/6.0-SECURITY/src/sys/ddb/db_command.c:492 #2 0xc0431758 in db_command (last_cmdp=0xc0667c04, cmd_table=0x0, aux_cmd_tablep=0xc0638d58, aux_cmd_tablep_end=0xc0638d5c) at /net/freebsd/6.0-SECURITY/src/sys/ddb/db_command.c:350 #3 0xc0431820 in db_command_loop () at /net/freebsd/6.0-SECURITY/src/sys/ddb/db_command.c:458 #4 0xc043342d in db_trap (type=12, code=0) at /net/freebsd/6.0-SECURITY/src/sys/ddb/db_main.c:221 #5 0xc04d38ff in kdb_trap (type=12, code=0, tf=0xe3245c8c) at /net/freebsd/6.0-SECURITY/src/sys/kern/subr_kdb.c:473 #6 0xc05fea4c in trap_fatal (frame=0xe3245c8c, eva=0) at /net/freebsd/6.0-SECURITY/src/sys/i386/i386/trap.c:822 #7 0xc05fe7bb in trap_pfault (frame=0xe3245c8c, usermode=0, eva=0) at /net/freebsd/6.0-SECURITY/src/sys/i386/i386/trap.c:742 #8 0xc05fe3d5 in trap (frame= {tf_fs = 8, tf_es = 40, tf_ds = -484179928, tf_edi = 0, tf_esi = -1038670080, tf_ebp = -484156200, tf_isp = -484156232, tf_ebx = -960707024, tf_edx = 0, tf_ecx = -1041698048, tf_eax = 0, tf_trapno = 12, tf_err = 2, tf_eip = -1066329535, tf_cs = 32, tf_eflags = 590470, tf_esp = -1066329576, tf_ss = -960707024}) at /net/freebsd/6.0-SECURITY/src/sys/i386/i386/trap.c:432 #9 0xc05f0f8a in calltrap () at /net/freebsd/6.0-SECURITY/src/sys/i386/i386/exception.s:139 #10 0xc0711a41 in gv_drive_done (bp=0xc6bcc630) at geom.h:290 #11 0xc05038a7 in biodone (bp=0xc6bcc630) at /net/freebsd/6.0-SECURITY/src/sys/kern/vfs_bio.c:2893 #12 0xc0482a26 in g_io_schedule_up (tp=0xc1e8f300) at /net/freebsd/6.0-SECURITY/src/sys/geom/geom_io.c:474 #13 0xc0482c72 in g_up_procbody () at /net/freebsd/6.0-SECURITY/src/sys/geom/geom_kern.c:95 #14 0xc04a5820 in fork_exit (callout=0xc0482c04 , arg=0x0, frame=0xe3245d38) at /net/freebsd/6.0-SECURITY/src/sys/kern/kern_fork.c:789 #15 0xc05f0fec in fork_trampoline () at /net/freebsd/6.0-SECURITY/src/sys/i386/i386/exception.s:208 (kgdb) frame 10 #10 0xc0711a41 in gv_drive_done (bp=0xc6bcc630) at geom.h:290 290 geom.h: No such file or directory. in geom.h (kgdb) print/x $eax $3 = 0x0 (kgdb) disassemble gv_drive_done+0x41 Dump of assembler code for function gv_drive_done: 0xc0711a18 : push %ebp 0xc0711a19 : mov %esp,%ebp 0xc0711a1b : push %edi 0xc0711a1c : push %esi 0xc0711a1d : push %ebx 0xc0711a1e : mov 0x8(%ebp),%ebx 0xc0711a21 : mov 0x44(%ebx),%eax 0xc0711a24 : mov (%eax),%eax 0xc0711a26 : mov 0x3c(%eax),%esi 0xc0711a29 : orb $0x1,0x2(%ebx) 0xc0711a2d : push $0x101 0xc0711a32 : push $0xc0643ca0 0xc0711a37 : push $0xc 0xc0711a39 : call 0xc04b0a30 0xc0711a3e : add $0xc,%esp 0xc0711a41 : mov %ebx,(%eax) 0xc0711a43 : push $0xf6 0xc0711a48 : push $0xc071a752 0xc0711a4d : lea 0x7c(%esi),%ebx which on my system corresponds to %%% static void gv_drive_done(struct bio *bp) { struct gv_drive *d; struct gv_bioq *bq; /* Put the BIO on the worker queue again. */ d = bp->bio_from->geom->softc; bp->bio_cflags |= GV_BIO_DONE; bq = g_malloc(sizeof(*bq), M_NOWAIT | M_ZERO); <--- g_malloc returns NULL bq->bp = bp; mtx_lock(&d->bqueue_mtx); TAILQ_INSERT_TAIL(&d->bqueue, bq, queue); wakeup(d); mtx_unlock(&d->bqueue_mtx); } %%% Now, I don't know if this routine is run in an interrupt context, so I really wouldn't know if it is possible to remove the M_NOWAIT. Any thoughts? >How-To-Repeat: I wish I knew; I've tried running all of 'periodic daily', the backup routines (normally run at 4:00 AM), generating disk load using dd if={gvinum drive} of=/dev/null, all at the same time, and the box just copes (albeit slowly). However this is the 5th day in a row that it crashes during daily maintenance, so something else must also be triggering it. I'll update the PR if I can reproduce it more easily but for now it's "only" once a day... >Fix: Would love to have one. >Release-Note: >Audit-Trail: Responsible-Changed-From-To: freebsd-bugs->le Responsible-Changed-By: le Responsible-Changed-When: Mon Nov 28 12:10:28 GMT 2005 Responsible-Changed-Why: Grab this one. http://www.freebsd.org/cgi/query-pr.cgi?pr=89660 From: Lukas Ertl To: bug-followup@FreeBSD.org, Stijn Hoop Cc: Subject: Re: kern/89660: panic due to g_malloc returning null in gv_drive_done Date: Mon, 28 Nov 2005 13:14:36 +0100 (CET) On Mon, 28 Nov 2005, Stijn Hoop wrote: > bq = g_malloc(sizeof(*bq), M_NOWAIT | M_ZERO); <--- g_malloc > returns NULL > bq->bp = bp; > mtx_lock(&d->bqueue_mtx); > TAILQ_INSERT_TAIL(&d->bqueue, bq, queue); > wakeup(d); > mtx_unlock(&d->bqueue_mtx); > } > %%% > > Now, I don't know if this routine is run in an interrupt context, so > I really wouldn't know if it is possible to remove the M_NOWAIT. g_malloc() returning NULL when specifying M_NOWAIT is a serious issue, but I can't remove the M_NOWAIT since sleeping is not allowed in the geom I/O path. This may be not even a geom_vinum or geom problem. cheers, le -- Lukas Ertl http://homepage.univie.ac.at/l.ertl/ le@FreeBSD.org http://people.freebsd.org/~le/ From: Stijn Hoop To: Lukas Ertl Cc: bug-followup@FreeBSD.org Subject: Re: kern/89660: panic due to g_malloc returning null in gv_drive_done Date: Mon, 28 Nov 2005 14:08:25 +0100 On Mon, Nov 28, 2005 at 01:14:36PM +0100, Lukas Ertl wrote: > On Mon, 28 Nov 2005, Stijn Hoop wrote: > > bq = g_malloc(sizeof(*bq), M_NOWAIT | M_ZERO); <--- g_malloc > > returns NULL > > bq->bp = bp; > > mtx_lock(&d->bqueue_mtx); > > TAILQ_INSERT_TAIL(&d->bqueue, bq, queue); > > wakeup(d); > > mtx_unlock(&d->bqueue_mtx); > >} > >%%% > > > > Now, I don't know if this routine is run in an interrupt context, so > > I really wouldn't know if it is possible to remove the M_NOWAIT. > > g_malloc() returning NULL when specifying M_NOWAIT is a serious issue, but > I can't remove the M_NOWAIT since sleeping is not allowed in the geom I/O > path. > > This may be not even a geom_vinum or geom problem. OK, but where shall I look then? It would be very strange if the system runs out of memory because it has 1G RAM and as far as I can see, all processes are still running. Those would have to have been swapped out in order for the system to run out of memory I think? Do I need to reconfigure a memory pool somehow? I'd really appreciate a hint on how to debug this. As far as I can tell, the hardware is OK (I've been through 3 mainboards and lots of disks already). I'm willing to debug this further, but I'll need some instructions. Thanks in advance, --Stijn -- What would this sentence be like if it weren't self-referential? Adding to audit trail from misfiled PR kern/90100: Date: Thu, 8 Dec 2005 11:17:47 +0100 From: Stijn Hoop Hi, The previous patch had a minor problem on boot, because the drive init was also done somewhere else and I missed that one. The attached patch appears to work on a gvinum mirror volume. Being brave I decided to run this on the machine that panics on random nights. Let's hope it survives for > 1 week. If only I could reproduce the panic :( If this patch is bogus in some way please let me know; it certainly appears to work fine but as stated in my previous e-mail this is my first time hacking the kernel, I may certainly have missed some serious interaction bugs. --Stijn -- "Diane, 2:15 in the afternoon, November 14. Entering town of Twin Peaks. Five miles south of the Canadian border, twelve miles west of the state line. Never seen so many trees in my life. As W.C. Fields would say, I'd rather be here than Philadelphia." -- Special Agent Dale Cooper, "Twin Peaks" --Nq2Wo0NMKNjxTN9z Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename="gvinum-tested.patch" Index: geom_vinum_drive.c =================================================================== RCS file: /net/freebsd/cvsroot/src/sys/geom/vinum/geom_vinum_drive.c,v retrieving revision 1.18.2.2 diff -u -u -r1.18.2.2 geom_vinum_drive.c --- geom_vinum_drive.c 9 Oct 2005 04:34:29 -0000 1.18.2.2 +++ geom_vinum_drive.c 8 Dec 2005 09:28:01 -0000 @@ -79,6 +79,7 @@ d->freelist_entries = 1; TAILQ_INIT(&d->bqueue); + TAILQ_INIT(&d->reserve); mtx_init(&d->bqueue_mtx, "gv_drive", NULL, MTX_DEF); kthread_create(gv_drive_worker, d, NULL, 0, 0, "gv_d %s", d->name); d->flags |= GV_DRIVE_THREAD_ACTIVE; @@ -241,9 +242,11 @@ /* Put the BIO on the worker queue again. */ d = bp->bio_from->geom->softc; bp->bio_cflags |= GV_BIO_DONE; - bq = g_malloc(sizeof(*bq), M_NOWAIT | M_ZERO); - bq->bp = bp; + /* XXX: do not malloc but get the next reserve bioq */ mtx_lock(&d->bqueue_mtx); + bq = TAILQ_FIRST(&d->reserve); + TAILQ_REMOVE(&d->reserve, bq, queue); + bq->bp = bp; TAILQ_INSERT_TAIL(&d->bqueue, bq, queue); wakeup(d); mtx_unlock(&d->bqueue_mtx); @@ -361,6 +364,18 @@ mtx_lock(&d->bqueue_mtx); continue; } + + bq2 = g_malloc(sizeof(*bq2), M_WAITOK | M_ZERO); + if (bq2 == NULL) { + g_io_deliver(bp, ENOMEM); + mtx_lock(&d->bqueue_mtx); + continue; + } + + mtx_lock(&d->bqueue_mtx); + TAILQ_INSERT_HEAD(&d->reserve, bq2, queue); + mtx_unlock(&d->bqueue_mtx); + if (cbp->bio_offset + cbp->bio_length > s->size) cbp->bio_length = s->size - cbp->bio_offset; @@ -498,6 +513,7 @@ d->freelist_entries = 1; TAILQ_INIT(&d->bqueue); + TAILQ_INIT(&d->reserve); /* Save it into the main configuration. */ LIST_INSERT_HEAD(&sc->drives, d, drive); Index: geom_vinum_var.h =================================================================== RCS file: /net/freebsd/cvsroot/src/sys/geom/vinum/geom_vinum_var.h,v retrieving revision 1.8.2.1 diff -u -u -r1.8.2.1 geom_vinum_var.h --- geom_vinum_var.h 19 Aug 2005 08:48:04 -0000 1.8.2.1 +++ geom_vinum_var.h 7 Dec 2005 08:22:49 -0000 @@ -203,6 +203,9 @@ struct g_geom *geom; /* The geom of this drive. */ struct gv_softc *vinumconf; /* Pointer to the vinum conf. */ + + /* XXX: bioq gv_drive_done reserve, also protected by bqueue_mtx */ + TAILQ_HEAD(,gv_bioq) reserve; /* To avoid running out of mem */ }; /* softc for a subdisk. */ --Nq2Wo0NMKNjxTN9z-- Responsible-Changed-From-To: le->freebsd-geom Responsible-Changed-By: linimon Responsible-Changed-When: Mon May 19 20:01:32 UTC 2008 Responsible-Changed-Why: With bugmeister hat on, reassign as le@ has not been active in a while. http://www.freebsd.org/cgi/query-pr.cgi?pr=89660 State-Changed-From-To: open->analyzed State-Changed-By: lulf State-Changed-When: Wed May 21 17:51:56 UTC 2008 State-Changed-Why: - Looking into it. http://www.freebsd.org/cgi/query-pr.cgi?pr=89660 From: Jaakko Heinonen To: bug-followup@FreeBSD.org, stijn@win.tue.nl Cc: Subject: Re: kern/89660: [vinum] [patch] [panic] due to g_malloc returning null in gv_drive_done Date: Thu, 25 Feb 2010 19:50:51 +0200 This has been probably fixed in r190507. Can you confirm? http://www.freebsd.org/cgi/query-pr.cgi?pr=89660 State-Changed-From-To: analyzed->feedback State-Changed-By: jh State-Changed-When: Thu Feb 25 18:30:44 UTC 2010 State-Changed-Why: Note that submitter has been asked for feedback. Responsible-Changed-From-To: freebsd-geom->jh Responsible-Changed-By: jh Responsible-Changed-When: Thu Feb 25 18:30:44 UTC 2010 Responsible-Changed-Why: Track. http://www.freebsd.org/cgi/query-pr.cgi?pr=89660 From: Stijn Hoop To: Jaakko Heinonen Cc: bug-followup@FreeBSD.org, stijn@win.tue.nl Subject: Re: kern/89660: [vinum] [patch] [panic] due to g_malloc returning null in gv_drive_done Date: Fri, 26 Feb 2010 08:46:14 +0100 Hi, On Thu, 25 Feb 2010 19:50:51 +0200 Jaakko Heinonen wrote: > This has been probably fixed in r190507. Can you confirm? > > http://www.freebsd.org/cgi/query-pr.cgi?pr=89660 I'm sorry but the machine in question has been decommissioned and I'm not using FreeBSD anymore on other machines. It would be very hard for me to reproduce this bug, as it was not readily reproducable on the retired hardware back then. My suggestion would be to assume that the bug is no longer present. Considering the amount of changes in http://svn.freebsd.org/viewvc/base?view=revision&revision=190507 I'd say there's no real other option. Thanks for checking on this. Regards, -- Stijn Hoop Helpdeskmanager / Linux system administrator Faculty of Mathematics & Computer Science Eindhoven University of Technology tel. +31402475023 email s.hoop@tue.nl State-Changed-From-To: feedback->closed State-Changed-By: jh State-Changed-When: Fri Feb 26 11:21:04 UTC 2010 State-Changed-Why: Submitter and lulf@ consider this resolved. http://www.freebsd.org/cgi/query-pr.cgi?pr=89660 >Unformatted: