From ruben@erg.verweg.com Tue Sep 16 11:05:20 2008
Return-Path:
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
    by hub.freebsd.org (Postfix) with ESMTP id 155F1106567C
    for ; Tue, 16 Sep 2008 11:05:20 +0000 (UTC)
    (envelope-from ruben@erg.verweg.com)
Received: from erg.verweg.com (erg.verweg.com [217.77.141.129])
    by mx1.freebsd.org (Postfix) with ESMTP id 904E68FC23
    for ; Tue, 16 Sep 2008 11:05:19 +0000 (UTC)
    (envelope-from ruben@erg.verweg.com)
Received: from erg.verweg.com (erg.verweg.com [217.77.141.129])
    by erg.verweg.com (8.14.3/8.14.2) with ESMTP id m8GAsn3E079296
    (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT)
    for ; Tue, 16 Sep 2008 10:54:50 GMT
    (envelope-from ruben@erg.verweg.com)
Received: (from ruben@localhost)
    by erg.verweg.com (8.14.3/8.14.2/Submit) id m8GAsnp9079295;
    Tue, 16 Sep 2008 12:54:49 +0200 (CEST)
    (envelope-from ruben)
Message-Id: <200809161054.m8GAsnp9079295@erg.verweg.com>
Date: Tue, 16 Sep 2008 12:54:49 +0200 (CEST)
From: Ruben van Staveren
Reply-To: Ruben van Staveren
To: FreeBSD-gnats-submit@freebsd.org
Cc:
Subject: panic: Journal overflow on gmirrored gjournal
X-Send-Pr-Version: 3.113
X-GNATS-Notify:

>Number: 127420
>Category: kern
>Synopsis: [geom] [gjournal] [panic] Journal overflow on gmirrored gjournal
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: freebsd-geom
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Tue Sep 16 11:10:02 UTC 2008
>Closed-Date:
>Last-Modified: Thu Sep 16 18:57:40 UTC 2010
>Originator: Ruben van Staveren
>Release: FreeBSD 7.1-PRERELEASE amd64
>Organization:
>Environment:
System: FreeBSD chassis 7.1-PRERELEASE FreeBSD 7.1-PRERELEASE #2: Tue Sep 16 11:29:52 CEST 2008 root@chassis:/opt/obj/usr/cvsup/7-stable/src/sys/CHASSIS-DEBUG amd64
>Description:
Crash 1

panic: Journal overflow (joffset=180955342336 active=180735900160 inactive=180952868864)
cpuid = 1
Uptime: 40m34s
Physical memory: 4085 MB
Dumping 625 MB:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x200
fault code              = supervisor read instruction, page not present
instruction pointer     = 0x8:0x200
stack pointer           = 0x10:0xffffffffae1ece40
frame pointer           = 0x10:0xffffffffae1ece70
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 47 (g_journal mirror/gm)
trap number             = 12

Crash 2 (with debug kernel)

panic: Journal overflow (joffset=180542946816 active=181305220608 inactive=180542008320)
cpuid = 1
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
panic() at panic+0x17d
g_journal_flush() at g_journal_flush+0x8cb
g_journal_worker() at g_journal_worker+0x14ce
fork_exit() at fork_exit+0x12a
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffffffae1edd30, rbp = 0 ---
panic: BUF_UNLOCK 0xffffffff9a26e220 while B_REMFREE is still set.
cpuid = 1
panic: BUF_UNLOCK 0xffffffff9a04b420 while B_REMFREE is still set.
cpuid = 1
Uptime: 20m24s
Physical memory: 4084 MB
Dumping 625 MB:

Unfortunately, dumping no longer succeeds at this stage.

Kernel config: the -DEBUG version just includes that file and adds these
extra options:

options BREAK_TO_DEBUGGER
options INVARIANTS
options INVARIANT_SUPPORT
options WITNESS
options WITNESS_KDB
options DIAGNOSTIC

(I had to disable some KASSERTs in sys/geom/geom_io.c, as gjournal seems to
alter some data there; see also
http://unix.derkeiler.com/Mailing-Lists/FreeBSD/current/2007-08/msg00648.html )

http://ruben.is.verweg.com/stuff/gjournal-panic/CHASSIS
http://ruben.is.verweg.com/stuff/gjournal-panic/dmesg.boot

The machine is a Sun X2100M2 with 2 x 250 GB SATA drives.

Geom name: gm0
State: COMPLETE
Components: 2
Balance: round-robin
Slice: 4096
Flags: NOFAILSYNC
GenID: 0
SyncID: 1
ID: 4042519102
Providers:
1. Name: mirror/gm0
   Mediasize: 250055999488 (233G)
   Sectorsize: 512
   Mode: r6w6e8
Consumers:
1. Name: ad4
   Mediasize: 250056000000 (233G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 1
   Flags: NONE
   GenID: 0
   SyncID: 1
   ID: 2820405034
2. Name: ad6
   Mediasize: 250056000000 (233G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 0
   Flags: NONE
   GenID: 0
   SyncID: 1
   ID: 933275518

Geom name: gjournal 243051746
ID: 243051746
Providers:
1. Name: mirror/gm0s1a.journal
   Mediasize: 3221224960 (3.0G)
   Sectorsize: 512
   Mode: r1w1e2
Consumers:
1. Name: mirror/gm0s1a
   Mediasize: 4294967296 (4.0G)
   Sectorsize: 512
   Mode: r1w1e1
   Jend: 4294966784
   Jstart: 3221224960
   Role: Data,Journal

Geom name: gjournal 3027218344
ID: 3027218344
Providers:
1. Name: mirror/gm0s1d.journal
   Mediasize: 33285996032 (31G)
   Sectorsize: 512
   Mode: r1w1e2
Consumers:
1. Name: mirror/gm0s1d
   Mediasize: 34359738368 (32G)
   Sectorsize: 512
   Mode: r1w1e1
   Jend: 34359737856
   Jstart: 33285996032
   Role: Data,Journal

Geom name: gjournal 1964026446
ID: 1964026446
Providers:
1. Name: mirror/gm0s1e.journal
   Mediasize: 3221224960 (3.0G)
   Sectorsize: 512
   Mode: r1w1e2
Consumers:
1. Name: mirror/gm0s1e
   Mediasize: 4294967296 (4.0G)
   Sectorsize: 512
   Mode: r1w1e1
   Jend: 4294966784
   Jstart: 3221224960
   Role: Data,Journal

Geom name: gjournal 3220754734
ID: 3220754734
Providers:
1. Name: mirror/gm0s1f.journal
   Mediasize: 7516192256 (7.0G)
   Sectorsize: 512
   Mode: r1w1e2
Consumers:
1. Name: mirror/gm0s1f
   Mediasize: 8589934592 (8.0G)
   Sectorsize: 512
   Mode: r1w1e1
   Jend: 8589934080
   Jstart: 7516192256
   Role: Data,Journal

Geom name: gjournal 1120739874
ID: 1120739874
Providers:
1. Name: mirror/gm0s1g.journal
   Mediasize: 180255252480 (168G)
   Sectorsize: 512
   Mode: r1w1e2
Consumers:
1. Name: mirror/gm0s1g
   Mediasize: 181328994816 (169G)
   Sectorsize: 512
   Mode: r1w1e1
   Jend: 181328994304
   Jstart: 180255252480
   Role: Data,Journal

      Name  Status  Components
label/swap     N/A  mirror/gm0s1b
  ufs/root     N/A  mirror/gm0s1a.journal
   ufs/var     N/A  mirror/gm0s1d.journal
   ufs/tmp     N/A  mirror/gm0s1e.journal
   ufs/usr     N/A  mirror/gm0s1f.journal
   ufs/opt     N/A  mirror/gm0s1g.journal

******* Working on device /dev/ad4 *******
parameters extracted from in-core disklabel are:
cylinders=484514 heads=16 sectors/track=63 (1008 blks/cyl)

Figures below won't work with BIOS for partitions not in cyl 1
parameters to be used for BIOS calculations are:
cylinders=484514 heads=16 sectors/track=63 (1008 blks/cyl)

Media sector size is 512
Warning: BIOS sector numbering starts with sector 1
Information from DOS bootblock is:
The data for partition 1 is:
sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD)
    start 63, size 488375937 (238464 Meg), flag 80 (active)
        beg: cyl 0/ head 1/ sector 1;
        end: cyl 703/ head 254/ sector 63
The data for partition 2 is:
The data for partition 3 is:
The data for partition 4 is:

******* Working on device /dev/ad6 *******
parameters extracted from in-core disklabel are:
cylinders=484514 heads=16 sectors/track=63 (1008 blks/cyl)

Figures below won't work with BIOS for partitions not in cyl 1
parameters to be used for BIOS calculations are:
cylinders=484514 heads=16 sectors/track=63 (1008 blks/cyl)

Media sector size is 512
Warning: BIOS sector numbering starts with sector 1
Information from DOS bootblock is:
The data for partition 1 is:
sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD)
    start 63, size 488375937 (238464 Meg), flag 80 (active)
        beg: cyl 0/ head 1/ sector 1;
        end: cyl 703/ head 254/ sector 63
The data for partition 2 is:
The data for partition 3 is:
The data for partition 4 is:

# /dev/mirror/gm0s1:
8 partitions:
#        size     offset    fstype   [fsize bsize bps/cpg]
  a:    8388608         16    4.2BSD     2048 16384 28528
  b:   33554432    8388624      swap
  c:  488375937          0    unused        0     0         # "raw" part, don't edit
  d:   67108864   41943056    4.2BSD     2048 16384 28528
  e:    8388608  109051920    4.2BSD     2048 16384 28528
  f:   16777216  117440528    4.2BSD     2048 16384 28528
  g:  354158193  134217744    4.2BSD     2048 16384 28528

/dev/ufs/root on / (ufs, asynchronous, local, gjournal)
devfs on /dev (devfs, local)
/dev/ufs/opt on /opt (ufs, asynchronous, local, gjournal)
/dev/ufs/tmp on /tmp (ufs, asynchronous, local, gjournal)
/dev/ufs/usr on /usr (ufs, asynchronous, local, gjournal)
/dev/ufs/var on /var (ufs, asynchronous, local, gjournal)
>How-To-Repeat:
In /opt/bonnie, run in parallel:

bonnie++ -c 4 -s 4096 -r 4096 -u nobody -d $PWD

Both bonnie++ processes stall the system in the suspfs/wdrain states until
it panics.

Building a 1 GB nanobsd image also locks up in suspfs/wdrain during the disk
install phase, but that is not always reproducible: the build succeeds about
50% of the time.

The problem seems to take longer to trigger when the debugging options are
enabled.
>Fix:
Maybe don't run a mirrored gjournal on FreeBSD/amd64?
>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-bugs->freebsd-fs
Responsible-Changed-By: linimon
Responsible-Changed-When: Wed Sep 17 15:16:09 UTC 2008
Responsible-Changed-Why:
Over to maintainer(s).

http://www.freebsd.org/cgi/query-pr.cgi?pr=127420

From: Ruben van Staveren
To: bug-followup@FreeBSD.org, ruben@verweg.com
Cc:
Subject: Re: kern/127420: [gjournal] [panic] Journal overflow on gmirrored gjournal
Date: Tue, 23 Sep 2008 11:29:59 +0200

This is an OpenPGP/MIME signed message (RFC 2440 and 3156)
--Apple-Mail-41--958136030
Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes
Content-Transfer-Encoding: 7bit

Hi,

I managed to trigger a new panic. I am still not able to get a proper dump,
but this is a capture from the serial console. I was running a couple of
bonnie++'s beforehand to "exercise" the system. At the time of the panic one
bonnie++ and one nanobsd build were running.
I had enabled the geom mirror and journal debug sysctls. Some minutes before
the actual panic, fsync complained and gjournal was unable to suspend a
filesystem.

http://ruben.is.verweg.com/stuff/gjournal-panic/gjournal-textdump-text-only.txt

Regards,
Ruben

--Apple-Mail-41--958136030
content-type: application/pgp-signature; x-mac-type=70674453; name=PGP.sig
content-description: This is a digitally signed message part
content-disposition: inline; filename=PGP.sig
content-transfer-encoding: 7bit

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (Darwin)

iD8DBQFI2LcXZ88+mcQxRw0RAkkyAJ9VD9cCfWfPusPWCM8sawG/WnVtHQCfXOV8
5Ipf+qF7c1I4JgOPRCHp8rs=
=Mm/k
-----END PGP SIGNATURE-----

--Apple-Mail-41--958136030--

From: Alex Keda
To: bug-followup@FreeBSD.org, ruben@verweg.com
Cc:
Subject: Re: kern/127420: [gjournal] [panic] Journal overflow on gmirrored gjournal
Date: Thu, 28 May 2009 21:45:24 +0400

I have the same problem on my system:

HP$ uname -a
FreeBSD HP.lissyara.su 8.0-CURRENT FreeBSD 8.0-CURRENT #0: Fri May 22 22:14:24 MSD 2009 lissyara@HP.lissyara.su:/usr/obj/usr/src/sys/GENERIC amd64
HP$

To reproduce, just run make buildkernel.

HP# gjournal list
Geom name: gjournal 1458850558
ID: 1458850558
Providers:
1. Name: ad4s1a.journal
   Mediasize: 158913789440 (148G)
   Sectorsize: 512
   Mode: r1w1e1
Consumers:
1. Name: ad4s1a
   Mediasize: 158913789952 (148G)
   Sectorsize: 512
   Mode: r1w1e1
   Role: Data
2. Name: ad4s1d
   Mediasize: 129303552 (123M)
   Sectorsize: 512
   Mode: r1w1e1
   Jend: 129303040
   Jstart: 0
   Role: Journal
HP#

From: Spartak Radchenko
To: bug-followup@FreeBSD.org, ruben@verweg.com
Cc:
Subject: Re: kern/127420: [gjournal] [panic] Journal overflow on gmirrored gjournal
Date: Thu, 09 Jul 2009 18:22:55 +0400

I have the same problem. FreeBSD 7.2-RELEASE amd64, gjournal on a gmirrored
volume (local drive + geom_gate mirrored). I am trying to make something
like an HA cluster using freevrrpd, ggate, gmirror and gjournal.
It generally works, but every time a server running ggated goes down (I use
a hardware reset for testing), the ggate0 device is first removed from the
gmirrored volume on the master, as it should be, and then the master panics
with a "gjournal overflow" message.

Responsible-Changed-From-To: freebsd-fs->freebsd-geom
Responsible-Changed-By: arundel
Responsible-Changed-When: Thu Sep 16 18:56:53 UTC 2010
Responsible-Changed-Why:
This one looks more geom than fs related.

http://www.freebsd.org/cgi/query-pr.cgi?pr=127420
>Unformatted:
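As a cross-check of the figures quoted in this report, the short Python sketch below recomputes each journal's size from the Jstart/Jend values in the gjournal list output and looks up which Jstart..Jend range contains the offsets from the two panic messages. It is only an illustration: the helper names are made up here, and treating joffset/active/inactive as byte offsets on the journal consumer is an assumption, not something the report confirms.

```python
# Jstart/Jend pairs copied from the `gjournal list` output in this report.
JOURNALS = {
    "mirror/gm0s1a": (3221224960, 4294966784),
    "mirror/gm0s1d": (33285996032, 34359737856),
    "mirror/gm0s1e": (3221224960, 4294966784),
    "mirror/gm0s1f": (7516192256, 8589934080),
    "mirror/gm0s1g": (180255252480, 181328994304),
}

# Offsets copied from the two "Journal overflow" panic messages.
PANICS = [
    {"joffset": 180955342336, "active": 180735900160, "inactive": 180952868864},
    {"joffset": 180542946816, "active": 181305220608, "inactive": 180542008320},
]


def journal_size(jstart, jend):
    """Size in bytes of the journal area between Jstart and Jend."""
    return jend - jstart


def containing(offset):
    """Hypothetical helper: consumers whose Jstart..Jend range holds offset.

    Assumes the panic offsets are byte offsets on the journal consumer.
    """
    return [name for name, (start, end) in JOURNALS.items()
            if start <= offset < end]


if __name__ == "__main__":
    for name, (start, end) in JOURNALS.items():
        print(name, journal_size(start, end), "bytes")
    for number, panic in enumerate(PANICS, start=1):
        for field, value in panic.items():
            print("crash", number, field, value, containing(value))
```

With the values above, every journal comes out at 1073741824 bytes (1 GiB), and all six offsets fall inside the mirror/gm0s1g range only. That provider backs ufs/opt, where bonnie++ was run, so the numbers are consistent with the overflow happening on the /opt journal, although they do not prove it.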