From root@wallenda.spg.more.net Mon Apr 2 16:44:53 2007
Return-Path:
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
    by hub.freebsd.org (Postfix) with ESMTP id 522E716A401
    for ; Mon, 2 Apr 2007 16:44:53 +0000 (UTC)
    (envelope-from root@wallenda.spg.more.net)
Received: from wallenda.spg.more.net (wallenda.spg.more.net [204.185.42.133])
    by mx1.freebsd.org (Postfix) with ESMTP id 3341013C44C
    for ; Mon, 2 Apr 2007 16:44:53 +0000 (UTC)
    (envelope-from root@wallenda.spg.more.net)
Received: by wallenda.spg.more.net (Postfix, from userid 0)
    id 75A4D5C3C; Mon, 2 Apr 2007 11:13:33 -0500 (CDT)
Message-Id: <20070402161333.75A4D5C3C@wallenda.spg.more.net>
Date: Mon, 2 Apr 2007 11:13:33 -0500 (CDT)
From: dan@more.net
Reply-To: dan@more.net
To: FreeBSD-gnats-submit@freebsd.org
Cc: dan@more.net
Subject: fsck fails on 6T filesystem
X-Send-Pr-Version: 3.113
X-GNATS-Notify:

>Number:         111146
>Category:       bin
>Synopsis:       [2tb] fsck(8) fails on 6T filesystem
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    freebsd-fs
>State:          suspended
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Apr 02 16:50:00 GMT 2007
>Closed-Date:
>Last-Modified:  Mon May 18 04:33:27 UTC 2009
>Originator:     Dan D Niles
>Release:        FreeBSD 6.2-RELEASE-p3 i386
>Organization:
MOREnet - Missouri Research and Education Network
>Environment:
System: FreeBSD hostname 6.2-RELEASE-p3 FreeBSD 6.2-RELEASE-p3 #5: Wed Mar 28 07:44:39 CDT 2007 root@hostname:/usr/obj/usr/src/sys/BIG_MEM i386
>Description:
I have a 6T filesystem on a server that crashed. I cannot fsck the filesystem.

# fsck -t ufs -y /dev/da0
fsck_ufs: cannot alloc 1993797728 bytes for inoinfo

I also tried:

# fsck -t ufs -f -p /dev/da0
/dev/da0: UNKNOWN FILE TYPE I=11895232
/dev/da0: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.

I built a custom kernel with MAXDSIZ and DFLDSIZ just under 3G, and got the same results.
fsck had about 430M in use when it failed, so the total would be 2332M, which is less than the size allowed (as reported by limits).

NOTE: I have temporarily replaced the server. For a short time I have the crashed filesystem available for testing and debugging code. I have a core dump from the fsck.
>How-To-Repeat:
On a 6T filesystem that has crashed, run:

fsck -t ufs -y /dev/da0
>Fix:
>Release-Note:
>Audit-Trail:

From: Astrodog
To: bug-followup@FreeBSD.org, dan@more.net
Cc:
Subject: Re: bin/111146: fsck fails on 6T filesystem
Date: Wed, 4 Apr 2007 08:13:57 -0500

How much memory do you have in this system? I've found that there is a minimum amount of memory required to fsck large filesystems.

---
Harrison Grundy

From: Dan D Niles
To: Astrodog
Cc: bug-followup@FreeBSD.org
Subject: Re: bin/111146: fsck fails on 6T filesystem
Date: Wed, 04 Apr 2007 09:29:29 -0500

I only have 3G at the moment, but fsck is failing when the resulting memory usage would be 2.3G. I have MAXDSIZ and DFLDSIZ set to 2.8G. I have 2G of swap space, none of which gets used.

I'm getting a little pressure to reformat the array. Is there any debugging you would like me to do?

Thanks for your response,

Dan D Niles

From: Jan Srzednicki
To: bug-followup@FreeBSD.org, dan@more.net
Cc:
Subject: Re: bin/111146: fsck fails on 6Tfilesystem
Date: Sun, 8 Apr 2007 21:24:55 +0200

Hi,

First of all, show the output of both "ulimit -Sa" and "ulimit -Ha". It is possible that you may need to raise the soft limit manually.

If the values are all right, try running fsck with strace/truss and show the result.
--
Jan Srzednicki :: http://wrzask.pl/
"Remember, remember, the fifth of November"
  -- V for Vendetta

From: Dan D Niles
To: Jan Srzednicki
Cc: bug-followup@FreeBSD.org
Subject: Re: bin/111146: fsck fails on 6Tfilesystem
Date: Mon, 09 Apr 2007 11:12:46 -0500

# ulimit -Sa
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) 2935808
file size               (blocks, -f) unlimited
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 11095
pipe size            (512 bytes, -p) 1
stack size              (kbytes, -s) 65536
cpu time               (seconds, -t) unlimited
max user processes              (-u) 5547
virtual memory          (kbytes, -v) unlimited

# ulimit -Ha
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) 2935808
file size               (blocks, -f) unlimited
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 11095
pipe size            (512 bytes, -p) 1
stack size              (kbytes, -s) 65536
cpu time               (seconds, -t) unlimited
max user processes              (-u) 5547
virtual memory          (kbytes, -v) unlimited

I've ordered a SCSI card to move the raid device to a server that I can bring up to 8G of RAM. I'm hoping the card gets here before I need to give the array back.

I'll run fsck with truss and see what I find out.

Thanks,

Dan

From: Dan D Niles
To: Jan Srzednicki
Cc: bug-followup@FreeBSD.org
Subject: Re: bin/111146: fsck fails on 6Tfilesystem
Date: Mon, 09 Apr 2007 11:27:10 -0500

On Sun, 2007-04-08 at 21:24 +0200, Jan Srzednicki wrote:
>
> If the values are all right, try running fsck with strace/truss and show
> the result.
>

I added a debugging print statement to fsck_ffs, and sent it a SIGINFO every two seconds. Here is the tail of the output, and the tail of the truss output.

It seems like it is allocating space for < 10k inodes at a time until it fails. When it fails it is trying to allocate space for 1.5g inodes. Is that normal?
/dev/da0: phase 1: cyl group 2223 of 33666 (6%)
Trying to calloc space for 2240 inodes
Trying to calloc space for 384 inodes
Trying to calloc space for 384 inodes
Trying to calloc space for 448 inodes
Trying to calloc space for 6208 inodes
Trying to calloc space for 384 inodes
Trying to calloc space for 384 inodes
Trying to calloc space for 384 inodes
Trying to calloc space for 768 inodes
Trying to calloc space for 384 inodes
Trying to calloc space for 448 inodes
Trying to calloc space for 384 inodes
Trying to calloc space for 384 inodes
Trying to calloc space for 384 inodes
Trying to calloc space for 384 inodes
Trying to calloc space for 384 inodes
Trying to calloc space for 384 inodes
Trying to calloc space for 384 inodes
Trying to calloc space for 384 inodes
Trying to calloc space for 448 inodes
Trying to calloc space for 384 inodes
Trying to calloc space for 384 inodes
Trying to calloc space for 384 inodes
Trying to calloc space for 384 inodes
Trying to calloc space for 384 inodes
Trying to calloc space for 448 inodes
Trying to calloc space for 4032 inodes
Trying to calloc space for 6208 inodes
Trying to calloc space for 1664 inodes
/dev/da0: phase 1: cyl group 2252 of 33666 (6%)
Trying to calloc space for 3584 inodes
/dev/da0: phase 1: cyl group 2253 of 33666 (6%)
Trying to calloc space for 448 inodes
Trying to calloc space for 3648 inodes
Trying to calloc space for 384 inodes
Trying to calloc space for 4352 inodes
Trying to calloc space for 384 inodes
Trying to calloc space for 5376 inodes
Trying to calloc space for 384 inodes
Trying to calloc space for 448 inodes
Trying to calloc space for 384 inodes
Trying to calloc space for 448 inodes
Trying to calloc space for 1572191256 inodes
fsck_ffs: cannot alloc 1993797728 bytes for inoinfo

 1919: break(0x22ab2000) = 0 (0x0)
 1919: break(0x22ab3000) = 0 (0x0)
 1919: lseek(4,0x6570640000,SEEK_SET) = 1885601792 (0x70640000)
 1919: read(4,"\M-mA\^D\0\M-k\^C\0\0\M-j\^C\0\0"...,65536) = 65536 (0x10000)
 1919: lseek(4,0x657bdf0000,SEEK_SET) = 2078212096 (0x7bdf0000)
 1919: read(4,"\0\0\0\0U\^B\t\0004\^[\^EF\M-V\b"...,16384) = 16384 (0x4000)
 1919: write(1,"Trying to calloc space for 448 i"...,38) = 38 (0x26)
 1919: lseek(4,0x657bdf4000,SEEK_SET) = 2078228480 (0x7bdf4000)
 1919: read(4,"\M-mA\^B\0\M-k\^C\0\0\M-j\^C\0\0"...,65536) = 65536 (0x10000)
 1919: break(0x22ab4000) = 0 (0x0)
 1919: lseek(4,0x657be04000,SEEK_SET) = 2078294016 (0x7be04000)
 1919: read(4,"\0\0\0\0000\0\0\0000\0\0\0\0\0\0"...,65536) = 65536 (0x10000)
 1919: lseek(4,0x65875b4000,SEEK_SET) = -2024062976 (0x875b4000)
 1919: read(4,"\0\0\M-'\M-K,\M^H\M-:\M-Q*\^C\0"...,16384) = 16384 (0x4000)
 1919: write(1,"Trying to calloc space for 15721"...,45) = 45 (0x2d)
 1919: write(2,"fsck_ffs: ",10) = 10 (0xa)
 1919: write(2,"cannot alloc 1993797728 bytes fo"...,41) = 41 (0x29)
 1919: write(2,"\n",1) = 1 (0x1)
 1919: exit(0x8)
 1919: process exit, rval = 2048

From: Jan Srzednicki
To: Dan D Niles
Cc: bug-followup@FreeBSD.org
Subject: Re: bin/111146: fsck fails on 6Tfilesystem
Date: Mon, 9 Apr 2007 21:48:52 +0200

> It seems like it is allocating space for < 10k inodes at a time until it
> fails. When it fails it is trying to allocate space for 1.5g inodes.
> Is that normal?

Check with dumpfs how many inodes there are in your filesystem.

--
Jan Srzednicki :: http://wrzask.pl/
"Remember, remember, the fifth of November"
  -- V for Vendetta

From: Dan D Niles
To: Jan Srzednicki
Cc: bug-followup@FreeBSD.org
Subject: Re: bin/111146: fsck fails on 6Tfilesystem
Date: Mon, 09 Apr 2007 15:09:28 -0500

On Mon, 2007-04-09 at 21:48 +0200, Jan Srzednicki wrote:
> Check with dumpfs how many inodes there are in your filesystem.

dumpfs seg-faulted and dumped core.
It spit out this info before core dumping:

magic   19540119 (UFS2) time    Wed Mar 28 14:00:00 2007
superblock location     65536   id      [ 43d90071 e579e310 ]
ncg     33666   size    3167475584      blocks  3067823920
bsize   16384   shift   14      mask    0xffffc000
fsize   2048    shift   11      mask    0xfffff800
frag    8       shift   3       fsbtodb 2
minfree 8%      optim   time    symlinklen 120
maxbsize 16384  maxbpg  2048    maxcontig 8     contigsumsize 8
nbfree  159788467       ndir    2581658 nifree  784218256       nffree  1488762
bpg     11761   fpg     94088   ipg     23552
nindir  2048    inopb   64      maxfilesize     140806241583103
sbsize  2048    cgsize  16384   csaddr  3000    cssize  540672
sblkno  40      cblkno  48      iblkno  56      dblkno  3000
cgrotor 28218   fmod    0       ronly   0       clean   0
avgfpdir 64     avgfilesize 16384
flags   unclean
fsmnt   /LSO
volname         swuid   0

cs[].cs_(nbfree,ndir,nifree,nffree):
        (4606,234,23288,6) (3955,223,23288,24) (80,0,23223,753) (3,226,23298,8)
        (16,87,23338,81) (3,227,23298,7) (2436,185,23340,19) (4330,891,21577,21)
        (3971,170,23288,6) (1967,186,23336,33) (1812,177,23342,48) (6639,199,23324,50)
        (6084,236,23288,16) (5213,224,23300,16) (5211,232,23287,19) (6042,237,23288,8)
        (5213,236,23288,11) (5213,237,23288,10) (6120,237,23288,59) (1363,226,23298,219)
        (5193,235,23288,60) (4,227,23298,8) (3059,197,23298,30) (5218,199,23288,9)
        (6137,363,22338,9) (5221,174,23288,9) (5213,200,23288,48) (4323,199,23288,42)
[clipped]

From: Jan Srzednicki
To: Dan D Niles
Cc: bug-followup@FreeBSD.org
Subject: Re: bin/111146: fsck fails on 6Tfilesystem
Date: Mon, 9 Apr 2007 22:13:36 +0200

On Mon, Apr 09, 2007 at 03:09:28PM -0500, Dan D Niles wrote:
> On Mon, 2007-04-09 at 21:48 +0200, Jan Srzednicki wrote:
> > Check with dumpfs how many inodes there are in your filesystem.
>
> dumpfs seg-faulted and dumped core. It spit out this info before core
> dumping:

That's kinda strange, dumpfs never did that to me. It appears to me that this filesystem has got quite severely corrupted. Did you try newfs on it?

And another thing: try tuning up the -i, -f and -b parameters to newfs.
I assume that on such a big filesystem average filesize will be much bigger than the "UNIX default" (10k), so you can safely set these to their maximums (and allocate inodes more sparsely).

--
Jan Srzednicki :: http://wrzask.pl/
"Remember, remember, the fifth of November"
  -- V for Vendetta

From: Dan D Niles
To: Jan Srzednicki
Cc: bug-followup@FreeBSD.org
Subject: Re: bin/111146: fsck fails on 6Tfilesystem
Date: Mon, 09 Apr 2007 15:30:23 -0500

On Mon, 2007-04-09 at 22:13 +0200, Jan Srzednicki wrote:
> That's kinda strange, dumpfs never did that to me. It appears to me that
> this filesystem has got quite severely corrupted. Did you try newfs on it?

Not yet. I'd like to figure out why I can't fsck it first. Running newfs on your backup disk is not a viable solution. There is data I cannot pull off the disk. If my primary storage had crashed also, I'd be hosed.

> And another thing: try tuning up the -i, -f and -b parameters to newfs.
> I assume that on such a big filesystem average filesize will be much
> bigger than the "UNIX default" (10k), so you can safely set these to
> their maximums (and allocate inodes more sparsely).

Running df reports 8683374 inodes used and 784218256 free. This could be wrong since the filesystem is dirty and mounted ro.

FreeBSD's newfs scales things automatically, though perhaps not enough:

tunefs: maximum blocks per file in a cylinder group: (-e) 2048
tunefs: average file size: (-f) 16384
tunefs: average number of files in a directory: (-s) 64
tunefs: minimum percentage of free space: (-m) 8%
tunefs: optimization preference: (-o) time

From: Jan Srzednicki
To: Dan D Niles
Cc: bug-followup@FreeBSD.org
Subject: Re: bin/111146: fsck fails on 6Tfilesystem
Date: Mon, 9 Apr 2007 22:39:52 +0200

On Mon, Apr 09, 2007 at 03:30:23PM -0500, Dan D Niles wrote:
> On Mon, 2007-04-09 at 22:13 +0200, Jan Srzednicki wrote:
> > That's kinda strange, dumpfs never did that to me. It appears to me that
> > this filesystem has got quite severely corrupted. Did you try newfs on it?
>
> Not yet. I'd like to figure out why I can't fsck it first. Running
> newfs on your backup disk is not a viable solution. There is data I
> cannot pull off the disk. If my primary storage had crashed also, I'd be
> hosed.

Well, you need to take into account that your data may be hosed. Back up your primary storage NOW. :)

> > And another thing: try tuning up the -i, -f and -b parameters to newfs.
> > I assume that on such a big filesystem average filesize will be much
> > bigger than the "UNIX default" (10k), so you can safely set these to
> > their maximums (and allocate inodes more sparsely).
>
> Running df reports 8683374 inodes used and 784218256 free. This could
> be wrong since the filesystem is dirty and mounted ro.
>
> FreeBSD's newfs scales things automatically, though perhaps not enough:

It does not scale anything. Last time I checked (a few years ago) even the -g option did not make any difference, so I had to tune things up manually with -i, -f and -b.

> tunefs: maximum blocks per file in a cylinder group: (-e) 2048
> tunefs: average file size: (-f) 16384
> tunefs: average number of files in a directory: (-s) 64
> tunefs: minimum percentage of free space: (-m) 8%
> tunefs: optimization preference: (-o) time

These are the default values for any filesystem, regardless of its size.

--
Jan Srzednicki :: http://wrzask.pl/
"Remember, remember, the fifth of November"
  -- V for Vendetta

From: Dan D Niles
To: bug-followup@FreeBSD.org
Cc: Harrison Grundy, Jan Srzednicki
Subject: Re: bin/111146: fsck fails on 6Tfilesystem
Date: Mon, 16 Apr 2007 14:08:57 -0500

I attached the failed raid device to a newer server with 8G of RAM. I booted to an amd64 kernel, and set datasize limit to 7G.
Resource limits (current):
  cputime          infinity secs
  filesize         infinity kB
  datasize          7340032 kB
  stacksize-cur        8192 kB
  coredumpsize     infinity kB
  memoryuse-cur     8093236 kB
  memorylocked-cur  1299644 kB
  maxprocesses         6164
  openfiles           12328
  sbsize           infinity bytes
  vmemoryuse       infinity kB

Now when I run fsck I get:

** /dev/da0
** Last Mounted on /LSO
** Phase 1 - Check Blocks and Sizes
fsck_ffs: bad inode number 53321728 to nextinode

My theory is that some bits got flipped in the meta-data and cg_initediblk is getting a bad value. The value of 1,572,191,256 that it returns just before it fails is greater than the total number of inodes, which is around 792,900,000 (df reports 8,683,374 used and 784,218,256 free).

It is distressing that some bits in the meta-data could get flipped during normal usage, resulting in an unusable filesystem.

I have 19 hours before I need to reformat the array and put it back into production. Is there anything else I should try before then?

Thanks,

Dan

State-Changed-From-To: open->feedback
State-Changed-By: linimon
State-Changed-When: Wed Apr 25 22:28:39 UTC 2007
State-Changed-Why:
To submitter: did the fsck fix this problem?

Responsible-Changed-From-To: freebsd-bugs->linimon
Responsible-Changed-By: linimon
Responsible-Changed-When: Wed Apr 25 22:28:39 UTC 2007
Responsible-Changed-Why:

http://www.freebsd.org/cgi/query-pr.cgi?pr=111146

State-Changed-From-To: feedback->suspended
State-Changed-By: linimon
State-Changed-When: Thu Apr 26 23:02:37 UTC 2007
State-Changed-Why:
Submitter had to format the drive, so we can't duplicate this right now. Set this to 'suspended' to note that it is a problem that probably still needs investigating.

http://www.freebsd.org/cgi/query-pr.cgi?pr=111146

Responsible-Changed-From-To: linimon->freebsd-bugs
Responsible-Changed-By: linimon
Responsible-Changed-When: Tue Jun 12 03:40:30 UTC 2007
Responsible-Changed-Why:
Return this one to the pool.

http://www.freebsd.org/cgi/query-pr.cgi?pr=111146

Responsible-Changed-From-To: freebsd-bugs->freebsd-fs
Responsible-Changed-By: linimon
Responsible-Changed-When: Mon May 18 04:33:11 UTC 2009
Responsible-Changed-Why:
Over to maintainer(s).

http://www.freebsd.org/cgi/query-pr.cgi?pr=111146
>Unformatted:
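
The figures quoted in the thread can be cross-checked with a little arithmetic. The Python sketch below is only an illustration: the constants are copied from the dumpfs, df, and fsck output above, while the helper names and the 4-byte per-inode record size are assumptions made for this example and are not taken from the fsck_ffs source. It also assumes that a per-cylinder-group inode count cannot sensibly exceed ipg.

```python
"""Cross-check a few numbers quoted in the thread (illustrative only)."""

NCG = 33666                  # cylinder groups, from dumpfs "ncg"
IPG = 23552                  # inodes per cylinder group, from dumpfs "ipg"
DF_INODES_USED = 8683374     # from df
DF_INODES_FREE = 784218256   # from df (same value as dumpfs "nifree")
CALLOC_COUNT = 1572191256    # "Trying to calloc space for ... inodes"
ALLOC_BYTES = 1993797728     # "cannot alloc ... bytes for inoinfo"
BAD_INODE = 53321728         # "bad inode number ... to nextinode"
ASSUMED_RECORD_SIZE = 4      # ASSUMPTION: bytes per in-memory inode record


def total_inodes(ncg: int, ipg: int) -> int:
    """Total inode slots implied by the superblock geometry."""
    return ncg * ipg


def plausible_group_count(count: int, ipg: int) -> bool:
    """A per-group inode count should lie between 0 and ipg."""
    return 0 <= count <= ipg


def wrapped_size(count: int, record_size: int, bits: int = 32) -> int:
    """Byte count after truncation to an unsigned integer of `bits` bits."""
    return (count * record_size) % (1 << bits)


if __name__ == "__main__":
    total = total_inodes(NCG, IPG)
    print("inodes from ncg * ipg        :", total)
    print("inodes from df (used + free) :", DF_INODES_USED + DF_INODES_FREE)
    print("count plausible for one group:",
          plausible_group_count(CALLOC_COUNT, IPG))
    print("count exceeds whole fs       :", CALLOC_COUNT > total)
    print("32-bit wrapped size matches  :",
          wrapped_size(CALLOC_COUNT, ASSUMED_RECORD_SIZE) == ALLOC_BYTES)
    print("bad inode as (group, offset) :", divmod(BAD_INODE, IPG))
```

Under these assumptions the geometry gives 792,901,632 inodes, two more than the df total, so the reported count of 1,572,191,256 is roughly twice the size of the whole filesystem and far above the 23,552 allowed per group. Multiplying that count by 4 and truncating to 32 bits reproduces the 1,993,797,728 bytes in the error message exactly, which is consistent with (but does not prove) a size calculation wrapping on i386. The inode number from the later amd64 run divides evenly by ipg, which would place it at the start of cylinder group 2264.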