From nobody@FreeBSD.org Fri Aug 5 19:16:50 2005
Return-Path:
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
    by hub.freebsd.org (Postfix) with ESMTP id DDE3816A41F
    for ; Fri, 5 Aug 2005 19:16:50 +0000 (GMT)
    (envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (www.freebsd.org [216.136.204.117])
    by mx1.FreeBSD.org (Postfix) with ESMTP id 9986043D4C
    for ; Fri, 5 Aug 2005 19:16:50 +0000 (GMT)
    (envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (localhost [127.0.0.1])
    by www.freebsd.org (8.13.1/8.13.1) with ESMTP id j75JGopS042429
    for ; Fri, 5 Aug 2005 19:16:50 GMT
    (envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
    by www.freebsd.org (8.13.1/8.13.1/Submit) id j75JGoUY042428;
    Fri, 5 Aug 2005 19:16:50 GMT
    (envelope-from nobody)
Message-Id: <200508051916.j75JGoUY042428@www.freebsd.org>
Date: Fri, 5 Aug 2005 19:16:50 GMT
From: David Kirchner
To: freebsd-gnats-submit@FreeBSD.org
Subject: 5.4-STABLE unresponsive during background fsck 2TB partition
X-Send-Pr-Version: www-2.3

>Number:         84589
>Category:       kern
>Synopsis:       [2TB] 5.4-STABLE unresponsive during background fsck 2TB partition
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    jh
>State:          closed
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri Aug 05 19:20:11 GMT 2005
>Closed-Date:    Sun Dec 12 09:24:56 UTC 2010
>Last-Modified:  Sun Dec 12 09:24:56 UTC 2010
>Originator:     David Kirchner
>Release:        FreeBSD 5.4-STABLE as of 20050729
>Organization:
>Environment:
FreeBSD host 5.4-STABLE FreeBSD 5.4-STABLE #0: Thu Aug 4 05:14:16 PDT 2005 root@:/usr/src/sys/i386/compile/STD i386

kernel config at: http://dpk.net/STD
dmesg at: http://dpk.net/dmesg.fsck_problem
>Description:
After a large filesystem is marked dirty (due to a panic or a ^C'd fsck) and the server is rebooted, the background fsck starts. Approximately 1-2 minutes later the server slows down.
Eventually, within about 5-10 minutes, all disk access stops working, and the server becomes unresponsive even to hitting return in bash. You can still ping the server, and if you connect over SSH it will still go through all the motions, right up until it is about to spawn login. This happens even though the partition being fsck'd is not in use.

As far as I can tell it will never recover. I've given it over 12 hours. It doesn't panic, unfortunately, or give any indication on the console why it is having trouble.

fsck works fine when you run it from the command line, in the foreground.
>How-To-Repeat:
Install 5.4-STABLE on a multi-TB server, creating a 36GB / partition and 1 or more 2TB partitions (you will need to use auto-carving). Format the large partitions with soft updates enabled. Use UFS1. Leave the large target partition completely empty.

Unmount the target partition. Start "fsck /dev/whatever", and hit ^C part way through. Verify it says "FILE SYSTEM MARKED DIRTY".

Reboot. Log in again to monitor the server. It will eventually stop responding to your commands.
>Fix:
Disable background fsck in /etc/rc.conf:

background_fsck="NO"

It may be that using UFS2 also fixes the problem (but we've had other issues with that; I'll open another PR when I can reproduce them).
>Release-Note:
>Audit-Trail:

From: David Kirchner
To: bug-followup@freebsd.org
Cc:
Subject: Re: i386/84589: 5.4-STABLE unresponsive during background fsck 2TB partition
Date: Fri, 30 Dec 2005 10:07:28 -0800

This bug has been reproduced on a different server (similar hardware) running 6.0-RELEASE and UFS2. I forgot to disable background fsck on the server (big d'oh!), and about 12 hours after the server rebooted, access to the disk started slowing down. The server eventually became completely unresponsive, forcing a reboot. The reboot took about 2 minutes to take effect, probably because the server was "busy" with the fsck.
I was able to log in to it before it locked up, and tried ktrace'ing the fsck_ffs process. It showed no activity. I suspect it deadlocked against something else. Unfortunately the server was an NFS server, so the NFS client also had to be rebooted due to a separate NFS client deadlock bug.

The how-to-repeat is the same: the ^C fsck step is just there to produce a dirty filesystem. Really, really easy to duplicate.

The workaround is the same: disable background_fsck on all 5.4 or 6.0 servers (or on any server capable of performing a background fsck).

FWIW: The foreground fsck takes far less than 12 hours to complete.

Responsible-Changed-From-To: freebsd-bugs->freebsd-fs
Responsible-Changed-By: linimon
Responsible-Changed-When: Mon May 18 04:33:51 UTC 2009
Responsible-Changed-Why:
Over to maintainer(s).

http://www.freebsd.org/cgi/query-pr.cgi?pr=84589

State-Changed-From-To: open->feedback
State-Changed-By: jh
State-Changed-When: Fri Oct 29 08:15:25 UTC 2010
State-Changed-Why:
Is this still a problem for you? r184934 might have improved the snapshot creation on large file systems.

Responsible-Changed-From-To: freebsd-fs->jh
Responsible-Changed-By: jh
Responsible-Changed-When: Fri Oct 29 08:15:25 UTC 2010
Responsible-Changed-Why:
Track.

http://www.freebsd.org/cgi/query-pr.cgi?pr=84589

State-Changed-From-To: feedback->closed
State-Changed-By: jh
State-Changed-When: Sun Dec 12 09:24:55 UTC 2010
State-Changed-Why:
Feedback timeout.

http://www.freebsd.org/cgi/query-pr.cgi?pr=84589
>Unformatted:
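
The workaround from the Fix field (background_fsck="NO" in /etc/rc.conf) could be applied with a small script along these lines. This is only a sketch, assuming a POSIX shell: the helper name set_background_fsck_no and the scratch file path are hypothetical and not part of FreeBSD, and the demonstration edits a temporary file rather than the live /etc/rc.conf.

```shell
#!/bin/sh
# Sketch only: ensure an rc.conf-style file contains background_fsck="NO".
# set_background_fsck_no is a hypothetical helper, not a FreeBSD utility.
set_background_fsck_no() {
    conf="$1"
    tmp="${conf}.tmp.$$"
    # Drop any existing background_fsck assignment so only one remains.
    if [ -f "$conf" ]; then
        grep -v '^[[:space:]]*background_fsck=' "$conf" > "$tmp"
    else
        : > "$tmp"
    fi
    # Append the override described in the Fix field.
    echo 'background_fsck="NO"' >> "$tmp"
    mv "$tmp" "$conf"
}

# Demonstrate on a scratch file (assumed path), not on /etc/rc.conf.
demo_conf="$(mktemp /tmp/rc.conf.demo.XXXXXX)"
printf 'hostname="host"\nbackground_fsck="YES"\n' > "$demo_conf"
set_background_fsck_no "$demo_conf"
cat "$demo_conf"
```

To apply it for real, the helper would be run as root against /etc/rc.conf. The setting is read at boot, so a dirty filesystem is then checked in the foreground before the system comes up, which is the case reported above as working.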