From nobody@FreeBSD.org Mon Jan 7 00:46:37 2002
Return-Path:
Received: from freefall.freebsd.org (freefall.FreeBSD.org [216.136.204.21])
    by hub.freebsd.org (Postfix) with ESMTP id 9634C37B41B
    for ; Mon, 7 Jan 2002 00:46:36 -0800 (PST)
Received: (from nobody@localhost)
    by freefall.freebsd.org (8.11.6/8.11.6) id g078kao61432;
    Mon, 7 Jan 2002 00:46:36 -0800 (PST)
    (envelope-from nobody)
Message-Id: <200201070846.g078kao61432@freefall.freebsd.org>
Date: Mon, 7 Jan 2002 00:46:36 -0800 (PST)
From: Karsten Thygesen
To: freebsd-gnats-submit@FreeBSD.org
Subject: Panic: vm_page_unwire: invalid wire count: 0
X-Send-Pr-Version: www-1.0

>Number: 33637
>Category: kern
>Synopsis: Panic: vm_page_unwire: invalid wire count: 0
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: dillon
>State: closed
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Mon Jan 07 00:50:01 PST 2002
>Closed-Date: Sun Jul 14 17:58:45 PDT 2002
>Last-Modified: Sun Jul 14 17:58:45 PDT 2002
>Originator: Karsten Thygesen
>Release: 4.5-prerelease (CVS pr. 2001-12-25)
>Organization:
Sonofon
>Environment:
FreeBSD abnew01.sonofon.dk 4.5-PRERELEASE FreeBSD 4.5-PRERELEASE #6: Wed Dec 26 00:58:04 CET 2001 root@abnew01.sonofon.dk:/usr/obj/usr/src/sys/ABNEW01 i386
>Description:
Server crashes after 3-7 days of uptime. It's a 4 CPU Compaq Proliant
server with 3GB memory and 1.8Tb scsi disks. It's running as (diablo)
newsserver and is medium loaded.

The error message is:

panic: vm_page_unwire: invalid wire count: 0
mp_lock = 01000001; cpuid = 1; lapic.id = 00000000;
boot() called on cpu#1

syncing disks... 234 36 6 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
giving up on 2 buffers
Uptime: 3d23h57m20s
Automatic reboot in 15 seconds - press a key on the console to abort
Rebooting...
cpu_reset called on cpu#1
cpu_reset: Stopping other CPUs
cpu_reset: Restarting BSP
cpu_reset_proxy: Grabbed mp lock for BSP
cpu_reset_proxy: Stopped CPU 1
>How-To-Repeat:
It has happened 3 times now
>Fix:
None known
>Release-Note:
>Audit-Trail:

From: "Ted Mittelstaedt"
To: ,
Cc:
Subject: Re: kern/33637: Panic: vm_page_unwire: invalid wire count: 0
Date: Wed, 9 Jan 2002 01:09:01 -0800

What is the history of this system? Has it run prior versions of
FreeBSD without problems? Does this problem happen with a
uniprocessor kernel?

Ted Mittelstaedt
tedm@toybox.placo.com

From: Karsten Thygesen
To: 'Ted Mittelstaedt' , freebsd-gnats-submit@FreeBSD.org, Karsten Thygesen
Cc:
Subject: RE: kern/33637: Panic: vm_page_unwire: invalid wire count: 0
Date: Wed, 9 Jan 2002 10:58:09 +0100

Hi

The server is a news server in production. It was running INN as
newsserver software for more than 6 months using FreeBSD-4.3-stable.
Then I started to roll in diablo (also newsserver software) on the same
server and then I started to see crashes with the same error message -
daily!

I then updated to the latest 4.5 and the system was more stable again.
I shut down INN completely and migrated 100% to diablo and now the
system is running 3-7 days between crashes.

I have not tried a uniprocessor kernel and as this is a production
system, it's not that easy to try - further, I fear that a single cpu
is enough for the current load.

Karsten

-----Original Message-----
From: Ted Mittelstaedt [mailto:tedm@toybox.placo.com]
Sent: Wednesday, January 09, 2002 10:09 AM
To: freebsd-gnats-submit@FreeBSD.org; kay@sonofon.dk
Subject: Re: kern/33637: Panic: vm_page_unwire: invalid wire count: 0

What is the history of this system? Has it run prior versions of
FreeBSD without problems? Does this problem happen with a
uniprocessor kernel?
Ted Mittelstaedt
tedm@toybox.placo.com

From: "Ted Mittelstaedt"
To: "Karsten Thygesen" ,
Cc:
Subject: RE: kern/33637: Panic: vm_page_unwire: invalid wire count: 0
Date: Wed, 9 Jan 2002 02:38:47 -0800

In summary:

The server ran fine for 6 months using INN.

The server is now broken running Diablo.

Why is this a FreeBSD problem? If it was fine without Diablo, and
crashes with Diablo, then the problem is Diablo!!!!

I'd recommend that this PR be suspended until such time that the Diablo
developers have had a chance to respond to this, and explain why the
problem is FreeBSD when the FreeBSD server only started crashing after
Diablo was run on it.

Ted Mittelstaedt
tedm@toybox.placo.com

From: Peter Pentchev
To: Ted Mittelstaedt
Cc: freebsd-gnats-submit@FreeBSD.org
Subject: Re: kern/33637: Panic: vm_page_unwire: invalid wire count: 0
Date: Wed, 9 Jan 2002 12:45:37 +0200

On Wed, Jan 09, 2002 at 02:40:02AM -0800, Ted Mittelstaedt wrote:
>
> In summary:
>
> The server ran fine for 6 months using INN.
>
> The server is now broken running Diablo.
>
> Why is this a FreeBSD problem? If it was fine without Diablo, and
> crashes with Diablo, then the problem is Diablo!!!!
>
> I'd recommend that this PR be suspended until such time that the Diablo
> developers have had a chance to respond to this, and explain why the
> problem is FreeBSD when the FreeBSD server only started crashing after
> Diablo was run on it.

An application should not cause a kernel panic if it only uses
the system calls documented in section 2 or the library functions
documented in section 3 of the manual. I highly doubt that the Diablo
developers are meddling with kernel structures directly, therefore
it is indeed a FreeBSD problem if a kernel panic occurs.

G'luck,
Peter

--
What would this sentence be like if pi were 3?

From: Karsten Thygesen
To: 'Ted Mittelstaedt' , Karsten Thygesen , freebsd-gnats-submit@FreeBSD.org
Cc:
Subject: RE: kern/33637: Panic: vm_page_unwire: invalid wire count: 0
Date: Wed, 9 Jan 2002 12:37:32 +0100

It is a freebsd problem as diablo runs as an ordinary user without
special privileges. No user program should be able to trigger a kernel
fault unless it is the kernel's fault, right?

You can not blame this on diablo....

Karsten

-----Original Message-----
From: Ted Mittelstaedt [mailto:tedm@toybox.placo.com]
Sent: Wednesday, January 09, 2002 11:39 AM
To: Karsten Thygesen; freebsd-gnats-submit@FreeBSD.org
Subject: RE: kern/33637: Panic: vm_page_unwire: invalid wire count: 0

In summary:

The server ran fine for 6 months using INN.

The server is now broken running Diablo.

Why is this a FreeBSD problem? If it was fine without Diablo, and
crashes with Diablo, then the problem is Diablo!!!!

I'd recommend that this PR be suspended until such time that the Diablo
developers have had a chance to respond to this, and explain why the
problem is FreeBSD when the FreeBSD server only started crashing after
Diablo was run on it.

Ted Mittelstaedt
tedm@toybox.placo.com

From: "Ted Mittelstaedt"
To: "Peter Pentchev"
Cc:
Subject: RE: kern/33637: Panic: vm_page_unwire: invalid wire count: 0
Date: Wed, 9 Jan 2002 04:22:39 -0800

Any application is able to cause a kernel panic with just regular
library calls. Of course they shouldn't do it, but they can if they
want. This is one reason login.conf exists.

The Diablo support forum is a more appropriate place to start your
troubleshooting. Matt Dillon (originator of Diablo) has done a lot of
work in the FreeBSD virtual memory system and Diablo was originally
developed on FreeBSD.

Please, please, don't try to do an end-run around the Diablo support
and development team, they really are your best resource for getting
this fixed in a timely manner!

Ted Mittelstaedt
tedm@toybox.placo.com

Responsible-Changed-From-To: freebsd-bugs->dillon
Responsible-Changed-By: sheldonh
Responsible-Changed-When: Wed Jan 9 05:53:49 PST 2002
Responsible-Changed-Why:
Matt knows a thing or two about diablo _and_ the FreeBSD VM
subsystem. :-)

http://www.FreeBSD.org/cgi/query-pr.cgi?pr=33637

State-Changed-From-To: open->feedback
State-Changed-By: sheldonh
State-Changed-When: Wed Jan 9 06:03:44 PST 2002
State-Changed-Why:
Please follow the advice given at the following web page to provide
more detail:

http://www.freebsd.org/FAQ/advanced.html#KERNEL-PANIC-TROUBLESHOOTING

Please copy your feedback to , using the subject line of this message.

http://www.FreeBSD.org/cgi/query-pr.cgi?pr=33637

From: Peter Pentchev
To: Ted Mittelstaedt
Cc: gnb@itga.com.au, bug-followup@FreeBSD.ORG
Subject: Re: kern/33637: Panic: vm_page_unwire: invalid wire count: 0
Date: Fri, 11 Jan 2002 03:58:41 +0200

On Fri, Jan 11, 2002 at 12:33:01AM -0800, Ted Mittelstaedt wrote:
> >-----Original Message-----
> >From: gnb@itga.com.au [mailto:gnb@itga.com.au]
> >Sent: Thursday, January 10, 2002 2:42 PM
> >To: Ted Mittelstaedt
> >Cc: gnb@itga.com.au; freebsd-bugs@FreeBSD.ORG
> >Subject: Re: kern/33637: Panic: vm_page_unwire: invalid wire count: 0
> >
> >
> >> The submitter of the PR
> >> was operating under a false assumption - that he could run any program and
> >> assume that it was impossible for it to crash the OS.
> >
> >But I don't think that is a false assumption. On the contrary, that is
> >exactly what I would expect and hope to find from a "production quality"
> >OS like FreeBSD.
>
> If you give a program life-and-death authority over the system and
> the program crashes the system then that is hardly a bug in the system,
> now is it?

As Bill Fumerola said, running a process as root is not giving it
life-or-death authority over the system.
In my message about section 2 and 3 of the manual, I stated my opinion
that Diablo probably does not go around fudging kernel structures
directly - THAT would be giving it life-or-death authority. Part of the
purpose of system and library calls is exactly to give the OS some
opportunity to limit processes' ability to do damage by supplying
incorrect data.

> >
> >IMO the submitter was being entirely reasonable in making that
> >assumption - or at least, on finding a violation of that assumption,
> >to report it and expect it to be treated as a bug. (Even if the
> >response is "we know it's a bug and it's hard to fix, here's a
> >workaround using login.conf".)
> >
>
> But that WASN'T my response, re-read my response to the PR, I did
> not tell him to fix his problem with login.conf. I merely pointed him
> to it because he stated:
>
> "An application should not cause a kernel panic if it only uses
> the system calls documented in section 2 or the library functions
> documented in section 3 of the manual."
>
> which is obviously incorrect, and if he read the manpage to login.conf
> he would have realized this.

And just for the record, this was not posted by the submitter, it was
posted by myself; just BTW, I like to consider myself a FreeBSD Project
member, albeit only a meager ports committer, which would once more
indicate that your opinion is not really shared by all of the Project's
members :)

About the login.conf thing - yes, I know that a forkbomb or an excessive
memory allocation can crash FreeBSD. But - apparently unlike you -
I consider that to be an OS bug. If a process (or processes) should
decide to go haywire, the OS may be allowed to go down on its knees,
slow down to a crawl, but it should NOT panic. Thus, I maintain my
opinion that a userland process should not be able to panic the OS,
and that, consequently, this PR points out a problem in FreeBSD
that happens to be triggered by the Diablo code.

G'luck,
Peter

--
You have, of course, just begun reading the sentence that you have just finished reading.

From: "Ted Mittelstaedt"
To: "Peter Pentchev"
Cc: ,
Subject: RE: kern/33637: Panic: vm_page_unwire: invalid wire count: 0
Date: Fri, 11 Jan 2002 09:26:31 -0800

>-----Original Message-----
>From: Peter Pentchev [mailto:roam@ringlet.net]
>Sent: Thursday, January 10, 2002 5:59 PM
>To: Ted Mittelstaedt
>
>About the login.conf thing - yes, I know that a forkbomb or an excessive
>memory allocation can crash FreeBSD. But - apparently unlike you -
>I consider that to be an OS bug. If a process (or processes) should
>decide to go haywire, the OS may be allowed to go down on its knees,
>slow down to a crawl, but it should NOT panic.

I don't know that there's much practical difference to the user between
the system panicking and the system slowing to a crawl - both make the
system unusable. I guess I'd say that if you're consistent you should be
arguing that if a process goes haywire the system shouldn't panic, it
should remain unaffected.

I also agree that this should be a design goal of FreeBSD but I assume
that perfection is impossible to achieve. Therefore I allow that it's
always going to be possible for an errant application program to crash
the system. The difference between us is that I call that an application
bug, you call that a kernel bug.

>Thus, I maintain my
>opinion that a userland process should not be able to panic the OS,
>and that, consequently, this PR points out a problem in FreeBSD
>that happens to be triggered by the Diablo code.
>

Of the various hypotheses I consider this to be the more likely although
I think the trigger is a combination of things of which Diablo is the
major part. But there's no guarantee that fixing the FreeBSD code is
going to get the user going again because if Diablo has a bug in it that
is the trigger then Diablo is still going to have a bug in it which
still may erupt.
This is why one of my first suggestions was to try it with a
uniprocessor kernel which if the user was willing to do (he wasn't,
reread the PR) might be the quickest bandaid fix, because if the problem
only showed up in SMP mode then it would get him a stable Diablo server
immediately. (It also would be useful info to the kernel developer)
The user also admitted he didn't know if SMP was a requirement or not in
his application.

One of the cardinal rules of troubleshooting is to start by removing as
much extraneous stuff as possible to break the system down into simple
components and test them.

Ted

From: Jung-uk Kim
To: freebsd-gnats-submit@FreeBSD.org, kay@sonofon.dk
Cc:
Subject: Re: kern/33637: Panic: vm_page_unwire: invalid wire count: 0
Date: Tue, 16 Apr 2002 12:55:49 -0400

This patch fixed my problem. Can you try this?

http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/kern/sys_pipe.c.diff?r1=1.60.2.11&r2=1.60.2.12&only_with_tag=RELENG_4&f=h

From: Karsten Thygesen
To: freebsd-gnats-submit@FreeBSD.org, kay@sonofon.dk
Cc:
Subject: Re: kern/33637: Panic: vm_page_unwire: invalid wire count: 0
Date: Fri, 31 May 2002 13:21:59 +0200

Hi

The patch solves the problem. Have been stable for 6 weeks now!

Thanks a lot!

Karsten

State-Changed-From-To: feedback->closed
State-Changed-By: mp
State-Changed-When: Sun Jul 14 17:57:45 PDT 2002
State-Changed-Why:
The originator says the bug has been fixed.

http://www.freebsd.org/cgi/query-pr.cgi?pr=33637
>Unformatted: