From dan@kulesh.obluda.cz Sun Sep 2 22:27:07 2007 Return-Path: Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D605616A418 for ; Sun, 2 Sep 2007 22:27:07 +0000 (UTC) (envelope-from dan@kulesh.obluda.cz) Received: from smtp1.kolej.mff.cuni.cz (smtp1.kolej.mff.cuni.cz [78.128.192.4]) by mx1.freebsd.org (Postfix) with ESMTP id 8C18813C46B for ; Sun, 2 Sep 2007 22:27:07 +0000 (UTC) (envelope-from dan@kulesh.obluda.cz) Received: from kulesh.obluda.cz (openvpn.ms.mff.cuni.cz [195.113.20.87]) by smtp1.kolej.mff.cuni.cz (8.13.8/8.13.8) with ESMTP id l82MQlZi085049 for ; Mon, 3 Sep 2007 00:26:53 +0200 (CEST) (envelope-from dan@kulesh.obluda.cz) Received: from kulesh.obluda.cz (localhost. [127.0.0.1]) by kulesh.obluda.cz (8.14.1/8.14.1) with ESMTP id l82MQlT3002156 for ; Mon, 3 Sep 2007 00:26:47 +0200 (CEST) (envelope-from dan@kulesh.obluda.cz) Received: (from root@localhost) by kulesh.obluda.cz (8.14.1/8.13.8/Submit) id l82MQlSu002155; Mon, 3 Sep 2007 00:26:47 +0200 (CEST) (envelope-from dan) Message-Id: <200709022226.l82MQlSu002155@kulesh.obluda.cz> Date: Mon, 3 Sep 2007 00:26:47 +0200 (CEST) From: Dan Lukes Reply-To: Dan Lukes To: FreeBSD-gnats-submit@freebsd.org Cc: Subject: Giant not owned at /usr/src/sys/netinet/tcp_sack.c:271=tcp_clean_sackreport(tp) X-Send-Pr-Version: 3.113 X-GNATS-Notify: >Number: 116034 >Category: kern >Synopsis: Giant not owned at /usr/src/sys/netinet/tcp_sack.c:271=tcp_clean_sackreport(tp) >Confidential: no >Severity: serious >Priority: medium >Responsible: rwatson >State: closed >Quarter: >Keywords: >Date-Required: >Class: sw-bug >Submitter-Id: current-users >Arrival-Date: Sun Sep 02 22:30:02 GMT 2007 >Closed-Date: Sat Jan 12 22:30:16 UTC 2008 >Last-Modified: Sat Jan 12 22:30:16 UTC 2008 >Originator: Dan Lukes >Release: FreeBSD 6.2-STABLE i386 >Organization: Obludarium >Environment: System: FreeBSD 6.2-STABLE (sources from Sep 2 21:30:25 CEST 2007) src/sys/netinet/tcp_sack.c,v 1.26.2.3 2007/06/12 19:21:54 jhb src/sys/netinet/tcp_subr.c,v 1.228.2.14 2006/12/30 17:58:46 jhb src/sys/kern/kern_mbuf.c,v 1.9.2.9 2007/02/11 03:31:18 mohans src/sys/netinet/tcp_input.c,v 1.281.2.13 2007/06/12 18:53:32 jhb Kernel compiled with IPSEC (=> MPSAFE network stack forced disabled; IPSEC require Giant) Kernel compiled with INVARIANTS and INVARIANT_SUPPORT so missing lock trigger panic NOTE: This doesn't apply for CURRENT, but may be significant for 6.3-RELEASE unless Giant dependency completely MFCed-out from network stack code >Description: tcp_clean_sackreport() called from tcp_drain() without Giant but calee require it when not on MPSAFE network stack The relevant code fragments (the INP_LOCK_ASSERT call is the point of panic): ----------------------- tcp_clean_sackreport(tp) struct tcpcb *tp; { int i; INP_LOCK_ASSERT(tp->t_inpcb); tp->rcv_numsacks = 0; for (i = 0; i < MAX_SACK_BLKS; i++) tp->sackblks[i].start = tp->sackblks[i].end=0; } ----------------------- #define INP_LOCK_ASSERT(inp) do { \ mtx_assert(&(inp)->inp_mtx, MA_OWNED); \ NET_ASSERT_GIANT(); \ } while (0) ----------------------- #define NET_ASSERT_GIANT() do { \ if (!debug_mpsafenet) \ mtx_assert(&Giant, MA_OWNED); \ } while (0) ----------------------- 1. debug->mpsafenet is forced to off because IPSEC compiled in kernel. 2. The INP_LOCK_ASSERT(tp->t_inpcb) test for Giant when mpsafenet off. 3. tcp_clean_sackreport called from tcp_drain called from mb_reclaim without Giant 4. ABEND on kernel compiled with INVARIANTS/INVARIANT_SUPPORT; possible race condition when compiled without INVARIANT =================================================== Kernel code backtrace: #2 0xc04b9b7d in panic (fmt=0xc064cd1e "mutex %s not owned at %s:%d") at /usr/src/sys/kern/kern_shutdown.c:565 #3 0xc04b1e4f in _mtx_assert (m=0xc06a8880, what=0, file=0xc065a3bb "/usr/src/sys/netinet/tcp_sack.c", line=271) at /usr/src/sys/kern/kern_mutex.c:768 #4 0xc05480bb in tcp_clean_sackreport (tp=0xc49aa910) at /usr/src/sys/netinet/tcp_sack.c:271 #5 0xc0549927 in tcp_drain () at /usr/src/sys/netinet/tcp_subr.c:891 #6 0xc04b0a69 in mb_reclaim (junk=0x0) at /usr/src/sys/kern/kern_mbuf.c:557 #7 0xc05d6cd4 in vm_pageout_scan (pass=0) at /usr/src/sys/vm/vm_pageout.c:724 #8 0xc05d7eaf in vm_pageout () at /usr/src/sys/vm/vm_pageout.c:1554 >How-To-Repeat: Compile kernel with INVARIANTS/INVARIANT_SUPPORT + (IPSEC or set mpsafenet to 0) Do something that trigger mb_reclaim >Fix: Workaround is simple: net.inet.tcp.do_tcpdrain=0 Fix: add NET_LOCK_GIANT() / NET_UNLOCK_GIANT() into tcp_drain() unless someone smarter than me claim that Giant is NOT required here >Release-Note: >Audit-Trail: State-Changed-From-To: open->feedback State-Changed-By: kmacy State-Changed-When: Fri Nov 16 21:09:15 UTC 2007 State-Changed-Why: Does this still occur on RELENG_7? IPSEC has been replaced with FAST_IPSEC. http://www.freebsd.org/cgi/query-pr.cgi?pr=116034 State-Changed-From-To: feedback->open State-Changed-By: kmacy State-Changed-When: Sun Nov 18 08:24:04 UTC 2007 State-Changed-Why: Feedback received. Probably still an issue, will follow up when I have time to set up IPSEC. Responsible-Changed-From-To: freebsd-bugs->kmacy Responsible-Changed-By: kmacy Responsible-Changed-When: Sun Nov 18 08:24:04 UTC 2007 Responsible-Changed-Why: Feedback received. Probably still an issue, will follow up when I have time to set up IPSEC to test. http://www.freebsd.org/cgi/query-pr.cgi?pr=116034 Responsible-Changed-From-To: kmacy->rwatson Responsible-Changed-By: rwatson Responsible-Changed-When: Sat Nov 24 22:20:05 UTC 2007 Responsible-Changed-Why: Grab ownership of this as debug.mpsafenet is something I've spent some time working with. A casual glance seems to confirm that we should be conditionally acquiring Giant in the VM drain routine for the network stack. Not having Giant here probably isn't a big problem in practice (other than the INVARIANTS panics, of course) since the drain routine shouldn't lead to transmission paths via IPSEC, but it's definitely worth fixing. http://www.freebsd.org/cgi/query-pr.cgi?pr=116034 From: Robert Watson To: Dan Lukes Cc: FreeBSD-gnats-submit@FreeBSD.org, freebsd-bugs@FreeBSD.org Subject: Re: kern/116034: Giant not owned at /usr/src/sys/netinet/tcp_sack.c:271=tcp_clean_sackreport(tp) Date: Tue, 4 Dec 2007 16:26:26 +0000 (GMT) On Mon, 3 Sep 2007, Dan Lukes wrote: > add NET_LOCK_GIANT() / NET_UNLOCK_GIANT() into tcp_drain() unless someone > smarter than me claim that Giant is NOT required here There's actually a slightly more general problem here--could you try this more general patch? I've requested to merge this to 6.3. Robert N M Watson Computer Laboratory University of Cambridge Index: kern_mbuf.c =================================================================== RCS file: /home/ncvs/src/sys/kern/kern_mbuf.c,v retrieving revision 1.9.2.9 diff -u -r1.9.2.9 kern_mbuf.c --- kern_mbuf.c 11 Feb 2007 03:31:18 -0000 1.9.2.9 +++ kern_mbuf.c 4 Dec 2007 15:53:52 -0000 @@ -550,9 +550,11 @@ WITNESS_WARN(WARN_GIANTOK | WARN_SLEEPOK | WARN_PANIC, NULL, "mb_reclaim()"); + NET_LOCK_GIANT(); mbstat.m_drain++; for (dp = domains; dp != NULL; dp = dp->dom_next) for (pr = dp->dom_protosw; pr < dp->dom_protoswNPROTOSW; pr++) if (pr->pr_drain != NULL) (*pr->pr_drain)(); + NET_UNLOCK_GIANT(); } From: Eugene Grosbein To: bug-followup@freebsd.org Cc: Dan Lukes , Robert Watson Subject: Re: kern/116034: Giant not owned at /usr/src/sys/netinet/tcp_sack.c:271=tcp_clean_sackreport(tp) Date: Wed, 5 Dec 2007 01:00:46 +0700 Robert Watson wrote: > There's actually a slightly more general problem here--could you try this more > general patch? I've requested to merge this to 6.3. Hi! I'm experiencing the same panic for INVARIANTS-enabled kernel now. Really, it panices every time just after boot. Your patch solves the problem, thank you. Eugene Grosbein From: Dan Lukes To: Robert Watson Cc: FreeBSD-gnats-submit@FreeBSD.org, freebsd-bugs@FreeBSD.org Subject: Re: kern/116034: Giant not owned at /usr/src/sys/netinet/tcp_sack.c:271=tcp_clean_sackreport(tp) Date: Tue, 04 Dec 2007 20:40:56 +0100 Robert Watson napsal/wrote, On 12/04/07 17:26: > There's actually a slightly more general problem here--could you try > this more general patch? It should resolve the problem. I tried to test it, the system is going to panic, but I'm not sure the mb_reclaim() has been called. I inserted a printf just after NET_LOCK_GIANT() but never seen the text on console even the do_drain sysctl has been set to 1. Do you know a reliable way how to trigger vm_lowmem event ? I tried to allocate as much as possible memory, i tried to call EVENTHANDLER_INVOKE from kernel module, but it seems the mb_reclaim has not been invoked. Dan -- Dan Lukes SISAL MFF UK AKA: dan at obluda.cz, dan at freebsd.cz, dan at (kolej.)mff.cuni.cz From: dfilter@FreeBSD.ORG (dfilter service) To: bug-followup@FreeBSD.org Cc: Subject: Re: kern/116034: commit references a PR Date: Wed, 5 Dec 2007 00:00:21 +0000 (UTC) rwatson 2007-12-05 00:00:10 UTC FreeBSD src repository Modified files: (Branch: RELENG_6) sys/kern kern_mbuf.c Log: Call NET_LOCK_GIANT/NET_UNLOCK_GIANT around calls to protocol drain methods in mb_reclaim(). This is not an MFC, as debug.mpsafenet is not present in 7.x or 8.x. In practice, this likely resulted in instability only on kernels with INVARIANTS, as the protocol drain paths are generally MPSAFE. PR: 116034 Reported by: Dan Lukes Discussed with: kmacy, alc Approved by: re (kensmith) Tested by: Eugene Grosbein Revision Changes Path 1.9.2.10 +2 -0 src/sys/kern/kern_mbuf.c _______________________________________________ cvs-all@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/cvs-all To unsubscribe, send any mail to "cvs-all-unsubscribe@freebsd.org" State-Changed-From-To: open->patched State-Changed-By: rwatson State-Changed-When: Wed Dec 5 00:11:31 UTC 2007 State-Changed-Why: Merged as kern_mbuf.c:1.9.2.10. http://www.freebsd.org/cgi/query-pr.cgi?pr=116034 From: Robert Watson To: Dan Lukes Cc: bug-followup@freebsd.org Subject: Re: kern/116034: Giant not owned at /usr/src/sys/netinet/tcp_sack.c:271=tcp_clean_sackreport(tp) Date: Wed, 5 Dec 2007 00:20:54 +0000 (GMT) On Tue, 4 Dec 2007, Dan Lukes wrote: > The following reply was made to PR kern/116034; it has been noted by GNATS. > > From: Dan Lukes > To: Robert Watson > Cc: FreeBSD-gnats-submit@FreeBSD.org, freebsd-bugs@FreeBSD.org > Subject: Re: kern/116034: Giant not owned at /usr/src/sys/netinet/tcp_sack.c:271=tcp_clean_sackreport(tp) > Date: Tue, 04 Dec 2007 20:40:56 +0100 > > Robert Watson napsal/wrote, On 12/04/07 17:26: > > There's actually a slightly more general problem here--could you try > > this more general patch? > > It should resolve the problem. > > I tried to test it, the system is going to panic, but I'm not sure the > mb_reclaim() has been called. I inserted a printf just after > NET_LOCK_GIANT() but never seen the text on console even the do_drain sysctl > has been set to 1. > > Do you know a reliable way how to trigger vm_lowmem event ? I tried to > allocate as much as possible memory, i tried to call EVENTHANDLER_INVOKE > from kernel module, but it seems the mb_reclaim has not been invoked. Hmm. I'm not sure I've ever tried to cause it to occur directly. I guess the problem is that this is not easily tested by just hooking up mb_reclaim to a proc sysctl, as sysctl will hold Giant. You could have a sysctl proc that kicks off mb_reclaim in a task queue however. Robert N M Watson Computer Laboratory University of Cambridge From: dfilter@FreeBSD.ORG (dfilter service) To: bug-followup@FreeBSD.org Cc: Subject: Re: kern/116034: commit references a PR Date: Wed, 5 Dec 2007 00:47:55 +0000 (UTC) rwatson 2007-12-05 00:47:48 UTC FreeBSD src repository Modified files: (Branch: RELENG_6_3) sys/kern kern_mbuf.c Log: Merge kern_mbuf.c:1.9.2.10 from RELENG_6 to RELENG_6_3: Call NET_LOCK_GIANT/NET_UNLOCK_GIANT around calls to protocol drain methods in mb_reclaim(). This is not an MFC, as debug.mpsafenet is not present in 7.x or 8.x. In practice, this likely resulted in instability only on kernels with INVARIANTS, as the protocol drain paths are generally MPSAFE. PR: 116034 Reported by: Dan Lukes Discussed with: kmacy, alc Approved by: re (kensmith) Tested by: Eugene Grosbein Revision Changes Path 1.9.2.9.2.1 +2 -0 src/sys/kern/kern_mbuf.c _______________________________________________ cvs-all@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/cvs-all To unsubscribe, send any mail to "cvs-all-unsubscribe@freebsd.org" State-Changed-From-To: patched->closed State-Changed-By: rwatson State-Changed-When: Sat Jan 12 22:29:11 UTC 2008 State-Changed-Why: Close PR -- the patch has been committed to 6-STABLE and will appear in FreeBSD 6.3-RELEASE. I've had an additional success report from Eugene Grosbein as well. http://www.freebsd.org/cgi/query-pr.cgi?pr=116034 >Unformatted: