From sean@farley.org Mon Feb 21 01:14:54 2005
Return-Path:
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
    by hub.freebsd.org (Postfix) with ESMTP id 0FC4216A4CE
    for ; Mon, 21 Feb 2005 01:14:54 +0000 (GMT)
Received: from mail.farley.org (farley.org [67.64.95.201])
    by mx1.FreeBSD.org (Postfix) with ESMTP id 9740B43D39
    for ; Mon, 21 Feb 2005 01:14:53 +0000 (GMT)
    (envelope-from sean@farley.org)
Received: from thor.farley.org (thor.farley.org [IPv6:2001:470:1f01:290:1::5])
    by mail.farley.org (8.13.1/8.13.1) with ESMTP id j1L1Epgh061377
    for ; Sun, 20 Feb 2005 19:14:51 -0600 (CST)
    (envelope-from sean@gw.farley.org)
Received: from thor.farley.org (localhost [127.0.0.1])
    by thor.farley.org (8.13.1/8.13.1) with ESMTP id j1L1F1Ce067882
    for ; Sun, 20 Feb 2005 19:15:01 -0600 (CST)
    (envelope-from sean@thor.farley.org)
Received: (from sean@localhost)
    by thor.farley.org (8.13.1/8.13.1/Submit) id j1L1F1jK067881;
    Sun, 20 Feb 2005 19:15:01 -0600 (CST)
    (envelope-from sean)
Message-Id: <200502210115.j1L1F1jK067881@thor.farley.org>
Date: Sun, 20 Feb 2005 19:15:01 -0600 (CST)
From: Sean Farley
Reply-To: Sean Farley
To: FreeBSD-gnats-submit@freebsd.org
Cc:
Subject: GDB locks in wait4() when running applications
X-Send-Pr-Version: 3.113
X-GNATS-Notify:

>Number: 77818
>Category: gnu
>Synopsis: GDB locks in wait4() when running applications
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: freebsd-bugs
>State: closed
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Mon Feb 21 01:20:17 GMT 2005
>Closed-Date: Sun Jun 19 11:46:05 GMT 2005
>Last-Modified: Sun Jul 3 00:31:13 GMT 2005
>Originator: Sean Farley
>Release: FreeBSD 5.3-STABLE i386
>Organization:
>Environment:
System: FreeBSD thor.farley.org 5.3-STABLE FreeBSD 5.3-STABLE #0: Thu Feb 17 15:11:46 CST 2005 root@thor.farley.org:/usr/obj/usr/src/sys/THOR i386
>Description:
Whenever I run an application through the system's GDB, GDB locks in wait4().
It does not matter if the application has debugging information or not.
/bin/ls will lock GDB up for me until I type Ctrl-C.

Two systems of mine exhibit this behavior. One has the binary nvidia
driver with a lot of changes in libmap.conf. The other is headless
without a libmap.conf.
>How-To-Repeat:
gdb /bin/ls
>Fix:
>Release-Note:
>Audit-Trail:

From: Greg 'groggy' Lehey
To: Sean Farley
Cc: FreeBSD-gnats-submit@FreeBSD.org
Subject: Re: gnu/77818: GDB locks in wait4() when running applications
Date: Mon, 21 Feb 2005 14:12:45 +1030

On Sunday, 20 February 2005 at 19:15:01 -0600, Sean Farley wrote:
>
>> Synopsis: GDB locks in wait4() when running applications
>
>> Description:
>
> Whenever I run an application through the system's GDB, GDB locks in
> wait4(). It does not matter if the application has debugging
> information or not. /bin/ls will lock GDB up for me until I type
> Ctrl-C.

Is this an SMP system?

> Two systems of mine exhibit this behavior. One has the binary
> nvidia driver with a lot of changes in libmap.conf. The other is
> headless without a libmap.conf.

I've found something similar with SMP systems only. It wasn't as
consistent as the way you describe, and I was able to work around the
problem by turning off all but one CPU. See kern/77537 for more
details.

Greg
--
See complete headers for address and phone numbers.

From: Seán C. Farley
To: "Greg 'groggy' Lehey"
Cc: FreeBSD-gnats-submit@FreeBSD.org
Subject: Re: gnu/77818: GDB locks in wait4() when running applications
Date: Mon, 21 Feb 2005 10:24:17 -0600 (CST)

On Mon, 21 Feb 2005, Greg 'groggy' Lehey wrote:

> On Sunday, 20 February 2005 at 19:15:01 -0600, Sean Farley wrote:
>>
>>> Synopsis: GDB locks in wait4() when running applications
>>
>>> Description:
>>
>> Whenever I run an application through the system's GDB, GDB locks in
>> wait4(). It does not matter if the application has debugging
>> information or not. /bin/ls will lock GDB up for me until I type
>> Ctrl-C.
>
> Is this an SMP system?

Neither system is SMP nor using HyperThreading. sysctl shows that the
systems believe they only have one CPU (as expected).

>> Two systems of mine exhibit this behavior. One has the binary nvidia
>> driver with a lot of changes in libmap.conf. The other is headless
>> without a libmap.conf.
>
> I've found something similar with SMP systems only. It wasn't as
> consistent as the way you describe, and I was able to work around the
> problem by turning off all but one CPU. See kern/77537 for more
> details.

It does sound similar. I wonder if it was something MFC'd from CURRENT,
but I do not remember when it started hanging.

Seán
--
sean-freebsd@farley.org

From: Seán C. Farley
To: freebsd-gnats-submit@FreeBSD.org
Cc: "Greg 'groggy' Lehey"
Subject: Re: gnu/77818: GDB locks in wait4() when running applications
Date: Tue, 22 Feb 2005 15:06:19 -0600 (CST)

I have narrowed the issue down to a reproducible case.
With Zsh 4.2.1 and 4.2.4, Zsh will hang when GDB runs the requested
command (i.e. /bin/ls). The execution runs as:

    /usr/local/bin/zsh -c exec /bin/ls

Besides Zsh being the shell used, the shell needs to run a backtick
command in .zshenv. Example:

    TESTING=`date`

I do not know why this only happens when called via GDB. Running the
command from the command-line returns without any obvious problems.

I was unable to reproduce this with /bin/tcsh or /bin/sh by adding the
above setting to .login (or .tcshrc) or .profile respectively.

With Linux, GDB and Zsh do not exhibit this problem, so I am unsure if
it is Zsh's or FreeBSD's bug.

Seán
--
sean-freebsd@farley.org

From: Seán C. Farley
To: FreeBSD-gnats-submit@FreeBSD.org, freebsd-bugs@FreeBSD.org
Cc:
Subject: Re: gnu/77818: GDB locks in wait4() when running applications
Date: Mon, 18 Apr 2005 22:26:41 -0500 (CDT)

I finally compiled Zsh with debugging and got the following trace. I
ran it by the following line:

    LD_LIBRARY_PATH=/tmp/libc SHELL=/bin/sh gdb /usr/local/bin/zsh

Program received signal SIGINT, Interrupt.
0x281a9efb in sigsuspend () from /lib/libc.so.5
(gdb) where
#0 0x281a9cfb in sigsuspend () at sigsuspend.S:2
#1 0x280e07dc in signal_suspend (sig=20, sig2=2) at signals.c:367
#2 0x280b92e8 in waitforpid (pid=11195) at jobs.c:1120
#3 0x280a3dda in getoutput (cmd=0x8060e5c "uname -s", qt=1) at exec.c:2869
#4 0x280e2a0c in stringsubst (list=0xbfbfe7c0, node=0xbfbfe7d0, ssub=4,
    asssub=0) at subst.c:189
#5 0x280e2378 in prefork (list=0xbfbfe7c0, flags=6) at subst.c:74
#6 0x280a0886 in addvars (state=0xbfbfe880, pc=0xbfbfe7c0, export=0)
    at exec.c:1614
#7 0x2809ec5d in execsimple (state=0x1) at exec.c:802
#8 0x2809edb7 in execlist (state=0xbfbfe880, dont_change_job=0, exiting=0)
    at exec.c:855
#9 0x2809eb9a in execode (p=0x80608e0, dont_change_job=0, exiting=0)
    at exec.c:775
#10 0x280b3a8a in loop (toplevel=0, justonce=0) at init.c:165
#11 0x280b58b8 in source (s=0xbfbfe950 "/root/.zshenv") at init.c:1043
#12 0x280b5b1c in sourcehome (s=0x280f458d ".zshenv") at init.c:1088
#13 0x280b54e7 in run_init_scripts () at init.c:937
#14 0x280b63da in zsh_main (argc=1, argv=0xbfbfeac0) at init.c:1262
#15 0x08048583 in main (argc=1, argv=0xbfbfeac0) at ./main.c:93

I hope this can help find the problem.

Seán
--
sean-freebsd@farley.org

From: David Xu
To: "Seán C. Farley"
Cc: FreeBSD-gnats-submit@freebsd.org, freebsd-bugs@freebsd.org
Subject: Re: gnu/77818: GDB locks in wait4() when running applications
Date: Tue, 19 Apr 2005 16:37:15 +0800

I have committed a fix to -CURRENT. I don't know if it will fix the
problem you have seen, but it is worth a try.

David Xu

Seán C. Farley wrote:
> I finally compiled Zsh with debugging and got the following trace. I
> ran it by the following line:
> LD_LIBRARY_PATH=/tmp/libc SHELL=/bin/sh gdb /usr/local/bin/zsh
>
> Program received signal SIGINT, Interrupt.
> 0x281a9efb in sigsuspend () from /lib/libc.so.5
> (gdb) where
> #0 0x281a9cfb in sigsuspend () at sigsuspend.S:2
> #1 0x280e07dc in signal_suspend (sig=20, sig2=2) at signals.c:367
> #2 0x280b92e8 in waitforpid (pid=11195) at jobs.c:1120
> #3 0x280a3dda in getoutput (cmd=0x8060e5c "uname -s", qt=1) at
> exec.c:2869
> #4 0x280e2a0c in stringsubst (list=0xbfbfe7c0, node=0xbfbfe7d0, ssub=4,
> asssub=0) at subst.c:189
> #5 0x280e2378 in prefork (list=0xbfbfe7c0, flags=6) at subst.c:74
> #6 0x280a0886 in addvars (state=0xbfbfe880, pc=0xbfbfe7c0, export=0)
> at exec.c:1614
> #7 0x2809ec5d in execsimple (state=0x1) at exec.c:802
> #8 0x2809edb7 in execlist (state=0xbfbfe880, dont_change_job=0,
> exiting=0)
> at exec.c:855
> #9 0x2809eb9a in execode (p=0x80608e0, dont_change_job=0, exiting=0)
> at exec.c:775
> #10 0x280b3a8a in loop (toplevel=0, justonce=0) at init.c:165
> #11 0x280b58b8 in source (s=0xbfbfe950 "/root/.zshenv") at init.c:1043
> #12 0x280b5b1c in sourcehome (s=0x280f458d ".zshenv") at init.c:1088
> #13 0x280b54e7 in run_init_scripts () at init.c:937
> #14 0x280b63da in zsh_main (argc=1, argv=0xbfbfeac0) at init.c:1262
> #15 0x08048583 in main (argc=1, argv=0xbfbfeac0) at ./main.c:93
>
> I hope this can help find the problem.
>
> Seán

From: Seán C. Farley
To: David Xu
Cc: FreeBSD-gnats-submit@freebsd.org, freebsd-bugs@freebsd.org
Subject: Re: gnu/77818: GDB locks in wait4() when running applications
Date: Mon, 25 Apr 2005 14:17:28 -0500 (CDT)

On Tue, 19 Apr 2005, David Xu wrote:

> I have committed a fix to -CURRENT. I don't know if it will fix the
> problem you have seen, but it is worth a try.
>
> David Xu
>
> Seán C. Farley wrote:
>
>> I finally compiled Zsh with debugging and got the following trace. I
>> ran it by the following line:
>> LD_LIBRARY_PATH=/tmp/libc SHELL=/bin/sh gdb /usr/local/bin/zsh
>>
>> Program received signal SIGINT, Interrupt.
>> 0x281a9efb in sigsuspend () from /lib/libc.so.5
>> (gdb) where
>> #0 0x281a9cfb in sigsuspend () at sigsuspend.S:2
>> #1 0x280e07dc in signal_suspend (sig=20, sig2=2) at signals.c:367
>> #2 0x280b92e8 in waitforpid (pid=11195) at jobs.c:1120
>> #3 0x280a3dda in getoutput (cmd=0x8060e5c "uname -s", qt=1) at exec.c:2869
>> #4 0x280e2a0c in stringsubst (list=0xbfbfe7c0, node=0xbfbfe7d0, ssub=4,
>> asssub=0) at subst.c:189
>> #5 0x280e2378 in prefork (list=0xbfbfe7c0, flags=6) at subst.c:74
>> #6 0x280a0886 in addvars (state=0xbfbfe880, pc=0xbfbfe7c0, export=0)
>> at exec.c:1614
>> #7 0x2809ec5d in execsimple (state=0x1) at exec.c:802
>> #8 0x2809edb7 in execlist (state=0xbfbfe880, dont_change_job=0, exiting=0)
>> at exec.c:855
>> #9 0x2809eb9a in execode (p=0x80608e0, dont_change_job=0, exiting=0)
>> at exec.c:775
>> #10 0x280b3a8a in loop (toplevel=0, justonce=0) at init.c:165
>> #11 0x280b58b8 in source (s=0xbfbfe950 "/root/.zshenv") at init.c:1043
>> #12 0x280b5b1c in sourcehome (s=0x280f458d ".zshenv") at init.c:1088
>> #13 0x280b54e7 in run_init_scripts () at init.c:937
>> #14 0x280b63da in zsh_main (argc=1, argv=0xbfbfeac0) at init.c:1262
>> #15 0x08048583 in main (argc=1, argv=0xbfbfeac0) at ./main.c:93
>>
>> I hope this can help find the problem.

Would you happen to have a patch to -STABLE for the fix? I looked at
some of the code you committed recently and was unsure of what fix(es)
needed to be made to test it. A few of the changes did not look like
they matched easily to -STABLE.

Thank you.

Seán
--
sean-freebsd@farley.org

From: David Xu
To: "Seán C. Farley"
Cc: freebsd-bugs@freebsd.org, FreeBSD-gnats-submit@freebsd.org
Subject: Re: gnu/77818: GDB locks in wait4() when running applications
Date: Tue, 26 Apr 2005 07:27:31 +0800

Seán C. Farley wrote:
>
> Would you happen to have a patch to -STABLE for the fix? I looked at
> some of the code you committed recently and was unsure of what fix(es)
> needed to be made to test it. A few of the changes did not look like
> they matched easily to -STABLE.
>
> Thank you.
>
> Seán

It was MFCed onto -STABLE.

From: Seán C. Farley
To: David Xu
Cc: freebsd-bugs@freebsd.org, FreeBSD-gnats-submit@freebsd.org
Subject: Re: gnu/77818: GDB locks in wait4() when running applications
Date: Tue, 26 Apr 2005 11:23:02 -0500 (CDT)

On Tue, 26 Apr 2005, David Xu wrote:

> It was MFCed onto -STABLE.

I had updated after your last e-mail but had seen no change.

I updated again this morning (changes to kern_exit.c and kern_sig.c) and
am still seeing the GDB problem with Zsh.

Is this a separate problem?

Seán
--
sean-freebsd@farley.org

From: David Xu
To: "Seán C. Farley"
Cc: freebsd-bugs@freebsd.org, FreeBSD-gnats-submit@freebsd.org
Subject: Re: gnu/77818: GDB locks in wait4() when running applications
Date: Wed, 27 Apr 2005 08:52:34 +0800

Seán C. Farley wrote:
> On Tue, 26 Apr 2005, David Xu wrote:
>
>> It was MFCed onto -STABLE.
>
>
> I had updated after your last e-mail but had seen no change.
>
> I updated again this morning (changes to kern_exit.c and kern_sig.c) and
> am still seeing the GDB problem with Zsh.
>
> Is this a separate problem?
>
> Seán

Yes, I think it is a separate problem.

From: Seán C. Farley
To: bug-followup@FreeBSD.org, sean-freebsd@farley.org
Cc:
Subject: Re: gnu/77818: GDB locks in wait4() when running applications
Date: Fri, 13 May 2005 00:47:41 -0500 (CDT)

I think I may have found the issue or at least an issue. It has to do
with signal suspensions not being copied from the initial process down
to a grandchild. This causes the parent to miss the SIGCHLD when the
process exits too quickly.

I have an example program[1] to illustrate the problem. This program
does work as expected on FreeBSD-4.10 and Linux.

Seán
  1. http://www.farley.org/freebsd/tmp/grandparent.c
--
sean-freebsd@farley.org

From: Seán C. Farley
To: bug-followup@FreeBSD.org, sean-freebsd@farley.org
Cc:
Subject: Re: gnu/77818: GDB locks in wait4() when running applications
Date: Fri, 20 May 2005 13:49:13 -0500 (CDT)

I am using 5.4-STABLE with sources from May 15th:
FreeBSD thor.farley.org 5.4-STABLE FreeBSD 5.4-STABLE #0: Sun May 15 01:46:56 CDT 2005 root@thor.farley.org:/usr/obj/usr/src/sys/THOR i386

I have a new test program[1] that I think shows the problem. My
previous program actually showed another bug that has since been fixed.
Actually, it shows two problems.
When run within the debugger (SHELL=/bin/sh gdb a.out), the parent will
be stuck waiting for a signal it will never receive in sigsuspend().

The other problem is that nanosleep() is exiting immediately with a
return of zero although the time to sleep has not passed. To see it do
this, remove the BROKEN_NANOSLEEP_WITHIN_GDB definition at the top of
the program and recompile.

Seán
  1. http://www.farley.org/freebsd/tmp/parent.c
--
sean-freebsd@farley.org

From: David Xu
To: bug-followup@freebsd.org, sean-freebsd@farley.org
Cc:
Subject: Re: gnu/77818: GDB locks in wait4() when running applications
Date: Wed, 25 May 2005 14:08:51 +0800

>
>
> I have a new test program[1] that I think shows the problem. My
> previous program actually showed another bug that has since been fixed.
> Actually, it shows two problems. When run within the debugger
> (SHELL=/bin/sh gdb a.out), the parent will be stuck waiting for a signal
> it will never receive in sigsuspend().
>

Please try the following patch. I believe the old hack is incorrect now
with jhb's sleep queue.

Index: kern_thread.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/kern_thread.c,v
retrieving revision 1.215
diff -u -r1.215 kern_thread.c
--- kern_thread.c	23 Apr 2005 02:32:31 -0000	1.215
+++ kern_thread.c	25 May 2005 06:01:00 -0000
@@ -929,14 +929,6 @@
 	p->p_suspcount++;
 	TD_SET_SUSPENDED(td);
 	TAILQ_INSERT_TAIL(&p->p_suspended, td, td_runq);
-	/*
-	 * Hack: If we are suspending but are on the sleep queue
-	 * then we are in msleep or the cv equivalent. We
-	 * want to look like we have two Inhibitors.
-	 * May already be set.. doesn't matter.
-	 */
-	if (TD_ON_SLEEPQ(td))
-		TD_SET_SLEEPING(td);
 }
 
 void

> The other problem is that nanosleep() is exiting immediately with a
> return of zero although the time to sleep has not passed.
> To see it do this, remove the BROKEN_NANOSLEEP_WITHIN_GDB definition at
> the top of the program and recompile.
>

This is a long-standing bug; I believe it is still in RELENG_4. Now the
bug is in kern_sig.c: do_tdsignal(): when a process is being debugged, a
masked signal can wake up a sleeping thread! That's why it is broken.

David Xu

From: Seán C. Farley
To: David Xu
Cc: bug-followup@freebsd.org
Subject: Re: gnu/77818: GDB locks in wait4() when running applications
Date: Wed, 25 May 2005 10:41:14 -0500 (CDT)

On Wed, 25 May 2005, David Xu wrote:

>> I have a new test program[1] that I think shows the problem. My
>> previous program actually showed another bug that has since been
>> fixed. Actually, it shows two problems. When run within the
>> debugger (SHELL=/bin/sh gdb a.out), the parent will be stuck
>> waiting for a signal it will never receive in sigsuspend().
>>
>
> Please try the following patch. I believe the old hack is incorrect now
> with jhb's sleep queue.

Thank you, thank you! That fixes my bug.

>> The other problem is that nanosleep() is exiting immediately with a
>> return of zero although the time to sleep has not passed. To see it
>> do this, remove the BROKEN_NANOSLEEP_WITHIN_GDB definition at the top
>> of the program and recompile.
>>
>
> This is a long-standing bug; I believe it is still in RELENG_4. Now the
> bug is in kern_sig.c: do_tdsignal(): when a process is being debugged, a
> masked signal can wake up a sleeping thread! That's why it is broken.

Is it fixable? Should I open a PR for it, or is there one already?

Seán
--
sean-freebsd@farley.org

From: David Xu
To: "Seán C. Farley"
Cc: bug-followup@freebsd.org
Subject: RE: gnu/77818: GDB locks in wait4() when running applications
Date: Thu, 26 May 2005 11:54:29 +0800

-----Original Message-----
From: Seán C. Farley [mailto:sean-freebsd@farley.org]
Sent: 25 May 2005 23:41
To: David Xu
Cc: bug-followup@freebsd.org
Subject: Re: gnu/77818: GDB locks in wait4() when running applications

>> The other problem is that nanosleep() is exiting immediately with a
>> return of zero although the time to sleep has not passed. To see it
>> do this, remove the BROKEN_NANOSLEEP_WITHIN_GDB definition at the top
>> of the program and recompile.
>>
>
> This is a long-standing bug; I believe it is still in RELENG_4. Now the
> bug is in kern_sig.c: do_tdsignal(): when a process is being debugged, a
> masked signal can wake up a sleeping thread! That's why it is broken.

Is it fixable? Should I open a PR for it, or is there one already?

Seán
--
sean-freebsd@farley.org
-------------------------------

It is fixable, and I am working on it. Please keep this PR.

David Xu

From: David Xu
To: bug-followup@freebsd.org, sean-freebsd@farley.org
Cc:
Subject: Re: gnu/77818: GDB locks in wait4() when running applications
Date: Fri, 03 Jun 2005 13:14:29 +0800

Please try the following patch. This patch fixes the nanosleep problem;
it just removes some unnecessary code.

Index: kern_sig.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/kern_sig.c,v
retrieving revision 1.305
diff -u -r1.305 kern_sig.c
--- kern_sig.c	19 Apr 2005 08:11:28 -0000	1.305
+++ kern_sig.c	3 Jun 2005 05:05:20 -0000
@@ -1690,34 +1690,25 @@
 	}
 
 	/*
-	 * If proc is traced, always give parent a chance;
-	 * if signal event is tracked by procfs, give *that*
-	 * a chance, as well.
+	 * If the signal is being ignored,
+	 * then we forget about it immediately.
+ * (Note: we don't set SIGCONT in ps_sigignore, + * and if it is set to SIG_IGN, + * action will be SIG_DFL here.) */ - if ((p->p_flag & P_TRACED) || (p->p_stops & S_SIG)) { - action = SIG_DFL; - } else { - /* - * If the signal is being ignored, - * then we forget about it immediately. - * (Note: we don't set SIGCONT in ps_sigignore, - * and if it is set to SIG_IGN, - * action will be SIG_DFL here.) - */ - mtx_lock(&ps->ps_mtx); - if (SIGISMEMBER(ps->ps_sigignore, sig) || - (p->p_flag & P_WEXIT)) { - mtx_unlock(&ps->ps_mtx); - return; - } - if (SIGISMEMBER(td->td_sigmask, sig)) - action = SIG_HOLD; - else if (SIGISMEMBER(ps->ps_sigcatch, sig)) - action = SIG_CATCH; - else - action = SIG_DFL; + mtx_lock(&ps->ps_mtx); + if (SIGISMEMBER(ps->ps_sigignore, sig) || + (p->p_flag & P_WEXIT)) { mtx_unlock(&ps->ps_mtx); + return; } + if (SIGISMEMBER(td->td_sigmask, sig)) + action = SIG_HOLD; + else if (SIGISMEMBER(ps->ps_sigcatch, sig)) + action = SIG_CATCH; + else + action = SIG_DFL; + mtx_unlock(&ps->ps_mtx); if (prop & SA_CONT) { SIG_STOPSIGMASK(p->p_siglist); @@ -1866,14 +1857,16 @@ * Mutexes are short lived. Threads waiting on them will * hit thread_suspend_check() soon. */ - } else if (p->p_state == PRS_NORMAL) { - if ((p->p_flag & P_TRACED) || (action != SIG_DFL) || - !(prop & SA_STOP)) { + } else if (p->p_state == PRS_NORMAL) { + if (p->p_flag & P_TRACED || action == SIG_CATCH) { mtx_lock_spin(&sched_lock); tdsigwakeup(td, sig, action); mtx_unlock_spin(&sched_lock); goto out; } + + MPASS(action == SIG_DFL); + if (prop & SA_STOP) { if (p->p_flag & P_PPWAIT) goto out; @@ -1959,34 +1952,26 @@ if ((td->td_flags & TDF_SINTR) == 0) return; /* - * Process is sleeping and traced. Make it runnable - * so it can discover the signal in issignal() and stop - * for its parent. + * If SIGCONT is default (or ignored) and process is + * asleep, we are finished; the process should not + * be awakened. 
*/ - if (p->p_flag & P_TRACED) { - p->p_flag &= ~P_STOPPED_TRACE; - } else { - /* - * If SIGCONT is default (or ignored) and process is - * asleep, we are finished; the process should not - * be awakened. - */ - if ((prop & SA_CONT) && action == SIG_DFL) { - SIGDELSET(p->p_siglist, sig); - /* - * It may be on either list in this state. - * Remove from both for now. - */ - SIGDELSET(td->td_siglist, sig); - return; - } - + if ((prop & SA_CONT) && action == SIG_DFL) { + SIGDELSET(p->p_siglist, sig); /* - * Give low priority threads a better chance to run. + * It may be on either list in this state. + * Remove from both for now. */ - if (td->td_priority > PUSER) - sched_prio(td, PUSER); + SIGDELSET(td->td_siglist, sig); + return; } + + /* + * Give low priority threads a better chance to run. + */ + if (td->td_priority > PUSER) + sched_prio(td, PUSER); + sleepq_abort(td); } else { /* From: =?ISO-8859-1?Q?Se=E1n_C=2E_Farley?= To: David Xu Cc: bug-followup@freebsd.org Subject: Re: gnu/77818: GDB locks in wait4() when running applications Date: Fri, 3 Jun 2005 13:01:06 -0500 (CDT) This message is in MIME format. The first part should be readable text, while the remaining parts are likely unreadable without MIME-aware tools. --0-1475645835-1117821666=:2420 Content-Type: TEXT/PLAIN; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: QUOTED-PRINTABLE On Fri, 3 Jun 2005, David Xu wrote: > Please try following patch, this patch fixes nanosleep problem, > the patch just removed some unnecessary code. > > Index: kern_sig.c Thank you. That fixed the nanosleep() bug for me. I had to apply the patch by hand since it was against 6-CURRENT. Also, I unsquished some comments. 
Here is the patch against 5-STABLE: -------------------------------------------------------------- --- kern_sig.c.orig=09Fri Jun 3 07:32:03 2005 +++ kern_sig.c=09Fri Jun 3 07:50:56 2005 @@ -1689,34 +1689,23 @@ =09} =09/* -=09 * If proc is traced, always give parent a chance; -=09 * if signal event is tracked by procfs, give *that* -=09 * a chance, as well. +=09 * If the signal is being ignored, then we forget about it immediately. +=09 * (Note: we don't set SIGCONT in ps_sigignore, and if it is set to +=09 * SIG_IGN, action will be SIG_DFL here.) =09 */ -=09if ((p->p_flag & P_TRACED) || (p->p_stops & S_SIG)) { -=09=09action =3D SIG_DFL; -=09} else { -=09=09/* -=09=09 * If the signal is being ignored, -=09=09 * then we forget about it immediately. -=09=09 * (Note: we don't set SIGCONT in ps_sigignore, -=09=09 * and if it is set to SIG_IGN, -=09=09 * action will be SIG_DFL here.) -=09=09 */ -=09=09mtx_lock(&ps->ps_mtx); -=09=09if (SIGISMEMBER(ps->ps_sigignore, sig) || -=09=09 (p->p_flag & P_WEXIT)) { -=09=09=09mtx_unlock(&ps->ps_mtx); -=09=09=09return; -=09=09} -=09=09if (SIGISMEMBER(td->td_sigmask, sig)) -=09=09=09action =3D SIG_HOLD; -=09=09else if (SIGISMEMBER(ps->ps_sigcatch, sig)) -=09=09=09action =3D SIG_CATCH; -=09=09else -=09=09=09action =3D SIG_DFL; +=09mtx_lock(&ps->ps_mtx); +=09if (SIGISMEMBER(ps->ps_sigignore, sig) || +=09 (p->p_flag & P_WEXIT)) { =09=09mtx_unlock(&ps->ps_mtx); +=09=09return; =09} +=09if (SIGISMEMBER(td->td_sigmask, sig)) +=09=09action =3D SIG_HOLD; +=09else if (SIGISMEMBER(ps->ps_sigcatch, sig)) +=09=09action =3D SIG_CATCH; +=09else +=09=09action =3D SIG_DFL; +=09mtx_unlock(&ps->ps_mtx); =09if (prop & SA_CONT) { =09=09SIG_STOPSIGMASK(p->p_siglist); @@ -1865,14 +1854,16 @@ =09=09 * Mutexes are short lived. Threads waiting on them will =09=09 * hit thread_suspend_check() soon. 
=09=09 */ -=09} else if (p->p_state =3D=3D PRS_NORMAL) { -=09=09if ((p->p_flag & P_TRACED) || (action !=3D SIG_DFL) || -=09=09=09!(prop & SA_STOP)) { +=09} else if (p->p_state =3D=3D PRS_NORMAL) { +=09=09if ((p->p_flag & P_TRACED) || action =3D=3D SIG_CATCH) { =09=09=09mtx_lock_spin(&sched_lock); =09=09=09tdsigwakeup(td, sig, action); =09=09=09mtx_unlock_spin(&sched_lock); =09=09=09goto out; =09=09} + +=09=09MPASS(action =3D=3D SIG_DFL); + =09=09if (prop & SA_STOP) { =09=09=09if (p->p_flag & P_PPWAIT) =09=09=09=09goto out; @@ -1955,35 +1946,27 @@ =09=09 */ =09=09if ((td->td_flags & TDF_SINTR) =3D=3D 0) =09=09=09return; + =09=09/* -=09=09 * Process is sleeping and traced. Make it runnable -=09=09 * so it can discover the signal in issignal() and stop -=09=09 * for its parent. +=09=09 * If SIGCONT is default (or ignored) and process is asleep, we +=09=09 * are finished; the process should not be awakened. =09=09 */ -=09=09if (p->p_flag & P_TRACED) { -=09=09=09p->p_flag &=3D ~P_STOPPED_TRACE; -=09=09} else { +=09=09if ((prop & SA_CONT) && action =3D=3D SIG_DFL) { +=09=09=09SIGDELSET(p->p_siglist, sig); =09=09=09/* -=09=09=09 * If SIGCONT is default (or ignored) and process is -=09=09=09 * asleep, we are finished; the process should not -=09=09=09 * be awakened. +=09=09=09 * It may be on either list in this state. +=09=09=09 * Remove from both for now. =09=09=09 */ -=09=09=09if ((prop & SA_CONT) && action =3D=3D SIG_DFL) { -=09=09=09=09SIGDELSET(p->p_siglist, sig); -=09=09=09=09/* -=09=09=09=09 * It may be on either list in this state. -=09=09=09=09 * Remove from both for now. -=09=09=09=09 */ -=09=09=09=09SIGDELSET(td->td_siglist, sig); -=09=09=09=09return; -=09=09=09} - -=09=09=09/* -=09=09=09 * Give low priority threads a better chance to run. 
-=09=09=09 */ -=09=09=09if (td->td_priority > PUSER) -=09=09=09=09td->td_priority =3D PUSER; +=09=09=09SIGDELSET(td->td_siglist, sig); +=09=09=09return; =09=09} + +=09=09/* +=09=09 * Give low priority threads a better chance to run. +=09=09 */ +=09=09if (td->td_priority > PUSER) +=09=09=09td->td_priority =3D PUSER; + =09=09sleepq_abort(td); =09} else { =09=09/* -------------------------------------------------------------- Now that you have fixed my bugs, I will now have to go find some more to play with. :) Se=E1n --=20 sean-freebsd@farley.org --0-1475645835-1117821666=:2420-- State-Changed-From-To: open->closed State-Changed-By: davidxu State-Changed-When: Sun Jun 19 11:45:41 GMT 2005 State-Changed-Why: Fixed in -CURRENT. http://www.freebsd.org/cgi/query-pr.cgi?pr=77818 From: David Xu To: =?ISO-8859-1?Q?=22Se=E1n_C=2E_Farley=22?= Cc: freebsd-bugs@freebsd.org Subject: Re: gnu/77818: GDB locks in wait4() when running applications Date: Tue, 19 Apr 2005 16:37:15 +0800 I have commited a fix to -CURRENT, don't know if it will fix the problem you have seen, but it is worth to try. David Xu Seán C. Farley wrote: > I finally compiled Zsh with debugging and got the following trace. I > ran it by the following line: > LD_LIBRARY_PATH=/tmp/libc SHELL=/bin/sh gdb /usr/local/bin/zsh > > Program received signal SIGINT, Interrupt. 
> 0x281a9efb in sigsuspend () from /lib/libc.so.5 > (gdb) where > #0 0x281a9cfb in sigsuspend () at sigsuspend.S:2 > #1 0x280e07dc in signal_suspend (sig=20, sig2=2) at signals.c:367 > #2 0x280b92e8 in waitforpid (pid=11195) at jobs.c:1120 > #3 0x280a3dda in getoutput (cmd=0x8060e5c "uname -s", qt=1) at > exec.c:2869 > #4 0x280e2a0c in stringsubst (list=0xbfbfe7c0, node=0xbfbfe7d0, ssub=4, > asssub=0) at subst.c:189 > #5 0x280e2378 in prefork (list=0xbfbfe7c0, flags=6) at subst.c:74 > #6 0x280a0886 in addvars (state=0xbfbfe880, pc=0xbfbfe7c0, export=0) > at exec.c:1614 > #7 0x2809ec5d in execsimple (state=0x1) at exec.c:802 > #8 0x2809edb7 in execlist (state=0xbfbfe880, dont_change_job=0, > exiting=0) > at exec.c:855 > #9 0x2809eb9a in execode (p=0x80608e0, dont_change_job=0, exiting=0) > at exec.c:775 > #10 0x280b3a8a in loop (toplevel=0, justonce=0) at init.c:165 > #11 0x280b58b8 in source (s=0xbfbfe950 "/root/.zshenv") at init.c:1043 > #12 0x280b5b1c in sourcehome (s=0x280f458d ".zshenv") at init.c:1088 > #13 0x280b54e7 in run_init_scripts () at init.c:937 > #14 0x280b63da in zsh_main (argc=1, argv=0xbfbfeac0) at init.c:1262 > #15 0x08048583 in main (argc=1, argv=0xbfbfeac0) at ./main.c:93 > > I hope this can help find the problem. > > Seán _______________________________________________ freebsd-bugs@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-bugs To unsubscribe, send any mail to "freebsd-bugs-unsubscribe@freebsd.org" From: =?ISO-8859-1?Q?Se=E1n_C=2E_Farley?= To: FreeBSD-gnats-submit@FreeBSD.org, freebsd-bugs@FreeBSD.org Cc: Subject: Re: gnu/77818: GDB locks in wait4() when running applications Date: Mon, 18 Apr 2005 22:26:41 -0500 (CDT) This message is in MIME format. The first part should be readable text, while the remaining parts are likely unreadable without MIME-aware tools. 
--0-4531466-1113858140=:11216
Content-Type: TEXT/PLAIN; CHARSET=ISO-8859-1; format=flowed
Content-Transfer-Encoding: QUOTED-PRINTABLE
Content-ID: <20050418160331.P11216@thor.farley.org>

I finally compiled Zsh with debugging and got the following trace.  I
ran it by the following line:
LD_LIBRARY_PATH=/tmp/libc SHELL=/bin/sh gdb /usr/local/bin/zsh

Program received signal SIGINT, Interrupt.
0x281a9efb in sigsuspend () from /lib/libc.so.5
(gdb) where
#0  0x281a9cfb in sigsuspend () at sigsuspend.S:2
#1  0x280e07dc in signal_suspend (sig=20, sig2=2) at signals.c:367
#2  0x280b92e8 in waitforpid (pid=11195) at jobs.c:1120
#3  0x280a3dda in getoutput (cmd=0x8060e5c "uname -s", qt=1) at exec.c:2869
#4  0x280e2a0c in stringsubst (list=0xbfbfe7c0, node=0xbfbfe7d0, ssub=4, asssub=0) at subst.c:189
#5  0x280e2378 in prefork (list=0xbfbfe7c0, flags=6) at subst.c:74
#6  0x280a0886 in addvars (state=0xbfbfe880, pc=0xbfbfe7c0, export=0) at exec.c:1614
#7  0x2809ec5d in execsimple (state=0x1) at exec.c:802
#8  0x2809edb7 in execlist (state=0xbfbfe880, dont_change_job=0, exiting=0) at exec.c:855
#9  0x2809eb9a in execode (p=0x80608e0, dont_change_job=0, exiting=0) at exec.c:775
#10 0x280b3a8a in loop (toplevel=0, justonce=0) at init.c:165
#11 0x280b58b8 in source (s=0xbfbfe950 "/root/.zshenv") at init.c:1043
#12 0x280b5b1c in sourcehome (s=0x280f458d ".zshenv") at init.c:1088
#13 0x280b54e7 in run_init_scripts () at init.c:937
#14 0x280b63da in zsh_main (argc=1, argv=0xbfbfeac0) at init.c:1262
#15 0x08048583 in main (argc=1, argv=0xbfbfeac0) at ./main.c:93

I hope this can help find the problem.
Seán
-- 
sean-freebsd@farley.org
--0-4531466-1113858140=:11216--

From: David Xu
To: "Seán C. Farley"
Cc: freebsd-bugs@freebsd.org
Subject: Re: gnu/77818: GDB locks in wait4() when running applications
Date: Tue, 26 Apr 2005 07:27:31 +0800

Seán C. Farley wrote:

>
> Would you happen to have a patch to -STABLE for the fix?  I looked at
> some of the code you committed recently and was unsure of what fix(es)
> needed to be made to test it.  A few of the changes did not look like
> they matched easily to -STABLE.
>
> Thank you.
>
> Seán

It was MFCed onto -STABLE.

From: Seán C. Farley
To: David Xu
Cc: freebsd-bugs@freebsd.org
Subject: Re: gnu/77818: GDB locks in wait4() when running applications
Date: Mon, 25 Apr 2005 14:17:28 -0500 (CDT)

This message is in MIME format.  The first part should be readable text,
while the remaining parts are likely unreadable without MIME-aware tools.

--0-583420044-1114456648=:25522
Content-Type: TEXT/PLAIN; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: QUOTED-PRINTABLE

On Tue, 19 Apr 2005, David Xu wrote:

> I have committed a fix to -CURRENT. I don't know if it will fix the
> problem you have seen, but it is worth a try.
>
> David Xu
>
> Seán C. Farley wrote:
>
>> I finally compiled Zsh with debugging and got the following trace.
>> I ran it by the following line:
>> LD_LIBRARY_PATH=/tmp/libc SHELL=/bin/sh gdb /usr/local/bin/zsh
>>
>> Program received signal SIGINT, Interrupt.
>> 0x281a9efb in sigsuspend () from /lib/libc.so.5
>> (gdb) where
>> #0  0x281a9cfb in sigsuspend () at sigsuspend.S:2
>> #1  0x280e07dc in signal_suspend (sig=20, sig2=2) at signals.c:367
>> #2  0x280b92e8 in waitforpid (pid=11195) at jobs.c:1120
>> #3  0x280a3dda in getoutput (cmd=0x8060e5c "uname -s", qt=1) at exec.c:2869
>> #4  0x280e2a0c in stringsubst (list=0xbfbfe7c0, node=0xbfbfe7d0, ssub=4, asssub=0) at subst.c:189
>> #5  0x280e2378 in prefork (list=0xbfbfe7c0, flags=6) at subst.c:74
>> #6  0x280a0886 in addvars (state=0xbfbfe880, pc=0xbfbfe7c0, export=0) at exec.c:1614
>> #7  0x2809ec5d in execsimple (state=0x1) at exec.c:802
>> #8  0x2809edb7 in execlist (state=0xbfbfe880, dont_change_job=0, exiting=0) at exec.c:855
>> #9  0x2809eb9a in execode (p=0x80608e0, dont_change_job=0, exiting=0) at exec.c:775
>> #10 0x280b3a8a in loop (toplevel=0, justonce=0) at init.c:165
>> #11 0x280b58b8 in source (s=0xbfbfe950 "/root/.zshenv") at init.c:1043
>> #12 0x280b5b1c in sourcehome (s=0x280f458d ".zshenv") at init.c:1088
>> #13 0x280b54e7 in run_init_scripts () at init.c:937
>> #14 0x280b63da in zsh_main (argc=1, argv=0xbfbfeac0) at init.c:1262
>> #15 0x08048583 in main (argc=1, argv=0xbfbfeac0) at ./main.c:93
>>
>> I hope this can help find the problem.

Would you happen to have a patch to -STABLE for the fix?  I looked at
some of the code you committed recently and was unsure of what fix(es)
needed to be made to test it.  A few of the changes did not look like
they matched easily to -STABLE.

Thank you.
Seán
-- 
sean-freebsd@farley.org
--0-583420044-1114456648=:25522--

From: David Xu
To: "Seán C. Farley"
Cc: freebsd-bugs@freebsd.org
Subject: Re: gnu/77818: GDB locks in wait4() when running applications
Date: Wed, 27 Apr 2005 08:52:34 +0800

Seán C. Farley wrote:

> On Tue, 26 Apr 2005, David Xu wrote:
>
>> It was MFCed onto -STABLE.
>
> I had updated after your last e-mail but had seen no change.
>
> I updated again this morning (changes to kern_exit.c and kern_sig.c) and
> am still seeing the GDB problem with Zsh.
>
> Is this a separate problem?
>
> Seán

Yes, I think it is a separate problem.

From: Seán C. Farley
To: David Xu
Cc: freebsd-bugs@freebsd.org
Subject: Re: gnu/77818: GDB locks in wait4() when running applications
Date: Tue, 26 Apr 2005 11:23:02 -0500 (CDT)

This message is in MIME format.  The first part should be readable text,
while the remaining parts are likely unreadable without MIME-aware tools.

--0-195141191-1114532582=:823
Content-Type: TEXT/PLAIN; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: QUOTED-PRINTABLE

On Tue, 26 Apr 2005, David Xu wrote:

> It was MFCed onto -STABLE.

I had updated after your last e-mail but had seen no change.

I updated again this morning (changes to kern_exit.c and kern_sig.c) and
am still seeing the GDB problem with Zsh.

Is this a separate problem?
Seán
-- 
sean-freebsd@farley.org
--0-195141191-1114532582=:823--
>Unformatted: