From nobody@FreeBSD.org Mon Sep 22 17:50:59 2008 Return-Path: Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 898F4106566B for ; Mon, 22 Sep 2008 17:50:59 +0000 (UTC) (envelope-from nobody@FreeBSD.org) Received: from www.freebsd.org (www.freebsd.org [IPv6:2001:4f8:fff6::21]) by mx1.freebsd.org (Postfix) with ESMTP id 781B18FC20 for ; Mon, 22 Sep 2008 17:50:59 +0000 (UTC) (envelope-from nobody@FreeBSD.org) Received: from www.freebsd.org (localhost [127.0.0.1]) by www.freebsd.org (8.14.3/8.14.3) with ESMTP id m8MHoxbH023566 for ; Mon, 22 Sep 2008 17:50:59 GMT (envelope-from nobody@www.freebsd.org) Received: (from nobody@localhost) by www.freebsd.org (8.14.3/8.14.3/Submit) id m8MHoxcI023565; Mon, 22 Sep 2008 17:50:59 GMT (envelope-from nobody) Message-Id: <200809221750.m8MHoxcI023565@www.freebsd.org> Date: Mon, 22 Sep 2008 17:50:59 GMT From: Philip Semanchuk To: freebsd-gnats-submit@FreeBSD.org Subject: POSIX (1003.1b) semaphores can become negative X-Send-Pr-Version: www-3.1 X-GNATS-Notify: >Number: 127545 >Category: kern >Synopsis: [sysvipc] POSIX (1003.1b) semaphores can become negative >Confidential: no >Severity: non-critical >Priority: low >Responsible: bms >State: closed >Quarter: >Keywords: >Date-Required: >Class: sw-bug >Submitter-Id: current-users >Arrival-Date: Mon Sep 22 18:00:07 UTC 2008 >Closed-Date: Sat Mar 14 01:13:36 UTC 2009 >Last-Modified: Sat Mar 14 01:13:36 UTC 2009 >Originator: Philip Semanchuk >Release: 7.0 >Organization: >Environment: FreeBSD whiskey.nc.rr.com 7.0-RELEASE FreeBSD 7.0-RELEASE #0: Sun Feb 24 19:59:52 UTC 2008 root@logan.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC i386 >Description: Specifically, when two processes are contending to acquire a POSIX semaphore, the value of the semaphore can become negative. In the sample code (see below), this manifests itself as an overflow error when the following sequence of events occurs: 1) The semaphore has the value 1. 2) Process 1 acquires the semaphore, value becomes 0 3) Process 2 acquires the semaphore, value becomes -1 4) Either process calls sem_post(). That function sees that the semaphore's value == SEM_VALUE_MAX (which is #defined as ~0U) and concludes that it is already incremented to its maximum so it sets errno = EOVERFLOW. I have two different test machines both running FreeBSD 6.0. Using the sample code, both exhibit the problem 100% of the time. One of those test machines also has FreeBSD 7.0 installed. It exhibits the problem 100% of the time. The same sample code never exhibits the problem under OS X 10.5.4 and various flavors of Linux. This problem makes semaphores pretty useless for IPC, but I'm marking this is non-critical and low priority since /usr/src/sys/conf/NOTES says, "p1003_1b_semaphores are very experimental". >How-To-Repeat: The sample code, log, etc. is here: http://semanchuk.com/philip/temp/freebsd_semaphore_test.tar.gz If you can attach a copy of that tarball to this PR, that'd be great. The tarball contains mk.sh which will build the applications premise and conclusion. (As in Mrs. Premise and Mrs. Conclusion from the Monty Python sketch. What else would you expect from a Python extension developer?) They talk with one another to demonstrate usage of shared memory and semaphores. Premise writes a random string (the current time) to shared memory, conclusion reads it, md5s it, writes that back. Premise reads conclusion's message, verifies that it is the md5 of what she (Mrs. Premise) wrote, md5s that and writes it back. And so on, for up to 1000 iterations. However on FreeBSD they fail almost immediately due to the aforementioned EOVERFLOW error. I was curious what was happening so I hacked up uipc_sem.c a little to print more debugging info. (I had to #define SEM_DEBUG in opt_posix.h to get the debug messages to appear in /var/log/messages.) The changes I made are described in uipc_sem.diff (which was made against the version of uipc_sem.c that shipped w/7.0). With these additions to uipc_sem.c, I was able to see the semaphore's value go to -1 and even -2. I included my /var/log/messages. Have a look: grep "post-decrement" messages >Fix: I think a clue to the source of the problem (if you're interested in my opinion) is visible with this: grep -n "pid=" messages Line 18 of the grep output shows pid 58126 entering the the critical section inside kern_sem_wait() and it doesn't leave until line 179. In the meantime, pid 58125 enters and leaves the critical section over and over. It looks like the mtx_lock() call isn't behaving as expected. I noticed that uipc_sem.c changed a lot in rev 1.34, but a inspection didn't lead me to believe that the changes would alter the behavior I'm seeing. >Release-Note: >Audit-Trail: From: Bruce Simpson To: freebsd-gnats-submit@freebsd.org Cc: Subject: Re: kern/127545: [sysvipc] POSIX (1003.1b) semaphores can become negative Date: Thu, 12 Mar 2009 08:51:57 +0000 I just managed to reproduce this on RELENG_7, and I clearly see the semaphore value becoming negative. I am attempting to track down why Python multiprocessing doesn't work. It looks like it's related to how Python processes hang on FreeBSD when a thread forks a new process. On the other hand, multiprocessing does use POSIX semaphores for synchronization; the Python interpreter's use of this itself is optional, and it tries to emulate it with condition variables and mutexes. jhb@ recently merged some cleanups however they don't have much to do with the issue here, they just make the implementation cleaner. From: Bruce Simpson To: freebsd-gnats-submit@freebsd.org, Philip Semanchuk Cc: Subject: Re: kern/127545: [sysvipc] POSIX (1003.1b) semaphores can become negative Date: Thu, 12 Mar 2009 10:38:38 +0000 This is a multi-part message in MIME format. --------------070200030208040807070007 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Can you please try this patch? thanks, BMS --------------070200030208040807070007 Content-Type: text/plain; name="1" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="1" Index: uipc_sem.c =================================================================== --- uipc_sem.c (revision 189736) +++ uipc_sem.c (working copy) @@ -722,7 +722,7 @@ #endif DP(("kern_sem_wait value = %d, tryflag %d\n", ks->ks_value, tryflag)); vfs_timestamp(&ks->ks_atime); - if (ks->ks_value == 0) { + while (ks->ks_value == 0) { ks->ks_waiters++; if (tryflag != 0) error = EAGAIN; --------------070200030208040807070007-- State-Changed-From-To: open->patched State-Changed-By: bms State-Changed-When: Thu 12 Mar 2009 16:43:10 UTC State-Changed-Why: i'll take it Responsible-Changed-From-To: freebsd-bugs->bms Responsible-Changed-By: bms Responsible-Changed-When: Thu 12 Mar 2009 16:43:10 UTC Responsible-Changed-Why: i'll take it http://www.freebsd.org/cgi/query-pr.cgi?pr=127545 From: Bruce Simpson To: Philip Semanchuk Cc: freebsd-gnats-submit@freebsd.org, bug-followup@FreeBSD.org Subject: Re: kern/127545: [sysvipc] POSIX (1003.1b) semaphores can become negative Date: Thu, 12 Mar 2009 16:43:00 +0000 Philip Semanchuk wrote: > ... > I applied the patch to my 7.0-RELEASE #0 code and it works for me. > > Previously my test code would hit the EOVERFLOW error after a few (< > 10) iterations. After applying the patch I ran over 1,000,000 > iterations with no problems. Thank you so much. It's in HEAD and a pending MFC has been set. thanks, BMS From: Bruce Simpson To: Philip Semanchuk Cc: freebsd-gnats-submit@freebsd.org, bug-followup@FreeBSD.org Subject: Re: kern/127545: [sysvipc] POSIX (1003.1b) semaphores can become negative Date: Thu, 12 Mar 2009 16:43:00 +0000 Philip Semanchuk wrote: > ... > I applied the patch to my 7.0-RELEASE #0 code and it works for me. > > Previously my test code would hit the EOVERFLOW error after a few (< > 10) iterations. After applying the patch I ran over 1,000,000 > iterations with no problems. Thank you so much. It's in HEAD and a pending MFC has been set. thanks, BMS From: Philip Semanchuk To: Bruce Simpson Cc: freebsd-gnats-submit@freebsd.org, bug-followup@FreeBSD.org Subject: Re: kern/127545: [sysvipc] POSIX (1003.1b) semaphores can become negative Date: Thu, 12 Mar 2009 12:40:42 -0400 On Mar 12, 2009, at 6:38 AM, Bruce Simpson wrote: > Can you please try this patch? > > thanks, > BMS > Index: uipc_sem.c > =================================================================== > --- uipc_sem.c (revision 189736) > +++ uipc_sem.c (working copy) > @@ -722,7 +722,7 @@ > #endif > DP(("kern_sem_wait value = %d, tryflag %d\n", ks->ks_value, > tryflag)); > vfs_timestamp(&ks->ks_atime); > - if (ks->ks_value == 0) { > + while (ks->ks_value == 0) { > ks->ks_waiters++; > if (tryflag != 0) > error = EAGAIN; I applied the patch to my 7.0-RELEASE #0 code and it works for me. Previously my test code would hit the EOVERFLOW error after a few (< 10) iterations. After applying the patch I ran over 1,000,000 iterations with no problems. Thanks Philip From: Philip Semanchuk To: Bruce Simpson Cc: freebsd-gnats-submit@freebsd.org, bug-followup@FreeBSD.org Subject: Re: kern/127545: [sysvipc] POSIX (1003.1b) semaphores can become negative Date: Thu, 12 Mar 2009 12:40:42 -0400 On Mar 12, 2009, at 6:38 AM, Bruce Simpson wrote: > Can you please try this patch? > > thanks, > BMS > Index: uipc_sem.c > =================================================================== > --- uipc_sem.c (revision 189736) > +++ uipc_sem.c (working copy) > @@ -722,7 +722,7 @@ > #endif > DP(("kern_sem_wait value = %d, tryflag %d\n", ks->ks_value, > tryflag)); > vfs_timestamp(&ks->ks_atime); > - if (ks->ks_value == 0) { > + while (ks->ks_value == 0) { > ks->ks_waiters++; > if (tryflag != 0) > error = EAGAIN; I applied the patch to my 7.0-RELEASE #0 code and it works for me. Previously my test code would hit the EOVERFLOW error after a few (< 10) iterations. After applying the patch I ran over 1,000,000 iterations with no problems. Thanks Philip State-Changed-From-To: patched->closed State-Changed-By: bms State-Changed-When: Sat 14 Mar 2009 01:12:57 UTC State-Changed-Why: I think this is a wrap. We need more in-depth testing with Python before we can turn on its use of POSIX sems for internal thread synchronization, though. http://www.freebsd.org/cgi/query-pr.cgi?pr=127545 >Unformatted: test code saved at: http://people.freebsd.org/~vwe/files/freebsd_semaphore_test.tar.gz