From vince@pele.WURLDLINK.NET Tue Nov 9 01:11:54 1999
Return-Path:
Received: from pele.WURLDLINK.NET (pele.WURLDLINK.NET [208.164.68.2])
    by hub.freebsd.org (Postfix) with ESMTP id 31A1515164
    for ; Tue, 9 Nov 1999 01:11:53 -0800 (PST)
    (envelope-from vince@pele.WURLDLINK.NET)
Received: (from root@localhost)
    by pele.WURLDLINK.NET (8.9.3/8.9.3) id XAA35665;
    Mon, 8 Nov 1999 23:11:18 -1000 (HST)
    (envelope-from vince)
Message-Id: <199911090911.XAA35665@pele.WURLDLINK.NET>
Date: Mon, 8 Nov 1999 23:11:18 -1000 (HST)
From: Vincent Poy
Reply-To: vince@pele.WURLDLINK.NET
To: FreeBSD-gnats-submit@freebsd.org
Subject: Serious locking problem in CURRENT
X-Send-Pr-Version: 3.2

>Number: 14797
>Category: kern
>Synopsis: Serious locking problem in CURRENT
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: freebsd-bugs
>State: closed
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Tue Nov 9 01:20:00 PST 1999
>Closed-Date: Mon Aug 7 07:41:35 PDT 2000
>Last-Modified: Mon Aug 07 07:43:16 PDT 2000
>Originator: Vincent Poy
>Release: FreeBSD 4.0-CURRENT i386
>Organization:
Wurldlink Corporation - San Francisco - Honolulu - Hong Kong
>Environment:

FreeBSD -CURRENT as of November 8, 1999 12:00AM PDT.

>Description:

There is something broken in -CURRENT with file locking since I've
experienced this with sendmail 8.9.3. I compared this to a 3.3-RELEASE
machine running sendmail 8.9.3 and it doesn't exhibit the same problem.

>How-To-Repeat:

You can do a little test of the file locking, might be a bit tricky if
you have a busy system, but it would be interesting to see the result:

Run sendmail with -bd -q1m

Send a message to an "unused" IP address on your local network, e.g.

    date | sendmail 'nobody@[123.123.123.123]'

(substitute an appropriate IP address of course).
This should have the (backgrounded) original sendmail process sitting
waiting with the queue file locked for just over one minute, so you need
to hurry a bit with the rest:

Run 'mailq' - does this message have a '*' in the first column (it
should)?

Take the queue ID for the message - shown in the first column of mailq
output (immediately following the '*', if any) - say XAA01234, and do a
verbose queue run for just that ID:

    sendmail -v -qIXAA01234

(substituting the queue ID you got of course, i.e. -qI) - this should
just print

    Running XAA03875 (sequence 1 of 1)
    XAA03875: locked

and then exit - does it?

>Fix:

>Release-Note:
>Audit-Trail:

From: Sheldon Hearn
To: vince@pele.WURLDLINK.NET
Cc: FreeBSD-gnats-submit@FreeBSD.ORG
Subject: Re: kern/14797: Serious locking problem in CURRENT
Date: Tue, 09 Nov 1999 12:45:36 +0200

 On Mon, 08 Nov 1999 23:11:18 -1000, Vincent Poy wrote:

 > You can do a little test of the file locking, might be a bit tricky if
 > you have a busy system, but it would be interesting to see the result:
 >
 > Run sendmail with -bd -q1m

 Sendmail isn't a "little test" of anything. :-)

 There are discussions on the -current mailing list (which you should be
 reading if you're posting CURRENT problem reports) regarding file
 locking. I believe Brian Feldman fixed a locking problem over the
 week-end. You'd know this too if you followed your commit mail, eh? ;-)

 Please try kern_descrip.c rev 1.72 and see if it fixes whatever problem
 you're having.

 Ciao,
 Sheldon.

State-Changed-From-To: open->feedback
State-Changed-By: sheldonh
State-Changed-When: Tue Nov 9 03:14:15 PST 1999
State-Changed-Why:
Waiting for Vincent to try with up to date CURRENT sources.

State-Changed-From-To: feedback->closed
State-Changed-By: sheldonh
State-Changed-When: Mon Aug 7 07:41:35 PDT 2000
State-Changed-Why:
Johan Karlsson reminded me that this one has timed out waiting for
feedback.
http://www.freebsd.org/cgi/query-pr.cgi?pr=14797
>Unformatted:

From the above tests, the file locking does work in general. However,
it could still be a race condition. Here's another test, which is closer
to the sendmail situation:

Create a little shell script

    #!/bin/sh
    sleep 300
    cat > /tmp/message.$$

and an alias pointing to it:

    testalias: "|/path/to/script"

- then set the daemon to run with -q1m, and send a single mail to
"testalias". If the problem appears in this test, you should have (after
5 minutes) multiple /tmp/message.nnnnn (the nnnnn being process IDs)
files, each containing the message you sent. If you check /tmp in 10
minutes, you will notice that on -CURRENT the same message has been
regenerated a few times, while on 3.3-RELEASE there is only one
/tmp/message.nnnnn file.

Then repeat the test, but this time send the single message to testalias
with the command:

    sendmail -odq -oi testalias < messagefile

It might also be worth testing with

    sendmail -odi -oi testalias < messagefile

The last form will seem to hang until the message is delivered. If there
is only one '/tmp/message.nnnnn' produced in each of these tests, it will
suggest that your system is losing its locks over the fork made for
delivery. With '-odq', the message is placed in the queue for later
delivery attempts, and the queue run does not normally fork for delivery.
With '-odi' it is delivered interactively without a fork. With neither of
those operands, or with '-odb', there is a fork before delivery.

On all of these tests, 3.3-RELEASE will generate only one
/tmp/message.nnnnn while -CURRENT will generate multiple
/tmp/message.nnnnn.
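
The outcome of the alias test (one /tmp/message.nnnnn file versus
several) can be checked with a small script instead of by eye. What
follows is only a minimal sketch and not part of the original report:
the function names count_messages and check_delivery are hypothetical,
and the optional directory argument is an assumption added so the check
can be tried somewhere other than /tmp.

```shell
#!/bin/sh
# Hypothetical helpers (not from the original report) for inspecting the
# result of the testalias experiment described above.

# count_messages [DIR]: print how many message.<pid> files exist in DIR.
# DIR defaults to /tmp, where the test script writes its output.
count_messages() {
    dir=${1:-/tmp}
    n=0
    for f in "$dir"/message.*; do
        # An unmatched glob stays literal, so confirm the file exists.
        [ -f "$f" ] && n=$((n + 1))
    done
    echo "$n"
}

# check_delivery [DIR]: say whether the single test mail was delivered
# zero times, once, or more than once.
check_delivery() {
    n=$(count_messages "$1")
    case "$n" in
        0) echo "no delivery yet" ;;
        1) echo "single delivery" ;;
        *) echo "duplicate delivery ($n copies)" ;;
    esac
}
```

After sourcing these definitions, running check_delivery once the
5 minute sleep has passed (and again after 10 minutes) reports whether
more than one copy of the message was written. Remove old
/tmp/message.* files before each run so earlier tests are not counted.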