From nobody@FreeBSD.org Fri Apr 8 14:28:50 2005
Return-Path:
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 8269C16A4CE
	for ; Fri, 8 Apr 2005 14:28:50 +0000 (GMT)
Received: from www.freebsd.org (www.freebsd.org [216.136.204.117])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 5E25043D39
	for ; Fri, 8 Apr 2005 14:28:50 +0000 (GMT)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (localhost [127.0.0.1])
	by www.freebsd.org (8.13.1/8.13.1) with ESMTP id j38ESok1077372
	for ; Fri, 8 Apr 2005 14:28:50 GMT
	(envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
	by www.freebsd.org (8.13.1/8.13.1/Submit) id j38ESoUK077371;
	Fri, 8 Apr 2005 14:28:50 GMT
	(envelope-from nobody)
Message-Id: <200504081428.j38ESoUK077371@www.freebsd.org>
Date: Fri, 8 Apr 2005 14:28:50 GMT
From: Steve Sears
To: freebsd-gnats-submit@FreeBSD.org
Subject: svctcp_create() fails if multiple threads call at the same time, it's not thread-safe
X-Send-Pr-Version: www-2.3

>Number:         79683
>Category:       threads
>Synopsis:       svctcp_create() fails if multiple threads call at the same time, it's not thread-safe
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-threads
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri Apr 08 14:30:20 GMT 2005
>Closed-Date:
>Last-Modified:
>Originator:     Steve Sears
>Release:        5.3-STABLE, 5.4-STABLE
>Organization:
self
>Environment:
FreeBSD sjs-bsd 5.4-STABLE FreeBSD 5.4-STABLE #17: Thu Apr 7 08:15:24 EDT 2005 root@sjs-bsd:/usr/src/sys-sjs/i386/compile/SJSKERN i386
>Description:
The svctcp_create() function returns NULL if it fails to create a TCP/IP-based RPC service transport for the server. On BSD, this function often fails when several threads concurrently try to register as servers. When it begins to fail, subsequent retries very rarely work (although I have seen it happen).
I frequently see svctcp_create() fail (it returns NULL):

    SVCXPRT *transp = svctcp_create(sockfd, 0, 0);

with the syslog message "Could not get tcp transport" from the lib/rpc code:

    static SVCXPRT *
    svc_com_create(fd, sendsize, recvsize, netid)
    ..
        if ((nconf = __rpc_getconfip(netid)) == NULL) {
            (void) syslog(LOG_ERR, "Could not get %s transport", netid);
            return (NULL);
        }

The culprit appears to be the setnetconfig(), getnetconfig(), endnetconfig() code used by __rpc_getconfip(netid). setnetconfig() returns a handle, which is accessed through getnetconfig() and released by endnetconfig(). This scary code attempts to share in-memory structures and an open file pointer among multiple accessing threads, but it appears that not all access paths guarantee that all the data will be available to the other threads.
>How-To-Repeat:
Don't have a small test program, sorry. Our program creates 5 listener threads, which try to invoke this code at roughly the same time.
>Fix:
Staggering the threads by 1 second and invoking lots of retries seems to work. But it's ugly!
>Release-Note:
>Audit-Trail:
>Unformatted:
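
A possibly less ugly application-side workaround than staggering the threads would be to serialize the creation call behind a single mutex, so that only one thread at a time is inside the setnetconfig()/getnetconfig()/endnetconfig() path. The following is only a sketch and has not been tested against the FreeBSD libc: it assumes the failure is confined to concurrent callers of svctcp_create(), and the names serialized_create(), create_fn, probe_create() and run_probe() are hypothetical. The probe stands in for the real call so the sketch builds without the RPC headers.

```c
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical callback type wrapping the transport creation. In the
 * scenario from this report it would call svctcp_create(sockfd, 0, 0)
 * and return the resulting SVCXPRT pointer. */
typedef void *(*create_fn)(void *arg);

/* One process-wide lock: only one thread at a time may run the
 * (apparently non-thread-safe) creation path. */
static pthread_mutex_t create_lock = PTHREAD_MUTEX_INITIALIZER;

void *serialized_create(create_fn fn, void *arg)
{
    void *result;

    pthread_mutex_lock(&create_lock);
    result = fn(arg);
    pthread_mutex_unlock(&create_lock);
    return result;
}

/* Probe used in place of svctcp_create(): records how many threads are
 * inside the creation path at once. Only touched while holding the lock. */
static int inside, max_inside;

static void *probe_create(void *arg)
{
    (void)arg;
    inside++;
    if (inside > max_inside)
        max_inside = inside;
    sched_yield();          /* give other threads a chance to interleave */
    inside--;
    return &inside;         /* non-NULL stands for "transport created" */
}

static void *listener(void *arg)
{
    return serialized_create(probe_create, arg);
}

/* Starts nthreads listeners (capped at 16) and returns the highest number
 * of threads seen inside the creation path, or -1 if any listener failed. */
int run_probe(int nthreads)
{
    pthread_t tid[16];
    int i, created = 0, failures = 0;

    if (nthreads > 16)
        nthreads = 16;
    inside = max_inside = 0;
    for (i = 0; i < nthreads; i++) {
        if (pthread_create(&tid[created], NULL, listener, NULL) == 0)
            created++;
        else
            failures++;
    }
    for (i = 0; i < created; i++) {
        void *r = NULL;
        pthread_join(tid[i], &r);
        if (r == NULL)
            failures++;
    }
    return failures == 0 ? max_inside : -1;
}
```

With five listeners, as in our program, run_probe(5) should report that at most one thread was ever inside the creation path. This only hides the race from the application; the netconfig code in lib/rpc would still need a real fix.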