From nomad@crow.ee.washington.edu  Thu Mar 27 23:55:57 2008
Return-Path:
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
        by hub.freebsd.org (Postfix) with ESMTP id 9FD49106564A
        for ; Thu, 27 Mar 2008 23:55:57 +0000 (UTC)
        (envelope-from nomad@crow.ee.washington.edu)
Received: from crow.ee.washington.edu (crow.ee.washington.edu [128.208.232.10])
        by mx1.freebsd.org (Postfix) with ESMTP id 7F59C8FC1D
        for ; Thu, 27 Mar 2008 23:55:57 +0000 (UTC)
        (envelope-from nomad@crow.ee.washington.edu)
Received: from goose.ee.washington.edu (goose.ee.washington.edu [128.208.232.11])
        by crow.ee.washington.edu (8.13.1/8.13.3) with ESMTP id m2RNtsIU019424
        (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
        Thu, 27 Mar 2008 16:55:54 -0700
Received: from goose.ee.washington.edu (localhost [127.0.0.1])
        by goose.ee.washington.edu (8.14.2/8.12.10) with ESMTP id m2RNts2W004226;
        Thu, 27 Mar 2008 16:55:54 -0700 (PDT)
Received: (from nomad@localhost)
        by goose.ee.washington.edu (8.14.2/8.14.2/Submit) id m2RNtsb9004225;
        Thu, 27 Mar 2008 16:55:54 -0700 (PDT)
        (envelope-from nomad)
Message-Id: <200803272355.m2RNtsb9004225@goose.ee.washington.edu>
Date: Thu, 27 Mar 2008 16:55:54 -0700 (PDT)
From: Lee Damon
Reply-To: Lee Damon
To: FreeBSD-gnats-submit@freebsd.org
Cc: nomad@crow.ee.washington.edu
Subject: amd(8) automount daemon dies on 6.3-STABLE i386, fine on amd6
X-Send-Pr-Version: 3.113
X-GNATS-Notify:

>Number:         122172
>Category:       bin
>Synopsis:       [fs]: amd(8) automount daemon dies on 6.3-STABLE i386, fine on amd6
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-fs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri Mar 28 00:00:03 UTC 2008
>Closed-Date:
>Last-Modified:  Tue May 27 02:00:07 UTC 2008
>Originator:     Lee Damon
>Release:        FreeBSD 6.3-STABLE i386
>Organization:
Univ. of Washington Electrical Engr, SSLI LAB
>Environment:
System: FreeBSD goose.ee.washington.edu 6.3-STABLE FreeBSD 6.3-STABLE #6: Wed Mar 26 17:03:35 PDT 2008
        root@goose.ee.washington.edu:/usr/obj/usr/src/sys/NIKO i386

goose was CVSup'd, rebuilt (buildworld, buildkernel), and installed around
15:00 PDT on 26 MAR 2008 in an attempt to solve the problem.

The problem first appeared after the CVSup, buildworld, and buildkernel of
14 FEB 2008 at 11:32 PST. The other i386 system with the same problem, and
the 8 amd64 systems that do not have it, were all CVSup'd and built at that
same time (14 FEB 2008, 11:32 PST).

>Description:
amd(8) is launched at boot (or later), runs briefly, and then aborts. If it
is launched at boot, it never gets past reaping all the child processes it
starts to help it boot up: one of the children (or, in some cases, the
parent) aborts with signal 11 (SIGSEGV).

The gdb output below, and the truss output mentioned further down, were
obtained by starting amd manually after boot. Started that way, amd gets
past the point where the children finish setup, but it eventually dies,
sometimes with signal 10 (SIGBUS) and sometimes with signal 11 (SIGSEGV).

I have truss output and an amd log file available, but GNATS considered
them too big to include in the PR email. The core file and amd binary are
also available for examination if needed. The amd.conf and map files are
identical on all 10 systems.

>How-To-Repeat:
Configure and launch amd on an i386 6.3-STABLE system.
>Fix:
None known.

--- gdb.out begins here ---
Script started on Thu Mar 27 14:42:26 2008
goose# gdb -c amd.core amd
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd"...
Core was generated by `amd'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /usr/X11R6/lib/nss_ldap.so.1...done.
Loaded symbols for /usr/X11R6/lib/nss_ldap.so.1
Reading symbols from /usr/local/lib/libldap-2.3.so.2...done.
Loaded symbols for /usr/local/lib/libldap-2.3.so.2
Reading symbols from /usr/local/lib/liblber-2.3.so.2...done.
Loaded symbols for /usr/local/lib/liblber-2.3.so.2
Reading symbols from /usr/local/lib/libgssapi_krb5.so...done.
Loaded symbols for /usr/local/lib/libgssapi_krb5.so
Reading symbols from /usr/local/lib/libssl.so.5...done.
Loaded symbols for /usr/local/lib/libssl.so.5
Reading symbols from /usr/local/lib/libcrypto.so.5...done.
Loaded symbols for /usr/local/lib/libcrypto.so.5
Reading symbols from /usr/local/lib/libkrb5.so...done.
Loaded symbols for /usr/local/lib/libkrb5.so
Reading symbols from /usr/local/lib/libk5crypto.so...done.
Loaded symbols for /usr/local/lib/libk5crypto.so
Reading symbols from /usr/local/lib/libcom_err.so...done.
Loaded symbols for /usr/local/lib/libcom_err.so
Reading symbols from /usr/local/lib/libkrb5support.so...done.
Loaded symbols for /usr/local/lib/libkrb5support.so
Reading symbols from /libexec/ld-elf.so.1...done.
Loaded symbols for /libexec/ld-elf.so.1
#0  0x0805d8fa in flush_nfs_fhandle_cache (fs=0x0)
    at /usr/src/usr.sbin/amd/amd/../../../contrib/amd/amd/ops_nfs.c:307
307         if (fp->fh_fs == fs || fs == NULL) {
(gdb) bt
#0  0x0805d8fa in flush_nfs_fhandle_cache (fs=0x0)
    at /usr/src/usr.sbin/amd/amd/../../../contrib/amd/amd/ops_nfs.c:307
#1  0x0805272d in amqproc_setopt_1_svc (argp=0xbfbfe4a0, rqstp=0xbfbfe9c0)
    at /usr/src/usr.sbin/amd/amd/../../../contrib/amd/amd/amq_subr.c:157
#2  0x0805337b in amq_program_1 (rqstp=0xbfbfe9c0, transp=0x80b9080)
    at /usr/src/usr.sbin/amd/amd/../../../contrib/amd/amd/amq_svc.c:215
#3  0x28112673 in svc_getreq_common () from /lib/libc.so.6
#4  0x281126e8 in svc_getreqset () from /lib/libc.so.6
#5  0x0805c2a5 in run_rpc ()
    at /usr/src/usr.sbin/amd/amd/../../../contrib/amd/amd/nfs_start.c:294
#6  0x0805c505 in mount_automounter (ppid=2487)
    at /usr/src/usr.sbin/amd/amd/../../../contrib/amd/amd/nfs_start.c:448
#7  0x0804deaa in main (argc=5, argv=0xbfbfecd0)
    at /usr/src/usr.sbin/amd/amd/../../../contrib/amd/amd/amd.c:564
(gdb) where
#0  0x0805d8fa in flush_nfs_fhandle_cache (fs=0x0)
    at /usr/src/usr.sbin/amd/amd/../../../contrib/amd/amd/ops_nfs.c:307
#1  0x0805272d in amqproc_setopt_1_svc (argp=0xbfbfe4a0, rqstp=0xbfbfe9c0)
    at /usr/src/usr.sbin/amd/amd/../../../contrib/amd/amd/amq_subr.c:157
#2  0x0805337b in amq_program_1 (rqstp=0xbfbfe9c0, transp=0x80b9080)
    at /usr/src/usr.sbin/amd/amd/../../../contrib/amd/amd/amq_svc.c:215
#3  0x28112673 in svc_getreq_common () from /lib/libc.so.6
#4  0x281126e8 in svc_getreqset () from /lib/libc.so.6
#5  0x0805c2a5 in run_rpc ()
    at /usr/src/usr.sbin/amd/amd/../../../contrib/amd/amd/nfs_start.c:294
#6  0x0805c505 in mount_automounter (ppid=2487)
    at /usr/src/usr.sbin/amd/amd/../../../contrib/amd/amd/nfs_start.c:448
#7  0x0804deaa in main (argc=5, argv=0xbfbfecd0)
    at /usr/src/usr.sbin/amd/amd/../../../contrib/amd/amd/amd.c:564
(gdb)
goose# exit
exit

Script done on Thu Mar 27 14:42:38 2008
--- gdb.out ends here ---

--- gdb1.out begins here ---
Script started on Thu Mar 27 15:34:48 2008
goose# gdb -c amd.core amd
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd"...
Core was generated by `amd'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /usr/X11R6/lib/nss_ldap.so.1...done.
Loaded symbols for /usr/X11R6/lib/nss_ldap.so.1
Reading symbols from /usr/local/lib/libldap-2.3.so.2...done.
Loaded symbols for /usr/local/lib/libldap-2.3.so.2
Reading symbols from /usr/local/lib/liblber-2.3.so.2...done.
Loaded symbols for /usr/local/lib/liblber-2.3.so.2
Reading symbols from /usr/local/lib/libgssapi_krb5.so...done.
Loaded symbols for /usr/local/lib/libgssapi_krb5.so
Reading symbols from /usr/local/lib/libssl.so.5...done.
Loaded symbols for /usr/local/lib/libssl.so.5
Reading symbols from /usr/local/lib/libcrypto.so.5...done.
Loaded symbols for /usr/local/lib/libcrypto.so.5
Reading symbols from /usr/local/lib/libkrb5.so...done.
Loaded symbols for /usr/local/lib/libkrb5.so
Reading symbols from /usr/local/lib/libk5crypto.so...done.
Loaded symbols for /usr/local/lib/libk5crypto.so
Reading symbols from /usr/local/lib/libcom_err.so...done.
Loaded symbols for /usr/local/lib/libcom_err.so
Reading symbols from /usr/local/lib/libkrb5support.so...done.
Loaded symbols for /usr/local/lib/libkrb5support.so
Reading symbols from /libexec/ld-elf.so.1...done.
Loaded symbols for /libexec/ld-elf.so.1
#0  0x0805d8fa in flush_nfs_fhandle_cache (fs=0x0)
    at /usr/src/usr.sbin/amd/amd/../../../contrib/amd/amd/ops_nfs.c:307
307         if (fp->fh_fs == fs || fs == NULL) {
(gdb) frame 0
#0  0x0805d8fa in flush_nfs_fhandle_cache (fs=0x0)
    at /usr/src/usr.sbin/amd/amd/../../../contrib/amd/amd/ops_nfs.c:307
307         if (fp->fh_fs == fs || fs == NULL) {
(gdb) list
302     flush_nfs_fhandle_cache(fserver *fs)
303     {
304       fh_cache *fp;
305
306       ITER(fp, fh_cache, &fh_head) {
307         if (fp->fh_fs == fs || fs == NULL) {
308           /*
309            * Only invalidate port info for non-WebNFS servers
310            */
311           if (!(fp->fh_fs->fs_flags & FSF_WEBNFS))
(gdb) info frame
Stack level 0, frame at 0xbfbfe450:
 eip = 0x805d8fa in flush_nfs_fhandle_cache (/usr/src/usr.sbin/amd/amd/../../../contrib/amd/amd/ops_nfs.c:307); saved eip 0x805272d
 called by frame at 0xbfbfe470
 source language c.
 Arglist at 0xbfbfe448, args: fs=0x0
 Locals at 0xbfbfe448, Previous frame's sp is 0xbfbfe450
 Saved registers:
  ebp at 0xbfbfe448, eip at 0xbfbfe44c
(gdb) info args
fs = (fserver *) 0x0
(gdb) info locals
fp = (fh_cache *) 0x8
(gdb) print fp
$1 = (fh_cache *) 0x8
(gdb)
Script done on Thu Mar 27 15:35:19 2008
--- gdb1.out ends here ---

>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-i386->freebsd-fs
Responsible-Changed-By: remko
Responsible-Changed-When: Sat Apr 5 08:11:45 UTC 2008
Responsible-Changed-Why:
The backtraces show that amd(8) has a problem; reassign to the fs team
to investigate this.

http://www.freebsd.org/cgi/query-pr.cgi?pr=122172

From: John Hein
To: bug-followup@FreeBSD.org
Cc:
Subject: Re: bin/122172: [amd] [fs]: amd(8) automount daemon dies on 6.3-STABLE i386, fine on amd6
Date: Mon, 7 Apr 2008 21:16:37 -0600

This doesn't help your problem directly, but we've been using amd with
NIS maps and 6.3/i386 without any problems.  What's your configuration?

You might have to debug a little further to find out how fp gets set
to NULL.
You could also try the newer version of am-utils in ports just
to see if it behaves differently.

Have you tried searching back from your cvsup date to see when
it stops seg faulting for you?

From: John Hein
To: Lee Damon
Cc: bug-followup@FreeBSD.org
Subject: Re: bin/122172: [amd] [fs]: amd(8) automount daemon dies on 6.3-STABLE i386, fine on amd6
Date: Tue, 8 Apr 2008 11:52:18 -0600

Lee Damon wrote at 09:40 -0700 on Apr 8, 2008:
 > John Hein wrote:
 > > This doesn't help your problem directly, but we've been using amd with
 > > NIS maps and 6.3/i386 without any problems.  What's your configuration?
 >
 > The maps are flat files but we use LDAP.
 >
 > > You could also try the newer version of am-utils in ports just
 > > to see if it behaves differently.
 >
 > thanks for the hints.  Sadly the version in the ports tree died the same
 > horrible death.

You should put that information in the PR (CC restored).

 > > Have you tried searching back from your cvsup date to see when
 > > it stops seg faulting for you?
 >
 > These are production machines, I can't take them down for the time it
 > would take to do that :(

Unfortunately, all I have are debugging suggestions...

- Bring up a non-production machine to play with.
- Bring up a virtual machine or jail to play with.
- Start with a bare bones amd config (e.g., without anything but
  the default maps & .conf files).  If there's no core dump, then
  add back parts of your config until it dies.
- Compile amd with debug on and turn up the debug level to see if
  you get any hints.
- Trace deeper into the code to find the source of the null ptr.
- Try asking on the am-utils mailing list.

From: Lee Damon
To: bug-followup@FreeBSD.org, nomad@crow.ee.washington.edu
Cc:
Subject: Re: bin/122172: [amd] [fs]: amd(8) automount daemon dies on 6.3-STABLE i386, fine on amd6
Date: Tue, 08 Apr 2008 10:59:27 -0700

 > You could also try the newer version of am-utils in ports just
 > to see if it behaves differently.

Just tried, same failure (exited with signal 10).
Corefile & binary are available if you want them, but the port compile
defaulted to no debugging and I forgot to turn it on, so there's not a lot
of information there. Since these are both production machines, and an amd
crash requires rebooting the host, I can't easily test again.

nomad

>Unformatted:
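In the gdb1.out session, `fs` is NULL (the flush-all case, since line 307 matches every entry when `fs == NULL`) while the loop cursor `fp` holds 0x8. That suggests the fault at ops_nfs.c:307 comes from a damaged forward link in the `fh_head` list rather than from the NULL argument itself. One way to act on the "trace deeper into the code" suggestion is to check list integrity on every walk. The C sketch below assumes a sentinel-headed circular doubly-linked list walked like the `ITER` loop in the listing; the names `struct entry`, `struct server`, `ring_init`, `ring_append` and `checked_flush` are hypothetical, and the layout is an illustration, not the actual am-utils `fh_cache` definition.

```c
/*
 * Hypothetical sketch, not am-utils code: a circular doubly-linked list
 * with a sentinel head, walked the way the loop at ops_nfs.c:306 walks
 * fh_head (stop when the cursor returns to the head).
 */
#include <assert.h>
#include <stddef.h>

/* Stand-in for per-server state (hypothetical). */
struct server {
    int flags;
};

/* One cache entry on the ring (hypothetical layout). */
struct entry {
    struct entry *next;   /* forward link */
    struct entry *prev;   /* backward link */
    struct server *srv;   /* server this entry belongs to */
};

/* An empty ring is a head that points at itself in both directions. */
static void ring_init(struct entry *head)
{
    head->next = head;
    head->prev = head;
    head->srv = NULL;
}

/* Link e in just before the head, i.e. at the tail of the ring. */
static void ring_append(struct entry *head, struct entry *e)
{
    assert(head != NULL && e != NULL);
    e->next = head;
    e->prev = head->prev;
    head->prev->next = e;
    head->prev = e;
}

/*
 * Count entries that match srv, where srv == NULL matches every entry
 * (the flush-all case seen in the core).  Before stepping to the next
 * entry, confirm that its back link points at the current entry; a NULL
 * or mismatched link means the ring was damaged, so return -1 instead
 * of continuing the walk.
 *
 * Caveat: this still dereferences the next pointer, so it cannot make a
 * wild value such as 0x8 safe; it only catches links that are NULL or
 * that lead to the wrong (but addressable) entry.
 */
static int checked_flush(struct entry *head, const struct server *srv)
{
    int matched = 0;
    struct entry *cur = head;

    for (;;) {
        struct entry *nxt = cur->next;

        if (nxt == NULL || nxt->prev != cur)
            return -1;          /* inconsistent link: stop here */
        if (nxt == head)
            break;              /* back at the sentinel: walk complete */
        if (srv == NULL || nxt->srv == srv)
            matched++;
        cur = nxt;
    }
    return matched;
}
```

For example, with three entries appended for two servers, `checked_flush(&head, NULL)` returns 3; after one entry's `next` is overwritten with the address of an entry that was never linked in, the same call returns -1 instead of walking into it. A check like this reports the damage on the first walk after it happens, which can help narrow down which code path corrupted the list before a later walk faults on a wild pointer.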