From nobody@FreeBSD.org Mon Aug 17 22:29:45 2009
Return-Path:
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
    by hub.freebsd.org (Postfix) with ESMTP id 6B24A1065696
    for ; Mon, 17 Aug 2009 22:29:45 +0000 (UTC)
    (envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (www.freebsd.org [IPv6:2001:4f8:fff6::21])
    by mx1.freebsd.org (Postfix) with ESMTP id 5B4FE8FC55
    for ; Mon, 17 Aug 2009 22:29:45 +0000 (UTC)
Received: from www.freebsd.org (localhost [127.0.0.1])
    by www.freebsd.org (8.14.3/8.14.3) with ESMTP id n7HMTjrZ028833
    for ; Mon, 17 Aug 2009 22:29:45 GMT
    (envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
    by www.freebsd.org (8.14.3/8.14.3/Submit) id n7HMTiow028832;
    Mon, 17 Aug 2009 22:29:44 GMT
    (envelope-from nobody)
Message-Id: <200908172229.n7HMTiow028832@www.freebsd.org>
Date: Mon, 17 Aug 2009 22:29:44 GMT
From: Bruce Cran
To: freebsd-gnats-submit@FreeBSD.org
Subject: [libkvm] ps segfaults with -ax when inspecting core files
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         137890
>Category:       kern
>Synopsis:       [libkvm] [patch] ps segfaults with -ax when inspecting core files
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    brucec
>State:          closed
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Aug 17 22:30:09 UTC 2009
>Closed-Date:    Sun Feb 28 14:10:46 UTC 2010
>Last-Modified:  Sun Feb 28 14:10:46 UTC 2010
>Originator:     Bruce Cran
>Release:        8.0-BETA2
>Organization:
>Environment:
FreeBSD tau.draftnet 8.0-BETA2 FreeBSD 8.0-BETA2 #0: Sun Aug 16 19:32:23 BST 2009 brucec@tau.draftnet:/usr/obj/usr/src/sys/DELL amd64
>Description:
When recovering from a crash, crashinfo(8) is run; it executes
'ps -ax -M corefile' which causes ps to segfault and attempt to write a
1GB core file to /

The crash can be reproduced after the system has booted by running
'ps -ax -M /var/crash/vmcore.x'.

The faulty code appears to be in lib/libkvm/kvm_proc.c around line 561,
though the underlying cause is that the symbol table appears to be
unreadable (inferred from the -1 return value of kvm_nlist). It seems
it's stepping past the nlist array and calling vsnprintf with a bad
argument.

kvm_nlist returns -1 to report that the symbol table couldn't be read,
but the code assumes it has returned a positive number to indicate that
there's an invalid entry, so it starts searching for that entry where
n_type is 0.

tau# gdb ps
GNU gdb 6.1.1 [FreeBSD]
[...]
(gdb) run -ax -M /var/crash/vmcore.3
Starting program: /bin/ps -ax -M /var/crash/vmcore.3

Program received signal SIGSEGV, Segmentation fault.
0x000000080096340b in strlen (str=Variable "str" is not available.
) at /usr/src/lib/libc/string/strlen.c:88
88              if (*p == '\0')
(gdb) bt
#0  0x000000080096340b in strlen (str=Variable "str" is not available.
) at /usr/src/lib/libc/string/strlen.c:88
#1  0x000000080095c082 in __vfprintf (fp=0x7fffffffd9a0, fmt0=0x800773915 "%s: no such symbol", ap=0x7fffffffdb10)
    at /usr/src/lib/libc/stdio/vfprintf.c:825
#2  0x00000008008cc696 in vsnprintf (str=Variable "str" is not available.
) at /usr/src/lib/libc/stdio/vsnprintf.c:70
#3  0x0000000800772e89 in _kvm_err (kd=Variable "kd" is not available.
) at /usr/src/lib/libkvm/kvm.c:104
#4  0x0000000800770907 in kvm_getprocs (kd=0x800b02300, op=8, arg=0, cnt=0x7fffffffdf1c)
    at /usr/src/lib/libkvm/kvm_proc.c:561
#5  0x0000000000405322 in main (argc=4, argv=0x7fffffffe9a8) at /usr/src/bin/ps/ps.c:511
(gdb) frame 4
#4  0x0000000800770907 in kvm_getprocs (kd=0x800b02300, op=8, arg=0, cnt=0x7fffffffdf1c)
    at /usr/src/lib/libkvm/kvm_proc.c:561
561                     _kvm_err(kd, kd->program,
(gdb) list
556             nl[5].n_name = 0;
557
558             if (kvm_nlist(kd, nl) != 0) {
559                     for (p = nl; p->n_type != 0; ++p)
560                             ;
561                     _kvm_err(kd, kd->program,
562                             "%s: no such symbol", p->n_name);
563                     return (0);
564             }
565             if (KREAD(kd, nl[0].n_value, &nprocs)) {
(gdb) print nl
$1 = {{n_name = 0x8007738ef "_nprocs", n_type = 240 'ð', n_other = -1 'ÿ', n_desc = -1, n_value = 34365215744}, {
    n_name = 0x8007738f7 "_allproc", n_type = 160 ' ', n_other = -100 '\234', n_desc = 80, n_value = 0}, {
    n_name = 0x800773900 "_zombproc", n_type = 57 '9', n_other = 2 '\002', n_desc = 81, n_value = 34367538496}, {
    n_name = 0x80077390a "_ticks", n_type = 74 'J', n_other = 0 '\0', n_desc = 0, n_value = 34365215744}, {
    n_name = 0x800773911 "_hz", n_type = 168 '¨', n_other = -23 'é', n_desc = -1, n_value = 140737488349576}, {n_name = 0x0, n_type = 1 '\001', n_other = 0 '\0', n_desc = 0, n_value = 34365024109}}
>How-To-Repeat:
Run 'ps -ax -M /var/crash/vmcore.x'
>Fix:

>Release-Note:
>Audit-Trail:

State-Changed-From-To: open->feedback
State-Changed-By: gavin
State-Changed-When: Tue Aug 18 13:14:10 UTC 2009
State-Changed-Why:
Can you try http://people.freebsd.org/~gavin/PRs/137890.diff ?
The failing part is attempting to check that all symbols were found.
Looking at the kvm_nlist manpage, the list passed to kvm_nlist is
supposed to be terminated by "p->n_name == NULL", however this wasn't
being checked. We were therefore wandering off the end of the list.
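
To make the termination rule above concrete, here is a minimal sketch in C. It is not libkvm code: struct sym_entry is a hypothetical stand-in for struct nlist that keeps only the two fields involved, and count_missing() and nlist_error() are names made up for this illustration. The sketch assumes the return convention described in this PR (0 for success, -1 when the symbol table cannot be read, a positive count of unresolved entries otherwise).

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for struct nlist; only the fields used here. */
struct sym_entry {
    const char *n_name;    /* NULL marks the end of the array */
    unsigned char n_type;  /* 0 means the symbol was not resolved */
};

/*
 * Count unresolved entries and report the first one in *first_missing.
 * The walk stops at the NULL n_name terminator, so it stays inside the
 * array even when every n_type holds nonzero garbage.
 */
static int
count_missing(const struct sym_entry *list, const char **first_missing)
{
    const struct sym_entry *p;
    int missing = 0;

    *first_missing = NULL;
    for (p = list; p->n_name != NULL; ++p) {
        if (p->n_type == 0) {
            if (missing == 0)
                *first_missing = p->n_name;
            missing++;
        }
    }
    return (missing);
}

/*
 * Turn a kvm_nlist-style return code into an error message.
 * Returns 0 on success and -1 on any failure; len must be at least 1.
 */
static int
nlist_error(int rc, const struct sym_entry *list, char *buf, size_t len)
{
    const char *name;

    if (rc == 0) {
        buf[0] = '\0';
        return (0);
    }
    /* A negative code means the table was unreadable: do not scan. */
    if (rc < 0 || count_missing(list, &name) == 0) {
        snprintf(buf, len, "cannot read symbol table");
        return (-1);
    }
    snprintf(buf, len, "%s: no such symbol", name);
    return (-1);
}
```

The key design choice is that the loop condition tests n_name and the loop body tests n_type, rather than using n_type as the loop condition, which is what let the original loop run past the terminator shown in the gdb dump (n_name = 0x0, n_type = 1).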

Responsible-Changed-From-To: freebsd-bugs->gavin
Responsible-Changed-By: gavin
Responsible-Changed-When: Tue Aug 18 13:14:10 UTC 2009
Responsible-Changed-Why:
Track

http://www.freebsd.org/cgi/query-pr.cgi?pr=137890

From: Gavin Atkinson
To: bug-followup@FreeBSD.org, bruce@cran.org.uk
Cc:
Subject: Re: kern/137890: [libkvm] [patch] ps segfaults with -ax when inspecting core files
Date: Tue, 18 Aug 2009 14:46:46 +0100

Hmm, there may be more to this. I'm pretty sure that patch is correct
regardless, however it does appear that kvm_nlist() is returning !=0
even though the structure returned seems to have been fully filled in.

Can you add a printf to the code to determine what kvm_nlist() is
returning? It will be interesting to see if it is -1, or a positive
integer.

The patch at least fixes one bug and should prevent the core dump you
are seeing.

Gavin

From: Bruce Cran
To: Gavin Atkinson
Cc: bug-followup@FreeBSD.org
Subject: Re: kern/137890: [libkvm] [patch] ps segfaults with -ax when inspecting core files
Date: Tue, 18 Aug 2009 15:22:17 +0100

kvm_nlist is returning -1, which from the manpage indicates that it
couldn't read the symbol table. But, the structure does seem to have
been filled in. I'll debug kvm_nlist itself to see why it's filling it
in but not returning 0.

--
Bruce

From: Gavin Atkinson
To: bug-followup@FreeBSD.org
Cc:
Subject: Re: kern/137890: [libkvm] [patch] ps segfaults with -ax when inspecting core files
Date: Tue, 18 Aug 2009 15:37:15 +0100

On Tue, 2009-08-18 at 13:50 +0000, Gavin Atkinson wrote:
> Hmm, there may be more to this. I'm pretty sure that patch is correct
> regardless, however it does appear that kvm_nlist() is returning !=0
> even though the structure returned seems to have been fully filled in.

Ignore this, I don't think the structure has been filled in at all, and
is instead just random contents of memory.

I've created a new patch with slightly better error handling at
http://people.freebsd.org/~gavin/PRs/137890.2.diff - please give that a
go and see if it solves the coredump for you and properly fails with an
error message.

FWIW, it looks like several other uses of kvm_nlist() in libkvm suffer
the same bug with how they check the validity of the returned data.

The root cause of why libkvm is failing on your coredump is still
unknown.

Gavin

From: Bruce Cran
To: bug-followup@FreeBSD.org, bruce@cran.org.uk
Cc:
Subject: Re: kern/137890: [libkvm] [patch] ps segfaults with -ax when inspecting core files
Date: Sun, 23 Aug 2009 21:48:53 +0100

--MP_/Hne4C+2LQmeLzp1pVRjPwT0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

The attached patches fix the crash. The first bug is that ps(1) passes
"/dev/null" into kvm_open(3) instead of NULL. The second problem is
that the bcopy call fails in kvm_proc.c; it looks like it's because
ucred.cr_groups is a kernel address, but without knowing the details of
the code I can't be sure. Translating the address with KREAD stops the
crash occurring, but may not be the correct solution.
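
The kernel-address point above can be illustrated with a small, self-contained C model. Nothing here is libkvm API: struct fake_core, core_read() and read_groups() are hypothetical names, and the "crash dump" is just a byte buffer assumed to start at a made-up kernel base address. The sketch shows why a pointer copied out of a dump cannot be handed to bcopy directly and has to go through a reader that translates the address; it also sizes the read by the number of groups rather than by a single object.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a crash dump: bytes that begin at a kernel base. */
struct fake_core {
    uintptr_t base;              /* kernel address of bytes[0] (assumed) */
    const unsigned char *bytes;  /* dump contents */
    size_t size;                 /* number of bytes in the dump */
};

/*
 * Copy len bytes stored at kernel address kaddr out of the image.
 * Returns the number of bytes copied, or -1 if the range is not in the dump.
 */
static long
core_read(const struct fake_core *core, uintptr_t kaddr, void *buf, size_t len)
{
    uintptr_t off;

    if (kaddr < core->base)
        return (-1);
    off = kaddr - core->base;
    if (off > core->size || len > core->size - off)
        return (-1);
    memcpy(buf, core->bytes + off, len);
    return ((long)len);
}

/*
 * Fetch ngroups group ids whose array lives at kernel address kaddr.
 * kaddr is only meaningful inside the dump, so it is never dereferenced.
 */
static int
read_groups(const struct fake_core *core, uintptr_t kaddr,
    unsigned int *out, size_t ngroups)
{
    size_t want = ngroups * sizeof(*out);

    return (core_read(core, kaddr, out, want) == (long)want ? 0 : -1);
}
```

In this model, passing the kernel address straight to a memory copy would read from whatever that address means in the inspecting process, which is the failure mode suspected above.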

--
Bruce

--MP_/Hne4C+2LQmeLzp1pVRjPwT0
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename=kvm_proc.c.diff.txt

--- kvm_proc.c.orig 2009-08-03 09:13:06.000000000 +0100
+++ kvm_proc.c 2009-08-23 20:37:26.000000000 +0100
@@ -118,6 +118,7 @@
     struct timeval tv;
     struct sysentvec sysent;
     char svname[KI_EMULNAMELEN];
+    void *crg;
 
     kp = &kinfo_proc;
     kp->ki_structsize = sizeof(kinfo_proc);
@@ -150,8 +151,14 @@
                 kp->ki_ngroups = KI_NGROUPS;
                 kp->ki_cr_flags |= KI_CRF_GRP_OVERFLOW;
             }
-            kp->ki_ngroups = ucred.cr_ngroups;
-            bcopy(ucred.cr_groups, kp->ki_groups,
+            kp->ki_ngroups = ucred.cr_ngroups;
+            if (KREAD(kd, (u_long)ucred.cr_groups, &crg)) {
+                _kvm_err(kd, kd->program,
+                    "can't read cr_groups at %p",
+                    ucred.cr_groups);
+                return (-1);
+            }
+            bcopy(&crg, kp->ki_groups,
                 kp->ki_ngroups * sizeof(gid_t));
             kp->ki_uid = ucred.cr_uid;
             if (ucred.cr_prison != NULL) {
@@ -472,7 +479,7 @@
 {
     int mib[4], st, nprocs;
     size_t size;
-    int temp_op;
+    int err, temp_op;
 
     if (kd->procbase != 0) {
         free((void *)kd->procbase);
@@ -555,11 +562,16 @@
         nl[4].n_name = "_hz";
         nl[5].n_name = 0;
 
-        if (kvm_nlist(kd, nl) != 0) {
-            for (p = nl; p->n_type != 0; ++p)
-                ;
+        err = kvm_nlist(kd, nl);
+        if (err == -1) {
             _kvm_err(kd, kd->program,
-                "%s: no such symbol", p->n_name);
+                "cannot read symbol table");
+            return (0);
+        } else if (err > 0) {
+            for (p = nl; p->n_name != NULL; ++p)
+                if (p->n_type == 0)
+                    _kvm_err(kd, kd->program,
+                        "%s: no such symbol", p->n_name);
             return (0);
         }
         if (KREAD(kd, nl[0].n_value, &nprocs)) {

--MP_/Hne4C+2LQmeLzp1pVRjPwT0
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename=ps.c.diff.txt

--- /usr/src/bin/ps/ps.c 2009-08-03 09:13:06.000000000 +0100
+++ ps.c 2009-08-22 21:03:56.000000000 +0100
@@ -212,7 +212,8 @@
     init_list(&sesslist, addelem_pid, sizeof(pid_t), "session id");
     init_list(&ttylist, addelem_tty, sizeof(dev_t), "tty");
     init_list(&uidlist, addelem_uid, sizeof(uid_t), "user");
-    memf = nlistf = _PATH_DEVNULL;
+    memf = _PATH_DEVNULL;
+    nlistf = NULL;
     while ((ch = getopt(argc, argv, PS_ARGS)) != -1)
         switch (ch) {
         case 'A':

--MP_/Hne4C+2LQmeLzp1pVRjPwT0--

From: Bruce Cran
To: bug-followup@FreeBSD.org, bruce@cran.org.uk
Cc:
Subject: Re: kern/137890: [libkvm] [patch] ps segfaults with -ax when inspecting core files
Date: Mon, 24 Aug 2009 19:54:58 +0100

Since gnats mangled the patches, I've uploaded copies to
http://www.cran.org.uk/~brucec/freebsd/pr137890.kvm_proc.c.diff and
http://www.cran.org.uk/~brucec/freebsd/pr137890.ps.c.diff

--
Bruce

State-Changed-From-To: feedback->analyzed
State-Changed-By: gavin
State-Changed-When: Tue Aug 25 09:40:25 UTC 2009
State-Changed-Why:
Mark as analysed, it seems that the problem is well understood, and
the patches in the PR fix the issue

Responsible-Changed-From-To: gavin->freebsd-bugs
Responsible-Changed-By: gavin
Responsible-Changed-When: Tue Aug 25 09:40:25 UTC 2009
Responsible-Changed-Why:
Back into the pool, in the hope it'll be picked up and committed
before 8.0

http://www.freebsd.org/cgi/query-pr.cgi?pr=137890

From: Bruce Cran
To: bug-followup@FreeBSD.org, bruce@cran.org.uk
Cc:
Subject: Re: kern/137890: [libkvm] [patch] ps segfaults with -ax when inspecting core files
Date: Mon, 18 Jan 2010 12:45:16 +0000

The libkvm bug has now been fixed, but the patch for bin/ps hasn't been
committed yet.

--
Bruce

State-Changed-From-To: analyzed->patched
State-Changed-By: brucec
State-Changed-When: Mon Feb 8 21:44:36 UTC 2010
State-Changed-Why:
Fix has been checked in to -CURRENT.

Responsible-Changed-From-To: freebsd-bugs->brucec
Responsible-Changed-By: brucec
Responsible-Changed-When: Mon Feb 8 21:44:36 UTC 2010
Responsible-Changed-Why:
Take

http://www.freebsd.org/cgi/query-pr.cgi?pr=137890

State-Changed-From-To: patched->closed
State-Changed-By: brucec
State-Changed-When: Sun Feb 28 14:10:19 UTC 2010
State-Changed-Why:
Fix has been merged to stable/7 and stable/8.

http://www.freebsd.org/cgi/query-pr.cgi?pr=137890
>Unformatted: