From stesin@beast.gu.net Wed Nov 27 08:01:34 1996
Received: from beast.gu.net ([194.93.190.7])
          by freefall.freebsd.org (8.7.5/8.7.3) with ESMTP id IAA26493
          for ; Wed, 27 Nov 1996 08:01:21 -0800 (PST)
Received: (from root@localhost)
          by beast.gu.net (8.7.5/8.7.3) id SAA00869;
          Wed, 27 Nov 1996 18:00:26 +0200 (EET)
Message-Id: <199611271600.SAA00869@beast.gu.net>
Date: Wed, 27 Nov 1996 18:00:26 +0200 (EET)
From: stesin@gu.net
Reply-To: stesin@gu.net
To: FreeBSD-gnats-submit@freebsd.org
Subject: Report on "gated+OSPF" crashes with June 2.2-SNAP
X-Send-Pr-Version: 3.2

>Number:         2113
>Category:       kern
>Synopsis:       2-ether router crashes almost immediately after Gated starts with OSPF
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    freebsd-bugs
>State:          closed
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Wed Nov 27 08:10:02 PST 1996
>Closed-Date:    Sat Nov 30 16:18:57 PST 1996
>Last-Modified:  Sat Nov 30 16:19:23 PST 1996
>Originator:     Andrew Stesin
>Release:        FreeBSD 2.2-960612-SNAP i386
>Organization:
GU.net
>Environment:

	Generic Amd5x86 133 PC, two ethernets: ep0 and ed0.
	Pretty complex network topology around, OSPF IGP used.

>Description:

	Machine crashes almost immediately after Gated starts
	with OSPF. Under some circumstances (unusually low
	network load) it stayed up 2 or 3 times for 10-20 minutes
	with OSPF kind of working. The problem showed up with both
	Gated 3.6a2 and 3.5b3.

	The machine now sits in a production network (a single FreeBSD
	among AIXes, Solarises, linuces, bsdis, ciscos), and routing
	is a bit crazy here now because it's the single box which
	demands to run RIPv2. With RIPv2 (note: it also uses
	multicasts!) it's stable, though.

	I neither want to kill freebsd on this box, nor am I able
	to play with it, upgrade it, take it down, reboot it often
	and so on. An upgrade to 2.2-BETA is being considered.

	I built a '-g' kernel and rebooted the box today, and provoked
	a single crash; the dump is available, as is the output of
	nm /kernel | sort.

	kgdb -k kernel.notstrip vmcore.0 | tee OUT.kgdb

	OUT.kgdb follows:

GDB is free software and you are welcome to distribute copies of it
 under certain conditions; type "show copying" to see the conditions.
There is absolutely no warranty for GDB; type "show warranty" for details.
GDB 4.13 (i386-unknown-freebsd), Copyright 1994 Free Software Foundation, Inc...
IdlePTD 1f7000
current pcb at 1e2420
panic: page fault
#0  boot (howto=256) at ../../i386/i386/machdep.c:940
940                     dumppcb.pcb_cr3 = rcr3();
(kgdb) bt
#0  boot (howto=256) at ../../i386/i386/machdep.c:940
#1  0xf01161f6 in panic (fmt=0xf01af30c "page fault")
    at ../../kern/subr_prf.c:127
#2  0xf01afe66 in trap_fatal (frame=0xefbffd38)
    at ../../i386/i386/trap.c:737
#3  0xf01af958 in trap_pfault (frame=0xefbffd38, usermode=0)
    at ../../i386/i386/trap.c:648
#4  0xf01af63b in trap (frame={tf_es = -249036784, tf_ds = 16,
      tf_edi = -272630324, tf_esi = -249912736, tf_ebp = -272630388,
      tf_isp = -272630432, tf_ebx = -248947456, tf_edx = 0,
      tf_ecx = -253083904, tf_eax = -249063756, tf_trapno = 12,
      tf_err = -253100032, tf_eip = -267075922, tf_cs = -267124728,
      tf_eflags = 66182, tf_esp = -249063756, tf_ss = -266455048})
    at ../../i386/i386/trap.c:319
#5  0xf01a7501 in calltrap ()
#6  0xf01410fe in ether_output (ifp=0xf01e3754, m0=0xf0ea3f00,
    dst=0xf12796b0, rt0=0x0) at ../../net/if_ethersubr.c:161
#7  0xf01511df in ip_output (m0=0xf0ea3f00, opt=0x0, ro=0xf12796ac,
    flags=48, imo=0xf1282d80) at ../../netinet/ip_output.c:354
#8  0xf0152614 in rip_output (m=0xf0ea3f00, so=0xf1293600, dst=96361922)
    at ../../netinet/raw_ip.c:191
#9  0xf0152a1f in rip_usrreq (so=0xf1293600, req=9, m=0xf0ea3f00,
    nam=0xf0ea3e80, control=0x0) at ../../netinet/raw_ip.c:415
#10 0xf0125126 in sosend (so=0xf1293600, addr=0xf0ea3e80, uio=0xefbffee8,
    top=0xf0ea3f00, control=0x0, flags=4) at ../../kern/uipc_socket.c:475
#11 0xf01277f3 in
sendit (p=0xf128d600, s=11, mp=0xefbfff2c, flags=4,
    retsize=0xefbfff84) at ../../kern/uipc_syscalls.c:467
#12 0xf01278d0 in sendto (p=0xf128d600, uap=0xefbfff94, retval=0xefbfff84)
    at ../../kern/uipc_syscalls.c:518
#13 0xf01b0111 in syscall (frame={tf_es = 135462951, tf_ds = -272695257,
      tf_edi = 0, tf_esi = 5, tf_ebp = -272641036, tf_isp = -272629788,
      tf_ebx = 536870912, tf_edx = 0, tf_ecx = 887256, tf_eax = 133,
      tf_trapno = 7, tf_err = 7, tf_eip = 135328769, tf_cs = 31,
      tf_eflags = 662, tf_esp = -272641096, tf_ss = 39})
    at ../../i386/i386/trap.c:887
#14 0xf01a7555 in Xsyscall ()
#15 0x7510a in ?? ()
#16 0x8ce3f in ?? ()
#17 0x88f85 in ?? ()
#18 0x73101 in ?? ()
#19 0x8a164 in ?? ()
#20 0x8444b in ?? ()
#21 0x84fff in ?? ()
#22 0x749c4 in ?? ()
#23 0x2dce0 in ?? ()
#24 0x3317d in ?? ()
#25 0x1095 in ?? ()
(kgdb) frame 6
#6  0xf01410fe in ether_output (ifp=0xf01e3754, m0=0xf0ea3f00,
    dst=0xf12796b0, rt0=0x0) at ../../net/if_ethersubr.c:161
161                     if (!arpresolve(ac, rt, m, dst, edst, rt0))
(kgdb) list
156             }
157             switch (dst->sa_family) {
158
159     #ifdef INET
160             case AF_INET:
161                     if (!arpresolve(ac, rt, m, dst, edst, rt0))
162                             return (0);     /* if not yet resolved */
163                     /* If broadcasting on a simplex interface, loopback a copy */
164                     if ((m->m_flags & M_BCAST) && (ifp->if_flags & IFF_SIMPLEX))
165                             mcopy = m_copy(m, 0, (int)M_COPYALL);
(kgdb)
166                     off = m->m_pkthdr.len - m->m_len;
167                     type = ETHERTYPE_IP;
168                     break;
169     #endif
170     #ifdef IPX
171             case AF_IPX:
172                     type = ETHERTYPE_IPX;
173                     bcopy((caddr_t)&(((struct sockaddr_ipx *)dst)->sipx_addr.x_host),
174                         (caddr_t)edst, sizeof (edst));
175                     if (!bcmp((caddr_t)edst, (caddr_t)&ipx_thishost, sizeof(edst)))
(kgdb) print ifp
$1 = (struct ifnet *) 0xefbffdcc
(kgdb) print *ifp
$2 = {if_softc = 0xf11ce25d, if_name = 0xf12796b2 "", if_next = 0xefbffe24,
  if_addrlist = 0xf01511df, if_pcount = -266455212, if_bpf = 0xf0ea3f00,
  if_index = 38576, if_unit = -3801, if_timer = 0, if_flags = 0,
  if_recvquota = 128 '\200', if_sendquota = 150
'\226', if_ipending = 39 '\'',
  if_data = {ifi_type = 48 '0', ifi_physical = 0 '\000',
    ifi_addrlen = 0 '\000', ifi_hdrlen = 0 '\000', ifi_mtu = 96361922,
    ifi_metric = 4078769492, ifi_baudrate = 4022337320,
    ifi_ipackets = 4045058048, ifi_ierrors = 4045903536, ifi_opackets = 0,
    ifi_oerrors = 4045987072, ifi_collisions = 20, ifi_ibytes = 4028512084,
    ifi_obytes = 4041883392, ifi_imcasts = 4041883468,
    ifi_omcasts = 4022337208, ifi_iqdrops = 4022337100,
    ifi_noproto = 4027917844, ifi_lastchange = {tv_sec = -253083904,
      tv_usec = 0}}, if_output = 0xf12796ac , if_start = 0x30,
  if_done = 0xf1282d80 , if_ioctl = 0, if_watchdog = 0xf1293600 ,
  if_poll_recv = 0xf1279680 , if_poll_xmit = 0xefbffe6c,
  if_poll_intren = 0xf0152a1f , if_poll_slowinput = 0xf0ea3f00 ,
  if_snd = { ifq_head = 0xf1293600, ifq_tail = 0x5be5dc2,
    ifq_len = -2147483648, ifq_maxlen = 32, ifq_drops = -248957440},
  if_poll_slowq = 0xefbffeac}
(kgdb) print *m0
$3 = {m_hdr = {mh_next = 0x0, mh_nextpkt = 0x0, mh_data = 0xf0ea3f4c "E",
    mh_len = 52, mh_type = 1, mh_flags = 2}, M_dat = {MH = {MH_pkthdr = {
        rcvif = 0x0, len = 52}, MH_dat = {MH_ext = {ext_buf = 0x170b9 "o\001",
          ext_free = 0x500005e, ext_size = 3434029056},
        MH_databuf = "¹p\001\000^\000\000\005\000 ¯Ì}z\b\000E\0004\000M|\000\000\001Y\000\000Â]¾\005à\000\000\005\002\001\0004Â]¾\005Â]¾\000E\000\0004\024h\000\000\001Y¤AÂ]¾\aÂ]¾\005\002\002\000 Â]¾\aÂ]¾\000û\a", '\000' , "\002\a\000\000\000\013"}},
    M_databuf = "\000\000\000\0004\000\000\000¹p\001\000^\000\000\005\000 ¯Ì}z\b\000E\0004\000M|\000\000\001Y\000\000Â]¾\005à\000\000\005\002\001\0004Â]¾\005Â]¾\000E\000\0004\024h\000\000\001Y¤AÂ]¾\aÂ]¾\005\002\002\000 Â]¾\aÂ]¾\000û\a", '\000' , "\002\a\000\000\000\013"}}
(kgdb) print *dst
$4 = {sa_len = 16 '\020', sa_family = 2 '\002',
  sa_data = "\000\000Â]¾\005\000\000\000\000\000\000\000"}
(kgdb) print arpresolve
$5 = {int ()} 0xf014bd4c
(kgdb)
$7 = {int ()} 0xf014bd4c
(kgdb) q

>How-To-Repeat:

	Just do "gdc stop; gated -f gated.conf.ospf" on the box :-)

>Fix:

	I'm not a
kernel guru :-(((

>Release-Note:
>Audit-Trail:

From: Bill Fenner
To: freebsd-gnats-submit@freebsd.org
Cc:
Subject: Re: kern/2113: 2-ether router crashes almost immediately after Gated starts with OSPF
Date: Wed, 27 Nov 1996 12:28:55 PST

 Andrew,

   Have you updated your kernel since the 960612-SNAP?  Could you try
 updating your sources with cvsup and building a -current kernel, or,
 failing that, applying the patch at
 http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/netinet/if_ether.c?r1=1.31&r2=1.32
 to your /sys/netinet/if_ether.c?

   Assuming that the traceback that you posted was slightly wrong, this
 looks like a bug that was fixed right after the 960612-SNAP.  (If you
 want to verify, go back to that dump, go to frame 4, say
 "frame frame->tf_ebp frame->tf_eip", and then do a "where"; the crash
 should turn out to be in arpresolve(), trying to dereference rt0.)

   Bill

State-Changed-From-To: open->closed
State-Changed-By: fenner
State-Changed-When: Sat Nov 30 16:18:57 PST 1996
State-Changed-Why:
Updated kernel fixed the problem.
>Unformatted:
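
 The check Bill describes can be roughly approximated without reloading
 the dump. kgdb prints the trap frame registers as signed decimals, so
 reinterpreting tf_eip from frame 4 as an unsigned 32-bit value and
 comparing it with the address kgdb printed for arpresolve shows where
 the fault landed. What follows is only a minimal Python sketch. It
 assumes the tf_eip and tf_ebp values shown in frame 4 are accurate; the
 helper name to_u32 is made up for illustration, and the size of
 arpresolve() is not known from the report.

```python
def to_u32(value):
    """Reinterpret a signed 32-bit integer as unsigned."""
    return value & 0xFFFFFFFF

# Values copied from frame 4 of the backtrace (signed decimals as printed by kgdb).
TF_EIP = -267075922
TF_EBP = -272630388

# Address printed by "print arpresolve" in the same session.
ARPRESOLVE = 0xF014BD4C

eip = to_u32(TF_EIP)
ebp = to_u32(TF_EBP)
offset = eip - ARPRESOLVE  # distance of the faulting instruction from the function start

print(f"tf_eip = {eip:#010x}")
print(f"tf_ebp = {ebp:#010x}")
print(f"tf_eip - arpresolve = {offset:#x} ({offset} bytes)")
```

 The faulting address comes out a few hundred bytes past the start of
 arpresolve, which is consistent with the suggestion that the fault
 happened inside arpresolve() rather than in ether_output() itself. It
 is not proof on its own; the "frame"/"where" step on the real dump is
 still the authoritative check.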