From nobody@FreeBSD.org Thu Sep 24 15:53:47 2009 Return-Path: Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 47EF31065679 for ; Thu, 24 Sep 2009 15:53:47 +0000 (UTC) (envelope-from nobody@FreeBSD.org) Received: from www.freebsd.org (www.freebsd.org [IPv6:2001:4f8:fff6::21]) by mx1.freebsd.org (Postfix) with ESMTP id 378D88FC0A for ; Thu, 24 Sep 2009 15:53:47 +0000 (UTC) Received: from www.freebsd.org (localhost [127.0.0.1]) by www.freebsd.org (8.14.3/8.14.3) with ESMTP id n8OFrkAo016110 for ; Thu, 24 Sep 2009 15:53:46 GMT (envelope-from nobody@www.freebsd.org) Received: (from nobody@localhost) by www.freebsd.org (8.14.3/8.14.3/Submit) id n8OFrkS8016097; Thu, 24 Sep 2009 15:53:46 GMT (envelope-from nobody) Message-Id: <200909241553.n8OFrkS8016097@www.freebsd.org> Date: Thu, 24 Sep 2009 15:53:46 GMT From: David Horn To: freebsd-gnats-submit@FreeBSD.org Subject: [lagg] + wlan boot timing (EBUSY) X-Send-Pr-Version: www-3.1 X-GNATS-Notify: >Number: 139117 >Category: kern >Synopsis: [lagg] + wlan boot timing (EBUSY) >Confidential: no >Severity: serious >Priority: medium >Responsible: freebsd-net >State: open >Quarter: >Keywords: >Date-Required: >Class: sw-bug >Submitter-Id: current-users >Arrival-Date: Thu Sep 24 16:00:11 UTC 2009 >Closed-Date: >Last-Modified: Sat Mar 20 01:00:13 UTC 2010 >Originator: David Horn >Release: 8.0 RC1 >Organization: >Environment: FreeBSD lagg 8.0-RC1 FreeBSD 8.0-RC1 #11 r197417: Wed Sep 23 01:05:15 EDT 2009 root@lagg:/usr/obj/usr/src/sys/GENERIC amd64 >Description: I have been trying to track down a problem with my lagg connection sometimes not properly enabling wlan as fallback on boot. It would work properly about 60% of the time. The other times, it would fail with SIOCSLAGGPORT: Device busy Here is the relevant rc.conf entries: ifconfig_bfe0="up" wlans_iwn0="wlan0" ifconfig_wlan0="WPA" ifconfig_iwn0="ether 00:1c:23:98:2c:5d" cloned_interfaces="lagg0" ipv6_network_interfaces="lagg0" ifconfig_lagg0="laggproto failover laggport bfe0 laggport wlan0 DHCP" ipv6_enable="YES" So, I turned on some logging of all ifconfig commands with timestamps and stdout/stderr/returncode, and noticed this: Wed Sep 23 01:39:56 EDT 2009 ifconfig: lagg0 create ; ;; Wed Sep 23 01:39:56 EDT 2009 lagg0 rc='0' end. Wed Sep 23 01:39:56 EDT 2009 ifconfig: -l ; iwn0 bfe0 fwe0 fwip0 lo0 lagg0 ;; Wed Sep 23 01:39:56 EDT 2009 -l rc='0' end. Wed Sep 23 01:39:56 EDT 2009 ifconfig: -l ; iwn0 bfe0 fwe0 fwip0 lo0 lagg0 ;; Wed Sep 23 01:39:56 EDT 2009 -l rc='0' end. Wed Sep 23 01:39:56 EDT 2009 ifconfig: lo0 inet 127.0.0.1 ; ;; Wed Sep 23 01:39:56 EDT 2009 lo0 rc='0' end. Wed Sep 23 01:39:56 EDT 2009 ifconfig: lo0 up ; ;; Wed Sep 23 01:39:56 EDT 2009 lo0 rc='0' end. Wed Sep 23 01:39:56 EDT 2009 ifconfig: iwn0 ether 00:1c:23:98:2c:5d ; ;; Wed Sep 23 01:39:56 EDT 2009 iwn0 rc='0' end. Wed Sep 23 01:39:56 EDT 2009 ifconfig: iwn0 up ; ;; Wed Sep 23 01:39:56 EDT 2009 iwn0 rc='0' end. Wed Sep 23 01:39:56 EDT 2009 ifconfig: wlan0 create wlandev iwn0 ; ;; Wed Sep 23 01:39:56 EDT 2009 wlan0 rc='0' end. Wed Sep 23 01:39:56 EDT 2009 ifconfig: wlan0 ; wlan0: flags=8802 metric 0 mtu 1500 ether 00:1c:23:98:2c:5d media: IEEE 802.11 Wireless Ethernet autoselect (autoselect) status: no carrier ssid "" channel 1 (2412 Mhz 11b) country US authmode OPEN privacy OFF txpower 14 bmiss 10 scanvalid 60 wme bintval 0 ;; Wed Sep 23 01:39:56 EDT 2009 wlan0 rc='0' end. Wed Sep 23 01:39:57 EDT 2009 ifconfig: lagg0 laggproto failover laggport bfe0 laggport wlan0 ; ifconfig.real: SIOCSLAGGPORT: Device busy ;; Wed Sep 23 01:39:57 EDT 2009 lagg0 rc='1' end. So, I started looking at the /sys/net/if_lagg.c source, and found the EBUSY response cases: This one /* New lagg port has to be in an idle state */ if (ifp->if_drv_flags & IFF_DRV_OACTIVE) return (EBUSY); seems to be the culprit, but unfortunately, I'm not familiar enough with the code to take this much further. I did build a kernel without this check, and everything seems to be fixed, but this is obviously not a real fix to the problem. So, I would say the fact that wpa_supplicant is talking to wlan0 (trying to scan/associate/auth) while lagg is trying to add wlan0 to the portlist is the timing issue. I confirmed this behavior as follows: ifconfig wlan0 destroy ifconfig lagg0 destroy ifconfig lagg0 create ifconfig wlan0 create wlandev iwn0 & ; ifconfig lagg0 laggproto failover laggport bfe0 laggport wlan0 results in: ifconfig: SIOCSLAGGPORT: Device busy Someone more clueful than me know of a correct way to fix this contention issue ? It is possible that the following reports are related: http://lists.freebsd.org/pipermail/freebsd-current/2009-June/008776.html http://lists.freebsd.org/pipermail/freebsd-current/2009-June/007641.html >How-To-Repeat: After playing around with this issue, it is 100% reproducible at boot time (at least for me) when you create a lagg group with more than 1 wlan interface. e.g.: ifconfig_bfe0="up" wlans_iwn0="wlan0" wlans_ural0="wlan1" wlans_ath0="wlan2" ifconfig_wlan0="WPA" ifconfig_wlan1="WPA" ifconfig_wlan2="WPA" ifconfig_iwn0="ether 00:1c:23:98:2c:5d" ifconfig_ural0="ether 00:1c:23:98:2c:5d" ifconfig_ath0="ether 00:1c:23:98:2c:5d" cloned_interfaces="lagg0" ipv6_network_interfaces="lagg0" ifconfig_lagg0="laggproto failover laggport bfe0 laggport wlan0 laggport wlan1 laggport wlan2 DHCP" ipv6_enable="YES" >Fix: >Release-Note: >Audit-Trail: Responsible-Changed-From-To: freebsd-bugs->freebsd-net Responsible-Changed-By: linimon Responsible-Changed-When: Thu Sep 24 20:30:23 UTC 2009 Responsible-Changed-Why: Over to maintainer(s). http://www.freebsd.org/cgi/query-pr.cgi?pr=139117 From: David Horn To: bug-followup@freebsd.org Cc: Subject: Re: kern/139117: [lagg] + wlan boot timing (EBUSY) Date: Wed, 2 Dec 2009 20:58:06 -0500 Just for anyone else running into this problem, a "hack" workaround is to force the raw wireless interface (wlan0) and the raw wired interface (bfe0 in my case) to not accept IPv6 router advertisements, and just allow the lagg0 interface to accept them. This causes the timing window for OACTIVE (and thus the EBUSY response) to be MUCH smaller, and now I only very rarely see this issue. Example 8.0 rc.conf hack to use ndp to disable accepting IPv6 rtadv: ifconfig_bfe0="ether 00:1c:23:98:2c:5d `ndp -i bfe0 nud -accept_rtadv >/dev/null 2>&1`" ifconfig_wlan0="WPA `ndp -i wlan0 nud -accept_rtadv >/dev/null 2>&1`" ifconfig_iwn0="ether 00:1c:23:98:2c:5d" cloned_interfaces="lagg0" ipv6_network_interfaces="lagg0" ifconfig_lagg0="laggproto failover laggport bfe0 laggport wlan0 DHCP" ipv6_enable="YES" Note, you will need to ignore any ndp error messages out of rc.conf at boot time, as this will cause spurious (but innocuous) console error messages about ndp. Example 9.0 rc.conf settings: ifconfig_iwn0="ether 00:1c:23:98:2c:5d" wlans_iwn0="wlan0" ifconfig_bfe0="UP" ifconfig_bfe0_ipv6="inet6 ifdisabled -nud -auto_linklocal" ifconfig_wlan0="WPA" ifconfig_wlan0_ipv6="inet6 ifdisabled -nud -auto_linklocal" cloned_interfaces="lagg0" ifconfig_lagg0="laggproto failover laggport bfe0 laggport wlan0 DHCP" ifconfig_lagg0_ipv6="inet6 accept_rtadv" ipv6_prefer="YES" This was thanks to sam@ for pointing me away from wpa_supplicant and towards IPv6 as a likely culprit for causing the OACTIVE flag to be set on the interface causing the ifconfig lagg0 call to not add the wireless interface. Of course in 9.0 (-current as of the moment), there is the new facility to disable IPv6 router advertisements (and other ipv6 flags) via ifconfig so this hack is not needed there. The contention/timing issue still exists, but since users likely to NOT want the non-lagg interfaces to get IPv6 addresses from a router advertisement message anyway, the work around makes this problem less severe. The 9.0 example here is of course subject to change... ---Dave Horn From: David Horn To: bug-followup@FreeBSD.org, dhorn2000@gmail.com Cc: Subject: Re: kern/139117: [lagg] wlan boot timing (EBUSY) Date: Fri, 19 Mar 2010 20:59:19 -0400 --001485f7d62453ff1d048230f91d Content-Type: text/plain; charset=ISO-8859-1 Fixed by r204901 - delphij removal of OACTIVE check (also attached as a text diff) After testing this several different ways, this seems to work 100% for me. Please close as fixed after MFC. --Thanks! --Dave Horn --001485f7d62453ff1d048230f91d Content-Type: text/plain; charset=US-ASCII; name="lagg_oactive.diff.txt" Content-Disposition: attachment; filename="lagg_oactive.diff.txt" Content-Transfer-Encoding: base64 X-Attachment-Id: f_g6zpiz4i0 SW5kZXg6IGlmX2xhZ2cuYwo9PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09 PT09PT09PT09PT09PT09PT09PT09PT09PT09Ci0tLSBpZl9sYWdnLmMJKHJldmlzaW9uIDIwNDg3 NykKKysrIGlmX2xhZ2cuYwkod29ya2luZyBjb3B5KQpAQCAtNDg0LDEwICs0ODQsNiBAQAogCWlm IChzYy0+c2NfY291bnQgPj0gTEFHR19NQVhfUE9SVFMpCiAJCXJldHVybiAoRU5PU1BDKTsKIAot CS8qIE5ldyBsYWdnIHBvcnQgaGFzIHRvIGJlIGluIGFuIGlkbGUgc3RhdGUgKi8KLQlpZiAoaWZw LT5pZl9kcnZfZmxhZ3MgJiBJRkZfRFJWX09BQ1RJVkUpCi0JCXJldHVybiAoRUJVU1kpOwotCiAJ LyogQ2hlY2sgaWYgcG9ydCBoYXMgYWxyZWFkeSBiZWVuIGFzc29jaWF0ZWQgdG8gYSBsYWdnICov CiAJaWYgKGlmcC0+aWZfbGFnZyAhPSBOVUxMKQogCQlyZXR1cm4gKEVCVVNZKTsK --001485f7d62453ff1d048230f91d-- >Unformatted: