From Tor.Egge@broadpark.no Tue Jun 3 02:02:47 2008
Message-Id: <200806030102.m5312g4O001801@tegge-laptop.trondheim.corp.yahoo.com>
Date: Tue, 03 Jun 2008 03:02:42 +0200 (CEST)
From: Tor Egge
Reply-To: Tor Egge
To: FreeBSD-gnats-submit@freebsd.org
Subject: ndis network driver sometimes loses network connection
X-Send-Pr-Version: 3.113

>Number: 124225
>Category: kern
>Synopsis: [ndis] [patch] ndis network driver sometimes loses network connection
>Confidential: no
>Severity: non-critical
>Priority: low
>Responsible: freebsd-net
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Tue Jun 03 02:10:02 UTC 2008
>Closed-Date:
>Last-Modified: Sat Mar 20 02:38:26 UTC 2010
>Originator: Tor Egge
>Release: FreeBSD 8.0-CURRENT i386
>Organization:
>Environment:
System: FreeBSD tegge-laptop.trondheim.corp.yahoo.com 8.0-CURRENT FreeBSD 8.0-CURRENT #0: Sun Jun 1 00:42:26 CEST 2008 root@tegge-laptop.trondheim.corp.yahoo.com:/usr/src/sys/i386/compile/TEGGE_LAPTOP i386

>Description:
Normally, when packets are queued to the ndis network interface, ndis_start() is called to move packets from the interface send queue to the underlying NDIS driver. If the network link is down or the underlying driver is busy transmitting data, ndis_start() just returns.

When the link goes up, ndis_starttask() is supposed to be called after ndis_ticktask() in order to transmit the packets that are already queued. After a watchdog timeout, ndis_starttask() is likewise supposed to be called after ndis_resettask().

Unfortunately, the work items used for triggering calls to ndis_ticktask(), ndis_starttask() and ndis_resettask() are placed on separate task lists, which are handled by separate kernel processes. The information about the order in which the tasks should be performed relative to each other is therefore lost.

If the interface send queue is full after a watchdog timeout or link-up event, and the tasks were handled in the wrong order, then further attempts to send packets via the interface result in ENOBUFS ("No buffer space available").

>How-To-Repeat:
Use the ndis driver for a wireless network card in an area with many APs on nearby channels, on a machine with many active TCP connections. This causes the link to go down temporarily every few hours, and the interface send queue fills up while the link is down.

>Fix:
A proper fix is to ensure that related tasks are handled in the correct order.
The following kludge just adds extra attempts at scheduling calls to ndis_starttask() as part of the processing of ndis_ticktask() and ndis_resettask(). It depends on defensive coding in IoQueueWorkItem(), i.e. that nothing is done if the work item is already queued.

Index: sys/dev/if_ndis/if_ndis.c
===================================================================
RCS file: /home/ncvs/src/sys/dev/if_ndis/if_ndis.c,v
retrieving revision 1.140
diff -u -r1.140 if_ndis.c
--- sys/dev/if_ndis/if_ndis.c	30 May 2008 07:17:51 -0000	1.140
+++ sys/dev/if_ndis/if_ndis.c	31 May 2008 21:24:14 -0000
@@ -1617,6 +1617,7 @@
 		IoQueueWorkItem(sc->ndis_tickitem,
 		    (io_workitem_func)ndis_ticktask_wrap,
 		    WORKQUEUE_CRITICAL, sc);
+		/* XXX: startitem might be handled before tickitem */
 		IoQueueWorkItem(sc->ndis_startitem,
 		    (io_workitem_func)ndis_starttask_wrap,
 		    WORKQUEUE_CRITICAL, ifp);
@@ -1699,6 +1700,11 @@
 		}
 		NDIS_LOCK(sc);
 		if_link_state_change(sc->ifp, LINK_STATE_UP);
+		/* XXX: Start kludge */
+		IoQueueWorkItem(sc->ndis_startitem,
+		    (io_workitem_func)ndis_starttask_wrap,
+		    WORKQUEUE_CRITICAL, sc->ifp);
+		/* XXX: End kludge */
 	}
 
 	if (sc->ndis_link == 1 &&
@@ -3112,6 +3118,11 @@
 	sc = arg;
 
 	ndis_reset_nic(sc);
+	/* XXX: Start kludge */
+	IoQueueWorkItem(sc->ndis_startitem,
+	    (io_workitem_func)ndis_starttask_wrap,
+	    WORKQUEUE_CRITICAL, sc->ifp);
+	/* XXX: End kludge */
 	return;
 }
 
@@ -3131,6 +3142,7 @@
 	IoQueueWorkItem(sc->ndis_resetitem,
 	    (io_workitem_func)ndis_resettask_wrap,
 	    WORKQUEUE_CRITICAL, sc);
+	/* XXX: startitem might be handled before resetitem */
 	IoQueueWorkItem(sc->ndis_startitem,
 	    (io_workitem_func)ndis_starttask_wrap,
 	    WORKQUEUE_CRITICAL, ifp);

>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-bugs->freebsd-net
Responsible-Changed-By: linimon
Responsible-Changed-When: Tue Jun 3 02:46:09 UTC 2008
Responsible-Changed-Why:
Over to maintainer(s).
http://www.freebsd.org/cgi/query-pr.cgi?pr=124225

Responsible-Changed-From-To: freebsd-net->cokane
Responsible-Changed-By: cokane
Responsible-Changed-When: Wed Jul 2 14:56:51 UTC 2008
Responsible-Changed-Why:
PR refers to a recent commit of changes that I made, I will look into solving this problem in my development branch.

http://www.freebsd.org/cgi/query-pr.cgi?pr=124225

Responsible-Changed-From-To: cokane->freebsd-net
Responsible-Changed-By: linimon
Responsible-Changed-When: Sat Mar 20 02:37:53 UTC 2010
Responsible-Changed-Why:
returned to the pool by request (some time ago).

http://www.freebsd.org/cgi/query-pr.cgi?pr=124225
>Unformatted:
I was the last one with my hand in this jar. I'll look into it and see what I can do.