From nobody@FreeBSD.org Fri Jan 30 18:54:26 2004
Return-Path:
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
        by hub.freebsd.org (Postfix) with ESMTP id B415816A4CE
        for ; Fri, 30 Jan 2004 18:54:26 -0800 (PST)
Received: from www.freebsd.org (www.freebsd.org [216.136.204.117])
        by mx1.FreeBSD.org (Postfix) with ESMTP id C315A43D1D
        for ; Fri, 30 Jan 2004 18:54:25 -0800 (PST)
        (envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (localhost [127.0.0.1])
        by www.freebsd.org (8.12.10/8.12.10) with ESMTP id i0V2sPdL038923
        for ; Fri, 30 Jan 2004 18:54:25 -0800 (PST)
        (envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
        by www.freebsd.org (8.12.10/8.12.10/Submit) id i0V2sPkR038921;
        Fri, 30 Jan 2004 18:54:25 -0800 (PST)
        (envelope-from nobody)
Message-Id: <200401310254.i0V2sPkR038921@www.freebsd.org>
Date: Fri, 30 Jan 2004 18:54:25 -0800 (PST)
From: Rostislav Krasny
To: freebsd-gnats-submit@FreeBSD.org
Cc: rosti.bsd@gmail.com
Subject: User cannot login through telnet or ssh because of reverse resolving delay
X-Send-Pr-Version: www-2.0

>Number:         62139
>Category:       bin
>Synopsis:       User cannot login through telnet or ssh because of reverse resolving delay
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    yar
>State:          closed
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri Jan 30 19:00:37 PST 2004
>Closed-Date:    Mon Mar 13 11:42:44 GMT 2006
>Last-Modified:  Sun Mar 19 18:25:28 GMT 2006
>Originator:     Rostislav Krasny
>Release:        5.2-RELEASE and 5.2-CURRENT
>Organization:
>Environment:
FreeBSD localhost 5.2-CURRENT FreeBSD 5.2-CURRENT #0: Thu Jan 29 13:03:29 IST 2004 root@localhost:/usr/obj/usr/src/sys/GENERIC i386
>Description:
When a user tries to log in to the system remotely (by telnet or ssh),
the system tries to reverse-resolve the client's IP address. Because the
system does this resolving synchronously, the login process is delayed.
When the single DNS server address recorded in /etc/resolv.conf is
unreachable, the delay is very long and produces a login timeout. This
can make the system inaccessible for remote administration through
telnet and ssh.
>How-To-Repeat:
To reproduce this problem, write a non-existent IP address in your
subnet as the address of the single DNS server in the /etc/resolv.conf
file. Then try to log in to this system remotely from another system.
There should be no previously opened connections from the second system.
>Fix:
To fix this problem you can either disable the reverse resolving or do
it asynchronously.
>Release-Note:
>Audit-Trail:

State-Changed-From-To: open->feedback
State-Changed-By: yar
State-Changed-When: Mon Aug 30 13:19:01 GMT 2004
State-Changed-Why:
To my mind, this is a host configuration issue. First, you
may list multiple nameservers in your resolv.conf so that
should one of them fail, the others will still respond to
queries. Second, the resolver timeout and attempts may be
set to a lower value (see resolver(5) for details) if your
network can suffer from all its nameservers being unavailable.
Please also note that some ways of ssh authentication may
rely on a name service being available.

http://www.freebsd.org/cgi/query-pr.cgi?pr=62139

From: Yar Tikhiy
To: Rostislav Krasny
Cc: freebsd-gnats-submit@FreeBSD.org
Subject: Re: bin/62139: User cannot login through telnet or ssh because of reverse resolving delay
Date: Tue, 7 Sep 2004 19:52:29 +0400

On Wed, Sep 01, 2004 at 09:25:25AM -0700, Rostislav Krasny wrote:
>
> > To my mind, this is a host configuration issue. First, you
> > may list multiple nameservers in your resolv.conf so that
> > should one of them fail, the others will still respond to
> > queries. Second, the resolver timeout and attempts may be
> > set to a lower value (see resolver(5) for details) if your
> > network can suffer from all its nameservers being unavailable.
> > Please also note that some ways of ssh authentication may
> > rely on a name service being available.
>
> I think that resolver(3) is buggy. Consider the tests described below,
> that I've done.
>
> > uname -a
> FreeBSD localhost 5.3-BETA2 FreeBSD 5.3-BETA2 #1: Sat Aug 28 21:29:15
> UTC 2004 root@mack.dcsl.buffalo.edu:/usr/obj/usr/src/sys/GENERIC
> i386
>
>
> I changed the /etc/resolv.conf file, so it had only one following line:
>
> nameserver 21.21.21.21
>
> Then I ran a 'date ; ping yahoo.com ; date' one line command four
> times. This way I measured the time between 'ping yahoo.com' started
> and failed. The results are:
>
> N  mm:ss
> 1  2:30
> 2  2:31
> 3  2:31
> 4  2:30
>
> Why did it take so long with default "options" settings?

It's because the resolver library tries hard to perform its job
reliably. You may like to read RFC 1536 for the discussion of the
algorithm used. Also note that resolver will try to add the local
domain name as found in the hostname if it is unable to resolve a
name as-is.

> According to man sshd_config:
>
>      LoginGraceTime
>              The server disconnects after this time if the user has
>              not successfully logged in. If the value is 0, there is
>              no time limit. The default is 120 seconds.
>
> So this is not surprising why my attempts connecting to this box from
> another one by ssh failed with following sshd error:
>
> Aug 31 00:18:06 localhost sshd[1443]: fatal: Timeout before
> authentication for 192.168.1.1
>
> Workaround of this problem was setting 'UseDNS no' in
> /etc/ssh/sshd_config file. But I still don't know what the workaround
> of the same problem with ftpd (enabled in /etc/inetd.conf).
>
>
> Then I ran 'tcpdump -nvi ed1' in a second pseudo-terminal and counted a
> number of "A? yahoo.com" requests during a run of the above 'ping
> yahoo.com'. With default "options" settings my box is sending 8 "A?
> yahoo.com" requests to one DNS before 'ping yahoo.com' is failed. Why
> there are so many requests to one non-working DNS?

Since you never know in advance how many times the program you are
trying, e.g., ping, calls the resolver functions.

> Finally I add a custom "options" settings line in /etc/resolv.conf
> file:
>
> options attempts:1
>
> With this option my box is sending 2 "A? yahoo.com" requests. With
> 'attempts:2' it sends 4 requests, with 'attempts:3' it sends 6
> requests, with 'attempts:5' it sends 10 requests... and so on. Why the
> numbers of actual requests are double of the defined numbers?

It means that ping seems to call the resolver twice each time.

> What is the default value of the 'attempts' option? The resolver(5) man
> page states that the default value is defined by RES_DFLRETRY in
> <resolv.h>. But there is no RES_DFLRETRY in /usr/include/resolv.h file.
> In other systems the RES_DFLRETRY is defined as 2.

RES_MAXRETRY. 5. The man page seems to give a wrong name there.
I'll fix it later.

> IMHO the default value of the 'attempts' option should be 2 and it must
> not be doubled. With the default value of 'timeout' option (5 seconds)
> it should take no more than 10 seconds to decide that one DNS is
> unreachable or not.

You are misinterpreting the `timeout' option. See RFC 1536 or the
code. And `attempts' is not doubled, that is a consequence of the
application behaviour.

I feel that losing all DNS servers is just slightly better
than losing the network connection at all. Therefore console
access to such machine is the answer. Trying to overcome that
in software is too risky, at least for the default configuration.
I'd rather close this PR.

-- 
Yar

From: Rostislav Krasny
To: Yar Tikhiy
Cc: freebsd-gnats-submit@FreeBSD.org
Subject: Re: bin/62139: User cannot login through telnet or ssh because of reverse resolving delay
Date: Thu, 16 Sep 2004 13:51:56 -0700 (PDT)

--- Yar Tikhiy wrote:
> On Wed, Sep 01, 2004 at 09:25:25AM -0700, Rostislav Krasny wrote:
> >
> > > To my mind, this is a host configuration issue.
> > > First, you may list multiple nameservers in your resolv.conf so
> > > that should one of them fail, the others will still respond to
> > > queries. Second, the resolver timeout and attempts may be
> > > set to a lower value (see resolver(5) for details) if your
> > > network can suffer from all its nameservers being unavailable.
> > > Please also note that some ways of ssh authentication may
> > > rely on a name service being available.
> >
> > I think that resolver(3) is buggy. Consider the tests described
> > below, that I've done.
> >
> > > uname -a
> > FreeBSD localhost 5.3-BETA2 FreeBSD 5.3-BETA2 #1: Sat Aug 28
> > 21:29:15 UTC 2004
> > root@mack.dcsl.buffalo.edu:/usr/obj/usr/src/sys/GENERIC
> > i386
> >
> >
> > I changed the /etc/resolv.conf file, so it had only one following
> > line:
> >
> > nameserver 21.21.21.21
> >
> > Then I ran a 'date ; ping yahoo.com ; date' one line command four
> > times. This way I measured the time between 'ping yahoo.com'
> > started and failed. The results are:
> >
> > N  mm:ss
> > 1  2:30
> > 2  2:31
> > 3  2:31
> > 4  2:30
> >
> > Why did it take so long with default "options" settings?
>
> It's because the resolver library tries hard to perform its job
> reliably. You may like to read RFC 1536 for the discussion of the
> algorithm used. Also note that resolver will try to add the local
> domain name as found in the hostname if it is unable to resolve a
> name as-is.

This box has no domain name. Its hostname is 'localhost'. Also no
domain and no search list are defined in the /etc/resolv.conf file.

> > According to man sshd_config:
> >
> >      LoginGraceTime
> >              The server disconnects after this time if the user has
> >              not successfully logged in. If the value is 0, there
> >              is no time limit. The default is 120 seconds.
> >
> > So this is not surprising why my attempts connecting to this box
> > from another one by ssh failed with following sshd error:
> >
> > Aug 31 00:18:06 localhost sshd[1443]: fatal: Timeout before
> > authentication for 192.168.1.1
> >
> > Workaround of this problem was setting 'UseDNS no' in
> > /etc/ssh/sshd_config file. But I still don't know what the
> > workaround of the same problem with ftpd (enabled
> > in /etc/inetd.conf).
> >
> >
> > Then I ran 'tcpdump -nvi ed1' in a second pseudo-terminal and
> > counted a number of "A? yahoo.com" requests during a run of the
> > above 'ping yahoo.com'. With default "options" settings my box is
> > sending 8 "A? yahoo.com" requests to one DNS before 'ping yahoo.com'
> > is failed. Why there are so many requests to one non-working DNS?
>
> Since you never know in advance how many times the program you are
> trying, e.g., ping, calls the resolver functions.

Did you mean functions like gethostbyname(3)? I wrote a short test
program and used it instead of ping:

======= test program ========
/*
 * The header names were lost in this archived copy; the six below are
 * the ones the calls in this program require.
 */
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>
#include <stdio.h>

int main(void)
{
    const char *name = "yahoo.com";
    struct hostent *ps_hostent;
    char **st;

    /* A single forward lookup; this is the only resolver call made. */
    ps_hostent = gethostbyname(name);
    if (ps_hostent != NULL) {
        printf("%s\n", ps_hostent->h_name);
        /* Print every address returned for the name. */
        for (st = ps_hostent->h_addr_list; *st != NULL; st++) {
            printf("%s\n", inet_ntoa(*(struct in_addr *)*st));
        }
        if (st == ps_hostent->h_addr_list)
            fputs("It have no address.\n", stderr);
    } else {
        /* Lookup failed: report the resolver error for the name. */
        herror(name);
    }
    return 0;
}
======= test program ========

When I repeated the tests with this program I got the same number of
"A? yahoo.com." requests to one DNS and the same running times for the
program.

> > Finally I add a custom "options" settings line in /etc/resolv.conf
> > file:
> >
> > options attempts:1
> >
> > With this option my box is sending 2 "A? yahoo.com" requests.
> > With 'attempts:2' it sends 4 requests, with 'attempts:3' it sends 6
> > requests, with 'attempts:5' it sends 10 requests... and so on. Why
> > the numbers of actual requests are double of the defined numbers?
>
> It means that ping seems to call the resolver twice each time.

In my test program a gethostbyname(3) function is called only once.

> > What is the default value of the 'attempts' option? The resolver(5)
> > man page states that the default value is defined by RES_DFLRETRY in
> > <resolv.h>. But there is no RES_DFLRETRY in /usr/include/resolv.h
> > file. In other systems the RES_DFLRETRY is defined as 2.
>
> RES_MAXRETRY. 5. The man page seems to give a wrong name there.
> I'll fix it later.

Thank you for the fixing. I've seen your commits:

http://docs.freebsd.org/cgi/mid.cgi?200409091739.i89HdlwM019548
http://docs.freebsd.org/cgi/mid.cgi?200409091742.i89HgIan019681
http://docs.freebsd.org/cgi/mid.cgi?200409091719.i89HJRGu019026

According to them the default value of the 'attempts' option was and
still is 4 and RES_DFLRETRY is the right name. But most of UNIX and
UNIX-like operating systems that I checked have RES_DFLRETRY defined as
2, not as 4. They are: Solaris, AIX, Linux and even NetBSD. Only
OpenBSD has it hardcoded as 4.

> > IMHO the default value of the 'attempts' option should be 2 and it
> > must not be doubled. With the default value of 'timeout' option
> > (5 seconds) it should take no more than 10 seconds to decide that
> > one DNS is unreachable or not.
>
> You are misinterpreting the `timeout' option. See RFC 1536 or the
> code. And `attempts' is not doubled, that is a consequence of the
> application behaviour.

Maybe I was wrong with the `timeout' option but I think I was right
with the `attempts' one.

> I feel that losing all DNS servers is just slightly better
> than losing the network connection at all. Therefore console
> access to such machine is the answer.
> Trying to overcome that in software is too risky, at least for the
> default configuration. I'd rather close this PR.

The point is that the default configuration of resolver(5) in FreeBSD
is different from most of other Unices and even NetBSD. Why it is
different? Also the double number of DNS requests is not clear for me yet.

From: Yar Tikhiy
To: Rostislav Krasny
Cc: freebsd-gnats-submit@FreeBSD.org
Subject: Re: bin/62139: User cannot login through telnet or ssh because of reverse resolving delay
Date: Thu, 30 Sep 2004 13:50:37 +0400

On Thu, Sep 16, 2004 at 01:51:56PM -0700, Rostislav Krasny wrote:
>
> > > Finally I add a custom "options" settings line in /etc/resolv.conf
> > > file:
> > >
> > > options attempts:1
> > >
> > > With this option my box is sending 2 "A? yahoo.com" requests. With
> > > 'attempts:2' it sends 4 requests, with 'attempts:3' it sends 6
> > > requests, with 'attempts:5' it sends 10 requests... and so on. Why
> > > the numbers of actual requests are double of the defined numbers?
> >
> > It means that ping seems to call the resolver twice each time.
>
> In my test program a gethostbyname(3) function is called only once.

I suspect that gethostbyname(3) may call resolver more than once.
gethostbyname(3) is a "multiplexor" for many name resolution
interfaces, e.g., DNS, hosts(5), NIS, etc. When it does its job
it has to canonize the name etc. This may lead to more than 1 call
to underlying mechanisms, e.g., the DNS resolver library.

> > > What is the default value of the 'attempts' option? The resolver(5)
> > > man page states that the default value is defined by RES_DFLRETRY
> > > in <resolv.h>. But there is no RES_DFLRETRY in
> > > /usr/include/resolv.h file. In other systems the RES_DFLRETRY is
> > > defined as 2.
> >
> > RES_MAXRETRY. 5. The man page seems to give a wrong name there.
> > I'll fix it later.
>
> Thank you for the fixing.
> I've seen your commits:
>
> http://docs.freebsd.org/cgi/mid.cgi?200409091739.i89HdlwM019548
> http://docs.freebsd.org/cgi/mid.cgi?200409091742.i89HgIan019681
> http://docs.freebsd.org/cgi/mid.cgi?200409091719.i89HJRGu019026
>
> According to them the default value of the 'attempts' option was and
> still is 4 and RES_DFLRETRY is the right name. But most of UNIX and
> UNIX-like operating systems that I checked have RES_DFLRETRY defined as
> 2, not as 4. They are: Solaris, AIX, Linux and even NetBSD. Only
> OpenBSD has it hardcoded as 4.
>
> > > IMHO the default value of the 'attempts' option should be 2 and it
> > > must not be doubled. With the default value of 'timeout' option
> > > (5 seconds) it should take no more than 10 seconds to decide that
> > > one DNS is unreachable or not.
> >
> > You are misinterpreting the `timeout' option. See RFC 1536 or the
> > code. And `attempts' is not doubled, that is a consequence of the
> > application behaviour.
>
> Maybe I was wrong with the `timeout' option but I think I was right
> with the `attempts' one.
>
> > I feel that losing all DNS servers is just slightly better
> > than losing the network connection at all. Therefore console
> > access to such machine is the answer. Trying to overcome that
> > in software is too risky, at least for the default configuration.
> > I'd rather close this PR.
>
> The point is that the default configuration of resolver(5) in FreeBSD
> is different from most of other Unices and even NetBSD. Why it is
> different? Also the double number of DNS requests is not clear for me yet.

If you believe the default configuration should be adjusted, please
feel free to conduct a discussion on a FreeBSD mailing list, e.g.,
freebsd-net or freebsd-hackers. Personally I don't feel like touching
the default configuration, but even if I did, our two votes wouldn't
be enough.

-- 
Yar

State-Changed-From-To: feedback->open
State-Changed-By: yar
State-Changed-When: Tue Feb 21 17:02:34 UTC 2006
State-Changed-Why:
I had an opportunity to reconsider this issue after I had been
bitten by it.

http://www.freebsd.org/cgi/query-pr.cgi?pr=62139

State-Changed-From-To: open->closed
State-Changed-By: yar
State-Changed-When: Mon Mar 13 11:37:35 UTC 2006
State-Changed-Why:
This problem has been fixed in HEAD and RELENG_6 by making sure
that the resolver will make exactly the specified number of attempts
and by reducing the default number of attempts to 2 per nameserver
in accordance with common practice in other friendly OSen. Now sshd,
telnetd, or ftpd won't time out even if all 3 nameservers (the
possible maximum) are down.

Responsible-Changed-From-To: freebsd-bugs->yar
Responsible-Changed-By: yar
Responsible-Changed-When: Mon Mar 13 11:37:35 UTC 2006
Responsible-Changed-Why:
So I can see feedback.

http://www.freebsd.org/cgi/query-pr.cgi?pr=62139
>Unformatted: