From nobody@FreeBSD.org Sun Jan 2 10:33:34 2011
Return-Path:
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
 by hub.freebsd.org (Postfix) with ESMTP id 63298106566B
 for ; Sun, 2 Jan 2011 10:33:34 +0000 (UTC)
 (envelope-from nobody@FreeBSD.org)
Received: from red.freebsd.org (unknown [IPv6:2001:4f8:fff6::22])
 by mx1.freebsd.org (Postfix) with ESMTP id 522968FC0A
 for ; Sun, 2 Jan 2011 10:33:34 +0000 (UTC)
Received: from red.freebsd.org (localhost [127.0.0.1])
 by red.freebsd.org (8.14.4/8.14.4) with ESMTP id p02AXYno070847
 for ; Sun, 2 Jan 2011 10:33:34 GMT
 (envelope-from nobody@red.freebsd.org)
Received: (from nobody@localhost)
 by red.freebsd.org (8.14.4/8.14.4/Submit) id p02AXYYJ070846;
 Sun, 2 Jan 2011 10:33:34 GMT
 (envelope-from nobody)
Message-Id: <201101021033.p02AXYYJ070846@red.freebsd.org>
Date: Sun, 2 Jan 2011 10:33:34 GMT
From: Greg Holmberg
To: freebsd-gnats-submit@FreeBSD.org
Subject: Xen guest system clock drifts in AWS EC2 (FreeBSD 9.0-CURRENT i386 T1-micro)
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number: 153620
>Category: kern
>Synopsis: [xen] Xen guest system clock drifts in AWS EC2 (FreeBSD 9.0-CURRENT i386 T1-micro)
>Confidential: no
>Severity: non-critical
>Priority: low
>Responsible: freebsd-xen
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Sun Jan 02 10:40:13 UTC 2011
>Closed-Date:
>Last-Modified: Sat Apr 2 20:40:10 UTC 2011
>Originator: Greg Holmberg
>Release: 9.0-CURRENT i386
>Organization:
>Environment:
FreeBSD domU-12-31-39-13-00-E9 9.0-CURRENT FreeBSD 9.0-CURRENT #88: Wed Dec 29 09:55:39 UTC 2010 root@chch.daemonology.net:/usr/obj/i386.i386/usr/src/sys/XEN i386
>Description:
9.0-CURRENT system running as AMI in Amazon EC2 cloud keeps poor time.
System was under heavy load, repeatedly compiling packages to exercise
memory allocation code.

Clock in guest should be updated faithfully by the host.
Clock in this AMI drifted 2200 seconds over 11 hours.
>How-To-Repeat:
From Amazon AWS Console, start a FreeBSD instance. I used ami-a0fc0dc9,
the most recent 9.0-CURRENT available on Dec 30, 2010.
Wait a few hours. (Maybe use it heavily?)
Compare the correct time from a good NTP source with the AMI system clock.
>Fix:
No known fix. (Didn't rtfs yet)
Workaround: perhaps run ntpdate out of cron every twenty minutes?
>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-bugs->freebsd-xen
Responsible-Changed-By: cperciva
Responsible-Changed-When: Sun Jan 2 11:12:29 UTC 2011
Responsible-Changed-Why:
Assign Xen clock bug to freebsd-xen list.

http://www.freebsd.org/cgi/query-pr.cgi?pr=153620

From: Colin Percival
To: bug-followup@FreeBSD.org, fbsd-9.0-aws-ec2-1293964000@holmberg.to
Cc:
Subject: Re: kern/153620: Xen guest system clock drifts in AWS EC2 (FreeBSD 9.0-CURRENT i386 T1-micro)
Date: Sun, 02 Jan 2011 03:16:52 -0800

 This is interesting -- I thought I had squashed all the clock drift bugs.

 Can you tell me:
 1. Did the clock run ahead, or behind?
 2. Can you reproduce this?
 3. Did the clock _drift_, or _jump_?

 The 2200 seconds mentioned is almost exactly the 2^41 ns period of the
 Xen timecounter, so if the clock jumped it's probably safe to guess that
 it's involved somehow...

 --
 Colin Percival
 Security Officer, FreeBSD | freebsd.org | The power to serve
 Founder / author, Tarsnap | tarsnap.com | Online backups for the truly paranoid

From: Greg Holmberg
To: bug-followup@FreeBSD.org
Cc:
Subject: Re: kern/153620: Xen guest system clock drifts in AWS EC2 (FreeBSD 9.0-CURRENT i386 T1-micro)
Date: Sun, 2 Jan 2011 12:51:33 +0100

 On Sun, Jan 02, 2011 at 03:16:52AM -0800, Colin Percival wrote:
 >
 > Can you tell me:
 > 1. Did the clock run ahead, or behind?

 The NTP adjustment is a positive number (see below).
 Does this mean the clock is running slow?

 > 2. Can you reproduce this?
 >

 Yes. In the existing AMI, I just reset the clock again.
 Since I filed the PR, it had drifted "offset 0.124360 sec".
 While writing this email, it has drifted another "offset 0.009534 sec".
 I will let it go without correction for a while now.

 > 3. Did the clock _drift_, or _jump_?
 >

 Good question. Based on a handful of invocations of ntpdate, I would
 say that it drifts. The offset is different each time.

 ...

 Regards,
 Greg

From: Greg Holmberg
To: FreeBSD PR kern/153620
Cc:
Subject: Re: kern/153620: Xen guest system clock drifts in AWS EC2 (FreeBSD 9.0-CURRENT i386 T1-micro)
Date: Tue, 4 Jan 2011 02:10:29 +0100

 > On Sun, Jan 02, 2011 at 03:16:52AM -0800, Colin Percival wrote:
 >
 > 3. Did the clock _drift_, or _jump_?
 >

 I started a new VM (ami-5b82b72f) using the latest available 2011-01-01
 code. The problem -- the system clock losing 2200 seconds -- is still
 present.

 Last night, the clock in the new AMI seems to have lost 2200 seconds
 twice in a 14 hour idle period. The system clock seemed to be stopped
 before login. It started incrementing smoothly again when I logged in
 to check on the system in the morning. The system clock itself only
 managed to advance 14 minutes (837 seconds) overnight.

 After subtracting the accumulated drift from the previous evening's
 work, we see that the offset with a nearby NTP server at the moment I
 logged in was 4398.107077. This value is very close to 2 * (2^41) / 1e9.

 system clock          NTP source
 ...
 20110103-141513UTC    0.437444      # drift is gradual, linear
 20110103-141613UTC    0.438124      # clock is running slow
 20110103-141713UTC    0.438786
 20110103-141813UTC    0.439584
 20110103-141914UTC    0.440247
 20110103-142014UTC    0.440957
 20110103-142114UTC    0.441663
 20110103-142214UTC    0.442313
 20110103-142314UTC    0.443064
 20110103-143711UTC    4398.550141   <--- 4398.107077 second difference
 20110103-143811UTC    4398.550755        between slow clock and NTP src
 ...

 I have noticed that it only seems to happen when there are no active
 login sessions. I have seen a large jump in system time happen with a
 monitor script backgrounded from a single login shell, sometime after
 logout.
 I have seen it happen when the script runs in the foreground of a
 window in GNU screen. I have not been able to provoke it during an
 interactive login session. Maybe I just need to be more creative.

 Best regards,
 Greg Holmberg

From: Colin Percival
To: bug-followup@FreeBSD.org, fbsd-9.0-aws-ec2-1293964000@holmberg.to
Cc:
Subject: Re: kern/153620: [xen] Xen guest system clock drifts in AWS EC2 (FreeBSD 9.0-CURRENT i386 T1-micro)
Date: Tue, 04 Jan 2011 03:32:21 -0800

 Ok, I think I see what's happening here: Under some conditions it seems
 that the clock stops running. If there are no interrupts from any
 source, the FreeBSD instance never gets scheduled; and the clock loses
 time in multiples of the timecounter period (2^41 ns) if it doesn't
 tick for that long.

 The reason this doesn't show up with an interactive login session open
 is that ssh generates enough network traffic to wake the kernel
 periodically; this also provides a workaround for this bug: Send a
 ping to the instance once every 30 minutes.

 --
 Colin Percival
 Security Officer, FreeBSD | freebsd.org | The power to serve
 Founder / author, Tarsnap | tarsnap.com | Online backups for the truly paranoid

From: Greg Holmberg
To: FreeBSD PR kern/153620
Cc:
Subject: Re: kern/153620: Xen guest system clock drifts in AWS EC2 (FreeBSD 9.0-CURRENT i386 T1-micro)
Date: Wed, 5 Jan 2011 17:32:53 +0100

 The latest AMI (FreeBSD 9.0-CURRENT @ 2011-01-04) still drops time in
 chunks. It loses time in multiples of 2200 seconds while idle.

 The clock in the latest AMIs and older AMIs also drifts. Without some
 kind of external correction, it runs slightly slower, losing time at a
 rate of a little less than one second every 24 hours.

 In the 2011-01-01 and 2011-01-04 AMIs, a single ping from a remote host
 once every thirty minutes keeps an otherwise idle VM awake enough to
 prevent any time from being lost in chunks.
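
 A minimal sketch of the ping workaround described above, meant to run
 on a remote host rather than on the instance itself. It assumes the
 system `ping` utility accepts `-c 1` (as FreeBSD and Linux ping do).
 The function names and the example hostname are hypothetical, and the
 30-minute default is only the interval reported in this thread, not a
 tuned value.

```python
import subprocess
import time

# Interval reported in this PR as sufficient to keep an idle VM awake.
PING_INTERVAL_S = 30 * 60


def build_ping_command(host):
    """Return the argv for a single ICMP echo request to `host`."""
    return ["ping", "-c", "1", host]


def keepalive(host, rounds, interval_s=PING_INTERVAL_S,
              run=subprocess.run, sleep=time.sleep):
    """Ping `host` `rounds` times, sleeping `interval_s` between pings.

    `run` and `sleep` are injectable so the loop can be exercised
    without network access. Returns the number of successful pings.
    """
    successes = 0
    for i in range(rounds):
        result = run(build_ping_command(host),
                     stdout=subprocess.DEVNULL,
                     stderr=subprocess.DEVNULL)
        if result.returncode == 0:
            successes += 1
        # No need to wait after the final ping.
        if i + 1 < rounds:
            sleep(interval_s)
    return successes

# Example (hypothetical hostname): one ping every 30 minutes for a day.
# keepalive("my-instance.example.com", rounds=48)
```

 This only addresses the loss of time in 2200-second chunks; per the
 reports in this PR, it does not change the gradual drift.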
 In the latest AMIs and older AMIs, no amount of system activity or
 interrupts seems to prevent, aggravate, or change the rate of gradual
 clock drift.

 Regards,
 Greg Holmberg

From: Greg Holmberg
To: FreeBSD PR kern/153620
Cc:
Subject: Re: kern/153620: Xen guest system clock drifts in AWS EC2 (FreeBSD 9.0-CURRENT i386 T1-micro)
Date: Sun, 9 Jan 2011 06:39:44 +0100

 This problem -- dropping 2200 seconds of clock time -- is not seen in a
 FreeBSD 8.2-RC1 AMI on Amazon EC2 (ami-f77e4a83), using the same type
 of T1-micro VM as in the initial report.

 The clock drift noted with 9-CURRENT is also present in 8.2-RC1, but at
 about 40% of the rate, taking three hours and twenty-two minutes to
 lose an entire second.

 Best regards,
 Greg Holmberg

From: Greg Holmberg
To: FreeBSD PR kern/153620
Cc:
Subject: Re: kern/153620: Xen guest system clock drifts in AWS EC2 (FreeBSD 9.0-CURRENT i386 T1-micro)
Date: Sun, 9 Jan 2011 15:01:44 +0100

 Both Intel and AMD systems experience this problem. Interestingly, the
 dual Xeon E5430 lost 4x more time over six hours of idling than the
 Opteron 2218 HE. The image was ami-e388bd97.

 Best,
 Greg Holmberg

From: Adam Baldwin
To: bug-followup@FreeBSD.org, fbsd-9.0-aws-ec2-1293964000@holmberg.to
Cc:
Subject: Re: kern/153620: [xen] Xen guest system clock drifts in AWS EC2 (FreeBSD 9.0-CURRENT i386 T1-micro)
Date: Sat, 02 Apr 2011 16:20:50 -0400

 I'm noticing the same behavior on FreeBSD 8.2-RELEASE on t1.micro
 instances (AMI ami-423bc82b). I'm routinely losing ~4400 seconds (which
 is enough to force ntpd to abort) when the instance is idle (no SSH
 sessions connected).
>Unformatted:
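
 The figures reported in this PR (about 2200 s, about 4400 s, and the
 measured 4398.107077 s) can be checked against the 2^41 ns timecounter
 period with a few lines of arithmetic. The sketch below is only that
 check. The names `XEN_TC_PERIOD_S` and `wraps_lost` are hypothetical,
 and the period is taken from the discussion above, not read from the
 kernel source.

```python
# Timecounter period quoted in this PR: 2^41 nanoseconds, in seconds.
XEN_TC_PERIOD_S = 2**41 / 1e9


def wraps_lost(offset_s, period_s=XEN_TC_PERIOD_S):
    """Split an observed clock offset into whole timecounter periods.

    Returns (n, residual): the nearest whole number of periods and the
    part of the offset that those periods do not explain.
    """
    n = round(offset_s / period_s)
    residual = offset_s - n * period_s
    return n, residual


# Offsets mentioned in the thread: the initial report, the measured
# jump on 2011-01-03, and the approximate loss seen on 8.2-RELEASE.
for observed in (2200.0, 4398.107077, 4400.0):
    n, residual = wraps_lost(observed)
    print(f"{observed:>12.6f} s = {n} period(s) + {residual:.6f} s")
```

 For the measured 4398.107077 s jump this gives two periods with a
 residual of roughly 0.06 s, which fits a clock that lost whole
 timecounter periods rather than drifting by that amount.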