Troubleshooting NTP

On our workstations we use ntpd to synchronize time with our time servers. Normally, ntpd automatically compensates for small inaccuracies of the system clock. It does this mainly through the drift parameter, which it records in /var/lib/ntp/ntp.drift.
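As a quick sanity check, the value in the drift file can be converted into seconds per day. The ntpd documentation describes the file as holding a single number, the frequency offset in parts per million (PPM). The function name drift_ppm_to_sec_per_day is a hypothetical helper of our own, not part of any package. A minimal sketch:

```sh
# drift_ppm_to_sec_per_day [FILE]
# Hypothetical helper: convert the frequency offset stored in an ntpd drift
# file (a single number in PPM) into the equivalent seconds per day.
# 1 PPM = 86400 / 1e6 = 0.0864 s/day.
drift_ppm_to_sec_per_day() {
  awk 'NR == 1 { printf "%s ppm = %.3f s/day\n", $1, $1 * 86400 / 1e6; exit }' \
    "${1:-/var/lib/ntp/ntp.drift}"
}
```

For example, drift_ppm_to_sec_per_day /var/lib/ntp/ntp.drift. Since ntpd can only compensate frequency errors of up to 500 PPM (about 43 s/day), a value close to that limit is a hint that the clock needs the manual tuning described below.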

Occasionally, a system clock performs so badly that ntpd cannot find an appropriate drift parameter. In such a case we can help by adjusting the kernel's tick/frequency settings. Here are some hints for finding appropriate values.

Tweaking the Clock on a Workstation

The hardware clock and the system clock should be calibrated separately, the hardware clock first.

Preparation: Install Necessary Packages

We need the following packages:

apt-get install util-linux ntpdate ntp adjtimex libstatistics-descriptive-perl

Calibrating the Hardware Clock

In normal operation, ntpd (or rather the kernel, through a mode that ntpd enables) keeps the hardware clock in step with the system clock. The hardware clock's drift rate is recorded in /etc/adjtime: the first number in that file is the drift in seconds per day. To calibrate the hardware clock manually (or rather with a cron job conveniently provided in a git repository), you need to switch off ntpd.
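To inspect the drift currently recorded in /etc/adjtime, a small helper can print its first number together with the PPM equivalent (1 s/day = 1e6 / 86400, about 11.574 PPM). The name adjtime_drift is hypothetical; this is a sketch that only reads the file and changes nothing:

```sh
# adjtime_drift [FILE]
# Hypothetical helper: print the hardware clock drift from /etc/adjtime.
# The first number on the first line is the systematic drift in seconds
# per day; the PPM value is derived from it.
adjtime_drift() {
  awk 'NR == 1 { printf "%s s/day = %.3f ppm\n", $1, $1 * 1e6 / 86400; exit }' \
    "${1:-/etc/adjtime}"
}
```

Running adjtime_drift before and after the calibration period shows how much the recorded value changed.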

Use these commands as root on the workstation:

/etc/init.d/ntp stop
cd /usr/local
git clone http://git.phys.ethz.ch/adjust-clock.git
ln -s /usr/local/adjust-clock/adjust-clock.cron /etc/cron.d/adjust-clock

With this in place, the system clock is set (via ntpdate) every 5 minutes and synchronized with the hardware clock. This is essentially what ntpd and the kernel do, except that it works even when ntpd fails to synchronize with the time servers.

Let this run for at least a few hours, or better a couple of days. Disable the Xymon tests for ntp and ntpd during this time.

Calibrating the System Clock

Now that the hardware clock is calibrated, we can calibrate the system clock as well. First, switch off the periodic ntpdate adjustments:

rm /etc/cron.d/adjust-clock

The simplest way to find suitable calibration parameters for the system clock is probably to run adjtimexconfig again (it was last run when adjtimex was installed):

/usr/sbin/adjtimexconfig

Alternatively, you can get an estimate for TICK/FREQ from the syslog entries collected during the hardware clock calibration as follows:

/usr/local/adjust-clock/syslog2adjtimex | tee /etc/default/adjtimex
/etc/init.d/adjtimex start

The latter is likely to be more accurate if the hardware clock calibration ran for a few hours or days.
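We do not reproduce syslog2adjtimex here and do not know exactly how it fits the data. As a rough sketch of the underlying idea only: assuming the cron job runs ntpdate every 5 minutes, that each run corrects the full offset, and that ntpdate logs lines such as "ntpdate[1234]: adjust time server 192.0.2.1 offset 0.010000 sec", the mean correction per interval gives the drift rate of the system clock. By NTP convention the offset is server time minus local time, so positive values mean the local clock loses time. The function name estimate_drift is hypothetical:

```sh
# estimate_drift [INTERVAL_SECONDS]
# Hypothetical helper: read syslog lines on stdin, collect the offsets that
# ntpdate logged ("adjust/step time server ... offset X sec") and turn their
# mean into a drift rate, assuming one ntpdate run every INTERVAL_SECONDS
# (default 300). Positive drift means the local clock loses time.
estimate_drift() {
  awk -v interval="${1:-300}" '
    /ntpdate/ && /(adjust|step) time server/ {
      for (i = 1; i < NF; i++)
        if ($i == "offset") { sum += $(i + 1); n++ }
    }
    END {
      if (n == 0) { print "no ntpdate entries found" > "/dev/stderr"; exit 1 }
      mean = sum / n
      printf "%d samples, mean offset %.6f s, drift %.3f s/day\n",
        n, mean, mean * 86400 / interval
    }'
}
```

For example, zcat -f /var/log/syslog* | estimate_drift 300. Averaging over more samples, i.e. a longer calibration run, reduces the scatter of the estimate, which is why a run of a few days helps.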

You can then start the ntp daemon, making sure it starts from pristine defaults:

/etc/init.d/ntp stop
rm /var/lib/ntp/ntp.drift
ntpdate -s time1.ethz.ch
/etc/init.d/ntp start

Then watch whether the offset to the time servers converges or drifts apart:

watch ntpq -p
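To log the convergence over time instead of just watching it, a small filter can extract the offset of the currently selected peer, which ntpq -p marks with "*" in the first column. This sketch assumes the default ntpq -p layout, in which offset is the second-to-last column and is given in milliseconds; the name sys_peer_offset is hypothetical:

```sh
# sys_peer_offset
# Hypothetical filter: read `ntpq -p` output on stdin and print the name and
# offset (ms) of the system peer, i.e. the line whose tally code is "*".
# Exits non-zero if no system peer has been selected yet.
sys_peer_offset() {
  awk '/^\*/ { print substr($1, 2), $(NF - 1), "ms"; found = 1 }
       END { exit !found }'
}
```

For example: while sleep 60; do date +%s; ntpq -p | sys_peer_offset || echo "no system peer"; done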

If the offset still drifts too much, you can experiment with adjtimex and calculate suitable values yourself, as described at the end of the adjtimex(8) man page. You may also find the JavaScript form at http://www.ep.ph.bham.ac.uk/general/support/adjtimex.html convenient.
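The man page arithmetic can be scripted. This is a minimal sketch, assuming USER_HZ=100 (TICK nominally 10000), where according to adjtimex(8) increasing TICK by 1 speeds the clock up by about 100 ppm and FREQ=65536 by about 1 ppm. It also assumes you have measured how many seconds per day the system clock still loses with the currently loaded TICK/FREQ. The name loss_to_tick_freq is our own invention; it moves whole 100 ppm steps into TICK and keeps the FREQ remainder within about ±50 ppm:

```sh
# loss_to_tick_freq LOSS_S_PER_DAY [TICK] [FREQ]
# Hypothetical helper: given how many seconds per day the system clock still
# loses with the current TICK/FREQ (negative if it gains), print corrected
# values in the TICK=/FREQ= form used by /etc/default/adjtimex.
# Assumes USER_HZ=100: tick +1 ~ +100 ppm, freq +65536 ~ +1 ppm.
loss_to_tick_freq() {
  awk -v loss="$1" -v tick="${2:-10000}" -v freq="${3:-0}" '
    # Round half away from zero; "0 - int()" avoids printing "-0".
    function round(x) { return (x < 0) ? (0 - int(-x + 0.5)) : int(x + 0.5) }
    BEGIN {
      # Total speed-up in ppm: current setting plus the needed correction.
      ppm = 100 * (tick - 10000) + freq / 65536 + loss * 1e6 / 86400
      # Whole 100 ppm steps go into TICK, the remainder into FREQ.
      steps = round(ppm / 100)
      printf "TICK=%d\nFREQ=%d\n", 10000 + steps, round((ppm - 100 * steps) * 65536)
    }'
}
```

Pass the values currently in /etc/default/adjtimex as the second and third arguments, and check the result against the man page before writing it back.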

If you find good values for adjtimex, write them to /etc/default/adjtimex.

Also keep an eye on the ntp drift file /var/lib/ntp/ntp.drift.

Further Reading