Troubleshooting NTP
On our workstations we use ntpd to synchronize time with our time servers. Normally, ntpd automatically compensates for small inaccuracies of the system clock. It does so mainly through the drift parameter (a frequency offset in ppm), which it records in /var/lib/ntp/ntp.drift.
Occasionally, a system clock performs so badly that ntpd cannot find an appropriate drift parameter. In that case we can help by adjusting the kernel's tick/frequency settings. Here are some hints for finding appropriate values.
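To judge how badly a clock performs, it can help to translate a drift value into seconds per day. The following is a minimal sketch, assuming the drift file holds a single frequency offset in ppm as its first field; the function names are made up for illustration and are not part of ntp.

```shell
# Convert a frequency offset in ppm to seconds per day.
# 1 ppm of 86400 s is 0.0864 s.
ppm_to_sec_per_day() {
    awk -v ppm="$1" 'BEGIN { printf "%.4f\n", ppm * 0.0864 }'
}

# Read the first field of a drift file (default: /var/lib/ntp/ntp.drift,
# assumed to be a ppm value) and print it as seconds per day.
drift_file_sec_per_day() {
    ppm_to_sec_per_day "$(awk 'NR == 1 { print $1; exit }' "${1:-/var/lib/ntp/ntp.drift}")"
}
```

For example, `ppm_to_sec_per_day 100` prints 8.6400, that is, a clock that is off by 100 ppm gains or loses about 8.64 seconds per day.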
Tweaking the Clock on a Workstation
The hardware and the system clock should be calibrated separately.
Preparation: Install Necessary Packages
Install the following packages:
apt-get install util-linux ntpdate ntp adjtimex libstatistics-descriptive-perl
Calibrating the Hardware Clock
In normal operation, ntpd (or rather the kernel, via a setting made by ntpd) keeps the hardware clock calibrated. The hardware clock's drift rate is recorded in /etc/adjtime as the first number in that file, in seconds per day. To calibrate manually (or rather through a cron job conveniently provided in a git repository), you first need to stop ntpd.
Use these commands as root on the workstation:
/etc/init.d/ntp stop
cd /usr/local
git clone http://git.phys.ethz.ch/adjust-clock.git
ln -s /usr/local/adjust-clock/adjust-clock.cron /etc/cron.d/adjust-clock
With this cron job in place, the system clock is set every 5 minutes and synchronized with the hardware clock. This is essentially what ntpd and the kernel do, except that it works even when ntpd fails to synchronize with the time servers.
Let it run for at least a few hours, or better, a couple of days. Disable the Xymon tests for ntp and ntpd during this time.
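While the calibration runs, the values recorded in /etc/adjtime can be inspected. The following is a small sketch, assuming the layout described in hwclock(8): the first line holds the drift rate in seconds per day, followed by the time of the last adjustment in seconds since the epoch. The function names are hypothetical.

```shell
# Print the hardware clock drift rate (seconds per day), i.e. the
# first number on the first line of an adjtime-style file.
adjtime_drift() {
    awk 'NR == 1 { print $1; exit }' "${1:-/etc/adjtime}"
}

# Print the time of the last adjustment (seconds since the epoch),
# i.e. the second number on the first line.
adjtime_last_adjust() {
    awk 'NR == 1 { print $2; exit }' "${1:-/etc/adjtime}"
}
```

Calling `adjtime_drift` without an argument reads /etc/adjtime; passing a path reads a copy of the file instead, which is handy for comparing snapshots taken a few hours apart.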
Calibrating the System Clock
Now that we have a calibrated hardware clock, we can calibrate the system clock as well. First, switch off the periodic ntpdate adjustments:
rm /etc/cron.d/adjust-clock
The simplest way to find the right calibration parameters for the system clock may be to run adjtimexconfig again (it was last run when adjtimex was installed):
/usr/sbin/adjtimexconfig
Alternatively, you can get an estimate for TICK/FREQ from the syslog entries collected during the hardware clock calibration as follows:
/usr/local/adjust-clock/syslog2adjtimex | tee /etc/default/adjtimex
/etc/init.d/adjtimex start
The latter might be more accurate if you ran the hardware clock calibration over a period of several hours or days.
You can then start the ntp daemon, making sure it starts with pristine defaults:
/etc/init.d/ntp stop
rm /var/lib/ntp/ntp.drift
ntpdate -s time1.ethz.ch
/etc/init.d/ntp start
Then watch the offset to the time servers converge or drift apart:
watch ntpq -p
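To log the offset instead of watching it, the value can be pulled out of the ntpq output. The following is a minimal sketch, assuming the usual ntpq -p column layout (remote, refid, st, t, when, poll, reach, delay, offset, jitter) in which the currently selected peer is marked with a leading asterisk and the offset is given in milliseconds. The function name is made up for illustration.

```shell
# Read `ntpq -p` style output on stdin and print the offset (ms) of the
# peer currently selected for synchronization (line starting with "*").
# Returns a non-zero status if no peer is selected yet.
selected_offset_ms() {
    awk '/^\*/ { print $9; found = 1 } END { exit !found }'
}
```

A possible use is `ntpq -pn | selected_offset_ms`, run periodically and appended to a file, so that convergence or drift can be judged after a few hours.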
If the offset still drifts too much, you can experiment with adjtimex and calculate values yourself, following the explanation at the end of the adjtimex(8) man page. The JavaScript form at http://www.ep.ph.bham.ac.uk/general/support/adjtimex.html may also be convenient.
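The calculation from the adjtimex(8) man page can be sketched as follows. This assumes a kernel with USER_HZ=100, where the nominal tick is 10000, one tick unit corresponds to 100 ppm (about 8.64 seconds per day), and the frequency value is scaled by 65536 per ppm. It also assumes the drift was measured with tick=10000 and frequency=0; the function name is hypothetical.

```shell
# drift_to_tick_freq SECONDS_GAINED_PER_DAY
# Print tick and frequency values that would cancel the measured drift.
# A positive argument means the system clock runs fast.
drift_to_tick_freq() {
    awk -v gain="$1" 'BEGIN {
        want = -gain / 0.0864                 # required correction in ppm
        # coarse part: whole tick units of 100 ppm, rounded to nearest
        t = int(want / 100 + (want >= 0 ? 0.5 : -0.5))
        # fine part: remaining ppm, scaled by 2^16 for the frequency value
        f = (want - t * 100) * 65536
        printf "TICK=%d FREQ=%.0f\n", 10000 + t, f
    }'
}
```

For a clock that gains 8 seconds in 24 hours, `drift_to_tick_freq 8` prints TICK=9999 FREQ=485452, which agrees with the worked example in the man page.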
If you find good values for adjtimex, write them to /etc/default/adjtimex.
Also have a look at the ntp drift file /var/lib/ntp/ntp.drift.