regular statistics dumps getting out of sync
Arnt Gulbrandsen
arnt at gulbrandsen.priv.no
Mon Aug 7 14:54:15 UTC 2006
I've done such things, and in my experience, the quality of the output
is better if you resync.
If you resync, you get effectively the right interval until conditions
are completely horrible, at which point it falls back to 2*interval,
3*interval, etc. The interval is _effectively_ right because when a
signal is delivered a second late, the same cause has generally also
prevented you from doing anything that would be reflected in the
statistics you report.
I've only seen the 2*interval thing in case of true disasters, like
another process eating all RAM+swap. (IIRC rrdtool can be configured
to detect 2*interval periods and display them as outages.)
By comparison, if you don't resync, the period changes by a much smaller
factor each time, but it starts deteriorating much sooner and the drift
accumulates. You don't need a fork bomb to hurt data quality; a bit of
overload or bad luck is enough.
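To make the contrast concrete, here's a minimal sketch of the two
re-arming styles (not nsd's actual handler; PERIOD and the function
names are made up for illustration):

    #include <signal.h>
    #include <time.h>
    #include <unistd.h>

    #define PERIOD 60  /* assumed statistics interval in seconds */

    /* Non-resynced: each re-arm adds the delivery latency to the next
       period, so the error accumulates across dumps. */
    static void on_alarm_drifting(int sig) {
        (void)sig;
        /* ... dump statistics ... */
        alarm(PERIOD);
    }

    /* Resynced: the delay is recomputed from the clock each time, so
       lateness in one dump is absorbed rather than carried forward. */
    static void on_alarm_resynced(int sig) {
        (void)sig;
        /* ... dump statistics ... */
        alarm(PERIOD - (time(NULL) % PERIOD));
    }

    int main(void) {
        signal(SIGALRM, on_alarm_resynced);  /* or on_alarm_drifting */
        alarm(PERIOD - (time(NULL) % PERIOD));
        for (;;)
            pause();                         /* wait for the alarms */
    }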
The algorithms I've used are (translating from my select() to nsd's alarm()):
alarm( nsd->st.period - ( ( time(NULL) - nsd->st.boot ) % nsd->st.period ) );
and
alarm( nsd->st.period - ( time(NULL) % nsd->st.period ) );
The first gives better data for a single process, since its first
st.period is optimally reported. The second gives better aggregate data
across a process restart, or when data from several nsds are combined.
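For concreteness, a toy demo of the difference, with made-up numbers
(period = 3600 s; boot and now are arbitrary example timestamps):

    #include <stdio.h>
    #include <time.h>

    int main(void) {
        time_t period = 3600;        /* statistics interval */
        time_t boot   = 1000000123;  /* example process start time */
        time_t now    = 1000003000;  /* example current time */

        /* first variant: intervals aligned to process start */
        time_t a1 = period - ((now - boot) % period);
        /* second variant: intervals aligned to the wall clock */
        time_t a2 = period - (now % period);

        printf("first:  sleep %ld s, fires at boot + 1*period\n", (long)a1);
        printf("second: sleep %ld s, fires on a multiple of period\n", (long)a2);
        return 0;
    }

With these numbers the first variant sleeps 723 s and fires exactly at
boot + period; the second sleeps 1400 s and fires on a wall-clock
multiple of 3600, which is what lets dumps from several processes line
up.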
Arnt