
perfsonar-dev - rrdtool peculiarities and implications on RRD MA

Subject: perfsonar development work

  • From: "Vedrin Jeliazkov" <>
  • To: GN2-JRA1-list <>,
  • Cc: Barney Garrett <>, Sven Ubik <>
  • Subject: rrdtool peculiarities and implications on RRD MA
  • Date: Sun, 08 Oct 2006 01:11:05 +0300
  • Disposition-notification-to: "Vedrin Jeliazkov" <>

Hi,

We would like to share our thoughts and start a discussion about some
important questions raised recently by Barney Garrett (and a while ago by Sven
Ubik) concerning the choice of RRD resolutions and averages, and their
implications for perfSONAR.

First of all, we would like to clarify that, contrary to what people might
expect, “rrdtool fetch” does not compute averages on the fly during queries.
It can return only the averages that were defined by “rrdtool create”, and
only for the time periods for which such averages have been stored in the RRD
file. In particular, if you create an RRD with the following command:

rrdtool create <filename> --step 60 \
DS:in:COUNTER:120:U:U \
DS:out:COUNTER:120:U:U \
RRA:AVERAGE:0.5:1:1147680 \
RRA:MAX:0.5:1:1147680 \
RRA:MIN:0.5:1:1147680

all time series that can be retrieved later through “rrdtool fetch” will
necessarily have 60 sec resolution, because this is the only “average” stored
in the RRD during updates. That is to say, averages are defined once and for
all by “rrdtool create” and are computed (and stored) only during “rrdtool
update”. Our extensive tests with rrdtool have confirmed the above statements.
This was a necessary step because rrdtool’s documentation is a little vague on
this subject.
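
To make the arithmetic concrete, here is a minimal Python sketch (ours, not
part of rrdtool; the helper name rra_summary is purely illustrative) that
derives the resolution and retention period of each RRA from the --step value
and the RRA definitions in the create command above. It assumes only the
RRA:CF:xff:steps:rows layout used in that command.

```python
def rra_summary(step, rra_defs):
    """Return (cf, resolution_sec, coverage_days) for each RRA definition."""
    out = []
    for rra in rra_defs:
        # Layout used in the create command: RRA:CF:xff:steps:rows
        _, cf, _xff, pdp_per_row, rows = rra.split(":")
        resolution = step * int(pdp_per_row)          # seconds per stored row
        coverage_days = resolution * int(rows) / 86400
        out.append((cf, resolution, coverage_days))
    return out


summary = rra_summary(60, [
    "RRA:AVERAGE:0.5:1:1147680",
    "RRA:MAX:0.5:1:1147680",
    "RRA:MIN:0.5:1:1147680",
])
for cf, res, days in summary:
    print(f"{cf}: {res} s resolution, {days:.0f} days retained")
```

For the example above, every archive comes out at 60 sec resolution with 797
days of retention, so no coarser series exists in the file.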

On the other hand, it is perfectly reasonable to expect that different users
might prefer to store data in RRD with various granularities (resolutions,
time periods and averages). Here come the implications for perfSONAR.
Currently, perfSONAR services don’t provide means to query and retrieve info
about the available resolutions, time periods and averages in a given RRD
file. That means that clients have to guess these parameters, which is not
failsafe. For instance, we already know that the default parameters of RRDs
created by various monitoring tools differ, and moreover users can override
those defaults. In some extreme cases (like Barney’s example above)
this could easily lead to disaster. Consider a client which asks for a time
series covering the last year with a 2 hour resolution (4380 data points), but
receives back 525600 data points, because the only available resolution in the
RRD is 60 sec (or because none of the averages stored for this time period has
a 2 hour resolution, so the finest available resolution is returned instead).
Multiply the number of data points by at least 20 bytes (maybe more) each and
you get an idea of how much data has to be exchanged: at least 10 MB, and
possibly tens of megabytes, for a single endpoint. The client then has to
handle/visualize this data and will most probably either fail with an error
(out of memory) or have to aggregate the data on its own, throwing away much
of the (raw) data it has just retrieved.
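
The numbers in this scenario can be checked with a few lines of Python. This
is only a back-of-the-envelope sketch: the 20 bytes per data point is the
lower bound quoted above, and the real size depends on how the response is
encoded.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600
BYTES_PER_POINT = 20  # lower bound from the text; actual encoding may use more


def points(period_sec, resolution_sec):
    """Number of data points covering period_sec at the given resolution."""
    return period_sec // resolution_sec


requested = points(SECONDS_PER_YEAR, 2 * 3600)  # what the client asked for
returned = points(SECONDS_PER_YEAR, 60)         # what a 60 sec-only RRD yields
payload_mb = returned * BYTES_PER_POINT / 1e6

print(requested, returned, round(payload_mb, 1))
```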

Considering all of the above, we would like to hear different opinions about
the most appropriate solution/tradeoff. Some possible options are:

1) define reasonable defaults for RRD file's structure and mandate that
perfSONAR services should support only these well known defaults (easy, but
not user friendly; the defaults already differ between CRICKET, MRTG and CACTI
and reaching an agreement on universal defaults might be virtually
impossible);

2) implement support for “rrdtool info” in RRD MA and specify
queries/responses for exchange of information between RRD MA and clients about
available averages/time periods (elegant, but adds complexity both on the
service and client side);

3) mandate that clients should always retrieve the finest available resolution
and perform aggregation only when required - either upon user demand or
because of system limitations (easier than the second option, but wastes
bandwidth and CPU cycles, which might be critical in some cases, especially
with huge RRDs, busy servers and/or saturated links);

4) mandate that RRD MAs should calculate the requested averages on the fly if
a finer than requested resolution is returned by rrdtool (transparent for
clients, but would put a heavy burden on the services, which are already quite
slow even without having to deal with such additional workload);
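
As a rough illustration of what option (2) could involve on the service side,
here is a Python sketch that extracts the available consolidation functions,
resolutions and time coverage from the "key = value" lines printed by
“rrdtool info”. The sample text is hand-written to match the create command
above (it is not a real capture), and the function name and the returned
dictionary layout are our own assumptions, not a proposed perfSONAR message
format.

```python
import re

# Hand-written sample in the "key = value" style of `rrdtool info` output.
SAMPLE_INFO = """\
step = 60
rra[0].cf = "AVERAGE"
rra[0].rows = 1147680
rra[0].pdp_per_row = 1
rra[1].cf = "MAX"
rra[1].rows = 1147680
rra[1].pdp_per_row = 1
"""


def available_archives(info_text):
    """Return one dict per RRA: cf, resolution (sec) and coverage (sec)."""
    step = int(re.search(r"^step = (\d+)", info_text, re.M).group(1))
    rras = {}
    pattern = r"^rra\[(\d+)\]\.(cf|rows|pdp_per_row) = (.+)$"
    for idx, key, value in re.findall(pattern, info_text, re.M):
        rras.setdefault(int(idx), {})[key] = value.strip().strip('"')
    result = []
    for idx in sorted(rras):
        rra = rras[idx]
        resolution = step * int(rra["pdp_per_row"])
        result.append({
            "cf": rra["cf"],
            "resolution": resolution,
            "coverage": resolution * int(rra["rows"]),
        })
    return result


archives = available_archives(SAMPLE_INFO)
for archive in archives:
    print(archive)
```

A client that receives such a list could pick the closest stored resolution
instead of guessing.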

We have a slight preference for option (2) with a fallback to a variant of
(3):

- try to satisfy most queries with available data fetched from RRDs, or
- perform aggregation on the client side when necessary (in extreme cases as
in Barney’s example above).
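
The client-side fallback could look roughly like the following Python sketch,
which averages consecutive groups of fine-resolution samples into coarser
ones (a factor of 120 would turn 60 sec samples into 2 hour averages). It is
a simplification under our own assumptions: unknown values are represented as
NaN and simply skipped, the series is assumed to start on a bucket boundary,
and nothing like rrdtool's xff threshold is applied.

```python
import math


def aggregate(values, factor):
    """Average each run of `factor` samples, ignoring unknown (NaN) values."""
    out = []
    for i in range(0, len(values), factor):
        known = [v for v in values[i:i + factor] if not math.isnan(v)]
        # A bucket with no known samples stays unknown.
        out.append(sum(known) / len(known) if known else math.nan)
    return out


fine = [1.0, 3.0, math.nan, 5.0, math.nan, math.nan]
coarse = aggregate(fine, 2)
print(coarse)
```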

Of course, someone could come up with some better solution ;-)

BTW, it would also be interesting to consider the same tradeoffs in the
context of SQL MA. Could someone shed some light on this?

Best regards,
Nina & Vedrin
--
-----------------------------------------------------------------
* * Vedrin JELIAZKOV - Network Engineer
* * Institute for Parallel Processing
* IST Foundation * Bulgarian Academy of Sciences
* The Bulgarian NREN * Acad. G. Bonchev St 25-A
* * 1113 Sofia, Bulgaria
* * Tel: +3592 9796606
http://www.ist.bg ICQ: 42633308
-----------------------------------------------------------------
PGP Public Key
http://cert.acad.bg/pgp-keys/keys/vedrin-jeliazkov-0x0F7EF249.asc
7EA1 7539 9B83 D8BF 4C1D 4890 0CE5 0B4B 0F7E F249
-----------------------------------------------------------------




