Hi all,
I think we have a fix for this as well. The problem was that only the first 1000 results were getting returned to the graphs. When dealing with 24 hours' worth of 1-minute summaries for latency data, that's 1440 points, so things got cut off. I just pushed out a new esmond RPM that should return the rest of the results. It should rsync its way into the main yum repo in the next 30 minutes and then work its way through the rest of the mirrors over the next day or so. The new version is 2.0.2. As soon as you get that RPM, your graphs should look correct again with no further action.
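If you want to sanity-check the fix once the 2.0.2 RPM lands, a quick query against esmond should now return the full day of points. This is just a rough sketch: it reuses the metadata key and statistics endpoint from Michael's 500-error link further down the thread and assumes esmond's usual time-range query parameter, so substitute your own host/key as needed.

    # Rough sketch: count how many latency summary points esmond returns for
    # the last 24 hours. Before the fix the list was capped at the first 1000
    # items, so 24 hours of 1-minute summaries (1440 points) got truncated --
    # a 1d window starting at ~2:54 pm the previous day ran out of points
    # after ~1000 minutes, i.e. around 7:34 am, which lines up with the
    # ~7:33 am cutoff Joon reported.
    import requests

    URL = ("http://perfsonar-ma-2.princeton.edu/esmond/perfsonar/archive/"
           "dfbfe496326a49d1a55219ac831f9b26/histogram-owdelay/statistics/0")
    resp = requests.get(URL, params={"format": "json", "time-range": 86400})
    resp.raise_for_status()
    points = resp.json()
    print(len(points), "points in the last 24 hours")  # expect ~1440 after the fix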
Thanks, Andy
On March 11, 2016 at 9:32:23 AM, Christopher J. Tengi () wrote:
Michael,
The interesting thing, to me, is that ping data is
graphing just fine. It is the latency and loss data that is
problematic. Attached are 2 images of 1-day graphs generated on one
of our endpoints that is testing against a machine at Yale. The
graph is being generated on the toolkit host perfsonar-87prospect.princeton.edu, using its local esmond. Note that on the 1-day graph from today
(where latency is predominant), the latency and loss numbers stop
around 02:00. Clicking on the “Previous 1d” link generates a graph
(with loss predominant) with a similar cut-off around 02:00. The
ping data just chugs on steadily through both graphs. That just
seems odd to me.
/Chris
On Mar 10, 2016, at 4:11 PM,
Michael Johnson <> wrote:
Hi Joon,
Thanks for following up with more information -- we're still trying
to figure out where the problem is. We will most likely wait until
next week, since one developer who I want to discuss this with
is gone this week.
Thanks,
Michael
On Wed, Mar 09, 2016 at 11:43:06PM +0000, Hyojoon Kim
wrote:
Hello Michael,
Thanks for filing a bug report!
In the MA machine:
I have not found any error messages in
/var/log/cassandra/cassandra.log or
/var/log/cassandra/system.log. However, I did find some
interesting errors in /var/log/esmond/django.log and
/var/log/esmond.log. I am attaching them. One interesting
error message in django.log is
OverflowError: list size out of the sanity limit (10000 items max)
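(For what it's worth, and purely as an illustration rather than esmond's actual code, a "sanity limit" of that shape is just a hard cap on how many rows the query layer will collect before it gives up, something like the sketch below.)

    # Illustrative sketch only -- not esmond's real implementation. A guard of
    # this shape raises an OverflowError like the one in django.log when a
    # query would build a result list larger than a hard-coded cap, instead of
    # silently returning partial data.
    SANITY_LIMIT = 10000  # items

    def collect_rows(row_iter, limit=SANITY_LIMIT):
        rows = []
        for row in row_iter:
            if len(rows) >= limit:
                raise OverflowError(
                    "list size out of the sanity limit (%d items max)" % limit)
            rows.append(row)
        return rows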
One thing to mention: the error message above only appears on the MA machine. However, the charts viewed directly on the test point node (which has the toolkit installed) show the same display issue. (The test point node sends data both to the MA machine's esmond and to its own local esmond.) The test point node itself does *not* have any error messages in any of the log files mentioned above.
Thanks,
Joon
On Mar 9, 2016, at 5:12 PM, Michael Johnson <>
wrote:
Hi Hyojoon,
Thanks for your bug report. We're looking into it. Looking at your
MA, it looks like you're getting 500 errors for some requests,
e.g.:
http://perfsonar-ma-2.princeton.edu/esmond/perfsonar/archive/dfbfe496326a49d1a55219ac831f9b26/histogram-owdelay/statistics/0?format=json
Do you see any interesting errors in
/var/log/cassandra/cassandra.log or /var/log/cassandra/system.log?
These could be preventing the graphs from getting all the
necessary data.
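If it helps, a quick loop like the following (just a sketch; the endpoints beyond the statistics one above are examples, so adjust to whatever your graphs actually request) will show which esmond queries are coming back as 500s -- anything that fails here will likely also leave a traceback in esmond's logs (e.g. /var/log/esmond/django.log) on the MA.

    # Sketch: print the HTTP status for a few of the esmond endpoints the
    # graphs pull from, to see which requests are returning 500s.
    import requests

    BASE = ("http://perfsonar-ma-2.princeton.edu/esmond/perfsonar/archive/"
            "dfbfe496326a49d1a55219ac831f9b26")
    for path in ("histogram-owdelay/statistics/0",
                 "histogram-owdelay/aggregations/300",
                 "packet-loss-rate/base"):
        r = requests.get("%s/%s" % (BASE, path),
                         params={"format": "json", "time-range": 86400})
        print(r.status_code, path)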
I suspect there are some issues with your MA's summary windows,
although it's not quite clear yet where the problem lies. I've
opened a bug report here:
https://github.com/perfsonar/graphs/issues/33
I put it with the graphs, even though it's not clear yet whether
it's a problem with the charts themselves, or esmond.
Thanks,
Michael
On Wed, Mar 09, 2016 at 08:25:22PM +0000, Hyojoon Kim
wrote:
Hi,
After an upgrade to v3.5-1, the chart that shows measured data seems to have a display issue. To be more specific, the chart does not display all of the available ‘Latency’ data *when zoomed to 1 day* via a mouse *click* on ‘1d’ above the chart (not via window dragging, which is below the chart).
For example, the following links show charts that display measurement data between two nodes (128.112.228.24 and 128.112.228.23). The query was made on March 9, 2016, at around *2:54 pm*. However, when zoomed to *1 day*, the display cuts off at around 7:33 am. When zoomed to *3 days*, the data is displayed up to roughly 2:54 pm.
In other words, zooming to *1d* seems to cut off displayed latency
data, even though the data *is* actually there. Note that
‘Ping’ data is not cut off.
Zoom: 1 day (1d)
https://perfsonar-ma-2.princeton.edu/perfsonar-graphs/graphWidget.cgi?url="https://perfsonar-ma-2.princeton.edu/esmond/perfsonar/archive&source=perfsonar-hpcrc-delay.princeton.edu&dest=perfsonar-87prospect-delay.princeton.edu#timeframe=1d
Zoom: 3 day (3d)
https://perfsonar-ma-2.princeton.edu/perfsonar-graphs/graphWidget.cgi?url="https://perfsonar-ma-2.princeton.edu/esmond/perfsonar/archive&source=perfsonar-hpcrc-delay.princeton.edu&dest=perfsonar-87prospect-delay.princeton.edu#timeframe=3d
* Zoomed to between 6 am and 2:54 pm by dragging the window with the mouse, so that the difference between 1d and 3d is more obvious (i.e., the 1d zoom is missing latency data):
Zoom: 1 day
https://perfsonar-ma-2.princeton.edu/perfsonar-graphs/graphWidget.cgi?url="https://perfsonar-ma-2.princeton.edu/esmond/perfsonar/archive&source=perfsonar-hpcrc-delay.princeton.edu&dest=perfsonar-87prospect-delay.princeton.edu#timeframe=1d&zoom_start=1457520917158&zoom_end=1457553245000
Zoom: 3 day
https://perfsonar-ma-2.princeton.edu/perfsonar-graphs/graphWidget.cgi?url="https://perfsonar-ma-2.princeton.edu/esmond/perfsonar/archive&source=perfsonar-hpcrc-delay.princeton.edu&dest=perfsonar-87prospect-delay.princeton.edu#timeframe=3d&zoom_start=1457520917158&zoom_end=1457553245000
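For reference, the zoom_start / zoom_end values in those dragged-window links are epoch milliseconds; converting them (sketch below, assuming US Eastern since the hosts are at Princeton) gives roughly 5:55 am and 2:54 pm on March 9, matching the dragged window.

    # Convert the zoom_start / zoom_end URL values (epoch milliseconds) into
    # readable local timestamps. March 9, 2016 is before the DST switch, so
    # US Eastern is UTC-5 here.
    from datetime import datetime, timezone, timedelta

    EASTERN = timezone(timedelta(hours=-5))
    for label, ms in (("zoom_start", 1457520917158), ("zoom_end", 1457553245000)):
        print(label, datetime.fromtimestamp(ms // 1000, tz=EASTERN).isoformat())
    # zoom_start -> 2016-03-09T05:55:17-05:00, zoom_end -> 2016-03-09T14:54:05-05:00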
Any help or insight would be appreciated!
Thanks,
Joon
--
Michael Johnson
GlobalNOC Software Engineering
Indiana University
812-856-2771
--
Michael Johnson
GlobalNOC Software Engineering
Indiana University
812-856-2771