perfsonar-user - AW: [perfsonar-user] Modest sized grid has agent failure to archive once or twice a day

Subject: perfSONAR User Q&A and Other Discussion

From: "Garnizov, Ivan" <>
To: Phil Reese <>, "" <>
Subject: AW: [perfsonar-user] Modest sized grid has agent failure to archive once or twice a day
Date: Mon, 4 Nov 2019 14:11:43 +0000

Hi Phil,

Sorry for the delay; Friday was a bank holiday here in Germany. I am happy to hear that your worries are coming to an end. No need to worry about me: it is a pleasant duty to support the user list.

I do not have much experience with TWAMP measurements, but I do have a running example.

IMHO there’ll be no real benefit for you with a TWAMP mesh.

You might expect a gain from a reduced number of measurement schedules and measurement archival runs, but that is unrealistic.

TWAMP does indeed measure in both directions, and with a single test request pS triggers a single result archival. The problem is that there is no elegant way to split the mesh into two halves and have a disjoint pS mesh for TWAMP: that would leave measurements missing between the nodes within each group. There is also no good way to space the measurements so that one half operates after the other. You would have practically the same problem as now.

As a result, you will end up with measurements in both directions initiated by each node in the mesh, which effectively only increases the number of tests without relieving the workflow. In fact, the pSc archiver will have to process even more data.

The one minor benefit is getting RTT results along with OWD in a single TWAMP test and archival.
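The counting argument above can be illustrated with a small sketch. It is not perfSONAR code: the node names, the six-node mesh size, and the helper functions are hypothetical and only model who initiates a test towards whom.

```python
from itertools import combinations, permutations

def one_way_full_mesh(nodes):
    """Every node initiates a test towards every other node."""
    return list(permutations(nodes, 2))

def split_mesh(nodes):
    """Hypothetical split: the first half initiates towards the second half only."""
    half = len(nodes) // 2
    group_a, group_b = nodes[:half], nodes[half:]
    return [(a, b) for a in group_a for b in group_b]

def pairs_covered(tests):
    """Unordered node pairs touched by at least one (bidirectional) test."""
    return {frozenset(t) for t in tests}

nodes = [f"node{i}" for i in range(1, 7)]          # assumed 6-node mesh
all_pairs = {frozenset(p) for p in combinations(nodes, 2)}

full = one_way_full_mesh(nodes)                    # each node initiates to all others
split = split_mesh(nodes)                          # only A -> B tests
missing = all_pairs - pairs_covered(split)         # pairs inside A or inside B

print(len(full), len(all_pairs), len(split), len(missing))
# prints: 30 15 9 6
```

With six nodes, the split schedules only 9 tests but leaves 6 of the 15 node pairs (those within each half) unmeasured. Letting every node initiate restores coverage at 30 tests, and since each TWAMP test already covers both directions, every pair is then measured twice.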

I would be glad to meet you as well, but unfortunately the SC quota is usually full for me ;)

Regards,

Ivan Garnizov

GEANT WP6T3: pS development team

GEANT WP7T1: pS deployments GN Operations

GEANT WP9T2: Software governance in GEANT

From: Phil Reese [mailto:]
Sent: Friday, 1 November 2019 04:15
To: Garnizov, Ivan (RRZE) <>;
Subject: Re: [perfsonar-user] Modest sized grid has agent failure to archive once or twice a day

Hi Ivan,

Good news continues today. No hung systems and we've re-enabled the RabbitMQ archiver. So far so good.

The only retries in the .json were in the RabbitMQ archive spec. The original retry config was way too aggressive. When we added the RabbitMQ archiver back today, we gave it a single retry after 120 seconds. We'll see if that works.

The main test that we've been watching, on an every-5-minutes basis, is the OWAMP loss test. In some additional research today, it seems the new tool for loss testing is 'twamp'. I think the party line is that it isn't ready for production use, but I wonder if we could use 'twamp' instead of OWAMP owping?

Do you know if using twamp would lower the load on the archiver workers? If so, I'd be interested in at least temporarily shifting to twamp on our isolated setup. If this seems reasonable, might you have example .json lines to use?
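To make the difference between an aggressive and a relaxed retry setting concrete, here is a minimal sketch. It assumes a retry policy written as a list of `{"attempts", "wait"}` steps with ISO 8601 durations, which is how I recall pScheduler archive specs expressing it; treat the field names as an assumption. The `aggressive` values are invented for illustration, since the original settings are not shown in this thread.

```python
import re

def iso8601_seconds(duration):
    """Parse the PT#H#M#S subset of ISO 8601 durations into seconds."""
    m = re.fullmatch(r"PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?", duration)
    if m is None or not any(m.groups()):
        raise ValueError(f"unsupported duration: {duration!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in m.groups())
    return hours * 3600 + minutes * 60 + seconds

def retry_offsets(policy):
    """Seconds after the first failed attempt at which each retry would fire."""
    offsets, elapsed = [], 0
    for step in policy:
        wait = iso8601_seconds(step["wait"])
        for _ in range(step["attempts"]):
            elapsed += wait
            offsets.append(elapsed)
    return offsets

# Hypothetical "aggressive" policy: many retries only seconds apart.
aggressive = [{"attempts": 10, "wait": "PT2S"}]
# The relaxed policy described above: a single retry after 120 seconds.
relaxed = [{"attempts": 1, "wait": "PT120S"}]

print(retry_offsets(aggressive))
print(retry_offsets(relaxed))
```

Under these assumptions the aggressive policy makes ten further archive attempts within 20 seconds of a failure, while the relaxed one makes a single attempt two minutes later.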

Thanks!
Phil

PS- if you happen to be coming to SC19, I'll be having a few presentations on the project at the Stanford booth, #1255, please stop by if you are attending SC19.

On 10/31/19 8:27 AM, Garnizov, Ivan wrote:

Interestingly enough, you initially had retries configured, but they were on the order of seconds, which did the pScheduler operation more harm than good.
