
perfsonar-user - Re: [perfsonar-user] Current status of PS in Amazon Cloud services

Subject: perfSONAR User Q&A and Other Discussion



  • From: Casey Russell <>
  • To: Mark Feit <>
  • Cc: "" <>
  • Subject: Re: [perfsonar-user] Current status of PS in Amazon Cloud services
  • Date: Fri, 22 Feb 2019 13:32:34 -0600

Actually, allow me to adjust my wording on that just a bit.  For this smaller member, who uses Cloud Connect for day-to-day student and administrative software, the ability for KR to do this testing "would be nice".

I suspect, however, that K-State and KU, as they begin their deployments, will see it as more than "it would be nice".  They'll likely treat it as a critical capability, since they'll be taking on more research-related tasks and disaster-recovery functions down the road.

Sincerely,
Casey Russell
Network Engineer
KanREN
phone: 785-856-9809
2029 Becker Drive, Suite 282
Lawrence, Kansas 66047



On Fri, Feb 22, 2019 at 1:28 PM Casey Russell <> wrote:
Thanks Mark,

I have a member who was asking about latency to the other end of their Cloud Connect provisioning to Azure, and (while that's not Amazon) I just became curious about what the status of PS on the cloud provider networks was.  You've answered that question nicely.  I rather doubt they'll actually go down that road for now.

I've started them down the basic troubleshooting road with the standard tools (traceroute, etc.).  As for the I2 perfSONAR node question, I happen to do some regular PS testing to a node in the Equinix datacenter and was able to tell him what our regular latency was to Ashburn, although I cautioned that his exact path through KR and I2 might be somewhat different.  Speaking just for KanREN, it would be really helpful to have a published list, and the ability to do ad-hoc testing, particularly to nodes near (or at) the Cloud Connect or Cloud Access datacenters.  I know large, managed, multi-domain meshes are a mess, and I wouldn't advocate for that, but it would be nice as a REN or RON to be able to run a quick ad-hoc test for a member when a question like this comes up.

Sincerely,
Casey Russell
Network Engineer
KanREN
phone: 785-856-9809
2029 Becker Drive, Suite 282
Lawrence, Kansas 66047



On Fri, Feb 22, 2019 at 12:45 PM Mark Feit <> wrote:

Casey Russell writes:


The last time I saw the question addressed was in 2017, when it was acknowledged that there were a host of problems trying to deploy a PS node in the Amazon cloud.  It appears the problems mostly surrounded NAT issues.  Has there been any progress in the intervening time?  Is anyone deploying a set of PS tools in Amazon cloud services to monitor their new Cloud Connect connectivity or something similar?  If so, do you have a roadmap or template for doing so?


Things are much the same.


NAT isn’t difficult to deal with for tests like TCP throughput, where you can do reverse testing to dodge the translation.  It’s a lot stickier for UDP throughput, latency, and RTT, where you can test from behind the NAT to the outside, but anything going the other way won’t traverse it, and there isn’t necessarily a server available to play the “other end” role.  We’ve discussed things like UDP hole punching, but the number of variables would make it difficult to support.  The only other way around it is to have public IPs.
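For what it's worth, the asymmetry described above can be seen in a few lines of socket code: an inbound UDP flow only works if the inside host has already sent something outbound to create the translation entry.  This is purely an illustrative sketch, not anything from the perfSONAR tooling; loopback sockets stand in for the NAT'd and public hosts, and no real translation happens:

```python
import socket

# "Server" on the public side: it replies to whatever source address the
# datagram arrived from -- the address a real NAT would have rewritten.
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))   # ephemeral port as a stand-in public endpoint
server.settimeout(2)

# "Client" behind the (hypothetical) NAT: this outbound send is what would
# open the translation entry that lets the reply come back in.
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.settimeout(2)
client.sendto(b"punch", server.getsockname())

data, observed_src = server.recvfrom(1024)  # observed_src: translated address
server.sendto(b"reply", observed_src)       # answer the mapping, not a fixed port

reply, _ = client.recvfrom(1024)            # arrives because the mapping exists
```

Hole punching generalizes this by having a rendezvous server tell both sides each other's observed addresses so they can send outbound simultaneously, but how long mappings live and how strictly inbound traffic is filtered varies by NAT implementation, which is the "number of variables" problem.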


The harder problem to overcome is the uncertainty that comes from deploying infrastructure in what’s essentially a geographically- and topologically-diverse black box.  An AWS instance in us-east-1 will materialize as one VM on one machine in one rack in one aisle in one of almost three dozen buildings in two counties in Northern Virginia.  (I live near Ashburn and drive past Amazon’s data centers often enough to have a good feel for how large and spread out their presence here is.  It’s quite something.)  Making end-to-end measurements meaningful is difficult in an environment where there’s no control over the location of one end, but that lack of control is what makes commodity computing cheap enough to be attractive.  I’ve run into researchers who install perfSONAR in a container on the same instances where their applications run and treat any measurements they get as valid for only that instance.  If I remember correctly, their institution was getting a large-enough break on network traffic that the cost of outbound throughput testing wasn’t an issue.


This has been mentioned before but is worth repeating:  Check the agreements you have with cloud providers before sharing the results of performance measurements involving their systems and networks.  I don’t know if it’s still true, but at least one provider forbade that.


Putting on my Internet2 hat for a moment:  We have internal-use-only perfSONAR nodes in the vicinity of some, but not all, of our cloud provider handoffs.  There have been discussions about adding more at those points but no solid action.  If you’re a member and consider this important to your operations, your feedback helps drive what we do.


HTH.


--Mark




