perfsonar-user - Re: [perfsonar-user] Current status of PS in Amazon Cloud services
- From: Dan Pritts <>
- To: Casey Russell <>
- Cc: Mark Feit <>
- Subject: Re: [perfsonar-user] Current status of PS in Amazon Cloud services
- Date: Wed, 3 Apr 2019 17:06:33 -0400
Picking up this recent thread...
My group here at UMich moved production workloads to AWS in early 2018. A week and a half ago we started seeing performance issues between campus and AWS us-east-1. I did some low-bandwidth UDP iperfs and saw 0.3%-0.5% packet loss on the path. UM networking engaged the Internet2 NOC, who found and fixed a problem with an Internet2 link to AWS in Ashburn.
Mark mentioned below that the way around NAT for perfSONAR is to get public IPs. Unfortunately, it's not that simple in AWS. Even when you have a public address, your Linux instance is given an RFC1918 address and AWS does 1:1 NAT for you. It's weird, but for almost any application this works just fine. (With IPv6, they've done away with this, and you can get a plain old routed address.)
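To see the 1:1 NAT concretely, here's a quick sketch run on the instance itself (the paths are AWS's instance-metadata service endpoints; under IMDSv2 you'd first have to fetch a session token, and the addresses shown are placeholders):

```shell
# The interface carries only the RFC1918 address, e.g. 10.x.x.x or 172.16-31.x.x:
ip -4 addr show

# The metadata service reports both sides of the 1:1 mapping:
curl -s http://169.254.169.254/latest/meta-data/local-ipv4    # the RFC1918 address
curl -s http://169.254.169.254/latest/meta-data/public-ipv4   # the NATted public address
```

The instance never sees the public address on any interface, which is exactly what trips up tools that advertise their local IP to the far end.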
While gathering information for the NOC, I brought up the latest perfSONAR on an AWS instance and on a system here in Ann Arbor. I could source tests from AWS, but sourcing from Ann Arbor didn't work - pscheduler never started up iperf3. Manual iperf3 runs worked fine, of course, and once I got both ends up on IPv6, tests worked in both directions.
Having tools like perfSONAR available and working is key when moving to the cloud. The network has to be nearly perfect, and without the ability to monitor it, that seems like a long shot.
Regarding the uncertainty that comes with the cloud - you are absolutely right. That makes tools even more useful.
So let me put in a feature request - make the perfSONAR tools work with 1:1 NAT as AWS uses it. Manual configuration would be OK - a quick little entry somewhere that says "my public IP is x.y.z.1". Sorry if this already exists and I missed it; it wasn't for lack of searching.
Hi to anyone who remembers me, hope you are all well.
danno
Casey Russell wrote on 2/22/19 2:32 PM:
Actually, allow me to adjust my wording on that just a bit. For this smaller member, who uses Cloud Connect for day-to-day student and administrative software, that ability for KR to do this testing "would be nice".
Lawrence, Kansas 66047
Casey Russell writes:
The last time I saw the question addressed was two years ago, in 2017, when it was acknowledged that there were a host of problems trying to deploy a PS node in the Amazon cloud. It appears the problems mostly surrounded NAT issues. Has there been any progress in the intervening time? Is anyone deploying a set of PS tools in Amazon Cloud services to monitor their new Cloud Connect connectivity or something similar? If so, do you have a roadmap or template for doing so?
Things are much the same.
NAT isn’t difficult to deal with for tests like TCP throughput, where you can do reverse testing to dodge the translation. It’s a lot stickier for UDP throughput, latency, and RTT, where you can do tests from behind the NAT to the outside, but anything you do the other way won’t traverse it, and there isn’t necessarily a server available to play the “other end” role. We’ve discussed things like UDP hole punching, but the number of variables would make it difficult to support. The only other way around it is to have public IPs.
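As a sketch of the reverse-testing approach (hostname is a placeholder; this assumes a recent pscheduler where the throughput test accepts a reverse option), the host behind the NAT initiates both tasks, so no inbound connection ever has to cross the translation:

```shell
# Run from the host behind the NAT; outside.example.net is a placeholder.
# Forward direction: we send traffic to the outside server.
pscheduler task throughput --dest outside.example.net

# Reverse: the control connection still originates here, but the
# throughput traffic flows from outside.example.net back to us.
pscheduler task throughput --dest outside.example.net --reverse
```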
The harder problem to overcome is the uncertainty that comes from deploying infrastructure in what’s essentially a geographically- and topologically-diverse black box. An AWS instance in us-east-1 will materialize as one VM on one machine in one rack in one aisle in one of almost three dozen buildings in two counties in Northern Virginia. (I live near Ashburn and drive past Amazon’s data centers often enough to have a good feel for how large and spread out their presence here is. It’s quite something.) Making end-to-end measurements meaningful is difficult in an environment where there’s no control over the location of one end, but that lack of control is what makes commodity computing cheap enough to be attractive. I’ve run into researchers who install perfSONAR in a container on the same instances where their applications run and treat any measurements they get as valid for only that instance. If I remember correctly, their institution was getting a large-enough break on network traffic that the cost of outbound throughput testing wasn’t an issue.
This has been mentioned before but is worth repeating: Check the agreements you have with cloud providers before sharing the results of performance measurements involving their systems and networks. I don’t know if it’s still true, but at least one provider forbade that.
Putting on my Internet2 hat for a moment: We have internal-use-only perfSONAR nodes in the vicinity of some, but not all, of our cloud provider handoffs. There have been discussions about adding more at those points but no solid action. If you’re a member and consider this important to your operations, your feedback helps drive what we do.
HTH.
--Mark
-- To unsubscribe from this list: https://lists.internet2.edu/sympa/signoff/perfsonar-user
ICPSR Computing & Network Services
University of Michigan