
perfsonar-user - Re: [perfsonar-user] perfSONAR 4.0 available on April 17th

Subject: perfSONAR User Q&A and Other Discussion

  • From: Andrew Lake <>
  • To: Brian Candler <>, "" <>
  • Subject: Re: [perfsonar-user] perfSONAR 4.0 available on April 17th
  • Date: Mon, 10 Apr 2017 11:46:25 -0700

Hi Brian,

Comments inline:

On April 10, 2017 at 11:39:44 AM, Brian Candler wrote:
On 03/04/2017 20:02, Andrew Lake wrote:
If you have a single-core CPU, we recommend disabling auto-updates. Our testing shows that the combination of running CPU-intensive throughput tests and/or OWAMP tests with high write activity to the measurement archive on top of the new scheduling system does not perform as desired beyond a few tests. Such a setup may be fine if you plan to use it as an ad-hoc tester as opposed to using it for dedicated measurements. It is worth noting here that this is below the minimum recommendations for 3.5.

If your host has a CPU with two cores but the clock speed is below 2GHz or less than 4 GB of RAM, we recommend using your best judgement. If you are running a couple dozen tests or less you are likely fine with regards to CPU. 4GB has been the memory recommendation for many years on the toolkit, and if you are below that and running a bundle without the measurement archive you may be fine as well. If you are running on less than 4GB with a local measurement archive, we encourage you to add memory or disable auto-updates (and we would have recommended this the last few releases as well).
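As a rough illustration only, the guidance in the two paragraphs above can be written as a small decision helper. This is a sketch, not anything shipped with perfSONAR: the `Host` fields, the `upgrade_advice` function, and the cutoff of 24 tests for "a couple dozen" are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Hypothetical description of a measurement host (not a perfSONAR type)."""
    cores: int
    clock_ghz: float
    ram_gb: float
    local_archive: bool  # True if the host runs a local measurement archive
    test_count: int      # number of regularly scheduled tests


# Assumption: "a couple dozen tests" is read as 24 for this sketch.
COUPLE_DOZEN = 24


def upgrade_advice(host: Host) -> str:
    """Return a one-line summary of the advice in the email for this host."""
    if host.cores < 2:
        return "disable auto-updates"
    if host.ram_gb < 4 and host.local_archive:
        return "add memory or disable auto-updates"
    marginal = (host.cores == 2 and host.clock_ghz < 2.0) or host.ram_gb < 4
    if marginal:
        if host.test_count <= COUPLE_DOZEN:
            return "likely fine"
        return "use your best judgement"
    return "meets recommendations"


if __name__ == "__main__":
    # A two-core 1.6 GHz box with 8 GB of RAM running a handful of tests.
    print(upgrade_advice(Host(2, 1.6, 8, True, 5)))
```

The order of the checks matters: the single-core and low-memory-with-archive cases are tested first because the email gives them the firmest recommendation, and everything else falls back to judgement.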

Sad that perfsonar isn't going to work well on smaller boxes any more, especially since the new version "contains numerous performance optimizations" [^1]

For one, we have the world’s worst CMS and those are not the RC1 release notes. It looks like they somehow got overwritten with an early draft of the RC3 notes, so now I get to see if I can dig up the correct ones. That quote is not meant to highlight a difference between 3.5.1 and 4.0 but rather between RC2 and RC3. RC3 does contain many optimizations over RC2. perfSONAR is also a project with a lot of moving parts, so in some areas 4.0 contains optimizations over 3.5, and in other areas it contains higher requirements for the sake of adding functionality, just as with any software project. While we have increased our recommendations for CPU, I think it's an oversimplification to say it won't work well on smaller boxes anymore; it depends on a number of factors, as we tried our best to summarize in the previous email.



I'd like to understand the issue a bit more.  Is this the same as the issue of "On sufficiently loaded hosts it is not uncommon for archiving to fall behind"?  If so, is this an issue which is better with SSD than HDD, or is it really just about CPU utilisation? Is esmond now processing the data more, or has the on-disk format changed?

This is mainly about CPU utilization. Very little has changed with regard to esmond and how things are stored on disk in this release, and that is not what is driving this. What has changed is our scheduling software: we have replaced BWCTL and the regular-testing component with a new component called pScheduler. The specific features it adds that drive up these requirements relate, I think, to your question below...


Have these changes resulted in some benefits in terms of how the data can be queried or summarised, and if so, what are they?

There are a whole bunch of new features in pScheduler vs BWCTL+regular testing. We actually keep a schedule in a database now, so unlike BWCTL, you can know when things ran, what’s running and when they are going to run. We also keep a lot more diagnostic information, so it’s easier to debug things when a test is missed, etc. In addition to that, it has a plug-in architecture for writing new tests, tools and archivers. This architecture leads to more processes, which is part of the CPU increase, particularly on the archiver side (the piece that ships test results to esmond) of busy OWAMP hosts that write data frequently. This archiving architecture actually adds real features for registering data when the archive goes down, unlike 3.5 where the strategy was more akin to “fill up a directory with files such that your host can never catch up until your disk gets filled and someone clears a directory” :)
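To make the archiving point a bit more concrete, here is a minimal sketch of the general idea of holding results and retrying them when the archive is unreachable, with a cap on attempts so the backlog cannot grow without bound. It is not pScheduler's actual code or API: the `RetryingArchiver` class, its `send` callback, and the `max_attempts` parameter are all hypothetical names invented for the example.

```python
from collections import deque


class RetryingArchiver:
    """Illustrative retry queue; not the pScheduler archiver implementation."""

    def __init__(self, send, max_attempts=5):
        # `send` is a hypothetical callable that returns True when the
        # archive accepted the result and False (or raises OSError) otherwise.
        self.send = send
        self.max_attempts = max_attempts
        self.pending = deque()  # items are (result, attempts_so_far)

    def submit(self, result):
        """Queue a test result for delivery to the archive."""
        self.pending.append((result, 0))

    def flush(self):
        """Try every queued result once; return how many were delivered."""
        delivered = 0
        for _ in range(len(self.pending)):
            result, attempts = self.pending.popleft()
            try:
                ok = self.send(result)
            except OSError:
                ok = False
            if ok:
                delivered += 1
            elif attempts + 1 < self.max_attempts:
                # Keep the result for a later flush.
                self.pending.append((result, attempts + 1))
            # Otherwise the result is dropped after max_attempts failures.
        return delivered
```

Calling `flush()` periodically delivers whatever the archive will accept and keeps the rest for later, while the attempt cap is what prevents the "fill up a directory forever" failure mode described above.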

That’s a pretty high-level overview and nowhere near an exhaustive list. If you want more details, I suggest you attend our webinar on April 20th, the details of which were shared with the list last week. We’ll also have complete release notes next week and more documentation published.



Fortunately, it looks like the nodes I've been deploying (NUC5CPYB - 2-core Celeron / 1.6GHz / 8GB) will probably be OK for a while.  But I've also had some VMs with 2GB RAM and local MAs which were fine with a few tests, and I'm a bit concerned about those.

First of all, both 2GB and running on a VM would be against what we recommend for 3.5 (and probably a version or two earlier than that as well). That being said, I have 2 ESnet production hosts upgraded to 4.0 with 2GB of memory that are fine with a local MA, but they only run a handful of tests and I have optimized cassandra not to eat as much memory at the cost of running a bit slower (fwiw I did this optimization when 3.5 was released, because it worked poorly otherwise). In other words, setups like ours have been skating on thin ice for a while. If you’re willing to put in some extra work to optimize things like this, or have the hosts doing a small workload, you may be fine, but you’ll have an easier time and an increased likelihood of success if you follow our recommendations. As the email said, use your best judgement on hosts like this based on what you have them doing and your willingness to mess with them.





