
perfsonar-user - Re: [perfsonar-user] a shellshocked experience

Subject: perfSONAR User Q&A and Other Discussion


  • From: Alan Whinery <>
  • To: Stefan Piperov <>,"" <>
  • Subject: Re: [perfsonar-user] a shellshocked experience
  • Date: Thu, 02 Oct 2014 05:45:37 -1000

I find the question "why is perfSONAR so much more vulnerable (...)?" surprising. I don't think that it is. Having had zero break-ins on 10 or more boxes, over 4 years, and loss-of-contact only when there were underlying hardware/configuration issues, I have to say that your incident numbers are unusually high, and that there are probably unidentified causal factors other than perfSONAR flaws.

Also, the aftermath of shellshock is not a typical scenario. The earth just shifted under our feet. Several kinds of security strategy just sprang to life, like avoiding over-reliance on a single code-base. (I never thought I'd want to install busybox on full sized servers).

-Alan

On October 2, 2014 3:49:06 AM HST, Stefan Piperov <> wrote:

Jason, indeed Brown is running central perfSONAR servers. We installed an
additional, dedicated one, on our network segment, because we wanted the
ability to test the throughput down to the very last point before it
reaches our Tier3 machines.

I guess the question that I am asking - and probably many others are - is
why is perfSONAR so much more vulnerable in its default installation than
our other servers, given that they are based basically on the same
operating system? Shouldn't the defaults - like the ones for httpd you
have currently described on the homepage of the project - be tightened up
to the max, and then have local admins relax them as needed?
Because most of us here - the large majority of these 700 users that you
quote - approach perfSONAR as users, and not as developers of any kind.
We know how to administer systems, and we can tell a problematic one when
we see one. I'll put it this way: In its current state and configuration,
my perfSONAR server requires a disproportionate amount of effort to
maintain secure.

Perhaps the project should aim at releasing a (probably scaled down?)
production version, which can be run in a typical low-maintenance mode as
we are used to seeing from other SL/OSG-based packages, and separate the new
and fancy features in a development version?

Regards,
Stefan.


On Thu, 2 Oct 2014, Jason Zurawski wrote:

Hi Stefan;

The argument cannot be boiled down to 'will it cause people to flee if it is hard'; it comes down to putting out the best quality product that we can. 'Product' here is a combination of the software and the BCP that goes with it, and when we are wrong we need to own up to it - the appliance mentality is wrong. We cannot guarantee the 'appliance' use case - it does a disservice to our product and our customers to pretend we can. We are not a 24/7/365 effort. We have about 20 core people, devoting anywhere from 1/64th to all of their time, and we are split between many timezones. Appliances require almost centralized control to deliver on their promises; it's just not something a pokey open source project can do without significant people and funding.

We can continue to be responsive to user suggestions, complaints, questions, and praise - we are a long way from the 1st release of this tool and there has been significant upward progress. As you note, it is a tool like a hammer: you use it to solve a task. There are many forms of hammers - let's go back to imagining the manual version, not the hydraulically operated kind.

One of my previous notes made reference to 'sysadmin 101'. What follows is not meant to be an elitist statement: not everyone can be a sysadmin. The world needs scientists, sysadmins, and people to set up pins in a bowling alley. If maintenance of the machine is too hard for one person to manage, finding someone else who can assist is the best course of action. Someone who knows how to be a sysadmin can set up automated monitoring and maintenance routines - this is their job after all. They don't log on to the machine 2 times a day to apply patches; that would be ludicrous. They do watch security lists, react when needed, and trust that automated methods are doing what they need to be doing. There needs to be a better range of care between 'never patch since it's an appliance' and 'patch hourly', and it is possible to strike a balance.

The proper answer, that I am going to read into from your mail below, is that you need to sit down and talk with your network security and systems administration people about reaching a better way to handle the care and feeding of this machine.

- Perhaps the answer is 'you shouldn't use it', and maybe that is best. If we look at other servers on a campus (DNS, Mail, OSG Compute nodes) they all have an administrator who needs to grease the gears from time to time. If the perfSONAR node is treated in a different manner because the word 'appliance' was associated with it, that needs to change toward a model where it's a server like the others. The number of users of perfSONAR is not as important to us as the value people see in using it along with the ability they possess to maintain it.

- Perhaps the answer will be 'this machine increases the value of CMS science - so let us help you maintain it'. Cyberinfrastructure matters to everyone, not just one overworked physicist. If there is not campus/laboratory buy-in, things are going to fail. It may be the case your local support staff (who really are helpful people - science and networking do not need to have an adversarial relationship) can assist. Maybe it means integrating the machine into a CFEngine/Puppet/etc. system. Maybe it means moving it out of your facility to the border of campus. Maybe it means reducing the number of distributed machines from many to 1 - there are lots of answers. I don't have these, your local support staff will.

The networking staff at Brown U are very reasonable, and I have worked with them (and you) on several issues over the years. If you would like someone from the perfSONAR project to help you make the case for this tool, we would be happy to do so (and that goes for anyone else that is reading this - all 700 of you). It is true that we make a crappy 'appliance'; we do, however, want to try to make a useful tool that doesn't ruin someone's Friday.

I hope this helps you and others in a similar situation;

Thanks;

-jason

On Oct 2, 2014, at 8:19 AM, Stefan Piperov <> wrote:

On Wed, 1 Oct 2014, Jason Zurawski wrote:

- As a P.S. to the previous bullet - perfSONAR is not an appliance...

Jason, are you not afraid that a statement like this will push away many users? In our collaboration (CMS) perfSONAR was recommended as a _tool_ for diagnosing network problems. A (good) tool normally does not require maintenance or babysitting.

If I am to spend any significant amount of time (like patching the system twice per day, as suggested in one of the responses), I'd rather not use perfSONAR at all.

My server has already been hacked twice, which puts me in a really bad situation in relation to the network security people.
Plus the machine was found 'frozen' a couple more times during this last year that I have used perfSONAR.
Quite disappointing.

Regards,
Stefan.


--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

