
perfsonar-user - Re: [perfsonar-user] a shellshocked experience

Subject: perfSONAR User Q&A and Other Discussion




  • From: Stefan Piperov <>
  • To: John-Paul Robinson <>
  • Cc: <>
  • Subject: Re: [perfsonar-user] a shellshocked experience
  • Date: Thu, 2 Oct 2014 12:25:48 -0400


John-Paul, your idea of disposable probe, which feeds a secure server, is actually quite good!

Keeping the configuration of the tests, as well as the collected metrics, on a secure server, and reinstalling the probe machine frequently - daily, or hourly if needed - is a way of keeping all exploits at bay.


Jason, I don't want to sound like a great critic of the project.
I'm sorry if I do, it was not my intent. I just wanted to provide feedback, which - in the context of the recent events - turned out to sound somewhat critical.

Stefan.


On Thu, 2 Oct 2014, John-Paul Robinson wrote:

Jason and others,

Thanks for the very helpful replies and links. I had missed the
-announce list but am on it now. I'm also happy about the forthcoming
change to enable yum-cron by default. The suggestion about multiple
updates per day was about when the cron job runs, not about manual
checking. We use Puppet for much of the admin here and may extend that
to perfsonar in the future.

Let me clarify that perfsonar is an invaluable resource in our effort to
assess networking and build a Science DMZ. I couldn't do without it.

Regarding instances in the orphanage, I too keep only a light overview
on my system. It works. I'm happy. I trust the platform and the
community building it. I don't expect perfection or that nothing will
fail. I accept that from time to time things will need fixing or maybe
even reinstalling. I'm ok with that.

I'd rather take a more cloud-like, scale-out perspective on the
platform itself. I want cattle, not pets. In other words, I'd rather
flush it down the toilet and get another one when it fails than worry
about maintaining the one precious instance I have now. I don't have
much time to dedicate to care and feeding. One of perfsonar's functions
is as a sensor, and I'd like to be able to just throw a failed sensor
out and install a new one.

What I don't want is to lose my telemetry data. I have been running a
regular batch of tests for several months to gather a performance
profile from various points of interest. I look at those throughput
reports regularly. They are building a narrative for what we have and
what we need.

I would like it if there were a way to secure the history of the data
collected, so that I can either carry it forward when I reinstall or view
it elsewhere. I don't want a failed sensor to threaten the life of my
data. The rebuild recommendations I've seen so far don't appear to
protect my data. That's why it was worth the hours I spent verifying my
existing platform's integrity.

It would be very helpful to have a way to preserve test history data off
the platform so that I can look at it with the same interface elsewhere
(another box or a future box).
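One low-tech way to keep test history off the box is sketched below in Python, under the assumption that the results live as plain files under a single directory (the `data_dir` argument is hypothetical; a database-backed measurement archive would need its own dump step first). It bundles the files into a tarball with a SHA-256 manifest, so the copy can be checked on another host before it is restored.

```python
import hashlib
import tarfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 64 KiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def archive_history(data_dir: Path, out_tar: Path) -> dict:
    """Bundle every regular file under data_dir into a gzipped tarball.

    out_tar should live outside data_dir. Returns a manifest mapping
    each archived relative path to its SHA-256 digest.
    """
    manifest = {}
    with tarfile.open(out_tar, "w:gz") as tar:
        for p in sorted(data_dir.rglob("*")):
            if p.is_file():
                rel = p.relative_to(data_dir).as_posix()
                manifest[rel] = sha256_of(p)
                tar.add(p, arcname=rel)
    return manifest


def verify_archive(out_tar: Path, manifest: dict) -> bool:
    """Re-hash each archived file and compare against the manifest."""
    with tarfile.open(out_tar, "r:gz") as tar:
        files = [m for m in tar.getmembers() if m.isfile()]
        for member in files:
            data = tar.extractfile(member).read()
            if hashlib.sha256(data).hexdigest() != manifest.get(member.name):
                return False
    # Also fail if the manifest lists files that never made it into the tar.
    return {m.name for m in files} == set(manifest)
```

Keeping the manifest separate from the tarball means a corrupted or tampered copy is caught before it replaces the history on a freshly reinstalled node.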

Again, I'm very happy with perfsonar. I've found it very reliable. Then
again, I may just be benefiting from others pain, so thanks to those who
have suffered.

Keep up the good work and thanks for making network performance data
gathering so much easier than it was in the past.

~jpr

On 10/01/2014 06:17 PM, Jason Zurawski wrote:
Hey John-Paul;

To echo Brian, thanks for your thoughtful note. One quick point, which some
may not be aware of even if they have been on this list for 7 years, is that
there is a low-volume (lower than the user list at least…) announcement list:

https://lists.internet2.edu/sympa/subscribe/perfsonar-announce

I can also use this opportunity to summarize some of the actions we will take
based on this, and on some other internal discussions from the past few
days:

- As Brian noted, a sensible default method for automatic updates is coming.
This will not be a panacea for security or maintenance of course - and some
of our development team has grave concerns about lulling anyone into a false
sense of security and making things far worse the next time a piece of
systems software outside of our control barfs. The bottom line is still
that each site is responsible for care, feeding, and sensible sysadmin
practices, and we feel that's a statement everyone agrees with. We do (and
will continue to) assist where we can with automated software to observe and
upgrade (IDSs, yum, etc.), and we will also augment any technical solution
with a 'sysadmin 101' guide for those that need it. This is in progress, and
we would encourage community input on it over time.

- As a P.S. to the previous bullet - perfSONAR is not an appliance, and 2014
was the year that really made everyone (community members, stakeholders,
developers) reconsider what it should be. Recently someone noted that the
word 'appliance' conjures images of a sealed box with no seams and no
serviceable parts inside. Linux, unless it has been *drastically* altered,
doesn't fit this definition. As Heartbleed, Shellshock, and other CVEs have
shown, no matter how much we work to make something 'easy', it is not
invincible, and admitting a failure is part of the road to recovery. We want
to change the perception of the orphaned pS box, as it was the orphaned box
that was pegged and owned as an easy target last Friday morning all around
the world.

- We intend to document what a 'normal' list of running toolkit processes
looks like, to prevent people from having that feeling of panic when they see
a strange perl script running (correctly, or maliciously). This is a very
obvious thing that many of us wouldn't have thought about.

- Some products (e.g. Cacti, JOWAMP) will be removed to reduce (not
eliminate) the risk footprint, and those that choose to use them will have to
make a conscious choice to download and install the tools.

We do appreciate the community's understanding, feedback, and patience this
year. Please continue to offer feedback as you all see fit, either to the
user's list or to the developers directly
().

Thanks;

-jason

On Oct 1, 2014, at 5:14 PM, John-Paul Robinson <> wrote:

To other shellshocked perfsonar users:

Our perfsonar node did not have automatic yum updates enabled and was
impacted by a shellshock-related exploit on Sept 26. This was after both
bash updates had been announced on the perfsonar-user lists, so we may have
survived had automatic updates been enabled by default.

• Lesson learned: run automatic updates.
• Recommendation: It might benefit users to have it on by default in
the perfsonar distribution. It would also be good if updates were checked
more than once a day. In our case we would have missed the update
mid-day on Sept 26 and may still have gotten exploited before the next run at
4:00 am on Sept 27. Additionally a perfsonar-announce list might be useful
for hearing stuff even when you have -user discussions turned off.
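To put numbers on the once-a-day concern, here is a small Python sketch that computes how long a host stays unpatched between a fix being published and the next scheduled update run. The 12:30 release time used below is an assumption standing in for "mid-day"; the 04:00 daily run is the one described above, and the six-hourly schedule is a hypothetical alternative.

```python
from datetime import datetime, timedelta


def next_run(release: datetime, run_hours: list[int]) -> datetime:
    """Return the first scheduled update run at or after `release`.

    `run_hours` lists the hours of the day (0-23) at which the update
    job fires, e.g. [4] for a single daily 04:00 run.
    """
    if not run_hours:
        raise ValueError("run_hours must contain at least one hour")
    day = release.replace(hour=0, minute=0, second=0, microsecond=0)
    # Today's remaining runs first, then tomorrow's; tomorrow's earliest
    # run is always later than `release`, so this loop always returns.
    for offset in (0, 1):
        for h in sorted(run_hours):
            candidate = day + timedelta(days=offset, hours=h)
            if candidate >= release:
                return candidate
    raise AssertionError("unreachable")


def exposure(release: datetime, run_hours: list[int]) -> timedelta:
    """Time a host stays unpatched after a fix is published."""
    return next_run(release, run_hours) - release


release = datetime(2014, 9, 26, 12, 30)  # assumed "mid-day" release
print(exposure(release, [4]))              # single daily 04:00 run
print(exposure(release, [0, 6, 12, 18]))   # hypothetical 6-hourly schedule
```

With a single 04:00 run, a fix published at 12:30 on Sept 26 is not applied for 15.5 hours; running every six hours cuts that window to 5.5 hours.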

After receiving a local exploit report I went to check on the machine and
immediately noticed Apache had restarted. Alarmingly, a root-owned process
called fakewww had also started at the same time, and oddly so had one named
web100srv. Both of these processes had ports and log files open. Yikes,
they got root! Killed them. But then they came back after I restarted
httpd, even after `rpm -V httpd` showed no corruption. Oh no! They've really
gotten a hold of the system.
• Lesson learned: not all unfamiliar processes are bad. I later
figured out that these are part of the ndt-server rpm and normal parts of
perfsonar.
• Recommendation: rename fakewww to something meaningful and less
scary to the uninitiated. ndtwwwhelper might be just as good.
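A documented list of normal processes is most useful when it is easy to check against. A minimal Python sketch, assuming Linux /proc: read each process's command name from /proc/<pid>/stat and report anything missing from a baseline. The `EXPECTED` set is purely illustrative (fakewww and web100srv come from this report; the others are placeholder system daemons); a real baseline would be captured from a freshly installed, known-good node.

```python
from pathlib import Path

# Illustrative baseline only. fakewww and web100srv are the ndt-server
# daemons mentioned above; the rest are placeholder system daemons.
EXPECTED = {"init", "sshd", "httpd", "crond", "abrtd", "fakewww", "web100srv"}


def running_process_names(proc: Path = Path("/proc")) -> dict[int, str]:
    """Map PID -> command name parsed from /proc/<pid>/stat (Linux)."""
    names = {}
    for entry in proc.iterdir():
        if not entry.name.isdigit():
            continue  # skip /proc/self, /proc/meminfo, etc.
        try:
            stat = (entry / "stat").read_text()
        except OSError:
            continue  # process exited or is not readable
        # The name sits between the first '(' and the last ')'; using
        # rindex copes with names that themselves contain ')'.
        names[int(entry.name)] = stat[stat.index("(") + 1 : stat.rindex(")")]
    return names


def unexpected(names: dict[int, str], expected: set[str]) -> dict[int, str]:
    """Return only the processes whose name is not in the baseline."""
    return {pid: n for pid, n in names.items() if n not in expected}


if __name__ == "__main__":
    for pid, name in sorted(unexpected(running_process_names(), EXPECTED).items()):
        print(pid, name)
```

A name-only check is easy to fool, since a process can rename itself (as the fake '/usr/sbin/sshd -i' described later in this report did), so this works best as a first filter rather than a verdict.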

Because of the potential for root exploits I ran rpm verifications of core
packages (e.g. `rpm -V procps`). Some came back clean; some reported prelink
inconsistencies. This caused some concern at first, but as I narrowed down
the exploit it became clear the problems were only due to a prelink bug.

https://access.redhat.com/solutions/25215
https://bugzilla.redhat.com/show_bug.cgi?id=204448

• Lesson learned: other bugs can make things seem worse than they are.
• Recommendation: look up unfamiliar errors before you panic.
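Sorting `rpm -V` output so that prelink noise does not drown out real changes can be scripted. The Python sketch below relies on rpm's verify output format: a result string in which the third character is the file digest check ('.' means passed, '?' means the test could not be performed), an optional attribute marker such as `c` for config files, then the path. Treating lines that begin with `prelink:` as prelink complaints is an assumption about the message wording, not something shown in this report.

```python
import re

# rpm -V result line: 8-9 flag characters (S M 5 D L U G T P, '.' = ok,
# '?' = could not test), optional attribute marker, then the file path.
LINE = re.compile(
    r"^(?P<flags>[SM5DLUGTP.?]{8,9})\s+(?P<attr>[cdglr]?)\s*(?P<path>/.*)$"
)


def classify(output: str) -> dict[str, list[str]]:
    """Bucket `rpm -V` output lines by what they most likely mean."""
    result = {"prelink": [], "digest_changed": [], "unverifiable": [],
              "missing": [], "other": []}
    for line in output.splitlines():
        line = line.rstrip()
        if not line:
            continue
        if line.startswith("prelink:"):
            # Assumed shape: "prelink: /path: <complaint>"
            result["prelink"].append(line.split(":")[1].strip())
            continue
        if line.startswith("missing"):
            result["missing"].append(line.split()[-1])
            continue
        m = LINE.match(line)
        if not m:
            result["other"].append(line)
        elif m["flags"][2] == "5":
            result["digest_changed"].append(m["path"])  # content differs
        elif m["flags"][2] == "?":
            result["unverifiable"].append(m["path"])    # digest not checked
        else:
            result["other"].append(m["path"])          # e.g. mtime only
    return result
```

Anything in `digest_changed` for a core binary deserves a closer look; files that appear only under `prelink` and `unverifiable` are more likely the kind of noise described above.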

Looking further into the state of the system I noticed a '/usr/sbin/sshd -i'
process running as apache and an httpd process whose start time didn't fit
with the others. lsof showed these were both Perl scripts running out of
/var/tmp/ with established tcp connections off site. Very suspicious; I
killed them.

• Lesson learned: some processes are really bad. abrtd logged
the first entry into the system via apache and showed that the
command vector was bash. This is a very helpful log for establishing
the timeline.
• Recommendation: keep your system up to date.
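The red flags in this case (Perl code running out of /var/tmp, and a process presenting itself as '/usr/sbin/sshd -i' while owned by apache) can be encoded as simple checks. Below is a Python sketch over already-collected process records; the `Proc` type is hypothetical, and its fields would be filled from /proc/<pid>/cmdline, exe and cwd, much as lsof reports them. The comparison of argv[0] against the real binary is an extra heuristic not mentioned in the report, and it can misfire on symlinked binaries.

```python
from dataclasses import dataclass

# World-writable scratch directories; code executing from them is suspect.
TEMP_DIRS = ("/tmp/", "/var/tmp/", "/dev/shm/")


@dataclass
class Proc:
    """Hypothetical process record, e.g. built from /proc/<pid>/*."""
    pid: int
    user: str     # owning user name
    cmdline: str  # command line as the process reports it (can be faked)
    exe: str      # binary actually executing (readlink /proc/<pid>/exe)
    cwd: str      # working directory (readlink /proc/<pid>/cwd)


def red_flags(p: Proc, web_user: str = "apache") -> list[str]:
    """Return human-readable reasons a process looks suspicious."""
    flags = []
    argv0 = p.cmdline.split()[0] if p.cmdline.strip() else ""
    cwd = p.cwd.rstrip("/") + "/"  # so "/var/tmp" matches "/var/tmp/"
    if any(cwd.startswith(d) or d in p.cmdline for d in TEMP_DIRS):
        flags.append("runs from a temp directory")
    if p.user == web_user and argv0.endswith("sshd"):
        flags.append(f"claims to be sshd but runs as {web_user}")
    # A script can rewrite its displayed name, but exe still points at
    # the interpreter that is really running it.
    if argv0.startswith("/") and p.exe and argv0 != p.exe:
        flags.append(f"name {argv0} does not match binary {p.exe}")
    return flags
```

For an illustrative record like `Proc(4242, "apache", "/usr/sbin/sshd -i", "/usr/bin/perl", "/var/tmp")` all three checks fire, while a normal `/usr/sbin/httpd` worker running from `/` trips none.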

In the end, I traced the exploit down to the two suspicious perl processes
(/var/tmp/x). They were executing an ircbot as apache. There was no root
access to the system and simply clearing out the installed bots from /var/tmp
was a sufficient remedy. There was an attempted install of code to exploit
CVE-2013-2094, but thankfully that's a 3.8 kernel bug and perfsonar is still
on 2.6.

I hope this experience can be useful to others and that the recommendations
can be incorporated into future releases as warranted.

~jpr




