
perfsonar-user - [perfsonar-user] ShellShocked perfSONAR: A Case Study in Integrity Resilience

Subject: perfSONAR User Q&A and Other Discussion

  • From: "Nickless, Bill" <>
  • To: "" <>
  • Cc: "Henderson, Dale L" <>, "Lancaster, Mary" <>, "McKinnon, A David" <>, "Tollbom, S Cullen" <>, "Younkin, Chance R" <>
  • Subject: [perfsonar-user] ShellShocked perfSONAR: A Case Study in Integrity Resilience
  • Date: Wed, 8 Oct 2014 18:10:43 +0000
  • Accept-language: en-US

On 1 October 2014 at 4:18 PM Pacific Time, Jason Zurawski wrote:

| perfSONAR is not an appliance, and 2014 was the year that really made
| everyone (community members, stakeholders, developers) reconsider what
| it should be.

The PNNL Asymmetric Resilient Cybersecurity initiative [1] is grappling
with how to think more clearly about problems like this. My colleagues
Dale Henderson, Mary Lancaster, David McKinnon, Cullen Tollbom, Chance
Younkin and I are working on a subsidiary project to develop resilience
metrics. This situation is a fascinating case study. They get credit
for the following good ideas but any mistakes are my responsibility.

One of our current working definitions of resiliency is "the ability of
a system to fulfill its functions in the presence of impediments". Or
more colloquially, we ask the question "how many things can go wrong
and it's still OK?" The practice of resiliency is well developed when
it comes to availability; we know how to reason about investing in
redundant power supplies, RAID controllers, dynamic routing protocols,
and geographic diversity. But in his paper Resilience is More than
Availability [2], Matt Bishop argues that we should also consider the
resilience of systems in the Integrity and Confidentiality dimensions.

PERFSONAR FUNCTIONS

Before we talk about improving perfSONAR resiliency we have to clearly
articulate its functions. perfSONAR provides several functions today:

1. On-demand throughput and latency tests (e.g. bwctl, owamp)
2. End-user throughput diagnostic tests (e.g. Web100-enabled NDT/NPAD)
3. Discovery/directory services (Lookup Service)
4. Web-based configuration
5. Platform for useful tools (e.g. CACTI)
6. Scheduled throughput tests (BUOY)

There's also an implicit function for any Research & Education / public
Internet facing host:

7. First, do no harm (resist takeover and exploitation
by unauthorized parties)

AN OBSERVED IMPEDIMENT TO INTEGRITY

The Shellshock vulnerability represents an impediment to integrity, just
as a failed power supply is an impediment to availability. In retrospect
we can see that this impediment overcame all the available integrity
resiliency: perfSONAR hosts were incorporated into one or more botnets
and so failed to meet function #7 (first, do no harm).

On 1 October 2014 at 2:14 PM Pacific Time, John-Paul Robinson wrote:

| The abrtd logged the event of the first entry into the system via apache
| and showed the command vector was bash.

Apache is included in perfSONAR to support function #4 (Web-based
configuration). John-Paul goes on:

| In the end, I traced the exploit down to the two suspicious perl
| processes (/var/tmp/x). They were executing an ircbot as apache.
| There was no root access to the system and simply clearing out the
| installed bots from /var/tmp was a sufficient remedy.

IMPROVING INTEGRITY RESILIENCE

We often install multiple power supplies to improve resilience against
availability impediments. In the same vein, what could be done to improve
resilience against integrity impediments in perfSONAR?

Integrity resilience comes in two forms. The first is a class of "even if"
technical features; they provide additional depth of resilience that
passively ensures function "even if" an impediment is encountered.

The second class consists of detection/recovery features; they provide an
indication that an impediment exists and a human needs to take some action
to restore the full intended depth of resilience. (Again by analogy,
redundant power supplies often include a light or buzzer to indicate
a non-catastrophic failure, so that a human can [proactively] restore
the full depth of resilience before failure and/or [reactively] easily
figure out what broke to restore function after failure.)

Here are a few ideas for additional passive/"even if" integrity resilience
features:

A. Reconsider allowing all outbound TCP/IP connections. Once in place
the ircbot was able to successfully perform its function because all
outbound TCP/IP sockets were permitted by default. Would restrictions
on outbound TCP/IP sockets have enabled perfSONAR installations to
fulfil their functions with integrity, even with an exploitable
apache/bash and the ircbot software running?

B. Consider separating the perfSONAR functions. [3] already describes
how to set up a "Level 1" installation that only fulfils function #1.
I'm guessing that "Level 1" perfSONAR installations could fulfil
their (more limited) functions even running vulnerable bash.

More formally: given the monolithic nature of perfSONAR, the integrity
resilience of each perfSONAR function is no better than the integrity
resilience of the least resilient function, and may be less than that
due to unintended interactions between the functions.

C. Consider using a hypervisor to isolate functions. One might not
have enough money, space, power or cooling to run separate physical
perfSONAR hosts for each function. Just as Amazon Web Services uses
Xen [4] to keep customer VMs isolated, perhaps a future generation
of perfSONAR could be a "cloud in a box"?

Even if a perfSONAR functional virtual machine kernel were compromised,
use of a Xen Driver Domain [5] could provide additional integrity
resilience:

"Because of the nature of network protocols and routing, there is a
higher risk of an exploitable bug existing somewhere in the network
path (host driver, bridging, filtering, &c). Putting this in a
separate, unprivileged domain limits the value of attacking the
network stack: even if they succeed, they have no more access than
a normal unprivileged VM."

D. Consider preferring external MySQL services for BUOY. Amazon Web
Services provides MySQL [6] in their Free Usage Tier [7]. Alternatively,
many institutions running perfSONAR might already run MySQL for
other purposes.

In a "cloud in a box" architecture, one could leverage existing MySQL
virtual appliances (e.g. [8]). Even if the MySQL VM were compromised
by the addition of unwanted software like ircbot, hypervisor network
protections could block outbound traffic. This would add depth to
the passive integrity resilience of the overall system.
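
As a rough illustration of idea A, the default-deny decision behind an
outbound restriction can be sketched in Python. This only models the policy
decision; real enforcement would live in the host firewall or the hypervisor.
The EgressRule type, the ALLOWED list, and every address and port below are
hypothetical placeholders (documentation address ranges), not a recommended
perfSONAR policy. A real allow-list would also have to admit the test
traffic the host exists to generate.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class EgressRule:
    """One permitted outbound destination (hypothetical policy entry)."""
    network: str  # CIDR block the host may connect to
    port: int     # destination TCP port


# Hypothetical allow-list; addresses come from the documentation ranges.
ALLOWED = [
    EgressRule("192.0.2.0/24", 443),      # assumed package/update mirror
    EgressRule("198.51.100.10/32", 514),  # assumed site syslog collector
]


def egress_allowed(dest_ip: str, dest_port: int, rules=ALLOWED) -> bool:
    """Default-deny: permit a connection only if some rule matches it."""
    addr = ip_address(dest_ip)
    return any(
        addr in ip_network(rule.network) and dest_port == rule.port
        for rule in rules
    )


# A bot calling out to an IRC server on port 6667 matches no rule.
print(egress_allowed("203.0.113.7", 6667))  # False
print(egress_allowed("192.0.2.15", 443))    # True
```

Under such a policy the dropped bot could still run as apache, but it could
not reach its controller, which is the "even if" property idea A asks about.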

Part of the allure of an "appliance" is that it (theoretically) doesn't
require ongoing maintenance. But intelligent adversaries (or naive users of
ever-more sophisticated intrusion tools) don't stop throwing up impediments
if a system overcomes the first one (or several). To achieve function #7
(first do no harm), our perfSONAR instances should be integrated into
the cyber defense process at each of our sites.

What additional technical features could support detection and recovery
of integrity resilience?

E. Include explicit configuration for logging events to Security
Information and Event Management (SIEM) systems. Many institutional
cyber security organizations run SIEMs already, and can accept (e.g.)
SYSLOG records. Integrators
of the various perfSONAR services might ensure that logging is being
fed to the platform's SYSLOG collector. perfSONAR configuration
mechanisms and training could explicitly support integration with
popular SIEMs like Arcsight [9] and Splunk [10].

F. Tripwire. The Bash exploit gained apache user level access. What
might have been compromised besides the IRC botnet dropper in /var/tmp?
A custom tuned Tripwire [11] configuration might have identified
the unusual files in /var/tmp, leveraging the fact that perfSONAR hosts
generally don't host interactive users that put unpredictable files
in temporary directories.
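
The detection idea in F can be illustrated with a short Python sketch. This
is not Tripwire and does not use its configuration format; it is a
simplified stand-in showing the underlying technique of comparing a
directory against a known-good baseline of file hashes. The function names
snapshot and compare are hypothetical.

```python
import hashlib
import os


def snapshot(root: str) -> dict:
    """Map each regular file under root to the SHA-256 of its contents."""
    digests = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as fh:
                    digest = hashlib.sha256(fh.read()).hexdigest()
            except OSError:
                continue  # unreadable file; a real tool would report this
            digests[os.path.relpath(path, root)] = digest
    return digests


def compare(baseline: dict, current: dict) -> dict:
    """Report files added, removed, or changed since the baseline."""
    return {
        "added": sorted(set(current) - set(baseline)),
        "removed": sorted(set(baseline) - set(current)),
        "changed": sorted(
            p for p in baseline.keys() & current.keys()
            if baseline[p] != current[p]
        ),
    }
```

One would take snapshot("/var/tmp") at a known-good time, store the result
somewhere the apache user cannot write, and periodically run compare against
a fresh snapshot. On a host with no interactive users, any "added" entry is
worth a look, and the report could be forwarded to the site SIEM through
SYSLOG as suggested in E.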

Another way of reducing exposure to integrity impediments is to eliminate
exposed system components altogether:

G. Consider alternative configuration approaches. [12] is a good
start, but its scope is limited to application configuration.

Large Linux-based cluster supercomputers don't expose Apache for
system administrators to configure each node. Instead, they use
tools like the Open Source Cluster Application Resources (OSCAR [13])
to "install[] and configure[] all required software for the selected
packages according to user input. Then it creates customized disk
images which are used to provision the computational nodes in the
cluster with all the client software and administrative tools needed
for immediate use."

SUSE's Yet another Setup Tool (YaST [14]) includes AutoYaST [15];
its control file is in XML and it includes a configuration interface
called the Configuration Management System (CMS).

See also Trey Dockendorf's survey of host configuration tools [16].

H. Eliminate CACTI support. Yes, I am on record arguing for its
inclusion [17]. But I've convinced myself by the above argument
(i.e. given the monolithic nature of perfSONAR, the integrity
resilience of each perfSONAR function is at best equivalent to the
integrity resilience of the least resilient function) that it's not
worth the risk.
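
The weakest-link argument can be made concrete with a toy calculation in
Python. The scores below are invented placeholders (higher meaning "more
things can go wrong and it's still OK"), not measurements of any perfSONAR
component, and the function names are hypothetical. The only point is that
on a monolithic host the bound for every function is the minimum across all
functions, so removing the weakest function raises the bound for the rest.

```python
# Invented, purely illustrative scores; NOT measured values.
SCORES = {
    "on_demand_tests": 3,
    "lookup_service": 3,
    "scheduled_tests": 3,
    "web_configuration": 2,
    "cacti": 1,
}


def integrity_upper_bound(scores: dict) -> int:
    """On a monolithic host, no function is more resilient than the weakest.

    This is an upper bound: unintended interactions between functions
    could make the real figure lower still.
    """
    return min(scores.values())


def without(scores: dict, name: str) -> dict:
    """Return a copy of the scores with one function removed from the host."""
    return {k: v for k, v in scores.items() if k != name}


print(integrity_upper_bound(SCORES))                    # 1
print(integrity_upper_bound(without(SCORES, "cacti")))  # 2
```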

Thank you for your consideration, and best regards,

Bill Nickless
+1 509 713 2455

[1] http://cybersecurity.pnnl.gov/arc.stm
[2] http://nob.cs.ucdavis.edu/bishop/papers/2011-nspw/resilience.pdf
[3] https://fasterdata.es.net/performance-testing/perfsonar/ps-howto/level-1/
[4] http://www.xenproject.org/
[5] http://wiki.xen.org/wiki/Driver_Domain
[6] http://aws.amazon.com/rds/mysql/
[7] http://aws.amazon.com/rds/free/
[8] http://www.turnkeylinux.org/mysql
[9] http://www8.hp.com/us/en/software-solutions/siem-arcsight/
[10] http://www.splunk.com/
[11] http://www.tripwire.org/
[12] http://docs.perfsonar.net/multi_server_config.html
[13] http://svn.oscar.openclustergroup.org/trac/oscar
[14] http://en.wikipedia.org/wiki/YaST
[15] http://doc.opensuse.org/projects/autoyast/introduction.html
[16] https://lists.internet2.edu/sympa/arc/perfsonar-user/2014-10/msg00053.html
[17] https://lists.internet2.edu/sympa/arc/performance-node-users/2014-02/msg00034.html

