
perfsonar-user - [perfsonar-user] Notes from trying perfsonar-toolkit as an lxd container

Subject: perfSONAR User Q&A and Other Discussion

  • From: Brian Candler <>
  • To: "" <>
  • Subject: [perfsonar-user] Notes from trying perfsonar-toolkit as an lxd container
  • Date: Sun, 7 Aug 2016 11:14:04 +0100

I decided to try installing perfsonar-toolkit as an lxd container: i.e.

    lxc launch images:centos/6/amd64 <CONTAINER_NAME>

and then install the perfsonar-toolkit according to the instructions at http://docs.perfsonar.net/install_centos.html

I realise this is an unsupported configuration, but I was doing this as an experiment, ultimately to see if I could run two independent toolkit instances (internal and external) on a low-power box, without the overhead of full virtualisation.

I more-or-less made it work, but with several provisos which I thought I'd document here for reference.

(1) TCP tuning had to be done globally at the host level, not inside the container. That's fair enough, and expected from this approach.
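As a rough sketch of point (1), the following shell snippet reports which TCP-related sysctls are visible from wherever it is run. The four key names are assumptions borrowed from common TCP tuning guides, not from the original post; substitute whichever keys your tuning actually sets. Inside a container, keys that are not namespaced may be missing or read-only, which is consistent with having to set them on the host.

```shell
#!/bin/sh
# Convert a dotted sysctl key into its /proc/sys path.
sysctl_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr . /)"
}

# Print the value of a key, or note that it is not visible here.
show_key() {
    path=$(sysctl_path "$1")
    if [ -r "$path" ]; then
        printf '%s = %s\n' "$1" "$(cat "$path")"
    else
        printf '%s is not visible here (set it on the host)\n' "$1"
    fi
}

# Assumed example keys; not taken from the original post.
for key in net.core.rmem_max net.core.wmem_max net.ipv4.tcp_rmem net.ipv4.tcp_wmem; do
    show_key "$key"
done
```

Running it once on the host and once inside the container gives a quick comparison of what each side can see.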

(2) The GUI said that ntp was not synchronised (even though it was - done in the outer host). I'm guessing it's looking for an instance of ntpd to query inside the container, and not finding it.

That could probably be worked around.

(3) However, there was a more serious problem. In the GUI, under the section "Test Results", it always displayed "Error loading test listing: Internal Server Error"

Looking in the Chrome developer console I got a bit more detail:

Failed to load resource: the server responded with a status of 500 (Internal Server Error)

https://x.x.x.x/perfsonar-graphs/graphData.cgi?action=test_list&url=http%3A%2F%2Flocalhost%2Fesmond%2Fperfsonar%2Farchive%2F

Looking in the Apache error logs:

[Fri Aug 05 11:37:20 2016] [error] path= ['/usr/lib/esmond/esmond', '/usr/lib/esmond/lib/python2.7', '/usr/lib/esmond/src/dlnetsnmp/lib', '/usr/lib/esmond', '/usr/lib/esmond/esmond_client', '/usr/lib/esmond/lib64/python27.zip', '/usr/lib/esmond/lib64/python2.7', '/usr/lib/esmond/lib64/python2.7/plat-linux2', '/usr/lib/esmond/lib64/python2.7/lib-tk', '/usr/lib/esmond/lib64/python2.7/lib-old', '/usr/lib/esmond/lib64/python2.7/lib-dynload', '/opt/rh/python27/root/usr/lib64/python2.7', '/opt/rh/python27/root/usr/lib/python2.7', '/usr/lib/esmond/lib/python2.7/site-packages']
[Fri Aug 05 11:37:20 2016] [error] path= ['/usr/lib/esmond/esmond', '/usr/lib/esmond/lib/python2.7', '/usr/lib/esmond/src/dlnetsnmp/lib', '/usr/lib/esmond', '/usr/lib/esmond/esmond_client', '/usr/lib/esmond/lib64/python27.zip', '/usr/lib/esmond/lib64/python2.7', '/usr/lib/esmond/lib64/python2.7/plat-linux2', '/usr/lib/esmond/lib64/python2.7/lib-tk', '/usr/lib/esmond/lib64/python2.7/lib-old', '/usr/lib/esmond/lib64/python2.7/lib-dynload', '/opt/rh/python27/root/usr/lib64/python2.7', '/opt/rh/python27/root/usr/lib/python2.7', '/usr/lib/esmond/lib/python2.7/site-packages']
[Fri Aug 05 11:37:22 2016] [error]
[Fri Aug 05 11:37:22 2016] [error] Unable to connect - presuming stand-alone testing mode...
[Fri Aug 05 11:37:22 2016] [error] [client 127.0.0.1] mod_wsgi (pid=2510): Exception occurred processing WSGI script '/usr/lib/esmond/esmond/wsgi.py'.
[Fri Aug 05 11:37:22 2016] [error] [client 127.0.0.1] Traceback (most recent call last):
[Fri Aug 05 11:37:22 2016] [error] [client 127.0.0.1]   File "/usr/lib/esmond/esmond/wsgi.py", line 28, in application
[Fri Aug 05 11:37:22 2016] [error] [client 127.0.0.1]     return get_wsgi_application()(environ, start_response)
[Fri Aug 05 11:37:22 2016] [error] [client 127.0.0.1]   File "/usr/lib/esmond/lib/python2.7/site-packages/django/core/handlers/wsgi.py", line 189, in __call__
[Fri Aug 05 11:37:22 2016] [error] [client 127.0.0.1]     response = self.get_response(request)
[Fri Aug 05 11:37:22 2016] [error] [client 127.0.0.1]   File "/usr/lib/esmond/lib/python2.7/site-packages/django/core/handlers/base.py", line 218, in get_response
[Fri Aug 05 11:37:22 2016] [error] [client 127.0.0.1]     response = self.handle_uncaught_exception(request, resolver, sys.exc_info())
[Fri Aug 05 11:37:22 2016] [error] [client 127.0.0.1]   File "/usr/lib/esmond/lib/python2.7/site-packages/django/core/handlers/base.py", line 264, in handle_uncaught_exception
[Fri Aug 05 11:37:22 2016] [error] [client 127.0.0.1]     if resolver.urlconf_module is None:
[Fri Aug 05 11:37:22 2016] [error] [client 127.0.0.1]   File "/usr/lib/esmond/lib/python2.7/site-packages/django/core/urlresolvers.py", line 395, in urlconf_module
[Fri Aug 05 11:37:22 2016] [error] [client 127.0.0.1]     self._urlconf_module = import_module(self.urlconf_name)
[Fri Aug 05 11:37:22 2016] [error] [client 127.0.0.1]   File "/opt/rh/python27/root/usr/lib64/python2.7/importlib/__init__.py", line 37, in import_module
[Fri Aug 05 11:37:22 2016] [error] [client 127.0.0.1]     __import__(name)
[Fri Aug 05 11:37:22 2016] [error] [client 127.0.0.1]   File "/usr/lib/esmond/esmond/urls.py", line 27, in <module>
[Fri Aug 05 11:37:22 2016] [error] [client 127.0.0.1]     from esmond.api.perfsonar.api_v2 import (
[Fri Aug 05 11:37:22 2016] [error] [client 127.0.0.1]   File "/usr/lib/esmond/esmond/api/perfsonar/api_v2.py", line 65, in <module>
[Fri Aug 05 11:37:22 2016] [error] [client 127.0.0.1]     raise ConnectionException(error_msg)
[Fri Aug 05 11:37:22 2016] [error] [client 127.0.0.1] ConnectionException: 'Unable to connect to cassandra. Please verify cassandra is running.'

There is just one line in the cassandra log:

# cat /var/log/cassandra/cassandra.log
could not open session

Restarting didn't help:

# service cassandra status
cassandra is stopped
# service cassandra start
Starting Cassandra: OK
# ps auxwww | grep -i cassandra
root      8470  0.0  0.0   6500   548 ?        S+   05:33   0:00 grep -i cassandra
# tail /var/log/cassandra/cassandra.log
could not open session


Judging by https://github.com/docker/docker/issues/7056 and https://github.com/docker/docker/issues/7040, this is some sort of capability problem from being within a restricted cgroup. However, unlike issue 7056, I was able to "su" to a different user without a problem.

After "yum install strace" I was able to run "strace -f service cassandra start". It didn't pinpoint the cause, but it suggests that PAM is unable to set up the session the way it wants: note the two setrlimit calls that fail with EPERM, and the "could not open session" message written at the end.

...

[pid  8658] open("/etc/security/limits.d/90-nproc.conf", O_RDONLY) = 3
[pid  8658] fstat(3, {st_mode=S_IFREG|0644, st_size=191, ...}) = 0
[pid  8658] mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8217119000
[pid  8658] read(3, "# Default limit for number of us"..., 512) = 191
[pid  8658] read(3, "", 512)            = 0
[pid  8658] close(3)                    = 0
[pid  8658] munmap(0x7f8217119000, 4096) = 0
[pid  8658] open("/etc/security/limits.d/cassandra.conf", O_RDONLY) = 3
[pid  8658] fstat(3, {st_mode=S_IFREG|0755, st_size=105, ...}) = 0
[pid  8658] mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8217119000
[pid  8658] read(3, "cassandra - memlock unlimited\nca"..., 512) = 105
[pid  8658] read(3, "", 512)            = 0
[pid  8658] close(3)                    = 0
[pid  8658] munmap(0x7f8217119000, 4096) = 0
[pid  8658] setrlimit(RLIMIT_NPROC, {rlim_cur=32*1024, rlim_max=32*1024}) = 0
[pid  8658] setrlimit(RLIMIT_NOFILE, {rlim_cur=100000, rlim_max=100000}) = -1 EPERM (Operation not permitted)
[pid  8658] open("/etc/localtime", O_RDONLY) = 3
[pid  8658] fstat(3, {st_mode=S_IFREG|0644, st_size=3519, ...}) = 0
[pid  8658] fstat(3, {st_mode=S_IFREG|0644, st_size=3519, ...}) = 0
[pid  8658] mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8217119000
[pid  8658] read(3, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\4\0\0\0\4\0\0\0\0"..., 3584) = 3519
[pid  8658] lseek(3, -2252, SEEK_CUR)   = 1267
[pid  8658] read(3, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\5\0\0\0\5\0\0\0\0"..., 3584) = 2252
[pid  8658] close(3)                    = 0
[pid  8658] munmap(0x7f8217119000, 4096) = 0
[pid  8658] socket(PF_LOCAL, SOCK_DGRAM|SOCK_CLOEXEC, 0) = 3
[pid  8658] connect(3, {sa_family=AF_LOCAL, sun_path="/dev/log"}, 110) = 0
[pid  8658] sendto(3, "<83>Aug  7 05:39:35 su: pam_limi"..., 105, MSG_NOSIGNAL, NULL, 0) = 105
[pid  8658] setrlimit(RLIMIT_MEMLOCK, {rlim_cur=RLIM64_INFINITY, rlim_max=RLIM64_INFINITY}) = -1 EPERM (Operation not permitted)
[pid  8658] sendto(3, "<83>Aug  7 05:39:35 su: pam_limi"..., 106, MSG_NOSIGNAL, NULL, 0) = 106
[pid  8658] setrlimit(RLIMIT_AS, {rlim_cur=RLIM64_INFINITY, rlim_max=RLIM64_INFINITY}) = 0
[pid  8658] setpriority(PRIO_PROCESS, 0, 0) = 0
[pid  8658] getuid()                    = 0
[pid  8658] getuid()                    = 0
[pid  8658] access("/var/run/utmpx", F_OK) = -1 ENOENT (No such file or directory)
[pid  8658] open("/var/run/utmp", O_RDONLY|O_CLOEXEC) = 4
[pid  8658] lseek(4, 0, SEEK_SET)       = 0
[pid  8658] alarm(0)                    = 0
[pid  8658] rt_sigaction(SIGALRM, {0x7f821687a890, [], SA_RESTORER, 0x7f8216788660}, {SIG_DFL, [], 0}, 8) = 0
[pid  8658] alarm(10)                   = 0
[pid  8658] fcntl(4, F_SETLKW, {type=F_RDLCK, whence=SEEK_SET, start=0, len=0}) = 0
[pid  8658] read(4, "\6\0\0\0\272!\0\0tty[1-6]\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 384) = 384
[pid  8658] read(4, "\2\0\0\0\0\0\0\0~\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 384) = 384
[pid  8658] read(4, "\1\0\0\00033\0\0~\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 384) = 384
[pid  8658] read(4, "\10\0\0\0<\0\0\0console\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 384) = 384
[pid  8658] read(4, "", 384)            = 0
[pid  8658] fcntl(4, F_SETLKW, {type=F_UNLCK, whence=SEEK_SET, start=0, len=0}) = 0
[pid  8658] alarm(0)                    = 10
[pid  8658] rt_sigaction(SIGALRM, {SIG_DFL, [], SA_RESTORER, 0x7f8216788660}, NULL, 8) = 0
[pid  8658] close(4)                    = 0
[pid  8658] getuid()                    = 0
[pid  8658] sendto(3, "<86>Aug  7 05:39:35 su: pam_unix"..., 90, MSG_NOSIGNAL, NULL, 0) = 90
[pid  8658] access("/usr/X11R6/bin/xauth", X_OK) = -1 ENOENT (No such file or directory)
[pid  8658] access("/usr/bin/xauth", X_OK) = -1 ENOENT (No such file or directory)
[pid  8658] access("/usr/bin/X11/xauth", X_OK) = -1 ENOENT (No such file or directory)
[pid  8658] socket(PF_NETLINK, SOCK_RAW, 9) = 4
[pid  8658] fcntl(4, F_SETFD, FD_CLOEXEC) = 0
[pid  8658] sendto(4, "p\0\0\0Q\4\5\0\3\0\0\0\0\0\0\0op=PAM:session_o"..., 112, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 112
[pid  8658] poll([{fd=4, events=POLLIN}], 1, 500) = 1 ([{fd=4, revents=POLLIN}])
[pid  8658] recvfrom(4, "\204\0\0\0\2\0\0\0\3\0\0\0\322!\0\0\221\377\377\377p\0\0\0Q\4\5\0\3\0\0\0"..., 8988, MSG_PEEK|MSG_DONTWAIT, {sa_family=AF_NETLINK, pid=0, groups=00000000}, [12]) = 132
[pid  8658] recvfrom(4, "\204\0\0\0\2\0\0\0\3\0\0\0\322!\0\0\221\377\377\377p\0\0\0Q\4\5\0\3\0\0\0"..., 8988, MSG_DONTWAIT, {sa_family=AF_NETLINK, pid=0, groups=00000000}, [12]) = 132
[pid  8658] close(4)                    = 0
[pid  8658] write(2, "could not open session\n", 23) = 23

...
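The two EPERM results in the trace are setrlimit calls asking for RLIMIT_NOFILE=100000 and RLIMIT_MEMLOCK=unlimited. Per setrlimit(2), raising a hard limit above its current value requires CAP_SYS_RESOURCE, so one way to check the same condition without strace is to compare the current hard limits against those requested values. This is a minimal sketch only: the function names are made up for illustration, and it does not establish that these limits are what makes the PAM session fail.

```shell
#!/bin/sh
# Succeeds when the requested limit is higher than the current hard limit,
# i.e. when setrlimit() would need CAP_SYS_RESOURCE to grant it.
needs_raise() {
    # $1 = current hard limit, $2 = requested limit ("unlimited" or a number)
    current=$1
    requested=$2
    [ "$current" = "unlimited" ] && return 1
    [ "$requested" = "unlimited" ] && return 0
    [ "$requested" -gt "$current" ]
}

report() {
    # $1 = label, $2 = current hard limit, $3 = requested limit
    if needs_raise "$2" "$3"; then
        echo "$1: hard limit $2 is below requested $3 (raising needs privilege)"
    else
        echo "$1: hard limit $2 already covers requested $3"
    fi
}

# Requested values copied from the setrlimit calls in the trace above.
report nofile  "$(ulimit -H -n)" 100000
report memlock "$(ulimit -H -l)" unlimited
```

Run as root inside the container, a "raising needs privilege" line for either limit matches the EPERM seen in the trace.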

Since I don't see a way to set individual capabilities in lxd containers, I did:

    lxc config set <CONTAINER_NAME> security.privileged true
    lxc restart <CONTAINER_NAME>

After this it appears to be happy again.
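To confirm from inside the container that the change took effect, one option is to look at the user-namespace mapping: in a privileged container, uid 0 inside is uid 0 on the host, so the first line of /proc/self/uid_map starts with "0 0". The snippet below is a sketch under that assumption; the function name is hypothetical, and on the host itself the mapping is also the identity, so it cannot tell a privileged container from the host.

```shell
#!/bin/sh
# Succeeds when the first uid_map line maps uid 0 inside to uid 0 outside.
is_identity_mapped() {
    # $1 = path to a uid_map file (defaults to this process's own map)
    map_file=${1:-/proc/self/uid_map}
    [ -r "$map_file" ] || return 2
    # Fields: first uid inside the namespace, first uid outside, range length
    read -r inside outside length < "$map_file"
    [ "$inside" = "0" ] && [ "$outside" = "0" ]
}

if is_identity_mapped; then
    echo "uid 0 here is host uid 0 (privileged container, or the host itself)"
else
    echo "uid 0 here is remapped or the map is unreadable (likely unprivileged)"
fi
```

From the host side, "lxc config get <CONTAINER_NAME> security.privileged" shows the configured value.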

So there we are. I'm not really happy about running this way: apparently it is easier to break out of a privileged container. So maybe I should just run two fully independent KVM virtual machines (or better, buy two of these small servers).

Regards,

Brian Candler

