I decided to try installing perfsonar-toolkit in an LXD
container, i.e.
lxc launch images:centos/6/amd64 <CONTAINER_NAME>
and then install the perfsonar-toolkit according to the
instructions at http://docs.perfsonar.net/install_centos.html
I realise this is an unsupported configuration, but I was doing
this as an experiment, ultimately to see if I could run two
independent toolkit instances (internal and external) on a
low-power box, without the overhead of full virtualisation.
I more-or-less made it work, but with several provisos which I
thought I'd document here for reference.
(1) TCP tuning had to be done globally at the host level, not
inside the container. That's fair enough, and expected from this
approach.
(2) The GUI said that ntp was not synchronised, even though it
was (synchronisation is done in the outer host). I'm guessing it's
looking for an instance of ntpd to query inside the container, and
not finding it. That could probably be worked around.
(3) However, there was a more serious problem. In the GUI, under
the section "Test Results", it always displayed "Error loading
test listing: Internal Server Error".
Looking in the Chrome developer console I got a bit more detail:
Failed to load resource: the server responded with a status of
500 (Internal Server Error)
https://x.x.x.x/perfsonar-graphs/graphData.cgi?action=test_list&url=http%3A%2F%2Flocalhost%2Fesmond%2Fperfsonar%2Farchive%2F
Looking in the Apache error logs:
[Fri Aug 05 11:37:20 2016] [error] path=
['/usr/lib/esmond/esmond', '/usr/lib/esmond/lib/python2.7',
'/usr/lib/esmond/src/dlnetsnmp/lib', '/usr/lib/esmond',
'/usr/lib/esmond/esmond_client',
'/usr/lib/esmond/lib64/python27.zip',
'/usr/lib/esmond/lib64/python2.7',
'/usr/lib/esmond/lib64/python2.7/plat-linux2',
'/usr/lib/esmond/lib64/python2.7/lib-tk',
'/usr/lib/esmond/lib64/python2.7/lib-old',
'/usr/lib/esmond/lib64/python2.7/lib-dynload',
'/opt/rh/python27/root/usr/lib64/python2.7',
'/opt/rh/python27/root/usr/lib/python2.7',
'/usr/lib/esmond/lib/python2.7/site-packages']
[Fri Aug 05 11:37:20 2016] [error] path=
['/usr/lib/esmond/esmond', '/usr/lib/esmond/lib/python2.7',
'/usr/lib/esmond/src/dlnetsnmp/lib', '/usr/lib/esmond',
'/usr/lib/esmond/esmond_client',
'/usr/lib/esmond/lib64/python27.zip',
'/usr/lib/esmond/lib64/python2.7',
'/usr/lib/esmond/lib64/python2.7/plat-linux2',
'/usr/lib/esmond/lib64/python2.7/lib-tk',
'/usr/lib/esmond/lib64/python2.7/lib-old',
'/usr/lib/esmond/lib64/python2.7/lib-dynload',
'/opt/rh/python27/root/usr/lib64/python2.7',
'/opt/rh/python27/root/usr/lib/python2.7',
'/usr/lib/esmond/lib/python2.7/site-packages']
[Fri Aug 05 11:37:22 2016] [error]
[Fri Aug 05 11:37:22 2016] [error] Unable to connect - presuming
stand-alone testing mode...
[Fri Aug 05 11:37:22 2016] [error] [client 127.0.0.1] mod_wsgi
(pid=2510): Exception occurred processing WSGI script
'/usr/lib/esmond/esmond/wsgi.py'.
[Fri Aug 05 11:37:22 2016] [error] [client 127.0.0.1] Traceback
(most recent call last):
[Fri Aug 05 11:37:22 2016] [error] [client 127.0.0.1] File
"/usr/lib/esmond/esmond/wsgi.py", line 28, in application
[Fri Aug 05 11:37:22 2016] [error] [client 127.0.0.1] return
get_wsgi_application()(environ, start_response)
[Fri Aug 05 11:37:22 2016] [error] [client 127.0.0.1] File
"/usr/lib/esmond/lib/python2.7/site-packages/django/core/handlers/wsgi.py",
line 189, in __call__
[Fri Aug 05 11:37:22 2016] [error] [client 127.0.0.1] response
= self.get_response(request)
[Fri Aug 05 11:37:22 2016] [error] [client 127.0.0.1] File
"/usr/lib/esmond/lib/python2.7/site-packages/django/core/handlers/base.py",
line 218, in get_response
[Fri Aug 05 11:37:22 2016] [error] [client 127.0.0.1] response
= self.handle_uncaught_exception(request, resolver,
sys.exc_info())
[Fri Aug 05 11:37:22 2016] [error] [client 127.0.0.1] File
"/usr/lib/esmond/lib/python2.7/site-packages/django/core/handlers/base.py",
line 264, in handle_uncaught_exception
[Fri Aug 05 11:37:22 2016] [error] [client 127.0.0.1] if
resolver.urlconf_module is None:
[Fri Aug 05 11:37:22 2016] [error] [client 127.0.0.1] File
"/usr/lib/esmond/lib/python2.7/site-packages/django/core/urlresolvers.py",
line 395, in urlconf_module
[Fri Aug 05 11:37:22 2016] [error] [client 127.0.0.1]
self._urlconf_module = import_module(self.urlconf_name)
[Fri Aug 05 11:37:22 2016] [error] [client 127.0.0.1] File
"/opt/rh/python27/root/usr/lib64/python2.7/importlib/__init__.py",
line 37, in import_module
[Fri Aug 05 11:37:22 2016] [error] [client 127.0.0.1]
__import__(name)
[Fri Aug 05 11:37:22 2016] [error] [client 127.0.0.1] File
"/usr/lib/esmond/esmond/urls.py", line 27, in <module>
[Fri Aug 05 11:37:22 2016] [error] [client 127.0.0.1] from
esmond.api.perfsonar.api_v2 import (
[Fri Aug 05 11:37:22 2016] [error] [client 127.0.0.1] File
"/usr/lib/esmond/esmond/api/perfsonar/api_v2.py", line 65, in
<module>
[Fri Aug 05 11:37:22 2016] [error] [client 127.0.0.1] raise
ConnectionException(error_msg)
[Fri Aug 05 11:37:22 2016] [error] [client 127.0.0.1]
ConnectionException: 'Unable to connect to cassandra. Please
verify cassandra is running.'
There is just one line in the cassandra log:
# cat /var/log/cassandra/cassandra.log
could not open session
Restarting didn't help:
# service cassandra status
cassandra is stopped
# service cassandra start
Starting Cassandra: OK
# ps auxwww | grep -i cassandra
root 8470 0.0 0.0 6500 548 ? S+ 05:33 0:00
grep -i cassandra
# tail /var/log/cassandra/cassandra.log
could not open session
Now, judging by https://github.com/docker/docker/issues/7056
and https://github.com/docker/docker/issues/7040
this is some sort of capability problem from running inside a
restricted container. However, unlike issue 7056, I was able to
"su" to a different user without a problem.
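After "yum install strace" I was able to run "strace -f service
cassandra start". It didn't help much, but it does suggest that
PAM is unable to set up the session the way it wants: in the
excerpt below, two setrlimit calls fail with EPERM, and "could not
open session" is written right after a PAM session message is
sent over a netlink socket.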
...
[pid 8658] open("/etc/security/limits.d/90-nproc.conf",
O_RDONLY) = 3
[pid 8658] fstat(3, {st_mode=S_IFREG|0644, st_size=191, ...}) = 0
[pid 8658] mmap(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8217119000
[pid 8658] read(3, "# Default limit for number of us"..., 512) =
191
[pid 8658] read(3, "", 512) = 0
[pid 8658] close(3) = 0
[pid 8658] munmap(0x7f8217119000, 4096) = 0
[pid 8658] open("/etc/security/limits.d/cassandra.conf",
O_RDONLY) = 3
[pid 8658] fstat(3, {st_mode=S_IFREG|0755, st_size=105, ...}) = 0
[pid 8658] mmap(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8217119000
[pid 8658] read(3, "cassandra - memlock unlimited\nca"..., 512) =
105
[pid 8658] read(3, "", 512) = 0
[pid 8658] close(3) = 0
[pid 8658] munmap(0x7f8217119000, 4096) = 0
[pid 8658] setrlimit(RLIMIT_NPROC, {rlim_cur=32*1024,
rlim_max=32*1024}) = 0
[pid 8658] setrlimit(RLIMIT_NOFILE, {rlim_cur=100000,
rlim_max=100000}) = -1 EPERM (Operation not permitted)
[pid 8658] open("/etc/localtime", O_RDONLY) = 3
[pid 8658] fstat(3, {st_mode=S_IFREG|0644, st_size=3519, ...}) =
0
[pid 8658] fstat(3, {st_mode=S_IFREG|0644, st_size=3519, ...}) =
0
[pid 8658] mmap(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8217119000
[pid 8658] read(3,
"TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\4\0\0\0\4\0\0\0\0"...,
3584) = 3519
[pid 8658] lseek(3, -2252, SEEK_CUR) = 1267
[pid 8658] read(3,
"TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\5\0\0\0\5\0\0\0\0"...,
3584) = 2252
[pid 8658] close(3) = 0
[pid 8658] munmap(0x7f8217119000, 4096) = 0
[pid 8658] socket(PF_LOCAL, SOCK_DGRAM|SOCK_CLOEXEC, 0) = 3
[pid 8658] connect(3, {sa_family=AF_LOCAL, sun_path="/dev/log"},
110) = 0
[pid 8658] sendto(3, "<83>Aug 7 05:39:35 su: pam_limi"...,
105, MSG_NOSIGNAL, NULL, 0) = 105
[pid 8658] setrlimit(RLIMIT_MEMLOCK, {rlim_cur=RLIM64_INFINITY,
rlim_max=RLIM64_INFINITY}) = -1 EPERM (Operation not permitted)
[pid 8658] sendto(3, "<83>Aug 7 05:39:35 su: pam_limi"...,
106, MSG_NOSIGNAL, NULL, 0) = 106
[pid 8658] setrlimit(RLIMIT_AS, {rlim_cur=RLIM64_INFINITY,
rlim_max=RLIM64_INFINITY}) = 0
[pid 8658] setpriority(PRIO_PROCESS, 0, 0) = 0
[pid 8658] getuid() = 0
[pid 8658] getuid() = 0
[pid 8658] access("/var/run/utmpx", F_OK) = -1 ENOENT (No such
file or directory)
[pid 8658] open("/var/run/utmp", O_RDONLY|O_CLOEXEC) = 4
[pid 8658] lseek(4, 0, SEEK_SET) = 0
[pid 8658] alarm(0) = 0
[pid 8658] rt_sigaction(SIGALRM, {0x7f821687a890, [],
SA_RESTORER, 0x7f8216788660}, {SIG_DFL, [], 0}, 8) = 0
[pid 8658] alarm(10) = 0
[pid 8658] fcntl(4, F_SETLKW, {type=F_RDLCK, whence=SEEK_SET,
start=0, len=0}) = 0
[pid 8658] read(4,
"\6\0\0\0\272!\0\0tty[1-6]\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
384) = 384
[pid 8658] read(4,
"\2\0\0\0\0\0\0\0~\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
384) = 384
[pid 8658] read(4,
"\1\0\0\00033\0\0~\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
384) = 384
[pid 8658] read(4,
"\10\0\0\0<\0\0\0console\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
384) = 384
[pid 8658] read(4, "", 384) = 0
[pid 8658] fcntl(4, F_SETLKW, {type=F_UNLCK, whence=SEEK_SET,
start=0, len=0}) = 0
[pid 8658] alarm(0) = 10
[pid 8658] rt_sigaction(SIGALRM, {SIG_DFL, [], SA_RESTORER,
0x7f8216788660}, NULL, 8) = 0
[pid 8658] close(4) = 0
[pid 8658] getuid() = 0
[pid 8658] sendto(3, "<86>Aug 7 05:39:35 su: pam_unix"...,
90, MSG_NOSIGNAL, NULL, 0) = 90
[pid 8658] access("/usr/X11R6/bin/xauth", X_OK) = -1 ENOENT (No
such file or directory)
[pid 8658] access("/usr/bin/xauth", X_OK) = -1 ENOENT (No such
file or directory)
[pid 8658] access("/usr/bin/X11/xauth", X_OK) = -1 ENOENT (No
such file or directory)
[pid 8658] socket(PF_NETLINK, SOCK_RAW, 9) = 4
[pid 8658] fcntl(4, F_SETFD, FD_CLOEXEC) = 0
[pid 8658] sendto(4,
"p\0\0\0Q\4\5\0\3\0\0\0\0\0\0\0op=PAM:session_o"..., 112, 0,
{sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 112
[pid 8658] poll([{fd=4, events=POLLIN}], 1, 500) = 1 ([{fd=4,
revents=POLLIN}])
[pid 8658] recvfrom(4,
"\204\0\0\0\2\0\0\0\3\0\0\0\322!\0\0\221\377\377\377p\0\0\0Q\4\5\0\3\0\0\0"...,
8988, MSG_PEEK|MSG_DONTWAIT, {sa_family=AF_NETLINK, pid=0,
groups=00000000}, [12]) = 132
[pid 8658] recvfrom(4,
"\204\0\0\0\2\0\0\0\3\0\0\0\322!\0\0\221\377\377\377p\0\0\0Q\4\5\0\3\0\0\0"...,
8988, MSG_DONTWAIT, {sa_family=AF_NETLINK, pid=0,
groups=00000000}, [12]) = 132
[pid 8658] close(4) = 0
[pid 8658] write(2, "could not open session\n", 23) = 23
...
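A trace this long is easier to read once the failing calls are
pulled out. The following is only a minimal sketch, assuming the
trace was saved to a file with strace's -o option. The function
names failed_syscalls and permission_failures and the /tmp path are
made up for illustration and are not part of any tool.

```shell
#!/bin/sh
# Keep only the syscalls that returned an error, reading strace
# output on stdin. strace prints failures as
#   ... = -1 ENAME (description)
failed_syscalls() {
    grep -E '= -1 E[A-Z0-9]+ \('
}

# Narrow that down to permission-type failures, which are the ones
# of interest when a container lacks privileges.
permission_failures() {
    failed_syscalls | grep -E '= -1 (EPERM|EACCES) '
}

# Possible usage (hypothetical file name):
#   strace -f -o /tmp/cassandra.trace service cassandra start
#   permission_failures < /tmp/cassandra.trace
```

Against the excerpt above this would leave the two setrlimit lines
(RLIMIT_NOFILE and RLIMIT_MEMLOCK) and drop the harmless ENOENT
lookups for utmpx and xauth.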
Since I don't see a way to set individual capabilities in lxd
containers, I did:
lxc config set <CONTAINER_NAME> security.privileged true
lxc restart <CONTAINER_NAME>
After this it appears to be happy again.
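One way to check whether the change made a difference for the two
limits that failed in the trace is to try raising them from a shell
inside the container. This is a rough sketch rather than a proper
test: can_set_limit and report_limit are hypothetical helper names,
and the values 100000 and unlimited are simply the ones seen in the
strace output.

```shell
#!/bin/sh
# Return success if this shell is allowed to set the given rlimit.
# The attempt runs in a subshell, so the caller's limits are untouched.
#   $1: ulimit option letter (n = open files, l = locked memory)
#   $2: value to request
can_set_limit() {
    ( ulimit "-$1" "$2" ) 2>/dev/null
}

# Print one line saying whether the request was accepted.
report_limit() {
    if can_set_limit "$1" "$2"; then
        echo "ulimit -$1 $2: ok"
    else
        echo "ulimit -$1 $2: refused"
    fi
}

# Possible usage, as root inside the container:
#   report_limit n 100000
#   report_limit l unlimited
```

If both lines report "ok", the setrlimit calls that PAM makes on
behalf of the cassandra user should no longer fail with EPERM.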
So: there we are. I'm not really happy about running this way,
since apparently it is easier to break out of privileged
containers. So maybe I should just run two fully independent KVM
virtual machines (or better, buy two of these small servers).
Regards,
Brian Candler