perfsonar-user - Re: [perfsonar-user] new install problems with test results and adding tests [solved]

Subject: perfSONAR User Q&A and Other Discussion

From: "Fligor, Debbie" <>
To: "" <>
Cc: "Fligor, Debbie" <>, Michael Johnson <>
Subject: Re: [perfsonar-user] new install problems with test results and adding tests [solved]
Date: Tue, 19 Mar 2019 23:23:13 +0000

I sorted out the problem with saving tests giving me an Internal Server
Error.

If you mix default and non-default host configs in the same test, saving it
gives a server error. The workaround is to make sure that, within any single
saved test, every host uses the same kind of config.

Details:

A default config line is written like this in gui-tasks.conf:

target host.fqdn.edu

A non-default config looks like this in gui-tasks.conf:

<target>
address host.fqdn.edu
<override_parameters>
type bwping
force_ipv4 1
</override_parameters>
</target>

It will give an error if you mix and match those in the same test. For
example, if I make a ping test where some hosts are IPv4-only (non-default)
and some are dual-stack, IPv4 and IPv6 (default), it errors 100% of the time
when I try to save the test. The config file is saved, but can’t be parsed.
If I make two separate tests, one with the IPv4-only hosts and one with the
dual-stack hosts, it works fine 100% of the time.

If you do get an error, I had the best luck recovering by deleting the file,
or blanking it out and re-creating it, and starting again, because the
“cancel” button does not restore gui-tasks.conf to what it was before the
failed save. I tried just removing the “bad” test entry and re-loading, but
that didn’t work consistently, and once I had a reliable workaround I quit
messing with it.
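
For the recovery step, a small helper along these lines keeps a timestamped
backup and blanks the file in place, so the owner, group and mode stay as they
were. Re-creating the file as root leaves it owned by root, and the web
backend then can't edit it. reset_tasks_file is a hypothetical name, not a
perfSONAR command, and this is a sketch rather than a tested procedure.

```shell
# Hypothetical helper (not a perfSONAR tool): back up a tasks file, then
# empty it without replacing it, so ownership and permissions are unchanged.
# Example call: reset_tasks_file /var/lib/perfsonar/toolkit/gui-tasks.conf
reset_tasks_file() {
    conf=$1
    backup="$conf.$(date +%Y-%m-%d-%H%M%S)"

    # -p preserves mode and timestamps (and owner/group when run as root).
    cp -p "$conf" "$backup" || return 1

    # Truncate in place: same file, same owner, same mode, zero length.
    : > "$conf" || return 1

    # Print the backup path so the caller can find or restore it.
    printf '%s\n' "$backup"
}
```

Afterwards, ls -l on the file should still show perfsonar as owner and group
with mode -rw-r--r--, and the tests can be re-added from the GUI.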

-debbie

> On Mar 6, 2019, at 9:31, Fligor, Debbie <> wrote:
>
> Thanks Michael,
>
> Sorry to put so much into one email. With the problem with the task file
> showing up part way through working on the “why don’t the tests results
> show up” problem, I ended up with everything rolled together and I wasn’t
> sure which things might be inter-related. I’ve provided some more details
> on the 3 issues, with details below. In short, the updates were on but not
> working, now they are on and working. Removing and replacing the
> gui-tasks.conf file doesn’t help, and the graphs not showing up with the
> older code but working after an upgrade would be hard to troubleshoot.
>
>> On Mar 5, 2019, at 14:39, Michael Johnson <> wrote:
>>
>> Hi Debbie,
>>
>> That's a lot all at once, I will try to answer all your questions, but let
>> me know if I've missed anything. Answers inline below:
>>
>>
>>> On Mar 5, 2019, at 8:00 AM, Fligor, Debbie <> wrote:
>>>
>>> Hi everyone,
>>>
>>> First thing I figured out was that even though the button for auto update
>>> was set, it hadn’t done any updates on either system. So I did by-hand
>>> yum update and it got everything up to the current versions as best I can
>>> tell (psconfig is 4.1.6-1el7). So that’s my first ask - what else do I
>>> need to do for auto update to work?
>>>
>>
>> Setting auto-updates via the GUI should work; does it show that
>> auto-updates are enabled ("green" switch)?
>
> Yes, they both showed a green switch before and after using yum update to
> bring them up to the current version.
>
>>
>> Regardless, you can enable/disable auto-updates from the command line. This
>> is handled by a service called yum-cron.
>>
>> To view the current status of auto-updates:
>> $ sudo systemctl status yum-cron
>>
>
> This is on.
>
> ● yum-cron.service - Run automatic yum updates as a cron job
> Loaded: loaded (/usr/lib/systemd/system/yum-cron.service; enabled; vendor
> preset: disabled)
> Active: active (exited) since Mon 2019-03-04 23:08:46 CST; 21h ago
> Process: 12029 ExecStart=/bin/touch /var/lock/subsys/yum-cron
> (code=exited, status=0/SUCCESS)
> Main PID: 12029 (code=exited, status=0/SUCCESS)
> CGroup: /system.slice/yum-cron.service
>
> Mar 04 23:08:46 res-perfsonar.techservices.illinois.edu systemd[1]:
> Starting Run automatic yum updates as a cron job...
> Mar 04 23:08:46 res-perfsonar.techservices.illinois.edu systemd[1]: Started
> Run automatic yum updates as a cron job.
>
>
> I noticed that the bottom two lines look like they came out of a log file,
> so I looked for similar messages:
>
> [root@res-perfsonar log]# grep " yum updates" *
> grep: audit: Is a directory
> boot.log-20181207: Starting Run automatic yum updates as a cron
> job...
> boot.log-20181207:[ OK ] Started Run automatic yum updates as a cron job.
> boot.log-20181208: Starting Run automatic yum updates as a cron
> job...
> boot.log-20181208:[ OK ] Started Run automatic yum updates as a cron job.
> boot.log-20190205: Starting Run automatic yum updates as a cron
> job...
> boot.log-20190205:[ OK ] Started Run automatic yum updates as a cron job.
> boot.log-20190209: Starting Run automatic yum updates as a cron
> job...
> boot.log-20190209:[ OK ] Started Run automatic yum updates as a cron job.
> boot.log-20190302: Starting Run automatic yum updates as a cron
> job...
> boot.log-20190302:[ OK ] Started Run automatic yum updates as a cron job.
> boot.log-20190302: Starting Run automatic yum updates as a cron
> job...
> boot.log-20190302:[ OK ] Started Run automatic yum updates as a cron job.
> boot.log-20190305: Starting Run automatic yum updates as a cron
> job...
> boot.log-20190305:[ OK ] Started Run automatic yum updates as a cron job.
> boot.log-20190305: Starting Run automatic yum updates as a cron
> job...
> boot.log-20190305:[ OK ] Started Run automatic yum updates as a cron job.
> boot.log-20190305: Starting Run automatic yum updates as a cron
> job...
> boot.log-20190305:[ OK ] Started Run automatic yum updates as a cron job.
> grep: cacti: Is a directory
> grep: cassandra: Is a directory
> grep: chrony: Is a directory
> grep: cups: Is a directory
> grep: esmond: Is a directory
> grep: httpd: Is a directory
> messages:Mar 4 10:27:51 res-perfsonar systemd: Starting Run automatic yum
> updates as a cron job...
> messages:Mar 4 10:27:51 res-perfsonar systemd: Started Run automatic yum
> updates as a cron job.
> messages:Mar 4 22:13:40 res-perfsonar systemd: Stopping Run automatic yum
> updates as a cron job...
> messages:Mar 4 22:15:05 res-perfsonar systemd: Starting Run automatic yum
> updates as a cron job...
> messages:Mar 4 22:15:05 res-perfsonar systemd: Started Run automatic yum
> updates as a cron job.
> messages:Mar 4 23:08:46 res-perfsonar systemd: Starting Run automatic yum
> updates as a cron job...
> messages:Mar 4 23:08:46 res-perfsonar systemd: Started Run automatic yum
> updates as a cron job.
> messages-20190210:Feb 8 09:05:45 res-perfsonar systemd: Starting Run
> automatic yum updates as a cron job...
> messages-20190210:Feb 8 09:05:45 res-perfsonar systemd: Started Run
> automatic yum updates as a cron job.
> messages-20190303:Mar 1 13:29:58 res-perfsonar systemd: Stopping Run
> automatic yum updates as a cron job...
> messages-20190303:Mar 1 13:31:26 res-perfsonar systemd: Starting Run
> automatic yum updates as a cron job...
> messages-20190303:Mar 1 13:31:26 res-perfsonar systemd: Started Run
> automatic yum updates as a cron job.
> messages-20190303:Mar 1 14:00:59 res-perfsonar systemd: Starting Run
> automatic yum updates as a cron job...
> messages-20190303:Mar 1 14:00:59 res-perfsonar systemd: Started Run
> automatic yum updates as a cron job.
>
>
> [root@res-perfsonar log]# ls messages*
> messages messages-20190210 messages-20190217 messages-20190224
> messages-20190303
>
>
> As you can see, it appeared to be turning on at every boot. If I grep for
> “yum” I get a lot more hits, showing the cron entry that runs yum-hourly,
> but the only log entries that show “yum[pid]: Updated” lines are from 3/4,
> when I ran it by hand, and 3 entries from 3/6. There is nothing at all in the
> earlier logs. It appears the same on both systems.
>
> So it looks like it’s working now. I’m not sure there’s anything left to
> dig into here. I will not trust it in the future without checking on it,
> but I probably should have been checking on it anyway.
>
>> To enable auto-updates
>> $ sudo systemctl enable yum-cron
>>
>> Docs here:
>> http://docs.perfsonar.net/manage_update.html#managing-automatic-updates-from-the-command-line
>>
>>>
>>> Then I added some ping tests so that I should start seeing results
>>> sooner. I started getting errors when I changed tests, and would hit
>>> cancel, and try again, and fairly soon on that host clicking on the tests
>>> tab got me a spinning load circle that never resolved. So I thought I’d
>>> corrupted a database or something on one of the two servers.
>>
>> FWIW, there is no database backend for the test configs.
>
> thanks. good to know.
>
> [snipped details]
>
>> Indeed, you should not need to hand-edit gui-tasks.conf. I think two
>> things are happening. First, there might be something invalid in that
>> config file. The easiest way to fix this is probably to just delete it
>> (don't forget to back it up first), and then re-create it, as an empty
>> file.
>
>>
>> $ sudo rm -f /var/lib/perfsonar/toolkit/gui-tasks.conf
>> $ sudo touch /var/lib/perfsonar/toolkit/gui-tasks.conf
>>
>>
>> And, when you edited the file it may have gotten saved with the wrong
>> permissions, at which point the web backend can't edit it. On my system,
>> it looks like this:
>>
>> $ ls -l /var/lib/perfsonar/toolkit/gui-tasks.conf
>> -rw-r--r-- 1 perfsonar perfsonar 6867 Sep 28 15:02
>> /var/lib/perfsonar/toolkit/gui-tasks.conf
>>
>> Its owner and group are "perfsonar", and it's rw for user, and read-only
>> for everyone else. You could restore those permissions like this:
>> $ sudo chmod 0644 /var/lib/perfsonar/toolkit/gui-tasks.conf
>> $ sudo chown perfsonar:perfsonar /var/lib/perfsonar/toolkit/gui-tasks.conf
>>
>>
>
>
> Just to be clear, I had made throughput and latency tests, saved, edited,
> etc. with no problem on Friday 3/1, before upgrading. All the problems
> happened as I was troubleshooting the lack of graphs, and I think that they
> happened after the yum upgrade on Monday 3/4.
>
> When I first moved the file to gui-tasks.conf- and tried to recreate it the
> ownership was wrong (root was owner and group). So I moved the file back,
> made a copy for backup, and used vi to edit it and blank it out. After that
> it would let me add and save configurations, for a little while.
>
> For example, I just copied my current config to a backup file, and then
> added a host as a latency test. It saved with no problem. Then I added the
> same host as a ping test, went to save it, and got “Error - Internal
> Server Error”. This was probably already in a ping list, so I dismissed the
> error and hit cancel. Everything looked fine from the GUI. Then I chose to
> edit an existing test, and just changed the name. I clicked “okay” then
> “save” and again got an internal server error. At this point I can either
> blank the file, or edit it if I don’t want to have to start from scratch.
> I’ve been able to replicate this 100% on both servers, starting from an
> empty gui-tasks.conf file. To be sure it was really empty (and vi didn’t
> leave anything hidden), I just did it again, starting from touching a file,
> setting user/group, and checking permissions:
>
>
> [root@res-perfsonar toolkit]# mv gui-tasks.conf gui-tasks.conf--
> [root@res-perfsonar toolkit]# ls
> gui-tasks.conf-- gui-tasks.conf-2019-03-05-2048 gui-tasks.conf-bak
> [root@res-perfsonar toolkit]# touch gui-tasks.conf
> [root@res-perfsonar toolkit]# chown perfsonar gui-tasks.conf
> [root@res-perfsonar toolkit]# chgrp perfsonar gui-tasks.conf
> [root@res-perfsonar toolkit]# ls -l
> total 24
> -rw-r--r-- 1 perfsonar perfsonar 0 Mar 5 20:56 gui-tasks.conf
> -rw-r--r-- 1 perfsonar perfsonar 5020 Mar 5 20:55 gui-tasks.conf--
> -rw-r--r-- 1 root root 4135 Mar 5 20:48
> gui-tasks.conf-2019-03-05-2048
> -rw-r--r-- 1 root root 5940 Mar 4 22:20 gui-tasks.conf-bak
>
> I added throughput tests and ping tests, and saved them with no problem. Then
> I added a latency test, and got a server error. This one is slightly
> different:
>
> [Tue Mar 05 20:58:57.931578 2019] [cgi:error] [pid 289353] [client
> 107.152.10.165:1877] AH01215: [Tue Mar 5 20:58:57 2019]
> regular_testing.cgi: Can't call method "json" on an undefined value at
> /usr/lib/perfsonar/web-ng/root/admin/services/../../../../lib/perfSONAR_PS/NPToolkit/Config/RegularTesting.pm
> line 128., referer:
> https://res-perfsonar.techservices.illinois.edu/toolkit/auth/admin/tests.cgi
>
> If I dismiss the error and then press “cancel” instead of save, the
> gui-tasks.conf file does not appear to revert; it still has the stanza for
> the latency test, and I still can’t save.
>
> While I’m sure I could have done something that’s messing up the ability to
> cancel out of errors, if there’s any use in watching this happen, it’s
> really easy to replicate, and I’d be happy to set up a screen share to find
> out why it can’t keep adding tests, on the off chance it’s not my install
> but an actual problem.
>
>> At that point, hopefully you will be able to create a configuration, and
>> save it.
>>
>>>
>>> And once I had that solved, the host that I had deleted and re-installed
>>> perfsonar-toolkit on started showing test results (on the dashboard and
>>> the esmond archive), but the other host did not. Rebooting didn’t make
>>> the tests show up. So on that host I also uninstalled and reinstalled
>>> perfsonar-toolkit and now the test results show up. On this one I’m
>>> curious as to why this worked, but since it’s working and continued to
>>> update all night, I’m not looking for a better way to do it.
>>
>> My guess is that maybe you didn't wait long enough for the test results to
>> show up, and reinstalling/rebooting most likely was just a coincidence.
>> Hard to say.
>
>
> I successfully made and saved tests on Friday, and there were still none
> displaying on Monday. It wasn’t until Monday that I upgraded versions, and
> started having problems saving tests. When I added ping tests (and finally
> got them to save) to the host that I had reinstalled on earlier, the first
> two showed up almost immediately, but they didn’t show up on the other
> server. I gave it more than 5 minutes, ran some of the cassandra and could
> see them in the log file for pscheduler. While it could have been
> coincidence, it really didn’t seem like it. This one there’s no point in
> really going after, there’s nothing I can do to replicate it that I can
> think of, short of re-installing the servers from the old image and seeing
> if it happens again. I’d rather not spend the time that would take.
>
>
>>
>>> Now I’m moving on to system tuning.
>>>
>>> One last minor question for anyone that got this far. How do I set the
>>> primary (default) interface so it’s what I want? One system is picking
>>> the 100G, the other the 1G copper port. On both systems the 100G has
>>> the default route for v4 and v6.
>>
>>
>> You can set the primary interface in
>> /usr/lib/perfsonar/web-ng/etc/web_admin.conf by setting the
>> "primary_interface" directive.
>
> thanks.
>
>>
>> Hope this helps.
>>
>> Thanks,
>> Michael
>>
>> Michael Johnson
>> GlobalNOC DevOps Engineer
>>
>>
>
>
> --
> -debbie
> Debbie Fligor, n9dn Lead Network Engineer @ Univ. of Il
> email:

--
-debbie
Debbie Fligor, n9dn Lead Network Engineer @ Univ. of Il
email:

