
grouper-users - [grouper-users] Re: quiesce capability ( was RE: Status Monitoring - Two Errors )

List: Grouper Users - Open Discussion List



  • From: "Gettes, Michael" <>
  • To: "Black, Carey M." <>
  • Cc: "" <>
  • Subject: [grouper-users] Re: quiesce capability ( was RE: Status Monitoring - Two Errors )
  • Date: Mon, 27 Aug 2018 16:35:37 +0000

Yup - I completely agree with your line of thinking.

/mrg

On Aug 27, 2018, at 12:31 PM, Black, Carey M. <> wrote:

Michael,
 
I actually think the “quiesce capability” should be the default. ( You should need to “hard kill” it to do otherwise. )
 
In general: with a determined programmer most things are possible, as long as you’re not trying to violate a fundamental property of physics, or budgets. :)
 
I could imagine:
    • a “status” in the loader process that would “prevent new jobs from starting” (basically behave as if all jobs are running for the check on startup), and
    • when all the jobs are actually finished or dead, then exit (a job that looks for that “loader status” and checks every minute? or on job completion, where the last process turns off the lights?). See the sketch below.
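
A rough sketch of what that could look like against the Quartz scheduler API (standby(), getCurrentlyExecutingJobs() and shutdown(true) are standard org.quartz.Scheduler methods; the class name and how this gets wired into the loader are just placeholders):

    import java.util.concurrent.TimeUnit;
    import org.quartz.Scheduler;
    import org.quartz.SchedulerException;

    public class LoaderQuiesceSketch {

      /** Stop firing new triggers, wait for running jobs to finish, then exit. */
      public static void quiesceAndExit(Scheduler scheduler)
          throws SchedulerException, InterruptedException {

        // "Prevent new jobs from starting": no more triggers fire; running jobs are untouched.
        scheduler.standby();

        // "Last process turns off the lights": check every minute until nothing is executing.
        while (!scheduler.getCurrentlyExecutingJobs().isEmpty()) {
          TimeUnit.MINUTES.sleep(1);
        }

        // Nothing left running; true = wait for any stragglers to complete before returning.
        scheduler.shutdown(true);
      }
    }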
--
Carey Matthew 
 
From: Gettes, Michael <> 
Sent: Monday, August 27, 2018 12:00 PM
To: Black, Carey M. <>
Cc: Ryan Rumbaugh <>; 
Subject: Re: [grouper-users] RE: Status Monitoring - Two Errors
 
I’ve always wanted a quiesce capability.  Something that lets all the current work complete, but where the current loader instance won’t start any new jobs.  This would be needed for all loader daemons, or just specific ones, so we can safely take instances down.  I have no idea whether this is possible with Quartz and haven’t had a chance to look into it.
 
/mrg


On Aug 27, 2018, at 11:20 AM, Black, Carey M. <> wrote:
 
Ryan,
 
RE: “I had been restarting the API daemon” … (due to Docker use)
    I have often wondered how the “shutdown process” works for the daemon. Is it “graceful” (letting all running jobs complete before shutdown) or does it just “pull the plug”?
        I think it just pulls the plug.
        That “leaves” running jobs as “in progress” (in the DB status table), and they refuse to immediately start when the loader restarts. Well, until the “in progress” record(s) get old enough that they are assumed to be dead. Then the jobs will no longer refuse to start.
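
    If you want to see what is currently sitting in that state, something like this should do it (a minimal JDBC sketch against the grouper_loader_log table; the connection details are placeholders, and the exact column names and status values are assumptions, so check them against your schema first):

        import java.sql.Connection;
        import java.sql.DriverManager;
        import java.sql.PreparedStatement;
        import java.sql.ResultSet;

        public class InProgressLoaderJobs {
          public static void main(String[] args) throws Exception {
            // Placeholder connection details: point these at the Grouper registry database.
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost:5432/grouper", "grouper", "********");
                 PreparedStatement ps = conn.prepareStatement(
                     // Assumed: STARTED/RUNNING rows are the ones the loader treats as "in progress".
                     "select job_name, status, started_time, last_updated "
                   + "from grouper_loader_log "
                   + "where status in ('STARTED', 'RUNNING') "
                   + "order by started_time");
                 ResultSet rs = ps.executeQuery()) {
              while (rs.next()) {
                System.out.printf("%s  %s  started=%s  last_updated=%s%n",
                    rs.getString("job_name"), rs.getString("status"),
                    rs.getTimestamp("started_time"), rs.getTimestamp("last_updated"));
              }
            }
          }
        }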
 
    I say that to say this:
        If the loader is restarted repeatedly, quickly, and/or often, you may be interrupting the running jobs and leaving them as “in progress” (in the DB), producing more delay before those jobs start again. But it all depends on how fast/often those things are spinning up and down.

        However, if you are always spinning up new instances (and letting the old ones run for a bit), you may be able to “wait till a good time” to turn them off.
        Maybe you could cycle out the old instances gracefully by timing it with these settings?
        “
        ##################################
        ## enabled / disabled cron
        ##################################

        #quartz cron-like schedule for enabled/disabled daemon.  Note, this has nothing to do with the changelog
        #leave blank to disable this, the default is 12:01am, 11:01am, 3:01pm every day: 0 1 0,11,15 * * ?
        changeLog.enabledDisabled.quartz.cron = 0 1 0,11,15 * * ?
        “
 
 
RE: how to schedule the “deprovisioningDaemon”
 
    Verify that your grouper-loader.base.properties has this block (or you can add it to your grouper-loader.properties):
    NOTE: it was added to the default base as of GRP-1623 (which maps to grouper_v2_3_0_api_patch_107, and for the UI grouper_v2_3_0_ui_patch_44). You are likely past those patches… but just saying. :)
    “
    #####################################
    ## Deprovisioning Job
    #####################################
    otherJob.deprovisioningDaemon.class = edu.internet2.middleware.grouper.app.deprovisioning.GrouperDeprovisioningJob
    otherJob.deprovisioningDaemon.quartzCron = 0 0 2 * * ?
    “
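
    Once that block is in place and the loader daemon has been restarted, you should also be able to kick the job off once by hand from gsh with loaderRunOneJob("OTHER_JOB_deprovisioningDaemon") (the same call you used for the change log job), though I have not tried that for this particular job.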
 
HTH.
 
-- 
Carey Matthew 
 
From:  <> On Behalf Of Ryan Rumbaugh
Sent: Monday, August 27, 2018 10:12 AM
To: 
Subject: [grouper-users] RE: Status Monitoring - Two Errors
 
An update to this issue that may be helpful to others…
 
Before I left the office on Friday I ran the gsh command loaderRunOneJob("CHANGE_LOG_changeLogTempToChangeLog"), and now the number of rows in the grouper_change_log_entry_temp table is zero! I tried running that before, but really didn’t see much of anything happening. Maybe I was just too impatient.
 
Now when accessing grouper/status?diagnosticType=all, the only error is related to “OTHER_JOB_deprovisioningDaemon”. If anyone has any tips on how to get that kick-started, it would be greatly appreciated.
 
 
--
Ryan Rumbaugh
 
From:  <> On Behalf Of Ryan Rumbaugh
Sent: Friday, August 24, 2018 9:15 AM
To: 
Subject: [grouper-users] Status Monitoring - Two Errors
 
Good morning,
 
We would like to begin monitoring the status of grouper by using the diagnostic pages at grouper/status?diagnosticType=all, but before doing so I would like to take care of the two issues shown below.
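
The check we have in mind is nothing fancier than hitting that URL and alerting on a bad response; roughly something like this (Java 11+ HttpClient; the host name is a placeholder, and it assumes the status servlet returns a non-200 code when a diagnostic fails):

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class GrouperStatusCheck {
      public static void main(String[] args) throws Exception {
        // Placeholder host/context path for our environment.
        String url = "https://grouper.example.edu/grouper/status?diagnosticType=all";

        HttpResponse<String> response = HttpClient.newHttpClient().send(
            HttpRequest.newBuilder(URI.create(url)).GET().build(),
            HttpResponse.BodyHandlers.ofString());

        // Assumption: the status servlet returns a non-200 code when any diagnostic fails.
        if (response.statusCode() != 200) {
          System.err.println("Grouper diagnostics failing:\n" + response.body());
          System.exit(2);
        }
        System.out.println("Grouper diagnostics OK");
      }
    }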
 
Can anyone provide tips/suggestions on how to fix the two failures for CHANGE_LOG_changeLogTempToChangeLog and  OTHER_JOB_deprovisioningDaemon?
 
We had a Java heap issue late last week which I believe caused the “grouper_change_log_entry_temp” table to keep growing. It’s at 69,886 rows currently while earlier this week it was at 50k. Thanks for any insight.
 
 
 
2 errors in the diagnostic tasks:
 
DiagnosticLoaderJobTest, Loader job CHANGE_LOG_changeLogTempToChangeLog
 
DiagnosticLoaderJobTest, Loader job OTHER_JOB_deprovisioningDaemon
 
 
 
Error stack for: loader_CHANGE_LOG_changeLogTempToChangeLog
java.lang.RuntimeException: Cant find a success in job CHANGE_LOG_changeLogTempToChangeLog since: 2018/08/16 14:19:22.000, expecting one in the last 30 minutes
                at edu.internet2.middleware.grouper.j2ee.status.DiagnosticLoaderJobTest.doTask(DiagnosticLoaderJobTest.java:175)
                at edu.internet2.middleware.grouper.j2ee.status.DiagnosticTask.executeTask(DiagnosticTask.java:78)
                at edu.internet2.middleware.grouper.j2ee.status.GrouperStatusServlet.doGet(GrouperStatusServlet.java:180)
                at javax.servlet.http.HttpServlet.service(HttpServlet.java:635)
                at javax.servlet.http.HttpServlet.service(HttpServlet.java:742)
                at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:230)
                at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:165)
                at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
                at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:192)
                at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:165)
                at org.owasp.csrfguard.CsrfGuardFilter.doFilter(CsrfGuardFilter.java:110)
                at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:192)
                at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:165)
                at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:198)
                at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:96)
                at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:478)
                at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:140)
                at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:80)
                at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:624)
                at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:87)
                at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:341)
                at org.apache.coyote.ajp.AjpProcessor.service(AjpProcessor.java:478)
                at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66)
                at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:798)
                at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1441)
                at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
                at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
                at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
                at java.lang.Thread.run(Thread.java:748)
 
 
Error stack for: loader_OTHER_JOB_deprovisioningDaemon
java.lang.RuntimeException: Cant find a success in job OTHER_JOB_deprovisioningDaemon, expecting one in the last 3120 minutes
                at edu.internet2.middleware.grouper.j2ee.status.DiagnosticLoaderJobTest.doTask(DiagnosticLoaderJobTest.java:173)
                at edu.internet2.middleware.grouper.j2ee.status.DiagnosticTask.executeTask(DiagnosticTask.java:78)
                at edu.internet2.middleware.grouper.j2ee.status.GrouperStatusServlet.doGet(GrouperStatusServlet.java:180)
                at javax.servlet.http.HttpServlet.service(HttpServlet.java:635)
                at javax.servlet.http.HttpServlet.service(HttpServlet.java:742)
                at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:230)
                at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:165)
                at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
                at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:192)
                at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:165)
                at org.owasp.csrfguard.CsrfGuardFilter.doFilter(CsrfGuardFilter.java:110)
                at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:192)
                at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:165)
                at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:198)
                at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:96)
                at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:478)
                at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:140)
                at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:80)
                at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:624)
                at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:87)
                at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:341)
                at org.apache.coyote.ajp.AjpProcessor.service(AjpProcessor.java:478)
                at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66)
                at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:798)
                at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1441)
                at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
                at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
                at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
                at java.lang.Thread.run(Thread.java:748)
 
--
Ryan Rumbaugh



