Restart daemon on error

PenguinPonics · June 2, 2023, 4:49am

About every other day or so I get the error below. I can still log in, so it just takes a quick “sudo mycodo-commands restart-daemon” to get things rolling again, but… is there any way we can automate this?

2023-05-28 10:28:31,408 - ERROR - mycodo.controllers.controller_input_ea02dd83 - loop() Error
Traceback (most recent call last):
  File "/var/mycodo-root/mycodo/controllers/base_controller.py", line 81, in run
    self.loop()
  File "/var/mycodo-root/mycodo/controllers/controller_input.py", line 227, in loop
    return_dict = self.control.trigger_action(
  File "/var/mycodo-root/mycodo/mycodo_client.py", line 146, in trigger_action
    return self.proxy().trigger_action(
  File "/var/mycodo-root/env/lib/python3.9/site-packages/Pyro5/client.py", line 101, in __getattr__
    self._pyroGetMetadata()
  File "/var/mycodo-root/env/lib/python3.9/site-packages/Pyro5/client.py", line 392, in _pyroGetMetadata
    self.__pyroCreateConnection()
  File "/var/mycodo-root/env/lib/python3.9/site-packages/Pyro5/client.py", line 376, in __pyroCreateConnection
    connect_and_handshake(conn)
  File "/var/mycodo-root/env/lib/python3.9/site-packages/Pyro5/client.py", line 346, in connect_and_handshake
    raise errors.CommunicationError(error)
Pyro5.errors.CommunicationError: connection to ('127.0.0.1', 9080) rejected: no free workers, increase server threadpool size

Lucid3y3 · June 2, 2023, 6:32am

Instead of restarting the daemon, manually or automatically, which shouldn’t be necessary, you should try to find the reason that Input ID ea02dd83 keeps throwing the original error. Have you checked the Log Level: Debug box in that Input’s settings to see if you can get more info from the log about the cause of the original error?

PenguinPonics · June 2, 2023, 3:48pm

ea02dd83 is the Atlas pH sensor. That and the PT-1000 are connected to a Whitebox T3 board on a Pi3. When the error occurs, the data from both sensors stops coming in, and restarting the daemon always resolves it. I just enabled debug-level logging to see if it adds anything more helpful for troubleshooting.

Lucid3y3 · June 2, 2023, 7:48pm

I have also been having random problems with Inputs that suddenly, and for no apparent reason, just stop sending data. It happens every few days and causes the database to lock, and then the Pi becomes unresponsive and “stuck” in whatever state it was in when it hung. It happened to me again this morning, and my humidifier got stuck on for several hours before I noticed, so the whole grow space was soaking wet :-(.
I still have not been able to figure out what is causing this to happen. I need to go through the logs tonight to try to determine why the Input suddenly stops sending data.

PenguinPonics · June 2, 2023, 9:13pm

In my situation I can normally SSH back into the Pi and issue the restart-daemon command without any issue. I have the data pushed to Home Assistant (and then to a separate Influx/Grafana) via MQTT. If I notice there hasn’t been an update in more than a few minutes, I know it has stopped, but the Pi is almost never unresponsive.
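
The "no update in more than a few minutes" check could be automated on the receiving side. Below is a minimal sketch of just the staleness logic, assuming something else (for example an MQTT subscriber callback) calls `record()` whenever a reading arrives. The class name, the topic names, and the 5-minute threshold are hypothetical, not anything Mycodo or Home Assistant provides.

```python
from datetime import datetime, timedelta


class StalenessTracker:
    """Remember when each topic last reported and flag the ones gone quiet."""

    def __init__(self, max_age=timedelta(minutes=5)):
        # Assumption: 5 minutes stands in for "more than a few minutes".
        self.max_age = max_age
        self.last_seen = {}  # topic name -> datetime of the latest reading

    def record(self, topic, when=None):
        """Call this whenever a reading for `topic` arrives."""
        self.last_seen[topic] = when or datetime.now()

    def stale_topics(self, now=None):
        """Return a sorted list of topics whose last reading is older than max_age."""
        now = now or datetime.now()
        return sorted(
            topic
            for topic, seen in self.last_seen.items()
            if now - seen > self.max_age
        )
```

A non-empty result from `stale_topics()` would be the point to send a notification or trigger a restart. A topic that has never reported at all is not tracked here, so this only catches sensors that worked and then stopped.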

Barth95 · June 5, 2023, 1:48pm

Does the mycodo service still run after the error occurs? If not, perhaps there is a way to automatically restart the service?
A rough workaround would be to have a cron job restart the mycodo service every X hours. Not as clean as having the error trigger a restart though.
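
For an error-triggered restart rather than a blind timed one, something along these lines might work as a rough sketch. It scans the daemon log for the "no free workers" message from the traceback above and only then runs the same restart command. Assumptions: the log lives at /var/log/mycodo/mycodo.log, entries start with a timestamp in the format shown in the traceback, and the script runs from cron as a user allowed to call sudo without a password. The function names and the 5-minute window are made up for illustration.

```python
import re
import subprocess
from datetime import datetime, timedelta

# Assumption: location of the Mycodo daemon log; adjust if yours differs.
LOG_PATH = "/var/log/mycodo/mycodo.log"
# Text taken from the Pyro5 CommunicationError in the traceback above.
MARKER = "no free workers"
# Matches the leading "2023-05-28 10:28:31,408 - " of a log entry.
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} - ")


def error_seen_since(lines, cutoff, marker=MARKER):
    """Return True if `marker` appears in a log entry stamped at or after `cutoff`."""
    current = None  # timestamp of the entry the current line belongs to
    for line in lines:
        match = TS_RE.match(line)
        if match:
            current = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        # Traceback lines carry no timestamp, so they inherit the previous one.
        if marker in line and current is not None and current >= cutoff:
            return True
    return False


def check_and_restart(window_minutes=5, now=None):
    """Restart the daemon if the error was logged within the last window."""
    now = now or datetime.now()
    cutoff = now - timedelta(minutes=window_minutes)
    with open(LOG_PATH, errors="replace") as log_file:
        if error_seen_since(log_file, cutoff):
            subprocess.run(["sudo", "mycodo-commands", "restart-daemon"], check=True)
            return True
    return False
```

Calling `check_and_restart()` from a cron job every 5 minutes, matching the window, should avoid restarting repeatedly for the same old log entry. It is still only a workaround and does nothing about the underlying cause.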

Lucid3y3 · June 5, 2023, 5:49pm

This is not a solution, this is a workaround.
The proper solution would be to figure out how to prevent the error from happening in the first place.
You really shouldn’t be starting and stopping the Mycodo services manually for any reason other than troubleshooting.