daemonshepherd internals

Architecture

The central part of daemonshepherd is the daemon controller object, an instance of seismometer.daemonshepherd.controller.Controller. This object runs the main event loop (Controller.loop()), which starts the daemons, restarts them when they terminate, polls the daemons’ STDOUT and STDERR to log their output, polls the control socket (if any), and executes commands received on it.

Controller object keeps daemons’ handles, restart queue, poll object for filehandles (daemons’ outputs and control socket connections), and config loader callback function. Controller also sets up handlers for signals (SIGHUP, SIGINT, SIGTERM).

Restart queue (seismometer.daemonshepherd.controller.RestartQueue) tracks the state of the daemons (started, stopped, died), their restart strategy, and decides when to restart each one that died (the queue contains an algorithm for increasing restart backoff).

Daemon handle (seismometer.daemonshepherd.daemon.Daemon) carries information about how to start and stop the daemon (including its initial working directory, environment, user and group to run as, and so on), and remembers the PID of the running process and the reading end of the process’ STDOUT/STDERR. Comparing two daemon handles with the == and != operators tells whether their definitions are the same. This makes it possible to tell which daemons to restart when reloading config.

Daemon handle can also remember arbitrary metadata that doesn’t strictly belong to the handle, like restart strategy or start priority.

Control socket (seismometer.daemonshepherd.control_socket.ControlSocket and seismometer.daemonshepherd.control_socket.ControlSocketClient) is a unix socket that is removed from disk when closed. Reading/writing automatically converts messages from/to JSON lines. Messages that come through a control socket connection are treated by the controller object as administrative commands. For details about the protocol, see Administrative control channel.

Daemonizing daemonshepherd

Fork as a daemon process

seismometer.daemonshepherd.self_detach.detach(new_cwd=None)
Parameters:new_cwd – directory to chdir() to (None if no change needed)

Detach current program from terminal (fork() + exit()).

Detached (child) process will have STDIN, STDOUT and STDERR redirected to /dev/null.

seismometer.daemonshepherd.self_detach.detach_succeeded()

Acknowledge success of detaching child to the parent.

seismometer.daemonshepherd.self_detach.child_process()

Operations to initialize child process after detaching.

This consists mainly of redirecting STDIN, STDOUT and STDERR to /dev/null.

NOTE: This is not the place to acknowledge success. There are other operations, like creating listening sockets. See detach_succeeded().

seismometer.daemonshepherd.self_detach.parent_process()

Operations to do in parent process, including terminating the parent.

Set UID and GID of process

seismometer.daemonshepherd.setguid.setguid(user, group)
Parameters:
  • user – username to change UID to
  • group – group name (or list of group names) to change GID to

Set UID and GID of current process. If user is None, UID will not be changed. If group is None, then GID will be set to the primary group of user. If both user and group are None, neither UID nor GID will be changed.
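
The rules above can be sketched as a lookup step that decides which IDs to apply. This is only an illustration, not the module's code: resolve_ids is a hypothetical helper that resolves names with the standard pwd and grp modules and does not change the credentials of the process. How a list of groups is split between the GID and supplementary groups is an assumption.

```python
import grp
import pwd


def resolve_ids(user, group):
    """Return (uid, gid, extra_gids); None means "leave unchanged".

    Hypothetical helper mirroring the documented rules of setguid().
    """
    uid = gid = None
    extra_gids = []
    if user is not None:
        entry = pwd.getpwnam(user)
        uid = entry.pw_uid
        # With no group given, fall back to the primary group of the user.
        gid = entry.pw_gid
    if group is not None:
        names = [group] if isinstance(group, str) else list(group)
        gids = [grp.getgrnam(name).gr_gid for name in names]
        if gids:
            # Assumption: the first listed group becomes the GID and the
            # remaining ones become supplementary groups.
            gid, extra_gids = gids[0], gids[1:]
    return uid, gid, extra_gids
```

Code that applies such a result usually changes groups before the UID (os.setgroups(), os.setgid(), then os.setuid()), because an unprivileged UID can no longer change its GID.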

Managing pid file

class seismometer.daemonshepherd.pid_file.PidFile(filename)

Handle for pid file. The file will be deleted when the instance is destroyed, if the file ownership was claimed (see claim()).

Parameters:filename (string) – name of the pid file
claim()

Claim the ownership of the pid file. Owner process is responsible for removing it at the end.

close()

Close pid file without removing it.

update()

Update content of pid file with current process’ PID.
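
The life cycle described above (claim, update, close, removal on destruction) can be sketched as follows. SimplePidFile is a hypothetical stand-in, not the PidFile class; in particular, dropping ownership in close() is an assumption derived from "close pid file without removing it".

```python
import os


class SimplePidFile:
    """Hypothetical stand-in showing the claim/update/close life cycle."""

    def __init__(self, filename):
        self.filename = filename
        self.owned = False
        self.handle = None
        self.handle = open(filename, "w")
        self.update()

    def claim(self):
        # Only the owner removes the file when the instance is destroyed.
        self.owned = True

    def update(self):
        # Rewrite the file with the PID of the calling process.
        self.handle.seek(0)
        self.handle.truncate()
        self.handle.write("%d\n" % os.getpid())
        self.handle.flush()

    def close(self):
        # Close without removing; assumed to give up ownership as well.
        if self.handle is not None:
            self.handle.close()
            self.handle = None
        self.owned = False

    def __del__(self):
        if self.handle is not None:
            self.handle.close()
            self.handle = None
        if self.owned:
            os.unlink(self.filename)
```

A process that forks after opening the file would call update() in the child, since the PID written at open time belongs to the parent.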

Controlling child processes

Daemon starter and data dispatcher

class seismometer.daemonshepherd.controller.Controller(load_config, socket_address=None)

Daemons and command channel controller.

The controller responds to commands issued on command channel. Commands include reloading daemons specification and listing status of controlled daemons. See command_*() descriptions for details.

Parameters:
  • load_config – function that loads the file with daemons specification; see daemonshepherd for format documentation
  • socket_address – address of socket for command channel

Available attributes:

load_config

Zero-argument function that loads daemons specification file and returns a dictionary with keys being daemons’ names and values being daemons’ parameters. This function is called on reload command from command channel.

restart_queue

Daemons to restart at appropriate time. RestartQueue instance.

poll

Poll object to check for input from command channel or daemons. seismometer.poll.Poll instance.

daemons

Dictionary with all defined daemons (running, waiting for restart, and stopped). Keys are daemons’ names and values are seismometer.daemonshepherd.daemon.Daemon instances.

socket

Socket on which command channel works. seismometer.daemonshepherd.control_socket.ControlSocket instance.

keep_running

Marker to terminate loop() gracefully from inside of signal handlers.

collect_dead_children()

Function to collect statuses of dead children, mark them dead, and put them in the restart queue.

command_admin_command(**kwargs)

Run an administrative command.

command_cancel_restart(**kwargs)

Cancel pending restart of a process. The process stays stopped if it was waiting for restart and stays started (with backoff reset) if it was started.

Input data needs to contain "daemon" key specifying daemon’s name.

command_list_commands(**kwargs)

List administrative commands for a daemon.

command_ps(**kwargs)

List daemons that are expected, running, or waiting in the restart queue.

Returned data is a list of dictionaries of the following form:

{
  "daemon": <name>,
  "pid": <PID> | None,
  "running": True | False,
  "restart_at": None | <timestamp>
}
command_reload(**kwargs)

Reload daemon specifications. This command calls reload() method.

command_restart(**kwargs)

Restart a daemon. If it was running, it is stopped first. If it was waiting for restart or stopped altogether, it is started immediately. Restart backoff is reset in any case.

Input data needs to contain "daemon" key specifying daemon’s name.

command_start(**kwargs)

Start a stopped daemon. If daemon was waiting for restart, it is started immediately. Restart backoff is reset in any case.

Input data needs to contain "daemon" key specifying daemon’s name.

command_stop(**kwargs)

Stop a running daemon. If daemon was waiting for restart, its restart is cancelled. In either case, restart backoff is reset.

Input data needs to contain "daemon" key specifying daemon’s name.

handle_command(command, client)
Parameters:
  • command – dictionary with command to execute
  • client – control_socket.ControlSocketClient to send response to

Handle a command from command channel. See command_*() methods for details on particular commands.

handle_daemon_output(handle)

Handle output from a daemon according to daemon’s definition: read it all and log it.

loop()

Main operation loop: check output from daemons or command channels, restart daemons that died according to their restart strategy.

Returns when keep_running instance attribute changes to False, but does not stop its children. To stop them, use shutdown().

reload()
Returns:True when reload was successful, False on error

Reload daemon specifications from configuration and converge list of running daemons with expectations list.

Method resets the restart queue, trying to start all the missing daemons now.
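
Converging running daemons with the expected list amounts to comparing two sets of definitions. The sketch below is a simplified illustration, not the body of reload(): plan_reload is a hypothetical helper, and plain dictionaries stand in for Daemon handles, which the documentation says compare by definition with == and !=.

```python
def plan_reload(running, expected):
    """Compare two {name: definition} mappings.

    Returns (to_stop, to_start) lists of daemon names. A daemon whose
    definition changed appears in both lists, i.e. it gets restarted.
    """
    to_stop = sorted(
        name for name in running
        if name not in expected or running[name] != expected[name]
    )
    to_start = sorted(
        name for name in expected
        if name not in running or running[name] != expected[name]
    )
    return to_stop, to_start
```

Daemons present in both mappings with equal definitions appear in neither list and are left running.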

shutdown()

Shutdown the controller along with all the running daemons.

signal_reload(signum, stack_frame)

Signal handler that reloads daemons specification file.

signal_shutdown(signum, stack_frame)

Signal handler that shuts down the controller.

waitpid(daemon)
Parameters:daemon – daemon handle

Wait for daemon to terminate, while still reading logs from all daemons.

Note: control socket is not processed here.

class seismometer.daemonshepherd.controller.RestartQueue

Schedule for daemon restarts.

add(daemon_name, backoff)
Parameters:
  • daemon_name (string) – name of daemon to add
  • backoff (list of integers) – backoff times (in seconds) for consecutive restarts

Register daemon with its restart strategy.

cancel_restart(name)
Parameters:name – daemon, for which restart is cancelled

Abort any pending restart of a daemon.

clear()

Clear the restart queue, including restart strategies and queued daemons.

daemon_died(name, exit_code=None, signame=None)
Parameters:
  • name – daemon that has died
  • exit_code – exit code of the daemon or None if it died on signal
  • signame – signal name that terminated the daemon or None if the daemon exited

Notify restart queue that a daemon has just died. The queue schedules the daemon for restart according to the restart strategy (see add()).

List of daemons ready to restart can be retrieved using get_restart_ready().

daemon_started(name)
Parameters:name – daemon that has been started

Notify restart queue that a daemon has just been started.

daemon_stopped(name)
Parameters:name – daemon that has been stopped

Notify restart queue that a daemon has just been stopped. Method resets backoff time for the daemon.

get_restart_ready()
Returns:list of names of daemons ready to restart

List daemons that are ready to restart (the ones for which restart time already passed).

Returned daemons are removed from restart queue.

list_restarts()
Returns:list of {"name": daemon_name, "restart_at": timestamp} dicts

List all daemons scheduled for restart along with their restart times.

Method intended for queue inspection.

remove(daemon_name)
Parameters:daemon_name (string) – name of daemon to remove

Unregister daemon from the queue.

seismometer.daemonshepherd.controller.DEFAULT_BACKOFF = [0, 5, 15, 30, 60]

List of backoff times for default restart strategy.
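
One way to read such a list is as the delay before the first, second, third, and later restarts of a daemon that keeps dying. The sketch below illustrates that reading; BackoffSchedule is a hypothetical class, and repeating the last delay once the list is exhausted is an assumption, not documented behaviour of RestartQueue.

```python
import time

DEFAULT_BACKOFF = [0, 5, 15, 30, 60]


class BackoffSchedule:
    """Hypothetical per-daemon schedule with increasing restart delays."""

    def __init__(self, backoff=DEFAULT_BACKOFF):
        self.backoff = list(backoff)
        self.position = 0

    def died(self, now=None):
        """Return the timestamp at which the daemon should be restarted."""
        now = time.time() if now is None else now
        delay = self.backoff[self.position]
        # Assumption: once the list is exhausted, the last delay repeats.
        if self.position < len(self.backoff) - 1:
            self.position += 1
        return now + delay

    def reset(self):
        """Start over from the first delay (e.g. after a manual stop)."""
        self.position = 0
```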

exception seismometer.daemonshepherd.controller.ControlCommandError

Error in control command execution.

Unix sockets

class seismometer.daemonshepherd.control_socket.ControlSocket(address)

Unix stream listening socket, bound to address. Connections process (encode and decode) line-based JSON messages.

Parameters:address (string) – address to bind to
accept()
Return type:ControlSocketClient

Accept new connection on this socket.

close()

Close the socket, possibly removing the file (unix socket).

fileno()

Return file descriptor for this socket.

Method intended for select.poll().

class seismometer.daemonshepherd.control_socket.ControlSocketClient(socket)

Client socket wrapper for line-based JSON communication.

Parameters:socket – connection to client
close()

Close the socket.

fileno()

Return file descriptor for this socket.

Method intended for select.poll().

read(blocking=False)
Return type:dict

Read single line of JSON hash and decode it.

This method is non-blocking by default; if no more data is ready for reading, it returns None immediately.

When connection was closed, this method returns seismometer.daemonshepherd.filehandle.EOF.
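
The three possible outcomes of a non-blocking read (a decoded message, None, or EOF) can be sketched with a small buffered reader. JsonLineReader and the local EOF marker are hypothetical stand-ins written for illustration; the sketch covers only the non-blocking mode.

```python
import json
import socket

EOF = object()  # stand-in for seismometer.daemonshepherd.filehandle.EOF


class JsonLineReader:
    """Hypothetical non-blocking reader of JSON lines from a socket."""

    def __init__(self, conn):
        conn.setblocking(False)
        self.conn = conn
        self.buffer = b""
        self.closed = False

    def read(self):
        if b"\n" not in self.buffer and not self.closed:
            try:
                chunk = self.conn.recv(4096)
            except BlockingIOError:
                chunk = None  # nothing to read right now
            if chunk == b"":
                self.closed = True  # peer closed the connection
            elif chunk:
                self.buffer += chunk
        if b"\n" in self.buffer:
            # Hand out one complete line and keep the rest buffered.
            line, self.buffer = self.buffer.split(b"\n", 1)
            return json.loads(line.decode("utf-8"))
        return EOF if self.closed else None
```

Returning a dedicated EOF marker instead of None lets the caller tell "no data yet" apart from "connection closed" without raising exceptions inside a poll loop.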

send(message)
Parameters:message (dict, list or scalar) – data structure to serialize as JSON and send to the client

Send a JSON message to connected client.

File handle routines and constants

seismometer.daemonshepherd.filehandle.set_close_on_exec(handle)
Parameters:handle – file handle or file descriptor to set close-on-exec flag on

Set FD_CLOEXEC flag on a file handle or descriptor.

seismometer.daemonshepherd.filehandle.set_nonblocking(handle)
Parameters:handle – file handle or file descriptor

Set file handle to non-blocking mode.

seismometer.daemonshepherd.filehandle.EOF = <EOF>

Marker to be returned by read() methods when the connection or pipe is closed.

Running external program as daemon

seismometer.daemonshepherd.daemon.build(spec)
Parameters:spec – dictionary with daemon specification
Returns:Daemon

Build a Daemon instance according to specification.

This function always adds "stop" administrative command, either from daemon specification or by supplying default signal SIGTERM to process group.

TODO: Describe what this specification looks like (it comes from config file).

class seismometer.daemonshepherd.daemon.Daemon(start_command, admin_commands, metadata=None)

Single daemon representation and interaction channel. A daemon can be started or stopped.

Daemon uses the administrative command "stop" to shut down; the same command is also run in the __del__() method.

To set or read metadata (opaque to this class), use dictionary operations (get value, set value, del, in to check key existence, len(), iteration over keys).

Parameters:
  • start_command – command used to start the daemon
  • admin_commands – dictionary with administrative commands
  • metadata – dictionary with additional information about daemon
close()

Close read end (this process’ end) of daemon’s output.

command(cmd, env=None)
Parameters:cmd – name of the command to run
Returns:(code, output) tuple ((None, None) for signal command)

Run an administrative command.

If the command defined was a signal, (None, None) is returned.

Returned code will be a non-negative exit code, negative signal number, or None if the exit code couldn’t be collected.

Returned output will be a string, or None if the command was not set to have its STDOUT collected.

commands()
Returns:list of command names

Return list of administrative commands available for this daemon.

fileno()

Return file descriptor for the daemon’s pipe (None if daemon’s output is not intercepted).

Method intended for select.poll().

has_command(cmd)
Parameters:cmd – name of the command

Check if the daemon has particular administrative command.

is_alive()

Check if the daemon is still alive.

pid()

Return PID of the daemon (None if daemon is stopped).

readline()

Read a single line from daemon’s output. If nothing is ready to be read, or if daemon’s output is not intercepted, None is returned (the call is non-blocking).

Method returns seismometer.daemonshepherd.filehandle.EOF when the child terminated or otherwise closed its STDOUT.

reap()

Close our end of daemon’s STDOUT and wait for daemon’s termination.

replace_commands(source)
Parameters:source (Daemon) – source of admin commands

Replace admin commands of this instance with commands from source.

start()

Start the daemon.

stop()

Stop the daemon.

class seismometer.daemonshepherd.daemon.Command(command, command_name=None, environment=None, cwd=None, stdout=None, user=None, group=None)

External command representation for doing fork() + exec() in a repeatable manner.

Class has defined operators == and !=, so objects are compared according to command line and its run environment (variables, CWD, STDOUT).

Parameters:
  • command – command to be run (could be a shell snippet)
  • environment (dict with string:string mapping) – environment variables to be added/replaced in current environment
  • command_name – command name (argv[0]) to be passed to exec()
  • cwd – directory to run command in
  • stdout – where to direct output from the command
  • user – user to run as
  • group – group to run as (defaults to primary group of user)

Command’s output can be left as it is for the parent process (stdout set to None), silenced (DEVNULL), or intercepted (PIPE).

DEVNULL = <Command.DEVNULL>

Constant for directing command’s output to /dev/null.

PIPE = <Command.PIPE>

Constant for directing command’s output through a pipe.

run(environment=None)
Returns:(pid, read_end) or (pid, None)

Run the command within its environment. Child process starts its own process group, so if it’s a shell command, it’s easier to kill whole set of processes. STDIN of the child process is always redirected to /dev/null.

If stdout parameter to constructor was PIPE, read end of the pipe (file object) is returned in the tuple.
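
The behaviour described for run() can be approximated with the standard subprocess module. This is a sketch, not the Command class: run_command is a hypothetical helper, it uses start_new_session (a new session, which also implies a new process group) instead of a bare fork() + exec(), and merging STDERR into the pipe is an assumption.

```python
import subprocess


def run_command(command, intercept=True):
    """Start a shell command and return its subprocess.Popen object.

    Hypothetical helper: process.pid and process.stdout play the roles
    of pid and read_end (process.stdout is None when not intercepted).
    """
    process = subprocess.Popen(
        command,
        shell=True,
        stdin=subprocess.DEVNULL,  # child reads /dev/null, not our STDIN
        stdout=subprocess.PIPE if intercept else None,
        stderr=subprocess.STDOUT if intercept else None,  # assumption
        start_new_session=True,  # child leads its own process group
    )
    return process
```

Because the child leads its own process group, a shell snippet and everything it spawned can be signalled together with os.killpg(process.pid, signum).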

class seismometer.daemonshepherd.daemon.Signal(sig, group=False)

Signal representation for administrative commands for Daemon.

Converting an instance to integer (int(instance)) results in signal number.

Parameters:
  • sig – signal number or name
  • group – whether to send signal to process group or just process

Signal name ignores case and prepends “SIG” prefix if necessary, so any of these names is valid: "term", "sigterm", "TERM", "SIGTERM".
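
The name normalization can be sketched with the standard signal module. resolve_signal is a hypothetical helper shown for illustration, not part of the Signal class.

```python
import signal


def resolve_signal(sig):
    """Return a signal number for an int or a case-insensitive name."""
    if isinstance(sig, int):
        return int(sig)
    name = sig.upper()
    if not name.startswith("SIG"):
        name = "SIG" + name
    # signal.Signals is an enum; lookup by name raises KeyError
    # for unknown signal names.
    return int(signal.Signals[name])
```

With a number in hand, os.kill(pid, number) targets a single process and os.killpg(pid, number) targets the whole process group.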

send(pid)
Returns:True if signal was sent successfully, False on error

Send a signal to process or process group.

Administrative control channel

daemonshepherd allows controlling its supervised daemons through a unix socket. The protocol used for communication is a synchronous exchange of JSON documents, each in its own line.

Requests closely resemble what daemonshepherd command allows (see Commands). Command name is specified as command key, and arguments, if any, are passed as keys along with command.

Response is a document {"status": "ok"} or {"status": "ok", "result": ...}, depending on the command called. Errors are signaled with {"status": "error", "reason": "..."}.

Available requests

  • {"command": "reload"} – reload daemons definition file
    • no data returned, just {"status": "ok"}
  • {"command": "ps"} – list daemons’ names (all that were defined in configuration, currently running ones, and the ones with restart pending)
    • response result: {"result": [<info1>, <info2>, ...], "status": "ok"}
    • <infoX> is a hash containing information about the daemon: {"daemon": <name>, "pid": <PID> | null, "running": true | false, "restart_at": null | <timestamp>}
  • {"command": "start", "daemon": <name>} – start a daemon that is stopped or waits in backoff for restart
    • no data returned, just {"status": "ok"}
  • {"command": "stop", "daemon": <name>} – stop a daemon that is running or cancel its restart if it is waiting in backoff
    • no data returned, just {"status": "ok"}
  • {"command": "restart", "daemon": <name>} – restart running daemon (immediately if it waits in backoff) or start stopped one
    • no data returned, just {"status": "ok"}
  • {"command": "cancel_restart", "daemon": <name>} – cancel pending restart of a daemon. If daemon was running, nothing changes. If daemon was waiting in backoff timer, backoff is reset and the daemon is left stopped.
    • no data returned, just {"status": "ok"}
  • {"command": "admin_command", "daemon": <name>, "admin_command": <command>} – run an administrative command according to daemon’s definition
    • response result: {"status": "ok"} if the administrative command was a signal to be sent and {"result": <result>, "status": "ok"} if it was a command to run
    • <result> is a hash of one of two forms: {"exit": <number>, "output": <output>} or {"signal": <number>, "output": <output>}, with <output> being command’s output on STDOUT+STDERR (string)
  • {"command": "list_commands", "daemon": <name>} – list administrative commands available for this daemon
    • response result: {"result": [<command1>, <command2>, ...], "status": "ok"}
    • <commandX> is a string

Commands that operate on daemons (start, stop, restart, cancel_restart) always reset backoff, even if nothing was changed (e.g. stopping an already stopped daemon).
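
The request and response shapes listed above can be handled with two small functions. This is a client-side sketch under stated assumptions: encode_request, decode_reply and ControlError are hypothetical names, and the daemon name used in the usage note is made up.

```python
import json


class ControlError(Exception):
    """Raised when a reply carries "status": "error"."""


def encode_request(command, **arguments):
    """Build one request line: a JSON object followed by a newline."""
    request = dict(arguments, command=command)
    return (json.dumps(request, sort_keys=True) + "\n").encode("utf-8")


def decode_reply(line):
    """Return the "result" of a reply (None if absent) or raise."""
    reply = json.loads(line.decode("utf-8"))
    if reply.get("status") == "ok":
        return reply.get("result")
    raise ControlError(reply.get("reason", "unknown error"))
```

For example, encode_request("stop", daemon="collectd") produces the line to write to a connected AF_UNIX stream socket, and the next line read from that socket is passed to decode_reply(). The socket path depends on how daemonshepherd was started.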