daemonshepherd

Synopsis

daemonshepherd [options] --daemons=<specfile>
daemonshepherd [options] --exec=<daemon-name>=<command> ...
daemonshepherd [options] reload
daemonshepherd [options] list
daemonshepherd [options] start <daemon-name>
daemonshepherd [options] stop <daemon-name>
daemonshepherd [options] restart <daemon-name>
daemonshepherd [options] cancel-restart <daemon-name>
daemonshepherd [options] list-commands <daemon-name>
daemonshepherd [options] command <daemon-name> <command-name>

Description

daemonshepherd is a tool for keeping other tools running. This task consists of starting the tools, capturing their STDOUT, and restarting them if they die. This way the user can focus on the work the tool needs to do instead of reimplementing daemonization and logging over and over again.

Usage

Running daemonshepherd without any command starts it in daemon supervisor mode. By default, daemonshepherd runs in the foreground and prints warnings to STDERR. Either --daemons or at least one --exec option is required in this mode.

Commands

daemonshepherd list

List daemons that are currently defined, one JSON object per line.

daemonshepherd reload

Tell daemonshepherd to reload its configuration. This is the same as sending it the SIGHUP signal.

daemonshepherd start <daemon-name>

Start the specified daemon.

daemonshepherd stop <daemon-name>

Stop the specified daemon.

daemonshepherd restart <daemon-name>

Restart (stop and then start) the specified daemon.

daemonshepherd cancel-restart <daemon-name>

Cancel a pending restart of the specified daemon.

daemonshepherd list-commands <daemon-name>

List administrative commands defined for the specified daemon.

daemonshepherd command <daemon-name> <command-name>

Run an administrative command defined for the specified daemon.

Options

Most of the options are only meaningful when daemonshepherd runs as a supervisor. The exception is --socket, which specifies the administrative socket of a running daemonshepherd.

--daemons <specfile>

Specification of daemons to start (see Configuration for details). This option is mutually exclusive with --exec.

--exec <daemon-name>=<command>

Simplified supervision mode. All daemons specified with --exec will run with the same options (--user, --group, --cwd, --env, --stdout, --restart) and will be stopped with the SIGTERM signal.

This option is mutually exclusive with --daemons.

This option may be used multiple times.
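The following minimal Python sketch shows how `<daemon-name>=<command>` arguments could be split. It assumes the daemon name ends at the first `=`, so the command itself may contain further `=` characters. Rejecting duplicate names is also an assumption. `parse_exec_option` and `parse_exec_options` are hypothetical helpers and are not part of daemonshepherd.

```python
def parse_exec_option(value: str) -> tuple[str, str]:
    """Split one --exec argument into (daemon name, command) (hypothetical helper)."""
    # Assumption: the name ends at the first "=", so the command part may
    # itself contain "=" (e.g. "FOO=1 printenv FOO").
    name, sep, command = value.partition("=")
    if not sep or not name or not command:
        raise ValueError(f"expected <daemon-name>=<command>, got {value!r}")
    return name, command


def parse_exec_options(values: list[str]) -> dict[str, str]:
    """Collect repeated --exec arguments; duplicate names are rejected (assumption)."""
    daemons: dict[str, str] = {}
    for value in values:
        name, command = parse_exec_option(value)
        if name in daemons:
            raise ValueError(f"daemon {name!r} defined twice")
        daemons[name] = command
    return daemons
```

For example, `parse_exec_options(["web=python3 -m http.server", "env=FOO=1 printenv FOO"])` keeps `FOO=1 printenv FOO` intact as the second command.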

--socket <path>

Unix socket path to listen on for administrative commands.

--pid-file <path>

Path to the file with the PID of the daemonshepherd instance.

--background

Detach from terminal and change working directory to /.

--logging <config>

Logging configuration, in JSON or YAML format (see Logging configuration for details). By default, daemonshepherd logs to STDERR, or to syslog when running with --background.

--silent

Don’t log anywhere. This option is overridden by --logging.

--stderr

Log to STDERR. This option is overridden by --logging.

--syslog

Log to syslog. This option is overridden by --logging.

--user <user>

User to run daemonshepherd as.

--group <group>

Group to run daemonshepherd as.

--cwd <directory>

Default working directory for daemons.

--env <name>=<value>

Default environment variables for daemons. This option may be used multiple times to specify multiple variables.

--stdout <destination>

Default output destination for daemons. Valid values are console, /dev/null, and log.

--restart <strategy>

Restart strategy, described as a comma-separated list of backoff intervals. See the Restart strategy section for details.
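The following Python sketch shows how a value such as `0,5,15,30,60` maps to a list of backoff intervals. It assumes the intervals are whole, non-negative numbers of seconds. `parse_restart_strategy` is a hypothetical helper, not daemonshepherd's own parser.

```python
def parse_restart_strategy(spec: str) -> list[int]:
    """Turn a --restart value like "0,5,15,30,60" into seconds (hypothetical helper)."""
    # Assumption: intervals are whole, non-negative seconds.
    # int() tolerates surrounding spaces and raises ValueError on empty parts.
    intervals = [int(part) for part in spec.split(",")]
    if any(interval < 0 for interval in intervals):
        raise ValueError(f"negative backoff interval in {spec!r}")
    return intervals
```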

Configuration

The daemons specfile (YAML format) describes how to start and stop supervised daemons. Such a specfile may look like this:

defaults:
  environment:
    PYTHONPATH: lib

daemons:
  collectd:
    user: collectd
    start_command: /usr/sbin/collectd -f -C ...
  # ...

Daemons in the specfile are defined under a hash called daemons. Each daemon has a name by which it is referred to in administrative commands (see Commands).

A daemon can have the following variables:

  • start_command – command used to start the daemon (can be a shell command, too); daemon is started in its own process group and should not try to detach from terminal
  • argv0 – custom process name (argv[0]), though under Linux it’s a little less useful than it sounds (it only shows up in some ps(1) invocations, like ps -f)
  • stop_signal – signal (number or name, like SIGTERM or TERM) to stop the daemon with; if specified, it’s delivered to the daemon process only; if not specified, it defaults to SIGTERM and is delivered to the daemon’s process group
  • stop_command – command used to stop the running daemon; it will be executed with the same environment and working directory as start_command, with $DAEMON_PID set to the PID of the daemon; if both stop_signal and stop_command are defined, stop_command takes precedence
  • user, group – username and group name to run as (both start_command and stop_command will be run with these credentials); group can be a list of group names; obviously this requires daemonshepherd to be run as root
  • cwd – working directory to start daemon in
  • environment – additional environment variables to set (useful for setting $PYTHONPATH or similar)
  • stdout – what to do with the daemon’s STDOUT and STDERR; the following values are recognized:
    • console or undefined – pass the output directly to the terminal
    • /dev/null – redirect output to /dev/null
    • log – intercept STDOUT/STDERR and log it with the logging module; output will be logged by the logger daemon.<name>, so it can be filtered in the logging configuration
  • restart – restart strategy; see Restart strategy section for details
  • start_priority – start priority (lower number starts earlier); defaults to 10
  • commands – additional administrative commands for the daemon; see Daemon’s administrative commands section for details
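The stop_signal rules above can be sketched in Python as follows. The sketch accepts a number, "SIGTERM" or "TERM", and decides whether the signal goes to the process or the whole process group. `parse_signal` and `stop_plan` are hypothetical helpers. stop_command is deliberately left out, even though it takes precedence when defined.

```python
import signal


def parse_signal(value) -> signal.Signals:
    """Accept a signal as a number, "SIGTERM" or "TERM" (hypothetical helper)."""
    if isinstance(value, int):
        return signal.Signals(value)
    text = str(value).strip().upper()
    if text.isdigit():
        return signal.Signals(int(text))
    if not text.startswith("SIG"):
        text = "SIG" + text
    return signal.Signals[text]  # raises KeyError for unknown names


def stop_plan(daemon: dict) -> tuple[signal.Signals, bool]:
    """Return (signal, send_to_process_group) following the stop_signal rules.

    stop_command is not handled here; when defined, it takes precedence.
    """
    if "stop_signal" in daemon:
        return parse_signal(daemon["stop_signal"]), False  # daemon process only
    return signal.SIGTERM, True  # default: the daemon's whole process group
```

On Unix, the group case would map to `os.killpg()` and the single-process case to `os.kill()`.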

Default values for the variables listed above can be stored in the defaults hash.

NOTE: a daemon’s environment key replaces the one from defaults; the two are not merged, so it’s not possible to add just one environment variable on top of the defaults.
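The replace-not-merge rule can be illustrated with a short Python sketch. It assumes each top-level key of a daemon overrides the same key in defaults as a whole. `effective_spec` is a hypothetical name, and `my-tool` is a placeholder command.

```python
def effective_spec(defaults: dict, daemon: dict) -> dict:
    """Apply daemon settings on top of defaults, one top-level key at a time."""
    spec = dict(defaults)  # shallow copy: nested dicts are shared, not merged
    spec.update(daemon)    # a daemon's "environment" replaces the default one
    return spec


defaults = {"user": "seismometer", "environment": {"PYTHONPATH": "lib"}}
daemon = {"start_command": "my-tool", "environment": {"LANG": "C"}}
spec = effective_spec(defaults, daemon)
print(spec["environment"])  # {'LANG': 'C'}; PYTHONPATH is gone
```

To keep PYTHONPATH, the daemon’s own environment has to list it again.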

Daemon’s administrative commands

A daemon can have some special commands available, like reloading its configuration or reopening log files. Such commands are defined under the commands field in the daemon specification.

A command can specify either a command to run or a signal to send. Some of the variables that can be set for the daemon itself can also be set for a command; if unset, the command inherits the value from the daemon. Allowed variables are: user, group, cwd, environment, argv0.

By default, a command that specifies signal delivers the signal only to the daemon process. This can be changed by setting process_group to true.

The command’s environment will have $DAEMON_PID set to the daemon’s PID (or an empty string if the daemon is not running).

NOTE: daemonshepherd will wait for administrative commands to terminate, so they should not be long-running operations.

daemons:
  example-daemon:
    user: nobody
    start_command: /usr/sbin/example-daemon ...
    commands:
      before-start:
        user: root
        command: >-
          mkdir -p /var/log/example;
          chown nobody: /var/log/example
      reload:
        signal: SIGHUP
      rotate-logs:
        user: root
        command: >-
          : > /var/log/example/daemon.log;
          kill -USR1 $DAEMON_PID
      murder:
        signal: SIGKILL
        process_group: true

With the configuration above, an operator can now call the following commands:

$ daemonshepherd command example-daemon reload
$ daemonshepherd command example-daemon rotate-logs
$ daemonshepherd command example-daemon murder
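The following Python sketch resolves the settings of an administrative command from the rules above: an unset value is inherited from the daemon, and $DAEMON_PID is always present. It assumes that a command’s environment replaces the daemon’s as a whole, as with defaults. `command_settings` is a hypothetical helper. Running the command or sending the signal is not covered.

```python
from typing import Optional

# Variables a command may override; anything unset falls back to the daemon.
INHERITED_KEYS = ("user", "group", "cwd", "environment", "argv0")


def command_settings(daemon: dict, command: dict, daemon_pid: Optional[int]) -> dict:
    """Resolve the settings an administrative command would run with."""
    settings = {key: command.get(key, daemon.get(key)) for key in INHERITED_KEYS}
    # Assumption: a command's "environment" replaces the daemon's as a whole.
    # $DAEMON_PID is always set, to an empty string if the daemon is not running.
    environment = dict(settings["environment"] or {})
    environment["DAEMON_PID"] = "" if daemon_pid is None else str(daemon_pid)
    settings["environment"] = environment
    return settings
```

With the example above, rotate-logs would run as root, while reload would inherit nobody from example-daemon.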

There are a few commands with special meaning:

  • stop – command that will be used to stop the daemon; setting stop_command or stop_signal is a shorthand for defining this command
  • before-start – command that will be executed just before the daemon is started or restarted; a non-zero exit code prevents the daemon from being started; handy for creating a socket directory in /var/run for a daemon that otherwise runs as a non-privileged user
  • after-crash – command that will be executed immediately after the daemon terminates unexpectedly (but not after a failed before-start); the command will have either $DAEMON_EXIT_CODE or $DAEMON_SIGNAL set in its environment, depending on how the daemon terminated

Note that these commands can be invoked in the same manner as any other administrative command, e.g. daemonshepherd command $daemon after-crash, even though they’re not expected to make sense when invoked this way.
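The choice between $DAEMON_EXIT_CODE and $DAEMON_SIGNAL can be sketched in Python on top of `subprocess`. On POSIX, `subprocess` reports "killed by signal N" as the negative return code -N. The exact value format daemonshepherd uses (signal number or name) is not stated here; this sketch assumes the number. `after_crash_environment` is a hypothetical helper.

```python
import subprocess
import sys


def after_crash_environment(returncode: int) -> dict:
    """Map a subprocess return code to the after-crash variables (hypothetical)."""
    # subprocess reports "terminated by signal N" as -N (POSIX only).
    if returncode < 0:
        return {"DAEMON_SIGNAL": str(-returncode)}  # assumption: signal number
    return {"DAEMON_EXIT_CODE": str(returncode)}


# A child that exits with status 3 stands in for a crashing daemon.
child = subprocess.run([sys.executable, "-c", "import sys; sys.exit(3)"])
print(after_crash_environment(child.returncode))  # {'DAEMON_EXIT_CODE': '3'}
```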

Restart strategy

When a daemon dies, it’s restarted after a backoff time. If it dies again, the next backoff interval is used. A list of backoff intervals (expressed as the number of seconds before the next try) is called a restart strategy. Typically it would be an increasing list of integers, so on its first death the daemon is restarted soon, but if it keeps dying, it is restarted less often to limit the load on the machine.

After reaching the last interval R of the strategy, the daemon is restarted every R seconds until it succeeds.

If the child keeps running long enough (how long depends on the current position in the restart strategy), the restart strategy is reset.

If no restart strategy is defined (neither specific to the daemon nor in defaults), the assumed default is [0, 5, 15, 30, 60] (see the seismometer.daemonshepherd.controller module for reference).
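The following Python sketch shows the backoff progression: each death moves one step along the list, and the last interval repeats. It uses the default strategy [0, 5, 15, 30, 60]. The threshold for "running long enough" is not specified here, so `reset()` is left for the caller to invoke. `RestartStrategy` is a hypothetical class, not the one in seismometer.daemonshepherd.controller.

```python
class RestartStrategy:
    """Hand out backoff intervals, repeating the last one indefinitely."""

    def __init__(self, intervals=(0, 5, 15, 30, 60)):
        if not intervals:
            raise ValueError("restart strategy needs at least one interval")
        self.intervals = list(intervals)
        self.position = 0

    def next_backoff(self) -> int:
        """Seconds to wait before the next restart attempt."""
        delay = self.intervals[self.position]
        if self.position < len(self.intervals) - 1:
            self.position += 1  # stay on the last interval once it is reached
        return delay

    def reset(self) -> None:
        """Call once the daemon has stayed up long enough."""
        self.position = 0


strategy = RestartStrategy()
print([strategy.next_backoff() for _ in range(7)])  # [0, 5, 15, 30, 60, 60, 60]
```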

Example daemon spec file

This is an example specification file that starts a set of tools to collect monitoring data (dumb-probe), pass messages to another server (messenger), and store metrics (collectd).

defaults:
  stdout: /dev/null
  environment:
    PYTHONPATH: /usr/lib/seismometer/toolbox
  user: seismometer
  group: seismometer

daemons:
  # Seismometer Toolbox' own daemons: message router and monitoring
  # probe
  messenger:
    start_priority: 1
    # string folded for readability
    start_command: >-
        messenger
        --src=unix:/var/run/messenger/socket
        --dest=tcp:10.4.5.11:24222
        --tagfile=/etc/seismometer/messenger.tags
        --logging=/etc/seismometer/messenger.logging
    commands:
      before-start:
        user: root
        command: >-
          mkdir -p -m 755 /var/run/messenger;
          chown seismometer:seismometer /var/run/messenger
  dumbprobe:
    # string folded for readability
    start_command: >-
        dumb-probe
        --checks=/etc/seismometer/dumbprobe.py
        --dest=unix:/var/run/messenger/socket
        --logging=/etc/seismometer/dumbprobe.logging

  # some daemon that needs to be shut down by command instead of by
  # SIGTERM
  statetip:
    start_priority: 1
    cwd: /var/lib/statetip
    environment:
      ERL_LIBS: /usr/lib/statetip
    # strings folded for readability
    start_command: >-
        statetipd start
        --socket=/var/run/statetip/control
        --config=/etc/statetip.conf
    # shorthand for "commands.stop"
    stop_command: >-
        statetipd stop
        --socket=/var/run/statetip/control
    commands:
      before-start:
        user: root
        command: >-
          mkdir -p -m 750 /var/run/statetip;
          chown seismometer:seismometer /var/run/statetip
      reload:
        command: statetipd reload --socket=/var/run/statetip/control
      brutal-kill:
        signal: SIGKILL

  # custom collectd instance
  collectd:
    start_priority: 1
    user: collectd
    start_command: /usr/sbin/collectd -f -C /etc/collectd/clients.conf
  # a script that counts clients and formats the stats for collectd's
  # protocol; `socat' tool is obviously necessary here
  store-clients:
    # string folded for readability
    start_command: >-
        /etc/seismometer/bin/count-clients
        | socat - unix:/var/run/collectd/clients.sock

Logging configuration

The logging config file is a YAML or JSON document that encodes a dictionary suitable for Python’s logging.config.dictConfig() function. In short, this file requires several keys:

  • "version", always set to 1
  • "root", containing configuration for root logger
    • "level", a minimum severity that will be logged; possible values are "DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL", or "NOTSET"
    • "handlers", a list of names of log handlers (destinations)
  • "handlers", a dictionary of handlers configuration
  • "formatters", a dictionary of formatters configuration

A handler configuration is a dictionary that requires the keys "class" (Python class of the log handler) and "formatter", plus any keys that the handler class constructor requires.

A formatter controls how the message is formatted. It’s a dictionary with a "format" field, which is a Python format string. It can also have a "datefmt" field for the %(asctime)s placeholder.

Some useful placeholders:

%(process)d

PID of the daemon process

%(name)s

name of the logger that produced the message (daemon’s internals)

%(message)s

log message

%(levelname)s

log level (INFO, WARNING, ...)

%(asctime)s

log time, which is formatted according to "datefmt" field of the formatter (see strftime(3) for format details)

%(module)s, %(funcName)s, %(lineno)d

location of the log message origin in the code

Full reference of logging configuration can be found in Python’s documentation: <https://docs.python.org/2/library/logging.config.html> and <https://docs.python.org/2/library/logging.html>.
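The following Python sketch shows how the placeholders above expand. It builds a log record by hand and formats it with the standard `logging.Formatter`. The logger name follows the daemon.<name> convention used for stdout: log. The source file name `example.py`, line number, and message are made up for illustration.

```python
import logging

# The same kind of format string a "formatters" entry would hold.
formatter = logging.Formatter(
    fmt="%(asctime)s %(levelname)s [%(name)s] %(module)s:%(lineno)d %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
)

# Build a record by hand instead of going through a configured logger.
record = logging.LogRecord(
    name="daemon.example",  # logger name pattern used for a daemon with stdout: log
    level=logging.INFO,
    pathname="example.py",  # hypothetical source file; %(module)s becomes "example"
    lineno=42,
    msg="listening on port %d",
    args=(8080,),
    exc_info=None,
)
line = formatter.format(record)
print(line)  # ends with "INFO [daemon.example] example:42 listening on port 8080"
```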

Example logging config

version: 1
root:
  level: INFO
  handlers: [syslog]
formatters:
  syslog_formatter:
    format: "[%(name)s] %(message)s"
handlers:
  syslog:
    class: seismometer.logging.SysLogHandler
    formatter: syslog_formatter
    facility: local0
    process_name: mydaemon
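The file is handed to logging.config.dictConfig(), so the same structure can be tried directly from Python. This minimal sketch uses the JSON form. It substitutes logging.StreamHandler for seismometer.logging.SysLogHandler so that it runs without the seismometer package installed. The handler and formatter names are arbitrary.

```python
import json
import logging
import logging.config

# Same shape as the YAML example, but logging to STDERR via the stdlib handler.
CONFIG_JSON = """
{
  "version": 1,
  "root": {"level": "INFO", "handlers": ["console"]},
  "formatters": {
    "console_formatter": {"format": "[%(name)s] %(message)s"}
  },
  "handlers": {
    "console": {
      "class": "logging.StreamHandler",
      "formatter": "console_formatter",
      "stream": "ext://sys.stderr"
    }
  }
}
"""

logging.config.dictConfig(json.loads(CONFIG_JSON))
logging.getLogger("daemon.example").info("configured")  # "[daemon.example] configured" on STDERR
```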

Log handlers handy reference

logging.StreamHandler

Python’s built-in handler logging to terminal. Expects "stream" key, which can be set to "ext://sys.stderr" or "ext://sys.stdout".

seismometer.logging.NullHandler

Seismometer’s own handler that ignores all the logs.

logging.handlers.SysLogHandler

Python’s built-in handler logging to syslog. Expects "address" key, which can specify a path to unix socket that local syslog listens on or an address of a remote syslog.

The formatter for this handler needs to include the daemon’s name[pid] field, e.g. "daemonname[%(process)d]: [%(name)s] %(message)s".

NOTE: This handler can break the daemon during restart of local syslog, which is a serious drawback.

seismometer.logging.SysLogHandler

Seismometer’s own handler logging to local syslog. Expects "facility" (daemon, local0..local7, user, ...) and "process_name" keys (formatter doesn’t need to include daemon’s name).

This handler handles syslog restarts well, but it’s mainly suitable for the top-level logger (which is usually enough).

Signals

daemonshepherd recognizes the following signals:

  • SIGTERM and SIGINT cause termination
  • SIGHUP causes reloading of the daemons specification
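A common way for a Python supervisor to react to these signals is to have the handlers only set flags, which a main loop then acts on. The following sketch shows that pattern. It is Unix-only because of SIGHUP. `SignalFlags` is a hypothetical class and not daemonshepherd's actual implementation.

```python
import signal


class SignalFlags:
    """Record which supervisor signals arrived; a main loop acts on them later."""

    def __init__(self) -> None:
        self.terminate = False
        self.reload = False

    def install(self) -> None:
        # Handlers only set flags; the real work happens outside the handler.
        signal.signal(signal.SIGTERM, self._on_terminate)
        signal.signal(signal.SIGINT, self._on_terminate)
        signal.signal(signal.SIGHUP, self._on_reload)

    def _on_terminate(self, signum, frame) -> None:
        self.terminate = True

    def _on_reload(self, signum, frame) -> None:
        self.reload = True


flags = SignalFlags()
flags.install()
signal.raise_signal(signal.SIGHUP)  # same effect as sending SIGHUP from outside
print(flags.reload, flags.terminate)
```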