DumbProbe

Synopsis

dumb-probe [options] --checks=<checks-file>

Description

DumbProbe is a simple tool that checks whether all the services defined in its config are healthy and submits the results of the checks to a monitoring system.

The checks file is a Python module that defines what should be checked, how, and how often. Results are packed into a Seismometer message and sent to messenger(8) (or a compatible router).

Options

--checks <checks-file>

Python module that defines checks. See Configuration.

--once

Go through the checks immediately, just once, and exit, instead of running the usual infinite loop with a schedule.

This mode of operation is only supported if CHECKS in the checks file is a list or tuple, and it ignores any seismometer.dumbprobe.BaseHandle checks that were defined.

--destination stdout | tcp:<host>:<port> | udp:<host>:<port> | unix:<path>

Address to send check results to.

If a unix socket is specified, it is of datagram type, like the one messenger(8) uses.

If no destination was provided, messages are printed to STDOUT.

--logging <config>

Logging configuration, in JSON or YAML format (see Logging configuration for details). The default is to log warnings to STDERR.

Configuration

The configuration file is a Python module. The only thing expected from the module is that it defines a CHECKS object, which will usually be a list of check objects (typically instances of seismometer.dumbprobe.BaseCheck subclasses). DumbProbe takes care of scheduling runs of each check according to its specified interval.

If any other scheduling logic is needed, CHECKS can be an arbitrary Python object that has a run_next() method, which is responsible for waiting for the next check and running it. This method is called with no arguments and should return a sequence (e.g. a list) of messages that are either seismometer.message.Message objects or dictionaries (serializable to JSON). These messages are sent to DumbProbe’s destination.
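As an illustration of the run_next() contract described above, here is a minimal sketch of a custom CHECKS object. The class name RoundRobinChecks, its constructor arguments, and the dictionary keys are hypothetical. The sketch also assumes that DumbProbe accepts an empty sequence when a check produces nothing, which the text above does not state.

```python
import time


class RoundRobinChecks(object):
    """Hypothetical scheduler: runs callables in turn, with a fixed delay."""

    def __init__(self, checks, delay):
        self.checks = list(checks)  # callables taking no arguments
        self.delay = delay          # seconds to wait before each check
        self.position = 0

    def run_next(self):
        # wait for the next check, then run it
        time.sleep(self.delay)
        check = self.checks[self.position]
        self.position = (self.position + 1) % len(self.checks)
        result = check()
        # always return a sequence of messages
        if result is None:
            return []  # assumption: an empty sequence is acceptable
        if isinstance(result, dict):
            return [result]
        return list(result)


# in a checks file this object would be assigned to CHECKS
CHECKS = RoundRobinChecks(
    [lambda: {"n": 1}, lambda: [{"n": 2}, {"n": 3}]],
    delay=0,
)
```

Each call to CHECKS.run_next() then returns the messages of one check, cycling through the callables in order.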

Supported check types

The simplest case of a check is a Python function that produces a dictionary, a seismometer.message.Message object, or a list of these. Such a function is wrapped in a seismometer.dumbprobe.Function object in the CHECKS list.

There are also several built-in classes that facilitate working with external commands and scripts, such as ShellCommand, Nagios, and ShellStream (see Available check classes and Available handle classes).

Typically, a checks file will look somewhat like this:

from seismometer.dumbprobe import *
from seismometer.message import Message, Value
import os
import json

#--------------------------------------------------------------------

def hostname():
    return os.uname()[1]

#--------------------------------------------------------------------

def uptime():
    with open("/proc/uptime") as f:
        return Message(
            aspect = "uptime",
            location = {"host": hostname()},
            value = float(f.read().split()[0]),
        )

def df(mountpoint):
    stat = os.statvfs(mountpoint)
    result = Message(
        aspect = "disk space",
        location = {
            "host": hostname(),
            "filesystem": mountpoint,
        },
    )
    result["free"] = Value(
        stat.f_bfree  * stat.f_bsize / 1024.0 / 1024.0,
        unit = "MB",
    )
    result["total"] = Value(
        stat.f_blocks * stat.f_bsize / 1024.0 / 1024.0,
        unit = "MB",
    )
    return result

def parse_iostat(line):
    if not line.startswith("sd") and not line.startswith("dm-"):
        return ()
    (device, tps, rspeed, wspeed, rbytes, wbytes) = line.split()
    result = Message(
        aspect = "disk I/O",
        location = {
            "host": hostname(),
            "device": device,
        },
    )
    result["read_speed"] = Value(float(rspeed), unit = "kB/s")
    result["write_speed"] = Value(float(wspeed), unit = "kB/s")
    result["transactions"] = Value(float(tps), unit = "tps")
    return result

#--------------------------------------------------------------------

CHECKS = [
    # function called every 60s with empty arguments list
    Function(uptime, interval = 60),
    # function called every 30 minutes with a single argument
    Function(df, args = ["/"],     interval = 30 * 60),
    Function(df, args = ["/home"], interval = 30 * 60),
    Function(df, args = ["/tmp"],  interval = 30 * 60),
    # shell command (`sh -c ...'), prints list of JSON objects to
    # STDOUT
    ShellCommand(
        "/usr/local/bin/read-etc-passwd",
        parse = lambda stdout,code: [
            json.loads(l) for l in stdout.strip().split("\n")
        ],
        interval = 60
    ),
    # external command (run without `sh -c'), prints single number
    ShellCommand(
        ["/usr/local/bin/random", "0.5"],
        parse = lambda stdout,code: Message(
          aspect = "random",
          value = float(stdout),
        ),
        interval = 30,
        host = hostname(),
    ),
    # and two Monitoring Plugins
    Nagios(
        # this one runs without shell
        ["/usr/lib/nagios/plugins/check_load", "-w", "0.25", "-c", "0.5"],
        interval = 10,
        aspect = "load average",
        host = hostname(), service = "load",
    ),
    Nagios(
        # this one runs with shell
        "/usr/lib/nagios/plugins/check_users -w 3 -c 5",
        interval = 60,
        aspect = "wtmp",
        host = hostname(), service = "users",
    ),
    # spawn iostat(1), make it print statistics every 20s, and make
    # them proper Seismometer messages
    ShellStream(["/usr/bin/iostat", "-p", "20"], parse = parse_iostat),
]

Logging configuration

The logging config file is a YAML or JSON document that encodes a dictionary suitable for Python’s logging.config.dictConfig() function. In short, this file requires several keys:

  • "version", always set to 1
  • "root", containing configuration for root logger
    • "level", a minimum severity that will be logged; possible values are "DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL", or "NOTSET"
    • "handlers", a list of names of log handlers (destinations)
  • "handlers", a dictionary of handlers configuration
  • "formatters", a dictionary of formatters configuration

Handler configuration is a dictionary that requires the keys "class" (Python class of the log handler) and "formatter", plus any keys that the handler class constructor requires.

A formatter controls how the message is formatted. It’s a dictionary with a "format" field, which is a Python format string. It can also have a "datefmt" field for the %(asctime)s placeholder.

Some useful placeholders:

%(process)d

PID of the daemon process

%(name)s

name of the logger that produced message (daemon’s internals)

%(message)s

log message

%(levelname)s

log level (INFO, WARNING, ...)

%(asctime)s

log time, which is formatted according to "datefmt" field of the formatter (see strftime(3) for format details)

%(module)s, %(funcName)s, %(lineno)d

location of the log message origin in the code
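To see how these placeholders expand, the following sketch formats one hand-built log record with Python’s standard logging.Formatter. The logger name, message, and date format are made up for the illustration.

```python
import logging

# "format" and "datefmt" correspond to the formatter fields described above
formatter = logging.Formatter(
    fmt="%(asctime)s %(levelname)s [%(name)s] %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
)

# a hand-built record, only for demonstration purposes
record = logging.LogRecord(
    name="example",
    level=logging.WARNING,
    pathname="example.py",
    lineno=1,
    msg="disk %s is almost full",
    args=("/home",),
    exc_info=None,
)

line = formatter.format(record)
print(line)
```

The printed line starts with the timestamp rendered according to "datefmt", followed by the level name, the logger name in brackets, and the message.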

Full reference of logging configuration can be found in Python’s documentation: <https://docs.python.org/2/library/logging.config.html> and <https://docs.python.org/2/library/logging.html>.

Example logging config

version: 1
root:
  level: INFO
  handlers: [syslog]
formatters:
  syslog_formatter:
    format: "[%(name)s] %(message)s"
handlers:
  syslog:
    class: seismometer.logging.SysLogHandler
    formatter: syslog_formatter
    facility: local0
    process_name: mydaemon

Log handlers handy reference

logging.StreamHandler

Python’s built-in handler logging to terminal. Expects "stream" key, which can be set to "ext://sys.stderr" or "ext://sys.stdout".
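A minimal sketch of a configuration that uses this handler, written as the Python dictionary that logging.config.dictConfig() receives. The handler name "stderr" and formatter name "brief" are arbitrary. A file passed with --logging would carry the same structure in YAML or JSON.

```python
import logging
import logging.config

config = {
    "version": 1,
    "root": {"level": "INFO", "handlers": ["stderr"]},
    "formatters": {
        "brief": {"format": "%(levelname)s [%(name)s] %(message)s"},
    },
    "handlers": {
        "stderr": {
            "class": "logging.StreamHandler",
            "formatter": "brief",
            "stream": "ext://sys.stderr",
        },
    },
}

logging.config.dictConfig(config)
logging.getLogger("example").info("logging configured")
```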

seismometer.logging.NullHandler

Seismometer’s own handler that ignores all the logs.

logging.handlers.SysLogHandler

Python’s built-in handler logging to syslog. Expects "address" key, which can specify a path to unix socket that local syslog listens on or an address of a remote syslog.

Formatter for this handler needs to include daemon’s name[pid] field, e.g. "daemonname[%(process)d]: [%(name)s] %(message)s".

NOTE: This handler can break the daemon during restart of local syslog, which is a serious drawback.

seismometer.logging.SysLogHandler

Seismometer’s own handler logging to local syslog. Expects "facility" (daemon, local0..local7, user, ...) and "process_name" keys (formatter doesn’t need to include daemon’s name).

This handler handles syslog restarts well, but it’s mainly suitable for top-level logger (which usually should be enough).

Programming interface

NOTE: Users don’t need to use these classes and functions if they don’t suit their needs. They are merely a proposal, but the author thinks they should at least help somewhat in deployment.

Available check classes

The classes that work with external commands (e.g. ShellCommand or Nagios) assume that if the command is specified as a simple string, it should be run with a shell (/bin/sh -c ...), and if it’s specified as a list, it is run without invoking /bin/sh. The latter is especially important when the command is provided with calculated arguments.
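The difference between the two forms can be sketched with Python’s subprocess module. This is only an illustration of the string-versus-list convention, not necessarily how Seismometer runs commands. The helper run_command() is hypothetical.

```python
import subprocess


def run_command(command):
    """Run a command: a string goes through the shell, a list does not."""
    use_shell = isinstance(command, str)
    proc = subprocess.Popen(command, shell=use_shell, stdout=subprocess.PIPE)
    stdout, _ = proc.communicate()
    return stdout.decode("utf-8"), proc.returncode


# a calculated argument that happens to contain shell syntax
argument = "/tmp; echo injected"

# list form: the argument reaches echo untouched
direct_output, direct_code = run_command(["echo", argument])

# string form: the shell interprets ";" and runs a second command
shell_output, shell_code = run_command("echo " + argument)
```

With the list form the argument is passed verbatim, while with the string form the shell splits it into two commands, which is why the list form is the safer choice for calculated arguments.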

class seismometer.dumbprobe.BaseCheck(interval, aspect=None, location={}, **kwargs)

Base class for checks.

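Parameters:
  • interval – number of seconds between consecutive runs of the check
  • aspect – name of the monitored aspect to be set
  • location – location to be set (dictionary str => str)
  • kwargs – additional location fields to be set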

If aspect, location, or kwargs are provided, all the messages produced by run() are expected to be either seismometer.message.Message instances or dictionaries conforming to the message schema. Both aspect and individual values from location overwrite whatever was set in the produced messages.

Fields defined by this class:

interval

interval at which this check should be run, in seconds

last_run

last time when this check was run (epoch timestamp)

aspect

name of monitored aspect to be set

location

location to be set (dictionary str => str)

check_name()
Returns: check’s name
Return type: string

Method not really defined in this class. If a subclass defines this method, it will be called to get a name of a check the object represents. If left undefined, default name composed of class name, module, and object’s id() will be used.

mark_run()

Update last run timestamp.

next_run()
Returns: epoch time when the check should be run next time

run()
Returns: check result
Return type: seismometer.message.Message, dict, list of these, or None

Run the check.

Implementing method should manually call mark_run() for next_run() to work correctly. To limit problems with unexpected exceptions, mark_run() should be run just at the beginning.
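A minimal sketch of a BaseCheck subclass that follows this advice. The class DirectoryEntries and the keys of the returned dictionary are hypothetical. When Seismometer is not installed, the sketch falls back to a simplified stand-in for BaseCheck that only mimics the fields described above.

```python
import os
import time

try:
    from seismometer.dumbprobe import BaseCheck
except ImportError:
    class BaseCheck(object):
        """Simplified stand-in, only so the sketch runs without Seismometer."""

        def __init__(self, interval, aspect=None, location={}, **kwargs):
            self.interval = interval
            self.aspect = aspect
            self.location = dict(location)
            self.last_run = None

        def mark_run(self):
            self.last_run = time.time()


class DirectoryEntries(BaseCheck):
    """Hypothetical check: count entries in a directory (e.g. a spool)."""

    def __init__(self, path, **kwargs):
        super(DirectoryEntries, self).__init__(**kwargs)
        self.path = path

    def run(self):
        # mark the run first, so an exception below does not prevent
        # the check from being scheduled again
        self.mark_run()
        return {"path": self.path, "entries": len(os.listdir(self.path))}


check = DirectoryEntries(".", interval=60)
result = check.run()
```

Because mark_run() is called before os.listdir(), the last run timestamp is updated even when the directory is missing and run() raises an exception.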

class seismometer.dumbprobe.Function(function, args=[], kwargs={}, **_kwargs)

Plugin to collect a message to send by calling a Python function (or any callable).

Function is expected to return a dict, seismometer.message.Message, a list of these, or None.

Parameters:
  • interval – number of seconds between consecutive checks
  • function – function to run
  • args – positional arguments to pass to the function call
  • kwargs – keyword arguments to pass to the function call
  • _kwargs – keyword arguments to pass to BaseCheck constructor

class seismometer.dumbprobe.ShellCommand(command, parse, **kwargs)

Plugin to run external command and process its STDOUT and exit code with a separate function.

Parameters:
  • command – command to run (string for shell command, or list of strings for direct command to run)
  • parse – function to process command’s output
  • kwargs – keyword arguments to pass to BaseCheck constructor

parse should be a function (or callable) that accepts two positional arguments: first one will be command’s STDOUT, the second will be command’s exit code (or termination signal, if negative).
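A sketch of such a parse function, assuming the command prints one JSON object per line. The function name, the logger name, and the decision to return an empty list on failure are choices made for this illustration only.

```python
import json
import logging

logger = logging.getLogger("checks")  # hypothetical logger name


def parse_json_lines(stdout, code):
    """Turn command output (one JSON object per line) into messages."""
    if code < 0:
        # negative value: the command was terminated by a signal
        logger.warning("command killed by signal %d", -code)
        return []
    if code > 0:
        logger.warning("command exited with code %d", code)
        return []
    # skip blank lines, decode everything else
    return [json.loads(line) for line in stdout.splitlines() if line.strip()]
```

Such a function could be passed as the parse argument in place of the lambda used in the example checks file.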

class seismometer.dumbprobe.Nagios(plugin, aspect, **kwargs)

Plugin to collect state and possibly metrics from a Monitoring Plugin.

Metrics to be recognized need to be specified as described in section Performance data of Monitoring Plugins Development Guidelines.

Parameters:
  • plugin – command to run (string for shell command, or list of strings for direct command to run)
  • aspect – aspect name, as in seismometer.message.Message
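For orientation, the performance data format from the Monitoring Plugins guidelines (label=value[UOM];warn;crit;min;max after a "|" character) can be parsed roughly as below. This is a simplified sketch that reads only the first output line. It is not Seismometer’s parser, and the function name and return shape are hypothetical.

```python
import re
import shlex

# a number optionally followed by a unit of measurement, e.g. "2643MB"
VALUE_RE = re.compile(r"^(-?\d+(?:\.\d+)?)([A-Za-z%]*)$")


def parse_perfdata(output):
    """Return {label: (value, unit)} from the first line of plugin output."""
    first_line = output.split("\n", 1)[0]
    if "|" not in first_line:
        return {}
    perfdata = first_line.split("|", 1)[1]
    try:
        # shlex handles labels quoted with single quotes
        items = shlex.split(perfdata)
    except ValueError:
        return {}
    metrics = {}
    for item in items:
        label, separator, rest = item.partition("=")
        if not separator:
            continue
        # the value is the first of the semicolon-separated fields
        match = VALUE_RE.match(rest.split(";")[0])
        if match is None:
            continue  # e.g. "U", meaning the value is unknown
        metrics[label] = (float(match.group(1)), match.group(2))
    return metrics


load = parse_perfdata(
    "OK - load average: 0.10, 0.20, 0.30"
    "|load1=0.100;0.250;0.500;0; load5=0.200;0.250;0.500;0;"
)
```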

Available handle classes

These classes are for receiving check results from external sources, especially from command line tools that write status information in regular intervals and don’t exit on their own (a good example is vmstat 60, which prints OS statistics every 60 seconds).

class seismometer.dumbprobe.BaseHandle

Base class for handle-based checks.

An instance should start in a closed state, i.e. it should neither start any subprocesses nor set up sockets. This work should be left to the open() method.

This class contains no special initialization (i.e. default __init__()).

close()

Close the handle. This method will be called after read_messages() reports an EOF.

fileno()
Returns: integer or None

Return a file descriptor the handle reads from for seismometer.poll.Poll.

open()

Open the handle. This is the method that should start subprocesses and set up necessary sockets.

On any error, the method should raise an exception.

read_messages()
Returns: list of messages (dict or seismometer.message.Message)
Throws: HandleEOF, Exception

Read and parse messages received on this handle. If no messages are available for read, the method should return empty list.

If an end-of-file is encountered, HandleEOF should be raised, after which close() will be called by the parent. Any other exception signals that the error was just a processing one, and close() method will not be called.

Note that this method should not block. To set a descriptor to a non-blocking state, see BaseHandle.set_nonblocking().
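The open/read/EOF/close cycle described above can be sketched as follows. JSONLineHandle and read_until_eof() are hypothetical names, and the driver loop only imitates what a parent might do. When Seismometer is not installed, empty stand-ins replace BaseHandle and HandleEOF so that the sketch still runs.

```python
import errno
import fcntl
import json
import os
import select
import subprocess
import sys

try:
    from seismometer.dumbprobe import BaseHandle, HandleEOF
except ImportError:
    # stand-ins so the sketch runs without Seismometer installed
    class HandleEOF(Exception):
        pass

    class BaseHandle(object):
        pass


class JSONLineHandle(BaseHandle):
    """Hypothetical handle: JSON objects, one per line, from a command."""

    def __init__(self, command):
        # start closed: no subprocess until open() is called
        self.command = command
        self.proc = None
        self.buffer = b""

    def open(self):
        self.proc = subprocess.Popen(self.command, stdout=subprocess.PIPE)
        fd = self.proc.stdout.fileno()
        flags = fcntl.fcntl(fd, fcntl.F_GETFL)
        fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)
        self.buffer = b""

    def close(self):
        if self.proc is None:
            return
        self.proc.stdout.close()
        if self.proc.poll() is None:
            self.proc.terminate()
        self.proc.wait()
        self.proc = None

    def fileno(self):
        return None if self.proc is None else self.proc.stdout.fileno()

    def read_messages(self):
        try:
            chunk = os.read(self.proc.stdout.fileno(), 4096)
        except OSError as e:
            if e.errno in (errno.EAGAIN, errno.EWOULDBLOCK):
                return []  # nothing to read right now
            raise
        if chunk == b"":
            raise HandleEOF()
        self.buffer += chunk
        lines = self.buffer.split(b"\n")
        self.buffer = lines.pop()  # keep an incomplete last line for later
        return [json.loads(l.decode("utf-8")) for l in lines if l.strip()]


def read_until_eof(handle, timeout=10):
    """Imitation of a parent loop: poll the handle until it reports EOF."""
    handle.open()
    messages = []
    while True:
        select.select([handle.fileno()], [], [], timeout)
        try:
            messages.extend(handle.read_messages())
        except HandleEOF:
            handle.close()
            return messages
```

Note that this sketch drops an unterminated last line when the command exits, which may or may not be acceptable for a given tool.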

static set_close_on_exec(handle)
Parameters: handle – handle (object with fileno() method) or file descriptor (integer)

Set close-on-exec flag, so the file descriptor doesn’t leak to subprocesses.

static set_nonblocking(handle)
Parameters: handle – handle (object with fileno() method) or file descriptor (integer)

Set non-blocking flag, so the reads return immediately if no data is available.

Note that reading from a handle in non-blocking mode when no data is available results in an IOError exception with errno set to errno.EAGAIN or errno.EWOULDBLOCK (under Linux these two errnos are equal).
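The behaviour can be reproduced with a plain pipe and the standard fcntl module. The helper names below are hypothetical and merely mirror what a non-blocking read has to deal with. In Python 3, IOError is an alias of OSError, so the sketch catches OSError.

```python
import errno
import fcntl
import os


def make_nonblocking(fd):
    """Set O_NONBLOCK on a file descriptor."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)


def read_available(fd, size=4096):
    """Return data, b"" on end-of-file, or None if nothing is available."""
    try:
        return os.read(fd, size)
    except OSError as e:
        if e.errno in (errno.EAGAIN, errno.EWOULDBLOCK):
            return None
        raise


read_end, write_end = os.pipe()
make_nonblocking(read_end)

nothing_yet = read_available(read_end)   # no data written so far
os.write(write_end, b"line\n")
data = read_available(read_end)          # data is waiting in the pipe
os.close(write_end)
at_eof = read_available(read_end)        # writer closed, pipe drained
os.close(read_end)
```

Distinguishing "no data yet" from end-of-file matters here, because only the latter should lead to HandleEOF.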

class seismometer.dumbprobe.ShellStream(command, parse=None)

Handle for reading a stream of lines from an external tool (e.g. vmstat(8) or iostat(1)) and parsing them to messages for further processing.

Parameters:
  • command – command to run (string for shell command, or list of strings for direct command to run)
  • parse – function to parse a line read from the command

If parse argument is None, json.loads() is used, meaning that the command prints JSON objects, one per line.

Parse function should return a dict, seismometer.message.Message, or list or tuple of these. If the function needs to ignore the line, it should return an empty list or tuple rather than None.
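A sketch of a parse function for vmstat output that ignores header lines by returning an empty tuple. The column names assume the 17-column layout printed by the procps version of vmstat, which differs between versions and platforms. The function name and the returned dictionary are hypothetical.

```python
# assumed column layout of procps vmstat; verify against the local version
VMSTAT_COLUMNS = (
    "r", "b", "swpd", "free", "buff", "cache", "si", "so",
    "bi", "bo", "in", "cs", "us", "sy", "id", "wa", "st",
)


def parse_vmstat(line):
    """Parse one line of vmstat output; header lines yield an empty tuple."""
    fields = line.split()
    if len(fields) < len(VMSTAT_COLUMNS):
        return ()
    if not all(field.isdigit() for field in fields):
        return ()  # header or otherwise unexpected line: ignore it
    # extra trailing columns, if any, are dropped by zip()
    return dict(zip(VMSTAT_COLUMNS, (int(field) for field in fields)))


sample = parse_vmstat("1 0 0 123456 7890 456789 0 0 5 10 100 200 5 2 92 1 0")
```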

exception seismometer.dumbprobe.HandleEOF

Exception to signal that BaseHandle instance encountered EOF when reading from its descriptor, thus it needs maintenance (close and open).

DumbProbe config interface

This interface is intended for use in the script specified with the --checks option. It may also be useful as a basis for a custom implementation.

class seismometer.dumbprobe.Checks(checks=None)

Container for checks to be executed.

add(check)
Parameters: check – check to run (typically an instance of BaseCheck subclass)

Add an entry to the list of checks to be run periodically.

check_name(check)
Parameters: check – a check or a handle
Return type: (integer, string)
Returns: check’s index and name

Return check’s identity for logging.

read_handles(handles)
Parameters: handles – list of BaseHandle objects to read
Returns: list (possibly empty) of dicts and seismometer.message.Message instances

Read all messages from passed handles and return them as a single, flat list.

Handles that encountered EOF are closed (BaseHandle.close()), removed from poll queue, and scheduled for reopen (BaseHandle.open()).

reopen_handle(handle)
Parameters: handle – BaseHandle to be reopened

Reopen a handle and add it to the poll list. If the reopen fails, the handle is scheduled for another attempt.

run_check(check)
Parameters: check – check (BaseCheck or compatible) to be run
Returns: list or tuple of messages, a single message (dict or seismometer.message.Message), or None

Run a check and return its result. If the check raised an exception, the exception is logged and None is returned.

run_next()
Returns: non-empty list of dicts and seismometer.message.Message instances

Sleep until next check is expected to be run and run the check, or read messages from polled handles, if any are available.

setup_handles()

Open all added handles.

class seismometer.dumbprobe.RunQueue(elements=[])

Queue for running commands at specified times.

add(time, command)
Parameters:
  • time – Epoch timestamp to run command at
  • command – command to run

Add a command to run at specified time.

command is opaque to the queue.

empty()
Return type: Boolean

Check if the queue is empty.

get()
Return type: tuple (time, command)

Remove and return the command that has the earliest run time.

command is the same object as it was passed to add().

peek()
Return type: tuple (time, command)

Return command that has earliest run time without removing it from the queue.

command is the same object as it was passed to add().
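A queue with this interface can be sketched on top of Python’s heapq module. SimpleRunQueue is a hypothetical stand-in written for illustration, not the implementation of RunQueue. It assumes that get() removes the returned entry, and it uses a counter so that commands scheduled for the same time never need to be compared with each other.

```python
import heapq
import itertools


class SimpleRunQueue(object):
    """Hypothetical priority queue of (time, command) pairs."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker for equal times

    def add(self, time, command):
        heapq.heappush(self._heap, (time, next(self._counter), command))

    def empty(self):
        return len(self._heap) == 0

    def peek(self):
        # earliest entry, left in the queue
        (time, _, command) = self._heap[0]
        return (time, command)

    def get(self):
        # earliest entry, removed from the queue
        (time, _, command) = heapq.heappop(self._heap)
        return (time, command)


queue = SimpleRunQueue()
queue.add(20, "second")
queue.add(10, "first")
```

A scheduler could peek() at the earliest entry to decide how long to sleep, then get() it, run the command, and add() it again at its next run time.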