Integrating another stat extension in PoWA
Clone the repository:
git clone https://github.com/powa-team/powa-archivist/
cd powa-archivist/
make && sudo make install
Any modification to the background worker code will require a PostgreSQL restart.
Note
With PoWA 4 and the Remote setup mode, modifications to the background worker are less likely to be required.
In order to contribute another source of data, you will have to implement the following infrastructure. An example is provided for each required object, assuming a very naive datasource called my_datasource, which returns a single integer counter called my_counter.
You can look at any of the supported datasource implementations to get a better idea of how to efficiently store, aggregate and purge data.
Functions required for data snapshot
- query_source(integer):
This function is responsible for providing either the data corresponding to the local datasource when called with 0, or the data that has been stored in the transient table otherwise.
CREATE OR REPLACE FUNCTION powa_my_datasource_src(_srvid integer,
    OUT ts timestamp with time zone,
    OUT my_counter integer)
RETURNS SETOF record AS $PROC$
BEGIN
    IF (_srvid = 0) THEN
        RETURN QUERY SELECT now(), d.my_counter
            FROM my_datasource() AS d;
    ELSE
        RETURN QUERY SELECT t.ts, t.my_counter
            FROM public.powa_my_datasource_src_tmp AS t
            WHERE t.srvid = _srvid;
    END IF;
END;
$PROC$ language plpgsql;
Note
This function is required since PoWA 4, in order to support remote snapshots.
- snapshot(integer):
This function is responsible for taking a snapshot of the datasource data and storing it somewhere. Usually, this is done in a staging table named powa_my_datasource_history_current. It will be called every powa.frequency seconds. The function signature looks like this:
CREATE OR REPLACE FUNCTION powa_my_datasource_snapshot(_srvid integer)
RETURNS void AS $PROC$
BEGIN
    INSERT INTO public.powa_my_datasource_history_current
        SELECT _srvid, *
        FROM public.powa_my_datasource_src(_srvid);
END;
$PROC$ language plpgsql;
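For reference, here is one possible definition for that staging table, matching the server identifier plus the output of the example powa_my_datasource_src() function. This is only a sketch; the actual layout is entirely up to the datasource:

-- possible staging table for the example datasource (sketch only)
CREATE TABLE public.powa_my_datasource_history_current (
    srvid integer NOT NULL,
    ts timestamp with time zone NOT NULL,
    my_counter integer NOT NULL
);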
- aggregate(integer):
This function will be called after every powa.coalesce snapshots. It is responsible for aggregating the current staging values into another table, in order to reduce PoWA's disk usage. Usually, this is done in an aggregation table named powa_my_datasource_history. The function signature looks like this:
CREATE OR REPLACE FUNCTION powa_my_datasource_aggregate(_srvid integer)
RETURNS void AS $PROC$
...
$PROC$ language plpgsql;
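As a rough illustration only, a naive aggregate implementation for this example datasource could collapse the staged rows into a single row per aggregation period. The powa_my_datasource_history table and its layout below are hypothetical, shown only to make the sketch self-contained; the supported datasources use more elaborate layouts (for instance storing the individual records inside the aggregated row):

-- hypothetical aggregation table for the sketch below
-- CREATE TABLE public.powa_my_datasource_history (
--     srvid integer NOT NULL,
--     coalesce_range tstzrange NOT NULL,
--     min_my_counter integer NOT NULL,
--     max_my_counter integer NOT NULL
-- );
CREATE OR REPLACE FUNCTION powa_my_datasource_aggregate(_srvid integer)
RETURNS void AS $PROC$
BEGIN
    -- collapse the staged snapshots into one row covering the whole period
    INSERT INTO public.powa_my_datasource_history
        SELECT srvid, tstzrange(min(ts), max(ts), '[]'),
               min(my_counter), max(my_counter)
        FROM public.powa_my_datasource_history_current
        WHERE srvid = _srvid
        GROUP BY srvid;

    -- the staged rows are now redundant
    DELETE FROM public.powa_my_datasource_history_current
        WHERE srvid = _srvid;
END;
$PROC$ language plpgsql;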
- purge(integer):
This function will be called after every 10 aggregates, and is responsible for purging stale data that should not be kept. The function should take the powa.retention global parameter into account, so as not to remove data that is still valid.
CREATE OR REPLACE FUNCTION powa_my_datasource_purge(_srvid integer)
RETURNS void AS $PROC$
...
$PROC$ language plpgsql;
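As with the aggregate function, here is a rough sketch of what a purge implementation could look like for this example. It reuses the hypothetical powa_my_datasource_history table from the previous sketch, and reads powa.retention through current_setting(); the supported datasources may retrieve the retention differently, especially in remote mode:

CREATE OR REPLACE FUNCTION powa_my_datasource_purge(_srvid integer)
RETURNS void AS $PROC$
DECLARE
    v_retention interval;
BEGIN
    -- powa.retention is set by the powa extension; fall back to a default
    -- if it's not available on this server (sketch only)
    SELECT coalesce(current_setting('powa.retention', true), '1 day')::interval
        INTO v_retention;

    -- remove aggregated rows that are entirely older than the retention
    DELETE FROM public.powa_my_datasource_history
        WHERE srvid = _srvid
          AND upper(coalesce_range) < (now() - v_retention);
END;
$PROC$ language plpgsql;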
- unregister(integer):
This function will be called if the related extension is dropped.
Please note that the module name used in the powa_functions table has to be the same as the extension name, otherwise this function will not be called.
This function should at least remove entries from the powa_functions table. A minimal function would look like this:
CREATE OR REPLACE FUNCTION public.powa_my_datasource_unregister(_srvid integer)
RETURNS bool AS $_$
BEGIN
    DELETE FROM public.powa_functions WHERE module = 'my_datasource';
    RETURN true;
END;
$_$ language plpgsql;
Optional functions for data snapshot
- query_cleanup:
- This can contain any SQL code, which will be executed as-is by powa-collector during remote snapshots, just after the query_source function has been executed. For local snapshots (when the passed server id is 0), the same code should normally be present in the snapshot function. This is normally not required, but if, for example, you don't want to store cumulated data in each snapshot, this is the right place to reset the metrics your extension stores. You can refer to the pg_qualstats integration for a usage example, and to the sketch after the warnings below.
Warning
This can’t be used as a way to free any resources the query_source function could have allocated, as this function isn’t guaranteed to be executed in some corner cases (error during query_source execution, fast shutdown of the collector…)
Warning
You should obviously be very careful with what you store in this field, or who you allow to update the powa_functions table, as no verification is done on the content.
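For instance, assuming a hypothetical my_datasource_reset() function provided by the extension, the cleanup query could be registered like this (sketch only; the query_cleanup field could also be filled directly when inserting into powa_functions):

-- my_datasource_reset() is hypothetical, provided by the example extension
UPDATE public.powa_functions
    SET query_cleanup = 'SELECT my_datasource_reset()'
    WHERE module = 'my_datasource'
      AND operation = 'snapshot';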
Registering functions for data snapshot
Each of these functions should then be registered:
INSERT INTO powa_functions (module, operation, function_name, query_source, added_manually)
VALUES ('my_datasource', 'snapshot', 'powa_my_datasource_snapshot', 'powa_my_datasource_src', true),
    ('my_datasource', 'aggregate', 'powa_my_datasource_aggregate', NULL, true),
    ('my_datasource', 'unregister', 'powa_my_datasource_unregister', NULL, true),
    ('my_datasource', 'purge', 'powa_my_datasource_purge', NULL, true);
Transient table required for data remote snapshot
When the Remote setup is used, data from the source extensions has to be exported from the remote server to the repository server. Each datasource therefore requires a transient table on the repository server, used to store the exported data until the remote snapshot is finished.
The table must use this naming convention:
public.${query_source_function_name}_tmp
For instance, if you’re adding the my_datasource datasource, and the query_source function is named powa_my_datasource_src(integer), the transient table has to be named:
public.powa_my_datasource_src_tmp
This table must have its first column declared as srvid integer NOT NULL. The rest of the columns must match the output of the underlying datasource function. It’s usually recommended to have this function also return the timestamp at which the data was acquired.
For instance, if the my_datasource query source only returns a single integer value, the table would be declared like this:
CREATE TABLE public.powa_my_datasource_src_tmp (
    srvid integer NOT NULL,
    ts timestamp with time zone NOT NULL,
    my_counter integer NOT NULL
);
Note
The data stored in those tables is only used, and valuable, while a snapshot is being performed. For performance reasons, it’s therefore highly recommended to declare those tables as unlogged, as their data does not need to survive any incident.
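For instance, the example transient table above could simply be declared as unlogged:

-- same layout as above, but unlogged: its content is disposable
CREATE UNLOGGED TABLE public.powa_my_datasource_src_tmp (
    srvid integer NOT NULL,
    ts timestamp with time zone NOT NULL,
    my_counter integer NOT NULL
);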