Time Series DBMS as a new trend?

by Paul Andlinger, 1 June 2015
Tags: Graphite, InfluxDB, RRDtool, Time Series DBMS

With the current ranking, we have introduced the new category Time Series DBMS and added 15 systems of that kind.

A Time Series DBMS is a database management system that is optimized for handling time series data: each entry, sometimes also called an 'event', has an associated timestamp.

Today there are many sources of time series data, for example:

measurements generated by sensors,
status information and other metrics produced by complex systems for monitoring purposes,
stock tickers from high-frequency stock trading systems,
location, velocity and other operational metrics produced by industrial fleets (trucks, ships, trains, aircraft, …),
calls, messages and other signals sent from communication devices or the corresponding applications,
and many more ('Internet of Things').

What these sources have in common is that the data tends to be immutable, large in volume, ordered by time, and accessed primarily in aggregated form.
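
The properties listed above (immutable, time-ordered, read mostly in aggregate) suggest a simple storage shape. The following is a minimal in-memory sketch, assuming integer timestamps in seconds; the class and method names are hypothetical and do not correspond to any of the systems in the ranking.

```python
from __future__ import annotations

from bisect import bisect_left
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One entry of a series: an immutable timestamp/value pair."""
    ts: int       # assumption: integer seconds since some epoch
    value: float


class AppendOnlySeries:
    """Hypothetical in-memory series: append-only and kept in time order."""

    def __init__(self) -> None:
        self._ts: list[int] = []         # timestamps, for binary search
        self._events: list[Event] = []   # events, parallel to self._ts

    def append(self, ts: int, value: float) -> None:
        # Entries are never updated; late (out-of-order) data is rejected
        # here to keep the sketch simple.
        if self._ts and ts < self._ts[-1]:
            raise ValueError("timestamp older than the last entry")
        self._ts.append(ts)
        self._events.append(Event(ts, value))

    def range(self, start: int, end: int) -> list[Event]:
        """Events with start <= ts < end, found by binary search on time."""
        lo = bisect_left(self._ts, start)
        hi = bisect_left(self._ts, end)
        return self._events[lo:hi]

    def mean(self, start: int, end: int) -> float | None:
        """Aggregate a time range, the typical way such data is read."""
        window = self.range(start, end)
        if not window:
            return None
        return sum(e.value for e in window) / len(window)
```

For example, after appending (0, 1.0), (10, 2.0), (20, 3.0) and (30, 4.0), `mean(10, 30)` averages the two events at 10 and 20 and returns 2.5. A production system would also need to handle late-arriving data, which this sketch simply rejects.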

As a consequence, we introduced the additional DBMS category 'Time Series DBMS' with the current DB-Engines ranking. At present we list 15 systems in that category, the three top-ranked of which are:

1.	RRDtool
2.	InfluxDB
3.	Graphite

Typical features of Time Series DBMS:

Time Series DBMS are designed to efficiently collect, store and query time series data with high transaction volumes. Although that type of data could be managed with other categories of DBMS (and some systems even provide suitable design patterns or extensions for handling time series), its specific challenges are often better met by specialized systems that support:

Downsampling data: e.g. a sensor's value is stored once per second, and a query should return the averaged value per minute. A typical SQL query needs a GROUP BY clause with an expression similar to 'group by integerdivision(time, 60)', whereas Time Series DBMS support something like 'group by time (1 minute)'.
Comparison with the previous record: e.g. a 'table' contains stock prices (one tick per day), and a query should return all days on which the price of a specific stock increased. In a relational system using standard SQL, we would have to self-join the table and work out how to match each record with its predecessor. That can be a non-trivial task and is definitely not efficient. Time Series DBMS typically offer specific features for this requirement.
Joining time series: a join puts two or more time series together by matching their timestamps. Those timestamps, however, may not match exactly. Time Series DBMS often provide features for that task, e.g. by allowing SQL-like syntax such as SELECT errors_per_minute.value / page_views_per_minute.value FROM errors_per_minute INNER JOIN page_views_per_minute, where no explicit join condition is needed because the series are matched on time.
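
The three features above can be sketched in plain Python over in-memory lists of (timestamp, value) pairs. This is not how any particular Time Series DBMS implements them: the function names, the 60-second bucket and the `tolerance` parameter are illustrative assumptions, `increases` assumes at most one record per timestamp, and the join uses an "as-of" strategy (match the latest earlier timestamp), which is only one of several ways to align inexact timestamps.

```python
from __future__ import annotations

from bisect import bisect_right
from collections import defaultdict

Point = tuple[int, float]  # (timestamp in seconds, value); hypothetical type


def downsample_mean(points: list[Point], bucket: int = 60) -> dict[int, float]:
    """Average values per time bucket, like 'group by time (1 minute)'.

    ts // bucket plays the role of 'integerdivision(time, 60)';
    each key is the start timestamp of its bucket.
    """
    sums: dict[int, float] = defaultdict(float)
    counts: dict[int, int] = defaultdict(int)
    for ts, value in points:
        start = (ts // bucket) * bucket
        sums[start] += value
        counts[start] += 1
    return {start: sums[start] / counts[start] for start in sorted(sums)}


def increases(points: list[Point]) -> list[int]:
    """Timestamps whose value is higher than the previous record's value.

    A single pass over time-ordered data replaces the self-join that
    matches each row with its predecessor.
    """
    ordered = sorted(points)
    return [cur[0] for prev, cur in zip(ordered, ordered[1:]) if cur[1] > prev[1]]


def asof_join(left: list[Point], right: list[Point],
              tolerance: int) -> list[tuple[int, float, float]]:
    """Pair each left point with the latest right point at or before it.

    If that right point is more than `tolerance` seconds older, the left
    point is dropped (inner-join semantics).
    """
    right = sorted(right)
    right_ts = [ts for ts, _ in right]
    out = []
    for ts, lval in sorted(left):
        i = bisect_right(right_ts, ts) - 1  # latest right timestamp <= ts
        if i >= 0 and ts - right_ts[i] <= tolerance:
            out.append((ts, lval, right[i][1]))
    return out
```

With per-second readings [(0, 1.0), (30, 3.0), (60, 5.0), (119, 7.0)], `downsample_mean` returns {0: 2.0, 60: 6.0}; with daily prices [(1, 10.0), (2, 12.0), (3, 11.0), (4, 15.0)], `increases` returns [2, 4]. `asof_join([(60, 3.0), (120, 4.0)], [(58, 300.0), (121, 400.0)], tolerance=5)` returns [(60, 3.0, 300.0)]: the point at 120 is dropped because the only earlier page-view point (58) is too old, and dividing the two matched values gives the error rate.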

For some systems, we found it difficult to decide whether to add them as Time Series DBMS or to classify them as monitoring applications (which we do not cover). As a rule of thumb, we decided that to count as a DBMS, a system must at least offer an API for inserting and querying data and must not be specific to a single domain.

We will keep an eye on how Time Series DBMS evolve in the future.
