Some remarks about the DB-Engines ranking score

by Paul Andlinger, Matthias Gelbmann, 3 December 2015
Tags: DB-Engines Ranking

The DB-Engines Ranking of December 2015 produced some surprises. Several systems that had continuously shown a very positive trend in the past (e.g. MongoDB, Cassandra, Redis, Neo4j) lost points in this ranking. Other major players, whose trend in the ranking had so far been rather uneven, increased their scores heavily (e.g. Oracle, MySQL, Microsoft SQL Server). How did that come about?

Several DBMS vendors have asked us that question over the past few days. Although we have disclosed the methodology for calculating the scores of the DB-Engines Ranking, along with many other details, we feel the need to give some additional information here:

1) The Ranking uses the raw values from several data sources as input. For example, we count the number of Google and Bing results, the number of open jobs, the number of questions on StackOverflow, the number of profiles on LinkedIn, the number of tweets on Twitter and many more.

2) We normalize those raw values for each data source. That is done by dividing them by the average of a selection of the leading systems in each source. This is necessary to eliminate the bias caused by the changing popularity of the sources themselves. For example, LinkedIn increases the number of its members every month, and therefore the raw values for most systems increase over time. This increase, however, is due to the growing adoption of LinkedIn and does not necessarily result from an increased popularity of a specific system on LinkedIn. To give another example: an outage of Twitter would reduce the raw values for most of the systems in that month, but obviously has nothing to do with their popularity. For that reason, we use a selection of the leading systems in each data source as a 'benchmark'.

3) The normalized values are then delinearized, summed up over all data sources (with each source weighted), re-linearized and scaled. The result is the final score of the system.
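The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual DB-Engines implementation: the function names, the system and source names, the weights and all numbers are made up, and the logarithm used for delinearizing (with its inverse for re-linearizing) is merely a placeholder for whatever functions are really used.

```python
import math


def normalize(raw, benchmark_systems):
    """Step 2: divide each raw value by the average of the benchmark systems."""
    benchmark = sum(raw[name] for name in benchmark_systems) / len(benchmark_systems)
    return {name: value / benchmark for name, value in raw.items()}


def combine(normalized_by_source, weights, scale=100.0):
    """Step 3: delinearize, sum with weights, re-linearize and scale.

    ASSUMPTION: log1p/expm1 are placeholders; the real delinearizing and
    re-linearizing functions are not specified in this article.
    """
    total = sum(
        weights[source] * math.log1p(value)
        for source, value in normalized_by_source.items()
    )
    return scale * math.expm1(total)


# Step 1: made-up raw counts for a single data source.
raw_counts = {"SystemA": 300, "SystemB": 100, "SystemC": 20}

# Only the two leading systems form the benchmark (average = 200).
normalized = normalize(raw_counts, ["SystemA", "SystemB"])

# A hypothetical system sitting exactly at the benchmark in two equally
# weighted sources.
example_score = combine(
    {"jobs": 1.0, "search": 1.0},
    {"jobs": 0.5, "search": 0.5},
)
```

With these placeholder functions, a system that sits exactly at the benchmark in every source ends up at the value of `scale`, which makes the relative nature of the score easy to see.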

The normalization step is the key to understanding the December results: the top three systems in the ranking (Oracle, MySQL and SQL Server) all increased their score. Oracle and MySQL gained a formidable 16 and 11 points respectively. As a consequence, the benchmark increased, which potentially means fewer points for many other systems.
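A small numeric sketch may help to see this effect. The numbers below are invented for illustration and are not actual ranking data; the helper name `normalized_value` is hypothetical as well. A system whose raw value does not change at all still receives a lower normalized value once the benchmark systems grow.

```python
def normalized_value(raw_value, benchmark_raw_values):
    """Divide a raw value by the average raw value of the benchmark systems."""
    benchmark = sum(benchmark_raw_values) / len(benchmark_raw_values)
    return raw_value / benchmark


# Invented raw values of three benchmark systems in two consecutive months.
benchmark_november = [400, 300, 200]   # average 300
benchmark_december = [460, 350, 210]   # average 340

# Another system with the same raw value in both months.
unchanged_raw_value = 50

november = normalized_value(unchanged_raw_value, benchmark_november)
december = normalized_value(unchanged_raw_value, benchmark_december)

print(f"November: {november:.3f}, December: {december:.3f}")
```

The December value is lower although nothing changed for the system itself, which is the situation many systems were in this month.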

Why are we not using all systems as the benchmark for a data source? Well, we continuously add new systems to our ranking. Those systems typically have a low score (assuming that we are not missing major players). If all systems formed the benchmark, each newly added system would reduce the benchmark and thereby increase the score of most of the other systems.
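The following sketch compares the two choices of benchmark on invented numbers (none of them are real ranking data, and the variable names are illustrative only). Adding two small systems pulls down an average over all systems, while an average over a fixed selection of leaders stays the same.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Invented raw values for one data source; the first three are the leaders.
existing_systems = [400, 300, 200, 50]
newly_added_systems = [2, 1]   # new entries typically have low values

# Benchmark over ALL systems: drops when small systems are added.
all_before = mean(existing_systems)
all_after = mean(existing_systems + newly_added_systems)

# Benchmark over the leading systems only: unaffected by the new entries.
leaders_before = mean(existing_systems[:3])
leaders_after = mean((existing_systems + newly_added_systems)[:3])

# Normalized value of the system with raw value 50 under both schemes.
inflated = 50 / all_after - 50 / all_before        # positive: artificial gain
stable = 50 / leaders_after - 50 / leaders_before  # zero: no change
```

Under the all-systems benchmark the existing system appears to gain popularity merely because the list of ranked systems grew.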

Conclusion: it is important to understand the score as a relative value that has to be compared with the scores of other systems. Only that can guarantee a fair and unbiased score by eliminating the influence of how heavily the data sources themselves are used.
Detailed reports of the ranking performance of a system with charts and breakdowns into several categories can be ordered from us. Please find more info about that service at https://db-engines.com/en/services.
