Data processing speed and reliability: in-memory synchronous replication

By Vladimir Perepelytsya, Tarantool (sponsor), November 9, 2021
Tags: Replication, Tarantool

This is a sponsored post.

Despite the current difficult economic climate, large companies continue digitalizing their businesses. Mobile apps and personal dashboards have become indispensable service and communication channels, significantly increasing the load on IT infrastructure. There are many ways to deal with this load by improving system performance. One of the options is to use in-memory technology. But can it guarantee sufficient reliability?

About ten years ago, a capacity of 1,000 queries per second was enough to process all card-related operations in a bank. Today, investment deals alone generate 5,000 queries per second in a top-10 bank. The traditional databases that many companies and banks use as their main data storage can't cope with the increased flow of requests from mobile app users. To avoid financial and reputational losses, IT departments are looking for ways to speed up their existing systems.

For many companies, the solution is in-memory technology. With the in-memory approach, data is stored and processed in RAM and is also written to disk for reliability. Keeping the working data in RAM makes it possible to process hundreds of thousands of queries per second, while traditional databases process 10–15 thousand queries per second at best. So you might ask, why isn't everyone using in-memory solutions if they are that much more efficient? The answer is simple: resources. Rebuilding a traditional database infrastructure to support in-memory technology takes a lot of money and time.

The IT landscape of most large companies had been formed long before the in-memory approach as we know it came to be. Many business processes revolve around outdated systems. Moving critical systems to another technology stack inevitably leads to massive financial expenses. Sometimes it might be cheaper to create a new company than to migrate from an outdated infrastructure. Each company makes a choice: to build new architecture from the ground up or update and improve the existing one.

In both cases, the solution must work fast enough and have sufficient fault tolerance, scalability, and flexibility. The answer to the fault tolerance problem is replication, which we discuss in more detail below. The required scalability and flexibility can be achieved at the app or service level. Replication, however, is built into the database architecture, and it affects the reliability and capacity of the whole system. When choosing your system vendor, pay attention to how they implement replication.

Why do we need database replication?

In simple terms, replication is copying data from one instance to several others, called replicas. This improves the reliability of the solution and decreases the load on the main database. Since the same data is stored on several instances, the app or service can quickly switch to a copy and continue its work if the main server fails.
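The switch to a copy can be illustrated with a small sketch. The `Node` class and the `read_with_failover` helper are hypothetical names made up for this example; they stand in for real database connections and do not represent any vendor's API.

```python
class Node:
    """A stand-in for one database instance holding a copy of the data."""

    def __init__(self, name, data, alive=True):
        self.name = name
        self.data = data
        self.alive = alive

    def read(self, key):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return self.data[key]


def read_with_failover(nodes, key):
    """Try the main node first, then each replica in the given order."""
    for node in nodes:
        try:
            return node.name, node.read(key)
        except ConnectionError:
            continue  # this node is unavailable, try the next copy
    raise ConnectionError("no node available")


main = Node("main", {"balance": 42})
replica = Node("replica", {"balance": 42})

served_by_main = read_with_failover([main, replica], "balance")
main.alive = False  # simulate a failure of the main server
served_by_replica = read_with_failover([main, replica], "balance")
```

Because the replica holds the same data, the second read returns the same value; only the serving node changes.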

You can set up replicas using the technology of one or several different vendors. For example, GoldenGate, a replication solution from Oracle, is often used with databases from other vendors. However, you can build a similar solution with in-memory technology, and it would be cheaper. Replication between different vendors increases the solution's reliability. This approach lets you speed up your application without rewriting its main logic.

Synchronous and asynchronous replication

There are two types of replication: synchronous and asynchronous. In-memory solutions often use asynchronous replication since it doesn't affect server response time. The delay between making a change and seeing the result in the app is less than a millisecond, which is insignificant for user experience. The key metric that asynchronous replication improves is the recovery time objective (RTO), the time needed to restore service after a failure.

With asynchronous replication, transactions are sent to the replicas independently of the response to the user. If the server fails, data can be lost even though the user has already received confirmation of a successful write. That's why asynchronous replication is not a good solution for systems handling critical information like financial operations or client orders.
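A minimal sketch of this failure mode, assuming a toy in-process model: `AsyncPrimary`, `Replica`, and `ship_pending` are hypothetical names invented for the illustration, not part of any real database.

```python
class Replica:
    def __init__(self):
        self.log = []

    def apply(self, txn):
        self.log.append(txn)


class AsyncPrimary:
    """Acknowledges a write before it has been shipped to any replica."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.log = []      # transactions committed locally
        self.pending = []  # committed locally, not yet sent to replicas

    def write(self, txn):
        self.log.append(txn)
        self.pending.append(txn)
        return "ok"        # the user sees success immediately

    def ship_pending(self):
        # In a real system this runs in the background at some later point.
        for txn in self.pending:
            for replica in self.replicas:
                replica.apply(txn)
        self.pending.clear()


replica = Replica()
primary = AsyncPrimary([replica])

primary.write("debit 100")
primary.ship_pending()             # this transaction reaches the replica
ack = primary.write("credit 100")  # acknowledged, but not shipped yet

# Suppose the primary crashes here: only the replica's log survives.
lost = [txn for txn in primary.log if txn not in replica.log]
```

The second write was acknowledged, yet it never reached the replica, so it ends up in `lost`.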

With synchronous replication, the main database sends information about the transaction to all the replicas. The replication is considered complete only when all the replicas confirm that they have successfully executed the transaction. This keeps the recovery point objective (RPO), the amount of data that may be lost in a failure, to a minimum: all the replicas contain identical data, and not a single transaction is lost in case of failure.
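The rule "confirm only after every replica has confirmed" can be sketched as follows. This is a simplified model under stated assumptions: `SyncPrimary` and `Replica` are hypothetical classes, and the undo step is a placeholder for the proper rollback protocol a real system would need.

```python
class Replica:
    def __init__(self, healthy=True):
        self.healthy = healthy
        self.log = []

    def apply(self, txn):
        """Return True if the transaction was applied, False otherwise."""
        if not self.healthy:
            return False
        self.log.append(txn)
        return True


class SyncPrimary:
    """Reports success only after every replica has confirmed the write."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.log = []

    def write(self, txn):
        acks = [replica.apply(txn) for replica in self.replicas]
        if not all(acks):
            # Simplified undo: drop the entry from replicas that applied it.
            for replica, applied in zip(self.replicas, acks):
                if applied:
                    replica.log.pop()
            return "error"
        self.log.append(txn)
        return "ok"


first, second = Replica(), Replica()
primary = SyncPrimary([first, second])

ok_result = primary.write("debit 100")       # both replicas confirm
second.healthy = False
failed_result = primary.write("credit 100")  # one replica cannot confirm
```

After the failed write, the primary and the healthy replica still hold identical logs, and the user was never told that the second transaction succeeded.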

The synchronous approach has disadvantages as well. Waiting for replies from replicas adds a lag that grows with the number of confirmations awaited one after another. If the database waits for a single replica, the operation takes about twice as long to complete, and every further confirmation awaited in sequence adds a comparable delay.
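One simple way to reason about this lag is a back-of-the-envelope model. The two functions below are illustrative assumptions, not measurements: the first assumes confirmations are awaited one after another, the second assumes they are requested in parallel so that only the slowest one counts. All timings are made-up values in microseconds.

```python
def sequential_latency_us(local_us, ack_us):
    """Local commit plus every confirmation, awaited one after another."""
    return local_us + sum(ack_us)


def parallel_latency_us(local_us, ack_us):
    """Local commit plus the slowest confirmation, awaited in parallel."""
    return local_us + (max(ack_us) if ack_us else 0)


LOCAL_US = 200             # assumed local commit time
ACKS_US = [500, 600, 700]  # assumed confirmation times of three replicas

no_replication = sequential_latency_us(LOCAL_US, [])
one_after_another = sequential_latency_us(LOCAL_US, ACKS_US)
all_at_once = parallel_latency_us(LOCAL_US, ACKS_US)
```

Under these assumptions, how the confirmations are collected matters as much as how many of them are required.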

In-memory synchronous replication

Any in-memory technology is designed for increased performance. The synchronous approach, where the database waits for a response from the replicas, roughly doubles latency. Thus, synchronous replication sacrifices speed for reliability. Maybe that's why few vendors implement it in their products.

Nevertheless, well-designed synchronous replication primarily affects latency and has little impact on the number of operations per second. While one query is waiting to be synchronized, the others can run in parallel. If the database has query isolation, it looks consistent, and the user doesn't see the in-between states.
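The difference between latency and throughput can be shown with a small simulation. It assumes that waiting for a replica can be modeled as a fixed pause (`REPLICA_ACK_DELAY`, an invented value) and uses Python's `asyncio` so that many waiting queries overlap; it does not model a real database.

```python
import asyncio
import time

REPLICA_ACK_DELAY = 0.01  # seconds; assumed time to get a confirmation
QUERY_COUNT = 200


async def sync_write(txn_id):
    # The query is parked while it waits for the replica to confirm.
    await asyncio.sleep(REPLICA_ACK_DELAY)
    return txn_id


async def run_batch(count):
    start = time.perf_counter()
    # All queries wait for their confirmations at the same time.
    results = await asyncio.gather(*(sync_write(i) for i in range(count)))
    return results, time.perf_counter() - start


results, elapsed = asyncio.run(run_batch(QUERY_COUNT))

# Time the same batch would need if each query waited for the previous one.
serial_estimate = QUERY_COUNT * REPLICA_ACK_DELAY
```

Each query still pays the full confirmation delay, but the batch as a whole finishes far sooner than `serial_estimate`, because the waits overlap.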

With asynchronous replication, queries are finished immediately, so they don't need many resources. But when we isolate queries from each other in synchronous replication, each of them requires compute resources and memory space. It is not a problem for a thousand queries per second. However, when the number of queries reaches tens of thousands per second, those small expenses significantly load the memory and demand a lot of resources. That's why developing in-memory synchronous replication is a unique task that we have done here in Tarantool.
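A rough estimate of this cost follows from Little's law: the average number of queries in flight equals the arrival rate multiplied by the time each query spends in the system. The sketch below applies that to memory; the latency and the per-query footprint are assumed values chosen only for illustration.

```python
def in_flight_queries(rate_per_s, latency_us):
    """Little's law: average number of queries waiting for confirmation."""
    return rate_per_s * latency_us // 1_000_000


def pending_memory_bytes(rate_per_s, latency_us, bytes_per_query):
    """Memory held by isolated, not yet confirmed queries."""
    return in_flight_queries(rate_per_s, latency_us) * bytes_per_query


LATENCY_US = 5_000        # assumed time until all replicas confirm
BYTES_PER_QUERY = 16_384  # assumed isolation overhead per query

light_load = pending_memory_bytes(1_000, LATENCY_US, BYTES_PER_QUERY)
heavy_load = pending_memory_bytes(100_000, LATENCY_US, BYTES_PER_QUERY)
```

The estimate grows in proportion to both the query rate and the confirmation latency, so anything that lengthens the wait for replicas also increases the memory held by unconfirmed queries.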

Searching for balance

Ultimately, the company's needs dictate the choice of vendor. The asynchronous approach allows faster application recovery if the main database fails. It is the better choice if the service displays static information that doesn't depend on the user's actions.

However, if users can make critical changes through the service, the system's priority is to keep data consistent across replicas, which calls for synchronous replication. Then, if the main server fails, the app continues working without losing any transaction data.

Even though synchronous replication can make in-memory solutions as reliable as traditional databases, this approach is rarely seen.

About the Author:

Vladimir Perepelytsya is a Technical Product Manager.

He has 20+ years of programming experience and 8 years of using Tarantool in projects. Vladimir created S3-compatible Cloud Storage for VK.

If you have any questions about Tarantool, please use the Telegram support chat. Tarantool can be downloaded here.
