In-Memory Fault-tolerant Stateful Services

How can high availability and low-latency data access be provided for services that must maintain state across failure and restart, or across failover?

Problem

Context data and rules associated with a particular service activity can impose substantial runtime state management responsibilities upon composition controllers and other involved services, thereby reducing their scalability.

Typical failover or failure-and-restart approaches involve transaction manager rollback and recovery, which may be untenable in a large-scale distributed system or may require manual intervention. Another typical approach is to use message replay and build complete idempotency into service implementations, which requires either that the architecture serialize the entire context of the business transaction over the wire or that a state repository store the current state of the business transaction.

Typical implementations of State Repository, as used by Stateful Services and by Partial State Deferral and its variations, can become a processing bottleneck and a single point of failure, and may require special programming models and best practices for state management.

How can transient service state data be stored and retrieved in a fault-tolerant fashion without disk persistence, and without special state-passing models having to be implemented and enforced across an organization?

Solution

Context data is managed and stored by dedicated system services that are intentionally stateful. Using the Fault Tolerant Collection, a service instance can reliably put and get its transient state data in familiar data structures such as a Java HashMap or a .NET Dictionary at near in-memory speeds, while still achieving predictable scalability and high availability. The System Service FT Collection makes this possible: it provides redundancy and failover for in-memory application data by synchronizing the in-memory data structures with a duplicate copy held on another machine on the network.

Application

A shared or dedicated FT Hashmap or FT Dictionary utility is made available as part of the inventory or service architecture. A redundant backup or a restarted service may pick up state data from a previous invocation.
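
How a restarted service might pick up state from a previous invocation can be sketched as follows. This is an illustration under stated assumptions, not the pattern's actual utility: `ResumableService`, its step names, and the checkpoint layout are hypothetical, and a plain `ConcurrentHashMap` stands in for the FT HashMap so the example runs on its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical service that records its progress in a shared state map.
// The map is assumed to be the FT HashMap utility; a ConcurrentHashMap
// stands in for it here so the sketch is self-contained.
class ResumableService {
    static final List<String> STEPS = List.of("validate", "reserve", "charge", "confirm");

    private final Map<String, Integer> ftState;   // activity id -> index of next step
    final List<String> executed = new ArrayList<>();

    ResumableService(Map<String, Integer> ftState) { this.ftState = ftState; }

    // Runs the remaining steps of an activity. Throws after 'failAfter' steps
    // to simulate a crash (pass Integer.MAX_VALUE for no crash).
    void run(String activityId, int failAfter) {
        int next = ftState.getOrDefault(activityId, 0);
        int done = 0;
        while (next < STEPS.size()) {
            if (done == failAfter) {
                throw new IllegalStateException("simulated crash before " + STEPS.get(next));
            }
            executed.add(STEPS.get(next));
            next++;
            done++;
            ftState.put(activityId, next);        // checkpoint after each step
        }
        ftState.remove(activityId);               // activity complete: discard transient state
    }
}

class ResumeDemo {
    public static void main(String[] args) {
        Map<String, Integer> ftState = new ConcurrentHashMap<>();

        ResumableService first = new ResumableService(ftState);
        try {
            first.run("activity-1", 2);           // crashes after two steps
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }

        // A restarted (or backup) instance shares only the state map.
        ResumableService restarted = new ResumableService(ftState);
        restarted.run("activity-1", Integer.MAX_VALUE);
        System.out.println(first.executed + " then " + restarted.executed);
    }
}
```

In this sketch the restarted instance continues with "charge" rather than repeating "validate" and "reserve". A crash between a step and its checkpoint would still cause that one step to be repeated.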

Impacts

Performance bottlenecks due to disk I/O are dramatically reduced or eliminated. Synchronization of primary and backup data between network nodes may be affected by network latency. Idempotency may still have to be built into the service logic, but to a lesser degree.
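
One way the remaining idempotency requirement could be met is to record the outcome of each message in the fault-tolerant collection, so that a replayed message returns the recorded result instead of running the business logic again. The sketch below assumes that approach; `ReplaySafeHandler`, the message ids, and the result format are hypothetical, and a `ConcurrentHashMap` stands in for the FT collection.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical handler that tolerates replayed messages by remembering the
// result of each message id in a shared state map. The map is assumed to be
// fault tolerant; a ConcurrentHashMap stands in for it in this sketch.
class ReplaySafeHandler {
    private final Map<String, String> ftResults;               // message id -> recorded result
    private final AtomicInteger executions = new AtomicInteger();

    ReplaySafeHandler(Map<String, String> ftResults) { this.ftResults = ftResults; }

    String handle(String messageId, String payload) {
        // computeIfAbsent runs the business logic only for unseen message ids.
        // With ConcurrentHashMap this check-and-insert is atomic per key.
        return ftResults.computeIfAbsent(messageId, id -> {
            executions.incrementAndGet();                      // stands in for a real side effect
            return "processed:" + payload;
        });
    }

    int executionCount() { return executions.get(); }
}

class ReplayDemo {
    public static void main(String[] args) {
        Map<String, String> ftResults = new ConcurrentHashMap<>();
        ReplaySafeHandler handler = new ReplaySafeHandler(ftResults);
        handler.handle("msg-1", "debit 10");
        handler.handle("msg-1", "debit 10");                   // replayed after a failover
        System.out.println(handler.executionCount());          // prints 1
    }
}
```

Because the recorded results live in the shared map rather than in the handler, a replacement handler instance would also skip messages that its predecessor already processed.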

Architecture

Inventory, Service

Status

Under Review

Contributors

David Chappell

Problem

Context data and rules associated with a particular service activity can impose substantial runtime state management responsibilities upon composition controllers and other involved services, thereby reducing their scalability.

Typical failover or failure-and-restart approaches involve transaction manager rollback and recovery, which may be untenable in a large-scale distributed system or may require manual intervention. Another typical approach is to use message replay and build complete idempotency into service implementations, which requires either that the architecture serialize the entire context of the business transaction over the wire or that a state repository store the current state of the business transaction.

Other common solutions to this problem involve the use of State Repository and/or Partial State Deferral. As noted in Partial State Deferral: "...the routines required to program service logic that carries out runtime state data deferral and retrieval adds overall design and development complexity and effort. Finally, if the aforementioned algorithm optimization is not possible, the retrieval of large amounts of business data as part of a sequential processing routine will introduce some extent of lag time."

In addition, in high-throughput scenarios, conventional means of storing data, such as file persistence or relational database storage, can become a processing bottleneck, increasing the latency of service requests beyond the limits permitted by Service Level Agreements (SLAs).

Solution

Context data and rules are managed and stored by dedicated system services that are intentionally stateful. Using the Fault Tolerant In Memory Collection, a service instance can reliably put and get its transient state data in familiar data structures such as a Java HashMap or a .NET Dictionary at near in-memory speeds, while still achieving predictable scalability and high availability. The System Service FT Collection makes this possible: it provides redundancy and failover for in-memory service state data by synchronizing the in-memory data structures with a duplicate copy held on another machine on the network.
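
The synchronization between the primary copy and its duplicate can be sketched in a few lines. This is a minimal illustration, not the FT Collection's real API: `BackupReplica` and `FtMapSketch` are hypothetical names, the backup is a local object standing in for a node on another machine, and replication is assumed to be synchronous.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the duplicate copy on another machine.
// A real deployment would reach it over the network; here it is a local object.
class BackupReplica<K, V> {
    private final Map<K, V> copy = new HashMap<>();

    synchronized void applyPut(K key, V value) { copy.put(key, value); }
    synchronized void applyRemove(K key) { copy.remove(key); }
    synchronized Map<K, V> snapshot() { return new HashMap<>(copy); }
}

// Hypothetical fault-tolerant map: every write is applied to the backup
// before it is applied locally, so the backup never lags the primary.
class FtMapSketch<K, V> {
    private final Map<K, V> local;
    private final BackupReplica<K, V> backup;

    FtMapSketch(BackupReplica<K, V> backup) {
        this.backup = backup;
        // On (re)start, seed local memory from whatever the backup already holds.
        this.local = backup.snapshot();
    }

    synchronized V put(K key, V value) {
        backup.applyPut(key, value);    // replicate first (simulated network call)
        return local.put(key, value);   // then update the in-memory copy
    }

    synchronized V get(K key) { return local.get(key); }   // reads never leave local memory

    synchronized V remove(K key) {
        backup.applyRemove(key);
        return local.remove(key);
    }
}

class FtMapDemo {
    public static void main(String[] args) {
        BackupReplica<String, String> backup = new BackupReplica<>();
        FtMapSketch<String, String> primary = new FtMapSketch<>(backup);
        primary.put("order-42", "PAYMENT_AUTHORIZED");

        // Simulate losing the primary: a replacement is built from the backup.
        FtMapSketch<String, String> replacement = new FtMapSketch<>(backup);
        System.out.println(replacement.get("order-42"));   // prints PAYMENT_AUTHORIZED
    }
}
```

In this sketch reads are served entirely from local memory, while each write pays for one call to the backup. That call is the point at which network latency would enter in a real deployment.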

Impacts

Performance bottlenecks due to disk I/O are dramatically reduced or eliminated. Object-to-relational mappings are not required. Synchronization of primary and backup data between network nodes may be affected by network latency. The composition controller's workload for managing state on behalf of coordinated services is reduced.
