Sign In
Scalability Notes
[Read -> Think -> Write]
Translate This Page
Translate this page
Powered by
Microsoft® Translator
Options
Blog Home
Email Blog Author
Share this
RSS for posts
RSS for comments
Search
Advanced search options...
Search In:
Everything
Blogs
Forums
People
Groups
Places
Pages
Date range:
All Time
Last Year
Last 6 Months
Last 3 Months
Last Month
Last Week
Last Two Days
Tags
database
distributed system
engineering
hpc
network
parallel
scalability
search
Archive
Archives
December 2010
(1)
September 2010
(1)
August 2010
(1)
April 2010
(1)
February 2010
(2)
January 2010
(4)
December 2009
(1)
November 2009
(1)
October 2009
(1)
September 2009
(1)
August 2009
(4)
June 2009
(2)
May 2009
(1)
April 2009
(1)
March 2009
(2)
February 2009
(4)
January 2009
(1)
Consistency Model - A Survey
MSDN Blogs
>
Scalability Notes
>
Consistency Model - A Survey
Consistency Model - A Survey
changl
14 Jun 2009 6:16 PM
Comments
0
Part I - What's Data Consistency Model and Why Should We Care?
Data Consistency Model
- it is a
Semantic Contract
between a data storage system and its user. Here, data storage system may refer to hardware system ( for example : memory sub-system inDSM, SMP, CMP computers) , or software systems (for example: distributed file system,
web caching or distributed database).
Essentially
, Consistency Model defines what value to return in a read operation.
The most natural semantic for storage system is - "read should return the
last
written value". It is intuitive in memory system with uniprocessor associated, where no concurrent access and no data replication exist.
But when concurrent client access and multiple replica exist, it's not easy to identify what "
last
write
" means, mainly due to the lack of global time clock in parallel/distributed system.
So various data models are proposed to define various data semantic when
Contention
and
Replication
exists:
-
Data Contention
- In memory sub system of parallel computers, there may be multiple processors that access the same memory location at the same period of time, what's the semantic each processor should expect? (different process access the same data item)
-
Data Replication
- In distributed file/database system, each data block is replicated at multiple places, even access(read/write/modify) requests from the same client may involve more than one replicas, what's the semantic in client's perspective? (data item is replicated)
-
Both
Contention & Replication
: What if multiple clients access replicated data blocks? What if there is a local cache for each processor in SMP system? (many processes access replicated data items)
Part II - The Various Models
There are many consistency models exist in computer architectures and distributed storage community. Why people invent so many semantics between storage and its clients? The reason is the trade off between
Strict Semantic
VS
Available
Optimization
Space
- strict (also simple and easy to understand, use and implement) semantic will limit the space that we can use to improve availability and do performance optimization. So each model has its advantages and drawbacks.
Before describe various models, we should clarify some concepts first:
Program Order
- the order of operations (on data items) that is implied by the order of program statements(or assembly instructions). It's the issuing order of operations that are from the same thread/process.
Write Atomicity
- write operation performs as no concurrent/overlapped operations can occur. So the meaning of "
Atomicity
" here differs from that in DBMS field, where "Atomicity" means that
all
ops in a transaction will be performed or
none
of them will be performed.
Consistency
VS
Coherence/Replication
, Consistency focus on the semantic of data store (a set of data item) for multiple client accessing, while Coherence/Replication only deal with the semantic (and how to implement it) of a replicated data item (multiple replica exist) when it is accessed by multiple clients. Coherence/Replication is just part of the Consistency Model.
Coherence / Replication Protocol
only describes how to implement part of the semantic of some specific data consistency model.
[
Update@07/23: Coherence
and
Replication
are very similar and deal with the same problem, but the former is used in hardware system (for example, cache coherence) and the later is used in software system (for example, in distributed files system and database system]
Following are various Consistency Model descriptions:
1. Atomic Consistency
/
Strict Consistency
It means that for a particular data item (byte, cache block and file region etc.), each read will return the latest update. I.E. events are visible to all clients as they occurs/issues. As we already said, due to the lack of global time clock in distributed/parallel system, it is very hard to implement if not impossible.
2. Sequential Consistency
This model is defined as - "The result of any execution is the same as if the operations of all the processors were executed in some sequential order and the operations of each individual processor appears in this sequence in the order specified by its program." - Leslie Lamport
This definition implies:
1. Events/Operations from the same process are executed in program order
2. Every client sees the same order of all events/updates, which is a sequential one
Goods:
- Ensures every body sees the same order, very good for data replication
- Preserves causality
Bads:
- Many interleavings are valid (two independent program with M mem ops and N mem ops will have (M + N)!/(M!N!) valid interleavings)
- Different runs of a program might act differently
- Execution order may be very different from issuing order
3. Causal Consistency
Events that are
causally related
must be seen by everybody in the same order. Unrelated ("concurrent") events can be seen in different orders.
The challenge in implementing this model lies in how to identify "causally related" events.
4. Eventual Consistency
If no updates take place for a long time (inconsistency window), all replicas will gradually become the latest version of the value.
In this model, if client reads data from different replica, it's hard to define a clear semantic about the return value. To resolve this problem,
client-centric consistency
is introduced. These models focus on the semantic in a single client's perspective.
Monotonic Reads
- If a process reads the value of an object, any successive read operations on that object by that process will always return the same or more recent value.
Monotonic Writes
- A write operation by a process on a data item x is completed before any successive write operation on x by the same process.
Read Your Writes
- The effect of a write operation by a process on data item x will always be seen by a successive read operation on x by the same process. A closely related model is
"Session Consistency".
In this model,
Read Your Write
semantic only works within a session context and not cross session boundary.
Write Follow Reads
- Any successive write operation by a process on a data item x will be performed on a replica of x that is up-to-date with the value most recently read by that process.
5. Weak Consistency
A storage system is said to support weak consistency if:
All synchronization operations are seen by all processors in the same sequentially order.
All data operations may be seen in different order on different processors. (But order of ops on the same data item is preserved)
The set of both read and write operations in between different synchronization operations is the same in each processor.
An alternative definition is as:
Accesses to synchronization variables are sequentially consistent.
No access to a synchronization variable is allowed to be performed until all previous writes have completed everywhere.
No data access (read or write) is allowed to be performed until all previous accesses to synchronization variables have been performed.
This model is weaker than sequential consistency model, since no order assumption can be made on data operations between consecutive synchronization operations.
The weak consistency leverage the fact that - between consecutive sync operations, no other process can use the data being written. So it's safe to reorder write operations on different mem locations. But the order of operations on the same location must be preserved to ensure a reasonable memory semantic.
6. Release Consistency
Release consistency is an extension of weak consistency that exploits the information about
acquire
,
release
, and
nonsynchronization accesses
.
Before an ordinary LOAD or STORE access is allowed to perform with respect to any other processor, all previous
acquire
accesses must be performed, and
Before a
release
access is allowed to perform with respect to any other processor. all previous ordinary LOAD and STORE accesses must be performed, and
Special accesses are processor consistent with respect to one another.
In Release Consistency Model, all write operations by a certain node are seen by the other nodes after the former
releases
the object and before the latter
acquire
it, but a node must call
acquire
to get up-to-date values.
There are two kinds of coherence protocols that implement release consistency:
-
eager
, where all coherence actions are performed on release operations, and
-
lazy
, where all coherence actions are delayed until after a subsequent acquire
7. Entry Consistency
Like all variants of
Release Consistency
, it requires the programmer to use
acquire
and
release
at the start and end of each critical section, respectively. However,
entry consistency requires each ordinary shared variable to be associated with some synchronization variable such as a lock or barrier
.
- An
acquire
access of a synchronization variable
S
is not allowed to perform with respect to process
Pi
until all updates to the guarded shared data
Ds
have been performed with respect to process
Pi
.
- Before an exclusive mode access to a synchronization variable
S
by processor
Pi
is allowed to perform with respect to
Pi
, no other processor may hold
S
in non-exclusive mode.
- After an exclusive mode access to
S
has been performed, any processor's next non-exclusive mode access to
S
may not be performed until it is performed with respect to the owner of
S
.
8. Processor Consistency
- Writes done by a single processor are received by all other processors in the order in which they were issued, but
- Writes from different processors may be seen in a different order by different processors
The motivation behind this model is to better reflect the reality of networks in which the latency between different nodes can be different.
This model is also known as
FIFO Consistency
and
P
ipeline
R
andom
A
ccess
M
emory
Consistency
.
Notes on Nex Steps:
1. Why People Invent Each Model?
2. What's Good and What's Bad for Each Model?
[Reference]
[1]
Wiki on Consistency Model
[2]
Summary on Consistency Model by CTO@Amazon
[3]
Shared Memory Consistency Models: A Tutorial
[4]
Memory Consistency Models
[5]
Notes On Consistency
[6] Entry Consistency,
Midway : Shared Memory Parallel Programming with Entry Consistency for Distributed Memory Multiprocessors
[7] Release Consistency,
Memory Consistency and Event Ordering in Scalable Shared-Memory Multiprocessors
[8] Weak Consistency,
Memory Access Buffering in Multiprocessors
[9] Memory Ordering in Modern Microprocessors (Part
I
, Part
II
)
[10]
Wiki on Cache Coherence
0 Comments
distributed system
Blog - Comment List MSDN TechNet
Comments
Loading...
Leave a Comment
Name
Comment
Please add 1 and 2 and type the answer here:
Post