July 26, 2016
This article was contributed by Jade Alglave,
Paul E. McKenney, Alan Stern, Luc Maranget, Andrea Parri and TBD
The herd program reads in a litmus test and
evaluates it according to a memory model.
The model is contained in a .cat file
specified by the “-model” command-line option
(or a “model” line in the configuration file)
plus an optional .bell file
specified by the “-bell” option
(or a “bell” line in the configuration file).
The Bell file, if present, gets processed first.
Both files consist of statements in the cat language,
and herd treats them almost identically—the
only real difference is that the Bell file is allowed to
contain “instructions” statements but the
Cat file is not.
(The “instructions” statement defines
tags which can decorate elementary operations.
For instance, one such statement might specify that a
barrier operation can be decorated with an “mb” tag,
which could indicate that the operation is a full memory barrier.)
Typically a Bell file is used for common code that is shared
among multiple models, where each model would have its own Cat file.
After loading the Bell and Cat files, herd parses the
litmus-test program.
The program may contain macros
(for instance, “smp_mb()”)
which are expanded to internal constructs
(such as a barrier with an “mb” tag).
The mapping from user-level macros to internal constructs
is defined in a file like linux.def that is specified
by the “-macros” command-line option
(or a “macros” line in the configuration file).
Next, herd interprets the program by constructing a list of
events for each thread.
For programs written in a high-level language like C, this involves
breaking statements and expressions down into a series of elementary
operations (read, write, arithmetic/logic on registers, branch, and so on);
for programs in assembly language, the individual instructions
generally correspond directly to these operations.
However, atomic read-modify-write instructions always get represented
by two operations, a read and a write, linked by the built-in rmw
relation.
The events are organized as one list per thread, in program order.
This is not always straightforward, because of a subtle but important fact:
“Program order” refers to the order of instructions
as they are presented to the processor's execution unit,
not their order in the program's source or object code.
While the two orders are often the same, they will differ when branches
are present.
A forward branch causes some instructions to be left out of the event list,
and a backward branch can cause some instructions to be repeated in the list.
Since it is not known in advance whether a conditional branch will be
taken, each such branch causes herd to generate two event lists:
one in which the branch is taken and one in which it isn't.
Thus, a program containing two conditional branches will give rise to four
lists, a program containing three will give rise to eight, and so on.
The po relation then refers to the order of the events in an
individual list,
and herd has to test each list separately, as a possible
program execution.
When the program contains a loop, a conditional branch may be taken
an indefinitely large number of times.
In this situation the number of possible executions would quickly get
out of hand, so there is a limit on how many times herd will
allow a particular branch to appear in an execution
(specified by the “-unroll” command-line option),
typically set to 2.
Executions with a higher number of iterations simply will not be considered.
Then, given a candidate execution, herd has to determine, for each
read event, which write event stored the value that the read will retrieve.
Again, there's no way to know this in advance, so if a program has
more than one write to a particular variable, herd has to try
all possible combinations for the rf relation.
Just as with conditional branches, this can lead to exponential growth
in the number of possible executions to be tested.
As part of its processing of a candidate execution, herd
carries out a dataflow analysis of the values computed and stored in
the local variables (or CPU registers) for each thread.
This analysis gets used in several ways:
- herd checks each conditional branch, making sure that
the branch's condition is true if and only if the execution
has decided to take the branch.
- herd checks the target address of each indirect memory
access (i.e., access through a pointer or relative to a CPU
register), making sure that the rf relation really
does link writes and reads to the same target address.
- herd determines exactly what data, address, or control
dependencies exist between
memory-access events.
These dependencies are made available to the model
through the built-in data, addr, and ctrl
relations.
As an example of this last point, given the statement
WRITE_ONCE(*x, READ_ONCE(*y));
in the litmus-test program,
herd would break it down into two events:
rtemp = READ_ONCE(*y)
WRITE_ONCE(*x, rtemp)
(where
rtemp is a temporary local variable), and it would add a link
rtemp = READ_ONCE(*y) ⟶ WRITE_ONCE(*x, rtemp)
to the
data relation.
Finally, once a particular choice for the po and rf
relations has been settled on, the execution checks out okay,
and the data, addr, and ctrl relations have
been set up, herd runs
the statements in the Bell and Cat files to see whether the memory model
considers the candidate execution to be allowed.
If any of the model's checks fail, the execution is abandoned.
Otherwise, herd evaluates the logical assertion at the end of
the litmus test.
It keeps track of the number of allowed executions for which the
assertion is true and the number for which it is false; these are the
numbers reported at the end in herd's output.
Unlike po and rf, the co relation is not built-in.
It has to be computed explicitly by the memory model.
In practice this is done by the cos.cat file,
which is included near the start of the model's Cat file
(see for example line 3 in each of the two RMO memory-model files above).
This involves another potentially exponential computation,
because it is necessary to try all possible orderings of the write accesses
to each variable.
herd works in terms of sets of events and relations between events.
(A relation is a collection of ordered pairs of events;
you can think of each ordered pair as a link
going from the first event in the pair to the second.)
The cat language used in the
Bell and Cat files
is rich in operators for constructing and testing these sets and relations.
To begin with, herd has a number of built-in sets
used for classifying events.
Each event is automatically added to the appropriate sets.
Name |
Contents |
Comment |
R |
Read events |
|
W |
Write events |
|
IW |
Initial Write events |
“writes” that set a variable's initial value |
FW |
Final Write events |
values that are tested in the final assertion |
M |
Memory access events |
same as “R | W” |
B |
Branch events |
|
RMW |
Read-Modify-Write events |
the component events of an atomic RMW instruction |
F |
Fence events |
also known as Barrier events |
_ |
All events |
wildcard |
In addition,
using the “enum” and “instructions”
statements, a Bell file can define an enumerated list of tag values
and specify that instructions of a specific kind must be labelled
with one of these tags.
herd then creates a set for each tag value;
the name of the set is the same as the name of the tag but with the
first letter capitalized, and the set contains all events generated
from instructions labelled by the corresponding tag.
For example, the following cat code:
enum Accesses = 'once || 'release || 'acquire || 'deref
instructions R[{'once,'acquire,'deref}]
defines a bunch of
Accesses tags,
and says that all read (“
R”) instructions
should be labelled with a
“
once”, an “
acquire”,
or a “
deref” tag.
Given this, the “
Once” set would contain all events
corresponding to an instruction (possibly a read, possibly something else)
labelled with the “
once” tag,
and similarly for the “
Release”,
“
Acquire” and “
Deref” sets.
herd also comes with a selection of
built-in relations,
some of which we have already mentioned:
Name |
Relation |
Comment |
0 |
Empty |
empty relation, contains no links |
id |
Identity |
links each event to itself |
int |
Internal |
links events that are in the same thread |
ext |
External |
links events that are in different threads |
loc |
Location |
links memory-access events that target the same variable |
rf |
Reads-From |
links a write event to any read event that loads the value
stored by that write |
rmw |
Read-Modify-Write |
links the read and write component events of an RMW instruction |
po |
Program Order |
links events in the same thread, in the order they occur
in the instruction stream |
addr |
Address dependency |
links a read event to any memory-access event whose target address
depends on the value loaded by the read |
ctrl |
Control dependency |
links a read event to all events that are executed conditionally
depending on the value loaded by the read |
data |
Data dependency |
links a read event to any write event that stores a value
which depends on the value loaded by the read |
herd's standard libraries stdlib.cat (automatically included
for all models) and cos.cat define a
few extra relations,
which can be used as though they were built-in:
Name |
Relation |
Comment |
po-loc |
po for the same location |
links memory-access events that target the same variable,
in program order;
same as “po & loc” |
rfe |
external reads-from |
rf restricted to pairs of accesses in different threads;
same as “rf & ext” |
rfi |
internal reads-from |
rf restricted to pairs of accesses in the same thread;
same as “rf & int” |
co |
coherence order |
total ordering of all writes to the each variable |
coe |
external coherence order |
co restricted to pairs of writes in different threads;
same as “co & ext” |
coi |
internal coherence order |
co restricted to pairs of writes in the same thread;
same as “co & int” |
fr |
from-read |
links a read event to any write event for the same variable
that comes after (in the variable's coherence order) the
write which the read event reads from;
same as “rf^-1 ; co” |
fre |
external from-read |
fr restricted to pairs of accesses in different threads;
same as “fr & ext” |
fri |
internal from-read |
fr restricted to pairs of accesses in the same thread;
same as “fr & int” |
Bell and Cat files can compute their own sets and relations
using
functions and operators
provided by the cat language
and libraries, as summarized in the following table.
Operators with higher precedence (tighter binding) are higher up in the table.
Operator |
Operation |
Example |
Applicability |
domain |
Domain of a relation |
domain(x)
computes the set of all events that are the start of
a link in x |
Applies to a relation, yields a set |
range |
Range of a relation |
range(x)
computes the set of all events that are the end of
a link in x |
Applies to a relation, yields a set |
fencerel |
Link events separated by a fence |
fencerel(x)
computes the relation consisting of all pairs of events
where the first precedes (in program order) an event
in the set x and the second follows it; the same
as “(po & (_ * x)) ; po” |
Applies to a set, yields a relation |
Postfix ^-1 |
Inversion |
x^-1
computes the relation obtained by reversing
all the links in x |
Applies to a relation |
Postfix ? |
Reflexive closure |
x?
computes the relation consisting of all pairs of events
that can be connected by a chain of links from
x of length 0 or 1;
the same as
“id | x” |
Applies to a relation |
Postfix + |
Transitive closure |
x+
computes the relation consisting of all pairs of events
that can be connected by a chain of links from
x of length 1 or greater;
the same as
“x | (x;x) | (x;x;x) | ...” |
Applies to a relation |
Postfix * |
Reflexive-transitive closure |
x*
computes the relation consisting of all pairs of events
that can be connected by a chain of links from
x of length 0 or greater;
the same as
“id | x | (x;x) | (x;x;x) | ...” |
Applies to a relation |
Prefix ~ |
Complement |
~x
computes the relation (or set) consisting of all links (or events)
not in x |
Applies to a relation or a set |
Infix * |
Cartesian product |
x * y
computes the relation consisting of all links from
an event in set x to an event
in set y |
Applies to sets, yields a relation |
\ |
Difference |
x \ y
computes the relation (or set) consisting of all links (or events)
in x that are not in y |
Applies to relations or sets |
& |
Intersection |
x & y
computes the relation (or set) consisting of all links (or events)
in both x and y |
Applies to relations or sets |
; |
Sequencing |
x ; y
computes the relation consisting of all links
a⟶c such that for some event b,
x contains a⟶b
and y contains b⟶c |
Applies to relations |
| |
Union |
x | y
computes the relation (or set) consisting of all links (or events)
in x or y or both |
Applies to relations or sets |
(* *) |
Encloses comments |
(* This is a comment *) |
|
Although the language includes a variety of statement types,
the ones found most frequently in Bell and Cat files are assignment
(“let” or “let rec”)
and check statements.
“let” statements are self-explanatory;
we have already seen several examples in the RMO memory models above.
“let rec” statements,
used for mutually recursive definitions,
are more complex; we will see an example
here.
Check statements come in three forms:
- “acyclic <expr>”
requires the relation computed from “expr”
not to have any cycles;
- “irreflexive <expr>”
requires the relation computed from “expr”
not to have any links from an event to itself;
- “empty <expr>”
requires the set or relation computed from “expr”
to be empty.
If a check fails, it indicates that the memory model prohibits the
candidate execution under consideration;
that is, the memory model says that this execution could never occur.
A check can be negated by prefixing it with “~”.
Also, a check can be flagged by putting the “flag”
keyword in front of it.
Unlike a normal check, when a flagged check succeeds it indicates that
the execution has encountered some sort of semantic problem.
If this happens, herd adds a warning message to its output.
Along with the “let”,
“include”, and check statements,
the only other types of statement we will see are
“enum” and “instructions”,
both briefly
mentioned earlier.
The cat language supports many features we won't cover here,
including higher-order sets and tuples, pattern matching, user-defined
functions, iteration, and recursion in the style of OCaml
(the language herd is written in).
herd itself also has a large number of features we won't discuss,
such as the ability to skip certain checks when
testing a memory model or to produce figures (like the ones in this
document) illustrating the events and relations in a particular
program execution.
More complete documentation can be found in the
herd manual.
With this background, we are ready to examine larger, more realistic
memory models.
Something like the
Linux-kernel memory models,
in fact.