Why is CSP Needlessly Difficult to Learn and so Easy to Forget, and Why are its Semantics Upside Down?
Leader: Peter Welch <p.h.welch@kent.ac.uk>
Estimated time: 1 hour plus
Background
There has been a gap between the ways CSP and occam developed that's troubled
me for a long time. My problem is that I think "occam" and that leads me to
make mistakes in "CSP". Of course, my answer is that CSP must be doing it
wrong and needs fixing. And, of course, I am probably very wrong!
Nevertheless, I shall attempt to explain my difficulties with CSP, with respect
to simplicity, programming concepts and an old-fashioned (but arguably more
natural) view of semantics.
Difficulties
Sequence ...
Why does CSP have two operators for sequence ("->" and ";") whereas occam
needs just one ("SEQ")? A CSP event is not a process, but is something
that prefixes (or guards) a process. Hence, there is one symbol for that prefixing
(which means sequence) and another symbol for combining processes in sequence
(which means sequence).
Hoare's earlier CSP made use of the separate symbols: the operands for
the external choice operator had to be prefix guarded processes
(e.g. "a -> P"). However, modern CSP dropped this restriction: so
that reason for the two symbols no longer applies.
Events and Processes ...
At the end of the day, CSP processes consist of little else but events,
engaged with and refused in various ways. If events were processes,
as they are in occam, CSP would be simpler. Here's one beginner's problem
that goes away. Consider:
(a -> b -> c -> X) [] (b -> e -> X)
We cannot factor out the "-> X" to shorten the above to:
((a -> b -> c) [] (b -> e)) -> X
since the above, of course, does not parse in CSP. However, with events as
processes:
(a; b; c; X) [] (b; e; X) = ((a; b; c) [] (b; e)); X
the factorisation is allowed.
Making such a change to CSP cannot damage
its existing consistency and semantics, since we could simply map "a" in
the new CSP to "a -> SKIP" in the old. But see the next section.
Channel Input Events ...
There is one exception threatening to spoil the above story: channel input events.
Channel inputs in CSP have a special semantics, different from that for channel outputs,
which is a bit of a surprise to new (and old) learners.
If we have:
out ! x -> P
this simply means output the value of "x" on channel "out" – an offer
to engage in a single event – and, then, do process "P".
On the other hand:
in ? x -> P
means input any value from channel "in" – an offer to engage
in a set of events (one for each possible arrival value) –
choose between them (if the environment offers more than one, though that
is unusual), bind the value chosen to the name "x" (which is a new name,
having no connection with any other "x" that may previously have been in
scope) and, then, do process "P" (in which this new "x" is now in scope).
Phew!
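The asymmetry between output and input offers can be sketched in ordinary Python (all names below are illustrative, not from any CSP tool): an output offers a single event, an input offers one event per possible arriving value, and the environment's choice of event determines the binding.

```python
# Illustrative sketch only: events are modelled as strings "chan.value".

def output_offer(chan, x):
    """'chan ! x' offers exactly one event."""
    return {f"{chan}.{x}"}

def input_offer(chan, values):
    """'chan ? x' offers one event per possible arriving value."""
    return {f"{chan}.{v}" for v in values}

def resolve_input(chan, env_offers):
    """Scan the environment's offers; the first matching 'chan.v'
    (in sorted order, standing in for the environment's choice) fires
    and its value v is what gets bound to the fresh name 'x'."""
    prefix = chan + "."
    for event in sorted(env_offers):
        if event.startswith(prefix):
            return int(event[len(prefix):])   # the value bound to 'x'
    return None                               # no matching offer: refuse

print(output_offer("out", 42))                 # one event
print(input_offer("in", range(3)))             # a set of events
print(resolve_input("in", {"in.7", "done"}))   # binds x = 7
```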
Unfortunately, these semantics mean that mapping, from our brave new CSP
to the old, by replacing "in?x" with "in?x -> SKIP" does not work,
since that "x" would only be in scope for the "SKIP".
CSP's use of its "->" operator, when combined with a left operand
channel input to introduce a new name with scope localised to its right
operand, is concise but overloads the syntax a little and confuses some of us.
Localised scope is a great thing and we want more – see the next
section! Why not do this in a way that generalises? For example:
int MAX = 1000: P -- I like explicit typing
introduces the name "MAX", bound to a value, and localises its scope
to the process "P". Currently, we can declare:
MAX = 1000
but its scope is global (unless restricted by the bounds of a "let ... within"
clause).
Now, declaring and binding a name for localised scope is what CSP channel input
does. So, let's have a syntax that makes this explicit (i.e. not overloaded
with event prefixing and misleadingly similar to channel output) and that
aligns with the syntax for all local declarations – for example:
int x ? in: P -- which I prefer over "int in ? x: P" and "int x = in?: P"
Reasons for preference: the first alternative, "int in ? x: P", places
the name being declared ("x") inconsistently with other declarations
(e.g. "MAX" above);
the second alternative, "int x = in?: P", implies that "in?"
is an expression ... which is bad because its evaluation has a side-effect
... and that breaks the simple rules of algebra
(e.g. the value of "(in?) - (in?)" depends on its order of evaluation).
These declarations of named values align with the proposal for local channel declarations
in the next section.
Hiding Globals versus Declaring Locals ...
Then, there's hiding, :(. Events in CSP live in a flat global space.
If processes in a network want to have private synchronisations just
between themselves, we first have to declare events (channels) in that
global space and then hide them from everyone else's view. Far more
natural – from the perspective of a programmer – would be for the
network to declare local events (in its private namespace) and just
use them. Nothing needs to be hidden post-construction of the network.
This may seem a small thing, but having that global name space has
caused me nasty problems (e.g. in mapping the verification extensions
for occam-pi into CSP) that I would rather not have had to confront.
occam has traditional block structured name spaces and events are not
excluded from this. Why can't CSP?
Here's a proposal. Consider:
chan int c: P
This declares a channel, "c", that carries integers and is in scope only
for the process "P".
This replaces:
channel c : Int -- global declaration (could be anywhere)
P \ {| c |} -- hide all 'c.x' events within 'P'
The proposal may seem a modest change in syntax; but the declaration of "c"
in the standard CSP above has global scope and can be used by any other process
(but had better not be!).
Channel Arrays ...
Then, there are channel arrays:
channel c: {0..9}.Bool
Is this a single channel carrying two values per message or an array
of channels carrying one value each? In CSP, they are the same.
From the system designer's (and programmer's) perspective, this seems nuts.
However, with regard to their semantics, they really are the same
and CSP deserves praise for avoiding unnecessary duplication –
Ockham's Razor.
I just worry that this good property from the mathematics might be a bit
too abstract and clever.
One consequence is this: in order to select which channel (or
channels) of an array of channels to take input from, a process
also needs to be able to select which input value (or values) it is prepared
to take from an individual channel. For example:
c.42?b
might mean: "input a Boolean value from channel 42 of the array c".
Or it might mean: "accept only the value 42 from channel c and then
input a Boolean".
This second ability – refusing an input depending on the value being
sent – whilst having a simple semantic sense, throws up awkward
problems for implementation that may add significant overheads.
Consider two processes running on different processors with different
physical memory. Suppose the message has considerable size. If the
receiving process refuses certain message values and accepts others,
the whole message must be received before its decision on acceptance
can be made. If the receiving process subsequently never accepts this
message, or the sender subsequently withdraws it, the transfer of the
message has wasted time (for these processes and others, through load
on the communication fabric) and memory (in the receiving process).
Without that ability to refuse input because of its content, this
waste does not happen – only an initial tag for the message
need be sent so that the receiver becomes aware of the offer and
commits to it or not, as it pleases. This was the position for
classical occam and is maintained in occam-pi.
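That tag-first discipline can be roughly sketched in Python (hypothetical names, no real transport: just counting bytes): a refused offer costs only the tag, never the payload.

```python
# Illustrative sketch: a two-phase offer in which only a small tag
# crosses the "wire" before the receiver commits.

def transfer(tag, payload, accept):
    """Simulate one offer: 'accept' is the receiver's policy on tags.
    Returns (payload_delivered_or_None, bytes_moved)."""
    bytes_moved = len(tag)                  # phase 1: tag only
    if not accept(tag):
        return None, bytes_moved            # refused: payload never sent
    bytes_moved += len(payload)             # phase 2: committed transfer
    return payload, bytes_moved

big = b"x" * 1_000_000
ok, cost_ok = transfer(b"IMG", big, lambda t: t == b"IMG")
no, cost_no = transfer(b"LOG", big, lambda t: t == b"IMG")
print(cost_ok, cost_no)   # refusal costs only the 3-byte tag
```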
On the other hand, refusing certain channels from an array of them
– a special case of the above if no distinction is made –
is simple to express and simple to implement, with none of the overheads
of the general case. Again, this is the position with occam.
Now, it can be argued that such implementation overheads may be the concern
of a programming language but are no concern to a language used for formal
reasoning or model checking. A problem arises, though, when systems are
designed using the formal language so that its properties can be verified
before translation into something executable (and filling in any necessary
computational detail not relevant for the formal model). If the verified
formal model uses capabilities not available from the programming language,
the model must be changed not to use them and, of course, re-verified.
Alternatively, if those capabilities are available from the programming
language but only at significant run-time costs, the model must still
be changed not to use them and re-verified for reasons of optimisation.
So, maybe avoid temptation and remove the capability from the formal language?
In this case, that means: accept any input or refuse any input,
which has the merit of simplicity.
And add, of course, a conventional array notation so that, for channel arrays,
we can choose which elements to use.
Complex Events versus Session Protocols ...
Next are the complex events in CSP – e.g.
d?x?y!f(x,y)
which means, I think: bind "x" to any value coming down "d"; then
(i.e. in sequence), bind "y" to any value coming down "d"; then
compute "f(x,y)" and send the answer back up "d" – although, strictly,
the whole thing resolves as a single event with three fields, not three
separate communications, which makes the reading even less obvious.
occam-pi has a proposal for
two-way (or session) protocols
that is both more general than this and simpler.
Channels carry just one value (which may be a complex data structure)
per message. However, channels are declared with a protocol that
enforces processes using them to follow only the declared sequences
and directions of operation.
See also Life of occam-Pi.
The CSP channel declaration for the above "d"
only declares the types of the fields in the complex events,
but not their directions. Again, this leads to (unnecessary) mistakes.
Processes Hold Channel Ends, not Channels ...
And talking of channel directions, occam-pi requires processes
engaging on channels to declare which end they hold. For simple
session protocols (e.g. a single message), this means whether
they are sending or receiving – which aids understanding and
enables better compiler error messages if mis-used. For more
complex session protocols, this means whether they follow the
declared input-output sequences or invert the directions.
This also aids understanding and enables better error messages.
The Tyranny of Alphabets ...
Next we come to alphabets. In Hoare's original CSP, these had
to be declared for each process – and occam does the same (though
these are implicitly deduced by the compiler). For flexibility
in specifying which events are interleaved and which must wait
for synchronisation, modern CSP associates event alphabets with
a range of parallel composition operators, rather than with
processes – at least, I suppose that is the reason. The result,
though, is an alphabet tyranny that can be pretty difficult
to live with sometimes. Well, I have that trouble anyway.
Here's a simple example, which also shows that associativity of
parallel composition is not straightforward. Consider:
--------- ---------
| | a | |
| P |------------------| Q |
| | | |
--------- ---------
| |
| |
c | | b
| --------- |
| | | |
---------| R |----------
| |
---------
We have a cycle of processes, each one synchronising with the one
immediately before and after it in the cycle. Parallel composition
is a binary operator, so we must choose which two processes to
combine first and then add the third. There are three ways to do
this, ignoring commutativity (which is straightforward). Either:
(P [| {a} |] Q) [| {b, c} |] R
which gives no indication that "b" syncs "R" and "Q" and that "c"
syncs "R" and "P". Or:
P [| {a, c} |] (Q [| {b} |] R)
which gives no indication that "a" syncs "P" and "Q" and that "c"
syncs "R" and "P". Or:
(P [| {c} |] R) [| {a, b} |] Q
which gives no indication that "a" syncs "P" and "Q" and that "b"
syncs "R" and "Q".
An alternative, that gives more information, is to use the alphabetised parallel
operator.
This declares the alphabets on which the two operands synchronise separately
– the synchronisation set of the parallel processes is then their intersection.
For example:
(P [{a, c} || {a, b}] Q) [{a, b, c} || {b, c}] R
This does make all the details of who syncs with whom explicit.
But all those alphabets become overwhelming as systems scale in the number
of events engaged upon by their processes.
If alphabets were associated with processes, as in Hoare's original
CSP, we would just have:
alpha (P) = {a, c} alpha (Q) = {a, b} alpha (R) = {b, c}
(P || Q) || R = P || (Q || R) = (P || R) || Q = P || Q || R
which is easier on the eye (and where the alphabets for the combined processes arise
automatically from the union of the individual alphabets).
However, the original CSP gave little flexibility when it comes to
interleaving events. Two parallel processes either synchronised
on common events in their alphabets or interleaved on them all.
Consider:
a
------------------------------------------
| |
| |
--------- ---------
| | | |
| P | | Q |
| | | |
--------- ---------
| |
| |
------------------------------------------
b
We have two processes, each engaging in the same two events.
In the original CSP, we could combine them so that they synchronised
on "a" and "b", or interleaved on them. In modern CSP, we have
more flexibility. For example, we can arrange for them to synchronise
on "a" but interleave on "b":
P [| {a, b} |] Q -- sync on both
P [| {a} |] Q -- sync on 'a', interleave on 'b'
P [| {} |] Q -- interleave on both (P ||| Q)
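The rule behind these three compositions can be sketched in Python (names below are illustrative): under "P [| A |] Q", an event in the synchronisation set A needs both processes to agree, while an event outside A is a solo (interleaved) step for whichever side offers it.

```python
# Illustrative sketch: the transitions of P [| sync |] Q, given the
# events each side is currently ready to perform.

def moves(ready_p, ready_q, sync):
    """All possible next steps, labelled by who moves."""
    ms = {("both", e) for e in ready_p & ready_q & sync}  # joint steps
    ms |= {("P", e) for e in ready_p - sync}              # P alone
    ms |= {("Q", e) for e in ready_q - sync}              # Q alone
    return ms

ready = {"a", "b"}
print(moves(ready, ready, {"a", "b"}))  # sync on both: joint steps only
print(moves(ready, ready, {"a"}))       # 'b' becomes two solo steps
print(moves(ready, ready, set()))       # full interleaving (P ||| Q)
```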
To get this flexibility when alphabets are associated with processes,
the events in those processes need to have been marked as synchronising
events (the default) or shared events (for interleaving).
This can be done through qualifiers on process event parameters or,
if its events are free (i.e. declared global to the process), through
the property descending through the block structure. Either way (and
the former is a better way), the alphabets of the processes are both
{a, b} and the parallel composition is just (P || Q). The declared
natures of the events determine whether the processes synchronise or
interleave on them.
Modern CSP has further flexibility. Since determination of sync or
interleave is postponed until parallel composition, the same
process definition is used for whatever scenario is chosen. With
the above proposal, different process definitions (with different
event qualifiers) are needed for the different scenarios. But that's
where we feel such decisions should be declared – in the process.
It declares how the process should be used, enables less costly
run-time support and avoids the tyranny of having to keep declaring
alphabets (that can get very complex) for each stage of system build.
With occam-pi, the same flexibility is achieved through
the automatic association of alphabet and process. This requires
specifying not just the end of the channel used, but whether it
is to be "SHARED" with other processes – i.e. interleaving.
It should be noted that interleaving on events is more difficult,
and more (run-time) costly, to implement than synchronisation.
The maths may be simpler but the practice is not. Flagging that
a process is willing to share (interleave) its engagement on events
with other processes is not only useful semantic information (for
anyone looking to use that process), but it also enables the
extra run-time costs for interleaving to be incurred only for those
specified events / channel-ends.
From the perspective of the process designer, specifying such
interleaving or synchronising use is a natural part of the required behaviour.
Deferring that decision to the user of the component – in the
alphabets of the parallel operator with which processes are
combined – can be argued to give unnecessary responsibility to
the user, who will make mistakes. The tyranny referred
to earlier is that a decision on synchronisation alphabets must
be made every time a component is used.
In occam, we don't do that
and the following equivalences (associativity) always hold:
PAR                 PAR                 PAR
  ... A               PAR                 ... A
  PAR          =        ... A      =      ... B
    ... B               ... B             ... C
    ... C             ... C
and, of course, everything commutes. In modern CSP, this is not
so pretty – as seen earlier.
Example: the Sieve of Eratosthenes (in standard CSP-M) ...
Generating prime numbers through the Sieve of Eratosthenes is a small,
but nice, example of parallel (non-tail) recursion.
As such, it provides a useful illustration of some of the issues discussed so far.
We do the modern CSP version first.
A key component of the sieve is a filter process that just removes
exact multiples of a given number from a stream of numbers:
filter (n, in, out) = -- inferred types and bi-directional channels
in ? x -> -- name declaration and binding
(
(if (x%n) == 0 then SKIP else out ! x -> SKIP);
filter (n, in, out)
)
Now, we can write the sieve.
The first number it gets is a prime; it outputs this and, then,
pipes an instance of filter (of multiples of that prime)
into a recursive instance of itself.
The coding is complicated by the need to keep track of recursion level
to ensure that a different global channel is used for each level of the pipe:
channel c : Int.Int -- global array of channels (carrying Ints)
-- we need a different channel for each level of recursion (in sieve') ...
-- we use the above channels in, and only in, sieve' ...
sieve (in, out) = sieve' (0, in, out) -- record recursion level 0
sieve' (n, in, out) =
in ? x -> out ! x -> -- first number received is prime, so output it
(
(
filter (x, in, c.n) -- filter multiples of that prime
[| {| c.n |} |]
sieve' (n+1, c.n, out) -- record increase in level of recursion
) \ {| c.n |}
)
Now, we just need a numbers generator with which to pour numbers into the sieve:
numbers (n, out) = out ! n -> numbers (n+1, out)
And we are done ... bar the trouble of declaring a global channel
to connect numbers to the sieve, and then hiding it:
channel d : Int -- global channel: only use in eratosthenes
eratosthenes (out) =
(
numbers (2, d) [| {| d |} |] sieve (d, out)
) \ {| d |}
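For readers who want something to run, here is a rough executable analogue in Python, with ordinary generators standing in for channels (so no real parallelism, alphabets or hiding – just the data flow of the pipeline; all names are illustrative):

```python
from itertools import islice

def numbers(n):
    """Pour n, n+1, n+2, ... into the pipeline."""
    while True:
        yield n
        n += 1

def filter_multiples(n, inp):
    """Remove exact multiples of n from the stream."""
    for x in inp:
        if x % n != 0:
            yield x

def sieve(inp):
    """First number received is prime: output it, then pipe a filter
    of its multiples into a recursive instance of the sieve."""
    x = next(inp)
    yield x
    yield from sieve(filter_multiples(x, inp))

print(list(islice(sieve(numbers(2)), 10)))
# the first ten primes: [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

As in the CSP version, each level of recursion owns its own connecting "channel" (here, the generator returned by filter_multiples), so no global channel array and no bookkeeping of recursion level is needed.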
Example: the Sieve of Eratosthenes (with revised CSP syntax) ...
The changes to the filter process replace type inference with explicit
typing (including channel directions) and implicit name declaration and binding
(via channel input) with explicit name declaration and binding (via channel input):
filter (int n, chan int in?, out!) = -- explicit types and channel-ends (i.e. uni-directional channels)
int x ? in: -- name declaration and binding
(
if (x%n) = 0 then SKIP else out ! x;
filter (n, in?, out!)
)
The sieve process uses a local channel to connect its sub-processes;
so it no longer needs to keep track of recursion level to ensure each level uses
a different (global) channel:
sieve (chan int in?, out!) =
int x ? in:
(
out ! x; -- first number received is prime, so output it
chan int c:
filter (x, in?, c!) || sieve (c?, out!)
)
The (parametrised) synchronisation alphabet of sieve is {in?, out!}, from its header
and non-use of any global events.
The (partially parametrised) synchronisation alphabets of the two processes in the above
parallel composition are trivial to deduce as {in?, c!} and {c?, out!};
they do not need to be spelt out here.
And finally:
numbers (int n, chan int out!) = out ! n; numbers (n+1, out!)
eratosthenes (chan int out!) =
chan int c:
numbers (2, c!) || sieve (c?, out!)
Well, almost finally ...
We are still somewhat plagued by either the tyranny of parenthesis overload
or the tyranny of operator precedence (the rules for which cannot be
remembered when there are so many operators).
This could be relieved by replacing the sequence (";") and
parallel ("||") infix operators with prefix ones (say
"seq" and "par") and letting indented layout determine scope.
For example:
sieve (chan int in?, out!) =
int x ? in:
seq
out ! x -- first number received is prime, so output it
chan int c:
par
filter (x, in?, c!)
sieve (c?, out!)
And maybe bring these process declarations into line with all other
declarations ...
proc sieve (chan int in?, out!)
int x ? in:
seq
out ! x
chan int c:
par
filter (x, in?, c!)
sieve (c?, out!)
:
And maybe capitalise keywords ... no, better not go there ... !
Upside Down Semantics ...
Then, there are the semantics of CSP. For some reason, many CSP
people seem to like to start from specifications that allow more
behaviour than is needed and, then, gradually cut that down to
an implementation by refinement. Everything refines CHAOS, but
that seems a silly place from which to start. This just seems
upside down.
A more usual way to construct systems is to build them up,
incrementally adding behaviour till the specification is reached
– not cutting away behaviour! Old-school semantics for a recursively
defined process:
X = F(X)
would start from bottom – the process that does nothing, not even
terminate (i.e. STOP) – and say that the semantics of "X" is the
limit of the sequence of processes:
STOP, F(STOP), F(F(STOP)), F(F(F(STOP))), ...
as the number of "F"s tends to infinity. For example:
Clock = tock; Clock -- using the new CSP syntax, :)
is the limit of the sequence of processes:
STOP
tock; STOP
tock; tock; STOP
tock; tock; tock; STOP
tock; tock; tock; tock; STOP
...
which seems very natural. The semantics needs to be continuous
across these limits – e.g. if the semantics of P and Q are the
limits of:
STOP, P1, P2, P3, ...
STOP, Q1, Q2, Q3, ...
then the semantics of (P || Q) needs to be the limit of:
(STOP || STOP), (P1 || Q1), (P2 || Q2), (P3 || Q3), ...
Of course, all these limits need to be taken in an appropriate
mathematical domain (e.g. an inverse limit of lattices).
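In the traces model, this bottom-up iteration is easy to animate: represent a process by its set of finite traces, start from STOP (whose only trace is the empty one) and apply F repeatedly. A Python sketch, purely illustrative:

```python
# Illustrative sketch: traces-model approximations of
#   Clock = tock; Clock
# starting from STOP and iterating F(X) = tock; X.

def F(traces):
    """tock; X  –  the empty trace, plus 'tock' prefixed to X's traces."""
    return {()} | {("tock",) + t for t in traces}

STOP = {()}
X = STOP
for _ in range(3):
    X = F(X)
print(sorted(X, key=len))
# traces of F(F(F(STOP))): the empty trace up to three tocks
```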
Action
Discuss the merits of the above. Formalise it (if worthy) into
a new syntax for CSP. Produce useful examples comparing the
current syntax with the proposed. Consider the semantic issues.
Propose further work.