--{{{ READ.ME
IMPORTANT NOTE
This document was intended just as documentation for the SEMAPHORE,
RESOURCE, EVENT and BUCKET process synchronisation primitives that
were released in the KRoC 0.8beta occam system. Motivation for the
primitives was added, together with numerous examples of their use,
so that it has now grown into a draft paper (that will eventually
be submitted somewhere for publication). It is released now since
documentation for the new primitives is needed. Please copy freely
but please respect the copyright.
KRoC releases may be found on the Internet Parallel Computing Archive:
Many thanks,
Peter Welch.
(5th. December, 1996)
--}}}
--{{{ Title
SEMAPHOREs, RESOURCEs, EVENTs and BUCKETs
(formerly: Higher Levels of Process Synchronisation in occam)
--}}}
--{{{ Authors
P.H.Welch and D.C.Wood
Computing Laboratory
The University
Canterbury
KENT -- CT2 7NF
ENGLAND
P.H.Welch@ukc.ac.uk and D.C.Wood@ukc.ac.uk
Copyright (1996) P.H.Welch and D.C.Wood.
--}}}
--{{{ Introduction
This document details the extra synchronisation primitives introduced in the
KRoC 0.8beta release of occam for SPARC (SunOS/Solaris) and Alpha (OSF/1) UNIX
workstations [1, 2].
The new primitives are experimental -- we would like feedback on their use.
They are designed to support higher-level mechanisms of SHARING between
parallel processes that give us greater powers of expression. They will
also let greater levels of concurrency be safely exploited from future
parallel architectures, such as those providing (virtual) shared-memory.
They demonstrate that occam is neutral in any debate between the merits of
message-passing versus shared-memory parallelism, enabling applications to
take advantage of whichever paradigm (or mixture of paradigms) is the most
appropriate.
The new primitives could be (but are not) implemented in terms of traditional
channels, but only at the expense of increased complexity and computational
overhead. The primitives are immediately useful even for uni-processors --
for example, the cost of a `fair' ALT can be reduced from O(n) to O(1).
In fact, all the operations associated with new primitives have constant
space and time complexities; and the constants are very low.
Direct use of these primitives makes it possible to misuse them. They must be
used in the ways prescribed below, else their semantics become unpredictable.
No tool is provided to check correct usage at this level.
The intention is to bind those primitives found to be useful into higher
level versions of occam. Some of the primitives (e.g. SEMAPHOREs) may never
themselves be made visible in the language, but may be used to implement
bindings of higher-level paradigms (such as SHARED channels and BLACKBOARDs).
The compiler will perform the relevant usage checking on all new language
bindings, closing the security loopholes opened by raw use of the primitives.
--}}}
--{{{ Channels are not enough ...
An occam channel is a primitive combining communication and synchronisation.
As a synchronisation primitive, it applies to two processes at a time. Some
applications require many processes to synchronise before any can continue --
for example, the barrier synchronisations used by common shared-memory parallel
algorithms.
Multi-way synchronisation is a fundamental idea in CSP, but is not implemented
in occam. The computational arrangements for allowing any of the synchronising
processes to back off (which CSP allows) are even more costly than allowing
both parties to back off during channel synchronisation. However, just as
allowing only the receiver to back off an offer to communicate enabled an
efficient channel implementation in occam, a similarly drastic rule -- allowing
no parties to back off a multi-way synchronisation -- makes possible an
efficient implementation of the (CSP) EVENT. Does such a restriction still
leave a useful primitive? Just as for occam channels, the answer seems to be
yes, but we would like to hear from potential users.
Another way of looking at channels is that they provide a peg on which to hang
a blocked process. If we have lots of processes we wish to suspend for some
common reason (e.g. they are waiting on a common event or for some shared
resource, access to which is restricted by some rules), we either have to have
lots of channels on which to hang them (and, later, organise their release)
or we put them on a timer queue. Neither of these may be convenient or
computationally light.
What are needed are different kinds of peg on which we may hang arbitrary
numbers of processes ... plus the ability to retrieve them one at a time
(SEMAPHOREs and RESOURCEs) ... or all at once (EVENTs and BUCKETs) ... or
some other way ...
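The `retrieve them all at once' behaviour of an EVENT is close to what a
barrier synchronisation gives: every process blocks until all participants have
arrived, and then all are released together. As a rough analogue only (not the
occam implementation), Python's threading.Barrier shows the semantics of a
multi-way synchronisation from which no party may back off:

```python
import threading

# Illustrative analogue: N worker threads synchronise on a common
# barrier; none proceeds past wait() until all N have arrived.
N = 4
barrier = threading.Barrier(N)
order = []
order_lock = threading.Lock()

def worker(i):
    with order_lock:
        order.append(("before", i))
    barrier.wait()                  # multi-way synchronisation: no back-off
    with order_lock:
        order.append(("after", i))

threads = [threading.Thread(target=worker, args=(i,)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every "before" precedes every "after": the barrier released them together.
assert all(tag == "before" for tag, _ in order[:N])
assert all(tag == "after" for tag, _ in order[N:])
```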
--}}}
--{{{ Abstract Data Types
Each new primitive is presented as an Abstract Data Type. Each is implemented
as an occam2.1 DATA TYPE, together with a set of operations defined through
INLINEd occam2.1 PROCs and FUNCTIONs. Full source code is provided for each
primitive in a separate #INCLUDE file.
Although users have visibility of the data structures used for each primitive,
advantage must *not* be taken of this visibility. Components of the data
structures must *not* be accessed directly by user programs. Instances of
the primitives may *only* be operated on by calling the PROCs and FUNCTIONs
provided.
--}}}
--{{{ SEMAPHOREs
--{{{ SEMAPHORE Abstract Data Type (documentation)
These implement classic counting semaphores:
DATA TYPE SEMAPHORE
Users may declare their own SEMAPHORE variables and pass them as reference
parameters. One SEMAPHORE should be declared to control access to each
shared resource (which could be a data or channel structure). SEMAPHOREs
must not be duplicated by assignment or communication through channels.
PROC initialise.semaphore (SEMAPHORE s, VAL INT count)
Each SEMAPHORE must be initialised with this routine before it is used.
The `count' value is the number of processes allowed simultaneous access
to the shared resource. For exclusive access, set this to 1.
PROC claim.semaphore (SEMAPHORE s)
Before accessing the shared resource, a process must call this routine
to claim the associated SEMAPHORE. If there are fewer than `count' (where
`count' is the value with which the SEMAPHORE was initialised) processes
using the shared resource, this process will be allowed through -- i.e.
the call will return immediately. Otherwise, the process will be blocked
and put on the queue of processes associated with the SEMAPHORE.
PROC release.semaphore (SEMAPHORE s)
When a process has finished with the shared resource, it must call this
routine to register its release of the associated SEMAPHORE. If there are
processes waiting to claim that SEMAPHORE, the first process on that queue
is re-scheduled -- i.e. allowed through to use the resource.
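The three operations above can be sketched, purely for illustration and outside
occam, as a counter plus a FIFO queue of blocked processes. Both `claim' and
`release' are O(1). This is a single-threaded sketch of the data structure
only; `blocking' is modelled by queueing a process identifier:

```python
from collections import deque

class Semaphore:
    """Illustrative counting semaphore: a count plus a FIFO queue of
    blocked processes. A sketch of the ADT, not the KRoC implementation."""
    def __init__(self, count):
        self.count = count          # initialise.semaphore
        self.queue = deque()        # blocked processes, FIFO

    def claim(self, pid):
        """claim.semaphore: returns True if admitted, else queues pid."""
        if self.count > 0:
            self.count -= 1
            return True
        self.queue.append(pid)      # O(1) enqueue: process is blocked
        return False

    def release(self):
        """release.semaphore: reschedule the first waiter, if any."""
        if self.queue:
            return self.queue.popleft()  # O(1): first waiter admitted
        self.count += 1
        return None

s = Semaphore(1)
assert s.claim("P1") is True     # P1 gets exclusive access
assert s.claim("P2") is False    # P2 blocks, joins the queue
assert s.release() == "P2"       # releasing admits P2 (FIFO order)
assert s.release() is None       # no waiters: the count goes back up
```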
--}}}
--{{{ Normal pattern of use
So, the normal pattern of use is:
... thing declaration (where thing is to be SHARED by many processes)
#PRAGMA SHARED thing -- suppress parallel usage checking
SEMAPHORE thing.s:
#PRAGMA SHARED thing.s -- suppress parallel usage checking
SEQ
initialise.semaphore (thing.s, 1) -- for exclusive access (for example)
PAR
... process using thing
... another process using thing
... another process using thing
... another process using thing
... etc.
Within each `process using thing', each use must be protected within a `claim'
and `release':
SEQ
claim.semaphore (thing.s)
... now use thing
release.semaphore (thing.s)
Footnote: in the literature, `claim' is sometimes referred to as `wait' (or
`P') and `release' is sometimes called `signal' (or `V').
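The claim/use/release bracket has a direct analogue in Python's threading
module (an analogue only; the names `thing' and `thing_s' mirror the occam
above and are otherwise hypothetical):

```python
import threading

# Illustrative analogue of the claim/release bracket around a shared `thing'.
thing = []                          # the shared resource
thing_s = threading.Semaphore(1)    # initialise.semaphore (thing.s, 1)

def process_using_thing(i):
    thing_s.acquire()               # claim.semaphore (thing.s)
    thing.append(i)                 # ... now use thing
    thing_s.release()               # release.semaphore (thing.s)

threads = [threading.Thread(target=process_using_thing, args=(i,))
           for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert sorted(thing) == list(range(8))
```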
--}}}
--{{{ occam3 SHARED channels (via SEMAPHOREs)
The main motivation for implementing SEMAPHOREs is to support the occam3
SHARED channel [3]. This is a language construct to describe client-server
applications, where multiple clients compete for exclusive access to a single
server. In occam2, this has to be implemented through an array of channels
(or channel-pairs for two-way interaction) over which the server performs a
"fair" ALT. The problems with this are:
o the array of channels has to be declared and made visible to the server,
which means that the number of clients has to be known at the point
where the server is installed;
o the computational complexity of the server ALT is O(n), where n is
the number of clients. Even for modest values of n, especially in a hard
real-time application, this overhead can become prohibitive.
On the other hand, with the SEMAPHORE implementation of a SHARED channel:
o there is a fixed-sized space overhead (3 words), regardless of the
number of clients;
o the computational complexity of setting up and closing down each
client-server transaction is O(1) -- i.e. independent of the number
of clients and the same order of magnitude as an ordinary context
switch (sub-microsecond);
o the server does not know that the client end is SHARED -- it sees
an ordinary channel (or channel-pair). This means that a server may
ALT over a set of SHARED (or ordinary) channels using normal mechanisms.
An occam3 SHARED channel (or channel-pair) connects any number of client
processes with a server process. To use the SHARED channel, the client
process must first claim it:
CLAIM c
... use any of the channels within c
occam3 has a CHAN TYPE structure that allows us to group a collection of
channels (each with differing PROTOCOLs and directions of use) as fields
in a record. So, a typical transaction might look like:
CLAIM c
--{{{ use any of the channels within c
SEQ
c[request] ! some.request
c[reply] ? some.reply
... follow-up questions and answers
--}}}
Note that any attempted use of `c' outside a CLAIM body would be jumped on
by the compiler. occam3 also forbids any synchronisation attempts inside
the CLAIM body other than those involving `c'. In particular, a process is
not allowed to accumulate resources through nested CLAIMs (which eliminates
the classic danger of deadlock through partially acquired resources).
The occam3 declaration of the SHARED channel looks like:
CHAN TYPE CONNECT -- CONNECT is the user-chosen name for this channel type
CHAN OF REQUEST request:
CHAN OF REPLY reply:
:
SHARED CONNECT c:
In occam2.1, the channel components need to be declared separately, together
with a controlling semaphore:
CHAN OF REQUEST c.request:
#PRAGMA SHARED c.request -- suppress parallel usage checking
CHAN OF REPLY c.reply:
#PRAGMA SHARED c.reply -- suppress parallel usage checking
SEMAPHORE c.s:
#PRAGMA SHARED c.s -- suppress parallel usage checking
The client transaction becomes:
SEQ
claim.semaphore (c.s)
--{{{ use any of the channels within c
SEQ
c.request ! some.request
c.reply ? some.reply
... follow-up questions and answers
--}}}
release.semaphore (c.s)
At the server end, occam3 establishes the client connection with an explicit:
GRANT c
... use any of the channels within c
As for the clients, the server is not allowed to use `c' outside a GRANT body
and any attempt would be disallowed by the compiler. However, servers *are*
allowed to make further synchronisations (e.g. CLAIMs or other GRANTs) within
a GRANT body.
In this example, a transaction matching the client CLAIM might be:
GRANT c
--{{{ use any of the channels within c
... local declarations
SEQ
c[request] ? some.request
... compute the correct response
c[reply] ! some.reply
... follow-up questions and answers
--}}}
The occam2.1 implementation for this GRANT is null. It simply maps to:
--{{{ use any of the channels within c
... local declarations
SEQ
c.request ? some.request
... compute the correct response
c.reply ! some.reply
... follow-up questions and answers
--}}}
Note that, provided each CLAIM opens with a communication to the server,
ALTing between the SHARED channel and any other ALT guard (SHARED or not)
is immediately possible by the server. If the transaction opens with a
communication in the other direction, a dummy `request' will need to be
added to allow the server to ALT.
Finally, some transaction bodies may contain no communications at all! For
example, the server may be a SHARED signal-handler (where the signal is raised
by a client simply making a claim with a SKIP body). In this case again,
a dummy `request' will need to be added to synchronise the client with the
server.
--}}}
--{{{ Dining Philosophers (via SEMAPHOREs)
Sometimes, SEMAPHOREs can be used to represent objects in their own right.
For example, the forks in Dijkstra's classic Dining Philosophers system
are simply binary SEMAPHOREs shared by the two philosophers whose place
settings are on either side of the fork. In classic occam, they are simply
modelled by a fork process such as:
PROC fork (CHAN OF BOOL left, right)
--{{{
WHILE TRUE
ALT -- should be a `fair' ALT
BOOL any:
left ? any -- philosopher left picks up fork
left ? any -- philosopher left puts down fork
right ? any -- philosopher right picks up fork
right ? any -- philosopher right puts down fork
--}}}
:
Similarly, the security guard (or butler), who only allows into the dining
room up to four philosophers at a time, is a counting semaphore initialised
to four. In classic occam, this is modelled:
PROC security ([5]CHAN OF BOOL down, up)
--{{{
VAL BYTE max IS 4:
INITIAL BYTE n.sat.down IS 0:
WHILE TRUE
ALT i = 0 FOR 5 -- should be a `fair' ALT
ALT
--{{{ philosopher i wants to sit down
BOOL any:
(n.sat.down < max) & down[i] ? any -- don't allow more than max at a time
n.sat.down := n.sat.down + 1
--}}}
--{{{ philosopher i wants to stand up
BOOL any:
up[i] ? any -- always allow this
n.sat.down := n.sat.down - 1
--}}}
--}}}
:
A philosopher interacts with two forks and the security guard:
PROC philosopher (CHAN OF BOOL left, right, -- forks
CHAN OF BOOL down, up) -- security guard
--{{{
WHILE TRUE
SEQ
... think-a-while
--{{{ get past the security guard
down ! TRUE
--}}}
--{{{ pick up the forks
PAR
left ! TRUE -- pick up left fork
right ! TRUE -- pick up right fork
--}}}
... eat-a-while
--{{{ put down the forks
PAR
left ! TRUE -- put down left fork (no wait)
right ! TRUE -- put down right fork (no wait)
--}}}
--{{{ notify security you have finished
up ! TRUE
--}}}
--}}}
:
The college consists of 5 philosophers, 5 forks and the security guard:
PROC college ()
--{{{
[5]CHAN OF BOOL left, right, down, up:
PAR
security (down, up)
PAR i = 0 FOR 5
PAR
philosopher (left[i], right[i], down[i], up[i])
fork (left[i], right[(i + 1)\5])
--}}}
:
With real SEMAPHOREs, there is no need for the fork and security processes.
The college becomes:
PROC college ()
--{{{
[5]SEMAPHORE fork:
#PRAGMA SHARED fork -- suppress parallel usage checking
SEMAPHORE security:
#PRAGMA SHARED security -- suppress parallel usage checking
SEQ
--{{{ initialise semaphores
SEQ
SEQ i = 0 FOR 5
initialise.semaphore (fork[i], 1) -- forks are used exclusively
initialise.semaphore (security, 4) -- security allows four at a time
--}}}
PAR i = 0 FOR 5
philosopher (fork[i], fork[(i + 1)\5], security)
--}}}
:
where the philosopher still interacts with two forks and the security guard:
PROC philosopher (SEMAPHORE left, right, -- forks
SEMAPHORE security) -- security guard
--{{{
WHILE TRUE
SEQ
... think-a-while
--{{{ get past the security guard
claim.semaphore (security)
--}}}
--{{{ pick up the forks
PAR
claim.semaphore (left) -- pick up left fork
claim.semaphore (right) -- pick up right fork
--}}}
... eat-a-while
--{{{ put down the forks
PAR
release.semaphore (left) -- put down left fork (no wait)
release.semaphore (right) -- put down right fork (no wait)
--}}}
--{{{ notify security you have finished
release.semaphore (security)
--}}}
--}}}
:
The SEMAPHORE implementations of the forks and security guard give us `fair'
sharing -- i.e. no philosopher can get locked out indefinitely by un-thinking
neighbours racing back to the dining room and grabbing forks. The SEMAPHORE
implementations don't need programming; they just need initialising! They
give us more functionality and execute far faster than the original processes.
However, their use here has nothing to do with occam3 SHARED channels, so such
applications require care.
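The same structure can be sketched with Python threads. This is an analogue
only: Python semaphores are not FIFO-fair like the occam SEMAPHOREs, and the
forks are picked up one after the other rather than in PAR. The essential
point survives: the security guard (a semaphore initialised to four) rules
out the circular wait, so the system cannot deadlock:

```python
import threading

N, ROUNDS = 5, 20
fork = [threading.Semaphore(1) for _ in range(N)]   # forks used exclusively
security = threading.Semaphore(4)                   # at most four seated
meals = [0] * N

def philosopher(i):
    left, right = fork[i], fork[(i + 1) % N]
    for _ in range(ROUNDS):
        security.acquire()      # get past the security guard
        left.acquire()          # pick up left fork
        right.acquire()         # pick up right fork
        meals[i] += 1           # eat-a-while
        right.release()         # put down the forks
        left.release()
        security.release()      # notify security you have finished

threads = [threading.Thread(target=philosopher, args=(i,)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()                    # terminates: the guard prevents deadlock
assert meals == [ROUNDS] * N
```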
--}}}
--{{{ Instrumenting parallel systems (via SEMAPHOREs)
One common problem solved by SHARED channels is multiplexing data streams
to single devices. For example, when animating the behaviour of a network
of processes (for diagnostic or demonstration purposes), we want to print
information to some display or file. Writing, installing and wiring up the
necessary multiplexor to route the information coming from all the processes
under inspection can be daunting ... we can't just put in print statements!
Or, at least, that used to be the case!! By making, for example, the screen
channel SHARED, we *can* just-put-in-print-statements and we can do it within
any number of parallel processes and have full control over the atomicity of
any particular message.
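The same atomicity idea can be sketched in Python (an analogue, not the occam
code): a single lock plays the role of the screen SEMAPHORE, and each
multi-part message is claimed and released as a unit, so the parts of one
message can never be interleaved with another process's parts:

```python
import io
import threading

screen = io.StringIO()              # stands in for the shared screen channel
screen_s = threading.Lock()         # plays the role of the screen SEMAPHORE

def report(pid, state):
    # claim ... print the whole message ... release
    with screen_s:
        screen.write("#")
        screen.write(str(pid))
        screen.write(" " + state + "\n")

threads = [threading.Thread(target=report, args=(i, "thinking"))
           for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

lines = screen.getvalue().splitlines()
assert len(lines) == 10
assert all(line.endswith(" thinking") for line in lines)   # no tearing
```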
The dining philosophers' college (from the previous section) will compile and
run without deadlock, but is somewhat unexciting to watch -- all the action is
internal and we can't see it. The following modification connects the college
to the (KRoC) screen (= UNIX stdout) and shares this channel amongst all its
internal processes (15 in all, of which between 5 and 10 are active at any
one time). It is wrapped in a complete (KRoC) occam2.1
program:
#INCLUDE "semaphore.inc"
#USE "utils" -- in the course directory of the KRoC release
PROC dining.philosophers (CHAN OF BYTE keyboard, screen, error)
--{{{
--{{{ PROC philosopher (VAL INT id,
PROC philosopher (VAL INT id,
SEMAPHORE left, right, -- forks
SEMAPHORE security, -- security guard
CHAN OF BYTE screen, SEMAPHORE screen.s) -- shared screen
--{{{
#PRAGMA SHARED screen
#PRAGMA SHARED screen.s
VAL INT field.width IS 14:
WHILE TRUE
SEQ
--{{{ think-a-while
SEQ
claim.semaphore (screen.s)
out.string ("#", field.width*id, screen)
out.number (id, 0, screen)
out.string (" thinking*n", 0, screen)
release.semaphore (screen.s)
--}}}
--{{{ get past the security guard
SEQ
--{{{ trace
SEQ
claim.semaphore (screen.s)
out.string ("#", field.width*id, screen)
out.number (id, 0, screen)
out.string (" lemme in!!!*n", 0, screen)
release.semaphore (screen.s)
--}}}
claim.semaphore (security)
--{{{ trace
SEQ
claim.semaphore (screen.s)
out.string ("#", field.width*id, screen)
out.number (id, 0, screen)
out.string (" I*'m in ... *n", 0, screen)
out.string ("#", field.width*id, screen)
out.number (id, 0, screen)
out.string (" gimme my forks!!!*n", 0, screen)
release.semaphore (screen.s)
--}}}
--}}}
--{{{ pick up the forks
PAR
--{{{ pick up left fork
SEQ
claim.semaphore (left)
--{{{ trace
SEQ
claim.semaphore (screen.s)
out.string ("#", field.width*id, screen)
out.number (id, 0, screen)
out.string (" got left fork*n", 0, screen)
release.semaphore (screen.s)
--}}}
--}}}
--{{{ pick up right fork
SEQ
claim.semaphore (right)
--{{{ trace
SEQ
claim.semaphore (screen.s)
out.string ("#", field.width*id, screen)
out.number (id, 0, screen)
out.string (" got right fork*n", 0, screen)
release.semaphore (screen.s)
--}}}
--}}}
--}}}
--{{{ eat-a-while
SEQ
claim.semaphore (screen.s)
out.string ("#", field.width*id, screen)
out.number (id, 0, screen)
out.string (" eating*n", 0, screen)
release.semaphore (screen.s)
--}}}
--{{{ put down the forks
PAR
--{{{ put down left fork (no wait)
SEQ
--{{{ trace
SEQ
claim.semaphore (screen.s)
out.string ("#", field.width*id, screen)
out.number (id, 0, screen)
out.string (" done with left fork*n", 0, screen)
release.semaphore (screen.s)
--}}}
release.semaphore (left)
--}}}
--{{{ put down right fork (no wait)
SEQ
--{{{ trace
SEQ
claim.semaphore (screen.s)
out.string ("#", field.width*id, screen)
out.number (id, 0, screen)
out.string (" done with right fork*n", 0, screen)
release.semaphore (screen.s)
--}}}
release.semaphore (right)
--}}}
--}}}
--{{{ notify security you have finished
SEQ
--{{{ trace
SEQ
claim.semaphore (screen.s)
out.string ("#", field.width*id, screen)
out.number (id, 0, screen)
out.string (" lemme out !!!*n", 0, screen)
release.semaphore (screen.s)
--}}}
release.semaphore (security)
--}}}
--}}}
:
--}}}
--{{{ PROC college (CHAN OF BYTE screen)
PROC college (CHAN OF BYTE screen)
--{{{
#PRAGMA SHARED screen
SEMAPHORE screen.s:
#PRAGMA SHARED screen.s
[5]SEMAPHORE fork:
#PRAGMA SHARED fork
SEMAPHORE security:
#PRAGMA SHARED security
SEQ
--{{{ initialise semaphores
SEQ
SEQ i = 0 FOR 5
initialise.semaphore (fork[i], 1) -- forks are used exclusively
initialise.semaphore (security, 4) -- security allows four at a time
initialise.semaphore (screen.s, 1) -- screen is used exclusively
--}}}
PAR i = 0 FOR 5
philosopher (i, fork[i], fork[(i + 1)\5], security, screen, screen.s)
--}}}
:
--}}}
college (screen)
--}}}
:
This is not the most interesting animation of the dining philosophers' life,
but it illustrates how easy it has become to instrument parallel processes
so that they can report their state. A more exciting animation is on:
This was designed by a second-year undergraduate at Kent. The system contains
some 52 processes (42 to 47 simultaneously active), with 25 drawing on the
screen via a single SHARED channel.
--}}}
--}}}
--{{{ RESOURCEs
--{{{ RESOURCEs and SHARED channels
RESOURCEs and RESOURCE.CHANNELs are concepts designed into the instruction
set of the T9000 transputer. They were, probably, intended just to implement
occam3 SHARED channels and we only consider them for that purpose here.
Their advantage over SEMAPHOREs is that they directly yield a distributed
implementation for a SHARED channel over a non-shared-memory MIMD parallel
computer (such as a transputer network or cluster of workstations). Their
disadvantage is that they are more complex to understand and use. A good
description may be found in chapter 2 of [4].
[Note: there is a way to implement a distributed SHARED channel that just uses
SEMAPHOREs and holds waiting clients on a distributed `weak-FIFO' queue. This
will be described in a separate report.]
Anyway, for this method of implementing a SHARED channel (or channel-pair),
we need:
o an array of channels (or channel-pairs): a separate element connecting
each client to the server;
o an array of RESOURCE.NODEs: one for each client;
o a RESOURCE data structure.
Note that the first item above is just what we use for the standard occam2
method -- communications over the SHARED channel travel courtesy of ordinary
channels that are not physically shared at all!
Note also that in the T9000 implementation, the channel and RESOURCE.NODE
arrays are combined into a single `RESOURCE.CHANNEL' array. We have separated
them (for this prototype implementation) because occam2.1 does not directly
support the mixing of channel fields with data fields in a single structure.
There are other differences between this and the T9000 implementation that
will be explained later.
--}}}
--{{{ RESOURCE Abstract Data Type (documentation)
The current KRoC release implements:
DATA TYPE RESOURCE
Users may declare their own RESOURCE variables and pass them as reference
parameters. One RESOURCE should be declared to control access to each
SHARED channel. RESOURCE data-structures must not be duplicated by
assignment or communication through channels.
Only the server process for the shared resource sees the RESOURCE.
A RESOURCE is an occam2.1 RECORD with one public field [channel.id].
This must only be read by the server, immediately after executing a GRANT.
[This public field breaks the Abstract Data Type rules and will be removed
in the next release.]
DATA TYPE RESOURCE.NODE
Users may declare their own RESOURCE.NODE variables and pass them as
parameters. One RESOURCE.NODE should be declared for each client that
wishes to access each SHARED channel. RESOURCE.NODEs must not be
duplicated by assignment or communication through channels.
Each client process for the shared resource has one RESOURCE.NODE.
PROC initialise.resource (RESOURCE resource)
The RESOURCE structure used by the server for the SHARED channel must
be initialised with this routine.
PROC mark.resource.node (RESOURCE.NODE node, VAL INT chan.id, RESOURCE resource)
Each client must have a channel (or channel-pair) connection to the server,
as well as a RESOURCE.NODE. Each RESOURCE.NODE must be initialised using
this routine to associate it with this channel (or channel-pair), as well
as with the main RESOURCE data-structure held by the server. The `chan.id'
value is either an index into the channel array implementing the SHARED
channel or is a direct pointer to the corresponding channel.
This routine is not normally called directly. Instead, one of the two
following routines would be called to initialise *all* the RESOURCE.NODEs.
PROC mark.resource.indices ([]RESOURCE.NODE node, RESOURCE resource)
This routine calls `mark.resource.node' to initialise all the elements
of the RESOURCE.NODE array, using each element's index for the `chan.id'.
It assumes there is a matching array of channels (or channel-pairs) and
that each RESOURCE.NODE is to be associated with the corresponding element
of that array. This will be the usual case.
[This routine is an alternative to `mark.resource.pointers'.]
PROC mark.resource.pointers ([]RESOURCE.NODE node, VAL []INT pointer,
RESOURCE resource)
This routine calls `mark.resource.node' to initialise all the elements
of the RESOURCE.NODE array. It uses the corresponding element from the
`pointer' array for the `chan.id' associated with each RESOURCE.NODE.
The `pointer' array must, of course, be set to the addresses of the
channels we wish to associate with the corresponding RESOURCE.NODEs.
The channel addresses can be found through RETYPEs to VAL INTs.
[This routine is an alternative to `mark.resource.indices'.]
PROC claim.resource (RESOURCE.NODE node)
A client process must claim its RESOURCE.NODE before using its channel
(or channel-pair) to transact business over the SHARED channel. The
client will be queued (behind other clients) until the server can deal
with it.
PROC grant.resource (RESOURCE resource)
The server must grant its RESOURCE before it can transact business over
the SHARED channel. The server will be blocked until at least one client
is waiting. The channel identifier of the first waiting client will be
found in `resource[channel.id]' when the server is re-scheduled and this
procedure terminates.
[Note: from KRoC 0.9beta onwards, this routine will have an extra INT
parameter, passed by reference, into which this channel identifier will
be written. This will maintain a proper Abstract Data Type interface
for the RESOURCE.]
--}}}
--{{{ Implementation of CLAIM and GRANT
The earlier occam3 example of a client transaction over a SHARED channel was:
CLAIM c
--{{{ use any of the channels within c
SEQ
c[request] ! some.request
c[reply] ? some.reply
... follow-up questions and answers
--}}}
In occam2.1, the client will have its (unshared) pair of channels, `c.request'
and `c.reply', together with its own RESOURCE.NODE `c.rn'. Then, the above
translates into:
SEQ
claim.resource (c.rn)
--{{{ use any of the channels within c
SEQ
c.request ! some.request
c.reply ? some.reply
... follow-up questions and answers
--}}}
The matching server transaction in occam3 looks like:
GRANT c
--{{{ use any of the channels within c
... local declarations
SEQ
c[request] ? some.request
... compute the correct response
c[reply] ! some.reply
... follow-up questions and answers
--}}}
In occam2.1, the server sees an array of `c.request' and `c.reply' channels
plus the RESOURCE `c.r' (managing the SHARED channel-pair). The above code
becomes:
SEQ
grant.resource (c.r)
--{{{ use any of the channels within c
CHAN OF REQUEST client.request IS c.request[c.r[channel.id]]:
CHAN OF REPLY client.reply IS c.reply[c.r[channel.id]]:
... local declarations
SEQ
client.request ? some.request
... compute the correct response
client.reply ! some.reply
... follow-up questions and answers
--}}}
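The queueing behaviour behind `claim.resource' and `grant.resource' can be
sketched as a plain data structure (an illustration only, not the KRoC
implementation): clients join a FIFO of channel identifiers, and a grant
serves the first waiting client, whose identifier becomes the value read from
`resource[channel.id]':

```python
from collections import deque

class Resource:
    """Illustrative RESOURCE: a FIFO of waiting clients' channel ids.
    Single-threaded sketch -- a real grant would block while empty."""
    def __init__(self):
        self.waiting = deque()      # initialise.resource

    def claim(self, chan_id):
        """claim.resource: the client on channel `chan_id' joins the queue."""
        self.waiting.append(chan_id)

    def grant(self):
        """grant.resource: serve the first waiting client (FIFO)."""
        assert self.waiting, "server would block until a client claims"
        return self.waiting.popleft()   # becomes resource[channel.id]

r = Resource()
r.claim(3)                  # the client on channel 3 claims first
r.claim(0)                  # then the client on channel 0
assert r.grant() == 3       # server transacts with client 3 first
assert r.grant() == 0       # then with client 0
```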
--}}}
--{{{ Example client-server application
The following example is a complete (KRoC) occam2.1 program demonstrating the
RESOURCE method for implementing a SHARED channel connection.
There are ten clients and one server. A client-server transaction is the
transmission of one line of text over a BYTE stream. Clients are numbered
1 through 10. Client `n' tries to send its message every `n' seconds and
all clients start (approximately) together. On second 10, clients 1, 2, 5
and 10 compete for the channel. For prime-numbered seconds above 10, only
client 1 is active. On second 30240 (5 hours, 24 minutes), all clients are
active ... if you want to wait to see them:
#INCLUDE "resource.inc"
#USE "utils"
PROC resource.test (CHAN OF BYTE keyboard, screen, error)
--{{{
--{{{ client
PROC client (VAL INT n, TIMER tim, VAL INT start.time,
RESOURCE.NODE out.rn, CHAN OF BYTE out)
--{{{
VAL INT seconds IS 1000000:
INT real.time, logical.time:
SEQ
logical.time, real.time := 0, start.time
WHILE TRUE
SEQ
--{{{ wait for next time slot (one every n seconds)
SEQ
logical.time := logical.time PLUS n
real.time := real.time PLUS (n*seconds)
tim ? AFTER real.time
--}}}
claim.resource (out.rn)
--{{{ transact with server
SEQ
out.number (logical.time, 0, out)
out.string (": Hello resource world from ", 0, out)
out.number (n, 0, out)
out.string ("*c*n", 0, out)
--}}}
--}}}
:
--}}}
--{{{ server
PROC server (RESOURCE in.r, []CHAN OF BYTE in, CHAN OF BYTE screen)
--{{{
WHILE TRUE
SEQ
grant.resource (in.r)
--{{{ transact with client
CHAN OF BYTE client IS in[in.r[channel.id]]:
BYTE ch:
SEQ
ch := 0
WHILE ch <> '*n'
SEQ
client ? ch
screen ! ch
--}}}
--}}}
:
--}}}
VAL INT n.clients IS 10:
--{{{ SHARED CHAN OF BYTE c
RESOURCE c.r:
[n.clients]RESOURCE.NODE c.rn:
[n.clients]CHAN OF BYTE c:
--}}}
--{{{ SHARED TIMER tim
TIMER tim:
INT start.time:
--}}}
SEQ
--{{{ initialise
SEQ
initialise.resource (c.r)
mark.resource.indices (c.rn, c.r)
tim ? start.time
--}}}
--{{{ client-server network
PAR
--{{{ clients
PAR n = 0 FOR n.clients
client (n + 1, tim, start.time, c.rn[n], c[n])
--}}}
--{{{ server
server (c.r, c, screen)
--}}}
--}}}
--}}}
:
--}}}
--{{{ Differences between this prototype and T9000 RESOURCEs
In this prototype implementation of SHARED channels via RESOURCEs, both sides
have to be aware of the sharing. The clients have to make a CLAIM on their
RESOURCE.NODEs and the server has to GRANT its RESOURCE. This is in line
with the occam3 model for a SHARED channel.
However, the T9000 implementation is crucially different. Only the server
is aware of the sharing and it's this that enables the direct implementation
of a distributed shared channel.
As previously mentioned, the T9000 combines the channel and RESOURCE.NODE
arrays into a single RESOURCE.CHANNEL array. Its `mark.resource.channel'
operation (which includes the `mark.resource.node' initialisation) also marks
the channel -- it changes the normal `MOSTNEG INT' flag, which signifies an
empty channel, into `(MOSTNEG INT) + 2'.
Now, the client process need only output to its channel to start its server
transaction. The T9000 microcode implementing that output detects the special
flag in the channel and calls the `claim.resource' operation on the adjacent
RESOURCE.NODE automatically. KRoC could do the same, but its kernel (that
implements channel communication) would have to be modified and we didn't want
to do that for this prototype.
For a distributed SHARED channel, client processes on remote processors see
only ordinary (virtual) channels and use them with no special precautions
(other than starting each transaction with an output). On the processor
containing the server process, the incoming (virtual) channels are implemented
by the array of RESOURCE.CHANNELs. The server processor also contains the
RESOURCE data structure that manages the queue of waiting RESOURCE.CHANNELs
as clients try to output to them. After the server completes a GRANT, it
communicates over the selected (virtual) channel connection(s) to the client
in the normal way.
--}}}
--{{{ RESOURCEs versus SEMAPHOREs for SHARED channels
There are four disadvantages with this RESOURCE mechanism for SHARED channels:
o At the end of a transaction, the server must re-mark the client channel
used to open transactions with the special flag indicating that it is
a RESOURCE.CHANNEL. This means that the client must synchronise with
the server one more time at the end of each transaction -- the client
cannot be released until the server has re-marked its channel.
This final synchronisation is not needed for this prototype implementation
of SHARED channels via RESOURCEs; nor is it needed for the SEMAPHORE
implementation ... nor for the distributed SEMAPHORE implementation.
o In the SEMAPHORE mechanism for SHARED channels, the clients are aware of
the sharing but the server is not -- this means the server can ALT between
the SHARED channel and other guards using normal mechanisms.
In the RESOURCE mechanism for SHARED channels, the clients are not aware
of the sharing but the server is -- this means the server cannot ALT
between the SHARED channel and other guards using normal mechanisms.
This means that new mechanisms are needed for ALTing over SHARED channels.
In the T9000, these are implemented by `enable.grant' and `disable.grant'
instructions. These are not implemented in our prototype.
o The computational load for managing the queue of waiting clients takes
place just on the processor containing the server. In the distributed
SEMAPHORE implementation, this load is shared between the server and
all client processors. In particular, clients on separate processors
can join the distributed queue simultaneously.
o The SHARED channel isn't actually shared! We still have to define and
wire up a private channel between each client and the server.
There are two advantages with this RESOURCE mechanism for SHARED channels:
o The clients are held in a single queue and, hence, are fairly served
in a FIFO manner. In the distributed SEMAPHORE implementation, the
clients are held in a distributed `weak-FIFO' queue, which still
guarantees service ... but might not be completely `fair'.
o We haven't outlined the distributed SEMAPHORE mechanism in this document!
The current scheme involves one extra communication (of one word) between
the remote client and the server at the start of each transaction. This
matches the cost of the final synchronisation between the client and server
that is needed by the RESOURCE mechanism.
--}}}
--}}}
--{{{ EVENTs
--{{{ Barrier synchronisation
Barrier synchronisation is a common primitive in many models of parallel
computing -- in some cases, it is an essential element. In SIMD parallelism,
there is global synchronisation between all processors after each instruction.
In the slightly more flexible SPMD model, there is still just one barrier on
which all processors synchronise; however, the point at which synchronisation
takes place is application dependent and has to be programmed explicitly (and,
usually, cyclically).
For example, SPMD parallelism has the general form:
  ... shared global data
  PAR i = 0 FOR n.processors
    WHILE TRUE             -- one identical serial process per processor
      SEQ
        ... do something
        SYNC               -- barrier: wait for all processors to get here
occam already imposes an implicit barrier synchronisation at the end of each
PAR construct. This can be exploited to obtain the above model by moving the
external PAR inside the serial control structure:
  ... shared global data
  WHILE TRUE
    PAR i = 0 FOR n.processors
      ... do something
We now have a loop of parallel processes, each of which has to terminate,
instead of a parallel set of loops that have to synchronise once per cycle.
This would be disadvantageous if the start-up/shut-down overheads for
parallel processes were large in comparison to their compute times, but
this would not normally be the case for occam. A more serious problem
arises if the processors use local state that has to survive the barrier:
  ... shared global data
  PAR i = 0 FOR n.processors
    INITIAL INT x IS 0:    -- local state
    WHILE TRUE
      SEQ
        ... do something
        SYNC
This is, of course, a very common requirement. Bringing the parallelism
inside the loop forces the set of local states into the global data space:
  ... shared global data
  INITIAL [n.processors]INT X IS [0 | i = 0 FOR n.processors]:
  WHILE TRUE
    PAR i = 0 FOR n.processors
      INT x IS X[i]:
      ... do something
which is not very pretty and threatens unnecessary run-time overhead! It also
breaks the natural object-oriented encapsulation of local state that occam
processes normally provide.
So, we need to introduce an explicit SYNC primitive to regain simplicity.
However, occam is a MIMD parallel language and we don't want to be constrained
by SPMD thinking. In particular, we want to obtain a structured and dynamic
form of barrier synchronisation. For example, we want to allow our system
to be composed of multiple sets of processes, each set with its own local
barriers. We also want the flexibility of allowing the number of processes
synchronising on any particular barrier to grow and shrink at run-time.
To achieve this, we need to be able to name barriers and associate them with
particular sets of processes. A named barrier is simply a CSP event and its
association with a set of processes is just its inclusion in their alphabets.
Barrier synchronisation is event synchronisation, but with the restriction
that processes cannot use it as a guard in a choice operator (i.e. an ALT
guard in occam terms). There is no semantic problem in allowing the number
of processes interested in an event to change dynamically.
There is no pragmatic problem either for an extended occam:
  ... shared global data
  PAR i = 0 FOR n.processors EVENT e
    INITIAL BOOL running IS TRUE:
    WHILE running
      SEQ
        ... do something
        SYNC e             -- named barrier
The named EVENT is declared explicitly by the above PAR construct. This is
the only place where EVENTs can be declared. The EVENT is automatically in
the alphabet of all components of the PAR (which means that when one component
SYNCs on it, all components have to SYNC on it).
[Note that declaring items in constructors already takes place in occam -- for
example, the control value `i' in the above PAR.]
Since the EVENT is named, it can be passed as a parameter to PROCs that are
called in the body of the declaring PAR. Among other benefits, this allows
the separate compilation of processes that are later instanced to synchronise
on any EVENTs the installer chooses.
Note that processes sharing the same EVENT may terminate at different times
-- see the above example. Terminated processes do not block the barrier SYNCs
of those that are still running. An elegant application of this principle is
given later.
Since this is occam, we are not restricted to the replicated PAR of SPMD. For
example, the following system has three different processes synchronising on
the named barrier:
  ... shared global data
  PAR EVENT e
    ... process A
    ... process B
    ... process C
And the following system has different groups of processes synchronising on
different barriers:
  ... shared global data
  PAR
    PAR EVENT e
      ... process A
      ... process B
      ... process C
    PAR EVENT f
      ... process P
      ... process Q
      ... process R
Finally, an EVENT synchronising process may contain parallel sub-processes.
Normal scoping rules imply that the sub-processes can see the EVENT and may,
therefore, SYNC on it. A logical policy would be to say that the number of
processes taking part in the barrier automatically grows for the duration of
those sub-processes. However, it would be more flexible to be able to specify
which sub-processes included the existing EVENT in their alphabet (and were,
therefore, obliged to SYNC if the barrier represented by the EVENT needs to
be overcome during their lifetime). There are two ways to get this: introduce
either a `hiding' operator or an `enrolling' operator into the PAR construct.
There are arguments both ways but, for now, we prefer the positive approach:
  ... shared global data
  PAR i = 0 FOR n.processors EVENT e
    IF
      need.more.parallelism (i)
        --{{{
        PAR j = 0 FOR more ENROLL e
          ... inner processes can (and, probably, better had) SYNC on e
        --}}}
      TRUE
        --{{{ as before
        INITIAL BOOL running IS TRUE:
        WHILE running
          SEQ
            ... do something
            SYNC e         -- named barrier
        --}}}
Alternatively, components of inner PARs may be enrolled individually in an
outer EVENT:
  ... shared global data
  PAR i = 0 FOR n.processors EVENT e
    IF
      need.more.parallelism (i)
        --{{{
        PAR
          ENROLL e
            ... process A (includes e in its alphabet)
          ENROLL e
            ... process B (includes e in its alphabet)
          ... process C (does not include e in its alphabet)
        --}}}
      TRUE
        --{{{ as before
        INITIAL BOOL running IS TRUE:
        WHILE running
          SEQ
            ... do something
            SYNC e         -- named barrier
        --}}}
Processes `A' and `B' in the above have to partake in `SYNC e', but process
`C' does not and cannot!
Explicit enrollment means that un-enrolled EVENTs are automatically hidden
from sub-components of the PAR and that any attempt to SYNC on them would be
rejected by the compiler.
Finally, we note that:
  PAR ENROLL e
    ... process A
    ... process B
    ... process C
is equivalent to:
  PAR
    ENROLL e
      ... process A
    ENROLL e
      ... process B
    ENROLL e
      ... process C
--}}}
--{{{ EVENT Abstract Data Type (documentation)
The current KRoC release implements:
  DATA TYPE EVENT

    Users may declare their own EVENT variables and pass them as reference
    parameters. They should only be declared in association with the PAR
    construct that sets up the processes that synchronise on them. EVENTs
    must not be duplicated by assignment or communication through channels.

  PROC initialise.event (EVENT e, VAL INT count)

    Each EVENT must be initialised with this routine before starting the
    associated PAR construct. The `count' value is the number of processes
    in that PAR.

  PROC resign.event (EVENT e)

    Each process in the associated PAR construct must execute this routine
    just before it terminates.

  PROC synchronise.event (EVENT e)

    This may be called by any process in the associated PAR construct. The
    calling process will be blocked until all its sibling processes (in the
    PAR) have also called it or have resigned.

  PROC enroll.event (EVENT e, VAL INT count)

    This needs to be called before and after nested PAR constructs whose
    components are being enrolled on the EVENT. Before the PAR, the `count'
    value is one less than the number of components being enrolled. After
    the PAR, the `count' is one.
[Note: these routines exposed a bug in the KRoC implementation of a transputer
instruction (SAVEL) not previously exercised. This bug does not affect occam
programs unless they contain ASM blocks that generate SAVEL explicitly. This
bug has been corrected but the fix will not be available until the 0.9beta
release of KRoC. Meanwhile, routines `resign.event' and `synchronise.event',
as released in 0.8beta and 0.81beta, do not work reliably. For KRoC releases
up to 0.81beta, a temporary version of this EVENT abstract data type, that
compensates for the bad implementation of SAVEL, can be obtained from:
This temporary version must *not* be used following KRoC release 0.9beta.]
[Note: `enroll.event' is not included in the 0.8beta/0.81beta releases, but
will be in the 0.9beta release. However, its implementation is so trivial:
  INLINE PROC enroll.event (EVENT e, VAL INT count)
    --{{{
    SEQ
      e[active] := e[active] + count
      e[count] := e[count] + count
    --}}}
  :
that we include it here for those who can't wait.]
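For readers without KRoC to hand, the intended behaviour of this abstract data
type can be modelled sequentially. The following Python sketch is ours (not
part of the release): `count` is the number of enrolled processes and `active`
the number still to arrive at the current barrier, mirroring the two fields
manipulated by `enroll.event' above.

```python
class Event:
    """Sequential model of the EVENT barrier (a sketch, not the KRoC code).
    count  = number of processes currently enrolled on the barrier
    active = number that have still to arrive at the current barrier"""

    def __init__(self, count):
        # initialise.event (e, count)
        self.count = count
        self.active = count

    def synchronise(self):
        # synchronise.event (e): returns True if this arrival completed the
        # barrier (in the real thing, completion reschedules the blocked
        # sibling processes)
        self.active -= 1
        if self.active == 0:
            self.active = self.count      # re-arm for the next cycle
            return True
        return False

    def resign(self):
        # resign.event (e): a terminating process must not block the rest
        self.count -= 1
        self.active -= 1
        if self.active == 0 and self.count > 0:
            self.active = self.count      # its departure completed the barrier
            return True
        return False

    def enroll(self, n):
        # enroll.event (e, n): n extra processes join the barrier
        self.count += n
        self.active += n

e = Event(3)
assert e.synchronise() is False   # first arrival blocks
e.resign()                        # one process terminates early
assert e.synchronise() is True    # the last arrival completes the barrier
```

Note how resignation can itself complete a barrier: terminated processes never
hold up the SYNCs of those still running, as stated above.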
--}}}
--{{{ Implementation of proposed barriers
Barrier synchronisation is normally intended to support physical concurrency.
The syntax and semantics with which we have been experimenting are aimed at
(virtual) shared-memory multi-processors.
The current implementation is only for uni-processors. Nevertheless, the
event synchronisation may still prove useful ...
Anyway, the following system:
  ... shared global data
  PAR i = 0 FOR n.processors EVENT e
    INITIAL BOOL running IS TRUE:
    WHILE running
      SEQ
        ... do something
        SYNC e             -- named barrier
is implemented as follows:
  ... shared global data
  --{{{ SHARED EVENT e
  EVENT e:
  #PRAGMA SHARED e
  --}}}
  SEQ
    initialise.event (e, n.processors)
    PAR i = 0 FOR n.processors
      SEQ
        INITIAL BOOL running IS TRUE:
        WHILE running
          SEQ
            ... do something
            synchronise.event (e)
        resign.event (e)
The MIMD system:
  ... shared global data
  PAR
    PAR EVENT e
      ... process A
      ... process B
      ... process C
    PAR EVENT f
      ... process P
      ... process Q
      ... process R
translates to:
  ... shared global data
  PAR
    --{{{ SHARED EVENT e
    EVENT e:
    #PRAGMA SHARED e
    --}}}
    SEQ
      initialise.event (e, 3)
      PAR
        SEQ
          ... process A
          resign.event (e)
        SEQ
          ... process B
          resign.event (e)
        SEQ
          ... process C
          resign.event (e)
    --{{{ SHARED EVENT f
    EVENT f:
    #PRAGMA SHARED f
    --}}}
    SEQ
      initialise.event (f, 3)
      PAR
        SEQ
          ... process P
          resign.event (f)
        SEQ
          ... process Q
          resign.event (f)
        SEQ
          ... process R
          resign.event (f)
Nested enrollment:
  PAR ENROLL e
    ... process A
    ... process B
    ... process C
maps to:
  SEQ
    enroll.event (e, 2)
    PAR
      SEQ
        ... process A
        resign.event (e)
      SEQ
        ... process B
        resign.event (e)
      SEQ
        ... process C
        resign.event (e)
    enroll.event (e, 1)
However, there is one loose end to be tied! The last enrolled sub-process to
resign should not really do so (as this may complete an external barrier that
is not warranted). The last resignation should be folded with the subsequent
re-enrollment and nothing should happen. With this prototype implementation,
we don't know when resigning whether we are the last resignation. With an
implementation via the run-time kernel, we will have this information and the
loose end can be tied.
--}}}
--{{{ A simple example
The following is a complete (KRoC) occam2.1 program demonstrating a simple
SPMD network of processes synchronising on an EVENT barrier. Each process
is cyclic, waiting for a variable amount of time before synchronising once
per cycle. Each process is a `client' of the SHARED `screen' channel, which
is protected by a SEMAPHORE. Each process announces to the `screen' when it
tries to synchronise and when it succeeds in synchronising.
  #INCLUDE "semaphore.inc"
  #INCLUDE "event.inc"

  #USE "utils"

  PROC event.test (CHAN OF BYTE keyboard, screen, error)
    --{{{
    --{{{ SHARED screen
    #PRAGMA SHARED screen
    SEMAPHORE screen.s:
    #PRAGMA SHARED screen.s
    --}}}
    --{{{ client
    PROC client (VAL INT id, n.clients, EVENT e,
                 SEMAPHORE out.s, CHAN OF BYTE out)
      --{{{
      INT n:
      SEQ
        n := id
        WHILE TRUE
          SEQ
            --{{{ wait n seconds
            VAL INT seconds IS 1000000:
            TIMER tim:
            INT t:
            SEQ
              tim ? t
              tim ? AFTER t PLUS (n*seconds)
            --}}}
            --{{{ say ready to synchronise
            SEQ
              claim.semaphore (out.s)
              out.number (id, 0, out)
              out.string (" ready to synchronise*c*n", 0, out)
              release.semaphore (out.s)
            --}}}
            synchronise.event (e)
            --{{{ tell the world
            SEQ
              claim.semaphore (out.s)
              out.string ("==> ", 40, out)
              out.number (id, 0, out)
              out.string (" over the barrier ...*c*n", 0, out)
              release.semaphore (out.s)
            --}}}
            n := (n.clients + 1) - n  -- simple variation for the timeout
      --}}}
    :
    --}}}
    VAL INT n.clients IS 10:
    --{{{ SHARED EVENT e
    EVENT e:
    #PRAGMA SHARED e
    --}}}
    SEQ
      --{{{ initialise
      SEQ
        initialise.semaphore (screen.s, 1)
        initialise.event (e, n.clients)
      --}}}
      --{{{ client network (serviced by the screen channel)
      PAR n = 0 FOR n.clients
        client (n + 1, n.clients, e, screen.s, screen)
      --}}}
    --}}}
  :
It is hard to resist (and, so, we don't) turning this into its higher-level
form:
  #USE "utils"

  PROC event.test (CHAN OF BYTE keyboard, SHARED CHAN OF BYTE screen, error)
    --{{{
    --{{{ client
    PROC client (VAL INT id, n.clients, EVENT e, SHARED CHAN OF BYTE out)
      --{{{
      INT n:
      SEQ
        n := id
        WHILE TRUE
          SEQ
            ... wait n seconds
            --{{{ say ready to synchronise
            CLAIM out
              SEQ
                out.number (id, 0, out)
                out.string (" ready to synchronise*c*n", 0, out)
            --}}}
            SYNC e
            --{{{ tell the world
            CLAIM out
              SEQ
                out.string ("==> ", 40, out)
                out.number (id, 0, out)
                out.string (" over the barrier ...*c*n", 0, out)
            --}}}
            n := (n.clients + 1) - n  -- simple variation for the timeout
      --}}}
    :
    --}}}
    VAL INT n.clients IS 10:
    --{{{ client network (serviced by the screen channel)
    PAR n = 0 FOR n.clients EVENT e
      client (n + 1, n.clients, e, screen)
    --}}}
    --}}}
  :
The point is not the modest reduction in code length but the absence of special
#PRAGMAs, explicit SEMAPHOREs and explicit SEMAPHORE and EVENT initialisation.
It is these absences, together with the automatic initialisation (by the
compiler) of all necessary primitives, the prevention (by the compiler) of
any attempted EVENT duplication via assignment or communication, and the
compiler's enforcement of the CLAIM mechanism for the SHARED channel, that
make the high-level bindings secure and, hence, desirable.
--}}}
--{{{ A shared accumulator (and implicitly parallel recursive lazy functional occam)
occam2 has a formal denotational semantics in terms of the traces, failures
and divergences model of CSP [5, 6]. This can be extended, in a natural way,
to cover all the language extensions discussed here.
Otherwise, the state-of-the-art in parallel languages is somewhat bleak. There
are no formal semantics on offer for `lightweight' threads libraries (even
though they are becoming standardised) nor for Java threads (where, at least,
the concept is bound into the language). Indeed, it is very difficult to find
even informal descriptions -- their semantics are mainly given by example.
Reference [7] describes ParC, a language extension for C giving explicit
parallelism and synchronisation (with primitives not too far from those in
the extended occam). In a section titled `The Meaning of Parallelism',
after analysing the issues for nearly one page, it reaches the following
depressing conclusion:
``As a consequence, distinct execution of the same program may lead to
different results, and even to different behaviours. For example,
one execution may spawn many more activities than another, or one
execution may terminate with a result while another enters an infinite
loop. It is therefore impossible to specify the exact semantics of
ParC programs. In the absence of formal semantics, we make do with
a set of rules to guide the implementation of a ParC system.''
Such conclusions give no optimism for a sound engineering basis for parallel
computing and raise fundamental questions as to its viability. Fortunately,
the scientific insights of occam and CSP, along with those of BSP, are
becoming available to a wider audience. Unfortunately, resistance to the
concept of sound engineering in today's mainstream computer culture should
not be underestimated.
Anyway, this example is taken from [7] but expressed now in terms of occam.
The problem is to add up the elements of an array in O(log n) time.
The first solution is expressed in only modestly upgraded occam, making use
of the natural barriers marking the termination of PAR constructs:
  PROC sum ([]INT A)                    -- with implicit barrier sync
    --{{{ COMMENT specification
    --assume : (SIZE A) = (2**N), for some N >= 0
    --spec   : A'[0] = SIGMA {A[i] | i = 0 FOR SIZE A}
    --}}}
    --{{{ implementation
    INITIAL INT n IS SIZE A:            -- INVARIANT:
    INITIAL INT stride IS 1:            --   (n*stride) = (SIZE A)
    WHILE n > 1
      SEQ
        n, stride := n >> 1, stride << 1
        PAR i = 0 STEP stride FOR n
          A[i] := A[i] + A[i + (stride >> 1)]
    --}}}
  :
The `modest' extensions are the INITIALising declarations (of occam3 and used
previously), the STEP in the replicator (very convenient for numeric algorithms
and straightforward to implement) and the variable number of replications in
the PAR construct (not a semantic problem, but a serious implementation one,
although a unified virtual shared-memory address space simplifies matters
considerably).
The algorithm is simple. In the first loop, a process is spawned for all the
even elements in the array (`stride' is 2 and `n' is half the array size);
this process adds into its element the value of its odd (senior) neighbour.
In the second loop, a process is spawned for every fourth element (stride = 4)
that accumulates the contents of its neighbour two elements (i.e.
half-a-`stride') above it; every fourth element now holds the sum of all the
elements within
its `stride'. This continues until `stride' reaches the size of the array
(and `n' drops to 1); after which `A[0]' holds the complete sum and the loop
terminates.
A slight drawback is that the array size must be a power of two. Other sizes
could be handled but the simplicity of the code would be damaged.
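Before tackling the safety argument, the loop structure itself can be
sanity-checked with a sequential transliteration -- Python here, ours, with
the PAR replicator replaced by an ordinary loop (legitimate precisely because
the parallel processes touch disjoint elements):

```python
def par_sum(A):
    """Sequential transliteration of the occam `sum' (a sketch): after each
    pass, A[i] holds the sum of its own `stride'-sized block, for every i
    that is a multiple of `stride'.  Assumes len(A) is a power of two, just
    as the occam version does."""
    n, stride = len(A), 1
    while n > 1:
        n, stride = n >> 1, stride << 1
        # stands in for: PAR i = 0 STEP stride FOR n
        for i in range(0, n * stride, stride):
            A[i] = A[i] + A[i + (stride >> 1)]
    return A[0]

A = [1, 2, 3, 4, 5, 6, 7, 8]
assert par_sum(A) == 36
```

Each pass halves `n` and doubles `stride`, so the loop runs log2(SIZE A)
times, which is where the O(log n) claim comes from (given enough processors).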
Parallel security is easy to establish. Each parallel process updates element
`A[i]', where each `i' is different (so no race-hazard there). The process
updating `A[i]' uses the value in an element half-a-`stride' away. But no
other process is looking at these half-`stride' values since they are all
separated by a `stride' (so no race-hazard there). QED. Getting such proofs
checked mechanically (e.g. by the compiler) looks possible and will ultimately
be necessary.
Of course, the fine granularity of the parallelism in the above example would
scuttle any hoped-for performance gain from current parallel architectures,
although future designs (such as the `ParaPC' [8, 9]) could lap it up. In the
meantime, the example serves as a model for combining operators richer than
addition and which current architectures may be able to exploit.
The second solution commutes the PAR and the WHILE constructs, catering for
those who fear the costs of starting up and shutting down processes. The
result is an SPMD-like algorithm with explicit barrier synchronisation,
which occam can now comfortably express:
  PROC sum ([]INT A)                    -- with explicit barrier sync
    --{{{ COMMENT specification
    --assume : (SIZE A) = (2**N), for some N >= 0
    --spec   : A'[0] = SIGMA {A[i] | i = 0 FOR SIZE A}
    --}}}
    --{{{ implementation
    PAR i = 0 FOR SIZE A EVENT e
      INT accumulate IS A[i]:
      INITIAL INT n IS (i = 0) -> SIZE A, i:
      INITIAL INT stride IS 1:
      WHILE (n /\ 1) = 0                -- even (n)
        SEQ
          accumulate := accumulate + A[i + stride]
          n, stride := n >> 1, stride << 1
          SYNC e
    --}}}
  :
We have sneaked conditional expressions into the language (since they simplify
one of the initialising declarations above). The syntax used is just that
already implemented for the occam configuration language in the SGS-Thomson
Toolset:
boolean-expression -> expression, expression
which simply yields the first or second `expression', depending on the value
of the `boolean-expression'. Both `expression's must, of course, yield the same
type. Note that occam2 already can express the above:
[expression, expression][INT boolean-expression]
although it is not quite so understandable! The order of the `expression's
must be reversed since FALSE and TRUE map (under INT) to 0 and 1 respectively.
Also, the current implementation will evaluate both `expression's at run-time
before discarding the unselected one (unless `boolean-expression' is constant).
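The same index-selection trick works in any language that converts booleans to
integers. A small Python sketch (a hypothetical helper of ours) makes both
caveats visible: the two expressions appear in reversed order, and both are
evaluated before the unselected one is discarded:

```python
def select(cond, on_true, on_false):
    # occam2's [e2, e1][INT b]: FALSE/TRUE convert to indices 0/1, so the
    # expressions are reversed, and building the list evaluates BOTH
    # before one of them is thrown away
    return [on_false, on_true][int(cond)]

assert select(3 > 2, "yes", "no") == "yes"
assert select(3 < 2, "yes", "no") == "no"
```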
The conditional expression is a distraction ... forget it! Please compare
the new version of `sum' with the old. This time processes are set up once
for each element of the array (abbreviated locally to `accumulate'). The
odd processes immediately terminate. That leaves the even processes adding
their odd (senior) neighbours to themselves, exactly as before. At the end
of each loop, the active processes synchronise on the barrier and half of
them drop out. Recall that that does not prevent the remaining processes
synchronising on their next loop. Events continue until there is only one
process left, which accumulates the final answer into `A[0]' and terminates.
There are now no processes left, so the outer PAR construct terminates and
the PROC returns.
Security against race-hazards is somewhat harder to prove than before. An
elegant way to establish this would be to find some semantic-preserving
transformations (that can be mechanised) to change the first version into
the second. This is left as an exercise for the reader.
One optimising transformation on the second version that is too tempting
to resist is as follows. Since the odd processes terminate without ever
doing anything, don't set them up in the first place! This is achieved
simply by changing the PAR constructor:
  PROC sum ([]INT A)                    -- with explicit barrier sync
    ... COMMENT specification
    --{{{ implementation
    PAR i = 0 STEP 2 FOR (SIZE A) >> 1 EVENT e
      ... as before
    --}}}
  :
Finally, for those who find barrier synchronisation a little unnatural, here
is a taste of some much wilder ideas. The following code also sums its array
in O(log n) time but:
o is (first order) functional, relying on the compiler and run-time system
to extract the parallelism that is always implicit in occam expressions
(which are free from side-effects and can, therefore, be executed in any
order or concurrently);
o is recursive -- however, implementation techniques that enable variable
PAR replication also enable recursion;
o has no global synchronisations, only local synchronisations implied by
(add) operators requiring the (two) processes computing their operands
to terminate before they can operate;
o is efficient in that the (parallel) execution tree for small array
fragments has been preset by standard loop unrolling -- however, the
implementation of table lookup at run-time needs to be made lazy for the
way we have chosen to express the unravelled loop to work sensibly;
o has automatic parallel security, derived from the semantics of occam
expressions;
o handles arrays of any size -- not just powers of two.
It's also pretty neat:
  INT FUNCTION sum (VAL []INT A) IS
    --{{{ COMMENT specification
    --spec : returns SIGMA {A[i] | i = 0 FOR SIZE A}
    --}}}
    --{{{ implementation
    ((SIZE A) <= 8) ->
      [A[0],
       A[0] + A[1],
       A[0] + (A[1] + A[2]),
       (A[0] + A[1]) + (A[2] + A[3]),
       (A[0] + A[1]) + (A[2] + (A[3] + A[4])),
       (A[0] + (A[1] + A[2])) + (A[3] + (A[4] + A[5])),
       (A[0] + (A[1] + A[2])) + ((A[3] + A[4]) + (A[5] + A[6])),
       ((A[0] + A[1]) + (A[2] + A[3])) + ((A[4] + A[5]) + (A[6] + A[7]))]
      [(SIZE A) - 1],
      sum ([A FOR (SIZE A) >> 1]) + sum ([A FROM (SIZE A) >> 1]):
    --}}}
Now, all we need is a ParaPC on which to run it.
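Lacking a ParaPC, the recursive shape (though not the implicit parallelism or
the laziness) can still be mimicked directly. A Python sketch of ours, with
the built-in `sum` standing in for the unrolled table of pre-balanced
addition trees:

```python
def sum_rec(A):
    # small fragments: stand-in for the occam lookup table of unrolled,
    # pre-balanced addition trees (indexed by (SIZE A) - 1)
    if len(A) <= 8:
        return sum(A)
    # otherwise split in half, exactly like the occam segments
    # [A FOR (SIZE A) >> 1] and [A FROM (SIZE A) >> 1]; the two
    # recursive calls are independent and could run in parallel
    half = len(A) >> 1
    return sum_rec(A[:half]) + sum_rec(A[half:])

assert sum_rec(list(range(1, 21))) == 210   # any size, not just powers of two
```

Python evaluates eagerly, so the lazy-lookup subtlety noted above simply does
not arise here; the recursion depth is O(log n), matching the parallel
execution tree.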
--}}}
--}}}
--{{{ BUCKETs
--{{{ Bucket synchronisation
This is an idea from Jon Kerridge of Napier University ... we are not to blame!
He was working on large-scale discrete event simulation; we give a hint
presently as to why he wanted BUCKETs.
Bucket synchronisation is a small variation on barrier synchronisation. With
barrier synchronisation, processes are not released until all processes have
reached the barrier. With bucket synchronisation, processes are released
by explicit command, which can happen at any time. There is no automatic
release when `all' processes have synchronised. It opens up some interesting
control applications.
The bucket effect can be achieved using channel synchronisation and arrays.
However, pre-determined array sizes, wiring problems and computational overhead
(linear with the number of processes involved) make this pretty uncomfortable.
Our BUCKETs have a fixed space overhead (4 words), regardless of the number of
processes `falling' into them, and a fixed (very low) time overhead for each
`fall' and each `flush', regardless of the number of processes being released.
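These semantics are easy to model with a condition variable. The following
Python sketch is ours (the real implementation sits on the EVENT machinery,
and keeps its fixed 4-word footprint, which this model does not attempt to
match): `fall` blocks the caller until some other process calls `flush`,
which releases everyone currently in the bucket at once.

```python
import threading
import time

class Bucket:
    """Sketch of BUCKET semantics (not the KRoC implementation)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._epoch = 0        # bumped once per flush
        self.number_in = 0     # processes currently in the bucket

    def fall(self):
        # fall.into.bucket (b): block until the NEXT flush
        with self._cond:
            self.number_in += 1
            epoch = self._epoch
            while self._epoch == epoch:
                self._cond.wait()

    def flush(self):
        # flush.bucket (b): reschedule everyone at once; returns how many
        with self._cond:
            released = self.number_in
            self.number_in = 0
            self._epoch += 1
            self._cond.notify_all()
            return released

b = Bucket()
workers = [threading.Thread(target=b.fall) for i in range(5)]
for w in workers:
    w.start()
while b.number_in < 5:                  # wait for all five to fall in
    time.sleep(0.01)
assert b.flush() == 5
for w in workers:
    w.join()
```

Note the absence of any automatic release: unlike a barrier, nothing happens
until the managing process decides to flush, however many have fallen in.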
--}}}
--{{{ BUCKET Abstract Data Type (documentation)
The current KRoC release implements:
  DATA TYPE BUCKET

    Users may declare their own BUCKET variables and pass them as parameters.
    They should only be declared in association with the PAR construct that
    sets up the processes that synchronise on them. BUCKETs must not be
    duplicated by assignment or communication through channels.

  PROC initialise.bucket (BUCKET b)

    Each BUCKET must be initialised with this routine before starting the
    associated PAR construct.

  PROC fall.into.bucket (BUCKET b)

    This may be called by any process in the associated PAR construct. The
    calling process will be blocked until the bucket is flushed.

  PROC flush.bucket (BUCKET b)

    This should be called by just one process (the BUCKET manager) in the
    associated PAR construct. It reschedules all processes in the BUCKET.

  INT FUNCTION number.in.bucket (VAL BUCKET b)

    This returns the number of processes currently in the bucket.

    [Although this `function' causes no side-effect, its argument is not
    really a `value'. BUCKET variables (like CHAN, PORT, TIMER, SEMAPHORE,
    RESOURCE, RESOURCE.NODE and EVENT variables) do not have the simple
    semantics of data variables. They are passive, but volatile, objects
    whose `values' can be changed under our feet by activities elsewhere.
    They have no place in expressions. KRoC 0.9beta will, probably, change
    this into a PROC returning its answer through a reference parameter.]

    [Also, this operation yields information that may have no use; in which
    case, it will be withdrawn from future releases (because its maintenance
    imposes a small space and time overhead for the other operations). We
    would like feedback on this.]
[Note: this abstract data type is currently implemented using EVENT. It is,
therefore, affected by the same bug (in the KRoC implementation of SAVEL)
previously reported. For KRoC versions before 0.9beta, the temporary version
of EVENT must be obtained -- please see the note at the end of the EVENT
documentation.]
--}}}
--{{{ A simple example
The following is a complete (KRoC) occam2.1 program demonstrating a simple
application of BUCKETs. There are ten clients, one bucket and one flusher.
The clients are numbered 1 through 10. Client `i' waits `i' seconds and,
then, falls into the bucket; when released it repeats this behaviour. The
flusher flushes the bucket whenever the keyboard delivers a character (i.e.
when most keys are pressed). Each client shares the screen channel and
reports what is happening.
  #INCLUDE "semaphore.inc"
  #INCLUDE "event.inc"     -- buckets are currently implemented
  #INCLUDE "bucket.inc"    -- in terms of events

  #USE "utils"

  PROC bucket.test (CHAN OF BYTE keyboard, screen, error)
    --{{{
    --{{{ SHARED screen
    #PRAGMA SHARED screen
    SEMAPHORE screen.s:
    #PRAGMA SHARED screen.s
    --}}}
    --{{{ client
    PROC client (VAL INT id, BUCKET b,
                 SEMAPHORE out.s, CHAN OF BYTE out)
      --{{{
      WHILE TRUE
        SEQ
          --{{{ wait id seconds
          VAL INT seconds IS 1000000:
          TIMER tim:
          INT t:
          SEQ
            tim ? t
            tim ? AFTER t PLUS (id*seconds)
          --}}}
          --{{{ say we are ready to synchronise
          SEQ
            claim.semaphore (out.s)
            out.number (id, 0, out)
            out.string (" falling into bucket*c*n", 0, out)
            release.semaphore (out.s)
          --}}}
          fall.into.bucket (b)
          --{{{ tell the world we are back
          SEQ
            claim.semaphore (out.s)
            out.string ("==> ", 40, out)
            out.number (id, 0, out)
            out.string (" flushed ...*c*n", 0, out)
            release.semaphore (out.s)
          --}}}
      --}}}
    :
    --}}}
    --{{{ flusher
    PROC flusher (CHAN OF BYTE in, BUCKET b)
      --{{{
      WHILE TRUE
        SEQ
          --{{{ wait for signal
          BYTE ch:
          in ? ch
          --}}}
          flush.bucket (b)
      --}}}
    :
    --}}}
    VAL INT n.clients IS 10:
    --{{{ SHARED BUCKET b
    BUCKET b:
    #PRAGMA SHARED b
    --}}}
    SEQ
      --{{{ explain
      out.string ("*nPress a key to flush the bucket ...*n*n", 0, screen)
      --}}}
      --{{{ initialise
      SEQ
        initialise.semaphore (screen.s, 1)
        initialise.bucket (b)
      --}}}
      --{{{ client-flusher network
      PAR
        PAR n = 0 FOR n.clients
          client (n + 1, b, screen.s, screen)
        flusher (keyboard, b)
      --}}}
    --}}}
  :
There is an obvious high-level language binding for buckets:
  #USE "utils"

  PROC bucket.test (CHAN OF BYTE keyboard, SHARED CHAN OF BYTE screen, error)
    --{{{
    --{{{ client
    PROC client (VAL INT id, BUCKET b, SHARED CHAN OF BYTE out)
      --{{{
      WHILE TRUE
        SEQ
          ... wait id seconds
          --{{{ say we are ready to synchronise
          CLAIM out
            SEQ
              out.number (id, 0, out)
              out.string (" falling into bucket*c*n", 0, out)
          --}}}
          FALL b
          --{{{ tell the world we are back
          CLAIM out
            SEQ
              out.string ("==> ", 40, out)
              out.number (id, 0, out)
              out.string (" flushed ...*c*n", 0, out)
          --}}}
      --}}}
    :
    --}}}
    --{{{ flusher
    PROC flusher (CHAN OF BYTE in, BUCKET b)
      --{{{
      WHILE TRUE
        SEQ
          --{{{ wait for signal
          BYTE ch:
          in ? ch
          --}}}
          FLUSH b
      --}}}
    :
    --}}}
    VAL INT n.clients IS 10:
    SEQ
      --{{{ explain
      CLAIM screen
        out.string ("*nPress a key to flush the bucket ...*n*n", 0, screen)
      --}}}
      --{{{ client-flusher network
      PAR BUCKET b
        PAR n = 0 FOR n.clients
          client (n + 1, b, screen)
        flusher (keyboard, b)
      --}}}
    --}}}
  :
However, the number of new keywords is growing and we need to pay attention
to Occam's Razor! No language extensions have yet been implemented; the new
facilities have been prototyped through abstract data types. For security
reasons, this is unsatisfactory in the long term. We need to enforce correct
rules of use and the rules have been different for each of the primitives
introduced so far. Also, all the new primitives (along with CHANs, PORTs
and TIMERs) do not have the same rights as `data' -- for example, they are
not assignable or communicable -- so they should not be declared as `data'
variables.
SEMAPHOREs and RESOURCEs may just be alternative implementation techniques
for SHARED channels; neither appears explicitly in any proposed language
bindings. It may therefore prove possible to remove one of these ideas.
BUCKETs and EVENTs, although they share many implementation details, cannot
be merged into a single concept. For one thing, EVENTs (like CHANs) do not
introduce non-determinism into a system (and may, for example, be freely used
within FUNCTIONs). BUCKETs (like SEMAPHOREs, RESOURCEs, PORTs, TIMERs and
ALTing over CHANs) do introduce non-determinism and are intended to support
applications that require it.
--}}}
--{{{ Discrete event simulation
We go back to `resource.test', which was the example client-server application
used earlier to demonstrate the RESOURCE implementation of SHARED channels.
This contains ten `client' processes, each of which works for a while and,
then, has to wait for a specific time-event. Time is discretised into some
particular granularity (in this case, seconds). The specific time-events
are regularly spaced for each process, but in general could be determined
by what is happening in the model at run-time.
Anyway, first we re-present this system, simplifying the RESOURCE mechanism
for the SHARED screen channel to one using SEMAPHOREs:
#INCLUDE "semaphore.inc"
#USE "utils"
PROC discrete (CHAN OF BYTE keyboard, screen, error)
--{{{
--{{{ SHARED screen
#PRAGMA SHARED screen
SEMAPHORE screen.s:
#PRAGMA SHARED screen.s
--}}}
--{{{ client
PROC client (VAL INT n, TIMER tim, VAL INT start.time,
SEMAPHORE out.s, CHAN OF BYTE out)
--{{{
VAL INT seconds IS 1000000:
INT real.time, logical.time:
SEQ
logical.time, real.time := 0, start.time
WHILE TRUE
SEQ
--{{{ wait for next time slot (one every n seconds)
SEQ
logical.time := logical.time PLUS n
real.time := real.time PLUS (n*seconds)
tim ? AFTER real.time
--}}}
--{{{ write to the SHARED screen
SEQ
claim.semaphore (out.s)
out.number (logical.time, 0, out)
out.string (": Hello discrete world from ", 0, out)
out.number (n, 0, out)
out.string ("*c*n", 0, out)
release.semaphore (out.s)
--}}}
--}}}
:
--}}}
VAL INT n.clients IS 10:
--{{{ SHARED TIMER
TIMER tim:
INT start.time:
--}}}
SEQ
--{{{ initialise
SEQ
initialise.semaphore (screen.s, 1)
tim ? start.time
--}}}
--{{{ clients
PAR n = 0 FOR n.clients
client (n + 1, tim, start.time, screen.s, screen)
--}}}
--}}}
:
The problem lies in the harmless-looking time-out within the `client':
tim ? AFTER real.time
Attaching a process to the timer queue is an O(n) operation, where n is the
number of processes in the queue. Releasing a bunch of processes that are
all queued on the same time-out (or within some discrete time interval)
takes time proportional to the number of processes in that bunch -- they are
scheduled on the run-queue individually. If the number of processes is large (and
they can easily reach thousands for modest simulations), the overheads can
become damaging.
If, instead, we represent time with a row of BUCKETs (one for each discrete
time interval in the model), we can achieve the required time-out by falling
into the BUCKET representing the desired wake-up time slot (a constant time
operation). We also need one `time.keeper', who sleeps through each time
interval, wakes up and kicks over the next BUCKET in the row, flushing out
the bunch of processes inside (a constant time operation). The `time.keeper'
is now the only process that ever sets real time-outs and, so, its timer
queue operations are also constant time.
We can't have an infinite row of BUCKETs and so make do with a finite number,
arranged in a circle:
#INCLUDE "semaphore.inc"
#INCLUDE "event.inc"
#INCLUDE "bucket.inc"
#USE "utils"
PROC discrete (CHAN OF BYTE keyboard, screen, error)
--{{{
--{{{ SHARED screen
#PRAGMA SHARED screen
SEMAPHORE screen.s:
#PRAGMA SHARED screen.s
--}}}
--{{{ client
PROC client (VAL INT n, TIMER tim, VAL INT start.time, []BUCKET b,
SEMAPHORE out.s, CHAN OF BYTE out)
--{{{
VAL INT seconds IS 1000000:
INT real.time, logical.time:
SEQ
logical.time, real.time := 0, start.time
WHILE TRUE
SEQ
--{{{ wait for next time slot (one every n seconds)
INT t:
SEQ
logical.time := logical.time PLUS n
real.time := real.time PLUS (n*seconds)
--{{{ fall into the right bucket and wait for the right flush
BUCKET target IS b[logical.time \ (SIZE b)]:
SEQ
tim ? t
WHILE NOT (t AFTER real.time)
SEQ
fall.into.bucket (target)
tim ? t
--}}}
--}}}
--{{{ write to the SHARED screen
SEQ
claim.semaphore (out.s)
out.number (logical.time, 0, out)
out.string (": Hello discrete world from ", 0, out)
out.number (n, 0, out)
out.string ("*c*n", 0, out)
release.semaphore (out.s)
--}}}
--}}}
:
--}}}
--{{{ time.keeper
PROC time.keeper (TIMER tim, VAL INT start.time, []BUCKET b)
--{{{
VAL INT seconds IS 1000000:
INT real.time, logical.time:
SEQ
logical.time, real.time := 0, start.time
WHILE TRUE
SEQ
--{{{ wait one second
SEQ
logical.time := logical.time PLUS 1
real.time := real.time PLUS (1*seconds)
tim ? AFTER real.time
--}}}
flush.bucket (b[logical.time \ (SIZE b)])
--}}}
:
--}}}
VAL INT n.clients IS 10: -- it is coincidental for this system
VAL INT n.buckets IS 10: -- that these two values are equal.
--{{{ SHARED BUCKETs b
[n.buckets]BUCKET b:
#PRAGMA SHARED b
--}}}
--{{{ SHARED TIMER
TIMER tim:
INT start.time:
--}}}
SEQ
--{{{ initialise
SEQ
initialise.semaphore (screen.s, 1)
SEQ i = 0 FOR SIZE b
initialise.bucket (b[i])
tim ? start.time
--}}}
--{{{ client-time.keeper network
PAR
PAR n = 0 FOR n.clients
client (n + 1, tim, start.time, b, screen.s, screen)
time.keeper (tim, start.time, b)
--}}}
--}}}
:
The `time.keeper' just wanders around the circle of BUCKETs, kicking one over
each second. The `client's still check their watches on each BUCKET operation:
--{{{ fall into the right bucket and wait for the right flush
BUCKET target IS b[logical.time \ (SIZE b)]:
SEQ
tim ? t
WHILE NOT (t AFTER real.time)
SEQ
fall.into.bucket (target)
tim ? t
--}}}
If the number of BUCKETs is at least the same as the largest required time-out
(measured in discrete time intervals), each `client' will only fall into its
target BUCKET once. Otherwise, the above loop may go around several times
before the correct timeout is reached.
In the above system, the number of BUCKETs is set the same as the number of
`client's. This is because the largest (logical) time-out is equal to the
number of `client's. The system will work correctly with *any* number of
BUCKETs, but will incur unnecessary (though minor) execution overheads if
there are fewer than ten and unnecessary (though minor) space overheads if
there are more.
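The circular row of BUCKETs is, in effect, a `timing wheel'. The following
Python sketch (purely illustrative -- the names and list-based buckets are
ours, not part of any KRoC interface) shows both the constant-time filing of
a sleeper under its wake-up slot and the reason the `client's must re-check
their watches: with fewer BUCKETs than the largest time-out, a flush can wake
a process a whole lap early.

```python
# A timing-wheel sketch of the circular row of BUCKETs (illustrative only).
# Buckets are modelled as plain Python lists.

N_BUCKETS = 10
wheel = [[] for _ in range(N_BUCKETS)]   # one bucket per discrete time slot

def sleep_until(process, wake_time):
    # falling into the right bucket is a constant-time operation
    wheel[wake_time % N_BUCKETS].append(process)

def tick(now, run_queue):
    # the time.keeper kicks over one bucket per time interval,
    # flushing every process inside on to the run-queue
    slot = wheel[now % N_BUCKETS]
    run_queue.extend(slot)
    del slot[:]

run_queue = []
sleep_until("P", 3)
sleep_until("Q", 3)
sleep_until("R", 13)        # lands in the same slot as time 3!
tick(3, run_queue)
assert run_queue == ["P", "Q", "R"]   # R is woken a lap early ...
```

This is exactly why the `client' loop above compares the current time against
`real.time' and, if flushed too soon, falls straight back into the same BUCKET.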
The high-level binding of the system contains no surprises and is mostly
folded out below:
#USE "utils"
PROC discrete (CHAN OF BYTE keyboard, SHARED CHAN OF BYTE screen, error)
--{{{
... client
... time.keeper
VAL INT n.clients IS 10: -- it is coincidental for this system
VAL INT n.buckets IS 10: -- that these two values are equal.
... SHARED TIMER
SEQ
--{{{ initialise
tim ? start.time
--}}}
--{{{ client-time.keeper network
PAR [n.buckets]BUCKET b
PAR n = 0 FOR n.clients
client (n + 1, tim, start.time, b, screen)
time.keeper (tim, start.time, b)
--}}}
--}}}
:
The only differences within the `client' and `time.keeper' codes are the
replacement of their respective BUCKET procedure calls with FALL and FLUSH.
The only new point of interest is the declaration of the BUCKET array in the
PAR constructor that sets up the network.
--}}}
--}}}
--{{{ Implementation overview and platform independence
The new primitives have been introduced as abstract data types and with no
changes to the KRoC kernel. This means they will automatically run on any
KRoC system (that correctly translates transputer instructions!). Currently,
this means KRoC 0.9beta for SPARC (SunOS/Solaris) and Alpha (OSF/1) UNIX
workstations. For earlier KRoC versions, you will need to obtain the patched
version of the EVENT routines (that compensate for the bug in implementing
SAVEL -- see the documentation on EVENTs above).
The routines operating on the primitives are programmed as INLINE PROCs and
use transputer ASM blocks. This means that, with certain restrictions, they
will also run on real transputers (using standard occam Toolsets).
--{{{ SEMAPHOREs
A SEMAPHORE is an occam2.1 RECORD with three fields: one holds a count and
the others hold front and back pointers to a process queue. Processes are
held on this queue using the same workspace link fields that hold them on
the run-queue (a process can never be on a SEMAPHORE-queue and the run-queue
at the same time). This means that no space needs to be reserved to manage
this queue (other than the front and back pointers).
A process claiming a SEMAPHORE will be put on its queue (and blocked) if its
count is zero. Otherwise the count is decremented.
A process releasing a SEMAPHORE will re-schedule the first process from its
queue if that queue is not null. Otherwise, it increments the count.
Both these operations work in constant time.
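The claim and release algorithms just described can be sketched as follows
(in Python, purely for illustration -- the class and names are ours, and a
deque of process names stands in for the linked workspace queue of the KRoC
kernel):

```python
from collections import deque

# An illustrative sketch of the SEMAPHORE algorithm described above:
# a count plus a FIFO queue of blocked processes.

class Semaphore:
    def __init__(self, count):
        self.count = count          # number of free "permits"
        self.queue = deque()        # blocked processes (front/back pointers)

    def claim(self, process):
        # If the count is zero, block the claimer on the SEMAPHORE-queue;
        # otherwise just decrement the count.  Constant time either way.
        if self.count == 0:
            self.queue.append(process)
            return False            # process is now blocked
        self.count -= 1
        return True                 # process continues

    def release(self, run_queue):
        # If any process is blocked, reschedule the first one;
        # otherwise increment the count.  Constant time either way.
        if self.queue:
            run_queue.append(self.queue.popleft())
        else:
            self.count += 1

run_queue = deque()
s = Semaphore(1)
assert s.claim("P")                 # P takes the semaphore (count 1 -> 0)
assert not s.claim("Q")             # Q blocks on the SEMAPHORE-queue
s.release(run_queue)                # P releases: Q moves to the run-queue
assert list(run_queue) == ["Q"]
```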
--{{{ Transputer restrictions
Scheduling of processes on and off the run-queue is managed using the normal
transputer scheduling instructions -- run-queue registers are not modified
directly. This means they will be secure on a transputer even in the presence
of high-priority process pre-emption (caused by transputer link, event or
timer interrupts).
However, manipulation of the SEMAPHORE queues themselves (or their counters)
can be corrupted by process pre-emption. No danger arises if all processes
sharing the same SEMAPHORE also share the same transputer priority. Otherwise,
the low-priority processes must protect their claims and releases by first
popping into high-priority -- for example:
PRI PAR
claim.semaphore (s)
SKIP
T9000 and ST20-derived transputers have extended instruction sets that include
SEMAPHORE operations (SIGNAL and WAIT) that provide identical functionality to
`claim.semaphore' and `release.semaphore'. If asked, we can provide a special
version of the SEMAPHORE abstract data type that exploits them. In that case,
the above work-around will not be necessary.
--}}}
--}}}
--{{{ RESOURCEs
A RESOURCE is an occam2.1 RECORD with four fields: a process descriptor for
a waiting GRANT process, front and back pointers for a queue of RESOURCE.NODEs
and a channel identifier of the selected CLAIMant (following a GRANT).
A RESOURCE.NODE is an occam2.1 RECORD with three fields: an identifier (i.e.
an array index or pointer) for the associated channel, a link either to the
controlling RESOURCE or to the next RESOURCE.NODE in the RESOURCE-queue
(initialised to the former) and the process descriptor of a waiting CLAIMant.
A process claiming a RESOURCE.NODE checks the status of the GRANTer field in
the controlling RESOURCE. If there is no waiting GRANT process, it writes its
descriptor into the RESOURCE.NODE, attaches this to the RESOURCE-queue and
blocks. Otherwise, it copies the channel identifier from the RESOURCE.NODE
to the RESOURCE and re-schedules the blocked GRANTer.
A process granting a RESOURCE checks the RESOURCE-queue. If this is empty,
it writes its descriptor into the RESOURCE and blocks. Otherwise, it copies
the channel identifier from the first RESOURCE.NODE in the queue into the
RESOURCE, removes that RESOURCE.NODE from the queue, resets its link to point
back to the RESOURCE and re-schedules the waiting CLAIMant.
Both the above operations work in constant time.
Regardless of whether CLAIMants have to form a queue (boom) or a GRANTer
has to block (recession), the channel identifier of a waiting CLAIMant will
be in the RESOURCE upon termination of a GRANT.
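The interlocking claim and grant algorithms can be sketched as follows (a
Python illustration under our own naming; the tuple `(channel, claimant)`
stands in for a RESOURCE.NODE, and the sketch omits the node's link field):

```python
from collections import deque

# An illustrative sketch of the RESOURCE claim/grant algorithm above.

class Resource:
    def __init__(self):
        self.granter = None     # descriptor of a blocked GRANTer (if any)
        self.queue = deque()    # RESOURCE-queue of (channel id, claimant)
        self.selected = None    # channel id of the selected CLAIMant

    def claim(self, channel, claimant):
        if self.granter is None:
            # no waiting GRANTer: queue the node (the CLAIMant blocks)
            self.queue.append((channel, claimant))
            return []
        # a GRANTer is waiting: hand it our channel and reschedule it
        self.selected = channel
        woken, self.granter = self.granter, None
        return [woken]

    def grant(self, granter):
        if not self.queue:
            # no CLAIMants yet: the GRANTer blocks
            self.granter = granter
            return []
        # take the first queued CLAIMant's channel and wake that CLAIMant
        channel, claimant = self.queue.popleft()
        self.selected = channel
        return [claimant]

r = Resource()
assert r.claim(3, "C") == []    # CLAIMant C blocks, queued on channel 3
assert r.grant("G") == ["C"]    # GRANTer selects C's channel, wakes C
assert r.selected == 3
```

Both paths through `grant' leave `selected' holding the channel identifier of
a waiting CLAIMant, which is the invariant the prose above relies on.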
--{{{ Transputer restrictions
These are the same as for SEMAPHOREs. If one of the CLAIMing processes (or
the GRANTing process) runs at high-priority, all low-priority processes must
protect the claims and grants associated with that RESOURCE by popping into
high-priority.
T9000 transputers have a RESOURCE/RESOURCE.CHANNEL mechanism slightly different
from the RESOURCE/RESOURCE.NODE abstract data type we have implemented. As
discussed above, occam2.1 does not quite give us all the structures necessary
to model the T9000 RESOURCE instructions -- their direct exploitation will
probably have to await a proper language binding.
--}}}
--}}}
--{{{ EVENTs
An EVENT is an occam2.1 RECORD with four fields: two integer counts (for the
number of processes registered and the number that have not yet synchronised)
and the front and back pointers to a process queue.
A process synchronising on an EVENT decrements its synchronisation count. If
this has not reached zero, it attaches itself to the EVENT-queue and blocks.
Otherwise, it releases *all* the processes on the EVENT-queue (by simply
concatenating it on to the run-queue) and resets the synchronisation count
back to the number currently registered. This is a constant time operation.
A process resigning an EVENT decrements both its registration count and its
synchronisation count. If the latter has reached zero, it releases *all* the
processes on the EVENT-queue (just like a synchronising process) and resets
the synchronisation count. Again, this is a constant time operation.
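The synchronise and resign algorithms can be sketched as follows (Python,
illustrative names only; note that `run_queue.extend' here is O(n), whereas
the kernel concatenates the two queues in constant time by splicing pointers):

```python
from collections import deque

# An illustrative sketch of the EVENT algorithm described above:
# two counts plus a queue of blocked processes.

class Event:
    def __init__(self, enrolled):
        self.registered = enrolled   # processes enrolled in the barrier
        self.to_sync = enrolled      # those that have not yet synchronised
        self.queue = deque()

    def synchronise(self, process, run_queue):
        self.to_sync -= 1
        if self.to_sync > 0:
            self.queue.append(process)      # block until the barrier completes
        else:
            run_queue.extend(self.queue)    # release *all* blocked processes
            self.queue.clear()
            self.to_sync = self.registered  # reset for the next cycle

    def resign(self, run_queue):
        self.registered -= 1
        self.to_sync -= 1
        if self.to_sync == 0:               # resigner completes the barrier
            run_queue.extend(self.queue)
            self.queue.clear()
            self.to_sync = self.registered

run_queue = deque()
e = Event(3)
e.synchronise("P", run_queue)
e.synchronise("Q", run_queue)
assert list(run_queue) == []        # P and Q are blocked
e.synchronise("R", run_queue)       # the last one in completes the barrier
assert list(run_queue) == ["P", "Q"]
assert e.to_sync == 3               # count reset for the next cycle
```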
--{{{ Transputer restrictions
Transputers have no atomic (non-preemptable) instructions for concatenating
a process queue on to a run-queue. Therefore, to implement EVENT operations,
we have been updating run-queue registers through a sequence of instructions.
Transputer link, event and timer interrupts may preempt this sequence and try
to schedule processes on to the same run-queue -- with bad consequences!
Therefore, our EVENTs may be used safely on transputers provided:
o only low-priority processes use EVENTs and they protect `synchronise.event'
and `resign.event' operations by popping into high-priority;
o only high-priority processes handle transputer link, event or timer
interrupts. [NB: the term `transputer event' refers to the electronic
assertion of its event pin by some external device, which has nothing
to do with the `EVENT' primitive in this document.]
These are not severe restrictions -- the second point above should properly
be a design rule in any case. Of course, if the application had no need for
interrupt handling (which implies that it is uni-processor), the first point
can be ignored (unless, of course, the EVENT is shared between high and low
priority processes).
--}}}
--}}}
--{{{ BUCKETs
A BUCKET is an occam2.1 DATA TYPE currently implemented by an EVENT. It is
initialised by initialising the EVENT with a count of (MOSTPOS INT). Finding
the number of processes in the BUCKET is simply a matter of subtracting one
EVENT count from the other. KRoC 0.9beta will implement BUCKETs independently
of EVENTs, with only one (or, possibly, no) count field.
A process falling into a BUCKET simply synchronises on the implementing EVENT.
[Warning: this implementation restricts the number of processes synchronising
on each BUCKET to (MOSTPOS INT) - 1. Otherwise, when (MOSTPOS INT) processes
have jumped in, they will be prematurely released.]
A process flushing a BUCKET concatenates the BUCKET-queue on to the run-queue.
Currently, this is implemented by setting the EVENT synchronisation count to
one and synchronising.
Both these operations work in constant time.
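The BUCKET-via-EVENT trick can be sketched as follows (an illustrative Python
rendering of the description above, with our own names and list-based queue):

```python
from collections import deque

# A sketch of a BUCKET implemented by an EVENT initialised with a
# count of (MOSTPOS INT) -- illustration only.

MOSTPOS = (2 ** 31) - 1   # MOSTPOS INT on a 32-bit transputer

class Bucket:
    def __init__(self):
        self.registered = MOSTPOS   # EVENT registration count
        self.to_sync = MOSTPOS      # EVENT synchronisation count
        self.queue = deque()

    def fall_into(self, process):
        # a falling process just synchronises on the underlying EVENT;
        # with MOSTPOS enrolled, the barrier never completes, so it blocks
        self.to_sync -= 1
        self.queue.append(process)

    def number_in(self):
        # subtract one EVENT count from the other
        return self.registered - self.to_sync

    def flush(self, run_queue):
        # set the EVENT synchronisation count to one and synchronise:
        # the flusher is the "last" process, so the barrier completes,
        # releasing every queued process and resetting the count
        self.to_sync = 1 - 1
        run_queue.extend(self.queue)
        self.queue.clear()
        self.to_sync = self.registered

run_queue = deque()
b = Bucket()
b.fall_into("P")
b.fall_into("Q")
assert b.number_in() == 2
b.flush(run_queue)
assert list(run_queue) == ["P", "Q"]
assert b.number_in() == 0
```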
--{{{ Transputer restrictions
These are the same as for EVENTs. BUCKETs may be used safely on transputers
provided:
o only low-priority processes use BUCKETs and they protect `fall.into.bucket'
and `flush.bucket' operations by popping into high-priority;
o only high-priority processes handle transputer link, event or timer
interrupts.
If the application has no interrupt-handling needs (which implies that it is
uni-processor), the first point can be ignored (unless the BUCKET is shared
between high and low priority processes).
--}}}
--}}}
--}}}
--{{{ Postscript
--{{{ Summary
This document has described some new synchronisation primitives for occam and
some higher-level language bindings that make them secure. The new primitives
are directly usable within occam2.1 through the abstract data types released
from KRoC 0.8beta onwards. The primitives complement, but do not replace,
the traditional concept of channel communication for networks of synchronising
processes. In particular, they provide an implementation for SHARED channels
(as proposed for occam3), as well as a range of higher-level and relaxed forms
of safely SHARED resource that will allow more parallelism to be extracted
from applications. Details of these higher-level mechanisms for sharing will
be reported separately.
--}}}
--{{{ Performance and optimisation
The released primitives have not yet been benchmarked, but we believe that none
of the operations cost more in time than about two or three context-switches
(i.e. between one and two micro-seconds on a SPARC-20). For portability, the
prototypes have not been burnt into the KRoC kernel, but have been implemented
with transputer instructions (which KRoC uses as an abstract intermediate code).
It is straightforward to move the implementation of the primitives into the KRoC
kernel and this will be done later for those that prove useful. This should
reduce the overheads for each operation towards a single context-switch. It
will also nail down the loose end currently exposing a minor security problem
in EVENTs (when the number of processes engaged in a barrier synchronisation
grows temporarily).
--}}}
--{{{ Virtual transputers
Working with virtual transputers has its benefits over working with real ones:
modern micro-processors mean they run faster and we *can* experiment with new
instructions with relative ease. For example, to move the primitives into the
kernel, the abstract (or virtual) transputer machine used by KRoC will have to
be extended with new instructions for the manipulation of process queues. We want
to do this anyway for other reasons -- such as a full implementation of PRI PAR
(that allows any number of prioritised components).
When KRoC goes multi-priority and multi-processor (:-), we want no constraints
on our use of the primitives (such as are necessary on real transputers). With
control over the virtual architecture, this should be possible to arrange.
The long-term (medium-term?) architectural goal, for which the new primitives
would give major benefit, is shared-memory (real or virtual) multi-processing.
To support this further, we are investigating a richer form of EVENT that
implements the contradictory-sounding `non-blocking barrier synchronisation',
recently proposed for MPI-2. This is a two-phase event synchronisation,
where the first phase registers that the process is ready to synchronise (but
doesn't block) and the second phase does block (unless and until all the other
processes in the barrier have registered their first phase). With the right
algorithm, processes may be able to sail through most such barriers without
ever blocking! Implementation looks easy within KRoC and we *may* try out a
version in the 0.9beta release to enable anyone to experiment with the ideas.
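The two-phase scheme can be sketched as follows (a one-shot, single-cycle
Python illustration under our own names -- not the MPI-2 interface, and not
any released KRoC code):

```python
from collections import deque

# A sketch of `non-blocking barrier synchronisation': phase one registers
# readiness without blocking; phase two blocks only if some peer has not
# yet registered.  One-shot (no count reset), for clarity.

class TwoPhaseBarrier:
    def __init__(self, enrolled):
        self.enrolled = enrolled
        self.arrived = 0
        self.queue = deque()

    def register(self, run_queue):
        # phase one: note our arrival; never blocks
        self.arrived += 1
        if self.arrived == self.enrolled:
            # last arrival completes the barrier: wake any blocked peers
            run_queue.extend(self.queue)
            self.queue.clear()

    def complete(self, process):
        # phase two: block only if some peer has not yet registered
        if self.arrived < self.enrolled:
            self.queue.append(process)
            return False
        return True               # sail through without ever blocking

run_queue = deque()
bar = TwoPhaseBarrier(2)
bar.register(run_queue)           # P registers, then does useful work
bar.register(run_queue)           # ... and so does Q, neither blocking
assert bar.complete("P")          # both now sail through phase two
assert bar.complete("Q")
assert list(run_queue) == []      # nobody ever blocked
```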
We also want to extend the instruction set to provide type information that
will allow KRoC to target Java Byte Code. Additionally, this type information
makes possible a Transputer Byte Code verifier that enables the distribution
of occam processes as compact binaries (`occlets') with the same object-level
security as Java. Note that KRoC compiles these Byte Codes to target object
code for execution -- it doesn't interpret them at run-time. In that sense,
it already provides `just-in-time' compilation (but we do realise there is a
little bit more involved than that).
--}}}
--{{{ Microcode, methods and objects
The algorithms underlying the new synchronisation primitives correspond to new
`micro-code' for the virtual transputer. Just like the real micro-code that
implements channel synchronisation on real transputers, great care has to be
taken in its design -- these algorithms are significantly harder to get right
than algorithms of a similar size at the occam application level.
The reason for this is that algorithms within occam processes are easy to make
object-oriented -- in the `true' sense of the term. By this, we mean that
they directly express the behaviour of objects from their own point of view --
not as a set of procedure calls that are made by (and, therefore, oriented
towards) external agents. This is achieved through implementing objects as
processes that run concurrently with other objects, each with their own threads
of control. Of course, this is something for which occam was specifically
designed.
Most `object-oriented' programming languages allow the encapsulation of object
data and algorithms, but only provide for the expression of those algorithms
through a set of `methods' (which are no different to procedures). Objects
interact by calling each other's methods and, as far as algorithm design is
concerned, there is no paradigm shift from traditional procedural programming.
The problem is that expressing the behaviour of an object through a set of
externally-called methods is unnatural (it's not object-oriented!), and it
gets especially hard in a multi-threaded (or multi-processing) environment.
The reason is that method semantics in these circumstances do not compose.
This means that in order to design/understand two methods of some object, we
have to design/understand both at the same time. Of course, this gets worse
the more methods an object contains and puts a limit on the complexity of what
can be done this way.
Unfortunately, we cannot build a system purely from active objects! An active
object cannot directly interfere with another active object (this would break
the principle of data abstraction and introduce all manner of semantic chaos)
and it cannot directly call on it (because active objects are active and don't
provide passive facilities). So, we need some passive medium through which
they can interact.
Fortunately, we only need a small variety of passive objects to construct this
medium and these can be hidden in `micro-code' or burnt into a high-level
language. Then, the system engineer only needs to work with active (truly
object-oriented) objects, whose semantics do compose and put no limit on the
complexity of the design.
Hence, we have occam and the virtual transputer, where the necessary (but hard
to program) passive objects are its CHANnels, SEMAPHOREs, RESOURCEs, EVENTs
and BUCKETs and, hopefully, not too many more!
The most complex in this list is the RESOURCE. See how the non-compositional
nature of its semantics bites as we are obliged to think simultaneously about
its `claim' and `release' algorithms -- they cannot be understood individually.
The same is true for the original occam CHANnel, whose `input' and `output'
methods (especially in the context of ALTs) are so elegant, but are completely
inter-dependent. It is an immense relief we don't have to understand this, or
program like this, at the occam level.
--}}}
--{{{ Java threads and occam
These ideas were examined in the context of Java at the Java Threads Workshop,
which took place at the University of Kent last September (1996). Java allows
both passive and active objects at the user-program level, with passive being
the default mechanism everyone learns first. Java threads are not based upon
CSP, but on the earlier concept of `monitors'. Thread synchronisation can only
be achieved through calling `synchronized' methods. These methods belong to
passive objects and have to be programmed -- which is hard.
The workshop has stimulated the development (by at least three research groups)
of CSP class libraries that provide a range of passive hard-to-program objects
(like channels, shared channels, buffers, shared buffers, events and buckets).
Multi-threaded Java applications can now be developed that interact with each
other in the occam/CSP style and use only active easy-to-program objects.
Details from this workshop can be found on:
--}}}
--{{{ If only ...
These ideas could have been introduced at low cost over ten years ago -- they
do not depend on any hardware technology. They remain vital today (we suspect)
because they enhance any multi-processing (or multi-threaded) technology that
doesn't have them -- and that seems to include almost everything (?) available.
But if only we had had them since 1986, ...
--}}}
--}}}
--{{{ References
[1] D.C.Wood and P.H.Welch: The Kent Retargetable occam Compiler.
In Parallel Processing Developments, Proceedings of the 19th. WoTUG
Technical Conference, Nottingham-Trent University; edited by B. O'Neill;
pp. 143-166; IOS Press (Netherlands); ISBN 90-5199-261-0; March, 1996.
[2] occam-for-all group: Beta-release WWW site for KRoC.
[3] G.Barrett: occam3 reference manual (March 31 1992 draft).
INMOS Ltd.;
[4] M.D.May, P.W.Thompson and P.H.Welch: Networks, Routers and Transputers.
IOS Press (Netherlands); ISBN 90-5199-129-0; 1993.
[5] M.H.Goldsmith and A.W.Roscoe and B.G.O.Scott: Denotational Semantics
for occam2 (Part 1).
Transputer Communications, vol.1(2), pp. 65-91. John Wiley & Sons Ltd.;
ISSN 1070-454X; November, 1993.
[6] M.H.Goldsmith and A.W.Roscoe and B.G.O.Scott: Denotational Semantics
for occam2 (Part 2).
Transputer Communications, vol.2(1), pp. 25-67. John Wiley & Sons Ltd.;
ISSN 1070-454X; March, 1994.
[7] Y.Ben-Asher, D.G.Feitelson and L.Rudolph: ParC -- an Extension of C for
Shared Memory Parallel Processing.
In Software -- Practice and Experience, Vol. 26(5); pp. 581-612;
John Wiley & Sons Ltd.; ISSN 0038-0644; May, 1996.
[8] P.H.Welch: Parallel Hardware and Parallel Software: a Reconciliation.
In Proceedings of the ZEUS'95 and NTUG'95 Conference,
University of Linkoping, Sweden; pp. 287-301; IOS Press (Netherlands);
ISBN 90-5199-22-7; May, 1995.
[9] B.M.Cook and R.M.A.Peel: The Para-PC, an Analysis.
In Parallel Processing Developments, Proceedings of the 19th. WoTUG
Technical Conference, Nottingham-Trent University; edited by B.O'Neill;
pp. 89-102; IOS Press (Netherlands); ISBN 90-5199-261-0; March, 1996.
--}}}