Architectures, Languages and Patterns for Parallel and Distributed Applications
P.H.Welch (Ed.)
(WoTUG-21 Proceedings)
IOS Press, 1998
ISBN 90 5199 391 9 (IOS Press)
ISBN 4 274 90216 1 C3000 (Ohmsha)
ISSN 1383-7575
Paper for WoTUG-21
- 5-8 April 1998, Computing Laboratory, University of Kent at Canterbury, UK
The Proceedings are published by IOS Press, Netherlands, as part of the Concurrent Systems Engineering Series (ISSN 1383-7575).
Abstract. The article describes how SPoC (Southampton Portable occam Compiler) has been used - together with hand-written C - in Autronica's new GL-100 radar-based fluid level gauge. The final C-code is running on a Texas TMS320C32 DSP. Some 26000 lines of C-code have been automatically generated from the occam sources. SPoC's non-preemptive scheduling filled our needs with a few exceptions. The main problem has been aligning occam 2 and ANSI-C data abstractions. A realtime system based on language support of high-level concurrency abstractions (as opposed to separate real-time kernel and use of library calls without direct language support) is soon to monitor worldwide charging and discharging of oil-tankers.
PAR The parallel is one of the most useful constructs of the occam language. A parallel combines a number of processes which are performed concurrently [Inmos 88]. A PAR component is called either a process or a task in this paper. It is assumed that the reader of this article has some basic understanding of occam.
STARTP The C macro that the Southampton Portable occam Compiler (SPoC)[Debbage 94] uses to start an occam PAR. It is assumed that the reader of this article has some basic understanding of SPoC. (The Appendix is a small occam program translated to C by SPoC.) In the "occam2c.h" file that came with SPoC "STARTP" spells out
#define STARTP(t,f,s,n,p) \
{ \
(s)->Chain = FP; \
(s)->_Header.IP = 0; \
Start_Process(t,(tFuncPtr)f,s,n,p); \
}
The transputer instruction to start a process is startp and it "adds process with workspace Areg and instruction pointer at offset of Breg bytes from Iptr to current priority process queue" [Inmos88b]. The C-code that SPoC generates in essence emulates the semantics of the transputer instruction set.
Autronica GL-100 A Norwegian company's new generation of radar-based fluid-level gauges. Replaces the 1984 vintage GL-90 installed aboard some 330 ships, totaling 5700 tanks, plus 1000 processing units to sub-contractor for land-based tanks. The processing element of GL-100 is a Texas Instruments TMS320C32 DSP. Programs have been written in C and occam.
When we first set out to develop a new version of the level gauge, we thought we were going to end up with a plug-in replacement of the old board. We knew we were going to throw out the old Texas TMS99105 processor and Pascal development system with built-in task support [Texas 81]. We did not foresee the result as it came to be: 26000 lines of automatically generated C-code from some 65 parallel occam processes, plus FFTs, interpolations etc. in C. At one stage we considered a separate real-time kernel and C - but ended up with a built-in scheduler. SPoC made all this feasible. Stay tuned, hear our story.
Autronica had had a student port the SPoC compiler to the PC, and had the compiled code run on a TMS320C32 DSP [Aarrestad 96]. He ftp'ed all the source files, edited the make-files and got the thing compiled and running. He modified the two files "occam2c.h" and "occam2c.c" so that occam primitive data-types were mapped correctly to DSP data types. The timer-interrupt routine was modified so that we could use occam TIMER. Aarrestad ran the code on different PCs, and on the DSP. The speed was quite good: the accompanying "comstime" program [Welch 94] ran faster on a 486DX-2/66MHz than on a 20 MHz T805 transputer with native occam.
Friction arose in the project team when I argued that a transputer could very well be used for data collection and routing. A DSP TRAM-type board could then be inserted into the system to do digital signal processing for one or several sensors ("antennas") - or the T805 could do it if slower processing was acceptable. Autronica had indeed had success with T805, coupled with occam it had proved to be a winning team.
The DSP-people gave it a fight though, and won! This was in early 1995, and a T805 was not, by my standards at least, dead, as a design-in processor. But we are all quite happy with the Texas DSP now, and I feel comfortable that T805 is not in there. Admittedly, I still have a little mourning to do, of the fate of that beautiful piece of engineering.
The early project-team had already decided not to use the run-time-royalty ridden kernel that was offered for the DSP. The team had studied the nicely laid out SPoC-built state-machines, and decided not to do the same by hand. The project headed for a sequential realization.
With occam and SPoC in mind, a problem was that SPoC was free and therefore smelt of non-responsibility. The previous generation of the level-gauge had already had a long life, and we wanted to have living support a good while into the new product's life span. With up to 54 units per ship, the scenario for an error that was dependent on a non-supported tool was rather awesome. The University of Southampton and Autronica negotiated a support-deal that we could live with. This has been renewed for 1998, and we hope that the next version of SPoC will be placed back into the public domain.
With the contract lined up, another and systemic problem remained: the number of people who knew occam. It was one (not people, person!). However, the team believed that an in-house two-day occam course for the other two programmers would minimize this problem. A company lives through so many hardware and software paradigm shifts anyhow that we gave it a go. We always give substantial weight to what the man who is going to do the job is saying. So, we went for occam and C in a mixed programming implementation.
The nature of the product - an embedded application that did not even have C library support for "println" - with no dialogue boxes and fancy windows, also made it easier to choose occam. It remains to be seen whether occam will be a contender for programming embedded applications running e.g. Windows CE. There should be a niche for occam between clean object-oriented API based systems and separate real-time kernel systems.
The final point was that, compared with other (machine-generated and threads-based) ANSI-C code that we analyzed (two SDL to C translators), the SPoC ANSI-C code was more readable. We thought we would be little worse off than the companies that went for those solutions, even if they had real graphics on top. We had a folding editor, line-based semi-graphics so to speak: just look at a folded ALT-structure! It is descriptively powerful, but I still miss a visual tool to view it from a different perspective.
Since, when it came to business, only two persons did the DSP coding, we have decided to use the somewhat confusing "I" and "we"- forms here. Who is saying what comes out of this paper's heading, but "we" always covers both of us. To make it even more confusing: when we use "team" it covers another additional few persons.
The application is a typical embedded application more than a traditional DSP application. We do not need to produce an output data stream with the same bandwidth as the input stream. We use the DSP so that we can get FFT's and interpolation etc. to finish "fast". Also, we are not talking about a real-time system with "hard" scheduling deadlines - at least not in the ms-range.
The system's fieldbus is based on LonWorks [LonWorks 95]. We use a "Lon-processor" to do communication on the network on one side, and a dual-port memory on the other side. The Lon-processor and the DSP talk over the dual-port memory according to an algorithm we developed [Teig 96].
The "Album" protocol used for configuration implements a client/server type strategy where the server exports comma-separated lists of offered functionality [Teig 97].
The DSP application is medium-sized among today's embedded systems: 300kB program is programmed in flash and we use about half of the 1MB RAM-memory. Typically we had to increase RAM and flash capacity along the line. The extensive process-orientation of PARs probably caused this change to take place earlier, since we then had to keep more separate buffers.
The TMS320C32 DSP has some transputer-like features - it can pull in data over a high-speed serial link with its built-in DMA machine - and boot via serial-link or from byte-wide memory. We use both: a stereo A/D-converter delivers raw data, and we boot the DSP with completely empty flash memory via the dual-port.
The DSP also controls the sensor. This is an antenna with a 9.5-10.5 GHz sweep. The microwave is heterodyned to an lf-signal that is being processed by the DSP. The main output parameter is the fluid's level, at millimeter accuracy.
Finally, results are processed further and displayed by the "Autro CARGO NL-300" NT-based end-user graphical presentation system, and on local displays.
Our main reason for using occam was the same as the reason to design with threads/ processes/ tasks: split and conquer to handle complexity. Occam is a first-rate tool for doing this. We can reason about concurrent processes and their interactions easily through occam's PAR, ALT and PROTOCOL constructs.
And we can write concurrent processes quite easily:
In occam, every statement - even a lowly assignment statement - is considered to be a process. It is up to the programmer to indicate explicitly whether statements will be combined in sequence or in parallel. Compare this with Ada which uses simple juxtaposition to indicate sequential execution, but requires a task specification and task body to indicate parallelism. Thus occam encourages concurrent programming by setting its syntax on an equal footing with sequential programming. [Ben-Ari 90]
The kind of reasoning we do will influence what we eventually make. In "Unified Modeling Language Summary" we can read that:
The choice of what models and diagrams one creates has a profound influence upon how a problem is attacked and how a corresponding solution is shaped. [UML1.1 97]
UML is based on an object-oriented paradigm and thus yields object-oriented implementations. It seems to me that, according to Wegner's taxonomy, occam is object-based [Wegner 87]. So object-oriented techniques to describe the design are not really appropriate. Maybe this explains why it is easy to feel a little outdated when using occam? The quote below describes quite well what we do have when working with occam. Architectural design is compared and contrasted with object-oriented design:
Architectural design is concerned with composing systems from components, and the interactions between these components. Such compositions provide an abstract view of a system, so that the designer can do system-level analysis and reason about system integrity constraints. Examples include throughput rates and freedom from deadlock. These distinctive aspects of architectural design highlight several important contrasts with object-oriented design. Although both are concerned with system structure in general, architectural design involves a richer collection of abstractions than is typically provided by OOD. These abstractions support the ability to describe new kinds of potentially complex system glue (or connectors). [Monroe 97]
Occam offers the ability to draw processes and data/command-flow diagrams and implement the same thing directly in occam. Our experience is that use of process, channel and protocol diagrams (circles and textually decorated arrows) and message sequence diagrams (as tables) are able to piggyback on the problem domain description quite well.
Since we considered writing the whole application as one thread (or rather: as no thread), another reason for choosing occam is that making parallel code "usually" leads to a reduced complexity and length of resulting code, as compared to serial code. I have here "inverted" this statement:
Beware also that serialisation can - and usually does - lead to an explosion in the length and complexity of the resulting code. [Welch 91]
The ALT-construct of the occam language is so powerful that it may be considered a program-political issue - it is in itself a good reason to use the language. For readers not experienced in occam I will draw up some lines.
The ALT-construct lets us non-busily 1.) wait for data on a channel or channels and do something when data arrives, 2.) do something at time-out and 3.) alternatively fall through and do something if no data arrives or timeout has not come. Also, we may dynamically enable and disable any of these. In theory the ALT-construct may be non-deterministic.
These are occam language building blocks and not built-in library calls. They are used to build functionality like "wait for the escape-button-channel unconditionally and do something if it is pressed, or any-other-button-channel if the green light is blinking, but light the red light if nothing happens after 1000 ms". Since occam channels are synchronous and blocking, the any-other-button-channel sender process will non-busily wait if the green light on the receiving process is not blinking, and come through when it starts to blink.
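The selection rule described above can be illustrated with a minimal C sketch. It assumes a polled model in which the caller asks repeatedly, whereas a real occam ALT blocks without busy-waiting. The names Guard, alt_select, ALT_TIMEOUT and ALT_NONE are hypothetical and are not SPoC's ALT macros. Taking the first ready guard in textual order corresponds to a prioritised ALT rather than a plain one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one ALT guard: an enable condition plus a channel
   that is "ready" when a sender is waiting on it. */
typedef struct {
    bool enabled;   /* boolean guard, e.g. "the green light is blinking" */
    bool ready;     /* a sender is blocked on this channel */
} Guard;

enum { ALT_TIMEOUT = -1, ALT_NONE = -2 };

/* Returns the index of the first guard that is both enabled and ready.
   If there is none, returns ALT_TIMEOUT once elapsed_ms has reached
   timeout_ms, otherwise ALT_NONE (a real ALT would keep waiting here). */
int alt_select(const Guard *guards, size_t n, long elapsed_ms, long timeout_ms)
{
    assert(guards != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (guards[i].enabled && guards[i].ready) {
            return (int)i;
        }
    }
    return (elapsed_ms >= timeout_ms) ? ALT_TIMEOUT : ALT_NONE;
}
```

In the button example, guard 0 would be the escape-button channel with its enable condition always true, guard 1 the any-other-button channel enabled only while the green light blinks, and the timeout would be 1000 ms.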
In our system, almost all processes hang on an "ALT" as the first statement after the "WHILE TRUE" that defines the "forever loop".
The occam ALT-construct cannot timeout on a failing channel output. This functionality may be covered with memory-mapped I/O or calls to low-level native procedures.
When writing programs in something as "narrow" as occam, we must address the issue of reuse.
We have to differentiate between internal reuse, design with reuse, design for reuse and which level (analysis, design, implementation - or even person) we want reuse. There is a common belief that as much reuse as possible is, at least, an ideal goal.
We do not have much internal reuse. Some processes are started more than once, some functions used several times. Had this been implemented in Java, where inheritance is the primary reuse mechanism, I suspect there would be little inheritance since functionality is quite unrelated.
Design with and for reuse is in a real organization often represented by the programmer's experience in building data-flow based systems and the associated patterns hidden in the occam code or in the programmer's head, and his ability to reuse that knowledge. How to use ALTs, how to avoid deadlock, how to break down system functionality into processes. The structure of an occam program may be mentally "built" with the unstructured select, fork, join, sendMessage, etc. in other systems. Therefore it will not be a substantial task to port occam programs to other task/data-flow architectures.
Siemens has found that a reused software component only pays back after its fifth use [Mrva 97].
The most "important" tasks were written in C: the "ullage calculator" DSP routine (this is the code we do not show to others), the volume handler and the alarm handler. The file system server and flash driver, the unpack and pack functions, the event handler, the dual-port driver and the data-acquisition were all written in occam, with bottom-most C-functions for memory-mapped I/O for DMA, dual-port and flash. There is so much occam "ALT" in this code that writing the subroutines in C would defeat the whole purpose of having occam at all. In this respect we ended up writing more code in occam than we first anticipated. Also, the positive feedback caused by things that worked increased the number of occam lines we ended up with.
When Aarrestad ported SPoC for us, it was not a goal to use occam on the DSP, rather to learn about how occam maps to C. We were especially curious about how a portable and preemptive real-time scheduler could be written in C, in the belief that anything usable must be preemptive these days. We learnt that it could not be portable, basically because a portable C-program cannot manipulate the program counter and stack pointer. And it need not be preemptive! Since both the occam CHAN and the new "Communicating Java Threads" (CJT) are based on CSP [Hoare 85], let us use CJT's excellent description:
Channels are very close to the scheduler, in fact they may be part of the scheduler. Using channels allows scheduling on communication within the program and without any explicit command from the programmer. Scheduling based on communication results in a very fast non-preemptive scheduling algorithm that is faster than preemptive scheduling. [Hilderink 97]
This is contrary to traditional non-preemptive scheduling where "jobs are executed one at a time to completion." [Brinch Hansen 73]
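To make the idea of scheduling on communication concrete, here is a small C sketch under strong simplifying assumptions. Each process is a function that runs until its next communication and then returns to the scheduler, with its resume point kept in an "ip" field. The scheduler polls the processes round-robin, whereas a scheduler such as SPoC's keeps run queues and does not poll. The names Chan, Proc, producer_step, consumer_step and run_all are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One-place synchronous channel model: the writer deposits a value and
   does not continue until the reader has taken it. */
typedef struct { bool full; int value; } Chan;

typedef enum { PROC_BLOCKED, PROC_RAN, PROC_DONE } Status;

typedef struct Proc {
    Status (*step)(struct Proc *self);  /* runs to the next communication */
    Chan *chan;
    int   ip;     /* resume point between calls */
    int   data;
} Proc;

/* Outputs the values data..3 on the channel, one per communication. */
Status producer_step(Proc *p)
{
    switch (p->ip) {
    case 0:                                   /* start an output */
        if (p->data > 3) return PROC_DONE;
        if (p->chan->full) return PROC_BLOCKED;
        p->chan->value = p->data;
        p->chan->full = true;
        p->ip = 1;
        return PROC_RAN;
    case 1:                                   /* wait for the reader */
        if (p->chan->full) return PROC_BLOCKED;
        p->data++;
        p->ip = 0;
        return PROC_RAN;
    }
    return PROC_DONE;
}

/* Inputs three values and accumulates them in data; ip counts inputs. */
Status consumer_step(Proc *p)
{
    if (p->ip == 3) return PROC_DONE;
    if (!p->chan->full) return PROC_BLOCKED;
    p->data += p->chan->value;
    p->chan->full = false;
    p->ip++;
    return PROC_RAN;
}

/* Non-preemptive scheduler: a process only loses the processor by
   returning. Returns the number of steps run, or -1 on deadlock. */
int run_all(Proc *procs, size_t n)
{
    int steps = 0;
    assert(procs != NULL);
    for (;;) {
        size_t done = 0;
        bool progress = false;
        for (size_t i = 0; i < n; i++) {
            Status s = procs[i].step(&procs[i]);
            if (s == PROC_DONE) {
                done++;
            } else if (s == PROC_RAN) {
                steps++;
                progress = true;
            }
        }
        if (done == n) return steps;
        if (!progress) return -1;   /* every live process is blocked */
    }
}
```

No process is ever interrupted in the middle of a step, so shared state needs no locking, and the only context kept per process is its small record.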
Whether our application could accept non-preemptive scheduling like SPoC's depended on maximum acceptable user wait time. Our users were seen through the dual-port, where four bi-directional channels were implemented; we only needed to respond within some hundred milliseconds. We therefore set a maximum running time for the never-descheduled "C-tasks" (see later) of 200 milliseconds. This was no problem, for these "C-tasks" could split up functions into several states and return to occam in between (also described later). Timer interrupts and the DMA-channel were always able to run.
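Splitting a long-running C function into several states, so that it can return to the caller in between, might look like the following sketch. The job (summing an array in bounded chunks) and the names SumJob, sum_job_start and sum_job_step are hypothetical stand-ins; the real "C-tasks" are not shown in this paper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical job state, kept between calls so the work can be resumed. */
typedef struct {
    const long *data;
    size_t      len;
    size_t      next;   /* index of the first unprocessed element */
    long        sum;
} SumJob;

void sum_job_start(SumJob *job, const long *data, size_t len)
{
    assert(job != NULL);
    job->data = data;
    job->len  = len;
    job->next = 0;
    job->sum  = 0;
}

/* Processes at most `chunk` elements and then returns, which bounds the
   time spent in one call. Returns true when the whole job is finished. */
bool sum_job_step(SumJob *job, size_t chunk)
{
    size_t remaining = job->len - job->next;
    size_t todo = (chunk < remaining) ? chunk : remaining;
    for (size_t i = 0; i < todo; i++) {
        job->sum += job->data[job->next++];
    }
    return job->next == job->len;
}
```

The caller invokes sum_job_step repeatedly and is free to communicate, and thereby let other processes run, between two calls.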
Overall the SPoC scheduler met our requirements. However, in a few special cases we did have to insert manual descheduling points. This is explained later.
We did not have to tune the system by setting many priorities (SPoC allows as many priority levels as we would like to have). All processes are high priority except the signal-processing "engine" ("UllCalcServer") which is low priority.
What Concurrent Pascal had in the seventies [Brinch Hansen 75] and occam and Ada had in the eighties now seems to finally become mainstream with Java: the underlying operating system is more or less invisible from the programmer's point of view. With SPoC the run-time system and scheduler are contained in the two "occam2c" files.
A report describing the results of a world-wide survey carried out among 60 industrial embedded systems professionals, 40 researchers and technology providers states that:
Concerning the use of real-time operating systems, the opinions were, to a large extent, similar among researchers, technology providers and industrial practitioners, except that the industrial people thought that the Posix-type real-time operating systems would not become very common. It is remarkable that about half of all the surveyed persons looked for integrated ASIC type of operating systems, whereas only about 10% believed that traditional operating systems would survive during the coming years. [Seppänen 96]
This quote leaves room for interpretation - one could be that after all, languages with built-in process/ task/ thread support have made people aware of their virtues. Now a developer looking for another separate real-time operating system should perhaps think twice.
Do not mistake me for thinking that libraries of some sort are not needed. We do need generic APIs. If our application were to talk in TCP/IP directly from the DSP, it might well be that the TCP/IP-library we could find was intimately intertwined with a real-time operating system. This could have jeopardized our occam solution, if we did not port SPoC to that same operating system.
Within some of the occam tasks we made a single call to a "process-body" written in C. The C-call in one of the tasks looks like this (this is the earlier mentioned signal-processing "engine"):
#C UllCalcServer (
#C (ProtType_a) $protType,
#C (void *const) $envelope);
The "#C" command is one of two ways to call a C function in SPoC (the other way is a prototype definition). "#C" defines in-line C-code. SPoC expands occam names if we prefix them with a dollar-character. So, the generated C-code looks like this:
UllCalcServer (
(ProtType_a) FP->_U92._S118._U93._S121.protType_3605,
(void **const) FP->envelope_3572)
"FP" is the occam task's frame pointer. Here "protType" is situated below two unions and two structs - and "FP" is this task's node in the large struct-tree that holds (almost) all data for all tasks. In C "main" (in file "occam2c.c") we run a single "malloc" on this large struct, this is the only "malloc" needed for the SPoC run-time "occam" code.
The C-function is called from occam, and it is not preempted by the scheduler, since the scheduler cannot preempt any code. It has no direct communication of its own with other tasks; all this is done in the surrounding occam task. We soon came up with an occam "pattern" that reflected our needs quite nicely. All the tasks use this same pattern. In the following, let us call the C-function that is repeatedly being called by occam a "C-task".
Each "C-task" is internally structured around a switch-case that analyses command/data on the "input channel", does its job and returns command/data on the "output channel". Input data does not survive, as both input and output are stored in the "envelope" array. With this scheme we have been able to define a clean interface between occam and C.
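A minimal sketch of such a switch-case "C-task" follows. It is simplified to a single envelope parameter, and the commands, reply tags and the name DemoCTask are hypothetical; the actual protocol between occam and C is not shown in this paper. The reply overwrites the request in the same array, which is why input data does not survive the call.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical command and reply tags. */
enum { CMD_ADD = 1, CMD_NEGATE = 2, REPLY_OK = 100, REPLY_UNKNOWN = 101 };

/* envelope[0] holds the command on entry and the reply tag on return;
   the following words hold operands on entry and results on return. */
void DemoCTask(int32_t *const envelope)
{
    assert(envelope != 0);
    switch (envelope[0]) {
    case CMD_ADD:
        envelope[1] = envelope[1] + envelope[2];
        envelope[0] = REPLY_OK;
        break;
    case CMD_NEGATE:
        envelope[1] = -envelope[1];
        envelope[0] = REPLY_OK;
        break;
    default:
        envelope[0] = REPLY_UNKNOWN;
        break;
    }
}
```

The surrounding occam task would input into the array from its channel, make the single call, and output the array contents afterwards.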
There are two problems with this scheme: (1) it enforces a special way to program the C-function and (2) the protocol definition was seen as a C-struct (with sub-unions and sub-structs) on the C-side and as an INT array on the occam side.
(1) will be handled separately later on. (2) was not difficult to solve, but it introduced a potential safety-hazard - we had to hand-index into the occam array to match the C-struct.
SPoC did not have error-free support of occam 2.1 RECORD [ST 96] to pair with C-structs; having this would have helped a lot. A C-union could then have been implemented with an occam RETYPES. Since all primitive data-types are 32 bits long, mapping any C-struct into an INT-array is in fact viable. Another problem was that when we hand-indexed into the array, quite a lot of unnecessary array bounds checks were introduced. With the large occam module we used (the whole application through #INCLUDEs, more later), switching array bounds checks off could not easily be done; after all, we did want most of them!
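On the C side, the hand-indexing hazard can be reduced by deriving the word indices from the struct itself. The sketch below assumes, as on our DSP, that every member is 32 bits wide and that the compiler inserts no padding; the struct LevelMsg, the macro WORD_INDEX and the two copy functions are hypothetical. The occam side would still need matching constants, so this only removes half of the hazard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical message layout; every member is 32 bits wide. */
typedef struct {
    int32_t tag;
    int32_t tankNumber;
    float   ullage;
} LevelMsg;

/* Word index of a member, computed by the compiler instead of by hand. */
#define WORD_INDEX(type, member) (offsetof(type, member) / sizeof(int32_t))

enum {
    IX_TAG    = WORD_INDEX(LevelMsg, tag),
    IX_TANK   = WORD_INDEX(LevelMsg, tankNumber),
    IX_ULLAGE = WORD_INDEX(LevelMsg, ullage),
    MSG_WORDS = sizeof(LevelMsg) / sizeof(int32_t)
};

/* Copies the struct into the flat word array seen by the occam side. */
void msg_to_words(const LevelMsg *msg, int32_t *words)
{
    assert(sizeof(LevelMsg) == MSG_WORDS * sizeof(int32_t));
    memcpy(words, msg, sizeof *msg);
}

/* Copies the flat word array back into the struct seen by the C side. */
void words_to_msg(const int32_t *words, LevelMsg *msg)
{
    assert(sizeof(LevelMsg) == MSG_WORDS * sizeof(int32_t));
    memcpy(msg, words, sizeof *msg);
}
```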
Occam 2 supports data structuring by encapsulating data inside a process or procedure. To be honest, I have not missed user-defined data structures much during 7 years of occam coding, before this close encounter with C.
Our external clients are (1) a set of NT hosted configuration tools, (2) an NT hosted presentation system and (3) embedded machines that (a) subscribe to data we produce (like fluid "level" and "ullage") and, (b) send data to us (like the ship's "trim" and "list" tilting angles). Our constraints were:
The solutions were:
SPoC automatically inserted communication defined with PROTOCOL between occam processes. With many semicolons in it (that define synchronization points), we got much task switching. Many of the semicolons were actually unnecessary - RECORD support would have been great. SPoC does array communication with the C-function "memcpy". This simply works.
We also needed to send as occam PROTOCOL with data to be sent "unpacked" into an array of 32-bits values. How to do this crystallized out of the work with the unpack-system we developed. With the tabular PROTOCOL description, we unpack and send out at each semicolon the appropriate value or segment, and receive as occam PROTOCOL. If the receiver wants 3, then 4 values, we cannot send 7 in one batch! This generic sender was also coded in occam.
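The table-driven sender can be sketched in C as follows (our generic sender was coded in occam, so this is an illustration of the idea only). The table holds the number of 32-bit words in each component of the PROTOCOL, and one communication is made per table entry. The callback type SendFn stands in for a channel output, and BatchLog with log_batch is a hypothetical receiver used only to show the batch sizes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback standing in for one channel output. */
typedef void (*SendFn)(const int32_t *words, size_t count, void *ctx);

/* Sends `data` as one communication per table entry; segment_len[i] is the
   number of 32-bit words in the i-th protocol component.
   Returns the total number of words sent. */
size_t send_by_table(const int32_t *data, const size_t *segment_len,
                     size_t segments, SendFn send, void *ctx)
{
    size_t offset = 0;
    assert(send != NULL);
    for (size_t i = 0; i < segments; i++) {
        send(data + offset, segment_len[i], ctx);
        offset += segment_len[i];
    }
    return offset;
}

/* Illustration only: records the size of each batch that arrives. */
typedef struct { size_t sizes[8]; size_t calls; } BatchLog;

void log_batch(const int32_t *words, size_t count, void *ctx)
{
    BatchLog *log = ctx;
    (void)words;
    if (log->calls < 8) {
        log->sizes[log->calls] = count;
    }
    log->calls++;
}
```

With a table of {3, 4}, seven words are delivered as a batch of three followed by a batch of four, matching a receiver that wants 3, then 4 values.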
Thinking about it, what we did was something similar to Java "object serialization". We "serialized" the PROTOCOL description by hand with the tabular description, and had occam understand that description. I imagine this is what every occam compiler has to do to resolve the PROTOCOL description. Passing this hand-made description to a task would make a generic buffer task. During my years as an occam programmer I have missed a way to write or have generic buffers-tasks. This facility could have been built into the language as a set of primitive buffer types that automatically buffer any channel. We would avoid occam manual handling of each PROTOCOL tag, just to send data on as it was received.
Occam for the transputer freed the programmer from any stack overflow worries. Since occam is not recursive and does not have dynamic data handling, the exact stack need was calculated by the compiler (just like Concurrent Pascal [Brinch Hansen 75]). Not so with SPoC. Occam local variables that go in and out of scope between two process switches (all FUNCTIONs also go into this group) are declared locally in the generated C-code. This causes them, like parameters, to be allocated on the C stack, which is faster to work with than the struct that SPoC builds.
Subroutines with internal descheduling points are started like tasks and scheduled. This, plus the non-preemptive scheduling, ensures that all tasks are only one call from the scheduler - and no task sits on top of the other, as seen on the call tree. A consequence is that the generated C-code needs only one stack for the whole system, not one per process! Our old Texas MPP-Pascal from 1978 inserted stack checks on every procedure call - it knew the stack needs of the parameter passing and of the called procedure, the rest was a piece of cake. The present Texas DSP C-compiler has nothing but a cliff-hanger to offer on the top of the stack!
We were able to rectify this to some degree. The DSP stack grows upwards. At power-up we fill a known value into a position close to the stack top. At the single point where the SPoC scheduler is returned to from a process, we check to see whether that value has changed. On change we crash with a SETERR (MSG_STACK_OVERFLOW). This is not 100% fail-safe, as we might not catch all overflows, and we will in any case catch it after it has happened. But in a real-time system based on C-code this is actually quite good!
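The sentinel check can be sketched as below, with the stack modelled as an ordinary array so that the code runs on any host. The marker value, the margin parameter and the function names are hypothetical; on the target the position would be a fixed address near the top of the real stack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_SENTINEL 0x5AFEC0DEu   /* hypothetical marker value */

/* Writes the marker `margin` words below the top of an upward-growing
   stack. Done once, at power-up. */
void stack_guard_init(uint32_t *stack, size_t words, size_t margin)
{
    assert(margin >= 1 && margin <= words);
    stack[words - margin] = STACK_SENTINEL;
}

/* To be called at the single point where the scheduler regains control.
   Returns false if something has overwritten the marker. */
bool stack_guard_intact(const uint32_t *stack, size_t words, size_t margin)
{
    assert(margin >= 1 && margin <= words);
    return stack[words - margin] == STACK_SENTINEL;
}
```

As noted above, the check is after the fact, and an overflow that happens to skip the marked word, or to write the marker value itself, goes undetected.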
SPoC is quite flexible when it comes to how STOP and CAUSEERROR may be handled. CAUSEERROR is controlled in a file called "Intrinsics.hdr". Since we have not implemented any form of separate compilation, we have not used CAUSEERROR, instead we have set STOP to do the same.
Since we have compiled with "stop on error", the user driven STOP, and the automatically inserted array bounds overflow checks, IF and CASE errors, and some other error outputs - now call the SETERR macro that we have modified. It blinks with two LEDs and leaves a single byte with status-information in the dual-port memory before it loops around like dead, and the scheduler will run no more. Standard communication over the dual-port by using its hardware semaphores is not used in this case. The Lon-controller functions as the DSP's watchdog - it will discover that the DSP is dead, and read the status byte so that it can inform the clients.
This solution was "plugged into" the solution that came with SPoC.
There were situations where we needed to instruct a process to be descheduled. We had to do this because the hand-written "C-task" sometimes wanted to continue its job without any communication with other tasks. We were not able to use the occam built-in RESCHEDULE() function, because we did not separate-compile the "Intrinsics.hdr" file. (Candidate for future work).
For each task that we needed to deschedule we already had started a tiny task called "BlackHole" that endlessly blocked waiting for an input. To deschedule, then, was to send a value over the connected channel. This descheduling would cause the dual-port polling not to be starved.
We needed to understand the scheduler further. This list describes the functions that contain "DeSchedule" or "ReSchedule": (Scheduler, WAITP, ChanIn, ChanIn1, ChanIn2, ChanIn4, ChanOut, ChanOut2, ChanOut4, AltWait, WaitOnTimer, End_Process). SPoC will of course not set up a call to any of these functions at the bottom of a replicated SEQ-loop or WHILE-loop, so we had to send a message to BlackHole when needed. We needed to do this in fewer than five places in the code.
Had our system been written in occam only, these manually inserted descheduling points would most probably not have been needed.
The first SPoC version we had was the one compiled by Aarrestad. We used this "occ2c.exe" file for a year. Then, in the middle of 1997 we received a new version from The University of Southampton. This version was an executable file, but it needed MS-Windows DLLs from Cygnus. We downloaded the Cygnus tools, and the system has worked quite well.
From the occam runtime library we have only used a few functions, and we have clipped them from the library sources and compiled them anew. We have not recompiled the whole library, as we have not written any man-machine communication in this product. According to the SPoC authors, this would have been no problem.
We have modified the files "occam2c.h" and "occam2c.c". Understanding and modifying them was not too hard. SPoC replaces "<<HEADER>>" with a C struct-tree for all non-stack based data structures, and "<<CODE>>" with the generated code.
All primitive data-types of the DSP are - the way we have laid out the hardware - 32 bits long. In our "occam2c.h" file, mapping from occam to C is: (INT and INT32 = long int (32 bits), INT16 = short int (32 bits), BYTE and BOOL = char (32 bits), REAL32 = float (32 bits)). Any occam usage of 64 bits values will cause a C-compiler error when the generated C-code is compiled.
If you are considering porting SPoC to other machines, it should not be difficult. You do not need to understand e.g. the C data structure for occam CHAN, the queues, or any other kernel-near data structure. We do not! SPoC comes rather complete!
We compiled our code with the ST occam toolset [ST 93] whenever we wanted something that SPoC did not offer. To do this all we had to do was run a "grep"-script that replaced all "#C" with "SKIP --", so that in-line C was removed. We then compiled with "software quality warnings" and "warning whenever a name is descoped".
We have missed variable PLACEment. This means that we cannot write to memory-mapped peripherals in SPoC occam.
The SPoC concept for separate compilation is a two-phase procedure that we have not been able to use, so all code is pulled into a main file with "#INCLUDEs". The original scheme is to compile the generated C-file and make an executable of it, and have it print a file with information about the size of data. The whole idea is to have the target system compiler report about the local meaning of "sizeof". This resolves a non-portable side of C. (This facet of the language is not portable in occam either, since INT is deliberately target dependent!) Since we did not have any "println" available in our target, we have proposed an alternative scheme where we define primitive data type sizes that match our compiler's. This will work provided the C-compiler does a strictly "linear" struct layout. At time of writing, we have not received any version of SPoC that implements this.
We had some problems with the Cygnus-based version of SPoC. On invalid occam it sometimes crashed with a malloc-error. The first version we used did not have this problem, so we sometimes had to revert to it when the Cygnus-based version failed, to find the line of the offense. We also had this same error output on valid occam during the SPoC "tree transformations" phase when references were "miles" from usage. We expect these errors to be fixed along with occam 2.1 support on the next version, due by January 1998.
We have not yet connected an interrupt routine to the input of a channel to make an "event channel". So, at the moment we have to poll the DMA machine to see when it has finished. This is only a slight problem, as it takes 100-200 ms to complete a DMA sequence, and a poll takes 1/1000 of that time. At the moment processing takes longer than the DMA-sequence, so the DMA-machine's wasted idle time caused by polling is masked. We expect to remove polling later on, so that higher throughput may be achieved.
We have implemented the timer interrupt routine to give all TIMERs (any process priority) a tick resolution of 1 ms. On the DSP this gives an overhead of some 0.5%. Modula-2 has something called a "system module" where parameters of this type may be specified. Could occam have something similar? This, and other occam portability issues are discussed in [Cook 93].
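With a 1 ms tick, an overhead of 0.5% corresponds to about 5 µs spent in the interrupt routine per tick. A minimal sketch of such a tick counter follows, together with a comparison in the style of occam's AFTER that tolerates counter wrap-around. The names `tick_isr`, `timer_now` and `timer_after` are hypothetical, and this is not the SPoC run-time code.

```c
#include <stdint.h>

/* Hypothetical free-running counter, advanced once per millisecond. */
static volatile uint32_t g_ticks_ms = 0;

/* Assumed to be invoked by the hardware timer interrupt every 1 ms. */
void tick_isr(void)
{
    g_ticks_ms++;
}

/* Read the current time, comparable to "clock ? now" in occam. */
uint32_t timer_now(void)
{
    return g_ticks_ms;
}

/* occam-style AFTER: non-zero if a is later than b, using modular
   arithmetic so that the test still works when the counter wraps.
   The unsigned-to-signed conversion relies on two's complement
   behaviour, which is assumed here. */
int timer_after(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) > 0;
}
```

A process waiting for a timeout would compare `timer_now()` against its deadline with `timer_after` each time it is scheduled. Since all TIMERs share the one counter, every process priority sees the same 1 ms resolution.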
Debugging the generated C-programs has been quite straightforward. The Texas Instruments debugger is rather archaic: we have to type the names of all variables, and we cannot inspect the variables of calling procedures. (It does support JTAG connection with target, so we can forgive Texas for a while. And besides, since they have now purchased GO DSP Corporation we expect things to get better!) Since SPoC decorates occam names either in front or at the end (or both), these names do sometimes become quite long. (The struct + union decorations make it even worse.) Even if SPoC variables (parameters and most other in-scope variables) reside in the large struct-tree, their exact location is parameterized. At any one time the call tree contains only one process. This makes it easy to understand what is going on.
Following the logic of the generated state machines is a rather thrilling experience. The layout of the code reads well: for example, left-alignments between states do not look ragged, because if-then-else statements are broken down into single-line if-tests and intermediate new states. Also, things like the macros needed for ALT are easy to read, without having to understand the algorithms and implementation.
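The general idea of a process compiled into a resumable state machine can be sketched as follows. This is a simplified illustration under our own assumptions, not the code or macros that SPoC generates: the `Frame` type, `counter_process` and `run_all` are hypothetical names. Each process keeps a saved "instruction pointer" in its frame, and each call runs it from that state to its next descheduling point.

```c
/* Hypothetical process frame. The saved "instruction pointer" selects
   the state to resume in, and locals live in the frame, not on the stack. */
typedef struct {
    int ip;     /* state to resume in on the next activation */
    int count;  /* a process-local variable                  */
    int done;   /* non-zero once the process has terminated  */
} Frame;

/* One call runs the process up to its next descheduling point. */
void counter_process(Frame *fp)
{
    switch (fp->ip) {
    case 0:                 /* initial state */
        fp->count = 0;
        fp->ip = 1;
        return;             /* deschedule, resume in state 1 */
    case 1:                 /* loop body, one iteration per activation */
        fp->count++;
        fp->ip = (fp->count < 3) ? 1 : 2;
        return;
    case 2:                 /* terminate */
        fp->done = 1;
        return;
    default:                /* invalid state, treated as termination */
        fp->done = 1;
        return;
    }
}

/* Minimal non-preemptive round-robin scheduler over n frames.
   Returns the total number of process activations. */
int run_all(Frame *frames, int n)
{
    int activations = 0;
    int running = n;
    while (running > 0) {
        running = 0;
        for (int i = 0; i < n; i++) {
            if (!frames[i].done) {
                counter_process(&frames[i]);
                activations++;
                if (!frames[i].done) {
                    running++;
                }
            }
        }
    }
    return activations;
}
```

Because every process returns to the scheduler before another one is activated, the C call stack never holds more than one process at a time, which is what makes the generated programs comparatively easy to follow in a debugger.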
It would probably have been better to use SPoC prototypes, rather than inline C. Resolving pointers in calls was not always easy, as SPoC only has one way to expand an occam name when we prefix the occam name with '$'.
"Lint" generates a lot of warnings, but no errors. SPoC inserts type decorations when needed; this makes Lint complain less, but readability often suffers.
Even if we have missed some functionality of SPoC, there is a lot of functionality we have not used; like running C generated from occam in a Unix environment and use of sockets, calling other languages than C, and the original separate compilation scheme. And we have not used occam source level GNU debugging.
Writing C-code within the occam / SPoC environment imposed some new constraints, while easing others, compared with writing in a pure C-environment. Because SPoC is not preemptive, the main C-program performing signal processing (UllCalcServer) had to be split up into several pieces, called "atomic units". An atomic unit is a piece of code that runs without being descheduled until it returns control to the calling function. Since we did not want a fragmented system with a large number of C-functions called from occam, the C-program was designed as a state machine defining its behavior. To minimize coupling between the occam- and C-environments, all state information is contained entirely within the C-environment, leaving occam with no knowledge of the C-program's internal state. Thus occam calls the C-program repeatedly with a command telling the program to step to its next internal state, perform the corresponding tasks and then return to occam. The state machine is implemented with a look-up table defining all valid state transitions. The next valid state is thus read from the state transition table using the present state as entry value, and a switch-case construct branches to the C-code associated with the current state.
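A minimal sketch of such a table-driven state machine is shown here. The state names, the `Machine` type and the work done in each state are hypothetical placeholders and do not reflect the actual UllCalcServer. Each call to `machine_step` is one atomic unit: it looks up the next state, runs the code for it and returns.

```c
/* Hypothetical states of a signal-processing cycle. */
typedef enum { ST_IDLE, ST_ACQUIRE, ST_FILTER, ST_REPORT, ST_COUNT } State;

/* State transition table: the present state is the entry value,
   the stored value is the next valid state. */
static const State next_state[ST_COUNT] = {
    [ST_IDLE]    = ST_ACQUIRE,
    [ST_ACQUIRE] = ST_FILTER,
    [ST_FILTER]  = ST_REPORT,
    [ST_REPORT]  = ST_IDLE
};

/* All state information stays on the C side. */
typedef struct {
    State state;
    int samples;
    int result;
} Machine;

/* One atomic unit: step to the next state, do its work, return. */
State machine_step(Machine *m)
{
    m->state = next_state[m->state];
    switch (m->state) {
    case ST_ACQUIRE:
        m->samples = 4;               /* placeholder for data acquisition */
        break;
    case ST_FILTER:
        m->result = m->samples * 10;  /* placeholder for the computation */
        break;
    case ST_REPORT:
        break;                        /* result is ready to be fetched */
    case ST_IDLE:
        m->samples = 0;               /* ready for a new cycle */
        break;
    default:
        break;
    }
    return m->state;
}
```

Adding a step to the cycle then only means adding an enum value, one table entry and one case, while the caller keeps issuing the same "step" command.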
A switch-case construct is also used on the top level of the C-program to decode the different protocols allowed on the communication channel ("envelope array") between occam and C. Because the envelope array can only hold one command / reply per call, a command sequence is implemented by means of introducing new states to the C-program. This is quite simple though, as it only requires a modification in the state transition table.
The C-program has data that must survive after returning to occam, since completing the computing cycle requires several calls to the program. The internal data that must survive are defined in a workspace allocated on the system heap using dynamic memory allocation (malloc).
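The combination of protocol decoding and a heap-allocated workspace can be sketched as follows. The command codes, the two-element envelope layout and the names `c_task` and `Workspace` are assumptions made for illustration only; the real envelope format is not described here. The workspace pointer is kept in a static variable so that the data survives between calls.

```c
#include <stdlib.h>

/* Hypothetical command and reply codes carried in envelope[0]. */
enum { CMD_INIT = 1, CMD_STEP = 2, CMD_QUERY = 3 };
enum { REPLY_OK = 0, REPLY_ERROR = -1 };

/* Data that must survive between calls from occam. */
typedef struct {
    int step_count;
} Workspace;

/* Allocated once on the heap and kept across calls. */
static Workspace *ws = NULL;

/* One call handles exactly one command and writes exactly one reply.
   envelope[0]: command in, reply code out.
   envelope[1]: payload (only used by CMD_QUERY in this sketch). */
void c_task(int envelope[2])
{
    int command = envelope[0];

    if (command != CMD_INIT && ws == NULL) {
        envelope[0] = REPLY_ERROR;      /* not initialised yet */
        return;
    }

    switch (command) {
    case CMD_INIT:
        if (ws == NULL) {
            ws = malloc(sizeof *ws);
        }
        if (ws == NULL) {
            envelope[0] = REPLY_ERROR;  /* out of memory */
            return;
        }
        ws->step_count = 0;
        envelope[0] = REPLY_OK;
        break;
    case CMD_STEP:
        ws->step_count++;
        envelope[0] = REPLY_OK;
        break;
    case CMD_QUERY:
        envelope[1] = ws->step_count;
        envelope[0] = REPLY_OK;
        break;
    default:
        envelope[0] = REPLY_ERROR;      /* unknown protocol */
        break;
    }
}
```

The caller only ever sees the envelope, so it needs no knowledge of the workspace contents or of how many internal steps the C side has taken.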
Actually occam calls three unique C-programs (also called "C-tasks") in our system. All programs follow the same pattern with envelope array, protocol decoding, use of internal state machine and storing permanent workspace data on the system heap. Data is exchanged between the different C-programs via the "envelope" using a general scheme leaving occam with no knowledge of the data contents.
Writing C-programs in the occam/SPoC environment has been a learning experience, particularly with regard to designing a system with a clean interface and little coupling.
Now that it is all done, we could ask whether this whole project would have been carried out, or been feasible at all, had we not known occam beforehand. One of these days a manager may come and instruct us to use UML on top and Java at the bottom. Few would require the solution we chose.
The main problems we had were with mixing occam and C. We have documented these problems in this article. To answer the question above: we would not have done it if we had not known occam before! But then, we would not encounter embedded Java either without having a knowledge-platform to stand on.
We can also say something about (occam) productivity. The 13000 lines of written occam code are converted by SPoC to 26000 lines of run-time C-code. Altogether some 19000 lines of occam have been written (the remainder is test code and code for several configurations). This has taken 2200 man-hours, all phases included, from conception through analysis, design, coding, testing, system testing and some documentation along the way. Meetings and communication with other people are also included.
The two best reasons for using SPoC are 1.) occam is a superb tool for comfortably writing concurrent programs and 2.) we now can run it on our DSP and almost any processor that has a C-compiler.
The two best reasons for not using SPoC are 1.) occam's state is undecidable (dead?, dying?, alive?, ahead of its time?) and 2.) automatically generated code does, after all, generate an extra level of complexity.
I would absolutely consider doing it again, and some of us (also people outside this project) would love to try it out on a Microchip PIC-processor (which we use in vast numbers) when a good C-compiler becomes available. Programming an embedded 386ex processor is also a possibility; we now use ANSI C plus VxWorks on those machines.
After doing this job we may redesign a transputer-based product with 35000 lines of occam code to run on a Texas DSP. And we would know how to do it.
The "Occam White Paper" section holds all the way to the end, and for any new starts.
Lars Thore W. Aarrestad, "Translation from occam to ANSI-C, realized for a DSP". 1996. Inst. for technical cybernetics, NTH (now NTNU), Trondheim, Norway. Diploma thesis, in Norwegian.
M. Ben-Ari, "Principles of Concurrent and Distributed Programming", Prentice Hall, 1990. ISBN 0-13-711821-X, pp.107-108
Per Brinch Hansen, "Operating System Principles", Prentice Hall, 1973, p.195
Per Brinch Hansen, "The Programming Language Concurrent Pascal", IEEE Transactions on Software Engineering, Vol. SE-1, no.2, June 1975
Barry M. Cook, "Some Issues Concerning the Portability of occam Programs", in "Transputer Applications and systems '93", Ed Grebe et al. pp. 1170 - 1180 IOS Press, Amsterdam, ISBN 90-5199-140-1
Mark Debbage, Mark Hill, Sean Wykes, Denis Nicole, "Southampton's Portable Occam Compiler (SPOC)", In: Miles, Chalmers (ed.), "Progress in Transputer and occam Research", IOS Press, Amsterdam, 1994 (WoTUG-17 proceedings), pp.40-55
Gerald H. Hilderink, "Communicating Java Threads (CJT), Reference Manual, version 0.9.6", June 1996, p.1. University of Twente, department of Electrical Engineering
C.A.R. Hoare, "Communicating Sequential Processes". Prentice Hall, 1985
"occam2 Reference Manual". INMOS Ltd, Prentice Hall, 1988 (C.A.R. Hoare is series editor) ISBN 0-13-629312-3
"Transputer instruction set. A compiler writer's guide". INMOS Ltd, Prentice Hall, 1988. ISBN 0-13-929100-8, page 139
"LonWorks, Technology Device Data Manual". Motorola. (LonWorks is a registered trademark of Echelon Corporation)
Monroe et al., "Architectural Styles, Design Patterns, and Objects". IEEE Software, January 1997, pp.43-52 (quote: p.49)
Michael Mrva, "Reuse Factors in Embedded Systems Design", (Siemens AG), IEEE Computer, August 1997, pp. 93-95
Veikko Seppänen et al., "Strategic needs and future trends of embedded software", Tekes technology review 48/96 - (p.56).
SGS-THOMSON 1993, "INMOS D7305 occam 2 Toolset"
SGS-THOMSON 1996, "D7405 Toolset, occam 2.1 Reference Manual", Also at http://www.hensa.ac.uk/parallel/occam/documentation/index.html
Øyvind Teig, "Ping-Pong scheme uses semaphores to pass dual-port memory privileges", EDN, 6th June 1996
Øyvind Teig et al., "The Album Protocol". Autronica internal document, 1996-97.
Texas Instruments, "9900 The Microprocessor Pascal System, User's Guide", 1981. (PROGRAM, PROCESS, PROCEDURE, FUNCTION were all supported. However, only semaphores, not monitor was supported)
"Unified Modeling Language Summary, version 1.1, 1.Sept.1997", chapter 4.1.2., www.rational.com
P.Wegner, "Dimensions of Object-Based Language Design", In Proc. of the OOPSLA '87 Conf. on Object-Oriented Programming Systems, Languages and Applications, 1987
P.H.Welch et al., "On the Serialisation of Parallel Programs". In: J.Edwards (ed.), "Occam and the Transputer - Current Developments", IOS Press, Amsterdam, 1991 (WoTUG-14 proceedings), p.164.
P.H.Welch, the University of Kent at Canterbury. The "comstime" program, used to benchmark occam programs, referenced first in [Debbage 94]. The same program design has also been used to benchmark Java.
This example has nothing to do with our application, but it does give a flavor of occam and SPoC.
The code example has 1 task that initiates sending data into 99 instantiations of "Task", and 1 task that terminates the pipeline. The total ANSI-C program code and variable cost of these 1+99+1 tasks is 424 words (1696 bytes). It takes 20 ms to start all tasks and run the 100 communications. Running the 100 communications alone takes 14 ms, that is 140 µs per communication, including task descheduling and rescheduling to loop back to wait for new inputs in "Task". Tested with a TMS320C32 DSP at 40 MHz and 1 wait cycle.
PROC Task (CHAN OF INT in, out)   -- This task is written as a PROC, most usual.
  INT value:                      -- It just hangs waiting for data, increments it
  WHILE TRUE                      -- and sends it on. Observe that it does not know
    SEQ                           -- who it sends to (unnamed receiver), it only
      in ? value                  -- knows the name of the channel it is sending
                                  -- on.
      out ! (value + 1)           -- An instantiation of this task never
:                                 -- terminates.
--
VAL NoOfChans IS 100:             -- A real constant, as opposed to "const"
                                  -- or "#define"
[NoOfChans]CHAN OF INT chan:      -- 100 channels of PROTOCOL INT
PAR                               -- Specifies 3 tasks, of which one has
                                  -- 99 subtasks
  INT value:                      -- Task 1: This task sends a value into task 2,
  SEQ                             -- it then terminates
    value := 0                    -- (Starts here)
    chan[0] ! value               --
  PAR i = 0 FOR (NoOfChans-1)     -- Task 2-100: 99 instantiations of Task, they
    Task (chan[i], chan[i+1])     -- are connected in a daisy-chain, sending into
                                  -- the next task.
  INT value:                      -- Task 101: This receives a value from task 100,
  SEQ                             -- it then terminates
    chan[NoOfChans-1] ? value     --
    value := value + 1            -- (Stops here)
The ANSI C code that is being generated is shown below (suffix numbering of names starts quite high because the occam-code has been inserted into the standard GL-100 occam program somewhere down in the code):
First the generated data-structures:
typedef struct SF_P_Task_3350 tSF_P_Task_3350;
typedef struct SF_P_3356 tSF_P_3356;
typedef struct SF_P_3357 tSF_P_3357;
typedef struct SF_P_3359 tSF_P_3359;
typedef struct SF_P_3361 tSF_P_3361;
struct SF_P_Task_3350
{
  tHeader _Header;
  tSF_P_GL100_GL100 *Chain;
  CHAN *in_3347;
  CHAN *out_3348;
  INT value_3349;
};
struct SF_P_3356
{
  tHeader _Header;
  tSF_P_3357 *Chain;
  INT i_3354;
  tSF_P_Task_3350 _C296;
};
struct SF_P_3357
{
  tHeader _Header;
  tSF_P_GL100_GL100 *Chain;
  tTask _T28[99];
  tSF_P_3356 _C297[99];
};
struct SF_P_3359
{
  tHeader _Header;
  tSF_P_GL100_GL100 *Chain;
  INT value_3358;
};
struct SF_P_3361
{
  tHeader _Header;
  tSF_P_GL100_GL100 *Chain;
  INT value_3360;
};
Then the generated code:
static void P_Task_3350 (tSF_P_Task_3350 *FP)
{
  while(true)
  {
    switch(FP->_Header.IP)
    {
      CASE(0):
        GOTO(1);
      CASE(2):
        INPUT4(FP->in_3347,&FP->value_3349,3);
      CASE(3):
        FP->_Header.Temp.VINT = (FP->value_3349 + 1);
        OUTPUT4(FP->out_3348,&FP->_Header.Temp.VINT,4);
      CASE(4):
      CASE(1):
        if (true)
        {
          GOTO(2);
        }
        RETURN();
      default: SETERR(MSG_IP);
    }
  }
}
static void P_3356 (tSF_P_3356 *FP)
{
  while(true)
  {
    switch(FP->_Header.IP)
    {
      CASE(0):
        FP->_C296.in_3347=
          *((CHAN**)(((BYTE*)FP->Chain->Chain->_U83._S101.chan_3353)+
          (FP->i_3354*sizeof(CHAN *))));
        FP->_C296.out_3348=
          *((CHAN**)(((BYTE*)FP->Chain->Chain->_U83._S101.chan_3353)+
          ((FP->i_3354 + 1)*sizeof(CHAN *))));
        FP->_C296.Chain = FP->Chain->Chain;
        CALL(P_Task_3350,&FP->_C296,1,"P_Task_3350");
      CASE(1):
        ENDP();
      default: SETERR(MSG_IP);
    }
  }
}
static void P_3357 (tSF_P_3357 *FP)
{
  while(true)
  {
    switch(FP->_Header.IP)
    {
      CASE(0):
        {
          INT i_3355;
          for (i_3355 = 0; i_3355 != 0 + 99; i_3355++)
          {
            FP->_C297[i_3355-0].i_3354=i_3355;
            STARTP(&FP->_T28[i_3355-0],P_3356,
                   &FP->_C297[i_3355-0],"P_3356",1);
          }
        }
        WAITP(1);
      CASE(1):
        ENDP();
      default: SETERR(MSG_IP);
    }
  }
}
static void P_3359 (tSF_P_3359 *FP)
{
  while(true)
  {
    switch(FP->_Header.IP)
    {
      CASE(0):
        INPUT4(*((CHAN**)(((BYTE*)FP->Chain->_U83._S101.chan_3353)+
          (99*sizeof(CHAN *)))),&FP->value_3358,1);
      CASE(1):
        FP->value_3358 = (FP->value_3358 + 1);
        ENDP();
      default: SETERR(MSG_IP);
    }
  }
}
static void P_3361 (tSF_P_3361 *FP)
{
  while(true)
  {
    switch(FP->_Header.IP)
    {
      CASE(0):
        FP->value_3360 = 0;
        OUTPUT4(*((CHAN**)(((BYTE*)FP->Chain->_U83._S101.chan_3353)+
          (0*sizeof(CHAN *)))),&FP->value_3360,1);
      CASE(1):
        ENDP();
      default: SETERR(MSG_IP);
    }
  }
}
{
  int TMP0;
  for (TMP0=0;TMP0<100;TMP0++)
  {
    FP->_U83._S101.chan_3353[TMP0] = &FP->_U83._S101.chan_3353_CHAN[TMP0];
    INITCH(FP->_U83._S101.chan_3353[TMP0]);
  }
}
STARTP(&FP->_U83._S101._T29,P_3357,&FP->_U83._S101._C298,"P_3357",1);
STARTP(&FP->_U83._S101._T30,P_3359,&FP->_U83._S101._C299,"P_3359",1);
STARTP(&FP->_U83._S101._T31,P_3361,&FP->_U83._S101._C300,"P_3361",1);
WAITP(1);