Communicating Process Architectures

Communicating Process Architectures - 2001

Sunday 16th. September (evening) through Wednesday 19th. September (lunchtime)

Academic Programme

[Note: the categories listed below are not definitive - many papers fit into several]

Invited Talks:

Computing without Computers
Stephen Chappel (Celoxica, UK)

Abstract: To follow...

Beyond the Transputer
David May (University of Bristol, UK)

Abstract: To follow...

Theory:

Successes and Failures: Extending CSP
Adrian Lawrence (University of Oxford, UK)

Abstract: Standard CSP, timed or untimed, does not include a general treatment of priority, although the PRI ALT constructor is an essential part of occam and hardware compilation languages based upon occam. This is a revised version of the original paper which introduced CSPP, an extension of CSP incorporating priority. CSPP is defined by a novel denotational semantics, Acceptances, based on Successes rather than the usual Failures. The idea is to characterise a process by what it successfully accepts, rather than by what it refuses to do. In the light of experience, it might better have been called "Responses". The original Acceptances was exploratory, and tried to avoid constraining the sorts of systems, particularly circuits, that could be described. Experience has shown that it can be substantially simplified at very little cost. A new notation makes it much easier to follow, especially for the non specialist. This revision of the original introduction presents the simplified CSPP while retaining most of the motivational material. It is intended to have something of a tutorial flavour: three other papers, are more condensed, and deal with more technical matters. But the core semantics is common to all four. CSPP provides a rigorous comprehensible and simple foundation for compositional hardware-software codesign. HCSP is a further extension which includes extra facilities needed to describe certain circuits. And a further radical extension lifts the usual restrictions of timed CSP, and describes continuous analogue phenomena. CSPP was first presented informally at the Twente WoTUG--20 technical meeting.

CSPP and Event Priority
Adrian Lawrence (University of Oxford, UK)

Abstract: CSPP is an extension of CSP which includes priority as used in standard occam. The occam community has often discussed whether those notions are really adequate, a particular concern being the difficulties associated with priority inversion. One idea is to give priority to sets of events considered independently of the processes that perform them. We call this "event priority". Event priority is static in this presentation. But it is possible to handle dynamic priority using a global synchronisation when the "event priority" changes. Obviously there is no problem for a software system with a central scheduler, but the theory here is addressing a far wider class of systems, in particular massively parallel, widely distributed, implemented in either hardware or software or both. It may be that some higher level of abstraction should replace priority: priority is a mechanism for achieving certain properties, often relating to time and limited resources. Here we content ourselves with finding a formal description of a language including event-priority.

Infinite Traces, Acceptances, and CSPP
Adrian Lawrence (University of Oxford, UK)

Abstract: There is a long standing problem when infinite traces are included in denotational semantic models of CSP. Full models fail to be Complete Partial Orders under refinement. This paper introduces a novel, but entirely natural, way of representing infinite behaviour in which refinement is a Complete Partial Order when the alphabet of events is finite. Acceptance semantics also solves the problem of infinite behaviour with an infinite alphabet. That requires a different construction based on a metric space and will be described elsewhere.

Efficient Execution of Process Networks
T. Basten (Eindhoven University of Technology, Netherlands )
J. Hoogerbrugge (Philips Research Laboratories, Netherlands)

Abstract: Kahn process networks (KPNs) [1] are a popular modeling technique for media- and signal-processing applications. A KPN makes parallelism and commu-nication in an application explicit; thus, KPNs are a modeling paradigm that is very suitable for multi-processor architectures. We present techniques for the efficient ex-ecution of KPNs, taking into account both execution time and memory usage.

Practice:

Copying, Borrowing, and Moving Semantics
David May and Henk Muller (University of Bristol, UK)

Abstract: In this paper we discuss primitives for mobilising code and communications. We distinguish three types of semantics for mobility: copying (where an identical copy is created remotely), moving (where the original is destroyed), and borrowing (where the original is moved to the target and back to where it came from at defined moments). We discuss these semantics for mobile code and mobile channels. We have implemented Icarus, a language that uses borrowing semantics for mobile code (the {on-statement) and moving semantics for mobile channels (first class channels).

Mobile data, dynamic allocation and zero aliasing: An occam experiment
Fred Barnes and Peter Welch (University of Kent, UK)

Abstract: Traditional imperative languages (such as C) and modern object-oriented languages are plagued by uncontrolled resource aliasing problems. Add in concurrency and the problems compound exponentially. Improperly synchronised access to shared (i.e. aliased) resources leads to problems of race-hazard, deadlock, livelock and starvation. This paper describes the binding into occam (a concurrent processing language based on CSP) of a secure, dynamic and efficient way of sharing data between parallel processes with minimal synchronisation overheads. The key new facilities provided are: a movement semantics for assignment and communication, strict zero-aliasing, apparently dynamic memory allocation and automatic zero-or-very-small-unit-time garbage collection. The implementation of this mechanism is also presented, along with some initial performance figures (e.g. 80ns for mobile communication on an 800 MHz Pentium 3). With occam becoming available on a variety of microprocessors for GUI building, internet services and small-memory-footprint embedded products, these capabilities are timely. Lessons are drawn for concurrency back in OO languages - and especially for the JCSP (CSP for Java) package library.

From safe concurrent processes to process-classes
Oyvind Teig (Kongsberg Maritime Ship Systems, Ship Control (KMSS-SC), Norway)

Abstract: This article expands a concurrent language to support implementation inheritance by making block structures of the super process-class pluggable, and to interface inheritance by making the language's protocol inheritable. The parallel "object-based" concurrent language occam 2 has been used as a catalyst for the concept, causing the language in fact to become (almost?) "object-oriented" (OO). The result is white-box reuse between a "process-class" and its sub process-class, and black-box reuse seen from the client side. Since occam is aliasing free and considered a "safe" concurrent language, the expansion we discuss here keeps those properties - somewhat unusual for an OO system. This new language should be well suited in safety critical systems, since it has inherited the static (compile-time) and analysable properties from occam proper. Basically, two new keywords are defined: PLUSSING and ROLLING. The language feature suggestion is on sketch level only and therefore not complete, no BNF description exists and no compiler has been written.

Event-Based design of concurrent programs with Java implementation
Hans Rischel and Hongyan Sun (Technical University, Denmark)

Abstract: A systematic design approach to safety-critical systems is introduced by means of the Production Cell case study. The design is documented using CSP-style processes, which allow verifications using formal techniques, as well as programming in Java using the JCSP library.

The 40Gbit/s Network Processing Overview
R. McConnel and P. Winser (ClearSpeed Technology Ltd.)

Abstract:As the Internet evolves, the rapidly increasing demand for bandwidth is matched by a greater need for more intelligence with which to manage and meter the flow of that data to sustain economic growth. Conventional processing architectures and hardwired point solutions are not suited to these conflicting demands; there is an emerging need for a new approach for this data flow processing problem. This paper presents ClearSpeed's integrated Network Processor design platform that embodies many different levels of parallel processing. Designed to balance the bandwidth needs with programmability we introduce the MTAP architecture. An area and power efficient, fine-grained, scalable, multi-threaded parallel processor, designed with a 'bandwidth-centric' architecture and programmed in C. Based on the ClearConnect™ bus, an SoC communication architecture with VCI compliant interfaces, a high-bandwidth system architecture including a number of hardware accelerator units is also described. An example 40Gbit/s programmable and scalable classifier/forwarder is presented, embodying the concepts of the platform. To complete the picture, a comprehensive suite of software and hardware development tools is described.

Hardware Infrastructure:

Protocol Verification in Millipede
Jan Pedersen and Alan Wagner (University of British Columbia, Canada),

Abstract:

Using two-, four-, and eight-way multiprocessors as cluster components
B. Vinter (University of Southern Denmark, Denmark),
O. Anshus, T. Larsen and J. Bjorndalen (University of Tromso, Denmark)

Abstract: This work considers the pros and cons of different size SMPs as nodes in clusters. We investigate the Intel SMP architecture and consider the potential of and some problems with larger node-sizes in clusters of multiprocessors. Six applications that represent different classes of parallel applications are developed in versions that support both shared and distributed memory. Performance measurements are done on three different clusters of multiprocessors, with the purpose of identifying how the number of processors in each SMP node impacts the cluster performance. Our results show that clusters using higher order SMPs do not give a clear performance benefit compared to clusters using two-way SMPs. Off the bench mark-suite of six applications, the performance of two turn out to be independent of node-size, two show an advantage of larger node-sizes, as much as 34% improvement of eight-way nodes over a dual-system, while the remaining two show an advantage of dual-processor nodes as big as 11% over an eight-way cluster.

Analysis of communication protocols for use in multiprocessor routing networks
S. Triger, B. C. O'Neill and S. Clark (Nottingham Trent University, UK)

Abstract: Inter-processor communications play a vital role in the performance of distributed parallel networks. This document analyses various protocol options for high speed, serial, inter-processor communications for use in embedded multiprocessor systems. The protocol is used to pass information in a fault tolerant routing network. The work has resulted in a custom processor interface, which forms the gateway between the processing node and the distributed communications network, building on previous work. The role of the interface is to provide maximum hardware support for communications between the network and the processor. This alleviates the processor from having to oversee communications transactions and allows it to concentrate on program execution. The research focuses on protocol alterations aimed at improving the fault tolerant aspects of the design when dealing with various forms of network failure. By improving the fault tolerance of the network, communications bottlenecks can be avoided and data throughput can be maximised. Previously, communications across the network had used the OS Links protocol and, experimentally, DS Links. These were analysed and adapted to provide the current communications protocol, which is compared with these protocols. This protocol is more efficient, and helps to provide many features to ensure data integrity.

A Reconfigurable host interconnection scheme for occam based field programmable gate arrays
Roger Peel (University of Surrey, UK)

Abstract: This paper reports on the development of an interconnection scheme for field-programmable gate arrays (FPGAs). These FPGAs may be programmed in the Occam parallel programming language. Now, not only may the inter-process communication channels provided by Occam be used on-chip, but they may also be extended to a host processor using the ubiquitous Universal Serial Bus (USB). Bidirectional channels of BYTEs are carried along this bus to a host processor (running Linux) where they are presented to application code using a device driver that provides similar capabilities to the standard B004 card link driver. A unidirectional end-to-end throughput between Linux processes and FPGA processes, across USB, has been measured as high as 1025 kbytes/sec, although this rate is only achieved in favourable circumstances. Similarly, 410 kbytes/sec may be transferred in both directions simultaneously. Unidirectional transmission rates of more than 600 kbytes/sec, and bidirectional rates of 175-300 kbytes/sec in each direction may be achieved in a wide range of circumstances. The paper presents a range of performance figures, explaining which are limited by the underlying characteristics of the USB bus and which are caused by the current implementation. By implementing a transputer OS-Link in the FPGA, it is possible for a USB- enabled computer to communicate with a network of transputers, providing a convenient - and potentially faster - alternative to previous methods.

Guaranteed message delivery time on real-time distributed systems
Ting-Yu Yang and G. S. Stiles (Utah State University, USA)

Abstract: Real-time systems require guaranteed timely delivery of messages; failure to deliver a message on time may result in failure of the system and possible damage to life and property. We describe here an extension and implementation of an algorithm developed by Kandlur et al. for guaranteed message delivery. We extend this by adding two-phase randomized routing in the channel establishment procedure; this scheme requires each route to go first to a randomly chosen intermediate node, and only then to its actual destination. This scheme balances the load demonstrably better than direct routing, with respect to the likelihood of acceptance of each individual channel. Given certain constraints on the generation and size of messages, it is possible to schedule those messages such that messages arrive within their desired deadlines. Experiments on an 8-node network demonstrate the feasibility of the approach, and provide verification of Kandlur's algorithm.

The uniform heterogeneous multi-threaded processor architecture
D. Towner and D. May (University of Bristol, UK)

Abstract: Multi-threaded processor architectures are capable of concurrently execut-ing multiple threads using a shared execution resource. Two of their advantages are their ability to hide latency within a thread, and their high execution efficiency. Un-fortunately, single thread performance is often poor. In this paper we present a simple model of a multi-threaded processor, and show how an occam-like language may be compiled into fine grained threads suitable for executing on this processor. These fine grained threads allow all but the most serial programs to be compiled into multiple threads. Thus, poor single thread performance is avoided by ensuring that sufficient threads are always available, even at the instruction level. We call this technique ‘uni-form heterogeneous multi-threading’ (UHM). A compiler implementing UHM has been built, along with a cycle accurate simulator of a UHM processor. We demon-strate that the processor is capable of good performance, whilst being simple to design and build.

Software Infrastructure:

transx86 - an optimising ETC to IA32 translator
Fred Barnes (University of Kent, UK)

Abstract: This paper describes tranx86, a program which converts Extended Transputer Code (ETC) from a modified Inmos occam compiler, into IA32 code for execution on the Intel i386 family of processors within the KRoC/Linux system. Several optimisations are employed in an attempt to maximise performance on this family of processors, including optimisations in the CCSP run-time kernel. These include a graph-colouring type register allocation scheme and various inlining of code. While tranx86 is mostly architecture dependent, effort has been made to allow the use of arbitrary schedulers, although currently CCSP is the only fully supported one. Various benchmark programs are used to compare the performance of this translator with the old system, giving significant time wins in some cases. For the commstime benchmark program on an 800 MHz Pentium-3, the old KRoC/Linux system gave 233 ns per communication (2 context switches); the new one, with optimisations and inlining, gives 67 ns per communication -- more than a 3-fold reduction in overheads.

Towards a Viable Alternative to OO - extending the occam/CSP programming model
Tom Locke (University of Kent, UK)

Abstract: Object orientation has become the de facto standard for large scale, general purpose software engineering. In this paper various aspects of object orientation that are against good software engineering practice are highlighted. It is then argued that a communicating process model provides a better platform for component based programming without the discussed pitfalls of OO. At the same time, current CSP based programming technology is shown to be seriously lacking when measured against certain aspects of object oriented languages. This paper is chiefly a discussion of ideas, ideas about extensions to the occam/CSP programming model that could advance the paradigm to the point where it provides a viable alternative to object orientation for general purpose, large scale software engineering. Specifically, three ideas are discussed: mobile processes, polymorphism and routable variant channels.

The agreement problem protocol verification environment
J. Pascoe, R. Loader and V. Sundera(University of Reading, UK)

Abstract: This paper proposes the Agreement Problem Protocol Verification Environment (APPROVE) for the automated formal verification of novel solutions to agreement problems in group communication systems. Agreement problems are characterized by the need for a group of processes to agree on a proposed value and are exemplified by group membership, consensus and fault-tolerance scenarios. Due to their fundamental role, it is important that the correctness of new agreement algorithms be verified formally. In the past, the application of manual proof methods has been met with varying degrees of success, suggesting the need for a less error prone automated approach. An observation concerning previous proofs is that often a significant amount of effort is invested in modeling themes common to all such proofs, albeit using different formalisms. Thus, the APPROVE project aims to address these issues, its envisaged culmination being a usable software framework that exploits model re-use wherever possible.

A programming language for hardware/software co-design
Douglas Watt and David May (University of Bristol, UK)

Abstract: We have developed a programming language that allows programs to be expressed as single specifications in which any number of processes may be tagged for hardware compilation and the rest are compiled into software. We introduce a number of novel transformations that may be arbitrarily applied to an occam process in order to decompose it into two semantically equivalent concurrent processes. Our compiler targets hardware by compiling one of these processes into a field programmable gate array and the other into x86 object code. Furthermore, the compiler integrates a specialised communications protocol between the two programs that consists of a full-duplex channel implementation, multiplexor and buffers that are dependent on the program structure and that guarantee all external communications are free from deadlock. We demonstrate the elegance of our language and the power of our compiler on a small benchmark program.

Applications:

Parallel genetic algorithms to find a near optimal schedule to functionally distribute tasks on multiprocessor architectures
Michelle Moore (Texas A&M University, USA)

Abstract: Parallel genetic schedulers (PGS) are applied to a combinatorial optimisation problem, the scheduling of multiple, independent, non-identical tasks. The tasks are functionally partitioned and must be distributed over a multicomputer or multiprocessor system. As each task completes execution, a result message must be communicated. Communication occurs over a shared bus. This problem is known to be NP-complete [1]. The PGS execute on a shared memory multiprocessor system and on a simulated SIMD torus. Schedules produced by the PGS are compared to each other, to those found by an exponential-time optimal branch and bound algorithm, and to those found by a linear-time opportunistic algorithm. The PGS produce extremely accurate schedules very quickly. When the PGS are executed with increasing numbers of processors, near linear speedups are obtained with no decrease in the quality of the resulting schedules.

The Future:

Towards a successor for Occam
Ian East (Oxford Brookes University, UK)

Abstract: occam [1] offers features and attributes that make it unique among programming languages, particularly in the ease and security with which one may program concurrency. After a brief summary of occam's strengths, possible additional features are discussed, including recursion, source code modularity, exception response, and the automatic avoidance of deadlock. Consideration is then given to the inclusion of passive ("data") objects and the possibility of their movement between processes. Transfer primitives are proposed, alongside assignment and communication. Discussion is presented with regard to the potential for a new programming language, building on occam, while preserving its security and simplicity.

CHANnels to deliver memory
Oyvind Teig (Kongsberg Maritime Ship Systems, Ship Control (KMSS-SC), Norway)

Abstract: Memory objects are assigned to processes over a CHANnel like construct. This way one can wait for an object indefinitely, or with timeout in an ALT construct - coexisting with CHANnel inputs. The run-time SYSTEM will handle requests. Alternatively, a user memory handler process may use the underlying SYSTEM and serve other clients. Occam 2 is used as catalyst language.