Architectures, Languages and Patterns for Parallel and Distributed Applications
Mobile Processes and Mobile Processors
Professor David May, FRS
University of Bristol, UK
Abstract: Concurrent processes are a natural and efficient way to write distributed interactive programs for the internet. Important ideas are mobile processes which carry their environments with them, and mobile channels which stretch and contract as the connected processes move.
Technology advances will give rise to increasingly mobile processors, and these will be optimised for multimedia processing and communication. They will require implementations of mobile processes and channels which allow the inter-processor connections to change continually as the processors move.
Abstract: Many of us have dedicated a significant part of our working lives to the transputer, occam and parallel processing. Unfortunately, the commercial companies at the source of that technology did not share our enthusiasm and have since dropped their support. The advent of the SHARC from Analog Devices, together with a clear strategy to develop future generations with faster cores and communications, now gives us an opportunity to continue research into parallel processing on REAL hardware and using REAL tools. By embracing the SHARC technology, we may be able to build on the man-years of effort that went into the transputer and further the acceptance of parallel processing as a viable solution for a growing number of real-world applications. This presentation is intended as an introduction to some aspects of using SHARCs, particularly the hardware and software products from Alex Computer Systems, and to promote discussion of their use in the academic and commercial worlds.
Abstract: There has been an explosion of interest in serial interconnection techniques over the last five years, as the technology has become mature and the limitations of bus-based interconnects have become more severe.
There are a number of industrial areas in which the need for low-cost, high-speed scalable interconnect has become acute. These include multimedia servers, LAN switching hubs, parallel computers, data acquisition systems, industrial control, ATM switching, integrated networking and home networks.
The IEEE 1355 standard was specifically developed for implementation in cheap commodity VLSI and to support low-cost switching for the construction of scalable interconnects. This talk will review its essential features and discuss how it can be used in a variety of applications.
Peter Thompson was an active member of the IEEE P1355 committee, and is now chair of The 1355 Association.
Abstract: Handel-C is a CASE tool that enables a software engineer to target FPGAs (Field Programmable Gate Arrays) directly, in a similar fashion to classical microprocessor cross-compiler development tools, without recourse to a Hardware Description Language. This allows the software engineer to exploit directly the parallel-computing capability and purpose-specific performance of the FPGA.
The applications of FPGAs in computing systems are very diverse. This talk will provide a snapshot of examples, taken from ongoing projects that are deploying the Handel-C toolset for novel application development, to illustrate this diversity. The talk will also cover planned enhancements to the toolset and associated hardware development systems that are scheduled for release in the near future.
Abstract: The article describes how SPoC (Southampton Portable occam Compiler) has been used, together with hand-written C, in Autronica's new GL-100 radar-based fluid level gauge. The final C code runs on a Texas Instruments TMS320C32 DSP. Some 26,000 lines of C code have been generated automatically from the occam sources. SPoC's non-preemptive scheduling met our needs with a few exceptions. The main problem has been aligning occam 2 and ANSI C data abstractions. A real-time system based on language support for high-level concurrency abstractions (as opposed to a separate real-time kernel and library calls without direct language support) will soon be monitoring the loading and discharging of oil tankers worldwide.
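Non-preemptive scheduling means that a process runs until it reaches a descheduling point (for example, a channel operation) and then voluntarily gives up the processor. The sketch below illustrates the idea in Python, with generators standing in for occam processes. It is not SPoC's run-time system, and the names `run` and `worker` are hypothetical.

```python
from collections import deque

def run(processes):
    """Round-robin, non-preemptive scheduler over generator-based processes.

    Each process runs until it yields (its descheduling point) and is then
    placed at the back of the ready queue; a finished process is dropped.
    """
    ready = deque(processes)
    while ready:
        proc = ready.popleft()
        try:
            next(proc)           # run until the next descheduling point
        except StopIteration:
            continue             # process terminated: do not reschedule
        ready.append(proc)       # reschedule behind the other ready processes

def worker(name, steps, log):
    """Hypothetical process: records each step, then yields control."""
    for i in range(steps):
        log.append((name, i))
        yield                    # voluntary descheduling point

log = []
run([worker("A", 2, log), worker("B", 3, log)])
print(log)  # [('A', 0), ('B', 0), ('A', 1), ('B', 1), ('B', 2)]
```

Because nothing interrupts a process between two descheduling points, a process that never yields starves all the others. This is the usual trade-off such a scheduler makes in exchange for its low overhead.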
Abstract: The entry price of supercomputing has traditionally been very high. As processing elements, operating systems, and switch technology become cheap commodity parts, building a powerful supercomputer at a fraction of the price of a proprietary system becomes realistic.
We have recently purchased, in support of both our local and national collaborations, a dedicated computational cluster of eight DEC Alpha workstations. Each node has a 500 MHz AXP21164A processor and 256 MB of memory, runs Windows NT 4.0, and cost under £6000. The nodes are connected by 100 Mbit/s switched Ethernet.
In this paper we discuss some of the issues raised by our choice of processor, operating system and interconnection network. The results we present indicate that the cluster is fully competitive with systems from major vendors for a wide range of engineering and science applications, and at a cost lower by at least a factor of three. Indeed the only current area of under-performance relative to these vendors' high-end offerings is the inter-node network bandwidth and latency. We give some initial results indicating how the network performance might be improved under Windows NT.
Abstract: We aim to build a parallel computer using IEEE 1355 high-throughput, low-latency DS link networks and high-performance commodity processors running a standard OS. In this context, a DS Network Interface Controller (DSNIC) has been developed. The board's hardware, controlled by FPGA firmware, together with host software, provides a CSP-based message-passing interface between Linux processes. This paper describes how the design and realisation of the DSNIC reflect our aims: low-latency, high-throughput inter-process communication. Furthermore, we present benchmark results and their analysis.
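The defining property of CSP message passing is that communication is a rendezvous: the sender is blocked until the receiver has taken the message. The following is a minimal sketch of that semantics between two threads in one Python process. It says nothing about how the DSNIC implements channels between Linux processes, and the class `Channel` and the function `producer` are hypothetical.

```python
import threading

class Channel:
    """Unbuffered CSP-style channel: write() returns only after a read() takes the value."""

    def __init__(self):
        self._writer_lock = threading.Lock()   # serialises concurrent writers
        self._ready = threading.Semaphore(0)   # released when a value is available
        self._taken = threading.Semaphore(0)   # released when a reader has the value
        self._value = None

    def write(self, value):
        with self._writer_lock:
            self._value = value
            self._ready.release()
            self._taken.acquire()              # block until a reader arrives

    def read(self):
        self._ready.acquire()                  # block until a writer arrives
        value = self._value
        self._taken.release()                  # let the writer continue
        return value

def producer(ch, n):
    for i in range(n):
        ch.write(i)

chan = Channel()
received = []
t = threading.Thread(target=producer, args=(chan, 5))
t.start()
for _ in range(5):
    received.append(chan.read())
t.join()
print(received)  # [0, 1, 2, 3, 4]
```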
Abstract: As part of the ESPRIT MACRAME and ARCHES projects, IEEE 1355 HS Links and their support devices (the RCube and Bullit) have been investigated, and their suitability for use in a large network of processors is being assessed. A description of the HS link is presented, together with initial experience with the RCube, the packet router device, and the Bullit, the HS interface device. The construction of the ARCHES 64-node HS switching network, currently in its initial stages, is also introduced.
Abstract: As part of our approach to developing heterogeneous control systems, we have developed a real-time A/D and D/A board called "the Raptor". The Raptor communicates over high-speed, highly reliable DS-links (IEEE 1355). To obtain highly accurate analogue conversions, the A/D and D/A converters have 12-bit resolution. We measured a maximum sampling frequency of 90.5 kHz on each A/D channel and of approximately 145 kHz on each D/A channel. For communication with the rest of the control environment, two 100 Mbit/s DS-links are available, and a data transfer rate of 5.57 Mbyte/s has been achieved on each DS-link adapter. The Raptor forms part of a heterogeneous multiprocessor closed-loop control environment, which can be used, among other things, to control heavy robot applications. The work on this environment takes place within the scope of the JavaPP (Java Plug & Play) project. The software will be developed together with the CJT library, which provides inherent object-oriented and parallel design patterns in Java, following the CSP paradigm.
Abstract: Recent developments in hardware message routing devices have demonstrated significant performance benefits for parallel processing networks. This work describes a system that uses a single-chip interface between the high-performance StrongARM processor and the existing ICR C416 message routing chip. The ICR C416 is a non-blocking communications routing device; each device allows concurrent communications with up to 16 processors. A distributed parallel processing system can be constructed using the StrongARM and ICR C416 devices, with features similar to those of a transputer system but with the benefits of the higher clock speed and cache memory of the StrongARM microprocessor.
Abstract: A design flow is proposed that offers a programming approach to the systems design of application-specific micro-controllers. This flow is based on Handel-C, an occam-based language with C-like syntax for hardware compilation. Tools have been developed for compilation and concurrent simulation (co-simulation) of the hardware and software parts of a system, and a reconfigurable board has been designed that can be used for rapid prototyping of the application-specific micro-controller. The final design can be compiled into a structural VHDL netlist for a standard cell ASIC process.
Abstract: This paper presents an algorithm for checking that a CSP process satisfies a specification defined by a boolean-valued function on its traces and refusals -- i.e. that "P sat f(tr, ref)". This is contrasted with the refinement approach, as implemented by the FDR tool, of checking that one CSP process is a possible implementation of another -- i.e. that "P ]= SPEC".
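To make the contrast concrete, the following is a minimal, bounded sketch of checking "P sat f(tr, ref)" by enumerating the failures (trace and refusal pairs) of a small process given as a labelled transition system. It is not the algorithm presented in the paper. It assumes a finite transition system with no internal (tau) actions, so that every state is stable, and it explores traces only up to a fixed depth. The functions `failures`, `sat` and `buffer_spec` are hypothetical.

```python
from itertools import combinations

def failures(lts, init, alphabet, depth):
    """Yield (trace, refusal) pairs of a tau-free LTS, for traces of up to `depth` events.

    `lts` maps a state to a list of (event, next_state) transitions. Every state
    is stable, so any subset of the events it does not offer may be refused.
    """
    frontier = [(init, ())]
    for _ in range(depth + 1):
        next_frontier = []
        for state, trace in frontier:
            transitions = lts.get(state, [])
            offered = {event for event, _ in transitions}
            refusable = sorted(alphabet - offered)
            for k in range(len(refusable) + 1):
                for ref in combinations(refusable, k):
                    yield trace, frozenset(ref)
            for event, target in transitions:
                next_frontier.append((target, trace + (event,)))
        frontier = next_frontier

def sat(lts, init, alphabet, f, depth):
    """Bounded check of "P sat f(tr, ref)": f must hold for every failure found."""
    return all(f(tr, ref) for tr, ref in failures(lts, init, alphabet, depth))

# One-place buffer COPY = in -> out -> COPY, and a broken variant that stops after in.
COPY = {0: [("in", 1)], 1: [("out", 0)]}
BROKEN = {0: [("in", 1)], 1: []}
ALPHABET = {"in", "out"}

def buffer_spec(tr, ref):
    """Outputs never exceed inputs, and a non-empty buffer cannot refuse out."""
    n_in, n_out = tr.count("in"), tr.count("out")
    return n_out <= n_in and (n_in == n_out or "out" not in ref)

print(sat(COPY, 0, ALPHABET, buffer_spec, depth=6))    # True
print(sat(BROKEN, 0, ALPHABET, buffer_spec, depth=6))  # False
```

Here the specification is an arbitrary predicate over a trace and a refusal set, whereas a refinement check would instead compare COPY against a second process acting as the specification.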
Abstract: CSP, timed or untimed, has not included a general rigorous treatment of priority, although the PRI ALT constructor is an essential part of occam. This paper treats a generalization of PRI ALT in the form of a prioritized external choice. PRI PAR is also included. The extended language is called CSPP.
A new approach to a denotational semantics is introduced, although only the simplest model is outlined. The work is intended to provide a solid rigorous foundation for hardware-software codesign. A companion paper describes untimed HCSP, a further extension of CSP built upon these foundations; it was first presented informally at the Twente WoTUG-20 technical meeting.
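The essential behaviour of PRI ALT, which the prioritized external choice generalizes, is that when several guarded inputs are ready, the textually first one is chosen. The following is a minimal Python sketch of that selection rule. It uses `queue.Queue` objects as stand-in channels and busy-polls rather than implementing true CSP synchronisation, and `pri_alt` is a hypothetical helper, not part of occam or CSPP.

```python
import queue
import time

def pri_alt(guards, poll_interval=0.001, timeout=None):
    """Prioritised choice over (enabled, channel) guards, highest priority first.

    Returns (index, value) for the first enabled guard whose channel has a
    message ready, scanning strictly in textual order as PRI ALT does.
    Raises TimeoutError if nothing becomes ready within `timeout` seconds;
    with timeout=None it waits indefinitely.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        for index, (enabled, channel) in enumerate(guards):
            if not enabled:
                continue                  # boolean guard is FALSE: branch excluded
            try:
                return index, channel.get_nowait()
            except queue.Empty:
                pass                      # not ready: try the next priority
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError("no guard became ready")
        time.sleep(poll_interval)

urgent, normal = queue.Queue(), queue.Queue()
normal.put("routine")
urgent.put("alarm")
first = pri_alt([(True, urgent), (True, normal)])
second = pri_alt([(True, urgent), (True, normal)])
print(first, second)  # (0, 'alarm') (1, 'routine')
```

Although "routine" was sent first, "alarm" is taken first because its guard has higher priority. A plain ALT would be free to choose either ready branch.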
Abstract: HCSP is a variant of CSP adapted to capture the semantics of hardware compilation, among other purposes. It extends CSP in several ways: it includes priority; events can be combined; new synchronization constructors are introduced; and state is explicitly modelled. Including state permits the treatment of shared memory as well as message passing systems. A possible denotational semantics is included here thus allowing proper treatment of such systems. Although most of these extensions were motivated by the needs of hardware compilation, HCSP can be applied more widely including software and thus can form the foundation of a codesign language. HCSP is an extension of CSPP: familiarity with CSPP is assumed here.
Abstract: This work reviews systems for designing parallel programs, together with considerations for their debugging, tuning and optimisation. The development of such systems is complicated and labour-intensive. Despite this, many interesting and varied projects capable of giving effective support for program design on parallel architectures have been developed over the last few years. In this report, the current state of the art in this area is analysed and the various approaches are compared.
Abstract: occam is a high-level language with constructs for generating explicitly concurrent processes that communicate using channels. In this paper we present our methodology for developing an optimising occam compiler. It consists of a framework for representing the concurrency and semantic properties of an occam program, which enables efficient process optimisations, inter-process optimisations and inter-procedural optimisations to be performed. Furthermore, we tackle the issue of retargeting the optimising occam compiler to different processors of the transputer family.
Abstract: This paper describes work based around oc-X, a PowerPC port of KRoC (the Kent Retargetable occam Compiler). As well as the basic port, a multiprocessor run-time system is described that provides services for user programs, including efficient occam channels between distributed processes and natural access to host file systems and TCP/IP network sockets. Optimization of the target assembly code is discussed, with methods for removing inefficiencies introduced by the KRoC translation process.
Abstract: The transputer instruction set and its symbolic representation are reviewed. An alternative representation named ETC-code, suitable for an intermediate representation in a retargetable occam compiler, is motivated and described. The translation of such a language into a variety of alternative target languages is discussed. Its use as a representation for programs whose target processor type is not yet known is proposed.
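One feature of the transputer instruction set that any alternative representation has to account for is its prefixing scheme. Each instruction byte carries a 4-bit function code and a 4-bit data nibble, and operands that do not fit in four bits are assembled in the operand register by pfix and nfix prefix instructions. The Python sketch below encodes and decodes operands in that way, using the standard function codes pfix = 2, nfix = 6 and ldc = 4. The helpers `encode` and `decode` are hypothetical, and Python's unbounded integers stand in for the transputer's fixed-width operand register.

```python
PFIX, NFIX, LDC = 0x2, 0x6, 0x4   # transputer direct function codes (4-bit)

def encode(function, operand):
    """Encode one direct function with a signed operand as transputer code bytes.

    Each byte is (function << 4) | nibble; larger or negative operands are
    built up with pfix/nfix prefix bytes, most significant nibble first.
    """
    if 0 <= operand < 16:
        return [(function << 4) | operand]
    if operand >= 16:
        prefix = encode(PFIX, operand >> 4)
    else:
        prefix = encode(NFIX, (~operand) >> 4)
    return prefix + [(function << 4) | (operand & 0xF)]

def decode(code):
    """Recover (function, operand) pairs by simulating the operand register."""
    result, oreg = [], 0
    for byte in code:
        function, data = byte >> 4, byte & 0xF
        oreg |= data
        if function == PFIX:
            oreg <<= 4                 # shift up, ready for the next nibble
        elif function == NFIX:
            oreg = (~oreg) << 4        # complement, then shift up
        else:
            result.append((function, oreg))
            oreg = 0                   # operand register cleared after use
    return result

print([hex(b) for b in encode(LDC, 300)])   # ['0x21', '0x22', '0x4c']
print([hex(b) for b in encode(LDC, -1)])    # ['0x60', '0x4f']
```

Because every prefix is itself an ordinary instruction byte, a representation that hides prefixing (writing, say, `ldc 300` instead of three bytes) leaves the choice of encoding to a later, target-specific stage.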
Abstract: The alternation construct in occam provides a form of binary selective communication to the cooperating tasks of a concurrent computation. The use of this construct could lead to increased responsiveness and efficiency of concurrent programs. However, the expressiveness of the construct is restricted in the sense that only two parties can be involved in a communication. We extend the current implementation of the alternation construct to accept an arbitrary number of channel inputs such that multiway (as opposed to binary) selective communication is made possible. A new construct called multiway alternation, MALT, is proposed for occam and is implemented for transputer hardware.
Abstract: In this paper a parallel, pipeline-oriented version of a well-known sequential graph coloring heuristic is introduced. Runtime and speedup results of an implementation in Java on a four-processor machine are presented and discussed.
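The abstract does not name the sequential heuristic. Purely as an illustration of the kind of algorithm involved, a common choice is first-fit greedy coloring. It visits vertices in a fixed order and gives each vertex the smallest color not already used by a colored neighbour. The Python sketch below implements that heuristic. The function `greedy_coloring` is hypothetical, and there is no claim that it is the heuristic the paper parallelises.

```python
def greedy_coloring(adjacency, order=None):
    """First-fit greedy coloring.

    Visits vertices in `order` (default: sorted) and assigns each the smallest
    non-negative color not used by any already-colored neighbour.
    """
    colors = {}
    for v in (order if order is not None else sorted(adjacency)):
        taken = {colors[u] for u in adjacency[v] if u in colors}
        c = 0
        while c in taken:
            c += 1
        colors[v] = c
    return colors

# An odd cycle (5 vertices) cannot be 2-colored, so first-fit needs a third color.
cycle5 = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}
print(greedy_coloring(cycle5))  # {0: 0, 1: 1, 2: 0, 3: 1, 4: 2}
```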
Abstract: The first Brazilian microsatellite will be launched in the middle of 1998. The on-board computer, named Trisputer, will play a major part in the mission, since it will perform essential on-board functions such as guidance, control of the on-board instrumentation, telemetry/telecommand, and control of some on-board scientific experiments. The Trisputer is a fault-tolerant multiprocessor computer with high reliability compared with systems such as TMR and Duplex. This paper describes the conception and implementation of the hardware of this computer, as well as its reliability model.
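For reference, the TMR baseline in such comparisons is usually modelled as three identical modules feeding a perfect majority voter. The system therefore survives while at least two modules survive, giving R_TMR = 3R² − 2R³ for module reliability R. The Python sketch below evaluates that textbook formula under an assumed exponential (constant failure rate) module model. It is not the Trisputer's reliability model, which the abstract does not give, and the function names are hypothetical.

```python
import math

def simplex_reliability(failure_rate, t):
    """Reliability of one module with a constant failure rate (exponential model, assumed)."""
    return math.exp(-failure_rate * t)

def tmr_reliability(r):
    """2-out-of-3 majority voting with a perfect voter: all three modules up, or exactly two."""
    return r**3 + 3 * r**2 * (1 - r)   # algebraically equal to 3r^2 - 2r^3

# Illustrative numbers only: an assumed failure rate of 1e-5 per hour over 1000 hours.
r = simplex_reliability(failure_rate=1e-5, t=1000.0)
print(f"simplex {r:.5f}  TMR {tmr_reliability(r):.5f}")
```

Solving 3R² − 2R³ > R gives (2R − 1)(R − 1) < 0, so TMR improves on a single module only while 0.5 < R < 1.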