WoTUG - The place for concurrent processes

Refer Proceedings details

%T Message routing systems for transputer based parallel computers
%A Domenico Talia
%E Alastair R. Allen
%B Proceedings of WoTUG\-15: Transputer Systems \- ongoing Research
%X An efficient communication system is an essential component
   of a transputer\-based parallel computer. In recent years
   many message routing systems for transputer networks have
   been developed. They allow data exchange among processes
   mapped onto transputers that are not directly connected. This paper
   surveys and compares some of these routing systems with
   respect to several criteria, such as deadlock freedom,
   adaptivity, network latency, livelock freedom, and

%T An environment for investigating the effectiveness of process migration strategies on transputer\-based machines
%A Joe Philips, Rosemary Candlin
%E Alastair R. Allen
%B Proceedings of WoTUG\-15: Transputer Systems \- ongoing Research
%X This paper describes an experimental system which can be
   used to study and compare the behaviour of different process
   migration strategies for occam programs running on
   transputer machines. The approach taken involves loading the
   code and data for every process onto every transputer.
   Processes can then be enabled and disabled to reflect the
   initial placement and subsequently to reflect process
   migrations. While this simplifies the implementation, it
   means that the entire program must fit onto a single
   transputer. A statistics collection mechanism has also been
   implemented to enable intelligent migration decisions to be
   made. The system has been verified using a random migration
   strategy on several candidate programs.

%T Evaluation of a set of message\-passing routines on transputer networks
%A Wentong Cai, David B. Skillicorn
%E Alastair R. Allen
%B Proceedings of WoTUG\-15: Transputer Systems \- ongoing Research
%X A major obstacle to the use of parallel computers in
   ordinary applications, where their price/performance ratio
   should make them attractive, is the sheer difficulty of
   parallel programming. One approach which can ease the
   difficulties is data parallel programming, because of the
   simplicity of a single\-threaded flow of control. Data
   parallelism also expresses parallelism with enough
   regularity to be readily implemented across a range of
   machine types. In this paper, we describe a data parallel
   model based on a set of second order functions from the
   Bird\-Meertens theory of lists, demonstrate the
   implementation of these functions as a set of
   message\-passing routines, and evaluate their performance on
   transputer networks configured as hypercubes.

%T Performance modelling of a parallel neural network simulator
%A Tom Tollenaere, Dirk Roose
%E Alastair R. Allen
%B Proceedings of WoTUG\-15: Transputer Systems \- ongoing Research
%X A model program structure is presented for parallel
   applications with local interactions between the data
   elements, such as neural network simulations and the
   solution of partial differential equations. The performance
   of this model program is analyzed both theoretically by
   means of classical performance models, and experimentally
   using a parallel neural network simulator program. The
   program runs on a Meiko transputer array, and uses the Meiko
   CSTools libraries for its communications. Comparing the two
   analyses makes it possible to predict application performance
   on new and other machines, and indicates which parts of an
   application are worth optimizing. Moreover, it is shown that
   classical theoretical models do not always capture the
   behavior of a real machine.

%T Farming: Towards a rigorous definition and efficient transputer implementation
%A Warren Day
%E Alastair R. Allen
%B Proceedings of WoTUG\-15: Transputer Systems \- ongoing Research
%X The technique of the processor farm has become very widely
   used for parallelising applications, often being mentioned
   without reference to any source. The goal of this work has
   been to put together a complete and rigorous understanding
   of what the technique can be used for and what is needed in
   order to arrive at an efficiently farmed application. This
   paper consists of these two parts. First, we have shown, via
   the UNITY theory of programming, that the basic structure of
   the processor farm may be used to parallelise a much wider
   domain of applications than has generally been
   considered. Second, we show by example how to build
   efficient implementations for the first generation of INMOS
   Transputers. This work is new in that it is the first that
   has been able to test farming harnesses by taking an
   abstract view of the application. This paper has been written
   in a semi\-"instruction manual"
   style. Also it should serve as an introduction to the

%T Porting the 3L Parallel C environment to the Texas Instruments TMS320C40
%A Alan D. Culloch
%E Alastair R. Allen
%B Proceedings of WoTUG\-15: Transputer Systems \- ongoing Research
%X The TMS320C40 ('C40) is a transputer\-like parallel
   processor from Texas Instruments. It is an order of
   magnitude faster than the T800 transputer. Parallel C is a
   popular programming environment for the transputer. The
   properties of both the 'C40 and Parallel C are described
   and the significant differences between the 'C40 and the
   transputer are pointed out. The techniques used to overcome
   these obstacles to porting Parallel C are presented. These
   include building a new real\-time kernel and reusing
   existing software packages from industry and academia. The
   suitability of the 'C40 for parallel applications is

%T Transputer based adaptive signal processing
%A John J. Soraghan, Woon S. Gan, Kwong H. Goh, Robert W. Stewart, Tariq S. Durrani
%E Alastair R. Allen
%B Proceedings of WoTUG\-15: Transputer Systems \- ongoing Research
%X Transputer based adaptive signal processing systems are
   considered. Efficient use of data communication networks
   requires adaptive equaliser structures that are efficient
   and have fast mean square error (MSE) convergence rates. The
   transputer based non\-canonical least mean square (NCLMS)
   algorithm is implemented using a variant of the standard
   finite impulse response (FIR) filter, called the
   non\-canonical FIR (NCFIR). Simulation results are given
   which show a reduced excess mean square error level and an
   improved performance in an impulsive noise environment for
   the NCLMS over the conventional least mean square (LMS)
   algorithm. Simulation results comparing the LMS and NCLMS
   are presented. The equaliser structure based on the Kalman
   Filter has convergence rates that are independent of the
   channel's characteristics. A transputer based Kalman
   Equaliser and fast Kalman Equaliser are described. Speed\-up
   curves for a variety of topologies for both systems are

%T A transform accelerator for a transputer system
%A C. J. Dodge, P. G. B. Ross, P. E. Undrill, Alastair R. Allen
%E Alastair R. Allen
%B Proceedings of WoTUG\-15: Transputer Systems \- ongoing Research
%X A DSP based image transform accelerator for a transputer
   system is described. The formal specification language Z has
   been employed in the accelerator design, examples of which
   are presented including aspects of the refinement process
   and some of the problems encountered in working with a
   combined hardware/software specification.

%T A transputer based active vision system
%A Andrew B. Smith, Peter H. Welch
%E Alastair R. Allen
%B Proceedings of WoTUG\-15: Transputer Systems \- ongoing Research
%X The visual detection and tracking of moving targets is
   computationally intensive and all but the simplest tasks
   are beyond the ability of single processor architectures.
   Many algorithms that are parallel in nature have been
   suggested for the recognition and tracking of moving
   targets. Implementing these algorithms on parallel computer
   systems requires the transcription of a massively parallel
   architecture onto a system with fewer physical processors.
   This, of course, is a much easier transformation than the
   reverse process: distributing a serial algorithm effectively
   over multiple processing elements. This paper describes a
   parallel Transputer implementation of a vision tracking
   system. The system is able to track a designated target by
   the physical movement of the camera. The camera is mounted
   on a pan and tilt unit and its movement and lens are under
   full control of the system. The system is modular in nature
   having been designed in three stages: i) the pre\-processing
   of the image and extraction of edge information, ii) the
   control of the focus and gain of the lens, and iii) the
   detection and tracking of moving targets. The system
   operates in real\-time (i.e. 25 frames/second).

%T SYDAMA\-2: a heterogeneous multiprocessor system for real time image processing
%A Dieter Stokar
%E Alastair R. Allen
%B Proceedings of WoTUG\-15: Transputer Systems \- ongoing Research
%X In this paper we will present the architecture of SYDAMA\-2
   and its programming environment as well as early experiences
   gained in developing applications with it. The goal of this
   project was to find and develop an architecture that was
   capable of executing entire applications in the domain of
   real time image processing. The architecture consists of two
   parts: one for low level preprocessing and one for
   intermediate and high level postprocessing. The preprocessing
   subsystem is based on the direct mapping of static dataflow
   graphs onto hardware. While highly specialized processing
   elements can be implemented for reasons of efficiency, most
   of them consist of lookup tables for the sake of their
   processing speed and flexibility. All of them are controlled
   by a transputer that does the housekeeping. The processing
   elements are interconnected through a communication network
   which is realized as a pipelined multichannel ringbus that
   is fully reconfigurable on the fly. The bandwidth is large
   enough to carry several video streams and because the buses
   can be subdivided at every stage, the overall bandwidth
   effectively scales with the number of stages (processing
   elements). The postprocessing subsystem consists of a
   standard off the shelf transputer network that is closely
   connected to the low level subsystem. The programming
   environment consists of a number of tools that cover the
   different stages of programming an application: the low
   level programming interface for the image processing
   subsystem, a configuration tool and the runtime support that
   controls and interconnects the different tasks.

%T General purpose parallel computers: a standard architecture with a standard programming interface
%A Geoff Barrett, Eric Barton, Trevor Carden, Dominique Duval, Denis A. Nicole
%E Alastair R. Allen
%B Proceedings of WoTUG\-15: Transputer Systems \- ongoing Research
%X Recent developments in the area of high performance
   computing are pointing the way to a standard architecture
   for parallel computers. This architecture contains a number
   of medium\-cost processing elements which communicate with
   each other through a high\-bandwidth, low\-latency
   interconnect. The design of the interconnect eliminates the
   concerns of "locality" which are current
   in the programming of present\-day machines. This
   "flat" topology and common architectural
   model lead to increased opportunities for establishing
   portable software for high performance computing. The Esprit
   GP\-MIMD project has exploited these opportunities by
   developing the architectural model and defining a programming
   interface for software which runs efficiently on machines
   with a range of processing power.

%T An efficient multi\-priority scheduler for the transputer
%A K. M. Shea, M. H. Cheung, Francis C. M. Lau
%E Alastair R. Allen
%B Proceedings of WoTUG\-15: Transputer Systems \- ongoing Research
%X Multi\-priority scheduling is essential in a spectrum of
   applications, especially those involving real time. We have
   extended the hardware scheduler in the transputer to support
   multi\-priority scheduling. We did this by implementing a
   layer of provably safe and efficient queue manipulation
   primitives and a "plug\-in" data structure
   for process queueing on top of the original scheduler. For
   optimal performance, different data structures for queueing
   may be plugged into our scheduler to suit different
   application domains. We tested our scheduler with different
   process loads (up to 200 processes) and the performance is
   excellent: overhead due to the scheduler accounts for less
   than 1% of a timeslice on a T8.

%T Towards an adaptable scheduler for real\-time system
%A Celio Estevan Moron, Hussein S. M. Zedan
%E Alastair R. Allen
%B Proceedings of WoTUG\-15: Transputer Systems \- ongoing Research
%X Issues in designing an adaptable real\-time scheduler are
   discussed. A general approach that utilises milestones [1]
   is given and illustrated using the Least Laxity Algorithm.
   Some performance results are also given (in the form of
   an upper bound on the overhead).

%T TCP/IP on transputers \-\- the performance implications
%A Roger M. A. Peel
%E Alastair R. Allen
%B Proceedings of WoTUG\-15: Transputer Systems \- ongoing Research
%X At the 14th WoTUG Technical Meeting at Loughborough
   University in September 1991, Graeme Tozer from INMOS
   described the architecture of the IMS B300 Ethernet
   Interface. This product supports four external transputer
   networks, providing each with an Iserver interface to
   Ethernet and connectivity to a host Iserver running on a
   processor elsewhere on the Ethernet. Performance claims in
   the range 200\-300 kbytes per second were made for raw TCP
   transfers between a typical Unix workstation and transputers
   networked to it using the B300. This paper outlines the
   techniques which the author has used recently to enhance the
   performance of his own pipelined TCP/IP implementation for
   Ethernet to achieve throughputs of up to 925 kbytes per
   second on substantially similar hardware. Many of these
   techniques are equally appropriate to any large
   communicating process application.

%T A transputer\-based accelerator for digital circuits fault simulation
%A G. P. Balboni, G. P. Cabodi, S. Gai, M. Sonza Reorda
%E Alastair R. Allen
%B Proceedings of WoTUG\-15: Transputer Systems \- ongoing Research
%X Fault simulating digital devices requires powerful tools
   able to deal with their increased size and complexity.
   Software simulators are often unable to satisfy the needs of
   designers and test engineers due to the size of the
   simulated circuits, and to the large number of faults;
   hardware accelerators have been proposed to solve the
   problem. We present a system running on a net of transputers
   which uses a fault\-partitioning strategy to fully exploit
   the available processors. The results show that this
   solution can represent a good trade\-off between the cost of
   the system and the obtained speed\-up.

%T Formal methods in the design of the T9000
%A Geoff Barrett, David May, D. Shepard
%E Alastair R. Allen
%B Proceedings of WoTUG\-15: Transputer Systems \- ongoing Research
%X The complexity of integrated circuits continues to grow, and
   chips with over 100,000,000 transistors will be in
   widespread use by the late 1990s. These chips will combine
   general purpose processors with subsystems for
   communications and other specialised tasks. They will be far
   too complex for the design to be tested, and manufacturing
   volumes will be far too high for the design to be
   wrong! Mathematical techniques have already been applied to
   the design of parts of VLSI chips. Most of this work is
   experimental, and requires an unusual combination of
   engineering, mathematical and programming skills. Sometimes
   new theoretical work is needed, and specialised tools may
   have to be constructed. Despite these difficulties,
   mathematical techniques are playing an important role in the
   design of microprocessors at INMOS and techniques suitable
   for incorporation in standard computer\-aided design systems
   are emerging.

%T How to achieve replication within a CASE tool environment
%A Gordon A. Manson, E. A. Cachia, A. Boyle
%E Alastair R. Allen
%B Proceedings of WoTUG\-15: Transputer Systems \- ongoing Research
%X A CASE tool, called ParStP, has been developed at the
   University of Sheffield. ParStP is built on top of an existing
   open CASE tool called Software through Pictures and it
   combines design, compilation, running, testing and
   documentation into one integrated system. This paper shows
   how ParStP is being extended to cope with replication.

%T The PARIX programming environment
%A Parsytec GmbH
%E Alastair R. Allen
%B Proceedings of WoTUG\-15: Transputer Systems \- ongoing Research
%X The new Parsytec GC is a high\-performance parallel
   processing system for scientific and technical
   applications. Between 64 and 16,384 processors provide a
   computational performance from 1 to 400 GigaFlops (peak
   performance, double precision, 190 GigaFlops sustained,
   double precision) thus meeting even the most extreme
   demands. In the software environment PARIX, users work with
   standard compilers for Fortran and C, make use of
   UNIX development tools and libraries and have
   high\-performance systems for I/O, backup, graphics and
   video. The Parsytec GC is based on Inmos T9000 processors
   which can be structured into random topologies and which
   communicate both, the Inmos T800/T805 and the, at a maximal
   rate of 80 MBytes/s.

%T An Optimised Parallel Compiler for Executing Declarative Programs on Transputer Array
%A Wang Dingxing, Tian Xinmin, Zheng Weimin, Shen Meiming, Wen Dongchan
%E Alastair R. Allen
%B Proceedings of WoTUG\-15: Transputer Systems \- ongoing Research
%X Many Declarative Programming Languages (DPLs) such as KL1,
   Prolog, PARLOG, Miranda and SML are considered attractive
   candidates for artificial intelligence applications and
   execution on parallel architectures. However, there are many
   issues such as compile\-time granularity analysis, partial
   evaluation, task scheduling and load balancing for the
   efficient implementation of DPLs on multiprocessor systems.
   In this paper, we place the emphasis on the compiled
   implementation of PARLOG and SML on a distributed memory
   multiprocessor system (transputer array). Under the graph
   rewriting framework, a Heterogeneous Parallel Graph Rewriting
   Execution Model (HPGREM) and a corresponding description
   language, CIL, are proposed. Based on the HPGREM, a parallel
   abstract machine PAM/TGR (Parallel Abstract Machine for
   Term Graph Rewriting) and corresponding compilation rules to
   generate PAM/TGR code are presented. Furthermore, an
   optimised parallel compiler for executing declarative
   programs on a transputer array is described. The performance
   statistics on a 16\-node transputer array demonstrate the
   effectiveness of our model, compiling techniques and

%T Implementation of learning automata games on a 128\-transputer reconfigurable machine using VCR1.8c (Virtual Channel Router)
%A Franciszek Seredynski, João Paulo Kitajima, Brigitte Plateau
%E Alastair R. Allen
%B Proceedings of WoTUG\-15: Transputer Systems \- ongoing Research
%X An implementation of learning automata games on a
   distributed memory message\-passing reconfigurable
   multiprocessor with 128 Transputers is presented. The game
   is played using a conjugate exchange process in order to
   transform the maximal price point into the Nash point. The
   game was implemented in Occam2 with Virtual Channel Router
   (VCR), a router developed at the University of Southampton.

%T Nonconvex continuous optimization experiments on a transputer system
%A A. ter Laak, L. O. Hertzberger, P. M. A. Sloot
%E Alastair R. Allen
%B Proceedings of WoTUG\-15: Transputer Systems \- ongoing Research
%X In this paper we investigate the functionality of various
   parallel implementations of Simulated Annealing on a
   transputer platform. The optimization problem to be solved
   is that of efficiently finding the global minimum in
   continuous spaces. Our work concentrates on the consequences
   of long\-range and short\-range interactions on algorithmic
   and geometric decomposition schemes. We introduce a mixed
   transputer topology to by\-pass some of the inherent time
   critical operations involved. We show that combining the
   Fast Simulated Annealing algorithm with a systolic
   decomposition strategy results in a highly efficient
   algorithm for continuous optimization problems. Experiments
   indicate that incorporation of functional decomposition of
   the energy function results in a near optimal

If you have any comments on this database, including inaccuracies, requests to remove or add information, or suggestions for improvement, the WoTUG web team are happy to hear of them. We will do our best to resolve problems to everyone's satisfaction.

Copyright for the papers presented in this database normally resides with the authors; please contact them directly for more information. Addresses are normally presented in the full paper.

Pages © WoTUG, or the indicated author. All Rights Reserved.
Comments on these web pages should be addressed to: www at wotug.org
