Home | Conferences | Links | Reference | About | Search |
|
Refer Proceedings details%T TROS: A Real Time Kernel for a Fault\-Tolerant Multi\-Processor Computer Based on Argument Flow %A Eric Verhulst, R. Lauwereins, R. Cuyvers, J. Peperstraete %E J. Wexler %B OUG\-11: Developing Transputer Applications %X The design and realisation of a fault tolerant load balancing real time kernel for a multi\-transputer system is considered. The system will be able to recover from software as well as from hardware failures. This is made possible by applying the argument flow program organisation. Argument flow programs are a mixture of normal control flow at the lowest level and of data flow at the higher levels. Hence, load balancing can be executed automatically.In the next part, the architecture of the runtime kernel and the use of argument flow is considered. %T Dynamicity through Occam and TDS %A D. Millot, J. Vautherin %E J. Wexler %B OUG\-11: Developing Transputer Applications %X A parallel program running on a parallel machine involves a logical network of processes and a physical network of processors. When both networks are known at design time, a static mapping of the logical network on the physical one can take place. Dynamicity arises when one of the networks is not entirely determined before execution starts. This can happen for instance when the logical network involves dynamic creation of processes, or when the physical topology cannot be defined at design time (this is the case in a multi\-user context, as the configuration a user gets is affected by the other users\[rs] computations, or when the programmer wants to develop a generic program that can be executed on any physical topology).When using Occam to design a network of processes that has to be mapped on transputers, things have to be decided before execution starts :\- the network of processes should be composed of an explicitly bounded number of processes due to the fact that, unlike CSP, Occam does not allow recursion,\- the topology of the transputer network should be known, for processes are explicitly mapped onto processors. One has to investigate the physical topology in order to write the configuration statements, and a change in the topology entails modifications in these statements. An application is therefore dedicated to a configuration. It would nevertheless be very attractive to design software that could run on an unidentified topology, trying to make the best of available processors. Such software has to realize a dynamic placement and therefore includes a phase supposed to investigate the configuration, load the different codes on the appropriate processors, then dynamically start their execution.Our aim is to develop suitable tools to write that kind of application. As a first step, we investigate the potentiality of Occam and TDS in this field. %T A CASE Tool for Designing Deadlock\-Free OCCAM Programs %A W. D. Crowe, R. Hasson, P. E. D. Strain-Clark %E J. Wexler %B OUG\-11: Developing Transputer Applications %X This paper describes a Computer Aided Software Engineering (CASE) tool aimed at producing correct OCCAM programs. We use variants of CSP and OCCAM2 notation , a strong form of protocols on channels and standard forms for processes in order to lessen the combinatorial problems which arise while investigating deadlock. The CASE tool is graphical in nature and two versions of it are currently being implemented. One version is written in PROLOG and runs on Mackintosh computers while the another is written in a mixture of PROLOG and C and runs on Sun workstations. %T A Deadlock Detection Tool for Occam %A Wouter Joosen, Pierre Verbaeten %E J. Wexler %B OUG\-11: Developing Transputer Applications %X A problem that occurs frequently during the development of parallel programs is deadlock.In the domain of transputer technology (and of parallel computing in general) run time debugging software still is a problem. This increases the value of verification tools based on static analysis, even if the functionality is sometimes limited.In this paper we present our approach to static analysis. The analyzer reduces an occam program to the relevant actions hi the context of this problem (communication, possibly through guards: PAR constructions...), and subsequently examines the program, reporting possible problems that could occur during real execution. The tool goes beyond the purpose of detecting deadlocks only: other infinite wait situations are also reported. %T Towards a Software Architecture for Solid Modelling Systems on Processor Networks %A J. R. Davy, Peter M. Dew, Nick Holliman, D.P. Mallon, Alan de Pennington %E J. Wexler %B OUG\-11: Developing Transputer Applications %X Our experience programming transputer\-based systems for the visualisation of solid models has highlighted the need to separate the issues of program decomposition, task and data scheduling and low level communication. In this paper we propose a three level software architecture for multicomputer to support geometrical computations. The architecture takes into account current and emerging hardware to support communications and scheduling. %T An Irregular Distributed Simulation Problem with a Dynamic Logical Process Structure %A Ming Q. Xu, Stephen J. Turner, Nie Pin %E J. Wexler %B OUG\-11: Developing Transputer Applications %X This paper describes a modeling problem which exhibits many features of more advanced distributed simulation: the simulation of biological population dynamics — "Host\-parasite interactions". It is a dynamical simulation in which certain species of hosts and parasites live, move randomly, breed and (as far as parasites are concerned) infect the hosts in a two\-dimensional ocean. Apart from its relevance to realistic biological studies, this simulation program does serve to illustrate many crucial ideas in dynamic time and event driven simulations. Our approach to the parallel implementation of this simulation requires the simulation objects (in this case, hosts and parasites) to be organised as LPs (Logical Processes) which can be created and destroyed dynamically at run time to reflect the birth/death of these simulation objects. In addition to their dynamical features, LPs must preserve the temporal aspects of the real world system. In other words, a global ordering of LP interactions (referred to here as actions) must be ensured to preserve the causality principle [1]. Also, the computational load can become imbalanced because the real world system or rather, the distribution of hosts and parasites in the underlying space is changing with time. The methods for counteracting this dynamical load imbalance will be described.We shall begin with the specification of the dynamical rules governing the behaviour of the hosts and parasites during the simulation. %T A Generally Configurable Multigrid Implementation for Transputer Networks %A Osama El-Giar, Tim Hopkins %E J. Wexler %B OUG\-11: Developing Transputer Applications %X This paper describes the performance of a multigrid method implemented on a transputer\-based architecture. We show that the combination of fast floating\-point hardware, local memory and fast communication links between processors provide an excellent environment for the parallel implementation of multigrid algorithms. The gain in efficiency obtained by increasing the number of processors is shown to be nearly linear and comparisons are made with published figures for a parallel multigrid Poisson solver on an Intel iPSC 32\-node hypercube. %T Self\-Adjusting Mapping: A Heuristic Mapping Algorithm for Mapping Parallel Programs onto Transputer Networks %A Hong Shen %E J. Wexler %B OUG\-11: Developing Transputer Applications %X The problem of mapping parallel programs onto multiprocessor system is a fundmental problem of great significance in parallel processing, but it is NP\-hard in general. In this paper we propose a fast heuristic algorithm to solve this problem on transputer networks. Our mapping algorithm consists of three modules: grouping, placement and routing, where grouping groups processes in the program into tasks which can be placed onto processors in the transputer network in a way of one\-to\-one, placement places the grouped tasks onto the processors and routing produces physical communication paths for logical communication requirements. The three modules work co\-operatively in a way of progressive self\-adjusting, and finally produce a satisfactory solution for the mapping problem. %T The Investigation of Communications Patterns in Occam Programs %A Rosemary Candlin, Qiangyi Luo, Neil Skilling %E J. Wexler %B OUG\-11: Developing Transputer Applications %X The performance of a concurrent computation running on a multiprocessor system may depend critically on the way in which the program is decomposed and placed on the machine. In order to exploit the potential of parallel processors, it is necessary to balance the advantage of spreading the computational load as thinly as possible over the processors, with the disadvantage that increased communication delays may slow down the computation. In general, there is no satisfactory theoretical model of the complex interaction between the amount of computation carried out by the individual processes, their frequency of communication and the topology of the underlying machine. For many programs, it is not easy to see in advance how computation will interact with communication, and placement strategies which depend only on a static analysis of the program structure may not be sufficient. The work described here is an attempt to provide useful tools for the occam programmer which can be used to investigate communications patterns, and to explore different configurations rapidly.We think that this approach will be particularly valuable for programs which can be decomposed in a natural way into a fairly large number of top\-level occam processes, so that the preliminary parallelization arises out of the nature of the application, and the main problem is to place these processes on a smaller number of physical processors. This is often the case for programs which model real\-time systems, and we have taken as an example an application from chemical engineering. In programs like this, there is natural concurrency in the real world which can be easily represented in terms of occam processes. At the moment, we do not attempt to extract parallelism automatically, or handle shared data, though there are a number of systems that have tackled these problems (see, for example [1] and [2]). Our main aim at this stage is to provide a programmer with profiling tools, and see to what extent they can help to produce an efficient implementation of the program. %T A System Configuration for very large Database Problems [Extended Abstract] %A Alan G. Chalmers, Derek J. Paddon %E J. Wexler %B OUG\-11: Developing Transputer Applications %X In the past many applications have ensured success by restricting the size of the application, or by increasing the number of processors and memory size to enable the full database to be supported. Here, we specify that databases of arbitrary sizes should be supported and not be restricted by the memory size of individual processors.The ability to cope with very large databases was easily achieved in many of the early MIMD systems by using a shared memory model. However, the transputer and Occam process model restricts us from using this approach, instead we may share data [7].Unlike shared memory systems, we cannot globally address data in a message passing system. However, if data items carry unique identifiers, we can share single or multiple copies of those data items across many processors. Indeed, adopting this system of shared data reference allows us the same memory flexibility for read\-only data, as would be obtained in a shared memory system, without the bus contention problems associated with that class of processor. In its degenerate form, a shared data system has only private data, which is never available at any other processor. The simple processor farm of May and Shepherd [8] is a typical example, where data and tasks are assigned to specific processors without the need for data to migrate to other processors. In many applications, such as the ray tracing of very complex computer images, a static allocation of data is inappropriate. Here, a database is managed at each node in a similare manner to a cache memory. Shared data systems for a tree based system architecture, and for very large data base problems are described by Green, Paddon and Lewis [7], and Green and Paddon [3, 4, 5, 6], where these systems were applied to image synthesis using the ray tracing method. %T A Comparison of Parallel Implementations of Flux Corrected Transport Codes %A Jing-ming Jong, G. S. Stiles %E J. Wexler %B OUG\-11: Developing Transputer Applications %X We present the results of comparing implementations of the Flux Corrected Transport (FCT) method on transputers and several other parallel and sequential machines. FCT is a finite difference scheme used to solve fluid dynamics problem which may involve steep gradients or shocks; it has proven useful for both one\- and two\-dimensional problems in plasma physics, atmospheric sciences, and detonation studies. The method vectorizes very well and hence runs quickly on supercomputers. Since the calculations at each point involve only a small number of neighbors, the method can also be efficiently implemented on multi\-processor systems. We have run one\- and two\-dimensional problems on Transputers and several other systems, including a VAX 8650, a SUN 4/280, a four\-processor Ardent Titan, an eight\-processor Alliant FX/8, and a four\-processor Silicon Graphics 240GTX. We shall also compare our results to those obtained by Gustafson (1988) on the NCube/ten.If, in the 1\-d problem, we consider the speed of a single T800 to be 1.0, the SUN 4/280 ranks at 3.8, the VAX 8650 at 4.0, 8 TSOOs at 7.9, the Silicon Graphics 240GTX at 27.0, the FX/8 at 56.9, and the Titan at 64.4. On the 1\-d problem, again taking one T800 to have a speed of 1.0, the SUN comes in at 3.6, 16 NCube nodes at 4.0, the 8650 at 4.3, 8 TSOOs at 7.7, the Titan at 65.3, and the FX/8 at 101.4. The transputer ranks highest if we calculate the cost\-effectiveness of the various systems by dividing the relative speed by the approximate cost. If we assume the 8 TSOOs have a cost\-effectiveness of 1.0 on the 1\-d problem, the Titan is second at 0.52, followed by the 240GTX at 0.17, the FX/8 at 0.094, the SUN at 0.081, and the VAX at 0.021. %T Simulating Neural Networks in a Distributed environments %A Jukka Vanhala, Kimmo Kaski %E J. Wexler %B OUG\-11: Developing Transputer Applications %X Two artificial neural network models are compared. They are the Hopfleld neural network model and the Sparse Distributed Memory model. Distributed algorithms for both of them are designed and implemented. The run time characteristics of the algorithms are analyzed theoretically and tested in practise. The storage capacities of the networks are compared. Implementations are done using a distributed multiprocessor system. %T Attribute Evaluation on a Network of Transputers %A Matthijs F. Kuiper, Atze Dijkstra %E J. Wexler %B OUG\-11: Developing Transputer Applications %X The structure and construction of parallel compilers that run on a network of transputers is discussed. The compilers are automatically generated from an attribute grammar definition of the source language. This work illustrates that parallelism can also be used in non\-numeric computations. As part of implementing parallel compilers we have constructed a general message passing kernel for transputers. This kernel can also be used in other applications. First results indicate that parallel compiling on transputers is feasible and that 4 to 16 transputers can be used in parallel compilers. %T An Object Oriented Style for the Computing Surface %A Matthew Chalmers %E J. Wexler %B OUG\-11: Developing Transputer Applications %X During the development of a ray tracer on a Meiko Computing Surface, problems with poor flexibility of configuration and slow software development were encountered. In order to overcome these difficulties, and in order to facilitate experimental programming on the Meiko, a system to support an object\-oriented style for Occam\[rs] programming was developed. The aim was to create a set of library modules that would allow user code to be quickly developed and integrated into existing programs, to support better debugging facilities than were currently available, and to allow program design to be based on a more flexible and dynamic model of concurrency than the process modelThis system has been rewritten in order to introduce new features and to take advantage of the availability of C. The new system is described, with the emphasis on how experience with the system influenced its redesign, and on the details of newer elements such as the improved facilities for monitoring and debugging. %T C_NET A C++ Based Language For Distributed And Real Time Programming %A Jean_Marc Adamo %E J. Wexler %B OUG\-11: Developing Transputer Applications %X C_NET is a high level C++ based language devoted to multiprocessor architecture programming. It has been designed so as to offer concepts of object\-oriented programming, communicating processes and exception handling, all within the same language. The purpose of this paper is to describe how merging these concepts into C_NET has been organized. Consequently, the paper is divided into three parts. The first is concerned with discussing the different roles that the notions of class object and process are intended to play within the language. It is argued that these roles are in fact orthogonal, since the first two notions are primarily concerned with data structuring, encapsulation and inheritance, whereas the last one is mainly concerned with threads of control and synchronization. The second part is devoted to describing the exception handling system, which makes it possible to derive process preemption mecanisms by combining exceptions with parallelism. Process preemption raises some atomicity problems, which are discussed at the end of the second part. Finally the last part provides information on the state of the project development and on future perspectives. %T Real\-Time Transputer Models of a Low\-Level Primate Vision %A Andrew B. Smith, Peter H. Welch %E J. Wexler %B OUG\-11: Developing Transputer Applications %X It is believed from psychophysical experiments that human vision operates in two stages: a parallel preattentive stage which extracts simple visual features and a sequential attentive stage in which local features of the scene are analysed. A processing model of the early preattentive stage has been developed. This model is computationly intensive making it unsuitable for implementation on sequential computer architectures. The development of a real\-time parallel transputer vision system based on this processing model is explained.The current implementation performs edge filtering over four separate resolution/field\-of\-view levels from 256 by 256 monochrome images. Eight T800\-20 transputers deliver over 40 frames per second. The software — hardware architecture is scalable so as to support higher resolutions and additional features (such as auto\-focusing, movement detection, and tracking) through the addition of extra transputers, whilst maintaining at least camera frame rates. %T ICR: A Transputer\-Based Intelligent Character Reader %A Francis Wong F.S., Koh Liang Seng %E J. Wexler %B OUG\-11: Developing Transputer Applications %X This paper presents the design of a transputer\-based Intelligent Character Reader (ICR). The implementation of the ICR is based on two algorithms. The Filtered Projection Technique (FPT) is a modification of the conventional projection profile techniques \- it filters out and projects the structural information of a visual pattern into several components along several directions, such that some salient structural information can be extracted from these components easily. The Structural Complexity Index (SCI) algorithm determines the structural complexity of a visual pattern and produces the corresponding index to indicate its relative complexity. The ICR is capable of recognizing printed characters, such as the Chinese characters printedon the local newspapers, accurately, despite the presence of noise due to printing, scanning, slight font variations and misalignment. The ICR prototype is implemented on Occam II and C, and run on an array of transputers. %T Solving Partial Differential equations via Cellular Automata: A Binary and Statistical Approach %A A. Cosnuau, F. Desbois, Y. Morchoisne %E J. Wexler %B OUG\-11: Developing Transputer Applications %X Cellular automata are used to solve partial differential equations (PDE) discretized on a non\-regular grid. Boolean representations of real numbers are introduced and logical intrinsic computer functions are used to achieve algebraic operations of the finite difference algorithm. In a first step the method is briefly explained and then implementation is described. For diffusion or convection problems, preliminary computations on a Cray\-XMP18 and on a network of T800 Transputers were done. |
If you have any comments on this database, including inaccuracies, requests to remove or add information, or suggestions for improvement, the WoTUG web team are happy to hear of them. We will do our best to resolve problems to everyone's satisfaction.
Copyright for the papers presented in this database normally resides with the authors; please contact them directly for more information. Addresses are normally presented in the full paper.
Pages © WoTUG, or the indicated author. All Rights Reserved.
Comments on these web pages should be addressed to:
www at wotug.org