|
| WoTUG-20
| |
|
|
Parallel Programming and Java conference
13 - 16 April 1997
Control Laboratory of the
University of Twente
the Netherlands
|
The conference will be held from 13 - 16 April, 1997.
|
|
Accepted Papers
|
Title: A Tool for Proving Deadlock Freedom
Authors: J.M.R. Martin and S.A. Jassim
Title: Higher-Order Concurrency in Java
Author: Erik D. Demaine
Title: Compile-Time Schemes for Mapping Loop Parallelism
Author: Rizos Sakellariou
Title: Scriptic: Parallel Programming in Extended Java
Author: Andre van Delft
Title: Communicating Java Threads
Authors: Gerald Hilderink, Jan Broenink, Wiek Vervoort and André Bakkers
Title: The Design of JET: A Java Library for Embarrassingly Parallel Applications
Authors: Luis M. Silva, Hernani Pedroso and Joao Gabriel Silva
Title: Reconfigurable Computing
Author: Roger Gook
Title: Prefetch Data Management for Parallel Particle Tracing
Authors: Jonathan Tidmus Roger Miles Alan Chalmers
Title: Dynamic Process Interaction
Authors: Lajos SCHRETTNER and Innes JELLY
Title: Data-Strobe Links and Virtual Channel Processors
Author: Barry M. COOK
Title: The Macramé 1024 Node Switching Network
Authors: S. Haas, D.A. Thornley, M. Zhu, R.W. Dobinson and B. Martin
Title: WEAVE - A System for Dynamic Configuration of Virtual Links
Authors: S.R.Harrison & C.R.Brown
Title: Expanding the Message Passing Library Model with Nested Parallelism
Authors: Rodríguez C., Sande F., León C., García F.
Title: Beyond transputing: fully distributed semantics in Virtuoso's Virtual Single Processor programming model.
Author: Eric Verhulst.
Title: occam for Multi-Processor DEC Alphas
Authors: Peter H. Welch and Michael D. Poole
Title: A Multiprocessor OCCAM Development System for UNIX Network Clusters
Authors: D.G. Patrick, P.R. Green and T. A. York
Title: A tool for optimisation of program execution in dynamic topology systems.
Author: Tomasz Kalinowski
Title: Higher Levels of Process Synchronisation
Authors: Peter H. Welch and David C. Wood
Title: Fine-grained global control constructs for parallel programming environments
Author: Marek TUDRUJ
Title: An Open Systems Strategy for Distributed occam Execution
Authors: Paul SINGLETON an Barry M. COOK
|
|
|
|
A Tool for Proving Deadlock Freedom
J.M.R. Martin1) and S.A. Jassim2)
1) Oxford University Computing Services
13 Banbury Road
Oxford 0X2 6NN, UK
2) Department of Mathematics
Statistics, and Computer Science
University of Buckingham
MK18 1EG, UK
|
| Abstract. We describe a tool, programmed in Java, for the formal verification of
the absence of deadlock and livelock in networks of CSP processes. The
innovative techniques used scale well to very large networks, unlike the
exhaustive state checking method employed by existing tools. | |
|
|
Higher-Order Concurrency in Java
Erik D. Demaine
Dept. of Computer Science
University of Waterloo
Waterloo, Ontario, Canada N2L 3G1
eddemaine@uwaterloo.ca
|
| Abstract. In this paper we examine an extension to Hoare's Communicating
Sequential Processes model called higher-order concurrency, proposed by
Reppy. In this extension, communication algorithms (or events) are first-class
objects and can be created and manipulated dynamically. In addition, threads are
automatically garbage collected and channels are first-class, that is, they can be
passed over other channels. We describe the design of a Java package that
implements the main features of higher-order concurrency, with similar
ease-of-use to Reppy's Concurrent ML system. Our implementation can be
easily extended to use a distributed system, which is a major limitation with
Concurrent ML. We also hope to bring the idea of higher-order concurrency to a
wider audience, since it is extremely powerful and flexible, but currently only well
known to the programming-languages community. | |
|
|
Compile-Time Schemes for Mapping Loop
Parallelism
Rizos Sakellariou
Department of Computer Science, University of Manchester,
Oxford Road, Manchester M13 9PL, U.K.
rizos@cs.man.ac.uk
|
| Abstract. Loop structures are a potentially rich source of parallelism in programs
written in a high-level programming language such as Java; their parallelism can
be exploited by assigning different loop iterations on each processor of a parallel
computer (for instance, using the Java thread mechanism). This paper classifies
the schemes available for performing this assignment and emphasis is placed on
compile-time mapping schemes. An analytical mathematical description of the
properties of the latter is presented and the drawbacks that may be involved are
highlighted. | |
|
Scriptic:
Parallel Programming in Extended Java
André van Delft
Delftware Technology BV, Schijflaan 17, 2625 KG Delft, Netherlands
|
| Abstract. Scriptic is an expression based extension to the Java programming
language, targeted at user interfaces, simulations and parallel computing on shared
memory systems. The extras are mainly founded on the theory of Process
Algebra: constructs for non-deterministic choice, parallelism and communica-tion.
By default, these parallel constructs have interleaving seman-tics, rather than
multi-threading or forking.
Specific Java code fragments may run in their own threads or handle events from
the windowing system. This makes interactive applications such as arcade games
execute as fast as corresponding plain Java versions. GUI components such as
buttons and menu items are enabled and disabled when applicable, without
additional programming.
This paper covers an example application in Scriptic, an overview of the language
constructs, the implementation, originality, previous work and current work. | |
|
|
Communicating Java Threads
Gerald Hilderink
Jan Broenink
Wiek Vervoort
Andre Bakkers
(G.H.Hilderink, J.F.Broenink, A.W.P.Bakkers)@el.utwente.nl
W.Vervoort@cs.utwente.nl
University of Twente, dept. EE, Control Laboratory,
P.O.Box 217, 7500 AE Enschede, The Netherlands
|
| Abstract. The incorporation of multithreading in Java may be considered as a
significant part of the Java language, because it provides rudimentary facilities for
concurrent programming. However, we belief that the use of channels is a
fundamental concept for concurrent programming. The channel approach as
described in this paper is a realization of a systematic design method for
concurrent programming in Java based on the CSP paradigm. CSP requires the
availability of a Channel class and the addition of composition constructs for
sequential, parallel and alternative processes. The Channel class and the
constructs have been implemented in Java in compliance with the definitions in
CSP. As a result, implementing communication between processes is facilitated,
the programmer can avoid deadlock more easily, and the programmer is freed
from synchronization and scheduling constructs. The use of the Channel class and
the additional constructs is illustrated in a simple application. | |
|
|
The Design of JET: A Java Library
for Embarrassingly Parallel Applications
LUIS M. SILVA HERNÂNI PEDROSO JOÃO GABRIEL SILVA
Departamento Engenharia Informática
Universidade de Coimbra - POLO II
Vila Franca - 3030 Coimbra
PORTUGAL
Email: luis@dei.uc.pt, hernani@dsg.dei.uc.pt
|
| Abstract. JET is a parallel virtual machine. It has a dynamic number of
processors which may run different proprietary operating systems. The
processors communicate through the slowest intercommunication network of the
world, do not provide peak performance and in the overall the parallel machine
can be a very unstable computing surface. In other words, JET uses the idle CPU
cycles of computers that are connected to the Internet, being a really inexpensive
supercomputer. This paper presents the design of a Java parallel library that
provides support for the execution of embarrassingly parallel applications. It
inherits the security, robustness and portability features of Java and includes
support for fault-tolerance, scalability and high-performance through the use of
parallelism. | |
|
|
Reconfigurable Computing
Roger Gook
Managing Director
Embedded Solutions Ltd.
E-mail 104330,1033@compuserve.com
|
| Abstract. The name may be familiar of old to the WoTUG community, but it has
now been adopted by one of the fastest growing sectors of the silicon industry. Reconfigurable Computers are computing systems whose hardware architecture
can be modified by software to suit the application at hand. The core component
is the FPGA. Remarkable performance gains are achieved by placing an
algorithm in an FPGA for embedded applications, compared with using a
microprocessor or DSP. This is because an FPGA takes advantage of hardware
parallelism while reducing the timing overheads needed for general-purpose
microprocessor applications. For example the time taken by load/store
operations and instruction decoding can be eliminated. Reconfiguration enables
the FPGA to provide a problem specific computer for highly optimised
application performance.
Just as high level programming languages liberated the first microprocessors
programming languages will liberate the FPGA. The first of these languages to
become commercially available is Handel-C.
Handel-C is based on the CSP Model; it was developed by the Hardware
Compilation Group at the University of Oxford and is to be marketed by ESL.
The Handel-C tools enable a software engineer to target directly FPGAs in a
similar fashion to classical microprocessor cross-compiler development tools,
without recourse to a Hardware Description Language. Thereby allowing the
software engineer to directly realise the raw real-time processing capability of the
FPGA. The skills and expertise gained by the WoTUG, provide the group with a
competitive advantage to develop the innovative algorithms, applications and
products in this domain.
References:
University of Oxford Hardware Compilation Group.
http://www.comlab.ox.ac.uk/oucl/hwcomp.html
DARPA Chameleon Project Presentation. Application of this domain to the
defence industry.
http://www.ito.darpa.mil/ResearchAreas/Adaptive_Computing_Systems/ACSintro.html#anchor_index
| |
|
|
Prefetch Data Management for Parallel
Particle Tracing
Jonathan Tidmus, Roger Miles and Alan Chalmers
Intelligent Computer System Centre
University of the West of England
Bristol England BS16 1QY
jpt@ics.uwe.ac.uk
Department of Computer Science
University of Bristol
Bristol England BS8 1UB
January 1997
|
| Abstract. The particle tracing method uses a stochastic approach for global
illumination computation of three-dimensional environments. As with many
graphics techniques the computation associated with the image generation is
complex. Parallel processing offers the potential of solving the computational
complex particle tracing more rapidly. Distributed memory parallel systems are
scalable and readily available. However, large environmental models are often
bigger than individual node storage capabilities requiring data management to
distribute and control the movement of environmental data as computation
proceeds. Prefetch data management attempts to reduce idle time associated with
remote data fetches by anticipating the latency and requesting required data items
prior to their actual use during computation. This paper demonstrates how
attention to work division and supply coupled with prefetch data management can
be utilised to minimise overheads associated with a parallel implementation and
reduce overall image synthesis time. | |
|
|
Dynamic Process Interaction
Lajos SCHRETTNER1) and Innes JELLY 2)
1) Department of Computer Science, József Attila University, Szeged, Hungary
email: schrettner@inf.u-szeged.hu
2) Computing Research Centre, School of Computing and Management Sciences, Sheffield
Hallam University, Sheffield, UK
email: I.E.Jelly@shu.ac.uk
|
| Abstract. This paper concerns with the design of a building block for parallel and
distributed software systems. We start with a very common problem of process
interaction and successively derive a building block that can be used to construct
systems that are correct by construction. When copies of this building block are
connected to each other in an arbitrary fashion, the resulting system is
deadlock/livelock free. An application is outlined where this method was used.
We also would like to stress that the method of successive refinements used in
this
paper seems to be a fruitful approach in the design of protocols | |
|
|
Data-Strobe Links
and
Virtual Channel Processors
Barry M. COOK
Department of Computer Science,
Keele University,
Keele,Staffordshire, ST5 5BG, UK
email: barry@cs.keele.ac.uk
|
| Abstract. Data-strobe links and message transfer protocols as used by the
SGS-Thomson T9000 processor are described, as are the essential
characteristics of the
supporting virtual channel processor. A method of providing the same
functionality without the use of a T9000 is suggested and illustrated by a T225
processor design using a software virtual channel processor and minimal
supporting hardware. Finally, differences between the international standard,
IEEE 1355, and the T9000 links from which it was derived are described and the
implications for virtual channel links highlighted. | |
|
|
The Macramé 1024 Node Switching Network
S. Haas 1;2 D.A. Thornley 3 M. Zhu 1;2
R.W. Dobinson 1;2 B. Martin 1
1 CERN, 1211 Geneva 23, Switzerland
2 University of Liverpool, Liverpool, UK
3 University of Kent, Canterbury, UK
10th March 1997
|
| Abstract. To date, practical experience in constructing switching networks using
IEEE 1355 technology [1] has been confined to relatively small systems and there
are no experimental results on how the performance of such systems will scale up
to several hundred or even several thousand nodes.
Some theoretical studies [2, 3] have been carried out for large networks of up to
one thousand nodes for different topologies. We present results obtained on a
large modular testbed using 100 Mbits/s point to point DS links. One thousand
nodes will be interconnected by a switching fabric based on the 32 way STC104
packet switch [4]. The system has been designed and constructed in a modular
way to allow a variety of different network topologies to be investigated (Clos,
grid, torus, etc.). Network throughput and latency are being studied for various
traffic conditions as a function of the topology and network size. Results obtained
with the current 656 node setup are presented.
The work presented has been carried out within the framework of the European
Union's Esprit 1 program as part of the Macramé [5] project (Esprit project
8603). | |
|
|
WEAVE
A System for Dynamic Configuration of Virtual Links
S.R.Harrison & C.R.Brown
A.I. Vision Research Unit
Psychology Department
University of Sheffield
S.Harrison@aivru.shef.ac.uk
|
| Abstract: This paper describes Weave, a system which has been developed to
support the use of a DS Link (IEEE 1355) based parallel computer architecture.
Weave is an extension layer on IIPC (a simple transputer and UNIX parallel
processing environment which provides dynamic process management, hardware
transparency and message passing.) Although Weave is suitable for any T9000
Transputer network, it has been specifically designed to support the use of the
AIVRU DS Link Vision Engine. A brief description is given of this machine,
which processes live digital video using a mixture of hardware modules and
software processes, all interconnected by DS Links.
Weave provides the ability to make virtual link connections between processes
on demand at run-time. These connections may be disconnected when no longer
required, and hence the whole hardware architecture is dynamically reconfigured
automatically to suit the requirements of the software application. A small
functional interface provides processes with the ability to alter their own
connectivity, and that of other processes. A temporal locking mechanism for each
virtual link controls when it may be disconnected, and when pending connection
requests can be fulfilled. This locking mechanism is driven by the action of
communication over the virtual link. The Weave system supports transparently,
the creation and destruction of connections between software processes and the
image processing hardware modules (and between hardware modules directly).
Also provided transparently by Weave is support for the use of hardware
message replicator module(s) that multicast virtual link data to any number of DS
Link recipients. | |
|
|
Expanding the Message Passing Library
Model with Nested Parallelism
Rodríguez C., Sande F., León C., García F
Dpto. Estadística, Investigación Operativa y Computación,
Universidad de La Laguna, Tenerife, Spain
|
| Abstract. A synchronous extension to the library model for message passing
(Inmos C, PVM, Parmacs, MPI, etc.) is presented. This extension, provides a
comfortable expression of nested parallelism from inside the message passing
model. Furthermore of being a valuable tool for the presentation and teaching of
parallel algorithms, the computational results prove that an efficiency similar to or
even better than the one obtained designing and implementing algorithms using the
native language can be achieved. | |
|
|
Beyond transputing : fully distributed
semantics in Virtuoso's Virtual
Single Processor programming model and it's
implementation on of-the-shelf
parallel DSPs.
Eric Verhulst
Eonic Systems Inc.
info@eonic.com
http://www.eonic.com
|
| Abstract. Virtuoso Classico /VSP is a fully distributed real-time operating
system originally developed on the INMOS transputer. Its generic architecture is
based on a small but very fast nanokernel and a portable preemptive microkernel.
It was further on ported in single and virtual single processor implementations to a
wide range of processors. As the basis of any real-time application is a scheduler
that allows the developer to use the minimum schedule to satisfy the real-time
requirements, a number of derived Virtuoso tools have been developed with
complementary functionalities. This paper describes the rationale for developing
the distributed semantics of Virtuoso's microkernel and describes some of the
implementation issues. The analysis is based on the parallel DSP implementations
as these push the performance limits most for hard real-time applications. The
Virtuoso tools are being ported and further developed in the DIPSAP-II and
EURICO OMI/Esprit projects. | |
|
|
occam for Multi-Processor DEC Alphas
Peter H. Welch and Michael D. Poole
Computing Laboratory, University of Kent at Canterbury, CT2 7NF.
fP.H.Welch, M.D.Pooleg@ukc.ac.uk
|
| Abstract. A multi-processor implementation of occam2.1 for interconnected
DEC
Alpha processors has been derived from the Kent Retargetable occam Compiler.
Each Alpha processor is supported over a PCI bus by a T425 transputer, whose
links complete the communications fabric. This paper reports on the software
mechanisms for supporting these platforms from occam so that they appear just
like any transputer system -- a collection of processing nodes connected by
channels placed on links.
Advantage was taken of a proprietary multi-threading kernel, supplied as part of
3L Parallel C/AXP, to support parallel inter-node communication. occam multi-
processing is supported by the KRoC kernel running within one of the 3L
threads. The performance of generated code and networked systems has been
benchmarked, with particular care being taken to measure the interaction
overheads between the Alpha and its communication fabric. An image analysis
program was also used in the benchmarking as an example of a real
multi-processor application. | |
|
|
A Multiprocessor OCCAM Development
System for UNIX Network Clusters
D.G. Patrick, P.R. Green and T. A. York
Department of Electrical Engineering and Electronics,
University of Manchester Institute of Science and Technology
|
| Abstract. This paper describes the OCCNIX multiprocessor environment, that
enables OCCAM program development and testing, on clusters of UNIX
workstations. Linked binary level interpreters form a virtual Transputer network
that uses the TCP/IP client-server model and provides hardware independent
multiple platform access in a similar way to the recently released JAVA. Results
show a single processor performance that is half that of an actual Transputer and
a four-processor speedup of 0.78. The system also has the ability to run
development tools such as the ISPY network debugger | |
|
|
A tool for optimisation of program execution
in dynamic topology systems
Tomasz KALINOWSKI
Institute of Computer Science, Polish Academy of Sciences,
ul. Ordona 21, 01-237 Warsaw, Poland
|
| Abstract. In this paper, we present a tool for optimisation of execution of parallel
programs in distributed memory multi-processor systems with dynamic
interconnection networks. The programs are described as Directed Acyclic
Graphs (DAGs).
The tool allows to compare simulated execution times for different task
scheduling heuristics, target system topologies and communication models. A list
scheduling algorithm, which has been applied, accounts for dynamic changes of
interconnection structure. We demonstrate the efficiency of dynamic networks by
comparing schedules obtained for dynamic and fixed topology systems.
We propose a method of validating simulation results in a target system
composed of T9000 transputers. The method relies on comparison of simulation
results with execution times of synthetic OCCAM applications in the target
system. The comparison indicates that assumptions taken on program execution
and system model hold in the system under investigation. | |
|
|
Higher Levels of Process Synchronisation
Peter H. Welch and David C. Wood
Computing Laboratory, University of Kent at Canterbury, CT2 7NF.
fP.H.Welch, D.C.Woodg@ukc.ac.uk
|
| Abstract. Four new synchronisation primitives (SEMAPHOREs, RESOURCEs,
EVENTs and BUCKETs) were introduced in the KRoC 0.8beta release of
occam for SPARC (SunOS/Solaris) and Alpha (OSF/1) UNIX workstations
[1][2][3]. This paper reports on the rationale, application and implementation of
two of these (SEMAPHOREs and EVENTs). Details on the other two may be
found on the web [4].
The new primitives are designed to support higher-level mechanisms of
SHARING between parallel processes and give us greater powers of expression.
They will also let greater levels of concurrency be safely exploited from future
parallel architectures, such as those providing (virtual) shared-memory. They
demonstrate that occam is neutral in any debate between the merits of
message-passing versus shared-memory parallelism, enabling applications to take
advantage of whichever paradigm (or mixture of paradigms) is the most
appropriate.
The new primitives could be (but are not) implemented in terms of traditional
channels, but only at the expense of increased complexity and computational
overhead. The primitives are immediately useful even for uni-processors -- for
example, the cost of a fair ALT can be reduced from O(n) to O(1). In fact, all the
operations associated with new primitives have constant space and time
complexities; and the constants are very low.
The KRoC release provides an Abstract Data Type interface to the primitives.
However, direct use of such mechanisms still allows the user to misuse them.
They must be used in the ways prescribed below else else their semantics
become unpredictable. No tool is provided to check correct usage at this level.
The intention is to bind those primitives found to be useful into higher level
versions of occam. Some of the primitives (e.g. SEMAPHOREs) may never
themselves be made visible in the language, but may be used to implement
bindings of higher-level paradigms (such as SHARED channels and
BLACKBOARDs). The compiler will perform the relevant usage checking on all
new language bindings, closing the security loopholes opened by raw use of the
primitives.
The paper closes by relating this work with the notions of virtual transputers,
microcoded schedulers, object orientation and Java threads. | |
|
|
Fine-grained global control constructs for parallel
programming environments
Marek TUDRUJ
Institute of Computer Science, Polish Academy of Sciences
01-237 Warsaw, ul. Ordona 21, Poland
|
| Abstract. Problems of evolved control in fine-grained parallel programs in
distributed memory systems are discussed in the paper. Global control constructs
are proposed which logically bind program modules, assign them to worker
processors and define the involved flow of control. Implementation methods are
discussed which assume control flow processing decoupled from data processing
inside executive modules. The proposed constructs are extensions of the
OCCAM 2 language. They can be incorporated into an intermediate code
generated by a parallel language compiler or can be used by a programmer to
define control flow between fine-grained program modules assigned to different
processors. Architectural requirements for efficient implementation of the
proposed control constructs are discussed | |
|
|
An Open Systems Strategy
for Distributed occam Execution
Paul SINGLETON (paul@cs.keele.ac.uk)
Barry M. COOK (barry@cs.keele.ac.uk)
|
| Abstract. We demonstrate the feasibility of distributed execution of occam
programs within computer networks which support open systems standards for
inter-process communication, remote execution and program hosting (e.g.
hardware-independent programming languages and
operating-system-independent APIs). We first propose a general architecture,
and then describe a constructive proof of its viability (i.e. a prototype). which
uses a novel multiplexed virtual channel protocol over UNIX sockets. Finally we
summarise some of the opportunities for further development which this project
has created. | |
|
|
|