Proceedings details

Title: Communicating Process Architectures 2013
Editors: Peter H. Welch, Frederick R. M. Barnes, Jan F. Broenink, Kevin Chalmers, Jan Bækgaard Pedersen, Adam T. Sampson
Publisher: Open Channel Publishing Ltd., Bicester
ISBN: 978-0-9565409-7-3

Keynote presentations

Title:	Costing by Construction
Authors:	Greg Michaelson
Abstract:	Predicting the performance of sequential systems is hard and concurrency brings additional complexities of coordination and timing. However, current models of concurrency tend to be unitary and so conflate computation and coordination. In contrast, the Hume language is based on concurrent generalised finite state boxes linked by wires. Boxes are stateless with transitions driven by pattern matching to select actions in a full strength functional language. This explicit separation of coordination and computation greatly eases concurrent system modelling: classical inductive reasoning may be used to establish properties within boxes, while box coordination may be explored independently through the novel box calculus. This seminar gives an introduction to the Hume language, cost models for Hume, and the box calculus, and considers how they might be integrated in a system to support costing by construction, where the resource implications of design decisions are made manifest as a system evolves.
Bibliography:	Web page:BibTEX, Refer Plain text: BibTEX, Refer
Files:	Abstract (PDF), Slides (PDF)

Title:	National HPC Facilities at EPCC: Exploiting Massively Parallel Architectures for Scientific Simulation
Authors:	Andrew Turner
Abstract:	This presentation gives an overview of the challenges facing scientific software developers in exploiting current and upcoming massively parallel HPC facilities. The different levels of parallelism available in the architectures will be discussed, together with their impact for software design and what the trend is for the future. The discussion will be illustrated with examples from the two national facilities currently hosted at EPCC: HECToR (a Cray XE6) and the DiRAC IBM BlueGene/Q.
Bibliography:	Web page:BibTEX, Refer Plain text: BibTEX, Refer
Files:	Abstract (PDF), Slides (PDF)

Papers

Title:	The Meaning and Implementation of SKIP in CSP
Authors:	Thomas Gibson-Robinson, Michael Goldsmith
Abstract:	The CSP model checker FDR has long supported Hoare's termination semantics for CSP, but has not supported the more theoretically complete construction of Roscoe's, largely due to the complexity of adding a second termination semantics. In this paper we provide a method of simulating Roscoe's termination semantics using the Hoare termination semantics and then prove the equivalence of the two different approaches. We also ensure that FDR can support the simulation reasonably efficiently.
Bibliography:	Web page:BibTEX, Refer Plain text: BibTEX, Refer
Files:	Paper (PDF), Slides (PDF)

Title:	Successful Termination in Timed CSP
Authors:	Paul Howells, Mark d'Inverno
Abstract:	In previous work the authors investigated the inconsistencies of how successful termination was modelled in Hoare, Brookes and Roscoe's original CSP. This led to the definition of a variant of CSP, called CSPt. CSPt presents a solution to these problems by means of adding a termination axiom to the original process axioms. In this paper we investigate how successful process termination is modelled in Reed and Roscoe's Timed CSP, which is the temporal version of Hoare's original untimed CSP. We discuss the issues that need to be considered when selecting termination axioms for Timed CSP, based on our experiences in defining CSPt. The outcome of this investigation and discussion is a collection of candidate successful termination axioms that could be added to the existing Timed CSP models, leading to an improved treatment of successful termination within the Timed CSP framework. We outline how these termination axioms would be added to the family of semantic models for Timed CSP. Finally, we outline what further work needs to be done once these new models for Timed CSP have been defined. For example, it would then be possible to define timed versions of the new more flexible parallel operators introduced in CSPt.
Bibliography:	Web page:BibTEX, Refer Plain text: BibTEX, Refer
Files:	Paper (PDF), Slides (PDF)

Title:	Verifying the CPA Networking Stack using SPIN/Promela
Authors:	Kevin Chalmers, Jon Kerridge
Abstract:	This paper presents a verification of the CPA Networking Stack, using the SPIN Model Checker. Our work shows that the system developed for general networking within CPA applications works under the conditions defined for it. The model itself focuses on ensuring deadlock freedom, and work still needs to be undertaken to verify expected behaviour of the architecture.
Bibliography:	Web page:BibTEX, Refer Plain text: BibTEX, Refer
Files:	Paper (PDF), Slides (PDF)

Title:	Improving the Performance of Periodic Real-time Processes: a Graph Theoretical Approach
Authors:	Antoon H. Boode, Hajo Broersma, Jan F. Broenink
Abstract:	In this paper the performance gain obtained by combining parallel periodic real-time processes is elaborated. In certain single-core mono-processor configurations, for example embedded control systems in robotics comprising many short processes, process context switches may consume a considerable amount of the available processing power. For this reason it can be advantageous to combine processes, to reduce the number of context switches and thereby increase the performance of the application. As we consider robotic applications only, often consisting of processes with identical periods, release times and deadlines, we restrict these configurations to periodic real-time processes executing on a single-core mono-processor. By graph theoretical concepts and means, we provide necessary and sufficient conditions so that the number of context switches can be reduced by combining synchronising processes.
Bibliography:	Web page:BibTEX, Refer Plain text: BibTEX, Refer
Files:	Paper (PDF), Slides (PDF)

Title:	Scaling PyCSP
Authors:	Rune Møllegard Friborg, John Markus Bjørndalen, Brian Vinter
Abstract:	PyCSP is intended to help scientists develop correct, maintainable and portable code for emerging architectures. The library uses concepts from Communicating Sequential Processes, CSP, and is implemented in the Python programming language. This paper introduces a new channel implementation and new process types for PyCSP that are intended to simplify writing programs for clusters. The new processes and channels are investigated by running 3 benchmarks on two separate clusters, using up to 512 CPU cores. The results show that PyCSP scales well, with close to linear scaling for some of the benchmarks.
Bibliography:	Web page:BibTEX, Refer Plain text: BibTEX, Refer
Files:	Paper (PDF), Slides (PDF)

Title:	Service Oriented Programming in MPI
Authors:	Sarwar Alam, Humaira Kamal, Alan Wagner
Abstract:	In this paper we introduce a service oriented approach to the design of distributed data structures for MPI. Using this approach we present the design of an ordered linked-list structure. The implementation relies on Fine-Grain MPI (FG-MPI) and its support for exposing fine-grain concurrency. We describe the implementation of the service and show how to compose and map it onto a cluster. We experiment with the service to show how its behaviour can be adjusted to match the application and the machine.
Bibliography:	Web page:BibTEX, Refer Plain text: BibTEX, Refer
Files:	Paper (PDF), Slides (PDF)

Title:	Scalable Performance for Scala Message-Passing Concurrency
Authors:	Andrew Bate
Abstract:	This paper presents an embedded domain-specific language for building massively concurrent systems. In particular, we demonstrate how ultra-lightweight cooperatively-scheduled processes and message-passing concurrency can be provided for the Scala programming language on the Java Virtual Machine (JVM). We make use of a well-known continuation-passing style bytecode transformation in order to achieve performance that is several orders of magnitude higher than native JVM threads. Our library is capable of scaling to millions of processes and messages on a single virtual machine instance, and our runtime system will detect deadlock should it occur. Message-passing is over 100 times faster than Erlang, and context switching is 1000 times faster than native Java threads. In benchmarks, the performance of our library is close to compiled code.
Bibliography:	Web page:BibTEX, Refer Plain text: BibTEX, Refer
Files:	Paper (PDF), Slides (PDF)

Title:	ProcessJ: A Possible Future of Process-Oriented Design
Authors:	Jan Bækgaard Pedersen, Marc L. Smith
Abstract:	We propose ProcessJ as a new, more contemporary programming language that supports process-oriented design, which raises the level of abstraction and lowers the barrier of entry for parallel and concurrent programming. ProcessJ promises verifiability (e.g., deadlock detection), based on Hoare's CSP model of concurrency, and existing model checkers like FDR. Process-oriented means processes compose, unlike thread-based or asynchronous message-passing models of concurrency; this means that programmers can incrementally define larger and larger concurrent processes without concern for undesirable nondeterminism or unexpected side effects. Processes at their lowest, most granular level are sequential programs; there are no global variables, so no race conditions, and the rules of parallel composition are functional in nature, not imperative, and based on the mathematically sound CSP process algebra. Collectively, these ideas raise the level of abstraction for concurrency; they were successful once before with the occam language and the Transputer. We believe their time has come again, and will not go away, in this new age of multi-core processors. Computers have finally caught up with CSP and process-oriented design. We believe that ProcessJ can be the programming language that provides a bridge from today's languages to tomorrow's concurrent programs. Learning or teaching the programming model and language will be greatly supported through the educational part of the proposed project, which includes course templates and an online teaching tool that integrates in-browser programming with teaching material. Our efforts are encouraged by the forthcoming 2013 IEEE and ACM curricula guidelines, which for the first time include concurrent programming as a core knowledge area at the undergraduate level.
Bibliography:	Web page:BibTEX, Refer Plain text: BibTEX, Refer
Files:	Paper (PDF), Slides (PDF)

Title:	Specifying and Analysing Networks of Processes in CSPt (or In Search of Associativity)
Authors:	Paul Howells, Mark d'Inverno
Abstract:	In proposing theories of how we should design and specify networks of processes it is necessary to show that the semantics of any language we use to write down the intended behaviours of a system has several qualities. First in that the meaning of what is written on the page reflects the intention of the designer; second that there are no unexpected behaviours that might arise in a specified system that are hidden from the unsuspecting specifier; and third that the intention for the design of the behaviour of a network of processes can be communicated clearly and intuitively to others. In order to achieve this we have developed a variant of CSP, called CSPt, designed to solve the problems of termination of parallel processes present in the original formulation of CSP. In CSPt we introduced three parallel operators, each with a different kind of termination semantics, which we call synchronous, asynchronous and race. These operators provide specifiers with an expressive and flexible tool kit to define the intended behaviour of a system in such a way that unexpected or unwanted behaviours are guaranteed not to take place. In this paper we extend out analysis of CSPt and introduce the notion of an alphabet diagram that illustrates the different categories of events that can arise in the parallel composition of processes. These alphabet diagrams are then used to analyse networks of three processes in parallel with the aim of identifying sufficient constraints to ensure associativity of their parallel composition. Having achieved this we then proceed to prove associativity laws for the three parallel operators of CSPt. Next, we illustrate how to design and construct a network of three processes that satisfy the associativity law, using the associativity theorem and alphabet diagrams. Finally, we outline how this could be achieved for more general networks of processes.
Bibliography:	Web page:BibTEX, Refer Plain text: BibTEX, Refer
Files:	Paper (PDF), Slides (PDF)

Title:	Efficient Simulation of CSP-Like Languages
Authors:	Thomas Gibson-Robinson
Abstract:	In "On the Expressiveness of CSP", Roscoe provides a construction that, given the operational semantics rules of a CSP-like language and a process in that language, constructs a strongly bisimilar CSP process. Unfortunately, the construction provided is difficult to use and the scripts produced cannot be compiled by the CSP model-checker, FDR. In this paper, we adapt Roscoe's simulation to make it produce a process that can be checked relatively efficiently by FDR. Further, we extend Roscoe's simulation in order to allow recursively defined processes to be simulated in FDR, which was not supported by the original simulation. We also describe the construction of a tool that can automatically construct the simulation, given the operational semantics of the language and a script to simulate, both in an easy-to-use format.
Bibliography:	Web page:BibTEX, Refer Plain text: BibTEX, Refer
Files:	Paper (PDF), Slides (PDF)

Title:	Selective Choice "Feathering" with XCHANs
Authors:	Øyvind Teig
Abstract:	This paper suggests an additional semantics to XCHANs, where a sender to a synchronous channel that ends up as a component in a receiver's selective choice (like ALT) may (if wanted) become signaled whenever the ALT has been (or is being) set up with the actual channel not in the active channel set. Information about this is either received as the standard return on XCHAN's attempted sending or on the built-in feedback channel (called x-channel) if initial sending failed. This semantics may be used to avoid having to send (and receive) messages that have been seen as uninteresting. We call this scheme feathering, a kind of low level implicit subscriber mechanism. The mechanism may be useful for systems where channels that were not listened to while listening on some other set of channels, will not cause a later including of those channels to carry already declared uninteresting messages. It is like not having to treat earlier bus-stop arrival messages for the wrong direction after you sit on the first arrived bus for the correct direction. The paper discusses the idea as far as possible, since modeling or implementation has not been possible. This paper's main purpose is to present the idea.
Bibliography:	Web page:BibTEX, Refer Plain text: BibTEX, Refer
Files:	Paper (PDF), Slides (PDF)

Title:	The Distributed Application Debugger
Authors:	Michael Quinn Jones, Jan Bækgaard Pedersen
Abstract:	In this paper we introduce a tool for debugging parallel programs which utilize the popular MPI (message passing interface) C library. The tool is called The Distributed Application Debugger and introduces distinct components around the code being debugged in order to play, record, and replay sessions of the user's code remotely. The GUI component incorporates the popular GDB debugger for stepping through code line by line as it runs on the distributed cluster as well as an analysis tool for stepping through the executed portions of code after the session has completed. While the system is composed of multiple components, the logistics of coordinating them is managed without user interaction; requiring the user to only provide the location of the program to debug before starting.
Bibliography:	Web page:BibTEX, Refer Plain text: BibTEX, Refer
Files:	Paper (PDF), Slides (PDF)

Title:	BPU Simulator
Authors:	Martin Rehr, Kenneth Skovhede, Brian Vinter
Abstract:	A number of scientific applications start their life as a Matlab prototype that is later re-implemented in a low level programming language, typically C++ or Fortran for the sake of performance. Bohrium is a project that seeks to eliminate both the cost and the potential errors introduced in that process. Our goal is to support all execution platforms, and in this work we introduce the Bohrium Processing Unit, BPU, which will be the FPGA backend for Bohrium. The BPU is modeled as a PyCSP application, and the clear advantages of using CSP for simulating a new CPU is described. The current PyCSP simulator is able to simulate 2^20 Monte Carlo simulations in less than 35 seconds in the smallest BPU simulation.
Bibliography:	Web page:BibTEX, Refer Plain text: BibTEX, Refer
Files:	Paper (PDF), Slides (PDF)

Title:	Exploring GPGPU Acceleration of Process-Oriented Simulations
Authors:	Frederick R. M. Barnes, Thomas Pressnell, Brendan Le Foll
Abstract:	This paper reports on our experiences of using commodity GPUs to speed-up the execution of fine-grained concurrent simulations. Starting with an existing process-oriented "boids" simulation, we explore a variety of techniques aimed at improving performance, gradually refactoring the original code. Successive improvements lead to a 10-fold improvement in performance, which we believe can still be improved upon, allowing us to explore simulations with larger numbers of agents (30,000 rather than 2,000) interactively and without significant performance degradation.
Bibliography:	Web page:BibTEX, Refer Plain text: BibTEX, Refer
Files:	Paper (PDF), Slides (PDF)

Title:	A Personal Perspective on the State of HPC in 2013
Authors:	Christopher C.R. Jones
Abstract:	This paper is fundamentally a personal perspective on the sad state of High Performance Computing (HPC, or what was known once as Supercomputing). It arises from the author's current experience in trying to find computing technology that will allow codes to run faster: codes that have been painstakingly adapted to efficient performance on parallel computing technologies since around 1990, and have allowed effective 10-fold increases in computing performance at 5 year HPC up-grade intervals, but for which the latest high-count multi-core processor options fail to deliver improvement. The presently available processors may as well not have the majority of their cores as to use them actually slows the code - hard-won budget must be squandered on cores that will not contribute. The present situation is not satisfactory: there are very many reasons why we need computational response, not merely throughput. There are a host of cases where we need a large, complex simulation to run in a very short time. A simplistic calculation based on the nominal performance of some of the big machines with vast numbers of cores would lead one to believe that such rapid computation would be possible. The nature of the machines and the programming paradigms, however, remove this possibility. Some of the ways in which the computer science community could mitigate the hardware shortfalls are discussed, with a few more off the wall ideas about where greater compute performance might be found.
Bibliography:	Web page:BibTEX, Refer Plain text: BibTEX, Refer
Files:	Paper (PDF)

Title:	An Evaluation of Intel's Restricted Transactional Memory for CPAs
Authors:	Carl G. Ritson, Frederick R. M. Barnes
Abstract:	With the release of their latest processor microarchitecture, codenamed Haswell, Intel added new Transactional Synchronization Extensions (TSX) to their processors' instruction set. These extensions include support for Restricted Transactional Memory (RTM), a programming model in which arbitrary sized units of memory can be read and written in an atomic manner. This paper describes the low-level RTM programming model, benchmarks the performance of its instructions and speculates on how it may be used to implement and enhance Communicating Process Architectures.
Bibliography:	Web page:BibTEX, Refer Plain text: BibTEX, Refer
Files:	Paper (PDF), Slides (PDF)

Title:	Life of occam-Pi
Authors:	Peter H. Welch
Abstract:	This paper considers some questions prompted by a brief review of the history of computing. Why is programming so hard? Why is concurrency considered an "advanced" subject? What's the matter with Objects? Where did all the Maths go? In searching for answers, the paper looks at some concerns over fundamental ideas within object orientation (as represented by modern programming languages), before focussing on the concurrency model of communicating processes and its particular expression in the occam family of languages. In that focus, it looks at the history of occam, its underlying philosophy (Ockham's Razor), its semantic foundation on Hoare's CSP, its principles of process oriented design and its development over almost three decades into occam-pi (which blends in the concurrency dynamics of Milner's pi-calculus). Also presented will be an urgent need for rationalisation - occam-pi is an experiment that has demonstrated significant results, but now needs time to be spent on careful review and implementing the conclusions of that review. Finally, the future is considered. In particular, is there a future?
Bibliography:	Web page:BibTEX, Refer Plain text: BibTEX, Refer
Files:	Paper (PDF), Slides (PDF), Slides (PowerPoint)

Fringe presentations

Title:	Mutually Assured Destruction (or the Joy of Sync)
Authors:	Peter H. Welch, Jan Bækgaard Pedersen, Frederick R. M. Barnes
Abstract:	In contrast to the Client-Server pattern, Mutually Assured Destruction (MAD) allows the safe opening of communication by either of two processes with the other. Should both processes commit to opening conversation together, both are immediately aware of this and can take appropriate action - there is no deadlock, even though the communications are synchronous. A common need for this pattern is in real-time control systems (e.g. robotics), artificial intelligence, e-commerce, model checking and elsewhere. A typical scenario is when two processes are given a problem to solve; we are satisfied with a solution to either one of them; whichever process solves its problem first kills the other and makes a report; the one that is killed also reports that fact. In this case, the opening communication between the two processes is the only communication (a kill signal). If both solve their problem around the same time and try to kill each other, MAD is the desired outcome (along with making their reports). A simple and safe solution (verified with FDR) for this pattern with occam synchronous communications is presented. This is compared with solutions via asynchronous communications. Although these avoid the potential deadlock that a naive strategy with synchronous communications would present, they leave a mess that has to be tidied up and this tidying adds complexity and run-time cost. The Joy of Sync arises from the extra semantic information provided by synchronous communications - that if a message has been sent, the receiving process has taken it. With this knowledge comes power and with that power comes the ability to implement higher level synchronisation patterns (such as MAD and Non-blocking Syncs) in ways that are simple, verifyably safe and impose very low overheads.
Bibliography:	Web page:BibTEX, Refer Plain text: BibTEX, Refer
Files:	Abstract (PDF), Slides (PDF), Slides (PowerPoint)

Title:	FDR3: the Future of CSP Model Checking
Authors:	Thomas Gibson-Robinson
Abstract:	Over the last couple of years a brand new version of FDR, FDR3, has been under development at Oxford. This includes greatly improved performance (typically 2.5 times faster than FDR2 on a single core) and, for the first time, a parallel model-checking mode that scales almost linearly with the number of available cores. In this talk I will give a demonstration of FDR3, including the new debug viewer, the integrated version of ProBE, and the enhanced error messages. FDR3 is due for general release this October.
Bibliography:	Web page:BibTEX, Refer Plain text: BibTEX, Refer
Files:	Abstract (PDF), Slides (PDF)

Title:	Using FDR to Model Check CSP-Like Languages
Authors:	Thomas Gibson-Robinson
Abstract:	This talk presents a demonstration of tyger, a tool that has been constructed to allow a large class of CSP-like languages to be simulated in CSP. In particular, FDR will be used to demonstrate model checking of a simple CCS version of the classic Dining Philosophers problem. The theory behind this demonstration will be presented in the paper "Efficient Simulation of CSP-Like Languages".
Bibliography:	Web page:BibTEX, Refer Plain text: BibTEX, Refer
Files:	Abstract (PDF)

Title:	An Introduction to Go
Authors:	Rick D. Beton
Abstract:	Go is a new open-source programming language from a team of Google's developers. It provides a modern garbage-collected runtime that has particular support for concurrency. The pattern for this is based on Hoare's CSP, using channels for synchronization and communication, and discouraging the direct sharing of variables. Processes are light-weight co-operative threads that the runtime time-slices onto underlying operating system threads; this allows an application's natural concurrency to be expressed as fully as needed by the natural concurrency of the application domain. The presentation provides an overview of Go's main features, covering in particular the unusual interfaces and duck-typing, the reflection and the concurrency. Focussing on concurrency in detail, as an example, a simple web-crawler application is converted step by step to operate concurrently, and at each stage this allows a particular feature of Go's concurrency to be demonstrated.
Bibliography:	Web page:BibTEX, Refer Plain text: BibTEX, Refer
Files:	Abstract (PDF), Slides (PDF), Code (Zip)

Title:	The Guppy Language: an Update
Authors:	Frederick R. M. Barnes
Abstract:	As a possible successor and perhaps replacement language for occam-pi, we have been working piecemeal for the past few years on a new language - Guppy. Rather than being a completely new language, it aims to rationalise and simplify the concurrent programming concepts and idioms that we have explored whilst developing occam-pi. This short talk will look at the current state of the language and its implementation, that is now able to compile and run a simple "commstime" benchmark, scheduled by the existing occam-pi run-time system (CCSP).
Bibliography:	Web page:BibTEX, Refer Plain text: BibTEX, Refer
Files:	Abstract (PDF), Slides (PDF)

Title:	An occam Model of XCHANs
Authors:	Peter H. Welch
Abstract:	Øyvind Teig, in "XCHANs: Notes on a New Channel Type", proposed a higher level channel construct (XCHAN) that attempts to reconcile those wedded to asynchronous message passing with the synchronous form in CSP. Sending a message does not block the sender, but the message may not get sent: the sender receives a success/fail result on each send. The XCHAN provides a conventional feedback channel on which it signals when it is ready to take a message. Being ready means that it has space (if it is buffered) or a reading process has committed to take the message (if it is not buffered). Sending to a ready XCHAN always succeeds; sending to an XCHAN that is not ready always fails. The sender can always wait for the signal from the XCHAN (whilst ALTing on, and processing, other events) before sending. We can model an XCHAN by a process in occam-pi. Buffered XCHANs are easy. Zero-buffered XCHANs are a little harder, because we need to maintain end-to-end synchronisation. However, occam-pi's extended input (??) and output (!!) primitives enable the process implementing the XCHAN to be hidden from its users. Unfortunately, extended outputs are not yet in the language, but their semantics can be simulated by making the receiving process read twice and ignore the first (which is just a signal whose taking must commit the reader to its second read). An important message is that sane higher level synchronisation mechanisms are usually not hard to implement efficiently via the low level CSP primitives offered by occam-pi. Although not yet measured for XCHANs, it is likely that such simulation in occam-pi will have competitive performance with direct implementation elsewhere.
Bibliography:	Web page:BibTEX, Refer Plain text: BibTEX, Refer
Files:	Abstract (PDF), Slides (PDF), Slides (PowerPoint)

Title:	Names of XCHAN Implementations
Authors:	Øyvind Teig
Abstract:	Two names for possible XCHAN implementations are suggested. The original presentation (Teig, CPA-2012) describes the "classic" scheme where the xchan-ready channel is used only if the original sending fails. The occam-pi model (Welch, CPA-2013 fringe) uses the "pre-confirmed" scheme, where a signal on xchan-ready is a necessary precondition to any communication. It is believed that "feathering" (Teig, CPA-2013) seems to be possible only with the classic scheme.
Bibliography:	Web page:BibTEX, Refer Plain text: BibTEX, Refer
Files:	Abstract (PDF), Slides (PDF)

If you have any comments on this database, including inaccuracies, requests to remove or add information, or suggestions for improvement, the WoTUG web team are happy to hear of them. We will do our best to resolve problems to everyone's satisfaction.

Copyright for the papers presented in this database normally resides with the authors; please contact them directly for more information. Addresses are normally presented in the full paper.