High Performance Computing and Communications Glossary 2.1

A significant part of the material of this glossary was adapted from material originally written by Gregory V. Wilson which appeared as "A Glossary of Parallel Computing Terminology" (IEEE Parallel & Distributed Technology, February 1993), and is being re-printed in the same author's "Practical Parallel Programming" (MIT Press, 1995). Several people have contributed additions to this glossary, especially Jack Dongarra, Geoffrey Fox and many of my colleagues at Edinburgh and Syracuse.

Original version is from NPAC at <URL:http://nhse.npac.syr.edu/hpccgloss/>

Original author: Ken Hawick, khawick@cs.adelaide.edu.au

S

satisfiable (adj.) Of a logical formula: capable of being made true by some assignment of truth values to its variables.

SAXPY (n.) elemental BLAS operation involving "constant (a) times vector (x) plus vector (y)". The S refers to Fortran Single precision. SAXPY and related BLAS operations are often implemented by the hardware manufacturer as part of the system software, and the execution time for such operations has been used as a primitive benchmark of a high performance computing system.
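
As an illustration, a minimal C sketch of the operation (the signature is illustrative; the reference BLAS routine also takes stride arguments):

    /* SAXPY: y(i) = a * x(i) + y(i), in single precision */
    void saxpy(int n, float a, const float *x, float *y)
    {
        for (int i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }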

scalable (adj.) Capable of being increased in size, or more accurately, capable of delivering an increase in performance proportional to an increase in size. A scalable architecture is one that can be used as a design for arbitrarily large machines, or one whose performance increases linearly with the amount of hardware invested. The term is also applied to programming systems, although its meaning is less clear in these cases; it is generally used to imply that the methods and algorithms employed in such systems are in principle capable of performing equally well on large and small hardware systems. See also Gustafson's Law.

ScaLAPACK (n.) A linear algebra software package, which has been mounted on a wide range of platforms. This is a version of LAPACK suitable for distributed memory computer systems. The software is available from the National Software Exchange. See also LAPACK.

scalar processor (n.) A computer in the traditional Von Neumann sense of operating only on scalar data. See also uniprocessor, vector processor, array processor.

scalar temporary (n.) a compiler-created scalar variable that holds the value of a vectorizable expression on each iteration of a DO-loop.

scan (n.) See parallel prefix.

scan-vector model (n.) A theoretical model of parallel computing in which a scalar processor and a vector processor have access, respectively to a memory holding scalar values and a memory holding vectors of arbitrary length. Vector operations take either a single time step or a time proportional to the logarithm of the number of elements. See parallel prefix, data parallelism, reduction operation.

scattered decomposition (n.) See decomposition.

scheduling (n.) Deciding the order in which the calculations in a program are to be executed, and by which processes. Allocating processes to processors is usually called mapping. See also load balance.

scoreboard (n.) A hardware device that maintains the state of machine resources to enable instructions to execute without conflict at the earliest opportunity.

SCSI (n.) Small Computer Systems Interface; a hardware standard for interfacing to devices such as disks.

secondary memory (n.) a larger but slower memory than primary memory. Access to secondary memory often requires special instructions, such as I/O instructions.

segmented parallel prefix operation (n.) A parallel prefix operation in which the vector being operated on is divided into segments, and the operator is applied to each segment as if it were a separate vector. This is usually implemented by supplying a separate vector of segment lengths.
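
A sequential C sketch of the semantics for a segmented running sum, with the segment boundaries supplied as a separate vector of segment lengths (illustrative only; a real implementation would use a parallel scan):

    /* Segmented prefix sum: the running sum restarts at each segment
       boundary, as if each segment were a separate vector. */
    void segmented_prefix_sum(int nseg, const int *seglen, double *v)
    {
        int base = 0;
        for (int s = 0; s < nseg; s++) {
            for (int i = 1; i < seglen[s]; i++)
                v[base + i] += v[base + i - 1];
            base += seglen[s];
        }
    }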

self-scheduling (adj.) Automatically allocating work to processes. If T tasks are to be done by P processors, and P < T, then they may be self-scheduled by keeping them in a central pool from which each processor claims a new job when it finishes executing its old one. See also task farming.
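
One common realization keeps the pool as a shared counter that each worker increments atomically; a hypothetical C11 sketch, in which do_task stands in for the application's work routine:

    #include <stdatomic.h>

    extern void do_task(int t);   /* hypothetical application work routine */

    atomic_int next_task;         /* the central pool: next unclaimed task */

    void worker(int num_tasks)
    {
        for (;;) {
            int t = atomic_fetch_add(&next_task, 1);  /* claim a new job */
            if (t >= num_tasks)
                break;                                /* pool exhausted */
            do_task(t);
        }
    }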

semaphore (n.) A data type for controlling concurrency. A semaphore can be initialised to any non-negative integer value. After that, only two operations may be applied to it: "signal", which increments the semaphore's value by one, and "wait", which blocks its caller until the semaphore's value is greater than zero, then decrements it. The value of a semaphore typically represents the amount of some resource that is currently available, and waiting on the semaphore forces processes to block until some of that resource can be claimed. A binary semaphore is one that can only take on the values 0 and 1.
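
For example, a counting semaphore might guard a pool of n identical resources; a sketch using POSIX semaphores (the names init_pool, acquire and release are illustrative):

    #include <semaphore.h>

    sem_t pool;

    void init_pool(unsigned n) { sem_init(&pool, 0, n); }  /* n resources free */
    void acquire(void) { sem_wait(&pool); }  /* "wait": block until value > 0, then decrement */
    void release(void) { sem_post(&pool); }  /* "signal": increment the value by one */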

sequential bottleneck (n.) A part of a computation for which there is little or no parallelism. See also Amdahl's Law.

sequential computer (n.) Synonymous with a Von Neumann architecture computer; a "conventional" computer in which only one processing element works on a problem at a given time. See also uniprocessor.

serialize (v.) To put potentially concurrent operations in a strictly sequential order. If concurrent processes must claim a lock before doing some operation, for example, then their operations will be serialized.

service kernel (n.) See kernel.

set associative (adj.) A cache structure in which all tags in a particular set are compared with an access key in order to access an item in cache. The set may have as few as one element or as many elements as there are lines in the full cache.

shared memory (n.) Memory that appears to the user to be contained in a single address space and that can be accessed by any process. In a uniprocessor or multiprocessor there is typically a single memory unit, or several memory units interleaved to give the appearance of a single memory unit. See also disjoint memory, distributed memory.

shared variables (n.) Variables to which two or more processes have access, or the model of parallel computing in which interprocess communication and synchronization are managed through such variables. See also data parallelism, futures, generative communication, live variable, message passing.

short necklace (n.) a necklace of length less than k in a shuffle exchange network containing 2^k nodes.

shortstopping (v.) using the output of a functional unit before it is routed back to memory.

shuffle exchange network (n.) A topology containing N = 2^L nodes, each of which is labeled by a unique L-bit integer. If two nodes have labels I = I(L-1)...I(1)I(0) and J = J(L-1)...J(1)J(0), then I and J are connected if I(k) = J(k) for 1 <= k <= L-1 and I(0) <> J(0) (an exchange link), or if J is a left or right cyclic shift of I (a shuffle link). See also butterfly, hypercube.
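
A small C sketch of the neighbor functions for an L-bit node label i (illustrative; exchange flips bit 0, while shuffle and unshuffle are the left and right cyclic shifts):

    /* Neighbors of node i in a shuffle exchange network with N = 2^L nodes */
    unsigned exchange(unsigned i)         { return i ^ 1u; }

    unsigned shuffle(unsigned i, int L)   /* left cyclic shift of L bits */
    {
        unsigned mask = (1u << L) - 1;
        return ((i << 1) | (i >> (L - 1))) & mask;
    }

    unsigned unshuffle(unsigned i, int L) /* right cyclic shift of L bits */
    {
        unsigned mask = (1u << L) - 1;
        return ((i >> 1) | ((i & 1u) << (L - 1))) & mask;
    }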

SIMD (adj.) Single instruction, multiple data; a category of Flynn's taxonomy in which a single instruction stream is concurrently applied to multiple data sets. A SIMD architecture is one in which homogeneous processes synchronously execute the same instructions on their own data, or one in which an operation can be executed on vectors of fixed or varying size. See also array processor, processor array, vector processor.

SIMD-CC (n.) cube-connected (hypercube) SIMD architecture.

SIMD-CCC (n.) cube connected cycles SIMD architecture. See also SIMD-CC.

SIMD-MC (n.) mesh connected SIMD architecture. Superscript refers to the number of dimensions in the architecture.

SIMD-P (n.) SIMD architecture with pyramid processor organization.

SIMD-PS (n.) SIMD architecture with perfect shuffle connections.

SIMD-SM (n.) shared memory SIMD architecture. Two processors may not read from or write into the same shared memory location during the same instruction.

SIMD-SM-R (n.) shared memory SIMD architecture. Two processors may read from the same memory location during an instruction, but concurrent writing into the same location is forbidden.

SIMD-SM-RW (n.) shared memory SIMD architecture. Concurrent reading from and writing to the same memory location is allowed.

SNMP (n.) simple network management protocol; a tool used in TCP/IP based networks to manage network equipment and processes, usually presented graphically on an X-window display.

simple network management protocol (n.) See SNMP.

simulated annealing (n.) An optimization technique introduced by Kirkpatrick in 1983 which uses statistical physics methods to find an approximate optimal solution to a problem. Typically a thermodynamic analogy is used for the model system under study and the task of finding an optimal solution is mapped to that of finding the ground state of the thermodynamic system.
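
The core of the method is an accept/reject loop in which downhill moves are always taken, uphill moves are accepted with probability exp(-dE/T), and the temperature T is gradually lowered. A hypothetical C sketch, where trial_move, accept_move and reject_move stand in for problem-specific code:

    #include <math.h>
    #include <stdlib.h>

    extern double trial_move(void);   /* propose a move; return the energy change dE */
    extern void   accept_move(void);
    extern void   reject_move(void);

    static double rand01(void) { return rand() / (RAND_MAX + 1.0); }

    void anneal(double T, double Tmin, double alpha)
    {
        while (T > Tmin) {
            double dE = trial_move();
            if (dE <= 0.0 || rand01() < exp(-dE / T))
                accept_move();    /* downhill always; uphill with prob. exp(-dE/T) */
            else
                reject_move();
            T *= alpha;           /* geometric cooling, e.g. alpha = 0.999 */
        }
    }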

single instruction, multiple data (adj.) See SIMD.

single instruction, single data (adj.) See SISD.

single program, multiple data (adj.) See SPMD.

single-source shortest-path problem (n.) problem of finding the shortest path from a single designated vertex (the source) to all the other vertices in a weighted, directed graph.
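
Dijkstra's algorithm is one standard solution when all edge weights are non-negative; an O(V^2) C sketch for a dense adjacency matrix, where w[u*n + v] is the weight of edge (u,v) and INF means no edge:

    #include <limits.h>

    #define INF INT_MAX

    /* dist and done are caller-supplied arrays of length n */
    void dijkstra(int n, const int *w, int src, int *dist, int *done)
    {
        for (int v = 0; v < n; v++) { dist[v] = INF; done[v] = 0; }
        dist[src] = 0;
        for (int round = 0; round < n; round++) {
            int u = -1;                          /* closest unsettled vertex */
            for (int v = 0; v < n; v++)
                if (!done[v] && dist[v] != INF && (u < 0 || dist[v] < dist[u]))
                    u = v;
            if (u < 0) break;                    /* remaining vertices unreachable */
            done[u] = 1;
            for (int v = 0; v < n; v++)          /* relax edges leaving u */
                if (w[u*n + v] != INF && dist[u] + w[u*n + v] < dist[v])
                    dist[v] = dist[u] + w[u*n + v];
        }
    }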

SISD (adj.) Single instruction, single data; a category of Flynn's taxonomy in which a single instruction stream is serially applied to a single data set. Most uniprocessors are SISD machines.

SLIP (n.) serial line internet protocol is used to run IP over serial lines, telephone circuits or RS-232 cables connecting two hosts.

SMTP (n.) simple mail transfer protocol is the internet's electronic mail protocol.

software lockout (n.) See lock.

SOR (n.) Successive over-relaxation; a technique for accelerating the convergence of relaxation methods for solving systems of simultaneous linear equations, Ax=b. Each update produced by the underlying relaxation method (typically Gauss-Seidel) is over-corrected by a relaxation factor omega, usually with 1 < omega < 2, chosen to speed convergence.
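
A C sketch of one SOR sweep for a dense n x n system (illustrative; a is stored row-major as a[i*n + j], and omega is the relaxation factor):

    void sor_sweep(int n, const double *a, const double *b, double *x,
                   double omega)
    {
        for (int i = 0; i < n; i++) {
            double sigma = 0.0;
            for (int j = 0; j < n; j++)
                if (j != i)
                    sigma += a[i*n + j] * x[j];
            double x_gs = (b[i] - sigma) / a[i*n + i];   /* Gauss-Seidel value */
            x[i] = (1.0 - omega) * x[i] + omega * x_gs;  /* over-corrected update */
        }
    }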

space complexity (n.) the amount of memory space used by an algorithm as a function of problem size.

space sharing (n.) Dividing the resources of a parallel computer among many programs so they can run simultaneously without affecting one another's performance. See also time sharing.

spanning tree (n.) A tree containing a subset of the links in a graph which reaches every node in that graph. A spanning tree can always be constructed so that its depth (the greatest distance between its root and any leaf) is no greater than the diameter of the graph. Spanning trees are frequently used to implement broadcast operations.

SPARC (n.) Scalable Processor ARChitecture; a family of chips which can be manufactured using a variety of technologies, but still be compatible in some ways.

sparse matrix (n.) A matrix with the majority of its elements equal to zero. See also full matrix.

spawn (v.) To create a new process with arbitrary initial memory contents and instructions. See also fork.

speedup (n.) The ratio of two program execution times, particularly when times are from execution on 1 and P nodes of the same computer. Speedup is usually discussed as a function of the number of processors, but is also a function (implicitly) of the problem size. See also Amdahl's Law, Gustafson's Law, isoefficiency, optimal.
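
In symbols, speedup on P processors is commonly defined as S(P) = T(1)/T(P), where T(P) is the execution time on P processors; the related efficiency is E(P) = S(P)/P.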

spin lock (n.) an implementation of the lock primitive that causes a processor to retest a semaphore until it changes value. A busy-waiting process will spin until the lock is free.
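
A minimal sketch using C11 atomics (the names lock and unlock are illustrative):

    #include <stdatomic.h>

    atomic_flag lk = ATOMIC_FLAG_INIT;

    void lock(void)
    {
        while (atomic_flag_test_and_set(&lk))
            ;                    /* spin: retest until the flag is clear */
    }

    void unlock(void)
    {
        atomic_flag_clear(&lk);
    }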

spinning (adj.) a process waiting for the value of a spin lock to change is said to be spinning.

SPMD (adj.) Single program, multiple data; a category sometimes added to Flynn's taxonomy to describe programs made up of many instances of a single type of process, each executing the same code independently. SPMD can be viewed either as an extension of SIMD, or as a restriction of MIMD. See also process group, SISD.

SPX (n.) Sequenced Packet Exchange; Novell's proprietary transport protocol, roughly analogous to TCP.

SQL (n.) Structured Query Language; a standard for adding data to, or recovering data from, databases.

SRAM (n.) Static RAM; memory which stores data in such a way that it requires no memory refresh cycle and hence has low power consumption. Generally this type of RAM is faster but more expensive than DRAM.

startup cost (n.) The time taken to initiate any transaction with some entity. The startup cost of a message passing system, for example, is the time needed to send a message of zero length to nowhere. See also latency.
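
A commonly used first-order model of message passing cost is t(n) = t_s + n/B, where t_s is the startup cost, n is the message length and B is the asymptotic bandwidth; for short messages the startup term dominates.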

static channel naming (n.) a message passing scheme in which source and destination designators are fixed at compile time.

static decomposition (n.) task allocation policy that assumes tasks and their precedence relations are known before execution.

stencil (n.) A pattern of data accesses used when updating the values in a mesh. A stencil is usually represented as a grid around a central point, which indicates the location of the value being updated.
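
For example, the five-point stencil updates each interior mesh point from its four nearest neighbours; a Jacobi-style C sketch, where u and unew are row-major n x n arrays:

    void stencil_sweep(int n, const double *u, double *unew)
    {
        for (int i = 1; i < n - 1; i++)
            for (int j = 1; j < n - 1; j++)
                unew[i*n + j] = 0.25 * (u[(i-1)*n + j] + u[(i+1)*n + j]
                                      + u[i*n + j-1]  + u[i*n + j+1]);
    }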

stream parallelism (n.) a pipelined variant of AND parallelism.

stream unit (n.) transmits vectors into the vector arithmetic section on some vector CPUs.

strength reduction (n.) a process whereby a compiler attempts to replace instructions with less time-costly instructions that produce identical results. For example in Fortran, X**2 might be automatically replaced by X*X.

stride (n.) a term derived from the concept of walking or striding through data from one location to the next. The term is often used in relation to vector storage. A mathematical vector is represented in a computer as an array of numbers. Most computers use contiguous storage of vectors, locating them by the address of the first word and by the vector length. Many applications in linear algebra, however, require loads and stores of vectors with components that do not reside contiguously in memory, such as the rows of a matrix stored in column order. The row vectors are spaced in memory by a distance, or stride, equal to the leading dimension of the array representing the matrix. Some vector computers allow vector fetching and storing to occur with randomly stored vectors; an index vector locates successive components. This is often useful in storing the nonzero elements of a sparse vector.
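
As a concrete C sketch: with a matrix stored in column order, element (i,j) lives at a[i + j*lda], where lda is the leading dimension, so traversing row i strides through memory by lda words:

    /* Sum the n elements of row i of a column-ordered matrix;
       consecutive elements are lda words apart in memory. */
    double row_sum(int i, int n, int lda, const double *a)
    {
        double s = 0.0;
        for (int j = 0; j < n; j++)
            s += a[i + j*lda];
        return s;
    }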

striped (adj.) A file system that distributes files across disks by distributing individual blocks, often at the single-bit level. See also declustered.

stripmining (n.) a process used by a compiler on a register-to-register vector processor whereby a DO-loop of long or variable iteration count is executed in strips which are the length of a vector register, except for a remainder strip whose length is less.
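
A C sketch of the transformation for a strip length of VLEN (64 here, standing in for the machine's vector register length):

    #define VLEN 64   /* vector register length (machine dependent) */

    void scale(int n, double a, double *x)
    {
        for (int is = 0; is < n; is += VLEN) {
            int len = (n - is < VLEN) ? (n - is) : VLEN;  /* remainder strip */
            for (int i = is; i < is + len; i++)           /* one strip */
                x[i] = a * x[i];
        }
    }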

strong search (n.) an algorithm that searches for a given key, locks the node associated with that key, and returns the node.

strongly ordered game tree (n.) game tree with the following properties: (1) 70 percent of the time the first move chosen from any nonterminal node is the best move, and (2) 90 percent of the time the best move from any nonterminal node is one of the first 25 percent of the moves searched.

subcube (n.) term for a subset of nodes of a hypercube. The hypercube architecture has a natural recursive definition, so that a cube of dimension d1 includes within it 2^(d1-d2) lower-dimensional sets of nodes, each of which is itself a hypercube of dimension d2. These subsets of nodes are known as subcubes.

subgraph (n.) given a graph, a subgraph is another graph whose vertex and edge sets are subsets of those of the original graph.

subnet address (n.) An extension of the IP address that allows a network to be autonomous by itself and still be a subsection of a larger user network.

successive over-relaxation (n.) See SOR.

supercomputer (n.) a time dependent term which refers to the class of most powerful computer systems worldwide at the time of reference. See also concurrent computer, parallel computer.

superlinear speedup (n.) Speedup that is greater than an amount proportional to the number of processors used. While superlinear speedup is theoretically impossible, in practice it may occur because distributing a problem among many processors may increase the effective total size of the cache being used, or because distribution may change the order in which nondeterministic operations are carried out, which can lead to earlier termination of the program.

superword (n.) a term used on some computers to describe a group of eight 64-bit words, or alternatively, sixteen 32-bit half-words. The memory units on such machines generally fetch and store data in superwords (also called swords), regardless of the size of the data item referenced by the user program.

SV+TM (n.) StrataView Plus; a UNIX-based graphical user interface used to manage frame relay networks.

swap (v.) To exchange two items; in particular, to exchange the contents of a section of real memory with a section of virtual memory.

switch (n.) A physical communication medium containing nodes that perform only communications functions. Examples include crossbar switches, in which N+M buses cross orthogonally at NM switching points to connect N objects of one type to M objects of another, and multistage switches in which several layers of switching nodes connect N objects of one type to N objects of another type. See also butterfly, combining, network, shuffle exchange network.

synchronization (n.) The act of bringing two or more processes to known points in their execution at the same clock time. Explicit synchronization is not needed in SIMD programs (in which every processor either executes the same operation as every other or does nothing), but is often necessary in SPMD and MIMD programs. The time wasted by processes waiting for other processes to synchronize with them can be a major source of inefficiency in parallel programs. See also asynchronous, barrier synchronization, synchronous.

synchronous (adj.) Occurring at the same clock time. For example, if a communication event is synchronous, then there is some moment at which both the sender and the receiver are engaged in the operation. See also asynchronous.

systolic (adj.) Driven by the availability of data or other resources. In a systolic system, processes execute operations synchronously as their inputs become available.

systolic array (n.) A class of parallel computers with a fixed array of fine grain nodes through which data is pumped by a master or control processor.