From: "Coetmeur, Alain" <alain.coetmeur@icdc.caissedesdepots.fr>
Newsgroups: comp.parallel
Subject: Re: Business use of Beowulf
Date: 21 Jul 1999 14:26:41 GMT
Organization: <NONE>
Approved: bigrigg@cs.cmu.edu
Message-Id: <7n4lb1$rrk$1@goldenapple.srv.cs.cmu.edu>
Originator: bigrigg@ux6.sp.cs.cmu.edu
Xref: ukc comp.parallel:15737


Peter Szwedyk wrote in message
<7m3ci2$9kt$1@goldenapple.srv.cs.cmu.edu>...
>Issue: Is it possible to use a beowulf without re-writing
>existing business applications?
>I just want to summarize what I found and what conclusion
>I formed.  I would appreciate your comments or corrections:
>There are two common parallel computation models:
>- Message Passing
ok
>- Distributed Shared Memory
Hmm... I'd rather say just multi-threading with shared memory,
implemented as:
- uniform memory access (UMA, as in SMP systems)
- non-uniform memory access (NUMA)
- DSM

Anyway, on distributed-memory computers like Beowulfs,
only message passing (and, rarely, DSM) can be used.

>Beowulf on Linux employs the Message Passing model which means that
>existing serial applications, whether multi-threaded or not, will not
>benefit from a Beowulf system without re-writing.
Yes,
but the rewrite may be quite simple in some cases
(add 50 lines),
or intractable in others
(rebuild everything).

>It is my understanding that in a Distributed Shared Memory system
>existing serial, multi-threaded apps would automatically (in most
>cases) be able to take advantage of the cluster without a need for
>re-write. However, DSM is currently not supported in a Beowulf system.

Maybe experimental support exists; look at MOSIX
(http://www.mosix.cs.huji.ac.il/)
for info.

Anyway, with common network interfaces
DSM is far too slow. Moreover, DSM on such a system
is often much coarser-grained than on a NUMA system,
so false sharing is more frequent.

Even NUMA on a machine like the SGI Origin 2000 may be too slow
for some multi-threaded programs (e.g. with OpenMP). SMP
is often much more effective, but if the program
shares too much data, a partial rewrite may be needed anyway.



It depends on your app! (a reusable/reused answer)

>
>The conclusion I reached was that a business organization that is
>- willing to invest in building a Beowulf/Linux cluster
>- NOT willing to re-write existing serial apps (to Message Passing
>model)
>
>currently can NOT benefit from a Beowulf system.

With a MOSIX system you may benefit a lot if
your program is structured as many
processes with good concurrency...
One example in our business is
a complex natural-language-processing chain,
fed by mail and the web, which is composed
of a hundred processes that can be made
very concurrent and whose load can be balanced by MOSIX.

Another trivial case is the parallel compilation
offered by GNU make, which launches independent compilations
as concurrent processes that MOSIX can migrate.
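The point above is just make's `-j` flag: because the object files below do not depend on each other, `make -j` can build them as separate compiler processes, each of which MOSIX can migrate to another node (a hypothetical makefile, any file names would do):

```make
# "make -j 3" runs the three compilations as concurrent cc
# processes; under MOSIX each process can migrate to another node.
prog: a.o b.o c.o
	cc -o prog a.o b.o c.o

%.o: %.c
	cc -c $<
```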

Another case is a message-passing program,
which can use the transparent load balancing
to run more efficiently, alone or alongside other programs.

Some users (on the Beowulf mailing list)
have had very good experiences with PVM programs on MOSIX.

>So what do these business organizations do?  Well, we could:
>- wait for DSM addition to the Beowulf system
IMHO DSM will never be effective enough without
hardware support comparable to the NUMA architecture.
Maybe with SCI it will be possible, but DSM
is not suited to the fine-grained memory exchange that
some multi-threaded programs need.

Anyway, in some programs the memory is well
separated between threads and can stay attached
to one node for a long time. To obtain this, you often have
to redesign the program in a way that is not
much different from message passing.
But DSM is much harder to understand,
monitor, predict, and debug than message passing is.

>- search for another DSM cluster solution.

>The MOSIX team is currently
>working on a Network RAM project.
The idea is to migrate processes to their data
instead of the reverse, as today.

Anyway, MOSIX is a bundle of patches for Linux or BSD...


>Could this be it?  Are there other
>such projects?



>
>Please note that I do not claim that my findings/conclusions stated
>above are correct.  Please comment.  Thanks.
Neither do I ... 8)

--
Articles to bigrigg+parallel@cs.cmu.edu (Admin: bigrigg@cs.cmu.edu)
Archive: http://www.hensa.ac.uk/parallel/internet/usenet/comp.parallel

