From: Andrew Dalke <dalke@bioreason.com>
Newsgroups: comp.parallel
Subject: Re: seti@home -> general purpose parellel machine
Date: 21 Jul 1999 14:26:08 GMT
Organization: Bioreason, Inc.
Approved: bigrigg@cs.cmu.edu
Message-Id: <7n4la0$rr6$1@goldenapple.srv.cs.cmu.edu>
Originator: bigrigg@ux6.sp.cs.cmu.edu
Xref: ukc comp.parallel:15743


Hello,

  Just a few thoughts on the post of Ben Houston
<bhouston@chat.carleton.ca> on the requirements of developing
a toolkit for wide-area distributed programming.

  I seem to recall some discussion about this on distributed.net
a few years ago, though I don't remember the details.  That's
likely your best place to start.

> - Probably should let the user choose which project (or type of
> project) they would like to dedicate their CPU cycles to.

Definitely.

Note that some projects may be illegal in some countries. 
For example, there is a distributed project to crack the
encryption codes for some European satellite TV.  Apparently
in some countries this is legal and in some it isn't.

So the end-user must be able to say yea or nay on a project.
Also, it may require that the coordinator know the location of
the client machines, in order to determine legality.

You'll also need several different organizations which can act
as servers, each independent of the others.  Otherwise, in
addition to the legal aspects I mentioned, you'll have a single
organization in charge of what can or cannot be distributed.
If the project gets large, that means:
  1) one place has veto power over a project; "fringe" projects
could be ignored
  2) that place will end up needing commercial funds to manage
all of the projects (see later for how to get funds), which makes
it more susceptible to corporate decisions on what projects are
or are not allowed.
  3) it increases the possibility of a single point of failure

  Not that there shouldn't be companies involved; I just prefer
a framework where many people can be involved.

> - There should be better security on the data files being
> passed around.

Recall that there are US export restrictions on strong crypto.
Now remember that the idea of your project makes it easier to
break weak crypto :)

> - I believe these clients should work off of servers instead
> of trying to do inter-client communication.

  I agree, and not only for that reason: it is also nicer to
let a single site in through a firewall than any possible host.
(I'm thinking of the poor sys admin who starts seeing 3000 new
hosts talking to the network and gets worried about a new type
of network attack.)

> Benefits of having one client application for all distributed
> projects are:
> 
> - Standardization is always good.

  I agree that standardization is good.  I disagree that having
one client is the right solution.  All you need are ways to
mediate:
 o  admin information (eg, to indicate that 10 machines are
       contributing to the same project)
 o  choice of programs to run locally
 o  distribution of the program
 o  exchange of the intermediate results
 o  some control issues (restart, shutdown, program finished, etc.)

These are what must be standardized.  Then there can be several
clients which speak the protocol.
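To make the list above concrete, here is a minimal sketch of what
such protocol messages might look like.  Every message name and
field here is hypothetical, invented just to illustrate the kinds
of things to standardize; no existing spec is being described.

```python
# Hypothetical wire messages for a work-distribution protocol;
# all names here are illustrations, not an existing standard.

def make_message(kind, **fields):
    """Build one protocol message as a plain dictionary."""
    msg = {"kind": kind}
    msg.update(fields)
    return msg

# Admin information: e.g. ten machines report membership in one project.
join = make_message("join", project="prime-search", machines=10)

# Choice of program to run locally, and where to fetch it from.
offer = make_message("offer", program="sieve", url="http://example.org/sieve")

# Exchange of intermediate results.
result = make_message("result", work_unit=42, payload="...")

# Control issues: restart, shutdown, program finished, etc.
control = make_message("control", action="shutdown")

for m in (join, offer, result, control):
    assert m["kind"] in ("join", "offer", "result", "control")
```

Any client, in any language, that can emit and parse messages of
this shape could then participate, which is the point of
standardizing the protocol rather than the client.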


Here's a question for the server.  How do you detect that there
isn't a modified client sending back deliberately corrupted results?
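One common defense (my suggestion, not something from the post
being replied to) is redundant computation: send the same work
unit to several independently chosen clients and accept an answer
only when a quorum of them agree.

```python
# Majority voting over redundant work units: a sketch of one way a
# server could detect deliberately corrupted results from a
# modified client.

from collections import Counter

def majority_result(results, quorum=2):
    """Return the answer reported by at least `quorum` clients,
    or None if no answer reaches the quorum."""
    counts = Counter(results)
    answer, votes = counts.most_common(1)[0]
    if votes >= quorum:
        return answer
    return None

# Three clients report on the same work unit; one is corrupted.
assert majority_result(["1217", "1217", "9999"]) == "1217"

# No agreement: the work unit must be redistributed.
assert majority_result(["1", "2", "3"]) is None
```

The cost is that every work unit is computed two or more times;
spot-checking with work units whose answers are already known is
a cheaper, complementary trick.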


Anyway, to the money part.  You didn't include one possible use
of this resource -- sell computer time, with the server getting
some percentage of the total.

Most companies have a lot of machines sitting idle over the
weekend or evening.  If your project is feasible (and sufficiently
secure), this could be a way for it to recoup costs.

Problem is, most of the projects I know of which rent computer
access are for things like weather simulation or other sorts of
large scale modelling, which aren't very easy to adapt to the
master/slave & low bandwidth model you describe (the Mersenne prime
client exchanged a message every couple of weeks).  So it would
have to start from its own niche, which likely means a long startup
phase for commercial viability.

In addition, it would need to make enough money to be worth the
hassle of admin and accounting overhead.  Let's assume the total
effort is 1/5th of a full-time employee's time and overhead.  This
is about $15-20,000/year.

Now consider that a client program only has access to a little
bit of RAM and disk space; this hardware is effectively what you
could buy for about $1000.  Thus, any customer would have to
decide whether it would be better just to buy 10 machines outright,
which would be available all the time.
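The break-even arithmetic can be made explicit.  The figures
below are the rough estimates from the two paragraphs above, not
measured costs:

```python
# Rough break-even sketch: ~1/5 of an employee's time and overhead
# (midpoint of the $15-20,000/year estimate) versus ~$1,000 for an
# equivalent dedicated machine bought outright.

admin_cost_per_year = 17500   # midpoint of $15-20,000/year
machine_cost = 1000           # rough price of equivalent hardware

# Machine-equivalents the admin overhead alone could buy each year:
machines_instead = admin_cost_per_year // machine_cost
print(machines_instead)  # 17 machines, available all the time
```

So the service has to undercut the price of simply owning that
many machines before it earns anything at all.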

So what that's saying is, don't bet your house on making money
on this in the next couple of years :)

I was just thinking on how we might use such a system.  We're
looking at having to get information about 250,000 data objects.
It's coarse-grain parallelizable, with each evaluation taking
about a minute, so 170 days total on a single machine.  We have
several machines but if the problem was larger we would buy a
few more, meaning they would go to "waste" when the task is
done.
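The serial-time estimate above is easy to check, and it shows how
linearly a coarse-grain job like this divides across machines
(the machine count below is my illustration, not a figure from
the post):

```python
# 250,000 independent evaluations at about one minute each.

evaluations = 250000
minutes_each = 1

total_minutes = evaluations * minutes_each
total_days = total_minutes / (60 * 24)
print(round(total_days))  # ~174 days on a single machine

# Coarse-grain parallelism divides this almost linearly, since the
# evaluations are independent:
machines = 20
print(round(total_days / machines))  # ~9 days on 20 machines
```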

In that case, being able to buy time would be useful if the
toolkit was available so we could use it for testing and local
deployment; only farming it out for those times when it would
take too much time.  This way we wouldn't run into development
costs by having to change to another communications method.

But the problem is we depend on node-locked licensed software
which is platform specific.  This shoots down the idea right
away.

So you're left with projects where most of the software can be
developed from scratch (or is freely available) and can be easily
retargeted to different platforms.  I take that back.  What you
want is some way to make it easy for a client app to talk to the
mediator for the current machine.  Say, have it as a COM interface
for Windows, a daemon or CORBA app for Unix, and an AppleScript (?)
interface for the Mac.  Then the user can decide if an app is
trustworthy to run, and that app talks to the local mediator to
connect to the appropriate main server.
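A minimal sketch of the Unix daemon case: a small mediator
listens on a localhost socket, and any client app speaks one tiny
command protocol to it instead of reaching the main server
directly.  The port number and the command vocabulary here are
invented for illustration.

```python
# A toy "local mediator" daemon: client apps connect locally and
# ask it to broker a connection to a project's main server.

import socketserver

def handle_command(line):
    """Interpret one mediator command line; return the reply string."""
    if line.startswith("CONNECT "):
        project = line.split(None, 1)[1]
        # A real mediator would consult the user's policy here (is
        # this project approved?), then open a connection to that
        # project's main server on the app's behalf.
        return "OK %s" % project
    return "ERR unknown command"

class MediatorHandler(socketserver.StreamRequestHandler):
    def handle(self):
        line = self.rfile.readline().decode().strip()
        self.wfile.write((handle_command(line) + "\n").encode())

if __name__ == "__main__":
    # Bind only to localhost: the mediator serves local apps, not
    # arbitrary hosts on the network.
    server = socketserver.TCPServer(("127.0.0.1", 7777), MediatorHandler)
    server.serve_forever()
```

This keeps the trust decision with the user (which apps may talk
to the mediator) and keeps the firewall story simple, since only
the mediator ever talks to the outside.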

Hmm, that seems to be the most reasonable way.

Also note that the current distributed projects work because
they are not commercial; the cost is part of the hobby and
doesn't need to make money.  That would likely continue to be
your target domain.


Anyway, just some ideas which came to mind.

						Andrew Dalke
						dalke@acm.org

--
Articles to bigrigg+parallel@cs.cmu.edu (Admin: bigrigg@cs.cmu.edu)
Archive: http://www.hensa.ac.uk/parallel/internet/usenet/comp.parallel

