Keywords: cluster, freebsd, process
From: soup4you2
Newsgroups: http://bsdhound.com
Date: Sun, 21 Jan 2004 17:02:14 +0000 (UTC)
Subject: Installing a LAM/MPI cluster on FreeBSD
Original: http://www.bsdhound.com/newsread.php?newsid=205
Installing a LAM/MPI Cluster on FreeBSD: How-To
A cluster makes a collection of two or more computers run as a single
supercomputer. Clusters can be used to increase reliability, to increase
the performance and resources available, or both. A Beowulf cluster is a
group of usually identical PCs that are networked together on a TCP/IP
LAN and have libraries and programs installed that allow processing to
be shared among them.
Before you get too excited, it's important to know that applications
need to be written against the MPI API and compiled with mpicc in order
to utilize the cluster's resources. You can consult the LAM website
(http://lam-mpi.org/) for information and tutorials.
Let's begin this quick and dirty how-to.
The first thing to take care of is that each node in the cluster
needs a DNS name. If you're not running a DNS server, the /etc/hosts
file will work just fine. I'm not going to get into the configuration
of BIND; I'll save that for a later date.
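If you go the /etc/hosts route, every node needs the same entries. The sketch below shows what those entries might look like and a small check that each name is present. The node names and 10.0.5.x addresses are the example values used later in this how-to, and `missing_hosts` is a hypothetical helper name, not a system tool.

```shell
#!/bin/sh
# Example /etc/hosts entries (identical on every node):
#   10.0.5.100  node1.yourdomain.com  node1
#   10.0.5.105  node2.yourdomain.com  node2

# missing_hosts FILE NAME...
# Print each NAME that has no entry in the hosts-format FILE.
missing_hosts() {
    file=$1
    shift
    for name in "$@"; do
        # Strip comments, then look for the name in any column after the address.
        awk -v n="$name" '
            { sub(/#.*/, "") }
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }
        ' "$file" || echo "$name"
    done
}

# Usage: missing_hosts /etc/hosts node1.yourdomain.com node2.yourdomain.com
```

No output means every name was found; any name printed still needs an entry.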
Next, our server needs to be configured as an NFS server.
Server:
($:~)=> vi /etc/rc.conf
nfs_server_flags="-u -t -n 4 -h 10.0.5.100" #Replace with your internal ip address.
mountd_enable="YES"
mountd_flags="-l -r"
rpcbind_enable="YES"
rpcbind_flags="-l -h 10.0.5.100" #Replace with your internal ip address.
nfs_server_enable="YES"
Then our client nodes need to be configured as NFS clients.
Client:
($:~)=> vi /etc/rc.conf
nfs_client_enable="YES"
Next we need to export our /home directory.
Server:
($:~)=> vi /etc/exports
/home -maproot=0:0 -network 10.0.5.0 -mask 255.255.255.0
Now each client needs to mount it.
Client:
($:~)=> vi /etc/fstab
10.0.5.100:/home /home nfs rw 0 0
Make sure your NFS share is working properly before continuing.
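One way to check the share from a client is sketched below. `showmount` and `mount` are the standard tools; `is_nfs_mount` is a hypothetical helper, and it assumes FreeBSD-style `mount` output of the form "server:/path on /mountpoint (nfs)".

```shell
#!/bin/sh
# is_nfs_mount MOUNTPOINT
# Reads `mount` output on stdin and succeeds if MOUNTPOINT is an NFS mount.
# Assumes FreeBSD-style lines such as: "10.0.5.100:/home on /home (nfs)".
is_nfs_mount() {
    awk -v mp="$1" '
        $2 == "on" && $3 == mp && $4 ~ /^\(nfs/ { found = 1 }
        END { exit !found }
    '
}

# On a client, once the fstab entry is in place:
#   showmount -e 10.0.5.100     # should list /home
#   mount /home                 # mounts it using the fstab entry
#   mount | is_nfs_mount /home && echo "/home is on NFS"
```

A file created under /home on one node should then show up on every other node.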
Now we install the lam-mpi clustering software. Do this for all
computers on the cluster.
All:
($:~)=> cd /usr/ports/net/lam
($:~)=> make install clean
Next let's install some software to help us monitor the cluster.
All:
($:~)=> cd /usr/ports/sysutils/ganglia-monitor-core
($:~)=> make install clean
On the server, we need the web interface for this. You should already
have a web server set up, with PHP installed and configured with
support for the GD graphics library.
Server:
($:~)=> cd /usr/ports/sysutils/ganglia-webfrontend
($:~)=> make install clean
Now on to the configuration.
All:
($:~)=> cp /usr/local/etc/gmond.conf.sample /usr/local/etc/gmond.conf
($:~)=> vi /usr/local/etc/gmond.conf
There are two important areas to change in this file. The rest I'll
leave to your discretion.
The first is your cluster name:
All:
name "ClusterName"
Next is the interface we wish to use for the cluster.
All:
mcast_if xl0
($:~)=> cp /usr/local/etc/gmetad.conf.sample /usr/local/etc/gmetad.conf
Here we need to tell our monitors what hosts are available. Put an
entry for every computer in the cluster, all on one line.
All:
data_source "ClusterName" 10 node1.yourdomain.com:8649 node2.yourdomain.com:8649
Make sure ClusterName matches the name in the gmond.conf configuration
file. The 10 is the polling interval in seconds, followed by the
computers in the cluster.
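With more than a couple of nodes the data_source line gets long and is easy to mistype. A minimal sketch for generating it from a list of hosts follows; `make_data_source` is a hypothetical helper, and it assumes every gmond listens on port 8649 as in the example above.

```shell
#!/bin/sh
# make_data_source NAME INTERVAL HOST...
# Print a gmetad.conf data_source line for the given hosts.
# Assumes each host runs gmond on port 8649.
make_data_source() {
    name=$1
    interval=$2
    shift 2
    line="data_source \"$name\" $interval"
    for host in "$@"; do
        line="$line $host:8649"
    done
    printf '%s\n' "$line"
}

make_data_source "ClusterName" 10 node1.yourdomain.com node2.yourdomain.com
```

The printed line can be pasted into /usr/local/etc/gmetad.conf in place of the sample entry.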
Now that our monitoring software is configured, let's configure the
cluster software.
All:
($:~)=> vi /usr/local/etc/lam-bhost.def
Configuration for this is easy. Just list the fully qualified domain
name of each box, one per line.
All:
node1.yourdomain.com
node2.yourdomain.com
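lamboot starts the daemons on the other nodes through a remote shell (ssh in the boot output further down), so the user who boots the cluster has to be able to log in to every node without a password prompt. The sketch below tries that for each host in the boot schema. `bhost_hosts` and `check_nodes` are hypothetical helpers; the parsing assumes the hostname is the first field on each line and that `#` starts a comment.

```shell
#!/bin/sh
# bhost_hosts FILE
# Print the hostname (first field) of every entry in a LAM boot schema,
# ignoring blank lines and # comments.
bhost_hosts() {
    awk '{ sub(/#.*/, "") } NF { print $1 }' "$1"
}

# check_nodes FILE
# Try a non-interactive ssh login to each host. BatchMode makes ssh fail
# instead of prompting for a password.
check_nodes() {
    for host in $(bhost_hosts "$1"); do
        if ssh -o BatchMode=yes "$host" true; then
            echo "ok:     $host"
        else
            echo "FAILED: $host"
        fi
    done
}

# Usage: check_nodes /usr/local/etc/lam-bhost.def
```

Because /home is shared over NFS, a single key pair and authorized_keys file in the user's ~/.ssh should cover all nodes.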
Now let's fire this puppy up.
All:
($:~)=> mv /usr/local/etc/rc.d/gmetad.sh.sample /usr/local/etc/rc.d/gmetad.sh
($:~)=> mv /usr/local/etc/rc.d/gmond.sh.sample /usr/local/etc/rc.d/gmond.sh
($:~)=> /usr/local/etc/rc.d/gmetad.sh start
($:~)=> /usr/local/etc/rc.d/gmond.sh start
Now, on the server or whichever node you choose, we need to start the
cluster. Run this as an unprivileged user.
Server:
($:~)=> lamboot -dv
You should be presented with something like this:
lamboot: boot schema file: /usr/local/etc/lam-bhost.def
lamboot: opening hostfile /usr/local/etc/lam-bhost.def
lamboot: found the following hosts:
lamboot: n0 node1.yourdomain.com
lamboot: n1 node2.yourdomain.com
lamboot: resolved hosts:
lamboot: n0 node1.yourdomain.com --> 10.0.5.100
lamboot: n1 node2.yourdomain.com --> 10.0.5.105
lamboot: found 2 host node(s)
lamboot: origin node is 0 (node1.yourdomain.com)
Executing hboot on n0 (node1.yourdomain.com - 1 CPU)...
lamboot: attempting to execute "hboot -t -c lam-conf.lam -d -v -I " -H
10.0.5.100 -P 57552 -n 0 -o 0 ""
hboot: process schema = "/usr/local/etc/lam-conf.lam"
hboot: found /usr/local/bin/lamd
hboot: performing tkill
hboot: tkill
hboot: booting...
hboot: fork /usr/local/bin/lamd
[1] 44660 lamd -H 10.0.5.100 -P 57552 -n 0 -o 0 -d
hboot: attempting to execute
Executing hboot on n1 (node2.yourdomain.com - 1 CPU)...
lamboot: attempting to execute "/usr/bin/ssh node2.yourdomain.com -n echo $SHELL"
lamboot: got remote shell /usr/local/bin/bash
lamboot: attempting to execute "/usr/bin/ssh node2.yourdomain.com -n
hboot -t -c lam-conf.lam -d -v -s -I "-H 10.0.5.100 -P 57552 -n 1 -o 0 ""
hboot: process schema = "/usr/local/etc/lam-conf.lam"
hboot: found /usr/local/bin/lamd
hboot: performing tkill
hboot: tkill
hboot: booting...
hboot: fork /usr/local/bin/lamd
[1] 53214 lamd -H 10.0.5.100 -P 57552 -n 1 -o 0 -d
topology done
lamboot completed successfully
Looks good. Now let's make sure all of our clients are attached.
Server:
($:~)=> lamnodes
n0 node1.yourdomain.com:1
n1 node2.yourdomain.com:1
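To script this check, the lamnodes output can be counted and compared with the number of nodes you expect. A minimal sketch, assuming each node line starts with n followed by a number as in the output above; `count_nodes` is a hypothetical helper.

```shell
#!/bin/sh
# count_nodes
# Count node lines ("n0 host:cpus") in lamnodes output read from stdin.
count_nodes() {
    awk '$1 ~ /^n[0-9]+$/ { n++ } END { print n + 0 }'
}

# Usage on a booted cluster:
#   expected=2
#   [ "$(lamnodes | count_nodes)" -eq "$expected" ] || echo "some nodes are missing"
```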
Congratulations, you're clustered. The Ganglia web front end lives in
/usr/local/www/data-dist/ganglia; set up a location on your web server
that points at it, then open it in your browser.
So how do I use this cluster?
Some commands that I commonly use are:
Server:
($:~)=> tping N
1 byte from 1 remote node and 1 local node: 0.002 secs
1 byte from 1 remote node and 1 local node: 0.001 secs
1 byte from 1 remote node and 1 local node: 0.001 secs
The tping command works like ping, but it pings the nodes in the
cluster. The N (uppercase) means all nodes in the cluster. If I just
wanted to ping node2.yourdomain.com, I would use the lamnodes command
to find the number associated with that node, then run tping n1 (n1
being node2.yourdomain.com).
Another benefit is that I can sit at one machine, tell the cluster to
start applications on the other machines, and have the output returned
to the terminal I'm on. Let's try it, shall we:
Server:
($:~)=> lamexec N echo "hi"
hi
hi
Since I used the uppercase N, meaning all nodes, it ran echo "hi" on
both PCs and returned the results to the one machine. I would suggest
reading up on lamexec for other things you can do with it. So how can
you be sure it's running these processes on both PCs? Watch this:
Server:
($:~)=> lamexec N hostname
node1.yourdomain.com
node2.yourdomain.com
Also read the lamd man page (man lamd); it lists other useful programs
for your cluster. Enjoy, and happy crunching.
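As noted at the start, real work on the cluster has to be written for MPI. The sketch below writes a minimal MPI "hello" program in C and shows, in comments, how it might be compiled and launched with LAM. `write_hello` and the file names are hypothetical; putting the binary under the NFS-shared /home is assumed so that every node sees it at the same path.

```shell
#!/bin/sh
# write_hello FILE
# Write a minimal MPI "hello" program in C to FILE.
write_hello() {
    cat > "$1" <<'EOF'
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);                /* join the MPI job */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's number */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of processes */
    printf("hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
EOF
}

# On a booted cluster, as the same unprivileged user:
#   write_hello ~/hello.c
#   mpicc -o ~/hello ~/hello.c    # LAM's compiler wrapper
#   mpirun C ~/hello              # C = one process per CPU in the cluster
#   lamhalt                       # shut the LAM daemons down when finished
```

Each process prints its own rank, so with the two single-CPU nodes above you would expect two lines of output, one per node.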
Some Links Of Interest:
Computer Clusters Profiles on TechTV
http://www.techtv.com/screensavers/answerstips/story/0,24330,2554333,00.html
Offmyserver building a Beowulf cluster
http://www.offmyserver.com/cgi-bin/store/cluster.html
Brooks paper on building a FreeBSD cluster
http://people.freebsd.org/~brooks/papers/bsdcon2003/
LAM-MPI Parallel computing page
http://lam-mpi.org/
LAM-MPI Download Page (For Mac Binaries If Needed)
http://www.lam-mpi.org/7.0/download.php
FreeBSD Cluster forum
http://lists.freebsd.org/mailman/listinfo/freebsd-cluster
Deploying Mac OS X Clusters
http://www.cmu.edu/computing/project/macosx/
Playstation 2 Super Computer
http://www.techtv.com/screensavers/supergeek/story/0,24330,3474732,00.html