LAM - introduction to Local Area Multicomputer (LAM)
LAM features a full implementation of the MPI communication standard, with the exception that the MPI_CANCEL function will not properly cancel messages that have been sent.
    % cat lamhosts
    # a 2-node LAM
    beowulf1.lam-mpi.org
    beowulf2.lam-mpi.org
Each machine will be given a node identifier (nodeid) starting with 0 for the first listed machine, 1 for the second, etc.
The recon(1) tool verifies that the cluster is bootable.
    % recon -v lamhosts
    recon: -- testing n0 (beowulf1.lam-mpi.org)
    recon: -- testing n1 (beowulf2.lam-mpi.org)
The lamboot(1) tool actually starts LAM on the cluster.
    % lamboot -v lamhosts
    LAM 6.5.6 - University of Notre Dame
    Executing hboot on n0 (beowulf1.lam-mpi.org)...
    Executing hboot on n1 (beowulf2.lam-mpi.org)...
lamboot(1) returns to the UNIX shell prompt. LAM does not force a canned environment or a "LAM shell". The tping(1) command builds user confidence that the cluster and LAM are running.
    % tping -c1 N
    1 byte from 2 nodes: 0.009 secs

    % mpicc -o foo foo.c
    % mpif77 -o foo foo.f

    % mpirun -v -c 2 trivial
    2445 trivial running on n0 (o)
    361 trivial running on n1
An application with multiple programs must be described in an application schema, a file that lists each program and its target node(s). See appschema(5).
    % cat appfile
    # 1 master, 2 slaves
    n0 master
    n0-1 slave

    % mpirun -v appfile
    3292 master running on n0 (o)
    3296 slave running on n0 (o)
    412 slave running on n1
Applications can choose, at run-time, to use the "daemon" mode of communication or the "client-to-client" mode. Each has advantages and disadvantages, which are discussed in MPI(7).
    % mpitask
    TASK (G/L)    FUNCTION   PEER|ROOT  TAG   COMM    COUNT  DATATYPE
    0/0 trivial   Ssend      1/1        123   WORLD   64     INT
    1/1 trivial   Recv       0/0        456   WORLD   64     INT
Process rank 0 is blocked sending a synchronous message (MPI_Ssend()) to process rank 1 with tag 123 on the MPI_COMM_WORLD communicator. The message contains 64 integers. Process rank 1 is blocked in MPI_Recv() on the same communicator but is waiting for a different tag (456), so the message from rank 0 is never received.
    % mpimsg
    SRC (G/L)   DEST (G/L)   TAG   COMM    COUNT  DATATYPE  MSG
    0/0         1/1          123   WORLD   64     INT       n1,#0
The unreceived message can be examined with mpimsg(1). The expected tag and communicator are shown, along with a message identifier that can be used to display the message contents.
    % lamclean -v
    killing processes, done
    sweeping messages, done
    closing files, done
    sweeping traces, done
This command is frequently used between MPI runs, especially while developing and debugging MPI programs.
    % lamhalt
    LAM 6.5.6 - University of Notre Dame
If lamhalt(1) cannot shut the running LAM down properly, the deprecated wipe(1) command can be used instead, with the same boot schema that was originally used to boot LAM:

    % wipe -v lamhosts
    Executing tkill on n0 (beowulf1.lam-mpi.org)...
    Executing tkill on n1 (beowulf2.lam-mpi.org)...
The unique software engineering of LAM is transparent to users and system administrators, who only see a conventional daemon. System developers can de-cluster the daemon into a daemon containing only the nano-kernel and several full client processes. This developer's mode is still transparent to users but exposes LAM's highly modular components to simplified individual debugging. It also reveals LAM's evolution from Trollius, which ran natively on scalable multicomputers and joined them to the UNIX network through a uniform programming interface. Trollius is the ultimate heterogeneous parallel environment.
The network layer in LAM is a documented, primitive and abstract layer on which to implement a more powerful communication standard like MPI.
A significant portion of the MPI specification can be (and is) implemented completely within the runtime system and independent of the underlying environment.
As with all MPI implementations, LAM must synchronize the launch of MPI applications so that all processes locate each other before user code is entered. The mpirun(1) command achieves this after finding and loading the program(s) which constitute the application. A simple SPMD application can be specified on the mpirun(1) command line, while a more complex configuration is described in a separate file, called an application schema.
MPI programs developed on LAM can be moved without source code changes to any other platform that supports MPI.