High Performance Compute Clustering with Windows
University of Tennessee
Innovative Computing Laboratory
Computer Science Department
Jack Dongarra
Windows Cluster Project
People
Jack Dongarra
George Bosilca
Dave Cronk
Julien Langou
Piotr Luszczek
Projects:
1. Numerical Linear Algebra Algorithms and Software
a. LAPACK, ScaLAPACK, ATLAS
b. Self Adapting Numerical Algorithms (SANS) Effort
c. Generic Code Optimization
d. LAPACK For Clusters – easy access to clusters
2. Heterogeneous Distributed Computing
a. NetSolve, FT-MPI, Open-MPI
3. Performance Evaluation
a. PAPI, HPC Challenge, Top500
4. Software Repositories
a. Netlib
LAPACK
1. Used by Matlab, Mathematica, Numeric Python,…
2. Tuned versions are provided by vendors (AMD, Apple, Compaq, Cray, Fujitsu, Hewlett-Packard, Hitachi, IBM, Intel, MathWorks, NAG, NEC, PGI, SUN, Visual Numerics), by Microsoft, and by most Linux distributions and similar environments (Fedora, Debian, Cygwin, ...).
3. Ongoing work: performance, accuracy, extended precision, ease of use
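As an illustration of the kind of dense linear algebra LAPACK provides, the sketch below solves a small linear system by Gaussian elimination with partial pivoting, the approach LAPACK's general linear solver is built on. It is a pure-Python teaching sketch, not LAPACK code; the function name `lu_solve` and the sample system are made up for this example.

```python
def lu_solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Build an augmented matrix on copies so the caller's data is untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for k in range(n):
        # Partial pivoting: use the row with the largest |entry| in column k.
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        if m[p][k] == 0.0:
            raise ValueError("matrix is singular")
        m[k], m[p] = m[p], m[k]
        # Eliminate column k from every row below the pivot row.
        for i in range(k + 1, n):
            f = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= f * m[k][j]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


# Hypothetical 3x3 system whose exact solution is (2, 3, -1).
A = [[2.0, 1.0, -1.0], [-3.0, -1.0, 2.0], [-2.0, 1.0, 2.0]]
b = [8.0, -11.0, -3.0]
x = lu_solve(A, b)
```

A tuned LAPACK build performs the same mathematical steps with blocked, cache-aware kernels, which is where the vendor-tuned versions listed above gain their performance.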
ScaLAPACK
1. Parallel implementation of LAPACK that scales on parallel hardware from tens to hundreds to thousands of processors
2. Ongoing work: match the functionality of current LAPACK
3. Ongoing work: target new architectures and new parallel environments, for example a port to the Microsoft HPC cluster solution
LAPACK for Clusters (LFC)
1. Provides most ScaLAPACK functionality to serial clients (Matlab, Python, Mathematica)
FT-MPI and Open-MPI
1. Defines the behavior of MPI in the event that a failure occurs at the process level.
2. FT-MPI is based on MPI 1.3 (plus some MPI 2 features), with a fault-tolerance model similar to what was done in PVM.
3. Complete reimplementation, not based on other implementations.
4. Gives the application the possibility to recover from a process failure.
5. A regular, non-fault-tolerant MPI program will run using FT-MPI.
6. What FT-MPI does not do:
a. Recover user data (e.g. automatic checkpointing)
b. Provide transparent fault tolerance
Performance Application Programming Interface (PAPI)
1. A portable library to access hardware counters found on processors
2. Provides a standardized list of performance metrics
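To show how a standardized set of counter names can be turned into performance metrics, here is a minimal sketch that derives a few common ratios from raw counts. The event names are PAPI preset event names, but the counter values are made-up numbers and the helper `derived_metrics` is hypothetical; the sketch does not call the PAPI library itself.

```python
# Hypothetical counter readings for one measured code region.
counters = {
    "PAPI_TOT_CYC": 1_800_000_000,  # total cycles
    "PAPI_TOT_INS": 2_700_000_000,  # instructions completed
    "PAPI_FP_OPS": 900_000_000,     # floating-point operations
    "PAPI_L1_DCM": 13_500_000,      # level 1 data cache misses
}


def derived_metrics(c, elapsed_s):
    """Compute simple derived metrics from raw counter values."""
    return {
        # Instructions per cycle: how well the pipeline is kept busy.
        "ipc": c["PAPI_TOT_INS"] / c["PAPI_TOT_CYC"],
        # Millions of floating-point operations per second.
        "mflops": c["PAPI_FP_OPS"] / elapsed_s / 1e6,
        # L1 data cache misses per 1000 instructions.
        "l1_mpki": 1000.0 * c["PAPI_L1_DCM"] / c["PAPI_TOT_INS"],
    }


# Assume the region took one second of wall-clock time.
metrics = derived_metrics(counters, elapsed_s=1.0)
```

Because the event names are the same across platforms, a helper like this can stay unchanged when the measurements come from a different processor.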
KOJAK (Joint with Felix Wolf)
1. Software package for the automatic performance analysis of parallel apps
2. Message passing and multi-threading (MPI and/or OpenMP)
3. Parallel performance
4. CPU and memory performance
Posters for Related Projects
· FT-MPI
· HPCC
· Kojak
· LAPACK / ScaLAPACK
· NetSolve / ActiveSheets
· NetSolve / .NET
· Open MPI
· PAPI
· top500
Hardware Configuration
Team HPC Dual Core 4GB AMD Opterons
Team HPC Turnkey Beowulf-Class Supercomputer
26 4GB AMD Opteron DC Compute Nodes, 1 Head Node
CPU Manufacturer | AMD
CPU Model | Opteron 265
CPU Speed | 1.8 GHz
Number of nodes | 26
Number of cores | 2
Interconnect(s) | InfiniBand, Myrinet, GigE
Item Description | QTY
26 Compute Nodes
Supermicro H8DCE Motherboard | 26
3U Chassis w/ 350W PS with PCI-E riser & Slide Rails | 26
AMD Opteron 265 1.8 GHz with Heatsink | 52
4GB PC3200 Registered/ECC DDR (1GB x 4 total memory) | 104
80GB 7200rpm SATA 8 MB cache HDD | 26
ATI Rage on board | 26
Dual Gigabit Ethernet Integrated on board | 26
One Year Standard Warranty | 26
Opteron Linux Installed and Tested | 26
Built, Tested & Configured | 26
Torque, Kick-Start Utility & Web-Based Monitoring Software |
Head Node (4GB per node)
Supermicro H8DCE Motherboard | 1
3U Chassis w/ PS and Slide Rails | 1
AMD Opteron 265 1.8 GHz with Heatsink and Fan | 2
4GB PC3200 Registered/ECC DDR (1GB x 4 total memory) | 4
DVD Combo Drive | 1
ATI Rage on board | 1
Dual Gigabit Ethernet Integrated on board | 1
42U APC Rack Enclosure with perforated doors, sides and levelers | 2
APC Masterswitch 3 Phase 208 | 2
Wiring Harness | 1
1U All in one KB, VIDEO and MOUSE | 1
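The figures in the hardware listing can be combined into aggregate numbers for the compute partition. The short calculation below is a sketch: node, processor, clock and memory values come from the listing, while the value of 2 double-precision floating-point operations per cycle per core is an assumption about this processor generation and is not stated in the listing.

```python
# Values taken from the hardware listing.
compute_nodes = 26
cpus_per_node = 2        # 52 Opteron 265 processors across 26 compute nodes
cores_per_cpu = 2        # dual-core processors
clock_ghz = 1.8
mem_per_node_gb = 4      # 4 x 1GB DIMMs per node

# ASSUMPTION (not in the listing): 2 double-precision flops per cycle per core.
flops_per_cycle = 2

total_cores = compute_nodes * cpus_per_node * cores_per_cpu
total_mem_gb = compute_nodes * mem_per_node_gb
# Theoretical peak in GFLOP/s: cores x clock (GHz) x flops per cycle.
peak_gflops = total_cores * clock_ghz * flops_per_cycle

print(f"compute cores: {total_cores}")
print(f"total compute-node memory: {total_mem_gb} GB")
print(f"theoretical peak: {peak_gflops:.1f} GFLOP/s")
```

The peak figure is an upper bound under the stated assumption; measured performance, for example from an HPC Challenge run, would be lower and would depend on which interconnect is used.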