System Capabilities

Compute Node Hardware

There are three kinds of compute nodes in the cluster: General Computing Nodes, GPU Nodes, and Petascale Data Analysis Facility (PDAF) Nodes. The current specifications for each type of node are as follows:

General Computing Nodes
Processors: Dual-socket, 8-core, 2.6 GHz Intel Xeon E5-2670 (Sandy Bridge)
Memory: 64 GB (4 GB/core) (128 GB memory optional)
Network: 10 GbE (QDR InfiniBand optional)
Hard Drive: 500 GB onboard (second hard drive or SSD optional)
Warranty: 3 years

GPU Nodes
Host Processors: Dual-socket Intel Xeon E5-26xx (Sandy Bridge) (clock rate and core count TBD)
GPUs: NVIDIA GeForce GTX680 or Tesla K20 (count TBD)
Memory: 32 GB
Network: 10 GbE (QDR InfiniBand optional)
Hard Drive: 500 GB + 240 GB SSD
Warranty: 3 years

PDAF (shared resource; pay-as-you-go only)
Processors: 8-socket, 4-core AMD Shanghai Opteron
Memory: 512 GB
Network: 10 GbE

(RCI will annually update the hardware choices for general computing and GPU condo purchasers, to stay abreast of technology/cost advances.)

Network, Storage, Usage Model, and Plan Details

Network

Nodes with the QDR InfiniBand (IB) option will plug into 32-port IB switches, allowing up to 512 cores (32 nodes of 16 cores each) to communicate at full bisection bandwidth for low-latency parallel computing.

Storage

TSCC users will receive 100 GB of home file storage and shared access to the Data Oasis Lustre high-performance file system, which has more than 200 TB of capacity. (There is a 90-day purge policy on Data Oasis for normal usage.) Additional persistent storage can be mounted from lab file servers over the campus network or can be purchased from SDSC.

Usage Model

Each year, condo cluster participants receive an amount of cluster runtime proportional to the capabilities of their purchased nodes. For example, a participant that purchases eight general computing nodes will receive just under 1.1 million Service Units (1 SU = 1 core-hour) of time, which represents 24x365 usage of 128 cores, allowing for 3% maintenance downtime. These core-hours can be used any time during the year on any of the computing nodes; however, core-hours that condo participants do not use expire at the end of each year.
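The allocation arithmetic in this example can be checked with a short calculation. The following is a minimal sketch, not an official TSCC tool: the function name `annual_service_units` is hypothetical, the 16 cores per node follow from the dual-socket, 8-core general computing node specification, and the 3% downtime figure is the one quoted above.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def annual_service_units(nodes, cores_per_node=16, downtime_fraction=0.03):
    """Estimate yearly SUs (1 SU = 1 core-hour) for a set of purchased nodes.

    Hypothetical helper for illustration; defaults reflect the general
    computing node described in this document.
    """
    cores = nodes * cores_per_node
    return cores * HOURS_PER_YEAR * (1.0 - downtime_fraction)


# Eight general computing nodes: 128 cores, about 1.09 million SUs per year.
print(f"{annual_service_units(8):,.0f} SUs per year")
```

For eight nodes this gives roughly 1,087,642 SUs, which matches the "just under 1.1 million" figure.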

Condo participants' jobs that require no more cores than their purchased nodes provide are guaranteed to start within eight hours of submission and can run for an unlimited amount of time. Jobs that extend into the cluster hotel nodes have a 72-hour time limit and share a queue with the jobs of pay-as-you-go users, while jobs that extend to other participants’ condo nodes have an eight-hour time limit. Condo participants may submit gleaning jobs to run on idle computing nodes. These jobs are not charged against the submitter's SU balance, but the scheduler may terminate them at any time if the nodes where they are running are needed for higher-priority jobs.
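The job classes described above can be summarized as a small lookup table. This is an illustrative sketch only: the class labels (`own_condo`, `hotel_overflow`, `other_condo`, `glean`) are made-up names rather than actual TSCC queue names, and the assumption that every non-gleaning job is charged against the SU balance is implied by the text but not stated outright.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class JobPolicy:
    # math.inf means unlimited; None means the text states no limit.
    max_runtime_hours: Optional[float]
    charged_to_su_balance: bool  # assumed True unless the text says otherwise
    preemptible: bool            # scheduler may terminate the job at any time


# Keys are illustrative labels, not actual TSCC queue names.
POLICIES = {
    "own_condo": JobPolicy(math.inf, True, False),
    "hotel_overflow": JobPolicy(72, True, False),
    "other_condo": JobPolicy(8, True, False),
    "glean": JobPolicy(None, False, True),
}


def within_runtime_limit(job_class, requested_hours):
    """Return True if the requested runtime fits the limit for the class."""
    limit = POLICIES[job_class].max_runtime_hours
    # None: no limit is given in the text, so there is nothing to check.
    return limit is None or requested_hours <= limit


print(within_runtime_limit("hotel_overflow", 48))  # True
print(within_runtime_limit("other_condo", 12))     # False
```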

Because the capabilities and purpose of the GPU nodes differ significantly from the general computing nodes, SUs received for contributed GPU nodes and general computing nodes cannot be interchanged.

Pay-as-you-go (hotel) users’ jobs can run only on the hotel nodes. Initially there will be 40 general computing nodes (640 cores); additional nodes may be added based on demand. Hotel nodes are configured with 64 GB of memory and an IB interface. The general computing nodes will be allocated per core, allowing up to 16 jobs to run on each node simultaneously. Hotel jobs can also run on the large-memory PDAF nodes; the PDAF nodes will allow a maximum of two jobs per node (i.e., 256 GB or 512 GB of memory per job), meaning that each job will be charged for either 16 or 32 cores.
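The PDAF charging rule can be sketched as a function that rounds a request up to half a node or a whole node. This is an assumption-laden illustration: the function name `pdaf_charged_cores` is hypothetical, and the rule that a job is charged for half a node only when both its core and memory requests fit within half a node is an interpretation of the text, not a documented scheduler policy.

```python
PDAF_CORES = 32       # 8 sockets x 4 cores
PDAF_MEMORY_GB = 512


def pdaf_charged_cores(requested_cores, requested_memory_gb):
    """Return the cores charged (16 or 32) for a single-node PDAF job.

    Assumed rule: a request that fits in half a node (16 cores, 256 GB)
    is charged 16 cores; anything larger is charged the full 32 cores.
    """
    if requested_cores < 1 or requested_memory_gb <= 0:
        raise ValueError("request must be positive")
    if requested_cores > PDAF_CORES or requested_memory_gb > PDAF_MEMORY_GB:
        raise ValueError("request exceeds a single PDAF node")
    half_cores = PDAF_CORES // 2
    half_memory = PDAF_MEMORY_GB / 2
    if requested_cores <= half_cores and requested_memory_gb <= half_memory:
        return half_cores
    return PDAF_CORES


print(pdaf_charged_cores(4, 200))  # 16
print(pdaf_charged_cores(4, 300))  # 32
```

Under this reading, a job that uses only four cores but needs 300 GB of memory is still charged for all 32 cores, because it prevents a second job from sharing the node.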

Plan

Participants who contribute to the cluster will have priority access to the nodes that they contribute. In addition, they have the option to run jobs on additional cluster nodes when available, effectively increasing their computing capability and flexibility. The TSCC allows participating researchers access to additional cycles during times of peak use, through a model that pools computing resources. During times of intense research, this approach provides participants with far more computational power than they would have if running only on their own hardware. It also supports running jobs at a higher core count than when restricted to their own nodes.

The condo purchaser has the option of taking possession of their nodes at any time; however, once equipment is removed, it cannot later be returned to TSCC.

After the expiration of the three-year warranty, condo participants may leave their contributed nodes in the TSCC for another year, as long as the nodes remain operational. At the end of four years, participants must take possession of their equipment or surplus it.

TSCC-Supported Software

The TSCC will run the CentOS v6.2 operating system. PGI and Intel compilers will be available, as will mvapich2 and openmpi. Over 50 additional software applications and libraries will be installed on the system, and the administrators will be happy to work with researchers to extend this set as time/costs allow.

Note

This is not a complete list of TSCC software for Day One. Versions and installation details are still being updated. The lists below are changing frequently, and will be finalized by production rollout in late February.

Application Software

This is a partial list, based on current software from Triton Resource. The TSCC software installation is being assembled and this information will be updated at that time with specifics for installed location, version, topic area, license type, web site documentation and host node type.

Package | Topic Area | Version | License Type | Package Home Page | User Install Location | Installed on: (L)ogin, (C)ompute, (B)oth
APBS (Adaptive Poisson-Boltzmann Solver) | Bioinformatics | 1.3 | BSD, MIT | APBS Home Page | /opt/apbs | C
BEAST | Bioinformatics, Phylogenetics | 1.7.4 | GNU LGPL | BEAST Home Page | /opt/beast | C
BLAT | Bioinformatics, Genetics | 35 | Free Non-commercial | BLAT User Guide | /opt/biotools/blat | B
bbFTP | Large Parallel File Transfer | 3.2.0 | GNU GPL | bbFTP Home Page | /opt/bbftpc | B
Bowtie Short Read Aligner | Bioinformatics | 0.12.9 | GNU GPL | Bowtie Home Page | /opt/biotools/bowtie | B
Burrows-Wheeler Aligner (BWA) | Bioinformatics | 0.6.2 | GNU GPL | BWA Home Page | /opt/biotools/bwa | B
Cilk | Parallel Programming | 5.4.6 | GNU GPL | Cilk Home Page | /opt/cilk | B
CP2K | Physics, Chemistry, Biology | 2.3 | GPL | CP2K Home Page | /opt/cp2k | B
CPMD | Molecular Dynamics | 3.15.3 | Free for noncommercial research | CPMD Home Page | /opt/cpmd/bin | B
DDT | Graphical Parallel Debugger | 3.2 | Licensed | DDT Home Page | /opt/ddt | B
FFTW | General | 3.3.3 | GNU GPL | FFTW Home Page | /opt/fftw/3.2.1/intel or /opt/fftw/3.2.1/pgi or /opt/fftw/3.2.1/gnu | B
FPMPI | MPI Programming | 2.2 | Licensed | FPMPI Home Page | TBD | TBD
GAMESS | Chemistry | 5.2012 | No-cost Site License | GAMESS Home Page | /opt/gamess | B
Genome Analysis Toolkit (GATK) | Bioinformatics | 2.3.9 | BSD Open Source | GATK Home Page | /opt/biotools/GenomeAnalysisTK | B
IDL | Visualization | 8.2.1 | Licensed | IDL Home Page | /opt/idl | B
IPython | Parallel Computing | 0.13.1 | BSD | IPython Home Page | /opt/ipython | B
matplotlib | Python Graphing Library | 1.1.1 | PSF (Python Software Foundation) | matplotlib Home Page | /opt/scipy/lib64/python2.4/site-packages | B
LAMMPS (Large-scale Atomic/Molecular Massively Parallel Simulator) | Molecular Dynamics Simulator | 12Jan13 | GPL | LAMMPS Home Page | /opt/lammps | C
MATLAB | Parallel Development Environment | 7.9 | Licensed | MATLAB Home Page | /home/beta/matlab.2011a and /home/beta/matlab_server_2012b | B
openmpi | Parallel Library | 1.6.3 | Generic | Open MPI Home Page | /opt/openmpi/intel/mx or /opt/openmpi/pgi/mx or /opt/openmpi/gnu/mx | B
NAMD | Molecular Dynamics, BioInformatics | 2.9 | Non-Exclusive, Non-Commercial Use | NAMD Home Page | /opt/namd | C
NCO | NetCDF Support | 4.2.3 | Generic | NCO Home Page | /opt/nco/intel or /opt/nco/pgi or /opt/nco/gnu | B
NetCDF | General | 4.2.1.1 | Licensed (free) | NetCDF Home | Environment Module | B
NoSE (Network Simulation Environment) | Networking | 1.2.1 | GNU GPL | NoSE Home | Perl Module | B
NumPy (Numerical Python) | Scientific Calculation | 1.6.2 | BSD | NumPy Home | /opt/scipy/lib64/python2.4/site-packages | B
NWChem | Chemistry | 6.1.1 | EMSL (free) | NWChem Home Page | /opt/nwchem | B
PyFITS | Astrophysics | 3.1.1 | BSD | PyFITS Home Page | /opt/scipy/lib64/python2.4/site-packages | B
Python | General Scripting | 2.7 | BSD | Python Home Page | /opt/python/bin | B
pytz | Python TimeZone Module | 2012j | MIT | PyTZ Home Page | /opt/scipy/lib64/python2.4/site-packages | B
R | Statistical Computing and Graphics | 2.15.2 | GNU GPL | R Home Page | /opt/R/bin | C
SciPy (Scientific Python) | Scientific Computing | 0.11.0rc1 | BSD | SciPy Home Page | /opt/scipy/lib64/python2.4/site-packages | B
TAU | Tuning and Analysis Utilities | 2.22.p1 | GNU GPL | TAU Home Page | /opt/tau/intel or /opt/tau/pgi or /opt/tau/gnu | B

System Software

Package | Topic Area | Version | License Type | Package Home Page | User Install Location | Installed on: (L)ogin, (C)ompute, (B)oth
Lustre | Scalable File System | 1.8 | GNU GPL | Lustre Home Page | /opt/lustre | B
Maui | Workload Scheduler | | GNU Lesser GPL | | /opt/maui | L
xen | N/A | TBD | Generic | xen Home Page | TBD /opt/? | B
CentOS | Operating System | 5.3 | Open Source | CentOS Home Page | N/A | B
Environment Modules | Environment Variable Management | 3.2.7 | GNU GPL | Environment Modules Home Page | /opt/modules | B
Ganglia | N/A | 2.5.7 | Open Source | Ganglia Home Page | /opt/ganglia | B
Nagios | N/A | 3.4.3 | Open Source | Nagios Home Page | /opt/nagios | L
mvapich2 | Message Passing Interface | 1.9a2 | Open Source | TBD | TBD or TBD or TBD | B
TORQUE | Resource Manager | 2.3.6 | Open Source | TORQUE Home Page | /opt/torque | B
Gold | Allocation Manager | 2.2.0.5 | Open Source | Gold Home Page | /opt/gold | L

Compilers

Package | Topic Area | Version | License Type | Package Home Page | User Install Location | Installed on: (L)ogin, (C)ompute, (B)oth
Java | Compiler | 1.6.0_07 | Generic | Java Home Page | /usr/bin/javac | B
PGI Compilers | C and Fortran Compilers | 12.10 | Licensed (flexlm) | PGI Compilers Home Page | /opt/pgi | B
Intel Compilers | C and Fortran Compilers | 2013.1.117 | Licensed (flexlm) | Intel Compilers Home Page | /opt/intel | B

Requesting Additional Software

Users can install software in their home directories. User Support will also install software on request in a beta location. If interest is shared with other users, requested installations can become part of the core software repository. Please post your requests to the TSCC Mailing List.

Condo/Hotel Cost Details

Condo Computing

The TSCC condo cost structure is based on condo participants purchasing their nodes, paying a one-time infrastructure fee for their pro rata share of the common networking and storage infrastructure, and then paying a modest annual operating expense that is supplemented by the campus RCI program. Pay-as-you-go hotel users purchase cycles that reflect the total cost-of-ownership, albeit leveraging the economies of scale afforded by TSCC.

Hotel Computing

For pay-as-you-go users, the cost for the general computing hotel nodes is $0.025 per SU, and the cost for the PDAF high-memory nodes is $0.025/SU but with a 16-core/256GB minimum per job. Hotel purchases carry a minimum purchase of $250.
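The hotel pricing above lends itself to a quick estimate. The sketch below is not an official billing tool: the function names `hotel_job_cost` and `sus_for_purchase` are hypothetical, and it assumes that the PDAF minimum is applied by charging at least 16 cores for the full runtime of the job.

```python
RATE_PER_SU = 0.025      # dollars per core-hour, general and PDAF nodes
MIN_PURCHASE = 250.00    # minimum hotel purchase in dollars
PDAF_MIN_CORES = 16      # PDAF jobs are charged for at least 16 cores


def hotel_job_cost(cores, hours, pdaf=False):
    """Estimate the dollar cost of one hotel job (hypothetical helper)."""
    charged_cores = max(cores, PDAF_MIN_CORES) if pdaf else cores
    return charged_cores * hours * RATE_PER_SU


def sus_for_purchase(dollars):
    """Return the SUs bought for a given payment, enforcing the minimum."""
    if dollars < MIN_PURCHASE:
        raise ValueError(f"minimum hotel purchase is ${MIN_PURCHASE:.2f}")
    return dollars / RATE_PER_SU


print(f"${hotel_job_cost(16, 10):.2f}")            # one full node for 10 hours
print(f"${hotel_job_cost(4, 10, pdaf=True):.2f}")  # small PDAF job, 16-core minimum
print(f"{sus_for_purchase(250):,.0f} SUs")         # minimum purchase
```

At these rates the minimum $250 purchase buys about 10,000 SUs, for example 625 hours on one 16-core general computing node.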

Additional UCSD/Non-UCSD Cost Details

Cost for UCSD Users

For condo participants, the primary costs are the purchase of the computing nodes ($3,934 per node) and a one-time infrastructure fee of $920 per node to cover the costs of shared infrastructure such as interconnects, home file systems, and the parallel file system. In addition, there is a modest IDC-bearing operations fee of $495/node/year, which will allow growth in operations and user services support as the size of the cluster increases. (Most of the operating costs for system administration, user support, software licensing, etc. are supplemented by the RCI program.)
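The per-node figures above can be combined into a simple multi-year estimate. This is a rough sketch under stated assumptions: the function name `condo_cost` is hypothetical, the prices are the ones quoted in this section, and indirect costs (IDC) on the operations fee are left out because no rate is given here.

```python
NODE_PRICE = 3934               # dollars per general computing node
INFRASTRUCTURE_FEE = 920        # one-time, per node
OPERATIONS_FEE_PER_YEAR = 495   # per node per year, before IDC


def condo_cost(nodes, years):
    """Estimate total condo cost in dollars, excluding IDC (hypothetical helper)."""
    one_time = nodes * (NODE_PRICE + INFRASTRUCTURE_FEE)
    recurring = nodes * years * OPERATIONS_FEE_PER_YEAR
    return one_time + recurring


# Eight nodes over the three-year warranty period.
print(condo_cost(8, 3))  # 50712
```

The one-time portion for eight nodes is $38,832, and three years of operations fees add $11,880 before any applicable IDC.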

Condo purchasers may purchase only general computing nodes at this time. A separate price will be determined for GPU nodes in the coming weeks. The current price structure for the condo expenses per node follows (costs and configurations are subject to change annually). The operations fee is supplemented by the RCI program, and pays for labor, software licensing, administration hardware, and colocation fees. It is anticipated that both the node purchase and one-time infrastructure fee will not bear indirect costs, while the annual operating fee will bear applicable IDC.

Cost for non-UCSD users

The TSCC is available to researchers from other UC campuses, other educational institutions, and industry. Costs are competitive but higher than those cited above for UCSD researchers, because the UCSD RCI program supplements the costs only for UCSD researchers. Please contact us for information on the rate structure for your organization.

Most of the system administration, user support, software licensing, and other operating costs are paid by the RCI program. The system is housed at the San Diego Supercomputer Center on the UCSD campus.

Computing