RCI is launching a new computational cluster for research computing, designed as a turnkey, professionally administered high-performance computing resource with flexible usage and business models. Unlike many traditional clusters, the Triton Shared Computing Cluster (TSCC) will be a collaborative effort in which the majority of the cluster nodes are purchased and shared by the cluster users. In addition to these participant-contributed condo nodes, TSCC will have a set of hotel nodes, available both to condo owners and to other researchers on a rental basis. Participants who contribute to the cluster will have priority access to the nodes they contribute, and they have the option to run jobs on additional cluster nodes when available, effectively increasing their computing capability and flexibility.
The TSCC gives participating researchers access to additional cycles during times of peak use through a model that pools computing resources. During periods of intense research, this approach provides participants with far more computational power than they would have running only on their own hardware, and it supports running jobs at a higher core count than their own nodes alone would allow. In the condo model, participants purchase one or more computing nodes for the shared cluster and receive a proportional time allocation. This allocation can be spent over a full year on any combination of the participant's own nodes, nodes contributed by other participants, and the cluster's hotel nodes. Researchers may also purchase time by the hour through the hotel option.
The primary cost to condo participants is the cost of the computing nodes, detailed below. In addition, there will be a small per-node operations fee to allow for growth in operations and user-services support as the size of the cluster increases. The cost to hotel users for on-demand hourly service is projected to be approximately $0.025 per core-hour.
The TSCC condo cost structure is based on condo participants purchasing their nodes, paying a one-time infrastructure fee for their pro rata share of the common networking and storage infrastructure, and then paying a modest annual operating expense that is supplemented by the campus RCI program. Pay-as-you-go hotel users purchase cycles priced to reflect the total cost of ownership, while still benefiting from the economies of scale afforded by TSCC.
Condo participants may purchase only general computing nodes at this time; a separate price for GPU nodes will be determined in the coming weeks. The current per-node condo price structure follows (costs and configurations are subject to change annually). The operations fee is supplemented by the RCI program and pays for labor, software licensing, administration hardware, and colocation fees. It is anticipated that neither the node purchase nor the one-time infrastructure fee will bear indirect costs (IDC), while the annual operating fee will bear applicable IDC.
| Item | Node Purchase | Infrastructure Fee | Annual Ops Fee |
|------|---------------|--------------------|----------------|
| General Compute Node | $3,934 | $920 | $495/yr |
| 128 GB Memory | + $575 | + $0 | - |
| InfiniBand | + $0 | + $200 | - |
Over a four-year life of a fully utilized node, the effective cost to condo participants is less than $0.015 per core-hour, including IDC.
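To see roughly where that figure comes from, here is a minimal back-of-the-envelope sketch in Python. The 16-core node count and the illustrative 55% IDC rate applied to the operations fee are assumptions for the example, not published figures; only the dollar amounts come from the table above.

```python
# Back-of-the-envelope condo cost per core-hour over a four-year node life.
# Assumed values (not from the published rate sheet): a 16-core general
# compute node and an illustrative 55% indirect cost (IDC) rate applied
# only to the annual operations fee, per the fee structure above.

NODE_PURCHASE = 3934.0   # one-time node cost ($)
INFRA_FEE     = 920.0    # one-time infrastructure fee ($)
OPS_FEE       = 495.0    # annual operations fee ($/yr)
YEARS         = 4
CORES         = 16       # assumed cores per general compute node
IDC_RATE      = 0.55     # assumed IDC rate; applies to the ops fee only

total_cost = NODE_PURCHASE + INFRA_FEE + OPS_FEE * YEARS * (1 + IDC_RATE)
core_hours = CORES * 24 * 365 * YEARS   # a fully utilized node

print(f"total cost: ${total_cost:,.0f}")                       # $7,923
print(f"core-hours: {core_hours:,}")                           # 560,640
print(f"effective rate: ${total_cost / core_hours:.4f}/core-hour")
# -> roughly $0.014/core-hour, consistent with the <$0.015 figure above
```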
For pay-as-you-go users, the cost for the general computing hotel nodes is $0.025 per SU (1 SU = one core-hour), and the cost for the PDAF high-memory nodes is also $0.025/SU, but with a 16-core/256 GB minimum per job. Hotel use carries a minimum purchase of $250.
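As a quick sketch of the hotel charging arithmetic, assuming (as the per-core-hour figure above implies) that 1 SU equals one core-hour:

```python
# Hotel pricing arithmetic, assuming 1 SU = 1 core-hour.
SU_RATE        = 0.025   # $ per SU, general and PDAF hotel nodes
MIN_PURCHASE   = 250.0   # minimum hotel purchase ($)
PDAF_MIN_CORES = 16      # minimum cores billed per PDAF job

min_sus = MIN_PURCHASE / SU_RATE
print(f"minimum purchase buys {min_sus:,.0f} SUs")             # 10,000 SUs

# A one-hour job on a PDAF node bills at least the 16-core minimum:
pdaf_min_cost = PDAF_MIN_CORES * 1 * SU_RATE
print(f"1-hour PDAF job costs at least ${pdaf_min_cost:.2f}")  # $0.40
```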
The TSCC is available to researchers from other UC campuses, other educational institutions, and industry. Costs are competitive but higher than those cited above, because the UCSD rates are subsidized by the campus RCI program. Please contact us for information on the rate structure for your organization.
Most of the system administration, user support, software licensing, and other operating costs are paid by the RCI program. The system is housed at the San Diego Supercomputer Center on the UCSD campus.
RCI's TSCC is currently in testing and deployment. The cluster is planned to enter production by February 1, 2013. Discussion and announcements can be found on the TSCC mailing list.