A novel massively parallel supercomputer on the scale of hundreds of teraOPS includes node architectures based upon System-On-a-Chip technology, i.e., each
processing node comprises a 
single Application Specific 
Integrated Circuit (ASIC). Within each ASIC node is a plurality of processing elements, each of which consists of a central processing unit (CPU) and a plurality of floating point processors, to enable an optimal balance of computational performance, packaging density, low cost, and power and cooling requirements. The plurality of processors within a
single node may be used individually or simultaneously to work on any combination of computation or communication as required by the particular 
algorithm being solved or executed at any point in time. The 
system-on-a-chip ASIC nodes are interconnected by multiple independent networks that maximize packet communications throughput and minimize latency. In the preferred embodiment, the multiple networks include three high-speed networks for
parallel algorithm message passing, including a Torus, a Global Tree, and a Global Asynchronous network that provides global barrier and notification functions. These multiple independent networks may be collaboratively or independently utilized according to the needs or phases of an
algorithm for optimizing 
algorithm processing performance. For particular classes of parallel algorithms, or parts of parallel calculations, this architecture exhibits exceptional computational performance, and may be enabled to perform calculations for new classes of parallel algorithms. Additional networks are provided for external 
connectivity and are used for Input/Output, System Management and Configuration, and Debug and Monitoring functions. Special node packaging techniques implementing midplane and other hardware devices facilitate partitioning of the supercomputer into multiple networks for optimizing supercomputing resources.
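
As an illustration of the Torus network named above, the following is a minimal sketch of nearest-neighbor addressing and hop counting on a torus with wraparound links. The three-dimensional 8 x 8 x 8 shape and the function names `torus_neighbors` and `torus_hops` are assumptions made for this example only; the abstract does not state the dimensionality or size of the torus.

```python
def torus_neighbors(coord, dims):
    """Return the nearest neighbors of `coord` on a torus.

    Each axis is a ring, so stepping past either edge wraps around.
    `dims` (the torus shape) is an assumption of this sketch.
    """
    neighbors = []
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            n = list(coord)
            n[axis] = (n[axis] + step) % size  # wraparound link
            neighbors.append(tuple(n))
    return neighbors


def torus_hops(a, b, dims):
    """Minimum hop count between nodes `a` and `b`.

    On each axis the packet may travel either way around the ring,
    so the shorter of the two directions is taken.
    """
    return sum(min((x - y) % s, (y - x) % s) for x, y, s in zip(a, b, dims))


dims = (8, 8, 8)  # hypothetical torus shape
print(torus_neighbors((0, 0, 7), dims))
print(torus_hops((0, 0, 0), (7, 4, 1), dims))
```

In this sketch a node at the edge of the grid still has two neighbors per axis, and the wraparound links keep the worst-case distance along any axis to half the ring length, which is one way a torus can hold down point-to-point latency.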