Embodiments of a non-volatile main memory (NVMM) single-node computing system and a multi-node computing system are disclosed. One embodiment of the NVMM single-node system has a cache subsystem composed entirely of DRAM and a large main memory subsystem composed entirely of NAND flash, and provides different address-mapping policies for each software application. The NVMM memory controller provides high sustained bandwidth for client processor requests by managing the DRAM cache as a large, highly banked system with multiple ranks and multiple DRAM channels, and with large cache blocks to accommodate large NAND flash pages. Multi-node systems organize the NVMM single nodes into a large, interconnected, low-latency network of cache/flash main memory. The entire interconnected flash system exports a single address space to the client processors and, like a unified cache, is shared in a way that can be divided unevenly among them: client processors that need more memory resources receive them at the expense of processors that need fewer. Multi-node systems have numerous configurations, from board-area networks to multi-board networks, with the nodes connected in various Moore graph topologies. Overall, the disclosed memory architecture dissipates less power per GB than traditional DRAM architectures and provides an extremely large solid-state capacity of a terabyte or more of main memory per CPU socket, with a cost per bit approaching that of NAND flash memory and performance approaching that of an all-DRAM system.
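
As a rough illustration of why Moore graph topologies suit a low-latency memory network: a Moore graph reaches the largest node count that is possible for a given number of links per node (degree) and a given worst-case hop count (diameter). The sketch below computes that upper limit, known as the Moore bound. The function name `moore_bound` and its parameters are illustrative assumptions and do not come from the disclosure, which does not state which degrees or diameters the embodiments use.

```python
def moore_bound(degree: int, diameter: int) -> int:
    """Return the Moore bound: the maximum number of nodes in a graph
    whose nodes each have at most `degree` links and whose worst-case
    hop count between any two nodes is `diameter`.

    Hypothetical helper for illustration only.
    """
    if degree < 1 or diameter < 0:
        raise ValueError("degree must be >= 1 and diameter must be >= 0")

    total = 1          # the starting node itself
    ring = degree      # nodes reachable in exactly one hop
    for _ in range(diameter):
        total += ring
        ring *= degree - 1  # each node in the ring fans out to degree - 1 new nodes
    return total


if __name__ == "__main__":
    # Degree 3, diameter 2 gives the Petersen graph's node count.
    print(moore_bound(3, 2))   # 10
    # Degree 7, diameter 2 gives the Hoffman-Singleton graph's node count.
    print(moore_bound(7, 2))   # 50
```

Under these assumptions, a network in which every node has 7 links could hold at most 50 nodes while keeping any two nodes within two hops of each other.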