This exploratory research will evaluate an alternative class of parallel, non-von Neumann architecture, the Continuum Computer Architecture (CCA), in order to open new opportunities for extreme parallelism not available through conventional practices. At the same time, it aims to mitigate current challenges to efficiency through fine-grain hardware structures that address communication latency, synchronization overheads, adaptive contention avoidance, fault tolerance, and energy suppression. An advanced cellular architecture is to be derived such that its global emergent behavior exhibits the properties of general-purpose parallel execution by means of local rules and high replication. The local rules will be based on a combination of dataflow and actor semantics, with a reference tree implementation of a virtual global address space. This strategy reverses the common architectural priority of maximizing ALU utilization: it instead prioritizes memory bandwidth and the time and energy costs of data movement, and emphasizes ALU availability rather than utilization. Beyond local operation, global operation is guided by the experimental ParalleX execution model, which governs resource management and task scheduling and provides a user-friendly programming model. Micro-benchmarks derived from the computational kernels of existing applications will be used to analyze CCA behavior over a range of scientific computing algorithms, including climate, linear algebra, and graph problems. The proposed Phase I research is a stepping stone intended to inform the design and development of future optimized custom silicon structures composing the Smart Memory Accelerators, which are to be deployed in NASA data centers alongside conventional supercomputing hardware. An important characteristic of CCA is the simplicity and uniformity of its computing structures, which make a CMOS implementation feasible for a small engineering team. If successful, this research will reduce the user’s time to solution and increase the scale and complexity of the problems that can be executed.
Potential NASA applications include the CFD Utility Software Library, the NAS Parallel Benchmarks, ARC2D, Cart3D, and INS3D. NASA applications that involve solving systems of sparse linear equations are ideal candidates for the Smart Memory Accelerator.
Four specific non-NASA software applications are targeted for CCA: the Community Earth System Model (CESM), the Graph500 breadth-first search kernel, the High Performance Linpack kernel, and the High Performance Conjugate Gradients kernel. Using CESM, the smart memory architecture has the potential to allow a jump in model resolution from ~100 km to ~20 km in the US Global Forecast System.