Methods and apparatus for localized processing within multicore neural networks. Unlike existing solutions that rely on commodity software and hardware to perform “brute force” large-scale neural network processing, the various techniques described herein map and partition a neural network to fit within the hardware limitations of a target platform. Specifically, the various implementations described herein synergistically leverage localization, sparsity, and distributed scheduling to enable neural network processing within embedded hardware applications. As described herein, hardware-aware mapping and partitioning enhances neural network performance by, for example, avoiding pin-limited memory accesses, processing data in compressed formats and skipping unnecessary operations, and decoupling scheduling between cores.
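
As a purely illustrative sketch of the three ideas named above (localization, sparsity, and decoupled per-core work), the following Python fragment partitions a weight matrix by rows across a number of cores, stores only the nonzero weights in each core's local block, and lets each core compute its slice of the output independently. The function names (`partition_rows`, `core_compute`, `run`), the row-wise partitioning scheme, and the `(column, value)` storage format are assumptions made for this example; they are not taken from the described implementations.

```python
def partition_rows(weights, num_cores):
    """Split a weight matrix (list of rows) into per-core sparse blocks.

    Hypothetical helper: row-wise partitioning is an assumption of this sketch.
    """
    rows_per_core = -(-len(weights) // num_cores)  # ceiling division
    cores = []
    for c in range(num_cores):
        start = c * rows_per_core
        block = weights[start:start + rows_per_core]
        # Compressed storage: keep only (column, value) pairs for nonzero weights.
        sparse = [[(j, w) for j, w in enumerate(row) if w != 0] for row in block]
        cores.append((start, sparse))
    return cores


def core_compute(sparse_block, x):
    """One core's work: reads only its local block; zero weights are never visited."""
    return [sum(w * x[j] for j, w in row) for row in sparse_block]


def run(weights, x, num_cores):
    """Simulate all cores sequentially; each core's result is independent of the others."""
    y = [0] * len(weights)
    for start, block in partition_rows(weights, num_cores):
        out = core_compute(block, x)
        y[start:start + len(out)] = out
    return y


W = [[0, 2, 0, 0],
     [1, 0, 0, 3],
     [0, 0, 0, 0],
     [0, 0, 4, 0]]
x = [1, 2, 3, 4]
print(run(W, x, num_cores=2))  # prints [4, 13, 0, 12]
```

In this sketch the cores are simulated one after another in a single loop; because no core reads another core's block, the same per-core function could in principle run on separate hardware cores without a shared scheduler.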