Abstract: Dedicated neural-network inference processors improve the latency and power efficiency of computing devices. They use custom memory hierarchies that take into account the flow of operators present in ...