MSc ME Thesis Presentation

Off-chip Self Timed SNN Custom Digital Interconnect System

Yichen Yang

To support spike propagation between neurons, neuromorphic computing systems require high-speed communication links.

Meanwhile, spiking neural networks are event-driven, so the communication links normally omit the clock signal and its related blocks.

This thesis aims to develop a self-timed off-chip interconnect system with a ring topology that supports multi-point communication in neuromorphic computing systems. The interconnect system is implemented as a high-level model in SystemC and uses the burst-mode two-wire protocol for point-to-point communication. To keep the system flexible, its control is distributed. Further, the system can be configured with different numbers of chiplets to accommodate various spiking neural network structures.

We also explore an optimization: a bi-directional ring topology that increases throughput. Based on evaluation and simulation results, the interconnect system can achieve 4.57 Gbps in a specific application scenario.
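One intuition for why a bi-directional ring can raise throughput is that packets may take the shorter way around, so each transfer occupies fewer links. The sketch below only counts hops on an idealized ring; the function names and the 8-node ring are illustrative assumptions, and it does not model the self-timed protocol or the SystemC implementation described in the thesis.

```python
def hops_unidirectional(src: int, dst: int, n: int) -> int:
    """Hops from src to dst when packets travel in one direction only."""
    return (dst - src) % n


def hops_bidirectional(src: int, dst: int, n: int) -> int:
    """Hops when a packet may take the shorter of the two directions."""
    forward = (dst - src) % n
    return min(forward, n - forward)


def average_hops(hop_fn, n: int) -> float:
    """Mean hop count over all ordered pairs of distinct nodes."""
    total = sum(hop_fn(s, d, n) for s in range(n) for d in range(n) if s != d)
    return total / (n * (n - 1))


if __name__ == "__main__":
    n = 8  # hypothetical number of chiplets on the ring
    print(average_hops(hops_unidirectional, n))
    print(average_hops(hops_bidirectional, n))
```

Under uniform traffic on this toy model, the average path is shorter in the bi-directional case, which leaves more link capacity for other transfers.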


MSc ME Thesis Presentation

A New Logarithmic Quantization Technique and Corresponding Processing Element Design for CNN Accelerators

Longxing Jiang

Convolutional Neural Networks (CNNs) have become a popular solution for computer vision problems. However, due to the high data volumes and intensive computation involved, deploying CNNs on low-power hardware systems is still challenging. The power consumption of CNNs can be prohibitive on the most common implementation platforms: CPUs and GPUs. Therefore, hardware accelerators that exploit CNN parallelism, and methods that reduce the computation burden or memory requirements, remain active research topics. Quantization is one of these methods.

One suitable quantization strategy for low-power deployments is logarithmic quantization.

Logarithmic quantization for Convolutional Neural Networks (CNNs): a) fits typical weight and activation distributions well, and b) allows the multiplication operation to be replaced by a shift operation that can be implemented with fewer hardware resources. In this thesis, a new quantization method named Jumping Log Quantization (JLQ) is proposed. The key idea of JLQ is to extend the quantization range by adding a coefficient parameter "s" to the power-of-two exponent ($2^{sx+i}$).
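The replacement of multiplication by a shift can be illustrated with a minimal sketch of standard logarithmic quantization. This is not the thesis implementation: the function names, the exponent range, the separate zero code, and rounding in the log domain are all assumptions made for illustration.

```python
import math


def log_quantize(w: float, min_exp: int = -7, max_exp: int = 0):
    """Approximate w by sign * 2**exp; returns (sign, exp).

    The exponent range and the dedicated zero code (sign == 0) are
    assumptions of this sketch, not taken from the thesis.
    """
    if w == 0.0:
        return 0, min_exp
    sign = 1 if w > 0 else -1
    exp = round(math.log2(abs(w)))          # nearest exponent in the log domain
    exp = max(min_exp, min(max_exp, exp))   # clamp to the representable range
    return sign, exp


def shift_multiply(activation: int, sign: int, exp: int) -> int:
    """Multiply a non-negative integer activation by sign * 2**exp using shifts."""
    if sign == 0:
        return 0
    shifted = activation << exp if exp >= 0 else activation >> -exp
    return sign * shifted


if __name__ == "__main__":
    sign, exp = log_quantize(0.3)         # 0.3 is approximated by 2**-2
    print(shift_multiply(64, sign, exp))  # 64 * 0.25 computed as 64 >> 2
```

In hardware, the shift corresponds to a barrel shifter in place of a multiplier, which is where the resource savings come from.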

This quantization strategy skips some of the values used by standard logarithmic quantization. In addition, a small hardware-friendly optimization called weight de-zeroing is proposed in this work. Zero-valued weights, which cannot be realized by a single shift operation, are all replaced with logarithmic weights to reduce hardware resources with little accuracy loss.
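A rough sketch of how a coefficient in the exponent skips values, and of how de-zeroing might work, is given below. The abstract does not define the symbols, so reading the exponent as s*x + i with integer code x, stride s, and offset i is an assumption, as are the function names, the chosen parameter values, and the choice to map zero to the smallest-magnitude positive level.

```python
def jlq_exponents(s: int, i: int, codes) -> list:
    """Exponents s*x + i for each integer code x (assumed reading of JLQ)."""
    return [s * x + i for x in codes]


def quantize_dezeroed(w: float, exponents: list):
    """Map w to the nearest level sign * 2**exp; returns (sign, exp).

    There is no zero level: a zero weight falls on the smallest-magnitude
    level with sign +1, which is this sketch's take on de-zeroing.
    """
    sign = -1 if w < 0 else 1
    exp = min(exponents, key=lambda e: abs(abs(w) - 2.0 ** e))
    return sign, exp


if __name__ == "__main__":
    codes = range(-3, 1)                     # four codes, i.e. 2 bits
    standard = jlq_exponents(1, 0, codes)    # s = 1: consecutive exponents
    jumping = jlq_exponents(2, -1, codes)    # s = 2: every other exponent
    print(standard, jumping)
    print(quantize_dezeroed(0.0, jumping))
```

With the same number of codes, the larger stride covers a wider exponent range at the cost of skipping intermediate levels, and every weight, including zero, remains a single shift.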

To implement the Multiply-And-Accumulate (MAC) operation (needed to compute convolutions) when the weights are JLQ-ed and de-zeroed, a new Processing Element (PE) has been developed. This new PE uses a modified barrel shifter that can efficiently avoid the skipped values.

Resource utilization, area, and power consumption of the new PE, both standalone and in a systolic array prototype, are reported. The results show that JLQ performs better than other state-of-the-art logarithmic quantization methods when the bit width of the operands becomes very small.