ترغب بنشر مسار تعليمي؟ اضغط هنا

In this paper we extend the concept of the traditional transactor, which focuses on correct content transfer, to a new timing-coherent transactor that also accurately aligns the timing of each transaction boundary so that designers can perform precis e concurrent system behavior analysis in mixed-abstraction-level system simulations which are essential to increasingly complex system designs. To streamline the process, we also developed an automatic approach for timing-coherent transactor generation. Our approach is actually applied in mixed-level simulations and the results show that it achieves 100% timing accuracy while the conventional approach produces results of 25% to 44% error rate.
We propose an effective parallel program debugging approach based on the timing annotation technique. With prevalent multi-core platforms, parallel programming is required to fully utilize the computing power. However, the non-determinism property an d the associated concurrency bugs are notorious and remain to be great challenge to designers. We hence propose an effective program debugging approach using the timing annotation technique derived from the deterministic Multi-Core Instruction Set Simulation (MCISS) technology. We hence construct a deterministic execution environment for parallel program debugging and devise a few unique, effective and easy-to-use parallel debugging functions. We modify QEMU and GDB to implement and demonstrate our proposed idea. The usage of our debugger is almost identical to the conventional GDB debugger. Therefore, users may learn how to use the tool seamlessly.
In this paper, we propose a very compact embedded CNN processor design based on a modified logarithmic computing method using very low bit-width representation. Our high-quality CNN processor can easily fit into edge devices. For Yolov2, our processi ng circuit takes only 0.15 mm2 using TSMC 40 nm cell library. The key idea is to constrain the activation and weight values of all layers uniformly to be within the range [-1, 1] and produce low bit-width logarithmic representation. With the uniform representations, we devise a unified, reusable CNN computing kernel and significantly reduce computing resources. The proposed approach has been extensively evaluated on many popular image classification CNN models (AlexNet, VGG16, and ResNet-18/34) and object detection models (Yolov2). The hardware-implemented results show that our design consumes only minimal computing and storage resources, yet attains very high accuracy. The design is thoroughly verified on FPGAs, and the SoC integration is underway with promising results. With extremely efficient resource and energy usage, our design is excellent for edge computing purposes.
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا