Clock distribution networks (CDNs) are costly in high-performance ASICs. This paper proposes a new approach: splitting clock domains at a very fine level, down to the level of a handful of gates. Each domain is synchronized with an inexpensive clock signal, generated locally. This is possible by adopting the paradigm of Stochastic Computation, where signal values are encoded as random bit streams. The design method is illustrated with the synthesis of circuits for applications in signal and image processing.
~
NAJAFI, M. Hassan, LILJA, David J., RIEDEL, Marc and BAZARGAN, Kia, 2016. Polysynchronous stochastic circuits. In: 2016 21st Asia and South Pacific Design Automation Conference (ASP-DAC). IEEE. 2016. p. 492–498. ieee [Accessed 23 February 2024].
All electronic systems are inherently asynchronous in nature. By carefully choreographing transitions with clock signals, asynchronous circuitry can be adapted to appear to behave synchronously. Such synchronism brings significant advantages: it greatly simplifies the design effort; also, with predictable timing, one can make performance guarantees. However, synchronism comes at a significant cost: one must create a clock distribution network (CDN) that supplies a common reference signal to all synchronous components. Historically, the primary design goal for CDNs has been to ensure that a single clock signal arrives at every synchronous component at precisely the same time (so there is zero clock skew). Achieving this is difficult and costly in terms of design effort and resources. In modern large-scale integrated circuits, the CDN accounts for significant area, consumes significant power, and often limits the overall circuit performance [1][2]. With increasing variation in circuit parameters, designing CDNs with tolerable clock skew is becoming a major design bottleneck.
[1] E.G. Friedman. Clock distribution networks in synchronous digital integrated circuits. Proceedings of the IEEE, 89(5):665–692, May 2001.
[2] Y. Jiang, H. Zhang, H. Zhang, H. Liu, X. Song, M. Gu, and J. Sun. Design of mixed synchronous/asynchronous systems with multiple clocks. Parallel and Distributed Systems, IEEE Transactions on, PP(99):1–1, 2014.
Completely asynchronous design methodologies have been studied for decades, but they have never gained widespread acceptance (in spite of strong advocacy by proponents). Circuits with multiple independent clock domains, i.e., circuits that are globally asynchronous but locally synchronous (GALS), have become common [3]. Splitting the domains reduces the cost of the distribution network, but relatively complex handshaking circuitry is needed at domain crossings, so the splitting is only performed at a coarse level.
[3] D. Chapiro. Globally-asynchronous locally-synchronous systems. Stanford University, 1984. pdf
This paper proposes a radically new approach: splitting clock domains at a very fine level, with domains consisting of only a handful of gates each. Each domain is synchronized by an inexpensive clock signal, generated locally. This is feasible if one adopts a stochastic representation for signal values [4, 5, 6, 7, 8, 9]. Logical computation is performed on randomized bit streams, with signal values encoded in the statistics of the streams: a real value x in the interval [0, 1] is represented by a stream with bits each having independent probability x of being 1.
[4] B.R. Gaines. Stochastic computing systems. In Julius T. Tou, editor, Advances in Information Systems Science, pages 37–172. Springer US, 1969.
[5] W. Qian and M.D. Riedel. The synthesis of robust polynomial arithmetic with stochastic logic. In 45th ACM/IEEE Design Automation Conference, DAC’08, pages 648–653, 2008.
[6] Weikang Qian, Xin Li, M.D. Riedel, K. Bazargan, and D.J. Lilja. An architecture for fault-tolerant computation with stochastic logic. Computers, IEEE Transactions on, 60(1):93–105, Jan 2011.
[7] P. Li, D. J. Lilja, W. Qian, K. Bazargan, and M. Riedel. The synthesis of complex arithmetic computation on stochastic bit streams using sequential logic. In Computer-Aided Design, 2012. ICCAD 2012. IEEE/ACM International Conference on. IEEE, 2012.
[8] Peng Li, D.J. Lilja, Weikang Qian, K. Bazargan, and M.D. Riedel. Computation on stochastic bit streams: digital image processing case studies. Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, 22(3):449–462, March 2014.
[9] Armin Alaghi and John P. Hayes. Survey of stochastic computing. ACM Trans. Embed. Comput. Syst., 12(2s):92:1–92:19, May 2013.
Compared to a binary radix representation, such a stochastic representation is not very compact. With M bits, a binary radix representation can represent 2^M distinct numbers. To represent real numbers with a resolution of 2^−M , i.e., numbers of the form a/2^M for integers a between 0 and 2^M , a stochastic representation requires a stream of 2^M bits. The two representations are at opposite ends of the spectrum: conventional binary radix is a maximally compressed, positional encoding; a stochastic representation is an uncompressed, uniform encoding.
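To make the length disparity concrete, here is a minimal sketch (not from the paper) that encodes a number of the form a/2^M as a 2^M-bit unipolar stream and recovers it by counting ones; the helper names and the use of Python's random module are illustrative assumptions.

```python
import random

def encode_stream(x, length):
    # Unipolar encoding: each bit is independently 1 with probability x.
    return [1 if random.random() < x else 0 for _ in range(length)]

def decode_stream(stream):
    # The represented value is simply the fraction of ones in the stream.
    return sum(stream) / len(stream)

M = 8
x = 3 / 2**M                      # a number of the form a / 2^M
stream = encode_stream(x, 2**M)   # resolution 2^-M calls for a 2^M-bit stream
print(decode_stream(stream))      # close to 3/256; binary radix needs only M bits
```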
~
How to achieve persistence of a 2D coordinate system representing connections? This post explains it with a bottom-up approach. post
A visual proof of the existence of an anti-symmetric complementary space S* of gaps (another space) within the traditional mathematical continuum X=[0,1].
~
A stochastic representation, although not very compact, has an advantage over binary radix in terms of error tolerance. Suppose that the environment is noisy: bit flips occur and these afflict all the bits with equal probability. With a binary radix representation, in the worst case, the most significant bit gets flipped, resulting in a large error. In contrast, with a stochastic representation, all the bits in the stream have equal weight. A single flip results in a small error. This error tolerance scales to high error rates: multiple bit flips produce small and uniform deviations from the nominal value. More compelling than the error tolerance is the simplicity of the designs in the stochastic paradigm. Complex functions can be implemented with remarkably simple logic. A reduction in area of 50x or 100x compared to conventional implementations is common [8][10].
[8] Peng Li, D.J. Lilja, Weikang Qian, K. Bazargan, and M.D. Riedel. Computation on stochastic bit streams: digital image processing case studies. Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, 22(3):449–462, March 2014.
[10] A. Alaghi and J.P. Hayes. Fast and accurate computation using stochastic circuits. In Design, Automation and Test in Europe Conference and Exhibition (DATE), 2014, pages 1–4, March 2014.
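A small sketch (my own, not from the paper) of the bit-flip argument: one flip in the most significant bit of a binary radix word versus one flip anywhere in a stochastic stream of the same resolution. The values and flip positions chosen are illustrative.

```python
M = 8
a = 200                                  # the value a / 2^M = 200/256

# Binary radix: flipping the most significant bit changes the value by 1/2.
flipped = a ^ (1 << (M - 1))
print(abs(flipped - a) / 2**M)           # 0.5, a large error

# Stochastic stream: every bit carries weight 1/2^M, so a single flip
# changes the value by only 1/256, regardless of which bit is hit.
stream = [1] * a + [0] * (2**M - a)      # an idealized stream encoding 200/256
stream[0] ^= 1                           # flip any one bit
print(abs(sum(stream) - a) / 2**M)       # 1/256, a small error
```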
Perhaps the most compelling advantage of the stochastic paradigm, however, is the one advocated in this paper. With a stochastic representation, computational units can tolerate skew in the arrival times of their inputs. This stems from the fact that the stochastic representation is uniform: all that matters for the value that is computed is the fraction of time that the signal is high. We will demonstrate that the correct value is computed even when the inputs are misaligned temporally. Accordingly, adopting the stochastic paradigm obviates the need for a global clock signal and the associated CDN. Instead, one can simply use local clock signal generators throughout.
We call this approach polysynchronous stochastic (to distinguish it from asynchronous and GALS methodologies).
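As an illustration of the skew tolerance (a sketch of my own, not the paper's experiments): an AND gate multiplies two unipolar streams, and rotating one stream by a few bit periods to model a misaligned arrival time leaves the output fraction essentially unchanged, because the bits are independent. The helper names are assumptions.

```python
import random

def stream(p, n):
    return [1 if random.random() < p else 0 for _ in range(n)]

N = 4096
a, b = stream(0.5, N), stream(0.25, N)

# Aligned inputs: the AND output is 1 with probability about 0.5 * 0.25 = 0.125.
print(sum(x & y for x, y in zip(a, b)) / N)

# Skewed inputs: rotate one stream to model a misaligned arrival time.
skew = 7
b_skewed = b[skew:] + b[:skew]
print(sum(x & y for x, y in zip(a, b_skewed)) / N)   # still close to 0.125
```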
The paper is structured as follows. In Section II, we provide some background on stochastic computing. In Section III, we introduce polysynchronous stochastic concepts and demonstrate how to implement basic operations. In Section IV, we describe our experimental methodology and present experimental results. Finally, in Section V, we present conclusions and discuss future work.
BACKGROUND
In the paradigm of Stochastic Computing (SC), circuits operate on random bit streams where the signal value is encoded by the probability of obtaining a one versus a zero. In the unipolar stochastic representation, each real-valued number x (0 ≤ x ≤ 1) is represented by a sequence of random bits, each of which has probability x of being one and probability 1 − x of being zero. In the bipolar representation, each real-valued number y (−1 ≤ y ≤ 1) is represented by a sequence of random bits, each of which has probability (y + 1)/2 of being one and probability 1 − (y + 1)/2 of being zero.
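A minimal sketch (not from the paper) of the two mappings: a unipolar stream decodes to the fraction of ones, while a bipolar stream decodes to twice that fraction minus one. The helper names are illustrative.

```python
import random

def encode(p, n):
    # A stream whose bits are independently 1 with probability p.
    return [1 if random.random() < p else 0 for _ in range(n)]

def decode_unipolar(s):
    return sum(s) / len(s)            # x = P(bit = 1)

def decode_bipolar(s):
    return 2 * sum(s) / len(s) - 1    # y = 2 * P(bit = 1) - 1

N = 4096
print(decode_unipolar(encode(0.3, N)))            # close to x = 0.3
print(decode_bipolar(encode((-0.4 + 1) / 2, N)))  # close to y = -0.4
```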
This representation is much less compact than a binary radix. However, complex operations can be performed with very simple logic. In particular, arithmetic functions consisting of operations such as addition and multiplication can be implemented very efficiently. Complex functions, such as exponentials and trigonometric functions, can be computed through polynomial approximations [5, 8].
[5] W. Qian and M.D. Riedel. The synthesis of robust polynomial arithmetic with stochastic logic. In 45th ACM/IEEE Design Automation Conference, DAC’08, pages 648–653, 2008.
[8] Peng Li, D.J. Lilja, Weikang Qian, K. Bazargan, and M.D. Riedel. Computation on stochastic bit streams: digital image processing case studies. Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, 22(3):449–462, March 2014.
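For instance, scaled addition is commonly realized with a multiplexer whose select input is itself a stream with probability 1/2; below is a minimal sketch of that standard construction (the helper names are my own).

```python
import random

def stream(p, n):
    return [1 if random.random() < p else 0 for _ in range(n)]

N = 4096
a, b, sel = stream(0.6, N), stream(0.2, N), stream(0.5, N)

# MUX: pass a bit of `a` when the select bit is 1, a bit of `b` otherwise.
out = [x if s else y for x, y, s in zip(a, b, sel)]
print(sum(out) / N)   # close to (0.6 + 0.2) / 2 = 0.4
```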
Because the bit stream representation is uniform, with all bits weighted equally, circuits designed this way are highly tolerant of soft errors (i.e., bit flips) [6].
[6] Weikang Qian, Xin Li, M.D. Riedel, K. Bazargan, and D.J. Lilja. An architecture for fault-tolerant computation with stochastic logic. Computers, IEEE Transactions on, 60(1):93–105, Jan 2011.
Critical to the ideas in this paper is the observation that the stochastic representation is a uniform fractional representation: all that matters is the fraction of time that the signal is high. Consequently, precise synchronization between the arrival time of input values to logic gates does not matter. This is illustrated in the next section.
[…]
~
Ward did interesting computation on networks of computers using only one page of assembly language. page See Cybords
> These little computers were so inexpensive that I would buy 50 at a time.
> A message was just an electrical pulse, high or low, repeated stochastically until communication happened a few tenths of a second later. Each part had two interface pins, one sending, the other listening. A computer could listen to multiple senders at the same time and chance would have it Hear the Arithmetic Sum of both sources.
~