SHARC DSP Benchmarks
Real-time signal processing tasks are both I/O-intensive and
computationally intensive. In addition to high-speed math
units and single-cycle execution of all instructions,
including multiply-accumulates (MACs), SHARC DSPs are designed
for maximum I/O and memory access bandwidth. This balance of
core speed, memory integration, and I/O bandwidth delivers the
sustained performance critical to real-time applications.
The new ADSP-21065L processes up to 198 million math
operations per second, the combined total of its three math
units, each running at 66 MHz. In fact, millions of additional
operations execute each second, bringing the total to over
600 million operations per second.
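The 198 million figure follows directly from the unit count
and the clock rate. A minimal sketch of that arithmetic,
assuming each math unit completes one operation per clock
cycle (the function name peak_mops is hypothetical, chosen for
illustration only):

```python
def peak_mops(math_units: int, clock_mhz: float) -> float:
    """Peak math operations per second, in millions.

    Assumption: every math unit completes one operation on every
    clock cycle, so the peak is simply units x clock rate.
    """
    return math_units * clock_mhz


# ADSP-21065L: three math units, each running at 66 MHz.
adsp_21065l = peak_mops(math_units=3, clock_mhz=66)
print(f"ADSP-21065L peak: {adsp_21065l:.0f} million math operations per second")
```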
The ADSP-21160 is more impressive still. With two sets of
computational units (ALU, barrel shifter, MAC, and register
file), the 95 MHz ADSP-21160 can deliver a fivefold
performance increase over the ADSP-2106x on a range of DSP
algorithms.
Benchmarks matter because they show how a particular DSP
performs in the context of an application. The smaller the
benchmark number, the faster the algorithm executes, and a DSP
that completes a task faster can perform more tasks in a given
amount of time. The cycle time, clock speed, or MIPS rating of
a DSP cannot by itself give an accurate indication of the
processor's true performance, so it is important to analyze
algorithm benchmarks as well.
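To illustrate why clock speed alone can mislead, the sketch
below compares the clock ratio with the measured speedup for
the 1024-point complex FFT, using the times from the benchmark
table. The helper names (runs_per_second, speedup) are
hypothetical and exist only for this example.

```python
# 1024-point complex FFT times from the benchmark table,
# keyed by clock speed in MHz, values in microseconds.
FFT_TIME_US = {66: 279.0, 95: 97.0, 100: 92.0}


def runs_per_second(time_us: float) -> float:
    """How many back-to-back runs of a task fit into one second."""
    return 1_000_000 / time_us


def speedup(slow_us: float, fast_us: float) -> float:
    """Ratio of execution times; values above 1 mean 'fast' is faster."""
    return slow_us / fast_us


clock_ratio = 100 / 66
fft_ratio = speedup(FFT_TIME_US[66], FFT_TIME_US[100])

print(f"Clock speed ratio (100 MHz vs 66 MHz): {clock_ratio:.2f}x")
print(f"FFT speedup       (100 MHz vs 66 MHz): {fft_ratio:.2f}x")
for mhz, time_us in FFT_TIME_US.items():
    print(f"{mhz} MHz: {runs_per_second(time_us):,.0f} FFTs per second")
```

The clock runs only about 1.5 times faster, yet the FFT
completes about 3 times faster, which is the kind of
difference a clock-speed comparison would hide.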
Clock Speed                                         | 66 MHz     | 95 MHz     | 100 MHz    |
Instruction Cycle Time                              | 15 ns      | 10.5 ns    | 10 ns      |
MFLOPS Sustained                                    | 132 MFLOPS | 380 MFLOPS | 400 MFLOPS |
MFLOPS Peak                                         | 198 MFLOPS | 570 MFLOPS | 600 MFLOPS |
1024-Point Complex FFT (Radix 4, with bit reversal) | 279 µs     | 97 µs      | 92 µs      |
FIR Filter (per tap)                                | 15 ns      | 5.2 ns     | 5 ns       |
IIR Filter (per biquad)                             | 61 ns      | 21 ns      | 20 ns      |
Matrix Multiply (pipelined), [3x3] * [3x1]          | 136 ns     | 47 ns      | 45 ns      |
Matrix Multiply (pipelined), [4x4] * [4x1]          | 242 ns     | 83 ns      | 80 ns      |
Divide (y/x)                                        | 91 ns      | 31 ns      | 30 ns      |
Inverse Square Root                                 | 136 ns     | 47 ns      | 45 ns      |
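
The per-tap and per-biquad figures can be turned into a rough
real-time budget. The sketch below estimates what fraction of
each sample period a filter chain would occupy. The workload
(a 64-tap FIR plus 4 biquads at a 48 kHz sample rate) and the
function name filter_load are hypothetical, and the estimate
covers the filter kernels only: loop setup, data movement, and
interrupt overhead are ignored.

```python
# Kernel times from the benchmark table, keyed by clock speed in MHz.
FIR_NS_PER_TAP = {66: 15.0, 95: 5.2, 100: 5.0}
IIR_NS_PER_BIQUAD = {66: 61.0, 95: 21.0, 100: 20.0}


def filter_load(clock_mhz: int, sample_rate_hz: float,
                fir_taps: int = 0, biquads: int = 0) -> float:
    """Fraction of one sample period spent in the filter kernels.

    Assumption: total time is the plain sum of per-tap and
    per-biquad times, with no overhead between kernels.
    """
    busy_ns = (fir_taps * FIR_NS_PER_TAP[clock_mhz]
               + biquads * IIR_NS_PER_BIQUAD[clock_mhz])
    period_ns = 1e9 / sample_rate_hz
    return busy_ns / period_ns


# Hypothetical workload: 64-tap FIR followed by 4 biquads at 48 kHz.
for clock in (66, 95, 100):
    load = filter_load(clock, sample_rate_hz=48_000, fir_taps=64, biquads=4)
    print(f"{clock} MHz: {load:.1%} of each sample period")
```

A result well below 100% suggests headroom for additional
channels or algorithms within the same sample period, subject
to the overheads this estimate leaves out.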