Designing Divider
Summary
In this section, how digital divider calculates quotient and remainder will be explained. Also, how this can be implemented on FPGA will be shown with some experimental results.
Binary division
Figure 1 shows the decimal division calculation. How decimal division works is like below.
- Compare each digit of dividend with divisor, and if divisor is greater than that digit, do nothing and go to next digit (First “0” in figure 1).
- If divisor is less than or equal to that digit, find the maximum product of divisor, which is less than or equal to that digit (50 x 1 in blue background in figure 1). This multiplier (“1” in this case) will be a digit of quotient.
- Subtract the product from the digit, and remainder goes to next calculation (67 in figure 1).
- Repeat it until it reaches the last digit, or remainder becomes 0.
Binary division is much easier. Figure 2 shows the binary division calculation. Algorithm will be like below.
- Compare each digit of dividend with divisor.
- If divisor is greater than the digit, do nothing and go to the next digit.
- If divisor is less than or equal to that digit, subtract the divisor from that digit, and remainder goes to the next calculation.
- Repeat it until it reaches the last digit, or remainder becomes 0.
Implementation of the divider
There are many ways to implement divider. There are two key different approaches.
- Restoring
- Non-restoring
Restoring algorithm is always accurate, but it is generally slower than non-restoring approach. Non-restoring algorithm might includes some errors, but calculation is faster. In this article, I’ll stick to “restoring” approach, because non-restoring algorithm tends to be more complicated. Which means, it’ll use more resources on FPGA.
Array divider
The concept of array divider is quite similar to array multiplier in previous section. Array divider uses multiple subtractors to calculate quotient and reminder at once. Figure 3 shows an abstract figure of the array divider implementation.
One thing to note is, subtractor in array divider is special. This subtractor has 2 outputs;
- Difference … If A ≥ B: A - B, else: A
- Sign … If A ≥ B:1, else: 0
So basically, this subtractor does not output negative value. It either outputs A - B, or A as it is depends on the sign of this calculation. If sign bit is “1”, which means negative, subtractor outputs A as it is, and if sign bit is “0”, which means positive, it outputs A - B.
Also, important to note that, because of this special subtractor, array divider has more delay compared to multiplier. This delay slows the clock frequency, and entire performance of the divider.
Figure 4 shows the simulation result of the array divider. In a “simulation”, the result is calculated immediately (0x13579bdf / 0x2468 = 0x8802 … 0x130f). But in real world, this has some delays.
The implementation is simple, but it consumes huge area of FPGA. Figure 5 shows the synthesis and implementation result from Vivado.
Multi-cycle divider
Multi-cycle divider has just one subtractor, shift-register, and state machine. Each digit will be calculated in one cycle. This is the smallest and simplest hardware, but it takes number of digit cycles to calculate quotient and remainder. Figure 6 shows the simulation result of multi-cycle divider.
Since operands are both 32 bits, it takes 32 cycles to calculate the result. Figure 7 shows the synthesized/implemented result from Vivado. This consumes tiny little area of FPGA.
Hybrid approach
Since this will be implemented on FPGA, and clock frequency is not that high (around 100 MHz), simple enough but more efficient approach is to combine array divider with multi-cycle divider. There are multiple subtractors (in this implementation, there are 4) which calculate 4 digit of quotient at a time (partial array divider). Since 4 digits will be calculated at once, it takes 8 cycles to calculate quotient and remainder (32 bits / 4 = 8). Figure 8 shows the simulation result of hybrid divider.
Simulation result is fine, but in reality, this divider doesn’t even work on FPGA due to delays from partial array divider (timing doesn’t match). But as you may see in figure 9, this is 4 times faster than multi-cycle divider, also it’s more area-efficient compared to array divider.
I also implemented and tested 2 digit version of partial array divider, and this worked on Zedboard.
Radix-4 divider
Another way to calculate multiple digit at once is, using radix-4 divider. What it does is, it calculates “divisor”, “divisor x 2”, “divisor x 3” in parallel, and compare those 3 with current remainder. Therefore, this hardware can output either 0 (0b00), 1 (0b01), 2 (0b10), or 3 (0b11) as quotient.
Figure 10 shows the simulation result of the combination of radix-4 + 4 * partial array divider. This can calculate 8 digit at once.
Again, this simulation result is fine, but because of delay from partial array divider, this did not work on FPGA at 100 MHz (timing does not match).
Figure 11 shows implementation result of this divider.
Conclusion
In this section, how the digital hardware calculates division was explained. Also, multiple different implementation for FPGA were shown with both simulation and implementation results. There are many different ways to implement hardware divider on the ASIC, or on the FPGA. In this section, the focus was to find realistic, and simple enough implementation specifically on FPGA. In conclusion, partial array divider with 4 subtractors did not work on FPGA at 100 MHz. So the realistic approach was to use radix-4 divider with partial array divider with 2 subtractors (8 cycles to calculate 32 bit quotient and remainder).