Designing Divider

Yuhei Horibe
5 min readOct 29, 2020
Figure1. Division

Summary

In this section, how digital divider calculates quotient and remainder will be explained. Also, how this can be implemented on FPGA will be shown with some experimental results.

Binary division

Figure 1 shows the decimal division calculation. How decimal division works is like below.

  • Compare each digit of dividend with divisor, and if divisor is greater than that digit, do nothing and go to next digit (First “0” in figure 1).
  • If divisor is less than or equal to that digit, find the maximum product of divisor, which is less than or equal to that digit (50 x 1 in blue background in figure 1). This multiplier (“1” in this case) will be a digit of quotient.
  • Subtract the product from the digit, and remainder goes to next calculation (67 in figure 1).
  • Repeat it until it reaches the last digit, or remainder becomes 0.

Binary division is much easier. Figure 2 shows the binary division calculation. Algorithm will be like below.

Figure2. Binary division
  • Compare each digit of dividend with divisor.
  • If divisor is greater than the digit, do nothing and go to the next digit.
  • If divisor is less than or equal to that digit, subtract the divisor from that digit, and remainder goes to the next calculation.
  • Repeat it until it reaches the last digit, or remainder becomes 0.

Implementation of the divider

There are many ways to implement divider. There are two key different approaches.

  • Restoring
  • Non-restoring

Restoring algorithm is always accurate, but it is generally slower than non-restoring approach. Non-restoring algorithm might includes some errors, but calculation is faster. In this article, I’ll stick to “restoring” approach, because non-restoring algorithm tends to be more complicated. Which means, it’ll use more resources on FPGA.

Array divider

The concept of array divider is quite similar to array multiplier in previous section. Array divider uses multiple subtractors to calculate quotient and reminder at once. Figure 3 shows an abstract figure of the array divider implementation.

One thing to note is, subtractor in array divider is special. This subtractor has 2 outputs;

  • Difference … If A ≥ B: A - B, else: A
  • Sign … If A ≥ B:1, else: 0

So basically, this subtractor does not output negative value. It either outputs A - B, or A as it is depends on the sign of this calculation. If sign bit is “1”, which means negative, subtractor outputs A as it is, and if sign bit is “0”, which means positive, it outputs A - B.

Figure3. Array divider

Also, important to note that, because of this special subtractor, array divider has more delay compared to multiplier. This delay slows the clock frequency, and entire performance of the divider.

Figure 4 shows the simulation result of the array divider. In a “simulation”, the result is calculated immediately (0x13579bdf / 0x2468 = 0x8802 … 0x130f). But in real world, this has some delays.

Figure 4. Array divider simulation result

The implementation is simple, but it consumes huge area of FPGA. Figure 5 shows the synthesis and implementation result from Vivado.

Figure 5. Implementation of the array divider on FPGA

Multi-cycle divider

Multi-cycle divider has just one subtractor, shift-register, and state machine. Each digit will be calculated in one cycle. This is the smallest and simplest hardware, but it takes number of digit cycles to calculate quotient and remainder. Figure 6 shows the simulation result of multi-cycle divider.

Figure 6. Simulation result of multi-cycle divider

Since operands are both 32 bits, it takes 32 cycles to calculate the result. Figure 7 shows the synthesized/implemented result from Vivado. This consumes tiny little area of FPGA.

Figure 7. Implementation of multi-cycle divider on FPGA

Hybrid approach

Since this will be implemented on FPGA, and clock frequency is not that high (around 100 MHz), simple enough but more efficient approach is to combine array divider with multi-cycle divider. There are multiple subtractors (in this implementation, there are 4) which calculate 4 digit of quotient at a time (partial array divider). Since 4 digits will be calculated at once, it takes 8 cycles to calculate quotient and remainder (32 bits / 4 = 8). Figure 8 shows the simulation result of hybrid divider.

Figure 8. Simulation result of hybrid divider

Simulation result is fine, but in reality, this divider doesn’t even work on FPGA due to delays from partial array divider (timing doesn’t match). But as you may see in figure 9, this is 4 times faster than multi-cycle divider, also it’s more area-efficient compared to array divider.

Figure 9. Implementation of hybrid divider on FPGA

I also implemented and tested 2 digit version of partial array divider, and this worked on Zedboard.

Radix-4 divider

Another way to calculate multiple digit at once is, using radix-4 divider. What it does is, it calculates “divisor”, “divisor x 2”, “divisor x 3” in parallel, and compare those 3 with current remainder. Therefore, this hardware can output either 0 (0b00), 1 (0b01), 2 (0b10), or 3 (0b11) as quotient.

Figure 10 shows the simulation result of the combination of radix-4 + 4 * partial array divider. This can calculate 8 digit at once.

Figure 10. Simulation result of radix-4 + partial array divider

Again, this simulation result is fine, but because of delay from partial array divider, this did not work on FPGA at 100 MHz (timing does not match).

Figure 11 shows implementation result of this divider.

Figure 11. Implementation result of radix-4 + partial array divider

Conclusion

In this section, how the digital hardware calculates division was explained. Also, multiple different implementation for FPGA were shown with both simulation and implementation results. There are many different ways to implement hardware divider on the ASIC, or on the FPGA. In this section, the focus was to find realistic, and simple enough implementation specifically on FPGA. In conclusion, partial array divider with 4 subtractors did not work on FPGA at 100 MHz. So the realistic approach was to use radix-4 divider with partial array divider with 2 subtractors (8 cycles to calculate 32 bit quotient and remainder).

--

--