# Forum: FPGA, VHDL & Verilog: LUT utilization is 121 %

Hi,
I am working on a multiplicative inverse algorithm to implement an ECC
algorithm. I have written the Verilog code for it. The simulation works
properly, but the synthesis report shows a LUT utilization of
76776 out of 63400. I applied the DSP attribute to reduce the number of
LUTs, which brought it down from 78000 to 76776.
I am targeting the Nexys4DDR FPGA board.
Please suggest a way to minimize LUT utilization. I have attached
the project summary and utilization report for this code.
    (* use_dsp48 = "yes" *)
    module multiplicative_inverse #(parameter n = 190)(clk, reset, enable, p, a, x, result_ready);

      input clk, reset, enable;
      input wire [n-1:0] p;
      input wire [n-1:0] a ;
      output reg [n-1:0] x;
      output reg result_ready;

      //flag to initialize variables when reset
      reg flag;

      reg [n-1:0] Y;
      reg [n-1:0] D;
      reg [n-1:0] B;

      wire Y0 = (Y == 0) ? 1 : 0;
      wire D0 = (D == 0) ? 1 : 0;

      reg flagY0 = 1;
      reg flagD0 = 1;

      always @ (posedge clk) begin
        if (reset) begin
          x <= 0;
          flag <= 1;
          result_ready <= 0;
        end
        else begin
          if(result_ready) begin
            result_ready <= 0;
            flag <= 1;
          end
          else
          //Initialize the variables
          if (flag || enable) begin
            Y <= a;
            D <= p;
            B <= 1;
            x <= 0;
            flag <= 0;
          end
          else begin
            if (Y != 0) begin
              if (Y0 && flagY0) begin
                Y = Y >> 1;
                B = (B + (B*p)) >> 1;
              end //end if
              else begin
                flagY0 = 0;
              end
              if (D0 && flagD0) begin
                D = D >> 1;
                x = (x + (x*p)) >> 1;
              end //end if
              else begin
                flagD0 = 0;
              end
              if((flagY0 == 0) && (flagD0 == 0)) begin
                if (Y >= D) begin
                  Y = Y - D;
                  if(B < x) B = B + p ;
                  B = (B - x) % p;
                end
                else begin
                  if(D < Y) D = D + p;
                  D = D - Y;
                  if(x < B) x = x + p;
                  x = (x - B) % p;
                end //else
                flagY0 = 1;
                flagD0 = 1;
              end //else
            end //end if
            else begin
              result_ready <= 1;
            end
          end
        end
      end

    endmodule

I'm guessing you're trying to find $x$ such that
$a \cdot x \equiv 1 \pmod{p}$ with [...].

I'm not quite sure how your algorithm is supposed to work, since [...]
to their functionality. However, you're instantiating at least two
190-bit modulo operators, which should be a dead giveaway as to where
the resources are going. You're also going to see some ridiculous timing
violations on these paths.

Each of these is going to need on the order of $n^2/2$ LUTs, since it's
basically doing a test-subtraction of a shifted value of p for every bit.

At least the solution is simple: get a bigger FPGA or write a better
algorithm. Your choice.

Oh, I forgot to mention: if it were just one LUT per bit, that would
come to almost 18k LUTs per '%' in your code, i.e. 36k for the two of
them. It's probably more like 2 LUTs per bit or something...
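
One way to avoid the two `% p` operators is sketched below. It assumes that both operands are already reduced, i.e. `0 <= a < p` and `0 <= b < p`; under that assumption `(a - b) mod p` needs only one subtraction and one conditional addition of `p`, so no divider is inferred. The module name `mod_sub` and its ports are made up for illustration and are not part of the posted design.

```verilog
// Hypothetical helper, not from the thread: computes (a - b) mod p
// without the '%' operator.
// ASSUMPTION: 0 <= a < p and 0 <= b < p (both operands already reduced).
module mod_sub #(parameter n = 190) (
    input  wire [n-1:0] a,
    input  wire [n-1:0] b,
    input  wire [n-1:0] p,
    output wire [n-1:0] r
);
    // One extra bit captures the borrow of a - b.
    wire [n:0] diff  = {1'b0, a} - {1'b0, b};

    // If a < b the difference is negative, so add p back exactly once.
    wire [n:0] fixed = diff + {1'b0, p};

    // diff[n] is set only when the subtraction borrowed.
    assign r = diff[n] ? fixed[n-1:0] : diff[n-1:0];
endmodule
```

If `B` and `x` in the posted code always stay below `p`, the existing `if (B < x) B = B + p;` line already brings the difference into the range `0 .. p-1`, which would make the following `% p` redundant. The extra bit in the sketch also avoids the overflow that `B + p` can cause in an n-bit register.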

Lakshita J. wrote:
> x = (x - B) % p;
Try that combinatorial division (a modulo value is just the remainder of
a division) with a very simple "three-line design": just divide two input
vectors and assign the result to an output vector. Start with two 8-bit
vectors, increase the length to 16, 32, 64, 128 and finally 190 bits, and
look at the resource consumption and the speed/delay of the design.

This one gives 33k LUTs in a Spartan3 for a 160/80-bit design:
Post "Re: Rechnen mit unsigned vs. signed und einer division"
And it's rather slow: the max. clock frequency is 1 MHz...   :-o
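
The experiment suggested above could look roughly like the following sketch. The module name `mod_test`, the port names and the parameter `W` are chosen here for illustration only. Whether a `%` with a non-constant divisor is accepted depends on the synthesis tool; the fact that the posted design produced a utilization report suggests the tool used in this thread accepts it.

```verilog
// Minimal sketch of the "three-line design" experiment; all names are
// hypothetical. Change W to 8, 16, 32, 64, 128, 190 and re-synthesize,
// then compare LUT count and path delay in the synthesis reports.
module mod_test #(parameter W = 8) (
    input  wire [W-1:0] a,
    input  wire [W-1:0] b,
    output wire [W-1:0] r
);
    // Purely combinational remainder: no clock, no pipelining.
    assign r = a % b;
endmodule
```

For large `W` this module has more ports than a typical device has pins, so the numbers to compare are the ones from the post-synthesis reports.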

Thanks for the valuable information.
The % operator was taking a lot of resources. I somehow managed to bring
down the number of LUTs, and the consumption is now 37226 out of 63400.
But there is a problem with the input/output pins: 210 are available and
the design needs 574, because the inputs and outputs are 192 bits wide.
Because of this, implementation fails.
Please suggest a solution for this too.

You need a chip with more I/O pins. Or you find a way to get your
data in and out in a serial or partially serial way.

Duke
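
As a rough illustration of the serial approach, the sketch below shifts an n-bit operand in one bit per clock, so four pins replace n of them. The module name `serial_loader` and the signals `shift_en`, `serial_in` and `load_done` are hypothetical, and the MSB-first bit order is an arbitrary choice. It assumes `n >= 2`.

```verilog
// Illustrative sketch only; module and signal names are hypothetical.
// Collects n serial bits (MSB first) into a parallel register.
module serial_loader #(parameter n = 190) (
    input  wire         clk,
    input  wire         reset,
    input  wire         shift_en,   // high while a valid bit is on serial_in
    input  wire         serial_in,  // operand bits, MSB first
    output reg  [n-1:0] value,      // assembled operand
    output reg          load_done   // high for one clock once n bits are in
);
    // Counts the bits received so far (0 .. n-1).
    reg [$clog2(n)-1:0] count;

    always @(posedge clk) begin
        if (reset) begin
            value     <= 0;
            count     <= 0;
            load_done <= 1'b0;
        end else begin
            load_done <= 1'b0;                      // default: no pulse
            if (shift_en) begin
                value <= {value[n-2:0], serial_in}; // shift left, new bit at LSB
                if (count == n-1) begin
                    count     <= 0;
                    load_done <= 1'b1;              // value is complete next cycle
                end else begin
                    count <= count + 1'b1;
                end
            end
        end
    end
endmodule
```

One such loader per operand (`p` and `a`) plus a matching parallel-to-serial register for `x` would keep the pin count small, at the cost of n clocks per transfer.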

Hello,

I have used the virtual input/output (VIO) concept to bring down the
number of pins. Synthesis, implementation and bitstream generation work
properly, but when I program the FPGA, the output shows 0 as the result,
and after implementation the summary report says the router estimated
timing is not met.
I have attached the timing report and output results.
If anybody knows how to use virtual input/output (VIO), please
suggest a solution to the above problem.

Lakshita J. wrote:
> but when I program the FPGA, the output shows 0 as the result
Do you have a testbench that shows the expected results?

Duke

I have attached the testbench file and simulation results.

Lakshita J. wrote:
> I have used the virtual input/output (VIO) concept to bring down the
> number of pins. Synthesis, implementation and bitstream generation work
> properly, but when I program the FPGA, the output shows 0 as the result,
> and after implementation the summary report says the router estimated
> timing is not met.
> I have attached the timing report and output results.
> If anybody knows how to use virtual input/output (VIO), please
> suggest a solution to the above problem.

The given code simulates (after some small changes). Synthesis works only
with the xc7v2000tflg1925 device.

For debugging, I suggest using a clearly smaller number than 190 for n.
Start with n = 8 or n = 16. Then you can more easily see what's going
wrong with n = 190.

Duke
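
A small-n check could be set up along the lines of the sketch below. It assumes the `multiplicative_inverse` module from the first post is compiled together with it; the testbench name `tb_inverse_small` and the values `p = 251` and `a = 7` are arbitrary examples, not taken from the attached testbench. It checks the defining property `(a * x) mod p == 1` rather than a hard-coded result, and it stops with a timeout if `result_ready` never goes high.

```verilog
`timescale 1ns/1ps
// Sketch of a self-checking testbench for a small n.
// ASSUMPTIONS: the multiplicative_inverse module from the first post is
// compiled together with this file; p = 251 (a prime) and a = 7 are
// arbitrary example values. The correct inverse is 36: 7 * 36 = 252 = 251 + 1.
module tb_inverse_small;
    localparam n = 8;

    reg          clk    = 1'b0;
    reg          reset  = 1'b1;
    reg          enable = 1'b0;
    reg  [n-1:0] p      = 8'd251;
    reg  [n-1:0] a      = 8'd7;
    wire [n-1:0] x;
    wire         result_ready;

    multiplicative_inverse #(.n(n)) dut (
        .clk(clk), .reset(reset), .enable(enable),
        .p(p), .a(a), .x(x), .result_ready(result_ready)
    );

    always #5 clk = ~clk;   // 10 ns period simulation clock

    // The 2n-bit target widens a*x so the product cannot overflow.
    wire [2*n-1:0] check = (a * x) % p;

    integer i;
    initial begin
        // Hold reset over two rising edges, release it on a falling edge.
        repeat (2) @(negedge clk);
        reset = 1'b0;

        // Sample on falling edges; give up after a bounded number of clocks.
        for (i = 0; i < 1000; i = i + 1) begin
            @(negedge clk);
            if (result_ready) begin
                if (check == 1)
                    $display("PASS: x = %0d", x);
                else
                    $display("FAIL: x = %0d, (a*x) %% p = %0d", x, check);
                $finish;
            end
        end
        $display("TIMEOUT: result_ready never went high");
        $finish;
    end
endmodule
```

Once this passes for n = 8, the same testbench can be reused with larger `n`, `p` and `a` by widening the constants.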

> For debugging, I suggest using a clearly smaller number than 190 for n.
> Start with n = 8 or n = 16. Then you can more easily see what's going
> wrong with n = 190.

I tried n = 5, 20, 60, 80 and 100 bits and I get some values, but when I
go above 100 bits the hardware server shuts down and the connection is
dropped automatically. I have attached a screenshot of this problem and
also a screenshot of the 100-bit results.
Please suggest a solution for this problem.

So you found a limitation in the hardware server.
I have never used the hardware server, so I can't say anything about it.

You need to find another way to communicate with your multiplier.
What hardware interfaces exist on your board? UART? Ethernet?

Duke
