Posted on:

Hi, I am working on a multiplicative inverse algorithm as part of an ECC implementation. I have written the Verilog code for it. The simulation works properly, but the synthesis report shows a LUT utilization of 76776 out of 63400. I applied the DSP attribute to reduce the number of LUTs, which brought it down from 78000 to 76776. I am targeting the Nexys4 DDR FPGA board. Please suggest any way to minimize LUT utilization. I have attached the project summary and the utilization report for this code.
(* use_dsp48 = "yes" *)
module multiplicative_inverse #(parameter n = 190)
    (clk, reset, enable, p, a, x, result_ready);

  input clk, reset, enable;
  input wire [n-1:0] p;
  input wire [n-1:0] a;
  output reg [n-1:0] x;
  output reg result_ready;

  // flag to initialize variables when reset
  reg flag;
  reg [n-1:0] Y;
  reg [n-1:0] D;
  reg [n-1:0] B;

  wire Y0 = (Y[0] == 0) ? 1 : 0;
  wire D0 = (D[0] == 0) ? 1 : 0;
  reg flagY0 = 1;
  reg flagD0 = 1;

  always @ (posedge clk) begin
    if (reset) begin
      x <= 0;
      flag <= 1;
      result_ready <= 0;
    end
    else begin
      if (result_ready) begin
        result_ready <= 0;
        flag <= 1;
      end
      else
      // Initialize the variables
      // (the operator between flag and enable is missing in the post; && is assumed)
      if (flag && enable) begin
        Y <= a;
        D <= p;
        B <= 1;
        x <= 0;
        flag <= 0;
      end
      else begin
        if (Y != 0) begin
          if (Y0 && flagY0) begin
            Y = Y >> 1;
            B = (B + (B[0]*p)) >> 1;
          end // end if
          else begin
            flagY0 = 0;
          end
          if (D0 && flagD0) begin
            D = D >> 1;
            x = (x + (x[0]*p)) >> 1;
          end // end if
          else begin
            flagD0 = 0;
          end
          if ((flagY0 == 0) && (flagD0 == 0)) begin
            if (Y >= D) begin
              Y = Y - D;
              if (B < x) B = B + p;
              B = (B - x) % p;
            end
            else begin
              if (D < Y) D = D + p;
              D = D - Y;
              if (x < B) x = x + p;
              x = (x - B) % p;
            end // else
            flagY0 = 1;
            flagD0 = 1;
          end // else
        end // end if
        else begin
          result_ready <= 1;
        end
      end
    end
  end
endmodule
Edited by Moderator
Posted on:

I'm guessing you're trying to find x such that a·x ≡ 1 (mod p), with […]. I'm not quite sure how your algorithm is supposed to work, since you're neither naming your signals properly nor giving any comments on their function. However, you're instantiating at least two 190-bit modulo operators, which should be a dead giveaway as to where the resources are going. You're also going to see some ridiculous timing violations on these paths. Each of these is going to need on the order of […] LUTs, since it's basically doing a test subtraction of a shifted value of p for every bit. At least the solution is simple: get a bigger FPGA or write a better algorithm. Your choice.
Posted on:

Oh, I forgot to mention: if it were just one LUT per bit, that would come to almost 18k LUTs per '%' in your code, i.e. 36k for the two of them. It's probably more like 2 LUTs per bit or so...
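One way to get rid of the '%' operators, as a minimal sketch rather than a drop-in fix: if both operands are already reduced (0 <= a, b < p, which is an assumption about the surrounding algorithm and is not checked here), then (a - b) mod p needs only one subtraction and one conditional addition of p. The module name mod_sub and its port names are made up for this illustration.

```verilog
// Sketch: r = (a - b) mod p without the % operator.
// Assumes 0 <= a < p and 0 <= b < p; mod_sub and its ports are hypothetical names.
module mod_sub #(parameter n = 190) (
    input  wire [n-1:0] a,
    input  wire [n-1:0] b,
    input  wire [n-1:0] p,
    output wire [n-1:0] r
);
    // One extra bit captures the borrow of a - b.
    wire [n:0] diff  = {1'b0, a} - {1'b0, b};
    // If a < b the subtraction wrapped around; adding p brings it back into [0, p).
    wire [n:0] fixed = diff + {1'b0, p};

    assign r = diff[n] ? fixed[n-1:0] : diff[n-1:0];
endmodule
```

The posted code already adds p before subtracting whenever B < x (or x < B), so as long as the operands really stay below p, the trailing '% p' should have nothing left to reduce.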
Posted on:

Lakshita J. wrote:
> x = (x - B) % p;

Try that combinatorial division (a modulo value is just the remainder of a division) with a very simple "three-line design": just divide two input vectors and assign the result to an output vector. Start with two 8-bit vectors, increase the length to 16, 32, 64, 128 and finally 190 bits, and look at the resource consumption and the speed/delay of the design. This one takes 33k LUTs in a Spartan-3 for a 160/80-bit design: post "Re: Rechnen mit unsigned vs. signed und einer division". And it's rather slow: the max. clock frequency is 1 MHz... :o
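Such a "three-line design" could look roughly like the sketch below. The module name div_test, the parameter W and the port names are placeholders chosen for this example. Synthesize it once per width and compare the LUT count and the worst path delay in the reports.

```verilog
// Hypothetical test design: a purely combinational divider.
// W is the operand width; try 8, 16, 32, 64, 128 and 190.
module div_test #(parameter W = 8) (
    input  wire [W-1:0] dividend,
    input  wire [W-1:0] divisor,   // a divisor of 0 gives an undefined result
    output wire [W-1:0] quotient
);
    assign quotient = dividend / divisor;
endmodule
```

Replacing '/' with '%' gives the remainder version, which is the operation used in the posted code.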
Posted on:

Thanks for the valuable information. The % operator was taking a lot of resources. I somehow managed to bring down the number of LUTs, and the consumption is now 37226 out of 63400. But there is a problem with the input/output pins: 210 are available and my design needs 574, because the inputs and outputs are 192 bits wide. Because of this, implementation fails. Please suggest a solution for this too.
Posted on:

You need a chip with more I/O pins. Or you find a way to get your data in and out serially, or partly serially.
Duke
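A minimal sketch of the serial idea, assuming one data pin and one bit per clock: a shift register collects the operand inside the FPGA, so a 190-bit input costs two pins (data and shift enable) instead of 190. All names here (serial_loader, shift_en, serial_in, value) are illustrative and not taken from the original design.

```verilog
// Sketch: load an n-bit operand over a single data pin, MSB first.
// Requires n >= 2. Names are hypothetical.
module serial_loader #(parameter n = 190) (
    input  wire         clk,
    input  wire         reset,
    input  wire         shift_en,   // high while bits are being sent
    input  wire         serial_in,  // one bit per clock, MSB first
    output reg  [n-1:0] value
);
    always @(posedge clk) begin
        if (reset)
            value <= {n{1'b0}};
        else if (shift_en)
            // Shift left; the new bit enters at the LSB.
            value <= {value[n-2:0], serial_in};
    end
endmodule
```

After n clocks with shift_en high, value holds the complete operand. The result can be shifted out the same way in the opposite direction.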
Posted on:

Hello, I have used the virtual input/output (VIO) concept to bring down the number of pins. Synthesis, implementation and bitstream generation work properly, but when I program the FPGA, the output shows 0 results, and during implementation the summary report shows "router estimated timing not met". I have attached the timing report and the output results. If anybody knows how to use virtual input/output (VIO), please suggest a solution to the above problem.
Posted on:

Lakshita J. wrote:
> but when I programmed it to FPGA, the output showing 0 results

Do you have a testbench that shows the expected results?
Duke
Posted on:

Lakshita J. wrote:
> I have used virtual input output concept to get down number of pins. The
> synthesis, implementation and bit stream generation is working
> properly. but when I programmed it to FPGA, the output showing 0 results
> and during implementation the summary report is showing router estimated
> timing not net.
> I have attached the timing report and output results.
> If anybody knows how to use virtual input output(VIO), then please
> provide any solution to above problem.

The given code simulates (after some small changes). Synthesis works only with the xc7v2000tflg1925 device. For debugging, I suggest using a clearly smaller number than 190 for n. Start with n = 8 or n = 16. Then you can see more easily what's going wrong with n = 190.
Duke
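A self-checking testbench for such a small n might look like the sketch below. It assumes the module interface from the first post (multiplicative_inverse with clk, reset, enable, p, a, x, result_ready) and that x is valid while result_ready is high. The test values p = 13 and a = 5 are arbitrary picks for illustration. The check does not need a precomputed answer: it only tests whether a·x mod p equals 1.

```verilog
`timescale 1ns / 1ps
// Sketch of a self-checking testbench for a small operand width.
// Test values are arbitrary; the DUT interface is taken from the first post.
module tb_multiplicative_inverse;
    localparam n = 8;

    reg          clk    = 0;
    reg          reset  = 1;
    reg          enable = 0;
    reg  [n-1:0] p      = 13;   // small prime modulus
    reg  [n-1:0] a      = 5;    // value to invert, 0 < a < p
    wire [n-1:0] x;
    wire         result_ready;

    // The product is formed in 2n bits so that it cannot overflow.
    wire [2*n-1:0] product = a * x;

    multiplicative_inverse #(.n(n)) dut (
        .clk(clk), .reset(reset), .enable(enable),
        .p(p), .a(a), .x(x), .result_ready(result_ready)
    );

    always #5 clk = ~clk;   // 10 ns clock period

    initial begin
        repeat (2) @(posedge clk);
        reset  <= 0;
        enable <= 1;
        @(posedge result_ready);
        if (product % p == 1)
            $display("PASS: a=%0d p=%0d x=%0d", a, p, x);
        else
            $display("FAIL: a=%0d p=%0d x=%0d", a, p, x);
        $finish;
    end

    // Stop the simulation if no result ever arrives.
    initial begin
        #100000;
        $display("TIMEOUT: result_ready was never asserted");
        $finish;
    end
endmodule
```

Once this passes for a few values of a and p at n = 8, the same testbench can be rerun with larger n to find the width at which the results start to go wrong.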
Posted on:

> For debugging I suggest to use a clear smaller number than 190 for n.
> Start with n = 8 or n = 16. Than you can more easely what's going wrong
> with n = 190.

I tried n = 5, 20, 60, 80 and 100 bits, and I am getting some values. But when I go above 100 bits, the Hardware Server shuts down and disconnects automatically. I have attached a screenshot of this problem and also a screenshot of the 100-bit results. Please suggest a solution for this problem.
Posted on:

So you have found a limitation in the Hardware Server. I have never used the Hardware Server, so I can't say anything about it. You need to find another way to communicate with your multiplier. What hardware interfaces exist on your board? UART? Ethernet?
Duke