EmbDev.net

Forum: FPGA, VHDL & Verilog

LUT utilization is 121 %


Author: Lakshita Jaiswal (lakshita)
Hi,
I am working on a multiplicative inverse algorithm as part of an ECC (elliptic curve cryptography) implementation. I have written the Verilog code for it. The simulation works properly, but the synthesis report shows a LUT utilization of 76776 out of 63400. I applied the use_dsp48 attribute to reduce the number of LUTs, but that only brought it down from 78000 to 76776.
I am targeting the Nexys4DDR FPGA board.
Can anyone suggest a way to reduce the LUT utilization? I have attached the project summary and the utilization report for this code.
(* use_dsp48 = "yes" *)
module multiplicative_inverse #(parameter n = 190)(clk, reset, enable, p, a, x, result_ready);
  input clk, reset, enable;
  input wire [n-1:0] p;
  input wire [n-1:0] a;
  output reg [n-1:0] x;
  output reg result_ready;

  // flag to initialize variables when reset
  reg flag;

  reg [n-1:0] Y;
  reg [n-1:0] D;
  reg [n-1:0] B;

  wire Y0 = (Y[0] == 0) ? 1 : 0;
  wire D0 = (D[0] == 0) ? 1 : 0;

  reg flagY0 = 1;
  reg flagD0 = 1;

  always @ (posedge clk) begin
    if (reset) begin
      x <= 0;
      flag <= 1;
      result_ready <= 0;
    end
    else begin
      if (result_ready) begin
        result_ready <= 0;
        flag <= 1;
      end
      else if (flag || enable) begin // initialize the variables
        Y <= a;
        D <= p;
        B <= 1;
        x <= 0;
        flag <= 0;
      end
      else begin
        if (Y != 0) begin
          if (Y0 && flagY0) begin
            Y = Y >> 1;
            B = (B + (B[0]*p)) >> 1;
          end
          else begin
            flagY0 = 0;
          end

          if (D0 && flagD0) begin
            D = D >> 1;
            x = (x + (x[0]*p)) >> 1;
          end
          else begin
            flagD0 = 0;
          end

          if ((flagY0 == 0) && (flagD0 == 0)) begin
            if (Y >= D) begin
              Y = Y - D;
              if (B < x)
                B = B + p;
              B = (B - x) % p;
            end
            else begin
              if (D < Y)
                D = D + p;
              D = D - Y;
              if (x < B)
                x = x + p;
              x = (x - B) % p;
            end
            flagY0 = 1;
            flagD0 = 1;
          end
        end
        else begin
          result_ready <= 1;
        end
      end
    end
  end
endmodule

(Edited by Moderator)

Author: Guest (Guest)
I'm guessing you're trying to find $x$ such that $a \cdot x \equiv 1 \pmod{p}$ with $\gcd(a, p) = 1$.

I'm not quite sure how your algorithm is supposed to work, since you're neither naming your signals properly nor giving any comments on their function. However, you're instantiating at least two 190-bit modulo operators, which should be a dead giveaway as to where the resources are going. You're also going to see some ridiculous timing violations on these paths.

Each of these is going to need $\mathcal{O}(n^2)$ LUTs, since it's basically doing a test subtraction of a shifted value of p for every bit.

At least the solution is simple, get a bigger FPGA or write a better 
algorithm. Your choice.
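One way to follow the "better algorithm" route, sketched under an assumption about the posted algorithm: if B and x are always kept in the range [0, p), then `if (B < x) B = B + p;` already yields a result in [0, p), and the following `% p` changes nothing. The synthesizer cannot know this and builds a full divider anyway. Under that assumption, a modular subtraction needs only one comparison and a conditional addition of p. The module name `mod_sub` and its ports are hypothetical.

```verilog
// Hypothetical helper: r = (u - v) mod p, ASSUMING 0 <= u, v < p.
// Replaces "if (u < v) u = u + p; u = (u - v) % p;" without a divider.
module mod_sub #(parameter N = 190) (
  input  wire [N-1:0] u,
  input  wire [N-1:0] v,
  input  wire [N-1:0] p,
  output wire [N-1:0] r
);
  // N-bit difference; wraps around modulo 2^N when u < v.
  wire [N-1:0] diff = u - v;
  // If u < v, adding p (again modulo 2^N) gives u - v + p, which lies in (0, p).
  assign r = (u >= v) ? diff : diff + p;
endmodule
```

This is one N-bit subtractor, one adder and a comparator, so it grows linearly with N instead of quadratically like a `%` operator.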

Author: Guest (Guest)
Oh, I forgot to mention: even if it were just one LUT per bit, that would come to almost 18k LUTs per '%' in your code, i.e. 36k in total. It's probably more like 2 LUTs per bit or so...

Author: Lothar Miller (lkmiller) (Moderator)
Lakshita J. wrote:
> x = (x - B) % p;
Try that combinatorial division (a modulo value is just the remainder of a division) in a very simple "three-line design": just divide two input vectors and assign the result to an output vector. Start with two 8-bit vectors, increase the length to 16, 32, 64, 128 and finally 190 bits, and look at the resource consumption and the speed/delay of the design.

The design in this post needs 33k LUTs in a Spartan3 for a 160/80-bit division:
Post "Re: Rechnen mit unsigned vs. signed und einer division" (German for "Calculating with unsigned vs. signed and a division")
And it's rather slow: the max. clock frequency is 1 MHz...   :-o
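Such a test design could look roughly like the following sketch. The module name `div_test` is made up. Operands and result are registered, so the implementation report shows the divider as a register-to-register path and gives a maximum clock frequency. Synthesize it with N = 8, 16, 32, 64, 128 and finally 190, and compare the LUT counts and timing.

```verilog
// Hypothetical test design: one N-bit '%' between two register stages.
module div_test #(parameter N = 8) (
  input  wire         clk,
  input  wire [N-1:0] a,
  input  wire [N-1:0] b,
  output reg  [N-1:0] r      // registered remainder a % b
);
  reg [N-1:0] a_q, b_q;
  always @(posedge clk) begin
    a_q <= a;                // input registers ...
    b_q <= b;
    r   <= a_q % b_q;        // ... so the whole divider is one reg-to-reg path
  end
endmodule
```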

Author: Lakshita Jaiswal (lakshita)
Thanks for the valuable information.
The % operator was taking a lot of resources. I somehow managed to bring down the number of LUTs, and now the consumption is 37226 out of 63400.
But now there is a problem with the I/O pins: 210 are available, and I need 574 because the inputs and outputs are 192 bits wide. Because of this, the implementation fails.
Can anyone suggest a solution for this too?

Author: Duke Scarring (Guest)
Either you need a chip with more I/O pins, or you find a way to get your data in and out serially or partially serially.

Duke
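A minimal sketch of the serial approach is shown below. It assumes one data pin plus one control pin per direction, and the module and port names are hypothetical. One shift register assembles an operand one bit per clock. A second one loads the parallel result and shifts it out MSB first. Whatever drives these pins on the other side is outside the sketch.

```verilog
// Hypothetical serial-to-parallel loader: one bit per clock while shift_en = 1.
module serial_loader #(parameter N = 192) (
  input  wire         clk,
  input  wire         shift_en,  // shift in one bit on every clock while high
  input  wire         din,       // serial data, MSB first
  output reg  [N-1:0] q          // assembled parallel word
);
  always @(posedge clk)
    if (shift_en)
      q <= {q[N-2:0], din};
endmodule

// Hypothetical parallel-to-serial unloader: load the word, then shift it out.
module serial_unloader #(parameter N = 192) (
  input  wire         clk,
  input  wire         load,      // capture d (has priority over shifting)
  input  wire         shift_en,  // shift out one bit on every clock while high
  input  wire [N-1:0] d,
  output wire         dout       // serial data, MSB first
);
  reg [N-1:0] sr;
  always @(posedge clk)
    if (load)          sr <= d;
    else if (shift_en) sr <= {sr[N-2:0], 1'b0};
  assign dout = sr[N-1];
endmodule
```

With N = 192 this costs 192 clock cycles per operand instead of 192 pins.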

Author: Lakshita Jaiswal (lakshita)
Hello,

I have used the Virtual Input/Output (VIO) concept to reduce the number of pins. Synthesis, implementation and bitstream generation all work properly, but when I program the FPGA, the output shows only 0 as the result. In addition, the implementation summary report says the router-estimated timing is not met.
I have attached the timing report and the output results.
If anybody knows how to use Virtual Input/Output (VIO), please suggest a solution to the above problem.

Author: Duke Scarring (Guest)
Lakshita J. wrote:
> but when I programmed it to FPGA,the output showing 0 results
Do you have a testbench that shows the expected results?

Duke

Author: Lakshita Jaiswal (lakshita)
I have attached the testbench file and the simulation results.

Author: Duke Scarring (Guest)
Lakshita J. wrote:
> I have used virtual input output concept to get down number of pins.The
> synthesis ,implementation and bit stream generation is working
> properly.but when I programmed it to FPGA,the output showing 0 results
> and during implementation the summary report is showing router estimated
> timing not net.
> I have attached the timing report and output results.
> If anybody knows how to use  virtual input output(VIO),then please
> provide any solution to above problem.

The given code simulates (after some small changes). Synthesis only works with the xc7v2000tflg1925 device.

For debugging, I suggest using a much smaller value than 190 for n. Start with n = 8 or n = 16. Then you can see more easily what's going wrong with n = 190.

Duke
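For n = 8, the expected results can be found by brute force and compared in a testbench against the x that multiplicative_inverse delivers once result_ready goes high. Below is a sketch of such a simulation-only reference model. The name `ref_inverse` is hypothetical. It tries every candidate k < p, so it is only practical for small n (roughly n ≤ 16).

```verilog
// Hypothetical simulation-only reference: smallest x in [1, p) with
// (a * x) mod p == 1; found = 0 if no inverse exists (gcd(a, p) != 1).
module ref_inverse #(parameter N = 8) (
  input  wire [N-1:0] a,
  input  wire [N-1:0] p,
  output reg  [N-1:0] x,
  output reg          found
);
  reg [2*N-1:0] prod;  // exact product a * k, wide enough for N-bit operands
  integer k;
  always @* begin
    x     = {N{1'b0}};
    found = 1'b0;
    for (k = 1; k < p; k = k + 1) begin
      prod = a * k;
      if (!found && (prod % p) == 1) begin
        x     = k;     // first (and only) k in [1, p) that works
        found = 1'b1;
      end
    end
  end
endmodule
```

The `%` here is harmless because this module is only simulated, never synthesized.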

Author: Lakshita Jaiswal (lakshita)
> For debugging I suggest to use a clear smaller number than 190 for n.
> Start with n = 8 or n = 16. Than you can more easely what's going wrong
> with n = 190.

I tried n = 5, 20, 60, 80 and 100 bits, and I get some values. But when I go above 100 bits, the Hardware Server shuts down and gets disconnected automatically. I have attached a screenshot of this problem and also a screenshot of the 100-bit results.
Can anyone suggest a solution for this problem?

Author: Duke Scarring (Guest)
So you have found a limitation in Hardware Server.
I have never used Hardware Server, so I can't say anything about it.

You need to find another way to communicate with your multiplier.
What hardware interfaces does your board have? UART? Ethernet?

Duke
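If the board's USB-UART bridge is used, the 192-bit result could be sent as 24 bytes. The Nexys4 DDR has such a bridge; check its reference manual for the FPGA pin. Below is a minimal 8N1 transmitter sketch. The module name, the ports and the default CLKS_PER_BIT = 868 (100 MHz / 115200 baud, rounded) are assumptions. The logic that splits the result into bytes is not shown, and neither is the matching receiver needed to get the operands in.

```verilog
// Hypothetical 8N1 UART transmitter: start bit, 8 data bits LSB first, stop bit.
// CLKS_PER_BIT = clock frequency / baud rate (assumed 100 MHz / 115200).
module uart_tx #(parameter CLKS_PER_BIT = 868) (
  input  wire       clk,
  input  wire       start,        // pulse while !busy to send 'data'
  input  wire [7:0] data,
  output reg        tx   = 1'b1,  // line idles high
  output reg        busy = 1'b0
);
  reg [9:0]  frame;               // {stop, data[7:0], start}, sent LSB first
  reg [3:0]  bit_idx;             // index of the bit currently on the line
  reg [31:0] clk_cnt;             // clocks spent on the current bit

  always @(posedge clk) begin
    if (!busy) begin
      tx <= 1'b1;
      if (start) begin
        frame   <= {1'b1, data, 1'b0};
        bit_idx <= 0;
        clk_cnt <= 0;
        busy    <= 1'b1;
        tx      <= 1'b0;          // start bit goes out immediately
      end
    end else if (clk_cnt < CLKS_PER_BIT - 1) begin
      clk_cnt <= clk_cnt + 1;     // hold the current bit
    end else begin
      clk_cnt <= 0;
      if (bit_idx == 9) begin
        busy <= 1'b0;             // stop bit finished, back to idle
        tx   <= 1'b1;
      end else begin
        bit_idx <= bit_idx + 1;
        tx      <= frame[bit_idx + 1];
      end
    end
  end
endmodule
```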
