EmbDev.net

Forum: FPGA, VHDL & Verilog
Poor RTL optimization


Author: Kurt English (Company: RPI) (khenglish)
So my Verilog file for an asynchronous bi-directional memory mostly 
works, but it has some problems.

1.  The layout is EXTREMELY inefficient, which can be seen in the RTL 
viewer.  There is a 3-input AND-gate at the enable of every memory bit. 
This AND-gate has identical inputs for every memory bit.  Every 
single one of these AND-gates could be eliminated since the memory 
latches are never supposed to be disabled.  Fixing this would probably 
cut FPGA usage by over 50%.

2. The READY signal is at times undefined.  It should never be 
undefined.  How can I fix this?

3. I can't get else statements to work.  When trying to use them I 
always get a memory initialization error when compiling.

Below is the Verilog.  In the screenshot I also have the timing diagram.
module sysmem (
A,
WR,
D,
READY
);

//input ports
input [15:0] A;
input WR;

//output ports
inout [7:0] D;
output READY;

//registers/wires
reg [7:0] Dout;
reg [7:0] memdat [0:255];
reg READY;

initial begin
  $readmemh("sysinit.txt", memdat);
end

assign D = (WR) ? Dout : 8'bz;

always @ (*)
  begin
    begin: rdyset
      if (memdat[A] == D)
        READY = 1'b1;
      if (memdat[A] != D)
        READY = 1'b0;
    end
    begin: memread
      if (WR)
        Dout = memdat[A];
    end
    begin: memwrite
      if (!WR)
        memdat[A] = D;
    end
  end
endmodule

: Edited by Moderator
Author: Lothar Miller (lkmiller) (Moderator)
Kurt English wrote:
> There is a 3-input AND-gate at the enable of every memory bit.
Where?

> FPGA
Which one and what toolchain do you use?

> Fixing this would probably cut FPGA usage by over 50%.
You should use one of the synchronous RAM blocks. That would be the most 
efficient way to implement memory on an FPGA.

You cannot use your RAM in real life the way you did in the timing 
diagram!
Holding the write signal active all the time will cause spurious writes 
to arbitrary memory registers due to glitches on the address lines. You 
need some kind of enable signal to tell your RAM when data and address 
are valid.
Just have a look at the data sheet of some discrete asynchronous RAMs.
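
As an illustration of the two points above (a clocked memory plus an explicit write enable), here is a minimal sketch of a synchronous single-port RAM written in the style that synthesis tools commonly recognize as a RAM block. The module name, port names, and the clk signal are assumptions for the example and are not part of the original design. Whether it actually maps to a block RAM depends on the device and the toolchain.

```verilog
// Hypothetical module and port names; not taken from the original post.
module sync_ram #(
  parameter ADDR_WIDTH = 8,
  parameter DATA_WIDTH = 8
) (
  input  wire                  clk,   // master clock (assumed to exist in the system)
  input  wire                  we,    // write enable: asserted only while addr/din are valid
  input  wire [ADDR_WIDTH-1:0] addr,
  input  wire [DATA_WIDTH-1:0] din,
  output reg  [DATA_WIDTH-1:0] dout
);
  reg [DATA_WIDTH-1:0] mem [0:(1<<ADDR_WIDTH)-1];

  always @(posedge clk) begin
    if (we)
      mem[addr] <= din;   // write happens only on a clock edge with we high
    dout <= mem[addr];    // registered read; on a write to the same address
                          // this returns the old contents
  end
endmodule
```

Because address and data are sampled only on the clock edge while we is high, glitches on the address lines between edges cannot cause a write.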

BTW: Please wrap your code in the code tags
   [c]
     your code
   [/c]

: Edited by Moderator
Author: Kurt English (Company: RPI) (khenglish)
Fixed it.  Adding a buffer instead of directly accessing the memory got 
rid of the hundreds of extra inferred latches.  READY was screwed up 
because the testing schematic was trying to drive the D bus at the same 
time the memory was driving it.
module sysmem (
A,
WR,
D,
READY
);

//input ports
input [15:0] A;
input WR;

//output ports
inout [7:0] D;
output READY;

//registers/wires
reg [7:0] Dout;
reg [7:0] memdat [0:255];
reg READY;
reg [7:0] membuf;

initial begin
  $readmemh("sysinit.txt", memdat);
end

assign D = (WR) ? Dout : 8'bz;

always @ (A or D or WR)
  begin
  membuf = memdat[A];
    begin: rdyset
      if (membuf == D)
        READY = 1'b1;
      else if (membuf != D)
        READY = 1'b0;
      else
        READY = 1'b0;
    end
  end
always @ (A)
  begin
    begin: memread
      Dout = memdat[A];
    end
  end
always @ (posedge WR)
  begin
    begin: memwrite
      memdat[A] = D;
    end
  end
endmodule
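
The bus conflict described above can be reproduced in isolation. The following is a small self-contained simulation sketch, not the original test setup: all names are hypothetical, and mem_drive merely stands in for WR. It puts two tri-state drivers on one net, the same way sysmem and a test source would share D.

```verilog
// Self-contained simulation sketch; all names are hypothetical.
module bus_contention_tb;
  reg        mem_drive;   // stands in for WR in sysmem (memory drives D when high)
  reg        tb_drive;    // output enable of the test source
  reg  [7:0] mem_val;
  reg  [7:0] tb_val;
  wire [7:0] D;

  assign D = mem_drive ? mem_val : 8'bz;  // same pattern as in sysmem
  assign D = tb_drive  ? tb_val  : 8'bz;  // second driver on the same net

  initial begin
    mem_val = 8'hA5;
    tb_val  = 8'h5A;

    // Both sides drive different values: the net resolves to x.
    mem_drive = 1; tb_drive = 1;
    #1 $display("both driving:   D = %h", D);

    // Only the memory drives.
    mem_drive = 1; tb_drive = 0;
    #1 $display("memory only:    D = %h", D);

    // Only the test source drives.
    mem_drive = 0; tb_drive = 1;
    #1 $display("test side only: D = %h", D);

    $finish;
  end
endmodule
```

When both drivers are active with different values, D becomes x, and any comparison against it (such as membuf == D) evaluates to x as well. That is consistent with READY showing up as undefined in the timing diagram whenever the test source did not release the bus.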

Author: Lothar Miller (lkmiller) (Moderator)
It's slightly more complex.
This code implements latches (which is usually not good practice):
always @ (*)
  begin
    begin: memwrite
      if (!WR)
        memdat[A] = D;
    end
  end

And the "new" code implements flip-flops:
always @ (posedge WR)
  begin
    begin: memwrite
      memdat[A] = D;
    end
  end

But keep in mind: you have added another clock domain to your design. It 
may not be reproducibly synchronous to your master clock!
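
One common way to avoid using WR as a clock is to sample it with the master clock and derive a write pulse from its rising edge. The sketch below assumes that a master clock clk and a synchronous reset rst exist; both names, like the module itself, are hypothetical and not part of the original design.

```verilog
// Hypothetical helper; clk and rst are assumed to exist in the system.
module wr_sync (
  input  wire clk,        // master clock
  input  wire rst,        // synchronous reset, active high
  input  wire wr_async,   // WR as it arrives from outside the clock domain
  output wire wr_rise     // one-clk-wide pulse after a rising edge of WR
);
  reg [2:0] sync;

  always @(posedge clk) begin
    if (rst)
      sync <= 3'b000;
    else
      sync <= {sync[1:0], wr_async};  // shift WR through three flip-flops
  end

  // sync[1] is the synchronized level, sync[2] is its value one clock earlier.
  assign wr_rise = sync[1] & ~sync[2];
endmodule
```

The memory write would then happen in an always @(posedge clk) block guarded by wr_rise, so the whole design keeps a single clock. This only works if WR stays high for at least one clk period, and if A and D remain stable until the delayed pulse arrives, which has to be checked against the CPU's bus timing.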

Author: Kurt English (Company: RPI) (khenglish)
Lothar Miller wrote:

> But keep in mind: you have added another clock domain to your design. It
> may not be reproducibly synchronous to your master clock!

Yeah I'm not sure how the CPU handles WR timing or when it checks the 
READY flag yet.  Definitely going to have to look it up and make sure 
what I did works.  I don't think the READY flag is still quite right, 
but it may be good enough to work.  Since I have to write the CPU 
microcode I'll figure it out then :)

Also, to answer your earlier questions: this is on an old Cyclone II 
FPGA.  I cannot use the memory tool in Quartus because it does not work 
for asynchronous memory on the Cyclone II, while memory implemented 
directly in VHDL or Verilog works fine.  The other big reason is that 
the Verilog needs to be portable so that another group member can place 
and route in Cadence to simulate a fab'd chip (we are simulating an 
8080, possibly a Z80, for a final course project).

Thanks for your help!

Author: Kurt English (Company: RPI) (khenglish)
So it turns out that despite having better-looking RTL, the memory is 
still using an exorbitant amount of FPGA area.  I made another memory 
that's simpler (not writable) but is 7 times larger (56-bit word size), 
and it takes 1/8th the FPGA logic.
module sysmem (
A,
WR,
D,
READY
);

//input ports
input [7:0] A;
input WR;

//output ports
inout [7:0] D;
output READY;

//registers/wires
reg [7:0] Dout;
reg [7:0] memdat [0:127];
reg READY;
reg [7:0] membuf;

initial begin
  $readmemh("sysinit.txt", memdat);
end

assign D = (WR) ? Dout : 8'bz;

always @ (*)
  begin

  membuf = memdat[A];
    begin: rdyset
      if (membuf == D)
        READY = 1'b1;
      else
        READY = 1'b0;
    end
  end
always @ (A)
  begin
    begin: memread
      Dout = memdat[A];
    end
  end
always @ (posedge WR)
  begin
    begin: memwrite
      memdat[A] = D;
    end
  end
endmodule

I shrank the memory size so it would compile faster.  Also for some 
reason this is compiling into a dual port memory.  Not sure why.  A 
screenshot of the RTL is attached.

If you know how to make the memory take less area I'd like to hear it.
