EmbDev.net

Forum: FPGA, VHDL & Verilog

Reusing registers in VHDL FSM code


Author: Darian Reyes (Guest)
Attached files: 1.png, 2.png, 3.jpg

Hello,

I need to write a finite state machine (FSM) in VHDL and want to have 
several computations processed at the same time (a standard pipeline). 
In every state I have several operations to calculate, and I use a 
register for the result of each one. I really need to reuse these 
registers. For example: Register 1 is filled in State 1 (with the 
result of a multiplication) and is used in State 2 and State 3 (as a 
parameter of other operations); then in State 4 I want to store a new 
result (another multiplication) in Register 1, reusing it.

My code works in simulation in Xilinx Vivado 2019, but when I implement 
the design on a real FPGA (Basys 3, Artix-7) it doesn't work. I realized 
that the problem is that the correct values are not stored when I reuse 
the registers. Sometimes the first reuse keeps the correct value, but 
from the second reuse onward, in later FSM states, the stored values 
are wrong: they do not correspond to the result of the operation that I 
am trying to store in the register.

Next, an example of my FSM design:
LIBRARY IEEE;
USE IEEE.std_logic_1164.all;
USE IEEE.numeric_std.ALL;

ENTITY test1_arith IS
GENERIC (
    ap_bit_width : positive := 4;
    ap_latency : positive := 2
);
PORT (
    I1 : IN STD_LOGIC_VECTOR(ap_bit_width - 1 downto 0);
    I2 : IN STD_LOGIC_VECTOR(ap_bit_width - 1 downto 0);
    I3 : IN STD_LOGIC_VECTOR(ap_bit_width - 1 downto 0);
    O1 : OUT STD_LOGIC_VECTOR(ap_bit_width - 1 downto 0);
    ap_clk : IN STD_LOGIC;
    ap_rst : IN STD_LOGIC;
    ap_start : IN STD_LOGIC;
    ap_done : OUT STD_LOGIC;
    ap_idle : OUT STD_LOGIC;
    ap_ready : OUT STD_LOGIC
);
END;

ARCHITECTURE test1_arith_arch OF test1_arith IS
    ATTRIBUTE CORE_GENERATION_INFO : STRING;
    ATTRIBUTE CORE_GENERATION_INFO OF test1_arith_arch : ARCHITECTURE IS "Test,VHDLbyMOEA,{HLS_SYN_LAT=2}";
    CONSTANT ap_const_logic_1 : STD_LOGIC := '1';
    CONSTANT ap_const_logic_0 : STD_LOGIC := '0';
    TYPE state IS (state_1,state_2,state_3);
    SIGNAL state_present: state;
    SIGNAL state_future: state; 
    SIGNAL Flag: Integer:=0;
     --Signal RF : STD_LOGIC_VECTOR_array;
    FUNCTION ALU ( Op: IN integer range 0 TO 23;
     A, B: IN STD_LOGIC_VECTOR (ap_bit_width - 1 downto 0) )
    RETURN std_logic_vector is variable Result : std_logic_vector(ap_bit_width - 1 downto 0);        

    variable A_int: Integer:=0;
    variable B_int: Integer:=0;
    variable Result_int: Integer:=0;
    begin
    A_int := to_integer(unsigned(A));
    B_int := to_integer(unsigned(B));
    With Op Select Result_int:=
        to_integer(unsigned(NOT A)) When 0,
        to_integer(unsigned(A AND B)) When 1,
        to_integer(unsigned(A OR B)) When 2,
        to_integer(unsigned(A NAND B)) When 3,
        to_integer(unsigned(A NOR B)) When 4,
        to_integer(unsigned(A XOR B)) When 5,
        to_integer(unsigned(A XNOR B)) When 6,
        (A_int + B_int) When 7,
        (A_int - B_int) When 8,
        (A_int * B_int) When 9,
        (A_int / B_int) When 10,
        ABS(A_int) When 11,
        (A_int ** B_int) When 12,
        (A_int MOD B_int) When 13,
        to_integer(unsigned(A) & unsigned(B)) When 14,
        to_integer(unsigned(A) SLL B_int) When 15,
        to_integer(unsigned(A) SRL B_int) When 16,
        to_integer(unsigned(A) SLA B_int) When 17,
        to_integer(unsigned(A) SRA B_int) When 18,
        to_integer(unsigned(A) ROL B_int) When 19,
        to_integer(unsigned(A) ROR B_int) When 20,
        to_integer(unsigned(A) & unsigned(B)) When 21,
        to_integer(unsigned(A) & unsigned(B)) When 22,
                 0   When others;
    return STD_LOGIC_VECTOR (TO_UNSIGNED (Result_int, (ap_bit_width)));
    END FUNCTION;

    SHARED VARIABLE R1:std_logic_vector(ap_bit_width - 1 downto 0);    


    BEGIN

    OP_FSM : PROCESS (state_present)

     BEGIN 
    CASE state_present IS

    WHEN state_1=> 
    R1 := ALU(Op => 7 ,A => I1,B => I2);
    Flag<=1;
    IF (Flag=1) THEN
    state_future <= state_2;
    END IF;

    WHEN state_2=> 
    R1:= ALU(Op => 7 ,A => R1, B => I3);
    Flag<=2;
    IF (Flag=2) THEN
    state_future <= state_3;
    END IF;

    WHEN state_3=> 
    O1<= ALU(Op => 7 ,A => R1,B => "0001");
    Flag<=3;
    IF (Flag=3) THEN
    state_future <= state_1;
    END IF;
    END CASE;
    END PROCESS OP_FSM;

    CLK_FSM : PROCESS (ap_clk)
    BEGIN
    IF (ap_clk = '1' AND ap_clk'EVENT) THEN
    state_present <= state_future;
    END IF;
    END PROCESS CLK_FSM;

END test1_arith_arch;

In this case, I want to reuse R1 and it works well in Simulation with 
Xilinx Vivado (1 + 4 + 0 + 1 = 6):

Figure 1.

Unfortunately, in the Basys 3 FPGA Artix-7 I don't get the correct 
results:

Figure 2.

This figure shows case 10 on the FPGA: the result should be 6 (1 + 4 + 0 
+ 1), but it is 14 instead:

Figure 3.

In my tests I noticed that it works better when the register is set to 
zero before a new value is assigned to it, for example:
WHEN state_3=> 
    R4<="0000"
    IF( R4 = "0000") then
    R4<= ALU(Op => 7 ,A=> R2,B=> R3, C =>"0000");
    Flag <=3;
    IF (Flag =3) THEN
    state_future <= state_4;
END IF;
END IF;

Written this way I can reuse a register once; the second time I try to 
assign a new value to the register, incorrect values appear at the 
output.

I declared the registers both as SHARED VARIABLEs and as SIGNALs, and I 
have the same problem with both.

I appreciate any suggestion or idea, thanks a lot.

Author: Lothar M. (lkmiller) (Moderator)

Darian Reyes wrote:
> OP_FSM : PROCESS (state_present)
This sensitivity list is incomplete! I1, I2, I3, R1 and this strange 
"Flag" are missing. Therefore the simulation is WRONG! The toolchain 
tells you that with an "Info" or a "Warning"...
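
To illustrate the point about sensitivity lists, here is a minimal 
sketch of a two-process FSM, not a fix of the code above. The entity 
and the names next_state_demo, go, busy and s_idle/s_run/s_done are 
made up for this example. The combinational process lists every signal 
it reads; with VHDL-2008, "process (all)" would have the same effect.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example entity, not part of the original design.
entity next_state_demo is
    port (
        clk  : in  std_logic;
        rst  : in  std_logic;   -- synchronous, active high (assumption)
        go   : in  std_logic;   -- hypothetical "advance" input
        busy : out std_logic
    );
end entity;

architecture rtl of next_state_demo is
    type state_t is (s_idle, s_run, s_done);
    signal state_present, state_future : state_t;
begin
    -- Combinational part: it reads state_present and go,
    -- so both of them appear in the sensitivity list.
    next_state : process (state_present, go)
    begin
        state_future <= state_present;  -- default assignment, avoids latches
        case state_present is
            when s_idle =>
                if go = '1' then
                    state_future <= s_run;
                end if;
            when s_run  => state_future <= s_done;
            when s_done => state_future <= s_idle;
        end case;
    end process;

    -- Sequential part: only the clock belongs in the sensitivity list.
    state_reg : process (clk)
    begin
        if rising_edge(clk) then
            if rst = '1' then
                state_present <= s_idle;
            else
                state_present <= state_future;
            end if;
        end if;
    end process;

    busy <= '0' when state_present = s_idle else '1';
end architecture;
```

Note that there is no "Flag" here: the next state depends only on the 
present state and the inputs.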

What the heck is that thing with this "Flag"? Where did you see this 
kind of coding?

One word about "variables", especially "shared variables": you don't 
need either of them at all.
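
As a sketch of how a register can be reused without variables, the 
following single clocked process keeps R1 as a signal and overwrites it 
in successive states. It only mirrors the I1 + I2 + I3 + 1 example from 
the question; the entity name reuse_reg_fsm, the start/done handshake, 
the state names and the synchronous reset are assumptions made for 
illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example entity, not the original test1_arith.
entity reuse_reg_fsm is
    generic (WIDTH : positive := 4);
    port (
        clk        : in  std_logic;
        rst        : in  std_logic;  -- synchronous, active high (assumption)
        start      : in  std_logic;
        i1, i2, i3 : in  std_logic_vector(WIDTH - 1 downto 0);
        o1         : out std_logic_vector(WIDTH - 1 downto 0);
        done       : out std_logic
    );
end entity;

architecture rtl of reuse_reg_fsm is
    type state_t is (s_idle, s_add3, s_inc);
    signal state : state_t := s_idle;
    signal r1    : unsigned(WIDTH - 1 downto 0) := (others => '0');
begin
    -- Everything happens on the rising clock edge, so r1 is a real
    -- register (a row of flip-flops) and can be rewritten in any state.
    fsm : process (clk)
    begin
        if rising_edge(clk) then
            done <= '0';                        -- default: one-cycle pulse
            if rst = '1' then
                state <= s_idle;
                r1    <= (others => '0');
                o1    <= (others => '0');
            else
                case state is
                    when s_idle =>
                        if start = '1' then
                            r1    <= unsigned(i1) + unsigned(i2);
                            state <= s_add3;
                        end if;
                    when s_add3 =>
                        r1    <= r1 + unsigned(i3);  -- reuse: old r1 in, new r1 out
                        state <= s_inc;
                    when s_inc =>
                        o1    <= std_logic_vector(r1 + 1);
                        done  <= '1';
                        state <= s_idle;
                end case;
            end if;
        end if;
    end process;
end architecture;
```

Because r1 is only written on a clock edge, reading and writing it in 
the same state does not create a combinational loop.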

Author: Vancouver (Guest)

You have implemented the ALU logic completely as one large combinatorial 
function. This will result in an incredibly large and incredibly slow 
unstructured sea of connected lookup tables.
For example: one of the ALU functions is a division. Make yourself 
familiar with division algorithms for hardware and then try to imagine 
how they would behave if implemented without any pipelining or 
sequential elements.
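
To give an idea of what a division with sequential elements can look 
like, here is a minimal sketch of a restoring (shift and subtract) 
divider that produces one quotient bit per clock cycle. It is only one 
of several possible algorithms; the entity name seq_divider, its ports 
and the start/done handshake are assumptions for illustration. It 
assumes WIDTH >= 2 and does not handle a divisor of zero.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example entity: unsigned restoring divider,
-- WIDTH clock cycles per division.
entity seq_divider is
    generic (WIDTH : positive := 4);  -- assumed to be at least 2
    port (
        clk       : in  std_logic;
        rst       : in  std_logic;    -- synchronous, active high (assumption)
        start     : in  std_logic;
        dividend  : in  unsigned(WIDTH - 1 downto 0);
        divisor   : in  unsigned(WIDTH - 1 downto 0);  -- zero is not handled
        quotient  : out unsigned(WIDTH - 1 downto 0);
        remainder : out unsigned(WIDTH - 1 downto 0);
        done      : out std_logic     -- high for one cycle when results are valid
    );
end entity;

architecture rtl of seq_divider is
    signal busy  : std_logic := '0';
    signal count : natural range 0 to WIDTH - 1 := 0;
    -- quo starts as the dividend; its bits shift out at the top
    -- while quotient bits shift in at the bottom.
    signal quo   : unsigned(WIDTH - 1 downto 0) := (others => '0');
    signal rem_r : unsigned(WIDTH downto 0)     := (others => '0');
    signal dvs   : unsigned(WIDTH - 1 downto 0) := (others => '0');
begin
    divide : process (clk)
        variable trial : unsigned(WIDTH downto 0);
    begin
        if rising_edge(clk) then
            done <= '0';
            if rst = '1' then
                busy <= '0';
            elsif busy = '0' then
                if start = '1' then             -- load operands
                    quo   <= dividend;
                    dvs   <= divisor;
                    rem_r <= (others => '0');
                    count <= WIDTH - 1;
                    busy  <= '1';
                end if;
            else
                -- Shift the next dividend bit into the partial remainder.
                trial := rem_r(WIDTH - 1 downto 0) & quo(WIDTH - 1);
                if trial >= resize(dvs, WIDTH + 1) then
                    rem_r <= trial - resize(dvs, WIDTH + 1);
                    quo   <= quo(WIDTH - 2 downto 0) & '1';
                else
                    rem_r <= trial;             -- keep the old remainder
                    quo   <= quo(WIDTH - 2 downto 0) & '0';
                end if;
                if count = 0 then
                    busy <= '0';
                    done <= '1';
                else
                    count <= count - 1;
                end if;
            end if;
        end if;
    end process;

    quotient  <= quo;
    remainder <= rem_r(WIDTH - 1 downto 0);
end architecture;
```

Each clock cycle only needs one compare and one subtract, so the 
critical path stays short regardless of the total division latency.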

OK, let us assume for the moment that your ALU is functionally 
correct. Anyway, it will be slooooow. On the other hand, your state 
machine does a state transition in every clock cycle. So how fast is 
your clock? Let's say 20 MHz? That means your ALU has 50 ns for each 
computation. This is fine for AND and XOR, but surely not for 
unpipelined MULT and DIV.

Did you set any timing constraints in your design? Does Vivado know 
about which frequency your design is intended to run at? If yes, does 
the timing report complain about any timing violations?

When looking at your code, I don't see a VHDL design. I see a piece of 
software that somebody has translated into VHDL statements. There is a 
good reason why the simulation is correct and the hardware fails: it is 
a simulation model, not a hardware design. There is almost no chance for 
the synthesis tools to translate this into efficient hardware.

Before starting with VHDL, learn about digital design and how to 
implement arithmetic if you only have gates and flip-flops. Expect that 
division in hardware is much more complicated than just writing a "/". 
Then come back to VHDL and rewrite your model from scratch. Do not use 
VHDL to implement an algorithm. You will fail with that (everybody would 
fail with that, but some people persistently refuse to understand that). 
Develop an architecture first, and then use VHDL to describe this 
architecture.

Author: Lothar M. (lkmiller) (Moderator)

Vancouver wrote:
> use VHDL to describe this architecture.
Just to confirm that statement: VHDL is NOT a programming language. 
It's a hardware description language. So there must be a kind of 
"picture" (at least in your head) that can be described with VH*D*L.
After synthesizing your design, have a look at the RTL schematic and 
check whether it matches your "picture" or not. If not, you have to 
change your VHDL description until it is almost the same.

> not for unpipelined MULT
The SPARTAN 3 FPGA on the eval board has hardware multipliers that can 
handle up to 18x18 bit data width: 
https://www.xilinx.com/support/documentation/application_notes/xapp467.pdf
With the description above and its 4 bit input they can be used and the 
20MHz will cause no problem here.
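
As a small sketch of a multiplication written so that a synthesis tool 
can place it on a hardware multiplier, the product below is formed with 
"*" on unsigned operands and stored in a register. The entity name 
reg_mult and its ports are assumptions for illustration. Whether the 
tool really uses a dedicated multiplier block or builds such a narrow 
product from lookup tables depends on the device and the tool settings.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example entity: registered unsigned multiplier.
entity reg_mult is
    generic (WIDTH : positive := 4);
    port (
        clk : in  std_logic;
        a   : in  unsigned(WIDTH - 1 downto 0);
        b   : in  unsigned(WIDTH - 1 downto 0);
        p   : out unsigned(2 * WIDTH - 1 downto 0)  -- full-width product
    );
end entity;

architecture rtl of reg_mult is
begin
    mult : process (clk)
    begin
        if rising_edge(clk) then
            -- numeric_std "*" on two unsigned operands returns a result
            -- whose length is the sum of both operand lengths.
            p <= a * b;
        end if;
    end process;
end architecture;
```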

>>> (A_int / B_int)
> division in hardware is much more complicated than just writing a "/"
Indeed!
