# Reusing registers in VHDL FSM code

Hello,

I need to write a finite state machine (FSM) in VHDL and want to
have several computations processed at the same time (a standard
pipeline). In every state I have several operations to calculate, and
I use a register for the result of each one. I really need to reuse
these registers. For example: Register 1 is filled in State 1 (with the
result of a multiplication) and is used in State 2 and State 3
(as an operand of other operations); then, in State 4, I want to store
a new result (another multiplication) in Register 1, reusing
it.

My code works in simulation in Xilinx Vivado 2019, but when I implement
the design on a real FPGA (Basys 3, Artix-7) it doesn't work. I realized
that the problem is that the correct values are not saved when I reuse
the registers. Sometimes they hold the correct value the first time I
reuse them, but by the second reuse in later FSM states the stored
values are wrong: they do not correspond to the result of the operation
that I am trying to save in the register.

Next, an example of my FSM design:

    LIBRARY IEEE;
    USE IEEE.std_logic_1164.all;
    USE IEEE.numeric_std.ALL;

    ENTITY test1_arith IS
      GENERIC (
        ap_bit_width : positive := 4;
        ap_latency : positive := 2
      );
      PORT (
        I1 : IN STD_LOGIC_VECTOR(ap_bit_width - 1 downto 0);
        I2 : IN STD_LOGIC_VECTOR(ap_bit_width - 1 downto 0);
        I3 : IN STD_LOGIC_VECTOR(ap_bit_width - 1 downto 0);
        O1 : OUT STD_LOGIC_VECTOR(ap_bit_width - 1 downto 0);
        ap_clk : IN STD_LOGIC;
        ap_rst : IN STD_LOGIC;
        ap_start : IN STD_LOGIC;
        ap_done : OUT STD_LOGIC;
        ap_idle : OUT STD_LOGIC;
        ap_ready : OUT STD_LOGIC
      );
    END;

    ARCHITECTURE test1_arith_arch OF test1_arith IS
      ATTRIBUTE CORE_GENERATION_INFO : STRING;
      ATTRIBUTE CORE_GENERATION_INFO OF test1_arith_arch : ARCHITECTURE IS "Test,VHDLbyMOEA,{HLS_SYN_LAT=2}";
      CONSTANT ap_const_logic_1 : STD_LOGIC := '1';
      CONSTANT ap_const_logic_0 : STD_LOGIC := '0';
      TYPE state IS (state_1,state_2,state_3);
      SIGNAL state_present: state;
      SIGNAL state_future: state;
      SIGNAL Flag: Integer:=0;
      --Signal RF : STD_LOGIC_VECTOR_array;

      FUNCTION ALU (
        Op: IN integer range 0 TO 23;
        A, B: IN STD_LOGIC_VECTOR (ap_bit_width - 1 downto 0)
      ) RETURN std_logic_vector is
        variable Result : std_logic_vector(ap_bit_width - 1 downto 0);
        variable A_int: Integer:=0;
        variable B_int: Integer:=0;
        variable Result_int: Integer:=0;
      begin
        A_int := to_integer(unsigned(A));
        B_int := to_integer(unsigned(B));
        With Op Select Result_int:=
          to_integer(unsigned(NOT A)) When 0,
          to_integer(unsigned(A AND B)) When 1,
          to_integer(unsigned(A OR B)) When 2,
          to_integer(unsigned(A NAND B)) When 3,
          to_integer(unsigned(A NOR B)) When 4,
          to_integer(unsigned(A XOR B)) When 5,
          to_integer(unsigned(A XNOR B)) When 6,
          (A_int + B_int) When 7,
          (A_int - B_int) When 8,
          (A_int * B_int) When 9,
          (A_int / B_int) When 10,
          ABS(A_int) When 11,
          (A_int ** B_int) When 12,
          (A_int MOD B_int) When 13,
          to_integer(unsigned(A) & unsigned(B)) When 14,
          to_integer(unsigned(A) SLL B_int) When 15,
          to_integer(unsigned(A) SRL B_int) When 16,
          to_integer(unsigned(A) SLA B_int) When 17,
          to_integer(unsigned(A) SRA B_int) When 18,
          to_integer(unsigned(A) ROL B_int) When 19,
          to_integer(unsigned(A) ROR B_int) When 20,
          to_integer(unsigned(A) & unsigned(B)) When 21,
          to_integer(unsigned(A) & unsigned(B)) When 22,
          0 When others;
        return STD_LOGIC_VECTOR (TO_UNSIGNED (Result_int, (ap_bit_width)));
      END FUNCTION;

      SHARED VARIABLE R1:std_logic_vector(ap_bit_width - 1 downto 0);
    BEGIN
      OP_FSM : PROCESS (state_present)
      BEGIN
        CASE state_present IS
          WHEN state_1=>
            R1 := ALU(Op => 7 ,A => I1,B => I2);
            Flag<=1;
            IF (Flag=1) THEN
              state_future <= state_2;
            END IF;
          WHEN state_2=>
            R1:= ALU(Op => 7 ,A => R1, B => I3);
            Flag<=2;
            IF (Flag=2) THEN
              state_future <= state_3;
            END IF;
          WHEN state_3=>
            O1<= ALU(Op => 7 ,A => R1,B => "0001");
            Flag<=3;
            IF (Flag=3) THEN
              state_future <= state_1;
            END IF;
        END CASE;
      END PROCESS OP_FSM;

      CLK_FSM : PROCESS (ap_clk)
      BEGIN
        IF (ap_clk = '1' AND ap_clk'EVENT) THEN
          state_present <= state_future;
        END IF;
      END PROCESS CLK_FSM;
    END test1_arith_arch;

In this case, I want to reuse R1 and it works well in Simulation with
Xilinx Vivado (1 + 4 + 0 + 1 = 6):

Figure 1.

Unfortunately, in the Basys 3 FPGA Artix-7 I don't get the correct
results:

Figure 2.

In this figure, I show Case 10 on the FPGA: it should give 6 (1 + 4 + 0
+ 1) as the result, but it gives 14 instead:

Figure 3.

In my tests I noticed that it works better when the register is cleared
to zero before a new value is assigned to it, for example:

    WHEN state_3=>
      R4<="0000";
      IF( R4 = "0000") then
        R4<= ALU(Op => 7 ,A=> R2,B=> R3, C =>"0000");
        Flag <=3;
        IF (Flag =3) THEN
          state_future <= state_4;
        END IF;
      END IF;

This way I can reuse a register once; the second time I reassign a
value to the register, incorrect values appear at the output.

I declared the registers both as SHARED VARIABLEs and as SIGNALs, and I
have the same problem with both.

I appreciate any suggestion or idea, thanks a lot.

Darian Reyes wrote:
> OP_FSM : PROCESS (state_present)
This sensitivity list is incomplete! I1, I2, I3, R1 and this strange
"Flag" are missing. Therefore the simulation is WRONG! The toolchain
tells you that with an "Info" or a "Warning"...

What the heck is that thing with this "Flag"? Where did you see this
kind of coding?

One word about "variables", especially "shared variables": you don't
need either of them at all.
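
As a rough sketch only: this is what a complete sensitivity list could look like for the next-state logic of a three-state FSM like the one quoted above. The entity name `fsm_next_state_sketch` and the ports `clk`, `rst`, `start` and `busy` are invented for illustration and are not part of the original design. No "Flag" and no variables are used.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity fsm_next_state_sketch is
  port (
    clk   : in  std_logic;
    rst   : in  std_logic;  -- assumed synchronous, active-high reset
    start : in  std_logic;  -- hypothetical start request
    busy  : out std_logic
  );
end entity;

architecture rtl of fsm_next_state_sketch is
  type state_t is (state_1, state_2, state_3);
  signal state_present, state_future : state_t;
begin
  -- Combinational next-state logic: every signal that is READ in the
  -- process is listed (with VHDL-2008, "process (all)" does the same).
  next_state : process (state_present, start)
  begin
    state_future <= state_present;  -- default assignment, avoids a latch
    case state_present is
      when state_1 =>
        if start = '1' then
          state_future <= state_2;
        end if;
      when state_2 =>
        state_future <= state_3;
      when state_3 =>
        state_future <= state_1;
    end case;
  end process;

  -- State register: the only storage element in this sketch.
  state_reg : process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        state_present <= state_1;
      else
        state_present <= state_future;
      end if;
    end if;
  end process;

  busy <= '0' when state_present = state_1 else '1';
end architecture;
```

The combinational process only decides the next state. Anything that has to remember a value across states belongs in a clocked process.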

You have implemented the ALU logic completely as one large combinatorial
function. This will result in an incredibly large and incredibly slow
unstructured sea of connected lookup tables.
For example: one of the ALU functions is a division. Make yourself
familiar with division algorithms for hardware and then try to imagine
how they would behave if implemented without any pipelining or
sequential elements.

OK, let us assume for the moment that your ALU is functionally
correct. Anyway, it will be slooooow. On the other hand, your state
machine does a state transition in every clock cycle. So how fast is
your clock? Let's say 20 MHz? That is, your ALU has 50 ns of time for
computation. This is fine for AND and XOR, but surely not for
unpipelined MULT and DIV.

Did you set any timing constraints in your design? Does Vivado know
which frequency your design is intended to run at? If yes, does
the timing report complain about any timing violations?

When looking at your code, I don't see a VHDL design. I see a piece of
software that somebody has translated into VHDL statements. There is a
good reason why the simulation is correct and the hardware fails: it is
a simulation model, not a hardware design. There is almost no chance for
the synthesis tools to translate this into efficient hardware.

Before starting with VHDL, learn about digital design and how to
implement arithmetic if you only have gates and flip-flops. Expect that
division in hardware is much more complicated than just writing a "/".
Then come back to VHDL and rewrite your model from scratch. Do not use
VHDL to implement an algorithm. You will fail with that (everybody would
fail with that, but some people persistently refuse to understand that).
Develop an architecture first, and then use VHDL to describe this
architecture.
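
To make "describe an architecture" a bit more concrete, here is a minimal sketch, not a definitive implementation. It assumes only the addition case from the posted example (I1 + I2 + I3 + 1), a synchronous active-high reset, and made-up names (`reuse_reg_sketch`, `r1`, `state`). The intended hardware is one register `r1`, a small state register and adders in front of them, and `r1` is rewritten in consecutive states.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity reuse_reg_sketch is
  generic (width : positive := 4);
  port (
    clk : in  std_logic;
    rst : in  std_logic;  -- assumed synchronous, active-high reset
    i1  : in  std_logic_vector(width - 1 downto 0);
    i2  : in  std_logic_vector(width - 1 downto 0);
    i3  : in  std_logic_vector(width - 1 downto 0);
    o1  : out std_logic_vector(width - 1 downto 0)
  );
end entity;

architecture rtl of reuse_reg_sketch is
  type state_t is (state_1, state_2, state_3);
  signal state : state_t;
  signal r1    : unsigned(width - 1 downto 0);  -- the reused register
begin
  -- Single clocked process: state, r1 and o1 are all flip-flops that
  -- change only on the rising clock edge.
  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        state <= state_1;
        r1    <= (others => '0');
        o1    <= (others => '0');
      else
        case state is
          when state_1 =>
            r1    <= unsigned(i1) + unsigned(i2);  -- first use of r1
            state <= state_2;
          when state_2 =>
            r1    <= r1 + unsigned(i3);            -- r1 reused: old value read, new value stored
            state <= state_3;
          when state_3 =>
            o1    <= std_logic_vector(r1 + 1);     -- result wraps modulo 2**width
            state <= state_1;
        end case;
      end if;
    end if;
  end process;
end architecture;
```

Because `r1` is a signal assigned inside a clocked process, reading it on the right-hand side always returns the value stored at the previous clock edge, so reusing it in a later state needs no clearing step.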

Vancouver wrote:
> use VHDL to describe this architecture.
Just to confirm that statement: VHDL is NOT a programming language.
It's a hardware description language. So there must be a kind of
"picture" (at least in your head) that can be described with VH*D*L.
After synthesizing your design, have a look at the RTL schematic and
check whether it matches your "picture" or not. If not, then you have to
change your VHDL description until it is almost the same.

> not for unpipelined MULT
The SPARTAN 3 FPGA on the eval board has hardware multipliers that can
handle up to 18x18 bit data width:
https://www.xilinx.com/support/documentation/application_notes/xapp467.pdf
With the description above and its 4 bit input they can be used and the
20MHz will cause no problem here.
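
A minimal sketch of a registered multiply for 4-bit operands, with assumed names (`reg_mult_sketch`, `a`, `b`, `p`). Whether the tools map this onto a dedicated multiplier block or onto lookup tables depends on the device, the operand width and the synthesis settings, so take it as an illustration of the coding style only.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity reg_mult_sketch is
  generic (width : positive := 4);
  port (
    clk : in  std_logic;
    a   : in  unsigned(width - 1 downto 0);
    b   : in  unsigned(width - 1 downto 0);
    p   : out unsigned(2 * width - 1 downto 0)  -- full-width product
  );
end entity;

architecture rtl of reg_mult_sketch is
begin
  -- The product is captured in a register, so the multiplier has one
  -- full clock period to settle before its result is used.
  process (clk)
  begin
    if rising_edge(clk) then
      p <= a * b;  -- numeric_std "*" returns a'length + b'length bits
    end if;
  end process;
end architecture;
```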

>>> (A_int / B_int)
> division in hardware is much more complicated than just writing a "/"
Indeed!
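
To give an idea of what sits behind a "/", here is a sketch of one classic approach: a restoring divider that produces one quotient bit per clock cycle. It is not the poster's design. The entity and signal names are invented, the reset is assumed synchronous, and a non-zero divisor is assumed (division by zero is not handled and yields a meaningless result).

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity seq_div_sketch is
  generic (width : positive := 4);
  port (
    clk       : in  std_logic;
    rst       : in  std_logic;  -- assumed synchronous, active-high
    start     : in  std_logic;  -- pulse to begin a division
    dividend  : in  unsigned(width - 1 downto 0);
    divisor   : in  unsigned(width - 1 downto 0);  -- assumed non-zero
    quotient  : out unsigned(width - 1 downto 0);
    remainder : out unsigned(width - 1 downto 0);
    done      : out std_logic   -- high for one cycle when results are valid
  );
end entity;

architecture rtl of seq_div_sketch is
  signal rem_q : unsigned(width downto 0) := (others => '0');      -- partial remainder
  signal quo_q : unsigned(width - 1 downto 0) := (others => '0');  -- dividend out, quotient in
  signal den_q : unsigned(width - 1 downto 0) := (others => '0');  -- latched divisor
  signal count : natural range 0 to width := 0;
  signal busy  : std_logic := '0';
begin
  process (clk)
    variable shifted : unsigned(width downto 0);
    variable den_ext : unsigned(width downto 0);
  begin
    if rising_edge(clk) then
      done <= '0';
      if rst = '1' then
        busy <= '0';
      elsif busy = '0' then
        if start = '1' then
          rem_q <= (others => '0');
          quo_q <= dividend;
          den_q <= divisor;
          count <= width;
          busy  <= '1';
        end if;
      else
        -- Shift the next dividend bit (MSB first) into the partial remainder.
        shifted := rem_q(width - 1 downto 0) & quo_q(width - 1);
        den_ext := resize(den_q, width + 1);
        if shifted >= den_ext then
          rem_q <= shifted - den_ext;                  -- subtract, quotient bit = 1
          quo_q <= quo_q(width - 2 downto 0) & '1';
        else
          rem_q <= shifted;                            -- "restore", quotient bit = 0
          quo_q <= quo_q(width - 2 downto 0) & '0';
        end if;
        if count = 1 then
          busy <= '0';
          done <= '1';
        end if;
        count <= count - 1;
      end if;
    end if;
  end process;

  quotient  <= quo_q;
  remainder <= rem_q(width - 1 downto 0);
end architecture;
```

A division takes `width` clock cycles after `start`, and each cycle contains only a compare and a subtract. That is the trade-off against a single huge combinational divider: more latency in cycles, but a short path between registers.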
