Hello, I have to write a program in VHDL which calculates sqrt using the Newton method. I wrote code that seems OK to me, but it does not work: the behavioral simulation gives the proper output value, but the post-synthesis simulation (and the design running on hardware) does not. The program is implemented as a state machine. The input value is an integer (as std_logic_vector) and the output is fixed point: for calculation purposes the input value is multiplied by 64^2, so the 6 LSBs of the output are the fractional part. I used a division function for VHDL from the vhdlguru blogspot. In behavioral simulation calculating sqrt takes about 350 ns (Tclk = 10 ns), but in post-synthesis simulation only 50 ns.
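For reference, this is the iteration the state machine is meant to perform. It is a reading of the code below, not something stated in the post. With N the input and the dividend scaled by 64^2 = 2^12, states s3 (divide), s4 (add) and s5 (halve) compute, with truncating integer division:

$$
x_{k+1} = \left\lfloor \frac{1}{2}\left( x_k + \left\lfloor \frac{2^{12} N}{x_k} \right\rfloor \right) \right\rfloor,
\qquad x_0 = \left\lfloor \frac{N}{2} \right\rfloor
$$

This converges to $x \approx \sqrt{2^{12} N} = 64\sqrt{N}$, i.e. $\sqrt{N}$ with 6 fractional bits. For example, for N = 2 the iteration settles at x = 90, and 90/64 = 1.40625 ≈ √2. Note that x_0 = 0 for N = 0 and N = 1, so the first division would be by zero for those inputs.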
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_arith.all;
use ieee.std_logic_unsigned.all;

entity moore_sqrt is
  port (clk        : in  std_logic;
        enable     : in  std_logic;
        input      : in  std_logic_vector (15 downto 0);
        data_ready : out std_logic;
        output     : out std_logic_vector (31 downto 0)
       );
end moore_sqrt;

architecture behavioral of moore_sqrt is
  ------------------------------------------------------------
  function division (x : std_logic_vector; y : std_logic_vector) return std_logic_vector is
    variable a1 : std_logic_vector(x'length-1 downto 0) := x;
    variable b1 : std_logic_vector(y'length-1 downto 0) := y;
    variable p1 : std_logic_vector(y'length downto 0) := (others => '0');
    variable i  : integer := 0;
  begin
    for i in 0 to y'length-1 loop
      p1(y'length-1 downto 1) := p1(y'length-2 downto 0);
      p1(0) := a1(x'length-1);
      a1(x'length-1 downto 1) := a1(x'length-2 downto 0);
      p1 := p1-b1;
      if (p1(y'length-1) = '1') then
        a1(0) := '0';
        p1 := p1+b1;
      else
        a1(0) := '1';
      end if;
    end loop;
    return a1;
  end division;
  --------------------------------------------------------------

  type state_type is (s0, s1, s2, s3, s4, s5, s6);  --type of state machine
  signal current_state, next_state : state_type;     --current and next state declaration

  signal xk             : std_logic_vector (31 downto 0);
  signal temp           : std_logic_vector (31 downto 0);
  signal latched_input  : std_logic_vector (15 downto 0);
  signal iterations     : integer := 0;
  signal max_iterations : integer := 10;  --corresponds with accuracy

begin

  process (clk, enable)
  begin
    if enable = '0' then
      current_state <= s0;
    elsif clk'event and clk = '1' then
      current_state <= next_state;  --state change
    end if;
  end process;

  --state machine
  process (current_state)
  begin
    case current_state is
      when s0 =>  -- reset
        output <= "00000000000000000000000000000000";
        data_ready <= '0';
        next_state <= s1;
      when s1 =>  -- latching input data
        latched_input <= input;
        next_state <= s2;
      when s2 =>  -- start calculating
        -- initial value is set as a half of input data
        output <= "00000000000000000000000000000000";
        data_ready <= '0';
        xk <= "0000000000000000" & division(latched_input, "0000000000000010");
        next_state <= s3;
        iterations <= 0;
      when s3 =>  -- division
        temp <= division ("0000" & latched_input & "000000000000", xk);
        next_state <= s4;
      when s4 =>  -- calculating
        if (iterations < max_iterations) then
          xk <= xk + temp;
          next_state <= s5;
          iterations <= iterations + 1;
        else
          next_state <= s6;
        end if;
      when s5 =>  -- shift logic right by 1
        xk <= division(xk, "00000000000000000000000000000010");
        next_state <= s3;
      when s6 =>  -- stop - proper data
        -- output <= division(xk, "00000000000000000000000001000000"); --the nearest integer value
        output <= xk;  -- fixed point 24.6, sqrt = output/64;
        data_ready <= '1';
    end case;
  end process;
end behavioral;
I have only a little experience with VHDL and no idea what I can do to fix the problem. I tried removing the other process, the one for the calculation, but that did not work either. I attached screenshots of the behavioral and post-synthesis simulation results. I hope you can help me.
Platform: Zynq ZedBoard
IDE: Vivado 2014.4
Regards, Michal
Michal wrote:
> I used function to divide in vhdl from vhdlguru blogspot.
Did you check how fast that incredible bunch of logic you implemented there is? Just to get an impression, have a look at the RTL schematic.
> (Tclk=10 ns)
I'm fairly sure that your huge division algorithm is much(!!!!) slower. Did you set a timing constraint for that clock? Just to be sure: run your timing simulation with a 1 MHz clock...
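A common way to deal with the slow combinational divider Lothar describes is to spread the division over several clock cycles instead of unrolling the whole loop into one path between two registers. Below is a minimal sketch of a bit-serial restoring divider that produces one quotient bit per clock, so a 32-bit division takes 32 clocks. The entity name `seq_div` and its ports are hypothetical, not part of the original design or of any library, and it uses `ieee.numeric_std` instead of the non-standard `std_logic_arith`/`std_logic_unsigned` packages.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical sequential restoring divider (sketch): one quotient bit per clock.
entity seq_div is
  generic (WIDTH : positive := 32);                   -- must be at least 2
  port (
    clk       : in  std_logic;
    start     : in  std_logic;                        -- pulse for one clock while idle
    dividend  : in  unsigned(WIDTH-1 downto 0);
    divisor   : in  unsigned(WIDTH-1 downto 0);       -- must be non-zero
    quotient  : out unsigned(WIDTH-1 downto 0);
    remainder : out unsigned(WIDTH-1 downto 0);
    done      : out std_logic                         -- high for one clock when valid
  );
end entity seq_div;

architecture rtl of seq_div is
  signal q     : unsigned(WIDTH-1 downto 0) := (others => '0');  -- dividend out, quotient in
  signal r     : unsigned(WIDTH downto 0)   := (others => '0');  -- partial remainder
  signal d     : unsigned(WIDTH downto 0)   := (others => '0');  -- divisor, one extra bit
  signal count : natural range 0 to WIDTH   := 0;
  signal busy  : std_logic := '0';
begin
  process (clk)
    variable r_next : unsigned(WIDTH downto 0);
  begin
    if rising_edge(clk) then
      done <= '0';
      if start = '1' and busy = '0' then
        q     <= dividend;
        r     <= (others => '0');
        d     <= '0' & divisor;
        count <= WIDTH;
        busy  <= '1';
      elsif busy = '1' then
        -- shift the next dividend bit (MSB first) into the partial remainder
        r_next := r(WIDTH-1 downto 0) & q(WIDTH-1);
        if r_next >= d then                           -- subtraction fits: quotient bit 1
          r <= r_next - d;
          q <= q(WIDTH-2 downto 0) & '1';
        else                                          -- restore: quotient bit 0
          r <= r_next;
          q <= q(WIDTH-2 downto 0) & '0';
        end if;
        count <= count - 1;
        if count = 1 then                             -- last bit processed this clock
          busy <= '0';
          done <= '1';
        end if;
      end if;
    end if;
  end process;

  quotient  <= q;
  remainder <= r(WIDTH-1 downto 0);
end architecture rtl;
```

In the state machine, the division state would then pulse `start` and wait for `done` instead of calling `division` in a single state. The price is latency (WIDTH clocks per division) in exchange for a short critical path.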
I had checked this before; the timing constraints were OK. I ran the post-synthesis simulation with a 1 MHz clock, but the result is the same as with 100 MHz.
The mismatch between the simulations is caused by the incomplete sensitivity list of your second process. This process is full of latches and combinatorial loops, which are hidden by the incomplete list in the behavioral simulation but show up in the hardware implementation: in simulation, process (current_state) only runs when current_state changes, so an assignment like xk <= xk + temp executes once per visit to s4, while in hardware it is an adder feeding back on itself with no register in between. The easiest way to fix the problem is to get rid of the two-process implementation of your state machine and use only one clocked process. Once you have done this, the problem Lothar mentioned will show up, that is, your design will only run at lower clock speeds.
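Following that advice, here is a minimal sketch of the same state machine as a single clocked process. Assumptions, none of which come from the thread: `numeric_std` instead of `std_logic_arith`/`std_logic_unsigned`, the numeric_std `/` operator standing in for the `division` function, renamed states, and a divide-by-zero guard the original lacks. Because every signal is assigned only on the clock edge, `xk`, `temp`, `iterations` and the outputs all become registers, so no latches or combinatorial loops are inferred and behavioral and post-synthesis simulation should agree. The `/` in `s_div` is still one large combinational divider, which is exactly the timing problem Lothar mentioned.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Sketch only: the Newton sqrt FSM from the question as ONE clocked process.
-- Assumptions: numeric_std types, "/" in place of division(), renamed states,
-- and a divide-by-zero guard that the original does not have.
entity moore_sqrt is
  port (clk        : in  std_logic;
        enable     : in  std_logic;                        -- '0' resets, as in the original
        input      : in  std_logic_vector (15 downto 0);
        data_ready : out std_logic;
        output     : out std_logic_vector (31 downto 0));  -- fixed point 24.6
end moore_sqrt;

architecture rtl of moore_sqrt is
  type state_type is (s_latch, s_div, s_add, s_half, s_done);
  constant MAX_ITERATIONS : natural := 10;                 -- corresponds with accuracy
  signal state      : state_type := s_latch;
  signal n_scaled   : unsigned (31 downto 0) := (others => '0');  -- input * 64^2
  signal xk         : unsigned (31 downto 0) := (others => '0');  -- current estimate
  signal temp       : unsigned (31 downto 0) := (others => '0');  -- n_scaled / xk
  signal iterations : natural range 0 to MAX_ITERATIONS := 0;
begin
  process (clk, enable)
  begin
    if enable = '0' then                                   -- asynchronous reset
      state      <= s_latch;
      data_ready <= '0';
      output     <= (others => '0');
    elsif rising_edge(clk) then                            -- everything below is registered
      case state is
        when s_latch =>                                    -- latch and scale the input
          n_scaled <= shift_left(resize(unsigned(input), 32), 12);
          if unsigned(input) < 2 then
            xk <= to_unsigned(1, 32);                      -- avoid x0 = 0
          else
            xk <= resize(shift_right(unsigned(input), 1), 32);  -- x0 = input / 2
          end if;
          iterations <= 0;
          data_ready <= '0';
          state      <= s_div;
        when s_div =>                                      -- still a big combinational divider
          if xk = 0 then
            temp <= (others => '0');                       -- only reachable for input = 0
          else
            temp <= n_scaled / xk;
          end if;
          state <= s_add;
        when s_add =>
          xk         <= xk + temp;
          iterations <= iterations + 1;
          state      <= s_half;
        when s_half =>                                     -- divide by 2 is a shift
          xk <= shift_right(xk, 1);
          if iterations = MAX_ITERATIONS then
            state <= s_done;
          else
            state <= s_div;
          end if;
        when s_done =>                                     -- hold result until enable goes low
          output     <= std_logic_vector(xk);
          data_ready <= '1';
      end case;
    end if;
  end process;
end rtl;
```

With this structure the latency is fixed at about 32 clock cycles after enable goes high (one to latch, three per iteration, one to register the output), independent of how the synthesizer handles the logic.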
Lattice User wrote:
> This process is full of latches and combinatorial loops
Ouch....
@Michal: the synthesizer must have shown a whole bunch of messages you should analyze. To keep it short: you can ignore errors, warnings and even infos from the synthesizer only when you fully understand the corresponding message and you actually want that behaviour.
For combinatorial loops, try this page with Google Translate (it is in German): http://www.lothar-miller.de/s9y/categories/36-Kombinatorische-Schleife
And this here is your problem (one- vs. two-process style for FSMs, also in German): http://www.lothar-miller.de/s9y/archives/43-Ein-oder-Zwei-Prozess-Schreibweise-fuer-FSM.html