Posted on:

Hello, I have to write a VHDL program that calculates a square root using Newton's method. The code I wrote looks OK to me, but it does not work: behavioral simulation gives the proper output value, but post-synthesis simulation (and the design running on hardware) does not. The program is implemented as a state machine. The input value is an integer (as std_logic_vector) and the output is fixed point: for the calculation the input value is multiplied by 64^2, so the 6 LSBs of the output are the fractional part. For the division I used the VHDL function from the vhdlguru blogspot. In behavioral simulation calculating the sqrt takes about 350 ns (Tclk = 10 ns), but in post-synthesis simulation only 50 ns.
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_arith.all;
use ieee.std_logic_unsigned.all;

entity moore_sqrt is
  port (clk        : in  std_logic;
        enable     : in  std_logic;
        input      : in  std_logic_vector (15 downto 0);
        data_ready : out std_logic;
        output     : out std_logic_vector (31 downto 0)
  );
end moore_sqrt;

architecture behavioral of moore_sqrt is

  function division (x : std_logic_vector; y : std_logic_vector) return std_logic_vector is
    variable a1 : std_logic_vector(x'length-1 downto 0) := x;
    variable b1 : std_logic_vector(y'length-1 downto 0) := y;
    variable p1 : std_logic_vector(y'length downto 0) := (others => '0');
    variable i  : integer := 0;
  begin
    for i in 0 to y'length-1 loop
      p1(y'length-1 downto 1) := p1(y'length-2 downto 0);
      p1(0) := a1(x'length-1);
      a1(x'length-1 downto 1) := a1(x'length-2 downto 0);
      p1 := p1 - b1;
      if (p1(y'length-1) = '1') then
        a1(0) := '0';
        p1 := p1 + b1;
      else
        a1(0) := '1';
      end if;
    end loop;
    return a1;
  end division;

  type state_type is (s0, s1, s2, s3, s4, s5, s6);  -- type of state machine
  signal current_state, next_state : state_type;    -- current and next state declaration
  signal xk             : std_logic_vector (31 downto 0);
  signal temp           : std_logic_vector (31 downto 0);
  signal latched_input  : std_logic_vector (15 downto 0);
  signal iterations     : integer := 0;
  signal max_iterations : integer := 10;            -- corresponds with accuracy

begin

  process (clk, enable)
  begin
    if enable = '0' then
      current_state <= s0;
    elsif clk'event and clk = '1' then
      current_state <= next_state;  -- state change
    end if;
  end process;

  -- state machine
  process (current_state)
  begin
    case current_state is
      when s0 =>  -- reset
        output <= "00000000000000000000000000000000";
        data_ready <= '0';
        next_state <= s1;
      when s1 =>  -- latching input data
        latched_input <= input;
        next_state <= s2;
      when s2 =>  -- start calculating
        -- initial value is set as a half of input data
        output <= "00000000000000000000000000000000";
        data_ready <= '0';
        xk <= "0000000000000000" & division(latched_input, "0000000000000010");
        next_state <= s3;
        iterations <= 0;
      when s3 =>  -- division
        temp <= division ("0000" & latched_input & "000000000000", xk);
        next_state <= s4;
      when s4 =>  -- calculating
        if (iterations < max_iterations) then
          xk <= xk + temp;
          next_state <= s5;
          iterations <= iterations + 1;
        else
          next_state <= s6;
        end if;
      when s5 =>  -- shift logic right by 1
        xk <= division(xk, "00000000000000000000000000000010");
        next_state <= s3;
      when s6 =>  -- stop - proper data
        -- output <= division(xk, "00000000000000000000000001000000");  -- the nearest integer value
        output <= xk;  -- fixed point 24.6, sqrt = output/64;
        data_ready <= '1';
    end case;
  end process;

end behavioral;
I have only a little experience with VHDL and no idea how to fix the problem. I tried splitting the calculation out into a separate process, but that did not work either. I attached screenshots of the behavioral and post-synthesis simulation results. I hope you can help me.

Platform: Zynq ZedBoard
IDE: Vivado 2014.4

Regards,
Michal
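For reference, the arithmetic the state machine appears to be aiming at can be written out as below. This is a sketch reconstructed from the description and the listing, not something stated in the post: the symbols N (the 16-bit input), R (the scaled radicand) and x_k (the running estimate) are names introduced here, and every division is assumed to truncate.

```latex
\begin{align*}
R       &= N \cdot 64^2 = N \cdot 2^{12}, \\
x_0     &= \left\lfloor N / 2 \right\rfloor, \\
x_{k+1} &= \left\lfloor \frac{x_k + \left\lfloor R / x_k \right\rfloor}{2} \right\rfloor, \\
\sqrt{N} &\approx \frac{x_k}{64}
          \quad \text{once the iteration has settled, because } \sqrt{R} = 64\,\sqrt{N}.
\end{align*}
```

For N = 16 this gives R = 65536 and the sequence 8, 4100, 2057, 1044, 553, 335, 265, 256, 256, so the result is 256/64 = 4. Note that x_0 = 0 for N = 0 or 1, in which case the first division would have a zero divisor.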
Posted on:

Michal wrote:
> I used function to divide in vhdl from vhdlguru blogspot.

Did you check how fast that incredible bunch of logic you implemented there is? Just to get an impression: have a look at the RTL schematic.

> (Tclk=10 ns)

I'm fairly sure that your huge division algorithm is much(!!!!) slower. Did you set a timing constraint for that clock? Just to be sure: run your timing simulation with a 1 MHz clock...
Edited by Moderator
Posted on:

I checked this before: the timing constraints were OK. I ran the post-synthesis simulation with a 1 MHz clock, but the result is the same as with 100 MHz.
Posted on:

The mismatch between the simulations is caused by the incomplete sensitivity list of your 2nd process. This process is full of latches and combinatorial loops, which are hidden by the incomplete list in the behavioral simulation but will show up in the hardware implementation. The easiest way to fix the problem is to get rid of the two-process implementation of your state machine. Use only one clocked process. Once you have done this, the problem Lothar mentioned will show up: your design will only run at lower clock speeds.
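A minimal sketch of what a single clocked process could look like. It assumes numeric_std instead of std_logic_arith/std_logic_unsigned and uses the numeric_std "/" operator as a stand-in for the division function. The entity name sqrt_fsm_sketch, the state names, the signals radicand and quotient, the synchronous use of enable, and the guard against a zero divisor are all assumptions made for illustration and are not taken from the original code.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity sqrt_fsm_sketch is  -- hypothetical name
  port (
    clk        : in  std_logic;
    enable     : in  std_logic;                     -- '0' holds the FSM in reset
    input      : in  std_logic_vector(15 downto 0);
    data_ready : out std_logic;
    output     : out std_logic_vector(31 downto 0)  -- 6 LSBs are fractional
  );
end entity sqrt_fsm_sketch;

architecture rtl of sqrt_fsm_sketch is
  type state_type is (s_load, s_divide, s_average, s_done);
  constant MAX_ITERATIONS : natural := 10;

  signal state      : state_type := s_load;
  signal radicand   : unsigned(31 downto 0) := (others => '0');  -- input * 64^2
  signal xk         : unsigned(31 downto 0) := (others => '0');
  signal quotient   : unsigned(31 downto 0) := (others => '0');
  signal iterations : natural range 0 to MAX_ITERATIONS := 0;
begin
  -- Everything is assigned on the clock edge, so every signal is a register.
  process (clk)
  begin
    if rising_edge(clk) then
      if enable = '0' then
        state      <= s_load;
        data_ready <= '0';
        output     <= (others => '0');
      else
        case state is
          when s_load =>
            radicand   <= shift_left(resize(unsigned(input), 32), 12);
            -- Start at input/2; bit 0 is forced to '1' (assumption) so that
            -- the first divisor is never zero.
            xk         <= resize(shift_right(unsigned(input), 1), 32)
                          or to_unsigned(1, 32);
            iterations <= 0;
            data_ready <= '0';
            state      <= s_divide;

          when s_divide =>
            if xk = 0 then          -- only reached for input = 0
              state <= s_done;
            else
              quotient <= radicand / xk;
              state    <= s_average;
            end if;

          when s_average =>
            xk <= shift_right(xk + quotient, 1);
            if iterations = MAX_ITERATIONS - 1 then
              state <= s_done;
            else
              iterations <= iterations + 1;
              state      <= s_divide;
            end if;

          when s_done =>            -- stays here until enable goes low
            output     <= std_logic_vector(xk);
            data_ready <= '1';
        end case;
      end if;
    end if;
  end process;
end architecture rtl;
```

Because every assignment sits inside rising_edge(clk), xk and iterations are updated exactly once per clock and no sensitivity list can be incomplete. The single-cycle 32-bit division is still a long combinatorial path, so it will probably limit the clock rate; spreading the division over several clock cycles is the usual remedy.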
Posted on:

Lattice User wrote:
> This process is full of latches and combinatorial loops

Ouch....

@Michal: the synthesizer must have shown a big bunch of messages that you should analyze. To keep it short: you can ignore errors, warnings and even infos from the synthesizer only when you fully understand the message in question and you actually want that behaviour.

For combinatorial loops, try this page with Google Translate:
http://www.lotharmiller.de/s9y/categories/36KombinatorischeSchleife

And this here is your problem:
http://www.lotharmiller.de/s9y/archives/43EinoderZweiProzessSchreibweisefuerFSM.html
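To illustrate what a combinatorial loop looks like, here is a small sketch. The entity loop_demo and the signals acc, step and acc_q are hypothetical names chosen for this example. The commented-out process shows the pattern that assignments such as xk <= xk + temp follow when they sit in an unclocked process; the clocked process below it is the registered alternative.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity loop_demo is  -- hypothetical name
  port (
    clk   : in  std_logic;
    step  : in  unsigned(7 downto 0);
    acc_q : out unsigned(7 downto 0)
  );
end entity loop_demo;

architecture rtl of loop_demo is
  signal acc : unsigned(7 downto 0) := (others => '0');
begin
  -- Problematic pattern, left commented out on purpose:
  --   process (state)        -- acc and step are missing from the list
  --   begin
  --     acc <= acc + step;   -- acc feeds itself without a register
  --   end process;
  -- The simulator evaluates this only when 'state' changes, so it looks
  -- like one addition per state. In hardware the adder output is wired
  -- straight back to its own input.

  -- Registered version: exactly one addition per rising clock edge.
  process (clk)
  begin
    if rising_edge(clk) then
      acc <= acc + step;
    end if;
  end process;

  acc_q <= acc;
end architecture rtl;
```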