EmbDev.net

Forum: FPGA, VHDL & Verilog Signed Addition overflow in VHDL


Author: jeorges FrenchRivera (Company: xlue) (khal1985)
Hi everyone,

I tried to implement a VHDL design that adds two signed numbers.
The algorithm is as follows: I receive a signed 32-bit value, say A. 
This value is added to the previous sum, so the result is: 
Result_now = A + Result_before.
So the first thing I do is resize A and Result_before to 33 bits in 
order to avoid overflow; Result_now is also 33 bits.
I created a test bench to test my code, but I face a strange problem: 
the result is not as expected. For example, when I add the value 
-1.26562 to 8.06250, I get 1.49691e-038.
Can you help me resolve this problem, please?

The code is below:
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

Library altera_mf;
USE altera_mf.all;
-- Add the library and use clauses before the design unit declaration
 
library altera;
use altera.altera_primitives_components.all;

Entity Sum_Position is 
   Generic ( Accu_lenght : integer  -- accumulator width in bits
             ); 
  port 
  
  (
    Clk: in std_logic;
    Reset: in std_logic;
    Raz_position: in std_logic;
    Position_In: in std_logic_vector(Accu_lenght-1 downto 0);
    Position_Out: out std_logic_vector(Accu_lenght-1 downto 0)
  );
end Sum_Position;


Architecture Arch_position of sum_Position is 

signal position_before: signed (Accu_lenght-1 downto 0):= (OTHERS => '0');

-- both signals have one more bit than the original
signal Position_s   : SIGNED(Accu_lenght downto 0):= (OTHERS => '0');
signal Position_Before_s   : SIGNED(Accu_lenght downto 0):= (OTHERS => '0');
signal Sum_Pos_s : SIGNED(Accu_lenght downto 0):= (OTHERS => '0');
signal temp        : std_logic_vector(2 downto 0):= (OTHERS => '0');

Begin  -- begin of architecture

-- convert type and perform a sign-extension

Position_s <=resize(signed(Position_In), Position_s'length);
Position_Before_s <= resize(signed(position_before), Position_Before_s'length);

Sum_of_position: process(Clk, Reset) 

begin 
  
  IF (Reset='0') THEN        -- when reset is selected
  -- initialize all values 
   Sum_Pos_s<= (OTHERS => '0');
  ELSIF (Clk'event and Clk = '1') then
     -- addition of two 33 bit values
  Sum_Pos_s <= Position_s + Position_Before_s;

  END IF;  

end process Sum_of_position;

-- resize to the required size and convert the type
position_before <= (OTHERS => '0') WHEN Raz_position='1' else  -- Reset to zero when Raz_position='1'
                                         signed(resize(Sum_Pos_s, position_before'length));
Position_Out  <= (OTHERS => '0') WHEN Raz_position='1' else  -- Reset to zero when Raz_position='1'
               std_logic_vector(resize(Sum_Pos_s, Position_Out'length));
end Arch_position;

And the test Bench is this one:
LIBRARY IEEE;
USE IEEE.STD_LOGIC_1164.ALL;
USE IEEE.NUMERIC_STD.ALL;

ENTITY TBH IS
END TBH;

ARCHITECTURE TBH_ARCH OF TBH IS
COMPONENT Sum_Position                          -- declaration of the unit under test
  Generic ( Accu_lenght : integer  -- accumulator width in bits
        ); 
  
  PORT(
    Clk: in std_logic;
    Reset: in std_logic;
    Raz_position: in std_logic;
    Position_In: in std_logic_vector(Accu_lenght-1 downto 0);
    Position_Out: out std_logic_vector(Accu_lenght-1 downto 0)
  );
END COMPONENT;

CONSTANT Accu_size: integer := 32;
SIGNAL CLK   : STD_LOGIC := '1';

SIGNAL RESET   : STD_LOGIC := '0';
SIGNAL Raz : STD_LOGIC := '0';
SIGNAL Position_In  : STD_LOGIC_VECTOR(Accu_size-1 DOWNTO 0);
SIGNAL Position_Out  : STD_LOGIC_VECTOR(Accu_size-1 DOWNTO 0);

BEGIN
--to define the signal and the block's relationship
Position : Sum_Position generic map (Accu_size)PORT MAP(
  Clk   => CLK,                       --COMPONENT PORT => ACTUAL SIGNAL
  Reset => RESET,
  Raz_position        => Raz,
  Position_In  => Position_In,
  Position_Out => Position_Out
);

  PROCESS(CLK)        --to produce CLK
  BEGIN 
    CLK <= NOT CLK AFTER 10 NS;
  END PROCESS;
  
  PROCESS             -- stimulus: applies the input values over time
  BEGIN
    Position_In <= (OTHERS => '0');
    WAIT FOR 20 NS;
    RESET <='1';
    WAIT FOR 20 NS;
    Position_In <= X"bfa1ffd6";  -- -1,26562
    WAIT FOR 20 NS;
    Position_In <= X"41010000";  --8,0625
    WAIT FOR 20 NS;
    Position_In <= X"bf31003f";  -- -0,69141
    WAIT FOR 20 NS;
    Position_In <= X"3fb9ffd6";  -- +1,45312
    WAIT FOR 20 NS;
    Position_In <= X"c0b80000";  -- -5,75
    WAIT FOR 20 NS;
    Position_In <= X"c0f10000";  -- -7,53125
    WAIT  ;--100 NS;
  END PROCESS;
END TBH_ARCH;


Thank you for taking time to help me.
Best regards,

Author: Lothar Miller (lkmiller) (Moderator)
jeorges F. wrote:
> for example when i add the value -1,26562 to 8.06250, I get 1.49691e-038.
With the + operator on signed vectors you do not add float values; 
you add two's-complement integer values.
And it seems to me that your number format is not two's complement. 
Here it looks as if the MSB alone carries the negative sign:
X"bfa1ffd6";  -- -1,26562
X"3fb9ffd6";  -- +1,45312
The binary representations of those two values look almost the same; 
only the MSB is set, and the whole value becomes negative.
That's not how two's-complement binary numbers work! Therefore you 
cannot use a two's-complement addition on values in this format.
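
To illustrate the point, the patterns can be taken apart in C. This is only a sketch: it assumes the values are IEEE 754 single-precision floats, and the helper names (float_fields, split_bits, bits_to_float, add_as_integers) are made up for this example. It also shows where the 1.49691e-038 comes from: adding X"bfa1ffd6" and X"41010000" as integers wraps around to 0x00A2FFD6, and that pattern reads as roughly 1.4969e-38 when displayed as a float.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Fields of a 32-bit float pattern (hypothetical helper type). */
typedef struct {
    uint32_t sign;      /* 1 bit: 1 means negative */
    uint32_t exponent;  /* 8 bits, biased by 127 */
    uint32_t fraction;  /* 23 bits, implicit leading 1 for normal numbers */
} float_fields;

/* Split a raw 32-bit pattern into sign, exponent and fraction. */
static float_fields split_bits(uint32_t bits)
{
    float_fields f;
    f.sign     = bits >> 31;
    f.exponent = (bits >> 23) & 0xFFu;
    f.fraction = bits & 0x7FFFFFu;
    return f;
}

/* View the pattern as a float (assumes float is IEEE 754 binary32). */
static float bits_to_float(uint32_t bits)
{
    float value;
    assert(sizeof value == sizeof bits);
    memcpy(&value, &bits, sizeof value);
    return value;
}

/* What the VHDL adder effectively does with the two patterns:
   a plain integer addition that wraps at 32 bits. */
static uint32_t add_as_integers(uint32_t a, uint32_t b)
{
    return a + b;
}
```

For X"bfa1ffd6" and X"3fb9ffd6", split_bits returns the same exponent (127) and nearly the same fraction; only the sign field differs. That is a sign-and-magnitude layout, not two's complement.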

So: what is your number format?
How do you calculate the binary representation of those float numbers?

> for example when i add the value -1,26562 to 8.06250, I get 1.49691e-038.
Can you show those numbers in a test bench waveform?
Or are those numbers only in your head or on a sheet of paper?

> can you help to resolve this problem please ?
The compiler does not read comments. It just does what the source code 
tells it to do. So it looks very much as if the compiler interprets 
X"bfa1ffd6" differently from you...

Author: jeorges FrenchRivera (Company: xlue) (khal1985)
Thank you Lothar for your answer.
In fact, my input comes from a Nios processor. The Nios computes in 
float, so Position_In is a float inside the Nios; it is then converted 
to a 32-bit integer and placed in my std_logic_vector (31 downto 0).

To get the hex equivalent of a float number, I use the Floating 
Point to Hex Converter.

Enclosed you will find the test bench waveform. The values are displayed 
in hex, but when I change the radix, I get the correct float equivalent.

Best regards

Author: Lothar Miller (lkmiller) (Moderator)
jeorges F. wrote:
> To have the equivalent from hex to float number, i use the Floating
> Point to Hex Converter.
Of course you cannot add two float numbers with a 
two's-complement adder (which is what you are doing when you add two 
signed vectors).

> but when i change the radix, i get the correct float equivalent.
Read a few lines about IEEE 754 and you will see that you can never 
simply add two float values this way. You need to align the exponents 
of the two float values, then you can add the mantissas, and then you 
must normalize the result back into a valid float number.
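
A rough sketch of those steps in C, operating on the raw 32-bit patterns. It is deliberately simplified and not a complete IEEE 754 adder: it assumes normal numbers or zero only, truncates instead of rounding, and ignores NaN, infinity, subnormals and exponent overflow/underflow. The function names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static uint32_t float_to_bits(float f)
{
    uint32_t b;
    assert(sizeof f == sizeof b);
    memcpy(&b, &f, sizeof b);
    return b;
}

static float bits_to_float(uint32_t b)
{
    float f;
    memcpy(&f, &b, sizeof f);
    return f;
}

/* Simplified float addition on bit patterns (normal numbers and zero only). */
static uint32_t sketch_float_add(uint32_t a, uint32_t b)
{
    /* Zero has no implicit leading 1, so handle it up front. */
    if ((a & 0x7FFFFFFFu) == 0) return b;
    if ((b & 0x7FFFFFFFu) == 0) return a;

    /* Make 'a' the operand with the larger magnitude. */
    if ((a & 0x7FFFFFFFu) < (b & 0x7FFFFFFFu)) {
        uint32_t tmp = a; a = b; b = tmp;
    }

    uint32_t sign_a = a >> 31, sign_b = b >> 31;
    uint32_t exp_a = (a >> 23) & 0xFFu;
    uint32_t exp_b = (b >> 23) & 0xFFu;
    /* Restore the implicit leading 1 to get 24-bit mantissas. */
    uint32_t man_a = (a & 0x7FFFFFu) | 0x800000u;
    uint32_t man_b = (b & 0x7FFFFFu) | 0x800000u;

    /* Step 1: align the smaller operand to the larger exponent. */
    uint32_t shift = exp_a - exp_b;
    man_b = (shift < 24) ? (man_b >> shift) : 0;

    /* Step 2: add or subtract the mantissas depending on the signs. */
    uint32_t man = (sign_a == sign_b) ? man_a + man_b : man_a - man_b;
    if (man == 0) return 0;  /* exact cancellation gives +0.0 */

    /* Step 3: renormalize so that bit 23 is the leading 1 again. */
    uint32_t exp = exp_a;
    if (man & 0x1000000u) { man >>= 1; exp++; }
    while (!(man & 0x800000u)) { man <<= 1; exp--; }

    return (sign_a << 31) | (exp << 23) | (man & 0x7FFFFFu);
}
```

With 8.0625 and -1.265625 as inputs, the alignment shift is 3, the mantissa difference is 0x6CC000, and one left shift during renormalization gives the pattern 0x40D98000, which is 6.796875.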

Author: jeorges FrenchRivera (Company: xlue) (khal1985)
OK, I see, thank you Lothar.
In my C code, I do this:
/* Variable for position */
union position
{
    alt_32 I_position;
    float  F_position;
};
When I calculate the position, I put it into F_position; then I can read 
the decimal value from I_position.
I_position is then used as a signed value in my VHDL code. So normally 
I am manipulating the decimal representation. Is that correct?

Thanks in advance,

Author: Lothar Miller (lkmiller) (Moderator)
jeorges F. wrote:
> alt_32
Is this a 32-bit integer data type?

> When i calculate the position, I put it into F_position, then i can have
> the decimal value in I_position.
No chance! A union does not convert anything! It only tells the 
compiler to look at the very same bit pattern in a different way.

Just as an example:
The bit pattern  0xbf31003f is the float number -0.69141
The same pattern 0xbf31003f is the integer number 3207659583 
(or -1087307713 when read as a signed 32-bit value)
But -0.69141 is -1 when rounded to an integer...
Do you see the problem?
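
The difference between reinterpreting and converting can be sketched in C as follows. int32_t stands in for alt_32 here (an assumption), float is assumed to be IEEE 754 single precision, and the two function names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Same layout as the union in the question; int32_t replaces alt_32. */
union position {
    int32_t I_position;
    float   F_position;
};

/* Writing one member and reading the other only reinterprets the bits. */
static int32_t reinterpret_bits(float value)
{
    union position p;
    assert(sizeof p.F_position == sizeof p.I_position);
    p.F_position = value;
    return p.I_position;
}

/* A cast converts the numeric value; it truncates toward zero. */
static int32_t convert_value(float value)
{
    return (int32_t)value;
}
```

For 8.0625, reinterpret_bits gives 0x41010000 (1090584576 in decimal), while convert_value gives 8. Note that the cast truncates, so -0.69141 becomes 0 rather than -1; rounding would need an extra step.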

Author: jeorges FrenchRivera (Company: xlue) (khal1985)
alt_32 is a typedef for signed long.
Yes, I see the problem, thank you. But what is the best approach to 
manipulate my float data?
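
One option that is not discussed in this thread, mentioned here only as a possible alternative to a floating-point adder: scale the float to a fixed-point integer on the processor side, so that the existing two's-complement adder in the VHDL code works unchanged. The sketch below assumes a Q16.16 format (16 fractional bits) and values well inside about +/-32767; FRACTION_BITS and both function names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

#define FRACTION_BITS 16  /* hypothetical choice: Q16.16 fixed point */

/* Scale a float to fixed point, rounding to the nearest step.
   No range check: the caller must keep |value| below about 32767. */
static int32_t float_to_fixed(float value)
{
    float scaled = value * (float)(1 << FRACTION_BITS);
    assert(sizeof(int32_t) == 4);
    return (int32_t)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
}

/* Convert a fixed-point result back to float for display. */
static float fixed_to_float(int32_t fixed)
{
    return (float)fixed / (float)(1 << FRACTION_BITS);
}
```

With this scaling, 8.0625 becomes 528384 and -1.265625 becomes -82944; their integer sum 445440 converts back to 6.796875. The price is a fixed range and resolution, which has to be chosen to fit the position values.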

Author: jeorges FrenchRivera (Company: xlue) (khal1985)
Hi Lothar,

I understand the issue now, thank you very much for your explanation.
Now I have to focus on implementing floating-point addition.

Do you know of any links where I can find examples or documentation on 
this subject?

Thanks in advance,

Best regards,

Author: Lothar Miller (lkmiller) (Moderator)
jeorges F. wrote:
> But, what's the best approach to manipulate my float data ?
Activate that thing between the ears...  ;-)
If that's not desired, then try Google.
This search: https://www.google.de/?#q=float+addition+bitwise
gets you:
https://www.cs.umd.edu/class/sum2003/cmsc311/Notes...

Author: jeorges FrenchRivera (Company: xlue) (khal1985)
Thank you very much.
Best regards,
