EmbDev.net

Forum: FPGA, VHDL & Verilog
Problem synthesizing in Vivado


by Julian M. (Company: Relevant Technologies Ltd) (geoffreym)



Hi there

I am a bit of a VHDL newbie, so please excuse this. I am using the
following code in an attempt to read an array of up to 12288 bits
transmitted via AXI4-Stream in 32-bit chunks. I use the TUSER channel
to receive the number of useful bits in the accompanying DWORD, which
varies with the convolutional code in use (each DWORD is a
convolutionally coded byte, usually containing between 9 and 32 bits,
but it can also contain between 1 and 32 padding bits).

I am building a COFDM transmitter, and the block I am working on handles
frequency interleaving, the nature of which means I need access to a
complete symbol's worth of bits before transmission. The transmit side
synthesizes fine, but the code causing problems is on the receive side:
it takes an age to synthesize (several hours, to be more precise). The
system is flexible, and configuration information is loaded by the
controlling ARM core into AXI-Lite slave registers.

Here's the code that appears to be causing the trouble. If I comment it
out, the design synthesizes in good time.

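            -- TUSER carries the number of useful bits in this beat
            rx_bits := to_integer(unsigned(axis_in_tuser));
            -- copy the bits LSB first into bit_buffer at the running
            -- pointer; every iteration is a write at a variable index
            for i in 1 to 32 loop
              bit_buffer(rx_bit_buf_ptr) <= axis_in_tdata(i-1);
              rx_bit_buf_ptr := rx_bit_buf_ptr + 1;
              if i = rx_bits then
                exit;  -- stop after rx_bits bits have been copied
              end if;
            end loop;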

Am I doing something silly? bit_buffer has length 2 * 12288 - 1 to
facilitate the reception of a symbol during the transmission of another.
I am running Vivado 2015.4 on a quad-core Skylake Xeon laptop with 48 GB
of DDR4 and a 1 TB SSD, so processing speed should be respectable.
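For comparison, the variable-position writes in the loop above can be avoided by packing the useful bits into complete 32-bit words first, so that each word reaches the buffer (ideally a block RAM) with a single write. The sketch below is hypothetical: the entity `bit_packer` and all port names are invented for illustration, it assumes TUSER carries a count of 1 to 32 with the useful bits in the low-order end of TDATA (as the loop implies), and it ignores back-pressure. Its only variable-position logic is one 64-bit shifter, instead of selection logic in front of every bit of `bit_buffer`.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical bit packer. Each beat carries in_count useful bits in the
-- low-order end of in_data (LSB first); complete 32-bit words are emitted
-- for writing into block RAM. No back-pressure is modelled: the RAM write
-- is assumed to be accepted every cycle.
entity bit_packer is
  port (
    clk       : in  std_logic;
    rst       : in  std_logic;                     -- synchronous, active high
    in_valid  : in  std_logic;
    in_data   : in  std_logic_vector(31 downto 0);
    in_count  : in  std_logic_vector(5 downto 0);  -- useful bits, 1..32 (assumed)
    out_valid : out std_logic;
    out_word  : out std_logic_vector(31 downto 0)
  );
end entity;

architecture rtl of bit_packer is
  signal acc  : unsigned(63 downto 0) := (others => '0'); -- pending bits, bit 0 oldest
  signal fill : natural range 0 to 31 := 0;              -- number of pending bits
begin
  process(clk)
    variable cnt    : natural range 0 to 63;
    variable n      : natural range 0 to 32;
    variable mask   : unsigned(31 downto 0);
    variable merged : unsigned(63 downto 0);
    variable total  : natural range 0 to 63;
  begin
    if rising_edge(clk) then
      out_valid <= '0';
      if rst = '1' then
        acc  <= (others => '0');
        fill <= 0;
      elsif in_valid = '1' then
        -- clamp the count so out-of-range TUSER values cannot exceed 32
        cnt := to_integer(unsigned(in_count));
        if cnt > 32 then
          n := 32;
        else
          n := cnt;
        end if;
        -- keep only the n useful low-order bits of this beat
        mask   := shift_right(unsigned'(x"FFFFFFFF"), 32 - n);
        -- append them above the bits already pending (one 64-bit shifter)
        merged := acc or shift_left(resize(unsigned(in_data) and mask, 64), fill);
        total  := fill + n;
        if total >= 32 then
          -- a full word is available: emit it and keep the remainder
          out_word  <= std_logic_vector(merged(31 downto 0));
          out_valid <= '1';
          acc       <= shift_right(merged, 32);
          fill      <= total - 32;
        else
          acc  <= merged;
          fill <= total;
        end if;
      end if;
    end if;
  end process;
end architecture;
```

A word counter driven by `out_valid` would then supply the write address of the buffer RAM.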

I have attached the complete code for the block. Please note it has yet 
to be physically debugged. I have dispensed with the files generated for 
the master and slave AXI4 stream data ports, handling these in the top 
level with clock and reset ports shared between all interfaces. Some 
signals and definitions may be redundant, or provided solely for 
eventual debugging purposes.

Best regards

Geoff

by Lothar M. (Company: Titel) (lkmiller) (Moderator)


Julian M. wrote:
> I have attached the complete code for the block.
Thank God that he gave us CTRL-C and CTRL-V!

> bit_buffer has length 2 * 12288 - 1 to facilitate the reception of a
> symbol during the transmission of another.
When I look at the whole design I can give only one hint: pack your big
register set into a DPRAM. At the moment you are generating logic that
accesses each of the bits through huge multiplexers.
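As an illustration of the hint, here is a minimal simple dual-port RAM in the coding style that Vivado normally infers as block RAM: an array signal, a synchronous write on one port and a registered read on the other. The entity name, generics and ports are hypothetical; 512 x 32 covers the 384 words needed for 12288 bits. The `ram_style` attribute only states the intent and could be omitted.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical simple dual-port RAM: port A writes, port B reads.
entity sdp_ram is
  generic (
    ADDR_W : positive := 9;    -- 2**9 = 512 words (384 used for 12288 bits)
    DATA_W : positive := 32
  );
  port (
    clk    : in  std_logic;
    -- port A: write
    we_a   : in  std_logic;
    addr_a : in  std_logic_vector(ADDR_W-1 downto 0);
    din_a  : in  std_logic_vector(DATA_W-1 downto 0);
    -- port B: read, one cycle of latency
    addr_b : in  std_logic_vector(ADDR_W-1 downto 0);
    dout_b : out std_logic_vector(DATA_W-1 downto 0)
  );
end entity;

architecture rtl of sdp_ram is
  type ram_t is array (0 to 2**ADDR_W-1) of std_logic_vector(DATA_W-1 downto 0);
  signal ram : ram_t;
  attribute ram_style : string;
  attribute ram_style of ram : signal is "block";
begin
  process(clk)
  begin
    if rising_edge(clk) then
      if we_a = '1' then
        ram(to_integer(unsigned(addr_a))) <= din_a;
      end if;
      -- Registered read. The array itself has no reset: resetting the
      -- memory contents would prevent block RAM inference.
      dout_b <= ram(to_integer(unsigned(addr_b)));
    end if;
  end process;
end architecture;
```

Only one word per port is touched per clock, so the address decoding lives inside the BRAM primitive rather than in fabric multiplexers.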

by Julian Geoffrey Mortimer (Guest)


Thanks, Lothar, I know what you mean. I have been truly dumb! I come
from a software background, and the mindset is different! The last time
I designed logic it was cute little symbols on bits of graph paper. I
guess a sheet of graph paper covering the greater part of the Pacific
Ocean would be needed to implement what I was attempting! Can you give
me any hint on how to use DPRAM? Just a hint would do...

Writing this down, I'm pretty sure I understand how to do it! I can use
address translation to achieve any arbitrary scrambling function for the
interleaving: since there is an obvious onto mapping between input and
output bits, it can be precomputed and also stored in block memory. The
basic requirement is a throughput equal to the clock frequency, so it
must be possible to access a word every clock, changing only the
address.

The maximum number of bits encoded in a single carrier is 6 (64-QAM).
So: six memory blocks, each with a write interface (port A) of 384 x 32
bits and a read interface (port B) of 12288 x 1 bit. I write the same
(packed) input data into all six and read one bit from each, giving the
six data bits. This allows an arbitrary interleaving function, since
the blocks are addressed individually by the output of a further block
RAM configured as a lookup table, addressed sequentially by the logic
and programmed with the interleaving function. These bits are not used
directly; instead they address another block RAM, also configured as a
lookup table (64 x 32), loaded with the real and imaginary parts of the
modulation constellation map. I am warming to this concept! The design
boils down to little more than a memory controller.
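One lane of this scheme might look like the following sketch (all names hypothetical). It assumes an 11-bit carrier index (12288 bits / 6 bits per carrier = 2048 carriers) and a 14-bit source bit address stored in the mapping LUT. Rather than relying on asymmetric-width RAM inference, it stores the data as 384 x 32 words and obtains the 12288 x 1 view by splitting the bit address: the upper nine bits select the word and the lower five select the bit. Every stage is registered, so one carrier index can be applied per clock, with three cycles of latency.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical read side of one interleaver lane:
--   carrier index -> mapping LUT -> source bit address -> data RAM -> 1 bit
entity interleave_lane is
  generic (
    CARR_W : positive := 11  -- carrier index width (2048 = 12288 / 6, assumed)
  );
  port (
    clk      : in  std_logic;
    -- write side: packed 32-bit input words
    wr_en    : in  std_logic;
    wr_addr  : in  unsigned(8 downto 0);               -- word 0..383
    wr_data  : in  std_logic_vector(31 downto 0);
    -- mapping LUT programming (e.g. from AXI-Lite registers)
    lut_we   : in  std_logic;
    lut_addr : in  unsigned(CARR_W-1 downto 0);
    lut_data : in  unsigned(13 downto 0);              -- source bit 0..12287
    -- run mode: one carrier index per clock, bit_out 3 cycles later
    carrier  : in  unsigned(CARR_W-1 downto 0);
    bit_out  : out std_logic
  );
end entity;

architecture rtl of interleave_lane is
  type data_ram_t is array (0 to 511) of std_logic_vector(31 downto 0);
  type lut_ram_t  is array (0 to 2**CARR_W-1) of unsigned(13 downto 0);
  signal data_ram : data_ram_t;
  signal lut_ram  : lut_ram_t;
  signal src_bit  : unsigned(13 downto 0);           -- stage 1: LUT output
  signal rd_word  : std_logic_vector(31 downto 0);   -- stage 2: data word
  signal bit_sel  : unsigned(4 downto 0);            -- stage 2: bit index
begin
  -- Mapping LUT: configuration writes, registered read in run mode
  lut_proc : process(clk)
  begin
    if rising_edge(clk) then
      if lut_we = '1' then
        lut_ram(to_integer(lut_addr)) <= lut_data;
      end if;
      src_bit <= lut_ram(to_integer(carrier));
    end if;
  end process;

  -- Data RAM: 32-bit writes, registered 32-bit read at the word address
  data_proc : process(clk)
  begin
    if rising_edge(clk) then
      if wr_en = '1' then
        data_ram(to_integer(wr_addr)) <= wr_data;
      end if;
      rd_word <= data_ram(to_integer(src_bit(13 downto 5)));
      bit_sel <= src_bit(4 downto 0);  -- keeps the bit index aligned with rd_word
    end if;
  end process;

  -- Stage 3: registered 32:1 bit select
  sel_proc : process(clk)
  begin
    if rising_edge(clk) then
      bit_out <= rd_word(to_integer(bit_sel));
    end if;
  end process;
end architecture;
```

Six instances would share the write-side signals, each with its own LUT contents; concatenating their `bit_out` signals gives the 6-bit constellation index.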


The preceding convolutional encoder might benefit from the same
approach. The outer interleaver certainly would!

Many thanks indeed
Kind regards,
Geoff

by Julian Geoffrey Mortimer (Guest)


Lothar, I must restate my thanks, your trigger has completely changed my 
way of approaching this kind of problem!

Best wishes
Geoff

by Julian M. (Company: Relevant Technologies Ltd) (geoffreym)


Attached files:

The finished OFDM frequency interleaver/modulator, under test.

The purpose is to construct OFDM symbols, modulated and
frequency-interleaved, suitable for IFFT transformation into baseband I
and Q signals. Input data is copied into six memory blocks
simultaneously. In run mode, the carrier index is used to address six
address-mapping BRAMs which, to interleave in the frequency domain,
select arbitrary input bits from which an output word of up to six bits
is constructed. This word then addresses a modulation BRAM, loaded with
the constellation map, which outputs the I and Q values of the desired
modulation in a format compatible with the proprietary Xilinx FFT
block. The buffer memory is written 32 bits wide and read 1 bit wide.
It is split so that input data can be received while interleaved,
modulated output data is transmitted.
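The modulation lookup at the end of the chain can be sketched in the same style (names hypothetical). The six lane bits form the index into a 64 x 32 table loaded by the processor. Placing I (real) in the lower 16 bits and Q (imaginary) in the upper 16 bits is an assumption, chosen to resemble the real-low, imaginary-high packing of the Xilinx FFT input; it should be checked against the core's documentation for the configured data width.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical constellation lookup: 6-bit index -> 32-bit {Q, I} entry.
entity qam_map is
  port (
    clk       : in  std_logic;
    -- table programming (e.g. from AXI-Lite registers)
    cfg_we    : in  std_logic;
    cfg_addr  : in  unsigned(5 downto 0);
    cfg_data  : in  std_logic_vector(31 downto 0);
    -- run mode: one bit from each of the six lanes
    lane_bits : in  std_logic_vector(5 downto 0);
    i_out     : out signed(15 downto 0);
    q_out     : out signed(15 downto 0)
  );
end entity;

architecture rtl of qam_map is
  type map_t is array (0 to 63) of std_logic_vector(31 downto 0);
  signal map_ram : map_t;
  signal entry   : std_logic_vector(31 downto 0);
  attribute ram_style : string;
  attribute ram_style of map_ram : signal is "block";
begin
  process(clk)
  begin
    if rising_edge(clk) then
      if cfg_we = '1' then
        map_ram(to_integer(cfg_addr)) <= cfg_data;
      end if;
      -- one constellation lookup per clock, registered
      entry <= map_ram(to_integer(unsigned(lane_bits)));
    end if;
  end process;

  -- assumed field layout: I in the lower half, Q in the upper half
  i_out <= signed(entry(15 downto 0));
  q_out <= signed(entry(31 downto 16));
end architecture;
```

With a layout that matches the FFT input, `entry` itself could drive the FFT data bus directly.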

The controller implements an administrative mode in which all BRAMs are
writeable (of course) and readable, hence the multiple buses passing
through the controller. In normal operation these are routed directly
from input to output.
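Per BRAM read port, the routing described above reduces to a multiplexer controlled by a mode bit. This sketch assumes a single hypothetical `admin_mode` signal taken from the AXI-Lite registers; in normal mode the run-mode address and enable pass straight through.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical port-B selection for one BRAM: register interface in admin
-- mode (readback), pipeline in run mode. Port A selection is analogous.
entity bram_port_sel is
  generic (ADDR_W : positive := 11);
  port (
    admin_mode : in  std_logic;
    reg_addr   : in  std_logic_vector(ADDR_W-1 downto 0);
    reg_en     : in  std_logic;
    run_addr   : in  std_logic_vector(ADDR_W-1 downto 0);
    run_ce     : in  std_logic;
    addr_b     : out std_logic_vector(ADDR_W-1 downto 0);
    en_b       : out std_logic
  );
end entity;

architecture rtl of bram_port_sel is
begin
  addr_b <= reg_addr when admin_mode = '1' else run_addr;
  en_b   <= reg_en   when admin_mode = '1' else run_ce;
end architecture;
```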

In principle, this can handle pretty much any configuration imaginable,
although I claim no kudos for economy. It has a throughput equal to the
system clock frequency and can be stalled arbitrarily at both ends. The
output logic can be configured for any number of series-connected lookup
BRAMs; in this case, with the primitive output registers enabled, the
(programmable) latency is six cycles.
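One common way to make a fixed-latency chain of BRAM lookups stallable at both ends is a single clock enable shared by every stage, plus a shift register of valid flags as long as the latency. This is a hypothetical sketch (names invented, six-stage default taken from the description above), not the attached design. Note that `s_axis_tready` depends combinationally on `m_axis_tready`, which AXI4-Stream permits but which may call for a register slice if timing is tight.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical valid/stall control for a LATENCY-stage pipeline. Every data
-- register, including BRAM read and output registers, is enabled by ce.
entity pipe_ctrl is
  generic (
    LATENCY : positive := 6            -- number of register stages
  );
  port (
    clk           : in  std_logic;
    rst           : in  std_logic;     -- synchronous, active high
    s_axis_tvalid : in  std_logic;
    s_axis_tready : out std_logic;
    m_axis_tvalid : out std_logic;
    m_axis_tready : in  std_logic;
    ce            : out std_logic      -- clock enable for all data stages
  );
end entity;

architecture rtl of pipe_ctrl is
  signal valid_sr : std_logic_vector(LATENCY-1 downto 0) := (others => '0');
  signal ce_i     : std_logic;
begin
  -- Advance when the output stage is empty or is being accepted downstream.
  ce_i <= m_axis_tready or not valid_sr(LATENCY-1);

  process(clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        valid_sr <= (others => '0');
      elsif ce_i = '1' then
        -- shift the valid flags in step with the data stages
        valid_sr <= valid_sr(LATENCY-2 downto 0) & s_axis_tvalid;
      end if;
    end if;
  end process;

  s_axis_tready <= ce_i;
  m_axis_tvalid <= valid_sr(LATENCY-1);
  ce            <= ce_i;
end architecture;
```

When `m_axis_tready` stays high, `ce` is high every cycle and the chain runs at one result per clock; a stall freezes data and valid flags together, so nothing is lost or duplicated.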

It works very nicely, many thanks! It is part of a much larger project,
most of which is simple. However, I now have to make a Viterbi decoder
capable of sustaining the system throughput.

Best regards
Geoff
