Hi there,

I am a bit of a VHDL newbie, so please excuse this. I am using the following code to attempt to read an array of up to 12288 bits transmitted via AXI4-Stream in 32-bit chunks. I am using the TUSER channel to receive the number of useful bits transmitted in the accompanying DWORD, which can vary according to the convolutional code in use (each DWORD is a convolutionally coded byte, usually containing between 9 and 32 bits, but it can also contain between 1 and 32 padding bits).

I am building a COFDM transmitter, and the block I am working on handles frequency interleaving, the nature of which means I need access to a complete symbol's worth of bits before transmission. The transmit side synthesizes fine, but the code which is causing problems is on the receive side: it takes an age to synthesize, several hours to be more precise. The system is flexible, and configuration information is loaded by the controlling ARM core into AXI-Lite slave registers.

Here's the code that appears to be causing the trouble. If I comment it out, the design synthesizes in good time.

    rx_bits := to_integer(unsigned(axis_in_tuser));
    for i in 1 to 32 loop
      bit_buffer(rx_bit_buf_ptr) <= axis_in_tdata(i-1);
      rx_bit_buf_ptr := rx_bit_buf_ptr + 1;
      if i = rx_bits then
        exit;
      end if;
    end loop;

Am I doing something silly?

bit_buffer has length 2 * 12288 - 1 to facilitate the reception of a symbol during the transmission of another. I am running Vivado 2015.4 on a quad-core Skylake Xeon laptop with 48G of DDR4 and a 1TB SSD, so processing speed should be respectable.

I have attached the complete code for the block. Please note it has yet to be physically debugged. I have dispensed with the files generated for the master and slave AXI4-Stream data ports, handling these in the top level with clock and reset ports shared between all interfaces. Some signals and definitions may be redundant, or provided solely for eventual debugging purposes.

Best regards
Geoff
Julian M. wrote:
> I have attached the complete code for the block.

Thank God that he gave us CTRL-C and CTRL-V!

> bit_buffer has length 2 * 12288 - 1 to facilitate the reception of a
> symbol during the transmission of another.

Looking at the whole design, I can give only one hint: pack your big register set into a DPRAM. At the moment you are generating logic to access each of the bits through some huge, vast multiplexers.
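As a rough illustration of that hint, a simple dual-port RAM can be described in a way that synthesis tools such as Vivado can usually map onto block RAM instead of individual registers. This is only a sketch: the entity name sdp_ram, the generics and all port names are made up for illustration and are not taken from the attached design.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical simple dual-port RAM: one write port, one read port,
-- both on the same clock. Names and sizes are illustrative only.
entity sdp_ram is
  generic (
    ADDR_WIDTH : natural := 10;  -- 1024 words, enough for 2 x 12288 bits
    DATA_WIDTH : natural := 32
  );
  port (
    clk   : in  std_logic;
    we    : in  std_logic;
    waddr : in  std_logic_vector(ADDR_WIDTH-1 downto 0);
    wdata : in  std_logic_vector(DATA_WIDTH-1 downto 0);
    raddr : in  std_logic_vector(ADDR_WIDTH-1 downto 0);
    rdata : out std_logic_vector(DATA_WIDTH-1 downto 0)
  );
end entity;

architecture rtl of sdp_ram is
  type ram_t is array (0 to 2**ADDR_WIDTH-1)
    of std_logic_vector(DATA_WIDTH-1 downto 0);
  signal ram : ram_t := (others => (others => '0'));
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- Synchronous write of one whole word per clock
      if we = '1' then
        ram(to_integer(unsigned(waddr))) <= wdata;
      end if;
      -- Synchronous (registered) read; one clock of latency
      rdata <= ram(to_integer(unsigned(raddr)));
    end if;
  end process;
end architecture;
```

The point of the sketch is that only one word is written and one word is read per clock, so no multiplexer across the whole buffer is needed. A variable number of valid bits per DWORD would still have to be packed into whole words before writing.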
Thanks, Lothar,

I know what you mean. I have been truly dumb! I come from a SW background; the mindset is different! Last time I designed logic it was cute little symbols on bits of graph paper. I guess a sheet of graph paper capable of covering the greater part of the Pacific Ocean would be needed to implement what I was attempting!

Can you give me any hint how to use DPRAM? Just a hint would do...

Writing this down, I'm pretty sure I understand how to do it! I can use address translation to achieve any arbitrary scrambling function for the interleaving. Since there is an obvious onto mapping between input and output bits, this could be precomputed and also stored in block memory.

The basic requirement is a throughput equal to the clock frequency, so it would need to be possible to access a word every clock, changing only the address. The maximum number of bits encoded in a single carrier is 6 (64-QAM). So: six memory blocks, each with a write interface (port A) of 384 x 32 bits and a read interface (port B) of 12288 x 1 bit. I write the same (packed) input data into all six, and read one bit from each to form the six data bits. This allows for an arbitrary interleaving function, since the six blocks are addressed individually by the output of a further block RAM configured as a lookup table, addressed sequentially by the logic, and programmed with the interleaving function.

These bits are not used directly, but instead address another block RAM, also configured as a lookup table, 64 x 32, loaded with the real and imaginary parts of the modulation constellation map.

I am warming to this concept! The design boils down to little more than a memory controller. The preceding convolutional encoder might benefit from the same approach. The outer interleaver certainly!

Many thanks indeed

Kind regards,
Geoff
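To make the "write x32, read x1" idea concrete, here is a minimal sketch of one of the six buffer blocks. It is an assumption about how such a block could be written, not the code from the actual design: instead of relying on a true asymmetric-port BRAM, it reads a whole 32-bit word and picks one bit in a second pipeline stage, which behaves the same from the outside apart from latency. The entity name bit_select_ram and all port names are hypothetical, and the depth is rounded up from 384 to 512 words so that every 9-bit address is in range.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical buffer block: 32-bit write port, 1-bit read port.
entity bit_select_ram is
  port (
    clk      : in  std_logic;
    we       : in  std_logic;
    waddr    : in  unsigned(8 downto 0);    -- word address (0 to 383 used)
    wdata    : in  std_logic_vector(31 downto 0);
    bit_addr : in  unsigned(13 downto 0);   -- bit address (0 to 12287 used)
    bit_out  : out std_logic                -- valid two clocks after bit_addr
  );
end entity;

architecture rtl of bit_select_ram is
  -- Depth rounded up to 512 words (assumption, keeps all addresses legal)
  type ram_t is array (0 to 511) of std_logic_vector(31 downto 0);
  signal ram     : ram_t := (others => (others => '0'));
  signal rword   : std_logic_vector(31 downto 0);
  signal bit_sel : unsigned(4 downto 0);
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        ram(to_integer(waddr)) <= wdata;
      end if;
      -- Stage 1: fetch the word holding the requested bit and
      -- remember which of its 32 bits was asked for
      rword   <= ram(to_integer(bit_addr(13 downto 5)));
      bit_sel <= bit_addr(4 downto 0);
      -- Stage 2: select that bit from the word fetched one clock earlier
      bit_out <= rword(to_integer(bit_sel));
    end if;
  end process;
end architecture;
```

Six instances of something like this, each fed with the same write data but with its own bit_addr coming from an interleaving lookup table, would give the six data bits per carrier described above. A new bit address can be presented every clock, so the throughput requirement is met; only the latency grows.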
Lothar,

I must restate my thanks. Your hint has completely changed my way of approaching this kind of problem!

Best wishes
Geoff
The finished OFDM frequency interleaver/modulator, under test.

The purpose is to construct OFDM symbols, modulated and frequency-interleaved, suitable for iFFT transformation into baseband I and Q signals.

Input data is copied into six memory blocks simultaneously. In run mode, the carrier index is used to address six address-mapping BRAMs which, for the purpose of interleaving in the frequency domain, select arbitrary input bits from which an output word of up to six bits is constructed. This word is then used to address a modulation BRAM loaded with the constellation map, which outputs the I and Q values of the desired modulation function in a format compatible with the proprietary Xilinx FFT block.

The buffer memory is written 32 bits wide and read 1 bit wide. It is split in two so that input data can be received while interleaved, modulated output data is transmitted.

The controller implements an administrative mode in which all BRAMs are writeable (of course) and readable, hence the multiple buses passing through the controller. In normal operation these are routed directly from input to output.

In principle, this can handle pretty much any configuration imaginable, although I claim no kudos for economy. It has a throughput equal to the system clock frequency, and is arbitrarily stallable at both ends. The output logic can be configured to accept an arbitrary chain of series-connected lookup BRAMs; here, with the primitive output registers enabled, the (programmable) latency is six cycles.

It works very nicely, many thanks!

It is part of a much larger project, most of which is simple. However, I now have to make a Viterbi decoder capable of sustaining system throughput.

Best regards
Geoff
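For anyone reading later, the modulation lookup stage could look roughly like the sketch below. It is not the code from the finished block. The entity name constellation_lut, the load port, and in particular the split of the 32-bit word into I in bits 15..0 and Q in bits 31..16 are assumptions made for illustration; the actual packing has to match whatever the FFT core is configured to expect.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical 64 x 32 constellation lookup table.
entity constellation_lut is
  port (
    clk       : in  std_logic;
    -- Load port (assumed), e.g. written by the controller in admin mode
    load_we   : in  std_logic;
    load_addr : in  unsigned(5 downto 0);
    load_data : in  std_logic_vector(31 downto 0);
    -- Run-time lookup: up to six interleaved bits select one point
    symbol    : in  unsigned(5 downto 0);
    i_out     : out signed(15 downto 0);
    q_out     : out signed(15 downto 0)
  );
end entity;

architecture rtl of constellation_lut is
  type lut_t is array (0 to 63) of std_logic_vector(31 downto 0);
  signal lut  : lut_t := (others => (others => '0'));
  signal word : std_logic_vector(31 downto 0);
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if load_we = '1' then
        lut(to_integer(load_addr)) <= load_data;
      end if;
      -- Registered read: one constellation point per clock
      word <= lut(to_integer(symbol));
    end if;
  end process;

  -- Assumed packing: I in the low half, Q in the high half
  i_out <= signed(word(15 downto 0));
  q_out <= signed(word(31 downto 16));
end architecture;
```

For modulations with fewer than six bits per carrier, the unused upper bits of symbol would be held at zero and only the lower part of the table loaded. Whether a table this small ends up in block RAM or distributed RAM depends on the tool and its settings.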