EmbDev.net

Forum: FPGA, VHDL & Verilog PCI Express data grabber


by Christoph K. (cklein)


Hi everybody!

I have the task of developing a kind of data recorder for my final-year
project. Due to the expected high data rates (~300 MByte per sec) I plan
to build a PCIe card (with an Altera FPGA) which samples the analog
input signals and transmits the data DIRECTLY to a RAID controller.
PCIe is capable of doing point-to-point transfers, but I found no
information on whether it is possible to access the RAID directly
(without the host CPU). Altera provides several reference designs for
PCIe, but they only transfer data between the PCIe card's memory and the
system memory (RAM).
I would be very glad if somebody could shed some light on this.


kind regards
Chris

by Andreas (Guest)


Hi Chris,

as far as I know, there are no OS-specific implementations for
transferring data between an I/O card (your grabber) and RAID (storage)
systems.

So the way a RAID controller is built is generic (address mapping of
data FIFO, DMA and interrupt registers).
There was a standard for this in the past (I2O), but as far as I know it
has faded away.

You could get the datasheets (mostly under NDA) of a specific
RAID controller, detach it from OS control and bring in your own
interrupt service routine for the RAID system (the host CPU might have
to handle it, since you will not get the interrupt information on your
PCIe card), and so build up a very proprietary system, or

use standard components with your minimum requirements:

Nowadays a powerful PC system should be able to stream almost 1 GB/s
from the I/O card to main memory and back to the storage system over
PCIe, if both cards are at least PCIe x4 and the mainboard is capable of
running two x4 cards (channels) at the same time.

The only thing is that you have to be aware of large latencies (so you
must be able to buffer a few (tens of) milliseconds on your PCIe card),
and you have to deal with scatter/gather in order to use and lock/free
normal pooled memory to stream your data in the most effective way.
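
To put a rough number on that buffering requirement, here is a minimal Python sketch. The 300 MB/s rate comes from the first post; the stall durations and the 2x safety margin are assumptions chosen for illustration, not measured values.

```python
def buffer_bytes(rate_mb_per_s: float, stall_ms: float, safety: float = 2.0) -> int:
    """Bytes of on-card buffer needed to ride out a host-side stall."""
    bytes_per_ms = rate_mb_per_s * 1_000_000 / 1000
    return int(bytes_per_ms * stall_ms * safety)

# Assumed stall lengths; "a few (tens of) milliseconds" per the post above.
for stall_ms in (5, 20, 50):
    need = buffer_bytes(300, stall_ms)
    print(f"{stall_ms:3d} ms stall -> {need / 1_000_000:.0f} MB buffer (2x margin)")
```

Even with a generous margin the result stays in the tens of megabytes, which is small compared with the DDR memory found on typical FPGA development boards.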

regards

Andreas

by Christoph K. (cklein)


thanks for the reply, Andreas

I was afraid that it wouldn't be easy (just writing to some memory
location...).
The reason why I wanted to transmit the data directly is that I
didn't want to write software (driver + app.) too.
But it seems I have no choice, and will have to go with the
data->memory->disk approach.

What exactly do you mean by "...deal with scatter/gather..."?
A DMA can access only physical addresses, but not virtual addresses, and 
thus some sort of address translation has to be done. Is that what you 
mean?

Are there any other possibilities to solve this problem?


kind regards
Chris

by Andreas (Guest)


Hi Chris,

So for writing to some addresses inside the PC, you have to be a PCI(e)
bus master, which is somewhat more complicated than being a slave device
only. But if you want to handle data at tens to hundreds of MB/s,
there is no way around it...

Scatter/gather means you have to deal with a map of more or less small
memory regions instead of one linear memory map.
Your driver must also translate between linear (virtual) and physical
addresses, while your PCIe device deals with physical addresses only...
Every OS provides functions for drivers to translate
physical<->virtual.
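
As an illustration of what the driver ends up handing to the card, here is a small Python simulation of building a scatter/gather list. It is only a sketch: the `page_table` dict and its frame numbers are made up, and a real driver would use the OS's DMA mapping functions rather than a lookup table.

```python
PAGE_SIZE = 4096  # 4 kB pages

def build_sg_list(virt_addr, length, page_table):
    """Split a virtually contiguous buffer into (physical_address, length) entries.

    page_table is a hypothetical dict: virtual page number -> physical frame number.
    Chunks that happen to be physically adjacent are merged into one entry.
    """
    sg = []
    end = virt_addr + length
    while virt_addr < end:
        offset = virt_addr % PAGE_SIZE
        chunk = min(PAGE_SIZE - offset, end - virt_addr)
        phys = page_table[virt_addr // PAGE_SIZE] * PAGE_SIZE + offset
        if sg and sg[-1][0] + sg[-1][1] == phys:
            sg[-1] = (sg[-1][0], sg[-1][1] + chunk)   # extend previous entry
        else:
            sg.append((phys, chunk))                  # start a new entry
        virt_addr += chunk
    return sg

# Made-up mapping: virtual pages 16..19 land on scattered physical frames.
page_table = {16: 700, 17: 701, 18: 42, 19: 900}
for phys, size in build_sg_list(16 * PAGE_SIZE + 256, 3 * PAGE_SIZE, page_table):
    print(f"DMA descriptor: phys=0x{phys:08x} len={size}")
```

Each printed entry corresponds to one DMA descriptor the FPGA would have to process; the buffer looks linear to the application but is scattered in physical memory.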

Some more background:
One way to get high data throughput in Windows, for example, is to
understand the concept of DirectShow, which is stream-oriented:
You build up a graph: your device -> file writer. Then
you tell it that you would like to transfer data. So you ask for buffers
(as many as you need and the OS can deliver). You will then get a
memory list (down to 4 kB blocks physically). After some buffers are
filled, your driver tells the OS so, hands these buffers back to the
system and asks for more.
Then the OS writes the data to the file and frees the buffers.
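
The buffer hand-off described above can be sketched in plain Python. This is not DirectShow code; it only models the cycle of requesting empty buffers, filling them, handing them over and getting them back. The pool size, buffer size and function names are assumptions for the example.

```python
import io
import queue
import threading

BUF_SIZE = 4096   # assumed buffer size
NUM_BUFS = 4      # assumed pool size

def capture(free_q, filled_q, samples):
    """Stand-in for the driver: fill free buffers and hand them over."""
    for start in range(0, len(samples), BUF_SIZE):
        buf = free_q.get()                # ask for an empty buffer (blocks if none)
        chunk = samples[start:start + BUF_SIZE]
        buf[:len(chunk)] = chunk
        filled_q.put((buf, len(chunk)))   # report "this buffer is filled"
    filled_q.put(None)                    # end-of-stream marker

def writer(free_q, filled_q, out):
    """Stand-in for the file writer: drain filled buffers, then free them."""
    while (item := filled_q.get()) is not None:
        buf, used = item
        out.write(buf[:used])
        free_q.put(buf)                   # buffer goes back to the pool

free_q, filled_q = queue.Queue(), queue.Queue()
for _ in range(NUM_BUFS):
    free_q.put(bytearray(BUF_SIZE))

data = bytes(range(256)) * 100            # fake sample data
out = io.BytesIO()
t = threading.Thread(target=writer, args=(free_q, filled_q, out))
t.start()
capture(free_q, filled_q, data)
t.join()
print(len(out.getvalue()), "bytes written")
```

If the writer stalls, the capture side blocks once the pool is empty, which is exactly the situation the on-card buffer has to cover.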

For Linux there are (possibly) some differences, but after all this
might be the most powerful model overall for maximum performance, since
the OS can optimize hard disk access by reordering write accesses to
the hardware, and can deal with huge latencies and short (or longer)
concurrent hard disk accesses...

regards

Andreas

by Christoph K. (cklein)


Andreas wrote:
> Hi Chris,
>
> So for writing to some addresses inside the PC, you have to be a PCI(e)
> bus master, which is somewhat more complicated than being a slave device
> only. But if you want to handle data at tens to hundreds of MB/s,
> there is no way around it...

I thought PCIe isn't a shared bus like PCI, and there is no bus master
anymore (only switches)? Are you saying it isn't possible for a PCIe
endpoint to achieve throughputs of 300 MB/s (with >= 2 lanes)? What
about those high-performance graphics cards? Are they bus masters too?


regards
Chris

by Thomas R. (Company: abaxor engineering) (abaxor)


Hi Chris,

why do you add additional battlefields (like PCIe, operating system,
drivers) to your project? Look for an appropriate RAID system and
control it from your FPGA directly. I would expect there are RAID
systems that understand the ATA command set. ATA can be handled in an
FPGA easily. Furthermore, your FPGA may have transceivers for S-ATA.
Exploit the parallelism your FPGA offers and avoid bottlenecks like
processors.

Bye Tom

by Andreas (Guest)


Hi Chris,

PCIe isn't a shared bus, you are right.
But from the point of view of your FPGA you are still initiating a
bus access. So from the FPGA's side there is not really a change of
architecture.
Indeed, every endpoint can be an initiator. You might be mixing some
things up.
The root complex and switches are responsible for holding things
together.
They have some intelligence to forward bus accesses from one side to
another and to store some portions of data to avoid single-word
accesses.

There are several reasons why I'm not talking about 2 lanes. The most
important is that supporting an x2 link is not mandatory for PCIe Gen1.
So in reality there are many systems which will fall back to x1
operation.
The second issue is that on x2 PCIe Gen1 you get an absolute maximum
theoretical throughput of 500 MB/s (2 x 2.5 Gbit/s, less the 8b/10b bit
coding on the lane), and that only with infinite burst length. You will
see much less, since buffers in switches are rather small... Maybe you
could tune your specific system to achieve >300 MB/s. On unknown systems
you might run into trouble guaranteeing such throughput on two lanes.
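
For reference, the line-rate arithmetic for Gen1 and Gen2 links can be sketched as follows. It assumes 2.5 and 5 Gbit/s per lane with 8b/10b coding and deliberately ignores packet headers and flow control, so real transfers land noticeably lower than these ceilings.

```python
def pcie_ceiling_mb_s(lanes: int, gen: int) -> float:
    """Per-direction ceiling in MB/s after 8b/10b coding, before packet overhead."""
    line_rate_gbit = {1: 2.5, 2: 5.0}[gen]   # Gbit/s per lane (Gen1, Gen2)
    data_gbit = line_rate_gbit * 8 / 10      # 8b/10b: 8 data bits per 10 line bits
    return lanes * data_gbit * 1000 / 8      # Gbit/s -> MB/s

for gen, lanes in [(1, 1), (1, 2), (1, 4), (2, 8)]:
    print(f"Gen{gen} x{lanes}: {pcie_ceiling_mb_s(lanes, gen):.0f} MB/s")
```

Comparing these ceilings with the 300 MB/s target shows how little headroom a narrow Gen1 link leaves once protocol overhead is taken into account.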

Graphics cards, for example, achieve their throughput through their
initiator capability and by using up to 16 lanes. It is also very common
for graphics cards to use PCIe Gen2 with 5 Gbit/s per lane.

@Thomas
If there are ATA-compatible RAIDs out there, he might have an easier
solution for writing directly to the RAID device. But reading back from
that device must also be handled...
As long as the RAID is plugged into the computer's PCIe slot, it might be
easier to write a driver (your FPGA vendor will provide a usable
WinDriver example driver) than to write a file system (might be as easy
as sector = sector+1), a proprietary driver, an ATA initiator inside
the FPGA (or using a CPU inside the FPGA) and a proprietary GUI?!? When
developing with Windows, he might get the job done by providing a
standard capture device; the rest is done by the OS and free tools
(filter graph).
Providing PCIe without a PC mainboard is not easy (you can't use the
vendor-provided hard IP, since the hard IP doesn't play root complex).
Also, providing SATA directly from the FPGA isn't very easy, since there is
absolutely no free IP out there, and playing with 2.5 Gbit/s signals in the fog
could end in frustration...

regards

Andreas

by Thomas R. (Company: abaxor engineering) (abaxor)


> If there are ATA-compatible RAIDs out there, he might have an easier
> solution for writing directly to the RAID device. But reading back from
> that device must also be handled...
Yes, attach the RAID to a PC directly. dd is your friend. You know the 
structure of your data.

> As long as the RAID is plugged into the computer's PCIe slot, it might be
> easier to write a driver (your FPGA vendor will provide a usable
> WinDriver example driver) than to write a file system
You don't need a file system. Each time you have data to record, start 
at sector 0 and write. Some years ago I developed a similar system, 
which recorded data on 16 CF-Cards in the same way.
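
A minimal Python sketch of this "start at sector 0 and write" scheme follows. The 512-byte sector size is an assumption, and a scratch file stands in for the raw device so the example can run anywhere; on a real system the path would be the block device itself.

```python
import os
import tempfile

SECTOR = 512  # bytes per sector (assumed)

def record(device_path: str, blocks) -> int:
    """Write blocks back to back starting at sector 0; return sectors used."""
    sector = 0
    with open(device_path, "r+b") as dev:
        for block in blocks:
            padded = block + bytes(-len(block) % SECTOR)  # pad to a sector boundary
            dev.seek(sector * SECTOR)
            dev.write(padded)
            sector += len(padded) // SECTOR
    return sector

# A scratch file stands in for the raw device in this example.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(bytes(8 * SECTOR))
    path = f.name

used = record(path, [b"A" * 700, b"B" * 512])
with open(path, "rb") as dev:
    first = dev.read(SECTOR * used)   # reading back is a plain sequential read
os.remove(path)
print(used, "sectors written")
```

Because the layout is just consecutive sectors, reading the recording back on a PC needs nothing more than a sequential copy, which is the job dd does.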

> Providing PCIe without a PC mainboard is not easy (you can't use the
> vendor-provided hard IP, since the hard IP doesn't play root complex).
If a direct connection to the RAID is used, PCIe isn't necessary. But
S-ATA is, you are right.

> Also, providing SATA directly from the FPGA isn't very easy, since there is
> absolutely no free IP out there, and playing with 2.5 Gbit/s signals in the fog
> could end in frustration...

Is there any free PCIe IP out there? I don't know.

Bye Tom

by Christoph K. (cklein)


I would really like to go with an "FPGA only" solution, but 
unfortunately I haven't found any RAID controller supporting the ATA 
command set yet.
And like Andreas said, SATA(II) IP cores don't come for free and writing
one myself is too complex.

Another reason why I prefer the PCIe solution is that the recorded data
has to be analyzed with MATLAB, for example. If the data grabber resides
in the computer where the data will be analyzed, I don't have to copy
the huge amount of data or remove the media from the data grabber...

I took a closer look at Altera's development kits and found out that the 
"Stratix IV GX" dev. kit already supports PCIe 2.0 with up to eight 
lanes. So the required throughput should be easily achievable?


regards
Chris

by Christoph K. (cklein)


Thomas Reinemann wrote:
> Yes, attach the RAID to a PC directly. dd is your friend. You know the
> structure of your data.
what exactly do you mean by "directly"? Can I attach a RAID controller 
indirectly too?

> You don't need a file system. Each time you have data to record, start
> at sector 0 and write. Some years ago I developed a similar system,
> which recorded data on 16 CF-Cards in the same way.
how have the CompactFlash cards been connected? In parallel, 
multiplexed...? What throughput did you achieve?

> Is there any free PCIe IP outside? I don't know.
depends on the FPGA you're using. If the FPGA provides only the
high-speed transceivers, then the IP usually isn't free, but if the FPGA
contains a PCIe hard macro, then the IP is free (at least Altera's)


regards
Chris

by Thomas R. (ruschi)


AFAIK for Altera only the Stratix and Arria GX family have PCIe 
hard-cores included (which are quite costly) - however the new Xilinx 
Spartan-6 has PCIe (at least I believe to have seen a commercial).
Keep in mind - just having a digital IP core is by far not enough, the
physical layer of PCIe is a non-trivial part!
Correct timing and thus signal path length is crucial.
I think it is a Master's/Diplom thesis on its own to design a PCIe card....

by Thomas R. (Company: abaxor engineering) (abaxor)


Christoph Klein wrote:
> Thomas Reinemann wrote:
>> Yes, attach the RAID to a PC directly. dd is your friend. You know the
>> structure of your data.
> what exactly do you mean by "directly"? Can I attach a RAID controller
> indirectly too?
Yes via PCIe, the CPU and OS, as you suggested.


>> You don't need a file system. Each time you have data to record, start
>> at sector 0 and write. Some years ago I developed a similar system,
>> which recorded data on 16 CF-Cards in the same way.
> how have the CompactFlash cards been connected? In parallel,
> multiplexed...?
Parallel. We had 4 boards with 5 FPGAs each (Spartan-3): one master and 4 slaves.
Each slave received the data from a pre-processing card and wrote it to 
its CF card.

> What throughput did you achieve?
I don't know the exact value, but some tens of MB/s. The FPGA was faster
than the CF card. But we ran into trouble, because the CF cards took a
nap every minute for about 40 ms.


Bye Tom

by Thomas R. (Company: abaxor engineering) (abaxor)


Thomas Ruschival wrote:
> AFAIK for Altera only the Stratix and Arria GX family have PCIe
> hard-cores included (which are quite costly) - however the new Xilinx
> Spartan-6 has PCIe (at least I believe to have seen a commercial).
> Keep in mind - just having a digital IP core is by far not enough, the
> physical layer of PCIe is a non-trivial part!
> Correct timing and thus signal path length is crucial.
> I think it is a Master's/Diplom thesis on its own to design a PCIe card....

I completely agree. This project needs three experienced engineers, a 
board designer, an FPGA Designer and a software designer, if the PCIe, 
CPU, OS approach is followed.

Or a lot of time:-).

To Chris:

> I took a closer look at Altera's development kits and found out that the
> "Stratix IV GX" dev. kit already supports PCIe 2.0 with up to eight
> lanes. So the required throughput should be easily achievable?
And you really believe you are able to design a PCIe 2.0 board? If so,
you are very naive.

Where will you buy the PCIe 2.0 PC?

Bye Tom

by Christoph K. (cklein)


Hi Thomas,

Of course I'm not going to design the board myself (guess you can't do 
that with EAGLE ;-)). I would like to use Altera's Stratix IV GX dev. 
kit which contains 2 PCIe Gen.2 hard macros (at least according to the 
reference guide on page 1-2 available at 
http://www.altera.com/literature/manual/rm_sivgx_fpga_dev_board.pdf).
A quick search for MOBOs with PCIe Gen2 slots returned for example the 
"P6T7 WS SuperComputer" board from Asus. This one may not be cheap 
(~350€), but I bet there are more.
Of course I would prefer an easier solution, but it seems there is no
other (and I don't know how to connect the RAID 'directly').


kind regards
Chris

by Andreas (Guest)


@Chris,

if your board supports 8 lanes and is able to run at a minimum of 4
lanes in your setup, then you really don't need a PCIe Gen2 mainboard.
Your FPGA board will fall back to PCIe Gen1, which is good for somewhat
beyond 1 GB/s. Should be OK so far.

If you start with such an eval board, you don't have to start from
scratch. The board will be delivered with sample code (FPGA), a driver
plus source for some read/write example and hopefully a DMA demo.

By the way, it is possible to design a PCIe board with Eagle. It is not
recommended, since the software will not support you with an HF
design rule check, but you could calculate the parameters by hand and
check them yourself.
High-speed designs were routed with rub-on symbols long before the
first layout software was seen in the wild.

regards

Andreas

by Christoph K. (cklein)


Hi Andreas,

If PCIe Gen.1 is sufficient for this task I could use the "Arria II GX" 
dev. Kit 
(http://www.altera.com/products/devkits/altera/kit-aiigx-pcie.html) 
which also includes a hard macro, but is much 'cheaper' than the 
"Stratix IV GX" dev. kit. On the other hand, the Arria II GX provides
far fewer logic elements etc. than the Stratix IV. The Stratix IV GX
dev. kit, in turn, has no 1 GB DDR2 SODIMM like the Arria II GX
dev. kit (for buffering the data), 'only' 512 MB of DDR3 (onboard).
You're right, I needn't start from scratch when using one of these dev.
kits. Altera provides the "PCI Express High-Performance Reference 
Design" 
(http://www.altera.com/support/refdesigns/ip/interface/ref-pciexpress-hp.html) 
which seems to be exactly what I need.


kind regards
Chris
