Hi everybody!
I have the task of developing kind of a data recorder for my final year
project. Due to the expected high data rates (~300MByte per sec) I plan
to build a PCIe card (with an Altera FPGA) which samples the analog
input signals and transmits the data DIRECTLY to a RAID controller.
PCIe is capable of doing point-to-point transfers, but I found no
information on whether it is possible to access the RAID directly (without the
host CPU). Altera provides several reference designs for PCIe, but they
only transfer data between the PCIe card's memory and the system memory
(RAM).
I would be very glad if somebody could shed some light on this.
kind regards
Chris
Hi Chris,
as far as I know, there are no OS-specific implementations for transferring
data between an I/O card (your grabber) and RAID (storage) systems.
So the way a RAID controller is built is generic (address mapping of the
data FIFO, DMA and interrupt registers).
There was a standard for this in the past (I2O), but as far as I know it
has faded away.
You could get the datasheets (mostly under NDA) of a specific RAID
controller, detach it from OS control and bring in your own
interrupt service routine for the RAID system (the host CPU might have
to handle it, since you will not get the interrupt information on your
PCIe card) and build up a very proprietary system, or
use standard components that meet your minimum requirements:
A powerful present-day PC system should be able to stream almost 1 GB/s
from the I/O card to main memory and back to the storage system over PCIe,
if both cards sit in at least PCIe x4 slots and the mainboard is capable
of running two x4 cards (channels) at the same time.
The only thing is that you have to be aware of big latencies (so you
must be able to buffer a few, or a few tens of, milliseconds on your
PCIe card), and you have to deal with scatter/gather in order to use and
lock/free normal pooled memory to stream your data in the most effective
way.
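To put a number on the buffering: the length of the stall the card has
to ride out, times the data rate, gives the memory needed on the card.
A rough Python sketch; the function name, the stall lengths and the
safety factor of 2 are assumptions for illustration, only the
~300 MByte/s comes from the original question:

```python
def buffer_bytes(rate_bytes_per_s: int, stall_ms: int, safety_factor: int = 2) -> int:
    """Bytes of on-card buffer needed to ride out a host stall of stall_ms."""
    # Integer arithmetic keeps the result exact.
    return rate_bytes_per_s * stall_ms * safety_factor // 1000


RATE = 300_000_000  # ~300 MByte/s, the rate named in the question

# Assumed stall lengths; real values have to be measured on the target PC.
for stall_ms in (10, 50, 100):
    size = buffer_bytes(RATE, stall_ms)
    print(f"{stall_ms:4d} ms stall -> {size / 1e6:.0f} MB buffer")
```

So even with generous margins this is in the range of tens of MB, which
a DDR memory on the card covers easily.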
regards
Andreas
thanks for the reply, Andreas
I was afraid that it wouldn't be easy (just writing to some memory
location...).
The reason why I wanted to transmit the data directly is that I
didn't want to write software (driver + app.) too.
But it seems I have no choice and will have to go with the
data->memory->disk approach.
What exactly do you mean by "...deal with scatter/gather..."?
A DMA can access only physical addresses, but not virtual addresses, and
thus some sort of address translation has to be done. Is that what you
mean?
Are there any other possibilities to solve this problem?
kind regards
Chris
Hi Chris,
So for writing to addresses inside the PC, you have to be a PCI(e)
bus master, which is somewhat more complicated than being a slave device
only. But if you want to handle data at tens to hundreds of MB/s,
there is no way around it...
Scatter/gather means you have to deal with a map of more or less small
memory regions instead of one linear memory map.
You (or rather your driver) must also translate between linear (virtual)
and physical addresses, while your PCIe device deals with physical
addresses only...
Every OS provides functions for drivers to translate physical<->virtual
addresses.
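To illustrate what such a scatter/gather list looks like, here is a
small Python model. It is only a sketch: the page table is a made-up
dictionary (in a real driver the OS supplies the mapping), and
build_sg_list is a hypothetical helper, not an OS function. It splits
a buffer that is contiguous in virtual memory into (physical address,
length) chunks and merges chunks that happen to be physically adjacent:

```python
PAGE_SIZE = 4096  # typical page size on x86


def build_sg_list(virt_addr, length, page_table):
    """Split a virtually contiguous buffer into physical (address, length) chunks."""
    sg = []
    while length > 0:
        page, offset = divmod(virt_addr, PAGE_SIZE)
        chunk = min(PAGE_SIZE - offset, length)       # stay inside this page
        phys = page_table[page] * PAGE_SIZE + offset  # translate virtual -> physical
        if sg and sg[-1][0] + sg[-1][1] == phys:
            # Physically adjacent to the previous chunk: extend it.
            sg[-1] = (sg[-1][0], sg[-1][1] + chunk)
        else:
            sg.append((phys, chunk))
        virt_addr += chunk
        length -= chunk
    return sg


# Made-up mapping: virtual pages 16..18 live in physical frames 7, 8 and 3.
page_table = {16: 7, 17: 8, 18: 3}
sg_list = build_sg_list(16 * PAGE_SIZE + 256, 3 * PAGE_SIZE - 256, page_table)
for phys, size in sg_list:
    print(f"DMA descriptor: phys=0x{phys:08x} len={size}")
```

Here the first two pages are physically adjacent and collapse into one
descriptor, so the device gets two entries to work through instead of
one linear block.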
Some more background:
One way to get high data throughput in Windows, for example, is to
understand the concept of DirectShow, which is stream-oriented:
You build up a graph: your device -> file writer. Then
you tell the system that you want to transfer data. So you ask for
buffers (as many as you need and the OS can deliver). You then get a
memory list (down to physical 4 kB blocks). After some buffers are
filled, your driver tells the OS so, you give these buffers back to the
system and ask for some more.
Then the OS writes the data to the file and frees the buffers.
For Linux there are some differences (possibly), but all in all this
might be the most powerful model for getting maximum
performance, since the OS can optimize hard disk access by reordering
write accesses to the hardware and can deal with huge latencies and
short (or longer) concurrent hard disk accesses...
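The buffer hand-over described above can be modelled in a few lines of
Python. This is not the DirectShow API, just a sketch of the idea with
made-up names (stream, BUF_SIZE, NUM_BUFS): a fixed pool of buffers
circulates between a "device" side that fills them and a writer thread
that plays the OS, drains them to storage and returns them to the pool.

```python
import io
import queue
import threading

BUF_SIZE = 4096  # assumed buffer size, one page
NUM_BUFS = 4     # assumed pool size


def stream(chunks, sink):
    """Move chunks (each at most BUF_SIZE bytes) to sink through a buffer pool."""
    free = queue.Queue()
    filled = queue.Queue()
    for _ in range(NUM_BUFS):
        free.put(bytearray(BUF_SIZE))

    def writer():
        while True:
            item = filled.get()
            if item is None:      # end-of-stream marker
                return
            buf, n = item
            sink.write(buf[:n])   # the "OS" writes the data out
            free.put(buf)         # ... and frees the buffer for reuse

    t = threading.Thread(target=writer)
    t.start()
    for chunk in chunks:          # the "device" side
        buf = free.get()          # blocks while all buffers are in flight
        buf[:len(chunk)] = chunk
        filled.put((buf, len(chunk)))
    filled.put(None)
    t.join()


sink = io.BytesIO()
data = [bytes([i]) * BUF_SIZE for i in range(10)]
stream(data, sink)
print(len(sink.getvalue()), "bytes written")
```

The point of the model: when the writer stalls, the producer only
blocks once all NUM_BUFS buffers are in flight, so the pool depth is
what absorbs the disk latency.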
regards
Andreas
Andreas wrote:
> Hi Chris,
>
> So for writing to addresses inside the PC, you have to be a PCI(e)
> bus master, which is somewhat more complicated than being a slave device
> only. But if you want to handle data at tens to hundreds of MB/s,
> there is no way around it...
I thought PCIe isn't a shared bus like PCI, and there is no bus master
anymore (only switches)? Are you saying it isn't possible for a PCIe
endpoint to achieve a throughput of 300 MB/s (with >= 2 lanes)? What about
those high-performance graphics cards? Are they bus masters too?
regards
Chris
Hi Chris,
why do you add additional battlefields (like PCIe, operating system,
drivers) to your project? Look for an appropriate RAID system and
control it from your FPGA directly. I would expect, there are RAID
systems understanding the ATA command set. ATA can be handled in an FPGA
easily. Furthermore, your FPGA may have transceivers for S-ATA.
Exploit the parallelism your FPGA offers and avoid bottlenecks like
processors.
Bye Tom
Hi Chris,
PCIe isn't a shared bus, you are right.
But from the point of view of your FPGA you are still initiating a
bus access. So from the FPGA's point of view the architecture hasn't
really changed.
Indeed, every endpoint can be an initiator. You might be mixing some
things up.
The root complex and the switches are responsible for holding things
together.
They have some intelligence to forward bus accesses from one side to
another and to store some portions of data to avoid single-word
accesses.
There are several reasons why I'm not talking about 2 lanes. The most
important is that supporting an x2 link width is not mandatory in PCIe Gen1.
So in reality there are many systems which will fall back to x1 operation.
The second issue is that the absolute maximum theoretical throughput
of x2 PCIe Gen1 is 500 MB/s (2 x 2.5 Gbit/s, minus 20% for the 8b/10b
bit coding on the lane), and that only with infinite burst length. You
will see much less, since buffers in switches are rather small... Maybe
you could tune your specific system to achieve >300 MB/s. For unknown
systems you might run into trouble guaranteeing such throughput on two
lanes.
Graphics cards, for example, achieve their throughput through their
initiator capability and by using up to 16 lanes. It is also very common
for graphics cards to use PCIe Gen2 with 5 Gbit/s per lane.
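The lane arithmetic can be written down in a few lines of Python. The
function name is made up; the line rates (2.5 Gbit/s for Gen1,
5 Gbit/s for Gen2) and the 8b/10b coding are the standard figures. The
result is an upper bound per direction only, since packet headers,
flow control and small bursts are not modelled and real throughput is
lower:

```python
def pcie_limit_mb_per_s(lanes: int, gen: int) -> float:
    """Upper bound in MB/s per direction for 8b/10b-coded PCIe (Gen1/Gen2)."""
    line_rate_gbit = {1: 2.5, 2: 5.0}[gen]  # raw bit rate per lane
    data_gbit = line_rate_gbit * 8 / 10     # 8b/10b: 10 line bits carry 8 data bits
    return lanes * data_gbit * 1000 / 8     # Gbit/s -> MB/s


for lanes, gen in ((1, 1), (2, 1), (4, 1), (16, 2)):
    print(f"x{lanes} Gen{gen}: {pcie_limit_mb_per_s(lanes, gen):.0f} MB/s")
```

By this arithmetic x1 Gen1 tops out at 250 MB/s and x2 at 500 MB/s
before any packet overhead, so 300 MB/s on two lanes leaves little
headroom, while x4 has plenty.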
@Thomas
if there are ATA-compatible RAIDs out there, he might have an easier
solution for writing directly to the RAID device. But reading back from
that device must also be handled...
As long as the RAID is plugged into the computer's PCIe slot, it might be
easier to write a driver (your FPGA vendor will provide you with a usable
WinDriver example driver) than to write a file system (which might be as
easy as sector = sector+1), a proprietary driver, an ATA initiator inside
the FPGA (or using a CPU inside the FPGA) and a proprietary GUI?!? When
developing with Windows, he might get the job done by providing a
standard capture device; the rest is done by the OS and free tools
(filter graph).
Providing PCIe without a PC mainboard is not easy (you can't use the
manufacturer-provided hard IP, since the hard IP doesn't act as a root
complex).
Also, providing SATA directly from the FPGA isn't very easy, since there is
absolutely no free IP out there, and playing with 2.5 Gbit/s signals in the
fog could end in frustration...
regards
Andreas
> if there are ATA-compatible RAIDs out there, he might have an easier
> solution for writing directly to the RAID device. But reading back from
> that device must also be handled...
Yes, attach the RAID to a PC directly. dd is your friend. You know the
structure of your data.
> As long as the RAID is plugged into the computer's PCIe slot, it might be
> easier to write a driver (your FPGA vendor will provide you with a usable
> WinDriver example driver) than to write a file system
You don't need a file system. Each time you have data to record, start
at sector 0 and write. Some years ago I developed a similar system,
which recorded data on 16 CF-Cards in the same way.
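A minimal Python sketch of this "no file system" idea, with the read
side that dd would otherwise do. Everything specific here is an
assumption for illustration: the record layout (a 32-bit counter plus
a 16-bit sample), the 512-byte sector size, and a temporary file
standing in for the raw device:

```python
import os
import struct
import tempfile

SECTOR = 512
RECORD = struct.Struct("<IH")  # assumed layout: sample counter, 16-bit ADC value


def write_records(path, samples):
    """Write records back to back from sector 0; return the sectors used."""
    payload = b"".join(RECORD.pack(i, s) for i, s in enumerate(samples))
    payload += b"\0" * (-len(payload) % SECTOR)  # pad to a whole sector
    with open(path, "r+b") as dev:
        dev.seek(0)                              # always start at sector 0
        dev.write(payload)
    return len(payload) // SECTOR


def read_records(path, count):
    """Read back count sample values, relying on the known record layout."""
    with open(path, "rb") as dev:
        raw = dev.read(count * RECORD.size)
    return [RECORD.unpack_from(raw, i * RECORD.size)[1] for i in range(count)]


# A temporary file stands in for the raw device here.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"\0" * (4 * SECTOR))
    image = f.name
sectors_used = write_records(image, [100, 200, 300])
readback = read_records(image, 3)
os.remove(image)
print(sectors_used, readback)
```

Because the record layout is known, the reader needs no directory or
allocation table; it only has to know how many records were written,
which in practice would have to be stored somewhere as well.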
> Providing PCIe without a PC mainboard is not easy (you can't use the
> manufacturer-provided hard IP, since the hard IP doesn't act as a root
> complex).
If a direct connection to the RAID is used, PCIe isn't necessary. But
S-ATA is, you are right.
> Also, providing SATA directly from the FPGA isn't very easy, since there is
> absolutely no free IP out there, and playing with 2.5 Gbit/s signals in the
> fog could end in frustration...
Is there any free PCIe IP out there? I don't know.
Bye Tom
I would really like to go with an "FPGA only" solution, but
unfortunately I haven't found any RAID controller supporting the ATA
command set yet.
And like Andreas said, SATA (II) IP cores don't come for free, and writing
one myself is too complex.
Another reason why I prefer the PCIe solution is that the recorded data has
to be analyzed with MATLAB, for example. If the data grabber resides in the
computer where the data will be analyzed, I don't have to copy the huge
amount of data or remove the media from the data grabber...
I took a closer look at Altera's development kits and found out that the
"Stratix IV GX" dev. kit already supports PCIe 2.0 with up to eight
lanes. So the required throughput should be easily achievable?
regards
Chris
Thomas Reinemann wrote:
> Yes, attach the RAID to a PC directly. dd is your friend. You know the
> structure of your data.
what exactly do you mean by "directly"? Can I attach a RAID controller
indirectly too?
> You don't need a file system. Each time you have data to record, start
> at sector 0 and write. Some years ago I developed a similar system,
> which recorded data on 16 CF-Cards in the same way.
how were the CompactFlash cards connected? In parallel,
multiplexed...? What throughput did you achieve?
> Is there any free PCIe IP out there? I don't know.
depends on the FPGA you're using. If the FPGA provides only the high-speed
transceivers, then the IP usually isn't free, but if the FPGA
contains a PCIe hard macro, then the IP is free (at least Altera's)
regards
Chris
AFAIK for Altera only the Stratix and Arria GX families have PCIe
hard cores included (which are quite costly) - however, the new Xilinx
Spartan-6 has PCIe (at least I believe I have seen an advertisement).
Keep in mind - just having a digital IP core is by far not enough; the
physical layer of PCIe is a non-trivial part!
Correct timing, and thus signal path length, is crucial.
I think it is a Master's/Diplom thesis of its own to design a PCIe card....
Christoph Klein wrote:
> Thomas Reinemann wrote:
>> Yes, attach the RAID to a PC directly. dd is your friend. You know the
>> structure of your data.
> what exactly do you mean by "directly"? Can I attach a RAID controller
> indirectly too?
Yes via PCIe, the CPU and OS, as you suggested.
>> You don't need a file system. Each time you have data to record, start
>> at sector 0 and write. Some years ago I developed a similar system,
>> which recorded data on 16 CF-Cards in the same way.
> how were the CompactFlash cards connected? In parallel,
> multiplexed...?
In parallel: we had 4 boards with 5 FPGAs each (Spartan-3), one master
and 4 slaves. Each slave received the data from a pre-processing card
and wrote it to its CF card.
> What throughput did you achieve?
I don't know the exact value, but some tens of MB/s. The FPGA was faster
than the CF card. But we ran into trouble because the CF cards took a nap
of about 40 ms every minute.
Bye Tom
Thomas Ruschival wrote:
> AFAIK for Altera only the Stratix and Arria GX families have PCIe
> hard cores included (which are quite costly) - however, the new Xilinx
> Spartan-6 has PCIe (at least I believe I have seen an advertisement).
> Keep in mind - just having a digital IP core is by far not enough; the
> physical layer of PCIe is a non-trivial part!
> Correct timing, and thus signal path length, is crucial.
> I think it is a Master's/Diplom thesis of its own to design a PCIe card....
I completely agree. This project needs three experienced engineers, a
board designer, an FPGA Designer and a software designer, if the PCIe,
CPU, OS approach is followed.
Or a lot of time:-).
To Chris:
> I took a closer look at Altera's development kits and found out that the
> "Stratix IV GX" dev. kit already supports PCIe 2.0 with up to eight
> lanes. So the required throughput should be easily achievable?
And you really believe you are able to design a PCIe 2.0 board? If yes,
you are very naive.
Where will you buy the PCIe 2.0 PC?
Bye Tom
Hi Thomas,
Of course I'm not going to design the board myself (guess you can't do
that with EAGLE ;-)). I would like to use Altera's Stratix IV GX dev.
kit which contains 2 PCIe Gen.2 hard macros (at least according to the
reference guide on page 1-2 available at
http://www.altera.com/literature/manual/rm_sivgx_fpga_dev_board.pdf).
A quick search for MOBOs with PCIe Gen2 slots returned for example the
"P6T7 WS SuperComputer" board from Asus. This one may not be cheap
(~350€), but I bet there are more.
Of course I would prefer an easier solution, but it seems there is no
other (and I don't know how to connect the RAID 'directly').
kind regards
Chris
@Chris,
if your board supports 8 lanes and is able to run at a minimum of 4
lanes in your setup, then you really don't need a
PCIe Gen2 mainboard. Your FPGA board will fall back to PCIe Gen1, which
is good for roughly 1 GB/s at x4. Should be OK so far.
If you start with such an eval board, you don't have to start from scratch.
The board will be delivered with sample code (FPGA), a driver plus
source for some read/write example and hopefully with a DMA demo.
By the way, it is possible to design a PCIe board with Eagle. It is not
recommended, since the software will not support you with an HF
design rule check, but you could calculate the parameters by hand and
check them yourself.
High-speed designs were routed with rub-on symbols long before the
first layout software was seen in the wild.
regards
Andreas
Hi Andreas,
If PCIe Gen.1 is sufficient for this task I could use the "Arria II GX"
dev. Kit
(http://www.altera.com/products/devkits/altera/kit-aiigx-pcie.html)
which also includes a hard macro, but is much 'cheaper' than the
"Stratix IV GX" dev. kit. On the other hand, the Arria II GX provides
far fewer logic elements etc. than the Stratix IV. The Stratix IV GX
dev. kit, in turn, has no 1 GB DDR2 SODIMM for buffering the data like
the Arria II GX dev. kit, 'only' 512 MB of onboard DDR3.
You're right I needn't start from scratch when using one of these dev.
kits. Altera provides the "PCI Express High-Performance Reference
Design"
(http://www.altera.com/support/refdesigns/ip/interface/ref-pciexpress-hp.html)
which seems to be exactly what I need.
kind regards
Chris