EmbDev.net

Forum: FPGA, VHDL & Verilog
Thread: FPGA in CPU socket


by James Y. (Company: Naval Undersea Warfare Center) (newport_j)

I am very new to FPGAs, but my main interest is in using an FPGA to
occupy the second CPU socket of a dual-CPU motherboard.

I am not even sure how an FPGA development board differs from just an
FPGA board.

Do you program on one board and then port code to another board, i.e. 
the FPGA board that the program will eventually run on?

I assume that programming for a given FPGA is the same no matter how it 
is connected to the motherboard.

Also, what about Alpha Data, Inc.? I read somewhere that they connect
the FPGA to the motherboard through the motherboard memory. Is this
true?

Any help appreciated.

Thanks in advance.

Respectfully,

Newport_j

by Lothar M. (Company: Titel) (lkmiller) (Moderator)

James Yunker wrote:
> using an FPGA to occupy the second CPU socket of a dual-CPU motherboard.
That is not the usual way, because it is a very hard way. You would have
to mimic the behavior of the CPU in great detail.

> of a dual-CPU motherboard.
Which one?

> I am not even sure how an FPGA development board differs from just an
> FPGA board.
It's the same thing. Whether you can use a specific FPGA board for your
application depends mainly on its connectors.

The usual and most portable way to access the CPU's memory is by using
DMA via PCI or PCIe.
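
For illustration, a minimal host-side sketch of register access over
PCIe on Linux, mapping the card's BAR0 through sysfs. The PCI address,
window size and register offset are placeholders, not from any real
board; a real design would drive a DMA engine in the FPGA through
registers like this:

/* Map BAR0 of an FPGA PCIe card via sysfs and poke a register.
 * The PCI address and the register offset are hypothetical. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define BAR0     "/sys/bus/pci/devices/0000:03:00.0/resource0"
#define MAP_SIZE 0x1000    /* size of the BAR0 window (board-specific) */
#define REG_CTRL 0x0000    /* hypothetical control/doorbell register   */

int main(void)
{
    int fd = open(BAR0, O_RDWR | O_SYNC);
    if (fd < 0) { perror("open"); return 1; }

    volatile uint32_t *bar = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE,
                                  MAP_SHARED, fd, 0);
    if (bar == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

    bar[REG_CTRL / 4] = 1;                   /* ring the DMA doorbell   */
    printf("readback: 0x%08x\n", bar[REG_CTRL / 4]);

    munmap((void *)bar, MAP_SIZE);
    close(fd);
    return 0;
}

Only the mmap pattern is generic; the actual descriptors and doorbell
layout depend entirely on the DMA engine you put into the FPGA.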

by A.K. (Guest)

A normal PC mainboard? That could get a bit hard, unless the sockets
still carry conventional buses. For Intel, that era ended after the
Core 2; for AMD, at the K7/K8.

To communicate with the system you'll likely need a bit more
documentation than you might be able to find publicly. IIRC, neither
Intel QPI nor AMD HyperTransport is fully documented publicly where
communication between CPU sockets is concerned. Besides that, those
links are damned fast.

by James Y. (Company: Naval Undersea Warfare Center) (newport_j)

Going to PCI puts me back in the same jam of constricted data passage
that I hit when I tried CPU-GPU programming.

It limits what can be passed, and it takes time: do not forget that
data must make a round trip, traveling to the GPU and back to the CPU.
That is two passes across the CPU-GPU bus; think of the time.

I am again interested in the arrangement that allows the FPGA to reside
on the motherboard in the spare CPU socket. This of course requires a
motherboard with two CPU sockets. They do make them; check out Newegg.

If I cannot have that, then an interface with the DDR3 memory is my
next choice. I just do not think that the PCI interface offers any
advantages.

Any help appreciated.

thanks in advance.

Respectfully,


Newport_j

PS: Nallatech and DRC Computer (to name a few) offer motherboards with
an FPGA situated in the spare CPU socket of a dual-CPU-socket
motherboard.

As I said, I am very new to this technology (FPGAs), but the items that
I have read make a big deal out of placing the FPGA in the CPU socket.
It has the fastest speed and the largest bandwidth. That sounds great to
me.

by (prx) A. K. (prx)

James Yunker wrote:
> Going to PCI puts me back in the same jam of constricted data passage
> that I hit when I tried CPU-GPU programming.

Yes, PCI is outdated. Why not PCI Express?

by Lattice User (Guest)

James Yunker wrote:

> PS: Nallatech and DRC Computer (to name a few) offer motherboards with
> an FPGA situated in the spare CPU socket of a dual-CPU-socket
> motherboard.

On the Nallatech website I can only find PCI Express based solutions;
can you point us to a CPU socket solution?

> As I said, I am very new to this technology (FPGAs), but the items that
> I have read make a big deal out of placing the FPGA in the CPU socket.
> It has the fastest speed and the largest bandwidth. That sounds great to
> me.

As A.K. has pointed out, with current CPUs you are out of luck. One
reason is that a northbridge connected to the CPU's FSB (front side
bus) is a thing of the past.
But feel free to point us to a current product and prove us wrong.

by (prx) A. K. (prx)

James Yunker wrote:
> Nallatech

Didn't find a MB there.

> and DRC Computer

Found a Socket F mainboard. Not exactly high end by today's standards,
and the bandwidth probably isn't much faster than today's PCI Express
links.

> As I said, I am very new to this technology (FPGAs)

Since top bandwidth appears to be your concern: do you think you can
master interconnects clocked at about 3 GHz (Intel QPI), x2 for data
rate, in terms of board layout and FPGA design?

by Lattice User (Guest)

A. K. wrote:

> Since top bandwidth appears to be your concern: do you think you can
> master interconnects clocked at about 3 GHz (Intel QPI), x2 for data
> rate, in terms of board layout and FPGA design?

According to

http://en.wikipedia.org/wiki/Intel_QuickPath_Interconnect

Intel QPI has a raw data rate of 12.8 GByte/s in one direction, roughly
the same as PCIe Gen3 x16 after the protocol overhead. PCIe can also
receive and transmit at the same time.
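
The back-of-the-envelope arithmetic, as a quick sketch (using the
commonly published link parameters; raw figures, before packet
overhead):

/* One-direction raw bandwidth of QPI vs. PCIe Gen3 x16. */
#include <stdio.h>

int main(void)
{
    /* QPI: 3.2 GHz forward clock, double data rate -> 6.4 GT/s,
     * 16 data bits (2 bytes) per transfer per direction. */
    double qpi = 3.2e9 * 2 * 2;                     /* 12.8 GB/s  */

    /* PCIe Gen3: 8 GT/s per lane, 128b/130b line code,
     * 1 bit per transfer per lane, 16 lanes. */
    double pcie = 8e9 * (128.0 / 130.0) / 8 * 16;   /* ~15.75 GB/s */

    printf("QPI one-way:       %.2f GB/s\n", qpi / 1e9);
    printf("PCIe Gen3 x16 raw: %.2f GB/s\n", pcie / 1e9);
    return 0;
}

TLP headers and flow control then eat into the PCIe figure, which is
why the two come out roughly equal in practice.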

by (prx) A. K. (prx)

Lattice User wrote:
> Intel QPI has a raw data rate of 12.8 GByte/s in one direction, roughly
> the same as PCIe Gen3 x16 after the protocol overhead. PCIe can also
> receive and transmit at the same time.

And with PCIe he may have a chance of finding an FPGA with a PCIe
interface block included. I don't think such a beast exists for QPI.

But actually I wanted to point out what interface clock rates he'll
have to deal with in such a design. Not for the faint of heart.

by James Y. (Company: Naval Undersea Warfare Center) (newport_j)

You know, you bring up an interesting point here. I saw the ads for an
FPGA on a motherboard in the context of a summary report written
... well, I do not know when. That is the problem.

It did spark my interest. As I said, I have been struggling with 
bandwidth issues on my CPU-GPU interface connection. When I saw this 
report I became interested. It seems to answer most if not all of my 
problems.

But no one gave a date on this report that I read. After talking with
some vendors (Nallatech, Altera, Xilinx and XtremeData) I was concerned
that while this FPGA-in-the-motherboard-CPU-socket idea is out there,
nobody is pushing it now, or so it seems.

My guess then was that it is an old and unsuccessful technology. In
other words, it came and went. I did see something last week that may
support that.

Their argument was that FPGA manufacturers no longer target the second
CPU socket on the motherboard, for two reasons:

1. The CPU socket on the motherboard is always changing;

2. The second socket on a motherboard is disappearing, since if you
want more cores in your system you just buy a CPU (either Intel or AMD)
with more cores. Don't bother to add a second CPU; it is just not
necessary anymore.


Anyway, that is where I am. Anything that has to do with PCI gets a
thumbs down from me, since that is the interface that I struggled so
mightily with in my CPU-GPU setup.

Maybe I am wrong and betting on the wrong setup. I know that HFT techs
use my system setup. Maybe no one else does.

As I said, is this setup still around? Also, what are the benefits of
PCI over PCIe?

Any help appreciated.

Thanks in advance.

Respectfully,

Newport_j

by (prx) A. K. (prx)

James Yunker wrote:
> Also, what are the benefits of PCI over PCIe?

Slow enough to be easily implemented. ;-)

by Achim S. (Guest)

You can find a description of the Nallatech MB system here:

http://www.nallatech.com/Intel-Xeon-FSB-Socket-Fillers/fsb-development-systems.html

by (prx) A. K. (prx)

Achim S. wrote:
> You can find a description of the Nallatech MB system here:

Roughly the same technological age as the old Opteron system from DRC
above. It uses Intel's old conventional front side bus.

by (prx) A. K. (prx)

James Yunker wrote:
> 1. The CPU socket on the motherboard is always changing;

Which kills such a product as soon as the socket changes. PCIe however 
survives.

> 2. The second socket on a motherboard is disappearing

Only in desktops/workstations. There still are many multi-socket 
servers.

AMD wanted to sell HyperTransport as a bus for high-speed
peripherals/coprocessors and even had its own HTX connector, though
this was at a time when the big high-end bus was PCI-X. It wasn't
really successful, though.

> Anyway, that is where I am. Anything that has to do with PCI gets a
> thumbs down from me, since that is the interface that I struggled so
> mightily with in my CPU-GPU setup.

Are you talking about PCI or PCIe? They are quite different. More
precisely, PCI and PCIe have nothing in common at the physical layer.
And don't confuse PCIe with PCI-X, which is a wider and faster PCI.

by Lattice User (Guest)

Achim S. wrote:
> You can find a description of the Nallatech MB system here:
> http://www.nallatech.com/Intel-Xeon-FSB-Socket-Fillers/fsb-development-systems.html

Xeon FSB is end of life:

http://ark.intel.com/products/series/28355/Intel-Xeon-Processors-with-800-MHz-FSB

In other words, outdated.

> Anyway, that is where I am. Anything that has to do with PCI gets a
> thumbs down from me, since that is the interface that I struggled so
> mightily with in my CPU-GPU setup.

Without understanding what exactly the bottleneck in this setup is,
your conclusion is premature. PCIe Gen3 x16 (as used by modern GPUs) is
much faster than the old Xeon FSB setup. If your CPU-GPU setup is slow,
it is most likely due to a bad software setup or a limitation of the
GPU architecture. Neither applies to a PCIe implementation on an FPGA.
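
Before writing PCIe off, it is worth measuring what the link actually
delivers. A minimal sketch using the CUDA runtime (assuming a
CUDA-capable GPU and toolkit; the buffer size is arbitrary):

/* Measure host-to-device copy bandwidth over PCIe. Compile with nvcc. */
#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    const size_t n = 256 << 20;        /* 256 MiB test buffer */
    void *host, *dev;
    cudaMallocHost(&host, n);          /* pinned host memory for full speed */
    cudaMalloc(&dev, n);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    cudaMemcpy(dev, host, n, cudaMemcpyHostToDevice);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("host->device: %.2f GB/s\n", (n / 1e9) / (ms / 1e3));

    cudaFree(dev);
    cudaFreeHost(host);
    return 0;
}

If this number is far below what the slot should deliver, the problem
is the software setup, not the bus.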

by James Y. (Company: Naval Undersea Warfare Center) (newport_j)

Let me ask this. The HTX slot on an AMD-compatible motherboard is a
location where one can place an FPGA, provided it (the FPGA) is slot
compatible.

Is the HTX slot just a CPU slot, or is it a special slot for things
like an FPGA? I do not mean to imply the HTX slot is exclusively for
FPGAs.

Also, the Intel FSB format is nearing end of life, so I can ignore it.

Any help appreciated.

Thanks in advance.

Respectfully,


Newport_j

by (prx) A. K. (prx)

James Yunker wrote:
> Is the HTX slot just a CPU slot, or is it a special slot for things
> like an FPGA?

It is (or was) intended for a high-bandwidth, low-latency peripheral or
coprocessor, not for a general purpose CPU. However, the distinction
between peripheral and processor is largely arbitrary at this level,
except for cache coherency traffic.

by Grendel (Guest)

Are you sure that the problem is too low a data rate between CPU and
GPU, and not the architecture of your software?
If the data rate really is the problem, maybe it is different with an
FPGA, as you might be able to move some of the processing that is now
done on the CPU over to the FPGA, if it is suitable for that.
(There are also FPGAs with on-chip ARM Cortex cores...)

by Newport_j (Guest)

The Intel FSB format is now reaching end of life? What is taking its 
place?

Thanks in advance.

Respectfully,


Newport_j

by (prx) A. K. (prx)

Newport_j wrote:
> The Intel FSB format is now reaching end of life? What is taking its
> place?

It reached end of life years ago.

Memory is attached directly; peripherals are either connected directly
by PCIe links or are part of the system and interprocessor
communication structure, which uses QPI point-to-point links. Though
the links are technically different, the overall structure is the same
with Intel's QPI and AMD's HyperTransport.

AMD developed this system communication concept a decade ago with the
K8; Intel followed years later. It is no longer possible to run
parallel multidrop buses at the required speed, so partially serialized
point-to-point links running at very high speed became necessary.

Simplified, today's multi-socket server looks somewhat like several
processors with local memory, plus one or more I/O hubs, connected
together by a meshed serial network instead of parallel buses. Think of
a room full of single-socket PCs and separate peripheral devices, all
connected by a packetized network, pretending to be a single system. As
a result, bandwidth and access time of memory are no longer uniform;
they depend on the exact path between CPU and memory.
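
That non-uniformity is visible from user space. A minimal sketch with
libnuma (assuming Linux with libnuma installed; link with -lnuma):

/* Pin an allocation to one NUMA node; threads running on another node
 * will see higher latency and lower bandwidth touching it. */
#include <numa.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "no NUMA support on this system\n");
        return 1;
    }
    printf("configured NUMA nodes: %d\n", numa_num_configured_nodes());

    size_t sz = 64 << 20;                 /* 64 MiB */
    void *buf = numa_alloc_onnode(sz, 0); /* place the pages on node 0 */
    if (!buf) return 1;
    memset(buf, 0, sz);                   /* actually fault the pages in */

    numa_free(buf, sz);
    return 0;
}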

by Lattice User (Guest)

Intel QPI on Virtex 7:

http://press.xilinx.com/2012-09-11-Xilinx-Demonstrates-Industrys-First-QPI-1-1-Interface-with-FPGAs-at-Intel-Developer-Forum

That was on a Sandy Bridge; current CPUs have a higher clock rate on
the QPI (8 GHz vs 3.1 GHz).

by Lattice User (Guest)

Lattice User wrote:

> (8 GHz vs 3.1 GHz)

Typo: must be (8 GHz vs 3.2 GHz)

by Lattice User (Guest)

Lattice User wrote:
> Intel QPI on Virtex 7:

One more correction about the speed; the Intel datasheets are a bit
sparse:

The Xilinx demo was on an Ivy Bridge CPU (Xeon E5-2600 v2).
This runs at 6.4 GT/s (forward clock 3.2 GHz).
http://www.youtube.com/watch?v=Pqfmh88KHvo

The current Xeon E5-4600 v2 runs at 8.0 GT/s (forward clock 4.0 GHz).

by Newport_j (Guest)

I am unsure where to begin this endeavor, for want of a better term. I
do not know what FPGA board to get first. A development board?

I feel that there is a good chance that I will buy a board and find out
later that it does not meet my needs. I have a very specific plan,
speeding up one very long, complex C program, so I do not think my
needs are vague. They are specific.

Would a Nios Embedded Evaluation Kit (NEEK) do it? It seems that with a
NEEK you can try many boards and decide which one you want.

Respectfully,


Newport_j

by Grendel (Guest)

Newport_j wrote:
> Would a Nios Embedded Evaluation Kit (NEEK) do it? It seems that with a
> NEEK you can try many boards and decide which one you want.

This looks old. The Cyclone III is how old, a decade or so?

Better go for the latest generation. And if you want to run C code on
the FPGA and accelerate portions of it, again: look at the devices with
embedded ARM Cortex cores (Xilinx Zynq or the corresponding Altera
devices).

by Uwe (Guest)

You should think about GPGPU with OpenCL or CUDA.

by Newport_j (Guest)

I have. It is just CPU-GPU bus constricted. It will not work given the
present state of GPU technology.

Respectfully,

Newport_j

by Newport_j (Guest)

Grendel wrote:
> Better go for the latest generation. And if you want to run C code on
> the FPGA and accelerate portions of it, again: look at the devices with
> embedded ARM Cortex cores (Xilinx Zynq or the corresponding Altera
> devices).

Please give me a link. It would be very helpful.

Thanks in advance.

Respectfully,

Newport_j

by Grendel (Guest)

http://www.xilinx.com/products/silicon-devices/soc/zynq-7000/index.htm

Xilinx itself has some nice PCIe boards (but there are lots of others):

http://www.xilinx.com/products/boards-and-kits/EK-Z7-ZC706-G.htm

I would guess that you need some data acquisition as well in your
application; FMC cards are nice for this:
http://www.xilinx.com/products/boards-and-kits/1-45SL7B.htm

by Lattice User (Guest)

Newport_j wrote:
> I have. It is just CPU-GPU bus constricted. It will not work given the
> present state of GPU technology.

Can you tell us what data rates (in MBytes/second or GBytes/second) you
are looking for, and also what you currently achieve with the GPU and
your application?

by Newport_j (Guest)

Lattice User wrote:
> Can you tell us what data rates (in MBytes/second or GBytes/second) you
> are looking for, and also what you currently achieve with the GPU and
> your application?

I am sorry, but I cannot do that. The information is simply not
available any more.

I know from using the Portland Group C compiler that the code was
calculation and memory bound. Many for loops had independent
iterations, but there were just not enough iterations per loop to
justify sending the calculations to the GPU and then back to the CPU
afterwards. That sending to the GPU and back to the CPU takes time.

The secret to speeding up my program is parallelism. When I increased
the number of cores the program uses when it runs, it executed faster.
It was almost linear: double the cores and halve the execution time.

I only had 8 cores, so this could only go up to 8. All the software
that I used to determine program scalability suggested this would
continue if I kept on doubling the number of cores. I could not.

So instead of waiting for processors with more than 8 cores to become
available in desktop PCs, I looked into FPGAs.

They seem ideal for my job. They are very parallelizable. That is what
I need.
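
To illustrate, a toy sketch of the kind of loop I mean (OpenMP; compile
with -fopenmp; the loop body is just a stand-in for the real
computation):

/* Independent-iteration loop: runtime drops almost linearly with
 * the number of OpenMP threads, as described above. */
#include <math.h>
#include <omp.h>
#include <stdio.h>

#define N (1 << 24)
static double a[N];

int main(void)
{
    double t0 = omp_get_wtime();

    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        a[i] = sin((double)i) * cos((double)i);  /* no cross-iteration deps */

    double t1 = omp_get_wtime();
    printf("%d threads: %.3f s\n", omp_get_max_threads(), t1 - t0);
    return 0;
}

Run it with OMP_NUM_THREADS set to 1, 2, 4 and 8 to reproduce the
near-linear scaling.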

Respectfully,


Newport_j

by Joseph (Guest)

Hi,

Any update on this discussion?

Has anybody contacted Xilinx customer service about the price of the
QPI SmartCORE IP?

http://www.xilinx.com/esp/datacenter/data_center_ip.htm

System Interconnect: QuickPath Interconnect (QPI)

    - Designed for high-speed FPGA-to-processor communications
    - Cache agent, with full-width (20 lanes) operation at 6.4 Gbps per lane
    - Example design, for rapid start-up, based on Xilinx® Virtex®-7 FPGA
