I am very new to FPGAs, but my main interest is in using an FPGA to occupy the second CPU socket in a dual-CPU motherboard. I am not even sure what an FPGA development board is compared to just an FPGA board. Do you program on one board and then port the code to another board, i.e. the FPGA board that the program will eventually run on? I assume that programming for a given FPGA is the same no matter how it is connected to the motherboard. Also, what about Alpha Data, Inc.? I read somewhere that they connect the FPGA to the motherboard through the motherboard memory. Is this true? Any help appreciated. Thanks in advance. Respectfully, Newport_j
James Yunker wrote: > using the FPGA to occupy the second CPU socket in a dual CPU motherboard. That is not the usual way, because it is a very hard way. You would have to mimic the behavior of the CPU in great detail. > in a dual CPU motherboard. Which one? > I am not even sure what an FPGA development board is compared to just an > FPGA board. It's the same. Whether you can use a specific FPGA board for your application depends only on its connectors. The usual and most portable way to access the CPU's memory is DMA via PCI or PCIe.
A normal PC mainboard? That could get a bit hard, unless the sockets still have conventional buses. For Intel, this era ended after the Core 2, for AMD at K7/K8. To communicate with the system you'll likely need a bit more documentation than you might be able to find publicly. IIRC neither Intel QPI nor AMD HyperTransport is fully documented publicly concerning communication between CPU sockets. Besides that, those links are damned fast.
Going to PCI puts me back in the jam of constricted data passage that I hit when I tried CPU-GPU programming. It limits what can be passed and it does take time: do not forget that data must travel to the GPU and back to the CPU for a round trip, so that is two passes across the CPU-GPU bus. Think of the time. Again, I am interested in the arrangement that allows the FPGA to reside on the motherboard in the spare CPU socket. This of course requires a motherboard with two CPU sockets. They do make them - check out Newegg. If I cannot have that, then an interface to the DDR3 memory is my next choice. I just do not think that the PCI interface offers any advantages. Any help appreciated. Thanks in advance. Respectfully, Newport_j PS. Nallatech and DRC Computer (to name a few) offer motherboards with an FPGA situated in the spare CPU socket on a dual CPU socket motherboard. As I said, I am very new to this technology (FPGAs), but the items that I have read make a big deal out of placing the CPU on the motherboard. It has the fastest speed and the largest bandwidth. That sounds great to me.
James Yunker wrote: > Going to PCI puts me back in the jam of constricted data passage when I > tried CPU-GPU programming. Yes, PCI is outdated. Why not PCI Express?
James Yunker wrote: > > PS. Nallatech and DRC Computer (to name a few) offer motherboards with > an FPGA situated in the spare CPU socket on a dual CPU socket > motherboard. On the Nallatech website I can only find PCI Express based solutions. Can you point us to a CPU socket solution? > > As I said, I am very new to this technology (FPGAs), but the items that > I have read make a big deal out of placing the CPU on the motherboard. > It has the fastest speed and the largest bandwidth. That sounds great to > me. As A.K. has pointed out, with current CPUs you are out of luck. One reason is that a northbridge connected to an FSB (front side bus) of the CPU is a thing of the past. But feel free to point us to a current product and prove us wrong.
James Yunker wrote: > Nallatech Didn't find a motherboard there. > and DRC Computer Found a Socket F mainboard. Not exactly high end by today's standards, and the bandwidth probably isn't much higher than that of today's PCI Express links. > As I said, I am very new to this technology (FPGAs) Since top bandwidth appears to be your concern: Do you think you can master interconnects clocked at about 3 GHz (Intel QPI), x2 for data rate, regarding layout and FPGA design?
A. K. wrote: > > Since top bandwidth appears to be your concern: Do you think you can > master interconnects clocked at about 3 GHz (Intel QPI), x2 for data > rate, regarding layout and FPGA design? According to http://en.wikipedia.org/wiki/Intel_QuickPath_Interconnect Intel QPI has a raw data rate of 12.8 GByte/s in one direction, roughly the same as PCIe Gen3 x16 after the protocol overhead. PCIe can also receive and transmit at the same time.
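A minimal sketch of the arithmetic behind this comparison. It takes the 12.8 GByte/s QPI figure from the post as given and computes the PCIe Gen3 x16 side from the 8 GT/s lane rate and 128b/130b line encoding. The per-packet overhead of 24 bytes and the 128-byte payload size are assumptions chosen for illustration, the function names are hypothetical, and flow-control and acknowledgement traffic are ignored, so the result is only a ballpark.

```python
# Rough per-direction bandwidth comparison (a sketch, not a measurement).

QPI_GBYTES_PER_S = 12.8  # per-direction figure quoted in the post above


def pcie_raw_gbytes_per_s(gt_per_s, lanes, enc_payload_bits, enc_total_bits):
    """Per-direction bandwidth after line encoding, before packet overhead."""
    return gt_per_s * lanes * enc_payload_bits / enc_total_bits / 8.0


def pcie_effective_gbytes_per_s(raw_gbytes_per_s, payload_bytes, overhead_bytes):
    """Scale raw bandwidth by the payload share of each packet."""
    return raw_gbytes_per_s * payload_bytes / (payload_bytes + overhead_bytes)


# PCIe Gen3: 8 GT/s per lane, 128b/130b encoding, 16 lanes.
GEN3_X16_RAW = pcie_raw_gbytes_per_s(8.0, 16, 128, 130)

# ASSUMPTION: 128-byte payloads with about 24 bytes of header, framing and CRC.
GEN3_X16_EFFECTIVE = pcie_effective_gbytes_per_s(
    GEN3_X16_RAW, payload_bytes=128, overhead_bytes=24
)

print(f"PCIe Gen3 x16 raw:       {GEN3_X16_RAW:.2f} GByte/s")
print(f"PCIe Gen3 x16 effective: {GEN3_X16_EFFECTIVE:.2f} GByte/s")
print(f"QPI (quoted):            {QPI_GBYTES_PER_S:.2f} GByte/s")
```

Under these assumptions the two links land in the same range, which is consistent with the "roughly the same" estimate in the post. Larger payloads would push the PCIe figure up, smaller ones down.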
Lattice User wrote: > Intel QPI has a raw data rate of 12.8 GByte/s in one direction, roughly > the same as PCIe Gen3 x16 after the protocol overhead. PCIe can also > receive and transmit at the same time. And with PCIe he may have a chance of finding an FPGA that has a PCIe interface module included. I don't think such a beast exists for QPI. But actually I wanted to point out what interface clock rates he'll have to deal with in such a design. Not for the faint of heart.
You know, you bring up an interesting point here. I saw the ads for an FPGA on a motherboard in the context of a summary report written .... well, I do not know when. That is the problem. It did spark my interest. As I said, I have been struggling with bandwidth issues on my CPU-GPU interface connection. When I saw this report I became interested. It seemed to answer most if not all of my problems. But no one gave a date on the report that I read. After talking with some vendors (Nallatech, Altera, Xilinx and Xtremedata) I was concerned that, while this FPGA in the motherboard CPU slot is out there, nobody is pushing it now - or so it seems. My guess then was that it is an old and unsuccessful technology. In other words, it came and went. I did see something last week that may support that. The argument was that FPGA manufacturers do not target the second CPU slot on the motherboard anymore, for two reasons: 1. The CPU slot on the motherboard is always changing; 2. The second slot on a motherboard is disappearing, since if you want more cores in your system you just buy a CPU (either Intel or AMD) with more cores. Don't bother to add a second CPU; it is just not necessary anymore. Anyway, that is where I am. Anything that has to do with PCI gets a thumbs down from me, since that is the interface that I struggled so mightily with in my CPU-GPU setup. Maybe I am wrong and betting on the wrong setup. I know that HFT techs use my system setup. Maybe no one else does. As I said, is this setup still around? Also, what are the benefits of PCI over PCIe? Any help appreciated. Thanks in advance. Respectfully, Newport_j
James Yunker wrote: > Also, what are the benefits of PCI over PCIe? Slow enough to be easily implemented. ;-)
you can find a description of the Nallatech MB-system here: http://www.nallatech.com/Intel-Xeon-FSB-Socket-Fillers/fsb-development-systems.html
Achim S. wrote: > you can find a description of the Nallatech MB-system here: Roughly the same technological age as the old Opteron system from DRC above. It has Intel's old conventional front side bus.
James Yunker wrote: > 1. The CPU slot on the motherboard is always changing; Which kills such a product as soon as the socket changes. PCIe however survives. > 2. The second slot on a motherboard is disappearing Only in desktops/workstations. There still are many multi-socket servers. AMD wanted to sell HyperTransport as a bus for high-speed peripherals/coprocessors, and it even had its own HTX connector, though this was at a time when the high-end buses were big PCI-X. It wasn't really successful though. > Anyway that is where I am. Anything that has to do with PCI gets a > thumbs down from me since that is the interface that I struggled so > mightily with in my CPU-GPU setup. Are you talking about PCI or PCIe? They are quite different. More exactly, PCI and PCIe have nothing in common at the physical layer. Don't confuse PCIe with PCI-X, which is a wider and faster PCI.
Achim S. wrote: > you can find a description of the Nallatech MB-system here: > > http://www.nallatech.com/Intel-Xeon-FSB-Socket-Fillers/fsb-development-systems.html Xeon FSB is end of life: http://ark.intel.com/products/series/28355/Intel-Xeon-Processors-with-800-MHz-FSB In other words, outdated. > Anyway that is where I am. Anything that has to do with PCI gets a > thumbs down from me since that is the interface that I struggled so > mightily with in my CPU-GPU setup. Without understanding what exactly the bottleneck is in this setup, your conclusion is premature. PCIe Gen3 x16 (as with modern GPUs) is much faster than the old Xeon FSB setup. If your CPU-GPU setup is slow, it is most likely due to a bad software setup or a limitation of the GPU architecture. Neither applies to a PCIe implementation on an FPGA.
Let me ask this. The HTX slot on an AMD-compatible motherboard is a location where one can place an FPGA, provided it (the FPGA) is slot compatible. Is the HTX slot just a CPU slot, or is it a special slot just for things like an FPGA? I do not mean to imply the HTX slot is exclusively for FPGAs. Also, the Intel FSB format is nearing end of life, so I can ignore it. Any help appreciated. Thanks in advance. Respectfully, Newport_j
James Yunker wrote: > Is the HTX slot just a CPU slot or is it a special slot just for things > like an FPGA? It is (was) intended for a high-bandwidth, low-latency peripheral or coprocessor, not for a general purpose CPU. However, the distinction between peripheral and processor is largely arbitrary at this level, except for cache coherency traffic.
Are you sure that the problem is too low a data rate between CPU and GPU and not the architecture of your software? If this really is the problem, maybe it is different with an FPGA, as you might be able to move some of the processing that is now done on the CPU over to the FPGA - if it is suitable for that. (There are also FPGAs with on-chip ARM Cortex cores...)
The Intel FSB format is now reaching end of life? What is taking its place? Thanks in advance. Respectfully, Newport_j
Newport_j wrote: > The Intel FSB format is now reaching end of life? What is taking its > place? It reached end of life years ago. Memory is attached directly, and peripherals are either connected directly by PCIe links or are part of the system and interprocessor communication structure, which uses QPI point-to-point links. Though the links are technically different, the overall structure is the same with Intel's QPI and AMD's HyperTransport. AMD developed this system communication concept a decade ago with the K8, and Intel followed years later. It is no longer possible to run parallel multidrop buses at the required speed, so partially serialized point-to-point links running at very high speed became necessary. Simplified, today's multi-socket servers look somewhat like several processors with local memory, and one or more I/O hubs, connected together by a meshed serial network instead of parallel buses. Think of a room with single-socket PCs and separate peripheral devices, all connected by a packetized network, pretending to be a single system. As a result, bandwidth and access time of memory are no longer uniform, but instead depend on the exact path between CPU and memory.
Intel QPI on Virtex 7: http://press.xilinx.com/2012-09-11-Xilinx-Demonstrates-Industrys-First-QPI-1-1-Interface-with-FPGAs-at-Intel-Developer-Forum That was on a Sandy Bridge; current CPUs have a higher clock rate on the QPI (8 GHz vs 3.1 GHz).
Lattice User wrote: > Intel QPI on Virtex 7: > One more correction about the speed, since the Intel datasheets are a bit sparse: The Xilinx demo was on an Ivy Bridge CPU (Xeon E5 2600 v2). This runs at 6.4 GT/s (forward clock 3.2 GHz). http://www.youtube.com/watch?v=Pqfmh88KHvo Current Xeon E5 4600 v2 performance is 8.0 GT/s (forward clock 4 GHz).
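The relation between the forward clock and the transfer rate in this post can be sketched as follows. The link transfers data on both clock edges, so the rate in GT/s is twice the forward clock. The step from GT/s to GByte/s assumes that 16 bits of payload move per transfer in each direction of a full-width link, which matches the 12.8 GByte/s figure quoted earlier in the thread; the function names are hypothetical.

```python
# From QPI forward clock to per-direction data bandwidth (a sketch).


def qpi_transfer_rate_gt(forward_clock_ghz):
    """Double data rate: one transfer on each clock edge."""
    return 2.0 * forward_clock_ghz


def qpi_data_gbytes_per_s(gt_per_s, payload_bits_per_transfer=16):
    """ASSUMPTION: 16 payload bits per transfer on a full-width link."""
    return gt_per_s * payload_bits_per_transfer / 8.0


for clock_ghz in (3.2, 4.0):  # forward clocks named in the post above
    rate = qpi_transfer_rate_gt(clock_ghz)
    bandwidth = qpi_data_gbytes_per_s(rate)
    print(f"{clock_ghz:.1f} GHz -> {rate:.1f} GT/s -> {bandwidth:.1f} GByte/s per direction")
```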
I am unsure where to begin this endeavor - for want of a better term. I do not know what FPGA board to get first. A development board? I feel that there is a good chance that I will buy a board and find out later that it does not meet my needs. I have a very specific plan of speeding up one very long, complex C program, so I do not think my needs are vague. They are specific. Would a Nios Embedded Evaluation Board (NEEK) do it? It seems that with a NEEK you can try many boards and decide which one you want. Respectfully, Newport_j
Newport_j wrote: > > Would a Nios Embedded Evaluation Board (NEEK) do it? It seems that with a NEEK > you can try many boards and decide which one you want. > This looks old - Cyclone 3 is how old, a decade or so? Better go for the latest generation. And if you want to run C code on the FPGA and accelerate portions of it - again: look at the devices with embedded ARM Cortex cores (Xilinx Zynq or the respective Altera devices).
You should think about GPGPU with OpenCL or CUDA.
I have. It is just constricted by the CPU-GPU bus. It will not work given the present state of technology in GPUs. Respectfully, Newport_j
> Better go for the latest generation. And if you want to run C code on the FPGA and accelerate portions of it - again: look at the devices with embedded ARM Cortex cores (Xilinx Zynq or the respective Altera devices).
Please give me a link. It would be very helpful. Thanks in advance. Respectfully, Newport_j
http://www.xilinx.com/products/silicon-devices/soc/zynq-7000/index.htm Xilinx itself has some nice PCIe boards (but there are lots of other boards): http://www.xilinx.com/products/boards-and-kits/EK-Z7-ZC706-G.htm I would guess that you need some data acquisition as well in your application. FMC cards are nice for this: http://www.xilinx.com/products/boards-and-kits/1-45SL7B.htm
Newport_j wrote: > I have. It is just constricted by the CPU-GPU bus. It will not work given the > present state of technology in GPUs. > > Respectfully, > > Newport_j Can you tell us what data rates (in MBytes/second or GBytes/second) you are looking for, and also what you can currently achieve with the GPU and your application?
> Can you tell us what data rates (in MBytes/second or GBytes/second) you are looking for, and also what you can currently achieve with the GPU and your application?
I am sorry. I cannot do that. The information is not available any more. I know from using the Portland Group C compiler that the code was calculation and memory bound. Many for loops had independent iterations, but there were just not enough iterations in a loop to justify sending the calculation to the GPU and then sending the results back to the CPU afterwards. That sending to the GPU and sending back to the CPU takes time. The secret to speeding up my program is parallelism. When I increased the number of cores the program uses when it runs, it executed faster. It was almost linear: double the cores and halve the execution time. I only had 8 cores, so this could only go up to 8. All the software that I used to determine program scalability suggested this can continue if I keep on doubling the number of cores. I could not. So instead of waiting for processors with more than 8 cores to become available on desktop PCs, I looked into FPGAs. They seem ideal for my job. They are very parallelizable. That is what I need. Respectfully, Newport_j
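The trade-off described in this post can be put into a small model: offloading a loop only pays off when the transfer time plus the accelerator's compute time is below the CPU's compute time, and the observed near-linear scaling with cores follows Amdahl's law for a mostly parallel program. This is a sketch under stated assumptions, not a measurement of the poster's program. All numbers (loop sizes, link speed, latency, compute times, parallel fraction) and all function names are hypothetical.

```python
# Toy model of the offload break-even point and of core scaling (a sketch).


def offload_time_s(bytes_round_trip, link_gbytes_per_s, one_way_latency_s, accel_compute_s):
    """Time to ship data to an accelerator, compute there, and ship results back."""
    transfer_s = bytes_round_trip / (link_gbytes_per_s * 1e9)
    return 2.0 * one_way_latency_s + transfer_s + accel_compute_s


def worth_offloading(cpu_compute_s, bytes_round_trip, link_gbytes_per_s,
                     one_way_latency_s, accel_compute_s):
    """True if the accelerator path beats running the loop on the CPU."""
    accel_path_s = offload_time_s(bytes_round_trip, link_gbytes_per_s,
                                  one_way_latency_s, accel_compute_s)
    return accel_path_s < cpu_compute_s


def amdahl_speedup(parallel_fraction, workers):
    """Amdahl's law: speedup limited by the serial share of the work."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / workers)


# HYPOTHETICAL short loop: little work, so link latency dominates.
SHORT_LOOP = dict(cpu_compute_s=5e-6, bytes_round_trip=16_384,
                  link_gbytes_per_s=12.0, one_way_latency_s=10e-6,
                  accel_compute_s=0.5e-6)

# HYPOTHETICAL long loop: enough work to amortize the transfer.
LONG_LOOP = dict(cpu_compute_s=50e-3, bytes_round_trip=160_000_000,
                 link_gbytes_per_s=12.0, one_way_latency_s=10e-6,
                 accel_compute_s=5e-3)

print("short loop worth offloading:", worth_offloading(**SHORT_LOOP))
print("long loop worth offloading: ", worth_offloading(**LONG_LOOP))
print("speedup on 8 cores at 95% parallel:", round(amdahl_speedup(0.95, 8), 2))
```

The same break-even reasoning applies to an FPGA behind PCIe or any other link: the link latency and bandwidth terms change, but a loop with too few iterations still cannot amortize the round trip.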
Hi, Any update in the discussion? Has anybody contacted Xilinx customer service about the price of the QPI SmartCORE IP? http://www.xilinx.com/esp/datacenter/data_center_ip.htm System Interconnect QuickPath Interconnect (QPI) - Designed for high-speed FPGA-to-processor communications - Cache agent, with full-width (20 lanes) operation at 6.4 Gbps per lane - Example design, for rapid start-up, based on Xilinx® Virtex®-7 FPGA