Hello. First of all, we are a software company with almost no idea of what FPGAs are capable of. Keeping that in mind, our problem is this: we have a custom-written algorithm which requires heavy calculation and needs to be repeated 30 times per second. We assembled a really powerful computer, but our frame rate is only about 15. We use parallel programming techniques like CUDA, and they increased the speed significantly, but we were wondering whether reimplementing this algorithm on an FPGA would increase the speed further. First of all, is implementing any kind of algorithm on an FPGA possible? The algorithm includes parts which can be executed in parallel, and also parts which *must* be executed sequentially. So we just wanted to get an idea of whether an FPGA is the answer to our problem.
What is the algorithm doing? Are missing programming skills the reason why it cannot be executed on a desktop PC? 30 times per second doesn't sound like much. Please give more info. In general, you can implement any algorithm on an FPGA.
We basically do the following steps:
- Bilateral filtering with an 11-pixel window width,
- SIFT feature extraction and matching against the previous frame,
- Finding the position of the camera,
- Basic 3D estimation of the features, which is sent to the computer along with the image.
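As an illustration of the first step above, here is a minimal brute-force NumPy sketch of a bilateral filter with an 11-pixel window. This is my own sketch, not the poster's implementation; the sigma values and parameter names are assumptions chosen for the example.

```python
import numpy as np

def bilateral_filter(img, window=11, sigma_s=3.0, sigma_r=0.1):
    """Brute-force bilateral filter on a 2-D grayscale float image.

    window  : odd window width (the thread mentions 11 pixels)
    sigma_s : spatial Gaussian std-dev, in pixels (assumed value)
    sigma_r : range Gaussian std-dev, in intensity units (assumed value)
    """
    r = window // 2
    # The spatial weight mask is identical at every pixel: compute it once.
    yy, xx = np.mgrid[-r:r + 1, -r:r + 1]
    spatial = np.exp(-(xx**2 + yy**2) / (2.0 * sigma_s**2))

    padded = np.pad(img, r, mode='edge')
    out = np.empty_like(img, dtype=np.float64)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            patch = padded[i:i + window, j:j + window]
            # Range weight: penalise intensity differences from the centre
            # pixel, which is what preserves edges while smoothing noise.
            rng = np.exp(-((patch - img[i, j])**2) / (2.0 * sigma_r**2))
            w = spatial * rng
            out[i, j] = np.sum(w * patch) / np.sum(w)
    return out
```

Each output pixel depends only on a fixed 11x11 neighbourhood, which is why this stage maps so naturally onto an FPGA line-buffer pipeline, as the replies below point out.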
Without FPGA experience, I think it is easier to program two desktop PCs to do the job.
If we decide that going the FPGA way will improve performance dramatically, we will consider hiring an engineer to do the job. So what we are interested in is whether implementing such algorithms on FPGAs is efficient or not.
If you are using a standard camera with a Camera Link or GigE interface, there is a smart solution from Silicon Software. It simplifies FPGA development by using graphical programming with function blocks. http://www.silicon-software.de/en/index.html
Hi,

Let me comment:

> - Bilateral filtering with 11 pixel window width,

This can run very fast on an FPGA and is scalable (the limit being your FPGA resources).

> - Sift feature extraction and matching according to previous frame,

The basic feature extraction (corner detection, etc.) can be put into an FPGA as well. However, the more complex decision algorithms should probably run in software.

> - Finding position of camera,
> - Basic 3D estimation of features and send to the computer with image.

That sounds like a job for software as well. IMHO, the frame grabber and Camera Link based solutions are not very elegant, and graphical programming tools may lead you to a prototype, but not to the real product, so eventually you would have to redo it all over again. If no extreme memory bus bandwidth is required, I would recommend a DSP/FPGA combination (I have had good experience with a Blackfin plus a Xilinx FPGA with respect to intelligent cameras). If you don't need the full frame rate output over Ethernet, a 100M interface will do. The embedded approach only works well when there is a linear processing chain and no requirement for massive bandwidth, i.e. something like: image data -> FPGA preprocessing -> DSP postprocessing -> output. What you'll have to do at an early stage is to prototype your "accelerated" core operations and estimate the resources before buying hardware.

Cheers, - Martin
@OP: I typically work on image processing and acquisition, and I also own cores for Camera Link and GigE used to transport images to a PC for offline processing in VPX and also on GPUs. Apart from that, I have done real-time image processing and object tracking for moving objects and very fast-moving cameras (Mach 0.9). If you need support, feel free to get in contact. To answer your questions in advance: I would also point out that GigE and Camera Link are non-optimal solutions in many cases, since they were designed to meet many different people's requirements. Continuing: yes, FPGAs can do all of your work properly and will be the fastest solution. The reasons why most customers nevertheless choose DSPs, MCUs, CPUs and also CUDA are cost, testing effort and power/current demands. You will have to find out your minimum speed requirements in order to define how much you have to implement in the FPGAs, which are expensive and require much more time for development, validation and test.

> 11 pixels window width

Wow :-) In fact, this can still be done with a video DSP. :-) My current design contains a three-stage pixel reconstruction/interpolation over 13x13x3 pixels, including NUC (GOC, FPNC) and subsequent noise filtering (chromatic extraction), for images of around one million pixels each, with movement analysis and vector extraction. It can even perform oversampling, meaning 120 fps instead of 30, for photon-noise processing and value forecasting. The demonstrator design runs on a GPU with MATLAB-based algorithms and takes 3 minutes to post-process a single second of video, while the FPGA does it all in real time and only grows in size.
Canol wrote:
> We basically do the following steps:
>
> - Bilateral filtering with 11 pixel window width,
> - Sift feature extraction and matching according to previous frame,
> - Finding position of camera,
> - Basic 3D estimation of features and send to the computer with image.

The biggest thing with an FPGA is to divide the task well. The parallel calculations and the pre- and post-calculations are where you gain the speed. SIFT estimation is a very good fit for an FPGA. An already optimised software algorithm is often difficult to port to an FPGA: you have different resources and need a different philosophy when writing HDL code.
To me, the strongest advantage of an FPGA is scalable accuracy and calculation precision, especially for technical applications where more than 32 bits are often required. But you have to face the fact that the implementation is more complex and demands much more design expertise than "simple" C in order to get the most out of the system.
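To make the precision point concrete, here is a small Python sketch (my own illustration, not from the thread) showing how a float32 accumulator, with its 24-bit significand, silently drops an increment that a wide fixed-point accumulator keeps exactly. The 48-bit width is an assumption, chosen because DSP slices of that width are common in FPGAs.

```python
import numpy as np

# float32 has a 24-bit significand: once the running sum reaches 2**24,
# adding 1.0 is lost entirely to round-to-nearest-even.
acc_f32 = np.float32(2**24)
acc_f32 = acc_f32 + np.float32(1.0)
print(acc_f32 == 2**24)        # True: the increment vanished

# A wide fixed-point accumulator, modelled here with Python's
# arbitrary-precision ints masked to an assumed 48-bit register,
# keeps every bit of the sum.
ACC_BITS = 48
acc_fix = 2**24
acc_fix = (acc_fix + 1) & ((1 << ACC_BITS) - 1)
print(acc_fix == 2**24 + 1)    # True: exact
```

On an FPGA you choose the accumulator width per datapath, so the precision scales with the problem instead of being fixed at 32 or 64 bits.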
Hello, my work is on implementing a novel FFT algorithm on an FPGA. I want to know how to read an image in Verilog.
Is there anybody out there working with abstract modelling to create complex algorithms in FPGAs? Which tools do you use to implement, and which to verify?
> We assemblied a really powerful computer but our framerate is about ~15.

I think you have to find the bottleneck:
- the transport from the camera to the frame grabber,
- the transfer from the frame grabber to main memory,
- the transport from main memory to GPU memory,
- the algorithm on the GPU,
- the transport from GPU memory back to main memory,
- the transport from main memory back to the GPU to show it on the screen ;-)

What is the frame rate in live view, without the GPU? Best use the original software of the frame grabber.
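One way to locate such a bottleneck is to time each stage of the per-frame loop separately. A minimal Python sketch follows; the stage names are hypothetical and the `time.sleep` calls are placeholders standing in for the real grab, copy, kernel and display calls.

```python
import time
from contextlib import contextmanager

timings = {}

@contextmanager
def stage(name):
    """Accumulate wall-clock time per pipeline stage across frames."""
    t0 = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = timings.get(name, 0.0) + time.perf_counter() - t0

# Hypothetical per-frame loop: replace the sleeps with the real calls
# (frame grab, host->GPU copy, kernel launch, GPU->host copy, display).
for _ in range(3):
    with stage("grab"):        time.sleep(0.002)
    with stage("host_to_gpu"): time.sleep(0.010)  # often the real culprit
    with stage("gpu_kernel"):  time.sleep(0.004)
    with stage("gpu_to_host"): time.sleep(0.010)
    with stage("display"):     time.sleep(0.001)

# Print the most expensive stage first.
for name, t in sorted(timings.items(), key=lambda kv: -kv[1]):
    print(f"{name:12s} {t * 1000:7.1f} ms total")
```

If the copies dominate the kernel time, as the later replies in this thread suggest they often do, a faster GPU will not raise the frame rate.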
Uwe wrote:
> I think you have to find the bottleneck.

I guess if he didn't find it during the 1.5 years this thread has been sitting here, there is no chance he ever will.
In all the designs I have come across so far, the data transport to and from the GPU, over whatever bus, is always the bottleneck, even with PCIe. This is the reason why people commonly use FPGAs for this. As soon as the data interchange becomes more complex than just shifting from one cell to the next (which GPU elements do very well), the heavenly power of GPU systems is brought to its knees.

> We use parallel programming techniques like CUDA and it increased the
> speed significantly but we were thinking whether reimplementing this
> algorithm in FPGA would increase the speed.

This is the point: "parallel techniques" does not tell the full truth about a design. CUDA and FPGA/DSP techniques differ significantly from each other, and while the step from C and DSP-based processing to parallel CUDA code is small, transferring algorithms to FPGAs will fail if the fundamental issues are not understood correctly. At first glance, FPGAs will perform worse when only fed with soft-core-like ALUs performing DSP-style processing. Totally new structures have to be invented first to overcome these limits and take advantage of the FPGA's real capabilities. Since the tree of options looks mostly like this:

Design
- Software
  - DSP
  - CUDA GPU
- Hardware
  - FPGA soft core
  - FPGA calculation chains

it will never be possible to transform a design from software to an FPGA efficiently by a straight port.