EmbDev.net

Forum: FPGA, VHDL & Verilog Implementing a complex algorithm on FPGA


by Canol (Guest)


Hello,

First of all, we are a software company with almost no idea of what 
FPGAs are capable of. Keeping that in mind, our problem is this: 
we have a custom-written algorithm which requires heavy calculations and 
needs to be repeated 30 times a second. We assembled a really powerful 
computer but our framerate is about ~15.

We use parallel programming techniques like CUDA and it increased the 
speed significantly but we were thinking whether reimplementing this 
algorithm in FPGA would increase the speed.

First of all, is it possible to implement any kind of algorithm on an 
FPGA? The algorithm includes parts which can be executed in parallel and 
also includes parts which -should- be executed sequentially.

So we just wanted to get an idea of whether an FPGA is the answer to our 
problem.

by Kan a. (Company: Basta) (kanasta)


What does the algorithm do?
Is a lack of programming skill the reason why it cannot run fast enough 
on a desktop PC?
30 times per second doesn't sound like much. Please give more info.

In general, you can implement any algorithm on an FPGA.

by Canol (Guest)


We basically do the following steps:

 - Bilateral filtering with 11 pixel window width,
 - SIFT feature extraction and matching according to previous frame,
 - Finding position of camera,
 - Basic 3D estimation of features and send to the computer with image.
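
For reference, the first step in this list can be sketched in plain Python. This is only a brute-force illustration and not the poster's implementation: the function name bilateral_filter and the values of sigma_s and sigma_r are assumptions, and radius=5 is chosen simply because it gives the 11-pixel window mentioned above.

```python
import math

def bilateral_filter(img, radius=5, sigma_s=3.0, sigma_r=25.0):
    """Brute-force bilateral filter on a 2D list of grayscale values.

    radius=5 gives an 11x11 window; sigma_s (spatial) and sigma_r (range)
    are illustrative values, not taken from the thread.
    """
    h, w = len(img), len(img[0])
    # Spatial weights depend only on the pixel offset, so compute them once.
    spatial = {
        (dy, dx): math.exp(-(dy * dy + dx * dx) / (2.0 * sigma_s ** 2))
        for dy in range(-radius, radius + 1)
        for dx in range(-radius, radius + 1)
    }
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            center = img[y][x]
            acc = 0.0
            norm = 0.0
            for (dy, dx), ws in spatial.items():
                yy, xx = y + dy, x + dx
                if 0 <= yy < h and 0 <= xx < w:  # ignore taps outside the image
                    diff = img[yy][xx] - center
                    # Range weight: neighbours with similar intensity count more.
                    wgt = ws * math.exp(-(diff * diff) / (2.0 * sigma_r ** 2))
                    acc += wgt * img[yy][xx]
                    norm += wgt
            out[y][x] = acc / norm  # norm >= 1 because the centre tap has weight 1
    return out
```

Every output pixel depends only on its own 11x11 neighbourhood (121 taps), so the outer two loops have no dependencies between iterations. That independence is what makes this step a candidate for CUDA threads or for a pipelined hardware window.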

by Kan a. (Company: Basta) (kanasta)


Without FPGA experience, I think it is easier to program two desktop PCs 
to do the job.

by Canol (Guest)


If we decide that going the FPGA way will improve the performance 
dramatically, we will consider hiring an engineer to do the job. So 
what we are interested in is whether implementing such algorithms on 
FPGAs is efficient or not.

by JojoS (Guest)


If you are using a standard camera with a CameraLink or GigE interface, 
there is a smart solution from SiliconSoftware. It simplifies FPGA 
software development by using graphical programming with function 
blocks.
http://www.silicon-software.de/en/index.html

by Strubi (Guest)


Hi,

Let me comment:

> - Bilateral filtering with 11 pixel window width,

Can work very fast on an FPGA and is scalable (the limit being your FPGA 
resources).

> - SIFT feature extraction and matching according to previous frame,

The basic feature extraction (corner detection, etc.) can be put into an 
FPGA as well. However, more complex decision algorithms should probably 
run in software.

> - Finding position of camera,
> - Basic 3D estimation of features and send to the computer with image.

That sounds like a job for the software as well.

IMHO, frame grabber or Camera Link based solutions are not very 
elegant, and graphical programming tools may get you to a prototype 
but not to the real product, so eventually you have to do it all over 
again.

If no extreme memory bus bandwidth is required, I can recommend a 
DSP/FPGA combination (I have had good experiences with Blackfin and 
Xilinx FPGAs for intelligent cameras). If you don't need the full frame 
rate output over Ethernet, a 100 Mbit interface will do.

The embedded approach will only work well when there is a linear 
processing chain and no requirement of massive bandwidth, i.e. something 
like:

image data -> fpga preprocessing -> dsp postprocessing -> output
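
As a software analogy only, the chain above can be sketched with Python generators, where each stage consumes one frame at a time from the stage before it and never reaches back. The stage bodies (a threshold and a pixel count) are placeholders chosen for illustration, and all function names are hypothetical.

```python
def camera(frames):
    """Stand-in for the image source: yields frames one at a time."""
    for frame in frames:
        yield frame

def fpga_preprocess(stream):
    """Stand-in for the FPGA stage: a fixed per-pixel operation (a threshold)."""
    for frame in stream:
        yield [[1 if px > 128 else 0 for px in row] for row in frame]

def dsp_postprocess(stream):
    """Stand-in for the DSP stage: reduces each frame to a small result."""
    for frame in stream:
        yield sum(sum(row) for row in frame)

def run_chain(frames):
    """Connect the stages in a strictly linear order and collect the output."""
    return list(dsp_postprocess(fpga_preprocess(camera(frames))))
```

Data only ever flows forward here. An algorithm that needs a later stage to feed results back into an earlier one, or needs random access to several full frames, would not fit this shape, which is the limitation described above.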

What you'll have to do at an early stage is prototype your 
"accelerated" core operations and estimate the resources before buying 
hardware.
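
A first rough estimate of this kind needs nothing more than arithmetic. The sketch below assumes a streaming window filter that keeps window-1 image lines in on-chip memory, which is a common structure but not something stated in this thread. The 640x480 resolution and 8-bit pixels in the usage line are placeholders, since the real image size was never given.

```python
def estimate_window_filter(width, height, fps, window=11, bits_per_pixel=8):
    """Back-of-the-envelope numbers for a window x window streaming filter."""
    pixels_per_second = width * height * fps
    taps = window * window
    return {
        # Rate at which pixels must be accepted to keep up with the camera.
        "pixel_rate_hz": pixels_per_second,
        # Weighted additions needed for one output pixel.
        "taps_per_pixel": taps,
        # Total weighted additions per second across the whole image stream.
        "weighted_sums_per_second": pixels_per_second * taps,
        # Assumed storage: (window - 1) full lines buffered on chip.
        "line_buffer_bits": (window - 1) * width * bits_per_pixel,
    }

# Placeholder image size, purely for illustration.
est = estimate_window_filter(640, 480, 30)
```

With these placeholder values the result is about 9.2 million pixels per second, 121 taps per pixel, roughly 1.1 billion weighted sums per second and 51200 bits of line buffer. Numbers like these can then be compared with the multiplier and block RAM counts in a candidate device's data sheet.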

Cheers,
- Martin

by J. S. (engineer)


@OP: I work mainly in image processing and acquisition and have my own 
cores for Camera Link and GigE, used to transport data to a PC in 
order to do the offline processing there, on VPX and also on GPUs. Apart 
from that, I have already implemented real-time image processing and 
object tracking for moving objects and very fast-moving cameras (Mach 0.9).

If you need support, feel free to get in touch.

To answer your questions up front, I would also point out that GigE and 
Camera Link are not optimal solutions in many cases, since they were 
designed to meet many different users' requirements.

Beyond that: yes, FPGAs can do all of your work properly and will be the 
fastest solution. The reasons why most customers nevertheless choose 
DSPs, MCUs, CPUs and also CUDA are cost, test issues and current demands.

You will have to find out the minimum speed requirements in order 
to define how much you have to implement in the FPGAs, which are 
expensive and require much more time for development, 
validation and test.

>11 pixels window width
Wow :-) In fact, this can still be done with a video DSP. :-)

My current design contains a three-stage pixel reconstruction / 
interpolation over 13x13x3 pixels including NUC (GOC, FPNC) and 
subsequent noise filtering (chromatic extraction) for images of around 
one million pixels each, with movement analysis and vector extraction. 
It can even perform oversampling, meaning 120 fps instead of 30, for 
photon noise processing and value forecasting.

The demonstrator design runs on a GPU with MATLAB-based algorithms and 
takes 3 minutes to post-process a single second, while the FPGA does it 
all in real time and only grows in size.

by René D. (Company: www.dossmatik.de) (dose)


Canol wrote:
> We basically do the following steps:
>
>  - Bilateral filtering with 11 pixel window width,
>  - SIFT feature extraction and matching according to previous frame,
>  - Finding position of camera,
>  - Basic 3D estimation of features and send to the computer with image.


The biggest thing with an FPGA is dividing up the task. Parallel 
calculation, together with pre- and post-calculation, is where the speed 
comes from.

Shift estimation is very well suited to an FPGA.
An optimised algorithm is often difficult to port to an FPGA. You have 
different resources and need a different philosophy for writing HDL code.

by Thomas (Guest)


To me, the strongest advantage of an FPGA is freely sizable accuracy and 
calculation precision, especially for technical applications where 
more than 32 bits are often required. But you have to face the fact that 
the implementation is more complex and requires much more design 
expertise than "simple" C in order to get the most out of the system.
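
The effect of word width can be illustrated with a small Python model of a multiply-accumulate register. This is a sketch under assumptions, not HDL: the function name mac_unsigned is hypothetical, values are treated as unsigned, and the 16-bit inputs and 121 taps are chosen only to resemble an 11x11 window.

```python
def mac_unsigned(samples, coeffs, acc_bits):
    """Multiply-accumulate with an accumulator exactly acc_bits wide."""
    mask = (1 << acc_bits) - 1
    acc = 0
    for s, c in zip(samples, coeffs):
        # Keep only the low acc_bits, as a fixed-width register would.
        acc = (acc + s * c) & mask
    return acc

# Worst case for 121 taps of 16-bit data times 16-bit coefficients.
samples = [65535] * 121
coeffs = [65535] * 121
exact = sum(s * c for s, c in zip(samples, coeffs))

wide = mac_unsigned(samples, coeffs, 40)    # matches the exact sum
narrow = mac_unsigned(samples, coeffs, 32)  # wraps around and is wrong
```

Here the exact sum needs 39 bits, so a 32-bit accumulator silently overflows while a 40-bit one does not. In an FPGA the accumulator can simply be declared at the width the worst case requires, instead of being tied to a processor's native 32 or 64 bits.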

by Anitha T G R. (Company: JSSATE) (honey_tg4u)


Hello,
My work is on implementing a novel FFT algorithm on an FPGA. I want to 
know how to read an image in Verilog.

by Franklin (Guest)


Is there anybody out there working with abstract modelling to create 
complex algorithms in FPGAs?

Which tools do you use to implement, and which to verify?

by Uwe (Guest)


> We assembled a really powerful computer but our framerate is about ~15.
I think you have to find the bottleneck. Is it:
- the transport from the camera to the frame grabber,
- the transfer from the frame grabber to main memory,
- the transport from main memory to GPU memory,
- the algorithm on the GPU,
- the transport from GPU memory back to main memory,
- the transport from main memory back to the GPU 8) to show it on the 
  screen ;-)
What is the frame rate in live view without the GPU? It is best to use 
the frame grabber's original software for this.
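
One simple way to check the stages in that list is to time each one separately for a single frame. The sketch below is a generic pattern, not tied to any frame grabber or GPU API; profile_stages and the stage names are hypothetical, and each stage is assumed to be a plain function that takes the previous stage's output.

```python
import time

def profile_stages(stages, frame):
    """Run each (name, function) stage in order and time it.

    Returns the per-stage timings in seconds, the name of the slowest
    stage, and the final output of the chain.
    """
    timings = {}
    data = frame
    for name, fn in stages:
        start = time.perf_counter()
        data = fn(data)
        timings[name] = time.perf_counter() - start
    slowest = max(timings, key=timings.get)
    return timings, slowest, data
```

For GPU stages, the wrapped function has to wait until the GPU has actually finished before it returns, because CUDA kernel launches return to the caller asynchronously. Otherwise the time is attributed to whichever later stage happens to synchronize first.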

by the real mccoy (Guest)


Uwe wrote:
> I think you have to find the bottleneck.
I guess if he hasn't found it in the 1.5 years this thread has existed, 
there is no chance he ever will.

by FPGA-Professional (Guest)


For all the designs I have come across so far, I can say that 
transporting data to and from the GPU, over whatever interface, is 
always the bottleneck, even if one uses PCIe.

This is the reason why people commonly use FPGAs for this.

As soon as the data interchange becomes more complex than just shifting 
from one cell to the next (which GPU elements can do very well), the 
heavenly power of GPU systems is brought to its knees.

> We use parallel programming techniques like CUDA and it increased the
> speed significantly but we were thinking whether reimplementing this
> algorithm in FPGA would increase the speed.
This is the point: "parallel techniques" does not tell the full 
truth about a design. CUDA and FPGA-DSP techniques differ significantly 
from each other, and while the step from C- and DSP-based processing to 
parallel CUDA code is small, transferring algorithms to FPGAs will fail 
if the fundamental issues are not understood correctly.

At first, FPGAs will perform worse when they are only filled with 
soft-core-like ALUs doing DSP-style processing. Totally new structures 
have to be invented first to overcome these limits and take 
advantage of the FPGA's real capabilities.

Since the tree of information mostly looks like this:

Design - Software - DSP
                  - CUDA GPU

       - Hardware - FPGA soft core
                  - FPGA calculation chains

it will never be possible to transform a design from software to FPGAs 
efficiently.
