EmbDev.net

Forum: ARM programming with GCC/GNU tools coding optimization help !


by Jonathan D. (dumarjo)


Hi all,

I'm a teacher supervising some student projects at college level. For
almost the first time, the students are working with a real CPU, an ARM7 :)

I told them to use a fingerprint library called FVS
(http://fvs.sourceforge.net/).

It is written in C / C++ and does fingerprint recognition.
So far, we have tested it on a PC-based system and the library seems to
work well.

So far the code compiles for the ARM7 (SAM7S) part.

The problems are:

- The code is slow in execution. This is due to big loops that perform
floating point operations.

- All the analysis of the bitmap is done with byte operations.

I would like to know whether there is some documentation that
can help them optimize this source code. So far, we can analyze a
fingerprint in 12-15 seconds. It's not too bad, but if we can get the
time under 10 seconds I'll be happy.

Thanks for your help

Jonathan

by Simon E. (fordp)


Can the maths not be switched to fixed point calculations?

http://en.wikipedia.org/wiki/Fixed_point_(mathematics)

The ARM7 will do fixed point 2 orders of magnitude quicker.

Consider Cortex M3 too, as that has an integer hardware divide so if
divide is a big issue the Cortex M3 chips will go MUCH faster than ARM7.

The Luminary Micro dev boards look great for Academic hacking too.

by Simon E. (fordp)


Sorry, that Wikipedia link was wrong!

Try this http://members.aol.com/form1/fixed.htm.

Or just google "fixed point maths".

Cheers.

by Jonathan D. (dumarjo)


Simon Ellwood wrote:
> Sorry that wikipedia link was wrong ! Sorry.
>
> Try this http://members.aol.com/form1/fixed.htm.
>
> Or just google "fixed point maths".
>
> Cheers.

Hi,

Thanks for the info.

The project is already in progress with an ARM7, so switching to the M3
is not really an option.

I will have a look at this fixed point maths.

regards

Jonathan

by Clifford S. (clifford)


Jonathan Dumaresq wrote:

> - The code is slow in execution, This is due to big loops that compute
> floating point operation.
>
You need to consider whether what you are expecting is realistic. Most
ARM parts (certainly ARM7 parts) have no floating point hardware.
Consider that on an ARM9 with a VFP floating point unit, software
floating point is about 5 times slower than hardware floating point
operations (without using vectorisation optimisations, since there is
not yet a compiler that will do that for the VFP unit). Further consider
that your ARM7 is intrinsically slower than an ARM9 due to instruction
set differences and lack of cache, and the fact that it is probably
running from Flash. It is also likely slower because it will be running
at sub-100 MHz as opposed to the 1 to 2 GHz or more of the PC
implementation. The consequence is that I would expect a 60 MHz ARM7
running floating point intensive code to be 100 to 200 times slower than
the PC implementation.


> - all the analyze for the bitmap is done in byte operation.
>
> I would like to know if it is possible to find some documentation that
> can help them to optimize this source code. So far, we can analyze a
> fingerprints iin 12-15 second. It not too bad, but if we can reduce the
> time under 10 second i'll be happy.

Converting the floating point operations to fixed point should be
sufficient to achieve your goal. But it is not that simple: you have to
consider range, precision and data width. Greater range means less
precision or greater data width; greater data width means more memory to
be moved and possibly more instructions (if you use 64-bit types on a
32-bit processor, for example).
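To make the range/precision/width trade-off concrete, here is a minimal sketch of one possible format, Q16.16 in a 32-bit integer. The format choice and all names (q16_16, q_mul, q_div and so on) are assumptions for illustration; they are not taken from FVS or from any particular fixed point library.

```cpp
#include <cassert>
#include <cstdint>

// Q16.16: 16 integer bits (including sign) and 16 fractional bits.
// Range is roughly -32768 .. +32767.99998, resolution is 1/65536.
// Choosing more fractional bits would buy precision at the cost of range.
using q16_16 = std::int32_t;

constexpr int kFracBits = 16;
constexpr q16_16 kOne = q16_16{1} << kFracBits;  // 1.0 in Q16.16

// Convert from double, rounding to nearest. Meant for constants and
// tests, not for inner loops (it uses floating point itself).
inline q16_16 q_from_double(double x) {
    assert(x > -32768.0 && x < 32767.0);  // stay clear of the Q16.16 limits
    return static_cast<q16_16>(x * kOne + (x >= 0 ? 0.5 : -0.5));
}

inline double q_to_double(q16_16 x) {
    return static_cast<double>(x) / kOne;
}

// Multiply: the raw product carries 32 fractional bits, so it is computed
// in 64 bits and shifted back down to 16. This is where the "greater data
// width" cost shows up on a 32-bit processor.
// Note: right-shifting a negative value is arithmetic on GCC; before
// C++20 the standard leaves it implementation-defined.
inline q16_16 q_mul(q16_16 a, q16_16 b) {
    std::int64_t wide = static_cast<std::int64_t>(a) * b;
    return static_cast<q16_16>(wide >> kFracBits);
}

// Divide: pre-scale the dividend so the quotient keeps 16 fractional bits.
inline q16_16 q_div(q16_16 a, q16_16 b) {
    assert(b != 0);
    return static_cast<q16_16>((static_cast<std::int64_t>(a) * kOne) / b);
}
```

Addition and subtraction need no helper, since two Q16.16 values add as plain integers. Neither q_mul nor q_div checks for overflow of the result, so the caller has to know that the values stay within range.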

Most available fixed point libraries are sub-optimal but may
be good enough for your application. Converting floating point C code to
fixed point is not trivial, since simple arithmetic operations like * / +
- must be replaced with functions or macros, which makes the resulting
code harder to read. This is an ideal use for C++ since it supports
operator overloading.
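As a rough illustration of the operator overloading point, the sketch below wraps a Q16.16 value in a small class so that arithmetic keeps its usual notation. The class name Fixed, its members and the weighted_sum helper are hypothetical; this is not the code from any specific library or article.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical Q16.16 wrapper, for illustration only.
class Fixed {
public:
    Fixed() : raw_(0) {}
    explicit Fixed(int v) : raw_(static_cast<std::int32_t>(v) * kOne) {}
    explicit Fixed(double v) : raw_(static_cast<std::int32_t>(v * kOne)) {}

    static Fixed from_raw(std::int32_t r) { Fixed f; f.raw_ = r; return f; }
    std::int32_t raw() const { return raw_; }
    double to_double() const { return static_cast<double>(raw_) / kOne; }

    friend Fixed operator+(Fixed a, Fixed b) { return Fixed::from_raw(a.raw_ + b.raw_); }
    friend Fixed operator-(Fixed a, Fixed b) { return Fixed::from_raw(a.raw_ - b.raw_); }

    // 64-bit intermediate, then drop the extra 16 fractional bits.
    friend Fixed operator*(Fixed a, Fixed b) {
        std::int64_t wide = static_cast<std::int64_t>(a.raw_) * b.raw_;
        return Fixed::from_raw(static_cast<std::int32_t>(wide >> kFracBits));
    }

    // Pre-scale the dividend so the quotient stays in Q16.16.
    friend Fixed operator/(Fixed a, Fixed b) {
        assert(b.raw_ != 0);
        std::int64_t wide = static_cast<std::int64_t>(a.raw_) * kOne;
        return Fixed::from_raw(static_cast<std::int32_t>(wide / b.raw_));
    }

    friend bool operator==(Fixed a, Fixed b) { return a.raw_ == b.raw_; }

private:
    static constexpr int kFracBits = 16;
    static constexpr std::int32_t kOne = 1 << kFracBits;
    std::int32_t raw_;
};

// Code using the type reads like the floating point original.
inline Fixed weighted_sum(Fixed a, Fixed wa, Fixed b, Fixed wb) {
    return a * wa + b * wb;
}
```

With such a type, porting a routine can be largely a matter of changing the declared types, whereas a C version would need a call such as a hypothetical fixed_mul(a, b) at every multiplication.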

The current issue of Dr. Dobb's has an article on just this subject with
a case study on an almost identical problem (porting a PC implementation
to an ARM device). It used C++ but if you had the time I guess you could
implement it as a C library, but it would be far less elegant.

Of course another approach would be to come up with your own less
compute intensive algorithm. Often it is sufficient for such systems to
be less than 100% accurate (by allowing a proportion of false
positives). For example a building access system might require both a
matching fingerprint and a swipe card or PIN number. By using fewer
features in the source data and thereby being less accurate, the
matching can be speeded up.


> I'm a teacher that supervise some student project at college level. For
> near the first time, the student work with real CPU, an ARM7 :)

It somewhat concerns me that someone apparently teaching embedded
systems was not already aware of fixed-point arithmetic, or the issues
regarding floating point code on typical embedded micro-controllers
without an FPU.

Clifford

by Clifford S. (clifford)


Clifford Slocombe wrote:
> The current issue of Dr. Dobb's has an article on just this subject with
> a case study on an almost identical problem (porting a PC implementation
> to an ARM device). It used C++ but if you had the time I guess you could
> implement it as a  C library, but it would be far less elegant.

Sorry, omitted the link: http://www.ddj.com/cpp/207000448

by Jonathan D. (dumarjo)


Clifford Slocombe wrote:
> Clifford Slocombe wrote:
>> The current issue of Dr. Dobb's has an article on just this subject with
>> a case study on an almost identical problem (porting a PC implementation
>> to an ARM device). It used C++ but if you had the time I guess you could
>> implement it as a  C library, but it would be far less elegant.
>
> Sorry, omitted the link: http://www.ddj.com/cpp/207000448

Hi Clifford,

I have looked at some fixed point algorithms, and with the source code we
have, it will be difficult to convert. Many cos/tan/atan2/sin calls are used.
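For reference, one common way to handle sin and cos without floating point in the inner loop is a lookup table with linear interpolation. The sketch below is only an assumption about how that could look: the names fx_sin and fx_cos, the 256-entry table and the 16-bit "binary angle" convention are all made up for illustration, and tan and atan2 would need separate treatment.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

namespace {

constexpr int kTableBits = 8;                 // 256 entries per full turn
constexpr int kTableSize = 1 << kTableBits;
constexpr int kInterpBits = 16 - kTableBits;  // low angle bits used to interpolate
constexpr double kTwoPi = 6.283185307179586;

// Sine values in Q16.16 (65536 == 1.0). Built once with floating point
// here; on a small target the table could instead be precomputed and
// stored in Flash as constants.
struct SineTable {
    std::int32_t v[kTableSize + 1];  // one extra entry so v[idx + 1] is valid
    SineTable() {
        for (int i = 0; i <= kTableSize; ++i) {
            v[i] = static_cast<std::int32_t>(
                std::lround(std::sin(kTwoPi * i / kTableSize) * 65536.0));
        }
    }
};

const SineTable& sine_table() {
    static const SineTable table;
    return table;
}

}  // namespace

// Angle is a 16-bit binary angle: 0..65535 maps onto 0..2*pi and wraps
// around naturally on overflow. Result is Q16.16.
inline std::int32_t fx_sin(std::uint16_t angle) {
    const SineTable& t = sine_table();
    int idx = angle >> kInterpBits;
    assert(idx < kTableSize);  // always holds for a 16-bit angle
    std::int32_t frac = angle & ((1 << kInterpBits) - 1);
    std::int32_t a = t.v[idx];
    std::int32_t b = t.v[idx + 1];
    // Linear interpolation between neighbouring entries. The shift of a
    // possibly negative value is arithmetic on GCC.
    return a + (((b - a) * frac) >> kInterpBits);
}

// cos(x) = sin(x + pi/2); a quarter turn is 16384 binary angle units.
inline std::int32_t fx_cos(std::uint16_t angle) {
    return fx_sin(static_cast<std::uint16_t>(angle + 16384u));
}
```

Whether the accuracy of such a table is good enough for fingerprint matching would have to be checked against the floating point results on the PC.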

Speed is not really our main goal here; we are trying to get the code
working as a proof of concept.

Your explanations are exactly what I have discussed with my students.

Regards

jonathan
