Hi all,

I have a question for everyone. I tested an LPC2146 (72 MHz) with this calculation, where the variables are

    DWORD a, b;
    float m;

    b = a*(1 - m*a*a);

and it takes 7.5 us. Is an ARM9 from ST or NXP faster?

Regards

Sevc Dominik wrote:
> Hi all.
> I have question for all.
> I test LPC2146 (72MHz) with this function
> if variable are
> DWORD a,b
> float m
>
> b = a*(1-m*a*a);
>
> and it's for me 7,5us time.
> is ARM9 faster??? from ST or NXP .
>
> regards

On ARM7 you don't have a hardware floating-point unit, so every floating-point operation is done in software (library routines). That is not good for speed. You could look into fixed-point multiplication, which would give you better speed. I found a class that encapsulates this, but I can't remember where. If you google it, you should find it; I'm pretty sure you can find the info on this board.

As for ARM9, you should get better speed as the CPU is faster.

Hope this helps,
Jonathan

Thanks Jonathan.

I need to calculate some values, with

    m = 530/(72000000*72000000)

so m is a very small number, and I need to calculate this:

    DWORD a, b;
    float m;

    b = a*(1 - m*a*a);

Do any ARM9 parts have only a faster core (up to 96 MHz), or do some have a hardware floating-point unit, like the STR9 or others? I need an MCU with internal flash.

Regards

```
Sevc Dominik wrote:
> or any help how to make good formula to calculate my variable????
Try to describe the complete calculation. How is "b" used in the
following code?
```

Martin Thomas wrote:
> Sevc Dominik wrote:
>> or any help how to make good formula to calculate my variable????
>
> Try to describe the complete calculation. How is "b" used in the
> following code?

I need to calculate the delay for a stepper motor. This formula is an approximation of acceleration/deceleration: b is T0MR0, a is the delay value from the first step or the previous step, and m is the multiplier (m = accel/(Fosc*Fosc)). If the formula is a = a*(1 - m*a*a), the motor accelerates; if it is a = a*(1 + m*a*a), it decelerates. It works fine, but I want to generate more steps per second, so I need to simplify this formula or calculate it more efficiently.

Regards, and sorry for my English.

Not many off-the-shelf ARM microcontrollers (as opposed to FPGA soft core IP) include an FPU. The choices I am aware of are:

    NXP LPC3xxx (220MHz ARM9 with VFP coprocessor)
    Freescale i.MX31 (600MHz ARM11 with VFP)

The LPC3xxx runs at 220MHz, and FP in hardware is about 5 times faster than software floating point operations at the same clock rate. Plus the ARM9 has cache memory, so you could expect it to be at least 15 times faster than your LPC2xxx part. The VFP supports SIMD instructions, so it is possible to achieve even faster throughput for some types of calculation, but you would have to code in assembler; I have not come across an ARM compiler yet that can perform vectorisation for VFP operations.

Note that if you are going to use the floating point math library, you will need to rebuild the library from source to utilise the VFP. Even then you will have to re-implement the sqrt()/fsqrt() functions to use the SQRT/FSQRT instruction instead of a software algorithm.

All that said, it is a lot of expense to go to just for a delay calculation. A sledgehammer to crack a nut, in fact! Plenty of motor controllers work with far less.

First you might ask yourself how fast the calculation actually needs to be to guarantee deadlines. If 7.5us is fast enough, why worry? Fast enough is fast enough, after all.

A very fast alternative solution (at the expense of memory) would be to pre-calculate delay values and place them into a look-up table. If the range of values is large, then it may be sufficient to use a sparse table with linear interpolation between values.

If you do

    m = 530/(72000000*72000000)

you will have trouble with fixed point arithmetic, and I would suggest trouble with precision in floating point too: a float can represent a value that small, but it carries only about 7 significant digits, so 1 - m*a*a rounds to exactly 1 whenever m*a*a is small enough, and a double would be even slower. However there is a lot of redundancy there; how much precision do you need?
Consider this:

    530 * 10000 / (72*72)       = 1022
    530 / (72000000*72000000)   = 1.022e-13

Note that the significant figures are the same, but the scale is different. The trick is to work in different units so you can use integer arithmetic and scaled integers. Now if you make your scaling a power of two (so instead of, say, 10000 use 8192), it will be even faster because the compiler will replace the / and * operations with bit-shift operations (you can do it explicitly in code if you wish, but the compiler will do it anyway).

Using a fixed point library or class may cause you problems in this situation; it is often best to deal with the calculation directly so that you can use different scaling values for different parts of the calculation to maximise precision and avoid overflow. I usually start with decimal scaling because it is easier to see whether the answer is what you expected and where you might have missed a scaling, and then switch the scale factors to binary afterwards.

I also generally model the calculations in a spreadsheet first to ensure they are correct, since that is easier than target debugging. You can calculate a full range of values all at once and graph them. If you do that, use separate cells for partial calculations to ensure that no sub-expression overflows are hidden. Then I would prototype the code using a desktop compiler such as VC++ to make sure the C implementation matches the spreadsheet model, and because debugging is so much easier. You can get the prototype to output comma separated data and import that directly into a spreadsheet and graph it to compare with the model.

My recommendation, however, is that if you can do it with a lookup table, then do so. It will be lightning fast, even with interpolation.

Also it may be easier to look at the whole thing differently. You can simply calculate the desired position over time and issue the appropriate number of steps to get to that position from the current position.
That is far simpler than calculating individual step periods. Like this:

    t = current time
    p = current position
    d = desired position at time t
    error = d - p
    if error < 0
        direction = reverse
    else
        direction = forward
    issue |error| steps

If the loop time is fast enough, error will always be small. Timing is far less critical because you issue the pulses at the fastest possible speed to catch up to where you should be at that time.

This method works for constant velocity and acceleration. If you use a hardware timer to do precisely timed periodic updates (perhaps using an RTOS thread), then it becomes even simpler because delta-t is a constant.

Clifford
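
The sparse look-up table with linear interpolation mentioned above might look roughly like the following in C. This is only a sketch: the table values, the spacing of one entry per 1024 steps, and the names `delay_table` and `delay_for_step` are made up for illustration and are not taken from any real motor profile.

```c
#include <stdint.h>

/* Hypothetical sparse table: one delay value (timer ticks) per 1024 steps.
   The numbers are placeholders, not real motor data. */
#define TABLE_SHIFT 10u
#define TABLE_SIZE  5u

static const uint32_t delay_table[TABLE_SIZE] = {
    40000u, 20000u, 14000u, 11500u, 10000u
};

/* Return the delay for a given step index, interpolating linearly
   between the two nearest table entries. */
uint32_t delay_for_step(uint32_t step)
{
    uint32_t idx  = step >> TABLE_SHIFT;                 /* table entry below */
    uint32_t frac = step & ((1u << TABLE_SHIFT) - 1u);   /* offset 0..1023    */

    if (idx >= TABLE_SIZE - 1u) {
        return delay_table[TABLE_SIZE - 1u];   /* clamp past the last entry */
    }

    int32_t lo = (int32_t)delay_table[idx];
    int32_t hi = (int32_t)delay_table[idx + 1u];

    /* lo + (hi - lo) * frac / 1024; assumes (hi - lo) * 1023 fits in 32 bits */
    return (uint32_t)(lo + ((hi - lo) * (int32_t)frac) / (int32_t)(1 << TABLE_SHIFT));
}
```

With a power-of-two spacing the index and the fraction come from a shift and a mask, so the per-step cost is one multiply, one divide by a constant, and two table reads.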

Hi Clifford.

1. I want to use an MCU with internal flash (low price, easy board design, ...). The LPC3000 series uses external flash.
2. I need to calculate acceleration/deceleration and generate output pulses with linear or circular interpolation (that takes about 7 us). In total it takes about 14 us, so my maximum step rate is 1/14 us, which is 71428 steps per second. That is good, but the problem is that I also need free time to communicate through the serial port.
3. A look-up table for the full set of samples is not realistic: for a given maximum speed and acceleration it is about 2,000,000 samples. I have also thought about creating a look-up table of only about 10,000 samples and using linear interpolation to calculate the value, but I don't know whether it would take more time than the current 7.5 us.
4. I think a good way would be to use a fixed-point library or class. The value m (acceleration in steps / (osc * osc)) is effectively a constant and is calculated only once. The full formula, dtime = dtime*(1 - m*dtime*dtime), is calculated for every next step. Look at http://www.hwml.com/LeibRamp.htm; see [20] there, that is what I use.

Hmm, can anyone help me use a fixed-point library? In a previous post dumarjo gave some code, but how do I use it? I will try something this weekend.

Regards
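
For reference, the per-step formula dtime = dtime*(1 - m*dtime*dtime) can be written without floating point by expanding it to dtime - accel*dtime^3/Fosc^2 and dividing in two stages so that every intermediate fits in 64 bits. The following is a minimal sketch under stated assumptions, not a tuned implementation: the function name `next_delay_accel`, the limits `dtime < 2^24` ticks and `accel < 2^16`, and the choice to leave the delay unchanged when the correction is out of range are all assumptions made for illustration.

```c
#include <stdint.h>

#define FOSC_HZ 72000000ull   /* timer clock quoted in the thread */

/* One acceleration step: d_next = d * (1 - m*d*d) with m = accel / Fosc^2,
   evaluated as d - accel*d^3 / Fosc^2 in integer arithmetic.
   Assumes d < 2^24 ticks and accel < 2^16 (assumptions, so that
   accel*d*d cannot overflow 64 bits). */
uint32_t next_delay_accel(uint32_t d, uint32_t accel)
{
    uint64_t stage1 = ((uint64_t)accel * d * d) / FOSC_HZ;  /* accel*d^2 / Fosc   */
    uint64_t corr   = (stage1 * d) / FOSC_HZ;               /* accel*d^3 / Fosc^2 */

    if (corr >= d) {
        return d;   /* outside the range where the approximation is meaningful */
    }
    return d - (uint32_t)corr;
}
```

With `d = 100000` and `accel = 530` this gives 99898, against about 99897.8 from the floating-point formula. Note the limitation: for short delays the correction is smaller than one tick and truncates to zero (for example `d = 1000` stays at 1000), so a practical version would have to carry fractional bits in `d`. Deceleration would add the correction instead of subtracting it.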

Sevc Dominik wrote:
> http://www.hwml.com/LeibRamp.htm , there is [20] I'm use it.

Hmmm... I realise that my response was long and suggested a number of solutions, none of them in detail, but I am pretty sure I would not do it that way. I'd go with my last suggestion - the "where should I be now?" approach. It is far less time critical and the errors will be minute. Errors will grow if CPU time is taken up in other processing, but your performance can now be defined in terms of the maximum error your application can sustain rather than time deadlines, which is probably more closely related to the application. A fixed periodic update will make it more deterministic, but at the expense of greater average error.

To expand on my earlier proposal, where v is velocity, a is acceleration, dt is the change in time since the last update, p is position, and np the 'next' position (pseudocode):

    np = p + v * dt + (a * dt * dt) / 2
    v = v + a * dt
    direction = np < p ? -1 : 1
    error = abs(np - p)
    if( error > MAX_ERROR )
    {
        // handle exception
    }
    else
    {
        for( i = 0; i < abs(np - p); i++ )
        {
            step( direction )
        }
    }

Because this does not depend on the huge value of Fosc, you are far less likely to end up with overflow issues when using fixed point arithmetic. Even if using floating point you are less likely to end up with loss of precision or range errors.

Clifford
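
As an illustration only, here is one way the update above could be written with scaled integers when it runs at a fixed period, so that dt can be taken as 1 and drops out of the equations. The 16 fractional bits, the type `axis_t`, and the function `axis_update` are assumptions made for this sketch; position is in steps, velocity in steps per update, and acceleration in steps per update squared, all scaled by 65536.

```c
#include <stdint.h>

#define FRAC_BITS 16
#define ONE ((int64_t)1 << FRAC_BITS)   /* 1.0 in this fixed-point format */

/* Hypothetical per-axis state for the sketch. */
typedef struct {
    int64_t p;   /* position in steps, scaled by 2^16          */
    int32_t v;   /* velocity in steps per update, scaled by 2^16 */
} axis_t;

/* Advance one fixed update period (dt = 1) with acceleration a (scaled by 2^16).
   Returns the signed number of whole steps to issue:
   positive = forward, negative = reverse. */
int32_t axis_update(axis_t *ax, int32_t a)
{
    int64_t np = ax->p + ax->v + a / 2;   /* np = p + v*dt + (a*dt*dt)/2 */

    /* Whole steps crossed between the old and the new position. */
    int32_t steps = (int32_t)(np / ONE - ax->p / ONE);

    ax->v += a;    /* v = v + a*dt                   */
    ax->p  = np;   /* the next position becomes current */
    return steps;
}
```

The caller would compare the magnitude of the returned value with its own MAX_ERROR and then call its step routine that many times in the indicated direction. Starting from rest with `a = 16384` (0.25 steps per update squared), four updates issue 2 steps in total, which matches p = a*t*t/2 = 2.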

Hi.

I tried this:

    IOSET1 = (1<<5); // output to oscilloscope
    np = p + v * dt + (a * dt * dt) / 2;
    v = v + a * dt;
    direction = np < p ? -1 : 1;
    error = abs(np - p);
    IOCLR1 = (1<<5); // output to oscilloscope

and it takes 280 ns. I am thinking about your solution, Clifford.

Regards.

Sevc Dominik wrote:
> Hi .
> I try this:
>
> IOSET1 = (1<<5); // output to osciloskop
> np = p + v * dt + (a*dt*dt) / 2;
> v = v + a * dt;
> direction = np < p ? -1 : 1;
> error = abs(np - p );
> IOCLR1 = (1<<5); // output to osciloskop
>
> ant it's : 280ns
> I meditate about your solution Cifford.
>
> regards.

20 clock cycles! I am not sure I believe that, especially if the variables were floating point! Some of the calculations will have been optimised out if you did nothing with the results. Pass all the variables to a printf() call after the scope output; that will prevent them from being optimised away. Do not use the volatile keyword, since that will prevent other valid optimisations.

Your timing mechanism is flawed. On many microcontrollers GPIO operations require wait states. This probably says more about GPIO performance than calculation performance; and for that, 280ns I can believe - it is of the same magnitude as your 1/14th us step pulse capability. You need to time repeated calculations so that the I/O timing becomes insignificant.

Note my 'solution' was not intended to be directly translated to code. It was merely the algorithm and equations. You will need to translate them to machine math. You might use floating point, or you could translate it to fixed point (scaled integer) math. With fixed point math there are additional operations to perform the scaling. If you use a library, it will be far slower than optimising each calculation yourself in one expression, since you can determine when scaling is necessary.
For example, if you use three decimal place fixed point to calculate:

    a * b / c

a library implementation would result in:

    multiply: temp = (a * b) / 1000
    divide:   (temp * 1000) / c

whereas you could optimise the expression as simply

    a * b / c

because the scale factors cancel out. Moreover, for small values of a and b the library solution will give incorrect results, because the first step will significantly lose precision (or even result in zero!). Of course you'd not use 1000, but probably 1024 for the reasons I mentioned earlier.

Personally I would not trust a fixed point library solution; it will be less efficient, and many I have come across on the Internet are seriously flawed in any case - do it by hand.

Note an omission in my suggestion: np must be copied to p:

    error = abs(np - p)
    p = np

Clifford
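
The difference between the library-style and the hand-optimised form of a * b / c can be seen in a small C sketch. It assumes a scale factor of 1024 and values held as plain `int32_t`; the helper names `fx_mul`, `fx_div`, `abc_library`, and `abc_direct` are invented here and do not come from any particular fixed point library.

```c
#include <stdint.h>

#define SCALE 1024   /* power-of-two scale factor: stored value = real value * 1024 */

/* Library-style operations: each one rescales its own result. */
int32_t fx_mul(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * b) / SCALE);
}

int32_t fx_div(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * SCALE) / b);
}

/* a * b / c built from the library operations. */
int32_t abc_library(int32_t a, int32_t b, int32_t c)
{
    return fx_div(fx_mul(a, b), c);
}

/* a * b / c done by hand: the scale factors of the multiply and the
   divide cancel, so no rescaling is needed and nothing is lost early. */
int32_t abc_direct(int32_t a, int32_t b, int32_t c)
{
    return (int32_t)(((int64_t)a * b) / c);
}
```

For comfortable values the two agree: with a = 2048, b = 3072, c = 1536 (2.0, 3.0, 1.5) both return 4096, which is 4.0. For small values they do not: with a = 10, b = 20, c = 1 (roughly 0.01, 0.02, 0.001) the library form returns 0 because the intermediate product truncates to zero, while the direct form returns 200, close to the ideal 0.2 * 1024.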