EmbDev.net

Forum: ARM programming with GCC/GNU tools
AVR GCC writing a bit from one byte into another


by Alex P. (alex_p)


Hello community,

Programming my Atmega8 I came across the task of doing the following 
operation:

bit A of byte BUFFER = bit B of byte PORT


I have 2 questions, the first is about efficiency:
I'd like to read from a port with this operation (yes, it would be 
better to do this with a PAL or interface chip), and therefore I only 
have 30 or so clock cycles to do my bit operation.
I thought this should be plenty, but it wasn't. Looking at the 
disassembly it became clear why. Is there a nicer or more efficient way 
to do this?

Secondly, I'd like to have a timer interrupt (here I wrote the dummy 
function start_timer()) which should interrupt the reading after a 
certain period (for example if no more data comes in; this is not shown 
in the code for simplicity).
In Java I have this wonderful throw operation for interrupts. Is there a 
possibility to do this here as well, or do I really need to check the 
timer every iteration? What's the most efficient implementation?
In Assembler I imagine I could, within the timer interrupt routine, 
change the return address, so after the timer interrupt the program 
doesn't jump back to my loop but to the code after it. Is this possible 
in C?


I'm using AVR Studio 5 with the maximum optimization option for my ATmega8 
running at 16 MHz. I'd like to read a data bus at 500 kHz, which leaves 
32 clock cycles per bit.


I'd be glad for any hints or book suggestions :)
Thank you very much in advance and best regards
Alex

// C-CODE
void myfunction() {
  start_timer(); // Starts a timer that should interrupt the loop
  for(uint16_t i=0; i<1000; i++) { // This is time critical -> I don't want to check the timer every loop iteration
    if(PORTD & (1<<DATA)) buffer[i/8] |= 1 << (i&0xFF);
  }
}


# DISASSEMBLY
    # void myfunction() {
    # start_timer(); // Starts a timer that should interrupt the loop
    # for(uint16_t i=0; i<1000; i++) { // This is time critical -> I don't want to check the timer every loop iteration
0000042C  LDI R24,0x00    Load immediate
0000042D  LDI R25,0x00    Load immediate
    # if(PORTD & (1<<DATA)) buffer[i/8] |= 1 << (i&0xFF);
0000042E  LDI R20,0x01    Load immediate
0000042F  LDI R21,0x00    Load immediate
00000430  SBIS 0x12,2    Skip if bit in I/O register set
00000431  RJMP PC+0x0014    Relative jump
    # if(PORTD & (1<<DATA)) buffer[i/8] |= 1 << (i&0xFF);
00000432  MOVW R30,R24    Copy register pair
00000433  LSR R31    Logical shift right
00000434  ROR R30    Rotate right through carry
00000435  LSR R31    Logical shift right
00000436  ROR R30    Rotate right through carry
00000437  LSR R31    Logical shift right
00000438  ROR R30    Rotate right through carry
00000439  SUBI R30,0x6B    Subtract immediate
0000043A  SBCI R31,0xFE    Subtract immediate with carry
0000043B  MOVW R18,R20    Copy register pair
0000043C  MOV R0,R24    Copy register
0000043D  RJMP PC+0x0003    Relative jump
0000043E  LSL R18    Logical Shift Left
0000043F  ROL R19    Rotate Left Through Carry
00000440  DEC R0    Decrement
00000441  BRPL PC-0x03    Branch if plus
00000442  LDD R22,Z+0    Load indirect with displacement
00000443  OR R22,R18    Logical OR
00000444  STD Z+0,R22    Store indirect with displacement
    # for(uint16_t i=0; i<1000; i++) { // This is time critical -> I don't want to check the timer every loop iteration
00000445  ADIW R24,0x01    Add immediate to word
00000446  LDI R18,0x03    Load immediate
00000447  CPI R24,0xE8    Compare with immediate
00000448  CPC R25,R18    Compare with carry
00000449  BRNE PC-0x19    Branch if not equal
}
0000044A  RET     Subroutine return

by Oliver (Guest)


A variable shift is a very expensive operation on an AVR. Better use 
constant shifts.
  for(uint16_t i=0; i<1000; i++) { // This is time critical -> I don't want to check the timer every loop iteration
    switch (PORTD & (1<<DATA)) {
    case 1<<0:
      buffer[i/8] |= 1<<0;
      break;
    case 1<<1:
      buffer[i/8] |= 1<<1;
      break;
    case 1<<2:
      buffer[i/8] |= 1<<2;
      break;
    case 1<<3:
      buffer[i/8] |= 1<<3;
      break;
    case 1<<4:
      buffer[i/8] |= 1<<4;
      break;
    case 1<<5:
      buffer[i/8] |= 1<<5;
      break;
    case 1<<6:
      buffer[i/8] |= 1<<6;
      break;
    case 1<<7:
      buffer[i/8] |= 1<<7;
      break;
    }
  }

Oliver

by Oliver (Guest)


>  switch (PORTD & (1<<DATA)) {

should be

>if (PORTD & (1<<DATA))
>  switch (i&0xFF)

but anyhow, you got the idea ;)

Oliver

by Rolf Magnus (Guest)


> have 30 or so clock cycles to do my bit operation.
> I thought this should be plenty, but it wasn't. Looking at the
> disassembly it became clear why.

The main problem here is that the AVR can only shift by one bit, so if 
you get your shift width from a variable, the compiler has to do this as 
a loop. In addition, since i is of type int, the shift operation is 
done in 16 bits.

> Is there a possibility to do this here as well, or do I really need to check the
> timer every iteration? What's the most efficient implementation?

I would check it every iteration. Alternatively, you can make two nested 
loops and only check once per iteration of the outer loop if it's ok to 
have the loop run for a few more microseconds before it stops. That 
would also have the advantage that your loop counters could be 8 bit and 
you could lose the division.

> In Assembler I imagine I could, within the timer interrupt routine,
> change the return address, so after the timer interrupt the program
> doesn't jump back to my loop but to the code after it. Is this possible
> in C?

No. There is setjmp/longjmp, but that probably won't work from an ISR. 
Even in assembler, I would consider it a dirty hack.

I'd suggest something like this:
for (uint8_t i = 0; i < 125; i++)
{
    uint8_t bitval = 1;
    for (uint8_t j = 0; j < 8; j++)
    {
        if(PORTD & (1<<DATA))
            buffer[i] |= bitval;
        bitval <<= 1;
    }
}

When compiling this with -O3, the code will be quite long due to loop 
unrolling, but since that code does 8 iterations, it should in fact be 
faster than yours.
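
The nested-loop idea, with the timeout checked only once per byte, could look roughly like the sketch below. It is written so that it also builds on a PC: `sample_pin`, `timed_out` and `alternating_pin` are hypothetical stand-ins that do not appear in the thread. On the ATmega8, the pin reader would be a test of `PIND & (1<<DATA)` and the flag would be set by a timer ISR. Clock synchronisation is left out, as in the original example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_BYTES 125u  /* 125 bytes * 8 bits = 1000 samples */

/* Hypothetical flag; on the AVR a timer ISR would set it to true. */
static volatile bool timed_out = false;

/* Hypothetical pin reader; on the AVR: return (PIND & (1 << DATA)) != 0; */
typedef bool (*sample_fn)(void);

/* Fills buffer LSB first and returns the number of complete bytes read. */
static uint8_t read_bits(uint8_t *buffer, sample_fn sample_pin)
{
    assert(buffer != NULL && sample_pin != NULL);

    uint8_t i;
    for (i = 0; i < NUM_BYTES && !timed_out; i++) {  /* timeout check: once per byte */
        uint8_t byte = 0;
        uint8_t bitval = 1;
        for (uint8_t j = 0; j < 8; j++) {
            if (sample_pin())
                byte |= bitval;
            bitval <<= 1;   /* always a shift by exactly one bit */
        }
        buffer[i] = byte;   /* one store per byte, no read-modify-write per bit */
    }
    return i;
}

/* Host-side test input: a pin that alternates 1, 0, 1, 0, ... */
static bool alternating_pin(void)
{
    static bool level = false;
    level = !level;
    return level;
}
```

Collecting the bits in a local variable and storing once per byte also means the buffer does not have to be cleared beforehand.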

by Oliver (Guest)


After reading your post a second time, there are some questions ;)

First of all,
>if(PORTD & (1<<DATA))
will not do what you want.

But:
What exactly do you want to achieve?
Is it necessary to do the bit shift during the measurement, or can it be 
done later?
What is DATA? Where does it come from?
What happens if you have finished all 1000 loop iterations without a 
timer interrupt?

Oliver

by Alex P. (alex_p)


Awesome! Thanks a lot Rolf and Oliver, that was just what I was looking 
for.

> I would check it every iteration.
What a pity! It would be awesome to have interrupts that can interrupt 
loops or even functions without much computational effort.

> if(PORTD & (1<<DATA))
I should have explained: DATA is just the bit number in PORTD where the 
data comes in. In fact, I had to test the clock as well and everything, 
but I left it out of the sample here because that worked fine.
That's why I wanted the timer interrupt, because what if I'm waiting for 
the clock and the sender doesn't want to send me any more data?
while(!(PORTD & (1<<CLOCK)) && (stillTimeLeft)) // I need to add the (stillTimeLeft)-bit to check the timer
       // because my interrupt can't stop this loop otherwise
I would get stuck in that loop then.
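
One common way to keep such a wait loop from hanging is to let the timer ISR do nothing but clear a flag, and have the loop poll that flag. The sketch below makes several assumptions: `still_time_left`, `on_timer_expired`, `wait_for_clock_high` and the two fake clock functions are hypothetical names. On the ATmega8, the handler body would sit in the timer's ISR and the pin test would read `PIND & (1<<CLOCK)`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Shared with the interrupt handler, hence volatile:
 * the compiler must re-read it on every poll. */
static volatile bool still_time_left = true;

/* Hypothetical handler body; on the AVR this would be the timer ISR. */
static void on_timer_expired(void)
{
    still_time_left = false;
}

typedef bool (*pin_fn)(void);

/* Waits until the clock pin reads high.
 * Returns false if the timeout came first. */
static bool wait_for_clock_high(pin_fn clock_is_high)
{
    assert(clock_is_high != NULL);
    while (!clock_is_high()) {
        if (!still_time_left)
            return false;
    }
    return true;
}

/* Host-side stand-in: a clock that goes high after three polls. */
static uint8_t polls_until_high = 3;
static bool fake_clock(void)
{
    if (polls_until_high == 0)
        return true;
    polls_until_high--;
    return false;
}

/* Host-side stand-in: a silent sender; the "timer" fires on the fifth poll. */
static uint8_t polls_until_timeout = 5;
static bool dead_clock(void)
{
    if (polls_until_timeout > 0 && --polls_until_timeout == 0)
        on_timer_expired();
    return false;
}
```

The extra flag test costs a few cycles per poll, but only in the wait loop, not in the bit-handling code itself.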


But your suggestion with the switch statement looks much better in the 
Disassembler:
  # case 5: buffer[i>>3] |= (1<<5); break;
00000555  MOVW R30,R24    Copy register pair
00000556  LSR R31    Logical shift right
00000557  ROR R30    Rotate right through carry
00000558  LSR R31    Logical shift right
00000559  ROR R30    Rotate right through carry
0000055A  LSR R31    Logical shift right
0000055B  ROR R30    Rotate right through carry
0000055C  SUBI R30,0x6B    Subtract immediate
0000055D  SBCI R31,0xFE    Subtract immediate with carry
0000055E  LDD R18,Z+0    Load indirect with displacement
0000055F  ORI R18,0x20    Logical OR with immediate
00000560  STD Z+0,R18    Store indirect with displacement
00000561  RJMP PC-0x0012    Relative jump

by Oliver (Guest)


>> if(PORTD & (1<<DATA))
>I should have explained, DATA is just the bitnumber where in PORTD the
>data comes in.

Well, in PORTD never anything will come in...

Again my question: Why can't you do the bitshifting stuff after the 
measurement loop has finished? This would reduce the required cycles in 
the measurement loop significantly.

Oliver

by Alex P. (alex_p)


> Well, in PORTD never anything will come in...
Aaah yes, I meant PIND of course, sorry.

> Why can't you do the bitshifting stuff after the
> measurement loop has finished?
That's what I did in the end, but I think the much nicer solution would 
have been to improve efficiency and do it directly instead of just 
"recording" it in realtime and then getting the data out afterwards.

Alex
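
The record-first approach could be sketched as follows: the time-critical loop only stores raw port bytes (something like `raw[i] = PIND;`), and the bits are packed afterwards, when timing no longer matters. `pack_bit` and its parameters are hypothetical names. RAM use grows to one byte per sample, which is significant on an ATmega8 with 1 KB of SRAM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Packs bit `bit` of each raw sample into `out`, LSB first.
 * `out` must hold at least (count + 7) / 8 bytes.
 */
static void pack_bit(const uint8_t *samples, size_t count,
                     uint8_t bit, uint8_t *out)
{
    assert(samples != NULL && out != NULL && bit < 8);

    const uint8_t src_mask = (uint8_t)(1u << bit);  /* computed once, outside the loop */
    uint8_t dst_mask = 1;

    memset(out, 0, (count + 7) / 8);
    for (size_t i = 0; i < count; i++) {
        if (samples[i] & src_mask)
            *out |= dst_mask;
        dst_mask = (uint8_t)(dst_mask << 1);
        if (dst_mask == 0) {   /* eight bits done: wrap the mask, go to the next byte */
            dst_mask = 1;
            out++;
        }
    }
}
```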

by Joseph (Guest)


Hi all,
I have been using this macro for a long time.
#define checkbit(byte,bit) ((byte) & (1 << (bit)))
#define bitcopy(dest_byte,dest_bit,src_byte,src_bit) ((dest_byte) = checkbit(src_byte,src_bit) ? ((dest_byte) | (1 << (dest_bit))) : ((dest_byte) & ~(1 << (dest_bit))))

But I didn't check the assembly it generates.
I hope this helps.
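
A function form of the same operation avoids the usual macro pitfalls (arguments evaluated more than once, missing parentheses). The sketch below is an illustration rather than Joseph's macro, and `copy_bit` is a hypothetical name. As with the macro, the generated assembly should be checked before relying on it in a time-critical loop.

```c
#include <assert.h>
#include <stdint.h>

/* Returns dest with bit dest_bit replaced by bit src_bit of src. */
static inline uint8_t copy_bit(uint8_t dest, uint8_t dest_bit,
                               uint8_t src, uint8_t src_bit)
{
    assert(dest_bit < 8 && src_bit < 8);

    const uint8_t mask = (uint8_t)(1u << dest_bit);
    if (src & (1u << src_bit))
        return (uint8_t)(dest | mask);        /* source bit is 1: set the target bit */
    return (uint8_t)(dest & (uint8_t)~mask);  /* source bit is 0: clear the target bit */
}
```

Usage would be along the lines of `buffer[k] = copy_bit(buffer[k], 7, PIND, 2);`.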

by gjl (Guest)


If you really need to squeeze out the last bit of performance, you can 
use some GCC fu:
 
/* In some header */
#include <stdint.h>

/* Same as  1 << (n % 8)  */
static __inline__ __attribute__((__always_inline__))
uint8_t bitmask_asm (uint8_t n)
{
    uint8_t mask;
    __asm__ ("ldi  %0, 1 << 2  $"
             "sbrs %1, 1       $"
             "ldi  %0, 1 << 0  $"
             "sbrc %1, 0       $"
             "lsl  %0          $"
             "sbrc %1, 2       $"
             "swap %0"
             : "=&d" (mask) : "r" (n));
    return mask;
}

/* Same as  1 << (n % 8)  */
static __inline__ __attribute__((__always_inline__))
uint8_t bitmask (uint8_t n)
{
    return __builtin_constant_p (n) ? (1 << (n % 8)) : bitmask_asm (n);
}

/* In some module */
#include <avr/io.h>

void set (uint8_t data, uint8_t x)
{
    extern uint8_t c;

    PORTB |= bitmask (1+2+3);
    PORTB |= bitmask (data);
    PORTB &= ~bitmask (1+2+3);

    if (PORTB & bitmask (data))
        c |= bitmask (x);
}

Compiling this with an optimizing avr-gcc yields, here with version 4.7:

set:
  sbi 0x18,6   ;  11  *sbi  [length = 1]
  in r25,0x18   ;  13  movqi_insn/4  [length = 1]
  mov r18,r24   ;  51  movqi_insn/1  [length = 1]
/* #APP */
  ldi  r24, 1 << 2
  sbrs r18, 1
  ldi  r24, 1 << 0
  sbrc r18, 0
  lsl  r24
  sbrc r18, 2
  swap r24
/* #NOAPP */
  or r25,r24   ;  16  iorqi3/1  [length = 1]
  out 0x18,r25   ;  18  movqi_insn/3  [length = 1]
  cbi 0x18,6   ;  23  *cbi  [length = 1]
  in r25,0x18   ;  25  movqi_insn/4  [length = 1]
  and r25,r24   ;  28  andqi3/1  [length = 1]
  breq .L1   ;  30  branch  [length = 1]
/* #APP */
  ldi  r25, 1 << 2
  sbrs r22, 1
  ldi  r25, 1 << 0
  sbrc r22, 0
  lsl  r25
  sbrc r22, 2
  swap r25
/* #NOAPP */
  lds r24,c   ;  34  movqi_insn/4  [length = 2]
  or r24,r25   ;  35  iorqi3/1  [length = 1]
  sts c,r24   ;  36  movqi_insn/3  [length = 2]
.L1:
  ret   ;  54  return  [length = 1]
 
In the 1st and 3rd call of bitmask the argument is known at compile 
time and the compiler can fold the expressions to SBI and CBI, 
respectively.

If the argument to bitmask is not a compile time constant, then the 
optimized asm sequence will be used.  That sequence takes 7 ticks, and 
you need some additional ticks for IN and OUT.

Notice that in the latter case the port change is not atomic.

Moreover, the asm sequence is only expanded once for data and then 
reused in the remainder.  That's the reason why the asm should not be 
volatile:  The asm is const (like a function can be const) and thus has 
no side effects and can be reused.
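
How a fixed-length sequence can replace the shift loop may be easier to see in C. The sketch below is a portable model of one such decomposition: bit 1 of n selects the starting value, bit 0 adds a single shift, and bit 2 swaps the nibbles, which amounts to a shift by four. `bitmask_model` is a hypothetical name, and the function is meant for checking the logic on a PC, not as a replacement for the inline asm.

```c
#include <assert.h>
#include <stdint.h>

/* Portable model: computes 1 << (n % 8) without a shift loop. */
static uint8_t bitmask_model(uint8_t n)
{
    uint8_t mask = (n & 2u) ? 4u : 1u;   /* bit 1 of n: start at 1<<2 or 1<<0 */
    if (n & 1u)
        mask = (uint8_t)(mask << 1);     /* bit 0 of n: one more position (LSL) */
    if (n & 4u)
        mask = (uint8_t)((mask << 4) | (mask >> 4));  /* bit 2 of n: swap nibbles (SWAP) */
    return mask;
}
```

Each step takes the same time whether or not its condition holds, which is why the asm version has a fixed cycle count.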

by Hanno (Guest)


For the sake of completeness:

The AVR instruction set provides the BLD and BST operations which allow 
easy transfer of single bits from one register to another, both at 
arbitrary bit positions. They use the T-bit in the SREG which is 
otherwise not used by gcc.

Using inline assembler a single bit can be transferred between two 8-bit 
variables with two instructions in two cycles as simple as:
asm volatile (
  "bst  %[src],  %[srcbit] \r\n"
  "bld  %[dest], %[destbit] \r\n"
  : [dest] "+r" (dest)
  : [src] "r" (src),
    [srcbit] "n" (2),
    [destbit] "n" (7)
);

Using BLD/BST, no shifts/rotates are needed, and no AND/OR operation 
either. Only the designated bit in dest is affected.
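
Wrapped into a function, the BST/BLD pair could look like the sketch below. `copy_bit2_to_bit7` is a hypothetical name, and the bit numbers 2 and 7 are simply those from the example above. The non-AVR branch is an assumption added so the logic can be checked on a PC. Only that branch has been exercised here; the AVR branch should be verified in the disassembly.

```c
#include <stdint.h>

/* Copies bit 2 of src into bit 7 of dest; all other bits of dest are kept. */
static inline uint8_t copy_bit2_to_bit7(uint8_t dest, uint8_t src)
{
#if defined(__AVR__)
    /* T flag <- bit 2 of src, then bit 7 of dest <- T flag */
    __asm__ ("bst %[src], 2"  "\n\t"
             "bld %[dest], 7"
             : [dest] "+r" (dest)
             : [src] "r" (src));
#else
    /* Portable equivalent for non-AVR builds */
    dest = (uint8_t)((dest & 0x7Fu) | ((src & 0x04u) << 5));
#endif
    return dest;
}
```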
