EmbDev.net

Forum: ARM programming with GCC/GNU tools

Issue while porting Multicore in GCC Boot


by Monika T. (Company: ST Microelectronics) (monikat)


Hello,

I am working on a bare-metal ARM Cortex-A53 using GCC 4.8.2-3, and I am
facing an issue with multicore boot. Multicore boot and test work fine
when I use DS-5 with almost the same boot code. With GCC, single core
is OK, but multicore fails.

With some debugging I found that my stack and heap were overlapping.
To avoid this and to make sure that the heaps for the 4 cores are at
different memory locations, I explicitly assigned a different address
range to the heap of each core.
The relevant part of the linker script is:

MEMORY
  {
  ram (rw)          : ORIGIN = 0,          LENGTH = 0x14000000
  uncachedram (rwx) : ORIGIN = 0x14000000, LENGTH = 0x10000000
  cachedram1 (rwx)  : ORIGIN = 0x15000000, LENGTH = 0x10000000
  cachedram2 (rwx)  : ORIGIN = 0x16000000, LENGTH = 0x2A000000
  ddr (rwx)         : ORIGIN = 0x40000000, LENGTH = 0xEAFFFFFF
  }

/* Highest address of the user mode stack */
_estack = 0x16000000;    /* end of RAM */

_Min_Heap_Size0 = 0x100000;      /* required amount of heap  */
_Min_Stack_Size0 = 0x10000; /* required amount of stack */

_Min_Heap_Size1 = 0x100000;      /* required amount of heap  */
_Min_Stack_Size1 = 0x10000; /* required amount of stack */

_Min_Heap_Size2 = 0x100000;      /* required amount of heap  */
_Min_Stack_Size2 = 0x10000; /* required amount of stack */

_Min_Heap_Size3 = 0x100000;      /* required amount of heap  */
_Min_Stack_Size3 = 0x10000; /* required amount of stack */

/* User_heap_stack section, used to check that there is enough RAM left 
*/
  ._user_heap_stack :
  {
    . = ALIGN(4);
    PROVIDE ( end0 = . );
    PROVIDE ( _end_ = . );
    . = . + _Min_Heap_Size0;
    PROVIDE (_max_heap0 = . );
    . = . + _Min_Stack_Size0;
    . = ALIGN(4);


    PROVIDE (end1 = . );
     . = . + _Min_Heap_Size1;
    PROVIDE (_max_heap1 = . );
    . = . + _Min_Stack_Size1;
    . = ALIGN(4);


    PROVIDE (end2 = . );
     . = . + _Min_Heap_Size2;
    PROVIDE (_max_heap2 = . );
    . = . + _Min_Stack_Size2;
    . = ALIGN(4);

    PROVIDE (end3 = . );
     . = . + _Min_Heap_Size3;
    PROVIDE (_max_heap3 = . );
    . = . + _Min_Stack_Size3;
    . = ALIGN(4);
  } >ddr
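
The layout this section produces can be worked out by hand. The sketch
below redoes that arithmetic in C. It assumes the section starts at
0x40000000 (the Cpu0 Heap_Base reported in the results table further
down); the names LAYOUT_BASE, gcc_heap_start and gcc_heap_limit are
made up for this illustration and do not exist in the project.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_CORES    4u
#define HEAP_SIZE    0x100000u    /* _Min_Heap_SizeN in the linker script  */
#define STACK_SIZE   0x10000u     /* _Min_Stack_SizeN in the linker script */
#define LAYOUT_BASE  0x40000000u  /* assumption: where ._user_heap_stack starts */

/* Address the linker would assign to endN (start of core N's heap). */
uint32_t gcc_heap_start(unsigned core)
{
    assert(core < NUM_CORES);
    /* Each core gets a heap followed by a stack; both sizes are
       multiples of 4, so the ALIGN(4) statements add no padding. */
    return LAYOUT_BASE + core * (HEAP_SIZE + STACK_SIZE);
}

/* Address the linker would assign to _max_heapN (end of core N's heap). */
uint32_t gcc_heap_limit(unsigned core)
{
    return gcc_heap_start(core) + HEAP_SIZE;
}
```

Under that assumption the four heaps start at 0x40000000, 0x40110000,
0x40220000 and 0x40330000, each followed by a 0x10000-byte stack area.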

The sbrk implementation is as follows:

#include <errno.h>
#include <reent.h>
#include <stddef.h>
#include <sys/types.h>

#define NUM_CORES 4

/* Defined elsewhere in the boot code; returns the index of the calling core. */
extern unsigned int get_cpu_id(void);

/* Heap bounds of each core, provided by the linker script.
   Only the addresses of these symbols are meaningful. */
extern char end0, end1, end2, end3;
extern char _max_heap0, _max_heap1, _max_heap2, _max_heap3;

caddr_t _sbrk_r ( struct _reent *r, int incr )
{
   static char *const heap_start[NUM_CORES] =
        { &end0, &end1, &end2, &end3 };
   static char *const heap_limit[NUM_CORES] =
        { &_max_heap0, &_max_heap1, &_max_heap2, &_max_heap3 };
   static char *heap_end[NUM_CORES];   /* current break of each core */
   char *prev_heap_end;
   unsigned int id = get_cpu_id();

   /* Same behaviour as the original if/else chain:
      any id other than 0, 1 or 2 uses the heap of core 3. */
   if (id >= NUM_CORES)
        id = NUM_CORES - 1;

   /* First call on this core: start at the beginning of its heap. */
   if (heap_end[id] == NULL)
        heap_end[id] = heap_start[id];

   prev_heap_end = heap_end[id];

   /* Refuse to grow past the end of this core's heap. */
   if (heap_end[id] + incr > heap_limit[id])
   {
        r->_errno = ENOMEM;
        return (caddr_t) -1;
   }

   heap_end[id] += incr;

   return (caddr_t) prev_heap_end;
}

I ran a test that calls malloc for buffer sizes 256, 512, 1024 and 8192
and prints the returned address along with the CPU id. The results for
GCC and DS-5 are as follows:



GCC: address returned from malloc, by buffer size
(HANG = no result, test hangs)

      Heap_Base  256         512         1024        8192
Cpu0  40000000   0x40000518  0x40110210  HANG        HANG
Cpu1  40110000   0x40000930  0x40110008  HANG        HANG
Cpu2  40220000   0x40000828  0x40110820  0x40110e30  HANG
Cpu3  40330000   0x40000410  0x40000620  0x40000a38  0x40330008



DS-5: address returned from malloc, by buffer size

      Heap_Base  256         512         1024        8192
Cpu0  40000000   0x40000018  0x40021220  0x40042628  0x40063e30
Cpu1  43C00000   0x43c00018  0x43c21220  0x43c42628  0x43c63e30
Cpu2  47800000   0x47800018  0x47821220  0x47842628  0x47863e30
Cpu3  4B400000   0x4b400018  0x4b421220  0x4b442628  0x4b463e30
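
The DS-5 Heap_Base values are spaced 0x3C00000 apart, which matches the
arithmetic in the __user_setup_stackheap disassembly quoted later in
this post. A rough C rendering of that routine, as a sketch only: the
struct and function names are invented here, the constants are read
from the disassembly, and the r0/r2/sp roles follow the ARM C library
convention for __user_setup_stackheap as far as I understand it.

```c
#include <assert.h>
#include <stdint.h>

#define CORE_STRIDE  0x03C00000u  /* MOV r1,#0x3c00000             */
#define HEAP_BASE0   0x40000000u  /* literal loaded from [0x1874]  */
#define STACK_TOP0   0x43C00000u  /* literal loaded from [0x1878]  */
#define HEAP_LIMIT0  0x43B00000u  /* literal loaded from [0x187c]  */

/* Core 0's stack top is exactly where core 1's heap begins. */
static_assert(STACK_TOP0 == HEAP_BASE0 + CORE_STRIDE,
              "regions of consecutive cores are back to back");

struct stackheap {
    uint32_t heap_base;   /* returned in r0 */
    uint32_t heap_limit;  /* returned in r2 */
    uint32_t stack_top;   /* loaded into sp */
};

/* mpidr is the value read by MRC p15,#0x0,r0,c0,c0,#5. */
struct stackheap ds5_setup_stackheap(uint32_t mpidr)
{
    uint32_t offset = (mpidr & 3u) * CORE_STRIDE;  /* AND r0,r0,#3 ; MUL */
    struct stackheap sh = {
        HEAP_BASE0 + offset,
        HEAP_LIMIT0 + offset,
        STACK_TOP0 + offset
    };
    return sh;
}
```

For core ids 0 to 3 this gives heap bases 0x40000000, 0x43C00000,
0x47800000 and 0x4B400000, the same values as the Heap_Base column
above.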



Looking at the results above, it seems that with GCC malloc reserves a
big chunk of memory and then serves any further request from that same
already reserved chunk, whereas with DS-5 this is not the case. This
could be due to the following reasons:
1.  With DS-5, the stack and heap are set up at boot time, before the
main test is called, using __user_initial_stackheap (in my case I
overrode it with __user_setup_stackheap, where I reserved the heap and
stack for all 4 cores at different memory locations).
The code for this is:
__user_setup_stackheap
        0x00001840:    e1a0400e    .@..    MOV      r4,lr
        0x00001844:    ee100fb0    ....    MRC      p15,#0x0,r0,c0,c0,#5
        0x00001848:    e2000003    ....    AND      r0,r0,#3
        0x0000184c:    e3a0150f    ....    MOV      r1,#0x3c00000
        0x00001850:    e0030190    ....    MUL      r3,r0,r1
        0x00001854:    e59f0018    ....    LDR      r0,[pc,#24] ; [0x1874] = 0x40000000
        0x00001858:    e0800003    ....    ADD      r0,r0,r3
        0x0000185c:    e59fd014    ....    LDR      sp,[pc,#20] ; [0x1878] = 0x43c00000
        0x00001860:    e08dd003    ....    ADD      sp,sp,r3
        0x00001864:    e59f1010    ....    LDR      r1,[pc,#16] ; [0x187c] = 0x43b00000
        0x00001868:    e0833001    .0..    ADD      r3,r3,r1
        0x0000186c:    e1a02003    . ..    MOV      r2,r3
        0x00001870:    e12fff14    ../.    BX       r4



2.  With GCC, when I call malloc it in turn calls sbrk, where I define
the heap section for each core. But if a memory chunk is already
available, sbrk is not called and the memory is allocated from the
available chunk. That is why, in the GCC table above, the allocated
addresses are not in the range of the respective CPUs. For example,
malloc first reserves a chunk of 12 KB (0x3000) of memory from
0x40000000 (the cpu0 heap) and then allocates from it; once this chunk
is exhausted, sbrk is called again and memory is reserved from the heap
of whichever CPU makes the call.
3.  I traced the sbrk calls for the 4 cores and got the following:

      Heap_base   incr    Heap_base   incr    Heap_base   incr
Cpu0  0x40000000  0x418   0x40000418  0xbe8   0x40001000  0x3000
Cpu1  0x40110000  0x3000
Cpu2  0x40220000  0x3000  0x40223000  0x1000
Cpu3

This means that if the call goes to sbrk, each CPU gets a correct
address within its reserved heap range, but if it does not go to sbrk,
the allocated address may come from another core's range.
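
One way to check this against the GCC results table is a small helper
that reports which core's heap a returned address falls in. This is
only a sketch: heap_owner and is_foreign are names invented here, the
heap bases are taken from the Heap_Base column of the GCC table, and
the heap size from _Min_Heap_SizeN in the linker script.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_CORES  4
#define HEAP_SIZE  0x100000u   /* _Min_Heap_SizeN */

/* Heap_Base column of the GCC results table. */
static const uint32_t heap_base[NUM_CORES] = {
    0x40000000u, 0x40110000u, 0x40220000u, 0x40330000u
};

/* Returns the index of the core whose heap contains addr,
   or -1 if addr lies in no core's heap (for example in a stack area). */
int heap_owner(uint32_t addr)
{
    for (int core = 0; core < NUM_CORES; core++) {
        if (addr >= heap_base[core] && addr - heap_base[core] < HEAP_SIZE)
            return core;
    }
    return -1;
}

/* Nonzero when an address handed to `core` is outside that core's heap. */
int is_foreign(int core, uint32_t addr)
{
    assert(core >= 0 && core < NUM_CORES);
    return heap_owner(addr) != core;
}
```

Applied to the GCC table, heap_owner(0x40110210) reports core 1 for a
buffer that was handed to Cpu0, and all three addresses returned to
Cpu2 fall in the heaps of cores 0 and 1.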

Based on the above findings, my questions are:

1.  Is there a way with GCC to tell the toolchain about the heap for
all cores at boot time, as I was doing with DS-5 using
“__user_setup_stackheap”?
2.  What is the initialization sequence for GCC, like the one shown in
the above diagram for DS-5?
3.  Is there a compiler option through which this can be achieved?
4.  Am I missing something in my GCC implementation?

Please give your feedback on the above questions, and please correct me
if you see any gap in my findings.
Thanks in advance for your help.

Thanks and Regards,
Monika
