Hello,

Please help me understand how the compiler does the following optimization. I have an array of strings (pointers to char) declared this way:

const char* const STR_ARRAY[] = {
    "first",
    "second",
    "third",
    "fourth",
    "fifth",
    ... and so on ..
    "last"
}

After compiling, I found in the .map file the memory zone where the strings were saved, and I was surprised to see that some optimization was made:

.rodata.str1.1 0x01055586 0x1af3 .myStrArray.o
               0x2060 (size before relaxing)

The actual size of my strings is 0x2061 (all the letters + the null terminator for each string), but only 0x1af3 bytes are used.

Thank you,
Andrew
Are there duplicate strings, or strings where one is identical to the ending of another?
Thank you for the reply, A.K. There are no duplicate strings, but there are indeed strings where one is identical to the ending of another (like "first" and "the first"). Moreover, there are cases where the beginning of one string is identical to another string (like "first" and "first string"), or where the beginnings of two strings are identical (like "the first" and "the second"). Is this how the optimization is made? Thanks again, Andrew
It was just a guess, but I do know that compilers such as GCC are able to share identical strings. If one string is identical to the trailing part of another, it can be omitted entirely, because the corresponding part of the longer one can be used instead. It won't work with shared prefixes though, since the \0 terminator would not be in the right place.
Since both sizes are shown in the map file, it likely is caused by some optimization done by the linker though.
I spent some time analyzing the memory and I observed that you were right. For two strings, when one is identical to the ending of the other, for example "first\0" and "the first\0" only the longer string, in this case "the first" is stored in memory. I have not investigated if the optimization is made by the linker or the compiler. Thanks, Andrei
(size before relaxing) GCC wrote:
> There are no duplicate strings, but indeed there are strings where one
> is identical to the ending of another (like: "first" and "the first").

It's called string merging. It is triggered by the section flags set by the compiler, which the linker then evaluates in order to merge the strings. Look at the generated assembler code: GCC puts the strings into sections with flags like "aMS". "M" allows merging and "S" marks it as a section containing strings.
the size after relaxing should be always smaller than before
pooter wrote:
> the size after relaxing should be always smaller than before

No, not necessarily. It can be the same size, and depending on which relaxations are performed, code size can even increase. One example of increasing code size: a jump or call target is out of reach, so a jump pad that forwards to the target is added just after the function, and the original jump is redirected to the pad.