[STM32] - part 4 - CPU goes brrrrr

This is the final part of a series of articles. I’d suggest going through part 1, part 2 and part 3 first.

I’ve been working on a project (which you can find here) using a Bluepill board with an STM32F103C8T6 microcontroller. It’s a USB keyboard project. It uses libopencm3, and I’d consider it a pretty minimalistic, bare-metal project. That makes it great for analyzing what happens when an ARM Cortex-M processor boots.

Let’s analyze the linker script and the final binary of that project to understand more about how an STM32 boots.

A quick primer on the compilation process:

- the compiler goes through all of the source files, generates instructions for the target architecture and saves them in object files (using the ELF format)
- object files hold symbols, and each symbol is assigned to a section
- the linker takes all those object files and, following the linker script’s rules, merges the sections and places them in the proper memory regions
- the linker also checks that every symbol that is used is defined somewhere (symbol resolution)

You’re still with me?

Let me read that one more time…

Don’t worry, we’ll go through examples of the compilation and linking phases.

Compiling a single source file looks like this (Makefile syntax):

CFLAGS := -mcpu=cortex-m3 -mthumb
CFLAGS += -Wall -Wextra -Werror -Wno-char-subscripts -Wno-unused-but-set-variable
CFLAGS += -DSTM32F1 -DDISCOVERY_STLINK $(INCLUDE_PATHS)
CFLAGS += -std=gnu11
CFLAGS += -O3 -g3

build:
	arm-none-eabi-gcc $(CFLAGS) -c main.c -o main.o

The compilation step is done per source file. Apart from the source file itself, the only inputs are the header files it includes. Symbols defined in other files (extern symbols) only need a declaration at this point; the linker resolves them later. That’s why we need to add the include paths with $(INCLUDE_PATHS).
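To see what ends up in an object file, consider a small source file like the one below. The comment next to each definition names the section GCC typically places it in. It’s a sketch assuming GCC 10 or newer (where -fno-common is the default), and the names are made up. Running arm-none-eabi-nm on the resulting object file should list these symbols with the type letters D (.data), B (.bss), R (.rodata) and T (.text), lowercase for the static one.

```c
#include <assert.h>

/* .data: initialized global; its initial value must be stored in the image */
int initialized_global = 4;

/* .bss: no initializer (ends up in COMMON instead with -fcommon or older GCC) */
int uninitialized_global;

/* .bss: explicitly zero-initialized, so no initial bytes need to be stored */
static int zeroed_static = 0;

/* .rodata: read-only data, it can stay in FLASH on a microcontroller */
const int answer = 42;

/* .text: the machine code of this function */
int sum_all(void)
{
    return initialized_global + uninitialized_global + zeroed_static + answer;
}

/* What C guarantees about these variables before main() runs. */
void check_start_values(void)
{
    assert(initialized_global == 4);
    assert(uninitialized_global == 0);
    assert(zeroed_static == 0);
}
```

Calling sum_all() returns 46. The promises checked in check_start_values() don’t come for free: on a PC the program loader makes them true, while on a bare-metal STM32 it’s the job of the startup code.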

Let’s assume all the source files have been compiled and make has stored the list of object files in the OBJ variable.

I guess, for this simple example, adding:

    OBJ = main.o

to the Makefile would do the job, right?

Exactly. Now for the linking stage:

LDFLAGS := --specs=nano.specs
#LDFLAGS += --specs=nosys.specs
# libraries
LDFLAGS += -lopencm3_stm32f1
LDFLAGS += -L../libopencm3/lib
# Compiler flags on the linking stage.
LDFLAGS += -mthumb -mcpu=cortex-m3
LDFLAGS += -nostartfiles
LDFLAGS += -lc  # use libc
# Linker flags
# Stack grows downwards from the end of RAM (0x2000_0000 + 0x5000).
# The RAM size = 20480 B. In hex that's 0x5000.
# This symbol is also defined in the cortex-m-generic.ld.
LDFLAGS += -Wl,--defsym,_stack=0x20005000
LDFLAGS += -Wl,-T,memory.ld
LDFLAGS += -Wl,-Map=mapfile
LDFLAGS += -Wl,-gc-sections
LDFLAGS += -Wl,--print-memory-usage
LDFLAGS += -O3

link:
	arm-none-eabi-gcc -o app.elf $(OBJ) $(LDFLAGS)

You’ve probably noticed that we’re calling the same program arm-none-eabi-gcc for linking as we did for compilation.

Oh yeah, I totally noticed that!

That’s because gcc acts as a driver and calls the linker by itself. Flags prefixed with -Wl, are passed through to the linker. This time the input is the list of all the object files, and the linker’s job is to put them together into a single binary.

We’re outputting an ELF file here, which isn’t a raw memory image you can write straight into the microcontroller’s FLASH. It’s just a convenient container for instructions, data and debug information. You can easily convert it into either HEX or BIN format with objcopy, and you can flash your microcontroller with either of those.
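A quick way to see the difference is to look at the first bytes of each file. Every ELF file starts with the magic bytes 0x7F 'E' 'L' 'F' followed by headers, while the BIN produced by arm-none-eabi-objcopy -O binary app.elf app.bin is a raw memory image starting at 0x0800_0000, so its first word should be the initial stack pointer from the vector table. The two helpers below are hypothetical, meant for a buffer read from either file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Every ELF file starts with these four bytes (e_ident[0..3]). */
int looks_like_elf(const uint8_t *buf, size_t len)
{
    static const uint8_t magic[4] = { 0x7F, 'E', 'L', 'F' };
    return len >= sizeof magic && memcmp(buf, magic, sizeof magic) == 0;
}

/* Reads a 32-bit little-endian word, the byte order the STM32 uses. */
uint32_t read_le32(const uint8_t *buf, size_t len)
{
    assert(len >= 4);
    return (uint32_t)buf[0]
         | (uint32_t)buf[1] << 8
         | (uint32_t)buf[2] << 16
         | (uint32_t)buf[3] << 24;
}
```

Based on this project’s settings, you’d expect app.bin to begin with the bytes 00 50 00 20 (0x20005000, the _stack value, stored little-endian), followed by 15 2b 00 08 (0x08002b15, the reset handler’s address). Those are expected values derived from the linker settings and the readelf output, not an actual dump.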

I won’t explain every flag used here. libopencm3 is linked as a static library (an archive of ELF object files, nothing special really). The interesting parts are -T,memory.ld and --defsym,_stack=0x20005000. The first one selects the linker script and the second one defines a symbol. Most symbols come from the actual source files, but where to put the stack is very much target-specific, since different targets have different memory layouts, so it makes sense to define it here. At the linking stage we’re already committed to a specific processor.

My linker script is split into two files, and the first one includes the second. The first one defines the available memory, which differs between microcontrollers.

memory.ld

/* Define memory regions. */
MEMORY
{
  /* FLASH memory 0x0800_0000 - 0x0801_FFFF (128K) */
  rom (rx)   : ORIGIN = 0x08000000, LENGTH = 128K
  /* SRAM memory (20K = 0x5000) */
  ram (rwx) : ORIGIN = 0x20000000, LENGTH = 20K
}

INCLUDE cortex-m-generic.ld
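The numbers in the MEMORY block are easy to double-check. The sketch below mirrors the region origins and lengths as C constants (the macro names are mine) and verifies at compile time that the end of RAM is 0x20005000, the same value passed with --defsym,_stack=0x20005000 and computed by the PROVIDE fallback at the end of cortex-m-generic.ld.

```c
#include <assert.h>
#include <stdint.h>

/* Values from the MEMORY block in memory.ld (macro names are illustrative) */
#define ROM_ORIGIN 0x08000000u
#define ROM_LENGTH (128u * 1024u)   /* 128K */
#define RAM_ORIGIN 0x20000000u
#define RAM_LENGTH (20u * 1024u)    /* 20K */

/* Same expression as PROVIDE(_stack = ORIGIN(ram) + LENGTH(ram)); */
#define STACK_TOP (RAM_ORIGIN + RAM_LENGTH)

_Static_assert(ROM_LENGTH == 0x20000u, "128K is 0x20000 bytes");
_Static_assert(ROM_ORIGIN + ROM_LENGTH - 1u == 0x0801FFFFu, "last FLASH byte");
_Static_assert(RAM_LENGTH == 0x5000u, "20K is 0x5000 bytes");
_Static_assert(STACK_TOP == 0x20005000u, "same as --defsym,_stack=0x20005000");

/* Returns 1 if addr lies in [origin, origin + length). */
int in_region(uint32_t addr, uint32_t origin, uint32_t length)
{
    assert(length > 0u);
    return addr >= origin && addr - origin < length;
}
```

Note that STACK_TOP itself is one byte past the end of RAM. That’s fine: the Cortex-M stack is full-descending, so every push decrements the stack pointer before storing anything, and the first word pushed lands at 0x20004FFC.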

The second one (cortex-m-generic.ld) tells the linker which sections (collections of symbols) go where:

cortex-m-generic.ld

/*
 * This file gets included in the memory.ld.
 */

/*
 * Force the symbol to be entered in the output file as an undefined symbol,
 * so the linker pulls in the object (or archive member) that defines it,
 * even though nothing in our code references vector_table directly.
 */
EXTERN(vector_table)

/* Define the entry point of the output file. */
ENTRY(reset_handler)

/*
 * The SECTIONS command tells the linker how to map input sections of the object
 * files, which are also ELF format, into output sections of the final ELF file
 * and how to place the output sections in memory.
 */
SECTIONS
{
  /*
   * The sections of an object file can be printed with: arm-none-eabi-objdump -h
   *
   *   . is a location counter
   *   * is a wildcard
   *   *(.text) means all '.text' input sections in all input files
   */
  .text : {
    *(.vectors)         /* All .vectors sections from all files */
    *(.text*)           /* All the .textANYTHING sections from all files */
    . = ALIGN(4);
    *(.rodata*)         /* All the .rodataANYTHING sections from all files */
    . = ALIGN(4);
  } > rom               /* This section goes into ROM memory */

  /*
   * C++ Static constructors/destructors, also used for __attribute__
   * ((constructor)) and the likes
   */
  .preinit_array : {
    . = ALIGN(4);
    __preinit_array_start = .;
    KEEP (*(.preinit_array))        /* Keep the symbols even if they are not referenced */
    __preinit_array_end = .;
  } > rom

  .init_array : {
    . = ALIGN(4);
    __init_array_start = .;
    KEEP (*(SORT(.init_array.*)))
    KEEP (*(.init_array))
    __init_array_end = .;
  } > rom

  .fini_array : {
    . = ALIGN(4);
    __fini_array_start = .;
    KEEP (*(.fini_array))
    KEEP (*(SORT(.fini_array.*)))
    __fini_array_end = .;
  } > rom

  /*
   * Another section used by C++ stuff, appears when using newlib with
   * 64bit (long long) printf support
   */
  .ARM.extab : {
    *(.ARM.extab*)
  } > rom

  /*
   * Index table for C++ exceptions unwinding.
   */
  .ARM.exidx : {
    __exidx_start = .;
    *(.ARM.exidx*)
    __exidx_end = .;
  } > rom

  . = ALIGN(4);
  _etext = .;

  /*
   * .data - Initialized global, static objects.
   * e.g. static int a = 4;
   */
  .data : {
    _data = .;
    *(.data*)  /* Read-write initialized data */
    . = ALIGN(4);
    _edata = .;
  } > ram AT > rom
  /*
   * So the '> ram AT > rom'...
   * The VMA (Virtual Memory Address) for the .data section
   * is in the ram and the LMA (Load Memory Address) is in
   * the rom. This is the data that will be copied from rom
   * to ram as one of the first steps of the boot process.
   */
  /* Create a symbol which is referenced in the lib/cm3/vector.c */
  _data_loadaddr = LOADADDR(.data);

  /*
   * .bss - Uninitialized (or zero-initialized) global, static objects.
   * e.g. static int a;
   */
  .bss : {
    *(.bss*)  /* Read-write zero initialized data */
    *(COMMON)
    . = ALIGN(4);
    _ebss = .;
  } > ram

  /*
   * The .eh_frame section appears to be used for C++ exception
   * handling. You may need to fix this if you're using C++.
   */
  /DISCARD/ : { *(.eh_frame) }

  . = ALIGN(4);
  end = .;
}

/*
 * Define a symbol only if it is referenced and is not defined by any
 * object included in the link - basically a fallback definition.
 */
PROVIDE(_stack = ORIGIN(ram) + LENGTH(ram));

Ok, this looks like gibberish.

Linker script syntax isn’t easy to read. I’ve tried to comment this one as much as I could. The main pattern is:

.final_section : {
  made out of these things;
} > goes into this memory space

Whenever you see a name without a leading dot being assigned with =, it’s most probably a new symbol, e.g. just_a_name = .; or just_a_name = LOADADDR(.data);. Such a symbol is only an address; it doesn’t reserve any memory of its own. Declared in C like this, just_a_name evaluates to the address the linker assigned, which points to a valid part of the memory:

extern uint8_t just_a_name[];

Ok, let’s see if all of this makes sense. Let’s analyze the binary of the project I mentioned at the beginning of this article.

readelf --segments src/pawusb.elf

Here is the output of the command, which prints the segments (program headers) of the ELF file:

Elf file type is EXEC (Executable file)
Entry point 0x8002b15
There are 3 program headers, starting at offset 52

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  EXIDX          0x014070 0x08004070 0x08004070 0x00008 0x00008 R   0x4
  LOAD           0x010000 0x08000000 0x08000000 0x04078 0x04078 R E 0x10000
  LOAD           0x020000 0x20000000 0x08004078 0x00028 0x00264 RW  0x10000

 Section to Segment mapping:
  Segment Sections...
   00     .ARM.exidx
   01     .text .ARM.exidx
   02     .data .bss

Notice the VirtAddr and PhysAddr columns. Those are the VMA (Virtual Memory Address) and the LMA (Load Memory Address). For the first two segments the VMA and LMA are the same: they start with 0x08, which, if you read the second part, you already know points to the FLASH memory. The Section to Segment mapping shows that those two segments hold the .ARM.exidx and .text sections. The first is an index table used for stack unwinding during (mostly C++) exception handling, and the second holds the actual instructions, plus the read-only .rodata data that the linker script merges into .text. Since the microcontroller executes instructions straight from FLASH, addresses starting with 0x08 make sense.

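The last segment is different. Its VirtAddr points to the 0x2000_0000 address space, which is SRAM, while its PhysAddr (0x08004078) sits in FLASH, right after the first LOAD segment (0x08000000 + 0x4078). Which sections does it include? .data (variables with initial values) and .bss (variables that start out as zero). Notice also that FileSiz (0x28) is smaller than MemSiz (0x264): only the initial values of .data are stored in the file, while .bss just needs another 0x23c bytes of RAM to be reserved and zeroed.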
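These addresses follow directly from the linker script. Here is a simplified sketch of how the location counter moves through cortex-m-generic.ld, fed with the section sizes from the readelf --sections output shown further down (.text 0x4070, .ARM.exidx 0x8, .data 0x28, .bss 0x23c). It only models the sections that aren’t empty in this binary and is not how ld is actually implemented; the function and struct names are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Same rounding as ALIGN(n) in the linker script; n must be a power of two. */
uint32_t align_up(uint32_t addr, uint32_t n)
{
    assert(n != 0u && (n & (n - 1u)) == 0u);
    return (addr + n - 1u) & ~(n - 1u);
}

/* Addresses of the symbols defined in cortex-m-generic.ld */
struct layout {
    uint32_t exidx_start;    /* __exidx_start */
    uint32_t etext;          /* _etext: end of the FLASH contents */
    uint32_t data;           /* _data: VMA of .data (RAM) */
    uint32_t edata;          /* _edata */
    uint32_t data_loadaddr;  /* _data_loadaddr: LMA of .data (FLASH) */
    uint32_t ebss;           /* _ebss */
};

/* Simplified walk of the SECTIONS command; the empty *_array and
 * .ARM.extab sections of this binary are left out. */
struct layout place_sections(uint32_t rom, uint32_t ram,
                             uint32_t text_size, uint32_t exidx_size,
                             uint32_t data_size, uint32_t bss_size)
{
    struct layout l;
    uint32_t rom_dot = rom;  /* next free address in the rom region */
    uint32_t ram_dot = ram;  /* next free address in the ram region */

    rom_dot = align_up(rom_dot + text_size, 4u);   /* .text incl. .rodata */
    l.exidx_start = rom_dot;
    rom_dot += exidx_size;                          /* .ARM.exidx */
    l.etext = align_up(rom_dot, 4u);                /* . = ALIGN(4); _etext = .; */

    l.data = ram_dot;                               /* .data runs from ram ... */
    ram_dot = align_up(ram_dot + data_size, 4u);
    l.edata = ram_dot;
    l.data_loadaddr = l.etext;                      /* ... but is stored in rom; here
                                                       the next free rom address is _etext */

    ram_dot = align_up(ram_dot + bss_size, 4u);     /* .bss follows .data in ram */
    l.ebss = ram_dot;
    return l;
}
```

Calling place_sections(0x08000000, 0x20000000, 0x4070, 0x8, 0x28, 0x23c) reproduces what readelf reports: .ARM.exidx at 0x08004070, the RAM segment’s PhysAddr 0x08004078, FileSiz 0x28 (_edata - _data) and MemSiz 0x264 (_ebss - _data).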

The process of uploading the binary to the processor only writes into the FLASH memory.

But we need the .data and the .bss in the SRAM memory! We’ve been betrayed!

Our MCU has to copy .data from its PhysAddr (LMA) to its VirtAddr (VMA), and zero out .bss, every time it boots, so the binary itself has to contain the instructions that do it. The code that copies this amount of data from a fixed address in FLASH to a fixed address in RAM has already been written for us in libopencm3/lib/cm3/vector.c. Here is part of that file:

void __attribute__ ((weak)) reset_handler(void)
{
	volatile unsigned *src, *dest;
	funcp_t *fp;

	for (src = &_data_loadaddr, dest = &_data;
		dest < &_edata;
		src++, dest++) {
		*dest = *src;
	}

	while (dest < &_ebss) {
		*dest++ = 0;
	}

  // More stuff...
	...

}

Notice that the _data_loadaddr, _data, _edata and _ebss symbols aren’t defined in this file. All of them come from the linker script we discussed earlier; those are the fixed addresses I just mentioned. _data_loadaddr points to the copy of .data stored in FLASH (its LMA), while _data and _edata mark where .data lives in SRAM (its VMA). Also notice that dest isn’t reset between the for loop and the while loop. That works because the linker script places .bss in SRAM right after .data, so when the copy loop stops at _edata, dest already points at the start of .bss, and the while loop zeroes everything up to _ebss. Nothing happens automagically!
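To watch the two loops work without a board, here is a host-side sketch. Its parameters play the roles of _data_loadaddr, _data, _edata and _ebss, and ordinary arrays stand in for FLASH and SRAM; init_ram is a made-up name, not part of libopencm3. Unlike the original, it doesn’t declare the pointers volatile, since it only touches ordinary arrays.

```c
#include <assert.h>

/* Host-side model of the two loops in libopencm3's reset_handler.
 * The parameters stand in for the linker symbols _data_loadaddr,
 * _data, _edata and _ebss. */
void init_ram(const unsigned *data_loadaddr, unsigned *data,
              unsigned *edata, unsigned *ebss)
{
    const unsigned *src = data_loadaddr;
    unsigned *dest = data;

    assert(data <= edata && edata <= ebss);

    /* Copy .data from its load address (FLASH) to its run address (RAM). */
    while (dest < edata) {
        *dest++ = *src++;
    }

    /* dest == edata here, which is exactly where .bss starts: zero it. */
    while (dest < ebss) {
        *dest++ = 0u;
    }
}
```

With this project’s numbers, the first loop copies 0x28 bytes (10 words) from 0x08004078 to 0x20000000, and the second zeroes 0x23c bytes (143 words) up to 0x20000264.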

If you want to know more about the ELF’s sections, you can run:

readelf --sections src/pawusb.elf

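That will print all the sections. You’ll see some extra sections, like the ones starting with .debug. Those hold the debug information used by the debugger. Their address is 0 and they lack the A (alloc) flag, so they never end up in the microcontroller’s memory.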

There are 22 section headers, starting at offset 0x714a8:

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        08000000 010000 004070 00  AX  0   0  8
  [ 2] .preinit_array    PREINIT_ARRAY   08004070 020028 000000 04  WA  0   0  1
  [ 3] .init_array       INIT_ARRAY      08004070 020028 000000 04  WA  0   0  1
  [ 4] .fini_array       FINI_ARRAY      08004070 020028 000000 04  WA  0   0  1
  [ 5] .ARM.exidx        ARM_EXIDX       08004070 014070 000008 00  AL  1   0  4
  [ 6] .data             PROGBITS        20000000 020000 000028 00  WA  0   0  4
  [ 7] .bss              NOBITS          20000028 020028 00023c 00  WA  0   0  4
  [ 8] .debug_info       PROGBITS        00000000 020028 00fae5 00      0   0  1
  [ 9] .debug_abbrev     PROGBITS        00000000 02fb0d 002b85 00      0   0  1
  [10] .debug_loc        PROGBITS        00000000 032692 00751f 00      0   0  1
  [11] .debug_aranges    PROGBITS        00000000 039bb1 000848 00      0   0  1
  [12] .debug_ranges     PROGBITS        00000000 03a3f9 000eb8 00      0   0  1
  [13] .debug_macro      PROGBITS        00000000 03b2b1 008db5 00      0   0  1
  [14] .debug_line       PROGBITS        00000000 044066 008976 00      0   0  1
  [15] .debug_str        PROGBITS        00000000 04c9dc 01fdd3 01  MS  0   0  1
  [16] .comment          PROGBITS        00000000 06c7af 00004c 01  MS  0   0  1
  [17] .ARM.attributes   ARM_ATTRIBUTES  00000000 06c7fb 00002b 00      0   0  1
  [18] .debug_frame      PROGBITS        00000000 06c828 0017fc 00      0   0  4
  [19] .symtab           SYMTAB          00000000 06e024 002170 10     20 299  4
  [20] .strtab           STRTAB          00000000 070194 00122a 00      0   0  1
  [21] .shstrtab         STRTAB          00000000 0713be 0000ea 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  y (purecode), p (processor specific)

If you want to know where each symbol lands in those sections, check the mapfile, which the linker generates thanks to the -Wl,-Map=mapfile flag in LDFLAGS.

Earlier in this series I mentioned that ARM Cortex-M processors expect a vector table at the beginning of the memory they boot from (that’s FLASH in the most common boot mode). Looking at the mapfile, I found this part:

.text           0x0000000008000000   0x4070
 *(.vectors)
 .vectors       0x0000000008000000   0x150 ../libopencm3/lib/libopencm3_stm32f1.a(vector.o)
                0x0000000008000000                vector_table
 *(.text*)
 .text          0x0000000008000150   0x268 main.o
                0x00000000080001f0                _putchar
                0x00000000080001f4                str_len

The .text output section starts with everything placed in the .vectors input sections, exactly as the *(.vectors) line in the linker script requests. In libopencm3/lib/cm3/vector.c you can find this structure:

__attribute__ ((section(".vectors")))
vector_table_t vector_table = {
	.initial_sp_value = &_stack,
	.reset = reset_handler,
	.nmi = nmi_handler,
	.hard_fault = hard_fault_handler,
	.sv_call = sv_call_handler,
	.pend_sv = pend_sv_handler,
	.systick = sys_tick_handler,
	.irq = {
		IRQ_HANDLERS
	}
};

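The first field references the _stack symbol. It has been defined both through the command line parameter --defsym,_stack=0x20005000 and in the linker script with PROVIDE(_stack = ORIGIN(ram) + LENGTH(ram));. Since PROVIDE only kicks in when nothing else defines the symbol, the --defsym value is the one used, but both evaluate to 0x20005000 anyway. The second field points to the reset_handler function, the one we just looked at that copies the data from FLASH to SRAM. On reset, the Cortex-M core loads the stack pointer from the first word of the vector table and starts executing at the address stored in the second word.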
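The 0x150 bytes of .vectors in the mapfile above can be cross-checked against the Cortex-M3 vector table layout: 16 architectural entries (initial SP, reset, the fault and system handlers, and a few reserved slots) followed by the peripheral IRQ handlers. 0x150 / 4 = 84 words, and 84 - 16 = 68 IRQ slots. The struct below is a hypothetical model of that layout, not libopencm3’s vector_table_t. Every entry is a uint32_t because addresses are 32 bits wide on the target, whereas a function pointer on a 64-bit PC takes 8 bytes. The cm3_reset() helper (also a made-up name) mimics the two loads the core performs on reset. Bit 0 of a code address is the Thumb state bit, which is why readelf reported the entry point as the odd number 0x8002b15.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 0x150 / 4 = 84 words in the mapfile, minus the 16 system entries */
#define NUM_IRQS 68u

/* Hypothetical model of the Cortex-M3 vector table (not libopencm3's type). */
struct cm3_vector_table {
    uint32_t initial_sp;       /* 0x00: copied into SP on reset */
    uint32_t reset;            /* 0x04: execution starts here */
    uint32_t nmi;              /* 0x08 */
    uint32_t hard_fault;       /* 0x0C */
    uint32_t memory_manage;    /* 0x10 */
    uint32_t bus_fault;        /* 0x14 */
    uint32_t usage_fault;      /* 0x18 */
    uint32_t reserved1[4];     /* 0x1C - 0x28 */
    uint32_t sv_call;          /* 0x2C */
    uint32_t debug_monitor;    /* 0x30 */
    uint32_t reserved2;        /* 0x34 */
    uint32_t pend_sv;          /* 0x38 */
    uint32_t systick;          /* 0x3C */
    uint32_t irq[NUM_IRQS];    /* 0x40: peripheral interrupt handlers */
};

_Static_assert(offsetof(struct cm3_vector_table, reset) == 0x04, "reset");
_Static_assert(offsetof(struct cm3_vector_table, sv_call) == 0x2C, "SVCall");
_Static_assert(offsetof(struct cm3_vector_table, irq) == 0x40, "first IRQ");
_Static_assert(sizeof(struct cm3_vector_table) == 0x150, "mapfile size");

/* The two loads the core performs on reset. Bit 0 of a code address is the
 * Thumb state bit and must be set; it is not part of the instruction address. */
void cm3_reset(const struct cm3_vector_table *vt, uint32_t *sp, uint32_t *pc)
{
    assert((vt->reset & 1u) != 0u);
    *sp = vt->initial_sp;
    *pc = vt->reset & ~1u;
}
```

For this project, initial_sp would be 0x20005000 and reset would be 0x08002b15, so the core starts executing reset_handler at 0x08002b14.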

The same function then calls pre_main() and finally main(), which is where your application code starts.
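The funcp_t *fp variable in the reset_handler excerpt belongs to the part hidden behind the // More stuff... comment. In libopencm3, that part walks the tables of function pointers the linker script collects between __preinit_array_start/__preinit_array_end and __init_array_start/__init_array_end before calling main; functions marked __attribute__((constructor)) and C++ static constructors end up there. In this project those sections are empty (size 0 in the section list), so nothing runs. Below is a host-side sketch of such a walk, with an ordinary array standing in for the linker-collected table; run_init_array, ctor_a and ctor_b are made-up names.

```c
#include <assert.h>
#include <stddef.h>

/* Same name as the function-pointer type in the reset_handler excerpt. */
typedef void (*funcp_t)(void);

/* Calls every function pointer in [start, end) in order, like a startup
 * routine walking the table between __init_array_start and __init_array_end. */
void run_init_array(funcp_t *start, funcp_t *end)
{
    assert(start <= end);
    for (funcp_t *fp = start; fp < end; fp++) {
        (*fp)();
    }
}

/* Hypothetical "constructors" that record the order they were called in. */
int init_log[4];
size_t init_count;

static void record(int id)
{
    assert(init_count < sizeof init_log / sizeof init_log[0]);
    init_log[init_count++] = id;
}

void ctor_a(void) { record(1); }
void ctor_b(void) { record(2); }
```

Passing an empty range (start == end), as this project’s empty .init_array effectively does, runs nothing at all.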

At this point the microcontroller goes brrrrrrr…

It has been a few years since part 3 of this series. That’s a yikes on my part. Consider this part to be the final one.

Nice, now I know everything there is to know!

Sure, but if you want to learn more, I suggest reading this article:

Adding code to an existing ELF file
