Tuesday, April 19, 2016

ATMega323 port

This post will talk about the portable part of FreeRTOS. As the Windows port is not exactly a port (FreeRTOS actually runs on top of another operating system, which introduced a few anomalies when the port was made), I'll examine the ATMega323 port files for WinAVR, located under FreeRTOS\Source\portable\GCC\ATMega323. First of all, a quick review of the source file structure.

FreeRTOS Source Files Organization

When you download and extract the zip file containing FreeRTOS [1], you'll see two directories: FreeRTOS-Plus and FreeRTOS. The first one holds the FreeRTOS+ ecosystem [2] and the second the FreeRTOS source code itself (I'll focus on that). Under the FreeRTOS directory, there are 3 folders: License (only has the license itself), Demo (all the different official ports and demo applications available) and Source (the real time kernel source). The core RTOS kernel, contained under the Source folder, consists of basically three files: tasks.c, queue.c and list.c. The others (croutine.c for the co-routine implementation, timers.c for software timers and event_groups.c for event groups) are optional.

Each ported processor requires some specific code (mainly the files port.c and portmacro.h), which, for the official ports, is located under FreeRTOS/Source/portable/[compiler]/[architecture]. For instance, the ATMega323 port we'll check later can be found in FreeRTOS\Source\portable\GCC\ATMega323, as AVR-GCC (WinAVR [3]) is the compiler and ATMega323 is the architecture. The Windows port files we've been using can be found under FreeRTOS\Source\portable\MSVC-MingW.

As the memory management (heap) routines are also needed, the samples we discussed earlier are also provided in the portable layer/folder structure under FreeRTOS\Source\portable\MemMang. There you'll find the 5 sample heap implementations, but you can also write your own and place it there.

Under the Demo folder you'll find the source code of each demo application, along with the common demo code for several functionalities (queues, semaphores, timers, etc.) that I talked about in earlier posts. Each official demo application has its own folder under FreeRTOS/Demo/, named to indicate the port it relates to (this is where FreeRTOSConfig.h is located, for example), as well as an official webpage [4]. Under FreeRTOS/Demo/Common/Minimal/, you'll find the basic implementation of the functionalities that are shared between several applications. The Windows port uses the files under FreeRTOS/Demo/Common/Full/, but that should be avoided, as those are deprecated.
The basic structure of the Windows port demo application can be seen below:

FreeRTOS/
 +- Demo/
 |   +- Common/Minimal/
 |   +- WIN32-MingW/
 +- Source/
     +- *.c
     +- include/
     +- portable/
         +- MemMang/
         +- MSVC-MingW/


Creating a new application

To create a new application from an existing port, the quickest path is to use a Demo application and modify it to fit your needs. Compile and run the standard demo project and, when it runs as expected, you can add and/or remove whatever you may want.

Official Porting Guide

Here, I'll give you a short summary of the official porting guide provided by FreeRTOS [5]. Later on, we'll check the ATMega323 port files.

So the first thing you should do is familiarize yourself with the source file organization (done that!) and create a folder under FreeRTOS/Source/portable/[compiler]/[processor]. Copy an empty port.c and portmacro.h there (you can use complete files from other ports, just remember to clear all function and macro bodies and leave only the stubs). Create a directory for the demo project of the new port under FreeRTOS/Demo/[architecture_compiler] and add a copy of FreeRTOSConfig.h and main.c (again, leave only the stubs and modify the hardware dependent options in the config file, such as the tick rate, heap size, etc. - check my other post about that). Now create a new folder under [architecture_compiler] called ParTest and copy some version of ParTest.c with just the stubs inside. This file will hold a few LED routines that show your port is working: set up a few GPIOs as LED outputs, then turn on, turn off or toggle specific LEDs (remember what I said in an earlier post? Blinking LEDs is the hello world of embedded applications).
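
To give an idea of what the ParTest.c stubs end up doing once filled in, here is a small host-side sketch. The function names follow the ParTest.c convention, but the signatures are simplified, sketch_led_port is a hypothetical stand-in for a real GPIO output register (PORTB on an AVR, for example) and the number of LEDs is an assumption. On real hardware the initialise function would also configure the pins as outputs.

```c
#include <stdint.h>

#define SKETCH_MAX_LEDS 8u   /* assumption: one 8-bit port drives all LEDs */

/* Hypothetical stand-in for a GPIO output register (e.g. PORTB on an AVR). */
static volatile uint8_t sketch_led_port;

/* Start with every LED off. */
void vParTestInitialise(void)
{
    sketch_led_port = 0x00;
}

/* Turn one LED on (xValue != 0) or off (xValue == 0); bad indexes are ignored. */
void vParTestSetLED(unsigned int uxLED, int xValue)
{
    if (uxLED < SKETCH_MAX_LEDS) {
        if (xValue) {
            sketch_led_port |= (uint8_t)(1u << uxLED);
        } else {
            sketch_led_port &= (uint8_t)~(1u << uxLED);
        }
    }
}

/* Invert the state of one LED; bad indexes are ignored. */
void vParTestToggleLED(unsigned int uxLED)
{
    if (uxLED < SKETCH_MAX_LEDS) {
        sketch_led_port ^= (uint8_t)(1u << uxLED);
    }
}
```

In a real port these routines can be called from several tasks, so the read-modify-write on the register would need some protection (a critical section, for instance).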

Now that everything is in place, you can create a project (makefile) that will successfully compile (not run, as there are a lot of stubs to implement) everything:
  • The basic kernel: Source/tasks.c, queue.c and list.c
  • The portable bits: Source/portable/[compiler]/[processor]/port.c
  • Memory management: Source/portable/MemMang/heap_?.c (choose one of them)
  • Application specifics: Demo/[architecture_compiler]/main.c and ParTest/ParTest.c
And the hard part: implementing the stubs. The official guide suggests starting with pxPortInitialiseStack(), as it's very architecture dependent.

ATMega323 port.c

The Windows port is not really a conventional port, since FreeRTOS runs on top of another operating system that is not real time. There are surely a few things to learn by examining it, but so as not to waste any time, I'll examine the ATMega323 port for the WinAVR (AVR-GCC) compiler, since it'll be much closer to our goal of porting to an ATMega2560. The port files can be found under FreeRTOS\Source\portable\GCC\ATMega323\ and the demo project under FreeRTOS\Demo\AVR_ATMega323_WinAVR\. You can make an Eclipse project with the files (there is no ready-made one, as there was for the Windows port), but as this is for study purposes, there is no need. I should just warn you of one thing:


Let's start with pxPortInitialiseStack() in port.c, as the official guide says. This function is responsible for initializing the stack of a new task as if the task had already been running and was then switched out, so that the first context switch into it works like any other. This means some data must be stored in a specific order so that it can be retrieved in the right order later.

StackType_t *pxPortInitialiseStack( 
                    StackType_t *pxTopOfStack, 
                    TaskFunction_t pxCode, 
                    void *pvParameters )

First the types:

  • StackType_t is defined in portmacro.h as portSTACK_TYPE, which in turn is defined as uint8_t. This is the type of the data stored in the stack (in this case, 1 byte wide).
  • TaskFunction_t is the type of the task function itself, not its return type. It's defined in projdefs.h (Source/include/projdefs.h) as a pointer to a function that takes a void * parameter and returns void.
The parameters are: a pointer to the current top of the stack, a pointer to the start of the task's code and a pointer to a set of parameters. This set of parameters is the same one you pass when creating the task with xTaskCreate().
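
To make the two types concrete, here is a minimal sketch that compiles on a PC. The typedefs reproduce what I described above; sketch_task and sketch_run_once are hypothetical names used only for illustration, and a real task would loop forever instead of returning.

```c
#include <stdint.h>

typedef uint8_t StackType_t;             /* one byte per stack slot on this port */
typedef void (*TaskFunction_t)(void *);  /* pointer to a task function */

/* Hypothetical task body: it receives the pvParameters pointer given at
   creation time. A real task never returns; this one does so that the
   sketch can run on a PC. */
static void sketch_task(void *pvParameters)
{
    int *counter = (int *)pvParameters;
    (*counter)++;
}

/* Hypothetical helper: calls a task function once through the pointer type,
   which is roughly what "jumping to pxCode with pvParameters" amounts to. */
static void sketch_run_once(TaskFunction_t pxCode, void *pvParameters)
{
    pxCode(pvParameters);
}
```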

Now to the code: the first thing it does is place a few known values (0x11, 0x22 and 0x33) at the start of the stack. I haven't found much about them beyond the comment in port.c, which says they are just useful for debugging (a few other Demo projects I checked don't have this). After each value is written, the pxTopOfStack pointer is decremented to the next free position, since the stack grows downwards on this architecture.

Next, the address of the start of the code (pxCode) is added to the stack (as the address is 16 bits wide, it has to be added in two steps, low byte first). Then come the 32 CPU registers (the ATMega323 has 32 general purpose registers, of which the last 6 form three 16-bit registers called X, Y and Z, see [6] page 11), with a few peculiarities: the Status Register (SREG) value, with the global interrupt flag set, is inserted just after register r0, and the address of the parameters is placed in r24 and r25, just before the X register (also in two steps, like the pxCode address). In the end, the new pxTopOfStack is returned.
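
The layout is easier to see in code. Below is a host-side sketch of the same idea, not the port's actual source: the function name is hypothetical, the two addresses are passed as plain 16-bit numbers (a PC pointer would not fit in 16 bits) and the filler values for the ordinary registers are an assumption (here simply the register number).

```c
#include <stdint.h>

typedef uint8_t StackType_t;

#define SKETCH_FLAGS_INT_ENABLED 0x80u  /* I bit of SREG set */

/* Builds the initial stack frame, writing downwards from pxTopOfStack,
   and returns the next free position. */
StackType_t *sketch_initialise_stack(StackType_t *pxTopOfStack,
                                     uint16_t code_addr,
                                     uint16_t param_addr)
{
    uint8_t reg;

    /* Known marker bytes at the very start of the stack. */
    *pxTopOfStack-- = 0x11;
    *pxTopOfStack-- = 0x22;
    *pxTopOfStack-- = 0x33;

    /* Return address: low byte first, then high byte. */
    *pxTopOfStack-- = (StackType_t)(code_addr & 0xffu);
    *pxTopOfStack-- = (StackType_t)(code_addr >> 8);

    *pxTopOfStack-- = 0x00;                      /* r0 */
    *pxTopOfStack-- = SKETCH_FLAGS_INT_ENABLED;  /* SREG, interrupts enabled */
    *pxTopOfStack-- = 0x00;                      /* r1, which avr-gcc expects to be zero */

    /* r2..r23: filler values (assumption: the register number). */
    for (reg = 2; reg <= 23; reg++) {
        *pxTopOfStack-- = reg;
    }

    /* r24:r25 carry the first function argument, i.e. pvParameters. */
    *pxTopOfStack-- = (StackType_t)(param_addr & 0xffu);
    *pxTopOfStack-- = (StackType_t)(param_addr >> 8);

    /* r26..r31: the X, Y and Z register pairs. */
    for (reg = 26; reg <= 31; reg++) {
        *pxTopOfStack-- = reg;
    }

    return pxTopOfStack;
}
```

In total 38 bytes are written (3 markers, 2 address bytes, SREG and 32 registers), so the returned pointer sits 38 positions below the one passed in.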

Now let's check context saving and restoring. Context saving means storing all internal registers of the microcontroller so that, when a task goes back to the running state, it looks as if nothing has changed. For instance, if the task is stopped in the middle of some calculation, a few intermediate values will be sitting in the registers and they must be saved so that the calculation can continue later. Context restoring means taking all the stored values and putting them back in the corresponding registers. These two operations (portSAVE_CONTEXT() and portRESTORE_CONTEXT()) are macros written in inline assembly and handle the registers I described earlier. These two and pxPortInitialiseStack() must be kept absolutely in sync, otherwise the value of one register can end up in another and the tasks will most likely fail.

portSAVE_CONTEXT() uses the following assembly instructions (you can check ATMega323 datasheet page 233 [6] for that):
  • push r?: pushes the value in a register into the stack (the stack pointer is already in the right position)
  • in r?,<REG>: loads the value of an I/O space register (<REG>) into register r?
    • ps.: addresses 0x3D and 0x3E correspond to the Stack Pointer register addresses
  • cli: disables the global interrupts
  • clr r?: clears the register
  • lds r?,<variable>: load direct from RAM (the value in the <variable> is stored in r?)
  • st x+,r?: store indirect with post-increment (the value in r? is written to the address pointed to by the X register, then X is incremented)
portRESTORE_CONTEXT():
  • ld r?,x+: the opposite of "st x+,r?", so the value at the address pointed to by the X register is loaded into r?, then X is incremented
  • out <REG>,r?: the opposite of "in r?,<REG>" (this time, the Stack Pointer registers are referenced by their names, __SP_L__ and __SP_H__)
  • pop r?: pops a value from the stack to the register.
Ok, but when are these macros used? One case is when the task yields manually (i.e. calls vPortYield() which, if you follow the defines, is the implementation of taskYIELD()). When the task yields, it has its context saved (a call to portSAVE_CONTEXT()), then the next task to run is selected (you can check vTaskSwitchContext() in tasks.c) and then the context of the new task is restored (a call to portRESTORE_CONTEXT()). The other case is when a tick occurs and another task must take over (vPortYieldFromTick(): the process is the same as I described, with the difference that the tick count is incremented before the next task is selected).
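
Since the real macros are AVR assembly, here is a simulation in plain C of why save and restore must mirror each other. Everything in it is hypothetical (the sketch_ names, the simulated CPU struct, the RAM size); it only mimics the order I described: r0, SREG, r1 to r31 are pushed and the stack pointer is stored in the task's control block, and restoring does exactly the reverse.

```c
#include <stdint.h>

#define SKETCH_RAM_SIZE 256u

/* Simulated CPU: 32 registers, SREG and a stack pointer indexing a RAM array. */
typedef struct {
    uint8_t  r[32];
    uint8_t  sreg;
    uint16_t sp;
    uint8_t  ram[SKETCH_RAM_SIZE];
} sketch_cpu_t;

/* Simulated task control block: it only keeps the saved top of stack. */
typedef struct {
    uint16_t top_of_stack;
} sketch_tcb_t;

/* Like the AVR push: store at SP, then decrement SP. */
static void sketch_push(sketch_cpu_t *cpu, uint8_t value)
{
    cpu->ram[cpu->sp--] = value;
}

/* Like the AVR pop: increment SP, then load from SP. */
static uint8_t sketch_pop(sketch_cpu_t *cpu)
{
    return cpu->ram[++cpu->sp];
}

static void sketch_save_context(sketch_cpu_t *cpu, sketch_tcb_t *tcb)
{
    int i;
    sketch_push(cpu, cpu->r[0]);
    sketch_push(cpu, cpu->sreg);
    cpu->sreg &= (uint8_t)~0x80u;          /* cli: interrupts off while switching */
    for (i = 1; i < 32; i++) {
        sketch_push(cpu, cpu->r[i]);
    }
    tcb->top_of_stack = cpu->sp;           /* remember where this task stopped */
}

static void sketch_restore_context(sketch_cpu_t *cpu, const sketch_tcb_t *tcb)
{
    int i;
    cpu->sp = tcb->top_of_stack;
    for (i = 31; i >= 1; i--) {            /* exact reverse order of the save */
        cpu->r[i] = sketch_pop(cpu);
    }
    cpu->sreg = sketch_pop(cpu);
    cpu->r[0] = sketch_pop(cpu);
}

/* A manual yield: save the current task, pick the next one, restore it.
   In the kernel the choice is made by vTaskSwitchContext(); here the
   caller simply says which task comes next. */
static sketch_tcb_t *sketch_yield(sketch_cpu_t *cpu, sketch_tcb_t *current,
                                  sketch_tcb_t *next)
{
    sketch_save_context(cpu, current);
    sketch_restore_context(cpu, next);
    return next;
}
```

If the two loops ever disagreed on the order, each register would come back holding the value of a different one, which is the failure mode mentioned above.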

Another job of the port is configuring the timer that generates the tick (prvSetupTimerInterrupt()). This is very architecture specific, since the internal registers must be configured so that an event occurs at the tick rate. For the ATMega323, Timer 1 is used in Output Compare mode, the counter is cleared on each compare match and the prescaler is set to 64. In the end, the compare match interrupt is enabled (it will only actually fire once the global interrupt flag is set). The interrupt routine is also defined in port.c and has two variants: when the scheduler is preemptive (the scheduler can stop the current task so that another one takes its place), vPortYieldFromTick() is called; when the scheduler is cooperative (each task has to yield by itself), only the tick count is incremented.
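
The value loaded into the compare register follows from the CPU clock, the tick rate and the prescaler. A small sketch of that arithmetic is below; the function names are hypothetical, the 8 MHz clock and 1000 Hz tick in the usage note are example values, and the tick rate passed in is assumed to be non-zero.

```c
#include <stdint.h>

#define SKETCH_CLOCK_PRESCALER 64u

/* Number of timer counts between two ticks, minus one because the
   counter starts at zero. tick_rate_hz must not be zero. */
static uint32_t sketch_compare_match(uint32_t cpu_clock_hz, uint32_t tick_rate_hz)
{
    uint32_t compare = cpu_clock_hz / tick_rate_hz;  /* CPU cycles per tick */
    compare /= SKETCH_CLOCK_PRESCALER;               /* timer counts per tick */
    compare -= 1u;
    return compare;
}

/* The 16-bit compare value is written to the hardware as two bytes. */
static void sketch_split_compare(uint32_t compare, uint8_t *high, uint8_t *low)
{
    *low  = (uint8_t)(compare & 0xffu);
    *high = (uint8_t)((compare >> 8) & 0xffu);
}
```

For example, with an 8 MHz clock and a 1000 Hz tick, 8000000 / 1000 / 64 - 1 gives a compare value of 124.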

Last, but not least, the xPortStartScheduler() function, as the name says, starts the scheduler. First, it calls the prvSetupTimerInterrupt() described above, then it restores the context (portRESTORE_CONTEXT()), which sets the microcontroller up to run the task pointed to by pxCurrentTCB. In the end, there is an asm "RET", a subroutine return that loads the Program Counter (PC) with the next value on the stack. If you remember pxPortInitialiseStack(), that value is the pxCode parameter, the address where the task code starts. It was placed on the stack before any register, exactly as happens on a tick interrupt: the point where the code stopped is pushed to the stack first, then the context is saved, starting from register r0.

As a final observation on this file, I'll talk a little about the attributes signal and naked that we see in some function definitions, such as the interrupt handlers (I didn't even know they existed). These are directives that change the way the compiler builds its output. The signal attribute makes the compiler insert code that saves and restores every register used in the interrupt code and return with a "RETI" instruction instead of the usual "RET", as the former re-enables interrupts on exit [7]. The naked attribute makes the compiler add no code at the start and end of the function, which means nothing is saved or restored and this is all the user's responsibility. This is especially useful for the tick interrupt with the preemptive scheduler, since the context switch is done there. Without the attribute, a few registers would be saved to the stack by the compiler, then the current context would be saved (so some registers would be on the stack twice), the task would be switched, the new task's context would be restored and finally the compiler-generated code would restore the registers it had saved, corrupting the new task's context. Also, when naked is used, the return instruction ("RETI" in an interrupt handler) must be written explicitly.

ATMega323 portmacro.h

The portmacro.h file contains a few definitions of types that are highly architecture dependent, such as portLONG, portSHORT, portSTACK_TYPE, portBASE_TYPE, etc. A few macros are also defined here, such as those for critical section management (where you need to disable interrupts): portENTER_CRITICAL() first saves the Status Register, then clears the global interrupt flag (even if it was already cleared) and pushes the saved Status Register to the stack; portEXIT_CRITICAL() pops the saved Status Register from the stack and restores it (there is no need to explicitly re-enable the global interrupt flag, since it is part of the Status Register). portSTACK_GROWTH is defined according to the datasheet [6], page 22, where it says
"The Stack Pointer is decremented by one when data is pushed onto the Stack with the PUSH instruction (...)"
thus the portSTACK_GROWTH is defined as -1.
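
Going back to the critical section macros, the reason for saving and restoring the whole Status Register (instead of just disabling and re-enabling interrupts) shows up when critical sections are nested. Here is a simulation in plain C; the sketch_ names, the simulated register and the tiny save stack are all hypothetical, and the sketch does no bounds checking on the nesting depth.

```c
#include <stdint.h>

#define SKETCH_SREG_I 0x80u   /* global interrupt enable bit of SREG */

static uint8_t sketch_sreg = SKETCH_SREG_I;  /* simulated Status Register */
static uint8_t sketch_saved[16];             /* simulated stack for saved copies */
static uint8_t sketch_depth = 0;             /* how many copies are saved */

/* Mirrors: in __tmp_reg__, __SREG__ / cli / push __tmp_reg__ */
static void sketch_enter_critical(void)
{
    uint8_t copy = sketch_sreg;
    sketch_sreg &= (uint8_t)~SKETCH_SREG_I;
    sketch_saved[sketch_depth++] = copy;
}

/* Mirrors: pop __tmp_reg__ / out __SREG__, __tmp_reg__ */
static void sketch_exit_critical(void)
{
    sketch_sreg = sketch_saved[--sketch_depth];
}
```

Leaving the inner critical section restores a Status Register that already had interrupts disabled, so interrupts only come back when the outermost section is left.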



[1] http://www.freertos.org/a00104.html
[2] http://www.freertos.org/FreeRTOS-Plus/
[3] http://winavr.sourceforge.net/
[4] http://www.freertos.org/a00090.html
[5] http://www.freertos.org/FreeRTOS-porting-guide.html
[6] http://www.atmel.com/Images/doc1457.pdf
[7] http://www.freertos.org/implementation/a00012.html

Thursday, April 7, 2016

Memory Management - Heap? Stack Overflow?

Memory Management

Every new task requires some RAM to be allocated by the RTOS kernel (both for the task control block and for the task's stack) in a heap (basically an area of memory known to the kernel where everything is stored). The same happens for every queue, mutex, semaphore, etc. When one of those is deleted, its memory should be freed. In a common OS, with no time restrictions, this is done with malloc() and free(). The problem is that these functions are usually non-deterministic (the time spent can vary), may take a lot of code space, are not thread safe and may simply not be available in the system. To solve these problems, the memory management algorithm can be written by the user, which means it is located in the portable layer rather than in the core FreeRTOS code. Instead of calling malloc() and free(), the kernel calls pvPortMalloc() and vPortFree().


Nevertheless, FreeRTOS includes 5 sample implementations of the memory management scheme that can be used. These are called "heap_*" and each has its own source file located in Source/portable/MemMang (the Full Demo only uses heap_5.c). Other implementations may be added and used, but exactly one of these source files should be included in your project at a time (the RTOS kernel will use the heap it defines, even if the application uses its own heap implementation). The 5 sample implementations are described below.
  • heap_1.c: this is the simplest one and doesn't permit memory to be freed. It is used a lot, as many applications create and initialize all their tasks, queues, semaphores, etc. at system boot and never delete them. The implementation simply subdivides an array as RAM is requested (the size of this array, which is actually the size of the heap, is defined in FreeRTOSConfig.h as configTOTAL_HEAP_SIZE). The API function xPortGetFreeHeapSize() may be used to tune this value. [1]
  • heap_2.c: this is similar to heap_1, but allows memory to be freed. To keep it simple, there is no defragmentation, which means that adjacent free blocks aren't combined into one larger block (heap_4 does this, with what is called a coalescence algorithm). This implementation should be used only if the blocks being freed and allocated are always the same size, i.e., the task stacks are always the same size or the amount of queue storage is always the same; otherwise, the memory may become fragmented into small blocks and future allocations may fail. configTOTAL_HEAP_SIZE controls the size of the heap. [2]
  • heap_3.c: just a wrapper around the compiler library's malloc() and free() that makes those functions thread safe. Beware that this implementation is not deterministic and will probably increase the kernel code size. configTOTAL_HEAP_SIZE has no effect. [3]
  • heap_4.c: works similarly to heap_2, but implements a coalescence algorithm (it combines adjacent free memory blocks into one larger block). This is quite useful when the application needs to allocate memory by itself (by calling pvPortMalloc() and vPortFree() directly), instead of only indirectly through the kernel API. If necessary, the heap can be placed at a specific location (address) by setting the option configAPPLICATION_ALLOCATED_HEAP in FreeRTOSConfig.h and explicitly declaring the ucHeap[ configTOTAL_HEAP_SIZE ] array (more details in [4]).
  • heap_5.c: does all that heap_4 does, plus allows the heap to span several separate regions of RAM of different sizes, instead of one huge contiguous block (i.e., one region starts at address A, with size X, the next starts at address B, with size Y, and so on). The heap is initialized by calling vPortDefineHeapRegions(), which takes as its parameter an array of structures describing the regions. Nothing can use the heap before this function is executed. As the Full Demo uses heap_5, it has to call the function (you can check prvInitialiseHeap() in main.c). More details can be seen on the official webpage [5].
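
To show how little the simplest scheme needs, here is a sketch of an allocator in the spirit of heap_1: it hands out consecutive slices of a static array and never frees anything. It is not the FreeRTOS source. The sketch_ names, the 64-byte heap and the 4-byte alignment are assumptions, and unlike the real thing it does not suspend the scheduler while allocating.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_HEAP_SIZE 64u   /* stands in for configTOTAL_HEAP_SIZE */
#define SKETCH_ALIGNMENT 4u    /* assumed; the real alignment is port specific */

static uint8_t sketch_heap[SKETCH_HEAP_SIZE];
static size_t  sketch_next_free = 0;   /* offset of the first unused byte */

/* Returns the next slice of the array, or NULL when it does not fit. */
void *sketch_malloc(size_t wanted)
{
    void *block;
    /* Round the request up to a multiple of the alignment. */
    size_t rounded = (wanted + (SKETCH_ALIGNMENT - 1u)) &
                     ~(size_t)(SKETCH_ALIGNMENT - 1u);

    if (wanted == 0 || rounded < wanted ||
        rounded > SKETCH_HEAP_SIZE - sketch_next_free) {
        return NULL;
    }

    block = &sketch_heap[sketch_next_free];
    sketch_next_free += rounded;
    return block;
}

/* Same idea as xPortGetFreeHeapSize(): how much of the array is left. */
size_t sketch_free_heap_size(void)
{
    return SKETCH_HEAP_SIZE - sketch_next_free;
}
```

Because nothing is ever returned to the array, the time taken is always the same and fragmentation cannot happen, which is exactly why this scheme is attractive when everything is created at boot.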
A few tricks can be used when talking about the heap management. In systems where there is an external, slower, RAM, using two implementations at once allows task stacks and other RTOS objects to be placed in the fast internal RAM, while application data can be placed in the external one.

Stack Usage - Stack Overflow

The task stack is used by the task itself, which has to stay within the size determined at creation time (when xTaskCreate() is called). Sometimes a task tries to use more stack than is available, which causes a popular problem and a common reason for system instability: a stack overflow.
To ease the debugging of this problem, FreeRTOS offers two mechanisms to check for stack overflow and call a function (hook) if that happens. The configCHECK_FOR_STACK_OVERFLOW option, in FreeRTOSConfig.h, chooses between the two methods and, if either is chosen, a hook function must be provided (vApplicationStackOverflowHook()).

  • Method 1 (configCHECK_FOR_STACK_OVERFLOW set to 1): the stack pointer is checked whenever the task is swapped out of the running state. If the pointer is out of bounds, the hook is called.
  • Method 2 (configCHECK_FOR_STACK_OVERFLOW set to 2): the task's stack is filled with a known value when the task is created. Whenever the task is swapped out of the running state, the last 16 bytes of its stack are checked and, if any of them no longer holds the known value, the hook is called. This method is always used in conjunction with method 1.
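
Both checks are simple enough to sketch in a few lines of C. This is only an illustration of the idea, not the kernel code: the sketch_ names are hypothetical, the fill value 0xa5 is an assumption and the stack is assumed to grow downwards, so its "last" bytes are the ones at the lowest addresses.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_FILL_BYTE   0xa5u   /* assumed fill pattern */
#define SKETCH_CHECK_BYTES 16u     /* how many bytes method 2 inspects */

/* Done once at task creation: paint the whole stack with the pattern. */
static void sketch_fill_stack(uint8_t *stack, size_t size)
{
    memset(stack, SKETCH_FILL_BYTE, size);
}

/* Method 1: is the saved stack pointer still above the stack limit?
   Both pointers must refer to the same stack array. */
static int sketch_sp_in_bounds(const uint8_t *stack_start, const uint8_t *saved_sp)
{
    return saved_sp > stack_start;
}

/* Method 2: are the 16 bytes at the end of the stack still untouched? */
static int sketch_pattern_intact(const uint8_t *stack_start)
{
    size_t i;
    for (i = 0; i < SKETCH_CHECK_BYTES; i++) {
        if (stack_start[i] != SKETCH_FILL_BYTE) {
            return 0;
        }
    }
    return 1;
}
```

Method 1 only sees the stack pointer at the moment of the switch, so it misses an overflow that happened and unwound before the task was swapped out. Method 2 catches most of those cases because the overwritten pattern stays behind.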


[1] http://www.freertos.org/a00111.html#heap_1
[2] http://www.freertos.org/a00111.html#heap_2
[3] http://www.freertos.org/a00111.html#heap_3
[4] http://www.freertos.org/a00111.html#heap_4
[5] http://www.freertos.org/a00111.html#heap_5