Figure 1: An Example of Kernel Service Calls
We can see that SamplerTask is running, but in its last execution it does not reset the watchdog timer, which allows the watchdog to expire and reset the system. So why didn’t SamplerTask reset the watchdog timer? Let’s enable Kernel Service calls, as in figure 1, to see what the task was doing.
Detecting Priority Inversion Using Tracealyzer
Tracealyzer is a profiling tool that gives engineers visibility into the sequence of events occurring within their software designs. This example from Percepio illustrates how to detect a Priority Inversion with Tracealyzer. It concerns an engineer who had an issue with a randomly occurring reset. By placing a breakpoint in the reset exception handler, they found that the watchdog timer had expired. The watchdog timer was supposed to be reset by a high-priority task that executed periodically.
The ability to insert custom Tracealyzer User Events allowed the engineer to gain greater visibility. ‘User Events’ are similar to a classic “printf()” call, and in this example they were added where the watchdog timer was reset and where it expired. User Events also support data arguments, which were used here to log the timer value just before resetting it, showing the watchdog “margin”, i.e., the remaining time. The result can be seen in the yellow text labels of figure 2.
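The margin logging described above can be sketched on a host machine. This is a minimal model, not Tracealyzer code: `watchdog_model_t`, `watchdog_kick` and `log_user_event` are hypothetical names, the latter standing in for a real User Event call, and the 30 ms timeout in the usage note is an assumed value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical software model of a watchdog timer (illustrative names). */
typedef struct {
    uint32_t timeout_ms;   /* time allowed between two resets */
    uint32_t deadline_ms;  /* absolute time at which the watchdog expires */
} watchdog_model_t;

/* Stand-in for a User Event: formats a label and one data argument. */
static void log_user_event(char *buf, size_t len,
                           const char *label, uint32_t value)
{
    snprintf(buf, len, "%s: %lu ms", label, (unsigned long)value);
}

void watchdog_init(watchdog_model_t *wd, uint32_t now_ms, uint32_t timeout_ms)
{
    assert(wd != NULL);
    wd->timeout_ms = timeout_ms;
    wd->deadline_ms = now_ms + timeout_ms;
}

/* Logs the remaining margin just before restarting the timer.
 * Returns the margin in ms, or 0 if the deadline has already passed.
 * Tick counter wraparound is ignored in this sketch. */
uint32_t watchdog_kick(watchdog_model_t *wd, uint32_t now_ms,
                       char *buf, size_t len)
{
    assert(wd != NULL);
    uint32_t margin = (now_ms < wd->deadline_ms)
                          ? wd->deadline_ms - now_ms
                          : 0;
    log_user_event(buf, len, "Watchdog margin", margin);
    wd->deadline_ms = now_ms + wd->timeout_ms;
    return margin;
}
```

With a 30 ms timeout started at time 0, a reset at 20 ms logs a margin of 10 ms. A shrinking margin across successive resets is the signal to look for in the trace.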
Figure 2: An Example of User Events
Figure 3: Switching between ServerTask and ControlTask
The last event of SamplerTask is a call to xQueueSend, an OPENRTOS® function that puts a message in a message queue. Note that the label is red, meaning that the xQueueSend call blocked the task. This caused a context switch to ServerTask before the watchdog timer had been reset, so the watchdog expired and reset the system.
Fixing Priority Inversion by Tuning Task Priority Levels
One solution is to change the scheduling priorities so that ControlTask gets a higher priority than ServerTask. Figure 3 shows the result of switching the scheduling priorities of ServerTask and ControlTask. The system now behaves much more stably. The CPU load of SamplerTask (here red) is steady at around 20%, indicating stable periodic behavior, and the watchdog margin is a perfect “line”, always at 10 ms. The watchdog no longer expires: problem solved! (Note that the task colours have changed due to the change in relative priority levels.)
So why was xQueueSend blocking the task? By double-clicking on this Event Label, we open the Object History View, which shows all operations on this particular queue, “ControlQueue”, as illustrated in figure 4. The far right column shows a visualization of the buffered messages. We can see that the message queue already contains five messages and is probably full, hence the blocking. But ControlTask is supposed to read the queue and make room, so why hasn’t this worked as expected? To investigate, it would be interesting to see how the watchdog margin varies over time. We have this information in the User Event logging, and the User Event Signal Plot lets us plot the watchdog margin over time. By adding a CPU Load Graph on the same timeline, we can see how task execution affects the watchdog margin, as shown in figure 5.
Figure 4: Object History View
Figure 5: User Event Signal Plot
In the CPU Load Graph, we can see that ServerTask is executing a lot in the second half of the trace, and this seems to affect the watchdog margin. ServerTask (bright green) has higher priority than ControlTask (dark green), so when it executes heavily at the end of the trace, ControlTask gets less CPU time. This is most likely what fills the message queue: ControlTask cannot read messages fast enough while the higher-priority ServerTask is using most of the CPU time. This is an example of a Priority Inversion problem, as SamplerTask is blocked by an unrelated task of lower priority.
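The starvation mechanism can be illustrated with a small host-side simulation of fixed-priority scheduling. This is a deliberately simplified sketch, not the engineer's system: the queue length of 5, the sampling period of 2 ticks, and the assumption that ServerTask wants the CPU continuously from a given tick onward are all illustrative, and `first_blocked_tick` is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>

#define QUEUE_LEN 5       /* assumed ControlQueue capacity */
#define SAMPLE_PERIOD 2   /* assumed: SamplerTask posts every 2 ticks */

/* Returns the tick at which SamplerTask first finds the queue full
 * (the point where a blocking send would stall it), or -1 if the
 * queue never fills within total_ticks. */
int first_blocked_tick(int server_prio, int control_prio,
                       int total_ticks, int server_busy_from)
{
    assert(server_prio != control_prio);
    int queued = 0;

    for (int tick = 0; tick < total_ticks; tick++) {
        /* SamplerTask has the highest priority and runs first. */
        if (tick % SAMPLE_PERIOD == 0) {
            if (queued == QUEUE_LEN) {
                return tick;      /* the send would block here */
            }
            queued++;
        }

        /* The rest of the tick goes to the highest-priority ready task. */
        bool server_ready = tick >= server_busy_from;
        bool control_ready = queued > 0;
        bool control_runs = control_ready &&
                            (!server_ready || control_prio > server_prio);
        if (control_runs) {
            queued--;             /* ControlTask reads one message */
        }
    }
    return -1;
}
```

In this model, `first_blocked_tick(2, 1, 100, 10)` (ServerTask above ControlTask, ServerTask busy from tick 10) fills the queue and blocks the sampler at tick 20, while `first_blocked_tick(1, 2, 100, 10)` (priorities swapped) never blocks.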
Fixing Priority Inversion by Using a Gatekeeper
To avoid Priority Inversion when accessing system resources, we recommend using a ‘Gatekeeper’ Task. A basic Gatekeeper Task (figure 6) consists of a Task that controls the ‘resource’, a queue for receiving data and commands, and a callback function.
Application Tasks write data and commands to the queue instead of accessing the resource directly. The Gatekeeper Task processes the data and commands, and the resource is updated accordingly. When the resource changes state (say, a new network message arrives), an ISR is triggered and places the new data in the queue. The Gatekeeper Task then executes the registered callback function to pass the new data back to the Application Task.
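The Gatekeeper flow described above can be sketched in plain C. This is a single-threaded model under stated assumptions: all `gk_*` and `gatekeeper_*` names are hypothetical, the ring buffer stands in for an RTOS queue (which would handle locking and ISR-safe posting on a real target), and `app_on_data` is an example callback owned by the Application Task.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define GK_QUEUE_LEN 4    /* assumed queue depth */

typedef enum { GK_CMD_WRITE, GK_EVT_RESOURCE_CHANGED } gk_kind_t;
typedef struct { gk_kind_t kind; int value; } gk_msg_t;
typedef void (*gk_callback_t)(int new_data);

typedef struct {
    gk_msg_t queue[GK_QUEUE_LEN];  /* stand-in for an RTOS queue */
    size_t head, count;
    int resource;                  /* touched only by gatekeeper_step */
    gk_callback_t on_data;         /* registered by the Application Task */
} gatekeeper_t;

void gatekeeper_init(gatekeeper_t *gk, gk_callback_t cb)
{
    assert(gk != NULL);
    gk->head = 0;
    gk->count = 0;
    gk->resource = 0;
    gk->on_data = cb;
}

/* Called by Application Tasks (commands) or by the ISR (events).
 * Non-blocking: returns false when the queue is full. */
bool gatekeeper_post(gatekeeper_t *gk, gk_msg_t msg)
{
    if (gk->count == GK_QUEUE_LEN) {
        return false;
    }
    gk->queue[(gk->head + gk->count) % GK_QUEUE_LEN] = msg;
    gk->count++;
    return true;
}

/* One iteration of the Gatekeeper Task loop: handle one message.
 * Returns false when there was nothing to process. */
bool gatekeeper_step(gatekeeper_t *gk)
{
    if (gk->count == 0) {
        return false;
    }
    gk_msg_t msg = gk->queue[gk->head];
    gk->head = (gk->head + 1) % GK_QUEUE_LEN;
    gk->count--;

    if (msg.kind == GK_CMD_WRITE) {
        gk->resource = msg.value;      /* update the resource */
    } else if (gk->on_data != NULL) {
        gk->on_data(msg.value);        /* pass new data back */
    }
    return true;
}

/* Example callback an Application Task might register. */
int app_last_data = 0;
void app_on_data(int new_data) { app_last_data = new_data; }
```

Because only `gatekeeper_step` touches `resource`, no Application Task ever holds the resource, and a non-blocking post means a full queue is reported to the caller rather than stalling it.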
This method prevents Priority Inversion when accessing system resources and is our recommended solution.
Figure 6: A Gatekeeper Task