I have just spent the last few days and weeks trying to track down a very annoying, yet somewhat subtle bug in a program that I have been making for work. So, first a little background. We are making a gateway that talks over radio to other, remote lamps in order to control them and get status back. One thing that it must be able to do is to change the frequency of a remote lamp. In order to do this, you must first change your local frequency(if it has to be changed) and then tell the lamp the new frequency to go to, and ensure that you can still talk to the lamp once that has happened. This is all controlled by a web page, so the sequence looks something like this:
Web Request --> (http) --> Gateway --> (serial, 115200) --> Radio --> (RF) --> Lamp Radio
However, we were having a problem with this sequence. Almost all the time, the responses that we were getting back from the radio were short by a few bytes. Sometimes it would be 1, other times it would be 15 bytes that we missed. Moreover, the bytes that were dropped changed position. Generally it wouldn’t really matter, since there were a bunch of unused bytes that were all initialized to 0x00, but we couldn’t tell where in that sequence the bytes were getting dropped. Now let’s go into some pseudo-code here. This is what the update looked like:
void update( Map<String,Object> JSONData ){ if( JSONData[ "attr1" ] ) writeToDatabase( JSONData[ "attr1" ] ) // ..repeat the above a few times if( JSONData[ "frequency" ] ){ //the frequency needs to be updated new ThreadToUpdateTheFrequency().start(); } if( JSONData[ "attr4" ] ) writeToDatabase( JSONData[ "attr4" ] ) //repeat the above a few more times
There’s not anything wrong with this code. Sure, it’s probably inefficient, but that’s somewhat besides the point. This code works sometimes, however a large portion of the time it will randomly drop bytes in the new thread that is started if the frequency is changed.
To debug this, I tried everything that I could think of. Timing the sequence between us writing out data to the radio and receiving it back, writing data out at random times, etc. Many hours were spent on this in order to figure out what was going on. One of the things that we noticed though was that this only happened at certain times – when we were actually writing to the database. If we were writing out commands without a write to the database, everything worked properly. Moreover, if we stepped through the process in the debugger, everything would work properly.
What? Why would the database cause a problem?
Well, let’s back up a bit and talk about the hardware that this is running on. Linux is using an SD card as the root filesystem, and the processor is an Atmel SAMA5D3. This is not a particularly fast processor, in addition to being only single-core. What finally fixed the problem is the following:
void update( Map<String,Object> JSONData ){ if( JSONData[ "attr1" ] ) writeToDatabase( JSONData[ "attr1" ] ) // ..repeat the above a few times for ALL ATTRIBUTES if( JSONData[ "frequency" ] ){ //the frequency needs to be updated new ThreadToUpdateTheFrequency().start(); }
Why would moving the new thread to the end of the updates fix anything? Well, here’s the basic theory that I have come up with:
SQLite is writing out to the database. When it does that, it attempts to lock the database(see the documentation for some more details). This takes quite a bit of time, especially talking to the SD card which is running at just a few megabytes per second at the most. At some point during this time, a new interrupt comes in. However, the kernel does not process the interrupt fast enough before a new character comes in on the serial line, and thus some data will get dropped since the processor is assuming that the data has been fully read.
Oddly enough, dmesg does not show many problems; I only get a few input overruns. It looks like there could be a kernel bug here compounded by how fast the SD card can be read from.
Leave a Reply