Monday, July 12, 2021

Defensive coding for long-term installations

There are special challenges when writing code that must run for long periods. Contexts include embedded applications, gallery installations, information kiosks, and lighting & signage displays.

When we test code it is, perforce, for short durations. Yet we must ensure that our application remains robust over the long term. This short article will consider one issue: counting. I will use particulars germane to Arduino and Processing, these being two common environments used by creative coders.


First I will briefly mention pernicious bug sources, including memory leaks in the application framework itself, operating system bugs, etc. There is little one can do to code safely in an environment that itself is not safe. Limiting the number of times that resources are loaded can help. Providing for a complete reset is also a good option.

Managing an expected reset (or indeed an unexpected power failure) is not always easy. A gallery or museum might power down all devices at the end of the day and restart them in the morning. The boot process must be completely transparent, since the operator cannot be expected to know anything about your software or hardware. Commercial operating systems and off-the-shelf computer hardware don't necessarily accommodate headless booting and similar requirements, at least not without effort.

But let me leave aside those concerns and assume that we have an application that must run for an arbitrary period. We don't know how long, but it will be much longer than we can reasonably test for. I will address one simple coding issue: counting. This will provide an example of a general design principle.

Problems with counting

Long-duration errors can easily creep into code when counting. Many applications need to change state. It is common to define a variable that increments on each pass through the main logic of the application. Typically, this counter is tested with a modulo expression (for example, count % 300 == 0) in order to trigger different sections of code.

The problem with this design pattern is that the counters always grow, and will exceed the bounds allocated to them, given enough time.
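To make the failure concrete, here is a small C++ sketch (my own illustration, not code from any particular project) that simulates the modulo pattern and reports the shortest gap between triggers. It uses unsigned counter types, because in C and C++ overflowing a signed integer is undefined behaviour, whereas unsigned values wrap to zero in a well-defined way. The function name shortestGapBetweenTriggers is hypothetical.

```cpp
#include <cstdint>

// Simulate `ticks` passes through a main loop that increments a counter of
// type Counter and fires an action whenever counter % period == 0.
// Returns the shortest gap (in ticks) seen between two firings; if the
// counter never wraps, every gap equals `period`.
template <typename Counter>
constexpr long shortestGapBetweenTriggers(long ticks, int period) {
    Counter count = 0;
    long lastFire = -1;     // tick index of the previous firing, -1 if none yet
    long shortest = ticks;  // stays at `ticks` if fewer than two firings occur
    for (long tick = 0; tick < ticks; ++tick) {
        if (count % period == 0) {
            if (lastFire >= 0 && tick - lastFire < shortest) {
                shortest = tick - lastFire;
            }
            lastFire = tick;
        }
        // Unsigned types wrap around to 0 when they pass their maximum.
        count = static_cast<Counter>(count + 1);
    }
    return shortest;
}
```

With an 8-bit counter and a period of 100, the action fires at 0, 100 and 200, then again only 56 ticks later when the counter wraps from 255 to 0. A 16-bit counter fails the same way, just later: the last trigger before the wrap is at 65,500, and the next one comes 36 ticks after it.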

Consider first the Arduino environment, which offers several integer variable types. The simple int stores a 16-bit (2-byte) value on the Arduino Uno and similar boards. One complication is that on an Arduino Due (and similar boards), an int instead stores a 32-bit (4-byte) value. That inconsistency can be avoided by using a long, which stores a 32-bit value no matter the hardware.

Note that these signed types must represent positive and negative values (plus zero), so their positive range is roughly half the total. For this reason Arduino also offers the unsigned long, which devotes its entire 32-bit range to non-negative values.

In Processing an int is already 32 bits, while a long is 64 bits. This considerably reduces the risk of overflow.

The table below summarises.

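| variable type | bit length | max value |
|---|---|---|
| int (Arduino Uno and similar) | 16 | 32,767 |
| int (Arduino Due and similar) | 32 | 2,147,483,647 |
| long (Arduino) | 32 | 2,147,483,647 |
| unsigned long (Arduino) | 32 (positive only) | 4,294,967,295 |
| int (Processing) | 32 | 2,147,483,647 |
| long (Processing) | 64 | 9,223,372,036,854,775,807 |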
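If you would rather check these limits on your own toolchain than take the table on trust, a compile-time check along these lines is one option. It is a sketch for a desktop C++ compiler: the <cstdint> fixed-width types stand in for the platform type names, and the mapping given in the comments is my assumption about the boards discussed above.

```cpp
#include <cstdint>
#include <limits>

// Fixed-width stand-ins for the platform-dependent names in the table:
//   std::int16_t  ~ int on an Arduino Uno
//   std::int32_t  ~ long on Arduino; int on an Arduino Due or in Processing
//   std::uint32_t ~ unsigned long on Arduino
//   std::int64_t  ~ long in Processing
constexpr auto maxInt16  = std::numeric_limits<std::int16_t>::max();
constexpr auto maxInt32  = std::numeric_limits<std::int32_t>::max();
constexpr auto maxUint32 = std::numeric_limits<std::uint32_t>::max();
constexpr auto maxInt64  = std::numeric_limits<std::int64_t>::max();
```

Fixed-width names such as uint32_t are also commonly usable in Arduino sketches, which is another way to sidestep the Uno/Due difference in the size of int.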

Some sanity checks are now in order.

If you are counting frames at 60 fps, you will exceed the range of a signed 16-bit variable in a little over 9 minutes (32,767 / 60 ≈ 546 seconds). The total count after one day will be 5,184,000 (60 × 86,400 seconds), so a signed 32-bit variable is good for about 414 days.
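The frame arithmetic above can be reproduced in a few lines. This sketch assumes a counter that starts at zero and increases once per frame; secondsUntilOverflow is a hypothetical helper name.

```cpp
#include <cstdint>
#include <limits>

// Seconds until a counter of type Counter, starting at zero and incremented
// `ratePerSecond` times per second, reaches its maximum value.
template <typename Counter>
constexpr double secondsUntilOverflow(double ratePerSecond) {
    return static_cast<double>(std::numeric_limits<Counter>::max()) / ratePerSecond;
}

constexpr double kSecondsPerDay = 24.0 * 60.0 * 60.0;  // 86,400
constexpr double kFps = 60.0;

// Frames counted in one day at 60 fps: 60 * 86,400 = 5,184,000.
constexpr double framesPerDay = kFps * kSecondsPerDay;

// Signed 16-bit counter (int on an Uno): a little over 9 minutes.
constexpr double minutesFor16Bit = secondsUntilOverflow<std::int16_t>(kFps) / 60.0;

// Signed 32-bit counter: about 414 days.
constexpr double daysFor32Bit = secondsUntilOverflow<std::int32_t>(kFps) / kSecondsPerDay;
```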

What if you are counting milliseconds? There are 86,400,000 in a day, so a signed 32-bit variable is good for just under 25 days.

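Arduino users can move up to the unsigned long, but this only delays the inevitable overflow (to about 49.7 days for a millisecond count). Processing users can use long, which is so large (a millisecond count would take roughly 292 million years to overflow) that it might indeed be a solution.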
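Here is a sketch of the millisecond arithmetic for the three widths just mentioned. It assumes a counter that starts at zero and counts up once per millisecond; daysUntilMillisOverflow is a hypothetical name.

```cpp
#include <cstdint>
#include <limits>

// Milliseconds in one day: 24 * 60 * 60 * 1000 = 86,400,000.
constexpr double kMillisPerDay = 24.0 * 60.0 * 60.0 * 1000.0;

// Days until a millisecond counter of type Counter, starting at zero,
// reaches its maximum value.
template <typename Counter>
constexpr double daysUntilMillisOverflow() {
    return static_cast<double>(std::numeric_limits<Counter>::max()) / kMillisPerDay;
}

constexpr double signed32Days   = daysUntilMillisOverflow<std::int32_t>();   // just under 25
constexpr double unsigned32Days = daysUntilMillisOverflow<std::uint32_t>();  // about 49.7
constexpr double signed64Years  = daysUntilMillisOverflow<std::int64_t>() / 365.25;  // about 292 million
```

Arduino's own millis() returns an unsigned long, so it rolls over at the unsigned 32-bit figure.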

A couple of special things for Processing coders to note. The predefined variable frameCount is an int, and the function millis() returns an int. At 60 fps frameCount overflows after about 414 days, and millis() after just under 25 days, so both must be used with caution.
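One common defensive idiom for timers (not something this article prescribes, and shown here only for Arduino-style unsigned 32-bit timestamps such as those from millis()) is to test elapsed time by subtraction rather than by comparing absolute timestamps. Unsigned subtraction is computed modulo 2^32, so the difference stays correct across a rollover as long as the real interval is shorter than the counter's range. The function names are hypothetical.

```cpp
#include <cstdint>

// True once at least `interval` ms have passed since `start`.
// `now` and `start` are raw readings of a 32-bit unsigned millisecond clock.
// The subtraction wraps modulo 2^32, so it remains correct even if the clock
// rolled over to 0 between the two readings.
constexpr bool intervalElapsed(std::uint32_t now, std::uint32_t start,
                               std::uint32_t interval) {
    return static_cast<std::uint32_t>(now - start) >= interval;
}

// The naive form misbehaves near the rollover: start + interval can itself
// wrap to a small number, so the test fires too early.
constexpr bool intervalElapsedNaive(std::uint32_t now, std::uint32_t start,
                                    std::uint32_t interval) {
    return now >= static_cast<std::uint32_t>(start + interval);
}
```

For example, with start = 4,294,967,000 and an interval of 1,000 ms, the subtraction form waits until the clock has wrapped to 704, whereas the naive form already returns true 100 ms after start.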

The better way

But there is a better solution that completely avoids worrying about which maximum values apply in the current context. Don't assume the best; assume the worst. Code as though you will always encounter this problem. This means adopting a robust design pattern.

A simple solution is to reset the counter after some arbitrary value (hopefully a value of use to your application, but it doesn't need to be). This puts an upper limit on the value of the variable. This might require changing the logic of your code. Perhaps there is a natural cycle to your application that can utilise this reset repetition.
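A minimal sketch of the reset approach, assuming a hypothetical cycle of 600 frames (10 seconds at 60 fps). The names kCycleLength, nextPhase, cycleStarts, firstHalf and phaseAfter are all illustrative. Because the value is bounded by the cycle length, the choice of integer type no longer depends on how long the installation runs.

```cpp
#include <cstdint>

// Hypothetical cycle length: 600 frames, i.e. 10 seconds at 60 fps.
constexpr std::uint16_t kCycleLength = 600;

// Advance the counter by one step, resetting to 0 at the end of the cycle.
// The value never exceeds kCycleLength - 1, however long the program runs.
constexpr std::uint16_t nextPhase(std::uint16_t phase) {
    const int next = phase + 1;
    return next >= kCycleLength ? 0 : static_cast<std::uint16_t>(next);
}

// Example checks on the bounded counter, in place of `count % period` tests:
// the start of each cycle, and a state that is active for the first half.
constexpr bool cycleStarts(std::uint16_t phase) { return phase == 0; }
constexpr bool firstHalf(std::uint16_t phase) { return phase < kCycleLength / 2; }

// Run the counter for `steps` passes from 0 and return where it ends up.
constexpr std::uint16_t phaseAfter(unsigned long steps) {
    std::uint16_t phase = 0;
    for (unsigned long i = 0; i < steps; ++i) {
        phase = nextPhase(phase);
    }
    return phase;
}
```

In a draw() or loop() function you would keep one phase variable, call nextPhase once per pass, and test it with checks like cycleStarts and firstHalf.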

If you do need to track a very large number without resetting, it's possible to split it into a high byte and a low byte (or 2 high bytes and 2 low bytes, or 4 and 4) and manage the internal arithmetic yourself, using variables with bit lengths you can guarantee will be available on your platform.
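Here is a sketch of the split approach using two 16-bit halves, which together count to 4,294,967,295 while relying only on 16-bit arithmetic for the counter itself. SplitCounter, increment, lessThan and toUint32 are hypothetical names; toUint32 exists only to inspect the result on a host that has 32-bit types.

```cpp
#include <cstdint>

// A 32-bit count held as two 16-bit halves; the carry is handled by hand.
struct SplitCounter {
    std::uint16_t high = 0;
    std::uint16_t low = 0;
};

// Add one, carrying into the high half when the low half wraps to 0.
constexpr SplitCounter increment(SplitCounter c) {
    c.low = static_cast<std::uint16_t>(c.low + 1);
    if (c.low == 0) {
        c.high = static_cast<std::uint16_t>(c.high + 1);
    }
    return c;
}

// Compare two counts: high halves first, then low halves.
constexpr bool lessThan(SplitCounter a, SplitCounter b) {
    return a.high != b.high ? a.high < b.high : a.low < b.low;
}

// Combine the halves, for inspection or testing only.
constexpr std::uint32_t toUint32(SplitCounter c) {
    return (static_cast<std::uint32_t>(c.high) << 16) | c.low;
}
```

The same carry pattern extends to three or four halves if 32 bits is still not enough.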

So, yes, the problem is simple and the answer is (usually) simple. But it's a good example of a defensive coding practice that avoids problems before they happen. Your code will be more robust when translated from one environment to another, when changing hardware, etc.

