How in 2022 can we still be subjected to time formatting and storage bugs? You know, the same thing that the Y2K Problem/Millennium Bug was.
Sadly, in software, memory can be short.
On the 1st of January 2022, owners of various Honda and Acura car models built between 2004 and 2012 found that their in-car navigation systems didn’t think it was the first day of 2022.
Instead, the navigation systems decided it was the 1st of January 2002 and advanced the clock by an hour.
Trying to adjust the date and time to what it should be didn’t work for affected vehicle owners.
These embedded systems weren’t the only ones deciding to clock out at the start of the New Year.
Microsoft Exchange (for on-premises users) also threw in a similar headache for IT admins because a spam filter hadn’t been coded to handle the new date.
Both are examples of time formatting and storage bugs.
But how do these bugs happen, and why are they a problem for software teams, especially those working in embedded software? Let’s find out.
What is a time formatting and storage bug, and how do they happen?
Software bugs of the time formatting and storage variety happen when a time value calculated in software needs to roll over to its next value and can’t be stored correctly. It is a form of integer overflow. If there aren’t measures at a software level to handle this, software can behave in unintended ways.
When a time rollover fails, it’s often because the integer you’re trying to store exceeds the size of the memory you’ve allocated to store it in. And so that calculation overflows.
As software carries out various arithmetic calculations to return values like time and date, the integer returned cannot be bigger than the space allocated to it in memory.
Think of memory like an incredibly fickle tea mug
In our example, the mug can hold the tea equivalent of an unsigned 8-bit integer. How much integer tea can it hold? The maximum is 255.
Fill it up past its limit, say by trying to pour in the tea equivalent of the integer 256, and it overflows.
Unlike a real mug and tea, there’s much less tea left in the mug afterwards.
That data overflow in the software causes an unintended rollover, rolling the stored data over to zero. But in software, the mess left by that zero can be more than just the incorrect time and date or tea everywhere. Teatime could be cancelled.
This means that the more memory you allocate to something, the bigger the integer you can store and the less likely you are to experience overflow.
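As a minimal sketch of the mug overflowing, the Python below imitates an unsigned 8-bit value by keeping only the low 8 bits of each result. Python's own integers never overflow, so the masking is an assumption standing in for fixed-width hardware, and `pour` is a hypothetical helper name.

```python
MUG_BITS = 8                    # width of the "mug" in bits
MUG_MAX = (1 << MUG_BITS) - 1   # 255, the largest unsigned 8-bit value


def pour(current: int, amount: int) -> int:
    """Add to an unsigned 8-bit value the way fixed-width storage does:
    only the low 8 bits of the result are kept."""
    return (current + amount) & MUG_MAX


print(pour(254, 1))    # 255: still fits
print(pour(255, 1))    # 0: one past the limit wraps to zero
print(pour(250, 10))   # 4: 260 wraps around to 260 - 256
```

Widening the mug to 16 bits (`MUG_BITS = 16`) moves the limit to 65,535, which is the "allocate more memory" option in code form.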
But if your software also has steps to tackle expected overflow at a code level, because you specified how the system should behave in this situation and your testers tested for it, it becomes less of a risk. This is important in embedded systems, where memory is at a premium compared with a desktop computer. You can’t always throw more memory at the problem when it comes to embedded.
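One way to tackle overflow at a code level is to decide up front what should happen at the limit. The sketch below shows two illustrative policies for the same 8-bit mug: refuse the operation, or clamp at the maximum. Both function names are hypothetical, both assume non-negative inputs, and which policy is right depends on your specification.

```python
UINT8_MAX = 255  # largest value an unsigned 8-bit integer can hold


def checked_add_u8(a: int, b: int) -> int:
    """Refuse to overflow: raise instead of silently wrapping around."""
    total = a + b
    if total > UINT8_MAX:
        raise OverflowError(f"{a} + {b} does not fit in 8 bits")
    return total


def saturating_add_u8(a: int, b: int) -> int:
    """Clamp at the limit: the mug stays full rather than emptying."""
    return min(a + b, UINT8_MAX)


print(saturating_add_u8(250, 10))   # 255
try:
    checked_add_u8(250, 10)
except OverflowError as err:
    print("rejected:", err)
```

Either way, the behaviour at the limit is now something a tester can check against the specification rather than a surprise.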
So, what if your tea mug is an in-car navigation system?
Honda, Acura, and the GPS week number rollover
To understand what happened with those in-car navigation systems at the start of 2022, we need to look at how embedded systems that use GPS handle time.
For years, embedded systems that use GPS (Global Positioning System) have been coded to allow for a quirk of GPS.
Why? Because of the GPS epoch. It’s the particular starting point GPS uses to calculate time and date: the 6th of January 1980. GPS counts the weeks since then, and the week is stored as a 10-bit integer, which means it only has a range of 0-1023. After 1,024 weeks (about 19.6 years), the week number rolls over to zero. If this rollover hasn’t been accounted for in software as the software counts the weeks, strange things can happen.
Strange things like an in-car navigation system deciding it’s gone back in time.
Why suspect this is what’s happened to older Honda and Acura in-car navigation systems? One affected car owner went into the nav system’s diagnostic menu and saw that the system’s GPS date displayed “The 19th of May, 2002”:
“Going into the ‘hidden’ diagnostic menu on Sunday, I discovered the GPS date was now ‘The 19th of May, 2002’. That date was exactly 1024 weeks ago (1024 = 2^10) We jumped forward an hour due to May being in daylight savings time.”
What was the expression of this rollover for users not poking about in the diagnostic menu? A calendar and date set to the 1st of January 2002, and the time an hour ahead.
Like I said earlier, the expression of rollovers can be messy and unpredictable when they happen.
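The arithmetic in that owner's report can be checked with a short sketch. It assumes the receiver only sees the week number modulo 1024 and that the firmware guessed it was still in the era that began in August 1999. That guess is an assumption for illustration, not a confirmed detail of the affected firmware. The helper names are hypothetical, and leap seconds are ignored.

```python
from datetime import date, timedelta

GPS_EPOCH = date(1980, 1, 6)   # week 0 of GPS time
WEEKS_PER_ERA = 1024           # a 10-bit week number counts 0..1023


def full_week_number(day: date) -> int:
    """Weeks elapsed since the GPS epoch, with no 10-bit limit applied."""
    return (day - GPS_EPOCH).days // 7


def decode_assuming_era(week10: int, era: int) -> date:
    """Turn a 10-bit week number into a date, trusting a fixed era guess."""
    return GPS_EPOCH + timedelta(weeks=era * WEEKS_PER_ERA + week10)


true_day = date(2022, 1, 2)                           # the Sunday in the quote
week10 = full_week_number(true_day) % WEEKS_PER_ERA   # what fits in 10 bits

print(full_week_number(true_day), week10)    # 2191 143
print(decode_assuming_era(week10, era=2))    # 2022-01-02
print(decode_assuming_era(week10, era=1))    # 2002-05-19
```

The same 10-bit value decodes to two dates exactly 1024 weeks apart, and nothing in the number itself says which one is right.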
A fix may be coming later this year for affected vehicles, sometime in August.
It’s also worth noting that this isn’t the first time the handling of GPS week number rollover has been an issue. More generally, there were issues in 1999 (the first rollover), and 2019 saw even wider issues due to the prevalence of GPS in consumer devices and infrastructure.
In November 2038, there will be another rollover that, if unaccounted for correctly, could cause issues for products with GPS.
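The rollover dates follow directly from the epoch and the 1024-week cycle. The sketch below computes them and shows one common way to account for the rollover: anchoring the 10-bit week to a date the device knows it cannot be earlier than, such as a firmware build date. The `not_before` pivot and both function names are assumptions for illustration, dates are in GPS time with leap seconds ignored, and the approach only stays valid for 1024 weeks after the pivot.

```python
from datetime import date, timedelta

GPS_EPOCH = date(1980, 1, 6)   # week 0 of GPS time
WEEKS_PER_ERA = 1024           # a 10-bit week number counts 0..1023


def rollover_dates(count: int) -> list:
    """Dates on which the 10-bit week number wraps back to zero."""
    return [GPS_EPOCH + timedelta(weeks=WEEKS_PER_ERA * n)
            for n in range(1, count + 1)]


def resolve_full_week(week10: int, not_before: date) -> int:
    """Pick the first full week number at or after the pivot date
    whose low 10 bits match the broadcast value."""
    floor_week = (not_before - GPS_EPOCH).days // 7
    candidate = floor_week - (floor_week % WEEKS_PER_ERA) + week10
    if candidate < floor_week:
        candidate += WEEKS_PER_ERA
    return candidate


print([d.isoformat() for d in rollover_dates(3)])
# ['1999-08-22', '2019-04-07', '2038-11-21']

week = resolve_full_week(143, not_before=date(2012, 1, 1))
print(week, GPS_EPOCH + timedelta(weeks=week))   # 2191 2022-01-02
```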
And 2038 presents its own problem in the shape of a time and date rollover bug: a much bigger problem and a different epoch.
The Year 2038 Problem
When the 19th of January 2038 arrives, there is a chance that many embedded systems could function unexpectedly (or stop working) if they use time for calculations or diagnostic logging.
Many computer systems, including embedded systems, use “Unix time” (which originated in the Unix operating system) to calculate time and date. It’s defined in the POSIX standard as the number of seconds that have passed since 00:00:00 UTC on the 1st of January 1970, not counting leap seconds.
Time, built on this Unix epoch, has often been treated as a signed 32-bit integer in the past.
So far, so good. Right?
Well, here’s the thing: if a system uses a signed 32-bit integer for time, the maximum it can count from the start of Unix time is 2,147,483,647 seconds after the epoch.
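For Unix time, 2,147,483,647 seconds after it started is 03:14:07 UTC on Tuesday, the 19th of January 2038. One second later, at 03:14:08 UTC, a signed 32-bit counter overflows and wraps around to a large negative number, which reads as a date in December 1901.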
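A short sketch makes the boundary concrete. It uses `ctypes.c_int32` as a stand-in for a signed 32-bit time field, which is an assumption about how the affected system stores time, and `utc_from_seconds` is a hypothetical helper.

```python
import ctypes
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1   # 2,147,483,647


def utc_from_seconds(seconds: int) -> datetime:
    """Convert a count of seconds since the Unix epoch to a UTC datetime."""
    return UNIX_EPOCH + timedelta(seconds=seconds)


# The last second a signed 32-bit counter can represent.
print(utc_from_seconds(INT32_MAX))   # 2038-01-19 03:14:07+00:00

# One more second does not fit: the stored value wraps to the most
# negative 32-bit number, which decodes as a date before 1970.
wrapped = ctypes.c_int32(INT32_MAX + 1).value
print(wrapped)                       # -2147483648
print(utc_from_seconds(wrapped))     # 1901-12-13 20:45:52+00:00
```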
Why only 2^31 - 1 seconds after the epoch?
As said earlier, Unix time is often a signed integer rather than an unsigned integer. One of the 32 bits indicates the sign, which leaves 31 bits for counting forwards.
In the C language, a 32-bit integer can hold values within:
- Signed range: −2,147,483,648 to +2,147,483,647
- Unsigned range: 0 to 4,294,967,295
A 64-bit integer in C can hold values within:
- Signed range: −9,223,372,036,854,775,808 to +9,223,372,036,854,775,807
- Unsigned range: 0 to 18,446,744,073,709,551,615
So, a 64-bit integer could count for a long time. A signed 64-bit count of seconds is a mug that can hold over 292 billion years’ worth of time, and an unsigned one holds twice that.
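These limits, and how many years of seconds each one holds, can be derived rather than memorised. The sketch below assumes two's complement storage and an average Gregorian year of 365.2425 days. The function names are hypothetical.

```python
SECONDS_PER_YEAR = 365.2425 * 24 * 60 * 60   # average Gregorian year


def signed_range(bits: int) -> tuple:
    """Smallest and largest values of a two's complement signed integer."""
    half = 1 << (bits - 1)
    return -half, half - 1


def unsigned_range(bits: int) -> tuple:
    """Smallest and largest values of an unsigned integer."""
    return 0, (1 << bits) - 1


for bits in (32, 64):
    for label, (low, high) in (("signed", signed_range(bits)),
                               ("unsigned", unsigned_range(bits))):
        years = high / SECONDS_PER_YEAR
        print(f"{label} {bits}-bit: {low:,} to {high:,} "
              f"(about {years:,.0f} years of seconds)")
```

A signed 32-bit count lasts roughly 68 years from the epoch, which is why a clock that starts in 1970 runs out in 2038.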
Systems that use signed 32-bit integers for time and have no methods for handling the Unix time rollover could face significant problems in 2038.
The Y2038 Problem and embedded software
Since 2000, the number of embedded systems has grown exponentially. Embedded systems are everywhere, from cars to internet routers, medical devices to assembly-line robots, smartphones to Bluetooth speakers.
While Y2038 poses less risk to desktop computing than Y2K did, embedded systems and software don’t get let off so easily.
Where’s the Y2038 problem for embedded software?
Many existing embedded systems and ones built today use 32-bit processors. 64-bit processors are still uncommon in embedded systems. They are currently mainly found in complex embedded devices like smartphones.
Storage is such a big issue in embedded software because everything has to work within set limits. There could be memory limitations, causing the date to be stored as a 32-bit integer. Sometimes the limitation is down to the size of the processor. Embedded devices often have small resources because units are produced with the minimum needed to function to keep costs down, alongside constraints around power use. Using cheaper options is fine until we start encountering problems like this.
While many consumer-level devices aren’t built to last beyond five or ten years, that doesn’t necessarily mean consumers will have stopped using them in 2038. After all, some of the Hondas affected by the GPS epoch were built in 2004 and still driving around in 2022.
Embedded systems in use across manufacturing, healthcare, utilities and defence may use signed 32-bit integers for time.
The risk comes when time plays such a massive role in the function of a system, and there’s a chance that the device will still be in use in 2038.
What can you do to ensure your products aren’t affected by these bugs?
How you handle Y2038, or any time formatting and storage bug, is not simple. It depends on:
- How crucial time and date are to your product’s function;
- The expected lifetime of a device;
- The risk presented by the device malfunctioning due to time formatting and storage bugs.
If you have a device out there on the market and no way to deliver an over-the-air (OTA) update, then there’s not much you can do. You can only plan for the future.
So, if you need to start planning with your product, hardware and software teams for handling Y2038 now or for future products, here are some steps you can take.
Check the expected lifetime of your system
At the time of writing, 2038 is under 16 years away. If your embedded system is likely to be used until 2038 or longer, you need to plan for that.
The service life of a system varies from country to country. It is also dependent on its use and the materials used in its construction. You may need to consider after-sale markets, where products previously used in high-income and middle-income countries find homes in lower-income countries.
But if you expect a product to be in use for ten years, it’s a reasonable step to double that estimate.
Aircraft, trains, spacecraft, energy and utility systems, industrial systems and some electronic medical devices often have a service life beyond 16 years.
Ask your software team to look at time rollovers
Your software engineers and software test engineers can test and see if your existing code is at risk. If your system uses GPS, it would also be worth checking that it handles GPS week number rollovers correctly.
If your specification says how long the clock needs to work, checking for these bugs will be part of your software testing process. So, make sure your specification covers expectations for how long the clock needs to work.
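A rollover check can be as simple as feeding the code the instants on either side of each known limit and confirming nothing changes on the way through storage. In the sketch below, `store_int32` and `store_int64` are hypothetical stand-ins for the code under test, built on `ctypes` fixed-width types. A real test would call your own storage and formatting routines instead.

```python
import ctypes


def store_int32(seconds: int) -> int:
    """Stand-in for code that keeps time in a signed 32-bit field."""
    return ctypes.c_int32(seconds).value


def store_int64(seconds: int) -> int:
    """Stand-in for code that keeps time in a signed 64-bit field."""
    return ctypes.c_int64(seconds).value


# Seconds since the Unix epoch at the boundaries worth testing.
ROLLOVER_INSTANTS = {
    "last second a signed 32-bit counter can hold": 2**31 - 1,
    "first second after the Y2038 rollover": 2**31,
    "last second an unsigned 32-bit counter can hold": 2**32 - 1,
}


def find_rollover_failures(store, instants: dict) -> list:
    """Names of the instants that do not survive a round trip through store."""
    return [name for name, seconds in instants.items()
            if store(seconds) != seconds]


print(find_rollover_failures(store_int32, ROLLOVER_INSTANTS))
print(find_rollover_failures(store_int64, ROLLOVER_INSTANTS))   # []
```

The 32-bit stand-in fails on the two instants past its limit, while the 64-bit one passes all three.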
Where possible, shift to using 64-bit for anything to do with time or at least unsigned 32-bit
If there is no reason your system needs to count backwards in time and date, then it might as well use an unsigned 64-bit integer to store time by using a 64-bit time library. Switching to a 64-bit library on a 32-bit processor will make time calculations slower than their 32-bit equivalents, but if you’re not checking time often, that isn’t much of an issue. The minimum improvement would be switching to unsigned 32-bit, which would last until 2106.
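The 2106 figure, and the price of dropping the sign, can be seen in a few lines. The sketch imitates an unsigned 32-bit field by masking to 32 bits, which is an assumption standing in for real fixed-width storage, and `store_u32` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
UINT32_MAX = 2**32 - 1   # 4,294,967,295


def store_u32(seconds: int) -> int:
    """Keep only the low 32 bits, as an unsigned 32-bit field would."""
    return seconds & UINT32_MAX


# The last instant an unsigned 32-bit count of seconds can represent.
print(UNIX_EPOCH + timedelta(seconds=UINT32_MAX))   # 2106-02-07 06:28:15+00:00

# The trade-off: instants before 1970 can no longer be represented.
# One second before the epoch is stored as the largest value instead.
print(store_u32(-1))   # 4294967295
```

That second result is why the unsigned option only suits systems that never need dates before the epoch.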
Make checking for these bugs part of any technical debt project
Embarking on a technical debt project to clean up the code base of an existing product that could still be in use by 2038? Ensure that analysing that legacy code includes checking it over for time formatting and storage bugs. Otherwise, improving your software could be a waste of time.
Switch to Behaviour-Driven Development for effective requirements capture
Behaviour-Driven Development (BDD) is a way of capturing required software functionality so that it’s understood unambiguously across the software team and stakeholders. Doing this before software development begins captures how software should behave. Remember to include how time should behave, and testers will understand how it should work in the system.
Include over-the-air (OTA) updates in future products
Working through technical debt or dealing with discovered bugs and security flaws is a lot easier when you can send updates over the air. Ensuring a product has this kind of functionality can bring its challenges. Still, they are all ones your team can overcome.
Keeping OTA as frictionless as possible and testing updates and fixes thoroughly are just some of the ways you can make OTA work. Software development practices like Test-Driven Development (TDD) can also help hugely because they make it easier for software engineers to respond to bugs found after deployment.
There’s a good reason why many don’t remember Y2K
The reason why there were so few headlines about the Millennium Bug once clocks ticked over from 1999 to 2000 wasn’t that there was no problem. It was because governments and businesses spent billions of dollars on finding software workarounds for critical systems to ensure that it didn’t become a large-scale problem.
There wasn’t tea everywhere because software engineers did an excellent job.