Future Legacy Applications

Applications become "legacy" when:

  • The target platform is deprecated and cannot be recreated, necessitating a rewrite,

  • The framework it is built on becomes passé, and critical security fixes and new features are added only to a new, non-backwards-compatible version of the framework, necessitating a rewrite,

  • The third-party services the application depends on are deprecated or turned off, necessitating a rewrite,

  • The APIs the application uses are forcibly made obsolete in current releases of the OS (recent versions of iOS or OS X, for example). This is related to the first point and, again, necessitates a rewrite.

We can attempt to minimize future breakage in a few ways:

  • Don't let core application functionality rely on the availability of bells and whistles (see the sketch after this list),

  • Minimize dependencies,

  • Manage the dependencies you do have,

  • Simplify deployment environments,

  • Simplify architecture; make change no more difficult than necessary,

  • Version control everything,

  • Write some documentation, and make sure it lives with the application, and

  • Use technologies that have a history of maintaining backwards compatibility.
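
As a rough illustration of the first two points, here is a minimal Python sketch (the fancy_charts module and the report shape are hypothetical, invented for illustration) of core functionality that keeps working when a bells-and-whistles dependency is missing or abandoned:

    # Hypothetical sketch: the core report works even if the optional
    # charting dependency is unavailable or abandoned upstream.
    try:
        import fancy_charts  # optional "bells and whistles" dependency (hypothetical)
    except ImportError:
        fancy_charts = None

    def monthly_report(rows):
        """Core functionality: no external dependencies required."""
        report = {
            "row_count": len(rows),
            "total": sum(r["amount"] for r in rows),
        }
        # Nice-to-have: only attempted when the dependency is present.
        if fancy_charts is not None:
            report["chart"] = fancy_charts.render(rows)
        return report

The point of the sketch is the boundary: the report is still produced when the optional dependency disappears, so losing the nicety does not take the core feature with it.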

Introduction

The primary distinguishing feature of anything that has stood the test of time is that there were probably a lot more of them back in the day.

Castles, bridges, forts, battleships, mainframes, PDAs, Blockbuster stores, software applications. All that survive, survive mostly due to luck rather than any lofty aspirations or far-thinking designs by their creators (the monuments and monoliths that are the pyramids of Egypt and Stonehenge number among the few exceptions).

On the surface, the Romans seemed to be pretty good at building structures that lasted. After all, there are bridges still standing that were built in Roman times.[1] But what we see are the few surviving examples; many more now rest in pieces at the bottom of rivers and ravines, or were disassembled piecemeal to be used as raw materials in new construction projects. And much of what the Romans built didn't last. Their apartment building skills, for example, left much to be desired.[2]

The primary sources most often speak of insulae in reference to their hazards. There are multiple references to the dangers of living in insulae within texts such as satires and histories. It seems the main dangers of living in Roman apartments were fire and collapse

— Rome: A City of Rental Property
https://romaninsulae.weebly.com/dangers-of-living-in-an-insula.html

With few exceptions, our forebears likely had no notion their software would remain in active use twenty or thirty (or forty, or fifty) years after being deployed, by pack mule or camel caravan, into production. If they had, would they have thought to allocate an extra byte or two to store a four-digit year to head off the Y2K[3] horror decades hence?

Memory and disk space were expensive and at a premium, so allocating a few extra bytes to a date field would have made no business sense. Much better to kick that particular can down the road and into the sweaty paws of future hordes of willing consultants and contractors. This leads to a discussion of future-proofing versus the overwhelming urge to overengineer.

Overengineering

The divide between future-proofing and overengineering seems slim. They seem to be two sides of the same coin, or perhaps one side of two different coins.

Overengineering a bad or wrong design makes future-proofing that much more difficult. It is like reinforcing a home to withstand a hurricane: stronger materials will only allow the structure to withstand winds up to a certain magnitude. To withstand stronger winds the entire design needs to change, perhaps necessitating razing the home to its foundations and starting again.

Future-proofing is not about adding features that you assume your users may ask for in the future. These features will go unused, cluttering your business logic (requiring testing prior to each release, or being swept under a carpet of feature flags), and most likely, when the feature finally is requested, the requirements will be so different as to necessitate a rewrite.
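
To make the clutter concrete, here is a minimal, hypothetical sketch (the flag name and functions are invented for illustration) of the kind of flag-guarded speculative feature described above; the dead branch still has to be carried, tested, and reasoned about on every release:

    # Hypothetical sketch of a speculative feature hidden behind a flag.
    FEATURE_FLAGS = {"bulk_export_v2": False}  # guessed-at future requirement

    def export_orders(orders):
        if FEATURE_FLAGS["bulk_export_v2"]:
            # Built ahead of any actual request; when the request finally
            # arrives, the requirements will likely differ anyway.
            return export_bulk_v2(orders)
        return "\n".join(str(o) for o in orders)

    def export_bulk_v2(orders):
        raise NotImplementedError("speculative feature, never finished")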

A question that should be answered early on is what business value is expected to be gained by future-proofing, as it requires an investment of time and effort driven by architects and designers with some experience and commensurate skill. It is perfectly acceptable to get something out there as fast as possible and subsequently iterate until it becomes good enough (there are exceptions to this approach when working in the financial, health, regulatory, and hardware domains).

Creating a "Google scale" architecture from the outset brings with it Google-scale problems:[4]

  • The chances are vanishingly slim that any of your production systems will ever have a need to process a Google level of load.

  • You don't have the resources to implement and provide the ongoing operational support necessary to keep a system of that complexity, one that can actually scale and handle that kind of load, running. And relying on SaaS and the integrated technologies of your chosen cloud provider can become expensive fast.

  • You will get it wrong the first time. Mistakes at the architecture level are expensive and hard to change later. There is a reason the Googles have had to invent new technologies to support their business growth. Standing up a few of today's du jour Apache open source technologies, discovered via a blog or YouTube channel and glued together by event sourcing, won't solve any problem anywhere.

In reality there comes a time when parts of the system will have to be redesigned and rewritten. Twitter, Netflix, and Reddit are examples of companies where the only answer to continued growth was re-architecture. And that's OK. It took Netflix something like five years, starting from their massive customer-facing outage of 2008, to finally complete their migration to microservices. That's a long time. And like Netflix, you will have to live in your home as it is redesigned to withstand an F5 tornado, simply because it is impractical to feature-freeze legacy systems while engineering resources busy themselves creating a new system from scratch.

The business won't stop demanding new features for the guesstimated six months to a year that it will take to build a new system (see Netflix above). Features will continue to be added to the old system by the very developers who are becoming increasingly annoyed that they are stuck maintaining the legacy system while their colleagues get to play with the new shiny. And when the new system is finally "ready", well, it can't launch without supporting all the features that were added to the legacy system in the meantime. All because Product rightly thinks that customers will riot and jump ship. Netscape tried a ground-up rewrite and ended up losing all market share and subsequently going out of business.[5] So don't do that.

An alternative is to brand a new product or service around the new architecture. New customers are drawn to the shiny and sign up. Existing customers are incrementally (voluntarily or involuntarily) migrated as the features they care about are rolled out. When the number of customers using the old system approaches zero, you can begin to discuss the plan to sunset it and subsequently pull the plug.
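
One common shape for this kind of incremental migration is a thin routing layer that sends migrated customers to the new system and everyone else to the old one. A hedged sketch (the customer identifiers and the two handler functions are assumptions for illustration):

    # Hypothetical routing sketch for incremental customer migration.
    # The migrated set grows as the features those customers care about
    # are rolled out on the new system.
    MIGRATED_CUSTOMERS = {"acme-corp", "globex"}

    def handle_request(customer_id, request):
        if customer_id in MIGRATED_CUSTOMERS:
            return new_system(request)
        return legacy_system(request)

    def new_system(request):
        return f"new: {request}"

    def legacy_system(request):
        return f"legacy: {request}"

When the set of customers still hitting the legacy handler approaches zero, the sunset conversation becomes much easier to have.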

Or maintain both products indefinitely. This has a cost but you now have two products and both have paying customers.[6]

Joel: We have never really killed things that people were using, to be quite honest. The StackExchange 1.0 sites, anybody that had a site that was working, that site is still running, they have either migrated to StackExchange 2.0 or we are just continuing to operate the site on StackExchange 1.0. It doesn’t cost us that much to do it.

If you had started using FogCreek CityDesk in 2000, our Windows-based content management system which allowed one person to contribute to a website which they published manually by FTP. If you started using that in 2000, it still works today. We fixed a bug that showed up in Windows 7, maybe Vista was the last bug we actually had to fix.

You can still use that today and call up FogCreek and get support on that, usually redirected to me. Almost nobody is. We try not to leave people behind on these things.

— Trello: How A Proven Founder Launches A Startup
https://mixergy.com/interviews/trello-joel-spolsky-interview

Repair or Replace

There was once a culture, not too long ago, of repair, not replace. Computers, phones, TVs, VCRs, shoes, clothes. If you yourself couldn't repair it, you took it to someone who could, because buying new was expensive. But did this culture influence the approach to building software? Well, developers being developers, the answer is most likely no. It's hard enough trying to recall what went through your head looking at the code you wrote last Wednesday at 2am, never mind the non-thoughts that may have passed through the mind of anyone else. And what few comments there are in the code are all wrong anyway. Much easier to rewrite from scratch.

New is cheap. The availability of frameworks, libraries, and third-party SaaS solutions means standing up a new thing is relatively easy, but having the new replace the old is not. See the ninety-ninety rule.[7] Plenty of horror stories recount failed efforts to replace legacy systems due to massive schedule and cost overruns, or new systems that don't perform quite as expected (or at all) in production.[8]

It seems that we have always been in a mess of our own making.

Simplicity

A good approach is to strive to architect for simplicity. Make the system as simple to change as possible (or, put another way, don't make the system more difficult to change than necessary; I got this from somewhere, but Google currently fails me, so I'll add a ref when I find it). Software is resilient when simple. Complex software is brittle. It is easy to design complexity. It takes work and time to design simplicity, and it requires a level of experience on the part of the architect or designer to design for change without overengineering.

Everything Should Be Made as Simple as Possible, But Not Simpler [9]

— Albert Einstein (most likely)

Dependencies

Minimize external dependencies where possible.[10] And examine the development history of those dependencies that you do choose. If your architecture will rely on a platform, SDK, virtual machine, framework, API, or service, how long does that dependency typically go without a backwards-incompatible breaking change? Six months, a year, thirty years, indefinitely?

It's fine to rely on dependencies that have a history of making breaking changes, or on service providers that have a history of regularly turning off features, ending products, and breaking APIs. But go into architectural decisions with eyes wide open.

Remember that in a year's time, when you are already overloaded with bug fixes and new feature requests, you may also be forced to rewrite your application from the ground up due to a decision that you are making now.
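
One low-cost hedge is to keep a risky dependency behind a thin adapter of your own, so that a future breaking change is confined to a single module rather than scattered through the application. A minimal sketch, assuming a hypothetical geocoding provider (the class and field names are invented for illustration):

    # Hypothetical sketch: the application depends on a provider-neutral
    # interface; provider-specific code lives in one adapter class.
    from dataclasses import dataclass

    @dataclass
    class Coordinates:
        lat: float
        lon: float

    class Geocoder:
        """Stable internal interface the rest of the application uses."""
        def geocode(self, address: str) -> Coordinates:
            raise NotImplementedError

    class AcmeGeocoder(Geocoder):
        """Adapter for a hypothetical third-party SDK. If the provider
        makes a breaking change, only this class needs rewriting."""
        def __init__(self, client):
            self._client = client  # the provider's SDK object, injected

        def geocode(self, address: str) -> Coordinates:
            raw = self._client.lookup(address)  # provider-specific call
            return Coordinates(lat=raw["lat"], lon=raw["lon"])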

Version Control Everything

It is not just the source code and assets (localization, not just art) that should be versioned. Version everything. Libraries, SDKs, build, test and deployment pipelines, tool chains, scripts, configuration files, perhaps entire environments.

When old software needs to change, the change is usually time-critical. Taking a week just to figure out how to rebuild the code is unacceptable.
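
A cheap starting point, sketched here under the assumption of a git/make/gcc toolchain (substitute whatever your build actually depends on), is a small script, itself committed to version control, that records the exact tool versions a build used, so rebuilding years later starts from a manifest rather than archaeology:

    #!/usr/bin/env python3
    # Hypothetical sketch: record the toolchain a build used into a
    # manifest committed alongside the source.
    import json
    import platform
    import subprocess

    TOOLS = ["git", "make", "gcc"]  # assumed toolchain; adjust per project

    def tool_version(tool: str) -> str:
        try:
            out = subprocess.run([tool, "--version"], capture_output=True, text=True)
            return out.stdout.splitlines()[0] if out.stdout else "unknown"
        except FileNotFoundError:
            return "not installed"

    if __name__ == "__main__":
        manifest = {"python": platform.python_version(), "os": platform.platform()}
        manifest.update({t: tool_version(t) for t in TOOLS})
        with open("build-environment.json", "w") as f:
            json.dump(manifest, f, indent=2)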

It helps if you own it

Ownership means the software is within your sphere of influence: yours to change, or not, as you see fit.

What’s up with old software?

So, how can any software that is older than five years continue working?

Applications that ran on "big iron" were somewhat resilient to change just by the nature of the environment and the type of work performed.

Environment

Pretty much the entire stack, from the hardware on up, was proprietary and closed source. There were fewer dependencies in the software, fewer abstraction layers in the tech stack, and fewer dependencies in the development, deployment, and production environments. You created an application and ran it on a machine. And then you went home. Software worked until the underlying hardware failed and the requisite replacement parts became unavailable.[11]

Work

Most processing on big iron was in the form of batch jobs that, once submitted, ran until completion: a much simpler paradigm than millions of always-connected clients demanding instant feedback from constantly and horizontally scaling web services.

Thus, modern IBM mainframes can continue to support many applications originally written for IBM hardware introduced in the 1960s[12] (one of the reasons to pay for support contracts). And when the manufacturer of your mainframe or minicomputer went out of business, third-party commercial emulators became available for old PDP, VAX, Alpha, SPARC, and HP 3000 architectures, running in the cloud and on commodity hardware.

The amount of engineering needed to maintain the decades of investment that these applications represent is considerable.

Normally, if given the choice between doing something and nothing, I’d choose to do nothing. But I will do something if it helps someone else do nothing. I’d work all night, if it meant nothing got done.

— Ron Swanson
Parks and Rec

Version History

  • 1.1, <2020-08-17 Mon>

  • 1.0, <2020-08-14 Fri>