Originally published - 2/16/15
During the discussion of DevOps, "Technical Debt" came up, and it is worth a discussion of its own.
What is Technical Debt?
Ward Cunningham first used the term in the context of software development. It is an analogy for all those things that should have been done during the development cycle but were not.
Typically these things included documentation, full QA testing and addressing "structural issues". There has been some good debate about whether Technical Debt is good, bad or indifferent.
Johann & Wolff wrote this piece, Managing Technical Debt.
However, I'm going to extend the conversation down the stack to the operating systems and infrastructure that support the code and, ultimately, the business.
Poor Process and Patching.
Technical Debt in the infrastructure space begins the first time you skip a patch bundle, and multiplies every time a patch bundle is released from then on. Patching is a never-ending part of operating technology. There is a steady stream of notifications about Microsoft Patch Tuesdays, Apple's new iOS releases, updates to Android, security holes in Bash, issues with the Xen hypervisor, and updates from all the other infrastructure software authors.
Why skip in the first place?
Typically the reasons show the conflict between Operational Security ("We need to patch this now"), the business ("We can't impact our customers at this time"), and Operational Scale ("We can't reach that many systems in the maintenance window").
What about the other issues that Ward Cunningham identified? QA, documentation, training? Are those also issues in the infrastructure space?
Of course.
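The Operational Scale objection is easy to put numbers on. The sketch below is a minimal illustration, not a planning tool: the function names, the fleet size, and the per-window capacity are all hypothetical, and it assumes every window patches the same number of hosts.

```python
import math


def windows_needed(host_count: int, hosts_per_window: int) -> int:
    """Maintenance windows required to patch every host once.

    Assumes (hypothetically) that each window can patch a fixed
    number of hosts.
    """
    if hosts_per_window <= 0:
        raise ValueError("hosts_per_window must be positive")
    return math.ceil(host_count / hosts_per_window)


def bundles_released_during_rollout(windows: int, windows_per_release: int) -> int:
    """New patch bundles that ship while one rollout is still in progress.

    Assumes a new bundle is released every `windows_per_release` windows.
    """
    if windows_per_release <= 0:
        raise ValueError("windows_per_release must be positive")
    return windows // windows_per_release


# Hypothetical fleet: 1200 hosts, 150 hosts reachable per weekly window,
# and a new bundle released every 4 windows (roughly monthly).
rollout = windows_needed(1200, 150)
backlog = bundles_released_during_rollout(rollout, 4)
print(rollout, backlog)
```

With these made-up numbers, one full pass takes eight windows, and two more bundles are released before that pass finishes. That is the "multiplying" effect: if the rollout takes longer than the release cycle, the debt grows even when no patch is deliberately skipped.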
With complex infrastructure, especially cloud, failure to fully test before going operational can leave you with cloud pieces (storage / compute / network / orchestration) that may not work as expected. This leaves you with islands of infrastructure that fail to communicate and then make the patching and support issues worse.
Failures of documentation leave support teams in the dark as to the locations of infrastructure nodes and modules, and incomplete configuration information makes resolving support tickets take that much longer.
So, how do you design your infrastructure to limit the scope of these issues, and develop support processes and documentation that limit Technical Debt?
What if you are not a technology company? Could you still have a Technical Debt problem?
Yes, of course. A well-publicized example is the Target hack, which was found to originate from an HVAC vendor...
When was the last time you trained your users, patched your systems, or checked the vulnerabilities in your machine shop? Did someone set up a wireless network for the office because it was super convenient?
In many ways, a company that is focused on building products or delivering services, and that uses technology only for its email and bookkeeping, can be at greater risk.
Contact MXL Consulting for assistance with infrastructure & operational design to minimize Technical Debt and reduce business risk.