First Published in Feb 2015
In Part IV I explored the people side of Managed Hosting and #Devops. Now let's talk process.
ITIL, and its many, many, many pages, would have us believe that having a clearly defined and documented process for everything is the most efficient way to go.
There are a couple of well-known, major challenges with this approach, especially in an environment with several hundred customers whose environments are all different:
The time and resources needed to create all the process documentation.
The time needed to keep all the process documentation up to date.
With all those resources invested in developing the process, there is no incentive to change it as times and technologies evolve. This is a perfect example of the Shirky Principle: the process itself becomes inflexible.
There are certainly clients for whom a clearly documented run book is a requirement, typically to comply with one standard or another, and at a minimum every environment needs enough documentation to allow skilled support staff to find their way around.
What does not scale is going so deep that you have a different process for each customer to achieve the same ends. The goal should be to have just enough process to deliver consistent, reliable service (the minimum viable process) and just enough customization to accommodate 90% of customers' requirements.
Typically this can be presented as "here are the tools and the process we use to patch servers, and you may choose the day and time."
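To make that concrete, here is a minimal sketch of what "one process, customer-chosen window" might look like in code. It is an illustration only: the step names in PATCH_STEPS, the PatchWindow type and the helper functions are all hypothetical, not a description of any real tooling. The point is that the only per-customer data is the day and time.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta

# Hypothetical shared patch steps: identical for every customer.
PATCH_STEPS = ("snapshot", "apply_updates", "reboot", "health_check")

WEEKDAYS = ("mon", "tue", "wed", "thu", "fri", "sat", "sun")


@dataclass(frozen=True)
class PatchWindow:
    """The only thing a customer gets to choose: a day and a start time."""
    customer: str
    weekday: str  # one of WEEKDAYS
    start: time

    def __post_init__(self):
        if self.weekday not in WEEKDAYS:
            raise ValueError(f"unknown weekday: {self.weekday!r}")


def next_run(window: PatchWindow, now: datetime) -> datetime:
    """Return the next start of the customer's window strictly after `now`."""
    # datetime.weekday() returns 0 for Monday, matching the WEEKDAYS order.
    days_ahead = (WEEKDAYS.index(window.weekday) - now.weekday()) % 7
    candidate = datetime.combine(now.date() + timedelta(days=days_ahead), window.start)
    if candidate <= now:
        candidate += timedelta(days=7)
    return candidate


def plan(window: PatchWindow, now: datetime) -> dict:
    """Combine the shared steps with the customer's chosen window."""
    return {
        "customer": window.customer,
        "when": next_run(window, now),
        "steps": PATCH_STEPS,
    }
```

Every customer's plan carries the same steps; a request that cannot be expressed as a day and a time is, by design, outside the standard process and has to be negotiated separately.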
As an MSP we walk this line constantly: with the lean process that we have in place to operate hundreds of customers, how much of an individual customer's process can be layered on top without impacting overall scalability and operations?
However, this does not mean that we can rest on our laurels - as trends among our customers change, we have to evolve our core processes, just enough, to be able to support them. In addition to understanding our customers, MSPs need to be constantly questioning why we have the processes we do to support the products we support, checking our assumptions and questioning yesterday's best practices.
The automated processes should be measurable, with real-time feedback on their effectiveness. Collecting log data centrally, and tying automated deployment, automated QA, patching and configuration management systems into the overall telemetry system, gives a real-time feedback loop that can be acted upon by the 24x7 operations staff.
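As a rough sketch of that feedback loop, assume each automated system emits a simple success-or-failure event into the central collection point. The Event shape, the source names and the 25% threshold below are assumptions made for illustration, not a real telemetry schema.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One outcome reported by an automated system (hypothetical shape)."""
    source: str    # e.g. "deploy", "qa", "patching", "config"
    customer: str
    ok: bool


def failure_rates(events):
    """Return the fraction of failed events per source system."""
    total = Counter()
    failed = Counter()
    for event in events:
        total[event.source] += 1
        if not event.ok:
            failed[event.source] += 1
    return {source: failed[source] / total[source] for source in total}


def needs_attention(events, threshold=0.25):
    """List the sources whose failure rate exceeds the (assumed) threshold."""
    rates = failure_rates(events)
    return sorted(source for source, rate in rates.items() if rate > threshold)
```

In practice the events would stream in continuously and the result would feed a dashboard or alert for the operations staff; the value comes from every automated process reporting into the same place in the same shape.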
One point of note: given the complexity of today's systems, with several levels of virtualization and multiple vendors, outages are harder to diagnose, root causes can be more ethereal, and the ability to recover systems quickly can be more important than an ultimate diagnosis.
In summary - hosting providers walk a fine line between customers' demands for documents, process and ITIL certification and the need to develop their own processes, lean enough to support an infrastructure of hundreds of customers. Not an easy task.