Performing Intelligent Operations in the real world
Expanding on last month's post, this month we will look at how we perform and practice Intelligent Operations in the real world at Itoc. To understand how it all ties together and why, we need to first understand the fundamental needs and justifications of an intelligent platform.
The journey so far...
Quickly revisiting the “olden days”, when we still had dial-up modems, satisfying clunk-a-click keyboards by IBM and big CRTs, it was the comfortable world of ITIL. In the early 2000s, managed services was all about outages followed by hero moments of break-fix. Everything was siloed. We had two opposing cultures on either side of a DMZ called “release management”. All concerned quickly realised that this was no way to scale going forward and so, after a few years, DevOps was born. Initially, this wasn’t about automation; it was about a clash of cultures. In reality, automation was a by-product of changing the working culture so that the two tribes could co-exist.
Now, having said this, I think the term DevOps is a little overused these days and has outlived its original meaning, but that is a discussion for another post! Nonetheless, many people have hung on to the term and it has become ubiquitous. DevOps was originally focussed on making release management across the dev/production divide as smooth and seamless as possible. It has since grown far beyond simple release and delivery automation.
Since the noughties, the age of “DevOps” has evolved beyond simple configuration and installation automation. It has spread not only to continuous integration and delivery but also into business instrumentation and notification. Everything that is deterministic and well understood is fair game to be automated! We have seen the rise of a SaaS technology partner ecosystem that makes it easier to support operational environments and beyond. With API-driven tooling now ubiquitous, everything can be integrated.
If we look at the journey of “DevOps” over the last 10 years, several drivers have remained constant and continue to fuel “innovation” in the production and Managed Service domain:
- Ever faster compile, test, integrate and release times for applications (lifecycle)
- Increasing stability across the dev/prod release border
- Increasingly rapid automated response when metrics go beyond nominal parameters
- Deeper automated API based integration across wider toolsets and business endpoints
- Faster response and accelerated human interaction with the environment
- Decreasing operational budgets
Looking at the continual pressures that have driven the “DevOps” revolution from the beginning, we see that there are still limitations in the operational environment to conquer, chiefly the human decision element that supports managed services customers. Ironically, the human is the weakest link in the support chain, because people make errors under pressure when critical events occur. Often this induces instability rather than reducing it. The opportunity that now presents itself is the automation of this final frontier: the human management aspect of operational environments.
A traditional Cloud Managed Service Provider (MSP) by tiered departmental domain
The MSP of tomorrow
As you can see in the diagrams, the day-to-day instruments of managed services don’t really change. We still have metrics to monitor, service level targets to meet and a tuned operational cost that allows us to stay in business. What has changed is how we manage these instruments. We start to remove the human from the process. This reduces labour costs and the errors that occur under pressure.
The managed services business model is continually being squeezed for margin as expectations increase and platforms move towards true PaaS and SaaS, so more must be done with less over time. Now that we have maturity in the AWS platform, the culture and the associated tooling, automating the human decision process is a natural progression. We move into the realm of labour elimination: automated human decision making. The human now becomes more of a business manager.
How do we do this?
At Itoc, our secret sauce for performing Intelligent Ops is a management framework, built around a rules engine, that contains and orchestrates all aspects of the managed services environment.
With our framework we can analyse, decompose and automate the well-understood, deterministic Human Decision Trees (HDT) that are commonplace in support work.
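To make the idea of a Human Decision Tree concrete, here is a minimal sketch of how one might be written down as data and walked automatically. It is an illustration only, not Itoc's framework: the `Decision` and `Action` classes, the `decide` function, the "high CPU" tree and its thresholds are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union


@dataclass(frozen=True)
class Action:
    """Leaf node: the step an engineer would take at this point."""
    name: str


@dataclass(frozen=True)
class Decision:
    """Branch node: a yes/no question asked of the incident facts."""
    question: str
    test: Callable[[Dict[str, float]], bool]
    if_yes: "Node"
    if_no: "Node"


Node = Union[Action, Decision]


def decide(node: Node, facts: Dict[str, float]) -> Tuple[str, List[str]]:
    """Walk the tree and return the chosen action plus the path taken."""
    path: List[str] = []
    while isinstance(node, Decision):
        answer = node.test(facts)
        path.append(f"{node.question} -> {'yes' if answer else 'no'}")
        node = node.if_yes if answer else node.if_no
    return node.name, path


# Hypothetical tree for a "high CPU" alert; thresholds are made up.
HIGH_CPU_TREE = Decision(
    question="CPU above 90%?",
    test=lambda f: f["cpu_percent"] > 90,
    if_yes=Decision(
        question="Deployment in the last 30 minutes?",
        test=lambda f: f["minutes_since_deploy"] <= 30,
        if_yes=Action("roll_back_release"),
        if_no=Action("scale_out"),
    ),
    if_no=Action("no_action"),
)

if __name__ == "__main__":
    action, path = decide(
        HIGH_CPU_TREE, {"cpu_percent": 95, "minutes_since_deploy": 10}
    )
    print(action)
    for step in path:
        print("  ", step)
```

Returning the path alongside the action means every automated decision leaves a trail that an engineer can review afterwards.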
By combining historical data on operational incidents, analysis of the HDT, a rules engine and rich monitoring, we can create ‘bots’ that act on behalf of support engineers. Combining AWS components such as Step Functions, Lambda, Workflow and our secret sauce, our system can recognise anomalous events and react gracefully to contain and correct any undesirable outcomes.
As more data is collected, we have a solid foundation for the future, with new bots introduced over time as customer platforms become better understood.
This does not mean we lose the human element entirely, but roles do change. We see a shift from primarily technical troubleshooting to business intelligence and business decision making at the operational level. The engineering team still has an operational role in ensuring that corrective actions are carried out correctly, but their time can now be spent on higher value propositions. We move up the value chain!
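As a rough sketch of the bot idea, the snippet below pairs a simple anomaly check against historical data with an ordered list of rules. It assumes a great deal: the `Rule` and `Bot` classes, the event fields and the standard-deviation test are all hypothetical stand-ins, and the action here just returns a string where a real system would trigger something like a Lambda function or a Step Functions workflow.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Callable, Dict, Iterable, Optional, Sequence


def is_anomalous(history: Sequence[float], value: float,
                 threshold: float = 3.0) -> bool:
    """True if value is more than `threshold` std devs from the history mean."""
    if len(history) < 2:
        return False  # not enough history to judge
    spread = pstdev(history)
    if spread == 0:
        return value != history[0]
    return abs(value - mean(history)) / spread > threshold


@dataclass
class Rule:
    """A condition on an event and the action to take when it matches."""
    name: str
    condition: Callable[[Dict], bool]
    action: Callable[[Dict], str]


class Bot:
    """Applies the first matching rule; returns None to escalate to a human."""

    def __init__(self, rules: Iterable[Rule]) -> None:
        self.rules = list(rules)

    def handle(self, event: Dict) -> Optional[str]:
        for rule in self.rules:
            if rule.condition(event):
                return rule.action(event)
        return None


# Hypothetical rule: restart a service whose latency is anomalous.
bot = Bot([
    Rule(
        name="latency-spike",
        condition=lambda e: e["metric"] == "latency_ms"
        and is_anomalous(e["history"], e["value"]),
        action=lambda e: f"restart:{e['service']}",
    ),
])

if __name__ == "__main__":
    event = {
        "metric": "latency_ms",
        "service": "checkout",
        "value": 400,
        "history": [120, 130, 125, 118, 127],
    }
    print(bot.handle(event))
```

An event that matches no rule returns `None`, which is the point at which an engineer would be brought in and, potentially, a new rule written.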