Maturity of a cloud
When a colleague comes back from the weekend and announces that he has bought a car, you will probably ask him: “What kind of car?”
Yet you already know what a car is.
A car is a road vehicle, typically with four wheels, powered by an internal combustion engine and able to carry a small number of people.
Your question, however, seeks additional information that gives you a better idea of the performance, capacity and other attributes of the vehicle.
So, when you are about to invest a significant part of your R&D budget in a cloud, you should probably also ask yourself: “What kind of cloud?”
If we look for a definition of cloud computing, we can use, for example, the following one from Google:
Cloud computing is the practice of using a network of remote servers hosted on the Internet to store, manage, and process data, rather than a local server or a personal computer.
This definition says nothing about any commitment on performance or return on investment. However, to compete with other companies, you will need an efficient solution that meets the level of service you target.
One way to assess this efficiency is to use standard metrics, such as the proposed set “Cloud Services Industry’s 10 Most Critical Metrics”:
- Availability
- Reliability
- Security
- Capacity usage
- Scalability
- Response time
- Latency
- Throughput
- Service and helpdesk
- Cost by customer
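Several of these metrics can be computed directly from monitoring data. The sketch below is a minimal illustration, assuming a hypothetical list of request records (start time, response time, success flag); the record layout and the function names are invented for the example and do not come from any specific monitoring tool.

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass
class Request:
    """One observed request (hypothetical record layout)."""
    start_s: float      # start time, in seconds
    duration_ms: float  # response time, in milliseconds
    ok: bool            # True if the request was served successfully


def availability(uptime_s: float, total_s: float) -> float:
    """Fraction of the observation window during which the service was up."""
    return uptime_s / total_s


def success_rate(requests: list[Request]) -> float:
    """Simple reliability indicator: share of requests that succeeded."""
    return sum(r.ok for r in requests) / len(requests)


def throughput(requests: list[Request], window_s: float) -> float:
    """Requests handled per second over the observation window."""
    return len(requests) / window_s


def p95_response_ms(requests: list[Request]) -> float:
    """95th percentile of the response time (needs at least two requests)."""
    durations = [r.duration_ms for r in requests]
    # n=100 yields 99 cut points; index 94 is the 95th percentile.
    return quantiles(durations, n=100, method="inclusive")[94]
```

A percentile is used for response time rather than an average because a few very slow requests can hide behind a good mean.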
However, if you are not familiar with cloud patterns and technologies, it can be challenging to navigate through the technical vocabulary and the promises of success.
The following diagram classifies clouds into 6 levels of maturity and explains the main enabler that allows a cloud to move from one level to the next.
The maturity assessment evaluates two aspects of your cloud:
- The servers that you have
- The applications that run on those servers to provide capabilities to your customer
L0
The minimum level of maturity is pure compliance with the definition of cloud computing, which implies only that your machines are reachable through the Internet. At this level, your cloud is a “Second IT” whose only return on investment is reducing the physical space you allocate in your facility.
At this level, you have to provision the hardware, install the server operating system, install the application and carry out any other configuration activity yourself.
Your agility is also very limited, as the provisioning, maintenance or removal of resources relies on technical intervention from your team.
As each machine has a specific role, optimizing your fleet of servers and the use of the capacity of each machine is a challenge.
L1
The first technology used to improve the maturity of the cloud is virtualization. Through a hypervisor, each server can run several operating systems (virtual machines) in parallel.
Servers are grouped in different datacenters (or availability zones) so that the failure of a specific geographic location (due to weather, loss of power or any other cause) has only a partial impact on your overall fleet of servers.
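The effect of grouping servers by availability zone can be sketched as follows. This is an illustration only, assuming a plain round-robin placement; the zone names, server names and function names are hypothetical.

```python
from collections import defaultdict


def spread_across_zones(servers: list[str], zones: list[str]) -> dict[str, list[str]]:
    """Assign servers to zones in round-robin order."""
    placement: dict[str, list[str]] = defaultdict(list)
    for index, server in enumerate(servers):
        placement[zones[index % len(zones)]].append(server)
    return dict(placement)


def surviving_servers(placement: dict[str, list[str]], failed_zone: str) -> list[str]:
    """Servers still reachable when one zone is lost."""
    return [
        server
        for zone, members in placement.items()
        if zone != failed_zone
        for server in members
    ]


# Hypothetical fleet: 6 servers over 3 zones.
placement = spread_across_zones([f"s{i}" for i in range(6)], ["a", "b", "c"])
remaining = surviving_servers(placement, failed_zone="b")
print(f"{len(remaining)} of 6 servers survive the loss of zone b")
```

With an even spread over three zones, the loss of one zone removes a third of the capacity instead of all of it.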
At this level, capacity optimization is still difficult because each virtual machine reserves part of the shared resources for its own operating system.
L2
To resolve this issue, a more recent technique launches each application in an isolated environment (called a container) while sharing the underlying operating system.
The container engine also addresses part of the communication needs by creating logical routing between the applications.
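The capacity difference between virtual machines (L1) and containers (L2) can be illustrated with simple arithmetic. The sketch below uses made-up figures (a 64 GB host, 2 GB per application, 1 GB per operating system) and deliberately ignores the overhead of the hypervisor and of the container engine.

```python
def apps_per_host_with_vms(host_gb: int, app_gb: int, guest_os_gb: int) -> int:
    """Each application runs in its own VM, which carries its own operating system."""
    return host_gb // (app_gb + guest_os_gb)


def apps_per_host_with_containers(host_gb: int, app_gb: int, shared_os_gb: int) -> int:
    """All containers share a single operating system on the host."""
    return (host_gb - shared_os_gb) // app_gb


# Hypothetical figures, for illustration only.
print(apps_per_host_with_vms(host_gb=64, app_gb=2, guest_os_gb=1))          # 21
print(apps_per_host_with_containers(host_gb=64, app_gb=2, shared_os_gb=1))  # 31
```

Under these assumptions the same host carries about half as many applications again once the operating system is shared.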
L3
As you progress in optimizing your investment, you still rely on manual intervention for recovery and for the specifics of your installation.
To move to the next level of maturity, you need to apply two principles: one at the application level, the other at the server level.
The first, called “12 factor”, is a set of design and software principles aimed at applications that can live on top of an unreliable infrastructure. This reduces the constraints on hardware reliability and enables fast start/stop of your applications.
More information on the 12 factors on this link
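Two of the twelve factors, configuration stored in the environment and disposability (fast startup, graceful shutdown), can be sketched in a few lines. This is a minimal sketch: the setting names `DATABASE_URL` and `PORT` and the function names are assumptions chosen for the example.

```python
import os
import signal
import threading
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Config:
    database_url: str  # hypothetical setting: address of a backing service
    port: int          # hypothetical setting: port the application listens on


def load_config(env: Mapping[str, str] = os.environ) -> Config:
    """Read every setting from the environment, never from the local disk."""
    return Config(
        database_url=env["DATABASE_URL"],
        port=int(env.get("PORT", "8080")),
    )


def install_sigterm_handler(stop: threading.Event) -> None:
    """Ask the application to stop cleanly when it receives SIGTERM.

    Must be called from the main thread, as required by signal.signal.
    """
    signal.signal(signal.SIGTERM, lambda signum, frame: stop.set())
```

Because nothing is stored on the machine itself, an instance of the application can be stopped on one server and started on another without any manual reconfiguration.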
The second principle applies to your servers and is well described by the “Pet or Cattle” image.
- A pet is a server treated as an indispensable or unique system that can never be down (personal attachment).
- Cattle are servers built using automated tools and designed for failure (no personal attachment).
With a set of cattle servers, an update or maintenance of the system is done by creating a new server and destroying the previous one.
This approach drastically improves the reliability and security of the system, as any deployment can be reproduced at any time and a security intrusion can be resolved by replacing the compromised server with another server built from the original configuration.
More information on “Pet or Cattle” on this link
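The "replace rather than repair" logic of cattle servers can be simulated in memory. The sketch below does not call any real cloud API: `Server`, `create_server`, `replace_server` and `roll_out` are hypothetical names, and the image tag stands for whatever automated build produces a server.

```python
from dataclasses import dataclass
from itertools import count

_ids = count(1)  # source of fresh server identifiers


@dataclass(frozen=True)
class Server:
    server_id: int
    image: str  # version of the automated build the server was created from


def create_server(image: str) -> Server:
    """Build a brand new server from a given image."""
    return Server(server_id=next(_ids), image=image)


def replace_server(fleet: list[Server], old: Server, image: str) -> list[Server]:
    """Destroy `old` and put a freshly built server in its place."""
    replacement = create_server(image)
    return [replacement if server == old else server for server in fleet]


def roll_out(fleet: list[Server], image: str) -> list[Server]:
    """Update by recreation: every server is replaced, none is patched in place."""
    return [create_server(image) for _ in fleet]


fleet = [create_server("v1") for _ in range(3)]
fleet = roll_out(fleet, "v2")                  # maintenance: rebuild everything
fleet = replace_server(fleet, fleet[0], "v2")  # intrusion: swap one server
```

No server is ever modified after creation, which is what makes any deployment reproducible.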
L4
Once your cloud applications and servers are designed for flexibility, you can benefit from a cloud platform that simplifies your activity by performing:
- Automatic deployment
Applications are deployed across the fleet of servers, making the best use of the availability zones to set up the most resilient configuration.
- Self-healing
The platform monitors the health of the servers and automatically provisions or decommissions them with your cloud provider.
The platform also applies the same principle to applications.
- Dynamic sizing
The platform monitors the usage of the servers and moves applications between servers to optimize resources.
The platform also provisions or decommissions application instances to maintain the expected level of service.
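Self-healing and dynamic sizing both come down to a loop that compares the observed state with a desired state. The sketch below is one possible way to express that comparison, assuming a simple proportional sizing rule based on average CPU usage; the function names, the CPU metric and the bounds are assumptions for the example, not the behavior of a particular platform.

```python
import math


def desired_instances(current: int, avg_cpu_pct: float, target_cpu_pct: float,
                      minimum: int = 2, maximum: int = 10) -> int:
    """Size the application so that the average CPU usage approaches the target."""
    wanted = math.ceil(current * avg_cpu_pct / target_cpu_pct)
    return max(minimum, min(maximum, wanted))


def reconcile(health: dict[str, bool], desired: int) -> tuple[list[str], int]:
    """Return (instances to decommission, number of instances to provision)."""
    unhealthy = [name for name, ok in health.items() if not ok]
    healthy = [name for name, ok in health.items() if ok]
    surplus = healthy[desired:]  # healthy instances beyond the desired count
    to_remove = unhealthy + surplus
    to_add = max(0, desired - len(healthy))
    return to_remove, to_add


# Hypothetical state: 4 instances at 90% CPU for a 60% target, one of them failing.
target = desired_instances(current=4, avg_cpu_pct=90, target_cpu_pct=60)
print(target)  # 6
print(reconcile({"a": True, "b": False, "c": True, "d": True}, target))
```

The minimum bound keeps some redundancy even when the load is low, and the maximum bound caps the cost.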
Those principles, coupled with a process like DevOps (which defines a feedback loop between production and development), allow you to set up an optimized level of redundancy in the solution based on performance monitoring.
This configuration adjustment is important because high availability requires not only redundant applications capable of delivering the service, but also redundant resources running at the same time to allow failover.
More information about DevOps on this link
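The trade-off between redundancy and availability can be estimated with a simple model. The sketch below assumes that instances fail independently of each other, which is optimistic in practice (a zone failure affects several instances at once), and the availability figures are made up.

```python
def combined_availability(instance_availability: float, instances: int) -> float:
    """Probability that at least one of the redundant instances is up."""
    return 1 - (1 - instance_availability) ** instances


def instances_needed(instance_availability: float, target: float, limit: int = 20) -> int:
    """Smallest number of redundant instances that reaches the target."""
    for n in range(1, limit + 1):
        if combined_availability(instance_availability, n) >= target:
            return n
    raise ValueError("target not reachable within the instance limit")


# Hypothetical figures: each instance is up 90% of the time.
print(instances_needed(instance_availability=0.9, target=0.995))  # 3
```

Each additional instance improves availability but is also a resource that runs, and is paid for, all the time, which is why the level of redundancy is worth tuning from monitoring data.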
L5
When your cloud is optimized, the last level of maturity is achieved by delegating part of your solution to a provider that can optimize (through volume, technologies or infrastructure) the cost or performance of your solution.
This implies analyzing your core business and performing a classic MTB analysis (Make, Team or Buy) on each of your application services.
Although there are now many variants of the model, there are 4 main levels of delegation:
- IAAS
Infrastructure is provided as a service to create your cloud
- PAAS
Platform is provided as a service to build your application
- Serverless
You execute the functions of your application as a service
- SAAS
Application (Software) is provided as a service
As a note:
- IAAS delegation is often started at level L1
- PAAS delegation is started at L4
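The four levels of delegation can be read as a cut in a stack of layers: everything below the cut is run by the provider, everything above remains yours. The sketch below is a deliberate simplification; the layer names and the position of each cut are assumptions made for the example.

```python
# Layers of a solution, from the lowest to the highest (simplified, illustrative names).
LAYERS = ["infrastructure", "platform", "execution of the code", "application"]

# Highest layer delegated to the provider for each model (assumption of this sketch).
DELEGATED_UP_TO = {
    "IAAS": "infrastructure",
    "PAAS": "platform",
    "Serverless": "execution of the code",
    "SAAS": "application",
}


def split_responsibilities(model: str) -> tuple[list[str], list[str]]:
    """Return (layers run by the provider, layers that remain yours)."""
    cut = LAYERS.index(DELEGATED_UP_TO[model]) + 1
    return LAYERS[:cut], LAYERS[cut:]


for model in DELEGATED_UP_TO:
    provider, yours = split_responsibilities(model)
    print(f"{model}: provider runs {provider}, you keep {yours}")
```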
Eco-system
When you analyze the level of maturity you want to reach for your solution (or for each part of it), you will have to balance the effort of the change against its benefits.
For example:
- The effort to migrate your applications to the desired compliance
- The effort to convert your pets to cattle
- The benefit on the reliability and performance of your system
- The benefit on the recurrent cost of ownership
However, as a reminder, in an environment where your services are part of an eco-system grid (like the travel industry), the failure of your products will have a cascading impact on all the dependent services built by your customers.
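One simple way to put numbers on the balance between effort and benefit is a payback period. The sketch below is only an illustration: the cost model (a one-off migration cost recovered through a constant monthly saving) and the figures are assumptions, and it leaves out benefits that are hard to price, such as reliability.

```python
def payback_months(migration_cost: float, monthly_saving: float) -> float:
    """Number of months needed for the recurring savings to cover the migration."""
    if monthly_saving <= 0:
        raise ValueError("the change never pays for itself without a recurring saving")
    return migration_cost / monthly_saving


# Hypothetical figures: 120,000 to convert pets to cattle, 10,000 saved per month.
print(payback_months(migration_cost=120_000, monthly_saving=10_000))  # 12.0
```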