Migrating to the cloud delivers major cost savings and improved ROI. Transferring IT spending to a pay-as-you-go, operational expense (OpEx) model significantly reduces capital expenses (CapEx), as well as providing other benefits. That's why a pay-as-you-go plan for cloud services has become the default for most businesses today.
It wasn't always that way. Traditionally, IT paid for hardware and budgets were CapEx-oriented. Companies would make a large upfront investment in storage, servers, routers, etc., and then leverage it for years. Everything was planned and monitored. The primary benefit of a CapEx model is stability: you know exactly what your costs will be on an annual basis, if not on a longer timeline. But predictable costs didn't always mean predictable results.
The cloud migration trend was accelerated by a significant drop in cloud platform pricing in 2013, led by Amazon's AWS. Companies began to realize that they could both improve operations and save money by migrating to the cloud. I remember the day that year that pushed the company I worked for to move from on-premises VMware to the public cloud. I was talking to a colleague who told me, "I can't install the new server for you. We don't have any storage available and the EMC storage we ordered is still stuck in customs..." The frustration drove us to look for more agile alternatives and eventually led us to migrate to the cloud.
The shift to the public cloud gave companies like mine more flexibility without compromising predictability: you still had a fairly accurate estimate of your costs. For example, EC2/VM instances, which used to make up the majority of cloud cost, have fixed pricing: X USD per VM size/region.
In recent years, that has changed: enter "cloud bill shock." But why? Most applications today run on Kubernetes, Lambda/serverless, RDS, PaaS, and other pay-per-use resources. How much does auto-scaling a Kubernetes cluster cost? How much are you going to pay for Lambda or for BigQuery? Basically, you have no idea upfront: it depends on your usage. The OpEx approach has a lot of advantages: it gives modern businesses agility and flexibility, and you don't pay for what you don't use. However, it also makes cost forecasting much more challenging.
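To make the contrast concrete, here is a minimal Python sketch comparing a flat hourly rate with a per-request rate. All prices, the 730-hour month, and the function names are assumptions chosen for illustration, not real provider pricing.

```python
HOURS_PER_MONTH = 730  # common approximation; an assumption, not a billing rule


def fixed_instance_cost(hourly_rate_usd: float, instance_count: int) -> float:
    """Monthly cost of always-on instances billed at a flat hourly rate."""
    return hourly_rate_usd * instance_count * HOURS_PER_MONTH


def pay_per_use_cost(requests: int, price_per_million_usd: float) -> float:
    """Monthly cost of a resource billed per request."""
    return requests / 1_000_000 * price_per_million_usd


# Hypothetical prices: the fixed bill is known upfront,
# the pay-per-use bill swings with traffic.
fixed = fixed_instance_cost(hourly_rate_usd=0.10, instance_count=4)
quiet_month = pay_per_use_cost(requests=50_000_000, price_per_million_usd=0.20)
busy_month = pay_per_use_cost(requests=5_000_000_000, price_per_million_usd=0.20)

print(f"fixed: {fixed:.2f} USD")
print(f"pay-per-use, quiet month: {quiet_month:.2f} USD")
print(f"pay-per-use, busy month: {busy_month:.2f} USD")
```

The fixed figure can be computed before anything is deployed. The pay-per-use figure only exists once you know the request count, which is exactly the number you rarely have in advance.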
The solution: visibility, forecasting, and governance
Visibility: It's very challenging to understand what you're paying for in pay-per-use models. The FinOps team is charged with evaluating the business need, usually based on extensive resource tagging. But traditional tagging is manual, error-prone, and extremely inefficient, so automating the tagging is crucial. Terratag is a great open source solution that performs recursive tagging for Terraform-based provisioning across your entire set of AWS, Azure, and GCP cloud resources.
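As a rough illustration of why tag coverage matters, the following Python sketch groups billing line items by a tag and surfaces whatever is left untagged. The line-item shape (`cost_usd`, `tags`) and the `team` tag key are hypothetical; real billing exports differ by provider, and this is not how Terratag works internally.

```python
from collections import defaultdict


def cost_by_tag(line_items: list[dict], tag_key: str) -> dict[str, float]:
    """Sum cost per tag value; items missing the tag land in 'untagged'."""
    totals: dict[str, float] = defaultdict(float)
    for item in line_items:
        owner = item.get("tags", {}).get(tag_key, "untagged")
        totals[owner] += item["cost_usd"]
    return dict(totals)


# Hypothetical billing rows, for illustration only.
items = [
    {"cost_usd": 80.0, "tags": {"team": "payments"}},
    {"cost_usd": 40.0, "tags": {"team": "payments"}},
    {"cost_usd": 30.0, "tags": {}},  # nobody owns this spend
]

report = cost_by_tag(items, tag_key="team")
print(report)
```

Any spend that ends up under "untagged" cannot be attributed to a team or business need, which is the gap that automated tagging is meant to close.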
Forecasting: Accurate forecasting is crucial when preparing a pay-per-use cloud budget. Companies need an accurate way to analyze their payment history and usage growth rate to create expense projections. Solutions like CloudHealth, CloudCheckr, and Cloudability offer great forecasting tools to meet these needs.
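One simple way to turn payment history and growth rate into a projection is to extrapolate a compound monthly growth rate. The Python sketch below assumes that approach; it is a deliberately naive model, not what the commercial tools above implement, and the cost history is made up.

```python
def average_growth_rate(monthly_costs: list[float]) -> float:
    """Compound monthly growth rate between the first and last month."""
    if len(monthly_costs) < 2 or monthly_costs[0] <= 0:
        raise ValueError("need at least two months and a positive first month")
    periods = len(monthly_costs) - 1
    return (monthly_costs[-1] / monthly_costs[0]) ** (1 / periods) - 1


def forecast(monthly_costs: list[float], months_ahead: int) -> list[float]:
    """Project future months by compounding the historical growth rate."""
    rate = average_growth_rate(monthly_costs)
    last = monthly_costs[-1]
    return [last * (1 + rate) ** m for m in range(1, months_ahead + 1)]


# Hypothetical history in USD: roughly 10% growth per month.
history = [1000.0, 1100.0, 1210.0]
projection = forecast(history, months_ahead=2)
print([round(value, 2) for value in projection])
```

A model like this ignores seasonality and one-off spikes, so treat it as a starting point for a budget conversation rather than a commitment.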
Governance: Theoretically, if you can estimate that new resources are too expensive, you should be able to prevent their provisioning and deployment. Open Policy Agent can help you manage this process with basic rules (for example, "do not provision more than 10 new instances," or "do not provision extra-large EC2 instances") and dedicated cost estimation CLIs can give you business-level governance. Terraform-cost-estimation is a great open source solution, if you use Terraform.
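The two example rules can be sketched in plain Python to show the shape of such a check. This is not Open Policy Agent or its Rego language; the input format (a list of dicts with an `instance_type` key) and the constant names are assumptions made for this illustration.

```python
MAX_NEW_INSTANCES = 10
# Assumption: "extra-large" means any type ending in "xlarge",
# e.g. "m5.xlarge" or "m5.2xlarge".
BLOCKED_SIZE_SUFFIXES = ("xlarge",)


def policy_violations(planned_instances: list[dict]) -> list[str]:
    """Return human-readable violations for a set of planned new instances."""
    violations = []
    if len(planned_instances) > MAX_NEW_INSTANCES:
        violations.append(
            f"{len(planned_instances)} new instances exceeds the limit of {MAX_NEW_INSTANCES}"
        )
    for instance in planned_instances:
        if instance["instance_type"].endswith(BLOCKED_SIZE_SUFFIXES):
            violations.append(f"instance type {instance['instance_type']} is not allowed")
    return violations


# Hypothetical plan: one allowed instance, one blocked extra-large instance.
plan = [{"instance_type": "m5.large"}, {"instance_type": "m5.2xlarge"}]
for problem in policy_violations(plan):
    print(problem)
```

In a pipeline, a non-empty list of violations would typically fail the deployment step before anything is provisioned.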
In addition to visibility, forecasting, and governance, it is crucial to shift cloud cost left and give developers the tools to prevent unnecessary cost increases as early as possible.
Shift-Left to empower developers to control their cloud budgets
Fifteen years ago, latency issues in production were IT's responsibility, not the developers'. But APM solutions like New Relic, AppDynamics, and later Datadog made it easy for developers to catch problems early, write responsible code in terms of latency, and fix degradations very early in the process. The same was true for security. Companies like Snyk shifted security left and empowered developers by providing them with tools to detect and fix security problems much faster. The same process needs to happen with cloud cost: developers need tools to understand how their code will affect cloud costs.
The problem is even greater with Infrastructure as Code, since developers actually write and maintain the infrastructure in their git repositories. A simple "git push" can lead to a major cost increase, but since developers don't have the tools to take ownership of the process, they usually don't take it into account. That's why I believe that Infrastructure as Code is forcing a revolution in cost management, just as APM did with latency and Snyk and dev-first security companies did with security. Developers must have a way to see the impact of each deployment on cloud cost, clearly correlated to the change that caused it. That's why env0 (disclaimer: I'm the co-founder and CEO of env0) is focusing on automatic cost management, especially for IaC-based deployments.
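To illustrate what "seeing the impact of a deployment" could look like, here is a minimal Python sketch that estimates the monthly cost change between the resources before and after a change. The price table, the resource names, and the functions are hypothetical, and this is not a description of how env0 or any other product works.

```python
# Hypothetical flat monthly prices per resource kind, in USD.
MONTHLY_PRICE_USD = {"small_vm": 15.0, "large_vm": 140.0, "managed_db": 200.0}


def estimated_monthly_cost(resources: dict[str, int]) -> float:
    """Estimate monthly cost from a mapping of resource kind to count."""
    return sum(MONTHLY_PRICE_USD[kind] * count for kind, count in resources.items())


def cost_delta(before: dict[str, int], after: dict[str, int]) -> float:
    """Estimated monthly cost change introduced by a deployment."""
    return estimated_monthly_cost(after) - estimated_monthly_cost(before)


# A change that adds three large VMs to an environment with two small ones.
before = {"small_vm": 2}
after = {"small_vm": 2, "large_vm": 3}
delta = cost_delta(before, after)
print(f"estimated change: {delta:+.2f} USD per month")
```

Surfacing a number like this next to the code change, for example in a pull request, is what lets the developer who made the change own its cost.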
To summarize, forget about old-school cloud cost management. The cloud has moved from CapEx to OpEx, demanding new solutions. Shifting cloud cost management left by empowering developers and giving them the tools to proactively correlate deployments and cloud costs (rather than leaving it to the IT/Ops teams with the current reactive approach) can prevent cost increases much earlier in the process and boost overall efficiency.