Infrastructure as Code (IaC) Pitfalls and How to Avoid Them

Infrastructure as Code. It’s not a totally new concept, but it isn’t something that everyone is doing at this point. Some have been doing it for a long time. Some have just started the journey. And some have no clue what it even is. We’re going to break it down a bit today, and talk about what it is and some of the common issues, or pitfalls, that come along with it.

What kind of pitfalls can come up?

I have great news for you... The answer is “a lot.” It’s not very good news, I guess. But as long as you take the time to consider all your options, and plan accordingly, you can mitigate just about any issues that come up. So, why is there a lot that can go wrong? It’s because of the nature of Infrastructure as Code. It’s infrastructure and it’s code. That means you take all the pitfalls that can come along with infrastructure and add them to all the pitfalls that can come along with code.

Creating and managing Infrastructure as Code seems to be most successful when it is a joint effort between the development team and the infrastructure ops team. DevOps… get it? When this isn’t a collaborative effort, infrastructure configuration issues can come up. Dev and Ops each have their own areas of expertise. That’s not to say that people in one don’t understand the other. It just means they are experienced in their own area. So why not benefit from the knowledge and experience of both?

Infrastructure Pitfalls

The first big pitfall is choosing the wrong framework for your IaC needs. Most major cloud providers have their own specialized framework: for example, CloudFormation on AWS and ARM templates on Azure. These are great if you are 100% dedicated to that cloud. But if you ever decide to migrate or go multi-cloud, your existing IaC configurations can’t be used on the new cloud. There are some tools to convert, but this problem can largely be avoided by choosing a cloud-agnostic framework from the beginning. Frameworks like Pulumi and Terraform can deploy to pretty much any cloud provider, and can even control other pieces of the infrastructure and SaaS tools. Individual resource definitions are still provider-specific, but the tooling, language, and workflow carry over to the new cloud.

Infrastructure teams usually don’t just spin up resources and delete them at will. Generally, lots of variables come into play, such as capacity planning and cost analysis. These help avoid over- or under-provisioning the needed resources, or overrunning the cloud budget. This pitfall can be mitigated by involving the Ops team in the creation of the IaC configuration files, or by having them help manage and govern the self-service IaC deployments done by the development team.
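One way to picture that kind of governance is a small pre-deployment check that estimates cost and compares it to a budget. This is only a sketch: the price table, the request format, and the function names are made up for illustration, and real figures would come from your cloud provider.

```python
# Hypothetical monthly prices per instance size (assumed values, not real pricing).
MONTHLY_PRICE = {"small": 20.0, "medium": 80.0, "large": 320.0}


def estimated_monthly_cost(requests: list) -> float:
    """Sum the assumed monthly price of every requested instance."""
    return sum(MONTHLY_PRICE[r["size"]] * r["count"] for r in requests)


def within_budget(requests: list, budget: float) -> bool:
    """Return True when the estimated cost does not exceed the budget."""
    return estimated_monthly_cost(requests) <= budget


# A self-service request from a development team (hypothetical shape).
request = [{"size": "large", "count": 3}, {"size": "small", "count": 2}]

print(estimated_monthly_cost(request))
print(within_budget(request, budget=1000.0))
```

A check like this could run before any deployment is approved, so the Ops team sets the limits once and the development team still gets self-service.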

Another infrastructure pitfall is security. In the past, lots of development teams had the luxury of secure development sandboxes. There was no real need to involve security until the project was being turned over from development to production, at which point security was brought in as an afterthought. By shifting security left with IaC in your deployment process, you can work to mitigate security risks and misconfigurations before they happen. Tools like Open Policy Agent for Policy as Code can help you ensure that IaC resources are not deployed when the code files contain infrastructure security misconfigurations. Open Policy Agent evaluates your IaC configuration (typically supplied as structured data such as JSON) against Policy as Code files that you create to set up the guardrails of your deployments.
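To make the guardrail idea concrete, here is a minimal sketch in plain Python. It is not Open Policy Agent and does not use its policy language; the resource dictionaries, the two rules, and the check_policies function are all hypothetical stand-ins for whatever your framework and policy tool actually provide.

```python
from typing import Callable, Optional

# A hypothetical, already-parsed IaC resource. Plain dicts stand in for
# the structured output of a real framework.
Resource = dict


def no_public_buckets(resource: Resource) -> Optional[str]:
    """Deny storage buckets that anyone can read."""
    if resource.get("type") == "storage_bucket" and resource.get("public_read", False):
        return f"{resource['name']}: bucket must not allow public read"
    return None


def no_open_ssh(resource: Resource) -> Optional[str]:
    """Deny firewall rules that expose SSH to the whole internet."""
    if (
        resource.get("type") == "firewall_rule"
        and resource.get("port") == 22
        and resource.get("source") == "0.0.0.0/0"
    ):
        return f"{resource['name']}: SSH must not be open to 0.0.0.0/0"
    return None


POLICIES: list = [no_public_buckets, no_open_ssh]


def check_policies(resources: list, policies: list = POLICIES) -> list:
    """Run every policy against every resource and collect violations."""
    violations = []
    for resource in resources:
        for policy in policies:
            message = policy(resource)
            if message is not None:
                violations.append(message)
    return violations


planned = [
    {"type": "storage_bucket", "name": "logs", "public_read": False},
    {"type": "firewall_rule", "name": "ssh-in", "port": 22, "source": "0.0.0.0/0"},
]

for problem in check_policies(planned):
    print("DENY:", problem)
```

In a pipeline, a non-empty list of violations would fail the run before anything is deployed.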

As I mentioned before, IaC comes with not only the pitfalls of infrastructure but also the pitfalls of code.

Code Pitfalls

One of the biggest code pitfalls is a very common issue with code in general: redundant coding. This essentially means that you’re creating entire duplicate sets of your code for each individual environment, and hard-coding the customization values into each set of files. The concept of DRY is usually pretty well known in the software development community. Don’t repeat yourself (DRY, or sometimes do not repeat yourself) is a principle of software development aimed at reducing repetition of software patterns,² replacing it with abstractions or using data normalization to avoid redundancy.

So how do we make sure we’re following this methodology while creating IaC files? There are a few ways of doing this. Using variable values during the deployment process lets you create a more “generic” set of IaC configurations that you can then use in a repeatable manner, customizing each deployment with the needed values. Another way is using an IaC framework that natively enables this type of configuration creation. For example, Terragrunt is a “wrapper” for Terraform that enables the DRY methodology in the creation of IaC configuration files. It accomplishes this by restructuring the way Terraform files are organized and executed. You create one set of “DRY” configurations, then you use customization files to define each deployment. This allows you to write only one set of configuration files for both development and production deployments, while each environment has its own customization files so that it is created with the needed parameters.
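The same idea can be sketched in a few lines of Python: one generic definition, plus a small set of values per environment. The EnvironmentValues class, the render_cluster function, and the field names are hypothetical and only illustrate the pattern; they are not Terraform or Terragrunt syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvironmentValues:
    """The only things that differ between environments (assumed fields)."""
    name: str
    node_count: int
    instance_size: str


def render_cluster(values: EnvironmentValues) -> dict:
    """The single generic definition shared by every environment."""
    return {
        "cluster_name": f"app-{values.name}",
        "node_count": values.node_count,
        "instance_size": values.instance_size,
        "tags": {"environment": values.name, "managed_by": "iac"},
    }


# Per-environment customization lives here, not in copies of the definition.
ENVIRONMENTS = {
    "dev": EnvironmentValues(name="dev", node_count=1, instance_size="small"),
    "prod": EnvironmentValues(name="prod", node_count=3, instance_size="large"),
}

for env in ENVIRONMENTS.values():
    print(render_cluster(env))
```

Adding a staging environment then means adding one entry of values, not copying the whole definition.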

Speaking of values, this is a significant topic when you’re writing your Infrastructure as Code files: specifically, what the default values are for each object you’re going to create. If you do not specify a value for a specific object, there may be a default value associated with it. For example, if your configuration creates a firewall but doesn’t specify values for a security policy, it may create the firewall with a default permit-all policy. This is very bad. How do we mitigate this pitfall? From a code perspective, we make sure that we explicitly set the needed values and parameters for all of the objects we are creating in our code files. We also talked about misconfiguration a bit earlier in the Infrastructure Pitfalls section. Using some kind of Policy as Code framework or security tool during the actual deployment phase can help you stop the deployment of misconfigured resources before it happens. It’s better to have your deployment fail, fix the code, and redeploy than to have to fix an application that has been compromised because it was deployed with that misconfiguration.
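As a rough illustration of catching risky defaults before deployment, the sketch below flags resources that leave security-relevant fields unset. The resource shape, the field names, and the REQUIRED_EXPLICIT table are assumptions made up for this example, not the schema of any real framework.

```python
# Fields that must be set explicitly per resource type (hypothetical names),
# so that the framework's defaults are never silently relied on.
REQUIRED_EXPLICIT = {
    "firewall": ["default_action", "allowed_ports"],
}


def missing_explicit_values(resource: dict) -> list:
    """Return the required fields that the resource definition leaves unset."""
    required = REQUIRED_EXPLICIT.get(resource.get("type", ""), [])
    return [field for field in required if field not in resource]


# This firewall would fall back to whatever the default policy happens to be.
firewall = {"type": "firewall", "name": "edge-fw"}

missing = missing_explicit_values(firewall)
if missing:
    print(f"{firewall['name']}: set these values explicitly: {missing}")
```

Failing the deployment on a non-empty result is the cheap outcome described above: fix the code and redeploy.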

Designing your IaC configuration files can also introduce a pitfall of performance issues. This performance hit comes during deployment, re-deployment, destroy, and other maintenance tasks when you have a large state file.

When you use Infrastructure as Code, the framework you choose needs to record the active “state” of what was actually deployed. This state file is used to make future deploy or destroy operations more efficient so that the framework doesn’t duplicate work it’s already done. For example, in Terraform, let’s say you want to scale a cluster from 2 nodes to 3 nodes in the configuration file. When you subsequently run the apply command again, it will check the state of the active deployment and see that it has already created 2 nodes, so it will only add 1 node to make 3 nodes in the cluster. That is much more efficient than tearing down the 2 existing nodes and deploying the 3 nodes you asked for from scratch. If you have a fast development cycle, and all of your infrastructure deployments are jammed into one giant state file, every little update or redeployment can take a significant amount of time to execute. So how do we avoid this pitfall?
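The 2-to-3 node example boils down to comparing the desired configuration with the recorded state and acting only on the difference. Here is a deliberately simplified sketch of that comparison. It is not how Terraform works internally (Terraform tracks individual resources and their attributes); the plan_changes function and the count-based dictionaries are assumptions for illustration.

```python
def plan_changes(desired: dict, state: dict) -> dict:
    """Return how many instances of each resource to add (+) or remove (-)."""
    changes = {}
    for name in sorted(set(desired) | set(state)):
        delta = desired.get(name, 0) - state.get(name, 0)
        if delta != 0:
            changes[name] = delta
    return changes


state = {"cluster_node": 2}    # what the last deployment actually created
desired = {"cluster_node": 3}  # what the configuration file now asks for

print(plan_changes(desired, state))  # {'cluster_node': 1}
```

The bigger the state, the more there is to compare on every run, which is where the performance hit comes from.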

Modular programming is a software design technique that emphasizes separating the functionality of a program into independent, interchangeable modules, such that each contains everything necessary to execute only one aspect of the desired functionality³. You can design your Infrastructure as Code files and the deployment process in such a way that you can split the full infrastructure into modular pieces: the network, the storage, and the compute, all in their own bite-sized modules. You can then chain these modules into a workflow, or just re-deploy them individually when you need to. This makes it a much more “cloud-native” or “micro-service” friendly design. Now, if you need to update just one piece of the infrastructure, you don’t have to run the update or re-deploy operation against the entire infrastructure’s state, just the small piece that you’re updating. You will end up with more state files, but in the long run, that is a much easier situation to manage.
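Here is a small sketch of that design: each module keeps its own state, the modules can be chained in dependency order, and a single module can be re-applied on its own. The Module class, the dependency names, and the apply method are hypothetical; a real framework would handle the actual deployment.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    """One independently deployable piece of infrastructure."""
    name: str
    depends_on: list = field(default_factory=list)
    state: dict = field(default_factory=dict)  # each module owns its own state

    def apply(self, config: dict) -> None:
        # Stand-in for a real deployment: just record what was applied.
        self.state = dict(config)


def deployment_order(modules: dict) -> list:
    """Order modules so each one comes after the modules it depends on."""
    ordered = []
    visiting = set()

    def visit(name: str) -> None:
        if name in ordered:
            return
        if name in visiting:
            raise ValueError(f"dependency cycle at {name}")
        visiting.add(name)
        for dependency in modules[name].depends_on:
            visit(dependency)
        visiting.discard(name)
        ordered.append(name)

    for name in modules:
        visit(name)
    return ordered


modules = {
    "compute": Module("compute", depends_on=["network", "storage"]),
    "network": Module("network"),
    "storage": Module("storage", depends_on=["network"]),
}

# Full workflow: deploy everything in dependency order.
print(deployment_order(modules))

# Later: update only the storage module; the other states are not touched.
modules["storage"].apply({"volume_gb": 100})
```

Because the storage update only reads and writes the storage state, its cost no longer grows with the size of the whole infrastructure.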

Wrap-Up

As you can see, there are a lot of decisions and pitfalls that go into using Infrastructure as Code, from choosing a framework, to who helps create and manage the files, to how you manage the deployment process to help curb some of the pitfalls that come along with IaC configuration issues. And these aren’t even close to all of the pitfalls you may encounter along the way. But, hopefully, this was helpful to get your head wrapped around some of the things you may need to be thinking about, and it helps you start off on the right foot, or go back and make some changes to your existing IaC configurations or procedures.