In this blog post, we’ll walk you through setting up your first monitoring and observability stack to gather data about your systems deployed in the AWS cloud using env0.
When you’re deploying any software environment or infrastructure, whether it’s containerized or not, you must think about your application monitoring and observability strategy. Not only does this save developers from being woken up at 2 a.m. by a noisy alert, it also gives you the ability to truly look into your application’s performance.
In this blog post, you’ll learn what monitoring and observability are, explore one of the most popular stacks, and see how to deploy it at scale into AWS using env0.
Monitoring and observability are often referred to as if they’re the same thing, but they have two very different purposes for developers and engineering teams.
This topic could fill an entire book in itself, so I’ve tried to keep this section brief, while providing enough context on the differences between their functions and technologies.
Monitoring is all about observing data in real time, or tracking historical data. For example, if you’ve ever walked into a Network Operations Center (NOC) or walked over to the IT space in your organization, you may have seen big screens full of data statistics and dashboards.
Monitoring collects telemetry data that gives real-time visibility into application performance. These tools gather data on components such as CPU, memory, bandwidth, and overall system performance. Developers often use them to pinpoint software performance issues and optimize incident response times.
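To make that concrete, here's a minimal, purely illustrative sketch of what a monitoring agent does at its core: sample a metric on an interval and record it as a timestamped series. The metric values below are hard-coded stand-ins, not real host telemetry.

```shell
#!/bin/sh
# Minimal monitoring-loop sketch: emit one timestamped sample per call.
# A real agent would read CPU/memory stats from the OS; the values here
# are hard-coded stand-ins.
record_sample() {
  echo "$(date +%s) cpu_percent=$1"
}
record_sample 12
record_sample 47
record_sample 31
```

A real collector such as Prometheus' node exporter does essentially this, just against actual OS counters and on a fixed scrape interval.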
What makes observability important is that it deals with the "unknown unknowns," providing visibility into the entire application, and allowing developers to synthesize that raw data and form actionable insights to drive business outcomes.
Observability addresses the "three pillars" of telemetry data (logs, metrics, and traces), then goes beyond them to provide value to customers.
In short: with monitoring, you gather real-time data about every system, component, service, event, piece of infrastructure, and so on. But despite the growing body of data under our control, blind spots will always remain in any complex system.
Observability goes beyond what's monitored and under control. The telemetry data collected (from logs, metrics, and traces) provides context that leads to insights that drive business value for customers. Observability == inference.
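As an illustration (entirely synthetic data), here's what the three pillars can look like for a single slow request. Joining the log line and the trace span on a shared trace ID is exactly the kind of inference observability enables:

```shell
#!/bin/sh
# Synthetic example: one request's log line, metric sample, and trace span.
# Correlating the log and the trace on trace_id is the inference step.
trace_id="abc123"
echo "LOG:    level=error msg=\"upstream timeout\" trace_id=$trace_id"
echo "METRIC: http_request_duration_seconds{path=\"/checkout\"} 9.7"
echo "TRACE:  span=db.query route=/checkout trace_id=$trace_id duration=9.4s"
```

No single signal explains the incident on its own; together they tell you the checkout path was slow because a database query timed out.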
Now that you know the theory behind collecting data and drawing inferences about your systems, you may be wondering: what tools and platforms are available to help developers do the same for their services?
There's a greater variety than users might expect. A few notable ones are:
In the cloud native, Kubernetes, and containerization world, a lot of users gravitate towards one specific stack: Prometheus and Grafana.
Although this stack can also manage workloads outside of Kubernetes clusters, both tools have a ton of compatibility and support for containerized applications at scale. Both are open source, which gives developers a ton of control. It also means you don’t have to pay for them, which is a great value for users who otherwise wouldn’t be able to access this kind of data.
Prometheus and Grafana are also enterprise-ready in a lot of cases. This stack isn’t just for pre-production. Many large organizations deploy these technologies at scale in their clouds and have battle-tested them on clusters ranging in size from 5 to 500.
If you want a little enterprise support behind it, a lot of the major cloud providers also offer managed Prometheus and Grafana services (for example, Amazon Managed Grafana is one of the managed, scaled, done-for-you AWS services).
So, which tool collects what data?
Prometheus is all about collecting observability data. It scrapes raw data from specific endpoints. In Kubernetes, a `/metrics` endpoint can be exposed, and Prometheus can then retrieve metrics from that endpoint.
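The raw data Prometheus scrapes is plain text in the Prometheus exposition format. Here's a made-up sample payload, plus a quick way to sum a counter across its label sets:

```shell
#!/bin/sh
# A made-up /metrics payload in the Prometheus text exposition format.
metrics='# HELP http_requests_total Total HTTP requests served.
# TYPE http_requests_total counter
http_requests_total{method="get",code="200"} 1027
http_requests_total{method="post",code="200"} 3'

# Sum the counter across all label sets (skip the # HELP/# TYPE comments).
echo "$metrics" | awk '!/^#/ { sum += $2 } END { print sum }'
# prints 1030
```

In practice Prometheus does this aggregation for you with PromQL (e.g. `sum(http_requests_total)`); the awk line is just to show that the underlying data is simple, line-oriented text.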
Grafana is used to view the data in UI-friendly, human-readable dashboards. Although you can definitely view the data in Prometheus, and it’s quite readable there, it’s much easier to interpret and act upon when visualized in Grafana dashboards.
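As a side note, when you run Grafana yourself you can point it at Prometheus declaratively via a datasource provisioning file. Here's a hedged sketch; the service URL is an assumption based on the `monitoring` namespace used later in this post and the default `release-name` that `helm template` assigns:

```yaml
# Hypothetical datasources.yaml for Grafana provisioning.
# The URL assumes the Prometheus chart was rendered with helm template's
# default release name into the "monitoring" namespace; adjust to match
# your actual service name.
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://release-name-prometheus-server.monitoring.svc.cluster.local
    isDefault: true
```

You can also add the datasource by hand in the Grafana UI; provisioning just makes the wiring repeatable.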
So far you’ve read a lot of theory: what monitoring does, what observability does, and a few different software tools you can test for these use cases in production. Now it’s time to dive into the hands-on piece and deploy these tools and infrastructure into the cloud using AWS services.
To follow along with the hands-on portion of this blog post, you will need the following:
This section demonstrates the infrastructure configuration used to deploy the Prometheus and Grafana services into an Amazon EKS cluster and start gathering your own observability data.
Because you’re installing your own Prometheus and Grafana instances in the cloud, there isn’t specific infrastructure code to reference. Instead, you’ll use an env0 configuration to deploy into AWS. The env0 configuration contains the steps to deploy onto the Amazon EKS cluster in a step-by-step fashion.
The workflow goes as follows:
# Point kubectl at your existing EKS cluster
- aws eks --region=$AWS_DEFAULT_REGION update-kubeconfig --name $CLUSTER_NAME
# Download Helm v3.10.2 if it isn't already present
- if [[ ! -e helm-v3.10.2-linux-amd64.tar.gz ]]; then wget https://get.helm.sh/helm-v3.10.2-linux-amd64.tar.gz; fi
- tar -zxvf helm-v3.10.2-linux-amd64.tar.gz
# Add the Grafana and Prometheus community chart repositories
- ./linux-amd64/helm repo add grafana https://grafana.github.io/helm-charts
- ./linux-amd64/helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
- ./linux-amd64/helm repo update
# One-time namespace and service account setup, left commented out here
# - kubectl create namespace monitoring
# - kubectl create sa release-name-grafana-test -n monitoring
# Render the chart manifests for env0 to apply
- ./linux-amd64/helm template prometheus-community/prometheus -n monitoring > prom.yaml
- ./linux-amd64/helm template grafana/grafana -n monitoring > graf.yaml
Ensure you add the code above to the root directory of the Git repo you’re using to deploy, and name the file [.code]env0.yaml[.code].
With the [.code]env0.yaml[.code] configuration, you can now prepare env0 to deploy Prometheus and Grafana into Amazon EKS.
First, create a new environment.
For the VCS environment, choose to run Kubernetes so that the system can easily scale.
Select the repo where the [.code]env0.yaml[.code] exists along with the branch. You can leave the Kubernetes folder blank as you’re not deploying a Kubernetes manifest.
Ensure that the [.code]AWS_DEFAULT_REGION[.code] and [.code]CLUSTER_NAME[.code] variables match your EKS cluster.
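For example, if you were mirroring these env0 environment variables in a local shell session (the values below are placeholders; substitute your own region and cluster name):

```shell
#!/bin/sh
# Placeholder values; replace with your cluster's actual region and name.
export AWS_DEFAULT_REGION="us-east-1"
export CLUSTER_NAME="my-eks-cluster"

# The first step of the env0.yaml workflow interpolates both variables:
echo "aws eks --region=$AWS_DEFAULT_REGION update-kubeconfig --name $CLUSTER_NAME"
```

If either variable doesn't match a real EKS cluster, the very first [.code]update-kubeconfig[.code] step will fail, so this is worth double-checking before you deploy.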
Once complete, you’ll see in the env0 dashboards that your cloud deployment has started.
You’ll get a prompt to approve the Prometheus and Grafana deployments.
Once complete, you’ll see the resources deployed and available on your Amazon Elastic Kubernetes Service (EKS) cluster.
Congratulations! You're now gathering data from the services you've deployed to the cloud in AWS!
This is part four of a four-part series. Keep reading to learn more!