Cloud Governance Checklist for Platform Teams

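Most cloud governance failures aren't architectural; they're operational. Open ports that should have been closed, IAM roles that accumulated permissions over time, budgets that nobody was watching. This checklist covers the 30 controls that platform and security teams consistently find missing during cloud audits, organized by domain so you can work through them systematically. Use it as a gap assessment, a new-environment setup checklist, or an audit preparation tool.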

Policy & Compliance Controls

These controls ensure your infrastructure is defined, deployed, and enforced according to rules your organization has agreed on — before resources reach production.

1. All infrastructure changes go through code review. No resource is created or modified by clicking in the console. Every change starts as a pull request against an IaC repository, reviewed by at least one other team member before merge and deploy.

2. Infrastructure scanning runs on every PR. A security scanner (Checkov, tfsec (now part of Trivy), or equivalent) runs automatically on every infrastructure pull request. HIGH and CRITICAL findings block merge. Results are visible in the PR interface, not just in a separate dashboard.

3. OPA or Sentinel policies are enforced before apply. Policy-as-code runs against the Terraform or OpenTofu plan before apply executes. Policies encode organizational requirements — mandatory tags, prohibited resource types, approved regions — not just generic security rules.

4. Resource tagging is enforced, not suggested. Required tags (Environment, Owner, CostCenter, Team) are validated by policy. Resources missing mandatory tags are blocked at deploy time, not flagged after the fact. Tag enforcement applies to all cloud providers in scope.

5. Approved module library is in use. Teams source infrastructure from a curated module library (internal registry, Terraform Registry with pinned versions, or a private repository). Direct resource authoring outside approved modules requires review and justification.

6. Compliance framework mapping is documented. Each active security control is mapped to at least one compliance framework (CIS Benchmarks, SOC 2, PCI-DSS, HIPAA, ISO 27001 — whichever applies). Unmapped controls are reviewed quarterly for relevance.

7. Environment promotion gates are enforced. Code must pass through defined environment stages (dev → staging → production) with required approvals at each gate. Emergency bypass procedures exist but are logged and reviewed.

8. Secrets are never stored in IaC code or state files. Secrets are referenced from a secrets manager (AWS Secrets Manager, HashiCorp Vault, Azure Key Vault) rather than stored in tfvars files or environment variables in CI. Where sensitive values still end up in Terraform state, the state files are encrypted at rest.
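As an illustration of controls 3 and 4, the tag check can be sketched as a small script that reads the JSON form of a plan (the output of `terraform show -json`) and reports resources that lack required tags. This is a minimal sketch, not a replacement for OPA or Sentinel. The function name `missing_tags` and the sample plan are hypothetical, and the script assumes that a resource whose planned values contain no `tags` attribute is not taggable.

```python
from typing import Any

# Required tags taken from control 4 of the checklist.
REQUIRED_TAGS = {"Environment", "Owner", "CostCenter", "Team"}


def missing_tags(plan: dict[str, Any],
                 required: set[str] = REQUIRED_TAGS) -> dict[str, list[str]]:
    """Map each created or updated resource address to the required tags it lacks."""
    violations: dict[str, list[str]] = {}
    for rc in plan.get("resource_changes", []):
        change = rc.get("change", {})
        actions = set(change.get("actions", []))
        if not actions & {"create", "update"}:
            continue  # skip no-op, read, and delete-only changes
        after = change.get("after") or {}
        if "tags" not in after:
            continue  # assumption: no tags attribute means the type is not taggable
        tags = after.get("tags") or {}
        missing = sorted(required - set(tags))
        if missing:
            violations[rc["address"]] = missing
    return violations


# Hypothetical plan fragment, shaped like `terraform show -json` output.
sample_plan = {
    "resource_changes": [
        {"address": "aws_s3_bucket.logs",
         "change": {"actions": ["create"],
                    "after": {"tags": {"Environment": "prod", "Owner": "platform"}}}},
        {"address": "aws_instance.web",
         "change": {"actions": ["create"],
                    "after": {"tags": {"Environment": "prod", "Owner": "web",
                                       "CostCenter": "cc-1", "Team": "web"}}}},
        {"address": "aws_iam_policy.deploy",
         "change": {"actions": ["create"], "after": {"name": "deploy"}}},
        {"address": "aws_ebs_volume.old",
         "change": {"actions": ["delete"], "after": None}},
    ]
}

print(missing_tags(sample_plan))
# {'aws_s3_bucket.logs': ['CostCenter', 'Team']}
```

In a pipeline, a non-empty result would fail the job before apply runs, which is what makes the check enforcement rather than after-the-fact reporting.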

Access & RBAC Controls

Overpermissioned identities, both human and machine, are among the most common sources of cloud security incidents. These controls minimize the blast radius of credential compromise or insider threat.

9. Least-privilege IAM is applied to all identities. Every IAM role, service account, and user has only the permissions required for its stated function. Wildcard actions (s3:*, iam:*) are absent except in explicitly justified break-glass roles. Permissions are reviewed quarterly.

10. No long-lived human credentials exist. No IAM users with programmatic access keys are used for human access. Engineers authenticate via SSO (AWS IAM Identity Center, Microsoft Entra ID (formerly Azure AD), Google Workspace) and assume roles with time-limited sessions. Static access keys are treated as a P1 finding.

11. MFA is enforced for all cloud console access. Multi-factor authentication is mandatory for all human users with cloud console access, including read-only roles. MFA enforcement is applied at the identity provider level, not per-account.

12. CI/CD pipelines use short-lived credentials. Pipelines authenticate using OIDC federation (GitHub Actions, GitLab, etc.) rather than stored IAM keys. Each pipeline has its own scoped role with the minimum permissions needed to deploy its specific workload.

13. Privileged access requires approval and is time-bound. Break-glass and production-write access is gated behind an approval workflow. Sessions are time-limited (4–8 hours maximum). All privileged access events are logged with the requestor, approver, duration, and actions taken.

14. Service-to-service access uses workload identity. Applications authenticate to cloud services using workload identity (IAM Roles for Service Accounts, Azure Managed Identities, GCP Workload Identity Federation) rather than embedded credentials or instance profiles with broad permissions.

15. Unused permissions and roles are removed regularly. IAM Access Analyzer, Azure Advisor, or equivalent tooling runs monthly to surface unused permissions and roles. Findings are remediated within a defined SLA (recommended: 30 days for unused, 7 days for overpermissioned active roles).

16. Cross-account and cross-tenant access is inventoried. All trust relationships that allow one account, subscription, or project to access resources in another are documented and reviewed quarterly. External trust relationships (third-party tools, vendor access) require explicit approval and expiry dates.
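The wildcard check in control 9 lends itself to a simple automated review. The sketch below scans an AWS-style IAM policy document for Allow statements that grant `*` or a whole service such as `s3:*`. The function name `wildcard_actions` and the sample policy are hypothetical. The check is deliberately narrow: it ignores partial wildcards such as `s3:Get*` and `NotAction` statements, which a real review would also need to cover.

```python
def wildcard_actions(policy: dict) -> list[str]:
    """Return full-wildcard actions granted by Allow statements in an IAM policy."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]  # IAM permits a single statement object
    found: list[str] = []
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue  # Deny statements do not grant permissions
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]  # Action may be a string or a list
        for action in actions:
            if action == "*" or action.endswith(":*"):
                found.append(action)
    return found


# Hypothetical policy document used only for illustration.
sample_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["s3:GetObject", "s3:*"], "Resource": "*"},
        {"Effect": "Deny", "Action": "iam:*", "Resource": "*"},
        {"Effect": "Allow", "Action": "*", "Resource": "*"},
    ],
}

print(wildcard_actions(sample_policy))
# ['s3:*', '*']
```

Roles that are explicitly justified as break-glass would need an allowlist around this check so that they do not produce recurring findings.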

Cost & FinOps Controls

Cloud cost overruns rarely happen because of a single bad decision — they accumulate through unreviewed resources, missing guardrails, and no one watching the numbers. These controls create the visibility and accountability loops that prevent surprises.

17. Budget alerts are configured for every account and environment. Every cloud account and environment has budget alerts at 80% and 100% of the monthly target. Alerts notify the responsible team via email and a monitored Slack channel. Alerts without a named owner are treated as misconfigured.

18. Cost anomaly detection is enabled. Cloud-native anomaly detection (AWS Cost Anomaly Detection, Azure Cost Management anomaly alerts, GCP cost anomaly detection) is configured to surface unexpected spend spikes, not just threshold breaches. Alerts route to the FinOps or platform team within 24 hours of detection.

19. Resource rightsizing is reviewed quarterly. Compute and database resources are reviewed for utilization vs. provisioned capacity on a quarterly basis. Recommendations from cloud advisor tools (AWS Compute Optimizer, Azure Advisor, GCP Recommender) are tracked as actionable items with owners.

20. Idle and orphaned resources are identified and cleaned up. Unattached EBS volumes, unused Elastic IPs, stopped instances older than 30 days, and unattached load balancers are surfaced monthly. A cleanup process exists with a defined SLA. Resources persisting beyond the SLA are escalated to an owner or terminated.

21. Spot and committed use discounts are applied where appropriate. AWS Reserved Instances or Savings Plans, Azure Reserved VM Instances, or GCP Committed Use Discounts are applied to stable baseline workloads, and Spot or preemptible capacity is used for workloads that tolerate interruption. Coverage targets (recommended: 70–80% of predictable compute) are tracked and reviewed twice a year.

22. Cost is allocated to teams and products, not just accounts. Chargeback or showback reporting is operational. Every team can see the cost of the resources they own, attributed through tags, accounts, or subscriptions. Cost allocation reports are shared with engineering managers monthly.
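The budget rule in control 17 comes down to two checks: does the budget have a named owner, and has spend crossed the 80% or 100% mark. The sketch below shows that logic in isolation. The `Budget` class, the `evaluate_budget` function, and the account names are hypothetical; in practice the thresholds would be configured in the cloud provider's budgeting service rather than computed by hand.

```python
from dataclasses import dataclass
from typing import Optional

# Alert thresholds from control 17, as percentages of the monthly target.
ALERT_PERCENTS = (80, 100)


@dataclass
class Budget:
    account: str
    monthly_target: float
    owner: Optional[str] = None


def evaluate_budget(budget: Budget, spend_to_date: float) -> list[str]:
    """Return alert messages for a budget given month-to-date spend."""
    messages: list[str] = []
    if not budget.owner:
        # The checklist treats ownerless alerts as misconfigured.
        messages.append(f"{budget.account}: misconfigured, no named owner")
    for pct in ALERT_PERCENTS:
        # Compare in percent units to avoid fractional thresholds.
        if spend_to_date * 100 >= pct * budget.monthly_target:
            messages.append(f"{budget.account}: spend has reached {pct}% of monthly target")
    return messages


print(evaluate_budget(Budget("prod", 1000.0, owner="platform-team"), 850.0))
# ['prod: spend has reached 80% of monthly target']
print(evaluate_budget(Budget("dev", 500.0), 500.0))
```

The second call returns three messages: the missing-owner finding plus both threshold alerts.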

Drift & Observability Controls

Drift — the gap between what your IaC says exists and what actually exists in the cloud — silently accumulates in every environment. These controls make drift visible and keep it bounded.

23. Drift detection runs on a regular schedule. Automated drift detection (a scheduled terraform plan with -detailed-exitcode or -refresh-only, env0 drift detection, or equivalent) runs against all production environments at least daily. Drift findings are routed to the owning team with a remediation SLA.

24. Console changes are blocked or alerted on in production. In production environments, either direct console changes are blocked by SCP/policy, or CloudTrail-based alerting fires within minutes when a resource is modified outside of IaC. "Shadow ops" in production is treated as an incident.

25. All environments have a known desired state. Every environment managed by the platform team has a corresponding IaC definition in a repository. Environments with no IaC definition are treated as unmanaged and flagged for remediation. "ClickOps" environments are not acceptable in any tier above dev.

26. Cloud resource inventory is maintained and current. A complete, up-to-date inventory of cloud resources exists — whether from a CMDB, cloud asset service (AWS Config, Azure Resource Graph, GCP Asset Inventory), or IaC-derived catalog. Inventory staleness is monitored; gaps trigger investigation.
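A scheduled drift check for control 23 can be as small as a wrapper around `terraform plan -detailed-exitcode`, which exits with 0 when there are no changes, 1 on error, and 2 when the plan contains changes. The sketch below assumes the `terraform` binary is on the PATH and that the working directory is already initialized; the function names and status strings are hypothetical. Note that a non-empty plan also appears when code has been merged but not yet applied, so a "drift" result here means "live state and code differ", not necessarily "someone changed the console".

```python
import subprocess


def classify_plan_exit(code: int) -> str:
    """Translate a `terraform plan -detailed-exitcode` exit code into a status."""
    if code == 0:
        return "in_sync"  # plan succeeded with an empty diff
    if code == 2:
        return "drift"    # plan succeeded with a non-empty diff
    return "error"        # exit code 1 or anything unexpected


def detect_drift(workdir: str) -> str:
    """Run a read-only plan in workdir and report whether it shows changes."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode",
         "-input=false", "-lock=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return classify_plan_exit(result.returncode)


if __name__ == "__main__":
    # Hypothetical usage: the path below is a placeholder, not a real layout.
    # print(detect_drift("environments/production"))
    print(classify_plan_exit(2))
```

A scheduler (cron, a CI schedule, or a platform feature) would call `detect_drift` for each production environment and route any "drift" or "error" result to the owning team.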

Audit & Reporting Controls

Governance without evidence isn't governance. These controls ensure your posture is documented, reviewable, and defensible.

27. API activity logging is enabled across all accounts and regions. CloudTrail (AWS), Azure Activity Log, or GCP Audit Logs are enabled in all accounts and regions, including management/root accounts. Logs are stored in a centralized, tamper-resistant location with a minimum 12-month retention period.

28. Security findings have owners and resolution SLAs. Every finding from security scanners, cloud security posture management (CSPM) tools, or manual reviews is assigned to a named owner within 48 hours. SLAs by severity are defined and tracked: CRITICAL (24h), HIGH (7d), MEDIUM (30d), LOW (90d).

29. Governance posture is reported to leadership on a regular cadence. A monthly or quarterly governance report is delivered to engineering leadership. It covers: open findings by severity, SLA compliance, cost vs. budget, drift incidents, and compliance framework coverage. The report is data-driven, not anecdotal.

30. Disaster recovery and incident response procedures are tested. Runbooks for common incident types (credential leak, misconfiguration in production, cost spike) exist and are tested at least annually. State backup and recovery procedures for Terraform/OpenTofu state files are documented and have been exercised.
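The severity SLAs in control 28 translate directly into a lookup table and a deadline calculation, which is useful when building the SLA compliance figure for the report in control 29. This is a minimal sketch; the names `SLA_BY_SEVERITY`, `sla_deadline`, and `is_breached` are hypothetical, and the dates are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Resolution SLAs by severity, as listed in control 28.
SLA_BY_SEVERITY = {
    "CRITICAL": timedelta(hours=24),
    "HIGH": timedelta(days=7),
    "MEDIUM": timedelta(days=30),
    "LOW": timedelta(days=90),
}


def sla_deadline(severity: str, opened_at: datetime) -> datetime:
    """Return the time by which a finding of this severity must be resolved."""
    return opened_at + SLA_BY_SEVERITY[severity.upper()]


def is_breached(severity: str, opened_at: datetime, now: datetime) -> bool:
    """True if an unresolved finding is past its SLA deadline."""
    return now > sla_deadline(severity, opened_at)


opened = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(sla_deadline("HIGH", opened).isoformat())
# 2024-01-08T00:00:00+00:00
print(is_breached("HIGH", opened, datetime(2024, 1, 9, tzinfo=timezone.utc)))
# True
```

Using timezone-aware timestamps throughout avoids off-by-hours errors when findings come from tools that report in different zones.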

How env0 Automates This Checklist

Running these 30 controls manually across multiple cloud accounts, IaC frameworks, and teams is operationally expensive. env0 is a deployment and governance platform that automates a significant portion of this checklist without requiring teams to build and maintain custom tooling.

Policy & compliance (Controls 1–8)

env0 integrates security scanning directly into deployment pipelines. Checkov and tfsec run automatically on every deployment, with configurable thresholds that block applies on HIGH or CRITICAL findings. OPA policy enforcement runs against the plan file before every apply — policies are centrally managed and applied consistently across all environments and teams. No per-repo pipeline configuration required.

Access & RBAC (Controls 9–16)

env0 provides role-based access control at the organization, project, and environment level. OIDC federation is built in — no stored cloud credentials in CI. Deployment approvals are enforced as required gates: specific environments (staging, production) require named approvers before any apply proceeds, with a full audit trail of who approved what and when.

Cost & FinOps (Controls 17–22)

env0 displays cost estimates before every deployment using Infracost integration, so engineers see projected cost impact before applying. Budget alerts and cost allocation by team, project, and environment are surfaced in the platform dashboard. Idle environment detection flags environments that haven't had a deployment in a configurable period, prompting review or teardown.

Drift & observability (Controls 23–26)

env0 runs scheduled drift detection against all managed environments on a configurable schedule. When drift is detected, the owning team receives a notification with the specific resources that have changed and the option to remediate (re-apply) or acknowledge the change. Drift history is logged, providing a record of when environments diverged and how.

Audit & reporting (Controls 27–30)

Every deployment event in env0 — plan, apply, approval, policy evaluation, drift detection — is logged with actor, timestamp, environment, and outcome. Audit logs are exportable and can be forwarded to SIEM tools. Governance reporting dashboards aggregate findings, deployment activity, cost, and policy compliance across the full estate, providing the data needed for leadership reporting without manual aggregation.


