Oct 20, 2024

#08. Fix the root cause, not the symptoms: Cloud Governance vs. Cost Optimizations

Do you have continuous cycles of cost optimizations? Then you are fixing the problem symptoms, not the root cause.

One common problem many organizations face is the cycle of cost optimizations. It typically runs every month after the cloud bill arrives. FinOps practitioners prepare a long list of cloud usage optimizations and send it to the application teams for implementation. Once some of these optimizations are implemented, the development team returns to its normal routine and inadvertently creates more cloud waste, and so the cycle continues.

Is this cycle avoidable? Well, perhaps not 100%, but to a large extent, yes! What you need is cloud policy and governance in place.

TL;DR

  • Instead of running repeated cycles of optimizations, set cloud usage guardrails so that the waste is avoided in the first place (cost avoidance).
  • Governance policies should be set through automation and code.

Cloud Policy and Governance

Definition (the What?)

Governance is a set of processes (e.g., meetings between different stakeholders, or documentation of business requirements) that ensures guardrails for cloud spend are in place. These guardrails consist of cloud policies (e.g., Cloud IAM, or Identity and Access Management, policies) that are typically implemented in code and deployed automatically so that they are enforced.

Purpose (the Why?)

Doesn't limiting cloud use go against its main purpose? Isn't the cloud meant to encourage creativity and new ideas?

Developers need some freedom to use cloud resources, but this freedom must come with responsibility. Consider this: if every developer had an unmonitored cloud sandbox account, what would prevent them from misusing it—say, for cryptocurrency mining? While this is an extreme and unlikely scenario, it illustrates a real risk. Without proper oversight, developers might inadvertently spin up oversized resources that lack a genuine business purpose and leave them running unchecked.

So cloud policy and governance follow the business needs. They should simply define what is reasonable to fulfill those needs and set it as the standard (default). If the need arises, these restrictions can be lifted to cover demand or facilitate innovation.

Implement it (the How?)

Let's start by making our cloud policies match what the business needs. This means our governance plan will involve everyone who helps create these business rules. We gather these rules from different groups, including:

  • General IT, Security and Compliance Group: The general rules set by the organization’s business (e.g., customer data should be backed up in multiple zones).
  • Cloud Center of Excellence (CCoE) Group: The general recommendations and standards that meet the business needs (e.g., which EC2 families, sizes, operating systems, and regions are recommended as standard).
  • Application Teams Group: The specific needs demanded by the business applications (e.g., load balancer type, database type, or licenses needed).

The FinOps function in the organization should collect all these requirements and coordinate among the entities to produce a set of cloud policies. These policies should then be passed to the CCoE or Cloud Foundation team, who can implement them as code (e.g., Terraform templates of Cloud IAM policies) and deploy them to the general cloud infrastructure.
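To make "policies as code" concrete, here is a minimal sketch in Python of what such a guardrail could look like once the requirements are agreed. It builds an AWS IAM-style policy document that denies EC2 launches outside a standard list of instance types and regions. The function name, the instance types, and the regions are hypothetical placeholders, not recommendations; in practice the resulting JSON would more likely be managed through Terraform, as described above, than generated by a script.

```python
import json


def build_guardrail_policy(allowed_instance_types, allowed_regions):
    """Return an IAM-style policy document (as a dict) that denies
    EC2 usage outside the agreed standard. Hypothetical helper."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Deny launching any instance type not on the standard list.
                "Sid": "DenyNonStandardInstanceTypes",
                "Effect": "Deny",
                "Action": "ec2:RunInstances",
                "Resource": "arn:aws:ec2:*:*:instance/*",
                "Condition": {
                    "StringNotEquals": {
                        "ec2:InstanceType": sorted(allowed_instance_types)
                    }
                },
            },
            {
                # Deny EC2 actions in regions the business has not approved.
                "Sid": "DenyNonStandardRegions",
                "Effect": "Deny",
                "Action": "ec2:*",
                "Resource": "*",
                "Condition": {
                    "StringNotEquals": {
                        "aws:RequestedRegion": sorted(allowed_regions)
                    }
                },
            },
        ],
    }


# Assumed example values; replace with the standards your groups agree on.
policy = build_guardrail_policy(
    allowed_instance_types={"t3.micro", "t3.small", "t3.medium"},
    allowed_regions={"eu-central-1", "eu-west-1"},
)
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

Keeping the allowed values as inputs means an exception (a larger instance for a specific workload, for example) becomes a reviewed change to the inputs rather than a manual override.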

This diagram illustrates a governance model showing how the FinOps team collects various requirements from different entities before application teams are allowed to deploy cloud resources. This governance enables the CCoE or Cloud Foundation team to implement specific restrictions on cloud resources, aligning their usage with business needs. As a result, it helps avoid cloud waste and achieve cost savings without requiring further cost optimizations.

Note: Cloud policies don't need to be overly comprehensive or cover all cloud resources. Usually, it's enough to analyze the main 3–5 services driving around 80% of the spend and govern them. The rest is just fine-tuning.
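One way to find those few services is a simple Pareto-style pass over the bill. The sketch below, in Python, sorts services by cost and keeps adding them until the chosen share of total spend is covered. The function name and the monthly figures are made up for illustration; real numbers would come from your billing export.

```python
def top_spend_drivers(spend_by_service, coverage=0.80):
    """Return the smallest set of top-spending services whose combined
    cost reaches the given share of total spend. Hypothetical helper."""
    total = sum(spend_by_service.values())
    if total <= 0:
        return []
    selected, running = [], 0.0
    # Walk services from most to least expensive.
    ranked = sorted(spend_by_service.items(), key=lambda kv: kv[1], reverse=True)
    for service, cost in ranked:
        selected.append(service)
        running += cost
        if running / total >= coverage:
            break
    return selected


# Hypothetical monthly spend in USD, for illustration only.
monthly_spend = {
    "EC2": 42000,
    "RDS": 18000,
    "S3": 9000,
    "Data Transfer": 6000,
    "Lambda": 3000,
    "CloudWatch": 2000,
    "Other": 5000,
}

drivers = top_spend_drivers(monthly_spend)
print(drivers)  # ['EC2', 'RDS', 'S3'] (69,000 of 85,000, about 81%)
```

With these example figures, three services cover about 81% of the bill, so policies for those three would be the place to start.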

Results

The more cloud policies you set for the organization, the less you need to worry about optimizing usage. For instance, if a policy restricts oversized instances, you don't have to worry about rightsizing. Similarly, if a policy enforces shutting down EC2 instances on weekends, you don't have to worry about idle instances. These policies automatically align cloud usage with business value and help resolve potential conflicts between engineers, business teams, and finance departments.
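As an illustration of the weekend-shutdown policy, here is a minimal Python sketch of the decision logic only. It assumes a Monday-to-Friday working week and an `exempt` flag for workloads that must keep running; the instance records and field names are hypothetical, and actually stopping instances (through a scheduler and the cloud provider's API) is deliberately left out.

```python
from datetime import datetime


def should_be_running(now, exempt=False):
    """Weekend-shutdown rule: non-exempt instances run Monday to Friday only."""
    if exempt:
        return True
    return now.weekday() < 5  # Monday is 0, Friday is 4


def instances_to_stop(instances, now):
    """Return IDs of running instances that the policy says should be stopped."""
    return [
        inst["id"]
        for inst in instances
        if inst["state"] == "running"
        and not should_be_running(now, inst.get("exempt", False))
    ]


# Hypothetical inventory, for illustration only.
inventory = [
    {"id": "dev-api", "state": "running"},
    {"id": "dev-batch", "state": "stopped"},
    {"id": "prod-db", "state": "running", "exempt": True},
]

sunday = datetime(2024, 10, 20)
monday = datetime(2024, 10, 21)
print(instances_to_stop(inventory, sunday))  # ['dev-api']
print(instances_to_stop(inventory, monday))  # []
```

The exemption flag reflects the idea that a default restriction can be lifted where the business genuinely needs it.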

Summary

FinOps helps organizations set guardrails for cloud usage through cloud policies and governance. These policies question the business value of cloud resources and restrict cloud waste, thus becoming a proactive measure rather than the reactive approach of cost optimization.

Thanks for reading! Share if you found it helpful. Have questions or suggestions for future topics? We'd love to hear from you!