A Closer Look at Cloud FinOps

When I was first introduced to Agile development, it felt like a natural flow for developers and business stakeholders to collaborate and deliver functionality in short iterations. It was rewarding (and sometimes disappointing) to demo features every two weeks and get direct feedback from users. Continuous Integration tools matured to make the delivery process more automated and consistent. However, the operations team was left out of this process. Environment provisioning, maintenance, exception handling, performance monitoring, security – all these aspects were typically deprioritized in favor of keeping the feature release cadence. The DevOps/DevSecOps movement emerged as a cultural and technical answer to this dilemma, advocating for a much closer relationship between development and operations teams.

Today, companies are rapidly expanding their cloud infrastructure footprint. What I’ve heard from discussions with customers is that the business value driven by the cloud is simply too great to ignore. However, much like the relationship between development and ops teams during the early Agile days, a gap is forming between Finance and DevOps teams. Traditional infrastructure budgeting and planning doesn’t work when you’re moving from a CapEx to OpEx cost structure. Engineering teams can provision virtually unlimited cloud resources to build solutions, but cost accountability is largely ignored. Call it the pandemic cloud spend hangover.

Our customers see the flexibility of the cloud as an innovation driver rather than simply an expense. But they still need to understand the true value of their cloud spend – which products or systems are operating efficiently? Which ones are wasting resources?

I decided to look into FinOps practices to discover techniques for optimizing cloud spend. I researched the FinOps Foundation and read the book, Cloud FinOps. Much like the DevOps movement, FinOps seeks to bring cross-functional teams together before cloud spend gets out of hand. It encompasses both cultural and technical approaches.

Here are some questions that I had before and the answers that I discovered from my research:

Where do companies start with FinOps without getting overwhelmed by yet another oversight process?

Start by understanding where your costs are allocated. Understand how the cloud provider’s billing details are laid out and seek to apply the correct costs to a business unit or project team. Resource tagging is an essential first step to allocating costs. The FinOps team should work together to come up with standard tagging guidelines.

Don’t assume the primary goal is cost savings. Instead, approach FinOps as a way to optimize cloud usage to meet your business objectives. Encourage reps from engineering and finance to work together to define objectives and key results (OKRs). These objectives may be different for each team/project and should be considered when making cloud optimization recommendations. For example, if one team’s objective is time-to-market, then costs may spike as they strive to beat the competition.

What are some common tagging/allocation strategies?

Cloud vendors provide granular cost data down to the millisecond of usage. For example, AWS Lambda recently went from rounding to the nearest 100ms of duration to the nearest millisecond. However, it’s difficult to determine what teams/projects/initiatives are using which resources and for how long. For this reason, tagging and cost allocation are essential to FinOps.

According to the book, there are generally two approaches for cost allocation:

Tagging – these are resource-level labels that provide the most granularity.

Hierarchy-based – these are at the cloud account or subscription-level. For example, using separate AWS accounts for prod/dev/test environments or different business units.

Their recommendation is to start with hierarchy-based allocations to ensure the highest level of coverage. Tagging is often overlooked or forgotten by engineering teams, leading to unallocated resources. This doesn’t suggest skipping tags, but make sure you have a consistent strategy for tagging resources to set team expectations.

How do you adopt a FinOps approach without disrupting the development team and slowing down their progress?

The nature of usage-based cloud resources puts spending responsibility on the engineering team since inefficient use can affect the bottom line. This is yet another responsibility that “shifts left”, or earlier in the development process. In addition to shifting left on security/testing/deployment/etc., engineering is now expected to monitor their cloud usage. How can FinOps alleviate some of this pressure so developers can focus on innovation?

Again, collaboration is key. Demands to reduce cloud spend cannot be a one-way conversation. A key theme in the book is to centralize rate reduction and decentralize usage reduction (cost avoidance).

Engineering teams understand their resource needs so they’re responsible for finding and reducing wasted/unused resources (i.e., decentralized).
Rate reduction techniques like using reserved instances and committed use discounts are best handled by a centralized FinOps team. This team takes a comprehensive view of cloud spend across the organization and can identify common resources where reservations make sense.

Usage reduction opportunities, such as right sizing or shutting down unused resources, should be identified by the FinOps team and provided to the engineering teams. These suggestions become technical debt and are prioritized along with other work in the backlog. Quantifying the potential savings of a suggestion allows the team to determine if it’s worth spending the engineering hours on the change.

https://www.finops.org/projects/encouraging-engineers-to-take-action/

How do you account for cloud resources that are shared among many different teams?

Allocating cloud spend to specific teams or projects based on tagging ensures that costs are distributed fairly and accurately. But what about shared costs like support charges? The book provides three examples for splitting these costs:

Proportional – Distribute proportionally based on each team’s actual cloud spend. The more you spend, the higher the allocation of support and other shared costs. This is the recommended approach for most organizations.
Evenly – split evenly among teams.
Fixed – Pre-determined fixed percentage for each team.

Overall, I thought the authors did a great job of introducing Cloud FinOps without overwhelming the reader with another rigid set of practices. They encourage the Crawl/Walk/Run approach to get teams started on understanding their cloud spend and where they can make incremental improvements. I had some initial concerns about FinOps bogging down the productivity and innovation coming from engineering teams. But the advice from practitioners is to provide data to inform engineering about upward trends and cost anomalies. Teams can then make decisions on where to reduce usage or apply for discounts.

The cloud providers are constantly changing, introducing new services and cost models. FinOps practices must also evolve. I recommend checking out the Cloud FinOps book and the related FinOps Foundation website for up-to-date practices.