May 4, 2025

#27. Govern Your Storage Cost — Part 1: Storage Fundamentals

Understanding storage services to set the right governance strategy

When looking at services driving cloud spend, storage and compute typically top the list. Whether you realize it or not, storage costs are often blended into or hidden within other services (e.g., databases, content delivery, backup systems). Despite this, companies and large organizations rarely define a governance strategy for storage costs, and then wonder why their cloud bill is skyrocketing.

In this article, we will explore storage fundamentals to better understand which storage types suit different scenarios. In the next article, we will examine how to build a strategy to govern storage in an organization.

Understanding Storage Categories

Storage is an abstract term applied to many different services and media, but ultimately we can classify storage into three fundamental categories.

Object Storage

Object storage is designed for unstructured data like files, images, videos, and backups. It typically comes in multiple tiers, ranging from frequently accessed standard storage to rarely accessed archival storage, which lets you optimize cost based on access patterns and retention requirements.
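To make the tiering idea concrete, here is a minimal Python sketch that maps each object to a tier based on how recently it was accessed. The tier names (`standard`, `infrequent`, `archive`) and the 30/90-day thresholds are hypothetical placeholders, not any provider's actual classes or rules; real providers also charge retrieval fees and minimum storage durations that this sketch ignores.

```python
from dataclasses import dataclass

# Hypothetical tiers and thresholds, for illustration only.
TIER_RULES = [
    (30, "standard"),    # accessed within the last 30 days
    (90, "infrequent"),  # accessed within the last 90 days
]
ARCHIVE_TIER = "archive"  # everything older than the last rule


@dataclass
class StoredObject:
    key: str
    days_since_last_access: int


def choose_tier(obj: StoredObject) -> str:
    """Map an object to a hypothetical tier based on how recently it was accessed."""
    for max_age_days, tier in TIER_RULES:
        if obj.days_since_last_access <= max_age_days:
            return tier
    return ARCHIVE_TIER


objects = [
    StoredObject("logos/header.png", 2),
    StoredObject("reports/2024-q3.pdf", 45),
    StoredObject("backups/2023-full.tar", 400),
]
for obj in objects:
    print(obj.key, "->", choose_tier(obj))
```

In practice, this kind of rule is usually expressed as a lifecycle policy on the bucket rather than as application code, but the decision logic is the same: colder data moves to cheaper tiers.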

Block Storage

Block storage provides the fundamental building blocks for persistent storage in cloud environments. It delivers raw storage volumes that can be formatted with different technologies to serve various needs, including direct-attached volumes for single instances, network file systems for multi-instance access, and database storage engines. Block storage focuses on performance, reliability, and direct control of the storage medium.

Cache Storage

Cache storage leverages high-speed memory (RAM) for temporary data storage, dramatically reducing access latency. Unlike persistent storage options, cache is designed for transient data that requires extremely fast access, such as session information, frequently queried database results, and application acceleration. Cache storage trades durability for speed, complementing rather than replacing persistent storage solutions.
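The "complementing rather than replacing" relationship is easiest to see in the cache-aside pattern: the application checks memory first and falls back to the persistent store on a miss. Below is a minimal in-process Python sketch; `TTLCache` and `load_user_profile` are hypothetical names, and a managed cache service would play this role across many application servers.

```python
import time
from typing import Any, Callable


class TTLCache:
    """Minimal in-memory cache: entries expire after ttl_seconds and vanish if the process restarts."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl_seconds = ttl_seconds
        self._entries: dict[str, tuple[float, Any]] = {}  # key -> (expires_at, value)

    def get_or_load(self, key: str, loader: Callable[[], Any]) -> Any:
        now = time.monotonic()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: served from memory
        value = loader()     # cache miss: read from the slower persistent store
        self._entries[key] = (now + self.ttl_seconds, value)
        return value


calls = {"db": 0}


def load_user_profile() -> dict:
    # Hypothetical stand-in for a slow database query.
    calls["db"] += 1
    return {"user_id": 42, "plan": "pro"}


cache = TTLCache(ttl_seconds=60)
cache.get_or_load("user:42", load_user_profile)
cache.get_or_load("user:42", load_user_profile)
print(calls["db"])  # 1
```

The second lookup never reaches the database, which is the speed benefit; the data still lives in the database, so losing the cache costs latency, not data.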

From a fundamental infrastructure perspective, every storage service falls into one of these three categories. For example, network file systems (e.g., Amazon EFS, Azure Files) are multi-instance block storage pre-formatted with a file system that is shared over network protocols (NFS, SMB). Similarly, direct block storage (e.g., Amazon EBS, Google Persistent Disk) is block storage typically attached to a single compute instance, where the operating system formats it with a file system. The same applies to database storage: the database engine organizes block storage in its own on-disk format to enable fast reads and writes.

Backups vs Snapshots

Backups and snapshots are usually stored in object storage services (e.g., Amazon S3, Azure Blob Storage, or Google Cloud Storage). However, this happens automatically behind the scenes, without requiring direct user management. As a result, you are charged through the snapshot or backup service itself, and the underlying object storage cost is not broken out separately. The question is: when should you use which?

Snapshots

Snapshots are point-in-time captures of storage volumes that preserve the exact state of data at a specific moment. They're stored as incremental changes after the initial full snapshot, capturing only the data blocks that have changed since the previous snapshot. Snapshots are ideal for short-term protection, quick recoveries, and providing a foundation for cloning environments.
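The incremental behavior described above can be sketched in a few lines of Python: fingerprint each fixed-size block, then store only the blocks whose fingerprint differs from the previous snapshot. Hash comparison is an assumption chosen for readability; cloud providers typically track changed blocks internally rather than rehashing whole volumes, and the 4-byte block size is deliberately tiny.

```python
import hashlib

# Deliberately tiny block size so the example is easy to read; real volumes use far larger blocks.
BLOCK_SIZE = 4


def block_hashes(data: bytes) -> list[str]:
    """Split a volume image into fixed-size blocks and fingerprint each block."""
    return [
        hashlib.sha256(data[i:i + BLOCK_SIZE]).hexdigest()
        for i in range(0, len(data), BLOCK_SIZE)
    ]


def incremental_snapshot(previous_hashes: list[str], current: bytes) -> dict[int, bytes]:
    """Return only the blocks (keyed by block index) that are new or changed since the previous snapshot."""
    changed: dict[int, bytes] = {}
    for index, digest in enumerate(block_hashes(current)):
        if index >= len(previous_hashes) or previous_hashes[index] != digest:
            start = index * BLOCK_SIZE
            changed[index] = current[start:start + BLOCK_SIZE]
    return changed


day1 = b"AAAABBBBCCCC"
day2 = b"AAAAXXXXCCCCDDDD"  # block 1 modified, block 3 appended
print(incremental_snapshot(block_hashes(day1), day2))  # {1: b'XXXX', 3: b'DDDD'}
```

Only two of the four blocks are stored for the second snapshot. This sketch does not handle volumes that shrink (deleted trailing blocks), which a real snapshot system must also record.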

Backups

Backups are typically more comprehensive, often including application-consistent copies of data with metadata, and potentially covering multiple systems in a single backup set. While snapshots generally remain within the same storage system, backups are frequently stored in separate locations, or even off-site, for disaster recovery purposes. Backups usually come with explicit retention policies and may involve dedicated technologies such as backup software.
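Retention policies matter for cost because every backup you keep is storage you keep paying for. As a minimal Python sketch, assuming a simple age-based rule (the 30-day window and the dates are hypothetical), a retention check just partitions backups into those to keep and those to expire:

```python
from datetime import date, timedelta


def split_by_retention(backup_dates: list[date], today: date,
                       retention_days: int) -> tuple[list[date], list[date]]:
    """Partition backups into (keep, expire) using a simple age-based retention window."""
    cutoff = today - timedelta(days=retention_days)
    keep = [d for d in backup_dates if d >= cutoff]
    expire = [d for d in backup_dates if d < cutoff]
    return keep, expire


backups = [date(2025, 3, 1), date(2025, 4, 1), date(2025, 4, 28)]
keep, expire = split_by_retention(backups, today=date(2025, 5, 4), retention_days=30)
print("keep:", keep)
print("expire:", expire)
```

Backup tools often support richer schemes (for example, keeping dailies for a week and monthlies for a year), but each still reduces to deciding which copies to delete.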

So how does this affect pricing?

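Let's imagine a scenario where we have a 100 GB database that grows by 5 GB per day (5% of its initial size). We take a snapshot every day and a full backup every week, and we keep all of them. For simplicity, assume the growth is all new data, so each day's snapshot delta equals the 5 GB added. What would be the total storage consumed by snapshots versus backups after 30 days? Here's a simple calculation: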

Snapshots

  • Day 1: Full snapshot = 100 GB
  • Days 2-30: Daily delta of 5 GB each day = 29 days × 5 GB = 145 GB
  • Total snapshot storage: 100 GB + 145 GB = 245 GB

Backup Storage

  • Week 1 (Day 1): 100 GB
  • Week 2 (Day 8): 100 GB + (7 × 5 GB) = 135 GB
  • Week 3 (Day 15): 100 GB + (14 × 5 GB) = 170 GB
  • Week 4 (Day 22): 100 GB + (21 × 5 GB) = 205 GB
  • Week 5 (Day 29): 100 GB + (28 × 5 GB) = 240 GB
  • Total backup storage: 100 + 135 + 170 + 205 + 240 = 850 GB

Even though we take snapshots far more frequently, they consume significantly less storage (245 GB versus 850 GB) because each one captures only changed data blocks. In contrast, every full backup stores a complete copy of the data as it existed when the backup was taken.
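The arithmetic above generalizes into two small Python functions. They encode the same assumptions as the example: one full snapshot followed by daily increments containing only new data, a full backup every 7 days, and every snapshot and backup retained for the whole period. The function names are illustrative, not part of any tool.

```python
def snapshot_storage_gb(initial_gb: float, daily_growth_gb: float, days: int) -> float:
    """Full snapshot on day 1, then one incremental snapshot per day holding only that day's new data."""
    return initial_gb + (days - 1) * daily_growth_gb


def weekly_full_backup_storage_gb(initial_gb: float, daily_growth_gb: float, days: int) -> float:
    """Full backup on days 1, 8, 15, ...; every backup is retained for the whole period."""
    total = 0.0
    for backup_day in range(1, days + 1, 7):
        database_size_gb = initial_gb + (backup_day - 1) * daily_growth_gb
        total += database_size_gb  # each full backup copies the entire database
    return total


print(f"Snapshots: {snapshot_storage_gb(100, 5, 30):.0f} GB")            # Snapshots: 245 GB
print(f"Backups:   {weekly_full_backup_storage_gb(100, 5, 30):.0f} GB")  # Backups:   850 GB
```

Under these assumptions, snapshot storage grows linearly with time while retained full backups grow roughly quadratically, so increasing `days` widens the gap quickly.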

Block Storage vs File Storage vs Network Storage

Block Storage (e.g., Amazon EBS)

Block storage works by breaking data into smaller, equal-sized pieces called blocks. Think of it like storing the pieces of a puzzle, where each piece has its own label. This type of storage is great for applications that need fast performance, such as databases (MySQL, PostgreSQL, and Oracle). However, there's one limitation: by default, an EBS volume can be attached to only one EC2 instance at a time, and only within the same Availability Zone (Multi-Attach on io1/io2 volumes is a narrow exception). You can, however, attach multiple volumes to a single instance if you need more space.
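The puzzle analogy can be sketched in Python: split data into equal-sized blocks keyed by block number (the "label"), then serve a read by fetching only the blocks that cover the requested byte range. The 4,096-byte block size and the helper names are assumptions for illustration; the file system or database layered on top is what decides what each block means.

```python
# Assumed block size for illustration; actual sizes depend on the volume and file system.
BLOCK_SIZE = 4096


def to_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> dict[int, bytes]:
    """Split data into equal-sized blocks keyed by block number.

    The block number plays the role of the puzzle piece's label. The final
    block is zero-padded so that every block has the same size.
    """
    blocks: dict[int, bytes] = {}
    for number, offset in enumerate(range(0, len(data), block_size)):
        blocks[number] = data[offset:offset + block_size].ljust(block_size, b"\x00")
    return blocks


def read_range(blocks: dict[int, bytes], offset: int, length: int,
               block_size: int = BLOCK_SIZE) -> bytes:
    """Read `length` bytes starting at `offset`, touching only the blocks that cover that range."""
    first = offset // block_size
    last = (offset + length - 1) // block_size
    covering = b"".join(blocks[n] for n in range(first, last + 1))
    start = offset - first * block_size
    return covering[start:start + length]


data = b"hello world" * 1000        # 11,000 bytes -> 3 blocks of 4,096 bytes
blocks = to_blocks(data)
print(len(blocks))                   # 3
print(read_range(blocks, 4090, 12))  # bytes spanning blocks 0 and 1
```

Because any block can be addressed directly by its number, a database can update one small region without rewriting the rest of the volume, which is why block storage suits performance-sensitive workloads.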

File Storage (e.g., Amazon EFS)

File storage organizes data in a hierarchical structure of files and folders, providing a familiar interface to users. EFS allows thousands of EC2 instances to simultaneously mount the same file system across different Availability Zones within the same region. All instances share access to the same data, making it ideal for web serving applications, containerized applications (Kubernetes, Docker), and content management systems that require shared data access.

Network Storage

Network storage provides storage resources over a network and typically comes in two types:

NAS (Network-Attached Storage) provides file-level storage accessed over standard Ethernet networks, allowing multiple clients and servers to connect simultaneously and share files across the network.

SAN (Storage Area Network) provides block-level storage over specialized high-speed networks designed for storage traffic. Unlike NAS, a traditional SAN typically exposes each volume to only one host at a time.

Summary

This article covered the fundamentals of cloud storage services. While some services appear similar and can serve the same purpose, their pricing models and usage patterns are fundamentally different. These differences will serve as principles for setting a storage governance strategy, which we will explore in detail in the next article.

