Block, File, Object - Demystifying the Big Three of Storage Systems

May 13, 2024 — #interview #SDE

Hey data hoarders and system architects! In the digital world, data is king, and where and how we store that data is a decision of monumental importance. From the fleeting bits in your computer's lightning-fast RAM to the petabytes archived in the cloud, there's a whole hierarchy of storage. But when we talk about persistent storage for our applications and systems, things generally boil down to a few key architectural approaches.

You've likely heard terms like Block, File, and Object storage thrown around. While they all store data, they do so in fundamentally different ways, each tailored for specific needs regarding performance, accessibility, structure, cost, and scale. Today, let's explore these three broad categories of storage systems and understand when you might choose one over the others.

The Foundation: Why Different Storage Systems?

Before diving into the types, it's good to remember the general storage hierarchy: ultra-fast, small, expensive memory (like CPU registers and RAM) sits at the top, while slower, larger, and cheaper options (like SSDs, HDDs, and cloud object storage) form the lower tiers. This hierarchy exists because different applications and data types have wildly different requirements. The "big three" persistent storage types we're discussing today represent different ways to organize and access data at the more persistent end of this spectrum.

Diving into the Main Categories of Storage Systems

Storage systems generally fall into three broad categories.

1. Block Storage: The Raw Building Blocks

Think of block storage as the foundational layer, offering raw storage capacity to servers.

What it is: Block storage was the first to emerge, with its origins in the 1960s. It provides data storage in fixed-size chunks called "blocks," which are then organized into "volumes" presented to a server. Common examples include your computer's internal Hard Disk Drives (HDDs) or Solid-State Drives (SSDs) when they are physically attached to servers.
How it works: The server's operating system (or a specialized application like a database management system or a virtual machine engine) has direct control over these raw blocks and can format them to create a file system or manage them directly for performance. It's the most flexible and versatile form of storage.
Accessibility: While often physically attached, block storage can also be network-attached using protocols like iSCSI (Internet Small Computer System Interface) or Fibre Channel (FC). Even when networked, it still presents itself to the server as raw blocks. Typically, block storage is fully owned and accessed by a single server at a time; it's not inherently designed for simultaneous shared access by multiple servers in the same way file storage is.
Good for: Virtual machines (VMs), high-performance applications like databases that need direct disk access.

2. File Storage: Organized and Sharable

File storage is likely the type of storage you're most familiar with from everyday computer use.

What it is: File storage is built on top of underlying block storage. It provides a higher-level abstraction, organizing data into files and hierarchical directories (folders). It's the most common general-purpose storage solution.
How it works: Users and applications interact with data through file paths (e.g., /home/user/documents/report.docx). The file storage system manages the translation of these file operations into block-level operations.
Accessibility: A major advantage of file storage is its ability to be shared. It can be made accessible to a large number of servers (and users) simultaneously using common file-level network protocols such as SMB/CIFS (Server Message Block/Common Internet File System, often used in Windows environments) and NFS (Network File System, common in Unix/Linux environments). Servers accessing this type of storage don't need to deal with the complexities of managing raw blocks or formatting volumes.
Good for: General-purpose file system access, sharing files and folders within an organization.

3. Object Storage: Built for Scale and Durability

Object storage is a newer paradigm, especially popular in cloud environments, designed to handle massive amounts of data with a different set of trade-offs.

What it is: Object storage is new and makes a very deliberate tradeoff to sacrifice performance for high durability, vast scale, and low cost. It targets relatively "cold" (less frequently accessed) data and is mainly used for archival and backup.
How it works: It stores all data as "objects" in a flat address space – there's no hierarchical directory structure like file systems. Each object typically consists of the data itself (the payload), a variable amount of metadata (descriptive information about the object), and a globally unique identifier (the object ID).
Accessibility: Data access is normally provided via a RESTful HTTP-based API. This makes it highly accessible from anywhere on the internet (given proper permissions) and well-suited for web-native applications.
Good for: Binary data, unstructured data, backups, archives, data lakes, cloud-native applications, static website hosting.
Examples: Prominent public cloud examples include AWS S3, Google Cloud Storage, and Azure Blob Storage.

At a Glance: Block vs. File vs. Object Storage

To help summarize the key distinctions, here’s a comparison table:

Feature	Block Storage	File Storage	Object Storage
Mutable Content	Yes	Yes	No (object versioning is supported, in-place update is not)
Cost	High	Medium to high	Low
Performance	Medium to high, very high (especially with SSDs)	Medium to high	Low to medium (relatively slow compared to other types)
Consistency	Strong consistency	Strong consistency	Strong consistency (though some systems might offer eventual for certain ops)
Data Access	SAS/iSCSI/FC	Standard file access, CIFS/SMB, and NFS	RESTful API
Scalability	Medium scalability	High scalability	Vast scalability
Good For	Virtual machines (VM), high-performance applications like databases	General-purpose file system access	Binary data, unstructured data, backups, archives, data lakes

Key Takeaways

Block Storage offers raw, high-performance access ideal for databases and VMs, but is typically managed by a single server.
File Storage provides a familiar, hierarchical structure for organizing and sharing files across a network, great for collaborative environments.
Object Storage is built for massive scale, durability, and cost-effectiveness, accessed via APIs, making it perfect for cloud-native applications, backups, archives, and data lakes.

Choosing the right storage system is a crucial architectural decision. Understanding these fundamental categories, their characteristics, and how they compare will help you select the most appropriate foundation for your data's needs, whether you're optimizing for raw speed, easy sharing, or planet-scale data archiving.