Logo

dev-resources.site

for different kinds of informations.

Research Paper Series: Using Lightweight Formal Methods to Validate a Key-Value Storage Node in Amazon S3

Published at
1/13/2025
Categories
aws
cloud
softwaredevelopment
testing
Author
sf_1997
Author
7 person written this
sf_1997
open
Research Paper Series: Using Lightweight Formal Methods to Validate a Key-Value Storage Node in Amazon S3

The paper "Using Lightweight Formal Methods to Validate a Key-Value Storage Node in Amazon S3" presents a pragmatic approach to ensuring the correctness of ShardStore, a key-value storage node in Amazon S3.

A bit on ShardStore

ShardStore, a key-value storage node in Amazon S3, plays a critical role in efficiently storing and retrieving objects by organizing data into extents. Here’s how it works and interacts with extents:

Key Components and Workflow

  1. ShardStore Overview:

    • ShardStore serves as a layer in the S3 architecture, responsible for handling object storage within shards.
    • Data is partitioned into shards to ensure scalability, fault isolation, and efficient data management.
  2. Interaction with Extents:

    • Extents are fixed-size blocks of data, typically spanning multiple megabytes. They are the primary storage units managed by ShardStore.
    • Each extent contains:
      • Data for multiple objects or object fragments.
      • Metadata for efficiently locating and retrieving specific pieces of data.
  3. Key-Value Abstraction:

    • ShardStore exposes a key-value interface, where the key maps to a specific object or fragment, and the value references its location within extents.
    • This abstraction decouples the logical organization of data from its physical storage, allowing ShardStore to optimize for performance and durability.
  4. Write Workflow:

    • When new data is written to ShardStore:
      1. The data is assigned a key and appended to an available extent.
      2. Metadata is updated to reflect the key-to-extent mapping.
      3. The extent, now containing the new data, is persisted to disk and replicated for fault tolerance.
  5. Read Workflow:

    • To read data:
      1. ShardStore uses the key to locate the corresponding extent and offset.
      2. The relevant extent is loaded from disk (or memory if cached).
      3. Data is extracted and returned to the client.
  6. Crash Consistency and Concurrency:

    • ShardStore employs techniques such as journaling and atomic updates to ensure crash consistency during writes.
    • Concurrent read and write operations are managed using fine-grained locking and careful metadata updates to prevent conflicts.
  7. Extent Lifecycle Management:

    • Garbage Collection: As objects are deleted or overwritten, extents containing obsolete data are compacted to reclaim space.
    • Replication: Extents are replicated across multiple nodes for durability and availability.

LSM Tree chunk reclamation

  • Load Balancing: ShardStore dynamically moves extents between nodes to balance storage and compute load.

Benefits of Using Extents in ShardStore

  • Scalability: Extents enable efficient storage of millions of objects within a shard.
  • Performance: Sequential writes to extents reduce disk I/O overhead, improving throughput.
  • Fault Isolation: By partitioning data into shards and extents, failures in one area are less likely to impact the entire system.

This interaction between ShardStore and extents underpins the scalability, durability, and performance of Amazon S3, making it a key innovation in distributed storage systems.

Back to validations

Traditional formal verification methods can be resource-intensive and challenging to maintain, especially in large-scale, evolving systems. To address this, the authors advocate for "lightweight formal methods," emphasizing automation, usability, and continuous validation alongside ongoing software development.

A central aspect of their approach is the development of executable reference models that serve as specifications against which the implementation is validated. These models are written in the same programming language as the implementation (Rust), facilitating their integration into the development process and enabling engineers to maintain them as the system evolves.

The authors decompose correctness into independent properties, each verified using the most appropriate tool. For instance, property-based testing is employed to ensure that the implementation conforms to the reference model under various scenarios, including crash consistency and concurrent executions. This method has been effective in identifying subtle bugs that might have been missed through traditional testing methods.

By integrating these lightweight formal methods into the engineering workflow, the team has prevented 16 issues from reaching production, including complex crash consistency and concurrency problems. Notably, the approach has been adopted by non-formal-methods experts, with engineers contributing to the development and maintenance of the reference models, demonstrating the practicality and scalability of the method in a real-world, large-scale system like Amazon S3.

This work was recognized with a best-paper award at the ACM Symposium on Operating Systems Principles (SOSP) in 2021, highlighting its significance in the field of automated reasoning and formal methods.

For a more in-depth understanding, you can access the full paper here.

cloud Article's
30 articles in total
Favicon
What is Cloud Service Providers? Types, Benefits, & Examples
Favicon
DEPLOYING A WEB APPLICATION WITH ARM TEMPLATE AND AZURE CLI
Favicon
Top 7 Kubernetes Certifications in 2025
Favicon
Configuring Public IP addresses in Azure
Favicon
Kubernetes on Hybrid Cloud: Talos network
Favicon
Cloud computing can be confusing, but it doesn't have to be! ☁️🤔 In the latest episode of Cloud in List of Threes (CiLoTs), I’m serving up easy-to-digest (pun intended 🤭) explanations analogy to explain Regions, Availability Zones, and Edge Locations
Favicon
From Regions to Edge Locations: A CiLoTs Guide to Cloud Infrastructure
Favicon
Unlocking AI Potential: Simplifying Generative AI with AWS Bedrock
Favicon
Want to learn a top skill in 2025? Then check out Cloud Computing.
Favicon
🤖 DevOps-GPT: Automating SRE Resolutions with AI-Powered Agents and Insights 🤖
Favicon
Joining the AWS Community Builders Program A Journey of Growth and Collaboration
Favicon
Por onde começar os estudos da AWS - Parte 1
Favicon
Introduction to AWS services & It's carrier opportunities
Favicon
Research Paper Series: Using Lightweight Formal Methods to Validate a Key-Value Storage Node in Amazon S3
Favicon
Patching Scheduled Auto Scaling Groups with AWS
Favicon
S3 Batch Operations: Simplify Repetitive Tasks
Favicon
Firebase Alternatives to Consider in 2025
Favicon
Key Lift and Shift Migration Use Cases for Cloud Migration Success
Favicon
AWS, Azure, and GCP: Which provider is best for your needs?
Favicon
open source multi tenant cloud database
Favicon
Iniciando nuevamente...
Favicon
Oracle Cloud’s 25A Release Is Coming: Are You Prepared?
Favicon
Google Cloud Shell: Establishing Secure Connections via SSH
Favicon
Create spot instances on GCP & AWS
Favicon
Kubernetes on Hybrid Cloud: Bare-metal or Hypervisor
Favicon
🌥️ Evolução da Hospedagem em Nuvem
Favicon
Por que e como rodar bancos de dados em diferentes nuvens?
Favicon
What Makes a Good Cloud Architect?
Favicon
Attended Re:invent Recap at aws user group surat
Favicon
MDM Cloud Migration Architecture: The Future of Data Management

Featured ones: