Data Availability is Not Data Storage

Recall this grade school experience: you raise your hand and ask, “Can I go to the bathroom?” To which your teacher responds, “I don’t know. Can you?” It might seem far-fetched, but this is a perfect entry point to understanding the difference between data availability and data storage.

Let's bring this analogy closer to the subject at hand and say Google Drive is acting like your teacher. You upload a photo, and the next day, you want to show that photo to a friend. You ask Google, “Can you show me the photo I uploaded yesterday?”

Imagine if Google responded, “I mean, I have it available. I can show you that photo,” and then just sent you a face cutout from the photo as proof.

You would rightfully be a bit confused. You asked to download your photo, not proof from Google that they have your photo.

The thing is, that’s the core function data availability blockchains perform. All we ask is that they provide us with proof that the chain has the data available if we need it. We don’t actually want to download all of the data from them unless we have to.

Data availability chains like Avail allow users (other blockchains) to upload data, and at a later date, simply check that all their data is available without actually retrieving the contents of the data itself. 


This is a very different task from what decentralized storage networks like Arweave, IPFS, Filecoin, and Sia are asked to perform.

Where decentralized storage chains like Arweave allow end users to store and retrieve files on the Arweave blockchain, Avail is designed to allow other chains to store their chain's activity on the Avail blockchain.

Light clients benefit the most from using Avail. They actually have a goal of never downloading data at all if they don’t have to. The more data they need to download, the more resource intensive it is to be a light client. 


Avail can provide a mathematical proof that "the data you're looking for is still here if you need it."

While that explains the differences between storage and availability, the question remains: why would you want just a guarantee of availability at all? The answer is security.

Proof that the data is around, that the data is available, is enough for light clients to be certain that no one's hiding any suspicious activity. If it's available, it's by definition not hidden. Knowing it’s not hidden is all these light clients are looking for, because hidden data is what allows for "data withholding attacks."

What Are Data Withholding Attacks?

A data withholding attack is a scenario in which malicious validators vote to add a block containing invalid or missing transactions to a chain. While full nodes can immediately see that the block contains an error, light clients can be fooled because they look only at block headers, which are written in part by the validators.

One fix would be for light clients to download all of a block's data in order to verify correctness. But this would turn the light client into a full node, increasing resource requirements to participate in the network.

A better fix? Blockchains can upload their transaction data to Avail. Avail processes uploaded data using techniques like erasure coding and KZG commitments. Thanks to this processing, light clients are very likely to detect missing data by requesting just a few random kilobytes from each block.
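To see why erasure coding helps, here is a toy Python sketch. It is not Avail's scheme: it swaps in the simplest possible erasure code, a single XOR parity chunk, and the function names are made up for illustration. The idea it shows is the one that matters: once data is encoded with redundancy, a missing piece can be rebuilt from the pieces that remain, so an attacker has to withhold more than one piece to actually hide anything.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(chunks: List[bytes]) -> List[bytes]:
    """Return the data chunks plus one parity chunk (XOR of all data chunks).

    Toy single-parity code, used here only as a stand-in for real erasure coding.
    """
    if not chunks or len({len(c) for c in chunks}) != 1:
        raise ValueError("need at least one chunk, all of equal length")
    parity = reduce(xor_bytes, chunks)
    return chunks + [parity]


def recover(coded: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild the original data chunks when at most one coded chunk is missing (None)."""
    missing = [i for i, c in enumerate(coded) if c is None]
    if len(missing) > 1:
        raise ValueError("single parity can only repair one missing chunk")
    if missing:
        present = [c for c in coded if c is not None]
        # XOR of every surviving chunk equals the missing one.
        rebuilt = reduce(xor_bytes, present)
        coded = coded[:missing[0]] + [rebuilt] + coded[missing[0] + 1:]
    return coded[:-1]  # drop the parity chunk


data = [b"tx-1", b"tx-2", b"tx-3"]
coded = encode(data)
coded[1] = None  # one chunk is withheld
restored = recover(coded)
```

Real erasure codes tolerate far more than one missing chunk, which is what forces an attacker to withhold a large share of a block and makes the withholding easy to notice.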

The process of sampling those few random kilobytes can be thought of as light clients checking to make sure Avail is not lying when it says it has the data available. By sampling, they ask, “Do you have all the transaction data available if I were to need it?” If every sample comes back positive, the light client can be confident, with very high probability, that the rest of the data is there if needed.
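The arithmetic behind that confidence can be sketched in a few lines of Python. This is a rough model, not Avail's actual parameters: it assumes each sample is an independent random pick, and that erasure coding forces an attacker to withhold some fraction of the encoded block (the 0.5 used below is purely illustrative). Under those assumptions, the chance that every sample misses the hole shrinks exponentially with the number of samples. The function names are hypothetical.

```python
import random


def undetected_probability(withheld_fraction: float, samples: int) -> float:
    """Chance that all `samples` independent random picks land on available chunks."""
    if not 0.0 <= withheld_fraction <= 1.0:
        raise ValueError("withheld_fraction must be between 0 and 1")
    return (1.0 - withheld_fraction) ** samples


def samples_needed(withheld_fraction: float, confidence: float) -> int:
    """Smallest number of samples that detects withholding with at least `confidence`."""
    if not 0.0 < withheld_fraction <= 1.0 or not 0.0 < confidence < 1.0:
        raise ValueError("need 0 < withheld_fraction <= 1 and 0 < confidence < 1")
    k = 0
    while 1.0 - undetected_probability(withheld_fraction, k) < confidence:
        k += 1
    return k


def simulate_detection_rate(total_chunks: int, withheld_chunks: int,
                            samples: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo check: fraction of trials in which at least one sample hits a withheld chunk."""
    rng = random.Random(seed)
    withheld = set(rng.sample(range(total_chunks), withheld_chunks))
    detected = 0
    for _ in range(trials):
        if any(rng.randrange(total_chunks) in withheld for _ in range(samples)):
            detected += 1
    return detected / trials


# Illustrative numbers only: half of the encoded block is withheld.
needed = samples_needed(0.5, 0.999)
rate = simulate_detection_rate(total_chunks=256, withheld_chunks=128,
                               samples=needed, trials=2000)
```

With these illustrative numbers, `samples_needed(0.5, 0.999)` returns 10: ten small samples are enough to catch the withholding more than 99.9% of the time, which is why a light client never needs to download the whole block.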

This lets light clients reach guarantees of data availability all on their own without the need to trust validators, and without making themselves subject to data withholding attacks.

Contrast Avail’s use with decentralized storage. Users of storage services ask, "Hey, I want to see my photo," and they expect to have all of that data explicitly retrieved and returned.

All that is to say that Avail does not compete with decentralized storage providers like Arweave, IPFS, or Filecoin.

The Avail testnet is already live with updated versions on the way. As Polygon works toward the Avail mainnet, we’re interested in partnering with any teams looking to implement data availability solutions on their chains.

If you want to learn more about Avail, or just want to ask us a question directly, we would love to hear from you. Check out our repository, join our Discord server, or email us at [email protected]

Let’s bring the world to Ethereum!

