Update Mar 31: published on ethresear.ch.
Today we are publishing our sharded PIR design — a step toward making every read from Ethereum’s state private by default.
The Problem
When a wallet checks your balance, looks up a transaction, or reads a smart contract’s storage, it sends a query to a remote server. That server — whether an RPC provider, a light client gateway, or a full node — learns exactly which records you accessed. Over time, these access patterns paint a detailed picture: which accounts you control, which tokens you hold, which DeFi protocols you use, and when.
This metadata leakage enables frontrunning, MEV extraction, and the correlation of on-chain pseudonyms with real-world identities. Even with encrypted connections, the pattern of what you read reveals your intent.
Private Information Retrieval (PIR) can solve this: it allows clients to read data from a server without the server learning what was queried. The cryptographic guarantees are strong — the server processes an encrypted query and returns an encrypted response, learning nothing about the client’s interest. However, PIR schemes come with significant performance overhead: server computation scales with database size, communication costs can be high, and preprocessing may be required. Applying PIR naively to all of Ethereum’s state would be impractical.
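To make the cost model concrete, here is a toy sketch of the classic two-server XOR construction. It is not one of the single-server schemes discussed in this post and it assumes two servers that do not collude. It is shown only because it fits in a few lines and makes the linear server scan visible. All function names are illustrative.

```python
import secrets


def make_query_pair(db_size: int, index: int) -> tuple[list[int], list[int]]:
    """Split a request for `index` into two selection vectors, one per server."""
    # Each share on its own is a uniformly random bit vector.
    share_a = [secrets.randbelow(2) for _ in range(db_size)]
    share_b = share_a.copy()
    share_b[index] ^= 1  # the two shares differ only at the wanted position
    return share_a, share_b


def server_answer(db: list[bytes], selection: list[int]) -> bytes:
    """XOR together every selected record; the loop walks the whole database."""
    acc = bytes(len(db[0]))
    for record, bit in zip(db, selection):
        if bit:
            acc = bytes(x ^ y for x, y in zip(acc, record))
    return acc


def recover(answer_a: bytes, answer_b: bytes) -> bytes:
    """Records selected by both servers cancel out, leaving the wanted one."""
    return bytes(x ^ y for x, y in zip(answer_a, answer_b))


# Toy database of eight 4-byte records.
db = [i.to_bytes(4, "big") for i in range(8)]
q_a, q_b = make_query_pair(len(db), 5)
record = recover(server_answer(db, q_a), server_answer(db, q_b))
```

Neither server can tell from its own vector which position was requested, yet each one has to pass over every record to answer. That linear scan is the overhead that makes a single PIR instance over all of Ethereum's state impractical.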
Our Approach: Sharded PIR
Ethereum’s state is large (~100–300 GB depending on representation), heterogeneous (accounts, storage slots, contract code, receipts, logs), and constantly changing (new blocks every 12 seconds). No single PIR scheme handles all of these well.
Our design addresses this by sharding — segmenting Ethereum data into slices, each served by a PIR engine tuned for that slice’s size, update frequency, and access profile:
- Small, hot, latency-critical (1–10 GB): curated high-demand data — balances, recent transactions, popular contract storage. Updated every block.
- Small, moderate churn (10–20 GB): all account headers + contract bytecode.
- Medium, high churn (80–100 GB): account headers + internal trie nodes up to the state root, enabling client-side Merkle validation.
- Large, mixed churn (100–300 GB): full storage including internal nodes — verifiable ERC and DeFi positions.
- Massive, append-only (2–30 TB): historical state snapshots — immutable per block. Transaction history, analytics, accounting.
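The slices that carry internal trie nodes exist so that a client can check a retrieved value against a state root it already trusts. The sketch below shows the idea on a simplified binary Merkle tree with SHA-256. Ethereum itself uses a Merkle Patricia Trie with Keccak-256, and in the sharded design the sibling nodes would be fetched through PIR rather than handed over directly, so treat this as an illustration of the check only.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(leaves: list[bytes]) -> list[list[bytes]]:
    """Return every tree level, leaf hashes first and the root last.

    Assumes the number of leaves is a power of two.
    """
    level = [h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def prove(levels: list[list[bytes]], index: int) -> list[bytes]:
    """Collect the sibling hash at each level, from the leaf up to the root."""
    path = []
    for level in levels[:-1]:
        path.append(level[index ^ 1])
        index //= 2
    return path


def verify(leaf: bytes, index: int, path: list[bytes], root: bytes) -> bool:
    """Recompute the root from a leaf and its sibling path."""
    node = h(leaf)
    for sibling in path:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root


leaves = [f"account-{i}".encode() for i in range(8)]
levels = build_levels(leaves)
root = levels[-1][0]
path = prove(levels, 3)
ok = verify(leaves[3], 3, path, root)
tampered = verify(b"forged", 3, path, root)
```

A value that does not hash up to the trusted root is rejected, so the client does not have to trust the PIR server for correctness, only for availability.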
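Clients query all engines simultaneously. The key insight: Q simultaneous queries spread across N engines provide the same privacy as querying a single engine hosting the entire dataset. Every engine is contacted on every read and PIR hides what each query asks for, so the server cannot correlate queries across engines or tell which slice the client actually needs.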
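One way to picture the fan-out is a client that sends exactly one query to every engine on each read: the real index goes to the slice that holds the record, and a random dummy index goes to the others. The dummy-query mechanism, the slice names, and the record counts below are assumptions made for illustration and are not taken from the design note.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class Slice:
    name: str          # hypothetical label for a data slice
    num_records: int   # hypothetical record count, for illustration only


SLICES = [
    Slice("hot", 1_000),
    Slice("accounts_and_code", 10_000),
    Slice("accounts_with_trie", 50_000),
    Slice("full_storage", 100_000),
    Slice("history", 1_000_000),
]


def plan_queries(target_slice: str, target_index: int) -> dict[str, int]:
    """Pick one index per slice: the real one for the target, dummies elsewhere."""
    plan = {}
    for s in SLICES:
        if s.name == target_slice:
            plan[s.name] = target_index
        else:
            plan[s.name] = secrets.randbelow(s.num_records)
    return plan


def observable_pattern(plan: dict[str, int]) -> list[str]:
    """What remains visible once PIR hides every index: which engines were hit."""
    return sorted(plan)


plan_a = plan_queries("hot", 42)
plan_b = plan_queries("history", 7)
same_pattern = observable_pattern(plan_a) == observable_pattern(plan_b)
```

Under these assumptions a balance lookup and a historical lookup produce the same visible pattern, which is the sense in which the sharded layout leaks no more than a single engine would.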
Scheme-Agnostic by Design
The architecture abstracts over the underlying PIR scheme. We are actively building, evaluating, and integrating multiple constructions — each paired with a data slice based on its tradeoffs:
- insPIRe — a doubly-stateless scheme being GPU-benchmarked and tuned for serving Ethereum hot mutable state where both client and server must remain stateless to avoid session linkability.
- VIA — a lattice-based scheme with reusable primitives, being specified and implemented across three variants (VIA, VIA-B, VIA-CB) via a grant.
- OnionPIRv2 — an FHE-native single-server scheme with strong performance characteristics for medium-sized databases.
- Harmony / RMS24 — preprocessing-based schemes well-suited to immutable or slowly-changing data slices where one-time hint generation cost is amortized over many queries.
This scheme-agnostic interface means the system improves as the field advances — new PIR constructions can be swapped in without changing the client or the API surface.
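A minimal sketch of what such an interface could look like follows. The method names and classes are hypothetical, and the two engines are deliberately non-private stand-ins with different wire formats. They exist only to show that the client code path stays the same when the engine behind a slice is replaced.

```python
from typing import Protocol


class PirEngine(Protocol):
    """Hypothetical engine interface; names are illustrative only."""

    def build_query(self, index: int) -> bytes: ...
    def answer(self, query: bytes) -> bytes: ...
    def decode(self, response: bytes) -> bytes: ...


class PlainStubEngine:
    """Stand-in with NO privacy: the index travels in the clear."""

    def __init__(self, records: list[bytes]) -> None:
        self._records = records

    def build_query(self, index: int) -> bytes:
        return index.to_bytes(8, "big")

    def answer(self, query: bytes) -> bytes:
        return self._records[int.from_bytes(query, "big")]

    def decode(self, response: bytes) -> bytes:
        return response


class ReversedStubEngine(PlainStubEngine):
    """Second stand-in with a different wire format, mimicking a scheme swap."""

    def build_query(self, index: int) -> bytes:
        return index.to_bytes(8, "little")

    def answer(self, query: bytes) -> bytes:
        return self._records[int.from_bytes(query, "little")][::-1]

    def decode(self, response: bytes) -> bytes:
        return response[::-1]


class ShardedClient:
    """Routes a read to the engine registered for a slice."""

    def __init__(self, engines: dict[str, PirEngine]) -> None:
        self._engines = engines

    def read(self, slice_name: str, index: int) -> bytes:
        engine = self._engines[slice_name]
        return engine.decode(engine.answer(engine.build_query(index)))


records = [b"alpha", b"beta", b"gamma"]
client_v1 = ShardedClient({"hot": PlainStubEngine(records)})
client_v2 = ShardedClient({"hot": ReversedStubEngine(records)})
```

Both clients return the same record for `read("hot", 1)` even though the bytes on the wire differ, which is the property a scheme-agnostic API is meant to preserve.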
What’s Next: Q2/Q3
We are now moving from design to implementation. In the near term:
- Flesh out the architecture of the sharded design, i.e. the software engineering side
- Finish our in-house scheme and decide which two or more schemes to use in the first iteration of the sharded design
- Test the sharded design end to end with at least one integration: most likely an Ethereum SDK such as ethers.js or viem, plus one wallet that integrates with it, such as Kohaku. If it proves easier, we may start with a light client instead.
Read the full design note here.