A Sharded PIR Design
for the Ethereum State

A practical path to private reads of Ethereum data

Ali Atiia Lead — Privacy of Reads
Ethereum Foundation
privreads.ethereum.foundation
QR code linking to privreads.ethereum.foundation
Motivation

Ethereum users routinely query state data from remote servers. Both the content and the patterns of these queries leak privacy:

A curious or malicious server can profile a user just by tracking what they read — undermining other privacy measures the user has worked hard to follow (e.g. shielding assets).

Part 1 · Background  |  where reads happen

The edge reads, providers see 👀

balance · nonce · allowances · contract code · Merkle roots

They don't (or can't) hold the full state, so they query remote providers.

RPC provider
Sees: which address, which slot, which block, when, from where.
Part 1 · Background  |  what gets leaked

A read is a fingerprint

  • Holdings inferred from which balances and slots you poll
  • On-chain ↔ off-chain identity correlated by IP, timing, query mix
  • Frontrunning & MEV: a wallet polling a liquidation price is a tell
  • Behavioral profile assembled over time, even without any tx broadcast
Reads defeat the rest of the stack.
Shielding your transactions doesn’t help if the read pattern alone tells the same story.
Part 1 · Background  |  what gets queried

What does “the edge” actually read?

Hot state
“What is my balance now?”
eth_getBalance
eth_getStorageAt
eth_call
eth_getCode
Txs & blocks
“Did my tx land?”
eth_getBlockByNumber
eth_getTransactionByHash
eth_getTransactionReceipt
Historical state
“What was my balance on Dec 31?”
eth_getBalance(@blockN)
eth_getStorageAt(@blockN)
archival state
(“data warehouse” in traditional db lingo)

Answer: a bit of everything — any complete privacy story has to cover all three.

The Ethereum state

Part 1 · Background  |  the state

The Ethereum state, briefly

  • A key→value store of accounts; code and storage live one indirection deeper
  • Committed under the state root (a Merkle-Patricia trie) in every block header
  • Every read is verifiable via a Merkle proof up to the state root in the block header
Diagram: the block header commits to stateRoot, txRoot, and receiptRoot. The state holds accounts (nonce · balance · codeHash · storageRoot); contract code is reached via codeHash and contract storage via storageRoot. txRoot and receiptRoot commit to this block's transactions and receipts.

Reading a value can entail fetching a Merkle proof anchored to the state root in the block header — if the user wants to independently verify the value (and soon, the zkVM proof of that header).
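To make the verification step concrete, here is a minimal sketch of checking a Merkle path against a root. It assumes a simplified binary Merkle tree over SHA-256; mainnet's MPT is hexary and uses Keccak-256, so this illustrates the principle, not the production format.

```python
import hashlib

def h(data: bytes) -> bytes:
    # Stand-in hash; the actual MPT uses Keccak-256, not SHA-256.
    return hashlib.sha256(data).digest()

def verify_merkle_path(leaf: bytes, index: int, siblings: list[bytes], root: bytes) -> bool:
    """Recompute the root from a leaf and its sibling path (binary tree)."""
    node = h(leaf)
    for sib in siblings:
        if index % 2 == 0:
            node = h(node + sib)   # we are the left child
        else:
            node = h(sib + node)   # we are the right child
        index //= 2
    return node == root

# Build a tiny 4-leaf tree and check a proof for leaf index 2.
leaves = [b"a", b"b", b"c", b"d"]
l0, l1, l2, l3 = (h(x) for x in leaves)
n01, n23 = h(l0 + l1), h(l2 + l3)
root = h(n01 + n23)
assert verify_merkle_path(b"c", 2, [l3, n01], root)
```

The client needs only the leaf, its sibling hashes, and the header's root; a tampered value fails to recompute the root.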

Part 1 · Background  |  trie geometry

The MPT, and what it will eventually be upgraded to

MPT — today arity 16

Diagram: MPT node types — branch (16 children + 1 value slot), extension (e.g. "a8c1"), leaf (account: [nonce, balance, storageRoot, codeHash]); each contract's storage lives in a separate trie.
Disadvantages of MPT
  • Heavy proofs — 15 siblings × 32 B per level
    typical ~2.4–2.9 KB (depth 5–6) · worst-case ~3.8 KB (depth ~8)
  • Worst-case stateless proof per block ~300 MB
  • Storage slots require a second trie traversal
  • Keccak — slow in zkVMs

UBT — EIP-7864 (draft) arity 2

Diagram: a 31-byte stem (e.g. 0x4c1a…9b) with 256 leaves below it — header fields, code chunks, slot[0], slot[1], …
Advantages of UBT
  • Lean proofs — 1 sibling × 32 B per level → ~1 KB (~4× smaller)
  • Single trie — storage & code under one root
  • Snark-friendly hash (BLAKE3 or Poseidon; exact choice TBD) → 3–100× proving speedup
  • Page-based locality — meaningful gas savings under Verkle/UBT witness pricing for storage-heavy dApps
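The proof-size claims above follow from the trie geometry; a quick back-of-the-envelope check, assuming 32-byte hashes and roughly 32 levels for the binary trie (an assumption for illustration):

```python
HASH = 32  # bytes per sibling hash

# MPT: arity 16 -> up to 15 sibling hashes per level.
mpt_typical = 15 * HASH * 5    # depth 5  -> 2400 B ~ 2.4 KB
mpt_worst = 15 * HASH * 8      # depth ~8 -> 3840 B ~ 3.8 KB

# UBT: arity 2 -> 1 sibling per level, ~32 levels to cover the stem.
ubt = 1 * HASH * 32            # 1024 B ~ 1 KB

print(mpt_typical, mpt_worst, ubt)  # 2400 3840 1024
```

These match the slide's figures: ~2.4 KB typical / ~3.8 KB worst-case for MPT versus ~1 KB for UBT, about a 4× reduction.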

We are using UBT as the source DB for PIR, with a zk proof of its equivalence to mainnet MPT.

Part 1 · Background  |  unification

Many roots today, one tomorrow

Diagram — MPT: many trees, many roots. One state trie (accounts) under the state root; one storage trie per contract, each under its own storage root; contract bytecode in a flat key-value store addressed by codeHash. UBT: one tree, one root. Accounts, code, and storage all sit under stems in a single unified state tree (acct A · code A · stor A, acct B · code B · stor B, …).
What “tree of trees” costs
  • One state root + many storage roots + a flat code store. Three commitment regimes side by side.
  • A balance proof = path through state trie. A storage-slot proof = two paths.
What unification buys

Private Information Retrieval (PIR)

Part 1 · Background  |  PIR

PIR at a glance

Private Information Retrieval: read $\mathrm{DB}[x]$ without revealing $x$.

  • Server holds $\mathrm{DB}$; client wants $\mathrm{DB}[x]$
  • Client sends a query $q$ that encodes the index of $x$ under a cryptographic veil
  • Server computes a response $r$ — learning nothing about $x$
  • Client decodes $r$ to get $\mathrm{DB}[x]$
Diagram — client / wire / server: the client wants x = 42 and sends an encoded query q; the server computes over opaque input, r = f(q, DB), never decrypting q; the client decodes r to DB[42]. Privacy claim: the server's view of q leaks no information about x.
Part 1 · Background  |  a quick primer

A one-hot vector picks one row

  • A vector $q$ of length $N$
  • Zero everywhere except a single $1$ at index $x$
  • Inner-product against any vector $\mathrm{DB}$ —
  • … selects exactly $\mathrm{DB}[x]$, zeroing the rest.
$$q \cdot \mathrm{DB} \;=\; \sum_{i} q[i] \cdot \mathrm{DB}[i] \;=\; \mathrm{DB}[x]$$

(only the row at index $x$ survives; every other product is $0$)

We can send $q$ to the server while hiding the “$1$”
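The one-hot selection is a few lines of code; this sketch shows the plaintext version (no privacy yet, since the server would see the vector in the clear):

```python
def one_hot(n: int, x: int) -> list[int]:
    """Length-n vector: zero everywhere except a 1 at index x."""
    return [1 if i == x else 0 for i in range(n)]

DB = [17, 99, 42, 7, 3]
q = one_hot(len(DB), 2)

# Inner product selects exactly DB[2], zeroing the rest.
result = sum(qi * di for qi, di in zip(q, DB))
assert result == DB[2]  # = 42
```

Everything that follows is about sending $q$ without revealing where the 1 sits: encrypt each entry (single-server) or secret-share the vector (two-server).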

Part 1 · Background  |  single-server

Single-server PIR: encrypt the one-hot

  • Client encrypts each entry of $q$ under SHE/FHE.
  • Server computes $\sum_i q[i] \cdot \mathrm{DB}[i]$ homomorphically — an inner product on ciphertexts.
  • Client decrypts the response and recovers $\mathrm{DB}[x]$.
$$\sum_{i} \mathtt{enc}(q[i]) \cdot \mathrm{DB}[i] \;=\; \mathtt{enc}(\mathrm{DB}[x])$$

(the server multiplies each ciphertext $\mathtt{enc}(q[i])$ by the plaintext $\mathrm{DB}[i]$ and sums homomorphically)

Privacy here is computational — it relies on the semantic security of the encryption.

The catch: the server must touch every record to compute that sum — cost is $O(N)$ per query. Skipping any record would leak that the client didn’t want it.

Part 1 · Background  |  two-server PIR

Two non-colluding servers

PIR with XOR’ing

  1. Client wants index $x$; encodes it as a one-hot vector $q$ over $[N]$.
  2. Splits $q$ into two random XOR shares: $q = q_1 \oplus q_2$. Each share alone is uniform random.
  3. Sends $q_1$ to $S_1$, $q_2$ to $S_2$. Each server XORs the $\mathrm{DB}$ entries selected by its share.
  4. Client XORs the responses: $r_1 \oplus r_2 = \mathrm{DB}[x]$.
Diagram: the client (wanting DB[x]) sends q1 to S1 and q2 to S2; each server computes rⱼ = ⊕ᵢ qⱼ[i]·DB[i] and sees only random bits; the client XORs r1 ⊕ r2 = DB[x]. The servers must not collude.
Worked example, N = 12, x = 7: q1 and q2 are uniformly random bit vectors whose XOR is the one-hot vector selecting index 7. Each share alone is uniform random — S1 sees random bits, S2 sees random bits; only the XOR of both reveals x.

Privacy here is information-theoretic — it follows from how the query is split, not from any encryption.

Currently we are focused exclusively on single-server schemes.

The Performance Tradeoffs of PIR Schemes

Part 1 · Background  |  the landscape

The design space

Decision tree (reconstructed from the diagram):
  • Can the client store and persist data?
    – No → CLIENT-STATELESS. Acceptable that the server holds per-client state?
      · Yes → Server-stateful (Spiral · VIA-C · OnionPIRv2): per-client keys cached on the server; queries linkable. O(N).
      · No → Double-stateless (YPIR · VIA · InsPIRe): server preprocesses; no per-client state. ★ fully stateless. O(N), smaller constants.
    – Yes → CLIENT-STATEFUL. Acceptable to require synchronous client↔server comm during preprocessing?
      · No → Download-hint (SimplePIR · DoublePIR): server → client hints, downloaded once; refresh on update. O(N), smaller constants.
      · Yes → Interactive-hint (RMS24 · Plinko · HarmonyPIR): client streams the DB during setup. ★ sublinear online.
Part 1 · Background  |  the cost wall

The $O(N)$ wall, and how preprocessing breaks it

SERVER PAYS (cheap clients · expensive nodes) ←→ CLIENT PAYS (heavy clients · sublinear server)
  • Server-stateful (Spiral · VIA-C · OnionPIRv2) — O(N) server; no client cost
  • Double-stateless (YPIR · VIA · InsPIRe) — O(N), smaller constants; server preprocesses offline
  • Download-hint (SimplePIR · DoublePIR) — O(N) server + √N client storage; ↻ refresh on update
  • Interactive-hint ★ (RMS24 · Plinko · HarmonyPIR) — sublinear server; client streams the DB once — the only sublinear cell
Preprocessing moves cost rightward: further right = cheaper server, but more client work.
family | server compute | client preprocessing | client storage (hints) | communication (Q + R)
Server-stateful (Spiral · VIA-C · OnionPIRv2) | O(N) — touches every record | — | log N | ~30 KB
Double-stateless (YPIR · VIA · InsPIRe) | O(N), smaller k · offline preproc | — | log N – √N | ~KB – MB
Download-hint (SimplePIR · DoublePIR) | O(N), small const | download √N hints | √N | online ~240 KB→1 MB, grows w/ DB
Interactive-hint (RMS24 · Plinko · HarmonyPIR) | √N | stream entire DB — once | √N hints | ~12 KB *
Different cost shapes — and online comm itself differs by an order of magnitude. Download-hint pays √N per query online; * interactive-hint is small after a one-time DB-stream setup. Concrete, for a 1 GB DB: OnionPIRv2 ~30 KB · InsPIRe ~400 KB · SimplePIR 240 KB (→ ~960 KB at 16 GB) · Plinko ~12 KB. Source: PIR Tutorial & Survey, Table II.
QR code linking to Vitalik Buterin's Plinko PIR tutorial
Part 1 · Background  |  the bottom line

No single PIR scheme dominates

Axes: online server cost · client storage · per-client server state · update cost / freshness · online comm. Pain points (!) and hard blockers (×) per family:
  • Server-stateful (Spiral · VIA-C · OnionPIRv2): ! ×
  • Double-stateless (YPIR · VIA · InsPIRe): !
  • Download-hint (SimplePIR · DoublePIR): ! ! ×
  • Interactive-hint (RMS24 · Plinko · HarmonyPIR): ! ! !

Sharding

Part 2 · Approach  |  slicing the state

Sharding the Ethereum State

Size ↓ as update frequency ↑ (and vice versa). The slices:
  • 1–10 GB — Curated: ETH & ERC* balances, transfer & shielding events, recent block's txs & receipts, storage of popular DeFi contracts, select historical storage/events of popular contracts
  • 10–20 GB — accounts & contract code
  • 60–100 GB — accounts with Merkle proofs
  • 100–300 GB — accounts w/ proofs, storage tries
  • 2–20 TB — full archival state
Each slice is scored on four axes: Mutability (M) · Churn (C) · Latency sensitivity (L) · Frequency of access (F).
Part 2 · Approach  |  slice ↔ scheme

Sharding the Ethereum State

Size ↓ as update frequency ↑ (and vice versa). Each slice is matched to the PIR scheme that fits its profile:
  • 1–10 GB — Curated
  • 10–20 GB — accounts & contract code
  • 60–100 GB — accounts with Merkle proofs
  • 100–300 GB — accounts w/ proofs, storage tries
  • 2–20 TB — full archival state
Part 2 · Approach  |  a problem

Knowing which slice → leakage

Which slice is being queried leaks privacy: the server knows the content of each slice and can build correlations over time.

  • 1–10 GB → PIR engine 1
  • 10–20 GB → PIR engine 2
  • 60–100 GB → PIR engine 3
  • 100–300 GB → PIR engine 4
  • 2–20 TB → PIR engine 5
Part 2 · Approach  |  the core trick

Genuine + decoy queries = full privacy

What the client knows vs. what the network observer sees.

Client’s view — ground truth

labelled · asymmetric · 1 real + $k-1$ decoys
Diagram: the client (x in slice 3) sends the real query q to PIR Engine #3 and decoys to PIR Engines #1, #2, #4.

Observer’s view — on the wire

all rays identical · no labels · uniformly distributed
Diagram: the observer sees four identical, unlabelled queries (q?) from the client to PIR Engines #1–#4; x is unrecoverable.

Observer’s view = monolithic PIR over the whole state. Performance = per-slice optimized.

privacy Decoys are real PIR queries — not blanks. They cost $k\times$ wire, but each shard sees a uniform access just as before.
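A minimal sketch of the decoy-dispatch rule: one genuine query for the real slice, one decoy per other slice. `pir_encode` is a hypothetical stand-in for whatever single-server PIR encoding the slice's engine uses; the only requirement is that real and decoy queries are indistinguishable on the wire.

```python
import secrets

def pir_encode(key: bytes) -> bytes:
    # Placeholder for a real PIR query encoding (e.g. an FHE-encrypted
    # one-hot vector). Here: an opaque stand-in token.
    return secrets.token_bytes(32)

def build_queries(real_slice: int, real_key: bytes, num_slices: int) -> dict[int, bytes]:
    """One genuine query for the real slice, one decoy (random key) per other slice."""
    return {
        s: pir_encode(real_key if s == real_slice else secrets.token_bytes(32))
        for s in range(num_slices)
    }

qs = build_queries(real_slice=2, real_key=b"\x00" * 32, num_slices=5)
assert len(qs) == 5  # every engine receives exactly one query
```

Because every engine receives a well-formed PIR query every round, each shard's view matches the monolithic case.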
Part 2 · Approach  |  a side-channel

The wire is safe, but what about the clock?

Decoys hide which slice — per-slice response times can give it away.

The failure mode

  • Response time is different across PIR engines.
  • If the client acts as soon as it receives the response of the real query, the observer correlates time → recovers which slice.

Mitigations

  • Wait-for-all: client doesn’t act until every shard has responded.
  • $m$-of-$k$: tolerate slowest few; discard their slices this round.
  • Constant background traffic: client always queries every slice.
Timeline: the four engines respond at different times after t=0. A naive client that acts at the first (real) response reveals slice = PIR Engine #1; with wait-for-all, all slices are indistinguishable.
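The wait-for-all mitigation can be sketched in a few lines with `asyncio`: fire all queries concurrently, but release nothing to the caller until every shard has answered. The engine latencies here are simulated.

```python
import asyncio
import random

async def query_engine(i: int) -> tuple[int, bytes]:
    # Simulated per-engine latency; real engines differ by DB size and scheme.
    await asyncio.sleep(random.uniform(0.001, 0.005))
    return i, b"response"

async def private_read(num_engines: int) -> dict[int, bytes]:
    # Wait-for-all: do not act until EVERY engine has responded,
    # so response timing reveals nothing about which query was real.
    tasks = [query_engine(i) for i in range(num_engines)]
    responses = dict(await asyncio.gather(*tasks))
    return responses  # only now does the client use the real response

responses = asyncio.run(private_read(4))
assert len(responses) == 4
```

The $m$-of-$k$ variant replaces `gather` with a wait on the first $m$ completions and drops the straggler slices for that round.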

Middleware

Part 2 · Approach  |  the middleware

A universal PIR interface

The edge keeps speaking standard Ethereum RPC. The PIR backend evolves independently.

  • Edge unchanged. Wallets, dApps, light clients keep speaking eth_getBalance, eth_getProof, …
  • Adapter in the SDK (ethers/viem) translates RPC into a GraphQL-shaped PIR query plan.
  • Query router constructs k queries (1 real + decoys), dispatches per-slice.
  • Decoupling: PIR families and slice boundaries evolve under the interface.
Diagram: EDGE (wallets · dApps · frontends · SDKs) speaks Ethereum RPC to the ADAPTER (in ethers / viem / SDKs), which maps onto a GraphQL schema; the query router behind the universal PIR interface dispatches to the PIR GATEWAY, which implements the schema and routes to PIR Engines #1–#4.
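A hypothetical sketch of the adapter's shape: translating one RPC call into a per-slice PIR lookup plan. The method-to-slice mapping and the `PirPlan` type are illustrative, not a finalized schema.

```python
from dataclasses import dataclass

@dataclass
class PirPlan:
    slice_id: int   # which slice holds the answer
    key: bytes      # PIR lookup key within that slice

def rpc_to_plan(method: str, params: list) -> PirPlan:
    """Map a standard Ethereum RPC call onto a slice + key (illustrative mapping)."""
    if method == "eth_getBalance":
        address = bytes.fromhex(params[0].removeprefix("0x"))
        return PirPlan(slice_id=1, key=address)           # accounts slice
    if method == "eth_getStorageAt":
        address = bytes.fromhex(params[0].removeprefix("0x"))
        slot = int(params[1], 16).to_bytes(32, "big")
        return PirPlan(slice_id=3, key=address + slot)    # storage slice
    raise NotImplementedError(method)

plan = rpc_to_plan("eth_getBalance", ["0x" + "ab" * 20, "latest"])
assert plan.slice_id == 1 and len(plan.key) == 20
```

The router then wraps this plan in the genuine query and adds the k−1 decoys before dispatch; the edge never sees anything but standard RPC.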
Part 2 · Approach  |  recap

Summary of Sharded Design

Diagram: EDGE (wallets · dApps · frontends · light clients) → RPC → MIDDLEWARE (adapter RPC → GraphQL; query router issuing 1 real + k−1 decoys) → k parallel queries to Engine A (Server-stateful, 1–10 GB), Engine B (Double-stateless, 60–100 GB), Engine B′ (DS + sidecar, 100–300 GB), Engine C (Interactive-hint, 2–20 TB). Privacy: ≡ monolithic PIR. Performance: per-slice optimized.

Optimizations

Part 3 · Optimizations  |  three levers

Three levers to scale sharded PIR

Lever 1

Shrink the DB

Reduce what each PIR engine has to scan per query.

  •  PIR for Merkle proofs — serve proofs, not nodes
  •  SNARKify archival roots — serve proofs of stem nodes instead of Merkle proofs
Lever 2

Slice smarter

Carve slices along their natural seams — especially mutability.

  •  Sidecar pattern — isolate the mutable tail
  •  Verifiable UBT equivalence — capitalize on the uniformity of binary tries
Lever 3

Build better schemes

Push the per-slice scheme harder — especially toward GPU-native designs.

  •  Double-stateless GPU-tailored R&D
  •  Delegated hints · DEPIR to the rescue?
Part 3 · Optimizations  |  lever 1b — shrink the DB

SNARKifying away Merkle roots → massive DB-size reduction

Before vs. after: compressing the upper trie levels into one SNARK proof eliminates the need to store upper-trie roots; lower-trie nodes and leaves remain.

What gets dropped: upper trie levels — most-mutated, biggest by volume. Lower nodes & leaves stay; PIR continues unchanged.

Why it pays: ~50–80% archival DB-size reduction (§8 of the ethresear.ch post). Smaller DB → cheaper PIR per query.

Part 3 · Optimizations  |  lever 2 — slice smarter

Isolate mutability — the sidecar pattern

Diagram: the client (1) queries both engines in parallel and (2) if the key is in the sidecar, uses the sidecar response, else the main response. Main PIR engine: DB snapshot at t = T0 — large, preprocessed once, lazily re-preprocessed every E blocks. Sidecar PIR engine: entries updated since T0 — small, updated per block, reset after each fold into the main snapshot.

Big snapshot stays cold; fresh writes live in a small, fast engine; client always sees the freshest answer.

Part 3 · Optimizations  |  lever 2 — slice smarter

Verifiable UBT ↔ MPT equivalence

Diagram: at block N, the header's MPT root (canonical, mainnet) and the UBT root (the PIR backend's view) are linked by a zk equivalence proof: "both roots commit to the same state."

∀ (key, value): (key, value) ∈ MPT  ⇔  (key, value) ∈ UBT

Why this anchors everything

  • PIR backend can use UBT before it lands on mainnet.
  • SNARKification (slide 30) operates on the UBT side.
  • One verification surface for all PIR engines — the UBT root.

Cost shape

  • One proof per block, generated server-side; clients verify it once.
  • SNARK-friendly UBT hash (BLAKE3 / Poseidon—TBD) keeps prover work tractable.
In progress: initial UBT-enabled Geth / Ethrex nodes are in testing right now.
Part 3 · Optimizations  |  lever 3 — better schemes

Researching double-stateless GPU schemes

Active in-house research direction: PIR schemes where both client and server are stateless — and the server kernel is shaped to be GPU-native from the start.

Why double-stateless?

  • No per-client server state — server scales horizontally without client tracking.
  • No client hint to keep fresh — client carries only its keys.
  • Clean trust model — every client looks the same to the server.
  • Updates are clean — no broadcast hint to invalidate, no per-client preproc to redo.

Family includes e.g. HintlessPIR, YPIR, VIA, InsPIRe. Online server work is still $O(N)$ — but with much smaller constants than FHE-PIR.

Why GPU-tailored?

  • Server work is dense linear algebra — matrix–vector multiply over the DB.
  • Batch friendly — many concurrent queries amortize one DB pass.
  • Small modular params fit in GPU L2 cache — bandwidth-bound, not compute-bound.
  • Stateless server means many GPUs can serve the same DB without coordination.

Designing the scheme around the GPU memory hierarchy — not just porting an existing scheme to CUDA — is what unlocks order-of-magnitude gains. (See Part 4 for current GPU PIR numbers.)

research Open target: match or beat GPIR throughput on Ethereum-shaped DBs while staying double-stateless. Construction details forthcoming.
Part 4 · Progress

Progress, ongoing work, references

Progress

  • Correctness-focused specs of Plinko, RMS24, VIA schemes.
  • Speccing Plinko surfaced the invertible PRF as an intractable bottleneck before months of implementation work were spent.
  • GPU acceleration of insPIRe scheme.
  • Reproducing benchmarks of existing schemes.

Ongoing work

  • New double-stateless GPU-tailored schemes — can they hit GPIR throughput while keeping the cleaner trust model?
  • Universal PIR middleware spec & wallet integration.
  • SNARKification cost analysis (binary trie).

References

QR: ethresear.ch Sharded PIR post QR: Private Reads PIR benchmarks QR: privreads.ethereum.foundation/code (PIR specs) QR: PIR-Eng-Notes GitHub

Thank you
