A practical path to private reads of Ethereum data
A curious or malicious server can profile a user just by tracking what they read, undermining other privacy measures the user has worked hard to follow (e.g. shielding assets).
Most users don't or can't hold the full state, so they query remote providers.
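To make the leak concrete, here is a minimal sketch of what a remote provider receives for two point-in-time reads. The method names and parameter order follow the standard Ethereum JSON-RPC API; the address, block number, and storage slot are made-up values for illustration, and nothing is actually sent over the network.

```python
import json

def rpc_request(method, params, request_id=1):
    """Build a JSON-RPC 2.0 request body as a plain dict."""
    return {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}

# Hypothetical values, for illustration only.
ADDRESS = "0x" + "ab" * 20      # 20-byte account address
BLOCK_N = hex(19_000_000)       # block number as a hex quantity
SLOT = hex(0)                   # storage slot 0

balance_req = rpc_request("eth_getBalance", [ADDRESS, BLOCK_N])
storage_req = rpc_request("eth_getStorageAt", [ADDRESS, SLOT, BLOCK_N], request_id=2)

# Everything printed here is visible to the provider in the clear:
# which account, which slot, and at which block.
print(json.dumps(balance_req))
print(json.dumps(storage_req))
```

The address and slot sit in plain view in `params`, which is exactly the access pattern a private-read scheme needs to hide.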
eth_getBalance(@blockN)
eth_getStorageAt(@blockN)
Answer: a bit of everything; any complete privacy story has to cover all three.
If the user wants to verify a value independently, reading it also entails fetching a Merkle proof anchored to the state root in the block header (and soon, the zkVM proof of that header).
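The idea of checking a value against a root can be sketched with a deliberately simplified tree. This is an assumption-laden toy: it uses a plain binary Merkle tree over SHA-256 with a power-of-two number of leaves, not Ethereum's Merkle Patricia Trie (which uses Keccak-256 and RLP-encoded nodes). All function names are hypothetical.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_levels(leaves):
    """Return all tree levels: leaf hashes first, the single root hash last."""
    level = [h(x) for x in leaves]
    levels = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def prove(levels, index):
    """Collect the sibling hash at every level below the root."""
    proof = []
    for level in levels[:-1]:
        proof.append(level[index ^ 1])  # sibling is the neighbour in the pair
        index //= 2
    return proof

def verify(root, leaf, index, proof):
    """Recompute the path from the leaf up and compare with the trusted root."""
    node = h(leaf)
    for sibling in proof:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root

leaves = [b"acct0", b"acct1", b"acct2", b"acct3"]
levels = build_levels(leaves)
root = levels[-1][0]
proof = prove(levels, 2)
print(verify(root, b"acct2", 2, proof))
```

The client only needs the root (from a header it trusts) plus the sibling hashes; the proof path itself, however, also reveals which leaf was read unless it is fetched privately.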
We are using UBT as the source DB for PIR, with a zk proof of its equivalence to mainnet MPT.
Private Information Retrieval: read $\mathrm{DB}[x]$ without revealing $x$.
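We can send the one-hot query vector $q$ (with $q_x = 1$ and zeros elsewhere) to the server in encrypted form, hiding where the “$1$” is; the server returns the encrypted sum $\sum_i q_i \cdot \mathrm{DB}[i] = \mathrm{DB}[x]$.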
Privacy here is computational — it relies on the semantic security of the encryption.
The catch: the server must touch every record to compute that sum — cost is $O(N)$ per query. Skipping any record would leak that the client didn’t want it.
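The encrypted one-hot query can be sketched with a toy additively homomorphic scheme. This is a sketch under stated assumptions: the source does not name the encryption, so Paillier is swapped in here purely for illustration, with tiny hard-coded primes that offer no security. Records must be integers smaller than `N`.

```python
import math
import secrets

# Toy Paillier parameters: tiny primes, insecure, illustration only.
P, Q = 1009, 1013
N = P * Q
N2 = N * N
PHI = (P - 1) * (Q - 1)
MU = pow(PHI, -1, N)  # modular inverse (Python 3.8+)

def encrypt(m):
    """Enc(m) = (1 + m*N) * r^N mod N^2, with random r coprime to N."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return ((1 + m * N) * pow(r, N, N2)) % N2

def decrypt(c):
    return ((pow(c, PHI, N2) - 1) // N) * MU % N

def client_query(x, size):
    """Encrypted one-hot vector: Enc(1) at position x, Enc(0) elsewhere."""
    return [encrypt(1 if i == x else 0) for i in range(size)]

def server_answer(db, query):
    """Homomorphically computes Enc(sum_i q_i * DB[i]); touches every record."""
    acc = 1
    for record, c in zip(db, query):
        acc = (acc * pow(c, record, N2)) % N2
    return acc

db = [17, 42, 7, 99, 1234, 5]
x = 4
answer = server_answer(db, client_query(x, len(db)))
print(decrypt(answer) == db[x])
```

The loop in `server_answer` runs over all records regardless of `x`, which is the $O(N)$ cost described above.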
PIR with XOR
Privacy here is information-theoretic — it follows from how the query is split, not from any encryption.
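A minimal sketch of the XOR approach, assuming the classic two-server construction with servers that do not collude (the slide does not spell out the exact variant). Each server sees a uniformly random bit vector, so neither learns the index on its own. Function names are hypothetical.

```python
import secrets

def client_queries(x, size):
    """Split the query: s1 is uniformly random, s2 differs from s1 only at x."""
    s1 = [secrets.randbelow(2) for _ in range(size)]
    s2 = list(s1)
    s2[x] ^= 1
    return s1, s2

def server_answer(db, selection):
    """XOR together every record whose selection bit is set."""
    acc = 0
    for record, bit in zip(db, selection):
        if bit:
            acc ^= record
    return acc

db = [0x11, 0x22, 0x33, 0x44, 0x55]
x = 3
q1, q2 = client_queries(x, len(db))
# Records selected by both servers cancel out; only DB[x] survives.
value = server_answer(db, q1) ^ server_answer(db, q2)
print(hex(value))
```

No encryption appears anywhere: the guarantee comes entirely from the split, and it disappears if the two servers compare notes.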
Currently we are focused exclusively on single-server schemes.
| family | examples | online server cost | client storage | per-client server state | update cost / freshness | online comm |
|---|---|---|---|---|---|---|
| Server-stateful | Spiral · VIA-C · OnionPIRv2 | ! | ✓ | × | ✓ | ✓ |
| Double-stateless | YPIR · VIA · InsPIRe | ! | ✓ | ✓ | ✓ | ✓ |
| Download-hint | SimplePIR · DoublePIR | ! | ! | ✓ | × | ✓ |
| Interactive-hint | RMS24 · Plinko · HarmonyPIR | ✓ | ! | ! | ! | ✓ |
Which slice is being queried is itself sensitive: the server knows the content of each slice and can correlate queries over time.
What the client knows vs. what the network observer sees.
Observer’s view = monolithic PIR over the whole state. Performance = per-slice optimized.
Decoys hide which slice — per-slice response times can give it away.
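One way to picture decoys, as a rough sketch: the client sends one request to every slice on every read, with a real index for the slice it cares about and a random index for the rest. The slice names and sizes are invented for illustration, and the indices are shown in the clear only for readability; in practice each would be wrapped in a PIR query so the server could not tell real from decoy.

```python
import secrets

# Hypothetical slices: name -> number of records.
SLICES = {"accounts": 8, "storage": 16, "code": 4}

def build_requests(target_slice, target_index):
    """One request per slice: the real index for the target, a decoy elsewhere."""
    requests = {}
    for name, size in SLICES.items():
        if name == target_slice:
            requests[name] = target_index
        else:
            requests[name] = secrets.randbelow(size)  # decoy index
    return requests

def observer_view(requests):
    """What a network observer learns: only the set of slices contacted."""
    return sorted(requests)

print(observer_view(build_requests("accounts", 3)))
print(observer_view(build_requests("code", 1)))
```

Both calls contact the same set of slices, so the set alone reveals nothing. This sketch does not address timing: if the client acts as soon as the real slice responds, response times could still single it out.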
The edge keeps speaking standard Ethereum RPC. The PIR backend evolves independently.
eth_getBalance, eth_getProof, …
Reduce what each PIR engine has to scan per query.
Carve slices along their natural seams — especially mutability.
Push the per-slice scheme harder — especially toward GPU-native designs.
What gets dropped: upper trie levels — most-mutated, biggest by volume. Lower nodes & leaves stay; PIR continues unchanged.
Why it pays: ~50–80% archival DB-size reduction (§8 of the ethresear.ch post). Smaller DB → cheaper PIR per query.
Big snapshot stays cold; fresh writes live in a small, fast engine; client always sees the freshest answer.
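The snapshot-plus-fresh-writes read path might look like the sketch below. Plain dictionaries stand in for the two PIR engines, and all names are hypothetical. One assumption worth flagging: the sketch queries both engines on every read, so that the access pattern does not depend on whether the key was recently written.

```python
def read(key, snapshot, delta):
    """Query both tiers unconditionally; the fresh tier wins when it has the key."""
    old = snapshot.get(key)  # stands in for a PIR query against the cold snapshot
    new = delta.get(key)     # stands in for a PIR query against the small fresh engine
    return new if new is not None else old

# Hypothetical data: the snapshot is large and rarely rebuilt,
# the delta holds only writes made since the snapshot was taken.
snapshot = {"alice": 100, "bob": 50, "carol": 7}
delta = {"bob": 65, "dave": 1}

print(read("alice", snapshot, delta))  # only in the snapshot
print(read("bob", snapshot, delta))    # overwritten since the snapshot
print(read("dave", snapshot, delta))   # created since the snapshot
```

A real design would also need a way to represent deletions in the fresh tier (a tombstone value, for instance), which this sketch leaves out.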
∀ (key, value): (key, value) ∈ MPT ⇔ (key, value) ∈ UBT
Active in-house research direction: PIR schemes where both client and server are stateless — and the server kernel is shaped to be GPU-native from the start.
Family includes e.g. HintlessPIR, YPIR, VIA, InsPIRe. Online server work is still $O(N)$ — but with much smaller constants than FHE-PIR.
Designing the scheme around the GPU memory hierarchy — not just porting an existing scheme to CUDA — is what unlocks order-of-magnitude gains. (See Part 4 for current GPU PIR numbers.)