# Hard Problems in Cryptocurrency: Five Years Later

*Special thanks to Justin Drake and Jinglan Wang for feedback*

In 2014, I made a post and a presentation with a list of hard problems in math, computer science and economics that I thought were important for the Cryptocurrency space (as I then called it) to be able to reach maturity. In the last five years, much has changed. But exactly how much progress on what we thought then was important has been achieved? Where have we succeeded, where have we failed, and where have we changed our minds about what is important? In this post, I’ll go through the 16 problems from 2014 one by one, and see just where we are today on each one. At the end, I’ll include my new picks for hard problems of 2019.

The problems are broken down into three categories: (i) cryptographic, and hence expected to be solvable with purely mathematical techniques if they are to be solvable at all, (ii) consensus theory, largely improvements to proof of work and proof of stake, and (iii) economic, and hence having to do with creating structures involving incentives given to different participants, and often involving the application layer more than the protocol layer. We see significant progress in all categories, though some more than others.

## Cryptographic problems

- Blockchain Scalability
One of the largest problems facing the Cryptocurrency space today is the issue of scalability … The main concern with [oversized blockchains] is trust: if there are only a few entities capable of running full nodes, then those entities can conspire and agree to give themselves a large number of additional bitcoins, and there would be no way for other users to see for themselves that a block is invalid without processing an entire block themselves.

Problem:create a Blockchain design that maintains Bitcoin-like security guarantees, but where the maximum size of the most powerful node that needs to exist for the network to keep functioning is substantially sublinear in the number of transactions.

Status: **Great theoretical progress, pending more real-world evaluation**.

Scalability is one technical problem that we have had a huge amount of progress on theoretically. Five years ago, almost no one was thinking about sharding; now, sharding designs are commonplace. Aside from ethereum 2.0, we have OmniLedger, LazyLedger, Zilliqa and research papers seemingly coming out every month. In my own view, further progress at this point is incremental. Fundamentally, we already have a number of techniques that allow groups of validators to securely come to consensus on much more data than an individual validator can process, as well as techniques allow clients to indirectly verify the full validity and availability of blocks even under 51% attack conditions.

These are probably the most important technologies:

**Random sampling**, allowing a small randomly selected committee to statistically stand in for the full validator set: https://github.com/ethereum/wiki/wiki/Sharding-FAQ#how-can-we-solve-the-single-shard-takeover-attack-in-an-uncoordinated-majority-model**Fraud proofs**, allowing individual nodes that learn of an error to broadcast its presence to everyone else: https://bitcoin.stackexchange.com/questions/49647/what-is-a-fraud-proof**Proofs of custody**, allowing validators to probabilistically prove that they individually downloaded and verified some piece of data: https://ethresear.ch/t/1-bit-aggregation-friendly-custody-bonds/2236**Data availability proofs**, allowing clients to detect when the bodies of blocks that they have headers for are unavailable: https://arxiv.org/abs/1809.09044. See also the newer coded Merkle trees proposal.

There are also other smaller developments like Cross-shard communication via receipts as well as “constant-factor” enhancements such as BLS signature aggregation.

That said, sharded blockchains have still not been seen in live operation. On the theoretical side, there are mainly disputes about details remaining, along with challenges having to do with stability of sharded networking, developer experience and mitigating risks of centralization; fundamental technical possibility no longer seems in doubt. But the challenges that *do* remain are challenges that cannot be solved by just thinking about them; only developing the system and seeing ethereum 2.0 or some similar chain running live will suffice.

- Timestamping

Problem:create a distributed incentive-compatible system, whether it is an overlay on top of a blockchain or its own blockchain, which maintains the current time to high accuracy. All legitimate users have clocks in a normal distribution around some “real” time with standard deviation 20 seconds … no two nodes are more than 20 seconds apart The solution is allowed to rely on an existing concept of “N nodes”; this would in practice be enforced with proof-of-stake or non-sybil tokens (see #9). The system should continuously provide a time which is within 120s (or less if possible) of the internal clock of >99% of honestly participating nodes. External systems may end up relying on this system; hence, it should remain secure against attackers controlling < 25% of nodes regardless of incentives.

Status: **Some progress**.

Ethereum has actually survived just fine with a 13-second block time and no particularly advanced timestamping technology; it uses a simple technique where a client does not accept a block whose stated timestamp is earlier than the client’s local time. That said, this has not been tested under serious attacks. The recent network-adjusted timestamps proposal tries to improve on the status quo by allowing the client to determine the consensus on the time in the case where the client does not locally know the current time to high accuracy; this has not yet been tested. But in general, timestamping is not currently at the foreground of perceived research challenges; perhaps this will change once more proof of stake chains (including Ethereum 2.0 but also others) come online as real live systems and we see what the issues are.

- Arbitrary Proof of Computation

Problem:create programs`POC_PROVE(P,I) -> (O,Q)`

and`POC_VERIFY(P,O,Q) -> { 0, 1 }`

such that`POC_PROVE`

runs program`P`

on input`I`

and returns the program output`O`

and a proof-of-computation`Q`

and POC_VERIFY takes`P`

,`O`

and`Q`

and outputs whether or not`Q`

and`O`

were legitimately produced by the`POC_PROVE`

algorithm using`P`

.

Status: **Great theoretical and practical progress**.

This is basically saying, build a SNARK (or STARK, or SHARK, or…). And we’ve done it! SNARKs are now increasingly well understood, and are even already being used in multiple blockchains today (including tornado.cash on Ethereum). And SNARKs are extremely useful, both as a privacy technology (see Zcash and tornado.cash) and as a scalability technology (see ZK Rollup, STARKDEX and STARKing erasure coded data roots).

There are still challenges with efficiency; making arithmetization-friendly hash functions (see here and here for bounties for breaking proposed candidates) is a big one, and efficiently proving random memory accesses is another. Furthermore, there’s the unsolved question of whether the O(n * log(n)) blowup in prover time is a fundamental limitation or if there is some way to make a succinct proof with only linear overhead as in bulletproofs (which unfortunately take linear time to verify). There are also ever-present risks that the existing schemes have bugs. In general, the problems are in the details rather than the fundamentals.

- Code Obfuscation
The holy grail is to create an obfuscator O, such that given any program P the obfuscator can produce a second program O(P) = Q such that P and Q return the same output if given the same input and, importantly, Q reveals no information whatsoever about the internals of P. One can hide inside of Q a password, a secret encryption key, or one can simply use Q to hide the proprietary workings of the algorithm itself.

Status: **Slow progress**.

In plain English, the problem is saying that we want to come up with a way to “encrypt” a program so that the encrypted program would still give the same outputs for the same inputs, but the “internals” of the program would be hidden. An example use case for obfuscation is a program containing a private key where the program only allows the private key to sign certain messages.

A solution to code obfuscation would be very useful to Blockchain protocols. The use cases are subtle, because one must deal with the possibility that an on-chain obfuscated program will be copied and run in an environment different from the chain itself, but there are many possibilities. One that personally interests me is the ability to remove the centralized operator from collusion-resistance gadgets by replacing the operator with an obfuscated program that contains some proof of work, making it very expensive to run more than once with different inputs as part of an attempt to determine individual participants’ actions.

Unfortunately this continues to be a hard problem. There is continuing ongoing work in attacking the problem, one side making constructions (eg. this) that try to reduce the number of assumptions on mathematical objects that we do not know practically exist (eg. general cryptographic multilinear maps) and another side trying to make practical implementations of the desired mathematical objects. However, all of these paths are still quite far from creating something viable and known to be secure. See https://eprint.iacr.org/2019/463.pdf for a more general overview to the problem.

- Hash-Based Cryptography

Problem:create a signature algorithm relying on no security assumption but the random oracle property of hashes that maintains 160 bits of security against classical computers (ie. 80 vs. quantum due to Grover’s algorithm) with optimal size and other properties.

Status: **Some progress**.

There have been two strands of progress on this since 2014. SPHINCS, a “stateless” (meaning, using it multiple times does not require remembering information like a nonce) signature scheme, was released soon after this “hard problems” list was published, and provides a purely hash-based signature scheme of size around 41 kB. Additionally, STARKs have been developed, and one can create signatures of similar size based on them. The fact that not just signatures, but also general-purpose zero knowledge proofs, are possible with just hashes was definitely something I did not expect five years ago; I am very happy that this is the case. That said, size continues to be an issue, and ongoing progress (eg. see the very recent DEEP FRI) is continuing to reduce the size of proofs, though it looks like further progress will be incremental.

The main not-yet-solved problem with hash-based cryptography is aggregate signatures, similar to what BLS aggregation makes possible. It’s known that we can just make a STARK over many Lamport signatures, but this is inefficient; a more efficient scheme would be welcome. (In case you’re wondering if hash-based *public key encryption* is possible, the answer is, no, you can’t do anything with more than a quadratic attack cost)

## Consensus theory problems

- ASIC-Resistant Proof of Work
One approach at solving the problem is creating a proof-of-work algorithm based on a type of computation that is very difficult to specialize … For a more in-depth discussion on ASIC-resistant hardware, see https://blog.ethereum.org/2014/06/19/mining/.

Status: **Solved as far as we can**.

About six months after the “hard problems” list was posted, Ethereum settled on its ASIC-resistant proof of work algorithm: Ethash. Ethash is known as a memory-hard algorithm. The theory is that random-access memory in regular computers is well-optimized already and hence difficult to improve on for specialized applications. Ethash aims to achieve ASIC resistance by making memory access the dominant part of running the PoW computation. Ethash was not the first memory-hard algorithm, but it did add one innovation: it uses pseudorandom lookups over a two-level DAG, allowing for two ways of evaluating the function. First, one could compute it quickly if one has the entire (~2 GB) DAG; this is the memory-hard “fast path”. Second, one can compute it much more slowly (still fast enough to check a single provided solution quickly) if one only has the top level of the DAG; this is used for block verification.

Ethash has proven remarkably successful at ASIC resistance; after three years and billions of dollars of block rewards, ASICs do exist but are at best 2-5 times more power and cost-efficient than GPUs. ProgPoW has been proposed as an alternative, but there is a growing consensus that ASIC-resistant algorithms will inevitably have a limited lifespan, and that ASIC resistance has downsides because it makes 51% attacks cheaper (eg. see the 51% attack on Ethereum Classic).

I believe that PoW algorithms that provide a medium level of ASIC resistance can be created, but such resistance is limited-term and both ASIC and non-ASIC PoW have disadvantages; in the long term the better choice for Blockchain consensus is proof of stake.

- Useful Proof of Work
making the proof of work function something which is simultaneously useful; a common candidate is something like Folding@home, a