Fully Decentralizing dApps

In the previous post here (please do read this first!), I talked about two key issues that affect dApps: the bootstrapping problem, and the issue of outsourced verification. I partly discussed how the first can be addressed: when bootstrapping, we need to be careful that we are not the victim of a man-in-the-middle attack. Once bootstrapped, we have an initial point of verification; but that brings us to the second, arguably harder, question: “how do we maintain verification without a trusted party doing that for us, or without doing it ourselves?” To come up with solutions to this problem, we first need to take a step back and look at two things: why we should truly and fully decentralize dApps, and, within our environment, how we can do so.

Why should we decentralize Apps / dApps?

In my previous post, I posed the question “What advantages does a dApp offer us if it is only ever used exactly as an App?” Indeed, when you only ever use a dApp as an App, you also throw away many (if not all) of the properties that differentiate them. If you only ever use a trusted third party to interface with an application that does not require a trusted third party, you might as well require one. Doing so would give you many benefits: less complexity, improved functionality, scalability, and more. If the Steemit.com interface replaced its working internals (STEEM) with a centralized system overnight, it could continue to operate extraordinarily well. Rewards could be distributed on time and content could be posted and displayed to users, yet none of it would ever need to pass through to the STEEM blockchain. Very few people would notice the sleight of hand -- only those running their own validation, who would notice that the content on the site does not match the data they are validating.

Yet, if you are here on Steemit, and if you are using cryptocurrencies -- you probably already know the answer to the question of “why” we should decentralize Apps. We want to cut out the middle-man and remove trust from the equation. We are sick of middle-men taking a cut, controlling what we see, and telling us what we can or cannot do. Now, I’m not saying Steemit does any of these things, but when we develop these tools of the future, we should not design them in the same way as the past -- requiring a third party -- or we will repeat the mistakes of history. Fresh, new centralized systems are rarely designed to be corrupt, but we later see the centralization of power become the catalyst for corruption. When we design our fresh, new dApps once again with centralization, the first iteration does not appear corrupt -- but through slow evolution the opportunity arises.

Thus, when I refer to truly decentralizing dApps, what I refer to is the idea of removing the outsourcing of verification. It is, in fact, the last step towards removing the middle-men from the equation. Only when a dApp no longer requires nor de-facto uses a trusted third party can it become truly decentralized.

To do this, first, we need to understand our environment itself: Cryptocurrencies, and Blockchains, and the opportunities they grant us to do this.

What differentiates Cryptocurrencies from Blockchains?

When we talk about “Cryptocurrencies”, the term “Blockchain” is often used interchangeably as a description. However, if you ask me (and many other academics), the differentiation between Cryptocurrencies and Blockchains is quite important. A Blockchain is a type of Cryptocurrency, but not all Cryptocurrencies are Blockchains. Blockchain, a term introduced by Bitcoin, referred to the idea of keeping a timeline in a linked list of blocks with an ever-moving “timestamp” of proof of work being needed to keep the list secure.

However, not all Cryptocurrencies are deployed this way. STEEM in fact, in the purist definition, is not a Blockchain, it is only acting like one. There. I said it. And, as we will see, perhaps this is a good thing! The data-structure that STEEM deploys -- a linked list in the form of a batch of transactions linked to a previous batch, with total ordering -- certainly seems to resemble a “Blockchain”, though. So what is the difference?

The key differentiator is PoS vs PoW. While most “Blockchains” would have you believe these are two sides of the same coin, the two systems are fundamentally different at their core, so much so that it feels like comparing the Sorting problem to the Travelling Salesman problem. PoW systems, with their ever-moving work-time-stamps, require a linearization of data for their properties (Bitcoin makes the correct reference to Markov Chains). PoS certainly could use the same structure, but does not specifically require it to function. In PoW systems, the world is governed by probabilities: the likelihood of orphaned blocks in the chain, 51% attacks creating longer chains, and the probability of a transaction being confirmed, which grows ever closer to 1 with each additional block until, for all intents and purposes, it might as well be 1. But -- it’s not.
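To put a number on “close to 1, but not 1”, here is a minimal Python sketch of the attacker catch-up calculation from the Bitcoin whitepaper (section 11). The function name and the 10% hashpower figure are my own choices for illustration; the formula assumes the attacker's share q is below one half.

```python
from math import exp, factorial


def attacker_success_probability(q: float, z: int) -> float:
    """Probability that an attacker with hashpower share q ever catches up
    after the honest chain has added z blocks on top of a transaction."""
    if not 0.0 <= q < 0.5:
        raise ValueError("formula assumes an attacker share q with 0 <= q < 0.5")
    p = 1.0 - q                # honest share of the hashpower
    lam = z * (q / p)          # expected attacker progress while z honest blocks are mined
    total = 1.0
    for k in range(z + 1):
        poisson = exp(-lam) * lam ** k / factorial(k)
        total -= poisson * (1.0 - (q / p) ** (z - k))
    return total


if __name__ == "__main__":
    # Illustrative attacker holding 10% of the hashpower.
    for z in (1, 2, 6, 10):
        print(z, attacker_success_probability(0.1, z))
```

The result shrinks quickly as z grows, but for any attacker share above zero it never reaches exactly 0, which is the whole point: the confirmation never reaches probability 1.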

And this key difference drives PoS to be different. In PoS, we can mathematically point to an event that confirms a transaction as impossible to reverse, with probability 1. Instead of ever-moving work-time-stamps, PoS systems use a much older, well-studied concept: “Byzantine Fault Tolerance” (BFT), a property investigated in academia for over 30 years. And indeed, BFT systems do not require a lot of the properties that PoW blockchains do -- the most notable being the linked-list “block” data structure. We have already seen Directed Acyclic Graph structures employed by cryptocurrencies such as NANO, which have begun to take advantage of the innate properties of, and historical work done on, BFT. As another example of a property that is used but not required, (probabilistic) immutability is a staple of “Blockchains”, and while often implemented, it is also not a requirement of BFT.

BFT Cryptocurrencies (which encompass PoS and DPoS, and thus include STEEM), when compared to PoW currencies, have a few more important properties that differentiate them. The first one that is often used to compare the two is the permissionless aspect. Indeed, BFT cryptocurrencies are by nature permissioned: to enter the ecosystem, you require permission from within the ecosystem. This is often in the form of a barter transaction (buying entry from another user). With PoW systems, the entry point is by nature permissionless. Although you can always enter with permission (again, via barter), you can also enter the ecosystem (with some probability) without permission (by mining a block). Interestingly, because PoW is permissionless, it also removes the bootstrapping problem (the proof of this is left as an exercise to the reader, though if you’re curious, leave a comment and I can try to explain).

A second property is the verifiability. When you are verifying the status of a transaction, with PoW you can never be sure a transaction is valid, as there is always some chance of an attacker holding more hashpower in reserve, even to the point of re-writing the entire history. Although in practice the economic structure makes this unlikely, academically this makes validation hard, as you can never ratify a transaction with probability 1. However, with BFT, we have a structure in place: when 2/3rds+1 of validation entities have signed off on a transaction, this transaction is confirmed. Period. The transaction cannot be undone within the rules of the system.
(Two notes on this -- one, a hardfork, which could change this data, is rather the instantiation of a new system, and not the continuation of the old one, and thus does not invalidate this design. Two, if the system by design allowed “reversibility”, this reverse of the transaction is a new transaction, and the system remains incremental-only. As a more understandable example of why this is the case, consider returning a broken product to a store: the return of your money is considered a new transaction, not a deletion of the previous purchase.)
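As a rough illustration of the “2/3rds+1” rule, here is a small Python sketch. I am assuming the threshold means floor(2n/3) + 1 for n validators, and that signatures have already been checked elsewhere, so the sketch only counts distinct, known validators. The names `quorum_threshold` and `is_final` are hypothetical and not taken from any STEEM library.

```python
def quorum_threshold(n_validators: int) -> int:
    """Smallest number of sign-offs that counts as 2/3rds+1 (assumed: floor(2n/3) + 1)."""
    if n_validators < 1:
        raise ValueError("need at least one validator")
    return (2 * n_validators) // 3 + 1


def is_final(validators, signers) -> bool:
    """True once enough distinct, known validators have signed off.

    Assumes each signature was already verified; unknown or duplicate
    signers are ignored rather than counted.
    """
    known = set(validators)
    counted = set(signers) & known
    return len(counted) >= quorum_threshold(len(known))


if __name__ == "__main__":
    validators = [f"witness-{i}" for i in range(21)]   # 21 is only an example size
    print(quorum_threshold(len(validators)))           # 15
    print(is_final(validators, validators[:14]))       # False
    print(is_final(validators, validators[:15]))       # True
```

Unlike the PoW case, the answer here is a plain yes or no: once the threshold is reached, there is no residual probability left to wait out.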

But enough of the history lesson and definition of terms. Can we use the properties of BFT systems to our advantage?

Breaking the Shackles of “Thinking like a Blockchain”: using BFT for Outsourced Verification

Currently, when a light client connects to a trusted endpoint, they are assuming that the endpoint is feeding them the correct information (and thus outsourcing verification to this third party). We inherited this loose sense of data aggregation from “Blockchains,” but constraining our thinking to the restrictions of Blockchains is not required if we consider the system as a traditional BFT system.

An interesting property of BFT is that, not only can we use it for ratifying data (in the form of validating transactions), we can also use it for distributing data. Instead of a light client requesting data from a single trusted source, suppose the client collects data from each of the validation sources, using BFT quorum behaviour. Once this data is collected, either the response that has 2/3rds+1 majority consensus is indeed the fully validated and ratified data, or the system of validating sources is in itself Byzantine. This is the secret sauce. By collecting data via the BFT quorum, you can get full validation status without performing the validation yourself.
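Here is a minimal sketch of what such a quorum read could look like in Python. To keep it self-contained, each validator endpoint is simulated as a plain callable; a real client would make an RPC call to each witness and authenticate each response. The function name `quorum_read`, the JSON-based comparison of answers, and the floor(2n/3) + 1 threshold are all my assumptions for illustration.

```python
import hashlib
import json
from collections import Counter
from typing import Any, Callable, Mapping


def quorum_read(endpoints: Mapping[str, Callable[[], Any]]) -> Any:
    """Ask every validator for the same data and return the answer that
    2/3rds+1 of the validator set agrees on.

    `endpoints` maps a validator name to a zero-argument callable standing in
    for a network request (hypothetical; no real API is used here).
    """
    needed = (2 * len(endpoints)) // 3 + 1
    tally: Counter = Counter()
    payloads = {}
    for name, fetch in endpoints.items():
        try:
            payload = fetch()
            # Canonical JSON so that identical answers hash identically.
            canonical = json.dumps(payload, sort_keys=True).encode()
        except Exception:
            continue  # an unreachable or malformed endpoint simply casts no vote
        digest = hashlib.sha256(canonical).hexdigest()
        tally[digest] += 1
        payloads[digest] = payload
    for digest, votes in tally.items():
        if votes >= needed:
            return payloads[digest]
    raise RuntimeError(f"no answer reached the quorum of {needed} validators")


if __name__ == "__main__":
    honest = lambda: {"account": "alice", "balance": 10}
    liar = lambda: {"account": "alice", "balance": 9999}
    print(quorum_read({"w1": honest, "w2": honest, "w3": honest, "w4": liar}))
```

Note that the client never re-validates the balance itself. It only checks that a quorum of the known validator set gives the same answer; if no answer reaches the quorum, the client refuses to trust any of them.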

Notably, this ability to both ratify and collect data is unique to BFT cryptocurrencies. PoW systems do not have it: there is no way to collect any data with probability 1, because (i) confirmation never reaches probability 1, and (ii) there is no provable set of validating sources to form a quorum. As a simple proof for (ii), the set of sources that would form a quorum for PoW is all hashpower, both known and unknown -- and hashpower can only be proved to exist; it cannot be proved not to exist. With BFT systems, the quorum is defined by design, and thus we have the ability to collect any data via quorum.

How would this enable or benefit dApps?

I will once again acknowledge the Bootstrapping problem: indeed, a dApp does need to first identify the current set of validators (e.g. witnesses). However, once the set of validators is known, any changes to the set can be determined via the current set. I will again compare the Bootstrapping problem to what most users already face on the internet. When determining whether a program is legitimate or a virus, some amount of investigation needs to occur. Once the user determines a program is not a virus, and indeed performs what it is advertised to do, the user can operate the program without worry. In a similar vein, once the dApp identifies the current witness set, it can then continue without worry. From the user’s perspective, once the user identifies that the program is indeed a proper implementation of BFT quorum validation (and is not a malicious program), the user can use it without worry.
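To illustrate how a dApp might track the validator set after bootstrapping, here is a small Python sketch: each change to the set is accepted only if 2/3rds+1 of the currently trusted set approved it. The update format (a new set plus the names of its approvers) is hypothetical, and signature checking is assumed to happen elsewhere.

```python
def follow_validator_set(trusted_set, updates):
    """Walk a sequence of validator-set changes, starting from a bootstrapped set.

    `updates` is an iterable of (new_set, approvers) pairs (a hypothetical
    format). An update only counts if a 2/3rds+1 quorum of the *current* set
    approved it; approvals from anyone outside the current set are ignored.
    """
    current = frozenset(trusted_set)
    for new_set, approvers in updates:
        needed = (2 * len(current)) // 3 + 1
        if len(current & set(approvers)) < needed:
            raise ValueError("update not approved by a quorum of the current set")
        current = frozenset(new_set)
    return current


if __name__ == "__main__":
    start = {"a", "b", "c", "d"}
    updates = [({"a", "b", "c", "e"}, {"a", "b", "c"})]   # 3 of 4 approve: accepted
    print(sorted(follow_validator_set(start, updates)))
```

The initial trust decision is made once, at bootstrap. Every later set is reached through a chain of quorum-approved handovers, so an outsider cannot swap in their own validators.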

Upon completing the bootstrapping process, the user can then be sure that the interface and all associated data it retrieves (via quorum) are valid, or else the system itself is invalid. dApps thus become truly decentralized, with no trusted interface required -- the validation of collected data is outsourced only to the real validators ratifying the system.

So what would such a system look like, or require, for STEEM?

Unfortunately, the current environment of STEEM does not support such a design. As you can imagine, requiring the validators (witnesses) to also validate data requests would require them to offer public API endpoints that respond to RPC requests for data. To date, while a few APIs are offered publicly, there is still heavy reliance on the API provided by Steemit Inc. Indeed, we would need 2/3rds+1 of witnesses to offer APIs to ensure dApps can collect validated data when there is no attacker, and all witnesses to offer APIs to ensure validated data collection in the face of a near-Byzantine attacker.
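The arithmetic behind those two numbers can be sketched in a few lines of Python. I am reading “near-Byzantine” as the worst case the system still tolerates, that is, the largest number of witnesses that can lie or stay silent while an honest quorum remains. The witness count of 21 below is only an illustrative figure, and the function names are my own.

```python
def quorum_threshold(n_witnesses: int) -> int:
    """2/3rds+1, assumed here to mean floor(2n/3) + 1."""
    return (2 * n_witnesses) // 3 + 1


def min_api_endpoints(n_witnesses: int, faulty: int) -> int:
    """Fewest witnesses that must offer APIs so a client can always reach a quorum.

    In the worst case, all `faulty` witnesses are among those offering an API
    and they lie or withhold answers, so we need threshold + faulty endpoints.
    """
    needed = quorum_threshold(n_witnesses) + faulty
    if needed > n_witnesses:
        raise ValueError("too many faulty witnesses: no quorum is possible")
    return needed


if __name__ == "__main__":
    n = 21                                  # illustrative witness count
    worst_case = n - quorum_threshold(n)    # most faults that still leave a quorum
    print(min_api_endpoints(n, 0))          # 15: no attacker
    print(min_api_endpoints(n, worst_case)) # 21: every witness must offer an API
```

With no attacker, any quorum-sized group of endpoints is enough. As the number of faulty witnesses approaches the tolerated maximum, the requirement climbs until every single witness has to serve data.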

I will not lie, a shift towards a truly decentralized model is not easy, and there are many technical challenges that do come with it. But I do believe that progressing towards such a model is in our collective best interests. There are two aspects that we can target, (i) decentralizing standalone dApps, and (ii) decentralizing WebApps like Steemit.com.

The first, you can think of as ensuring certain programs like wallets (e.g. Vessel) have a truly decentralized model for interfacing with data on the STEEM blockchain. These standalone Apps would become dApp interfaces that are genuinely powered in a fully verifiable way, without any middlemen.

The second, as a more long-term goal, is decentralizing web apps. This is far harder to do: the front end (e.g. the server that hosts the website) would still be a third party, but the underlying data requests would be directed in a quorum manner to ensure decentralized data collection. Implementing such a design for Steemit.com itself would require a fair amount of resources from all validators. Further, it would be much harder to prove properties about the internal workings of an app that remains hosted externally (on a website), rather than downloaded locally. A more decentralized design would therefore not be served from the web, and would instead be used as a standalone downloadable app.

It will be a long road to fully decentralize the future, but won’t it be nice to say STEEM is truly end-to-end decentralized, with no trusted parties, for all users?

Putting My Money Where my Mouth Is

While building this train of thought towards the decentralization of dApps, and realizing the solution would be to have witnesses offering API endpoints for dApps to use, I decided I would need to put my money where my mouth is.

Soon I will be announcing a high-performance, custom-built server that I will be deploying solely for the purpose of publicly servicing dApps and end-users. Many witnesses already offer such services, and I believe a cooperative effort to get more API resources up and running will be a big step towards starting the decentralization process for STEEM.

Stay tuned for the announcement about this! (I’m hoping to have it ready before Steemfest -- I just finished the build today!)

p.s. Like what I'm doing for Steem? If you want to see development of @steemcleaners, @cheetah, @guard, and associated efforts continue, please vote for me as a witness here!