
Does my project need a blockchain?

February 2023 · 14 minute read

Due to the nature of the subject, I would like to start out by saying that this post will discuss blockchains from a purely technical perspective and will not address potential applications, such as cryptocurrencies or other digital assets.

With that out of the way… Every now and then at work, I run into an ambitious architect who has had the brilliant idea that their project absolutely needs, or at least could benefit from, a blockchain to solve one problem or another. However, it always turns out that a blockchain is not what the project actually needs. I have thought quite a bit about why all these people believe that a blockchain would be right for them, and one reason is probably that blockchains are a cool new technology. To be fair, it is part of the job description of a software architect to be constantly on the lookout for new technology that could benefit their projects. But I also think that an important reason why they are so keen on adding blockchains to their projects is that, on a basic level, blockchains look so simple. It is quite easy to imagine how to implement a blockchain, or at least certain parts of one. With this post, I hope to explain the more nuanced properties of a blockchain system so that architects reading it will understand why they, with near 100% certainty, do not need a blockchain in any of their projects. I also hope to explain in which situations a blockchain could be a reasonable solution.

What is a blockchain (really)?

Most of the time, blockchain technology is described as something along the lines of: a distributed, immutable, public ledger that is secured by cryptographic chaining. Or perhaps: a construct where data is stored in blocks, where each block contains a hash of the previous block. While neither of these definitions is wrong, they are not necessarily helpful. The first one gives us a lot of information, but not necessarily the information that we need. It tells us that the ledger is immutable, but it does not tell us how that is achieved other than that “cryptographic chaining” is somehow involved. The second statement gives us somewhat more information, at least in terms of understanding how a blockchain could be implemented, but it lacks an explanation of why we might want to store data in blocks as described. If we start with the low-level details and look at how we might represent a block in a blockchain, it could be done like this:

{
  "body": {
    "transactions": [
      transfer(100, "bf38a1f028b9152c186d753314f176e6ef51e3c1ab88b2ed1dbafff0593294b7", "4a51934f0b20d5a4b3425d75c52374493ac6a560cb65f82826b089dc098e02ed"),
      transfer(5, "bf38a1f028b9152c186d753314f176e6ef51e3c1ab88b2ed1dbafff0593294b7", "46332e5e2a8983718a78833e1fa208666fbcff21c85114cb6e8aa2ecfc83a9c8")
    ]
  },
  "pubkey": "MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAtFkV71rafoJkuKjgCRBSeT7wMvbnDqhRcZxllqmWV15T/8k7D0FagvWiN/un45PrZ8xVMAssZrT4si6+JJY0IxJ2H3aF04FpWYelhjro1Mawyqj5zn/kyWxVJy3QJ7cE09Rt86+EXYA1UVTKFLUf7LFwRP1hzXpL02ZEJ/T54qjaLNyTYvZJsqVnqUvQ5lsk/O60+93vQ3Ntf4fTvq1R7kTpsktVs1zw3/W+y6PMYj2Si0hgcgAiIEz4oTut9x577q9k2BZloldlZpTSmJ+jg0PDYaffAfsHnC/EKH+IHzPblOwrBKakbzItWJvY7eDoQ6Qr6leKeqMScwkAKKR65QIDAQAB",
  "signature": "KRdMOZPAXX4hIFymfB9qiiNDYl5A8NIr2HTfDWOyUjoxFotAyBGc3okrIkReMA4AG8Pni9aal+ZiOzR2hLCu+gEs7T95+tZD5RHumb0N8yPP9kPCLa6ja+CCjaxGpUegxsw6RzCjmxCVYE+pnD+FGJsoKuXREiqN6YBILuLGoLmi3fpPfSjaOI3sFOotq1OJVWkTwTvGPy0fXWXnNWoRVlPKWKvAjEavcCfVbueWHdlhKXROB8fXF3Z/gTXJnijFj0rv4VSWia0aUhpkkxeqmcfP7PL9OJGgoVtHFE402Usz4+i9ylVoh7SlTnDWqRyCfZrVBTTLY5SlXHJvAvDxYA==",
  "previousBlockHash": "983163de3a47b09bd82d1a93f2c109aca62a4b7f5338459e95bbe37572650dd7"
}

As we can see, a block in a blockchain can be very simple. It is here represented as a JSON object consisting of four properties: body, pubkey, signature and previousBlockHash. In the body property, we keep the actual content of the block: an array of transactions in this case. In the signature field we keep a base64-encoded signature over the data that is stored in the body property. This signature is produced using the private key that corresponds to the public key in the pubkey field, and in this way anyone can validate that the data is the same as what was once signed. Lastly, the block also contains a reference to the previous block in the previousBlockHash field which, as a surprise to no one, contains the hash of the previous block. Similarly, the next block (if any) that is added to the chain will contain 3e31e4dd5326c112a545f9029263025673d3c82b1ba33c7db9166313eeb68d26 in its previousBlockHash, i.e., the hash of the block displayed above. The reference back to the previous block is key to the power of the blockchain, because if even a single bit in the previous block is changed, the hash of that block will be completely different. Thus, the value in the previousBlockHash field will no longer correspond to the actual hash, and it can therefore be detected that something has changed. This is what gives us part of the ability to claim that data on the blockchain is immutable: the data can still change, but at least we can detect that it was changed. Another neat property of this construct is that someone trying to change data further back in the chain than just the previous block would, for everything to check out, have to update the hash of each subsequent block up to the latest one. To put it another way: the further back in history you want to make your change, the more hashes you must recompute, i.e., the more expensive it becomes.
The example block shown above is of course a very simple (although functioning) block structure. In a real-world scenario, the block will contain a Merkle tree [1] in order to make validations more efficient.
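To make the chaining concrete, here is a minimal sketch in Python of how the previousBlockHash links can be built and checked. It is an illustration under assumptions, not a real implementation: the helper names (block_hash, make_block, first_broken_link), the all-zero parent hash for the first block and the choice of SHA-256 over a sorted JSON encoding are all made up for this example, and the pubkey and signature fields are left out to keep it to the standard library.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """SHA-256 over a canonical (sorted-key) JSON encoding of the block."""
    encoded = json.dumps(block, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(encoded).hexdigest()


def make_block(transactions: list, previous_block_hash: str) -> dict:
    # pubkey and signature are omitted in this sketch.
    return {
        "body": {"transactions": transactions},
        "previousBlockHash": previous_block_hash,
    }


def first_broken_link(chain: list):
    """Return the index of the first block whose previousBlockHash does not
    match the actual hash of its predecessor, or None if every link holds."""
    for i in range(1, len(chain)):
        if chain[i]["previousBlockHash"] != block_hash(chain[i - 1]):
            return i
    return None


GENESIS_PARENT = "0" * 64  # hypothetical placeholder parent for the first block

chain = [make_block(["transfer(100, alice, bob)"], GENESIS_PARENT)]
for tx in ["transfer(5, alice, carol)", "transfer(20, bob, carol)"]:
    chain.append(make_block([tx], block_hash(chain[-1])))

print(first_broken_link(chain))  # None: all links hold

# Change one transaction in the oldest block.
chain[0]["body"]["transactions"][0] = "transfer(1, alice, bob)"
print(first_broken_link(chain))  # 1: block 1 no longer points at block 0

# To hide the change, the link in every later block must be recomputed.
for i in range(1, len(chain)):
    chain[i]["previousBlockHash"] = block_hash(chain[i - 1])
print(first_broken_link(chain))  # None again
```

Note that the last check passes again. Whoever holds the only copy of the chain can recompute the links, so on its own the hash chain only makes changes detectable when someone else still has the old hashes to compare against.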

We now know how a blockchain stores its data in order to maintain integrity. Although it looks a bit complex, with lots of cryptographic concepts, we can understand how it fits together, and we can understand how we would go about building something like this in code. I think this is the main reason why people believe that a blockchain is a good idea to introduce into their project: they think they have understood the complicated part and realized that it is not so complicated after all. Unfortunately, it turns out that the above is the easy part. The hard part comes when we need to construct a mechanism that enforces the immutability of the chain, and does not just detect integrity faults.

Consensus

The second central aspect of a blockchain system, aside from the fact that data is organized in a chain of blocks, is that the system does not run on a single computer but rather on a multitude of computers, or nodes as they are usually called. Together these nodes form a network, and a very simple blockchain network is depicted below. The network in this case consists of five nodes, and what is important to note is that each one of these nodes maintains its own identical copy of the entire blockchain up until a certain point.

A blockchain consensus network

To add a new block to the chain, the nodes must reach a consensus on the fact that this block should indeed be added. There are many different consensus algorithms, the main ones being Proof of work [2], Proof of stake [3] and Delegated proof of stake [4] but generally, the process of adding a block is something like the following:

  1. One node N is selected to produce a block.
  2. N gathers the transactions that were submitted to it into a block B and signs it.
  3. N distributes B to all the other nodes in the network, i.e., N proposes B to the network.
  4. Each node validates B, i.e., checks that all transactions are valid. For example, by validating that an account has enough balance to perform a transfer, that the signer of the transaction is the account owner, etc.
  5. If a majority of the nodes conclude that the block is valid, it will be added to the chain.

Depending on which consensus algorithm is chosen, the exact details of each step will vary. For example, how a node is selected to produce a block depends on the consensus algorithm in question, the number of nodes required to reach a majority can differ, and the punishment for producing an invalid block also depends on the consensus algorithm used.
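The five steps above can be sketched as a toy simulation. This is a simplified illustration and not any of the named consensus algorithms: the round-robin proposer selection, the simple-majority threshold, and the Node and Transfer classes are assumptions made for this example, and signing and networking are left out entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    sender: str
    receiver: str
    amount: int


class Node:
    def __init__(self, name: str, balances: dict):
        self.name = name
        self.balances = dict(balances)  # each node keeps its own copy
        self.chain = []

    def propose(self, pending: list) -> tuple:
        # Step 2: gather submitted transactions into a block (signing omitted).
        return tuple(pending)

    def validate(self, block: tuple) -> bool:
        # Step 4: replay the block on a copy of the balances and reject it
        # if any sender would spend more than they have.
        working = dict(self.balances)
        for tx in block:
            if tx.amount <= 0 or working.get(tx.sender, 0) < tx.amount:
                return False
            working[tx.sender] -= tx.amount
            working[tx.receiver] = working.get(tx.receiver, 0) + tx.amount
        return True

    def commit(self, block: tuple) -> None:
        for tx in block:
            self.balances[tx.sender] -= tx.amount
            self.balances[tx.receiver] = self.balances.get(tx.receiver, 0) + tx.amount
        self.chain.append(block)


def run_round(nodes: list, round_number: int, pending: list) -> bool:
    proposer = nodes[round_number % len(nodes)]       # step 1 (round-robin is an assumption)
    block = proposer.propose(pending)                 # step 2
    votes = [node.validate(block) for node in nodes]  # steps 3 and 4
    if sum(votes) * 2 > len(nodes):                   # step 5: simple majority
        for node in nodes:
            node.commit(block)
        return True
    return False


genesis = {"alice": 100, "bob": 0}
nodes = [Node(f"node-{i}", genesis) for i in range(5)]
print(run_round(nodes, 0, [Transfer("alice", "bob", 60)]))  # True
print(run_round(nodes, 1, [Transfer("alice", "bob", 60)]))  # False: alice only has 40 left
print(nodes[0].balances)  # {'alice': 40, 'bob': 60}
```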

What problems are created by a blockchain?

Now that we know a little bit more about how a blockchain system works, we can think about what all these things would imply for our system if we wanted to integrate a blockchain into it. Here are but a few examples of problems that need to be solved within the system.

Performance

Given the description above of how a blockchain system works, it should come as a surprise to no one that a blockchain system is really slow. For example, Bitcoin and Ethereum, the two largest blockchain systems in existence today, only manage 7 and 27 transactions per second respectively at the time of writing [5]. This is mostly due to the consensus mechanism, but it is also true that all the signing and signature verification adds significant overhead on all the nodes in the network.

Lack of central control

This is perhaps the strongest argument for why nearly 100% of all organizations should not want a blockchain anywhere near their systems. To get any value out of the blockchain, it needs to be distributed among a network of nodes, and the nodes cannot be under the control of the organization. If they were, the organization would be able to do whatever it wanted with the network, for example rewrite the history of the blockchain. This also means that to make any changes to the deployed network, a majority of all the nodes need to be convinced to approve the change, which is not a technical problem but a social one. Should the organization insist on having control over the majority of the nodes, it has effectively just created the slowest and most inefficient database imaginable and would have been better off simply storing the data in MySQL directly.

Eventual consistency

Since transactions must be included in a block, which must be accepted by the nodes that take part in the consensus, it might take a long time from when the user submits the data to when it shows up in the application. Depending on the consensus algorithm being used, there are different things to take into consideration when reasoning about how long it will take for a transaction to be included. On EVM-based blockchains (for example Ethereum), you can pay a higher transaction fee to give your transaction a higher probability of being included, i.e., you can pay for it to be included faster. In the consensus mechanism used by Bitcoin (proof of work), one might end up in a scenario where the chain “splits” at the top into two chains that are being worked on at the same time. If you are unlucky and your transaction ends up in the wrong chain, it will be discarded when the split is eventually resolved, and you would have to resubmit it.
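The split scenario can be illustrated with a small sketch. It assumes a simplified "longest branch wins" rule; actual proof-of-work networks compare accumulated work rather than block count, and the resolve_fork function and the transaction ids below are made up for this example.

```python
def resolve_fork(branch_a: list, branch_b: list):
    """Pick the longer branch and report which transactions only existed in
    the discarded branch. Each branch is a list of blocks, and each block is
    a list of transaction ids.

    A tie is resolved in favour of branch_a here purely to keep the sketch
    short; in practice a tie stays unresolved until one branch grows.
    """
    if len(branch_a) >= len(branch_b):
        winner, loser = branch_a, branch_b
    else:
        winner, loser = branch_b, branch_a
    included = {tx for block in winner for tx in block}
    discarded = [tx for block in loser for tx in block if tx not in included]
    return winner, discarded


common = [["tx1"], ["tx2"]]              # history both branches agree on
branch_a = common + [["tx3", "tx4"]]     # one more block
branch_b = common + [["tx3"], ["tx5"]]   # two more blocks

winner, discarded = resolve_fork(branch_a, branch_b)
print(winner is branch_b)  # True
print(discarded)           # ['tx4']
```

Here tx4 was only ever part of the losing branch, so from the application's point of view it disappears again after having briefly looked confirmed.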

Byzantine fault tolerance

Since your new blockchain system is being distributed out to a plethora of actors, of which you ideally do not know too many, you must account for the fact that not every actor will have good intent. Some of these actors may in fact be malicious and might try to make bad things happen to the network, such as colluding to reach a consensus majority and change the rules of the system. What makes this harder is that it is almost impossible to know in every single instance whether a node is honest or malicious. This problem is referred to as the Byzantine Generals Problem [6] (named after the paper that first described it), and for the blockchain system to be reliable, it must handle this problem. In other words, it must be Byzantine fault tolerant.

Determinism

We have already talked about how every block needs to be accepted by a majority of the nodes in the network to be added to the chain, and that the way the nodes determine whether to accept a block or not is by validating it against all previous operations on the chain. For this to work, the computations on the blockchain need to be 100% deterministic. If the validating nodes reach different values for a block than the proposing node did, the block will be rejected. This means that every potentially non-deterministic operation is strictly forbidden on the blockchain. Examples of disallowed operations are random number generation, reading the current timestamp and any type of API call over the internet. Here it becomes obvious that building applications on the blockchain requires a very different way of thinking than building classical applications.

To make it even more complicated, it is not even enough that all the nodes in the network reach consensus on a block. Whenever a new node is added to the system, it needs to rebuild the entire blockchain from the first block (the genesis block) up until the latest block. This means that even 30 years from now, every operation must give the same value as when it was first computed. Otherwise, the system would slowly grind to a halt as existing nodes drop off and no new nodes can be added.
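A small sketch of why this matters: below, a proposing node and a validating node each execute the same transfers and compare a hash of the resulting state. The fee rules, the execute function and the state_hash helper are hypothetical and only exist to contrast a rule that depends solely on its input with one that depends on the local random number generator.

```python
import hashlib
import random


def state_hash(balances: dict) -> str:
    """Hash of the resulting state, which nodes can compare with each other."""
    encoded = repr(sorted(balances.items())).encode()
    return hashlib.sha256(encoded).hexdigest()


def deterministic_fee(amount: int) -> int:
    return amount // 100  # depends only on the input


def random_fee(amount: int) -> int:
    return random.randint(0, amount)  # depends on the node's local RNG


def execute(transfers: list, fee_rule) -> str:
    """Apply the transfers with the given fee rule and return the state hash."""
    balances = {}
    for sender, receiver, amount in transfers:
        fee = fee_rule(amount)
        balances[sender] = balances.get(sender, 0) - amount
        balances[receiver] = balances.get(receiver, 0) + amount - fee
    return state_hash(balances)


transfers = [("alice", "bob", 10**12), ("bob", "carol", 10**12)]

# Proposer and validator run the same code on the same input.
proposer_result = execute(transfers, deterministic_fee)
validator_result = execute(transfers, deterministic_fee)
print(proposer_result == validator_result)  # True

# With a random fee each node draws its own values, so the hashes will
# differ except by coincidence and the validator would reject the block.
print(execute(transfers, random_fee) == execute(transfers, random_fee))
```

The same reasoning applies to a node that replays the chain years later: it has to arrive at exactly the same state hash as the nodes that executed the block originally.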

Compliance

Data that goes on the blockchain can never be deleted. All data on the blockchain is also always public. This means that everything that touches the blockchain will be there for everyone to see forever (or at least for as long as the system keeps running). As one can imagine, this might make certain regulatory requirements hard to comply with, e.g., “the right to be forgotten” as it is described in the EU’s General Data Protection Regulation (GDPR). One might be tempted to imagine ways to work around this, for example by either hashing or encrypting data before storing it on the blockchain. However, should the data be exposed even once, for example because the encryption key is compromised, then all the data that has been encrypted with that key is no more secret than if it had been stored in clear text in the first place. Since the data will be there for as long as the system is running, there is also always the possibility that the encryption algorithm used to encrypt the data is cracked, or that someone brute-forces the key to decrypt the data.
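As a related illustration of why hashing is a weak workaround, consider what happens when the hashed value comes from a small set of possibilities. The sketch below is an assumption-laden example, not something taken from a real system: the "customer:NNNNNN" record format and the function names are made up, and it assumes a plain, unsalted SHA-256 digest was stored on the chain.

```python
import hashlib


def on_chain_digest(value: str) -> str:
    """What would be written to the (public, permanent) chain."""
    return hashlib.sha256(value.encode()).hexdigest()


def recover_by_enumeration(digest: str, candidates):
    """Try every candidate until one hashes to the published digest."""
    for candidate in candidates:
        if on_chain_digest(candidate) == digest:
            return candidate
    return None


# Hypothetical record: a six-digit customer number.
stored = on_chain_digest("customer:004217")

# Anyone can read the digest and simply try all one million possibilities.
candidates = (f"customer:{n:06d}" for n in range(1_000_000))
print(recover_by_enumeration(stored, candidates))  # customer:004217
```

Because the digest can never be removed from the chain, an attacker has unlimited time to run this kind of search, also against much larger input spaces.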

What problems are solved by a blockchain?

If blockchain systems are so incredibly complex, a reasonable question to ask would be: why in the world would anyone want to use them at all? The reason is of course that blockchains also solve problems, albeit a very specific set of problems. However, if you have any of these problems, there is hardly any better solution to them than a blockchain.

Censorship resistance

The main reason why anyone bothers with all the hassle of the consensus mechanism is that it adds protection against censorship. Data that has been recorded on the blockchain will be public and unchanged. Forever. No central entity, such as an authoritarian state, can change that unless they take down every single node on the network, something that might prove difficult if the nodes are distributed among different jurisdictions.

Trustlessness & immutability

Most of the design choices made in blockchain technology are about removing trust. All the hashing, signing and verification back and forth, as well as the multiple copies of the blockchain, serve to remove the need for trust. The whole system is designed with the assumption that everyone will misbehave as much as they possibly can, and thus appropriate measures have been taken to remove the possibilities to misbehave. This contrasts with traditional security solutions, e.g., PKI, where a lot of trust is placed in just a few actors, all of which have the possibility to misbehave.

A consequence of this lack of trust is that we must rethink our security model when building applications on the blockchain. In traditional web applications the client is served from the server and the client blindly trusts everything the server says. In a blockchain system, the client should ideally be a local client on the user’s computer or at least be served from a server independent from the blockchain. Furthermore, it should not rely on data from a single server or node. Instead, it should request the same data from multiple nodes in the blockchain and compare the results to make sure that no single node is lying to it. This might sound complicated, and perhaps it is, but this way of designing the system moves power from the server owner to the user of the system, which results in a democratization of technology.
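The idea of asking several nodes and comparing their answers could look roughly like this. It is a sketch under assumptions: each node is represented by a plain callable standing in for a network request, and the read_with_quorum function and its min_agreeing threshold are names invented for this example.

```python
from collections import Counter


def read_with_quorum(fetchers: list, key: str, min_agreeing: int):
    """Ask every node for the same key and only accept a value that at
    least min_agreeing of them report."""
    answers = [fetch(key) for fetch in fetchers]
    value, count = Counter(answers).most_common(1)[0]
    if count < min_agreeing:
        raise RuntimeError(f"no value for {key!r} reported by {min_agreeing} nodes")
    return value


# Stand-ins for network calls to four different nodes, one of which lies.
honest_state = {"balance:alice": 40}
lying_state = {"balance:alice": 4000}
nodes = [honest_state.get, honest_state.get, lying_state.get, honest_state.get]

print(read_with_quorum(nodes, "balance:alice", min_agreeing=3))  # 40
```

How many nodes to ask, and how many of them must agree, is a design decision that depends on how many dishonest nodes the client wants to tolerate.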

Public by default

As has been said many times in this post already, all data on the blockchain is always public. This, as we saw in previous sections, can be both a blessing and a curse. However, if you need to build a fully transparent system where information is supposed to be widely available, blockchains are a great way to ensure that, especially in combination with the previous point about trustlessness and immutability, because it ensures that no one can cheat. It provides a way to give control back to the user.

But when can we make use of a blockchain?

Now that we know how blockchains work under the hood, as well as what benefits they can provide and what drawbacks they have, it should be obvious to everyone that most companies should stay far away from blockchains. However, you might want to build a system that does not rely on one specific entity but instead on a majority vote from a community of smaller actors, a system where all information is public and where users themselves can partake in setting the rules, and where it is more important that operations are correct forever than that they are fast. In short, if you want to build a system that belongs to its users, blockchains could be a good choice.


  1. https://www.derpturkey.com/merkle-tree-construction-and-proof-of-inclusion/amp/
  2. https://ethereum.org/en/developers/docs/consensus-mechanisms/pow/
  3. https://www.fool.com/investing/stock-market/market-sectors/financials/cryptocurrency-stocks/proof-of-stake/
  4. https://limechain.tech/blog/delegated-proof-of-stake-explained/
  5. https://www.ledger.com/academy/glossary/transactions-per-second-tps
  6. https://cryptopotato.com/byzantine-fault-tolerance-in-blockchain-a-closer-look/
