What the heck is blockchain?
Lately, I've been talking more and more about blockchain and its potential impact. As I've learned more about the technology and shared what I've learned with friends, I decided it would be useful to write an introduction to it, paving the way for subsequent posts on how this technology may be used to create value in our society.
At a high level, a blockchain allows us to store information in a resilient, tamper-resistant manner. Because of their structure, blockchains remove the need to place trust in some central authority to maintain and secure our data. In this post, we'll discuss how to store information in a blockchain and what makes it resistant to fraudulent activity and data tampering.
Hashing functions
The first concept we must discuss in order to talk about blockchains is the hash function, specifically the cryptographic hash function. Hash functions map data of arbitrary size to data of a fixed size (referred to as the hash). Cryptographic hash functions are a special type of hash function: computing the hash of some input data is easy, but recreating the input data from a given hash value is computationally infeasible. In this sense, a cryptographic hash function is considered a "one-way" function.
The SHA256 algorithm is a popular cryptographic hashing function that is used widely across many industries to keep data (such as passwords) secure. You can play with a demo here which will calculate the hash of any input data you provide. For example, if your input data is "This is a test." the resulting hash will always be "a8a2f6ebe286697c527eb35a58b5539532e9b3ae3b64d4eb0a46fb657b41562c".
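If you'd rather try this locally than in the demo, here is a minimal sketch using Python's standard `hashlib` module. The helper name `sha256_hex` is my own, and I'm assuming the input text is encoded as UTF-8 before hashing.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 hash of `text` as a hexadecimal string."""
    # Assumption: the text is encoded as UTF-8 before it is hashed.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


digest = sha256_hex("This is a test.")
print(digest)       # the same input always yields the same hash
print(len(digest))  # 64 hex characters (256 bits), whatever the input size

# Changing a single character produces an unrelated-looking hash.
print(sha256_hex("This is a test!"))
```

Notice that nothing in the module goes the other way: there is no function that takes a hash and hands back the input.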
Creating a block
Blockchains are made of individual blocks that each contain data. For the case of a digital asset such as bitcoin, this data will be transactions between people sending some quantity of the asset to one another.
Each block contains some header information (discussed in the following section), the data to be stored, and something called a nonce. The nonce is an arbitrary number that we combine with the data stored in the block to alter the resulting hash. When someone refers to "mining" with respect to the blockchain, what they're really referring to is searching for a nonce value such that the resulting hash meets some criteria. For example, we might require that the hash start with four zeros. In reality, we could set our criteria to any number of things; the important part is that we're making the miner search (i.e. work) for a nonce that will yield a hash meeting our criteria. Similar to the previous section, you can explore what a block is with this demo. Go ahead and input some data, and then change the nonce value, observing what happens to the resulting hash.
When you click the "Mine" button, the computer will start with an initial nonce value of 1 and increment the nonce until the resulting hash meets the specified criteria. In this demo, we're looking for a nonce value that results in a hash beginning with four zeros.
Because the cryptographic hash function is considered a "one-way" function, there's no way to reverse engineer the function to calculate the proper nonce value. Thus, we must simply brute force guess the right nonce. This brute force guessing takes work, and finding the right nonce value serves as a "proof of work". This is an important part of the process, one that we'll revisit later.
A block is considered valid if its hash meets the specified criteria.
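To make the search concrete, here is a rough sketch of mining a single block. It is not the demo's actual code: the way I combine the nonce and data into one string, and the function names `block_hash`, `mine` and `is_valid`, are assumptions made for illustration.

```python
import hashlib
from typing import Tuple


def block_hash(data: str, nonce: int) -> str:
    """Hash the block contents together with a candidate nonce."""
    # Assumption: nonce and data are simply concatenated into one string.
    return hashlib.sha256(f"{nonce}{data}".encode("utf-8")).hexdigest()


def mine(data: str, prefix: str = "0000") -> Tuple[int, str]:
    """Brute-force search: start at nonce 1 and increment until the hash fits."""
    nonce = 1
    while True:
        digest = block_hash(data, nonce)
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1


def is_valid(data: str, nonce: int, prefix: str = "0000") -> bool:
    """Checking a claimed nonce takes one hash, however long the search took."""
    return block_hash(data, nonce).startswith(prefix)


nonce, digest = mine("Alice pays Bob 5")
print(nonce, digest)
print(is_valid("Alice pays Bob 5", nonce))  # True
```

With four leading hex zeros, roughly one hash in 65,536 qualifies, so the search usually takes tens of thousands of attempts while the check takes exactly one. That asymmetry is what makes the nonce a usable proof of work.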
Chaining blocks together
The real value emerges when we chain blocks together as a linked list. Each block has a header which contains information on the block's sequence number and the hash of the previous block. If we include the previous hash (combined with the data and nonce value of the current block) when calculating the hash of the current block, we can create a dependency chain between blocks.
As new information flows in, we create a new block and append it to this linked list.
Recall that the hash of a block depends on three things:
- The hash of the previous block.
- The data of the current block.
- The nonce of the current block.
\[ \mathrm{SHA256}\left(\text{prev hash},\ \text{data},\ \text{nonce}\right) \to \text{hash} \]
If we were to update the data of the middle block (highlighted below in purple), this would change the resulting hash of that block. Moreover, because the blocks are linked together by including the hash of the previous block, the following block is now changed as well (highlighted in yellow). After making a change to one block, neither that block nor any subsequent block meets the hash criteria required to be considered valid (in this example, the hashes no longer begin with four zeros).
If we wanted to make a change to a block in the blockchain, we'd need to re-"mine" the block in order to find the proper nonce such that our hash meets the specified criteria. And because the blocks are linked, we'd need to find a new nonce value (by re-"mining") for every subsequent block such that all of the updated hash values meet the specified criteria. Thus, it is possible to change data stored on a blockchain, but it would take work to find the new nonce values in order for our hash to meet the specified criteria. This introduces a cost for changing data in the blockchain.
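Here is a small sketch of that dependency chain in code. The `Block` class, the `|` separator used when hashing, and the all-zero "previous hash" for the first block are my own illustrative choices, not part of any real protocol.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional

PREFIX = "0000"          # the hash criteria used throughout this post
GENESIS_PREV = "0" * 64  # assumption: placeholder previous hash for block 1


@dataclass
class Block:
    index: int
    prev_hash: str
    data: str
    nonce: int = 1

    def hash(self) -> str:
        # The hash depends on the previous hash, the data and the nonce.
        payload = f"{self.index}|{self.prev_hash}|{self.data}|{self.nonce}"
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def mine(self) -> None:
        self.nonce = 1
        while not self.hash().startswith(PREFIX):
            self.nonce += 1


def append_block(chain: List[Block], data: str) -> None:
    prev_hash = chain[-1].hash() if chain else GENESIS_PREV
    block = Block(index=len(chain) + 1, prev_hash=prev_hash, data=data)
    block.mine()
    chain.append(block)


def first_invalid(chain: List[Block]) -> Optional[int]:
    """Return the position of the first block that fails validation, or None."""
    prev_hash = GENESIS_PREV
    for position, block in enumerate(chain):
        if block.prev_hash != prev_hash or not block.hash().startswith(PREFIX):
            return position
        prev_hash = block.hash()
    return None


chain: List[Block] = []
for entry in ["Alice pays Bob 5", "Bob pays Carol 2", "Carol pays Dan 1"]:
    append_block(chain, entry)
print(first_invalid(chain))  # None: every block validates

chain[1].data = "Bob pays Mallory 2"  # tamper with the middle block
print(first_invalid(chain))  # no longer None: validation now fails
```

Re-mining the tampered block alone doesn't repair the chain: its new hash no longer matches the `prev_hash` stored in the next block, so that block (and each one after it) has to be updated and re-mined in turn.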
I encourage you to play around with the demo to see how changing information in one block affects the subsequent blocks.
A distributed network
Up until now, we've discussed a somewhat annoying way of keeping track of information. Why must we subject ourselves to so much work each time we create a block? Thus far, nothing we've discussed is "tamper-proof"; we've just made it laborious (and thus costly) to go back and make changes to the blockchain. What's the big deal?
So far we've discussed the blockchain from the perspective of a single player, but the true blockchain protocol requires a network of participants, each maintaining their own blockchain. Thus, our ledger of data must be distributed across a network.
Each member of the network is responsible for keeping their blockchain up to date and accurate by communicating with other participants on the network and listening for new blocks.
Network participants can broadcast new information to be stored on the blockchain. Block creators (i.e. miners) listen for new information being broadcast and collect this information into a block, then go to work searching for a nonce value such that the block's hash meets the criteria. Once a block creator has discovered such a nonce, they broadcast the block to the network, and network participants add it to the end of their chain. Anyone can easily verify that the block creator expended work to create this block by confirming that the block hash meets the specified criteria. Additionally, the block creator is typically rewarded for their work, which provides the incentive to continue mining.
Participants can exit and enter the network at any time; upon entering or re-entering the network, they simply adopt the most trusted blockchain found on the network and listen for new blocks broadcast by block creators. Additionally, if at some point your blockchain conflicts with someone else's blockchain on the network, you defer to the longest chain as the true ledger. We consider the longest chain on the network to be the most trusted because that chain has had the most work put into it. By doing so, we've introduced a way to reach decentralized consensus.
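The "defer to the longest chain" rule can be sketched in a few lines. This is a simplification under stated assumptions: blocks are hypothetical `(prev_hash, data, nonce)` tuples, every block has the same difficulty (so length stands in for total work), and I use a two-zero prefix rather than four so the example runs quickly.

```python
import hashlib
from typing import List, Tuple

Block = Tuple[str, str, int]  # (prev_hash, data, nonce): a hypothetical layout
PREFIX = "00"                 # easier than four zeros, to keep this quick
GENESIS_PREV = "0" * 64       # assumption: placeholder for the first block


def block_hash(block: Block) -> str:
    prev_hash, data, nonce = block
    return hashlib.sha256(f"{prev_hash}|{data}|{nonce}".encode("utf-8")).hexdigest()


def mine(prev_hash: str, data: str) -> Block:
    nonce = 1
    while not block_hash((prev_hash, data, nonce)).startswith(PREFIX):
        nonce += 1
    return (prev_hash, data, nonce)


def extend(chain: List[Block], data: str) -> List[Block]:
    """Return a new chain with one freshly mined block appended."""
    prev_hash = block_hash(chain[-1]) if chain else GENESIS_PREV
    return chain + [mine(prev_hash, data)]


def is_valid_chain(chain: List[Block]) -> bool:
    """Every block must link to its predecessor and meet the hash criteria."""
    prev_hash = GENESIS_PREV
    for block in chain:
        if block[0] != prev_hash or not block_hash(block).startswith(PREFIX):
            return False
        prev_hash = block_hash(block)
    return True


def choose_chain(local: List[Block], remote: List[Block]) -> List[Block]:
    """Adopt the remote chain only if it is valid and strictly longer."""
    if is_valid_chain(remote) and len(remote) > len(local):
        return remote
    return local


shared = extend(extend([], "tx A"), "tx B")
local = extend(shared, "tx C")                     # our fork: 3 blocks
remote = extend(extend(shared, "tx D"), "tx E")    # their fork: 4 blocks
print(len(choose_chain(local, remote)))  # 4: the longer valid chain wins
```

The validity check matters as much as the length comparison. Without it, a participant could hand you an arbitrarily long chain of blocks that nobody ever did any work on.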
Now let's revisit this concept of "proof of work" and consider why we might subject ourselves to so much work in order to add or change information to the blockchain.
Let's suppose that you broadcast a transaction to be stored on the blockchain and one of the miners decides to change this transaction. For a concrete example, suppose you were trying to send your mother some money, but the miner decides to rewrite the transaction such that you send the miner money instead of your mother. The miner could make this change to the data and then get to work finding the right nonce such that the block is considered valid, broadcasting their block to the network once they find the right nonce.
However, as long as there are other miners on the network that are not in coordination with the malevolent miner, they'll broadcast a different block to be added to the chain (one containing the true information).
Now that there's a conflict, we must wait until one chain grows to be longer than the other. In order for the malevolent miner to maintain the longest chain, they would have to work at a breakneck pace, furiously mining new blocks in order to keep up with the valid blockchain that all of the other miners are contributing to. At a certain point, it simply becomes infeasible for a single entity to maintain this fraudulent chain unless they control a majority (more than 50%) of the network's mining power.
Thus, because we've required a certain amount of work to be put into mining a block, we're able to make fraudulent activity cost-prohibitive. Further, because miners only receive a reward for blocks that they contribute to the trusted blockchain, they're incentivized to maintain the most accurate blockchain themselves.
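To put rough numbers on "infeasible", the original bitcoin paper (listed under Further reading) models this race as a random walk: an attacker holding a fraction q of the mining power who is z blocks behind ever catches up with probability (q/p)^z, where p = 1 - q, and with certainty once q reaches p. The sketch below just evaluates that formula; the function name and the example shares are my own choices.

```python
def catch_up_probability(attacker_share: float, blocks_behind: int) -> float:
    """Chance that an attacker ever catches up from `blocks_behind` blocks back.

    Uses the gambler's-ruin result from the original bitcoin paper:
    (q / p) ** z when q < p, and 1 otherwise.
    """
    if not 0.0 <= attacker_share <= 1.0:
        raise ValueError("attacker_share must be between 0 and 1")
    if blocks_behind < 0:
        raise ValueError("blocks_behind must be non-negative")
    honest_share = 1.0 - attacker_share
    if attacker_share >= honest_share:
        return 1.0
    return (attacker_share / honest_share) ** blocks_behind


# Illustrative shares only; 6 blocks is an arbitrary head start.
for share in (0.10, 0.30, 0.45, 0.51):
    print(f"{share:.0%} of mining power: {catch_up_probability(share, 6):.6f}")
```

The probability shrinks exponentially with every block the honest chain gets ahead, which is why a minority attacker's odds collapse so quickly while a majority attacker eventually wins.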
The fact that the blockchain is replicated and distributed across many participants in the network is an important one. Rather than trusting one central authority, we've now developed a system that allows us to place our trust into a network.
Consensus protocols
The "proof of work" scheme discussed in this post is loosely based on the consensus protocol for bitcoin (the hash criteria were simplified for this introductory post). However, it's worth noting that there exists a whole family of consensus protocols for determining what to trust in a decentralized network. The two main approaches today for a consensus protocol are proof of work and proof of stake; the differences between the two approaches are discussed here.
Summary
The blockchain protocol has introduced a way to securely keep track of information without the need to place trust in a central authority. This protocol combines computer networking, cryptographic techniques, and economic incentives to introduce a method for maintaining a distributed, decentralized ledger that remains secure as long as there are sufficient participants in the network.
This approach of securely storing information on a network can be used for anything from keeping track of financial transactions (Bitcoin) to hosting code that can be run autonomously (Ethereum). A large number of industries are now looking at the technology to see how they can take advantage of its benefits.
Further reading
In future posts, I'll take a look at bitcoin, smart contracts, and token economies. Until then, enjoy these resources to continue exploring blockchain technologies.
- Original bitcoin paper
- History of blockchain developments
- Blockchain demo by Anders Brownworth
- Blockchains don’t scale. Not today, at least. But there’s hope.
- A Letter to Jamie Dimon - and anyone else still struggling to understand cryptocurrencies
- Hackernoon: 3 steps to understanding blockchain
- Making Sense of “Cryptoeconomics”
- A Hitchhiker’s Guide to Consensus Algorithms
- Learn Blockchains by Building One