Blockchain is a way of storing digital data. The data can literally be anything. For Bitcoin, it’s the transactions (logs of transfers of Bitcoin from one account to another), but it can even be files; it doesn’t matter. The data is stored in the form of blocks, which are linked (or chained) together using cryptographic hashes — hence the name “blockchain.”
All of the magic lies in the way this data is stored and added to the blockchain. A blockchain is essentially a linked list that contains ordered data, with a few constraints such as:
- Blocks can’t be modified once added; in other words, it is append only.
- There are specific rules for appending data to it.
- Its architecture is distributed.
Enforcing these constraints yields the following benefits:
- Immutability and durability of data
- No single point of control or failure
- A verifiable audit trail of the order in which data was added
We’ll be storing data in our blockchain in a format that’s widely used: JSON. Here’s what a post stored in blockchain will look like:
The generic term “data” is often replaced on the internet by the term “transactions.” So, just to avoid confusion and maintain consistency, we’ll be using the term “transaction” to refer to data in our example application.
The transactions are packed into blocks. A block can contain one or many transactions. The blocks containing the transactions are generated frequently and added to the blockchain. Because there can be multiple blocks, each block should have a unique ID.
We’d like to prevent any kind of tampering in the data stored inside the block, and detection is the first step to that. To detect if the data in the block has been tampered with, you can use cryptographic hash functions.
A hash function is a function that takes data of any size and produces data of a fixed size from it (a hash), which is generally used to identify the input. The characteristics of an ideal hash function are:
- It should be easy to compute.
- It should be deterministic, meaning the same data will always result in the same hash.
- It should be uniformly random, meaning even a single bit change in the data should change the hash significantly.
The consequence of this is:
- It is virtually impossible to guess the input data given the hash. (The only way is to try all possible input combinations.)
- If you know both the input and the hash, you can simply pass the input through the hash function to verify the provided hash.
This asymmetry of efforts that’s required to figure out the hash from an input (easy) vs. figuring out the input from a hash (almost impossible) is what blockchain leverages to obtain the desired characteristics.
We’ll store the hash of the block in a field inside our Block object, and it will act like a digital fingerprint (or signature) of data contained in it:
from hashlib import sha256
Note: In most cryptocurrencies, even the individual transactions in the block are hashed and then stored to form a hash tree (also known as a merkle tree). The root of the tree usually represents the hash of the block. It’s not a necessary requirement for the functioning of the blockchain, so we’re omitting it to keep things simple.
Okay, we’ve now set up the blocks. The blockchain is supposed to be a collection of blocks. We can store all the blocks in the Python list (the equivalent of an array). But this is not sufficient, because what if someone intentionally replaces an old block with a new block in the collection? Creating a new block with altered transactions, computing the hash, and replacing it with any older block is no big deal in our current implementation.
We need a way to make sure that any change in the previous blocks invalidates the entire chain. The Bitcoin way to do this is to create dependency among consecutive blocks by chaining them with the hash of the block immediately previous to them. By chaining here, we mean to include the hash of the previous block in the current block in a new field called previous_hash.
Okay, if every block is linked to the previous block through the previous_hash field, what about the very first block? That block is called the genesis block and it can be generated either manually or through some unique logic. Let’s add the previous_hash field to the Block class and implement the initial structure of our Blockchain class.
from hashlib import sha256
Now, if the content of any of the previous blocks changes:
- The hash of that previous block would change.
- This will lead to a mismatch with the previous_hash field in the next block.
- Since the input data to compute the hash of any block also consists of the previous_hash field, the hash of the next block will also change.
Ultimately, the entire chain following the replaced block is invalidated, and the only way to fix it is to recompute the entire chain.
There is one problem, though. If we change the previous block, the hashes of all the blocks that follow can be re-computed quite easily to create a different valid blockchain. To prevent this, we can exploit the asymmetry in efforts of hash functions that we discussed earlier to make the task of calculating the hash difficult and random. Here’s how we do this: Instead of accepting any hash for the block, we add some constraint to it. Let’s add a constraint that our hash should start with “n leading zeroes” where n can be any positive integer.
We know that unless we change the data of the block, the hash is not going to change, and of course we don’t want to change existing data. So what do we do? Simple! We’ll add some dummy data that we can change. Let’s introduce a new field in our block called nonce. A nonce is a number that we can keep on changing until we get a hash that satisfies our constraint. The nonce satisfying the constraint serves as proof that some computation has been performed. This technique is a simplified version of the Hashcash algorithm used in Bitcoin. The number of zeroes specified in the constraint determines the difficulty of our proof of work algorithm (the greater the number of zeroes, the harder it is to figure out the nonce).
Also, due to the asymmetry, proof of work is difficult to compute but very easy to verify once you figure out the nonce (you just have to run the hash function again):
Notice that there is no specific logic to figuring out the nonce quickly; it’s just brute force. The only definite improvement that you can make is to use hardware chips that are specially designed to compute the hash function in a smaller number of CPU instructions.
To add a block to the chain, we’ll first have to verify that:
- The data has not been tampered with (the proof of work provided is correct).
- The order of transactions is preserved (the previous_hash field of the block to be added points to the hash of the latest block in our chain).
Let’s see the code for adding blocks into the chain:
The transactions will be initially stored as a pool of unconfirmed transactions. The process of putting the unconfirmed transactions in a block and computing proof of work is known as the mining of blocks. Once the nonce satisfying our constraints is figured out, we can say that a block has been mined and it can be put into the blockchain.
In most of the cryptocurrencies (including Bitcoin), miners may be awarded some cryptocurrency as a reward for spending their computing power to compute a proof of work. Here’s what our mining function looks like:
from hashlib import sha256
Okay, now it’s time to create interfaces for our blockchain node to interact with the application we’re going to build. We’ll be using a popular Python microframework called Flask to create a REST API that interacts with and invokes various operations in our blockchain node. If you’ve worked with any web framework before, the code below shouldn’t be difficult to follow along.
These REST endpoints can be used to play around with our blockchain by creating some transactions and then mining them.
from flask import Flask, request
Up to this point, the blockchain that we’ve implemented is meant to run on a single computer. Even though we’re linking block with hashes and applying the proof of work constraint, we still can’t trust a single entity (in our case, a single machine). We need the data to be distributed, we need multiple nodes maintaining the blockchain. So, to transition from a single node to a peer-to-peer network, let’s first create a mechanism to let a new node become aware of other peers in the network:
# Contains the host addresses of other participating members of the network
A new node participating in the network can invoke the register_with_existing_node method (via the /register_with endpoint) to register with existing nodes in the network. This will help with the following:
- Asking the remote node to add a new peer to its list of known peers.
- Initializing the blockchain of the new node with that of the remote node.
- Resyncing the blockchain with the network if the node goes off-grid.
However, there’s a problem with multiple nodes. Due to intentional manipulation or unintentional reasons (like network latency), the copy of chains of a few nodes can differ. In that case, the nodes need to agree upon some version of the chain to maintain the integrity of the entire system. In other words, we need to achieve consensus.
A simple consensus algorithm could be to agree upon the longest valid chain when the chains of different participating nodes in the network appear to diverge. The rationale behind this approach is that the longest chain is a good estimate of the most amount of work done (remember proof of work is difficult to compute):
Next, we need to develop a way for any node to announce to the network that it has mined a block so that everyone can update their blockchain and move on to mine other transactions. Other nodes can simply verify the proof of work and add the mined block to their respective chains (remember that verification is easy once the nonce is known):
# endpoint to add a block mined by someone else to
The announce_new_block method should be called after every block is mined by the node so that peers can add it to their chains.
Now, it’s time to start working on the interface of our application. We’ve used Jinja2 templating to render the web pages and some CSS to make things look nice.
Our application needs to connect to a node in the blockchain network to fetch the data and also to submit new data. There can also be multiple nodes, as well.
The fetch_posts function gets the data from the node’s /chain endpoint, parses the data, and stores it locally.
The application has an HTML form to take user input and then makes a POST request to a connected node to add the transaction into the unconfirmed transactions pool. The transaction is then mined by the network, and then finally fetched once we refresh our web page: