If you have been reading the news recently, you might have noticed a cryptocurrency called Bitcoin, which has been gaining incredible interest. At the time of writing, each Bitcoin has a value of over $10,000, with a total market capitalisation of approximately $180 billion. Underlying Bitcoin is blockchain, a very clever way of maintaining decentralised data on the Internet. In this article we are going to look at how Bitcoin and blockchain work and how blockchain can be used to the advantage of machine learning applications.
Bitcoin and the Blockchain
In essence, a blockchain is a distributed database in which each computer in the network can potentially modify the database, but modifications are only accepted if there is a consensus. The "block" part refers to a record or set of records, and the blocks are chained together using cryptographic tools such as public key cryptography and hashing in order to ensure security. In the example of Bitcoin, the blockchain represents a ledger such as the one shown below.
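To make the "chained together using hashing" idea concrete, here is a minimal sketch in Python. It is not how Bitcoin itself is implemented: the block layout, the function names (block_hash, make_block, build_chain, is_valid) and the all-zero placeholder hash are assumptions made purely for illustration, and consensus and public key signatures are left out entirely. Each block stores the hash of the block before it, so altering an earlier block breaks the link to its successor.

```python
import hashlib
import json


def block_hash(block):
    """Return the SHA-256 hex digest of a block's contents."""
    # sort_keys gives a stable serialisation, so equal blocks hash equally.
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def make_block(records, previous_hash):
    """Bundle a set of records with the hash of the preceding block."""
    return {"records": records, "previous_hash": previous_hash}


def build_chain(list_of_records):
    """Build a list of blocks, each one pointing at the block before it."""
    chain = []
    previous_hash = "0" * 64  # illustrative placeholder for the first block
    for records in list_of_records:
        block = make_block(records, previous_hash)
        chain.append(block)
        previous_hash = block_hash(block)
    return chain


def is_valid(chain):
    """Check that every block stores the hash of its predecessor."""
    for previous, current in zip(chain, chain[1:]):
        if current["previous_hash"] != block_hash(previous):
            return False
    return True


chain = build_chain([["Alice pays Bob $10"], ["record B"], ["record C"]])
print(is_valid(chain))  # True

# Tampering with an early block changes its hash and breaks the chain.
chain[0]["records"][0] = "Alice pays Bob $1000"
print(is_valid(chain))  # False
```

In this toy version the last block has no successor, so a change to it would go unnoticed. A real network relies on the consensus step to handle that.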
Each line in the table represents a transaction. For example, in the first row Alice pays Bob $10. By keeping a list of all transactions and knowing how much each person has at the start we know how much they currently have. If each person starts with $10 then it is easy to work through and see that Alice, Bob and Carol end up with $8, $15, and $7 respectively.
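The working-through described above can be sketched in a few lines of Python. Only the first transaction (Alice pays Bob $10) is taken from the text. The other two rows are hypothetical, chosen so that the final balances match the $8, $15 and $7 quoted above, and the function name apply_transactions is likewise just an illustrative choice.

```python
def apply_transactions(starting_balances, transactions):
    """Replay a list of (sender, receiver, amount) rows over the balances."""
    balances = dict(starting_balances)  # copy so the input is left untouched
    for sender, receiver, amount in transactions:
        if balances[sender] < amount:
            # A valid ledger never lets someone spend more than they hold.
            raise ValueError(f"{sender} cannot afford to pay {amount}")
        balances[sender] -= amount
        balances[receiver] += amount
    return balances


starting = {"Alice": 10, "Bob": 10, "Carol": 10}
transactions = [
    ("Alice", "Bob", 10),   # the first row described in the text
    ("Bob", "Carol", 5),    # hypothetical row
    ("Carol", "Alice", 8),  # hypothetical row
]

print(apply_transactions(starting, transactions))
# {'Alice': 8, 'Bob': 15, 'Carol': 7}
```

The check inside the loop is the simplest form of transaction validation: a payment that would overdraw the sender is rejected rather than recorded.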
One of the innovations of Bitcoin and blockchain is allowing this kind of ledger to be kept decentralised and to avoid fraudulent transactions by using consensus mechanisms. This is achieved without the need for a central trusted authority such as a bank. Since there is a finite number of Bitcoins (21 million), which are released incrementally, they can be used as a store of value just like any other currency.
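The 21 million figure follows from Bitcoin's release schedule, which is not spelled out in this article: the protocol started with a reward of 50 bitcoins per block and halves it every 210,000 blocks, counting in whole satoshis (one hundred-millionth of a bitcoin). The sketch below simply adds up those rewards. The constant and function names are my own.

```python
SATOSHIS_PER_BITCOIN = 100_000_000
BLOCKS_PER_HALVING = 210_000
INITIAL_REWARD = 50 * SATOSHIS_PER_BITCOIN  # block reward, in satoshis


def total_supply_satoshis():
    """Sum the block rewards over every halving period."""
    total = 0
    reward = INITIAL_REWARD
    while reward > 0:
        total += BLOCKS_PER_HALVING * reward
        reward //= 2  # integer halving; the reward eventually reaches zero
    return total


supply = total_supply_satoshis() / SATOSHIS_PER_BITCOIN
print(f"{supply:,.4f} bitcoins")  # just under 21,000,000
```

Because the reward is halved with integer division, the total ends up marginally below 21 million rather than exactly on it.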
The implications of a cryptographic currency such as Bitcoin are potentially revolutionary. Transactions can occur globally in minutes, not days as with some bank transactions, and fees are low. In addition, Bitcoin payments cannot be reversed, which avoids the kind of fraud that is possible by reversing credit card or bank payments.
Naturally there are some disadvantages to Bitcoin. Storing money on phones and computers makes them even greater targets for hackers. In addition, the currency is extremely volatile and has had difficulty scaling to its current level of usage.
Blockchain and Machine Learning
So what has all of this got to do with machine learning? Since the inception of Bitcoin in 2008, there have been many other uses for blockchain-based technology. We focus on Golem, a "worldwide supercomputer"; iExec, a platform for building decentralised applications; and Filecoin, a storage network.
Golem and iExec
The idea of Golem and iExec is to use spare computing power from idle computers. Such an idea is not novel. For example, in 1999 SETI provided a program called SETI@home, which analysed radio signals using spare computing power whenever a screensaver was active. SETI@home was extraordinarily successful, having over 145,000 active computers in the system as of 23 June 2013. In addition, HTCondor uses spare computing power on local networks.
Golem and iExec differ from SETI@home and HTCondor in that they are more general purpose and users pay for the computing power they use with a cryptocurrency (based on Ethereum). If the price of such a service were competitive with cloud computing services (and Golem's FAQ implies it will be), it could reduce the cost of distributed computing whilst compensating providers to incentivise them.
There are naturally many hurdles which must be overcome before the technology is ready for real applications. For example, how can data privacy be ensured, and how can the system be kept secure? Golem's initial use case is a CGI rendering task, but it is easy to see that it can be applied to machine learning applications. It would be particularly useful for deep learning if it could allow easy and cheap access to GPU computing (iExec plans to provide GPU access).
Filecoin and IPFS
Filecoin is a blockchain application which allows users to rent out their spare storage space. According to this site, 50% of the storage space available worldwide is unused. Filecoin aims to tap into this massive resource by creating a storage marketplace paid for using Filecoin tokens. One of the advantages of a decentralised file storage network is improved resilience to failure compared to centralised storage.
Filecoin is based on the InterPlanetary File System (IPFS), which markets itself as a peer-to-peer hypermedia protocol. Its whitepaper states: "IPFS could be seen as a single Bittorrent swarm, exchanging objects within one Git repository." The version history of every file is stored along with indexing information and human-readable names.
Clearly storage is another important consideration in machine learning applications, particularly Big Data ones. Filecoin and similar technologies such as peer-to-peer databases can play an important role in providing resources for these kinds of applications.
Bitcoin is making waves in the financial world, and its underlying technology, blockchain, has been shown to have numerous applications. We looked at the blockchain-based compute and storage solutions Golem, iExec and Filecoin/IPFS. Although these are at very early stages of their development, their potential is great. Cheap, reliable and plentiful computing resources promise to make machine learning more accessible, accurate and scalable.
- "Bitcoin: A Peer-to-Peer Electronic Cash System" by Satoshi Nakamoto
- BigchainDB, a blockchain database
- Bitcoin: Seven questions you were too embarrassed to ask
- Filecoin White Paper
- Gridcoin is another decentralised computing platform with cryptocurrency rewards
- Golem White Paper
- How blockchain will disrupt traditional computing
- Learning blockchains by building one
- orbit-db, a distributed database based on IPFS