Random or Pseudo-random?
There seems to be quite a misunderstanding among people when it comes to randomness in the digital domain. As it turns out a computer is a deterministic entity, which follows certain fixed patterns: It takes data as input, manipulates it, and produces some output. Deterministic means that the output is determined based on the input data and the manipulation rules. Thus a machine can not create a RANDOM output (produce random data). You can not tell it to pick a random input or use a random function. When you type 7 + 3 + 2 in a calculator, it will always return 12. The same holds for any digital device.
It can however produce data, which is almost as good as random. This is called pseudo-randomness and consists of taking multiple hard to predict inputs and mixing them with one-way functions such as a cryptographic hash. The most common inputs are current time, current temperature, or any form of noise – each with a hundred-digit precision. In theory, anyone knowing what input data and which functions the computer uses, can predict the output. However, since time, temperature, etc. are not fixed, but constantly changing(floating), it is very hard to determine the precise value used as input. Due to the unpredictable properties of both the input data and the one-way functions, the resulting output holds many of the properties of a true random variable – hence the term pseudo-random.
[To make it clear: if you instruct a computer to use the last 6 nanoseconds of the current time, this input is not exactly random, but it will still be different (almost) every time.]
Randomness in Bitcoin
The original Bitcoin client creates private keys in a (pseudo-)random manner: every new key-pair is independent from all the others. This adds an additional layer of security, since an attacker obtaining any number of private keys previously generated by the client, can not predict what keys will be generated in the future. The drawback is that the user needs to make backups quite frequently. Usually the client creates a set of 100 random keys. Once they are used, it creates another 100 and so on. You need to make a backup once a new set of 100 keys is created or you will never be able to recover them in case your wallet gets corrupted or deleted. If you fail to realise why, consider the following scenario:
1. You start using the original BTC client and it creates 100 random keys.
2. You make a backup of the 100 keys.
3. After a few months of usage you use up all 100 keys and the client creates another set of 100 random keys.
4. Unless you make a backup again, you will never be able to recover those new keys and any BTC associated with them, because they are randomly generated and since the initial backup only included the first 100 keys created.
Recall that due to security concerns, each key is only used for one transaction and replaced afterwards. Thus a user making frequent transactions will have to backup quite often. Some users may perceive this as inconvenient and developers created an alternative method of key generation, which sacrifices the security of random keys for better usability in the form of only one required backup. Welcome to deterministic wallets.
In very simple terms, deterministic wallets create one random private key and based on that, they generate all subsequent keys. They do this by performing arithmetic operations to the “core” private key. Remember that a private key is just a number as any other and can be subject to any operation such as addition, multiplication, etc. For example the very first key created could be
(core + 1) x 2, the second could be:
(core + 2) x 3, then
(core + 4) x 4,
(core + 8) x 5,
(core + 16) x 6,
(core + 32) x 7 etc. You can probably recognise the pattern. The reasoning is that the entire space for possible private keys is so big (2256) that it is impractical for an attacker to pinpoint the core key, the pattern used for address generation and based on that predict any subsequently generated keys. For the user, however, this brings the advantage of only having to backup the core key and the generation pattern. To make things even simpler, the backup consists only of a few dictionary words, which can be easily memorised or simply written on a piece of paper. If anything would happen to the wallet, it can be entirely recovered just by the word sequence provided. Any keys, no matter if 200 or 200.000 will be calculated and recovered from it as well.
This was a short overview of deterministic wallets and their advantages. Below I will discuss them in more detail and also talk about their biggest disadvantage.
The dirty details
The exact procedure of key generation in a hierarchically deterministic wallet is a s follows:
1. Choose an initial number, called SEED.
2. Feed the seed through a SHA512 hash function.
3. Split the resulting 512bit number in half.
4. Use the leftmost 256bits as “core” private key and derive from it the “core” public key.
5. Use the rightmost 256bits as “core” chaincode.
6. Feed the core public key, core chaincode, and a sequence number (1, 2, 3, 4, 5, …) through a SHA512 hash.
7. Split in half and add the resulting leftmost 256bits to the core private key to create a new child private key.
8. Use the resulting rightmost 256bits as the chaincode for the newly created private key.
9. Either go back to step 6, increase the sequence number (seq. +1 ) and derive the next private key from the core key (horizontal generation)
10. Do the same procedure with the newly created child key and it own chaincode (vertical generation).
The difference is that with 9. all new private keys will be direct children keys of the core key, while 10. creates grandchildren. Of course the hierarchy can go further down into grand-grand-grand-grand…. children. This is why this method is called hierarchically determined key generation: you start with one key and derive the rest in a hierarchical manner. Once again this may sound confusing, but hopefully the following diagram makes it more understandable:
There is an inherent weakness in this method of generating keys. It allows another person to know precisely which keys your wallet is going to create in the future and get access to all of them – i.e. steal all your Bitcoins. It is not immediately clear how, but try to understand the parent-chain of keypair 2.3.2 depicted below (follow the red arrows):
Most wallets do not even try to hide the core public key and the core chaincode. They even allow you to easily export them in order to create a so called “watch only” wallet. This type of wallets, reconstruct the whole hierarchy of public keys of your original wallet (Verify yourself that this is possible by knowing only the core public key and core chaincode). Since the private keys are missing, these wallets can not create transactions, however they are still able to check the balances of all your addresses – hence the term “watch only”. Now to the crucial point: As just mentioned, someone who knows your core public key and your core chaincode is able to reconstruct the whole hierarchy of your addresses. This means that he knows all the offsets between each level of addresses. If he somehow came into position of ONLY ONE of your private keys, he will be able to derive back your core private key and any other private keys that your wallet has created or will create at any point in the future!
Still not clear how? Lets do the math for keypair 2.3.2:
1. Attacker has core public key and -chaincode.
2. He reconstructs the whole hierarchy as seen in the picture above, only without any private keys.
3. He comes into possession of private key 2.3.2.
4. He calculates private key 2.3.2 MINUS Offset 0.0.3 and obtains private key 2.3.
5. He calculates private key 2.3 MINUS Offset 0.3 and obtains private key 2.
6. He calculates private key 2 MINUS Offset 2 and obtains your core private key.
7. From here he is able to just start recreating all possible keypairs in both vertical and horizontal direction. This gives him access to all your current keys and in addition to any keys that your wallet will create in the future. Whoops…
Thus the lesson from all of this, which almost no one will tell you: Exposing private keys is always a bad idea, but in the case of deterministic wallets, it easily means loosing ALL your Bitcoins.