Like Two Drops of Water
Identical twins develop from a single fertilised cell. They share the same DNA and yet they are not entirely identical. At first glance they may seem indistinguishable, but there are mismatches. As it turns out, no two people in the world share the same fingerprint. This of course also holds true for identical twins. There are roughly 7.3 billion people on the planet and every single one is uniquely identifiable by their fingerprint – quite an amazing fact. Thus even if our eyes fool us into believing that identical twins are entirely identical, a comparison of their fingerprints would prove this to be wrong.
A Fingerprint for Information
Is it possible to create something like a fingerprint for information? The answer is yes and many may have heard the term digital fingerprint. A digital fingerprint is the output of a special mathematical function – a hash function. This function takes arbitrary data as input and produces a unique identifier for the data. The output is of a fixed size and usually way smaller than the input itself (hence the term hash). The beauty of hash functions is that even if the inputs have only one tiny, minuscule difference – the resulting outputs would be entirely different. Thus if you would feed the entire Bible into a hash function, then change only one single letter in the entire book and “hash” it again, the obtained two “fingerprints” would be completely different.
Compared to identical twins, who are easily distinguishable by their fingerprints, two identically looking, but slightly different pieces of data can be distinguished by their hashes.
Importantly, hash functions are irreversible – they are so called “one-way-functions”. This simply means that given only a hash, there is no way of telling what the original data was. To visualise the properties of a hash function, consider the following two sentences and their respective hashes (of type SHA256). Note that both sentences differ only by the very last symbol, but their hashes have nothing in common. Also, given the hash alone you can not derive the original text.
Alexandrea est fere tota suffossa specusque habet a Nilo pertinentis, quibus aqua in privatas domos inducitur, quae paulatim spatio temporis liquescit ac subsidit.
Alexandrea est fere tota suffossa specusque habet a Nilo pertinentis, quibus aqua in privatas domos inducitur, quae paulatim spatio temporis liquescit ac subsidit!
Identify the Evil Twin
Some websites publish hashes alongside files they offer for download. This is a very simple, yet powerful method for ensuring that the content you receive is the one you are supposed to receive. If the hash of the downloaded content matches the published hash, you can be certain that no one intercepted your connection and fed you with a modified content.
Is it possible that two pieces of data produce the same hash? In theory the answer is yes. A secure hash function, however, ensures that this can not be systematically achieved. This property is called collision resistance. There are two types of collision resistance.
Weak collision resistance ensures that given a specific data and its hash, it is near impossible to find another data, which produces the very same hash.
Strong collision resistance ensures that it is near impossible to find any two sets of data which produce the same hash.
(The difference between WCR and SCR is subtle and you should not worry if you do not understand it. SCR lets someone choose any two sets of data, while WCR is limited to a specific piece of data.)
In the case of SHA256 the resulting hash is 256bits long. Thus, there are roughly 100000000000000000000000000000000000000000000000000000000000000000000000000000 (78 digits) possible hash outputs. Finding a collision would require some serious luck. You are more likely to get hit by a lightning 14 times than to find a SHA256 collision.