How exactly does hashing work, and why is it irreversible?

Question:

2009-01-07 06:41:15 UTC

If someone has access to the code used for hashing (like md5() in the open-source PHP code), why is it not reversible? Surely, you should be able to take the 32-bit string and re-do the steps one by one? I tried Google but didn't find any clear answers.

Thanks.

Five answers:

LuckyStrikes

2009-01-07 06:48:37 UTC

To decrypt data you need the encryption method used, the encryption key or pass phrase, and a program made specifically to undo every action the encrypting program performed with that key. That can be difficult. Say you are encrypting the word "hello" with the number "1", which the program uses to move each letter down one place in the alphabet: you get "ifmmp". Then let's say the program applies a secondary modification using the same pass phrase of "1", but this time multiplies "1" by "2" and shifts the letters again by the new value of "2", so you get "khoor". Obviously the complications are much greater with larger values and real encryption. However, if you have the source code and understand how it works, then a simple decryption program can be made to undo the changes observed in the source. Encryption like this usually assumes that you don't know the encryption code used or the pass phrase, and usually it is the pass phrase that is kept secret.
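The letter-shifting example can be sketched in a few lines of Python. This is only an illustration of the toy scheme described above (shift by 1, then by 1 x 2 = 2); the function name `shift` is made up for this sketch, and it handles lowercase letters only.

```python
def shift(text, amount):
    """Shift each lowercase letter forward by `amount`, wrapping past 'z'."""
    shifted = []
    for ch in text:
        if "a" <= ch <= "z":
            shifted.append(chr((ord(ch) - ord("a") + amount) % 26 + ord("a")))
        else:
            shifted.append(ch)  # leave anything that is not a-z untouched
    return "".join(shifted)


first = shift("hello", 1)            # first pass: shift by the pass phrase, 1
second = shift(first, 1 * 2)         # second pass: shift by 1 multiplied by 2
recovered = shift(second, -(1 + 2))  # undo both passes by shifting back by 3

print(first, second, recovered)
```

Because every step here has an exact inverse, knowing the code and the number is enough to get "hello" back. That is a property of this kind of encryption, and it is not something a hash function like md5() offers.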

whitekt64

2009-01-07 07:44:00 UTC

Hashing isn't reversible because the input-to-hash mapping is not 1-to-1.

In your example of a 32-bit hash value, imagine what would happen if you wanted to hash 64-bit values.

Well, the set of all possible 64-bit values is obviously a lot bigger than the set of all possible 32-bit values. So if you computed the (32-bit) hash of every possible 64-bit value, there would obviously be some values that had the same hash. Therefore, given the hash, it's impossible to tell what the original input was.

Now imagine what happens if you allow the input of the hash algorithm to be some arbitrary length, not just 64 bits, and you'll see why it's not possible to reverse the algorithm.
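The counting argument is easier to see at a smaller scale. The sketch below assumes a toy 8-bit hash (the first byte of a SHA-256 digest, chosen only for convenience) applied to every 16-bit input; `tiny_hash` and `preimages` are names invented for this example.

```python
import hashlib
from collections import defaultdict


def tiny_hash(value):
    """Toy 8-bit hash: the first byte of the SHA-256 digest of a 16-bit input."""
    return hashlib.sha256(value.to_bytes(2, "big")).digest()[0]


# Group all 65,536 possible 16-bit inputs by their 8-bit hash value.
preimages = defaultdict(list)
for value in range(2 ** 16):
    preimages[tiny_hash(value)].append(value)

largest = max(len(inputs) for inputs in preimages.values())
print(len(preimages), "distinct hash values for", 2 ** 16, "inputs")
print("the most crowded hash value has", largest, "different inputs")
```

With at most 256 possible outputs and 65,536 inputs, at least one hash value must be shared by 256 or more inputs, so the hash alone cannot tell you which input produced it.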

Having two inputs map to the same hash value is usually referred to as a "hash collision". For security purposes, one of the properties of a "good" hash function is that collisions are rare in practical use. For some other applications (like the database search approach another poster mentioned) collisions aren't so much of a problem.

A Google search on "hash collisions encryption" will yield some useful pages.

2009-01-07 06:44:06 UTC

Hashing is the transformation of a string of characters into a usually shorter fixed-length value or key that represents the original string. Hashing is used to index and retrieve items in a database because it is faster to find the item using the shorter hashed key than to find it using the original value. It is also used in many encryption algorithms.

As a simple example of the use of hashing in databases, a group of people could be arranged in a database like this:

Abernathy, Sara
Epperdingle, Roscoe
Moore, Wilfred
Smith, David
(and many more sorted into alphabetical order)

Each of these names would be the key in the database for that person's data. A database search mechanism would first have to start looking character-by-character across the name for matches until it found the match (or ruled the other entries out). But if each of the names were hashed, it might be possible (depending on the number of names in the database) to generate a unique four-digit key for each name. For example:

7864 Abernathy, Sara
9802 Epperdingle, Roscoe
1990 Moore, Wilfred
8822 Smith, David
(and so forth)

A search for any name would first consist of computing the hash value (using the same hash function used to store the item) and then comparing for a match using that value. It would, in general, be much faster to find a match across four digits, each having only 10 possibilities, than across an unpredictable value length where each character had 26 possibilities.
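A rough sketch of that lookup in Python follows. The function `four_digit_key` is hypothetical: it reduces a SHA-256 digest to a number from 0 to 9999, so the keys it produces will not match the illustrative four-digit values listed above. Each key maps to a list of names, because two names could land on the same key.

```python
import hashlib


def four_digit_key(name):
    """Hypothetical hash function: map a name to a number from 0 to 9999."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % 10000


names = ["Abernathy, Sara", "Epperdingle, Roscoe", "Moore, Wilfred", "Smith, David"]

# Store each name under its hashed key; a list per key copes with collisions.
index = {}
for name in names:
    index.setdefault(four_digit_key(name), []).append(name)


def lookup(name):
    """Hash the name with the same function, then compare only within that key."""
    return name in index.get(four_digit_key(name), [])


print(lookup("Moore, Wilfred"))
print(lookup("Nobody, Nemo"))
```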

The hashing algorithm is called the hash function (and probably the term is derived from the idea that the resulting hash value can be thought of as a "mixed up" version of the represented value). In addition to faster data retrieval, hashing is also used in creating and verifying digital signatures (used to authenticate message senders and receivers). The message is transformed with the hash function into a hashed value (known as a message digest), and the sender signs that digest and sends the signature along with the message. Using the same hash function as the sender, the receiver derives a message digest from the message it received and compares it with the digest recovered from the signature. They should be the same.
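The comparison step at the end of that description can be sketched as below. This shows only the digest check, using SHA-256 from Python's standard library as an assumed hash function. A real digital signature scheme also involves keys, which this sketch leaves out, and `message_digest` and `receiver_accepts` are names invented here.

```python
import hashlib
import hmac


def message_digest(message):
    """Hash the message bytes into a fixed-length hexadecimal digest."""
    return hashlib.sha256(message).hexdigest()


def receiver_accepts(received_message, received_digest):
    """Recompute the digest with the same hash function and compare the two."""
    return hmac.compare_digest(message_digest(received_message), received_digest)


message = b"Meet at noon."
sent_digest = message_digest(message)  # computed on the sender's side

print(receiver_accepts(b"Meet at noon.", sent_digest))
print(receiver_accepts(b"Meet at nine.", sent_digest))
```

Note that the receiver never reverses the digest. It hashes what it received and checks that the result matches.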

The hash function is used to index the original value or key and then used later each time the data associated with the value or key is to be retrieved. Thus, hashing is always a one-way operation. There's no need to "reverse engineer" the hash function by analyzing the hashed values. In fact, the ideal hash function can't be derived by such analysis. A good hash function also should not produce the same hash value from two different inputs. If it does, this is known as a collision. A hash function that offers an extremely low risk of collision may be considered acceptable.

Here are some relatively simple hash functions that have been used:

The division-remainder method: The number of items in the table is estimated. That number is then used as a divisor into each original value or key to extract a quotient and a remainder. The remainder is the hashed value. (Since this method is liable to produce a number of collisions, any search mechanism would have to be able to recognize a collision and offer an alternate search mechanism.)

Folding: This method divides the original value (digits in this case) into several parts, adds the parts together, and then uses the last four digits (or some other arbitrary number of digits that will work) as the hashed value or key.

Radix transformation: Where the value or key is digital, the number base (or radix) can be changed resulting in a different sequence of digits. (For example, a decimal numbered key could be transformed into a hexadecimal numbered key.) High-order digits could be discarded to fit a hash value of uniform length.

Digit rearrangement: This is simply taking part of the original value or key such as digits in positions 3 through 6, reversing their order, and then using that sequence of digits as the hash value or key.
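The four methods above can be sketched in Python as follows. The function names, default parameters, and the sample key 123456789 are all assumptions made for illustration, and the key is treated as a non-negative decimal integer.

```python
def division_remainder(key, table_size):
    """The remainder after dividing by the estimated table size is the hash."""
    return key % table_size


def folding(key, part_len=4, keep=4):
    """Split the digits into parts, add the parts, keep the last `keep` digits."""
    digits = str(key)
    parts = [int(digits[i:i + part_len]) for i in range(0, len(digits), part_len)]
    return sum(parts) % 10 ** keep


def radix_transformation(key, keep=4):
    """Rewrite the decimal key in hexadecimal and discard the high-order digits."""
    return format(key, "x")[-keep:]


def digit_rearrangement(key, start=3, end=6):
    """Take the digits in positions `start` through `end` and reverse them."""
    return str(key)[start - 1:end][::-1]


key = 123456789
print(division_remainder(key, 1000))  # 789
print(folding(key))                   # 1234 + 5678 + 9 = 6921
print(radix_transformation(key))      # 123456789 is 75bcd15 in hex, so cd15
print(digit_rearrangement(key))       # digits 3 to 6 are 3456, reversed 6543
```

Each of these throws information away (a quotient, carried digits, high-order digits, or most of the key), which is why many different keys can end up with the same hash.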

A hash function that works well for database storage and retrieval might not work as well for cryptographic or error-checking purposes. There are several well-known hash functions used in cryptography. These include the message-digest hash functions MD2, MD4, and MD5, used for hashing digital signatures into a shorter value called a message digest, and the Secure Hash Algorithm (SHA), a standard algorithm that makes a larger (160-bit) message digest and is similar to MD4.

2009-01-07 06:43:39 UTC

What The **** md5 PHP code 32 bit What the hell are you even talking about

2016-05-25 11:23:22 UTC

It is the resin and pollen from the Marijuana plant. The best is from Lebanon. I once liked it.


This content was originally posted on Y! Answers, a Q&A website that shut down in 2021.
