# Hashing and Hash Cracking

## Introduction to Hashing

Hashing is a fundamental concept in cybersecurity. It’s not encryption, and it’s not the same as encoding. It’s something entirely different—and understanding how it works is crucial for working with passwords, data integrity, digital forensics, and more.

Hashing is the process of taking input data and running it through a **hash function**, which generates a fixed-length string that represents the original input. This output is called a **hash** or **digest**. The idea is simple: the same input will always produce the same output, but it’s nearly impossible to go backward from the output to the input.

Hashing is used to:

* Store passwords securely
* Check whether files have been tampered with, for example with the `shasum` command ([learn more](https://www.redhat.com/en/blog/hashing-checksums))
* Detect duplicate files or data
* Support many cryptographic protocols

Unlike encryption, hashing is **one-way**—you cannot decrypt a hash. If the original input is lost, there is no way to recover it from just the hash.

## How Hash Functions Work

A **hash function** takes input of any size and produces output of a fixed size. It doesn’t matter whether the input is a short word or a long file—the output size stays constant.
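The fixed output size can be checked directly. The sketch below uses SHA-256 from Python's standard `hashlib` module; the three sample inputs are arbitrary choices for illustration.

```python
import hashlib

# Inputs of very different sizes (chosen arbitrarily for illustration).
inputs = [b"a", b"hello world", b"x" * 1_000_000]

for data in inputs:
    digest = hashlib.sha256(data).hexdigest()
    # SHA-256 always yields 32 bytes, i.e. 64 hexadecimal characters.
    print(f"{len(data):>9} bytes in -> {len(digest)} hex chars out")

digest_lengths = [len(hashlib.sha256(data).hexdigest()) for data in inputs]
```

Whether the input is one byte or one million, the digest is 64 hex characters long.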

As the diagram below shows, even the uppercase `A` and the lowercase `a` produce completely different hashes.

<figure><img src="/files/ro1laWWXiRPsSILsQSzP" alt=""><figcaption></figcaption></figure>

Good hash functions follow these rules:

* **Deterministic**: The same input always gives the same output.
* **Fast to compute**: Can handle large inputs quickly.
* **Irreversible**: It should be computationally infeasible to work out the original input from the hash.
* **Collision-resistant**: It should be extremely unlikely that two different inputs produce the same hash.
* **Avalanche effect**: Changing even one bit of the input drastically changes the output.
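Two of these properties, determinism and the avalanche effect, are easy to observe. The following is a minimal sketch using SHA-256; the helper names `sha256_bits` and `differing_bits` are made up for this example.

```python
import hashlib

def sha256_bits(data: bytes) -> int:
    """Return the SHA-256 digest of data as a 256-bit integer."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

def differing_bits(a: bytes, b: bytes) -> int:
    """Count how many of the 256 output bits differ between two digests."""
    return bin(sha256_bits(a) ^ sha256_bits(b)).count("1")

# Deterministic: hashing the same input twice gives the same digest.
assert hashlib.sha256(b"A").hexdigest() == hashlib.sha256(b"A").hexdigest()

# Avalanche effect: "A" (0x41) and "a" (0x61) differ by a single input bit,
# yet roughly half of the 256 output bits are expected to change.
changed = differing_bits(b"A", b"a")
print(f"{changed} of 256 output bits differ")
```

For a well-behaved hash function the count should land near 128, which is what makes similar inputs look unrelated after hashing.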

## Hash Collisions and the Pigeonhole Principle

A **hash collision** happens when two different inputs produce the same hash. This is undesirable, and secure hash functions try to make it very unlikely.

But mathematically, collisions are unavoidable. This is due to the **pigeonhole principle**: if there are more possible inputs than outputs, some inputs must share outputs.

For example, if you have 128 unique inputs but only 96 possible hash outputs, some of those inputs will map to the same output.
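The pigeonhole argument can be demonstrated with a deliberately tiny hash. The sketch below assumes a toy function, `tiny_hash`, that keeps only the first byte of a SHA-256 digest, so only 256 outputs exist. It is not a real algorithm, just a way to shrink the output space until collisions appear quickly.

```python
import hashlib

def tiny_hash(data: bytes) -> int:
    """Toy hash: keep only the first byte of SHA-256, so just 256 outputs exist."""
    return hashlib.sha256(data).digest()[0]

def find_collision(max_inputs: int = 257):
    """Hash distinct inputs until two of them share an output."""
    seen = {}  # output value -> first input that produced it
    for i in range(max_inputs):
        data = str(i).encode()
        h = tiny_hash(data)
        if h in seen:
            return seen[h], data, h
        seen[h] = data
    return None  # cannot happen once max_inputs exceeds the 256 possible outputs

collision = find_collision()
first, second, value = collision
print(f"{first!r} and {second!r} both hash to {value}")
```

With 257 distinct inputs and 256 possible outputs a collision is guaranteed. In practice one turns up much sooner.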

Historically, some older algorithms like **MD5** and **SHA-1** have been broken by engineered collisions: attacks in which researchers created two different files that hash to the same value.

For reference:

* [MD5 collision example](https://www.mscs.dal.ca/~selinger/md5collision/)
* [SHA1 collision example](https://shattered.io/)

As a result, these hash functions are no longer considered secure and should not be used for password storage or file verification.

## Why Use Hashing for Passwords?

When users create passwords for web applications, those passwords should never be stored as plaintext. If an attacker breaches the database, they could immediately see every user’s password. Because many people reuse passwords across sites, this would have devastating consequences.

Instead of storing the password itself, the application stores the **hash of the password**. When a user logs in:

1. They enter their password.
2. The system hashes the input.
3. It compares that hash to the stored one.

If they match, the password is correct. The actual password is never stored.
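The login flow above can be sketched in a few lines. This example uses unsalted SHA-256 purely to illustrate the hash-and-compare steps. The `stored_hashes` table and the user `alice` are hypothetical, and a fast unsalted hash like this is not suitable for real password storage.

```python
import hashlib
import hmac

def hash_password(password: str) -> str:
    """Unsalted SHA-256, used here only to illustrate the flow."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

# Hypothetical user table: only the hash is stored, never the password.
stored_hashes = {"alice": hash_password("correct horse battery staple")}

def login(username: str, password: str) -> bool:
    stored = stored_hashes.get(username)
    if stored is None:
        return False
    # Hash the submitted password and compare it with the stored hash.
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(hash_password(password), stored)

print(login("alice", "correct horse battery staple"))  # True
print(login("alice", "wrong guess"))                   # False
```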

However, if two users have the same password, their hashes will also be the same. This lets attackers pre-compute lists of hashes and their corresponding passwords, a technique known as a **rainbow table attack**.

## Rainbow Tables

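A **rainbow table** is a precomputed set of hashes for known passwords. It lets an attacker reverse a hash by looking it up in a massive database instead of guessing. Strictly speaking, a rainbow table stores hash chains to save space, while a plain hash-to-password list is a lookup table. The simpler lookup form is shown here.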

For example:

| Hash                             | Password |
| -------------------------------- | -------- |
| e99a18c428cb38d5f260853678922e03 | abc123   |
| e10adc3949ba59abbe56e057f20f883e | 123456   |
| b0baee9d279d34fa1dfd71aadb908c3f | 11111    |

If an attacker sees a hash in a leaked database and finds a match in the rainbow table, they’ve cracked the password instantly.
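The lookup idea can be sketched as a dictionary built ahead of time. This example assumes unsalted MD5 (the hashes in the table above are MD5 digests) and a three-word stand-in wordlist. It is a plain lookup table rather than a chain-based rainbow table, and `hashlib.md5` may be disabled on some hardened Python builds.

```python
import hashlib

# A tiny stand-in wordlist; real tables cover millions of candidates.
wordlist = ["abc123", "123456", "11111"]

# Precompute hash -> password once, ahead of time.
lookup = {hashlib.md5(word.encode()).hexdigest(): word for word in wordlist}

def reverse_hash(target: str):
    """Return the password for a known hash, or None if it is not in the table."""
    return lookup.get(target.lower())

leaked = "e99a18c428cb38d5f260853678922e03"
print(reverse_hash(leaked))  # abc123
```

All of the expensive hashing happens while the table is built. The attack itself is a single dictionary lookup.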

Online tools like [**Crackstation**](https://crackstation.net/) and [**Hashes.com**](https://hashes.com/en/decrypt/hash) use large precomputed tables to recover the passwords behind hashes. These sites are especially useful for beginners: you paste a hash, and if it’s in their database, you get the corresponding password.

## Defending Against Rainbow Tables with Salts

To make rainbow tables ineffective, modern systems use a technique called **salting**.

A **salt** is a random string added to a password before it is hashed. Since the salt is different for every user, the final hash will also be different—even if two users pick the same password.

For example:

* Password: `password123`
  * Salt1: `XJt93#1` → Hash: `A12EF...`
  * Salt2: `5Uy!33$` → Hash: `F8AC2...`

Because the salt is random and unique per user, precomputed rainbow tables won’t work unless the attacker already knows the salt—and has a rainbow table specifically built for that salt.

<figure><img src="/files/raSKibd4G42txYrEXszY" alt=""><figcaption><p>Referenced from: <a href="https://images.ctfassets.net/23aumh6u8s0i/5C8Vmfi1nfSSh9GDQw4IxZ/d2f5d28320b37760c2e86b141822e029/password-salt-example">https://images.ctfassets.net/23aumh6u8s0i/5C8Vmfi1nfSSh9GDQw4IxZ/d2f5d28320b37760c2e86b141822e029/password-salt-example</a></p></figcaption></figure>

Salts are typically stored in the database alongside the hash. They don’t need to be secret.
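A minimal sketch of salting follows, using only the standard library. It assumes PBKDF2-HMAC-SHA256 as a stand-in for a dedicated password-hashing algorithm. The iteration count, the 16-byte salt length, and the function names are illustrative choices, not recommendations.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor, not a recommendation

def hash_password(password, salt=None):
    """Return (salt, digest). A fresh random salt is generated when none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify_password(password, salt, expected):
    """Re-hash the attempt with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)

# Two users choose the same password but receive different salts.
salt1, hash1 = hash_password("password123")
salt2, hash2 = hash_password("password123")
print(hash1.hex() != hash2.hex())                    # True: different salts, different hashes
print(verify_password("password123", salt1, hash1))  # True
```

The salt is returned alongside the digest because both must be stored: verification re-runs the hash with the same salt.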

Common password-hashing algorithms like **bcrypt**, **sha512crypt**, and **argon2** handle salting internally.

Learn more at: <https://auth0.com/blog/adding-salt-to-hashing-a-better-way-to-store-passwords/>

## Identifying Hash Formats

Some hash formats include a prefix that helps identify the algorithm used. These are especially common in Unix-like systems:

| Prefix | Algorithm   |
| ------ | ----------- |
| `$1$`  | md5crypt    |
| `$2a$` | bcrypt      |
| `$6$`  | sha512crypt |

On Unix systems, password hashes are stored in `/etc/shadow`, which is only readable by the root user. In older systems, they were stored in `/etc/passwd`.

On Windows, password hashes are stored in the **SAM** file. These are typically NT hashes (commonly called **NTLM** hashes), which are computed with the outdated MD4 algorithm and are not salted. Windows attempts to protect this file, but tools like **Mimikatz** can be used to extract and dump password hashes.

Hash recognition tools like `hashID` can attempt to detect the hash type based on the format and length, but they are not always accurate. It's best to rely on both tooling and context.
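The general idea behind such tools can be sketched as prefix and length heuristics. The function below is a hypothetical illustration, not how `hashID` is implemented. The length table is far from exhaustive, and several algorithms share the same digest length, which is one reason these guesses can be wrong.

```python
import re

# Prefixes taken from the table above.
PREFIXES = {"$1$": "md5crypt", "$2a$": "bcrypt", "$6$": "sha512crypt"}

# Hex-digest lengths are shared by several algorithms, so these are only guesses.
HEX_LENGTHS = {
    32: ["MD5", "NTLM"],
    40: ["SHA-1"],
    64: ["SHA-256"],
    128: ["SHA-512"],
}

def guess_hash_type(value: str) -> list:
    """Return a list of candidate algorithm names (possibly empty)."""
    for prefix, name in PREFIXES.items():
        if value.startswith(prefix):
            return [name]
    if re.fullmatch(r"[0-9a-fA-F]+", value):
        return HEX_LENGTHS.get(len(value), [])
    return []

print(guess_hash_type("e99a18c428cb38d5f260853678922e03"))  # ['MD5', 'NTLM']
```

A 32-character hex string could be MD5 or NTLM, and only context (where the hash came from) can settle it.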

## Cracking Hashes

To recover the original password from a hash, you must **crack** it. This means:

1. Guess a password
2. Hash it (with the same algorithm and salt, if applicable)
3. Compare the result with the target hash
4. Repeat until you find a match

This can be done manually with scripts, or using automated tools.
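As a rough illustration of such a script, the sketch below runs the guess, hash, compare loop over a small in-memory wordlist. The `crack` function, the four-word list, and the choice to prepend the salt to the guess are assumptions made for this example. Real hash formats define their own salt handling.

```python
import hashlib

def crack(target_hash: str, candidates, algorithm: str = "md5", salt: bytes = b""):
    """Try each candidate until its hash matches the target; return it or None."""
    target = target_hash.lower()
    for guess in candidates:                                   # 1. guess a password
        data = salt + guess.encode("utf-8")
        digest = hashlib.new(algorithm, data).hexdigest()      # 2. hash it
        if digest == target:                                   # 3. compare
            return guess
    return None                                                # 4. list exhausted

wordlist = ["password", "letmein", "123456", "abc123"]
print(crack("e10adc3949ba59abbe56e057f20f883e", wordlist))  # 123456
```

Dedicated tools follow the same loop but add optimized hash implementations, candidate mutation rules, and hardware acceleration.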

For beginners, the easiest approach is to use:

* [**Crackstation**](https://crackstation.net/) or [**Hashes.com**](https://hashes.com/en/decrypt/hash) to look up hashes in known databases
* Wordlists like `rockyou.txt` to guess common passwords with tools like Hashcat or John the Ripper

For more serious or custom cracking efforts, tools like **Hashcat** and **John the Ripper** are used. These support many hash types, wordlist and rule-based attacks, and GPU acceleration.

{% hint style="info" %}
We will explore **Hashcat and John the Ripper** in depth in a later chapter, including how to run them, how to choose the right attack mode, and how to optimize them for your system.
{% endhint %}

## GPUs and Cracking Speed

Cracking means computing an enormous number of hashes, and each guess is independent of the others. GPUs are well suited for this because they can run thousands of operations in parallel. This makes GPU cracking significantly faster than CPU cracking for many hash types.

However, some algorithms like **bcrypt** are specifically designed to be slow and resist GPU acceleration. These are better for password storage since they slow down attackers.
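The effect of a deliberately slow algorithm can be felt with a rough timing comparison. bcrypt is not in Python's standard library, so this sketch substitutes PBKDF2 with an arbitrary 200,000 iterations. It shows only the cost of a high work factor per guess, not bcrypt's particular resistance to GPUs, and the timings will vary by machine.

```python
import hashlib
import time

password = b"password123"
salt = b"fixed-demo-salt"  # fixed only so the example is repeatable

# One pass of a fast, general-purpose hash.
start = time.perf_counter()
hashlib.sha256(salt + password).digest()
fast_seconds = time.perf_counter() - start

# A key-derivation function that repeats the work many times per guess.
start = time.perf_counter()
hashlib.pbkdf2_hmac("sha256", password, salt, 200_000)
slow_seconds = time.perf_counter() - start

print(f"single SHA-256:      {fast_seconds:.6f} s")
print(f"PBKDF2, 200k rounds: {slow_seconds:.6f} s")
```

A delay that is barely noticeable for one login becomes a heavy cost when an attacker has to pay it for every guess.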

## Hashes for File Integrity

Hashing is also used to check that files haven’t been altered. You obtain a known-good hash value (e.g., from a developer or software vendor), hash your local copy of the file, and compare the two values.

If the hashes match, the file is unchanged. If they differ, the file may have been tampered with, corrupted, or replaced.
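The comparison can be sketched as follows. The example hashes a temporary file that stands in for a download, and the `sha256_file` helper and 64 KiB chunk size are illustrative choices. In practice the known-good value would come from the vendor rather than being computed locally.

```python
import hashlib
import os
import tempfile

def sha256_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Demo with a temporary file standing in for a download.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"release contents")
    path = tmp.name

known_good = sha256_file(path)           # in practice, published by the vendor
unchanged = sha256_file(path) == known_good

with open(path, "ab") as f:              # simulate tampering or corruption
    f.write(b"!")
modified = sha256_file(path) != known_good
os.remove(path)

print(unchanged, modified)  # True True
```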

Hashing can also be used to find duplicate files by comparing their hash values.
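Duplicate detection can be sketched by grouping files under their digests. The `find_duplicates` function and the three sample files are made up for this example. It reads each file fully into memory for brevity, which would not suit very large files.

```python
import hashlib
import os
import tempfile
from collections import defaultdict

def find_duplicates(directory: str) -> list:
    """Group files in a directory by SHA-256 digest; return groups with 2+ files."""
    groups = defaultdict(list)
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            with open(path, "rb") as f:
                groups[hashlib.sha256(f.read()).hexdigest()].append(name)
    return [names for names in groups.values() if len(names) > 1]

with tempfile.TemporaryDirectory() as d:
    samples = [("a.txt", b"same"), ("b.txt", b"same"), ("c.txt", b"other")]
    for name, content in samples:
        with open(os.path.join(d, name), "wb") as f:
            f.write(content)
    duplicates = find_duplicates(d)

print(duplicates)  # [['a.txt', 'b.txt']]
```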

## Summary

Hashing is one of the most useful tools in cybersecurity. It allows us to:

* Store passwords safely
* Detect file tampering
* Find duplicate data
* Verify downloaded files

It’s also the first step in understanding more advanced cryptographic concepts. Mastering the basics of hashing—and how to crack or protect hashes—is foundational to real-world security work.


