Implementing Hash Tables from Scratch in Java

Hash tables are one of the most fundamental and powerful data structures in computer science. If you've ever used a HashMap in Java or a dictionary in Python, you've been using a hash table under the hood! But have you ever wondered how they actually work?

In this tech tutorial, we're going to build a hash table from scratch in Java. We'll cover the theory behind how hash tables work, then dive straight into the code to see how we can implement one ourselves. Our take? Understanding how to build data structures from the ground up will make you a stronger developer and definitely more attractive to potential employers.

Are you preparing for your next technical interview? Hash tables are a favorite topic for interviewers, so mastering them is essential. If you're looking for more interview prep, check out our technical assessment preparation guide for practical advice and tips.

What are Hash Tables?

Hash tables (also known as hash maps) are data structures that store key-value pairs and provide incredibly fast lookups, insertions, and deletions - typically in O(1) average time complexity. That's pretty amazing when you think about it!

The magic happens through a process called hashing. When you want to store a value, the hash table uses a hash function to convert your key into an array index. This means instead of searching through every element to find what you're looking for (which would be O(n) time), you can jump directly to the right location.

Hash tables are everywhere in real-world applications:

Database indexing for lightning-fast queries
Caching systems like Redis
Programming language implementations (variable lookups)
Web browsers (storing cookies and session data)

How Hash Tables Work - The Theory

Before we start coding, let's understand the key concepts:

Hash Function: This takes your key and converts it into an array index. A good hash function distributes keys evenly across the array to minimize collisions.

Collisions: Sometimes two different keys hash to the same index. We need a strategy to handle this! The two most common approaches are:

Chaining: Store multiple key-value pairs at the same index using a linked list
Open Addressing: Find another empty slot when a collision occurs

Load Factor: This is the ratio of filled slots to total slots. When it gets too high (usually > 0.7), performance degrades, so we need to resize our hash table.

For our implementation, we'll use chaining to handle collisions because it's simpler and more intuitive.

Building Our Hash Table Class

Let's start by creating our basic hash table structure. We'll call it SimpleHashTable to distinguish it from Java's built-in HashMap.

public class SimpleHashTable<K, V> {
    private static final int DEFAULT_CAPACITY = 16;
    private static final double LOAD_FACTOR_THRESHOLD = 0.75;
    
    private Node<K, V>[] buckets;
    private int size;
    private int capacity;
    
    // Inner class to represent key-value pairs
    private static class Node<K, V> {
        K key;
        V value;
        Node<K, V> next;
        
        Node(K key, V value) {
            this.key = key;
            this.value = value;
            this.next = null;
        }
    }
    
    // Constructor
    @SuppressWarnings("unchecked")
    public SimpleHashTable() {
        this.capacity = DEFAULT_CAPACITY;
        this.buckets = new Node[capacity];
        this.size = 0;
    }
}

Here's what's happening: we're using an array of Node objects where each node can form a linked list (that's our chaining approach). The @SuppressWarnings("unchecked") annotation is needed because Java doesn't allow generic array creation directly.

The Hash Function

Now we need our hash function. This is crucial - a poor hash function can turn our O(1) operations into O(n) disasters!

private int hash(K key) {
    if (key == null) {
        return 0;
    }
    
    // Use Java's built-in hashCode() and make it positive
    int hashCode = key.hashCode();
    return Math.abs(hashCode) % capacity;
}

We're leveraging Java's built-in hashCode() method, then using the modulo operator to ensure our index fits within our array bounds. The Math.abs() ensures we don't get negative indices.

Implementing Put (Insertion)

Time for the main event - inserting key-value pairs! This is where the magic happens.

public void put(K key, V value) {
    // Check if we need to resize
    if (size >= capacity * LOAD_FACTOR_THRESHOLD) {
        resize();
    }
    
    int index = hash(key);
    Node<K, V> head = buckets[index];
    
    // If the bucket is empty, create a new node
    if (head == null) {
        buckets[index] = new Node<>(key, value);
        size++;
        return;
    }
    
    // Check if key already exists and update value
    Node<K, V> current = head;
    while (current != null) {
        if (current.key.equals(key)) {
            current.value = value; // Update existing key
            return;
        }
        current = current.next;
    }
    
    // Key doesn't exist, add new node at the beginning of the chain
    Node<K, V> newNode = new Node<>(key, value);
    newNode.next = head;
    buckets[index] = newNode;
    size++;
}

This method handles three scenarios:

The bucket is empty - we create a new node
The key already exists - we update its value
The key is new, but the bucket has other keys - we add to the front of the chain

Implementing Get (Retrieval)

Getting values is straightforward once you understand the logic:

public V get(K key) {
    int index = hash(key);
    Node<K, V> current = buckets[index];
    
    while (current != null) {
        if (current.key.equals(key)) {
            return current.value;
        }
        current = current.next;
    }
    
    return null; // Key not found
}

We hash the key to find the right bucket, then traverse the chain until we find our key or reach the end.

Implementing Remove (Deletion)

Deletion is a bit trickier because we need to handle linked list removal:

public V remove(K key) {
    int index = hash(key);
    Node<K, V> current = buckets[index];
    Node<K, V> previous = null;
    
    while (current != null) {
        if (current.key.equals(key)) {
            // Found the key to remove
            if (previous == null) {
                // Removing the first node in the chain
                buckets[index] = current.next;
            } else {
                // Removing a node in the middle or end
                previous.next = current.next;
            }
            size--;
            return current.value;
        }
        previous = current;
        current = current.next;
    }
    
    return null; // Key not found
}

Dynamic Resizing

When our hash table gets too full, performance degrades. We need to resize and rehash everything:

@SuppressWarnings("unchecked")
private void resize() {
    Node<K, V>[] oldBuckets = buckets;
    capacity *= 2;
    buckets = new Node[capacity];
    size = 0;
    
    // Rehash all existing key-value pairs
    for (Node<K, V> head : oldBuckets) {
        while (head != null) {
            put(head.key, head.value);
            head = head.next;
        }
    }
}

Utility Methods

Let's add some helpful utility methods:

public int size() {
    return size;
}

public boolean isEmpty() {
    return size == 0;
}

public boolean containsKey(K key) {
    return get(key) != null;
}

public void display() {
    System.out.println("Hash Table Contents:");
    for (int i = 0; i < capacity; i++) {
        System.out.print("Bucket " + i + ": ");
        Node<K, V> current = buckets[i];
        while (current != null) {
            System.out.print("[" + current.key + "=" + current.value + "] ");
            current = current.next;
        }
        System.out.println();
    }
    System.out.println("Size: " + size + ", Capacity: " + capacity);
}

Testing Our Hash Table

Let's create a simple test to see our hash table in action:

public class HashTableDemo {
    public static void main(String[] args) {
        SimpleHashTable<String, Integer> hashTable = new SimpleHashTable<>();
        
        // Adding some key-value pairs
        hashTable.put("apple", 5);
        hashTable.put("banana", 3);
        hashTable.put("orange", 8);
        hashTable.put("grape", 12);
        
        System.out.println("Initial hash table:");
        hashTable.display();
        
        // Testing retrieval
        System.out.println("\nTesting get operations:");
        System.out.println("apple: " + hashTable.get("apple"));
        System.out.println("banana: " + hashTable.get("banana"));
        System.out.println("mango: " + hashTable.get("mango")); // Should be null
        
        // Testing updates
        hashTable.put("apple", 10); // Update existing key
        System.out.println("\nAfter updating apple to 10:");
        System.out.println("apple: " + hashTable.get("apple"));
        
        // Testing removal
        System.out.println("\nRemoving banana:");
        Integer removed = hashTable.remove("banana");
        System.out.println("Removed value: " + removed);
        
        System.out.println("\nFinal hash table:");
        hashTable.display();
        
        // Testing collision handling by adding more items
        System.out.println("\nAdding more items to test collisions:");
        hashTable.put("kiwi", 7);
        hashTable.put("mango", 4);
        hashTable.put("pear", 6);
        hashTable.put("peach", 9);
        
        hashTable.display();
    }
}

When you run this, you'll see how our hash table handles insertions, updates, deletions, and even collisions!

Time Complexity Analysis

Our hash table implementation provides:

Average Case: O(1) for get, put, and remove operations
Worst Case: O(n) when all keys hash to the same bucket (very rare with a good hash function)
Space Complexity: O(n) where n is the number of key-value pairs

The key to maintaining O(1) average performance is keeping the load factor reasonable and using a good hash function.

Real-World Applications and Improvements

Hash tables are used everywhere in software development:

Database Systems: Indexes use hash tables for quick record lookups
Caching: Systems like Redis use hash tables to store cached data
Compilers: Symbol tables for variable lookups during compilation
Web Development: Session storage, routing tables, and more

Some improvements you could make to our implementation:

Implement different collision resolution strategies (linear probing, quadratic probing)
Add support for custom hash functions
Implement iterator support to make it iterable
Add thread safety for concurrent access
Implement more sophisticated resizing strategies

Wrapping Up

Congratulations! You've just built a fully functional hash table from scratch. This is no small feat - you've implemented one of the most important data structures in computer science.

Understanding how hash tables work under the hood will make you a better developer and definitely help in technical interviews. Hash table questions are extremely common, and now you can not only use them but also explain how they work internally.

Want to challenge yourself further? Try implementing the improvements we mentioned, or use your hash table to solve some coding problems. You could also explore how Java's HashMap differs from our implementation - it uses a combination of chaining and balanced trees for optimal performance.

If you're getting ready for your next interview, don't forget to also check out our Java developer interview prep guide.

And if you're looking for a new tech role using Java, we've got you covered! Sign up to hackajob here, and you could be starting your next job in just weeks. Let employers contact you directly for roles that match your skills and career goals.

Like what you've read or want more like this? Follow us on socials where we post daily bite-sized tech content and tips: X (Twitter), LinkedIn, TikTok, Instagram, and YouTube.

Remember, the best way to truly master data structures is to implement them yourself and then use them in real projects. Happy coding!

Platform

Solutions

Resources