Hash tables are one of the most fundamental and powerful data structures in computer science. If you've ever used a HashMap in Java or a dictionary in Python, you've been using a hash table under the hood! But have you ever wondered how they actually work?
In this tech tutorial, we're going to build a hash table from scratch in Java. We'll cover the theory behind how hash tables work, then dive straight into the code to see how we can implement one ourselves. Our take? Understanding how to build data structures from the ground up will make you a stronger developer and definitely more attractive to potential employers.
Are you preparing for your next technical interview? Hash tables are a favorite topic for interviewers, so mastering them is essential. If you're looking for more interview prep, check out our technical assessment preparation guide for practical advice and tips.
What are Hash Tables?
Hash tables (also known as hash maps) are data structures that store key-value pairs and provide incredibly fast lookups, insertions, and deletions - typically in O(1) average time complexity. That's pretty amazing when you think about it!
The magic happens through a process called hashing. When you want to store a value, the hash table uses a hash function to convert your key into an array index. This means instead of searching through every element to find what you're looking for (which would be O(n) time), you can jump directly to the right location.
Hash tables are everywhere in real-world applications:
- Database indexing for lightning-fast queries
- Caching systems like Redis
- Programming language implementations (variable lookups)
- Web browsers (storing cookies and session data)
How Hash Tables Work - The Theory
Before we start coding, let's understand the key concepts:
Hash Function: This takes your key and converts it into an array index. A good hash function distributes keys evenly across the array to minimize collisions.
Collisions: Sometimes two different keys hash to the same index. We need a strategy to handle this! The two most common approaches are:
- Chaining: Store multiple key-value pairs at the same index using a linked list
- Open Addressing: Find another empty slot when a collision occurs
Load Factor: This is the ratio of filled slots to total slots. When it gets too high (usually > 0.7), performance degrades, so we need to resize our hash table.
For our implementation, we'll use chaining to handle collisions because it's simpler and more intuitive.
Building Our Hash Table Class
Let's start by creating our basic hash table structure. We'll call it SimpleHashTable
to distinguish it from Java's built-in HashMap.
public class SimpleHashTable<K, V> {
private static final int DEFAULT_CAPACITY = 16;
private static final double LOAD_FACTOR_THRESHOLD = 0.75;
private Node<K, V>[] buckets;
private int size;
private int capacity;
// Inner class to represent key-value pairs
private static class Node<K, V> {
K key;
V value;
Node<K, V> next;
Node(K key, V value) {
this.key = key;
this.value = value;
this.next = null;
}
}
// Constructor
@SuppressWarnings("unchecked")
public SimpleHashTable() {
this.capacity = DEFAULT_CAPACITY;
this.buckets = new Node[capacity];
this.size = 0;
}
}
Here's what's happening: we're using an array of Node
objects where each node can form a linked list (that's our chaining approach). The @SuppressWarnings("unchecked")
annotation is needed because Java doesn't allow generic array creation directly.
The Hash Function
Now we need our hash function. This is crucial - a poor hash function can turn our O(1) operations into O(n) disasters!
private int hash(K key) {
if (key == null) {
return 0;
}
// Use Java's built-in hashCode() and make it positive
int hashCode = key.hashCode();
return Math.abs(hashCode) % capacity;
}
We're leveraging Java's built-in hashCode()
method, then using the modulo operator to ensure our index fits within our array bounds. The Math.abs()
ensures we don't get negative indices.
Implementing Put (Insertion)
Time for the main event - inserting key-value pairs! This is where the magic happens.
public void put(K key, V value) {
// Check if we need to resize
if (size >= capacity * LOAD_FACTOR_THRESHOLD) {
resize();
}
int index = hash(key);
Node<K, V> head = buckets[index];
// If the bucket is empty, create a new node
if (head == null) {
buckets[index] = new Node<>(key, value);
size++;
return;
}
// Check if key already exists and update value
Node<K, V> current = head;
while (current != null) {
if (current.key.equals(key)) {
current.value = value; // Update existing key
return;
}
current = current.next;
}
// Key doesn't exist, add new node at the beginning of the chain
Node<K, V> newNode = new Node<>(key, value);
newNode.next = head;
buckets[index] = newNode;
size++;
}
This method handles three scenarios:
- The bucket is empty - we create a new node
- The key already exists - we update its value
- The key is new, but the bucket has other keys - we add to the front of the chain
Implementing Get (Retrieval)
Getting values is straightforward once you understand the logic:
public V get(K key) {
int index = hash(key);
Node<K, V> current = buckets[index];
while (current != null) {
if (current.key.equals(key)) {
return current.value;
}
current = current.next;
}
return null; // Key not found
}
We hash the key to find the right bucket, then traverse the chain until we find our key or reach the end.
Implementing Remove (Deletion)
Deletion is a bit trickier because we need to handle linked list removal:
public V remove(K key) {
int index = hash(key);
Node<K, V> current = buckets[index];
Node<K, V> previous = null;
while (current != null) {
if (current.key.equals(key)) {
// Found the key to remove
if (previous == null) {
// Removing the first node in the chain
buckets[index] = current.next;
} else {
// Removing a node in the middle or end
previous.next = current.next;
}
size--;
return current.value;
}
previous = current;
current = current.next;
}
return null; // Key not found
}
Dynamic Resizing
When our hash table gets too full, performance degrades. We need to resize and rehash everything:
@SuppressWarnings("unchecked")
private void resize() {
Node<K, V>[] oldBuckets = buckets;
capacity *= 2;
buckets = new Node[capacity];
size = 0;
// Rehash all existing key-value pairs
for (Node<K, V> head : oldBuckets) {
while (head != null) {
put(head.key, head.value);
head = head.next;
}
}
}
Utility Methods
Let's add some helpful utility methods:
public int size() {
return size;
}
public boolean isEmpty() {
return size == 0;
}
public boolean containsKey(K key) {
return get(key) != null;
}
public void display() {
System.out.println("Hash Table Contents:");
for (int i = 0; i < capacity; i++) {
System.out.print("Bucket " + i + ": ");
Node<K, V> current = buckets[i];
while (current != null) {
System.out.print("[" + current.key + "=" + current.value + "] ");
current = current.next;
}
System.out.println();
}
System.out.println("Size: " + size + ", Capacity: " + capacity);
}
Testing Our Hash Table
Let's create a simple test to see our hash table in action:
public class HashTableDemo {
public static void main(String[] args) {
SimpleHashTable<String, Integer> hashTable = new SimpleHashTable<>();
// Adding some key-value pairs
hashTable.put("apple", 5);
hashTable.put("banana", 3);
hashTable.put("orange", 8);
hashTable.put("grape", 12);
System.out.println("Initial hash table:");
hashTable.display();
// Testing retrieval
System.out.println("\nTesting get operations:");
System.out.println("apple: " + hashTable.get("apple"));
System.out.println("banana: " + hashTable.get("banana"));
System.out.println("mango: " + hashTable.get("mango")); // Should be null
// Testing updates
hashTable.put("apple", 10); // Update existing key
System.out.println("\nAfter updating apple to 10:");
System.out.println("apple: " + hashTable.get("apple"));
// Testing removal
System.out.println("\nRemoving banana:");
Integer removed = hashTable.remove("banana");
System.out.println("Removed value: " + removed);
System.out.println("\nFinal hash table:");
hashTable.display();
// Testing collision handling by adding more items
System.out.println("\nAdding more items to test collisions:");
hashTable.put("kiwi", 7);
hashTable.put("mango", 4);
hashTable.put("pear", 6);
hashTable.put("peach", 9);
hashTable.display();
}
}
When you run this, you'll see how our hash table handles insertions, updates, deletions, and even collisions!
Time Complexity Analysis
Our hash table implementation provides:
- Average Case: O(1) for get, put, and remove operations
- Worst Case: O(n) when all keys hash to the same bucket (very rare with a good hash function)
- Space Complexity: O(n) where n is the number of key-value pairs
The key to maintaining O(1) average performance is keeping the load factor reasonable and using a good hash function.
Real-World Applications and Improvements
Hash tables are used everywhere in software development:
Database Systems: Indexes use hash tables for quick record lookups
Caching: Systems like Redis use hash tables to store cached data
Compilers: Symbol tables for variable lookups during compilation
Web Development: Session storage, routing tables, and more
Some improvements you could make to our implementation:
- Implement different collision resolution strategies (linear probing, quadratic probing)
- Add support for custom hash functions
- Implement iterator support to make it iterable
- Add thread safety for concurrent access
- Implement more sophisticated resizing strategies
Wrapping Up
Congratulations! You've just built a fully functional hash table from scratch. This is no small feat - you've implemented one of the most important data structures in computer science.
Understanding how hash tables work under the hood will make you a better developer and definitely help in technical interviews. Hash table questions are extremely common, and now you can not only use them but also explain how they work internally.
Want to challenge yourself further? Try implementing the improvements we mentioned, or use your hash table to solve some coding problems. You could also explore how Java's HashMap differs from our implementation - it uses a combination of chaining and balanced trees for optimal performance.
If you're getting ready for your next interview, don't forget to also check out our Java developer interview prep guide.
And if you're looking for a new tech role using Java, we've got you covered! Sign up to hackajob here, and you could be starting your next job in just weeks. Let employers contact you directly for roles that match your skills and career goals.
Like what you've read or want more like this? Follow us on socials where we post daily bite-sized tech content and tips: X (Twitter), LinkedIn, TikTok, Instagram, and YouTube.
Remember, the best way to truly master data structures is to implement them yourself and then use them in real projects. Happy coding!