Learn how Java’s HashSet works internally with HashMap. This guide explains how hashCode() and equals() ensure uniqueness, how elements are stored in buckets, and why HashSet provides O(1) performance for add and lookup operations.
🧩 What is a HashSet?
A HashSet
in Java is a collection that stores unique elements.
Under the hood, it’s actually built on top of a HashMap
— yes, really!
Each element you add to a HashSet
is stored as a key in an internal HashMap
, and all the values are just dummy objects.
private transient HashMap<E,Object> map;
private static final Object PRESENT = new Object();
So when you do:
set.add("Alice");
it’s equivalent to:
map.put("Alice", PRESENT);
⚙️ Step-by-Step: How HashSet
Works Internally
Let’s go through the life of one element being added 👇
Suppose we do:
HashSet<Person> set = new HashSet<>();
set.add(new Person("Alice", 25));
Step 1: Call hashCode()
Java first calls:
int hash = person.hashCode();
to get a numeric value — e.g. 12345
.
Step 2: Find the bucket
Java calculates which bucket to store the object in:
bucketIndex = hash % numberOfBuckets;
If the number of buckets is, say, 16
:
12345 % 16 = 9
So the object goes into bucket #9.
Step 3: Check for duplicates
In bucket #9:
If the bucket is empty → store it there.
If not, Java calls
equals()
to check if an equal object already exists.- If
equals()
returnstrue
→ it’s a duplicate, so don’t add it. - If
equals()
returnsfalse
→ store it in the same bucket (linked list).
- If
Step 4: Store the object
If unique, Java stores it in that bucket:
[9] → Alice(25)
and marks it as a key in the underlying HashMap
.
🎨 3️⃣ Visual Diagram
Let’s imagine a HashSet
with 8 buckets.
Initial state:
[0]
[1]
[2]
[3]
[4]
[5]
[6]
[7]
After adding some people:
[0]
[1]
[2] → Bob(30)
[3]
[4]
[5] → Alice(25)
[6]
[7]
Now if we add another Alice:
set.add(new Person("Alice", 25));
Steps:
- hashCode() = same as old Alice → goes to bucket 5
- equals() = true (same name and age)
- So HashSet doesn’t add it again.
But if hashCode is same but equals is false:
Two people with same age but different names might land in the same bucket:
[5] → Alice(25) → Alicia(25)
→ Java will store them both but keeps them distinct by checking equals()
.
🧠 4️⃣ Summary Table
Step | What happens | Which method is used |
---|---|---|
Compute hash | Finds which bucket | hashCode() |
Check for duplicates | Compares with existing items | equals() |
Add element | Only if not found equal | Uses both |
🔍 5️⃣ Why it’s efficient
- Lookup (
contains()
) and insert (add()
) are roughly O(1) time — much faster than anArrayList
which can take O(n) forcontains()
. - But only if
hashCode()
distributes objects evenly.