hashCode
This brief lesson is about the
hashCodefunction in theObjectclass.
- Like
toStringandequals, all classes inherit (or can override) theObject'shashCodefunction.- Like
toStringandequals, you should write ahashCodefor new classes that you create.
The purpose of the hashCode function is to generate an as-far-as-possible unique integer for distinct objects.
The produced integer should be consistent for the object.
That is, the produced integer should not change unless the object itself is changed in some way.
What's hashCode used for?
Think about the HashMap data structure, discussed in a previous lesson.
The HashMap supports these fundamental operations:
putentries (key-value pairs) into the mapgetvalues from the map quickly, using a key
The operations above can, for the most part, be done in constant time. This is possible because of the "hash" part of the data structure.
Suppose you are trying to use a given key k to put some value v into the map.
The HashMap uses the key's hashCode value to "find a place" for v.
That is, you can imagine that it calls k.hashCode() to produce some unique integer, and uses that integer to choose a position in which to store v.
The same thing is done when we try to retrieve v from the map, using the key k.
That is, the map will call k.hashCode() again, come up with the same integer as before, and find the value v.
If done well, this can happen without searching the entire set of entries — so no matter how many records are in the map, we do the same amount of work to look up a value using its key.
A hashCode is an integer
The hashCode function has the following signature in the Object class:
public int hashCode()
So any override in a class you create must also have the same signature, otherwise it will not actually override the hashCode function, and won't be used by data structures like the HashMap or HashSet.
Ok, fine, it's a method that returns an int. That's easy enough! But you can't return any old integer. It's important to, as far as possible, return a unique integer for any given object, and to have that integer be consistent across multiple calls to hashCode unless the object changes.
When distinct objects have distinct hashCodes, this ensures that the objects are appropriately "spread out" in a map or set.
On the other hand, if many distinct keys produce the same hashCode, then those keys would "collide" in the map or set.
Maps and sets are able to handle these collisions (e.g., a simple way is to "stack up" all the keys that have the same hashCode).
But the more collisions you have, the more your map's performance degrades.
That is, if many collisions occur, then the bigger your map gets, the more time it takes to add to retrieve items from it.
Having distinct hash codes for distinct objects ensures that the performance of the
get,put, andcontainsoperations for maps and sets remain as close to constant time as possible.
equals and hashCode go hand-in-hand
The equals and hashCode methods must go hand-in-hand.
When you try to get an item from a map using a key, the following steps take place (a simplified overview):
- Use the key's
hashCodeto "find" the data in the map. - When you reach the computed position, check if the provided key
equalsthe key in the map entry. - If the two keys are
equal, return the value from the map entry. - If the two keys are not
equal, follow the map's collision resolution policy to check the next possible location, or returnnull.
This suggests that hashCode and equals should always take the same information into account, and should always "agree" with each other.
Not doing this will result in strange and incorrect behaviour in your HashMaps and HashSets.
These leads us to the following "contract" for the hashCode and equals methods in any class1:
- When
hashCodeis called more than once on the same object, it should consistently return the same integer, provided that no fields that are used in theequalsmethod have been modified. - If two objects are "equal" according to the
equalsmethod, then callinghashCodeon them must produce the same integer. - If two objects are not "equal" according to the
equalsmethod, it's difficult to guarantee that they return distinct integer results. But doing this as far as possible will improveHashMapandHashSetperformance.
Writing a hashCode function
So, how to write a good hashCode?
That's...out of scope for this class, and is an important research problem in its own right.
The OpenDSA Chapter on Hashing is a really good overview.
In this example, we'll look at two examples of hashCode functions.
Suppose we have a simple Person class (the same one we used in the previous lesson). Each Person has two properties:
int ageString name
And two Persons are considered equal to each other if they have equal names and ages.
This means that:
- Two
Persons with equal names and ages should produce the same hash code. - If two
Persons differ in name or age or both, they should produce different hash codes, as far as possible.
We'll look at two ways of writing the hashCode function.
Rolling our own
One option is to compute an integer ourselves.
We should make use of all instance variables that factor into the equals decision in order to meet the "contract" we talked about earlier, and we need to make it so that differences in the values of those instance variables will result in differences in our final integer.
Here's an example.
- In the code below, we start with the number
1. - Then we multiply by
37— multiplying by a prime number like37makes it more likely for us to produce a number that other objects won't produce. You can use any prime number you want. - Then we add the
name'shashCodevalue. TheStringclass already has ahashCodefunction, so we don't need to re-invent that. Note that we are assuming here that thenameis notnull! We can do this because we included code to guarantee this in the previous lesson If it's possible fornameto benull, then you need to check that first. - We multiply this result again by
37, then add theage.
@Override
public int hashCode() {
int result = 1;
result = result * 37 + this.name.hashCode();
result = result * 37 + this.age;
return result;
}
Using existing methods
A lot has been said or written about generating good hash values for different primitive data types (which in turn can be used as building blocks for generating hash values for reference types). Many of these ideas have been implemented already. In many cases it's better to use existing implementations instead of re-inventing the wheel.
The Objects2 class in Java contains number of useful static methods, among them utilities for generating a hash code based on a given set of fields.
The hash static function takes in a list of parameters, and generates a hash code based on those values.
So we could rewrite the hashCode for our Person class as:
@Override
public int hashCode() {
return Objects.hash(this.name, this.age);
}
The Objects class also provides the Objects.equals function. You can use Objects.equals(a, b) instead of a.equals(b).
This is useful in cases you are not sure if a is null or not — a.equals(b) would crash with a NullPointerException if a was null, whereas Objects.equals checks that for you, or you could check it yourself.
-
Paraphrased from the Object documentation. ↩
-
Notice the "s" in
Objects. ↩