
How does HashMap work in Java? (HashMap internals)

HashMap is probably one of the most used, and definitely one of the least understood, Java classes. In this post let's look at the source code of HashMap.java to understand how data is inserted into and retrieved from a HashMap.

We know that HashMap works on the principle of 'hashing': put(key, value) is used to store a key and its corresponding value, and get(key) is used to retrieve the value for a given key.
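
For reference, here is a minimal usage sketch of that API (the keys and values are made up for illustration):

    import java.util.HashMap;
    import java.util.Map;

    public class HashMapBasics {
        public static void main(String[] args) {
            Map<String, Integer> ages = new HashMap<>();
            ages.put("Alice", 30);    // store a key and its value
            ages.put("Bob", 25);
            ages.put("Alice", 31);    // same key again: the old value 30 is replaced

            System.out.println(ages.get("Alice"));   // 31
            System.out.println(ages.get("Charlie")); // null - no such key
        }
    }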

The image below shows how HashMap stores its data. The blue squares represent the 'buckets', i.e. the indexes into the internal array, determined from the hash value of each key. (We will see later exactly how the hash value is calculated.) At each index of the array a linked list is stored, whose nodes are of type 'Map.Entry'. Each node holds one key/value pair.


Why is the linked list needed? Because two unequal objects can have the same hash value (read more about equals and hashCode here). Remember that a HashMap will store a value for every distinct key, so if different keys produce the same hash value they will reside at the same index in the table, in different nodes of the linked list.
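
A small sketch makes this concrete (the CollidingKey class and its constant hash value of 42 are invented for illustration): both keys land in the same bucket, yet both values remain retrievable, because equals() distinguishes them while the bucket's list is walked.

    import java.util.HashMap;
    import java.util.Map;

    // A deliberately bad key class: every instance has the same hashCode,
    // so all entries land in the same bucket and share one linked list.
    class CollidingKey {
        private final String name;
        CollidingKey(String name) { this.name = name; }

        @Override public int hashCode() { return 42; } // same hash for all keys
        @Override public boolean equals(Object o) {
            return o instanceof CollidingKey && ((CollidingKey) o).name.equals(name);
        }
    }

    public class CollisionDemo {
        public static void main(String[] args) {
            Map<CollidingKey, String> map = new HashMap<>();
            map.put(new CollidingKey("K9"), "value-9");
            map.put(new CollidingKey("K19"), "value-19");

            // Both values are retrievable: the bucket's list is traversed and
            // equals() picks out the right node.
            System.out.println(map.get(new CollidingKey("K9")));  // value-9
            System.out.println(map.get(new CollidingKey("K19"))); // value-19
        }
    }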

Let's look at the default implementation of HashMap in Java.


    public V put(K key, V value) {
        if (key == null)
            return putForNullKey(value);
        int hash = hash(key.hashCode());
        int bucketIndex = (hash & (table.length - 1));
        // Walk the linked list in this bucket; if an 'equal' key already
        // exists, replace its value and return the old one.
        for (Entry<K,V> e = table[bucketIndex]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }

        // Key not present: prepend a new node to this bucket's linked list.
        // (The real addEntry() also resizes the table when it gets too full.)
        modCount++;
        Entry<K,V> e = table[bucketIndex];
        table[bucketIndex] = new Entry<>(hash, key, value, e);

        return null;
    }


    public V get(Object key) {
        if (key == null)
            return getForNullKey();
        int hash = hash(key.hashCode());
        int bucketIndex = (hash & (table.length - 1));
        // Walk the bucket's linked list until a node with the same hash
        // and an 'equal' key is found.
        for (Entry<K,V> e = table[bucketIndex]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
                return e.value;
        }
        return null;
    }



       static int hash(int h) {
           // This function ensures that hashCodes that differ only by
           // constant multiples at each bit position have a bounded
           // number of collisions (approximately 8 at default load factor).
           h ^= (h >>> 20) ^ (h >>> 12);
           return h ^ (h >>> 7) ^ (h >>> 4);
       }

Now let's analyze the code:

           int hash = hash(key.hashCode());
           int bucketIndex = (hash & (table.length-1));


(The 'hash' function could be anything that spreads hash values as evenly as possible. You could implement your own HashMap with your own hash function; Hashtable.java, for instance, uses a different one.)
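
To make the index calculation concrete, here is a small standalone sketch (the class name and the two sample hashCode values are made up). The table length is always a power of two, so hash & (table.length - 1) keeps only the low bits of the hash; that is exactly why the supplemental hash() shown above mixes the high bits downward before the index is taken.

    public class BucketIndexDemo {
        // Same supplemental hash as in HashMap.java above.
        static int hash(int h) {
            h ^= (h >>> 20) ^ (h >>> 12);
            return h ^ (h >>> 7) ^ (h >>> 4);
        }

        public static void main(String[] args) {
            int tableLength = 16; // default capacity, always a power of two

            // Two sample hashCodes that differ only in their high bits.
            int[] rawHashCodes = {0x10000, 0x20000};
            for (int h : rawHashCodes) {
                int naiveIndex = h & (tableLength - 1);          // both collapse to bucket 0
                int spreadIndex = hash(h) & (tableLength - 1);   // high bits now influence the index
                System.out.println(h + " -> naive bucket " + naiveIndex
                        + ", spread bucket " + spreadIndex);
            }
        }
    }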

Note that get() and put() have a lot of code in common, because before putting any key/value pair, put() checks whether the key already exists. When you call get(key), the key is hashed and the bucketIndex is calculated from that hash value. The linked list at that index is then traversed until a node is found whose hash matches and whose key is 'equal' to the input key.

In the image above, K9, K19 and K29 all produce the same hash value H9. So if you call get(K19), traversal starts at the node for K9: the hash matches, but "K9".equals("K19") is false, so it moves on to the node for K19, where both conditions are satisfied and its value is returned.

The implementation of Hashtable is similar, except that its methods are synchronized.
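
As a side note on that difference: if thread safety is what you need, the more common choices today are a synchronized wrapper around HashMap or a ConcurrentHashMap. A minimal sketch (map contents made up):

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class ThreadSafeMaps {
        public static void main(String[] args) {
            // Every call goes through a single lock on the wrapper,
            // similar in spirit to Hashtable's synchronized methods.
            Map<String, Integer> syncMap = Collections.synchronizedMap(new HashMap<>());
            syncMap.put("a", 1);

            // ConcurrentHashMap allows concurrent reads and finer-grained locking on writes.
            Map<String, Integer> concurrentMap = new ConcurrentHashMap<>();
            concurrentMap.put("b", 2);

            System.out.println(syncMap.get("a") + " " + concurrentMap.get("b"));
        }
    }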

A word about keys: immutable objects make the best keys, because immutability guarantees that the hash value stays the same between the moment a value is put in and the moment it is retrieved. String objects work particularly well for this reason, and they already implement sensible equals() and hashCode() methods.
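
To illustrate (the MutableKey class below is made up purely for demonstration): if a key's fields change after it has been inserted, its hashCode changes too, so a later lookup either computes a different bucket or fails the equals() check against the stored key, and the entry is effectively lost.

    import java.util.HashMap;
    import java.util.Map;
    import java.util.Objects;

    // Illustrative only: a key whose hashCode depends on a mutable field.
    class MutableKey {
        String name;
        MutableKey(String name) { this.name = name; }

        @Override public int hashCode() { return Objects.hash(name); }
        @Override public boolean equals(Object o) {
            return o instanceof MutableKey && Objects.equals(((MutableKey) o).name, name);
        }
    }

    public class MutableKeyDemo {
        public static void main(String[] args) {
            Map<MutableKey, String> map = new HashMap<>();
            MutableKey key = new MutableKey("original");
            map.put(key, "some value");

            key.name = "changed"; // hashCode no longer matches the one used at insertion

            System.out.println(map.get(key));                        // typically null - lookup goes to a different bucket
            System.out.println(map.get(new MutableKey("original"))); // also null - right bucket, but equals() fails against the mutated stored key
        }
    }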

Once we understand the internals, it all seems quite elementary!


