dev-resources.site

for different kinds of informations.

Hashing

Published at

11/13/2022

What is hashing?

The internet has been growing in its content rapidly. The content on the internet doubles every six months making it very difficult to find a particular one. The usual array and linked lists used to store data lead to a hard time when the data is large.

Thus, Enters HASHING.

A certain data of large size in an array takes O(logn) times in a sorted one and O(n) in the unsorted one. It might feel easier if the sorted array uses special techniques like binary search but if it’s not, we might have to use linear search. However, we cannot be sure if the error margin will be minimized. Hence, hashing is used so that we can have the constant time of O(1) on the operations. This is a far more effective method since the time of operation is independent of the data size n.

Map-The data structure

The word ‘map’ means the relation of two sets — (key, value). The relation between the key and the value is that given a key, you can use a function to get a value. The function is called a hash function.

For a simple understanding, let’s imagine an array where the key is the index, and the value of that index is the required value. The array X with the key “i” gives the value from X[i]. The difference from the concept of an array on hashing is that the concept of a hash table is more generalized in comparison. The latter can have any value as a key instead of having an integer all the time. It can even be a name or an object times according to your requirement. The only trick here is to have a valid hash function to form an index that can store the object at a particular location from where we can later find the object easily.

Example: -

Let’s imagine a set of strings {“pragya”, “sapkota”, “medium”}. Now we need to store these strings on a table. We need to have a quick update and the operation time should be O(1). We can’t be responsible for the ordering of the operation or even the maintenance.

Now, let’s think of a schema where we number the alphabets like a=1, b=2, c=3,. ….., z=26.

So, let’s have a number for each string.

“pragya” = 16+18+1+7+25+1 = 68
“sapkota” = 19+1+16+11+15+20+1 = 83
“medium” = 13+5+4+9+21+13 = 65

If you have a table of size 4, we can use the hash function as
sum mod 4
and thus place it on the table for the table.

“pragya” = 68 mod 4 = 0
“sapkota” = 83 mod 4 = 3
“medium” = 65 mod 4 = 1

The table will look as below: -

0	1	2	3
pragya	medium		sapkota

As you can notice from above that the idea is computing a location for a string with the help of a hash function. Here, the hash function is sum of the characters mod Table size.

The way of searching a string with the use of the value is a very effective way of storing words in a dictionary.

Hashing: The problems

On the above strings {“pragya”, “sapkota”, “medium”} we used the lowercase letters which have the numbers 68, 83, and 65 respectively. However, if we have the permutations of the letters such as “aygarp” for “pragya” or “saPkotA” for “sapkota” or “Medium” for “medium” — the numbers will the same as 68, 83, and 65. This leads to storing more than one string in one location which ultimately leads to what we call on hashing terms — COLLISION.

Likewise, another problem with hashing is determining the size of a hash table. The size is preferred to be a prime number to prevent collisions.

All of this leads us to two points as follows: -

Picking a good hash function
Dealing with the collisions

The hash function is supposed to be easy to compute and the one to avoid the maximum collisions.

Consistent Hashing

If you have five servers and one of the servers dies due to some reason, the hashing function won’t be able to know it by itself. And if you then add a replacement server, it won’t know about it too. But the mod operator still stays 5 despite one server being removed and the other being added. For solving this problem, we have Consistent Hashing.

The concept of consistent hashing was brought to reduce the problem of hashing. However, it doesn’t eliminate those problems. This may not sound much in small-scale data but for large data — this is very important. The function applies to the requests and the servers to result in output in a set range of values. It reduces the problem of failed servers where the traffic still gets routed and the new server that has no allocations.

Conclusion

Hashing can be an intricate topic when we learn system design. But we can divide each of the terms and mini topics to learn them individually in detail. Consistent hashing is one of the important ones to study and understand so that we understand other system design concepts like load balancing.

pragyaasapkota / System-Design-Concepts

A repo with some system design concepts.

System Design

Systems design is the process of defining elements of a system like modules, architecture, components and their interfaces and data for a system based on the specified requirements.

This is a index for the concepts of system.

If you wish to open these in a new tab, Press CTRL+click

S.N.	Table of Content
1.	Caching
2.	Network Protocols
3.	Storage: The Underrated Topic
4.	Latency and Throughput
5.	System Availability
6.	Leader Election
7.	Proxies
8.	Load Balancing
9.	Endpoint Protection
10.	HTTPS: Is it better than HTTP?
11.	Polling and Streaming
12.	Long Polling
13.	Hashing
14.	CAP Theorem
15.	PACELC Theorem
16.	Messaging and Pub-Sub
17.	Database Relational Database Non-relational Database
18.	Logging, Monitoring, and Alerting
19.	Distributed System
20.	Scaling
21.	Event Driven Architecture (EDA)

Thank you!!!

Show some ❤️ by starring this repository!

View on GitHub

I hope this article was helpful to you.

Please don’t forget to follow me!!!

Any kind of feedback or comment is welcome!!!

Thank you for your time and support!!!!

Keep Reading!! Keep Learning!!!

hashing Article's

30 articles in total

What is Hashing? A Complete Guide for Developers and Security Professionals