How to Tell if Family of Hash Functions Are Universal
| Commodity Index |
|---|
| Universal Hashing |
| Parity |
Page i of 2
Hashing is a fun thought that has lots of unexpected uses. Here nosotros look at a novel type of hash function that makes information technology easy to create a family unit of universal hash functions. The method is based on a random binary matrix and is very unproblematic to implement.
Put simply you requite a hash office an detail of information x and it returns a number h(x).
You can utilize this number for all sorts of things simply in general a hash part has a number of desirable properties.
- The first is that h(ten), the hash value should be smaller in the sense it takes less storage than the original data x.
- The 2nd is that if you accept a a set of data items then h(10) should spread the data items out over the range of possible hash values i.e. the human relationship betwixt x and h(x) should non exist as well regular.
Typically hash functions are used for storage, shop x in Data[h(10)], or for security, if h(x) changes then x has been changed.
If you are using a hash function for security then you need a higher class of hash function - a cryptographic hash. The hash part described in the case tin can exist extended to a cryptographic hash but in its electric current course it is suitable for use every bit a storage hash.
Find that as a hash role "condenses" down an item of data to a smaller number of possible hash values it is non unique. That is, for any hash function there will be usually quite a lot of other data values, y say, for which h(y)=h(x). This is often chosen a standoff and it is perfectly natural for a hash part and all hashing algorithms take to deal with the situation in one style or another.
In this article the focus is on hash functions used for data storage and retrieval.
Families of hash functions
In the early days of hashing you generally just needed a unmarried good hash office. Today things are getting increasingly complex and you often demand whole families of hash functions.
One time such family is the Universal Hash. This is a ready of hash functions with an interesting additional property. First we need to look at the problem that this additional property is designed to solve.
Consider a hash storage scheme based on storing x in a location given by h(10) which ranges from 0 to N-1.
What is the worse thing that can happen?
The reply is that you are given Northward things to store that all map to the aforementioned hash value - i.e. you endeavor and shop everything in the aforementioned location and every access involves a collision.
You can prove quite easily that for any hash function it is possible to find a dataset that provides this worst example.
For example if the number of things yous could ask to store is greater than N*North then if you attempt to store N*N information elements in the assortment with merely N storage locations it is obvious that each location will store Due north data elements i.eastward. in that location are going to be North collisions. Just selection the data mapped to the same location and you have your worst instance dataset.
It is a unproblematic result of mapping a big gear up to a pocket-size number of locations - there have to be collisions - and you can find whatever number but by filling the storage and keeping going.
So can you protect your hashing scheme against this assail?
Aye you tin past having a family of hashing functions H that you lot randomly select from earlier starting the algorithm. In this case whatever worst case fix of data that some i has selected has only a probability of being worst case for the hash function selected.
If the family unit of hash functions is such that given x and y and then the probability that h(x)=h(y) for a hash function drawn randomly from chiliad possible hash functions is one/yard and then the family unit is chosen a universal hash family.
Universal hash families are particularly useful for algorithms that demand multiple hash functions or which need the data structure to exist rebuilt if too many collisions occur (await out for Cuckoo hashing coming soon).
So nosotros need an example of a universal hash family.
There are standard examples of universal hash functions created using the usual "multiplication mod a prime" - i.eastward. like to congruential generators.
However, there is a little known method based on using a random matrix. It has lots of advantages - it's a universal family, information technology performs well, it's easy to understand and it's quick to compute.
The thought is very simple.
Suppose you have an input data item that you have input data with grand bits and you want a hash part that produces n bits and then outset generate a random nxm binary matrix 1000. The hash part simply consists of working out
h(x)=Mx
where you consider x to exist a binary vector.
For instance, suppose you have a four scrap input value and want to generate a three bit hash value. Then generating a random matrix gives say:
( 0 1 0 0 )
M= ( 1 0 1 1 )
( 1 1 0 one )
and if the data value was 1011 the hash value would exist computed as:
( 0 i 0 0 )(1) (0)
Mx= ( 1 0 i 1 )(0) = (1)
( i 1 0 1 )(1) (0)
(1)
or in other give-and-take h(1011)=Chiliad(1011)=010.
If you find the math difficult to follow information technology might assist to be reminded that in binary (or mod ii) arithmetic 1x1=one, 0x0=0 and 1x0=0, too 0+0=0, 0+one=1 and i+1=0.
At that place are a number of other ways to expect at the fashion the arithmetic is done that suggest unlike ways of implementing the algorithm.
The first is to notice that what you are doing is anding each row with the data column vector. That is taking the second row as an instance:
( 1 0 1 1 )And (1 0 1 one) = (1 0 1 1)
and and so you add together up the $.25 in the consequence:
1+0+1+1=ane
Calculation up the bits in the effect can also be interpreted every bit the parity function because the result is cypher if there are an fifty-fifty number of ones and 1 if there are an odd number of ones. Notice also that this involves m Ands and m parity determinations.
The second manner of looking at the arithmetic is to notice that the multiplication picks out the columns of M corresponding to where the data has a one then does a bitwise addition or sectional or. For example:
( 0 1 0 0 )(one) (0)
Mx= ( 1 0 ane 1 )(0) = (ane)
( 1 1 0 1 )(1) (0)
(one)
picks out the kickoff, third and fourth columns of Grand and adds or sectional ors them together:
( 0 + 0 + 0 ) (0)
( ane + 1 + 1 ) = (1)
( 1 + 0 + ane ) (0)
Observe that this might involve every bit many as thousand columns and probably m iterations to class the event. Every bit m>n the get-go method is worth looking at more closely.
There may exist other ways to translate the arithmetic equally logical operations just these two are the most useful. Which is best depends on the hardware available. For example, if the machine you are working with has a fast fashion to make up one's mind the parity of a word then the first method should work well.
An implementation
To demonstrate how the idea works let'southward create a small C# class that implements both approaches to the random binary matrix hash.
The code is easy plenty to catechumen into whatsoever other language.
Commencement we need to create the random matrix. Instead of working with a bit array it makes more sense to use an Int32 for each row of the array and assume that the input data is an int32. For simplicity nosotros can utilize Int32 and just use the lower 31 $.25 to give a positive number range for the data.
So outset a new C# projection and add a new class complete with random number generator and Int32 assortment ready to hold the rows of the flake matrix:
class Mhash
{
Random R = new Random();
Int32[] hashmatrix;
Int32 M=0;
The constructor creates the random matrix with thou rows and stores grand for other methods to brand use of:
public Mhash(Int32 grand)
{
Yard=thou;
hashmatrix= new Int32[M];
for (int i = 0; i < M; i++)
{
hashmatrix[i] = R.Next();
}
}
One time the constructor is finished we can use it to create a hash object capable of taking a positive Int32 and returning an 1000 flake hash.
The next step is to create the method that does this job:
public Int32 Hash1(Int32 x)
{
Int32 hash=0;
for (int i = 0; i < M; i++)
{
hash=hash<<1;
hash = hash | parity(10 & hashmatrix[i]);
}
return hash;
}
The method takes each row of the matrix and ands it with the data. The result is then converted by the parity role into a 0, for even parity or a ane, for odd parity. Don't worry how parity works for the moment. At the end of the routine nosotros have an m bit hash to return.
If you want to write the method a little more compactly you could write the body of the for loop as:
for (int i = 0; i < 1000; i++)
{
hash <<= one;
hash |= parity(x & hashmatrix[i]);
}
Now to try it out all y'all take to do is:
Mhash hashobj=new Mhash(8);
Int32 h1 = hashobj.Hash1(x);
and you have an viii-fleck hash for any positive Int32 you intendance to supply.
How to Tell if Family of Hash Functions Are Universal
Source: https://www.i-programmer.info/programming/theory/2664-universal-hashing.html
0 Response to "How to Tell if Family of Hash Functions Are Universal"
Post a Comment