Hash Table
Hash Table
Exercise 1 - Calculate, Insert, find load factor and steps
This exercise helps you to find out all about how hash tables work and how to compare different hashing schemes. We’ll use the same keys in two hash tables that use different hashing schemes.
(a) Calculate hash values for the following keys using the hash function
h(k) = k mod 19.
738, 570, 687, 111, 530, 966, 524, 382, 701, 842, 348, 22, 658, 220, 817
[To do a mod calculation, you can use the Scientific View of the Calculator accessory program supplied by Windows, or you can use Google.]
| k | 738 | 570 | 687 | 111 | 530 | 966 | 524 | 382 | 701 | 842 | 348 | 22 | 658 | 220 | 817 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| h(k) | 16 | 0 | 3 | 16 | 17 | 16 | 11 | 2 | 17 | 6 | 6 | 3 | 12 | 11 | 0 |
(b) Insert these values (in this order) into an empty hash table of size 19, using the linear probing scheme, with hash function h(k) = k mod 19.
738, 570, 687, 111, 530, 966, 524, 382, 701, 842, 348, 22, 658, 220, 817
| index | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| value | 570 | 817 | 382 | 687 | 22 | 842 | 348 | 524 | 220 | 658 | 738 | 530 | 111 |
简要插入过程(关键冲突)
- 570 → 0(空,放入)
- 738 → 16(空,放入)
- 687 → 3(空,放入)
- 111 → 16(被占)→17(被占)→18
- 530 →17(空,放入)
- 966 →16(被占)→17(被占)→18(被占)→0(被占)→1(空,放入)
- 524 →11(空,放入)
- 382 →2(空,放入)
- 701 →17(被占)→18(被占)→0(被占)→1(被占)→2(被占)→3(被占)→4(被占)→5(空)→6(被占)→7(空,放入)
- 842 →6(空,放入)
- 348 →6(被占)→7
- 22 →3(被占)→4
- 658 →12(空,放入)
- 220 →11(被占)→12
- 817 →0(被占)→1
(c) What is the load factor of the table constructed in part (b) above?
The load factor is defined as:
- Number of keys inserted: 15
- Hash table size: 19
Load factor =
(d) What is the average number of steps for a successful search in the hash table from part (b) above? (count 1 inspection of a hash table slot as 1 step)
| Key | h(k)=k mod19 | Final index | Steps |
|---|---|---|---|
| 738 | 16 | 16 | 1 |
| 570 | 0 | 0 | 1 |
| 687 | 3 | 3 | 1 |
| 111 | 16 | 18 | 3 |
| 530 | 17 | 17 | 1 |
| 966 | 16 | 1 | 5 |
| 524 | 11 | 11 | 1 |
| 382 | 2 | 2 | 1 |
| 701 | 17 | 7 | 10 |
| 842 | 6 | 6 | 1 |
| 348 | 6 | 7 | 2 |
| 22 | 3 | 4 | 2 |
| 658 | 12 | 13 | 2 |
| 220 | 11 | 12 | 2 |
| 817 | 0 | 1 | 2 |
总步数:35
元素个数:15
平均成功查找步数:
(e) Insert these values (in this order) into an empty hash table of size 19, using the double hashing scheme, with hash function h(k) = k mod 19 and probe function p(k) = (7k mod 18) + 1
738, 570, 687, 111, 530, 966, 524, 382, 701, 842, 348, 22, 658, 220, 817
哈希函数:,探测步长:
| 项目 | 738 | 570 | 687 | 111 | 530 | 966 | 524 | 382 | 701 | 842 | 348 | 22 | 658 | 220 | 817 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 16 | 0 | 3 | 16 | 17 | 16 | 11 | 2 | 17 | 6 | 6 | 3 | 12 | 11 | 0 | |
| 1 | 13 | 4 | 4 | 3 | 13 | 15 | 11 | 12 | 9 | 7 | 11 | 17 | 11 | 14 | |
| 双重哈希最终位置 | 16 | 0 | 3 | 1 | 17 | 10 | 11 | 2 | 7 | 6 | 13 | 14 | 12 | 3 | 4 |
| 线性探测最终位置 | 16 | 0 | 3 | 18 | 17 | 1 | 11 | 2 | 7 | 6 | 7 | 4 | 13 | 12 | 1 |
最终状态:
| 索引 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 值 | 570 | 111 | 382 | 220 | 842 | 701 | 524 | 658 | 348 | 817 | 738 | 530 | 966 |
(f) What is the average number of steps for a successful search in the hash table from part (e) above? (count one inspection of a hash-table slot as one step)
公式:
- 探测序列:
- 步数 = 找到该键时的 每键步数(精确计算)
| 项目 | 738 | 570 | 687 | 111 | 530 | 966 | 524 | 382 | 701 | 842 | 348 | 22 | 658 | 220 | 817 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 16 | 0 | 3 | 16 | 17 | 16 | 11 | 2 | 17 | 6 | 6 | 3 | 12 | 11 | 0 | |
| 步长 | 1 | 13 | 4 | 4 | 3 | 13 | 15 | 11 | 12 | 9 | 7 | 11 | 17 | 11 | 14 |
| 最终位置 | 16 | 0 | 3 | 1 | 17 | 10 | 11 | 2 | 3 | 6 | 13 | 6 | 12 | 3 | 14 |
| 探测步数 | 1 | 1 | 1 | 2 | 1 | 2 | 1 | 1 | 3 | 1 | 2 | 3 | 1 | 2 | 2 |
总步数 & 平均值
总步数 =
元素个数 =
平均步数 =
Exercise 2 – Hash Functions and Table Sizes
What keys should we use for our data? What makes a good hash function? What size of hash table should be used? Discuss these issues in the following scenarios:
- a hash table used to store details of all the students currently enrolled at CDUT
- a hash table used inside a browser to store all the URLs that you have visited within the past week
| Item | University Students | Browser Weekly URLs |
|---|---|---|
| Best Key | Unique Student ID | Full URL string |
| Good Hash Function | Simple numeric mod / lightweight hash | String rolling hash (FNV, Murmur) |
| Ideal Table Size | Fixed prime, ~1.2–1.5 × max students | Dynamic prime, resizable, |
| Load Factor | ≤ 0.8 | ≤ 0.7 |
| Stability | Very stable | Highly dynamic |
Exercise 3 – Quadratic Probing
Repeat Exercise 1 with the same data, but for the quadratic-probing hashing scheme. Compare the average number of steps for a successful search in your table to that taken by the linear probing and double hashing hash tables.
-
Hash function:
-
Quadratic probing:
-
Table size (prime, , so quadratic probing covers all slots)
-
Insert order: 738, 570, 687, 111, 530, 966, 524, 382, 701, 842, 348, 22, 658, 220, 817
Full insertion & probe steps (steps = )
| Key | 738 | 570 | 687 | 111 | 530 | 966 | 524 | 382 | 701 | 842 | 348 | 22 | 658 | 220 | 817 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| h(k) | 16 | 0 | 3 | 16 | 17 | 16 | 11 | 2 | 17 | 6 | 6 | 3 | 12 | 11 | 0 |
| Probe sequence | 16 | 0 | 3 | 16→17 | 17→18 | 16→17→18→0→5 | 11 | 2 | 17→18→0→1→4 | 6 | 6→7 | 3→4→8 | 12 | 11→12→15 | 0→1 |
| Final index | 16 | 0 | 3 | 17 | 18 | 5 | 11 | 2 | 4 | 6 | 7 | 8 | 12 | 15 | 1 |
| Steps | 1 | 1 | 1 | 2 | 2 | 5 | 1 | 1 | 5 | 1 | 2 | 3 | 1 | 3 | 2 |
Total steps for quadratic probing
Total steps = 31
Average steps =
Final hash table (quadratic probing, size 19)
| Index | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Value | 570 | 817 | 382 | 687 | 701 | 966 | 842 | 348 | 22 | 524 | 658 | 220 | 738 | 111 | 530 |
Comparison of average successful search steps
| Method | Average steps | Value |
|---|---|---|
| Linear probing | 35/15 | |
| Quadratic probing | 31/15 | |
| Double hashing | 25/15 |
Conclusion
- Double hashing is best (fewest collisions, least clustering)
- Quadratic probing is better than linear probing (avoids primary clustering)
- Linear probing worst (strong clustering, longest probes)
Exercise 4 – Double Hashing
In Exercise 1, the probe function used for the double hashing scheme was Explain why the +1 is included in the probe function, i.e. what would be the consequence of using for example .
The prevents the probe step from being zero. Without it, some keys would produce , causing the probe sequence to stall at the initial hash position and making insertion impossible even when empty slots exist.
Exercise 5 – Unsuccessful Searching
Also calculate the average number of steps taken for an unsuccessful search in the linear probing hash table from Exercise 1.
This time you can’t take the average over the elements inserted, you’ll have to take the average over all possible slots that an element not stored in the hash table could hash to.
Linear Probing (original table, size )
Rules for unsuccessful search steps:
For a given starting slot :
-
Steps = number of consecutive occupied slots starting at , until the first empty slot (including the occupied ones, stop at empty).
-
We average over all 19 table slots (0 to 18).
First, recall the final linear probing table (index: value / empty)
| Index | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Value | 570 | 817 | 382 | 687 | 22 | empty | 842 | 348 | empty | empty | empty | 524 | 220 | 658 | empty | empty | 738 | 530 | 111 |
| State | OCC | OCC | OCC | OCC | OCC | EMP | OCC | OCC | EMP | EMP | EMP | OCC | OCC | OCC | EMP | EMP | OCC | OCC | OCC |
Legend:
-
OCC: occupied
-
EMP: empty
Unsuccessful steps per starting index Count how many probes until first empty slot:
-
Index 0: 0,1,2,3,4 → 5 steps (hit empty at 5)
-
Index 1: 1,2,3,4 → 4 steps
-
Index 2: 2,3,4 → 3 steps
-
Index 3: 3,4 → 2 steps
-
Index 4: 4 → 1 step
-
Index 5: empty → 1 step
-
Index 6: 6,7 → 2 steps
-
Index 7: 7 → 1 step
-
Index 8: empty → 1 step
-
Index 9: empty → 1 step
-
Index 10: empty → 1 step
-
Index 11: 11,12,13 → 3 steps
-
Index 12: 12,13 → 2 steps
-
Index 13: 13 → 1 step
-
Index 14: empty → 1 step
-
Index 15: empty → 1 step
-
Index 16: 16,17,18 → 3 steps
-
Index 17: 17,18 → 2 steps
-
Index 18: 18 → 1 step
Total steps
Average unsuccessful steps
Exercise 6 – Chaining
Show how to insert the following stream of records with key fields into an initially-empty hash table of size 13 (indexed by 0 .. 12), using the chaining collision-resolution scheme, using the hashing function h defined by . 534, 702, 105, 523, 959, 699, 821, 883, 842, 686
Step 1: Compute hash values
Hash function:
-
534 → 1
-
702 → 0
-
105 → 1
-
523 → 3
-
959 → 10
-
699 → 10
-
821 → 2
-
883 → 12
-
842 → 10
-
686 → 10
Step 2: Insert with chaining (separate chaining, append to end of list)
Hash table size = 13 (indices 0–12)
| Index | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Chain | 702 | 534 → 105 | 821 | 523 | 959 → 699 → 842 → 686 | 883 |
Final hash table (chaining) visual
0: [702]
1: [534, 105]
2: [821]
3: [523]
4: []
5: []
6: []
7: []
8: []
9: []
10: [959, 699, 842, 686]
11: []
12: [883]
Worst situation time complexity: O(n)
Exercise 7 – Chaining versus Linear Probing
Consider the average number of steps required for a successful search in a hash table using chaining. Is the chaining scheme better or worse for successful searches, on average, than the linear probing scheme?
Average Successful Search: Chaining vs. Linear Probing
For separate chaining (the method you just used):
-
Each index holds a linked list of collided keys.
-
For a successful search:
- We hash to the correct index (1 step).
- Then traverse the list until we find the key.
-
The average number of steps is:
where (load factor).
For linear probing:
-
It suffers from primary clustering (long consecutive occupied blocks).
-
The average successful search steps are much more sensitive to clustering and given by:
-
This grows very rapidly as approaches 1 (much faster than chaining).
Conclusion:
Chaining is significantly better for successful searches on average
because:
- It avoids clustering entirely.
- Collisions are confined to their own linked list, not spreading across the table.
- The average search length grows linearly and slowly with load factor, whereas linear probing’s average steps explode due to clustering.
In short:
✅ Chaining: better average performance
❌ Linear probing: worse due to primary clustering
支持与分享
如果这篇文章对你有帮助,欢迎分享给更多人或赞助支持!