In addition to the B-trees and B+trees that we discussed last week, we can also store database keys and data in a hash table index. For an on-disk hash table, each bucket is (typically) a page, as is each overflow bucket. In a dynamic hash table, the number of buckets increases gradually in proportion to the number of items in the table, with minimal need for re-hashing existing items.
Start with an empty hash table containing 2 buckets. Using linear dynamic hashing add the following values, in order, to the table: 1000, 972, 713, 12, 436, 113, 116. Use the hash function h(x) = x % 10 (x mod 10). Grow the table whenever the number of items becomes more than twice the number of buckets.
First we calculate i and f. i = log2 n = log2 2 = 1, where n is the number of buckets (when log2 n is not a whole number, we round i up). f = k/(2n) = 0/(2·2) = 0, where k is the number of entries. With this formula for f, we must always ensure that f <= 1. We start with the following hash table:
0 [] 1 []
Insert 1000: h(1000) = 0 = 0000. The rightmost i = 1 bit of h(1000) is 0, so we add 1000 to bucket 0. f is now 1/(2·2) = 1/4.
0 [1000] 1 []
Insert 972: h(972) = 2 = 0010. The rightmost bit is 0, so we add 972 to bucket 0. f = 2/4 = 1/2.
0 [1000, 972] 1 []
Insert 713: h(713) = 3 = 0011. The rightmost bit is 1, so we add 713 to bucket 1. f = 3/4.
0 [1000, 972] 1 [713]
Insert 12: h(12) = 2 = 0010. The rightmost bit is 0, so we add 12 to bucket 0. f = 4/4 = 1, so on the next insert we will have to add a bucket.
0 [1000, 972, 12] 1 [713]
Insert 436: h(436) = 6 = 0110. Before we can insert 436, we have to grow the table by 1 bucket, because with 436 added f = 5/(2·2) = 5/4 > 1. So n += 1 yields n = 3.
Now we must check whether we need to increase i: log2 3 is between 1 and 2, so we round up to i = 2. We add the new bucket 10, give the existing bucket labels a leading 0 (00 and 01), split bucket 00 by redistributing its contents using the rightmost i = 2 bits of each hash, and then insert 436 using the rightmost i = 2 bits of its hash (10).
00 [1000] (because 1000 hashes to 0000, or 00) 01 [713] (we don't need to look at the contents of this bucket at all) 10 [972, 12, 436] (972 and 12 hash to 0010, and 436 hashes to 0110; the rightmost two bits of each are 10)
Note that f is now 5/(2·3) = 5/6.
Insert 113: h(113) = 3 = 0011. The rightmost i = 2 bits are 11. However, there is no bucket 11, so we ignore the leading bit and use the bucket given by the rightmost i - 1 bits. So we insert 113 into bucket 01. Now f = 6/(2·3) = 1, so on the next insert we’ll have to add a bucket.
00 [1000] 01 [713, 113] 10 [972, 12, 436]
Insert 116: h(116) = 6 = 0110. Before inserting, we must grow the table by 1 bucket, bucket 11 (splitting bucket 01), because with 116 added f would exceed 1. Now n = 4, and i = log2 4 = 2 still.
Notice that we’re not splitting most efficiently, since we’re splitting a bucket with only 2 values (and not even splitting it evenly). If we were limited to, say, 3 values per bucket, we would have to use an overflow bucket to store 116, because it is the fourth value that maps to bucket 10 (and so we have to “chain” on an overflow bucket). After this insert f = 7/(2·4) = 7/8.
00 [1000] 01 [] 10 [972, 12, 436]---[116] 11 [713, 113] (713 and 113 both hash to 0011, or 11.)
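The whole procedure can be summarized in code. The following is a minimal sketch in Python; the class and method names (LinearHashTable, insert, and so on) are illustrative rather than from any particular library, and per-bucket capacity and overflow chaining are not modeled, so the table grows only on the load-factor rule f = k/(2n) <= 1.

```python
import math

class LinearHashTable:
    """Minimal sketch of the linear dynamic hashing scheme worked above.

    Illustrative only: buckets are plain Python lists, and there is no
    per-bucket capacity or overflow chaining.
    """

    def __init__(self, num_buckets=2):
        self.n = num_buckets                            # number of buckets
        self.i = max(1, math.ceil(math.log2(self.n)))   # hash bits in use
        self.k = 0                                      # number of stored items
        self.buckets = [[] for _ in range(self.n)]

    @staticmethod
    def h(x):
        return x % 10                                   # hash function from the exercise

    def _bucket_index(self, x):
        b = self.h(x) & ((1 << self.i) - 1)             # rightmost i bits
        if b >= self.n:                                 # bucket doesn't exist yet,
            b = self.h(x) & ((1 << (self.i - 1)) - 1)   # so drop the leading bit
        return b

    def _grow(self):
        # Add one bucket; its index is the old n, and i may need to round up.
        new_index = self.n
        self.n += 1
        self.i = max(1, math.ceil(math.log2(self.n)))
        self.buckets.append([])
        # Split the bucket that shares the new bucket's rightmost i - 1 bits.
        split_index = new_index - (1 << (self.i - 1))
        old_contents = self.buckets[split_index]
        self.buckets[split_index] = []
        for value in old_contents:
            self.buckets[self._bucket_index(value)].append(value)

    def insert(self, x):
        # Grow first if this insert would push f = k/(2n) above 1.
        if self.k + 1 > 2 * self.n:
            self._grow()
        self.buckets[self._bucket_index(x)].append(x)
        self.k += 1


table = LinearHashTable()
for value in [1000, 972, 713, 12, 436, 113, 116]:
    table.insert(value)
print(table.buckets)  # [[1000], [], [972, 12, 436, 116], [713, 113]]
```

Because the sketch has no per-bucket capacity limit, 116 simply lands in bucket 10 instead of being chained onto an overflow page.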
Recall that a schedule is:

- **serializable** if its effect on the database is equivalent to that of some serial schedule (the transactions run one at a time, in some order);
- **conflict serializable** if it can be transformed into a serial schedule by swapping adjacent non-conflicting operations (equivalently, if its precedence graph has no cycle);
- **recoverable** if every transaction commits only after every transaction whose writes it read has committed;
- **cascadeless** if every transaction reads only values written by transactions that have already committed.
In the exercises below, determine whether or not the provided schedule has each of the above properties.
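The last two properties can also be checked mechanically. The sketch below is only an illustration, assuming a schedule is represented as a list of (transaction, action, item) triples; the function name check_recoverability and the example schedule at the end are hypothetical and not taken from the exercises.

```python
def check_recoverability(schedule):
    """Return (recoverable, cascadeless) for a schedule given as a list of
    (transaction, action, item) triples, where action is "read", "write",
    or "commit" (item is ignored for commits)."""
    last_writer = {}    # item -> transaction that most recently wrote it
    committed = set()   # transactions that have committed so far
    commit_pos = {}     # transaction -> position of its commit
    reads_from = set()  # (reader, writer) pairs: reader read writer's write
    cascadeless = True

    for pos, (txn, action, item) in enumerate(schedule):
        if action == "write":
            last_writer[item] = txn
        elif action == "read":
            writer = last_writer.get(item)
            if writer is not None and writer != txn:
                reads_from.add((txn, writer))
                if writer not in committed:
                    cascadeless = False  # dirty read: writer not yet committed
        elif action == "commit":
            committed.add(txn)
            commit_pos[txn] = pos

    # Recoverable: whenever Tj reads from Ti and Tj commits, Ti must have
    # committed first.
    recoverable = all(
        reader not in commit_pos
        or (writer in commit_pos and commit_pos[writer] < commit_pos[reader])
        for reader, writer in reads_from
    )
    return recoverable, cascadeless


# Hypothetical schedule: T2 reads T1's uncommitted write, then T1 commits
# before T2 does, so it is recoverable but not cascadeless.
s = [("T1", "write", "X"), ("T2", "read", "X"),
     ("T1", "commit", None), ("T2", "commit", None)]
print(check_recoverability(s))  # (True, False)
```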
T1 | T2 |
---|---|
read(X) | |
read(X) | |
write(Y) | |
write(Y) | |
commit | |
commit |
Is this schedule...
serializable?
conflict serializable?
recoverable?
cascadeless?
T1 | T2 |
---|---|
read(X) | |
write(X) | |
read(X) | |
write(Y) | |
commit | |
commit |
Is this schedule...
serializable?
conflict serializable?
Since we cannot move T2’s write to X either before or after all of T1’s operations, there’s no way to produce a conflict-equivalent serial schedule.
recoverable?
cascadeless?
T1 | T2 |
---|---|
read(X) | |
write(X) | |
read(X) | |
write(Y) | |
commit | |
commit |
Is this schedule...
serializable?
conflict serializable?
recoverable?
cascadeless?
T1 | T2 | T3 |
---|---|---|
read(A) | ||
read(A) | ||
read(B) | ||
write(A) | ||
read(B) | ||
write(A) | ||
write(A) | ||
commit | ||
commit | ||
commit |
Is this schedule...
serializable?
conflict serializable?
recoverable?
cascadeless?
T1 | T2 | T3 | T4 | |
---|---|---|---|---|
1 | read(X) | |||
2 | read(X) | |||
3 | write(Y) | |||
4 | read(Y) | |||
5 | read(Y) | |||
6 | write(X) | |||
7 | read(W) | |||
8 | write(Y) | |||
9 | read(W) | |||
10 | read(Z) | |||
11 | write(W) | |||
12 | read(Z) | |||
13 | write(Z) |
Draw a precedence graph to determine if this schedule is conflict serializable.
| T1 | T2 |
|---|---|
| write(X) | |
| | read(X) |
Since T1’s write to X comes first in this schedule, and that write conflicts with T2’s read of X, T1 must come before T2 in the equivalent serial schedule.
When we’re done drawing the graph, if we end up with a cycle, it means that we have an impossible situation (e.g., transaction A must come before transaction B, but transaction B must come before transaction A) and the schedule is not conflict serializable.
As you can see, there are cycles in the graph: T1→T3→T4→T1 and T1→T2→T3→T4→T1. This means that the schedule is not conflict serializable.
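As a complement to drawing the graph by hand, here is a minimal Python sketch that builds a precedence graph from a schedule of (transaction, action, item) triples and tests it for a cycle. The function names and the short example schedule at the bottom are illustrative only; they are not the schedule from the table above.

```python
from collections import defaultdict

def precedence_graph(schedule):
    """Build a precedence graph: add an edge Ti -> Tj whenever an operation
    of Ti precedes and conflicts with an operation of Tj (same item, at
    least one of the two operations is a write)."""
    edges = defaultdict(set)
    for pos, (ti, act_i, item_i) in enumerate(schedule):
        for tj, act_j, item_j in schedule[pos + 1:]:
            if ti != tj and item_i == item_j and "write" in (act_i, act_j):
                edges[ti].add(tj)
    return edges

def has_cycle(edges):
    """Depth-first search for a cycle in the precedence graph."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = defaultdict(int)

    def visit(node):
        color[node] = GRAY
        for nxt in edges[node]:
            if color[nxt] == GRAY:
                return True          # back edge: cycle found
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[node] == WHITE and visit(node) for node in list(edges))


# Hypothetical two-transaction schedule with a cycle T1 -> T2 -> T1.
schedule = [
    ("T1", "read", "X"),
    ("T2", "write", "X"),
    ("T2", "read", "Y"),
    ("T1", "write", "Y"),
]
graph = precedence_graph(schedule)
print(dict(graph))       # {'T1': {'T2'}, 'T2': {'T1'}}
print(has_cycle(graph))  # True -> not conflict serializable
```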
Last updated on October 3, 2024.