E-66
Database Systems
  • Home
  • Lectures
  • Problem Sets
  • Sections
  • Syllabus
  • Schedule
  • Staff
  • Policies
  • Canvas
  • Gradescope
  • Ed Discussion

Section 4

Berkeley DB

We will now go over the basics of using Berkeley DB in Java. We will write some code fragments that create and open and environment, create a simple database, and store and look up key-value pairs.

Open the Berkeley DB Java Edition API and review the APIs of the Database, Environment, and DatabaseEntry classes. Keep the API open as we work through the following exercises. (Note: The Berkeley DB API should also be of great assistance to you on Problem Set 2 — bookmark it!)

We will start with this Java code template: Section4.java

Note that in order for the code to compile, you’ll have to follow the instructions from Problem Set 2 for setting up your environment. You’ll also need the RowOutput.java and RowInput.java files from the starter code of Problem Set 2.

  1. Use the EnvironmentConfig and Environment classes to create and open a BDB environment.

  2. Use the DatabaseConfig and Database classes to create and open a database. Use the file name courses.db for the database.

  3. Using the DatabaseEntry class, create a new record and use the put() method of the Database class to store it in the database.

    • The key of the record should be a fixed-length string of length 7. For this record, use the string "CSCIE66".

    • The data portion of the record should consist of two fixed-length fields: a seven-character string and a four-byte integer. The string should be an instructor ID, and the integer the number of enrolled students.

    • Use a value of "1234567" as the instructor ID for this record, and a value of 35 for the enrollment.

    • Use the RowOutput class to write the actual bytes to the key and data portions of the record.

    • Lastly, determine the outcome of the put() method using the OperationStatus class.

  4. Now use the get() method of the Database class to retrieve the record by its key "CSCIE66". Use the RowInput class to unmarshall the two fields of the record. If the OperationStatus of the retrieval was a success, print the key and value to standard out.

  5. Finally, iterate over all records in the database using the Cursor class. Use the unmarshalling code you wrote previously to read back the value part of each record, and print each key and value to standard out.

    The solutions for this part can be found in Section4Solutions.java

B-Trees and B+Trees

B-trees and B+trees are balanced search trees designed for on-disk data. The difference between the two is that in B+trees, all of the actual data is stored in the leaf nodes, and the internal nodes contain keys which may or may not appear in the actual dataset. The leaf nodes are also typically linked together to form a linked list at the bottom of the B+tree. Each node in the tree (typically) correspond to a page.

  1. Insert the following sequence of keys into an initially empty B-tree of order 2:

    51, 3, 10, 77, 20, 40, 34, 28, 61, 80, 68, 93, 90, 97, 14

    We inserted the above keys into an initially empty B-tree of order 2 (i.e. m = 2).

                          ---------------------
                          | 20 | 40 | 68 | 90 |
                         _+----+----+----+----+_________
               _________/     /      \__  \__________   \_____________
              /              /          \            \                \
    ---------------      -----------    -----------    -----------    -----------
    | 3 | 10 | 14 |      | 28 | 34 |    | 51 | 61 |    | 77 | 80 |    | 93 | 97 |
    ---------------      -----------    -----------    -----------    -----------
    

    The sequence of insertions is as follows. We indicate inserted values using a prefixed asterisk.

    -------
    | *51 |
    -------
    
    -----------
    | *3 | 51 |
    -----------
    
    -----------------
    |  3 | *10 | 51 |
    -----------------
    
    ----------------------
    |  3 | 10 | 51 | *77 |
    ----------------------
    
               -------
               | *20 |
               +-----+
              /       \
    -----------      -----------
    |  3 | 10 |      | 51 | 77 |
    -----------      -----------
    
                ------
                | 20 |
               _+----+_
              /        \
    -----------         -----------------
    |  3 | 10 |         | *40 | 51 | 77 |
    -----------         -----------------
    
                ------
                | 20 |
               _+----+_
              /        \
    -----------        ----------------------
    |  3 | 10 |        | *34 | 40 | 51 | 77 |
    -----------        ----------------------
    
                  -----------
                  | 20 | 40 |
                __+----+----+___
               /       |        \
    -----------   ------------   -----------
    |  3 | 10 |   | *28 | 34 |   | 51 | 77 |
    -----------   ------------   -----------
    
                  -----------
                  | 20 | 40 |
                __+----+----+___
               /       |        \
    -----------   ------------   -----------------
    |  3 | 10 |   | 28 | 34 |    | 51 | *61 | 77 |
    -----------   ------------   -----------------
    
                  -----------
                  | 20 | 40 |
                __+----+----+__
               /       |       \
    -----------   -----------   ----------------------
    |  3 | 10 |   | 28 | 34 |   | 51 | 61 | 77 | *80 |
    -----------   -----------   ----------------------
    
                       -----------------
                       | 20 | 40 | *68 |
                      _+----+----+-----+_
               ______/  ___/      \__    \____
              /        /             \        \
    -----------   -----------   -----------   -----------
    |  3 | 10 |   | 28 | 34 |   | 51 | 61 |   | 77 | 80 |
    -----------   -----------   -----------   -----------
    
                       ----------------
                       | 20 | 40 | 68 |
                      _+----+----+----+_
               ______/  ___/      \__   \_____
              /        /             \        \
    -----------   -----------   -----------   -----------------
    |  3 | 10 |   | 28 | 34 |   | 51 | 61 |   | 77 | 80 | *93 |
    -----------   -----------   -----------   -----------------
    
                              ----------------
                              | 20 | 40 | 68 |
                      ________+----+----+----+
               ______/  __________/   _/      \____ 
              /        /             /             \
    -----------   -----------   -----------   ----------------------
    |  3 | 10 |   | 28 | 34 |   | 51 | 61 |   | 77 | 80 | *90 | 93 |
    -----------   -----------   -----------   ----------------------
    
                              --------------------
                              | 20 | 40 | 68 | 90 |
                      ________+----+----+----+----+____
               ______/  __________/   _/      \____    \_________
              /        /             /             \             \
    -----------   -----------   -----------   -----------   ------------
    |  3 | 10 |   | 28 | 34 |   | 51 | 61 |   | 77 | 80 |   | 93 | *97 |
    -----------   -----------   -----------   -----------   ------------
    
                              ---------------------
                              | 20 | 40 | 68 | 90 |
                             _+----+----+----+----+_________
                     _______/     /      \__  \__________   \_____________
                    /            /          \            \                \
    ----------------      -----------    -----------    -----------    -----------
    | 3 | 10 | *14 |      | 28 | 34 |    | 51 | 61 |    | 77 | 80 |    | 93 | 97 |
    ----------------      -----------    -----------    -----------    -----------
    

  2. Now, we want to insert the same sequence of keys into an initially empty B+tree of order 2 (i.e., m = 2).

    1. Before performing the insertions, do you expect the result to be the same or different? Why?

      Different. Because data items must be only in the leaf nodes, the rules for splitting a leaf node change somewhat. The key of the middle item is copied up into the parent so that it appears in both the parent and the new leaf node created as a result of the split. In addition, the leaf nodes will be linked together.

    2. Perform the sequence of insertions and draw the B+tree as it is formed.

      We indicate inserted values using a prefixed asterisk.

      -------
      | *51 |
      -------
      
      -----------
      | *3 | 51 |
      -----------
      
      -----------------
      |  3 | *10 | 51 |
      -----------------
      
      ----------------------
      |  3 | 10 | 51 | *77 |
      ----------------------
      
                 -------
                 | *20 |
                 +-----+
                /       \
      -----------      -----------------
      |  3 | 10 |------| *20 | 51 | 77 |
      -----------      -----------------
      
                 ------
                 | 20 |
                 +----+
                /      \
      -----------      ----------------------
      |  3 | 10 |------| 20 | *40 | 51 | 77 |
      -----------      ----------------------
      
                  -----------
                  | 20 | 40 |
                 _+----+----+_______
                /       \           \
      -----------   ------------    ----------------
      |  3 | 10 |---| 20 | *34 |----| 40 | 51 | 77 |
      -----------   ------------    ----------------
      
                      -----------
                      | 20 | 40 |
                 _____+----+----+________
                /          |             \
      -----------   -----------------    ----------------
      |  3 | 10 |---| 20 | *28 | 34 |----| 40 | 51 | 77 |
      -----------   -----------------    ----------------
      
                      -----------
                      | 20 | 40 |
                 _____+----+----+________
                /          |             \
      -----------   ----------------    ----------------------
      |  3 | 10 |---| 20 | 28 | 34 |----| 40 | 51 | *61 | 77 |
      -----------   ----------------    ----------------------
      
                             ----------------
                             | 20 | 40 | 61 |
                 ____________+----+----+----+_________
                /                 |     \             \
      -----------   ----------------    -----------   -----------------
      |  3 | 10 |---| 20 | 28 | 34 |----| 40 | 51 |---| 61 | 77 | *80 |
      -----------   ----------------    -----------   -----------------
      
                             ----------------
                             | 20 | 40 | 61 |
                 ____________+----+----+----+_________
                /                 |     \             \
      -----------   ----------------    -----------   ----------------------
      |  3 | 10 |---| 20 | 28 | 34 |----| 40 | 51 |---| 61 | *68 | 77 | 80 |
      -----------   ----------------    -----------   ----------------------
      
                             ---------------------
                             | 20 | 40 | 61 | 77 |
                       ______+----+----+----+----+__________
                ______/           |     \    \_________     \_________
               /                 /       \             \              \
      -----------   ----------------    -----------   -----------   -----------------
      |  3 | 10 |---| 20 | 28 | 34 |----| 40 | 51 |---| 61 | 68 |---| 77 | 80 | *93 |
      -----------   ----------------    -----------   -----------   -----------------
      
                             ---------------------
                             | 20 | 40 | 61 | 77 |
                       ______+----+----+----+----+__________
                ______/           |     \    \_________     \_________
               /                 /       \             \              \
      -----------   ----------------    -----------   -----------   ----------------------
      |  3 | 10 |---| 20 | 28 | 34 |----| 40 | 51 |---| 61 | 68 |---| 77 | 80 | *90 | 93 |
      -----------   ----------------    -----------   -----------   ----------------------
      
                                            ------
                                            | 61 |
                                       _____+----+_____________
                                      /                        \
                             -----------                     -----------
                             | 20 | 40 |                     | 77 | 90 |
                       ______+----+----+                     +----+----+_____
                ______/           |     \                   /      \___      \______
               /                 /       \                 /           \            \
      -----------   ----------------    -----------   -----------   -----------   -----------------
      |  3 | 10 |---| 20 | 28 | 34 |----| 40 | 51 |---| 61 | 68 |---| 77 | 80 |---| 90 | 93 | *97 |
      -----------   ----------------    -----------   -----------   -----------   -----------------
      
                                               ------
                                               | 61 |
                                          _____+----+_____________
                                         /                        \
                                   -----------                     -----------
                                   | 20 | 40 |                     | 77 | 90 |
                             ______+----+----+                     +----+----+_____
                      ______/           |     \                   /      \___      \______
                     /                 /       \                 /           \            \
      -----------------   ----------------    -----------   -----------   -----------   ----------------
      |  3 | 10 | *14 |---| 20 | 28 | 34 |----| 40 | 51 |---| 61 | 68 |---| 77 | 80 |---| 90 | 93 | 97 |
      -----------------   ----------------    -----------   -----------   -----------   ----------------
      

    3. How would you perform a range search on this tree for all keys in the range (25, 60)? Why is such a query easier in a B+tree than it is in a standard B-tree?

      The search starts at the root node that contains 61. Since 25 < 61, we would continue the search in the left child of the root node, call it node20-40. 25 is between 20 and 40, so we continue searching in the middle child of node20-40. The middle child is a leaf node. We now start scanning the items in the leaf node. The first item, 20, is ignored as it is not in the range. The next two items, 28 and 34, are in the required range. We follow the link connecting the current node to its right sibling, and we continue scanning in the right sibling node. The items in the sibling node, 40 and 51, are in range. We follow the link to the next leaf node, and since its first key (the 61) is greater than 60, we stop scanning.

      This type of range-search query is easier in a B+tree because we can take advantage of the links connecting the leaf nodes. In a standard B-tree, we would need to go back up the tree in order to find the next leaf node.

                                                  ||
                                                  \/
                                                -OO---
                                                O 61 |
                                           OOOOO+----+_____________
                                          O                        \
                                   -----OO----                     -----------
                                   | 20 O 40 |                     | 77 | 90 |
                             ______+----O----+                     +----+----+_____
                      ______/           O     \                   /      \___      \______
                     /                 O       \                 /           \            \
      -----------------   -----=============  =============   -----------   -----------   ----------------
      |  3 | 10 |  14 |---| 20 | *28 | *34 |==| *40 | *51 |---| 61 | 68 |---| 77 | 80 |---| 90 | 93 | 97 |
      -----------------   -----=============  =============   -----------   -----------   ----------------
      

Last updated on September 27, 2024.