E-66
Database Systems
  • Home
  • Lectures
  • Problem Sets
  • Sections
  • Syllabus
  • Schedule
  • Staff
  • Policies
  • Canvas
  • Gradescope
  • Ed Discussion

Section 7

XML

We want to represent a bank’s data in XML. It currently uses a relational DBMS with the following relations:

Account(number CHAR(10) PRIMARY KEY, branch VARCHAR(20), balance FLOAT);
Customer(number CHAR(8) PRIMARY KEY, name VARCHAR(20), address VARCHAR(30));
Owns(account_number CHAR(10), customer_number CHAR(8),
    PRIMARY KEY (account_number, customer_number),
    FOREIGN KEY (account_number) REFERENCES Account,
    FOREIGN KEY (customer_number) REFERENCES Customer
);
  1. What would be some of the advantages and disadvantages of representing this data in XML?

    In general, representing the data in XML makes it easier to exchange the data among applications and to integrate it with other XML data from other sources. However, in general, storing the data in XML is less efficient (tags are repeated in the file, the file is plain-text instead of binary, etc.) and queries on the data may be less efficient than querying a relational DBMS.

XQuery

Testing your queries

You can test the expressions you write for this section using this sample XML file: bank.xml

Note: Do not merely copy and paste the result of clicking on the link to the XML file—some web browsers will interpret the XML document as HTML or try to display the XML tree from the root element and hide the schema (e.g., Chrome). Instead, right-click the link and click Save Link As... (or similar).

  • Download the XML file to a known directory.
  • Open it in BaseX. (For details about how to use BaseX, see the last problem of Problem Set 3.)
  • Test your queries by entering them in the Editor window in BaseX.

Note: In section, we discussed how you could use Berkeley DB XML for these queries. As a result, we needed to add a collection argument to the beginning of our XPath expressions. This is not necessary in BaseX, so we’ve revised the solutions below accordingly.

Recall that XPath models an XML document as a tree in which the nodes represent elements and attributes, and that it allows us to access collections of nodes from the tree.

XPath expressions

  1. Write an XPath expression to obtain...

    1. the name elements of all the bank’s customers

      /bank/customer/name
      

    2. the account numbers of all accounts with a balance that is greater than 400

      /bank/account[balance > 400]/account-num
      

  2. How would you change your XPath expressions if the schema...

    1. specified that the customer name was an attribute of a customer element?

      After realizing that we need to use @ to specify the attribute of an element, we would alter our query to:

      /bank/customer/@name
      

    2. specified that an account number was an attribute of an account element?

      /bank/account[balance > 400]/@account-num
      

These exercises are based on examples from the book Database System Concepts by Silberschatz, Korth, and Sudarshan.

FLWOR expressions

  1. Write an XQuery FLWOR expression that includes a where clause to obtain the account numbers of all accounts with a balance that is greater than 400.

    for $a in /bank/account
    where $a/balance > 400
    return $a/account-num
    

  2. Write an XQuery FLWOR expression to produce the join of the account, customer, and owner elements. The result of the query should contain elements of type cust-acct, which consists of a customer element followed by an associated account element.

    for $a in /bank/account,
        $c in /bank/customer,
        $o in /bank/owner
    where $a/account-num = $o/account-num
      and $c/customer-num = $o/customer-num
    return <cust-acct>{ $c, $a }</cust-acct>
    

  3. Write an XQuery FLWOR expression to list, for each branch of the bank, the account numbers of all accounts at that branch. The resulting elements should have the following form:

    <branch-accounts>
      <branch-name>branch 1 name</branch-name>
      <account-num>branch 1 account number 1</account-num>
      <account-num>branch 1 account number 2</account-num>
    ...
    </branch-accounts>
    <branch-accounts>
      <branch-name>branch 2 name</branch-name>
      ...
    </branch-accounts>
    ...
    

    for $b in distinct-values(/bank/account/branch)
    return <branch-accounts>
           {
              <branch-name>{ $b }</branch-name>,
              for $a in /bank/account
              where $a/branch = $b
              return $a/account-num
           }         
           </branch-accounts>
    

    Here we use distinct-values() to eliminate duplicate branch names. Note that distinct-values() gives us the distinct values of each branch element. This means that $b will contain strings throughout the FLWOR expression.

    If we weren’t using distinct-values() here, we’d need to use $b/text() to properly extract the value inside the branch elements:

    for $b in /bank/account/branch
    return <branch-accounts>
           {
              <branch-name>{ $b/text() }</branch-name>,
              for $a in /bank/account
              where $a/branch = $b
              return $a/account-num
           }
           </branch-accounts>
    

  4. Now write an XQuery FLWOR expression to determine the number of accounts that each customer owns. The resulting elements should have the following form:

    <customer-account-count>
      <customer-name>customer 1 name</customer-name>
      <account-count>customer 1 account count</account-count>
    </customer-account-count>
    <customer-account-count>
      <customer-name>customer 2 name</customer-name>
      <account-count>customer 2 account count</account-count>
    </customer-account-count>
    ...
    

    for $c in /bank/customer
    return <customer-account-count>
           {
              <customer-name>{ $c/name/text() }</customer-name>,
              <account-count>
                { count(/bank/owner[customer-num = $c/customer-num]) }
              </account-count>
           }
           </customer-account-count>
    

    We use $c/name/text() to remove the tags for the name elements. This allows us to use different element tags—in this case, customer-name rather than name. This type of transformation can be useful when exchanging data or integrating data from heterogeneous sources.

These exercises are based on examples from the book Database System Concepts by Silberschatz, Korth, and Sudarshan.