Problem Set 3

Due before the start of lecture on November 5, 2024. See below for a summary of the policies regarding late submissions.

Preliminaries

Homework is due prior to the start of lecture. If it is submitted more than 10 minutes after the start of lecture, it will be considered late. There will be a 10% deduction for submissions made any time in the following seven days, up to the start of the next lecture. We will not accept any homework that is more than 7 days late. Plan your time carefully, and don’t wait until the last minute to begin an assignment. Starting early will give you ample time to ask questions and obtain assistance.

In your work on this assignment, make sure to abide by our policies on academic conduct.

If you have questions while working on this assignment, please attend office hours or post them on Ed Discussion.

Part I

50 points total

Creating the necessary folder

Create a subfolder called ps3 within your e66 folder, and put all of the files for this assignment in that folder.

Creating the necessary file

This part of the assignment will all be completed in a single PDF file. To create it, you should do the following:

Access the template that we have created by clicking on this link and signing into your Google account as needed.
When asked, click on the Make a copy button, which will save a copy of the template file to your Google Drive.
Select File->Rename, and change the name of the file to ps3_partI.
Add your work for all of the problems from Part I to this file.
Once you have completed Part I, choose File->Download->PDF document, and save the PDF file on your machine. The resulting PDF file (ps3_partI.pdf) is the one that you will submit. See the submission guidelines at the end of Part I.

Problem 1: Conflict serializability

10 points total

Below are two schedules in which actions by three transactions (T1, T2, and T3) are interleaved.

schedule 1:

w1(A); r2(A); w2(B); r3(B); w3(C); r1(C)

schedule 2:

r1(C); w1(A); r3(A); w2(B); r3(B); w3(C)

(Note: When determining conflict serializability, the order of the commits does not matter, so we haven’t included them above.)

For each of the above schedules, you should take the following steps:

In your copy of the ps3_partI template (see above), edit the diagram that we have provided for the schedule, making whatever changes are needed to construct the schedule’s precedence graph.
State whether the schedule is conflict serializable.
If the schedule is conflict serializable, state one possible equivalent serial schedule. If the schedule is not conflict serializable, explain briefly why the schedule is not equivalent to the serial schedule T1; T2; T3. Your explanation should include a discussion of why particular actions from the original schedule are not consistent with that serial schedule.

Problem 2: Two-phase locking and isolation

10 points total

Assume that transactions are operating concurrently in a DBMS in which:

The system is using two-phase locking to ensure serializability.
The system allows shared locks to be upgraded to exclusive locks.
It is not using update locks.

Consider these two transactions:

T1: r(B); r(C); r(A); w(C); commit
T2: r(B); r(A); r(C); w(A); commit

The following is the beginning of one possible schedule of these transactions, with appropriate lock instructions added:

T1	T2
sl(B) r(B) sl(C) r(C) ...	sl(B) r(B) xl(A) sl(C) u(B) ...

Could T2 have unlocked B earlier than it does in this partial schedule? Explain why or why not.
Assume that after the operations in the partial schedule above have occurred, the DBMS lets transaction T1 go next. Would T1 be able to perform its next operation – i.e., would it be able to read A? Explain briefly why or why not. Make sure to take into account any additional lock request that it may need to make before attempting the read.

Regardless of your answer for part 2, let’s assume that after the operations in the partial schedule above have occurred, the DBMS lets transaction T2 go next instead of T1. In addition, assume that both transactions are able to complete the rest of their actions, and that the complete schedule of the transactions looks like this:

T1	T2
sl(B) r(B) sl(C) r(C) sl(A) r(A) xl(C) w(C) commit u(A); u(B); u(C)	sl(B) r(B) xl(A) sl(C) u(B) r(C) w(A) u(A) commit u(C)

Does the above schedule observe strict locking? Explain briefly why or why not.

Is the above schedule recoverable? Explain briefly why or why not.
Is the above schedule cascadeless? Explain briefly why or why not.

Problem 3: Lock modes

9 points total; 1.5 points each part

Consider the following partial schedule of the transactions T1, T2, and T3:

xl2(A); w2(A); sl1(B); r1(B); ul3(C); ...

Given this partial schedule, which of the following lock requests would be granted if it were the next operation in the schedule, and which would be denied?

ul1(A)
sl3(B)
ul2(B)
sl1(C)
xl3(C)
ul2(C)

Fill in the provided table with the system’s response to each request, along with a brief explanation of your answer. (Remember: uli(X) indicates Ti’s request of the update lock for X.)

Problem 4: Deadlock detection

9 points total

For each sequence of operations below, determine whether deadlock will occur under rigorous two-phase locking. You should assume that:

an exclusive lock for an item is requested just before writing that item
a shared lock for an item is requested (if needed) just before reading that item
a transaction commits immediately after completing its final read or write
upgrades of shared locks are allowed
update locks are not being used.

If deadlock occurs, show the partial schedule in table form (including lock operations and any commits) up to the point of deadlock, along with the waits-for graph at the point of deadlock.

If deadlock does not occur, show the full schedule, including lock operations and commits. In that case, you do not need to include the waits-for graph.

In either case, your schedule should include any changes to the sequence of operations that occur because a transaction is forced to wait for a lock. If a transaction waits for a lock that is subsequently granted, you should include two lock requests – one for the initial request, and one at the point at which the lock is granted. If two transactions wait for a lock held by the same other transaction, and that other transaction subsequently commits, you should assume that the waiting transaction with the smaller transaction number is granted its request first.

Important

Don’t forget that if a transaction is forced to wait for a lock, it cannot make any forward progress until that lock is granted.

sequence 1:

r2(C); r1(D); w1(C); r3(C); w2(C); w3(D); w1(D)

sequence 2:

w3(A); r2(A); w2(A); r1(B); r3(B); w3(B); w1(B)

Problem 5: Timestamps and multiple versions

12 points total; 4 points each part

Consider the following sequence of operations:

s1; s2; s3; s4; s5; w3(A); w1(A); w4(A); r5(A); r2(A); c1; c2; c3; c4; c5

where si indicates the start of transaction Ti, which is when its timestamp is assigned, and ci indicates the commit of transaction Ti.

Assume that the transactions are assigned the following timestamps, based on the order in which they start:

TS(T1) = 10
TS(T2) = 20
TS(T3) = 30
TS(T4) = 40
TS(T5) = 50

and that RTS(A) and WTS(A) are both initially 0.

Given these assumptions:

In the first table for this problem in ps3_partI, we’ve filled in the row for the first operation. Complete the table to show how the system would respond to the remaining read and write requests when it is using regular timestamp-based concurrency control without commit bits.
- In the first column of the table, put the requested operation.
- In the second column of the table, indicate the response of the system by selecting the correct option from the following list:
  - allowed
  - ignored
  - denied; roll back
- In the third column of the table, you should do one of the following:
  - If the action is allowed, summarize any changes to the state maintained for item A, as we’ve done in the first row.
  - If the action is ignored or denied, include a brief explanation. For example, if a transaction T7 tried to read item B and its read were too late, you would include something like this:
```
TS(T7) < WTS(B)
```
- You do not need to restart a transaction that is rolled back, which means that you can skip any requests by a transaction that come after the point at which it is rolled back.
- Because commits don’t have an effect when we’re not using commit bits, you do not need to include the commit actions (c1, c2, etc.) in this table.
Complete the second table that we’ve provided to show the system’s response to this sequence of operations if the DBMS is using regular timestamp-based concurrency control with commit bits.
- In addition to the reads and writes, you should include the commits at the end of the sequence since they may cause a change in state.
- In the second column of the table, there are now four options to choose from:
  - allowed
  - ignored
  - denied; roll back
  - denied; make wait
- If a transaction is made to wait, your table should include an additional row for the operation in question when the wait comes to an end – something like:
  
  request response of the system
  
  ...
  
  r7(X) denied; make wait
  
  ...
  
  r7(X) allowed
Explain what will happen in response to this sequence of operations if the DBMS uses multiversion timestamp-based concurrency control without commit bits.
- In the third column, make sure to specify which version’s state is being updated (e.g., “create A(t) with RTS = 0” or “RTS(A(t)) = ...”, where t is the timestamp of the version).
- Similarly, if a transaction is allowed to read A, make sure to indicate in the second column which version it is allowed to read (e.g., if there was a version with a timestamp of 100 that it was allowed to read, you would say “allowed to read A(100)”)
- Because commits don’t have an effect when we’re not using commit bits, you do not need to include the commit actions (c1, c2, etc.) in this table.

request	response of the system
...
r7(X)	denied; make wait
...
r7(X)	allowed

Submitting your work for Part I

Login to Gradescope by clicking the link in the left-hand navigation bar. Once you are in logged in, click on the box for CSCI E-66.

Submit your ps3_partI.pdf file using these steps:

If you still need to create the PDF file, open your file on Google Drive, choose File->Download->PDF document, and save the PDF file on your machine.
Click on PS 3: Part I in the list of assignments. You should see a pop-up window labeled Submit Assignment. (If you don’t see it, click the Submit or Resubmit button at the bottom of the page.)
Choose the Submit PDF option, and then click the Select PDF button and find the ps3_partI.pdf that you created in step 1. Then click the Upload PDF button.
You should see an outline of the problems along with thumbnails of the pages from your uploaded PDF. For each problem in the outline:
- Click the title of the problem.
- Click the page(s) on which your work for that problem can be found.
As you do so, click on the magnifying glass icon for each page and doublecheck that the pages that you see contain the work that you want us to grade.
Once you have assigned pages to all of the problems in the question outline, click the Submit button in the lower-right corner of the window.
You should see a box saying that your submission was successful. Click the (x) button to close that box.
You can use the Resubmit button at the bottom of the page to resubmit your work as many times as needed before the final deadline.

Important

It is your responsibility to ensure that the correct version of a file is on Gradescope before the final deadline. We will not accept any file after the submission window for a given assignment has closed, so please check your submission carefully using the steps outlined above.
If you are unable to access Gradescope and there is enough time to do so, wait an hour or two and then try again. If you are unable to submit and it is close to the deadline, email your homework before the deadline to esaracin@fas.harvard.edu

Part II

50-60 points total

Problem 6: Converting the Person table to XML

25 points total

In this problem, you will write a series of methods that can be used to create an XML version of the Person table from Problem Set 1. Your methods will be part of a larger program that uses the JDBC framework to connect to the SQLite database that you used in PS 1 and to execute the SQL queries needed to extract the necessary data.

Obtaining the necessary files

You should begin by downloading the following zip file: problem6.zip

Unzip/extract the contents of the file.

Depending on your system, after extracting the contents you will either have:

a folder named problem6 that contains all of the files that you need for this problem
an outer folder called problem6 that contains an inner folder named problem6 that contains all of the Java files that you need for this problem.

Take the problem6 folder that actually contains the necessary files and drag it into your ps3 folder so that you can easily find and open it from within VS Code.

Getting started

Read through our overview of the JDBC framework.
Launch VS Code on your laptop.
In VS Code, select the File->Open Folder or File->Open menu option, and use the resulting dialog box to find and open the problem6 folder that you created above – the one that contains the provided files. (Note: You must open the folder; it is not sufficient to simply open one of the Java files in the folder.)

The name of the folder should appear in the Explorer pane on the left-hand side of the VS Code window, along with a list of all of its contents.
Click on the name XMLforPeople.java in the Explorer pane, which will open the file that you need to modify.
Review all of the code that we’ve provided before you start writing any new code. See below for some additional information on what we’ve given you.

The provided code

The class that you will be completing is called XMLforPeople. In this class we’ve given you:

the constructor for the class, which takes the name of a SQLite file that should contain a relational database with the schema outlined in Problem Set 1; the constructor establishes a connection to the SQLite database, and it stores the resulting Connection object in a field called db.
a helper method called simpleElem(), which takes as inputs the name and value of a simple element and returns a string of the form "<name>value</name>". This method should only be used to form simple elements – ones that do not have any child elements.
a helper method called resultsFor(), which takes as input a string presenting a SQL query for the movie database and returns a ResultSet object that can be used to process the results of that query. You are welcome to use this method in the code that you write, although doing so is not required. It will also be useful in testing one of the methods that you write.
a method called idFor(), which takes as input the name of a person and performs a query to find and return the person’s id; this method will be useful when testing the methods that you write.
a method called createFile(), which performs a query to obtain the id numbers of all of the people in the Person table, and which processes them one at a time
the main method, which runs the full program.

Note that many of these methods – and all of the methods that you will implement – include a throws clause in their method header. This clause is needed because the code included in these methods may throw the exception(s) mentioned in the throws clause, and instead of catching them, we are simply declaring that they may be thrown.

The methods you will write

You will implement a number of non-static methods of the XMLforPeople class.

Important

We’ve provided the headers of the methods that you will implement. You must not change these headers in any way.
In the code that you write, you must limit yourself to the packages that we’ve imported at the top of the starter file. You must not use classes from any other Java package. In addition, you must not use any Java features that were not present in Java 8.
We have given you a separate Java class TestDriver that you can use when testing each method. Simply add the appropriate test code to the main method of this class, and run TestDriver to see if you obtain the correct output.

Note: If you are having trouble running TestDriver using the Run link or Run button, you should be able to compile and run it from the command line of the Terminal as follows:
- to compile:
```
javac -cp 'lib/*' *.java
```
- to run on Windows:
```
java -cp 'lib/*;.' TestDriver
```
- to run on macOS:
```
java -cp 'lib/*:.' TestDriver
```
The two commands for running the program are almost identical, but in the Windows version there is a semi-colon (;) before the period in the string after -cp, whereas the macOS version uses a colon (:).

Here are your tasks:

Implement the method called fieldsFor() whose header we have provided. It takes a string representing the id number of a person, and it should return a string containing a sequence of XML elements – one for each non-null field (i.e., column value) in that person’s tuple. You may assume that the person’s name will never be null. If the person’s dob or pob has a null value, that field should not be included in the returned string. If there is no person with the specified id, the method should simply return the empty string.

For example, if you run the following test code (adding it to the main method in TestDriver.java):
```
XMLforPeople xml = new XMLforPeople("movie.sqlite");
System.out.println(xml.fieldsFor(xml.idFor("Julianne Moore")));
System.out.println(xml.fieldsFor("1234567"));   // no person with that id
System.out.println(xml.fieldsFor(xml.idFor("Elisabeth Moss")));
```
you should see:
```
    <name>Julianne Moore</name>
    <dob>1960-12-03</dob>
    <pob>Fayetteville, North Carolina, USA</pob>
     
     
    <name>Elisabeth Moss</name>
    <dob>1983-01-01</dob>
     
```
Notes:
- In our database, Elisabeth Moss has a null value for her pob, and thus that element is not included in the string returned for her.
- You should see an extra blank line when you print the results of any call that produces one or more fields (including the blank line after the results for Elisabeth Moss shown above), because the string returned by the method ends with a newline, and the println method adds its own newline.
- The second println statement prints only a blank line because there is no person whose id is 1234567, and thus the call xml.fieldsFor("1234567") returns an empty string. As a result, there are two blank lines after the results for Julianne Moore: one after her fields, and one from the printing of the empty string.
Important guidelines:
- You must begin by performing the appropriate SQL query. Use the idFor() method as a model for what you should do.
- When processing the results, make sure to follow the approach given in our JDBC overview.
- The elements for the non-null fields should appear in the same order as the corresponding columns in the Person table:
  - name
  - dob
  - pob
- You should use the provided simpleElem() method to form the XML element for a given field.
- Each element should be on its own line, which you can accomplish by including a newline character ("\n") after each element.
- Each element should be preceded by exactly four spaces, and there should be no extra spaces at the end of a given line.
Implement the method called movieElemsFrom() whose header we have provided. It takes a ResultSet object for a SQL query that produces tuples of the form (movie year, movie name), and it creates and returns a string containing the XML for a sequence of 0 or more complex elements of type movie that contain nested child elements of type year and name. If the ResultSet object has no results, the method should simply return the empty string.

For example, if you run the following test code:
```
XMLforPeople xml = new XMLforPeople("movie.sqlite");
String query1 = "SELECT year, name FROM Movie WHERE year = 2020;";
ResultSet results1 = xml.resultsFor(query1);
System.out.println(xml.movieElemsFrom(results1));

String query2 = "SELECT year, name FROM Movie WHERE earnings_rank = 1;";
ResultSet results2 = xml.resultsFor(query2);
System.out.println(xml.movieElemsFrom(results2));
```
you should see:
```
      <movie>
        <year>2020</year>
        <name>Nomadland</name>
      </movie>
      <movie>
        <year>2020</year>
        <name>The Father</name>
      </movie>
      <movie>
        <year>2020</year>
        <name>Minari</name>
      </movie>
 
      <movie>
        <year>2015</year>
        <name>Star Wars: The Force Awakens</name>
      </movie>
 
```
Here again, you should see an extra blank line when you print the results of any call that produces one or more elements (including the blank lines shown above), because the string returned by the method ends with a newline, and the println method adds its own newline.

Important guidelines:
- You shouldn’t make any assumptions about the number of movies in the ResultSet. Rather, your code should be able to handle an arbitrary number of results. See the createFile() method for a model of how to do this. If there are no results in the ResultSet, the method should return an empty string.
- You should use the provided simpleElem() method to form the XML for each simple element. However, you should not use simpleElem to form the outer movie elements, which are not simple because they have children.
- The returned string should be formatted as shown above. The outer start tag, outer end tag, and each child element should be on its own line, with a newline character ("\n") at the end of each line.
- The outer start tag and outer end tag should each be preceded by exactly six spaces, but the child elements should each be preceded by exactly eight spaces. There should be no extra spaces at the end of a given line.
Implement the method called actedIn() whose header we have provided. It takes a string representing the id number of a person, and it should return a string containing the XML for a single complex element named actedIn that includes nested child elements of type movie for all of the movies in the database in which the person has acted, ordered by the year of the movie and (in the case of ties) the name of the movie. If there is no person with the specified id or if the person has not acted in any of the movies in the database, the method should simply return the empty string.

For example, if you run the following test code:
```
XMLforPeople xml = new XMLforPeople("movie.sqlite");
System.out.println(xml.actedIn(xml.idFor("Geena Davis")));
System.out.println(xml.actedIn("1234567"));
System.out.println(xml.actedIn(xml.idFor("Denzel Washington")));
```
you should see:
```
    <actedIn>
      <movie>
        <year>1988</year>
        <name>Accidental Tourist, The</name>
      </movie>
      <movie>
        <year>1999</year>
        <name>Stuart Little</name>
      </movie>
    </actedIn>
 
 
    <actedIn>
      <movie>
        <year>1989</year>
        <name>Glory</name>
      </movie>
      <movie>
        <year>1993</year>
        <name>Philadelphia</name>
      </movie>
      <movie>
        <year>2001</year>
        <name>Training Day</name>
      </movie>
      <movie>
        <year>2016</year>
        <name>Fences</name>
      </movie>
    </actedIn>
 
```
Here again, you should see an extra blank line when you print the results of any call that produces one or more elements (including the blank lines shown above), because the string returned by the method ends with a newline, and the println method adds its own newline.

Note that the second println statement prints only a blank line because there is no person whose id is 1234567, and thus the call xml.actedIn("1234567") returns an empty string.

Important guidelines:
- You should begin by performing the appropriate SQL query to get the necessary ResultSet. Make sure to use an ORDER BY clause in your query to get the necessary ordering.
- After performing the query, almost all of the remaining work will be done by making the appropriate call to your movieElemsFrom() method. You must use that method to produce the XML for the movie elements (if any) for the results in the ResultSet produced by your query.
- Given the string returned by movieElemsFrom(), you should take the necessary steps to return either:
  - a string containing the XML for the appropriate actedIn element
    
    OR
  - an empty string, if there is no person with the specified id or if the person did not act in any of the movies in the database.
- The returned string should be formatted as shown above. The outer start tag, outer end tag, and each child element should be on its own line, with a newline character ("\n") at the end of each line.
- You should not use the simpleElem helper method in this method.
- The outer start tag and outer end tag should each be preceded by exactly four spaces. The child elements should have the leading spaces that were given to them by movieElemsFrom. There should be no extra spaces at the end of a given line.
Implement the method called directed() whose header we have provided. It takes a string representing the id number of a person, and it should return a string containing the XML for a single complex element named directed that includes nested child elements of type movie for all of the movies in the database that the person has directed, ordered by the year of the movie and (in the case of ties) the name of the movie. If there is no person with the specified id or if the person has not directed any of the movies in the database, the method should simply return the empty string.

For example, if you run the following test code:
```
XMLforPeople xml = new XMLforPeople("movie.sqlite");
System.out.println(xml.directed(xml.idFor("Geena Davis")));
System.out.println(xml.directed("1234567"));
System.out.println(xml.directed(xml.idFor("Denzel Washington")));
```
you should see:
```
 
 
    <directed>
      <movie>
        <year>2016</year>
        <name>Fences</name>
      </movie>
    </directed>
 
```
Here again, you should see an extra blank line when you print the results of any call that produces one or more elements (including the blank lines shown above), because the string returned by the method ends with a newline, and the println method adds its own newline.

Note that the first and second println statements print only a blank line because Geena Davis has not directed any of the movies in the database and there is no person whose id is 1234567, and thus the corresponding calls to directed return an empty string.

The guidelines for the previous method also apply here, but for directed elements instead of actedIn elements.
Implement the method called elementFor() whose header we have provided. It takes a string representing the id number of a person, and it should return a string containing the XML for a single complex element named person that includes nested child elements for:
- the non-null fields in the person’s tuple (use fieldsFor() to get these)
- the movies (if any) in which the person has acted (use actedIn() to get these)
- the movies (if any) that the person has directed (use directed() to get these).
In addition, each person element must have an attribute named id for the person’s id number.

For example, if you run the following test code:
```
XMLforPeople xml = new XMLforPeople("movie.sqlite");
System.out.println(xml.elementFor(xml.idFor("Elisabeth Moss")));
System.out.println(xml.elementFor("1234567"));
System.out.println(xml.elementFor(xml.idFor("Denzel Washington")));
```
you should see:
```
  <person id="0005253">
    <name>Elisabeth Moss</name>
    <dob>1983-01-01</dob>
    <actedIn>
      <movie>
        <year>1999</year>
        <name>Girl, Interrupted</name>
      </movie>
    </actedIn>
  </person>
 
 
  <person id="0000243">
    <name>Denzel Washington</name>
    <dob>1954-12-28</dob>
    <pob>Mount Vernon, New York, USA</pob>
    <actedIn>
      <movie>
        <year>1989</year>
        <name>Glory</name>
      </movie>
      <movie>
        <year>1993</year>
        <name>Philadelphia</name>
      </movie>
      <movie>
        <year>2001</year>
        <name>Training Day</name>
      </movie>
      <movie>
        <year>2016</year>
        <name>Fences</name>
      </movie>
    </actedIn>
    <directed>
      <movie>
        <year>2016</year>
        <name>Fences</name>
      </movie>
    </directed>
  </person>
 
```
Here again, you should see an extra blank line when you print the results of any call that produces one or more elements (including the blank lines shown above), because the string returned by the method ends with a newline, and the println method adds its own newline.

Note that the second println statement prints only a blank line because there is no person whose id is 1234567, and thus the call xml.elementFor("1234567") returns an empty string.

Important guidelines:
- This method should not perform any queries of its own. Rather, you must use your previous methods to obtain the necessary child elements. Don’t forget that you should only use the simpleElem method for elements that do not have any children.
- Don’t forget to include the person’s id as an attribute within the start tag for person, as shown above. In order to include the quotes around the id, you will need to use the escape sequence "\"" for each double-quote character.
- The outer start tag and outer end tag should each be on their own line preceded by exactly two spaces and followed by a newline character. The child elements should have the same spacing and formatting described in the earlier method specifications, so you won’t need to add any new spaces to them. Once again, there should be no extra spaces at the end of a given line.

Once you have completed and tested all of your methods, running the XMLforPeople program should create a file named people.xml that represents the entire Person table in XML!

Problem 7: Querying an XML database

25-35 points total

This problems asks you to construct XPath and XQuery queries for an XML version of our entire movie database. The schema of this XML database is described here.

Getting started

To allow you to check your work, we’ll make use of a freely available XML DBMS called BaseX.

You should begin by downloading the following files into your folder for this assignment:

Make sure to put the files in your ps3 folder. If your browser doesn’t allow you to specify where the file should be saved, try right-clicking on the link above and choosing Save as... or Save link as..., which should produce a dialog box that allows you to choose the correct folder for the file.

Performing queries in BaseX

Once you have completed the steps outlined above, you can perform queries by taking the following steps:

Start up the BaseX GUI by double-clicking on the JAR file that you downloaded above (BaseX851.jar).
Select the Database->New menu option, click the Browse button, and use the resulting dialog box to find the imdb.xml file that you downloaded above.
Click Open to select the file, and click OK to create the database.
To execute a query, enter it in the Editor pane in BaseX, and click the green play button to execute it. (You can also use Ctrl+Enter or Ctrl+Return for this purpose.)
The results (if any) will be displayed in the Result pane.

If you have trouble using BaseX

If you have trouble getting BaseX to work on your machine, here are some possible options:

On newer Macbooks, try downloading a newer version of the BaseX JAR file: BaseX103.jar
On MacOS, try running the program from the Terminal.

First, use cd to navigate to the appropriate folder – the one that contains the downloaded JAR file.

Then, entering the following from the command line:
```
java -jar BaseX103.jar
```
Or, if you have the original JAR file:
```
java -jar BaseX851.jar
```
If you are able to launch BaseX but you can’t open the database file, you may need to give Java permission to access the disk. One set of instructions for doing so on Macs can be found here.

Important guidelines

If you’re using a Mac, you should disable smart quotes, because they may lead to errors in BaseX and in our testing. There are instructions for doing so here.
ps3_queries.py is a Python file, so you could use a Python IDE to edit it, but a regular text editor like TextEdit or Notepad++ would also be fine. However, if you use a text editor, you must ensure that you save it as a plain-text file.
Construct the XQuery commands needed to solve the problems given below. Test each command in BaseX to make sure that it works.
Once you have finalized the XQuery command for a given problem, copy the command into your ps3_queries.py file, putting it between the triple quotes provided for that query’s variable. We have included a sample query to show you what the format of your answers should look like.
Each of the problems must be solved by means of a single query. Unless the problem specifies otherwise, you may use either a standalone XPath expression or an XQuery FLWOR expression. Use nested subqueries as needed.
Your queries should only use information provided in the problem itself. In addition, they should work for any XML database that follows the schema that we have specified.

The query problems

Make sure to read and follow the guidelines given above.

It is worth noting that our movie database was last updated in September of last year, so it doesn’t include information about the most recent movies or Academy Award winners.

Write a standalone XPath expression (not a FLWOR expression) to find the names of all people in the database who were born in the state of Maine – i.e., who have the string “Maine, USA” in their place of birth. The results of the query should be a collection of name elements with the names of these people.

Hint: You should use the contains function to perform substring matching, as discussed in lecture.
In Problem Set 1, you wrote a SQL query to find information about Meryl Streep’s Oscars. Write a FLWOR expression to solve the same problem. The results of the query should be new elements of type streep_oscar that include three child elements: the year of the Oscar, the type of the Oscar, and the name of the movie that she won it for.

Hint: You will need to use curly braces and commas as part of your return clause. See the examples in the lecture notes for models of how to transform the elements selected by a query into new types of elements.

Note: You don’t need to worry about indenting and line breaks in the results of the queries for this or any other problem.
Find all movies with an R rating that have won a Best Picture Oscar. The result of your query should be a collection of new elements of type winner whose values consist of the name of the movie, followed by a space, followed by the year of the Oscar, which you should surround with parentheses. For example:
```
    <winner>Gladiator (2001)</winner>
```
Order the results based on the year of the Oscar.

Hints:
- You will need to use the text() function to obtain the values of the relevant elements, without their begin and end tags.
- You will again need to use curly braces and commas as part of your return clause.
- The return clause can also include string literals (e.g., " (").
Do the Oscars reward top-grossing films? Write a query that determines the Oscars won by the top 200 grossing films.

For any of the top 200 grossing movies that have won one or more of the Oscars in the database, your query should construct a new element of type oscar_grosser that includes the following:
- the original child elements for the movie’s name and earnings rank
- a new child element of type num_oscars that has as its value the number of Oscars in the database that were won by that movie
- 1 or more child elements of type award, each of which has as its value the type of one of the Oscars won by the movie.
For example:
```
<oscar_grosser>
  <name>Titanic</name>
  <earnings_rank>8</earnings_rank>
  <num_oscars>2</num_oscars>
  <award>BEST-PICTURE</award>
  <award>BEST-DIRECTOR</award>
</oscar_grosser>
```
Movies that did not win any of the Oscars in the database should not appear in the results.

Hints:
- You will need to use the built-in count() and text() functions. See the lecture notes for examples of queries that use them.
- You will need to use a nested FLWOR expression.
- Some of the hints for the previous problem also apply here.
For each of the movie ratings in the database, construct a new element of type rating-info that includes the following:
- a child element called rating that contains the rating name; see below for more details about this element
- a new child element of type num-movies that has as its value the number of movies in the database with that rating
- a new child element of type avg-runtime that has as its value the average runtime of movies in the database with that rating
- 0 or more child elements of type top-twenty, each of which contains the name of a movie with that rating that is among the top 20 grossing movies of all time, according to its earnings_rank.
For example:
```
<rating-info>
  <rating>PG</rating>
  <num-movies>145</num-movies>
  <avg-runtime>112.0551724137931</avg-runtime>
  <top-twenty>Incredibles 2</top-twenty>
  <top-twenty>The Lion King</top-twenty>
  <top-twenty>Beauty and the Beast</top-twenty>
  <top-twenty>Finding Dory</top-twenty>
  <top-twenty>Frozen II</top-twenty>
  <top-twenty>Star Wars: Episode I - The Phantom Menace</top-twenty>
</rating-info>
```
Hints:
- To ensure that you only consider a given rating once, you should use the distinct-values function. For example, to iterate over distinct Oscar types, you could do the following:
```
for $t in distinct-values(//oscar/type)
```
- distinct-values gives you the text values of the corresponding elements, not the elements themselves. As a result, your return clause will need to construct new rating elements by adding back in the begin and end tags that were removed by distinct-values.
- Some of the hints for the previous problems also apply here.
(required of grad-credit students; “partial” extra credit for others)

Some of the people in the database have served as both actors and directors. For each such person, construct a new element of type actor_director that includes the following:
- a name child element that has as its value the name of the person
- a new child element of type num_acted that has as its value the number of movies in the database in which the person has acted
- a new child element of type num_directed that has as its value the number of movies in the database that the person has directed
- 0 or more child elements of type acted_and_directed, each of which contains the name of a movie for which the person served as both an actor and a director. (Most of the people in the results will not have a child element of this type, because although they have both acted and directed, they have not done both jobs in the same movie for one of the movies in the database.)
For example:
```
<actor_director>
  <name>Laurence Olivier</name>
  <num_acted>3</num_acted>
  <num_directed>1</num_directed>
  <acted_and_directed>Hamlet</acted_and_directed>
</actor_director>
```
Hints:
- As mentioned in our discussion of XPath predicates in lecture, you can test for the presence of an element or attribute by simply using its name (or, in the case of an attribute, its name preceded by an @ symbol). A similar technique works in the WHERE clause of a FLWOR expression.
  
  For example, to get all movies with an earnings rank, we could do either of the following:
```
for $m in //movie[earnings_rank]
...
```
  or
```
for $m in //movie
where $m/earnings_rank
```
- You may find it helpful to use the tokenize() function, which breaks a string into a collection of component strings based on a specified separator character. For example, tokenize('a,b,c', ',') would produce ('a', 'b', 'c').
  
  More information about this and other XQuery functions can be found here.
- You can use the contains function when checking if an ID is in a collection of IDREFS.
- Some of the hints for the previous problems also apply here.
(required of grad-credit students; “partial” extra credit for others)

Winning multiple Oscars is one sign of greatness, as is appearing in a large number of movies in our database. Doing both is truly impressive!

Find all people who have both:
- won two or more Best Actor or Best Actress awards (note that we are not counting any other type of Oscar)
- appeared in six or more of the movies in the database.
For each such person, construct a new element of type acting_legend that includes the following:
- a name child element that has as its value the name of the person
- a new child element of type num_movies that has as its value the number of movies in the database in which the person has acted
- new child elements of type won_for, each of which contains the name of a movie for which the person won Best Actor or Best Actress, followed by a space, followed by the year in which they won the award, surrounded by parentheses.
For example:
```
<acting_legend>
  <name>Tom Hanks</name>
  <num_movies>12</num_movies>
  <won_for>Forrest Gump (1995)</won_for>
  <won_for>Philadelphia (1994)</won_for>
</acting_legend>
```

Submitting your work for Part II

Submit your XMLforPeople.java and ps3_queries.py files using these steps:

Click on PS 3: Part II in the list of assignments. You should see a pop-up window with a box labeled DRAG & DROP. (If you don’t see it, click the Submit or Resubmit button at the bottom of the page.)
Add your files to the box labeled DRAG & DROP. You can either drag and drop the files from their folder into the box, or you can click on the box itself and browse for the files.
Click the Upload button.
You should see a box saying that your submission was successful. Click the (x) button to close that box.
The Autograder will perform some tests on your files. Once it is done, check the results to ensure that the tests were passed. If one or more of the tests did not pass, the name of that test will be in red, and there should be a message describing the failure. Based on those messages, make any necessary changes. Feel free to ask a staff member for help.

Note: You will not see a complete Autograder score when you submit. That is because additional tests for at least some of the problems will be run later, after the final deadline for the submission has passed. For such problems, it is important to realize that passing all of the initial tests does not necessarily mean that you will ultimately get full credit on the problem. You should always run your own tests to convince yourself that the logic of your solutions is correct.
If needed, use the Resubmit button at the bottom of the page to resubmit your work. Important: Every time that you make a submission, you should submit all of the files for that Gradescope assignment, even if some of them have not changed since your last submission.
Near the top of the page, click on the box labeled Code. Then click on the name of each file to view its contents. Check to make sure that the files contain the code that you want us to grade.

Important

It is your responsibility to ensure that the correct version of a file is on Gradescope before the final deadline. We will not accept any file after the submission window for a given assignment has closed, so please check your submission carefully using the steps outlined above.
If you are unable to access Gradescope and there is enough time to do so, wait an hour or two and then try again. If you are unable to submit and it is close to the deadline, email your homework before the deadline to esaracin@bu.edu

Last updated on October 21, 2024.