Due before the start of lecture on November 5, 2024. See below for a summary of the policies regarding late submissions.
Homework is due prior to the start of lecture. If it is submitted more than 10 minutes after the start of lecture, it will be considered late. There will be a 10% deduction for submissions made any time in the following seven days, up to the start of the next lecture. We will not accept any homework that is more than 7 days late. Plan your time carefully, and don’t wait until the last minute to begin an assignment. Starting early will give you ample time to ask questions and obtain assistance.
In your work on this assignment, make sure to abide by our policies on academic conduct.
If you have questions while working on this assignment, please attend office hours or post them on Ed Discussion.
50 points total
Create a subfolder called ps3
within your e66
folder, and put all of the files for this assignment
in that folder.
This part of the assignment will all be completed in a single PDF file. To create it, you should do the following:
Access the template that we have created by clicking on this link and signing into your Google account as needed.
When asked, click on the Make a copy button, which will save a copy of the template file to your Google Drive.
Select File->Rename, and change the name of the file to
ps3_partI
.
Add your work for all of the problems from Part I to this file.
Once you have completed Part I, choose File->Download->PDF document,
and save the PDF file on your machine.
The resulting PDF file (ps3_partI.pdf
) is the one that you will
submit. See the submission guidelines at the end of Part I.
10 points total
Below are two schedules in which actions by three transactions (T1, T2, and T3) are interleaved.
schedule 1:
w1(A); r2(A); w2(B); r3(B); w3(C); r1(C)
schedule 2:
r1(C); w1(A); r3(A); w2(B); r3(B); w3(C)
(Note: When determining conflict serializability, the order of the commits does not matter, so we haven’t included them above.)
For each of the above schedules, you should take the following steps:
In your copy of the ps3_partI
template (see above), edit the
diagram that we have provided for the schedule, making whatever
changes are needed to construct the schedule’s precedence graph.
State whether the schedule is conflict serializable.
If the schedule is conflict serializable, state one possible equivalent serial schedule. If the schedule is not conflict serializable, explain briefly why the schedule is not equivalent to the serial schedule T1; T2; T3. Your explanation should include a discussion of why particular actions from the original schedule are not consistent with that serial schedule.
10 points total
Assume that transactions are operating concurrently in a DBMS in which:
The system is using two-phase locking to ensure serializability.
The system allows shared locks to be upgraded to exclusive locks.
It is not using update locks.
Consider these two transactions:
T1: r(B); r(C); r(A); w(C); commit T2: r(B); r(A); r(C); w(A); commit
The following is the beginning of one possible schedule of these transactions, with appropriate lock instructions added:
T1 |
T2 |
---|---|
sl(B) |
|
Could T2 have unlocked B earlier than it does in this partial schedule? Explain why or why not.
Assume that after the operations in the partial schedule above have occurred, the DBMS lets transaction T1 go next. Would T1 be able to perform its next operation – i.e., would it be able to read A? Explain briefly why or why not. Make sure to take into account any additional lock request that it may need to make before attempting the read.
Regardless of your answer for part 2, let’s assume that after the operations in the partial schedule above have occurred, the DBMS lets transaction T2 go next instead of T1. In addition, assume that both transactions are able to complete the rest of their actions, and that the complete schedule of the transactions looks like this:
T1 |
T2 |
---|---|
sl(B) |
|
Does the above schedule observe strict locking? Explain briefly why or why not.
Is the above schedule recoverable? Explain briefly why or why not.
Is the above schedule cascadeless? Explain briefly why or why not.
9 points total; 1.5 points each part
Consider the following partial schedule of the transactions T1, T2, and T3:
xl2(A); w2(A); sl1(B); r1(B); ul3(C); ...
Given this partial schedule, which of the following lock requests would be granted if it were the next operation in the schedule, and which would be denied?
ul1(A)
sl3(B)
ul2(B)
sl1(C)
xl3(C)
ul2(C)
Fill in the provided table with the system’s response to each request,
along with a brief explanation of your answer. (Remember: uli(X)
indicates Ti’s request of the update lock for X.)
9 points total
For each sequence of operations below, determine whether deadlock will occur under rigorous two-phase locking. You should assume that:
If deadlock occurs, show the partial schedule in table form (including lock operations and any commits) up to the point of deadlock, along with the waits-for graph at the point of deadlock.
If deadlock does not occur, show the full schedule, including lock operations and commits. In that case, you do not need to include the waits-for graph.
In either case, your schedule should include any changes to the sequence of operations that occur because a transaction is forced to wait for a lock. If a transaction waits for a lock that is subsequently granted, you should include two lock requests – one for the initial request, and one at the point at which the lock is granted. If two transactions wait for a lock held by the same other transaction, and that other transaction subsequently commits, you should assume that the waiting transaction with the smaller transaction number is granted its request first.
Important
Don’t forget that if a transaction is forced to wait for a lock, it cannot make any forward progress until that lock is granted.
sequence 1:
r2(C); r1(D); w1(C); r3(C); w2(C); w3(D); w1(D)
sequence 2:
w3(A); r2(A); w2(A); r1(B); r3(B); w3(B); w1(B)
12 points total; 4 points each part
Consider the following sequence of operations:
s1; s2; s3; s4; s5; w3(A); w1(A); w4(A); r5(A); r2(A); c1; c2; c3; c4; c5
where si
indicates the start of transaction Ti
, which is when its
timestamp is assigned, and ci
indicates the commit of
transaction Ti
.
Assume that the transactions are assigned the following timestamps, based on the order in which they start:
and that RTS(A) and WTS(A) are both initially 0.
Given these assumptions:
In the first table for this problem in ps3_partI
, we’ve filled
in the row for the first operation. Complete the table to
show how the system would respond to the remaining read and write
requests when it is using regular timestamp-based concurrency
control without commit bits.
In the first column of the table, put the requested operation.
In the second column of the table, indicate the response of the system by selecting the correct option from the following list:
In the third column of the table, you should do one of the following:
If the action is allowed, summarize any changes to the state maintained for item A, as we’ve done in the first row.
If the action is ignored or denied, include a brief explanation. For example, if a transaction T7 tried to read item B and its read were too late, you would include something like this:
TS(T7) < WTS(B)
You do not need to restart a transaction that is rolled back, which means that you can skip any requests by a transaction that come after the point at which it is rolled back.
Because commits don’t have an effect when we’re not using commit
bits, you do not need to include the commit actions (c1
,
c2
, etc.) in this table.
Complete the second table that we’ve provided to show the system’s response to this sequence of operations if the DBMS is using regular timestamp-based concurrency control with commit bits.
In addition to the reads and writes, you should include the commits at the end of the sequence since they may cause a change in state.
In the second column of the table, there are now four options to choose from:
If a transaction is made to wait, your table should include an additional row for the operation in question when the wait comes to an end – something like:
request | response of the system |
---|---|
... | |
r7(X) | denied; make wait |
... | |
r7(X) | allowed |
Explain what will happen in response to this sequence of operations if the DBMS uses multiversion timestamp-based concurrency control without commit bits.
In the third column, make sure to specify which version’s state is being updated (e.g., “create A(t) with RTS = 0” or “RTS(A(t)) = ...”, where t is the timestamp of the version).
Similarly, if a transaction is allowed to read A, make sure to indicate in the second column which version it is allowed to read (e.g., if there was a version with a timestamp of 100 that it was allowed to read, you would say “allowed to read A(100)”)
Because commits don’t have an effect when we’re not using commit
bits, you do not need to include the commit actions (c1
,
c2
, etc.) in this table.
Login to Gradescope by clicking the link in the left-hand navigation bar. Once you are in logged in, click on the box for CSCI E-66.
Submit your ps3_partI.pdf
file using these steps:
If you still need to create the PDF file, open your file on Google Drive, choose File->Download->PDF document, and save the PDF file on your machine.
Click on PS 3: Part I in the list of assignments. You should see a pop-up window labeled Submit Assignment. (If you don’t see it, click the Submit or Resubmit button at the bottom of the page.)
Choose the Submit PDF option, and then click the Select PDF
button and find the ps3_partI.pdf
that you created in step 1.
Then click the Upload PDF button.
You should see an outline of the problems along with thumbnails of the pages from your uploaded PDF. For each problem in the outline:
As you do so, click on the magnifying glass icon for each page and doublecheck that the pages that you see contain the work that you want us to grade.
Once you have assigned pages to all of the problems in the question outline, click the Submit button in the lower-right corner of the window.
You should see a box saying that your submission was successful.
Click the (x)
button to close that box.
You can use the Resubmit button at the bottom of the page to resubmit your work as many times as needed before the final deadline.
Important
It is your responsibility to ensure that the correct version of a file is on Gradescope before the final deadline. We will not accept any file after the submission window for a given assignment has closed, so please check your submission carefully using the steps outlined above.
If you are unable to access Gradescope and there is enough
time to do so, wait an hour or two and then try again. If you
are unable to submit and it is close to the deadline, email
your homework before the deadline to
esaracin@fas.harvard.edu
50-60 points total
25 points total
In this problem, you will write a series of methods that can be used
to create an XML version of the Person
table from Problem Set
1. Your methods will be part of a larger program that uses
the JDBC framework to connect to the SQLite database that
you used in PS 1 and to execute the SQL queries needed to extract the
necessary data.
You should begin by downloading the following zip file: problem6.zip
Unzip/extract the contents of the file.
Depending on your system, after extracting the contents you will either have:
a folder named problem6
that contains all of the files that
you need for this problem
an outer folder called problem6
that contains an inner folder named
problem6
that contains all of the Java files that you need for this
problem.
Take the problem6
folder that actually contains the necessary
files and drag it into your ps3
folder so that you can easily find
and open it from within VS Code.
Read through our overview of the JDBC framework.
Launch VS Code on your laptop.
In VS Code, select the File->Open Folder or File->Open menu
option, and use the resulting dialog box to find and open the
problem6
folder that you created above – the one that contains the
provided files. (Note: You must open the folder; it is not
sufficient to simply open one of the Java files in the folder.)
The name of the folder should appear in the Explorer pane on the left-hand side of the VS Code window, along with a list of all of its contents.
Click on the name XMLforPeople.java
in the Explorer pane, which
will open the file that you need to modify.
Review all of the code that we’ve provided before you start writing any new code. See below for some additional information on what we’ve given you.
The class that you will be completing is called XMLforPeople
.
In this class we’ve given you:
the constructor for the class, which takes the name of a SQLite
file that should contain a relational database with the schema
outlined in Problem Set 1; the constructor
establishes a connection to the SQLite database, and it stores the
resulting Connection
object in a field called db
.
a helper method called simpleElem()
, which takes as inputs the
name and value of a simple element and returns a string of the
form "<name>value</name>"
. This method should only be used
to form simple elements – ones that do not have any child
elements.
a helper method called resultsFor()
, which takes as input a string
presenting a SQL query for the movie database and returns a
ResultSet
object that can be used to process the results of that
query. You are welcome to use this method in the code that you
write, although doing so is not required. It will also be useful
in testing one of the methods that you write.
a method called idFor()
, which takes as input the name of a
person and performs a query to find and return the person’s id;
this method will be useful when testing the methods that you
write.
a method called createFile()
, which performs a query to obtain the id
numbers of all of the people in the Person
table, and which
processes them one at a time
the main
method, which runs the full program.
Note that many of these methods – and all of the methods that you
will implement – include a throws
clause in their method
header. This clause is needed because the code included in these
methods may throw the exception(s) mentioned in the throws
clause,
and instead of catching them, we are simply declaring that they may be
thrown.
You will implement a number of non-static methods of the
XMLforPeople
class.
Important
We’ve provided the headers of the methods that you will implement. You must not change these headers in any way.
In the code that you write, you must limit yourself to the packages that we’ve imported at the top of the starter file. You must not use classes from any other Java package. In addition, you must not use any Java features that were not present in Java 8.
We have given you a separate Java class TestDriver
that you
can use when testing each method. Simply add the appropriate
test code to the main
method of this class, and run TestDriver
to see if you obtain the correct output.
Note: If you are having trouble running TestDriver
using the
Run link or Run button, you should be able to compile and run it
from the command line of the Terminal as follows:
to compile:
javac -cp 'lib/*' *.java
to run on Windows:
java -cp 'lib/*;.' TestDriver
to run on macOS:
java -cp 'lib/*:.' TestDriver
The two commands for running the program are almost identical,
but in the Windows version there is a semi-colon (;
) before
the period in the string after -cp
, whereas the macOS version
uses a colon (:
).
Here are your tasks:
Implement the method called fieldsFor()
whose header we have
provided. It takes a string representing the id number of a
person, and it should return a string containing a sequence of XML
elements – one for each non-null field (i.e., column value) in
that person’s tuple. You may assume that the person’s name will
never be null. If the person’s dob or pob has a null value, that
field should not be included in the returned string. If there
is no person with the specified id, the method should simply
return the empty string.
For example, if you run the following test code (adding it to the
main
method in TestDriver.java
):
XMLforPeople xml = new XMLforPeople("movie.sqlite"); System.out.println(xml.fieldsFor(xml.idFor("Julianne Moore"))); System.out.println(xml.fieldsFor("1234567")); // no person with that id System.out.println(xml.fieldsFor(xml.idFor("Elisabeth Moss")));
you should see:
<name>Julianne Moore</name> <dob>1960-12-03</dob> <pob>Fayetteville, North Carolina, USA</pob> <name>Elisabeth Moss</name> <dob>1983-01-01</dob>
Notes:
In our database, Elisabeth Moss has a null value for her pob, and thus that element is not included in the string returned for her.
You should see an extra blank line when you print the results
of any call that produces one or more fields (including the
blank line after the results for Elisabeth Moss shown above),
because the string returned by the method ends with a newline,
and the println
method adds its own newline.
The second println
statement prints only a blank line
because there is no person whose id is 1234567, and thus the call
xml.fieldsFor("1234567")
returns an empty string. As a result,
there are two blank lines after the results for Julianne Moore:
one after her fields, and one from the printing of the empty string.
Important guidelines:
You must begin by performing the appropriate SQL query.
Use the idFor()
method as a model for what you should do.
When processing the results, make sure to follow the approach given in our JDBC overview.
The elements for the non-null fields should appear in the same
order as the corresponding columns in the Person
table:
You should use the provided simpleElem()
method to form the
XML element for a given field.
Each element should be on its own line, which you can accomplish
by including a newline character ("\n"
) after each element.
Each element should be preceded by exactly four spaces, and there should be no extra spaces at the end of a given line.
Implement the method called movieElemsFrom()
whose header we have
provided. It takes a ResultSet
object for a SQL query that produces
tuples of the form (movie year, movie name), and it creates and returns
a string containing the XML for a sequence of 0 or more complex elements
of type movie
that contain nested child elements of
type year
and name
. If the ResultSet
object has no results,
the method should simply return the empty string.
For example, if you run the following test code:
XMLforPeople xml = new XMLforPeople("movie.sqlite"); String query1 = "SELECT year, name FROM Movie WHERE year = 2020;"; ResultSet results1 = xml.resultsFor(query1); System.out.println(xml.movieElemsFrom(results1)); String query2 = "SELECT year, name FROM Movie WHERE earnings_rank = 1;"; ResultSet results2 = xml.resultsFor(query2); System.out.println(xml.movieElemsFrom(results2));
you should see:
<movie> <year>2020</year> <name>Nomadland</name> </movie> <movie> <year>2020</year> <name>The Father</name> </movie> <movie> <year>2020</year> <name>Minari</name> </movie> <movie> <year>2015</year> <name>Star Wars: The Force Awakens</name> </movie>
Here again, you should see an extra blank line when you print the
results of any call that produces one or more elements (including
the blank lines shown above), because the string returned by the
method ends with a newline, and the println
method adds its own
newline.
Important guidelines:
You shouldn’t make any assumptions about the number of movies
in the ResultSet
. Rather, your code should be able to handle an
arbitrary number of results. See the createFile()
method
for a model of how to do this. If there are no results in
the ResultSet
, the method should return an empty string.
You should use the provided simpleElem()
method to form the
XML for each simple element. However, you should not use
simpleElem
to form the outer movie
elements, which are not
simple because they have children.
The returned string should be formatted as shown above. The
outer start tag, outer end tag, and each child element
should be on its own line, with a newline character ("\n"
)
at the end of each line.
The outer start tag and outer end tag should each be preceded by exactly six spaces, but the child elements should each be preceded by exactly eight spaces. There should be no extra spaces at the end of a given line.
Implement the method called actedIn()
whose header we have
provided. It takes a string representing the id number of a
person, and it should return a string containing the XML for a
single complex element named actedIn
that includes nested child
elements of type movie
for all of the movies in the database in
which the person has acted, ordered by the year of the movie and
(in the case of ties) the name of the movie. If there is no person
with the specified id or if the person has not acted in any of the
movies in the database, the method should simply return the empty
string.
For example, if you run the following test code:
XMLforPeople xml = new XMLforPeople("movie.sqlite"); System.out.println(xml.actedIn(xml.idFor("Geena Davis"))); System.out.println(xml.actedIn("1234567")); System.out.println(xml.actedIn(xml.idFor("Denzel Washington")));
you should see:
<actedIn> <movie> <year>1988</year> <name>Accidental Tourist, The</name> </movie> <movie> <year>1999</year> <name>Stuart Little</name> </movie> </actedIn> <actedIn> <movie> <year>1989</year> <name>Glory</name> </movie> <movie> <year>1993</year> <name>Philadelphia</name> </movie> <movie> <year>2001</year> <name>Training Day</name> </movie> <movie> <year>2016</year> <name>Fences</name> </movie> </actedIn>
Here again, you should see an extra blank line when you print the
results of any call that produces one or more elements (including
the blank lines shown above), because the string returned by the
method ends with a newline, and the println
method adds its own
newline.
Note that the second println
statement prints only a blank line
because there is no person whose id is 1234567, and thus the call
xml.actedIn("1234567")
returns an empty string.
Important guidelines:
You should begin by performing the appropriate SQL query to
get the necessary ResultSet
. Make sure to use an ORDER BY
clause in your query to get the necessary ordering.
After performing the query, almost all of the remaining work
will be done by making the appropriate call to your
movieElemsFrom()
method. You must use that method to
produce the XML for the movie
elements (if any) for the results
in the ResultSet
produced by your query.
Given the string returned by movieElemsFrom()
, you should
take the necessary steps to return either:
a string containing the XML for the appropriate actedIn
element
OR
an empty string, if there is no person with the specified id or if the person did not act in any of the movies in the database.
The returned string should be formatted as shown above. The
outer start tag, outer end tag, and each child element
should be on its own line, with a newline character ("\n"
)
at the end of each line.
You should not use the simpleElem
helper method in this method.
The outer start tag and outer end tag should each be preceded by
exactly four spaces. The child elements should have the
leading spaces that were given to them by movieElemsFrom
.
There should be no extra spaces at the end of a given line.
Implement the method called directed()
whose header we have
provided. It takes a string representing the id number of a
person, and it should return a string containing the XML for a
single complex element named directed
that includes nested child
elements of type movie
for all of the movies in the database
that the person has directed, ordered by the year of the movie and
(in the case of ties) the name of the movie. If there is no person
with the specified id or if the person has not directed any of the
movies in the database, the method should simply return the empty
string.
For example, if you run the following test code:
XMLforPeople xml = new XMLforPeople("movie.sqlite"); System.out.println(xml.directed(xml.idFor("Geena Davis"))); System.out.println(xml.directed("1234567")); System.out.println(xml.directed(xml.idFor("Denzel Washington")));
you should see:
<directed> <movie> <year>2016</year> <name>Fences</name> </movie> </directed>
Here again, you should see an extra blank line when you print the
results of any call that produces one or more elements (including
the blank lines shown above), because the string returned by the
method ends with a newline, and the println
method adds its own
newline.
Note that the first and second println
statements print only a
blank line because Geena Davis has not directed any of the movies
in the database and there is no person whose id is 1234567, and
thus the corresponding calls to directed
return an empty string.
The guidelines for the previous method also apply here, but for
directed
elements instead of actedIn
elements.
Implement the method called elementFor()
whose header we have
provided. It takes a string representing the id number of a person,
and it should return a string containing the XML for a single
complex element named person
that includes nested child elements
for:
the non-null fields in the person’s tuple (use fieldsFor()
to get these)
the movies (if any) in which the person has acted (use
actedIn()
to get these)
the movies (if any) that the person has directed (use
directed()
to get these).
In addition, each person
element must have an attribute
named id
for the person’s id number.
For example, if you run the following test code:
XMLforPeople xml = new XMLforPeople("movie.sqlite"); System.out.println(xml.elementFor(xml.idFor("Elisabeth Moss"))); System.out.println(xml.elementFor("1234567")); System.out.println(xml.elementFor(xml.idFor("Denzel Washington")));
you should see:
<person id="0005253"> <name>Elisabeth Moss</name> <dob>1983-01-01</dob> <actedIn> <movie> <year>1999</year> <name>Girl, Interrupted</name> </movie> </actedIn> </person> <person id="0000243"> <name>Denzel Washington</name> <dob>1954-12-28</dob> <pob>Mount Vernon, New York, USA</pob> <actedIn> <movie> <year>1989</year> <name>Glory</name> </movie> <movie> <year>1993</year> <name>Philadelphia</name> </movie> <movie> <year>2001</year> <name>Training Day</name> </movie> <movie> <year>2016</year> <name>Fences</name> </movie> </actedIn> <directed> <movie> <year>2016</year> <name>Fences</name> </movie> </directed> </person>
Here again, you should see an extra blank line when you print the
results of any call that produces one or more elements (including
the blank lines shown above), because the string returned by the
method ends with a newline, and the println
method adds its own
newline.
Note that the second println
statement prints only a blank line
because there is no person whose id is 1234567, and thus the call
xml.elementFor("1234567")
returns an empty string.
Important guidelines:
This method should not perform any queries of its own.
Rather, you must use your previous methods to obtain
the necessary child elements. Don’t forget that you should
only use the simpleElem
method for elements that do not have any
children.
Don’t forget to include the person’s id as an attribute within
the start tag for person
, as shown above. In order to
include the quotes around the id, you will need to use the
escape sequence "\""
for each double-quote character.
The outer start tag and outer end tag should each be on their own line preceded by exactly two spaces and followed by a newline character. The child elements should have the same spacing and formatting described in the earlier method specifications, so you won’t need to add any new spaces to them. Once again, there should be no extra spaces at the end of a given line.
Once you have completed and tested all of your methods, running the
XMLforPeople
program should create a file named people.xml
that
represents the entire Person
table in XML!
25-35 points total
This problems asks you to construct XPath and XQuery queries for an XML version of our entire movie database. The schema of this XML database is described here.
To allow you to check your work, we’ll make use of a freely available XML DBMS called BaseX.
You should begin by downloading the following files into your folder for this assignment:
Make sure to put the files in your ps3
folder. If your browser
doesn’t allow you to specify where the file should be saved, try
right-clicking on the link above and choosing Save as... or Save
link as..., which should produce a dialog box that allows you to
choose the correct folder for the file.
Once you have completed the steps outlined above, you can perform queries by taking the following steps:
Start up the BaseX GUI by double-clicking on the JAR file that
you downloaded above (BaseX851.jar
).
Select the Database->New menu option, click the Browse button,
and use the resulting dialog box to find the imdb.xml
file that you downloaded above.
Click Open to select the file, and click OK to create the database.
To execute a query, enter it in the Editor pane in BaseX, and click the green play button to execute it. (You can also use Ctrl+Enter or Ctrl+Return for this purpose.)
The results (if any) will be displayed in the Result pane.
If you have trouble getting BaseX to work on your machine, here are some possible options:
On newer Macbooks, try downloading a newer version of the BaseX
JAR file: BaseX103.jar
On MacOS, try running the program from the Terminal.
First, use cd
to navigate to the appropriate folder – the one that
contains the downloaded JAR file.
Then, entering the following from the command line:
java -jar BaseX103.jar
Or, if you have the original JAR file:
java -jar BaseX851.jar
If you are able to launch BaseX but you can’t open the database file, you may need to give Java permission to access the disk. One set of instructions for doing so on Macs can be found here.
If you’re using a Mac, you should disable smart quotes, because they may lead to errors in BaseX and in our testing. There are instructions for doing so here.
ps3_queries.py
is a Python file, so you could use a Python IDE
to edit it, but a regular text editor like TextEdit or Notepad++
would also be fine. However, if you use a text editor, you must
ensure that you save it as a plain-text
file.
Construct the XQuery commands needed to solve the problems given below. Test each command in BaseX to make sure that it works.
Once you have finalized the XQuery command for a given problem, copy
the command into your ps3_queries.py
file, putting it between
the triple quotes provided for that query’s variable. We have
included a sample query to show you what the format of your
answers should look like.
Each of the problems must be solved by means of a single query. Unless the problem specifies otherwise, you may use either a standalone XPath expression or an XQuery FLWOR expression. Use nested subqueries as needed.
Your queries should only use information provided in the problem itself. In addition, they should work for any XML database that follows the schema that we have specified.
Make sure to read and follow the guidelines given above.
It is worth noting that our movie database was last updated in September of last year, so it doesn’t include information about the most recent movies or Academy Award winners.
Write a standalone XPath expression (not a FLWOR
expression) to find the names of all people in the database who
were born in the state of Maine – i.e., who have the string
“Maine, USA” in their place of birth. The results of the query
should be a collection of name
elements with the names of these
people.
Hint: You should use the contains
function to perform
substring matching, as discussed in lecture.
In Problem Set 1, you wrote
a SQL query to find information about Meryl Streep’s Oscars.
Write a FLWOR expression to solve the same problem. The results
of the query should be new elements of type streep_oscar
that
include three child elements: the year of the Oscar, the type of
the Oscar, and the name of the movie that she won it for.
Hint: You will need to use curly braces and commas as part of your
return
clause. See the examples in the lecture notes for
models of how to transform the elements selected by a query
into new types of elements.
Note: You don’t need to worry about indenting and line breaks in the results of the queries for this or any other problem.
Find all movies with an R rating that have won a Best Picture
Oscar. The result of your query should be a collection of new
elements of type winner
whose values consist of the name of
the movie, followed by a space, followed by the year of the
Oscar, which you should surround with parentheses. For example:
<winner>Gladiator (2001)</winner>
Order the results based on the year of the Oscar.
Hints:
text()
function to obtain the
values of the relevant elements, without their begin and end
tags.return
clause. return
clause can also include string literals (e.g.,
" ("
).Do the Oscars reward top-grossing films? Write a query that determines the Oscars won by the top 200 grossing films.
For any of the top 200 grossing movies that have won one or more
of the Oscars in the database, your query should construct a new
element of type oscar_grosser
that includes the following:
the original child elements for the movie’s name and earnings rank
a new child element of type num_oscars
that has as its
value the number of Oscars in the database that were won by
that movie
1 or more child elements of type award
, each of
which has as its value the type of one of the Oscars won by
the movie.
For example:
<oscar_grosser> <name>Titanic</name> <earnings_rank>8</earnings_rank> <num_oscars>2</num_oscars> <award>BEST-PICTURE</award> <award>BEST-DIRECTOR</award> </oscar_grosser>
Movies that did not win any of the Oscars in the database should not appear in the results.
Hints:
You will need to use the built-in count()
and text()
functions. See the lecture notes for examples of queries that
use them.
You will need to use a nested FLWOR expression.
Some of the hints for the previous problem also apply here.
For each of the movie ratings in the database, construct a new
element of type rating-info
that includes the following:
a child element called rating
that contains the rating
name; see below for more details about this element
a new child element of type num-movies
that has as its
value the number of movies in the database with that rating
a new child element of type avg-runtime
that has as its
value the average runtime of movies in the database with that
rating
0 or more child elements of type top-twenty
, each of
which contains the name of a movie with that rating that is
among the top 20 grossing movies of all time, according
to its earnings_rank
.
For example:
<rating-info> <rating>PG</rating> <num-movies>145</num-movies> <avg-runtime>112.0551724137931</avg-runtime> <top-twenty>Incredibles 2</top-twenty> <top-twenty>The Lion King</top-twenty> <top-twenty>Beauty and the Beast</top-twenty> <top-twenty>Finding Dory</top-twenty> <top-twenty>Frozen II</top-twenty> <top-twenty>Star Wars: Episode I - The Phantom Menace</top-twenty> </rating-info>
Hints:
To ensure that you only consider a given rating once, you should
use the distinct-values
function. For example, to iterate
over distinct Oscar types, you could do the following:
for $t in distinct-values(//oscar/type)
distinct-values
gives you the text values of the
corresponding elements, not the elements themselves. As a
result, your return
clause will need to construct new rating
elements by adding back in the begin and end tags that were
removed by distinct-values
.
Some of the hints for the previous problems also apply here.
(required of grad-credit students; “partial” extra credit for others)
Some of the people in the database have served as both actors and
directors. For each such person, construct a new element of type
actor_director
that includes the following:
a name
child element that has as its value the name of the
person
a new child element of type num_acted
that has as its
value the number of movies in the database in which the person
has acted
a new child element of type num_directed
that has as its
value the number of movies in the database that the person
has directed
0 or more child elements of type acted_and_directed
, each of
which contains the name of a movie for which the person served
as both an actor and a director. (Most of the people in the
results will not have a child element of this type, because
although they have both acted and directed, they have not done
both jobs in the same movie for one of the movies in the
database.)
For example:
<actor_director> <name>Laurence Olivier</name> <num_acted>3</num_acted> <num_directed>1</num_directed> <acted_and_directed>Hamlet</acted_and_directed> </actor_director>
Hints:
As mentioned in our discussion of XPath predicates in lecture,
you can test for the presence of an element or attribute by
simply using its name (or, in the case of an attribute, its
name preceded by an @
symbol). A similar technique works in
the WHERE
clause of a FLWOR expression.
For example, to get all movies with an earnings rank, we could do either of the following:
for $m in //movie[earnings_rank] ...
or
for $m in //movie where $m/earnings_rank
You may find it helpful to use the tokenize()
function, which breaks a string into a collection of component
strings based on a specified separator character. For
example, tokenize('a,b,c', ',')
would produce ('a', 'b', 'c')
.
More information about this and other XQuery functions can be found here.
You can use the contains
function when checking if an ID is
in a collection of IDREFS.
Some of the hints for the previous problems also apply here.
(required of grad-credit students; “partial” extra credit for others)
Winning multiple Oscars is one sign of greatness, as is appearing in a large number of movies in our database. Doing both is truly impressive!
Find all people who have both:
won two or more Best Actor or Best Actress awards (note that we are not counting any other type of Oscar)
appeared in six or more of the movies in the database.
For each such person, construct a new element of type
acting_legend
that includes the following:
a name
child element that has as its value the name of the
person
a new child element of type num_movies
that has as its
value the number of movies in the database in which the person
has acted
new child elements of type won_for
, each of which contains
the name of a movie for which the person won Best Actor or
Best Actress, followed by a space, followed by the year in
which they won the award, surrounded by parentheses.
For example:
<acting_legend> <name>Tom Hanks</name> <num_movies>12</num_movies> <won_for>Forrest Gump (1995)</won_for> <won_for>Philadelphia (1994)</won_for> </acting_legend>
Login to Gradescope by clicking the link in the left-hand navigation bar, and click on the box for CSCI E-66.
Submit your XMLforPeople.java
and ps3_queries.py
files using
these steps:
Click on PS 3: Part II in the list of assignments. You should see a pop-up window with a box labeled DRAG & DROP. (If you don’t see it, click the Submit or Resubmit button at the bottom of the page.)
Add your files to the box labeled DRAG & DROP. You can either drag and drop the files from their folder into the box, or you can click on the box itself and browse for the files.
Click the Upload button.
You should see a box saying that your submission was successful.
Click the (x)
button to close that box.
The Autograder will perform some tests on your files. Once it is done, check the results to ensure that the tests were passed. If one or more of the tests did not pass, the name of that test will be in red, and there should be a message describing the failure. Based on those messages, make any necessary changes. Feel free to ask a staff member for help.
Note: You will not see a complete Autograder score when you submit. That is because additional tests for at least some of the problems will be run later, after the final deadline for the submission has passed. For such problems, it is important to realize that passing all of the initial tests does not necessarily mean that you will ultimately get full credit on the problem. You should always run your own tests to convince yourself that the logic of your solutions is correct.
If needed, use the Resubmit button at the bottom of the page to resubmit your work. Important: Every time that you make a submission, you should submit all of the files for that Gradescope assignment, even if some of them have not changed since your last submission.
Near the top of the page, click on the box labeled Code. Then click on the name of each file to view its contents. Check to make sure that the files contain the code that you want us to grade.
Important
It is your responsibility to ensure that the correct version of a file is on Gradescope before the final deadline. We will not accept any file after the submission window for a given assignment has closed, so please check your submission carefully using the steps outlined above.
If you are unable to access Gradescope and there is enough
time to do so, wait an hour or two and then try again. If you
are unable to submit and it is close to the deadline, email
your homework before the deadline to
esaracin@bu.edu
Last updated on October 21, 2024.