Due before the start of lecture on September 24, 2024. See below for a summary of the policies regarding late submissions.
Homework is due prior to the start of lecture. If it is submitted more
than 10 minutes after the start of lecture, it will be considered
late. There will be a 10% deduction for late submissions that
are made by 11:59 p.m. on the Sunday after the deadline, and a 20%
deduction for submissions that are made after that Sunday and before
the start of the next lecture. New 9/27: There will be a 10%
deduction for submissions made any time in the following seven days,
up to the start of the next lecture. We will not accept any homework that
is more than 7 days late. Plan your time carefully, and don’t wait
until the last minute to begin an assignment. Starting early will give
you ample time to ask questions and obtain assistance.
In your work on this assignment, make sure to abide by our policies on academic conduct.
If you haven’t already done so, you should complete Problem Set 0 before beginning this assignment.
If you have questions while working on this assignment, please attend office hours or post them on Ed Discussion.
40 points total
We encourage you to create a folder named e66
for your
work in this course. You can follow these
instructions to do so.
Then create a subfolder called ps1
within your e66
folder,
and put all of the files for this assignment in that folder.
The problems from Part I will all be completed in a single PDF file. To create it, you should do the following:
Access the template that we have created by clicking on this link and signing into your Google account as needed.
When asked, click on the Make a copy button, which will save a copy of the template file to your Google Drive.
Select File->Rename, and change the name of the file to
ps1_partI
.
Add your work for the problems from Part I to this file.
Once you have completed all of these problems, choose
File->Download->PDF document, and save the PDF file in your
ps1
folder. The resulting PDF file (ps1_partI.pdf
) is the one that
you will submit. See the submission guidelines at the end of Part I.
8 points total; 2 pts. each part
Before going any further, make sure to follow the instructions above to create your own copy of the Google Drive document that you should use for Part I of the assignment.
The ER diagram shown below is part of the design of a database that includes information about professors and the courses they teach.
In the above version of the ER diagram, there are no constraints on the relationships between professors and courses.
In the ps1_partI
template that we’ve provided on Google Drive (see
above), we’ve included the beginnings of three separate versions of
this ER diagram.
Edit the first version of the diagram, adding the connections needed to create an ER diagram whose only constraint is that every professor teaches at least one (i.e., 1 or more) course. To do so, you should:
Click on the diagam and then click the Edit link that appears below the diagram.
From the collection of eight connectors that we have provided below the diagram, select the appropriate connectors and use them to connect the two entity sets to the relationship set. Lengthen the connectors as needed to make the connections.
Click the Save & Close button.
Edit the second version of the diagram, adding the connections needed to create an ER diagram whose only constraint is that every professor must teach exactly one course.
Edit the third version of the diagram, adding the connections needed to create an ER diagram that specifies that every course must be taught by exactly one professor, and every professor must teach at most one (i.e., 0 or 1) course.
Consider your answer to part 1—the ER diagram for the situation in which every professor teaches at least one course. If we converted that ER model to a relational schema, would the following be an acceptable schema for a relation used to capture the Teaches relationship set?
Teaches(professor, course, semester)
where professor is a foreign key referring to Professor(id), course is a foreign key referring to Course(name), and the primary-key attributes of Teaches are underlined. Explain your answer briefly.
12 points total
The ER diagram shown below in Figure 2-1 represents a music database – one that stores information about recording artists, recording labels (e.g., EMI or Sony), songs, and albums.
Figure 2-1:
Artists can sing songs and belong to recording labels, labels can produce albums, and songs can appear on albums.
(3 points) One of the relationship sets captures a many-to-one relationship. Which one? In your answer, you should specify the direction of the relationship (e.g., ________ is a many-to-one relationship from _________ to ___________).
(3 points) Describe all constraints on relationships that are specified by the diagram. Use words that describe the problem domain (e.g., Each course meets in at most one room...) rather than technical terminology.
(6 points) Transform this diagram into a relational schema by following the procedure discussed in lecture. Here are some additional guidelines:
Give the schema of each relation in the form relation_name(attr_name1, attr_name2, ...).
When appropriate, you should combine relations as discussed in lecture. If you do combine two relations, you should briefly explain why it makes sense to do so.
Indicate the primary-key attribute(s) of each relation by
putting an underscore character (_
) on either side of the
attribute name(s). For example, if you had a key called id
,
you would write it as _id_
.
Specify each relation’s foreign-key attribute(s) (if any) and state the associated referential-integrity constraints. For example, if you were working with the MajorsIn relation at the end of the first set of lecture notes, one of its referential-integrity constraints could be specified as follows: Each value of the student_id attribute in MajorsIn must match a value of the id attribute from the Student relation.
10 points total; 2 pts. each part
Relation R has attributes a, b, and c. Relation S has attributes a, b and d. You are given the following instances of these relations:
Relation R
a |
b |
c |
---|---|---|
1 |
2 |
3 |
3 |
4 |
3 |
7 |
6 |
5 |
Relation S
a |
b |
d |
---|---|---|
2 |
3 |
7 |
3 |
4 |
8 |
7 |
6 |
9 |
For each of the following questions, use the Insert->Table menu option in Google Drive to insert an appropriately sized table for the answer, and then fill in the cells of the table with the correct values.
10 points total
Write relational-algebra queries to answer the following problems from the movie domain described below. In Part II, you will write SQL queries for these problems. In this problem, you should solve them using relational-algebra queries instead.
See below for some suggestions on formatting your queries.
(3 points) problem 7 (Oscars won by Meryl Streep)
(3 points) problem 12 (actors who haven’t appeared in any movies
from 2010 or after), but instead of producing a count,
your query should produce the ids of the actors.
Hint: Take advantage of the set difference operator (-
).
(4 points) problem 13 (Australians and Oscars), but
instead of producing counts, your query should produce tuples
of the form (name, award type, year). If a person has won
multiple awards, he or she should have multiple tuples in the
results, one for each award. If a person has won no awards, he or
she should have a single tuple with NULL
values for the award type
and year.
To simplify your queries, you may:
Replace the relational-algebra symbols with appropriate words
(e.g., JOIN
for ⋈, SELECT
for σ, etc.) and
surround with curly braces any subscripts that
accompany the operators (e.g., JOIN{id = student}
). For
example, here is how you could write a query to get the name
and year of all movies in the database with an R rating:
PROJECT{name, year}(SELECT{rating = 'R'}(Movie))
Perform a sequence of two or more queries in which the results
of a given query are assigned to a variable using the
assignment operator (<--
), and then that variable is used in
a subsequent query. For example, here is an alternate way of
writing the above query:
Rmovies <-- SELECT{rating = 'R'}(Movie) PROJECT{name, year}(Rmovies)
Note that using a sequence of queries is not allowed in your SQL answers, but it is allowed here.
Once you have completed Part I in Google Drive, choose
File->Download->PDF document, and save the resulting file
(ps1_partI.pdf
) on your machine.
Login to Gradescope by clicking the link in the left-hand navigation bar.
Once you are in logged in, click on the box for CSCI E-66.
If you don’t see that box, email Cody (cdoucette@fas.harvard.edu
) ASAP
and ask to be added to the course on Gradescope.
Click on the name PS 1: Part I in the list of assignments. You should see a pop-up window labeled Submit Assignment. (If you don’t see it, click the Submit or Resubmit button at the bottom of the page.)
Choose the Submit PDF option, and then click the Select PDF
button and find the ps1_partI.pdf
that you created in step 1.
Then click the Upload PDF button.
You should see an outline of the problems along with thumbnails of the pages from your uploaded PDF. For each problem in the outline:
As you do so, click on the magnifying glass icon for each page and doublecheck that the pages that you see contain the work that you want us to grade.
Once you have assigned pages to all of the problems in the question outline, click the Submit button in the lower-right corner of the window.
You should see a box saying that your submission was successful.
Click the (x)
button to close that box.
You can use the Resubmit button at the bottom of the page to resubmit your work as many times as needed before the final deadline.
Important
It is your responsibility to ensure that the correct version of a file is on Gradescope before the final deadline. We will not accept any file after the submission window for a given assignment has closed, so please check your submission carefully using the steps outlined above.
If you are unable to access Gradescope and there is enough
time to do so, wait an hour or two and then try again. If you
are unable to submit and it is close to the deadline, email
your homework before the deadline to
esaracin@bu.edu
60-70 points total
The problem domain for this assignment is movie trivia.
Before you get started, you should carefully read over the full description of the database. Please do so now!
Go to sqlitebrowser.org/dl.
Find the appropriate download button for your machine/operating system, and download and install that version of DB Browser for SQLite.
If you have a Mac, you should download the Mac .dmg file. Double-click on the file, which will open up a window. Then drag the application’s icon to your Applications folder.
If you have a Windows machine, you should download the Standard installer for 64-bit Windows (which is an .msi file). Double-click on the file to run the installer. You should be able to use all of the default options.
Download the following files:
Launch the program.
Click the Open Database button, and find and open the movie.sqlite
database file that you downloaded above.
To explore the schema of the database, click on the Database Structure tab, and then click on the arrows to the left of the table names.
To explore the contents of the tables, click on the Browse Data tab, and then choose the appropriate table from the drop-down menu.
Use the Execute SQL tab to perform queries on the database. Enter your SQL command in the space provided, and press F5 or Ctrl-R to run it. (There is also a small button that you can click; it has a triangular shape that looks like the Play button of a music player.)
ps1_queries.py
is a Python file, so you could use a Python IDE
to edit it, but a regular text editor like TextEdit or Notepad++
would also be fine. However, if you use a text editor, you must
ensure that you save it as a plain-text
file.
Construct the SQL commands needed to solve the problems given below. Test each SQL command in DB Browser for SQLite to make sure that it works.
Once you have finalized the SQL command for a given problem, copy
the command into your ps1_queries.py
file, putting it between
the triple quotes provided for that problem’s variable. We have
included a sample query to show you what the format of your
answers should look like.
Make sure that every query ends with a semi-colon.
Unless otherwise stated, each of the problems must be solved by means of a single query (i.e., each query should have a single semi-colon). Use nested subqueries as needed.
Your queries should not use a LIMIT
clause.
Make sure that the results produced by your queries contain exact answers to the problems. We should not have to infer the answer from the results. For example, if we ask you how many films meet a given criterion, we want a number, not a list of the films. In addition, your results should not include any extraneous information.
5 points
Make sure to read the important guidelines above before you get started!
Will Smith won an Oscar in 2022 for his work in King Richard – although he ended up making more news for his behavior at the awards ceremony than he did for winning the award. Write a query to determine the names and dates of birth of all people in the database whose first name is Will. Use pattern-matching as needed, and make sure that you only request people whose first name is exactly ‘Will’. Hint: Think about how to construct a pattern that obtains people whose first name is ‘Will’ without getting people whose first name begins with ‘Will’ (e.g., William) or people who have ‘Will’ as all or part of their last name.
5 points
There are several movies in the database that were set in the Boston area. Two of them are Good Will Hunting and Mystic River. Write a single query that finds the years of these movies. The result of the query should be two tuples of the form (name of movie, year).
Hint: If your initial query does not produce any results, you may want to reconsider the logical operator (AND, OR, NOT) that you are using in your WHERE clause.
5 points
Meryl Streep has been nominated for a record number of Oscars – 21 in all! – and she has won an award three of those times. Write a query to find the awards that she has won. The tuples in the result should contain the year in which she won the award, the type of Oscar, and the name of the corresponding movie.
5 points
In 2023, the Irish actor Paul Mescal was nominated for an Oscar (although he is not yet in our database). Find the number of actors in our database who were born in Ireland. You may assume that all such people have a place of birth that ends with “Ireland”. The result of your query should be a single number.
5 points
Write a query to find the oldest movie(s) in the database that has an R rating. The result of the query should be one or more tuples of the form (movie name, year).
Important
For full credit, your answer must use a subquery. Although SQLite provides a way to solve this problem without a subquery, that technique is not standard SQL, and thus it should not be used as part of your answer.
5 points
Are some movie ratings much more profitable than others? For each movie rating that occurs in the 200 top-grossing movies, find the number of top-grossing movies with that rating, and the earnings rank of the most profitable movie with that rating. The result of the query should be tuples in which a rating is followed by the two numeric results for that rating. Sort the results in descending order by the number of top-grossing movies.
Note: we’ll talk about the technique needed for this query in lecture on 9/17.
5 points
Find all places of birth shared by at least 50 of the people in the database. Your query should exclude the value of NULL (which indicates an unknown place of birth) from the results. The result of the query should be tuples of the form (pob, number of people).
Note: we’ll talk about the technique needed for this query in lecture on 9/17.
5 points
Write a query to determine the number of actors in the database who have not appeared in any of the movies in the database from 2010 or later. The result of the query should be a single number.
Note: we’ll talk about the technique needed for this query in lecture on 9/17.
5 points
Write a query that summarizes the number of Oscars won by people in the database who were born in Australia. The result of the query should be tuples of the form (person name, number of Oscars). If a person has won no awards, he/she should still appear in the table with a value of 0 for the number of Oscars. You may assume that all people from Australia in the database have a unique name, and that they have a place of birth that ends with the string “Australia”.
Note: we’ll talk about the technique needed for this query in lecture on 9/17.
5 points
Write a query that determines, for each type of Oscar included in the database, the average runtime of movies associated with that Oscar. The result of the query should be six tuples of the form (oscar type, average runtime).
Note: we’ll talk about the technique needed for this query in lecture on 9/17.
5 points
Write a single query to find both the shortest and longest movies in the database. The result of the query should be tuples of the form (name, runtime).
5 points
Until recently, Ratatouille was one of the 200
top-grossing movies of all time (which it is why it is in our
database), but it is now in position 212. Because it’s no longer
among the top 200, it has a NULL
value for its earnings rank in our
database. Write a SQL command that
changes its earnings rank in the database to 212. Hint: This is the
only question that does not call for a SELECT command. You will need
to figure out the correct type of SQL command to use.
Note: we’ll talk about the technique needed for this query in lecture on 9/17.
required of grad-credit students; may be completed by other students for “partial” extra credit
Sometimes, Best Picture nominees are also nominated for Best Director. Write a query that finds all years in which the same movie won both Best Picture and Best Director. The result of the query should be tuples of the form (year awarded, movie name).
required of grad-credit students; may be completed by other students for “partial” extra credit
The Covid-19 pandemic has had a significant impact on the movie industry. Theatres were closed for many months, and some moviegoers are still reluctant to see a movie in a theatre.
To quantify the pandemic’s impact, construct a query that determines how many top-grossing movies were released in each of the seven years in the range 2016-2022. Recall that all of the top-grossing movies in our database have an earnings rank that is between 1 and 200. The result of your query should be 7 tuples of the form (year, number of top-grossing movies).
If there are any years in that range that don’t have any top-grossing movies, they should still appear in the results with a count of 0.
Hint: One thing that may prove helpful is the use of a subquery in the FROM clause.
Note: we’ll talk about the technique needed for this query in lecture on 9/17.
Login to Gradescope by clicking the link in the left-hand navigation bar.
Once you are in logged in, click on the box for CSCI E-66.
Submit your ps1_queries.py
file using these steps:
Click on the name of the assignment (PS 1: Part II
) in the list of
assignments. You should see a pop-up window with a box labeled
DRAG & DROP. (If you don’t see it, click the Submit
or Resubmit button at the bottom of the page.)
Add your file to the box labeled DRAG & DROP. You can either drag and drop the file from its folder into the box, or you can click on the box itself and browse for the file.
Click the Upload button.
You should see a box saying that your submission was successful.
Click the (x)
button to close that box.
The Autograder will perform some tests on your file. Once it is done, check the results to ensure that the tests were passed. If one or more of the tests did not pass, the name of that test will be in red, and there should be a message describing the failure. Based on those messages, make any necessary changes. Feel free to ask a staff member for help.
Note: You will not see a complete Autograder score when you submit. That is because additional tests for at least some of the problems will be run later, after the final deadline for the submission has passed. For such problems, it is important to realize that passing all of the initial tests does not necessarily mean that you will ultimately get full credit on the problem. You should always run your own tests to convince yourself that the logic of your solutions is correct.
If needed, use the Resubmit button at the bottom of the page to resubmit your work. Important: Every time that you make a submission, you should submit all of the files for that Gradescope assignment, even if some of them have not changed since your last submission.
Near the top of the page, click on the box labeled Code. Then click on the name of the file to view its contents. Check to make sure that the file contains the work that you want us to grade.
Important
It is your responsibility to ensure that the correct version of a file is on Gradescope before the final deadline. We will not accept any file after the submission window for a given assignment has closed, so please check your submission carefully using the steps outlined above.
If you are unable to access Gradescope and there is enough
time to do so, wait an hour or two and then try again. If you
are unable to submit and it is close to the deadline, email
your homework before the deadline to
esaracin@bu.edu
.
Last updated on September 27, 2024.