In our answers for 1.1 and 1.3, are we supposed to use the specific values found in the relational tuples (e.g., ‘Ina Garten’), or something more general?
You should use the specific values.
However, as indicated in the second guideline, you need to take into account the entire sets of relationships that the database will need to capture.
For example, consider the following document from our movies collection for Part II (which we also discussed in lecture):
{
  _id: "0499549",
  name: "Avatar",
  year: 2009,
  rating: "PG-13",
  runtime: 162,
  genre: "AVYS",
  earnings_rank: 4,
  actors: [
    { id: "0000244", name: "Sigourney Weaver" },
    { id: "0002332", name: "Stephen Lang" },
    { id: "0735442", name: "Michelle Rodriguez" },
    { id: "0757855", name: "Zoe Saldana" },
    { id: "0941777", name: "Sam Worthington" }
  ],
  directors: [
    { id: "0000116", name: "James Cameron" }
  ]
}
If we only needed to capture information about Avatar, we could have used a field called director whose value was a single embedded subdocument. However, because some movies may have more than one director, we needed to use a field called directors whose value is an array of one or more embedded subdocuments.
In the relational database, there are four tuples for the information that we are trying to capture. Does that mean that we also need to include four documents in our answers for 1.1 and 1.3?
No, you may not need four documents. To see why, consider our movie database. The relational version required 5 tables: Movie, Person, Oscar, Actor and Director. The MongoDB version only requires 3 collections: movies, people, and oscars. The difference has to do with how the two logical models capture many-to-many relationships.
In the relational model, we need to use separate tables like Actor
and Director to capture many-to-many relationships, because the
relational model doesn’t allow for multi-valued attributes. But
MongoDB allows for multi-valued attributes, so we can capture
those relationships inside the documents that store information
about the entities. For example, in the above movie document, the actors
field captures the relationships between movies and the people who
acted in them, so we don’t need separate “actor” documents.
When creating the documents for 1.1, should we take an approach like the one used in the above movie document, in which a person’s name is grouped with their id?
No. Our movie database takes a hybrid approach that is mostly reference-based but that also uses some embedding, because we include the name of a person or movie whenever we use a reference.
In 1.1, you should use a purely reference-based approach with no embedding. For example, here is what a purely reference-based approach would look like for the movie document above:
{
  _id: "0499549",
  name: "Avatar",
  year: 2009,
  rating: "PG-13",
  runtime: 162,
  genre: "AVYS",
  earnings_rank: 4,
  actors: [ "0000244", "0002332", "0735442", "0757855", "0941777" ],
  directors: [ "0000116" ]
}
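With a purely reference-based document, the names are no longer stored in the movie document, so they have to be looked up in the people documents. The following is a minimal sketch in plain JavaScript (no MongoDB connection). The in-memory people array, its fields, and the namesFor helper are hypothetical stand-ins used only for illustration.

```javascript
// Reference-based movie document (only some fields shown).
const movie = {
  _id: "0499549",
  name: "Avatar",
  actors: ["0000244", "0002332", "0735442", "0757855", "0941777"],
  directors: ["0000116"]
};

// Hypothetical in-memory stand-in for part of the people collection.
const people = [
  { _id: "0000244", name: "Sigourney Weaver" },
  { _id: "0002332", name: "Stephen Lang" },
  { _id: "0000116", name: "James Cameron" }
];

// Hypothetical helper: resolve an array of id references to names,
// skipping any id that has no matching person in the given array.
function namesFor(ids, collection) {
  const byId = new Map(collection.map(p => [p._id, p.name]));
  return ids.filter(id => byId.has(id)).map(id => byId.get(id));
}

const directorNames = namesFor(movie.directors, people);
const actorNames = namesFor(movie.actors, people);
console.log(directorNames);
console.log(actorNames);
```

The movie document itself stays small, at the cost of a second lookup whenever the names are needed.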
When capturing relationships, should we include information about a relationship in the documents for both of the entities involved, or should we only include it in the document for one of the two entities? And if we only include it with one of the entities, how do we decide which one?
It depends. For example, in our MongoDB movie database, we only included information about the relationships between a movie and its actors in the document for the movie. We decided not to include it in the people documents of the actors, because the number of movies in which a person has acted could grow significantly over time and cause the document to become large enough that it would need to be moved on disk.
It’s worth noting that the possible growth of the document over time is more of a concern when using an embedded or hybrid approach, since an array of embedded subdocuments can take up significantly more room than just an array of references.
I understand that the _id field is supposed to function as the key of the document. This seems easy to implement when the primary key of the corresponding tuple is a single value. What should we do when the primary key is a combination of values?
You can let MongoDB assign the _id value, as we did in the documents from the oscars collection in the movie database.
When you show an example of a document for which MongoDB is assigning the _id value, you can use notation like the following:
_id: ObjectID1,
and specify that ObjectID1 is an ObjectID value generated by MongoDB.
Will the number of documents needed for 1.3 be the same as the number of documents that we used in 1.1?
It depends. There are different possible approaches here depending on how much embedding you decide to do.
For example, in our movie database, we could have decided to only have two collections: one for person documents and one for movie documents. In this approach, we could have embedded information about acting and directing Oscars in the corresponding person documents, and information about Best-Picture Oscars in the corresponding movie documents.
In yet another approach, we could have just used a single collection for movie documents – and embedded people and Oscar information within those documents.
I’m unsure about how to approach problems 3.1, 3.2 and 3.3. Do you have any suggestions?
These problems are similar to ones from pages 272-273 in the coursepack. Consult your coursepack or the lecture video for a reminder of how we solved these problems.
In addition, you can find extra practice problems on pages 279-281, and the solutions are available on the Lectures page.
In the results of our queries, does the order of the documents or of the fields within a document matter?
No. The Autograder should give you full credit as long as you have all of the necessary documents, field names, and field values.
The values in my actual query results look the same as the ones in the expected results, but the Autograder is saying that the results are incorrect. Any suggestions?
Make sure that your field names are correct. For example, for Query 9, make sure that you use a field name of director (with no s at the end).
For Query 4, I’m unclear about how to get the year from a person’s dob. Any suggestions?
Rather than trying to extract the year, you can use a condition that involves pattern matching to find the appropriate dob values.
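As a rough illustration of the idea in plain JavaScript (this is not a MongoDB query): because the dob values are strings of the form yyyy-mm-dd, a pattern anchored at the start of the string can pick out a year. The names, dates, and year below are made up.

```javascript
// Made-up sample documents; dob strings have the form yyyy-mm-dd.
const people = [
  { name: "Person A", dob: "1975-03-14" },
  { name: "Person B", dob: "1980-11-02" },
  { name: "Person C", dob: "1975-12-30" }
];

// ^ anchors the pattern at the start of the string,
// so only the year portion of the dob is tested.
const bornIn1975 = people.filter(p => /^1975-/.test(p.dob)).map(p => p.name);
console.log(bornIn1975);
```

In a MongoDB selection document, the same kind of pattern can be supplied as the condition on the field, for example with the $regex operator.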
For Query 6, I’m missing the results for a rating of null. Do you have any idea why that would be?
One possibility is that you may unnecessarily be using an $unwind stage to “unwind” the rating values in the movie documents. Using $unwind is only necessary when the value of a field is an array of values, and you want to create subgroups based on the individual values in the array. In this case, the rating values are not arrays, so using an $unwind stage isn’t necessary.
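To see why an unnecessary $unwind can make the null results disappear: by default, $unwind produces no output document when the field is null, missing, or an empty array. The sketch below imitates that default behavior in plain JavaScript. The unwind function and the sample documents are hypothetical and are not part of MongoDB.

```javascript
// Hypothetical imitation of $unwind's default behavior.
function unwind(docs, field) {
  return docs.flatMap(doc => {
    const value = doc[field];
    // A null or missing field: the document is dropped.
    if (value === null || value === undefined) return [];
    // A non-array value is treated like a one-element array.
    const items = Array.isArray(value) ? value : [value];
    // One output document per array element (none for an empty array).
    return items.map(item => ({ ...doc, [field]: item }));
  });
}

// Made-up sample documents.
const movies = [
  { name: "Movie A", rating: "PG-13" },
  { name: "Movie B", rating: null },
  { name: "Movie C", rating: "R" }
];

const unwound = unwind(movies, "rating");
console.log(unwound.map(m => m.name));
```

Movie B, whose rating is null, is gone before any later stage would see it.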
For Query 7, I’m not sure how to compute someone’s age so as to find the youngest actor in the database. Any ideas?
You don’t need to compute the ages. Strings in MongoDB can be compared using the same operators as integers, and because the dob values in the documents are strings of the form yyyy-mm-dd, the larger a dob string is, the later the person was born and the younger the person is.
In lecture, we discussed a similar example in which we found the name and runtime of the movie with the longest runtime. It may be useful as a model.
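The ordering property can be checked with a small plain-JavaScript sketch (not a MongoDB query). The sample names and dates are made up, and ordinary JavaScript string comparison stands in for MongoDB's.

```javascript
// Made-up sample documents; dob strings have the form yyyy-mm-dd.
const people = [
  { name: "Person A", dob: "1985-07-21" },
  { name: "Person B", dob: "2003-01-09" },
  { name: "Person C", dob: "1999-12-31" }
];

// Sort a copy by dob in descending order using ordinary string comparison.
const sorted = [...people].sort((a, b) =>
  a.dob < b.dob ? 1 : a.dob > b.dob ? -1 : 0
);

// The largest dob string belongs to the youngest person.
const youngest = sorted[0];
console.log(youngest.name, youngest.dob);
```

This works only because the year comes first and every part is zero-padded to a fixed width.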
For Query 8, I’m trying to apply the conditions needed to focus on DOB values from 2000-2009, but it doesn’t appear to be working. Any suggestions?
Don’t forget that when forming a selection document that uses an implicit logical AND, you can’t have two separate subconditions that both involve the same field. For example, if we wanted to find all movies with runtimes between 120 and 180 minutes, the following selection document would not work:
// does NOT work!
{ runtime: { $gte: 120 }, runtime: { $lte: 180 } }
This doesn’t work because a JSON document can’t have two fields with the same name.
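In a JavaScript-based shell, this typically shows up as a silently lost condition, because a JavaScript object literal with a repeated field name keeps only the last value. A minimal sketch in plain JavaScript (no MongoDB needed; the variable names are just for illustration):

```javascript
// Same shape as the selection document that does not work.
const selection = { runtime: { $gte: 120 }, runtime: { $lte: 180 } };

// Only one runtime field survives, and it holds the second condition.
const fieldNames = Object.keys(selection);
console.log(fieldNames);
console.log(selection.runtime);
```

The $gte condition never reaches the query, so movies with runtimes under 120 minutes would still match.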
As discussed in lecture, one way to get around this is to use an explicit $and operator:
{ $and: [ { runtime: { $gte: 120 } }, { runtime: { $lte: 180 } } ] }
Since the runtime fields now belong to two separate subdocuments, they don’t violate the rule that you can’t have two fields with the same name.
Another option is to group the two inequality operators together using an implicit logical AND as follows:
{ runtime: { $gte: 120, $lte: 180 } }
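To check that the two working forms select the same documents, here is a sketch that uses a tiny stand-in matcher written in plain JavaScript. The matches function is hypothetical, supports only $and, $gte, and $lte, and is not how MongoDB evaluates queries; the sample documents other than Avatar's runtime are made up.

```javascript
// Hypothetical mini-matcher supporting only $and, $gte, and $lte.
function matches(doc, selection) {
  return Object.entries(selection).every(([key, cond]) => {
    // Explicit AND: every subdocument must match.
    if (key === "$and") return cond.every(sub => matches(doc, sub));
    // Otherwise key is a field name and cond holds its operators.
    return Object.entries(cond).every(([op, bound]) => {
      if (op === "$gte") return doc[key] >= bound;
      if (op === "$lte") return doc[key] <= bound;
      throw new Error("unsupported operator: " + op);
    });
  });
}

const movies = [
  { name: "Movie A", runtime: 95 },
  { name: "Avatar", runtime: 162 },
  { name: "Movie B", runtime: 201 }
];

const explicitAnd = { $and: [{ runtime: { $gte: 120 } }, { runtime: { $lte: 180 } }] };
const grouped = { runtime: { $gte: 120, $lte: 180 } };

const namesExplicit = movies.filter(m => matches(m, explicitAnd)).map(m => m.name);
const namesGrouped = movies.filter(m => matches(m, grouped)).map(m => m.name);
console.log(namesExplicit, namesGrouped);
```

Both forms keep both bounds, so they select the same movies.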
Last updated on December 2, 2024.