Problem Set 5 FAQ

Problem 1
Problem 3
Part II

Problem 1

In our answers for 1.1 and 1.3, are we supposed to use the specific values found in the relational tuples (e.g., ‘Ina Garten’), or something more general?

You should use the specific values.

However, as indicated in the second guideline, you need to take into account the entire set of relationships that the database will need to capture.

For example, consider the following document from our movies collection for Part II (which we also discussed in lecture):
```
{
    _id:            "0499549",
    name:           "Avatar",
    year:           2009,
    rating:         "PG-13",
    runtime:        162,
    genre:          "AVYS",
    earnings_rank:  4,
    actors: [
        { id: "0000244", name: "Sigourney Weaver" },
        { id: "0002332", name: "Stephen Lang" },
        { id: "0735442", name: "Michelle Rodriguez" },
        { id: "0757855", name: "Zoe Saldana" },
        { id: "0941777", name: "Sam Worthington" }
    ],
    directors: [ { id: "0000116", name: "James Cameron" } ]
}
```
If we only needed to capture information about Avatar, we could have used a field called director whose value was a single embedded subdocument. However, because some movies may have more than one director, we needed to use a field called directors whose value is an array of one or more embedded subdocuments.

In the relational database, there are four tuples for the information that we are trying to capture. Does that mean that we also need to include four documents in our answers for 1.1 and 1.3?

No, you may not need four documents. To see why, consider our movie database. The relational version required 5 tables: Movie, Person, Oscar, Actor and Director. The MongoDB version only requires 3 collections: movies, people, and oscars. The difference has to do with how the two logical models capture many-to-many relationships.

In the relational model, we need to use separate tables like Actor and Director to capture many-to-many relationships, because the relational model doesn’t allow for multi-valued attributes. But MongoDB allows for multi-valued attributes, so we can capture those relationships inside the documents that store information about the entities. For example, in the above movie document, the actors field captures the relationships between movies and the people who acted in them, so we don’t need separate “actor” documents.
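
To make the contrast concrete, here is a minimal sketch in plain JavaScript (no database needed) that folds rows from a relational-style Actor table into the kind of multi-valued field a movie document can hold. The `actorRows` array and its `movie_id` and `actor_id` column names are assumptions made for illustration; the id values are taken from the Avatar document above.

```javascript
// Hypothetical rows from a relational Actor table: one tuple per relationship.
const actorRows = [
  { movie_id: "0499549", actor_id: "0000244" },
  { movie_id: "0499549", actor_id: "0002332" },
  { movie_id: "0499549", actor_id: "0735442" }
];

// Group the rows by movie so that each movie gets a single array-valued field.
const actorsByMovie = {};
for (const row of actorRows) {
  if (!(row.movie_id in actorsByMovie)) {
    actorsByMovie[row.movie_id] = [];
  }
  actorsByMovie[row.movie_id].push(row.actor_id);
}

// Three relationship tuples collapse into one array inside one movie document.
console.log(actorsByMovie);
```

The point of the sketch is only the change in shape: several relationship tuples on the relational side become one array inside a single document.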

When creating the documents for 1.1, should we take an approach like the one used in the above movie document, in which a person’s name is grouped with their id?

No. Our movie database takes a hybrid approach: it is mostly reference-based, but it also embeds a small amount of data, because each reference is stored together with the name of the person or movie it refers to.

In 1.1, you should use a purely reference-based approach with no embedding. For example, here is what a purely reference-based approach would look like for the movie document above:
```
{
    _id:            "0499549",
    name:           "Avatar",
    year:           2009,
    rating:         "PG-13",
    runtime:        162,
    genre:          "AVYS",
    earnings_rank:  4,
    actors:         [ "0000244", "0002332", "0735442", "0757855", "0941777" ],
    directors:      [ "0000116" ]
}
```
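
One consequence of the purely reference-based version is that names are no longer available in the movie document itself, so they have to be looked up in the people documents. The following is a rough in-memory sketch of that lookup in plain JavaScript, not a MongoDB query; the `people` array is a hypothetical stand-in for the people collection and shows only the `_id` and `name` fields.

```javascript
// A movie document that stores only references (ids), trimmed for brevity.
const movie = {
  _id: "0499549",
  name: "Avatar",
  actors: ["0000244", "0002332"],
  directors: ["0000116"]
};

// Hypothetical stand-in for the people collection.
const people = [
  { _id: "0000116", name: "James Cameron" },
  { _id: "0000244", name: "Sigourney Weaver" },
  { _id: "0002332", name: "Stephen Lang" }
];

// Index the people by _id, then follow each reference to get a name.
const peopleById = new Map(people.map((p) => [p._id, p]));
const actorNames = movie.actors.map((id) => peopleById.get(id).name);
const directorNames = movie.directors.map((id) => peopleById.get(id).name);

console.log(actorNames, directorNames);
```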

When capturing relationships, should we include information about a relationship in the documents for both of the entities involved, or should we only include it in the document for one of the two entities? And if we only include it with one of the entities, how do we decide which one?

It depends. For example, in our MongoDB movie database, we only included information about the relationships between a movie and its actors in the document for the movie. We decided not to include it in the people documents of the actors, because the number of movies in which a person has acted could grow significantly over time and cause the document to become large enough that it would need to be moved on disk.

It’s worth noting that the possible growth of the document over time is more of a concern when using an embedded or hybrid approach, since an array of embedded subdocuments can take up significantly more room than just an array of references.
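
As a rough illustration of the size difference, the sketch below compares the length of the JSON text for an array of embedded subdocuments with the length of the JSON text for an array of references. JSON text length is only an assumed stand-in for the space MongoDB actually uses on disk, so treat the numbers as indicative rather than exact.

```javascript
// Two of the embedded subdocuments from the Avatar document.
const embeddedActors = [
  { id: "0000244", name: "Sigourney Weaver" },
  { id: "0002332", name: "Stephen Lang" }
];

// The same relationships captured as references only.
const referencedActors = embeddedActors.map((a) => a.id);

// JSON text length is a rough proxy for storage size, not the real on-disk size.
const embeddedSize = JSON.stringify(embeddedActors).length;
const referencedSize = JSON.stringify(referencedActors).length;

console.log(embeddedSize, referencedSize);
```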

I understand that the _id field is supposed to function as the key of the document. This seems easy to implement when the primary key of the corresponding tuple is a single value. What should we do when the primary key is a combination of values?

You can let MongoDB assign the _id value, as we did in the documents from the oscars collection in the movie database. When you show an example of a document for which MongoDB is assigning the _id value, you can use notation like the following:
```
_id:    ObjectID1,
```
and specify that ObjectID1 is an ObjectID value generated by MongoDB.

Will the number of documents needed for 1.3 be the same as the number of documents that we used in 1.1?

It depends. There are different possible approaches here depending on how much embedding you decide to do.

For example, in our movie database, we could have decided to only have two collections: one for person documents and one for movie documents. In this approach, we could have embedded information about acting and directing Oscars in the corresponding person documents, and information about Best-Picture Oscars in the corresponding movie documents.

In yet another approach, we could have just used a single collection for movie documents – and embedded people and Oscar information within those documents.

Problem 3

I’m unsure about how to approach problems 3.1, 3.2 and 3.3. Do you have any suggestions?

These problems are similar to ones from pages 272-273 in the coursepack. Consult your coursepack or the lecture video for a reminder of how we solved these problems.

In addition, you can find extra practice problems on pages 279-281, and the solutions are available on the Lectures page.

Part II

In the results of our queries, does the order of the documents or of the fields within a document matter?

No. The Autograder should give you full credit as long as you have all of the necessary documents, field names, and field values.

The values in my actual query results look the same as the ones in the expected results, but the Autograder is saying that the results are incorrect. Any suggestions?

Make sure that your field names are correct. For example, for Query 9, make sure that you use a field name of director (with no s at the end).

For Query 4, I’m unclear about how to get the year from a person’s dob. Any suggestions?

Rather than trying to extract the year, you can use a condition that involves pattern matching to find the appropriate dob values.
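
As a minimal sketch of the idea, the plain-JavaScript snippet below filters an array of dob strings with a regular expression. The dob values and the year 1975 are made up for illustration. In a MongoDB selection document, a regular expression can be used as the value for a field in the same spirit, for example `{ dob: /^1975-/ }`.

```javascript
// Hypothetical dob strings in the yyyy-mm-dd form used by the people documents.
const dobs = ["1975-06-04", "1957-01-11", "1975-11-30", "2001-09-19"];

// ^ anchors the pattern at the start of the string, so only the year part is tested.
const bornIn1975 = dobs.filter((dob) => /^1975-/.test(dob));

console.log(bornIn1975);
```

Anchoring the pattern with `^` matters: without it, the pattern could also match the same characters appearing later in a string.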

For Query 6, I’m missing the results for a rating of null. Do you have any idea why that would be?

One possibility is that you are using an unnecessary $unwind stage to “unwind” the rating values in the movie documents. By default, $unwind produces no output for a document whose field is null or missing, so movies with a null rating are dropped before they can be grouped. Using $unwind is only necessary when the value of a field is an array and you want to create subgroups based on the individual values in the array. In this case, the rating values are not arrays, so the $unwind stage isn’t needed.
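
To see why null ratings can disappear, here is a simplified plain-JavaScript imitation of the default behavior of $unwind. The `unwind` function and the sample documents are hypothetical and are not MongoDB’s implementation; the sketch only mimics the rule that documents whose field is null, missing, or an empty array produce no output.

```javascript
// Simplified stand-in for the default behavior of $unwind (not MongoDB's own code).
function unwind(docs, field) {
  const out = [];
  for (const doc of docs) {
    const value = doc[field];
    // A null or missing field produces no output document at all.
    if (value === null || value === undefined) continue;
    // A non-array value is treated like a one-element array.
    const items = Array.isArray(value) ? value : [value];
    for (const item of items) {
      out.push({ ...doc, [field]: item });
    }
  }
  return out;
}

// Hypothetical movie documents; the second one has a null rating.
const movies = [
  { name: "Movie A", rating: "PG-13" },
  { name: "Movie B", rating: null },
  { name: "Movie C", rating: "R" }
];

const unwound = unwind(movies, "rating");
console.log(unwound.length); // 2: the document with the null rating is gone
```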

For Query 7, I’m not sure how to compute someone’s age so as to find the youngest actor in the database. Any ideas?

You don’t need to compute the ages. Strings in MongoDB can be compared using the same operators as integers, and because the dob values in the documents are strings of the form yyyy-mm-dd, the larger a dob string is, the later the person was born and the younger the person is.
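
The same ordering holds for strings in plain JavaScript, which makes it easy to check the reasoning outside the database. In the sketch below, the people and their dob values are made up for illustration, and the loop is an in-memory stand-in rather than a MongoDB query.

```javascript
// Hypothetical people documents with dob strings of the form yyyy-mm-dd.
const people = [
  { name: "Person A", dob: "1949-10-08" },
  { name: "Person B", dob: "2003-04-17" },
  { name: "Person C", dob: "1978-06-19" }
];

// Strings are compared character by character, so zero-padded
// yyyy-mm-dd values sort in chronological order.
console.log("2003-04-17" > "1978-06-19"); // true

// The largest dob string belongs to the youngest person.
let youngest = people[0];
for (const p of people) {
  if (p.dob > youngest.dob) youngest = p;
}
console.log(youngest.name); // Person B
```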

In lecture, we discussed a similar example in which we found the name and runtime of the movie with the longest runtime. It may be useful as a model.

For Query 8, I’m trying to apply the conditions needed to focus on DOB values from 2000-2009, but it doesn’t appear to be working. Any suggestions?

Don’t forget that when forming a selection document that uses an implicit logical AND, you can’t have two separate subconditions that both involve the same field. For example, if we wanted to find all movies with runtimes between 120 and 180 minutes, the following selection document would not work:
```
// does NOT work!
{ runtime: { $gte: 120 },
  runtime: { $lte: 180 } }
```
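This doesn’t work because a JSON document can’t have two fields with the same name. In the JavaScript-based MongoDB shell, the second runtime field silently replaces the first, so only the $lte condition is applied.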
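
You can observe the problem in plain JavaScript, without connecting to a database. In the sketch below, the variable name `selection` is just an illustrative choice; the object literal is the selection document from above.

```javascript
// An object literal with the same key written twice.
const selection = {
  runtime: { $gte: 120 },
  runtime: { $lte: 180 }
};

// Only one runtime field survives, and it holds the later value.
console.log(Object.keys(selection).length); // 1
console.log(JSON.stringify(selection));     // {"runtime":{"$lte":180}}
```

Because the $gte condition is lost without any error message, a query built this way would also match movies with runtimes below 120.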

As discussed in lecture, one way to get around this is to use an explicit $and operator:
```
{ $and: [ { runtime: { $gte: 120 } },
          { runtime: { $lte: 180 } } ] }
```
Since the runtime fields now belong to two separate subdocuments, they don’t violate the rule that you can’t have two fields with the same name.

Another option is to group the two inequality operators together using an implicit logical AND as follows:
```
{ runtime: { $gte: 120, $lte: 180 } }
```

Last updated on December 2, 2024.