E-66
Database Systems
  • Home
  • Lectures
  • Problem Sets
  • Sections
  • Syllabus
  • Schedule
  • Staff
  • Policies
  • Canvas
  • Gradescope
  • Ed Discussion

Section 10

MongoDB Queries

From the following specifications, form MongoDB queries that work on three collections:

  1. a collection containing movie documents (named movies)
  2. a collection containing people documents (named people)
  3. a collection containing Oscar documents (named oscars)

Note that whenever we refer to a person or movie, we also embed the associated entity’s name for convenience. Note also that we include two Boolean fields to person documents, hasActed and hasDirected, that are only included when their values are true. Sample documents are included below.

{
    _id:            "0499549",
    name:           "Avatar",
    year:           2009,
    rating:         "PG-13",
    runtime:        162,
    genre:          "AVYS",
    earnings_rank:  1,
    actors: [
        { id: "0000244", name: "Sigourney Weaver" },
        { id: "0002332", name: "Stephen Lang" },
        { id: "0735442", name: "Michelle Rodriguez" },
        { id: "0757855", name: "Zoe Saldana" },
        { id: "0941777", name: "Sam Worthington" }
    ],
    directors: [ { id: "0000116", name: "James Cameron" } ]
}

{
    _id:         "0000059",
    name:        "Laurence Olivier",
    dob:         "1907-5-22",
    pob:         "Dorking, Surrey, England, UK",
    hasActed:    true,
    hasDirected: true
}

{
    _id:    ObjectID("528bf38ce6d3df97b49a0569"),
    year:   2013,
    type:   "BEST-ACTOR",
    person: { id: "0000358", name: "Daniel Day-Lewis" },
    movie:  { id: "0443272", name: "Lincoln" }
}
  1. What are some of the methods available for use in MongoDB? Which of them can be used interchangeably?

    The key method is db.collection.find(), which takes a selection and projection (in relational algebra terms) JavaScript object to describe a query.

    The db.collection.distinct() method takes a field name and selection object and produces distinct results. db.collection.count() takes a selection object and returns the number of documents that match the selection.

    Finally, the db.collection.aggregate() method takes a pipeline of objects, each keyed by a special aggregate operator. The full list of operators are available in the lecture notes and on the MongoDB documentation, but here are a few we’ve covered in detail:

    • $match, which works like the selection objects you can pass to db.collection.find() and others, and limits the documents that pass to the next step in the pipeline
    • $group, which works like a SQL GROUP BY clause, and can pass counts/groupings to the next step in the pipeline
    • $project, which works like the projection objects you can pass to db.collection.find() and others, and can include/exclude fields from a result, or create/rename fields

  2. Find the top ten highest grossing movies in the database. For each film, output the movie’s name and the name of the director(s).

    db.movies.find({ earnings_rank: { $lte: 10 }}, { name: 1, "directors.name": 1, _id: 0 })
    

    The following is also acceptable:

    db.movies.aggregate(
        { $match:   { earnings_rank: { $lte: 10 } } },
        { $project: { name: 1, directors: "$directors.name", _id: 0 } }
    )
    

  3. How many actors in the database were born in California?

    db.people.count({ hasActed: true, pob: /, California/ })
    

    The following is also acceptable:

    db.people.aggregate(
        { $match:   { hasActed: true, pob: /, California/ } },
        { $group:   { _id: 0, numActors: { $sum: 1 } } },
        { $project: { numActors: 1, _id: 0 } }
    )
    

  4. Which people in the database have directed a movie in which Tom Hanks has acted?

    db.movies.distinct("directors.name", { "actors.name": "Tom Hanks" })
    

  5. Find all years before 1970 that have 6 or more movies in the database.

    db.movies.aggregate(
        { $match:   { year: { $lt: 1970 } } },
        { $group:   { _id: "$year", count: { $sum: 1 } } },
        { $match:   { count: { $gte: 6 } } },
        { $project: { year: "$_id", count: 1, _id: 0 } }
    )
    

  6. Find the person in the database who has acted in the most movies (the $unwind operator may be useful for this query).

    db.movies.aggregate( 
        { $unwind:  "$actors" },
        { $group:   { _id: "$actors.name", numPerformances: { $sum: 1 } } },
        { $sort:    { numPerformances : -1 } },
        { $limit: 1  },
        { $project: { _id: 0, actor: "$_id", numPerformances: 1 } }
    )
    

  7. Who directs the longest movies? Of all directors who have directed 5 or more movies, find the 10 directors whose movies have the longest average runtime.

    db.movies.aggregate( 
        { $unwind:  "$directors" },
        { $group:   { _id: "$directors.name", avgLength: { $avg: "$runtime" }, numMovies: { $sum: 1 } } },
        { $match:   { numMovies: { $gte: 5 } } },
        { $sort:    { avgLength : -1 } },
        { $limit: 10  },
        { $project: { _id: 0, director: "$_id", avgLength: 1, numMovies: 1 } }
    )
    

Last updated on September 26, 2024.