E-66
Database Systems
  • Home
  • Lectures
  • Problem Sets
  • Sections
  • Syllabus
  • Schedule
  • Staff
  • Policies
  • Canvas
  • Gradescope
  • Ed Discussion

Section 9

Data modeling in MongoDB

Recall from last week’s lecture our discussion of capturing relationships between entities in MongoDB, and consider the following example documents that model the banking domain.

{
    _id: <AccountId1>,
    account-num: 5656,
    balance: 450,
    branch: <?>
}

{
    _id: <AccountId2>,
    account-num: 1010,
    balance: 3000,
    branch: <?>
}

{
    _id: <BranchId1>,
    name: "11 Smith Square",
    phone: "1-555-102-3231"
}

{
    _id: <CustomerId1>,
    name: "Susan Johnson",
    phone: "1-555-323-1000",
    accounts: <?>
}

In lecture, we considered two ways of capturing relationships: using references and embedding documents inside other documents.

  1. Describe how the branch field of the first account document would look if

    1. we decided to store a reference to the Smith Square branch, and

      Using references, the branch field would look like the following:

      {
          _id: <AccountId1>,
          account-num: 5656,
          balance: 450,
          branch: <BranchId1>
      }
      

    2. we decided to embed the Smith Square branch document in the account.

      If we decided to embed the branch document, it would appear inline, as follows:

      {
          _id: <AccountId1>,
          account-num: 5656,
          balance: 450,
          branch: { id: <BranchId1>,
                    name: "11 Smith Square",
                    phone: "1-555-102-3231"
          }
      }
      

  2. If we knew that one customer may have more than one account with the bank, describe how Susan Johnson’s accounts field would look if she owned accounts 5656 and 1010 when

    1. we use references to store the relationship between the account owner and the accounts, and

      Using references, we would store the accounts belonging to an owner as an array, like the following:

      {
          _id: <CustomerId1>,
          name: "Susan Johnson",
          phone: "1-555-323-1000",
          accounts: [ <AccountId1>, <AccountId2> ]
      }
      

    2. we embed the account documents in the owner’s document.

      Using embedding, we would store the accounts inline, like the following:

      {
          _id: <CustomerId1>,
          name: "Susan Johnson",
          phone: "1-555-323-1000",
          accounts: [
              { id: <AccountId1>,
                account-num: 5656,
                balance: 450,
                branch: <BranchId1> },
              { id: <AccountId2>,
                account-num: 1010,
                balance: 3000,
                branch: <BranchId1> },
          ]
      }
      

  3. If we were designing a system for a bank whose customers had many accounts, which style of capturing the relationship between an account owner and an account might we use? Why?

    Either approach is correct.

    If we often needed access to the accounts themselves without caring about their owners, we would store the accounts separately (e.g., in another collection) and keep references in customer documents. Furthermore, if we knew that customers were likely to acquire many accounts over time, we might choose references to keep the customer documents from growing in size too quickly. But in this scheme, if we wanted to get a list of accounts while having only information about the customer, we would need to perform two queries — one to acquire the IDs of the accounts, and another to retrieve the accounts by ID.

    If we knew most customers would only have a small number of accounts, and we only ever accessed an account by searching for its customer, embedding would be a better option — we would be able to write only one query to get one or all of the accounts owned by a given customer, and we would not need to worry about a customer document growing too large and triggering relocation.

  4. If we were designing a system to store data representing a large social network, which style of capturing a relationship between two friends might we use? Why?

    To avoid undesired redundancy, we would use references. We might keep all person documents in the same collection and rely on their _id fields to represent a relationship.

    If we used embedding, a given person may have many copies of their document in the documents of their friends. This would quickly waste space in a very connected friendship graph. It would also make updating a given person’s information costly, because we would need to update all copies of the document.

  5. Under what conditions is atomicity provided for MongoDB collections? How would these conditions affect the correctness of our database (i.e., which document types would we keep together in a collection)?

    Atomicity is only guaranteed for operations on a single document. This means that, for an update operation that involves multiple documents, some documents may be not be altered (e.g., if a database failure occurs). The implication is that we should use embedding if we ever desire atomicity for an update.

    For our bank example above, this would mean that we should embed account documents in a given customer document, since balance transfers or calculating minimum balance fees would need to be atomic.

Last updated on November 7, 2024.