Recall from last week’s lecture our discussion of capturing relationships between entities in MongoDB, and consider the following example documents that model the banking domain.
{ _id: <AccountId1>, account-num: 5656, balance: 450, branch: <?> } { _id: <AccountId2>, account-num: 1010, balance: 3000, branch: <?> } { _id: <BranchId1>, name: "11 Smith Square", phone: "1-555-102-3231" } { _id: <CustomerId1>, name: "Susan Johnson", phone: "1-555-323-1000", accounts: <?> }
In lecture, we considered two ways of capturing relationships: using references and embedding documents inside other documents.
Describe how the branch field of the first account document would look if
we decided to store a reference to the Smith Square branch, and
{ _id: <AccountId1>, account-num: 5656, balance: 450, branch: <BranchId1> }
we decided to embed the Smith Square branch document in the account.
{ _id: <AccountId1>, account-num: 5656, balance: 450, branch: { id: <BranchId1>, name: "11 Smith Square", phone: "1-555-102-3231" } }
If we knew that one customer may have more than one account with the bank, describe how Susan Johnson’s accounts field would look if she owned accounts 5656 and 1010 when
we use references to store the relationship between the account owner and the accounts, and
{ _id: <CustomerId1>, name: "Susan Johnson", phone: "1-555-323-1000", accounts: [ <AccountId1>, <AccountId2> ] }
we embed the account documents in the owner’s document.
{ _id: <CustomerId1>, name: "Susan Johnson", phone: "1-555-323-1000", accounts: [ { id: <AccountId1>, account-num: 5656, balance: 450, branch: <BranchId1> }, { id: <AccountId2>, account-num: 1010, balance: 3000, branch: <BranchId1> }, ] }
If we were designing a system for a bank whose customers had many accounts, which style of capturing the relationship between an account owner and an account might we use? Why?
If we often needed access to the accounts themselves without caring about their owners, we would store the accounts separately (e.g., in another collection) and keep references in customer documents. Furthermore, if we knew that customers were likely to acquire many accounts over time, we might choose references to keep the customer documents from growing in size too quickly. But in this scheme, if we wanted to get a list of accounts while having only information about the customer, we would need to perform two queries — one to acquire the IDs of the accounts, and another to retrieve the accounts by ID.
If we knew most customers would only have a small number of accounts, and we only ever accessed an account by searching for its customer, embedding would be a better option — we would be able to write only one query to get one or all of the accounts owned by a given customer, and we would not need to worry about a customer document growing too large and triggering relocation.
If we were designing a system to store data representing a large social network, which style of capturing a relationship between two friends might we use? Why?
_id
fields to represent a relationship.
If we used embedding, a given person may have many copies of their document in the documents of their friends. This would quickly waste space in a very connected friendship graph. It would also make updating a given person’s information costly, because we would need to update all copies of the document.
Under what conditions is atomicity provided for MongoDB collections? How would these conditions affect the correctness of our database (i.e., which document types would we keep together in a collection)?
For our bank example above, this would mean that we should embed account documents in a given customer document, since balance transfers or calculating minimum balance fees would need to be atomic.
Last updated on November 7, 2024.