E-66
Database Systems
  • Home
  • Lectures
  • Problem Sets
  • Sections
  • Syllabus
  • Schedule
  • Staff
  • Policies
  • Canvas
  • Gradescope
  • Ed Discussion

Section 3

Review of SQL and the Relational Model

Bonus problem from Section 2:

  1. For each departure city in last section’s the Flights relation, find the flight numbers of the flights from that city to New York (if any). If there is no such flight, use a value of NULL to indicate that fact. UNION should not be used.

    Say our Flights table looks like this (ignoring the columns that are irrelevant to this problem):

    flno

    origin

    dest

    10

    Chicago

    New York

    20

    Chicago

    New York

    30

    Los Angeles

    New York

    40

    Los Angeles

    Seattle

    50

    Seattle

    Anchorage

    60

    New York

    Boston

    The result we eventually want for this problem is:

    origin

    flno

    Chicago

    10

    Chicago

    20

    Los Angeles

    30

    Seattle

    NULL

    New York

    NULL

    Note that the result set has every possible departure city from the table matched up with the flight numbers that get you from that city to New York. Note also that if you can’t get to New York by starting from a given departure city, it has one row in the result set and the flno value is NULL.

    This sounds like how an outer join works, so let’s use an outer join from Flights to itself:

    SELECT F1.origin, F2.flno
    FROM Flights AS F1 LEFT OUTER JOIN Flights AS F2
         ON F1.flno = F2.flno AND F2.dest = 'New York';
    

    When we join Flights with itself using the join condition F1.flno = F2.flno, we end up with just as many rows as Flights has, duplicating the columns of Flights so that it looks like this:

    F1.flno

    F1.origin

    F1.dest

    F2.flno

    F2.origin

    F2.dest

    10

    Chicago

    New York

    10

    Chicago

    New York

    20

    Chicago

    New York

    20

    Chicago

    New York

    30

    Los Angeles

    New York

    30

    Los Angeles

    New York

    40

    Los Angeles

    Seattle

    40

    Los Angeles

    Seattle

    50

    Seattle

    Anchorage

    50

    Seattle

    Anchorage

    60

    New York

    Boston

    60

    New York

    Boston

    Then, by specifying AND F2.dest = 'New York', we can get NULL to appear wherever a given flight doesn’t arrive in New York:

    F1.flno

    F1.origin

    F1.dest

    F2.flno

    F2.origin

    F2.dest

    10

    Chicago

    New York

    10

    Chicago

    New York

    20

    Chicago

    New York

    20

    Chicago

    New York

    30

    Los Angeles

    New York

    30

    Los Angeles

    New York

    40

    Los Angeles

    Seattle

    NULL

    NULL

    NULL

    50

    Seattle

    Anchorage

    NULL

    NULL

    NULL

    60

    New York

    Boston

    NULL

    NULL

    NULL

    Finally, we project the columns F1.origin and F2.flno to get this:

    origin

    flno

    Chicago

    10

    Chicago

    20

    Los Angeles

    30

    Los Angeles

    NULL

    Seattle

    NULL

    New York

    NULL

    But this isn’t what we wanted! Los Angeles should not appear twice: it should appear once, since flight number 30 gets us to New York. (We don’t care about flight 40, the flight that departs in Los Angeles for Seattle.)

    We can modify our query to exclude the rows where F2.flno is NULL when there’s a flight to New York from that origin already:

    SELECT DISTINCT F1.origin, F2.flno
    FROM Flights AS F1 LEFT OUTER JOIN Flights AS F2
         ON F1.flno = F2.flno AND F2.dest = 'New York'
    WHERE NOT (F2.flno IS NULL AND
               F1.origin IN (SELECT origin
                             FROM Flights
                             WHERE dest = 'New York'));
    

    With this query, we no longer have the row for flight 40.

Consider the following relational schemas, which relate to a database containing information for a hospital.

  • Patient(name, DOB, sex)
  • Appointment(patient_name, patient_DOB, staff_id, room_number, building, appt_date, appt_time, reason)
  • Room(building, number, type)
  • Staff(id, name, status, salary, specialty_type)
  • Specialty(type, phone_number, chair_id)

Given the above schema write the following SQL queries:

  1. Find the schedule and number of appointments for each doctor on April 1, 2013.
    Specifically, display the name of the doctor followed by the earliest appointment of the day for that doctor, the latest appointment time of the day, and the total number of appointments that day for each doctor who has at least one appointment on April 1, 2013.

    SELECT S.name, MIN(A.appt_time), MAX(A.appt_time), COUNT(*)
    FROM Staff S, Appointment A
    WHERE S.id = A.staff_id AND A.appt_date = '2013-04-01'
    GROUP BY S.id, S.name;
    

  2. Find the names of all staff members who do not have an appointment on the afternoon of March 6, 2013.

    SELECT name
    FROM Staff
    WHERE id NOT IN (SELECT staff_id
                     FROM Appointment
                     WHERE appt_date = '2013-03-06'
                       AND appt_time >= '12:00:00');
    

  3. Find schedules of all staff members for the afternoon of March 6, 2013. The result set should include the names of each staff member, and the times of all appointments they have on March 6, 2013. A given name may appear more than once if that staff member has more than one appointment. Make sure to include those staff members that happen to have NO appointments on that date.

    SELECT s.name, a.appt_time
    FROM Staff as s LEFT OUTER JOIN Appointment as a
    ON s.id = a.staff_id
    WHERE a.appt_date = '2013-03-06' and a.appt_time >= '12:00:00';
    

Record Representations: fixed-length and variable-length records

  1. What are the advantages of fixed-length records? What are the disadvantages?

    Fixed-length records can simplify the computation of field offsets; furthermore, field offsets need not be stored with every record. Disadvantages include inefficient use of space if fields in records are not always utilized (e.g., many records have NULL entries, or entries that require fewer bytes than the specified field size).

  2. Why are variable-length records advantageous?

    Space can be used more efficiently: space does not need to be allocated for fields with NULL entries, and variable-length record fields only require as much space as is necessary for the particular field value in that record.

  3. A record contains the following fields from our Movie relation from Problem Set 1:

    • id, a fixed-length string of 7 characters
    • name, a variable-length string of up to 30 characters
    • earnings_rank, an integer
    • runtime, an integer

    Assume that we are using the record format discussed in lecture in which each record begins with a header of field offsets. What will the record look like for the following tuple? Assume that you have one-byte characters and four-byte integers.

    ('5678463', 'Philadephia', NULL, 125)
    

    One possible representation is the following:

    0
    4
    8
    12
    16
    20
    27
    39
    20
    27
    -1
    39
    43
    '5678463'
    'Philadelphia'
    125

    Note that we use a special offset -1 for the NULL value, and that we also include the offset of the end of the record, so that we can compute the length of the last field. In this case, because the NULL value is for an integer, we could theoretically use an offset of 39 for it instead. Then, when we subtracted and discovered that the length of the integer is 0, we would know that the value is NULL. However, that approach does not work for string values, since an empty string (i.e., a string of length 0) is not equivalent to a NULL value for a string field.

Last updated on September 26, 2024.