Section 3

Review of SQL and the Relational Model

Bonus problem from Section 2:

For each departure city in last section’s the Flights relation, find the flight numbers of the flights from that city to New York (if any). If there is no such flight, use a value of NULL to indicate that fact. UNION should not be used.

Say our Flights table looks like this (ignoring the columns that are irrelevant to this problem):

flno	origin	dest
10	Chicago	New York
20	Chicago	New York
30	Los Angeles	New York
40	Los Angeles	Seattle
50	Seattle	Anchorage
60	New York	Boston

The result we eventually want for this problem is:

origin	flno
Chicago	10
Chicago	20
Los Angeles	30
Seattle	`NULL`
New York	`NULL`

Note that the result set has every possible departure city from the table matched up with the flight numbers that get you from that city to New York. Note also that if you can’t get to New York by starting from a given departure city, it has one row in the result set and the flno value is NULL.

This sounds like how an outer join works, so let’s use an outer join from Flights to itself:

SELECT F1.origin, F2.flno
FROM Flights AS F1 LEFT OUTER JOIN Flights AS F2
     ON F1.flno = F2.flno AND F2.dest = 'New York';

When we join Flights with itself using the join condition F1.flno = F2.flno, we end up with just as many rows as Flights has, duplicating the columns of Flights so that it looks like this:

F1.flno	F1.origin	F1.dest	F2.flno	F2.origin	F2.dest
10	Chicago	New York	10	Chicago	New York
20	Chicago	New York	20	Chicago	New York
30	Los Angeles	New York	30	Los Angeles	New York
40	Los Angeles	Seattle	40	Los Angeles	Seattle
50	Seattle	Anchorage	50	Seattle	Anchorage
60	New York	Boston	60	New York	Boston

Then, by specifying AND F2.dest = 'New York', we can get NULL to appear wherever a given flight doesn’t arrive in New York:

F1.flno	F1.origin	F1.dest	F2.flno	F2.origin	F2.dest
10	Chicago	New York	10	Chicago	New York
20	Chicago	New York	20	Chicago	New York
30	Los Angeles	New York	30	Los Angeles	New York
40	Los Angeles	Seattle	`NULL`	`NULL`	`NULL`
50	Seattle	Anchorage	`NULL`	`NULL`	`NULL`
60	New York	Boston	`NULL`	`NULL`	`NULL`

Finally, we project the columns F1.origin and F2.flno to get this:

origin	flno
Chicago	10
Chicago	20
Los Angeles	30
Los Angeles	`NULL`
Seattle	`NULL`
New York	`NULL`

But this isn’t what we wanted! Los Angeles should not appear twice: it should appear once, since flight number 30 gets us to New York. (We don’t care about flight 40, the flight that departs in Los Angeles for Seattle.)

We can modify our query to exclude the rows where F2.flno is NULL when there’s a flight to New York from that origin already:

SELECT DISTINCT F1.origin, F2.flno
FROM Flights AS F1 LEFT OUTER JOIN Flights AS F2
     ON F1.flno = F2.flno AND F2.dest = 'New York'
WHERE NOT (F2.flno IS NULL AND
           F1.origin IN (SELECT origin
                         FROM Flights
                         WHERE dest = 'New York'));

With this query, we no longer have the row for flight 40.

Consider the following relational schemas, which relate to a database containing information for a hospital.

Patient(name, DOB, sex)
Appointment(patient_name, patient_DOB, staff_id, room_number, building, appt_date, appt_time, reason)
Room(building, number, type)
Staff(id, name, status, salary, specialty_type)
Specialty(type, phone_number, chair_id)

Given the above schema write the following SQL queries:

Find the schedule and number of appointments for each doctor on April 1, 2013.
Specifically, display the name of the doctor followed by the earliest appointment of the day for that doctor, the latest appointment time of the day, and the total number of appointments that day for each doctor who has at least one appointment on April 1, 2013.
```
SELECT S.name, MIN(A.appt_time), MAX(A.appt_time), COUNT(*)
FROM Staff S, Appointment A
WHERE S.id = A.staff_id AND A.appt_date = '2013-04-01'
GROUP BY S.id, S.name;
```

Find the names of all staff members who do not have an appointment on the afternoon of March 6, 2013.

SELECT name
FROM Staff
WHERE id NOT IN (SELECT staff_id
                 FROM Appointment
                 WHERE appt_date = '2013-03-06'
                   AND appt_time >= '12:00:00');

Find schedules of all staff members for the afternoon of March 6, 2013. The result set should include the names of each staff member, and the times of all appointments they have on March 6, 2013. A given name may appear more than once if that staff member has more than one appointment. Make sure to include those staff members that happen to have NO appointments on that date.
```
SELECT s.name, a.appt_time
FROM Staff as s LEFT OUTER JOIN Appointment as a
ON s.id = a.staff_id
WHERE a.appt_date = '2013-03-06' and a.appt_time >= '12:00:00';
```

Record Representations: fixed-length and variable-length records

What are the advantages of fixed-length records? What are the disadvantages?

Fixed-length records can simplify the computation of field offsets; furthermore, field offsets need not be stored with every record. Disadvantages include inefficient use of space if fields in records are not always utilized (e.g., many records have NULL entries, or entries that require fewer bytes than the specified field size).
Why are variable-length records advantageous?

Space can be used more efficiently: space does not need to be allocated for fields with NULL entries, and variable-length record fields only require as much space as is necessary for the particular field value in that record.
A record contains the following fields from our Movie relation from Problem Set 1:
- id, a fixed-length string of 7 characters
- name, a variable-length string of up to 30 characters
- earnings_rank, an integer
- runtime, an integer
Assume that we are using the record format discussed in lecture in which each record begins with a header of field offsets. What will the record look like for the following tuple? Assume that you have one-byte characters and four-byte integers.
```
('5678463', 'Philadephia', NULL, 125)
```
One possible representation is the following:

0

4

8

12

16

20

27

39

20

27

-1

39

43

'5678463'

'Philadelphia'

125

Note that we use a special offset -1 for the NULL value, and that we also include the offset of the end of the record, so that we can compute the length of the last field. In this case, because the NULL value is for an integer, we could theoretically use an offset of 39 for it instead. Then, when we subtracted and discovered that the length of the integer is 0, we would know that the value is NULL. However, that approach does not work for string values, since an empty string (i.e., a string of length 0) is not equivalent to a NULL value for a string field.

Last updated on September 26, 2024.