Campus Example

A schema for a database for a hypothetical campus enrolment and grade tracking system that can be used to compare paradigms, languages, and schema design concepts. It makes a fairly good example because most people in IT are familiar with campus activities, having gone to college, and thus time is not spent describing how the domain works.


Draft Schema:

Table:
Class

------------ classID // autonumber courseRef semesterRef // a code like YYYYss may be sufficient instructorRef // FK to People table Table: Courses -------------- courseID courseTitle units // Unit weight of course (use "1" for campuses not using this convention) Table: coursesTaken ------------- takenID // autonumber personRef // FK to People table classRef // FK to Class table grade Table: People ------------- personID formality // Mr. Mrs. Dr. Etc. FirstMI lastName canTeach // If can be a teacher contactRef // FK to Contact Table: Contact -------------- contactID alias // in case goes by alt name addrLine1 addrLine2 addrLine3 // See note below about internationalization state // some disagree with this and zip. See below. zip country phone otherContactInfo // pager, fax, phone-2, etc. Table: courseCategories // links courses to categories ----------------------- courseRef categRef cC_note Table: Categories ----------------------- categID // autonumber categTitle categDescrip Table: Prerequisites -------------------- courseRef categRef

See Constant Table for tips on handling grade codes and other constants.


Note: Prerequisites may need something more sophisticated than mere set tables, such as Boolean expressions and minimum grade. This is an issue that needs to be further developed per campus.


Is this database concerned with the difference between a student (currently) taking a course, and a student who has (already) taken a course? If so, I can't see anything here to help decide that.

I assumed nothing goes into the record until course is finished. However, perhaps this can be rolled into an Enrolled In table of some sorts. Perhaps just use a "CE" grade (above) for "currently enrolled".


Note that the Contact table displays a common American Cultural Assumption about address formats. Do you really need "city", "state_province" and "zipCode" to be separate fields and not just combined as "addrLine3"? If you are allowing "country" as a field, you are implying that this database is not restricted to the USA, so I suggest you should allow more flexible addressing

The "world-friendly" address issue can get rather involved. But if say 90%+ of all addresses are in the US, then going all-out to address that issue is overkill IMO. I agree with a 3rd address line for such though. But if this campus was in the US, I would have State and Zipcode in there also.

Google "i18n" for more info on internalization.

(Somebody deleted this section for some reason. I put it back. If you move it somewhere, please include a link here. Please be more careful about deletions.)

Having worked with contact information for student and fundraising software, not having a separate address table is a mistake. Situations where people have campus address (office numbers for the TA's, who are instructors AND students and some times from overseas of course)), off campus addresses, summer addresses, parent addresses and post graduation addresses to name a few. In addition with the proliferation of electronic communications, there is a need for landline phone numbers, cell phone numbers, fax, email etc. My solution would look more like (my comments in parens):

ADDRESSES

---

address_id (int, pk) entity_id (int, fk to entity link table) Addresstypecode (int fk to addresstype table) Address Field (large text/char field, I have seen some horrendous addresses in Wales!) Postal Code (could be US zip code or other code) Country code address begin date address end date Special Handling Notes (memo field)

CONTACTS (for a lack of a better term)

---

comm_id (pk) entity_id (int, fk to entity link table) Contacttypecode (fk link to contact types table) landline (could be phone or fax) internet (usually email)

Relationships for the entities to the addresses (student or faculty) would be many-to-many (e.g. students and faculty can be married to each other or share lodging. This may seem like over kill but I have seen horribly normalized databases which have forced bizarre workarounds in code. Or worse people starting shadow database (often in spreadsheets) to track what the database system cannot. -- Pedro Lopez

I don't see the need for a separate record for each contact device (cell, email, etc.). Could you please clarify that? In addition to the common phone and email columns, I generally have a big "otherContact" text column with for open-ended descriptions of all the other doodads that people communicate with these days. We don't need to create a dedicated column for each one, and a row for each one is overkill IMO. An example dump from a given contact record may resemble:

Day Phone:
"123-555-1234"

Evening Phone:
"123-555-2837"

Email:
"susie33@aol.com"

Other Contact:
"Cell: 543-555-9292, beeper: 777-555-3221, Fax: ... IM: blah..."

Keep in mind there may be a separate contact record for Susie's parent(s) and/or other addresses. But they don't necessarily have to populate phone numbers in them.

Then, what happens if the boss asks for a list of all of the contacts with a Fax number, to send out a broadcast fax (just as an example)? You'd have to query the database for "Other Contact" LIKE "%Fax:%", hoping that everyone spelled it right, then parse it out. And then later, after the boss gets a bunch of complaints, he or she tells you to add another field onto each record so that students can opt out of receiving broadcast faxes but still keep their number in for solicited faxes. It'd be far easier if they started as separate records, or at least, if your database supports an array of records field type, to use that.

A lot of different "what-if's" can pop up. The decision about which path to take is similar to "investment trees" which use probability trees to make the best decision based on given probabilities of future events. These issues are common in "Yag Ni fights". (Related: Decision Math And Yagni) My experience would say not to make a dedicated "slot" for it unless there's a reasonable chance of it being needed, such as automated faxing or paging etc. having happened in the past. -t

Related: Party Pattern (Perhaps move this there, or to the renamed Contact And Address Models)


Re: semesterRef column.

probably needs better code, like 2003fall or 2004spring [verbose is not bad]

But it might not sort properly. Perhaps we should have another table to define the full names, or just use a library function to expand it to English.

On the other hand, you could simply use a sorting function suited to the data.

The existing crop of RDBMS generally don't permit that and it is wasteful to do it in the app IMO. But, it is not outside of relational theory to have defined relational functions. Perhaps if we split it into Year and Semester we could write a trigger to automatically fill a sorting code. But anyhow, I think the data is easier to work with if we keep it one code, at least for the key. Perhaps we can do this:

Table:
Semesters

----------------- semesterID // auto-gen integer or shortcut code like YYYYS semYear // Ex: "2002" semSequence // 1=winter, 2=spring, etc. semDescript // Ex: "Spring, 2002"

This way we don't have to try to make a code that is both super-readable yet computer-friendly. Note that we may later add semester-related attributes such as studentCount, revenue, etc.


On comp.object, somebody doing OO modeling against a similar example suggested having a Transcript object and have GPA be a method of Transcript.

I find that kind of icky. For one, you have to think of possible associations to figure out what object/class the GPA is attached to. It could also be attached to a Student object, no? Second, what if we only wanted to calculate GPA for a year or so? Or GPA for a group of students for research? In "pure" OO we would have to code up some manual iteration over the students or transcripts. I would rather do a query than instantiate each student just to gather their GPA. (Note that we need to do a summary of a summary in order to avoid weighting some students more than others. Although SQL often does not handle double-level summaries very well, a "good" relational language could. Sql Flaws. We may also need a Grade Value table to translate grades into numeric values.)

Student would presumably have a getTranscript() method, so we could say student.getTranscript().getGPA() to get a student's GPA; if we wanted, we could define a Law Of Demeter-style getGPA() convenience method on Student which got the transcript and asked it for the GPA. For time-restricted GPAs, we could have a method on Transcript to get a sub-transcript for a particular year (or date range, or set of courses, or whatever), and then just ask that for its GPA. For a group of students, i'd put them in a list, then, since i'm using the Python Language, say:

totalGPA = reduce( lambda a, b:
a + b, map( Student.getGPA, students ) )

If this code runs on a different machine to the one on which the objects live, we have a Remote Iteration Issue here. Therefore, it doesn't. Machines other than the object-home which which to know about GPA totals have to remotely call GPA-totalling procedure on the object-home.

You then end up reinventing a query system. We are back to the classic Reinventing The Database In Application Holy War.


2 things: people have roles, such as teacher, student, etc. A canTeach attribute is a partial solution. Also, college students often have 2 addresses, so this could be captured too. Fun.

Do you have a specific requirement scenario in mind where "canTeach" is insufficient? Also, if people need a second address, then just add a second reference to the People table ("contactRef2"). That is a relatively minor addition. A more complicated change requirement would be a request for open-ended quantities of addresses (contacts). But I don't see a realistic need for that so far.

Yes, this seems to be a really lightweight example here. Perhaps another 1st or 2nd year CS student trying to get some help on an assignment?

You are welcome to add requirements, as addendums perhaps, to it. The reasons I suggest addendums so that we can perhaps do some Code Change Impact Analysis on it. Note that if we make examples too involved, then perhaps fewer readers can get involved. It is the classic tradeoff between realism and grokability time. Further, I think 2 addresses is plenty sufficient for say 99 percent of the students. I suggest perhaps having a large "notes" column in the Student or People table to explain any odd circumstances and/or extra addresses. It is not practical to fully automate such oddities IMO. Tail Wags Dog.


I am curious. How many would rather split the People table into say Students and Faculty, and why?

I would not. I would keep People but add some kind of Roles table. An entry in People can have multiple Roles. Instead of simple canTeach flag, there'd be a relation. That is, if there is a requirement that leads to that complexity. --Steven Newton


If somebody wishes to turn this into a requirements document, then one can generally take each of the entities involved and apply this patter to them:

For entity X, we want to be able to

add a new X

list X's using Query By Example

change an X as selected by list in #2

delete an X as selected by list in #2, after preview

In other words, typical Crud Screens. For the many-to-many tables, it would be something like, "Be able to assign or unassign an X to a Y where X and Y are the two entities being cross-linked. Display descriptions in addition to ID's.


Suggested Use Cases

Enter student grades (at end of quarter/semester)

Print/display report cards per student

Print/display transcripts per student and optionally all students for a quarter/semester

Print/display grades per class

Remove student grades with audit trail

Revise student grades with audit trail

Reassign student to new class (e.g. student drops 8am class for 2pm class)

Change student name (e.g. in case of marriage or adoption) with audit trail

Change faculty name with audit trail

Class load report per instructor (important to make sure instructors don't end up too many students per week)

Class load per student (in case overloaded students need special permission, or student with a low work load lose financial aid or scholarships)

Notice that there is a lot of repetition here. Generally with each entity (class, student, grade, etc.) one will want to add, change, delete, list, search, and print/report on just about every one to some degree. Thus, it's best to have a framework that provides most of that for us with relatively little coding. Related: Database Verbs. One could perhaps describe the requirements as such:

entities = list{students, teachers, courses, classes, grades, degrees, admin, etc.} for each entity in entities perform: { Add Change Delete List Sort Search Print Summarize Audit changes (log) Secure/verify usage }

"Summarize" could get combinationally tricky because one may want to see each of the entities from each of the other entities' perspective, such as grades per student, grades per teacher, grades per course, classes per student etc. Security may also requiring dealing with a lot of interweaving. This is typical with CRUD applications: future reports, views, etc. often cannot be anticipated up front. One often starts off with the basics so that each entity is at least searchable and editable to a minimum degree, and then tune or add to it for work-flow convenience per department, etc. One department or user may end up saying, "I have to visit 4 different screens (entities) for these kinds of transactions, which I do a lot of. If you can combine such and such information onto one screen, then it would save me time." Some of these requests we may know up front, some not. But at least having the minimum set for each entity allows work to get done, just not necessarily in an optimized way. Everything is editable. -t



See original on c2.com