Anatomy of a Database

Top  Previous  Next

This section briefly describes the fundamentals of database design. It is meant only to provide an overview of the subject for those who are not already thoroughly familiar with standard database design concepts and issues. Experienced developers may want to move right on to the next chapter and skip this section.

 

Definitions

A database is a collection of information (data) in a system of tables, rows, and columns. The database is maintained by one or more computer programs or applications.

The basic unit of data storage is a column. A column is a storage place for information of a similar type. For example, one column might store a name and another column might hold a telephone number.

A group of different columns that are logically related make up a row. A row contains all the information related to one subject. For example, all the columns containing information concerning one student (name, address, telephone number, student number, etc.) makes up one student’s row. This would be similar to a file folder a school might keep for each student.

Collections of logically related rows make up a table. Using the same example, a collection of all students’ rows makes up the student body table. This would be similar to the file cabinets where students’ folders are kept.

Another way of looking at this is as a table or spreadsheet:

In this format, the entire table is a file cabinet, with rows (folders) and columns (data about one row).

A database is a collection of related tables. This is similar to a bank of file cabinets where the entire school records are kept. One file cabinet might hold the files with students’ personal data, another with class enrollment information, and another with faculty information.

A relational database is a collection of tables with defined relationships between them. Effective database design breaks the data into related tables that are joined together through linking columns. This will be covered in detail later in this section.

 

Summary:

 One or more columns combine to form a row.

 One or more rows combine to form a table.

 A collection of related tables is a database.

 

Table Systems and Table Drivers

There are several table formats used on PCs. These are the actual physical storage formats written to disk by programs that maintain the tables. Using TopSpeed’s table driver technology, Clarion supports many of them. Table drivers enable Clarion programs to read these different table formats.

The Professional and Enterprise Editions include ADO, TopSpeed, Clarion, ASCII, BASIC, ODBC, Clipper, dBase III & IV, DOS, FoxPro, Pervasive, MSSQL and other SQL table drivers. When you need to read data from another table system, you can add new table drivers to the Professional and Enterprise Editions. Call SoftVelocity’s Sales department at (954) 785-4555 to inquire about the availability of any specific table driver you need.

Each table system has its own idiosyncrasies and limitations. See Database Drivers in the Language Reference core help for more specific information.

 

Data Types

Columns can store many different types of data, but each individual column may hold only one type. When a column is defined, its data type is specified. This enables it to efficiently store that type of data. For example, to store a number from 0 to 100, using a column defined as a single BYTE takes less space than one defined as a decimal number column (a byte can hold an unsigned whole number between 0 and 255).

Clarion supports a wide variety of data types. All are fully documented in Chapter 3 of the Language Reference core help.

 

Sorting Data: Keys and Indexes

One of the most powerful aspects of a computerized database is the ability to sort data in many different ways. To do this manually requires multiple copies of record forms, many file folders, and many file cabinets. It would also require a lot of time spent filing each copy in different places for each sort order.

Sorting computer rows in a database merely requires the definition of keys or indices. Keys and indices declare sort orders other than the physical order of the rows within the table. In some table systems the keys are kept in separate disk tables, in others they are contained within the same table as the data. TopSpeed’s table driver technology handles these differences transparently.

Keys and indices are functionally equivalent. The only difference is the way they are maintained by an application.

·

A key is dynamically maintained by the application. Every time a row is added, modified, or deleted, the sort sequence is updated, if necessary. Keys are useful for frequently used sort-orders.

·

An index is not dynamically maintained, it is only built when needed. Indexes are useful to create sort sequences that are infrequently used, such as month-end or year-end processes.

Using the student table example discussed earlier, suppose you wanted to sort the students’ rows two ways: by name alphabetically, and by name within each major. This produces two alternate sorts.

This example uses a one-component key on the student’s name.

The next example has two components in the key: major and student name. A key can contain one or more columns, allowing sorts within sorts.

 

Ascending and Descending Sort Orders

Some table systems support both ascending and descending order for keys or components of keys (for example, the TopSpeed system). Other table systems only support ascending order, which means the data can only be sorted from lowest to highest (for example, the Clarion system).

The next example has two components in the key: Graduation Year (descending), and student name (ascending).

 

Using Keys as Range Limits

Suppose you want to create a class enrollment report from a database for the past fifteen years. All you are interested in (for purposes of this report) is the last three years. You can dramatically reduce processing time if you use a subset of the table that only contains the rows from the last three years. It takes two steps to accomplish this:

·

First, define a key to sort the data by the date of each course.

·

Next, define the range limits you are interested in. A range limit specifies a subset of the entire table to process. Only those rows that fall within the range limits are considered.

In this example, only one-fifth of the rows are processed (assuming that each year’s course offerings are the same). Reducing the number of rows to include by 80% reduces processing time by the same amount.

 

Relationships Between Tables

One goal of relational database design is reducing data redundancy. The basic rule is that data should be located in only one place. This is beneficial in two ways. First, it reduces storage space requirements. Second, it makes the database easier to maintain. To reach this goal, tables are broken up into separate, related tables through a process called data normalization.

The first step is to move any repeating groups into separate tables. For example, if a student could take a maximum of six classes, you could design the student table to contain six class columns (class1, class2, class3, etc.). But not all of these columns would be used in each row. If one student is taking six courses, all would be used, but if another student only took one class, there would be five empty columns in his row. For that reason, the class columns are moved into a separate table, eliminating the need to reserve space for empty columns. This creates a One to Many relationship (One student takes Many classes) between the student table and the classes table.

The next step is to move redundant data into separate tables. Every column in the student table must be dependent on the primary key (Student Number). The student’s name, address, and phone number remain in the student table. But the student’s major description could be moved to a separate table. This eliminates the need to repeat the Major’s Description for each student with that Major. To do this, add a Major ID number column to the student table and the Majors table. This creates a Many to One relationship (Many students have One major) between the student table and the majors table.

Once data storage is "normalized," related information is linked by ensuring a column in one table is identical to a column in the other. These common columns create the links between related tables. A linking column could be a student number, a course code, or a classroom number. Any column (or group of columns) that uniquely identifies a row in the primary table can be used as a link.

Examples of these relations can be found in a school database:

·

One teacher teaches many classes (One-to-Many).

·

Many students would have one major (Many-to-One).

 

Database Summary

A database is a collection of information (data) in a system of columns, rows, and tables.

Columns can store many different types of data, but each individual column is specified to hold only one type.

Each data item should be stored in only one place.

One or more columns makes up a row. One or more rows make up a table. A collection of related tables make up a database.

Clarion programs can access many different table systems through the use of table drivers.

Keys and indexes declare sort orders other than the physical order of the rows within the table, and can contain more than one column, allowing sorts within sorts.

Range Limits enable you to process a subset of rows, which reduces processing time.

Tables are related through common columns containing identical data, which allows you to eliminate redundant data.

 

Click here to jump to the next lesson