Ampex/SDC
Key words: Database internals, Tape storage, Hash tables, Rapid learning,
Assembly language.
Ampex and System Development Corporation. Two employers, one job. It
came about this way: SDC was an Ampex customer. In order to assure
delivery of their most important subsystem, SDC bought the Ampex TeraBit Memory
division. People, hardware and the lease on the building. When this
was announced to the troops we had a choice. We could accept that our
checks would be coming from Santa Monica instead of Redwood City or we could
leave.
After a day on the job I got what must have been the standard explanation to our
database structure. I followed the explanation and commented that I was
being shown multi dimensional linked lists on disk. Bob could tell I
understood. The only thing left was a few details about which tables were
linked to which tables and where all that was defined.
I participated in the software side of Ampex’s efforts to extend its expertise
in video recording to large data stores. I participated in many little bug
fixes but two projects stick in my mind.
Hash Table
The first memorable project was new development disguised as a bug fix.
The bug concerned a problem that would give progressively worse performance
until things almost ground to a halt. This problem was the hash table
design. Design not implementation. As Larry originally designed it
hash tables were circular. That meant that hash table entries [details
below] were placed in the first available slot following the location given by
the hash algorithm. The end of the table wrapped around to the start if no
empty slot was found before the end of the table.
That sounds like a reasonable approach until one realizes that a search for an
entry that does not [yet] exist may well take a very long time. Just such
a search is required before creating a new hash-keyed record. That kind of
search can not stop when it finds an empty entry. It can only stop when it
finds an exact match or a virgin hash table slot. If the search stops
before it finds a never used entry it could miss an entry created before a
record was deleted creating the empty spot. In the pathological case every
hash table slot has been in use at one time or another. Searching such a
table involves a search or EVERY hash table entry.
My redesign involved initial hashing to a sector. Hash table slots were
small enough that 80 would fit in a single sector. Yes, searching for
something that was not present required searching the entire sector but once the
sector was read into RAM a search of only 80 hash slots took very little time.
The primary hash table sector could be linked to another sector should the
primary sector fill. Given that our hashing was reasonably decent and that
we allocated 2 to 3 times as many hash slots that would ever be used the chance
of any overflows was small.
I promised to share what our hash table slots looked like. They
used 6 bytes. The first 4 bytes in the slot was the address of the data
record containing the actual key. The other 2 bytes were a second hash
function of the key. In particular the key was not present in the hash
table. The second hash function eliminated the need to check a
non-matching key (on average) in all but in 1 in 65536 times.
The original author of that code went on to form his own well-known database
company.
Journaling
Our database required multiple writes to disjoint sectors to update anything.
If something went wrong the database was corrupted and data would be lost.
By keeping journals it was possible to repair a database that had become
corrupted if that corruption was the result of failing to write one of more
sectors. In particular, if the database machine halted and had to be
rebooted we could recover everything nicely.
I worked with another software engineer (who later helped found that well-known
database company) to produce this feature. In essence it was simple.
Before allowing any actual writes back to the database we collected the new
content (multiple instances of sector number, offset and length) all in one
place. We then made sure all sections were safely written to disk before
actual writes were allowed back to the database. When the system came up
after a crash we played back the journals and updated everything. When the
journal file got large we paused the system briefly making sure all transactions
were complete. We deleted the journal and started a new one.
Easy to understand but a bit of a problem to pull off. We had to look at
EVERY database action in our entire system and make them work with our scheme.
And given the era, everything was written in PDP-11 assembly language.
Miscellaneous
This was the first time I had ever done anything on a stack machine. I had
no problem adapting. I wrote a simple recursive routine, a right-justified
binary to ASCII function. As I write this I can not remember if I wrote it
because something like that was needed or I did it for its tutorial value.
I did rewrite that function for a useful purpose for a subsequent employer.
Our database was used to keep the location of files that had migrated to tape.
Users requested fines in the normal way and had their files copied to disk
before they used them. Changed files were copied back as needed.
By today’s standards the Ampex TBM was a joke. A reel of 2-inch wide
videotape a mile long could store a little under 2Gbytes. Today one can
buy twice that capacity on a CF card for way less than the cost of one TBM tape
transport. And 5Gig hard drives with a USB interface now sell for $99.
Yes, times have changed.
Back