Ampex/SDC

Key words: Database internals, Tape storage, Hash tables, Rapid learning, Assembly language.

Ampex and System Development Corporation.  Two employers, one job.  It came about this way: SDC was an Ampex customer.  In order to assure delivery of their most important subsystem, SDC bought the Ampex TeraBit Memory division.  People, hardware and the lease on the building.  When this was announced to the troops we had a choice.  We could accept that our checks would be coming from Santa Monica instead of Redwood City or we could leave.

After a day on the job I got what must have been the standard explanation to our database structure.  I followed the explanation and commented that I was being shown multi dimensional linked lists on disk.  Bob could tell I understood.  The only thing left was a few details about which tables were linked to which tables and where all that was defined.

I participated in the software side of Ampex’s efforts to extend its expertise in video recording to large data stores.  I participated in many little bug fixes but two projects stick in my mind.

Hash Table

The first memorable project was new development disguised as a bug fix.  The bug concerned a problem that would give progressively worse performance until things almost ground to a halt.  This problem was the hash table design.  Design not implementation.  As Larry originally designed it hash tables were circular.  That meant that hash table entries [details below] were placed in the first available slot following the location given by the hash algorithm.  The end of the table wrapped around to the start if no empty slot was found before the end of the table.

That sounds like a reasonable approach until one realizes that a search for an entry that does not [yet] exist may well take a very long time.  Just such a search is required before creating a new hash-keyed record.  That kind of search can not stop when it finds an empty entry.  It can only stop when it finds an exact match or a virgin hash table slot.  If the search stops before it finds a never used entry it could miss an entry created before a record was deleted creating the empty spot.  In the pathological case every hash table slot has been in use at one time or another.  Searching such a table involves a search or EVERY hash table entry.

My redesign involved initial hashing to a sector.  Hash table slots were small enough that 80 would fit in a single sector.  Yes, searching for something that was not present required searching the entire sector but once the sector was read into RAM a search of only 80 hash slots took very little time.  The primary hash table sector could be linked to another sector should the primary sector fill.  Given that our hashing was reasonably decent and that we allocated 2 to 3 times as many hash slots that would ever be used the chance of any overflows was small.

I promised to share what our hash table slots looked like.  They used 6 bytes. The first 4 bytes in the slot was the address of the data record containing the actual key.  The other 2 bytes were a second hash function of the key.  In particular the key was not present in the hash table.  The second hash function eliminated the need to check a non-matching key (on average) in all but in 1 in 65536 times.

The original author of that code went on to form his own well-known database company.

Journaling

Our database required multiple writes to disjoint sectors to update anything.  If something went wrong the database was corrupted and data would be lost.  By keeping journals it was possible to repair a database that had become corrupted if that corruption was the result of failing to write one of more sectors.  In particular, if the database machine halted and had to be rebooted we could recover everything nicely.

I worked with another software engineer (who later helped found that well-known database company) to produce this feature.  In essence it was simple.  Before allowing any actual writes back to the database we collected the new content (multiple instances of sector number, offset and length) all in one place.  We then made sure all sections were safely written to disk before actual writes were allowed back to the database.  When the system came up after a crash we played back the journals and updated everything.  When the journal file got large we paused the system briefly making sure all transactions were complete.  We deleted the journal and started a new one.

Easy to understand but a bit of a problem to pull off.  We had to look at EVERY database action in our entire system and make them work with our scheme.  And given the era, everything was written in PDP-11 assembly language.

Miscellaneous

This was the first time I had ever done anything on a stack machine.  I had no problem adapting.  I wrote a simple recursive routine, a right-justified binary to ASCII function.  As I write this I can not remember if I wrote it because something like that was needed or I did it for its tutorial value.  I did rewrite that function for a useful purpose for a subsequent employer.

Our database was used to keep the location of files that had migrated to tape.  Users requested fines in the normal way and had their files copied to disk before they used them.  Changed files were copied back as needed.

By today’s standards the Ampex TBM was a joke.  A reel of 2-inch wide videotape a mile long could store a little under 2Gbytes.  Today one can buy twice that capacity on a CF card for way less than the cost of one TBM tape transport.  And 5Gig hard drives with a USB interface now sell for $99.  Yes, times have changed.
Back