What is Backup Software?

I spend quite a lot of time thinking about backup software. AltexaBackup is something I’ve been working on, on and off, for about 10 years. The original idea is even older.

The whole project stems from a simple realization – none of the backup software I’ve tried actually seems to work. It might seem a silly thing to say, but that’s the conclusion I’ve come to. Everything has let me down, from the Windows “DVD Backup”, which didn’t actually manage to reinstall the PC, to various pieces of shareware which fail at the most basic of tasks. I haven’t tried a great deal of commercial software, simply for financial reasons.

Being a developer, I reacted to this failure in the most logical way I could – by trying to do better, myself.

And so AltexaBackup was born. Originally, in V1, Altexa Backup (note the space) was designed to be a cloud backup (back before there was really a cloud – AWS even did a whitepaper on the system; that’s how novel it was). There was the Windows software, which talked to a Linux-based backend. There was a big database, with replication at the backend, and so on and so forth. I was actually quite proud of the program, despite its limitations. I had a handful of subscribers (well, more than a handful actually), but I quickly came to the conclusion that whenever a problem appeared, fixing it was rather difficult – the software wasn’t particularly well written.

After fighting with it for a while, I decided that it would be better to rethink the system, and completely rewrite it. Everything in there is new – even down to the directory walk system (i.e. listing files), which is no longer recursive. (There’s no real reason for this change as such; I just wanted a little more flexibility.)
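
For anyone wondering what a non-recursive directory walk looks like, here is a minimal sketch. It is written in Python purely for illustration and is not AltexaBackup’s actual code; the name `walk_files` is made up for this example. The idea is simply to keep an explicit list of directories still to visit instead of having the function call itself.

```python
import os
from typing import Iterator


def walk_files(root: str) -> Iterator[str]:
    """Yield the path of every file under root without using recursion."""
    pending = [root]  # explicit stack of directories still to visit
    while pending:
        current = pending.pop()
        try:
            with os.scandir(current) as it:
                entries = list(it)
        except OSError:
            continue  # unreadable directory: skip it rather than abort the walk
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                pending.append(entry.path)  # visit later instead of recursing now
            elif entry.is_file(follow_symlinks=False):
                yield entry.path
```

One practical upside of the explicit stack is that the walk can be paused, resumed or reordered, which is harder to do when the state lives in the call stack.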

One of the early design decisions was to use DLL files (“plugins”) for the system. I decided to create two ‘families’ – one for storing the backup (there are currently ‘local’, ‘ftp’ and ‘s3’ for that), and another for deciding how to back up (there are ‘archive’ and ‘copy’ for that).

(This post is getting a bit long, sorry – I’m getting to the point)

Over the last couple of weeks/months (I don’t work on this full time), I’ve realized that a backup system is just…an ETL (extract, transform, load) pipeline.

  • You “extract” data from a file
  • You “transform” the data – i.e. you compress it (or you don’t – but “nothing” is an operation too)
  • You “load” the data into its final backup resting place
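
The three steps above can be sketched in a few lines. This is only an illustration in Python, not how AltexaBackup is implemented; the function names (`extract`, `transform`, `load`, `backup_file`) and the choice of zlib for compression are assumptions made for the example.

```python
import zlib
from pathlib import Path


def extract(source: Path) -> bytes:
    """Extract: read the raw bytes of the file to back up."""
    return source.read_bytes()


def transform(data: bytes, compress: bool = True) -> bytes:
    """Transform: compress the data, or do nothing (which is an operation too)."""
    return zlib.compress(data) if compress else data


def load(data: bytes, destination: Path) -> None:
    """Load: write the result to its final resting place."""
    destination.parent.mkdir(parents=True, exist_ok=True)
    destination.write_bytes(data)


def backup_file(source: Path, destination: Path, compress: bool = True) -> None:
    """Run the three steps in order for a single file."""
    load(transform(extract(source), compress), destination)
```

Calling `backup_file(src, dst, compress=False)` gives a plain copy, while the default gives a compressed one; the calling code is the same either way.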

This actually means that backup programs could be a lot more generic.

Take AltexaBackup for example. Right now we have:

  • A GUI which creates jobs (entries in a config file)
  • A service which runs
    • A worker which loops over available jobs, generates a list of files to back up and calls
      • A backup DLL for each file, which calls
        • A storage DLL for the result

This seems fairly optimized, but suppose I were to add a new level of abstraction at the start:

  • A service which runs
    • A worker which loops over each job and calls
      • The read DLL which gets the data and calls
        • A handler DLL which does something (or nothing) with the data and calls
          • A storage DLL to put the result somewhere

This now seems a lot more flexible. We could back up files (as now), partitions, maybe databases.

The program could be used to download websites, to re-encode film files, to do just about anything.

Imagine a setup like this one

  • A service which runs
    • A worker which loops over each job and calls
      • The read DLL which gets the data and calls
        • A handler DLL which does something and calls
        • A handler DLL which does something else and calls
        • A handler DLL which does something else and calls
          • The storage DLL to put the result somewhere
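
A setup like this boils down to a reader, a list of handlers and a storage step. Here is a rough Python sketch of that shape, using plain functions in place of DLLs. The names `run_job`, `Reader`, `Handler` and `Storage` are hypothetical and exist only for this example; the real plugins would have their own interfaces.

```python
import zlib
from typing import Callable, Iterable, List

Reader = Callable[[], bytes]        # gets the data
Handler = Callable[[bytes], bytes]  # does something (or nothing) with it
Storage = Callable[[bytes], None]   # puts the result somewhere


def run_job(read: Reader, handlers: Iterable[Handler], store: Storage) -> None:
    """Read once, pass the data through each handler in order, then store it."""
    data = read()
    for handle in handlers:
        data = handle(data)
    store(data)


# Example job: read some bytes, upper-case them, compress them,
# and keep the result in an in-memory list standing in for real storage.
stored: List[bytes] = []
run_job(
    read=lambda: b"hello backup",
    handlers=[bytes.upper, zlib.compress],
    store=stored.append,
)
```

Adding, removing or reordering steps only means changing the `handlers` list; the worker loop itself stays untouched.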

Now we are talking flexibility!

The great thing is that each of these DLLs can be very, very simple.

Today’s AltexaBackup storage DLLs only need to export two (file-handling) functions:

    • GetFile
    • PutFile

This is really great, because they don’t know what they are storing – and the rest of the system doesn’t know anything about where the data has been stored. This is the perfect example of Douglas Adams’s perfectly coined phrase – the SEP, or Somebody Else’s Problem.
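
To show how little a storage plugin has to know, here is a minimal Python sketch of a two-function storage interface with a ‘local’ implementation. Only the names GetFile and PutFile come from AltexaBackup; the Python spellings (`get_file`, `put_file`), the parameters and the `LocalStorage` class are assumptions made for illustration, and the real DLL exports almost certainly look different.

```python
import shutil
from pathlib import Path
from typing import Protocol


class Storage(Protocol):
    """What the rest of the system sees: two functions and nothing else."""

    def put_file(self, local_path: Path, name: str) -> None: ...

    def get_file(self, name: str, local_path: Path) -> None: ...


class LocalStorage:
    """Hypothetical 'local' backend: stores files under a root directory."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def put_file(self, local_path: Path, name: str) -> None:
        target = self.root / name
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(local_path, target)

    def get_file(self, name: str, local_path: Path) -> None:
        local_path.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(self.root / name, local_path)
```

An ‘ftp’ or ‘s3’ backend would provide the same two methods, so the caller never needs to change when the destination does.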

Computing reduced to its simplest expression. Or the Unix way of doing things. Personally, I love it.