Archive of November 2017

What is Backup Software? (part two)

Last time I talked about what I think backup software is, and how it can really be reduced to a specialized form of ETL (extract, transform, load) software.

Let’s take things a little further. I said we could have a read dll which gets some data (anonymous data), and gives it to a handler dll. So how can these two actually decide what’s the best way of handling the data?

We could treat the information as just a series of bytes, which is the lowest common denominator. That will work, but it might be a little too generic.

Imagine a read dll which connects to a database and starts a dump of the data, spewing it out as SQL statements. It might be useful for the handler dll to know that these are SQL statements, and not just a byte stream. Maybe the handler DLL’s job is to transform the data into an XLS file (or something else). Maybe the task is to get data from a HANA database and transform it so it can be loaded into SQL Server, Oracle, or whatever. The statements could be parsed and only the dialect-specific parts rewritten.

So how can these DLL files decide on the best way of transferring data, especially without user intervention?

(I haven’t finished thinking this through yet, so this is just an idea.)

How about giving the dlls a way of saying which formats they can accept and which formats they can produce?

The read dll could say “I can produce SQL statements, or text, or a bytestream”. The options are listed from most to least useful (in the read dll’s opinion): SQL statements are a specialized form of text stream, which in turn is a specialized form of bytestream.

Handler dlls could say “I know how to understand JPEG files, PNG files, SQL statements, and bytestreams”. Again, these are in order of usefulness (OK, my example is a bit contrived).

So in this example, the read dll can produce sql statements, and the handler dll knows how to understand them. Perfect.

We could have another handler dll which doesn’t care what the data is (say its job is to encrypt data) – it could announce “I can accept bytestreams”. Since bytestreams are the lowest common denominator, this would work too.
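The matching rule described above can be sketched in a few lines of Python. This is a minimal illustration under assumptions and is not AltexaBackup code. The function name `negotiate` and the plain string format names are hypothetical. The sketch also assumes that the read dll's preference order wins.

```python
from typing import Optional, Sequence


def negotiate(produces: Sequence[str], accepts: Sequence[str]) -> Optional[str]:
    """Return the first format in the reader's list that the handler accepts.

    Both lists are hypothetical and ordered from most to least useful;
    the reader's order decides. Returns None if there is no common format.
    """
    accepted = set(accepts)
    for fmt in produces:
        if fmt in accepted:
            return fmt
    return None


# The database reader and the two handlers from the text (names are illustrative).
reader = ["sql", "text", "bytestream"]
converter = ["jpeg", "png", "sql", "bytestream"]
encryptor = ["bytestream"]

print(negotiate(reader, converter))  # sql
print(negotiate(reader, encryptor))  # bytestream
```

To let the handler's order win instead, loop over `accepts` and check membership in `produces`. Which side should decide is still an open question here.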

I was thinking of using simple MIME types (such as text/plain or application/octet-stream) to announce this. They already exist as a standard way of describing what kind of data something is, and they are easy to understand.
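MIME types make it possible to write the "specialized version of" relationship down once. A read dll then only has to announce its most specific type. This is a hypothetical sketch, and the `PARENT` table and function names are assumptions. `application/sql`, `text/plain` and `application/octet-stream` are real registered types. Treating SQL as a subtype of plain text, however, is a choice made here and is not something the MIME standard defines.

```python
from typing import Dict, List, Optional, Sequence

# Assumed "is a specialized form of" relationships between MIME types.
# Anything not listed falls back directly to application/octet-stream (raw bytes).
PARENT: Dict[str, str] = {
    "application/sql": "text/plain",
    "text/plain": "application/octet-stream",
    "image/jpeg": "application/octet-stream",
    "image/png": "application/octet-stream",
}

OCTET_STREAM = "application/octet-stream"


def lineage(mime: str) -> List[str]:
    """Return mime followed by each more generic type, ending at octet-stream."""
    chain = [mime]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    if chain[-1] != OCTET_STREAM:
        chain.append(OCTET_STREAM)
    return chain


def negotiate_mime(produced: str, accepts: Sequence[str]) -> Optional[str]:
    """Walk from the produced type toward octet-stream; return the first accepted one."""
    accepted = set(accepts)
    for candidate in lineage(produced):
        if candidate in accepted:
            return candidate
    return None


print(lineage("application/sql"))
# ['application/sql', 'text/plain', 'application/octet-stream']
print(negotiate_mime("application/sql", ["application/octet-stream"]))
# application/octet-stream
```

An encrypting handler that only announces `application/octet-stream` therefore accepts everything. A handler that understands `application/sql` gets the specific type.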

That’s as far as I’ve got with this for now. I have a couple of ideas about the GUI part too, but I need to let them settle :)

What is Backup Software?

I spend quite a lot of time thinking about backup software. AltexaBackup is something I’ve been working on, on and off, for about 10 years. The original idea is even older.

The whole project stems from a simple realization: none of the backup software I’ve tried actually seems to work. It might seem a silly thing to say, but that’s the conclusion I’ve come to. The failures range from the Windows “DVD Backup”, which didn’t actually manage to reinstall the PC, to various pieces of shareware which seem to fail at the most basic of tasks. I haven’t tried a great deal of commercial software, simply for financial reasons.

Being a developer, I reacted to this failure in the most logical way I could – by trying to do better, myself.

And so AltexaBackup was born. Originally, in V1, Altexa Backup (note the space) was designed as a cloud backup service (back before there really was a cloud – AWS even did a whitepaper on the system; that’s how novel it was). There was the Windows software, which talked to a Linux-based backend. There was a big database, with replication at the backend, and so on and so forth. I was actually quite proud of the program, despite its limitations. I had a handful of subscribers (well, more than a handful actually), but I quickly came to the conclusion that whenever a problem appeared, it was rather difficult to fix – the software wasn’t particularly well written.

After fighting with it for a while, I decided that it would be better to rethink the system and rewrite it completely. Everything in it is new, right down to the directory walk system (i.e. the code that lists files), which is no longer recursive. (There was no pressing reason for this change as such; I just wanted a little more flexibility.)
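A non-recursive directory walk usually replaces the call stack with an explicit list of directories still to visit. Below is a minimal Python sketch of that idea. `walk_files` is a hypothetical name, and the sketch assumes that unreadable directories should be skipped rather than abort the run. It is not the actual AltexaBackup walker.

```python
import os
from typing import Iterator


def walk_files(root: str) -> Iterator[str]:
    """Yield the path of every regular file under root, without recursion.

    Directories still to visit sit on an explicit stack, so the depth of the
    tree is limited by memory rather than by the call stack.
    """
    pending = [root]
    while pending:
        directory = pending.pop()  # pop() from the end: depth-first order
        try:
            with os.scandir(directory) as entries:
                for entry in entries:
                    if entry.is_dir(follow_symlinks=False):
                        pending.append(entry.path)
                    elif entry.is_file(follow_symlinks=False):
                        yield entry.path
        except OSError:
            # Unreadable or vanished directory: skip it instead of failing the whole job.
            continue
```

The pending list is ordinary data, so the traversal is easy to change. For example, you could take from the front of a `collections.deque` with `popleft()` for breadth-first order, or save the list and resume the walk later. Both are harder with a recursive walk.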

One of the early design decisions was to use DLL files (“plugins”) for the system. I decided to create two ‘families’ – one for storing the backup (there are currently ‘local’, ‘ftp’ and ‘s3’ for that), and another for deciding how to back up (there are ‘archive’ and ‘copy’ for that).

(This post is getting a bit long, sorry – I’m getting to the point)

Over the last couple of weeks/months (I don’t work on this full time), I’ve realized that a backup system is just… an ETL process.

· You “extract” data from a file

· You “transform” the data, e.g. you compress it (or you don’t – but “nothing” is an operation too)

· You “load” the data into its final backup resting place
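The three steps map directly onto three small functions. This Python sketch assumes a single file and uses zlib compression as the example transform. The function names are illustrative and are not AltexaBackup's.

```python
import zlib
from typing import Callable

Transform = Callable[[bytes], bytes]


def extract(path: str) -> bytes:
    """Extract: read the source file's contents."""
    with open(path, "rb") as f:
        return f.read()


def compress(data: bytes) -> bytes:
    """One possible transform: zlib compression."""
    return zlib.compress(data)


def identity(data: bytes) -> bytes:
    """The "do nothing" transform, which is still a valid operation."""
    return data


def load(data: bytes, destination: str) -> None:
    """Load: write the result to its final resting place."""
    with open(destination, "wb") as f:
        f.write(data)


def backup_file(source: str, destination: str, transform: Transform = identity) -> None:
    """Back up one file as extract -> transform -> load."""
    load(transform(extract(source)), destination)
```

Passing `compress` gives an "archive"-style backup. Leaving the default gives a plain copy.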

This actually means that backup programs could be a lot more generic.

Take AltexaBackup, for example. Right now we have:

  • A GUI which creates jobs (entries in a config file)
  • A service which runs
    • A worker which loops over available jobs, generates a list of files to backup and calls
      • A backup dll for each file, which calls
        • A storage dll for the result

This seems fairly optimized, but suppose I add a new level of abstraction at the start:

  • A service which runs
    • A worker which loops over each job and calls
      • The read dll which gets the data and calls
        • A handler dll which does something (or nothing) with the data and calls
          • A storage dll to put the result somewhere
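One way to picture the new layering is as three small interfaces plus a worker loop. This is a Python sketch under assumptions. The real plugins are DLLs with exported functions, and the interface and class names here (`ReadPlugin`, `HandlerPlugin`, `StoragePlugin`, `run_job`) are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Protocol, Tuple


class ReadPlugin(Protocol):
    def read(self) -> Iterator[Tuple[str, bytes]]:
        """Yield (name, data) items: files, a partition image, a database dump..."""
        ...


class HandlerPlugin(Protocol):
    def handle(self, name: str, data: bytes) -> Tuple[str, bytes]:
        """Transform one item (or pass it through unchanged)."""
        ...


class StoragePlugin(Protocol):
    def put_file(self, name: str, data: bytes) -> None:
        """Store one item somewhere."""
        ...


def run_job(reader: ReadPlugin, handler: HandlerPlugin, storage: StoragePlugin) -> None:
    """The worker loop: read -> handle -> store, for every item the reader yields."""
    for name, data in reader.read():
        storage.put_file(*handler.handle(name, data))


# Tiny in-memory implementations, only to show the pieces fitting together.
class ListReader:
    def __init__(self, items: Iterable[Tuple[str, bytes]]) -> None:
        self.items: List[Tuple[str, bytes]] = list(items)

    def read(self) -> Iterator[Tuple[str, bytes]]:
        yield from self.items


class PassThrough:
    def handle(self, name: str, data: bytes) -> Tuple[str, bytes]:
        return name, data


class MemoryStorage:
    def __init__(self) -> None:
        self.files: Dict[str, bytes] = {}

    def put_file(self, name: str, data: bytes) -> None:
        self.files[name] = data


store = MemoryStorage()
run_job(ListReader([("a.txt", b"alpha"), ("b.txt", b"beta")]), PassThrough(), store)
print(sorted(store.files))  # ['a.txt', 'b.txt']
```

The worker never needs to know whether the items came from files, a partition or a database.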

This now seems a lot more flexible. We could back up files (as now), partitions, maybe even databases.

The program could be used to download websites, re-encode video files, or do just about anything.

Imagine a setup like this one:

  • A service which runs
    • A worker which loops over each job and calls
      • The read dll which gets the data and calls
        • A handler dll which does something and calls
        • A handler dll which does something else and calls
        • A handler dll which does something else and calls
          • The storage dll to put the result somewhere
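Every handler takes data in and hands data out, so a chain of handlers is just function composition. Below is a minimal Python sketch that assumes each handler is a plain bytes-to-bytes function. Compression followed by Base64 encoding stands in for "something" and "something else". The `chain` helper is hypothetical.

```python
import base64
import zlib
from functools import reduce
from typing import Callable, Sequence

Handler = Callable[[bytes], bytes]


def chain(handlers: Sequence[Handler]) -> Handler:
    """Compose handlers left to right: each one's output is the next one's input."""
    def run(data: bytes) -> bytes:
        return reduce(lambda acc, handler: handler(acc), handlers, data)
    return run


# Hypothetical two-step chain: compress, then encode as Base64 text.
pipeline = chain([zlib.compress, base64.b64encode])
encoded = pipeline(b"backup " * 100)
```

An empty chain returns its input unchanged, so "do nothing" falls out naturally as the zero-handler case.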

Now we are talking flexibility!

The great thing is that each of these dlls can be very, very simple.

Today’s AltexaBackup storage DLLs only need to export two (file-handling) functions:

    • GetFile
    • PutFile
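Here is a Python sketch of what such a two-function storage interface might look like, with a local-directory backend. The method names mirror `GetFile` and `PutFile`. Their signatures (a string key plus raw bytes) are assumptions, since the post doesn't give them, and `LocalStorage` is hypothetical too.

```python
import os
from abc import ABC, abstractmethod


class Storage(ABC):
    """Hypothetical mirror of a storage plugin's two exports."""

    @abstractmethod
    def put_file(self, key: str, data: bytes) -> None:
        """Store data under key (cf. PutFile)."""

    @abstractmethod
    def get_file(self, key: str) -> bytes:
        """Return the data stored under key (cf. GetFile)."""


class LocalStorage(Storage):
    """Keeps each item as a file below a root directory."""

    def __init__(self, root: str) -> None:
        self.root = root

    def _path(self, key: str) -> str:
        # No sanitising of key here; a real plugin should reject keys like "../x".
        return os.path.join(self.root, key)

    def put_file(self, key: str, data: bytes) -> None:
        path = self._path(key)
        os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)

    def get_file(self, key: str) -> bytes:
        with open(self._path(key), "rb") as f:
            return f.read()
```

The caller only ever sees `put_file` and `get_file`. An FTP or S3 backend would implement the same two methods, and nothing else in the pipeline would need to change.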

This is really great, because they don’t know what they are storing, and the rest of the system doesn’t know anything about where the data has been stored. It is the perfect example of Douglas Adams’s perfectly coined phrase, the SEP: Somebody Else’s Problem.

Computing reduced to its simplest expression. Or the Unix way of doing things. Personally, I love it.