What is Backup Software? (part two)

Last time I talked about what I think Backup Software is, and how it really can be reduced to just a specialized version of ETL software.

Let’s take things a little further. I said we could have a read dll which gets some data (anonymous data), and gives it to a handler dll. So how can these two actually decide what’s the best way of handling the data?

We could treat the information as just a series of bytes, which is the lowest common denominator. It will work, but it might be a little too generic.

Imagine a read dll which connects to a database and starts a dump of the data, spewing it out as SQL statements. It might be useful for the handler dll to know that these are SQL statements, and not just a byte stream. Maybe the handler dll’s job is to transform the data into an XLS file (or something else). Maybe the task is to get data from a HANA database and transform it so it can be loaded into an SQL Server database, or Oracle, or whatever. The statements could be parsed and only the dialect-specific parts changed.

So how can these dlls decide on the best way of transferring data, especially without user intervention?

(I haven’t finished thinking this through yet, so this is just an idea.)

How about the dlls have a way of saying how they can accept data, and how they can produce data?

The read dll could say “I can produce sql statements, or text, or a bytestream”. The options are listed in order of usefulness, most useful first, as the read dll sees it. I.e. SQL statements are a specialized version of a text stream, which is a specialized version of a bytestream.

Handler dlls could say “I know how to understand jpeg files, png files, sql statements, and bytestreams”. Again, in order of usefulness (OK my example is a bit contrived).

So in this example, the read dll can produce sql statements, and the handler dll knows how to understand them. Perfect.

We could have another handler dll which doesn’t care what the data is (say its job is to encrypt data) – it could announce “I can accept bytestreams”. Since bytestreams are the lowest common denominator, this would work too.
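To make the idea a bit more concrete, here is a minimal sketch in Python of how that matching could work. Everything in it is made up for illustration: the function name negotiate, the format labels, and the rule that the read dll's preference order wins when both sides have several formats in common. I haven't decided any of that yet.

```python
from typing import Optional, Sequence

def negotiate(produces: Sequence[str], accepts: Sequence[str]) -> Optional[str]:
    """Pick the first format the producer offers that the consumer accepts.

    Assumption: the producer's order decides. The consumer's own
    ordering is only used as a set of acceptable formats here.
    """
    accepted = set(accepts)
    for fmt in produces:
        if fmt in accepted:
            return fmt
    return None  # nothing in common, so the two dlls cannot be paired

# Hypothetical announcements, most useful format first.
READER_PRODUCES = ["sql", "text", "bytestream"]
CONVERTER_ACCEPTS = ["jpeg", "png", "sql", "bytestream"]
ENCRYPTER_ACCEPTS = ["bytestream"]

print(negotiate(READER_PRODUCES, CONVERTER_ACCEPTS))  # sql
print(negotiate(READER_PRODUCES, ENCRYPTER_ACCEPTS))  # bytestream
```

With the two handler examples from above, the first pairing settles on sql and the encrypting handler falls back to bytestream. If there is no overlap at all, the function returns None, and that is probably the point where the user has to get involved after all.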

I was thinking of using simple MIME types to announce this. They are an existing, widely understood way of labeling what kind of data something is.
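As a rough sketch of what that could look like in Python: the MIME type strings below are real registered types, but the GENERALIZES_TO table and both function names are my own invention. MIME itself has no built-in notion of one type being a specialized version of another, so that relationship would have to be defined by the backup software.

```python
from typing import Dict, Iterable, List, Optional

# Assumption: a hand-written table saying which type each one falls back to.
GENERALIZES_TO: Dict[str, str] = {
    "application/sql": "text/plain",
    "text/plain": "application/octet-stream",
    "image/jpeg": "application/octet-stream",
    "image/png": "application/octet-stream",
}

def fallback_chain(mime_type: str) -> List[str]:
    """List a type followed by its ever more generic fallbacks."""
    chain = [mime_type]
    while chain[-1] in GENERALIZES_TO:
        chain.append(GENERALIZES_TO[chain[-1]])
    return chain

def pick_format(best_output: str, accepts: Iterable[str]) -> Optional[str]:
    """Return the most specific form of best_output the handler accepts."""
    accepted = set(accepts)
    for candidate in fallback_chain(best_output):
        if candidate in accepted:
            return candidate
    return None

print(fallback_chain("application/sql"))
# ['application/sql', 'text/plain', 'application/octet-stream']
print(pick_format("application/sql", ["application/octet-stream"]))
# application/octet-stream
```

The nice part of doing it this way is that a read dll would only need to announce its most specific type, and the fallbacks to text and bytestream come for free from the table.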

That’s where I’m at thinking about this so far. I have a couple of ideas about the GUI part too, but I have to let that settle :)