I'm playing around with a little 'pet' project at the moment.
I have a file of 10M lines of text. Each line consists of the line number followed by the date, except for lines whose number is divisible by 10000, which contain only the word 'test'.
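For reference, a file like that can be produced with something along these lines (a sketch only; the exact date format is my assumption, not taken from the original):

```delphi
// Sketch: generate the 10M-line test file described above.
// Requires System.SysUtils and System.Classes.
// The 'yyyy-mm-dd' date format is an assumption.
procedure GenerateTestFile(const AFileName: string);
var
  Writer: TStreamWriter;
  i: Integer;
begin
  Writer := TStreamWriter.Create(AFileName, False, TEncoding.ASCII);
  try
    for i := 1 to 10000000 do
      if i mod 10000 = 0 then
        Writer.WriteLine('test')                 // every 10000th line is just 'test'
      else
        Writer.WriteLine(IntToStr(i) + ' ' + FormatDateTime('yyyy-mm-dd', Now));
  finally
    Writer.Free;
  end;
end;
```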
The file is around 250Meg in size.
I have in Delphi (XE7) a data type of
TadbCarray = array of cardinal;
TadbStrCard = TDictionary<string, TadbCarray>;
I load the file one line at a time. If the line's text is not already in the dictionary, I create a new entry whose value is an array of cardinals containing the line number; if it is found (i.e. for the 'test' lines), I just append the line number to that entry's array of cardinals.
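A minimal sketch of that load loop might look like this (the procedure and variable names are my own; the dictionary is assumed to be the TDictionary<string, TadbCarray> from above, and the per-append SetLength is the naive form):

```delphi
// Sketch of the load described above.
// Requires System.SysUtils, System.Classes, System.Generics.Collections.
procedure LoadFile(const AFileName: string; Dict: TadbStrCard);
var
  Reader: TStreamReader;
  Line: string;
  LineNo: Cardinal;
  Arr: TadbCarray;
begin
  Reader := TStreamReader.Create(AFileName, TEncoding.ASCII);
  try
    LineNo := 0;
    while not Reader.EndOfStream do
    begin
      Line := Reader.ReadLine;
      Inc(LineNo);
      if Dict.TryGetValue(Line, Arr) then
      begin
        // Seen before (the 'test' lines): append this line number.
        // SetLength may reallocate, so write the array back afterwards.
        SetLength(Arr, Length(Arr) + 1);
        Arr[High(Arr)] := LineNo;
        Dict[Line] := Arr;
      end
      else
      begin
        // Not seen: new entry with a one-element array.
        // TryGetValue left Arr = nil on failure.
        SetLength(Arr, 1);
        Arr[0] := LineNo;
        Dict.Add(Line, Arr);
      end;
    end;
  finally
    Reader.Free;
  end;
end;
```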
One might think that the dictionary itself would take up a little more than the length of the file - let's say 300Meg.
The file is in ASCII, which gets converted to Unicode (UTF-16, two bytes per character), so let's say 600Meg.
Each entry in the dictionary includes a HashCode, which is an integer (4 bytes), so with roughly 10M entries that's around 40Meg of overhead.
The arrays of cardinal values hold 4 bytes per value, one value per line, so around 40Meg.
One might therefore estimate (roughly) the total size of the dictionary at around 680Meg - let's round that to 700Meg.
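Writing that arithmetic out explicitly (decimal 'Meg' = 1,000,000 bytes; the per-component figures are the rough ones above):

```delphi
const
  LineCount     = 10000000;         // ~10M entries (the 'test' lines collapse into one key)
  StringBytes   = 600000000;        // 250Meg of ASCII keys, doubled to UTF-16, rounded up
  HashBytes     = LineCount * 4;    // one 4-byte hash code per entry  = 40Meg
  ArrayBytes    = LineCount * 4;    // one 4-byte cardinal per line    = 40Meg
  EstimateBytes = StringBytes + HashBytes + ArrayBytes;  // 680,000,000, i.e. ~680Meg
```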
I used a routine to check the program's memory usage, once before loading my dictionary and once after, to see the memory occupied by the dictionary - and it agreed with what Task Manager was telling me: the total memory used is over 1.5GB.
Quite how that can be, I don't understand. A 250Meg file taking up 1.5GB in memory... ridiculous.
I have to find some way of reducing the memory footprint, while ideally keeping the O(1) lookup the dictionary gives me.
The research continues, and any help is more than appreciated!