Suitable Java data structure for parsing large data file
I have a rather large text file (~4 million lines) that I'd like to parse, and I'm looking for advice about a suitable data structure in which to store the data. The file contains lines like the following:
```
Date       Time  Value
2011-11-30 09:00 10
2011-11-30 09:15 5
2011-12-01 12:42 14
2011-12-01 19:58 19
2011-12-01 02:03 12
```
I want to group the lines by date, so my initial thought was to use a TreeMap<String, List<String>> that maps each date to the rest of its line. But is a TreeMap of Lists a ridiculous thing to do? I suppose I could replace the String key with a date object (to avoid so many string comparisons), but it's the List as a value that I'm worried might be unsuitable.
I'm using a TreeMap because I want to iterate the keys in date order.
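A minimal sketch of the approach described in the question, assuming whitespace-separated columns, a header line that has already been skipped, and ISO yyyy-MM-dd dates (with that format, String keys already iterate in chronological order). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class GroupByDate {
    // Groups "date time value" lines by their date column.
    // Assumes the header line has been skipped and dates are yyyy-MM-dd,
    // so the TreeMap's String ordering matches chronological ordering.
    public static TreeMap<String, List<String>> group(List<String> lines) {
        TreeMap<String, List<String>> byDate = new TreeMap<>();
        for (String line : lines) {
            // Split into the date and the rest of the line (time and value).
            String[] parts = line.trim().split("\\s+", 2);
            if (parts.length < 2) {
                continue; // skip blank or malformed lines
            }
            // Create the per-date list on first use, then append to it.
            byDate.computeIfAbsent(parts[0], k -> new ArrayList<>()).add(parts[1]);
        }
        return byDate;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "2011-11-30 09:00 10",
            "2011-11-30 09:15 5",
            "2011-12-01 12:42 14",
            "2011-12-01 19:58 19",
            "2011-12-01 02:03 12");
        // Iterating the TreeMap visits dates in ascending order.
        for (Map.Entry<String, List<String>> e : group(lines).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

For the real file, the same loop body could run over a BufferedReader's readLine() results instead of an in-memory list.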
is a TreeMap of Lists a ridiculous thing to do?
Conceptually, no, but it is going to be very memory-inefficient (because of the overhead of both the Map and the Lists). You're looking at an overhead of 200% or more, which may or may not be acceptable depending on how much memory you can spare.
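For a more memory-efficient solution, create a class that has a field for every column (including a Date), put all the instances in a single List, and sort it once when you're done reading. The built-in Collections.sort (or List.sort), which uses a stable merge sort for object lists, works well for this.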
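A minimal sketch of the "one object per row, one flat list, sort at the end" idea. It assumes whitespace-separated yyyy-MM-dd / HH:mm / integer columns, and it swaps java.util.Date for java.time.LocalDate and LocalTime, which is a choice made here rather than something the answer specifies. The names Readings, Reading, parse, and loadSorted are hypothetical.

```java
import java.time.LocalDate;
import java.time.LocalTime;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class Readings {
    // One row of the file, with a field per column. A primitive int avoids
    // boxing the value; java.time types stand in for java.util.Date (assumption).
    public static final class Reading {
        public final LocalDate date;
        public final LocalTime time;
        public final int value;

        public Reading(LocalDate date, LocalTime time, int value) {
            this.date = date;
            this.time = time;
            this.value = value;
        }
    }

    // Parses a "yyyy-MM-dd HH:mm value" line (whitespace-separated).
    public static Reading parse(String line) {
        String[] p = line.trim().split("\\s+");
        return new Reading(LocalDate.parse(p[0]), LocalTime.parse(p[1]), Integer.parseInt(p[2]));
    }

    // Loads every row into one flat list, then sorts once at the end.
    public static List<Reading> loadSorted(List<String> lines) {
        List<Reading> readings = new ArrayList<>(lines.size());
        for (String line : lines) {
            readings.add(parse(line));
        }
        // Sorting a list of objects uses a stable merge sort (TimSort).
        readings.sort(Comparator.comparing((Reading r) -> r.date)
                                .thenComparing((Reading r) -> r.time));
        return readings;
    }
}
```

After sorting, all rows for the same date are adjacent, so "grouping by date" becomes a single pass over the list rather than a separate Map of Lists.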
There's nothing wrong with using a List as the value type of a Map. All those angle brackets look ugly, but it's perfectly fine to nest one generic type inside another.
Instead of using a String as the key, it would probably be better to use java.util.Date, because the keys are dates. With String keys, the TreeMap sorts them lexicographically rather than chronologically, which only gives the correct order when the date format happens to line up (as ISO yyyy-MM-dd strings like the ones in the question do). Date keys are sorted as "real" dates regardless of format.
Map<Date, List<String>> map = new TreeMap<Date, List<String>>();
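A sketch of how the lines could be turned into Date keys for that map, assuming the yyyy-MM-dd pattern from the question's sample and well-formed "date time value" lines. The class name DateKeyedMap and the method group are hypothetical.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DateKeyedMap {
    // Groups "date time value" lines under a java.util.Date key so the TreeMap
    // orders entries by actual date rather than by string comparison.
    public static Map<Date, List<String>> group(List<String> lines) {
        // The pattern is assumed from the sample data. SimpleDateFormat is not
        // thread-safe, so this instance stays local to the call.
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd");
        Map<Date, List<String>> map = new TreeMap<>();
        for (String line : lines) {
            // parts[0] is the date; parts[1] is the rest of the line.
            String[] parts = line.trim().split("\\s+", 2);
            Date day;
            try {
                // Parses to midnight of that day in the default time zone,
                // so lines with the same date yield equal keys.
                day = fmt.parse(parts[0]);
            } catch (ParseException e) {
                throw new IllegalArgumentException("Bad date in line: " + line, e);
            }
            map.computeIfAbsent(day, k -> new ArrayList<>()).add(parts[1]);
        }
        return map;
    }
}
```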
There's nothing wrong with using Lists. In your case, though, a List<Integer> as the Map's values might be more appropriate.
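A minimal sketch of that variant, assuming only the Value column is needed per date (the time column is dropped) and keeping the question's String date keys. The class ValuesByDate is hypothetical; as a side effect of parsing the value, the "Date Time Value" header line is skipped because "Value" is not an integer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class ValuesByDate {
    // Maps each date to the integer values recorded on it.
    // Assumes whitespace-separated "date time value" rows; the time is discarded.
    public static TreeMap<String, List<Integer>> group(Iterable<String> lines) {
        TreeMap<String, List<Integer>> byDate = new TreeMap<>();
        for (String line : lines) {
            String[] p = line.trim().split("\\s+");
            if (p.length != 3) {
                continue; // skip blank or malformed lines
            }
            int value;
            try {
                value = Integer.parseInt(p[2]);
            } catch (NumberFormatException e) {
                continue; // skips the header row (and any non-numeric value)
            }
            byDate.computeIfAbsent(p[0], k -> new ArrayList<>()).add(value);
        }
        return byDate;
    }
}
```

Storing a parsed Integer instead of the leftover String avoids keeping a separate String object per line, though each Integer is still boxed.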