Is there an algorithm to merge this kind of list?
I have a list like this:
a a . a
b . . a
a . a .
c . a .
or, in list format:
[['a', 'a', '.', 'a'], ['b', '.', '.', 'a'], ['a', '.', 'a', '.'], ['c', '.', 'a', '.']]
and I want to merge it into [['a','b','c'],['a'],['a'],['a']], or
a,b,c a a a
so that when two consecutive rows share the same letter in any of the four columns, their elements are merged without duplicates. I can do it with pairwise comparisons, but I wonder whether there is a formal algorithm for this.
You didn't specify the language, but you can create a HashMap / HashTable for each column and populate it with the column values. (Your key and value will be the same thing.) A HashMap cannot hold duplicate keys, so you will end up with a collection of unique values for each column. Then pull the values out of each HashMap into an array, or arrays. If the periods in your sample data are actually periods, you will have to skip them as you loop through the data; otherwise they will show up in the output.
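To illustrate the "duplicate keys collapse" idea on a single column, here is a minimal Python sketch. It assumes the periods are literal '.' strings that mean "empty", and the variable names are made up for the example.

```python
# First column of the sample data; '.' is assumed to mean "empty".
column = ['a', 'b', 'a', 'c']

# Dictionary keys are unique, and dicts keep insertion order (Python 3.7+),
# so building a dict from the values drops repeats but keeps first-seen order.
unique = list(dict.fromkeys(v for v in column if v != '.'))

print(unique)  # ['a', 'b', 'c']
```

A set would also remove the duplicates, but it does not guarantee the order of the values, which is why a dictionary is used here.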
Take a look at Python dictionaries.
The pseudocode for this solution (Python will look similar ;-):
- Create a list of dictionaries (list length = # of columns)
- Loop over columns
  - Loop over rows
    - Insert data into appropriate dictionary *
- Loop over list of dictionaries
  - Loop over dictionary values
    - Create new set of arrays with unique values
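The steps above could be sketched in Python roughly as follows. The function name `merge_columns` and the `placeholder` parameter are invented for this example, and it assumes every row has the same number of columns and that '.' marks an empty cell.

```python
def merge_columns(rows, placeholder='.'):
    """Collect the distinct non-placeholder values of each column."""
    if not rows:
        return []
    n_cols = len(rows[0])
    # One dictionary per column; its keys are the values seen so far.
    seen = [{} for _ in range(n_cols)]
    for col in range(n_cols):
        for row in rows:
            value = row[col]
            if value != placeholder:  # skip the periods
                seen[col][value] = value  # key and value are the same thing
    # Pull the unique values out of each dictionary, in first-seen order.
    return [list(d.values()) for d in seen]


rows = [['a', 'a', '.', 'a'],
        ['b', '.', '.', 'a'],
        ['a', '.', 'a', '.'],
        ['c', '.', 'a', '.']]
merged = merge_columns(rows)
print(merged)  # [['a', 'b', 'c'], ['a'], ['a'], ['a']]
```

Note that this merges every value in a column regardless of whether the rows are consecutive. It reproduces the expected output for the sample data, but if the "consecutive rows" condition matters, the rows would have to be grouped first.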