Trying to merge files after removing duplicate content
Here is my problem.
I have n files that all contain overlapping, common text. I want to create a new file from these n files that contains only the unique lines that exist across all of them.
I am looking for a bash command or a Python API that can do this for me. If there is an algorithm, I can also attempt to code it myself.
If the order of the lines is not important, you could do this:
sort -u file1 file2 ...
This will (a) sort all the lines in all the files, and then (b) remove adjacent duplicates, leaving exactly one copy of each distinct line across all the files.
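If you do need to preserve the order in which lines first appear, a small Python sketch can do the same deduplication without sorting. This keeps a set of lines already seen and emits only first occurrences (the function name `merge_unique` is just illustrative; memory use grows with the number of distinct lines):

```python
import sys

def merge_unique(paths, out):
    """Write each distinct line once, in order of first appearance."""
    seen = set()
    for path in paths:
        with open(path) as f:
            for line in f:
                if line not in seen:
                    seen.add(line)
                    out.write(line)

if __name__ == "__main__":
    # Usage: python merge_unique.py file1 file2 ... > merged.txt
    merge_unique(sys.argv[1:], sys.stdout)
```

Unlike `sort -u`, this keeps the original relative order of lines, at the cost of holding every distinct line in memory.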