How do I efficiently crossmatch two ASCII catalogs?

I have two ASCII text files with column-formatted data. The first column of both files is a 'name' that is consistent across both files. One file has some 6000 rows; the other has only 800. Without resorting to a nested for line in file.readlines(): approach, e.g.,

with open('big_file.txt') as catalogue:
    with open('small_file.txt') as targets:
        for tline in targets.readlines()[2:]:
            name = tline.split()[0]

            catalogue.seek(0)  # rewind before every scan of the big file
            for cline in catalogue.readlines()[8:]:
                if name == cline.split()[0]:
                    print(cline)
                    break

is there an efficient way to return only the rows (or lines) from the larger file that also appear in the smaller file (using the 'name' as the check)?

It's okay if it works one row at a time, say with a file.write(matching_line). The idea is to create a third file containing all the info from the large file for only the objects that are in the small file.

Answers


for line in file.readlines() is not inherently bad. What's bad is the nested loops you have there, which rescan the large file once for every name in the small file. You can use a set to keep track of and check all the names in the smaller file:

# Collect the name (first column) of every row in the smaller file.
s = set()
for line in targets:
    s.add(line.split()[0])

Then, just loop through the bigger file and check if the name is in s:

# Single pass over the bigger file; membership tests on a set are fast.
for line in catalogue:
    if line.split()[0] in s:
        print(line)
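
Putting the two loops together, here is a minimal sketch of the whole thing that writes the matches to a third file. The function name crossmatch, the output path, and the small_skip/big_skip header counts are my own assumptions for illustration (the skip values mirror the [2:] and [8:] slices in the question); blank lines are skipped so that line.split()[0] cannot raise an IndexError.

```python
from itertools import islice


def crossmatch(small_path, big_path, out_path, small_skip=0, big_skip=0):
    """Copy lines of big_path whose first column also appears in small_path.

    small_skip / big_skip are the numbers of header lines to ignore
    (hypothetical parameters; adjust them to your files).
    Returns the number of lines written.
    """
    # Build the set of names from the small file, ignoring blank lines.
    with open(small_path) as targets:
        names = {line.split()[0]
                 for line in islice(targets, small_skip, None)
                 if line.strip()}

    matched = 0
    # One pass over the big file, writing each matching line unchanged.
    with open(big_path) as catalogue, open(out_path, "w") as out:
        for line in islice(catalogue, big_skip, None):
            fields = line.split()
            if fields and fields[0] in names:
                out.write(line)
                matched += 1
    return matched
```

With the files from the question this would be called as crossmatch('small_file.txt', 'big_file.txt', 'matched.txt', small_skip=2, big_skip=8), where 'matched.txt' is just a placeholder name. Each file is read once, so the work grows with the sum of the two file lengths rather than their product.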
