Need to search for social security numbers in thousands of documents (.doc, .docx, .pdf) in C#

What is the best way to access the documents (opening them and reading only the text) so that searching is faster? I have already tried using the Microsoft Office Word object model: creating a Word application, opening each file, and getting the text. I can't even use threading, because either I create only one Word application, which does not help with threading, or I create a Word application in each thread, which the system can't handle. How do you suggest I proceed?

Thanks in advance

Answers


Ah... go back to reading the documentation of your operating system. For quite some time (i.e. many, many years) there has been an indexing and search system in there that a lot of things can hook into (if you install the proper filters, downloadable from Microsoft, Adobe, etc.).

This creates a full-text index, which then has an API you can search. That is a LOT more efficient for repeatedly searching a large number of documents.
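
As a rough illustration of what querying that index from C# might look like, here is a minimal sketch. It assumes the indexing system in question is Windows Search, that the documents live in a folder that is already indexed, that filters for .doc, .docx and .pdf are installed, and that System.Data.OleDb is available (built into .NET Framework; a separate package on newer .NET). The class and method names (IndexSearchSketch, FindFiles) and the example folder are made up for this example. It looks for one literal string; finding any number that merely has the shape of a social security number would still need the extracted text and a regular expression, which is not shown here.

```csharp
using System;
using System.Collections.Generic;
using System.Data.OleDb;

// Hypothetical helper, not part of any library.
static class IndexSearchSketch
{
    // OLE DB connection string for the Windows Search provider.
    const string ConnectionString =
        "Provider=Search.CollatorDSO;Extended Properties='Application=Windows';";

    // Returns the paths of indexed files under `folder` whose content
    // contains the literal `phrase`.
    public static List<string> FindFiles(string folder, string phrase)
    {
        // The query text is built by concatenation here, so quotes are
        // handled by hand: single quotes doubled, double quotes dropped.
        string safeFolder = folder.Replace("'", "''");
        string safePhrase = phrase.Replace("'", "''").Replace("\"", "");

        string sql =
            "SELECT System.ItemPathDisplay FROM SystemIndex " +
            "WHERE SCOPE = 'file:" + safeFolder + "' " +
            "AND CONTAINS('\"" + safePhrase + "\"')";

        var paths = new List<string>();
        using (var connection = new OleDbConnection(ConnectionString))
        using (var command = new OleDbCommand(sql, connection))
        {
            connection.Open();
            using (OleDbDataReader reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    // Column 0 is System.ItemPathDisplay; skip rows without a path.
                    if (!reader.IsDBNull(0))
                    {
                        paths.Add(reader.GetString(0));
                    }
                }
            }
        }
        return paths;
    }

    static void Main()
    {
        // Example folder and search string are placeholders.
        foreach (string path in FindFiles(@"C:\Documents", "123-45-6789"))
        {
            Console.WriteLine(path);
        }
    }
}
```

How a string such as 123-45-6789 is split into words depends on the index's word breaker, so it is worth checking the results against a few documents you know contain the number before relying on this.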

