How can I select a FAQ entry from a user's natural-language inquiry?
I am working on an app where the user submits a series of questions. These questions are freeform text, but they all concern a specific product, so I have a general understanding of the context. I have a FAQ listing, and I need to match each user question to a question in the FAQ.
My language is Delphi. My general approach is to throw out small "garbage words" (a, an, the, is, of, by, etc.), run a stemmer over the remaining words to get their roots, and then try to match as many of those root words as possible against the FAQ questions.
Is there a better approach? I have thought about some type of natural language processing, but I am afraid that I would be looking at years of development, rather than a week or two.
Not sure if this solution is precisely what you're looking for, but if you're looking to parse natural language, you could use the Link-Grammar Parser.
Thankfully, I've translated this for use with Delphi (complete with a demo), which you can download (free and 100% open source) from this page on my blog.
You don't need to invent a new way of doing this. It's all been done before. What you need is called a FAQ finder, introduced by Hammond et al. in 1995 ("FAQ finder: a case-based approach to knowledge navigation", 11th Conference on Artificial Intelligence for Applications).
Some of the same authors later published an evaluation of their implementation in AI Magazine: Burke et al., "Question Answering from Frequently Asked Question Files: Experiences with the FAQ FINDER System", 1997. It describes a two-stage process:
First, they use SMART, an information-retrieval system, to generate an initial set of candidate questions based on the user's input. It looks like it works similarly to what you described: it stems all the words and omits anything on a stop list of common words.
Next, the candidates are scored against the user's query according to statistical similarity, semantic similarity, and coverage. (Read the paper for details.) Scoring semantic similarity relies on WordNet, which groups English words into sets of distinct concepts. The FAQ finder reviewed here was designed to cover all Usenet FAQs; since your covered domain is smaller, it might be feasible for you to apply more domain knowledge than the basics that WordNet provides.
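To make the two stages concrete, here is a minimal sketch, not the paper's implementation. It is written in Python for brevity (the logic should port to Delphi without much trouble), and it makes several assumptions: the stop list, the crude `stem` function, the smoothed TF-IDF weighting, and the `w_stat`/`w_cov` weights are all hypothetical choices for illustration. It covers only statistical similarity (cosine over TF-IDF vectors) and coverage (the share of query terms found in the FAQ question); the WordNet-based semantic score is left out.

```python
import math
import re
from collections import Counter

# Hypothetical stop list; a real one would be longer and tuned to the product.
STOP_WORDS = {"a", "an", "the", "is", "of", "by", "to", "do", "i", "my",
              "how", "can", "what", "it", "in", "on", "for"}


def stem(word):
    """Crude suffix stripping, a stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def terms(text):
    """Lowercase, split into words, drop stop words, stem the rest."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(w) for w in words if w not in STOP_WORDS]


def tfidf(term_list, idf):
    """Map each term to (count * idf); terms unknown to the FAQ get weight 0."""
    counts = Counter(term_list)
    return {t: n * idf.get(t, 0.0) for t, n in counts.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_faq(query, faq_questions, w_stat=0.5, w_cov=0.5):
    """Return (score, question) pairs, best match first."""
    docs = [terms(q) for q in faq_questions]
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    # Smoothed IDF so a term that appears in every entry still counts a little.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

    q_terms = terms(query)
    q_set = set(q_terms)
    q_vec = tfidf(q_terms, idf)

    scored = []
    for question, d in zip(faq_questions, docs):
        stat = cosine(q_vec, tfidf(d, idf))
        # Coverage: fraction of distinct query terms present in this entry.
        coverage = len(q_set & set(d)) / len(q_set) if q_set else 0.0
        scored.append((w_stat * stat + w_cov * coverage, question))
    return sorted(scored, reverse=True)
```

Calling `rank_faq("I forgot my password, how can I reset it?", faq_list)` returns the FAQ questions ordered by combined score, so the first element is the suggested match. With a small, product-specific FAQ, a hand-built synonym table (for example mapping "forgot" and "lost" to the same term before scoring) might be a cheap substitute for the WordNet step.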