Inventors:
Chidanand Apte - Chappaqua NY
Frederick J. Damerau - North Salem NY
Sholom M. Weiss - Highland Park NJ
Brian F. White - Yorktown Heights NY
Assignee:
International Business Machines Corporation - Armonk NY
International Classification:
G06F 17/30
Abstract:
A lightweight document matcher employs minimal processing and storage. The lightweight document matcher matches new documents to those stored in a database. The matcher lists, in order, those stored documents that are most similar to the new document. The new documents are typically problem statements or queries, and the stored documents are potential solutions such as FAQs (Frequently Asked Questions). Given a set of documents, titles, and possibly keywords, an automatic back-end process constructs a global dictionary of unique keywords and local dictionaries of relevant words for each document. The application front-end uses this information to score the relevance of stored documents to new documents. The scoring algorithm uses the count of matched words as a base score, and then assigns bonuses to words that have high predictive value. It optionally assigns an extra bonus for a match of words in special sections, e.g., titles.
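The scoring scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The abstract does not specify how relevant words are selected or how the bonuses are computed, so the stopword list, the inverse document frequency bonus (1 / number of documents containing the word), the title bonus of 0.5, and all function names below are assumptions made for illustration only.

```python
import re
from collections import Counter

# Hypothetical stopword list; the abstract does not say how relevant words are chosen.
STOPWORDS = {"a", "an", "and", "for", "how", "i", "it", "my", "the", "to", "your"}

def tokenize(text):
    """Lowercase the text and return its alphanumeric tokens, minus stopwords."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]

def build_index(docs):
    """Back-end step: build the global dictionary and per-document local dictionaries."""
    local_dicts = [set(tokenize(d["title"] + " " + d["body"])) for d in docs]
    title_dicts = [set(tokenize(d["title"])) for d in docs]
    # Global dictionary: each unique word mapped to the number of documents containing it.
    doc_freq = Counter(w for local in local_dicts for w in local)
    return doc_freq, local_dicts, title_dicts

def score(query, doc_freq, local, title, title_bonus=0.5):
    """Front-end step: base match count plus bonuses (both bonus formulas are assumptions)."""
    matched = set(tokenize(query)) & local
    base = len(matched)                                   # count of matched words
    predictive = sum(1.0 / doc_freq[w] for w in matched)  # rarer words earn a larger bonus
    special = title_bonus * len(matched & title)          # extra bonus for title matches
    return base + predictive + special

def rank(query, docs):
    """Return (document index, score) pairs, most similar stored document first."""
    doc_freq, local_dicts, title_dicts = build_index(docs)
    scores = [(i, score(query, doc_freq, local_dicts[i], title_dicts[i]))
              for i in range(len(docs))]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Made-up stored documents (FAQ-style) and a made-up problem statement.
docs = [
    {"title": "Reset password", "body": "How to reset a forgotten password for your account"},
    {"title": "Printer setup", "body": "Install the printer driver and connect the printer to your account"},
]
ranked = rank("I forgot my password and need to reset it", docs)
print(ranked)  # [(0, 5.0), (1, 0.0)]
```

In this example the query matches "password" and "reset" in the first document: 2 for the base count, 2.0 for the predictive bonus (each word occurs in only one document), and 1.0 for the two title matches. In a deployed system the index would be built once by the back-end and reused across queries rather than rebuilt inside rank.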