Thesaurus Source Files - Full-Text Retrieval (FTR) - Help

A thesaurus file lists alternative forms of the words used in a query, such as synonyms, plurals, and possessives. When the search process expands a query, it adds the matching words from the thesaurus file, so relevant information is found even when it uses terms that did not appear in the original query. Using a thesaurus file improves the recall of a search and makes queries easier to write.

The thesaurus source file can contain two types of rules: synonym rules and suffix rules.

  • A synonym rule tells the search engine to look for related search terms or search terms with a similar meaning.

  • A suffix rule tells the search engine to look for words with the same stem, but a different ending.

FTR delivers six thesaurus files in the \config directory: adjadv.fth, general.fth, nouns.fth, legal.fth, plurals.fth, and verbs.fth.

Each rule has two logical parts, a Left-Hand-Side (LHS) and a Right-Hand-Side (RHS), separated by a colon and terminated by a semicolon. The LHS contains the words or suffixes to be matched when a search term is looked up in the thesaurus. The RHS contains a list of alternative words and phrases (synonyms) or suffixes (plurals and possessives). When a search term matches one of the entries in the LHS, the original term is replaced by alternatives. For a synonym rule, the alternatives are taken directly from the RHS. For a suffix rule, they are formed by combining the word stem with each of the alternative suffixes from the RHS.
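The LHS/RHS layout described above can be illustrated with a minimal Python sketch. It is not FTR code: the function name parse_rules and the sample rule text are hypothetical, and the sketch assumes that entries within each side are separated by blank space and that a rule with no colon simply has an empty RHS.

```python
def parse_rules(source):
    """Split thesaurus-style text into (lhs, rhs) pairs of entry lists.

    Assumptions (not confirmed by the FTR documentation): entries on each
    side are separated by blank space, and a missing colon means an empty RHS.
    """
    rules = []
    # A semicolon terminates each rule.
    for chunk in source.split(";"):
        text = chunk.strip()
        if not text:
            continue  # skip the empty piece after the final semicolon
        # A colon separates the LHS from the RHS.
        lhs_text, _, rhs_text = text.partition(":")
        rules.append((lhs_text.split(), rhs_text.split()))
    return rules


# Hypothetical rule text, written only to show the shape of a rule.
sample = "car auto : automobile motor-car ; stop : ;"
for lhs, rhs in parse_rules(sample):
    print(lhs, "->", rhs)
```

Here the hyphenated entry motor-car stays a single RHS entry, which matches the way a phrase is written in a synonym rule.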

Synonym rules contain a list of words in the LHS and, optionally, a list of words or phrases in the RHS. A phrase in the RHS is denoted by hyphens (or any other stopped punctuation) joining its constituent words. At search time, synonym rules take precedence over suffix rules: a match between a search term and a word in the LHS of a synonym rule prevents any suffix processing for that term, whether or not any alternatives were generated.

The RHS of synonym rules should include any plurals, possessives, and any other alternatives that should be derived from the terms in the LHS. When the same word appears in the LHS of more than one rule, a synonym lookup for that word generates a combined list of alternatives from the RHS of all the matching rules.
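The lookup behavior described above (combined alternatives from every matching rule, and synonym matches blocking suffix processing) can be sketched in Python as follows. This is an illustration under stated assumptions, not the FTR implementation: the function names, the (lhs, rhs) tuple representation, exact case-sensitive matching, and the sample rules are all hypothetical.

```python
def synonym_lookup(term, synonym_rules):
    """Return (matched, alternatives) for a term.

    synonym_rules is a list of (lhs, rhs) pairs of word lists. Alternatives
    from every rule whose LHS contains the term are concatenated in rule
    order. Exact, case-sensitive matching is an assumption of this sketch.
    """
    matched = False
    alternatives = []
    for lhs, rhs in synonym_rules:
        if term in lhs:
            matched = True
            alternatives.extend(rhs)
    return matched, alternatives


def expand(term, synonym_rules, suffix_expand):
    """Apply synonym rules first; fall back to suffix expansion otherwise.

    suffix_expand is any callable taking a term and returning a list
    (a hypothetical stand-in for suffix-rule processing).
    """
    matched, alternatives = synonym_lookup(term, synonym_rules)
    if matched:
        # A synonym match prevents suffix processing, even when the
        # matching rules generated no alternatives.
        return alternatives
    return suffix_expand(term)


# Hypothetical rules: "car" appears in the LHS of two rules.
rules = [
    (["car"], ["cars", "car's"]),
    (["car", "auto"], ["automobile"]),
    (["the"], []),
]
print(expand("car", rules, lambda t: [t + "s"]))
print(expand("the", rules, lambda t: [t + "s"]))
print(expand("dog", rules, lambda t: [t + "s"]))
```

Because the second call matches a rule with an empty RHS, it returns an empty list rather than falling through to the suffix step. How the search engine itself treats a term with no alternatives is not modeled here.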

Suffix rules are distinguished by a + (plus) as the first nonblank character. The LHS and optional RHS are lists of suffixes separated by blank space. The special symbol % (percent) can be used to represent a null suffix. Suffix lookup matches the longest possible suffix found in the LHS of any suffix rule. The % symbol represents the suffix of last resort, and should be used in the LHS of only one rule.
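The longest-suffix matching described above can be sketched in Python. This is a simplified model rather than FTR's actual algorithm: the function names, the (lhs, rhs) tuple representation, the tie-breaking choice (the first rule wins when two rules list the same suffix), and the sample rules are assumptions made for illustration.

```python
NULL_SUFFIX = "%"  # stands for the null (empty) suffix


def _suffix_text(suffix):
    """Translate the % symbol into an empty string."""
    return "" if suffix == NULL_SUFFIX else suffix


def suffix_expand(term, suffix_rules):
    """Expand a term using the longest matching LHS suffix of any rule.

    suffix_rules is a list of (lhs, rhs) pairs of suffix lists. The stem is
    the term with the matched suffix removed; each alternative is the stem
    plus one RHS suffix.
    """
    best = None  # (matched suffix length, stem, rhs suffixes)
    for lhs, rhs in suffix_rules:
        for suffix in lhs:
            ending = _suffix_text(suffix)
            # The null suffix matches every term but has length 0, so any
            # real suffix that matches is preferred over it.
            if term.endswith(ending) and (best is None or len(ending) > best[0]):
                best = (len(ending), term[: len(term) - len(ending)], rhs)
    if best is None:
        return []
    _, stem, rhs = best
    return [stem + _suffix_text(s) for s in rhs]


# Hypothetical rules, corresponding to rule text shaped like
#   + ies y : y ies y's ;
#   + % : % s 's ;
rules = [
    (["ies", "y"], ["y", "ies", "y's"]),
    (["%"], ["%", "s", "'s"]),
]
print(suffix_expand("parties", rules))
print(suffix_expand("party", rules))
print(suffix_expand("dog", rules))
```

In this sketch the % rule applies only when no other LHS suffix matches, which is why it acts as the suffix of last resort. Listing % in the RHS keeps the unchanged term among the alternatives.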