Syntax and Semantics of the Rules File - Full-Text Retrieval (FTR) - Help

Each rule in the character variant rules file must be on its own line. A rule has four fields, each with a specific starting column and maximum length, as follows:

| Field Name | What It Contains | Starting Column | Length |
|---|---|---|---|
| substitution code | a colon (:) to indicate substitution anywhere within the word, or a percent sign (%) to indicate a suffix is to be replaced | 1 | 1 |
| target string | the part of the word to be replaced | 2 | <=4 |
| replacement string | the string that replaces the target string | 6 | <=4 |
| end of rule | line feed (x0A) or End of File (EOF) | 6-10 | 1 |

The target and replacement strings must be padded with space characters when they are shorter than four characters.
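The column layout above can be illustrated with a small parser. This is only a sketch, not part of FTR: `Rule` and `parse_rule` are hypothetical names, and it assumes one character per column, both strings padded to four characters, and trailing spaces counting only as padding.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    code: str         # ':' (substitute anywhere) or '%' (replace a suffix)
    target: str       # columns 2-5, padding removed
    replacement: str  # columns 6-9, padding removed


def parse_rule(line: str) -> Rule:
    """Parse one fixed-column rule line (hypothetical helper, not an FTR API)."""
    line = line.rstrip("\n")  # the line feed marks the end of the rule
    if not line or line[0] not in ":%":
        raise ValueError(f"missing or unknown substitution code: {line!r}")
    if len(line) > 9:
        raise ValueError(f"rule is longer than 9 characters: {line!r}")
    # Assumption: trailing spaces in each field are padding only.
    target = line[1:5].rstrip(" ")
    replacement = line[5:9].rstrip(" ")
    return Rule(line[0], target, replacement)


# ':' in column 1, 'ü' padded to four characters, then 'ue' padded to four.
print(parse_rule(":ü   ue  \n"))
# Rule(code=':', target='ü', replacement='ue')
```

Under the same assumptions, the line `%` followed by four spaces and `s` parses as a suffix rule with an empty target and the replacement `s`.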

A suffix matching rule can have an empty target string. In this case every original term generates a character variant that has the replacement string appended as a suffix. Suffix rules are applied only to an ordinary word by itself or as the last component of an implied phrase. For example, given the query terms friend% and micro-computer, the suffix rules are applied to computer only.

Suffix rules are not applied to single-character words. The same rule applies to the last component of an implied phrase, where the last component must contain at least two characters to be eligible for suffix substitution.
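The eligibility conditions for suffix rules can be sketched as follows. The function `suffix_variants` is hypothetical and makes several assumptions that the text does not confirm: a term containing the `%` character is treated as not being an ordinary word, a hyphen separates the components of an implied phrase, and a variant of a phrase is rejoined with the hyphen purely for display.

```python
def suffix_variants(term, rules):
    """Return suffix variants of a query term.

    rules is a list of (target, replacement) pairs taken from '%' rules.
    Hypothetical helper for illustration; not an FTR API.
    """
    if "%" in term:
        # Assumption: such a term is not an ordinary word, so no suffix rules apply.
        return []
    # Assumption: '-' separates components; only the last one is eligible.
    head, sep, last = term.rpartition("-")
    if len(last) < 2:
        # Single-character words and last components are skipped.
        return []
    variants = []
    for target, replacement in rules:
        # An empty target matches every word, so the replacement is appended.
        if last.endswith(target):
            stem = last[: len(last) - len(target)]
            variants.append(head + sep + stem + replacement)
    return variants


rules = [("", "s"), ("er", "re")]
print(suffix_variants("micro-computer", rules))
# ['micro-computers', 'micro-computre']
print(suffix_variants("friend%", rules))
# []
```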

The total number of character variants generated from a single query term can become very large when several substitution rules apply. Because FTR must look up each generated variant form in the index, a large number of variants (more than a few hundred) can cause an unacceptably slow response, even if only a few variants actually occur in the table.
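To see how quickly the number of variants can grow, consider a single "anywhere" rule. The sketch below assumes that each occurrence of the target can independently be replaced or left unchanged; the text does not say whether FTR enumerates variants this way, and `anywhere_variants` is a hypothetical name.

```python
from itertools import product


def anywhere_variants(word, target, replacement):
    """Enumerate variants of word for one ':' rule (illustrative sketch only).

    Assumes a non-empty target and that every occurrence may be replaced
    or kept independently of the others.
    """
    parts = word.split(target)  # text between occurrences of the target
    occurrences = len(parts) - 1
    variants = set()
    for choice in product((target, replacement), repeat=occurrences):
        pieces = [parts[0]]
        for filler, part in zip(choice, parts[1:]):
            pieces.append(filler)
            pieces.append(part)
        variants.add("".join(pieces))
    variants.discard(word)  # the unchanged word is not a variant
    return variants


print(sorted(anywhere_variants("rührstück", "ü", "ue")))
# ['ruehrstueck', 'ruehrstück', 'rührstueck']
```

Under this assumption, a word with n occurrences of the target yields 2^n - 1 variants from one rule alone, so eight occurrences already produce 255 index lookups, and additional rules multiply that figure.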

Character variant generation is applied to stop words. To avoid searches on stop words, all spelling variants of each word in the stop file must be explicitly included in the stop file. For example, in a stop file for the German language, include both für and fuer.