X-Hacker.org - FAST TEXT SEARCH for Clipper v.2.0 - Optimization - Making it Better

   Optimization  - Making it Better

   The length and uniqueness of the strings that are passed to CftsSet()
   are extremely important to the performance of CFTS; the more uncommon 
   the string, the better the performance. The .IA index tracks text 
   signatures. The more unique signatures a particular string has, the 
   easier it is for CFTS to identify it. For example, a string like 
   'tested' is made up of very common and frequently occurring characters 
   and character groups. In most cases, it is more difficult (slower) for 
   CFTS to search for 'tested' than it is for it to find a string like 
   'zxcvbnm' because 'tested' will produce more aliases than will 
   'zxcvbnm'. Users of CFTS systems should always be encouraged to supply 
   as much text as they can in the search string. Even additional partial 
   words can sometimes make a big difference in search speed.
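
   The effect of a longer search string can be sketched with a toy 
   signature filter. This is only an illustration in Python, not the 
   CFTS algorithm: the 3-character groups, the CRC-32 hash, the 64-bit 
   key width and the names signature() and candidates() are all 
   assumptions made for the example. In this model, every extra word in 
   the search string can only add bits to the query signature, so the 
   list of records left to verify can only stay the same or shrink.

```python
import zlib

SIG_BITS = 64  # hypothetical signature width, not a CFTS constant


def signature(text):
    """OR together one bit per 3-character group of each word (toy model)."""
    sig = 0
    for word in text.lower().split():
        for i in range(len(word) - 2):
            group = word[i:i + 3]
            sig |= 1 << (zlib.crc32(group.encode()) % SIG_BITS)
    return sig


def candidates(query, records):
    """Records whose signature contains every bit of the query signature.

    These still have to be verified against the real text, because
    different character groups can share a bit.
    """
    q = signature(query)
    return [r for r in records if signature(r) & q == q]


records = [
    "the unit was tested and the results were entered",
    "testers noted the error rate in the latest tests",
    "keyboard row zxcvbnm was retested after the repair",
    "the latest settlement was entered on the ledger",
]

short_query = "tested"
long_query = "tested zxcvbnm"
print(len(candidates(short_query, records)), "candidates for", short_query)
print(len(candidates(long_query, records)), "candidates for", long_query)
```

   The candidate list for the longer string is always a subset of the 
   list for the shorter one in this sketch, which is the reason for 
   asking users to type as much as they can.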

   An exception to the more-is-better rule for CFTS search strings is the 
   case of repeated strings. Once a string has been read and its text 
   signatures calculated, adding that same string again has no effect on 
   an .IA index search. This is true both when adding records to the 
   index and when passing search strings to be searched for. For example, 
   if a text record contains the same word more than once, the signatures 
   for that word are registered in the index key only once. Likewise, if 
   a search string contains a repeated string, CftsNext() returns the 
   record numbers of all index records containing signatures for that 
   string, just as it would if the string appeared once. In other words, 
   no performance improvement is realized by specifying a search string 
   more than once.
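
   The reason repetition changes nothing can be shown with a small 
   Python sketch. It assumes, purely for illustration, that a key is 
   built by OR-ing one bit per 3-character group of each word; the 
   signature() function, the CRC-32 hash and the 64-bit width are 
   hypothetical and are not taken from CFTS. Setting a bit that is 
   already set leaves the key unchanged, so a repeated word adds nothing.

```python
import zlib


def signature(text, bits=64):
    """Toy index key: one bit per 3-character group of each word."""
    sig = 0
    for word in text.lower().split():
        for i in range(len(word) - 2):
            # OR is idempotent: a bit set twice is the same as set once.
            sig |= 1 << (zlib.crc32(word[i:i + 3].encode()) % bits)
    return sig


once = signature("overdue invoice")
twice = signature("overdue invoice invoice overdue")
print(once == twice)  # prints True
```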

   Another factor influencing the overall performance of a CFTS 
   system is the nature of the text contained in the text records. A 
   data set made up of a very small vocabulary will offer CFTS a small 
   set of signatures by which to distinguish one record from another. 
   This will result in more time spent verifying aliases. Conversely, 
   a data set offering a large variety of signatures will allow CFTS 
   to perform closer to its theoretical best.
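
   A small Python experiment makes the vocabulary effect concrete. It 
   is a sketch under stated assumptions: the signatures are simply the 
   exact set of 3-character groups in each word (no hashing, so the 
   result is deterministic), and an alias is counted as any record that 
   passes the signature test but does not contain the search string. 
   The function names and sample records are invented for the example.

```python
def signatures(text):
    """Set of 3-character groups taken from each word (hypothetical model)."""
    sigs = set()
    for word in text.lower().split():
        for i in range(len(word) - 2):
            sigs.add(word[i:i + 3])
    return sigs


def count_aliases(query, records):
    """Records that pass the signature test but fail text verification."""
    q = signatures(query)
    hits = [r for r in records if q <= signatures(r)]
    return sum(1 for r in hits if query.lower() not in r.lower())


# Every record draws on the same three words.
small_vocab = [
    "red car old car",
    "car red old red",
    "old red car red",
    "old car red old",
]

# Each record brings its own words, and so its own signatures.
large_vocab = [
    "red car parked outside",
    "blue boat docked nearby",
    "green bicycle against fence",
    "yellow tram leaving station",
]

print(count_aliases("red car", small_vocab))  # prints 2
print(count_aliases("red car", large_vocab))  # prints 0
```

   With the small vocabulary every record carries the same signatures, 
   so the filter cannot tell them apart and half of them have to be 
   rejected by verification. With the varied vocabulary the filter 
   alone isolates the one matching record.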

   This version of CFTS was designed to provide rapid text searches of 
   dynamic data under a wide range of situations. It was specifically 
   built to allow rapid additions to and updates of the index file and 
   to require only small index files. These facts make it less well suited to 
   searching very large amounts of static data. An example of this type 
   of data would be in a CD-ROM application. Here the data will never be 
   updated, disk space (index file size) is not of great concern and slow 
   disk access speed makes the verification processing time even more 
   critical. Developers who have used CFTS in CD-ROM applications have 
   had to make adjustments, such as adding dedicated memory to hold the 
   entire .IA index file. We think CFTS is a wonderful system, but it does 
   have its limitations. A version is planned that will be tuned toward 
   large fixed data sets. Index creation will take longer and the index 
   files will be larger, but the alias rate will drop considerably. If 
   you have these requirements, please contact us. We will be happy to 
   help optimize performance in such systems.
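
   The general trade-off between index size and alias rate can be 
   sketched in Python. This does not describe the design of the planned 
   version; it is a toy model that assumes a key is built by hashing 
   3-character groups into a fixed number of bits. The signature() and 
   candidate_count() functions, the CRC-32 hash and the widths of 16, 
   64 and 256 bits are all hypothetical. A wider key costs more space 
   per record, but fewer character groups share a bit, so fewer records 
   pass the filter by accident.

```python
import zlib


def signature(text, bits):
    """Toy index key of the given width: one bit per 3-character group."""
    sig = 0
    for word in text.lower().split():
        for i in range(len(word) - 2):
            sig |= 1 << (zlib.crc32(word[i:i + 3].encode()) % bits)
    return sig


def candidate_count(query, records, bits):
    """Number of records that pass the signature test at this key width."""
    q = signature(query, bits)
    return sum(1 for r in records if signature(r, bits) & q == q)


records = [
    "the unit was tested and the results were entered",
    "testers noted the error rate in the latest tests",
    "keyboard row zxcvbnm was retested after the repair",
    "the latest settlement was entered on the ledger",
    "annual statement of interest and estimated taxes",
]

# Each width is a multiple of the previous one, so in this model the
# candidate count can never rise as the key gets wider.
for bits in (16, 64, 256):
    count = candidate_count("tested", records, bits)
    print(bits, "bits per key:", count, "candidates")
```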

   As you can see from the above discussion, CFTS is a versatile and 
   flexible system. Tuning a CFTS application can be as much art as 
   science. You are encouraged to test and experiment in the early stages 
   of application development.
Online resources provided by: http://www.X-Hacker.org --- NG 2 HTML conversion by Dave Pearson