Difference between revisions of "Retain Indexing"
Revision as of 19:14, 6 March 2014
You can switch between the two indexing engines, but doing so requires the index to be re-created. Re-creating the indexes is a time-consuming process and should not be done unless required. Searches of the Retain archive run during index re-creation or migration may not return all results.
Start by reading the Indexing Engine section in the RetainServer UI itself.
Level 1
Indexing Engine
This is where you set the indexing engine Retain will use to index message metadata, message content, and message attachments. Lucene is the most widely used by far, so the screen defaults to Lucene.
Lucene
Lucene is hosted locally on the same machine as the Retain Server and requires no further configuration. It does not have the same options or the same extent of capabilities as the Exalead engine, but it is sufficient for most customers' needs, and we recommend using this indexer. It is quickly catching up to Exalead in capabilities.
Exalead
Because Exalead is a much more robust indexing engine, it requires its own server and resources. As such, when Exalead is selected as the indexing engine, a connection address and starting base port are required. The default BASEPORT is 10000. To verify that the connection to the Exalead server is working, select the ‘Test Connection’ button, which triggers Retain to contact the Exalead server. The result should appear shortly as a small notification window in your browser.
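If the ‘Test Connection’ button reports a failure, it can help to confirm basic network reachability from the Retain Server to the Exalead host before looking further. The following is a minimal sketch of a plain TCP check in Python. It is not how Retain performs its own test, and the helper name can_connect and the host name in the usage comment are hypothetical; only the default base port of 10000 comes from the text above.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    This only checks that something is listening on the port; it does not
    verify that the listener is actually an Exalead service.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Usage (hypothetical host name; 10000 is the default BASEPORT):
#   can_connect("exalead.example.com", 10000)
```

A False result points to a network, firewall, or service problem rather than a Retain configuration problem.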
Indexing
You can control what Retain indexes here. You may add as many items as you wish to the list of attachment types to index. Note the explanation at the top of the table. The items are listed (in order) by type, extension, archived form (extractor used), and maximum stream size and file size.
You choose whether to index the attachment based on its filename extension or its MIME type (the content itself). You also choose which extractor to use to index the attachment. Under the Lucene indexing engine, Retain supports HTML, RTF, TEXT, XML, OpenXML (MS Office 2007 .docx), OpenOffice2, WordPerfect documents, Excel files, .DOC, and .PDF. Under Exalead, Retain supports many more.
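To make the structure of the table concrete, each row can be thought of as a rule that matches an attachment either by extension or by MIME type and names an extractor plus two size limits. The sketch below models that idea in Python. It is an illustration only, not Retain's internal data model: the class AttachmentRule, the function find_rule, the field names, the extractor labels, and the use of bytes as the size unit are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class AttachmentRule:
    match_type: str       # "extension" or "mime" (assumed labels)
    pattern: str          # e.g. "html" or "application/pdf"
    extractor: str        # label of the extractor to use (assumed)
    max_stream_size: int  # assumed to be in bytes
    max_file_size: int    # assumed to be in bytes


def find_rule(rules: Sequence[AttachmentRule], filename: str,
              mime_type: str) -> Optional[AttachmentRule]:
    """Return the first rule matching the attachment, or None if none match."""
    # Take the text after the last dot as the extension, case-insensitively.
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    for rule in rules:
        if rule.match_type == "extension" and rule.pattern.lower() == ext:
            return rule
        if rule.match_type == "mime" and rule.pattern.lower() == mime_type.lower():
            return rule
    return None
```

An attachment that matches no rule returns None, which corresponds to an attachment type that is not in the list and so is not indexed.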
Because of high CPU, memory, and performance requirements, MS Word and Adobe PDF are not indexed by default and must be enabled to be indexed. If you need to index these items, the allotted memory should be increased. Indexing these items will slow down the indexing process. Select as many as you need. If an attachment type that is common in the system needs to be indexed but does not already exist in the list, it can be added using the ‘add’ row. We recommend enabling these document types if the customer wishes to run keyword searches and have those attachment types included in the search results.
Also by default, Retain indexes only the first 2MB of each attachment. This saves disk space and minimizes the impact on performance. However, if the intent is for users to find documents through searches, we recommend setting both the “stream size” and “file size” fields for every document type to “-1”, which instructs the Indexer to index the entire content.
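The effect of the size limit can be sketched as a small calculation. The Python below is an illustrative model of the behavior described above, not Retain's code: the function name bytes_to_index is hypothetical, and it assumes that "2MB" means 2 × 1024 × 1024 bytes and that the limit applies as a simple cap on how much of the attachment is indexed.

```python
# Assumption: the default "2MB" limit is 2 MiB expressed in bytes.
DEFAULT_LIMIT_BYTES = 2 * 1024 * 1024


def bytes_to_index(attachment_size: int, limit: int = DEFAULT_LIMIT_BYTES) -> int:
    """Return how many bytes of an attachment would be indexed.

    A limit of -1 means "no limit": the entire content is indexed.
    """
    if attachment_size < 0:
        raise ValueError("attachment_size must be non-negative")
    if limit == -1:
        return attachment_size
    if limit < 0:
        raise ValueError("limit must be -1 or non-negative")
    # Otherwise only the first `limit` bytes are indexed.
    return min(attachment_size, limit)
```

Under these assumptions, a 5 MB attachment contributes only its first 2 MB to the index with the default setting, so a keyword that appears later in the document would not be found unless the limit is set to -1.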
“Force Indexing” tells the server to index items that are not currently indexed. It queries the system for the top 500,000 items that are not yet indexed and starts the indexer if it is not already running.