Retain Indexing

From GWAVA Technologies Training
Revision as of 19:52, 6 March 2014 by Admin (Talk | contribs)

Level 1

Whereas the database is used when browsing messages in the Retain mailbox, indexing is used for performing searches on data stored in Retain. If message metadata, message content, and message attachments have been properly indexed, Retain will be able to find them when the Search feature is used. Content that is not indexed will not appear in search results, so it is very important to make sure that all messages are indexed.

Indexing Engine

This is where you set the indexing engine Retain will use to index message metadata, message content, and message attachments. Lucene is the most widely used by far, so the screen defaults to Lucene.

Lucene

Lucene is hosted locally on the same machine as the Retain Server and requires no further configuration. It does not have the same options or the same extent of capabilities as the Exalead engine, although it is quickly catching up. Lucene is sufficient for most customers' needs, and we recommend using this indexer.

Exalead

Because Exalead is a much more robust indexing engine, it requires its own server and resources. As such, when Exalead is selected as the indexing engine, a connection address and starting base port are required. The default BASEPORT is 10000. To verify that the connection to the Exalead server is working, click the ‘Test Connection’ button, which triggers Retain to contact the Exalead server. The result should appear shortly as a small notification window in your browser.
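As a rough illustration of what a connection test involves, the sketch below checks from Python whether a TCP port on a server is reachable. It is not Retain's implementation: the function name `can_connect` and the host name `exalead.example.com` are hypothetical, and only the default base port of 10000 comes from the text above.

```python
import socket


def can_connect(host: str, port: int = 10000, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection resolves the host and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    # Hypothetical Exalead host; replace with the real connection address.
    reachable = can_connect("exalead.example.com", 10000)
    print("reachable" if reachable else "not reachable")
```

A check like this only confirms that something is listening on the port. It does not confirm that the listener is an Exalead server, so the ‘Test Connection’ button remains the authoritative test.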

Indexing

You can control what Retain indexes here. You may add as many items as you wish to the list of attachment types to index. Note the explanation at the top of the table. The items are listed (in order) by type, extension, archived form (extractor used), and maximum stream size and file size.

You choose whether to match an attachment for indexing by its filename extension or by its MIME type (the content itself). You also choose which extractor to use to index the attachment. Under the Lucene indexing engine, Retain supports HTML, RTF, TEXT, XML, OpenXML (MS Office 2007 .docx), OpenOffice2, WordPerfect documents, Excel files, .DOC, and .PDF. Under Exalead, Retain supports many more.

Because of their high CPU and memory requirements and their impact on performance, MS Word and Adobe PDF attachments are not indexed by default and must be enabled explicitly. If you need to index these items, increase the allotted memory, and expect the indexing process to slow down. Select as many types as you need. If an attachment type that is common in the system needs to be indexed but is not already in the list, it can be added by using the ‘add’ row. We recommend marking these document types if the customer wishes to run keyword searches and have those attachment types included in the search results.
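The attachment-type table described above can be pictured as an ordered list of rules, each matching on either an extension or a MIME type and naming an extractor. The sketch below is only a mental model, not Retain's code: the `Rule` class, the `pick_extractor` function, and the sample rows are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Rule:
    match_on: str         # "extension" or "mime"
    value: str            # e.g. "html" or "text/plain"
    extractor: str        # name of the extractor to use
    enabled: bool = True  # disabled rows are skipped


# Hypothetical sample rows; PDF is disabled to mirror the default described above.
RULES = (
    Rule("extension", "html", "HTML"),
    Rule("mime", "text/plain", "TEXT"),
    Rule("extension", "pdf", "PDF", enabled=False),
)


def pick_extractor(filename: str, mime_type: str,
                   rules: Sequence[Rule] = RULES) -> Optional[str]:
    """Return the extractor of the first enabled rule that matches, else None."""
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    mime = mime_type.lower()
    for rule in rules:
        if not rule.enabled:
            continue
        if rule.match_on == "extension" and rule.value == ext:
            return rule.extractor
        if rule.match_on == "mime" and rule.value == mime:
            return rule.extractor
    return None
```

In this model, an attachment whose type has no enabled row returns `None`, which corresponds to the attachment not being indexed.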

Also by default, Retain indexes only the first 2 MB of each attachment. This saves disk space and minimizes the impact on performance; however, if the intent is for users to find documents through searches, we recommend setting both the “stream size” and “file size” fields for every document type to “-1”, which instructs the Indexer to index the entire content.

As of 6 March 2014, we are still waiting on answers from development about what constitutes the "stream" and what the File Size limit means. See bug 4593.

The Stream Size is the upper limit on how much text should be stored. The File Size limit indicates the size of an attachment to be stored in the index. Both values are specified in bytes. Lowering them may speed up searching, because smaller limits keep large items from bogging down the system. Raising them allows larger attachments and files to be indexed, but very large files could slow down searching. You may enter -1 to remove the limit.
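Because the limits are entered in bytes, it can help to work out the numbers before typing them in. The sketch below converts megabytes to bytes and models a limit as a simple cap, with -1 meaning no cap. It is an assumption-laden illustration: it treats 1 MB as 1024 × 1024 bytes and assumes a limit truncates rather than skips an item, neither of which is confirmed here, and the function names are hypothetical.

```python
NO_LIMIT = -1  # the value the text above gives for "no limit"


def mb_to_bytes(megabytes: float) -> int:
    """Convert megabytes to bytes, assuming 1 MB = 1024 * 1024 bytes."""
    return int(megabytes * 1024 * 1024)


def capped_size(content_bytes: int, limit: int) -> int:
    """Return how many bytes fall under the limit; -1 means no limit.

    Assumption: a limit truncates the content rather than excluding it.
    """
    if limit == NO_LIMIT:
        return content_bytes
    return min(content_bytes, limit)


default_limit = mb_to_bytes(2)           # the 2 MB default, in bytes
five_mb_attachment = mb_to_bytes(5)

print(default_limit)                                  # 2097152
print(capped_size(five_mb_attachment, default_limit))  # 2097152
print(capped_size(five_mb_attachment, NO_LIMIT))       # 5242880
```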

Figure 1: Index Config.PNG


“Force Indexing” tells the server to index items that are not currently indexed. It queries the system for the top 500,000 items that are not currently indexed and starts the indexer if it is not already running.
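The batching behavior can be sketched as follows. This is a conceptual model only: how Retain orders items to decide which are the "top" ones is not stated here, so the sketch simply takes them in the order given, and the names `next_batch` and `BATCH_LIMIT` are hypothetical.

```python
from itertools import islice
from typing import Iterable, List, Tuple

BATCH_LIMIT = 500_000  # batch size taken from the description above


def next_batch(items: Iterable[Tuple[int, bool]],
               limit: int = BATCH_LIMIT) -> List[int]:
    """Return up to `limit` IDs of items whose indexed flag is False.

    `items` yields (item_id, is_indexed) pairs; input order is preserved.
    """
    unindexed = (item_id for item_id, is_indexed in items if not is_indexed)
    return list(islice(unindexed, limit))


# Small demonstration with a limit of 2 instead of 500,000.
sample = [(1, True), (2, False), (3, False), (4, False)]
print(next_batch(sample, limit=2))  # [2, 3]
```

Under this model, a system with more unindexed items than the batch size needs more than one pass to catch up.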

Level 2

Go to Server Configuration
