Revision as of 17:49, 31 August 2015

Whereas the database is used when browsing messages in the Retain mailbox, the indexes are used for performing searches on data stored in Retain. If message metadata, message content, and message attachments have been properly indexed, Retain will be able to find them when the Search feature is used. Content that does not get indexed will not be included in the search results, so it is very important to make sure that all messages are indexed.

Advanced Information

What Happens When a Search Is Performed

When performing a search, Retain takes the search criteria you entered and converts them into a query. This query is sent to the Indexer, which returns the message IDs of the items it finds. Those message IDs are then looked up in the Retain database for specific information about the items themselves.
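The two-step flow described above (the index returns message IDs, then the database supplies the details) can be illustrated with a minimal sketch. Everything in it is hypothetical: the dictionaries, field names, and functions are stand-ins for illustration and do not reflect Retain's real index or database schema.

```python
# Illustrative two-step search: the index maps terms to message IDs,
# and the database maps message IDs to message details.
# Both structures are hypothetical stand-ins, not Retain's real schema.

def build_index(messages):
    """Map each lowercase word to the set of message IDs containing it."""
    index = {}
    for msg_id, msg in messages.items():
        for word in (msg["subject"] + " " + msg["body"]).lower().split():
            index.setdefault(word, set()).add(msg_id)
    return index

def search(index, database, terms):
    """Step 1: ask the index for IDs matching all terms.
    Step 2: look up each returned ID in the database."""
    id_sets = [index.get(term.lower(), set()) for term in terms]
    matching_ids = set.intersection(*id_sets) if id_sets else set()
    return [database[msg_id] for msg_id in sorted(matching_ids)]

database = {
    1: {"subject": "Quarterly report", "body": "Budget numbers attached"},
    2: {"subject": "Lunch", "body": "Budget sandwich options"},
    3: {"subject": "Unindexed item", "body": "Budget forecast"},
}
# Message 3 is deliberately left out of the index to show that
# an unindexed item cannot appear in search results.
index = build_index({k: v for k, v in database.items() if k != 3})

results = search(index, database, ["budget"])
print([m["subject"] for m in results])
```

Message 3 exists in the database and contains the search term, but because it was never indexed, the search cannot return it. This is the situation the rest of this article helps to diagnose.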

Determining Indexer Status

There are several ways to tell if the indexer is running:

1. The first and easiest way is to perform a search. If it returns any results, then the indexer is running. An empty result, however, does not necessarily mean that the Indexer is stopped or not operational. If the search results are empty, go to step 2.

2. There is a Retain Java utility, indexerStatus.jsp, that checks whether the indexer is running. After logging into the RetainServer as admin, append "/Util/indexerStatus.jsp" to the RetainServer URL (e.g., http://10.1.9.26/RetainServer/Util/indexerStatus.jsp). The resulting page reports the status of the indexer.

The status of the Indexer is shown where it reads, "Indexer is alive:"

True = The indexer is running.
False = The indexer is not running.

It will also display any items that are in the queue waiting to be indexed or that have not been indexed. It will only display 10,000 items and will have a plus ("+") sign next to it if there are more. You can also turn the Indexer on or off from this page.

3. Indexer logs. The indexer has its own logs in the /opt/beginfinite/retain/tomcat7/logs directory (Apache Software Foundation/tomcat7/logs in Windows). They are named Indexer.date.log. Looking in the log can help to determine whether there are errors or whether the indexer is turned off. Below is the initialization and startup of the indexer.

10:25:33,611 LuceneIndexingManager - Indexing manager initialization...
10:25:33,952 LuceneIndexingStats - Stats updater launched
10:25:34,690 LuceneIndexingAddition - Create IndexWriter for Lucene: version=LUCENE_35,path=/retaindata/index/, createMode=false,supportPrefixWildcards=true
10:25:35,728 ServerIndexingBroker - Determining backgroundIndexer for engine:lucene
10:25:38,900 LuceneIndexingManager - Created background indexer...
10:25:38,904 IndexingThread - Start index master
10:25:38,908 LuceneIndexingManager - Indexing manager successfully initialized
10:25:39,516 IndexAdminMessageConsumerImpl - Trying to process the operation: INDEX_LAUNCH_STARTUP
10:25:39,517 IndexAdminMessageConsumerImpl - Will initialize lucene
10:25:39,517 IndexAdminMessageConsumerImpl - INIT: Indexing Manager being launched
10:25:39,517 LuceneIndexingManager - Indexing manager initialization...
10:25:39,561 LuceneIndexingManager - Created background indexer...
10:25:39,561 LuceneIndexingManager - Indexing manager successfully initialized

Here is a snippet from the log on what it looks like when messages are indexed:

10:45:30,966 IndexAdminConfigMessageConsumerImpl - Trying to process the operation: INDEX_INCREMENT_STATS
10:45:32,803 AbstractBackgroundIndexer - processIndexingOfList...
10:45:32,809 LuceneDocumentUtil - NEW LuceneDocumentUtil enabled
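The status-page check from step 2 can be scripted once the text of the page has been saved. The sketch below is only an illustration: it assumes the page contains the literal label "Indexer is alive:" followed by true or false, possibly with HTML tags in between, which is a guess about the markup based on the description above. The function name is hypothetical, and the page itself still has to be retrieved from a session logged in as admin.

```python
import re

def parse_indexer_alive(page_text):
    """Return True or False from the 'Indexer is alive:' label,
    or None if the label is not found in the text."""
    # Assumption: any HTML tags around the label can be replaced by spaces.
    plain = re.sub(r"<[^>]+>", " ", page_text)
    match = re.search(r"Indexer is alive:\s*(true|false)", plain, re.IGNORECASE)
    if match is None:
        return None
    return match.group(1).lower() == "true"

# Hypothetical page fragment, used only to exercise the function.
print(parse_indexer_alive("<b>Indexer is alive:</b> true"))
```

A result of None means the label was not found, which is worth treating differently from False: it may simply indicate that the saved page was a login screen rather than the status page.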

What happens if the Indexer is not working or is turned off?

What if the search produces no results after you have adjusted the date view and reset the search?

The first thing to do is check whether the indexer is turned off. If the status page says "...alive: false", try turning the indexer on by clicking the "Try turning indexer on" button. Take note of the number of unindexed items. You can also view the unindexed items on the Server status page in Retain.

Another method is to restart Tomcat. This will shut down any indexer threads and restart them, bringing the indexer back to life.

The indexer logs can also help to identify that the indexer is not working. Lines like the following show the indexer shutting down:

10:46:34,125 IndexingThread - End index master
10:46:34,626 NRTSingleton - NRTSingleton: Closing NRTManager
10:46:34,630 NRTSingleton - NRTSingleton: Closing NRTManagerReopenThread
10:46:34,633 LuceneIndexingStats - Stats updater going away
10:46:34,633 LuceneIndexingManager - IndexingManager has shut down all resources
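The startup and shutdown messages shown in this article can be used to scan an indexer log for its most recent state. The following is a minimal sketch under the assumption that those two messages reliably mark startup and shutdown; the function and constant names are hypothetical, and the marker strings are copied from the log excerpts in this article.

```python
# Marker strings copied from the log excerpts in this article.
STARTED = "Indexing manager successfully initialized"
STOPPED = "IndexingManager has shut down all resources"

def last_indexer_event(log_lines):
    """Return 'started', 'stopped', or None, based on the last marker seen."""
    state = None
    for line in log_lines:
        if STARTED in line:
            state = "started"
        elif STOPPED in line:
            state = "stopped"
    return state

sample = [
    "10:25:38,908 LuceneIndexingManager - Indexing manager successfully initialized",
    "10:45:32,803 AbstractBackgroundIndexer - processIndexingOfList...",
    "10:46:34,633 LuceneIndexingManager - IndexingManager has shut down all resources",
]
print(last_indexer_event(sample))
```

Because the function accepts any iterable of lines, an open log file from the tomcat7/logs directory can be passed in directly. A result of "stopped" suggests trying the "Try turning indexer on" button or restarting Tomcat, as described above.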

Identifying Items Not Indexed

Refer to KB article, How to View the Number of Items Not Indexed, for instructions on identifying items that have not been indexed.

Fixing Missing or Corrupt Index Files

Refer to KB article, Fixing Missing Index Files / Indexer Fails to Load, for instructions on checking for missing index files and/or index file corruption.

Rebuilding Indexes

Refer to KB article, How to Rebuild Indexes, for the symptoms that indicate indexes may need to be rebuilt and for instructions on rebuilding them.

Explanation of f_indexed Field Values in the t_message table

Retain 3.x:

1: Fully indexed
-1: Indexing error. No part of this message was indexed.
-32: Larger than indexing size limit.
-64: Retain doesn't recognize this file type. Therefore, it doesn't know how to extract text from it to index (i.e., the attachment/item was not indexed).
-128: INDEXING_HIBERNATE_EXCEPTION. It's unlikely that you'll ever see this code. From Development: I've looked through the code and can't find any place where a hibernate exception is thrown; however, tracking down where exceptions can come from after the fact can be difficult, so it is possible that I just missed it. If a hibernate exception is thrown, this error code means that a hibernate exception occurred during the indexing of this message and prevented the message from being indexed.

Retain 4.x:

2: Fully indexed
10: Indexing Error (same as -1 in Retain 3.x)
16: General indexing error (This should never be written to the database.)
32: Larger than indexing size limit. (Same as -32 on 3.x)
64: INDEXING_NOT_ON_WHITELIST (Same as -64 on 3.x)
128: INDEXING_HIBERNATE_EXCEPTION (Same as -128 on 3.x, this should never be written to the database.)
256: INDEXING_EXTRACTOR_EXCEPTION. As per Dev: Any text extraction we perform on any file type could potentially throw an exception and this is the f_indexed value that we set to indicate this. (This should never be written to the database.)
512: INDEXER_EXCEPTION. (As of 2015-02-10 not actually being used in the beta code.)
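When reviewing f_indexed values pulled from the t_message table, a small lookup table can make the numbers easier to read. The sketch below copies the values listed above with shortened descriptions. The function names are hypothetical, and it assumes each value is a single code rather than a combination of flags, which this article does not confirm.

```python
from collections import Counter

# Values copied from the lists above; descriptions are shortened.
F_INDEXED = {
    3: {
        1: "Fully indexed",
        -1: "Indexing error",
        -32: "Larger than indexing size limit",
        -64: "Unrecognized file type",
        -128: "INDEXING_HIBERNATE_EXCEPTION",
    },
    4: {
        2: "Fully indexed",
        10: "Indexing error",
        16: "General indexing error",
        32: "Larger than indexing size limit",
        64: "INDEXING_NOT_ON_WHITELIST",
        128: "INDEXING_HIBERNATE_EXCEPTION",
        256: "INDEXING_EXTRACTOR_EXCEPTION",
        512: "INDEXER_EXCEPTION",
    },
}

def describe_f_indexed(value, major_version):
    """Translate an f_indexed value for Retain 3.x or 4.x into text."""
    if major_version not in F_INDEXED:
        raise ValueError("major_version must be 3 or 4")
    return F_INDEXED[major_version].get(value, "Not listed in this article")

def summarize(values, major_version):
    """Count how many messages fall under each description."""
    return Counter(describe_f_indexed(v, major_version) for v in values)

# Hypothetical sample of f_indexed values from a Retain 4.x system.
for description, count in summarize([2, 2, 32, 64, 0], 4).items():
    print(count, description)
```

Any value that does not appear in the lists above is reported as "Not listed in this article" rather than guessed at.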

Tweaking Indexing speed and other tools

A number of items can be tweaked in the file /opt/beginfinite/retain/RetainServer/WEB-INF/classes/config/lucene.indexing.properties.

To access the HPI web interface, go to http://[dns hostname / IP]/hpi

Pre-Release Information

REMEMBER: This information is still in development. Any or all of this information is subject to change.

In other words, this information stays internal to GWAVA.

Known Changes for 4.0 release

Lucene and Exalead will be dropped in favor of SOLR, named High Performance Indexer (HPI) in the Retain UI.

How SOLR Determines the Relevancy Score

From product management:

I did some research too and this subject is not an easy one. Everyone has a preference. Dania Bilal did a research article on "Ranking, relevance judgment, and precision of information retrieval on children's queries: Evaluation of Google, Yahoo!, Bing, Yahoo! Kids, and ask Kids" which was published and available for hire or purchase here: http://onlinelibrary.wiley.com/doi/10.1002/asi.22675/abstract

The winner was Google by the way... 11% overlap vs 3% for Yahoo kids...

I think we release what we have and get feedback from our customers.

Probable Changes to the Search UI

Sneak peek at the new search UI:

http://retainbeta.gwava.com/RetainServer

admin / myretain4beta

Very technical data on SOLR relevancy: https://lucene.apache.org/core/4_0_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html

