Difference between revisions of "Retain Indexing"
Revision as of 14:37, 3 June 2014
Level 1
Whereas the database is used when browsing messages in the Retain mailbox, indexing is used for performing searches on data stored in Retain. If message metadata, message content, and message attachments have been properly indexed, Retain will be able to find them when the Search feature is used. Content that does not get indexed will not be included in the search results, so it is very important to make sure that all messages are indexed.
Indexing Engine
This is where you set the indexing engine Retain will use to index message metadata, message content, and message attachments. Lucene is the most widely used by far, so the screen defaults to Lucene.
Lucene
Lucene is hosted locally on the same machine as the Retain Server and requires no further configuration. It does not have the same options or the extent of capabilities that the Exalead engine does, but it is sufficient for most customers' needs, and we recommend using this indexer. It is quickly catching up to Exalead in capabilities.
Exalead
Because Exalead is a much more robust indexing engine, it requires its own server and resources. As such, when Exalead is selected as the indexing engine, a connection address and starting base port are required. The default BASEPORT is 10000. To ensure that the connection to the Exalead server is working, select the ‘Test Connection’ button, which triggers Retain to contact the Exalead server. The results should appear shortly as a small notification window in your browser.
Indexing
You can control what Retain indexes here. You may add as many items as you wish to the list of attachment types to index. Note the explanation at the top of the table. The items are listed (in order) by type, extension, archived form (extractor used), and maximum stream size and file size.
You choose whether to index the attachment based on its filename extension or its MIME type (the content itself). You also choose which extractor to use to index the attachment. Retain supports HTML, RTF, TEXT, XML, OpenXML – (MS Office 2007 .docx), OpenOffice2, Word Perfect documents, Excel files, .DOC, and .PDF under the Lucene indexing engine, while Retain supports many more under Exalead.
Because of high CPU, memory, and performance requirements, MS Word and Adobe PDF are not indexed by default and must be enabled to be indexed. If you need to index these items, the allotted memory should be increased. Indexing these items will slow down the indexing process. Select as many as you need. If an attachment type that is common in the system needs to be indexed but is not already in the list, it may be added by using the ‘add’ row. We recommend that these document types be marked if the customer wishes to do keyword searches and have those attachment types included in the search results.
Also by default, Retain only indexes the first 2MB of each attachment. This saves disk space and minimizes impact on performance; however, if the intent is for users to find documents through searches, we recommend setting both the “stream size” and “file size” fields for every document type to “-1”, which instructs the Indexer to index the entire content.
As of 3/6/14, waiting on answers from development on what constitutes the "stream" and what the File Size limit means. See bug 4593.
The Stream Size is the upper limit on how much text should be stored. The File Size limit indicates the size of an attachment to be stored in the index. Both the Stream Size and the File Size are specified in bytes. Lowering the sizes may help speed up searching and keep large items from bogging down the system. Increasing the sizes allows larger attachments and files to be indexed, but very large files could slow down searching. You may enter -1 to have no limit on the sizes.
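The effect of a byte limit, including the -1 "no limit" value, can be illustrated with a short Python sketch. This is only an illustration under assumptions: the function name apply_limit is hypothetical, the limit is treated as a simple byte cutoff (which may not match how Retain applies the stream size or file size internally), and 2MB is assumed to mean 2 * 1024 * 1024 bytes.

```python
def apply_limit(data: bytes, limit: int) -> bytes:
    """Return the part of `data` that a byte limit would keep.

    Hypothetical helper: a limit of -1 means "no limit"; any other
    non-negative value keeps at most that many bytes.
    """
    if limit == -1:
        return data
    if limit < 0:
        raise ValueError("limit must be -1 or a non-negative byte count")
    return data[:limit]


# Assumption: the default 2MB limit expressed in bytes.
TWO_MB = 2 * 1024 * 1024

attachment = b"x" * (3 * 1024 * 1024)  # a 3 MB attachment
print(len(apply_limit(attachment, TWO_MB)))  # 2097152
print(len(apply_limit(attachment, -1)))      # 3145728
```

With the default limit, the last megabyte of this attachment would not be searchable; with -1, the whole attachment is kept.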
“Force Indexing” tells the server to index items that are not currently indexed. This queries the system for the top 500,000 items that are not currently indexed, and starts the indexer working if it is not currently working.
Level 2
What Happens When a Search Is Performed
When performing a search, Retain takes the search conditions and criteria you entered and converts them into a query. This query is sent to the Indexer, which returns message IDs for the items it finds. Those message IDs are then looked up in the Retain database for specific information about the items themselves.
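The two-stage flow described here (the index returns message IDs, then the database supplies the details) can be sketched in Python. Everything in this sketch is an assumption made for illustration: the in-memory INDEX dictionary stands in for the indexing engine, the messages table and its columns are invented, and the function names are hypothetical. It does not reflect Retain's actual schema or query format.

```python
import sqlite3

# Hypothetical stand-in for the index: term -> set of message IDs.
INDEX = {
    "budget": {101, 103},
    "invoice": {102, 103},
}


def query_indexer(terms):
    """Stage 1: return IDs of messages that contain every term."""
    id_sets = [INDEX.get(t.lower(), set()) for t in terms]
    return sorted(set.intersection(*id_sets)) if id_sets else []


def lookup_messages(conn, message_ids):
    """Stage 2: fetch details for the given IDs from the database."""
    if not message_ids:
        return []
    marks = ",".join("?" for _ in message_ids)
    sql = f"SELECT id, subject FROM messages WHERE id IN ({marks}) ORDER BY id"
    return conn.execute(sql, message_ids).fetchall()


# Invented sample database, only for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, subject TEXT)")
conn.executemany(
    "INSERT INTO messages VALUES (?, ?)",
    [(101, "Q3 budget"), (102, "Invoice 7"), (103, "Budget invoice")],
)

ids = query_indexer(["budget", "invoice"])
print(ids)                         # [103]
print(lookup_messages(conn, ids))  # [(103, 'Budget invoice')]
```

The sketch also shows why an item that was never indexed cannot appear in search results: if its ID is missing from the index, the database lookup is never asked about it.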
Determining Indexer Status
There are several ways to tell if the indexer is running:
- 1. The first and easiest way is to just go perform a search. If it returns any results, then the indexer is just fine. Not returning results, however, does not necessarily mean that the Indexer is not running or operational. If the search results are empty, go to step 2.
- 2. There is a Retain Java utility, indexerStatus.jsp, that checks whether the indexer is running. After logging into the RetainServer as admin, append "/Util/indexerStatus.jsp" to the RetainServer URL (e.g., http://10.1.9.26/RetainServer/Util/indexerStatus.jsp). This takes you to a page that shows the status of the indexer.
- The status of the Indexer is shown where it reads, "Indexer is alive:"
- True = The indexer is running.
- False = The indexer is not running.
- It will also display any items that are in the queue waiting to be indexed or that have not been indexed. It will only display 10,000 items and will have a plus ("+") sign next to the number if there are more. You can turn the Indexer on or off.
- 3. Indexer logs. In the /opt/beginfinite/retain/tomcat7/logs directory (Apache Software Foundation/tomcat7/logs in Windows) the indexer has its own logs. They will be titled: Indexer.date.log. Looking in the log can help to determine if there are errors, or if the indexer is turned off. Below is the initialization and startup of the indexer.
10:25:33,611 LuceneIndexingManager - Indexing manager initialization...
10:25:33,952 LuceneIndexingStats - Stats updater launched
10:25:34,690 LuceneIndexingAddition - Create IndexWriter for Lucene: version=LUCENE_35,path=/retaindata/index/, createMode=false,supportPrefixWildcards=true
10:25:35,728 ServerIndexingBroker - Determining backgroundIndexer for engine:lucene
10:25:38,900 LuceneIndexingManager - Created background indexer...
10:25:38,904 IndexingThread - Start index master
10:25:38,908 LuceneIndexingManager - Indexing manager successfully initialized
10:25:39,516 IndexAdminMessageConsumerImpl - Trying to process the operation: INDEX_LAUNCH_STARTUP
10:25:39,517 IndexAdminMessageConsumerImpl - Will initialize lucene
10:25:39,517 IndexAdminMessageConsumerImpl - INIT: Indexing Manager being launched
10:25:39,517 LuceneIndexingManager - Indexing manager initialization...
10:25:39,561 LuceneIndexingManager - Created background indexer...
10:25:39,561 LuceneIndexingManager - Indexing manager successfully initialized
- Here is a snippet from the log showing what it looks like when messages are indexed:
10:45:30,966 IndexAdminConfigMessageConsumerImpl - Trying to process the operation: INDEX_INCREMENT_STATS
10:45:32,803 AbstractBackgroundIndexer - processIndexingOfList...
10:45:32,809 LuceneDocumentUtil - NEW LuceneDocumentUtil enabled
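When an indexer log is long, it can help to split each entry into its timestamp, component, and message before looking for errors. The following Python sketch does that. The entry pattern is inferred only from the log excerpts in this article (time, component name, " - ", message) and is an assumption; real logs may contain lines in other shapes, which this sketch simply skips. The function name parse_entries is hypothetical.

```python
import re

# Pattern inferred from the excerpts: "HH:MM:SS,mmm Component - message".
ENTRY = re.compile(r"^(\d{2}:\d{2}:\d{2},\d{3}) (\S+) - (.*)$")


def parse_entries(lines):
    """Yield (timestamp, component, message) for lines matching ENTRY."""
    for line in lines:
        match = ENTRY.match(line.strip())
        if match:
            yield match.groups()


sample = [
    "10:45:30,966 IndexAdminConfigMessageConsumerImpl - Trying to process the operation: INDEX_INCREMENT_STATS",
    "10:45:32,803 AbstractBackgroundIndexer - processIndexingOfList...",
    "10:45:32,809 LuceneDocumentUtil - NEW LuceneDocumentUtil enabled",
]

for timestamp, component, message in parse_entries(sample):
    print(timestamp, component, message, sep=" | ")
```

To use it on a real log, pass an open file object instead of the sample list, since iterating over a file yields its lines.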
What happens if the Indexer is not working or is turned off?
What if the search produces no results after you have adjusted the date view and reset the search?
The first thing to do is check whether the indexer is turned off. If the status page says "...alive: false", try turning the indexer on by clicking the "Try turning indexer on" button. Take note of the number of unindexed items. You can also view the unindexed items on the Server status page in Retain.
Another method is to restart Tomcat. This will shut down any indexer threads and restart them, bringing the indexer back alive.
The indexer logs can also help to identify that the indexer is not working. Entries like the following show the indexer shutting down:
10:46:34,125 IndexingThread - End index master
10:46:34,626 NRTSingleton - NRTSingleton: Closing NRTManager
10:46:34,630 NRTSingleton - NRTSingleton: Closing NRTManagerReopenThread
10:46:34,633 LuceneIndexingStats - Stats updater going away
10:46:34,633 LuceneIndexingManager - IndexingManager has shut down all resources
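A rough way to read the last known indexer state out of a log is to track which marker message appeared most recently. The Python sketch below does this using two messages copied from the log excerpts in this article. It is a heuristic under assumptions, not a Retain tool: the function name last_known_state is hypothetical, the marker wording may differ between versions, and the indexerStatus.jsp page remains the direct way to check whether the indexer is alive.

```python
# Marker messages copied from the log excerpts in this article.
STARTED = "Indexing manager successfully initialized"
STOPPED = "IndexingManager has shut down all resources"


def last_known_state(lines):
    """Return "running", "stopped", or "unknown" from the latest marker."""
    state = "unknown"
    for line in lines:
        if STARTED in line:
            state = "running"
        elif STOPPED in line:
            state = "stopped"
    return state


log = [
    "10:25:38,908 LuceneIndexingManager - Indexing manager successfully initialized",
    "10:46:34,125 IndexingThread - End index master",
    "10:46:34,633 LuceneIndexingManager - IndexingManager has shut down all resources",
]
print(last_known_state(log))  # stopped
```

Because the function only iterates over lines, an open log file can be passed in place of the list.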
Identifying Items Not Indexed
Refer to KB article, How to View the Number of Items Not Indexed, for instructions on identifying items that have not been indexed.
Fixing Missing or Corrupt Index Files
Refer to KB article, Fixing Missing Index Files / Indexer Fails to Load, for instructions on checking for missing index files and/or index file corruption.
Rebuilding Indexes
Refer to KB article, How to Rebuild Indexes, for the symptoms that indicate indexes may need to be rebuilt and for instructions on rebuilding them.