Difference between revisions of "Retain Indexing"
Revision as of 17:03, 9 February 2015
Whereas the database is used when browsing messages in the Retain mailbox, the indexes are used for performing searches on data stored in Retain. If message metadata, message content, and message attachments have been properly indexed, Retain will be able to find it when the Search feature is used. If that content does not get indexed, then it will not be included in the search results. Making sure that all messages are indexed is very important.
How It Works
What Happens When a Search Is Performed
When you perform a search, Retain converts your search conditions and criteria into a query. The query is sent to the Indexer, which returns the message IDs of the items it finds. Those message IDs are then looked up in the Retain database to retrieve the details of the items themselves.
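The two-stage lookup described above can be sketched as follows. This is only an illustration of the flow, not Retain's actual code: the `INDEX` and `DATABASE` dictionaries, the `Message` class, and the function names are all hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass
class Message:
    message_id: int
    subject: str
    sender: str


# Hypothetical stand-in for the Retain database: message ID -> stored record.
DATABASE = {
    1: Message(1, "Quarterly report", "alice@example.com"),
    2: Message(2, "Lunch plans", "bob@example.com"),
    3: Message(3, "Report corrections", "carol@example.com"),
}

# Hypothetical stand-in for the index: lowercase term -> set of message IDs.
INDEX = {
    "report": {1, 3},
    "lunch": {2},
}


def query_indexer(term):
    """Stage 1: the index returns only the IDs of matching messages."""
    return sorted(INDEX.get(term.lower(), set()))


def search(term):
    """Stage 2: each returned ID is looked up in the database for its details."""
    return [DATABASE[mid] for mid in query_indexer(term) if mid in DATABASE]


if __name__ == "__main__":
    for message in search("Report"):
        print(message.message_id, message.subject)
```

The sketch also shows why unindexed content never appears in results: a message that is in the database but absent from the index is never returned by the first stage.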
Determining Indexer Status
There are several ways to tell if the indexer is running:
- 1. The easiest way is to perform a search. If it returns results, the Indexer is working. An empty result set, however, does not necessarily mean that the Indexer is down. If the search returns nothing, go to step 2.
- 2. There is a Retain utility page, indexerStatus.jsp, that checks whether the indexer is running. After logging in to the RetainServer as admin, append "/Util/indexerStatus.jsp" to the RetainServer URL (e.g., http://10.1.9.26/RetainServer/Util/indexerStatus.jsp). The resulting page reports the status of the indexer.
- The status of the Indexer is shown where it reads, "Indexer is alive:"
- True = The indexer is running.
- False = The indexer is not running.
- The page also lists items that are queued for indexing or that have not been indexed. It displays at most 10,000 items and shows a plus sign ("+") next to the count if there are more. You can also turn the Indexer on or off.
- 3. Indexer logs. The indexer has its own logs in the /opt/beginfinite/retain/tomcat7/logs directory (Apache Software Foundation/tomcat7/logs on Windows), named Indexer.date.log. Reviewing the log can help determine whether there are errors or whether the indexer is turned off. Below is the initialization and startup of the indexer.
10:25:33,611 LuceneIndexingManager - Indexing manager initialization...
10:25:33,952 LuceneIndexingStats - Stats updater launched
10:25:34,690 LuceneIndexingAddition - Create IndexWriter for Lucene: version=LUCENE_35,path=/retaindata/index/, createMode=false,supportPrefixWildcards=true
10:25:35,728 ServerIndexingBroker - Determining backgroundIndexer for engine:lucene
10:25:38,900 LuceneIndexingManager - Created background indexer...
10:25:38,904 IndexingThread - Start index master
10:25:38,908 LuceneIndexingManager - Indexing manager successfully initialized
10:25:39,516 IndexAdminMessageConsumerImpl - Trying to process the operation: INDEX_LAUNCH_STARTUP
10:25:39,517 IndexAdminMessageConsumerImpl - Will initialize lucene
10:25:39,517 IndexAdminMessageConsumerImpl - INIT: Indexing Manager being launched
10:25:39,517 LuceneIndexingManager - Indexing manager initialization...
10:25:39,561 LuceneIndexingManager - Created background indexer...
10:25:39,561 LuceneIndexingManager - Indexing manager successfully initialized
- Here is a snippet from the log on what it looks like when messages are indexed:
10:45:30,966 IndexAdminConfigMessageConsumerImpl - Trying to process the operation: INDEX_INCREMENT_STATS
10:45:32,803 AbstractBackgroundIndexer - processIndexingOfList...
10:45:32,809 LuceneDocumentUtil - NEW LuceneDocumentUtil enabled
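The status check in step 2 could be scripted once the text of the indexerStatus.jsp page has been saved. The sketch below is a minimal example under stated assumptions: it assumes the page text contains the phrase "Indexer is alive:" followed directly by true or false, with any markup already stripped. The function name is hypothetical, and the real page layout may differ.

```python
import re


def parse_indexer_alive(page_text):
    """Return True or False from the 'Indexer is alive:' line, or None if absent.

    Assumes plain text in which the phrase is followed directly by
    'true' or 'false' (any letter case).
    """
    match = re.search(r"Indexer is alive:\s*(true|false)", page_text, re.IGNORECASE)
    if match is None:
        return None
    return match.group(1).lower() == "true"


if __name__ == "__main__":
    sample = "Indexer status\nIndexer is alive: false\n"
    print(parse_indexer_alive(sample))
```

Returning None when the phrase is missing keeps "the page did not load as expected" distinct from "the indexer is off".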
What happens if the Indexer is not working or is turned off?
What if the search produces no results after you have adjusted the date view and reset the search?
The first thing to do is check whether the indexer is turned off. If the status page says "...alive: false", try turning the indexer on by clicking the "Try turning indexer on" button. Take note of the number of unindexed items. You can also view the unindexed items on the Server status page in Retain.
Another method is to restart Tomcat. This shuts down any indexer threads and restarts them, bringing the indexer back to life.
The indexer logs can also show that the indexer has stopped. Look for entries like these:
10:46:34,125 IndexingThread - End index master
10:46:34,626 NRTSingleton - NRTSingleton: Closing NRTManager
10:46:34,630 NRTSingleton - NRTSingleton: Closing NRTManagerReopenThread
10:46:34,633 LuceneIndexingStats - Stats updater going away
10:46:34,633 LuceneIndexingManager - IndexingManager has shut down all resources
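Checking the log for the startup and shutdown messages shown in this article can be automated. The following is a rough sketch, not a supported tool: the two marker strings are copied from the log excerpts above, while the function name and the idea that the last marker seen reflects the current state are assumptions.

```python
# Marker strings taken from the log excerpts in this article.
STARTED = "Indexing manager successfully initialized"
STOPPED = "IndexingManager has shut down all resources"


def last_indexer_state(log_lines):
    """Return 'running', 'stopped', or 'unknown' based on the last marker seen."""
    state = "unknown"
    for line in log_lines:
        if STARTED in line:
            state = "running"
        elif STOPPED in line:
            state = "stopped"
    return state


if __name__ == "__main__":
    sample = [
        "10:25:38,908 LuceneIndexingManager - Indexing manager successfully initialized",
        "10:46:34,125 IndexingThread - End index master",
        "10:46:34,633 LuceneIndexingManager - IndexingManager has shut down all resources",
    ]
    print(last_indexer_state(sample))
```

To run it against a real log, pass an open file object for the relevant Indexer.date.log as `log_lines`.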
Identifying Items Not Indexed
Refer to KB article, How to View the Number of Items Not Indexed, for instructions on identifying items that have not been indexed.
Fixing Missing or Corrupt Index Files
Refer to KB article, Fixing Missing Index Files / Indexer Fails to Load, for instructions on checking for missing index files and/or index file corruption.
Rebuilding Indexes
Refer to KB article, How to Rebuild Indexes, on symptoms of when indexes may need to be rebuilt and how to rebuild them.
Meaning of Values in T_MESSAGE.F_INDEXED Field
- 1 (0x01): Fully indexed
- -1 (-0x01): Indexing error (would this be no part was indexed or partially indexed?)
- -32 (-0x20): Larger than indexing size limit.
- -64 (-0x40): Not sure what "INDEXING_NOT_ON_WHITELIST" means.
- -128 (-0x80): I don't think anyone in support has seen this one, what is it for future reference? INDEXING_HIBERNATE_EXCEPTION
For Solr (4.0+)
- 2 (0x02): Fully indexed
- 10 (0x10): Indexing error (same as -1 on Lucene/Exalead I assume. If so I have the same question as above. Also, you can kill 2 questions with one answer.)
- 32 (0x20): Larger than indexing size limit. (similar to -32 on old indexer?)
- 64 (0x40): INDEXING_NOT_ON_WHITELIST (similar to -64 on old indexer?)
- 128 (0x80): INDEXING_HIBERNATE_EXCEPTION (same question as above for -128/-0x80)
- 256 (0x100): INDEXING_EXTRACTOR_EXCEPTION. I assume this means that it had that specific error, generally I've seen that be a .pdf extraction problem. Just couldn't pull the data out of the file. Item was not indexed.
- 512 (0x200): INDEXER_EXCEPTION. Other misc errors I assume? Also was not indexed at all? How does this differ from 10 (0x10)?
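When reviewing many rows, a small lookup table makes the values above easier to read. This is a sketch built only from the lists in this section, with several assumptions: the first list is taken to apply to the older Lucene/Exalead indexer, each value is treated as an exact code rather than a combinable bit flag, and the function and table names are hypothetical. Note also that the Solr list shows decimal 10 next to 0x10, but 0x10 is 16 in decimal; the sketch keys on the hex value, and which of the two is actually stored is unconfirmed.

```python
# Values for the older Lucene/Exalead indexer (assumed from the first list).
LEGACY_CODES = {
    1: "Fully indexed",
    -1: "Indexing error",
    -32: "Larger than indexing size limit",
    -64: "INDEXING_NOT_ON_WHITELIST",
    -128: "INDEXING_HIBERNATE_EXCEPTION",
}

# Values for Solr (4.0+), keyed on the hex literals given in the list.
SOLR_CODES = {
    0x02: "Fully indexed",
    0x10: "Indexing error",  # listed as "10 (0x10)"; 0x10 == 16, unconfirmed
    0x20: "Larger than indexing size limit",
    0x40: "INDEXING_NOT_ON_WHITELIST",
    0x80: "INDEXING_HIBERNATE_EXCEPTION",
    0x100: "INDEXING_EXTRACTOR_EXCEPTION",
    0x200: "INDEXER_EXCEPTION",
}


def describe_f_indexed(value, solr=False):
    """Translate an F_INDEXED value into the description listed above."""
    table = SOLR_CODES if solr else LEGACY_CODES
    return table.get(value, f"Unknown value {value}")


if __name__ == "__main__":
    print(describe_f_indexed(-32))
    print(describe_f_indexed(256, solr=True))
```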
Tweaking Indexing speed and other tools
A number of items can be tweaked in the file /opt/beginfinite/retain/RetainServer/WEB-INF/classes/config/lucene.indexing.properties.
Pre-Release Information
REMEMBER: This information is still in development.
Any or all of this information is subject to change.
In other words, this information stays internal to GWAVA.
Known Changes for 4.0 release
Lucene and Exalead will be dropped in favor of SolrCloud (Solr is itself built on top of Lucene).
Probable Changes to the Search UI
Sneak peek at the new search UI:
There will be a blank search field. After you enter at least 3 characters, the system will offer suggestions, or you can press Enter to run the search and get results. You can then narrow down the results using the sections that appear on the left.
As of 11/20/2014: It was decided that entries from the same section would be explicitly OR’d together while entries from different sections would be explicitly AND’d together.
For example, selecting Subject and Sender under "Search In" produces OR statements: the results include every item the search term applies to, whether it appears in the subject or in the sender address. By contrast, if you select Subject under "Search In" and Mail under "Scope", the search term must appear in both the subject AND the Mail From address.
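The OR-within-a-section, AND-across-sections rule can be expressed compactly. The sketch below only illustrates the boolean logic and is not the planned implementation: the message is a plain dictionary, and the field names (`subject`, `sender`, `mail_from`) are hypothetical stand-ins for the UI entries.

```python
def matches(message, term, selections):
    """Apply OR within each section and AND across sections.

    selections maps a section name (e.g. "Search In") to the list of
    message fields chosen in that section. Empty sections are ignored.
    """
    term = term.lower()
    return all(
        any(term in message.get(field, "").lower() for field in fields)
        for fields in selections.values()
        if fields
    )


if __name__ == "__main__":
    message = {
        "subject": "Budget review",
        "sender": "alice@example.com",
        "mail_from": "budget-team@example.com",
    }
    # Same section: a match in either the subject or the sender is enough.
    print(matches(message, "alice", {"Search In": ["subject", "sender"]}))
    # Different sections: the term must match in both.
    print(matches(message, "review", {"Search In": ["subject"], "Scope": ["mail_from"]}))
```

In the second call the term appears in the subject but not in the Mail From address, so the message is excluded even though one section matched.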