Difference between revisions of "Retain Indexing"

From GWAVA Technologies Training
(Level 2)
(Solr memory issue)
 
(81 intermediate revisions by 5 users not shown)
Line 1: Line 1:
  
==Level 1==
+
Whereas the database is used when browsing messages in the Retain mailbox, the indexes are used for performing searches on data stored in Retain.  If message metadata, message content, and message attachments have been properly indexed, Retain will be able to find it when the Search feature is used. If that content does not get indexed, then it will not be included in the search results. Making sure that all messages are indexed is very important.
Whereas the database is used when browsing messages in the Retain mailbox, indexing is used for performing searches on data stored in Retain.  If message metadata, message content, and message attachments have been properly indexed, Retain will be able to find it when the Search feature is used. If that content does not get indexed, then it will not be included in the search results. Making sure that all messages are indexed is very important.  
+
  
===Indexing Engine===
+
=Advanced Information=
This is where you set the indexing engine Retain will use to index message metadata, message content, and message attachments.  Lucene is the most widely used by far, so the screen defaults to Lucene.
+
 
+
====Lucene====
+
Lucene is hosted locally on the same machine as the Retain Server and requires no further configuration. It does not have the same options or the same extent of capabilities as the Exalead engine, but it is sufficient for most customers' needs, and we recommend using this indexer.  It is quickly catching up to Exalead in capabilities.
+
 
+
====Exalead====
+
Because Exalead is a much more robust indexing engine it requires its own server and resources. As such, when Exalead is selected as the indexing engine, a connection address and starting base port are required. The default BASEPORT is 10000. To ensure that the connection to the Exalead server is working, the ‘Test Connection’ button may be selected, which triggers Retain to contact the Exalead server. The results should shortly appear as a small notification window in your browser.
+
 
+
===Indexing===
+
You can control what Retain indexes here. You may add as many items as you wish to the list of attachment types to index. Note the explanation at the top of the table. The items are listed (in order) by type, extension, archived form (extractor used), and maximum stream size and file size.<br>
+
 
+
You choose whether to index the attachment based on its filename extension or its MIME type (the content itself). You also choose which extractor to use to index the attachment. Retain supports HTML, RTF, TEXT, XML, OpenXML – (MS Office 2007 .docx), OpenOffice2, Word Perfect documents, Excel files, .DOC, and .PDF under the Lucene indexing engine, while Retain supports many more under Exalead. <br>
+
 
+
Because of high CPU, memory, and performance requirements, MS Word and Adobe PDF are not indexed by default and must be enabled to be indexed. If you need to index these items, the allotted memory should be increased. Indexing these items will slow down the indexing process. Select as many as you need. If an attachment type that is common in the system needs to be indexed but is not already in the list, it may be added by using the ‘add’ row. We recommend that these document types be marked if the customer wishes to do keyword searches and have those attachment types included in the search results.
+
 
+
Also by default, Retain only indexes the first 2MB of each attachment.  This saves disk space and minimizes the impact on performance; however, if the intent is for users to find documents through searches, we recommend setting both the “stream size” and “file size” fields for <i>every</i> document type to “-1”, which instructs the Indexer to index the entire content.
+
 
+
As of 3/6/14, waiting on answers from development on what constitutes the "stream" and what the File Size limit means.  See bug [http://bugzilla.gwava.com/show_bug.cgi?id=4593 4593].
+
 
+
The Stream Size is the upper limit on how much text should be stored. The File Size limit indicates the size of an attachment to be stored in the index. Both sizes are specified in bytes. Lowering the sizes may speed up searching and keeps large items from bogging down the system.  Increasing the sizes allows larger attachments and files to be indexed, but very large files could slow down searching. You may enter -1 to have no limit on the sizes.
+
 
+
[[File:1._Index_Config.PNG||||border]]
+
 
+
<br>
+
“<b>Force Indexing</b>” tells the server to index items that are not currently indexed. This queries the system for the top 500,000 items that are not currently indexed, and starts the indexer working if it is not currently working.
+
 
+
==Level 2==
+
 
===What Happens When a Search Is Performed===
 
When performing a search, Retain takes your search conditions and criteria and converts them into a query. This query is sent to the Indexer, which returns message IDs for the items it finds.  Those message IDs are then looked up in the Retain database for specific information about the items themselves.
Line 37: Line 9:
 
There are several ways to tell if the indexer is running: 
  
::<b>1.</b> The first and easiest way is to just go perform a search. If it returns any results, then the indexer is just fine.  Not returning results, however, does not necessarily mean that the Indexer is not running or operational.  If the search results are empty, go to step 2.
+
:: The first and easiest way is to just go perform a search. If it returns any results, then the indexer is just fine.  Not returning results, however, does not necessarily mean that the Indexer is not running or operational.  If the search results are empty, go to step 2.
  
::<b>2.</b> There is a Retain Java utility, indexerStatus.jsp, that will check to see if the indexer is running. After logging into the RetainServer as admin, add to the end of RetainServer "/Util/indexerStatus.jsp" (e.g., http://10.1.9.26/RetainServer/Util/indexerStatus.jsp). This will take you to a page and tell you what the status is of the indexer. <br><br>
+
:: There is a Retain Java utility, indexerStatus.jsp, that will check to see if the indexer is running. After logging into the RetainServer as admin, add to the end of RetainServer "/Util/indexerStatus.jsp" (e.g., http://10.1.9.26/RetainServer/Util/indexerStatus.jsp). This will take you to a page and tell you what the status is of the indexer. <br><br>
  
 
::[[File:2._indexerStatus.PNG||||border]]
 
::[[File:2._indexerStatus.PNG||||border]]
Line 58: Line 30:
 
word-wrap: break-word;
 
word-wrap: break-word;
 
margin-left: 5em;
 
margin-left: 5em;
width: 100%">
+
width: 50%">
 
10:25:33,611 LuceneIndexingManager - Indexing manager initialization...
 
10:25:33,611 LuceneIndexingManager - Indexing manager initialization...
 
10:25:33,952 LuceneIndexingStats - Stats updater launched
 
10:25:33,952 LuceneIndexingStats - Stats updater launched
Line 82: Line 54:
 
word-wrap: break-word;
 
word-wrap: break-word;
 
margin-left: 5em;
 
margin-left: 5em;
width: 100%">
+
width: 50%">
 
10:45:30,966 IndexAdminConfigMessageConsumerImpl - Trying to process the operation: INDEX_INCREMENT_STATS
 
10:45:30,966 IndexAdminConfigMessageConsumerImpl - Trying to process the operation: INDEX_INCREMENT_STATS
 
10:45:32,803 AbstractBackgroundIndexer - processIndexingOfList...
 
10:45:32,803 AbstractBackgroundIndexer - processIndexingOfList...
Line 102: Line 74:
 
white-space: -o-pre-wrap;  
 
white-space: -o-pre-wrap;  
 
word-wrap: break-word;
 
word-wrap: break-word;
margin-left: 2em;
+
margin-left: 5em;
width: 100%">
+
width: 40%">
 
10:46:34,125 IndexingThread - End index master
 
10:46:34,125 IndexingThread - End index master
 
10:46:34,626 NRTSingleton - NRTSingleton: Closing NRTManager
 
10:46:34,626 NRTSingleton - NRTSingleton: Closing NRTManager
Line 110: Line 82:
 
10:46:34,633 LuceneIndexingManager - IndexingManager has shut down all resources
 
10:46:34,633 LuceneIndexingManager - IndexingManager has shut down all resources
 
</pre>
 
</pre>
 
  
 
===Identifying Items Not Indexed===
 
===Identifying Items Not Indexed===
Line 120: Line 91:
 
===Rebuilding Indexes===
 
===Rebuilding Indexes===
 
Refer to KB article, [http://support2.gwava.com/kb/?View=entry&EntryID=1142 How to Rebuild Indexes], on symptoms of when indexes may need to be rebuilt and how to rebuild them.
 
Refer to KB article, [http://support2.gwava.com/kb/?View=entry&EntryID=1142 How to Rebuild Indexes], on symptoms of when indexes may need to be rebuilt and how to rebuild them.
 +
 +
===Explanation of f_indexed Field Values in the t_message table===
 +
When wanting to see a summary count of messages by each indexing state, run the query described in this kb:  [http://support.gwava.com/kb/?View=entry&EntryID=2765 SQL query for showing the message count of various index states]
 +
 +
 +
====Retain 3.x:====
 +
 +
*1: Fully indexed
 +
*-1:  Indexing error. No part of this message was indexed.
 +
*-32: Larger than indexing size limit.
 +
*-64: Retain doesn't recognize this file type. Therefore, it doesn't know how to extract text from it to index.  (I.E. attachment/item was not indexed)
 +
*-128: INDEXING_HIBERNATE_EXCEPTION  It's unlikely that you'll ever see this code.  From Development: ''I've looked through the code and can't find any place where a hibernate exception is thrown, however, tracking down where exceptions can come from after the fact can be difficult so it is possible that I just missed it.  However, if a hibernate exception is thrown this error code means that during the process of indexing this message that an hibernate exception occurred that prevented the indexing of this message.''
 +
 +
 +
====Retain 4.x:====
 +
 +
*2: Fully indexed
 +
*10: Indexing Error (same as -1 in Retain 3.x)
 +
*16: General indexing error (This should never be written to the database.)
 +
*32: Larger than indexing size limit. (Same as -32 on 3.x)
 +
*64: INDEXING_NOT_ON_WHITELIST (Same as -64 on 3.x)
 +
*128: INDEXING_HIBERNATE_EXCEPTION (Same as -128 on 3.x, this should never be written to the database.)
 +
*256: INDEXING_EXTRACTOR_EXCEPTION.  As per Dev: ''Any text extraction we perform on any file type could potentially throw an exception and this is the f_indexed value that we set to indicate this.'' (This should never be written to the database.)
 +
*512: INDEXER_EXCEPTION.  (As of 2015-02-10 not actually being used in the beta code.)
 +
 +
'''''Note:''''' Just because it ''shouldn't'' be written to the database doesn't mean you ''won't'' ever see it.  Typically you'll want to assume that anything greater than a 2 had an error and is either partially indexed or not indexed at all.
 +
 +
===Tweaking Indexing speed and other tools===
 +
A number of items can be tweaked in the file /opt/beginfinite/retain/RetainServer/WEB-INF/classes/config/lucene.indexing.properties.
 +
<br><br>
 +
==All Things SOLR==
 +
=====Solr MemoryMap Files and Memory Usage=====
 +
Here is a blog post from someone that works for Solr about its use of MemoryMap files and memory usage in general:
 +
[http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html Understanding Solr Memory Usage]
 +
 +
=====Accessing the Solr Web Interface=====
 +
To access the HPI web interface, go to '''http://[dns hostname / IP]:8081/hpi/#/'''
 +
<br>
 +
<br>
 +
::NOTE: The username and password are stored in ASConfig.cfg within the dynamicAttributes tag:
 +
<pre style="white-space: pre-wrap;
 +
white-space: -moz-pre-wrap;
 +
white-space: -pre-wrap;
 +
white-space: -o-pre-wrap;
 +
word-wrap: break-word;
 +
margin-left: 5em;
 +
width: 25%">   
 +
    <dynamicAttributes>
 +
        <entry>
 +
          <string>hpiPassword</string>
 +
          <string>retain</string>
 +
        </entry>
 +
        <entry>
 +
          <string>hpiUsername</string>
 +
          <string>admin</string>
 +
        </entry>
 +
      </dynamicAttributes>
 +
</pre>
 +
 +
This interface can be used to test queries.  Simply go to "'''retaincore'''" and then to '''Query'''.  In the second field from the top, under the section header "common", specify the field you are searching followed by the search criteria.  For example, to look for a particular message ID of "187234", write the query as '''id:187234''' and then click the blue '''Execute Query''' button.  The window to the right of the query dialog displays the response:
 +
 +
<pre style="white-space: pre-wrap;
 +
white-space: -moz-pre-wrap;
 +
white-space: -pre-wrap;
 +
white-space: -o-pre-wrap;
 +
word-wrap: break-word;
 +
margin-left: 5em;
 +
width: 25%">
 +
 +
  "responseHeader": {
 +
    "status": 0,
 +
    "QTime": 371,
 +
    "params": {
 +
      "q": "id:2",
 +
      "indent": "true",
 +
      "wt": "json",
 +
      "_": "1442356979581"
 +
    }
 +
  },
 +
  "response": {
 +
    "numFound": 1,
 +
    "start": 0,
 +
    "docs": []
 +
  },
 +
  "facet_counts": {
 +
    "facet_queries": {},
 +
    "facet_fields": {},
 +
    "facet_dates": {},
 +
    "facet_ranges": {},
 +
    "facet_intervals": {},
 +
    "facet_heatmaps": {}
 +
  }
 +
}
 +
 +
</pre>
 +
 +
Look under "'''response'''" at the "'''numFound'''"''':'''.  The number displayed is the number of hits it received.  In this example, it found that a message with the ID of 187234 had a hit, which means that message was indexed.
 +
 +
====Max Docs vs. Num Docs====
 +
numDocs represents the number of searchable documents in the index. maxDoc may be larger as the maxDoc count includes logically deleted documents that have not yet been removed from the index. (From Solr official Tutorial). You could remove logically deleted files by optimizing your index.
 +
 +
=====How SOLR Determines the Relevancy Score=====
 +
*  [https://wiki.apache.org/solr/SolrRelevancyFAQ SOLR Relevancy FAQ]
 +
*  [https://lucene.apache.org/core/4_0_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html ''Very'' Technical Data on SOLR Relevancy]
 +
 +
From product management:
 +
 +
:: I did some research too and this subject is not an easy one. Everyone has a preference. Dania Bilal did a research article on "Ranking, relevance judgment, and precision of information retrieval on children's queries: Evaluation of Google, Yahoo!, Bing, Yahoo! Kids, and ask Kids" which was published and available for hire or purchase here: http://onlinelibrary.wiley.com/doi/10.1002/asi.22675/abstract
 +
 +
:: The winner was Google by the way... 11% overlap vs 3% for Yahoo kids...
 +
 +
:: I think we release what we have and get feedback from our customers.
 +
 +
==Retain 4.0 Demo==
 +
Demo server credentials that can be used to demonstrate the latest version:
 +
<pre style="white-space: pre-wrap;
 +
white-space: -moz-pre-wrap;
 +
white-space: -pre-wrap;
 +
white-space: -o-pre-wrap;
 +
word-wrap: break-word;
 +
margin-left: 5em;
 +
width: 25%">
 +
 +
http://retainbeta.gwava.com/RetainServer
 +
admin / myretain4beta
 +
 +
</pre>
 +
 +
==4.x Indexer Tweaks==
 +
You can override the default thread allocation mechanism in .../RetainServer/WEB-INF/classes/config/solrcloud.indexing.properties file.
 +
 +
==Retain 4.0 Training Video: New Search UI / External HPI Installation and Configuration==
 +
::*[http://training.gwava.com/videos/Retain_4.0_training.mp4 Retain 4.0 Training, delivered on August 28, 2015 by Matt Southwick (MP4)]
 +
 +
::*[http://training.gwava.com/videos/retain_4.0_training.tvs Retain 4.0 Training, delivered on August 28, 2015 by Matt Southwick (Requires TeamViewer)]
 +
 +
==HPI - External Indexer (soon to be its own page)==
 +
 +
Default file locations
 +
 +
===Windows===
 +
Index Manager:
 +
 +
C:\Program Files (x86)\GWAVA\Retain
 +
 +
Tomcat:
 +
 +
C:\Program Files\Retain\Tomcat7
 +
 +
Shard:
 +
 +
C:\Program Files (x86)\GWAVA\Retain\HPI
 +
 +
===Linux===
 +
Index Manager:
 +
 +
/opt/beginfinite/retain/idxmanager
 +
 +
Tomcat:
 +
 +
/opt/beginfinite/retain/hpi-tomcat
 +
 +
Shard:
 +
 +
/media/retain-hpidata
 +
 +
==Index Schema-Version==
 +
Q.  Has someone a short description what the Index schema-version tells us in retain/ is good for? I've seen servers on 4.1.0.1 with schema version 403 and some on 4.1.0.1 with 340, is this okay, or should i be concerned that there might be something wrong?!?
 +
 +
A.  There was a thought to change the indexing schema, but the cost to the customers was too high, so it was taken out for now.  All that number tells you is what the indexing schema is; when the change is made, the number will change after the migration.  For now it does nothing.
 +
 +
For the index schema version:  The only difference between 403 and 340 is additional Chinese (either that or some of the other Asian languages) character support.  Other than that, there's nothing different.
 +
 +
NOTE: if they're on 340 then they installed 4.x on either 4.0, 4.0.1, or 4.0.2.  The "403" schema denotes 4.0.3.x and later.
 +
 +
==Solr memory issue==
 +
There’s a tiny little issue with solr and tomcat in Retain 4.5 and below.
 +
 +
Tomcat is not giving solr enough time to shutdown properly when we tell tomcat to stop, so it knocks the feet out from under solr since solr doesn’t have enough time to shut itself down properly. This causes orphaned files and memory leaks.
 +
 +
We need to stop tomcat and then make sure the index .lock file is gone before starting tomcat again. If tomcat is stopped and the .lock file still exists it can be safely deleted.
 +
 +
Windows
 +
…\index\solrhome\retaincore\data\index\write.lock
 +
Linux
 +
.../index/solrhome/retaincore/data/index/write.lock
 +
For example:
 +
/var/opt/beginfinite/retain/index/solrhome/retaincore/data/index/write.lock
 +
 +
If memory is not clearing a reboot of the server will be needed. Stop tomcat, remove the write.lock file, then restart the server.
 +
 +
Fix is set to be going into Retain 4.6
  
 
==Go to [http://training.gwava.com/index.php5/Retain_Server_Configuration Server Configuration]==
 
==Go to [http://training.gwava.com/index.php5/Retain_Server_Configuration Server Configuration]==

Latest revision as of 20:24, 22 June 2018

Whereas the database is used when browsing messages in the Retain mailbox, the indexes are used for performing searches on data stored in Retain. If message metadata, message content, and message attachments have been properly indexed, Retain will be able to find it when the Search feature is used. If that content does not get indexed, then it will not be included in the search results. Making sure that all messages are indexed is very important.

Advanced Information

What Happens When a Search Is Performed

When performing a search, Retain takes your search conditions and criteria and converts them into a query. This query is sent to the Indexer, which returns message IDs for the items it finds. Those message IDs are then looked up in the Retain database for specific information about the items themselves.
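The two-stage flow described above (index lookup first, database lookup second) can be sketched as follows. This is only an illustration: the dictionaries stand in for the real indexer and the Retain database, and the function name, sample IDs, and subjects are all hypothetical.

```python
from typing import Dict, List

def search(query: str, index: Dict[str, List[int]], database: Dict[int, dict]) -> List[dict]:
    """Stage 1: ask the index for matching message IDs.
    Stage 2: look each ID up in the database for the message details."""
    message_ids = index.get(query.lower(), [])
    # An ID returned by the index but missing from the database is skipped.
    return [database[mid] for mid in message_ids if mid in database]

# Hypothetical stand-ins for the indexer and the database.
index = {"budget": [101, 205]}
database = {
    101: {"id": 101, "subject": "Budget review"},
    205: {"id": 205, "subject": "Q3 budget draft"},
}

results = search("Budget", index, database)
print([r["subject"] for r in results])
```

The point of the sketch is that a message missing from the index never reaches stage 2, which is why unindexed items do not appear in search results even though they are in the database.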

Determining Indexer Status

There are several ways to tell if the indexer is running:

1. The first and easiest way is to perform a search. If it returns any results, the indexer is working. An empty result, however, does not necessarily mean that the Indexer is not running or operational. If the search results are empty, go to step 2.
2. There is a Retain Java utility, indexerStatus.jsp, that checks whether the indexer is running. After logging in to the RetainServer as admin, append "/Util/indexerStatus.jsp" to the RetainServer URL (e.g., http://10.1.9.26/RetainServer/Util/indexerStatus.jsp). The page reports the status of the indexer.

[Screenshot: 2. indexerStatus.PNG]

The status of the Indexer is shown where it reads, "Indexer is alive:"
  • True = The indexer is running.
  • False = The indexer is not running.
It will also display any items that are in the queue waiting to be indexed or that have not been indexed. It will only display 10,000 items and will have a plus ("+") sign next to it if there are more. You can turn the Indexer on or off.
3. Indexer logs. The indexer has its own logs in the /opt/beginfinite/retain/tomcat7/logs directory (Apache Software Foundation/tomcat7/logs in Windows). They are titled Indexer.date.log. Looking in the log can help to determine whether there are errors or whether the indexer is turned off. Below is the initialization and startup of the indexer.
10:25:33,611 LuceneIndexingManager - Indexing manager initialization...
10:25:33,952 LuceneIndexingStats - Stats updater launched
10:25:34,690 LuceneIndexingAddition - Create IndexWriter for Lucene: version=LUCENE_35,path=/retaindata/index/, createMode=false,supportPrefixWildcards=true
10:25:35,728 ServerIndexingBroker - Determining backgroundIndexer for engine:lucene
10:25:38,900 LuceneIndexingManager - Created background indexer...
10:25:38,904 IndexingThread - Start index master
10:25:38,908 LuceneIndexingManager - Indexing manager successfully initialized
10:25:39,516 IndexAdminMessageConsumerImpl - Trying to process the operation: INDEX_LAUNCH_STARTUP
10:25:39,517 IndexAdminMessageConsumerImpl - Will initialize lucene
10:25:39,517 IndexAdminMessageConsumerImpl - INIT: Indexing Manager being launched
10:25:39,517 LuceneIndexingManager - Indexing manager initialization...
10:25:39,561 LuceneIndexingManager - Created background indexer...
10:25:39,561 LuceneIndexingManager - Indexing manager successfully initialized
Here is a snippet from the log on what it looks like when messages are indexed:
10:45:30,966 IndexAdminConfigMessageConsumerImpl - Trying to process the operation: INDEX_INCREMENT_STATS
10:45:32,803 AbstractBackgroundIndexer - processIndexingOfList...
10:45:32,809 LuceneDocumentUtil - NEW LuceneDocumentUtil enabled

What happens if the Indexer is not working or is turned off?

What if the search produces no results after you have adjusted the date view and reset the search?

The first thing to do is check whether the indexer is turned off. If the status page says "...alive: false", try turning the indexer on by clicking the "Try turning indexer on" button. Take note of the number of unindexed items. You can also view the unindexed items in the Server status page in Retain.

Another method is to restart tomcat. This will shut down any indexer threads and restart them, bringing the indexer back alive.

The indexer logs can also help to identify that the indexer is not working. The following lines in the log show the indexer shutting down:

10:46:34,125 IndexingThread - End index master
10:46:34,626 NRTSingleton - NRTSingleton: Closing NRTManager
10:46:34,630 NRTSingleton - NRTSingleton: Closing NRTManagerReopenThread
10:46:34,633 LuceneIndexingStats - Stats updater going away
10:46:34,633 LuceneIndexingManager - IndexingManager has shut down all resources
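
As a rough aid for reading these logs, the sketch below scans log text for the startup and shutdown lines quoted in this section and reports whichever came last. The marker strings are copied from the excerpts above; the function name and the assumption that every line follows the "time class - message" layout are illustrative, not part of Retain.

```python
import re
from typing import Optional

# Marker messages copied from the log excerpts in this section.
STARTED = "Indexing manager successfully initialized"
STOPPED = "IndexingManager has shut down all resources"

# Assumed line layout: "HH:MM:SS,mmm ClassName - message"
LINE = re.compile(r"^(\d{2}:\d{2}:\d{2},\d{3})\s+(\S+) - (.*)$")

def last_indexer_event(log_text: str) -> Optional[str]:
    """Return 'started' or 'stopped' for the most recent marker, or None if none is found."""
    state = None
    for raw in log_text.splitlines():
        match = LINE.match(raw.strip())
        if not match:
            continue  # skip lines that do not follow the assumed layout
        message = match.group(3)
        if STARTED in message:
            state = "started"
        elif STOPPED in message:
            state = "stopped"
    return state

sample = """\
10:25:38,908 LuceneIndexingManager - Indexing manager successfully initialized
10:45:32,803 AbstractBackgroundIndexer - processIndexingOfList...
10:46:34,125 IndexingThread - End index master
10:46:34,633 LuceneIndexingManager - IndexingManager has shut down all resources
"""
print(last_indexer_event(sample))
```

A result of "stopped" only says that the last marker in the text was a shutdown; the indexerStatus.jsp page remains the direct way to confirm the current state.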

Identifying Items Not Indexed

Refer to KB article, How to View the Number of Items Not Indexed, for instructions on identifying items that have not been indexed.

Fixing Missing or Corrupt Index Files

Refer to KB article, Fixing Missing Index Files / Indexer Fails to Load, for instructions on checking for missing index files and/or index file corruption.

Rebuilding Indexes

Refer to KB article, How to Rebuild Indexes, on symptoms of when indexes may need to be rebuilt and how to rebuild them.

Explanation of f_indexed Field Values in the t_message table

To see a summary count of messages in each indexing state, run the query described in this KB article: SQL query for showing the message count of various index states (http://support.gwava.com/kb/?View=entry&EntryID=2765)


Retain 3.x:

  • 1: Fully indexed
  • -1: Indexing error. No part of this message was indexed.
  • -32: Larger than indexing size limit.
  • -64: Retain doesn't recognize this file type and therefore doesn't know how to extract text from it to index (i.e., the attachment/item was not indexed).
  • -128: INDEXING_HIBERNATE_EXCEPTION. It's unlikely that you'll ever see this code. From Development: "I've looked through the code and can't find any place where a Hibernate exception is thrown. However, tracking down where exceptions can come from after the fact can be difficult, so it is possible that I just missed it. If a Hibernate exception is thrown, this error code means that a Hibernate exception occurred during the indexing of this message and prevented it from being indexed."


Retain 4.x:

  • 2: Fully indexed
  • 10: Indexing Error (same as -1 in Retain 3.x)
  • 16: General indexing error (This should never be written to the database.)
  • 32: Larger than indexing size limit. (Same as -32 on 3.x)
  • 64: INDEXING_NOT_ON_WHITELIST (Same as -64 on 3.x)
  • 128: INDEXING_HIBERNATE_EXCEPTION (Same as -128 on 3.x, this should never be written to the database.)
  • 256: INDEXING_EXTRACTOR_EXCEPTION. As per Dev: Any text extraction we perform on any file type could potentially throw an exception and this is the f_indexed value that we set to indicate this. (This should never be written to the database.)
  • 512: INDEXER_EXCEPTION. (As of 2015-02-10 not actually being used in the beta code.)

Note: Just because a value shouldn't be written to the database doesn't mean you won't ever see it. Typically you'll want to assume that anything greater than 2 had an error and is either partially indexed or not indexed at all.
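
When reading f_indexed values out of query results, a small lookup table can save a trip back to this page. The sketch below transcribes the values listed above; the dictionary and function names are illustrative only, and any value not listed here is reported as unknown rather than guessed.

```python
# Descriptions transcribed from the Retain 3.x and 4.x lists above.
F_INDEXED_3X = {
    1: "Fully indexed",
    -1: "Indexing error; no part of the message was indexed",
    -32: "Larger than indexing size limit",
    -64: "File type not recognized; item not indexed",
    -128: "INDEXING_HIBERNATE_EXCEPTION",
}

F_INDEXED_4X = {
    2: "Fully indexed",
    10: "Indexing error",
    16: "General indexing error",
    32: "Larger than indexing size limit",
    64: "INDEXING_NOT_ON_WHITELIST",
    128: "INDEXING_HIBERNATE_EXCEPTION",
    256: "INDEXING_EXTRACTOR_EXCEPTION",
    512: "INDEXER_EXCEPTION",
}

def describe_f_indexed(value: int, major_version: int) -> str:
    """Translate an f_indexed value for Retain 3.x (major_version=3) or 4.x (major_version=4)."""
    if major_version not in (3, 4):
        raise ValueError("only Retain 3.x and 4.x values are listed")
    table = F_INDEXED_3X if major_version == 3 else F_INDEXED_4X
    return table.get(value, "Unknown value (not listed)")

def is_error_4x(value: int) -> bool:
    """Rule of thumb from the note above: on 4.x, anything greater than 2 had an error."""
    return value > 2

print(describe_f_indexed(64, 4))
print(describe_f_indexed(-32, 3))
```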

Tweaking Indexing speed and other tools

A number of items can be tweaked in the file /opt/beginfinite/retain/RetainServer/WEB-INF/classes/config/lucene.indexing.properties.

All Things SOLR

Solr MemoryMap Files and Memory Usage

Here is a blog post from a Lucene/Solr developer about the use of memory-mapped files (MMapDirectory) and memory usage in general: Understanding Solr Memory Usage (http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html)

Accessing the Solr Web Interface

To access the HPI web interface, go to http://[dns hostname / IP]:8081/hpi/#/

NOTE: The username and password are stored in ASConfig.cfg within the dynamicAttributes tag:
    
    <dynamicAttributes>
        <entry>
            <string>hpiPassword</string>
            <string>retain</string>
        </entry>
        <entry>
            <string>hpiUsername</string>
            <string>admin</string>
        </entry>
    </dynamicAttributes>

This interface can be used to test queries. Simply go to "retaincore" and then to Query. In the second field from the top, under the section header "common", specify the field you are searching followed by the search criteria. For example, to look for a particular message ID of "187234", write the query as id:187234 and then click the blue Execute Query button. The window to the right of the query dialog displays the response:


{
  "responseHeader": {
    "status": 0,
    "QTime": 371,
    "params": {
      "q": "id:2",
      "indent": "true",
      "wt": "json",
      "_": "1442356979581"
    }
  },
  "response": {
    "numFound": 1,
    "start": 0,
    "docs": []
  },
  "facet_counts": {
    "facet_queries": {},
    "facet_fields": {},
    "facet_dates": {},
    "facet_ranges": {},
    "facet_intervals": {},
    "facet_heatmaps": {}
  }
}

Look under "response" at "numFound". The number displayed is the number of hits the query returned. In this example numFound is 1, which means the queried message ID was found and that message was indexed. (The sample output above was captured with the query id:2 rather than id:187234.)
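
If you save the JSON shown in the query window, the numFound check can be scripted. The following is a minimal sketch using only the Python standard library; the function names are made up for illustration, and the sample text is a trimmed version of the response shown above.

```python
import json

def id_query(message_id: int) -> str:
    """Build the field:value query string typed into the Query screen, e.g. id:187234."""
    return f"id:{message_id}"

def hit_count(response_text: str) -> int:
    """Return numFound from a Solr JSON response body."""
    body = json.loads(response_text)
    return int(body["response"]["numFound"])

# Trimmed copy of the response shown above.
sample = """
{
  "responseHeader": {"status": 0, "QTime": 371},
  "response": {"numFound": 1, "start": 0, "docs": []}
}
"""

print(id_query(187234), "->", hit_count(sample), "hit(s)")
```

A count of 0 for an id: query would suggest that the message is not in the index.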

Max Docs vs. Num Docs

numDocs represents the number of searchable documents in the index. maxDoc may be larger, as the maxDoc count includes logically deleted documents that have not yet been removed from the index (from the official Solr tutorial). You can remove logically deleted documents by optimizing your index.
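
Put as arithmetic, the number of logically deleted documents still held in the index is the difference between the two counters. A tiny sketch follows; the function name and the sample figures are made up for illustration.

```python
def deleted_docs(max_doc: int, num_docs: int) -> int:
    """Logically deleted documents not yet removed from the index: maxDoc - numDocs."""
    if num_docs > max_doc:
        raise ValueError("numDocs should never exceed maxDoc")
    return max_doc - num_docs

# Made-up figures: 1,200 documents are waiting to be purged by an optimize.
print(deleted_docs(max_doc=501_200, num_docs=500_000))
```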

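How SOLR Determines the Relevancy Score

  • SOLR Relevancy FAQ (https://wiki.apache.org/solr/SolrRelevancyFAQ)
  • Very Technical Data on SOLR Relevancy (https://lucene.apache.org/core/4_0_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html)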

From product management:

I did some research too and this subject is not an easy one. Everyone has a preference. Dania Bilal did a research article on "Ranking, relevance judgment, and precision of information retrieval on children's queries: Evaluation of Google, Yahoo!, Bing, Yahoo! Kids, and ask Kids" which was published and available for hire or purchase here: http://onlinelibrary.wiley.com/doi/10.1002/asi.22675/abstract
The winner was Google by the way... 11% overlap vs 3% for Yahoo kids...
I think we release what we have and get feedback from our customers.

Retain 4.0 Demo

Demo server credentials that can be used to demonstrate the latest version:


http://retainbeta.gwava.com/RetainServer 
admin / myretain4beta

4.x Indexer Tweaks

You can override the default thread allocation mechanism in the .../RetainServer/WEB-INF/classes/config/solrcloud.indexing.properties file.

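Retain 4.0 Training Video: New Search UI / External HPI Installation and Configuration

  • Retain 4.0 Training, delivered on August 28, 2015 by Matt Southwick (MP4): http://training.gwava.com/videos/Retain_4.0_training.mp4
  • Retain 4.0 Training, delivered on August 28, 2015 by Matt Southwick (requires TeamViewer): http://training.gwava.com/videos/retain_4.0_training.tvs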

HPI - External Indexer (soon to be its own page)

Default file locations

Windows

Index Manager:

C:\Program Files (x86)\GWAVA\Retain

Tomcat:

C:\Program Files\Retain\Tomcat7

Shard:

C:\Program Files (x86)\GWAVA\Retain\HPI

Linux

Index Manager:

/opt/beginfinite/retain/idxmanager

Tomcat:

/opt/beginfinite/retain/hpi-tomcat

Shard:

/media/retain-hpidata

Index Schema-Version

Q. Does someone have a short description of what the index schema version in Retain tells us and what it is good for? I've seen servers on 4.1.0.1 with schema version 403 and some on 4.1.0.1 with 340. Is this okay, or should I be concerned that there might be something wrong?

A. There was a thought to change the indexing schema, but the cost to the customers was too high, so it was taken out for now. All that number tells you is what the indexing schema is; when the change is made, the number will change after the migration. For now it does nothing.

For the index schema version: The only difference between 403 and 340 is additional Chinese (either that or some of the other Asian languages) character support. Other than that, there's nothing different.

NOTE: If a server is on 340, then 4.x was first installed as 4.0, 4.0.1, or 4.0.2. The "403" schema denotes 4.0.3.x and later.

Solr memory issue

There is an issue with Solr and Tomcat in Retain 4.5 and below.

Tomcat does not give Solr enough time to shut down properly when Tomcat is told to stop, so Solr is cut off before it has finished shutting itself down. This causes orphaned files and memory leaks.

To work around this, stop Tomcat and then make sure the index write.lock file is gone before starting Tomcat again. If Tomcat is stopped and the write.lock file still exists, it can be safely deleted.

Windows

…\index\solrhome\retaincore\data\index\write.lock

Linux

.../index/solrhome/retaincore/data/index/write.lock

For example:

/var/opt/beginfinite/retain/index/solrhome/retaincore/data/index/write.lock

If memory is not clearing, a reboot of the server will be needed. Stop Tomcat, remove the write.lock file, then restart the server.
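
The lock-file cleanup can be scripted along these lines. This is a sketch under stated assumptions: the function name is hypothetical, the script does not stop or detect Tomcat itself (the caller must confirm that Tomcat is already stopped), and the index directory must be supplied for your own installation.

```python
from pathlib import Path

def remove_stale_write_lock(index_dir: str, tomcat_stopped: bool) -> bool:
    """Delete write.lock in index_dir, but only when the caller confirms Tomcat is stopped.

    Returns True if a lock file was removed, False if none was present.
    """
    if not tomcat_stopped:
        # Removing the lock while Solr is still running is not what this section describes.
        raise RuntimeError("Stop Tomcat before removing write.lock")
    lock = Path(index_dir) / "write.lock"
    if lock.is_file():
        lock.unlink()
        return True
    return False
```

For the Linux example path above, the call would look like remove_stale_write_lock("/var/opt/beginfinite/retain/index/solrhome/retaincore/data/index", tomcat_stopped=True), run only after Tomcat has been stopped.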

A fix is planned for Retain 4.6.

Go to Server Configuration (http://training.gwava.com/index.php5/Retain_Server_Configuration)
