Retain Publishing Indexer

From GWAVA Technologies Training
Latest revision as of 01:21, 8 March 2014

Level 1

In the Publisher Training module it was mentioned that, after publishing, the Indexer will kick off automatically if the configuration is set to do so. The Indexer can also be run manually at any time.

In this module you will learn:

  • Index Filtering
  • Indexing Modes: Automatic or Manual
  • Indexing

Index Filtering

Both the automatic and manual Indexer processes have Indexer Filter Settings. The last screen prior to retrieving messages allows you to select or deselect the attachment types to index. Deselecting types may be beneficial if you are only searching for a specific type of attachment and wish to leave the others out of your search.

3. Indexer Filter Settings.png


In the manual Indexer, the Indexer Filter Settings are opened with a button. Simply click the button, select or de-select your types, and click Close.

Indexing Setup

As messages get published, they are stored in a SQLite database. When the Viewer is launched, it queries the database and displays the messages that have been exported. The Retain Viewer can also search for messages, and this is where the Indexer comes in. Until the messages are indexed, the Viewer will not allow you to search for any messages; the option to search will be greyed out. In this case, run the Indexer to enable searching.

During the Installation of the Publisher, the Indexer is automatically installed. There is not a separate installation for the Indexer. The Publisher.exe is what needs to be run to index the messages.

There are 2 modes for the Indexer:

  • Automatic
  • Manual

Automatic

Automatic indexing is set up during publishing, at the step where you select the location for your exported messages.

1.Automatic Indexer.png

After choosing the location of the Published files, click the box: "Start Indexer after messages have been retrieved". At the end of the publishing, the Indexer process will automatically start.

Manual

If you do not check the automatic option to index after message retrieval, you can still run the Indexer on a database manually. After publishing messages, you can launch the Indexer (Start Menu) and point it to the database to index. It will run the indexing exactly the same as during the automatic run.

2. Indexer Start Menu.png


Clicking on the Indexer launches a screen that either lists previously indexed locations or is blank.


4. Manual Index.png


To add a new location (a database that needs to be indexed), click Add location and select where the db is located. You only have to select the parent directory above the db to access the data to be indexed. It is important, though, that there is a database in the directory structure you are trying to index. Look for the db directory (base.db) to verify that it is the correct location.
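The check described above can also be scripted. The following is only a minimal sketch: it assumes that base.db is a file somewhere at or below the directory you plan to add, and the function name is hypothetical, not part of the Publisher.

```python
from pathlib import Path


def find_publisher_databases(parent_dir):
    """Return every base.db file found at or below parent_dir, sorted by path."""
    root = Path(parent_dir)
    # rglob walks the whole tree, so pointing at a parent directory is enough.
    # Assumption: base.db is a regular file, not a directory.
    return sorted(p for p in root.rglob("base.db") if p.is_file())
```

Calling find_publisher_databases on the export location returns an empty list when no database is present, which suggests the wrong directory was selected.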

Indexing

Indexing is fully automatic and runs on its own. After selecting the location and filters, click Start Indexing to begin the process. Depending on how many messages need to be indexed, it can take a long time or finish very quickly. As it indexes, it shows the document it is currently working on. The bigger the document, the longer it takes to process.

5. Indexing.png


Once the indexer is finished, there will be a green checkmark next to the location that signifies it is complete. Click Close to close out of the Indexer.

Level 2

The Publishing Indexer indexes messages exported by the Publisher utility so that the Viewer can search them. In this level of the Publishing Indexer training, we will take a look at:

      • Indexer Log
      • Database
      • Troubleshooting

Indexer Log

As the Indexer runs, it uses what are called Extractors, which collect the messages and index them. Each extractor is designed for the specific type of attachment being indexed. If you look at the list of Indexing types, each one has its own extractor (see graphic under Level 1 | Index Filtering).

Not all files can be extracted and indexed. Only the file types listed in the Filters window can be indexed. The following lines in the log file show an attachment type that cannot be indexed, in this case the .img file type:

2013-06-05 15:39:56,648 INFO  Publisher - Indexing attachment. Message Id: 1510, Attachment Id: 4226.
2013-06-05 15:39:56,651 INFO  Publisher - IFilter. ExtractorsCollector. filter for 'img' was not found

When messages get indexed, the messages and information are all written to a log file: C:\Users\[loggedinuser]\AppData\Roaming\GWAVA\Retain Publisher\Indexer.log.

When starting the Indexer either automatically or manually, this is what shows up in the log:

 
2013-06-07 09:35:28,058 INFO  Publisher - Indexer started as x64 application. Assembly version: Indexer, Version=3.1.4850.34906, Culture=neutral, PublicKeyToken=null.
2013-06-07 09:35:28,084 INFO  Publisher - Indexer environment: Microsoft Windows NT 6.1.7601 Service Pack 1 (x64)
2013-06-07 09:35:28,085 INFO  Publisher - Indexer references: Core, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null
2013-06-07 09:35:28,086 INFO  Publisher - Indexer references: mscorlib, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089
2013-06-07 09:35:28,086 INFO  Publisher - Indexer references: System, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089
2013-06-07 09:35:28,086 INFO  Publisher - Indexer references: System.Windows.Forms, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089
2013-06-07 09:35:28,086 INFO  Publisher - Indexer references: WindowsFormsIntegration, Version=3.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35
2013-06-07 09:35:28,086 INFO  Publisher - Indexer references: Controls, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null
2013-06-07 09:35:28,086 INFO  Publisher - Indexer references: IFilter, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null
2013-06-07 09:35:28,086 INFO  Publisher - Indexer references: log4net, Version=1.2.10.0, Culture=neutral, PublicKeyToken=1b44e1d426115821
2013-06-07 09:35:28,086 INFO  Publisher - Indexer references: Lucene.Net, Version=2.9.2.2, Culture=neutral, PublicKeyToken=null
2013-06-07 09:35:28,086 INFO  Publisher - Indexer references: System.Drawing, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a
2013-06-07 09:35:28,086 INFO  Publisher - Indexer references: PresentationCore, Version=3.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35
2013-06-07 09:35:28,086 INFO  Publisher - Indexer references: Microsoft.Practices.Unity, Version=2.0.414.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35
2013-06-07 09:35:28,086 INFO  Publisher - Indexer references: System.Configuration, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a
2013-06-07 09:35:28,086 INFO  Publisher - Indexer references: System.Core, Version=3.5.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089
2013-06-07 09:35:28,086 INFO  Publisher - Indexer references: System.Data.Entity, Version=3.5.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089
2013-06-07 09:35:28,086 INFO  Publisher - Indexer references: System.Xml, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089

These lines record the Indexer version, the .NET assemblies, and all of the files necessary to run the program. If the Indexer does not start up for any reason, simply check the version, uninstall the Publisher, and re-install with the latest version.
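When comparing installations, it can help to pull the versions out of the startup lines rather than read them by eye. This is a rough sketch based only on the line layout shown above; the function name and the regular expressions are illustrative assumptions, and other Indexer builds may log these lines differently.

```python
import re

# "Indexer started as x64 application. Assembly version: Indexer, Version=..."
STARTED_RE = re.compile(
    r"Indexer started as (?P<arch>\S+) application\. "
    r"Assembly version: Indexer, Version=(?P<version>[\d.]+)"
)
# "Indexer references: <assembly>, Version=<version>, ..."
REFERENCE_RE = re.compile(
    r"Indexer references: (?P<name>[^,]+), Version=(?P<version>[\d.]+)"
)


def read_startup_versions(lines):
    """Return (architecture, indexer_version, {assembly: version}) from log lines."""
    arch = indexer_version = None
    references = {}
    for line in lines:
        started = STARTED_RE.search(line)
        if started:
            arch, indexer_version = started["arch"], started["version"]
            continue
        ref = REFERENCE_RE.search(line)
        if ref:
            references[ref["name"]] = ref["version"]
    return arch, indexer_version, references
```

If the log is restarted several times, the values returned are those of the last startup found in the lines passed in.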

When a message gets indexed, the Indexer will grab each attachment, extract it, and index it accordingly:

2013-06-05 15:39:56,646 INFO  Publisher - Indexing attachment. Message Id: 1509, Attachment Id: 4222.
2013-06-05 15:39:56,647 INFO  Publisher - IFilter. ExtractorsCollector. App will use default extractor for 'htm'.
2013-06-05 15:39:56,648 INFO  Publisher - Indexing message. Message Id: 1510.

Seeing this in the log file means that the message was successfully indexed. Each message has its own ID, which is used in the database for identification and for linking the documents (messages and attachments) together.
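Because every entry carries a Message Id and, for attachments, an Attachment Id, the log can be grouped by message. The sketch below assumes the two line formats shown above ("Indexing attachment." and "Indexing message."); the function name is hypothetical.

```python
import re

ATTACHMENT_RE = re.compile(
    r"Indexing attachment\. Message Id: (\d+), Attachment Id: (\d+)\."
)
MESSAGE_RE = re.compile(r"Indexing message\. Message Id: (\d+)\.")


def attachments_by_message(lines):
    """Map each message ID seen in the log to the attachment IDs logged for it."""
    seen = {}
    for line in lines:
        attachment = ATTACHMENT_RE.search(line)
        if attachment:
            message_id, attachment_id = map(int, attachment.groups())
            seen.setdefault(message_id, []).append(attachment_id)
            continue
        message = MESSAGE_RE.search(line)
        if message:
            # Record the message even if no attachment line was logged for it.
            seen.setdefault(int(message.group(1)), [])
    return seen
```

The resulting IDs can then be looked up in the database to find the matching rows.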

When the Indexer comes across an attachment for which the filter type has been disabled in the settings, the log file will write this:

IFilter. ExtractorsCollector. filter for 'txt' not enabled

The Indexer is designed to look at every message and its attachments. As it comes across each one, it logs it. Any errors associated with indexing messages or attachments will be found in the log.
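To see at a glance which attachment types were skipped and why, the two messages quoted above ("was not found" and "not enabled") can be counted. This is a sketch that matches only those two wordings as they appear in this module; the function name is an assumption.

```python
import re
from collections import Counter

FILTER_RE = re.compile(
    r"filter for '(?P<ext>[^']+)' (?P<reason>was not found|not enabled)"
)


def skipped_types(lines):
    """Count skipped attachment types, split by the reason given in the log."""
    not_found, disabled = Counter(), Counter()
    for line in lines:
        match = FILTER_RE.search(line)
        if not match:
            continue
        # "was not found": no extractor exists; "not enabled": filter unchecked.
        target = not_found if match["reason"] == "was not found" else disabled
        target[match["ext"]] += 1
    return not_found, disabled
```

Types in the first counter cannot be indexed at all, while types in the second were deselected in the Indexer Filter Settings and can be re-enabled there.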

Database

After messages get indexed, the database will reflect those changes.

2. Index db. png.png

In the Attachment table, the Indexed column at the far right will show whether the attachment has been indexed. If it shows as 0 it has not been indexed. Alternatively, if it shows 1 then the index was successful.

Opening the database is something a customer may not do. However, opening the database and viewing the details of each message can be helpful to identify if there are specific messages that have not been indexed. It is an SQLite database and there are several free graphical editors available.

To show a list of the attachments that have been indexed, type in this query:

select * from Attachment where Indexed = 1;

Change the 1 to a 0 to see which attachments have not been indexed.
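If no graphical editor is at hand, the same check can be run with Python's built-in sqlite3 module. This is a minimal sketch that relies only on the Attachment table and its Indexed column as described above; the function names are hypothetical, and the database is opened read-only so the published data is not changed.

```python
import sqlite3
from pathlib import Path


def open_read_only(db_path):
    """Open a SQLite file read-only so the published export is never modified."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def indexed_counts(conn):
    """Return {0: not_indexed_count, 1: indexed_count} for the Attachment table."""
    counts = {0: 0, 1: 0}
    rows = conn.execute(
        "SELECT Indexed, COUNT(*) FROM Attachment GROUP BY Indexed"
    )
    for indexed, total in rows:
        counts[indexed] = total
    return counts
```

Pass the path of the published database to open_read_only and the resulting connection to indexed_counts. A non-zero count under 0 means some attachments are still waiting to be indexed or were skipped.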

Troubleshooting

Troubleshooting the indexer is basically looking in the log files and identifying any errors. Depending on the error, the message can then be tracked down in the Retain System, or you can open up the Publisher database to see if there is anything wrong with the database.

Most errors occur because the attachment type simply cannot be indexed. As mentioned previously, not all attachments can be indexed.

Any other errors may indicate a problem with the message or attachment. This is where it becomes difficult to troubleshoot, because once the message gets archived into Retain, it becomes static. In that state, there is no way to repair the message. The most you can do is identify the message that is corrupt and remove its attachment type from the filter. This essentially skips any messages with that attachment type.

The Indexer is mostly automatic and hard coded, so there is not much to the configuration except for the filters. At times messages might not be able to get indexed. This is usually caused by issues with the attachment itself.

If you do encounter a problem where the Indexer stops, check the log file (C:\Users\[loggedinuser]\AppData\Roaming\GWAVA\Retain Publisher). The log is called Indexer.log and will display all information.
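A long Indexer.log is mostly INFO lines, so filtering for anything else is a quick first pass. The sketch below assumes that problems are written with one of log4net's standard higher levels (WARN, ERROR, FATAL) in the same position as INFO in the samples in this module; those samples show INFO lines only, so treat that as an assumption. The function names are hypothetical, and the path is built from the Windows APPDATA variable, which normally points at the Roaming folder.

```python
import os
import re
from pathlib import Path

LINE_RE = re.compile(
    r"^(?P<stamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+)\s+(?P<logger>\S+) - (?P<text>.*)$"
)


def default_log_path():
    """Build the Indexer.log path from %APPDATA% (the Roaming folder on Windows)."""
    appdata = Path(os.environ.get("APPDATA", ""))
    return appdata / "GWAVA" / "Retain Publisher" / "Indexer.log"


def problem_lines(lines, ignore=("INFO", "DEBUG")):
    """Yield (timestamp, level, text) for timestamped lines above INFO level."""
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match and match["level"] not in ignore:
            yield match["stamp"], match["level"], match["text"]
```

Open the file returned by default_log_path and pass the file object to problem_lines. The timestamps it yields show where in the full log to look for the surrounding "Indexing attachment" entries.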

Since the data we are exporting is already static data on the Retain Server, we don’t have any utilities that will help to fix the message. Looking at the log can identify the problem. Unchecking the filter type for the message that cannot get indexed is the only workaround. Retrieving the attachment from the customer and testing it could help us duplicate the issue in-house. Doing this could provide insight into how the Indexer is handling the message and would need to be submitted to development.
