Retain Publishing Indexer
Level 1
In the Publisher Training module it was mentioned that, after publishing, the Indexer will kick off automatically if the configuration is set to do so. The Indexer can also be run manually at any time.
In this module you will learn:
- Index Filtering
- Indexing Modes: Automatic or Manual
- Indexing
Index Filtering
Both the automatic and manual Indexer processes include Indexer Filter Settings. The last screen before retrieving messages lets you select or deselect the attachment types to index. Deselecting types may be useful if you only search for a specific type of attachment and want to leave the others out of your search.
In the manual Indexer, the Indexer Filter Settings are reached through a button. Click the button, select or deselect your types, and click Close.
Indexing Setup
As messages get published, they are stored in a SQLite database. When the Viewer is launched, it queries the database and displays the messages that have been exported. The Retain Viewer can also search for messages, and this is where the Indexer comes in. Until the messages have been indexed, the Viewer will not allow you to search them, and the search option will be greyed out. In that case, run the Indexer to enable searching.
The Indexer is installed automatically during the installation of the Publisher; there is no separate installation for the Indexer. Publisher.exe is what needs to be run to index the messages.
There are 2 modes for the Indexer:
- Automatic
- Manual
Automatic
Automatic indexing is set up during publishing, at the step where you select the location for your exported messages.
After choosing the location of the Published files, click the box: "Start Indexer after messages have been retrieved". At the end of the publishing, the Indexer process will automatically start.
Manual
If you do not check the automatic option to index after message retrieval, you can still run the Indexer on a database manually. After publishing messages, you can launch the Indexer (Start Menu) and point it to the database to index. It will run the indexing exactly the same as during the automatic run.
Launching the Indexer opens a screen that lists any previously indexed locations, or is blank if there are none.
To add a new location (a database that needs to be indexed), click Add location and select the location of the db. You only have to select the parent directory above the db to access the data that needs to be indexed. It is important, though, that there is a database somewhere in the directory structure you are trying to index. Look for the db directory (base.db) to verify that it is the correct location.
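The check described above can be done before opening the Indexer. The following is a minimal sketch, not part of the product: the helper name `find_database_dirs` is hypothetical, and it only assumes that the database appears as an entry named base.db somewhere below the selected parent directory.

```python
from pathlib import Path
from typing import List


def find_database_dirs(parent: Path, db_name: str = "base.db") -> List[Path]:
    """Return every directory under `parent` that holds an entry named `db_name`.

    Hypothetical helper: it mirrors the manual check of looking for base.db
    before adding a location in the Indexer.
    """
    parent = Path(parent)
    if not parent.is_dir():
        return []
    # rglob searches the whole tree, so pointing at a parent directory is enough.
    return sorted({match.parent for match in parent.rglob(db_name)})
```

An empty result for a directory suggests that it is not a Publisher export location and that the Indexer would have nothing to index there.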
Indexing
Indexing is fully automatic and runs on its own. After selecting the location and filters, click Start Indexing to begin the process. Depending on how many messages need to be indexed, it can take a long time or finish very quickly. As it indexes, the Indexer shows the document it is currently working on. The bigger the document, the longer it takes to process.
Once the indexer is finished, there will be a green checkmark next to the location that signifies it is complete. Click Close to close out of the Indexer.
Level 2
The Publishing Indexer indexes the messages exported by the Publisher utility so that the Viewer can search them. In this level of the Publishing Indexer training, we will take a look at:
- Indexer Log
- Database
- Troubleshooting
Indexer Log
As the Indexer runs, it uses components called Extractors that collect the messages and index them. Each extractor is designed for the specific type of attachment being indexed. If you look at the list of Indexing types, each one has its own extractor (see graphic under Level 1 | Index Filtering).
Not all files can be extracted and indexed. Only the file types listed in the Filters window can be indexed. The following log lines show an attachment type that cannot be indexed, in this case the .img file type:
2013-06-05 15:39:56,648 INFO Publisher - Indexing attachment. Message Id: 1510, Attachment Id: 4226.
2013-06-05 15:39:56,651 INFO Publisher - IFilter. ExtractorsCollector. filter for 'img' was not found
As messages are indexed, the details are written to a log file found in: C:\Users\"loggedinuser"\AppData\Roaming\GWAVA\Retain Publisher
The log file is called: Indexer.log
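For scripted checks it can help to build this path rather than type it. The sketch below assumes the standard Windows APPDATA environment variable points at the logged-in user's AppData\Roaming directory; the function name `indexer_log_path` is hypothetical.

```python
from pathlib import PureWindowsPath


def indexer_log_path(appdata: str) -> PureWindowsPath:
    """Build the Indexer.log path below a user's AppData\\Roaming directory."""
    # The folder and file names are the ones given in this module.
    return PureWindowsPath(appdata) / "GWAVA" / "Retain Publisher" / "Indexer.log"
```

On the machine that ran the Indexer, `indexer_log_path(os.environ["APPDATA"])` (after `import os`) would give the log location for the current user.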
When starting the Indexer, either automatically or manually, this is what shows up in the log.
2013-06-07 09:35:28,058 INFO Publisher - Indexer started as x64 application. Assembly version: Indexer, Version=3.1.4850.34906, Culture=neutral, PublicKeyToken=null.
2013-06-07 09:35:28,084 INFO Publisher - Indexer environment: Microsoft Windows NT 6.1.7601 Service Pack 1 (x64)
2013-06-07 09:35:28,085 INFO Publisher - Indexer references: Core, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null
2013-06-07 09:35:28,086 INFO Publisher - Indexer references: mscorlib, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089
2013-06-07 09:35:28,086 INFO Publisher - Indexer references: System, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089
2013-06-07 09:35:28,086 INFO Publisher - Indexer references: System.Windows.Forms, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089
2013-06-07 09:35:28,086 INFO Publisher - Indexer references: WindowsFormsIntegration, Version=3.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35
2013-06-07 09:35:28,086 INFO Publisher - Indexer references: Controls, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null
2013-06-07 09:35:28,086 INFO Publisher - Indexer references: IFilter, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null
2013-06-07 09:35:28,086 INFO Publisher - Indexer references: log4net, Version=1.2.10.0, Culture=neutral, PublicKeyToken=1b44e1d426115821
2013-06-07 09:35:28,086 INFO Publisher - Indexer references: Lucene.Net, Version=2.9.2.2, Culture=neutral, PublicKeyToken=null
2013-06-07 09:35:28,086 INFO Publisher - Indexer references: System.Drawing, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a
2013-06-07 09:35:28,086 INFO Publisher - Indexer references: PresentationCore, Version=3.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35
2013-06-07 09:35:28,086 INFO Publisher - Indexer references: Microsoft.Practices.Unity, Version=2.0.414.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35
2013-06-07 09:35:28,086 INFO Publisher - Indexer references: System.Configuration, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a
2013-06-07 09:35:28,086 INFO Publisher - Indexer references: System.Core, Version=3.5.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089
2013-06-07 09:35:28,086 INFO Publisher - Indexer references: System.Data.Entity, Version=3.5.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089
2013-06-07 09:35:28,086 INFO Publisher - Indexer references: System.Xml, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089
These lines record the Indexer version, the .NET assemblies, and the other files needed to run the program. If the Indexer does not start for any reason, check the version, then uninstall the Publisher and reinstall the latest version.
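When comparing versions across machines, the startup lines can be read with a short script. This is a sketch that assumes the log keeps the exact wording shown above; the names `STARTED`, `REFERENCE`, `indexer_version` and `referenced_versions` are made up for illustration.

```python
import re
from typing import Dict, Optional

# Patterns follow the startup lines quoted in this module.
STARTED = re.compile(
    r"Indexer started as (?P<arch>\w+) application\. "
    r"Assembly version: Indexer, Version=(?P<version>[\d.]+)"
)
REFERENCE = re.compile(
    r"Indexer references: (?P<name>[^,]+), Version=(?P<version>[\d.]+)"
)


def indexer_version(log_text: str) -> Optional[str]:
    """Return the Indexer assembly version from the first startup line, if any."""
    match = STARTED.search(log_text)
    return match.group("version") if match else None


def referenced_versions(log_text: str) -> Dict[str, str]:
    """Map each referenced assembly name to the version written in the log."""
    return {
        match.group("name").strip(): match.group("version")
        for match in REFERENCE.finditer(log_text)
    }
```

If the log holds several startup blocks, `referenced_versions` keeps the last version seen for each assembly.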
As messages are indexed, the Indexer extracts each attachment and indexes it accordingly. The log shows this as follows:
2013-06-05 15:39:56,646 INFO Publisher - Indexing attachment. Message Id: 1509, Attachment Id: 4222.
2013-06-05 15:39:56,647 INFO Publisher - IFilter. ExtractorsCollector. App will use default extractor for 'htm'.
2013-06-05 15:39:56,648 INFO Publisher - Indexing message. Message Id: 1510.
When you see these lines in the log file, the message was successfully indexed. Each message has its own id, which the database uses to identify and link the related documents (messages and attachments).
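Because the ids in the log match the ids in the database, collecting them from the log gives a quick list to look up. The following is a minimal sketch that assumes the log wording shown above; the helper name `attachments_by_message` is hypothetical.

```python
import re
from typing import Dict, List

# Patterns follow the "Indexing attachment" and "Indexing message" lines above.
ATTACHMENT = re.compile(
    r"Indexing attachment\. Message Id: (\d+), Attachment Id: (\d+)"
)
MESSAGE = re.compile(r"Indexing message\. Message Id: (\d+)")


def attachments_by_message(log_text: str) -> Dict[int, List[int]]:
    """Group the attachment ids seen in the log under their message id."""
    grouped: Dict[int, List[int]] = {}
    # Messages without attachments still get an (empty) entry.
    for match in MESSAGE.finditer(log_text):
        grouped.setdefault(int(match.group(1)), [])
    for match in ATTACHMENT.finditer(log_text):
        message_id, attachment_id = int(match.group(1)), int(match.group(2))
        grouped.setdefault(message_id, []).append(attachment_id)
    return grouped
```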
If a type is disabled in the Filters settings and the Indexer comes across a message with an attachment of that type, the log file will show this:
IFilter. ExtractorsCollector. filter for 'txt' not enabled
The Indexer log records every message and attachment as it is indexed. Any errors associated with indexing messages will be found in this log.
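The three extractor outcomes described in this section (default extractor used, filter not found, filter not enabled) can be tallied per attachment type. This is a sketch that assumes the log wording quoted above; the labels and the function name `summarize_filters` are made up for illustration.

```python
import re
from collections import Counter
from typing import Dict

# One pattern per ExtractorsCollector message quoted in this module.
PATTERNS = {
    "no_extractor": re.compile(r"filter for '([^']+)' was not found"),
    "disabled": re.compile(r"filter for '([^']+)' not enabled"),
    "default_extractor": re.compile(r"default extractor for '([^']+)'"),
}


def summarize_filters(log_text: str) -> Dict[str, Counter]:
    """Count, per outcome, how often each attachment type appears in the log."""
    return {
        label: Counter(pattern.findall(log_text))
        for label, pattern in PATTERNS.items()
    }
```

A large count under `no_extractor` for one type points to attachments that cannot be indexed at all, while entries under `disabled` reflect the current Filters settings.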
Database
After messages get indexed, the database will reflect those changes.
In the Attachment table, the Indexed column at the far right shows whether the attachment has been indexed: 0 means it has not been indexed, and 1 means it was indexed successfully. Opening the database is not something a customer will typically do. However, opening the database and viewing the details can help identify specific attachments that have not been indexed. To list the attachments that have been indexed, run this query: select * from Attachment where Indexed = 1; Change the 1 to a 0 to list the attachments that have not been indexed.
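The same check can be scripted with Python's built-in sqlite3 module. The sketch below relies only on the Attachment table and its Indexed column as described above; the demo table is a hypothetical stand-in with a minimal schema, not the real Publisher schema.

```python
import sqlite3
from typing import Dict


def indexed_counts(connection: sqlite3.Connection) -> Dict[int, int]:
    """Count Attachment rows by Indexed flag (0 = not indexed, 1 = indexed)."""
    counts = {0: 0, 1: 0}
    query = "SELECT Indexed, COUNT(*) FROM Attachment GROUP BY Indexed"
    for flag, total in connection.execute(query):
        if flag is not None:
            counts[int(flag)] = total
    return counts


def build_demo_database() -> sqlite3.Connection:
    """Create an in-memory stand-in; the real Attachment table has more columns."""
    connection = sqlite3.connect(":memory:")
    # Hypothetical minimal schema: only the Indexed column comes from the text.
    connection.execute(
        "CREATE TABLE Attachment (Id INTEGER PRIMARY KEY, Indexed INTEGER)"
    )
    connection.executemany(
        "INSERT INTO Attachment (Id, Indexed) VALUES (?, ?)",
        [(4222, 1), (4226, 0), (4227, 1)],
    )
    return connection
```

Against a real export, a connection opened with `sqlite3.connect` on the published database would take the place of the demo connection.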
Troubleshooting
Troubleshooting the Indexer mostly consists of checking the log files and identifying any errors. Depending on the error, the message can then be tracked down in the Retain System, or you can open the Publisher database to see if there is anything wrong with it.
Most errors occur because the attachment type simply cannot be indexed. As mentioned previously, not all attachments can be indexed.
Any other errors may indicate a problem with the message or attachment. This is where troubleshooting becomes difficult, because once a message is archived into Retain it becomes static. In that state, there is no way to repair the message. The most you can do is identify the corrupt message and deselect its attachment type in the filter. This effectively skips indexing for that attachment type.
The Indexer is mostly automatic and hard-coded, so there is not much to configure except the filters. At times a message cannot be indexed. This is usually caused by issues with the attachment itself.
If you do encounter a problem where the Indexer stops, check the log file (C:\Users\"WindowsUser"\AppData\Roaming\GWAVA\Retain Publisher). The log is called Indexer.log and will display all information.
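A long Indexer.log can be narrowed down to the lines most likely to matter. The sketch below assumes every entry follows the "timestamp LEVEL Publisher - message" layout seen in the excerpts, and that problems are written at the standard log4net levels WARN, ERROR or FATAL. Only INFO lines appear in this module, so that second point is an assumption, and the function name `problem_lines` is hypothetical.

```python
import re
from typing import List, Sequence

# Layout taken from the log excerpts: "2013-06-05 15:39:56,648 INFO Publisher - ..."
LINE = re.compile(
    r"^(?P<stamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+(?P<logger>\S+) - (?P<text>.*)$"
)


def problem_lines(
    log_text: str, levels: Sequence[str] = ("WARN", "ERROR", "FATAL")
) -> List[str]:
    """Return the log lines whose level is one of `levels`."""
    hits = []
    for raw in log_text.splitlines():
        line = raw.strip()
        match = LINE.match(line)
        # Assumption: problems are logged at WARN, ERROR or FATAL.
        if match and match.group("level") in levels:
            hits.append(line)
    return hits
```

The message ids on the lines just before a hit are a reasonable starting point for tracking the message down in the database.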
Because the data we are exporting is already static on the Retain Server, we don’t have any utilities that will help to fix the message. The log can identify the problem, and unchecking the filter type for the attachment that cannot be indexed is the only workaround. Retrieving the attachment from the customer and testing it may let us duplicate the issue in-house. This could provide insight into how the Indexer handles the message, and the findings would need to be submitted to development.