Retain Deletion Management
Revision as of 17:34, 27 May 2015

After reading the Retain Administration and Users Guide on Deletion Management, try this hands-on exercise, or refer to the "Advanced Information" section below for how it works.


Hands-On Exercise

Prerequisite:  Archive mail from your test mail system a '''few days in advance'''.  The more mail the better, but at least have a few items.  This exercise will require that the items be in Retain for over two days.

1.  Log in to your Retain server (the actual VM server, not the RetainServer web UI).
2.  Stop tomcat: rcretain-tomcat7 stop (Linux) or stop the Apache Tomcat service on Windows.
3.  Rename the "index" directory to "index.old".  The index directory is located under your Retain storage directory (see Server Configuration | Storage).
4.  Create a new index directory.  If on Linux, make tomcat the owner of that directory:  '''chown tomcat:tomcat index'''.
5.  From the server itself (not the Retain web UI), tail the current day's RetainServer log.  Baretail.exe can be downloaded and used on Windows servers (see [http://support2.gwava.com/kb/?View=entry&EntryID=527 Location of Logs]).  For Linux, change to /var/log/retain-tomcat7 and type: tail -f RetainServer.[yyyy-mm-dd].log.
6.  Start tomcat.
7.  Log in to the Retain Server web UI (http://[ip address]/RetainServer) as admin.
8.  Under the Management section in the left-hand pane, select '''Deletion Management'''.
9.  Under Core Settings, click on the checkbox for “Job Enabled” to enable it and select “Delete messages as they are processed.”
10. Under the Date Scope tab, click on the arrow for the drop-down box, “Delete messages where”.  Note the various options.  For this exercise, leave it at the default “Date Stored in Retain”, which is the date the items were archived.
11. While still under Date Scope, type “1” in the “Older than ___” field and leave the drop-down set to “Days”.
12. Under the Job Members tab, click on the drop-down box below “Include these objects” and select the mail server on which you wish to run this deletion job. Then, click on the “Add Mail Server” button.
13. Go ahead and check the boxes to have mail notifications sent to your mailbox.
14. Under the Schedule tab, click on the drop-down box for “Run Job when” and select “manual”.
15. Click on the “Save Changes” icon at the upper-right of the screen.
16. Click on the "Run Job Now" button.
17. Switch over to the actual Retain server and look at the tail of your RetainServer log.  What do you see happening?  Did anything get deleted?
18. Log in to a Retain mailbox and check whether anything older than 1 day is still there.  Was anything deleted?  Why not?
19. Log out of the RetainServer web UI.
20. Go back to the actual Retain server and stop tomcat.
21. Delete the new index directory you created.
22. Rename index.old to index.
23. Start tomcat.
24. Log in to the RetainServer web UI as admin.
25. Go to Deletion Management and, on the Schedule tab, click on the Run Job Now button.
26. Switch over to the actual Retain server and look at the tail of your RetainServer log.  Now what do you see?  Any deletion activity?
27. What is the role of indexes in a deletion job?  See Level 2 section to check your answer.
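The server-side portion of steps 2-4 and 21-23 boils down to a directory swap. Here is a sketch using a scratch path as a stand-in for your real Retain storage directory (substitute your own path, and stop/start tomcat around the swap as the steps describe):

```shell
# Steps 3-4: set the real index aside and give tomcat an empty one.
STORAGE=./retain-storage-demo        # stand-in for your Retain storage dir
mkdir -p "$STORAGE/index"            # pretend this is the live index
mv "$STORAGE/index" "$STORAGE/index.old"
mkdir "$STORAGE/index"
# chown tomcat:tomcat "$STORAGE/index"   # Linux only; tomcat must own it
# ... run the deletion job against the empty index (steps 6-18) ...
# Steps 21-22: throw away the empty index and restore the original.
rm -rf "$STORAGE/index"
mv "$STORAGE/index.old" "$STORAGE/index"
```

The point of the swap is that the job runs against an empty index, which (as the exercise demonstrates) means it finds nothing to delete.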

Bug Watch

These are the important bugs/enhancements I'm watching:

* [http://bugzilla.gwava.com/show_bug.cgi?id=7441 Bug 7441 (logging enhancement)]
* [http://bugzilla.gwava.com/show_bug.cgi?id=5914 Bug 5914 (Report shows messages which should not be eligible for deletion)]
* [http://bugzilla.gwava.com/show_bug.cgi?id=7281 Bug 7281 ("PREVENTLOOP - ABORT" errors in larger deletion jobs)]
* [http://bugzilla.gwava.com/show_bug.cgi?id=6120 Bug 6120 (the deletion report stops at 10,001 items when there are more)]
* [http://bugzilla.gwava.com/show_bug.cgi?id=7449 Bug 7449 (show the Retain message ID and all of its associated hashes in the deletion report rather than the natural message ID from the mail system)]

Advanced Information

This section describes the deletion process. For each stage, the log entries you should expect to see in the RetainServer log are shown. If the Server logging level is set to Diagnostic, the log includes all entries that apply to the situation (INFO and DEBUG); otherwise it shows only INFO entries. Each logging example therefore gives the condition followed by the text (in quotes) that appears in the log. For example, stage 1 shows its first log line as: If INFO: "Running deletion job". Here, "If INFO" is the condition (the logging level set on the Server) and "Running deletion job" is the text displayed in the log.

STAGE 1 – The Deletion Job

1. Deletion job is initiated.

If INFO: “Running deletion job.”
If DEBUG and if report only mode: “Will only report, not delete”
If DEBUG and not report only mode: “Normal deletion mode”
If DEBUG: “autoApprove” [Boolean value]
If DEBUG: “Deletion report stored at: [filePath]"
If DEBUG and PO chosen: “Deleting PO [PO]"


2. Searches the indexes.

It starts by sending a query to the Indexer with the given scope and gets a total number of hits as well as the number of hits within the first page. A page consists of 500 records.


If DEBUG: "On Page # [pageNumber]"
If INFO: "Aborting early, deliberately since this is a trial" (shown in REPORT ONLY mode when the results run past the 20-page limit)
          By default, only the first 10,000 items to be deleted are reported
          (500 items per page and 20 pages by default: 500 x 20 = 10,000)
If INFO: "Result page TotalHits: [total number of hits from index] [hits on current page] in result set"
[loglevel unknown]: "inSet: [list of all message IDs that are identified]"

It puts all those results in a list of items to be deleted from the index. The list is kept in memory and is lost if Tomcat is stopped at this point.
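The paging pass described above can be sketched as a loop (an illustration only, not Retain code): walk the result set 500 records at a time, stopping at the 20-page report cap mentioned earlier.

```shell
# Walk a hypothetical result set of 10,500 hits, 500 per page,
# capping the report at 20 pages (500 x 20 = 10,000 items).
TOTAL_HITS=10500 ; PAGE_SIZE=500 ; MAX_PAGES=20
page=0 ; reported=0
while [ $(( page * PAGE_SIZE )) -lt $TOTAL_HITS ] && [ $page -lt $MAX_PAGES ]; do
  page=$(( page + 1 ))
  reported=$(( reported + PAGE_SIZE ))
done
echo "pages=$page reported=$reported"
```

With more than 10,000 hits, the loop stops at pages=20 and reported=10000, which is why a report-only run never lists more than the first 10,000 items.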

As it queries the Indexer, you'll see this entry in the Indexer log. Note that the following example is from a deletion job where the date range was set to "1 day or older" and triggered off of delivered date:


IndexSearchMessageConsumerImpl - DeletionQuery:delivered:[* TO 20150226215106] AND (domain:[domain] AND postoffice:[post office])

From the entry above, you can see that the job is triggered off of the delivered date, because it states "DeletionQuery:delivered". Since the date range was "1 day or older", the range of the query is shown as: * TO [numeric string]. The begin date of the range is represented by an asterisk because it goes back infinitely. The end date of the range is a numeric string whose first 8 characters represent the day before the date the job was run (I ran the deletion job on 2015-02-27). I do not know what the last 6 digits represent at this time.
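The numeric string splits mechanically into the two parts. The date reading is confirmed by the example above; reading the last six digits as an HHMMSS time of day is only a guess (21:51:06 would be roughly 24 hours before a job run on 2015-02-27, matching "1 day or older"):

```shell
# Split the cutoff from the DeletionQuery example above.
# First 8 chars = yyyymmdd (the day before the run, per the text).
# Treating the last 6 as hhmmss is an unconfirmed assumption.
CUTOFF=20150226215106
echo "date: ${CUTOFF:0:8}"
echo "time (assumed hhmmss): ${CUTOFF:8:6}"
```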

It then displays the total hits found:

 
IndexSearcher.search: +delivered:[* TO 20150226215106] +(+domain:steve +postoffice:parents), totalHits: 8144, localHits:1000, sort:<score>

It appears that the database driver picks this up and displays all of the Retain message IDs from retain.t_message.message_id(?) after "inSet:"


[DeletionJobTask-1-thread-1] TRACE com.gwava.hibernate.HibernateStringUtil - inSet: [message IDs]

... and then proceeds to list the first 500:


[DeletionJobTask-1-thread-1] INFO  com.gwava.deletion.AbstractDeletionTaskExecutor - List<EMail>: [message IDs]

At this point, it is assumed that it sends that list to the Indexer with instructions to delete them from the indexes as we see these entries in the Indexer log:


IndexDeletionMessageConsumerImpl - Trying to process the operation: INDEX_DELETE
LuceneIndexingManager - Delete [message ID] from index
LuceneIndexingManager - Delete [message ID] from index
etc...

It does this in batches of 500, as we see these log entries in the Indexer log every 500 items:


IndexSearchMessageConsumerImpl - Trying to process the operation: INDEX_DELETION_TASK_SEARCH
IndexSearchMessageConsumerImpl - DeletionQuery:delivered:[* TO 20150226215106] AND (domain:steve AND postoffice:parents)
LuceneIndexingSearcher - searching with offset: 0, count: 500
LuceneSearchController - query: delivered:[* TO 20150226215106] AND (domain:steve AND postoffice:parents) field: uuid
LuceneSearchController - query.toString: +delivered:[* TO 20150226215106] +(+domain:steve +postoffice:parents)
LuceneSearchController - 1425073871005, start query: +delivered:[* TO 20150226215106] +(+domain:steve +postoffice:parents)
LuceneSearchController - 1425073871014, end query: +delivered:[* TO 20150226215106] +(+domain:steve +postoffice:parents)
LuceneSearchController - IndexSearcher.search: +delivered:[* TO 20150226215106] +(+domain:steve +postoffice:parents), totalHits: 7644, localHits:1000, sort:<score>
IndexDeletionMessageConsumerImpl - Trying to process the operation: INDEX_DELETE

3. Removes litigation hold messages

Removes all messages pertaining to users that are on litigation hold from the list of messages to be deleted.


If DEBUG: "[emailID] is under litigation hold and cannot be deleted"
If DEBUG: "[emailID] doesn’t belong to user"

4. Updates each message record, putting it in a “deleted” state

This is known as putting the message record "in the dumpster".

There are no log messages for this step. The dumpster is a folder record in the t_folder table and is literally named "Dumpster".

You can find the messages waiting to be deleted in the t_message table by using the following query: SELECT * FROM retain35.t_message WHERE folder_id='273';

NOTE: "273" is the folder_id from the t_folder table, which you can get by executing this SQL query: SELECT * FROM retain35.t_folder WHERE f_name='Dumpster';



THE FOLLOWING INFORMATION WAS GIVEN TO ME BY MIKE BELL AND IS VERY SKETCHY - WAITING FOR MORE ACCURATE AND PIN-POINT EXPLANATIONS


5. Once the search of the index is completed (reference step #2), the indexer gets called to delete those items from the index. The time for this to occur varies depending on the indexer load and the scope of the deletion job. It could take minutes, hours, or weeks.


At this point, a browse of the archive mailbox already fails to show the item(s) because the item is now marked for deletion in the database.


Between the time the item is marked for deletion in the database and the time the indexer removes it from the index, it is possible for someone to perform a search of the mailbox (which uses the index) and get back a hit; under this scenario, however, all that happens is that the item is shown in the search results as “---- this messages is queued for deletion ----” and nothing more.


The deletion job keeps looping through each page of the index until it is done. This happens while the INDEXER continues working on deleting items from the index. Once the job is done with the list, the deletion job ends.

STAGE 2 – The Dumpster Thread

The "dumpster thread" is awakened by the end of the deletion job and looks for new items in the dumpster. The dumpster state is tracked in the t_dsref.f_state field, which can have one of two values:

  • 0 = normal
  • 1 = queued for deletion


As it finds items with a state value of "1", it pages through them and deletes those records in the database. Events that can trigger the dumpster thread are:

  • Deletion job completion.
  • Server boot up.
  • Maintenance.


If the final document entry is orphaned (reference count=0), it calls the StorageEngine to delete the physical file. See “Storage Engine Process” for more information (below).
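The reference-count idea can be illustrated with filesystem hard links (an analogy only: Retain tracks references in t_dsref rows, and the StorageEngine, not rm, removes the file; the blob name below is made up):

```shell
# Two "references" to one archived blob; the blob may only be removed
# once the last reference is gone (reference count = 0).
mkdir -p blobstore
echo "archived message" > blobstore/abc123
ln blobstore/abc123 ref_user1        # first t_dsref-style reference
ln blobstore/abc123 ref_user2        # second reference (deduplicated copy)
rm ref_user1                         # one reference removed: blob survives
test -e blobstore/abc123 && echo "blob still present"
rm ref_user2                         # last reference removed...
rm blobstore/abc123                  # ...so the physical file is deleted
```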


When "Dumpster Thread ending" appears in the server log, it just means its last pass found nothing left to process in the database. It does not necessarily mean that the physical file itself has been deleted yet by the storage engine.

Storage Engine Process

A StorageEngine is just a black box with a delete method attached. How it works and its speed depends on the implementation:

Retain 2.x and Before: Standard Storage Engine
StandardStorageEngine, which was used before 3.0, simply finds the file and deletes it; a simple, straightforward method.
This was changed in Retain 3 because a straightforward implementation is too slow: it would synchronously delete index pieces, database pieces, and so on. Deleting each piece and waiting for the delete to finish in a linear fashion would be easy, but it locks things up for days.
Retain 3.x: DataStore_Process Storage Engine
DataStore does more complicated things. Internally it simply finds the references in t_dsref and marks them in the deleted state as mentioned in the "Dumpster Thread" discussion (above). It does nothing more.
At some point, another background thread specific to the DataStore looks for items in the deleted state. After performing various sanity checks, it removes the t_dsref entry and the storage engine deletes the corresponding file from disk.