Difference between revisions of "Retain Deletion Management"

Revision as of 21:32, 29 December 2014

After reading the Retain Administration and Users Guide on Deletion Management, try this hands-on exercise or refer to the section below it, "How It Works".

Hands-On Exercise

Prerequisite: Archive mail from your test mail system a '''few days in advance'''. The more mail the better, but at least have a few items. This exercise will require that the items be in Retain for over two days.

1. Log in to your Retain server (the actual VM server, not the RetainServer web UI).
2. Stop tomcat: rcretain-tomcat7 stop (Linux) or stop the Apache Tomcat service on Windows.
3. Rename the "index" directory to "index.old". The index directory is located under your Retain storage directory (see Server Configuration | Storage).
4. Create a new index directory. If on Linux, make tomcat the owner of that directory: '''chown tomcat:tomcat index'''.
5. On the Retain server itself (not the Retain web UI), tail the current day's RetainServer log. On Windows servers, Baretail.exe can be downloaded and used (see [http://support2.gwava.com/kb/?View=entry&EntryID=527 Location of Logs]). On Linux, change to /var/log/retain-tomcat7 and type: tail -f RetainServer.[yyyy-mm-dd].log.
6. Start tomcat.
7. Log in to the Retain Server web UI (http://[ip address]/RetainServer) as admin.
8. Under the Management section in the left-hand pane, select '''Deletion Management'''.
9. Under Core Settings, click on the checkbox for “Job Enabled” to enable it and select “Delete messages as they are processed.”
10. Under the Date Scope tab, click on the arrow for the drop-down box, “Delete messages where”. Note the various options. For this exercise, leave it at the default “Date Stored in Retain”, which is the date the items were archived.
11. While still under Date Scope, type “1” in the “Older than ___” field and leave the drop-down set to “Days”.
12. Under the Job Members tab, click on the drop-down box below “Include these objects” and select the mail server on which you wish to run this deletion job. Then, click on the “Add Mail Server” button.
13. Go ahead and check the boxes to have mail notifications sent to your mailbox.
14. Under the Schedule tab, click on the drop-down box for “Run Job when” and select “manual”.
15. Click on the “Save Changes” icon at the upper-right of the screen.
16. Click on the "Run Job Now" button.
17. Switch over to the actual Retain server and look at the tail of your RetainServer log. What do you see happening? Did anything get deleted?
18. Log in to a Retain mailbox and check whether anything older than 1 day is still there. Was anything deleted? Why not?
19. Log out of the RetainServer web UI.
20. Go back to the actual Retain server and stop tomcat.
21. Delete the new index directory you created.
22. Rename index.old to index.
23. Start tomcat.
24. Log in to the RetainServer web UI as admin.
25. Go to Deletion Management and, on the Schedule tab, click on the Run Job Now button.
26. Switch over to the actual Retain server and look at the tail of your RetainServer log. Now what do you see? Any deletion activity?
27. What is the role of indexes in a deletion job? See Level 2 section to check your answer.

How It Works

This section describes the deletion process. Each stage lists the log entries you should expect to see in the RetainServer log. If the Server logging level is set to diagnostic, the log includes all entries that apply to the situation (INFO and DEBUG); otherwise, it shows only INFO entries. Each logging example therefore gives the conditions, followed by the text (in quotes) that appears in the log. For example, Stage 1 shows the first log line as follows: If INFO: "Running deletion job." Here, "If INFO" is the condition (the logging level set on the Server) and "Running deletion job." is the text displayed in the log.

STAGE 1 – The Deletion Job

1. Deletion job is initiated.


If INFO: “Running deletion job.”
If DEBUG and if report only mode: “Will only report, not delete”
If DEBUG and not report only mode: “Normal deletion mode”
If DEBUG: “autoApprove” [Boolean value]
If DEBUG: “Deletion report stored at: [filePath]"
If DEBUG and PO chosen: “Deleting PO [PO]"

2. Searches the Indexes.

It searches the indexes in pages, looping through them a "page" at a time. A "page" consists of 500 records.


If DEBUG: “On Page # [pageNumber]"
If INFO and if the last page number is less than 20 and if in reporting only mode: “Aborting early, deliberately since this is a trial”
        By default, we only report the first 10,000 items to be deleted 
        (500 items per page and 20 pages by default: 500 x 20 = 10,000)
If INFO: “Result page TotalHits: [hits on page] [listOfIDs] in result set”

It puts all those items in a list of items to be deleted from the index. The list is kept in memory and is lost if Tomcat is stopped at this point.
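The paging arithmetic in this step can be sketched in Python. This is only an illustration, not Retain's actual code: the names `collect_deletion_candidates`, `search_page`, and `make_fake_index` are hypothetical, and the exact point at which a report-only run stops is an assumption. Only the page size of 500 and the 20-page reporting limit come from the description above.

```python
from typing import Callable, List

PAGE_SIZE = 500        # records per index "page" (from the text above)
REPORT_PAGE_CAP = 20   # pages examined in report-only mode (from the text above)


def collect_deletion_candidates(
    search_page: Callable[[int, int], List[str]],
    report_only: bool,
) -> List[str]:
    """Page through index hits and build the in-memory deletion list.

    search_page(page_number, page_size) is a hypothetical stand-in for the
    index query; it returns the item IDs on that page (empty when exhausted).
    """
    candidates: List[str] = []
    page_number = 0
    while True:
        if report_only and page_number >= REPORT_PAGE_CAP:
            # Trial run: stop after 500 x 20 = 10,000 items (assumed behavior).
            break
        ids = search_page(page_number, PAGE_SIZE)
        if not ids:
            break
        candidates.extend(ids)
        page_number += 1
    return candidates


def make_fake_index(total: int) -> Callable[[int, int], List[str]]:
    """Build a fake index holding `total` items, for demonstration only."""
    def search_page(page_number: int, page_size: int) -> List[str]:
        start = page_number * page_size
        return [f"msg-{i}" for i in range(start, min(start + page_size, total))]
    return search_page
```

With a fake index of 12,345 items, a report-only run collects 10,000 IDs while a normal run collects all 12,345. Because the list lives only in memory, nothing in this sketch survives a restart, which mirrors the note about stopping Tomcat.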

3. Removes Litigation Hold Messages

Removes all messages pertaining to users that are on litigation hold from the list of messages to be deleted.


If DEBUG: "[emailID] is under litigation hold and cannot be deleted"
If DEBUG: "[emailID] doesn’t belong to user"
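The filtering in this step amounts to dropping every candidate whose owner is on litigation hold. The Python sketch below is a hedged illustration only: `remove_litigation_hold`, `owner_of`, and `users_on_hold` are hypothetical names, and how Retain actually maps a message to its owner is not described here.

```python
from typing import Dict, Iterable, List, Set


def remove_litigation_hold(
    candidates: Iterable[str],
    owner_of: Dict[str, str],
    users_on_hold: Set[str],
) -> List[str]:
    """Return the candidates whose owner is not on litigation hold.

    owner_of maps a message ID to a user name (hypothetical lookup).
    Messages with no known owner are kept in this sketch; that choice
    is an assumption, not documented Retain behavior.
    """
    kept: List[str] = []
    for message_id in candidates:
        owner = owner_of.get(message_id)
        if owner is not None and owner in users_on_hold:
            # Held messages stay in the archive and leave the deletion list.
            continue
        kept.append(message_id)
    return kept
```

For example, with messages m1 and m3 owned by a held user and m2 owned by someone else, only m2 remains on the deletion list.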

4. Updates each message record, putting it in a “deleted” state (a.k.a., "Dumpster").

There are no log messages for this step. This step is where it puts the messages in the "dumpster". The dumpster is a folder record in the t_folder table and is literally named "Dumpster".

You can find the messages waiting to be deleted in the t_message table by using the following query: SELECT * FROM retain35.t_message where folder_id='273';

NOTE: "273" is the folder_id from the t_folder table which you can get by executing this SQL query: SELECT * FROM retain35.t_folder where f_name='Dumpster';
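The two lookups above can be combined so the Dumpster folder_id does not have to be copied by hand. The Python sketch below uses an in-memory SQLite database purely as a stand-in: the two-column tables, the `message_id` column, and the sample rows are assumptions for illustration, and a real Retain database has a fuller schema and may use a different SQL dialect.

```python
import sqlite3

# In-memory stand-in for the Retain database (illustrative schema only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_folder (folder_id INTEGER PRIMARY KEY, f_name TEXT);
    CREATE TABLE t_message (message_id INTEGER PRIMARY KEY, folder_id INTEGER);
    INSERT INTO t_folder VALUES (1, 'Inbox'), (273, 'Dumpster');
    INSERT INTO t_message VALUES (10, 1), (11, 273), (12, 273);
""")

# Find the Dumpster folder by name instead of hard-coding its folder_id.
query = """
    SELECT m.message_id
    FROM t_message AS m
    JOIN t_folder AS f ON f.folder_id = m.folder_id
    WHERE f.f_name = ?
    ORDER BY m.message_id
"""
dumpster_ids = [row[0] for row in conn.execute(query, ("Dumpster",))]
print(dumpster_ids)  # the two sample messages filed under 'Dumpster'
```

The join is useful because the folder_id of the Dumpster record differs from system to system, whereas its name does not.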

5. Once the search of the index is completed (reference step #2), the indexer gets called to delete those items from the index. The time for this to occur varies depending on the indexer load and the scope of the deletion job. It could take minutes, hours, or weeks.

At this point, a browse of the archive mailbox already fails to show the item(s) because the item is now marked for deletion in the database.

After the item is marked for deletion in the database but before the indexer removes it from the index, someone can still search the mailbox (which uses the index) and get back a hit. In that case, the search results simply show the item as “---- this messages is queued for deletion ----” and nothing more.

The deletion job keeps looping through each page of the index until it is done. This happens while the INDEXER continues working on deleting items from the index. Once the job has worked through the list, the deletion job ends.

STAGE 2 – The Dumpster Thread

The "dumpster thread" is awakened by the end of the deletion job and looks for new items in the dumpster. The "dumpster" is the t_dsref.f_state field, which has one of two values:

0 = normal
1 = queued for deletion

As it finds items with a state value of "1", it pages through them and deletes those records in the database. Events that can trigger the dumpster thread are:

Deletion job completion.
Server boot up.
Maintenance.

If the final document entry is orphaned (reference count=0), it calls the StorageEngine to delete the physical file. See “Storage Engine Process” for more information (below).
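One pass of the dumpster thread, as described above, can be sketched in Python. This is a hedged illustration rather than Retain's implementation: `FakeStore`, `run_dumpster_pass`, and the default page size of 100 are hypothetical. Only the two state values and the "reference count = 0" rule come from the text.

```python
from typing import Dict, List, Tuple

STATE_NORMAL = 0   # f_state: normal
STATE_QUEUED = 1   # f_state: queued for deletion


class FakeStore:
    """Hypothetical in-memory stand-in for the database and storage engine."""

    def __init__(self, records: Dict[int, Tuple[int, str]]):
        # record_id -> (f_state, document_id)
        self.records = dict(records)
        self.deleted_files: List[str] = []

    def queued_page(self, page_size: int) -> List[int]:
        """Return up to page_size record IDs that are queued for deletion."""
        ids = [rid for rid, (state, _) in sorted(self.records.items())
               if state == STATE_QUEUED]
        return ids[:page_size]

    def reference_count(self, document_id: str) -> int:
        """Count the remaining records that still point at the document."""
        return sum(1 for _, doc in self.records.values() if doc == document_id)

    def delete_file(self, document_id: str) -> None:
        """Stand-in for the storage engine's delete method."""
        self.deleted_files.append(document_id)


def run_dumpster_pass(store: FakeStore, page_size: int = 100) -> int:
    """Delete queued records page by page; hand orphaned documents to storage."""
    removed = 0
    while True:
        page = store.queued_page(page_size)
        if not page:
            break  # nothing left to process
        for record_id in page:
            _, document_id = store.records.pop(record_id)
            removed += 1
            if store.reference_count(document_id) == 0:
                store.delete_file(document_id)
    return removed
```

In this sketch, a document that is still referenced by a record in the normal state is never passed to `delete_file`, which is the point of the reference count check.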

When "Dumpster Thread ending" appears in the server log, it just means its last pass found nothing left to process in the database. It does not necessarily mean that the physical file itself has been deleted yet by the storage engine.

Storage Engine Process

A StorageEngine is just a black box with a delete method attached. How it works and how fast it is depend on the implementation:

Retain 2.x and Before: Standard Storage Engine

StandardStorageEngine, which we used regularly before 3.0, simply finds the file and deletes it. It is a simple, straightforward method.

This was changed in Retain 3 because the straightforward implementation is too slow: it synchronously deletes index pieces, database pieces, and so on. A linear sequence of delete, wait for the delete, delete, wait for the delete would be simple, but it locks things up for days.

Retain 3.x: DataStore_Process Storage Engine

DataStore does more complicated things. Internally it simply finds the references in t_dsref and marks them in the deleted state as mentioned in the "Dumpster Thread" discussion (above). It does nothing more.

At some point, another background thread specific to the DataStore looks for items in the deleted state. After performing various sanity checks, it removes the t_dsref entry and the storage engine deletes the corresponding file from disk.
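The difference between the two engines is essentially "delete now" versus "mark now, sweep later". The Python sketch below illustrates the second approach under stated assumptions: `SketchDataStore`, its methods, and the single sanity check shown are hypothetical and are not the DataStore's real internals.

```python
import os
import tempfile

STATE_NORMAL = 0
STATE_DELETED = 1


class SketchDataStore:
    """Hypothetical two-phase store: delete() only marks, sweep() removes."""

    def __init__(self):
        self.refs = {}  # ref_id -> {"path": str, "state": int}

    def add(self, ref_id, path):
        self.refs[ref_id] = {"path": path, "state": STATE_NORMAL}

    def delete(self, ref_id):
        # Fast path: mark the reference and return immediately.
        self.refs[ref_id]["state"] = STATE_DELETED

    def sweep(self):
        """Background pass: remove marked references and their files."""
        removed = []
        for ref_id, ref in list(self.refs.items()):
            if ref["state"] != STATE_DELETED:
                continue
            # Illustrative sanity check: is the file shared with another reference?
            shared = any(
                other["path"] == ref["path"] and other_id != ref_id
                for other_id, other in self.refs.items()
            )
            del self.refs[ref_id]
            if not shared and os.path.exists(ref["path"]):
                os.remove(ref["path"])
            removed.append(ref_id)
        return removed


# Usage: the file survives delete() and disappears only after sweep().
fd, path = tempfile.mkstemp()
os.close(fd)
store = SketchDataStore()
store.add("ref-1", path)
store.delete("ref-1")
still_on_disk = os.path.exists(path)
swept = store.sweep()
gone = not os.path.exists(path)
```

Keeping `delete()` cheap is what lets the caller move on without waiting for disk work, at the cost of a delay before the file actually disappears.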
