Retain Deletion Management

From GWAVA Technologies Training
Revision as of 20:33, 2 March 2017 by Steved (Talk | contribs)


PAGE OWNER: Steve Dorrough

After reading the Deletion Management section of the Retain Administration and Users Guide, try this hands-on exercise, or refer to the "How It Works" material below.

Hands-On Exercise

Prerequisite:  Archive mail from your test mail system a '''few days in advance'''.  The more mail the better, but have at least a few items.  This exercise requires that the items have been in Retain for over two days.

1.  Login to your Retain server (the actual VM server, not the RetainServer web UI).
2.  Stop tomcat: rcretain-tomcat7 stop (Linux) or stop the Apache Tomcat service on Windows.
3.  Rename the "index" directory to "index.old".  The index directory is located under your Retain storage directory (see Server Configuration | Storage).
4.  Create a new index directory.  If on Linux, make tomcat the owner of that directory:  '''chown tomcat:tomcat index'''.
5.  Switch back to the server itself (not the Retain web UI) and tail the current day's RetainServer log.  Baretail.exe can be downloaded and used on Windows servers (see [http://support2.gwava.com/kb/?View=entry&EntryID=527 Location of Logs]).  On Linux, change to /var/log/retain-tomcat7 and type: tail -f RetainServer.[yyyy-mm-dd].log.
6.  Start tomcat.
7.  Login to the Retain Server web UI (http://[ip address]/RetainServer) as admin.
8.  Under the Management section in the left-hand pane, select '''Deletion Management'''.
9.  Under Core Settings, click on the checkbox for “Job Enabled” to enable it and select “Delete messages as they are processed.”
10. Under the Date Scope tab, click on the arrow for the drop-down box, “Delete messages where”.  Note the various options.  For this exercise, leave it at the default “Date Stored in Retain”, which is the date the items were archived.
11. While still under Date Scope, type “1” in the “Older than ___” field and leave the drop-down set to “Days”.
12. Under the Job Members tab, click on the drop-down box below “Include these objects” and select the mail server on which you wish to run this deletion job. Then, click on the “Add Mail Server” button.
13. Go ahead and check the boxes to have mail notifications sent to your mailbox.
14. Under the Schedule tab, click on the drop-down box for “Run Job when” and select “manual”.
15. Click on the “Save Changes” icon at the upper-right of the screen.
16. Click on the "Run Job Now" button.
17. Switch over to the actual Retain server and look at the tail of your RetainServer log.  What do you see happening?  Did anything get deleted?
18. Login to a Retain mailbox and check whether anything older than 1 day remains.  Was anything deleted?  Why not?
19. Logout of RetainServer web UI.
20. Go back to the actual Retain server and stop tomcat.
21. Delete the new index directory you created.
22. Rename index.old to index.
23. Start tomcat.
24. Login to the RetainServer web UI as admin.
25. Go to Deletion Management and, on the Schedule tab, click on the Run Job Now button.
26. Switch over to the actual Retain server and look at the tail of your RetainServer log.  Now what do you see?  Any deletion activity?
27. What is the role of indexes in a deletion job?  See Level 2 section to check your answer.

Troubleshooting Information

There are four main stages of a deletion job:

STAGE 1
  • Query the indexes for all message IDs matching the date criteria.
  • Delete the message records in the indexes for those message IDs that get returned.
STAGE 2
  • Mark all of the message records in the database that qualify for deletion by setting their folder field to the "dumpster" folder.
  • Delete all message records that are in the "dumpster" folder. In the log, you'll see "Deleting node [message id]"
STAGE 3
  • Delete the corresponding attachment records for those messages from the t_document table. This is shown as "DeleteDao: Deleting com.gwava.dao.social.Document..." in the Server log.
STAGE 4
  • Delete the files off disk that correspond with the records in t_document that were removed. This is shown as "DeleteDocuments: deleteDocumentList:..." in the Server log.

It actually rotates between stage 3 and stage 4. Perhaps we should just call them both "stage 3", but one works in the DB and the other with the file system, so this explanation separates them logically. Stage 3 collects a list of files (hashes) to delete as it removes their references from the database; it probably limits the size of that list for memory and other reasons, so it works through one list at a time.
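The rotation between stages 3 and 4 can be sketched as follows. This is a minimal illustration, not Retain's actual code: the batch limit of 500 and the function names are assumptions chosen only to show the bounded-list behavior described above.

```python
BATCH_LIMIT = 500  # assumed bound; the real list-size limit is not documented


def delete_orphaned_documents(document_hashes, delete_file):
    """Sketch of the stage 3/4 rotation.

    document_hashes: list of file hashes whose t_document rows are orphaned.
    delete_file: callable that removes one file from disk (stage 4).
    """
    deleted = 0
    while document_hashes:
        # Stage 3: take one bounded batch, removing the DB references
        # (here, simply consuming the list stands in for the DB work).
        batch = document_hashes[:BATCH_LIMIT]
        del document_hashes[:BATCH_LIMIT]
        # Stage 4: delete the files on disk the removed records pointed to.
        for file_hash in batch:
            delete_file(file_hash)
            deleted += 1
    return deleted
```

The point of the rotation is that neither the in-memory hash list nor the file-system work grows unbounded: each pass through the loop handles one bounded batch before returning to the database.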

The following section describes the deletion stages in more detail. With each stage, the log entries you should expect to see in the RetainServer log are shown. If the Server logging level is set to diagnostic, the log will include all entries that apply to the situation (INFO and DEBUG); otherwise, it will show only INFO. Each logging example therefore gives the condition followed by the text (in quotes) that will appear in the log. For example, stage 1 shows the first log line as follows: If INFO: "Running deletion job". Here, "If INFO" is the condition (the logging level set on the Server) and "Running deletion job" is the text displayed in the log.

STAGE 1 – The Deletion Job

1. Deletion job is initiated.

If INFO: “Running deletion job.”
If DEBUG and if report only mode: “Will only report, not delete”
If DEBUG and not report only mode: “Normal deletion mode”
If DEBUG: “autoApprove” [Boolean value]
If DEBUG: “Deletion report stored at: [filePath]"
If DEBUG and PO chosen: “Deleting PO [PO]"


2. Searches the indexes.

It starts by sending a query to the Indexer with the given scope and gets a total number of hits as well as the number of hits within the first page. A page consists of 500 records.


If DEBUG: "On Page # [pageNumber]"
If INFO: "Aborting early, deliberately since this is a trial" (if the last page number is less than 20 and if in REPORTING ONLY mode)
          By default, we only report the first 10,000 items to be deleted 
          (500 items per page and 20 pages by default: 500 x 20 = 10,000)
If INFO: "Result page TotalHits: [total number of hits from index] [hits on current page] in result set"
[loglevel unknown]: "inSet: [list of all message IDs that are identified]"
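The 10,000-item cap in report-only mode follows directly from the page size. A minimal sketch of that arithmetic, with the constants taken from the log text above (the function name is illustrative, not Retain's API):

```python
PAGE_SIZE = 500        # records per index result page
MAX_REPORT_PAGES = 20  # pages walked in report-only mode (500 x 20 = 10,000)


def pages_to_fetch(total_hits, report_only):
    """How many index result pages the deletion job will walk."""
    # Ceiling division: a partial final page still counts as a page.
    total_pages = -(-total_hits // PAGE_SIZE)
    if report_only:
        # "Aborting early, deliberately since this is a trial":
        # report-only mode stops after MAX_REPORT_PAGES pages at most.
        return min(total_pages, MAX_REPORT_PAGES)
    return total_pages
```

For example, the 8,144 total hits shown later in this document fit in 17 pages, so a report-only run would walk all of them; a run with 25,000 hits would stop at page 20.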

It puts all those results in a list of items to be deleted from the index. The list is kept in memory and is lost if Tomcat is stopped at this point.

As it queries the Indexer, you'll see this entry in the Indexer log. Note that the following example is from a deletion job where the date range was set to "1 day or older" and triggered off of delivered date:


IndexSearchMessageConsumerImpl - DeletionQuery:delivered:[* TO 20150226215106] AND (domain:[domain] AND postoffice:[post office])

From the text box above, you can see that the job is being triggered off the delivered date because the query states "DeletionQuery:delivered". Since the date range was "1 day or older", the range of the query is shown as: * TO [numeric string]. The begin date of the range is represented by an asterisk because it goes back indefinitely. The end date of the range is represented by a numeric string whose first 8 characters represent the day before the date the job was run; I ran this deletion job on 2015-02-27. I do not know what the last 6 digits represent at this time.
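For reference, the upper bound of the query range can be unpacked as follows. This parsing is inferred from the observed log text only: the first 8 digits are treated as a yyyymmdd date, and since the meaning of the trailing 6 digits is not confirmed, they are returned unparsed.

```python
from datetime import datetime


def parse_deletion_bound(bound):
    """Split the 14-digit upper bound from a DeletionQuery log line.

    The first 8 digits are the cutoff date (yyyymmdd); the trailing
    6 digits are not documented, so they are returned as a raw string.
    """
    cutoff = datetime.strptime(bound[:8], "%Y%m%d").date()
    return cutoff, bound[8:]


# Example from the log above: job run on 2015-02-27 with "1 day or older"
cutoff, trailing = parse_deletion_bound("20150226215106")
```

With the example value, the cutoff comes out as 2015-02-26, the day before the job ran, which matches the observation above.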

It then displays the total hits found:

 
IndexSearcher.search: +delivered:[* TO 20150226215106] +(+domain:steve +postoffice:parents), totalHits: 8144, localHits:1000, sort:<score>

It appears that the database driver picks this up and displays all of the Retain message IDs from retain.t_message.message_id(?) after "inSet:"


[DeletionJobTask-1-thread-1] TRACE com.gwava.hibernate.HibernateStringUtil - inSet: [message IDs]

... and then proceeds to list the first 500:


[DeletionJobTask-1-thread-1] INFO  com.gwava.deletion.AbstractDeletionTaskExecutor - List<EMail>: [message IDs]

At this point, it is assumed that it sends that list to the Indexer with instructions to delete them from the indexes as we see these entries in the Indexer log:


IndexDeletionMessageConsumerImpl - Trying to process the operation: INDEX_DELETE
LuceneIndexingManager - Delete [message ID] from index
LuceneIndexingManager - Delete [message ID] from index
etc...

It does this in batches of 500, as we see these log entries in the Indexer log every 500 items:


IndexSearchMessageConsumerImpl - Trying to process the operation: INDEX_DELETION_TASK_SEARCH
IndexSearchMessageConsumerImpl - DeletionQuery:delivered:[* TO 20150226215106] AND (domain:steve AND postoffice:parents)
LuceneIndexingSearcher - searching with offset: 0, count: 500
LuceneSearchController - query: delivered:[* TO 20150226215106] AND (domain:steve AND postoffice:parents) field: uuid
LuceneSearchController - query.toString: +delivered:[* TO 20150226215106] +(+domain:steve +postoffice:parents)
LuceneSearchController - 1425073871005, start query: +delivered:[* TO 20150226215106] +(+domain:steve +postoffice:parents)
LuceneSearchController - 1425073871014, end query: +delivered:[* TO 20150226215106] +(+domain:steve +postoffice:parents)
LuceneSearchController - IndexSearcher.search: +delivered:[* TO 20150226215106] +(+domain:steve +postoffice:parents), totalHits: 7644, localHits:1000, sort:<score>
IndexDeletionMessageConsumerImpl - Trying to process the operation: INDEX_DELETE
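The batching behavior above can be sketched as a simple loop. The 500-item batch size comes from the log; the function name and the callback are illustrative stand-ins for the messages sent to the Indexer, not Retain's actual API.

```python
BATCH_SIZE = 500  # matches the per-500-item log entries shown above


def send_index_deletes(message_ids, send_batch):
    """Send INDEX_DELETE work to the indexer in batches of 500.

    message_ids: full list of message IDs returned by the index search.
    send_batch: callable invoked once per batch (stands in for the
    INDEX_DELETE operation sent to the Indexer).
    """
    for start in range(0, len(message_ids), BATCH_SIZE):
        send_batch(message_ids[start:start + BATCH_SIZE])
```

Each `send_batch` call corresponds to one "Trying to process the operation: INDEX_DELETE" cycle in the Indexer log.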

3. Removes litigation hold messages

Removes all messages pertaining to users that are on litigation hold from the list of messages to be deleted.


If DEBUG: "[emailID] is under litigation hold and cannot be deleted"
If DEBUG: "[emailID] doesn’t belong to user"

4. Updates each message record, putting it in a “deleted” state

This is known as putting the message record "in the dumpster".

There are no log messages for this step. The dumpster is a folder record in the t_folder table and is literally named "Dumpster".

You can find the messages waiting to be deleted in the t_message table by using the following query: SELECT * FROM retain.t_message where folder_id = 273;

NOTE: "273" is the folder_id from the t_folder table which you can get by executing this SQL query: SELECT * FROM retain.t_folder where f_name = 'Dumpster';



THE FOLLOWING INFORMATION WAS GIVEN TO ME BY MIKE BELL AND IS VERY SKETCHY - WAITING FOR MORE ACCURATE AND PIN-POINT EXPLANATIONS


5. Once the search of the index is completed (reference step #2), the indexer gets called to delete those items from the index. The time for this to occur varies depending on the indexer load and the scope of the deletion job. It could take minutes, hours, or weeks.


At this point, a browse of the archive mailbox already fails to show the item(s) because the item is now marked for deletion in the database.


Because the item is marked for deletion in the database before the indexer removes it from the index, it is possible for someone to perform a search of the mailbox (which uses the index) and get back a hit. In that case, the item will simply appear in the search results as “---- this messages is queued for deletion ----” and nothing more.


The deletion job keeps looping through each page of the index until it is done. This happens while the INDEXER continues working on deleting items from the index. Once done with the list, that's the end of deletion job.

STAGE 2 – The Dumpster Thread

The "dumpster thread" is awakened by the end of the deletion job and looks for new items in the dumpster. In this context, the "dumpster" state is tracked in the t_dsref.f_state field, which can have one of two values:

  • 0 = normal
  • 1 = queued for deletion


As it finds items with a state value of "1", it pages through them and deletes those message records in the database, i.e. the ones whose folder_id is set to the dumpster folder's ID. In the sample text from the server log, the number following "Delete" and "Deleting node" is the message ID:

15:54:00, 985[DumpsterThread] [TRACE] DumpsterDiver: Delete 3608803
15:54:00, 986[DumpsterThread] [INFO ] DumpsterDiver: Deleting node 3608803
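The dumpster thread's pass can be sketched as follows. The state values come from the t_dsref.f_state description above; the page size, function names, and data shapes are assumptions for illustration only ("Got these many children of dumpster500" in the log suggests 500-item paging).

```python
STATE_NORMAL = 0                # t_dsref.f_state: normal
STATE_QUEUED_FOR_DELETION = 1   # t_dsref.f_state: queued for deletion
PAGE_SIZE = 500                 # assumed paging size


def dumpster_dive(records, delete_node):
    """Page through records queued for deletion and delete each one.

    records: list of (message_id, state) tuples standing in for DB rows.
    delete_node: callable invoked per record ("Deleting node [message id]").
    """
    queued = [mid for mid, state in records
              if state == STATE_QUEUED_FOR_DELETION]
    for start in range(0, len(queued), PAGE_SIZE):
        for message_id in queued[start:start + PAGE_SIZE]:
            delete_node(message_id)
    return len(queued)
```

Records in the normal state are skipped entirely; only items already moved to the dumpster by the deletion job are touched.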

Events that can trigger the dumpster thread are:

  • Deletion job completion.
  • Server boot up.
  • Maintenance.


If the final document entry is orphaned (reference count=0), it calls the StorageEngine to delete the physical file. See “Storage Engine Process” for more information (below).


When "Dumpster Thread ending" appears in the server log, it just means its last pass found nothing left to process in the database. It does not necessarily mean that the physical file itself has been deleted yet by the storage engine.

Storage Engine Process

A StorageEngine is just a black box with a delete method attached. How it works and how fast it runs depend on the implementation:

Retain 2.x and Before: Standard Storage Engine
StandardStorageEngine, which was used before Retain 3.0, simply finds the file and deletes it: a simple, straightforward method.
This was changed in Retain 3 because a straightforward implementation is too slow. It would synchronously delete index pieces, DB pieces, and so on; deleting and waiting on each delete in a linear fashion would be easy, but it locks things up for days.
Retain 3.x: DataStore_Process Storage Engine
DataStore does more complicated things. Internally it simply finds the references in t_dsref and marks them in the deleted state as mentioned in the "Dumpster Thread" discussion (above). It does nothing more.
At some point, another background thread specific to the DataStore looks for items in the deleted state. After performing various sanity checks, it removes the t_dsref entry and the storage engine deletes the corresponding file from disk.
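The two-phase DataStore deletion can be sketched as follows, under the assumption that each t_dsref row carries a state flag and a reference count; all names and data shapes here are illustrative, not Retain's schema or API.

```python
def mark_deleted(dsref, ref_id):
    """Phase 1 (StorageEngine.delete): only mark the t_dsref row.
    Nothing touches the disk yet, so the caller returns immediately."""
    dsref[ref_id]["state"] = 1


def background_sweep(dsref, delete_file):
    """Phase 2 (background thread): find marked rows, sanity-check them,
    remove the row, and delete the corresponding file from disk."""
    removed = []
    for ref_id in list(dsref):
        row = dsref[ref_id]
        # Sanity check: only delete orphaned entries (no remaining refs).
        if row["state"] == 1 and row["refcount"] == 0:
            delete_file(row["hash"])  # disk I/O happens here, off the hot path
            del dsref[ref_id]
            removed.append(ref_id)
    return removed
```

Splitting the marking from the physical delete is what avoids the delete/wait-for-delete lock-up of the old synchronous engine: the deletion job finishes quickly, and the slow disk work happens later in the background.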

Reading Deletion Logs

As you go through the Server log you can trace the progress of the job:

If you schedule a deletion job, it will run during the maintenance cycle at 1am (by default):

2015-08-12 01:03:22,584 [Maintain] INFO com.gwava.jobs.MaintenanceJob - Execute deletion jobs

But if jobs are set to manual then you will see this message:

2015-08-12 01:03:22,593 [DeletionJobTask-1-thread-3] DEBUG com.gwava.deletion.DeletionTaskHelper - Skipping deletion job. Only manual trigger selected.

You can trigger a job manually by clicking on the "Run Job Now" button under the Schedule tab and you will see this in the logs:

2015-08-12 09:06:03,072 [DeletionJobTask-1-thread-1] INFO com.gwava.deletion.DeletionTaskHelper - Manual trigger of deletion job requested. Not checking schedule

And the job begins:

2015-08-12 09:06:03,072 [DeletionJobTask-1-thread-1] INFO com.gwava.deletion.AbstractDeletionTaskExecutor - Running deletion job. 2015-08-12 09:06:03,072 [DeletionJobTask-1-thread-1] INFO com.gwava.deletion.AbstractDeletionTaskExecutor - Normal deletion mode 2015-08-12 09:06:03,072 [DeletionJobTask-1-thread-1] INFO com.gwava.deletion.AbstractDeletionTaskExecutor - autoApprove:true 2015-08-12 09:06:03,074 [DeletionJobTask-1-thread-1] DEBUG com.gwava.deletion.DJDeletionOperation - Deletion report stored at /var/opt/beginfinite/retain/archive/Deletion8464290409683004469EYSZTWSUTFUWYFMRMHWHU.html

If you are running the job against individual users you'll see them listed:

2015-08-12 09:06:03,094 [DeletionJobTask-1-thread-1] INFO com.gwava.deletion.DJDeletionOperation - Include user scope: [4632DB80-0EA9-0000-9A73-667864633738]

Then it begins to build batches of 500 eligible items from the indexes:

2015-08-12 09:06:05,301 [DeletionJobTask-1-thread-1] TRACE com.gwava.hibernate.HibernateStringUtil - inSet: 823104,823105,...,824095,824096
2015-08-12 09:06:05,301 [DeletionJobTask-1-thread-1] INFO com.gwava.deletion.AbstractDeletionTaskExecutor - List<EMail>: 823104,823105,...,824095,824096
2015-08-12 09:06:05,373 [DeletionJobTask-1-thread-1] INFO com.gwava.deletion.AbstractDeletionTaskExecutor - fromIndex=0,toIndex=100
2015-08-12 09:06:07,100 [DeletionJobTask-1-thread-1] INFO com.gwava.deletion.AbstractDeletionTaskExecutor - fromIndex=100,toIndex=200
2015-08-12 09:06:07,583 [DeletionJobTask-1-thread-1] INFO com.gwava.deletion.AbstractDeletionTaskExecutor - fromIndex=200,toIndex=300
2015-08-12 09:06:08,393 [DeletionJobTask-1-thread-1] INFO com.gwava.deletion.AbstractDeletionTaskExecutor - fromIndex=300,toIndex=400
2015-08-12 09:06:09,427 [DeletionJobTask-1-thread-1] INFO com.gwava.deletion.AbstractDeletionTaskExecutor - fromIndex=400,toIndex=500
2015-08-12 09:06:10,913 [DeletionJobTask-1-thread-1] DEBUG com.gwava.deletion.AbstractDeletionTaskExecutor - deleteIndexDocs
2015-08-12 09:06:10,913 [DeletionJobTask-1-thread-1] DEBUG com.gwava.deletion.AbstractDeletionTaskExecutor - deleteIndexDocs, tl.size: 500

Once it has built batches of items to be deleted, it sends the email (yes, we know; this is going to be changed to when the deletion job actually ends) and begins to remove items from the database:

2015-08-12 09:06:58,459 [DeletionJobTask-1-thread-1] INFO com.gwava.deletion.DeletionTaskHelper - Sending mail after deletion job...
2015-08-12 09:06:58,501 [DeletionJobTask-1-thread-1] INFO com.gwava.deletion.AbstractDeletionTaskExecutor - DeletionTaskExecutor ends
2015-08-12 09:06:58,508 [DumpsterThread] DEBUG com.gwava.deletion.DeletionJobFactory - Dumpster Thread beginning
2015-08-12 09:06:58,512 [DumpsterThread] INFO com.gwava.deletion.DumpsterDiver - Dumpster dive begins
2015-08-12 09:06:58,573 [DumpsterThread] INFO com.gwava.deletion.DumpsterDiver - Got these many children of dumpster500
2015-08-12 09:06:58,573 [DumpsterThread] TRACE com.gwava.deletion.DumpsterDiver - Delete 823039
2015-08-12 09:06:58,582 [DumpsterThread] INFO com.gwava.deletion.DumpsterDiver - Deleting node 823039

and continues until that is finished:

2015-08-12 09:09:26,605 [DumpsterThread] INFO com.gwava.deletion.DumpsterDiver - Delete orphaned documents
2015-08-12 09:09:26,605 [DumpsterThread] INFO com.gwava.deletion.ng.DeleteDocuments - Removing orphaned documents page

Then the job moves on to delete items from disk, which is also done in batches:

2015-08-12 09:09:26,692 [DumpsterThread] TRACE com.gwava.message.dao.DeleteDao - Deleting com.gwava.dao.social.Document id=841321, hash=560170B00B4B4C4FABA777D76D93BA98527C12E77AE77A6C2F7DB4580AF0FE8D
2015-08-12 09:09:40,482 [DumpsterThread] INFO com.gwava.deletion.ng.DeleteDocuments - Executed query in 13877
2015-08-12 09:09:40,489 [DumpsterThread] INFO com.gwava.deletion.ng.DeleteDocuments - deleteDocumentList:560170B00B4B4C4FABA777D76D93BA98527C12E77AE77A6C2F7DB4580AF0FE8D 0 841321

until it is done and the deletion job ends:

2015-08-12 09:12:30,777 [DumpsterThread] INFO com.gwava.deletion.ng.DeleteDocuments - Removing orphaned documents page
2015-08-12 09:12:30,777 [DumpsterThread] INFO com.gwava.deletion.ng.DeleteDocuments - Executed query in 0
2015-08-12 09:12:30,777 [DumpsterThread] INFO com.gwava.deletion.ng.DeleteDocuments - deleteOrphanedDocuments: Nothing left
2015-08-12 09:12:30,778 [DumpsterThread] INFO com.gwava.deletion.DumpsterDiver - Dumpster dive completed
2015-08-12 09:12:30,778 [DumpsterThread] DEBUG com.gwava.deletion.DeletionJobFactory - Dumpster Thread ending
