Difference between revisions of "Retain Deletion Management"

From GWAVA Technologies Training
(STAGE 1 – The Deletion Job)
(Deletion Management Logs)
 
(177 intermediate revisions by 7 users not shown)
Line 1: Line 1:
 +
===PAGE OWNER:  Steve Dorrough===
  
==Level 1==
+
After reading the Retain Administration and Users Guide section on Deletion Management, try this hands-on exercise or refer to the section below it, [[Retain Deletion Management#How_It_Works|"How It Works"]].
The Deletion Manager provides for the removal of items from the archive according to the specified criteria. The Deletion Manager runs as a scheduled job in the archive, looking for, and processing or deleting items that match the search terms. Customers often mistakenly feel that a Deletion job is part of the archive jobs. This is not so.  Deletion jobs occur immediately following scheduled maintenance jobs and are not linked to archive jobs whatsoever (see the “Schedule” section of this document).  Mail removed from the archive is permanently deleted. Use this option with care.
+
  
The Deletion Manager will not show up in your system menu unless the logged-in user has been granted either the Deletion Management right or the Litigation Hold right. See User Rights. The Litigation Hold right allows users to go to the Deletion Management section and add or remove the Litigation Hold right for other users; they cannot modify other settings. Users with the Deletion Management right can view the Litigation Hold tab, but it is read-only for them and they cannot grant rights.
+
==Hands-On Exercise==
 +
<pre style="white-space: pre-wrap;
 +
white-space: -moz-pre-wrap;
 +
white-space: -pre-wrap;
 +
white-space: -o-pre-wrap;
 +
word-wrap: break-word;
 +
width:75%">
 +
Prerequisite:  Archive mail from your test mail system a '''few days in advance'''.  The more mail the better, but at least have a few items. This exercise will require that the items be in Retain for over two days.
  
===Core Settings===
+
1.  Login to your Retain server (the actual VM server, not the RetainServer web UI).
Here you enable and disable deletion jobs, and detail which actions they will take. When setting up a deletion job, you have the option to tell the job to delete and report on the messages deleted, or to simply generate a report on the mail that will be removed from the database. The report function can be very handy. Rather than deleting any items, it will simply report on the items that would qualify for deletion should you run an actual deletion job.
+
2.  Stop tomcat: rcretain-tomcat7 stop (Linux) or stop the Apache Tomcat service on Windows.
 +
3.  Rename the "index" directory to "index.old".  The index directory is located under your Retain storage directory (see Server Configuration | Storage).
 +
4.  Create a new index directory.  If on Linux, make tomcat the owner of that directory:  '''chown tomcat:tomcat index'''.
 +
5.  Switch over to the server itself again from the Retain web UI and tail the current day's RetainServer log.  Baretail.exe can be downloaded and used on Windows servers (see [http://support2.gwava.com/kb/?View=entry&EntryID=527 Location of Logs]).  For Linux, change to /var/log/retain-tomcat7 and type: tail -f RetainServer.[yyyy-mm-dd].log.
 +
6.  Start tomcat.
 +
7.  Login to the Retain Server web UI (http://[ip address]/RetainServer) as admin.
 +
8.  Under the Management section in the left-hand pane, select '''Deletion Management'''.
 +
9.  Under Core Settings, click on the checkbox for “Job Enabled” to enable it and select “Delete messages as they are processed.”
 +
10. Under the Date Scope tab, click on the arrow for the drop-down box, “Delete messages where”.  Note the various options.  For this exercise, leave it at the default “Date Stored in Retain”, which is the date the items were archived.
 +
11. While still under Date Scope, type “1” in the “Older than ___” field and leave the drop-down set to “Days”.
 +
12. Under the Job Members tab, click on the drop-down box below “Include these objects” and select the mail server on which you wish to run this deletion job. Then, click on the “Add Mail Server” button.
 +
13. Go ahead and check the boxes to have mail notifications sent to your mailbox.
 +
14. Under the Schedule tab, click on the drop-down box for “Run Job when” and select “manual”.
 +
15. Click on the “Save Changes” icon at the upper-right of the screen.
 +
16. Click on the "Run Job Now" button.
 +
17. Switch over to the actual Retain server and look at the tail of your RetainServer log.  What do you see happening?  Did anything get deleted?
 +
18. Login to a Retain mailbox and check to see if anything older than 1 day is still there.  Was anything deleted?  Why not?
 +
19. Logout of RetainServer web UI.
 +
20. Go back to the actual Retain server and stop tomcat.
 +
21. Delete the new index directory you created.
 +
22. Rename index.old to index.
 +
23. Start tomcat.
 +
24. Login to the RetainServer web UI as admin.
 +
25. Go to Deletion Management and, on the Schedule tab, click on the Run Job Now button.
 +
26. Switch over to the actual Retain server and look at the tail of your RetainServer log.  Now what do you see?  Any deletion activity?
 +
27. What is the role of indexes in a deletion job?  See Level 2 section to check your answer.
  
===Basic Options===
+
</pre>
This tab provides the criteria that the deletion job will use to identify messages to be deleted. This should look nearly identical to the profile of an archive job. The functions are the same. The item type, source, and status determine which messages are flagged for deletion. If none of the boxes are checked, then all items meeting the date criteria are subject to deletion.
+
  
===Data Scope===
+
==Troubleshooting Information==
There are many dates that are contained in a mail system, and the deletion manager allows you to select different date ranges to identify the scope of the deletion manager. The setup is simple; the date range between the “Begin” and “End” dates will be targeted by the delete job.  
+
There are four main stages of a deletion job:
 +
=====STAGE 1=====
 +
* Query the indexes for all message IDs matching the date criteria.
 +
* Delete the message records in the indexes for those message IDs that get returned.
 +
=====STAGE 2=====
 +
* Mark all of the message records in the database that qualify for deletion. It sets the folder field to the "dumpster" folder.
 +
* Delete all message records that are in the "dumpster" folder. In the log, you'll see "Deleting node [message id]"
 +
=====STAGE 3=====
 +
* Delete the corresponding attachment records for those messages from the t_document table. This is shown as "DeleteDao: Deleting com.gwava.dao.social.Document..." in the Server log.
 +
=====STAGE 4=====
 +
* Delete the files off disk that correspond with the records in t_document that were removed. This is shown as " DeleteDocuments: deleteDocumentList:..." in the Server log.
  
The dates can be identified by the date filter. The Date filter allows you to  
+
It actually rotates between stage 3 and stage 4. Perhaps we should just call them both "stage 3", but one works in the DB and the other with the file system, so this explanation separates them logically. Stage 3 collects a list of files (hashes) to delete as it removes their references from the database. It probably limits the size of that list for memory and other purposes, so it does it one list at a time.<br>
specify the mail system or Retain message dates. The creation and delivered
+
<br>
date are mail system dates. The date archived and expiration dates are set in  
+
The following section will describe the deletion stages in more detail. With each stage, the log entries you should expect to see in the RetainServer log are shown.  If the Server logging level is set to diagnostic, it will include all entries that apply to the situation (INFO and DEBUG); otherwise, it will just show INFO.  Thus, each logging example will provide the conditions followed by the text (in quotes) that will display in the log.  For example, Stage 1 shows the first log line as follows: <span style=color:red>If INFO: "Running deletion job"</span>.  Thus, <span style=color:red>If INFO</span> is the condition (the logging level set on the Server) and <span style=color:red>"Running deletion job"</span> is the text displayed in the log.
Retain. The expiration date is tied to the job, and is set under the job section.  
+
  
The Job Expiration option allows you to set an ‘expiration date’ after which the mail no longer needs to be in the archive. (States have different laws and requirements, but the retention period is usually between 5 and 10 years.) The expiration date is set under the worker's Core Settings tab.  The Deletion Management interface can utilize this expiration date to identify messages that are due for removal.
+
===STAGE 1 – The Deletion Job===
  
===Job Members===
+
=====1. Deletion job is initiated.=====
A deletion job will only be active for selected users or a selected mail server. The Job Members tab allows you to include an entire mail server or group of users, while excluding specific users.
+
  
Use this in conjunction with the Generate Report option under Core Settings to pinpoint the mail that will be included in the deletion job.
+
<pre style="white-space: pre-wrap;
 +
white-space: -moz-pre-wrap;
 +
white-space: -pre-wrap;
 +
white-space: -o-pre-wrap;
 +
word-wrap: break-word;
 +
margin-left: 2em;
 +
width: 50%">
  
===Notification===
+
If INFO: “Running deletion job.”
The reports, errors, and summaries of delete jobs can be sent to the listed address in the notification tab. Select the options as desired.  If no notifications are selected, the report can be accessed from ..\[retain storage directory]\archive.  The filename will be Deletion.....html (e.g., Deletion7895881300442619770UTCMSSHMZYBWMRBJJYZAVED.html)
+
If DEBUG and if report only mode: “Will only report, not delete”
 +
If DEBUG and not report only mode: “Normal deletion mode”
 +
If DEBUG: “autoApprove” [Boolean value]
 +
If DEBUG: “Deletion report stored at: [filePath]"
 +
If DEBUG and PO chosen: “Deleting PO [PO]"
  
===Schedule===
+
</pre>
The last tab is the schedule tab. This allows you to automate and run a deletion job automatically on mail that has passed its required archive duration. The options are to run this weekly or on a specific day of the month. The Deletion job will run immediately following the scheduled maintenance. Scheduled
+
<br>
Maintenance is found under Configuration | Server Configuration | Maintenance. Manually starting a job is not currently supported; however, a workaround to this would be to set the daily maintenance time to a desired time of day (like within minutes of setting up the deletion job).  It should be noted, though, that once maintenance begins, no one can log in to the Retain server.
+
  
===Litigation Hold===
+
=====2. Searches the indexes.=====
The Litigation Hold tab provides the ability to exclude any specified user’s data from any deletion job, preventing any of their data from being deleted when the job runs.  
+
It starts by sending a query to the Indexer with the given scope and gets a total number of hits as well as the number of hits within the first page.  A page consists of 500 records.
  
Any official auditors, legal representatives, system administrators, or users may be added to this list. These accounts will be able to set and lift any litigation hold in the system, so this right should not be granted generally and should be restricted to specified users. Because of the power of this right, it is granted separately from the usual rights for users.
+
<pre style="white-space: pre-wrap;
 +
white-space: -moz-pre-wrap;
 +
white-space: -pre-wrap;
 +
white-space: -o-pre-wrap;
 +
word-wrap: break-word;
 +
margin-left: 2em;
 +
width: 75%">
  
To add a user to the litigation hold list, select the ‘Add User’ button to open the ‘Select Mailbox’ window. Select the source system for the user and enter search criteria. After searching, select the desired user or users and select the ‘Ok’ button to add them to the list.
+
If DEBUG: "On Page # [pageNumber]"
 +
If INFO: "Aborting early, deliberately since this is a trial" (if the last page number is less than 20 and if in REPORTING ONLY mode)
 +
          By default, we only report the first 10,000 items to be deleted
 +
          (500 items per page and 20 pages by default: 500 x 20 = 10,000)
 +
If INFO: "Result page TotalHits: [total number of hits from index] [hits on current page] in result set"
 +
[loglevel unknown]: "inSet: [list of all message IDs that are identified]"
  
Save all changes.  
+
</pre>
 +
 
 +
It puts all those results in a list of items to be deleted from the index. ''The list is kept in memory and is lost if Tomcat is stopped at this point''.<br>
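The page arithmetic described in the log notes above can be checked with a short calculation. This is only a sketch: the constant and function names are hypothetical, and the values (500 records per page, 20 pages by default in report-only mode, and the sample totalHits of 8144 from the Indexer log in this document) are taken from the text, not from the Retain source code.

```python
import math

PAGE_SIZE = 500          # records per index page, as stated in this section
REPORT_PAGE_LIMIT = 20   # default page cap in report-only mode, as stated in this section


def pages_needed(total_hits: int, page_size: int = PAGE_SIZE) -> int:
    """Number of pages required to walk every hit returned by the index."""
    return math.ceil(total_hits / page_size)


def report_cap(page_size: int = PAGE_SIZE, page_limit: int = REPORT_PAGE_LIMIT) -> int:
    """Maximum number of items a report-only run lists by default."""
    return page_size * page_limit


total_hits = 8144  # totalHits value from the sample Indexer log entry
print(pages_needed(total_hits))   # 17
print(report_cap())               # 10000
print(total_hits - PAGE_SIZE)     # 7644 hits remain after the first page is deleted
```

The last figure matches the totalHits value (7644) that the sample Indexer log reports on the next pass of the same job.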
 +
 
 +
===== =====
 +
As it queries the Indexer, you'll see this entry in the Indexer log.  Note that the following example is from a deletion job where the date range was set to "1 day or older" and triggered off of delivered date: 
  
===Hands On===
 
 
<pre style="white-space: pre-wrap;  
 
<pre style="white-space: pre-wrap;  
 
white-space: -moz-pre-wrap;  
 
white-space: -moz-pre-wrap;  
 
white-space: -pre-wrap;  
 
white-space: -pre-wrap;  
 
white-space: -o-pre-wrap;  
 
white-space: -o-pre-wrap;  
word-wrap: break-word;">
+
word-wrap: break-word;
1. Log in to the Retain server ([ip address]/RetainServer).
+
margin-left: 2em;
2. Under the Management section in the left-hand pane, select Deletion Management.
+
width: 75%">
3. Under Core Settings, click on the checkbox for “Job Enabled” to enable it and select “Delete messages as they are processed.”
+
 
4. Under the Date Scope tab, click on the arrow for the drop-down box, “Delete messages where”.  Note the various options.  You'll probably want to leave it at the default “Date Stored in Retain”, which is the date the items were archived.
+
IndexSearchMessageConsumerImpl - DeletionQuery:delivered:[* TO 20150226215106] AND (domain:[domain] AND postoffice:[post office])
5. While still under Date Scope, type “1” in the “Older than ___” field and leave the drop-down set to “Days”.
+
 
6. Under the Job Members tab, click on the drop-down box below “Include these objects” and select the mail server on which you wish to run this deletion job. Then, click on the “Add Mail Server” button.
+
7. If you have a SMTP mail server set up, go ahead and check the boxes to have mail notifications sent to your mailbox.
+
8. Under the Schedule tab, click on the drop-down box for “Run Job when” and select “day of month”; then, click on the day drop-down box and set it to today's date.
+
9. Click on the “Save Changes” icon at the upper-left of the screen.
+
10. Click on “Server Configuration” under the “Configuration” section.
+
11. Click on the “Maintenance” tab.
+
12. Make sure that “Enable Index Optimization” is set to “(every day)”.
+
13. Set the time for “Run maintain procedure at” to 2 minutes from the current time.
+
14. Go to the Logging tab and ensure that the logging level is set to “diagnostic”.
+
15. Save changes.
+
16. Open an SSH session using PuTTY (or some other tool) to your Retain server.
+
17. Change directories to /opt/beginfinite/retain-tomcat7.
+
18. Type “tail -f RetainServer-[mm]-[dd]-[yyyy].log”, where “mm”, “dd”, and “yyyy” represent the current date.  You should see a maintenance job kick off followed by the deletion job within a minute if not seconds.
+
19. When the deletion job completes, exit the PuTTY session and launch WinSCP (if you do not have it, download it for free).
+
20. Connect WinSCP to the Retain server.
+
21. Change directories to ..\[retain storage directory]\archive.
+
22. Open the deletion report (Deletion[series of alpha/numeric chars].html).
+
 
</pre>
 
</pre>
  
==Level 2==
+
From the text box above, you can see that the query is being triggered off of the delivered date because it states "DeletionQuery:delivered".  Since the date range was "1 day or older", the range of the query is shown as follows:  '''<span style=color:blue>* TO</span> <span style=color:red>[numeric string]</span>'''.  The begin date of the range is represented by an asterisk because it goes back indefinitely.  The end date of the range is represented by a numeric string.  The first 8 characters of the numeric string (20150226) represent the day before the date the job was run; I ran the deletion job on 2015-02-27.  I do not know what the ''last'' 6 digits represent at this time.
 +
<br>
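One reading that fits the sample logs, offered here as an assumption and not as confirmed Retain behavior, is that the whole 14-digit string is a yyyyMMddHHmmss timestamp in UTC. The "start query" line in the batch Indexer log excerpt in this document begins with what looks like an epoch-millisecond value (1425073871005). The sketch below decodes both values under those assumptions and measures the gap between them; the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def parse_range_end(stamp: str) -> datetime:
    # ASSUMPTION: the range end is yyyyMMddHHmmss expressed in UTC.
    return datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def from_epoch_millis(millis: int) -> datetime:
    # ASSUMPTION: the number before "start query" is milliseconds since 1970-01-01 UTC.
    seconds, ms = divmod(millis, 1000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(milliseconds=ms)


range_end = parse_range_end("20150226215106")
query_started = from_epoch_millis(1425073871005)
gap = query_started - range_end

print(range_end.isoformat())      # 2015-02-26T21:51:06+00:00
print(query_started.isoformat())  # 2015-02-27T21:51:11.005000+00:00
print(gap)                        # 1 day, 0:00:05.005000
```

Under this reading, the range end is one day (plus a few seconds of processing) before the moment the query ran, which is consistent with an "older than 1 day" scope evaluated at job start.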
  
 +
===== =====
 +
It then displays the total hits found:
  
===STAGE 1 – The Deletion Job===
+
<pre style="white-space: pre-wrap;
<b>1.</b> Deletion job is initiated.<p>
+
white-space: -moz-pre-wrap;
<b>2.</b> It does a search of the index (Lucene/Exalead) in pages, looping through it a "page" at a time.
+
white-space: -pre-wrap;
:*As it does so, it checks a few things like litigation holds, etc.
+
white-space: -o-pre-wrap;
:*It also finds children of those messages as well.<p>
+
word-wrap: break-word;
<b>3.</b> It puts all those items in a list of items to be deleted from the index (stated for simplicity's sake - it is more complicated than this).  The list is kept in memory and is lost if Tomcat is stopped.
+
margin-left: 2em;
<br><b>4.</b> Meanwhile it also updates each message record, putting it in a “deleted” state in t_Message.state, also referred to as “the dumpster”.  Here are the message states:
+
width: 50%">
 +
 +
IndexSearcher.search: +delivered:[* TO 20150226215106] +(+domain:steve +postoffice:parents), totalHits: 8144, localHits:1000, sort:<score>
  
:{| class="wikitable"
+
</pre>
|-
+
| 1L
+
| private static final long serialVersionUID
+
|-
+
| 1
+
| private static final short STATE_DELETED
+
|-
+
| 2
+
| private static final short STATE_LITIGATIONHOLD
+
|-
+
| 4
+
| private static final short STATE_READ
+
|-
+
| 8
+
| private static final short STATE_UPDATE
+
|-
+
| 16
+
| private static final short STATE_SUBJECTHTML
+
|-
+
| 32
+
| private static final short STATE_ATTACHMENTS
+
|-
+
| 64
+
| private static final short STATE_HASTAG
+
|}
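Each weight in the table above is a distinct power of two, so a single stored state value can combine several states at once. A minimal Python sketch of decoding such a value follows. The class and function names are hypothetical; only the weights and state names (with the STATE_ prefix dropped) come from the table.

```python
from enum import IntFlag


class MessageState(IntFlag):
    # Weights copied from the message-state table; the class itself is illustrative.
    DELETED = 1
    LITIGATIONHOLD = 2
    READ = 4
    UPDATE = 8
    SUBJECTHTML = 16
    ATTACHMENTS = 32
    HASTAG = 64


def decode_state(value: int) -> list:
    """Return the names of every state whose bit is set in value."""
    return [flag.name for flag in MessageState if value & flag]


def is_deleted(value: int) -> bool:
    """DELETED has weight 1, so any odd state value implies DELETED."""
    return bool(value & MessageState.DELETED)


print(decode_state(13))  # ['DELETED', 'READ', 'UPDATE']
print(is_deleted(13))    # True
print(is_deleted(12))    # False
```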
+
  
:DELETED has a weight of 1, which means ANY ODD NUMBERED value of state implies DELETED. Binary weight means multiple states can exist. For example, “13” implies:
+
===== =====
 +
It appears that the database driver picks this up and displays all of the Retain message IDs from '''retain.t_message.message_id'''(?) after '''"inSet:"'''
  
::Total state values: 1 + 4 + 8 = 13:
+
<pre style="white-space: pre-wrap;
::*DELETED (1)
+
white-space: -moz-pre-wrap;
::*READ (4)
+
white-space: -pre-wrap;
::*UPDATE (8)
+
white-space: -o-pre-wrap;
 +
word-wrap: break-word;
 +
margin-left: 2em;
 +
width: 50%">
 +
 
 +
[DeletionJobTask-1-thread-1] TRACE com.gwava.hibernate.HibernateStringUtil - inSet: [message IDs]
 +
 
 +
</pre>
 +
 
 +
===== =====
 +
... and then proceeds to list the first 500: 
 +
 
 +
<pre style="white-space: pre-wrap;
 +
white-space: -moz-pre-wrap;
 +
white-space: -pre-wrap;
 +
white-space: -o-pre-wrap;
 +
word-wrap: break-word;
 +
margin-left: 2em;
 +
width: 50%">
 +
 
 +
[DeletionJobTask-1-thread-1] INFO  com.gwava.deletion.AbstractDeletionTaskExecutor - List<EMail>: [message IDs]
 +
 
 +
</pre>
 +
===== =====
 +
At this point, it is assumed that it sends that list to the Indexer with instructions to delete them from the indexes as we see these entries in the Indexer log:
 +
 
 +
<pre style="white-space: pre-wrap;
 +
white-space: -moz-pre-wrap;
 +
white-space: -pre-wrap;
 +
white-space: -o-pre-wrap;
 +
word-wrap: break-word;
 +
margin-left: 2em;
 +
width: 50%">
 +
 
 +
IndexDeletionMessageConsumerImpl - Trying to process the operation: INDEX_DELETE
 +
LuceneIndexingManager - Delete [message ID] from index
 +
LuceneIndexingManager - Delete [message ID] from index
 +
etc...
 +
 
 +
</pre>
 +
===== =====
 +
It does this in batches of 500, as we see these log entries in the Indexer log every 500 items:
 +
 
 +
<pre style="white-space: pre-wrap;
 +
white-space: -moz-pre-wrap;
 +
white-space: -pre-wrap;
 +
white-space: -o-pre-wrap;
 +
word-wrap: break-word;
 +
margin-left: 2em;
 +
width: 85%">
 +
 
 +
IndexSearchMessageConsumerImpl - Trying to process the operation: INDEX_DELETION_TASK_SEARCH
 +
IndexSearchMessageConsumerImpl - DeletionQuery:delivered:[* TO 20150226215106] AND (domain:steve AND postoffice:parents)
 +
LuceneIndexingSearcher - searching with offset: 0, count: 500
 +
LuceneSearchController - query: delivered:[* TO 20150226215106] AND (domain:steve AND postoffice:parents) field: uuid
 +
LuceneSearchController - query.toString: +delivered:[* TO 20150226215106] +(+domain:steve +postoffice:parents)
 +
LuceneSearchController - 1425073871005, start query: +delivered:[* TO 20150226215106] +(+domain:steve +postoffice:parents)
 +
LuceneSearchController - 1425073871014, end query: +delivered:[* TO 20150226215106] +(+domain:steve +postoffice:parents)
 +
LuceneSearchController - IndexSearcher.search: +delivered:[* TO 20150226215106] +(+domain:steve +postoffice:parents), totalHits: 7644, localHits:1000, sort:<score>
 +
IndexDeletionMessageConsumerImpl - Trying to process the operation: INDEX_DELETE
 +
 
 +
</pre>
 +
 
 +
=====3. Removes litigation hold messages=====
 +
Removes all messages pertaining to users that are on litigation hold from the list of messages to be deleted.
 +
 
 +
<pre style="white-space: pre-wrap;
 +
white-space: -moz-pre-wrap;
 +
white-space: -pre-wrap;
 +
white-space: -o-pre-wrap;
 +
word-wrap: break-word;
 +
margin-left: 2em;
 +
width: 50%">
 +
 
 +
If DEBUG: "[emailID] is under litigation hold and cannot be deleted"
 +
If DEBUG: "[emailID] doesn’t belong to user"
 +
 
 +
</pre>
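The filtering in this step can be pictured as a simple pass over the candidate list. The sketch below is illustrative only: the function name, the tuple layout, and the owner names are hypothetical, and it does not reflect how Retain actually stores holds. It reuses only the idea from the text that messages belonging to held users are dropped from the deletion list and reported at DEBUG level.

```python
def remove_held_messages(candidates, held_users):
    """Split candidates into deletable IDs and IDs skipped due to litigation hold.

    candidates: list of (email_id, owner) tuples (hypothetical layout)
    held_users: set of owners currently on litigation hold
    """
    deletable, skipped = [], []
    for email_id, owner in candidates:
        if owner in held_users:
            # Mirrors the DEBUG message quoted in this step.
            print(f"{email_id} is under litigation hold and cannot be deleted")
            skipped.append(email_id)
        else:
            deletable.append(email_id)
    return deletable, skipped


candidates = [(823104, "alice"), (823105, "bob"), (823106, "alice")]
deletable, skipped = remove_held_messages(candidates, held_users={"bob"})
print(deletable)  # [823104, 823106]
print(skipped)    # [823105]
```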
 +
 
 +
=====4. Updates each message record, putting it in a “deleted” state=====
 +
This is known as putting the message record "in the dumpster".
 +
 
 +
There are no log messages for this step.  This step is where it puts the messages in the "dumpster".  The dumpster is a folder record in the <span style=color:blue>'''t_folder table'''</span> and is literally named "Dumpster".
 +
<br><br>
 +
You can find the messages waiting to be deleted in the t_message table by using the following query: <span style=color:blue>'''SELECT * FROM retain.t_message where folder_id = 273;'''</span> 
 
<br>
 
<br>
<b>5.</b> Once the search of the index is completed (reference step #2), the indexer gets called to delete those items from the index.  
+
::'''NOTE:''' "<span style=color:orange>'''273'''</span>" is the <span style=color:blue>'''folder_id'''</span> from the <span style=color:blue>'''t_folder'''</span> table which you can get by executing this SQL query: <span style=color:blue>'''SELECT * FROM retain.t_folder where f_name = 'Dumpster';'''</span>
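Because the Dumpster folder_id differs from system to system, the two queries can be combined so that the ID does not have to be looked up by hand. The sketch below demonstrates the idea against an in-memory SQLite database with a deliberately simplified, hypothetical schema (only the columns named in this section, and without the retain. schema prefix). It is not the real Retain schema, and the combined query should be tried on a test system before being relied on.

```python
import sqlite3

# In-memory stand-in for the Retain database; table layout is a simplified assumption.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_folder  (folder_id INTEGER PRIMARY KEY, f_name TEXT);
    CREATE TABLE t_message (message_id INTEGER PRIMARY KEY, folder_id INTEGER);
    INSERT INTO t_folder  VALUES (1, 'Inbox'), (273, 'Dumpster');
    INSERT INTO t_message VALUES (3608801, 1), (3608803, 273), (3608804, 273);
""")

# Look up the Dumpster folder by name instead of hard-coding its folder_id.
QUERY = """
    SELECT m.message_id
    FROM t_message AS m
    WHERE m.folder_id IN (
        SELECT f.folder_id FROM t_folder AS f WHERE f.f_name = 'Dumpster'
    )
    ORDER BY m.message_id
"""

queued = [row[0] for row in conn.execute(QUERY)]
print(queued)  # [3608803, 3608804]
```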
<p>
+
<br>
The time for this to occur varies from 0 seconds to 100 years, depending on the indexer load.   At this point, a browse of the archive mailbox already fails to show it because the item is now marked for deletion in the database. <p>
+
<br>
 +
 
 +
== THE FOLLOWING INFORMATION WAS GIVEN TO ME BY MIKE BELL AND IS VERY SKETCHY - WAITING FOR MORE ACCURATE AND PIN-POINT EXPLANATIONS ==
 +
<br>
 +
<b>5.</b> Once the search of the index is completed (reference step #2), the indexer gets called to delete those items from the index. The time for this to occur varies depending on the indexer load and the scope of the deletion job. It could take minutes, hours, or weeks. <br>
 +
 
 +
<br>At this point, a browse of the archive mailbox already fails to show the item(s) because the item is now marked for deletion in the database.<br>
 
   
 
   
Because the item is marked for deletion in the database and before the indexer removes it from the index, it is possible for someone to perform a search of the maiblox (which uses the index) and get back a hit; however, under this scenario, all that will happen is that in the search results the item will be shown as “---- this messages is queued for deletion ----” and nothing more.<p>
+
<br>Because the item is marked for deletion in the database and before the indexer removes it from the index, it is possible for someone to perform a search of the mailbox (which uses the index) and get back a hit; however, under this scenario, all that will happen is that in the search results the item will be shown as “---- this messages is queued for deletion ----” and nothing more.<br>
  
The deletion job keeps looping through each page of the index until it is done.  This happens while the INDEXER continues working on deleting items from the index.  Once done with the list, that's the end of deletion job.
+
<br>The deletion job keeps looping through each page of the index until it is done.  This happens while the INDEXER continues working on deleting items from the index.  Once done with the list, that's the end of deletion job.
  
 
===STAGE 2 – The Dumpster Thread===
 
  
The Dumpster Thread is awakened by the end of the deletion job and looks for new items in the dumpster (see Stage 1, Step 4). If it finds them, it pages through them and deletes those records in the database. Events that can trigger the Dumpster Thread are:
+
The "dumpster thread" is awakened by the end of the deletion job and looks for new items in the dumpster. The "dumpster" is the <b><span style=color:blue>t_dsref.f_state</span></b> field. It would have one of two values:
Deletion job completion.
+
::*0 = normal
Server boot up.
+
::*1 = queued for deletion
Maintenance.
+
If the final document entry is orphaned (reference Count==0), it calls the StorageEngine to delete the physical file.  See “Storage Engine Process” for more information.
+
When "Dumpster Thread ending" appears in the server log, It  it just means its last pass found nothing left to process.  See Appendix B, “Mutliple Jobs Running Simultaneously” for scenarios where there might still be items to be deleted.
+
  
Storage Engine Process
+
<br>As it finds items with a state value of "1", it pages through them and deletes those message records in the database, the ones where the folder_id is set to the dumpster folder's ID. In the sample text from the server log, the number following "Delete" and "Deleting node" is the message ID:
A StorageEngine is just a black box with a delete method attached. How it works and how fast depends on the implementation:
+
 
 +
<pre style="white-space: pre-wrap;
 +
white-space: -moz-pre-wrap;
 +
white-space: -pre-wrap;
 +
white-space: -o-pre-wrap;
 +
word-wrap: break-word;
 +
margin-left: 2em;
 +
width: 50%">
 +
15:54:00, 985[DumpsterThread] [TRACE] DumpsterDiver: Delete 3608803
 +
15:54:00, 986[DumpsterThread] [INFO ] DumpsterDiver: Deleting node 3608803
 +
</pre>
 +
 
 +
Events that can trigger the dumpster thread are:
 +
::*Deletion job completion.
 +
::*Server boot up.
 +
::*Maintenance.
 +
 
 +
<br>If the final document entry is orphaned (reference count=0), it calls the StorageEngine to delete the physical file.  See “<b>Storage Engine Process</b>” for more information (below).<br>
 +
 
 +
<br>When "Dumpster Thread ending" appears in the server log, it just means its last pass found nothing left to process in the database.  It does not necessarily mean that the physical file itself has been deleted yet by the storage engine.
 +
 
 +
===Storage Engine Process===
 +
A StorageEngine is just a black box with a delete method attached. How it works and how fast it is depend on the implementation:
 +
 
 +
:<b>Retain 2.x and Before:  Standard Storage Engine</b><br>
 +
:StandardStorageEngine, which we used regularly before 3.0, simply finds the file and deletes it.  It is a simple, straightforward method.<br>
 +
 
 +
:This was changed in Retain 3 because a straightforward implementation is too slow.  It would synchronously delete index pieces, DB pieces, and so on.  Deleting in a strictly linear fashion (delete, wait for the delete to finish, delete again, wait again) would be simple, but it locks things up for days.<br>
 +
 
 +
:<b>Retain 3.x:  DataStore_Process Storage Engine</b>
 +
:DataStore does more complicated things. Internally it simply finds the references in t_dsref and marks them in the deleted state as mentioned in the "Dumpster Thread" discussion (above). It does nothing more.<br>
 +
 
 +
:At some point, another background thread specific to the DataStore looks for items in the deleted state. After performing various sanity checks, it removes the t_dsref entry and the storage engine deletes the corresponding file from disk.
 +
 
 +
===Reading Deletion Logs===
 +
 
 +
As you go through the Server log you can trace the progress of the job:
 +
 
 +
If you schedule a deletion job, it will run during the maintenance cycle at 1 a.m. (by default):
 +
 
 +
<code>
 +
2015-08-12 01:03:22,584 [Maintain] INFO  com.gwava.jobs.MaintenanceJob - Execute deletion jobs
 +
</code>
 +
 
 +
But if jobs are set to manual then you will see this message:
 +
 
 +
<code>
 +
2015-08-12 01:03:22,593 [DeletionJobTask-1-thread-3] DEBUG com.gwava.deletion.DeletionTaskHelper - Skipping deletion job. Only manual trigger selected.
 +
</code>
 +
 
 +
You can trigger a job manually by clicking on the "Run Job Now" button under the Schedule tab and you will see this in the logs:
 +
 
 +
<code>
 +
2015-08-12 09:06:03,072 [DeletionJobTask-1-thread-1] INFO  com.gwava.deletion.DeletionTaskHelper - Manual trigger of deletion job requested. Not checking schedule
 +
</code>
 +
 
 +
And the job begins:
 +
 
 +
<code>
 +
2015-08-12 09:06:03,072 [DeletionJobTask-1-thread-1] INFO  com.gwava.deletion.AbstractDeletionTaskExecutor - Running deletion job.
 +
2015-08-12 09:06:03,072 [DeletionJobTask-1-thread-1] INFO  com.gwava.deletion.AbstractDeletionTaskExecutor - Normal deletion mode
 +
2015-08-12 09:06:03,072 [DeletionJobTask-1-thread-1] INFO  com.gwava.deletion.AbstractDeletionTaskExecutor - autoApprove:true
 +
2015-08-12 09:06:03,074 [DeletionJobTask-1-thread-1] DEBUG com.gwava.deletion.DJDeletionOperation - Deletion report stored at /var/opt/beginfinite/retain/archive/Deletion8464290409683004469EYSZTWSUTFUWYFMRMHWHU.html
 +
</code>
 +
 
 +
If you are running the job against individual users you'll see them listed:
 +
 
 +
<code>
 +
2015-08-12 09:06:03,094 [DeletionJobTask-1-thread-1] INFO  com.gwava.deletion.DJDeletionOperation - Include user scope: [4632DB80-0EA9-0000-9A73-667864633738]
 +
</code>
 +
 
 +
Then it begins to build batches of 500 eligible items from the indexes:
 +
 
 +
<code>
 +
2015-08-12 09:06:05,301 [DeletionJobTask-1-thread-1] TRACE com.gwava.hibernate.HibernateStringUtil - inSet: 823104,823105,...,824095,824096
 +
2015-08-12 09:06:05,301 [DeletionJobTask-1-thread-1] INFO  com.gwava.deletion.AbstractDeletionTaskExecutor - List<EMail>: 823104,823105,...,824095,824096
 +
2015-08-12 09:06:05,373 [DeletionJobTask-1-thread-1] INFO  com.gwava.deletion.AbstractDeletionTaskExecutor - fromIndex=0,toIndex=100
 +
2015-08-12 09:06:07,100 [DeletionJobTask-1-thread-1] INFO  com.gwava.deletion.AbstractDeletionTaskExecutor - fromIndex=100,toIndex=200
 +
2015-08-12 09:06:07,583 [DeletionJobTask-1-thread-1] INFO  com.gwava.deletion.AbstractDeletionTaskExecutor - fromIndex=200,toIndex=300
 +
2015-08-12 09:06:08,393 [DeletionJobTask-1-thread-1] INFO  com.gwava.deletion.AbstractDeletionTaskExecutor - fromIndex=300,toIndex=400
 +
2015-08-12 09:06:09,427 [DeletionJobTask-1-thread-1] INFO  com.gwava.deletion.AbstractDeletionTaskExecutor - fromIndex=400,toIndex=500
 +
2015-08-12 09:06:10,913 [DeletionJobTask-1-thread-1] DEBUG com.gwava.deletion.AbstractDeletionTaskExecutor - deleteIndexDocs
 +
2015-08-12 09:06:10,913 [DeletionJobTask-1-thread-1] DEBUG com.gwava.deletion.AbstractDeletionTaskExecutor - deleteIndexDocs, tl.size: 500
 +
</code>
 +
 
 +
Once it has built the batches of items to be deleted, it sends the notification email (yes, we know; this is going to be changed so that the email is sent when the deletion job actually ends) and begins to remove items from the database:
 +
 
 +
<code>
 +
2015-08-12 09:06:58,459 [DeletionJobTask-1-thread-1] INFO  com.gwava.deletion.DeletionTaskHelper - Sending mail after deletion job...
 +
2015-08-12 09:06:58,501 [DeletionJobTask-1-thread-1] INFO  com.gwava.deletion.AbstractDeletionTaskExecutor - DeletionTaskExecutor ends
 +
2015-08-12 09:06:58,508 [DumpsterThread] DEBUG com.gwava.deletion.DeletionJobFactory - Dumpster Thread beginning
 +
2015-08-12 09:06:58,512 [DumpsterThread] INFO  com.gwava.deletion.DumpsterDiver - Dumpster dive begins
 +
2015-08-12 09:06:58,573 [DumpsterThread] INFO  com.gwava.deletion.DumpsterDiver - Got these many children of dumpster500
 +
2015-08-12 09:06:58,573 [DumpsterThread] TRACE com.gwava.deletion.DumpsterDiver - Delete 823039
 +
2015-08-12 09:06:58,582 [DumpsterThread] INFO  com.gwava.deletion.DumpsterDiver - Deleting node 823039
 +
</code>
 +
 
 +
and continues until that is finished:
 +
 
 +
<code>
 +
2015-08-12 09:09:26,605 [DumpsterThread] INFO  com.gwava.deletion.DumpsterDiver - Delete orphaned documents
 +
2015-08-12 09:09:26,605 [DumpsterThread] INFO  com.gwava.deletion.ng.DeleteDocuments - Removing orphaned documents page
 +
</code>
 +
 
 +
Then the job moves on to delete items from disk, which is also done in batches:
 +
 
 +
<code>
 +
2015-08-12 09:09:26,692 [DumpsterThread] TRACE com.gwava.message.dao.DeleteDao - Deleting com.gwava.dao.social.Document id=841321, hash=560170B00B4B4C4FABA777D76D93BA98527C12E77AE77A6C2F7DB4580AF0FE8D
 +
2015-08-12 09:09:40,482 [DumpsterThread] INFO  com.gwava.deletion.ng.DeleteDocuments - Executed query in 13877
 +
2015-08-12 09:09:40,489 [DumpsterThread] INFO  com.gwava.deletion.ng.DeleteDocuments - deleteDocumentList:560170B00B4B4C4FABA777D76D93BA98527C12E77AE77A6C2F7DB4580AF0FE8D 0 841321
 +
</code>
 +
 
 +
until it is done and the deletion job ends:
 +
 
 +
<code>2015-08-12 09:12:30,777 [DumpsterThread] INFO  com.gwava.deletion.ng.DeleteDocuments - Removing orphaned documents page
 +
2015-08-12 09:12:30,777 [DumpsterThread] INFO  com.gwava.deletion.ng.DeleteDocuments - Executed query in 0
 +
2015-08-12 09:12:30,777 [DumpsterThread] INFO  com.gwava.deletion.ng.DeleteDocuments - deleteOrphanedDocuments: Nothing left
 +
2015-08-12 09:12:30,778 [DumpsterThread] INFO  com.gwava.deletion.DumpsterDiver - Dumpster dive completed
 +
2015-08-12 09:12:30,778 [DumpsterThread] DEBUG com.gwava.deletion.DeletionJobFactory - Dumpster Thread ending
 +
 
 +
</code>
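The walkthrough above can be turned into a quick check of where a job currently stands. The following is a minimal Python sketch, not a GWAVA tool: it scans log lines for marker text copied from the excerpts above and reports when each milestone was first seen. The milestone labels and the function name find_milestones are illustrative names chosen for this sketch.

```python
import re

# Marker text is copied from the log excerpts above. The labels on the left
# are illustrative names chosen for this sketch, not Retain terminology.
MARKERS = {
    "index batches finished": "DeletionTaskExecutor ends",
    "database removal started": "Dumpster dive begins",
    "orphaned document cleanup started": "Delete orphaned documents",
    "deletion job finished": "Dumpster dive completed",
}

# Matches the leading timestamp of the 2015-style log lines shown above.
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}")


def find_milestones(lines):
    """Return {label: timestamp} for the first occurrence of each marker."""
    found = {}
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            continue
        for label, marker in MARKERS.items():
            if label not in found and marker in line:
                found[label] = match.group(1)
    return found


sample = [
    "2015-08-12 09:06:58,501 [DeletionJobTask-1-thread-1] INFO  com.gwava.deletion.AbstractDeletionTaskExecutor - DeletionTaskExecutor ends",
    "2015-08-12 09:06:58,512 [DumpsterThread] INFO  com.gwava.deletion.DumpsterDiver - Dumpster dive begins",
    "2015-08-12 09:12:30,778 [DumpsterThread] INFO  com.gwava.deletion.DumpsterDiver - Dumpster dive completed",
]
milestones = find_milestones(sample)
for label, stamp in milestones.items():
    print(stamp, label)
```

To use it against a real log, pass the open file object instead of the sample list. If "deletion job finished" is missing from the result, the job is either still running or was interrupted.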
 +
 
 +
==Deletion Management - Retain 4.2==
 +
<br>
 +
<b>OVERVIEW</b><br>
 +
Retain 4.2 introduces a new Deletion Management system that is more effective, more reliable, and more functional than the Deletion Management in previous versions.
 +
In the past, customers have run into the following problems with Deletion Management:<br>
 +
• Understanding how to configure, and launch Deletion Management<br>
 +
• Deletion Management not deleting all of the data, and therefore not reclaiming the storage space that the customer needs.<br>
 +
• Creating orphaned documents, either due to indexing errors, errors within the code of Deletion Management, a restart of the tomcat service, or a restart of the server itself. These orphaned documents cannot be deleted, and cause customers to lose confidence in the reliability of the product.<br>
 +
• Deletion Management reports show only 10k items to be deleted, which doesn’t accurately depict what the customer would like to see removed from their system.<br>
 +
• When the Deletion Management runs it doesn’t reliably delete messages, and customers are stuck with messages that are still searchable when they should have been deleted. This includes messages that go back years.<br><br>
 +
 
 +
<b>IMPROVEMENTS AND CHANGES</b><br>
 +
The new Deletion Management resolves all of the above issues. More importantly, it is designed to let customers delete their data easily using a friendlier interface that matches the archiving process, standardizing the look and feel of the software. At the same time, customers and support can be assured that the data will be reliably removed, and that if the Deletion Job is interrupted at any time, it will pick up from where it left off. This prevents orphaned documents and gives customers confidence that their storage space will be reclaimed.<br>
 +
 
 +
Listed below are the specifications and improvements of the new Deletion Management.<br>
 +
 
 +
<b>Interface: Setup and Launching a Deletion Job</b><br>
 +
*The Deletion Management has a new interface mirroring the archive jobs menu. This is meant to standardize the look and feel of Retain, and to help customers set up when and what to delete in a convenient format.<br>
 +
*The Deletion Management no longer runs during Maintenance. Instead, it runs on its own deletion schedule as its own background process.<br>
 +
*The interface for Deletion Management is now contained under a new Data Removal section, which is only used for administration. Rights can be assigned to specified users and groups as necessary.<br>
 +
 +
<b>Deleting Messages</b><br>
 +
There are three ways to delete messages in Retain:<br><br>
 +
<b>1. Mailbox Deletion:</b><br>
 +
This option allows administrators to delete ALL messages from the selected users. It is intended to run without a schedule; the run now button is the only means to start the deletion. A report can be generated before the messages are deleted.<br>
 +
The user(s) that are deleted from the Mailbox Deletion do not get removed from the address book in Retain. This will be added into future versions of Retain. In Retain 4.3 an option will be put into the Deletion Management to remove the user from the address book.<br>
 +
Customers can use the Mailbox Deletion to remove users they no longer want to archive or who have since left their email system. Once messages have been completely removed from the Retain system using mailbox deletion, the ts_store in the t_abook table is set to 0 for the mailbox and the user is no longer counted as an active license. It is highly recommended that the administrator remove the user from the e-mail system, or exclude the user from archiving altogether. <br>
 +
 +
<b>2. Item Deletion</b><br>
 +
The biggest improvement of the Deletion Management is being able to schedule a job, or multiple jobs and have them run without using the maintenance process in Retain.
 +
This is done by creating a schedule for each job (when you want to kick off the job), setting up a profile (what you want to remove), and creating a job that assigns the e-mail servers or mailboxes. This gives customers flexibility, and a job can be kicked off very easily by adjusting the schedule. Future versions of Retain will have a run now button for the item deletion.<br>
 +
 
 +
<b>3. User Item Deletion</b><br>
 +
Users, if given the access rights by the administrator, can select messages while browsing, or from search results in their archive, and click to delete. Retain will delete the selected messages. This is best suited to a small number of messages. Large numbers of messages, or whole accounts, are best handled with Item Deletion or Mailbox Deletion.
 +
 
 +
===Scheduling a Deletion Job===
 +
To schedule a job to run, create a schedule, profile and job, just like creating an archive job.
 +
 
 +
Choose the mailboxes in the job, select in the profile what to delete, and the date ranges. Then adjust the schedule to run at the desired time.
 +
 
 +
The Retain Server log will show when the job runs. Look for: DeletionTaskHelper
 +
 
 +
It will then state that it is deleting indexes, then deleting documents from the database, and finally deleting them from the physical storage.
 +
 
 +
===Deletion Reports===
 +
The Deletion Management can create a report before it deletes the messages. These reports are stored in the /archive directory and are named with the date and time that the job kicked off. They show what has been deleted. A list of users and messages is also written to a CSV file that tracks what has been deleted.<br>
 +
In previous versions of Retain the report would only show 10,001 messages maximum. This was a hassle for customers who wanted to know what would be deleted first, before they proceed with the actual deletion. Retain 4.2 fixes this, and provides the customer with a full report which the administrators can view and see every message that will be deleted for each account. <br>
 +
Notification emails are also sent out, but they only show basic information about whether the job has completed. Future revisions of Retain will show more detail in the reports, including the Mailbox Deletion report. Future revisions will also have a Deletion Report in Reporting and Monitoring that will provide more information.<br>
 +
***Overall, the changes to the interface and the ability to run the Deletion Management whenever the customer needs it are major improvements to the product. In addition, the system deletes messages reliably, and the way it works on the back end ensures that no messages become orphaned or get missed during the process.
 +
 
 +
===Deletion Management Deletion Process – Backend===
 +
When the Deletion Management runs, it will first run a query based on the criteria specified within the job. This query is dependent on the indexes.<br>
 +
Based on the profile and the mailboxes selected in the job, the Deletion Management flags the messages in the t_message table by altering the f_state field. If the last binary digit of f_state is 1 (that is, the value is an odd number), Retain treats the message as flagged for deletion. The next step deletes the indexes and, as it does so, creates a log file in the /archive directory to track which users and messages it has deleted.<br>
 +
The deletion process will delete from the indexes first, then the database, and finally from the physical disk. This method also helps to prevent orphaned documents. As the items are flagged in the database they are tracked through the f_state record. Even if the indexes are deleted, but the metadata in the database is not, the Deletion Management will recognize this, and remove the messages. Even if the Retain Service is interrupted in any way, it will recognize the flags and will delete based on where it left off. This prevents orphaned items.<br>
 +
If the Retain service is ever interrupted, restart the schedule and it will pick up from where it left off and delete the messages successfully.<br><br>
 +
The following are more details with the backend processes on each section of how the Deletion Management removes messages:<br>
 +
<b>Index Process – Backend</b><br>
 +
When a mailbox deletion, or a schedule, profile, and job, is set to delete messages, the Deletion Management will run a query to see which users and items need to be deleted. It is important that the indexes are functioning properly; if they are not, a rebuild of the indexes is necessary.<br>
 +
Once the indexes are queried, the Deletion Management creates a CSV file of the users and items that are to be deleted. This file is stored in the /archive directory and is named with the date and time that the job ran.
 +
The query will proceed to flag the messages that need to be deleted. As mentioned above, it will modify the f_state record in the t_message table. Once it is flagged, it will remove the message.
 +
*Note: Once a message is flagged to be deleted, there is no way to reverse it. Customers need to be sure that they know exactly what they are deleting before removing any data from Retain. A backup of the data and the database is required if a deleted message ever needs to be restored.<br>
 +
Once the query is finished, the Indexes will be removed.<br><br>
 +
 
 +
Item deletion pseudocode
 +
*When job starts:
 +
#query index and get the list of eligible items for deletion
 +
#go through them page by page (each page = 500)  --> EnhancedDeletionTaskExecutor.deletePage();
 +
##find records in DB 100 by 100
 +
###mark them for deletion
 +
###assign each message to dumpster node
 +
####if this message has a message parent, also assign it dumpster node
 +
####if this message has children assign them to dumpster node. also append them to the list of index IDs for deletion
 +
###commit transaction every 10 messages
 +
##delete items from index
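The paging in the "When job starts" steps above can be sketched as follows. This is a hedged Python illustration, not the Retain implementation (which is Java): the function names are made up for the sketch, the message IDs are hypothetical, and committing the remainder at the end of each sub-batch is an assumption. It only shows how one page of 500 IDs breaks down into sub-batches of 100 (the fromIndex/toIndex lines in the log) with a commit every 10 messages.

```python
def sub_batches(page_ids, size=100):
    """Yield (from_index, to_index, ids) slices, mirroring fromIndex/toIndex."""
    for start in range(0, len(page_ids), size):
        end = min(start + size, len(page_ids))
        yield start, end, page_ids[start:end]


def plan_page(page_ids, sub_batch_size=100, commit_every=10):
    """Return the sub-batch ranges and the number of commits for one page."""
    ranges = []
    commits = 0
    for start, end, ids in sub_batches(page_ids, sub_batch_size):
        ranges.append((start, end))
        pending = 0
        for _message_id in ids:
            # The real job would mark the record for deletion here and
            # assign it (plus any parent/children) to the dumpster node.
            pending += 1
            if pending == commit_every:
                commits += 1
                pending = 0
        if pending:
            commits += 1  # assumption: leftover messages are committed too
    return ranges, commits


page = list(range(823104, 823104 + 500))  # one full page of hypothetical IDs
ranges, commits = plan_page(page)
print(ranges)
print(commits)
```

For a full page this yields the five ranges seen in the log (0 to 100 through 400 to 500) and 50 commits.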
 +
 
 +
*When Dumpster starts:  DumpsterTaskExecutor.removeFromDumpster()
 +
#delete messages from DB : processAndDeleteMessagesFromDB(), deleteMessage()
 +
##MessageProperty
 +
##MessageRecipient
 +
##MessageAttachment  --> this will decrement refCount --> orphaning the document for the message (if no other message is referring to it)
 +
##LegacyID
 +
##MessageTag
 +
##Message
 +
#delete documents :  deletionOperations.getDeleteDocuments().deleteOrphanedDocuments()
 +
##fetch top 1000 documents
 +
##delete them from Disk
 +
##delete them from DB
 +
##continue until nothing is left
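The relationship between the refCount decrement and the orphaned-document cleanup in the Dumpster steps above can be sketched like this. It is a simplified Python model under stated assumptions: the in-memory dictionary stands in for t_document, the set stands in for the files on disk, and the function names and document hashes are hypothetical.

```python
def delete_message(message, ref_counts):
    """Drop a message's attachment references, decrementing each refCount."""
    for doc_hash in message["attachments"]:
        ref_counts[doc_hash] -= 1


def delete_orphaned_documents(ref_counts, disk, page_size=1000):
    """Remove documents with refCount 0, page by page, until nothing is left."""
    removed = []
    while True:
        orphans = [h for h, count in ref_counts.items() if count == 0][:page_size]
        if not orphans:
            break
        for doc_hash in orphans:
            disk.discard(doc_hash)    # delete the file from disk first
            del ref_counts[doc_hash]  # then delete the database row
            removed.append(doc_hash)
    return removed


ref_counts = {"AAA": 2, "BBB": 1}  # hypothetical hashes: AAA is shared by two messages
disk = {"AAA", "BBB"}
delete_message({"attachments": ["AAA", "BBB"]}, ref_counts)
removed = delete_orphaned_documents(ref_counts, disk)
print(removed)
print(sorted(disk))
```

Document AAA survives because another message still refers to it; only BBB drops to a count of 0 and is removed from both the disk and the table.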
 +
 
 +
<b>Database Process – Backend</b><br>
 +
The next process in the data removal is deleting the entries in the database. Once the indexes are deleted, Retain will proceed to delete the metadata, still using the flags as its measure to delete the data.<br>
 +
The f_state is checked, and then the data is removed.<br>
 +
 +
The following is a more detailed sketch of what the f_state looks like and how the Deletion Management would delete messages:<br>
 +
 
 +
Here are the flags for f_state in t_message<br>
 +
STATE_DELETED=1;  // Item is in Deleted State<br>
 +
STATE_LITIGATIONHOLD = 2; // Item has been Litigation Held (not in a deleted state) <br>
 +
STATE_READ = 4; // Item has been Read (not in a deleted state)<br>
 +
STATE_UPDATE=8; // Index should UPDATE instead of ADD - eg atomically remove old documents associated and then add (not in a deleted state)<br>
 +
STATE_SUBJECTHTML=16; // SUBJECT (or other pieces) may have encoded HTML (not in a deleted state)<br>
 +
STATE_SUBJECTHTML + STATE_DELETED = 17; // SUBJECT (or other pieces) may have encoded HTML (message set to be deleted)<br>
 +
STATE_ATTACHMENTS = 32; // Has Attachments (not in a deleted state)<br>
 +
STATE_HASTAG=64; // Has Message Tags (not in a deleted state)<br>
 +
STATE_CONFIDENTIAL=128; // Is in Confidential State (not in a deleted state)<br>
 +
STATE_CONFIDENTIAL + STATE_DELETED = 129; // Is in Confidential State (message set to be deleted)<br>
 +
 
 +
 
 +
*Although STATE_DELETED=1 means the message is to be deleted, the stored f_state value can be a different number depending on the other flags set on the message. If the message contains an attachment, for example, the value will not be exactly 1. When the message is to be deleted, the lowest-order bit of the binary number changes to 1, making the value an odd number and flagging the message for deletion.<br>
 +
 
 +
*Example database change with message flagged:
 +
44 = 00101100 --- This message has READ, UPDATE and ATTACHMENTS flags on <br>
 +
Set for deletion would change the number 44 to 45 = 00101101<br>
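The bit arithmetic in this example can be checked with a short Python sketch. The flag values are the ones listed above; the helper names decode_f_state and flag_for_deletion are illustrative and are not part of Retain.

```python
# Flag values as listed above for f_state in t_message.
FLAGS = {
    "DELETED": 1,
    "LITIGATIONHOLD": 2,
    "READ": 4,
    "UPDATE": 8,
    "SUBJECTHTML": 16,
    "ATTACHMENTS": 32,
    "HASTAG": 64,
    "CONFIDENTIAL": 128,
}


def decode_f_state(value):
    """Return the names of all flags that are set in an f_state value."""
    return [name for name, bit in FLAGS.items() if value & bit]


def flag_for_deletion(value):
    """Set the lowest-order bit, which marks the message for deletion."""
    return value | FLAGS["DELETED"]


print(decode_f_state(44))      # ['READ', 'UPDATE', 'ATTACHMENTS']
print(flag_for_deletion(44))   # 45
print(format(45, "08b"))       # 00101101
```

The same check applies to any row: a message is flagged for deletion exactly when f_state is odd.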
 +
 
 +
*Another thing to keep in mind: as the Deletion Management removes messages, it lowers the f_referenceCount of their documents in the t_document table until it reaches 0. A count of 0 tells the Deletion Management to delete the document from the database as well as from the physical disk. When the Deletion Management runs, it checks the f_referenceCount, and any document whose count equals 0 is deleted from the disk as well as from the t_document table. This also helps to prevent orphaned documents.<br>
 +
 
 +
<b>Physical Disk Removal – BackEnd</b><br>
 +
As mentioned, when f_referenceCount = 0, the Deletion Management deletes the remainder of the data from t_document, as well as from the physical disk.<br>
 +
The physical disk removal takes the longest. As the data is deleted, it will remove the .dat files from the hash, and will reclaim the disk space for the user.<br><br>
 +
 
 +
===Deletion Management Logs===
 +
The Deletion Management logs are still contained within the Retain Server logs. In future revisions of Retain they will be separated to their own logs.<br>
 +
To see that the Deletion Job is running look for this line:<br>
 +
DeletionTaskHelper-executor<br>
 +
 
 +
The Deletion Management goes through three phases to delete data from the Retain system, and the logs show each phase distinctly. The logging has been modified from previous versions to be clearer about exactly what Retain is deleting. The following examples, each with a brief explanation, show what to look for in the logs during a Deletion Job: <br><br>
 +
 
 +
'''Index Deletion:'''<br>
 +
When Retain gets a list of the data to delete via the indexes it will show the following in the log:<br>
 +
 
 +
12:29:32, 052[RTSQuartzScheduler_Deletion_Worker-1] [INFO ] DJDeletionOperation: When processing job: Generate a report but don't delete messages. <br>
 +
12:29:32, 052[RTSQuartzScheduler_Deletion_Worker-1] [INFO ] DJDeletionOperation: Querying indexes, recording in memory the message IDs of records that match the deletion criteria.<br>
 +
12:29:32, 095[RTSQuartzScheduler_Deletion_Worker-1] [DEBUG] DJDeletionOperation: No include user scope<br>
 +
12:29:32, 096[RTSQuartzScheduler_Deletion_Worker-1] [INFO ] DJDeletionOperation: Deleting PO [GodsDom.greek]<br>
 +
12:29:32, 096[RTSQuartzScheduler_Deletion_Worker-1] [TRACE] DJDeletionOperation: initializeSearch: offset=0<br>
 +
12:29:32, 096[RTSQuartzScheduler_Deletion_Worker-1] [INFO ] DJDeletionOperation: On Page #0<br>
 +
12:29:32, 174[ActiveMQ Session Task] [TRACE] IndexService: Deletion query: (domain:godsdom AND postoffice:greek)<br>
 +
12:29:32, 175[ActiveMQ Session Task] [TRACE] IndexService: Deletion filter query (fq): storeDate:[1980-01-01T19:29:32.000Z TO 2017-10-26T18:29:32.000Z]<br>
 +
12:29:32, 466[RTSQuartzScheduler_Deletion_Worker-1] [INFO ] DJDeletionOperation: Result page TotalHits: 2061 500 in result set<br>
 +
12:29:32, 468[RTSQuartzScheduler_Deletion_Worker-1] [TRACE] HibernateStringUtil: inSet: 266,269,272,275,463,466,469,472,1793,1796,1799,1802,1805,1808,1811,1814,1817,1820,1552,1555,1558,1561,1564,1567,1570,1573,1576,1579,1582,1585,1588,1591,1594,1597,1600,1603,1606,1609,1677,1680,1683,1686,1689,1692,1695,1698,1701<br><br>
 +
 
 +
'''Mailbox Deletion:'''<br>
 +
This is what you will see when performing a Mailbox Deletion: <br>
 +
 
 +
12:27:15, 528[DeletionJobTask-1-thread-2] [INFO ] AbstractDeletionTaskExecutor: DeletionTaskExecutor begins<br>
 +
12:27:15, 528[DeletionJobTask-1-thread-2] [INFO ] AbstractDeletionTaskExecutor: Running deletion job.<br>
 +
12:27:15, 529[DeletionJobTask-1-thread-2] [INFO ] AbstractDeletionTaskExecutor: Normal deletion mode<br>
 +
12:27:15, 529[DeletionJobTask-1-thread-2] [INFO ] AbstractDeletionTaskExecutor: autoApprove:true<br>
 +
12:27:15, 571[DeletionJobTask-1-thread-2] [INFO ] AbstractDeletionTaskExecutor: start full mailbox deletion of: 60beee8d-1f91-42a6-aec7-e8df19719921<br>
 +
12:27:15, 626[DeletionJobTask-1-thread-2] [DEBUG] AbstractDeletionTaskExecutor: deleteIndexDocs<br>
 +
12:27:15, 627[DeletionJobTask-1-thread-2] [DEBUG] AbstractDeletionTaskExecutor: deleteIndexDocs, SearchResults.size: 618<br>
 +
12:27:19, 325[DeletionJobTask-1-thread-2] [INFO ] AbstractDeletionTaskExecutor: Deletion of messages has been done for mailbox: 60beee8d-1f91-42a6-aec7-e8df19719921<br>
 +
12:27:19, 325[DeletionJobTask-1-thread-2] [INFO ] AbstractDeletionTaskExecutor: Now deleting folders...<br>
 +
12:27:19, 358[DeletionJobTask-1-thread-2] [INFO ] AbstractDeletionTaskExecutor: Deletion of folders has been done for mailbox: 60beee8d-1f91-42a6-aec7-e8df19719921<br>
 +
12:27:19, 436[DeletionJobTask-1-thread-2] [INFO ] AbstractDeletionTaskExecutor: Mailbox deletion has been completed for: 60beee8d-1f91-42a6-aec7-e8df19719921<br>
 +
12:27:19, 455[DeletionJobTask-1-thread-2] [INFO ] AbstractDeletionTaskExecutor: Looking for any mailbox that does not have archived messages and is mistakenly defined as active...<br>
 +
12:27:19, 455[DeletionJobTask-1-thread-2] [INFO ] AbstractDeletionTaskExecutor: Successful session retrieval<br>
 +
12:27:19, 470[DeletionJobTask-1-thread-2] [INFO ] AbstractDeletionTaskExecutor: Returning results: [247BB500-09CE-0000-B067-6F6464663663, 4885C901-09C3-0000-B067-6F6464663663, 5472E700-181B-0000-B067-6F6464663663, 7F196B80-182C-0000-B067-6F6464663663, 81219600-0922-0000-B067-6F6464663663, 83CFCA00-09BA-0000-B067-6F6464663663, 850F4C80-182C-0000-B067-6F6464663663, 850F4C81-182C-0000-B067-6F6464663663, 87E70F00-0AC2-0000-B067-6F6464663663, 88F88F00-02BF-0000-B067-6F6464663663, 8D445980-0AC2-0000-B067-6F6464663663, 903F4A00-0AC2-0000-B067-6F6464663663, 933A3A80-0AC2-0000-B067-6F6464663663, 94823080-0925-0000-B067-6F6464663663, 96352B00-0AC2-0000-B067-6F6464663663, 9B927580-0AC2-0000-B067-6F6464663663, 9E8D6600-0AC2-0000-B067-6F6464663663, A0EFC000-0AC2-0000-B067-6F6464663663, A4834700-0AC2-0000-B067-6F6464663663, AD467D80-1823-0000-B067-6F6464663663, AD467D81-1823-0000-B067-6F6464663663, B9CE6900-093C-0000-B067-6F6464663663, CE2F3A80-0844-0000-B067-6F6464663663, D36BC100-09CD-0000-B067-6F6464663663, E1193C00-0924-0000-B067-6F6464663663, FEEEAB81-09CD-0000-B067-6F6464663663]<br>
 +
12:27:19, 510[DeletionJobTask-1-thread-2] [INFO ] AbstractDeletionTaskExecutor: deleting AddressBook entry with uuid = 60beee8d-1f91-42a6-aec7-e8df19719921<br>
 +
12:27:19, 725[DeletionJobTask-1-thread-2] [INFO ] AbstractDeletionTaskExecutor: delete UUID Mapping: 128 - 60beee8d-1f91-42a6-aec7-e8df19719921<br>
 +
12:27:19, 812[DeletionJobTask-1-thread-2] [INFO ] AbstractDeletionTaskExecutor: DeletionTaskExecutor ends<br><br>
 +
 
 +
'''Database Deletion:''' <br>
 +
When messages are being deleted from the database, the log lines will look something like this: <br>
 +
12:28:03, 412[DumpsterThread] [INFO ] DeleteDocuments: deleteDocumentList:1B55E8253EB9B323B01A9773A452A30550F7665F4E4991BF133C263553A2B607 0 3604<br>
 +
12:28:03, 418[DumpsterThread] [INFO ] DeleteDocuments: deleteDocumentList:7F1C99CFAA51FB93E646F915979756493993148F431B02ACEB711975932FB209 0 3605<br>
 +
12:28:03, 421[DumpsterThread] [TRACE] DeleteDao: Deleting document entry from DB com.gwava.dao.social.Document id=4794, hash=E7FEF0A842443BDA08A6F1B561EC79DCCBB118796189D8D8EE2414FB255AE66F<br>
 +
12:30:09, 920[BackgroundDeletionThread] [TRACE] BackgroundDeletionThread: Delete DB Reference (large blob) for 4495 DEAB395DE11294E4206F6851DE565C464CC655AD4B0A1DB5D4F42389BDD9E47E<br><br>
 +
 
 +
'''Disk Deletion:''' <br>
 +
12:30:09, 784[BackgroundDeletionThread] [DEBUG] BackgroundDeletionThread: Removing blob from large blob store with hash of 1960766F46190356631CF7C2A2C772DFABFFA14F8E17E3F50FE6655BDBDCD852<br>
 +
12:30:09, 785[BackgroundDeletionThread] [TRACE] BackgroundDeletionThread: DataStoreBlobID [id=FDE7E382A645559F7274581DA4DFD1C50BF6E6F308E3C56CE6A4D26AFEAD38AB] has 1 entries...<br>
 +
12:30:09, 785[BackgroundDeletionThread] [DEBUG] BackgroundDeletionThread: Removing blob from large blob store with hash of FDE7E382A645559F7274581DA4DFD1C50BF6E6F308E3C56CE6A4D26AFEAD38AB<br>
 +
12:30:09, 786[BackgroundDeletionThread] [TRACE] BackgroundDeletionThread: DataStoreBlobID [id=A91682FEF9CB63F6DCA3DCC29EE237F2A430E2FC7BAEEE5E01A186C82AF87D50] has 1 entries...<br>
 +
12:30:09, 786[BackgroundDeletionThread] [DEBUG] BackgroundDeletionThread: Removing blob from large blob store with hash of A91682FEF9CB63F6DCA3DCC29EE237F2A430E2FC7BAEEE5E01A186C82AF87D50<br><br>
 +
 
 +
 +
For more information on reading the Retain Server and Deletion logs see this KB article:<br>
 +
http://support.gwava.com/kb/?View=entry&EntryID=2850<br><br>
  
Retain 2.x and Before: Standard Storage Engine
+
To see if the Deletion Job has finished look for this line:<br>
StandardStorageEngine, which we used regularly before 3.0 simply finds the file and deletes it. Just a simple straightforward method.
+
  DumpsterDive - End<br>
  
This was changed in Retain 3 because a straightforward implementation is too slow. It would synchronously delete index pieces, DB pieces, etc. Everything would be easy if it could delete, wait for the delete, delete, wait for the delete in a linear fashion, but that locks things up for days.
+
Upcoming for Retain 4.4 Deletion Management
 +
The following items will be included in Retain 4.3.<br>
 +
• Mailbox Deletion - create a report for all messages deleted for the accounts and put into a CSV file.<br>
 +
• Deletion Management having its own log, separate from the Retain Server log.<br>
 +
• Having a report in the Reporting and Monitoring for Deleted items<br>
 +
• Email CSV files, and detailed reports to administrator or other.<br>
 +
• Be able to remove old orphaned documents and meta-data (separate utility)<br>
 +
• Status tab that will show the details of the job and to abort if necessary.<br>
 +
Future revisions of Retain will eventually contain a dashboard that will show the progress of the deletion and when it is completed. As of right now the log files are still the only way to determine if a deletion job is running, and when it finishes.<br>
  
Retain 3.x:  DataStore_Process Storage Engine
+
===Deletion Management Queries===
DataStore does more complicated things. Internally it simply finds the references in t_dsref and marks them in the DELETED state. It does nothing more.
+
The following is a list of queries that can be run to help identify the flags and to help troubleshoot database-related issues with the Deletion Management.<br>
At some point, another background thread specific to the DataStore looks for items in the DELETED state. After performing various sanity checks, it removes the T_DSREF entry and the corresponding file from disk.
+
https://traininggwava.microfocus.net/index.php/Retain_Database#Deletion_Management_specific_Queries

Latest revision as of 19:36, 27 October 2017


PAGE OWNER: Steve Dorrough

After reading Retain Administration and Users Guide on Deletion Management, try this hands-on exercise or refer to the section below it, "How It Works"

Hands-On Exercise

Prerequisite:  Archive mail from your test mail system a '''few days in advance'''.  The more mail the better, but at least have a few items.  This exercise will require that the items be in Retain for over two days.

1.  Login to your Retain server (the actual VM server, not the RetainServer web UI).
2.  Stop tomcat: rcretain-tomcat7 stop (Linux) or stop the Apache Tomcat service on Windows.
3.  Rename the "index" directory to "index.old".  The index directory is located under your Retain storage directory (see Server Configuration | Storage).
4.  Create a new index directory.  If on Linux, make tomcat the owner of that directory:  '''chown tomcat:tomcat index'''.
5.  Switch over to the server itself again from the Retain web UI and tail the current day's RetainServer log.  Baretail.exe can be downloaded and used on Windows servers (see [http://support2.gwava.com/kb/?View=entry&EntryID=527 Location of Logs]).  For Linux, change to /var/log/retain-tomcat7 and type: tail -f RetainServer.[yyyy-mm-dd].log.
6.  Start tomcat.
7.  Login to the Retain Server web UI (http://[ip address]/RetainServer) as admin.
8.  Under the Management section in the left-hand pane, select '''Deletion Management'''.
9.  Under Core Settings, click on the checkbox for “Job Enabled” to enable it and select “Delete messages as they are processed.”
10. Under the Date Scope tab, click on the arrow for the drop-down box, “Delete messages where”.  Note the various options.  For this exercise, leave it at the default “Date Stored in Retain”, which is the date the items were archived.
11. While still under Date Scope, type “1” in the “Older than ___” field and leave the drop-down set to “Days”.
12. Under the Job Members tab, click on the drop-down box below “Include these objects” and select the mail server on which you wish to run this deletion job. Then, click on the “Add Mail Server” button.
13. Go ahead and check the boxes to have mail notifications sent to your mailbox.
14. Under the Schedule tab, click on the drop-down box for “Run Job when” and select “manual”.
15. Click on the “Save Changes” icon at the upper-right of the screen.
16. Click on the "Run Job Now" button.
17. Switch over to the actual Retain server and look at the tail of your RetainServer log.  What do you see happening?  Did anything get deleted?
18. Login to a Retain mailbox and check to see if anything older than 1 day is still there.  Was anything deleted?  Why not?
19. Logout of RetainServer web UI.
20. Go back to the actual Retain server and stop tomcat.
21. Delete the new index directory you created.
22. Rename index.old to index.
23. Start tomcat.
24. Login to the RetainServer web UI as admin.
25. Go to Deletion Management and, on the Schedule tab, click on the Run Job Now button.
26. Switch over to the actual Retain server and look at the tail of your RetainServer log.  Now what do you see?  Any deletion activity?
27. What is the role of indexes in a deletion job?  See Level 2 section to check your answer.

Troubleshooting Information

There are four main stages of a deletion job:

STAGE 1
  • Query the indexes for all message IDs matching the date criteria.
  • Delete the message records in the indexes for those message IDs that get returned.
STAGE 2
  • Mark all of the message records in the database that qualify for deletion. It sets the folder field to the "dumpster" folder.
  • Delete all message records that are in the "dumpster" folder. In the log, you'll see "Deleting node [message id]"
STAGE 3
  • Delete the corresponding attachment records for those messages from the t_document table. This is shown as "DeleteDao: Deleting com.gwava.dao.social.Document..." in the Server log.
STAGE 4
  • Delete the files off disk that correspond with the records in t_document that were removed. This is shown as " DeleteDocuments: deleteDocumentList:..." in the Server log.

It actually rotates between stage 3 and stage 4. Perhaps we should just call them both "stage 3", but one works in the DB and the other with the file system, so this explanation separates them logically. Stage 3 collects a list of files (hashes) to delete as it removes their references from the database. It probably limits the size of that list for memory and other purposes, so it does it one list at a time.

The following section will describe the deletion stages in more detail. With each stage, the log entries you should expect to see in the RetainServer log are shown. If the Server logging level is set to diagnostic, it will include all entries that apply to the situation (INFO and DEBUG); otherwise, it will just show INFO. Thus, each logging example will provide the condition followed by the text (in quotes) that will display in the log. For example, stage 1 shows the first log line as follows: If INFO: "Running deletion job." Here, "If INFO" is the condition (the logging level set on the Server) and "Running deletion job." is the text displayed in the log.

STAGE 1 – The Deletion Job

1. Deletion job is initiated.

If INFO: “Running deletion job.”
If DEBUG and if report only mode: “Will only report, not delete”
If DEBUG and not report only mode: “Normal deletion mode”
If DEBUG: “autoApprove” [Boolean value]
If DEBUG: “Deletion report stored at: [filePath]"
If DEBUG and PO chosen: “Deleting PO [PO]"


2. Searches the indexes.

It starts by sending a query to the Indexer with the given scope and gets a total number of hits as well as the number of hits within the first page. A page consists of 500 records.


If DEBUG: "On Page # [pageNumber]"
If INFO: "Aborting early, deliberately since this is a trial" (if the last page number is less than 20 and if in REPORTING ONLY mode)
          By default, we only report the first 10,000 items to be deleted 
          (500 items per page and 20 pages by default: 500 x 20 = 10,000)
If INFO: "Result page TotalHits: [total number of hits from index] [hits on current page] in result set"
[loglevel unknown]: "inSet: [list of all message IDs that are identified]"

It puts all those results in a list of items to be deleted from the index. The list is kept in memory and is lost if Tomcat is stopped at this point.


As it queries the Indexer, you'll see this entry in the Indexer log. Note that the following example is from a deletion job where the date range was set to "1 day or older" and triggered off of delivered date:


IndexSearchMessageConsumerImpl - DeletionQuery:delivered:[* TO 20150226215106] AND (domain:[domain] AND postoffice:[post office])

From the text box above, you can see that it shows that it is being triggered off of the delivered date because it states "DeletionQuery:delivered". Since the date range was "1 day or older", the range of the query is shown as follows: * TO [numeric string]. The begin date of the range is represented by an asterisk because it goes back infinitely. The end date of the range is represented by a numerical string. The first 8 characters of the numeric string represent the day before the date this was run. I ran the deletion job on 2015-02-27. I do not know what the last 6 digits represent at this time.
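When scanning a large Indexer log, it can help to pull the trigger field and the end day out of each DeletionQuery entry. The sketch below is an assumption-laden illustration: the regular expression is fitted only to the sample line above, `parse_deletion_query` is a hypothetical helper, and only the first 8 digits of the numeric string are interpreted.

```python
import re
from datetime import datetime

# Sample Indexer log line copied from the example above.
LINE = ("IndexSearchMessageConsumerImpl - DeletionQuery:delivered:"
        "[* TO 20150226215106] AND (domain:[domain] AND postoffice:[post office])")

# Captures: date field name, range start, 14-digit range end.
QUERY_RE = re.compile(r"DeletionQuery:(\w+):\[(\S+) TO (\d{14})\]")


def parse_deletion_query(line: str):
    """Return (date_field, range_start, end_day), or None if the line does not match."""
    match = QUERY_RE.search(line)
    if match is None:
        return None
    field, start, end = match.groups()
    # Only the first 8 digits (YYYYMMDD) are interpreted; the last 6 are left alone.
    end_day = datetime.strptime(end[:8], "%Y%m%d").date()
    return field, start, end_day


print(parse_deletion_query(LINE))
# ('delivered', '*', datetime.date(2015, 2, 26))
```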


It then displays the total hits found:

 
IndexSearcher.search: +delivered:[* TO 20150226215106] +(+domain:steve +postoffice:parents), totalHits: 8144, localHits:1000, sort:<score>


It appears that the database driver picks this up and displays all of the Retain message IDs from retain.t_message.message_id(?) after "inSet:"


[DeletionJobTask-1-thread-1] TRACE com.gwava.hibernate.HibernateStringUtil - inSet: [message IDs]


... and then proceeds to list the first 500:


[DeletionJobTask-1-thread-1] INFO  com.gwava.deletion.AbstractDeletionTaskExecutor - List<EMail>: [message IDs]


At this point, it is assumed that it sends that list to the Indexer with instructions to delete them from the indexes as we see these entries in the Indexer log:


IndexDeletionMessageConsumerImpl - Trying to process the operation: INDEX_DELETE
LuceneIndexingManager - Delete [message ID] from index
LuceneIndexingManager - Delete [message ID] from index
etc...


It does this in batches of 500, as we see these log entries in the Indexer log every 500 items:


IndexSearchMessageConsumerImpl - Trying to process the operation: INDEX_DELETION_TASK_SEARCH
IndexSearchMessageConsumerImpl - DeletionQuery:delivered:[* TO 20150226215106] AND (domain:steve AND postoffice:parents)
LuceneIndexingSearcher - searching with offset: 0, count: 500
LuceneSearchController - query: delivered:[* TO 20150226215106] AND (domain:steve AND postoffice:parents) field: uuid
LuceneSearchController - query.toString: +delivered:[* TO 20150226215106] +(+domain:steve +postoffice:parents)
LuceneSearchController - 1425073871005, start query: +delivered:[* TO 20150226215106] +(+domain:steve +postoffice:parents)
LuceneSearchController - 1425073871014, end query: +delivered:[* TO 20150226215106] +(+domain:steve +postoffice:parents)
LuceneSearchController - IndexSearcher.search: +delivered:[* TO 20150226215106] +(+domain:steve +postoffice:parents), totalHits: 7644, localHits:1000, sort:<score>
IndexDeletionMessageConsumerImpl - Trying to process the operation: INDEX_DELETE

3. Removes litigation hold messages

Removes all messages pertaining to users that are on litigation hold from the list of messages to be deleted.


If DEBUG: "[emailID] is under litigation hold and cannot be deleted"
If DEBUG: "[emailID] doesn’t belong to user"

4. Updates each message record, putting it in a “deleted” state

This is known as putting the message record "in the dumpster".

There are no log messages for this step. This step is where it puts the messages in the "dumpster". The dumpster is a folder record in the t_folder table and is literally named "Dumpster".

You can find the messages waiting to be deleted in the t_message table by using the following query: SELECT * FROM retain.t_message where folder_id = 273;

NOTE: "273" is the folder_id from the t_folder table which you can get by executing this SQL query: SELECT * FROM retain.t_folder where f_name = 'Dumpster';
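Because the Dumpster folder's ID can differ between systems, the two queries above can be combined so that the ID is looked up by name. The sketch below runs that combined query from Python against an in-memory SQLite database; the two-column tables are a simplified stand-in for illustration, not the real Retain schema, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Attach a second in-memory database under the name "retain" so the
# schema-qualified table names from the text work unchanged.
conn.execute("ATTACH DATABASE ':memory:' AS retain")
conn.executescript("""
    CREATE TABLE retain.t_folder  (folder_id INTEGER PRIMARY KEY, f_name TEXT);
    CREATE TABLE retain.t_message (message_id INTEGER PRIMARY KEY, folder_id INTEGER);
    INSERT INTO retain.t_folder  VALUES (12, 'Inbox'), (273, 'Dumpster');
    INSERT INTO retain.t_message VALUES (1, 12), (2, 273), (3, 273);
""")

# Look up the Dumpster folder by name instead of hard-coding 273.
QUERY = """
    SELECT m.message_id
    FROM retain.t_message AS m
    WHERE m.folder_id IN (
        SELECT f.folder_id FROM retain.t_folder AS f WHERE f.f_name = 'Dumpster'
    )
    ORDER BY m.message_id
"""
dumpster_ids = [row[0] for row in conn.execute(QUERY)]
print(dumpster_ids)  # [2, 3]
```

The same SELECT statement should carry over to the production database server with little or no change, but verify the column names against your own schema first.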



THE FOLLOWING INFORMATION WAS GIVEN TO ME BY MIKE BELL AND IS VERY SKETCHY - WAITING FOR MORE ACCURATE AND PIN-POINT EXPLANATIONS


5. Once the search of the index is completed (reference step #2), the indexer gets called to delete those items from the index. The time for this to occur varies depending on the indexer load and the scope of the deletion job. It could take minutes, hours, or weeks.


At this point, a browse of the archive mailbox already fails to show the item(s) because the item is now marked for deletion in the database.


Because the item is marked for deletion in the database and before the indexer removes it from the index, it is possible for someone to perform a search of the mailbox (which uses the index) and get back a hit; however, under this scenario, all that will happen is that in the search results the item will be shown as “---- this messages is queued for deletion ----” and nothing more.


The deletion job keeps looping through each page of the index until it is done. This happens while the INDEXER continues working on deleting items from the index. Once it is done with the list, that is the end of the deletion job.

STAGE 2 – The Dumpster Thread

The "dumpster thread" is awakened by the end of the deletion job and looks for new items in the dumpster. The "dumpster" is the t_dsref.f_state field. It would have one of two values:

  • 0 = normal
  • 1 = queued for deletion


As it finds items with a state value of "1", it pages through them and deletes those message records in the database, the ones where the folder_id is set to the dumpster folder's ID. In the sample text from the server log, the number following "Delete" and "Deleting node" is the message ID:

15:54:00, 985[DumpsterThread] [TRACE] DumpsterDiver: Delete 3608803
15:54:00, 986[DumpsterThread] [INFO ] DumpsterDiver: Deleting node 3608803
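To list which message IDs the dumpster thread has removed, the "Deleting node" entries can be extracted from the server log. This is a rough sketch: the pattern is fitted to the two sample lines above, and `deleted_message_ids` is a hypothetical helper name.

```python
import re

# Sample entries copied from the server log excerpt above.
SAMPLE = """\
15:54:00, 985[DumpsterThread] [TRACE] DumpsterDiver: Delete 3608803
15:54:00, 986[DumpsterThread] [INFO ] DumpsterDiver: Deleting node 3608803
"""

# Match only the INFO-level "Deleting node" entries so each ID is counted once.
NODE_RE = re.compile(r"\[DumpsterThread\].*DumpsterDiver: Deleting node (\d+)")


def deleted_message_ids(log_text: str) -> list:
    """Return the message IDs from 'Deleting node' entries, in log order."""
    ids = []
    for line in log_text.splitlines():
        match = NODE_RE.search(line)
        if match:
            ids.append(int(match.group(1)))
    return ids


print(deleted_message_ids(SAMPLE))  # [3608803]
```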

Events that can trigger the dumpster thread are:

  • Deletion job completion.
  • Server boot up.
  • Maintenance.


If the final document entry is orphaned (reference count=0), it calls the StorageEngine to delete the physical file. See “Storage Engine Process” for more information (below).


When "Dumpster Thread ending" appears in the server log, it just means its last pass found nothing left to process in the database. It does not necessarily mean that the physical file itself has been deleted yet by the storage engine.

Storage Engine Process

A StorageEngine is just a black box with a delete method attached. How it works and its speed depends on the implementation:

Retain 2.x and Before: Standard Storage Engine
StandardStorageEngine, which was used regularly before 3.0, simply finds the file and deletes it. It is a simple, straightforward method.
This was changed in Retain 3 because the straightforward implementation is too slow: it synchronously deletes index pieces, DB pieces, and so on. Deleting and then waiting for each delete to finish, one after another in a linear fashion, would be easy to follow, but it locks things up for days.
Retain 3.x: DataStore_Process Storage Engine
DataStore does more complicated things. Internally it simply finds the references in t_dsref and marks them in the deleted state as mentioned in the "Dumpster Thread" discussion (above). It does nothing more.
At some point, another background thread specific to the DataStore looks for items in the deleted state. After performing various sanity checks, it removes the t_dsref entry and the storage engine deletes the corresponding file from disk.
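The mark-then-sweep behaviour described for the DataStore can be pictured with a toy model. This is only a sketch under stated assumptions: `ToyDataStore` and its methods are hypothetical, the dictionary stands in for t_dsref, the set stands in for files on disk, and the sanity checks mentioned above are omitted.

```python
from dataclasses import dataclass, field

STATE_NORMAL = 0   # f_state value listed as "normal" in the text
STATE_DELETED = 1  # f_state value listed as "queued for deletion"


@dataclass
class ToyDataStore:
    """Toy stand-in for the two-phase delete; not Retain's implementation."""
    refs: dict = field(default_factory=dict)   # hash -> state (mimics t_dsref)
    files: set = field(default_factory=set)    # hashes present "on disk"

    def add(self, file_hash: str) -> None:
        self.refs[file_hash] = STATE_NORMAL
        self.files.add(file_hash)

    def delete(self, file_hash: str) -> None:
        """Phase 1: only mark the reference as deleted and return immediately."""
        if file_hash in self.refs:
            self.refs[file_hash] = STATE_DELETED

    def background_sweep(self) -> int:
        """Phase 2: remove marked references and their files; return the count."""
        doomed = [h for h, state in self.refs.items() if state == STATE_DELETED]
        for file_hash in doomed:
            del self.refs[file_hash]
            self.files.discard(file_hash)
        return len(doomed)


store = ToyDataStore()
store.add("AAA")
store.add("BBB")
store.delete("AAA")
print("AAA" in store.files)      # True: the file is still on disk after phase 1
print(store.background_sweep())  # 1
print(sorted(store.files))       # ['BBB']
```

The point of the split is that the caller of `delete` never waits on disk I/O; the slow work happens later in the sweep.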

Reading Deletion Logs

As you go through the Server log you can trace the progress of the job:

If you schedule a deletion job, it will run during the maintenance cycle at 1am (by default):

2015-08-12 01:03:22,584 [Maintain] INFO com.gwava.jobs.MaintenanceJob - Execute deletion jobs

But if jobs are set to manual then you will see this message:

2015-08-12 01:03:22,593 [DeletionJobTask-1-thread-3] DEBUG com.gwava.deletion.DeletionTaskHelper - Skipping deletion job. Only manual trigger selected.

You can trigger a job manually by clicking on the "Run Job Now" button under the Schedule tab and you will see this in the logs:

2015-08-12 09:06:03,072 [DeletionJobTask-1-thread-1] INFO com.gwava.deletion.DeletionTaskHelper - Manual trigger of deletion job requested. Not checking schedule

And the job begins:

2015-08-12 09:06:03,072 [DeletionJobTask-1-thread-1] INFO com.gwava.deletion.AbstractDeletionTaskExecutor - Running deletion job.
2015-08-12 09:06:03,072 [DeletionJobTask-1-thread-1] INFO com.gwava.deletion.AbstractDeletionTaskExecutor - Normal deletion mode
2015-08-12 09:06:03,072 [DeletionJobTask-1-thread-1] INFO com.gwava.deletion.AbstractDeletionTaskExecutor - autoApprove:true
2015-08-12 09:06:03,074 [DeletionJobTask-1-thread-1] DEBUG com.gwava.deletion.DJDeletionOperation - Deletion report stored at /var/opt/beginfinite/retain/archive/Deletion8464290409683004469EYSZTWSUTFUWYFMRMHWHU.html

If you are running the job against individual users you'll see them listed:

2015-08-12 09:06:03,094 [DeletionJobTask-1-thread-1] INFO com.gwava.deletion.DJDeletionOperation - Include user scope: [4632DB80-0EA9-0000-9A73-667864633738]

Then it begins to build batches of 500 eligible items from the indexes:

2015-08-12 09:06:05,301 [DeletionJobTask-1-thread-1] TRACE com.gwava.hibernate.HibernateStringUtil - inSet: 823104,823105,...,824095,824096
2015-08-12 09:06:05,301 [DeletionJobTask-1-thread-1] INFO com.gwava.deletion.AbstractDeletionTaskExecutor - List<EMail>: 823104,823105,...,824095,824096
2015-08-12 09:06:05,373 [DeletionJobTask-1-thread-1] INFO com.gwava.deletion.AbstractDeletionTaskExecutor - fromIndex=0,toIndex=100
2015-08-12 09:06:07,100 [DeletionJobTask-1-thread-1] INFO com.gwava.deletion.AbstractDeletionTaskExecutor - fromIndex=100,toIndex=200
2015-08-12 09:06:07,583 [DeletionJobTask-1-thread-1] INFO com.gwava.deletion.AbstractDeletionTaskExecutor - fromIndex=200,toIndex=300
2015-08-12 09:06:08,393 [DeletionJobTask-1-thread-1] INFO com.gwava.deletion.AbstractDeletionTaskExecutor - fromIndex=300,toIndex=400
2015-08-12 09:06:09,427 [DeletionJobTask-1-thread-1] INFO com.gwava.deletion.AbstractDeletionTaskExecutor - fromIndex=400,toIndex=500
2015-08-12 09:06:10,913 [DeletionJobTask-1-thread-1] DEBUG com.gwava.deletion.AbstractDeletionTaskExecutor - deleteIndexDocs
2015-08-12 09:06:10,913 [DeletionJobTask-1-thread-1] DEBUG com.gwava.deletion.AbstractDeletionTaskExecutor - deleteIndexDocs, tl.size: 500

Once it has built the batches of items to be deleted, it sends the notification email and begins to remove items from the database. (Yes, we know the email goes out early; this is going to be changed so that it is sent when the deletion job actually ends.)

2015-08-12 09:06:58,459 [DeletionJobTask-1-thread-1] INFO com.gwava.deletion.DeletionTaskHelper - Sending mail after deletion job...
2015-08-12 09:06:58,501 [DeletionJobTask-1-thread-1] INFO com.gwava.deletion.AbstractDeletionTaskExecutor - DeletionTaskExecutor ends
2015-08-12 09:06:58,508 [DumpsterThread] DEBUG com.gwava.deletion.DeletionJobFactory - Dumpster Thread beginning
2015-08-12 09:06:58,512 [DumpsterThread] INFO com.gwava.deletion.DumpsterDiver - Dumpster dive begins
2015-08-12 09:06:58,573 [DumpsterThread] INFO com.gwava.deletion.DumpsterDiver - Got these many children of dumpster500
2015-08-12 09:06:58,573 [DumpsterThread] TRACE com.gwava.deletion.DumpsterDiver - Delete 823039
2015-08-12 09:06:58,582 [DumpsterThread] INFO com.gwava.deletion.DumpsterDiver - Deleting node 823039

and continues until that is finished:

2015-08-12 09:09:26,605 [DumpsterThread] INFO com.gwava.deletion.DumpsterDiver - Delete orphaned documents
2015-08-12 09:09:26,605 [DumpsterThread] INFO com.gwava.deletion.ng.DeleteDocuments - Removing orphaned documents page

Then the job moves on to delete items from disk, which are also done in batches:

2015-08-12 09:09:26,692 [DumpsterThread] TRACE com.gwava.message.dao.DeleteDao - Deleting com.gwava.dao.social.Document id=841321, hash=560170B00B4B4C4FABA777D76D93BA98527C12E77AE77A6C2F7DB4580AF0FE8D
2015-08-12 09:09:40,482 [DumpsterThread] INFO com.gwava.deletion.ng.DeleteDocuments - Executed query in 13877
2015-08-12 09:09:40,489 [DumpsterThread] INFO com.gwava.deletion.ng.DeleteDocuments - deleteDocumentList:560170B00B4B4C4FABA777D76D93BA98527C12E77AE77A6C2F7DB4580AF0FE8D 0 841321

until it is done and the deletion job ends:

2015-08-12 09:12:30,777 [DumpsterThread] INFO com.gwava.deletion.ng.DeleteDocuments - Removing orphaned documents page
2015-08-12 09:12:30,777 [DumpsterThread] INFO com.gwava.deletion.ng.DeleteDocuments - Executed query in 0
2015-08-12 09:12:30,777 [DumpsterThread] INFO com.gwava.deletion.ng.DeleteDocuments - deleteOrphanedDocuments: Nothing left
2015-08-12 09:12:30,778 [DumpsterThread] INFO com.gwava.deletion.DumpsterDiver - Dumpster dive completed
2015-08-12 09:12:30,778 [DumpsterThread] DEBUG com.gwava.deletion.DeletionJobFactory - Dumpster Thread ending
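The timestamps on the first and last entries give the total run time of a job. As a small sketch (the helper name `elapsed_seconds` is hypothetical; the two timestamps are copied from the "Running deletion job." and "Dumpster Thread ending" entries above):

```python
from datetime import datetime

# Timestamps copied from the sample log entries above.
START = "2015-08-12 09:06:03,072"   # "Running deletion job."
END = "2015-08-12 09:12:30,778"     # "Dumpster Thread ending"

# Server log timestamp layout: date, time, comma, milliseconds.
FMT = "%Y-%m-%d %H:%M:%S,%f"


def elapsed_seconds(start: str, end: str) -> float:
    """Return the number of seconds between two server-log timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds()


print(elapsed_seconds(START, END))
```

For this sample job the difference works out to about 6 minutes 28 seconds from manual trigger to the end of the dumpster thread.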

Deletion Management - Retain 4.2


OVERVIEW
Retain 4.2 introduces a new Deletion Management system that improves on the effectiveness, reliability, and functionality of Deletion Management in previous versions. In the past, customers have had the following problems with Deletion Management:
• Understanding how to configure and launch Deletion Management.
• Deletion Management not deleting all of the data and not reclaiming the storage space that the customer needs.
• Orphaned documents being created, due to indexing errors, errors within the Deletion Management code, or a restart of the Tomcat service or of the server itself. These orphaned documents cannot be deleted, which causes customers to lose confidence in the reliability of the product.
• Deletion Management reports showing only 10k items to be deleted, which does not accurately depict what the customer would like to see removed from their system.
• Deletion Management not reliably deleting messages when it runs, leaving customers with messages that are still searchable when they should have been deleted. This includes messages that go back years.

IMPROVEMENTS AND CHANGES
The new Deletion Management resolves all of the above issues. More importantly, it is designed to let customers delete their data easily using a friendlier interface that matches the archiving process, standardizing the look and feel of the software. At the same time, customers and support can be assured that the data will be reliably removed and that, if the Deletion Job is interrupted at any time, it will pick up from where it left off. This prevents orphaned documents and gives customers confidence that their storage space will be reclaimed.

Listed below are the specifications and improvements of the new Deletion Management.

Interface: Setup and Launching a Deletion Job

  • The Deletion Management has a new interface mirroring the archive jobs menu. This was meant to standardize the look and feel for Retain, as well as help customers know how to set up when, and what to delete in a convenient format.
  • The Deletion Management no longer runs during Maintenance, but instead runs with its own deletion schedule, and process which runs in its own background.
  • The interface for Deletion Management is now contained under a new Data Removal section, which is only used for administration. Rights can be assigned to specified users and groups as necessary.

Deleting Messages
There are three ways to delete messages in Retain:

1. Mailbox Deletion:
This option allows administrators to delete ALL messages from users. It is intended to only run without a schedule, with a run now button as the only means to delete. A report can be generated prior to deleting messages processed.
The user(s) that are deleted from the Mailbox Deletion do not get removed from the address book in Retain. This will be added into future versions of Retain. In Retain 4.3 an option will be put into the Deletion Management to remove the user from the address book.
Customers can use the Mailbox Deletion to their advantage to remove users they no longer want to archive or have since left their email system. Once messages have been completely removed from the Retain system using mailbox deletion, the ts_store in the t_abook table is set to 0 for the mailbox and the user is no longer counted as an active license. It is highly recommended that the administrator removes the user from their e-mail system, or exclude them from archiving it altogether.

2. Item Deletion:
The biggest improvement in Deletion Management is the ability to schedule one or more jobs and have them run without using the maintenance process in Retain. This is done by creating a schedule for each job (when you want to kick off the job), setting a profile (what you want to remove), and creating a job that assigns the e-mail servers or mailboxes. This gives customers flexibility, and a job can be kicked off very easily by adjusting the schedule. Future versions of Retain will have a run now button for item deletion.

3. User Item Deletion
Users, if given the access rights by the administrator, can select messages while browsing or from search results in their archive and click to delete. Retain will delete the selected messages. This is best used for a small number of messages; large numbers of messages, or whole accounts, are best handled with Item Deletion or Mailbox Deletion.

Scheduling a Deletion Job

To schedule a job to run, create a schedule, profile and job, just like creating an archive job.

Choose the mailboxes in the job, select in the profile what to delete, and the date ranges. Then adjust the schedule to run at the desired time.

The Retain Server log will show when the job runs; look for: DeletionTaskHelper

It will then state that it is deleting from the indexes, then deleting documents from the database, and finally deleting from physical storage.

Deletion Reports

Deletion Management can create a report before the messages are deleted. These reports are stored in the /archive directory and are named with the date and time that the job kicked off. They show what has been deleted. A list of users and messages is also written to a CSV file that tracks what has been deleted.
In previous versions of Retain the report would only show a maximum of 10,001 messages. This was a hassle for customers who wanted to know what would be deleted before proceeding with the actual deletion. Retain 4.2 fixes this and provides a full report in which administrators can see every message that will be deleted for each account.
Notification emails are also sent out, but they only show basic information about whether the job has completed. Future revisions of Retain will show more detail in the reports, including the Mailbox Deletion report. Future revisions will also have a Deletion Report in Reporting and Monitoring that provides more information.

***Overall, the changes to the interface and the ability to run Deletion Management whenever the customer needs are major improvements to the product. In addition, the system reliably deletes the messages, and the way it does so on the back end ensures that no messages become orphaned or get missed during the process.

Deletion Management Deletion Process – Backend

When the Deletion Management runs, it will first run a query based on the criteria specified within the job. This query is dependent on the indexes.
Based on the profile and the mailboxes selected in the job, Deletion Management flags the messages in the t_message table by altering the f_state value. If the last binary digit of f_state is 1 (that is, the value is an odd number), Retain treats the message as flagged for deletion. The next step deletes the entries from the indexes and, as it does so, creates a log file in the /archive directory to track which users and messages it has deleted.
The deletion process deletes from the indexes first, then the database, and finally from the physical disk. This method also helps to prevent orphaned documents. As the items are flagged in the database, they are tracked through the f_state value. If the index entries have been deleted but the metadata in the database has not, Deletion Management will recognize this and remove the messages. If the Retain service is interrupted in any way, it will recognize the flags and resume deleting from where it left off. This prevents orphaned items.
If the Retain service is ever interrupted, restart the schedule and it will pick up from where it left off and delete the messages successfully.

The following are more details with the backend processes on each section of how the Deletion Management removes messages:
Index Process – Backend
When a mailbox deletion or a schedule, profile, and job is set to delete messages, the Deletion Management will run a query to see which users, and items need to be deleted. It is important that the indexes are functioning properly, and if not then a rebuild of the indexes is necessary.
Once the indexes are queried the Deletion Management creates a CSV file of the users and items that are to be deleted. This file is stored in the /archive directory with the date and time that the job ran. The query will proceed to flag the messages that need to be deleted. As mentioned above, it will modify the f_state record in the t_message table. Once it is flagged, it will remove the message.

*Note: Once a message is flagged to be deleted, there is no way to reverse it. Customers need to be sure that they know exactly what they are deleting before removing any data from Retain. A backup of the data, and the database, is required if they are needing to restore the deleted message.

Once the query is finished, the Indexes will be removed.

Item deletion pseudocode

  • When job starts:
  1. query index and get the list of eligible items for deletion
  2. go through them page by page (each page = 500) --> EnhancedDeletionTaskExecutor.deletePage();
    1. find records in DB 100 by 100
      1. mark them for deletion
      2. assign each message to dumpster node
        1. if this message has a message parent, also assign it dumpster node
        2. if this message has children assign them to dumpster node. also append them to the list of index IDs for deletion
      3. commit transaction every 10 messages
    2. delete items from index
  • When Dumpster starts: DumpsterTaskExecutor.removeFromDumpster()
  1. delete messages from DB : processAndDeleteMessagesFromDB(), deleteMessage()
    1. MessageProperty
    2. MessageRecipient
    3. MessageAttachment --> this will decrement refCount --> orphaning the document for the message (if no other message is referring to it)
    4. LegacyID
    5. MessageTag
    6. Message
  2. delete documents : deletionOperations.getDeleteDocuments().deleteOrphanedDocuments()
    1. fetch top 1000 documents
    2. delete them from Disk
    3. delete them from DB
    4. continue until nothing is left
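The batching in the first half of the pseudocode (pages of 500, database lookups 100 at a time, a commit every 10 messages, then the index delete) can be sketched as nested chunking. This is an illustrative Python sketch only: the function names are hypothetical, sets stand in for the index and the dumpster node, the `commits` list merely records where a commit would happen, and parent/child message handling is left out.

```python
def chunks(items, size):
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def delete_page(page_ids, dumpster, index, commits):
    """Process one index page: mark messages, then delete them from the index."""
    for batch in chunks(page_ids, 100):        # find records in DB 100 by 100
        for group in chunks(batch, 10):        # commit transaction every 10 messages
            for message_id in group:
                dumpster.add(message_id)       # mark and assign to dumpster node
            commits.append(len(group))         # stand-in for a DB commit
    index.difference_update(page_ids)          # delete items from index


def run_job(eligible_ids, index):
    """Walk the eligible items page by page (each page = 500)."""
    dumpster, commits = set(), []
    for page in chunks(eligible_ids, 500):
        delete_page(page, dumpster, index, commits)
    return dumpster, commits


index = set(range(1, 1501))                    # 1,500 indexed messages (made up)
dumpster, commits = run_job(list(range(1, 1201)), index)
print(len(dumpster), len(commits), len(index))  # 1200 120 300
```

Committing in small groups keeps each transaction short, which fits the design goal stated above: an interrupted job loses little work and can resume from the flags already written.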

Database Process – Backend
The next process in the data removal is deleting the entries in the database. Once the index entries are deleted, Retain proceeds to delete the metadata, still using the flags to decide what to delete.
The f_state is checked, and then the data is removed.

The following is a more detailed sketch of what the f_state looks like and how the Deletion Management would delete messages:

Here are the flags for f_state in t_message
STATE_DELETED=1; // Item is in Deleted State
STATE_LITIGATIONHOLD = 2; // Item has been Litigation Held (not in a deleted state)
STATE_READ = 4; // Item has been Read (not in a deleted state)
STATE_UPDATE=8; // Index should UPDATE instead of ADD - eg atomically remove old documents associated and then add (not in a deleted state)
STATE_SUBJECTHTML=16; // SUBJECT (or other pieces) may have encoded HTML (not in a deleted state)
// Example combined value: 17 = STATE_SUBJECTHTML (16) + STATE_DELETED (1); SUBJECT (or other pieces) may have encoded HTML (message set to be deleted)
STATE_ATTACHMENTS = 32; // Has Attachments (not in a deleted state)
STATE_HASTAG=64; // Has Message Tags (not in a deleted state)
STATE_CONFIDENTIAL=128; // Is in Confidential State (not in a deleted state)
// Example combined value: 129 = STATE_CONFIDENTIAL (128) + STATE_DELETED (1); Is in Confidential State (message set to be deleted)


*Besides STATE_DELETED=1 meaning the message is to be deleted, the f_state value can be a different number depending on the type of message. If the message contains an attachment, for example, the number is not going to be 1. When the message is to be deleted, the lowest-order (rightmost) bit of the binary number changes to 1, which makes the number odd and flags the message for deletion.
  • Example database change with message flagged:

44 = 00101100 --- This message has READ, UPDATE and ATTACHMENTS flags on
Set for deletion would change the number 44 to 45 = 00101101
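The f_state value is a bit field, so it can be decoded with bitwise tests. The following Python sketch mirrors the flag values listed above; the `MessageState` class and the two helper functions are hypothetical names used for illustration, not part of Retain.

```python
from enum import IntFlag


class MessageState(IntFlag):
    """Bit flags for t_message.f_state, using the values listed above."""
    DELETED = 1
    LITIGATIONHOLD = 2
    READ = 4
    UPDATE = 8
    SUBJECTHTML = 16
    ATTACHMENTS = 32
    HASTAG = 64
    CONFIDENTIAL = 128


def flag_names(f_state: int) -> list:
    """Return the names of all flags set in f_state, lowest bit first."""
    return [flag.name for flag in MessageState if f_state & flag]


def is_flagged_for_deletion(f_state: int) -> bool:
    """A message is flagged for deletion when the lowest-order bit is 1."""
    return bool(f_state & MessageState.DELETED)


print(flag_names(44))                    # ['READ', 'UPDATE', 'ATTACHMENTS']
print(int(44 | MessageState.DELETED))    # 45
print(flag_names(45))                    # ['DELETED', 'READ', 'UPDATE', 'ATTACHMENTS']
print(is_flagged_for_deletion(129))      # True
```

In SQL terms, the same test is simply whether f_state is odd.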

*Another thing to keep in mind is that, as Deletion Management runs, it also updates f_referenceCount in the t_document table, setting it to 0 for documents that are no longer referenced. This tells Deletion Management to delete the document from the database as well as from the physical disk. When Deletion Management runs, it checks f_referenceCount, and any document with a value of 0 is deleted from the disk as well as from the t_document table. This also helps to prevent orphaned documents.

Physical Disk Removal – Backend
As mentioned, when f_referenceCount = 0, the remainder of the data is deleted from t_document as well as from the physical disk.
The physical disk removal takes the longest. As the data is deleted, it will remove the .dat files from the hash, and will reclaim the disk space for the user.

Deletion Management Logs

The Deletion Management logs are still contained within the Retain Server logs. In future revisions of Retain they will be separated to their own logs.
To see that the Deletion Job is running look for this line:

DeletionTaskHelper-executor

Deletion Management goes through three phases to delete data from the Retain system, and the logs show these phases distinctly. The logging has been modified from previous versions to be clearer about exactly what Retain is deleting. The following examples, each with a brief explanation, show what to look for in the logs during a Deletion Job:

Index Deletion:
When Retain gets a list of the data to delete via the indexes it will show the following in the log:

12:29:32, 052[RTSQuartzScheduler_Deletion_Worker-1] [INFO ] DJDeletionOperation: When processing job: Generate a report but don't delete messages.
12:29:32, 052[RTSQuartzScheduler_Deletion_Worker-1] [INFO ] DJDeletionOperation: Querying indexes, recording in memory the message IDs of records that match the deletion criteria.
12:29:32, 095[RTSQuartzScheduler_Deletion_Worker-1] [DEBUG] DJDeletionOperation: No include user scope
12:29:32, 096[RTSQuartzScheduler_Deletion_Worker-1] [INFO ] DJDeletionOperation: Deleting PO [GodsDom.greek]
12:29:32, 096[RTSQuartzScheduler_Deletion_Worker-1] [TRACE] DJDeletionOperation: initializeSearch: offset=0
12:29:32, 096[RTSQuartzScheduler_Deletion_Worker-1] [INFO ] DJDeletionOperation: On Page #0
12:29:32, 174[ActiveMQ Session Task] [TRACE] IndexService: Deletion query: (domain:godsdom AND postoffice:greek)
12:29:32, 175[ActiveMQ Session Task] [TRACE] IndexService: Deletion filter query (fq): storeDate:[1980-01-01T19:29:32.000Z TO 2017-10-26T18:29:32.000Z]
12:29:32, 466[RTSQuartzScheduler_Deletion_Worker-1] [INFO ] DJDeletionOperation: Result page TotalHits: 2061 500 in result set
12:29:32, 468[RTSQuartzScheduler_Deletion_Worker-1] [TRACE] HibernateStringUtil: inSet: 266,269,272,275,463,466,469,472,1793,1796,1799,1802,1805,1808,1811,1814,1817,1820,1552,1555,1558,1561,1564,1567,1570,1573,1576,1579,1582,1585,1588,1591,1594,1597,1600,1603,1606,1609,1677,1680,1683,1686,1689,1692,1695,1698,1701
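The "Result page TotalHits" entry is enough to estimate how many pages the job still has to walk. A minimal sketch, assuming the entry always has the shape shown above (total hits followed by the hits on the current page); `page_summary` is a hypothetical helper, and the page size of 500 is taken from this document.

```python
import math
import re

PAGE_SIZE = 500  # records per index result page, as stated in this document

# Sample entry copied from the log excerpt above.
LINE = ("12:29:32, 466[RTSQuartzScheduler_Deletion_Worker-1] [INFO ] "
        "DJDeletionOperation: Result page TotalHits: 2061 500 in result set")

HITS_RE = re.compile(r"Result page TotalHits: (\d+) (\d+) in result set")


def page_summary(line: str):
    """Return (total_hits, hits_on_page, pages_needed), or None if no match."""
    match = HITS_RE.search(line)
    if match is None:
        return None
    total_hits, hits_on_page = int(match.group(1)), int(match.group(2))
    return total_hits, hits_on_page, math.ceil(total_hits / PAGE_SIZE)


print(page_summary(LINE))  # (2061, 500, 5)
```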

Mailbox Deletion:
This is what you will see when performing a Mailbox Deletion:

12:27:15, 528[DeletionJobTask-1-thread-2] [INFO ] AbstractDeletionTaskExecutor: DeletionTaskExecutor begins
12:27:15, 528[DeletionJobTask-1-thread-2] [INFO ] AbstractDeletionTaskExecutor: Running deletion job.
12:27:15, 529[DeletionJobTask-1-thread-2] [INFO ] AbstractDeletionTaskExecutor: Normal deletion mode
12:27:15, 529[DeletionJobTask-1-thread-2] [INFO ] AbstractDeletionTaskExecutor: autoApprove:true
12:27:15, 571[DeletionJobTask-1-thread-2] [INFO ] AbstractDeletionTaskExecutor: start full mailbox deletion of: 60beee8d-1f91-42a6-aec7-e8df19719921
12:27:15, 626[DeletionJobTask-1-thread-2] [DEBUG] AbstractDeletionTaskExecutor: deleteIndexDocs
12:27:15, 627[DeletionJobTask-1-thread-2] [DEBUG] AbstractDeletionTaskExecutor: deleteIndexDocs, SearchResults.size: 618
12:27:19, 325[DeletionJobTask-1-thread-2] [INFO ] AbstractDeletionTaskExecutor: Deletion of messages has been done for mailbox: 60beee8d-1f91-42a6-aec7-e8df19719921
12:27:19, 325[DeletionJobTask-1-thread-2] [INFO ] AbstractDeletionTaskExecutor: Now deleting folders...
12:27:19, 358[DeletionJobTask-1-thread-2] [INFO ] AbstractDeletionTaskExecutor: Deletion of folders has been done for mailbox: 60beee8d-1f91-42a6-aec7-e8df19719921
12:27:19, 436[DeletionJobTask-1-thread-2] [INFO ] AbstractDeletionTaskExecutor: Mailbox deletion has been completed for: 60beee8d-1f91-42a6-aec7-e8df19719921
12:27:19, 455[DeletionJobTask-1-thread-2] [INFO ] AbstractDeletionTaskExecutor: Looking for any mailbox that does not have archived messages and is mistakenly defined as active...
12:27:19, 455[DeletionJobTask-1-thread-2] [INFO ] AbstractDeletionTaskExecutor: Successful session retrieval
12:27:19, 470[DeletionJobTask-1-thread-2] [INFO ] AbstractDeletionTaskExecutor: Returning results: [247BB500-09CE-0000-B067-6F6464663663, 4885C901-09C3-0000-B067-6F6464663663, 5472E700-181B-0000-B067-6F6464663663, 7F196B80-182C-0000-B067-6F6464663663, 81219600-0922-0000-B067-6F6464663663, 83CFCA00-09BA-0000-B067-6F6464663663, 850F4C80-182C-0000-B067-6F6464663663, 850F4C81-182C-0000-B067-6F6464663663, 87E70F00-0AC2-0000-B067-6F6464663663, 88F88F00-02BF-0000-B067-6F6464663663, 8D445980-0AC2-0000-B067-6F6464663663, 903F4A00-0AC2-0000-B067-6F6464663663, 933A3A80-0AC2-0000-B067-6F6464663663, 94823080-0925-0000-B067-6F6464663663, 96352B00-0AC2-0000-B067-6F6464663663, 9B927580-0AC2-0000-B067-6F6464663663, 9E8D6600-0AC2-0000-B067-6F6464663663, A0EFC000-0AC2-0000-B067-6F6464663663, A4834700-0AC2-0000-B067-6F6464663663, AD467D80-1823-0000-B067-6F6464663663, AD467D81-1823-0000-B067-6F6464663663, B9CE6900-093C-0000-B067-6F6464663663, CE2F3A80-0844-0000-B067-6F6464663663, D36BC100-09CD-0000-B067-6F6464663663, E1193C00-0924-0000-B067-6F6464663663, FEEEAB81-09CD-0000-B067-6F6464663663]
12:27:19, 510[DeletionJobTask-1-thread-2] [INFO ] AbstractDeletionTaskExecutor: deleting AddressBook entry with uuid = 60beee8d-1f91-42a6-aec7-e8df19719921
12:27:19, 725[DeletionJobTask-1-thread-2] [INFO ] AbstractDeletionTaskExecutor: delete UUID Mapping: 128 - 60beee8d-1f91-42a6-aec7-e8df19719921
12:27:19, 812[DeletionJobTask-1-thread-2] [INFO ] AbstractDeletionTaskExecutor: DeletionTaskExecutor ends

Database Deletion:
When messages are being deleted from the database, the lines will look something like this:
12:28:03, 412[DumpsterThread] [INFO ] DeleteDocuments: deleteDocumentList:1B55E8253EB9B323B01A9773A452A30550F7665F4E4991BF133C263553A2B607 0 3604
12:28:03, 418[DumpsterThread] [INFO ] DeleteDocuments: deleteDocumentList:7F1C99CFAA51FB93E646F915979756493993148F431B02ACEB711975932FB209 0 3605
12:28:03, 421[DumpsterThread] [TRACE] DeleteDao: Deleting document entry from DB com.gwava.dao.social.Document id=4794, hash=E7FEF0A842443BDA08A6F1B561EC79DCCBB118796189D8D8EE2414FB255AE66F
12:30:09, 920[BackgroundDeletionThread] [TRACE] BackgroundDeletionThread: Delete DB Reference (large blob) for 4495 DEAB395DE11294E4206F6851DE565C464CC655AD4B0A1DB5D4F42389BDD9E47E

Disk Deletion:
12:30:09, 784[BackgroundDeletionThread] [DEBUG] BackgroundDeletionThread: Removing blob from large blob store with hash of 1960766F46190356631CF7C2A2C772DFABFFA14F8E17E3F50FE6655BDBDCD852
12:30:09, 785[BackgroundDeletionThread] [TRACE] BackgroundDeletionThread: DataStoreBlobID [id=FDE7E382A645559F7274581DA4DFD1C50BF6E6F308E3C56CE6A4D26AFEAD38AB] has 1 entries...
12:30:09, 785[BackgroundDeletionThread] [DEBUG] BackgroundDeletionThread: Removing blob from large blob store with hash of FDE7E382A645559F7274581DA4DFD1C50BF6E6F308E3C56CE6A4D26AFEAD38AB
12:30:09, 786[BackgroundDeletionThread] [TRACE] BackgroundDeletionThread: DataStoreBlobID [id=A91682FEF9CB63F6DCA3DCC29EE237F2A430E2FC7BAEEE5E01A186C82AF87D50] has 1 entries...
12:30:09, 786[BackgroundDeletionThread] [DEBUG] BackgroundDeletionThread: Removing blob from large blob store with hash of A91682FEF9CB63F6DCA3DCC29EE237F2A430E2FC7BAEEE5E01A186C82AF87D50
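
The excerpts above can be tallied with a short script when a log is too long to read by eye. The following is only a sketch: the line pattern and the message prefixes are inferred from the sample lines on this page, not from any documented Retain log format, and the names `LINE_RE`, `classify`, and `summarize` are hypothetical helpers made up for this illustration.

```python
import re
from collections import Counter

# Assumed layout, inferred from the excerpts above:
#   HH:MM:SS, mmm[thread] [LEVEL] Source: message
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}),\s*(?P<millis>\d{3})\s*"
    r"\[(?P<thread>[^\]]+)\]\s*"
    r"\[(?P<level>[A-Z]+)\s*\]\s*"
    r"(?P<source>\w+):\s*(?P<message>.*)$"
)

# Message prefixes taken from the sample lines (assumption: they are stable).
DATABASE_PREFIXES = (
    "deleteDocumentList:",
    "Deleting document entry from DB",
    "Delete DB Reference",
)
DISK_PREFIXES = ("Removing blob from large blob store",)


def classify(line):
    """Return 'database', 'disk', or None for a single log line."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    message = match.group("message")
    if message.startswith(DATABASE_PREFIXES):
        return "database"
    if message.startswith(DISK_PREFIXES):
        return "disk"
    return None


def summarize(lines):
    """Count database and disk deletion lines in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        kind = classify(line)
        if kind is not None:
            counts[kind] += 1
    return counts
```

To use it, pass an open log file to `summarize`; any line that does not match the assumed layout is simply ignored.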


For more information on reading the Retain Server and Deletion logs see this KB article:

http://support.gwava.com/kb/?View=entry&EntryID=2850

To see whether the Deletion Job has finished, look for this line:

DumpsterDive - End
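
Checking for that marker can be scripted. This is a minimal sketch that assumes the marker text appears verbatim in the log exactly as shown above; `find_job_end` is a hypothetical helper, and the log file location is left to the reader because it is not stated here.

```python
END_MARKER = "DumpsterDive - End"  # marker text as quoted on this page


def find_job_end(lines):
    """Return the 1-based number of the last line containing the end
    marker, or None if the marker never appears."""
    last_seen = None
    for number, line in enumerate(lines, start=1):
        if END_MARKER in line:
            last_seen = number
    return last_seen
```

Pass it an open log file, for example `with open(log_path) as handle: find_job_end(handle)`, where `log_path` is whichever Retain Server log you are inspecting. A result of `None` means no end marker was found in that file, which is consistent with a job that is still running or one that has not run.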

Upcoming for Retain 4.4 Deletion Management

The following items will be included in Retain 4.3:
• Mailbox Deletion - create a report for all messages deleted for the accounts and put into a CSV file.
• Deletion Management having its own log, separate from the Retain Server log.
• Having a report in the Reporting and Monitoring for Deleted items
• Email CSV files, and detailed reports to administrator or other.

• Be able to remove old orphaned documents and meta-data (separate utility)
• Status tab that will show the details of the job and allow it to be aborted if necessary.

Future revisions of Retain will eventually include a dashboard that shows the progress of a deletion job and when it completes. For now, the log files are still the only way to determine whether a deletion job is running and when it finishes.

Deletion Management Queries

The following is a list of queries that can be run to help identify the flags and to troubleshoot database-related issues with Deletion Management.

https://traininggwava.microfocus.net/index.php/Retain_Database#Deletion_Management_specific_Queries