Retain Deletion Management
Level 1
The Deletion Manager removes items from the archive according to the criteria you specify. It runs as a scheduled job in the archive, finding items that match the search terms and then either reporting on them or deleting them. Customers often assume that deletion jobs are part of archive jobs. They are not. Deletion jobs run immediately after scheduled maintenance and are not linked to archive jobs in any way (see the “Schedule” section of this document). Mail removed from the archive is permanently deleted, so use this option with care.
Deletion Management appears in the system menu only if the logged-in user has been granted the Deletion Management right or the Litigation Hold right. See User Rights. The Litigation Hold right allows users to open the Deletion Management section and add users to or remove users from the Litigation Hold list, but they cannot modify any other settings. Users with the Deletion Management right can view the Litigation Hold tab, but it is read-only for them; they cannot grant rights.
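The access rules above can be summarized as a small permission check. This is a hypothetical Python model for illustration only: the `Right` enum and the `can_*` functions are assumed names, not part of Retain, and the code covers only the rules stated in the paragraph above.

```python
from enum import Enum, auto


class Right(Enum):
    # Hypothetical identifiers for the two rights described above.
    DELETION_MANAGEMENT = auto()
    LITIGATION_HOLD = auto()


def can_see_deletion_menu(rights: set) -> bool:
    # Either right makes the Deletion Management entry appear in the menu.
    return bool(rights & {Right.DELETION_MANAGEMENT, Right.LITIGATION_HOLD})


def can_edit_job_settings(rights: set) -> bool:
    # The Litigation Hold right alone cannot modify any other settings.
    return Right.DELETION_MANAGEMENT in rights


def can_edit_litigation_hold(rights: set) -> bool:
    # The Deletion Management right alone gives only read-only access to the hold tab.
    return Right.LITIGATION_HOLD in rights


def can_view_litigation_hold(rights: set) -> bool:
    # Either right can open the Litigation Hold tab (read-only without the hold right).
    return can_see_deletion_menu(rights)
```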
Core Settings
Here you enable and disable deletion jobs and specify which actions they take. When setting up a deletion job, you can tell the job either to delete messages and report on the messages deleted, or simply to generate a report on the mail that would be removed from the database. The report option is very handy: rather than deleting any items, it reports on the items that would qualify for deletion if you ran an actual deletion job.
Basic Options
This tab provides the criteria that the deletion job uses to identify messages to be deleted. It should look nearly identical to the profile of an archive job, and the options work the same way. The item type, source, and status determine which messages are flagged for deletion. If none of the boxes are checked, all items meeting the date criteria are subject to deletion.
Date Scope
A mail system contains many dates, and the Deletion Manager lets you choose which date range defines the scope of the deletion job. The setup is simple: items whose date falls between the “Begin” and “End” dates are targeted by the deletion job.
The date filter determines which date is evaluated: a mail system date or a Retain date. The creation and delivered dates are mail system dates. The date archived and the expiration date are set in Retain. The expiration date is tied to the job and is set in the job section.
The Job Expiration option lets you set an ‘expiration date’ after which the mail no longer needs to be kept in the archive (retention requirements vary from state to state but are usually between 5 and 10 years). The expiration date is set on the worker's Core Settings tab. Deletion Management can use this expiration date to identify messages that are due for removal.
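To make the date filter concrete, here is a minimal sketch, assuming each archived item carries the four dates named above. The `ArchivedItem` fields, the inclusive Begin/End comparison, and the helper names are assumptions for illustration; `older_than` mirrors the “Older than ___ Days” setting used in the Hands On exercise.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class ArchivedItem:
    # Hypothetical record; these field names are illustrative, not Retain's schema.
    created: datetime                      # mail system date
    delivered: datetime                    # mail system date
    stored: datetime                       # Retain date: when the item was archived
    expiration: Optional[datetime] = None  # Retain date derived from the job


DATE_FIELDS = ("created", "delivered", "stored", "expiration")


def _date_of(item: ArchivedItem, date_filter: str) -> Optional[datetime]:
    # Reject anything that is not one of the four supported dates.
    if date_filter not in DATE_FIELDS:
        raise ValueError(f"unknown date filter: {date_filter}")
    return getattr(item, date_filter)


def in_date_scope(item: ArchivedItem, date_filter: str,
                  begin: datetime, end: datetime) -> bool:
    """True if the selected date falls between Begin and End (inclusive here)."""
    value = _date_of(item, date_filter)
    # An item with no value for the selected date (e.g. no expiration) never matches.
    return value is not None and begin <= value <= end


def older_than(item: ArchivedItem, date_filter: str, days: int, now: datetime) -> bool:
    """Like an "Older than N Days" setting, measured back from `now`."""
    value = _date_of(item, date_filter)
    return value is not None and value < now - timedelta(days=days)
```

Selecting on `"expiration"` with an end date of "now" corresponds to the expiration-based cleanup described above.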
Job Members
A deletion job will only be active for selected users or a selected mail server. The Job Members tab allows you to include an entire mail server or group of users, while excluding specific users.
Use this in conjunction with the Generate Report option under Core Settings to pinpoint the mail that will be included in the deletion job.
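A minimal sketch of how the Job Members settings combine, assuming membership is resolved as "everything included, minus everything excluded". The `JobMembers` class, `resolve_members`, and the lookup dictionaries are hypothetical; in particular, the rule that an exclusion always overrides an inclusion is an assumption drawn from the description above.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class JobMembers:
    # Hypothetical model of the Job Members tab.
    include_servers: Set[str] = field(default_factory=set)
    include_groups: Set[str] = field(default_factory=set)
    exclude_users: Set[str] = field(default_factory=set)


def resolve_members(members: JobMembers,
                    users_by_server: Dict[str, Set[str]],
                    users_by_group: Dict[str, Set[str]]) -> Set[str]:
    """Return the users whose mail falls within the deletion job's scope."""
    included: Set[str] = set()
    for server in members.include_servers:
        included |= users_by_server.get(server, set())
    for group in members.include_groups:
        included |= users_by_group.get(group, set())
    # Assumption: an explicit exclusion always overrides an inclusion.
    return included - members.exclude_users
```

Running this against a report-only job is a cheap way to double-check who is in scope before enabling real deletion.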
Notification
The reports, errors, and summaries of deletion jobs can be sent to the addresses listed on the Notification tab. Select the options as desired. If no notifications are selected, the report can be found in ..\[retain storage directory]\archive. The file name will be Deletion.....html (e.g., Deletion7895881300442619770UTCMSSHMZYBWMRBJJYZAVED.html).
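If notifications are off, a small script can locate the newest report. This sketch assumes the reports match `Deletion*.html` (based on the example file name above) and that you pass in your Retain storage directory; the function name is hypothetical.

```python
from pathlib import Path
from typing import Optional


def latest_deletion_report(storage_dir: str) -> Optional[Path]:
    """Return the newest Deletion*.html report in <storage dir>/archive, or None."""
    archive = Path(storage_dir) / "archive"
    if not archive.is_dir():
        return None
    # Assumes the Deletion<...>.html naming shown in the example above.
    reports = [p for p in archive.glob("Deletion*.html") if p.is_file()]
    if not reports:
        return None
    # Several jobs may have left reports; pick the most recently modified one.
    return max(reports, key=lambda p: p.stat().st_mtime)
```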
Schedule
The last tab is the Schedule tab. It lets you run a deletion job automatically on mail that has passed its required archive duration. You can run the job weekly or on a specific day of the month. The deletion job runs immediately after scheduled maintenance, which is configured under Configuration | Server Configuration | Maintenance. Starting a job manually is not currently supported; as a workaround, you can set the daily maintenance time to the time of day you want (for example, a few minutes after setting up the deletion job). Note, however, that once maintenance begins, no one can log in to the Retain server.
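Because the deletion job starts only after scheduled maintenance, its earliest start on a scheduled day is the maintenance start time. The sketch below computes that time for a weekly or day-of-month schedule; the function and parameter names are hypothetical, and the real start will be later, once maintenance completes.

```python
from datetime import datetime, time, timedelta
from typing import Optional


def next_maintenance_window(now: datetime, maintenance_at: time,
                            weekday: Optional[int] = None,
                            month_day: Optional[int] = None) -> datetime:
    """Return the next maintenance start on a day the deletion job is scheduled.

    weekday:   0 = Monday ... 6 = Sunday (weekly schedule)
    month_day: 1 ... 31 (monthly schedule)
    The deletion job itself starts only after maintenance finishes.
    """
    if (weekday is None) == (month_day is None):
        raise ValueError("set exactly one of weekday or month_day")
    day = now.date()
    # Scanning a bit over a year always finds a match, even for day 31.
    for _ in range(400):
        if weekday is not None:
            scheduled = day.weekday() == weekday
        else:
            scheduled = day.day == month_day
        start = datetime.combine(day, maintenance_at)
        if scheduled and start >= now:
            return start
        day += timedelta(days=1)
    raise ValueError("no matching day; check weekday/month_day values")
```

The same function shows why the workaround works: moving `maintenance_at` to a few minutes from now makes today's window the next one.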
Litigation Hold
The Litigation Hold tab provides the ability to exclude any specified user’s data from any deletion job, preventing any of their data from being deleted when the job runs.
Official auditors, legal representatives, system administrators, or other users may be added to this list. These accounts will be able to set and lift any litigation hold in the system. This is therefore not a generally granted right and should be restricted to specific users; because of its power, it is granted separately from the usual user rights.
To add a user to the litigation hold list, click the ‘Add User’ button to open the ‘Select Mailbox’ window. Select the source system for the user and enter search criteria. After searching, select the desired user or users and click ‘Ok’ to add them to the list.
Save all changes.
Hands On
Prerequisite: Archive mail from your test mail system a '''few days in advance'''. The more mail the better, but you need at least a few items, and they must have been in Retain for more than two days.
1. Log in to your Retain server (the actual VM server, not the RetainServer web UI).
2. Stop Tomcat: rcretain-tomcat7 stop (Linux), or stop the Apache Tomcat service (Windows).
3. Rename the "index" directory to "index.old". The index directory is located under your Retain storage directory (see Server Configuration | Storage).
4. Create a new, empty index directory. On Linux, make tomcat the owner of that directory: '''chown tomcat:tomcat index'''.
5. On the server, tail the current day's RetainServer log. On Windows servers, Baretail.exe can be downloaded and used (see [http://support2.gwava.com/kb/?View=entry&EntryID=527 Location of Logs]). On Linux, change to /var/log/retain-tomcat7 and type: tail -f RetainServer.[yyyy-mm-dd].log
6. Start Tomcat.
7. Log in to the RetainServer web UI (http://[ip address]/RetainServer) as admin.
8. Under the Management section in the left-hand pane, select '''Deletion Management'''.
9. Under Core Settings, select the “Job Enabled” checkbox to enable the job, and select “Delete messages as they are processed.”
10. On the Date Scope tab, open the “Delete messages where” drop-down box and note the various options. For this exercise, leave it at the default, “Date Stored in Retain”, which is the date the items were archived.
11. While still on the Date Scope tab, type “1” in the “Older than ___” field and leave the drop-down set to “Days”.
12. On the Job Members tab, open the drop-down box below “Include these objects” and select the mail server on which you want to run this deletion job. Then click the “Add Mail Server” button.
13. On the Notification tab, select the boxes to have mail notifications sent to your mailbox.
14. On the Schedule tab, open the “Run Job when” drop-down box and select “manual”.
15. Click the “Save Changes” icon at the upper right of the screen.
16. Click the "Run Job Now" button.
17. Switch over to the actual Retain server and look at the tail of your RetainServer log. What do you see happening? Did anything get deleted?
18. Log in to a Retain mailbox and check whether any items older than 1 day are still there. Was anything deleted? Why not?
19. Log out of the RetainServer web UI.
20. Go back to the actual Retain server and stop Tomcat.
21. Delete the new index directory you created.
22. Rename index.old to index.
23. Start Tomcat.
24. Log in to the RetainServer web UI as admin.
25. Go to Deletion Management and, on the Schedule tab, click the Run Job Now button.
26. Switch over to the actual Retain server and look at the tail of your RetainServer log. Now what do you see? Any deletion activity?
27. What is the role of indexes in a deletion job? See the Level 2 section to check your answer.
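On a lab machine, the index swap in steps 3-4 and its reversal in steps 21-22 can be scripted. This is a minimal sketch under stated assumptions: the function names are hypothetical, the caller passes the Retain storage directory, and Tomcat must already be stopped (the script does not stop or start it).

```python
import shutil
from pathlib import Path
from typing import Optional


def swap_in_empty_index(storage_dir: str, owner: Optional[str] = None) -> None:
    """Steps 3-4: rename index to index.old and create an empty index directory.

    Pass owner="tomcat" on Linux to mirror 'chown tomcat:tomcat index'.
    """
    root = Path(storage_dir)
    index, backup = root / "index", root / "index.old"
    if backup.exists():
        # Never overwrite a previous backup of the real index.
        raise FileExistsError(f"{backup} already exists; restore it first")
    index.rename(backup)
    index.mkdir()
    if owner is not None:
        shutil.chown(index, user=owner, group=owner)


def restore_original_index(storage_dir: str) -> None:
    """Steps 21-22: delete the temporary index and rename index.old back."""
    root = Path(storage_dir)
    index, backup = root / "index", root / "index.old"
    if not backup.is_dir():
        raise FileNotFoundError(f"{backup} not found")
    shutil.rmtree(index)
    backup.rename(index)
```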
Level 2
This section describes the deletion process. For each stage, the log entries you should expect to see in the RetainServer log are shown. If the server logging level is set to diagnostic, the log includes all entries that apply to the situation (INFO and DEBUG); otherwise, it shows only INFO entries. Each logging example therefore gives the condition followed by the text (in quotes) that is written to the log. For example, stage 1 shows the first log line as: If INFO: "Running deletion job." Here, If INFO is the condition (the logging level set on the server) and "Running deletion job." is the text displayed in the log.
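When tailing the log during the Hands On exercise, it can help to flag the messages quoted in this section. The sketch below only does substring matching on message texts taken from this document; the labels and the function name are hypothetical, and it assumes nothing about the rest of the log line format. DEBUG messages such as "On Page #" appear only when the logging level is diagnostic.

```python
from typing import Iterable, List, Tuple

# Message texts quoted in this section, paired with hypothetical labels.
STAGE_MARKERS = [
    ("job started", "Running deletion job"),
    ("report-only mode", "Will only report, not delete"),
    ("page processed", "On Page #"),
    ("report cap reached", "Aborting early, deliberately since this is a trial"),
    ("skipped: litigation hold", "is under litigation hold and cannot be deleted"),
    ("dumpster thread idle", "Dumpster Thread ending"),
]


def deletion_events(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line number, label) for each log line containing a known message."""
    events: List[Tuple[int, str]] = []
    for lineno, line in enumerate(lines, start=1):
        for label, marker in STAGE_MARKERS:
            if marker in line:
                events.append((lineno, label))
    return events
```

For example, pass an open RetainServer log file object to `deletion_events` and print the results.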
STAGE 1 – The Deletion Job
1. Deletion job is initiated.
If INFO: "Running deletion job."
If DEBUG and in report-only mode: "Will only report, not delete"
If DEBUG and not in report-only mode: "Normal deletion mode"
If DEBUG: "autoApprove" [Boolean value]
If DEBUG: "Deletion report stored at: [filePath]"
If DEBUG and a PO is chosen: "Deleting PO [PO]"
2. The job searches the index (Lucene/Exalead) in pages, looping through it one "page" at a time. A page consists of 500 records.
If DEBUG: "On Page # [pageNumber]"
If INFO, in report-only mode, and the last page number is less than 20: "Aborting early, deliberately since this is a trial"
By default, we only report the first 10,000 items to be deleted (500 items per page x 20 pages = 10,000).
If INFO: "Result page TotalHits: [hits on page] [listOfIDs] in result set"
- The job adds all of those items to a list of items to be deleted from the index. The list is kept in memory and is lost if Tomcat is stopped at this point.
3. Removes all messages that are on litigation hold from the list of messages to be deleted.
If DEBUG: "[emailID] is under litigation hold and cannot be deleted"
If DEBUG: "[emailID] doesn’t belong to user"
4. Updates each message record, putting it in a “deleted” state (a.k.a., "Dumpster").
No log messages are written for this step. The dumpster is a folder record in the t_folder table that is literally named "Dumpster".
You can find the messages waiting to be deleted in the t_message table with the following query: SELECT * FROM retain35.t_message where folder_id='273';
- NOTE: "273" is the folder_id of the Dumpster record in the t_folder table. Use the value returned on your system by this SQL query: SELECT * FROM retain35.t_folder where f_name='Dumpster';
5. Once the search of the index is complete (see step 2), the indexer is called to delete those items from the index. How long this takes depends on the indexer load and the scope of the deletion job; it could take minutes, hours, or even weeks.
At this point, browsing the archive mailbox no longer shows the item(s), because they are now marked for deletion in the database.
However, until the indexer removes an item from the index, a search of the mailbox (which uses the index) can still return a hit for it. In that case, the search results show only “---- this messages is queued for deletion ----” and nothing more.
The deletion job keeps looping through the index page by page until it is done, while the indexer continues deleting items from the index. Once the job has worked through the whole list, the deletion job ends.
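The five steps of Stage 1 can be sketched as a simplified, single-threaded simulation. This is not Retain's implementation: the `Message` class, the callback standing in for the indexer, and holds keyed by mailbox owner are assumptions. Only the page size (500) and the report-only cap (20 pages, or 10,000 items) come from the text above.

```python
from dataclasses import dataclass
from typing import Callable, List, Set

PAGE_SIZE = 500          # records per index "page" (step 2)
REPORT_PAGE_LIMIT = 20   # default page cap in report-only mode: 500 x 20 = 10,000


@dataclass
class Message:
    # Hypothetical stand-in for one index hit and its t_message row.
    msg_id: str
    owner: str
    in_dumpster: bool = False


def run_deletion_job(hits: List[Message], on_hold: Set[str], report_only: bool,
                     queue_index_delete: Callable[[List[str]], None]) -> List[str]:
    """Simplified Stage 1: page through hits and return the IDs selected."""
    selected: List[str] = []
    for page_no, start in enumerate(range(0, len(hits), PAGE_SIZE), start=1):
        if report_only and page_no > REPORT_PAGE_LIMIT:
            break  # trial run: stop after the first 10,000 items
        for msg in hits[start:start + PAGE_SIZE]:
            if msg.owner in on_hold:
                continue  # step 3: items on litigation hold are never selected
            selected.append(msg.msg_id)
            if not report_only:
                msg.in_dumpster = True  # step 4: mark as deleted ("Dumpster")
    if not report_only:
        queue_index_delete(selected)  # step 5: hand off to the indexer
    return selected
```

Following step 5, the sketch calls the indexer once, after paging ends; in Retain the actual index removal then proceeds on its own schedule, which is why search hits can linger.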
STAGE 2 – The Dumpster Thread
The "dumpster thread" is woken up when the deletion job ends and then looks for new items in the dumpster. At this stage, the "dumpster" is the t_dsref.f_state field, which has one of two values:
- 0 = normal
- 1 = queued for deletion
As it finds items with a state value of "1", it pages through them and deletes those records from the database. Events that can trigger the dumpster thread are:
- Deletion job completion.
- Server boot up.
- Maintenance.
If the final document entry is orphaned (reference count = 0), the thread calls the StorageEngine to delete the physical file. See “Storage Engine Process” below for more information.
When "Dumpster Thread ending" appears in the server log, it just means that the thread's last pass found nothing left to process in the database. It does not necessarily mean that the storage engine has deleted the physical file yet.
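The dumpster thread's loop can be sketched the same way. The `DsRef` class, the reference-count dictionary, and the `delete_file` callback (standing in for the StorageEngine) are hypothetical; the two state values and the "orphaned when the count reaches 0" rule come from the text above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

NORMAL, QUEUED = 0, 1  # t_dsref.f_state values described above


@dataclass
class DsRef:
    # Hypothetical stand-in for one t_dsref row.
    ref_id: int
    document: str      # key of the stored document this row references
    state: int = NORMAL


def dumpster_pass(refs: List[DsRef], ref_counts: Dict[str, int],
                  delete_file: Callable[[str], None], page_size: int = 100) -> int:
    """One pass of a simplified dumpster thread; returns the rows removed."""
    queued = [r for r in refs if r.state == QUEUED]
    # page_size is an arbitrary assumption; the real page size is not documented.
    for start in range(0, len(queued), page_size):
        for ref in queued[start:start + page_size]:
            ref_counts[ref.document] -= 1
            if ref_counts[ref.document] == 0:
                # Orphaned document: ask the StorageEngine to remove the file.
                delete_file(ref.document)
    # Delete the processed rows from the "table".
    refs[:] = [r for r in refs if r.state != QUEUED]
    return len(queued)
```

A pass that returns 0 corresponds to the point where "Dumpster Thread ending" is logged: nothing is left in the database, although the storage engine may still be deleting files.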
Storage Engine Process
A StorageEngine is just a black box with a delete method attached. How it works, and how fast it is, depends on the implementation:
- Retain 2.x and before: Standard Storage Engine
  - StandardStorageEngine, which we used regularly before 3.0, simply finds the file and deletes it. It is a simple, straightforward method.
  - This changed in Retain 3 because the straightforward implementation was too slow: it deleted index pieces, database pieces, and so on synchronously. Deleting, waiting for the delete to finish, and then deleting the next piece in a linear fashion is simple, but it can lock things up for days.
- Retain 3.x: DataStore_Process Storage Engine
  - DataStore does more complicated things overall, but its delete method simply finds the references in t_dsref and marks them as deleted, as described in the "Dumpster Thread" discussion above. It does nothing more.
  - At some point, another background thread specific to the DataStore looks for items in the deleted state. After performing various sanity checks, it removes the t_dsref entry, and the storage engine deletes the corresponding file from disk.
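The difference between the two storage engine styles can be shown with two toy classes that share one `delete` method. The class names and `background_pass` are hypothetical Python stand-ins, not Retain's Java classes, and the "sanity checks" are reduced to skipping files that no longer exist.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import List


class StorageEngine(ABC):
    """A black box with a delete method."""

    @abstractmethod
    def delete(self, path: Path) -> None:
        ...


class StandardEngineSketch(StorageEngine):
    # Retain 2.x style: find the file and delete it right away.
    def delete(self, path: Path) -> None:
        path.unlink(missing_ok=True)


class DataStoreSketch(StorageEngine):
    # Retain 3.x style: delete() only records the item as deleted and returns.
    def __init__(self) -> None:
        self.marked: List[Path] = []

    def delete(self, path: Path) -> None:
        self.marked.append(path)

    def background_pass(self) -> int:
        """Later, a separate pass removes marked files; returns files deleted."""
        removed = 0
        while self.marked:
            path = self.marked.pop()
            # Stand-in for the undocumented sanity checks: skip missing files.
            if path.is_file():
                path.unlink()
                removed += 1
        return removed
```

The point of the second style is that `delete` returns immediately, so the caller never waits on disk I/O; the slow work happens later in the background pass.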