Retain Job Configuration

From GWAVA Technologies Training

A Retain archive job is made up of four major parts:

  • Schedule
    • When the job should run
  • Profile
    • What the job should dredge
  • Worker
    • Who does the job, and how to connect
  • Job
    • What in the email system to dredge

Job Design Considerations

How you design a job depends on what you are trying to do. An initial dredge that tries to gather everything from the email system is different from a daily job that dredges the new messages from the last day.

Before you start an initial dredge, you should test the performance of the system by dredging a test user. You want to see if Retain is functioning smoothly. While the module test is useful, there is no substitute for a real job. Throughput should be greater than 3-5 messages per second for on-premise email systems, and somewhat lower for online email systems. If the throughput is low, you will have to determine the bottleneck in the system.
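As a rough way to judge a test job, you can divide the number of messages archived by the elapsed time and compare the result with the guideline above. The sketch below is illustrative only: the function names are hypothetical, the message count and duration are read by hand from the job status, and the 3 messages per second floor is simply the low end of the 3-5 range quoted above.

```python
def throughput(messages_archived, elapsed_seconds):
    """Messages per second for a finished test job."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return messages_archived / elapsed_seconds


def looks_healthy(rate, minimum=3.0):
    """True if the rate meets the assumed floor (3.0 is the low end of the 3-5 guideline)."""
    return rate >= minimum


# Hypothetical test job: 5,400 messages archived in 20 minutes.
rate = throughput(5400, 20 * 60)
print(f"{rate:.1f} messages/second, healthy: {looks_healthy(rate)}")
```

For an online email system you would pass a lower `minimum`, since the expected throughput there is somewhat lower.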

An initial dredge is an attempt to get everything that exists in the email system, or at least everything back to the start of the retention period. We have had customers with messages going back to 1997 and earlier, but with only a seven-year retention period there was no point in dredging the full server, which had corruption in the oldest messages anyway.

You want the job to take as little time as possible, but the initial dredge may take days or weeks. The daily job that runs just after the initial dredge will take extra time because of all the new messages that arrived since the beginning of the initial dredge, and it may take a few days to catch up on all those messages. Within a few jobs, though, the duration of the daily job should drop to its typical level.

Depending on what you learned from the test job and how many items are in the email system, you can calculate how long it will take to dredge the entire system. Assuming a typical 120 messages per day per user, you can also calculate how long a daily dredge will take. If that is going to take more than a day, it makes sense to create multiple workers and have them work in parallel to increase Retain's bandwidth utilization.
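The arithmetic can be sketched as follows. This is a back-of-the-envelope estimate, not a Retain feature: the function names and the example site (2,000 users, 4 messages per second, an 8-hour nightly window) are made up for illustration, and the sketch assumes throughput scales linearly with the number of workers, which a real system may not achieve.

```python
import math


def dredge_hours(total_messages, messages_per_second, workers=1):
    """Estimated hours to dredge total_messages, assuming linear scaling across workers."""
    return total_messages / (messages_per_second * workers) / 3600


def workers_needed(total_messages, messages_per_second, window_hours):
    """Smallest number of parallel workers that fits the dredge into window_hours."""
    per_worker = messages_per_second * window_hours * 3600
    return math.ceil(total_messages / per_worker)


# Hypothetical site: 2,000 users at the typical 120 messages per user per day.
daily_messages = 2000 * 120
print(round(dredge_hours(daily_messages, 4), 2), "hours with one worker")
print(workers_needed(daily_messages, 4, 8), "workers for an 8-hour window")
```

The same functions work for the initial dredge: pass the total item count of the email system instead of the daily volume.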

Archiving Strategies

There are four basic kinds of strategies that you may wish to use to archive mailboxes. You can name jobs pretty much anything you want, but some customers prefer to have fewer jobs rather than more.

  • Initial or Special
  • Daily
  • Folder Update
  • Journal Mailbox

Jobs are saved in the database for statistical reasons, so be aware that any job that has run cannot be deleted. Name your jobs carefully.

Special

Special jobs are for doing the Initial dredge of the entire system and for running a job against a particular user to solve an issue. These will often try to archive everything possible. They will be run on an as-needed basis.

Daily

Daily jobs are the standard jobs that the customer will be running almost all the time. They run every day and use the storage flag so that they only have to archive the newest data.

Folder Update

Folder Update is a job for situations where the users like to have their email organized. The profile is set to go back a few weeks and update the locations of all messages. That is usually enough time for messages to be in their final resting place. Run this job on the weekends as it will take a little longer. This has the added benefit of re-dredging items that may have been missed when using advanced options.

Journal Mailbox

In Exchange you can set up a Journal Mailbox that gathers a copy of all messages sent to or from the email system. This is not available in O365, although other online services may allow it. Retain should be set to empty this mailbox as it archives, because a journal mailbox can grow very quickly and can become too large for Exchange to serve up successfully. While there are options even in this case, it is better to avoid journal mailboxes if possible.

Job Configuration

A Job is made up of four parts:

  • Schedule
  • Profile
  • Worker
  • Job

Schedule

There are only two types of schedules:

  • Recurring
  • Single

Recurring

A recurring schedule is the most common. The job is typically set to run every day at a particular time. You can set the start time. In large systems with multiple jobs, you would want each job to start at a different time so as not to saturate the network.
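One simple way to plan different start times is to space the jobs a fixed interval apart. The sketch below only computes the times to type into each schedule. The function name, the job names, and the 90-minute gap are hypothetical, and nothing here talks to Retain.

```python
from datetime import datetime, timedelta


def staggered_starts(first_start, job_names, gap_minutes):
    """Map each job name to an "HH:MM" start time, gap_minutes apart, wrapping past midnight."""
    base = datetime.strptime(first_start, "%H:%M")
    return {
        name: (base + timedelta(minutes=gap_minutes * i)).strftime("%H:%M")
        for i, name in enumerate(job_names)
    }


# Hypothetical jobs, one per post office, starting at 22:00 and spaced 90 minutes apart.
for job, start in staggered_starts("22:00", ["Daily PO1", "Daily PO2", "Daily PO3"], 90).items():
    print(job, start)
```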

You can also set the maximum job duration if the customer does not want the job to run to completion. By default, the schedule does not interrupt a running job. This option is available for customers whose network infrastructure has some issues: you can set the job to run only at night and stop during the day. The downside is that users in the latter part of the alphabet may not be dredged. If the jobs are unable to complete in the available time, it is recommended that multiple parallel jobs be created.

Single

A single schedule allows you to create a job that will run only once at a particular date and time.

This is also the schedule to use to manually start a job.

Manually Starting a Job

To manually start a job:

  1. Set the job to a single type schedule.
  2. Move any of the date or time options back by one or more. For example, if the schedule is set to Date: Oct 21 2015 Time: 07 28, you can set the minutes to 27.
  3. Press "Save Changes"
  4. Go to the Worker Web Console (http://serverAddress:48080/RetainWorker)
  5. Click "Refresh job cache now"
  6. Click on the Status tab to watch the job begin.

Profile

The profile is where you tell Retain what to dredge out of each mailbox during an archive job. Some settings are specific to the module. This example uses the Exchange Module, but the other modules are not that different in general.

The defaults are set so that all items possible will be archived. Disabling these settings or adding exceptions to them will open holes through which data may be lost.

Initially, there are no profiles and you will have to click on "Add Profile" to create a new profile.

Core Settings

The defaults are usually just fine for most customers and test systems.

  • Enable Archiving is enabled by default. This is one place where you can disable archiving (see also Job and Module).
  • Mark emails as archived is disabled by default. It allows Windows clients to view the archived status, but it would have to be enabled on the clients as well, so few customers bother with this setting.
  • Delete archived messages from messaging system is disabled by default. It allows Retain to delete items from the source. This is a dangerous feature as it can lead to data loss. It is often better to have the email system handle deletion.

Message Settings

Most customers choose to archive all items, which is safest. Disabling any items opens potential data loss holes.

  • Mailbox Type: Exchange has two kinds of mailboxes
    • Users
    • Room/Equipment (these are mailboxes dedicated to a particular room or piece of equipment but are managed by one or more users)
  • Item Type: Exchange stores five different kinds of items in each mailbox
    • Mail
    • Appointment
    • Note
    • Task
    • Voice Message (on a test system that is not connected to a phone system, this can often be disabled to prevent uninformative errors from appearing in the worker log)
  • Item Source: There are four default categories in Exchange, and depending on the organization's retention policy some categories could be ignored
    • Received
    • Sent
    • Draft
    • Personal
  • Message Status: Messages have status flags associated with them
    • Read
    • Private
    • Personal
    • Confidential

Scope

Scope sets the messages and date range that will be scanned.

  • Date Range to Scan
    • All Messages (ignore date): gets all new items it finds, based on the Duplicate Check flag
    • Number of days from job start: gets items n days old or newer
    • Number of days from job start (older): gets items n days old or older
    • Specify custom date range: uses a particular date range
    • Specify custom date range (relative to job start): uses a moving date range
  • Duplicate Check
    • Try to publish all messages (SLOW): starts from the beginning of time (1/1/1970 00:00)
    • Ignore all messages older than item store flag (fast): this option and the others use the storage flag set below, and they should match
  • Set Storage Flags
    • Item Store Flag: this flag is stored in the Retain database; the others are stored in their respective email systems
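To make the date range options concrete, the sketch below computes the window each one implies for a given job start. It is an interpretation of the option names, not Retain's implementation: the function names are hypothetical, `None` stands for "no limit", and whether Retain treats the boundaries as inclusive is an assumption.

```python
from datetime import datetime, timedelta


def days_from_job_start(job_start, n, older=False):
    """(start, end) for "n days or newer", or "n days or older" when older=True."""
    cutoff = job_start - timedelta(days=n)
    return (None, cutoff) if older else (cutoff, job_start)


def relative_range(job_start, from_days_ago, to_days_ago):
    """A moving window that shifts with every job start."""
    return (job_start - timedelta(days=from_days_ago),
            job_start - timedelta(days=to_days_ago))


def apply_item_store_flag(window, flag):
    """Skip everything older than the item store flag (the "fast" duplicate check)."""
    start, end = window
    if flag is not None and (start is None or flag > start):
        start = flag
    return (start, end)


def in_window(item_date, window):
    """True if item_date falls inside the window (boundaries assumed inclusive)."""
    start, end = window
    return (start is None or item_date >= start) and (end is None or item_date <= end)


job_start = datetime(2015, 10, 21, 7, 28)
last_week = days_from_job_start(job_start, 7)
print(in_window(datetime(2015, 10, 20), last_week))  # inside the 7-day window
print(in_window(datetime(2015, 10, 1), last_week))   # too old for the window
```

With "Try to publish all messages" there is no flag to apply, which is why that option has to look at every item in the window.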

Miscellaneous

This tab allows fine-tuning of what is stored in Retain.

  • Storing attachments
    • Store all attachments (default)
    • Don't store any attachments, other than the message: this option is rarely used since it would prevent data from being stored
  • Internet Headers
    • Store/index Internet Headers (disabled by default): this is optional
  • Mailbox item options
    • Include user's archive mailbox: Exchange stores archived mail in a separate database. Enable this option to dredge from that database so that it does not have to be archived separately.
    • Include user's recoverable items: Exchange keeps deleted items in a hidden folder for 14 days. This option allows Retain to dredge from there.
    • Include Public Folders (slow): if public folders need to be archived, choose Owned by mailbox, because the Impersonation user will not have rights to documents owned by another user unless rights were delegated to the Impersonation user.
  • Journaling Mailbox storage format: Microsoft no longer recommends journaling, but to make journaled items easy to navigate they can be stored in one of the following ways.
    • Store in one folder
    • Store by year (yyyy)
    • Store by year and month (yyyyMM)
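The yyyy and yyyyMM patterns above map directly onto date formatting. As an illustration only (the function and the format keys are hypothetical, and which message date Retain uses to pick the folder is not stated here), the folder name for a journaled item could be derived like this:

```python
from datetime import date


def journal_folder(message_date, storage_format):
    """Folder name for a journaled item: "" for one folder, else yyyy or yyyyMM."""
    if storage_format == "one":
        return ""
    if storage_format == "yyyy":
        return message_date.strftime("%Y")
    if storage_format == "yyyyMM":
        return message_date.strftime("%Y%m")
    raise ValueError(f"unknown storage format: {storage_format}")


print(journal_folder(date(2015, 10, 21), "yyyy"))
print(journal_folder(date(2015, 10, 21), "yyyyMM"))
```

Storing by year and month keeps each folder small on a busy journal mailbox, at the cost of more folders to browse.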

Advanced

This tab allows the creation of advanced criteria to limit the amount of data collected by Retain. This is not recommended but it is available for special cases.

Advanced Criteria

A dredge can be limited to certain criteria

  • Subject
  • Sender
  • Recipient
  • Message size (bytes)
  • Attachment Name
  • Category

Folder Scope

A dredge can be limited by folder

  • Items from All Folders
  • ONLY items from folders listed below
  • All folders EXCEPT those listed below
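The three folder scope choices behave like an include list and an exclude list. The sketch below shows the logic under stated assumptions: the function name and mode keys are hypothetical, and matching folder names without regard to case is a guess rather than documented Retain behavior.

```python
def folder_in_scope(folder, mode, listed=()):
    """Decide whether a folder is dredged under the chosen folder scope mode."""
    hit = folder.lower() in {name.lower() for name in listed}
    if mode == "all":       # Items from All Folders
        return True
    if mode == "only":      # ONLY items from folders listed below
        return hit
    if mode == "except":    # All folders EXCEPT those listed below
        return not hit
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical exclusion list.
skip = ["Junk Email", "Deleted Items"]
for name in ["Inbox", "Junk Email", "Sent Items"]:
    print(name, folder_in_scope(name, "except", skip))
```

Either restrictive mode leaves some folders out of the archive, which is why the Advanced tab is reserved for special cases.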

Worker

The Worker is made up of two parts:

  • Worker engine
  • Worker agent

The Worker engine is the software that needs to be installed on a server. This server can be the Retain server, the mailbox server or a server separate from both of them. We recommend one worker per post office or mailbox database.

When you name a worker, it is a best practice to mention the location of the worker (e.g. Worker0 GW_PO_1), especially if it is on a different server. It can be quite difficult to find the worker log if you don't know what server the worker agent is on.

Once you install the worker engine, you download and install the bootstrap so the worker engine can connect to the worker agent on the Retain server.

Polling

  • Interval

This can be set to 60 minutes so that the RetainWorker log is not filled with unchanging configuration information. By default, the Status tab is updated every 500 messages.

Logging

  • Logging Options

It is best to use Diagnostic (Trace) logging. Any lower level will not provide enough information to diagnose an issue.

Connection

  • Server Connection

Fill in this information so the RetainWorker agent knows how to connect to the Retain server.

Module specific

  • GroupWise SOAP Access

GroupWise 2014+ requires SSL to be active by default, so this may need to be enabled.

  • Exchange

In an Exchange system with multiple mailbox and CAS servers it can help to provide an address to an Active Directory site to improve autodiscover response.

Status

  • Worker Status

Provides status updates during a job. The update frequency is set in the Polling tab.

Bootstrap

  • Boot Strap

Download the bootstrap code so the RetainWorker engine can communicate with the Retain server.

Connect to the RetainWorker engine with your browser (http://serverAddress:48080/RetainWorkerN), upload the bootstrap, and log into the worker.

It is a very good idea to bookmark this page.

Job

Core Settings

  • Job enabled
  • Schedule
  • Profile
  • Worker
  • Enable data expiration

Journaling

For use with Exchange journaling mailboxes, which are no longer recommended by Microsoft but are still useful in certain circumstances.

  • Enable Journaling
  • Delete archived items from journal

Mailboxes

  • Mail Servers
  • Distribution Lists
  • Distribution Lists (exclude)
  • Users

Notification

  • SMTP connection
  • Mail when errors occur
  • Mail summary when job complete

Status

You can monitor and abort a job here. It generally takes several minutes for a job to abort. If a job does not abort, you will have to restart Tomcat.

Next Step

Now run a job so you have data in Retain.

