A data retention policy is a set of rules that defines how long different kinds of data should be stored as well as how and when the data should be cleaned up once the retention period is over.
Retention policies help organizations to accomplish the following goals:
- meet legal or regulatory requirements for data storage (HIPAA, FINRA, SOX);
- get rid of obsolete or stale data that occupies significant storage space and makes navigation across the company data more complex.
To support these requirements, Afi Backup offers item-level and backup version retention policies that clean up the data after it reaches a certain age defined by the policy.
Supported retention policies
Item-level retention
With item-level retention, backup items (e.g. emails or files) are deleted once the time since their last modification exceeds the retention period specified by the policy. See Item-level retention rules for Mail data and Item-level retention rules for Files data for more details.
Item-level retention is supported for the following kinds of data:
- Emails (includes Gmail for Google Workspace; Exchange Mail, Online Archive, Group Mail data for Microsoft 365);
- Files (includes Google Drive and Shared Drives data for Google Workspace; OneDrive, OneNote and SharePoint sites data for Microsoft 365).
Backup version retention
Backup version retention policies define how long the system should keep historical backup snapshots. See How does a backup version retention policy work for more details.
How retention policies work
Retention policies are configured as backup SLA properties and are applied to the resources protected by the corresponding SLAs. Data that is older than the retention period is removed by the cleanup procedure launched during periodic backups. If protection (an SLA) is removed from a resource (user, drive, site, etc.), or an SLA without configured retention rules is assigned to it, the system stops applying retention rules to that resource, and data that has reached the retention age will not be cleaned up. For the same reason, retention rules are not applied to backups belonging to deleted or suspended users.
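The conditions above can be summarised as a small decision function. This is only an illustrative sketch: the `Sla` and `Resource` classes, their field names and the `retention_applies` function are hypothetical and do not correspond to any Afi API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sla:
    # Hypothetical model of a backup SLA; field names are illustrative only.
    name: str
    retention_months: Optional[int] = None  # None: no retention rules configured


@dataclass
class Resource:
    # Hypothetical model of a protected resource (user, drive, site, ...).
    name: str
    sla: Optional[Sla] = None   # None: protection was removed from the resource
    owner_active: bool = True   # False: the user is deleted or suspended


def retention_applies(resource: Resource) -> bool:
    """Return True if retention rules would be applied to this resource."""
    if not resource.owner_active:
        return False  # deleted or suspended users: rules are not applied
    if resource.sla is None:
        return False  # no SLA assigned: nothing triggers a cleanup
    # An SLA without retention rules also stops the cleanup.
    return resource.sla.retention_months is not None


print(retention_applies(Resource("alice", Sla("Gold", retention_months=12))))
print(retention_applies(Resource("bob", Sla("Basic"))))
```

In this sketch, only a resource that has an SLA with a configured retention period, and whose owner is still active, is ever considered for cleanup.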
Please note that, for performance reasons, the cleanup procedure is not performed on every backup; it is executed with a certain probability that depends on the retention period duration and the last cleanup time. As a result, backups with a custom data retention period enabled may contain some items that are older than the retention age. Such items will be removed during the next cleanup (the time lag for the retention cleanup doesn't exceed one month). Also, if the retention period is changed (for example, from 5 years to 3 years), the first cleanup after the change may happen with a delay.
Let's discuss how different retention rules are applied.
Item-level retention rules for Mail data
With an item-level email retention policy set up, all emails with received dates older than the retention period are deleted from all historical backup snapshots. Emails that were recently moved between labels (folders) or marked as read/unread are still cleaned up based on their original received date (i.e. such events don't reset an email's age).
Example: suppose that user A receives one email per day for 3 months (February 1st, February 2nd, ..., March 1st, ..., April 30th) and has about 90 such emails in their mailbox as of May 1st. On May 1st, a 1-month email retention policy is set up for this user and a retention cleanup is done. During the cleanup, emails older than April 1st are removed, and emails dated from April 1st to April 30th (the emails from the last 30 days) remain intact.
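The arithmetic in this example can be reproduced with a short script. It is a sketch under stated assumptions: the helper names `subtract_months` and `split_by_retention` are made up for illustration, the year 2024 is arbitrary (a leap year, which gives exactly 90 emails), and the actual cleanup implementation is not described in this article.

```python
import calendar
from datetime import date, timedelta


def subtract_months(d: date, months: int) -> date:
    """Return the date `months` calendar months before `d`, clamping the day."""
    index = d.year * 12 + (d.month - 1) - months
    year, month = divmod(index, 12)
    month += 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def split_by_retention(received_dates, cleanup_day, retention_months):
    """Split emails into (kept, removed) lists based on their received date."""
    cutoff = subtract_months(cleanup_day, retention_months)
    kept = [d for d in received_dates if d >= cutoff]
    removed = [d for d in received_dates if d < cutoff]
    return kept, removed


# One email per day from February 1st to April 30th (the year is an assumption).
start, end = date(2024, 2, 1), date(2024, 4, 30)
mailbox = [start + timedelta(days=i) for i in range((end - start).days + 1)]

# A 1-month retention policy applied by a cleanup on May 1st.
kept, removed = split_by_retention(mailbox, date(2024, 5, 1), 1)
print(len(mailbox), len(kept), len(removed))
print(kept[0], kept[-1])
```

The cutoff is computed from the received date only, which mirrors the rule that moving an email between labels or marking it as read does not reset its age.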
While browsing historical snapshots for backups with retention enabled, you may encounter deleted item placeholders for items already removed by the cleanup procedure. The data and metadata for these placeholder items are deleted, but the placeholders remain present in the browsing view due to implementation details and to provide better visibility into the retention process.
Item-level retention rules for Files data
With an item-level file retention policy set up, all files with a modification date older than the retention period are deleted from all historical backup snapshots. Files with a creation date older than the retention period but a newer modification date are preserved. Files deleted in Google Workspace / Microsoft 365 that are still present in older backup snapshots are also cleaned up based on their last modified date. If for any reason files older than the retention age are still present in the corresponding Google Workspace / Microsoft 365 account (although the general best practice is to set up the same retention policies across all services used by the company), they will either not be backed up at all or be removed during the next cleanup procedure.
Note that the file cleanup job doesn't delete folders regardless of their age: it removes files older than the retention age but leaves all folders in place.
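The file rules can be expressed as a single predicate. The sketch below is illustrative only: the `Entry` class, its fields and the `should_remove` function are hypothetical names, and the cutoff date is passed in directly rather than derived from an SLA.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Entry:
    # Hypothetical representation of a backed-up file or folder.
    path: str
    created: date
    modified: date
    is_folder: bool = False


def should_remove(entry: Entry, cutoff: date) -> bool:
    """Return True if the cleanup would remove this entry."""
    if entry.is_folder:
        return False  # folders are never deleted, regardless of age
    # Only the modification date matters; the creation date is ignored.
    return entry.modified < cutoff


cutoff = date(2024, 1, 1)  # assumed start of the retention period
entries = [
    Entry("/reports", date(2019, 1, 1), date(2019, 1, 1), is_folder=True),
    Entry("/reports/old.xlsx", date(2019, 1, 1), date(2019, 6, 1)),
    Entry("/reports/living.docx", date(2019, 1, 1), date(2024, 3, 1)),
]
for entry in entries:
    print(entry.path, should_remove(entry, cutoff))
```

Here only `/reports/old.xlsx` is removed: the folder is skipped because it is a folder, and `/reports/living.docx` is kept because it was modified inside the retention period even though it was created long before it.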
Backup version retention rules
With backup version retention, backup versions (snapshots) that are older than the retention period are deleted, and backup items (e.g. emails or files) are kept as long as there is at least one snapshot in which the item is present. A snapshot represents the state of a mailbox, drive, site or team at a specific point in time.
A backup item (file/email) is present in the snapshot if the item is present in the mailbox (drive, site or team) being backed up at the time of the backup. Please note that a snapshot includes all the items present in the corresponding mailbox (drive, site or team), not only the items that were modified or created at the time the snapshot was taken.
Backup version retention cleans up items that were deleted, and old item versions that were overwritten, before the start of the current retention period (such items are therefore absent from all the remaining snapshots).
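The snapshot-based rule can be sketched as a set computation: drop the snapshots older than the cutoff, then keep every item version that still appears in at least one remaining snapshot. The function name, the `name@version` identifiers and the sample data below are hypothetical, and treating a snapshot taken exactly on the cutoff date as retained is an assumption.

```python
from datetime import date


def apply_version_retention(snapshots, cutoff):
    """Apply snapshot retention.

    `snapshots` maps a snapshot date to the set of item versions present in it.
    Returns (remaining_snapshots, kept_versions, purged_versions).
    """
    # Snapshots older than the cutoff are deleted (boundary handling assumed).
    remaining = {day: items for day, items in snapshots.items() if day >= cutoff}
    # A version survives if at least one remaining snapshot contains it.
    kept = set().union(*remaining.values())
    everything = set().union(*snapshots.values())
    return remaining, kept, everything - kept


snapshots = {
    date(2024, 3, 1): {"report.docx@v1", "notes.txt@v1", "draft.txt@v1"},
    # report.docx was edited and draft.txt was deleted before this snapshot.
    date(2024, 3, 20): {"report.docx@v2", "notes.txt@v1"},
    date(2024, 4, 10): {"report.docx@v2", "notes.txt@v1"},
    date(2024, 5, 1): {"report.docx@v2", "notes.txt@v1"},
}

remaining, kept, purged = apply_version_retention(snapshots, date(2024, 4, 1))
print(sorted(remaining))
print(sorted(kept))
print(sorted(purged))
```

In this sample, `notes.txt@v1` survives although it has not been modified since before the cutoff, because every remaining snapshot still contains it; the overwritten `report.docx@v1` and the deleted `draft.txt@v1` are purged because no remaining snapshot references them.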
Example: suppose that user A has document X, created on February 1st and then edited on March 1st, April 15th and May 1st; document Y, created on March 1st and deleted on March 2nd; and document Z, created on February 1st and not modified since. User A's drive is backed up daily. On May 1st, a 1-month backup version retention policy is enabled for this user and a retention cleanup is executed. During the cleanup, the system deletes the versions of document X from February 1st and March 1st, and all versions of document Y. The versions of document X from April 15th and May 1st, as well as document Z, are preserved since they belong to snapshots within the retention period.
How to configure and manage retention rules
Retention rules are configured as a property of a backup SLA, which defines which data to back up, how often, and how long to keep the backup data. The following steps are required to set up custom retention rules:
- go to Service → Protection → Settings → SLA tab, select an existing SLA policy or create a new one, then choose a retention mode (item-level or backup version) and the retention period duration;
- after an SLA policy is set up, assign it to resources or organizational units (Google Workspace) or resource groups (Microsoft 365) that you want to protect with this policy.
Here is an example of an item-level retention setup: a 1-year retention period for mail data and a 6-month retention period for drive data.
Here is an example of a backup version retention setup: a 1-year retention period for snapshots.
You can configure different retention rules for different Google Workspace organizational units or for members of different Microsoft 365 groups by configuring several backup SLA policies with different retention rules and assigning these backup SLAs to the corresponding organizational units or groups.
The retention period can be configured with a monthly granularity (for example, 6 months, 1 year, 3 years, etc.).
Example: an organization administrator might want to protect all members of the Sales organizational unit with a Gold backup SLA with 3-year email retention, and at the same time protect all Shared Drives with a Silver backup SLA with 1-year document retention.
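The Gold / Silver example can be modelled as plain data to make the structure explicit. This is a hypothetical sketch, not an Afi configuration format or API: retention is actually configured in the SLA tab of the web interface, and the `RetentionSla` class, its fields and the `assignments` mapping are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RetentionSla:
    # Hypothetical model of a backup SLA with item-level retention rules.
    name: str
    mail_retention_months: Optional[int] = None   # None: keep mail indefinitely
    files_retention_months: Optional[int] = None  # None: keep files indefinitely

    def __post_init__(self):
        # Retention periods have monthly granularity, so only whole,
        # positive numbers of months are accepted.
        for value in (self.mail_retention_months, self.files_retention_months):
            if value is not None and (not isinstance(value, int) or value < 1):
                raise ValueError("retention must be a whole number of months")


gold = RetentionSla("Gold", mail_retention_months=3 * 12)      # 3-year email retention
silver = RetentionSla("Silver", files_retention_months=1 * 12)  # 1-year document retention

# Each organizational unit or resource group is protected by one SLA.
assignments = {"Sales": gold, "Shared Drives": silver}

for target, sla in assignments.items():
    print(target, sla.name, sla.mail_retention_months, sla.files_retention_months)
```

Keeping the retention period as a whole number of months reflects the monthly granularity described above; a value such as 1.5 months is rejected in this sketch.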