The other day I was chatting with one of our Supportability Program Managers, Nino Bilic, and he mentioned something rather alarming: the number one reason our Premier customers open Exchange 2010 critical situations is that Mailbox databases dismount after running out of disk space on the transaction log LUN. I'll let that sink in for a moment. Naturally, I was shocked. To be completely honest, I thought that with the Mailbox Requirements Calculator and our guidance on TechNet, we'd have wiped out this issue by now.
After sharing this information with me, Nino decided that I, not he, should write a blog article on the topic of transaction log capacity planning (gee, thanks Nino!).
Capacity Planning 101
In order to properly size a transaction log LUN, we need to understand a few things about the environment:
- How many mailboxes will reside in the database?
- What is the message profile of the mailboxes in the database?
- What is the average message size?
- What is the average mailbox size?
- How many mailboxes are moved per day?
- What is the backup and restore solution?
- Does the solution need to take into account any other failure scenarios, like network failures?
For the purposes of this discussion, let's assume that each database will house 250 mailboxes. Each mailbox sends/receives 150 messages per day, with an average message size of 100KB. Based on the table in Understanding Mailbox Database and Log Capacity Factors, we know that a 150-message profile with a 75KB average message size generates 30 transaction logs per mailbox per day (24-hour period). Since our message size is greater than 75KB, we need to account for that in our per-mailbox transaction log generation rate. The guidance stipulates:
If the average message size doubles to 150 KB, the logs generated per mailbox increases by a factor of 1.9. This number represents the percentage of the database that contains the attachments and message tables (message bodies and attachments).
Therefore, we can determine the impact our 100KB average message size has with this formula:
150 / 1.9 = [average message size of profile] / x
x = (100 * 1.9) / 150
x = 1.266666666666667 ~ 1.27
So by having a message size that is 25KB larger than the baseline, the number of transaction logs generated per day per mailbox increases by a factor of 1.27. Therefore, 30 transaction logs * 1.27 = 38.1, which we round up to 39 transaction logs / day / mailbox. This means that a database of 250 mailboxes will generate 39 * 250 = 9,750 mailbox generated transaction logs / day / database.
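Here is a minimal Python sketch of the arithmetic above. The function and constant names are hypothetical, and it only mirrors the proportional scaling used in this article (1.9 at 150KB, scaled linearly and rounded up); it is not a general model of log generation.

```python
import math

BASELINE_LOGS_PER_MAILBOX = 30  # 150-message profile at 75KB average message size
DOUBLED_SIZE_KB = 150           # message size at which the 1.9 factor applies
DOUBLED_FACTOR = 1.9            # factor from the guidance quoted above


def message_size_factor(avg_message_kb):
    """Scale the 1.9 factor proportionally, as in the formula above."""
    return round(avg_message_kb * DOUBLED_FACTOR / DOUBLED_SIZE_KB, 2)


def mailbox_logs_per_database_per_day(mailboxes, avg_message_kb):
    """Logs per mailbox (rounded up) multiplied by the mailbox count."""
    factor = message_size_factor(avg_message_kb)
    logs_per_mailbox = math.ceil(BASELINE_LOGS_PER_MAILBOX * factor)
    return logs_per_mailbox * mailboxes


if __name__ == "__main__":
    print(message_size_factor(100))                     # factor for 100KB messages
    print(mailbox_logs_per_database_per_day(250, 100))  # logs / day / database
```

Rounding the per-mailbox figure up before multiplying keeps the estimate on the conservative side, which is what you want for capacity planning.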
Mailbox moves also generate transaction logs. Each mailbox moved to the destination database generates roughly as many logs as the size of the mailbox (including the contents of the Recoverable Items folders). For example, moving 1% of the mailboxes per day means that 2.5 mailboxes per database are moved each day. If each mailbox is 5.4GB on average (including 14-day deleted item retention with Single Item Recovery enabled), then, because each transaction log is 1MB, 2.5 * 5.4GB * 1024 ≈ 13,888 mailbox move transaction logs / day / database.
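The mailbox move estimate can be sketched the same way. This is an illustrative Python helper with hypothetical names; it assumes 1MB transaction log files and that a move generates about one mailbox's worth of logs. With the rounded 5.4GB input it returns about 13,824 logs, slightly below the 13,888 quoted above, which presumably came from an unrounded mailbox size.

```python
LOG_FILE_MB = 1  # Exchange 2010 transaction log files are 1MB each


def move_logs_per_database_per_day(mailboxes, fraction_moved_per_day, avg_mailbox_gb):
    """Estimate logs generated by mailbox moves into a database per day."""
    mailboxes_moved = mailboxes * fraction_moved_per_day
    moved_mb = mailboxes_moved * avg_mailbox_gb * 1024  # GB -> MB
    return moved_mb / LOG_FILE_MB


if __name__ == "__main__":
    # 250 mailboxes, 1% moved per day, 5.4GB average mailbox size
    print(move_logs_per_database_per_day(250, 0.01, 5.4))
```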
From a backup/restore perspective, we need to take into account the type of backup architecture we are leveraging. With each backup scenario, there is a recommended number of additional days you should provision from a capacity perspective for your mailbox generated transaction logs. By provisioning extra space, you can survive multiple failures without suffering an outage event. For more information on transaction log truncation, see Understanding Backup, Restore and Disaster Recovery.
Backup Architecture | Transaction Log Truncation | Recommended Backup Failure Protection |
Daily Full Backup | Daily | 3 days |
Weekly Full Backup / Daily Incremental | Daily | 3 days |
Weekly Full Backup / Daily Differential | Weekly | 7 days |
Bi-Monthly Full Backup / Daily Incremental | Daily | 3 days |
Exchange Native Data Protection | As logs are no longer required | 3 days |
Of course, there are other scenarios that you may need to consider. For example, if you are deploying a stretched Database Availability Group (DAG) across two datacenters, log truncation will only occur if the network link between the two datacenters is operational and the database copies are healthy. If you know that an outage of the WAN link could take 5 days to repair, you should adjust your backup failure protection to take that into account.
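One way to express that adjustment is to take the larger of the table's recommendation and your worst-case outage. The Python sketch below is only an illustration: the dictionary keys and function name are hypothetical, and using the maximum is my assumption about how to combine the two numbers.

```python
# Recommended backup failure protection (days), from the table above.
BACKUP_PROTECTION_DAYS = {
    "daily full": 3,
    "weekly full / daily incremental": 3,
    "weekly full / daily differential": 7,
    "bi-monthly full / daily incremental": 3,
    "exchange native data protection": 3,
}


def protection_days(backup_architecture, worst_case_outage_days=0):
    """Days of logs to provision for: the table value or the outage, whichever is larger."""
    return max(BACKUP_PROTECTION_DAYS[backup_architecture], worst_case_outage_days)


if __name__ == "__main__":
    # Stretched DAG with a WAN link that could take 5 days to repair
    print(protection_days("exchange native data protection", worst_case_outage_days=5))
```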
For our scenario, let's assume we only need to ensure we can survive 3 days of truncation failure events. This means that we need 9,750 / 1024 * 3 = 28.5GB of disk space for our mailbox generated transaction logs.
In addition, we need to account for the amount of disk space required for our mailbox move events for the entire week: 13,888 / 1024 * 7 days = 94.9GB of disk space for our mailbox move operations.
All told, this means that each database needs 28.5GB + 94.9GB = 123GB of disk space for transaction logs. We should also include a data overhead factor of 20% to account for any unexplained phenomena that may occur: 123GB * 1.2 = 148GB of disk space for transaction logs.
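Putting the pieces together, here is a minimal Python sketch of the total. The names are hypothetical, and it assumes 1MB log files (hence the division by 1024 to get GB), a 7-day window for mailbox moves, and the 20% overhead factor used above.

```python
def transaction_log_space_gb(mailbox_logs_per_day, protection_days,
                             move_logs_per_day, move_days=7, overhead=1.2):
    """Projected transaction log disk space in GB for one database."""
    mailbox_gb = mailbox_logs_per_day / 1024 * protection_days
    move_gb = move_logs_per_day / 1024 * move_days
    return (mailbox_gb + move_gb) * overhead


if __name__ == "__main__":
    # 9,750 mailbox logs/day for 3 days plus 13,888 move logs/day for 7 days
    print(round(transaction_log_space_gb(9750, 3, 13888)))
```

The sketch rounds only at the end, so intermediate values differ slightly from the hand calculation, but the result lands on the same 148GB.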
If we are deploying a dedicated LUN for the transaction logs, we would not provision a LUN of 150GB as that would mean that we could consume all of the disk space if we were having backup failures and excessive mailbox moves. Typically you want to ensure that each LUN is provisioned such that only 80% of the disk capacity is utilized. The formula is:
LUN Space = [ projected disk space utilization ] / ( 1 – [desired free space percentage])
LUN Space = 148GB / (1 – .2) = 148GB / .8 = 185GB LUN Space for Dedicated Transaction Log Volume
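The LUN formula is simple enough to keep as a small helper. This Python sketch uses a hypothetical function name and defaults to the 20% free space target described above.

```python
def lun_size_gb(projected_utilization_gb, desired_free_fraction=0.2):
    """LUN size such that projected utilization leaves the desired free space."""
    if not 0 <= desired_free_fraction < 1:
        raise ValueError("desired_free_fraction must be in [0, 1)")
    return projected_utilization_gb / (1 - desired_free_fraction)


if __name__ == "__main__":
    print(lun_size_gb(148))  # dedicated transaction log volume from the example
```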
How can I prevent consuming all of my transaction log disk space?
First and foremost, you need to obtain a baseline of your environment to determine your typical log generation rate per day. In addition, you must set up monitoring and take action on any alerts that are generated. Monitoring should cover the following scenarios:
- Transaction Log LUN disk space. Set up several thresholds and different alerting mechanisms. Your first alert should not be the one that indicates 90% of your disk has been consumed. If you know your typical log generation baseline, you can set up a threshold to report if you are 20% over, for example.
- Monitor for successful completion of your backups (if you aren't leveraging Exchange Native Data Protection). Your first indication of backup failures should not be when you run out of disk space.
- Monitor for the truncation events in the Application Log.
- Monitor your database copy replication health.
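To make the threshold idea concrete, here is a hedged Python sketch of the two checks described in the first bullet. Everything in it is illustrative: the function names are hypothetical, and the 70%/80%/90% tiers are example values I picked, not recommended settings. In practice these checks would live in whatever monitoring system you already use.

```python
def log_growth_alert(logs_today, baseline_logs_per_day, tolerance=0.20):
    """True when today's log count exceeds the baseline by more than the tolerance."""
    return logs_today > baseline_logs_per_day * (1 + tolerance)


def disk_alert_level(used_fraction,
                     thresholds=((0.90, "critical"), (0.80, "warning"), (0.70, "notice"))):
    """Return the most severe threshold crossed, or None if below all of them."""
    for limit, level in thresholds:  # ordered from most to least severe
        if used_fraction >= limit:
            return level
    return None


if __name__ == "__main__":
    print(log_growth_alert(12000, 9750))  # more than 20% over the 9,750 baseline
    print(disk_alert_level(0.85))
```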
What if I'm having unexplained growth in my Transaction Logs?
My friend, Mike Lagase, wrote a great article on how to troubleshoot this scenario - http://blogs.technet.com/b/mikelag/archive/2009/07/12/troubleshooting-store-log-database-growth-issues.aspx (please note that the article was written with Exchange 2007 in mind, so several of the tools and/or recommendations may no longer apply with Exchange 2010). In addition to the steps Mike mentions, you can utilize the following in Exchange 2010 to help determine the unexplained transaction log growth:
- You can use the store usage statistics cmdlet (Get-StoreUsageStatistics with DigestCategory = 'LogBytes') to identify mailboxes generating a high log byte count. Note that this doesn't always work when the log bytes aren't generated by the mailbox owner or when the operation is performed on behalf of the client (like CopyOnWrite), and it doesn't include log bytes generated by system services (reported in Event ID 9826). These stats provide a summary of the last 10 minutes of activity for the top mailboxes generating log activity (up to 6 samples covering the last hour). The following shows how to use store usage stats to find the top mailbox generating log bytes over the last hour:
[PS] C:\>$stats = Get-StoreUsageStatistics -Database <Database Name>
[PS] C:\>$stats | ? {$_.DigestCategory -eq 'LogBytes'} | group MailboxGuid | sort count -Descending | Select -first 1 -ExpandProperty Group | sort SampleTime | ft -a MailboxGuid,Sample*,Log*
MailboxGuid                          SampleID SampleTime           LogRecordCount LogRecordBytes
c007c87a-e030-4414-b741-9cf61e88b9de        5 11/7/2011 4:25:05 PM            237         274163
c007c87a-e030-4414-b741-9cf61e88b9de        4 11/7/2011 4:35:05 PM            451         387362
c007c87a-e030-4414-b741-9cf61e88b9de        3 11/7/2011 4:45:06 PM            483         144999
c007c87a-e030-4414-b741-9cf61e88b9de        2 11/7/2011 4:55:06 PM            734         293433
c007c87a-e030-4414-b741-9cf61e88b9de        1 11/7/2011 5:05:06 PM            933         411485
c007c87a-e030-4414-b741-9cf61e88b9de        0 11/7/2011 5:15:06 PM            247         209987
- There are also application events generated for administrative clients (Event ID 9826). These stats represent 2 hours of activity:
Starting from <date/time> service <name> has performed this activity on the server:
RPC Operations: 24168.
Database Pages Read: 1329 (of which 629 pages preread).
Database Pages Updated: 12418 (of which 11555 pages reupdated).
Database Log Records Generated: 13906.
Database Log Records Bytes Generated: 660331.
Time in Server: 19142 ms.
Time in User Mode: 6100 ms.
Time in Kernel Mode: 63 ms.
- The performance monitor counter "MSExchangeIS Client(*)\JET Log Record Bytes/sec" can be used to identify which client type is causing log growth.
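To put the Get-StoreUsageStatistics sample output above in perspective, you can total the LogRecordBytes column and compare it with the 1MB log file size. The Python sketch below is illustrative only: the values are copied by hand from the sample output, and the names are hypothetical.

```python
# (SampleID, LogRecordBytes) pairs copied from the sample output above.
SAMPLES = [
    (5, 274163),
    (4, 387362),
    (3, 144999),
    (2, 293433),
    (1, 411485),
    (0, 209987),
]

LOG_FILE_BYTES = 1024 * 1024  # 1MB transaction log files


def total_log_bytes(samples):
    """Sum the log record bytes across all samples for one mailbox."""
    return sum(log_bytes for _, log_bytes in samples)


if __name__ == "__main__":
    total = total_log_bytes(SAMPLES)
    print(total)                   # bytes over roughly the last hour
    print(total / LOG_FILE_BYTES)  # approximate number of 1MB logs
```

For this mailbox the hour works out to fewer than two log files' worth of log records, so it would not explain runaway log growth on its own.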
I think all of us understand how critical it is to have enough capacity so that your database availability is not affected. Hopefully this information helps in planning your transaction log capacity.
By:
Ross Smith IV
Principal Program Manager
Exchange Customer Experience