in

SharePoint Blogs

The Best Place for SharePoint-related Blogs

Russ Houberg's SharePoint Blog

Unlocking the hidden power of SharePoint for document imaging.

SharePoint 2007: Revenge of the 100GB Database

Right...he's a Star Wars geek.  Check.

I wanted to discuss something else I heard a lot about at the SharePoint 2008 conference.  The 100GB database limitation.

Organizations are now looking at SharePoint as a legitimate large scale application.  They want to believe.  They want to engage.  Then they all hit their heads on the same thing.  100GB database size recommendation.  Folks... it's a recommendation.  The answer to the question of can we go bigger is the same as what I heard several times throughout the conference... "it depends".  If properly architected and with quality disaster recovery solutions in place, the content database can be larger.

So what I want to discuss is that the 100GB requirement is a guideline driven primary by SLA requirements.  The point being that you have to be able to back up and/or restore the content databases in an amount of time that is reasonable for your business.  If you're doing log shipping or have a disk to disk backup rig with an acceleration component from a Quest or Avepoint and you can nail a backup quick like, then you can go larger than 100GB! 

The only minor performance issues that I've seen with large content database center around large list updates.  For example, if you add a column or a column index to list or library that has several million content items in it then some of the data tables in the content database will be locked until the change has completed.  This will effectively lock out all other users from accessing any content in that content database until the change has completed. 

I have seen at least one content database of 400+ GB in size and I've heard of others that are about 1TB!  While 1TB is definately pushing it quite a lot and performance isn't as good as with a smaller database it is usable.  With a small number of users or in an archive scenario it could be acceptable.  The 400+GB database runs fine.  So I want to give you some tips if you are comfortable with going larger than 100GB:

  • I/O is everything.  If you know you are going to have a very large content database, then you'll do well to be generous with your storage gear.
  • RAID 5 is a minimum, RAID 10 is better
  • BEFORE you create your site collection, pre-create an empty content database.  Add data files to the empty content database such that you have 1 data file for every processor "core" in your SQL Server.
  • If at all possible, place the individual files on a separate LUN or physical set of spindles
  • LUNs can be large enough to accommodate multiple data files from DIFFERENT databases
  • MONITOR the Average Disk Queue lengths of the (hopefully different) LUNs.  You want to see them under 2 if possible.  If you're in the decimal range then you're golden.  If you're in the single digits then you're acceptable.  If you see ADQ numbers into the double, triple, or quadruple digits, then you've got problems that need to be addressed.

For example, lets say I my corporation has collected 4TB of content over the last 5 years and we want to move it all into SharePoint.  For the sake of this example, we'll ignore the fact that once stored in SharePoint, the content will take up more than 4TB of space.  Also, we have an 8 core SQL server with say 32GB RAM.  You could possibly shuffle that content out as follows:

Create (8) 1TB RAID 5 or RAID 10 LUNs.  Lets say we map those LUNs to drives H: through O:.  Note that you could just as easily mount them to empty folders if you don't want to use drive letters.  With an 8 core SQL Server and 8 content database luns, I can create 8 files per content database and put one of them on each of the different LUNs (neet how that worked out for this example!). 

  • With this rig we could pre-create 20 content databases. 
  • All of the database [dbname].MDF files would be on the H: drive. 
  • We then add [dbname2-7].NDF files on the i:\ through o:\ drives
  • We then create our 20 site collections probably using the "stsadm -o createsiteinnewdb" command

We then go through the effort of getting the content into SharePoint.  <ShamelessPlug>KnowledgeLake has the framework to get this done by the way.</ShameLessPlug>  Once the 4TB of content is done being loaded into the 20 site collections, you will find that each content database is approximately 200GB in size.  That means that each of the 8 data files for a given database is actually 25GB and spread across each of the 8 LUNs.  We now have a 200GB database with excellent I/O numbers and we still have room to double in size without worrying too much about I/O performance.  Of course, your mileage may vary depending on how the LUNs are configured and the performance characteristics of your SAN.

I want to be clear that this is a hypothetical example of one possible solution.  Every organization has variables that would affect this architecture, thus fulfilling the "it depends" mantra.

Russ

Published Mar 08 2008, 10:10 AM by Russ Houberg
Filed under:

Comments

 

Ron said:

Great blog...I agree that database size is limited by SLA and the organizations ability to support large databases. A questions I ask clients is "Whats your largest SQl database today and how long does it take to backup/restore?" This gives me a sense of their operations maturity. I also like your approach to configuring the disks and servers, will support lots of IOs.

March 9, 2008 2:12 PM
 

Russ Houberg said:

Excellent comments Ron.  I like your question to guage operational capability also.  I'm sure I'll use that in the near future!

I appreciate your comments!  It's good to know I'm not alone out there!

March 9, 2008 2:32 PM
 

SharePoint Link Love: 03-10-2008 at Virtual Generations said:

Pingback from  SharePoint Link Love: 03-10-2008 at  Virtual Generations

March 11, 2008 9:41 PM

Leave a Comment

(required )  
(optional )
(required )  
Add

About Russ Houberg

Hmmm. Bio. Ok. Well, I've been a techie geek as far back as I can remember. Which is somewhere in the neighborhood of software program called "Delta Draw" on an original IBM XT personal computer owned by a close friend of my folks. These days I'm into SharePoint mostly. I work for a great company called KnowledgeLake. We specialize in document centric transactional content management. That's fancy talk for high volume scanning using SharePoint as a storage repository and document processing workflow. So I'm particularly interested in scalability and the index/search nuances of SharePoint. I have an amazing wife, two great boys, and a labradoodle. I'm also very active in my local church and little league baseball organizations. I hope you find something useful in my blog.

Need SharePoint Training? Attend a SharePoint Bootcamp!

Posts (c) their respective authors. Everything else (c) 2007 SharePoint Experts