SharePoint Blogs

The Best Place for SharePoint-related Blogs

Russ Houberg's SharePoint Blog

Unlocking the hidden power of SharePoint for document imaging.

March 2008 - Posts

  • SharePoint 2007: Revenge of the 100GB Database

    Right...he's a Star Wars geek.  Check.

    I wanted to discuss something else I heard a lot about at the SharePoint Conference 2008: the 100GB database limitation.

    Organizations are now looking at SharePoint as a legitimate large scale application.  They want to believe.  They want to engage.  Then they all hit their heads on the same thing: the 100GB database size recommendation.  Folks... it's a recommendation.  The answer to the question of whether we can go bigger is the same one I heard several times throughout the conference... "it depends".  If properly architected and with quality disaster recovery solutions in place, the content database can be larger.

    So what I want to discuss is that the 100GB figure is a guideline driven primarily by SLA requirements.  The point is that you have to be able to back up and/or restore the content databases in an amount of time that is reasonable for your business.  If you're doing log shipping or have a disk-to-disk backup rig with an acceleration component from a vendor like Quest or AvePoint, and you can nail a backup quickly, then you can go larger than 100GB!
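To make the SLA point concrete, here is a rough back-of-the-envelope sketch. The throughput and SLA numbers are purely hypothetical placeholders, not measurements from any real rig; plug in what your own backup gear actually sustains.

```python
def restore_hours(db_size_gb: float, throughput_mb_per_s: float) -> float:
    """Hours needed to move db_size_gb at a sustained throughput (MB/s)."""
    seconds = (db_size_gb * 1024) / throughput_mb_per_s
    return seconds / 3600


def fits_sla(db_size_gb: float, throughput_mb_per_s: float, sla_hours: float) -> bool:
    """True if a full backup or restore of the database fits in the SLA window."""
    return restore_hours(db_size_gb, throughput_mb_per_s) <= sla_hours


if __name__ == "__main__":
    # Hypothetical numbers: 100 MB/s sustained throughput, 1 hour SLA window.
    for size_gb in (100, 200, 400):
        hours = restore_hours(size_gb, throughput_mb_per_s=100)
        ok = fits_sla(size_gb, throughput_mb_per_s=100, sla_hours=1)
        print(f"{size_gb} GB: {hours:.2f} h, fits SLA: {ok}")
```

The faster your rig, the bigger the database you can restore inside the same window, which is really all the 100GB guideline is about.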

    The only minor performance issues that I've seen with large content databases center around large list updates.  For example, if you add a column or a column index to a list or library that has several million content items in it, then some of the data tables in the content database will be locked until the change has completed.  This effectively locks all other users out of any content in that content database until the change has completed.

    I have seen at least one content database of 400+ GB in size and I've heard of others that are about 1TB!  While 1TB is definitely pushing it quite a lot, and performance isn't as good as with a smaller database, it is usable.  With a small number of users or in an archive scenario it could be acceptable.  The 400+GB database runs fine.  So I want to give you some tips if you are comfortable with going larger than 100GB:

    • I/O is everything.  If you know you are going to have a very large content database, then you'll do well to be generous with your storage gear.
    • RAID 5 is a minimum, RAID 10 is better
    • BEFORE you create your site collection, pre-create an empty content database.  Add data files to the empty content database such that you have 1 data file for every processor "core" in your SQL Server.
    • If at all possible, place each of the individual files on a separate LUN or physical set of spindles
    • LUNs can be large enough to accommodate multiple data files from DIFFERENT databases
    • MONITOR the Average Disk Queue lengths of the (hopefully different) LUNs.  You want to see them under 2 if possible.  If you're in the decimal range then you're golden.  If you're in the single digits then you're acceptable.  If you see ADQ numbers into the double, triple, or quadruple digits, then you've got problems that need to be addressed.
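The queue length rule of thumb in the last tip can be sketched as a tiny classifier. This is only an illustration of the thresholds described above; the function name, the LUN labels, and the sample readings are made up, and the readings themselves would come from something like the "Avg. Disk Queue Length" counter in Performance Monitor.

```python
def classify_adq(avg_disk_queue_length: float) -> str:
    """Bucket an Average Disk Queue length reading using the rule of thumb above."""
    if avg_disk_queue_length < 0:
        raise ValueError("queue length cannot be negative")
    if avg_disk_queue_length < 1:
        return "golden"        # decimal range
    if avg_disk_queue_length < 10:
        return "acceptable"    # single digits (under 2 is the target)
    return "problem"           # double digits and up


if __name__ == "__main__":
    # Hypothetical readings, one per LUN.
    samples = {"H:": 0.4, "I:": 1.8, "J:": 7.0, "K:": 42.0}
    for lun, adq in samples.items():
        print(f"{lun} ADQ={adq} -> {classify_adq(adq)}")
```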

    For example, let's say my corporation has collected 4TB of content over the last 5 years and we want to move it all into SharePoint.  For the sake of this example, we'll ignore the fact that once stored in SharePoint, the content will take up more than 4TB of space.  Also, we have an 8 core SQL Server with, say, 32GB RAM.  You could possibly shuffle that content out as follows:

    Create (8) 1TB RAID 5 or RAID 10 LUNs.  Let's say we map those LUNs to drives H: through O:.  Note that you could just as easily mount them to empty folders if you don't want to use drive letters.  With an 8 core SQL Server and 8 content database LUNs, I can create 8 files per content database and put one of them on each of the different LUNs (neat how that worked out for this example!).

    • With this rig we could pre-create 20 content databases. 
    • All of the database [dbname].MDF files would be on the H: drive. 
    • We then add [dbname2-8].NDF files on the I: through O: drives
    • We then create our 20 site collections probably using the "stsadm -o createsiteinnewdb" command
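As a rough sketch of the pre-creation step, the snippet below just builds the T-SQL text for one empty content database with one data file per LUN. The database name, the `SQLData` folder, the `_2` through `_8` file suffixes, and the initial size are all hypothetical choices for illustration. The sketch also leaves out a LOG ON clause, so SQL Server would put the log file in its default location; you would likely want to place that deliberately too.

```python
from string import ascii_uppercase


def data_file_drives(first: str = "H", count: int = 8) -> list[str]:
    """Drive letters for the data LUNs, e.g. H through O for 8 LUNs."""
    start = ascii_uppercase.index(first)
    return list(ascii_uppercase[start:start + count])


def create_database_sql(db_name: str, drives: list[str], size_mb: int = 1024) -> str:
    """Build a CREATE DATABASE statement with one data file per drive.

    The first file is the .mdf on the first drive; the rest are .ndf files.
    Folder name and file naming are assumptions, not a SharePoint convention.
    """
    files = []
    for i, drive in enumerate(drives, start=1):
        ext = "mdf" if i == 1 else "ndf"
        logical = db_name if i == 1 else f"{db_name}_{i}"
        files.append(
            f"    (NAME = N'{logical}', "
            f"FILENAME = N'{drive}:\\SQLData\\{logical}.{ext}', "
            f"SIZE = {size_mb}MB)"
        )
    return f"CREATE DATABASE [{db_name}]\nON PRIMARY\n" + ",\n".join(files) + ";"


if __name__ == "__main__":
    print(create_database_sql("WSS_Content_01", data_file_drives()))
```

Run it once per database name to get the 20 scripts, review them, and execute them on the SQL Server before creating the site collections.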

    We then go through the effort of getting the content into SharePoint.  <ShamelessPlug>KnowledgeLake has the framework to get this done by the way.</ShamelessPlug>  Once the 4TB of content is done being loaded into the 20 site collections, you will find that each content database is approximately 200GB in size.  That means that each of the 8 data files for a given database is actually 25GB and spread across each of the 8 LUNs.  We now have a 200GB database with excellent I/O numbers and we still have room to double in size without worrying too much about I/O performance.  Of course, your mileage may vary depending on how the LUNs are configured and the performance characteristics of your SAN.
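The arithmetic behind those numbers can be checked with a few lines. This sketch assumes 1TB = 1000GB to match the round figures above and assumes the content spreads evenly across databases and files, which real data rarely does exactly.

```python
def size_plan(total_content_gb: float, database_count: int, files_per_database: int):
    """Return (GB per database, GB per data file, GB used per LUN).

    Assumes an even spread and one file from every database on each LUN.
    """
    per_database = total_content_gb / database_count
    per_file = per_database / files_per_database
    per_lun = per_file * database_count
    return per_database, per_file, per_lun


if __name__ == "__main__":
    per_db, per_file, per_lun = size_plan(4000, database_count=20, files_per_database=8)
    print(f"per database: {per_db:.0f} GB")   # 200 GB
    print(f"per data file: {per_file:.0f} GB")  # 25 GB
    print(f"used per 1TB LUN: {per_lun:.0f} GB")  # 500 GB
```

Each 1TB LUN ends up holding 20 files of 25GB, or about 500GB, which is where the room to double comes from.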

    I want to be clear that this is a hypothetical example of one possible solution.  Every organization has variables that would affect this architecture, thus fulfilling the "it depends" mantra.

    Russ

  • Large Scale Architecture Question

    So the Microsoft SharePoint Conference 2008 wrapped up on Thursday.  What an amazing ride!  It blows me away to see how SharePoint is absolutely exploding.  It fires up my passion for SharePoint technology even more!

    Paul Learning, Andy Hopkins, and I did finally present during the last session slot of the conference.  There weren't as many attendees as I would have liked to see, but what the heck, I usually bail early on the last day too!  Anyway, with the smaller group of only about 30 folks it was a less formal session.  We were able to engage in some quality discussion around scalability and performance.

    One person asked me a question that I'd like to address here.  He talked about the fact that they have a large volume of files in FileNet.  They'd like to move them into SharePoint but they can't come up with a way to logically group them such that they could keep the site collections/content databases inside of the 100GB recommended size.  I asked him how his users accessed the content in FileNet.  He didn't want to go there because the answer was that they "searched" for the content.  Then he said that his users had been exposed to SharePoint and had become accustomed to just navigating straight to the documents they need.

    So of course my answer to him was something like, use MOSS search... embrace the search... love the search...!  Search is like the keys to the kingdom in a large scale MOSS implementation!  He didn't want to hear that answer unfortunately.  That brings me to two points. 

    First, don't expose your users to technology you don't want them to have.  He feels like limiting direct navigation is like taking candy from a baby and I agree to an extent. So I encourage everyone to spend the time up front to design the system and train users up in the way that they should go from the beginning (whenever possible... I know it's hard).

    Second, if you want to unlock the power of your MOSS implementation, you have to immerse yourself in the search capabilities of the MOSS platform.  Mind what you have learned!  Save you it can!  Ok, enough with the Yoda references.  Seriously, I get that it doesn't matter what you put in if you can't get it back out easily.  But we have to break the habits of the S:\ drive.  A hierarchical data structure doesn't help the new employee trying to navigate through 4TB of content!

    Russ

  • Microsoft SharePoint Conference 2008

    Uhhh.  If you haven't gotten the memo yet folks.  SharePoint is HOT!!!!

    I've been to a few different conferences in the last couple years and it is very clear that interest in SharePoint is growing at a rapid pace.  Now that SharePoint has split off from the MS Office conference there is a pure focus at the MS SharePoint Conference 2008. 

    Initial indications were compelling as the conference was sold out over a month in advance.  Then the hotels sold out.  Then upon arrival there were people literally hanging around outside hoping to get in.  One guy commented that if we all didn't have pre-registered RFID badges there probably would have been some serious ticket scalping going on!  It's not hard to see why.  I haven't been around this many smart cerebral type folks in a long time.  You can't help but learn something!

    I grabbed a great session today on Planning for Scale and Capacity.  So much of what they had to say rang true.  Their "It Depends" answer to the inevitable "how many servers do I need" is something that I seem to have given to a lot of customers these days.  Fortunately, these guys have taken a lot of the guesswork out of capacity planning with their excellent Capacity Planning Tool. They did another demo of it today and I'm enamored by how easy it is to use!  It is wonderful to have such a powerful tool that takes into consideration so many variables!  Great work guys!

    Another thing I'm excited about is the story around FAST search.  The message of where FAST slots into the mix is really shaping up.  It's clear that you have Search Express on smaller implementations, Enterprise Search on midrange to large implementations, and then there's FAST which can do the EXTREMELY large implementations with mind boggling scales of capacity and search response times.  I just don't see how any other portal product can compete with this holistic platform.  They've covered all the bases.  SharePoint scales to the moon.  And if you have an EA with Microsoft, YOU ALREADY OWN SHAREPOINT!  Calling all CIOs... let's get to work pulling together all of that intellectual content you've got scattered around!  It just got even harder to come up with a reason not to!

    Anyway, it's shaping up to be an amazing conference.  I have some friends doing some sweet sessions.  Darrin Bishop (my mentor from back in the 2003 days) is doing cool things with PowerShell and SharePoint administration these days and I'm looking forward to Todd Baginski's session on SSO.  Unfortunately, I missed his BDC session today but I heard it was excellent.

    Oh yeah, and what a great perk Microsoft... Free Prometric testing for the "Configuring MOSS" and "Developing MOSS" tests!  I took a crack at the Configuring and knocked it out with flying colors so I'm officially an MCTS now!  I'm hoping to get a chance to take the "Developing MOSS" test tomorrow.

    Also, I want to plug the session that I've been directly involved with!  Remember that whitepaper I mentioned in an earlier post?  Well, Paul Learning (MS), Andy Hopkins (MS), and I have been working on this scalability effort for the last couple months and Paul and Andy will be presenting on much of what will be in that whitepaper!  For any of you at the conference that happen to catch this blog from an aggregator somewhere, PLEASE don't bail on the conference early!  We're one of the last sessions on Thursday!  Come check out the SharePoint Scalability – Practical Application for the Enterprise session at 12:00pm!  It was a late entry and didn't make it into the schedule book.

    Finally, thanks to Todd Baginski for talking us into waiting in line over 2 hours to get into the flight simulator at the Museum of Flight.  It was WELL WORTH THE WAIT!  Paul and I had a blast flipping and rolling that thing like crazy.  I'm looking forward to game night tomorrow!  I've been away from home only 2 days now and in addition to missing the wife and kids, I'm also missing my daily dose of Guitar Hero!  I hope they have it tomorrow!

    Good times.



Posts (c) their respective authors. Everything else (c) 2007 SharePoint Experts