SharePoint Blogs

Russ Houberg's SharePoint Blog

Unlocking the hidden power of SharePoint for document imaging.

January 2008 - Posts

  • Joel Oleson and the Anatomy of Indexing

    OK, so here is the real reason I chose today to on-ramp this blog: I wanted to add a little something to Joel Oleson's post yesterday regarding the Anatomy of Indexing.

    First of all, I'm sure anyone interested in my take on SharePoint is probably well aware of Joel Oleson.  Most of the SharePoint community, myself included, holds Joel in the highest respect.  I respect him for his wealth of SharePoint knowledge and his willingness to share it.  I met him recently and in addition to being a SharePoint guru, I found him to be a very likable guy in general.  Rock on Joel.

    So, about the Anatomy of Indexing.  It is a great read for anyone who's interested in that black box called SharePoint Indexing.  Please make sure you've read it before continuing.  OK, I have just one little point I'd like to add from the document repository front.

    During an incremental crawl, the call to sitedata.asmx yields a result set of ALL the entries in the change log.  This is particularly important to KnowledgeLake because we tend to blast a lot of content into SharePoint in a short period of time (during migration/conversion operations, for example).  It might also apply to standard SharePoint restore operations.  If you find yourself performing some type of action that will cause a high number of document changes all at once, all crawl schedules (full and incremental) should be disabled if at all possible.  If it is not possible to disable them, then the URL path to the library where content is being loaded/restored should be EXCLUDED from the crawl.

    If this guidance is not followed, the call to the sitedata.asmx web service will likely time out due to the sheer volume of data being packaged and shipped as XML (which is fat).  I've experienced this phenomenon firsthand.  You end up with crawls that grind themselves into oblivion and yield a whole lot of unfriendly errors in the crawl log.  Once the load/restore operation is complete, a full crawl should be executed.  If production won't be impacted, it's also a good time to do a complete index reset.
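
    The guidance above can be written down as a small runbook sketch.  This is only an illustration in plain Python: `is_excluded` and `crawl_plan` are hypothetical helper names, the prefix match is a simplification and not SharePoint's actual crawl rule engine, and the step that restores the original crawl settings is an assumption added for completeness.

```python
from urllib.parse import urlsplit


def is_excluded(doc_url: str, excluded_prefix: str) -> bool:
    """True if doc_url is at or below excluded_prefix (same host, case-insensitive).

    Simplified prefix matching for illustration; not SharePoint's crawl rule logic.
    """
    doc, rule = urlsplit(doc_url), urlsplit(excluded_prefix)
    if doc.netloc.lower() != rule.netloc.lower():
        return False
    # Normalize to a trailing slash so /Invoices does not match /InvoicesArchive.
    doc_path = doc.path.lower().rstrip("/") + "/"
    rule_path = rule.path.lower().rstrip("/") + "/"
    return doc_path.startswith(rule_path)


def crawl_plan(can_disable_schedules: bool, library_url: str) -> list[str]:
    """Ordered steps for a bulk load or restore, following the guidance above."""
    if can_disable_schedules:
        first = "disable full and incremental crawl schedules"
    else:
        first = f"exclude {library_url} from the crawl"
    return [
        first,
        "run the load/restore",
        "restore the original crawl settings",  # assumption, not stated in the post
        "run a full crawl",
    ]


if __name__ == "__main__":
    for step in crawl_plan(False, "http://farm/sites/imaging/Invoices"):
        print(step)
```

    The host name and library path in the example are made up.  The point is simply that the exclusion should cover the whole library being loaded, and that the full crawl comes last.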

  • Microsoft Gets FAST!

    Before I continue on with any other posts on this blog, I have to lay some very exciting groundwork.

    I said in my bio that SharePoint can be architected to scale from a single server installation that handles just a handful of static documents to extremely large implementations consisting of multiple farms that can handle 50 terabytes or more. 

    Well, this statement requires a bit of qualification.  I have been part of architecture teams that have loaded tens of terabytes into SharePoint.  This is absolutely possible, and I have actually developed a content load utility that can blast 50 million documents into a properly architected SharePoint system in just a couple of weeks.  But the problem isn't putting data into SharePoint.  It's making sure you can get it back out.
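
    To put that load rate in perspective, here is a back-of-the-envelope sketch.  It assumes "a couple of weeks" means 14 days of nonstop loading, which is an assumption for the arithmetic and not a measured figure; `required_rate` is a hypothetical helper name.

```python
def required_rate(documents: int, days: float) -> float:
    """Sustained documents per second needed to load `documents` in `days`."""
    seconds = days * 24 * 60 * 60
    return documents / seconds


if __name__ == "__main__":
    # Assumption: 14 days of continuous loading, with no pauses or retries.
    rate = required_rate(50_000_000, 14)
    print(f"{rate:.1f} documents per second")  # 41.3 documents per second
```

    Any downtime during the load window pushes the required rate higher.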

    I've always been concerned about the 50 million document limitation for an index server in a SharePoint farm.  Sure, you can have multiple farms.  And with MS Search Server 2008 we can even do federated searches to aggregate results.  But the problem comes with relevance.  I can certainly write code that kicks off searches on multiple farms and aggregates the results, but I lose relevance.  Without the internal ranking engines talking to each other, all we get is a group of unranked links munged together in some sort of list.
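
    To make the relevance problem concrete, here is a minimal sketch of the kind of aggregation code described above.  Each farm's results arrive already ordered by that farm's own ranking, so about the best a client can do without a shared ranking engine is interleave them by position.  This round-robin merge is a generic fallback shown for illustration, not what Search Server 2008 does; `interleave` and the URLs are hypothetical.

```python
from itertools import zip_longest


def interleave(result_lists: list[list[str]]) -> list[str]:
    """Round-robin merge of per-farm result lists, dropping duplicate URLs.

    Each farm's internal order is preserved, but nothing here says whether
    farm A's second hit is actually better than farm B's first hit.
    """
    merged: list[str] = []
    seen: set[str] = set()
    for tier in zip_longest(*result_lists):
        for url in tier:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


if __name__ == "__main__":
    farm_a = ["http://a/doc1", "http://a/doc2", "http://a/doc3"]
    farm_b = ["http://b/doc9", "http://a/doc2"]
    print(interleave([farm_a, farm_b]))
    # ['http://a/doc1', 'http://b/doc9', 'http://a/doc2', 'http://a/doc3']
```

    The merged list is stable and free of duplicates, but its order says nothing about cross-farm relevance, which is exactly the limitation.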

    Enter FAST.  With the recent acquisition offer for FAST, things may change a bit.  I listened to the teleconference.  They didn't want to come right out and talk about the implications that FAST would have on SharePoint, but I think it's pretty obvious.  Heck, SharePoint has been playing nice with FAST since July of last year!  All of a sudden, SharePoint is a player in a much LARGER market.  My dream is that FAST technology will be baked into SharePoint, or at least so closely aligned that we won't be worrying about millions of documents anymore.  We'll be talking about BILLIONS of documents.

  • It Begins

    My name is Russ Houberg (cue the Knight Rider music...OK, don't).

    I've been serving clients as a SharePoint developer/architect for several years now.  I've tinkered around with blog posts in the past but they were always broad topics.  My goal is to focus this blog directly on my experiences with SharePoint.  I don't claim to know it all.  In fact, much of what you find here will be my take on what I've learned from others.  The rest will be the results of my actual successes and failures.

    For the last 3+ years, I've been a technical architect for a great company called KnowledgeLake.  In addition to having an amazing management team, my coworkers are an excellent group of professionals.  In my role at KL, I've specialized in SharePoint architecture, particularly from the point of view of scalability.  I've experienced many of the documented and undocumented features and limitations of this extremely versatile platform.

    I say platform for a reason.  SharePoint in many ways is a blank-slate framework on which you can build just about any web-driven technology.  In our case, we choose to implement it as a document imaging repository.  That puts us squarely in competition with products like FileNet (IBM), Documentum, Captiva and Legato.  But we have something that these guys don't have...a FREE yet fully functional storage repository called Windows SharePoint Services 3.0!  So back in 2003 we used our extensive experience in the ECM industry to layer a high-quality set of capture and imaging tools on top of SharePoint.  Just like that, we began pushing the ECM boundaries in SharePoint.

    So before I get flogged for touting the company wares, I do have a point.  SharePoint is what you make it to be.  Whether you use it as a corporate intranet, team collaboration system, report center, or a document imaging repository, there is always a common theme.  We put content into the system with the expectation that it's easy to get it back out.  That's where I come in.  I'm all about the scalability and index/search aspects of SharePoint.  If SharePoint can't provide you with what you're looking for quickly and easily, then I haven't done my job.  It doesn't matter how much content you have.  Whether it's just a handful of static documents or 50 terabytes of content collected over the last 20 years, SharePoint can handle it with the proper architecture.

    By the way, kudos to SharePoint Experts for hosting this blog site and for putting together some fantastic training.  I've experienced their training, and it's excellent stuff!  KL is having Todd Baginski come in and do the Development Bootcamp next week.  I can't wait!

    So, keep an eye out.  I hope you'll find some interesting stuff in this blog!


Posts (c) their respective authors. Everything else (c) 2007 SharePoint Experts