SharePoint Blogs

The Best Place for SharePoint-related Blogs

Russ Houberg's SharePoint Blog

Unlocking the hidden power of SharePoint for document imaging.

Joel Oleson and the Anatomy of Indexing

OK, so here's the real reason I chose today to on-ramp this blog: I wanted to add a little something to Joel Oleson's post from yesterday on the Anatomy of Indexing.

First of all, I'm sure anyone interested in my take on SharePoint is probably well aware of Joel Oleson.  Most of the SharePoint community, myself included, holds Joel in the highest respect.  I respect him for his wealth of SharePoint knowledge and his willingness to share it.  I met him recently and in addition to being a SharePoint guru, I found him to be a very likable guy in general.  Rock on Joel.

So, about the Anatomy of Indexing.  It is a great read for anyone who's interested in that black box called SharePoint Indexing.  Please make sure you've read it before continuing.  OK, I have just one little point I'd like to add from the document repository front.

During an incremental crawl, the call to the sitedata.asmx web service yields a result set of ALL the entries in the change log.  This is particularly important to KnowledgeLake, as we tend to blast a lot of content into SharePoint in a short period of time (during migration/conversion operations, for example).  But it might also apply to standard SharePoint restore operations.  If you find yourself performing some type of action that will cause a high number of document changes all at once, all crawl schedules (full and incremental) should be disabled if at all possible.  If it is not possible to disable them, then the URL path to the library where content is being loaded/restored should be EXCLUDED from the crawl.

If this guidance is not followed, then the call to the sitedata.asmx web service will likely time out due to the sheer volume of data being packaged and shipped as XML (which is fat).  I've experienced this phenomenon firsthand.  You end up with crawls that grind themselves into oblivion and yield a whole lot of unfriendly errors in the crawl log.  Once the load/restore operation is complete, a full crawl should be executed.  If production won't be impacted, it's also a good time to do a complete index reset.
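To make the order of operations concrete, here is a minimal sketch of the guidance above as a checklist builder.  It does not talk to SharePoint at all; `plan_bulk_load`, `BulkLoadPlan`, and the example URL are hypothetical names made up for illustration.  Two things in it are my own assumptions rather than anything stated above: that the index reset (if you do one) comes before the full crawl, and that the schedules or the exclusion get put back the way they were afterward.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BulkLoadPlan:
    """Ordered operator steps to run before and after a bulk load/restore."""
    before: List[str]
    after: List[str]


def plan_bulk_load(library_url: str,
                   can_disable_crawls: bool,
                   production_can_tolerate_reset: bool) -> BulkLoadPlan:
    """Build a checklist for a bulk load (hypothetical helper, no SharePoint calls)."""
    after: List[str] = []

    if can_disable_crawls:
        # Preferred option: stop every crawl while the content is loading.
        before = ["Disable all crawl schedules (full and incremental)"]
    else:
        # Fallback: keep crawling, but skip the library being loaded.
        before = [f"Exclude {library_url} from the crawl"]
        # Assumption: the exclusion is lifted so the new content gets indexed.
        after.append(f"Remove the crawl exclusion for {library_url}")

    if production_can_tolerate_reset:
        # Assumption: the reset happens before the full crawl rebuilds the index.
        after.append("Reset the index")

    after.append("Run a full crawl")

    if can_disable_crawls:
        # Assumption: the normal schedules are restored once the full crawl is done.
        after.append("Re-enable the crawl schedules")

    return BulkLoadPlan(before=before, after=after)


if __name__ == "__main__":
    # Example: crawls cannot be paused, but an index reset is acceptable.
    plan = plan_bulk_load("http://server/sites/imaging/Invoices",
                          can_disable_crawls=False,
                          production_can_tolerate_reset=True)
    for step in plan.before + ["<run the load/restore>"] + plan.after:
        print(step)
```

The point of writing it down this way is simply that the "before" step is a choice between two options, while the full crawl afterward is not optional.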

Published Jan 29 2008, 12:45 PM by Russ Houberg

About Russ Houberg

Hmmm. Bio. OK. Well, I've been a techie geek as far back as I can remember, which is somewhere in the neighborhood of a software program called "Delta Draw" on an original IBM XT personal computer owned by a close friend of my folks. These days I'm into SharePoint mostly. I work for a great company called KnowledgeLake. We specialize in document-centric transactional content management. That's fancy talk for high-volume scanning using SharePoint as a storage repository and document processing workflow. So I'm particularly interested in scalability and the index/search nuances of SharePoint. I have an amazing wife, two great boys, and a labradoodle. I'm also very active in my local church and little league baseball organizations. I hope you find something useful in my blog.

Posts (c) their respective authors. Everything else (c) 2007 SharePoint Experts