Ok. So here's the real reason I chose today to on-ramp this blog: I wanted to add a little something to Joel Oleson's post from yesterday regarding the Anatomy of Indexing.
First of all, anyone interested in my take on SharePoint is probably well aware of Joel Oleson. Most of the SharePoint community, myself included, holds Joel in the highest respect, both for his wealth of SharePoint knowledge and for his willingness to share it. I met him recently, and in addition to being a SharePoint guru, I found him to be a very likable guy in general. Rock on, Joel.
So, about the Anatomy of Indexing. It is a great read for anyone who's interested in that black box called SharePoint indexing. Please make sure you've read it before continuing... Ok, I have just one little point I'd like to add from the document repository front.
During an incremental crawl, the call to the sitedata.asmx web service yields a result set of ALL the entries in the change log. This is particularly important to KnowledgeLake, as we tend to blast a lot of content into SharePoint in a short period of time (during migration/conversion operations, for example). But it might also apply to standard SharePoint restore operations. If you find yourself performing any action that will cause a high number of document changes all at once, all crawl schedules (full and incremental) should be disabled if at all possible. If it is not possible to disable them, then the URL path to the library where content is being loaded/restored should be EXCLUDED from the crawl.
If this guidance is not followed, the call to the sitedata.asmx web service will likely time out due to the sheer volume of data being packaged and shipped as XML (which is fat). I've experienced this phenomenon firsthand. You end up with crawls that grind themselves into oblivion and yield a whole lot of unfriendly errors in the crawl log. Once the load/restore operation is complete, a full crawl should be executed. If production won't be impacted, it's also a good time to do a complete index reset.
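For anyone who scripts their bulk loads, here is a minimal sketch of that sequence in Python. It is only an illustration of the ordering: disable the schedules if you can, fall back to excluding the library path if you can't, put things back afterward, and kick off a full crawl once the load has finished. Everything named here (RecordingCrawlAdmin, crawls_paused, and the adapter methods) is hypothetical and made up for this example; none of it is a SharePoint API, and the stand-in class just records calls so the flow can be tried without a farm.

```python
from contextlib import contextmanager


class RecordingCrawlAdmin:
    """Hypothetical stand-in that only records calls; it does not talk to SharePoint."""

    def __init__(self, can_disable_schedules=True):
        self.can_disable_schedules = can_disable_schedules
        self.calls = []

    def disable_schedules(self):
        # Returns True when the full and incremental schedules could be switched off.
        if self.can_disable_schedules:
            self.calls.append("disable_schedules")
        return self.can_disable_schedules

    def enable_schedules(self):
        self.calls.append("enable_schedules")

    def exclude_path(self, url):
        self.calls.append("exclude_path " + url)

    def include_path(self, url):
        self.calls.append("include_path " + url)

    def start_full_crawl(self):
        self.calls.append("start_full_crawl")


@contextmanager
def crawls_paused(admin, library_url):
    """Keep the crawler away from library_url while a bulk load runs."""
    disabled = admin.disable_schedules()
    if not disabled:
        # Fallback from the guidance above: exclude the target library instead.
        admin.exclude_path(library_url)
    try:
        yield
    finally:
        # Always undo the temporary change, even if the load failed.
        if disabled:
            admin.enable_schedules()
        else:
            admin.include_path(library_url)
    # Reached only when the load finished without raising an exception.
    admin.start_full_crawl()


# Example run with a placeholder URL and a placeholder "load" step.
admin = RecordingCrawlAdmin(can_disable_schedules=False)
with crawls_paused(admin, "http://example.invalid/sites/docs/Library"):
    admin.calls.append("bulk load")  # the migration or restore would go here
print(admin.calls)
```

In this sketch the full crawl is requested only when the load completes without an error, on the assumption that a failed load needs a look before anything gets crawled. The optional index reset is left out on purpose, since that one depends on whether production can take the hit.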