
SharePoint Blogs

The Best Place for SharePoint-related Blogs

Russ Houberg's SharePoint Blog

Unlocking the hidden power of SharePoint for document imaging.

February 2008 - Posts

  • MOSS Search Results Can Be Near Real Time

    Well, I'm not sure how many times I can mention KnowledgeLake and "Transactional Content Management" without getting flogged by the blog hosts for peddling our wares again... but here I go again.

    So once again, I'll set the stage with the world I work in every day.  KL is all about facilitating document processing all the way from paper to grave.  By grave I mean the end of a document lifecycle.  So after KL Capture Server blasts a batch of documents into SharePoint we often take advantage of some form of workflow to kick off additional document/account processing. 

    For example, imagine a lending branch scanning in and releasing a series of documents related to a loan application.  Upon receipt of the actual application document a workflow might be initiated.  Here's where it gets interesting.  During loan application processing there might be several approval steps that are based on peripheral documents such as income statements and/or loan collateral documentation.  If the institution is processing many loans per day, they don't have time to wait around for an incremental crawl to take an hour or sometimes even 15 minutes.

    So what can we do to really tighten down search result availability?  Well, in this type of environment I would architect the farm a certain way and set up the incremental crawl for the content source to fire literally every minute.  The information below outlines how I would configure the farm to squeeze the absolute most performance out of crawl processing.

    Implementation:
    • The farm should include a separate (and beefy) machine for Index Server.  I recommend a box with a MINIMUM of 4 (64bit) CPU cores and 16GB RAM.  The Query role should not be enabled on this server.  Note that you can't mix 32bit and 64bit WFEs in the farm, so if you're running 32bit front ends, stick with a 32bit Index Server.
    • In order to get that hefty Index Server to take advantage of available resources, we need to force it to use more threads while crawling content.  We can do that using 1 of 2 possible techniques:
    • OPTION 1: When configuring the "Office SharePoint Server Search" role on the Index Server, set the Indexer Performance to "Maximum":

    image 

    • OPTION 2: We can create a crawler impact rule in Application Management => Manage search service => Crawler Impact Rules => Add Rule
      NOTE: Crawler Impact Rules take precedence over Indexer Performance Settings and since the default simultaneous requests is based on the number of processors on the index server, it's possible that the "Maximum" indexer performance setting could be overridden by the default crawler impact setting (even if no crawler impact rules exist).

    image

    • Then, regardless of which option is chosen, we need to set the "Target" Web Front End to be the actual Index Server itself (WFE role must be enabled) or possibly a dedicated "target" WFE machine that is not used for serving content to end users.

    image

    • Finally, we set the incremental crawl schedule to fire in 1 minute increments.  Navigate to the Shared Services Administration page for your SSP.  Then click Search Settings => Content sources and crawl schedules => [Content Source Name].  Then click "Create[/Edit] schedule" under the Incremental Crawl field.  Set the values as identified below and click OK => OK.

    image

    That should do it.  You've just configured the search service to kick off incremental crawls in 1 minute intervals!  Shortly after an incremental crawl completes, if any changes were made to any of the index files, those changes will be propagated out to the Query (Search) servers.  Once that propagation has been processed, the content will be available for searching!
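
    To put rough numbers on "near real time", here is a minimal Python sketch of the worst-case delay between a document landing in SharePoint and showing up in search results.  The crawl and propagation durations in it are made-up placeholders, not measurements, so plug in what you observe on your own farm.  It assumes a document that arrives just after a crawl starts has to wait for the next scheduled crawl.

```python
def worst_case_search_latency(interval_s, crawl_s, propagation_s):
    """Upper bound, in seconds, on how long a new document stays unsearchable.

    Assumes the document arrives just after a crawl has started, so it
    waits one full interval for the next crawl, then for that crawl to
    finish, then for propagation to the Query servers.
    """
    if crawl_s > interval_s:
        raise ValueError("crawl does not finish within the interval")
    return interval_s + crawl_s + propagation_s


# Hypothetical numbers: 60 s schedule, 20 s crawl, 15 s propagation.
print(worst_case_search_latency(60, 20, 15))  # 95
```

    Run the same numbers with a 15 minute (900 second) interval and the interval term dominates everything else, which is why the schedule is the first thing to tighten.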

    Monitoring Performance:
    • Keep an eye on the "Manage Content Sources" page in the SSP administration site.  It will tell you the indexing status. 
      • You want to watch the Indexing Status field.  It will say "Crawling Incremental" when it's crawling.  It should say "Idle" when it is finished crawling.  Refresh often to ensure that at some point during the 1 minute interval it is able to finish the incremental crawl.
      • If Indexing Status never changes to Idle then unfortunately you don't have the horsepower to maintain a 1 minute incremental crawl interval.  You should increase the interval 1 minute at a time until you verify that your crawl can complete in the allotted amount of time.
    • Keep an eye on the performance of your Index Server, Target Server (if applicable), and your SQL Server.  If ramping up crawl performance has created an uncomfortable increase in system resource utilization on ANY of these servers, you can either back down the crawl threads (Crawler Impact Rules/Indexer Performance) or you can increase the incremental crawl interval or both.
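
    The "bump the interval by a minute until the crawl fits" rule above can be sketched in a few lines of Python.  This is only an illustration: the crawl durations are hypothetical values you would collect yourself by timing how long the status stays on "Crawling Incremental", and the function name is mine, not anything SharePoint provides.

```python
def smallest_workable_interval(crawl_durations_s, start_minutes=1):
    """Return the smallest whole-minute interval the slowest observed crawl fits in."""
    if not crawl_durations_s:
        raise ValueError("need at least one observed crawl duration")
    slowest = max(crawl_durations_s)
    interval = start_minutes
    # Mirror the manual procedure: raise the interval one minute at a time
    # until the slowest crawl finishes before the next one is due.
    while slowest >= interval * 60:
        interval += 1
    return interval


observed = [42, 55, 131, 48]  # hypothetical crawl times in seconds
print(smallest_workable_interval(observed))  # 3
```

    Sizing against the slowest crawl rather than the average is deliberate, since a single crawl that overruns means the status never gets back to "Idle" before the next one is due.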

    Additional Points of Interest:
    • There are many factors related to crawl performance.  Everything from how powerful your Index, Target, and SQL Servers are to the I/O performance of the SQL Server databases.  The SSP Search database is particularly vulnerable as it can become very large quickly. 
    • Not all environments are the same.  Your mileage may vary.  For example, KnowledgeLake solutions often revolve around high volumes of TIFF files.  There is no TIFF iFilter available for MOSS out of the box so the "NULL" iFilter is used.  This means that the document metadata is gathered and inserted into the property store in the SSP Search database but the actual binary file doesn't have to be parsed.  So our indexing speed is often much faster.
    • With such a high load created on the Index Server and SQL Server during crawl processing, it's recommended that any Full Crawls be scheduled during off peak times (evenings and weekends, etc).  This is because the Full Crawl will obey the same threading rules used by the incremental crawl.  This could yield a very high level of stress on the SQL Server over an extended period of time.

    OK.  That's about all I have to say about that.  Once again, the cool thing about SharePoint is that it is so configurable!  If the changes I specified here don't work for you, please don't flame me :)  !  Just back off of the threading or put the settings back where they started and you'll be just fine.

  • Windows Server 2003 Update Breaks RDP?

    It seems that one of the Windows Server 2003 Updates manages to hose up our ability to remote desktop (RDP) into that server.  I've seen this happen several times now and never found a clean fix for it through internet research so I spent several hours one day trying to figure out how to overcome the problem.

    First of all, there are many things that can impact remote desktop.  Oftentimes the problem is related to a hardware router/firewall, other software firewall, or Windows Firewall.  Windows Firewall isn't enabled by default in Win Server 2003 but it is in Windows XP/Vista.  You can test to see if Windows Firewall is causing the problem pretty easily by disabling the Windows Firewall service in Service Manager and rebooting the box.  If you still can't get in, leave the firewall disabled until you fix the problem.
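
    A quick way to check from another machine whether anything is answering on the RDP port at all is a plain TCP connect.  Here is a minimal Python sketch, assuming RDP is on its default port of 3389 (adjust it if yours has been changed).  The function name is mine.  A successful connect only tells you something is listening; it doesn't prove that a logon will work.

```python
import socket


def rdp_port_open(host, port=3389, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name lookup failures.
        return False


# Usage, with "server01" as a placeholder host name:
#     rdp_port_open("server01")
```

    If this returns False even with the firewalls out of the picture, the listener itself may not be accepting connections on that adapter.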

    So with that out of the way, this tip falls more into the category of "I didn't change anything and all of a sudden RDP isn't working".

    I can't pinpoint which update causes the problem, but I believe I know what it does to break things.  It appears that the binding of the RDP protocol to the network adapters on the server becomes broken after the update.  In order to fix the problem, follow this procedure:

    Start by running the Terminal Services Configuration tool.

    • Click on the Connections "folder"
    • Right click the RDP-TCP connection and select Properties
    • Select Network Adapter tab
    • Change "All network adapters..." to the network adapter bound to the IP address that you use for RDP.  If it's already associated directly to that network adapter, then change to "All network adapters..."
    • Click OK
    • Reboot the server

    I've found that if I follow this procedure after losing RDP to a Windows Server 2003 update, it works every time.  By the way, this can all be done remotely in a WMI script if you've got the skills for that.  I'm not a WMI script guru by any stretch but I was able to figure out the proper code in about an hour.

    There are certainly other obstacles that can cause problems with RDP, but this is a big one that I don't think many people realize.  Hope this helps somebody.

  • SharePoint Scalability Whitepaper at SP Conference 08

    Well, the last couple weeks have been a whirlwind.

    In the past few months, I've been working hard with Paul Learning (Microsoft Consulting Services), Andy Hopkins (Technical Development Manager, Microsoft) as well as a few guys on the MS SharePoint Product team.  Paul and I have been busy loading up a massive Fujitsu server farm while Andy burns through logistical hurdles all in an effort to develop a SharePoint Scalability whitepaper.  It looks like we're going to have this whitepaper done in time for SP Conference 08!  So I thought I would provide a little background.

    It's been a ridiculous ride.  We started with an incredible hardware rig from Fujitsu, complete with blade servers, rack servers, an Itanium SQL box and a full 10TB of storage space on a Fujitsu Eternus SAN.  My job was to use some of the KnowledgeLake secret sauce and a little creative multithreading to blast 50 million records into SharePoint.  Here's a hint as to the scalability of SharePoint... We were able to use the load tool to send the 50 million documents into SharePoint at a peak rate of 7+ MILLION documents PER DAY!
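
    For a sense of scale, here is the back-of-the-envelope arithmetic behind those two figures as a small Python sketch.  It treats the peak rate as a flat 7 million per day, which is a simplifying assumption (the real rate was "7+" at peak and surely varied over the run), so read the results as rough bounds rather than measurements.

```python
DOCS_TOTAL = 50_000_000
PEAK_DOCS_PER_DAY = 7_000_000  # quoted as "7+ million", so this is a floor
SECONDS_PER_DAY = 86_400

docs_per_second = PEAK_DOCS_PER_DAY / SECONDS_PER_DAY
days_at_peak = DOCS_TOTAL / PEAK_DOCS_PER_DAY

print(round(docs_per_second), round(days_at_peak, 1))  # 81 7.1
```

    In other words, roughly 81 documents every second, and about a week to load all 50 million if the peak rate could be held the whole time.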

    Anyway, word has leaked out that we were working on this whitepaper so I thought I might as well blog about it.  In the last month or so, we've had a lot of folks contacting Microsoft and KnowledgeLake about scalability architecture and possibly executing similar tests "on their hardware".  It's amazing how much buzz this is getting and we haven't even advertised it! 

    So the whitepaper is coming soon!  There will be a big splash at Microsoft SharePoint Conference '08 and probably a couple webcasts after that.  We're here to sing the story.  SharePoint CAN SCALE, and we can prove it!

    This is the real reason I wanted to start this blog.  I wanted a forum to talk about the incredible scalability story that SharePoint has to offer.  I'm a believer and I'm here to make you one too!

  • Todd Baginski and His Excellent SharePoint Dev Course

    Just wanted to give a shout out to Todd Baginski and his excellent SharePoint Development Bootcamp training course.  I'm sure many of you that cruise SharePoint Blogs are quite familiar with him.  But for you Google Searchers (or Live Searchers Paul ;) ) who are looking for a review of his class, here's the bottom line.  If you haven't had a chance to attend one of his classes or at least one of his conference lectures, you're missing out.

    I've been writing code for about 14 years now and I've been around the SharePoint block since late 2003.  I've been to many classes and lectures on various topics in the SharePoint space.  I've endured several that have been light on content and full of fluff with maybe one or two interesting points.  Not in Todd's class!  This guy has a deep understanding of the nuts and bolts that make up the collective SharePoint API and it was a pleasure to listen to him (even though business called me out of a few sessions)!

    Also, he left behind a set of resources that are simply amazing.  Added to my personal knowledge base is a collection of easily referenceable courseware chapters, code snippets, and various handy SharePoint utilities.  Good form Todd.  Good form.

    On a personal note, I'd like to say something else about Todd.  For all the knowledge of SharePoint development concepts that he possesses, there isn't even a hint of ego. It's easy for confidence in one's abilities to bleed over into arrogance and that goes for any walk of life.  It's not a problem that Todd has.  He's a very humble person high on character and passionate about teaching others what he knows.  I really respect him for that.

    Thanks Todd.  It was a pleasure.  See you at the MS SharePoint Conference '08.


Posts (c) their respective authors. Everything else (c) 2007 SharePoint Experts