Wednesday, February 22, 2017

Crawling content

Before end users can use the enterprise search functionality in Microsoft Search Server 2008 to search for content, you must first crawl the content that you want to make available for users to query. For the purpose of this article, content is an item that can be crawled, such as a webpage, a Microsoft Office Word document, or a SharePoint site.

To crawl content, do the following:

  1. Create a content source     A content source defines the type of repository that contains the content you want to crawl, the start addresses from which to start crawling, the behavior to use when crawling, and the crawling schedule.

    For information about creating a content source, see Manage content sources.

  2. Specify the credentials to use when crawling all URLs or a specific range of URLs     By default, the default content access account uses Windows domain user credentials to crawl the content repositories that are defined by content sources. Alternatively, you can use a crawl rule to specify different credentials for a URL or a range of URLs: a different content access account, a client certificate, forms credentials, or a cookie.

    For information about setting the default content access account, see Configure default content access account. For information about using a crawl rule, see Manage crawl rules.

  3. Configure proxy server settings for search     When you crawl content that is hosted outside your network, the crawler typically reaches the host server through a proxy server. In this case, verify the proxy server settings and configure them in Search Server 2008. To do this, on the Search Administration page, under Crawling, choose Proxy and timeouts. You usually need to configure this setting only once.

  4. Start a full crawl    For comprehensive information about crawling content, see the topic named "Crawl content" at the website for Microsoft Search Server 2008 on TechNet. To test your configuration, begin by crawling a small amount of content defined in a single content source. After that crawl succeeds, expand the scope of your content sources to build your index.

    For information about starting a full crawl, see Start a full crawl.

  5. View the crawl log     During the crawl, we recommend that you view the crawl log to check on the crawl's progress. Viewing the log lets you confirm that the crawl is succeeding or detect problems. Common problems are authorization failures and unreachable hosts. When you see problems in the crawl log, you can stop the crawl, adjust the settings on the Manage Content Sources, Manage Crawl Rules, and Manage Farm-Level Search Settings pages, and then try the crawl again.
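
To make steps 1 and 2 concrete, here is a minimal Python sketch of the ideas they describe: a content source as a record of repository type, start addresses, crawl behavior, and schedule, plus a lookup that picks credentials for a URL from crawl rules and falls back to the default content access account. This is an illustration only, not Search Server 2008 code or its API. The names ContentSource, CrawlRule, and credentials_for are hypothetical, and the first-matching-rule lookup with wildcard patterns is an assumption made for the example.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import List, Optional


@dataclass
class ContentSource:
    """Hypothetical model of the four things a content source defines."""
    name: str
    repository_type: str            # for example "web" or "sharepoint"
    start_addresses: List[str]      # URLs where crawling begins
    crawl_behavior: str             # free-text description of crawl scope
    schedule: Optional[str] = None  # None means crawls are started manually


@dataclass
class CrawlRule:
    """Hypothetical crawl rule: a wildcard URL pattern and a credential label."""
    path_pattern: str
    credentials: str


def credentials_for(url: str, rules: List[CrawlRule], default_account: str) -> str:
    """Return the credentials of the first rule whose pattern matches the URL.

    Assumption for this sketch: rules are checked in list order and the
    first match wins. If no rule matches, the default content access
    account is used.
    """
    for rule in rules:
        if fnmatchcase(url, rule.path_pattern):
            return rule.credentials
    return default_account


# Example setup with made-up values.
source = ContentSource(
    name="Partner site",
    repository_type="web",
    start_addresses=["http://partner.example.com/"],
    crawl_behavior="crawl only within the server of each start address",
)
rules = [CrawlRule("http://partner.example.com/*", "forms credentials")]

for address in source.start_addresses:
    print(address, "->", credentials_for(address, rules, "DOMAIN\\crawler"))
```

Starting with a single small content source like this one mirrors the advice in step 4: confirm that credentials resolve as expected for a few start addresses before widening the crawl.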
