Tuesday, January 10, 2017

Manage crawler impact rules

A crawler impact rule defines the rate at which the Office SharePoint Search service requests documents from a Web site during crawling. The rate can be defined either as the number of simultaneous documents requested or as the delay between requests. In the absence of a crawler impact rule, the number of simultaneous documents requested ranges from 5 through 16 depending on the hardware resources of the indexer.

You can use crawler impact rules to modify loads placed on sites when you crawl them.

Site name expressions are evaluated in order, and the first matching rule is applied, so you should list crawler impact rules from most specific to most general. For example, * must always be the last rule in the list; otherwise, no rule listed after it will ever apply. If you create a new rule while a crawl is in progress, the new rule takes effect as soon as you save it: you do not need to wait for the crawl to finish (though content that has already been crawled is not subject to the new rule).
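The first-match evaluation described above can be sketched in Python. This is an illustrative model, not SharePoint's actual implementation; the patterns and settings below are hypothetical examples, and Python's fnmatch happens to support the same * and ? wildcards the rule syntax uses:

```python
from fnmatch import fnmatch

# Rules are (site_pattern, setting) pairs, listed most specific first.
# Patterns and settings here are illustrative, not real configuration.
rules = [
    ("samples.microsoft.com", {"simultaneous_requests": 2}),
    ("*.microsoft.com",       {"simultaneous_requests": 4}),
    ("*.com",                 {"delay_seconds": 5}),
    ("*",                     {"simultaneous_requests": 8}),  # catch-all must be last
]

def effective_rule(site):
    """Return the setting of the first rule whose pattern matches the site name."""
    for pattern, setting in rules:
        if fnmatch(site, pattern):
            return setting
    return None  # no rule matched: the crawler falls back to its defaults

print(effective_rule("samples.microsoft.com"))
```

Because samples.microsoft.com appears before *.microsoft.com and *.com, the most specific setting wins; if the catch-all * were listed first, every lookup would stop there.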

To add, edit, delete, or reorder crawler impact rules, you must first open the Crawler Impact Rules page:

  • On the Search Administration page, under Crawling, click Crawler impact rules.

What do you want to do?

Add a crawler impact rule

Edit a crawler impact rule

Delete a crawler impact rule

Reorder crawler impact rules

Add a crawler impact rule

  1. On the Crawler Impact Rules page, click Add Rule.

  2. On the Add Crawler Impact Rule page, in the Site box in the Site section, type the URL of the site but exclude the protocol (for example, do not include http://). The following table shows the wildcard characters that you can use in the site name when adding a rule.

  • Use * as the site name to apply the rule to all sites.

  • Use *.* as the site name to apply the rule to sites whose names contain dots.

  • Use *.site_name.com as the site name to apply the rule to all sites in the site_name.com domain (for example, *.adventure-works.com).

  • Use *.top-level_domain (such as *.com or *.net) as the site name to apply the rule to all sites that end with that top-level domain (for example, .com or .net).

  • Use ? to replace a single character in a rule. For example, *.adventure-works?.com applies to all sites in the domains adventure-works1.com, adventure-works2.com, and so on.

  Note: You can create a crawler impact rule for *.com that applies to all Internet sites whose addresses end in .com. For example, an administrator of a portal might add a content source for samples.microsoft.com. The rule for *.com applies to this site unless you add a crawler impact rule specifically for samples.microsoft.com.

  3. In the Request Frequency section, select one of the following options:

    • Request up to the specified number of documents at a time and do not wait between requests      You can specify the maximum number of requests that the search service can make at one time to the site. On the Simultaneous requests menu, click the number of simultaneous requests to perform.

    • Request one document at a time and wait the specified time between requests      You can specify a delay between requests. The search service makes one request to the site at a time and waits the specified amount of time before making the next request. In the Time to wait (in seconds) box, type the time to wait between requests. The minimum is one second, and the maximum is 999 seconds.

      If the request rate is too high, the search service can overload some Web sites with requests.

  4. Click OK.
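The two request-frequency modes above can be modeled as follows. This is a hedged sketch, not SharePoint's crawler code: a semaphore caps the number of simultaneous requests in the first mode, and a fixed sleep enforces the wait between requests in the second. The fetch function is a hypothetical stand-in for an actual HTTP request.

```python
import threading
import time

def fetch(url):
    """Illustrative stand-in for downloading a document from the site."""
    return f"contents of {url}"

def crawl_concurrent(urls, simultaneous_requests):
    """Mode 1: request up to N documents at a time, with no wait between requests."""
    gate = threading.Semaphore(simultaneous_requests)
    results = {}

    def worker(url):
        with gate:  # blocks while the maximum number of requests is in flight
            results[url] = fetch(url)

    threads = [threading.Thread(target=worker, args=(u,)) for u in urls]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

def crawl_delayed(urls, time_to_wait):
    """Mode 2: one request at a time, waiting time_to_wait seconds between requests."""
    results = {}
    for i, url in enumerate(urls):
        results[url] = fetch(url)
        if i < len(urls) - 1:
            time.sleep(time_to_wait)  # pause before the next request
    return results
```

The trade-off mirrors the UI: the concurrent mode finishes crawls faster but places a heavier load on the site, while the delayed mode keeps load minimal at the cost of crawl speed.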

Edit a crawler impact rule

  • On the Crawler Impact Rules page, in the list of rules, click Edit on the menu of the rule that you want to edit.

    The settings that you can edit are described in the Add a crawler impact rule section.

Delete a crawler impact rule

  • On the Crawler Impact Rules page, in the list of rules, click Delete on the menu of the rule that you want to delete.

Reorder crawler impact rules

  • On the Crawler Impact Rules page, in the list of rules, in the Order column, select a value in the drop-down list that specifies the position you want the rule to occupy.

    The rule that currently occupies that position is shifted down by one position, along with all the rules below it.
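The shift-down behavior described above can be modeled as a simple list move; this is an illustrative sketch, not the product's code, and move_rule is a hypothetical helper:

```python
def move_rule(rules, rule, new_position):
    """Move `rule` so it occupies `new_position` (0-based) in the list.

    The rule currently at that position, and every rule below it,
    shifts down by one place.
    """
    rules = list(rules)  # work on a copy
    rules.remove(rule)
    rules.insert(new_position, rule)
    return rules

rules = ["samples.microsoft.com", "*.microsoft.com", "*.com", "*"]
print(move_rule(rules, "*.com", 0))
# The rule previously at position 0, and all rules after it, shift down one place.
```

Note that after any reorder, the catch-all * rule should still end up last, or the rules after it will never apply.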
