Thursday, June 20, 2013

Understanding Crawl and Index components interactions in SharePoint 2013 - Part 1

Hi,

I’m writing this post to describe the main components of SharePoint 2013 Search and how they relate to each other.
As you may know, the Search components in SharePoint 2013 have changed from SharePoint 2010, because the search components and their dependencies were re-architected.
I will explain the purpose of the 4 main components (marked with an asterisk below) so that they are easier to understand when you plan your SharePoint farm deployment. Search is a crucial component to scale for SharePoint collaboration and user adoption.

Main SharePoint 2013 Search Components:

1)      Crawl component*.

2)      Content Processing component*.

3)      Analytics processing component.

4)      Index component*.

5)      Query Processing component*.

6)      Search Administration component.

*These components are covered in this article.

SharePoint 2013 Search databases:

1)      Crawl database.

2)      Link database.

3)      Analytics reporting database.

4)      Search Administration database.

First and foremost, we need to understand how the crawl components interact. Here I will talk about 3 main components:

a)      Crawl Component

b)      Content Processing component

c)       Crawl database

- Crawl component: The crawl component is responsible for crawling content sources. It delivers crawled items, both the actual content and their associated metadata, to the content processing component.

The crawl component invokes connectors or protocol handlers that interact with content sources to retrieve data. Multiple crawl components can be deployed to crawl simultaneously.

Note: The crawl component uses one or more crawl databases to temporarily store information about crawled items and to track crawl history.

- Crawl database: The crawl database contains detailed tracking and historical information about crawled items. This database holds information such as the last crawl time, the last crawl ID and the type of update during the last crawl.

- Content processing component: The content processing component is placed between the crawl component and the index component. It processes crawled items and feeds these items to the index component.

The content processing component transforms crawled items into artifacts that can be included in the search index by carrying out operations such as document parsing and property mapping.

Both the content processing component and the query processing component perform linguistics processing. Examples of linguistics processing during content processing are language detection and entity extraction.

The content processing component writes information about links and URLs to the link database.
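To make the property mapping step a bit more concrete, here is a minimal Python sketch of the idea: properties found during the crawl are renamed to the properties that end up in the index, and anything without a mapping is dropped. This is only a conceptual model, not SharePoint code. The names `PROPERTY_MAP` and `map_properties`, and the property names in the table, are illustrative assumptions of mine.

```python
from typing import Dict

# Hypothetical mapping table (illustrative names only):
# crawled property name -> property name used in the index.
PROPERTY_MAP: Dict[str, str] = {
    "ows_Title": "Title",
    "ows_Author": "Author",
    "ows_Modified": "LastModifiedTime",
}


def map_properties(crawled: Dict[str, str]) -> Dict[str, str]:
    """Rename crawled properties that have a mapping; ignore the rest."""
    mapped: Dict[str, str] = {}
    for name, value in crawled.items():
        target = PROPERTY_MAP.get(name)
        if target is not None:
            mapped[target] = value
    return mapped


# Usage: only the mapped property survives.
indexable = map_properties({"ows_Title": "Project Plan", "ows_Internal": "x"})
```

In this toy model an unmapped property simply never reaches the index, which is why the mapping matters for what users can search on.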

The following shows the data flow and dependencies between the crawl and index components:

Crawl component → Store → Crawl DB → Fetch/process crawled items

→ Content processing component → Content processing & extraction → Index component
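The flow above can be sketched as a small Python model. It is a simplification under my own assumptions, not how SharePoint is implemented: the class names (`CrawlDatabase`, `ContentProcessor`, `IndexComponent`), the `run_crawl` function, and the idea of splitting text into lowercase terms as "processing" are all hypothetical stand-ins.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CrawledItem:
    """A crawled item: the actual content plus its metadata."""
    url: str
    content: str
    metadata: Dict[str, str] = field(default_factory=dict)


class CrawlDatabase:
    """Toy stand-in for the crawl database: last crawl ID per URL."""

    def __init__(self) -> None:
        self.last_crawl_id: Dict[str, int] = {}

    def track(self, url: str, crawl_id: int) -> None:
        self.last_crawl_id[url] = crawl_id


class ContentProcessor:
    """Toy content processing: 'parses' content into lowercase terms."""

    def process(self, item: CrawledItem) -> Dict[str, object]:
        return {
            "url": item.url,
            "terms": item.content.lower().split(),
            "properties": dict(item.metadata),
        }


class IndexComponent:
    """Toy index component: stores whatever it is fed."""

    def __init__(self) -> None:
        self.items: List[Dict[str, object]] = []

    def write(self, processed: Dict[str, object]) -> None:
        self.items.append(processed)


def run_crawl(source: Dict[str, str], crawl_id: int, crawl_db: CrawlDatabase,
              processor: ContentProcessor, index: IndexComponent) -> None:
    """Crawl -> track in crawl DB -> content processing -> index."""
    for url, content in source.items():
        crawl_db.track(url, crawl_id)
        item = CrawledItem(url=url, content=content, metadata={"source": "demo"})
        index.write(processor.process(item))


# Usage with a one-document, in-memory "content source".
crawl_db = CrawlDatabase()
index = IndexComponent()
run_crawl({"http://intranet/a": "Quarterly Sales Report"}, 1,
          crawl_db, ContentProcessor(), index)
```

The point of the sketch is the ordering of the hand-offs: the crawl side only tracks and delivers items, and nothing reaches the index without passing through content processing first.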

Second, once the data has been processed and passed to the index component, 3 other main components interact to provide search capability in SharePoint:

a)      Index Component

b)      Index Partition

c)       Query Processing Component

- Index component: An index component is the logical representation of an index replica.

In the search architecture, you have to provision one index component for each index replica.

The index component receives processed items from the content processing component and writes those items to an index file.

The index component receives queries from the query processing component and provides result sets in return.

Queries are sent to the index replicas through the query processing component. The system routes and load balances the incoming queries to the index replicas.

 
- Index partition: An index partition is a logical portion of the entire search index.

The search index is the aggregation of all index partitions.

- Query processing component: The query processing component is between the search front-end and the index component.

The query processing component analyzes and processes search queries and results.

Both the query processing component and the content processing component perform linguistics processing. Examples of linguistics processing during query processing are word-breaking and stemming.

When the query processing component receives a query from the search front-end, it analyzes and processes the query to attempt to optimize precision, recall, and relevancy. The processed query is then submitted to the index component.

The index component returns a result set based on the processed query back to the query processing component, which in turn processes that result set before sending it back to the search front-end.
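The relationship between partitions, replicas and query processing described above can be sketched in Python as follows. This is a conceptual toy under my own assumptions: choosing a partition by hashing the URL, picking replicas round-robin, and treating a query as "all terms must match" are illustrative choices, not documented SharePoint behavior. Only crude lowercasing and whitespace splitting stand in for word-breaking; stemming is not modeled.

```python
import itertools
import zlib
from typing import Dict, List, Set


class IndexReplica:
    """One index component: holds a full copy of its partition."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.postings: Dict[str, Set[str]] = {}

    def write(self, url: str, terms: List[str]) -> None:
        for term in terms:
            self.postings.setdefault(term, set()).add(url)

    def search(self, term: str) -> Set[str]:
        return set(self.postings.get(term, set()))


class IndexPartition:
    """A logical portion of the index, served by one or more replicas."""

    def __init__(self, name: str, replica_count: int) -> None:
        self.replicas = [IndexReplica(f"{name}-r{i}") for i in range(replica_count)]
        self._next_replica = itertools.cycle(self.replicas)

    def write(self, url: str, terms: List[str]) -> None:
        for replica in self.replicas:  # every replica gets the item
            replica.write(url, terms)

    def search(self, term: str) -> Set[str]:
        # Assumption: simple round-robin load balancing across replicas.
        return next(self._next_replica).search(term)


class SearchIndex:
    """The search index is the aggregation of all index partitions."""

    def __init__(self, partition_count: int, replica_count: int) -> None:
        self.partitions = [IndexPartition(f"p{i}", replica_count)
                           for i in range(partition_count)]

    def write(self, url: str, terms: List[str]) -> None:
        # Assumption: an item lands in one partition, chosen by URL hash.
        slot = zlib.crc32(url.encode("utf-8")) % len(self.partitions)
        self.partitions[slot].write(url, terms)

    def search(self, term: str) -> Set[str]:
        hits: Set[str] = set()
        for partition in self.partitions:
            hits |= partition.search(term)
        return hits


class QueryProcessor:
    """Sits between the front-end and the index."""

    def __init__(self, index: SearchIndex) -> None:
        self.index = index

    def query(self, text: str) -> List[str]:
        terms = text.lower().split()  # crude stand-in for word-breaking
        if not terms:
            return []
        results = self.index.search(terms[0])
        for term in terms[1:]:
            results &= self.index.search(term)
        return sorted(results)  # post-process the result set
```

A front-end would only ever call `QueryProcessor.query`; in this model it never talks to a partition or replica directly, which mirrors the layering described above.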
 
 
 
These are my thoughts on the main interacting Search components in SharePoint 2013. Drop me a line if you have any additions or questions.
 
 
-ME
 
