What Are Web Search Engines?

A web search engine is a specialized computer server that searches for information on the Web. The results of a user query are returned as a list (often called hits). The hits may include web pages, images, and other types of files.

Some search engines also search and return data available in public databases or open directories. Search engines differ from web directories in that web directories are maintained by human editors, whereas search engines operate algorithmically or by a combination of algorithmic and human input.

Web search engines are large data mining applications. Data mining techniques are used in all aspects of search engines, including crawling (e.g., deciding which pages should be crawled and how often), indexing (e.g., selecting pages to be indexed and deciding to what extent the index should be constructed), and searching (e.g., deciding how pages should be ranked, which advertisements should be added, and how the search results can be customized or made “context aware”).
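The indexing and searching steps can be illustrated with a minimal sketch. It assumes a tiny in-memory corpus (the hypothetical PAGES dictionary) in place of crawled pages, and it ranks by raw term frequency, which is only a stand-in for the far richer ranking signals a real search engine uses. The names build_index and search are illustrative and do not come from any particular library.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus standing in for pages fetched by a crawler.
PAGES = {
    "page1": "data mining finds patterns in large data sets",
    "page2": "search engines crawl and index web pages",
    "page3": "web search engines rank pages for a query",
}

def build_index(pages):
    """Indexing step: map each term to the pages containing it, with counts."""
    index = defaultdict(Counter)
    for page_id, text in pages.items():
        for term in text.lower().split():
            index[term][page_id] += 1
    return index

def search(index, query):
    """Searching step: score pages by summed term frequency, best first."""
    scores = Counter()
    for term in query.lower().split():
        for page_id, count in index.get(term, {}).items():
            scores[page_id] += count
    # Sort by descending score, then by page id so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [page_id for page_id, _ in ranked]

index = build_index(PAGES)
hits = search(index, "web search query")
print(hits)
```

In this sketch the index is built once and every query is answered from it, which mirrors the separation between indexing and searching described above.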

Search engines pose major challenges to data mining. First, they have to handle a large and ever-growing amount of data. Typically, such data cannot be processed on one or a few machines. Instead, search engines need to use computer clouds, which consist of thousands or even hundreds of thousands of computers that collaboratively mine the data. Scaling up data mining methods over computer clouds and large distributed data sets is an area for further research.
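The idea of many machines mining the data collaboratively can be sketched as a partition-and-merge computation. This is a toy illustration only: it assumes that threads on a single machine stand in for the computers of a cloud, and the SHARDS data and the function names count_terms and mine_in_parallel are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def count_terms(shard):
    """Per-worker step: count terms within one shard of documents."""
    counts = Counter()
    for doc in shard:
        counts.update(doc.lower().split())
    return counts

def mine_in_parallel(shards, workers=4):
    """Process each shard independently, then merge the partial counts."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_terms, shards))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

# Hypothetical shards; in a real deployment each would live on a different machine.
SHARDS = [
    ["web search engines", "search the web"],
    ["data mining at web scale"],
]
totals = mine_in_parallel(SHARDS)
print(totals.most_common(3))
```

Each shard is processed without reference to the others, so the per-worker step could run anywhere; only the small partial results need to be brought together for the merge.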

Second, Web search engines have to deal with online data. A search engine can afford to build a model offline on large data sets. For example, it can build a query classifier that assigns a search query to predefined categories based on the query topic. Even if a model is constructed offline, applying the model online must be fast enough to answer user queries in real time.
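The split between offline model construction and fast online application can be sketched as follows. This is a minimal illustration, assuming a handful of hypothetical labeled queries and a simple term-voting scheme rather than any classifier a real search engine is known to use. The expensive pass over the training data happens once in train_offline, while classify_online needs only a few dictionary lookups per query.

```python
from collections import Counter, defaultdict

# Hypothetical labeled queries; the categories are illustrative only.
TRAINING = [
    ("cheap flights to paris", "travel"),
    ("hotel deals rome", "travel"),
    ("python list sort", "programming"),
    ("java string format", "programming"),
]

def train_offline(examples):
    """Offline step: count how often each term appears under each category."""
    model = defaultdict(Counter)
    for query, category in examples:
        for term in query.lower().split():
            model[term][category] += 1
    return dict(model)

def classify_online(model, query, default="unknown"):
    """Online step: look up each query term and let the categories vote."""
    votes = Counter()
    for term in query.lower().split():
        votes.update(model.get(term, {}))
    if not votes:
        return default
    return votes.most_common(1)[0][0]

model = train_offline(TRAINING)
print(classify_online(model, "flights to rome"))
print(classify_online(model, "sort java list"))
```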

Another challenge is maintaining and incrementally updating a model on fast-growing data streams. For instance, a query classifier may need to be maintained incrementally and continuously because new queries keep emerging, and the predefined categories and the data distribution can change. Some existing model training methods are offline and static and thus cannot be used in such a setting.
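Incremental maintenance can be sketched with a classifier whose state is updated one labeled query at a time, so that no retraining from scratch is needed when new queries or new categories appear. The class name IncrementalQueryClassifier and the term-voting scheme are assumptions made for illustration, not a description of a production method.

```python
from collections import Counter, defaultdict

class IncrementalQueryClassifier:
    """Toy classifier whose term counts can be updated per labeled query."""

    def __init__(self):
        self.term_counts = defaultdict(Counter)

    def update(self, query, category):
        """Fold a single new labeled query into the existing model."""
        for term in query.lower().split():
            self.term_counts[term][category] += 1

    def predict(self, query, default="unknown"):
        """Let the categories of known terms vote; fall back if none match."""
        votes = Counter()
        for term in query.lower().split():
            votes.update(self.term_counts.get(term, {}))
        return votes.most_common(1)[0][0] if votes else default

clf = IncrementalQueryClassifier()
clf.update("cheap flights to paris", "travel")
before = clf.predict("electric scooter rental")

# A new kind of query arrives, with a category the model has not seen before.
clf.update("electric scooter rental", "transport")
after = clf.predict("scooter rental prices")
print(before, after)
```

The update touches only the counts for the terms in the new query, which is what makes it suitable for a stream; an offline, static method would instead have to reprocess the entire training set.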

Third, Web search engines have to deal with queries that are asked only a small number of times. Suppose a search engine wants to provide context-aware query recommendations. When a user poses a query, the search engine tries to infer the context of the query using the user's profile and query history, so that it can return more customized answers within a small fraction of a second.
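One very simple way to use query history as context is to prefer candidate answers that share terms with the user's recent queries. The sketch below assumes this term-overlap heuristic purely for illustration; the function rerank_with_context and the sample data are hypothetical, and real systems infer context with much more sophisticated models.

```python
def rerank_with_context(candidates, history):
    """Prefer candidates that share terms with the user's recent queries."""
    context_terms = set()
    for past_query in history:
        context_terms.update(past_query.lower().split())

    def overlap(candidate):
        # Number of distinct candidate terms also seen in the history.
        return len(context_terms & set(candidate.lower().split()))

    # sorted() is stable, so candidates with equal overlap keep their order.
    return sorted(candidates, key=overlap, reverse=True)

# Hypothetical candidate recommendations for the ambiguous query "jaguar".
candidates = ["jaguar car price", "jaguar animal habitat", "jaguar os release"]
history = ["big cats in south america", "animal habitat loss"]
reranked = rerank_with_context(candidates, history)
print(reranked)
```

With an empty history the candidates come back in their original order, which reflects the difficulty described above: when a query or a user has little recorded history, there is little context to draw on.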