How Search Engines Work
Search engines act like answer machines: you enter a query or question about anything, and the engine brings back the online content most relevant to it. One thing to understand first: when you create a site, you need to expose your site and its content to search engines so that its pages are crawled and indexed. If your content is not accessible to search engines, it will never appear on search engine results pages, referred to as SERPs.
There are three main functions of a search engine: crawling, indexing, and ranking.
- Crawling: the search engine's bots discover and fetch your website's content.
- Indexing: after crawling, the content is stored in a database so it can be retrieved later in response to queries.
- Ranking: results are ordered by how relevant their content is to the query a visitor types into the search engine.
Crawling is the process in which a search engine sends out robots, also known as spiders or crawlers, to find new or updated content. The content can be web pages, PDF files, videos, or any other format. Google's crawlers fetch some web pages, follow the links on those pages to discover new URLs and content, and index what they find.
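The fetch-and-follow loop described above can be sketched in a few lines of Python. The `fake_web` dictionary and its URLs are made up for illustration; a real crawler would issue HTTP requests instead of looking pages up in a dictionary.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag, mimicking how a crawler finds new URLs."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start_url, fetch, max_pages=10):
    """Breadth-first crawl: fetch a page, record it, then follow its links.
    `fetch` is any callable that returns HTML for a URL (a stand-in for an HTTP GET)."""
    frontier, seen = [start_url], set()
    while frontier and len(seen) < max_pages:
        url = frontier.pop(0)
        if url in seen:
            continue
        html = fetch(url)
        if html is None:          # page could not be fetched
            continue
        seen.add(url)
        parser = LinkExtractor()
        parser.feed(html)
        frontier.extend(parser.links)  # newly discovered URLs join the queue
    return seen

# A tiny in-memory "web", standing in for real pages on the internet.
fake_web = {
    "/home":   '<a href="/about">About</a> <a href="/blog">Blog</a>',
    "/about":  '<a href="/home">Home</a>',
    "/blog":   '<a href="/post-1">Post</a>',
    "/post-1": "No links here.",
}

print(sorted(crawl("/home", fake_web.get)))
# ['/about', '/blog', '/home', '/post-1']
```

Starting from `/home`, the crawler discovers every page reachable by links, which is exactly why pages with no inbound links are hard for search engines to find.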
Crawling is an essential step before a page can show up on SERPs. If you have a website, it is useful to know how many of its pages have been crawled and indexed: open google.com and search for site:yourdomain.com.
The result shows roughly how many pages are indexed. This number is not exact, but it gives some insight; for a more accurate count, use the Index Coverage report in Google Search Console. First create a sitemap of your site, then submit it in Search Console, and the Index Coverage report will show how many of your pages are indexed. If your site is not showing up in search results, there are several possible reasons:
- Your site has not been crawled yet.
- Your site has no backlinks from other sites.
- Crawler directives on your site are blocking search engines.
- Your site has been blocked by search engines for spammy activity.
- The site's navigation is too hard for robots to crawl.
The robots.txt file is located in the root directory of your site and suggests which pages should be crawled and at what speed. Googlebot requests the robots.txt file first and crawls the site according to its rules; if there is no robots.txt file, it simply proceeds to crawl the site. If Googlebot encounters an error while trying to access the robots.txt file, it may stop crawling the site altogether.
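Python's standard library ships a robots.txt parser, which makes it easy to see how a crawler reads these rules. The robots.txt content below is a made-up example: it blocks every crawler from `/private/` and asks for a five-second delay between requests.

```python
from urllib.robotparser import RobotFileParser

# A sample robots.txt, assumed here for illustration.
robots_txt = """
User-agent: *
Disallow: /private/
Crawl-delay: 5
""".splitlines()

rp = RobotFileParser()
rp.parse(robots_txt)  # a crawler would normally fetch this from /robots.txt

print(rp.can_fetch("Googlebot", "https://example.com/blog/post"))     # True
print(rp.can_fetch("Googlebot", "https://example.com/private/data"))  # False
print(rp.crawl_delay("Googlebot"))                                    # 5
```

A well-behaved crawler checks `can_fetch()` before requesting each URL and honors the crawl delay, which is exactly the "rules and suggestions" behavior described above.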
Indexing is the process by which a search engine collects data and stores it in a database, a place called the search engine index, from which it serves results for search queries. The data stored in the index is what appears on search engine results pages (SERPs); without an index, fetching results pages would take far too long.
A search engine index has many parts, such as its design factors and its data structure. The design factors outline the architecture of the index and decide how it will work. Some of these factors are given below.
- Merge factors: how data enters the index, including whether a page being indexed is new content or an update to existing content.
- Index size: how much storage space the index requires.
- Storage techniques: how the data is stored, for example whether larger data should be compressed or filtered.
Once the search engine index is created, a data structure is built for it. Many types of data structure are used, including the suffix tree, citation index, inverted index, n-gram index, and document-term matrix.
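The inverted index is the easiest of these structures to illustrate: instead of mapping each page to its words, it maps each word to the pages that contain it, so a query term can be looked up directly. The mini-corpus below is hypothetical; a real index also stores positions, frequencies, and much more.

```python
from collections import defaultdict

def build_inverted_index(pages):
    """Map each word to the set of page IDs containing it
    (a minimal sketch of a search engine's inverted index)."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index

# Hypothetical mini-corpus for illustration.
pages = {
    "p1": "search engines crawl the web",
    "p2": "engines index the web",
    "p3": "ranking orders search results",
}
index = build_inverted_index(pages)

print(sorted(index["engines"]))  # ['p1', 'p2']
print(sorted(index["search"]))   # ['p1', 'p3']
```

Answering the query "engines" is now a single dictionary lookup rather than a scan of every stored page, which is why this structure is central to serving results quickly.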
When we search for something, the search engine arranges pages in the order that shows the most relevance to the query; this ordering of web pages in search results is called ranking. So if a site ranks high in a search engine, it means its content is highly relevant to the query.
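The ordering step can be sketched with a deliberately naive relevance score: count how many distinct query words each page contains and sort by that count. Real search engines combine far richer signals (links, freshness, quality, and hundreds of others); this example and its pages are only an illustration of the ordering idea.

```python
def rank(query, pages):
    """Order page IDs by a naive relevance score: the number of
    distinct query words each page contains (highest first)."""
    terms = set(query.lower().split())
    scores = {pid: len(terms & set(text.lower().split()))
              for pid, text in pages.items()}
    return sorted(scores, key=lambda pid: scores[pid], reverse=True)

# Hypothetical pages for illustration.
pages = {
    "p1": "search engines crawl the web",
    "p2": "how indexing stores web content",
    "p3": "ranking search results by relevance",
}

print(rank("search ranking relevance", pages))  # ['p3', 'p1', 'p2']
```

Here p3 matches all three query words, p1 matches one, and p2 matches none, so p3 "ranks highest", the same relevance-first ordering the paragraph above describes.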