Index entries can be divided into two kinds: objective index entries and content index entries. An objective index entry is independent of the document's content. For a forum post, for example, the author, the posting time, and the address of the post are all objective index entries. A content index entry reflects the content of the document itself; this is easy to understand, so no example is given here. Content index entries can in turn be divided into two kinds: single index entries and multiple index entries.
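As a rough illustration of this split, the sketch below stores the two kinds of entry separately for one post. The class names, the field names, and the whitespace-based `content_entries` helper are all hypothetical, chosen only for illustration; they are not taken from any real search engine.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class ObjectiveEntries:
    """Entries that do not depend on what the post says (hypothetical fields)."""
    author: str
    posted_at: datetime
    url: str


@dataclass
class IndexedPost:
    """One post with its objective entries and its content entries."""
    objective: ObjectiveEntries
    content: List[str] = field(default_factory=list)


def content_entries(text: str) -> List[str]:
    """Return the distinct lowercase words of `text`, in first-seen order.

    Splitting on whitespace is an assumption that only suits
    space-delimited languages such as English.
    """
    seen = {}
    for word in text.lower().split():
        seen.setdefault(word, None)
    return list(seen)


post = IndexedPost(
    objective=ObjectiveEntries(
        author="alice",
        posted_at=datetime(2013, 5, 1, 9, 30),
        url="http://example.com/post/1",
    ),
    content=content_entries("The indexer builds the index"),
)
```

Changing the text of the post changes only `post.content`; the objective entries stay the same.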
Chinese word segmentation is the basis of text extraction: a passage of Chinese input must first be segmented into words, and the quality of the segmentation affects how well the search engine recognizes the meaning of a sentence.
Let us look together at what the indexer does. The indexer's job is to understand the information the search engine has collected (I introduced how that information is collected in an earlier article published on A5, "Shanghai dragon needs to know why the search engine") and to analyze web pages and extract the relevant information from them, such as the keywords a page uses, its encoding, and its URL. The search engine then runs a large number of complex calculations with the relevant algorithms to obtain relevance information, and uses that information to build the index database for the pages.
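The article does not say how the index database is organized. One common structure is an inverted index, which maps each keyword to the pages that contain it. The following is a minimal sketch under that assumption; the function name `build_inverted_index` and the example URLs are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(pages):
    """Map each keyword to the set of URLs whose page contains it.

    `pages` is a dict of URL -> list of keywords already extracted
    from that page (the extraction step itself is not shown here).
    """
    index = defaultdict(set)
    for url, keywords in pages.items():
        for keyword in keywords:
            index[keyword.lower()].add(url)
    return index


# Toy input: keywords assumed to have been extracted by the indexer.
pages = {
    "http://example.com/a": ["search", "indexer"],
    "http://example.com/b": ["Indexer", "segmentation"],
}
index = build_inverted_index(pages)
```

Looking up `index["indexer"]` returns both example URLs, which is the kind of keyword-to-page lookup a query needs.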
A qualified SEO practitioner needs some understanding of how a search engine is structured. With that understanding, SEO can be done well without clinging rigidly to fixed optimization methods and techniques. The indexer, analyzed here, is one of the more important parts of the search engine's structure.
With the introduction above, you should now know something about the indexer, so it is worth understanding index entries next. What is an index entry? The posting time, encoding, author, title, and similar attributes that the indexer records are all index entries.
Mention Chinese word segmentation and you will think of Baidu, the world's largest Chinese search engine, because Baidu has a strong technical background in Chinese word segmentation. For a typical search engine, index entries in English words or sentences are relatively easy to extract, because English words are separated by spaces. Chinese sentences, however, have no spaces between words, so the text must first be split into words; this is what we usually call word segmentation.
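The contrast can be sketched in a few lines. English text splits on spaces, while Chinese needs some rule for finding word boundaries. The example below uses forward maximum matching against a tiny dictionary, which is one simple dictionary-based technique. It is shown only as an illustration and is not claimed to be the method Baidu or any other search engine uses; the function name, the `max_len` parameter, and the toy dictionary are assumptions.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Segment `text` by repeatedly taking the longest dictionary word
    that starts at the current position; fall back to one character."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink it.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


# English: spaces already mark the word boundaries.
english_words = "search engine index".split()

# Chinese: no spaces, so a dictionary (a toy one here) guides the split.
toy_dictionary = {"搜索", "引擎", "搜索引擎", "中文", "分词"}
chinese_words = forward_max_match("中文搜索引擎分词", toy_dictionary)
print(chinese_words)  # ['中文', '搜索引擎', '分词']
```

Because the longest match is preferred, "搜索引擎" is kept as one word rather than split into "搜索" and "引擎". Characters not covered by the dictionary come out as single-character words.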
Chinese word segmentation methods can generally be divided into two types.
The indexer is one of the more important parts of a search engine. Take Baidu News, which we are all familiar with: it is "automatically selected and updated by machine every 5 minutes", which is close to real time, and the amount of data involved is quite large. The indexing algorithm has a very prominent influence on the indexer. It has been said that the effectiveness of a search engine depends to a great extent on its index (strictly speaking, the indexing algorithm should be attributed to the indexer).