Method and apparatus for removing HTML tags in a search engine
A search engine and tag technology, applied in the field of network search, that addresses problems such as narrow usability, transformer failure, and frequent updates, and achieves strong versatility.
Embodiment Construction
[0028] In view of the problems in the prior art, regular expressions can be used to remove HTML tags. The key question is where to do so. Broadly, there are two options: removing the tags when the index is built, or removing them at search time. Removing them at index time is the most efficient. The Solr search engine, which was in use at the time, has a regular-expression filtering function, but that step must run after word segmentation. By then, semantic word segmentation has already split the HTML tags into pieces, so regular expressions can no longer match them. None of the existing solutions offered by these search engines solves this problem. Chinese word segmentation is complex, and software developed abroad is designed around tokenizers that essentially split words on spaces, which does not work for Chinese.
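The ordering problem described in this paragraph can be illustrated with a minimal Python sketch. It is not the patented implementation and does not use Solr. The tag pattern `TAG_RE`, the helper `strip_tags`, and the toy segmenter `toy_tokenize` are hypothetical names introduced only for illustration, and the toy segmenter is a crude stand-in for a real Chinese word segmenter.

```python
import re
from typing import List

# Assumed pattern: anything between '<' and '>' is treated as an HTML tag.
TAG_RE = re.compile(r"<[^>]+>")


def strip_tags(text: str) -> str:
    """Remove HTML tags from a string with a regular expression."""
    return TAG_RE.sub("", text)


def toy_tokenize(text: str) -> List[str]:
    """Hypothetical stand-in for a word segmenter.

    Splits the input into runs of word characters and single
    punctuation marks, so a tag such as <p> becomes '<', 'p', '>'.
    """
    return re.findall(r"\w+|[^\w\s]", text)


raw = "<p>搜索引擎</p>"

# Order 1: segment first, then apply the regex to each token.
# The tag has already been broken apart, so the pattern never matches.
tokens_first = toy_tokenize(raw)
filtered_after = [strip_tags(token) for token in tokens_first]

# Order 2: apply the regex to the raw text, then segment.
# The tag is still intact, so the pattern removes it cleanly.
filtered_before = toy_tokenize(strip_tags(raw))

print(filtered_after)
print(filtered_before)
```

In this sketch the first ordering leaves tag fragments such as `p` and `/` among the tokens, while the second ordering yields only the text content. That is why the position of the regular-expression step relative to word segmentation matters.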
[0029] It is also possible to remove HTML tags after fetching from the datab...