Method for automatically analyzing Internet web page
Automatic parsing and Internet technology, applied in the field of web page parsing; it addresses the problems of being unable to provide classification and screening services, being unable to make judgments, and having a narrow search scope.
Examples
Embodiment 1
[0018] Taking a shopping website as an example, a user performs a vertical search on the website. The specific steps are as follows:
[0019] ① Select a representative web page of a shopping website such as Taobao and search for men's shirts. Using the latest industry word-segmentation lexicon, segment the representative web page and display the result to the user. In the most common case, the query is segmented into "men" and "shirts";
[0020] ② Based on the graphical display of the segmentation results on the web page, regular-expression matching items are provided; here the regular expressions replace terms with numbers, for example 222 for "men" and 444 for "shirts";
[0021] ③ Based on the regular-expression matching items, select the data to be extracted and assign it a data name;
[0022] ④ Based on the regular expressions, automatically generate a program that extracts structured data, and establish a vertical search template. When a shopping website is encountered, search for men's shirts ...
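Steps ① and ② of this embodiment can be illustrated with a minimal Python sketch. The patent does not name its segmentation algorithm, so dictionary-based forward maximum matching is swapped in here as a common stand-in. The lexicon `LEXICON`, the code table `CODES` and the helpers `segment`, `encode` and `build_pattern` are hypothetical, and the unspaced query `menshirts` stands in for Chinese text, which has no spaces between words.

```python
from __future__ import annotations

import re

# Hypothetical industry lexicon; the patent does not publish its lexicon.
LEXICON = {"men", "shirt", "shirts"}

# Hypothetical numeric codes, mirroring "222 for men and 444 for shirts" in step ②.
CODES = {"men": "222", "shirts": "444"}


def segment(text: str, lexicon: set[str]) -> list[str]:
    """Forward maximum matching: at each position, take the longest lexicon word."""
    max_len = max(len(word) for word in lexicon)
    tokens: list[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character when no lexicon word starts here.
            if piece in lexicon or size == 1:
                tokens.append(piece)
                i += size
                break
    return tokens


def encode(tokens: list[str], codes: dict[str, str]) -> list[str]:
    """Replace each recognized term with its numeric placeholder code."""
    return [codes.get(token, token) for token in tokens]


def build_pattern(tokens: list[str]) -> re.Pattern[str]:
    """Build a matching item: the terms in order, with optional whitespace between them."""
    return re.compile(r"\s*".join(re.escape(t) for t in tokens), re.IGNORECASE)


tokens = segment("menshirts", LEXICON)
print(tokens)                 # ['men', 'shirts']
print(encode(tokens, CODES))  # ['222', '444']
print(bool(build_pattern(tokens).search("New arrivals: Men Shirts")))  # True
```

Forward maximum matching prefers the longest lexicon entry at each position, so `shirts` wins over `shirt`; a production system might instead use a statistical segmenter.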
Embodiment 2
[0025] Taking an education website as an example, a user performs a vertical search on the website. The specific steps are as follows:
[0026] ① Select a representative web page of an education website such as New Oriental and search for middle school English. Using the latest industry word-segmentation lexicon, segment the representative web page and display the result to the user. In the most common case, the query is segmented into "middle school" and "English";
[0027] ② Based on the graphical display of the segmentation results on the web page, regular-expression matching items are provided; here the regular expressions use content replacement, for example replacing "middle school" with zx and "English" with yy;
[0028] ③ Based on the regular-expression matching items, select the data to be extracted and assign it a data name;
[0029] ④ Based on the regular expressions, automatically generate a program that extracts structured data, and establish a vertical search tem...
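Steps ② to ④ of this embodiment can be sketched in Python as follows. The sketch assumes that "selecting the data to be extracted" means the user marks the text immediately before and after each field on the representative page, and that the generated program is a regular expression with one named group per data name. `REPLACEMENTS`, `replace_terms`, `generate_extractor` and the sample page are hypothetical illustrations, not the patent's implementation.

```python
from __future__ import annotations

import re
from typing import Callable

# Hypothetical content-replacement table mirroring step ② (zx, yy).
REPLACEMENTS = {"middle school": "zx", "English": "yy"}


def replace_terms(text: str, replacements: dict[str, str]) -> str:
    """Swap every known term for its code in one case-insensitive pass."""
    lookup = {term.lower(): code for term, code in replacements.items()}
    # Longest terms first, so a longer term is not pre-empted by a shorter one.
    alternation = "|".join(
        re.escape(term) for term in sorted(replacements, key=len, reverse=True)
    )
    return re.sub(alternation, lambda m: lookup[m.group(0).lower()], text, flags=re.IGNORECASE)


def generate_extractor(
    fields: list[tuple[str, str, str]],
) -> Callable[[str], list[dict[str, str]]]:
    """Compile (data name, text before, text after) selections into one extractor."""
    parts = [
        f"{re.escape(before)}(?P<{name}>.*?){re.escape(after)}"
        for name, before, after in fields
    ]
    pattern = re.compile(".*?".join(parts), re.DOTALL)

    def extract(page: str) -> list[dict[str, str]]:
        # One dict per record, keyed by the data names chosen in step ③.
        return [match.groupdict() for match in pattern.finditer(page)]

    return extract


# Hypothetical representative page for an education website.
PAGE = (
    '<li><span class="title">Middle School English</span><span class="price">199</span></li>'
    '<li><span class="title">Middle School Maths</span><span class="price">179</span></li>'
)

extract = generate_extractor([
    ("course", '<span class="title">', "</span>"),
    ("price", '<span class="price">', "</span>"),
])
records = extract(PAGE)
print(records)
# [{'course': 'Middle School English', 'price': '199'}, {'course': 'Middle School Maths', 'price': '179'}]
print([replace_terms(r["course"], REPLACEMENTS) for r in records])  # ['zx yy', 'zx Maths']
```

Lazy `.*?` groups keep each field from running into the next record; this delimiter-based approach is fragile if the page layout changes, which is one reason a template would be regenerated per site.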
Embodiment 3
[0032] Taking a travel website as an example, a user performs a vertical search on the website. The specific steps are as follows:
[0033] ① Select a representative web page of a travel website such as CYTS and search for Huahai. Using the latest industry word-segmentation lexicon, segment the representative web page and display the result to the user;
[0034] ② Based on the graphical display of the segmentation results on the web page, regular-expression matching items are provided; here the regular expressions delete specified content or delete spaces, for example replacing Huahai with Huahai or Huahai;
[0035] ③ Based on the regular-expression matching items, select the data to be extracted and assign it a data name;
[0036] ④ Based on the regular expressions, automatically generate a program that extracts structured data, and establish a vertical search template. When a travel website is encountered, search for Huahai and use regular expressio...
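A minimal Python sketch of the deletion-based replacement in step ② follows, assuming its purpose is to let variants of a term that differ only by spaces or by specified extra content still match the normalized query. Here the "specified content" is hypothetically taken to be HTML highlight tags (`<b>`, `</b>`) that can split a word on a listing page; `normalize`, `find_matches` and the sample titles are hypothetical.

```python
from __future__ import annotations

import re


def normalize(text: str, remove: tuple[str, ...] = ()) -> str:
    """Delete each specified substring, then delete all whitespace."""
    for item in remove:
        text = text.replace(item, "")
    return re.sub(r"\s+", "", text)


def find_matches(titles: list[str], query: str, remove: tuple[str, ...] = ()) -> list[str]:
    """Return the titles whose normalized form contains the normalized query."""
    target = normalize(query, remove).lower()
    return [title for title in titles if target in normalize(title, remove).lower()]


# Hypothetical listing titles from a representative travel page.
TITLES = [
    "Hua hai one-day tour",
    "Hua<b>hai</b> weekend trip",
    "Great Wall tour",
]

print(normalize("  Hua  hai "))  # Huahai
print(find_matches(TITLES, "Huahai", ("<b>", "</b>")))
# ['Hua hai one-day tour', 'Hua<b>hai</b> weekend trip']
```

Deleting all whitespace makes matching tolerant of spacing variants, at the cost of possible false matches that straddle word boundaries.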