Web page extraction accuracy calculation method and system

A calculation method and system, applied in the field of web page search, that addresses three problems of traditional approaches: they cannot effectively reflect real extraction quality, cannot be run as automated batch tests, and cannot guarantee the accuracy of web page extraction.

Active Publication Date: 2017-12-08
SHENZHEN SHI JI GUANG SU INFORMATION TECH
2 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

The traditional method for calculating the accuracy of webpage extraction cannot perform automated batch testing, because it relies on manually inspecting webpages or manually judging whether the DOM tree is accurate. As a result, it cannot effectively reflect the real extraction quality.



Examples


Embodiment Construction

[0029] As shown in Figure 1, in one embodiment, a method for calculating the accuracy of webpage extraction comprises the following steps:

[0030] Step S102, acquiring the browser's parsing result of the webpage.

[0031] The browser's parsing of a webpage is relatively complete, so the browser's parsing result can serve as the reference standard for webpage extraction. The closer the result obtained by the webpage extraction module under test is to the browser's parsing result, that is, the higher the similarity between the two, the more accurate the module's webpage extraction.
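The idea in [0031] can be illustrated with a minimal Python sketch. The source text shown here does not specify how the similarity is computed, so everything below is an assumption made for illustration: the function names, the dictionary shape with `links` and `text` keys, the use of Jaccard overlap for links, the use of `difflib.SequenceMatcher` for text, and the equal weighting of the two scores.

```python
from difflib import SequenceMatcher


def link_similarity(reference_links, candidate_links):
    """Jaccard similarity between two collections of extracted links."""
    ref, cand = set(reference_links), set(candidate_links)
    if not ref and not cand:
        return 1.0  # both empty: treat the two results as identical
    return len(ref & cand) / len(ref | cand)


def text_similarity(reference_text, candidate_text):
    """Ratio in [0, 1] from difflib; 1.0 means the texts are identical."""
    return SequenceMatcher(None, reference_text, candidate_text).ratio()


def extraction_similarity(reference, candidate, link_weight=0.5):
    """Weighted mix of link and text similarity (the weight is arbitrary).

    `reference` stands in for the browser's parsing result and `candidate`
    for the result of the extraction module under test; both are assumed
    to be dicts with "links" (list of str) and "text" (str).
    """
    links = link_similarity(reference["links"], candidate["links"])
    text = text_similarity(reference["text"], candidate["text"])
    return link_weight * links + (1.0 - link_weight) * text
```

A score close to 1.0 would indicate that the module under test agrees closely with the reference result. Other similarity measures could be substituted without changing the overall scheme.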

[0032] It can be understood that this step may be preceded by the browser parsing the webpage.

[0033] In a preferred embodiment, an application programming interface (API) provided by the browser is used to obtain the browser's parsing result of the webpage, and the parsing result includes information such as vis...


Abstract

A method for calculating the accuracy of web page extraction, comprising the following steps: obtaining the browser's parsing result of a web page; obtaining the parsing result of the same web page from a web page extraction module to be tested; and calculating the similarity between the two results. With this method, the browser's parsing result serves as the reference standard for web page extraction, and the calculated similarity can effectively reflect the accuracy of the extraction performed by the module under test. The method requires no manual participation and can automatically test web pages in batches. In addition, a web page extraction accuracy calculation system is also provided.
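The batch-testing claim in the abstract could be sketched as a simple loop over pages. This is an illustrative Python sketch, not the patented implementation: `batch_accuracy`, `token_overlap`, and the two callables `reference_parser` (standing in for the browser) and `module_under_test` are hypothetical names, and token-level Jaccard overlap is only one possible similarity measure.

```python
from statistics import mean


def token_overlap(reference_text, candidate_text):
    """Jaccard overlap of whitespace-separated tokens, in [0, 1]."""
    ref, cand = set(reference_text.split()), set(candidate_text.split())
    if not ref and not cand:
        return 1.0  # both empty: treat the two results as identical
    return len(ref & cand) / len(ref | cand)


def batch_accuracy(pages, reference_parser, module_under_test,
                   similarity=token_overlap):
    """Score every page and return (per-page scores, mean score).

    `pages` maps a page identifier to its HTML source and must be
    non-empty. Both parsers are assumed to take HTML and return a string.
    """
    scores = {}
    for url, html in pages.items():
        expected = reference_parser(html)  # stands in for the browser's result
        actual = module_under_test(html)   # result of the module being tested
        scores[url] = similarity(expected, actual)
    return scores, mean(list(scores.values()))
```

Because the reference result is produced by a program rather than by a person, the loop can run unattended over any number of pages, which is the property the abstract emphasizes.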

Description

Technical field

[0001] The invention relates to web page search technology, in particular to a method and system for calculating the accuracy of web page extraction.

Background

[0002] In web search, web page extraction is a critical step. Web page extraction means that the search engine extracts information such as text and links from crawled web pages and builds an index. The extracted links are used to crawl new web pages, and the extracted text is matched against keywords when users query, so that web pages related to the query terms are returned as query results. Therefore, the accuracy of web page extraction greatly affects the retrieval quality of search engines.

[0003] In web page extraction, the web page is usually represented as a DOM (Document Object Model) tree. The so-called DOM tree refers to representing links and texts in HTML (HyperText Markup Language) webpages a...
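To make the notion of web page extraction concrete, here is a minimal Python sketch that pulls links and visible text out of an HTML string using the standard library's `html.parser`. It is only an illustration of what an extraction module might produce: the class name `LinkTextExtractor`, the helper `extract`, and the output shape are assumptions, and a production extractor would handle far more cases (relative URLs, encodings, malformed markup).

```python
from html.parser import HTMLParser


class LinkTextExtractor(HTMLParser):
    """Collects href values from <a> tags and non-empty text nodes."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when it is outside script/style and not blank.
        if not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


def extract(html):
    """Return the links and text found in `html` as a dict."""
    parser = LinkTextExtractor()
    parser.feed(html)
    parser.close()
    return {"links": parser.links, "text": " ".join(parser.text_parts)}
```

The links would feed further crawling and the text would feed the index, matching the two uses described in [0002].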


Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06F17/30
Inventor: 朱靖君, 林世飞, 张立明
Owner: SHENZHEN SHI JI GUANG SU INFORMATION TECH