
Webpage text extracting method

A technology in the field of webpage text extraction. It addresses the large manual operation and intervention costs involved in extracting webpage content. It reduces those costs, lowers extraction difficulty, and improves efficiency.

Active Publication Date: 2015-02-25
IOL WUHAN INFORMATION TECH CO LTD
Cites: 4; Cited by: 7

AI Technical Summary

Problems solved by technology

[0003] In view of this, the present invention proposes a method for extracting webpage text. Its purpose is to solve a problem in the prior art: extracting webpage content information requires a large amount of manual operation and intervention cost.




Embodiment Construction

[0016] The following description and drawings illustrate specific embodiments of the invention in enough detail to enable those skilled in the art to practice them. Other embodiments may incorporate structural, logical, electrical, process, and other changes. The examples merely represent possible variations. Individual components and functions are optional unless explicitly required, and the order of operations may vary. Portions and features of some embodiments may be included in, or substituted for, those of other embodiments. The scope of the embodiments of the present invention includes the full scope of the claims and all available equivalents of the claims. These embodiments may be referred to herein, individually or collectively, by the term "invention" for convenience only. If more than one invention or inventive concept is in fact disclosed, this term is not intended to automatically limit the scope of the application to any single invention or inventive concept.

...


Abstract

A webpage text extracting method comprises the following steps. According to the domain name of the webpage to be extracted, it is judged whether a preset site knowledge base stores extraction information that corresponds to that domain name and is used for extracting the text. If so, the text of the webpage is extracted according to that extraction information. If the site knowledge base holds no extraction information for the domain name, or extraction according to that information fails, the text node of the webpage is determined and the webpage text is obtained by extracting the text in that node. Manual processing is eliminated entirely, which lowers manual operation and intervention costs. Programs also extract webpage text more efficiently. Webpages in various languages can be extracted automatically, and extraction difficulty is greatly reduced.
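The two-stage flow in the abstract can be sketched in Python using only the standard library. Stage one looks up extraction information by domain name in a site knowledge base. Stage two falls back to locating a text node. This is a minimal illustration under stated assumptions, not the patented implementation. The abstract does not say how extraction information is stored or how the text node is determined. Both pieces shown here are therefore hypothetical:

- The knowledge-base format is assumed to be a mapping from domain to a `(tag, attribute, value)` rule.
- The fallback heuristic picks the container element with the most direct text plus `<p>` text.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}
SKIP_TAGS = {"script", "style"}
# Hypothetical fallback candidates: block elements that may hold the main text.
CONTAINER_TAGS = {"article", "body", "div", "main", "section", "td"}

class Node:
    # Minimal DOM node; children mix text strings and child Nodes in order.
    def __init__(self, tag, attrs, parent=None):
        self.tag, self.attrs, self.parent = tag, dict(attrs), parent
        self.children = []

    def walk(self):
        # Yield this node and all descendant Nodes in document order.
        yield self
        for child in self.children:
            if isinstance(child, Node):
                yield from child.walk()

    def text(self):
        # Join all visible text under this node, skipping script/style.
        parts = []
        for child in self.children:
            if isinstance(child, str):
                parts.append(child)
            elif child.tag not in SKIP_TAGS and child.text():
                parts.append(child.text())
        return " ".join(parts)

class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = self.current = Node("#root", [])

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if self.current.tag not in SKIP_TAGS and data.strip():
            self.current.children.append(data.strip())

def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root

def extract_by_rule(root, rule):
    # Stage 1: apply a stored (tag, attribute, value) rule; None means failure.
    tag, attr, value = rule
    for node in root.walk():
        if node.tag == tag and node.attrs.get(attr) == value and node.text():
            return node.text()
    return None

def find_text_node(root):
    # Stage 2 (hypothetical heuristic): container with the most direct text.
    best, best_score = None, 0
    for node in root.walk():
        if node.tag not in CONTAINER_TAGS:
            continue
        score = sum(len(c) if isinstance(c, str) else len(c.text())
                    for c in node.children if isinstance(c, str) or c.tag == "p")
        if score > best_score:
            best, best_score = node, score
    return best

def extract_webpage_text(url, html, knowledge_base):
    root = parse(html)
    rule = knowledge_base.get(urlparse(url).hostname or "")
    if rule is not None:
        text = extract_by_rule(root, rule)
        if text:
            return text
    # No rule for this domain, or the rule matched nothing: fall back.
    node = find_text_node(root)
    return node.text() if node is not None else ""

if __name__ == "__main__":
    page = ('<html><body><div class="nav"><a href="/">Home</a></div>'
            '<div class="article-body"><p>First paragraph.</p>'
            '<p>Second one.</p></div></body></html>')
    kb = {"news.example.com": ("div", "class", "article-body")}  # hypothetical entry
    print(extract_webpage_text("http://news.example.com/a/1", page, kb))
```

The fallback runs in both failure cases the abstract names: the domain is missing from the knowledge base, or a stored rule matches nothing. With the sample page above, both paths return `First paragraph. Second one.` because the navigation links are not counted by the heuristic.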

Description

Technical field

[0001] The invention belongs to the field of communications, and in particular relates to a method for extracting webpage text.

Background art

[0002] At present, website content is extracted by manually analyzing the structure of each target website and then writing templates for that site's main content structure. When a website is redesigned, the existing templates must be manually reviewed and modified. Each site, and even each type of content on the same site, needs its own set of templates. As the number of websites grows, the workload of writing and maintaining templates grows with it. Manual intervention costs keep rising and efficiency keeps falling.

Contents of the invention

[0003] In view of this, in order to solve the problem in the prior art that ...


Application Information

IPC(8): G06F17/30
CPC: G06F16/986
Inventors: 江潮 (Jiang Chao), 贺建华 (He Jianhua), 蒋汉华 (Jiang Hanhua)
Owner: IOL WUHAN INFORMATION TECH CO LTD