Extraction method of case information in webpage

A technique for extracting case information from web pages. It addresses the degraded user experience and low extraction efficiency of existing approaches and achieves relatively high accuracy.

Inactive Publication Date: 2016-05-11
HYLANDA INFORMATION TECH
4 Cites, 6 Cited by

AI Technical Summary

Problems solved by technology

Keyword matching, as used in the prior art, can extract some relevant information from the target webpage. However, because a webpage contains a large amount of information, several passages may match the preset keywords, so the extracted information may include redundant or incorrect information, which degrades the user experience.
When the extracted information contains redundant information, removing it requires a second processing pass over the extracted information, which lowers the efficiency of information extraction.
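
The redundancy problem can be illustrated with a minimal sketch. The function name `keyword_match` and the sample text are hypothetical and chosen only for illustration; the source does not specify how prior-art keyword matching is implemented.

```python
import re

def keyword_match(text, keyword):
    """Return every sentence that contains the keyword (naive matching)."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [s for s in sentences if keyword in s]

# Hypothetical page text: only the first sentence carries the wanted fact.
page_text = (
    "The court delivered its judgment on 2015-03-02. "
    "A commentator said the judgment was expected. "
    "Readers can find more judgment summaries in the archive."
)

matches = keyword_match(page_text, "judgment")
for m in matches:
    print(m)
```

All three sentences match the keyword, although only the first one states the judgment date, so a second filtering pass would be needed to discard the other two.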

Examples

Embodiment Construction

[0019] The present invention will be described in detail below through specific examples.

[0020] The method of the present invention for extracting case information from a webpage comprises the following steps:

[0021] A. Establish a case attribute knowledge base, which includes a set of proper nouns for case information types, a set of qualifiers, a set of forbidden words, and a set of modifiers specific to case subjects;

[0022] B. Format the source code of the webpage and extract the body text and the title;

[0023] C. Scan the body text, perform word segmentation on it, and divide the article into complete sentences;

[0024] D. Abstract the sentence patterns that characteristically describe the key fields of a case, and translate them into corresponding rules in combination with the word sets in the knowledge base;

[0025] E. When scanning finds that a word from the word set corresponding to a rule appears, and the sentence pattern after the word satisfies a certain rule, and the...
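
Steps A to C can be sketched as follows. This is a minimal illustration using only the Python standard library, not the patented implementation: the names `KnowledgeBase`, `TextExtractor`, `extract_title_and_text`, and `split_sentences` are hypothetical, and word segmentation (which for Chinese text would need a dedicated segmenter) is left out.

```python
import re
from dataclasses import dataclass, field
from html.parser import HTMLParser

@dataclass
class KnowledgeBase:
    """Step A: the four word sets named in the method (contents are assumed)."""
    type_nouns: set = field(default_factory=set)
    qualifiers: set = field(default_factory=set)
    forbidden_words: set = field(default_factory=set)
    subject_modifiers: set = field(default_factory=set)

class TextExtractor(HTMLParser):
    """Step B: collect the <title> text and the visible body text."""
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.title_parts, self.body_parts = [], []
        self._in_title = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth:
            return  # ignore script and style contents
        if self._in_title:
            self.title_parts.append(data)
        elif data.strip():
            self.body_parts.append(data.strip())

def extract_title_and_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.title_parts).strip(), " ".join(parser.body_parts)

def split_sentences(text):
    """Step C (sentence part only): split after Chinese or Western end punctuation.
    Note: this simple pattern would also split inside decimals such as "3.5"."""
    parts = re.split(r"(?<=[。！？.!?])\s*", text)
    return [p.strip() for p in parts if p.strip()]

# Hypothetical input page.
html = (
    "<html><head><title>Case Report</title><style>p{color:red}</style></head>"
    "<body><p>The court heard the case. The verdict was announced!</p>"
    "<script>var x = 1;</script></body></html>"
)
title, text = extract_title_and_text(html)
sentences = split_sentences(text)
print(title)
print(sentences)
```

Keeping the title separate from the body text mirrors step B, which extracts both; how the title is used afterwards is not stated in the visible part of the description.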

Abstract

The invention discloses a method for extracting case information from a webpage. A case attribute knowledge base is built, comprising a set of proper nouns for case information types, a qualifier set, a forbidden word set, and a set of modifiers specific to case subjects. Word segmentation is performed on the text, and the article is divided into complete sentences. Corresponding rules are formulated in combination with the word sets in the knowledge base. When scanning finds that a word from the word set corresponding to a rule appears, the sentence pattern around the word satisfies the rule, and the sentence contains no forbidden word, the sentence is deemed to satisfy the rule, and the court-related fields are extracted according to the field positions specified in the rule. The method extracts case information in a targeted way by combining rules with word sets, and the case information it extracts has relatively high accuracy.
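
The rule check described in the abstract can be sketched roughly as below. This is an illustration under stated assumptions, not the patented rule format: the `Rule` class, the use of a regular expression with a named group `value` to stand for "the field position specified in the rule", and the sample sentences are all hypothetical.

```python
import re
from dataclasses import dataclass

@dataclass
class Rule:
    field_name: str            # which case field this rule extracts
    trigger_words: frozenset   # words from the knowledge-base word set
    pattern: re.Pattern        # sentence pattern; must define group "value"

def apply_rule(rule, sentence, forbidden_words):
    """Return the extracted field value, or None if the rule is not satisfied."""
    # A sentence containing any forbidden word never satisfies a rule.
    if any(word in sentence for word in forbidden_words):
        return None
    for word in rule.trigger_words:
        idx = sentence.find(word)
        if idx == -1:
            continue
        # The sentence pattern is checked on the text after the trigger word.
        match = rule.pattern.search(sentence, idx + len(word))
        if match:
            return match.group("value")
    return None

def extract_fields(sentences, rules, forbidden_words):
    """Scan sentences in order and keep the first value found for each field."""
    found = {}
    for sentence in sentences:
        for rule in rules:
            if rule.field_name in found:
                continue
            value = apply_rule(rule, sentence, forbidden_words)
            if value is not None:
                found[rule.field_name] = value
    return found

# Hypothetical rule, forbidden word, and sentences.
rules = [
    Rule(
        field_name="court",
        trigger_words=frozenset({"heard by", "tried by"}),
        pattern=re.compile(r"the (?P<value>[A-Z][\w ]*? Court)"),
    )
]
forbidden = {"Rumor"}
sentences = [
    "Rumor has it the case was heard by the Fake Court.",
    "The case was heard by the Haidian District Court on Monday.",
]
result = extract_fields(sentences, rules, forbidden)
print(result)  # {'court': 'Haidian District Court'}
```

Because the forbidden-word check runs before any pattern matching, the first sentence is rejected outright, which is how the sketch avoids the redundant matches that plain keyword matching would return.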

Description

Technical field

[0001] The invention relates to the technical field of Internet information collection, in particular to a method for extracting case information from web pages.

Background art

[0002] With the rapid development of the Internet, webpages have become a huge source of information release and dissemination, and the volume of webpage information is still growing rapidly. A webpage may contain a large amount of information that users need, for example the judgment time and judgment result of a particular case.

[0003] In practical applications, in order to provide users with webpage information in a targeted manner, useful information is generally extracted from existing webpages by means of webpage information extraction, and the extracted information is provided to users. In the prior art, webpage information can be extracted using methods such as keyword matching. Specifically, when extracting webpage information through the keyword mat...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F17/27
CPC: G06F16/254, G06F16/951, G06F40/289
Inventor: 郝静, 张作职
Owner: HYLANDA INFORMATION TECH