Extraction method of case information in webpage

A technique for extracting case information from web pages. It addresses the redundant or wrong results and the low extraction efficiency that affect user experience, and it achieves high extraction accuracy.

Inactive Publication Date: 2016-05-11

HYLANDA INFORMATION TECH



Problems solved by technology

Keyword matching can extract some relevant information from the target webpage. However, because a webpage contains a large amount of information, several pieces of information may match the preset keywords, so the extracted information may contain redundant or wrong information, which affects user experience.

When the extracted information contains redundant information, removing it requires secondary processing of the extracted information, which makes information extraction inefficient.


Embodiment Construction

[0019] The present invention will be described in detail below through specific examples.

[0020] The method of the present invention for extracting case information from a webpage comprises the following steps:

[0021] A. Establish a case attribute information knowledge base, which includes a set of proper nouns for case information types, a set of qualifiers, a set of forbidden words, and a set of specific modifiers for case subjects;

[0022] B. Format the source code of the webpage, and extract the body text and the title;

[0023] C. Scan the text, segment it into words, and divide the article into several complete sentences;

[0024] D. Abstract the characteristic sentence patterns that describe the key fields of a case, and translate them into corresponding rules in combination with the word sets in the knowledge base;

[0025] E. When scanning finds that a word from the word set corresponding to a rule appears, and the sentence pattern after the word satisfies a certain rule, and the...
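Steps B and C can be illustrated with a minimal Python sketch. It is not the patent's implementation: the class name `PageText`, the function `page_to_sentences`, the sample HTML, and the choice to split sentences on whitespace after `.`, `!` or `?` are all assumptions made for illustration. Real case pages, especially Chinese ones, would need a proper word segmenter and punctuation-aware splitting.

```python
import re
from html.parser import HTMLParser


class PageText(HTMLParser):
    """Collects the <title> text and visible body text, ignoring script/style."""

    SKIP = {"script", "style"}  # tags whose content is not article text

    def __init__(self):
        super().__init__()
        self.title = ""
        self.chunks = []
        self._skip_depth = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        text = data.strip()
        if not text or self._skip_depth:
            return
        if self._in_title:
            self.title += text
        else:
            self.chunks.append(text)


def page_to_sentences(html):
    """Step B: extract title and text. Step C: divide the text into sentences."""
    parser = PageText()
    parser.feed(html)
    parser.close()
    body = " ".join(parser.chunks)
    # Assumed splitting rule: whitespace that follows sentence-ending punctuation.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", body) if s]
    return parser.title, sentences


# Hypothetical page used only to exercise the sketch.
html = """<html><head><title>Theft case verdict</title>
<style>p { color: red; }</style></head>
<body><p>The court heard the case on 3 March 2015. The defendant was sentenced to three years.</p>
<script>var x = 1;</script></body></html>"""

title, sentences = page_to_sentences(html)
print(title)
print(sentences)
```

The sentence list produced here is the input that the rule-matching step would scan.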


Abstract

The invention discloses a method for extracting case information from a webpage. A case attribute knowledge base is built; the knowledge base comprises a set of proper nouns for case information types, a qualifier set, a forbidden word set, and a set of specific modifiers for case subjects. The text is segmented into words, and the article is divided into several complete sentences. Corresponding rules are formulated in combination with the word sets in the knowledge base. When scanning finds that a word from the word set corresponding to a rule appears, the sentence pattern around the word satisfies the rule, and the sentence contains no forbidden word, the sentence is deemed to satisfy the rule, and the court-related fields are extracted according to the field positions specified in the rule. The method extracts targeted case information by combining rules with word sets, and the case information extracted in this way has relatively high accuracy.
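The matching logic summarized in the abstract can be sketched in Python as follows. This is a hedged illustration under stated assumptions, not the patented rule set: the knowledge-base contents, the `Rule` structure, the regular-expression rule for a court name, and the sample sentences are all hypothetical, and only two of the four word sets are modeled.

```python
import re
from dataclasses import dataclass

# Assumed stand-in for the case attribute knowledge base.
KNOWLEDGE_BASE = {
    "court_nouns": {"Court", "Tribunal"},   # proper nouns for one case information type
    "forbidden": {"rumored", "allegedly"},  # forbidden words
}


@dataclass
class Rule:
    field: str        # case field the rule fills
    trigger_set: str  # knowledge-base word set whose words trigger the rule
    pattern: str      # sentence pattern; "{word}" stands for the trigger word
    group: int = 1    # capture group that marks the field position


# Assumed rule: a run of capitalized words ending in the trigger word.
RULES = [Rule("court", "court_nouns", r"((?:[A-Z][\w']+ )+{word})")]


def tokenize(sentence):
    """Crude word segmentation: the set of lowercase word tokens."""
    return set(re.findall(r"[\w']+", sentence.lower()))


def apply_rule(sentence, rule, kb):
    """Return the extracted field value, or None if the rule is not satisfied."""
    tokens = tokenize(sentence)
    if tokens & {w.lower() for w in kb["forbidden"]}:
        return None  # a forbidden word disqualifies the whole sentence
    for word in sorted(kb[rule.trigger_set]):
        if word.lower() not in tokens:
            continue  # trigger word absent, try the next one
        regex = rule.pattern.replace("{word}", re.escape(word))
        match = re.search(regex, sentence)
        if match:
            return match.group(rule.group)
    return None


def extract_fields(sentences, rules=RULES, kb=KNOWLEDGE_BASE):
    """Collect every field value found across the sentences."""
    found = {}
    for sentence in sentences:
        for rule in rules:
            value = apply_rule(sentence, rule, kb)
            if value is not None:
                found.setdefault(rule.field, []).append(value)
    return found


# Hypothetical sentences used only to exercise the sketch.
sentences = [
    "On 3 March 2015 the Haidian District People's Court heard the case.",
    "It is rumored that the Chaoyang District People's Court will hear an appeal.",
    "The defendant was sentenced to three years.",
]
result = extract_fields(sentences)
print(result)
```

Only the first sentence yields a value: the second contains a forbidden word and the third contains no trigger word. Checking the forbidden set before matching is what lets the approach avoid the secondary clean-up pass that plain keyword matching requires.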

Description

Technical field

[0001] The invention relates to the technical field of Internet information collection, and in particular to a method for extracting case information from web pages.

Background

[0002] With the rapid development of the Internet, web pages have become a huge source of information release and dissemination, and the amount of web page information is still growing rapidly. A web page may contain a large amount of information that users need, for example the judgment time and judgment result of a certain case.

[0003] In practical applications, in order to provide users with web page information in a targeted manner, useful information is generally extracted from existing web pages by means of web page information extraction, and the extracted information is provided to users. In the prior art, methods such as keyword matching can be used to extract web page information. Specifically, when extracting webpage information through the keyword mat...


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G06F17/30, G06F17/27

CPC: G06F16/254, G06F16/951, G06F40/289

Inventor: 郝静张作职

Owner: HYLANDA INFORMATION TECH
