Content extracting method and device

A content extraction method and device, applied in the field of communication technology, that addresses the need to hand-write a large number of extraction templates and achieves strong adaptability together with fast, accurate content extraction.

Inactive Publication Date: 2017-04-26
XIAMEN MEIYA PICO INFORMATION

AI Technical Summary

Problems solved by technology

[0003] At present, this type of data is extracted almost entirely with templates: the key content to be extracted is obtained by matching the data against preset templates. Template-based extraction is accurate and fast, but its disadvantage is that a large number of templates must be written by hand, and new ones must be added continuously.

Examples

Embodiment 1

[0030] Embodiment 1 is a content extraction method based on semantic analysis and rules. As shown in Figure 1, the processing flow of the method includes:

[0031] S01, perform semantic analysis on the sample data, and construct content extraction rules according to the semantic analysis results and the target content;

[0032] S02, establish a rule base from the content extraction rules constructed from a plurality of sample data;

[0033] S03, perform semantic analysis on the data to be extracted, and match it against the content extraction rules in the rule base according to the semantic analysis results. If the matching succeeds, use the matched content extraction rule to extract the content. If the matching fails, record the semantic analysis results, establish a new content extraction rule, and add the new rule to the rule base.
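
Steps S01 to S03 can be sketched in Python as follows. This is only an illustration under stated assumptions: the excerpt does not say how semantic analysis is performed, so the hypothetical `analyze` function stands in for it with a toy tokenizer that tags numbers as `NUM`. The `RuleBase` class, the field names, and the sample messages are likewise invented for the example. On a failed match the sketch only records the analysis result. How the new rule is then established is not specified in the excerpt.

```python
import re
from typing import Dict, List, Optional, Tuple

Signature = Tuple[str, ...]
Rule = Dict[str, int]  # field name -> index of the token that holds it


def analyze(text: str) -> Tuple[List[str], Signature]:
    """Toy stand-in for semantic analysis (an assumption, not the patent's method)."""
    tokens = re.findall(r"\d+(?:\.\d+)?|[A-Za-z]+", text)
    signature = tuple("NUM" if t[0].isdigit() else t.lower() for t in tokens)
    return tokens, signature


class RuleBase:
    """Hypothetical rule base keyed by the analysis signature."""

    def __init__(self) -> None:
        self.rules: Dict[Signature, Rule] = {}
        self.unmatched: List[Signature] = []

    def add_sample(self, text: str, targets: Dict[str, str]) -> None:
        """S01 + S02: build a rule from one labelled sample and store it."""
        tokens, signature = analyze(text)
        self.rules[signature] = {
            name: tokens.index(value) for name, value in targets.items()
        }

    def extract(self, text: str) -> Optional[Dict[str, str]]:
        """S03: look up a rule by signature; record the signature on failure."""
        tokens, signature = analyze(text)
        rule = self.rules.get(signature)
        if rule is None:
            self.unmatched.append(signature)  # kept so a rule can be built later
            return None
        return {name: tokens[i] for name, i in rule.items()}


base = RuleBase()
base.add_sample(
    "Your card ending 1234 was charged 56.70 yuan",
    {"card_tail": "1234", "amount": "56.70"},
)
result = base.extract("Your card ending 9876 was charged 12.00 yuan")
missed = base.extract("Train G123 departs at 0830")
```

Here `result` holds the two fields taken from the second message, because its signature equals that of the sample, while `missed` is `None` and the unmatched signature is queued in `base.unmatched`.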

[0034] In this embodiment, the semantic analysis specifically includ...

Embodiment 2

[0047] Embodiment 2 builds on Embodiment 1 by combining it with the traditional template-based content extraction method. It includes the steps:

[0048] S00, perform template matching on the data to be extracted. If the matching succeeds, use the template for content extraction; if the matching fails, perform steps S01 to S03;

[0049] S01, perform semantic analysis on the sample data, and construct content extraction rules according to the semantic analysis results and the target content;

[0050] S02, establish a rule base from the content extraction rules constructed from a plurality of sample data;

[0051] S03, perform semantic analysis on the data to be extracted, and match it against the content extraction rules in the rule base according to the semantic analysis results. If the matching succeeds, use the matched content extraction rule to extract the content. If the matching fails, record the semantic analysis results, and esta...
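
The template-first dispatch of step S00 can be sketched as below. This is a minimal illustration that assumes templates are regular expressions with named groups, which the source does not state. The `TEMPLATES` list, the `extract` function, and `rule_based_stub` are hypothetical names. The stub merely marks where the semantic-analysis path of S01 to S03 would be called.

```python
import re
from typing import Callable, Dict, Optional

Fields = Dict[str, str]

# Hypothetical hand-written template for one known message layout.
TEMPLATES = [
    re.compile(r"Card ending (?P<card_tail>\d{4}) spent (?P<amount>\d+\.\d{2})"),
]


def match_templates(text: str) -> Optional[Fields]:
    """S00: return the named groups of the first template that matches."""
    for template in TEMPLATES:
        m = template.search(text)
        if m:
            return m.groupdict()
    return None


def extract(
    text: str, rule_based: Callable[[str], Optional[Fields]]
) -> Optional[Fields]:
    """Try templates first; fall back to the rule-based path (S01 to S03)."""
    fields = match_templates(text)
    if fields is not None:
        return fields
    return rule_based(text)


def rule_based_stub(text: str) -> Optional[Fields]:
    """Placeholder for the semantic-analysis path; grabs the first number only."""
    m = re.search(r"\d+(?:\.\d+)?", text)
    return {"first_number": m.group()} if m else None


via_template = extract("Card ending 1234 spent 56.70", rule_based_stub)
via_fallback = extract("Balance is 300.00 yuan", rule_based_stub)
```

Passing the fallback in as a callable keeps the fast template path independent of whichever rule-based implementation is plugged in behind it.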

Embodiment 3

[0054] Based on the method described in Embodiment 1, the present invention also proposes a content extraction device, including:

[0055] The rule building module is configured to perform semantic analysis on the sample data, and construct content extraction rules according to the semantic analysis results and target content;

[0056] The rule base module is configured to establish a rule base from the content extraction rules constructed from a plurality of sample data;

[0057] The content extraction module is configured to perform semantic analysis on the data to be extracted, and to match it against the content extraction rules in the rule base according to the semantic analysis result. If the match succeeds, it uses the matched content extraction rule to extract the content. If the match fails, it records the semantic analysis result, establishes a new content extraction rule, and adds the new rule to the rule base.
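
One way to picture how the three modules fit together is the sketch below. The class names mirror the module names in the text, but their methods, the whitespace-based `analyze` helper, and the sample messages are assumptions made for illustration. The source does not describe the device's internal interfaces.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Key = Tuple[str, ...]
Extractor = Callable[[List[str]], Dict[str, str]]


def analyze(text: str) -> Tuple[List[str], Key]:
    """Toy stand-in for semantic analysis: split on spaces, key digits as NUM."""
    words = text.split()
    return words, tuple("NUM" if w.isdigit() else w.lower() for w in words)


class RuleBuildingModule:
    def build(self, sample: str, targets: Dict[str, str]) -> Tuple[Key, Extractor]:
        """Turn one labelled sample into a (key, extractor) rule."""
        words, key = analyze(sample)
        positions = {name: words.index(value) for name, value in targets.items()}
        return key, lambda ws: {name: ws[i] for name, i in positions.items()}


@dataclass
class RuleBaseModule:
    rules: Dict[Key, Extractor] = field(default_factory=dict)

    def update(self, key: Key, extractor: Extractor) -> None:
        self.rules[key] = extractor


@dataclass
class ContentExtractionModule:
    rule_base: RuleBaseModule
    failed: List[Key] = field(default_factory=list)

    def extract(self, text: str) -> Optional[Dict[str, str]]:
        words, key = analyze(text)
        extractor = self.rule_base.rules.get(key)
        if extractor is None:
            self.failed.append(key)  # recorded so a new rule can be built later
            return None
        return extractor(words)


builder, rule_base = RuleBuildingModule(), RuleBaseModule()
rule_base.update(
    *builder.build("Train 123 departs at gate 7", {"train": "123", "gate": "7"})
)
device = ContentExtractionModule(rule_base)
fields = device.extract("Train 456 departs at gate 9")
```

The extraction module only reads from the rule base and logs failures. Building rules and updating the rule base stay in their own modules, matching the division of responsibilities in the text.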

Abstract

The invention provides a content extraction method and device based on semantic analysis and rules, and further combines this approach with the traditional template-based content extraction method. Template extraction contributes speed and accuracy, while semantic analysis and rules contribute strong parsing adaptability. By combining the two, content data can be extracted quickly and accurately.

Description

Technical field

[0001] The present invention relates to the field of communication technology, and in particular to a content extraction method and device.

Background

[0002] With the rapid development of mobile terminals, mobile phones have become a necessity in people's lives. In electronic data forensics, chat content makes up the largest share of the data, averaging 70% of the total. A mobile terminal usually holds hundreds of thousands of chat messages, and sometimes several million. Chat content is of great value for research and analysis, and many clues can be found in it. Notification text messages, such as those from banks, mobile operators, and natural gas providers, often contain key information, including basic user information. A bank's consumption notification, for example, includes the owner's name, the last four digits of the bank card number, and the type of bank card. The train ticket and a...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27
CPC: G06F40/30
Inventor: 曾超, 林艺滨, 朱健伟, 江汉祥
Owner: XIAMEN MEIYA PICO INFORMATION