Semantic model based text message extraction method and device

A semantic-model-based text information extraction technique, applied in the field of text processing. It addresses the increased staff workload, low matching flexibility, and low extraction efficiency of existing approaches; it makes extraction expressions easier to generate, improves extraction efficiency, and reduces workload.

Active Publication Date: 2018-01-19
ZHONGKE DINGFU BEIJING TECH DEV
11 Cites, 27 Cited by

AI Technical Summary

Problems solved by technology

[0005] This application provides a semantic model-based text information extraction method and device. It addresses the following problem: extracting complex sentences or specific words (for example, words with a specific part of speech, or time words) requires one or more complex regular expressions. Such expressions are difficult to generate and match inflexibly, which lowers extraction efficiency and increases the workload of staff.

Detailed Description of Embodiments

[0021] As shown in Figure 1, one embodiment of the present application provides a semantic model-based method for extracting text information, including:

[0022] Step 11: Obtain text information to be extracted.

[0023] The text information to be extracted may be a document in doc format, a text document in txt format, or an html document, etc. The text information to be extracted may be characters, numbers, or a combination of characters and numbers, which is not limited in this embodiment.
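As a rough illustration of Step 11, the sketch below normalizes input of different formats into plain text before extraction. The function name `to_plain_text` and the `fmt` labels are hypothetical and do not come from the application; only the txt and html cases are shown, since a doc file would need a separate converter.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects the text nodes of an HTML document and ignores the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def to_plain_text(content: str, fmt: str) -> str:
    """Hypothetical helper: return the text to be extracted as a plain string."""
    if fmt == "txt":
        return content
    if fmt == "html":
        parser = _TextOnly()
        parser.feed(content)
        parser.close()
        # Note: text inside <script> or <style> would also be kept here.
        return " ".join(parser.chunks)
    raise ValueError(f"unsupported format in this sketch: {fmt}")
```

For example, `to_plain_text("<p>Meeting on <b>Friday</b></p>", "html")` yields the string `Meeting on Friday`.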

[0024] Step 12: Perform information extraction on the text information to be extracted, according to an extraction expression and the semantic model corresponding to that extraction expression, to obtain target information. The extraction expression includes a part-of-speech extraction expression, a time extraction expression, and/or a rule extraction expression, wherein the semantic model corresponding to the part-of-speech extraction expression is a statistical semantic...
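Step 12 pairs each kind of extraction expression with its own semantic model. The sketch below shows only that dispatch idea. Everything in it is an assumption made for illustration: the `kind:argument` expression syntax, the function names, and the three toy "models" (a small lexicon, two date patterns, and a keyword filter) are stand-ins, not the models defined in the application.

```python
import re
from typing import Callable, Dict, List

# Toy stand-in for a statistical semantic model: a fixed word-to-tag lexicon.
TOY_POS_LEXICON = {"company": "noun", "contract": "noun", "signed": "verb"}

# Toy stand-in for a time semantic conceptual model: named time concepts.
TOY_TIME_CONCEPTS = {
    "date": r"\d{4}-\d{2}-\d{2}",
    "weekday": r"(?:Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day",
}


def pos_model(text: str, tag: str) -> List[str]:
    """Return the words whose lexicon tag equals the requested part of speech."""
    words = re.findall(r"[A-Za-z]+", text)
    return [w for w in words if TOY_POS_LEXICON.get(w.lower()) == tag]


def time_model(text: str, concept: str) -> List[str]:
    """Return the substrings that match the named time concept."""
    return re.findall(TOY_TIME_CONCEPTS[concept], text)


def rule_model(text: str, keyword: str) -> List[str]:
    """Toy rule: return the sentences that contain the keyword."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return [s.strip() for s in sentences if keyword in s]


# Each expression kind is routed to its corresponding (toy) semantic model.
MODELS: Dict[str, Callable[[str, str], List[str]]] = {
    "pos": pos_model,
    "time": time_model,
    "rule": rule_model,
}


def extract(text: str, expression: str) -> List[str]:
    """Apply a hypothetical 'kind:argument' extraction expression to the text."""
    kind, _, argument = expression.partition(":")
    if kind not in MODELS:
        raise ValueError(f"unknown expression kind: {kind}")
    return MODELS[kind](text, argument)
```

With the text `"The company signed the contract on 2018-01-19. Payment is due Friday."`, the expression `time:date` returns `['2018-01-19']` and `pos:noun` returns `['company', 'contract']`. The point of the design is that the caller writes a short expression and the matching logic lives inside the model, rather than in a long hand-written regular expression.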

Abstract

The invention discloses a semantic model based text message extraction method and device. The method comprises: obtaining the text messages to be extracted; and performing message extraction on them according to extract expressions and the semantic models corresponding to those extract expressions, to obtain target messages. The extract expressions comprise a part-of-speech extract expression, a time extract expression, and/or a rule extract expression. The semantic model corresponding to the part-of-speech extract expression is a statistical semantic model, the semantic model corresponding to the time extract expression is a time semantic conceptual model, and the semantic model corresponding to the rule extract expression is a rule semantic model. Because the extract expressions and semantic models are set according to the extraction requirement at hand, workers do not need to write complex regular expressions one by one. This lowers the generation difficulty and improves matching flexibility, so the method both improves extraction efficiency and lowers the workload of the workers.

Description

Technical Field

[0001] The present application relates to the technical field of text processing, in particular to a semantic model-based text information extraction method and device.

Background

[0002] With the explosive growth of Internet information, the contents of documents are becoming increasingly rich and varied. Because the information people need is hidden in content of many different styles, it is increasingly difficult to find. People therefore need information extraction methods to find the required information in relevant texts.

[0003] At present, information extraction is mainly based on HTML structure. This approach uses an HTML parser to scan the characters in the HTML text information one by one, analyzes the structural hierarchy of the HTML text information, and numbers identical HTML tags sequentially from zero. Finally, a DOM tree corresponding to the HTML text information is formed, and then...
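The HTML-structure approach described in the background can be sketched with Python's standard `html.parser`. This is a minimal illustration, not the prior-art system itself: the class and function names are hypothetical, and the sketch assumes that "numbering identical tags from zero" means counting same-named tags among siblings.

```python
from html.parser import HTMLParser
from typing import Dict, List

# Elements that never have a closing tag; they are counted but not entered.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class PathIndexer(HTMLParser):
    """Scans HTML and records text under paths such as 'html[0]/body[0]/p[1]'."""

    def __init__(self):
        super().__init__()
        self.stack: List[str] = []                   # path of currently open elements
        self.counters: List[Dict[str, int]] = [{}]   # per-level tag counters
        self.text_by_path: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        siblings = self.counters[-1]
        index = siblings.get(tag, 0)                 # identical tags numbered from zero
        siblings[tag] = index + 1
        if tag in VOID_TAGS:
            return
        self.stack.append(f"{tag}[{index}]")
        self.counters.append({})

    def handle_endtag(self, tag):
        # Only well-nested closing tags are handled in this sketch.
        if self.stack and self.stack[-1].startswith(tag + "["):
            self.stack.pop()
            self.counters.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack:
            path = "/".join(self.stack)
            previous = self.text_by_path.get(path, "")
            self.text_by_path[path] = (previous + " " + text).strip()


def index_html(html: str) -> Dict[str, str]:
    """Hypothetical helper: map each element path to the text directly inside it."""
    indexer = PathIndexer()
    indexer.feed(html)
    indexer.close()
    return indexer.text_by_path
```

For `<html><body><p>Date: 2018-01-19</p><p>Owner: ACME</p></body></html>`, the second paragraph is stored under `html[0]/body[0]/p[1]`. Extraction then depends on knowing that path in advance, which ties the extractor to the page layout.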

Application Information

IPC(8): G06F17/22, G06F17/27, G06F17/30
Inventor: 李德彦, 晋耀红, 席丽娜
Owner: ZHONGKE DINGFU BEIJING TECH DEV