A method and device for text information extraction based on semantic model

A semantic-model and text-information technology, applied in the field of text processing. It addresses problems such as increased staff workload, low matching flexibility, and low extraction efficiency, and achieves the effects of reducing generation difficulty, improving extraction efficiency, and reducing workload.

Active Publication Date: 2019-04-16
ZHONGKE DINGFU BEIJING TECH DEV
Cites: 11; Cited by: 0

AI Technical Summary

Problems solved by technology

[0005] This application provides a semantic model-based text information extraction method and device. It addresses the problem that extracting complex sentences or specific words (for example, words with a specific part of speech, or time words) requires one or more complex regular expressions, which are difficult to generate and inflexible to match, resulting in low extraction efficiency and a heavier workload for staff.

Examples

Embodiment Construction

[0021] As shown in Figure 1, one embodiment of the present application provides a method for extracting text information based on a semantic model, including:

[0022] Step 11: Obtain text information to be extracted.

[0023] The text information to be extracted may be a document in doc format, a text document in txt format, or an html document, etc. The text information to be extracted may be characters, numbers, or a combination of characters and numbers, which is not limited in this embodiment.
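Step 11 could be sketched as a small normalization function that turns a txt or html input into plain text before extraction. This is only an illustration under stated assumptions: the names `to_plain_text` and `_TextCollector` are hypothetical, the input is assumed to be already decoded into a string, and the doc format is left unimplemented because it would need a converter that the source does not describe.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Hypothetical helper: collects visible text, skipping script and style."""

    def __init__(self):
        super().__init__()
        self.pieces = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.pieces.append(data)


def to_plain_text(raw: str, fmt: str) -> str:
    """Return the text to be extracted from an already-decoded document string."""
    if fmt == "txt":
        return raw
    if fmt == "html":
        collector = _TextCollector()
        collector.feed(raw)
        collector.close()
        # Collapse runs of whitespace left behind by removed markup.
        return " ".join("".join(collector.pieces).split())
    # Assumption: doc files need an external converter not covered here.
    raise NotImplementedError(f"no converter for format: {fmt}")
```

For example, `to_plain_text("<p>Hello <b>world</b></p>", "html")` yields the string `Hello world`, which can then be passed to Step 12.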

[0024] Step 12: Perform information extraction on the text information to be extracted, according to the extraction expression and the semantic model corresponding to that expression, to obtain target information. The extraction expression includes a part-of-speech extraction expression, a time extraction expression and/or a rule extraction expression, wherein the semantic model corresponding to the part-of-speech extraction expression is a statistical semantic...
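A minimal sketch of how Step 12 might dispatch each kind of extraction expression to its matching semantic model. The source does not give the expression syntax or the internals of the models, so every name here (`ExtractionExpression`, `extract`, `TOY_POS_LEXICON`, `DATE_PATTERN`, `RULES`) is a hypothetical stand-in: a small dictionary plays the role of the statistical semantic model, a date pattern plays the role of the temporal semantic concept model, and a table of named patterns plays the role of the rule semantic model.

```python
import re
from dataclasses import dataclass

# Hypothetical stand-in for a statistical semantic model: word -> part of speech.
TOY_POS_LEXICON = {"company": "noun", "contract": "noun", "signed": "verb"}

# Hypothetical stand-in for a temporal semantic concept model: ISO-style dates only.
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")

# Hypothetical stand-in for a rule semantic model: rule name -> pattern.
RULES = {"amount": re.compile(r"\b\d+(?:\.\d+)? (?:yuan|USD)\b")}


@dataclass(frozen=True)
class ExtractionExpression:
    kind: str           # "pos", "time", or "rule"
    argument: str = ""  # part of speech or rule name; unused for "time"


def extract(text: str, expr: ExtractionExpression) -> list[str]:
    """Apply one extraction expression using the model that matches its kind."""
    if expr.kind == "pos":
        words = re.findall(r"[A-Za-z]+", text)
        return [w for w in words if TOY_POS_LEXICON.get(w.lower()) == expr.argument]
    if expr.kind == "time":
        return DATE_PATTERN.findall(text)
    if expr.kind == "rule":
        return RULES[expr.argument].findall(text)
    raise ValueError(f"unknown extraction expression kind: {expr.kind}")


sample = "The company signed the contract on 2019-04-16 for 500 yuan."
nouns = extract(sample, ExtractionExpression("pos", "noun"))
dates = extract(sample, ExtractionExpression("time"))
amounts = extract(sample, ExtractionExpression("rule", "amount"))
```

The point of the design is that the caller names what to extract (a part of speech, a time, a rule) and the model behind each kind carries the matching logic, so no hand-written regular expression is needed per request.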

Abstract

The present application discloses a method and device for extracting text information based on a semantic model. The method includes: obtaining text information to be extracted; and performing information extraction on that text according to an extraction expression and the semantic model corresponding to the extraction expression, to obtain target information. The extraction expression includes a part-of-speech extraction expression, a time extraction expression and/or a rule extraction expression, where the semantic model corresponding to the part-of-speech extraction expression is a statistical semantic model, the semantic model corresponding to the time extraction expression is a temporal semantic concept model, and the semantic model corresponding to the rule extraction expression is a rule semantic model. Corresponding extraction expressions and semantic models can be set according to different extraction requirements. Staff do not need to write complicated regular expressions one by one, which reduces the difficulty of generating expressions and improves matching flexibility. The method can therefore both improve extraction efficiency and reduce staff workload.

Description

Technical field

[0001] The present application relates to the technical field of text processing, in particular to a semantic model-based text information extraction method and device.

Background

[0002] With the explosive growth of Internet information, the content of documents is becoming richer and more varied. Because the information people need is hidden in content of many different styles, it is increasingly difficult to find. People therefore need information extraction methods to find the required information in relevant texts.

[0003] At present, information extraction mainly relies on HTML-structure-based methods: an HTML parser scans the characters of the HTML text information one by one, analyzes its structural hierarchy, numbers identical HTML tags sequentially from zero, and finally forms a DOM tree corresponding to the HTML text information, and then...
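The HTML-structure approach described in the background can be illustrated with a short sketch that builds a DOM-like tree and numbers identical tags from zero. This is not the method of the application; it only shows the prior technique. The names `Node`, `DomBuilder`, `build_dom`, and `find_paths` are hypothetical, and the sketch assumes that "numbering the same HTML tags" means numbering among siblings that share a parent, which the source does not state.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag in HTML.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class Node:
    def __init__(self, tag, index, parent=None):
        self.tag = tag
        self.index = index  # position among same-tag siblings, starting at zero
        self.parent = parent
        self.children = []
        self.text = ""

    def path(self):
        """Return a path such as html[0]/body[0]/p[1] from the root to this node."""
        parts = []
        node = self
        while node.parent is not None:
            parts.append(f"{node.tag}[{node.index}]")
            node = node.parent
        return "/".join(reversed(parts))


class DomBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("#document", 0)
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        # Number this element among siblings that carry the same tag.
        index = sum(1 for child in self.current.children if child.tag == tag)
        node = Node(tag, index, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray closers.
        node = self.current
        while node.parent is not None and node.tag != tag:
            node = node.parent
        if node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        self.current.text += data.strip()


def build_dom(html):
    builder = DomBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def find_paths(node, tag):
    """Collect (path, text) pairs for every element with the given tag."""
    found = [(node.path(), node.text)] if node.tag == tag else []
    for child in node.children:
        found.extend(find_paths(child, tag))
    return found


page = "<html><body><p>first</p><p>second</p><div><p>third</p></div></body></html>"
paragraphs = find_paths(build_dom(page), "p")
```

Each extracted value is tied to a fixed position in the tree, which suggests why this style of extraction depends so heavily on the page layout staying the same.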

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/22, G06F17/27, G06F16/80
Inventor: 李德彦, 晋耀红, 席丽娜
Owner: ZHONGKE DINGFU BEIJING TECH DEV