Text information extraction method and device, computer equipment and storage medium

A text information extraction technology, applied in the fields of computer equipment, computer storage media, and text information extraction, that addresses problems such as unsatisfactory information extraction results.

Active Publication Date: 2020-03-17
PING AN TECH (SHENZHEN) CO LTD
Cites 7 documents; cited by 2 documents

AI Technical Summary

Problems solved by technology

[0004] Existing information extraction systems achieve good extraction results on texts written in a single language (such as Chinese or English), but their results on texts that contain two different languages are not satisfactory.

Method used

Figure 1 is a flow chart of the text information extraction method; Figure 2 is a structural diagram of the text information extraction device; Figure 3 is a schematic diagram of the computer device.


Examples


Embodiment 1

[0072] Figure 1 is a flow chart of the text information extraction method provided by Embodiment 1 of the present invention. The text information extraction method is applied to a computer device and extracts entity relations from mixed-language texts.

[0073] As shown in Figure 1, the text information extraction method includes:

[0074] S101: identify the first-language entities contained in each sentence of the first-language corpus text and the second-language entities contained in each sentence of the second-language corpus text; combine every two first-language entities that are in the same sentence of the first-language corpus text and have at least one verb between them into a first-language corpus entity pair; and combine every two second-language entities that are in the same sentence of the second-language corpus text and have at least one verb between them into a second-language corpus entity pair.
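The pairing rule in S101 can be sketched as follows. This is a minimal illustration, not the patent's implementation. It assumes each sentence has already been tokenized and POS-tagged, that entities are given as token spans, and that verbs carry the tag `VERB` (Universal Dependencies) or `VV` (Penn Chinese Treebank). The function name `build_entity_pairs` is hypothetical.

```python
from itertools import combinations
from typing import List, Tuple

Token = Tuple[str, str]   # (surface form, POS tag)
Span = Tuple[int, int]    # entity token span [start, end)

# Assumed verb tags: "VERB" (Universal Dependencies) and "VV" (Penn Chinese Treebank).
VERB_TAGS = {"VERB", "VV"}


def build_entity_pairs(tokens: List[Token], entities: List[Span]) -> List[Tuple[Span, Span]]:
    """Pair every two entities in one sentence that have at least one verb between them."""
    pairs = []
    for left, right in combinations(sorted(entities), 2):
        between = tokens[left[1]:right[0]]  # empty if the spans touch or overlap
        if any(tag in VERB_TAGS for _, tag in between):
            pairs.append((left, right))
    return pairs


tokens = [("Alice", "PROPN"), ("founded", "VERB"), ("Acme", "PROPN"),
          ("in", "ADP"), ("Paris", "PROPN")]
print(build_entity_pairs(tokens, [(0, 1), (2, 3), (4, 5)]))
# [((0, 1), (2, 3)), ((0, 1), (4, 5))]
```

Here "Alice" is paired with both "Acme" and "Paris" because the verb "founded" lies between them. "Acme" and "Paris" are not paired, since only the preposition "in" separates them. The same function applies unchanged to either language's corpus as long as the tagger emits one of the assumed verb tags.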

[0075] The first language corpus t...

Embodiment 2

[0144] Figure 2 is a structural diagram of a text information extraction device provided by Embodiment 2 of the present invention. The text information extraction device 20 is applied to a computer device and is used to extract entity relations from mixed-language texts. As shown in Figure 2, the text information extraction device 20 may include an identification module 201, an extension module 202, a first extraction module 203, a labeling module 204, a training module 205, a second extraction module 206, a first classification module 207, a third extraction module 208, a second classification module 209, and a determination module 210.
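The module list maps naturally onto a staged pipeline. The sketch below assumes something the excerpt does not confirm: that modules 201 to 210 run in the order listed and each passes a shared state to the next. The class name, the state dictionary, and the module keys are hypothetical, and each module is a placeholder callable rather than a real implementation.

```python
from typing import Any, Callable, Dict

State = Dict[str, Any]
Module = Callable[[State], State]

# Hypothetical keys for modules 201-210, in the order the embodiment lists them.
MODULE_ORDER = [
    "identification_201", "extension_202", "first_extraction_203", "labeling_204",
    "training_205", "second_extraction_206", "first_classification_207",
    "third_extraction_208", "second_classification_209", "determination_210",
]


class TextInfoExtractionDevice:
    """Runs the device's modules in a fixed order, threading a shared state through them."""

    def __init__(self, modules: Dict[str, Module]):
        missing = [name for name in MODULE_ORDER if name not in modules]
        if missing:
            raise ValueError(f"missing modules: {missing}")
        self.modules = modules

    def run(self, state: State) -> State:
        for name in MODULE_ORDER:
            state = self.modules[name](state)  # each module returns the updated state
        return state
```

Keeping the order in a single list lets real module implementations be swapped in one at a time. The constructor also fails early if any module is missing.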

[0145] The identification module 201 is used to identify the first-language entities contained in each sentence of the first-language corpus text and the second-language entities contained in each sentence of the second-language corpus text, and every two first language entit...

Embodiment 3

[0216] Figure 3 is a schematic diagram of a computer device provided by Embodiment 3 of the present invention. The computer device 30 includes a memory 301, a processor 302, and a computer program 303, such as a text information extraction program, that is stored in the memory 301 and executable on the processor 302. When the processor 302 executes the computer program 303, it implements the steps of the text information extraction method embodiment above, for example steps S101-S111 shown in Figure 1. Alternatively, when the processor executes the computer program, it implements the functions of the modules in the device embodiment above, for example modules 201-210 in Figure 2.

[0217] For example, the computer program 303 can be divided into one or more modules, which are stored in the memory 301 and executed by the processor 302 to complete the method. The one or more modules may be a series of computer program instruction segments capable...



Abstract

The invention provides a text information extraction method and related equipment. The method comprises: obtaining a first-language labeled corpus set, a first-language unlabeled corpus set, a second-language labeled corpus set, and a second-language unlabeled corpus set from a first-language corpus text and a second-language corpus text; cooperatively training a first-language classifier and a second-language classifier using these corpus sets; classifying, with the first-language classifier, a first-language target entity pair obtained from a mixed sentence, and classifying, with the second-language classifier, a second-language target entity pair obtained from the same mixed sentence; and obtaining the entity relation of the mixed entity pair in the mixed sentence from the classification results of the first-language and second-language target entity pairs. With this method, entity relations can be accurately extracted from texts that use two different languages.
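The abstract does not specify how the two classifiers are trained cooperatively or how their results are combined. The sketch below uses classic two-view co-training as an assumed interpretation. Each corpus entity pair is assumed to be available as an aligned first-language view and second-language view (for example through translation, which is also an assumption). In each round, each classifier pseudo-labels the unlabeled pairs it is most confident about, and those pairs join the shared labeled set. A toy multinomial Naive Bayes stands in for whatever classifier the invention actually uses. The final relation is chosen by a weighted average of the two probability distributions. All names (`NaiveBayes`, `co_train`, `fuse`) and the averaging rule are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Features = List[str]               # bag of words describing an entity pair in one language
Pair = Tuple[Features, Features]   # (first-language view, second-language view), assumed aligned


class NaiveBayes:
    """Toy multinomial Naive Bayes with add-one smoothing (stand-in classifier)."""

    def fit(self, xs: Sequence[Features], ys: Sequence[str]) -> "NaiveBayes":
        self.labels = sorted(set(ys))
        self.prior = Counter(ys)
        self.n = len(ys)
        self.counts: Dict[str, Counter] = defaultdict(Counter)
        for x, y in zip(xs, ys):
            self.counts[y].update(x)
        # +1 reserves probability mass for words never seen in training
        self.v = len({w for x in xs for w in x}) + 1
        return self

    def predict_proba(self, x: Features) -> Dict[str, float]:
        logp = {}
        for y in self.labels:
            total = sum(self.counts[y].values())
            score = math.log(self.prior[y] / self.n)
            for w in x:
                score += math.log((self.counts[y][w] + 1) / (total + self.v))
            logp[y] = score
        top = max(logp.values())  # subtract the max for numerical stability
        z = sum(math.exp(s - top) for s in logp.values())
        return {y: math.exp(s - top) / z for y, s in logp.items()}


def _fit_views(labeled: List[Tuple[Pair, str]]) -> Tuple[NaiveBayes, NaiveBayes]:
    ys = [y for _, y in labeled]
    return (NaiveBayes().fit([p[0] for p, _ in labeled], ys),
            NaiveBayes().fit([p[1] for p, _ in labeled], ys))


def co_train(labeled: List[Tuple[Pair, str]], unlabeled: List[Pair],
             rounds: int = 5, per_round: int = 1) -> Tuple[NaiveBayes, NaiveBayes]:
    """Each round, each view's classifier pseudo-labels its most confident unlabeled pairs."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        c1, c2 = _fit_views(labeled)
        for clf, view in ((c1, 0), (c2, 1)):
            ranked = sorted(pool, key=lambda p: max(clf.predict_proba(p[view]).values()),
                            reverse=True)
            for p in ranked[:per_round]:
                probs = clf.predict_proba(p[view])
                labeled.append((p, max(probs, key=probs.get)))
                pool.remove(p)
    return _fit_views(labeled)


def fuse(p1: Dict[str, float], p2: Dict[str, float], w: float = 0.5) -> str:
    """Pick the relation with the highest weighted average of the two classifiers' probabilities."""
    return max(sorted(set(p1) | set(p2)),
               key=lambda y: w * p1.get(y, 0.0) + (1 - w) * p2.get(y, 0.0))
```

For a mixed entity pair, one would call `fuse(c1.predict_proba(view1), c2.predict_proba(view2))` after `c1, c2 = co_train(labeled, unlabeled)`. Averaging probabilities is only one plausible combination rule. A vote or a learned combiner would fit the abstract equally well.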

Description

Technical field
[0001] The present invention relates to the technical field of natural language processing, and in particular to a text information extraction method and device, computer equipment, and a computer storage medium.
Background art
[0002] Information extraction is a key technology in natural language processing. It extracts specific information from text to form structured data that users can query and use.
[0003] Information extraction includes entity extraction and relation extraction. Entity extraction, the basis of relation extraction, identifies entity information such as person names, place names, institution names, dates, and amounts in text. Relation extraction identifies the semantic relations between entities. It is an important research topic in information extraction and a key step in building knowledge graphs. It is of great help to natural language processing tasks ...


Application Information

Patent type & authority: Applications (China)
IPC (8): G06F16/28; G06F40/211; G06F40/284
CPC: G06F16/288; G06F16/285
Inventors: 杨冬艳 (Yang Dongyan), 王智浩 (Wang Zhihao)
Owner: PING AN TECH (SHENZHEN) CO LTD