Text-oriented relative position information extraction method

A relative position information extraction technology, applied in neural learning methods, digital data information retrieval, instruments, etc. It can solve problems such as difficult identification, the neglected identification and conversion of relative positions, and inaccurate identification ranges. Effects: it avoids manual feature extraction, reduces demand, and improves efficiency.

Active Publication Date: 2021-08-20
WUHAN UNIV
5 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

However, relative positions can be expressed in many forms that are very close to everyday language, which makes them difficult to identify with fixed grammatical rules; new extraction methods are needed.
Existing related inventions often focus only on methods for extracting geographically named entities, ignoring the identification and transformation of the relative positional relationships between entities, and they lack a relevant corpus of complete location information.
At the same time, existing recognition methods still have many deficiencies, such as a low recall rate for complex place names and an inaccurate recognition range.




Embodiment Construction

[0028] As shown in Figure 1, the process of the present invention comprises the following steps:

[0029] Step 1: Text preprocessing, word segmentation and part-of-speech tagging.

[0030] The present invention targets Simplified Chinese only, but some texts are mixed with Hanyu Pinyin, English, Traditional Chinese or other languages. To prevent this content from affecting word segmentation and tagging, it is deleted.
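A minimal sketch of this filtering step, assuming "other languages" can be approximated by runs of Latin letters (which covers English and Hanyu Pinyin, including tone-marked vowels). Detecting Traditional Chinese would need a character conversion table and is not attempted here; the function name `strip_non_simplified` is hypothetical.

```python
import re

# Assumption: Latin-script runs (English, Pinyin with tone marks) stand in for
# "other languages"; Traditional Chinese detection is NOT handled here.
LATIN_RUN = re.compile(r"[A-Za-zāáǎàēéěèīíǐìōóǒòūúǔùǖǘǚǜü]+")


def strip_non_simplified(text: str) -> str:
    """Delete Latin-script runs, then collapse the leftover whitespace."""
    cleaned = LATIN_RUN.sub("", text)
    return re.sub(r"\s+", " ", cleaned).strip()


print(strip_non_simplified("武汉大学 Wuhan University 位于珞珈山"))
# → 武汉大学 位于珞珈山
```

Deleting rather than transliterating keeps the sketch simple; it also discards any Latin-script place names, which is consistent with the text's choice to delete such content.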

[0031] After the text data are preprocessed, the NLPIR word segmentation tool is used to perform word segmentation and part-of-speech tagging on the text content.
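The later steps work on (word, tag) pairs. Assuming the segmenter's output is in the common `word/tag` form separated by spaces (the exact format depends on how NLPIR is invoked), a small parser can convert it. `parse_tagged` is a hypothetical helper, and the tags shown (`ns` for place name, `f` for locative word) follow the ICTCLAS/PKU convention.

```python
def parse_tagged(line: str) -> list[tuple[str, str]]:
    """Split 'word/tag word/tag ...' into (word, tag) pairs."""
    pairs: list[tuple[str, str]] = []
    for token in line.split():
        # rpartition splits on the LAST '/', so a word containing '/' stays intact.
        word, sep, tag = token.rpartition("/")
        if sep:
            pairs.append((word, tag))
        else:
            # No '/' present: keep the token with an empty tag.
            pairs.append((token, ""))
    return pairs


tagged = parse_tagged("学校/n 位于/v 东湖/ns 附近/f")
place_names = [w for w, t in tagged if t == "ns"]  # ['东湖']
```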

[0032] Step 2: Use the BiLSTM+CRF model together with a spatial semantic feature template to extract place-name entities and relative-position indication information from the segmented and tagged text.
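The excerpt does not spell out the spatial semantic feature template. A minimal sketch, assuming a template built from the word, its part-of-speech tag, the neighbouring tags, and membership in a small position-indicator word list; the `INDICATORS` set, feature names and `token_features` function are hypothetical. Such per-token features could be fed to the model alongside word embeddings, but how the patent integrates them is not described here.

```python
# Hypothetical indicator words; the patent's actual dictionary is not given here.
INDICATORS = {"附近", "旁边", "周边", "对面", "东侧", "西侧", "南侧", "北侧"}


def token_features(tokens: list[tuple[str, str]], i: int) -> dict[str, object]:
    """Build a feature dict for token i from (word, POS) pairs."""
    word, pos = tokens[i]
    prev_pos = tokens[i - 1][1] if i > 0 else "<BOS>"
    next_pos = tokens[i + 1][1] if i + 1 < len(tokens) else "<EOS>"
    return {
        "word": word,
        "pos": pos,
        "prev_pos": prev_pos,
        "next_pos": next_pos,
        # Spatial cues: indicator word, locative POS, place name just before.
        "is_indicator": word in INDICATORS,
        "is_locative": pos == "f",
        "after_place_name": prev_pos == "ns",
    }


sentence = [("学校", "n"), ("位于", "v"), ("东湖", "ns"), ("附近", "f")]
features = [token_features(sentence, i) for i in range(len(sentence))]
```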

[0033] The present invention combines a model with rules: it first uses the BiLSTM+CRF model to carry out large-scale t...
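In a BiLSTM+CRF tagger, the BiLSTM assigns each token a score for each tag (emission scores). The CRF layer adds tag-to-tag transition scores and selects the highest-scoring tag sequence with Viterbi decoding. The sketch below shows only that decoding step in plain Python. The BIO tag set, the scores, and the name `viterbi_decode` are illustrative assumptions, and the BiLSTM that would produce the emissions is not shown.

```python
from typing import Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]],
                   tags: Sequence[str]) -> list[str]:
    """Best tag path; emissions[t][j] is token t's score for tag j,
    transitions[i][j] is the score for moving from tag i to tag j.
    Assumes at least one token."""
    n_tags = len(tags)
    score = list(emissions[0])          # best path score ending in each tag
    backpointers: list[list[int]] = []  # best previous tag, per step and tag
    for emit in emissions[1:]:
        new_score, pointers = [], []
        for j in range(n_tags):
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            pointers.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final tag.
    path = [max(range(n_tags), key=lambda j: score[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return [tags[j] for j in path]


TAGS = ["O", "B-LOC", "I-LOC"]
NEG = -1e4  # effectively forbids O -> I-LOC
TRANS = [[0.0, 0.0, NEG], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
print(viterbi_decode([[0.1, 2.0, 0.5], [0.2, 0.1, 1.5], [2.0, 0.3, 0.1]], TRANS, TAGS))
# → ['B-LOC', 'I-LOC', 'O']
```

The large negative transition score is what lets the CRF layer rule out label sequences such as an entity continuation without a beginning, which a per-token classifier alone cannot enforce.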



Abstract

Location information comprises not only geographically named entities but also, as an important component, the relative position information between those entities. However, relative positions are expressed in diverse forms that are very close to everyday language, so they are difficult to identify with fixed grammatical rules, and a new extraction method is needed. The invention provides a text-oriented relative position information extraction method. The method uses a BiLSTM (bidirectional LSTM) + CRF (conditional random field) model and a spatial semantic feature template to train on textual data and extract place-name entity information and relative position information from the text. It then combines the semantic structure of Chinese text with an acquired external feature library to construct a position-indicating information dictionary and to formulate recognition rules for relative position information, which further optimizes the precision of relative position information extraction.
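The abstract mentions a position-indicating information dictionary and recognition rules but does not list them. A minimal rule-based sketch, assuming a small hypothetical dictionary and a single pattern: a place name (POS `ns`), an optional particle 的, then an indicator word. `RelativePosition`, `INDICATOR_DICT`, `SKIPPABLE` and `extract_relative_positions` are all illustrative names; the actual rules would be derived from the Chinese semantic structure and external feature library described in the abstract.

```python
from typing import NamedTuple


class RelativePosition(NamedTuple):
    anchor: str     # place-name entity the position is measured from
    indicator: str  # relative-position word, e.g. "附近" (nearby)


# Hypothetical dictionary entries and skippable particles (assumptions).
INDICATOR_DICT = {"附近", "旁边", "周边", "对面", "东侧", "西侧", "南侧", "北侧"}
SKIPPABLE = {"的"}


def extract_relative_positions(tokens: list[tuple[str, str]]) -> list[RelativePosition]:
    """Rule: place name (POS 'ns'), optional '的', then a dictionary indicator."""
    found: list[RelativePosition] = []
    for i, (word, pos) in enumerate(tokens):
        if pos != "ns":
            continue
        j = i + 1
        while j < len(tokens) and tokens[j][0] in SKIPPABLE:
            j += 1
        if j < len(tokens) and tokens[j][0] in INDICATOR_DICT:
            found.append(RelativePosition(word, tokens[j][0]))
    return found


tokens = [("学校", "n"), ("位于", "v"), ("东湖", "ns"), ("的", "ude1"), ("西侧", "f")]
print(extract_relative_positions(tokens))
# → [RelativePosition(anchor='东湖', indicator='西侧')]
```

Pairing the anchor entity with the indicator word keeps the fixed, definite location (the named entity) together with the qualitative relation that modifies it.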

Description

Technical field

[0001] The invention relates to a text-oriented location information extraction method, in particular to a text-oriented relative location information extraction method.

Background

[0002] Location information includes not only geographically named entities; the relative location information between entities is also an important part. However, unlike spatial positions expressed by coordinates or similar means, spatial relationships in text are often ambiguous. Most such descriptions are qualitative and use common words such as "surrounding" and "next to", from which a precise location cannot be obtained. Geographically named entities, by contrast, are usually fixed and denote a definite location. Combining geographically named entities with the relative location information between them is therefore an ideal way to extract location information from text. However, relative positions can be expressed in various forms, which ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/387; G06F40/242; G06F40/295; G06F40/30; G06N3/04; G06N3/08
CPC: G06F16/387; G06F40/30; G06F40/295; G06F40/242; G06N3/08; G06N3/044
Inventor: 李霖, 罗振威, 朱海红, 沈航, 金榜, 李昭熹
Owner: WUHAN UNIV