Segmented semantic annotation method in weak annotation environment

A semantic annotation and segmentation technique applied in semantic analysis, semantic tool creation, natural language data processing and related areas. It addresses the problem that multidisciplinary, weakly labeled resource texts cannot be accurately processed and analyzed, and achieves good annotation results.

Pending Publication Date: 2020-03-17
HARBIN ENG UNIV

AI Technical Summary

Problems solved by technology

[0007] The purpose of the present invention is to provide a segmented semantic annotation method in a weak annotation environment that solves the problem that multidisciplinary, weakly annotated resource texts cannot be accurately processed and analyzed, helps users narrow the search scope and find search results quickly, and improves search accuracy.

Embodiment Construction

[0030] The implementation of the present invention comprises the following steps:

[0031] (1) Using data mining techniques and weak labels such as article titles, automatically build a four-layer "category-entity-relationship-extension" domain ontology related to the text topic;

[0032] (2) Using word segmentation, map the article, paragraph by paragraph, onto the constructed four-layer domain ontology to obtain a preliminary semantic annotation of the weakly annotated text;

[0033] (3) Combine the semantic annotation information of each natural paragraph with a certain number of content words before and after it, generate word vectors with the skip-gram model, and train a convolutional neural network with an attention mechanism to judge whether adjacent paragraphs belong to the same sentence group;

[0034] (4) Use the bag-of-words model to verify the accuracy of the generated sentence groups and...
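
Step (1) leaves both the data mining algorithm and the storage format unspecified. The following is only a minimal sketch of one possible in-memory shape for the four layers, assuming Python 3.9+; the class name, its method names and the toy history-domain entries in the usage note are hypothetical, not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FourLayerOntology:
    """Hypothetical container for a category-entity-relationship-extension ontology."""

    # Layers 1-2: category name -> entities that belong to it
    categories: dict[str, set[str]] = field(default_factory=dict)
    # Layer 3: (head entity, relation name) -> tail entities
    relations: dict[tuple[str, str], set[str]] = field(default_factory=dict)
    # Layer 4: entity -> extension terms such as aliases or related keywords
    extensions: dict[str, set[str]] = field(default_factory=dict)

    def add_entity(self, category: str, entity: str) -> None:
        self.categories.setdefault(category, set()).add(entity)

    def add_relation(self, head: str, relation: str, tail: str) -> None:
        self.relations.setdefault((head, relation), set()).add(tail)

    def add_extension(self, entity: str, term: str) -> None:
        self.extensions.setdefault(entity, set()).add(term)

    def category_of(self, entity: str) -> Optional[str]:
        """Return the category that contains the entity, or None if it is unknown."""
        for category, entities in self.categories.items():
            if entity in entities:
                return category
        return None

    def resolve(self, term: str) -> Optional[str]:
        """Map an entity name or an extension term to its entity, or None."""
        if self.category_of(term) is not None:
            return term
        for entity, terms in self.extensions.items():
            if term in terms:
                return entity
        return None
```

For example, adding the entity "Tang dynasty" under "history" with the extension "Tang" lets `resolve("Tang")` return "Tang dynasty", which is the kind of lookup the annotation step needs.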
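
For step (2), a minimal sketch assuming the article has already been split into natural paragraphs and segmented into words (for Chinese text, by a segmenter such as jieba), and that the ontology has been flattened into a term lexicon. Preliminary annotation then reduces to a per-token dictionary lookup. `LEXICON`, `annotate_paragraph` and `annotate_article` are hypothetical names.

```python
from collections import Counter

# Hypothetical lexicon flattened from the ontology: lower-cased term -> (category, entity)
LEXICON = {
    "tang": ("history", "Tang dynasty"),
    "chang'an": ("history", "Chang'an"),
    "li bai": ("literature", "Li Bai"),
}


def annotate_paragraph(tokens, lexicon):
    """Count the ontology entities and categories hit by one pre-segmented paragraph."""
    entities, categories = Counter(), Counter()
    for token in tokens:
        hit = lexicon.get(token.lower())
        if hit is not None:
            category, entity = hit
            entities[entity] += 1
            categories[category] += 1
    return {"entities": dict(entities), "categories": dict(categories)}


def annotate_article(paragraphs, lexicon):
    """Preliminary annotation of a whole article: one result per natural paragraph."""
    return [annotate_paragraph(p, lexicon) for p in paragraphs]
```

Exact matching is the simplest possible choice here; a real system would likely also use the extension layer and fuzzier matching.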
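
Step (3) combines three ingredients, and the sketch below gives hedged, framework-free versions of each: building the mixed input sequence for one paragraph, generating skip-gram (center, context) training pairs, and dot-product attention pooling of the kind an attention layer can apply over CNN feature vectors. Reading "content words before and after" as the first and last k content words of the paragraph is an assumption, as are the stop list, the function names and the choice of dot-product attention; the CNN itself is omitted.

```python
import math

# Hypothetical stop list used to keep only content words
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to"}


def content_words(tokens):
    return [t for t in tokens if t.lower() not in STOPWORDS]


def mixed_sequence(labels, tokens, k):
    """Annotation labels framed by the first k and last k content words (non-overlapping)."""
    words = content_words(tokens)
    head = words[:k]
    tail = words[max(k, len(words) - k):]
    return head + list(labels) + tail


def skipgram_pairs(sequence, window):
    """(center, context) pairs within +/- window positions, as used to train skip-gram."""
    pairs = []
    for i, center in enumerate(sequence):
        for j in range(max(0, i - window), min(len(sequence), i + window + 1)):
            if j != i:
                pairs.append((center, sequence[j]))
    return pairs


def attention_pool(vectors, query):
    """Dot-product attention: softmax(v . q) weights, then a weighted sum of the vectors."""
    scores = [sum(a * b for a, b in zip(v, query)) for v in vectors]
    peak = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(vectors[0])
    pooled = [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]
    return pooled, weights
```

In practice the word vectors would be trained by a library rather than by hand; for example, gensim's `Word2Vec` class uses the skip-gram architecture when `sg=1`. The pooled vector would then feed a binary classifier that decides whether two adjacent paragraphs belong to the same sentence group.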
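
Step (4) is truncated in the source, so the exact verification rule is unknown. As one hedged reading, the sketch below represents each paragraph as a bag-of-words count vector and accepts a generated sentence group only if every pair of adjacent paragraphs in it reaches a cosine-similarity threshold. The function names and the default threshold of 0.2 are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(tokens):
    """Count vector of lower-cased tokens."""
    return Counter(t.lower() for t in tokens)


def cosine(a, b):
    """Cosine similarity of two sparse count vectors; 0.0 if either is empty."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def verify_groups(paragraphs, groups, threshold=0.2):
    """For each group (a list of paragraph indices), check every adjacent pair."""
    vectors = [bag_of_words(p) for p in paragraphs]
    return [
        all(cosine(vectors[i], vectors[j]) >= threshold for i, j in zip(group, group[1:]))
        for group in groups
    ]
```

A single-paragraph group has no adjacent pairs and is therefore accepted trivially.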

Abstract

The invention provides a segmented semantic annotation method for a weak annotation environment. The method comprises the following steps: (1) inputting the text to be annotated and automatically constructing, with a data mining algorithm, a four-layer "category-entity-relationship-extension" domain ontology related to the text topic; (2) semantically annotating the weakly annotated text with the constructed four-layer domain ontology; (3) dividing the annotated paragraph text into sentence groups with a convolutional neural network augmented with an attention mechanism; and (4) using the bag-of-words model to verify the accuracy of the sentence group division and to screen the annotation information of the newly generated sentence groups. The method achieves good annotation and sentence group division results on texts from different fields such as history, literature, entertainment and computing, solves the problem that multidisciplinary, weakly annotated resource texts cannot be accurately processed and analyzed, and can help users narrow the search scope, find search results quickly and improve search accuracy.

Description

Technical Field

[0001] The invention relates to a natural language processing (NLP) method, in particular to a method for dividing sentence groups based on text annotation.

Background

[0002] With the rapid development of the Chinese Internet, a large amount of text data has accumulated on major Chinese Internet platforms. Analyzing and classifying these data helps a platform build a clear user portrait, which is of great significance to its future development and positioning.

[0003] In practice, however, most data labels are weak labels, that is, inaccurate and incomplete labels. A weakly labeled sample may carry only a fraction of its corresponding labels, or none at all. Yet the popular existing data-processing methods are all based on multi-label data. Common multi-label classification algorithms such as Label Powerset (LP), binary association (Bi...
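
Label Powerset, named in [0003], is a standard problem-transformation technique for multi-label classification: every distinct combination of labels becomes a single class for an ordinary multi-class classifier. The function below is a minimal illustrative sketch; its name and return format are hypothetical, and it is not part of the claimed method.

```python
def label_powerset(label_sets):
    """Map each distinct label combination to one integer class id (order-insensitive)."""
    class_of = {}
    targets = []
    for labels in label_sets:
        key = frozenset(labels)  # {"a", "b"} and {"b", "a"} share one class
        if key not in class_of:
            class_of[key] = len(class_of)
        targets.append(class_of[key])
    return targets, class_of
```

For example, the label sets {history, literature} and {history} receive different class ids, so a weakly labeled copy of a fully labeled sample is treated as a different class, which illustrates why weak labels are a problem for such methods.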

Application Information

IPC (8): G06F16/36; G06F40/30; G06F40/289; G06F16/35
CPC: G06F16/367; G06F16/35
Inventors: 张健沛 (Zhang Jianpei), 安立桐 (An Litong), 杨静 (Yang Jing), 王勇 (Wang Yong)
Owner: HARBIN ENG UNIV