Text multi-feature ambiguity resolution method and system

A multi-feature ambiguity resolution technology for text. It addresses combined ambiguity in word segmentation and makes the classification and recognition of text more accurate.

Pending Publication Date: 2021-09-03
SHANDONG NORMAL UNIV

AI Technical Summary

Problems solved by technology

[0007] (2) Ambiguity
Word segmentation and ambiguity resolution for traditional Chinese medicine (TCM) medical record texts fall within the field of Chinese word segmentation and ambiguity resolution. However, because TCM medical record texts use many proper nouns and personal idioms and mix modern and ancient terms, existing ambiguity resolution algorithms perform poorly when applied to TCM medical record text.



Examples


Embodiment 1

[0038] In this embodiment 1, a text multi-feature ambiguity resolution system is provided. The system includes:

[0039] A resolution module configured to: use the trained resolution model to identify and extract the combined ambiguity fields in the text to be resolved, and segment the extracted text according to the contextual relevance and part-of-speech features of the words in the text, obtaining the text after ambiguity resolution. The trained resolution model is obtained by training on a training set, and the training set includes feature vectors composed of the text weight features, context-related features, and part-of-speech features of the text in which each ambiguous field is located.
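As a rough illustration of the "identify and extract, then segment" steps described for the module, the sketch below scans a text for entries of a small lexicon and applies a "combine" or "separate" decision to one field. The lexicon contents, the function names `find_ambiguous_fields` and `segment_field`, and the example sentence are all assumptions made for illustration; the source does not specify how the module is implemented.

```python
from typing import Dict, List, Tuple

# Hypothetical lexicon of combined-ambiguity fields: each key can be read as
# one word or as the sequence of shorter words given in its value.
AMBIGUOUS_LEXICON: Dict[str, List[str]] = {
    "才能": ["才", "能"],
    "把手": ["把", "手"],
}


def find_ambiguous_fields(
    text: str, lexicon: Dict[str, List[str]] = AMBIGUOUS_LEXICON
) -> List[Tuple[int, str]]:
    """Return (start_index, field) for every lexicon entry found in the text."""
    hits: List[Tuple[int, str]] = []
    for field in lexicon:
        start = text.find(field)
        while start != -1:
            hits.append((start, field))
            start = text.find(field, start + 1)
    return sorted(hits)


def segment_field(
    field: str, decision: str, lexicon: Dict[str, List[str]] = AMBIGUOUS_LEXICON
) -> List[str]:
    """Apply a 'combine' or 'separate' decision to one ambiguous field."""
    if decision == "combine":
        return [field]
    if decision == "separate":
        return list(lexicon[field])
    raise ValueError(f"unknown decision: {decision!r}")
```

In this sketch the decision string would come from the trained model; here it is passed in by hand so the two steps can be read in isolation.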

[0040] In this embodiment 1, the above system is used to implement the text multi-feature ambiguity resolution method, which includes: inputting the text to be resolved into the trained resolution model, and identifyin...

Embodiment 2

[0049] In this embodiment 2, given the concise, fuzzy, and unstructured language of TCM medical record texts, a method that combines multiple features is proposed to improve existing ambiguity resolution algorithms. A multi-feature ambiguity resolution model suited to TCM medical record texts is designed and applied to them for disambiguation, which further improves the accuracy of word segmentation of TCM medical record texts.

[0050] In this embodiment 2, the multi-feature ambiguity resolution method for TCM medical record text includes: inputting the TCM medical record text to be resolved into the trained resolution model; identifying and extracting the combined ambiguity fields in the text; and segmenting the extracted text according to the contextual relevance and part-of-speech features of the words in the text to obtain the disambiguated text; wherein the trained resolution model is obtaine...

Embodiment 3

[0148] This embodiment 3 addresses multi-feature ambiguity resolution for TCM medical record text. It provides a multi-feature ambiguity resolution method for such text and builds a multi-feature ambiguity resolution model in four stages:
(1) For combined ambiguity, build a combined ambiguity thesaurus to identify and extract ambiguous fields.
(2) Select an appropriate context window.
(3) Generate a feature vector by extracting the weight feature, context word feature, and part-of-speech feature of the text.
(4) Input the feature vector into the nonlinear SVM, and train the "combine" classifier and the "separate" classifier on the training set text.
This resolves combined ambiguity in TCM medical record texts and improves the accuracy of their word segmentation.
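Stages (2) and (3) can be pictured with a minimal sketch that takes a context window around an ambiguous field and concatenates a weight feature, context word indicators, and part-of-speech indicators into one vector. The source does not define how each feature is computed, so everything here is an assumption: the weight is taken to be the field's relative frequency in the token list, the context and part-of-speech features are simple presence indicators, and the names `context_window` and `build_feature_vector` are hypothetical.

```python
from typing import List, Sequence, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag)


def context_window(tokens: Sequence[Token], center: int, size: int) -> List[Token]:
    """Return up to `size` tokens on each side of the token at `center`."""
    left = tokens[max(0, center - size):center]
    right = tokens[center + 1:center + 1 + size]
    return list(left) + list(right)


def build_feature_vector(
    tokens: Sequence[Token],
    center: int,
    size: int,
    vocab: Sequence[str],
    tagset: Sequence[str],
) -> List[float]:
    """Concatenate weight, context word, and part-of-speech features."""
    field = tokens[center][0]
    # Assumed weight feature: relative frequency of the field in the text.
    weight = sum(1 for word, _ in tokens if word == field) / len(tokens)
    window = context_window(tokens, center, size)
    window_words = {word for word, _ in window}
    window_tags = {tag for _, tag in window}
    # Presence indicators over a fixed vocabulary and a fixed tag set.
    word_part = [1.0 if v in window_words else 0.0 for v in vocab]
    tag_part = [1.0 if t in window_tags else 0.0 for t in tagset]
    return [weight] + word_part + tag_part
```

The vector length is fixed at `1 + len(vocab) + len(tagset)`, which keeps every training example the same size regardless of where the ambiguous field sits in the sentence.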


Abstract

The invention provides a text multi-feature ambiguity resolution method and system and belongs to the technical field of text recognition processing. The method comprises: inputting a text to be resolved into a trained resolution model; recognizing and extracting the combined ambiguity fields in the text; and segmenting the extracted text according to the context relevance and part-of-speech features of the words in the text to obtain the text after ambiguity resolution. The trained resolution model is obtained by training on a training set, and the training set comprises feature vectors composed of the text weight features, context association features, and part-of-speech features of the text in which the ambiguous fields are located. The weight feature, the context word feature, and the part-of-speech feature of the text are combined to generate the feature vector, a linear kernel function is adopted for classification by using a nonlinear SVM model, and the correct segmentation of each combined ambiguity is finally obtained, so that the classification and recognition of text with multi-feature ambiguous fields are more accurate.
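The classification step can be sketched as a kernel SVM decision function whose sign selects "combine" or "separate". This is only an illustration of how a trained SVM scores a feature vector: the support vectors, coefficients, and bias would come from training, the values used in any call are made up, and the function names are hypothetical. The linear kernel is used as the default because the abstract names it; another kernel can be passed in.

```python
from typing import Callable, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def linear_kernel(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def svm_decision(
    x: Vector,
    support_vectors: Sequence[Vector],
    dual_coefs: Sequence[float],
    bias: float,
    kernel: Kernel = linear_kernel,
) -> float:
    """Compute sum_i coef_i * K(sv_i, x) + bias for one feature vector."""
    score = sum(c * kernel(sv, x) for sv, c in zip(support_vectors, dual_coefs))
    return score + bias


def classify(
    x: Vector,
    support_vectors: Sequence[Vector],
    dual_coefs: Sequence[float],
    bias: float,
    kernel: Kernel = linear_kernel,
) -> str:
    """Map the sign of the decision value to a segmentation decision."""
    score = svm_decision(x, support_vectors, dual_coefs, bias, kernel)
    return "combine" if score >= 0 else "separate"
```

Each `dual_coefs` entry stands for the product of a support vector's label and its learned multiplier, which is the usual way a trained SVM is stored.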

Description

Technical field

[0001] The invention relates to the technical field of text recognition and processing, in particular to a text multi-feature ambiguity resolution method and system.

Background technique

[0002] Regarding disambiguation in text processing and recognition, work on word sense disambiguation (WSD) of English text includes the following. Andreim et al. obtained a new unsupervised global word sense disambiguation method by improving Shotgun. Calvo et al. explored a method based on a multilayer perceptron, integrating the deep neural networks LSTM and GRU, to address the generalization problem of WSD. Abuakgaija et al. proposed an improved word disambiguation method that combines genetic simulated annealing and ant colony algorithms. Addressing the particular characteristics of Korean, Nguyen et al. manually built a formal semantic network for developing a Korean word sense disambiguation system, UTagger. Simov et al. p...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/289, G06F40/30, G06F40/216, G06N20/10
CPC: G06F40/289, G06F40/30, G06F40/216, G06N20/10
Inventor: 袁锋, 段成志, 张宇昂, 刘悦, 徐传杰, 于凤洋
Owner: SHANDONG NORMAL UNIV