Chinese word sense disambiguation method based on tree feature selection and transfer learning

A technology for feature selection and word sense disambiguation, applied in semantic analysis, character and pattern recognition, and natural language data processing. It addresses the problems of too little labeled corpus and low-quality disambiguation features, and achieves a good disambiguation effect.

Active Publication Date: 2022-07-01
HARBIN UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

To address the shortage of labeled corpus and the low quality of disambiguation features encountered in word sense disambiguation, the present invention proposes a Chinese word sense disambiguation method based on tree feature selection and transfer learning.



Examples


Embodiment Construction

[0053] In order to clearly and completely describe the technical solutions in the embodiments of the present invention, the present invention will be further described in detail below with reference to the accompanying drawings in the embodiments.

[0054] This embodiment disambiguates the ambiguous word "surface" in the Chinese sentence "Proposal on vigorously promoting new surface engineering technology in the industrial field".

[0055] The flowchart of the Chinese word sense disambiguation method based on tree feature selection and transfer learning according to the embodiment of the present invention is shown in Figure 1. The method includes the following steps.

[0056] Step 1: Disambiguation features are extracted as follows.

[0057] For the Chinese sentence "Proposal on vigorously promoting new surface engineering technology in the industrial field", the feature extraction steps are as follows:

[0058] Step 1-1: Use the Chinese word segmentation tool to segment Chinese s...


Abstract

The invention relates to a Chinese word sense disambiguation method based on tree feature selection and transfer learning. The method first processes the Chinese corpus: word segmentation, part-of-speech tagging, translation tagging and semantic tagging are performed on Chinese sentences containing ambiguous words, yielding a processed training corpus, test corpus and auxiliary training corpus. Features are then extracted from these three corpora according to the feature selection method of the tree model, producing a training data set, a test data set and an auxiliary training set. Based on the training data set and the auxiliary training set, an improved TrAdaBoost algorithm is used to optimize the word sense disambiguation model. Finally, the optimized model disambiguates the test data set. The present invention achieves a better disambiguation effect in word sense disambiguation.
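The transfer-learning step can be illustrated with a minimal sketch of one weight-update round of the standard TrAdaBoost algorithm. This is not the improved variant named in the abstract, whose modifications are not described in this excerpt. The function name `tradaboost_update`, its parameters, and the clamping of the error rate are assumptions made for illustration only.

```python
import math


def tradaboost_update(aux_w, train_w, aux_miss, train_miss, n_rounds):
    """One weight-update round of standard TrAdaBoost (illustrative sketch).

    aux_w, train_w: current weights of auxiliary and training instances.
    aux_miss, train_miss: 1 if the current weak classifier mislabeled
        the instance, 0 otherwise.
    n_rounds: total number of boosting rounds planned.
    """
    # Weighted error is measured on the target-domain training data only.
    eps = sum(w * m for w, m in zip(train_w, train_miss)) / sum(train_w)
    # Assumption: clamp eps so that 0 < eps < 1/2, as the update requires.
    eps = min(max(eps, 1e-10), 0.499)
    beta_t = eps / (1.0 - eps)

    # Fixed factor for the auxiliary data; it depends on the auxiliary set size.
    beta = 1.0 / (1.0 + math.sqrt(2.0 * math.log(len(aux_w)) / n_rounds))

    # Mislabeled auxiliary instances lose weight (they fit the target poorly).
    new_aux = [w * beta ** m for w, m in zip(aux_w, aux_miss)]
    # Mislabeled training instances gain weight, as in AdaBoost.
    new_train = [w * beta_t ** (-m) for w, m in zip(train_w, train_miss)]
    return new_aux, new_train, beta_t
```

In a full loop, the combined weights would be normalized and a new weak classifier trained on both sets before each call. The asymmetric update is the core idea: auxiliary examples that disagree with the target domain are gradually ignored, while hard target examples receive more attention.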

Description

Technical field:

[0001] The invention relates to a Chinese word sense disambiguation method based on tree feature selection and transfer learning, and the method has good application in natural language processing.

Background technique:

[0002] In the field of natural language processing, word sense disambiguation plays a very important role. The purpose of word sense disambiguation is to determine the semantics of ambiguous words in a specific context. Word sense disambiguation has important applications in machine translation, speech recognition, information retrieval, and text classification. The performance of these application systems is closely related to word sense disambiguation.

[0003] The low quality of disambiguation features and the lack of labeled corpus have a great impact on the accuracy of word sense disambiguation. After preprocessing the corpus, a subset of features with higher quality is selected from the disambiguation features as the input of the w...
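The idea of selecting a higher-quality subset of disambiguation features can be sketched as follows. The excerpt does not say which tree model or scoring rule the method uses, so this example assumes information gain, a split criterion used by decision trees, as the ranking score. The function names and the feature names in the usage note (`left_word`, `left_pos`) are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of sense labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(feature_values, labels):
    """Entropy reduction obtained by splitting the data on one categorical feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def select_features(rows, labels, k):
    """Return the names of the k features with the highest information gain.

    rows: list of dicts mapping feature name -> categorical value.
    labels: sense label for each row.
    """
    names = sorted(rows[0])
    scored = [(information_gain([r[n] for r in rows], labels), n) for n in names]
    # Highest gain first; ties are broken alphabetically for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored[:k]]
```

Calling `select_features(rows, labels, k)` on rows such as `{"left_word": "new", "left_pos": "a"}` keeps the features whose values best separate the senses. A tree ensemble's feature importances could serve as the ranking score instead, under the same interface.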


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F40/284, G06F40/30, G06K9/62
CPC: G06F18/24155
Inventor: 张春祥, 熊经钊, 高雪瑶, 赵凌云
Owner: HARBIN UNIV OF SCI & TECH