Text file word sense disambiguation method and device

A word sense disambiguation technology for text files, applied in the information field, which addresses the difficulty of applying word sense disambiguation to large-scale tasks and improves disambiguation accuracy and efficiency.

Active Publication Date: 2016-07-13
TENCENT TECH (SHENZHEN) CO LTD
Cites 5 documents; cited by 12

AI Technical Summary

Problems solved by technology

Supervised word sense disambiguation requires annotating a corpus with the senses of words and training a classifier with machine learning to determine the sense of new instances. However, this approach relies on labeled data, and acquiring labeled data consumes substantial labor, which makes it difficult to apply to large-scale word sense disambiguation tasks.



Examples


Embodiment Construction

[0016] The technical solutions in this application are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments in this application without creative effort fall within the scope of protection of this application.

[0017] Figure 1 is a schematic diagram of the implementation environment involved in this application. Referring to Figure 1, a system that needs to disambiguate text files is provided with a word sense disambiguation device 100. The device 100 can obtain a text file to be disambiguated from an external data source 200 or locally, disambiguate the text content of the file according to a predetermined algorithm, and determine the semantic category corresponding to the text file. Whe...
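The implementation environment of Figure 1 can be modelled roughly as follows. This is a hypothetical sketch, not the application's design: the class name, the representation of each source (external data source 200 or local storage) as a callable yielding `(file_id, text)` pairs, and the pluggable `classify` function standing in for the unspecified "predetermined algorithm" are all assumptions.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical: a source yields (file_id, text) pairs, whether it wraps
# external data source 200 or local storage.
TextSource = Callable[[], Iterable[Tuple[str, str]]]


class WordSenseDisambiguationDevice:
    """Hypothetical model of word sense disambiguation device 100."""

    def __init__(self, sources: List[TextSource], classify: Callable[[str], str]) -> None:
        self.sources = sources
        self.classify = classify  # stand-in for the predetermined algorithm

    def run(self) -> Dict[str, List[str]]:
        """Pull text files from every source and group their IDs by semantic category."""
        categories: Dict[str, List[str]] = {}
        for source in self.sources:
            for file_id, text in source():
                categories.setdefault(self.classify(text), []).append(file_id)
        return categories


def local_source() -> List[Tuple[str, str]]:
    # Toy in-memory stand-in for locally stored text files.
    return [("local-1", "fresh apple juice")]


def external_source() -> List[Tuple[str, str]]:
    # Toy in-memory stand-in for external data source 200.
    return [("remote-1", "apple iphone release")]


def toy_classify(text: str) -> str:
    # Toy rule only; a real device would run the disambiguation procedure.
    return "fruit" if "juice" in text else "company"


if __name__ == "__main__":
    device = WordSenseDisambiguationDevice([local_source, external_source], toy_classify)
    print(device.run())  # {'fruit': ['local-1'], 'company': ['remote-1']}
```

Keeping sources and the classifier as injected callables separates where text files come from from how they are disambiguated, so the same device logic serves both local and external inputs.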


Abstract

The invention discloses a word sense disambiguation method for text files. The method comprises: configuring multiple reference text contents whose word senses have been determined; obtaining at least one text file to be disambiguated; and, for each text file to be disambiguated, extracting its text content and performing word segmentation on it to obtain a first word set, determining the words to be disambiguated in the first word set, extracting at least one reference text content corresponding to the words to be disambiguated and performing word segmentation on it to obtain at least one second word set, calculating correlation values between the text file and the reference text contents based on the first word set and the second word sets, determining that the text file is correlated with the reference text content having the highest correlation value, and putting the text file into the word sense category corresponding to that reference text content. The invention further discloses a corresponding device. The method and device can improve disambiguation efficiency.
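The steps in the abstract can be sketched in Python as follows. This is a toy illustration under stated assumptions, not the patented implementation: the abstract names neither the segmentation tool nor the correlation formula, so the sketch substitutes a regex tokenizer for word segmentation (Chinese text would need a real segmenter) and cosine similarity of word counts for the correlation value, and it treats any word with configured reference texts as a word to be disambiguated. The names `segment`, `correlation`, `categorize`, and the `REFERENCES` data are hypothetical.

```python
import re
from collections import Counter, defaultdict
from math import sqrt


def segment(text):
    """Assumed stand-in for word segmentation: lowercase alphanumeric tokens."""
    return re.findall(r"\w+", text.lower())


def correlation(first_words, second_words):
    """Assumed correlation value: cosine similarity of bag-of-words counts."""
    a, b = Counter(first_words), Counter(second_words)
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def categorize(text_files, references):
    """text_files: {file_id: text}; references: {word: {sense: reference_text}}.
    Returns {(word, sense): [file_id, ...]}."""
    categories = defaultdict(list)
    for file_id, text in text_files.items():
        # "First word set", kept as a list so word counts survive.
        first_words = segment(text)
        # Assumption: a word needs disambiguation if reference texts exist for it.
        for word in dict.fromkeys(first_words):
            if word not in references:
                continue
            # Score each reference text ("second word set") against the file.
            scores = {
                sense: correlation(first_words, segment(ref_text))
                for sense, ref_text in references[word].items()
            }
            # The file joins the category of the highest-scoring reference.
            best_sense = max(scores, key=scores.get)
            categories[(word, best_sense)].append(file_id)
    return dict(categories)


# Hypothetical reference text contents with known senses.
REFERENCES = {
    "apple": {
        "fruit": "apple fruit tree orchard sweet juice",
        "company": "apple company iphone mac computer software",
    }
}

files = {
    "doc1": "Apple released new iPhone software",
    "doc2": "Fresh apple juice from the orchard",
}
print(categorize(files, REFERENCES))
# {('apple', 'company'): ['doc1'], ('apple', 'fruit'): ['doc2']}
```

Cosine similarity is used here only because it normalizes for text length; a plain word-overlap count or a weighted scheme such as TF-IDF could be swapped into `correlation` without changing the overall flow.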

Description

Technical field [0001] The present application relates to the field of information technology (IT), in particular to a method and device for word sense disambiguation of text files.

Background [0002] Word sense disambiguation (WSD) is an important research topic in computational linguistics and natural language processing. The accuracy of disambiguation results directly affects the processing results of these technologies.

[0003] Word sense disambiguation techniques can be classified as supervised or unsupervised. Supervised word sense disambiguation requires annotating a corpus with the senses of words and training a classifier with machine learning to determine the sense of new instances. However, this approach relies on labeled data, and acquiring labeled data consumes substantial labor, making it difficult to apply to large-scal...


Application Information

Patent type & authority: Applications (China)
IPC (8): G06F17/27; G06F17/30
CPC: G06F16/951; G06F40/205
Inventor: 蔡淇森 (Cai Qisen)
Owner: TENCENT TECH (SHENZHEN) CO LTD