A method and device for semi-supervised field word mining and classification

A semi-supervised domain word technique, applicable to text database clustering/classification, character and pattern recognition, and text database querying, that addresses the poor practical performance of supervised methods and the difficulty of obtaining a labeled corpus.

Active Publication Date: 2020-04-10

广东惠禾科技发展有限公司

Cites: 3; Cited by: 0

AI Technical Summary

Problems solved by technology

Supervised methods require a large labeled corpus, and labeled corpora are difficult to obtain in practice, so these methods perform poorly in actual use.

Examples

Embodiment 1

[0080] Embodiment 1 of the present invention discloses a method of semi-supervised field word mining and classification which, as shown in Figure 1, includes the following steps:

[0081] Step 101, perform word segmentation and syntactic analysis on the text data in the field to be processed, and obtain the word vector matrix of all words in the text data based on the result of the word segmentation;

[0082] Specifically, taking the field of medicine as an example, text data can be obtained from medical websites by means such as web crawlers. Text data in other fields is handled similarly; any method may be used as long as the corresponding text data can be obtained.

[0083] After the text data is obtained, word segmentation and syntactic analysis are performed on it.

[0084] In the above step, "obtaining the word vector matrix of all words in the text data based on the result of the word segmentation" includes:

[0085] Obtaining the result of word segmentation of the t...
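Step 101 above can be illustrated with a minimal sketch. The excerpt does not say which segmenter or word vector model is used, so everything here is an assumption made for illustration: `segment` is a hypothetical whitespace tokenizer standing in for a real Chinese word segmenter, and the "word vector matrix" is built from simple co-occurrence counts rather than a trained embedding model.

```python
def segment(text):
    """Hypothetical stand-in segmenter: lowercase and split on whitespace.
    A real pipeline would call a Chinese word segmenter here."""
    return text.lower().split()


def build_word_vector_matrix(sentences, window=2):
    """Return (vocab, matrix), where matrix[i][j] counts how often
    vocab[j] occurs within `window` tokens of vocab[i].
    Co-occurrence counts are an assumed substitute for trained word vectors."""
    tokenized = [segment(s) for s in sentences]
    vocab = sorted({w for sent in tokenized for w in sent})
    index = {w: i for i, w in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in vocab]
    for sent in tokenized:
        for pos, word in enumerate(sent):
            lo = max(0, pos - window)
            hi = min(len(sent), pos + window + 1)
            for ctx_pos in range(lo, hi):
                if ctx_pos != pos:
                    matrix[index[word]][index[sent[ctx_pos]]] += 1
    return vocab, matrix


# Toy corpus (invented for illustration, not taken from the patent).
corpus = [
    "aspirin relieves mild headache",
    "ibuprofen relieves mild fever",
]
vocab, matrix = build_word_vector_matrix(corpus)
```

Each row of `matrix` is the vector for the word at the same position in `vocab`; swapping in a trained embedding model would change only how the rows are produced.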

Embodiment 2

[0115] Embodiment 2 of the present invention discloses semi-supervised field word mining and classification equipment which, as shown in Figure 2, includes:

[0116] An acquisition module 201, configured to perform word segmentation and syntactic analysis on the text data in the field to be processed, and obtain word vector matrices of all words in the text data based on the result of the word segmentation;

[0117] A construction module 202, configured to start from a certain number of manually constructed seed words, expand the seed words based on their parts of speech and syntactic composition patterns in the text data, and filter the seed words by word frequency, part of speech, and word vectors to obtain the seed vocabulary;
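The seed expansion performed by module 202 might look roughly like the following sketch. The excerpt does not define the "syntactic composition mode", so this example assumes a simple reading: a word is a candidate if it appears in the same part-of-speech trigram context as some seed word. The function name `expand_seeds`, the tag names, and the thresholds are all hypothetical, and the word vector filter is omitted for brevity.

```python
from collections import Counter


def expand_seeds(tagged_sentences, seeds, min_freq=2, allowed_pos=("NOUN",)):
    """Expand `seeds` with words that share a POS-trigram context with a seed,
    then filter by word frequency and part of speech (assumed criteria)."""
    freq = Counter(w for sent in tagged_sentences for w, _ in sent)

    def contexts(sent):
        # Yield (word, pos, pattern); pattern is the POS trigram around the word.
        padded = [("<s>", "BOS")] + sent + [("</s>", "EOS")]
        for i in range(1, len(padded) - 1):
            word, pos = padded[i]
            yield word, pos, (padded[i - 1][1], pos, padded[i + 1][1])

    # Patterns in which any seed word has been observed.
    seed_patterns = {
        pat
        for sent in tagged_sentences
        for word, _, pat in contexts(sent)
        if word in seeds
    }

    expanded = set(seeds)  # the original seeds are always kept
    for sent in tagged_sentences:
        for word, pos, pat in contexts(sent):
            if pat in seed_patterns and pos in allowed_pos and freq[word] >= min_freq:
                expanded.add(word)
    return expanded


# Toy POS-tagged sentences (invented for illustration).
sentences = [
    [("aspirin", "NOUN"), ("treats", "VERB"), ("fever", "NOUN")],
    [("ibuprofen", "NOUN"), ("treats", "VERB"), ("pain", "NOUN")],
    [("ibuprofen", "NOUN"), ("reduces", "VERB"), ("fever", "NOUN")],
    [("doctors", "NOUN"), ("recommend", "VERB"), ("rest", "NOUN")],
]
seed_vocabulary = expand_seeds(sentences, {"aspirin"})
```

In this toy data, "doctors" shares the seed's context but occurs only once, so the frequency filter drops it; lowering `min_freq` to 1 would admit it, which shows why the filtering step matters.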

[0118] A generating module 203, configured to, for the seed vocabulary, use word vectors, a knowledge base, statistical features, etc. to determine the general similarity of any two words, and generate a word similarity matrix with...
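As a rough illustration of how module 203 could combine several signals into one similarity score, the sketch below mixes cosine similarity over word vectors with a knowledge-base synonym lookup. The excerpt is truncated before it states the actual formula, so the weighted sum, the weights `w_vec` and `w_kb`, and the `synonym_pairs` structure are assumptions; statistical features are left out.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_matrix(words, vectors, synonym_pairs, w_vec=0.7, w_kb=0.3):
    """Return an n x n matrix of combined similarities.
    The weighted sum and its weights are illustrative assumptions."""
    n = len(words)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                sim[i][j] = 1.0
                continue
            # Knowledge-base signal: 1.0 if the pair is listed as synonyms.
            kb = 1.0 if frozenset((words[i], words[j])) in synonym_pairs else 0.0
            vec = cosine(vectors[words[i]], vectors[words[j]])
            sim[i][j] = w_vec * vec + w_kb * kb
    return sim


# Toy inputs (invented for illustration).
words = ["aspirin", "ibuprofen", "fever"]
vectors = {"aspirin": [1.0, 0.0], "ibuprofen": [1.0, 0.0], "fever": [0.0, 1.0]}
synonym_pairs = {frozenset(("aspirin", "ibuprofen"))}
sim = similarity_matrix(words, vectors, synonym_pairs)
```

The matrix is symmetric because both signals are symmetric; a production version would compute only the upper triangle.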

Abstract

Embodiments of the invention disclose a semi-supervised field word mining and classification method and equipment. The method comprises the following steps: preprocessing a field-related corpus and constructing a seed word list and a word similarity matrix; mining candidate field words and determining the similarity distribution of the candidate field words; and classifying the screened field words into categories. Because a semi-supervised approach is adopted, field word mining and classification can be completed using only ordinary field texts and a small seed word list, without a large amount of labeled data.
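The final two steps of the abstract (determining a similarity distribution, then classifying the screened words) can be sketched as follows. The abstract does not define "similarity distribution", so this example assumes one plausible reading: the mean similarity of a candidate to the seed words of each category. The function name, the threshold, and the toy scores are hypothetical.

```python
def classify_candidates(candidates, seed_categories, similarity, threshold=0.5):
    """Assign each candidate to the category whose seeds it most resembles.
    Candidates whose best score falls below `threshold` are screened out.
    Assumes every category has at least one seed word."""
    labels = {}
    for word in candidates:
        # Assumed reading of "similarity distribution":
        # mean similarity to the seed words of each category.
        dist = {
            cat: sum(similarity(word, seed) for seed in seeds) / len(seeds)
            for cat, seeds in seed_categories.items()
        }
        best = max(dist, key=dist.get)
        if dist[best] >= threshold:
            labels[word] = best
    return labels


# Toy pairwise similarities (invented for illustration); missing pairs score 0.0.
toy_scores = {
    frozenset(("naproxen", "aspirin")): 0.9,
    frozenset(("naproxen", "ibuprofen")): 0.8,
    frozenset(("naproxen", "fever")): 0.2,
    frozenset(("naproxen", "cough")): 0.1,
    frozenset(("migraine", "aspirin")): 0.3,
    frozenset(("migraine", "ibuprofen")): 0.2,
    frozenset(("migraine", "fever")): 0.7,
    frozenset(("migraine", "cough")): 0.6,
}


def toy_similarity(a, b):
    return toy_scores.get(frozenset((a, b)), 0.0)


seed_categories = {"drug": ["aspirin", "ibuprofen"], "symptom": ["fever", "cough"]}
labels = classify_candidates(["naproxen", "migraine", "table"], seed_categories, toy_similarity)
```

Only the seed word list carries labels here; the candidates are labeled purely through similarity, which is what lets the approach avoid a large tagged corpus.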

Description

Technical field

[0001] The invention relates to the field of domain word mining and classification, in particular to a method and equipment for semi-supervised domain word mining and classification.

Background

[0002] Domain words are the words that best represent the characteristics of a domain and distinguish it from other domains, and they can be assigned different category labels according to their roles in the domain. Domain words and their categories constitute the basic vocabulary data of the domain; therefore, the mining and classification of domain words is important foundational work in Chinese information processing. Many Chinese information processing projects (such as automatic question answering, automatic summarization, automatic classification, and search engines) involve domain word mining and classification.

[0003] At present, the mining and classification algorithms of ...

Application Information

Patent Type & Authority: Patents (China)

IPC(8): G06F16/33, G06F16/35, G06K9/62

Inventors: 高登科, 姚佳

Owner: 广东惠禾科技发展有限公司
