
Word tag based word labeling method and device, server and storage medium

A word labeling technology, applied in the computer field, that addresses the low efficiency and accuracy of labeling, the lack of guidance in the classification process, and the limited set of categories available for words to be labeled. It reduces manual effort and improves the efficiency, accuracy, and recall of labeling.

Active Publication Date: 2017-12-15
SHENZHEN INST OF ADVANCED TECH
2 Cites, 14 Cited by

AI Technical Summary

Problems solved by technology

[0005] The object of the present invention is to provide a word-tag-based word labeling method, device, server, and storage medium, aiming to solve two problems in the prior art when labeling new words: the categories into which new words can be classified are limited, and the classification process lacks guidance, which leads to low efficiency and accuracy in labeling the words to be labeled.



Examples


Embodiment 1

[0025] Figure 1 shows the implementation flow of the word-tag-based word labeling method provided by Embodiment 1 of the present invention. For convenience of description, only the parts relevant to this embodiment are shown, as detailed below:

[0026] In step S101, words to be labeled are searched for in the input text document.

[0027] In the embodiment of the present invention, the words to be labeled are new words that need labeling, such as "嘎舞", "freestyle", and similar words that appear on new network media such as Weibo and Facebook. Text documents for input can be obtained by collecting data from these media. As an example, raw data is collected on the Weibo platform, and the portion of the raw data with the most recent publishing time is used as the input text document.

[0028] In the embodiment of the present invention, word segmentation processing can be performed ...

Embodiment 2

[0036] Figure 2 shows the implementation flow of the word classifier training process in the word-tag-based word labeling method provided by Embodiment 2 of the present invention. For convenience of description, only the parts relevant to this embodiment are shown, as detailed below:

[0037] In step S201, sample words are searched for in the pre-built training data set.

[0038] In the embodiment of the present invention, word segmentation can be performed on the training data set. The segmented training data set is then searched for words whose frequency of occurrence exceeds a preset frequency threshold and which do not appear in the known thesaurus, and these words are set as sample words, that is, the new words in the training data set. As an example, raw data is collected on the Weibo platform, and the portion of the raw data whose release time falls in the middle period is set as the traini...

Embodiment 3

[0069] Figure 3 shows the structure of the word labeling device provided by Embodiment 3 of the present invention. For convenience of description, only the parts relevant to this embodiment are shown, including:

[0070] A word search unit 31, configured to search for words to be labeled in the input text document.

[0071] In the embodiment of the present invention, the words to be labeled are new words that need labeling, such as "嘎舞", "freestyle", and similar words that appear on new network media such as Weibo and Facebook. Text documents for input can be obtained by collecting data from these media. As an example, raw data is collected on the Weibo platform, and the portion of the raw data with the most recent publishing time is used as the input text document.

[0072] In the embodiment of the present invention, word segmentation processing can be performed on the text in t...



Abstract

The invention is applicable to the technical field of computers and provides a word-tag-based word labeling method, device, server, and storage medium. The method comprises: searching an input text document for words to be labeled; using a pre-trained word classifier to search a preset known thesaurus for known words related to the words to be labeled; setting the related known words as tag words of the words to be labeled; and labeling the words to be labeled with the tag words. The word classifier is obtained through supervised training. Because known words serve as tag words, words to be labeled are labeled automatically on the basis of word tags, which effectively improves labeling efficiency, reduces the manual effort required, and improves the accuracy and recall of labeling.
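
The overall flow in the abstract can be sketched in Python as follows. This is only an illustration under stated assumptions: the patent's actual classifier and its training features are not visible in this excerpt, so the sketch substitutes a toy document co-occurrence rule as a stand-in. The names `label_words`, `make_cooccurrence_classifier`, and `min_cooccur` are hypothetical.

```python
def make_cooccurrence_classifier(segmented_docs, min_cooccur=2):
    """Build a toy relatedness predicate (a stand-in, NOT the patent's classifier).

    Two words count as related when they appear together in at least
    min_cooccur documents. A real system would use a trained model here.
    """
    doc_sets = [set(doc) for doc in segmented_docs]

    def is_related(new_word, known_word):
        hits = sum(1 for s in doc_sets if new_word in s and known_word in s)
        return hits >= min_cooccur

    return is_related


def label_words(words_to_label, known_words, is_related):
    """Attach to each word the known words the classifier judges related (its tag words)."""
    return {
        word: [known for known in known_words if is_related(word, known)]
        for word in words_to_label
    }


# Toy, already-segmented corpus (assumed data, not from the patent).
docs = [
    ["freestyle", "rap", "battle"],
    ["freestyle", "rap", "music"],
    ["freestyle", "dinner"],
    ["rap", "music"],
]
known_words = ["rap", "music", "dinner"]

classifier = make_cooccurrence_classifier(docs, min_cooccur=2)
tags = label_words(["freestyle"], known_words, classifier)
print(tags)
```

Passing the classifier in as a plain callable keeps the labeling step independent of how relatedness is decided, so the toy rule could be replaced by a supervised model without changing `label_words`.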

Description

Technical field

[0001] The invention belongs to the technical field of computers, and in particular relates to a word-tag-based word labeling method, device, server, and storage medium.

Background

[0002] With the development of social media, many new words originate on new network media such as Weibo and Facebook, and these new words are increasingly used in everyday life. When a new word first emerges, it is difficult to obtain an explanation of it in a timely manner, because dictionaries and online encyclopedias (such as Wikipedia) do not yet have an entry for it, and creating an entry for each new word requires a lot of tedious work.

[0003] At present, most research on word tagging focuses on part-of-speech (POS) tagging, that is, presetting several categories (such as people, places, organization names, etc.) and then dividing the target words into one or several...


Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06F17/30
CPC: G06F16/332, G06F16/35, G06F16/374
Inventor: 梁予之, 曲强
Owner: SHENZHEN INST OF ADVANCED TECH