Method and device for adding tags to short texts automatically

An automatic tagging technique for short texts, applied in the Internet field. It addresses the low accuracy of existing approaches, the use of keywords that are unsuitable as tags, and the extra user operations required by manual tagging, and thereby improves tagging accuracy.

Active Publication Date: 2013-10-30
SHENZHEN SHI JI GUANG SU INFORMATION TECH
2 Cites, 10 Cited by

AI Technical Summary

Problems solved by technology

[0003] However, the prior-art approach of extracting keywords as tags is not suitable for short texts: the extracted keywords may not be appropriate as tags, and the accuracy is low.
In addition, manually adding tags to an article increases the number of user operations.


Examples

Embodiment 1

[0024] An embodiment of the present invention provides a method for automatically adding tags to short texts. As shown in Figure 1, the method includes:

[0025] Step 101: compute the inverse document frequency of each tag word in the tag word set;

[0026] Optionally, a tag word set and a corpus associated with the tag word set are preset. In statistical natural language processing, it is generally not possible to observe real-world language use at large scale. Text is therefore used as a substitute, and the context of a word within the text stands in for its context in real-world language. A collection of texts is called a corpus. Optionally, relevant texts are collected from the Internet; for example, the question-and-answer content in Tencent's "Wanwen" product can be used as the corpus.

[0027] Segment the corpus, that is, split each sentence into individual words; for example, segmenting the sentence "This is a method for automatically adding tags to short texts",...
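
Step 101 can be illustrated with a minimal sketch. The excerpt does not give the exact formula, so the smoothed form idf = log((N + 1) / (df + 1)) used here is an assumption, as are the function names, the toy corpus, and the whitespace segmenter (a real system for Chinese text would need a proper word segmenter).

```python
import math


def segment(text):
    """Placeholder segmenter (assumption): lowercase, split on whitespace."""
    return text.lower().split()


def inverse_document_frequency(tag_words, corpus):
    """Return {tag_word: idf} over the corpus.

    Assumed formula: idf = log((N + 1) / (df + 1)), where N is the number
    of documents and df is the number of documents containing the tag word.
    The smoothing keeps the value finite when df is 0 and never negative.
    """
    segmented = [set(segment(doc)) for doc in corpus]
    n_docs = len(segmented)
    idf = {}
    for tag in tag_words:
        df = sum(1 for words in segmented if tag in words)
        idf[tag] = math.log((n_docs + 1) / (df + 1))
    return idf


# Toy corpus and tag word set, invented for illustration only.
corpus = [
    "the movie was a comedy",
    "a comedy and a drama",
    "the actor won an award",
]
tag_words = {"comedy", "drama", "thriller"}
idf = inverse_document_frequency(tag_words, corpus)
```

Under this formula, a tag word that appears in fewer documents receives a larger weight, so "thriller" (no documents) outranks "drama" (one document), which outranks "comedy" (two documents).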

Embodiment 2

[0057] An embodiment of the present invention provides a method for automatically adding tags to short texts. As shown in Figure 3, the method includes:

[0058] Step 301: preset the tag word set and the corpus associated with the tag word set;

[0059] Optionally, a tag word set is obtained according to requirements. For example, to add tags to film and television content, a set of commonly used film and television tags is collected, covering genres, stars, and so on.
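
As a rough sketch of step 301, the tag word set can be held as a flat set built from per-category lists. Everything below is hypothetical: the category names, the placeholder tags (including star_a and star_b), and in particular the reading of "associated corpus" as "documents that mention at least one tag word", which the excerpt does not define.

```python
def segment(text):
    """Placeholder segmenter (assumption): lowercase, split on whitespace."""
    return text.lower().split()


def build_tag_word_set(categories):
    """Flatten {category: [tag, ...]} into a single set of lowercase tags."""
    return {word.lower() for words in categories.values() for word in words}


def select_associated_corpus(documents, tag_words):
    """Keep documents mentioning at least one tag word.

    This filter is one possible interpretation of a corpus "associated
    with" the tag word set; it is not taken from the patent text.
    """
    return [doc for doc in documents if tag_words & set(segment(doc))]


# Hypothetical film and television tags; star_a and star_b are placeholders.
film_tv_categories = {
    "genre": ["comedy", "thriller", "documentary"],
    "star": ["star_a", "star_b"],
}
documents = [
    "which thriller should i watch tonight",
    "how do i fix a flat tyre",
    "is star_a in the new comedy",
]
tag_words = build_tag_word_set(film_tv_categories)
associated = select_associated_corpus(documents, tag_words)
```

With whitespace segmentation, multi-word tags would never match a single token, so a real tag set would need a segmenter that recognises them.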

[0060] In statistical natural language processing, it is generally not possible to observe real-world language use at large scale. Text is therefore used as a substitute, and the context of a word within the text stands in for its context in real-world language. A collection of texts is called a corpus. Optionally, relevant texts are collected from the Internet; for example, the question-and-answer content in Tencent's "Wanwen" product can be used as the cor...

Abstract

The invention discloses a method and a device for automatically adding tags to short texts, and relates to the technical field of the Internet. The method and device add tags to short texts automatically and improve the accuracy of the added tags. The inverse document frequency of each tag word in a tag word set is computed, the short text is extended into a long text, the word frequency of each tag word in the long text is determined, and the tags of the short text are determined from the inverse document frequencies and the word frequencies. The method and device are applicable to adding tags to short texts.
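
The flow described in the abstract can be sketched end to end as follows. This is an illustration under stated assumptions, not the patented implementation: the excerpt does not say how the short text is extended, so the sketch simply appends the corpus documents that share the most words with it, and it combines word frequency and inverse document frequency as a plain product. All function names, parameters, and sample texts are hypothetical.

```python
import math
from collections import Counter


def segment(text):
    """Placeholder segmenter (assumption): lowercase, split on whitespace."""
    return text.lower().split()


def idf_table(tag_words, corpus_tokens):
    """Assumed smoothed IDF: log((N + 1) / (df + 1)) per tag word."""
    n_docs = len(corpus_tokens)
    table = {}
    for tag in tag_words:
        df = sum(1 for tokens in corpus_tokens if tag in tokens)
        table[tag] = math.log((n_docs + 1) / (df + 1))
    return table


def extend_short_text(short_tokens, corpus_tokens, top_k=2):
    """Extend the short text with the top_k most overlapping corpus documents.

    Word-overlap retrieval is an assumption made for this sketch only.
    """
    query = set(short_tokens)
    ranked = sorted(
        corpus_tokens,
        key=lambda tokens: len(query & set(tokens)),
        reverse=True,
    )
    long_tokens = list(short_tokens)
    for tokens in ranked[:top_k]:
        if query & set(tokens):  # skip documents with no shared word
            long_tokens.extend(tokens)
    return long_tokens


def suggest_tags(short_text, tag_words, corpus, max_tags=2):
    """Score tag words by word frequency in the long text times IDF."""
    corpus_tokens = [segment(doc) for doc in corpus]
    idf = idf_table(tag_words, corpus_tokens)
    long_tokens = extend_short_text(segment(short_text), corpus_tokens)
    counts = Counter(long_tokens)
    total = len(long_tokens)
    scores = {
        tag: (counts[tag] / total) * idf[tag]
        for tag in tag_words
        if counts[tag] > 0
    }
    return sorted(scores, key=scores.get, reverse=True)[:max_tags]


# Toy data, invented for illustration only.
corpus = [
    "hitchcock directed many thriller films",
    "a thriller by hitchcock keeps the audience tense",
    "this comedy made the audience laugh",
    "the documentary covers ocean life",
]
tag_words = {"thriller", "comedy", "documentary"}
tags = suggest_tags("any good hitchcock films", tag_words, corpus)
```

In this toy run the short text contains no tag word itself; "thriller" becomes a candidate only because the extension step pulls in related corpus documents, which is the motivation the abstract gives for extending short texts before counting word frequencies.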

Description

Technical field

[0001] The invention relates to the technical field of the Internet, in particular to a method and a device for automatically adding tags to short texts.

Background

[0002] Tags are a way of organizing Internet content and are highly relevant keywords. Tags help people describe or classify content so that it is easy to retrieve and share. Currently, there are three ways to add tags to articles: method 1, manual tags, where professionals manually assign specific tags to articles; method 2, social tags, where users add custom tags to their own articles or pictures; method 3, keyword tags, where the content of a longer article is analyzed and important keywords are automatically extracted as tags.

[0003] However, the prior-art approach of extracting keywords as tags is not suitable for short texts: the extracted keywords may not be appropriate as tags, and the accuracy is low. In addition, it is necessary to manually add tags to articles t...

Application Information

IPC(8): G06F17/27
Inventor: 贺翔, 路彦雄, 焦峰
Owner: SHENZHEN SHI JI GUANG SU INFORMATION TECH