Automatic annotation method and system for innovative and creative label on the basis of big data

An automatic labeling and big data technology, applied in the fields of network data indexing, network data retrieval, and other database retrieval. It addresses the cold-start problem of user-based social labeling and the problems of manually labeled datasets, which are time-consuming, laborious, and subjective to produce.

Active Publication Date: 2017-08-01
SHANDONG UNIV

AI Technical Summary

Problems solved by technology

User-based social labeling suffers from a cold-start problem in the early stage of a system's service because there is no past data for reference. Multi-label classification and labeling methods are mostly based on supervised learning algorithms, which require large manually labeled data sets as training sets; producing such data sets is not only time-consuming and laborious but also highly subjective.



Embodiment Construction

[0078] It should be pointed out that the following detailed description is exemplary and intended to provide further explanation of the present application. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.

[0079] It should be noted that the terminology used here is only for describing specific implementations and is not intended to limit the exemplary implementations according to the present application. As used herein, unless the context clearly dictates otherwise, the singular is intended to include the plural. It should also be understood that when the terms "comprising" and/or "including" are used in this specification, they indicate the presence of features, steps, operations, devices, components, and/or combinations thereof.

[0080] The present invention comprehensively uses an improved text labeling algorithm based on TextRank, Word2ve...


Abstract

The invention discloses an automatic annotation method and system for innovative and creative labels based on big data. The method comprises the following steps: training Word2vector and LDA (Latent Dirichlet Allocation) on the Sogou corpus to obtain a training result set; segmenting the document data of the page a user is browsing into words, removing stop words, and filtering the remaining words; for the preprocessed document data, combining an improved TextRank algorithm with Word2vector to compute labels from the text; in addition, applying LDA to the preprocessed document to obtain labels related to the document's topics; and visualizing the labels as a label cloud and annotating every label word in the document text, so that users can read and locate key content more easily.
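The preprocessing and TextRank steps described above can be illustrated with a minimal sketch. It implements plain TextRank over a word co-occurrence graph, not the patent's improved variant, whose details are not given in this summary. The function names `preprocess` and `textrank_keywords`, the window size, the damping factor, and the optional `similarity` callback (a stand-in for a Word2vector similarity score) are all assumptions made for illustration. The input is assumed to be already segmented into words; Chinese text would first need a word segmenter, which is not shown.

```python
from collections import defaultdict


def preprocess(tokens, stop_words, min_length=2):
    """Drop stop words and very short tokens (hypothetical filtering rule)."""
    return [t for t in tokens if t not in stop_words and len(t) >= min_length]


def textrank_keywords(tokens, window=3, damping=0.85, iterations=50,
                      top_k=5, similarity=None):
    """Rank words with plain TextRank on a co-occurrence graph.

    `similarity(a, b)` is an optional, assumed hook that returns an edge
    weight, e.g. a Word2vector cosine similarity; without it every
    co-occurrence counts as 1.0.
    """
    # Build an undirected weighted graph: words that appear within
    # `window` positions of each other are linked.
    weights = defaultdict(lambda: defaultdict(float))
    for i, word in enumerate(tokens):
        for other in tokens[i + 1:i + window]:
            if other == word:
                continue
            w = similarity(word, other) if similarity else 1.0
            weights[word][other] += w
            weights[other][word] += w

    # Iterate the TextRank update a fixed number of times.
    scores = {word: 1.0 for word in weights}
    for _ in range(iterations):
        new_scores = {}
        for word in weights:
            rank = 0.0
            for neighbor, w in weights[word].items():
                out_sum = sum(weights[neighbor].values())
                if out_sum > 0:
                    rank += w / out_sum * scores[neighbor]
            new_scores[word] = (1 - damping) + damping * rank
        scores = new_scores

    # Highest score first; ties are broken alphabetically for stability.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]


if __name__ == "__main__":
    raw = "the label cloud shows each label and the label weight of a label".split()
    words = preprocess(raw, stop_words={"the", "and", "of", "each"})
    print(textrank_keywords(words, window=2, top_k=3))
```

Passing a `similarity` function is one plausible way to let word vectors influence the graph weights; how the patented method actually combines TextRank with Word2vector is not stated here.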

Description

Technical field

[0001] The invention relates to a method and system for automatically annotating innovative and creative labels based on big data.

Background technique

[0002] With the rapid development and popularization of the Internet, the explosive growth of information has led to the accumulation of a large amount of information online. At the same time, Internet users are no longer only readers of Internet content; they also create information of many kinds, which diversifies the forms that Internet information takes and makes screening it very difficult. Text is the carrier for a large proportion of this information. The growth in the amount of information, together with its disordered structure, gives people more references when searching for information and makes the coverage of information more comprehensive. This has greatly facilitated all aspects of people's lives, but a large amount of ...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
CPC: G06F16/355, G06F16/951
Inventor: 鹿旭东, 张盘龙, 陈志勇, 郭伟, 崔立真
Owner: SHANDONG UNIV