Content semantic mining method for non-structured big data stream

A semantic mining technology for unstructured data, applied in the computer field. It addresses three problems of existing methods: they do not consider the semantics and emotional tendency of data content, they have difficulty processing unstructured data, and they have difficulty correctly grasping the core point of data content.

Active Publication Date: 2016-07-06
ZHEJIANG WANLI UNIV

AI Technical Summary

Problems solved by technology

Existing mining methods can only deal with structured or semi-structured data, and have difficulty dealing with unstructured data such as microblogs and blogs.



Embodiment Construction

[0038] The present invention will be further described below in conjunction with the accompanying drawings and specific embodiments.

[0039] The content semantic mining method disclosed in the embodiment of the present invention combines a weighted directed graph with a small-world network clustering method to analyze the content semantics and emotional tendency of data streams and to extract content feature items. Taking the Weibo big data stream as an example, as shown in Figure 1, the weighted directed graph describes the semantic and emotional tendency and the structure of the microblog text and links, and the theories and methods associated with directed graph structures are used to mine the content semantics.
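The source does not spell out its small-world clustering calculation at this point. As general background only, small-world networks are usually characterized by a high clustering coefficient together with short average path lengths. The following is a minimal sketch of the clustering coefficient on an unweighted, undirected graph; the adjacency dictionary, the node names t1 to t4, and both function names are hypothetical and are not taken from the patent.

```python
from itertools import combinations

def local_clustering(adj, node):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # fewer than two neighbours: no pairs to check
    linked = sum(
        1 for a, b in combinations(sorted(neighbours), 2) if b in adj[a]
    )
    return 2.0 * linked / (k * (k - 1))

def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)

# Hypothetical undirected graph of four text nodes (symmetric adjacency sets).
adj = {
    "t1": {"t2", "t3"},
    "t2": {"t1", "t3"},
    "t3": {"t1", "t2", "t4"},
    "t4": {"t3"},
}
```

A weighted variant, as the method's name suggests, would need a definition of how edge weights enter the coefficient; that definition is not given in the text available here.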

[0040] A directed graph can be represented by a tuple T = <V, E>, where V represents a set of nodes and E represents a set of edges between nodes. In this embodiment, the markedness and orderliness of the directed graph are used to describe the...
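To make the tuple of node set V and edge set E concrete, here is a minimal sketch of a weighted directed graph stored as nested dictionaries. The class name WeightedDigraph, its methods, the node identifiers, and the example weights are all illustrative assumptions; the patent does not prescribe a data structure.

```python
from collections import defaultdict

class WeightedDigraph:
    """A directed graph with node set V and a numeric weight on each edge in E."""

    def __init__(self):
        self.nodes = set()                   # V
        self.out_edges = defaultdict(dict)   # E: src -> {dst: weight}

    def add_edge(self, src, dst, weight):
        """Add (or overwrite) the directed edge src -> dst."""
        self.nodes.update((src, dst))
        self.out_edges[src][dst] = weight

    def weight(self, src, dst):
        """Weight of src -> dst, or 0.0 when the edge does not exist."""
        return self.out_edges.get(src, {}).get(dst, 0.0)

    def successors(self, node):
        """Nodes reachable from `node` over one outgoing edge, sorted by name."""
        return sorted(self.out_edges.get(node, {}))

# Hypothetical example: one text node pointing at two label nodes.
graph = WeightedDigraph()
graph.add_edge("post:1", "tag:travel", 0.8)
graph.add_edge("post:1", "tag:food", 0.2)
graph.add_edge("tag:travel", "tag:food", 0.5)
```

Because the edges are directed, graph.weight("tag:food", "tag:travel") is 0.0 here even though the reverse edge exists.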


Abstract

The invention discloses a content semantic mining method for an unstructured big data stream. The method comprises the following steps:

S1: extracting the text links, label attributes, and semantic tendency keywords in the big data stream, and correspondingly defining text nodes, label nodes, and content nodes;

S2: constructing a text node set containing the text nodes and a label node set containing the label nodes, then calculating and outputting the weight values from the text nodes to the label nodes and the weight values from each label node to all other label nodes;

S3: according to the text node set, the label node set, the weight values from the text nodes to the label nodes, and the weight values from each label node to all other label nodes, performing semantic classification on the content nodes and constructing different content node classification sets; and

S4: according to the text node set and the content node classification sets, performing a weighted small-world network clustering calculation on the text nodes to obtain a text node cluster set.
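Steps S1 and S2 can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not give the weight formulas, so relative frequency and co-occurrence ratios are swapped in here; the "#tag" label syntax, the SENTIMENT_WORDS lexicon, the function names, and the sample posts are all hypothetical.

```python
import re
from collections import Counter
from itertools import permutations

# Hypothetical tendency lexicon; the source does not list its keywords.
SENTIMENT_WORDS = {"love": 1, "great": 1, "hate": -1, "awful": -1}
LINK_RE = re.compile(r"https?://\S+")

def extract_nodes(post):
    """S1 (sketch): pull links, labels and tendency keywords out of one post."""
    links = LINK_RE.findall(post)
    text = LINK_RE.sub(" ", post)             # keep URL fragments out of labels
    labels = re.findall(r"#(\w+)", text)      # assumed "#tag" label syntax
    words = re.findall(r"[a-z']+", text.lower())
    keywords = [w for w in words if w in SENTIMENT_WORDS]
    return {"links": links, "labels": labels, "keywords": keywords}

def label_weights(posts):
    """S2 (sketch): text->label and label->label weights (assumed formulas)."""
    text_to_label = {}
    label_counts = Counter()   # number of posts containing each label
    pair_counts = Counter()    # number of posts containing both labels
    for text_id, post in posts.items():
        counts = Counter(extract_nodes(post)["labels"])
        total = sum(counts.values())
        for label, n in counts.items():
            # Assumed weight: share of the post's label mentions.
            text_to_label[(text_id, label)] = n / total
        label_counts.update(counts.keys())
        pair_counts.update(permutations(sorted(counts), 2))
    # Assumed weight a -> b: fraction of posts with label a that also carry b.
    label_to_label = {
        (a, b): n / label_counts[a] for (a, b), n in pair_counts.items()
    }
    return text_to_label, label_to_label

posts = {
    "t1": "Love this trip #travel #food http://example.com/a",
    "t2": "Awful queue #travel",
}
text_to_label, label_to_label = label_weights(posts)
```

The label-to-label weights are directional by construction, which matches the use of a directed graph: in the sample data every post tagged food is also tagged travel, but only half of the travel posts are tagged food.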

Description

Technical field

[0001] The invention relates to the field of computer technology, in particular to a content semantic mining method for unstructured big data streams.

Background

[0002] With the rapid development and application of Web 2.0 technology, network information interaction in the form of blogs, microblogs, and WeChat has become an important way of exchanging information. The data generated by these communication methods includes structured, semi-structured, and unstructured data, with unstructured data predominating. These data have been disseminated and updated by many people, and have accumulated over time into a heterogeneous, massive big data collection with complex structure and diverse content. This kind of big data contains a variety of information, such as users' comments, attitudes, and behaviors regarding certain things, events, goods, or services. How to extract this valuable content from ...


Application Information

IPC(8): G06F17/30
CPC: G06F16/35
Inventor: 张少中
Owner: ZHEJIANG WANLI UNIV