Multi-dimensional information merging method and device, equipment and storage medium

A multi-dimensional information merging technology in the field of text analysis. It addresses problems such as wasted manpower, massive data that cannot be put to use, and pressure on users to review information in time. It reduces processing complexity, reduces repeated reporting, and improves merging quality.

Active Publication Date: 2020-12-18
北京智慧星光信息技术有限公司
Cites: 7; Cited by: 0

AI Technical Summary

Problems solved by technology

However, Internet public opinion information is massive in volume and highly repetitive, and the amount of information is growing at an unprecedented rate. As a result, a public opinion event can erupt and fade within a short period of time. For users, viewing this information in time causes...

Embodiment Construction

[0037] Embodiments of the present invention will be described below with reference to the drawings. Those skilled in the art would recognize that the described embodiments can be modified in various ways or combinations thereof without departing from the spirit and scope of the invention. Accordingly, the drawings and description are illustrative in nature and not intended to limit the scope of the claims. Also, in this specification, the drawings are not drawn to scale, and like reference numerals denote like parts.

[0038] The multi-dimensional information merging method includes the following steps:

[0039] S1: Extract the feature subject words of each text using TF-IDF (term frequency-inverse document frequency, a weighting technique commonly used in information retrieval and data mining).
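Step S1 can be sketched as follows. This is a minimal example, not the patented implementation. It assumes each text has already been segmented into tokens (Chinese text would first need a word segmenter) and uses a smoothed IDF variant. The function name `tfidf_keywords` and the `top_k` cutoff are hypothetical, because the source does not say how many subject words are kept per text.

```python
import math
from collections import Counter


def tfidf_keywords(docs: list[list[str]], top_k: int = 5) -> list[list[str]]:
    """Return the top_k TF-IDF terms of each pre-tokenized document."""
    n_docs = len(docs)
    # Document frequency: how many documents contain each term at least once.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    keywords = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        scores = {}
        for term, count in counts.items():
            tf = count / total
            # Smoothed IDF: terms present in every document keep a small positive weight.
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1
            scores[term] = tf * idf
        # Highest score first; ties broken alphabetically so the output is deterministic.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        keywords.append(ranked[:top_k])
    return keywords


docs = [
    ["metal", "material", "metal", "shiny", "conductive"],
    ["metal", "price", "market"],
    ["stock", "market", "price"],
]
print(tfidf_keywords(docs, top_k=2)[0])  # ['metal', 'conductive']
```

In this toy corpus, "metal" ranks first in the first text because it appears twice. The remaining terms tie on score, so "conductive" wins the alphabetical tie-break.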

[0040] Feature subject words are words that indicate the main meaning of a text. Consider, for example, the following text:

[0041] 1. Types of metal materials

[0042] Metal materials specifically refer to shiny, conducti...

Abstract

The invention discloses a multi-dimensional information merging method, device, equipment and storage medium. The method comprises the following steps. Feature subject words are extracted from a text. A hash is calculated for each feature subject word to obtain a set of hash values, and the sum of all hash values in the set is taken as the master fingerprint. Combinations of the hash values in the set are summed to form a slave fingerprint set. The union of the slave fingerprint set and the master fingerprint is taken as the Key, and the master fingerprint as the Value, forming a key-value pair that serves as the text's fingerprint feature sub-library. The fingerprint feature sub-library is then matched against the fingerprint feature total library. If the match succeeds, the fingerprint feature sub-library is discarded. If it fails, the combined sums of the hash values of the feature subject words are used as the Value of the fingerprint feature sub-library, and the sub-library is added to the fingerprint feature total library. The method solves the problems of slow simhash computation and exponential growth, provides high-quality deduplicated data, and reduces both repeated reporting and missed reporting of pushed data.
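The fingerprinting and matching steps in the abstract can be illustrated with the sketch below. It is one possible reading under stated assumptions, not the patented implementation. The following are all assumptions:

- the hash function, which takes the first 8 bytes of MD5;
- the 64-bit wrap-around;
- the use of (n-1)-word combinations for the slave fingerprints;
- the rule that a match means any fingerprint in the new Key already exists in the total library;
- the names `build_sub_library` and `FingerprintLibrary`.

The abstract is ambiguous about what the Value holds, so the sketch simply stores the master fingerprint as the Value.

```python
import hashlib
from itertools import combinations

MASK64 = (1 << 64) - 1  # assumption: fingerprints wrap around at 64 bits


def word_hash(word: str) -> int:
    """Assumed 64-bit hash of one subject word (first 8 bytes of MD5)."""
    digest = hashlib.md5(word.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def build_sub_library(words: list[str]) -> tuple[frozenset[int], int]:
    """Build one text's (Key, Value) pair: Key = slave set | {master}, Value = master."""
    hashes = [word_hash(w) for w in sorted(set(words))]
    # Master fingerprint: sum of all subject-word hashes.
    master = sum(hashes) & MASK64
    # Slave fingerprints (assumption): sums of every (n-1)-word combination,
    # so a text missing or adding one subject word still shares a fingerprint.
    size = max(len(hashes) - 1, 1)
    slaves = {sum(combo) & MASK64 for combo in combinations(hashes, size)}
    key = frozenset(slaves | {master})
    return key, master


class FingerprintLibrary:
    """Hypothetical in-memory stand-in for the fingerprint feature total library."""

    def __init__(self) -> None:
        # Maps each fingerprint of every stored Key to that text's Value (master).
        self._index: dict[int, int] = {}

    def add_if_new(self, words: list[str]) -> bool:
        """Return False (discard) on a match, otherwise store the sub-library and return True."""
        key, master = build_sub_library(words)
        if any(fp in self._index for fp in key):
            return False  # match succeeded: discard this text's sub-library
        for fp in key:
            self._index[fp] = master
        return True


library = FingerprintLibrary()
print(library.add_if_new(["metal", "material", "conductive"]))  # True
print(library.add_if_new(["metal", "material", "shiny"]))       # False
```

Under these assumptions, two texts still match when their subject words differ by one substituted word, or when one set adds a single word to the other. Such texts share at least one fingerprint and are discarded as duplicates, while unrelated texts are added. Summing (n-1)-word combinations keeps each slave set at n entries, instead of the 2^n sums over all subsets. The source does not state which combination sizes the invention actually uses.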

Description

Technical field

[0001] The invention relates to similarity-based merging of massive texts in the field of text analysis. Similar items of network public opinion information are merged to achieve deduplication. Specifically, the invention relates to a method, device, equipment and storage medium for merging multi-dimensional information.

Background art

[0002] With the popularization of the Internet, the Internet has gradually become the main carrier through which people publish, obtain and transmit information. In the self-media era, every individual can become a reporter and a disseminator of information. Public opinion data is crucial for several purposes:

- understanding public opinion in a timely manner;
- reflecting social conditions;
- controlling and guiding public opinion in the right direction;
- supporting social stability and national development.

Analyzing public opinion helps to understand how an event is developing, prevent its harmful spread, and provide important decision...

Application Information

IPC (8): G06F16/31; G06F16/33; G06F40/216; G06F40/258
CPC: G06F16/325; G06F16/334; G06F40/216; G06F40/258
Inventors: 赵自波, 李青龙, 骆飞, 赵冲
Owner: 北京智慧星光信息技术有限公司