
A multi-dimensional information merging method, apparatus, device and storage medium

A multi-dimensional information merging technology, applied in the field of text analysis. It addresses problems such as existing deduplication methods being unusable on massive data, the pressure on users to review information in time, and wasted manpower. It reduces repeated reporting, improves the quality of merging, and lowers processing complexity.

Active Publication Date: 2021-02-26
北京智慧星光信息技术有限公司

AI Technical Summary

Problems solved by technology

However, Internet public opinion information is massive and highly repetitive, and its volume is expanding at an unprecedented rate. Public opinion events can therefore erupt and fade within a short period, which puts pressure on users to review information in time and wastes manpower.
[0003] At present, the commonly used text deduplication method, simhash, converts each text into a binary fingerprint and then compares the fingerprints pairwise. With a large amount of data the amount of computation grows steeply (exhaustive pairwise comparison is quadratic in the number of texts), which makes the method unusable on massive data.
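The cost described in [0003] can be illustrated with a minimal sketch of simhash followed by brute-force pairwise comparison. This is not the implementation referred to in the patent: the MD5-based `token_hash`, the 64-bit width, the unweighted tokens, and all function names are assumptions made for illustration.

```python
import hashlib
from itertools import combinations


def token_hash(token, bits=64):
    """Map a token to a stable integer of the given bit width (MD5-based, assumed)."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[: bits // 8], "big")


def simhash(tokens, bits=64):
    """Compute an unweighted simhash fingerprint of a list of tokens."""
    counts = [0] * bits
    for token in tokens:
        h = token_hash(token, bits)
        for i in range(bits):
            # Each token votes +1 for a set bit and -1 for a clear bit.
            counts[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, c in enumerate(counts):
        if c > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def pairwise_comparisons(fingerprints):
    """Count the fingerprint pairs a brute-force scan has to compare."""
    return sum(1 for _ in combinations(fingerprints, 2))
```

For n fingerprints the brute-force scan compares n(n-1)/2 pairs, so 10 texts need 45 comparisons while 1,000,000 texts need roughly 5 × 10^11.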



Examples

Detailed Description of Embodiments

[0037] Embodiments of the present invention will be described below with reference to the drawings. Those skilled in the art would recognize that the described embodiments can be modified in various ways or combinations thereof without departing from the spirit and scope of the invention. Accordingly, the drawings and description are illustrative in nature and not intended to limit the scope of the claims. Also, in this specification, the drawings are not drawn to scale, and like reference numerals denote like parts.

[0038] The method for merging multi-dimensional information includes the following steps:

[0039] S1: extract the feature keywords of each text using TF-IDF (term frequency-inverse document frequency, a weighting technique commonly used in information retrieval and data mining).

[0040] The feature keywords are the words that indicate the main meaning of the text. Consider, for example, the following text:

[0041] 1. Types of metal materials

[0042] Metal materials specifically refer to shiny, conducti...
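Step S1 can be sketched as follows. This is a minimal illustration of TF-IDF keyword extraction, not the patent's implementation: the regex tokenizer, the unsmoothed `log(N / df)` weighting, the cutoff `k`, and the function names are all assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def top_keywords(docs, doc_index, k=3):
    """Return the k highest-scoring TF-IDF terms of docs[doc_index]."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    tokens = tokenized[doc_index]
    tf = Counter(tokens)
    scores = {
        term: (count / len(tokens)) * math.log(n_docs / df[term])
        for term, count in tf.items()
    }
    # Sort by descending score, then alphabetically for a stable order.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]
```

A term that appears in every document gets a score of zero under this weighting, so common words fall to the bottom of the ranking, while words that are frequent in one text and rare elsewhere rise to the top.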


Abstract

A multi-dimensional information merging method, apparatus, device, and storage medium. The method comprises: extracting the feature keywords of a text; hashing each feature keyword to obtain a set of hash values; taking the sum of all hash values as the primary fingerprint; combining the hash values in the set and taking the sum of each combination to form the secondary fingerprint set; taking the union of the secondary fingerprint set and the primary fingerprint as the Key and the primary fingerprint as the Value, so that the resulting key-value pairs constitute the fingerprint feature sub-library; and matching the fingerprint feature sub-library against the fingerprint feature master library. If the match succeeds, the sub-library is discarded; if the match fails, the sum of the hash values of the feature keywords is used as the Value of the sub-library, and the text's fingerprint feature sub-library is added to the master library. The invention addresses the slow computation of simhash and the steep growth of its cost on large data sets. It can provide high-quality deduplicated data and reduce repeated and missed reports in pushed data.
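The fingerprint scheme in the abstract can be sketched roughly as below. The abstract does not say which combinations of hash values form the secondary fingerprints, so this sketch assumes the sums of every combination that leaves out exactly one keyword hash. The MD5-based `keyword_hash`, the dictionary used as the master library, and all function names are likewise hypothetical.

```python
import hashlib
from itertools import combinations


def keyword_hash(word):
    """Stable 64-bit hash of a keyword (MD5-based; an assumed choice)."""
    digest = hashlib.md5(word.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def build_sub_library(keywords):
    """Build the {fingerprint: primary_fingerprint} sub-library of one text."""
    hashes = sorted({keyword_hash(w) for w in keywords})
    primary = sum(hashes)
    # Assumption: each secondary fingerprint is the sum of all hashes but one.
    size = max(len(hashes) - 1, 1)
    secondary = {sum(combo) for combo in combinations(hashes, size)}
    # Key: every fingerprint in the union; Value: the primary fingerprint.
    return {fp: primary for fp in secondary | {primary}}


def merge_or_add(keywords, library):
    """Return True if the text matches the master library; otherwise add it."""
    sub = build_sub_library(keywords)
    if any(fp in library for fp in sub):
        return True  # Match: the sub-library is discarded.
    library.update(sub)  # No match: register the text's fingerprints.
    return False
```

Under these assumptions two texts are merged when their keyword sets are identical or differ by one keyword, and each lookup is a dictionary access rather than a comparison against every stored fingerprint.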

Description

Technical field

[0001] The invention relates to the similarity-based merging of massive texts in the field of text analysis, in which similar items of Internet public opinion information are merged to achieve deduplication. Specifically, it relates to a multi-dimensional information merging method, apparatus, device and storage medium.

Background art

[0002] With the popularization of the Internet, the Internet has gradually become the main carrier through which people publish, obtain and transmit information. With the advent of the self-media era, every individual can become a reporter and a messenger. Public opinion data is crucial to understanding public opinion in time, reflecting social information, controlling and guiding the correct development of public opinion, and supporting social stability and national development. The analysis of public opinion helps to understand the development trend of an event, avoid its vicious spread, and provide important decision...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/31, G06F16/33, G06F40/216, G06F40/258
CPC: G06F16/325, G06F16/334, G06F40/216, G06F40/258
Inventors: 赵自波, 李青龙, 骆飞, 赵冲
Owner: 北京智慧星光信息技术有限公司