Spark-Streaming text similarity analysis-based data processing method and device

A text similarity analysis technology, applied in the field of information processing, that achieves accurate text similarity analysis

Inactive Publication Date: 2018-05-08
陕西识代运筹信息科技股份有限公司
Cites: 8 | Cited by: 6

AI Technical Summary

Problems solved by technology

[0005] The embodiment of the present invention provides a data processing method and device for text similarity analysis based on Spark-Streaming, which solves the technical problem in the prior art that fast and accurate sentiment analysis cannot be performed on real-time network data streams.

Examples

Embodiment 1

[0036] Figure 1 is a schematic flowchart of a data processing method based on Spark-Streaming text similarity analysis in an embodiment of the present invention. As shown in Figure 1, the method includes:

[0037] Step 110: Dynamically obtain a real-time text database via Spark-Streaming;

[0038] Step 120: Obtain first text information from the real-time text database, where the first text information includes first text length information, first text word order information, first text keyword information, and first text grammar information;

[0039] Step 130: Obtain second text information from the real-time text database, where the second text information includes second text length information, second text word order information, second text keyword information, and second text grammar information;

[0040] Step 140: Obtain text length similarity information according to the first text length information and the second text length informat...
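
As a rough illustration of Steps 120 to 140, the following Python sketch extracts per-text features and computes a length similarity. The excerpt above does not give the formula, so the ratio used here, 1 - |l1 - l2| / (l1 + l2), is an assumption. The names `TextInfo`, `extract_text_info` and `length_similarity` are hypothetical, whitespace tokenisation and the small stop-word set stand in for a real word segmenter, and grammar information is left out.

```python
from dataclasses import dataclass


@dataclass
class TextInfo:
    """Features of one text (hypothetical container, not from the patent)."""
    length: int        # number of tokens
    word_order: dict   # token -> position of its first occurrence
    keywords: set      # tokens kept after dropping stop words


# Tiny illustrative stop-word list (assumption).
STOP_WORDS = {"the", "a", "an", "is", "of", "to", "and"}


def extract_text_info(text: str) -> TextInfo:
    """Split on whitespace and collect length, word-order and keyword features."""
    tokens = text.lower().split()
    word_order = {}
    for position, token in enumerate(tokens):
        word_order.setdefault(token, position)
    keywords = {t for t in tokens if t not in STOP_WORDS}
    return TextInfo(length=len(tokens), word_order=word_order, keywords=keywords)


def length_similarity(first: TextInfo, second: TextInfo) -> float:
    """Assumed formula: 1 - |l1 - l2| / (l1 + l2); 1.0 means equal length."""
    total = first.length + second.length
    if total == 0:
        return 1.0  # two empty texts are treated as identical in length
    return 1.0 - abs(first.length - second.length) / total


first = extract_text_info("the stream is fast")
second = extract_text_info("the stream is very fast today")
print(length_similarity(first, second))
```

The same `TextInfo` records could feed the word order, keyword and grammar comparisons of the later steps, which is why the sketch gathers all features in one pass.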

Embodiment 2

[0083] Based on the same inventive concept as the data processing method based on Spark-Streaming text similarity analysis in the foregoing embodiment, the present invention also provides a data processing device based on Spark-Streaming text similarity analysis. As shown in Figure 2, the device includes:

[0084] A first obtaining unit 11, used to dynamically obtain the real-time text database via Spark-Streaming;

[0085] A second obtaining unit 12, used to obtain the first text information from the real-time text database, where the first text information includes the first text length information, the first text word order information, the first text keyword information, and the first text grammar information;

[0086] A third obtaining unit 13, used to obtain the second text information from the real-time text database, where the second text information includes the s...
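
The division of work among the obtaining units can be pictured with a small Python sketch. It is a plain-Python stand-in and does not use the Spark API: `micro_batches` loosely mimics the micro-batch model of Spark Streaming for the first obtaining unit, and `text_pairs` plays the role of the second and third obtaining units by selecting a first and a second text. Both function names, the batch size and the choice to compare every pair within a batch are assumptions made for illustration.

```python
from itertools import combinations, islice
from typing import Iterable, Iterator, List, Tuple


def micro_batches(stream: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Group an incoming text stream into small batches (stand-in for unit 11)."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return  # stream exhausted
        yield batch


def text_pairs(batch: List[str]) -> Iterator[Tuple[str, str]]:
    """Yield every (first text, second text) pair in one batch (units 12 and 13)."""
    return combinations(batch, 2)


incoming = ["text a", "text b", "text c", "text d", "text e"]
for batch in micro_batches(incoming, batch_size=3):
    for first_text, second_text in text_pairs(batch):
        print(first_text, "<->", second_text)
```

Pairing inside a batch keeps the number of comparisons bounded by the batch size; comparing across batches would need extra state that this sketch does not model.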

Embodiment 3

[0124] Based on the same inventive concept as the data processing method based on Spark-Streaming text similarity analysis in the foregoing embodiment, the present invention also provides a data processing device based on Spark-Streaming text similarity analysis, on which a computer program is stored. When the program is executed by a processor, the steps of any one of the above-mentioned methods are realized.

[0125] In Figure 3, the bus architecture is represented by bus 300. Bus 300 may include any number of interconnected buses and bridges, and links together various circuits, including one or more processors represented by processor 302 and memory represented by memory 304. The bus 300 may also link together various other circuits, such as peripherals, voltage regulators, and power management circuits, which are well known in the art and thus will not be further described herein. The bus interface 306 provides an in...

Abstract

The invention provides a Spark-Streaming text similarity analysis-based data processing method and device, and relates to the technical field of computers. The method comprises the following steps: dynamically obtaining a real-time text database; obtaining first text information and second text information according to the real-time text database; obtaining text length similarity information; obtaining text word sequence similarity information; obtaining text keyword similarity information; obtaining text grammar similarity information; and determining a statement similarity between the first text information and the second text information according to the text length similarity information, the text word sequence similarity information, the text keyword similarity information and the text grammar similarity information. The method and device solve the technical problem in the prior art that rapid and correct emotion analysis cannot be carried out on real-time network data, and achieve the technical effect of multi-dimensional, real-time and correct text similarity analysis on mass texts.
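
The final step, combining four component similarities into one statement similarity, might look like the following Python sketch. The abstract does not say how the components are combined, so the weighted sum, the weight values, and the Jaccard overlap used for keyword similarity are all assumptions; `keyword_similarity` and `statement_similarity` are hypothetical names.

```python
def keyword_similarity(first_keywords: set, second_keywords: set) -> float:
    """Jaccard overlap of two keyword sets (an assumed measure)."""
    union = first_keywords | second_keywords
    if not union:
        return 1.0  # two texts with no keywords are treated as identical
    return len(first_keywords & second_keywords) / len(union)


def statement_similarity(length_sim: float, order_sim: float,
                         keyword_sim: float, grammar_sim: float,
                         weights=(0.1, 0.2, 0.5, 0.2)) -> float:
    """Weighted sum of four component similarities, each expected in [0, 1].

    The default weights are illustrative assumptions, not values from the patent.
    """
    components = (length_sim, order_sim, keyword_sim, grammar_sim)
    if any(not 0.0 <= c <= 1.0 for c in components):
        raise ValueError("each component similarity must lie in [0, 1]")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * c for w, c in zip(weights, components))


kw = keyword_similarity({"spark", "stream"}, {"spark", "batch"})
print(statement_similarity(0.8, 0.6, kw, 0.5))
```

Requiring the weights to sum to 1 keeps the combined score in the same [0, 1] range as its components, so a single threshold can be applied to it.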

Description

Technical field

[0001] The present invention relates to the technical field of information processing, in particular to a data processing method and device for text similarity analysis based on Spark-Streaming.

Background technique

[0002] The data in computing platforms commonly used in the prior art is massive, real-time and dynamically changing, so the size of the processing tasks on the data platform also changes dynamically, as do the data stream computing queries within an enterprise.

[0003] However, in the process of realizing the technical solution of the invention in the embodiment of the present application, the inventor of the present application found that the above-mentioned technology has at least the following technical problems:

[0004] The existing technology cannot perform fast, accurate and comprehensive similarity analysis on real-time network data streams.

Cont...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27, G06F17/22
CPC: G06F40/194, G06F40/284, G06F40/289, G06F40/30
Inventor: 李哲君卫华飞刘欢程瑞辉
Owner: 陕西识代运筹信息科技股份有限公司