Method and device for determining data similarity, electronic equipment and storage medium

A technology for determining data similarity, applied in electrical digital data processing, digital data information retrieval, special data processing applications, and related fields. It addresses the complex processing, poor accuracy of data similarity recognition results, and low efficiency of existing approaches.

Pending Publication Date: 2022-07-29
BEIJING BAIDU NETCOM SCI & TECH CO LTD

AI Technical Summary

Problems solved by technology

[0002] Data similarity plays an important role in recommender systems, user behavior recognition, natural language processing, and other fields. It is difficult to quantify the similarity of data sequences of different dimensions using a single standard. In the related art, data sequences of different sizes must first be processed into sequences of uniform size through methods such as dimensionality reduction, cropping, or data padding. This processing loses part of the information in the original data sequences, and the additional preprocessing makes the overall procedure more complex and less efficient, which ultimately leads to poor accuracy in the data similarity recognition results.

Examples

Detailed Description of Embodiments

[0028] Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding and should be considered as exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted from the following description for clarity and conciseness.

[0029] To help those skilled in the art better understand the embodiments of the present application, technical terms that may be involved in these embodiments are explained as follows:

[0030] Cosine similarity, also known as cosine distance, uses the cosine value of the angle between t...
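The definition above is cut off in this listing, so the following is only a minimal sketch of the standard textbook formula, cos(θ) = (u · v) / (‖u‖ ‖v‖), and not the patent's own wording. The function name `cosine_similarity` and the choice to return 0.0 for a zero vector are assumptions made for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors.

    Hypothetical helper: the name and the zero-vector convention are
    assumptions, not taken from the patent text.
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # The angle is undefined for a zero vector; 0.0 is a convention chosen here.
        return 0.0
    return dot / (norm_u * norm_v)
```

Note that the formula requires both vectors to have the same length, which is the constraint that forces the padding or cropping step criticized in paragraph [0002].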

Abstract

The invention provides a method and device for determining data similarity, an electronic device, and a storage medium. It relates to the field of data processing, in particular to the field of data similarity identification. In a specific implementation, a target data set is obtained and at least two target data sequences are selected from it; a target union set of the first elements in the at least two target data sequences is determined; probability matrices corresponding to the at least two target data sequences are constructed according to the target union set, where the transition probabilities in each row of a probability matrix sum to 0 or 1 and the elements of a probability matrix indicate the transition probabilities between the first elements; and the similarity between the at least two target data sequences is determined according to the probability matrices.
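The steps in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the claimed implementation: the abstract does not say how transition probabilities are estimated or how the matrices are compared, so the sketch assumes they are estimated from counts of adjacent element pairs and that the matrices are compared by the cosine of their flattened entries. The names `transition_matrix` and `sequence_similarity` are hypothetical.

```python
import math


def transition_matrix(sequence, states):
    """Row-normalized transition matrix of `sequence` over the shared `states`.

    Assumption: probabilities are estimated from adjacent-pair counts.
    A row sums to 1 if its state has outgoing transitions, otherwise to 0.
    """
    index = {state: i for i, state in enumerate(states)}
    n = len(states)
    counts = [[0] * n for _ in range(n)]
    for current, following in zip(sequence, sequence[1:]):
        counts[index[current]][index[following]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def sequence_similarity(seq_a, seq_b):
    """Similarity of two sequences of possibly different lengths.

    Assumption: matrices are compared by cosine similarity of their
    flattened entries; the abstract leaves the comparison unspecified.
    """
    # Union of the elements of both sequences fixes a common matrix size.
    states = sorted(set(seq_a) | set(seq_b))
    flat_a = [p for row in transition_matrix(seq_a, states) for p in row]
    flat_b = [p for row in transition_matrix(seq_b, states) for p in row]
    dot = sum(x * y for x, y in zip(flat_a, flat_b))
    norm_a = math.sqrt(sum(x * x for x in flat_a))
    norm_b = math.sqrt(sum(y * y for y in flat_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # no transitions observed in at least one sequence
    return dot / (norm_a * norm_b)
```

Because both matrices are indexed by the same union set, sequences of different lengths such as `["a", "b", "c"]` and `["a", "b", "c", "a", "b", "c"]` can be compared directly, without padding or cropping either one.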

Description

Technical field

[0001] The present disclosure relates to the technical field of data processing, and in particular to the field of data similarity identification.

Background

[0002] Data similarity plays an important role in recommendation systems, user behavior recognition, natural language processing, and other fields. It is difficult to quantify the similarity of data sequences of different dimensions using a single standard. In the related art, data sequences of different sizes need to be processed into sequences of uniform size by means of dimensionality reduction, cropping, or data padding. Preprocessing such as dimensionality reduction makes the processing more complex and less efficient, and ultimately leads to poor accuracy of data similarity recognition results.

Summary of the invention

[0003] The present disclosure provides a method, apparatus, electronic device, and storage medium for determining...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/2458, G06K9/62
CPC: G06F16/2462, G06F16/2474, G06F18/22
Inventor: 高建虎
Owner: BEIJING BAIDU NETCOM SCI & TECH CO LTD