A machine learning-based similarity computation method for multi-feature text data

A similarity calculation technology for text data, applied in unstructured text data retrieval, text database clustering/classification, special data processing applications, etc.

Active Publication Date: 2019-01-04
深圳市翼海云峰科技有限公司

AI Technical Summary

Problems solved by technology

[0007] 2. Text is non-quantitative information, so an effective quantification algorithm is required to calculate the similarity between data. For a data group containing N pieces of data, obtaining the most similar data for each piece of data requires, in theory, N×(N-1) calculations. When N is large, the algorithm is very inefficient.
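To make the N×(N-1) figure concrete, the following minimal sketch counts the comparisons a brute-force search performs. The `jaccard` character-set similarity and the sample strings are illustrative assumptions, not the similarity measure of the invention.

```python
from itertools import permutations


def jaccard(a: str, b: str) -> float:
    """Toy similarity (an assumption for this sketch): overlap of character sets."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def brute_force_most_similar(items):
    """For every item, find the index of its most similar other item.

    Returns (matches, comparisons), where comparisons counts every
    ordered pair that was evaluated.
    """
    best = {}
    comparisons = 0
    for i, j in permutations(range(len(items)), 2):
        comparisons += 1
        score = jaccard(items[i], items[j])
        if i not in best or score > best[i][1]:
            best[i] = (j, score)
    return {i: j for i, (j, _) in best.items()}, comparisons


# Hypothetical sample data with N = 4
items = ["abc", "abd", "xyz", "xyw"]
matches, comparisons = brute_force_most_similar(items)
```

With N = 4 the loop evaluates 4×3 = 12 ordered pairs. Because the count grows quadratically with N, a brute-force search becomes impractical for large data groups.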

Examples

Embodiment Construction

[0054] All features disclosed in this specification, or the steps of all methods or processes disclosed, may be combined in any manner, except for mutually exclusive features and/or steps.

[0055] Any feature disclosed in this specification (including any appended claims, abstract and drawings), unless expressly stated otherwise, may be replaced by alternative features which are equivalent or serve a similar purpose. That is, unless expressly stated otherwise, each feature is one example only of a series of equivalent or similar features.

[0056] As shown in Figure 2, each piece of data in the data group has d features, and each feature is of a text type or of a type that can be converted into text. The total number of pieces of data is N. The goal is to obtain, for any piece of data in the data group, the other data with the highest similarity to it.

[0057] To solve this problem, we propose a similarity calculation method for multi-feature text data. The procedure o...

Abstract

The invention discloses a machine learning-based method for calculating the similarity of multi-feature text data. Each feature of each piece of data is converted into a vector array by a text vectorization algorithm; the vector arrays generated from the multiple features of each piece of data are spliced and regularized, and the resulting vector arrays of all the data are assembled into a matrix; optionally, a PCA algorithm is used to reduce the dimension of the matrix; business experts mark a series of similar data pairs in the data, each pair consisting of two similar pieces of data; a vector distance mapping matrix is calculated from the similar data pairs, and a vector distance calculation formula is obtained from the vector distance mapping matrix; a low-precision aggregation algorithm is used. The method uses a machine learning algorithm to compute distances between multi-feature text data, and uses a low-precision clustering method to reduce the computational load and improve the performance of the algorithm.
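The first steps of the abstract can be sketched as follows. This is only an illustration under stated assumptions: the bag-of-characters `vectorize` function, the L2 scaling, and the diagonal inverse-variance weighting in `learn_diagonal_mapping` are stand-ins chosen for brevity, because this excerpt does not specify the vectorization algorithm or how the mapping matrix is computed. The optional PCA step and the low-precision aggregation step are omitted, and all names and sample records are hypothetical.

```python
import math

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"
INDEX = {ch: k for k, ch in enumerate(ALPHABET)}


def vectorize(text):
    """Assumed vectorizer: count letters and digits, ignore other characters."""
    vec = [0.0] * len(ALPHABET)
    for ch in text.lower():
        if ch in INDEX:
            vec[INDEX[ch]] += 1.0
    return vec


def record_vector(features):
    """Splice the per-feature vectors and scale the result to unit L2 length."""
    spliced = [x for feature in features for x in vectorize(str(feature))]
    norm = math.sqrt(sum(x * x for x in spliced)) or 1.0
    return [x / norm for x in spliced]


def learn_diagonal_mapping(vectors, similar_pairs, eps=1e-3):
    """Assumed stand-in for the mapping matrix: one weight per dimension.

    Dimensions in which expert-labeled similar records differ strongly
    receive a small weight; dimensions in which they agree receive a
    large one.
    """
    if not similar_pairs:
        raise ValueError("at least one similar pair is required")
    size = len(vectors[0])
    mean_sq = [0.0] * size
    for i, j in similar_pairs:
        for k in range(size):
            mean_sq[k] += (vectors[i][k] - vectors[j][k]) ** 2
    return [1.0 / (s / len(similar_pairs) + eps) for s in mean_sq]


def distance(x, y, weights):
    """Weighted Euclidean distance defined by the learned diagonal weights."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))


# Hypothetical records: (product number, brand, product model)
records = [
    ("1001", "Acme", "X-200"),
    ("1002", "Acme", "X-200"),
    ("2001", "Globex", "Z9 Pro"),
    ("2002", "Globex", "Z9 Pro"),
]
vectors = [record_vector(r) for r in records]
# Pairs (0, 1) and (2, 3) play the role of the expert-labeled similar pairs.
weights = learn_diagonal_mapping(vectors, similar_pairs=[(0, 1), (2, 3)])
d_similar = distance(vectors[0], vectors[1], weights)
d_different = distance(vectors[0], vectors[2], weights)
```

In this toy data the labeled pairs differ only in the product number, so the learned weights discount that feature and `d_similar` comes out smaller than `d_different`. A full mapping matrix, rather than a diagonal one, could also capture correlations between dimensions.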

Description

Technical field

[0001] The invention belongs to the technical field of computer software, and in particular relates to a machine learning-based method for calculating the similarity of multi-feature text data.

Background technique

[0002] In the field of big data processing, one requirement is to calculate the similarity of multi-feature text data. Multi-feature text data is data that contains multiple features (also called domains), where each feature is a piece of text composed of multiple characters, or a value of another data type that can be converted into text.

[0003] As shown in Figure 1, this is a set of e-commerce product data containing three features, "product number", "brand" and "product model"; each feature is a piece of text data (or data of another type that can be easily converted into text).

[0004] For such a set of multi-feature text data, many data applications involve calculating the similarity between the data. For example: ...
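A record of the kind described in [0002] and [0003] might be represented as follows. This is a minimal sketch: the `ProductRecord` class, the `as_text_features` helper, and the sample values are hypothetical, and only the three field names come from the e-commerce example.

```python
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class ProductRecord:
    """One piece of multi-feature data with d = 3 features."""
    product_number: int  # not text, but easily converted into text
    brand: str
    product_model: str


def as_text_features(record):
    """Convert every feature of a record into its text form."""
    return [str(value) for value in astuple(record)]


# Hypothetical sample data
records = [
    ProductRecord(1001, "Acme", "X-200"),
    ProductRecord(2001, "Globex", "Z9 Pro"),
]
text_rows = [as_text_features(r) for r in records]
```

Converting each feature to text up front lets every feature go through the same text-based processing, regardless of its original data type.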

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/35, G06F17/27, G06K9/62
CPC: G06F40/289, G06F18/2135, G06F18/22, Y02D10/00
Inventor: 陈磊
Owner: 深圳市翼海云峰科技有限公司