A machine learning-based similarity computation method for multi-feature text data

A similarity calculation technology for text data, applied in unstructured text data retrieval, text database clustering/classification, special data processing applications, etc.

Active Publication Date: 2019-01-04

深圳市翼海云峰科技有限公司

5 Cites, 3 Cited by

Problems solved by technology

[0007] 2. Text is non-quantitative information, so an effective quantification algorithm is required to calculate the similarity between pieces of data. For a data group containing N pieces of data, obtaining the most similar piece of data for each piece requires, in theory, N×(N-1) similarity calculations. When N is large, such an algorithm is very inefficient.
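To make the N×(N-1) figure concrete, here is a minimal sketch that counts the comparisons, assuming the figure means one similarity calculation per ordered pair (each of the N items compared against the other N-1). The function name `brute_force_comparisons` is illustrative and does not come from the patent.

```python
def brute_force_comparisons(n: int) -> int:
    """Similarity calculations needed when each of n items is compared
    against every one of the other n - 1 items (ordered pairs)."""
    return n * (n - 1)


# The count grows quadratically with the size of the data group.
for n in (10, 1_000, 100_000):
    print(f"N = {n:>7}: {brute_force_comparisons(n):>14} comparisons")
```

If the similarity measure is symmetric, half of these calculations are redundant, but the growth is still quadratic in N.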

Embodiment Construction

[0054] All features disclosed in this specification, and all steps of any method or process disclosed, may be combined in any manner, except for mutually exclusive features and/or steps.

[0055] Any feature disclosed in this specification (including any appended claims, abstract and drawings), unless expressly stated otherwise, may be replaced by alternative features which are equivalent or serve a similar purpose. That is, unless expressly stated otherwise, each feature is one example only of a series of equivalent or similar features.

[0056] As shown in Figure 2, each piece of data in the data group has d features, and each feature is either text or data that can be converted into text. The total number of pieces of data is N. The goal is to find, for any piece of data in the data group, the other data most similar to it.
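The goal stated above can be illustrated with an exhaustive baseline. This is only a sketch under stated assumptions: the three field names follow the e-commerce example in the Description, the record values are invented, and the similarity function (the mean of `difflib.SequenceMatcher` ratios over the features) is a stand-in, not the formula the patent derives.

```python
from difflib import SequenceMatcher

# Hypothetical data group: N = 3 records, d = 3 text features each.
FEATURES = ["product number", "brand", "product model"]
records = [
    {"product number": "A-1001", "brand": "Acme", "product model": "X200 Pro"},
    {"product number": "A-1002", "brand": "Acme", "product model": "X200"},
    {"product number": "B-7730", "brand": "Globex", "product model": "Turbo 9"},
]


def record_similarity(a: dict, b: dict) -> float:
    """Placeholder similarity: mean string-match ratio over all features."""
    ratios = [SequenceMatcher(None, a[f], b[f]).ratio() for f in FEATURES]
    return sum(ratios) / len(ratios)


def most_similar(index: int, data: list) -> int:
    """Index of the record most similar to data[index], by exhaustive search."""
    others = (j for j in range(len(data)) if j != index)
    return max(others, key=lambda j: record_similarity(data[index], data[j]))


print(most_similar(0, records))
```

Each call scans all other records, so answering the question for every record costs on the order of N×(N-1) similarity calculations.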

[0057] To solve this problem, we propose a similarity calculation method for multi-feature text data. The procedure o...

Abstract

The invention discloses a machine-learning-based method for calculating the similarity of multi-feature text data. Each feature of each piece of data is converted into a vector array using a text vectorization algorithm; the vectors generated from the multiple features of each piece of data are spliced together and regularized, and the resulting vector arrays for all of the data are assembled into a matrix; optionally, a PCA algorithm is used to reduce the dimension of the matrix; business experts mark a series of similar data pairs in the data, each pair consisting of two similar pieces of data; a vector distance mapping matrix is calculated from the similar data pairs, and a vector distance calculation formula is obtained from the vector distance mapping matrix; a low-precision clustering algorithm is used. The method uses a machine learning algorithm to calculate distances between multi-feature text data, and uses a low-precision clustering method to reduce the computational load and improve the performance of the algorithm.
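The steps in the abstract can be sketched end to end in plain Python. Everything concrete here is an assumption made for illustration, since the abstract does not give the actual algorithms: character-bigram hashing stands in for the text vectorization algorithm, L2 normalization is one reading of "regularized", a diagonal weight vector stands in for the vector distance mapping matrix, and single-pass leader clustering stands in for the low-precision clustering step. The optional PCA step is omitted, and the sample records are invented.

```python
import math

DIM = 16  # hypothetical length of each per-feature vector


def vectorize(text, dim=DIM):
    """Hash the character bigrams of one text feature into a count vector."""
    vec = [0.0] * dim
    padded = f" {text.lower()} "
    for a, b in zip(padded, padded[1:]):
        vec[(ord(a) * 31 + ord(b)) % dim] += 1.0
    return vec


def embed(record, features):
    """Splice the per-feature vectors together and L2-normalize the result."""
    spliced = [x for f in features for x in vectorize(str(record[f]))]
    norm = math.sqrt(sum(x * x for x in spliced)) or 1.0
    return [x / norm for x in spliced]


def learn_weights(vectors, similar_pairs, eps=1e-3):
    """Diagonal stand-in for the distance mapping matrix: dimensions that
    differ little across expert-marked similar pairs get larger weights."""
    spread = [0.0] * len(vectors[0])
    for i, j in similar_pairs:
        for k, (a, b) in enumerate(zip(vectors[i], vectors[j])):
            spread[k] += (a - b) ** 2
    n = max(len(similar_pairs), 1)
    return [1.0 / (s / n + eps) for s in spread]


def distance(u, v, weights):
    """Weighted Euclidean distance under the learned weights."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, u, v)))


def coarse_clusters(vectors, radius):
    """Cheap single-pass leader clustering on unweighted distance."""
    leaders, clusters = [], []
    for idx, v in enumerate(vectors):
        for c, leader in enumerate(leaders):
            if math.dist(v, vectors[leader]) <= radius:
                clusters[c].append(idx)
                break
        else:  # no leader close enough: start a new cluster
            leaders.append(idx)
            clusters.append([idx])
    return clusters


def nearest_in_cluster(query, clusters, vectors, weights):
    """Search for the nearest item only inside the query's own cluster."""
    cluster = next(c for c in clusters if query in c)
    candidates = [j for j in cluster if j != query]
    if not candidates:
        return None
    return min(candidates, key=lambda j: distance(vectors[query], vectors[j], weights))


# Hypothetical records and expert-marked similar pairs.
features = ["brand", "model"]
records = [
    {"brand": "Acme", "model": "X200 Pro"},
    {"brand": "ACME", "model": "X200"},
    {"brand": "Globex", "model": "Turbo 9"},
    {"brand": "Globex Inc", "model": "Turbo9"},
]
vectors = [embed(r, features) for r in records]
weights = learn_weights(vectors, similar_pairs=[(0, 1), (2, 3)])
clusters = coarse_clusters(vectors, radius=1.0)
print(clusters)
print(nearest_in_cluster(0, clusters, vectors, weights))
```

The saving comes from the last step: the weighted distance is evaluated only against members of the query's coarse cluster rather than against all N-1 other items. The trade-off is that a radius that is too small can separate items that the learned distance would have ranked as nearest.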

Description

Technical field

[0001] The invention belongs to the technical field of computer software, and in particular relates to a method for calculating the similarity of multi-feature text data based on machine learning.

Background

[0002] In the field of big data processing, one requirement is to calculate the similarity of multi-feature text data. Multi-feature text data is data that contains multiple features (also called fields), where each feature is a piece of text composed of multiple characters, or a value of another data type that can be converted into text.

[0003] Figure 1 shows a set of e-commerce product data containing three features: "product number", "brand" and "product model". Each feature is a piece of text data (or data of another type that can easily be converted into text).

[0004] For such a set of multi-feature text data, many data applications involve calculating the similarity between the data. For example: ...

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G06F16/35, G06F17/27, G06K9/62

CPC: G06F40/289, G06F18/2135, G06F18/22, Y02D10/00

Inventor: 陈磊

Owner: 深圳市翼海云峰科技有限公司
