Content similarity sorting algorithm

A sorting algorithm and similarity technology, applied in computing, biological neural network models, semantic analysis, and related fields. It addresses high consumption of computing resources, long computing time, and the resulting damage to the user experience. It does so by reducing the scope of computation, which saves computing resources and shortens computation time.

Pending Publication Date: 2020-08-07

广东南方智媒科技有限公司

Cites: 4; Cited by: 0


AI Technical Summary
This helps you quickly interpret patents by identifying the three key elements:
Problems solved by technology
Method used
Benefits of technology

Problems solved by technology

[0004] Although the Word2Vec algorithm released by Google meets the needs of text similarity calculation, using it for real-time calculation over a large number of manuscripts with text vector features at the 100-dimension level consumes a large amount of computing resources and computing time, which damages the user's experience of the app.

Method used



Examples


Embodiment 1

[0045] Embodiment 1: as shown in Figure 1, the present invention comprises two parts, text similarity recall and text semantic similarity sorting. Text similarity recall forms the text similarity recall layer, which is implemented with a BERT language model and a K-Means clustering model. Text semantic similarity sorting forms the text semantic similarity sorting layer, which is implemented with the Word2Vec neural network model.
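The recall layer described above can be illustrated with a minimal sketch. It assumes the classification probability vectors have already been produced (the source attributes this to a BERT model, which is not reproduced here), and it uses a plain, hand-written K-Means with deterministic initialization rather than any particular library. The function names `kmeans` and `recall_pool` and the sample vectors are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=20):
    """Plain K-Means; centroids start at the first k points (an assumption
    made here so the example is deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

def recall_pool(query_vec, centroids, labels):
    """Return the indices of all texts in the cluster nearest to the query."""
    nearest = min(
        range(len(centroids)),
        key=lambda c: squared_distance(query_vec, centroids[c]),
    )
    return [i for i, lab in enumerate(labels) if lab == nearest]

# Hypothetical classification probability vectors P (3 classes, 6 texts).
prob_vectors = [
    [0.90, 0.05, 0.05],
    [0.10, 0.80, 0.10],
    [0.85, 0.10, 0.05],
    [0.05, 0.90, 0.05],
    [0.80, 0.10, 0.10],
    [0.10, 0.85, 0.05],
]
centroids, labels = kmeans(prob_vectors, k=2)
pool = recall_pool([0.88, 0.07, 0.05], centroids, labels)
print(pool)  # indices of the candidate texts passed on to the sorting layer
```

Only the texts in `pool` would then be handed to the semantic similarity sorting layer, which is how the recall layer narrows the calculation range.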

[0046] As shown in Figures 2 and 3, text similarity recall specifically includes the following steps:

[0047] Step 1: Predict the classification probability vector P

[0048] (1) Process the massive text data, extracting the text length seq_len and the text content chars, and combine them into a text set with two dimensions;

[0049] (2) Build a bag-of-words model over all text content chars, and convert the text content chars into bag-of-words vectors words, extra...
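Sub-steps (1) and (2) can be sketched roughly as follows. This is an illustration under stated assumptions, not the patent's implementation: the names `seq_len`, `chars`, and `words` come from the text, while the helper functions, the character-level vocabulary, and the count-based vectors are assumptions made for the example. The rest of step (2) is truncated in the source and is not reconstructed here.

```python
from collections import Counter

def build_text_set(raw_texts):
    """Sub-step (1): keep one record per non-empty text with its length
    (seq_len) and its content (chars)."""
    records = []
    for text in raw_texts:
        chars = text.strip()
        if chars:  # skip empty documents
            records.append({"seq_len": len(chars), "chars": chars})
    return records

def build_vocabulary(records):
    """Sub-step (2): map each distinct character to a column index, in
    first-seen order (character-level tokens are an assumption)."""
    vocab = {}
    for record in records:
        for ch in record["chars"]:
            if ch not in vocab:
                vocab[ch] = len(vocab)
    return vocab

def to_bag_of_words(chars, vocab):
    """Convert one text into a count vector over the vocabulary."""
    vec = [0] * len(vocab)
    for ch, count in Counter(chars).items():
        if ch in vocab:
            vec[vocab[ch]] = count
    return vec

# Hypothetical toy input; real input would be the manuscript corpus.
raw_texts = [" abca ", "", "cab"]
text_set = build_text_set(raw_texts)
vocab = build_vocabulary(text_set)
words = [to_bag_of_words(r["chars"], vocab) for r in text_set]
print(text_set)
print(words)
```

A word-level tokenizer could replace the character-level vocabulary without changing the overall shape of the two sub-steps.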


Abstract

The invention belongs to the technical field of content similarity sorting of massive data, and in particular relates to a content similarity sorting algorithm based on text classification probability vectors and text semantic vectors in massive data. The algorithm is divided into two layers. The first layer is a similarity calculation recall layer based on text classification probability vectors; it uses the classification probability vectors to compute a locally optimal pool of similar texts, and is therefore called the text similarity recall layer. The second layer performs content similarity calculation on the locally optimal similar text pool produced by the first layer, and is therefore called the text semantic similarity sorting layer. The method embeds a coarse-grained text similarity sorting layer in front of the traditional text similarity calculation, so that an optimal similar text pool is screened out first and the calculation range of the text similarity is reduced. When facing massive data, the method both saves a large amount of computing resources and reduces computing time.
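The second layer's role, scoring only the recalled pool rather than the whole corpus, can be illustrated with a short sketch. Cosine similarity is assumed here as the similarity measure, since the source does not name one in this excerpt; the semantic vectors, cluster labels, and the function `rank_within_pool` are hypothetical stand-ins for the outputs of the two layers.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank_within_pool(query_vec, semantic_vecs, pool, top_n=3):
    """Score only the texts in the recalled pool, highest similarity first."""
    scored = [(i, cosine(query_vec, semantic_vecs[i])) for i in pool]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]

# Hypothetical semantic vectors (2-dimensional for readability) and the
# cluster label assigned to each text by the recall layer.
semantic_vecs = [
    [1.0, 0.0], [0.0, 1.0], [0.9, 0.1],
    [0.1, 0.9], [0.6, 0.4], [0.5, 0.5],
]
cluster_labels = [0, 1, 0, 1, 0, 1]

query_vec, query_cluster = [1.0, 0.0], 0
pool = [i for i, lab in enumerate(cluster_labels) if lab == query_cluster]
ranked = rank_within_pool(query_vec, semantic_vecs, pool)

print([i for i, _ in ranked])
print(f"similarity computations: {len(pool)} instead of {len(semantic_vecs)}")
```

In this toy case the sorting layer computes three similarities instead of six; the saving grows with corpus size as long as the recalled pool stays small relative to the full text set.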

Description

Technical field

[0001] The invention belongs to the technical field of content similarity sorting of massive data, and in particular relates to a content similarity sorting algorithm based on text classification probability vectors and text semantic vectors in massive data.

Background

[0002] With the rapid development of new media technology in recent years, major media platforms have moved toward the advantages of digital technology, such as high timeliness, low cost, and multiple channels, and this trend has gradually been accepted by the public. However, as media platforms have changed their mode of operation, the reading habits of consumers have also changed, shifting from systematic reading to fragmented reading, which leads to the lack of integrity and continuity of the in...


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G06F16/9538, G06F40/284, G06F40/30, G06N3/04

CPC: G06F16/9538, G06F40/284, G06F40/30, G06N3/045

Inventors: 麦淼, 王梦环, 李梓华

Owner: 广东南方智媒科技有限公司
