Content similarity sorting algorithm

A sorting algorithm and similarity technology applied in computing, biological neural network models, semantic analysis and related fields. It addresses high consumption of computing resources, long computation time and the resulting degraded user experience. Its effect is to narrow the scope of computation, save computing resources and reduce computation time.

Pending Publication Date: 2020-08-07
Guangdong Nanfang Zhimei Technology Co., Ltd. (广东南方智媒科技有限公司)

AI Technical Summary

Problems solved by technology

[0004] Although the Word2Vec algorithm released by Google meets the needs of text similarity calculation, using it for real-time calculation over a large number of manuscripts, with text vector features on the order of 100 dimensions, consumes a great deal of computing resources and computation time, which degrades the user experience of the app.
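To make the cost concrete, the exhaustive approach criticized in [0004] can be sketched as below. This is an illustration under assumptions the source does not state: each manuscript is represented by the average of its Word2Vec word vectors, and similarity is cosine similarity. Every query is scored against all N manuscripts, so one query costs on the order of N × d multiply-adds for d-dimensional vectors (d ≈ 100 here). The helper names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def mean_vector(tokens: Sequence[str], embeddings: Dict[str, Vector], dim: int) -> Vector:
    """Average the word vectors of known tokens (an assumed document representation)."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def brute_force_rank(query: Vector, corpus: Dict[str, Vector], top_k: int) -> List[str]:
    """Score the query against every manuscript: N cosine computations of d dims each."""
    scored = sorted(corpus.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
    return [doc_id for doc_id, _ in scored[:top_k]]
```

Because nothing limits which manuscripts are compared, the work per query grows linearly with the corpus size, which is the cost the two-layer design sets out to reduce.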


Examples


Embodiment 1

[0045] Example 1: As shown in Figure 1, the present invention comprises two parts, text similarity recall and text semantic similarity sorting. Text similarity recall is carried out by the text similarity recall layer, which is implemented with a BERT language model and a K-Means clustering model. Text semantic similarity sorting is carried out by the text semantic similarity sorting layer, which is implemented with a Word2Vec neural network model.
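The two-layer structure can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the BERT classification probability vectors and the Word2Vec semantic vectors have already been computed, uses plain Lloyd's K-Means with a deterministic initialization (the first k points), and ranks candidates by cosine similarity. The function names `kmeans` and `recall_then_rank` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def kmeans(points: List[Vector], k: int, iters: int = 20) -> Tuple[List[Vector], List[int]]:
    """Lloyd's K-Means; the first k points seed the centroids (deterministic, for illustration)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def recall_then_rank(
    query_p: Vector,
    query_sem: Vector,
    doc_ids: List[str],
    doc_p: List[Vector],
    doc_sem: List[Vector],
    k: int,
    top_n: int,
) -> List[str]:
    # Layer 1 (recall): cluster the classification probability vectors and keep only
    # the documents in the query's nearest cluster as the candidate pool.
    centroids, labels = kmeans(doc_p, k)
    nearest = min(range(k), key=lambda c: sq_dist(query_p, centroids[c]))
    pool = [i for i, lab in enumerate(labels) if lab == nearest]
    # Layer 2 (sorting): score only the pool by cosine similarity of semantic vectors.
    pool.sort(key=lambda i: cosine(query_sem, doc_sem[i]), reverse=True)
    return [doc_ids[i] for i in pool[:top_n]]
```

For brevity the sketch clusters on every call; a production system would presumably cluster offline and only look up the nearest centroid per query. Note the trade-off: a document outside the recalled cluster is never scored, even if its semantic vector would match well.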

[0046] As shown in Figures 2 and 3, text similarity recall specifically includes the following steps:

[0047] Step 1: Predict the classification probability vector P

[0048] (1) Process the massive text data: extract the text length seq_len and the text content chars, and combine them into a two-dimensional text set;

[0049] (2) For all text content chars, build a bag-of-words model and convert the text content chars into bag-of-words vectors words, extra...
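Steps (1) and (2) can be sketched as below, under assumptions the source does not spell out: seq_len is taken to be the character count of each text, and tokens come from a pluggable tokenizer that defaults to whitespace splitting (Chinese manuscripts would need a word segmenter or character-level splitting instead). The source is truncated after step (2), so only the count vectors are shown. The helper names are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Tokenizer = Callable[[str], List[str]]


def build_text_set(texts: List[str]) -> List[Tuple[int, str]]:
    """Step (1): pair each text's length seq_len with its content chars."""
    return [(len(chars), chars) for chars in texts]


def build_vocabulary(text_set: List[Tuple[int, str]], tokenize: Tokenizer = str.split) -> Dict[str, int]:
    """Step (2a): give every distinct token across all chars an index (the bag-of-words model)."""
    vocab: Dict[str, int] = {}
    for _, chars in text_set:
        for tok in tokenize(chars):
            vocab.setdefault(tok, len(vocab))
    return vocab


def to_bow(chars: str, vocab: Dict[str, int], tokenize: Tokenizer = str.split) -> List[int]:
    """Step (2b): convert one text into its token-count vector `words` over the vocabulary."""
    words = [0] * len(vocab)
    for tok, n in Counter(tokenize(chars)).items():
        if tok in vocab:  # tokens unseen when the vocabulary was built are ignored
            words[vocab[tok]] = n
    return words
```

For character-level splitting of Chinese text, `tokenize=list` can be passed to both helpers, since `list("今天")` yields `["今", "天"]`.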



Abstract

The invention belongs to the technical field of content similarity sorting of massive data and relates to content similarity sorting based on text classification probability vectors and text semantic vectors in massive data; in particular, it relates to a content similarity sorting algorithm based on these two kinds of vectors. The algorithm has two layers. The first layer is a similarity recall layer based on text classification probability vectors: it uses the classification probability vectors to compute a locally optimal pool of similar texts, and is therefore called the text similarity recall layer. The second layer computes content similarity within the locally optimal similar-text pool produced by the first layer, and is therefore called the text semantic similarity sorting layer. By embedding a coarse-grained text similarity sorting layer on top of traditional text similarity calculation, the method screens out an optimal pool of similar texts and thereby narrows the range over which text similarity must be calculated. When facing massive data, the method not only saves substantial computing resources but also reduces computation time.
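A rough operation count shows why the coarse-grained recall layer narrows the calculation range. The sketch below assumes documents are spread evenly across clusters, counts only vector multiply-adds (ignoring the offline clustering cost and the final sort), and uses purely hypothetical sizes; it is not a benchmark from the source. The function names are hypothetical.

```python
def brute_force_ops(n_docs: int, sem_dim: int) -> int:
    """Multiply-adds to compare one query with every document's semantic vector."""
    return n_docs * sem_dim


def two_layer_ops(n_docs: int, n_clusters: int, prob_dim: int, sem_dim: int) -> int:
    """Multiply-adds when the query is first matched against n_clusters centroids
    (prob_dim values each), then compared only with its own cluster's documents.
    Assumes the documents are spread evenly across clusters."""
    pool_size = -(-n_docs // n_clusters)  # ceiling division: documents per cluster
    return n_clusters * prob_dim + pool_size * sem_dim


# Hypothetical sizes: 1,000,000 manuscripts, 100-dim semantic vectors,
# 100 clusters, 20-class probability vectors.
full = brute_force_ops(1_000_000, 100)
coarse = two_layer_ops(1_000_000, 100, 20, 100)
print(full, coarse, round(full / coarse, 1))  # 100000000 1002000 99.8
```

Under these assumptions the per-query work drops by roughly a factor of the cluster count; actual savings depend on how unevenly documents fall into clusters.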

Description

Technical field

[0001] The invention belongs to the technical field of content similarity sorting of massive data and relates to content similarity sorting based on text classification probability vectors and text semantic vectors in massive data; in particular, it relates to a content similarity sorting algorithm based on text classification probability vectors and text semantic vectors in massive data.

Background

[0002] With the rapid development of new media technology in recent years, major media platforms have moved toward the strengths of digital technology, such as high timeliness, low cost and multiple channels, and this trend has gradually been accepted by the public. However, as media platforms have changed their mode of operation, consumers' reading habits have also changed, shifting from systematic reading to fragmented reading, which leads to a lack of integrity and continuity of the in...


Application Information

Patent type & authority: Application (China)
IPC(8): G06F16/9538; G06F40/284; G06F40/30; G06N3/04
CPC: G06F16/9538; G06F40/284; G06F40/30; G06N3/045
Inventors: Mai Miao (麦淼), Wang Menghuan (王梦环), Li Zihua (李梓华)
Owner: Guangdong Nanfang Zhimei Technology Co., Ltd. (广东南方智媒科技有限公司)