
Text similarity calculation method and apparatus

A text similarity calculation technology, applied in the field of text similarity calculation methods and devices, that addresses the problem of unsatisfactory real-time online prevention and control and improves response speed and calculation efficiency.

Active Publication Date: 2017-10-03
ADVANCED NEW TECH CO LTD
Cites: 6; Cited by: 7

AI Technical Summary

Problems solved by technology

[0004] However, when algorithms such as edit distance or cosine distance are used to calculate the similarity between a text sample generated from social text and each black sample, the computation is usually a 1:N polling. When the number of black samples is large, every black sample must be polled in turn to calculate the similarity, so the response speed cannot meet the requirements of real-time online prevention and control.
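To make the 1:N cost concrete, the following is a minimal sketch of the polling approach described above, using Levenshtein edit distance. Normalizing the distance by the longer string's length to obtain a similarity in [0, 1] is an assumption, and the names `edit_distance`, `similarity`, and `poll_all` are hypothetical, not taken from the patent.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb))) # substitute (free if equal)
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def poll_all(text: str, black_samples: dict) -> dict:
    """1:N polling: compare the input against every black sample in turn."""
    return {sid: similarity(text, sample) for sid, sample in black_samples.items()}
```

Each call to `poll_all` performs one O(len(a) * len(b)) comparison per black sample, so its cost grows linearly with the size of the black sample library, which is the bottleneck the paragraph identifies.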


Embodiment Construction

[0022] In related technologies, content review of the social text generated in a social application is performed against audited black samples containing bad content, and real-time online prevention and control is usually achieved in the following ways:

[0023] In one implementation, when a social application is first launched, dedicated risk control personnel can be assigned to manually browse the social text it generates and to judge, by hand, whether social text such as messages or service content posted by users through the application contains inappropriate content that violates regulations. When the number of users keeps growing and manual labor can no longer support rapid review, risk control personnel can configure a large number of keyword rules based on experience, and then the revie...
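As an illustration of the experience-based keyword rules mentioned above, here is a minimal sketch that represents each rule as a named regular expression. The rule names and patterns are hypothetical examples; the patent text does not specify how the rules are expressed.

```python
import re

# Hypothetical rules of the kind risk control personnel might configure.
KEYWORD_RULES = {
    "spam": re.compile(r"cheap\s+pills|click\s+here", re.IGNORECASE),
    "fraud": re.compile(r"wire\s+transfer", re.IGNORECASE),
}


def match_rules(text: str, rules: dict = KEYWORD_RULES) -> list:
    """Return the sorted names of all rules whose pattern occurs in the text."""
    return sorted(name for name, pattern in rules.items() if pattern.search(text))
```

Rules like these are cheap to evaluate but must be written and maintained by hand, and they only catch the exact phrasings someone anticipated.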


Abstract

The invention provides a text similarity calculation method. The method comprises: performing word segmentation on each text sample in the original black sample library and on a newly input text sample; filtering the resulting segmented words under the same discarding (drop) policy at multiple text filtering ratios that retain words in graded steps; reconstructing the text samples in the original black sample library and the newly input text sample from the segmented words that remain after filtering; using the filtering ratio of the segmented words to represent the similarity between the newly input text sample and a black sample; and, by matching the segmented words of the reconstructed black sample library against those of the reconstructed newly input text sample, assigning a black-sample similarity to the segmented words obtained from the newly input text sample. The method can significantly improve the efficiency of calculating the similarity between a newly input text sample and the text samples in the black sample library.
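The abstract does not specify the drop policy, the ratio values, or how reconstructed samples are matched, so the following is only a minimal sketch under stated assumptions, not the patented implementation. Assumptions: input is already word-segmented; the "same drop policy" is modeled as hashing each word to a stable value in [0, 1) and keeping it only if that value is below the retention ratio, so every text drops the same words; a reconstructed sample is the sorted tuple of its kept (deduplicated) words, used as a dictionary key; and similarity is the highest retention ratio at which a black sample and the input reconstruct identically. All names (`RETAIN_RATIOS`, `token_rank`, `reconstruct`, `build_index`, `query`) are hypothetical.

```python
import hashlib
from collections import defaultdict

# Hypothetical retention ratios (the graded "preserving gradients"), highest first.
RETAIN_RATIOS = (1.0, 0.8, 0.6, 0.4)


def token_rank(token: str) -> float:
    """Map a word to a stable value in [0, 1); depends only on the word itself."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:6], "big") / 2**48  # 48 bits: exact and < 1.0


def reconstruct(tokens, ratio: float) -> tuple:
    """Keep words ranked below the ratio; the sorted tuple is the reconstructed sample."""
    return tuple(sorted({t for t in tokens if token_rank(t) < ratio}))


def build_index(black_samples: dict) -> dict:
    """Index each segmented black sample under its reconstructed key at every ratio."""
    index = {r: defaultdict(set) for r in RETAIN_RATIOS}
    for sample_id, tokens in black_samples.items():
        for r in RETAIN_RATIOS:
            key = reconstruct(tokens, r)
            if key:  # an empty reconstruction would match any other empty one
                index[r][key].add(sample_id)
    return index


def query(index: dict, tokens) -> dict:
    """Return {sample_id: similarity}, the highest ratio at which each sample matches."""
    scores = {}
    for r in RETAIN_RATIOS:  # highest ratio first, so setdefault keeps the best
        key = reconstruct(tokens, r)
        if not key:
            continue
        for sample_id in index[r].get(key, ()):
            scores.setdefault(sample_id, r)
    return scores
```

In this sketch a query costs one dictionary lookup per ratio instead of one comparison per black sample. A match at ratio 1.0 means the two word sets are identical; a match only at a lower ratio means the differing words all happened to be dropped, which is why the ratio can serve as a coarse similarity score.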

Description

Technical field
[0001] The present application relates to the field of computer applications, and in particular to a method and device for calculating text similarity.
Background
[0002] Social applications usually face the problem of content review. A social product usually has tens of millions or even hundreds of millions of users, and a huge volume of information is exchanged at every moment. Therefore, quickly completing real-time online prevention and control of various harmful content based on audited historical content is of great significance.
[0003] In related technologies, real-time online prevention and control of bad content based on reviewed historical bad content is usually based on text similarity; for example, algorithms such as edit distance or cosine distance are used to calculate the text similarity between the text samples generated by social applications and every black sample containing bad content that h...
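For comparison with edit distance, the cosine-distance approach mentioned in [0003] can be sketched over bag-of-words term-frequency vectors of already segmented text. Using raw term counts (rather than, say, TF-IDF weights) is an assumption, and the function name is hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b) -> float:
    """Cosine of the angle between two term-frequency vectors (0.0 if either is empty)."""
    va, vb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0
```

Like edit distance, this compares the input against one black sample per call, so checking N black samples still requires N calls.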


Application Information

Patent Type & Authority: Application (China)
IPC (8th edition): G06F17/22; G06F17/27
CPC: G06F40/194; G06F40/30
Inventor: 郑丹丹 (Zheng Dandan)
Owner: ADVANCED NEW TECH CO LTD