Text similarity determining method and device, electronic equipment and system

A text similarity determination technology, applied in the field of text similarity determination methods, devices, electronic equipment and systems. It addresses the problem of inaccurate text similarity and achieves a reduced amount of calculation, improved efficiency, and ensured accuracy.

Status: Inactive. Publication date: 2015-11-25
TENCENT TECH (SHENZHEN) CO LTD

AI Technical Summary

Problems solved by technology

[0005] In view of this, the embodiment of the present invention provides a text similarity determination method, device, electronic equipm




Example Embodiment

[0040] The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only a part of the embodiments of the present invention, but not all of the embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative efforts shall fall within the protection scope of the present invention.

[0041] Figure 1 is a flowchart of a method for determining text similarity provided by an embodiment of the present invention. The method can be applied to user-side devices (such as notebook computers, mobile phones and other terminal devices) as well as to network-side devices (such as servers). Referring to Figure 1, the method can include:

[0042] Step S100, acquiring at least two texts;

[004...



Abstract

The embodiment of the invention provides a text similarity determining method and device, electronic equipment and a system. The method comprises the following steps: obtaining at least two texts; mapping each obtained text into a topic space; performing dimension reduction on the vector corresponding to each text mapped into the topic space; determining, for each text after dimension reduction, a hash function that represents the features of the text; binarizing the hash function corresponding to each text to obtain a binary code for each text, where the binary codes of all texts have the same length; and determining the Hamming distance between the texts from their binary codes, then determining the text similarity between the texts from that Hamming distance. The embodiment of the invention reduces the amount of computation needed to calculate text similarity while preserving the accuracy of the determined similarity, and thereby improves the efficiency of the calculation.
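The steps listed in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the visible text does not say how texts are mapped to the topic space, how dimensions are reduced, or what form the hash function takes. The toy keyword-based topic model (`TOPICS`), the random Gaussian projection used for dimension reduction, and the sign-threshold binarization are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import random

# ASSUMPTION: a toy topic model in which each topic is a set of indicative
# words. A real system would use a trained topic model instead.
TOPICS = [
    {"goal", "match", "team", "league"},
    {"stock", "market", "shares", "bank"},
    {"movie", "actor", "film", "director"},
    {"phone", "software", "chip", "app"},
]


def to_topic_vector(text):
    """Map a text to a vector of topic proportions (toy stand-in)."""
    words = text.lower().split()
    counts = [sum(w in topic for w in words) for topic in TOPICS]
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def make_projection(in_dim, out_dim, seed=0):
    """Build a random Gaussian matrix (ASSUMED dimension-reduction step)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]


def reduce_dimensions(vector, projection):
    """Project the topic vector onto a lower-dimensional space."""
    return [sum(p * v for p, v in zip(row, vector)) for row in projection]


def binarize(values):
    """Turn real-valued components into bits by thresholding at zero (ASSUMED)."""
    return [1 if value >= 0 else 0 for value in values]


def encode(text, projection):
    """Full pipeline for one text: topic space, reduction, binarization."""
    return binarize(reduce_dimensions(to_topic_vector(text), projection))


def hamming_distance(code_a, code_b):
    """Count the bit positions in which two equal-length codes differ."""
    if len(code_a) != len(code_b):
        raise ValueError("binary codes must have the same length")
    return sum(a != b for a, b in zip(code_a, code_b))


def similarity(code_a, code_b):
    """Convert a Hamming distance to a similarity score in [0, 1]."""
    return 1.0 - hamming_distance(code_a, code_b) / len(code_a)


if __name__ == "__main__":
    projection = make_projection(in_dim=len(TOPICS), out_dim=3)
    code_a = encode("the team scored a late goal in the match", projection)
    code_b = encode("shares fell as the stock market opened", projection)
    print(code_a, code_b, similarity(code_a, code_b))
```

Because every text is encoded with the same projection, all binary codes have the same length, which is what makes the Hamming comparison well defined. Comparing short bit strings is also much cheaper than comparing high-dimensional real-valued vectors, which is the efficiency gain the abstract refers to.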

Description

Technical field

[0001] The present invention relates to the technical field of data processing, and more specifically to a text similarity determination method, device, electronic equipment and system.

Background

[0002] Text similarity refers to the degree of semantic association between different texts. Determining text similarity is one of the core tasks of text mining and text retrieval. How to better determine text similarity has therefore always been a problem of great concern to those skilled in the art.

[0003] Generally speaking, a single text can be directly expressed as a vector in the word space, and the calculation of text similarity can be transformed into the calculation of the Euclidean distance or cosine distance between vectors. On this basis, the existing methods for determining text similarity mainly include: mapping the text into a vector in the word space, calculating the Euclidean distance or cosine distance b...
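The word-space baseline described in the background can be sketched as follows. This is a minimal illustration, assuming whitespace tokenization and raw word counts as vector components; the source does not specify either, and the function names are hypothetical.

```python
import math
from collections import Counter


def word_vectors(text_a, text_b):
    """Represent two texts as count vectors over their shared vocabulary."""
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())
    vocabulary = sorted(set(counts_a) | set(counts_b))
    return ([counts_a[w] for w in vocabulary],
            [counts_b[w] for w in vocabulary])


def euclidean_distance(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_distance(u, v):
    """One minus the cosine of the angle between the vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        return 1.0  # treat an empty text as maximally distant
    return 1.0 - dot / norm


if __name__ == "__main__":
    u, v = word_vectors("the cat sat", "the cat ran")
    print(euclidean_distance(u, v), cosine_distance(u, v))
```

The vectors here have one component per distinct word, so their length grows with the vocabulary. That growth is the cost that a compact binary code compared by Hamming distance is meant to avoid.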


Application Information

IPC(8): G06F17/22, G06F17/30
Inventor: 刘洋, 李霖, 刘黎春, 陈川
Owner: TENCENT TECH (SHENZHEN) CO LTD