Text relevance calculating method and device

A relevance calculation technology, applied in the field of Internet applications, that addresses the aggravated feature sparseness and the large, wasteful storage footprint of existing methods. It saves storage space, avoids feature sparseness, and improves computing speed.

Active Publication Date: 2015-03-18
TENCENT TECH (SHENZHEN) CO LTD

AI Technical Summary

Problems solved by technology

[0006] However, the traditional calculation method based on the word vector space model described in the literature suffers from sparse features on short texts.
In addition, because the word segmentation of short texts depends on the language model, different segmentations are not guaranteed to be consistent, which further aggravates vector sparseness.
As a result, the traditional method determines relevance with low accuracy.
[0007] Moreover, the traditional method requires a large amount of storage space for the word vectors, which wastes storage and increases cost.
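
The sparsity problem in [0006] can be illustrated with a minimal sketch. It assumes whitespace tokenization and raw term counts as the word vector; the helper names `bag_of_words` and `cosine` and the two example texts are hypothetical and do not come from the patent.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Map a text to a sparse term-count vector (whitespace tokens, an assumption)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    # Counter returns 0 for missing terms, so only shared terms contribute.
    dot = sum(count * v[term] for term, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Two short texts with related intent but no shared token (hypothetical examples).
query = bag_of_words("cheap flights")
keyword = bag_of_words("discount airfare")

print(cosine(query, keyword))  # 0.0: the vectors share no dimension
print(cosine(query, query))    # close to 1.0: identical vectors
```

With only two tokens per text, a single differing word choice removes all overlap, so the score drops to zero even though the texts are related. This is the accuracy problem that the section attributes to short texts.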



Examples


Embodiment approach

[0117] Embodiments of the present invention include:

[0118] For each pair of short texts scored by editors, calculate the aforementioned correlation feature scores to form a feature vector;

[0119] Use the feature vectors to form M training samples. Let the editorial score be S (S ∈ [0, 1]); samples that satisfy the labeling condition on S (elided in the source) are labeled 1, and the remaining samples are labeled 0;

[0120] Train a binary logistic regression model to obtain the weights $w_1, w_2, \dots, w_n$ of the correlation features and the bias $b$;

[0121] For two given short texts $T_1$ and $T_2$, first calculate the aforementioned correlation feature scores $R_1, R_2, \dots, R_n$, and then use the sigmoid function to calculate the final correlation score as

[0122] $R(T_1, T_2) = \frac{1}{1 + \dots}$
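
The training and scoring steps in [0119] to [0122] can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation. The formula in [0122] is truncated in the source, so the code assumes the standard logistic form sigmoid(w_1 R_1 + ... + w_n R_n + b). The labeling cutoff `THRESHOLD`, the batch gradient descent trainer, the function names, and all data values are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def relevance(features, weights, bias):
    """Combine feature scores R_1..R_n into one score in (0, 1).

    Assumes the standard logistic form sigmoid(sum(w_i * R_i) + b).
    """
    z = sum(w * r for w, r in zip(weights, features)) + bias
    return sigmoid(z)


def train(samples, labels, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n = len(samples[0])
    m = len(samples)
    weights, bias = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for x, y in zip(samples, labels):
            err = relevance(x, weights, bias) - y
            for i in range(n):
                grad_w[i] += err * x[i]
            grad_b += err
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / m
    return weights, bias


# Hypothetical data: each row holds two feature scores for one text pair,
# and each editorial score S lies in [0, 1].
feature_vectors = [[0.9, 0.8], [0.8, 0.6], [0.2, 0.3], [0.1, 0.1]]
editor_scores = [0.9, 0.7, 0.3, 0.1]

THRESHOLD = 0.5  # assumed cutoff; the source elides the labeling condition
labels = [1 if s >= THRESHOLD else 0 for s in editor_scores]

weights, bias = train(feature_vectors, labels)
high = relevance([0.85, 0.7], weights, bias)
low = relevance([0.15, 0.2], weights, bias)
print(high > low)
```

Because the model stores only n weights and one bias, scoring a new pair costs one dot product and one sigmoid evaluation once the feature scores are available.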


Abstract

Embodiments of the invention provide a text relevance calculating method and device. The method comprises the following steps: receiving a first character string and a second character string; calculating a text relevance feature value and a semantic relevance feature value for the first character string and the second character string; and fitting the text relevance feature value and the semantic relevance feature value into a single relevance value for the two character strings based on a logistic regression model. The method and device increase the precision of relevance judgment, save storage space, and reduce cost.

Description

Technical field

[0001] The embodiments of the present invention relate to the technical field of Internet applications, and more specifically, to a text relevance calculation method and device.

Background

[0002] With the rapid development of computer and network technology, the Internet plays an increasingly important role in people's daily life, study, and work. New applications emerge on the Internet in an endless stream.

[0003] Search advertising is a very important business in the Internet advertising ecosystem. It is attached to search engines and is essentially keyword-based sales matching. In the commercial promotion database, in addition to providing the title and description of the ad for display, the advertiser must also add relevant keywords (that is, purchase words) to the ad and specify the match type, bid, and targeting used to match target traffic (that is, users who match the search intent). In a classic matching process,...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F17/27
CPC: G06F16/3347, G06F16/35, G06F16/374
Inventor: 赫南, 张文斌, 姚伶伶, 王莉峰, 何琪, 张博
Owner: TENCENT TECH (SHENZHEN) CO LTD