A method and a device for calculating the weight of a word segmentation item

A term weighting and word segmentation technology, applied in the computer field. It addresses the problem that existing term weights do not reflect the importance of the same term in different query words, and achieves more accurate weight prediction.

Active Publication Date: 2019-06-28
TENCENT TECH (SHENZHEN) CO LTD
6 Cites, 9 Cited by

AI Technical Summary

Problems solved by technology

[0003] In the prior art, the weight of each term in a query word is mainly calculated from co-occurrence statistics of the word obtained from a multi-text data set, such as term frequency-inverse document frequency (TF-IDF) and mutual information. These statistical features only consider information such as the co-occurrence of words in the text, and this information is independent of the query word itself, so the calculated term weight does not reflect the importance of the same term in different query words.
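The limitation described in [0003] can be illustrated with a minimal sketch. The toy corpus, the smoothed IDF formula, and the function names `idf_table` and `static_term_weights` are assumptions made for illustration and do not come from the patent. The point is only that a corpus-level statistic assigns a term the same weight in every query.

```python
import math

def idf_table(corpus):
    """Compute a smoothed inverse document frequency for every term in the corpus."""
    n_docs = len(corpus)
    doc_freq = {}
    for doc in corpus:
        for term in set(doc):  # count each term once per document
            doc_freq[term] = doc_freq.get(term, 0) + 1
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}

def static_term_weights(query_terms, idf):
    """Look up each term's corpus-level IDF; the rest of the query is never consulted."""
    return {t: idf.get(t, 0.0) for t in query_terms}

# Toy corpus, invented for illustration.
corpus = [
    ["apple", "phone", "price"],
    ["apple", "pie", "recipe"],
    ["phone", "case"],
    ["pie", "crust", "recipe"],
]
idf = idf_table(corpus)
weights_a = static_term_weights(["apple", "phone", "price"], idf)
weights_b = static_term_weights(["apple", "pie", "recipe"], idf)

# "apple" receives an identical weight in both queries.
print(weights_a["apple"] == weights_b["apple"])  # prints True
```

Because the lookup never sees the other terms of the query, "apple" cannot be weighted differently in a phone query and in a recipe query, which is the gap the patent sets out to close.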

Detailed Description of Embodiments

[0027] Embodiments of the present invention provide a method and device for calculating the weight of word segmentation terms, which are used to accurately predict the weight of each word segmentation term in a query word.

[0028] In order to make the purpose, features and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention are described clearly and completely below in conjunction with the accompanying drawings. Obviously, the embodiments described below are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention belong to the protection scope of the present invention.

[0029] The terms "comprising" and "having" in the description and claims of the present invention and the above drawings, as well as any variations thereof, are...

Abstract

The embodiment of the invention discloses a method and a device for calculating the weight of a segmented word item, which are used for accurately predicting the weight of each segmented word item in a query word. The method comprises the following steps: carrying out word vector training using a text corpus and historical query words, where the historical query words are obtained from historical search data, to obtain word vectors of the historical query words; taking the word vectors of the historical query words as features; training, with a machine learning algorithm, a plurality of segmented word item weights that depend on the historical query words, using a target value calculated from historical behavior data and the recall results obtained by searching the historical query words, and ending the training when the error is minimal or the number of iterations reaches a threshold; and using the machine learning algorithm to calculate the weights of the word segmentation items of a target query word, and outputting weight values of a plurality of word segmentation items that depend on the target query word.
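The training loop in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the abstract does not name the learning algorithm, so a sigmoid-linear model trained by stochastic gradient descent on squared error is swapped in. The feature construction (term vector, mean query vector, and their element-wise product), the toy word vectors, the target values, and the parameters `lr`, `max_iters` and `tol` are all hypothetical. Word vector training itself is skipped and replaced by hand-written vectors.

```python
import math

def mean_vector(vectors):
    """Average a list of equal-length vectors."""
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(len(vectors[0]))]

def features(term, query_terms, word_vectors):
    """Hypothetical query-dependent features: term vector, query mean, and their product."""
    t = word_vectors[term]
    q = mean_vector([word_vectors[w] for w in query_terms])
    return t + q + [a * b for a, b in zip(t, q)]

def predict(weights, bias, x):
    """Sigmoid of a linear score, so each term weight lies strictly between 0 and 1."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, word_vectors, lr=0.5, max_iters=2000, tol=1e-9):
    """Fit by SGD; stop when the error stops changing or the iteration cap is hit."""
    rows = [(features(t, q, word_vectors), y)
            for q, targets in samples for t, y in targets.items()]
    weights, bias = [0.0] * len(rows[0][0]), 0.0
    prev_error = float("inf")
    for iteration in range(1, max_iters + 1):
        error = 0.0
        for x, y in rows:
            p = predict(weights, bias, x)
            diff = p - y
            error += diff * diff
            grad = 2.0 * diff * p * (1.0 - p)  # derivative of squared error w.r.t. the score
            weights = [w - lr * grad * xi for w, xi in zip(weights, x)]
            bias -= lr * grad
        error /= len(rows)
        if abs(prev_error - error) < tol:  # proxy for "the error is minimal"
            break
        prev_error = error
    return weights, bias, iteration, error

def term_weights(query_terms, model, word_vectors):
    """Weights of each segmented term, conditioned on the whole query."""
    weights, bias = model
    return {t: predict(weights, bias, features(t, query_terms, word_vectors))
            for t in query_terms}

# Toy 2-d word vectors and target values, invented for illustration.
word_vectors = {
    "apple": [1.0, 0.2], "phone": [0.9, -0.8], "price": [0.1, -0.5],
    "pie": [0.3, 0.9], "recipe": [-0.2, 0.8],
}
samples = [
    (["apple", "phone", "price"], {"apple": 0.8, "phone": 0.9, "price": 0.2}),
    (["apple", "pie", "recipe"], {"apple": 0.4, "pie": 0.9, "recipe": 0.7}),
]
w, b, iterations, final_error = train(samples, word_vectors)
phone_query = term_weights(["apple", "phone", "price"], (w, b), word_vectors)
recipe_query = term_weights(["apple", "pie", "recipe"], (w, b), word_vectors)
print(phone_query["apple"], recipe_query["apple"])
```

Since the features of a term include the mean vector of the query it appears in, the same term can receive different weights in different queries, which is the property the abstract describes as weights "depending on" the query word.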

Description

Technical field

[0001] The invention relates to the field of computer technology, in particular to a method and device for calculating the weight of word segmentation items.

Background

[0002] In a search engine, the user inputs a query word (query), and word segmentation of the query word yields multiple word segmentation terms (term). When a user enters a query word, the goal is to obtain useful information related to the query word. A good search engine accurately returns the information the user is looking for and ranks it first. Documents are recalled by intersecting the sets of documents that contain each term of the query. If the query is too long, some documents may not be recalled correctly and displayed to the user. Therefore, it is necessary to calculate the weight of each term in the query, and to recall and sort the documents according to these weights. As an effective module, term weight is very i...
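The recall problem described in [0002] can be sketched with a small inverted index. The documents, the query, the weight values, and the function names `build_index`, `recall` and `recall_with_weights` are hypothetical, and keeping only the top-weighted terms is just one plausible way to use term weights during recall, not necessarily the one used in the patent.

```python
def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = {}
    for doc_id, terms in docs.items():
        for term in terms:
            index.setdefault(term, set()).add(doc_id)
    return index

def recall(query_terms, index):
    """Strict recall: a document must contain every term of the query."""
    postings = [index.get(t, set()) for t in query_terms]
    return set.intersection(*postings) if postings else set()

def recall_with_weights(query_terms, weights, index, keep=3):
    """Relaxed recall: intersect only the `keep` highest-weighted terms."""
    ranked = sorted(query_terms, key=lambda t: weights.get(t, 0.0), reverse=True)
    return recall(ranked[:keep], index)

# Toy documents, query and weights, invented for illustration.
docs = {
    1: ["cheap", "flights", "beijing"],
    2: ["beijing", "hotel", "deals"],
    3: ["flights", "shanghai"],
}
index = build_index(docs)
query = ["please", "find", "cheap", "flights", "beijing"]
weights = {"please": 0.05, "find": 0.1, "cheap": 0.5, "flights": 0.9, "beijing": 0.8}

strict = recall(query, index)                         # no document contains every term
relaxed = recall_with_weights(query, weights, index)  # uses flights, beijing, cheap
print(strict, relaxed)  # prints set() {1}
```

In this toy case the long query recalls nothing under strict intersection, while dropping the two lowest-weighted terms recovers the relevant document.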

Application Information

IPC(8): G06F16/9535, G06F17/18, G06N99/00
Inventor 邓亚平, 连凤宗
Owner TENCENT TECH (SHENZHEN) CO LTD