Webpage training method and device and search intention recognition method and device

A webpage classification and intent recognition technology, applied in the Internet field, that addresses the limited number of labeled results, high cost, and low intent recognition accuracy of existing methods, and achieves high intent accuracy and large webpage coverage.

Active Publication Date: 2017-07-14
TENCENT TECH (SHENZHEN) CO LTD
20 Cites, 22 Cited by

AI Technical Summary

Problems solved by technology

[0003] Existing search intent recognition methods often rely on manual labeling to mark webpage categories. Intent recognition then depends on these manually labeled categories, and the webpage set for each category must be labeled by hand, which is costly, and the number of manually labeled results is often limited. For webpages with a low click-through rate, the category is likely to be unknown, resulting in low intent recognition accuracy.

Embodiment Construction

[0039] Figure 1 is a diagram of the application environment in which the webpage training method and the search intent recognition method run in one embodiment. As shown in Figure 1, the application environment includes a terminal 110 and a server 120, which communicate through a network.

[0040] The terminal 110 may be a smartphone, a tablet computer, a notebook computer, a desktop computer, etc., but is not limited thereto. The terminal 110 sends a query string to the server 120 through the network for searching, and the server 120 can respond to the request sent by the terminal 110.

[0041] In one embodiment, the internal structure of the server 120 in Figure 1 is as shown in Figure 2. The server 120 includes a processor, a storage medium, a memory, and a network interface connected through a system bus. The storage medium of the server 120 stores an operating system, a database, and a device for identifying search intenti...

Abstract

The invention relates to a webpage training method and device. The method comprises acquiring a training webpage set with manually annotated categories and generating a webpage vector for each webpage in the set. For a first training webpage in the set, its valid historical query strings are acquired and segmented into words. The valid count of each segmented word is then acquired, where the valid count is the total number of times the segmented word appears in the valid historical query strings. A weight is calculated for each segmented word from its valid count, and the webpage vector of the first training webpage is generated from the segmented words and their corresponding weights. Training is then performed on the manually annotated categories of the webpages in the training set and the corresponding webpage vectors to generate a webpage classification model. With this method and device, training cost is low and efficiency is high; once the webpage classification model has been generated, webpages can be categorized automatically, and the accuracy of the recognized intent is higher. The invention further provides a search intent recognition method and device.
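
The steps in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the whitespace `segment` function stands in for a real word segmenter, the log-damped, L2-normalised weighting in `word_weights` is an assumed formula (the abstract only says weights are calculated from the valid counts), and the nearest-centroid classifier in `train` and `classify` is a stand-in for whatever model the method actually trains. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def segment(query):
    # Placeholder segmenter (assumption): split on whitespace.
    # Chinese query strings would need a real word segmenter.
    return query.lower().split()


def valid_counts(valid_queries):
    # Valid count of a word = total number of times it appears
    # across the page's valid historical query strings.
    counts = Counter()
    for query in valid_queries:
        counts.update(segment(query))
    return counts


def word_weights(counts):
    # Assumed weighting: log-damped count, then L2 normalisation.
    raw = {word: math.log(1 + c) for word, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in raw.values())) or 1.0
    return {word: v / norm for word, v in raw.items()}


def webpage_vector(valid_queries):
    # Sparse webpage vector: segmented word -> weight.
    return word_weights(valid_counts(valid_queries))


def cosine(a, b):
    dot = sum(v * b.get(word, 0.0) for word, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def train(training_set):
    # training_set: list of (valid_queries, manually_annotated_category).
    # Stand-in model (assumption): one summed vector per category.
    sums = defaultdict(lambda: defaultdict(float))
    for queries, category in training_set:
        for word, weight in webpage_vector(queries).items():
            sums[category][word] += weight
    return {category: dict(vec) for category, vec in sums.items()}


def classify(model, valid_queries):
    # Label an unannotated page with the most similar category centroid.
    vec = webpage_vector(valid_queries)
    return max(model, key=lambda category: cosine(vec, model[category]))


training_set = [
    (["legend of sword and fairy episode 1",
      "sword and fairy tv drama cast"], "tv"),
    (["legend of sword and fairy download",
      "sword fairy game walkthrough"], "game"),
]
model = train(training_set)
predicted = classify(model, ["fairy game download"])
```

In this toy data the unlabeled page shares "fairy", "game", and "download" with the "game" training page but only "fairy" with the "tv" page, so `predicted` comes out as `"game"`. A real system would replace both the weighting formula and the classifier with whatever the full description specifies.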

Description

Technical field

[0001] The invention relates to the technical field of the Internet, in particular to a webpage training method and device, and a search intent recognition method and device.

Background

[0002] With the development of Internet technology, people can use search engines to retrieve the information they need over the Internet. For example, when a user enters "Legend of Sword and Fairy" in a search engine, the user's intention is likely to be to search for the TV drama or the game. The search engine needs to determine the user's search intention first, so that the returned search results are closer to the content the user needs. Intent recognition means determining, for any given query string, the category to which that query string belongs.

[0003] Existing search intent recognition methods often use manual labeling methods to mark webpage categories. When performing intent recognition, manually labeled webpage categories need to be used for id...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06K9/62, G06N20/00
CPC: G06F16/951, G06N20/00, G06F18/24, G06F16/2457, G06F16/9535, G06N7/01, G06F16/24578, G06N5/04
Inventor: 王忠存
Owner: TENCENT TECH (SHENZHEN) CO LTD