Understanding method and device for queries, equipment and storage medium

A search sequence and word vector technology, applied in the field of information processing. It addresses problems such as model performance being constrained by limited labeled data, new queries that cannot be parsed, and labeled corpora that cannot be transferred across fields. It optimizes model training, improves model capability and generalization ability, and improves the query understanding effect.

Active Publication Date: 2018-03-23
BEIJING BAIDU NETCOM SCI & TECH CO LTD
5 Cites, 19 Cited by

AI Technical Summary

Problems solved by technology

[0004] However, the existing technology has the following problems:
1) The cost of labeling data is high. Developers need to label a large amount of data for model training in order to achieve the desired query understanding effect, and when the amount of labeled data is relatively small, the effect of the model is limited.
2) The generalization ability of the query understanding model is weak. If a new query is literally completely different from the queries in the training set, it may not be parsed.
3) In addition to the labeled corpus, developers generally have a large unlabeled corpus that implicitly contains domain knowledge and common grammatical structures, but existing technologies cannot make use of it. Nor can current technology transfer annotated corpora from other fields to optimize the query understanding effect in a new field.


Examples

Embodiment 1

[0033] Figure 1 is a flowchart of a method for understanding a search sequence in Embodiment 1 of the present invention. This embodiment is applicable to understanding search sequences in a specific field, and the method can be executed by a search sequence understanding device. Specifically, it includes the following steps:

[0034] Step 110: determine the word vector of each word included in the annotated search sequence.

[0035] In this embodiment, an annotated search sequence is a manually annotated search sequence that carries its annotation results. Specifically, for a specific field, the field annotation of the search sequence may be the name of the field, such as the movie field, the transportation field, and the like.

[0036] Here, a word vector can represent a word as a very long vector using one-hot encoding: its dimension is the size of the vocabulary, most of its elements are 0, and there is only one dimension val...
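A minimal sketch of the one-hot representation described in paragraph [0036], assuming a whitespace-tokenized vocabulary. The helper names `build_vocab` and `one_hot` and the sample queries are hypothetical and are not taken from the patent text.

```python
def build_vocab(sentences):
    """Map each distinct word to an index, in order of first appearance."""
    vocab = {}
    for sentence in sentences:
        for word in sentence.split():
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab


def one_hot(word, vocab):
    """Return a vector as long as the vocabulary with a single 1 at the word's index."""
    vec = [0] * len(vocab)
    vec[vocab[word]] = 1
    return vec


# Hypothetical annotated queries, used only to build a tiny vocabulary.
queries = ["flights to beijing", "movies in beijing"]
vocab = build_vocab(queries)
vector = one_hot("beijing", vocab)
```

With a realistic vocabulary the vector has tens of thousands of dimensions, which is why such a representation is described as "very long" and mostly zero.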

Embodiment 2

[0046] Figure 2 is a flowchart of a search sequence understanding method in Embodiment 2 of the present invention. This embodiment further optimizes the search sequence understanding method on the basis of the above embodiments. Correspondingly, as shown in Figure 2, the method of this embodiment specifically includes:

[0047] Step 210: determine the word vector of each word included in the annotated search sequence.

[0048] Step 220: acquire each URL site name, together with the clicked search sequences and non-clicked search sequences of each URL site name.

[0049] Here, the URL site name is the combination of the server name and the domain name in the URL. For example, if the URL is http://flights.ctrip.com/fuzzy/#ctm_ref=ctr_nav_flt_fz_pgs, then the server name of this URL is flights, the domain name is ctrip.com, and the site name is flights.ctrip.com. Alternatively, the page title of flights.ctrip.com can be used as the site name.
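The decomposition in paragraph [0049] can be sketched with Python's standard `urllib.parse` module. The function name `site_name_parts` is hypothetical, and treating the last two host labels as the domain name is a simplifying assumption that holds for the ctrip.com example but not for hosts under multi-part suffixes such as `.co.uk`.

```python
from urllib.parse import urlsplit


def site_name_parts(url):
    """Split a URL's host into (server name, domain name, site name)."""
    host = urlsplit(url).hostname or ""
    labels = host.split(".")
    # Assumption: the registered domain is the last two labels of the host.
    domain = ".".join(labels[-2:])
    server = ".".join(labels[:-2])
    return server, domain, host


server, domain, site = site_name_parts(
    "http://flights.ctrip.com/fuzzy/#ctm_ref=ctr_nav_flt_fz_pgs"
)
```

Here `server` is `flights`, `domain` is `ctrip.com`, and `site` is `flights.ctrip.com`, matching the example in the paragraph.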

[0050] Specifically, all ...

Embodiment 3

[0066] Figure 3 is a flowchart of a search sequence understanding method in Embodiment 3 of the present invention. On the basis of the foregoing embodiments, this embodiment specifically describes how the models for domain recognition, intention recognition, and slot recognition are determined in the search sequence understanding method. Correspondingly, the method in this embodiment specifically includes:

[0067] Step 310: determine the word vector of each word included in the annotated search sequence.

[0068] Step 320: use the hidden layer parameters, convolutional layer parameters, and pooling layer parameters of the search sequence CNN model, obtained in advance by training on each URL site name and the clicked and non-clicked search sequences of each URL site name, as the hidden layer parameters, convolutional layer parameters, and pooling layer parameters of the initial domain recognition model.
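The parameter reuse in Step 320 can be illustrated with a small sketch. Everything here is hypothetical: the dictionary layout, the layer keys, the `site_output` head, and the function `init_domain_model` are assumptions made for illustration, not the patent's data structures.

```python
import copy
import random

# Layers whose parameters are carried over from the pretrained search sequence CNN.
SHARED_LAYERS = ("hidden", "conv", "pool")


def init_domain_model(pretrained, num_features, num_domains, seed=0):
    """Copy the shared layers and attach a freshly initialized fully connected layer."""
    rng = random.Random(seed)
    model = {name: copy.deepcopy(pretrained[name]) for name in SHARED_LAYERS}
    # The fully connected layer is new: one weight row per domain label.
    model["fc"] = [
        [rng.uniform(-0.1, 0.1) for _ in range(num_features)]
        for _ in range(num_domains)
    ]
    return model


# Toy stand-in for parameters learned from site names and click data.
pretrained_cnn = {
    "hidden": [[0.1, 0.2], [0.3, 0.4]],
    "conv": [[0.5, -0.5]],
    "pool": {"window": 2},
    "site_output": [[0.9, 0.1]],  # assumed task-specific head, not transferred
}
domain_model = init_domain_model(pretrained_cnn, num_features=2, num_domains=3)
```

Deep-copying the shared layers keeps the pretrained model unchanged, so the same parameters could also seed other models built on the same CNN.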

[0069] Step 321: According to the domain annotation...

Abstract

The embodiment of the invention discloses a query understanding method, apparatus, device, and storage medium. The method comprises the following steps: word vectors of all words contained in annotated queries are determined; the hidden layer parameters, convolutional layer parameters, and pooling layer parameters of a query CNN model, obtained in advance by training on all URL site names and the clicked and non-clicked queries of each URL site name, are used as the hidden layer parameters, convolutional layer parameters, and pooling layer parameters of an initial domain recognition model; and, according to the domain annotations of the annotated queries and the word vectors of all words contained in the annotated queries, the initial domain recognition model is trained to determine its fully connected layer parameters, yielding a domain recognition model. With this scheme, model capability and generalization ability with a small quantity of samples can be improved, model training is optimized, and the query understanding effect is improved.
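As a rough sketch of the final training step, the code below fits only a fully connected softmax layer on fixed feature vectors. It assumes the transferred layers stay frozen and have already turned each query into a feature vector, which is one possible reading of the abstract rather than something it states. The function names, the toy features, and the learning rate are all hypothetical.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(fc, features):
    """Apply the fully connected layer, then softmax over domains."""
    scores = [sum(w * x for w, x in zip(row, features)) for row in fc]
    return softmax(scores)


def train_fc(fc, examples, lr=0.5, epochs=200):
    """Update fc in place with SGD on cross-entropy; examples are (features, domain index)."""
    for _ in range(epochs):
        for features, label in examples:
            probs = predict(fc, features)
            for k, row in enumerate(fc):
                grad = probs[k] - (1.0 if k == label else 0.0)
                for j, x in enumerate(features):
                    row[j] -= lr * grad * x
    return fc


# Hypothetical fixed features standing in for the output of the transferred layers.
examples = [([1.0, 0.0], 0), ([0.9, 0.1], 0), ([0.0, 1.0], 1), ([0.1, 0.9], 1)]
fc = [[0.0, 0.0], [0.0, 0.0]]
train_fc(fc, examples)
predictions = [
    max(range(len(fc)), key=lambda k: predict(fc, f)[k]) for f, _ in examples
]
```

Because only the small fully connected layer is fitted, a handful of annotated queries per domain can be enough to separate the toy classes, which mirrors the small-sample motivation given in the abstract.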

Description

Technical field

[0001] The embodiments of the present invention relate to the technical field of information processing, and in particular to a search sequence understanding method, apparatus, device, and storage medium.

Background

[0002] With the rapid development of artificial intelligence (AI) technology, more and more products and applications, such as smart customer service, smart assistants, car navigation, and smart home systems, are trying to introduce conversational human-computer interaction. In practice, however, developing a dialogue system is a very difficult task for most developers, and one of the main technical difficulties is the understanding of search sequences (queries). The core task of query understanding is to transform natural language into a formal language that machines can process, and to establish connections between natural language and resources and services.

[0003] Query understanding can be broken down into three tasks, namely...

Application Information

IPC(8): G06F17/30, G06N3/04, G06N3/08
CPC: G06F16/3329, G06F16/334, G06F16/9566, G06N3/084, G06N3/045
Inventors: 王硕寰, 孙宇, 于佃海
Owner: BEIJING BAIDU NETCOM SCI & TECH CO LTD