Method and device for mining bad examples of search engine
A search engine and confidence technology, applied in special data processing applications, instruments, electrical digital data processing, etc. It addresses problems such as low efficiency and the failure to detect badcases in a timely and accurate manner, and improves both efficiency and accuracy.
Active Publication Date: 2018-07-10
BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD
4 Cites, 0 Cited by
AI Technical Summary
Problems solved by technology
The existing approach is inefficient: it finds only the small number of badcases that happen to be encountered and cannot find badcases in a timely and accurate manner, so the badcases it does find are difficult to use as a decision-making reference for search engine improvement.
Examples
Embodiment 1
[0051] Figure 1 is a flow chart of the search engine badcase mining method provided by Embodiment 1 of the present invention. As shown in Figure 1, the method may include the following steps:
[0052] Step 101: extract a certain number of sessions from the session log as samples, and extract feature vectors describing search quality from each session of the samples.
[0053] A session is the period during which a user communicates with an interactive system. It usually refers to the time that elapses between entering the system and exiting it, and may also include a certain amount of operating space. In the embodiment of the present invention, a session in the session log contains the behavior information of a user using the search engine.
[0054] The session logs of search engines are massive and may reach terabyte scale (1 TB = 1024 GB) per day, so in this step only a certain number of sessions need to be extracted as samples, for example, 600 sessi...
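Drawing a fixed number of samples from a log too large to hold in memory can be sketched as follows. This is only an illustration: the text does not say how the samples are chosen, and reservoir sampling is a swapped-in technique assumed here. The function name `sample_sessions`, the sample size, and the stand-in session records are all hypothetical.

```python
import random


def sample_sessions(session_iter, k, seed=None):
    """Return up to k sessions chosen uniformly at random from an iterable.

    Reservoir sampling (Algorithm R): makes a single pass and keeps only
    k items in memory. This is an assumed technique, not one named in
    the patent text.
    """
    rng = random.Random(seed)
    reservoir = []
    for i, session in enumerate(session_iter):
        if i < k:
            # Fill the reservoir with the first k sessions.
            reservoir.append(session)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # both ends inclusive
            if j < k:
                reservoir[j] = session
    return reservoir


# Hypothetical usage: integers stand in for parsed session records.
samples = sample_sessions(range(100000), k=5, seed=0)
```

Because the log is read once as a stream, the same function works whether the input is a list or a generator over log files.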
Embodiment 2
[0086] Figure 2 shows the search engine badcase mining device provided by Embodiment 2 of the present invention. As shown in Figure 2, the device includes a preprocessing unit 200 and a mining unit 210. The preprocessing unit 200 specifically includes a sample feature extraction module 201, a sample clustering module 202, and a confidence determination module 203. The mining unit 210 specifically includes a query feature extraction module 211, a query category determination module 212, and a badcase discrimination module 213.
[0087] The sample feature extraction module 201 extracts a certain number of sessions from the session logs as samples, and extracts feature vectors describing search quality from each session of the samples.
[0088] The sample clustering module 202 uses the feature vectors of each session to cluster the samples.
[0089] The confidence determination module 203 determines the confidence of each category obtained by clustering by the sample clust...
Abstract
The invention provides a method and a device for mining bad examples (badcases) of a search engine. The method comprises a preprocessing procedure and a mining procedure. The preprocessing procedure: extracting a certain number of sessions from a session log as samples, and extracting a feature vector describing search quality from each session of the samples; clustering the samples using the feature vector of each session; and determining a confidence for each category obtained by the clustering, wherein the confidence indicates how low the search quality is. The mining procedure: determining the action sequence belonging to the same query in a session log to be mined, and extracting a feature vector describing search quality from the action sequence; determining the category of the query by computing the distance between the feature vector of the query and the feature vector of each category; and, if the confidence of the query's category exceeds a preset high threshold, determining that the search engine has a badcase for the query. With this method and device, badcases of a search engine can be mined automatically, so that they can be found in a timely and accurate manner.
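The mining procedure in the abstract can be sketched as a nearest-category lookup followed by a threshold test. This is a minimal illustration under stated assumptions, not the patented implementation: Euclidean distance and a centroid as each category's feature vector are assumptions (the text says only "distance"), and the category names, feature values, confidences, and threshold below are all hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def centroid(vectors):
    """Mean vector of a cluster, used here as the category's feature vector."""
    n = len(vectors)
    return tuple(sum(column) / n for column in zip(*vectors))


def nearest_category(query_vec, category_vecs):
    """Name of the category whose feature vector is closest to the query."""
    return min(category_vecs, key=lambda name: dist(query_vec, category_vecs[name]))


def is_badcase(query_vec, category_vecs, confidence, high_threshold):
    """Return (flag, category): flag is True when the confidence of the
    query's category exceeds the preset high threshold."""
    category = nearest_category(query_vec, category_vecs)
    return confidence[category] > high_threshold, category


# Hypothetical output of the preprocessing procedure: two clusters of
# two-dimensional feature vectors and an assumed confidence per category.
clusters = {
    "A": [(0.9, 0.1), (0.8, 0.2)],
    "B": [(0.1, 0.9), (0.2, 0.8)],
}
category_vecs = {name: centroid(vectors) for name, vectors in clusters.items()}
confidence = {"A": 0.10, "B": 0.85}

flag, category = is_badcase((0.15, 0.95), category_vecs, confidence, high_threshold=0.8)
```

Here the query vector lies closest to category "B", whose assumed confidence of 0.85 exceeds the threshold of 0.8, so the query is flagged as a badcase.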
Description
【Technical field】
[0001] The invention relates to the technical field of computer applications, in particular to a method and device for mining badcases of search engines.
【Background art】
[0002] With the continuous development of computer technology, the network has become the main channel through which people obtain information. A search engine analyzes the user's query to understand the user's needs and intentions, and searches the entire network for the webpage that best matches the query. However, the number of web pages on the Internet is vast, their content varies greatly, and users express their needs in diverse ways. Therefore, the biggest difficulty for a search engine is returning the most relevant search results regardless of the user's query.
[0003] The interior of the search engine is composed of many complex coupled correlation strategies, the number and complexity of which, as well as the mutu...