A method for expressing structured query information based on tagged search semantic roles

A structured query and semantic role labeling technology in the field of structured information retrieval. It addresses problems such as non-standard query text, the difficulty of classifying semantic units, and the lack of a theoretical basis. Its effects are an improved search experience and product conversion rate, structured and optimized search, and broad application prospects.

Active Publication Date: 2022-03-22
上海欣兆阳信息科技有限公司 (Shanghai Xinzhaoyang Information Technology Co., Ltd.)
4 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0007] First, prior-art text search and database tools perform a simple full-text string comparison between the query and the text. The returned results often fail to meet the user's needs, and the vocabulary-mismatch problem of information retrieval arises. In addition, the query words a user enters have a focus that implicitly reflects the user's needs and personal preferences. Most retrieval models only compute simple string statistics over the text and ignore the internal structure of the language, so the relevance of the results returned by the search engine deviates considerably from what the user wants; this deviation is especially pronounced in vertical search engines such as e-commerce search;
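To make the vocabulary-mismatch problem concrete, the following minimal sketch (not taken from the patent) implements the kind of naive full-text matching described above: a query matches only if every query token appears verbatim in the text. The function name, product listing, and queries are hypothetical.

```python
def full_text_match(query: str, document: str) -> bool:
    """Naive full-text matching: every query token must occur verbatim in the document."""
    doc_tokens = set(document.lower().split())
    return all(token in doc_tokens for token in query.lower().split())


# Hypothetical product listing and two queries expressing the same need.
listing = "Apple iPhone 13 smartphone 128GB blue"
print(full_text_match("iphone 13 smartphone", listing))  # True
print(full_text_match("iphone 13 cellphone", listing))   # False: the synonym is not matched
```

The second query fails only because the user chose a synonym, which is exactly the mismatch the paragraph describes.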
[0008] Second, most existing technologies rely on the bag of words, a simple representation of text. In these retrieval models a text is treated as an unordered collection of words, so its overall syntax and context are not reflected, and the expressive power of the bag of words is very limited. The existing technology cannot break free of the bag-of-words model to analyze the internal structure of the text and establish a conventional retrieval model that handles both structured and unstructured data, so it cannot accurately identify the user's search intent. The current search experience does not meet users' demand for information, dampens their enthusiasm for searching and their overall stickiness, and hinders the healthy development of the website platform;
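A short illustration (an assumed example, not from the patent) of what the bag-of-words representation throws away: two flight queries with origin and destination swapped map to exactly the same bag, even though they express opposite needs.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Bag-of-words representation: word counts with all ordering discarded."""
    return Counter(text.lower().split())


q1 = "flights from shanghai to beijing"
q2 = "flights from beijing to shanghai"
print(bag_of_words(q1) == bag_of_words(q2))  # True, although the intents differ
```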
[0009] Third, the existing technology has the following difficulties and shortcomings in identifying query core words. First, query text is short, so this is entity recognition at the sentence level, whereas traditional named entity recognition focuses on analysis at the document (chapter) level; existing text-analysis techniques (such as lexical and syntactic analysis) are therefore not effective at identifying the core words of a query. Second, the structure of query text is not rigorous: it contains many irregular expressions, which makes data generalization and normalization difficult. Third, conventional named entity recognition identifies specific entities in specific texts, whereas a query generally contains only one core word that reflects the user's search purpose, and finding it requires deep mining of the word's context. Fourth, named entity recognition only recognizes entities in the text, whereas core-word identification must find the key component of the query that best reflects the user's search intent and assign it to a specific category;
[0010] Fourth, the existing technology has the following difficulties and shortcomings in extracting structured query information. First, query text is not standardized, so semantic role labeling requires a great deal of generalization and normalization work. Second, the structure of query text is not rigorous: existing search engines all retrieve with the bag-of-words model, which leads users to enter piles of keywords, so many queries do not follow syntactic rules or are not even sentences, and existing text-analysis techniques are not fully applicable. Third, because textual expression is so diverse, many examples exhibit polysemy (one word with several meanings) or synonymy (one meaning expressed by several words), which makes classifying semantic units very difficult;
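Semantic role labeling of a query is commonly framed as sequence labeling with BIO tags (`B-` begins a role, `I-` continues it, `O` is outside any role). Under that assumption, the sketch below groups tagged tokens into a structured query. The function name and the role names (COLOR, BRAND, CATEGORY, PRICE_MAX) are hypothetical; the patent's own role inventory is not given in this excerpt.

```python
from typing import Dict, List, Optional


def tags_to_structured_query(tokens: List[str], tags: List[str]) -> Dict[str, List[str]]:
    """Group BIO-tagged query tokens into a mapping role -> list of phrases."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    roles: Dict[str, List[str]] = {}
    role: Optional[str] = None
    words: List[str] = []
    # A trailing sentinel "O" tag flushes the last open span.
    for token, tag in list(zip(tokens, tags)) + [("", "O")]:
        prefix, _, label = tag.partition("-")
        continues = prefix == "I" and label == role
        if not continues and role is not None:
            roles.setdefault(role, []).append(" ".join(words))
            role, words = None, []
        if continues:
            words.append(token)
        elif prefix in ("B", "I"):  # a stray I- tag is treated as the start of a new span
            role, words = label, [token]
    return roles


tokens = ["red", "nike", "running", "shoes", "under", "500"]
tags = ["B-COLOR", "B-BRAND", "B-CATEGORY", "I-CATEGORY", "O", "B-PRICE_MAX"]
print(tags_to_structured_query(tokens, tags))
# {'COLOR': ['red'], 'BRAND': ['nike'], 'CATEGORY': ['running shoes'], 'PRICE_MAX': ['500']}
```

Roles are collected as lists, so a query that names the same role twice (for example, two colors) keeps both values.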
[0011] Fifth, among the prior-art semi-supervised or unsupervised learning methods applied to natural language processing, the most commonly used semi-supervised methods are self-learning (self-training) methods. These methods summarize practical experience but lack a theoretical basis and fail to achieve good results on many problems. Moreover, the state sequence of the unlabeled data remains unknown, and a small amount of labeled data cannot cover all the latent patterns in the unlabeled data.
Because a large amount of manually labeled data is unavailable, existing conditional random field models based on supervised learning cannot solve the problem of labeling search semantic roles well.
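The self-training loop criticized in [0011] can be sketched generically: train on the labeled data, pseudo-label only the unlabeled points the model is confident about, add them, and retrain until nothing new passes the threshold. To stay short and dependency-free, this sketch uses a toy one-dimensional nearest-centroid classifier with a distance-margin confidence score instead of a sequence model; the classifier, function names, threshold, and data are all illustrative assumptions.

```python
from typing import Dict, List, Sequence, Tuple

Example = Tuple[float, str]  # (feature value, label)


def fit_centroids(data: Sequence[Example]) -> Dict[str, float]:
    """Train a nearest-centroid classifier: the mean feature value of each label."""
    sums: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for x, y in data:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict_with_margin(centroids: Dict[str, float], x: float) -> Tuple[str, float]:
    """Return the nearest label and its margin over the runner-up (larger = more confident)."""
    ranked = sorted(centroids, key=lambda y: abs(x - centroids[y]))
    best = ranked[0]
    if len(ranked) == 1:
        return best, float("inf")
    return best, abs(x - centroids[ranked[1]]) - abs(x - centroids[best])


def self_train(labeled: Sequence[Example], unlabeled: Sequence[float],
               threshold: float, max_rounds: int = 10):
    """Self-training: repeatedly pseudo-label confident points and retrain."""
    train = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        model = fit_centroids(train)
        confident: List[Example] = []
        uncertain: List[float] = []
        for x in pool:
            label, margin = predict_with_margin(model, x)
            if margin >= threshold:
                confident.append((x, label))
            else:
                uncertain.append(x)
        if not confident:  # nothing passes the threshold: stop
            break
        train.extend(confident)
        pool = uncertain
    return fit_centroids(train), train, pool


model, train, leftover = self_train([(1.0, "A"), (9.0, "B")], [2.0, 3.0, 8.0, 5.2], threshold=2.0)
print(model)     # {'A': 2.0, 'B': 8.5}
print(leftover)  # [5.2]
```

The ambiguous point 5.2 is never labeled, and the final model is shaped entirely by pseudo-labels it assigned itself, which illustrates both the heuristic character and the coverage limit described in [0011].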

Embodiment Construction

[0086] The technical solution of the structured query information expression method for labeling search semantic roles provided by the present invention will be further described below in conjunction with the accompanying drawings, so that those skilled in the art can better understand and implement the present invention.

[0087] In Internet search engines, the queries users enter often target structured data, such as product searches on e-commerce platforms, flights, and movie showtimes. However, because these queries are expressed as natural-language text, returning relevant results from structured data is very difficult. If structured query information can be extracted from the user's query, so that the natural-language text is represented as structured data, the user's search intent can be analyzed more accurately and search satisfaction improved. The present invention is based on the latent seman...
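As an illustration of why the structured representation helps, the sketch below assumes that a query such as "red nike running shoes under 500" has already been converted into field/value constraints; retrieval over structured product records then becomes an exact filter rather than string matching. The record schema, field names, function name, and data are hypothetical.

```python
from typing import Any, Dict, List, Optional

Record = Dict[str, Any]

products: List[Record] = [
    {"brand": "nike", "color": "red", "category": "running shoes", "price": 459},
    {"brand": "nike", "color": "red", "category": "running shoes", "price": 599},
    {"brand": "adidas", "color": "red", "category": "running shoes", "price": 399},
    {"brand": "nike", "color": "red", "category": "basketball shoes", "price": 429},
]


def structured_search(records: List[Record], filters: Dict[str, Any],
                      price_max: Optional[float] = None) -> List[Record]:
    """Return records whose fields equal every filter value and whose price is within the cap."""
    hits = [r for r in records if all(r.get(field) == value for field, value in filters.items())]
    if price_max is not None:
        hits = [r for r in hits if r["price"] <= price_max]
    return hits


# Structured form of the hypothetical query "red nike running shoes under 500".
query = {"brand": "nike", "color": "red", "category": "running shoes"}
print(structured_search(products, query, price_max=500))
# [{'brand': 'nike', 'color': 'red', 'category': 'running shoes', 'price': 459}]
```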

Abstract

The structured query information expression method for labeling search semantic roles of the present invention treats the search query entered by the user as a word sequence and builds a model on that sequence to analyze user behavior. Drawing on computer science, cognitive science, and psychology, it analyzes the real search intent behind user behavior by modeling the input sequence. It proposes a method for representing the natural-language text entered by the user as structured query information; this is a successful practice in structured query information extraction and structure prediction and can be extended to other fields such as natural language processing and data mining. Based on semi-supervised learning, it combines machine learning with human expertise, reducing the cost of manually labeling the large number of samples that supervised methods require, and gives a reasonable explanation of the result set. It helps search engines analyze users' search intent and improves the search experience and product conversion rate.
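The abstract treats the query as a word sequence and models that sequence. One standard way to decode such a sequence model is the Viterbi algorithm; the sketch below applies it to a first-order hidden Markov model with hand-set probabilities. The HMM, tag set, function names, and numbers are illustrative assumptions, not the patent's model (the same dynamic program also decodes linear-chain CRFs, with scores in place of log-probabilities). It shows how context resolves an ambiguous word: "apple" alone is equally likely to be a brand or a product, and the following word "phone" tips it toward BRAND.

```python
import math
from typing import Dict, List

Table = Dict[str, Dict[str, float]]


def log(p: float) -> float:
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(tokens: List[str], tags: List[str], start: Dict[str, float],
            trans: Table, emit: Table, unk: float = 1e-6) -> List[str]:
    """Most probable tag sequence under a first-order HMM, computed in log space."""
    # best[t][tag]: best log-probability of any tag path ending in `tag` at position t
    best = [{tag: log(start.get(tag, 0.0)) + log(emit[tag].get(tokens[0], unk)) for tag in tags}]
    back: List[Dict[str, str]] = [{}]
    for t in range(1, len(tokens)):
        best.append({})
        back.append({})
        for tag in tags:
            prev = max(tags, key=lambda p: best[t - 1][p] + log(trans[p].get(tag, 0.0)))
            best[t][tag] = (best[t - 1][prev] + log(trans[prev].get(tag, 0.0))
                            + log(emit[tag].get(tokens[t], unk)))
            back[t][tag] = prev
    # Follow back-pointers from the best final tag.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for t in range(len(tokens) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


tags = ["BRAND", "PRODUCT"]
start = {"BRAND": 0.5, "PRODUCT": 0.5}
trans = {"BRAND": {"BRAND": 0.1, "PRODUCT": 0.9}, "PRODUCT": {"BRAND": 0.5, "PRODUCT": 0.5}}
emit = {"BRAND": {"apple": 0.5}, "PRODUCT": {"apple": 0.5, "phone": 0.5}}
print(viterbi(["apple", "phone"], tags, start, trans, emit))  # ['BRAND', 'PRODUCT']
```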

Description

Technical field
[0001] The invention relates to a method for expressing structured query information, in particular to a method for expressing structured query information that labels search semantic roles, and belongs to the technical field of structured information retrieval.
Background art
[0002] Information retrieval analyzes and models the process by which people look for information and designs computer algorithms that execute queries automatically to obtain the information users need. One of the key issues in information retrieval is relevance, that is, whether the results returned by the search engine match the user's real search needs. In e-commerce and other fields, relevance is also directly tied to product conversion rates. Achieving this kind of relevance usually requires a deeper analysis of the user's search intent, so when designing an algorithm for compar...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/242; G06F40/30; G06N20/00
Inventor: 王程 (Wang Cheng)
Owner: 上海欣兆阳信息科技有限公司 (Shanghai Xinzhaoyang Information Technology Co., Ltd.)