Splitting method for search string of Chinese vertical search

A retrieval string splitting technique for Chinese vertical search, applied in the computer field. It addresses the poor ambiguity handling of dictionary methods, the high labor cost of supervised methods, and the increased phrase boundary errors of unsupervised methods, so that segmentation accuracy is ensured, labor costs are saved, and the impact of noise is reduced.

Status: Inactive. Publication date: 2014-01-29
北京中搜云商网络技术有限公司
Cites: 6. Cited by: 23

AI Technical Summary

Problems solved by technology

[0006] The entity-dictionary method recognizes phrases by looking up dictionary entries directly while segmenting the retrieval string, without using any context information, so it handles ambiguity poorly.
In addition, to ensure dictionary quality, the dictionary is usually built and updated manually or semi-manually, so updates are slow, which degrades the segmentation result.
[0007] Supervised learning methods require a sufficiently large set of manually labeled data. Because language patterns differ across domains, a separate labeled data set usually has to be built for each vertical search, which incurs large labor costs.
[0008] Unsupervised learning methods use raw, unsegmented data as the training set, so the resulting phrase structure model easily picks up noise, which increases phrase boundary errors and lowers segmentation accuracy.
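The ambiguity problem of pure dictionary lookup can be illustrated with a minimal sketch. It assumes a greedy forward maximum matching strategy and a toy dictionary; both are illustrative choices, not something the patent text specifies.

```python
def forward_max_match(query, dictionary, max_len=4):
    """Greedy longest-match segmentation that uses no context information.

    `dictionary` and `max_len` are hypothetical inputs for illustration.
    """
    result = []
    i = 0
    while i < len(query):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(query) - i), 0, -1):
            piece = query[i:i + size]
            if size == 1 or piece in dictionary:
                result.append(piece)
                i += size
                break
    return result


# Toy dictionary (assumption): several overlapping entries.
toy_dictionary = {"北京", "北京大学", "大学", "大学生", "学生", "宿舍"}

# The greedy match commits to "北京大学" and strands "生",
# even though "北京 / 大学生 / 宿舍" is the more plausible reading.
print(forward_max_match("北京大学生宿舍", toy_dictionary))
```

Because the lookup never considers neighboring characters, the first long entry that matches wins, which is the weakness paragraph [0006] describes.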

Examples

Embodiment Construction

[0039] The specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings.

[0040] Figure 1 is the basic system framework diagram of the retrieval string splitting method for Chinese vertical search. The method uses a hybrid of an entity dictionary and unsupervised learning to identify phrases in Chinese vertical search retrieval strings. A user retrieval language model is built from user query logs by unsupervised learning. When a user inputs a retrieval string, it is split according to the entity dictionary and the language model to obtain the split result.

[0041] Figure 2 is a flow chart of the retrieval string splitting method for Chinese vertical search. The method includes building entity dictionaries and language ...
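As a rough illustration of building a language model from query logs without labeled data, the sketch below counts adjacent character pairs and scores them with add-one smoothing. The bigram form, the smoothing, and the names `train_bigram_model` and `log_prob` are assumptions made for this example; the text does not say which model the invention uses.

```python
import math
from collections import Counter


def train_bigram_model(query_log):
    """Count adjacent character pairs in raw, unsegmented queries."""
    first_counts = Counter()  # how often a character starts a pair
    pair_counts = Counter()   # how often two characters are adjacent
    vocab = set()
    for query in query_log:
        chars = list(query.strip())
        vocab.update(chars)
        for a, b in zip(chars, chars[1:]):
            first_counts[a] += 1
            pair_counts[(a, b)] += 1
    return first_counts, pair_counts, vocab


def log_prob(a, b, model):
    """Add-one smoothed log P(b | a); higher means a and b cohere more."""
    first_counts, pair_counts, vocab = model
    v = max(len(vocab), 1)
    return math.log((pair_counts[(a, b)] + 1) / (first_counts[a] + v))


# Hypothetical query log for illustration only.
model = train_bigram_model(["北京酒店", "北京烤鸭", "上海酒店"])
print(log_prob("北", "京", model) > log_prob("京", "酒", model))
```

In this toy log, "北京" always appears as a unit, so the pair scores higher than "京酒", which only straddles a phrase boundary. Scores of this kind could serve as one input when weighting candidate phrases.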


Abstract

The invention provides a method for splitting the search string of a Chinese vertical search by combining an entity dictionary with unsupervised learning. The method comprises the steps of: establishing an entity dictionary and a language model; matching entity names in the search string; processing non-Chinese characters in the search string; performing word segmentation on the search string; establishing a weight matrix of candidate phrases; computing the weight of every combination of candidate phrases for the search string; and returning the phrase combination with the maximum weight as the splitting result. The method overcomes the difficulty dictionary methods have with ambiguity, avoids the cost of manual corpus tagging required by supervised learning, and reduces the influence of noise on segmentation boundaries in unsupervised learning.
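The last three steps (weight matrix, combination weights, maximum-weight combination) can be sketched with dynamic programming. This sketch assumes the weight of a combination is the sum of its phrase weights and that phrase weights come from a caller-supplied function; the abstract does not state either detail, and all names and scores below are hypothetical.

```python
def build_weight_matrix(tokens, phrase_weight, max_span=4):
    """weights[i][j] scores the candidate phrase tokens[i:j]; None = not a candidate."""
    n = len(tokens)
    weights = [[None] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(i + 1, min(n, i + max_span) + 1):
            weights[i][j] = phrase_weight(tokens[i:j])
    return weights


def best_split(tokens, weights):
    """Return the phrase combination with maximum total weight (sum assumed)."""
    n = len(tokens)
    best = [float("-inf")] * (n + 1)  # best[j]: best total weight for tokens[:j]
    back = [0] * (n + 1)              # back[j]: start index of the last phrase
    best[0] = 0.0
    for j in range(1, n + 1):
        for i in range(j):
            w = weights[i][j]
            if w is not None and best[i] + w > best[j]:
                best[j] = best[i] + w
                back[j] = i
    phrases = []
    j = n
    while j > 0:  # walk the back-pointers to recover the phrases
        i = back[j]
        phrases.append("".join(tokens[i:j]))
        j = i
    return phrases[::-1], best[n]


# Toy scores (assumption): known phrases score high, unknown spans are penalized.
toy_scores = {"如家酒店": 2.0, "北京西站": 1.5, "酒店北京": 0.2}


def toy_weight(span):
    return toy_scores.get("".join(span), 0.5 if len(span) == 1 else -1.0)


tokens = ["如家", "酒店", "北京", "西站"]
phrases, total = best_split(tokens, build_weight_matrix(tokens, toy_weight))
print(phrases, total)
```

Dynamic programming avoids enumerating every combination explicitly while still selecting the one with the maximum total weight, under the additive assumption stated above.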

Description

Technical field

[0001] The invention relates to a method and device in the computer field, in particular to a method for splitting retrieval strings for Chinese vertical search.

Background

[0002] With the explosive growth of network information, the data sources and data scale of vertical search engines are also growing rapidly. To improve search accuracy and give users a better search experience, the key is to split the search string entered by the user into consecutive phrases. At present, retrieval string splitting is mainly aimed at web page search. There are two main types of splitting methods: methods based on entity dictionaries and methods based on statistical machine learning. Methods based on statistical machine learning can be further divided into supervised learning methods and unsupervised learning methods.

[0003] The method based on the entity dictionary: the entity name dictionary is collected manually or se...

Application Information

IPC(8): G06F17/30, G06F17/27
CPC: G06F16/3344, G06F16/95, G06F40/30
Inventors: 赵毅强, 杨红尘
Owner: 北京中搜云商网络技术有限公司