Method and device for extracting core words

A core word extraction technique based on core word and non-core word matching, applied in the field of core word extraction. It addresses the problem of inaccurate or wrong query results and improves query accuracy by raising the likelihood that the extracted words are true core words.

Active Publication Date: 2015-03-18
ALIBABA (CHINA) CO LTD

AI Technical Summary

Problems solved by technology

In the conventional approach, the query word is segmented, each segment is matched against the POI (point of interest) database, and the most frequent result is returned. This query method has the following technical defect: segmenting a query word yields multiple segments, but some of them are not core words of the query word (a core word is the smallest complete word unit that accurately expresses the meaning of the query word). If the results retrieved from these non-core segments occur most frequently, then returning the most frequent result may not give the user what they actually need, leading to inaccurate or wrong query results.
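The defect described above can be illustrated with a minimal sketch. It assumes a toy POI list, substring matching, and a hand-segmented query; `naive_query`, `POI_NAMES`, and `SEGMENTS` are hypothetical names chosen for illustration and do not come from the patent.

```python
from collections import Counter

def naive_query(segments, poi_names):
    """Baseline: return the POI name hit by the most segments, or None."""
    hits = Counter()
    for seg in segments:
        for name in poi_names:
            # Assumption: a segment "matches" a POI if it is a substring of its name.
            if seg in name.lower():
                hits[name] += 1
    return hits.most_common(1)[0][0] if hits else None

# Hypothetical data: the user means the West Lake, but two non-core
# segments ("ticket", "office") both hit an unrelated POI.
POI_NAMES = ["West Lake Scenic Area", "Railway Ticket Office", "Post Office"]
SEGMENTS = ["west lake", "ticket", "office"]

result = naive_query(SEGMENTS, POI_NAMES)
print(result)
```

In this toy case the non-core segments outvote the one segment that carries the meaning of the query, which is the failure mode the invention targets.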



Embodiment Construction

[0023] In order to make the above objects, features and advantages of the present invention more comprehensible, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings and specific embodiments.

[0024] Refer to Figure 1, which is a flowchart of a method for extracting core words provided by an embodiment of the present invention. The method can be applied to any scenario in which query words are input for a query, such as map search and nearby search. A core word thesaurus, which stores known core words, and a non-core word thesaurus, which stores known non-core words, can be configured in advance. The method includes:

[0025] S110. Segment the query word using a preset word segmentation method to obtain the segments that make up the query word;

[0026] Wherein, the preset word segmentation methods may include basic word segmentation methods, mixed word segmentation me...
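As a rough illustration of step S110, the sketch below uses forward maximum matching against a small vocabulary. This is a swapped-in, generic segmentation technique: the source text is truncated before it details the preset methods, so the algorithm, the function name `forward_max_match`, the `max_len` parameter, and the sample vocabulary are all assumptions.

```python
def forward_max_match(query, vocabulary, max_len=5):
    """Greedy left-to-right segmentation: at each position take the longest
    vocabulary entry (up to max_len characters); fall back to one character."""
    segments = []
    i = 0
    while i < len(query):
        # Try the longest candidate first and shrink until something matches.
        for size in range(min(max_len, len(query) - i), 0, -1):
            piece = query[i:i + size]
            if size == 1 or piece in vocabulary:
                segments.append(piece)
                i += size
                break
    return segments

# Hypothetical vocabulary and an unspaced query string.
VOCAB = {"west", "lake", "hotel"}
print(forward_max_match("westlakehotel", VOCAB))
```

Characters that are not covered by the vocabulary come out as single-character segments, which is one way short unknown segments can arise before the matching step.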


Abstract

An embodiment of the invention discloses a method and a device for extracting core words. Accurate core words can be extracted from the query words input by users, which improves query accuracy. The method includes: segmenting the query word using a preset word segmentation method to obtain the segments that make up the query word; matching each segment of the query word against the phrases in a core word bank and in a non-core word bank; if some segments match the core word bank and/or the non-core word bank while unknown segments also exist among the segments of the query word, taking the segments that match the core word bank as core words of the query word; and taking as core words of the query word the unknown segments that meet a preset core word length standard, or the segments obtained by splicing unknown segments. An unknown segment is a segment that matches no phrase in either the core word bank or the non-core word bank.
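The matching and splicing steps in the abstract can be sketched as follows. This is an illustrative reading, not the patented implementation: it assumes the length standard is a minimum character count (`min_len`), that only adjacent short unknown segments are spliced, and that a spliced segment must also meet the length standard. The function name and sample word banks are hypothetical.

```python
def extract_core_words(segments, core_bank, non_core_bank, min_len=3):
    """Return core words: segments found in core_bank, plus unknown segments
    that are long enough alone or after splicing with adjacent unknowns."""
    core_words = []
    pending = []  # adjacent unknown segments that are too short on their own

    def flush():
        # Assumption: splicing is plain concatenation of the pending run.
        spliced = "".join(pending)
        if pending and len(spliced) >= min_len:
            core_words.append(spliced)
        pending.clear()

    for seg in segments:
        if seg in core_bank:            # known core word
            flush()
            core_words.append(seg)
        elif seg in non_core_bank:      # known non-core word: discard
            flush()
        elif len(seg) >= min_len:       # unknown, meets the length standard
            flush()
            core_words.append(seg)
        else:                           # unknown and short: candidate for splicing
            pending.append(seg)
    flush()
    return core_words

# Hypothetical word banks and segments.
CORE = {"hotel"}
NON_CORE = {"near", "the"}
print(extract_core_words(["qi", "an", "hotel", "near", "lakeview"], CORE, NON_CORE))
```

Here "qi" and "an" match neither bank and are too short alone, so they are spliced into "qian"; "near" is dropped as a known non-core word; "lakeview" is unknown but long enough to be kept as is.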

Description

Technical field

[0001] The invention relates to the field of word processing, in particular to a method and device for extracting core words.

Background

[0002] In electronic map query applications, when a POI (point of interest) query is performed according to the query word input by the user, the usual practice is to first segment the query word and then match each segment against the POI database to obtain multiple query results. The result with the highest frequency among them is used as the result of the query. However, this query method has the following technical defect: segmenting a query word yields multiple segments, but some of them are not core words of the query word (a core word is the smallest complete word unit that accurately expresses the meaning of the query word). If the query results obtained based on these non-core word querie...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27, G06F17/30
Inventor: 彭松
Owner: ALIBABA (CHINA) CO LTD