Method and device for acquiring alternative name matched pair

An alias matching-pair technology, applied in special-purpose data processing and electrical digital data processing, that addresses the inability to fully identify alias matching pairs.

Active Publication Date: 2010-06-09
BEIJING SOGOU TECHNOLOGY DEVELOPMENT CO LTD

AI Technical Summary

Problems solved by technology

[0005] In view of this, the object of the present invention is to provide a method and device for obtaining an alias...


Examples

Embodiment 1

[0153] Embodiment 1. The preset rule may be an information extractor rule, and extraction according to this rule may proceed as follows: judge whether the content presented on the webpage contains a preset information extractor; if it does, judge whether the character string inside the information extractor contains a preset keyword; if the character string contains the keyword, determine a character string pair that includes an alias matching pair according to the information extractor and the keyword.

[0154] Wherein, the information extractor may include:

[0155] ( ), [], [], "", "", '', "",

[0156] These symbols usually appear in pairs, with a start mark and an end mark; for example, a left parenthesis marks the beginning and the corresponding right parenthesis marks the end. Statistics show that such symbols usually appear in character strings that contain alias matching pairs. For example, the content presented on the web page contai...
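The information extractor rule of Embodiment 1 can be sketched roughly as follows. This is only an illustration: the delimiter pairs, the keyword list, and the function name `extract_by_delimiter` are assumptions, since the symbol list in [0155] is only partly legible here and the patent does not name any functions. Taking all text before the start mark as the first string is also an assumption; the later truncation step would narrow it down.

```python
import re

# Hypothetical delimiter pairs and keywords (assumed for illustration only).
DELIMITERS = [("(", ")"), ("[", "]"), ("（", "）"), ("“", "”")]
KEYWORDS = ["also known as", "alias", "abbreviation"]


def extract_by_delimiter(text):
    """Return (text_before, enclosed_text) string pairs for every delimited
    span whose enclosed text contains one of the preset keywords."""
    candidates = []
    for open_mark, close_mark in DELIMITERS:
        o, c = re.escape(open_mark), re.escape(close_mark)
        # Match a start mark, any run of non-delimiter characters, then the end mark.
        pattern = o + "([^" + o + c + "]*)" + c
        for match in re.finditer(pattern, text):
            inner = match.group(1)
            if any(keyword in inner.lower() for keyword in KEYWORDS):
                before = text[:match.start()].strip()
                candidates.append((before, inner.strip()))
    return candidates


if __name__ == "__main__":
    sample = "National Stadium (also known as Bird's Nest) hosted the games."
    print(extract_by_delimiter(sample))
```

The result is a character string pair that still contains the keyword and possibly surrounding text, which is why a separate truncation step is needed to isolate the alias matching pair itself.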

Embodiment 2

[0165] Embodiment 2. The preset rule may be an extraction keyword rule, and extraction according to this rule may proceed as follows: judge whether the content presented on the web page contains a preset extraction keyword; if it does, determine the character string pair that includes the alias matching pair according to the position of the extraction keyword and the specific punctuation.

[0166] Wherein, the extraction keywords may include:

[0167] "also known as", "alias", "common name", "abbreviation", etc.

[0168] The specific punctuation may include:

[0169] . ;- / .! *×— - | ′, > _

[0170] For example, the content presented on the acquired webpage is scanned, and keywords such as "short name" and "full name" are found therein and extracted. The part from the starting position of the extracted keyword to the first specific punctuation is taken as string 1, and the content between the extracted keyword and the first specific punctuatio...
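A minimal sketch of the extraction keyword rule of Embodiment 2 follows. The keyword list, the punctuation set, and the function name `extract_by_keyword` are assumptions. Because paragraph [0170] is cut off, the sketch also assumes that string 1 runs from the nearest specific punctuation before the keyword up to the keyword, and that string 2 runs from the keyword to the first specific punctuation after it.

```python
import re

# Hypothetical keyword and punctuation lists (assumed for illustration only).
EXTRACTION_KEYWORDS = ["also known as", "short name", "abbreviation"]
PUNCTUATION = set(".;!/|,>_*")


def extract_by_keyword(text):
    """Return (string_1, string_2) pairs located around each extraction keyword."""
    pairs = []
    for keyword in EXTRACTION_KEYWORDS:
        for match in re.finditer(re.escape(keyword), text, re.IGNORECASE):
            # Scan backwards from the keyword to the nearest punctuation.
            left = match.start()
            while left > 0 and text[left - 1] not in PUNCTUATION:
                left -= 1
            # Scan forwards from the keyword to the first punctuation.
            right = match.end()
            while right < len(text) and text[right] not in PUNCTUATION:
                right += 1
            string_1 = text[left:match.start()].strip()
            string_2 = text[match.end():right].strip()
            if string_1 and string_2:
                pairs.append((string_1, string_2))
    return pairs


if __name__ == "__main__":
    sample = "Visit today. National Stadium also known as Bird's Nest. Tickets available."
    print(extract_by_keyword(sample))
```

Pairs with an empty side are dropped here as a simple guard; the patent text does not say how such cases are handled.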

Embodiment 3

[0205] Embodiment 3. Correction method based on frequency

[0206] Because the truncated alias matching pairs contain many repetitions, the credibility of an alias matching pair can be judged, to a certain extent, by how often it occurs. The frequency-based correction method can therefore be carried out as follows: go through the alias matching pairs and count the number of times each one occurs; judge the credibility of each alias matching pair according to its number of occurrences; and filter out alias matching pairs whose credibility is lower than a preset threshold.
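The counting and filtering steps in [0206] could look like the sketch below. It assumes the raw occurrence count is used directly as the credibility score and that the threshold is an integer count; the patent does not state how credibility is derived from frequency, and the name `filter_by_frequency` is hypothetical.

```python
from collections import Counter


def filter_by_frequency(pairs, threshold=2):
    """Count how often each alias matching pair occurs and keep only the
    pairs whose count reaches the threshold.

    Assumption: the occurrence count itself serves as the credibility score.
    """
    counts = Counter(pairs)
    return {pair: count for pair, count in counts.items() if count >= threshold}


if __name__ == "__main__":
    extracted = [
        ("National Stadium", "Bird's Nest"),
        ("National Stadium", "Bird's Nest"),
        ("National Stadium", "Bird's Nest"),
        ("National Stadium", "hosted the games"),
    ]
    print(filter_by_frequency(extracted, threshold=2))
```

A pair that appears only once, such as a stray phrase picked up by a loose rule, is discarded, while a pair repeated across many pages survives.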

[0207] In practical applications, according to the rules of the user's input habits, it can be found that when the user inputs an alias by using information extractors such as brackets, or only by using keywords to input an alias, different contextual relevance is usually generated. Therefore, for ...

Abstract

The invention discloses a method for acquiring alias matching pairs, comprising the following steps: acquiring the content presented on each webpage on the Internet; extracting, according to preset rules, character string pairs that contain alias matching pairs from the content presented on each webpage; and truncating the character string pairs that contain alias matching pairs to obtain the alias matching pairs. The invention also discloses a device for acquiring alias matching pairs. The invention can identify potential alias matching pairs more comprehensively, and the identified alias matching pairs can then be used to improve the user experience and the utilization rate of data.

Description

Technical field

[0001] The invention relates to the field of network data processing, in particular to a method and a device for obtaining alias matching pairs.

Background

[0002] In daily life, people often use aliases, which include abbreviations, alternative names, former names, and so on. For example, the abbreviation of "Peking University" (北京大学) is "Beida" (北大), an alternative name for "mercury" (汞) is "quicksilver" (水银), and the former name of "Peking University" is "Imperial University of Peking" (京师大学堂). The correspondence between an original name and its alias can be called an alias matching pair. However, current search engines cannot automatically handle the correspondence between the original name and the alias, which wastes a large amount of webpage resources and affects the user experience. For example, "Bird's Nest" is another name for "National Stadium", and some webpages may contain only "National Stadium" but not "Bird's Nest"; such pages will not be included in the...

Application Information

IPC(8): G06F17/30
Inventor: 刘珊瑞张阔
Owner: BEIJING SOGOU TECHNOLOGY DEVELOPMENT CO LTD