Method and device for acquiring alternative name matched pair
A technique for acquiring alias matching pairs, applicable to special data processing applications, instruments, and electrical digital data processing, that addresses the inability to fully identify alias matching pairs.
Examples
Embodiment 1
[0153] Embodiment 1. The preset rule may be an information extractor rule, and the step of extracting according to the rule may be: judging whether the content presented on the webpage contains a preset information extractor; if the information extractor is included, judging whether the character string in the information extractor contains a preset keyword; if the character string contains the keyword, determining a character string pair including an alias matching pair according to the information extractor and the keyword.
[0154] Wherein, the information extractor may include:
[0155] ( ), [], [], "", "", '', "",
[0156] These symbols usually appear in pairs, each with a start mark and an end mark: for example, a left parenthesis marks the beginning and the corresponding right parenthesis marks the end. Statistics show that character strings containing alias matching pairs usually include such symbols. For example, the content presented on the web page contai...
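The delimiter-plus-keyword check of Embodiment 1 might be sketched as follows. This is only an illustration under stated assumptions: the delimiter pairs, the keyword list, and the function name `extract_candidates` are hypothetical, and the choice to take the text immediately before the opening mark as the first string is a guess, since the source text is truncated before it explains how the pair is formed.

```python
import re

# Assumed delimiter pairs and keywords; the source only lists examples.
DELIMITERS = [("(", ")"), ("[", "]")]
KEYWORDS = ["also known as", "alias", "abbreviation"]


def extract_candidates(text):
    """Return (string_1, string_2) candidates found via delimiter + keyword."""
    candidates = []
    for left, right in DELIMITERS:
        # Group 1: run of word characters/spaces before the start mark.
        # Group 2: everything enclosed between the start and end marks.
        pattern = re.compile(
            r"([\w ]+?)\s*"
            + re.escape(left)
            + r"([^" + re.escape(right) + r"]*)"
            + re.escape(right)
        )
        for match in pattern.finditer(text):
            before = match.group(1).strip()
            inside = match.group(2).strip()
            lowered = inside.lower()
            for keyword in KEYWORDS:
                idx = lowered.find(keyword)
                if idx == -1:
                    continue
                # Assumption: the alias is what remains once the keyword
                # is removed from the enclosed string.
                alias = (inside[:idx] + inside[idx + len(keyword):]).strip(" :,")
                if before and alias:
                    candidates.append((before, alias))
                break
    return candidates
```

For instance, `extract_candidates("Acetylsalicylic acid (also known as aspirin) relieves pain")` yields one candidate pair, while a parenthesised string without any keyword, such as `"Paris (France)"`, yields none.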
Embodiment 2
[0165] Embodiment 2. The preset rule may be an extraction keyword rule, and the step of extracting according to the rule may be: judging whether the content presented on the webpage contains a preset extraction keyword; if it does, determining the character string pair including the alias matching pair according to the position of the extraction keyword and the specific punctuation.
[0166] Wherein, the extracted keywords may include:
[0167] Also known as, alias, common name, abbreviation, etc.
[0168] The specific punctuation may include:
[0169] . ;- / .! *×— - | ′, > _
[0170] For example, the content presented on the acquired webpage is scanned, and keywords such as "short name" and "full name" are found therein and extracted. The part from the starting position of the extracted keyword to the first specific punctuation is taken as string 1, and the content between the extracted keyword and the first specific punctuatio...
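A rough sketch of the keyword-and-punctuation split in Embodiment 2 is given below. Because the paragraph above is truncated, the exact boundaries are an assumption: here string 1 is taken to run from the nearest preceding punctuation mark up to the keyword, and string 2 from the end of the keyword to the next punctuation mark. The keyword list, the punctuation subset (the hyphen is left out so hyphenated names stay intact), and the function name `split_on_keyword` are hypothetical.

```python
# Assumed subset of the extraction keywords and specific punctuation.
KEYWORDS = ["also known as", "alias", "common name", "abbreviation"]
PUNCTUATION = set(".;/!*|,>_")


def split_on_keyword(text):
    """Return (string_1, string_2) around the first keyword found, or None."""
    lowered = text.lower()
    for keyword in KEYWORDS:
        pos = lowered.find(keyword)
        if pos == -1:
            continue
        # Walk left from the keyword to the nearest punctuation mark.
        start = pos
        while start > 0 and text[start - 1] not in PUNCTUATION:
            start -= 1
        # Walk right from the end of the keyword to the next punctuation mark.
        end = pos + len(keyword)
        stop = end
        while stop < len(text) and text[stop] not in PUNCTUATION:
            stop += 1
        string_1 = text[start:pos].strip()
        string_2 = text[end:stop].strip()
        if string_1 and string_2:
            return (string_1, string_2)
    return None
```

With the input `"It is cheap. Sodium chloride also known as table salt; used in cooking."`, the sketch returns `("Sodium chloride", "table salt")`. Text with no keyword returns `None`.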
Embodiment 3
[0205] Embodiment 3. Correction method based on frequency
[0206] Since the extracted alias matching pairs contain a large number of repetitions, the credibility of an alias matching pair can be judged to a certain extent from the number of times it occurs. The frequency-based correction method can therefore be carried out in the following steps: scan the alias matching pairs and count the number of times each pair occurs; judge the credibility of each pair according to its number of occurrences; filter out alias matching pairs whose credibility is lower than a preset threshold.
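The frequency-based filtering steps can be illustrated with a short sketch. It assumes, for simplicity, that credibility is just the raw occurrence count; the source does not say how the count is mapped to a credibility score, and the function name `filter_by_frequency` and its `threshold` parameter are hypothetical.

```python
from collections import Counter


def filter_by_frequency(pairs, threshold):
    """Keep alias pairs whose occurrence count reaches the threshold.

    Assumption: credibility equals the raw number of occurrences.
    """
    counts = Counter(pairs)  # count how often each (name, alias) pair occurs
    return {pair: n for pair, n in counts.items() if n >= threshold}
```

For example, given three occurrences of `("acetylsalicylic acid", "aspirin")` and a single occurrence of some other pair, a threshold of 2 keeps only the first pair.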
[0207] In practical applications, according to the rules of the user's input habits, it can be found that when the user inputs an alias by using information extractors such as brackets, or only by using keywords to input an alias, different contextual relevance is usually generated. Therefore, for ...