A name disambiguation method applied to Web person search

A technology concerning personal names and persons, applied in the field of photoelectric transmission. It removes two restrictive assumptions made by existing methods: that people with the same name do not share an occupation, and that their social circles overlap little. The result is improved disambiguation accuracy.

Status: Inactive; Publication Date: 2019-05-28
四川易诚智讯科技有限公司 (Sichuan Yicheng Zhixun Technology Co., Ltd.) and one other
Cites: 3; Cited by: 5

AI Technical Summary

Problems solved by technology

[0013] Existing Web name disambiguation techniques have several shortcomings. Classification methods based on network knowledge resources require that people with the same name do not share an occupation. Clustering methods based on graph partitioning require that the social circles of people with the same name do not overlap much.
In addition, clustering methods based on the vector space model are limited for Web name disambiguation. These limits come from how features were previously selected and processed, and from how feature sets were fused.



Examples


Embodiment Construction

[0033] First, the prior art related to the present invention is briefly described:

[0034] 1. Term frequency-inverse document frequency (TF-IDF) algorithm

[0035] TF-IDF (Term Frequency-Inverse Document Frequency) is a statistical method for evaluating how important a word is to one document within a document collection or corpus. A word's importance rises in proportion to the number of times it appears in the document. It falls as the word's frequency across the corpus rises. In short, a word represents a document's content better the more often it appears in that document and the less often it appears in the other documents.
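The weighting described in [0035] can be illustrated with a short Python sketch. It assumes one common TF-IDF variant, where tf is the term count divided by document length and idf is ln(N / df). Here N is the number of documents and df is the number of documents containing the term. The source does not specify a variant, and the function name `tf_idf` and the toy corpus are illustrative only.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one {term: weight} map per tokenized document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(docs)
    # df(t): number of documents that contain term t at least once
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc)
        weights.append({
            term: (count / length) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


if __name__ == "__main__":
    corpus = [
        ["jordan", "basketball", "bulls", "jordan"],
        ["jordan", "machine", "learning", "berkeley"],
        ["basketball", "nba", "finals"],
    ]
    for w in tf_idf(corpus):
        # two highest-weighted terms per document
        print(sorted(w.items(), key=lambda kv: -kv[1])[:2])
```

With this toy corpus, "bulls" gets a higher weight than "jordan" in the first document, even though "jordan" occurs twice there. This is because "jordan" also appears in the second document and is therefore less distinctive.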

[0036] Various forms of TF-IDF weighting are often applied by search engines as a measure or rating of how relevant a document is to a user query. In addition to TF-IDF, search engines on the Internet also use ranking methods based on link analysis to determin...
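To illustrate TF-IDF as a relevance measure, the sketch below scores documents against a query by the cosine similarity of their TF-IDF vectors. Cosine similarity and the helper names (`idf_table`, `vectorize`, `rank`) are assumptions for illustration. The excerpt does not say which scoring function search engines apply, and link-analysis ranking is not modeled here.

```python
import math
from collections import Counter


def idf_table(docs):
    """ln(N / df(t)) for every term seen in the corpus (assumed variant)."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    return {t: math.log(n / c) for t, c in df.items()}


def vectorize(tokens, idf):
    """Sparse TF-IDF vector; terms unseen in the corpus get weight 0."""
    counts = Counter(tokens)
    total = len(tokens) or 1
    return {t: (c / total) * idf.get(t, 0.0) for t, c in counts.items()}


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank(query, docs):
    """Return document indices ordered from most to least relevant to the query."""
    idf = idf_table(docs)
    q = vectorize(query, idf)
    scores = [cosine(q, vectorize(d, idf)) for d in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    corpus = [
        ["jordan", "basketball", "bulls", "jordan"],
        ["jordan", "machine", "learning", "berkeley"],
        ["basketball", "nba", "finals"],
    ]
    print(rank(["jordan", "basketball"], corpus))
```

The query is weighted with the same idf table as the documents, so terms common to the whole corpus contribute little to the score.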



Abstract

The invention discloses a name disambiguation method applied to Web person search. It comprises the following steps:
S1, extract the HTML source code of a webpage and remove noise unrelated to person information from it;
S2, extract a person-webpage feature set;
S3, from the feature set extracted in step S2, generate a combined feature vector representing the webpages related to a given person;
S4, perform hierarchical clustering with an agglomerative hierarchical clustering algorithm to obtain the person-webpage clustering result.
The method introduces an n-gram language model to overcome a limitation of traditional named entity recognition. That approach restricts extraction to named entities and cannot extract many special terms and domain-specific vocabulary in the text. The method also assigns each extracted feature a different weight according to its importance for representing a person, which improves name disambiguation accuracy.
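Steps S2 to S4 can be illustrated with a Python sketch under loudly stated assumptions. S1 (HTML noise removal) is not shown; the input is assumed to be already-cleaned page text. Features are plain word n-grams (n = 1 to 2), which simplifies the n-gram language model the abstract describes. Each page has two hypothetical feature groups, `ngrams` and `emails`, combined as a weighted sum with made-up weights. Similarity is cosine, and clustering uses average linkage with a hypothetical stopping threshold. The patent's actual feature set, weights, linkage and threshold are not given in this excerpt.

```python
import math
from collections import Counter

# HYPOTHETICAL per-group weights for S3; the patent weights features by importance,
# but its actual values are not given in this excerpt.
FEATURE_WEIGHTS = {"ngrams": 1.0, "emails": 3.0}

# Toy pages (hypothetical): already-cleaned text plus extracted e-mail addresses.
SAMPLE_PAGES = [
    ("michael jordan chicago bulls nba", []),
    ("michael jordan bulls guard nba finals", []),
    ("michael jordan berkeley machine learning", ["jordan@cs.example.edu"]),
    ("michael jordan statistics professor", ["jordan@cs.example.edu"]),
]


def ngrams(tokens, n_max=2):
    """S2 (simplified): every word n-gram with 1 <= n <= n_max."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def combined_vector(page):
    """S3 (sketch): weighted sum of per-group count vectors, keys prefixed by group."""
    vec = Counter()
    for group, weight in FEATURE_WEIGHTS.items():
        for feature, count in Counter(page.get(group, [])).items():
            vec[f"{group}:{feature}"] += weight * count
    return vec


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def agglomerative(vectors, threshold=0.3):
    """S4 (sketch): average-linkage agglomerative clustering.

    Starts from singleton clusters and repeatedly merges the most similar pair
    until the best average similarity falls below `threshold` (hypothetical value).
    """
    sim = [[cosine(a, b) for b in vectors] for a in vectors]
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                total = sum(sim[i][j] for i in clusters[x] for j in clusters[y])
                avg = total / (len(clusters[x]) * len(clusters[y]))
                if avg > best:
                    best, pair = avg, (x, y)
        if best < threshold:
            break
        x, y = pair
        clusters[x].extend(clusters.pop(y))  # y > x, so index x stays valid
    return clusters


def cluster_pages(pages):
    """Run the S2 -> S3 -> S4 sketch over (text, emails) pairs."""
    vectors = [combined_vector({"ngrams": ngrams(text.split()), "emails": emails})
               for text, emails in pages]
    return agglomerative(vectors)


if __name__ == "__main__":
    print(cluster_pages(SAMPLE_PAGES))
```

Every toy page contains "michael jordan", so cross-person similarities are nonzero (roughly 0.21 to 0.25 here). The 0.3 threshold stops merging once two clusters have formed: the basketball pages, and the academic pages that share an e-mail address. The script prints `[[0, 1], [2, 3]]`. Because the shared e-mail carries a larger weight, pages 2 and 3 are merged first, which illustrates why weighting features by importance can matter.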

Description

Technical Field
[0001] The invention belongs to the field of photoelectric transmission, in particular to an all-fiber distributed acoustic wave sensing technology.
Background
[0002] With the advent of the mobile Internet era, search engines have become an important tool for acquiring knowledge, and searching for personal information online is very common. According to statistics, about 5%-10% of search engine queries involve personal names, and fewer than 20% of users are willing to add extra information when searching for a name. At the same time, personal names are highly ambiguous: according to a report by the US Census Bureau, 1 billion people use only 90,000 different names. A name query therefore returns a mixture of webpages about different people who share the name, and webpages about "celebrities" tend to crowd out those about "non-celebrities". For example, if one searches Google for "Michael Jordan", the resu...


Application Information

Patent Type & Authority: Application (China)
IPC (8th edition): G06F16/9535; G06F16/36; G06F17/27
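Inventors: 张军 (Zhang Jun), 胡欣 (Hu Xin), 占梦来 (Zhan Menglai), 邹佩良 (Zou Peiliang), 王另 (Wang Ling)
Owner: 四川易诚智讯科技有限公司 (Sichuan Yicheng Zhixun Technology Co., Ltd.)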