Figure information disambiguation treatment method based on social network and name context

A technology relating to social networks and person information, applied in the fields of electrical digital data processing, special data processing applications, instruments, etc., to achieve disambiguation of person information.

Status: Inactive; Publication Date: 2011-05-11
HARBIN INST OF TECH
Cites: 1; Cited by: 24

AI Technical Summary

Problems solved by technology

[0004] The purpose of the present invention is to provide a person information disambiguation method based on the social network and the name context, to solve the problem in the prior art that a search engine's results for a given person name often mix together web pages about different people who share that name.

Examples

Specific Embodiment 1

[0007] Specific Embodiment 1: This embodiment comprises the following steps. Step one: the user inputs a name to be retrieved, the retrieval is performed with a search engine such as the Google API (the application programming interface provided by Google Inc.), and the retrieved web pages are downloaded to the local computer. Step two: text is extracted from each of the above web pages, and word segmentation and part-of-speech tagging are performed on it to form a document. Word segmentation divides each sentence into entries with independent meanings, and part-of-speech tagging marks each word with its part of speech, such as noun or verb; word segmentation and part-of-speech tagging can use the widely used forward maximum matching method and N-gram models, respectively. Step three: the documents are first classified using the person domain information, and then the social network and context information are used to cluster the person domain information, ...
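Step two names forward maximum matching for word segmentation. A minimal Python sketch of that technique follows, assuming a plain set of dictionary words and a hypothetical `max_len` cap on word length (the patent gives neither its lexicon nor its window size); the N-gram part-of-speech tagger is not shown.

```python
def forward_max_match(sentence, lexicon, max_len=4):
    """Segment `sentence` left to right, always taking the longest lexicon word.

    Characters not covered by any lexicon entry are emitted as single-character words.
    """
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    words = []
    i = 0
    while i < len(sentence):
        # Try the longest window first and shrink it until a lexicon hit;
        # a single character is always accepted as the fallback.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


lexicon = {"研究", "研究生", "生命", "起源"}  # toy lexicon, assumed for illustration
print(forward_max_match("研究生命起源", lexicon))  # → ['研究生', '命', '起源']
```

The example also shows a well-known weakness of greedy forward matching: 研究生命起源 ("study the origin of life") is cut as 研究生 / 命 / 起源 rather than 研究 / 生命 / 起源, because the longest prefix always wins.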

Specific Embodiment 2

[0008] Specific Embodiment 2: This embodiment differs from Embodiment 1 in that, in step three, classification using the person domain information is carried out as follows. The person information is pre-classified by domain into seven categories: entertainment, administration, military, science and education, sports, medical care, and economy. For each category, several representative documents are manually labeled, and the feature information of each domain category is extracted to form a domain feature library; an SVM is then used to classify the documents, giving a coarse classification of real-world persons. In this way, persons in one category are separated from persons in the other categories, and persons in different categories need not be compared with each other. Only the person information within the same domain category then needs to be processed, by clustering the persons in that category, so as to finally realize the disa...
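Embodiment 2 classifies documents with an SVM over a domain feature library but does not say how the library is built or which SVM implementation is used. The sketch below is one assumption-laden reading: the hypothetical feature library is the `top_k` most frequent words of each manually labeled category, each document becomes a word-count vector over that library, and a one-vs-rest set of linear SVMs is trained by subgradient descent on the L2-regularized hinge loss (a stand-in for a production SVM package). The toy data covers only three of the seven categories.

```python
from collections import Counter


def build_feature_library(labeled_docs, top_k=3):
    """Hypothetical feature library: the top_k most frequent words of each category."""
    per_category = {}
    for words, category in labeled_docs:
        per_category.setdefault(category, Counter()).update(words)
    return sorted({word for counts in per_category.values()
                   for word, _ in counts.most_common(top_k)})


def vectorize(words, vocab):
    """Word-count vector of a segmented document over the feature library."""
    counts = Counter(words)
    return [float(counts[w]) for w in vocab]


def train_linear_svm(X, y, lr=0.1, lam=0.01, epochs=100):
    """Binary linear SVM (labels -1/+1): subgradient descent on L2-regularized hinge loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            w = [wj * (1 - lr * lam) for wj in w]  # gradient step on the L2 penalty
            if margin < 1:  # hinge term active: push the example toward its side
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def train_one_vs_rest(labeled_docs, vocab):
    """One binary SVM per category: that category's documents +1, all others -1."""
    X = [vectorize(words, vocab) for words, _ in labeled_docs]
    categories = sorted({category for _, category in labeled_docs})
    return {c: train_linear_svm(X, [1 if label == c else -1 for _, label in labeled_docs])
            for c in categories}


def classify(words, models, vocab):
    """Assign the category whose SVM gives the highest decision score."""
    x = vectorize(words, vocab)
    scores = {c: sum(wj * xj for wj, xj in zip(w, x)) + b for c, (w, b) in models.items()}
    return max(scores, key=scores.get)


labeled = [  # toy, manually labeled documents (assumed)
    (["match", "goal", "coach", "match"], "sports"),
    (["goal", "team", "coach"], "sports"),
    (["army", "general", "troops"], "military"),
    (["army", "troops", "navy"], "military"),
    (["market", "stock", "bank"], "economy"),
    (["stock", "bank", "market", "bank"], "economy"),
]
vocab = build_feature_library(labeled)
models = train_one_vs_rest(labeled, vocab)
print(classify(["goal", "coach", "fans"], models, vocab))
```

One-vs-rest is used because a linear SVM is a binary classifier; with all seven domain categories, seven such models would be trained and the highest-scoring one would decide the category.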

Specific Embodiment 3

[0009] Specific Embodiment 3: This embodiment differs from Embodiment 1 in that, in step three, the social network and context information are used to cluster the person domain information as follows. The other persons and the context appearing in a document reveal attributes that distinguish a person from others: the names co-occurring in a document constitute the person's social network, and the context information constitutes the person's social attributes. Suppose person name A is retrieved. If names A and B both appear in document D1 and also both appear in document D2, then D1 and D2 are taken to refer to the same real person entity and are assigned to the same category; otherwise, if D2 contains names A and C, the two documents are considered to refer to different persons. During processing, the social network is continually expanded; that is, if names A, B, and C appear in document D1, a...
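The co-occurrence rule of Embodiment 3 can be sketched with a union-find structure. This is a simplifying assumption rather than the patent's full procedure (the text is truncated, and context information is ignored here): two documents retrieved for name A are merged when they share at least one other co-occurring name, and merges propagate transitively, which mimics the expanding social network. The function name `cluster_by_cooccurrence` and its input format (document id mapped to the set of names found in that document) are hypothetical.

```python
def cluster_by_cooccurrence(query, doc_names):
    """Group documents about `query` that share co-occurring names (transitively).

    doc_names maps a document id to the set of person names found in it.
    Returns the clusters as sorted lists of document ids.
    """
    parent = {doc: doc for doc in doc_names}

    def find(doc):
        # Follow parent links to the cluster root, halving the path as we go.
        while parent[doc] != doc:
            parent[doc] = parent[parent[doc]]
            doc = parent[doc]
        return doc

    first_doc_with = {}  # co-occurring name -> first document seen containing it
    for doc, names in doc_names.items():
        for name in names - {query}:
            if name in first_doc_with:
                parent[find(doc)] = find(first_doc_with[name])  # merge the two entities
            else:
                first_doc_with[name] = doc

    clusters = {}
    for doc in doc_names:
        clusters.setdefault(find(doc), []).append(doc)
    return sorted(sorted(docs) for docs in clusters.values())


docs = {"D1": {"A", "B"}, "D2": {"A", "B"}, "D3": {"A", "C"}}
print(cluster_by_cooccurrence("A", docs))  # → [['D1', 'D2'], ['D3']]

expanded = {"D1": {"A", "B", "C"}, "D2": {"A", "B"}, "D3": {"A", "C"}}
print(cluster_by_cooccurrence("A", expanded))  # → [['D1', 'D2', 'D3']]
```

In the second call, D3 shares no name with D2 but joins the same entity through C in D1, which is how the network grows during processing. In practice this rule would run within a single domain category from Embodiment 2, and merging on a single shared name can over-merge very common names.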

Abstract

The invention discloses a person information disambiguation method based on a social network and name context. It relates to the disambiguation of person information on the Internet and solves the problem in the prior art that, when a search engine retrieves a specified name, web pages about different people who share that name are mixed together in the results. The method is used for retrieving person information on the Internet and comprises the following steps: first, the user inputs a name to be retrieved, performs the retrieval with a search engine, and downloads the retrieved web pages to the local computer with download software; second, text extraction, word segmentation, and part-of-speech tagging are performed on each web page to form a document; third, the documents are classified using person domain information, the person domain information is clustered using the social network and context information, and finally the correspondence between the person domain information and the real person entities is displayed, together with the social network to which each person entity belongs.
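The final step of the abstract displays each real person entity with its documents and social network. A minimal sketch of that output follows, assuming the clusters are already available (hard-coded here) and a hypothetical input format that maps each document id to the set of names found in it; the social network of an entity is taken, as an assumption, to be the other names co-occurring with the queried name, weighted by how many of the entity's documents contain them.

```python
from collections import Counter


def entity_social_networks(query, doc_names, clusters):
    """For each entity (a list of document ids), count the names co-occurring with `query`."""
    networks = []
    for docs in clusters:
        counts = Counter()
        for doc in docs:
            counts.update(doc_names[doc] - {query})  # each document counts a name once
        networks.append(dict(counts))
    return networks


doc_names = {"D1": {"A", "B"}, "D2": {"A", "B"}, "D3": {"A", "C"}}
clusters = [["D1", "D2"], ["D3"]]  # assumed output of the clustering step
networks = entity_social_networks("A", doc_names, clusters)
for i, (docs, net) in enumerate(zip(clusters, networks), start=1):
    ties = ", ".join(f"{name} ({n})" for name, n in sorted(net.items()))
    print(f"Entity {i}: documents {', '.join(docs)}; social network: {ties}")
# Entity 1: documents D1, D2; social network: B (2)
# Entity 2: documents D3; social network: C (1)
```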

Description

Technical Field [0001] The invention relates to a disambiguation method for person information on the Internet. Background Art [0002] Because the results that general-purpose search engines return for knowledge in vertical (domain-specific) fields fall far short of users' expectations, vertical search engine technology has emerged in response. As a core part of vertical search engine technology, research on named entities is attracting increasing attention. Named entities are important linguistic units that carry information in text. A reference to an entity concept in text (an entity mention, also called a referent item) can take three forms: a named mention, a nominal mention, or a pronoun mention. A series of research tasks center on named entities, such as named entity recognition, disambiguation, attribute extraction, and relation extraction. Among them, the task of named entity recognition is to identify the named referent of the entity concept in the...

Application Information

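Patent type & authority: Application (China)
IPC(8): G06F17/30; G06F17/27
Inventors: 刘远超 (Liu Yuanchao), 刘铭 (Liu Ming), 王晓龙 (Wang Xiaolong), 刘秉权 (Liu Bingquan), 林磊 (Lin Lei), 单丽莉 (Shan Lili), 孙承杰 (Sun Chengjie)
Owner: HARBIN INST OF TECH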