Webpage information extraction method

A webpage information extraction method, applicable to fields such as instruments, computing, and electrical digital data processing, that addresses problems such as coarse granularity of candidate attributes, poor quality, and low accuracy.

Inactive Publication Date: 2012-06-13

PEKING UNIV

2 Cites, 48 Cited by


AI Technical Summary
This helps you quickly interpret patents by identifying the three key elements:
Problems solved by technology
Method used
Benefits of technology

Problems solved by technology

[0004] However, these current methods share a problem: they extract only some candidate attributes and do not post-process the extracted attributes, so the candidate attributes are coarse-grained and of low accuracy. The handling of polysemous words is relatively poor in quality, and the attributes can be added to a knowledge base only after manual selection.

These methods also do not evaluate the attributes, even though some attributes are closely related to the target concept and others only weakly. Selecting the closely related attributes benefits the classification of concepts.
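To make "closely related" concrete, here is a rough Python sketch. It assumes relatedness can be approximated by coverage, meaning the share of target instances that carry an attribute. That proxy, the function names, the `min_coverage` threshold, and the toy records are all assumptions for illustration; the source states only that web resources are used for evaluation and does not give a scoring formula here.

```python
from collections import defaultdict


def attribute_coverage(records, instances):
    """Return {attribute: fraction of target instances that carry it}.

    records: iterable of (instance, attribute) pairs (hypothetical input shape).
    """
    targets = set(instances)
    seen = defaultdict(set)
    for instance, attribute in records:
        if instance in targets:
            seen[attribute].add(instance)
    return {attr: len(insts) / len(targets) for attr, insts in seen.items()}


def select_attributes(records, instances, min_coverage=0.5):
    """Keep attributes whose coverage reaches min_coverage (assumed cut-off)."""
    scores = attribute_coverage(records, instances)
    return sorted(attr for attr, score in scores.items() if score >= min_coverage)


# Toy records, for illustration only.
instances = ["Andy Lau", "Zhang Ziyi"]
records = [
    ("Andy Lau", "birthday"),
    ("Zhang Ziyi", "birthday"),
    ("Andy Lau", "fan club"),
]
scores = attribute_coverage(records, instances)
selected = select_attributes(records, instances, min_coverage=0.75)
print(scores)
print(selected)
```

Coverage is only one possible signal. An evaluation based on web resources, as the source describes, would presumably draw on evidence beyond the encyclopedia entries themselves.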

Method used


Examples


Embodiment Construction

[0041] Suppose that all attributes of the concept "star" are to be extracted. The input is a list of target instances of the concept "star", that is, a collection of stars such as Andy Lau and Zhang Ziyi. First, extract the candidate attributes corresponding to the instance list, together with the attribute values of those attributes, from various network encyclopedia data sources. Then use the attribute value information to perform synonym induction on the candidate attributes, finding attributes with similar meanings and merging them. Next, use web resources to evaluate the candidate attributes and select those closely related to the target concept. Finally, analyze the attribute values and predict the attribute value type corresponding to each attribute. Each step is described in detail below (for the overall process, see Figure 1).
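The synonym induction step in [0041] relies on attribute values to find attributes with similar meanings. A minimal Python sketch of that idea follows. It assumes two attribute names are synonymous when their sets of (instance, value) pairs overlap strongly under Jaccard similarity. The similarity measure, the `threshold` value, the function names, and the toy records are illustrative assumptions, not details taken from the patent.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def merge_synonymous_attributes(records, threshold=0.5):
    """Group attribute names whose (instance, value) pairs largely overlap.

    records: iterable of (instance, attribute, value) triples.
    threshold: assumed similarity cut-off, chosen only for illustration.
    """
    pairs_by_attr = {}
    for instance, attribute, value in records:
        pairs_by_attr.setdefault(attribute, set()).add((instance, value))

    # Union-find over attribute names, so merges are transitive.
    parent = {attr: attr for attr in pairs_by_attr}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(sorted(pairs_by_attr), 2):
        if jaccard(pairs_by_attr[a], pairs_by_attr[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for attr in pairs_by_attr:
        groups.setdefault(find(attr), set()).add(attr)
    return sorted(sorted(group) for group in groups.values())


# Toy records, for illustration only.
records = [
    ("Andy Lau", "birthday", "1961-09-27"),
    ("Andy Lau", "date of birth", "1961-09-27"),
    ("Zhang Ziyi", "birthday", "1979-02-09"),
    ("Zhang Ziyi", "date of birth", "1979-02-09"),
    ("Andy Lau", "occupation", "actor"),
    ("Zhang Ziyi", "occupation", "actress"),
]
groups = merge_synonymous_attributes(records)
print(groups)
```

Exact value matching is brittle when sources format values differently, so a real system would likely normalize values before comparing them.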

[0042] A. Build an instance list and extract candidate at...


Abstract

The invention discloses a webpage information extraction method, in particular a method for extracting concept attributes from network encyclopedia data sources and processing them. The method comprises the following steps: constructing an instance list and extracting candidate attributes of the instances in the list from multi-source heterogeneous data sources; performing synonym induction on the extracted attributes and placing synonymous attributes in the same set; sub-classifying the induced attributes; analyzing the attribute value types corresponding to the classified attributes; and recommending the attributes and their attribute value type information to a user, or storing them in a structured database. With this scheme, high-quality concept attribute information can be extracted from webpages, which supports better knowledge base construction and other natural language processing tasks such as attribute value extraction, text classification, and the classification of query logs in search engines.
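The value-type analysis step named in the abstract could look roughly like the Python sketch below. It assumes a small fixed set of types (`date`, `number`, `text`), simple regular-expression tests, and a majority vote over an attribute's observed values. None of these choices come from the source; they only illustrate what predicting a value type per attribute might involve.

```python
import re
from collections import Counter

# Assumed patterns: ISO-style dates and plain decimal numbers only.
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")
NUMBER_RE = re.compile(r"^-?\d+(\.\d+)?$")


def classify_value(value):
    """Assign one coarse type label to a single attribute value."""
    text = value.strip()
    if DATE_RE.match(text):
        return "date"
    if NUMBER_RE.match(text):
        return "number"
    return "text"


def predict_value_type(values):
    """Majority vote over per-value labels; 'unknown' if no values are given."""
    if not values:
        return "unknown"
    counts = Counter(classify_value(v) for v in values)
    return counts.most_common(1)[0][0]


# Toy values, for illustration only.
birthday_type = predict_value_type(["1961-09-27", "1979-02-09", "n/a"])
height_type = predict_value_type(["174", "165"])
occupation_type = predict_value_type(["actor", "actress"])
print(birthday_type, height_type, occupation_type)
```

The majority vote tolerates a few noisy values, such as the `"n/a"` entry above, which matters when values come from heterogeneous sources.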

Description

Technical field

[0001] The invention provides a method for extracting webpage information, in particular a method for extracting concept attributes from network encyclopedia data sources and processing them.

Background

[0002] Today, with the explosive growth of Internet text, organizing information and representing knowledge reasonably and effectively, and building a good knowledge base so that people can quickly obtain the knowledge they want from massive numbers of web pages, is very important research work. In knowledge base construction, concepts and attributes are the core elements of knowledge representation. A concept is an object that reflects objective things and their unique attributes, and an attribute describes a characteristic of a concept. Attribute information gives a more comprehensive understanding of the characteristics of a concept. Therefore, in the automatic construction of knowledge base...

Claims


Application Information


IPC(8): G06F17/30

Inventors: 穗志方, 李文杰

Owner: PEKING UNIV
