E-Science environment-oriented multi-domain Web text feature extracting system and method

A multi-domain feature extraction technology, applied to Web text feature extraction, that addresses problems such as the restricted application range of Chinese information extraction systems, inconvenient experimental reproduction, and difficult porting and adoption. It enhances portability and practical value and improves utilization efficiency.

Status: Inactive. Publication Date: 2013-12-11

UNIV OF SCI & TECH BEIJING

3 Cites, 2 Cited by

AI Technical Summary

Problems solved by technology

[0003] Existing domain-based information extraction methods mostly rely on domain dictionaries to discover text features. This makes experiments hard to reproduce and makes the methods hard to port and adopt in multi-domain environments, which seriously restricts the application range of Chinese information extraction systems.

During analysis, these methods mostly rely on domain dictionaries or tagged word sets. Although this can effectively improve extraction accuracy for the features of a specific domain, the resulting systems are not portable enough to meet the practical needs of multi-domain information extraction.

Embodiment Construction

[0040] The present invention will be described in further detail below in conjunction with the accompanying drawings.

[0041] The design of the multi-domain Web text feature extraction system for the e-Science environment mainly considers the following three aspects. First, it removes the dependence on domain dictionaries. In most Chinese information extraction systems, the domain dictionary is used to segment text and to preprocess data for feature discovery. However, because a dictionary is limited in size and slow to update, it seriously restricts the system's ability to discover new events and the latest vocabulary in a field, which hinders porting and adoption of the system. Introducing dictionary-free word segmentation will effectively improve the knowledge learning ability of the Chinese information extraction system, and is more suitable for f...

Abstract

The invention relates to a multi-domain Web text feature extraction system and method for the e-Science environment. The method comprises the following steps: 1. counting the frequency of each character in a target text; 2. taking the character as the basic processing unit and extracting, one by one, the character strings that run from a character used as the start point to a character with a frequency of 1 used as the end point; and 3. counting the frequency of each character string, sorting the feature character strings in descending order of frequency, and outputting them. By introducing dictionary-free word segmentation into feature discovery for domain text, the invention effectively overcomes the dependence of traditional methods on a domain dictionary and, to some extent, enhances the portability and practicability of the system and method for multi-domain scientific data.
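
The three steps of the abstract can be sketched in Python as follows. This is only one possible reading of the abstract, not the patented implementation: it assumes that characters occurring exactly once act as end points and are excluded from the extracted string, and that every non-singleton character may serve as a start point. The function name `extract_features`, the `min_len` filter, and the tie-breaking rule are hypothetical additions for illustration.

```python
from collections import Counter


def extract_features(text, min_len=2):
    """Dictionary-free feature discovery, following one reading of the abstract.

    min_len is a hypothetical filter (not in the source) that drops very
    short candidate strings.
    """
    # Step 1: count how often each character occurs in the target text.
    char_freq = Counter(text)

    # Step 2: from each start character, extend up to (not including) the
    # next character whose frequency is 1, which is treated as the end point.
    candidates = Counter()
    n = len(text)
    for start in range(n):
        if char_freq[text[start]] == 1:
            continue  # assumption: a singleton character cannot start a string
        end = start
        while end < n and char_freq[text[end]] > 1:
            end += 1
        candidate = text[start:end]
        if len(candidate) >= min_len:
            # Step 3 (first half): count the frequency of each string.
            candidates[candidate] += 1

    # Step 3 (second half): output strings in descending order of frequency.
    # Ties are broken by longer string first, then alphabetically; this
    # tie-break is an assumption, since the abstract does not specify one.
    return sorted(candidates.items(), key=lambda kv: (-kv[1], -len(kv[0]), kv[0]))


if __name__ == "__main__":
    for feature, freq in extract_features("abcXabcY"):
        print(feature, freq)
```

In the sample string, `X` and `Y` each occur once and therefore delimit the repeated run `abc`. No dictionary is consulted at any point, which is the property the abstract emphasizes.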

Description

Technical field

[0001] The invention relates to feature extraction from Web text, and in particular to a multi-domain Web text feature extraction system and method for the e-Science environment.

Background

[0002] Khaled Khelif (2007) proposed an ontology-based information extraction method that aims to help biologists acquire professional knowledge more effectively. The method relies on semantic annotation of scientific and technical documents, automatically generates a domain ontology, and provides a corresponding information retrieval interface. Tara McIntosh (2007) proposed a full-text information extraction system for the biomedical field to address the shortcomings of traditional analysis methods based on literature summarization. Ziya Ozkan Gokturk, Nihan Kesim Cicekli et al. (2007) used web crawler technology to extract and classify web page metadata with preset regular expressions. In the experiment, taking the European Cup and the UEFA Champions League as exa...

Application Information

Patent Type & Authority: Patents (China)

IPC(8): G06F17/30

Inventors: 胡长军, 赵冲冲, 翁彧, 赵立永

Owner: UNIV OF SCI & TECH BEIJING
