E-Science environment-oriented multi-domain Web text feature extracting system and method

A multi-domain feature extraction technology applied to Web text feature extraction. It addresses problems such as the restricted application range of Chinese information extraction systems, inconvenient reproduction of experiments, and difficulty in porting and promotion, with the effect of enhancing portability and practical value and improving utilization efficiency.
CN102073647B · Inactive · Publication Date: 2013-12-11 · UNIV OF SCI & TECH BEIJING

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
UNIV OF SCI & TECH BEIJING
Publication Date
2013-12-11
Estimated Expiration
Not applicable · inactive patent


Abstract

The invention relates to a multi-domain Web text feature extraction system and method for e-Science environments. The method comprises the following steps: 1. counting the frequency of each character in the target text; 2. taking the character as the basic processing unit, extracting, one by one, the character strings that run from a character used as the start point to a character with a frequency of 1 used as the end point; and 3. counting the frequency of each character string, sorting the feature character strings in descending order of frequency, and outputting them. By introducing a dictionary-free word segmentation technique into feature discovery for domain text, the invention effectively overcomes the dependence of traditional methods on a domain dictionary and enhances, to some extent, the portability and practicality of the system and method across multi-domain scientific data.
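The three steps in the abstract can be illustrated with a minimal Python sketch. This is one possible reading of the abstract, not the patented implementation. It assumes that a character with a frequency of 1 acts as an exclusive end point, that every substring between a start character and that end point is a candidate, and that ties are broken by length and then alphabetically. The function name `extract_features` and the `min_len` parameter are hypothetical and do not appear in the patent text.

```python
from collections import Counter


def extract_features(text, min_len=2):
    """Dictionary-free feature string extraction (illustrative sketch).

    Assumptions not stated in the source: the frequency-1 character is an
    exclusive end point, and candidates shorter than `min_len` are skipped.
    """
    # Step 1: count how often each character occurs in the target text.
    char_freq = Counter(text)

    # Step 2: from every start character, grow the string one character at a
    # time until a character with frequency 1 is reached.
    string_freq = Counter()
    n = len(text)
    for start in range(n):
        if char_freq[text[start]] == 1:
            continue  # a unique character cannot begin a repeated string
        end = start
        while end < n and char_freq[text[end]] > 1:
            end += 1
            if end - start >= min_len:
                # Step 3 (first half): count each extracted string.
                string_freq[text[start:end]] += 1

    # Step 3 (second half): sort by descending frequency. The tie-breakers
    # (longer strings first, then alphabetical) are an assumption.
    return sorted(
        string_freq.items(),
        key=lambda item: (-item[1], -len(item[0]), item[0]),
    )


print(extract_features("abcabxab"))  # [('ab', 3)]
```

In the example, `c` and `x` each occur once, so they cut the text into three runs of `ab`, which is the only string of length two or more that is extracted. A real system would presumably also filter candidates that occur only once, but the abstract does not specify a threshold.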

Description

Technical field

[0001] The invention relates to feature extraction from Web text, in particular to a multi-domain Web text feature extraction system and method for e-Science environments.

Background technique

[0002] Khaled Khelif (2007) proposed an ontology-based information extraction method intended to help biologists acquire domain knowledge more effectively. The method relies on semantic annotation of scientific and technical documents, automatically generates a domain ontology, and provides a corresponding information retrieval interface. Tara McIntosh (2007) proposed a full-text information extraction system for the biomedical field to address the shortcomings of traditional analysis methods based on literature abstracts. Ziya Ozkan Gokturk, Nihan Kesim Cicekli et al. (2007) used web crawler technology to extract and classify web page metadata with preset regular expressions. In the experiment, taking the European Cup and the UEFA Champions League as exa...

Claims
