Method for automatically extracting terms from Chinese electronic document

An automatic term-extraction technology for electronic documents, applied in the fields of electronic digital data processing and special data processing applications. It addresses the reduced accuracy and low performance of existing automatic extraction, improving accuracy and promoting automation.

Inactive Publication Date: 2013-07-17
FUZHOU UNIV

AI Technical Summary

Problems solved by technology

[0009] In view of this, the purpose of the present invention is to provide a method for automatically extracting words from Chinese electronic documents that solves the problem of accuracy being reduced by "half words" in the automatic extraction results, so that a computer can automatically and efficiently extract the words in a Chinese electronic document.



Examples


Embodiment Construction

[0032] The present invention will be further described below with reference to the accompanying drawings and embodiments.

[0033] As shown in Figure 1, the present embodiment provides a method for automatically extracting words from a Chinese electronic document, comprising the following steps. Step S01: process the electronic document into a group of word strings consisting of atomic words of a specific part of speech. Step S02: count the frequencies of these atomic word strings and their substrings, and take the atomic word strings that appear more than N times as candidate words, where N is a settable parameter, preferably 2. Step S03: delete from the candidate word set the words that appear only as substrings, obtaining the set of words appearing in the document and thereby automatically extracting the words of the Chinese electronic document.
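Steps S01 to S03 can be illustrated with a minimal Python sketch. It is not the patented implementation, and it rests on several assumptions: the input is already segmented and part-of-speech tagged (the source does not name a segmenter or tag set, so the tags here are hypothetical); a "word string" is taken to be a maximal run of consecutive atomic words with an allowed tag; and a candidate is treated as "appearing only as a substring" when some longer candidate contains it and has the same frequency. The function names `to_word_strings`, `count_substrings`, and `extract_terms` are invented for illustration.

```python
from collections import Counter


def to_word_strings(tagged_tokens, allowed_pos):
    """S01 (assumed reading): split (word, pos) pairs into maximal runs
    of consecutive atomic words whose tag is in allowed_pos."""
    strings, current = [], []
    for word, pos in tagged_tokens:
        if pos in allowed_pos:
            current.append(word)
        elif current:
            strings.append(tuple(current))
            current = []
    if current:
        strings.append(tuple(current))
    return strings


def count_substrings(word_strings):
    """S02, first half: count every contiguous sub-sequence of every
    word string, including the full string itself."""
    counts = Counter()
    for ws in word_strings:
        for i in range(len(ws)):
            for j in range(i + 1, len(ws) + 1):
                counts[ws[i:j]] += 1
    return counts


def is_proper_substring(short, long):
    """True if `short` occurs contiguously inside the strictly longer `long`."""
    k = len(short)
    if k >= len(long):
        return False
    return any(long[i:i + k] == short for i in range(len(long) - k + 1))


def extract_terms(word_strings, n=2):
    """S02 + S03: keep strings seen more than n times, then drop any
    candidate that a longer candidate with equal frequency contains
    (assumed test for 'appears only as a substring')."""
    counts = count_substrings(word_strings)
    candidates = {t: c for t, c in counts.items() if c > n}
    terms = set()
    for term, freq in candidates.items():
        only_as_substring = any(
            is_proper_substring(term, other) and freq == other_freq
            for other, other_freq in candidates.items()
        )
        if not only_as_substring:
            terms.add(term)
    return terms
```

To use it, pass a list of `(word, tag)` pairs to `to_word_strings` together with the set of allowed tags, then pass the result to `extract_terms` with the threshold `n`. The equal-frequency test is a simplification: it removes fragments such as the first half of a longer term, but it cannot detect a fragment that occurs only inside several different longer terms, which a position-based check would handle.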

[0034] Specifically, see Figure 2, the automatic word pr...


Abstract

The invention relates to a method for automatically extracting terms from a Chinese electronic document. The method comprises the following steps. Step S01: process the electronic document into a group of word strings consisting of atomic words of a specific part of speech. Step S02: count the frequencies of the atomic word strings and their substrings, and take the atomic word strings that appear more than N times as candidate terms, where N is a settable parameter. Step S03: delete from the candidate term set the terms that appear only as substrings, obtaining the set of terms appearing in the document and thereby automatically extracting the terms of the Chinese electronic document. The method addresses the practical difficulty that automatic term extraction has low performance and a limited degree of automation. Efficient automatic term extraction is a foundation for automatic text processing and supports information search, text summarization, content management, and similar tasks; a good term extraction method raises the degree of automation and the performance of such work.

Description

Technical field

[0001] The invention belongs to the field of natural language processing, and relates to a method for automatically extracting word sets from Chinese electronic documents.

Background art

[0002] In recent years, with the rapid development of scientific research, the economy, the Internet, and other fields, the number of electronic documents has increased rapidly. How to process these massive volumes of electronic documents quickly and effectively has become one of the key tasks in the fields of information retrieval, knowledge management, and Web services. As a result, automatic electronic document processing technologies such as text retrieval, classification, and automatic summarization have become research hotspots in related fields. Among these techniques, automatic extraction of all words in an electronic document ("prompt" for short) is fundamental work. The word prompting method of the present invention is aimed at the automatic processing of Chinese electr...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
Inventor: 于娟
Owner: FUZHOU UNIV