Automatic standardized filing method based on text semantic mining

A text semantic mining technology, applied in special data processing applications, instruments, electronic digital data processing and other fields. It addresses the problem that texts with fixed content requirements describe that content in different ways, which makes conventional methods inapplicable. The method saves time and manpower and makes the filed documents easy to query and use.

Status: Inactive
Publication Date: 2015-04-29
MERIT DATA CO LTD
4 Cites, 50 Cited by

AI Technical Summary

Problems solved by technology

[0039] For texts with fixed requirements for key information, most of the content to be included is fixed, but the way that content is described varies from text to text, so traditional abstract extraction methods designed for ordinary text are not applicable.

Examples

Embodiment

[0061] Example of a standardized filing method for courts to store case information

[0062] With the informatization of courts at all levels, the volume of case information stored by courts has increased dramatically, but a considerable number of legal documents still exist as free text. Information extraction technology is needed to extract structured information from these documents and store it in the information system, where it is easy to query and use.
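The extraction of structured information from a free-text legal document can be illustrated with a small rule set. The following is a minimal sketch only: the field names, the regular expressions and the sample sentence are hypothetical assumptions for illustration, not rules taken from the patent.

```python
import re

# Hypothetical rule set: field name -> regular expression with one capture group.
# Field names and patterns are illustrative assumptions, not from the patent.
RULES = {
    "case_number": re.compile(r"Case No\.\s*([A-Za-z0-9()\-]+)"),
    "plaintiff": re.compile(r"Plaintiff:\s*([^,;\n.]+)"),
    "defendant": re.compile(r"Defendant:\s*([^,;\n.]+)"),
    "judgment_date": re.compile(r"(\d{4}-\d{2}-\d{2})"),
}


def extract_fields(text, rules=RULES):
    """Apply each rule to the text; a field is None when its rule does not match."""
    record = {}
    for field, pattern in rules.items():
        match = pattern.search(text)
        record[field] = match.group(1).strip() if match else None
    return record


# Invented sample text, used only to exercise the rules above.
sample = (
    "Case No. 2014-CIV-0123. Plaintiff: Zhang San, Defendant: Li Si. "
    "The court delivered its judgment on 2014-06-30."
)
record = extract_fields(sample)
print(record)
```

Keeping the rules in a dictionary separate from the matching loop means a domain expert could extend the rule set without touching the extraction code, which is one plausible reading of a knowledge engineering approach.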

[0063] The filing method and steps for court case information in this embodiment are shown in Figure 1. In outline: web crawler technology, together with web page analysis and preprocessing, is used to crawl files from the Internet; information extraction, keyword extraction and automatic summary generation are then performed on the crawled files and on local files (Word and txt formats); and the results are stored in the information system for easy query and use.
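The flow described above, in which each document passes through three analysis steps before being stored, could be wired together roughly as follows. This is a sketch under stated assumptions: the `filings` table layout, the function names, and the use of SQLite as a stand-in for the information system are all hypothetical, and the three analysis steps are passed in as placeholder callables rather than implemented.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open a SQLite database standing in for the information system (assumption)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS filings "
        "(source TEXT, fields TEXT, keywords TEXT, summary TEXT)"
    )
    return conn


def file_document(conn, source, text, extract, keywords, summarize):
    """Run the three analysis steps on one document and store the results."""
    conn.execute(
        "INSERT INTO filings (source, fields, keywords, summary) VALUES (?, ?, ?, ?)",
        (source, json.dumps(extract(text)), ", ".join(keywords(text)), summarize(text)),
    )
    conn.commit()


# Placeholder steps; real implementations would replace these lambdas.
text = "Plaintiff: Zhang San. Contract dispute."
conn = open_store()
file_document(
    conn,
    "local:case1.txt",
    text,
    extract=lambda t: {"length": len(t)},
    keywords=lambda t: ["contract", "dispute"],
    summarize=lambda t: t.split(".")[0] + ".",
)
rows = conn.execute("SELECT source, fields, keywords, summary FROM filings").fetchall()
print(rows)
```

Crawled and local files can share this path because `file_document` only sees the document text and a source label.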

[0064] 1. Crawl documents from a specific website

[0065] Ac...

Abstract

The invention relates to an automatic standardized filing method based on text semantic mining. The method comprises the following steps: crawling files from a website; performing information extraction, keyword extraction and automatic abstract generation on the crawled files and on local files by means of text semantics; and finally storing the results in an information system. For information extraction, a rule set is established using a knowledge engineering method, and information points are automatically extracted from each file to form structured data. For keyword extraction, keywords are extracted according to the position and semantics of words in the text to generate a keyword index. For automatic abstract generation, the content the abstract must contain is set first, the corresponding information is extracted from the text, sentence similarity is calculated, and the sentences of the original file that contain the key information are extracted. With this method, business personnel do not need to read large numbers of files, which saves time and labor and makes the filed information convenient to query and use.
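The keyword and abstract steps described above can be sketched in a few lines. This is an illustrative simplification, not the patented algorithm: keywords are scored by frequency with extra weight for the first sentence (position only, with no semantic model), and the abstract picks, for each preset content item, the sentence with the highest bag-of-words cosine similarity. The stopword list, the weights, the content items and the sample text are all assumptions.

```python
import math
import re
from collections import Counter

# Small illustrative stopword list (assumption).
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "on", "for", "is", "by"}


def tokenize(text):
    """Lowercase the text and keep alphabetic words that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def split_sentences(text):
    """Split on whitespace that follows sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def keywords(text, top_n=3, lead_weight=2.0):
    """Score words by frequency; words in the first sentence count lead_weight times."""
    scores = Counter()
    for i, sentence in enumerate(split_sentences(text)):
        weight = lead_weight if i == 0 else 1.0
        for word in tokenize(sentence):
            scores[word] += weight
    return [word for word, _ in scores.most_common(top_n)]


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def summarize(text, content_items):
    """For each preset content item, keep the most similar sentence of the text."""
    sentences = split_sentences(text)
    vectors = [Counter(tokenize(s)) for s in sentences]
    chosen = []
    for item in content_items:
        query = Counter(tokenize(item))
        scored = [(cosine(query, v), s) for v, s in zip(vectors, sentences)]
        if not scored:
            continue
        score, best = max(scored, key=lambda pair: pair[0])
        if score > 0 and best not in chosen:
            chosen.append(best)
    return " ".join(chosen)


# Invented sample document and content items.
doc = (
    "The plaintiff claims the defendant breached the sales contract. "
    "The defendant denies the breach. "
    "The court orders the defendant to pay damages to the plaintiff."
)
top_keywords = keywords(doc, top_n=2)
summary = summarize(doc, ["plaintiff claims", "court orders"])
print(top_keywords)
print(summary)
```

Driving the abstract from a fixed list of content items, rather than from generic sentence ranking, reflects the point that the required content of such documents is known in advance while its wording varies.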

Description

Technical field

[0001] The invention relates to an engineering application of text semantic analysis technology. Specifically, it applies text semantic analysis related technologies (information extraction, keyword extraction, automatic summarization) to a type of text (with certain content and format requirements) to form a method for document standardization and automatic filing.

Background technique

[0002] There are many types of files, such as personnel files, financial files, technical files, contract files, case files, and so on. Archives and archives management is an indispensable and important task for enterprises, institutions and government departments.

[0003] Many enterprises and government departments keep a large number of text files, especially some text files with format and content requirements (such as legal documents of courts, criminal case information of public security departments, contracts stored by enterprises, etc.), these files are based on Fr...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F17/27
CPC: G06F16/335, G06F16/313, G06F16/345
Inventor: 程宏亮, 梁栋, 卢耀宗, 强劲, 张兵, 刘华兴, 张小平
Owner: MERIT DATA CO LTD