Plain text oriented enterprise entity classification method

A classification method for plain text, applied in special data processing applications, instruments, electrical digital data processing, etc. It addresses the dependence on hand-crafted features and external data, the lack of semantics in entity types, and the lack of guaranteed generality and robustness, and it achieves guaranteed generality and robustness together with an improved recall rate.

Active Publication Date: 2017-09-22
NANJING UNIV

AI Technical Summary

Problems solved by technology

The current mainstream named entity recognition technology only divides entities into person names, place names, organization names, etc., which makes the types of entitie



Embodiment Construction

[0030] To better explain the technical content of the present invention, a specific embodiment of the enterprise entity classification method, applied to court documents, is described below with reference to the accompanying drawings.

[0031] As shown in Figure 2, the present invention constructs a training sample set before implementation. In this embodiment, the training sample set is constructed as follows:

[0032] Step 1-0. Establish the initial state of the training set.

[0033] Step 1-1. Use a web crawler tool to collect court documents from the Internet as an original corpus.

[0034] Step 1-2. For the collected document data, use the open-source word segmentation and part-of-speech tagging software HanLP to split the document text into sentences, segment each sentence into words, and tag each word with its part of speech. Other general-purpose open-source word segmentation software, such as the Chinese Academy of Sciences word segmenter, can also be used. Compared w...
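The sentence-splitting part of Step 1-2 can be illustrated with a minimal sketch. It does not use HanLP and is not the patent's implementation: the function name `split_sentences` and the chosen set of sentence-ending punctuation marks are assumptions made for illustration, and word segmentation and part-of-speech tagging are left to a dedicated tool.

```python
import re

# Assumed set of sentence-ending punctuation (full-width and ASCII forms).
_SENT_END = re.compile(r"([。！？；!?;])")


def split_sentences(text: str) -> list[str]:
    """Split raw document text into sentences, keeping the end punctuation."""
    # With one capturing group, re.split returns
    # [chunk, delimiter, chunk, delimiter, ..., tail].
    parts = _SENT_END.split(text)
    sentences = []
    for i in range(0, len(parts) - 1, 2):
        sentence = (parts[i] + parts[i + 1]).strip()
        if sentence:
            sentences.append(sentence)
    # Text after the last delimiter (if any) forms a final sentence.
    tail = parts[-1].strip()
    if tail:
        sentences.append(tail)
    return sentences


if __name__ == "__main__":
    for s in split_sentences("原告甲公司起诉被告乙公司。法院受理了此案！"):
        print(s)
```

Each resulting sentence would then be passed to the word segmenter and part-of-speech tagger.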


Abstract

The invention discloses a plain-text-oriented enterprise entity classification method comprising the following steps. S1: label the enterprise entities in collected plain text data and use the labeled entities as the training set of an enterprise entity identification module; also label the enterprise entities in the collected plain text data by business nature and use them as the training sample set of an enterprise entity classification module. S2: train a conditional random field model to obtain an enterprise entity identification model. S3: construct semantic vectors for the text data of the original training set. S4: train on the type-labeled, semantically vectorized training set data to obtain an enterprise entity classification model. S5: classify the enterprise entities in a text to be predicted using the enterprise entity classification model. Because the obtained semantic vector serves as the feature of an entity, the method reduces dependence on hand-crafted features and external data and ensures generality and robustness.
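The idea behind steps S3 to S5, using a semantic vector as the only feature of an entity, can be sketched as follows. This is an illustration under stated assumptions, not the patented method: this excerpt does not say how the semantic vectors are built or which classifier is trained, so the sketch averages toy word embeddings and uses a nearest-centroid classifier as a stand-in. The function names, the embedding values, and the labels `finance` and `construction` are all hypothetical.

```python
import math


def entity_vector(context_tokens, embeddings):
    """Average the embeddings of known context tokens (assumed form of S3)."""
    known = [embeddings[t] for t in context_tokens if t in embeddings]
    if not known:
        raise ValueError("no context token has an embedding")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def train_centroids(samples):
    """Compute one mean vector per label (stand-in classifier for S4)."""
    sums, counts = {}, {}
    for vec, label in samples:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [x / counts[label] for x in acc] for label, acc in sums.items()}


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(vec, centroids):
    """Return the label whose centroid is most similar to vec (S5)."""
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Toy two-dimensional embeddings, invented purely for illustration.
toy_embeddings = {
    "贷款": [1.0, 0.0], "利息": [0.9, 0.1],
    "施工": [0.0, 1.0], "工程": [0.1, 0.9],
}
samples = [
    (entity_vector(["贷款", "利息"], toy_embeddings), "finance"),
    (entity_vector(["施工", "工程"], toy_embeddings), "construction"),
]
centroids = train_centroids(samples)
predicted = classify(entity_vector(["贷款"], toy_embeddings), centroids)
```

No hand-crafted features or external lexicons appear in the sketch: the classifier sees only vectors derived from the text itself, which is the property the abstract credits for reduced dependence on artificial features and external data.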

Description

Technical field

[0001] The invention belongs to the technical field of named entity recognition and fine-grained entity classification, and in particular relates to a plain text-oriented enterprise entity classification method.

Background

[0002] In recent years, with the upsurge of "Internet finance", more and more corporate decision makers urgently need to use more advanced information processing methods to extract and analyze massive Internet data in order to make better decisions. Among these massive data, plain text data such as court documents and news and public opinion have become the primary source for enterprises to obtain high-value information.

[0003] Named entity recognition technology is the basis for enterprises to carry out entity semantic analysis and entity relationship extraction. The current mainstream named entity recognition technology only divides entities into person names, place names, organization names, etc., which makes the types of ...


Application Information

IPC(8): G06F17/30
CPC: G06F16/353
Inventor: 张雷, 陈嘉伟, 谢璐遥, 王崇骏
Owner: NANJING UNIV