Entity classification model training and prediction method based on digital humanities

An entity classification model training and prediction method, applied to text database clustering/classification, electrical digital data processing, and special data processing applications. It addresses the high cost and poor applicability of existing methods and the lack of deep learning models dedicated to entity classification, with the aim of making extraction work easier.

Pending Publication Date: 2021-08-10
同方知网数字出版技术股份有限公司 +1

AI Technical Summary

Problems solved by technology

[0004] General text classification methods fall into four main types. The first classifies data using hand-written rules, but manually labeling entity categories on a large scale is costly.
The second is dictionary-based, for example matching entities directly against a dictionary, but this requires a very comprehensive thesaurus, which makes it unsuitable for classifying ancient Chinese entities.
The third is based on traditional machine learning, such as SVM, but it requires manually defining rules for some text features. For example, a person can be recognized from a family surname, but the method fails on irregular names (such as the posthumous names of ancient emperors), place names, and historical events, where no further information can be extracted.
The fourth is based on deep learning models, which are now widely used in industrial text classification projects, but no deep learning method has been designed specifically for modeling and classifying entities, especially ancient Chinese entities.
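
To make the dictionary-based limitation concrete, here is a minimal Python sketch of exact-match lookup. The `GAZETTEER` contents and the function name `classify_by_dictionary` are hypothetical and chosen only for illustration; they are not part of the patent.

```python
# Hypothetical mini-gazetteer; a real thesaurus would need far broader coverage.
GAZETTEER = {
    "Qin Shihuang": "person",
    "Chang'an": "place",
    "Tang": "dynasty",
}


def classify_by_dictionary(entity):
    """Return the category for an exact dictionary hit, or None if unseen."""
    return GAZETTEER.get(entity.strip())


# A listed entity is classified; an unlisted one simply has no answer.
print(classify_by_dictionary("Qin Shihuang"))
print(classify_by_dictionary("Emperor Wen of Han"))
```

Any entity missing from the dictionary gets no category at all, which is why this approach depends on an exhaustive thesaurus.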



Examples


Detailed Description of the Embodiments

[0021] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with the embodiments and accompanying drawings.

[0022] This method classifies entities found in ancient humanities information, such as people, official positions, institutions, dynasties, events, and places. It includes:

[0023] An irregular sample adjustment model, which extracts a condensed text from samples whose text is too long, for example converting "Wise and mighty Qin Shihuang" to "Qin Shihuang". Making this preliminary adjustment to the target entity improves the accuracy of the classification model.

An entity type classification model. Because the entities are all short texts, and considering the strong short-term memory ability of RNNs, it uses the Bert+BIRNN algorithm to obtain the classification result for the target entity.
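
The condensing step can be sketched in Python as follows. The abstract describes using an HMM to keep text whose part of speech is nr (person name); this sketch assumes the tagging has already been done and only shows the filtering. The function name `condense_entity`, the tag values other than `nr`, and the pre-tagged input are hypothetical, and no real tagger is called.

```python
def condense_entity(tagged_tokens, keep_tag="nr", sep=" "):
    """Keep only tokens tagged as person names; fall back to the full text."""
    kept = [word for word, tag in tagged_tokens if tag == keep_tag]
    if not kept:
        # Nothing tagged as a name: leave the sample unchanged.
        return sep.join(word for word, _ in tagged_tokens)
    return sep.join(kept)


# Assumed output of a part-of-speech tagger (hypothetical tags for illustration).
tagged = [("Wise", "a"), ("and", "c"), ("mighty", "a"), ("Qin Shihuang", "nr")]
print(condense_entity(tagged))  # Qin Shihuang
```

Falling back to the original text when no name tag is found is a design choice made for this sketch, so that samples are never emptied by the cleaning step.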

[0024] As shown in Figure 1, it is the digital...


Abstract

The invention discloses an entity classification model training and prediction method based on digital humanities, comprising the following steps: retrieving at least six types of entities from a set of reference books and, for person entities whose text carries special descriptive prefixes or suffixes, using an HMM (Hidden Markov Model) to extract the text tagged with part of speech nr and taking it as the cleaned sample; dividing the retrieved entities into a positive class and a negative class, inputting them into BertTokenizer to convert them into token vectors, and expanding the vectors into 16-dimensional vectors according to the text length features of the entities; training on the token text vectors to obtain a BIRNN classification model structure; recognizing and calibrating the results of the BIRNN classification model and generating dictionaries for the different categories; and exporting the results into a database, followed by review, deduplication, and information completion.
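
The vector preparation step could look roughly like the Python sketch below. It rests on two assumptions that the abstract does not confirm: that "expanding into 16-dimensional vectors" means padding or truncating each token id sequence to a fixed length of 16, and that the positive/negative split is a one-vs-rest labeling per category. The helper names, the padding id, and the sample token ids are hypothetical; the tokenizer itself is not called here.

```python
MAX_LEN = 16  # fixed length taken from the abstract
PAD_ID = 0    # assumed padding id; check the vocabulary of the tokenizer in use


def pad_or_truncate(token_ids, max_len=MAX_LEN, pad_id=PAD_ID):
    """Clip long sequences and right-pad short ones to exactly max_len ids."""
    clipped = list(token_ids)[:max_len]
    return clipped + [pad_id] * (max_len - len(clipped))


def binary_labels(categories, positive):
    """Label samples of the target category 1 (positive) and all others 0."""
    return [1 if category == positive else 0 for category in categories]


# Hypothetical token ids standing in for tokenizer output.
ids = pad_or_truncate([101, 4912, 1993, 4640, 102])
labels = binary_labels(["person", "place", "person"], positive="person")
print(len(ids), labels)
```

Fixed-length inputs let short entity names be batched together before they reach the recurrent classifier.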

Description

Technical field

[0001] The present invention relates to the fields of natural language processing and computer information processing technology, and in particular to a digital humanities-based entity classification model training and prediction method.

Background

[0002] Digital humanities is an innovation in the field of big data. In recent years it has emerged in various fields of the humanities and has attracted the attention of many scholars at home and abroad. With the continual iteration of digital technology and the wide adoption of personal computers, even liberal arts scholars can use digital technology in their research. To further improve scholars' research efficiency, there is an urgent need for a visual method of displaying the humanities information of past dynasties. Because a knowledge graph can structure human knowledge, the information in a humanities encyclopedia is usually expressed in this form. ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/35, G06F16/36, G06F40/242, G06F40/295
CPC: G06F16/35, G06F16/367, G06F40/242, G06F40/295
Inventor: 马宇柔, 滕康, 吕强, 印东敏, 段飞虎, 顾君, 张宏伟
Owner: 同方知网数字出版技术股份有限公司