Financial field term recognizing method based on information entropy and term credibility

A technology based on term credibility and information entropy, applied in natural language data processing, special data processing applications, instruments, etc. It addresses problems such as overfitting, time-consuming and labor-intensive feature engineering, and high model complexity.

Status: Inactive; publication date: 2016-11-09
DALIAN UNIV OF TECH
Cites: 6; Cited by: 16

AI Technical Summary

Problems solved by technology

This type of model transforms term recognition into a sequence labeling problem. It usually requires features to be added manually to fit the training data, and the select…
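As a minimal sketch of what casting term recognition as sequence labeling means, the Python below assigns one label per segmented word using a BIO scheme (B = first word of a term, I = inside a term, O = outside). The BIO tag set, the function name `to_bio`, and the example segmentation are assumptions for illustration; the excerpt does not state which tag set the patent uses.

```python
from typing import List, Tuple


def to_bio(words: List[str], term_spans: List[Tuple[int, int]]) -> List[str]:
    """Tag each segmented word as B (term start), I (inside a term) or O (outside).

    term_spans holds (start, end) word indices, with end exclusive.
    """
    labels = ["O"] * len(words)
    for start, end in term_spans:
        labels[start] = "B"
        for i in range(start + 1, end):
            labels[i] = "I"
    return labels


# Hypothetical segmentation of "the central bank cuts the deposit reserve ratio";
# the term 存款准备金率 spans words 2..4.
words = ["央行", "下调", "存款", "准备金", "率"]
print(to_bio(words, [(2, 5)]))  # ['O', 'O', 'B', 'I', 'I']
```

A CRF trained on such label sequences then predicts one label per word, and contiguous B/I runs are read back out as candidate terms.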

Method used




Embodiment Construction

[0041] Specific embodiments of the present invention are further described below with reference to the accompanying drawings and technical solutions.

[0042] 1. A CRF model is selected to perform sequence labeling on the financial corpus. The corpus selected for the present invention consists of 1600 Sina Finance news articles from 2014 to 2016, totaling more than 2 million words, from which 67,152 financial terms (including repetitions) are extracted. The corpus is divided into training and test sets at a ratio of 4:1 using five-fold cross-validation, and the word segmentation tool Nihao is used for word segmentation and part-of-speech tagging. Word vectors are trained with word2vec using the Skip-Gram model; the training corpus consists of financial news and financial newspaper texts from major portal websites from 2014 to 2016, totaling more than 8 million words. The vector dimension is set to 100 and the word window size to 5.
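The 4:1 split under five-fold cross-validation can be sketched as follows: the articles are shuffled once, partitioned into five folds, and each fold serves once as the test set while the other four form the training set. The function name `five_fold_splits`, the fixed seed, and splitting at the article level (rather than the sentence level) are assumptions, not details given in the excerpt.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def five_fold_splits(docs: Sequence[T], seed: int = 0) -> List[Tuple[List[T], List[T]]]:
    """Return five (train, test) pairs; each test fold holds ~1/5 of the docs (a ~4:1 split)."""
    indices = list(range(len(docs)))
    random.Random(seed).shuffle(indices)  # one shuffle shared by all folds
    k = 5
    folds = [indices[i::k] for i in range(k)]  # disjoint folds covering every index
    splits = []
    for fold in folds:
        test_idx = set(fold)
        train = [docs[j] for j in indices if j not in test_idx]
        test = [docs[j] for j in fold]
        splits.append((train, test))
    return splits


articles = [f"article_{n}" for n in range(1600)]
for train, test in five_fold_splits(articles):
    print(len(train), len(test))  # 1280 320 for every fold
```

For the word vectors, if gensim 4.x were used, the stated settings would correspond to `Word2Vec(sentences, vector_size=100, window=5, sg=1)`, where `sg=1` selects Skip-Gram; the excerpt does not say which word2vec implementation was used.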

[0043] 2. By analyzing the labeling res...



Abstract

The invention provides a financial field term recognition method based on information entropy and term credibility. Only simple features are selected, and financial terms are recognized with a CRF model. In the recognition results, candidate terms belonging to a specific error type are screened out by applying a threshold to an information entropy formula based on marginal probabilities, so that these candidates can be processed in a more targeted way. When the candidate terms are filtered, words are converted into word vectors carrying rich semantic information; because the similarity calculation method and the traditional mutual information method complement each other, a large number of financial field terms can be obtained through filtering. The method avoids the overly complicated feature selection process of existing machine learning models; its post-processing stage is flexible and not limited to a specific corpus; it readily increases the recall rate and improves the structural integrity of terms; and it can be used as a general term recognition method.
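The screening and filtering steps can be sketched as below. The excerpt names the ingredients (an information entropy over CRF marginal probabilities, a threshold, word-vector similarity, and mutual information) but not the exact formulas, so the following are assumptions: the entropy is the Shannon entropy of each word's marginal label distribution, averaged over a candidate's words; similarity is cosine similarity; mutual information is pointwise mutual information (PMI) between adjacent words; and the two filters are combined with a logical OR to reflect that they complement each other. All function names and threshold values are hypothetical.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical thresholds; the excerpt does not give the patent's values.
ENTROPY_THRESHOLD = 0.8
SIM_THRESHOLD = 0.6
PMI_THRESHOLD = 3.0


def label_entropy(marginals: Dict[str, float]) -> float:
    """Shannon entropy (bits) of one word's marginal label distribution from the CRF."""
    return -sum(p * math.log2(p) for p in marginals.values() if p > 0)


def term_entropy(word_marginals: List[Dict[str, float]]) -> float:
    """Assumed aggregation: mean entropy over the words of a candidate term."""
    return sum(label_entropy(m) for m in word_marginals) / len(word_marginals)


def is_uncertain(word_marginals: List[Dict[str, float]]) -> bool:
    """High entropy means the CRF was unsure, so the candidate goes to post-processing."""
    return term_entropy(word_marginals) > ENTROPY_THRESHOLD


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two word vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def pmi(count_xy: int, count_x: int, count_y: int, total: int) -> float:
    """Pointwise mutual information of the word pair (x, y) from corpus counts."""
    if count_xy == 0:
        return float("-inf")
    return math.log2((count_xy * total) / (count_x * count_y))


def keep_candidate(similarity: float, mutual_info: float) -> bool:
    """Assumed OR-combination: either signal alone is enough to keep a candidate."""
    return similarity >= SIM_THRESHOLD or mutual_info >= PMI_THRESHOLD


if __name__ == "__main__":
    confident = [{"B": 0.98, "I": 0.01, "O": 0.01}, {"B": 0.01, "I": 0.98, "O": 0.01}]
    uncertain = [{"B": 0.4, "I": 0.2, "O": 0.4}, {"B": 0.1, "I": 0.5, "O": 0.4}]
    print(is_uncertain(confident), is_uncertain(uncertain))  # False True
    print(keep_candidate(cosine([1.0, 0.2], [0.9, 0.3]), pmi(2, 50, 40, 100000)))
```

Averaging over words is only one plausible aggregation; taking the maximum instead would flag a candidate whenever any single word is uncertain.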

Description

Technical field
[0001] The invention relates to natural language processing, text mining, information processing, named entity recognition, and related fields. It focuses on the problem of term recognition and proposes a financial field term recognition method based on information entropy and term credibility. The method effectively improves the recall rate and the structural integrity of terms, and can be used as a general term recognition method.
Background art
[0002] With the in-depth development of the market economy, finance has become increasingly prominent in social life; it has become the most important strategic resource for economic operation and a powerful driver of regional economic development. Compared with other fields, terminology in the financial field is replaced more quickly. Rapid identification of financial terms therefore has high application value for tasks such as text mining, information extraction, and public opinion analysis in...

Claims


Application Information

IPC(8): G06F17/27
CPC: G06F40/205
Inventors: 黄德根 (Huang Degen), 梁晨 (Liang Chen)
Owner: DALIAN UNIV OF TECH