A comprehensive utilization method of text features

A text feature engineering technique, applied in the field of artificial intelligence, that addresses the low efficiency of manual classification and aims to optimize hospital workflow, improve classification accuracy, and improve training results.

Pending Publication Date: 2019-04-26
JINAN INSPUR HIGH TECH TECH DEV CO LTD

AI Technical Summary

Problems solved by technology

Faced with rapidly accumulating data, manual classification can guarantee high accuracy but is far less efficient than machine learning methods. Classifying and summarizing data in the medical industry is imperative.



Examples


Embodiment Construction

[0041] The present invention will be further described below in conjunction with specific examples.

[0042] A comprehensive application method of text features. In this method, the corpus is processed with one fully consistent text preprocessing method, and a TFIDF feature engineering model and a Word2vec feature engineering model are then trained separately on it, so that the same corpus is represented by two different vector matrices. Each matrix captures a different aspect of the text: lexical salience in one case, contextual relevance in the other.

[0043] Here, TFIDF weights each word by its frequency, combining the raw term frequency with the inverse document frequency; Word2vec complements TFIDF by capturing the relevance of words in their context.
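The TFIDF weighting described above can be illustrated with a minimal sketch. The patent text does not give an exact formula, so the choices here are assumptions: term frequency is the raw count divided by document length, inverse document frequency is log(N / df) without smoothing, and the function name `tfidf_matrix` is hypothetical.

```python
import math
from collections import Counter

def tfidf_matrix(tokenized_docs):
    """Return (vocabulary, matrix), where matrix[i][j] is the TF-IDF
    weight of vocabulary[j] in document i.

    Assumptions: TF is the raw count divided by document length, and
    IDF is log(N / df) with no smoothing.
    """
    vocab = sorted({tok for doc in tokenized_docs for tok in doc})
    n_docs = len(tokenized_docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for doc in tokenized_docs for tok in set(doc))
    idf = {tok: math.log(n_docs / df[tok]) for tok in vocab}
    matrix = []
    for doc in tokenized_docs:
        counts = Counter(doc)
        length = len(doc) or 1  # guard against empty documents
        matrix.append([counts[tok] / length * idf[tok] for tok in vocab])
    return vocab, matrix

# Toy corpus of already tokenized documents (illustrative only).
docs = [["fever", "cough", "fever"], ["cough", "prescription"]]
vocab, matrix = tfidf_matrix(docs)
```

A word that appears in every document, such as "cough" here, receives a weight of zero under this unsmoothed variant, which is what makes TFIDF a measure of lexical salience rather than plain frequency.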

[0044] In one embodiment of the present invention, the text preprocessing method includes word segmentation and removal of stop words.
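A minimal sketch of such a preprocessing step follows. Whitespace splitting stands in for a real word segmenter, which the text does not name, and the stop-word list and the function name `preprocess` are hypothetical.

```python
# Hypothetical stop-word list; a real system would load a domain-specific one.
STOP_WORDS = {"the", "a", "of", "is", "and", "was"}

def preprocess(text, stop_words=STOP_WORDS):
    """Segment text into tokens and drop stop words.

    Whitespace splitting is a stand-in for a real word segmenter.
    """
    tokens = text.lower().split()
    return [t for t in tokens if t not in stop_words]

corpus = [
    "The patient was prescribed a course of antibiotics",
    "Record of the consultation and the medical order",
]
# The same function is applied to every document, so both feature
# models later see an identically preprocessed corpus.
tokenized = [preprocess(doc) for doc in corpus]
```

Routing every document through one function is the simplest way to keep the preprocessing identical for the TFIDF and Word2vec models.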

[0045] Then simply concatenate the obtained two vector matrice...
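The concatenation step might look like the sketch below. The text does not say how Word2vec word vectors become one row per document, so mean pooling is used here purely as an assumption; the toy vectors and the helper names `mean_pool` and `concatenate` are hypothetical.

```python
def mean_pool(tokens, word_vectors, dim):
    """Average the word vectors of a document's tokens (an assumed
    pooling choice; unknown words are skipped)."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[k] for v in known) / len(known) for k in range(dim)]

def concatenate(matrix_a, matrix_b):
    """Row-wise concatenation: row i of the result is row i of
    matrix_a followed by row i of matrix_b."""
    if len(matrix_a) != len(matrix_b):
        raise ValueError("both matrices must describe the same documents")
    return [list(a) + list(b) for a, b in zip(matrix_a, matrix_b)]

# Toy inputs: a 2x3 TF-IDF-style matrix and hypothetical 2-d word vectors.
tfidf_rows = [[0.0, 0.5, 0.0], [0.0, 0.0, 0.3]]
word_vectors = {"fever": [1.0, 0.0], "cough": [0.0, 1.0]}
docs = [["fever", "cough"], ["cough", "unknown"]]
w2v_rows = [mean_pool(doc, word_vectors, dim=2) for doc in docs]
combined = concatenate(tfidf_rows, w2v_rows)
```

Each row of `combined` has the TFIDF dimensions followed by the Word2vec dimensions, so the result has as many rows as documents and a width equal to the sum of the two feature widths.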


Abstract

The invention discloses a comprehensive application method of text features. The method belongs to the technical field of artificial intelligence and comprises the following steps: processing a corpus with one fully consistent text preprocessing method, then separately training a TFIDF feature engineering model and a Word2vec feature engineering model to obtain the same corpus represented by two different vector matrices; then simply concatenating the two vector matrices into a single vector matrix of higher dimension, and training a classification task model on that matrix. The method combines the complementary strengths of TFIDF and Word2vec, so that both the salience of a word in a document and its contextual relevance are described more comprehensively and accurately, which improves the accuracy of the subsequently trained classification model.
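The final step, training a classification model on the concatenated matrix, can be sketched as follows. The abstract does not name a classifier, so a nearest-centroid classifier is swapped in here only as a simple stand-in; the feature values, labels, and function names are hypothetical.

```python
def fit_centroids(rows, labels):
    """Compute one mean feature vector per class label."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for k, value in enumerate(row):
            acc[k] += value
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

def predict(centroids, row):
    """Return the label whose centroid is nearest in squared
    Euclidean distance."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(centroids, key=lambda lab: dist(centroids[lab]))

# Toy concatenated features: the first two columns play the role of
# TF-IDF weights, the last two the role of pooled Word2vec values.
train_rows = [
    [0.9, 0.1, 0.8, 0.2],
    [0.8, 0.0, 0.9, 0.1],
    [0.1, 0.9, 0.2, 0.7],
    [0.0, 0.8, 0.1, 0.9],
]
train_labels = ["prescription", "prescription", "consultation", "consultation"]
centroids = fit_centroids(train_rows, train_labels)
predicted = predict(centroids, [0.7, 0.2, 0.7, 0.3])
```

Any classifier that accepts a fixed-width feature matrix could take the place of the centroid model, since the concatenated matrix is an ordinary document-by-feature table.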

Description

Technical field

[0001] The invention relates to the technical field of artificial intelligence, in particular to a comprehensive application method of text features.

Background technique

[0002] In the practice of smart medical care, a large amount of data is generated continuously, such as the patient's health status, prescriptions, medical orders, course records, and consultation records. With the current vigorous development of smart medical care, classifying these data after they are collected and stored has far-reaching significance. Classification not only helps to manage the data better for future analysis and retrieval, but also reveals the distribution and internal patterns of the data. Faced with rapidly accumulating data, manual classification can guarantee high accuracy but is far less efficient than machine learning methods. Classifying and summarizing data in the medical industry is imperative. ...


Application Information

IPC(8): G06F17/27, G06F16/35, G06N3/04
CPC: G06F40/216, G06F40/289, G06N3/045
Inventor: 段强, 李锐, 高明, 于治楼
Owner: JINAN INSPUR HIGH TECH TECH DEV CO LTD