
Text information classifying method based on segmented encoding genetic algorithm

A text information classification technology based on a genetic algorithm, applied in text database clustering/classification, genetic models, unstructured text data retrieval, and related areas. It addresses the problems of high computational complexity, difficulty of practical application, and difficulty classifying large-scale text sets.

Status: Inactive. Publication date: 2016-07-20
NANJING UNIV OF SCI & TECH
Cites: 4 · Cited by: 5

AI Technical Summary

Problems solved by technology

These two types of algorithms have high computational complexity and struggle with the classification of large-scale text sets. Classification takes too long to be usable in practical applications, so they have little engineering application value.



Examples


Embodiment

[0034] In this embodiment, patients' descriptions of their illnesses are used as an example for classifying diseases. The research object is the text in which hospital patients describe their condition, such as "I have a headache" or "My leg bone is broken". An initial population size is given (it can be a fixed value such as 10000, indicating the number of text items in the training sample).
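Before any evolution step, the patient texts have to be turned into numbers. The following is a minimal sketch of such a "text information matrix", assuming the texts are already word-segmented (Chinese text would first need a word segmenter) and assuming a binary presence encoding; the function name `text_matrix` and the encoding are illustrative choices, not the patent's stated preprocessing. Each text becomes one row, so the number of rows corresponds to the population size described above.

```python
from typing import Dict, List, Sequence, Tuple


def text_matrix(docs: Sequence[Sequence[str]]) -> Tuple[List[str], List[List[int]]]:
    """Build a binary text-feature matrix (hypothetical preprocessing step).

    Each document is a list of already-segmented tokens. Entry [r][j] is 1
    when feature vocab[j] occurs in document r, otherwise 0.
    """
    # A sorted vocabulary gives a stable, reproducible column order.
    vocab = sorted({tok for doc in docs for tok in doc})
    index: Dict[str, int] = {tok: j for j, tok in enumerate(vocab)}
    matrix: List[List[int]] = []
    for doc in docs:
        row = [0] * len(vocab)
        for tok in doc:
            row[index[tok]] = 1  # presence only; repeated tokens still give 1
        matrix.append(row)
    return vocab, matrix


# Two example texts, pre-segmented by hand for illustration.
docs = [["I", "very", "headache"], ["my", "leg", "bone", "broken"]]
vocab, matrix = text_matrix(docs)
print(len(matrix), len(vocab))  # → 2 7
```

A term-frequency or TF-IDF weighting could replace the 0/1 entries without changing the row-per-text layout.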

[0035] In this embodiment, the text information is divided into $t$ classes, where $t \ge 2$ is the number of disease types that can be distinguished; the classes are denoted $C_1, C_2, \ldots, C_t$. Class $C_i$ ($1 \le i \le t$) is represented by $k_i$ preset features. For example, "I have a headache" and "my head was hit by a book" form one class, whose features are {"I", "very", "headache", "of", "been", "book", "hit", "had"}, 8 features in total; then the text information features contain 8 features in total, set the fea...
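Since each class $C_i$ has its own $k_i$ features, a chromosome can be laid out as $t$ consecutive segments, with segment $i$ holding $k_i$ genes. The sketch below assumes a binary feature-selection encoding (1 = keep the feature) and a hypothetical per-class quota `keep[i]` standing in for the given dimensionality-reduction feature number; the excerpt does not spell out the actual gene encoding, so treat this as one plausible reading.

```python
import random
from typing import List, Sequence, Tuple


def segment_bounds(ks: Sequence[int]) -> List[Tuple[int, int]]:
    """(start, end) offsets of each class segment; segment i spans ks[i] genes."""
    bounds, start = [], 0
    for k in ks:
        bounds.append((start, start + k))
        start += k
    return bounds


def random_chromosome(ks: Sequence[int], keep: Sequence[int], rng: random.Random) -> List[int]:
    """Random segmented chromosome with exactly keep[i] selected genes in segment i."""
    genes: List[int] = []
    for k, m in zip(ks, keep):
        segment = [1] * m + [0] * (k - m)
        rng.shuffle(segment)  # choose which m of the k features are kept
        genes.extend(segment)
    return genes


# Two hypothetical classes: the 8-feature headache class and a 5-feature class.
ks, keep = [8, 5], [3, 2]
chrom = random_chromosome(ks, keep, random.Random(42))
for i, (s, e) in enumerate(segment_bounds(ks)):
    print(f"C_{i + 1}: genes {s}..{e - 1} ->", chrom[s:e])
```

Because the segments never overlap, each one can be handled by a separate worker, which is the kind of parallelism the abstract attributes to segmented encoding.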


Abstract

The invention relates to intelligent manufacturing information analysis technology, in particular to a text information classification method based on a segmented-encoding genetic algorithm. The method mainly includes the following steps: a corresponding text information matrix, which determines the population size, is generated using text preprocessing; a number of features for dimensionality reduction is specified; an initial population is generated randomly; the chromosome with the largest optimization objective function value is marked and that value is recorded; the chromosome uses segmented encoding (each segment corresponds to one class), a new population is produced by crossover and mutation of the initial population, and the objective function values of the optimized population are calculated. An optimized text classification is thereby produced. Because the classification is based on segmented encoding and segment-wise crossover, the method addresses the low efficiency that a genetic algorithm suffers on large data volumes; segmented encoding and crossover also allow distributed processing and parallel operation, which can greatly improve the efficiency of subsequent data processing.
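The evolution steps listed in the abstract can be sketched as a small genetic algorithm. This is an illustrative sketch, not the patent's implementation: the segment-level crossover (each segment copied whole from one parent), the count-preserving swap mutation, tournament selection, elitism, and the `fitness` callable are all assumptions, since the actual optimization objective is not given in this excerpt.

```python
import random
from typing import Callable, List, Sequence, Tuple

Chromosome = List[int]


def segment_bounds(ks: Sequence[int]) -> List[Tuple[int, int]]:
    """(start, end) offsets of each class segment."""
    bounds, start = [], 0
    for k in ks:
        bounds.append((start, start + k))
        start += k
    return bounds


def random_chromosome(ks, keep, rng) -> Chromosome:
    """Exactly keep[i] selected genes (1s) inside segment i."""
    genes: Chromosome = []
    for k, m in zip(ks, keep):
        seg = [1] * m + [0] * (k - m)
        rng.shuffle(seg)
        genes.extend(seg)
    return genes


def segmented_crossover(a, b, bounds, rng) -> Chromosome:
    """Child copies every whole segment from one randomly chosen parent."""
    child: Chromosome = []
    for s, e in bounds:
        child.extend((a if rng.random() < 0.5 else b)[s:e])
    return child


def segmented_mutation(c, bounds, rate, rng) -> Chromosome:
    """Per segment, with probability `rate`, swap one selected and one unselected gene."""
    c = c[:]
    for s, e in bounds:
        if rng.random() < rate:
            ones = [i for i in range(s, e) if c[i] == 1]
            zeros = [i for i in range(s, e) if c[i] == 0]
            if ones and zeros:
                i, j = rng.choice(ones), rng.choice(zeros)
                c[i], c[j] = 0, 1  # keeps the segment's selected-feature count
    return c


def tournament(pop, fitness, rng) -> Chromosome:
    """Binary tournament: the fitter of two random chromosomes becomes a parent."""
    x, y = rng.sample(pop, 2)
    return x if fitness(x) >= fitness(y) else y


def run_ga(ks, keep, fitness: Callable[[Chromosome], float],
           pop_size=30, generations=50, mut_rate=0.3, seed=0):
    rng = random.Random(seed)
    bounds = segment_bounds(ks)
    pop = [random_chromosome(ks, keep, rng) for _ in range(pop_size)]
    best = max(pop, key=fitness)  # mark the best chromosome
    history = [fitness(best)]     # and record its objective value
    for _ in range(generations):
        nxt = [best[:]]           # elitism: the recorded best value never drops
        while len(nxt) < pop_size:
            child = segmented_crossover(tournament(pop, fitness, rng),
                                        tournament(pop, fitness, rng), bounds, rng)
            nxt.append(segmented_mutation(child, bounds, mut_rate, rng))
        pop = nxt
        best = max(pop, key=fitness)
        history.append(fitness(best))
    return best, history


# Hypothetical objective: sum of per-feature weights of the selected features.
weights = [0.9, 0.1, 0.4, 0.8, 0.2, 0.7, 0.3, 0.05, 0.6, 0.2, 0.9, 0.1, 0.3]


def score(c: Chromosome) -> float:
    return sum(w for w, g in zip(weights, c) if g)


best, history = run_ga([8, 5], [3, 2], score)
print(best, round(history[-1], 2))
```

Because crossover and mutation never move genes across segment boundaries, each class's segment could in principle be evolved or evaluated on a separate worker, matching the distributed, parallel processing the abstract claims for segmented encoding.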

Description

Technical field

[0001] The invention relates to a text information classification method based on a segmented-encoding genetic algorithm, and belongs to the technical field of intelligent information analysis.

Background

[0002] According to incomplete statistics, the total amount of scientific and technological information available on the global Internet exceeds 20 TB and is growing at more than 5% per year. While the Internet brings massive amounts of text, it also creates problems. On the one hand, users have a large demand for text information; on the other hand, the large-scale accumulation of text means users must spend a lot of time obtaining the information they want, and sometimes they cannot find the required information in a timely and accurate manner.

[0003] How to classify, retrieve, and manage such a large amount of text information has long been a research hotspot. For example, taking hospital ...


Application Information

IPC (8): G06F17/30; G06F17/22; G06F17/15; G06N3/12
CPC: G06F16/35; G06F17/15; G06F40/146; G06N3/126
Inventors: 童一飞, 裴凤雀, 周开俊, 江松, 卓兴成, 李东波, 何非
Owner: NANJING UNIV OF SCI & TECH