Genetic simulated annealing method for solving new words in Chinese segmentation

A technology combining simulated annealing with Chinese word segmentation, applied in genetic models, special data processing applications, instruments, etc. It can solve the problems of poor algorithm flexibility, "scattered strings" in automatic word segmentation results, and low accuracy.

Status: Inactive
Publication date: 2016-07-06
YUNNAN UNIV
Cites: 5; Cited by: 17

AI Technical Summary

Problems solved by technology

[0004] The purpose of the present invention is to provide a genetic simulated annealing method for solving new words in Chinese word segmentation, aiming to solve the problem that the word segmentation algorithms in current segmentation systems are inflexible and strongly domain-dependent.
[0005] It also solves the problem that, as new words constantly appear, automatic word segmentation results are prone to "scattered strings" and low accuracy.
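To make the "scattered string" problem concrete, here is a minimal sketch of forward maximum matching, a standard dictionary-based segmentation method. The patent does not say which baseline segmenter it uses; the toy lexicon and the example new word 区块链 ("blockchain") are assumptions chosen for illustration. When a new word is missing from the lexicon, greedy matching falls back to single characters and the word is broken into a scattered string.

```python
from typing import List, Set


def forward_max_match(text: str, lexicon: Set[str], max_len: int = 4) -> List[str]:
    """Greedy left-to-right segmentation: at each position take the longest
    lexicon word, falling back to a single character when nothing matches."""
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest window first and shrink it until a lexicon word
        # (or a single character) matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical toy lexicon that does not yet contain the new word 区块链 ("blockchain").
old_lexicon = {"技术", "发展"}
print(forward_max_match("区块链技术发展", old_lexicon))
# ['区', '块', '链', '技术', '发展']  <- the new word becomes a scattered string

# Once the new word is discovered and added to the lexicon, it is kept as one unit.
print(forward_max_match("区块链技术发展", old_lexicon | {"区块链"}))
# ['区块链', '技术', '发展']
```

The segmenter itself is unchanged between the two calls; only the lexicon differs, which is why discovering new words directly reduces scattered strings.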

Examples

Embodiment 1

[0125] Example 1: Genetic Simulated Annealing Algorithm for New Word Discovery

[0126] 1.1 Genetic simulated annealing algorithm strategy design

[0127] The genetic simulated annealing algorithm is widely used in many fields. From the perspective of materialist dialectics, things have both general and particular characteristics, and the genetic simulated annealing algorithm is no exception: in practical applications it shows both. Different fields therefore place different requirements on the algorithm and its strategies, and the particular strategy should be chosen according to the characteristics of the field under study. Applying the genetic simulated annealing algorithm to new word discovery likewise calls for optimizations specific to that task. To meet the efficiency and accuracy requirements of new word discovery, the experiment adopts the following strategies and methods to implement the algorithm.

[...
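The concrete strategies are elided in this excerpt. Whatever they are, the simulated annealing half of a genetic simulated annealing algorithm conventionally rests on the Metropolis acceptance rule and a cooling schedule. The standard form for maximizing a fitness f is shown below; the symbols f, x, x', T_k and α are notation chosen here, not taken from the patent.

```latex
P(\text{accept } x' \mid x) =
\begin{cases}
1, & f(x') \ge f(x), \\[4pt]
\exp\!\left(\dfrac{f(x') - f(x)}{T_k}\right), & f(x') < f(x),
\end{cases}
\qquad
T_{k+1} = \alpha\, T_k, \quad 0 < \alpha < 1 .
```

For example, at T_k = 1 a candidate whose fitness is lower by 1 is still accepted with probability e^{-1} ≈ 0.37, while as T_k falls toward 0 such downhill moves become rare. This is what lets the search escape local optima early and converge late.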

Abstract

The invention discloses a genetic simulated annealing method for solving new words in Chinese word segmentation. The method first uses a crawler program to acquire and intelligently search Internet information, completing data preparation; it then performs Chinese word segmentation on the acquired data using a dedicated lexicon, i.e., public opinion discovery. A genetic simulated annealing algorithm is proposed that combines the parallel operation and global convergence of the genetic algorithm with the local convergence of the simulated annealing algorithm, and it is designed into and applied in a public opinion monitoring system. The method solves the automatic word segmentation problem in the field of Chinese information processing: by combining the solution strategies of the genetic algorithm and the simulated annealing algorithm to handle the new words that keep appearing as society and the Internet develop, it improves segmentation accuracy and effectively solves the problems of scattered strings and segmentation errors in automatic segmentation results. The method plays an important role in observing, studying and analyzing dynamic changes in language phenomena, in standardizing language and writing, and in improving the overall effect of automatic Chinese word segmentation.
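The hybrid described in the abstract can be sketched as follows, under stated assumptions: GA operators (binary tournament selection, one-point crossover, bit-flip mutation) propose a child for each individual, and the simulated annealing Metropolis rule with a geometric cooling schedule decides whether the child replaces its parent. This is one common way to combine the two algorithms, not the patent's own design, whose details are not shown here. Encoding a segmentation as word-boundary bits, the toy scores in the hypothetical `TOY_SCORES` table, and all parameter values are illustrative assumptions.

```python
import math
import random
from typing import Callable, List, Tuple

Bits = List[int]


def decode(text: str, bits: Bits) -> List[str]:
    """bits[j] == 1 means a word boundary between text[j] and text[j + 1]."""
    words, start = [], 0
    for k, b in enumerate(bits, start=1):
        if b:
            words.append(text[start:k])
            start = k
    words.append(text[start:])
    return words


def gsa(fitness: Callable[[Bits], float], n_bits: int, pop_size: int = 30,
        generations: int = 200, t0: float = 1.0, alpha: float = 0.95,
        p_cross: float = 0.8, p_mut: float = 0.05, seed: int = 0) -> Tuple[Bits, float]:
    """Genetic simulated annealing sketch (maximization): GA operators create
    children, a Metropolis test decides whether each child replaces its parent."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    fits = [fitness(ind) for ind in pop]
    best_i = max(range(pop_size), key=fits.__getitem__)
    best, best_fit = pop[best_i][:], fits[best_i]
    temp = t0

    def tournament() -> int:
        # Binary tournament: index of the fitter of two random individuals.
        a, b = rng.randrange(pop_size), rng.randrange(pop_size)
        return a if fits[a] >= fits[b] else b

    for _ in range(generations):
        for i in range(pop_size):
            child = pop[i][:]
            if n_bits > 1 and rng.random() < p_cross:
                # One-point crossover with a tournament-selected mate.
                mate = pop[tournament()]
                cut = rng.randrange(1, n_bits)
                child = child[:cut] + mate[cut:]
            # Bit-flip mutation.
            child = [1 - g if rng.random() < p_mut else g for g in child]
            f_child = fitness(child)
            delta = f_child - fits[i]
            # Metropolis rule: always accept improvements, accept worse
            # children with probability exp(delta / temp).
            if delta >= 0 or rng.random() < math.exp(delta / temp):
                pop[i], fits[i] = child, f_child
                if f_child > best_fit:
                    best, best_fit = child[:], f_child
        temp *= alpha  # geometric cooling schedule
    return best, best_fit


# Hypothetical toy scores standing in for real corpus statistics of candidate words.
TOY_SCORES = {"区块链": 3.0, "技术": 2.0}
TEXT = "区块链技术"


def seg_fitness(bits: Bits) -> float:
    # Reward scored (candidate) words and penalize every other fragment by 1.
    return sum(TOY_SCORES.get(w, -1.0) for w in decode(TEXT, bits))


if __name__ == "__main__":
    best, score = gsa(seg_fitness, n_bits=len(TEXT) - 1)
    print(decode(TEXT, best), score)
```

With these toy scores the search settles on the segmentation 区块链 | 技术. Early on, the high temperature lets the population accept worse segmentations and explore; as the temperature drops, the Metropolis test behaves almost like greedy replacement. A real system would derive the fitness from corpus statistics for candidate new words, which this sketch does not attempt to model.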

Description

Technical field

[0001] The invention belongs to the technical field of Chinese search and retrieval, and in particular relates to a genetic simulated annealing method for solving new words in Chinese word segmentation.

Background art

[0002] With the growth of China's comprehensive national strength, Chinese occupies an increasingly important position on the world stage. At the same time, with the development of Internet technology, the volume of Chinese information keeps growing, and Chinese search and retrieval technology has advanced greatly. How to find the information and materials one needs in this vast body of Chinese information has become an increasingly important topic. In today's era of information explosion, relying solely on manual labor to process the rapidly growing mass of information is no longer realistic, and Chinese automatic word segmentation technology has emerged in response. Chinese automatic word segmentation is the basis of natur...

Application Information

Patent type & authority: Applications (China)
IPC(8): G06F17/27; G06N3/12
CPC: G06F40/242; G06F40/258; G06N3/12
Inventor: 康雁 (Kang Yan)
Owner: YUNNAN UNIV