Word segmentation method and terminal device based on machine learning

A machine learning-based word segmentation method, applied in the computer field, that addresses the low accuracy of word segmentation.

Pending Publication Date: 2019-03-19
PING AN TECH (SHENZHEN) CO LTD
Cites: 0; Cited by: 6

AI Technical Summary

Problems solved by technology

[0004] In view of this, the embodiments of the present invention provide a word segmentation method and terminal device...




Embodiment Construction

[0031] In the following description, specific details such as particular system structures and techniques are set forth for purposes of illustration rather than limitation, in order to provide a thorough understanding of the embodiments of the present invention. However, it will be apparent to those skilled in the art that the invention may be practiced in other embodiments without these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.

[0032] The technical solutions of the present invention are described below with reference to specific embodiments.

[0033] An embodiment of the present invention provides a word segmentation method based on machine learning. Referring to Figure 1, the method includes:

[0034] S101. Acquire text data that has been manually segmented.
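The source does not say how the manually segmented text is stored or labelled. As a minimal sketch, assume each line holds one sentence with words separated by whitespace, and assume the widely used B/M/E/S character-tagging scheme (Begin, Middle, End, Single-character word). Neither the file format nor the tag set comes from the patent, and `load_segmented_line` is a hypothetical helper name.

```python
def load_segmented_line(line: str) -> list[tuple[str, str]]:
    """Convert one manually segmented line into (character, tag) pairs.

    Assumption: words are whitespace-separated; tags follow the BMES scheme,
    which is illustrative and not specified by the patent.
    """
    pairs: list[tuple[str, str]] = []
    for word in line.split():
        if len(word) == 1:
            pairs.append((word, "S"))      # single-character word
            continue
        pairs.append((word[0], "B"))       # first character of a multi-character word
        for ch in word[1:-1]:
            pairs.append((ch, "M"))        # interior characters
        pairs.append((word[-1], "E"))      # last character
    return pairs


if __name__ == "__main__":
    print(load_segmented_line("我们 喜欢 。"))
    # [('我', 'B'), ('们', 'E'), ('喜', 'B'), ('欢', 'E'), ('。', 'S')]
```

Character-level tags like these are one common way to turn manual segmentation into per-character training labels; any other boundary encoding would serve the same purpose.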

[0035] Optionally, the code that is encapsulated...



Abstract

The invention provides a machine learning-based word segmentation method and terminal device. The method comprises: obtaining manually segmented text data; determining the character type of each character in the text data; obtaining a feature vector for each character from its own type, the types of a first preset number of characters immediately preceding it in the text data, and the types of a second preset number of characters immediately following it, thereby obtaining a training set; constructing a word segmentation model and training it on the training set; and segmenting the text to be processed with the trained model. Because the feature vectors are built from the type relationships between each character and its context characters, the invention adapts better to segmenting different texts and thereby improves segmentation precision.
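The feature construction described in the abstract can be sketched as follows. Several pieces are assumptions rather than details from the patent: the type inventory (`HAN`, `DIGIT`, `LATIN`, `OTHER`), the `PAD` marker for window positions outside the text, the parameters `before` and `after` standing in for the first and second preset numbers, the "a word starts here" label, and the model itself, which is a deliberately simple lookup table (most frequent label per feature vector) rather than whatever model the patent trains. All function names are hypothetical.

```python
from collections import Counter, defaultdict

PAD = "PAD"  # hypothetical marker for window positions outside the text


def char_type(ch: str) -> str:
    """Hypothetical character-type inventory; the patent does not list its types."""
    if "\u4e00" <= ch <= "\u9fff":
        return "HAN"      # CJK Unified Ideographs (basic block only)
    if ch.isdigit():
        return "DIGIT"
    if ch.isascii() and ch.isalpha():
        return "LATIN"
    return "OTHER"        # punctuation, symbols, everything else


def feature_vector(text: str, i: int, before: int = 2, after: int = 2) -> tuple[str, ...]:
    """Types of the `before` preceding characters, character i, and the `after` following ones."""
    return tuple(
        char_type(text[j]) if 0 <= j < len(text) else PAD
        for j in range(i - before, i + after + 1)
    )


def build_training_set(sentences: list[list[str]], before: int = 2, after: int = 2):
    """Turn manually segmented sentences (lists of words) into (features, label) pairs.

    The label is True when a word starts at that character.
    """
    samples = []
    for words in sentences:
        text = "".join(words)
        starts, pos = set(), 0
        for w in words:
            starts.add(pos)
            pos += len(w)
        for i in range(len(text)):
            samples.append((feature_vector(text, i, before, after), i in starts))
    return samples


def train(samples):
    """Toy stand-in for a real learner: most frequent label for each feature vector."""
    counts = defaultdict(Counter)
    for feats, label in samples:
        counts[feats][label] += 1
    return {feats: c.most_common(1)[0][0] for feats, c in counts.items()}


def segment(model, text: str, before: int = 2, after: int = 2) -> list[str]:
    """Start a new word wherever the model predicts one; unseen vectors mean 'continue'."""
    words: list[str] = []
    for i, ch in enumerate(text):
        if i == 0 or model.get(feature_vector(text, i, before, after), False):
            words.append(ch)
        else:
            words[-1] += ch
    return words


if __name__ == "__main__":
    corpus = [["价格", "100", "元"], ["买", "3", "本", "书"]]
    model = train(build_training_set(corpus, before=1, after=1))
    print(segment(model, "好书25元", before=1, after=1))  # ['好书', '25', '元']
```

Because this sketch uses only type features, it can learn boundaries only where the type pattern changes, such as between Han characters and digits; a run of Han characters always produces the same vector. The patent's own type set and model may differ.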

Description

Technical Field

[0001] The invention belongs to the technical field of computers, and in particular relates to a machine learning-based word segmentation method and terminal device.

Background

[0002] In natural language processing and computational linguistics, new words (neologisms) are words that have never appeared before or that are not included in the dictionary. With the continuous development of Internet technology, a variety of new words have emerged across industries; in particular, Web 2.0 applications, which allow users to create web content themselves, have produced a large number of new words.

[0003] In Chinese information processing, word segmentation is an important basic technology because, unlike English and other Western languages, Chinese has no fixed separators between words. The emergence of new words greatly affects the accuracy of automatic word segmentation tools, making word segmenta...


Application Information

IPC(8): G06F17/27
CPC: G06F40/289
Inventor: 吴壮伟 (Wu Zhuangwei)
Owner: PING AN TECH (SHENZHEN) CO LTD