Word forming determination model generation method, and new word discovery method and device

A word formation judgment model and new word discovery technology, applied in the field of computer networks, that addresses problems limiting the accuracy of new word recognition and thereby improves that accuracy.

Inactive Publication Date: 2017-12-26
CAINIAO SMART LOGISTICS HLDG LTD

AI Technical Summary

Problems solved by technology

This appears to be circular: the accuracy of word segmentation itself depends on the completeness of the existing lexicon. If a word is not included in the lexicon, how can we trust the resu



Examples


Example Embodiment

[0064] In order to make the objectives, technical solutions, and advantages of the present application clearer, the embodiments of the present application will be described in detail below in conjunction with the accompanying drawings. It should be noted that the embodiments in the application and the features in the embodiments can be combined with each other arbitrarily if there is no conflict.

[0065] In a typical configuration of this application, the computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

[0066] The memory may include volatile memory in computer-readable media, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of computer-readable media.

[0067] Computer-readable media include permanent and non-permanent, removable and non-removable media, and information storage can be realized by any method or technology. T...


Abstract

The invention discloses a word forming determination model generation method, and a new word discovery method and device. The new word discovery method includes: preprocessing a text to extract a plurality of text blocks; obtaining the word frequency, cohesion degree, and coupling degree of each text block as that block's word forming feature information; and classifying each text block using a pre-generated word forming determination model and the word forming feature information, so as to recognize new words. New words can thus be discovered automatically, and because the word forming feature information of each text block includes the word frequency, cohesion degree, and coupling degree, the accuracy of new word recognition can be improved.
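The pipeline summarized in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the excerpt does not define the cohesion degree or coupling degree, so the sketch assumes a common formulation (cohesion as the smallest ratio of a block's probability to the product of its parts' probabilities, coupling as the smaller of the left and right neighbor entropies). The function names and `threshold_model`, a stand-in for the pre-generated word forming determination model, are hypothetical.

```python
import math
from collections import Counter, defaultdict


def extract_blocks(text, max_len=3):
    """Return every substring of 1..max_len characters as a candidate text block."""
    return [text[i:i + n]
            for n in range(1, max_len + 1)
            for i in range(len(text) - n + 1)]


def entropy(counter):
    """Shannon entropy (natural log) of a Counter of neighbor characters."""
    total = sum(counter.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log(c / total) for c in counter.values())


def word_forming_features(text, max_len=3):
    """Map each multi-character block to (frequency, cohesion, coupling).

    ASSUMPTION: the cohesion and coupling formulas below are illustrative
    choices; the source excerpt does not give the patent's definitions.
    """
    counts = Counter(extract_blocks(text, max_len))
    total = len(text)
    left, right = defaultdict(Counter), defaultdict(Counter)
    for n in range(2, max_len + 1):
        for i in range(len(text) - n + 1):
            block = text[i:i + n]
            if i > 0:
                left[block][text[i - 1]] += 1
            if i + n < len(text):
                right[block][text[i + n]] += 1
    features = {}
    for block, freq in counts.items():
        if len(block) < 2:
            continue
        p = freq / total
        # Cohesion: weakest binary split of the block.
        cohesion = min(
            p / ((counts[block[:k]] / total) * (counts[block[k:]] / total))
            for k in range(1, len(block))
        )
        # Coupling: how freely the block combines with its neighbors.
        coupling = min(entropy(left[block]), entropy(right[block]))
        features[block] = (freq, cohesion, coupling)
    return features


def threshold_model(freq, cohesion, coupling):
    """Hypothetical stand-in for the pre-generated classification model."""
    return freq >= 2 and cohesion >= 2.0 and coupling > 0.0


def discover_new_words(text, model, known_words=frozenset()):
    """Classify every block and return those judged to be new words."""
    features = word_forming_features(text)
    return sorted(block for block, f in features.items()
                  if block not in known_words and model(*f))


print(discover_new_words("xabyabzab", threshold_model))  # ['ab']
```

In practice the hand-written thresholds would be replaced by a classifier trained on labeled text blocks, which is what "pre-generated model" suggests; the feature extraction step would stay the same.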

Description

Technical field

[0001] The present application relates to computer network technology, and in particular to a method for generating a word formation judgment model and a method and device for discovering new words.

Background

[0002] Processing Chinese text involves difficulties that are uncommon in other languages, such as Chinese word segmentation. A Chinese text is a sequence of Chinese characters with no explicit boundaries between words. Word segmentation adds word boundary marks so that the resulting word string fully reflects the original meaning of the sentence. So how does the computer know whether the segmentation of "combined into molecules" should be "combined / synthesized / molecule", "combined / formed / molecule", or "combined / component / sub"? This is the ambiguity problem in Chinese word segmentation, and many word segmentation m...
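The ambiguity described above can be reproduced with a short sketch that enumerates every dictionary-consistent segmentation of a string. The Chinese string 结合成分子 and the small lexicon are assumptions: they are a plausible original for the translated example "combined into molecules" and are not taken from the text shown here. The function name `segmentations` is likewise hypothetical.

```python
def segmentations(text, lexicon):
    """Enumerate every way to split `text` into words found in `lexicon`."""
    if not text:
        return [[]]
    results = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in lexicon:
            # Keep this word and segment the remainder recursively.
            for rest in segmentations(text[end:], lexicon):
                results.append([head] + rest)
    return results


# ASSUMPTION: illustrative lexicon and sentence, not from the patent text.
lexicon = {"结", "结合", "合成", "成", "成分", "分子", "子"}
for words in segmentations("结合成分子", lexicon):
    print(" / ".join(words))
```

A purely dictionary-based segmenter has no basis for choosing among the three candidates this prints, and it cannot propose a word that is missing from the lexicon at all, which is the gap new word discovery is meant to fill.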


Application Information

IPC(8): G06F17/27
CPC: G06F40/284
Inventors: 王国印, 郑恒
Owner: CAINIAO SMART LOGISTICS HLDG LTD