Word segmentation method and system for language text

A word segmentation method for language text, applied in the field of word segmentation methods and systems for language text, that can address problems such as wasted resources and difficult system maintenance.

Active Publication Date: 2021-02-09
HUAWEI TECH CO LTD
9 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

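Different applications place different requirements on word segmentation output; serving them with separate word segmentation systems wastes resources and makes system maintenance difficult.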

Embodiment Construction

[0040] For ease of understanding, the first word segmentation method is referred to below as the simple word segmentation method, and its corresponding module as the simple word segmentation module. The simple word segmentation method can use a word segmentation algorithm that is fast and highly consistent, including but not limited to the shortest-path word segmentation algorithm. The second word segmentation method is referred to below as the complex word segmentation method, and its corresponding module as the complex word segmentation module. The complex word segmentation method can use a word segmentation algorithm with high accuracy and high algorithmic complexity, including but not limited to word segmentation algorithms based on word tagging methods.
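To make the "simple" method concrete, the following is a minimal sketch of one common dictionary-based formulation of shortest-path word segmentation: every lexicon word (and, as a fallback, every single character) is an edge between character positions, and the segmentation with the fewest words is chosen. The function name, the `LEXICON` contents, and the `max_word_len` parameter are illustrative assumptions and do not come from the patent text.

```python
from typing import List, Set

# Hypothetical toy lexicon, used only for illustration.
LEXICON: Set[str] = {"南京", "南京市", "市长", "长江", "大桥", "长江大桥", "江大桥"}


def shortest_path_segment(text: str, lexicon: Set[str], max_word_len: int = 4) -> List[str]:
    """Return the segmentation of `text` that uses the fewest words."""
    n = len(text)
    inf = float("inf")
    # best[i] = minimum number of words needed to cover text[:i]
    best = [0] + [inf] * n
    # back[i] = start position of the last word on the best path ending at i
    back = [0] * (n + 1)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_word_len), end):
            piece = text[start:end]
            # Single characters are always allowed, so a path always exists.
            if (len(piece) == 1 or piece in lexicon) and best[start] + 1 < best[end]:
                best[end] = best[start] + 1
                back[end] = start
    # Walk the back-pointers from the end of the text to recover the words.
    words: List[str] = []
    pos = n
    while pos > 0:
        words.append(text[back[pos]:pos])
        pos = back[pos]
    return words[::-1]


result = shortest_path_segment("南京市长江大桥", LEXICON)
```

Because the result depends only on the lexicon and a fixed tie-breaking order, the same input always yields the same output, which is the consistency property the paragraph above attributes to simple methods.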

[0041] Figure 1 is a structural example diagram of the word segmentation system of the embodime...

Abstract

Embodiments of the present invention provide a word segmentation method and system for a language text. The method includes: acquiring a first language text to be processed and a credibility threshold; segmenting the first language text with a first word segmentation method to obtain a first word boundary set; dividing the first word boundary set, according to the credibility threshold, into a credible second word boundary set and a non-credible third word boundary set; selecting, according to the third word boundary set, a second language text from the first language text, where the second language text includes the words corresponding to each word boundary in the third word boundary set; segmenting the second language text with a second word segmentation method to obtain a fourth word boundary set; and determining the second word boundary set and the fourth word boundary set as the word segmentation result of the first language text. By adjusting the credibility threshold, the word segmentation accuracy applied to the first language text can be tuned flexibly, so that the method adapts to application scenarios with different accuracy requirements.
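The steps in the abstract can be sketched as a small pipeline. This is a hedged illustration, not the patented implementation: the abstract does not say how credibility is computed, so the sketch assumes the first segmenter returns a score with each word span. The toy segmenters, the `LEXICON`, and the choice to merge adjacent non-credible spans into one region are all assumptions made for the example.

```python
from typing import Callable, List, Tuple

Span = Tuple[int, int]                # (start, end) offsets of one word
ScoredSpan = Tuple[int, int, float]   # span plus an assumed credibility score

LEXICON = {"i", "love"}   # hypothetical toy lexicon
MAX_LEN = 4               # hypothetical maximum word length


def toy_simple_segment(text: str) -> List[ScoredSpan]:
    """Greedy longest match; lexicon words score 1.0, unknown characters 0.0."""
    spans: List[ScoredSpan] = []
    pos = 0
    while pos < len(text):
        for end in range(min(len(text), pos + MAX_LEN), pos, -1):
            if text[pos:end] in LEXICON:
                spans.append((pos, end, 1.0))
                pos = end
                break
        else:
            spans.append((pos, pos + 1, 0.0))
            pos += 1
    return spans


def toy_complex_segment(text: str) -> List[Span]:
    """Stand-in for an accurate, slower model: treats the region as one word."""
    return [(0, len(text))] if text else []


def two_stage_segment(
    text: str,
    simple: Callable[[str], List[ScoredSpan]],
    complex_: Callable[[str], List[Span]],
    threshold: float,
) -> List[Span]:
    scored = simple(text)                                   # first boundary set
    trusted = [(s, e) for s, e, c in scored if c >= threshold]   # second set
    untrusted = [(s, e) for s, e, c in scored if c < threshold]  # third set
    # Assumption: adjacent non-credible spans form one region of "second text".
    regions: List[Span] = []
    for s, e in untrusted:
        if regions and regions[-1][1] == s:
            regions[-1] = (regions[-1][0], e)
        else:
            regions.append((s, e))
    # Re-segment each region and shift offsets back into the full text.
    resegmented = [
        (region_start + s, region_start + e)                # fourth set
        for region_start, region_end in regions
        for s, e in complex_(text[region_start:region_end])
    ]
    return sorted(trusted + resegmented)


text = "ilovenlp"
strict = [text[s:e] for s, e in two_stage_segment(text, toy_simple_segment, toy_complex_segment, 0.5)]
lenient = [text[s:e] for s, e in two_stage_segment(text, toy_simple_segment, toy_complex_segment, 0.0)]
```

With a threshold of 0.5 the unknown characters are handed to the second segmenter, while a threshold of 0.0 accepts every span from the first segmenter. That is the accuracy versus speed adjustment the abstract describes.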

Description

Technical field

[0001] Embodiments of the present invention relate to the field of natural language processing and, more specifically, to a method and system for word segmentation of language text.

Background

[0002] Word segmentation is one of the fundamental problems in natural language processing. All languages without word boundary markers (such as Chinese, Japanese, Arabic, etc.) face the word segmentation problem. Word segmentation systems are widely used in information retrieval, machine translation, question answering systems and other fields.

[0003] Different applications have different requirements for the output of the word segmentation system. For example, information retrieval systems have high requirements for word segmentation speed and consistency. However, information retrieval systems have relatively low requirements for the correctness of word segmentation, such as lower requirements for the recognition rate of unregistered words (wo...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F40/284, G06K9/62
CPC: G06F40/284, G06F18/285, G06F40/53
Inventor: 陈晓, 李航
Owner: HUAWEI TECH CO LTD