String segmentation method and device

A string segmentation technology, applied in the field of Internet search, that addresses the problem of low word segmentation accuracy and improves segmentation accuracy.

Active Publication Date: 2017-05-24
ALIBABA GRP HLDG LTD

AI Technical Summary

Problems solved by technology

[0007] The embodiments of the present application provide a character string segmentation method and device.

Examples

Embodiment 1

[0025] Embodiment 1 of the present application provides a character string segmentation method that is applicable to segmenting character strings composed mainly of numeric characters and English characters (referred to as numeric character strings for short). The embodiments of the present application do not elaborate further on this. Specifically, Figure 1 is a schematic flow chart of the string segmentation method described in Embodiment 1 of the present application. As shown there, the string segmentation method may include the following steps:

[0026] Step 101: Determine the character string to be segmented;

[0027] Step 102: Determine the category to which the character string to be segmented belongs, and select the corresponding character string segmentation language model according to that category; wherein the character string segmentation language model is based on the word freque...
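Steps 101 and 102 can be illustrated with a minimal Python sketch. It assumes that each category has its own word-frequency table built from an already segmented corpus. The category rule (`categorize`, which looks only at the first character), the function names, and the tiny corpus are hypothetical illustrations; the visible text does not say how categories are defined.

```python
from collections import Counter


def categorize(s: str) -> str:
    """Hypothetical category rule: classify by the type of the first character."""
    return "digit_first" if s[:1].isdigit() else "letter_first"


def build_models(segmented_corpus):
    """Count word-segment frequencies separately for each category.

    segmented_corpus: iterable of segment lists, e.g. ["iphone", "6", "plus"].
    Returns a dict mapping category name -> Counter of segment frequencies.
    """
    models = {}
    for segments in segmented_corpus:
        category = categorize("".join(segments))
        models.setdefault(category, Counter()).update(segments)
    return models


def select_model(models, s: str) -> Counter:
    """Pick the frequency table for the category of s (empty if unseen)."""
    return models.get(categorize(s), Counter())


# Hypothetical, hand-segmented corpus used only for illustration.
corpus = [
    ["iphone", "6", "plus"],
    ["iphone", "5", "s"],
    ["3", "d"],
    ["4", "g"],
]
models = build_models(corpus)
model = select_model(models, "iphone6s")  # table for letter-first strings
```

Keeping one frequency table per category means that segments common in one kind of string do not skew the scores for another kind; whether the patent partitions the corpus this way is an assumption of the sketch.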

Embodiment 2

[0079] Based on the same inventive concept as Embodiment 1 of the present application, Embodiment 2 provides a character string segmentation device. For the specific implementation of the device, refer to the relevant description in method Embodiment 1 above; the details are not repeated here. As shown in Figure 2, the string segmentation device may mainly include:

[0080] The model building module 21, which can be used to establish a character string segmentation language model in advance according to the word frequency of the word segments of each numeric character string in the numeric character string corpus;

[0081] The character string determination module 22, which can be used to determine the character string to be segmented;

[0082] The model selection module 23 can be used to determine the category to which the character string to be segmented belongs, and select the corresponding s...

Abstract

The invention discloses a string segmentation method and device. According to the technical scheme, a character string segmentation language model is established from the word frequency of the word segments of each numeric and English character string in a corpus of numeric and English character strings. For any numeric and English character string to be segmented, the segmentation result is obtained by using a dynamic programming algorithm to determine the optimal segmentation path of the string, based on the character string segmentation language model that corresponds to the category of the string. The method and device thereby address two problems: dictionary-plus-matching segmentation of numeric and English character strings cannot segment unlisted character strings, and segmentation correction by post-processing rules has limited coverage. The accuracy of numeric and English character string segmentation is improved.
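The dynamic-programming search for an optimal segmentation path can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented procedure: it assumes a unigram model scored by log-probability, add-one smoothing, and a per-character penalty for segments that are absent from the model. The function name `segment`, the `max_len` limit, and the example frequencies are all hypothetical.

```python
import math


def segment(s, freq, max_len=10):
    """Return the highest-scoring segmentation of s under a unigram model.

    freq: dict mapping segment -> corpus frequency (hypothetical data).
    max_len: assumed upper bound on the length of a single segment.
    """
    norm = sum(freq.values()) + len(freq) + 1  # add-one smoothing denominator

    def log_prob(token):
        count = freq.get(token, 0)
        if count == 0:
            # Assumed penalty: an unseen segment pays once per character,
            # so long unknown chunks are not preferred over known segments.
            return len(token) * math.log(1 / norm)
        return math.log((count + 1) / norm)

    n = len(s)
    best = [0.0] + [-math.inf] * n  # best[i]: best score for the prefix s[:i]
    back = [0] * (n + 1)            # back[i]: start index of the last segment
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            score = best[start] + log_prob(s[start:end])
            if score > best[end]:
                best[end] = score
                back[end] = start

    # Follow the back-pointers from the end of the string to recover the path.
    tokens = []
    end = n
    while end > 0:
        start = back[end]
        tokens.append(s[start:end])
        end = start
    return tokens[::-1]


# Hypothetical frequencies, for illustration only.
freq = {"iphone": 5, "6": 3, "plus": 4, "s": 2}
listed = segment("iphone6plus", freq)
unlisted = segment("iphone7", freq)
```

With these example frequencies, `listed` is `["iphone", "6", "plus"]` and `unlisted` is `["iphone", "7"]`: the segment `7` is not in the table, yet the search still isolates it, which is the case that pure dictionary matching fails to handle.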

Description

Technical field

[0001] The present application relates to the technical field of Internet search, and in particular to a string segmentation method and device.

Background

[0002] Chinese word segmentation refers to the technology of dividing a sequence of Chinese characters into individual words according to certain specifications. It is a very important basic technology for search engines, and the quality of its results directly affects search engine performance.

[0003] Specifically, because dictionary-plus-matching techniques (such as forward maximum matching, reverse maximum matching, or bidirectional maximum matching) offer high accuracy and good performance, they have gradually become the word segmentation technology commonly used by search engines, and they handle word segmentation of pure Chinese character strings well.

[0004] However, due to the article search engine applicable to the article search ...

Application Information

IPC(8): G06F17/27
CPC: G06F40/289
Inventor: 肖荣
Owner: ALIBABA GRP HLDG LTD