A Chinese word segmentation method and computing device based on professional vocabulary

A Chinese word segmentation technique based on professional vocabulary, together with a computing device, applied in the field of information processing. It addresses problems such as inaccurate word segmentation results and is stated to achieve a high recognition rate and high word segmentation accuracy.

Active Publication Date: 2021-09-07
北京同盛科创科技有限公司

AI Technical Summary

Problems solved by technology

The dictionaries used in existing Chinese word segmentation technology are relatively general, and none is dedicated to professional vocabulary, which can lead to inaccurate word segmentation results.

Detailed Description of Embodiments

[0021] Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided for more thorough understanding of the present disclosure and to fully convey the scope of the present disclosure to those skilled in the art.

[0022] Figure 1 shows a structural block diagram of a computing device 100 according to an embodiment of the present invention.

[0023] In a basic configuration 102, computing device 100 typically includes system memory 106 and one or more processors 104. A memory bus 108 may be used for communication between the processor 104 and the system memory 106.

[0024] Depending on the desired configuration, processor 104 may be ...

Abstract

The invention discloses a Chinese word segmentation method based on professional vocabulary, suitable for execution in a computing device. The method includes: constructing a dictionary with a predetermined structure by reading entries one by one, wherein entries with the same first character are arranged in ascending order of Unicode code point, multiple first arrays are established for storing entries with the same first character, and at least one second array is established in each first array for storing the entry content and an identification bit that indicates whether the entry belongs to the professional vocabulary; using binary search to look up one or more character strings of the sentence to be segmented in the dictionary, obtaining multiple candidate word segments from this initial segmentation; setting a segmentation weight for each candidate word segment according to its identification bit; and constructing segmentation paths from the candidate word segments and their weights, and selecting the shortest path as the word segmentation result. The invention also discloses a computing device for executing the method.
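The steps in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation: the function names, the three weight values, the `max_len` limit, and the fallback for characters missing from the dictionary are all assumptions made for illustration, and a Python `dict` keyed by first character stands in for the "first arrays" because the excerpt does not give their exact layout.

```python
from bisect import bisect_left
from collections import defaultdict

# Hypothetical weights: the abstract only says the weight depends on the flag.
PRO_WEIGHT = 0.5      # professional term: cheapest edge, so it is preferred
GENERAL_WEIGHT = 1.0  # general dictionary word
UNKNOWN_WEIGHT = 2.0  # assumed fallback for a single character not in the dictionary


def build_dictionary(entries):
    """Group (word, is_professional) entries by first character.

    Each bucket holds two parallel lists: the words in ascending code point
    order and their professional-vocabulary flags.
    """
    buckets = defaultdict(list)
    for word, is_pro in entries:
        buckets[word[0]].append((word, is_pro))
    dictionary = {}
    for first, bucket in buckets.items():
        bucket.sort()  # Python compares strings by Unicode code point
        dictionary[first] = ([w for w, _ in bucket], [f for _, f in bucket])
    return dictionary


def lookup(dictionary, word):
    """Binary-search a word in its first-character bucket; return its flag or None."""
    words, flags = dictionary.get(word[0], ([], []))
    i = bisect_left(words, word)
    if i < len(words) and words[i] == word:
        return flags[i]
    return None


def segment(dictionary, sentence, max_len=8):
    """Pick the lowest-cost path through the candidate word segments."""
    n = len(sentence)
    # best[j] = (cost, start) of the cheapest segmentation of sentence[:j]
    best = [(0.0, -1)] + [(float("inf"), -1)] * n
    for i in range(n):
        for j in range(i + 1, min(n, i + max_len) + 1):
            flag = lookup(dictionary, sentence[i:j])
            if flag is None:
                if j - i > 1:
                    continue  # multi-character strings must be dictionary entries
                weight = UNKNOWN_WEIGHT
            else:
                weight = PRO_WEIGHT if flag else GENERAL_WEIGHT
            cost = best[i][0] + weight
            if cost < best[j][0]:
                best[j] = (cost, i)
    # Walk back from the end of the sentence to recover the chosen segments.
    result, j = [], n
    while j > 0:
        i = best[j][1]
        result.append(sentence[i:j])
        j = i
    return result[::-1]


if __name__ == "__main__":
    d = build_dictionary([
        ("心肌", False), ("梗死", False), ("心肌梗死", True), ("患者", False),
    ])
    print(segment(d, "心肌梗死患者"))  # ['心肌梗死', '患者']
```

In this sketch the professional term 心肌梗死 wins because its single low-weight edge costs less than the two general words 心肌 and 梗死 combined. Because every edge points forward in the sentence, the candidate graph is acyclic, so a single left-to-right pass is enough to find the shortest path.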

Description

Technical field

[0001] The invention relates to the technical field of information processing, in particular to a Chinese word segmentation method and computing device based on professional vocabulary.

Background

[0002] Chinese information processing technology has been widely used in computer networks, database technology, software engineering and other computer fields. Automatic Chinese word segmentation is an important foundation for Chinese information processing, and word segmentation problems arise in many Chinese information processing tasks, such as machine translation, automatic summarization, automatic classification, full-text retrieval of Chinese literature databases, and search engines. Since Chinese text is written continuously with no spaces between words, the first problem encountered in Chinese text processing is word segmentation. The correct segmentation of words is a necessary condition for Chinese text proces...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F40/289, G06F16/903
CPC: G06F16/90344, G06F40/289
Inventor: 吕洪波
Owner: 北京同盛科创科技有限公司