Method and device for word segmentation

A word segmentation and normalization technology, applied in the Internet field, that addresses problems such as the difficulty of locating text in images with complex backgrounds, with the effects of improving accuracy, speed, and user experience

Active Publication Date: 2017-09-05
ALIBABA GRP HLDG LTD
Cites: 4; Cited by: 6

AI Technical Summary

Problems solved by technology

[0003] Although OCR technology is relatively mature, several factors make it difficult to detect the text areas in an image quickly and accurately: image backgrounds are complex, the font, size and color of the text vary greatly, the shooting angle changes, and lighting conditions differ.




Embodiment Construction

[0038] The terminology used in this application is for the purpose of describing specific embodiments only and is not intended to limit the application. As used in this application and the appended claims, the singular forms "a", "said" and "the" are intended to include the plural forms as well, unless the context clearly dictates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any and all possible combinations of one or more of the associated listed items.

[0039] It should be understood that although the terms first, second, third, etc. may be used in this application to describe various information, the information should not be limited to these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of the present application, first information may also be called second information, and similarly, second information may also be called first information. Dependin...



Abstract

The application provides a method and a device for word segmentation. The method comprises: acquiring a sample image, the sample image being labeled with a word-interval marker or a non-word-interval marker; processing the sample image with a convolutional neural network to obtain a first feature vector corresponding to the sample image, together with a word-interval probability value and/or a non-word-interval probability value corresponding to the first feature vector; acquiring an image to be tested and processing it with the convolutional neural network to obtain a second feature vector corresponding to that image, together with a word-interval probability value and/or a non-word-interval probability value corresponding to the second feature vector; and performing word segmentation on the image to be tested according to the word-interval probability value or the non-word-interval probability value currently obtained. With this technical solution, word segmentation is performed accurately, which improves both its accuracy and its speed and improves the user experience.
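The abstract does not say how the probability values are mapped back onto the image under test. One plausible reading, sketched here under stated assumptions, is that the network scores every pixel column of a text line (for example by sliding a fixed-width window along it) and that segmentation then treats high-probability columns as gaps between words. The function names, the one-probability-per-column layout, the 0.5 threshold and the rule that a word-interval probability equals one minus the non-word-interval probability are all hypothetical choices for illustration, not details taken from the patent.

```python
from typing import List, Optional, Sequence, Tuple


def segment_words(
    interval_probs: Sequence[float], threshold: float = 0.5
) -> List[Tuple[int, int]]:
    """Split a text line into words from per-column word-interval probabilities.

    interval_probs[x] is ASSUMED to be the probability, produced by the trained
    network, that column x of the image under test lies in a gap between words.
    Returns half-open column ranges [start, end), one per word.
    """
    words: List[Tuple[int, int]] = []
    start: Optional[int] = None  # first column of the word being scanned
    for x, p in enumerate(interval_probs):
        is_gap = p > threshold
        if not is_gap and start is None:
            start = x  # a word begins after a gap (or at the left edge)
        elif is_gap and start is not None:
            words.append((start, x))  # the current word ends where the gap starts
            start = None
    if start is not None:
        words.append((start, len(interval_probs)))  # word touching the right edge
    return words


def from_non_word_probs(non_word_probs: Sequence[float]) -> List[float]:
    """Turn non-word-interval probabilities into word-interval ones.

    Hypothetical rule: the two classes are treated as complementary outputs
    of a two-class classifier, so p_word_interval = 1 - p_non_word_interval.
    """
    return [1.0 - q for q in non_word_probs]
```

For instance, `segment_words([0.9, 0.1, 0.2, 0.8, 0.9, 0.3, 0.1, 0.95])` returns `[(1, 3), (5, 7)]`: columns 0, 3, 4 and 7 score above the threshold and are treated as gaps. Cutting on maximal runs of low-probability columns keeps the sketch simple; a real system might instead smooth the probability curve or look for local peaks before cutting.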

Description

Technical Field
[0001] The present application relates to the technical field of the Internet, and in particular to a word segmentation method and device.
Background
[0002] OCR (Optical Character Recognition) refers to the process in which an electronic device examines characters printed on paper, determines their shapes by detecting patterns of dark and light, and then translates those shapes into computer text using character recognition methods. That is, for printed characters, the text in a paper document is optically converted into a black-and-white dot-matrix image file, and recognition software then converts the text in the image into a text format that word processing software can further edit and process. OCR technology therefore allows text images to be converted into digital form.
[0003] Although the OCR technology is relatively mature, due to the complex image background, the font, size and co...
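Paragraph [0002] describes converting a paper document into a black-and-white dot-matrix image before recognition. A minimal sketch of that binarization step, assuming an 8-bit grayscale image stored as nested lists and a single fixed global threshold of 128; the function names and the threshold are illustrative assumptions, and practical OCR pipelines usually choose the threshold adaptively rather than fixing it:

```python
from typing import List

# Rows of 8-bit intensities: 0 is black, 255 is white (assumed layout).
Gray = List[List[int]]
# Rows of dots: 1 marks ink (a dark pixel), 0 marks background.
Bitmap = List[List[int]]


def to_dot_matrix(gray: Gray, threshold: int = 128) -> Bitmap:
    """Binarize a grayscale page: pixels darker than `threshold` become ink (1)."""
    return [[1 if value < threshold else 0 for value in row] for row in gray]


def render(bitmap: Bitmap) -> str:
    """Draw the dot matrix as text, '#' for ink and '.' for background."""
    return "\n".join("".join("#" if dot else "." for dot in row) for row in bitmap)
```

For example, `render(to_dot_matrix([[250, 30, 240], [20, 25, 235]]))` yields the two rows `.#.` and `##.`.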


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/34; G06N3/02; G06V30/10; G06V30/148
CPC: G06N3/02; G06V30/158; G06V30/10; G06N3/08; G06V30/153; G06V30/148; G06N3/045; G06T7/10; G06N5/046
Inventors: Zhou Wenmeng (周文猛), Cheng Mengli (程孟力), Mao Xudong (毛旭东), Shi Xing (施兴), Chu Wei (褚崴)
Owner: ALIBABA GRP HLDG LTD