Data processing method and device for text recognition

A data processing method and device for text recognition, applicable to electrical digital data processing, special data processing applications, network data retrieval, etc. It addresses the low accuracy of filtering background noise words and improves that accuracy.

Active Publication Date: 2015-03-25
BEIJING GRIDSUM TECH CO LTD
4 Cites, 35 Cited by

AI Technical Summary

Problems solved by technology

[0009] The main purpose of the present invention is to provide a data processing method and device for text recognition, so as to solve the problem of low accuracy in filtering background noise words in the prior art.



Embodiment Construction

[0027] It should be noted that, provided there is no conflict, the embodiments in the present application and the features in the embodiments can be combined with each other. The present invention will be described in detail below with reference to the drawings and in conjunction with the embodiments.

[0028] In order to enable those skilled in the art to better understand the solutions of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below in conjunction with the drawings in the embodiments of the present invention. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts shall fall within the protection scope of the present invention.

[0029] Some terms involved in the present invention are explain...


Abstract

The invention discloses a data processing method and device for text recognition. The method includes: acquiring target words in a corpus; acquiring reference words in the corpus; converting the target words into word vectors to obtain target word vectors; converting the reference words into word vectors to obtain reference word vectors; calculating the similarity between the target word vectors and the reference word vectors; comparing the similarity with a preset threshold; determining that the target words are background noise words if the similarity is not larger than the preset threshold; and determining that the target words are not background noise words if the similarity is larger than the preset threshold. The method addresses the low accuracy of background noise word filtering in the prior art and thereby improves filtering accuracy.
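The thresholding step described in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions that are not in the source: cosine similarity stands in for the unspecified similarity measure, the word vectors come from a hand-written toy table (`TOY_VECTORS`) rather than a trained model, and the function names and the threshold value of 0.5 are hypothetical.

```python
import math

# Hypothetical toy word vectors; a real system would obtain these from a
# trained word-embedding model, which the abstract does not specify.
TOY_VECTORS = {
    "car": [1.0, 0.0],
    "engine": [0.9, 0.1],
    "web": [0.0, 1.0],
}


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_u * norm_v)


def is_background_noise(target_vec, reference_vec, threshold):
    """A target is noise if its similarity is NOT larger than the threshold."""
    return cosine_similarity(target_vec, reference_vec) <= threshold


def filter_noise_words(targets, reference, vectors, threshold):
    """Return the target words that are not judged to be background noise."""
    reference_vec = vectors[reference]
    return [
        word
        for word in targets
        if not is_background_noise(vectors[word], reference_vec, threshold)
    ]


kept = filter_noise_words(["engine", "web"], "car", TOY_VECTORS, 0.5)
print(kept)
```

With these toy vectors, "engine" points in nearly the same direction as the reference word "car" and is kept, while "web" is orthogonal to it and is treated as a background noise word. Note that a similarity exactly equal to the threshold counts as noise, matching the "not larger than" wording of the abstract.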

Description

Technical field

[0001] The present invention relates to the field of natural language processing, in particular to a data processing method and device for text recognition.

Background art

[0002] In order to save storage space and improve search efficiency, search engines automatically ignore certain characters or words when indexing pages or processing search requests. These characters or words are called stop words. Stop words can be roughly divided into two categories. The first type refers to words that are so widely used that they can be seen everywhere on the Internet, such as the word "Web", which appears on almost every website. For such words, search engines cannot guarantee truly relevant search results; the words do little to narrow the search scope and also reduce search efficiency. The second type refers to modal particles, adverbs, prepositions, conjunctions, etc.; usually these words the...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F17/27
CPC: G06F16/951, G06F40/30
Inventor: 何鑫
Owner: BEIJING GRIDSUM TECH CO LTD