Word counting method and device

A word counting method and device, applied in the computer field, that addresses the inability of existing approaches to count words separately for each language in multilingual text, with the stated effects of saving time and counting words accurately.

Active Publication Date: 2016-05-25
GLOBAL TONE COMM TECH
6 Cites, 3 Cited by
AI Technical Summary

Problems solved by technology

[0003] The present invention aims to solve the technical problem that the prior art cannot produce an itemized word count for a document, or a passage of text, that contains multiple languages.

Examples

Embodiment 1

[0012] Embodiment 1, as shown in Figure 1, is a word counting method comprising the following steps.

Step 1: Read the text content into memory in batches of a certain length. The length can be a fixed number of bytes, a sentence, a passage of text, or an entire article, and can be set as needed.

Step 2: After each batch is read into memory, scan it, identify and count the punctuation marks in the text, and then remove them to form a new string that contains no punctuation.

Step 3: Read the words or characters in the punctuation-free string, and identify the language of each one and count it, word by word.

Step 4: Accumulate the punctuation count and the word or character count for each language across successive batches.
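
The four steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: a batch is taken to be one line, punctuation is detected through Unicode general categories, each punctuation mark is replaced by a space so that neighbouring words do not merge, and the language check only separates CJK ideographs from everything else. The names `is_punctuation`, `is_cjk`, `split_units`, and `count_text` are hypothetical.

```python
import io
import unicodedata
from collections import Counter


def is_punctuation(ch):
    # Assumption: Unicode general categories starting with "P" are punctuation.
    return unicodedata.category(ch).startswith("P")


def is_cjk(ch):
    # Assumption: only the CJK Unified Ideographs block is treated as Chinese.
    return "\u4e00" <= ch <= "\u9fff"


def split_units(text):
    # Yield each ideograph on its own and each whitespace-delimited run as one word.
    unit = []
    for ch in text:
        if is_cjk(ch) or ch.isspace():
            if unit:
                yield "".join(unit)
                unit = []
            if is_cjk(ch):
                yield ch
        else:
            unit.append(ch)
    if unit:
        yield "".join(unit)


def count_text(stream):
    totals = Counter()
    for batch in stream:  # Step 1: one line per batch (assumption)
        # Step 2: count punctuation, then build a punctuation-free string.
        totals["punctuation"] += sum(is_punctuation(ch) for ch in batch)
        stripped = "".join(" " if is_punctuation(ch) else ch for ch in batch)
        # Step 3: identify and count unit by unit.
        for unit in split_units(stripped):
            totals["chinese" if is_cjk(unit[0]) else "other"] += 1
        # Step 4: totals accumulates across batches.
    return totals


sample = io.StringIO("Hello, world! 你好，世界。\nBonjour le monde.\n")
result = count_text(sample)
```

Replacing punctuation with a space rather than deleting it outright is a design choice of this sketch; it keeps "Hello,world" from collapsing into a single word.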

[0013] The present invention provides a method for counting words, which counts the number of words in a section of text or...

Embodiment 2

[0014] Embodiment 2, as shown in Figure 2, builds on Embodiment 1. Preferably, the language identification and counting in Step 3 proceeds sequentially: first check whether the word or character is Chinese and, if so, count it; if not, check whether it is English and, if so, count it; if not, check whether it is French and, if so, count it; if not, continue checking against the other languages until the language of every word or character has been identified.
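
The sequential check can be sketched as an ordered list of predicates in which the first match wins. The predicates below are hypothetical stand-ins chosen only to make the example runnable; in particular, treating any ASCII alphabetic word as English means an unaccented French word such as "le" is counted as English, which a real identifier would need to handle.

```python
from collections import Counter

# Assumption: a small set of accented letters is used as a crude French signal.
FRENCH_MARKS = set("àâçéèêëîïôùûüÿœæ")


def is_chinese(token):
    return bool(token) and all("\u4e00" <= ch <= "\u9fff" for ch in token)


def is_english(token):
    return token.isascii() and token.isalpha()


def is_french(token):
    return token.isalpha() and any(ch in FRENCH_MARKS for ch in token.lower())


# Order matters: Chinese, then English, then French, as in the text above.
CHECKS = [("chinese", is_chinese), ("english", is_english), ("french", is_french)]


def identify(token):
    for language, predicate in CHECKS:
        if predicate(token):
            return language
    return "other"


counts = Counter(identify(t) for t in ["字", "word", "été", "слово", "hello"])
```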

[0015] Preferably, an encoding library and a language model are set up for each language. Traversing the encoding libraries gives a preliminary identification of the language category of a word or character; the language model and the specific rules of each language are then used to fully identify the character or word.
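
One way to picture this two-stage identification is a table of code point ranges for the preliminary pass, followed by a per-language lookup that resolves scripts shared by several languages. The source does not describe the contents of the encoding library or the language model, so `CODE_RANGES`, `WORD_LISTS`, `coarse_script`, and `refine` are all hypothetical, and the tiny word lists merely stand in for a real language model.

```python
# Stage 1 data (assumption): approximate code point ranges per script.
CODE_RANGES = {
    "chinese": [(0x4E00, 0x9FFF)],   # CJK Unified Ideographs
    "japanese": [(0x3040, 0x30FF)],  # hiragana and katakana
    "korean": [(0xAC00, 0xD7A3)],    # Hangul syllables
    "latin": [(0x0041, 0x005A), (0x0061, 0x007A), (0x00C0, 0x00FF)],
}

# Stage 2 data (assumption): toy word lists standing in for language models.
WORD_LISTS = {
    "french": {"bonjour", "le", "la", "monde"},
    "english": {"hello", "the", "world"},
}


def coarse_script(token):
    # Traverse the range table using the token's first character.
    first = ord(token[0])
    for script, ranges in CODE_RANGES.items():
        if any(low <= first <= high for low, high in ranges):
            return script
    return "unknown"


def refine(token):
    script = coarse_script(token)
    if script != "latin":
        return script  # the script alone already determines the language here
    lowered = token.lower()
    for language, words in WORD_LISTS.items():
        if lowered in words:
            return language
    return "latin-unresolved"
```

For example, `refine("Bonjour")` resolves to French through the word list, while `refine("世")` is settled by the range table alone.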

[0016] More preferably, the step 3, identifying the corresponding language and counting the specific steps is as follows: the number of words is calculated according to the actua...

Embodiment 3

[0037] Embodiment 3, as shown in Figure 3, is a word counting device comprising the following modules.

Reading module: reads the text content into memory in batches of a certain length.

Punctuation recognition module: after each batch is read into memory, scans it, identifies and counts the punctuation marks in the text, and then removes them to form a new string that contains no punctuation.

Language recognition module: reads the words or characters in the punctuation-free string, and identifies the language of each one and counts it, word by word.

Itemized statistics module: accumulates the punctuation count and the word or character count for each language across successive batches.
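
The module split can be sketched as four small classes wired together by a device object. This is an illustrative decomposition only: the class names, the line-based batching, and the CJK-versus-other language check are assumptions, not details given in the source.

```python
import unicodedata
from collections import Counter


class ReadingModule:
    def batches(self, text):
        # Assumption: one line per batch.
        return text.splitlines()


class PunctuationModule:
    def process(self, batch):
        # Return the punctuation count and a punctuation-free string.
        is_punct = [unicodedata.category(ch).startswith("P") for ch in batch]
        stripped = "".join(" " if p else ch for ch, p in zip(batch, is_punct))
        return sum(is_punct), stripped


class LanguageModule:
    def count(self, text):
        # Assumption: each CJK ideograph is one unit; other runs are "other".
        spaced = "".join(f" {ch} " if "\u4e00" <= ch <= "\u9fff" else ch for ch in text)
        counts = Counter()
        for token in spaced.split():
            counts["chinese" if "\u4e00" <= token[0] <= "\u9fff" else "other"] += 1
        return counts


class StatisticsModule:
    def __init__(self):
        self.totals = Counter()

    def add(self, punctuation, counts):
        self.totals["punctuation"] += punctuation
        self.totals.update(counts)


class WordCountingDevice:
    def __init__(self):
        self.reader = ReadingModule()
        self.punctuation = PunctuationModule()
        self.language = LanguageModule()

    def run(self, text):
        stats = StatisticsModule()
        for batch in self.reader.batches(text):
            n_punct, stripped = self.punctuation.process(batch)
            stats.add(n_punct, self.language.count(stripped))
        return stats.totals


totals = WordCountingDevice().run("Hi, 世界!\nOK.")
```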

[0038] More preferably, the language identification module identifies the corresponding language and counts the specific steps as fo...

Abstract

The invention discloses a word counting method and device, and relates to the technical field of computers. It solves the technical problem that the prior art cannot produce an itemized word count for a file, or a passage of text, that contains multiple languages. According to the technical scheme, the word counting method comprises the following steps: (1) the text content is read into memory in batches of a certain length; (2) after each batch is read into memory, the text is scanned, the punctuation marks in it are recognized and counted, and the punctuation is then removed to form a new character string that contains no punctuation; (3) the words or characters in the punctuation-free string are read, and the language of each is recognized and counted word by word; (4) the punctuation counts and the per-language word or character counts from successive batches are added up separately.

Description

Technical field

[0001] The present invention relates to the field of computer technology, in particular to a word counting method and device.

Background

[0002] Word counting for text in a single language is a relatively mature technology. The current difficulty arises when a passage of text or a document contains two or more languages, such as documents that mix Chinese with English, French, Japanese, Korean, and other languages: existing tools cannot count the number of words in each language separately.

Summary of the invention

[0003] The present invention aims to solve the technical problem that the prior art cannot produce an itemized word count for a file, or a passage of text, that contains multiple languages.

[0004] In order to solve the above problems, the present invention provides a method for counting words, including: Step 1, read the text content, and read the text in...

Application Information

IPC(8): G06F17/27
CPC: G06F40/216
Inventor: 王建华, 程国艮
Owner: GLOBAL TONE COMM TECH