
Information processing system

An information processing system, applied in the field of information processing, that addresses three problems: words cannot be identified, the language of unknown terms cannot be identified, and it is difficult to determine which language a character code belongs to.

Inactive Publication Date: 2005-03-16
PANASONIC CORP
2 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

Unexamined Japanese Patent Publication Hei 10-171810 has the problem that each client performs authentication.
[0008] The prior art method explained above (for example, JP 8-16617) has the problem that the word list document must be prepared independently of the index used in ordinary document retrieval.
[0009] The prior art method explained above (for example, JP 5-282360) has the problem that, if the same character code appears in text, it is difficult to determine which language that character code belongs to.
In addition, the language of unknown terms cannot be discriminated, since dictionaries for discriminating languages must be prepared in advance.
In addition, known character strings (for example, "recognition" in the case of Japanese) can be recognized because they are included in the dictionary as dictionary data, but character strings that appear in ordinary sentences and are not included in the dictionary cannot be identified under the conventional configuration.
In addition, since rules for cutting out characters must be prepared for each language to be discriminated, characters are not processed unless there are rules that apply to the specific language.



Examples


Embodiment 1

[0054] FIG. 1 shows the structure of a character code language discriminating system according to a first method embodiment of the present invention.

[0055] In Fig. 1, reference numeral 101 denotes a specific character discriminator, which discriminates whether a character in the input text (character string) corresponds to a specific character of the detection target language; reference numeral 102 denotes a specific character counter, which counts the number of occurrences of the specific characters identified by the specific character discriminator; reference numeral 103 denotes an input character counter, which counts the occurrences of all characters of the input text; reference numeral 104 denotes an occurrence rate calculator, which calculates the occurrence rate of the specific characters from the number of occurrences counted by the specific character counter 102 and the number of characters of the input text counted by the input character counter 103; reference numeral 105 ...
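
The counting steps described for Fig. 1 can be sketched in Python. This is a minimal illustration rather than the patent's implementation: the function names, the character set `JA_PARTICLES`, the standard rate `0.3`, and the tolerance-based comparison are all assumptions made for the example, since the text above is truncated before it describes how the comparison is performed.

```python
from typing import Iterable

# Hypothetical "specific characters" for Japanese: a few common particles.
JA_PARTICLES = "のはをにがで"


def specific_char_rate(text: str, specific_chars: Iterable[str]) -> float:
    """Occurrence rate of specific characters among all input characters."""
    specific = set(specific_chars)
    total = len(text)                               # role of input character counter 103
    hits = sum(1 for ch in text if ch in specific)  # roles of discriminator 101 and counter 102
    return hits / total if total else 0.0


def matches_language(text: str, specific_chars: Iterable[str],
                     standard_rate: float, tolerance: float) -> bool:
    """Assumed comparison rule: the measured rate lies near the stored standard rate."""
    rate = specific_char_rate(text, specific_chars)
    return abs(rate - standard_rate) <= tolerance


rate = specific_char_rate("これは本です", JA_PARTICLES)  # 2 of the 6 characters are in the set
is_ja = matches_language("これは本です", JA_PARTICLES, 0.3, 0.1)
is_ja_en = matches_language("this is a book", JA_PARTICLES, 0.3, 0.1)
```

A real system would presumably derive the character set and the standard rate from a large sample of the target language; the values here only make the arithmetic easy to follow.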

Embodiment 2

[0061] FIG. 3 shows the structure of a character code language discriminating system according to a second method embodiment of the present invention.

[0062] In Fig. 3, reference numeral 301 denotes a specific character discriminator, which discriminates whether a character in the input text corresponds to a specific character of the detection target language; reference numeral 302 denotes a text length counter, which counts the number of consecutive occurrences of the non-specific characters identified by the discriminator 301; reference numeral 303 denotes an adder, which calculates the sum of the text lengths output from the text length counter 302; reference numeral 304 denotes a specific character counter, which counts the number of occurrences of the specific characters discriminated by the specific character discriminator 301; reference numeral 305 denotes an average text length calculator for calculating the total number of text lengths calculated by the adder 303 div...
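
The run-length bookkeeping described for Fig. 3 can be sketched as follows. This is a hedged illustration: the paragraph above is truncated at the division step, so dividing by the number of specific characters is an assumption, as are the function name `average_text_length` and the choice to include a trailing run that is not followed by a specific character.

```python
from typing import Iterable, Optional


def average_text_length(text: str, specific_chars: Iterable[str]) -> Optional[float]:
    """Average length of the runs of non-specific characters.

    Returns None when the text contains no specific character.
    """
    specific = set(specific_chars)
    run = 0             # current run of non-specific characters (text length counter 302)
    total_run = 0       # sum of run lengths (adder 303)
    specific_count = 0  # occurrences of specific characters (counter 304)
    for ch in text:
        if ch in specific:
            specific_count += 1
            total_run += run
            run = 0
        else:
            run += 1
    total_run += run    # assumption: a trailing run is also added
    if specific_count == 0:
        return None
    # Assumed divisor: the number of specific characters (calculator 305).
    return total_run / specific_count


avg_en = average_text_length("the cat sat", " ")          # runs of 3, 3, 3 and two spaces
avg_ja = average_text_length("これは本です", "のはをにがで")  # runs of 2, 1, 1 and two particles
```

With a space as the only specific character, the result approximates an average word length, which gives some intuition for why the value could differ between languages.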

Embodiment 3

[0069] FIG. 5 shows the structure of a character code language discriminating system according to a third method embodiment of the present invention.

[0070] In Fig. 5, reference numeral 501 denotes a specific-range character discriminator, which discriminates whether a character in the input text corresponds to a character in the specific range of the detection target language; reference numeral 502 denotes a specific-range character counter, which counts the number of occurrences of the specific-range characters identified by the specific-range character discriminator; reference numeral 503 denotes an input character counter, which counts the occurrences of all character codes in the input text; reference numeral 504 denotes an occurrence rate calculator, which calculates the occurrence rate of characters in the specific range according to the number of occurrences of characters in the specific range counted by the character counter ...
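
The range-based variant described for Fig. 5 can be sketched in the same way. The example below assumes Unicode code points and uses the Hiragana block (U+3040 to U+309F) as the "specific range"; the source does not say which character encoding or ranges the system uses, so both are illustrative choices, as is the function name `range_char_rate`.

```python
from typing import Sequence, Tuple

# Assumed specific range for Japanese: the Unicode Hiragana block.
HIRAGANA: Sequence[Tuple[int, int]] = [(0x3040, 0x309F)]


def in_ranges(ch: str, ranges: Sequence[Tuple[int, int]]) -> bool:
    """Role of the specific-range character discriminator 501."""
    code = ord(ch)
    return any(lo <= code <= hi for lo, hi in ranges)


def range_char_rate(text: str, ranges: Sequence[Tuple[int, int]]) -> float:
    """Share of input characters whose code falls inside the given ranges."""
    total = len(text)                                     # input character counter 503
    hits = sum(1 for ch in text if in_ranges(ch, ranges)) # specific-range character counter 502
    return hits / total if total else 0.0                 # occurrence rate calculator 504


rate_ja = range_char_rate("これは本です", HIRAGANA)  # 5 of the 6 characters are hiragana
rate_en = range_char_rate("hello", HIRAGANA)
```

Compared with an explicit character list, a range test needs no per-character table, only the boundaries of the code range.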



Abstract

An information processing system that realizes a language discriminating method for discriminating the language of an input text by utilizing descriptive features of the language, i.e., by observing particular characters that occur frequently in the language and detecting the occurrence rates of those characters in the text to be discriminated. The system provides a particular character counter 102 used to detect the occurrence rates of the particular characters from the number of times they occur in an input text, a standard occurrence rate memory 105 for storing the standard occurrence rates of the particular characters of the detection target language, and a comparator 106 for comparing the particular character occurrence rate of the input text with the standard occurrence rates of the particular characters of the detection target language.

Description

Technical field

[0001] The present invention relates to an information processing system used for language discrimination, which discriminates the language of documents, and for keyword retrieval in full-text retrieval, which treats all texts (character strings) contained in an input document as its object while retrieving / registering an input text.

Background technique

[0002] Methods used in the prior art in the field of information retrieval processing to identify the language of the text (character string) described in a file include: a method based on the identification of words, which identifies the language by providing a word dictionary of the language, disclosed in Examined Japanese Patent Publication Hei 8-137886; a method of identifying a language based on specific bits (two bits here) of a language's character code, disclosed in Unexamined Japanese Patent Publication Hei 8-160929; and a method of discrimi...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/27, G06F17/28, G06F17/30, G06F40/00
CPC: G06F40/20, G06F40/40
Inventor: 片山修, 小山隆正
Owner: PANASONIC CORP