Language identification method, device, equipment and storage medium

A language identification technique, applied in the computer field, that addresses the problems of existing language detection frameworks: they increase the burden on developers, reduce the efficiency of language recognition, and recognize less commonly used languages poorly. Its stated effects are wide applicability across languages, improved overall efficiency, and savings in development time and cost.

Active Publication Date: 2020-04-28
RUN TECH CO LTD BEIJING

AI Technical Summary

Problems solved by technology

If the text to be recognized contains less commonly used languages, existing open source language detection frameworks cannot identify them well, which reduces the efficiency of language recognition and increases the burden on developers.

Examples

Embodiment 1

[0032] Figure 1A is a flow chart of the language recognition method provided by Embodiment 1 of the present invention. This embodiment is applicable to identifying different languages in the course of multilingual text processing. The method can be executed by the language identification device provided by the embodiment of the present invention; the device can be implemented in hardware and/or software and can generally be integrated in computer equipment. As shown in Figure 1A, the method includes the following steps:

[0033] S11. Obtain the text to be recognized input by the user, and determine the language range of the text to be recognized according to the Unicode range of the characters in the text to be recognized.

[0034] Optionally, the text to be recognized may be acquired through the interface of the language recognition tool, and may be in Unicode format. Unicode is an indu...
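Step S11 can be illustrated with a minimal Python sketch. The table `SCRIPT_RANGES`, the function names, and the language lists attached to each range are hypothetical and deliberately incomplete; the source does not give the actual mapping used by the method. Only the code-point ranges themselves come from the Unicode Standard.

```python
# Hypothetical lookup table: (first code point, last code point, candidate languages).
# The ranges are real Unicode ranges; the language sets are illustrative assumptions.
SCRIPT_RANGES = [
    (0x0041, 0x005A, {"en", "fr", "de"}),  # Basic Latin, uppercase letters
    (0x0061, 0x007A, {"en", "fr", "de"}),  # Basic Latin, lowercase letters
    (0x0400, 0x04FF, {"ru", "uk"}),        # Cyrillic
    (0x3040, 0x309F, {"ja"}),              # Hiragana
    (0x30A0, 0x30FF, {"ja"}),              # Katakana
    (0x4E00, 0x9FFF, {"zh", "ja"}),        # CJK Unified Ideographs
    (0xAC00, 0xD7A3, {"ko"}),              # Hangul Syllables (assigned code points)
]


def candidate_languages(ch):
    """Return the set of languages whose range contains this character."""
    code_point = ord(ch)
    for first, last, languages in SCRIPT_RANGES:
        if first <= code_point <= last:
            return set(languages)
    return set()  # digits, punctuation, whitespace, or a range not in the table


def language_range(text):
    """Union of the candidate languages of every character in the text."""
    result = set()
    for ch in text:
        result |= candidate_languages(ch)
    return result
```

Under these assumptions, a Hangul-only text yields a language range with a single language, while Cyrillic or Han characters yield a range with several candidates that later steps would have to narrow down.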

Embodiment 2

[0053] Figure 2A is a flow chart of the language recognition method provided by Embodiment 2 of the present invention. This embodiment further refines the above technical solution. Specifically, before the words in the word set are compared with the characteristic corpus of each language included in the language range, this embodiment adds the generation process of the characteristic corpus of a specified language. Accordingly, as shown in Figure 2A, the method includes the following steps:

[0054] S21. Obtain the text to be recognized input by the user, and determine the language range of the text to be recognized according to the Unicode range of the characters in the text to be recognized.

[0055] S22. Determine whether the language range includes only one language.

[0056] S23. If not, judge whether each character in the text to be recognized has a unique corresponding language.

[0057] S24. If not, ...
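The branching in steps S22 and S23 can be sketched as follows. This is an illustration only: the function `classify_by_characters`, the status strings, and the toy mapping `toy_candidates` are hypothetical, and the handling of the "yes" branch of S23 (reading the languages directly off the characters) is an assumption, since the source text is truncated at that point.

```python
def toy_candidates(ch):
    """Hypothetical per-character lookup: code point -> candidate languages."""
    code_point = ord(ch)
    if 0x3040 <= code_point <= 0x30FF:   # Hiragana and Katakana
        return {"ja"}
    if 0xAC00 <= code_point <= 0xD7A3:   # Hangul Syllables
        return {"ko"}
    if 0x0400 <= code_point <= 0x04FF:   # Cyrillic, shared by several languages
        return {"ru", "uk"}
    return set()


def classify_by_characters(text, candidates_of):
    """Apply the S22 and S23 checks and report which branch the text takes."""
    # Characters with no known language (spaces, digits) are skipped: an assumption.
    per_char = [langs for langs in (candidates_of(ch) for ch in text) if langs]
    language_range = set().union(*per_char)
    if not language_range:
        return "unknown", set()
    # S22: does the language range contain exactly one language?
    if len(language_range) == 1:
        return "single", language_range
    # S23: does every character map to exactly one language?
    if all(len(langs) == 1 for langs in per_char):
        return "unique_per_char", language_range
    # Otherwise the text must be segmented and compared with feature corpora.
    return "needs_segmentation", language_range
```

With this toy mapping, a mixed Japanese and Korean text is resolved at S23 because every character has one candidate language, whereas any text containing Cyrillic falls through to segmentation.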

Embodiment 3

[0069] Figure 3 is a schematic diagram of the structure of the language recognition device provided by Embodiment 3 of the present invention. The device can be implemented in hardware and/or software, and can be integrated into a computer device to perform the language recognition method provided by any embodiment of the present invention. As shown in Figure 3, the device includes:

[0070] The language range determination module 31 is used to obtain the text to be recognized input by the user, and determine the language range of the text to be recognized according to the Unicode range of characters in the text to be recognized;

[0071] The language judging module 32 is used to judge whether the language range includes only one language;

[0072] The character judging module 33 is used to judge, if not, whether each character in the text to be recognized has a unique corresponding language;

[0073] The word set obtaining module 34 is used for if not, then the text to be recognized is se...

Abstract

The embodiment of the invention discloses a language recognition method, device, equipment and storage medium. The method comprises: obtaining a text to be recognized input by a user, and determining a language range of the text according to the Unicode range of the characters in the text; judging whether the language range contains only one language; if not, judging whether each character in the text has a unique corresponding language; if not, segmenting the text to obtain a word set corresponding to the text; and comparing the words in the word set with the feature corpora of the languages contained in the language range, and determining the language components of the text according to the comparison result. The method provided by the embodiment of the invention is widely applicable to various languages, which improves the overall efficiency of the language recognition process and saves developers' development time and cost.
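The final two steps of the abstract, segmentation and comparison against feature corpora, might look like the sketch below. Everything named here is hypothetical: the tiny `FEATURE_CORPORA` dictionaries, the regex-based `segment` function, and the choice to report language components as each language's share of matched words. The source does not specify the segmentation method, the corpus contents, or how the comparison result is scored.

```python
import re
from collections import Counter

# Hypothetical feature corpora: a few characteristic words per language.
FEATURE_CORPORA = {
    "en": {"the", "and", "is", "of"},
    "fr": {"le", "la", "et", "est"},
    "de": {"der", "die", "und", "ist"},
}


def segment(text):
    """Split text into lowercase word tokens (a stand-in for real segmentation)."""
    return re.findall(r"\w+", text.lower())


def language_components(text, language_range):
    """Share of corpus hits per language, restricted to the given language range."""
    hits = Counter()
    for word in segment(text):
        for language in language_range:
            if word in FEATURE_CORPORA.get(language, set()):
                hits[language] += 1
    total = sum(hits.values())
    if total == 0:
        return {}  # no word matched any feature corpus
    return {language: count / total for language, count in hits.items()}
```

Restricting the comparison to the languages in the language range keeps the lookup small, which appears to be the point of running the Unicode-range step first.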

Description

Technical field

[0001] The embodiments of the present invention relate to the field of computer technology, and in particular to a language recognition method, device, equipment and storage medium.

Background

[0002] In cluster analysis of public information, a very important problem is the processing of multilingual texts, and the basic problem of multilingual text processing is language identification. Once language identification is complete, data in different languages can be given targeted processing and analysis in subsequent steps.

[0003] Existing language recognition methods usually rely on open source language detection frameworks, but these frameworks generally support only the more commonly used languages. To recognize unsupported minor languages, developers need to modify the code and train the language detection framework to support the new languages. Therefor...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/205, G06F40/279, G06F40/126
CPC: Y02D10/00
Inventor: 马中元, 谢永恒, 万月亮
Owner: RUN TECH CO LTD BEIJING