
A multilingual document classification method, device and storage medium

A multilingual document classification technology, applied in the field of information processing, that addresses problems such as language incompatibility and the limited coverage of single-language knowledge bases, and achieves the effects of reducing word segmentation processing, improving classification accuracy, and improving processing efficiency.

Active Publication Date: 2021-10-29
中科大数据研究院

AI Technical Summary

Problems solved by technology

[0003] However, building a knowledge base requires first classifying documents and then assembling the classified documents into the knowledge base. Documents on the network include both Chinese documents and foreign-language documents. Because these documents are written in different languages, they are not mutually compatible at the language level, which makes it difficult to classify multilingual documents at the same time. As a result, the academic knowledge bases built by companies and enterprises are usually single-language knowledge bases, and the coverage of such knowledge bases is limited.



Examples


Embodiment Construction

[0087] Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numerals in different drawings refer to the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with this application. Rather, they are merely examples of apparatuses and methods consistent with aspects of the present application as recited in the appended claims.

[0088] The terminology used in this application is for the purpose of describing particular embodiments only, and is not intended to limit the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly dictates otherwise. It should also be understood that the term...



Abstract

The present application provides a multilingual document classification method, device and storage medium. The method includes the following steps: document reception, in which the received documents include Chinese documents and foreign-language documents; representative word extraction, in which at least one related word is extracted from the content of each document and the extracted words are clustered to obtain the representative words of that document; category table reception, in which a document category table containing multiple basic categories is received; and document classification, in which each representative word is converted into a representative word vector, each basic category is converted into a class word vector, the correlation between the representative word vectors and the class word vectors is calculated, and the documents are classified according to that correlation. Because representative words are extracted separately from the Chinese documents and the foreign-language documents, and classification relies only on the correlation between representative word vectors and class word vectors, Chinese and foreign-language documents can be classified at the same time.
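The document classification step of the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the application: the toy three-dimensional vectors in `TOY_VECTORS`, the choice of cosine similarity as the "correlation", the averaging of representative word vectors into one document vector, and the helper names `cosine`, `mean_vector`, and `classify`. A real system would need vectors from a shared multilingual embedding space, and the representative words would come from the extraction and clustering step, which is not shown.

```python
import math

# Hypothetical toy embeddings: Chinese and English words are assumed to
# live in one shared vector space, so they can be compared directly.
TOY_VECTORS = {
    "neural": [0.90, 0.10, 0.00],
    "神经网络": [0.88, 0.12, 0.05],
    "protein": [0.05, 0.90, 0.10],
    "蛋白质": [0.10, 0.92, 0.05],
    # Class word vectors for the basic categories in the category table.
    "computer science": [1.0, 0.0, 0.0],
    "biology": [0.0, 1.0, 0.0],
}


def cosine(u, v):
    """Cosine similarity, used here as the assumed correlation measure."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_vector(words, vectors):
    """Average the vectors of all words that have an embedding."""
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        raise ValueError("no representative word has a known vector")
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def classify(representative_words, categories, vectors):
    """Return the best-matching basic category and all correlation scores."""
    doc_vector = mean_vector(representative_words, vectors)
    scores = {c: cosine(doc_vector, vectors[c]) for c in categories}
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    categories = ["computer science", "biology"]
    # One Chinese document and one English document, classified the same way.
    print(classify(["神经网络"], categories, TOY_VECTORS)[0])
    print(classify(["protein"], categories, TOY_VECTORS)[0])
```

In this sketch the language of a document never enters the classification logic: only vectors are compared, which is one way to read the abstract's claim that Chinese and foreign-language documents can be classified at the same time.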

Description

Technical field

[0001] The present application relates to the technical field of information processing, and in particular to a multilingual document classification method, device and storage medium.

Background technique

[0002] With the rapid development of science and technology, a large number of scientific documents such as papers and patents continue to emerge. For some companies or enterprises, it is necessary to search in multiple network databases, so document search on the Internet can no longer meet the needs of these users. As a result, in the face of massive literature, more and more companies, enterprises, and groups have begun to build their own academic knowledge bases.

[0003] However, building a knowledge base requires classifying documents, and then constructing the classified documents as a knowledge base. The documents in the network include Chinese documents and foreign language documents. Since Chinese documents and foreign language documents are doc...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/35, G06F40/205, G06F40/258, G06F40/284, G06F40/289
CPC: G06F16/355, G06F40/205, G06F40/258, G06F40/284, G06F40/289
Inventor 贾士杨冯凯王元卓
Owner 中科大数据研究院