
A Chinese and English paper data classification and query method

A query method and data classification technology, applied to text database querying, text database clustering/classification, and unstructured text data retrieval. It addresses three problems: word segmentation that is ineffective, mixed Chinese and English content that is difficult to identify accurately, and cross-language queries that fall short of expectations. The effect is improved retrieval accuracy and improved cross-language query accuracy.

Active Publication Date: 2021-11-19
中科大数据研究院
10 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0006] Current data classification suffers from ineffective word segmentation, difficulty in accurately identifying mixed Chinese and English content, and cross-language queries that fall short of expectations. To address these defects, the present invention provides a method for classifying and querying Chinese and English paper data.



Examples


Embodiment 1

[0030] Embodiment 1: To address the defects of current data classification (ineffective word segmentation, difficulty in accurately identifying mixed Chinese and English content, and cross-language queries that fall short of expectations), the present invention provides a Chinese and English paper data classification and query method built on unifying the labels of Chinese and English papers, in order to improve the accuracy of cross-language queries. The method includes the following.

[0031] Step 1. Chinese papers include both Chinese and English keywords when they are published. First, traverse the original data of the Chinese papers and extract the Chinese and English keywords from all of them.

[0032] Then, exclude abnormal data from the Chinese and English keywords, mainly records that lack Chinese or English keyword data, and aggregate the results of Chinese tra...
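The extraction and filtering in Step 1 can be sketched as follows. This is a minimal illustration, not the patented implementation: the record fields `keywords_zh` and `keywords_en`, the positional pairing of keywords, and the rule of keeping the most frequent English rendering are all assumptions introduced here, since the source text is truncated before it describes the aggregation.

```python
from collections import Counter, defaultdict


def extract_keyword_pairs(papers):
    """Count (Chinese, English) keyword pairs across Chinese paper records.

    `papers` is a list of dicts with hypothetical fields `keywords_zh`
    and `keywords_en`, each a list of strings.
    """
    pairs = Counter()
    for paper in papers:
        zh = [k.strip() for k in (paper.get("keywords_zh") or []) if k.strip()]
        en = [k.strip().lower() for k in (paper.get("keywords_en") or []) if k.strip()]
        # Exclude abnormal records: Chinese or English keywords are missing.
        if not zh or not en:
            continue
        # Assumption: keywords correspond by position, so records whose
        # two lists differ in length are skipped rather than guessed at.
        if len(zh) != len(en):
            continue
        pairs.update(zip(zh, en))
    return pairs


def build_zh_en_library(pairs):
    """Map each Chinese keyword to its most frequent English rendering."""
    candidates = defaultdict(Counter)
    for (zh, en), count in pairs.items():
        candidates[zh][en] += count
    return {zh: ens.most_common(1)[0][0] for zh, ens in candidates.items()}


papers = [
    {"keywords_zh": ["机器学习", "数据挖掘"], "keywords_en": ["Machine Learning", "Data Mining"]},
    {"keywords_zh": ["机器学习"], "keywords_en": []},            # missing English keywords
    {"keywords_zh": ["机器学习"], "keywords_en": ["machine learning"]},
    {"keywords_zh": ["聚类"], "keywords_en": ["clustering", "extra"]},  # lengths differ
]
pairs = extract_keyword_pairs(papers)
library = build_zh_en_library(pairs)
```

Lowercasing the English keywords before counting merges variants such as "Machine Learning" and "machine learning" into one entry; whether the original method normalizes case is not stated in the excerpt.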



Abstract

The invention belongs to the technical field of data classification, and in particular relates to a Chinese and English paper data classification and query method. The method extracts the Chinese and English keywords of Chinese papers and processes the data to form a Chinese-English library and a Chinese thesaurus, uses a model to obtain an English tag library, and fuses the English tag library with the Chinese-English library to form a Chinese-English tag library. In parallel, Chinese and English word segmentation lists are obtained by segmenting the original data of the Chinese and English papers. By calculating the correlation and dividing the papers into fields, the method unifies the research field labels of Chinese and English papers, accurately identifies Chinese and English papers of the same type, and improves both retrieval accuracy and the accuracy of cross-language queries.
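The field-division step in the abstract can be illustrated with a short sketch. The excerpt does not say how the correlation is calculated, so the overlap ratio below is a stand-in chosen for illustration only; the shape of `tag_library` (a field label mapped to its Chinese and English terms) and the function names are likewise hypothetical.

```python
def score_fields(tokens, tag_library):
    """Score each field by the share of its terms found in the token list.

    This overlap ratio is an assumed stand-in for the unspecified
    correlation measure in the source.
    """
    token_set = {t.lower() for t in tokens}
    scores = {}
    for field, terms in tag_library.items():
        term_set = {t.lower() for t in terms}
        overlap = len(token_set & term_set)
        scores[field] = overlap / len(term_set) if term_set else 0.0
    return scores


def assign_field(tokens, tag_library):
    """Return the best-scoring field label, or None if nothing matches."""
    scores = score_fields(tokens, tag_library)
    best = max(scores, key=scores.get, default=None)
    if best is None or scores[best] == 0.0:
        return None
    return best


# Hypothetical Chinese-English tag library: one label per field,
# holding both Chinese and English terms.
tag_library = {
    "machine learning": ["机器学习", "machine learning", "神经网络", "neural network"],
    "databases": ["数据库", "database", "索引", "index"],
}

zh_field = assign_field(["神经网络", "训练"], tag_library)
en_field = assign_field(["Neural Network", "training"], tag_library)
```

Because each field holds terms in both languages, a Chinese token list and an English token list on the same topic receive the same label, which is the property the abstract relies on for cross-language queries.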

Description

Technical field

[0001] The invention belongs to the technical field of data classification, and in particular relates to a Chinese and English paper data classification and query method.

Background art

[0002] A knowledge base is a collection of knowledge that stores, organizes, and processes knowledge and provides knowledge services. With the help of a knowledge base, it is possible to better understand and discover the research status and development trends in a given field. At the same time, establishing knowledge bases in various industries has gradually become the foundation for carrying out knowledge management. Since English is an international common language and there are countless excellent papers in English, it is imperative to include both Chinese and English papers when building the knowledge base.

[0003] There are two important steps in the construction of the knowledge base: one is to classify the papers, that is, which fi...


Application Information

Patent Type & Authority: Patents (China)
IPC (8): G06F16/35, G06F16/36, G06F16/33, G06F16/38, G06F40/284, G06F40/58
CPC: G06F16/3344, G06F16/35, G06F16/36, G06F16/381, G06F40/284, G06F40/58
Inventor: 康锐文冯凯王元卓
Owner: 中科大数据研究院