Method for automatic identification of English author entities using an entropy model based on a non-parallel corpus

A technique based on a non-parallel corpus, applied in natural language data processing, instruments, electrical digital data processing, etc., that targets a problem for which no mature solution yet exists

Pending Publication Date: 2022-03-22
Institute of Medical Information, Chinese Academy of Medical Sciences (中国医学科学院医学信息研究所)

AI Technical Summary

Problems solved by technology

[0005] However, there is still no mature solution for the align...

Examples

Embodiment Construction

[0033] As shown in Figure 1, step 1 is to construct a Chinese-English non-parallel corpus.

[0034] The present invention collects the titles, authors, institutions, keywords, and abstracts of papers in the PubMed database whose country is China. From the Wanfang and CNKI Chinese literature databases, it collects the Chinese and English titles, author names, institutions, keywords, and abstracts of papers in the field of Chinese medicine and health.

[0035] Step 2: based on the documents in the non-parallel corpus constructed in step 1, generate a dictionary of personal names and institution names.
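Steps 1 and 2 can be pictured with a minimal Python sketch. The record layout (`lang`, `authors`, `institutions`), the function name `build_dictionaries`, and all sample names are assumptions made for illustration; the visible text does not specify how the corpus or the dictionary is stored.

```python
from collections import defaultdict

def build_dictionaries(records):
    """Collect the distinct person and institution names found per language."""
    names = defaultdict(set)
    institutions = defaultdict(set)
    for rec in records:
        lang = rec["lang"]
        # Strip whitespace and skip empty entries so duplicates collapse cleanly.
        names[lang].update(a.strip() for a in rec.get("authors", []) if a.strip())
        institutions[lang].update(
            i.strip() for i in rec.get("institutions", []) if i.strip()
        )
    return dict(names), dict(institutions)

# Made-up records standing in for PubMed (English) and Wanfang/CNKI (Chinese) entries.
records = [
    {"lang": "en", "authors": ["Zhang Wei", "Li Na"],
     "institutions": ["Example Medical University"]},
    {"lang": "zh", "authors": ["张伟", "李娜"], "institutions": ["示例医科大学"]},
    {"lang": "zh", "authors": ["张伟 "], "institutions": ["示例医科大学"]},
]

names, institutions = build_dictionaries(records)
print(sorted(names["zh"]))
print(sorted(institutions["en"]))
```

Because the corpus is non-parallel, the English and Chinese entries are kept in separate sets here rather than paired; any pairing would come from the later feature functions and model.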

[0036] Step 3: construct the transliteration feature functions F1 and F2 for the authors of English and Chinese literature.

[0037] A Chinese author name is written CN = CNx + CNm, where CNx consists of 1-2 Chinese characters and CNm consists of 1-3 Chinese characters. Each Chinese character is converted into pinyin, expressed as {CNx_11, CNx_12, ......
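The decomposition CN = CNx + CNm can be sketched in Python as follows. This is a hedged illustration, not the patent's procedure: it assumes CNx is the surname part and CNm the given-name part (the visible text does not say so), and `PINYIN`, `split_candidates`, and `to_pinyin` are hypothetical names. The tiny pinyin table is a stand-in for a full character-to-pinyin resource.

```python
# Toy character-to-pinyin table, for illustration only.
PINYIN = {"欧": "ou", "阳": "yang", "张": "zhang", "伟": "wei", "明": "ming"}

def split_candidates(name):
    """Enumerate (CNx, CNm) splits with 1-2 characters in CNx and 1-3 in CNm."""
    splits = []
    for k in (1, 2):
        cnx, cnm = name[:k], name[k:]
        if len(cnx) == k and 1 <= len(cnm) <= 3:
            splits.append((cnx, cnm))
    return splits

def to_pinyin(chars):
    """Convert each Chinese character to its pinyin syllable via the toy table."""
    return [PINYIN[ch] for ch in chars]

for name in ("张伟", "欧阳明"):
    for cnx, cnm in split_candidates(name):
        print(name, to_pinyin(cnx), to_pinyin(cnm))
```

A three-character name such as 欧阳明 yields two candidate splits (a one-character or a two-character CNx), which is why the length constraints alone do not determine the split.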

Abstract

The invention relates to a method for automatically identifying English author entities using an entropy model based on a non-parallel corpus, and belongs to the field of artificial intelligence. The method comprises the following steps: first, constructing a Chinese-English non-parallel corpus of medical literature abstracts and generating a dictionary of personal names and institutions from it; then, constructing a transliteration feature function, an institution feature function, and a paper topic similarity feature function for English literature authors; and finally, training a maximum entropy model to obtain recommended Chinese names for each English author. The method addresses the problem of finding the Chinese counterpart of an English author name, and has broad application prospects in areas such as automatic translation and scholar profiling.
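The final step can be illustrated with a generic maximum entropy (conditional log-linear) ranker in Python. This is a sketch under stated assumptions, not the patent's implementation: the three feature values per candidate stand for the transliteration, institution, and topic similarity features named above, but their numeric values, the function names `probs` and `train`, and the learning rate and epoch count are all invented for illustration.

```python
import math

def probs(weights, cand_feats):
    """p(c) proportional to exp(w . f(c)), normalized over the candidate list."""
    scores = [sum(w * f for w, f in zip(weights, feats)) for feats in cand_feats]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def train(examples, n_features, lr=0.5, epochs=200):
    """Stochastic gradient ascent on the conditional log-likelihood.

    examples: list of (candidate feature vectors, index of the correct candidate).
    The gradient for weight k is observed f_k minus the model's expected f_k.
    """
    w = [0.0] * n_features
    for _ in range(epochs):
        for cand_feats, gold in examples:
            p = probs(w, cand_feats)
            for k in range(n_features):
                expected = sum(pi * feats[k] for pi, feats in zip(p, cand_feats))
                w[k] += lr * (cand_feats[gold][k] - expected)
    return w

# Made-up features per candidate: [transliteration, institution, topic similarity].
examples = [
    ([[1.0, 1.0, 0.8], [1.0, 0.0, 0.1], [0.0, 0.0, 0.2]], 0),
    ([[1.0, 0.0, 0.3], [1.0, 1.0, 0.7]], 1),
]

weights = train(examples, n_features=3)
ranking = probs(weights, examples[0][0])
best = max(range(len(ranking)), key=ranking.__getitem__)
print(best, [round(p, 3) for p in ranking])
```

Sorting candidates by the resulting probabilities gives a recommendation list; candidates that share a transliteration are separated by the institution and topic features.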

Description

Technical field

[0001] The invention relates to an entropy-model-based method for automatically identifying English and Chinese author entities, and belongs to the field of artificial intelligence.

Background

[0002] The alignment of Chinese and English named entities, especially personal names, has long been an important topic in natural language processing, and it plays an important role in machine translation and cross-language information retrieval.

[0003] Scholars have studied alignment using different methods. One approach performs named entity recognition in both languages and then uses an alignment model to find the correspondence between the two. The other identifies named entities in only one language and then uses an alignment model that integrates multiple features to find their corresponding translations in the other language.

[0004] The alignme...

Application Information

IPC(8): G06F40/295, G06F40/284
CPC: G06F40/295, G06F40/284
Inventor: 高东平, 张冉, 魏晓瑶, 秦奕, 池慧
Owner: Institute of Medical Information, Chinese Academy of Medical Sciences (中国医学科学院医学信息研究所)