Automatic identification method for English author entities based on a non-parallel corpus and an entropy model
A technology involving corpora and non-parallel data, applied in natural language data processing, instruments, electrical digital data processing, etc. It addresses problems such as the lack of mature solutions.
Detailed Description of Embodiments
[0033] As shown in Figure 1, the method first performs step 1: constructing a Chinese-English non-parallel corpus.
[0034] The present invention collects the titles, authors, institutions, keywords, and abstracts of papers in the PubMed database that China is not China. It also collects the Chinese and English titles, author names, institutions, keywords, and abstracts of papers in the field of Chinese medicine and health from the Wanfang and CNKI Chinese literature databases.
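The corpus in step 1 can be pictured as two unaligned collections of bibliographic records. The following is a minimal sketch under stated assumptions: the record class `PaperRecord`, the container `NonParallelCorpus`, and all field names are hypothetical, and the only routing rule taken from the text is that PubMed supplies English records while Wanfang and CNKI supply Chinese records (with English versions of some fields).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PaperRecord:
    """One bibliographic record; the field names are hypothetical."""
    source: str                      # "PubMed", "Wanfang", or "CNKI"
    title: str
    authors: List[str]
    institutions: List[str]
    keywords: List[str]
    abstract: str
    # Wanfang/CNKI records also carry English versions of some fields.
    title_en: str = ""
    authors_en: List[str] = field(default_factory=list)


@dataclass
class NonParallelCorpus:
    """English-side and Chinese-side records stored separately.

    The two sides are not aligned: no record is assumed to be a
    translation of any record on the other side.
    """
    english: List[PaperRecord] = field(default_factory=list)
    chinese: List[PaperRecord] = field(default_factory=list)

    def add(self, record: PaperRecord) -> None:
        # PubMed supplies the English side; Wanfang and CNKI the Chinese side.
        if record.source == "PubMed":
            self.english.append(record)
        elif record.source in ("Wanfang", "CNKI"):
            self.chinese.append(record)
        else:
            raise ValueError(f"unknown source: {record.source}")
```

Keeping the two sides in separate lists, rather than as aligned pairs, reflects the "non-parallel" property: later steps can only relate the sides through features such as name transliteration, not through document alignment.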
[0035] Step 2: a dictionary of person names and institution names is generated from the documents in the non-parallel corpus constructed in step 1.
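The text does not say how the dictionary is built. One simple reading is a frequency count of every author and institution string seen in the corpus. The sketch below assumes exactly that: each record is a plain dict with the hypothetical keys `"authors"` and `"institutions"`, and the only normalization is whitespace collapsing. A production system would likely need more normalization (case, punctuation, abbreviations).

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def _normalize(name: str) -> str:
    # Collapse internal runs of whitespace and trim the ends.
    return " ".join(name.split())


def build_name_dictionaries(
    records: Iterable[Dict[str, List[str]]],
) -> Tuple[Counter, Counter]:
    """Return (person-name counts, institution-name counts) over all records.

    Each record is assumed to be a dict with the hypothetical keys
    "authors" and "institutions", each mapping to a list of strings.
    Empty names (after normalization) are skipped.
    """
    persons: Counter = Counter()
    institutions: Counter = Counter()
    for rec in records:
        persons.update(n for n in map(_normalize, rec.get("authors", [])) if n)
        institutions.update(n for n in map(_normalize, rec.get("institutions", [])) if n)
    return persons, institutions
```

Returning counts instead of bare sets keeps frequency information, which a downstream statistical model could use as a prior; if only membership matters, `set(persons)` gives the plain dictionary.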
[0036] Step 3: construct the transliteration feature functions F1 and F2 for English and Chinese literature authors.
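The definitions of F1 and F2 are cut off in this excerpt. The sketch below only illustrates how two binary transliteration features could plug into an entropy model, assuming the title's "entropy model" is a standard maximum-entropy (log-linear) classifier, p(y|x) = exp(Σ λᵢ fᵢ(x, y)) / Z(x). The feature bodies `f1` (surname agreement) and `f2` (given-name agreement), the labels, the `(surname, given name)` tuple representation, and the weights are all hypothetical stand-ins, not the patent's definitions.

```python
import math
from typing import Callable, Dict, Sequence, Tuple

Name = Tuple[str, str]      # (surname, given name); hypothetical representation
Pair = Tuple[Name, Name]    # (English author name, pinyin of a Chinese author name)
Feature = Callable[[Pair, str], float]

LABELS = ("match", "non-match")


def _norm(s: str) -> str:
    # Lowercase and drop hyphens/spaces: "Xiao-Ming" -> "xiaoming".
    return s.lower().replace("-", "").replace(" ", "")


def f1(x: Pair, y: str) -> float:
    """Hypothetical F1: fires when the surnames agree and the label is 'match'."""
    en, py = x
    return 1.0 if y == "match" and _norm(en[0]) == _norm(py[0]) else 0.0


def f2(x: Pair, y: str) -> float:
    """Hypothetical F2: fires when the given names agree and the label is 'match'."""
    en, py = x
    return 1.0 if y == "match" and _norm(en[1]) == _norm(py[1]) else 0.0


def maxent_prob(x: Pair, features: Sequence[Feature],
                weights: Sequence[float]) -> Dict[str, float]:
    """Log-linear model: p(y|x) = exp(sum_i w_i * f_i(x, y)) / Z(x)."""
    scores = {y: sum(w * f(x, y) for f, w in zip(features, weights)) for y in LABELS}
    z = sum(math.exp(s) for s in scores.values())
    return {y: math.exp(s) / z for y, s in scores.items()}
```

For example, `maxent_prob((("Wang", "Xiao-Ming"), ("wang", "xiao ming")), [f1, f2], [1.5, 2.0])` fires both features for the "match" label, so "match" receives most of the probability mass. The weights here are illustrative; in a trained model they would be estimated from labeled data.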
[0037] A Chinese author name is written as $CN = CN_x + CN_m$, where $CN_x$ consists of 1 to 2 Chinese characters and $CN_m$ consists of 1 to 3 Chinese characters. Each Chinese character is converted into pinyin, expressed as {$CN_{x11}$, $CN_{x12}$, ......
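The length constraints in [0037] fully determine which ways a Chinese name can be split into $CN_x$ and $CN_m$, so the candidate splits can be enumerated directly before each character is converted to pinyin. The sketch below assumes (this is not stated in the excerpt) that $CN_x$ is the surname and $CN_m$ the given name. The `PINYIN` table is a tiny hypothetical demo table; a real system would need a full character-to-pinyin dictionary and a policy for polyphonic characters.

```python
from typing import Dict, List, Tuple

# Tiny hypothetical character-to-pinyin table for the demo only.
PINYIN: Dict[str, str] = {
    "王": "wang", "小": "xiao", "明": "ming",
    "欧": "ou", "阳": "yang", "娜": "na",
}


def split_candidates(cn: str) -> List[Tuple[str, str]]:
    """All splits CN = CNx + CNm with |CNx| in 1..2 and |CNm| in 1..3."""
    out: List[Tuple[str, str]] = []
    for k in (1, 2):                 # length of CNx (assumed surname)
        m = len(cn) - k              # length of CNm (assumed given name)
        if 1 <= m <= 3:
            out.append((cn[:k], cn[k:]))
    return out


def to_pinyin(chars: str) -> List[str]:
    """Convert each character to pinyin; raises KeyError if a character is missing from the table."""
    return [PINYIN[c] for c in chars]


def pinyin_candidates(cn: str) -> List[Tuple[List[str], List[str]]]:
    """Pinyin syllable lists for every admissible (CNx, CNm) split."""
    return [(to_pinyin(x), to_pinyin(m)) for x, m in split_candidates(cn)]
```

A three-character name such as 王小明 yields two splits, (王, 小明) and (王小, 明), so both a single-character and a compound-surname reading are kept for later scoring. Names shorter than two or longer than five characters yield no candidates under these constraints.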