A Domain Adaptive Sentence Alignment System Based on Self-Bootstrapping
A self-bootstrapping sentence-pair alignment technology, applied to text processing in the field of natural language processing. It addresses the low alignment quality, lack of domain specificity, and high time and labor cost of existing sentence alignment, with the effects of saving resources, keeping operation simple and convenient, and improving alignment quality.
Embodiment Construction
[0029] As shown in Figure 1, the architecture of this system comprises four parts, and the implementation of each part is as follows:
[0030] 1. Web page processing module
[0031] This part takes webpage corpus as the main processing object. Webpage corpus refers to the parallel or comparable HTML files that are directly crawled from the web. Through the analysis of the format and related features of specific web pages, regular expressions are used to extract the corresponding text, including Chinese text and English text.
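The regular-expression extraction described above can be sketched as follows. This is a minimal illustration, not the patent's actual implementation: the function names `extract_text_lines` and `split_by_language`, the set of block-level tags treated as line breaks, and the use of the CJK Unified Ideographs range to recognize Chinese text are all assumptions made for the example.

```python
import re
from html import unescape

# Assumed patterns; the source does not specify which expressions are used.
SCRIPT_STYLE_RE = re.compile(r"<(script|style)\b.*?</\1\s*>", re.IGNORECASE | re.DOTALL)
BLOCK_END_RE = re.compile(r"<\s*(br|/p|/div|/li|/h[1-6]|/td)\b[^>]*>", re.IGNORECASE)
TAG_RE = re.compile(r"<[^>]+>")
CJK_RE = re.compile(r"[\u4e00-\u9fff]")  # CJK Unified Ideographs


def extract_text_lines(html):
    """Strip markup from an HTML string and return its non-empty text lines."""
    body = SCRIPT_STYLE_RE.sub(" ", html)   # drop script and style blocks
    body = BLOCK_END_RE.sub("\n", body)     # block-level closers become line breaks
    body = TAG_RE.sub(" ", body)            # remove all remaining tags
    body = unescape(body)                   # decode entities such as &amp;
    lines = [re.sub(r"\s+", " ", ln).strip() for ln in body.split("\n")]
    return [ln for ln in lines if ln]


def split_by_language(lines):
    """Put lines containing any Chinese character in one list, all others in another."""
    chinese, english = [], []
    for ln in lines:
        (chinese if CJK_RE.search(ln) else english).append(ln)
    return chinese, english


if __name__ == "__main__":
    page = "<html><body><p>Hello world.</p><p>你好，世界。</p></body></html>"
    zh, en = split_by_language(extract_text_lines(page))
    print(zh, en)
```

Regular expressions are fragile on malformed or deeply nested HTML, so in practice the patterns would be tuned to the format of the specific sites being crawled, as the paragraph above indicates.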
[0032] 2. English processing module
[0033] Drawing on the characteristics of English punctuation, this module performs sentence splitting, tokenization, stemming, and related processing.
[0034] Tokenization is the process of separating English words from the punctuation that follows them. Punctuation attached to a word usually interferes with recognition of the word. Since English texts often have special punctuation marks (such as he’s s...
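As a rough illustration of separating words from the punctuation that follows them, the sketch below uses two small regular expressions. It is an assumption-laden example rather than the method of the patent: the names `split_sentences` and `tokenize` are hypothetical, the sentence splitter is naive (it would also break after abbreviations such as "Mr."), and keeping word-internal apostrophes attached is only one possible way to treat forms like "he's".

```python
import re

# A word (letters or digits, optionally followed by apostrophe + letters),
# or any single non-space, non-alphanumeric character as its own token.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+(?:['’][A-Za-z]+)*|[^\sA-Za-z0-9]")

# Naive boundary: whitespace that follows a period, question mark or exclamation mark.
SENTENCE_BOUNDARY_RE = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text):
    """Split text into sentences at terminal punctuation followed by whitespace."""
    return [s for s in SENTENCE_BOUNDARY_RE.split(text.strip()) if s]


def tokenize(sentence):
    """Separate words from adjacent punctuation, keeping internal apostrophes."""
    return TOKEN_RE.findall(sentence)


if __name__ == "__main__":
    for sent in split_sentences("He's here, isn't he? Yes."):
        print(tokenize(sent))
```

With this pattern, a comma or question mark directly after a word becomes its own token, so the word itself can be matched against a dictionary or passed to a stemmer without the trailing punctuation.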