Methods for Fragmentation by Character Attributes of Documents

A technique for fragmenting documents by their character attributes, applied in the field of natural language processing, that promotes division of labor, improves accuracy, and raises translation productivity.

Active Publication Date: 2017-09-05
IOL WUHAN INFORMATION TECH CO LTD

AI Technical Summary

Problems solved by technology

[0004] In view of this, the purpose of the present invention is to propose a method for fragmenting documents according to their character attributes. It addresses two problems: how to assign each translation task to the translator best suited to it, and how to meet the standardization and measurability requirements that a cloud translation platform capable of large-scale parallel, distributed processing places on its multilingual input.



Examples


Embodiment Construction

[0049] The present invention is described in further detail below with reference to the accompanying drawings and embodiments. The specific embodiments described here only explain the present invention and do not limit it.

[0050] Figure 1 shows a flow chart of the method for fragmenting documents according to their character attributes. The method comprises the following steps:

[0051] S11. After word segmentation, determine the character attributes of all words and all sentences in each document;

[0052] S12. Match these character attributes against the established association between character attributes and level identifiers;

[0053] S13. Allocate each document according to its matched level identifier;

[0054] S14. Merge documents with the same level identifier.
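
Steps S11 to S14 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the source does not say which character attributes are measured or which level identifiers exist, so the attribute names (digit_ratio, long_word_ratio), the thresholds, the labels L1 to L3, and all function names are hypothetical. Whitespace splitting stands in for a real word segmenter.

```python
from collections import defaultdict

# Hypothetical association between character attributes and level identifiers.
# The source names neither the attributes nor the levels; these are placeholders.
LEVEL_RULES = [
    ("L3", lambda a: a["long_word_ratio"] > 0.3),
    ("L2", lambda a: a["digit_ratio"] > 0.2),
]
DEFAULT_LEVEL = "L1"


def character_attributes(doc: str) -> dict:
    """S11: segment the document into words and compute its character attributes."""
    words = doc.split()  # assumption: whitespace split replaces real word segmentation
    n = len(words) or 1  # avoid division by zero on empty documents
    return {
        "digit_ratio": sum(any(c.isdigit() for c in w) for w in words) / n,
        "long_word_ratio": sum(len(w) > 8 for w in words) / n,
    }


def match_level(attrs: dict) -> str:
    """S12: match attributes against the association table; the first rule that fires wins."""
    for level, predicate in LEVEL_RULES:
        if predicate(attrs):
            return level
    return DEFAULT_LEVEL


def fragment_by_level(docs: list[str]) -> dict[str, list[str]]:
    """S13 and S14: allocate each document by its level and merge documents sharing a level."""
    merged = defaultdict(list)
    for doc in docs:
        merged[match_level(character_attributes(doc))].append(doc)
    return dict(merged)
```

Calling fragment_by_level on a list of document strings returns a dictionary keyed by level identifier, where each value is the merged group of documents for that level.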

[0055] Based on the above method, a pr...


Abstract

The invention discloses a method for fragmenting documents according to their character attributes. The method includes: determining, after word segmentation, the character attributes of all words and all sentences in each document; matching these character attributes against the established association between character attributes and level identifiers; allocating each document according to its matched level identifier; and merging documents that have the same level identifier. By distributing translation fragments of different difficulty levels to appropriate translators, the method promotes division of labor and improves translation productivity per unit.
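
The distribution of same-level fragments to appropriate translators might look like the sketch below. The source does not describe how translators are selected, so everything here is an assumption made for illustration: each translator is registered for exactly one level, fragments are spread round-robin within a level, and the names dispatch, batches, and translators are hypothetical.

```python
from itertools import cycle


def dispatch(batches: dict[str, list[str]],
             translators: dict[str, str]) -> dict[str, list[str]]:
    """Spread each level's fragments round-robin over translators of that level.

    batches maps a level identifier to its merged fragments;
    translators maps a translator name to the single level they handle (assumption).
    """
    # Group translator names by the level they are registered for.
    pool_by_level: dict[str, list[str]] = {}
    for name, level in translators.items():
        pool_by_level.setdefault(level, []).append(name)

    assignment: dict[str, list[str]] = {name: [] for name in translators}
    for level, fragments in batches.items():
        pool = pool_by_level.get(level)
        if not pool:
            raise ValueError(f"no translator registered for level {level}")
        # cycle() repeats the pool so fragments alternate between translators.
        for fragment, name in zip(fragments, cycle(pool)):
            assignment[name].append(fragment)
    return assignment
```

A real platform would probably also weigh workload, language pair, and deadlines; the round-robin rule is only the simplest choice that keeps every fragment with a translator of the matching level.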

Description

Technical Field

[0001] The invention belongs to the technical field of natural language processing, and in particular relates to a method for fragmenting documents according to their character attributes.

Background Art

[0002] In today's globalized world, political, economic, and cultural exchanges are becoming more frequent and contact between people of different countries is intensifying, which increases the demand for translation. At the same time, with the rise of the Internet, the amount of information in various languages has grown explosively, and so has the demand for converting information between languages.

[0003] At present, the most commonly used fragmentation method is to fragment by a fixed number of words or by natural paragraphs. This method is fast and easy, and can be completed without spending a lot of computing reso...
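
For comparison, the conventional approach mentioned in [0003], fragmenting by a fixed number of words or by natural paragraphs, can be sketched in a few lines of Python. The function names and the choice of a blank line as the paragraph separator are assumptions made for illustration.

```python
def fragment_fixed(text: str, words_per_fragment: int) -> list[str]:
    """Cut the text into fragments of at most words_per_fragment words."""
    if words_per_fragment <= 0:
        raise ValueError("words_per_fragment must be positive")
    words = text.split()
    return [
        " ".join(words[i:i + words_per_fragment])
        for i in range(0, len(words), words_per_fragment)
    ]


def fragment_paragraphs(text: str) -> list[str]:
    """Cut the text at blank lines, treating each natural paragraph as one fragment."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]
```

Neither function looks at what a fragment contains, which is why this approach is cheap but cannot tell an easy fragment from a difficult one.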


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/28
Inventor: 江潮
Owner: IOL WUHAN INFORMATION TECH CO LTD