Method for fragmenting according to character attributes of documents

A method for fragmenting documents by character attributes, applied in the field of natural language processing

Active Publication Date: 2014-04-30
IOL WUHAN INFORMATION TECH CO LTD

AI Technical Summary

Problems solved by technology

[0004] In view of this, the purpose of the present invention is to propose a method for fragmentation according to the character attributes of documents to solve the problem of how to assign the most suitable translation

Examples

Embodiment Construction

[0050] The present invention will be described in further detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the described specific embodiments are only used to explain the present invention, not to limit the present invention.

[0051] Figure 1 shows a flow chart of the method for fragmenting according to the character attributes of documents in the present invention. The specific steps of the method are as follows:

[0052] S11. After word segmentation, determine the character attributes of all words and all sentences in each document;

[0053] S12. Match the character attributes against the pre-established associations between character attributes and level identifiers;

[0054] S13. Attach the matched level identifier to the corresponding document;

[0055] S14. Merge documents with the same level identifier.
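
As a minimal sketch of steps S11 to S14, the Python below tags each document with a level identifier and then merges documents that share one. The visible text does not define the concrete character attributes, the level identifiers, or how several matched levels are combined, so the attribute names (`plain`, `numeric`, `long_words`), the table `ATTRIBUTE_LEVELS`, the whitespace-based `segment` function, and the "highest matched level wins" rule are all hypothetical placeholders, not the patent's own definitions.

```python
from collections import defaultdict

# Hypothetical association table used in S12: attribute -> level identifier.
# The source does not list real attributes or levels; these are placeholders.
ATTRIBUTE_LEVELS = {
    "plain": 1,
    "numeric": 2,
    "long_words": 3,
}


def segment(sentence):
    """Stand-in for word segmentation: a plain whitespace split."""
    return sentence.split()


def word_attribute(word):
    """S11 (assumed rule): derive one character attribute for a word."""
    if any(ch.isdigit() for ch in word):
        return "numeric"
    if len(word) >= 10:
        return "long_words"
    return "plain"


def document_attributes(sentences):
    """S11: collect the attributes of every word in every sentence."""
    return {word_attribute(w) for s in sentences for w in segment(s)}


def match_level(attributes):
    """S12: look up each attribute; assume the highest matched level wins."""
    return max((ATTRIBUTE_LEVELS[a] for a in attributes), default=1)


def fragment_by_level(documents):
    """S13 and S14: tag each document with a level, then merge by level."""
    merged = defaultdict(list)
    for name, sentences in documents.items():
        level = match_level(document_attributes(sentences))
        merged[level].append(name)
    return dict(merged)
```

Calling `fragment_by_level({"a": ["the cat sat"], "b": ["invoice 42 is due"]})` groups document names under their matched level. A real system would replace `segment` with a proper word segmenter and load the association table from wherever the associations are maintained.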

[0056] Based on the above method, a ...

Abstract

The invention discloses a method for fragmenting documents according to their character attributes. The method comprises: determining the character attributes of all words and sentences of the documents after word segmentation; matching those attributes against pre-established associations between character attributes and level identifiers; attaching the matched level identifiers to the corresponding documents; and merging the documents that share the same level identifier. With this method, translation fragments of different difficulty levels can be assigned to suitable translators, which makes assignment more efficient and improves an enterprise's translation capacity.

Description

Technical field

[0001] The invention belongs to the technical field of natural language processing, and in particular relates to a method for fragmenting documents according to character attributes.

Background technique

[0002] In today's globalized world, political, economic, and cultural exchanges are increasingly frequent, and contact between people of different countries is increasingly close, which drives up the demand for translation. At the same time, with the rise of the Internet, the amount of information in various languages has grown explosively, and so has the demand for converting information between languages.

[0003] At present, the most commonly used fragmentation method is to fragment by a fixed number of words or by natural paragraphs. This method is fast and easy, and can be completed without spending a lot of computing reso...

Application Information

IPC(8): G06F17/28
Inventor: 江潮
Owner: IOL WUHAN INFORMATION TECH CO LTD