Method and system for processing text based on DNA sequences

A DNA sequence-based text processing technology, applied to text processing methods and systems, that addresses three problems of existing systems: single-function tasks, mutual incompatibility, and low execution efficiency.

Inactive Publication Date: 2011-09-28
INST OF RADIATION MEDICINE ACAD OF MILITARY MEDICAL SCI OF THE PLA

AI Technical Summary

Problems solved by technology

[0010] The present invention also provides a DNA sequence-based text processing system, which assigns DNA sequence codes to the characters in a text and then processes the text with DNA sequence processing methods. This solves the problems of existing text processing systems, namely single-function tasks, low execution efficiency, and mutual incompatibility, and achieves comprehensive and efficient analysis of text.



Examples


Embodiment 1

[0044] The technical solutions of the embodiments of the present invention are further described below in conjunction with the accompanying drawings and specific embodiments. Example 1: the character distribution module 11 in the text processing system 10 shown in Figure 1 converts the characters in a text into DNA sequences according to the method of the present invention.

[0045] Select a text containing about 7000 different characters that includes the character string "Chinese Text Mining". According to the method of the present invention, representing all the characters in the text requires quaternary numbers of 7 digits, since 4^6 = 4096 < 7000 ≤ 4^7 = 16384. Taking the string "Chinese Text Mining" as an example, the process of converting the characters of this embodiment into DNA sequences is described in detail below:
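The digit count stated above can be checked with a short calculation. The following is a minimal sketch, not the patent's own implementation; the function name `quaternary_digits_needed` is hypothetical, and it assumes the decimal numbers assigned to characters start at 0.

```python
def quaternary_digits_needed(n_chars: int) -> int:
    """Return the smallest k (at least 1) such that 4**k >= n_chars.

    Assumption: characters are numbered 0 .. n_chars - 1, so k quaternary
    digits are enough when 4**k >= n_chars.
    """
    if n_chars < 1:
        raise ValueError("n_chars must be at least 1")
    k = 1
    while 4 ** k < n_chars:
        k += 1
    return k


# About 7000 distinct characters, as in this embodiment.
print(quaternary_digits_needed(7000))
```

Because each quaternary digit maps to one nucleotide, the same value is also the length of the DNA code assigned to each character.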

[0046] First use the decimal number allocation module 101 to distribute the decimal numbe...

Embodiment 2

[0051] Example 2: the text processing system 20 shown in Figure 2 performs spectral description of two or more texts according to the method of the present invention.

[0052] Twenty texts recently published in the journal "Progress in Biochemistry and Biophysics" (listed in Table 1 and referred to as the 20 PIBB texts) and twenty texts retrieved from the CNKI text database by searching for the keyword "text mining" (listed in Table 2 and referred to as the 20 CNKI texts) were selected as clustering objects:

[0053] Table 1

[0054]

[0055]

[0056] Table 2

[0057]

[0058] According to the method of Embodiment 1, the total number of different characters in the 40 texts is first counted, which is 3243. The decimal number distribution module 201 in the character distribution module 21 then assigns decimal numbers to the characters in the 20 PIBB texts. The decimal numbers are assigned in the order ...

Embodiment 3

[0081] Example 3: the text processing system 20 shown in Figure 2 performs sequence similarity comparison on two texts according to the method of the present invention.

[0082] Two texts are randomly selected from the 40 texts in Example 2 above for sequence similarity comparison. The two selected texts are "Application Research of Text Mining in Multicultural Communication Platform" (text_01) and "Protein Interaction Research Progress in Text Mining of Function" (text_02). One passage from each of text_01 and text_02 is taken as an example to describe the sequence similarity comparison process of this embodiment in detail. The passage from text_01 is named the "query text (Query.txt)" and the passage from text_02 is named the "target text (Subject.txt)".

[0083] The query text (Query.txt) is:

[0084] "Text mining based on concept lattice, text mining is to discover potential concepts and the relationship between concepts from unstructured texts. As an effective technology for discovering p...
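The excerpt does not state which DNA sequence comparison tool is applied to the query and target texts. As an illustrative stand-in only, the sketch below compares two DNA-encoded strings with Python's standard `difflib.SequenceMatcher`; this is a swapped-in general-purpose matcher, not the patent's method, and the function names `dna_similarity` and `longest_shared_segment` are hypothetical.

```python
from difflib import SequenceMatcher


def dna_similarity(query_dna: str, subject_dna: str) -> float:
    """Similarity ratio in [0, 1] between two DNA-encoded texts.

    Uses difflib's ratio: 2 * matched characters / total characters.
    autojunk=False keeps frequent nucleotides from being treated as junk.
    """
    matcher = SequenceMatcher(None, query_dna, subject_dna, autojunk=False)
    return matcher.ratio()


def longest_shared_segment(query_dna: str, subject_dna: str) -> str:
    """Longest contiguous run of nucleotides present in both sequences."""
    matcher = SequenceMatcher(None, query_dna, subject_dna, autojunk=False)
    match = matcher.find_longest_match(0, len(query_dna), 0, len(subject_dna))
    return query_dna[match.a:match.a + match.size]


# Toy sequences standing in for the encoded Query.txt and Subject.txt.
print(dna_similarity("AACGTT", "AACGTT"))
print(longest_shared_segment("AACGTT", "GGCGTA"))
```

A shared segment found at the nucleotide level may not line up with character boundaries, so an implementation would likely need to trim matches to multiples of the per-character code length.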


Abstract

The invention provides a method and system for processing text based on DNA sequences. The method comprises the following steps: allocating DNA sequence codes to the characters of two or more texts; and performing similarity analysis on the two or more texts, once allocated DNA sequence codes, by using a DNA sequence processing method. The characters are one or more kinds of digits, letters, words or symbols, and the letters or words may belong to one or more languages. The allocation of DNA sequence codes to the characters of the two or more texts is realized by the following steps: allocating decimal numbers to the characters of the two or more texts; converting the decimal numbers into quaternary numbers; making each of the digits 0, 1, 2 and 3 in the quaternary numbers correspond to one of the four deoxyribonucleotides; and converting the quaternary numbers into the DNA sequence codes. The invention also provides a system for realizing the method. The method and system provided by the invention do not depend on an existing database or on keyword extraction, place no restriction on the number of characters or character combinations, and can realize efficient and comprehensive analysis of text information.
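The allocation steps in the abstract (character to decimal number, decimal to quaternary, quaternary digit to nucleotide) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the digit mapping 0, 1, 2, 3 to A, C, G, T and the numbering of characters in order of first appearance are both assumptions, and all function names are hypothetical.

```python
# Assumption: digit-to-nucleotide mapping 0->A, 1->C, 2->G, 3->T.
# The source only says each digit corresponds to one of the four nucleotides.
DIGIT_TO_BASE = "ACGT"


def build_codebook(texts):
    """Assign decimal numbers 0, 1, 2, ... to every distinct character.

    Assumption: numbers follow the order of first appearance across texts.
    """
    codes = {}
    for text in texts:
        for ch in text:
            codes.setdefault(ch, len(codes))
    return codes


def code_width(n_chars):
    """Number of quaternary digits needed to give n_chars distinct codes."""
    width = 1
    while 4 ** width < n_chars:
        width += 1
    return width


def to_quaternary(number, width):
    """Convert a decimal number to a fixed-width list of base-4 digits."""
    digits = []
    for _ in range(width):
        number, remainder = divmod(number, 4)
        digits.append(remainder)
    if number:
        raise ValueError("number does not fit in the given width")
    return digits[::-1]


def encode(text, codes, width):
    """Replace each character by its fixed-width DNA sequence code."""
    return "".join(
        DIGIT_TO_BASE[digit]
        for ch in text
        for digit in to_quaternary(codes[ch], width)
    )


texts = ["text mining", "data mining"]
codes = build_codebook(texts)
width = code_width(len(codes))
print(encode("mining", codes, width))
```

Using a fixed width for every character keeps the encoding reversible: the DNA string can be cut into equal-length pieces and each piece looked up in the inverted codebook.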

Description

Technical field

[0001] The present invention relates to an information processing method and system, in particular to a DNA sequence-based text processing method and system.

Background

[0002] Spectral description, similarity comparison and cluster analysis of text are routine analysis methods in text processing. At present there are many kinds of text processing systems, but most of them complete only one of these tasks. For example, the academic paper detection system of China National Knowledge Infrastructure (CNKI) and the ROST anti-plagiarism system developed by Associate Professor Shen Yang of Wuhan University and his team only perform similarity comparison of texts.

[0003] The spectral description of text refers to analyzing one or more texts at the level of characters (single characters or multi-character combinations), by fixing all possible characters or character combinations on the abscissa, and then counting their presence in the text o...
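As a rough illustration of the spectral description outlined in [0003], the sketch below counts how often each character combination of length n occurs in a text and reports the counts along a fixed axis. The paragraph is truncated in the source, so the exact statistic used by the patent is not known; plain occurrence counts are an assumption here, and the function names are hypothetical.

```python
from collections import Counter


def ngram_counts(text, n):
    """Count every run of n consecutive characters in the text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def spectrum(text, n, axis):
    """Counts of each combination on a fixed axis (the abscissa).

    Combinations absent from the text get a count of 0, so spectra of
    different texts over the same axis can be compared position by position.
    """
    counts = ngram_counts(text, n)
    return [counts[gram] for gram in axis]


text_a = "abab"
text_b = "abba"
# Shared axis: every 2-character combination seen in either text.
axis = sorted(set(ngram_counts(text_a, 2)) | set(ngram_counts(text_b, 2)))
print(axis)
print(spectrum(text_a, 2, axis))
print(spectrum(text_b, 2, axis))
```

Building the axis from the union of both texts keeps the two spectra the same length, which is what makes them directly comparable.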


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/22, G06F17/27
Inventor: 张成岗, 周扬, 屈武斌
Owner: INST OF RADIATION MEDICINE ACAD OF MILITARY MEDICAL SCI OF THE PLA