Code storage method, data storage structure of texts and method for compressed storage of texts and statistics output

A technology for storing data and text, applied in the field of data structures, that addresses high memory usage and the long time needed to count and match words.

Active Publication Date: 2016-09-14
DALIAN MARITIME UNIVERSITY

AI Technical Summary

Problems solved by technology

These word segmentation and statistical methods are simple and easy to implement, but they make subsequent semantic processing and sentence statistics cumbersome. For example:
[0028] (1) Counting and matching words takes too long;
[0029] (2) Because the poems are stored in a non-compact form, splicing and matching words is very complicated; that is, it uses a lot of memory and wastes a lot of time when counting words.

Embodiment Construction

[0080] To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings:

[0081] Example: assume that the current text consists of three sentences: "HE IS A DOCTOR. SHE IS A DOCTOR, TOO. I AM A DIRECTOR." (case conversion has already been completed).

[0082] The value corresponding to the user code of the first sentence "HE IS A DOCTOR" is calculated as follows:

[0083] (1) The first word "HE" can be represented by a base-32 value:

[0084] For example, the user code of the word "HE" is:

[0085] user code(H) * 32 + user code(E) = 8 * 32 + 5 = 261;

[0086] (2) The second word "IS" can be represented by a base-32 value:

[0087] For example, the user code of the word "IS" is: user code(I) * 32 + user code(S) = 9 * 32 + 19 = 307;
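The arithmetic for "HE" and "IS" can be reproduced with a short Python sketch. It assumes that the user code of a capital letter is its position in the alphabet (A=1 ... Z=26), which matches the values H=8, E=5, I=9 and S=19 used in the text. The codes for punctuation marks are not given in this excerpt, so they are left out, and the function names `user_code` and `word_value` are hypothetical.

```python
BASE = 32  # the text treats each character's user code as one base-32 digit


def user_code(ch: str) -> int:
    """Assumed user code of a capital letter: A=1, B=2, ..., Z=26."""
    if len(ch) != 1 or not ("A" <= ch <= "Z"):
        raise ValueError(f"no user code assumed for {ch!r}")
    return ord(ch) - ord("A") + 1


def word_value(word: str) -> int:
    """Read the word's user codes as the digits of one base-32 number."""
    value = 0
    for ch in word:
        value = value * BASE + user_code(ch)
    return value


print(word_value("HE"))  # 8 * 32 + 5
print(word_value("IS"))  # 9 * 32 + 19
```

Because every assumed code is below 32, each character occupies exactly one base-32 digit, so distinct words of the same length always map to distinct values.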

...

Abstract

The invention discloses a code storage method and a data storage structure for texts, the core of which is a base-32 (duotrigesimal) user code defined over the capital letters and the necessary punctuation marks. The base-32 user codes of every three characters in each word are converted to binary and stored in one 16-bit binary storage unit. In the data storage structure, texts are stored in a hash tree. The hash tree comprises multiple table nodes, ordered to correspond to the base-32 user codes. Each table node is the head node of a first-level linked list, in which words with the same initial letter or the same characters can be stored. The subsequent nodes of the first-level linked list are word nodes, which comprise fields recording the word length and the number of times the word is repeated in the text. Each word node is the head node of a second-level linked list. The subsequent nodes of the second-level linked list are storage nodes; each storage node stores the binary storage units of the current word and the character groups of repeated words, divided according to the rule of the code storage method of claim 2.
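The structure described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes that letters have the user codes A=1 ... Z=26, that a final group of fewer than three characters is packed the same way as a full group, and that Python lists may stand in for the first-level and second-level linked lists. The names `pack_word`, `WordNode` and `TextIndex` are hypothetical, and punctuation codes are omitted because this excerpt does not list them.

```python
from dataclasses import dataclass
from typing import List

BASE = 32  # one base-32 digit (5 bits) per character


def user_code(ch: str) -> int:
    """Assumed user code of a capital letter: A=1, ..., Z=26."""
    return ord(ch) - ord("A") + 1


def pack_word(word: str) -> List[int]:
    """Pack each group of up to three characters into one 16-bit unit."""
    units = []
    for start in range(0, len(word), 3):
        value = 0
        for ch in word[start:start + 3]:
            value = value * BASE + user_code(ch)
        units.append(value)  # at most 32**3 - 1 = 32767, so it fits in 16 bits
    return units


@dataclass
class WordNode:
    length: int       # word length field
    units: List[int]  # stands in for the second-level list of storage nodes
    count: int = 1    # number of occurrences in the text


class TextIndex:
    def __init__(self) -> None:
        # One table slot per possible user code; each slot stands in for a
        # first-level list of words sharing the same initial character.
        self.table: List[List[WordNode]] = [[] for _ in range(BASE)]

    def add(self, word: str) -> None:
        units = pack_word(word)
        bucket = self.table[user_code(word[0])]
        for node in bucket:
            if node.length == len(word) and node.units == units:
                node.count += 1  # repeated word: only the counter changes
                return
        bucket.append(WordNode(len(word), units))

    def count(self, word: str) -> int:
        units = pack_word(word)
        for node in self.table[user_code(word[0])]:
            if node.length == len(word) and node.units == units:
                return node.count
        return 0


index = TextIndex()
for w in "HE IS A DOCTOR SHE IS A DOCTOR TOO I AM A DIRECTOR".split():
    index.add(w)
print(index.count("DOCTOR"), index.count("A"))
```

Matching a word then reduces to comparing a length and a few small integers within one bucket, and a repeated word costs only a counter increment, which is the kind of saving in time and memory the abstract is aiming at.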

Description

Technical field

[0001] The invention relates to a data structure for storing text characters that improves retrieval speed, and to a text storage method and a text retrieval method based on that data structure. It mainly relates to patent classification G06 Computing; Calculating; Counting; G06F Electrical digital data processing; G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions; G06F17/30 Information retrieval; database structures therefor.

Background art

[0002] The traditional character or text storage method is as follows:

[0003] Core: read a segment of the current text as the input string AS='HE IS A DOCTOR.'. Reading process:

[0004] When a capital letter is encountered, it is considered to be the beginning of the current sentence; when a space is encountered, it is considered to be the end of the current word; when a period (or "?", "!") is encountered, it is considered th...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/22
CPC: G06F40/126, G06F40/14, G06F40/146
Inventor: 陈燕
Owner: DALIAN MARITIME UNIVERSITY