Character string value domain segmenting method and device

A string value domain segmentation method and device, applied in the field of data warehouse integration. It addresses the problems of exceptions being thrown during conversion, precision that cannot be restored, and non-ASCII characters appearing in restored strings. Its effects are improved data transmission performance, easier troubleshooting, and improved readability.

Active Publication Date: 2017-01-04
ALIBABA GRP HLDG LTD

AI Technical Summary

Problems solved by technology

[0004] However, this approach has several drawbacks.
[0005] First, the string is converted to a very small number (represented, for example, as a BigDecimal), but the conversion operation may throw an exception, causing the segmentation to fail.
[0006] Second, to avoid exceptions during the conversion, the chosen algorithm often has to approximate the converted data (for example, by rounding), so precision is lost and the string cannot be restored accurately. In addition, when an extremely small number is restored to a string, the maximum length of the restored string must be specified. This limit also prevents exact restoration.
[0007] Furthermore, when an alphabetic string is converted to a very small number, 65536 is used as the base. This range covers essentially all European and American characters and most Asian characters, so when an ASCII string is restored after segmentation, non-ASCII characters may appear.
[0008] In addition, the method of converting to extremely small numbers is not universal across the different types of values to be split. It does not apply to every type, such as the integer and time types in an RDBMS.

Examples

Embodiment 1

[0039] As shown in Figure 4, the string value domain segmentation method in this embodiment of the application includes the following steps:

[0040] Step 1001: From the primary key of the data to be extracted, extract the character string with the largest ASCII code value as the first character string and the character string with the smallest ASCII code value as the second character string.
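Step 1001 can be illustrated with a minimal sketch. It assumes the primary key values are already available as ASCII-only Python strings. The function name `extract_bounds` and the sample keys are hypothetical and do not come from the application.

```python
def extract_bounds(primary_keys):
    """Return (first, second): the largest and smallest key in ASCII code order."""
    keys = list(primary_keys)
    if not keys:
        raise ValueError("no primary key values to extract")
    for key in keys:
        if not key.isascii():
            raise ValueError(f"non-ASCII primary key: {key!r}")
    # Python compares strings code point by code point; for ASCII-only
    # strings this is the same as comparing by ASCII code value.
    return max(keys), min(keys)


# Hypothetical sample keys, for illustration only.
first, second = extract_bounds(["user_0042", "admin", "zeta", "Bob"])
print(first, second)
```

In a real RDBMS the same bounds would more likely come from an aggregate query over the primary key column. Whether that query orders strings by ASCII code depends on the column's collation, which this sketch does not model.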

[0041] The data to be extracted is the data to be processed during data synchronization. For example, a data table stores multiple rows of data. When this table is synchronized, the rows are extracted one by one for offline extraction and processing, and any row being processed is the data to be extracted mentioned in step 1001.

[0042] The data to be extracted often has a primary key. The primary key is also called the primary keyword. The primary key is generally one or more fields in the data table. The field of the data row in...

Embodiment 2

[0183] As shown in Figure 11, another string value domain segmentation method according to an embodiment of this application includes:

[0184] Step 2001: From the primary key of the data to be extracted, extract the character string with the largest ASCII code value as the first character string and the character string with the smallest ASCII code value as the second character string;

[0185] Step 2002: According to the preset base and the position numbers of the individual characters in the first and second character strings, convert the first and second character strings without distortion into a first large integer and a second large integer, where the position number is the ordinal position of a single character within its string;

[0186] Step 2003: Calculate the range difference according to the first large integer and the second large integer, and determine th...
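The conversion in step 2002 and the range difference in step 2003 can be sketched as follows. This is an illustration under stated assumptions, not the application's implementation: the base of 128 (one more than the largest ASCII code), right-padding shorter strings with NUL so both strings have the same number of positions, and the names `BASE` and `string_to_int` are all assumptions introduced here.

```python
BASE = 128  # assumed preset base: ASCII codes 0..127 act as the digits


def string_to_int(s, width, base=BASE):
    """Weighted positional sum: each character's ASCII code is one digit."""
    if not s.isascii():
        raise ValueError(f"non-ASCII string: {s!r}")
    if len(s) > width:
        raise ValueError("string is longer than the chosen width")
    # Assumption: pad on the right with NUL (code 0) so that strings of
    # different lengths occupy the same number of positions.
    value = 0
    for ch in s.ljust(width, "\x00"):
        value = value * base + ord(ch)
    return value


first, second = "zeta", "Bob"  # hypothetical bounds from step 2001
width = max(len(first), len(second))
first_big = string_to_int(first, width)
second_big = string_to_int(second, width)
range_difference = first_big - second_big
print(first_big, second_big, range_difference)
```

Python integers have arbitrary precision, so no rounding occurs and the integer keeps every character of the original string. With equal widths, the integer order also matches the ASCII order of the strings.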

Embodiment 3

[0191] As shown in Figure 12, a string value domain segmentation device according to an embodiment of the application includes a string extraction module 10, a string conversion module 20, a segmentation step acquisition module 30, a segmentation node acquisition module 40, and a string restoration and segmentation module 50, where:

[0192] The string extraction module 10 is configured to extract, from the primary key of the data to be extracted, the character string with the largest ASCII code value as the first character string and the character string with the smallest ASCII code value as the second character string;

[0193] The string conversion module 20 is configured to convert, according to the preset base and the position numbers of the individual characters in the first and second character strings, the first and second character strings into an undistorted first large integer and second large integer, wh...

Abstract

The invention discloses a string value domain segmentation method and device applied in the field of data warehouse integration. The method comprises the following steps: extracting the character strings with the maximum and minimum ASCII (American Standard Code for Information Interchange) code values from the primary key of the data to be extracted; converting these character strings into big integers by a weighted positional sum over their ASCII code values and a preset base, forming the range to be segmented; computing the range difference; dividing the range into equal segments according to the range difference and a preset number of segments to obtain a segmentation step length, and from it the big integers corresponding to the segmentation nodes; restoring these big integers to segmentation node character strings using a Euclidean algorithm; and generating multiple data extraction statements from the segmentation node character strings to achieve multi-threaded accelerated extraction. The method is further optimized by using a reduced base in both the weighted positional sum and the Euclidean algorithm. The method and device greatly improve concurrency and efficiency during data transmission.
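The steps named in the abstract can be sketched end to end as follows. This is a sketch under assumptions, not the patented implementation: it reads the "Euclidean algorithm" as repeated division with remainder, uses 128 as the base, pads shorter strings with NUL, and emits parameterized SQL. The function names and the table and column names (`orders`, `order_id`) are hypothetical.

```python
BASE = 128  # assumed base: ASCII codes 0..127 act as the digits


def to_int(s, width, base=BASE):
    """Weighted positional sum of ASCII codes, right-padded with NUL."""
    if not s.isascii() or len(s) > width:
        raise ValueError("expected an ASCII string no longer than width")
    value = 0
    for ch in s.ljust(width, "\x00"):
        value = value * base + ord(ch)
    return value


def to_str(n, width, base=BASE):
    """Restore a string by repeated division; remainders are ASCII codes."""
    codes = []
    for _ in range(width):
        n, remainder = divmod(n, base)
        codes.append(remainder)  # least significant position first
    return "".join(chr(c) for c in reversed(codes)).rstrip("\x00")


def split_nodes(low, high, parts):
    """Return parts + 1 node strings that cut [low, high] into equal steps."""
    width = max(len(low), len(high))
    lo, hi = to_int(low, width), to_int(high, width)
    if parts < 1 or lo > hi:
        raise ValueError("need parts >= 1 and low <= high")
    step = (hi - lo) // parts  # segmentation step length
    ints = [lo + i * step for i in range(parts)] + [hi]
    return [to_str(n, width) for n in ints]


def extraction_statements(table, column, nodes):
    """One (sql, params) pair per segment; the last segment includes its upper node."""
    statements = []
    last = len(nodes) - 2
    for i in range(len(nodes) - 1):
        upper_op = "<=" if i == last else "<"
        sql = f"SELECT * FROM {table} WHERE {column} >= ? AND {column} {upper_op} ?"
        statements.append((sql, (nodes[i], nodes[i + 1])))
    return statements


nodes = split_nodes("Bob", "zeta", 4)
for sql, params in extraction_statements("orders", "order_id", nodes):
    print(sql, params)
```

Each statement could then be handed to its own worker thread. Because every remainder falls in the range 0..127, the restored node strings contain only ASCII characters, although interior nodes may include control characters. That is one reason the sketch passes them as query parameters rather than embedding them in the SQL text.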

Description

Technical field

[0001] The invention is applied in the field of data warehouse integration and relates to a string value domain segmentation method and device. Specifically, it relates to a method, used for offline data extraction in data warehouse integration, that splits numbers and strings evenly according to their value range in order to achieve multi-threaded accelerated extraction.

Background art

[0002] In the era of big data, data must continuously flow and be exchanged to maximize its value. In building enterprise data warehouses and business intelligence systems, online data stored in various RDBMSs (relational databases such as MySQL, Oracle, and PostgreSQL) is usually extracted synchronously to offline storage and computing platforms (such as Hadoop in the open source community, or ODPS and other systems within Alibaba Group) for unified processing. As shown in Figure 1, online data will also be migrated to other o...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/278, G06F16/283, G06F9/5066, G06F16/258, G06F2209/5017
Inventor: 何健超, 陈守元, 邓小勇
Owner: ALIBABA GRP HLDG LTD