Method and system for processing unstructured data

A technique for processing unstructured data, applied in fields such as unstructured text data retrieval, electronic digital data processing, and special data processing applications.

Active Publication Date: 2014-04-30
SHANGHAI JINEN INFORMATION TECH CO LTD

AI Technical Summary

Problems solved by technology

[0004] The technical problem to be solved by the present invention is to overcome the fact that the mining of unstructured data in the prior art consumes a large amount of computing resources an


Examples


Embodiment 1

[0061] As shown in Figure 1, the unstructured data processing method of this embodiment includes the following steps:

[0062] S1: Set multiple feature templates, each of which includes keywords;

[0063] S2: Scan a database storing multiple pieces of unstructured data with each feature template; for each piece of unstructured data, judge whether it contains content consistent with each feature template, and record the feature templates for which the judgment is yes as the feature templates matched by that piece of unstructured data;

[0064] S3: Generate multiple template vectors corresponding one-to-one to the multiple pieces of unstructured data. Each template vector has multiple dimensions corresponding one-to-one to the multiple feature templates; among these dimensions, the scalar value of each dimension corresponding to a feature template that the unstructured data matches is 1, and the scalar value of the...
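Steps S1 to S3 can be illustrated with a minimal Python sketch. It is not the patent's implementation. The template names, keywords, and function names are hypothetical, "content consistent with a feature template" is assumed to mean that at least one keyword occurs as a case-insensitive substring, and unmatched dimensions are assumed to be 0 because the source text is truncated at that point.

```python
from typing import Dict, List

# S1: hypothetical feature templates, each a named list of keywords.
FEATURE_TEMPLATES: Dict[str, List[str]] = {
    "action": ["action", "fight"],
    "comedy": ["funny", "comedy"],
    "quality": ["blurry", "HD"],
}


def matched_templates(text: str, templates: Dict[str, List[str]]) -> List[str]:
    """S2: names of the templates having at least one keyword in `text`.

    Assumption: matching is a case-insensitive substring test.
    """
    lowered = text.lower()
    return [
        name
        for name, keywords in templates.items()
        if any(keyword.lower() in lowered for keyword in keywords)
    ]


def template_vector(text: str, templates: Dict[str, List[str]]) -> List[int]:
    """S3: one dimension per template, 1 if matched.

    Assumption: unmatched dimensions are 0.
    """
    matched = set(matched_templates(text, templates))
    return [1 if name in matched else 0 for name in templates]


# Hypothetical "database" of unstructured records, e.g. user comments.
records = ["A funny action movie", "Picture was blurry", "Nothing relevant"]
vectors = [template_vector(record, FEATURE_TEMPLATES) for record in records]
```

Once the vectors exist, later analysis can work on small integer lists instead of rescanning the raw text, which is the resource saving the abstract describes.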

Embodiment 2

[0069] As shown in Figure 2, the unstructured data processing method of this embodiment differs from Embodiment 1 only in that it also performs the following steps after S3:

[0070] S4: Read the feature to be mined;

[0071] S5: Judge whether the multiple feature templates include one consistent with the feature to be mined; if so, execute S6, otherwise execute S7;

[0072] S6: Use the feature template consistent with the feature to be mined to match the multiple template vectors, select the successfully matched template vectors as the vectors to be output, and execute S9;

[0073] S7: Generate a feature template combination to represent the feature to be mined, the combination being several feature templates connected by logical operators;

[0074] S8: Use the feature template combination to match the multiple template vectors, selec...
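The matching in S6 and S8 can be sketched as predicates evaluated over template vectors. This is an illustrative sketch only: the template names and the helper functions `has`, `all_of`, `any_of`, `negate`, and `select` are hypothetical, and AND, OR, and NOT are assumed to be the logical operators in question, which the source does not enumerate.

```python
from typing import Callable, List, Sequence

# Hypothetical template order; position i is dimension i of every vector.
TEMPLATE_NAMES = ["action", "comedy", "quality"]

Vector = Sequence[int]
Predicate = Callable[[Vector], bool]


def has(name: str) -> Predicate:
    """S6: true when the vector's dimension for one template is non-zero."""
    index = TEMPLATE_NAMES.index(name)
    return lambda vector: vector[index] > 0


def all_of(*predicates: Predicate) -> Predicate:
    """Logical AND over several template predicates (S7)."""
    return lambda vector: all(p(vector) for p in predicates)


def any_of(*predicates: Predicate) -> Predicate:
    """Logical OR over several template predicates (S7)."""
    return lambda vector: any(p(vector) for p in predicates)


def negate(predicate: Predicate) -> Predicate:
    """Logical NOT of a template predicate (S7)."""
    return lambda vector: not predicate(vector)


def select(vectors: Sequence[Vector], predicate: Predicate) -> List[int]:
    """S6/S8: indices of the template vectors that satisfy the predicate."""
    return [i for i, vector in enumerate(vectors) if predicate(vector)]


sample_vectors = [[1, 1, 0], [0, 0, 1], [1, 0, 0]]
single = select(sample_vectors, has("action"))
combined = select(sample_vectors, all_of(has("action"), negate(has("comedy"))))
```

Here `single` corresponds to the S6 branch, where one existing template already expresses the feature to be mined, and `combined` to the S7/S8 branch, where the feature is expressed as a combination of templates. Neither branch touches the original text.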

Embodiment 3

[0081] The unstructured data processing method of this embodiment differs from Embodiment 2 only in that S2 also includes: recording the number of occurrences of content consistent with each feature template in each piece of unstructured data.

[0082] S3 is replaced by S3a: generate multiple template vectors corresponding one-to-one to the multiple pieces of unstructured data. Each template vector has multiple dimensions corresponding one-to-one to the multiple feature templates, and the scalar value of each dimension is the number of occurrences, in the corresponding piece of unstructured data, of content consistent with the corresponding feature template.

[0083] Moreover, some of the feature templates are retrieval formulas comprising keywords and logical operators. For example, there is a feature template "European and Ame...
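The count-based vectors of S3a can be sketched as follows. This is a minimal sketch under stated assumptions: the templates are hypothetical keyword lists, an "occurrence" is assumed to be a non-overlapping, case-insensitive substring match, and counts are summed over all keywords of a template. Templates written as retrieval formulas are not modeled here.

```python
from typing import Dict, List

# Hypothetical keyword templates; retrieval-formula templates are not modeled.
COUNT_TEMPLATES: Dict[str, List[str]] = {
    "action": ["action", "fight"],
    "comedy": ["funny"],
}


def count_vector(text: str, templates: Dict[str, List[str]]) -> List[int]:
    """S3a: each dimension holds how often a template's keywords occur.

    Assumption: occurrences are non-overlapping, case-insensitive substring
    matches, summed over every keyword of the template.
    """
    lowered = text.lower()
    return [
        sum(lowered.count(keyword.lower()) for keyword in keywords)
        for keywords in templates.values()
    ]


example = count_vector("Fight after fight, then a funny action scene", COUNT_TEMPLATES)
```

A count of zero still means "not matched", so predicates that test for a non-zero dimension keep working on these vectors while the magnitudes carry extra information about emphasis.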


Abstract

The invention discloses a method and system for processing unstructured data. The method comprises the following steps: multiple feature templates comprising keywords are set; a database storing multiple pieces of unstructured data is scanned with each feature template, whether each piece of unstructured data contains content consistent with each feature template is judged, and each feature template with a positive judgment result is recorded as a feature template matched by the corresponding piece of unstructured data; multiple template vectors corresponding one-to-one to the multiple pieces of unstructured data are generated, each template vector having multiple dimensions corresponding one-to-one to the feature templates. By processing the unstructured data with feature templates, the method and system represent the unstructured data as vectors, and subsequent calculation is performed on the template vectors. This reduces the computing resources and shortens the time needed for data analysis of the unstructured data.

Description

Technical field

[0001] The invention relates to a method and system for processing unstructured data.

Background

[0002] In the past ten years, the rapid development of e-commerce and network service technology has led to a rapid increase in the amount of information these fields contain. They increasingly involve the processing of massive amounts of information, which poses a new challenge for information processing. Many applications in these fields not only hold a large amount of structured data but also generate even larger volumes of unstructured data. Because processing unstructured data consumes comparatively more computing resources, its value is usually ignored in traditional data analysis systems.

[0003] Taking an online video website as an example, the system records structured data such as the videos clicked by the user, video type, viewing period, and viewing method, and also records more unstructured data such as user evaluation, vi...


Application Information

IPC(8): G06F17/30
CPC: G06F16/30
Inventor: 叶向维
Owner: SHANGHAI JINEN INFORMATION TECH CO LTD