A method and device for constructing an application system based on data support

Applied in the computer field, this application-system and data technology addresses the problems of increased system resource usage, reduced application system construction efficiency, and degraded implementation effect. It reduces construction cost, improves the implementation effect, and reduces noisy data.

Active Publication Date: 2019-11-12
阿里巴巴(北京)软件服务有限公司

AI Technical Summary

Problems solved by technology

[0005] The embodiments of the present application provide a method and device for constructing an application system based on data support, to solve the problem that a large volume of data used to build the application system increases system resource usage, reduces the construction efficiency of the application system, and degrades the implementation effect.

Examples


Embodiment 1

[0050] Figure 1 is a flowchart of the data-supported application system construction method provided in Embodiment 1 of the present application, which includes the following steps:

[0051] S101: According to the attribute information of each sample point in the text data used to construct the application system, divide the text data into multiple sample point sets; wherein each sample point contains at least one word sequence.

[0052] In the embodiments of the present application, a sample point may be a sentence, phrase, paragraph, or similar unit composed of a series of word sequences. A word sequence (N-gram) here is a sequence of N consecutive words. The value of N can be preset, for example to 2, 3, 4, or 5, or to a combination of these values, so word sequences may have one length or several. The number of letters or characters in a word can also be preset. The attribute information may include a clustering fea...
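The word-sequence definition above can be illustrated with a minimal Python sketch. The function name `word_sequences`, the whitespace tokenization in the usage note, and the default values of N are assumptions made for illustration only; the source does not specify them.

```python
from typing import Iterable, List, Set, Tuple


def word_sequences(words: List[str], n_values: Iterable[int] = (2, 3)) -> Set[Tuple[str, ...]]:
    """Return the distinct N-grams (word sequences) of one sample point.

    `words` is the sample point already split into words; how the split
    is done (whitespace, a segmenter, ...) is outside this sketch.
    `n_values` lists the preset lengths N; (2, 3) is an arbitrary default.
    """
    sequences: Set[Tuple[str, ...]] = set()
    for n in n_values:
        # Slide a window of n consecutive words over the sample point.
        for start in range(len(words) - n + 1):
            sequences.add(tuple(words[start:start + n]))
    return sequences
```

For instance, `word_sequences("the cat sat".split(), (2,))` yields the two bigrams `("the", "cat")` and `("cat", "sat")`.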

Embodiment 2

[0061] In Embodiment 2, clustering is used to divide the text data into sample point sets. When the minimum number of sample points is selected, all distinct word sequences contained in the entire text data are taken as the word sequences that the application system needs to cover.

[0062] Figure 2 is a flowchart of the data-supported application system construction method provided in Embodiment 2 of the present application, which includes the following steps:

[0063] S201: According to the clustering feature of each sample point in the text data used to build the application system, divide the sample points with the same clustering feature into the same sample point set.
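Step S201 amounts to grouping sample points by a shared feature value. The sketch below shows one way this could look in Python; `group_by_feature` and the `feature_of` callback are hypothetical names, and the word-count feature in the usage note is only a stand-in for whatever clustering feature is actually extracted.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List


def group_by_feature(samples: List[str],
                     feature_of: Callable[[str], Hashable]) -> Dict[Hashable, List[str]]:
    """Place sample points with the same clustering feature in the same set.

    `feature_of` maps a sample point to its (hashable) clustering feature;
    it is an assumed placeholder, not a feature named by the source.
    """
    groups: Dict[Hashable, List[str]] = defaultdict(list)
    for sample in samples:
        groups[feature_of(sample)].append(sample)
    return dict(groups)
```

Calling `group_by_feature(["a b", "c d e", "f g"], lambda s: len(s.split()))` puts the two two-word samples in one set and the three-word sample in another.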

[0064] In a specific implementation, the clustering features of each sample point can be extracted. For example, the features include: the term frequency (TF) and inverse document frequency (IDF) of the word sequence contained in the sample poin...
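As a rough illustration of the TF and IDF features mentioned above, the following sketch uses one common textbook definition (relative frequency for TF, `log(N / df)` for IDF). The source does not state which variant it uses, so both formulas and the function names are assumptions.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List


def term_frequency(sample_sequences: List[Hashable]) -> Dict[Hashable, float]:
    """TF of each word sequence within one sample point (assumed: count / total)."""
    counts = Counter(sample_sequences)
    total = len(sample_sequences)
    return {seq: count / total for seq, count in counts.items()}


def inverse_document_frequency(all_samples: List[List[Hashable]]) -> Dict[Hashable, float]:
    """IDF of each word sequence over all sample points (assumed: log(N / df))."""
    n_samples = len(all_samples)
    doc_freq: Counter = Counter()
    for sequences in all_samples:
        # Count each sequence once per sample point that contains it.
        doc_freq.update(set(sequences))
    return {seq: math.log(n_samples / df) for seq, df in doc_freq.items()}
```

Here each sample point is treated as a "document", so a word sequence that appears in every sample point gets an IDF of zero.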

Embodiment approach

[0075] In this embodiment, after the cluster division in step S201, most of the word sequences contained in the sample points of different sample point sets are different, but a small number of word sequences may still be repeated. To further reduce the data scale, the following preferred method can therefore be adopted: a sample point set does not need to cover word sequences contained in sample points already selected from other sample point sets. Specifically, for each sample point set, the word sequences that the sample point set needs to cover are determined according to the following steps:

[0076] Remove the word sequences included in the sample points selected in other sample point sets from the word sequences that the application system needs to cover, to obtain the remaining word sequences that need to be covered;

[0077] The intersection of each word sequence included in the sample point set and the obtained remaining word s...
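The two steps above reduce to a set difference followed by a set intersection. A minimal sketch, with hypothetical function and parameter names:

```python
from typing import Hashable, Set


def sequences_to_cover(set_sequences: Set[Hashable],
                       required: Set[Hashable],
                       already_selected: Set[Hashable]) -> Set[Hashable]:
    """Word sequences that one sample point set still needs to cover.

    set_sequences:    all word sequences occurring in this sample point set
    required:         word sequences the application system needs to cover
    already_selected: word sequences contained in sample points already
                      selected from other sample point sets
    """
    # [0076]: remove what other sets have already covered.
    remaining = required - already_selected
    # [0077]: keep only what this set can actually contribute.
    return set_sequences & remaining
```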


Abstract

The invention relates to the technical field of computers, in particular to a method and apparatus for establishing an application system based on data support, which address the problems that system resource usage increases and the establishment efficiency of an application system decreases when the data used to establish the application system is large in scale. The application system establishment method provided by an embodiment of the invention comprises the steps of: dividing text data into a plurality of sample point sets according to attribute information of the sample points in the text data used for establishing the application system; for each sample point set, selecting a minimum number of sample points from the set such that the word sequence coverage rate of the selected sample points is higher than a set threshold; and establishing the application system using the sample points selected from each sample point set. With the method and apparatus provided by embodiments of the invention, a small amount of data that is valuable to the application system is selected from large-scale data to establish the application system, thereby reducing the system resources occupied and improving the establishment efficiency of the application system.
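The selection step in the abstract resembles the set cover problem, for which finding a true minimum is NP-hard in general. The sketch below therefore substitutes a greedy heuristic, which yields a small (not necessarily minimal) selection; this excerpt does not say how the patent itself performs the selection. The function name, the dictionary input format, and defining coverage rate as covered sequences divided by all sequences in the set are assumptions.

```python
from typing import Dict, Hashable, List, Set


def select_samples_greedy(samples: Dict[str, Set[Hashable]], threshold: float) -> List[str]:
    """Greedily pick sample points until coverage exceeds `threshold`.

    samples:   maps a sample point id to the word sequences it contains
    threshold: required coverage rate in [0, 1); coverage is assumed to be
               len(covered) / len(all sequences in this sample point set)
    """
    target: Set[Hashable] = set().union(*samples.values())
    if not target:
        return []
    covered: Set[Hashable] = set()
    chosen: List[str] = []
    remaining = dict(samples)
    while remaining and len(covered) / len(target) <= threshold:
        # Pick the sample point that adds the most uncovered sequences.
        best = max(remaining, key=lambda sid: len(remaining[sid] - covered))
        gain = remaining.pop(best) - covered
        if not gain:
            break  # nothing left can raise coverage further
        chosen.append(best)
        covered |= gain
    return chosen
```

With three sample points where `s1` holds two of the three distinct sequences, a threshold of 0.6 is met by `s1` alone, while a threshold of 0.9 also requires the sample point holding the third sequence.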

Description

Technical field

[0001] The present application relates to the field of computer technology, in particular to a method and device for constructing an application system based on data support.

Background

[0002] The construction of many application systems requires a large amount of supporting data. For example, an application system such as machine translation follows a data-driven approach: machine learning is performed on a large number of sentences in different languages in order to train and tune the translation system.

[0003] The scale of data directly affects the construction and operation of these data-supported application systems. In general, the larger the data scale, the more information can be obtained and the better the machine learning effect. However, increasing the data scale poses challenges to the feasibility of application system construction: it will not only prolong t...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/35
CPC: G06F16/355
Inventor: 张浩, 陆军, 蒋宏飞
Owner: 阿里巴巴(北京)软件服务有限公司