A method and device for constructing an application system based on data support

Applied in the computer field, this application-system and data technology addresses the problems of increased system resource occupation, reduced construction efficiency of the application system, and a degraded implementation effect, thereby reducing construction cost, improving the implementation effect, and reducing the impact of noisy data.

Active Publication Date: 2019-11-12

阿里巴巴(北京)软件服务有限公司

Problems solved by technology

[0005] The embodiments of the present application provide a method and device for building an application system based on data support, to solve the problem that a large volume of data used to build the application system increases the occupation of system resources, reduces the construction efficiency of the application system, and degrades the implementation effect.

Examples

Embodiment 1

[0050] Figure 1 is a flowchart of the data-supported application system construction method provided in Embodiment 1 of the present application. The method includes the following steps:

[0051] S101: According to the attribute information of each sample point in the text data used to construct the application system, divide the text data into multiple sample point sets; wherein each sample point contains at least one word sequence.

[0052] In the embodiments of the present application, a sample point may be a sentence, phrase, paragraph, or similar unit composed of a series of word sequences. A word sequence (N-gram) here is a sequence of N consecutive words. The value of N can be preset, for example to 2, 3, 4, or 5, or to a combination of these values; that is, the word sequences may have one length or several lengths. The number of letters or characters in a word can also be preset. The attribute information may include a clustering fea...
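As an illustration of the word sequences described in [0052], the following minimal sketch collects the N-grams of one sample point for several preset values of N. The function name `word_sequences`, the `sizes` parameter, and the whitespace tokenization are assumptions made for this example; the source does not specify how words are segmented.

```python
from typing import Iterable, Set, Tuple


def word_sequences(sample_point: str,
                   sizes: Iterable[int] = (2, 3)) -> Set[Tuple[str, ...]]:
    """Return the set of word N-grams in one sample point for each N in `sizes`."""
    # Assumption: words are separated by whitespace. A real system for
    # Chinese text would need a word segmenter instead.
    words = sample_point.split()
    grams: Set[Tuple[str, ...]] = set()
    for n in sizes:
        # Slide a window of n consecutive words over the sample point.
        for i in range(len(words) - n + 1):
            grams.add(tuple(words[i:i + n]))
    return grams
```

Calling `word_sequences("the cat sat", sizes=(2,))` yields the two bigrams `("the", "cat")` and `("cat", "sat")`. A sample point shorter than N contributes no word sequence of that length.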

Embodiment 2

[0061] In the second embodiment, clustering is used to divide the set of sample points; when the minimum number of sample points is selected, all the different word sequences included in the entire text data are used as the word sequences to be covered by the application system.

[0062] Figure 2 is a flowchart of the data-supported application system construction method provided in Embodiment 2 of the present application. The method includes the following steps:

[0063] S201: According to the clustering feature of each sample point in the text data used to build the application system, divide the sample points with the same clustering feature into the same sample point set.

[0064] In a specific implementation, the clustering features of each sample point can be extracted. For example, the features include the term frequency (TF) and inverse document frequency (IDF) of the word sequence contained in the sample poin...
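For readers unfamiliar with the TF and IDF features named in [0064], here is a minimal sketch of how the two quantities are commonly computed. It uses the standard unsmoothed definition `idf = log(N / df)`, which is an assumption: the source does not state which variant it uses or how the values are turned into clustering features. The function names are hypothetical, and a "term" can be a word or a word sequence.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List


def term_frequency(terms: List[Hashable]) -> Dict[Hashable, float]:
    """Relative frequency of each term within one sample point."""
    total = len(terms)
    if total == 0:
        return {}
    return {term: count / total for term, count in Counter(terms).items()}


def inverse_document_frequency(
        sample_points: List[List[Hashable]]) -> Dict[Hashable, float]:
    """log(N / df) for each term, where df counts sample points containing it."""
    n_points = len(sample_points)
    doc_freq: Counter = Counter()
    for terms in sample_points:
        # Count each term at most once per sample point.
        doc_freq.update(set(terms))
    return {term: math.log(n_points / df) for term, df in doc_freq.items()}
```

A term that appears in every sample point gets an IDF of zero under this definition, so it contributes nothing to distinguishing one sample point from another.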

Embodiment approach

[0075] In this embodiment, after the cluster division in step S201, most of the word sequences contained in the sample points of different sample point sets are different, but a small number of word sequences may still be repeated. To further reduce the data scale, the following preferred method can therefore be adopted: each sample point set does not need to cover the word sequences contained in sample points already selected from other sample point sets. Specifically, for each sample point set, the word sequences that the sample point set needs to cover are determined according to the following steps:

[0076] Remove the word sequences included in the sample points selected in other sample point sets from the word sequences that the application system needs to cover, to obtain the remaining word sequences that need to be covered;

[0077] The intersection of each word sequence included in the sample point set and the obtained remaining word s...
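The two steps in [0076] and [0077] amount to a set difference followed by a set intersection. The sketch below shows that reading under stated assumptions: the function name `sequences_to_cover` and its parameter names are hypothetical, and because [0077] is truncated in the source, the sketch covers only the part that is visible.

```python
from typing import Hashable, Iterable, Set


def sequences_to_cover(required: Iterable[Hashable],
                       selected_elsewhere: Iterable[Hashable],
                       set_sequences: Iterable[Hashable]) -> Set[Hashable]:
    """Word sequences that one sample point set still has to cover.

    required:           word sequences the application system needs to cover
    selected_elsewhere: word sequences in sample points already selected
                        from other sample point sets
    set_sequences:      word sequences that occur in this sample point set
    """
    # Step [0076]: drop what other sample point sets already cover.
    remaining = set(required) - set(selected_elsewhere)
    # Step [0077]: keep only the remainder that this set actually contains.
    return set(set_sequences) & remaining
```

For example, with required sequences `{1, 2, 3, 4}`, sequence `2` already selected elsewhere, and a sample point set containing `{2, 3, 5}`, only sequence `3` is left for this set to cover.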

Abstract

The invention relates to the technical field of computers, in particular to a data support-based application system establishment method and apparatus, which solve the problems that system resource occupation increases and the establishment efficiency of an application system decreases when the scale of data used for establishing the application system is relatively large. The application system establishment method provided by an embodiment of the invention comprises the steps of dividing text data into a plurality of sample point sets according to attribute information of the sample points in the text data used for establishing the application system; for each sample point set, selecting a minimum number of sample points from the sample point set such that the word sequence coverage rate of the selected sample points is higher than a set threshold; and establishing the application system using the sample points selected from each sample point set. With the method and the apparatus provided by embodiments of the invention, a small amount of data that is valuable to the application system is carefully selected from large-scale data to establish the application system, thereby reducing the occupied system resources and improving the establishment efficiency of the application system.
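The selection step in the abstract can be sketched with a greedy heuristic that repeatedly picks the sample point adding the most uncovered word sequences until the coverage rate exceeds the threshold. This is an assumption made for illustration, not the patent's stated procedure: greedy selection gives a small selection but does not guarantee the true minimum. The function name `select_sample_points` and the representation of each sample point as a set of word sequences are also hypothetical.

```python
from typing import Hashable, List, Set


def select_sample_points(sample_points: List[Set[Hashable]],
                         target: Set[Hashable],
                         threshold: float) -> List[int]:
    """Greedily pick indices of sample points until coverage exceeds threshold.

    Coverage rate = covered target word sequences / all target word sequences.
    """
    if not target:
        return []
    covered: Set[Hashable] = set()
    chosen: List[int] = []
    # The abstract asks for coverage "higher than" the threshold, so keep
    # selecting while coverage is less than or equal to it.
    while len(covered) / len(target) <= threshold:
        best_index, best_gain = None, 0
        for index, sequences in enumerate(sample_points):
            gain = len((sequences & target) - covered)
            if gain > best_gain:
                best_index, best_gain = index, gain
        if best_index is None:
            break  # No remaining sample point adds coverage.
        chosen.append(best_index)
        covered |= sample_points[best_index] & target
    return chosen
```

With three sample points that each contain one of three target word sequences and a threshold of 0.5, the sketch selects the first two sample points, since one of them covers only a third of the target.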

Description

Technical field

[0001] The present application relates to the field of computer technology, in particular to a method and device for constructing an application system based on data support.

Background

[0002] Building many application systems requires a large amount of supporting data. For example, an application system such as machine translation takes a data-driven approach: it performs machine learning on a large number of sentences in different languages and then trains and tunes the translation system.

[0003] The scale of data directly affects the construction and operation of these data-supported application systems. In general, the larger the data scale, the more information can be obtained, and the better the machine learning effect. However, the increase in the data scale will pose challenges to the feasibility of application system construction: it will not only prolong t...

Application Information

Patent Type & Authority: Patents (China)

IPC(8): G06F16/35

CPC: G06F16/355

Inventor: 张浩, 陆军, 蒋宏飞

Owner: 阿里巴巴(北京)软件服务有限公司
