WSDL semi-structured document similarity analyzing and classifying method based on semantic model

A similarity analysis technique for semi-structured documents, applied to the similarity analysis and classification of WSDL semi-structured documents. It addresses text classification errors caused by ignoring lexical terms and the purification of common information, and achieves the effect of eliminating root ambiguity.

Active Publication Date: 2014-09-24

CENT SOUTH UNIV

5 Cites, 20 Cited by


AI Technical Summary

Problems solved by technology

[0004] Currently, many text classification algorithms rely on statistically based document feature vectors. However, these algorithms ignore lexical terms and the purification of common information, resulting in text classification errors.



Embodiment Construction

[0040] The present invention will be further described below in conjunction with the accompanying drawings and embodiments.

[0041] As shown in Figure 1, the flowchart of the present invention, the semantic-model-based WSDL semi-structured document similarity analysis method includes the following steps:

[0042] Step 1: For each original word in the original document in turn, find its one or more roots, use the WordNet dictionary to obtain the one or more synonym sets of each root, and treat each synonym set as a semantic element;
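Step 1 can be illustrated with a minimal sketch. It does not call the real WordNet dictionary: the `ROOTS` and `SYNSETS` tables, the synonym-set labels, and the function name `semantic_elements` are all hypothetical stand-ins chosen for illustration, and the patent's actual lookup procedure may differ.

```python
from typing import Dict, List

# Hypothetical stand-in for a root (stem) lookup: original word -> roots.
ROOTS: Dict[str, List[str]] = {
    "booking": ["book"],
    "books": ["book"],
    "reserves": ["reserve"],
    "flights": ["flight"],
}

# Hypothetical stand-in for WordNet: root -> synonym-set labels.
# The labels are illustrative only, not taken from the patent.
SYNSETS: Dict[str, List[str]] = {
    "book": ["book.n.01", "reserve.v.04"],
    "reserve": ["reserve.v.04"],
    "flight": ["flight.n.01"],
}


def semantic_elements(word: str) -> List[str]:
    """Return every synonym set reachable from the word's roots.

    Each returned label plays the role of one semantic element.
    Words missing from ROOTS are treated as their own root.
    """
    key = word.lower()
    elements: List[str] = []
    for root in ROOTS.get(key, [key]):
        for synset in SYNSETS.get(root, []):
            if synset not in elements:  # keep order, drop duplicates
                elements.append(synset)
    return elements


for original in ["booking", "reserves", "flights"]:
    print(original, "->", semantic_elements(original))
```

In this toy data "booking" maps to two semantic elements while "reserves" maps to one of the same two, so the shared element captures the synonym relation, and the extra element is the kind of ambiguity a later disambiguation step would have to resolve.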

[0043] Analysis of the document corpus shows that relying on word meaning statistics loses the interactive information involving synonyms. Therefore, we use the WordNet dictionary (an English vocabulary database) to model the original words of WSDL-based semi-structured documents. A table in the WordNet dictionary is represented by a string of ASCII characters, and the meani...


Abstract

The invention provides a semantic-model-based method for analyzing the similarity of WSDL semi-structured documents and classifying them. The method uses the WordNet dictionary to build a semantic model of WSDL semi-structured documents, eliminates lexical ambiguity with a maximum entropy model, builds a feature vector model of the WSDL semi-structured document corpus, and generates a document feature matrix for the documents. The content of two different documents is then classified and evaluated, which finally yields a similarity comparison of their service functions. The method improves the accuracy of document similarity judgments, increases document classification speed and precision, and reduces the dimensionality of the vector space.
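The dimensionality reduction claimed in the abstract can be illustrated with a small sketch that builds a document feature matrix twice: once over raw words and once over semantic elements. Everything here is an assumption made for illustration: the `WORD_TO_ELEMENT` mapping is a hypothetical, already disambiguated lookup, and cosine similarity is a swapped-in measure, since the abstract does not say how the two documents are compared.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical mapping from a word to its single semantic element,
# assumed to be the result of an earlier disambiguation step.
WORD_TO_ELEMENT: Dict[str, str] = {
    "book": "reserve",
    "reserve": "reserve",
    "flight": "flight",
    "trip": "journey",
    "journey": "journey",
}


def feature_matrix(
    docs: List[List[str]], mapping: Dict[str, str]
) -> Tuple[List[str], List[List[int]]]:
    """Count features per document; rows are documents, columns are features.

    Words absent from the mapping are used as their own feature.
    """
    counts = [Counter(mapping.get(word, word) for word in doc) for doc in docs]
    columns = sorted(set().union(*counts))
    return columns, [[c[col] for col in columns] for c in counts]


def cosine(u: List[int], v: List[int]) -> float:
    """Cosine similarity (an assumed measure, not stated in the source)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


doc_a = ["book", "flight", "trip"]
doc_b = ["reserve", "flight", "journey"]

word_cols, word_rows = feature_matrix([doc_a, doc_b], {})
elem_cols, elem_rows = feature_matrix([doc_a, doc_b], WORD_TO_ELEMENT)

print(len(word_cols), "word columns,", len(elem_cols), "semantic columns")
print("word-level similarity:", cosine(word_rows[0], word_rows[1]))
print("semantic-level similarity:", cosine(elem_rows[0], elem_rows[1]))
```

With this toy data the word-level matrix needs five columns and the semantic-level matrix only three, and the two documents, which differ only by synonyms, score higher at the semantic level.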

Description

Technical field

[0001] The invention relates to the field of Web services and information retrieval, in particular to semantic-model-based similarity analysis and classification of WSDL semi-structured documents.

Background technique

[0002] In the field of information retrieval, similarity and correlation analysis over a document corpus requires algorithms for representing different documents. Typical statistical feature extraction methods include TF-IDF based on lexical word frequency and Wahash based on continuous conditional algorithm. TF-IDF is currently one of the more practical document classification algorithms. In information retrieval systems based on the vector space model, the TF-IDF algorithm is widely used for keyword-based retrieval. Likewise, many document classification methods exploit word statistics, extracting features such as Bag-of-Words and Minwise hashes as statistical measures of document representation. However, in the field...
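For readers unfamiliar with the TF-IDF weighting mentioned in the background, the following is a minimal sketch. It assumes one common variant (term frequency normalized by document length, inverse document frequency as the natural logarithm of N/df); the corpus and the function name `tf_idf` are hypothetical, and the background section does not specify which variant it has in mind.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Weight each term by (count / doc length) * ln(N / document frequency)."""
    n_docs = len(docs)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights: List[Dict[str, float]] = []
    for doc in docs:
        if not doc:  # an empty document has no weighted terms
            weights.append({})
            continue
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus: the first two documents differ only by a synonym.
corpus = [
    ["book", "flight", "ticket"],
    ["reserve", "flight", "ticket"],
    ["weather", "forecast", "city"],
]
weights = tf_idf(corpus)
for doc_weights in weights:
    print({term: round(w, 3) for term, w in doc_weights.items()})
```

Because the weighting is purely statistical, "book" and "reserve" end up as unrelated terms with no shared weight, which is the kind of synonym information that word-frequency methods do not capture.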


Application Information


IPC(8): G06F17/30

CPC: G06F16/35, G06F16/80

Inventor: 龙军, 张祖平, 王鲁达, 李会玲

Owner: CENT SOUTH UNIV
