WSDL semi-structured document similarity analyzing and classifying method based on semantic model

The method applies to similarity analysis and classification of WSDL semi-structured documents. It addresses problems such as text classification errors, ignored vocabulary terms, and the purification of common information, and it eliminates root ambiguity.

Active Publication Date: 2014-09-24
CENT SOUTH UNIV

AI Technical Summary

Problems solved by technology

[0004] Currently, many text classification algorithms rely on statistically based document feature vectors, howe

Embodiment Construction

[0040] The present invention will be further described below in conjunction with the accompanying drawings and embodiments.

[0041] As shown in Figure 1, which is a flowchart of the present invention, the semantic model-based WSDL semi-structured document similarity analysis method includes the following steps:

[0042] Step 1: For each original word in the original document, in turn, find its one or more corresponding roots; use the WordNet dictionary to obtain one or more synonym sets for each root; and treat each synonym set as a semantic element;
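Step 1 can be illustrated with a minimal sketch. It does not use the real WordNet database: `TOY_SYNSETS`, its synonym-set identifiers, the suffix list, and both function names are hypothetical stand-ins chosen for illustration, and the suffix stripping is a deliberately naive substitute for a proper root finder.

```python
# Hypothetical miniature stand-in for WordNet: root -> synonym-set identifiers.
# The identifiers are invented for this sketch and are not real WordNet names.
TOY_SYNSETS = {
    "book": ["S_book_noun", "S_reserve_verb"],
    "reserve": ["S_reserve_verb"],
    "flight": ["S_flight_noun"],
}

# Hypothetical suffix list for a deliberately naive root finder.
SUFFIXES = ("ing", "ed", "d", "s")


def candidate_roots(word):
    """Return the lower-cased word and its suffix-stripped forms that exist in the toy table."""
    word = word.lower()
    candidates = [word]
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix):
            candidates.append(word[:-len(suffix)])
    return [c for c in candidates if c in TOY_SYNSETS]


def semantic_elements(words):
    """Map each original word to the synonym sets of its roots, without duplicates."""
    elements = {}
    for original in words:
        synsets = []
        for root in candidate_roots(original):
            for synset in TOY_SYNSETS[root]:
                if synset not in synsets:
                    synsets.append(synset)
        elements[original] = synsets
    return elements


result = semantic_elements(["Booking", "reserved", "flights"])
for original, synsets in result.items():
    print(original, "->", synsets)
```

In this toy table, "Booking" maps to two synonym sets while "reserved" maps to one of them, which is the kind of overlap between synonyms that the semantic elements are meant to capture. Choosing among several candidate synonym sets for one word is a separate disambiguation step and is not shown here.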

[0043] Analysis of the document corpus shows that relying on word meaning statistics loses the interactive information involving synonyms. Therefore, we use the WordNet dictionary (an English vocabulary database) to establish the original words of semi-structured documents based on WSDL. A table in the WordNet dictionary is represented by a string of ASCII characters, and the meani...

Abstract

The invention provides a semantic model-based method for similarity analysis and classification of WSDL semi-structured documents. The method uses the WordNet dictionary to establish a semantic model of WSDL semi-structured documents, eliminates lexical ambiguity with a maximum entropy model, establishes a feature vector model of the WSDL semi-structured document corpus, and generates a document feature matrix for the documents. Content classification and evaluation are then conducted on two different documents, yielding a similarity comparison of their service functions. The method improves the accuracy of document similarity judgments, increases document classification speed and precision, and reduces the dimensionality of the vector space.
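The last two stages named in the abstract, a document feature matrix and a similarity comparison, can be sketched as follows. This is an illustration under stated assumptions, not the patented procedure: the abstract does not name a similarity measure, so cosine similarity is an assumption here, and the documents, semantic-element identifiers, and function names are hypothetical.

```python
import math


def feature_matrix(docs, vocabulary):
    """Rows are documents, columns are semantic elements, cells are raw counts."""
    return [[doc.count(element) for element in vocabulary] for doc in docs]


def cosine(u, v):
    """Cosine similarity of two equal-length count vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical documents, already mapped from words to semantic elements.
doc_a = ["S_reserve", "S_flight", "S_reserve"]
doc_b = ["S_reserve", "S_hotel"]

vocabulary = sorted(set(doc_a) | set(doc_b))
matrix = feature_matrix([doc_a, doc_b], vocabulary)
similarity = cosine(matrix[0], matrix[1])
print(vocabulary, matrix, round(similarity, 4))
```

Because synonyms share one semantic element, the matrix needs one column per synonym set rather than one per distinct word, which is one way to read the dimensionality reduction mentioned in the abstract.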

Description

Technical field

[0001] The invention relates to the field of Web services and information retrieval, and in particular to semantic model-based similarity analysis and classification of WSDL semi-structured documents.

Background technique

[0002] In the field of information retrieval, similarity and correlation analysis over a document corpus requires algorithms for representing different documents. Typical statistical feature extraction methods include TF-IDF based on lexical word frequency and Wahash based on continuous conditional algorithm. TF-IDF is currently one of the more practical document classification algorithms. In information retrieval systems based on the vector space model, the TF-IDF algorithm is widely used for keyword-based retrieval. Likewise, many document classification methods exploit word statistics; for example, Bag-of-Words and Minwise hashes are extracted as statistical measures of document representation. However, in the field...
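For reference, the TF-IDF weighting mentioned in the background can be sketched as below. This uses one common textbook variant (term frequency normalized by document length, inverse document frequency as log(N / df)); the source does not say which variant it has in mind, and the function name and sample documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document, using tf = count/len and idf = log(N/df)."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical word-level documents that differ only by a synonym.
docs = [["book", "flight"], ["reserve", "flight"]]
weights = tf_idf(docs)
print(weights)
```

In this sample the shared word "flight" receives weight zero, while "book" and "reserve" receive equal weights in separate dimensions, so a purely word-level representation treats the two synonyms as unrelated terms.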

Application Information

IPC(8): G06F17/30
CPC: G06F16/35, G06F16/80
Inventor: 龙军, 张祖平, 王鲁达, 李会玲
Owner: CENT SOUTH UNIV