Semantic kernel method for a co-occurrence latent semantic vector space model based on literature resource topic clustering

A vector space and kernel method technology, applied to semantic kernels for latent semantic vector space models in topic clustering of literature resources. It addresses problems such as high model dimension, high time and space complexity of the clustering algorithm, and insufficient semantic information extraction, with the effect of reducing dimensionality and improving clustering results.

Active Publication Date: 2017-05-24

SHANXI UNIV

Cites: 2; Cited by: 24



Problems solved by technology

[0004] The present invention addresses the shortcomings of current semantic kernel methods for semantic vector space models: complex semantic information extraction, insufficient semantic information extraction, high model dimension, and high time and space complexity when applied to clustering algorithms. It provides a semantic kernel method for a co-occurrence latent semantic vector space model for topic clustering of text resources.



Examples


Embodiment 1

[0045] Step 1, data preprocessing: clean the data, label the documents, extract the keywords of each document, and retain the correspondence between keywords and their documents.
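The preprocessing step above can be illustrated with a minimal sketch. The record layout, the sample entries, and the `preprocess` helper are hypothetical and are not taken from the patent; the sketch only shows one way to drop documents without keywords while keeping the keyword-to-document correspondence in both directions.

```python
from collections import defaultdict

# Hypothetical records: (category label, document ID, extracted keywords).
records = [
    ("publishing", "d1", ["Digital Publishing", "copyright"]),
    ("library", "d2", ["digital library", "Copyright "]),
    ("archives", "d3", []),  # no keywords: removed during cleaning
]

def preprocess(records):
    """Clean keywords, drop keyword-less documents, and keep both mappings."""
    doc_keywords = {}                 # document ID -> sorted keyword list
    keyword_docs = defaultdict(set)   # keyword -> set of document IDs
    labels = {}                       # document ID -> category label
    for label, doc_id, keywords in records:
        # Assumed cleaning rule: trim whitespace, lowercase, deduplicate.
        cleaned = sorted({k.strip().lower() for k in keywords if k.strip()})
        if not cleaned:
            continue
        doc_keywords[doc_id] = cleaned
        labels[doc_id] = label
        for keyword in cleaned:
            keyword_docs[keyword].add(doc_id)
    return doc_keywords, dict(keyword_docs), labels

doc_keywords, keyword_docs, labels = preprocess(records)
```

Keeping the inverted mapping (`keyword_docs`) alongside the forward mapping makes the later word frequency and co-occurrence counts simple lookups.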

[0046] The data come from CNKI. Following CNKI's classification, 300 documents were selected from each of three disciplines under information science: "Publishing", "Library Information and Digital Library", and "Archives and Museums". After excluding 4 documents without keywords, 896 documents remained: 299 in "Publishing", 298 in "Library Information and Digital Library", and 299 in "Archives and Museums", yielding 2509 distinct keywords. That is, the number of documents is n=896 and the number of keywords is m=2509. The following table shows the first 20 documents and all their corresponding keywords. In Table 1, LM is the document category, ID is the docum...


Abstract

The invention belongs to the technical field of semantic kernel methods for semantic vector space models, and particularly relates to a semantic kernel method for a co-occurrence latent semantic vector space model based on literature resource topic clustering. The method mainly addresses the problems of existing semantic kernel methods for semantic vector space models: high complexity of semantic information extraction, insufficient semantic information extraction, high model dimension, and high time and space complexity when applied to clustering algorithms. The method comprises the following steps: 1) preprocessing the literature data; 2) computing word frequency statistics for the extracted keywords, for later use in building a co-occurrence matrix; 3) constructing a vector space model representation of the literature, using whether a keyword is present in a document as the weight; 4) constructing a co-occurrence latent semantic vector space model; 5) constructing a semantic kernel function; and 6) clustering the literature.
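Steps 2 to 5 of the abstract can be sketched on a toy corpus. The abstract does not give the patent's formulas, so the co-occurrence weighting `S` and the kernel form k(x, y) = (xS)(yS)^T below are assumptions: a generic semantic-kernel construction used only for illustration, not the patented definition. The corpus and all names are hypothetical.

```python
from collections import Counter
from itertools import combinations
import math

# Hypothetical toy corpus: document ID -> extracted keywords.
docs = {
    "d1": ["clustering", "kernel"],
    "d2": ["clustering", "vector space"],
    "d3": ["kernel", "vector space", "clustering"],
}

# Step 2: word frequency = number of documents containing each keyword.
freq = Counter(k for kws in docs.values() for k in set(kws))
vocab = sorted(freq)
index = {k: j for j, k in enumerate(vocab)}
m = len(vocab)

# Step 3: binary vector space model (weight 1 if the keyword is present).
D = {d: [1 if k in kws else 0 for k in vocab] for d, kws in docs.items()}

# Co-occurrence matrix: C[i][j] = number of documents containing both keywords.
C = [[0] * m for _ in range(m)]
for kws in docs.values():
    for a, b in combinations(sorted(set(kws)), 2):
        C[index[a]][index[b]] += 1
        C[index[b]][index[a]] += 1
for k in vocab:
    C[index[k]][index[k]] = freq[k]

# Assumed weighting (not from the patent): normalize co-occurrence counts
# by the geometric mean of the two keyword frequencies.
S = [[C[i][j] / math.sqrt(C[i][i] * C[j][j]) for j in range(m)]
     for i in range(m)]

def project(x):
    """Map a binary document vector into the co-occurrence-weighted space (x S)."""
    return [sum(x[i] * S[i][j] for i in range(m)) for j in range(m)]

def kernel(x, y):
    """Assumed semantic kernel: inner product of the projected vectors."""
    return sum(a * b for a, b in zip(project(x), project(y)))

# Kernel (Gram) matrix over all document pairs, usable by a kernel clustering method.
K = {(a, b): kernel(D[a], D[b]) for a in docs for b in docs}
```

Because the kernel is an inner product of projected vectors, the resulting Gram matrix is symmetric and positive semidefinite, which is what kernel-based clustering algorithms require. Step 6 (clustering) is left out of the sketch.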

Description

Technical field

[0001] The invention belongs to the technical field of semantic kernel methods for semantic vector space models, and in particular relates to a semantic kernel method for a co-occurrence latent semantic vector space model for subject clustering of document resources.

Background

[0002] The era of big data has brought a large volume of unstructured text resources. As an unsupervised machine learning method, clustering is one of the main means of mining text resources. Text clustering differs from general data clustering in that the text information must first be represented in a data structure. The basic model of text representation is the vector space model (VSM), which maps each document to a high-dimensional sparse vector in the text space, so the problem of calculating semantic similarity between texts can be transformed into a calculation on vectors in the vector space, that is, by calculating the similarity between vectors to measure...
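The vector space model idea described in the background can be sketched as follows. Cosine similarity is used here as a common choice of vector similarity; the source text is truncated before it names a specific measure, so this choice, the toy vocabulary, and the sample vectors are assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two document vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical vocabulary and binary document vectors (1 = keyword present).
vocab = ["clustering", "kernel", "vector space"]
doc_a = [1, 1, 0]
doc_b = [1, 0, 1]
doc_c = [0, 0, 1]

sim_ab = cosine(doc_a, doc_b)  # the two documents share one keyword
sim_ac = cosine(doc_a, doc_c)  # the two documents share no keyword
```

In a plain VSM, documents with no keyword in common get a similarity of zero even when their keywords are related, which is the kind of missing semantic information that semantic kernel methods aim to recover.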


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G06F17/30, G06F17/27

CPC: G06F16/35, G06F40/30

Inventors: 牛奉高, 张亚宇

Owner: SHANXI UNIV
