Online-increment evolution topic model based automatic software classifying method

A topic model technique applied to automatic software classification. It addresses two problems: mining every text in an open source community with topic mining is unrealistically expensive, and existing approaches cannot reflect the impact of historical and current time slices. The stated effect is improved efficiency, quality and security.

Active Publication Date: 2013-01-30
NAT UNIV OF DEFENSE TECH
2 Cites, 52 Cited by

AI Technical Summary

Problems solved by technology

(3) The collection of software texts in the open source community is huge, and the overhead of topic mining technology is huge. It is unrealistic to use topic mining technology to mine all texts in the open source community, so it is necessary to provide a more




Embodiment Construction

[0037] The invention automatically classifies the massive body of open source software in open source communities, relying mainly on project description text.

[0038] Step 1: Obtain software-related text and preprocess it. Software-related text refers to text information associated with a software project. First, crawler and web page extraction techniques are used to obtain a large number (>100K) of open source software project names, project subject tags (if available; otherwise set to "unmarked"), project description texts, and project creation times (i.e. project registration times).

[0039] In the embodiment, crawler and web page extraction techniques are used to obtain all project information from 2000 to 2009 in the SourceForge community. The required fields are then extracted: project name, project registration time, project subject tag and project description. Table 1 shows two examples.

[0040]

[0041] Table 1

[0042] Then, according to the crawl...
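
The fields listed in Step 1 (project name, registration time, subject tag, description) and the grouping by a preset time slice mentioned in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the `ProjectRecord` class, its field names, the `group_by_time_slice` helper, and the one-year slice width are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class ProjectRecord:
    """One crawled project (hypothetical layout, field names are assumptions)."""
    name: str
    registered: date              # project registration time
    description: str              # project description text
    subject_tag: str = "unmarked" # set to "unmarked" when no tag is available


def group_by_time_slice(records, slice_years=1):
    """Group records by the start year of their time slice.

    The slice width (in whole years) is an assumed parameter; the source
    only says the time slice is preset.
    """
    slices = defaultdict(list)
    for record in records:
        year = record.registered.year
        slice_start = year - (year % slice_years)
        slices[slice_start].append(record)
    # Return slices in chronological order so they can be processed incrementally.
    return dict(sorted(slices.items()))
```

Returning the slices in chronological order matches the incremental setting, where each slice would be processed after the ones that precede it.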


Abstract

An automatic software classification method based on an online incremental evolution topic model. The method comprises: acquiring software-related texts, grouping them by a preset time slice and preprocessing them; generating the probability model of the online evolution topic model, computing the optimal number of topics from the project description texts grouped by time slice, and incrementally computing the topic-word distribution and topic-text distribution of the project description texts within the current time slice; and acquiring a text d whose topic is unknown, computing the topic-word distribution of the n topics to which text d belongs according to the topic-word distribution and topic-text distribution, classifying text d into the corresponding topics, automatically adding semantic tags to the topics based on a word list and a word query method, and thereby completing the classification of software projects. With this method, new topics appearing in open source communities can be discovered in time and software projects can be classified automatically, so that software developers can conveniently find the open source projects they need by software topic. This improves software development efficiency as well as the quality and assurance of open source communities.
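
The step of assigning an unlabeled text d to its n most likely topics can be illustrated with a small sketch. The excerpt does not give the patent's actual scoring formula, so the code below substitutes a plain log-likelihood score computed from a given topic-word distribution and a topic prior. The function name `top_n_topics`, its parameters, and the probability floor for unseen words are assumptions made for illustration only.

```python
import math


def top_n_topics(words, topic_word, topic_prior, n=2, floor=1e-9):
    """Rank topics for a tokenized text and return the n best topic names.

    words       -- list of tokens from the text d
    topic_word  -- {topic: {word: p(word | topic)}}
    topic_prior -- {topic: p(topic)}
    floor       -- assumed probability for words a topic has never seen
    """
    scores = {}
    for topic, word_probs in topic_word.items():
        # log p(topic) + sum of log p(word | topic): a stand-in scoring rule,
        # not the formula from the patent.
        score = math.log(topic_prior[topic])
        for word in words:
            score += math.log(word_probs.get(word, floor))
        scores[topic] = score
    return sorted(scores, key=scores.get, reverse=True)[:n]
```

Working in log space avoids numeric underflow when a description contains many words.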

Description

Technical field

[0001] The present invention relates to the technical field of automatic software classification, in particular to an automatic software classification method based on an online incremental evolution topic model. The method incrementally builds, online, a topic model of the text stream of open source community software, automatically mines the topics hidden in that text stream, assigns each open source software text to the mined topics, and then automatically adds corresponding semantic tags to the topics, thereby realizing automatic classification of open source software.

Background technique

[0002] Open source software is computer software whose source code is freely available. It is usually released under a license agreement, which guarantees users the right to freely use and access the source code. Users can modif...


Application Information

IPC(8): G06F17/30
Inventor: 尹刚, 王怀民, 朱沿旭, 余跃, 史殿习, 李翔, 王涛, 袁霖
Owner NAT UNIV OF DEFENSE TECH