Automatic annotating method for subjects of open source software

A technology for the automatic labeling of open source software, applied in special data processing applications, instruments, electrical digital data processing, and similar fields. It addresses problems such as the inapplicability of Labeled LDA and the shortness of project description texts.

Active Publication Date: 2012-10-31
NAT UNIV OF DEFENSE TECH
3 citations; cited by 16

AI Technical Summary

Problems solved by technology

This model has been used at Yahoo!; however, the labels of open source projects are too detailed…



Examples


Embodiment Construction

[0027] The technical solution of the present invention is described in detail below with reference to the embodiments.

[0028] Step 1: Crawl the open source community to obtain open source project data, including each project's name, labels, and description, and preprocess the project descriptions and project labels. Preprocessing comprises: converting each project label to its root and merging labels that share the same root; deleting projects that are left with fewer than three labels; and converting each project description into a bag of words through word segmentation, stop-word removal, and stemming.
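The Step 1 preprocessing can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation. The stop-word list is a tiny hypothetical sample, and the regular-expression tokenizer stands in for word segmentation. `stem` is a toy suffix stripper used in place of a real stemmer such as Porter's. The function names `stem`, `preprocess_labels`, `to_bag_of_words`, and `preprocess_projects` are illustrative only.

```python
import re
from collections import Counter

# Tiny hypothetical stop-word list; a real system would use a full list.
STOP_WORDS = {"a", "an", "the", "and", "or", "for", "of", "to", "in", "is", "with"}


def stem(word):
    """Toy suffix stripper standing in for a real stemmer such as Porter's."""
    if word.endswith("ing") and len(word) > 5:
        return word[:-3]
    if word.endswith("ed") and len(word) > 4:
        return word[:-2]
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def preprocess_labels(labels):
    """Convert labels to their roots and merge labels sharing a root (order kept)."""
    roots = []
    for label in labels:
        root = stem(label.lower())
        if root not in roots:
            roots.append(root)
    return roots


def to_bag_of_words(description):
    """Segment into words, drop stop words, stem, and count occurrences."""
    tokens = re.findall(r"[a-z0-9]+", description.lower())
    return Counter(stem(t) for t in tokens if t not in STOP_WORDS)


def preprocess_projects(projects, min_labels=3):
    """Apply Step 1 and drop projects left with fewer than min_labels labels."""
    kept = []
    for project in projects:
        labels = preprocess_labels(project["labels"])
        if len(labels) < min_labels:
            continue
        kept.append({
            "name": project["name"],
            "labels": labels,
            "bow": to_bag_of_words(project["description"]),
        })
    return kept
```

In this sketch, labels are merged before the three-label threshold is applied. A project tagged `testing`, `tests`, and `python` therefore counts as having only two labels and is dropped.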

[0029] In this embodiment, web crawling and web page extraction techniques are used to obtain the names, labels, and descriptions of a large number (more than 100K) of open source projects from open source communities such as ohloh and sourceforge. For example, use crawler technology and web page extracti...
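The page-extraction part of this step might look like the sketch below, which uses Python's standard `html.parser` module. The expected markup is a hypothetical assumption: elements with the CSS classes `project-name`, `description`, and `tag`. Real ohloh or sourceforge pages use their own structure. Page fetching (for example with `urllib.request`) and crawl scheduling are omitted, and `ProjectPageParser` and `extract_project` are illustrative names.

```python
from html.parser import HTMLParser


class ProjectPageParser(HTMLParser):
    """Extracts a project's name, description and tags from one page.

    The CSS class names are hypothetical placeholders, not the real markup
    of ohloh or sourceforge.
    """

    FIELD_BY_CLASS = {"project-name": "name", "description": "description", "tag": "labels"}

    def __init__(self):
        super().__init__()
        self.project = {"name": "", "description": "", "labels": []}
        self._field = None  # field currently being filled, if any

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for cls in classes:
            field = self.FIELD_BY_CLASS.get(cls)
            if field is not None:
                self._field = field
                if field == "labels":
                    self.project["labels"].append("")  # start a new tag entry
                break

    def handle_endtag(self, tag):
        self._field = None  # assumes flat markup inside each field

    def handle_data(self, data):
        text = data.strip()
        if not text or self._field is None:
            return
        if self._field == "labels":
            self.project["labels"][-1] += text
        else:
            self.project[self._field] += text


def extract_project(html):
    """Parse one project page and return its name, description and labels."""
    parser = ProjectPageParser()
    parser.feed(html)
    parser.close()
    project = parser.project
    project["labels"] = [t for t in project["labels"] if t]  # drop empty tags
    return project
```

The parser assumes flat markup. A nested child element inside a description would end the current field at its closing tag, so a production extractor would need to track nesting depth.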


Abstract

An automatic annotating method for subjects (topics) of open source software comprises the following steps. Open source project data is obtained. Project labels are converted to their roots, labels with the same root are merged, and project descriptions are converted to bags of words. The names, labels, and descriptions of open source projects are taken as input to a Labeled LDA (Latent Dirichlet Allocation) model, which is trained on the input data through Gibbs sampling. Once sampling has stabilized, all labels assigned to each word in the project descriptions, together with their counts, are obtained, yielding word-to-label assignments. A label network is constructed from these assignments, and the semantic distances and semantic cohesion of the network's nodes are calculated. A new project can then be annotated automatically using the constructed label network. Given the name and description of an arbitrary project p, each word in the description is looked up in the label network to obtain the label set Li of each distinct word i in the description. One label li is selected from each Li such that the semantic cohesion Cohesion(L) of the selected labels L is maximized, and the labels satisfying this condition are automatically assigned to the new project.
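The Labeled LDA training step can be illustrated with a minimal collapsed Gibbs sampler. In Labeled LDA, each project's topics are restricted to that project's own labels. Each word token is therefore resampled only over its project's label set, using the standard collapsed-Gibbs full conditional (n_dk + α)(n_kw + β)/(n_k + Vβ). This is a sketch, not the patent's code. The hyperparameters `alpha` and `beta`, the fixed iteration count (standing in for a check that sampling has stabilized), and the function name `labeled_lda_gibbs` are all assumptions. The function returns, for every word, the labels it was assigned to and their counts.

```python
import random
from collections import defaultdict


def labeled_lda_gibbs(docs, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for Labeled LDA.

    docs: list of (words, labels) pairs. Every token may only be assigned to
    one of its own document's labels. Returns word -> {label: count} taken
    from the final sampling state.
    """
    rng = random.Random(seed)
    vocab_size = len({w for words, _ in docs for w in words})
    n_dk = [defaultdict(int) for _ in docs]        # tokens per (document, label)
    n_kw = defaultdict(lambda: defaultdict(int))   # tokens per (label, word)
    n_k = defaultdict(int)                         # tokens per label
    z = []                                         # current label of every token

    # Random initialisation restricted to each document's label set.
    for d, (words, labels) in enumerate(docs):
        z.append([])
        for w in words:
            k = rng.choice(labels)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1

    for _ in range(iterations):  # fixed count stands in for a stabilisation check
        for d, (words, labels) in enumerate(docs):
            for i, w in enumerate(words):
                k = z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional, evaluated over this document's labels only.
                weights = [
                    (n_dk[d][lab] + alpha) * (n_kw[lab][w] + beta)
                    / (n_k[lab] + vocab_size * beta)
                    for lab in labels
                ]
                k = rng.choices(labels, weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    # Collect, for every word, the labels it ended up assigned to and their counts.
    assignments = defaultdict(lambda: defaultdict(int))
    for d, (words, _) in enumerate(docs):
        for w, k in zip(words, z[d]):
            assignments[w][k] += 1
    return {w: dict(counts) for w, counts in assignments.items()}
```

A word that appears only in single-label projects is always assigned that label. Words shared by projects with several labels are distributed among those labels according to the sampled counts.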

Description

Technical field

[0001] The invention relates to an automatic tagging method for open source software topics, and in particular to a method for automatically adding tags to unknown software by constructing an open source project tag network model.

Background art

[0002] Open source software (OSS) plays an increasingly important role in software engineering. Many open source communities host tens of thousands of open source software projects, and some giant communities such as sourceforge.net and googlecode contain a large number of them. These communities hold a wide variety of data about open source projects, which is of great value to research in software engineering.

[0003] With the rapid accumulation of open source project data, it has become increasingly difficult for project engineers to quickly find the open source projects they need. However, text processing and labeling technologies for project summaries can be used to meet ...
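The abstract does not define how the tag (label) network, semantic distance, or semantic cohesion are computed. The following Python sketch therefore fills them in with assumed, hypothetical definitions purely for illustration:

- Two labels are linked when some word was assigned to both.
- The semantic distance between two labels is their shortest-path hop count.
- Cohesion(L) is the negated sum of pairwise distances within L.

Given the label set Li of each distinct description word, `annotate` picks one label li from each Li so that Cohesion(L) is maximal. The input `word_labels` has the shape word -> {label: count}. All function names are hypothetical.

```python
from collections import deque
from itertools import combinations, product


def build_label_network(word_labels):
    """Hypothetical construction: labels are nodes, and two labels are linked
    when at least one word was assigned to both of them."""
    graph = {}
    for labels in word_labels.values():
        for label in labels:
            graph.setdefault(label, set())
        for a, b in combinations(sorted(labels), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def semantic_distance(graph, source, target):
    """Hypothetical distance: shortest-path hop count (BFS).
    Unreachable pairs get len(graph), which exceeds any real path length."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return len(graph)


def cohesion(graph, labels):
    """Hypothetical Cohesion(L): the negated sum of pairwise semantic distances."""
    return -sum(semantic_distance(graph, a, b) for a, b in combinations(labels, 2))


def annotate(description_words, word_labels, graph):
    """Pick one label l_i from each L_i so that Cohesion(L) is maximal."""
    distinct = dict.fromkeys(w for w in description_words if w in word_labels)
    candidate_sets = [sorted(word_labels[w]) for w in distinct]
    if not candidate_sets:
        return set()
    best = max(product(*candidate_sets), key=lambda choice: cohesion(graph, choice))
    return set(best)
```

Consider a toy network in which `parser` maps to {compiler, xml} and `query` maps to {database, xml}. A description containing both words is resolved to the single label `xml`, the choice that keeps the selected labels tightest. The exhaustive search over `product(*candidate_sets)` grows exponentially with the number of distinct words, so it is only practical for short descriptions. A real system would likely need a heuristic or approximate search.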


Application Information

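IPC(8): G06F17/30
Inventors: Wang Huaimin (王怀民), Yin Gang (尹刚), Wang Tao (王涛), Li Xiang (李翔), Zhu Yanxu (朱沿旭), Shi Dianxi (史殿习), Ding Bo (丁博), Liu Hui (刘惠), Teng Meng (滕猛), Yuan Lin (袁霖)
Owner: NAT UNIV OF DEFENSE TECH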