Automatic thesaurus construction method for specific software historical code library
A technique for automatically constructing a thesaurus from a code base, applied in the field of automatic thesaurus construction, which addresses the lack of pertinence and low accuracy of knowledge bases.
Embodiment Construction
[0034] The technical scheme of the present invention is described in detail below with reference to the accompanying drawings:
[0035] Step 1). Extract the code and comments from the historical version repository of the software system (this example uses a software system written in Java) to generate an independent document corpus, and divide the corpus into a code-only document library and a comment-only document library.
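The split described in Step 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `split_code_and_comments` and the regex-based approach are assumptions made for illustration, and the sketch does not handle Java text blocks (`"""`). String and character literals are matched first so that a `//` inside a literal is not mistaken for a comment.

```python
import re

# Matches, left to right: string literals, char literals,
# block comments (including Javadoc), and line comments.
_TOKEN = re.compile(
    r'"(?:\\.|[^"\\])*"'     # string literal
    r"|'(?:\\.|[^'\\])*'"    # char literal
    r"|/\*.*?\*/"            # block comment
    r"|//[^\n]*",            # line comment
    re.DOTALL,
)


def split_code_and_comments(source):
    """Return (code_without_comments, list_of_comments) for Java source text."""
    comments = []

    def _replace(match):
        text = match.group(0)
        if text.startswith("/*") or text.startswith("//"):
            comments.append(text)
            return " "  # keep token boundaries intact in the code part
        return text      # literals stay in the code part

    code = _TOKEN.sub(_replace, source)
    return code, comments


if __name__ == "__main__":
    sample = 'class A { // hello\n String s = "http://x"; /* block */ }'
    code, comments = split_code_and_comments(sample)
    print(comments)
    print(code)
```

Applied to every file in each historical revision, the two return values would feed the code-only and comment-only document libraries respectively.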
[0036] Step 2). Preprocess the code-only documents in the corpus, including tokenizing, removing stop words, and extracting elements (as shown in Figure 2, including identifiers, class names, method names, and variable names), to obtain words and phrases and their support in the code (Code-TF). In addition, during tokenization, the inheritance relationship between classes (kind-of) is analyzed. Using the Java syntax pattern "+implements+", with "implements" as the middle word, the relationship between classes and interfaces (realize-of) is analyzed, and W\WG-...
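A rough Python sketch of the preprocessing in Step 2 follows. Several details are assumptions rather than statements from the source: the stop-word list is a hypothetical handful of Java keywords, identifiers are split on camelCase and underscores, Code-TF is taken to be a plain occurrence count of single words (phrases are not covered), and the realize-of pattern ignores generics such as `Comparable<Foo>`. The names `code_tf` and `realize_of` are illustrative only.

```python
import re
from collections import Counter

# Hypothetical stop-word list: a few Java keywords, for illustration only.
STOP_WORDS = {
    "public", "private", "protected", "class", "interface", "implements",
    "extends", "void", "return", "new", "int", "static", "final",
}

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
# Splits camelCase / PascalCase / snake_case identifiers into word parts.
CAMEL_PART = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")
# "class X [extends Y] implements A, B {" with "implements" as the middle word.
IMPLEMENTS = re.compile(
    r"\bclass\s+(\w+)(?:\s+extends\s+\w+)?\s+implements\s+([\w\s,]+?)\s*\{"
)


def code_tf(code):
    """Count how often each non-stop word occurs in comment-free Java code."""
    counts = Counter()
    for identifier in IDENTIFIER.findall(code):
        for part in CAMEL_PART.findall(identifier):
            word = part.lower()
            if word not in STOP_WORDS:
                counts[word] += 1
    return counts


def realize_of(code):
    """Return (class, interface) pairs found via the 'implements' keyword."""
    relations = []
    for match in IMPLEMENTS.finditer(code):
        class_name = match.group(1)
        for interface in match.group(2).split(","):
            relations.append((class_name, interface.strip()))
    return relations


if __name__ == "__main__":
    sample = "public class FileParser implements Parser, Closeable { void parseFile() { } }"
    print(code_tf(sample))
    print(realize_of(sample))
```

A production version would more likely use a real Java parser than regular expressions; the regex form is kept here only because it mirrors the keyword-centered matching the text describes.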