Self-information-based discovery method for co-occurrence topics in interdisciplinary fields
A discovery method based on self-information, applied in special-purpose data processing applications, instruments, unstructured text data retrieval, etc. It addresses two problems: co-occurrence topic information cannot be extracted well, and co-occurrence topic words cannot be extracted.
Examples
Embodiment 1
[0033] Referring to Figure 1, this self-information-based method for discovering co-occurrence topics in interdisciplinary fields comprises the following operation steps:
[0034] (1) Data collection: collect the self-assessment documents in which highly cited authors reflect on the success of their scientific research;
[0035] (2) Data processing: extract and digitize the text of the self-assessments;
[0036] (3) Extract candidate low-frequency topic words;
[0037] (4) Calculate the evaluation coefficient of each low-frequency topic word;
[0038] (5) Set the threshold for the evaluation coefficient of low-frequency topic words;
[0039] (6) Filter the low-frequency topic words.
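Steps (3) to (6) can be sketched as follows. The excerpt does not give the formula for the evaluation coefficient, so this minimal sketch assumes it is simply the self-information of a word, I(w) = -log2 p(w), with p(w) estimated as the word's relative frequency in the document set. The frequency cutoff `max_count`, the threshold value, and the function names are hypothetical illustrations, not the patented procedure.

```python
import math
from collections import Counter


def self_information(counts):
    """Self-information I(w) = -log2 p(w), with p(w) = count(w) / total count."""
    total = sum(counts.values())
    return {w: -math.log2(c / total) for w, c in counts.items()}


def filter_low_frequency_topic_words(tokens, max_count=2, threshold=3.0):
    """Hypothetical pipeline for steps (3)-(6).

    max_count: assumed cutoff for a word to count as a low-frequency candidate.
    threshold: assumed evaluation-coefficient threshold (step 5).
    """
    counts = Counter(tokens)
    # Step (3): candidate low-frequency topic words (count at or below the cutoff).
    candidates = {w for w, c in counts.items() if c <= max_count}
    # Step (4): evaluation coefficient, assumed here to be plain self-information.
    info = self_information(counts)
    coefficients = {w: info[w] for w in candidates}
    # Steps (5)-(6): keep only candidates whose coefficient reaches the threshold.
    return sorted(w for w, v in coefficients.items() if v >= threshold)
```

For example, with the tokens `"data data data data model model model network entropy".split()`, the rare words `network` and `entropy` each have I(w) = log2 9 ≈ 3.17 bits, so both pass a threshold of 3.0. Rarer words carry more self-information, which is why a lower bound on the coefficient keeps them.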
Embodiment 2
[0040] Embodiment 2: This embodiment is essentially the same as Embodiment 1; its distinguishing features are as follows:
[0041] The data collection in step (1) is performed as follows: from the self-assessments gathered by Garfield, founder of the citation database SCI, in which the authors of highly cited classic papers discuss the success of their scientific research, 3,790 author self-assessment documents are collected to form the document collection.
[0042] The data processing in step (2) is performed as follows: digitize and extract the text in the document collection. In addition, three types of information are extracted: the text content of the self-assessment, information about the self-assessment, and information about the original highly cited document.
[0043] Step (3), extracting candidate low-frequency topic words, is performed as follows: first, use the "Natural Language Toolse...
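The description of step (3) is cut off at the tool name, so the following is only a stand-in, not the tool the patent uses: a minimal standard-library tokenizer that lowercases text, keeps alphabetic tokens, and drops a small hypothetical stop-word list. It then treats words occurring in at most `max_doc_count` documents as candidates; that document-frequency criterion and all names here are assumptions.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "was", "we", "our", "this"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def candidate_low_frequency_words(texts, max_doc_count=1):
    """Words appearing in at most `max_doc_count` documents (assumed criterion)."""
    doc_counts = Counter()
    for text in texts:
        # Count each word once per document (document frequency).
        doc_counts.update(set(tokenize(text)))
    return sorted(w for w, c in doc_counts.items() if c <= max_doc_count)
```

For the two texts "The method of citation analysis" and "Citation analysis and entropy", `citation` and `analysis` occur in both documents, so only `entropy` and `method` remain as candidates.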
Embodiment 3
[0054] As shown in Figure 1, this self-information-based method for discovering co-occurrence topics in interdisciplinary fields includes the following steps:
[0055] (1) Data collection. More than 5,000 PDF documents were accessed in the Garfield Electronic Library at the University of Pennsylvania. After three preprocessing tasks (removing noisy data, removing duplicate data, and discarding records with missing data), 3,790 usable documents with complete information were obtained and used to build the self-assessment document set.
[0056] (2) Data processing. The text of each self-assessment in the document set was extracted and digitized. In addition, three types of information were extracted: the text content of the self-assessment; information about the self-assessment (such as its author, the author's address, the year of the self-assessment, and its subject field label); and the o...
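The three preprocessing tasks in step (1) can be illustrated with a minimal sketch over hypothetical records, here plain dicts with assumed keys `author`, `year`, and `text`. The excerpt does not define noise, so this sketch assumes a noisy record is one whose text is too short to be a self-assessment; `min_text_length` and the function name are illustrative only.

```python
REQUIRED_FIELDS = ("author", "year", "text")  # assumed required metadata


def preprocess(records, min_text_length=50):
    """Drop incomplete, noisy, and duplicate records (assumed criteria)."""
    cleaned = []
    seen_texts = set()
    for rec in records:
        # Discard missing data: any required field absent or empty.
        if any(not rec.get(field) for field in REQUIRED_FIELDS):
            continue
        text = " ".join(rec["text"].split())  # normalize whitespace
        # Delete noise data: assumed to mean text too short to be a self-assessment.
        if len(text) < min_text_length:
            continue
        # Delete duplicate data: the same normalized text was already kept.
        key = text.lower()
        if key in seen_texts:
            continue
        seen_texts.add(key)
        cleaned.append({**rec, "text": text})
    return cleaned
```

Checking for missing fields first means a record without text is never inspected for noise, and deduplicating on normalized text catches copies that differ only in spacing or case.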
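The three types of extracted information could be held in a simple record type, sketched below. The class and field names are hypothetical and follow the examples named in step (2); the fields describing the original highly cited document are cut off in the excerpt, so a free-form dictionary stands in as a placeholder.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SelfAssessmentRecord:
    """Hypothetical container for one self-assessment and its metadata."""

    # 1. Text content of the self-assessment.
    text: str
    # 2. Information about the self-assessment (examples named in the source).
    author: str
    author_address: Optional[str] = None
    year: Optional[int] = None
    subject_field: Optional[str] = None
    # 3. Information about the original highly cited document; its fields are
    #    not given in the excerpt, so a free-form dict is used as a placeholder.
    original_document: dict = field(default_factory=dict)
```

Usage: `SelfAssessmentRecord(text="...", author="A. Author", year=1985, subject_field="Chemistry")`. Using `default_factory` gives each record its own empty dictionary rather than one shared mutable default.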