Data tag generation method and apparatus
A data labeling technology in the field of Internet data. It addresses problems such as degraded topic-clustering quality, time-consuming and expensive labeling, and scattered, free-form labels, and it produces labels with detailed and rich content, accurate content division, and a complete structure.
Examples
Embodiment 1
[0056] A method for generating data labels: obtain original text data; perform top-level classification on the original text data using a top-level subject database to obtain multiple sets of top-level subject text data; perform de-redundancy preprocessing on the top-level subject text data to obtain multiple sets of top-level topic preprocessed text data; for each set of top-level topic preprocessed text data, obtain the total number of documents N and the total number of words M, extract the Tf-idf feature value of each word in each document, and assemble the matrix data V, where V has N rows (one row per document) and M columns (each column holds the Tf-idf feature values of one word across the N documents); perform topic clustering on the matrix data V to obtain X different topic clusters; from each topic cluster, pick 20-50 keywords closely related to that cluster; sort according to the...
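The embodiment does not say which clustering algorithm produces the X topic clusters or how the 20-50 keywords are scored. The following is a minimal sketch under two stated assumptions: plain k-means on the rows of V (with `k` standing in for X), and ranking each cluster's words by their mean Tf-idf value over the cluster's documents. `kmeans` and `top_keywords` are hypothetical helpers for illustration, not the patent's own method.

```python
import random
from typing import List, Sequence

Matrix = List[List[float]]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    # Squared Euclidean distance between two document rows of V
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(V: Matrix, k: int, iters: int = 100, seed: int = 0) -> List[int]:
    """Assumed clustering step: Lloyd's k-means over the N rows of V."""
    rng = random.Random(seed)
    centroids = [list(row) for row in rng.sample(V, k)]  # k distinct documents as seeds
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each document joins its nearest centroid
        new_labels = [min(range(k), key=lambda c: _sq_dist(row, centroids[c])) for row in V]
        if new_labels == labels:
            break  # converged: no document changed cluster
        labels = new_labels
        # Update step: each centroid becomes the mean Tf-idf vector of its members
        for c in range(k):
            members = [V[i] for i, lab in enumerate(labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def top_keywords(V: Matrix, vocab: List[str], labels: List[int], k: int,
                 n_keywords: int = 20) -> List[List[str]]:
    """Assumed scoring: rank words by mean Tf-idf within each cluster, keep the top n."""
    keywords: List[List[str]] = []
    for c in range(k):
        members = [V[i] for i, lab in enumerate(labels) if lab == c]
        if not members:
            keywords.append([])
            continue
        mean = [sum(col) / len(members) for col in zip(*members)]
        ranked = sorted(range(len(vocab)), key=lambda j: mean[j], reverse=True)
        # Keep only words that actually occur in the cluster (mean score > 0)
        keywords.append([vocab[j] for j in ranked[:n_keywords] if mean[j] > 0])
    return keywords
```

Calling `labels = kmeans(V, k)` and then `top_keywords(V, vocab, labels, k, n_keywords=30)` returns one list of up to 30 candidate label words per topic cluster; setting `n_keywords` anywhere from 20 to 50 matches the range given in the text. Mean Tf-idf is only one plausible relevance score; distance to the cluster centroid or a per-cluster chi-squared test would be equally reasonable substitutes.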
Embodiment 2
[0093] A data label generating device, comprising: an original data acquisition module; and a top-level subject database module, configured to perform top-level classification on the original text data to obtain the top-level subject text data of the original text data;
[0094] A data preprocessing module, configured to perform de-redundancy preprocessing on each set of top-level topic text data to obtain multiple sets of top-level topic preprocessed text data;
[0095] A matrix data acquisition module, configured to obtain the total number of documents and the total number of words in each set of top-level topic preprocessed text data, and to extract the Tf-idf feature value of each word in each document of the same set, to obtain matrix data; the matrix data has one row per document (the number of rows equals the total number of documents) and one column per word (the number of columns equals the total number of words), and each column is the Tf-idf feature value of a word in multi...
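The three modules described above can be sketched as a small pipeline. This is a minimal illustration under several assumptions not stated in the patent: the top-level subject database is modeled as a hypothetical `TOP_LEVEL_DB` mapping each topic to seed keywords, "de-redundancy" is taken to mean stripping a hypothetical `STOP_WORDS` list and dropping duplicate documents, and Tf-idf uses relative term frequency times `log(N / df)`. All function names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical top-level subject database: topic name -> seed keywords.
# The patent does not describe the database's contents or format.
TOP_LEVEL_DB: Dict[str, Set[str]] = {
    "sports": {"match", "goal", "league", "coach"},
    "finance": {"stock", "market", "bank", "interest"},
}

# Hypothetical stop-word list, treated here as the "redundancy" to strip.
STOP_WORDS = {"the", "a", "an", "of", "on", "is", "and", "to", "in"}


def tokenize(text: str) -> List[str]:
    # Lowercase word tokens; Chinese text would need a word segmenter instead.
    return re.findall(r"\w+", text.lower())


def classify_top_level(docs: List[str], db: Dict[str, Set[str]] = TOP_LEVEL_DB,
                       fallback: str = "other") -> Dict[str, List[str]]:
    """Top-level subject database module: group raw documents by seed-keyword overlap."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for doc in docs:
        tokens = set(tokenize(doc))
        scores = {topic: len(tokens & seeds) for topic, seeds in db.items()}
        best = max(scores, key=scores.get)  # ties go to the first topic listed
        groups[best if scores[best] > 0 else fallback].append(doc)
    return dict(groups)


def deredundify(docs: List[str]) -> List[str]:
    """Data preprocessing module: strip stop words and drop duplicate documents."""
    seen: Set[str] = set()
    cleaned: List[str] = []
    for doc in docs:
        text = " ".join(t for t in tokenize(doc) if t not in STOP_WORDS)
        if text and text not in seen:  # skip empty docs and duplicates after normalization
            seen.add(text)
            cleaned.append(text)
    return cleaned


def tfidf_matrix(docs: List[str]) -> Tuple[List[str], List[List[float]]]:
    """Matrix data module: V has N rows (documents) and M columns (distinct words)."""
    tokenized = [doc.split() for doc in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    n_docs = len(docs)
    df = Counter(w for toks in tokenized for w in set(toks))  # document frequency
    V: List[List[float]] = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1
        # tf = relative frequency in the document; idf = log(N / df)
        V.append([counts[w] / total * math.log(n_docs / df[w]) for w in vocab])
    return vocab, V
```

Per top-level topic, the chain is `vocab, V = tfidf_matrix(deredundify(groups[topic]))` with `groups = classify_top_level(raw_docs)`. With this unsmoothed idf, a word that appears in every document of a topic scores 0, which suits label generation because such words do not separate sub-topics; a smoothed idf would be the alternative if every word should keep a nonzero weight.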