Document classification method and device, computer equipment and storage medium
A document classification technology applied in the fields of computing and bidding. It addresses problems such as the difficulty of managing bidding documents efficiently, and achieves the effect of reducing labor costs and improving efficiency.
Examples
Embodiment 1
[0061] Figure 1 is a flowchart of a document classification method provided by an embodiment of the present invention. As shown, the document classification method may include the following steps:
[0062] Step S101: Obtain target bidding documents to be classified.
[0063] Step S102: Extract classification feature vectors based on the text content of the target bidding document.
[0064] The classification feature vector includes at least the subject matter and the classification information of the target bidding document.
[0065] In one embodiment, the classification information includes at least industry category and item type.
[0066] Step S103: Input the classification feature vector into a pre-established document classification model to obtain a classification result for the target bidding document.
[0067] The document classification model is a classifier that uses historical bidding documents as training data and uses the XGBoost algorithm to...
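The flow of Steps S101 to S103 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation described in the patent: the field names (`subject_matter`, `industry_category`, `item_type`), the helper functions, and the keyword-based `stand_in_model` are all hypothetical. The source describes a trained XGBoost classifier where the stand-in is used here.

```python
from typing import Callable, Dict, List

# Hypothetical field names: the source only says the feature vector contains
# the subject matter and classification information (industry category, item type).
FEATURE_FIELDS = ["subject_matter", "industry_category", "item_type"]


def extract_features(document: Dict[str, str]) -> List[str]:
    """Step S102: pull the classification features out of a parsed document."""
    return [document.get(field, "").strip().lower() for field in FEATURE_FIELDS]


def classify(document: Dict[str, str], model: Callable[[List[str]], str]) -> str:
    """Steps S102-S103: extract features and pass them to a pre-built model."""
    return model(extract_features(document))


def stand_in_model(features: List[str]) -> str:
    """Placeholder for the trained classifier (XGBoost in the source)."""
    text = " ".join(features)
    return "construction" if "road" in text or "building" in text else "other"


# Step S101: obtain a target bidding document (already parsed into fields here).
target = {
    "subject_matter": "Road resurfacing works",
    "industry_category": "Municipal engineering",
    "item_type": "Engineering",
}
result = classify(target, stand_in_model)
```

Passing the model in as a callable keeps the feature extraction independent of the classifier, so a trained model could replace the stand-in without changing the rest of the flow.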
Embodiment 2
[0072] Figure 2 is another flowchart of the document classification method provided by the embodiment of the present invention. As shown, the method may include the following steps:
[0073] Step S201: Obtain target bidding documents to be classified.
[0074] Step S202: Perform preprocessing on the text content of the target bidding document.
[0075] Preprocessing the text content generally includes two processes: cleaning the classification information table and information integration. The first step, cleaning, deletes useless information such as the project number and bid section number from the information table of the target bidding document. This information has no reference value for evaluating the training effect, so it is removed. The second step is information integration: after relatively standardized data and the subject matter are obtained, information integration is carried out, ...
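The two preprocessing stages might look like the sketch below. The source text is truncated before it explains how integration is performed, so joining the cleaned fields and the subject matter into a single string is an assumption, as are the dictionary keys and the function names `clean_info_table` and `integrate`.

```python
from typing import Dict

# Hypothetical keys for fields that carry no classification value.
USELESS_FIELDS = {"project_number", "bid_section_number"}


def clean_info_table(info_table: Dict[str, str]) -> Dict[str, str]:
    """Cleaning: drop fields such as project number and bid section number."""
    return {k: v for k, v in info_table.items() if k not in USELESS_FIELDS}


def integrate(info_table: Dict[str, str], subject_matter: str) -> str:
    """Integration (assumed): merge subject matter and cleaned fields into one text."""
    parts = [subject_matter] + [v for v in info_table.values() if v]
    return " ".join(p.strip() for p in parts)


raw = {
    "project_number": "P-001",
    "bid_section_number": "S-02",
    "project_name": "School canteen equipment",
    "project_unit": "Example Education Bureau",
}
cleaned = clean_info_table(raw)
text = integrate(cleaned, "Kitchen equipment procurement")
```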
Embodiment 3
[0096] The XGBoost-based document classification model provided by the embodiment of the present invention is described below with reference to Figure 5 and a specific example.
[0097] a. Read the preprocessed data into the model, specify the training text content and the corresponding classification labels, and store them in a pandas DataFrame;
[0098] b. When classifying by industry category, use text containing the classification information, the project name, and the project unit as the training text content;
[0099] c. When classifying by project type, the classification information extracted from the text content of the bidding document still contains redundant data; however, because the data volume is larger, it includes more useful information about the project type. Therefore, when the bidding document is classified by project type, the classification information set of the bidding document can...
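Steps a and b can be illustrated with a short pandas sketch. The records, column names, and label values are hypothetical and exist only for illustration. Step c is truncated in the source, so it is not covered here, and the model training itself is left out.

```python
import pandas as pd

# Hypothetical preprocessed records; column names are assumptions.
records = [
    {"classification_info": "municipal works", "project_name": "Road repair",
     "project_unit": "City Roads Office", "industry_category": "construction"},
    {"classification_info": "medical devices", "project_name": "CT scanner purchase",
     "project_unit": "County Hospital", "industry_category": "healthcare"},
]

# Step a: hold the training text and its labels in a pandas DataFrame.
df = pd.DataFrame(records)

# Step b: for industry-category classification, the training text combines
# classification information, project name, and project unit.
text_columns = ["classification_info", "project_name", "project_unit"]
df["training_text"] = df[text_columns].apply(lambda row: " ".join(row), axis=1)

# Keep only the text and its label for the training stage.
train = df[["training_text", "industry_category"]].rename(
    columns={"industry_category": "label"}
)
```

The resulting `train` frame has one text column and one label column, which is a convenient shape for a later vectorization and training stage.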