
Document classification method and device, computer equipment and storage medium

A document classification technology applied in the computer and bidding fields. It addresses the difficulty of managing bidding documents efficiently, reducing labor costs and improving efficiency.

Pending Publication Date: 2021-10-19
国家能源集团国际工程咨询有限公司

AI Technical Summary

Problems solved by technology

[0002] In existing document archiving applications, computer-based classification of natural language text is already used in many industries. In the bidding business field, however, the large volume of bidding data makes it difficult for staff to manage bidding documents efficiently. A document classification scheme suited to the bidding business field is therefore needed to manage the bidding business automatically and efficiently and to make it more intelligent and electronic.



Examples


Embodiment 1

[0061] Figure 1 is a flowchart of a document classification method provided by an embodiment of the present invention. The document classification method may include the following steps:

[0062] Step S101: Obtain target bidding documents to be classified.

[0063] Step S102: Extract classification feature vectors based on the text content of the target bidding document.

[0064] The classification feature vector includes at least the subject matter and classification information of the target bidding document.

[0065] In one embodiment, the classification information includes at least industry category and item type.

[0066] Step S103: Input the classification feature vector into a pre-established document classification model to obtain a classification result for the target bidding document.

[0067] The document classification model is a classifier that uses historical bidding documents as training data and uses the XGBoost algorithm to...
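Steps S101 to S103 can be sketched in a few lines of Python. This is a minimal illustration only, not the patented implementation: the `BiddingDocument` container, its field names, the whitespace tokenizer, and the bag-of-words counting are all assumptions made for the example, and the trained XGBoost model is represented by any callable that maps a feature vector to a label.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class BiddingDocument:
    # Hypothetical container; the field names are assumptions for illustration.
    subject_matter: str
    industry_category: str
    item_type: str


def tokenize(doc: BiddingDocument) -> List[str]:
    """Join the subject matter and classification information, then split on whitespace."""
    text = " ".join([doc.subject_matter, doc.industry_category, doc.item_type])
    return text.lower().split()


def build_vocabulary(docs: Sequence[BiddingDocument]) -> Dict[str, int]:
    """Assign each token seen in the historical documents a vector index."""
    vocab: Dict[str, int] = {}
    for doc in docs:
        for token in tokenize(doc):
            vocab.setdefault(token, len(vocab))
    return vocab


def extract_feature_vector(doc: BiddingDocument, vocab: Dict[str, int]) -> List[int]:
    """Step S102 (assumed form): count known tokens into a fixed-length vector."""
    vector = [0] * len(vocab)
    for token, count in Counter(tokenize(doc)).items():
        index = vocab.get(token)
        if index is not None:  # tokens unseen in the vocabulary are ignored
            vector[index] = count
    return vector


def classify(doc: BiddingDocument, vocab: Dict[str, int],
             model: Callable[[List[int]], str]) -> str:
    """Step S103: feed the feature vector to a pre-established model."""
    return model(extract_feature_vector(doc, vocab))
```

Passing the model in as a callable keeps the feature extraction independent of the classifier, so a trained XGBoost model could be substituted without changing the other functions.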

Embodiment 2

[0072] Figure 2 is another flowchart of the document classification method provided by an embodiment of the present invention. The method may include the following steps:

[0073] Step S201: Obtain target bidding documents to be classified.

[0074] Step S202: Perform preprocessing on the text content of the target bidding document.

[0075] Preprocessing the text content generally involves two steps: cleaning the classification information table and integrating information. The first step, cleaning the classification information table, deletes useless information such as the project number and bid section number from the information table of the target bidding document. These fields have no reference value for evaluating the training effect, so they are removed. The second step is information integration: once relatively standardized data and the subject matter have been obtained, information integration is carried out, ...
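The two preprocessing steps can be illustrated with a short Python sketch. The field names (`project_number`, `bid_section_number`, and so on) are hypothetical, and because the source text is truncated before it describes how integration is performed, joining the cleaned values with the subject matter into one string is an assumption rather than the documented method.

```python
from typing import Dict

# Assumed keys for the fields the text calls useless for classification.
USELESS_FIELDS = {"project_number", "bid_section_number"}


def clean_info_table(info_table: Dict[str, str]) -> Dict[str, str]:
    """Step 1: drop useless fields and empty values, trimming whitespace."""
    return {
        key: value.strip()
        for key, value in info_table.items()
        if key not in USELESS_FIELDS and value and value.strip()
    }


def integrate(info_table: Dict[str, str], subject_matter: str) -> str:
    """Step 2 (assumed form): merge the subject matter and cleaned fields into one text."""
    cleaned = clean_info_table(info_table)
    parts = [subject_matter.strip()] + list(cleaned.values())
    return " ".join(part for part in parts if part)
```

For example, a table holding a project number, a bid section number, a project name, and a project unit would be reduced to the last two fields before being merged with the subject matter.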

Embodiment 3

[0096] With reference to Figure 5, the XGBoost-based document classification model provided by the embodiment of the present invention is described below using a specific example.

[0097] a. Read the preprocessed data into the model, specify the training text content and the corresponding classification labels and store them in the DataFrame structure of pandas;

[0098] b. When classifying according to the industry category, use the text containing the classification information, the project name, and the project unit as the training text content;

[0099] c. When classifying by project type, the classification information extracted from the text content of the bidding document still contains redundant data, but the larger data volume also carries more useful information about the project type. Therefore, when bidding documents are classified by project type, the classification information set of the bidding document can...


Abstract

The invention discloses a document classification method and device, computer equipment and a storage medium, and relates to the technical field of computers and bid invitation. The document classification method comprises the following steps: obtaining a target bid invitation document to be classified; extracting a classification feature vector based on the text content of the target bid invitation document, wherein the classification feature vector comprises at least the subject matter and classification information of the target bid invitation document; and inputting the classification feature vector into a pre-established document classification model to obtain a classification result for the target bid invitation document. The document classification model is a classifier that takes historical bid invitation documents as training data, performs machine learning on the training data using the XGBoost algorithm, and establishes a mapping between classification feature vectors and classification results. In this way, the bid invitation service is managed automatically and efficiently without complex manual operation, the service becomes more intelligent and electronic, efficiency improves, and labor costs fall.

Description

Technical field

[0001] The invention relates to the technical fields of computer and bidding, in particular to a document classification method, device, computer equipment and storage medium.

Background technique

[0002] In the existing document archiving applications, the technology of using computers to classify natural language has been involved in many industries. However, for the bidding business field, due to the large amount of bidding business data, it is difficult for staff to achieve efficient management of bidding documents. Therefore, it is necessary to propose a document classification scheme suitable for the bidding business field to realize the automatic and efficient management of the bidding business and make the bidding business more intelligent and electronic.

Contents of the invention

[0003] The technical problem to be solved by the present invention is to propose a document classification scheme applicable to the bidding business field, so as to rea...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/35, G06F40/216, G06F40/284, G06N20/00
CPC: G06F16/35, G06F40/284, G06F40/216, G06N20/00
Inventor: 严蕾, 苏晓辉, 任泽, 沈志远, 李维盈, 陈建
Owner: 国家能源集团国际工程咨询有限公司