A text classification method, apparatus, device and storage medium for power grid maintenance orders
This invention performs global frequency statistics on the keywords of power grid maintenance orders and assigns coefficients to them. This addresses two problems of existing techniques: they need a large amount of data, and they struggle with complex inter-class relationships. The result is efficient and reliable text classification of power grid maintenance orders.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- GUANGDONG POWER GRID CO LTD
- Filing Date
- 2022-03-18
- Publication Date
- 2026-06-16
AI Technical Summary
Existing machine learning text matching methods require a large amount of training data and cannot effectively handle complex inter-class relationships, resulting in a lack of accuracy and reliability in the text classification results of power grid maintenance orders.
By selecting a set of keywords from historical maintenance orders, performing global frequency statistics and ascending sorting, assigning keyword coefficient sequences, and using the keyword coefficient sequences and keyword sequences to score the text sentences to be identified, the maximum value is selected as the target sentence category.
It achieves accurate matching of the text category of power grid maintenance orders with a limited amount of historical maintenance order data, avoiding the influence of complex inter-class relationships and improving the reliability of classification results.

Figure CN114625839B_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of text classification technology, and in particular to a text classification method, apparatus, equipment and storage medium for power grid maintenance orders.
Background Technology
[0002] In power production, maintenance and repair are crucial components. Maintenance team leaders often need to assign tasks and emphasize safety precautions, which are then included in pre-shift and post-shift meetings. For a given type of work the wording of the safety precautions stays the same, but the descriptions of tasks vary widely. Analyzing work tasks expressed in natural language and accurately matching them with the appropriate safety precautions can significantly improve the efficiency of work assignment. It can even allow pre-shift and post-shift meeting records to be generated automatically.
[0003] The most efficient existing text matching approach is classification based on machine learning. However, it relies on a large amount of training data. It also struggles to achieve high matching accuracy when the relationships between texts are more complex, so it performs poorly in practical applications.
Summary of the Invention
[0004] This application provides a text classification method, apparatus, device, and storage medium for power grid maintenance orders, which addresses the technical problem that existing machine learning matching methods require a large amount of training data and cannot handle complex inter-class relationships, resulting in a lack of accuracy and reliability in the results.
[0005] In view of this, the first aspect of this application provides a text classification method for power grid maintenance orders, including:
[0006] The set of keywords for each sentence in the historical maintenance order is selected according to a preset ratio, and the historical maintenance order includes sentence categories;
[0007] All the keyword sets are integrated according to the sentence categories to obtain category keyword sets;
[0008] The keyword set of the categories is then subjected to global frequency statistics and sorted in ascending order to obtain a keyword sequence;
[0009] The keyword coefficient sequence is obtained by ranking and assigning values to the first preset number of keywords in the keyword sequence.
[0010] Sentence scores are calculated for the text sentence to be identified based on the keyword coefficient sequence and the keyword sequence, yielding multiple scoring results;
[0011] The sentence category corresponding to the maximum value in the scoring results is selected as the target sentence category of the text to be identified.
[0012] Preferably, before the step of selecting the keyword set for each sentence in the historical maintenance order according to a preset ratio (the historical maintenance order includes sentence categories), the method further includes:
[0013] Each sentence in the historical maintenance order is categorized to obtain the sentence category.
[0014] Preferably, the step of selecting the keyword set for each sentence in the historical maintenance order according to a preset ratio (the historical maintenance order includes sentence categories) includes:
[0015] Each sentence in the historical maintenance order is segmented using artificial intelligence to obtain an initial segmentation set;
[0016] The global frequency of each word in the sentence is counted based on the initial word segmentation set, and then sorted in ascending order to obtain the word segmentation sequence corresponding to each sentence;
[0017] Keywords are selected sequentially from the word segmentation sequence according to a preset ratio to obtain a keyword set.
[0018] Preferably, the step of assigning ranking values to the first preset number of keywords in the keyword sequence to obtain the keyword coefficient sequence includes:
[0019] Obtain the first preset number of keywords in the keyword sequence;
[0020] The reverse sorting index of the keywords is used as a coefficient to be assigned to the first preset number of keywords, and the coefficients of keywords other than the first preset number are assigned to 0, thus obtaining the keyword coefficient sequence.
[0021] A second aspect of this application provides a text classification device for power grid maintenance orders, comprising:
[0022] The keyword selection module is used to select a set of keywords for each sentence in the historical maintenance order according to a preset ratio. The historical maintenance order includes sentence categories.
[0023] The word integration module is used to integrate all the keyword sets according to the sentence categories to obtain category keyword sets;
[0024] The word processing module is used to perform global frequency statistics and ascending sort on the set of category keywords in sequence to obtain a keyword sequence;
[0025] The ranking assignment module is used to assign ranking values to the first preset number of keywords in the keyword sequence to obtain the keyword coefficient sequence;
[0026] The scoring calculation module is used to calculate sentence scores for the text sentences to be identified based on the keyword coefficient sequence and the keyword sequence, and obtain multiple scoring results.
[0027] The text classification module is used to select the sentence category corresponding to the maximum value in the scoring results as the target sentence category of the text statement to be identified.
[0028] Preferably, it further includes:
[0029] The sentence tagging module is used to perform category tagging on each sentence in the historical maintenance order to obtain the sentence category.
[0030] Preferably, the keyword selection module is specifically used for:
[0031] Each sentence in the historical maintenance order is segmented using artificial intelligence to obtain an initial segmentation set;
[0032] The global frequency of each word in the sentence is counted based on the initial word segmentation set, and then sorted in ascending order to obtain the word segmentation sequence corresponding to each sentence;
[0033] Keywords are selected sequentially from the word segmentation sequence according to a preset ratio to obtain a keyword set.
[0034] Preferably, the ranking assignment module is specifically used for:
[0035] Obtain the first preset number of keywords in the keyword sequence;
[0036] The reverse sorting index of the keywords is used as a coefficient to be assigned to the first preset number of keywords, and the coefficients of keywords other than the first preset number are assigned to 0, thus obtaining the keyword coefficient sequence.
[0037] A third aspect of this application provides a text classification device for power grid maintenance orders, the device including a processor and a memory;
[0038] The memory is used to store program code and transmit the program code to the processor;
[0039] The processor is used to execute the text classification method for power grid maintenance orders as described in the first aspect, according to the instructions in the program code.
[0040] A fourth aspect of this application provides a computer-readable storage medium for storing program code for executing the text classification method for power grid maintenance orders described in the first aspect.
[0041] As can be seen from the above technical solutions, the embodiments of this application have the following advantages:
[0042] This application provides a text classification method for power grid maintenance orders, comprising: selecting a set of keywords for each sentence in historical maintenance orders according to a preset ratio, wherein the historical maintenance orders include sentence categories; integrating all keyword sets according to sentence categories to obtain a category keyword set; performing global frequency statistics and ascending sorting on the category keyword set to obtain a keyword sequence; ranking and assigning values to the first preset number of keywords in the keyword sequence to obtain a keyword coefficient sequence; calculating sentence scores for the text statement to be identified based on the keyword coefficient sequence and the keyword sequence to obtain multiple score results; and selecting the sentence category corresponding to the maximum score result as the target sentence category of the text statement to be identified.
[0043] The text classification method for power grid maintenance orders provided in this application only requires a small number of representative historical maintenance orders to complete the initial text preparation work, including keyword integration, proportional filtering, and ascending sorting. Each word is then assigned a value to obtain a corresponding keyword coefficient sequence. For any text sentence to be identified, the obtained coefficients can be used to calculate a score, thereby enabling accurate target category matching based on the score. Selecting a batch of keywords with lower frequencies also avoids the influence of complex inter-class relationships in the sentence, making the classification results more reliable. Therefore, this application solves the technical problem that existing machine learning matching methods require a large amount of training data and cannot handle complex inter-class relationships, resulting in a lack of accuracy and reliability.
Attached Figure Description
[0044] Figure 1 is a flowchart of a text classification method for power grid maintenance orders provided in an embodiment of this application;
[0045] Figure 2 is a schematic structural diagram of a text classification device for power grid maintenance orders provided in an embodiment of this application.
Detailed Implementation
[0046] To enable those skilled in the art to better understand the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present application, and not all embodiments. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort are within the scope of protection of the present application.
[0047] For ease of understanding, please refer to Figure 1. An embodiment of a text classification method for power grid maintenance orders provided in this application includes:
[0048] Step 101: Select the keyword set for each sentence in the historical maintenance order according to the preset ratio. The historical maintenance order includes sentence categories.
[0049] Further, step 101 includes:
[0050] Each sentence in the historical maintenance order is segmented using artificial intelligence to obtain an initial segmentation set;
[0051] The global frequency of each word in a sentence is counted based on the initial word segmentation set, and then sorted in ascending order to obtain the word segmentation sequence corresponding to each sentence;
[0052] Keywords are selected sequentially from the word segmentation sequence according to a preset ratio to obtain a keyword set.
[0053] Historical maintenance orders contain text sentences describing various tasks. Each sentence can be segmented using artificial intelligence based on text rules to obtain an initial word set. Duplicate words can be removed before the global frequency statistics are performed, so that each keyword is counted only once. This deduplication is straightforward to implement and needs no further explanation. The frequency statistics record how often each word appears in the historical maintenance orders. The more often a word appears, the stronger its correlation with the tasks in the historical maintenance orders and with the text sentences. Conversely, the less often it appears, the weaker that correlation. Sorting in ascending order places the keywords with weaker correlation at the beginning, which simplifies the subsequent selection. The preset ratio, denoted K, can be set according to actual conditions and is not elaborated here.
[0054] Suppose the initial word segmentation set obtained after processing contains X words. These words can be denoted $\mathrm{FEN} = \{A_1, \dots, A_X\}$, and each sentence can be denoted $J_y$ ($y = 1, \dots, Y$), where $Y$ is the total number of sentences. Each sentence is checked for the words it contains. This forms one word segmentation sequence per sentence, so the number of word segmentation sequences equals the number of sentences. The words in each word segmentation sequence are arranged in ascending order of frequency. When keywords are selected according to the preset ratio, the leading part of the word segmentation sequence is taken, that is, the lower-frequency words are selected to form the keyword set $A_n$.
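The per-sentence keyword selection of step 101 can be sketched in Python. This is a minimal illustration under stated assumptions, not the applicant's implementation:

- The "artificial intelligence" segmentation is replaced by pre-segmented word lists.
- A word's global frequency is assumed to be the number of historical sentences that contain it, after the per-sentence deduplication described above.
- Ties keep first-seen order.
- The preset ratio K is assumed to select `ceil(K * n)` words (at least one) from a sentence with n distinct words.

The function names and the English toy words are hypothetical.

```python
import math
from collections import Counter


def global_frequencies(sentences: list[list[str]]) -> Counter:
    """Assumed global frequency: the number of historical sentences containing each word."""
    freq: Counter = Counter()
    for words in sentences:
        freq.update(set(words))  # deduplicate so a word counts at most once per sentence
    return freq


def select_keywords(sentences: list[list[str]], ratio: float) -> list[list[str]]:
    """Step 101: keep the lowest-frequency fraction `ratio` of each sentence's words."""
    freq = global_frequencies(sentences)
    keyword_sets = []
    for words in sentences:
        unique = list(dict.fromkeys(words))               # deduplicate, keep first-seen order
        ordered = sorted(unique, key=lambda w: freq[w])   # ascending global frequency (stable)
        k = max(1, math.ceil(ratio * len(ordered))) if ordered else 0
        keyword_sets.append(ordered[:k])                  # leading, lower-frequency part
    return keyword_sets


history = [["replace", "breaker", "at", "substation"],
           ["inspect", "breaker", "at", "line"],
           ["inspect", "transformer", "oil"]]
print(select_keywords(history, 0.5))
# → [['replace', 'substation'], ['line', 'inspect'], ['transformer', 'oil']]
```

Because Python's `sorted` is stable, words with equal frequency keep their order of appearance. A real deployment would substitute a proper Chinese word segmenter for the pre-split lists.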
[0055] Furthermore, before step 101, the method also includes:
[0056] Each sentence in the historical maintenance order is categorized to obtain the sentence category.
[0057] Step 102: Integrate all the keyword sets according to sentence categories to obtain category keyword sets.
[0058] The keyword sets of sentences in the same category are integrated into one combined keyword set. This yields as many category keyword sets as there are sentence categories. These category keyword sets contain different numbers of words, so a subsequent keyword selection step is required.
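Step 102 amounts to grouping the per-sentence keyword sets by their annotated sentence category. In this hypothetical sketch, a category keyword set is assumed to be a union that keeps each keyword once, in first-seen order. The source does not say whether repeated keywords are kept.

```python
def integrate_by_category(keyword_sets: list[list[str]],
                          categories: list[str]) -> dict[str, list[str]]:
    """Step 102: merge per-sentence keyword sets into one keyword list per sentence category."""
    if len(keyword_sets) != len(categories):
        raise ValueError("each keyword set needs exactly one sentence category")
    merged: dict[str, list[str]] = {}
    for keywords, category in zip(keyword_sets, categories):
        bucket = merged.setdefault(category, [])
        for word in keywords:
            if word not in bucket:  # assumption: keep each keyword once per category
                bucket.append(word)
    return merged


sets = [["replace", "substation"], ["line", "inspect"], ["transformer", "oil"]]
print(integrate_by_category(sets, ["H1", "H1", "H2"]))
# → {'H1': ['replace', 'substation', 'line', 'inspect'], 'H2': ['transformer', 'oil']}
```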
[0059] Step 103: Perform global frequency statistics and ascending sort on the category keyword set to obtain the keyword sequence.
[0060] Each category keyword set is treated as a unit. Global frequency statistics are performed on the keywords within it, and the keywords are sorted in ascending order to obtain a keyword sequence. This prepares for the subsequent keyword selection.
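Step 103 can be sketched as a per-category sort. The text does not define exactly over which corpus the "global" frequency is counted. This sketch assumes the same whole-history statistic as step 101: the number of historical sentences containing the word. Keywords absent from that mapping count as 0. The function names are hypothetical.

```python
from collections import Counter


def global_frequencies(sentences: list[list[str]]) -> Counter:
    """Assumed global frequency: number of historical sentences that contain each word."""
    freq: Counter = Counter()
    for words in sentences:
        freq.update(set(words))
    return freq


def keyword_sequences(category_sets: dict[str, list[str]],
                      freq: Counter) -> dict[str, list[str]]:
    """Step 103: sort each category keyword set by ascending global frequency."""
    # sorted() is stable, so equally frequent keywords keep their original order;
    # Counter returns 0 for words it has never seen.
    return {cat: sorted(words, key=lambda w: freq[w]) for cat, words in category_sets.items()}


freq = global_frequencies([["inspect", "breaker"], ["inspect", "line"], ["replace", "breaker"]])
print(keyword_sequences({"H1": ["inspect", "replace", "line"]}, freq))
# → {'H1': ['replace', 'line', 'inspect']}
```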
[0061] Step 104: Assign ranking values to the first preset number of keywords in the keyword sequence to obtain the keyword coefficient sequence.
[0062] Further, step 104 includes:
[0063] Retrieve the first preset number of keywords in the keyword sequence;
[0064] The reverse order of the keywords is used as coefficients to assign values to the first preset number of keywords, and the coefficients of keywords not in the first preset number are set to 0, thus obtaining the keyword coefficient sequence.
[0065] The selection again takes the lower-frequency keywords from the beginning of the sequence. After this processing, each category keyword set is reduced to a preset number of keywords, which unifies the representation of the keyword sets. The preset number can be set as needed and is not limited here.
[0066] Using the frequencies themselves as coefficients would be cumbersome and would increase the subsequent computation. To simplify the calculation, the keywords' ranking positions are assigned to them in reverse order as coefficients. All keywords other than the selected preset number are assigned a coefficient of 0, that is, they do not contribute to the score.
[0067] For example, suppose the preset number is Num = 3. Then the coefficients in the keyword sequence are:
- first-ranked keyword: $\xi_{i1} = 3$
- second-ranked keyword: $\xi_{i2} = 2$
- third-ranked keyword: $\xi_{i3} = 1$
- fourth-ranked keyword: $\xi_{i4} = 0$, and so on for every later keyword.

[0068] Let $K_{ij}$ denote the $j$-th keyword in the keyword sequence, $\xi_{ij}$ the coefficient of that keyword, and $H_i$ the $i$-th sentence category. The keywords are listed in Table 1 and the keyword coefficients in Table 2.
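The reverse-rank assignment of step 104 can be sketched as follows. It reproduces the Num = 3 example: 3, 2, 1, then 0 for every later keyword. Following the layout of Tables 1 and 2, the coefficient list is aligned position by position with the keyword sequence. The function name is hypothetical. The source does not cover a category with fewer than Num keywords, so this sketch simply gives the available keywords Num, Num-1, … (an assumption).

```python
def keyword_coefficients(sequence: list[str], num: int) -> list[int]:
    """Step 104: the first `num` keywords get num, num-1, ..., 1; all others get 0."""
    if num < 0:
        raise ValueError("the preset number must be non-negative")
    # Position j (0-based) in the ascending-frequency sequence gets num - j while j < num.
    return [num - j if j < num else 0 for j in range(len(sequence))]


print(keyword_coefficients(["K1", "K2", "K3", "K4", "K5"], 3))  # → [3, 2, 1, 0, 0]
```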
[0069] Table 1 Keyword List
[0070]
| Sentence category | Keyword sequence |
| --- | --- |
| $H_1$ | $K_{11}$, $K_{12}$, …, $K_{1j}$, … |
| … | … |
| $H_i$ | $K_{i1}$, $K_{i2}$, …, $K_{ij}$ |
[0071] Table 2 Keyword Coefficient List
[0072]
| Sentence category | Keyword coefficient sequence |
| --- | --- |
| $H_1$ | $\xi_{11}$, $\xi_{12}$, …, $\xi_{1j}$, … |
| … | … |
| $H_i$ | $\xi_{i1}$, $\xi_{i2}$, …, $\xi_{ij}$ |
[0073] Step 105: Calculate sentence scores for the text sentence to be identified based on the keyword coefficient sequence and the keyword sequence, and obtain multiple scoring results.
[0074] Each sentence category corresponds to one keyword sequence. For any text sentence W to be identified, check whether each word of each keyword sequence appears in W:
- A keyword that appears is recorded as 1, and its keyword coefficient is retained.
- A keyword that does not appear is recorded as 0, and its coefficient is also recorded as 0.

Specifically:
[0075] $\alpha_{ij} = \begin{cases} 1, & K_{ij} \in W \\ 0, & K_{ij} \notin W \end{cases}$
[0076] where $\alpha_{ij}$ records whether the $j$-th keyword $K_{ij}$ of sentence category $H_i$ is present in the sentence. The final score is then calculated using the following formula:
[0077]
[0078] A score is calculated with the keyword sequence of each sentence category, so the number of scoring results equals the number of sentence categories.
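The score formula in [0077] did not survive extraction. This sketch therefore assumes the most direct reading of [0074] to [0076]: the score of category $H_i$ is $S_i = \sum_j \alpha_{ij}\,\xi_{ij}$. In other words, it sums the coefficients of that category's keywords that occur in W.

It also assumes W is already segmented into words and tests each keyword by membership in W's word set. A substring check on the raw sentence would be an alternative for unsegmented Chinese text.

The function name and toy data are hypothetical.

```python
def score_categories(sentence_words: list[str],
                     sequences: dict[str, list[str]],
                     coefficients: dict[str, list[int]]) -> dict[str, int]:
    """Step 105 (assumed form): S_i = sum_j alpha_ij * xi_ij for every category H_i."""
    present = set(sentence_words)
    scores = {}
    for cat, seq in sequences.items():
        alpha = [1 if k in present else 0 for k in seq]  # presence indicators alpha_ij
        scores[cat] = sum(a * xi for a, xi in zip(alpha, coefficients[cat]))
    return scores


sequences = {"H1": ["replace", "line", "inspect"], "H2": ["transformer", "oil", "inspect"]}
coefficients = {"H1": [3, 2, 1], "H2": [3, 2, 1]}
print(score_categories(["inspect", "transformer"], sequences, coefficients))
# → {'H1': 1, 'H2': 4}
```

Keywords whose coefficient is 0 can match without changing the score. This matches the statement in [0066] that they do not contribute.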
[0079] Step 106: Select the sentence category corresponding to the maximum value in the scoring results as the target sentence category of the text to be identified.
[0080] The sentence category with the highest score is taken as the category of the text sentence to be identified. This gives the matching result, i.e., the target sentence category.
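Step 106 is an argmax over the scoring results. The source does not say how ties are resolved. In this hypothetical sketch, ties go to the category listed first, because Python's built-in `max` returns the first maximal item it encounters.

```python
def target_category(scores: dict[str, int]) -> str:
    """Step 106: return the sentence category with the highest score."""
    if not scores:
        raise ValueError("no scoring results to choose from")
    # max() returns the first maximal key, so ties go to the category listed first.
    return max(scores, key=scores.__getitem__)


print(target_category({"H1": 1, "H2": 4}))  # → H2
```

When every score is 0, no keyword matched at all. A deployment might prefer to flag such sentences for manual review rather than silently return the first category. That is a design choice the source leaves open.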
[0081] To facilitate understanding of this embodiment, the following example of a routine power grid maintenance record is provided:
[0082] Table 3. Examples of Keyword Extraction from Power Grid On-site Maintenance Orders
[0083]
[0084]
[0085] Based on the keywords extracted above, coefficients can be assigned using the method described above to obtain the keyword coefficient sequences. The scores can then be calculated to obtain the best matching result.
[0086] The text classification method for power grid maintenance orders provided in this application requires only a small number of representative historical maintenance orders to complete the initial text preparation work, including keyword integration, proportional filtering, and ascending sorting. Each word is then assigned a value to obtain a corresponding keyword coefficient sequence. For any text sentence to be identified, the obtained coefficients can be used for scoring, thereby enabling accurate target category matching based on the score. Selecting a batch of keywords with lower frequencies also avoids the influence of complex inter-class relationships in sentences, making the classification results more reliable. Therefore, this application can solve the technical problem that existing machine learning matching methods require a large amount of training data and cannot handle complex inter-class relationships, resulting in a lack of accuracy and reliability.
[0087] For ease of understanding, please refer to Figure 2. This application provides an embodiment of a text classification device for power grid maintenance orders, comprising:
[0088] The keyword selection module 201 is used to select a set of keywords for each sentence in the historical maintenance order according to a preset ratio. The historical maintenance order includes sentence categories.
[0089] The word integration module 202 is used to integrate all keyword sets according to sentence categories to obtain category keyword sets;
[0090] The word processing module 203 is used to perform global frequency statistics and ascending sort on the set of category keywords to obtain a keyword sequence;
[0091] The ranking assignment module 204 is used to assign ranking values to the first preset number of keywords in the keyword sequence to obtain the keyword coefficient sequence;
[0092] The scoring calculation module 205 is used to calculate sentence scores based on the keyword coefficient sequence and the keyword sequence of the text to be recognized, and obtain multiple scoring results.
[0093] The text classification module 206 is used to select the sentence category corresponding to the maximum value in the scoring results as the target sentence category of the text to be identified.
[0094] Furthermore, it also includes:
[0095] The sentence annotation module 207 is used to perform category annotation processing on each sentence in the historical maintenance order to obtain the sentence category.
[0096] Furthermore, the keyword selection module 201 is specifically used for:
[0097] Each sentence in the historical maintenance order is segmented using artificial intelligence to obtain an initial segmentation set;
[0098] The global frequency of each word in a sentence is counted based on the initial word segmentation set, and then sorted in ascending order to obtain the word segmentation sequence corresponding to each sentence;
[0099] Keywords are selected sequentially from the word segmentation sequence according to a preset ratio to obtain a keyword set.
[0100] Furthermore, the ranking assignment module 204 is specifically used for:
[0101] Retrieve the first preset number of keywords in the keyword sequence;
[0102] The reverse order of the keywords is used as coefficients to assign values to the first preset number of keywords, and the coefficients of keywords not in the first preset number are set to 0, thus obtaining the keyword coefficient sequence.
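The module structure above can be sketched as a single Python class, with one method per module (201 to 206). The sentence categories from module 207 are passed in as labels. This is a hypothetical end-to-end illustration; the class, method names, and toy data are not from the source. It combines the same assumptions as the earlier step sketches:

- input sentences are pre-segmented;
- global frequency is the number of sentences containing a word;
- the preset ratio K selects `ceil(K * n)` words;
- category keyword sets are unions;
- the score is the sum of the matched coefficients;
- ties go to the first category.

```python
import math
from collections import Counter


class MaintenanceOrderClassifier:
    """Hypothetical sketch of the device: each method stands in for one module (201-206)."""

    def __init__(self, ratio: float, num: int) -> None:
        self.ratio = ratio                             # preset ratio K
        self.num = num                                 # preset number Num
        self.sequences: dict[str, list[str]] = {}      # Table 1: category -> keyword sequence
        self.coefficients: dict[str, list[int]] = {}   # Table 2: category -> coefficients

    def fit(self, sentences: list[list[str]], categories: list[str]) -> "MaintenanceOrderClassifier":
        """Build keyword and coefficient sequences from labelled historical sentences."""
        freq: Counter = Counter()
        for words in sentences:
            freq.update(set(words))                    # assumed global frequency: sentence count
        merged: dict[str, list[str]] = {}
        for words, category in zip(sentences, categories):
            bucket = merged.setdefault(category, [])          # module 202
            for word in self._select_keywords(words, freq):   # module 201
                if word not in bucket:
                    bucket.append(word)
        for category, words in merged.items():
            seq = sorted(words, key=lambda w: freq[w])        # module 203
            self.sequences[category] = seq
            self.coefficients[category] = [                   # module 204
                self.num - j if j < self.num else 0 for j in range(len(seq))
            ]
        return self

    def _select_keywords(self, words: list[str], freq: Counter) -> list[str]:
        ordered = sorted(dict.fromkeys(words), key=lambda w: freq[w])
        k = max(1, math.ceil(self.ratio * len(ordered))) if ordered else 0
        return ordered[:k]

    def score(self, words: list[str]) -> dict[str, int]:              # module 205
        present = set(words)
        return {
            category: sum(xi for k, xi in zip(seq, self.coefficients[category]) if k in present)
            for category, seq in self.sequences.items()
        }

    def classify(self, words: list[str]) -> str:                      # module 206
        scores = self.score(words)
        return max(scores, key=scores.__getitem__)


history = [["replace", "breaker", "at", "substation"],
           ["inspect", "breaker", "at", "line"],
           ["inspect", "transformer", "oil"]]
clf = MaintenanceOrderClassifier(ratio=0.5, num=3).fit(history, ["H1", "H1", "H2"])
print(clf.classify(["check", "oil", "level"]))  # → H2
```

As [0107] notes, the module boundaries are a logical division. Here they collapse into methods of one object, but they could equally be separate components.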
[0103] This application also provides a text classification device for power grid maintenance orders, the device including a processor and a memory;
[0104] The memory is used to store program code and transfer the program code to the processor;
[0105] The processor is used to execute the text classification method for power grid maintenance orders in the above method embodiments according to the instructions in the program code.
[0106] This application also provides a computer-readable storage medium for storing program code for executing the text classification method for power grid maintenance orders in the above method embodiments.
[0107] In the several embodiments provided in this application, it should be understood that the disclosed apparatus and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of units is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the coupling or direct coupling or communication connection shown or discussed may be through some interfaces; the indirect coupling or communication connection between apparatuses or units may be electrical, mechanical, or other forms.
[0108] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.
[0109] Furthermore, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.
[0110] If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions for executing all or part of the steps of the methods described in the various embodiments of this application through a computer device (which may be a personal computer, server, or network device, etc.). The aforementioned storage medium includes: USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, optical disks, and other media capable of storing program code.
[0111] The above-described embodiments are only used to illustrate the technical solutions of this application, and are not intended to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of this application.
Claims
1. A text classification method for power grid maintenance orders, characterized in that it comprises: The keyword set for each sentence in the historical maintenance order is selected according to a preset ratio. The historical maintenance order includes sentence categories. The specific process is as follows: Each sentence in the historical maintenance order is segmented using artificial intelligence to obtain an initial segmentation set; The global frequency of each word in the sentence is counted based on the initial word segmentation set, and then sorted in ascending order to obtain the word segmentation sequence corresponding to each sentence; Keywords are selected sequentially from the word segmentation sequence according to a preset ratio to obtain a keyword set; All the keyword sets are integrated according to the sentence categories to obtain category keyword sets; The category keyword sets are then subjected to global frequency statistics and sorted in ascending order to obtain a keyword sequence; The keyword coefficient sequence is obtained by assigning ranking values to the first preset number of keywords in the keyword sequence. The specific process is as follows: Obtain the first preset number of keywords in the keyword sequence; The reverse sorting index of the keywords is used as a coefficient to be assigned to the first preset number of keywords, and the coefficients of keywords other than the first preset number are assigned to 0, thus obtaining the keyword coefficient sequence; Sentence scores are calculated for the text sentence to be identified based on the keyword coefficient sequence and the keyword sequence, resulting in multiple scoring results; The sentence category corresponding to the maximum value in the scoring results is selected as the target sentence category of the text to be identified.
2. The text classification method for power grid maintenance orders according to claim 1, characterized in that, before the step of selecting the keyword set for each sentence in the historical maintenance order according to a preset ratio (the historical maintenance order includes sentence categories), the method further comprises: categorizing each sentence in the historical maintenance order to obtain the sentence category.
3. A text classification device for power grid maintenance orders, characterized in that it comprises: The keyword selection module is used to select a set of keywords for each sentence in historical maintenance orders according to a preset ratio. The historical maintenance orders include sentence categories. Specifically, the keyword selection module is used for: Each sentence in the historical maintenance order is segmented using artificial intelligence to obtain an initial segmentation set; The global frequency of each word in the sentence is counted based on the initial word segmentation set, and then sorted in ascending order to obtain the word segmentation sequence corresponding to each sentence; Keywords are selected sequentially from the word segmentation sequence according to a preset ratio to obtain a keyword set; The word integration module is used to integrate all the keyword sets according to the sentence categories to obtain category keyword sets; The word processing module is used to perform global frequency statistics and ascending sort on the set of category keywords in sequence to obtain a keyword sequence; The ranking assignment module is used to assign ranking values to the first preset number of keywords in the keyword sequence to obtain a keyword coefficient sequence. Specifically, the ranking assignment module is used for: Obtain the first preset number of keywords in the keyword sequence; The reverse sorting index of the keywords is used as a coefficient to be assigned to the first preset number of keywords, and the coefficients of keywords other than the first preset number are assigned to 0, thus obtaining the keyword coefficient sequence; The scoring calculation module is used to calculate sentence scores for the text sentences to be identified based on the keyword coefficient sequence and the keyword sequence, and obtain multiple scoring results; The text classification module is used to select the sentence category corresponding to the maximum value in the scoring results as the target sentence category of the text sentence to be identified.
4. The text classification device for power grid maintenance orders according to claim 3, characterized in that it further comprises: a sentence tagging module, which is used to perform category tagging on each sentence in the historical maintenance order to obtain the sentence category.
5. A text classification device for power grid maintenance orders, characterized in that the device includes a processor and a memory; the memory is used to store program code and transmit the program code to the processor; the processor is used to execute the text classification method for power grid maintenance orders according to any one of claims 1-2 based on the instructions in the program code.
6. A computer-readable storage medium, characterized in that the computer-readable storage medium is used to store program code for executing the text classification method for power grid maintenance orders according to any one of claims 1-2.