Method for identifying flood-damaged vehicles based on vehicle repair data

By generating a corpus and performing data preprocessing and deep analysis, the problems of data collection and feature capture difficulties in vehicle evaluation were solved, enabling accurate identification of water immersion accidents and improving the reliability and privacy protection of the evaluation.

CN118485071B · Active · Publication Date: 2026-06-26 · BEIJING KUCHE YIMEI NETWORK TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
BEIJING KUCHE YIMEI NETWORK TECH CO LTD
Filing Date
2024-05-21
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

Existing vehicle assessment technologies cannot accurately distinguish between the different degrees of water wading and water immersion, data collection is difficult, it is hard to capture subtle features, the applicability of learning algorithms is limited, there is a lack of domain knowledge, and privacy and ethical issues have not been fully considered.

Method used

A corpus is generated through data preprocessing and feature engineering. Sensitive information is removed using regular expressions. Word segmentation, sentence splitting, phrase combination, and deep analysis are then performed to calculate the water immersion accident score. Combining the result with the deep analysis phrase set improves the accuracy of the analysis, and removing sensitive fields avoids privacy issues.

Benefits of technology

It improves the accuracy of identifying water immersion accidents, can capture subtle features, handle complex correlations and nonlinear relationships, avoids privacy issues, and provides more reliable vehicle assessments.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN118485071B_ABST

Abstract

The application provides a method for identifying flood-damaged vehicles based on vehicle maintenance data, comprising the following steps: data collection and analysis preprocessing: collecting a large-scale data set comprising the vehicle maintenance data and vehicle accident data, and performing quality control on the large-scale data set, including data cleaning and outlier removal; data analysis: processing the description text in the large-scale data set to obtain the corresponding risk types and levels. The application can capture subtle features, so the final identification accuracy is higher; it can handle the complex correlations and nonlinear relationships in vehicle data; and it improves the accuracy of the constraint analysis method and can find potential flooding accidents.

Description

Technical Field

[0001] This invention belongs to the field of vehicle detection and identification technology, and specifically relates to a method for identifying flood-damaged vehicles based on vehicle maintenance data.

Background Technology

[0002] With the continuous growth in the number of passenger vehicles and the booming used car market, used car transactions have become an indispensable part of modern society. However, behind this market prosperity lies a problem that cannot be ignored: the lack of transparency in vehicle information during used car transactions. Especially in cases involving vehicles that have been flooded or submerged, buyers often find it difficult to obtain accurate information about the vehicle's condition, thus facing significant financial losses and potential safety hazards.

[0003] In recent years, water-related accidents have occurred frequently, severely impacting not only the structure and performance of vehicles but also posing a serious threat to the lives of owners and passengers. Due to the impact and immersion of water, vehicles involved in flooding may suffer severe damage to their internal parts, electrical systems, and body structure. This damage is not only difficult to detect externally but can also lead to a series of problems during future use, such as electrical malfunctions and mechanical component failures.

[0004] In the used car market, the lack of transparency in vehicle information often makes it difficult for buyers to understand the true condition of a vehicle. Many buyers, lacking professional automotive knowledge and experience, struggle to accurately interpret repair and accident data, making it easy to overlook hidden accident details. They are also susceptible to being deceived by unscrupulous dealers, purchasing used cars with serious hidden problems. Among these, flood-damaged vehicles are the most common. Flood-damaged vehicles refer to vehicles that have been severely submerged in water. These vehicles often suffer serious damage to electrical systems and mechanical components, and may even have structural deformation.

[0005] However, most existing vehicle assessment technologies remain at the superficial analysis stage, unable to accurately distinguish between the different degrees of water immersion and water damage. Furthermore, vehicle repair records are mostly entered by salespeople, potentially providing only surface information, such as which parts were replaced or what repairs were performed, but lacking details about the cause, extent, and potential impact of the accident. Similarly, accident data may only record the simple circumstances of the accident without providing sufficient details, making it difficult for buyers to fully understand the vehicle's true condition. Therefore, due to inconsistent data quality, many reports cannot simply determine the fact of a vehicle's accident from its surface appearance. Moreover, the potential lack of crucial information may prevent buyers from accurately determining whether a vehicle has been involved in a water damage incident, or assessing the impact of such incidents on vehicle performance and safety. This exposes buyers to significant risks when purchasing a car, potentially leading to the purchase of a used vehicle with serious hidden defects, and even financial losses and personal injury. In addition, there are currently no well-known and widely used procedures or methods in China for analyzing vehicle water damage incidents based on vehicle repair and accident data.

[0006] Existing vehicle evaluation technologies mainly suffer from the following problems:

[0007] 1. Difficulties in data collection: Obtaining sufficient quantity and quality of vehicle maintenance and accident data presents certain challenges. This is especially true when data from different insurance companies, repair service providers, and geographical locations is involved, as the consistency and completeness of the data can be affected.

[0008] 2. Difficulty in capturing subtle features: Although existing technologies can extract features from vehicle maintenance and accident data, some subtle but potentially important features may be difficult to capture using current data extraction methods. This may limit the accuracy and generalization ability of the analysis program for flooding accidents.

[0009] 3. Applicability of Learning Algorithms: Existing learning algorithms face certain limitations when processing vehicle maintenance and accident data. Some algorithms may make overly strict assumptions about data distribution and cannot effectively handle the complex correlations and nonlinear relationships in vehicle data.

[0010] 4. Lack of comprehensive domain knowledge: Although big data analysis methods can automatically extract patterns and correlations from data, the lack of domain experts' knowledge and experience will limit the accuracy and interpretability of the analysis process when analyzing vehicle flooding accidents.

[0011] 5. Privacy and Ethical Issues: Analyzing vehicle maintenance and accident data may involve the processing of personal privacy and sensitive information. Existing assessment technologies have failed to carefully consider data privacy and ethical issues, making it difficult to comply with relevant privacy regulations and provide necessary data protection.

Summary of the Invention

[0012] To address the above problems, this invention provides a method for identifying flood-damaged vehicles based on vehicle maintenance data.

[0013] The method for identifying flood-damaged vehicles based on vehicle maintenance data provided by this invention includes the following steps:

[0014] Data collection and analysis preprocessing: Collect a dataset including the vehicle maintenance data and vehicle accident data, and perform quality control on the dataset, including data cleaning and outlier removal.

[0015] Data analysis: The descriptive text (i.e., the input text) in the dataset produced by the data collection and analysis preprocessing step is processed to obtain the water immersion accident score.

[0016] Further,

[0017] The data cleaning includes:

[0018] The license plate number is removed from the collected data using a regular expression.

[0019] The ID card number is removed from the collected data using a regular expression.

[0020] The phone numbers are removed from the collected data using a regular expression.

[0021] The part codes are removed from the collected data using regular expressions.

[0022] In this process, text matching the regular expressions in the collected data is obtained by matching the regular expressions. The matched text is then removed from the collected data by replacing the matched text with an empty character.
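
The cleaning steps above can be sketched as follows. This is a minimal illustration only: the source states that regular expressions are used and that matches are replaced with an empty string, but it does not give the patterns. The four patterns below, the `PATTERNS` table, and the `scrub` function are assumptions chosen to resemble common mainland-China formats.

```python
import re

# Hypothetical patterns: the source does not disclose the actual expressions.
PATTERNS = {
    # One Chinese character, one Latin letter, then 5-6 letters or digits.
    "license_plate": re.compile(r"[\u4e00-\u9fa5][A-Z][A-Z0-9]{5,6}"),
    # 18-character resident ID: 17 digits plus a digit or X.
    "id_card": re.compile(r"(?<!\d)\d{17}[\dXx](?!\d)"),
    # 11-digit mobile number starting with 1.
    "phone": re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)"),
    # Hyphenated alphanumeric part code such as 8K0-941-003.
    "part_code": re.compile(
        r"(?<![A-Z0-9])[A-Z0-9]{2,}(?:-[A-Z0-9]{2,})+(?![A-Z0-9])"
    ),
}


def scrub(text: str) -> str:
    """Remove every match of every pattern by replacing it with ''."""
    for pattern in PATTERNS.values():
        text = pattern.sub("", text)
    return text


if __name__ == "__main__":
    print(scrub("车牌京A12345，电话13812345678，零件8K0-941-003进水"))
```

The ID pattern is applied before the phone pattern so that an 11-digit run inside an 18-digit ID is never treated as a phone number; the digit lookarounds guard against the same overlap.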

[0023] Further,

[0024] The removal of outliers includes:

[0025] By matching text, the province and city names and 4S store names are removed from the data;

[0026] The descriptions of 4S store activities in the data were removed by using text regular expressions and similarity matching.

[0027] By using text matching, irrelevant vehicle condition information is removed from the collected data.

[0028] Further,

[0029] The data analysis includes the following steps:

[0030] a. Perform word segmentation on the input text obtained through the preprocessing of the data collection and analysis to obtain a text list aList;

[0031] b. Segment and group the text list aList to obtain a complete semantic sentence list bList composed of complete semantic sentences;

[0032] c. Group the elements in the complete semantic sentence list bList into word groups to obtain the sentence list cList;

[0033] d. Based on the part-of-speech analysis of each element of the sentence list cList generated in step c, the words in the sub-elements of each element are combined into phrases to generate a phrase binding relationship list nandVGroupList;

[0034] e. Assign category and weight values to each element in the phrase binding relationship list nandVGroupList to obtain the final phrase binding relationship list NANDVGroupList;

[0035] f. Based on the final phrase binding relationship list NANDVGroupList, calculate the water immersion accident score for the input text.

[0036] Further,

[0037] Step a includes:

[0038] a1. Search for Chinese-formatted symbols in the input text;

[0039] a2. Replace the searched Chinese symbols with the corresponding English symbols;

[0040] a3. If the input text contains spaces, then remove the spaces from the input text;

[0041] a4. Retrieve the keywords related to water immersion accidents from the corpus and match them against the input text from left to right. When more than one keyword could match at the same position, the keyword containing more characters takes priority. This process segments the input text into keywords and English symbols.

[0042] The text list aList is a two-dimensional list. Each row of the text list aList stores the word segmentation result of the input text. Each row of the text list aList includes more than one list unit. The list unit includes a sequence number unit and a word segmentation unit. The sequence number unit stores the sequence number of the input text. Each word segmentation unit stores a text word, i.e., the keyword or an English symbol. The word segmentation unit and the text word or English symbol have a one-to-one correspondence. The length of each row of the text list aList, i.e., the number of list units in each row, is determined by how many text words and English symbols are stored in the row. The keyword and English symbol obtained from each input text word segmentation are sorted in the corresponding row of the text list aList according to their order before word segmentation.
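
Steps a1 to a4 can be sketched as a left-to-right, longest-keyword-first scan. The sketch below is illustrative: the symbol mapping `ZH_TO_EN`, the symbol set `SYMBOLS`, the function name `segment`, and the sample keywords are all assumptions, and it returns a single row of aList without the sequence number unit. Text that is neither a keyword nor a symbol is skipped, which is one reading of "segments the input text into keywords and English symbols".

```python
# Hypothetical mapping from Chinese-format symbols to English symbols.
ZH_TO_EN = {"，": ",", "。": ".", "；": ";", "：": ":",
            "？": "?", "！": "!", "（": "(", "）": ")"}
SYMBOLS = set(',.;:?!()+-"')


def segment(text, keywords):
    """Return the keywords and English symbols found in text, in order."""
    # a1 + a2: replace Chinese-format symbols with English ones.
    text = "".join(ZH_TO_EN.get(ch, ch) for ch in text)
    # a3: remove spaces.
    text = text.replace(" ", "")
    # a4: try longer keywords first at every position.
    by_length = sorted(keywords, key=len, reverse=True)
    tokens, i = [], 0
    while i < len(text):
        if text[i] in SYMBOLS:
            tokens.append(text[i])
            i += 1
            continue
        match = next((k for k in by_length if text.startswith(k, i)), None)
        if match:
            tokens.append(match)
            i += len(match)
        else:
            i += 1  # assumption: non-keyword text is dropped
    return tokens


if __name__ == "__main__":
    # 更换 = replace, 安全气囊 = airbag, 拆装 = remove/refit, 座椅 = seat
    print(segment("更换 安全气囊，拆装座椅。",
                  ["更换", "气囊", "安全气囊", "拆装", "座椅"]))
```

Because "安全气囊" is tried before "气囊", the longer keyword wins at that position, as step a4 requires.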

[0043] Further,

[0044] Step b includes:

[0045] The text in each line of the text list aList is semantically segmented based on punctuation marks, i.e., sentence breaks: periods, question marks, and exclamation marks divide the text before and after them into two semantically complete sentences; that is, periods, question marks, and exclamation marks are sentence break markers or semantic end markers. For other marks, it is assumed that the text before and after them is semantically related and exists within a semantically complete sentence.

[0046] The complete semantic sentence list bList is a two-dimensional table, where one dimension represents the elements of the complete semantic sentence list bList, and the other dimension represents the child elements of the elements of the complete semantic sentence list bList.

[0047] After sentence segmentation, the words and symbols in each complete semantic sentence are stored one-to-one in an element of the complete semantic sentence list bList. Each element includes at least one sub-element. Each sub-element is a data unit composed of the words in the complete semantic sentence corresponding to that element, separated by the other symbols. If a complete semantic sentence has no other symbols, all words in the sentence are stored in a single sub-element. If a complete semantic sentence has n other symbols (n is an integer not less than 1), when the complete semantic sentence ends with one other symbol, the entire sentence is sequentially divided into n segments by the n other symbols, with the words in each segment stored in a single sub-element. When a complete semantic sentence ends without any other symbols, the entire sentence is sequentially divided into n+1 segments by the n other symbols, with the words in each segment stored in a single sub-element. Each other symbol is assigned to the sub-element containing the word immediately preceding it.
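
The sentence-splitting rule of step b can be sketched as follows. The function name `to_sentences` and the two symbol sets are assumptions; the input is one row of aList as a flat token list. A symbol with no preceding word in its sentence is dropped here, which the source does not address.

```python
SENTENCE_END = {".", "?", "!"}
OTHER_SYMBOLS = {",", ";", ":", "(", ")", "+", "-", '"'}


def to_sentences(tokens):
    """Build bList: a list of sentences, each a list of sub-elements."""
    b_list, sentence, sub = [], [], []
    for tok in tokens:
        if tok in SENTENCE_END:
            # Close the current sub-element and the current sentence.
            if sub:
                sentence.append(sub)
            if sentence:
                b_list.append(sentence)
            sentence, sub = [], []
        elif tok in OTHER_SYMBOLS:
            # The symbol joins the sub-element holding the preceding word.
            if sub:
                sub.append(tok)
                sentence.append(sub)
                sub = []
            elif sentence:
                sentence[-1].append(tok)
        else:
            sub.append(tok)
    # Flush text that was not terminated by a sentence-end marker.
    if sub:
        sentence.append(sub)
    if sentence:
        b_list.append(sentence)
    return b_list


if __name__ == "__main__":
    print(to_sentences(["更换", "安全气囊", ",", "拆装", "座椅", ".",
                        "清洗", "地毯"]))
```

In the example, the first sentence contains one other symbol and does not end with it, so it is divided into n+1 = 2 sub-elements.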

[0048] Further,

[0049] The sentence list cList is a two-dimensional table, where one dimension represents the elements of the sentence list cList, and the other dimension represents the child elements of the elements in the sentence list cList.

[0050] Step c includes:

[0051] For element A in the complete semantic sentence list bList, if its child elements do not contain the other symbols, then each word in the child element is stored one-to-one in the corresponding child element B in the sentence list cList. In this case, element A in the complete semantic sentence list bList has a one-to-one correspondence with element B in the sentence list cList. If the child element in element A ends with the other symbols, then three cases are handled:

[0052] The first case is when the other symbols are semicolons, that is, when the ending symbol of a certain sub-element of element A is a semicolon, the words before the semicolon in the semantically complete sentence containing the certain sub-element are separated from element A and stored in an element BB of the sentence list cList. The words are stored one-to-one in the sub-elements of element BB. Then, the remaining element A after separation is treated as a separate element and processed according to the three cases until all the words in element A have been processed.

[0053] The second case is when the other symbols are colons. Let next be the child element of the element in the current complete semantic sentence list bList, that is, the next child element after the current child element. When the ending symbol of the current child element is a colon, if the child element next contains both a verb and a noun, and the word in the current child element is already contained in the child element next, then the current child element is removed, that is, the current child element is not stored in the sentence list cList. Otherwise, the colon is regarded as the third of the three cases.

[0054] The third case involves other symbols that are sentence-separating symbols: commas, left parentheses, right parentheses, plus signs, minus signs, quotation marks, and colons (excluding those in the second case). After processing the first and second cases, the child elements (i.e., the second current child elements) of the remaining elements in the complete semantic sentence list bList are processed sequentially as follows:

[0055] When the second current child element ends with the sentence break symbol, if the next child element next1 after the second current child element contains both a verb and a noun, and the second current child element contains either a verb or a noun, or only a verb or a noun, then each word in the second current child element and the next child element next1 is stored separately in different child elements of the corresponding element in the sentence list cList, and then the next child element next1 is skipped and processing continues; otherwise, if the second current child element stores words, then each word contained in the second current child element is stored separately in different child elements of the corresponding element in the sentence list cList, and then the next child element next1 is processed.

[0056] If the sub-elements of the currently processed element A do not end with a comma, but the sub-elements of element A contain words, then each word in the sub-elements of element A is stored separately in a different sub-element of the corresponding element in the sentence list cList.

[0057] Further,

[0058] Step d includes:

[0059] When performing the phrase combination, each element of the sentence list cList is processed sequentially. For any element C, the part-of-speech tag of each word stored in each child element of element C is checked, and processing is divided into two cases according to that tag. Before processing element C, an initially empty temporary array tempNandList is created to record the temporary word combinations generated by combining the words stored in the child elements of element C.

[0060] The first case is when the word stored in the current child element of element C is a noun. In this case, if there is a verb in the remaining unprocessed child elements of element C, let the current child element be termN1. Combine the word stored in the current child element termN1 with the verbs stored in the remaining unprocessed child elements of element C one by one. Record the number of swaps between the current child element and its adjacent child elements until it is adjacent to the combined child element as the distance of the combination. Then exclude word combinations with a distance distance1>2. Then compare the word combination with the temporary word combinations in the temporary array tempNandList one by one. If the word combination does not exist in the temporary array tempNandList, add it to the temporary array tempNandList.

[0061] If the remaining unprocessed child elements of element C do not store verbs, check if there exists a word combination in the temporary array tempNandList that contains a noun that is the same as the word stored in the current child element termN1. If it exists, no further processing is performed; otherwise, step c is repeated.

[0062] The second scenario is when the word stored in the current child element of element C is a verb. In this case, if there is a noun among the remaining unprocessed child elements of element C, the current child element is denoted as termV1. The word stored in the current child element termV1 is combined with the nouns stored in the remaining unprocessed child elements of element C one by one, and the distance of each combination is denoted as distance3. After excluding word combinations with a distance3>2, each of the resulting word combinations is compared with the temporary word combinations in the temporary array tempNandList. If the resulting word combination does not exist in tempNandList, it is added to the temporary array tempNandList.

[0063] If the remaining unprocessed child elements of element C do not store nouns, check if there is a verb combination in the temporary array tempNandList that is a word combination of the words stored in the current child element termV1. If it exists, no processing is performed; otherwise, step c is repeated.

[0064] When performing the phrase combination, after checking an element C, if a phrase combination exists in the temporary array tempNandList, then each phrase combination in the temporary array tempNandList is added as an element to the phrase binding relationship list nandVGroupList, and then the temporary array tempNandList is set to empty.

[0065] The phrase binding relationship list nandVGroupList is a two-dimensional table. One dimension represents element D of the phrase binding relationship list nandVGroupList, and the other dimension represents the four child elements of element D. Two of the four child elements, termV and termN, are the verb and noun in the phrase combination that constitutes element D, respectively. The other two child elements, class and weight, are the category value and weight value corresponding to the phrase combination that constitutes element D, respectively. One of these two values is set to null, and the other can be set to 0 or null.
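
The pairing rule of step d can be sketched as follows for a single element C. The sketch makes several assumptions: the function name `pair_terms` and the tags "n" and "v" are hypothetical, the part-of-speech tags are supplied by the caller, and the "distance" is read as the number of words lying between the two combined words (the number of adjacent swaps needed to bring them together). The fallback branches that apply when no verb or noun remains are not modeled.

```python
def pair_terms(words, pos_tags, max_distance=2):
    """Pair each noun or verb with later words of the opposite kind.

    Returns rows shaped like nandVGroupList elements, with class and
    weight still unset.
    """
    pairs = []  # plays the role of tempNandList
    for i, (word, tag) in enumerate(zip(words, pos_tags)):
        if tag not in ("n", "v"):
            continue
        wanted = "v" if tag == "n" else "n"
        # Only the remaining (later) child elements are considered.
        for j in range(i + 1, len(words)):
            if pos_tags[j] != wanted:
                continue
            distance = j - i - 1  # assumed reading of "number of swaps"
            if distance > max_distance:
                continue
            verb, noun = (words[j], word) if tag == "n" else (word, words[j])
            row = {"termV": verb, "termN": noun,
                   "class": None, "weight": None}
            if row not in pairs:  # skip combinations already recorded
                pairs.append(row)
    return pairs


if __name__ == "__main__":
    for row in pair_terms(["更换", "安全气囊", "座椅"], ["v", "n", "n"]):
        print(row)
```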

[0066] Further,

[0067] Step e includes:

[0068] The phrase binding relationship list nandVGroupList is traversed. For any element E, keyword groups that have the same verb and noun as the current element E are matched from the corpus. The category value and weight value of the matched keyword groups are assigned to element E, and element E is retained in the phrase binding relationship list nandVGroupList. If no matching keyword group is found, element E is deleted from the phrase binding relationship list nandVGroupList. Then, the phrase binding relationship list nandVGroupList is filtered to remove elements E whose category value or weight value is null, resulting in the final phrase binding relationship list NANDVGroupList.

[0069] Wherein,

[0070] The keyword group in the corpus includes one noun, one verb, a class value, and a weight value. The class values are A1, B1, C1, D1, and E1, which represent five accident levels with progressively increasing severity. The weight value is an integer from 0 to 10, indicating the severity of the water immersion accident within the same category. The verb and noun in the same keyword group, when matched, represent an operation in car repair or maintenance.
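
Step e amounts to a lookup of each verb-noun pair in the corpus. The sketch below assumes the corpus is held as a dictionary keyed by `(verb, noun)` with `(class, weight)` values; this layout, the function name `assign_levels`, and the sample entries are illustrative and not taken from the source.

```python
def assign_levels(pairs, corpus):
    """Return the final list: pairs found in the corpus, with class/weight."""
    final = []
    for row in pairs:
        hit = corpus.get((row["termV"], row["termN"]))
        if hit is None:
            continue  # no matching keyword group: the element is deleted
        cls, weight = hit
        if cls is None or weight is None:
            continue  # elements with a null class or weight are filtered out
        final.append({**row, "class": cls, "weight": weight})
    return final


if __name__ == "__main__":
    # Hypothetical corpus entries, for illustration only.
    corpus = {("更换", "安全气囊"): ("D1", 6), ("清洗", "地毯"): ("B1", 2)}
    pairs = [
        {"termV": "更换", "termN": "安全气囊", "class": None, "weight": None},
        {"termV": "更换", "termN": "轮胎", "class": None, "weight": None},
    ]
    print(assign_levels(pairs, corpus))
```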

[0071] Further,

[0072] Step f includes:

[0073] Based on the category value (class) and weight value of each element F in the final phrase binding relationship list NANDVGroupList, the system analyzes whether a vehicle has experienced a water immersion event and calculates the water immersion score (waterScore) for element F. Each category value of element F in NANDVGroupList represents a score. As mentioned earlier, the category value (class) can be A1, B1, C1, D1, or E1. During calculation, each category value (class) corresponds to a number: A1 corresponds to 0, B1 to 0.1, C1 to 0.2, D1 to 0.5, and E1 to 1. The weight value (weight) is an integer from 0 to 10, representing the severity of the water immersion accident within the same category. When calculating the water immersion score (waterScore), the waterScore satisfies the following conditions:

[0074] $\mathrm{waterScore} = 5.5 - 3 \times \mathrm{class}^{2} - 0.25 \times \mathrm{weight}$.
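
As a worked illustration of the formula, the sketch below maps each category to its number and evaluates the score. It assumes that the "2" following "class" in the formula is an exponent, so that the class term is squared; the names `CLASS_VALUE` and `water_score` are hypothetical.

```python
# Category-to-number mapping as stated in the text.
CLASS_VALUE = {"A1": 0.0, "B1": 0.1, "C1": 0.2, "D1": 0.5, "E1": 1.0}


def water_score(cls: str, weight: int) -> float:
    """waterScore = 5.5 - 3 * class^2 - 0.25 * weight (assumed reading)."""
    if not 0 <= weight <= 10:
        raise ValueError("weight must be an integer from 0 to 10")
    c = CLASS_VALUE[cls]
    return 5.5 - 3 * c ** 2 - 0.25 * weight


if __name__ == "__main__":
    for cls, weight in [("A1", 0), ("D1", 4), ("E1", 10)]:
        print(cls, weight, water_score(cls, weight))
```

Under this reading the score falls from 5.5 (category A1, weight 0) to 0 (category E1, weight 10), so a lower score indicates a more severe water immersion event.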

[0075] Further,

[0076] The data analysis also includes step g: in-depth analysis.

[0077] Traverse the deep analysis phrase set, which includes at least one deep analysis phrase G. Determine whether the set contains a deep analysis phrase G all of whose phrase elements are present in the final phrase binding relationship list NANDVGroupList. If such a phrase exists, then for each element F in NANDVGroupList that coincides with one of its phrase elements (an overlapping element), calculate a new score, deepWaterScore, from the category value class1 and the weight value weight1 of the deep analysis phrase G. Then compare the waterScore of the overlapping element with the deepWaterScore, and take the lower of the two as the final water immersion accident score (finalScore) of the overlapping element. If no such phrase exists, use the water immersion score (waterScore) calculated in step f as the final water immersion accident score (finalScore) of element F in the final phrase binding relationship list NANDVGroupList.

[0078] Wherein,

[0079] The deep analysis phrase G includes three data items: multiple phrase elements H, the category value class1, and the weight value weight1. Each phrase element H includes two phrase sub-elements: a verb sub-element termV2 and a noun sub-element termN2. A deep analysis phrase G is considered present in the NANDVGroupList when each of its phrase elements H corresponds one-to-one with a different element F in the NANDVGroupList, with the verb sub-element termV2 and the noun sub-element termN2 of phrase element H equal to the sub-elements termV and termN of that element F, respectively.

[0080] The category values class1 of the deep analysis phrase are A2, B2, C2, D2, and E2, representing the severity of the accident from low to high. During calculation, each category value (class1) corresponds to a number: A2 corresponds to 0, B2 to 1, C2 to 2, D2 to 3, and E2 to 4. The weight value (weight1) is an integer from 0 to 10, representing the severity of the water immersion accident within the same category.

[0081] The depth score deepWaterScore satisfies:

[0082] $\mathrm{deepWaterScore} = 5 - 0.28 \times \mathrm{class1}^{2} - 0.108 \times \mathrm{weight1}$.
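
Step g can be sketched as follows. As with waterScore, the "2" after "class1" is assumed to be an exponent. The data layout is also an assumption: step f results are held in a dictionary keyed by `(verb, noun)`, and each deep analysis phrase is a tuple of its phrase elements, class1, and weight1. The function names and sample values are hypothetical.

```python
# Category-to-number mapping as stated in the text.
CLASS1_VALUE = {"A2": 0, "B2": 1, "C2": 2, "D2": 3, "E2": 4}


def deep_water_score(cls1: str, weight1: int) -> float:
    """deepWaterScore = 5 - 0.28 * class1^2 - 0.108 * weight1 (assumed)."""
    return 5 - 0.28 * CLASS1_VALUE[cls1] ** 2 - 0.108 * weight1


def final_scores(scored, deep_phrases):
    """Lower overlapping elements to min(waterScore, deepWaterScore).

    scored: {(verb, noun): waterScore} from step f.
    deep_phrases: iterable of (phrase_elements, class1, weight1).
    """
    final = dict(scored)  # default: finalScore equals waterScore
    for elements, cls1, weight1 in deep_phrases:
        # The phrase applies only if every phrase element is present.
        if all(e in scored for e in elements):
            deep = deep_water_score(cls1, weight1)
            for e in elements:
                final[e] = min(final[e], deep)
    return final


if __name__ == "__main__":
    scored = {("更换", "安全气囊"): 4.0, ("拆装", "座椅"): 5.0,
              ("清洗", "地毯"): 4.5}
    phrases = [([("更换", "安全气囊"), ("拆装", "座椅")], "C2", 5)]
    print(final_scores(scored, phrases))
```

The design point is that a combination of repair operations that is individually unremarkable can still lower the score of each operation involved, which is how the deep analysis step surfaces potential flooding accidents.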

[0083] The method for identifying flood-damaged vehicles based on vehicle maintenance data provided by this invention overcomes the difficulties of data collection by visualizing the data and continuously updating the corpus, and corrects the analysis results for flooding accidents; the corpus generated by feature engineering analysis can capture subtle features, resulting in higher accuracy in the final identification; it can effectively handle complex correlations and nonlinear relationships in vehicle data; it improves the accuracy of constraint analysis methods, and can discover potential flooding accidents through in-depth analysis; and it avoids exposing the personal privacy and sensitive information contained in vehicle maintenance data and vehicle accident data.

[0084] Other features and advantages of the invention will be set forth in the description which follows, and will be apparent in part from the description, or may be learned by practicing the invention. The objects and other advantages of the invention may be realized and obtained by means of the structures pointed out in the description, claims and drawings.

Attached Figure Description

[0085] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0086] Figure 1 shows a flowchart of a method for identifying flood-damaged vehicles based on vehicle maintenance data according to an embodiment of the present invention.

Detailed Implementation

[0087] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0088] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application pertains; the terminology used herein in the specification is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "comprising" and "having," and any variations thereof, in the specification, claims, and accompanying drawings of this application are intended to cover non-exclusive inclusion. The terms "first," "second," "third," etc., in the specification, claims, or accompanying drawings of this application are used to distinguish different objects, not to describe a specific order or hierarchy. The term "multiple" in this application refers to two or more (including two).

[0089] In this document, the term "embodiment" means that a particular feature, structure, or characteristic described in connection with an embodiment may be included in at least one embodiment of this application. The appearance of this phrase in various places throughout the specification does not necessarily refer to the same embodiment, nor is it a separate or alternative embodiment mutually exclusive with other embodiments. It will be explicitly and implicitly understood by those skilled in the art that the embodiments described herein can be combined with other embodiments.

[0090] Figure 1 shows a flowchart of the method for identifying flood-damaged vehicles based on vehicle maintenance data provided by this invention. Referring to Figure 1, the method includes the following steps.

[0091] 1. Those skilled in the art perform feature engineering analysis to generate a corpus.

[0092] This step involves feature engineering on vehicle repair data and vehicle accident data to extract features potentially related to water damage accidents. This step utilizes domain knowledge and expert experience to guide the feature selection process, identifying the most relevant feature text. The features extracted in this step are: keywords and related behavioral phrases. This step includes the following specific steps:

[0093] 1a. A person skilled in the art, such as an automotive repair technician, reviews a large amount of repair and accident data of vehicles clearly identified as having been flooded, and identifies keywords (i.e., characteristics) related to flooding accidents. These keywords are categorized by part of speech as noun keywords and verb keywords, i.e., nouns and verbs related to the repair or maintenance of vehicles involved in flooding accidents. The nouns and verbs mentioned in this invention are determined by a person skilled in the art.

[0094] 1b. Using the labeled features, those skilled in the art analyze the existing vehicle condition data they have collected, then filter related behavioral phrases out of the matched data and verify them. "Matched" means that a keyword obtained in step 1a appears in full in a certain piece of vehicle condition data; that piece of vehicle condition data is the matched keyword data. The related behavioral phrases are also called keyword combinations. A keyword combination is a combination of a verb and a noun, where at least one of the verb and noun is a keyword. A keyword combination represents an operation in vehicle repair, such as replacing the engine or repainting the two right-side doors. This step is used to find all possible keyword combinations associated with each keyword. For example, using the noun keyword "airbag" to search for related behavioral phrases in vehicle repair data will find the related keyword combination "replacing the airbag." Similarly, verb keywords can be used to search for related phrases in vehicle condition data.

[0095] 1c. For each keyword combination obtained from steps 1a and 1b, generate a set of words consisting of synonyms or similar-looking words for the verbs and nouns within it, denoted as a keyword array. Each keyword combination generates one keyword array, and the data in the keyword array is denoted as keyword data. Each keyword array is then matched against existing vehicle condition data. Based on the matching results, the keyword data in the generated keyword array is either retained or deleted, and the finally retained keyword data is combined into a keyword group. Note that erroneous data may appear in the actual data of each keyword array; for example, similar but different text may be entered. These erroneous data must be removed by deletion. The criteria for generating synonyms are that the generated words and the original words (i.e., the verbs or nouns in the keyword combination) are semantically similar. For example, if the original word is the verb keyword "repair," then the generated word "maintain" is semantically similar to "repair." The criteria for generating similar-looking words are that the generated words and the original words are similar in pronunciation and form. For example, if the original word is "A pillar," then the generated word "A-pillar" is similar in pronunciation and form to "A pillar." If the original word is still "repair," then the generated similar-looking word could be "only repair." Furthermore, the keyword array of synonyms or similar-looking words for the verb keyword "repair" is a set of words such as "repair," "repair," "only repair," "maintain," and "only rest." Of course, words like "only repair" and "only rest" are incorrect data and need to be deleted after matching. The meaning of "match" and "hit" is the same, referring to the complete appearance of a certain keyword data in the text of a certain vehicle condition data. 
During matching, the existing vehicle condition data text is traversed, and keyword data from the keyword arrays generated in this step is searched within the text. Keyword data that appears in the search is retained, while keyword data that does not appear is discarded. Based on step 1a, and after steps 1b and this step, the keyword groups required by this invention are obtained.

[0096] The keyword group structure includes one noun, one verb, a class value, and a weight value. Both the noun and the verb are derived from keywords in the corpus. The class values are A1, B1, C1, D1, and E1, representing five accident levels of progressively increasing water damage severity. The weight value is an integer from 0 to 10 that indicates the severity of the water damage accident within the same class, with higher values representing greater severity; it serves as a variable in the water damage accident calculation and affects the score. The verb and noun within the same keyword group can be paired to represent an operation in car repair or maintenance. Those skilled in the art add weights and classifications to the obtained keyword groups using visualization tools to differentiate the severity of accidents.
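The keyword group structure described above can be sketched as a small record type. This is only an illustrative sketch: the name `KeywordGroup`, the field names, and the example values ("B1", weight 3) are assumptions and do not appear in the original text.

```python
from dataclasses import dataclass

# Class values as listed in the text; severity increases from A1 to E1.
CLASS_VALUES = ("A1", "B1", "C1", "D1", "E1")


@dataclass(frozen=True)
class KeywordGroup:
    """One keyword group: a noun, a verb, a class value, and a weight value."""
    noun: str
    verb: str
    class_value: str
    weight: int

    def __post_init__(self):
        # Validate against the ranges given in the text.
        if self.class_value not in CLASS_VALUES:
            raise ValueError(f"unknown class value: {self.class_value!r}")
        if not 0 <= self.weight <= 10:
            raise ValueError("weight must be an integer from 0 to 10")


# Hypothetical example; the class and weight are made up for illustration.
example_group = KeywordGroup(noun="air filter", verb="replace",
                             class_value="B1", weight=3)
```

Freezing the record keeps a group usable as a dictionary key or set member, which is convenient when groups are looked up by their verb and noun.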

[0097] 1d. Enter the keywords and keyword phrases obtained through steps 1a, 1b, and 1c into the database, store them in the database corpus, and manage them through a visualization tool.

[0098] It should be noted that in some complex water blister situations, more than one keyword group is needed to represent or confirm the accident situation. Data containing multiple keyword groups is called a deep analysis keyword group. A deep analysis keyword group includes multiple keyword groups, a separate class value (class1), and a separate weight value (weight1). The separate class value and weight value are shared by all keyword groups in the deep analysis keyword group and take effect only when all of those keyword groups appear in a single accident or repair record; in that case, the deep analysis keyword group is considered hit. The separate class value and weight value also participate in the analysis of water blister accidents. A deep analysis keyword group combines multiple keyword groups of lower severity into one group of higher severity. The dataset composed of deep analysis keyword groups is called a deep analysis keyword group set, which is compiled by those skilled in the art based on the repair and accident data of vehicles involved in water blister accidents. Deep analysis keyword groups and deep analysis keyword group sets are also stored in the corpus.
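The hit rule for a deep analysis keyword group (all member keyword groups must appear in the same record) can be sketched as a subset test. The type name `DeepAnalysisGroup`, the function `is_hit`, and the representation of a record as a set of (verb, noun) pairs are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set, Tuple

Pair = Tuple[str, str]  # (verb, noun) of one keyword group


@dataclass(frozen=True)
class DeepAnalysisGroup:
    """Several keyword groups plus a shared class value and weight value."""
    members: FrozenSet[Pair]
    class1: str
    weight1: int


def is_hit(group: DeepAnalysisGroup, record_pairs: Set[Pair]) -> bool:
    """Return True only if every member pair occurs in the single record."""
    return group.members <= record_pairs
```

Under this sketch, class1 and weight1 would only be taken into account for records where `is_hit` returns True.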

[0099] 2. Visualize the data and continuously update the corpus.

[0100] For newly added vehicle maintenance and accident data collected by those skilled in the art, namely the vehicle repair data and vehicle accident data, step 1 is repeated to add the newly acquired keywords, keyword groups, deep analysis keyword groups, and deep analysis keyword group sets to the corpus.

[0101] 3. Data collection and preprocessing before analysis:

[0102] Collect large-scale datasets (referred to as datasets) that include vehicle maintenance data and vehicle accident data. Large-scale datasets can be obtained from sources such as insurance companies, vehicle manufacturers, and maintenance service providers.

[0103] Quality control is performed on the collected data, i.e., the large-scale dataset, including data cleaning and outlier removal.

[0104] Data cleaning aims to remove irrelevant and privacy-sensitive information from the collected data. The specific steps are as follows:

[0105] 31a. Remove license plate numbers from the collected data using a regular expression for the license plate numbers.

[0106] 31b. Remove the ID card numbers from the collected data using a regular expression for the ID card numbers.

[0107] 31c. Remove phone numbers from the collected data using a regular expression for the phone numbers.

[0108] 31d. Remove part codes from the collected data using regular expressions for part codes.

[0109] In steps 31a to 31d, text matching the regular expression is found in the collected data, and the matched text is removed from the collected data by replacing the matched text with an empty character.
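Steps 31a to 31d can be sketched as a sequence of regular-expression substitutions with an empty replacement string. The patent does not disclose its regular expressions, so every pattern below is an assumption: a simplified mainland China plate format, an 18-character ID card number, an 11-digit mobile number, and a purely hypothetical part-code format. The ID card pattern is applied before the phone pattern so that a long digit run is not partially removed.

```python
import re

# All patterns are illustrative assumptions, not the patent's expressions.
PROVINCES = "京津沪渝冀豫云辽黑湘皖鲁新苏浙赣鄂桂甘晋蒙陕吉闽贵粤青藏川宁琼"
PATTERNS = [
    # 18-character resident ID card number (17 digits plus digit or X).
    ("id_card", re.compile(r"(?<!\d)\d{17}[\dXx](?!\d)")),
    # 11-digit mobile phone number.
    ("phone", re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)")),
    # Simplified plate: province character, letter, 5 or 6 letters/digits.
    ("plate", re.compile("[" + PROVINCES + "][A-Z][A-Z0-9]{5,6}")),
    # Hypothetical part code: two capital letters followed by 6+ digits.
    ("part_code", re.compile(r"(?<![A-Za-z0-9])[A-Z]{2}\d{6,}(?![A-Za-z0-9])")),
]


def scrub(text: str) -> str:
    """Replace every match of every pattern with the empty string."""
    for _name, pattern in PATTERNS:
        text = pattern.sub("", text)
    return text
```

Real data would need patterns tuned to the actual plate, phone, and part-code formats present in each source.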

[0110] Outlier removal aims to exclude descriptive content irrelevant to the vehicle's condition. This includes the following steps:

[0111] 32a. Remove possible province / city names and 4S store names from the data through text matching. Specifically, retrieve these names from a database storing province / city names and 4S store names, search for these names in the collected data such as vehicle repair data and accident data, and remove the appearing names from the data.

[0112] 32b. Remove potential 4S store activity descriptions from the data using text regular expressions and similarity matching. 4S store activity descriptions have a fixed format. First, generate a regular expression based on this format. Then, use this regular expression to match the text of the vehicle repair data in the collected data. During matching, all literal text in the regular expression is matched, and the strings of arbitrary characters between the literal texts are obtained. For example, if the regular expression for a 4S store activity description is "Dear customer, *? BMW Enjoy Maintenance Package, Nationwide One Price, Worry-Free Maintenance Activity", then matching yields the strings of arbitrary characters lying between the literal texts "Dear customer", "BMW Enjoy Maintenance Package", "Nationwide One Price", and "Worry-Free Maintenance Activity". Then, use a similarity algorithm such as the Levenshtein distance algorithm to calculate the number of single-character operations required to make the two texts being compared identical. There are three basic operations: replacement, addition, and deletion. When the two texts are completely identical, the number of operations required is 0. A match is considered successful if the number of operations is less than 4.
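The similarity check in step 32b can be sketched with a standard dynamic-programming Levenshtein distance. The function names and the `max_distance` parameter are assumptions; the threshold of 4 follows the "less than 4" rule stated above.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of replacements, additions, and deletions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # addition
                               previous[j - 1] + cost))  # replacement
        previous = current
    return previous[-1]


def is_similar(candidate: str, template: str, max_distance: int = 4) -> bool:
    """A match succeeds when fewer than max_distance operations are needed."""
    return levenshtein(candidate, template) < max_distance
```

For example, two descriptions that differ only in one punctuation mark are one operation apart and would be treated as the same activity text.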

[0113] 32c. Remove frequently occurring (more than twice in the same batch of text) but irrelevant vehicle condition information from the collected data through text matching. This type of information is filtered by those skilled in the art from the collected data. Once irrelevant vehicle condition information is discovered, it is added to the database through the data backend system and takes effect immediately. This frequently occurring but irrelevant vehicle condition information text is stored in the database. When these texts are searched in the collected data, such as vehicle maintenance data, and a particular text is found, it is deleted from the data.

[0114] Maintaining data consistency includes:

[0115] 33a. Before data collection begins, data standards and specifications are defined, including data types, formats, naming conventions, data ranges, data storage structures, field types, length limits, etc.

[0116] 33b. For data from multiple data sources, such as vehicle maintenance data and accident data from third-party partners, integration and merging shall be completed based on data standards and specifications.

[0117] 33c. When cleaning data, strictly follow the steps described in the aforementioned data cleaning and outlier removal section to prevent the same text from generating different processed results.

[0118] Maintaining data integrity includes:

[0119] 34a. Those skilled in the art shall periodically audit the data to check whether the data complies with business rules, i.e., whether there is any privacy-sensitive information that has not been processed, or information unrelated to vehicle condition, and whether there are any errors or inconsistencies. If data problems are found, they can be corrected in a timely manner through visualization programs.

[0120] 4. Data Analysis Method. In this method, the input text is the descriptive text from the large-scale dataset processed in step 3, and the water bubble accident score is obtained. This method includes the following steps:

[0121] 4a. Convert the symbols in the input text to English symbols, remove spaces, and perform word segmentation on the input text using the aforementioned corpus to obtain a text list aList. The text list aList is an ordered set of words and symbols in the input text. The conversion and word segmentation do not affect the original semantics, and the text order remains unchanged. All symbols appearing in the text are uniformly English symbols. The word segmentation process includes the following steps:

[0122] 4a1. Search for Chinese-formatted symbols in the input text;

[0123] 4a2. Replace the found Chinese symbols with the corresponding English symbols;

[0124] 4a3. If the input text contains spaces, remove the spaces from the input text.

[0125] 4a4. Read keywords from the corpus and match them against the input text from left to right, giving priority to longer keywords (the more characters a keyword contains, the higher its priority). If a keyword appears in the input text, the match is successful. This process segments the input text into keywords and English symbols.

[0126] Example 1: Input text 1 is "Two back doors spray paint；replace air filter。Filter gets water in". After step 4a1 is executed, the Chinese symbols "；" and "。" are found in the input text. After step 4a2 is executed, the Chinese symbols "；" and "。" are replaced with the corresponding English symbols ";" and ".". Since there are no spaces in the input text, step 4a3 is skipped. Step 4a4 is then executed: "two back doors spray paint" is segmented into two text segments, "two back doors" and "spray paint"; "replace air filter" is segmented into two text segments, "replace" and "air filter"; and "filter gets water in" is segmented into two text segments, "filter" and "water in". This yields the text list aList1 of input text 1.

[0127] Example 2: Input text 2 is "Right front door reshaping / front bumper disassembly / reassembly；front bumper，right front door painting。Hood，right fender painting". After step 4a1 is executed, the Chinese symbols "；", "，" and "。" are found in the input text. After step 4a2 is executed, the Chinese symbols "；", "，" and "。" are replaced with the corresponding English symbols ";", "," and ".". Since there are no spaces in the input text, step 4a3 is skipped. Step 4a4 is then executed: "Right front door reshaping" is segmented into two text segments, "Right front door" and "reshaping"; "front bumper disassembly / reassembly" is segmented into two text segments, "front bumper" and "disassembly / reassembly"; "right front door painting" is segmented into two text segments, "right front door" and "painting"; and "right fender painting" is segmented into two text segments, "right fender" and "painting". This yields the text list aList2 of input text 2.
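Steps 4a1 to 4a4 can be sketched as a symbol conversion followed by a left-to-right, longest-keyword-first match. The sketch below makes several assumptions that the text does not settle: the set of symbols handled, the function name `segment`, the lexicon being passed in as a plain set, and the choice to skip characters that match no keyword. The sample sentence in the test is a hypothetical Chinese rendering of Example 1, used because the space-removal step only makes sense for text written without spaces.

```python
# Map full-width (Chinese) symbols to their English counterparts (4a1, 4a2).
PUNCT_MAP = str.maketrans({
    "；": ";", "。": ".", "，": ",", "：": ":",
    "（": "(", "）": ")", "？": "?", "！": "!",
})
SYMBOLS = set(";.,:()?!/+-\"")


def segment(text, lexicon):
    """Return the ordered list of keywords and English symbols in text."""
    text = text.translate(PUNCT_MAP)
    text = "".join(text.split())  # 4a3: remove all whitespace
    # 4a4: longer keywords are tried first.
    words = sorted((w for w in lexicon if w), key=len, reverse=True)
    tokens, i = [], 0
    while i < len(text):
        if text[i] in SYMBOLS:
            tokens.append(text[i])
            i += 1
            continue
        for word in words:
            if text.startswith(word, i):
                tokens.append(word)
                i += len(word)
                break
        else:
            i += 1  # assumption: characters matching no keyword are skipped
    return tokens
```

Prefixing the returned list with the input text's index would give one row of aList as described for the text list.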

[0128] The text list `aList` is a two-dimensional list. Each row of `aList` stores the word segmentation results of one input text. Each row of `aList` contains more than one list unit, which includes an index unit and a word segmentation unit. The index unit stores the index of the input text, and each word segmentation unit stores a text word, i.e., the keyword or an English symbol. There is a one-to-one correspondence between word segmentation units and text words or English symbols. The length of each row of `aList`, i.e., the number of list units in each row, is determined by the number of text words and English symbols stored in that row. The keywords and English symbols obtained from each input text segmentation are sorted in the corresponding row of `aList` according to their order in the input text.

[0129] The word segmentation result for input text 1, the text list aList1, is shown in Table 1. Table 1 contains 9 list units. The first list unit is the index unit, where "n1" represents the sequence number of input text 1. The following 8 units are word segmentation units, storing the 8 text segments or English symbols obtained from the word segmentation of input text 1. The word segmentation result for input text 2, the text list aList2, is shown in Table 2. Table 2 contains 16 list units. The first list unit is the index unit, where "n2" represents the sequence number of input text 2. The following 15 units are word segmentation units, storing the 15 text segments or English symbols obtained from the word segmentation of input text 2.

[0130] Table 1. The text list aList1 after input text 1 is segmented.

[0131]
| n1 | Two back doors | Spray paint | ; | replace | air filter | . | Filter | Water ingress |

[0132] Table 2. The text list aList2 after input text 2 is segmented.

[0133]

[0134] 4b. Based on the usage habits of text symbols, the word segmentation result obtained in step 4a, i.e., the text list aList, is divided into sentences to obtain a complete semantic sentence list bList composed of complete semantic sentences. This includes: semantically segmenting the text of each line in the text list aList according to punctuation marks, i.e., sentence segmentation. For example, periods, question marks, and exclamation marks divide the text before and after them into two semantically complete sentences. That is, periods, question marks, and exclamation marks are sentence segmentation marks or semantic end marks. For other marks, it is determined that the text content before and after the mark is semantically related and exists within a semantically complete sentence. After sentence segmentation, the words and symbols in each complete semantic sentence are stored one-to-one in an element of the complete semantic sentence list bList. Each element includes at least one sub-element. Each sub-element of the element is a data unit composed of the words in the complete semantic sentence corresponding to the element, which are sequentially separated by the other symbols. If the complete semantic sentence has no other symbols, the words in the entire sentence are stored in one sub-element. If the complete semantic sentence has n other symbols, where n is an integer not less than 1, when the end of the complete semantic sentence is one other symbol, the entire sentence is divided into n segments from beginning to end by the n other symbols, and the words in each segment are stored in one sub-element. When the complete semantic sentence has no other symbols at the end, the entire sentence is divided into n+1 segments from beginning to end by the n other symbols, and the words in each segment are stored in one sub-element. Each other symbol is assigned to (i.e., stored in) the sub-element containing the word immediately preceding it. 
For example, if a symbol in a row of aList is “;”, then the words before and after the symbol are all placed into the same element of the two-dimensional list bList, which contains complete semantic sentences; if a symbol in a row of aList is “.”, then the words before and after the symbol are placed into different elements of the two-dimensional list bList, which contains complete semantic sentences.
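The sentence segmentation of step 4b can be sketched as a single pass over one row of aList (without its index unit). The function name `split_sentences` and the nested-list representation are assumptions. Following Table 3, the sketch drops the sentence-ending mark itself; it also attaches a symbol that directly follows another symbol to the preceding sub-element, a case the text does not spell out.

```python
SENTENCE_END = {".", "?", "!"}
OTHER_SYMBOLS = {";", ",", ":", "(", ")", "+", "-", "/", '"'}


def split_sentences(tokens):
    """Group tokens into elements (sentences) made of sub-elements."""
    b_list, element, sub = [], [], []
    for tok in tokens:
        if tok in SENTENCE_END:
            # Close the current sentence and start a new element.
            if sub:
                element.append(sub)
            if element:
                b_list.append(element)
            element, sub = [], []
        elif tok in OTHER_SYMBOLS:
            if not sub and element:
                element[-1].append(tok)  # symbol right after another symbol
            else:
                sub.append(tok)          # symbol stays with preceding words
                element.append(sub)
                sub = []
        else:
            sub.append(tok)
    if sub:
        element.append(sub)
    if element:
        b_list.append(element)
    return b_list
```

Applied to the tokens of Table 1, this yields two elements: the first with two sub-elements and the second with one, which matches the layout of Table 3.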

[0135] The complete semantic sentence list bList1 obtained by sentence segmentation of Table 1 is shown in Table 3. The complete semantic sentence list bList is a two-dimensional table. One dimension represents the elements of bList: for example, in Table 3, each column except the first (which holds the sub-element sequence numbers) represents one element. The other dimension represents the sub-elements of those elements: for example, in Table 3, each cell in each row except the first (which holds the element sequence numbers) stores one sub-element of the element for that cell's column. Following the above sentence segmentation method, since input text 1 contains the sentence segmentation mark “.”, the segmentation result before “.” is stored as a semantically complete sentence in the column with the element sequence number “bList1 element 1”, serving as the first element of bList1; the segmentation result after “.” is stored as a semantically complete sentence in the column with the element sequence number “bList1 element 2”, serving as the second element of bList1. In bList1, the first element (element 1 of bList1) contains the phrase "two back doors spray paint;", which is assigned to the first sub-element (sub-element 1), and the phrase "replace air filter", which is assigned to sub-element 2. Since "filter gets water in" contains no other symbols, the word segmentation result of that entire sentence is stored in a single sub-element of the second element of bList1 (element 2 of bList1).

[0136] Similarly, the complete semantic sentence list bList2 obtained by grouping sentences according to Table 2 is shown in Table 4.

[0137] Table 3. List of complete semantic sentences obtained by grouping sentences from Table 1 (bList1).

[0138]
| | bList1 element 1 | bList1 element 2 |
| Sub-element 1 | Both rear doors were spray-painted; | Filter inlet water |
| Sub-element 2 | Replace the air filter | |

[0139] Table 4. List of complete semantic sentences obtained by grouping sentences from Table 2.

[0140]
| | bList2 element 1 | bList2 element 2 |
| Sub-element 1 | Right front door reshaping | Engine cover |
| Sub-element 2 | Front bumper removal and installation; | Right fender painting |
| Sub-element 3 | Front bumper | |
| Sub-element 4 | Right front door paint spray | |

[0141] 4c. Group the elements in the complete semantic sentence list bList generated in step 4b into a sentence list cList. The sentence list cList is a two-dimensional table. One dimension of the two-dimensional table represents the elements of the sentence list cList. For example, in Table 5, each row except the first row (which represents the sub-element sequence number) represents an element. The other dimension of the two-dimensional table represents the sub-elements of the elements in the sentence list cList. For example, in Table 5, each cell in each column except the first column (which represents the element sequence number) stores a sub-element representing the element corresponding to that cell. Each sub-element of each element in cList stores a word.

[0142] Table 5. Sentence list cList1 obtained from the vocabulary grouping in Table 4.

[0143]
| | Sub-element 1 | Sub-element 2 | Sub-element 3 | Sub-element 4 |
| cList1 element 1 | Right front door | Reshaping | Front bumper | Disassembly and assembly |
| cList1 element 2 | Front bumper | Right front door | Spray paint | |
| cList1 element 3 | Hood | Right side fender | Spray paint | |

[0144] When performing the vocabulary grouping, for element A in the complete semantic sentence list bList, if its child elements do not contain the other symbols, then each word in that child element is stored one-to-one in the corresponding child element B in the sentence list cList. In this case, element A in the complete semantic sentence list bList has a one-to-one correspondence with element B in the sentence list cList. If the child elements in element A end with the other symbols, then three cases are handled:

[0145] The first of the three cases is when the other symbol is a semicolon. That is, when the last symbol of a sub-element of any element A in the complete semantic sentence list bList is a semicolon, the words before the semicolon in the semantically complete sentence containing that sub-element are separated from element A and stored in an element BB of the sentence list cList, with each word stored in its own sub-element of element BB. The part of element A that remains after separation is then treated as a separate element and processed according to the three cases until all the words in element A have been processed. For example, the last character of sub-element 2 in element 1 of bList2 in Table 4 is a semicolon, so the words before the semicolon in the semantically complete sentence containing sub-element 2 (i.e., the words in sub-element 1 and sub-element 2 of element 1 of bList2 in Table 4) are separated and stored in element 1 of cList1 in the sentence list cList1 in Table 5. Specifically, the two words "right front door" and "reshaping" in sub-element 1 of element 1 of bList2, and the two words "front bumper" and "disassembly / reassembly" in sub-element 2 of element 1 of bList2, are each stored as a sub-element of element 1 of cList1, giving a total of four sub-elements in element 1 of cList1. After separation, element 1 of bList2 contains only sub-elements 3 and 4, and this remainder is then processed as a separate element.

[0146] The second of the three cases is when the other symbol is a colon. Let next be the next child element after the current child element in the current complete semantic sentence list bList. When the ending symbol of the current child element is a colon, if the child element next contains both a verb and a noun, and the word in the current child element is already contained in the child element next, then the current child element is removed, that is, it is not stored in the sentence list cList; otherwise, the colon is considered to be the third of the three cases. For example, given the input text 3 "Spray painting: Full car spray painting (except for the rear cover)," the complete semantic sentence list bList3 obtained after segmenting and grouping it is shown in Table 6. The complete semantic sentence list bList3 contains only one element—bList3 element 1. The last character of sub-element 1 in bList3 element 1 is a colon. In the complete semantic sentence of input text 3, the word "spray painting" before the colon is a verb. The words "full car" and "spray painting" in sub-element 2 are a noun and a verb, respectively. Since the words in sub-element 2 contain the words before the colon, sub-element 1 in bList3 element 1 is not stored in the corresponding sentence list cList2.

[0147] Table 6. List of complete semantic sentences bList3 obtained by grouping the input text into segments.

[0148]
| | bList3 element 1 |
| Sub-element 1 | Spray painting: |
| Sub-element 2 | Full car repainting |
| Sub-element 3 | Back cover |

[0149] Table 7. Sentence list cList2 obtained from the vocabulary grouping in Table 6.

[0150]
| | Sub-element 1 | Sub-element 2 | Sub-element 3 |
| cList2 element 1 | The whole vehicle | Spray paint | Back cover |

[0151] The third of the three cases involves other symbols used as sentence separators: commas, left parentheses, right parentheses, plus signs, minus signs, quotation marks, and colons (excluding those in the second case). After processing the above two cases, the child elements of the remaining elements in the complete semantic sentence list bList are processed as follows:

[0152] When the current child element ends with the sentence separator, if the next child element next1 after the current child element contains both a verb and a noun, and the current child element contains only a verb or only a noun, then each word in the current child element and in the next child element next1 is stored separately in different child elements of the corresponding element in the sentence list cList, after which the next child element next1 is skipped and processing continues. Otherwise, if the current child element contains words, each word contained in the current child element is stored separately in different child elements of the corresponding element in the sentence list cList, and the next child element next1 is then processed. Taking the remainder of bList2 element 1 after the above separation as an example, its child element 3 ends with a sentence separator (a comma) and contains only the noun "front bumper". The next child element, child element 4, contains the noun "right front door" and the verb "spray paint". Therefore, "front bumper", "right front door", and "spray paint" are stored separately in the sentence list cList1 as different child elements of cList1 element 2. For element 1 of bList3 in the complete semantic sentence list bList3, child element 1 has already been discussed. Child element 2, as the current child element, contains the words "full car" and "spray painting", whereas the next child element, child element 3 of element 1 of bList3, contains only the word "back cover". Therefore, the words "full car" and "spray painting" are stored in child element 1 and child element 2, respectively, of cList2 element 1, which is the element of the sentence list cList2 corresponding to element 1 of bList3, and child element 3 of element 1 of bList3 is then processed.

[0153] When the current child element has no ending symbol (i.e., the child element at the end of a complete semantic sentence is detected), if the current child element contains words, then each word in the current child element is stored separately in a different child element of the corresponding element in the sentence list cList. For example, if child element 3 of element 1 in bList3 has no ending symbol and contains only the word "back cover", then this word is stored in child element 3 of element 1 in cList2.

[0154] It should be noted that when handling the third case above, the elements in the complete semantic sentence list bList correspond one-to-one with the elements in the sentence list cList.
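The three cases of step 4c can be approximated by a short sketch. It rests on one reading of the text: in the third case both branches store the words, in order, into the same cList element, so the sketch simply appends them. A semicolon closes the current cList element, and a sub-element ending in a colon is dropped when the next sub-element contains both a verb and a noun and already includes its words. The function name `group_words` and the part-of-speech dictionary `pos` (word to "n" or "v") are assumptions.

```python
SYMBOLS = {";", ",", ":", "(", ")", "+", "-", "/", '"'}


def group_words(element, pos):
    """Turn one bList element (list of sub-elements) into cList elements."""
    c_elements, current = [], []
    for idx, sub in enumerate(element):
        words = [t for t in sub if t not in SYMBOLS]
        end = sub[-1] if sub and sub[-1] in SYMBOLS else None
        nxt = element[idx + 1] if idx + 1 < len(element) else []
        nxt_words = [t for t in nxt if t not in SYMBOLS]
        if end == ":":
            # Second case: drop a heading that the next sub-element repeats.
            kinds = {pos.get(w) for w in nxt_words}
            if words and {"n", "v"} <= kinds and all(w in nxt_words for w in words):
                continue
        current.extend(words)  # third case and no-symbol case
        if end == ";":
            # First case: words up to the semicolon form their own element.
            if current:
                c_elements.append(current)
            current = []
    if current:
        c_elements.append(current)
    return c_elements
```

With the sub-elements of Table 4 and Table 6 as input, this sketch reproduces the rows of Table 5 and Table 7 respectively.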

[0155] 4d. Based on part-of-speech analysis of the text, the words stored in the sub-elements of each element of the sentence list cList generated in step 4c are combined, element by element, to generate the phrase binding relationship list nandVGroupList. Each element in nandVGroupList includes one sub-element storing a verb and one sub-element storing a noun.

[0156] When performing the word combination, each element of cList is processed sequentially. For any element C, the part of speech of the word stored in each sub-element of element C is checked one by one, and two cases are distinguished according to the part of speech. During this process, an initially empty temporary array tempNandList is created to record the temporary word combinations generated by combining the words stored in the sub-elements of element C.

[0157] The first case is when the word stored in the current child element of element C is a noun. In this case, if there is a verb in the remaining unprocessed child elements of element C, let the current child element be termN1. The word stored in termN1 is combined one by one with the verbs stored in the remaining unprocessed child elements of element C, and word combinations whose distance distance1 is greater than 2 are excluded. Here, the distance is the number of swaps with adjacent child elements needed to bring the current child element next to the child element it is combined with. Each remaining word combination is then compared with the word combinations in the temporary array tempNandList and, if it is not already present, added to tempNandList. For example, for element 1 of cList1 in the sentence list cList1, if child element 1 is the current child element termN1, the word "right front door" stored in it is a noun, the word "reshaping" stored in child element 2 is a verb, the word "front bumper" stored in child element 3 is a noun, and the word "disassembly and assembly" stored in child element 4 is a verb. The words stored in the remaining child elements 2 and 4 can therefore be combined with the word "right front door" stored in termN1.

[0158] When the remaining unprocessed child elements of element C do not store a verb, check whether the temporary array tempNandList contains a word combination whose noun is the word stored in the current child element termN1. If it does, no processing is performed; otherwise, step 4c is repeated.

[0159] The second case is when the word stored in the current child element is a verb. In this case, if there is a noun in the remaining unprocessed child elements of element C, let the current child element be termV1. Combine the words in the current child element termV1 with the nouns in the remaining unprocessed child elements of element C one by one, and record the distance of each combination as distance3. After excluding word combinations with a distance3>2, compare each of the resulting word combinations with the word combinations in the temporary array tempNandList. If the resulting word combination does not exist in tempNandList, add it to the temporary array tempNandList.

[0160] When the remaining unprocessed child elements of element C do not store a noun, check whether the temporary array tempNandList contains a word combination whose verb is the word stored in the current child element termV1. If it does, no processing is performed; otherwise, step 4c is repeated.
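The two combination cases of step 4d can be sketched as one loop that pairs each word with later words of the opposite part of speech. The sketch interprets the distance as the number of child elements lying between the two words (adjacent words have distance 0), which is one reading of the swap-count definition. The function name `combine` and the dictionary `pos` are assumptions, and the fallback of repeating step 4c when no partner is found is not modelled.

```python
def combine(words, pos, max_distance=2):
    """Return de-duplicated (verb, noun) pairs for one cList element."""
    pairs = []  # plays the role of tempNandList
    for i, word in enumerate(words):
        kind = pos.get(word)
        if kind not in ("n", "v"):
            continue
        wanted = "v" if kind == "n" else "n"
        # Only the remaining, not yet processed child elements are searched.
        for j in range(i + 1, len(words)):
            other = words[j]
            if pos.get(other) != wanted:
                continue
            if j - i - 1 > max_distance:
                continue  # combinations with distance > 2 are excluded
            pair = (other, word) if kind == "n" else (word, other)
            if pair not in pairs:
                pairs.append(pair)
    return pairs
```

For the words of cList1 element 2, this gives the pairs ("spray paint", "front bumper") and ("spray paint", "right front door").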

[0161] During word combination, after a cList element has been checked, if the temporary array tempNandList contains any word combinations, each of them is added as an element to the phrase binding relationship list nandVGroupList, and tempNandList is then emptied. The phrase binding relationship list nandVGroupList is a two-dimensional table. One dimension represents the elements D of nandVGroupList: as shown in Table 8, each row except the first (which holds the sub-element names) represents one element D. The other dimension represents the four sub-elements of each element D: as shown in Table 8, each cell in each column except the first (which holds the element sequence numbers) stores one sub-element of the element D for that cell's row. Of the four sub-elements, termV and termN are the verb and the noun of the word combination constituting element D. The other two, class and weight, are the category value and the weight value corresponding to that word combination. One of these two values is initially set to null, and the other can be set to 0 or null (in Table 8, class is null and weight is 0). For example, element 2 of cList1 in the sentence list cList1 is processed according to the first case of word combination to obtain elements 2 and 3 in the phrase binding relationship list nandVGroupList1; similarly, element 3 of cList1 is combined to obtain elements 4 and 5 in nandVGroupList1.

[0162] Table 8. Phrase binding relationship list nandVGroupList1 obtained by combining phrases from sentence list cList1

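[0163]

| Element   | termV (verb)    | termN (noun)      | class (category value) | weight (weight value) |
|-----------|-----------------|-------------------|------------------------|-----------------------|
| Element 1 | Plastic Surgery | Right front door  | null                   | 0                     |
| Element 2 | Spray paint     | Front bumper      | null                   | 0                     |
| Element 3 | Spray paint     | Right front door  | null                   | 0                     |
| Element 4 | Spray paint     | Right side fender | null                   | 0                     |
| Element 5 | Spray paint     | hood              | null                   | 0                     |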
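One possible in-memory representation of an element D, and of the step that moves the contents of tempNandList into nandVGroupList, is sketched below in Python. The class name PhraseBinding, the field name cls (used because class is a reserved word in Python), and the function flush_temp are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class PhraseBinding:
    """One element D of nandVGroupList: a verb-noun pair plus its labels."""
    termV: str
    termN: str
    cls: Optional[str] = None    # category value, initially null
    weight: Optional[int] = 0    # weight value, initially 0 (as in Table 8)

def flush_temp(temp_nand_list: List[Tuple[str, str]],
               nand_v_group_list: List[PhraseBinding]) -> None:
    """Append every (verb, noun) pair to nandVGroupList, then empty tempNandList."""
    for verb, noun in temp_nand_list:
        nand_v_group_list.append(PhraseBinding(termV=verb, termN=noun))
    temp_nand_list.clear()

nand_v_group_list: List[PhraseBinding] = []
temp = [("Spray paint", "Front bumper"), ("Spray paint", "Right front door")]
flush_temp(temp, nand_v_group_list)
print(nand_v_group_list)
```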

[0164] 4e. Assign a category value and a weight value to each element in the phrase binding relationship list nandVGroupList. Specifically, traverse nandVGroupList; for any element E, look up in the corpus the keyword group that has the same verb and noun as element E. If a keyword group is matched, assign its category value and weight value to element E and keep element E in nandVGroupList; if no keyword group is matched, delete element E from nandVGroupList. Then filter nandVGroupList to remove any element E whose category value or weight value is null, thus obtaining the final phrase binding relationship list NANDVGroupList. As shown in Table 9, elements 2 and 4 can be matched with keyword groups from the corpus. After the category values and weight values of the matched keyword groups are assigned to these elements and elements 1, 3 and 5 are deleted, the final phrase binding relationship list NANDVGroupList1 corresponding to the sentence list cList1 is obtained (the element sequence numbers in Table 9 are left unchanged so that the result of the deletion can be seen directly).

[0165] Table 9. Final Phrase Binding Relationship List NANDVGroupList1 obtained from sentence list cList1

[0166]

[0167]
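Step 4e can be sketched in Python as a dictionary lookup followed by a null filter. The corpus below is hypothetical: its category and weight values are invented for illustration and are not the values of Table 9. The function name assign_labels and the dictionary layout are likewise assumptions.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical corpus: (verb, noun) -> (category value, weight value).
CORPUS: Dict[Tuple[str, str], Tuple[Optional[str], Optional[int]]] = {
    ("Spray paint", "Front bumper"): ("B1", 3),       # invented values
    ("Spray paint", "Right side fender"): ("B1", 4),  # invented values
}

def assign_labels(nand_v_group_list: List[dict], corpus=CORPUS) -> List[dict]:
    """Return the final list: matched elements with non-null class and weight."""
    final_list = []
    for elem in nand_v_group_list:
        match = corpus.get((elem["termV"], elem["termN"]))
        if match is None:
            continue  # no keyword group matched: element is deleted
        cls, weight = match
        if cls is None or weight is None:
            continue  # filtered out because a value is null
        final_list.append({**elem, "class": cls, "weight": weight})
    return final_list

# The five verb-noun pairs of Table 8.
table8 = [
    {"termV": "Plastic Surgery", "termN": "Right front door"},
    {"termV": "Spray paint", "termN": "Front bumper"},
    {"termV": "Spray paint", "termN": "Right front door"},
    {"termV": "Spray paint", "termN": "Right side fender"},
    {"termV": "Spray paint", "termN": "hood"},
]
final_list = assign_labels(table8)
print([e["termN"] for e in final_list])
```

With this invented corpus, only the pairs corresponding to elements 2 and 4 of Table 8 survive, which mirrors the outcome described for Table 9.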

[0168] 4f. Based on the category value (class) and weight value (weight) of each element F in the final phrase binding relationship list NANDVGroupList, analyze whether the vehicle has experienced a water immersion accident, and calculate the water immersion score (waterScore) of each element F. The category value of each element F represents a score. As mentioned earlier, the category value can be A1, B1, C1, D1, or E1, representing severity from low to high. During calculation, each category value corresponds to a number: A1 corresponds to 0, B1 to 0.1, C1 to 0.2, D1 to 0.5, and E1 to 1. The weight value (weight) is an integer from 0 to 10 that represents the severity of the water immersion accident within the same category; a larger weight value indicates greater severity. The water immersion score (waterScore) satisfies:

[0169] $\mathrm{waterScore} = 5.5 - 3 \times \mathrm{class}^{2} - 0.25 \times \mathrm{weight}$

[0170] For example, let maxClassList be the sublist of elements of NANDVGroupList with the highest category value, and let maxNandVGroup be the element of maxClassList with the highest weight value. Then, when the category value of maxNandVGroup is E1 and its weight value is 8, the water immersion score of maxNandVGroup is:

[0171] $\mathrm{waterScore} = 5.5 - 3 \times 1^{2} - 0.25 \times 8 = 0.5$
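The score of step 4f and the selection of maxNandVGroup can be sketched in Python as below. The mapping and the formula follow the description; the function names water_score and most_severe and the dictionary layout of an element are assumptions for illustration.

```python
from typing import Dict, List

# Numeric value of each category, as given in the description.
CLASS_VALUE = {"A1": 0.0, "B1": 0.1, "C1": 0.2, "D1": 0.5, "E1": 1.0}

def water_score(cls: str, weight: int) -> float:
    """waterScore = 5.5 - 3 * class^2 - 0.25 * weight."""
    if weight not in range(0, 11):
        raise ValueError("weight must be an integer from 0 to 10")
    return 5.5 - 3 * CLASS_VALUE[cls] ** 2 - 0.25 * weight

def most_severe(elements: List[Dict]) -> Dict:
    """Pick maxNandVGroup: highest category first, then highest weight."""
    top = max(CLASS_VALUE[e["class"]] for e in elements)
    max_class_list = [e for e in elements if CLASS_VALUE[e["class"]] == top]
    return max(max_class_list, key=lambda e: e["weight"])

# Illustrative elements; the class and weight values are invented.
elements = [
    {"class": "B1", "weight": 9},
    {"class": "E1", "weight": 8},
    {"class": "E1", "weight": 2},
]
worst = most_severe(elements)
print(worst, water_score(worst["class"], worst["weight"]))
```

For category E1 and weight 8 the function returns 0.5, matching the worked example; the score ranges from 5.5 (A1, weight 0) down to 0 (E1, weight 10).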

[0172] Table 10. Examples of Deep Analysis Phrase Sets

[0173]

[0174] 4g. Deep analysis: traverse the aforementioned deep analysis phrase set, which includes at least one deep analysis phrase G, and determine whether the set contains a deep analysis phrase G all of whose phrase sub-elements are present in NANDVGroupList. If such a phrase exists, then for each element F in NANDVGroupList that coincides with a phrase element of G (denoted the duplicate element), calculate a new score, denoted the deep score (deepWaterScore), based on the category value (class1) and weight value (weight1) of G. If no such phrase exists, use the water immersion score (waterScore) calculated in step 4f as the final water immersion accident score (finalScore) of element F in NANDVGroupList. A deep analysis phrase includes three data items: multiple phrase elements, a category value (class1), and a weight value (weight1). Each phrase element includes two phrase sub-elements: a verb sub-element (termV2) and a noun sub-element (termN2). A deep analysis phrase G is said to exist when each phrase element H of G corresponds one-to-one with a different element F of NANDVGroupList, such that the verb sub-element termV2 and noun sub-element termN2 of phrase element H are, respectively, the child elements termV and termN of the corresponding element F. Table 10 shows an example of a deep analysis phrase set that includes two deep analysis phrases G; specific data for the three data items are given only for deep analysis phrase 1. Deep analysis phrase 1 includes two phrase elements. The verb sub-element termV2 and noun sub-element termN2 of the first phrase element are "spray paint" and "front bumper", respectively, and those of the second phrase element are "spray paint" and "right side fender", respectively. The two phrase elements share the same category value class1 and weight value weight1. These sub-elements are the child elements termV and termN of elements 2 and 4 in Table 9, respectively, so the first and second phrase elements correspond one-to-one with elements 2 and 4 in Table 9. That is, all the phrase sub-elements of the phrase elements of deep analysis phrase 1 exist in the final phrase binding relationship list NANDVGroupList1.
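The existence check of step 4g can be sketched in Python as follows. It assumes that each verb-noun pair occurs at most once in NANDVGroupList and at most once in a deep analysis phrase, so that a lookup by pair gives the one-to-one correspondence. The function name find_duplicates, the dictionary layout, and the class, weight, class1 and weight1 values are invented for illustration; the noun is spelled as in Table 8.

```python
from typing import Dict, List

def find_duplicates(deep_phrase: Dict, final_list: List[Dict]) -> List[Dict]:
    """Return the elements F matched by deep_phrase, or [] if any phrase element is missing."""
    by_pair = {(f["termV"], f["termN"]): f for f in final_list}
    matched = []
    for pair in deep_phrase["elements"]:
        if pair not in by_pair:
            return []  # the deep analysis phrase does not exist in the list
        matched.append(by_pair[pair])
    return matched

# Final list with elements 2 and 4 (class and weight values invented).
final_list = [
    {"termV": "Spray paint", "termN": "Front bumper", "class": "B1", "weight": 3},
    {"termV": "Spray paint", "termN": "Right side fender", "class": "B1", "weight": 4},
]
# Deep analysis phrase 1 (class1 and weight1 invented).
deep_phrase_1 = {
    "elements": [("Spray paint", "Front bumper"), ("Spray paint", "Right side fender")],
    "class1": "C2",
    "weight1": 5,
}
duplicates = find_duplicates(deep_phrase_1, final_list)
print(len(duplicates))
```

If any phrase element is absent from the list, the function returns an empty list and the waterScore of step 4f is kept as the final score.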

[0175] The category values class1 used in the deep analysis are A2, B2, C2, D2, and E2, representing accident severity from low to high. During calculation, each category value (class1) corresponds to a number: A2 corresponds to 0, B2 to 1, C2 to 2, D2 to 3, and E2 to 4. The weight value (weight1) is an integer from 0 to 10 that represents the severity of the water immersion accident within the same category; a larger weight value indicates greater severity.

[0176] The deep score deepWaterScore satisfies:

[0177] $\mathrm{deepWaterScore} = 5 - 0.28 \times \mathrm{class1}^{2} - 0.108 \times \mathrm{weight1}$

[0178] Example: when the category value of the deep analysis phrase is E2 and the weight value is 8,

[0179] $\mathrm{deepWaterScore} = 5 - 0.28 \times 4^{2} - 0.052 \times 8 = 0.104$
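A Python sketch of the deep score follows. The printed formula uses a weight coefficient of 0.108, while the arithmetic of the worked example uses 0.052; rather than choosing between them, the sketch exposes the coefficient as a parameter (weight_coeff, a hypothetical name) that defaults to the value in the formula. The function name deep_water_score is also an assumption.

```python
# Numeric value of each deep analysis category, as given in the description.
CLASS1_VALUE = {"A2": 0, "B2": 1, "C2": 2, "D2": 3, "E2": 4}

def deep_water_score(class1: str, weight1: int, weight_coeff: float = 0.108) -> float:
    """deepWaterScore = 5 - 0.28 * class1^2 - weight_coeff * weight1."""
    if weight1 not in range(0, 11):
        raise ValueError("weight1 must be an integer from 0 to 10")
    return 5 - 0.28 * CLASS1_VALUE[class1] ** 2 - weight_coeff * weight1

# Worked example (E2, weight 8) with the coefficient used in its arithmetic.
print(round(deep_water_score("E2", 8, weight_coeff=0.052), 3))
# Same inputs with the coefficient printed in the formula.
print(round(deep_water_score("E2", 8), 3))
```

With 0.052 the score for E2 and weight 8 is 0.104, as in the example; with 0.108 the same inputs give -0.344.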

[0180] Then, the waterScore and deepWaterScore of the duplicate element are compared, and the lower of the two is taken as the final water immersion accident score finalScore of the duplicate element, thus obtaining the water immersion accident score related to the input text. The finalScore represents the level and severity of the accident and is divided into three levels: flooding, blistering, and immersion. When finalScore ∈ [0, 1), the accident is at the flooding level; when finalScore ∈ [1, 2.5), the accident is at the blistering level; and when finalScore > 2.5, the accident is at the immersion level. Within the same level, a lower score indicates a more severe accident. For example, when the score is in the range [0, 1), the accident described in the input text is at the flooding level and the vehicle is identified as a flooded vehicle; the lower the score, the more severe the flooding.
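The final comparison and the mapping from score to level can be sketched in Python as below. The function names final_score and level are assumptions. The text does not assign a level to a score of exactly 2.5 or to negative scores, so the sketch returns None in those cases rather than guessing.

```python
from typing import Optional

def final_score(water: float, deep: Optional[float] = None) -> float:
    """Take the lower of waterScore and deepWaterScore; waterScore alone if no deep score."""
    return water if deep is None else min(water, deep)

def level(score: float) -> Optional[str]:
    """Map finalScore to one of the three levels named in the description."""
    if score > 2.5:
        return "immersion"
    if 2.5 > score >= 1:
        return "blistering"
    if 1 > score >= 0:
        return "flooding"
    return None  # exactly 2.5 or negative: no level is stated in the text

score = final_score(0.5, 0.104)
print(score, level(score))
```

Here the duplicate element keeps the lower deep score, which falls in [0, 1) and is therefore classified at the flooding level.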

[0181] The method for identifying flood-damaged vehicles based on vehicle maintenance data provided by this invention overcomes the difficulty of data collection by visualizing the data and continuously updating the corpus, and corrects the analysis results for flood-damage accidents. The corpus generated by feature engineering analysis can capture subtle features, resulting in higher accuracy in the final identification. The method can effectively handle complex correlations and nonlinear relationships in vehicle data; it improves the accuracy of constraint analysis methods and can discover potential flood-damage accidents through deep analysis; and it avoids exposing the personal privacy and sensitive information involved in vehicle maintenance data and vehicle accident data.

[0182] Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A method for identifying flood-damaged vehicles based on vehicle maintenance data, characterized in that it includes the following steps:

Data collection and analysis preprocessing: collect a dataset including the vehicle maintenance data and vehicle accident data, and perform quality control on the dataset, including data cleaning and outlier removal.

Data analysis: process the descriptive text, i.e., the input text, in the dataset obtained from the data collection and analysis preprocessing to obtain the water immersion accident score.

The data cleaning includes: removing license plate numbers from the collected data using a regular expression; removing ID card numbers from the collected data using a regular expression; removing phone numbers from the collected data using a regular expression; and removing part codes from the collected data using a regular expression. In each case, text in the collected data that matches the regular expression is found using the regular expression, and the matched text is then removed from the collected data by replacing it with an empty character.

The removal of outliers includes: removing province and city names and 4S store names from the data by text matching; removing descriptions of 4S store activities from the data by text regular expressions and similarity matching; and removing irrelevant vehicle condition information from the collected data by text matching.

The data analysis includes the following steps:

a. Perform word segmentation on the input text obtained through the data collection and analysis preprocessing to obtain a text list aList;

b. Segment and group the text list aList to obtain a complete semantic sentence list bList composed of complete semantic sentences;

c. Group the elements in the complete semantic sentence list bList into word groups to obtain the sentence list cList;

d. Based on the part-of-speech analysis of each element of the sentence list cList, combine the child elements of each element of the sentence list cList generated in step c into phrases, element by element, to generate a phrase binding relationship list nandVGroupList;

e. Assign category and weight values to each element in the phrase binding relationship list nandVGroupList to obtain the final phrase binding relationship list NANDVGroupList;

f. Based on the final phrase binding relationship list NANDVGroupList, calculate the water immersion accident score related to the input text.

Step a includes:

a1. Search for Chinese-format symbols in the input text;

a2. Replace the Chinese symbols found with the corresponding English symbols;

a3. If the input text contains spaces, remove the spaces from the input text;

a4. Retrieve keywords related to water immersion accidents from the corpus, and match these keywords against the input text from left to right, giving higher priority during matching to keywords containing more characters, thereby segmenting the input text into keywords and English symbols.

The text list aList is a two-dimensional list. Each row of the text list aList stores the word segmentation result of one input text. Each row includes more than one list unit. The list units include a sequence number unit and word segmentation units. The sequence number unit stores the sequence number of the input text. Each word segmentation unit stores one text word, i.e., a keyword, or one English symbol; word segmentation units correspond one-to-one with the text words and English symbols. The length of each row of the text list aList, i.e., the number of list units in the row, is determined by how many text words and English symbols are stored in the row.
The keywords and English symbols obtained by segmenting each input text are stored in the corresponding row of the text list aList in the order in which they appeared before segmentation.

Step b includes: the text in each row of the text list aList is segmented semantically, i.e., broken into sentences, based on punctuation marks. Periods, question marks, and exclamation marks divide the text before and after them into two semantically complete sentences; that is, periods, question marks, and exclamation marks are sentence break markers, or semantic end markers. For other symbols, it is assumed that the text before and after them is semantically related and belongs to the same semantically complete sentence.

The complete semantic sentence list bList is a two-dimensional table, where one dimension represents the elements of bList and the other dimension represents the child elements of those elements. After sentence segmentation, the words and symbols of each complete semantic sentence are stored in one element of bList, with sentences and elements in one-to-one correspondence. Each element includes at least one child element. Each child element is a data unit composed of the words of the corresponding complete semantic sentence, as separated in order by the other symbols. If the complete semantic sentence contains no other symbols, the words of the entire sentence are stored in one child element. If the complete semantic sentence contains n other symbols, where n is an integer not less than 1, and the sentence ends with one of the other symbols, the sentence is divided from front to back into n segments by the n other symbols, and the words of each segment are stored in one child element.
When the complete semantic sentence does not end with one of the other symbols, the sentence is divided from front to back into n+1 segments by the n other symbols, and the words of each segment are stored in one child element. Each other symbol is assigned to the child element containing the word immediately preceding it.

The sentence list cList is a two-dimensional table, where one dimension represents the elements of cList and the other dimension represents the child elements of those elements.

Step c includes: for an element A in the complete semantic sentence list bList, if its child elements do not contain any of the other symbols, each word in the child elements is stored in its own child element of a corresponding element B in the sentence list cList; in this case, element A in bList corresponds one-to-one with element B in cList. If a child element of element A ends with one of the other symbols, three cases are handled.

The first case is when the other symbol is a semicolon. When the ending symbol of a child element of element A is a semicolon, the words before the semicolon in the semantically complete sentence containing that child element are separated from element A and stored in an element BB of the sentence list cList, one word per child element of element BB. The remainder of element A after separation is then treated as a separate element and processed according to the three cases until all the words in element A have been processed.

The second case is when the other symbol is a colon. Let next denote the child element that follows the current child element within the current element of the complete semantic sentence list bList.
When the ending symbol of the current child element is a colon, if the child element next contains both a verb and a noun, and the word in the current child element is already contained in the child element next, the current child element is removed, that is, it is not stored in the sentence list cList. Otherwise, the colon is handled under the third case.

The third case is when the other symbol is a sentence-separating symbol: a comma, left parenthesis, right parenthesis, plus sign, minus sign, quotation mark, or colon (excluding colons handled in the second case). After the first and second cases have been processed, the child elements of the remaining elements in the complete semantic sentence list bList (each in turn being the second current child element) are processed sequentially as follows. When the second current child element ends with a sentence-separating symbol, if the next child element next1 after the second current child element contains both a verb and a noun, and the second current child element contains only a verb or only a noun, then each word in the second current child element and in the next child element next1 is stored in a separate child element of the corresponding element in the sentence list cList, after which next1 is skipped and processing continues. Otherwise, if the second current child element stores words, each word it contains is stored in a separate child element of the corresponding element in the sentence list cList, and the next child element next1 is then processed.
If the child elements of the currently processed element A do not end with a punctuation mark but do contain words, each word in the child elements of element A is stored in a separate child element of the corresponding element in the sentence list cList.

Step d includes: when performing the phrase combination, each element of the sentence list cList is processed in turn. For any element C, an initially empty temporary array tempNandList is created to record the temporary phrase combinations generated by combining the words stored in the child elements of element C, and the part of speech of the word stored in each child element of element C is checked. Two cases are distinguished according to the part of speech.

The first case is when the word stored in the current child element of element C is a noun; let the current child element be termN1. If there is a verb among the remaining unprocessed child elements of element C, combine the word stored in termN1 with each verb stored in the remaining unprocessed child elements of element C, one by one. For each combination, record as the distance distance1 of the combination the number of swaps between the current child element and its adjacent child elements needed until it is adjacent to the child element it is combined with. Exclude phrase combinations with distance1 > 2. Then compare each remaining phrase combination with the temporary phrase combinations in tempNandList one by one; if the phrase combination is not already in tempNandList, add it to tempNandList. If the remaining unprocessed child elements of element C do not store any verbs, check whether tempNandList contains a phrase combination whose noun is the word stored in the current child element termN1. If it does, no further processing is performed; otherwise, step c is repeated.
The second case is when the word stored in the current child element of element C is a verb; let the current child element be termV1. If there is a noun among the remaining unprocessed child elements of element C, the word stored in termV1 is combined with each noun stored in the remaining unprocessed child elements of element C, one by one, and the distance of each combination is denoted distance3. After phrase combinations with distance3 > 2 are excluded, each remaining phrase combination is compared with the temporary phrase combinations in tempNandList; if it is not already in tempNandList, it is added to tempNandList. If the remaining unprocessed child elements of element C do not store any nouns, check whether tempNandList contains a phrase combination whose verb is the word stored in the current child element termV1. If it does, no processing is performed; otherwise, step c is repeated.

When performing the phrase combination, after an element C has been checked, if tempNandList contains any phrase combinations, each of them is added as an element to the phrase binding relationship list nandVGroupList, and tempNandList is then emptied.

The phrase binding relationship list nandVGroupList is a two-dimensional table. One dimension represents the elements D of nandVGroupList, and the other dimension represents the four child elements of each element D. Two of the four child elements, termV and termN, are the verb and the noun of the phrase combination constituting element D. The other two child elements, class and weight, are the category value and weight value corresponding to that phrase combination.
One of these two values is set to null, and the other may be set to 0 or null.

Step e includes: the phrase binding relationship list nandVGroupList is traversed. For any element E, the keyword group that has the same verb and noun as element E is matched from the corpus. If a keyword group is matched, its category value and weight value are assigned to element E, and element E is retained in nandVGroupList; if no keyword group is matched, element E is deleted from nandVGroupList. Then nandVGroupList is filtered to remove any element E whose category value or weight value is null, resulting in the final phrase binding relationship list NANDVGroupList.

Wherein, each keyword group in the corpus includes one noun, one verb, a category value, and a weight value. The category values are A1, B1, C1, D1, and E1, which represent five accident levels of progressively increasing severity. The weight value is an integer from 0 to 10 that indicates the severity of the water immersion accident within the same category. When matched, the verb and noun of the same keyword group represent one operation in vehicle repair or maintenance.

Step f includes: based on the category value (class) and weight value (weight) of each element F in the final phrase binding relationship list NANDVGroupList, analyze whether the vehicle has experienced a water immersion accident, and calculate the water immersion score (waterScore) of element F. The category value of each element F in NANDVGroupList represents a score. As mentioned earlier, the category value (class) can be A1, B1, C1, D1, or E1. During calculation, each category value (class) corresponds to a number: A1 corresponds to 0, B1 to 0.1, C1 to 0.2, D1 to 0.5, and E1 to 1. The weight value (weight) is an integer from 0 to 10 that represents the severity of the water immersion accident within the same category.
The water immersion score (waterScore) satisfies: $\mathrm{waterScore} = 5.5 - 3 \times \mathrm{class}^{2} - 0.25 \times \mathrm{weight}$.

2. The method for identifying flood-damaged vehicles based on vehicle maintenance data according to claim 1, characterized in that the data analysis further includes step g, deep analysis: traverse the deep analysis phrase set, which includes at least one deep analysis phrase G, and determine whether the set contains a deep analysis phrase G all of whose phrase sub-elements are present in the final phrase binding relationship list NANDVGroupList. If such a phrase exists, then for each element F in NANDVGroupList that coincides with a phrase element of G (denoted the overlapping element), calculate a new score, deepWaterScore, based on the category value (class1) and weight value (weight1) of G; then compare the waterScore of the overlapping element with its deepWaterScore, and take the lower of the two as the final water immersion accident score (finalScore) of the overlapping element. If no such phrase exists, use the water immersion score (waterScore) calculated in step f as the final water immersion accident score (finalScore) of element F in NANDVGroupList.

Wherein, the deep analysis phrase G includes three data items: multiple phrase elements H, the category value class1, and the weight value weight1. Each phrase element H includes two phrase sub-elements: a verb sub-element termV2 and a noun sub-element termN2. A deep analysis phrase G is said to exist when each phrase element H of G corresponds one-to-one with a different element F of NANDVGroupList, such that the verb sub-element termV2 and noun sub-element termN2 of phrase element H are, respectively, the child elements termV and termN of the corresponding element F. The category values class1 of the deep analysis phrases are A2, B2, C2, D2, and E2, representing accident severity from low to high.
During calculation, each category value (class1) corresponds to a number: A2 corresponds to 0, B2 to 1, C2 to 2, D2 to 3, and E2 to 4. The weight value (weight1) is an integer from 0 to 10 that represents the severity of the water immersion accident within the same category. The deep score deepWaterScore satisfies: $\mathrm{deepWaterScore} = 5 - 0.28 \times \mathrm{class1}^{2} - 0.108 \times \mathrm{weight1}$.