Text correction method and device, electronic equipment and storage medium

By combining a text correction model with preset rules and knowledge graphs, the target text fragments are identified and a query corpus is constructed, which solves the problem of insufficient adaptability of existing models and achieves more efficient and accurate text correction.

CN122242453A · Pending · Publication Date: 2026-06-19 · TENCENT TECHNOLOGY (SHENZHEN) CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
TENCENT TECHNOLOGY (SHENZHEN) CO LTD
Filing Date
2024-12-17
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing text correction models rely on manually labeled training samples, which limits their application and makes them unable to adapt to various text error types.

Method used

The text correction model is combined with preset rules and a preset knowledge graph. The target text fragment is determined by word segmentation and triple association. The candidate text is updated using the query corpus to correct grammatical, phonetic, and visual errors.

Benefits of technology

It improves the accuracy and efficiency of text correction, especially in dealing with semantic errors caused by new words and low-frequency words, and has greater adaptability and accuracy.



Abstract

This application relates to a text correction method, apparatus, electronic device, and storage medium. The method includes taking a text to be processed as input, using a text correction model to output candidate texts corresponding to the text to be processed. The text correction model is trained based on multiple sample pairs. The error types of the erroneous texts include at least one of the following: grammatical errors, phonetic errors of text segments, and visual errors of text segments; determining target text segments from the candidate texts based on preset rules and a preset knowledge graph. The target text segments belong to target segment groups, which are determined based on word segmentation of the candidate texts, and the target segment groups and target triples are associated; performing a query in a preset query domain based on a query corpus to obtain corresponding query results; and updating the candidate texts using the query results to obtain the target text. This application provides a more adaptive text correction scheme.

Description

Technical Field

[0001] This application relates to the field of Internet communication technology, and in particular to a text error correction method, apparatus, electronic device and storage medium.

Background Technology

[0002] Artificial intelligence (AI) is the theory, methods, technology, and application systems that use digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to achieve optimal results. In other words, AI is a comprehensive technology within computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a way similar to human intelligence. AI studies the design principles and implementation methods of various intelligent machines, enabling them to possess the functions of perception, reasoning, and decision-making.

[0003] In related technologies, text correction models are used to correct errors in text. However, obtaining text correction models often relies on manually labeled training samples, which limits their application. Therefore, there is a need to provide more adaptive text correction solutions.

Summary of the Invention

[0004] To address at least one of the aforementioned technical problems, this application provides a text correction method, apparatus, electronic device, and storage medium:

[0005] According to a first aspect of this application, a text correction method is provided, the method comprising:

[0006] Taking the text to be processed as input, the text correction model outputs the candidate text corresponding to the text to be processed. The text correction model is obtained by training on multiple sample pairs. The sample pairs consist of erroneous text and corresponding correct text. The error type of the erroneous text includes at least one of the following: grammatical error type, phonetic error type of text fragment, and shape error type of text fragment.

[0007] Target text fragments are determined from the candidate texts based on preset rules and a preset knowledge graph. The target text fragments belong to target fragment groups. The target fragment groups are determined based on word segmentation of the candidate texts. The target fragment groups are associated with target triples. The target triples are any triples among multiple preset triples in the preset knowledge graph.

[0008] The query results are obtained by performing a query in a preset query domain based on the query corpus, and the candidate text is updated using the query results to obtain the target text. The query corpus is constructed based on the remaining text fragments in the target fragment group other than the target text fragment.
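The three steps of the first aspect can be sketched as a small orchestration function. This is a minimal illustration, not the implementation claimed by the application: the function name `correct_text`, the three injected callables, the space-joined query corpus, and the way the query result replaces the target fragment are all assumptions made for the example.

```python
from typing import Callable, Optional, Sequence, Tuple


def correct_text(
    text: str,
    correction_model: Callable[[str], str],
    find_target: Callable[[str], Optional[Tuple[Sequence[str], str]]],
    run_query: Callable[[str], str],
) -> str:
    """Model correction, target selection, then query-based update."""
    # Step 1: the trained correction model produces the candidate text.
    candidate = correction_model(text)

    # Step 2: preset rules plus the knowledge graph pick a target fragment
    # inside a target fragment group (None: nothing to verify).
    found = find_target(candidate)
    if found is None:
        return candidate
    group, target = found

    # Step 3: build the query corpus from the remaining fragments, query the
    # preset query domain, and update the candidate with the result.
    # Joining with spaces and replacing the target are illustrative choices.
    remaining = [frag for frag in group if frag != target]
    query_corpus = " ".join(remaining)
    result = run_query(query_corpus)
    return candidate.replace(target, result) if result else candidate
```

With stub callables, a spelling slip is fixed by the model stage and a factual slip by the query stage, for example turning "Pariss is the capital of Germany" into "Paris is the capital of France" when the query for "Paris capital" returns "France".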

[0009] According to a second aspect of this application, a text correction apparatus is provided, the apparatus comprising:

[0010] First text processing module: used to take the text to be processed as input, and output the candidate text corresponding to the text to be processed using a text error correction model. The text error correction model is obtained by training based on multiple sample pairs. The sample pairs consist of erroneous text and corresponding correct text. The error type of the erroneous text includes at least one of the following: grammatical error type, phonetic error type of text fragment, and shape error type of text fragment.

[0011] Text fragment determination module: used to determine target text fragments from the candidate text based on preset rules and preset knowledge graph. The target text fragments belong to target fragment groups. The target fragment groups are determined based on word segmentation processing of the candidate text. The target fragment groups are associated with target triples. The target triples are any triples among multiple preset triples in the preset knowledge graph.

[0012] The second text processing module is used to perform queries in a preset query domain based on the query corpus to obtain corresponding query results, and to update the candidate text using the query results to obtain the target text. The query corpus is constructed based on the remaining text fragments in the target fragment group excluding the target text fragment.

[0013] According to a third aspect of this application, an electronic device is provided, the electronic device including at least one processor and a memory communicatively connected to the at least one processor; wherein the memory stores at least one instruction or at least one program, the at least one instruction or at least one program being loaded and executed by the at least one processor to implement the text correction method as described in the first aspect.

[0014] According to a fourth aspect of this application, a computer-readable storage medium is provided, wherein at least one instruction or at least one program is stored therein, the at least one instruction or at least one program being loaded and executed by a processor to implement the text correction method as described in the first aspect.

[0015] According to a fifth aspect of this application, a computer program product is provided, the computer program product comprising at least one instruction or at least one program segment, the at least one instruction or at least one program segment being loaded and executed by a processor to implement the text correction method as described in the first aspect.

[0016] It should be understood that the above general description and the following detailed description are exemplary and explanatory only, and are not intended to limit this application.

[0017] Implementing this application will have the following beneficial effects:

[0018] This application provides a more adaptive text correction scheme. For the text to be processed, a trained text correction model first corrects errors of the relevant error types to obtain the candidate text. Target text fragments are then determined from the candidate text based on preset rules and a preset knowledge graph. A query corpus is constructed from the remaining text fragments in the target fragment group (excluding the target text fragment) and used to perform a query in a preset query domain, and the candidate text is updated based on the query results to obtain the target text. On the one hand, this application relies on the text correction model to correct at least one specific type of error, such as grammatical errors, phonetic errors, and visual errors; the adaptability and reliability of the trained model improve the accuracy of this correction and the efficiency of obtaining the candidate text. On the other hand, the multiple preset triples provided by the preset knowledge graph serve as references for the word segmentation results of the candidate text. With the preset triples and the fragment groups derived from the word segmentation results as matching elements, the target text fragment is determined from the target fragment group based on the preset rules. Because the target fragment group is associated with the target triple, the target triple helps define the dependency relationships between the text fragments in the target fragment group. The query corpus constructed on this basis therefore carries a query intent directed at the target text fragment, which supports updating the candidate text with the query results. Using the query results improves the correction of semantic errors caused by new words, low-frequency words, and the like, and thus the accuracy of the target text.

[0019] Other features and aspects of this application will become clear from the following detailed description of exemplary embodiments with reference to the accompanying drawings.

Attached Figure Description

[0020] To more clearly illustrate the technical solutions and advantages in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0021] Figure 1 is a schematic diagram of an application environment according to an embodiment of this application;

[0022] Figure 2 is a flowchart of a text correction method according to an embodiment of this application;

[0023] Figure 3 is a schematic flowchart of training a text correction model according to an embodiment of this application;

[0024] Figure 4 is a flowchart of determining a target text fragment according to an embodiment of this application;

[0025] Figure 5 is another flowchart of a text correction method according to an embodiment of this application;

[0026] Figure 6 is a schematic diagram of applying a text correction model according to an embodiment of this application;

[0027] Figure 7 is a schematic diagram of further text correction with reference to a preset knowledge graph according to an embodiment of this application;

[0028] Figure 8 is a block diagram of an apparatus according to an embodiment of this application;

[0029] Figure 9 is a schematic diagram of an electronic device according to an embodiment of this application.

Detailed Implementation

[0030] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those of ordinary skill in the art without creative effort are within the scope of protection of this application.

[0031] It should be noted that the terms "first," "second," etc., in the specification, claims, and accompanying drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of this application described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or server that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or devices.

[0032] In this application embodiment, the terms "module" or "unit" refer to a computer program or part of a computer program that has a predetermined function and works with other related parts to achieve a predetermined goal, and can be implemented wholly or partially using software, hardware (such as processing circuitry or memory), or a combination thereof. Similarly, a processor (or multiple processors or memory) can be used to implement one or more modules or units. Furthermore, each module or unit can be part of an overall module or unit that includes the functionality of that module or unit.

[0033] Various exemplary embodiments, features, and aspects of this application will now be described in detail with reference to the accompanying drawings. The same reference numerals in the drawings denote elements that have the same or similar functions. Although various aspects of the embodiments are shown in the drawings, they are not necessarily drawn to scale unless specifically indicated otherwise.

[0034] The term “exemplary” as used herein means “serving as an example, embodiment”. Any embodiment illustrated herein as “exemplary” is not necessarily to be construed as superior to or better than other embodiments.

[0035] In this document, the term "and/or" merely describes an association between related objects and indicates that three relationships can exist. For example, A and/or B can represent three cases: A alone, both A and B, and B alone. Furthermore, the term "at least one" in this document means any one of multiple elements or any combination of at least two of them. For example, including at least one of A, B, and C can mean including any one or more elements selected from the set consisting of A, B, and C.

[0036] Furthermore, to better illustrate this application, numerous specific details are provided in the following detailed description. Those skilled in the art should understand that this application can be implemented without certain specific details. In some instances, methods, means, components, and circuits well-known to those skilled in the art have not been described in detail in order to highlight the main points of this application.

[0037] Before providing a further detailed description of the embodiments of this application, the nouns and terms involved in the embodiments of this application will be explained, and the nouns and terms involved in the embodiments of this application shall be interpreted as follows.

[0038] Large Language Model (LLM): A large-scale machine learning model capable of processing, understanding, and generating natural language text.

[0039] prompt: The instruction or question given to the large language model to guide the model to generate specific output.

[0040] Figure 1 is a schematic diagram of an application environment according to an embodiment of this application. The application environment may include a terminal 10 and a server 20, which can be connected directly or indirectly via wired or wireless communication. The target object sends a text correction request to the server 20 through the terminal 10. The server 20 determines the text to be processed based on the received request and then, taking the text to be processed as input, uses a text correction model to output the corresponding candidate text. The text correction model is trained on multiple sample pairs, each consisting of an erroneous text and its corresponding correct text. The error types of the erroneous texts include at least one of the following: grammatical errors, phonetic errors of text segments, and visual errors of text segments. Next, based on preset rules and a preset knowledge graph, the target text segment is determined from the candidate text. The target text segment belongs to a target segment group, which is determined based on word segmentation of the candidate text. The target segment group is associated with a target triple, which is any one of multiple preset triples in the preset knowledge graph. Finally, a query is performed in a preset query domain based on the query corpus to obtain the corresponding query results, and the candidate text is updated using the query results to obtain the target text. The query corpus is constructed from the remaining text segments in the target segment group, excluding the target text segment. It should be noted that Figure 1 is merely an example.

[0041] Terminal 10 can be a physical device such as a smartphone, computer (e.g., desktop computer, tablet computer, laptop computer), augmented reality (AR) / virtual reality (VR) device, digital assistant, smart voice interaction device (e.g., smart speaker), smart wearable device, smart home appliance, in-vehicle terminal, etc. The operating system of Terminal 10 can be Android, iOS (a mobile operating system developed by Apple), Linux, Microsoft Windows, etc. Applications can be installed on Terminal 10, such as browser applications, news feed applications, instant messaging applications, online office applications, video applications, game applications, navigation applications, etc.

[0042] The server 20 can be a standalone physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms. The server may include network communication units, processors, memory, and so on.

[0043] In practical applications, it should be noted that for texts and sample pairs that are related to user information, when the embodiments of this application are applied to specific products or technologies, user permission or consent is required, and the collection, use and processing of related data must comply with the relevant laws, regulations and standards of the relevant countries and regions.

[0044] Figure 2 is a flowchart of a text correction method according to an embodiment of this application. This text correction method can be executed by an electronic device, which can be a terminal or a server. The method includes:

[0045] S201: Taking the text to be processed as input, the text correction model is used to output the candidate text corresponding to the text to be processed. The text correction model is obtained by training based on multiple sample pairs. Each sample pair consists of an erroneous text and the corresponding correct text. The error type of the erroneous text includes at least one of the following: grammatical error type, phonetic error type of text segment, and shape error type of text segment.

[0046] In this embodiment, the electronic device takes the text to be processed as input and uses the text correction model to output the corresponding candidate text. Compared with the text to be processed, the candidate text no longer contains errors of the relevant error types, that is, at least one specific type of error such as grammatical errors, phonetic errors of text fragments, and visual errors of text fragments. Leveraging the adaptability and reliability of the text correction model improves the accuracy of correcting these errors and the efficiency of obtaining the candidate text. In practical applications, the text to be processed can carry content such as news, articles, and novels.

[0047] Grammatical errors refer to language usage in a text that violates grammatical rules. Grammatical errors can indicate subject-verb disagreement, tense errors, word order errors, incorrect conjunctions, punctuation errors, and sentence structure errors, among others. For example, in text 1, "This book is very interesting, I like it very much," there is a punctuation error; specifically, a missing punctuation mark. The corrected version is "This book is very interesting, I like it very much." Similarly, in text 2, "The main ingredient of this medicine is formulated from natural plant extracts," there is a sentence structure error. The corrected version is "The main ingredient of this medicine is natural plant extracts."

[0048] A phonetic similarity error of a text segment means that the ideal text segment has been replaced by an incorrect text segment with a similar or even identical pronunciation. Taking Chinese text error correction as an example, a text segment can be a word consisting of at least one character. When the incorrect text segment sounds similar to the ideal text segment, the two characters at the same position in the two segments can have the same pronunciation and tone, the same pronunciation but different tones, similar pronunciation and the same tone, or similar pronunciation and different tones. For example, the ideal text segment is "可以" (kě yǐ) and the incorrect text segment is "可疑" (kě yí); or the ideal text segment is "她们" (tā men) and the incorrect text segment is "他们" (tā men).
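The four phonetic cases listed above can be illustrated with a small classifier over pinyin syllables. This is a sketch under stated assumptions: syllables are written as (toneless pinyin, tone number) pairs, and `SIMILAR_SYLLABLES` is a hypothetical one-entry table of "similar pronunciation" pairs, since the application does not define how similarity of pronunciation is measured.

```python
from typing import Optional, Tuple

# (toneless pinyin, tone number), e.g. 以 -> ("yi", 3)
Syllable = Tuple[str, int]

# Hypothetical table of syllable pairs treated as "similar pronunciation".
SIMILAR_SYLLABLES = {frozenset({"zhen", "zeng"})}


def classify_phonetic_pair(ideal: Syllable, wrong: Syllable) -> Optional[str]:
    """Return which of the four phonetic cases applies, or None if unrelated."""
    (base_a, tone_a), (base_b, tone_b) = ideal, wrong
    if base_a == base_b:
        sound = "same pronunciation"
    elif frozenset({base_a, base_b}) in SIMILAR_SYLLABLES:
        sound = "similar pronunciation"
    else:
        return None
    tone = "same tone" if tone_a == tone_b else "different tone"
    return f"{sound}, {tone}"
```

For the second characters of "可以" and "可疑", `classify_phonetic_pair(("yi", 3), ("yi", 2))` falls into the "same pronunciation, different tone" case; for "她" and "他" the pair `("ta", 1)`, `("ta", 1)` is the "same pronunciation, same tone" case.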

[0049] A shape similarity error of a text segment means that the ideal text segment has been replaced by an incorrect text segment with a similar shape. Taking Chinese text error correction as an example, a text segment can be a word consisting of at least one character. When the incorrect text segment is similar in shape to the ideal text segment, at least one character in the incorrect text segment is similar in shape to the character at the same position in the ideal text segment. For example, the ideal text segment is "碰撞" (pèng zhuàng) and the incorrect text segment is "碰拴" (pèng shuān); or the ideal text segment is "诸侯争霸" (zhū hóu zhēng bà) and the incorrect text segment is "渚候争霸" (zhǔ hóu zhēng bà).

[0050] As a possible implementation, as shown in Figure 3, the text error correction model is obtained through the following steps:

[0051] S301: Obtain a preset model and the multiple sample pairs;

[0052] S302: Use the preset model to perform text error correction on the incorrect text to obtain the corresponding corrected text;

[0053] S303: Based on the difference between the corrected text and the correct text, adjust the parameters of the preset model to obtain the text error correction model.

[0054] The two texts constituting the sample pair are one correct text and the other incorrect text. There can be at least one type of specific error in the incorrect text. Specific errors can be grammar errors, phonetic similarity errors of text segments, and shape similarity errors of text segments. For a certain type of specific error that exists, there can be at least one occurrence of this type of specific error in the incorrect text. For example, there can be one grammar error and two phonetic similarity errors of text segments in the incorrect text.

[0055] The preset model is responsible for correcting erroneous text: taking the erroneous text as input, it outputs the corresponding corrected text. To adjust the parameters of the preset model, a loss function can be constructed from the difference between the corrected text and the correct text and then used to update the parameters. Training the text correction model on the multiple sample pairs involves at least one parameter adjustment. Each adjustment is based on the difference between the predicted value (corrected text) and the ideal value (correct text) for each sample pair in a batch, where a batch can be all of the sample pairs or a portion of them. Training can be continued: a given text correction model can be the result of a previous round of training and the starting point for the next. The text correction model is a highly generalizable model obtained through training (supervised fine-tuning). Using it for text correction improves the adaptability and reliability of text correction.
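The application states only that a loss function is constructed from the difference between the corrected text and the correct text. One common choice for supervised fine-tuning of a language model, shown here purely as an assumption, is the token-level negative log-likelihood averaged over a batch. The symbols are introduced for this illustration: x_i is the erroneous text of the i-th sample pair, y_i its correct text with tokens y_{i,t}, B the batch, θ the model parameters, and η a learning rate.

```latex
\mathcal{L}(\theta)
  = -\frac{1}{|B|} \sum_{(x_i,\, y_i) \in B} \; \sum_{t=1}^{|y_i|}
    \log p_\theta\!\left(y_{i,t} \mid x_i,\; y_{i,<t}\right),
\qquad
\theta \leftarrow \theta - \eta \, \nabla_\theta \mathcal{L}(\theta)
```

Under this choice, each parameter adjustment described in S303 corresponds to one gradient step on a batch of sample pairs.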

[0056] In practical applications, the erroneous text can be input into the preset model together with relevant prompt text that guides the text correction. This guiding role can be carried by the prompt given as input: for the erroneous text and relevant prompt text input into the large language model, both serve as building blocks of a single prompt. The prompt input to the preset model is as follows: "[Error Text] Understand the true intent of the above text, identify the errors in the text, and fully restate the corrected text." The preset model used can be LLAMA-13B, LLAMA-30B, etc.
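Assembling the prompt from its two building blocks can be sketched as follows. The instruction string is the one quoted above; the function name `build_correction_prompt` and the single-space separator are assumptions for the example.

```python
# Instruction text quoted in the description; the erroneous text is placed
# in front of it where the template shows "[Error Text]".
CORRECTION_INSTRUCTION = (
    "Understand the true intent of the above text, identify the errors in "
    "the text, and fully restate the corrected text."
)


def build_correction_prompt(erroneous_text: str) -> str:
    """Combine the erroneous text and the guiding prompt text into one prompt."""
    return f"{erroneous_text} {CORRECTION_INSTRUCTION}"
```

The resulting string would then be passed to whichever preset model is used; the call to the model itself is outside this sketch.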

[0057] Furthermore, the sample pair is obtained through the following steps: First, the correct text is obtained; then, using the correct text and the preset prompt text as input, a large language model is used to introduce grammatical errors into the correct text to obtain candidate text; next, based on a preset vocabulary library, phonetic similarity errors and / or shape similarity errors are introduced into the candidate text to obtain the erroneous text; finally, the sample pair is constructed based on the correct text and the erroneous text.

[0058] Leveraging the natural language processing capabilities of large language models, and using pre-defined prompts as guidance, the efficiency and ease of introducing grammatical errors into text can be improved. Referring to a pre-defined vocabulary database to perform phonetic and / or visual confusion on text fragments can further enhance the text obfuscation effect. Based on this, constructing sample pairs using correct and incorrect text allows for the creation of rich sample pairs with a smaller amount of labeled correct text. This reduces the workload of collecting and labeling correct text, thereby improving the efficiency of sample pair construction.
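The sample-pair construction described above can be sketched as a function that takes the grammar-corrupting step as a callable and the preset vocabulary library as a mapping. The names `make_sample_pair`, `add_grammar_error`, `confusions`, and `max_swaps` are hypothetical, and the large language model call is left as an injected function rather than a real API.

```python
import random
from typing import Callable, Dict, List, Tuple


def make_sample_pair(
    correct_text: str,
    add_grammar_error: Callable[[str], str],
    confusions: Dict[str, List[str]],
    rng: random.Random,
    max_swaps: int = 2,
) -> Tuple[str, str]:
    """Return (erroneous_text, correct_text) built from one correct text."""
    # Step 2: a large language model (injected here) introduces a
    # grammatical error, giving the intermediate candidate text.
    candidate = add_grammar_error(correct_text)

    # Step 3: swap some fragments for phonetically or visually similar ones
    # taken from the preset vocabulary library.
    present = [frag for frag in confusions if frag in candidate]
    erroneous = candidate
    for frag in rng.sample(present, min(max_swaps, len(present))):
        erroneous = erroneous.replace(frag, rng.choice(confusions[frag]), 1)

    # Step 4: pair the erroneous text with the original correct text.
    return erroneous, correct_text
```

Because every correct text can be corrupted in many different ways, a small set of correct texts yields many sample pairs, which is the efficiency gain the paragraph above describes.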

[0059] For a preset character, the preset word library can provide at least one associated character that has the same pronunciation and tone, the same pronunciation but different tones, a similar pronunciation and the same tone, or a similar pronunciation and different tones as the preset character. Taking the preset character "睁" as an example, the associated character with the same pronunciation and tone as it is "征". Taking the preset character "政" as an example, the associated character with the same pronunciation but different tones as it is "挣". Taking the preset character "帧" as an example, the associated character with a similar pronunciation and the same tone as it is "增". Taking the preset character "诊" as an example, the associated character with a similar pronunciation and different tones as it is "停".

[0060] For a preset character, the preset word library can provide at least one associated character that is similar in shape to the preset character. Taking the preset character "作" as an example, the associated character similar in shape to it is "伊". Taking the preset character "冠" as an example, the associated character similar in shape to it is "赶".
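One possible in-memory layout for the preset vocabulary library is a mapping from relation kind to per-character lists of associated characters. The entries below are the examples given in the two preceding paragraphs; the layout, the key names, and the helper `associated_characters` are assumptions for illustration.

```python
from typing import Dict, Iterable, List, Optional

# Relation kind -> preset character -> associated characters.
PRESET_LIBRARY: Dict[str, Dict[str, List[str]]] = {
    "same_sound_same_tone": {"睁": ["征"]},
    "same_sound_diff_tone": {"政": ["挣"]},
    "similar_sound_same_tone": {"帧": ["增"]},
    "similar_sound_diff_tone": {"诊": ["停"]},
    "similar_shape": {"作": ["伊"], "冠": ["赶"]},
}


def associated_characters(
    char: str, kinds: Optional[Iterable[str]] = None
) -> List[str]:
    """Collect associated characters for `char`, optionally limited to some kinds."""
    selected = list(kinds) if kinds is not None else list(PRESET_LIBRARY)
    found: List[str] = []
    for kind in selected:
        found.extend(PRESET_LIBRARY[kind].get(char, []))
    return found
```

Restricting `kinds` lets a caller introduce only phonetic errors, only shape errors, or both, matching the "and/or" in the sample construction step.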

[0061] Exemplarily, taking the correct text "Today, with the rapid development of social economy, national reading is not only related to personal cultivation and quality, but also related to the soft power of national culture and core competitiveness." as an example, under the guidance of the preset prompt text, the candidate text obtained by the large language model processing the correct text is "Today, with the rapid development of social economy, national reading is not only related to the soft power of national culture and core competitiveness, but also related to personal cultivation and quality.", which has a word order error. Referring to the preset word library, "实力" in the candidate text can be replaced with "视力" (corresponding to the phonetic similarity error in the text segment), and "文化" in the candidate text can be replaced with "文代" (corresponding to the shape similarity error in the text segment). Thus, the obtained incorrect text is "Today, with the rapid development of social economy, national reading is not only related to the soft vision of national culture and core competitiveness, but also related to personal cultivation and quality.".

[0062] In practical applications, the guidance that the preset prompt text provides for introducing grammatical errors can be carried by the prompt given as input. The correct text and the preset prompt text input to the large language model serve as the building blocks of a prompt. The prompt input to the large language model is as follows: "Here is a piece of content, based on which help me generate text with incorrect grammar but similar meaning; the content is: [correct text]".

[0063] In addition, grammar errors such as missing punctuation marks can also be introduced without relying on the large language model. Such grammar errors of missing punctuation marks can be introduced to the candidate text through rules.
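A rule of this kind can be as simple as deleting one punctuation mark. The sketch below is one such rule under assumed details: the punctuation set `PUNCTUATION` and the function name `drop_one_punctuation` are illustrative, and the mark to delete is chosen at random.

```python
import random

# Illustrative set of ASCII and Chinese punctuation marks.
PUNCTUATION = ",.;:!?，。；：！？、"


def drop_one_punctuation(text: str, rng: random.Random) -> str:
    """Delete one randomly chosen punctuation mark; return text unchanged if none."""
    positions = [i for i, ch in enumerate(text) if ch in PUNCTUATION]
    if not positions:
        return text
    i = rng.choice(positions)
    return text[:i] + text[i + 1:]
```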

[0064] S202: Based on preset rules and a preset knowledge graph, a target text fragment is determined from the candidate text. The target text fragment belongs to a target fragment group. The target fragment group is determined based on the word segmentation processing of the candidate text. The target fragment group is associated with a target triple. The target triple is any triple among multiple preset triples in the preset knowledge graph.

[0065] In this embodiment, the electronic device determines the target text fragment from the candidate text based on preset rules and a preset knowledge graph. To do so, the candidate text can be segmented into words, and the resulting text fragments combined into fragment groups. The fragment groups are the first type of matching element, provided by the candidate text; the preset triples are the second type of matching element, provided by the preset knowledge graph. The two types of matching elements are matched against each other. If a fragment group matches a preset triple, that is, an association exists between them, the fragment group is taken as the target fragment group and the preset triple as the target triple. A fragment group matches a preset triple when the degree of matching between them meets preset matching requirements. The preset matching requirements can require that the fragment group contain at least two first-type text fragments, where a first-type text fragment is one that matches a text object in the preset triple. Whether a text fragment matches a text object can be determined by whether the semantic similarity between them is greater than a first similarity threshold. For example, fragment group X includes text fragments x1-x3, and preset triple Y includes text objects y1-y3. If x1 matches y1, x2 matches y2, and x3 matches none of y1-y3, then x1 and x2 are first-type text fragments, x3 is a second-type text fragment, and fragment group X matches preset triple Y; that is, fragment group X and preset triple Y are associated.
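The matching rule in the x1-x3 / y1-y3 example can be sketched as follows. The application calls for semantic similarity but does not fix how it is computed, so this sketch substitutes a character-overlap ratio from Python's `difflib` as a stand-in; that substitution, the threshold value 0.8, and the names `similarity` and `match_group` are all assumptions.

```python
from difflib import SequenceMatcher
from typing import List, Sequence, Tuple


def similarity(a: str, b: str) -> float:
    # Stand-in for semantic similarity (assumption): character overlap ratio.
    return SequenceMatcher(None, a, b).ratio()


def match_group(
    group: Sequence[str],
    triple: Sequence[str],
    threshold: float = 0.8,
    min_matches: int = 2,
) -> Tuple[bool, List[str], List[str]]:
    """Return (is_match, first_type_fragments, second_type_fragments)."""
    first_type: List[str] = []
    second_type: List[str] = []
    for frag in group:
        # A fragment is first-type if it matches some text object in the triple.
        if any(similarity(frag, obj) > threshold for obj in triple):
            first_type.append(frag)
        else:
            second_type.append(frag)
    # The group matches the triple when enough fragments are first-type.
    return len(first_type) >= min_matches, first_type, second_type
```

For a group `["Paris", "capital", "quickly"]` and a triple `("Paris", "capital", "France")`, the first two fragments are first-type, the third is second-type, and the group matches the triple. A production system would replace `similarity` with an embedding-based or other semantic measure.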

[0066] Based on the identified target fragment group, target text fragments can be determined from the target fragment group according to preset rules. It should be understood that, considering the difficulty of both timely updating the preset knowledge graph and ensuring the accuracy of its content, the preset knowledge graph may contain inaccurate content. The identified target text fragments can be considered as text fragments to be verified. Because there is a relationship between the target fragment group and the target triple, referring to the inherent structural characteristics of the target triple can better define the dependency relationships between text fragments in the target fragment group. Accordingly, a query corpus is subsequently constructed based on the remaining text fragments in the target fragment group excluding the target text fragment. This query corpus has a query intent that is specific to the target text fragment, and the target text fragment can then be verified through the query results.

[0067] The preset rules can specify that a text fragment is selected at random from the target fragment group as the target text fragment. The preset rules can also specify that a text fragment in the target fragment group that meets preset feature requirements is selected as the target text fragment. The preset feature requirements can constrain the target text fragment in terms of frequency of occurrence, popularity, and the like. For example, among the text fragments in the target fragment group, the one with the lowest frequency of occurrence is selected as the target text fragment, where the frequency of occurrence is determined according to a first statistical standard. Alternatively, the one with the lowest popularity is selected as the target text fragment, where the popularity is determined according to a second statistical standard. The preset feature requirements can also be set according to how the target fragment group and the target triple meet the preset matching requirements. For example, if the target fragment group contains two first-type text fragments, a first feature requirement applies; if it contains three first-type text fragments, a second feature requirement applies.
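The two kinds of preset rule mentioned above (random selection, and selection by lowest frequency or popularity) can be sketched as below. The function names and the sample counts are hypothetical; how the statistics are gathered (the first and second statistical standards) is not specified here and is simply passed in as a dictionary.

```python
import random


def pick_random(group, rng=None):
    """Rule A: pick any fragment of the target fragment group at random."""
    rng = rng or random.Random()
    return rng.choice(list(group))


def pick_lowest(group, scores):
    """Rule B: pick the fragment with the lowest score.

    `scores` maps a fragment to its frequency of occurrence or popularity.
    Fragments missing from `scores` count as 0 (an assumption); on ties the
    earliest fragment in the group wins.
    """
    return min(group, key=lambda fragment: scores.get(fragment, 0))


group = ("Movie 2", "sequel", "Movie 1")
frequency = {"Movie 2": 120, "sequel": 950, "Movie 1": 7}  # made-up counts
target = pick_lowest(group, frequency)
```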

[0068] In practical applications, the preset knowledge graph can be the current knowledge graph as a whole, which covers multiple preset domains; it can also be a subgraph of the current knowledge graph for a specific domain, where the specific domain is any one of the multiple preset domains. The text domain can be determined from the text to be processed or the candidate text, and the subgraph for that text domain can then be used as the preset knowledge graph. Combined with the verification of the target text fragment through query results, this balances the speed and accuracy of text correction to a certain extent: matching the fragment groups provided by the word segmentation results against the preset triples of the domain subgraph is more efficient than matching against the current knowledge graph as a whole, but the domain subgraph contains less content than the current knowledge graph, so verification through query results helps make up for what the subgraph lacks.
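A small sketch of the choice between the whole graph and a domain subgraph is given below. The storage layout (a dictionary from domain name to a list of triples), the function name, and the fallback to the whole graph when the domain is unknown are all assumptions made for illustration.

```python
def select_preset_graph(graph_by_domain, text_domain=None):
    """Return the preset triples to match fragment groups against.

    graph_by_domain: dict mapping a preset domain name to its list of triples.
    text_domain: the domain determined for the text, or None if unknown.
    """
    if text_domain in graph_by_domain:
        # Domain subgraph: fewer triples to scan, so matching is faster.
        return list(graph_by_domain[text_domain])
    # Assumed fallback: use the current knowledge graph as a whole.
    return [triple for triples in graph_by_domain.values() for triple in triples]


graph = {
    "film": [("Actor 1", "performance", "TV Series 1")],
    "sports": [("Team 1", "located in", "City 1")],
}
film_triples = select_preset_graph(graph, "film")
all_triples = select_preset_graph(graph)
```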

[0069] In addition, as shown in Figures 5-7, the preset knowledge graph, used as a reference, can also help overcome logical errors and semantic errors caused by common-sense issues in candidate texts, making the text more in line with user preferences. As a supplement, the preset knowledge graph has a strong ability to correct semantic errors caused by common-sense issues, and can proofread professional terms and concepts, enhancing the professionalism of common-sense proofreading.

[0070] As one possible implementation, as shown in Figure 4, before determining the target text fragment from the candidate text based on preset rules and a preset knowledge graph, the method further includes:

[0071] S401: Perform word segmentation on the candidate text to obtain multiple baseline text fragments;

[0072] S402: Based on the dependency relationships between text fragments and the plurality of baseline text fragments, construct a plurality of candidate fragment groups, each of the candidate fragment groups consisting of three of the baseline text fragments.

[0073] This describes how fragment groups are derived from the word segmentation results. The candidate text can be segmented according to preset word segmentation rules (based on, for example, punctuation marks or specific parts of speech), or by a relevant machine learning model. When constructing multiple candidate fragment groups based on the dependency relationships between text fragments, the positions of the text fragments within the candidate text and their parts of speech can be considered. The same baseline text fragment can participate in the construction of one or more candidate fragment groups, which increases the richness of the candidate fragment groups and thus facilitates determining the target fragment group by matching the candidate fragment groups against the preset triples. Furthermore, each constructed candidate fragment group consists of three baseline text fragments, which further facilitates matching against the preset triples.

[0074] Take the candidate text "Actor 1 and Actor 2 starred in 'Movie 2,' which is a sequel to the already released 'Movie 1'" as an example. After word segmentation, the following baseline text fragments are obtained: 1 "Actor 1", 2 "and", 3 "Actor 2", 4 "starred in", 5 "Movie 2", 6 "already released", 7 "Movie 1", and 8 "sequel". Among these, baseline text fragments 1, 3, 5, 7, and 8 are nouns, baseline text fragment 2 is a conjunction, baseline text fragment 4 is a verb, and baseline text fragment 6 is an adjective. Based on the positions of the baseline text fragments within the candidate text and their parts of speech, it can be determined that: baseline text fragment 1 "Actor 1" and baseline text fragment 3 "Actor 2" have a parallel relationship; baseline text fragment 1 "Actor 1" and baseline text fragment 4 "starred in" have a subject-predicate relationship; baseline text fragment 3 "Actor 2" and baseline text fragment 4 "starred in" have a subject-predicate relationship; baseline text fragment 4 "starred in" and baseline text fragment 5 "Movie 2" have a predicate-object relationship; baseline text fragment 6 "already released" and baseline text fragment 7 "Movie 1" have a modifying relationship; baseline text fragment 5 "Movie 2" and baseline text fragment 7 "Movie 1" have a contextual relationship, which is determined based on baseline text fragment 8 "sequel". Therefore, the constructed candidate fragment groups can include candidate fragment group 1 "Actor 1 + starred in + Movie 2", candidate fragment group 2 "Actor 2 + starred in + Movie 2", and candidate fragment group 3 "Movie 2 + sequel + Movie 1".
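The construction of three-fragment groups in this example can be sketched as follows. The sketch assumes that dependency analysis has already produced subject-predicate pairs, predicate-object pairs, and linked pairs (two fragments plus the fragment that relates them); that input format and the function name are hypothetical. Joining subject-predicate and predicate-object pairs on a shared predicate is one simple way to obtain the groups listed above.

```python
from itertools import product


def build_candidate_groups(subject_predicate, predicate_object, linked=()):
    """Build candidate fragment groups of exactly three baseline fragments.

    subject_predicate: iterable of (subject, predicate) pairs
    predicate_object:  iterable of (predicate, object) pairs
    linked:            iterable of (left, link, right) triples, e.g. two
                       fragments related through a third fragment
    """
    groups = []
    for (subj, pred), (pred2, obj) in product(subject_predicate, predicate_object):
        if pred == pred2:  # same predicate fragment joins the two relations
            groups.append((subj, pred, obj))
    groups.extend(tuple(item) for item in linked)
    return groups


subject_predicate = [("Actor 1", "starred in"), ("Actor 2", "starred in")]
predicate_object = [("starred in", "Movie 2")]
linked = [("Movie 2", "sequel", "Movie 1")]
groups = build_candidate_groups(subject_predicate, predicate_object, linked)
```

Note that "starred in" and "Movie 2" each take part in more than one group, which is the reuse of baseline text fragments described in the preceding paragraph.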

[0075] Furthermore, for step S202, determining the target text fragment from the candidate text based on preset rules and a preset knowledge graph includes:

[0076] S403: For each candidate fragment group, if the plurality of preset triples include the target triple that is associated with the candidate fragment group, the candidate fragment group is determined to be the target fragment group;

[0077] S4041: When the association between the target fragment group and the target triple is based on binary dimension matching, the target text fragment is determined from the target fragment group, and the target text fragment does not match any text object in the target triple;

[0078] S4042: When the association between the target fragment group and the target triple is based on ternary dimension matching, for each candidate text fragment in the target fragment group, determine the number of times the candidate text fragment appears in the output results of the text correction model within a preset time period; and, if the number of occurrences is less than a number threshold, determine the candidate text fragment as the target text fragment.

[0079] The construction of candidate fragment groups focuses on the dependency relationships between text fragments, whereas the preset triples in the preset knowledge graph are organized around the dimension combinations of text objects; these are two different perspectives. Consequently, for a given candidate fragment group, the multiple preset triples may or may not include a target triple that is associated with it.

[0080] When the multiple preset triples include a target triple that is associated with the candidate fragment group, then, in line with the aforementioned preset matching requirements and preset rules (or preset feature requirements), the association between the target fragment group and the target triple can be based on binary dimension matching, i.e., the target fragment group contains two first-type text fragments, or on ternary dimension matching, i.e., the target fragment group contains three first-type text fragments. Accordingly, step S4041 gives the first feature requirement under binary dimension matching, and step S4042 gives the second feature requirement under ternary dimension matching. The first and second feature requirements provide the basis for determining the target text fragment.

[0081] For example, fragment group X includes text fragments x1-x3, and preset triple Y includes text objects y1-y3. If text fragment x1 matches text object y1, text fragment x2 matches text object y2, and text fragment x3 does not match any of text objects y1-y3, then text fragments x1 and x2 are first-type text fragments and text fragment x3 is a second-type text fragment. Guided by the first feature requirement, text fragment x3 is therefore the target text fragment. In this case, the target text fragment is a new word that does not exist in the preset knowledge graph. Identifying the new word as the target text fragment and constructing a query corpus that targets it to obtain query results helps correct semantic errors caused by new words. Continuing the example above, take candidate fragment group 1 "Actor 1 + starred in + Movie 2" as the target fragment group and "Actor 1 - performance - TV Series 1" as the target triple; the target text fragment is then "Movie 2", a new word that does not exist in the preset knowledge graph. "Actor 1" and "starred in" subsequently participate in the construction of the query corpus. This effectively handles cases where some words have not yet been incorporated into the preset knowledge graph. Of course, the preset knowledge graph can also be updated in a timely manner based on the target fragment group "Actor 1 + starred in + Movie 2".
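Under binary dimension matching, picking the target text fragment and the fragments that go into the query corpus reduces to a set difference. The sketch below assumes the first-type fragments have already been identified; the function name is hypothetical.

```python
def split_binary_match(group, first_type):
    """Return (target_fragment, query_corpus_fragments) for a binary match.

    group:      the three fragments of the target fragment group
    first_type: the two fragments that matched text objects in the target triple
    """
    unmatched = [fragment for fragment in group if fragment not in first_type]
    if len(first_type) != 2 or len(unmatched) != 1:
        raise ValueError("not a binary dimension match")
    target = unmatched[0]  # the new word absent from the preset knowledge graph
    query_corpus = [fragment for fragment in group if fragment != target]
    return target, query_corpus


target, query_corpus = split_binary_match(
    ("Actor 1", "starred in", "Movie 2"),
    ["Actor 1", "starred in"],
)
```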

[0082] If text fragment x1 matches text object y1, text fragment x2 matches text object y2, and text fragment x3 matches text object y3, then text fragments x1, x2, and x3 are all first-type text fragments. For each text fragment xi (i = 1, 2, 3), the number of times xi appears in the output results of the text correction model within a preset time period is determined. If the number of occurrences is less than the number threshold, text fragment xi is determined as the target text fragment. The preset time period constrains the time dimension, and the output results of the text correction model constrain the data source dimension, which together limit the range of data to be counted. In this case, the target text fragment is a low-frequency word. Identifying the low-frequency word as the target text fragment and constructing a query corpus that targets it to obtain query results helps correct semantic errors caused by low-frequency words. Continuing the example above, take candidate fragment group 3 "Movie 2 + sequel + Movie 1" as the target fragment group and "Movie 2 - same-series movie - Movie 1" as the target triple; if the target text fragment is "Movie 1", then "Movie 1" is a low-frequency word, and "Movie 2" and "sequel" subsequently participate in the construction of the query corpus. This effectively handles cases where a word, after being added to the preset knowledge graph, becomes inaccurate without being noticed because it is rarely used. It can be understood that, after the text objects making up a triple are added to the preset knowledge graph, some of them may be superseded by other text objects over time or through semantic shift, yet go undetected because they are rarely used. Therefore, low-frequency word verification not only verifies text fragments in the candidate text but also supports timely updating of the preset knowledge graph, so that it better adapts to changes over time and to semantic shift.
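The low-frequency check under ternary dimension matching can be sketched as a count over a time-windowed log of model outputs. The log format (a list of timestamp and text pairs), the substring-based counting, the function name, and all sample values are assumptions for illustration.

```python
from datetime import datetime, timedelta


def low_frequency_fragments(group, output_log, now, window, number_threshold):
    """Return the fragments of `group` seen fewer than `number_threshold` times.

    output_log: iterable of (timestamp, text) pairs, one per output of the
                text correction model
    now, window: the preset time period is [now - window, now]
    """
    start = now - window
    recent = [text for timestamp, text in output_log if start <= timestamp <= now]
    counts = {f: sum(text.count(f) for text in recent) for f in group}
    return [f for f in group if counts[f] < number_threshold]


log = [
    (datetime(2024, 12, 16), "Movie 2 is a sequel"),
    (datetime(2024, 12, 15), "Movie 2 sequel news"),
    (datetime(2024, 11, 1), "Movie 1 was released"),  # outside the window
]
targets = low_frequency_fragments(
    ("Movie 2", "sequel", "Movie 1"),
    log,
    now=datetime(2024, 12, 17),
    window=timedelta(days=7),
    number_threshold=1,
)
```

Because the rule is applied to each fragment independently, more than one fragment can fall below the threshold, so the sketch returns a list.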

[0083] As one possible implementation, the object dimension group corresponding to the preset triplet is any dimension group in the preset dimension group set. The preset dimension group set includes a first dimension group, a second dimension group, and a third dimension group. The first dimension group consists of two entity dimensions and one relation dimension. The second dimension group consists of one entity dimension, one entity attribute dimension, and one entity attribute value dimension. The third dimension group consists of one relation dimension, one relation attribute dimension, and one relation attribute value dimension.

[0084] A preset triple consists of three text objects. The object dimensions of each of these three text objects together form an object dimension group. When the object dimension group is the first dimension group, the three text objects are entity 1, entity 2, and relation. When the object dimension group is the second dimension group, the three text objects are entity, entity attribute, and entity attribute value. When the object dimension group is the third dimension group, the three text objects are relation, relation attribute, and relation attribute value. The preset knowledge graph can contain these three types of preset triples. Guided by these three types of preset triples, the content in the constructed preset knowledge graph is more organized and logically structured, thus supporting the preset knowledge graph's participation in text correction.
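The three dimension groups can be represented as a small data model, sketched below. The enum, the use of `collections.Counter` to compare dimension groups without depending on the order of the text objects, and the function name are choices made for this illustration only.

```python
from collections import Counter
from enum import Enum


class Dimension(Enum):
    ENTITY = "entity"
    RELATION = "relation"
    ENTITY_ATTRIBUTE = "entity attribute"
    ENTITY_ATTRIBUTE_VALUE = "entity attribute value"
    RELATION_ATTRIBUTE = "relation attribute"
    RELATION_ATTRIBUTE_VALUE = "relation attribute value"


# The preset dimension group set: each group fixes how many text objects of
# each dimension a preset triple contains.
DIMENSION_GROUPS = {
    "first": Counter({Dimension.ENTITY: 2, Dimension.RELATION: 1}),
    "second": Counter({
        Dimension.ENTITY: 1,
        Dimension.ENTITY_ATTRIBUTE: 1,
        Dimension.ENTITY_ATTRIBUTE_VALUE: 1,
    }),
    "third": Counter({
        Dimension.RELATION: 1,
        Dimension.RELATION_ATTRIBUTE: 1,
        Dimension.RELATION_ATTRIBUTE_VALUE: 1,
    }),
}


def classify_triple(dimensions):
    """Return the name of the dimension group for a triple's three dimensions."""
    key = Counter(dimensions)
    for name, expected in DIMENSION_GROUPS.items():
        if key == expected:
            return name
    raise ValueError("not a valid preset triple")


kind = classify_triple((Dimension.ENTITY, Dimension.RELATION, Dimension.ENTITY))
```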

[0085] In the preset knowledge graph, an entity can refer to a specific object or concept, such as a person, place, organization, or event. For example, for a preset triple consisting of the aforementioned relation, relation attribute, and relation attribute value: if the relation "performance" exists between "actor" (entity 1) and "movie" (entity 2), the relation attribute could be the name of the role performed or the performance time. If the relation "direction" exists between "director" (entity 1) and "movie" (entity 2), the relation attribute could be the directing style. If the relation "films in the same series" exists between "movie 1" (entity 1) and "movie 2" (entity 2), the relation attribute could be the name of the sequel or the release date of the prequel.

[0086] In practical applications, entity types can be refined, entity attributes expanded, and entity relationships enriched according to business needs. For example, the entity "books" can be subdivided into "novels," "textbooks," etc. More descriptive attributes can be added to the entity "location," such as latitude and longitude, region, and surrounding scenic spots. More granular entity relationships can be defined based on common entity relationships such as "belongs to" and "located in," such as "influence," "oppose," and "use." Furthermore, relationship types can be refined; for example, the relationship "appointment" can be subdivided based on time and space.

[0087] In addition, multi-dimensional tags, such as sentiment, style, and function, can be added to entities, which helps locate entities when the preset knowledge graph is referenced.

[0088] S203: Based on the query corpus, perform a query in a preset query domain to obtain the corresponding query result, and use the query result to update the candidate text to obtain the target text. The query corpus is constructed based on the remaining text fragments in the target fragment group excluding the target text fragment.

[0089] In this embodiment, the electronic device performs a query in a preset query domain based on the query corpus to obtain the corresponding query results, and uses the query results to update candidate text to obtain target text. Since the determined target text fragment can be regarded as a text fragment to be verified, the verification of the target text fragment is completed through the query results. If the target text fragment does not match the query results, it can be considered that the target text fragment has failed verification. At this time, the target text fragment can be replaced with the query results to update the candidate text and obtain the target text. If the target text fragment matches the query results, it can be considered that the target text fragment has passed verification. At this time, the candidate text is maintained, that is, the candidate text is determined to be the target text. Whether the target text fragment matches the query results can be determined by whether the semantic similarity between the target text fragment and the query results is greater than a second similarity threshold.
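The verify-then-update step can be sketched as below. As a stand-in for semantic similarity, the sketch uses the character-level ratio from `difflib.SequenceMatcher`; that choice, the threshold value 0.8, and the function names are assumptions, not part of this application.

```python
from difflib import SequenceMatcher

SECOND_SIMILARITY_THRESHOLD = 0.8  # hypothetical value


def similarity(a: str, b: str) -> float:
    """Stand-in for semantic similarity: character-level ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def update_candidate(candidate_text, target_fragment, query_result,
                     threshold=SECOND_SIMILARITY_THRESHOLD):
    """Return the target text after verifying the target text fragment."""
    if similarity(target_fragment, query_result) > threshold:
        # Verification passed: the candidate text is kept as the target text.
        return candidate_text
    # Verification failed: replace the target fragment with the query result.
    return candidate_text.replace(target_fragment, query_result)


candidate = "Actor 1 starred in Movie 2"
target_text = update_candidate(candidate, "Movie 2", "TV Series 1")
```

A character-level ratio treats strings such as "Movie 2" and "Movie 3" as near-identical, which is why a real system would need a genuine semantic similarity measure here.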

[0090] A preset query domain can be a content pool. The content types in the content pool are not limited to text, images, audio, or video. A preset query domain can indicate a global content pool, which involves multiple preset domains; or it can indicate a local content pool under a specific domain, which can be any of the multiple preset domains. The text domain can be determined based on the text to be processed or candidate text, and then the local content pool indicating the text domain can be used as the preset query domain. In practical applications, the global content pool can consist of multiple sites providing online query services.

[0091] As a possible implementation, combining the first feature requirement under the binary dimension matching given in step S4041 above, new words not existing in the preset knowledge graph are identified as target text fragments. For step S203, the query results include multiple candidate words. Updating the candidate text using the query results to obtain the target text may include the following steps: First, determining whether there is a target candidate word among the multiple candidate words that matches the target text fragment; then, if the target candidate word exists, determining the candidate text as the target text. This takes into account the information richness of the query results obtained based on the query corpus, verifying the target text fragment by determining whether it appears in the query results.
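Checking whether any candidate word in the query results matches the target text fragment can be sketched as follows. Normalized string equality is used as a placeholder for the matching test, and a caller-supplied `matches` predicate can replace it; both the names and the placeholder are hypothetical. What happens when no candidate word matches is left to the caller, since this paragraph does not specify it.

```python
def normalize(word: str) -> str:
    """Case-fold and collapse whitespace so trivially different spellings agree."""
    return " ".join(word.casefold().split())


def find_target_candidate(target_fragment, candidate_words, matches=None):
    """Return the first candidate word matching the target fragment, or None."""
    if matches is None:
        matches = lambda a, b: normalize(a) == normalize(b)
    for word in candidate_words:
        if matches(target_fragment, word):
            return word
    return None


found = find_target_candidate("Movie 2", ["TV Series 1", "movie  2"])
# If `found` is not None, the candidate text is kept as the target text.
```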

[0092] As a possible implementation, in combination with the second feature requirement under ternary dimension matching given in step S4042 above, low-frequency words are determined as target text fragments. For step S203, after the query is performed in the preset query domain based on the query corpus to obtain the corresponding query result, the method may further include the following steps: first, determining the target text object in the target triple that matches the target text fragment; then, if the target text object does not match the query result, generating a reminder message indicating that the preset knowledge graph should be updated. This facilitates timely updating of the preset knowledge graph to support its subsequent participation in text correction.
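The reminder logic can be sketched as follows. The matching test is again a placeholder (case-insensitive equality), and the function names and message wording are invented for illustration.

```python
def same(a: str, b: str) -> bool:
    """Placeholder matching test: case-insensitive equality."""
    return a.casefold().strip() == b.casefold().strip()


def graph_update_reminder(target_triple, target_fragment, query_result, matches=same):
    """Return a reminder message, or None if no update is indicated."""
    # Step 1: find the text object in the target triple that matches the fragment.
    target_object = next(
        (obj for obj in target_triple if matches(obj, target_fragment)), None
    )
    # Step 2: remind only when that text object disagrees with the query result.
    if target_object is None or matches(target_object, query_result):
        return None
    return (
        f"Review preset knowledge graph: text object {target_object!r} in triple "
        f"{target_triple!r} does not match query result {query_result!r}"
    )


triple = ("Movie 2", "same-series movie", "Movie 1")
message = graph_update_reminder(triple, "Movie 1", "Movie 0")
```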

[0093] The text correction method provided in this application can be applied to scenarios such as text review, text analysis, machine translation, and speech recognition. Text correction improves the readability of the target text and enables it to convey information more accurately. The importance of text correction can be reflected in the following aspects:

[0094] 1) Improve communication efficiency: In written communication, whether in formal documents or daily conversations, spelling and grammatical errors can lead to misunderstandings or create communication barriers. The text correction method provided in this application can help ensure that information is conveyed clearly and accurately.

[0095] 2) Enhance text readability: Text with fewer errors is easier to read and understand, helping readers quickly grasp the author's intent and the core information of the content.

[0096] 3) Improve information quality: In applications such as information retrieval, data analysis, and machine learning, high-quality text is a prerequisite for obtaining accurate and reliable results.

[0097] 4) Improve user experience: Applying the text correction method provided in the embodiments of this application to perform automatic text correction in software interfaces, online services and content platforms can improve user experience and make the content published by users more standardized.

[0098] 5) Education and learning: The text correction method provided in this application embodiment can be embedded in language learning tools to help learners identify and correct language errors and improve their language skills.

[0099] 6) Maintaining the standardization of online communication: With the popularization of social media and online platforms, a large amount of user-generated content needs to be standardized and corrected in order to maintain a healthy and orderly online environment.

[0100] In practical applications, experiments have shown that, compared with the text correction models in related technologies, the model trained in the embodiments of this application achieves a better text correction effect, both on its own and when a preset knowledge graph is additionally referenced. See Table 1 below for details.

[0101]

[0102]

[0103] Table 1

[0104] The text correction method provided in this application is used to process the text to be processed to obtain the target text. See Table 2 below for reference:

[0105]

[0106] Table 2

[0107] As can be seen from the technical solutions above, the embodiments of this application provide a more adaptive text correction scheme. For the text to be processed, first, a trained text correction model is used to correct errors of the relevant error types to obtain the candidate text; then, based on preset rules and a preset knowledge graph, the target text fragment is determined from the candidate text; further, the remaining text fragments in the target fragment group, excluding the target text fragment, are used to construct a query corpus for querying in a preset query domain, and the candidate text is updated based on the query results to obtain the target text. On the one hand, the embodiments of this application rely on the text correction model to correct at least one specific error type among grammatical errors, phonetic errors of text fragments, and visual errors of text fragments; the adaptability and reliability of the trained text correction model improve the accuracy of text correction and the efficiency of obtaining the candidate text. On the other hand, for the word segmentation results of the candidate text, the multiple preset triples provided by the preset knowledge graph are used as references. With the preset triples and the fragment groups provided by the word segmentation results as matching elements, the target fragment group is identified, and the target text fragment is determined from the target fragment group based on preset rules. Because the target fragment group and the target triple are associated, the target triple helps define the dependency relationships between the text fragments in the target fragment group. On this basis, the constructed query corpus has a query intent that targets the target text fragment, which supports updating the candidate text with the query results. By leveraging the query results, the correction of semantic errors caused by new words, low-frequency words, and the like is improved, thus increasing the accuracy of the target text.
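The overall flow summarized above can be sketched as a small orchestration function. Every stage is passed in as a callable, and the stub stages in the example are hypothetical placeholders rather than real components; keeping the candidate text when no target fragment or no query result is found is also an assumption.

```python
def correct_text(text, model, find_target, query, matches):
    """Run the three stages: model correction, target selection, query update."""
    candidate = model(text)                # stage 1: text correction model
    found = find_target(candidate)         # stage 2: rules + knowledge graph
    if found is None:
        return candidate                   # nothing to verify
    target, remaining = found
    result = query(remaining)              # stage 3: query the preset query domain
    if result is None or matches(target, result):
        return candidate                   # verification passed or not possible
    return candidate.replace(target, result)


# Hypothetical stub stages, for illustration only.
def stub_model(text):
    return text.replace("stared", "starred")


def stub_find_target(candidate):
    if "Movie 2" in candidate:
        return "Movie 2", ["Actor 1", "starred in"]
    return None


def stub_query(remaining):
    return "TV Series 1"


def stub_matches(a, b):
    return a.casefold() == b.casefold()


target_text = correct_text(
    "Actor 1 stared in Movie 2", stub_model, stub_find_target, stub_query, stub_matches
)
```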

[0108] This application also provides a text correction device. As shown in Figure 8, the text correction device 80 includes:

[0109] First text processing module 801: used to take the text to be processed as input and output the candidate text corresponding to the text to be processed using a text correction model. The text correction model is trained on multiple sample pairs, each consisting of an erroneous text and the corresponding correct text. The error types of the erroneous text include at least one of the following: grammatical errors, phonetic errors of text fragments, and visual errors of text fragments.

[0110] Text fragment determination module 802: used to determine target text fragments from the candidate text based on preset rules and preset knowledge graphs. The target text fragments belong to target fragment groups. The target fragment groups are determined based on word segmentation processing of the candidate texts. The target fragment groups are associated with target triples. The target triples are any triples among multiple preset triples in the preset knowledge graph.

[0111] Second text processing module 803: used to perform a query in a preset query domain based on the query corpus to obtain the corresponding query result, and to update the candidate text using the query result to obtain the target text, wherein the query corpus is constructed based on the remaining text fragments in the target fragment group excluding the target text fragment.

[0112] In one embodiment, the apparatus further includes a candidate fragment group construction module;

[0113] The candidate fragment group construction module is used to perform word segmentation on the candidate text to obtain multiple baseline text fragments; and to construct multiple candidate fragment groups based on the dependency relationships between the text fragments and the multiple baseline text fragments, wherein each candidate fragment group consists of three baseline text fragments.

[0114] The step of determining the target text fragment from the candidate text based on preset rules and a preset knowledge graph includes: for each candidate fragment group, if the plurality of preset triples contain the target triple that is associated with the candidate fragment group, determining the candidate fragment group as the target fragment group; if the association between the target fragment group and the target triple is based on a binary-dimensional matching, determining the target text fragment from the target fragment group, wherein the target text fragment does not match any text object in the target triple.

[0115] In one embodiment, the apparatus further includes a candidate fragment group construction module;

[0116] The candidate fragment group construction module is used to perform word segmentation on the candidate text to obtain multiple baseline text fragments; and to construct multiple candidate fragment groups based on the dependency relationships between the text fragments and the multiple baseline text fragments, wherein each candidate fragment group consists of three baseline text fragments.

[0117] The step of determining the target text fragment from the candidate text based on preset rules and a preset knowledge graph includes: for each candidate fragment group, if the plurality of preset triples contain the target triple that is associated with the candidate fragment group, determining the candidate fragment group as the target fragment group; if the association between the target fragment group and the target triple is based on ternary dimension matching, for each candidate text fragment in the target fragment group, determining the number of times the candidate text fragment appears in the output results of the text correction model within a preset time period; and if the number of occurrences is less than a number threshold, determining the candidate text fragment as the target text fragment.

[0118] In one embodiment, after obtaining the corresponding query results by performing a query on a preset query domain based on the query corpus, the method further includes:

[0119] Identify the target text object in the target triplet that matches the target text fragment;

[0120] If the target text object does not match the query result, a reminder message is generated to update the preset knowledge graph.

[0121] In one embodiment, the object dimension group corresponding to the preset triplet is any dimension group in the preset dimension group set. The preset dimension group set includes a first dimension group, a second dimension group, and a third dimension group. The first dimension group consists of two entity dimensions and one relation dimension. The second dimension group consists of one entity dimension, one entity attribute dimension, and one entity attribute value dimension. The third dimension group consists of one relation dimension, one relation attribute dimension, and one relation attribute value dimension.

[0122] In one embodiment, the text correction model is trained by the following steps: obtaining a preset model and the plurality of sample pairs; using the preset model to correct the erroneous text to obtain the corresponding corrected text; and adjusting the parameters of the preset model based on the difference between the corrected text and the correct text to obtain the text correction model.

[0123] In one embodiment, the sample pair is obtained through the following steps: obtaining the correct text; using the correct text and a preset prompt text as input, introducing grammatical errors into the correct text using a large language model to obtain candidate text; introducing phonetic similarity errors and / or shape similarity errors into the candidate text based on a preset vocabulary library to obtain the erroneous text; and constructing the sample pair based on the correct text and the erroneous text.
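The sample pair construction can be sketched as follows. The large language model call is represented by a caller-supplied function `inject_grammar_error`, and the preset vocabulary library by a dictionary of confusable words; both, along with the English example words and the replacement rate, are hypothetical stand-ins.

```python
import random


def add_similarity_errors(text, confusion_vocab, rng, rate=0.3):
    """Swap words for phonetically or visually similar ones from the vocabulary."""
    words = []
    for word in text.split():
        options = confusion_vocab.get(word)
        if options and rng.random() < rate:
            word = rng.choice(options)
        words.append(word)
    return " ".join(words)


def build_sample_pair(correct_text, inject_grammar_error, confusion_vocab,
                      rng=None, rate=0.3):
    """Return (erroneous_text, correct_text)."""
    rng = rng or random.Random()
    # Step 1: grammatical errors (stands in for the prompted large language model).
    candidate = inject_grammar_error(correct_text)
    # Step 2: phonetic / visual similarity errors from the preset vocabulary.
    erroneous = add_similarity_errors(candidate, confusion_vocab, rng, rate)
    return erroneous, correct_text


vocab = {"their": ["there"], "form": ["from"]}  # made-up confusion sets
pair = build_sample_pair(
    "their results are in good form",
    lambda text: text.replace("are", "is"),  # toy grammar-error injector
    vocab,
    rng=random.Random(0),
    rate=1.0,  # replace every confusable word so the example is deterministic
)
```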

[0124] It should be noted that the apparatus in the apparatus embodiments and the method in the method embodiments are based on the same inventive concept.

[0125] In some embodiments, the functions or modules of the apparatus provided in this application can be used to perform the methods described in the above method embodiments. The specific implementation can be referred to the description of the above method embodiments, and for the sake of brevity, it will not be repeated here.

[0126] This application also provides a computer-readable storage medium storing at least one instruction or at least one program segment, which is loaded and executed by a processor to implement the above-described method. The computer-readable storage medium may be a non-volatile computer-readable storage medium.

[0127] This application also provides an electronic device, which includes at least one processor and a memory communicatively connected to the at least one processor; wherein the memory stores at least one instruction or at least one program, and the at least one instruction or at least one program is loaded and executed by the at least one processor to implement the above method.

[0128] Electronic devices can be provided as terminals, servers, or other forms of devices.

[0129] Figure 9 shows a block diagram of an electronic device according to an embodiment of this application. Referring to Figure 9, the electronic device 1900 includes a processing component 1922, which further includes one or more processors, and memory resources represented by memory 1932 for storing instructions executable by the processing component 1922. Furthermore, the processing component 1922 is configured to execute the instructions to perform the methods described above.

[0130] The electronic device 1900 may also include a power supply component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input / output (I / O) interface 1958. The electronic device 1900 can operate based on an operating system stored in memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or similar.

[0131] In an exemplary embodiment, a non-volatile computer-readable storage medium is also provided, such as a memory 1932 including computer program instructions that can be executed by a processing component 1922 of an electronic device 1900 to perform the above-described method.

[0132] This application may be a system, method, and / or computer program product. The computer program product may include a computer-readable storage medium carrying at least one instruction or at least one program segment for causing a processor to implement various aspects of this application.

[0133] A computer-readable storage medium can be a tangible device capable of holding and storing instructions for use by an instruction execution device. A computer-readable storage medium can be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination thereof. More specific examples (a non-exhaustive list) of computer-readable storage media include: portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), digital versatile disc (DVD), memory sticks, floppy disks, mechanically encoded devices such as punch cards or raised structures in a groove with instructions stored thereon, and any suitable combination thereof. A computer-readable storage medium as used herein is not to be construed as a transitory signal per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., light pulses through fiber optic cables), or electrical signals transmitted through wires.

[0134] At least one instruction or program segment described herein may be downloaded from a computer-readable storage medium to various computing / processing devices, or downloaded via a network, such as the Internet, a local area network, a wide area network, and / or a wireless network, to an external computer or external storage device. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers, and / or edge servers. A network adapter card or network interface in each computing / processing device receives at least one instruction or at least one program segment from the network and forwards the instruction or program segment to a computer-readable storage medium in the respective computing / processing device.

[0135] The at least one instruction or at least one program segment used to perform the operations of this application may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The at least one instruction or at least one program segment may be executed entirely on the user's computer, partially on the user's computer, as a standalone software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server. In cases involving a remote computer, the remote computer may be connected to the user's computer via any type of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (e.g., via the Internet using an Internet service provider). In some embodiments, electronic circuits, such as programmable logic circuits, field-programmable gate arrays (FPGAs), or programmable logic arrays (PLAs), are personalized by utilizing state information of the at least one instruction or at least one program segment to implement various aspects of this application.

[0136] Various aspects of this application are described herein with reference to flowchart illustrations and / or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of this application. It should be understood that each block of the flowchart illustrations and / or block diagrams, and combinations of blocks in the flowchart illustrations and / or block diagrams, can be implemented by at least one instruction or at least one program segment.

[0137] The at least one instruction or at least one program segment can be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine such that, when executed by the processor of the computer or other programmable data processing apparatus, it creates means for implementing the functions / actions specified in one or more blocks of the flowchart and / or block diagram. The at least one instruction or at least one program segment can also be stored in a computer-readable storage medium that causes a computer, programmable data processing apparatus, and / or other device to operate in a particular manner; thus, the computer-readable medium storing the instructions comprises an article of manufacture that includes instructions for implementing aspects of the functions / actions specified in one or more blocks of the flowchart and / or block diagram.

[0138] The at least one instruction or at least one program segment may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable data processing apparatus, or other device to produce a computer-implemented process, such that the instructions executed on the computer, other programmable data processing apparatus, or other device implement the functions / actions specified in one or more blocks of the flowchart and / or block diagram.

[0139] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of this application. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of an instruction, which includes one or more executable instructions for implementing a specified logical function. In some alternative implementations, the functions specified in the blocks may occur in a different order than those specified in the drawings. For example, two consecutive blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and / or flowcharts, and combinations of blocks in the block diagrams and / or flowcharts, can be implemented using a dedicated hardware-based system that performs the specified function or action, or using a combination of dedicated hardware and computer instructions.

[0140] The various embodiments of this application have been described above. These descriptions are exemplary and not exhaustive, and they are not limited to the disclosed embodiments. Many modifications and variations will be apparent to those skilled in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen to best explain the principles of the embodiments, their practical applications, or their technical improvements over technologies in the market, or to enable others skilled in the art to understand the embodiments disclosed herein.

Claims

1. A text correction method, characterized in that the method includes: taking a text to be processed as input and using a text correction model to output a candidate text corresponding to the text to be processed, wherein the text correction model is obtained by training on multiple sample pairs, each sample pair consists of an erroneous text and a corresponding correct text, and the error type of the erroneous text includes at least one of the following: a grammatical error type, a phonetic error type of a text fragment, and a shape error type of a text fragment; determining a target text fragment from the candidate text based on preset rules and a preset knowledge graph, wherein the target text fragment belongs to a target fragment group, the target fragment group is determined based on word segmentation of the candidate text, the target fragment group is associated with a target triple, and the target triple is any one of multiple preset triples in the preset knowledge graph; and performing a query in a preset query domain based on a query corpus to obtain a corresponding query result, and updating the candidate text using the query result to obtain a target text, wherein the query corpus is constructed based on the remaining text fragments in the target fragment group other than the target text fragment.

2. The method according to claim 1, characterized in that, before determining the target text fragment from the candidate text based on the preset rules and the preset knowledge graph, the method further includes: performing word segmentation on the candidate text to obtain multiple benchmark text fragments; and constructing multiple candidate fragment groups based on the dependency relationships between text fragments and the multiple benchmark text fragments, each candidate fragment group consisting of three benchmark text fragments; and determining the target text fragment from the candidate text based on the preset rules and the preset knowledge graph includes: for each candidate fragment group, if the multiple preset triples include a target triple associated with the candidate fragment group, determining the candidate fragment group as the target fragment group; and when the association between the target fragment group and the target triple is based on two-dimension matching, determining the target text fragment from the target fragment group, wherein the target text fragment does not match any text object in the target triple.

3. The method according to claim 1, characterized in that, before determining the target text fragment from the candidate text based on the preset rules and the preset knowledge graph, the method further includes: performing word segmentation on the candidate text to obtain multiple benchmark text fragments; and constructing multiple candidate fragment groups based on the dependency relationships between text fragments and the multiple benchmark text fragments, each candidate fragment group consisting of three benchmark text fragments; and determining the target text fragment from the candidate text based on the preset rules and the preset knowledge graph includes: for each candidate fragment group, if the multiple preset triples include a target triple associated with the candidate fragment group, determining the candidate fragment group as the target fragment group; when the association between the target fragment group and the target triple is based on three-dimension matching, determining, for each candidate text fragment in the target fragment group, the number of times the candidate text fragment appears in the output of the text correction model within a preset time period; and if the number of occurrences is less than a threshold, determining the candidate text fragment as the target text fragment.
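As an illustration of the selection logic recited in claims 2 and 3, a minimal sketch is given below. The function names, the exact-string matching rule, the requirement of at least two matching fragments for an association, and the default threshold are all assumptions made for the example; the claims do not fix how the association between a fragment group and a triple is computed.

```python
def find_target_triple(group, triples):
    """Return (triple, hits) for the preset triple sharing the most fragments with the group.

    Assumption: a group is associated with a triple when at least two of its
    three fragments equal a text object in the triple.
    """
    best, best_hits = None, 0
    for triple in triples:
        hits = sum(1 for frag in group if frag in triple)
        if hits >= 2 and hits > best_hits:
            best, best_hits = triple, hits
    return best, best_hits


def select_target_fragments(group, triples, output_counts, threshold=3):
    """Pick the suspicious fragment(s) in a three-fragment candidate group."""
    triple, hits = find_target_triple(group, triples)
    if triple is None:
        return []  # not a target fragment group
    if hits == 2:
        # Two-dimension match: the fragment matching no text object is the target.
        return [f for f in group if f not in triple]
    # Three-dimension match: fall back to how rarely the model has output the fragment.
    return [f for f in group if output_counts.get(f, 0) < threshold]


def build_query_corpus(group, target):
    """The query corpus is built from the fragments other than the target fragment."""
    return [f for f in group if f != target]
```

Here `output_counts` is assumed to be a dictionary recording how often each fragment appeared in the model's output during the preset time period; how those counts are collected is outside the scope of the sketch.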

4. The method according to claim 3, characterized in that, after performing the query in the preset query domain based on the query corpus to obtain the corresponding query result, the method further includes: identifying the target text object in the target triple that matches the target text fragment; and if the target text object does not match the query result, generating a reminder message prompting an update of the preset knowledge graph.

5. The method according to any one of claims 1-4, characterized in that the object dimension group corresponding to each preset triple is any dimension group in a preset dimension group set; the preset dimension group set includes a first dimension group, a second dimension group, and a third dimension group; the first dimension group consists of two entity dimensions and one relation dimension; the second dimension group consists of one entity dimension, one entity attribute dimension, and one entity attribute value dimension; and the third dimension group consists of one relation dimension, one relation attribute dimension, and one relation attribute value dimension.
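The three dimension groups of claim 5 can be written down as a small data structure. The sketch below is only one possible encoding: the `Dim` enumeration, the group labels, and the `classify_triple` helper are hypothetical names introduced for illustration.

```python
from collections import Counter
from enum import Enum


class Dim(Enum):
    """Dimensions that a text object in a preset triple may belong to."""
    ENTITY = "entity"
    RELATION = "relation"
    ENTITY_ATTR = "entity_attribute"
    ENTITY_ATTR_VALUE = "entity_attribute_value"
    RELATION_ATTR = "relation_attribute"
    RELATION_ATTR_VALUE = "relation_attribute_value"


# Each dimension group is a multiset of three dimensions, as listed in claim 5.
DIMENSION_GROUPS = {
    "first": Counter({Dim.ENTITY: 2, Dim.RELATION: 1}),
    "second": Counter({Dim.ENTITY: 1, Dim.ENTITY_ATTR: 1, Dim.ENTITY_ATTR_VALUE: 1}),
    "third": Counter({Dim.RELATION: 1, Dim.RELATION_ATTR: 1, Dim.RELATION_ATTR_VALUE: 1}),
}


def classify_triple(dims):
    """Return the label of the dimension group matching the triple's dimensions, or None."""
    counts = Counter(dims)
    for name, group in DIMENSION_GROUPS.items():
        if counts == group:
            return name
    return None
```

Comparing multisets rather than ordered tuples means the order in which the three text objects are listed does not affect the classification.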

6. The method according to claim 1, characterized in that the text correction model is trained through the following steps: obtaining a preset model and the multiple sample pairs; correcting the erroneous text using the preset model to obtain a corresponding corrected text; and adjusting the parameters of the preset model based on the difference between the corrected text and the correct text to obtain the text correction model.

7. The method according to claim 1 or 6, characterized in that the sample pair is obtained through the following steps: obtaining the correct text; using the correct text and a preset prompt text as input, introducing grammatical errors into the correct text using a large language model to obtain a candidate text; introducing phonetic similarity errors and / or shape similarity errors into the candidate text based on a preset vocabulary library to obtain the erroneous text; and constructing the sample pair based on the correct text and the erroneous text.

8. A text correction device, characterized in that the device includes: a first text processing module, configured to take a text to be processed as input and use a text correction model to output a candidate text corresponding to the text to be processed, wherein the text correction model is obtained by training on multiple sample pairs, each sample pair consists of an erroneous text and a corresponding correct text, and the error type of the erroneous text includes at least one of the following: a grammatical error type, a phonetic error type of a text fragment, and a shape error type of a text fragment; a text fragment determination module, configured to determine a target text fragment from the candidate text based on preset rules and a preset knowledge graph, wherein the target text fragment belongs to a target fragment group, the target fragment group is determined based on word segmentation of the candidate text, the target fragment group is associated with a target triple, and the target triple is any one of multiple preset triples in the preset knowledge graph; and a second text processing module, configured to perform a query in a preset query domain based on a query corpus to obtain a corresponding query result, and to update the candidate text using the query result to obtain a target text, wherein the query corpus is constructed based on the remaining text fragments in the target fragment group other than the target text fragment.

9. An electronic device, characterized in that the electronic device includes at least one processor and a memory communicatively connected to the at least one processor; wherein the memory stores at least one instruction or at least one program, the at least one instruction or at least one program being loaded and executed by the at least one processor to implement the text correction method as described in any one of claims 1-7.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores at least one instruction or at least one program, which is loaded and executed by a processor to implement the text correction method as described in any one of claims 1-7.