A text processing method and device, electronic equipment, storage medium and program product

By identifying the target domain in text processing and using relevant string matching algorithms and a pre-defined vocabulary, the uncontrollability problem in existing text processing technologies is solved, achieving determinism and efficiency in text processing.

CN122242500A · Pending · Publication Date: 2026-06-19 · CHINA UNICOM (SHANGHAI) IND INTERNET CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
CHINA UNICOM (SHANGHAI) IND INTERNET CO LTD
Filing Date
2026-03-17
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing text processing technologies lack determinism and controllability when processing text, and cannot be applied to scenarios that require rule constraints or normalization.

Method used

By determining the target domain of the text to be processed, a string matching algorithm related to that domain is used to traverse the text, identify the target words to be processed, and replace them with target replacement words. By using a pre-set vocabulary list, the determinism and controllability of text processing are achieved.

Benefits of technology

It achieves determinism and controllability in the text processing process, improves text processing efficiency, and ensures the consistency of the processed text.

✦ Generated by Eureka AI based on patent content.

Figure CN122242500A_ABST

Abstract

This invention discloses a text processing method, apparatus, electronic device, storage medium, and program product. Specific implementation includes: determining the text to be processed and the string matching algorithm corresponding to the text; traversing the text to be processed based on the string matching algorithm to obtain at least one target word to be processed; replacing each target word in the text to be processed with a corresponding target replacement word to obtain the processed text. By traversing the text to be processed using the string matching algorithm to obtain the target word, parallel matching of the text to be processed is achieved, improving text processing efficiency. By using a preset vocabulary to replace the target words in the text to be processed with target replacement words, the determinism and controllability of the text processing process are achieved, ensuring the consistency of the processed text.

Description

Technical Field

[0001] This invention relates to the field of computer technology, and in particular to a text processing method, apparatus, electronic device, storage medium, and program product.

Background Technology

[0002] In the field of natural language processing, most current text processing technologies are based on word-by-word traversal, regular expressions, or manual rule judgment to achieve the scanning and replacement of words in text.

[0003] Existing text processing technologies replace the scanned words with randomly or probabilistically generated replacement words. As a result, text processing with these technologies is uncontrollable and unsuitable for scenarios that require rule-based constraints or standardization.

Summary of the Invention

[0004] This invention provides a text processing method, apparatus, electronic device, storage medium, and program product to achieve determinism and controllability in text processing and improve text processing efficiency.

[0005] According to one aspect of the present invention, a text processing method is provided, comprising: determining the text to be processed and the string matching algorithm corresponding to the text to be processed, wherein the string matching algorithm is related to the target domain to which the text to be processed belongs; traversing the text to be processed based on the string matching algorithm to obtain at least one target word to be processed, wherein the target word to be processed includes a word contained in the text to be processed; and replacing each target word to be processed contained in the text to be processed with the target replacement word corresponding to that target word, to obtain the processed text, wherein the target replacement words include the replacement words that a preset word list indicates for the target words to be processed, and the preset word list is related to the target domain.

[0006] According to another aspect of the present invention, a text processing apparatus is provided, comprising: a determination module, used to determine the text to be processed and the string matching algorithm corresponding to the text to be processed, wherein the string matching algorithm is related to the target domain to which the text to be processed belongs; a traversal module, used to traverse the text to be processed based on the string matching algorithm to obtain at least one target word to be processed, wherein the target word to be processed includes a word contained in the text to be processed; and a replacement module, used to replace each target word to be processed contained in the text to be processed with the target replacement word corresponding to that target word, to obtain the processed text, wherein the target replacement words include the replacement words that a preset word list indicates for the target words to be processed, and the preset word list is related to the target domain.

[0007] According to another aspect of the present invention, an electronic device is provided, the electronic device comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores a computer program executable by the at least one processor, and the computer program, when executed by the at least one processor, enables the at least one processor to perform the text processing method according to any embodiment of the present invention.

[0008] According to another aspect of the present invention, a computer-readable storage medium is provided, the computer-readable storage medium storing computer instructions for causing a processor to execute and implement the text processing method described in any embodiment of the present invention.

[0009] According to another aspect of the present invention, a computer program product is provided, the computer program product comprising a computer program that, when executed by a processor, implements the text processing method described in any embodiment of the present invention.

[0010] The technical solution of this invention involves determining the text to be processed and the string matching algorithm corresponding to the text; traversing the text to be processed based on the string matching algorithm to obtain at least one target word to be processed; and replacing each target word in the text to be processed with a corresponding target replacement word to obtain the processed text. By traversing the text to be processed using the string matching algorithm to obtain the target word, parallel matching of the text to be processed is achieved, improving text processing efficiency. By using a preset vocabulary to replace the target words in the text to be processed with target replacement words, the determinism and controllability of the text processing process are achieved, ensuring the consistency of the processed text.

[0011] It should be understood that the description in this section is not intended to identify key or essential features of the embodiments of the present invention, nor is it intended to limit the scope of the invention. Other features of the invention will become readily apparent from the following description.

Brief Description of the Drawings

[0012] To more clearly illustrate the technical solutions in the embodiments of the present invention, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0013] Figure 1 is a flowchart of a text processing method provided according to Embodiment 1 of the present invention; Figure 2 is a flowchart of an algorithm determination method provided according to Embodiment 2 of the present invention; Figure 3 is a schematic structural diagram of a text processing device according to Embodiment 3 of the present invention; Figure 4 is a block diagram of an electronic device provided according to Embodiment 4 of the present invention.

Detailed Implementation

[0014] To enable those skilled in the art to better understand the present invention, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present invention. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort should fall within the scope of protection of the present invention.

[0015] It should be noted that the terms "first," "second," etc., in the specification, claims, and accompanying drawings of this invention are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of the invention described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover a non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.

Embodiment 1

[0016] Figure 1 is a flowchart of a text processing method according to Embodiment 1 of the present invention. This embodiment is applicable to situations involving text processing. The method can be executed by a text processing device, which can be implemented in hardware and/or software. The text processing device can be configured in an electronic device, such as a PC or a server. As shown in Figure 1, the method includes:

S110. Determine the text to be processed and the string matching algorithm corresponding to the text to be processed.

[0017] The string matching algorithm is related to the target domain to which the text to be processed belongs.

[0018] In this embodiment, the text to be processed can be understood as the text that needs to be replaced with words. The text to be processed can be a sentence or paragraph composed of at least one word. The string matching algorithm can be understood as an algorithm used to match the words contained in the text to be processed. The string matching algorithm can be an algorithm determined by the target domain to which the text to be processed belongs.

[0019] Specifically, the system receives text input manually by the user, which consists of at least one word. It then determines the target domain to which the text belongs, which can be determined by the source of the words comprising the text. Based on this target domain, it determines a string matching algorithm corresponding to that target domain, which performs matching searches on the words contained in the text.

[0020] For example, a string matching algorithm can be a domain-specific algorithm based on the Aho-Corasick automaton. The Aho-Corasick automaton algorithm includes state transition relations, mismatch pointers, and an output set to support parallel matching of multiple words.
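The three components named above can be illustrated with a minimal Python sketch of a textbook Aho-Corasick automaton. It is an illustration under assumptions, not the implementation disclosed in the patent: the function names `build_automaton` and `find_all` and the sample word list are hypothetical.

```python
from collections import deque


def build_automaton(words):
    """Build the state transitions, mismatch (failure) pointers, and output sets."""
    goto = [{}]    # goto[state][char] -> next state (state transition relation)
    fail = [0]     # fail[state] -> state to fall back to on a mismatch
    out = [set()]  # out[state] -> words recognized when this state is reached
    for word in words:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(word)
    # Breadth-first pass: shallower states are finished before deeper ones.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] |= out[fail[nxt]]  # inherit words that end at the fallback state
    return goto, fail, out


def find_all(text, automaton):
    """Scan the text once and return (start offset, word) for every occurrence."""
    goto, fail, out = automaton
    state = 0
    hits = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in out[state]:
            hits.append((i - len(word) + 1, word))
    return hits


automaton = build_automaton(["he", "she", "his", "hers"])
print(sorted(find_all("ushers", automaton)))
```

Because all words share one automaton, the text is read once regardless of how many words the vocabulary holds, which is the sense in which the matching is "parallel".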

[0021] S120. Based on the string matching algorithm, traverse the text to be processed to obtain at least one target word to be processed.

[0022] The target words to be processed include the words contained in the text to be processed.

[0023] In this embodiment, the target word to be processed can be understood as a word matched by a string matching algorithm in the text to be processed.

[0024] Specifically, a string matching algorithm iterates through the text to be processed character by character, identifying words in the text that match a pre-defined vocabulary list corresponding to the string matching algorithm, thus obtaining at least one target word contained in the text to be processed. The pre-defined vocabulary list can be understood as a pre-determined table for storing words in the target domain. This list can also serve as a carrier for information processing rules, which are the rules for processing the text to be processed.

[0025] S130. Replace each of the target words to be processed in the text to be processed with the target replacement words corresponding to each of the target words to be processed, and obtain the processed text.

[0026] The target replacement words include the replacement words corresponding to the target words to be processed as indicated by a preset word list, and the preset word list is related to the target domain.

[0027] In this embodiment, the target replacement word can be understood as the replacement word corresponding to the target word to be processed indicated by the preset word list. One target word to be processed can correspond to multiple target replacement words, and the target word to be processed and the target replacement words can have the same meaning. The processed text can be understood as the text obtained after word replacement of the text to be processed.

[0028] Specifically, the target replacement word corresponding to each target word in the preset vocabulary is determined, and each target word in the text to be processed is replaced with its corresponding target replacement word. After replacing each target word with its corresponding target replacement word, the processed text is obtained.
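As a rough Python sketch of step S130, the function below replaces already-located target words using a preset vocabulary. The name `apply_replacements`, the `(start, word)` match format, the sample vocabulary, and the choice of always taking the first listed replacement are assumptions made for illustration; the source does not prescribe them.

```python
def apply_replacements(text, matches, vocabulary):
    """Replace each matched target word with its replacement from the vocabulary.

    matches:    non-overlapping (start offset, target word) pairs
    vocabulary: target word -> list of replacement words
    """
    result = text
    # Work from right to left so earlier offsets stay valid after each edit.
    for start, word in sorted(matches, reverse=True):
        replacement = vocabulary[word][0]  # assumption: always pick the first entry
        result = result[:start] + replacement + result[start + len(word):]
    return result


vocabulary = {"pt": ["patient"], "htn": ["hypertension"]}
print(apply_replacements("the pt has htn", [(4, "pt"), (11, "htn")], vocabulary))
```

Since the replacement comes from a fixed table and a fixed selection rule, the same input always yields the same output, which is the determinism the method aims for.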

[0029] For example, when replacing a target word with a target replacement word, the longest matching principle is followed. That is, when one target word is completely contained by another target word, the longer target word is replaced with the corresponding target replacement word.
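One possible reading of the longest matching principle is sketched below in Python. The function name `select_longest` and the `(start, word)` match format are hypothetical. The tie-breaking for partially overlapping matches (the leftmost match wins) is an added assumption, since the source only addresses the case where one word fully contains another.

```python
def select_longest(matches):
    """Keep a non-overlapping subset of (start, word) matches.

    Matches are visited left to right; at the same start the longer word
    comes first, so a word contained in a longer kept match is dropped.
    """
    ordered = sorted(matches, key=lambda m: (m[0], -len(m[1])))
    kept = []
    cursor = 0  # first offset not yet covered by a kept match
    for start, word in ordered:
        if start >= cursor:
            kept.append((start, word))
            cursor = start + len(word)
    return kept


print(select_longest([(0, "New"), (0, "New York"), (4, "York")]))
```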

[0030] The technical solution of this invention involves determining the text to be processed and the string matching algorithm corresponding to the text; traversing the text to be processed based on the string matching algorithm to obtain at least one target word to be processed; and replacing each target word in the text to be processed with a corresponding target replacement word to obtain the processed text. By traversing the text to be processed using the string matching algorithm to obtain the target word, parallel matching of the text to be processed is achieved, improving text processing efficiency. By using a preset vocabulary to replace the target words in the text to be processed with target replacement words, the determinism and controllability of the text processing process are achieved, ensuring the consistency of the processed text.

[0031] Based on the above embodiments, modified embodiments of the above embodiments are proposed. It should be noted that, in order to keep the description brief, only the differences from the above embodiments are described in the modified embodiments.

[0032] In one embodiment, the step of traversing the text to be processed based on the string matching algorithm to obtain at least one target word to be processed includes performing the following operations based on the string matching algorithm: determining the target word set corresponding to the string matching algorithm; traversing the text to be processed character by character to determine at least one text word contained in the text to be processed; for each text word, determining the matching result between the text word and the target word set, wherein the matching result indicates whether the text word has successfully matched the target word set; and, if the matching result indicates that the text word successfully matches the target word set, identifying the text word as a target word to be processed, determining the position information of the target word to be processed in the text to be processed, and adding an information tag at the position indicated by the position information.

[0033] In this embodiment, the target word set can be understood as the set of target words stored in the preset vocabulary corresponding to the string matching algorithm. Text words can be understood as words obtained from the text to be processed by traversing each character; text words can be the words that make up the text to be processed. Matching results can be understood as the matching results between text words and the target word set; the matching results are used to indicate whether the text words and the target words contained in the target word set are successfully matched. Position information can be understood as the position of the target word to be processed in the text to be processed. Information tags can be understood as the information of tags added at the position indicated by the target word to be processed in the text to be processed; tags can be parentheses, etc.

[0034] Specifically, the process involves a character-by-character traversal of the text to be processed based on a string matching algorithm: The target word set corresponding to the string matching algorithm is determined. Target words can be extracted from a pre-defined vocabulary list corresponding to the string matching algorithm, and the tree structure formed by these target words is used as the target word set. Next, the text to be processed is traversed character by character to obtain at least one text word that constitutes the text. For each text word, it is matched against the target words contained in the target word set, obtaining a matching result indicating whether the text word has successfully matched the target word set. If the matching result indicates that the text word has not successfully matched the target word set, it means that the text word does not need to be replaced; if the matching result indicates that the text word has successfully matched the target word set, it means that the text word needs to be replaced, and this text word is identified as the target word to be processed. The position information of the matched target word in the text to be processed is recorded, and an information label, such as parentheses, is added at the position indicated by the position information.
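The traversal, position recording, and tagging described above can be sketched in Python as follows. For brevity this uses a plain trie (nested dictionaries) with a restart at each position rather than a full automaton with mismatch pointers. The names `build_trie` and `tag_targets`, the `"$end"` marker key, and the parenthesis tags are illustrative assumptions.

```python
def build_trie(words):
    """Store the target words as a tree of nested dicts (the target word set)."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node["$end"] = word  # multi-character key, so it cannot clash with a character
    return root


def tag_targets(text, target_words, open_tag="(", close_tag=")"):
    """Return the tagged text and the (start, end) span of every matched word."""
    trie = build_trie(target_words)
    pieces, positions = [], []
    i = 0
    while i < len(text):
        node, j, found = trie, i, None
        # Walk the trie as far as the text allows, remembering the longest word seen.
        while j < len(text) and text[j] in node:
            node = node[text[j]]
            j += 1
            if "$end" in node:
                found = node["$end"]
        if found:
            positions.append((i, i + len(found)))          # position information
            pieces.append(open_tag + found + close_tag)    # information tag
            i += len(found)
        else:
            pieces.append(text[i])  # no target word starts here
            i += 1
    return "".join(pieces), positions


print(tag_targets("pt has htn", ["pt", "htn"]))
```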

[0035] For example, the time complexity of a string matching algorithm when iterating through the text to be processed can be calculated. The string matching algorithm achieves parallel matching of multiple pattern strings by constructing a state transition function and mismatch pointers. Its matching time complexity is $O(N + M + Z)$, where N is the length of the text to be processed, M is the sum of the lengths of all target words in the preset vocabulary, and Z is the number of target words matched in the text to be processed.

[0036] In one embodiment, replacing each of the target words to be processed in the text to be processed with the target replacement words corresponding to each of the target words to be processed to obtain the processed text includes: For each target word contained in the text to be processed, the target replacement word corresponding to the target word is searched in the preset word list, and the replacement information corresponding to the target word is determined. Based on the replacement information, the replacement processing rule corresponding to the target word is determined. The replacement information includes the position information and information tag corresponding to the target word. The replacement processing rule is used to replace the target word with the target replacement word. Based on the replacement processing rules, each of the target words to be processed contained in the text to be processed is replaced with the target replacement word corresponding to each target word to be processed, so as to obtain the processed text.

[0037] In this embodiment, replacement information can be understood as information used to replace the target word to be processed with the target replacement word. Replacement information may include position information and information tags. Replacement processing rules can be understood as rules formulated based on replacement information, such as tag priority rules or matching start position priority rules.

[0038] Specifically, for each target word to be processed, the corresponding target replacement word is searched in a preset vocabulary. One target word in the preset vocabulary can correspond to multiple target replacement words, and one of these can be selected. The position information and information tags corresponding to the target word are used as the replacement information. The replacement processing rule corresponding to the target word is determined based on the replacement information; this can be a matching start position priority rule determined by the position information, or a tag priority rule determined by the tag information. According to the replacement processing rule corresponding to each target word, the target word is replaced with its corresponding target replacement word. After all target words in the text to be processed have been replaced, the processed text is obtained.

[0039] Optionally, determining the replacement processing rule corresponding to the target word to be processed based on the replacement information includes: Determine the location information and information tags included in the replacement information; If the location information and the location indicated by the information tag are the same, the replacement processing rule is determined to be to replace the target word to be processed based on the location information; If the location information and the location indicated by the information tag are different, the replacement processing rule is determined to be to replace the target word to be processed based on the information tag.

[0040] Specifically, if the locations indicated by the location information and the information tags are the same, it means that there is no overlap or conflict in the positions of the target words to be processed. The replacement rule is then determined to be based on the location information, i.e., a starting position priority rule. If the locations indicated by the location information and the information tags are different, it means that there is overlap or conflict in the positions of the target words to be processed. The replacement rule is then determined to be based on the information tags, i.e., a tag priority rule.
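The rule selection above reduces to a single comparison, sketched here in Python. The `ReplacementInfo` fields, the function name `choose_rule`, and the string labels for the two rules are hypothetical; the source does not specify how position information and information tags are represented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplacementInfo:
    word: str
    position: int      # start offset recorded during traversal
    tag_position: int  # offset at which the information tag was placed


def choose_rule(info):
    """Pick the replacement processing rule for one target word."""
    if info.position == info.tag_position:
        # Same location: no overlap or conflict, so replace by start position.
        return "start_position_priority"
    # Different locations: overlap or conflict, so let the tag decide.
    return "tag_priority"


print(choose_rule(ReplacementInfo("pt", position=4, tag_position=4)))
print(choose_rule(ReplacementInfo("htn", position=11, tag_position=9)))
```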

Embodiment 2

[0041] Figure 2 is a flowchart of an algorithm determination method according to Embodiment 2 of the present invention. This embodiment focuses on the string matching algorithm determination method described in the above embodiment. As shown in Figure 2, the method includes:

S210. Determine the target domain to which the text to be processed belongs.

[0042] Specifically, the text words that make up the text to be processed are identified, and the target domain is determined by comprehensively considering the source of each text word.

[0043] S220. Determine a set of matching algorithms, wherein the set of matching algorithms includes at least one candidate string matching algorithm.

[0044] In this embodiment, the matching algorithm set can be understood as a collection used to store candidate string matching algorithms. These candidate string matching algorithms can be algorithms used to traverse and match text from different domains.

[0045] Specifically, algorithms for traversing and matching text in different domains are identified as candidate string matching algorithms. Existing candidate string matching algorithms are then stored in a matching algorithm set.

[0046] S230. If there is a target string matching algorithm in the matching algorithm set that is related to the target domain, then the target string matching algorithm shall be used as the string matching algorithm corresponding to the text to be processed.

[0047] In this embodiment, the target string matching algorithm can be understood as the candidate string matching algorithm that is related to the target domain among the candidate string matching algorithms.

[0048] Specifically, if among the candidate string matching algorithms included in the matching algorithm set, there exists a target string matching algorithm that is related to the target domain, it means that the algorithm used to traverse the text to be processed already exists, and the target string matching algorithm can be used as the string matching algorithm corresponding to the text to be processed.

[0049] S240. Otherwise, construct a string matching algorithm corresponding to the text to be processed based on the target domain.

[0050] Specifically, if none of the candidate string matching algorithms in the matching algorithm set are related to the target domain, it means that no algorithm exists for traversing the text to be processed. In this case, a string matching algorithm corresponding to the text to be processed needs to be constructed based on a pre-defined vocabulary corresponding to the target domain. This string matching algorithm can then be stored in the matching algorithm set as the string matching algorithm corresponding to that target domain.
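Steps S220 to S240 amount to a lookup-or-build pattern keyed by domain, sketched below in Python. The class `MatcherRegistry`, the stand-in `build_matcher` function (a naive substring check, used only to keep the example short), and the sample vocabulary are assumptions for illustration.

```python
def build_matcher(vocabulary):
    """Hypothetical builder: returns a function listing vocabulary words found in a text."""
    words = sorted(vocabulary, key=len, reverse=True)

    def matcher(text):
        return [word for word in words if word in text]

    return matcher


class MatcherRegistry:
    """Plays the role of the matching algorithm set, keyed by target domain."""

    def __init__(self, vocabularies):
        self._vocabularies = vocabularies  # domain -> {target word: [replacements]}
        self._matchers = {}                # domain -> matcher built so far

    def get(self, domain):
        if domain not in self._matchers:
            # S240: nothing stored for this domain yet, so build and store one.
            self._matchers[domain] = build_matcher(self._vocabularies[domain])
        # S230: reuse the matcher already stored for this domain.
        return self._matchers[domain]


registry = MatcherRegistry({"medical": {"htn": ["hypertension"]}})
matcher = registry.get("medical")
print(matcher("pt has htn"))
```

Storing the constructed matcher means the construction cost is paid once per domain rather than once per text.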

[0051] Optionally, the step of constructing a string matching algorithm corresponding to the text to be processed based on the target domain includes: Determine a preset vocabulary list corresponding to the target domain, wherein the preset vocabulary list contains at least one preset target word; The tree structure composed of the at least one preset target word is used as the target word set; Construct a string matching algorithm corresponding to the text to be processed, wherein the string matching algorithm uses the target word set when traversing the text to be processed.

[0052] In this embodiment, the preset target word can be understood as the target word stored in the preset word list, and the preset target word can be a word in the target field.

[0053] For example, a preset vocabulary is constructed, which includes at least one preset target word. The preset target word can be custom external information, and it belongs to the same target domain as the text to be processed. The tree structure formed by the preset target words in the preset vocabulary is used as the target word set, and a string matching algorithm corresponding to the text to be processed is constructed based on this target word set. This string matching algorithm uses the target word set when traversing the text to be processed; that is, the string matching algorithm uses the preset target words in this target word set as the basis for traversal.

[0054] Optionally, determining the preset vocabulary corresponding to the target domain includes: determining at least one preset target word corresponding to the target domain, wherein the domain to which the preset target word belongs is the target domain; for each preset target word, finding the corresponding preset replacement word, wherein the domain of the preset replacement word is the target domain; and storing each preset target word and its corresponding preset replacement word in the preset word list corresponding to the target domain.

[0055] In this embodiment, the preset replacement word can be understood as the replacement word corresponding to the preset target word stored in the preset word list. Both the preset replacement word and the preset target word are words in the target field. The preset replacement word can be used to replace the preset target word.

[0056] Specifically, at least one preset target word is identified in the target domain. For each preset target word, preset replacement words in the same domain that can be used to replace the preset target word are found. Multiple preset replacement words can be found for each preset target word. Each preset target word and its corresponding preset replacement words are stored in a preset vocabulary list corresponding to the target domain. The preset vocabulary list can be determined before processing the text to be processed, thus ensuring that the subsequent replacement process of the text to be processed is constrained by the preset vocabulary list.
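A preset vocabulary of this kind can be modeled as a mapping from each target word to an ordered list of replacement words. The Python sketch below assumes that representation; the function name `build_vocabulary` and the sample entries are hypothetical.

```python
def build_vocabulary(pairs):
    """Collect (target word, replacement word) pairs into a preset vocabulary.

    Returns a dict mapping each target word to its replacement words,
    in first-seen order and without duplicates.
    """
    vocabulary = {}
    for target, replacement in pairs:
        options = vocabulary.setdefault(target, [])
        if replacement not in options:
            options.append(replacement)
    return vocabulary


medical = build_vocabulary([
    ("htn", "hypertension"),
    ("htn", "high blood pressure"),
    ("pt", "patient"),
    ("htn", "hypertension"),  # duplicate entry, ignored
])
print(medical)
```

Keeping the replacements in a stable order lets a later step select one by a fixed rule (for example, the first entry), so the choice does not vary between runs.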

[0057] S250. Based on the string matching algorithm, traverse the text to be processed to obtain at least one target word to be processed.

[0058] S260. Replace each of the target words to be processed in the text to be processed with the target replacement words corresponding to each target word to obtain the processed text.

[0059] The technical solution of this invention involves determining the target domain to which the text to be processed belongs; determining a set of matching algorithms; if the set of matching algorithms contains a target string matching algorithm related to the target domain, then that target string matching algorithm is used as the string matching algorithm corresponding to the text to be processed; otherwise, a string matching algorithm corresponding to the text to be processed is constructed based on the target domain. By determining the string matching algorithm corresponding to the target domain, the determinism and controllability of the text processing process are achieved, ensuring the consistency of the processed text, and parallel matching of the text to be processed is realized through the string matching algorithm.

Embodiment 3

[0060] Figure 3 is a schematic structural diagram of a text processing device according to Embodiment 3 of the present invention. As shown in Figure 3, the device includes: a determining module 310, used to determine the text to be processed and the string matching algorithm corresponding to the text to be processed, wherein the string matching algorithm is related to the target domain to which the text to be processed belongs; a traversal module 320, used to traverse the text to be processed based on the string matching algorithm to obtain at least one target word to be processed, wherein the target word to be processed includes a word contained in the text to be processed; and a replacement module 330, used to replace each target word to be processed contained in the text to be processed with the target replacement word corresponding to that target word, to obtain the processed text, wherein the target replacement words include the replacement words that a preset word list indicates for the target words to be processed, and the preset word list is related to the target domain.

[0061] The text processing apparatus provided in this embodiment of the invention determines the text to be processed and the string matching algorithm corresponding to the text through a determining module; it traverses the text to be processed based on the string matching algorithm through a traversal module to obtain at least one target word to be processed; and a replacement module replaces each target word in the text to be processed with a corresponding target replacement word to obtain the processed text. Through the cooperation between the modules, the string matching algorithm traverses the text to be processed to obtain the target words, achieving parallel matching of the text to be processed and improving text processing efficiency. By using a preset vocabulary to replace the target words in the text to be processed with target replacement words, the determinism and controllability of the text processing process are achieved, ensuring the consistency of the processed text.

[0062] In one embodiment, the determining module 310 includes: a first determining unit, used to determine the target domain to which the text to be processed belongs; a second determining unit, used to determine a matching algorithm set, wherein the matching algorithm set includes at least one candidate string matching algorithm; a matching unit, used to take a target string matching algorithm as the string matching algorithm corresponding to the text to be processed if the matching algorithm set contains a target string matching algorithm related to the target domain; and a construction unit, used to otherwise construct, based on the target domain, a string matching algorithm corresponding to the text to be processed.
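The selection logic of the determining module can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the names `Matcher`, `build_matcher` and `select_matcher`, the domain keys, and the sample words are all hypothetical, and a naive substring check stands in for whatever string matching algorithm an embodiment would actually use.

```python
from typing import Callable, Dict, List

# Hypothetical type: a matcher takes a text and returns the vocabulary words found in it.
Matcher = Callable[[str], List[str]]


def build_matcher(words: List[str]) -> Matcher:
    """Stand-in for the construction unit: build a naive substring matcher."""
    def matcher(text: str) -> List[str]:
        return [w for w in words if w in text]
    return matcher


def select_matcher(
    domain: str,
    candidates: Dict[str, Matcher],
    vocabularies: Dict[str, Dict[str, str]],
) -> Matcher:
    """Use the candidate registered for the domain, otherwise construct one."""
    if domain in candidates:
        return candidates[domain]
    # Fallback branch: construct a matcher from the domain's preset word list.
    return build_matcher(list(vocabularies.get(domain, {})))


# Illustrative data only; these domains and words do not come from the source.
candidates = {"medical": build_matcher(["MI"])}
vocabularies = {"telecom": {"5G NR": "5G New Radio"}}
telecom_matcher = select_matcher("telecom", candidates, vocabularies)
found = telecom_matcher("the site uses 5G NR")
```

Keying the candidate set by domain is one possible reading of "related to the target domain"; the source does not say how that relation is stored or tested.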

[0063] In one embodiment, the construction unit includes: a determining subunit, used to determine a preset vocabulary list corresponding to the target domain, wherein the preset vocabulary list contains at least one preset target word; a forming subunit, used to take a tree structure composed of the at least one preset target word as a target word set; and a constructing subunit, used to construct a string matching algorithm corresponding to the text to be processed, wherein the string matching algorithm uses the target word set when traversing the text to be processed.
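One common way to realize a tree structure composed of preset target words is a prefix tree (trie). The sketch below assumes that reading; the source only says "tree structure", and the `END` sentinel and `build_trie` name are hypothetical.

```python
from typing import Dict, Iterable

# Hypothetical sentinel key. It is longer than one character, so it can never
# collide with a single-character edge label.
END = "$end"


def build_trie(words: Iterable[str]) -> Dict:
    """Build a nested-dict prefix tree; each complete word is stored under END."""
    root: Dict = {}
    for word in words:
        node = root
        for ch in word:
            # Follow the edge for this character, creating it if needed.
            node = node.setdefault(ch, {})
        node[END] = word
    return root


trie = build_trie(["he", "her", "hers"])
```

Words that share a prefix share a path in the tree, which is why a single pass over the text can test all preset target words at once.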

[0064] In one embodiment, the determining subunit is specifically used to: determine at least one preset target word corresponding to the target domain, wherein the domain to which the preset target word belongs is the target domain; for each preset target word, determine the corresponding preset replacement word, wherein the domain to which the preset replacement word belongs is the target domain; and store each preset target word and its corresponding preset replacement word in the preset word list corresponding to the target domain.
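As a rough illustration, the preset word list for a domain can be modeled as a mapping from target word to replacement word. The entry layout `(domain, target, replacement)`, the function name, and the sample entries are assumptions made for this sketch.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical entry layout: (domain, preset target word, preset replacement word).
Entry = Tuple[str, str, str]


def build_word_list(entries: Iterable[Entry], domain: str) -> Dict[str, str]:
    """Keep only the entries of the target domain and store them as target -> replacement."""
    return {target: replacement for d, target, replacement in entries if d == domain}


# Illustrative entries only; they do not come from the source.
entries = [
    ("telecom", "5G NR", "5G New Radio"),
    ("telecom", "VoLTE", "Voice over LTE"),
    ("medical", "MI", "myocardial infarction"),
]
telecom_words = build_word_list(entries, "telecom")
```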

[0065] In one embodiment, the traversal module 320 is specifically used to perform the following operations based on the string matching algorithm: determine the target word set corresponding to the string matching algorithm; traverse the text to be processed character by character to determine at least one text word contained in the text to be processed; for each text word, determine the matching result between the text word and the target word set, wherein the matching result indicates whether the text word successfully matches the target word set; and, if the matching result indicates that the text word successfully matches the target word set, identify the text word as a target word to be processed, determine the position information of the target word to be processed in the text to be processed, and add an information tag at the position indicated by the position information.
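The character-by-character traversal can be sketched with a trie walk that records where each match starts and ends. This is an assumption-laden sketch: the longest-match, non-overlapping policy is a design choice made here, not something the source states, and the information tag is not modeled (the returned offsets play the role of the position information).

```python
from typing import Dict, Iterable, List, Tuple

END = "$end"  # hypothetical sentinel key marking a complete word


def build_trie(words: Iterable[str]) -> Dict:
    """Build a nested-dict prefix tree over the preset target words."""
    root: Dict = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = word
    return root


def find_targets(text: str, trie: Dict) -> List[Tuple[int, int, str]]:
    """Return (start, end, word) for each longest match, scanning left to right."""
    matches: List[Tuple[int, int, str]] = []
    i = 0
    while i < len(text):
        node, j, longest = trie, i, None
        # Walk the trie for as long as the next character has an edge.
        while j < len(text) and text[j] in node:
            node = node[text[j]]
            j += 1
            if END in node:
                longest = (i, j, node[END])
        if longest is not None:
            matches.append(longest)
            i = longest[1]  # resume after the match so matches do not overlap
        else:
            i += 1
    return matches


targets = find_targets("she hers", build_trie(["he", "her", "hers"]))
```

An embodiment aiming at the parallel matching described in the text would more likely use a multi-pattern automaton such as Aho-Corasick, which avoids restarting the trie walk at every position; the plain walk above is kept for brevity.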

[0066] In one embodiment, the replacement module 330 includes: a third determining unit, used to, for each target word to be processed contained in the text to be processed, search the preset word list for the corresponding target replacement word, determine the replacement information corresponding to the target word to be processed, and determine, based on the replacement information, the replacement processing rule corresponding to the target word to be processed, wherein the replacement information includes the position information and the information tag corresponding to the target word to be processed, and the replacement processing rule is used to replace the target word to be processed with the target replacement word; and a replacement unit, used to replace, based on the replacement processing rules, each target word to be processed contained in the text to be processed with its corresponding target replacement word, so as to obtain the processed text.

[0067] In one embodiment, the third determining unit is specifically used to: determine the position information and the information tag included in the replacement information; if the position indicated by the position information and the position indicated by the information tag are the same, determine that the replacement processing rule is to replace the target word to be processed based on the position information; and if the position indicated by the position information and the position indicated by the information tag are different, determine that the replacement processing rule is to replace the target word to be processed based on the information tag.
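The rule selection described above can be sketched as follows. The source does not specify how an information tag indicates a position, so this sketch assumes the tag resolves to a start offset (`tag_position`); the `Replacement` record, the rule names `"by_position"` and `"by_tag"`, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Replacement:
    word: str          # target word to be processed
    position: int      # start offset recorded during traversal
    tag_position: int  # start offset indicated by the information tag (assumed form)


def choose_rule(item: Replacement) -> Tuple[str, int]:
    """Pick the replacement rule: by position if both agree, otherwise by tag."""
    if item.position == item.tag_position:
        return ("by_position", item.position)
    return ("by_tag", item.tag_position)


def apply_replacements(text: str, items: List[Replacement], word_list: Dict[str, str]) -> str:
    """Replace each target word, working right to left so earlier offsets stay valid."""
    for item in sorted(items, key=lambda it: choose_rule(it)[1], reverse=True):
        _, start = choose_rule(item)
        end = start + len(item.word)
        if text[start:end] != item.word:
            raise ValueError(f"{item.word!r} not found at offset {start}")
        text = text[:start] + word_list[item.word] + text[end:]
    return text


# Illustrative data only. The second item carries a stale position (13) that
# disagrees with its tag (14), so the tag decides where to replace.
word_list = {"5G NR": "5G New Radio", "VoLTE": "Voice over LTE"}
items = [Replacement("5G NR", 4, 4), Replacement("VoLTE", 13, 14)]
result = apply_replacements("use 5G NR and VoLTE", items, word_list)
```

Because every replacement word comes from the preset word list, the same input text always yields the same output, which is the determinism the embodiment emphasizes.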

[0068] The text processing device provided in this embodiment of the invention can execute the text processing method provided in any embodiment of the invention. Through the cooperation and collaborative work between the modules, the text processing is completed, and it has the corresponding functional modules and beneficial effects of the execution method.

[0069] Embodiment 4 According to embodiments of the present invention, the present invention also provides an electronic device, a computer-readable storage medium, and a computer program product.

[0070] Figure 4 is a block diagram of an electronic device according to Embodiment 4 of the present invention, which implements the text processing method described in the embodiments of the present invention. The electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital processors, cellular phones, smartphones, wearable devices (such as helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions are merely illustrative and are not intended to limit the implementation of the invention described and / or claimed herein.

[0071] As shown in Figure 4, the electronic device 410 includes at least one processor 411 and a memory, such as a read-only memory (ROM) 412 or a random access memory (RAM) 413, communicatively connected to the at least one processor 411. The memory stores computer programs executable by the at least one processor. The processor 411 can perform various appropriate actions and processes based on the computer program stored in the ROM 412 or loaded from a storage unit 418 into the RAM 413. The RAM 413 may also store various programs and data required for the operation of the electronic device 410. The processor 411, ROM 412, and RAM 413 are interconnected via a bus 414. An input / output (I / O) interface 415 is also connected to the bus 414.

[0072] Multiple components in the electronic device are connected to the I / O interface 415, including: an input unit 416, such as a keyboard, mouse, etc.; an output unit 417, such as various types of displays, speakers, etc.; a storage unit 418, such as a disk, optical disk, etc.; and a communication unit 419, such as a network card, modem, wireless transceiver, etc. The communication unit 419 allows the electronic device to exchange information / data with other devices through computer networks such as the Internet and / or various telecommunications networks.

[0073] The processor 411 can be any of a variety of general-purpose and / or special-purpose processing components with processing and computing capabilities. Some examples of the processor 411 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various special-purpose artificial intelligence (AI) computing chips, various processors running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, etc. The processor 411 performs the various methods and processes described above, such as the text processing method.

[0074] In some embodiments, the text processing method may be implemented as a computer program tangibly contained in a computer-readable storage medium, such as storage unit 418. In some embodiments, part or all of the computer program may be loaded and / or installed on electronic device 410 via ROM 412 and / or communication unit 419. When the computer program is loaded into RAM 413 and executed by processor 411, one or more steps of the text processing method described above may be performed. Alternatively, in other embodiments, processor 411 may be configured to perform the text processing method by any other suitable means (e.g., by means of firmware).

[0075] Various embodiments of the systems and techniques described above herein can be implemented in digital electronic circuit systems, integrated circuit systems, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems-on-a-chip (SoCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and / or combinations thereof. These various embodiments may include implementations in one or more computer programs that can be executed and / or interpreted on a programmable system including at least one programmable processor, which may be a dedicated or general-purpose programmable processor, capable of receiving data and instructions from a storage system, at least one input device, and at least one output device, and transmitting data and instructions to the storage system, the at least one input device, and the at least one output device.

[0076] Computer programs used to implement the methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing device, such that when executed by the processor, the computer programs cause the functions / operations specified in the flowcharts and / or block diagrams to be performed. The computer programs may be executed entirely on a machine, partially on a machine, as a standalone software package partially on a machine and partially on a remote machine, or entirely on a remote machine or server.

[0077] In the context of this invention, a computer-readable storage medium can be a tangible medium that may contain or store a computer program for use by or in conjunction with an instruction execution system, apparatus, or device. A computer-readable storage medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination thereof. Alternatively, a computer-readable storage medium may be a machine-readable signal medium. More specific examples of machine-readable storage media include electrical connections based on one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fibers, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination thereof.

[0078] To provide interaction with a user, the systems and techniques described herein can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user; and a keyboard and pointing device (e.g., a mouse or trackball) through which the user provides input to the electronic device. Other types of devices can also be used to provide interaction with the user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form (including sound input, voice input, or tactile input).

[0079] The systems and technologies described herein can be implemented in computing systems that include backend components (e.g., as data servers), or middleware components (e.g., application servers), or frontend components (e.g., user computers with graphical user interfaces or web browsers through which users can interact with implementations of the systems and technologies described herein), or any combination of such backend, middleware, or frontend components. The components of the system can be interconnected via digital data communication of any form or medium (e.g., communication networks). Examples of communication networks include local area networks (LANs), wide area networks (WANs), blockchain networks, and the Internet.

[0080] A computing system can include clients and servers. Clients and servers are generally located far apart and typically interact through communication networks. The client-server relationship is created by computer programs running on the respective computers and having a client-server relationship with each other. The server can be a cloud server, also known as a cloud computing server or cloud host, which is a hosting product within the cloud computing service system to address the shortcomings of traditional physical hosts and VPS services, such as high management difficulty and weak business scalability.

[0081] In some embodiments, the computer program product includes a computer program that, when executed by a processor, implements the text processing method provided in the embodiments of the present invention.

[0082] The technical solution of this invention provides a text processing method, apparatus, electronic device, storage medium, and program product. It involves determining the text to be processed and the string matching algorithm corresponding to the text; traversing the text to be processed based on the string matching algorithm to obtain at least one target word; and replacing each target word in the text to be processed with a corresponding target replacement word to obtain the processed text. By using the string matching algorithm to traverse the text to obtain the target word, parallel matching of the text to be processed is achieved, improving text processing efficiency. By using a preset vocabulary to replace the target words in the text to be processed with target replacement words, the determinism and controllability of the text processing process are achieved, ensuring the consistency of the processed text.

[0083] It should be understood that the various forms of processes shown above can be used, with steps reordered, added, or deleted. For example, the steps described in this invention can be executed in parallel, sequentially, or in different orders, as long as the desired result of the technical solution of this invention can be achieved, and this is not limited herein.

[0084] The specific embodiments described above do not constitute a limitation on the scope of protection of this invention. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions can be made according to design requirements and other factors. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of this invention should be included within the scope of protection of this invention.

Claims

1. A text processing method, characterized in that it includes: determining the text to be processed and the string matching algorithm corresponding to the text to be processed, wherein the string matching algorithm is related to the target domain to which the text to be processed belongs; traversing the text to be processed based on the string matching algorithm to obtain at least one target word to be processed, wherein the target word to be processed includes the words contained in the text to be processed; and replacing each target word to be processed contained in the text to be processed with its corresponding target replacement word to obtain the processed text, wherein the target replacement words include the replacement words corresponding to the target words to be processed indicated by a preset word list, and the preset word list is related to the target domain.

2. The method according to claim 1, characterized in that the step of determining the text to be processed and the string matching algorithm corresponding to the text to be processed includes: determining the target domain to which the text to be processed belongs; determining a matching algorithm set, wherein the matching algorithm set includes at least one candidate string matching algorithm; if the matching algorithm set contains a target string matching algorithm related to the target domain, taking the target string matching algorithm as the string matching algorithm corresponding to the text to be processed; and otherwise, constructing, based on the target domain, a string matching algorithm corresponding to the text to be processed.

3. The method according to claim 2, characterized in that the step of constructing, based on the target domain, a string matching algorithm corresponding to the text to be processed includes: determining a preset vocabulary list corresponding to the target domain, wherein the preset vocabulary list contains at least one preset target word; taking a tree structure composed of the at least one preset target word as a target word set; and constructing a string matching algorithm corresponding to the text to be processed, wherein the string matching algorithm uses the target word set when traversing the text to be processed.

4. The method according to claim 3, characterized in that the step of determining the preset vocabulary list corresponding to the target domain includes: determining at least one preset target word corresponding to the target domain, wherein the domain to which the preset target word belongs is the target domain; for each preset target word, determining the corresponding preset replacement word, wherein the domain to which the preset replacement word belongs is the target domain; and storing each preset target word and its corresponding preset replacement word in the preset word list corresponding to the target domain.

5. The method according to claim 1, characterized in that the step of traversing the text to be processed based on the string matching algorithm to obtain at least one target word to be processed includes performing the following operations based on the string matching algorithm: determining the target word set corresponding to the string matching algorithm; traversing the text to be processed character by character to determine at least one text word contained in the text to be processed; for each text word, determining the matching result between the text word and the target word set, wherein the matching result indicates whether the text word successfully matches the target word set; and, if the matching result indicates that the text word successfully matches the target word set, identifying the text word as a target word to be processed, determining the position information of the target word to be processed in the text to be processed, and adding an information tag at the position indicated by the position information.

6. The method according to claim 1, characterized in that the step of replacing each target word to be processed contained in the text to be processed with its corresponding target replacement word to obtain the processed text includes: for each target word to be processed contained in the text to be processed, searching the preset word list for the corresponding target replacement word, determining the replacement information corresponding to the target word to be processed, and determining, based on the replacement information, the replacement processing rule corresponding to the target word to be processed, wherein the replacement information includes the position information and the information tag corresponding to the target word to be processed, and the replacement processing rule is used to replace the target word to be processed with the target replacement word; and replacing, based on the replacement processing rules, each target word to be processed contained in the text to be processed with its corresponding target replacement word, so as to obtain the processed text.

7. The method according to claim 6, characterized in that the step of determining, based on the replacement information, the replacement processing rule corresponding to the target word to be processed includes: determining the position information and the information tag included in the replacement information; if the position indicated by the position information and the position indicated by the information tag are the same, determining that the replacement processing rule is to replace the target word to be processed based on the position information; and if the position indicated by the position information and the position indicated by the information tag are different, determining that the replacement processing rule is to replace the target word to be processed based on the information tag.

8. A text processing device, characterized in that it includes: a determining module, used to determine the text to be processed and the string matching algorithm corresponding to the text to be processed, wherein the string matching algorithm is related to the target domain to which the text to be processed belongs; a traversal module, used to traverse the text to be processed based on the string matching algorithm to obtain at least one target word to be processed, wherein the target word to be processed includes the words contained in the text to be processed; and a replacement module, used to replace each target word to be processed contained in the text to be processed with its corresponding target replacement word, so as to obtain the processed text, wherein the target replacement words include the replacement words corresponding to the target words to be processed indicated by a preset word list, and the preset word list is related to the target domain.

9. An electronic device, characterized in that the electronic device includes: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores a computer program that can be executed by the at least one processor, the computer program being executed by the at least one processor to enable the at least one processor to perform the text processing method according to any one of claims 1-7.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer instructions that cause a processor to execute the text processing method according to any one of claims 1-7.

11. A computer program product, characterized in that the computer program product includes a computer program that, when executed by a processor, implements the text processing method according to any one of claims 1-7.