Text replacement method, apparatus, and computer readable storage medium
By combining node size rules and string matching rules in the text replacement process, a text replacement method is developed that identifies and replaces node objects that conform to the rules, and performs character replacement on node objects that do not conform to the rules. This solves the problem of low efficiency in existing text replacement technologies and achieves more efficient text replacement.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Patents (China)
- Current Assignee / Owner: TENCENT TECHNOLOGY (SHENZHEN) CO LTD
- Filing Date: 2021-10-14
- Publication Date: 2026-06-16
AI Technical Summary
Existing technologies for literal text translation using brute-force character matching have low success rates and low efficiency, requiring frequent backtracking of character positions for re-matching.
Based on preset node size rules and string matching rules, the text replacement method replaces the node information of node objects that conform to the node size rules, performs character replacement on the data of node objects that do not conform to them, and merges the texts produced by node information replacement and by character replacement.
It improves the success rate and efficiency of text replacement, avoids backtracking caused by character matching failures, and achieves a more efficient text replacement process.
Figure CN115983290B_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of computer technology, and specifically to a text replacement method, apparatus, and computer-readable storage medium.
Background Technology
[0002] Literal text translation refers to translating text while preserving its original content and form, such as translating English content into Chinese or Chinese into other languages. To achieve literal text translation of World Wide Web (WWW) web page content, current technology uses brute-force character matching to replace individual characters.
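The brute-force character matching mentioned in [0002] corresponds to the classic naive substring search. The following minimal JavaScript sketch illustrates that general technique, not any specific prior-art implementation, and its function names are hypothetical. It counts character comparisons; each mismatch moves the text index back with `i = i - j + 1`, which is the backtracking described in [0003].

```javascript
// Naive ("brute-force") substring search. On a mismatch the text index i is
// moved back to one position after the current attempt's starting point.
function bruteForceSearch(text, pattern) {
  let comparisons = 0;
  let i = 0; // index into text
  let j = 0; // index into pattern
  while (i < text.length && j < pattern.length) {
    comparisons++;
    if (text[i] === pattern[j]) {
      i++;
      j++;
    } else {
      i = i - j + 1; // backtrack in the text
      j = 0;         // restart the pattern
    }
  }
  const index = j === pattern.length ? i - j : -1;
  return { index, comparisons };
}

// Replace every occurrence of pattern, searching with the naive matcher.
function bruteForceReplaceAll(text, pattern, replacement) {
  if (pattern === "") return text; // avoid an endless loop on empty patterns
  let result = "";
  let rest = text;
  for (;;) {
    const { index } = bruteForceSearch(rest, pattern);
    if (index < 0) return result + rest;
    result += rest.slice(0, index) + replacement;
    rest = rest.slice(index + pattern.length);
  }
}
```

For the text "aaaab" and the pattern "aab", the search needs 9 comparisons for a 5-character text because it backtracks twice.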
[0003] In researching and practicing the existing technology, the inventors of this application found that brute-force character matching must match every character individually. This matching and replacement method has a low success rate, and whenever a match fails, the process must backtrack to the corresponding character position and rematch, which takes considerable time and reduces text replacement efficiency.
Summary of the Invention
[0004] This application provides a text replacement method, apparatus, and computer-readable storage medium that can increase the success rate of character matching and replacement and improve text replacement efficiency.
[0005] This application provides a text replacement method, including:
[0006] Based on the preset service language environment, the text data to be replaced is read from the display page;
[0007] Node identification is performed on the text data to be replaced to obtain multiple node objects;
[0008] Replace the node information of the target node object that conforms to the preset node size rules among the multiple node objects to obtain the first text;
[0009] When it is detected that the plurality of node objects contain a node object to be processed that does not conform to the preset node size rule, the data to be processed corresponding to the node object to be processed is obtained, and the target data in the data to be processed that conforms to the preset string matching rule is replaced with characters to obtain the second text.
[0010] The first text and the second text are merged to obtain the target replacement text corresponding to the text data to be replaced.
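Steps [0006] to [0010] can be summarized in a minimal JavaScript sketch. It assumes that node identification has already produced plain objects shaped like DOM nodes, that the node size rule is "more than `minNodeLength` child nodes", and that character replacement is plain exact-substring replacement in the node's raw data; the names `textReplace`, `rawData`, and `translations` are hypothetical and not taken from the application.

```javascript
// Hypothetical node object shape: { path, sourceText, childNodes, rawData },
// loosely modeled on DOM nodes; translations maps source text -> target text.
function textReplace(nodes, translations, minNodeLength = 0) {
  const firstText = new Map();  // path -> text from node information replacement
  const secondText = new Map(); // path -> text from character replacement
  for (const node of nodes) {
    const target = translations.get(node.sourceText) ?? node.sourceText;
    if (node.childNodes.length > minNodeLength) {
      // Conforms to the node size rule: overwrite the node's text value.
      node.childNodes[0].nodeValue = target;
      firstText.set(node.path, target);
    } else {
      // Node object to be processed: replace matching characters in its data.
      const replaced = node.rawData.split(node.sourceText).join(target);
      secondText.set(node.path, replaced);
    }
  }
  // Merge both partial results, keeping the original node order.
  const merged = nodes.map((n) => firstText.get(n.path) ?? secondText.get(n.path));
  return { firstText, secondText, merged };
}
```

Keeping both partial results keyed by node path lets the merge step restore the original page order; the application itself does not specify how the first and second texts are ordered when merged.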
[0011] Accordingly, embodiments of this application provide a text replacement device, including:
[0012] The reading unit is used to read the text data to be replaced from the display page based on the preset service language environment;
[0013] The identification unit is used to identify nodes in the text data to be replaced, and obtain multiple node objects;
[0014] The first replacement unit is used to replace the node information of the target node object that conforms to the preset node size rules among the plurality of node objects to obtain the first text.
[0015] The second replacement unit is used to obtain the data to be processed corresponding to the node object to be processed when it is detected that the plurality of node objects contain a node object to be processed that does not conform to the preset node size rule, and to replace the target data in the data to be processed that conforms to the preset string matching rule with characters to obtain the second text.
[0016] The fusion unit is used to fuse the first text and the second text to obtain the target replacement text corresponding to the text data to be replaced.
[0017] In some embodiments, the identification unit is further configured to:
[0018] Read multiple element node data from the text data to be replaced;
[0019] Based on a preset document information search language, the data of each element node is parsed to obtain the node object corresponding to each element node data.
[0020] Based on the node object corresponding to each element node data, determine multiple node objects to be replaced.
[0021] In some implementations, the preset node size rule includes a preset node length threshold, and the first replacement unit is further configured to:
[0022] Obtain the path information corresponding to each of the plurality of node objects;
[0023] The node length of each node object is determined based on the path information;
[0024] The node objects whose node length is greater than the preset node length threshold are identified as target node objects, and the node information of the target node objects is replaced to obtain the first text.
[0025] In some embodiments, the second replacement unit is further configured to:
[0026] Obtain the original text data from the text data to be replaced;
[0027] The data to be processed is compared character by character with the original data to obtain the character comparison result;
[0028] Based on the character comparison results, target data that conforms to the preset string matching rules in the data to be processed is determined;
[0029] The target data is then subjected to character replacement.
[0030] In some embodiments, the second replacement unit is further configured to:
[0031] The character comparison results are parsed to obtain the character matching degree between the data to be processed and the original data;
[0032] Extract, from the data to be processed, first data whose character matching degree is greater than or equal to the character error tolerance threshold;
[0033] Obtain a first sub-data segment of the first data that matches the original data, and a second sub-data segment of the first data that does not match the original data;
[0034] Determine a second sub-data segment whose character length is less than or equal to the character error tolerance length threshold as a fault-tolerant data segment, and merge the fault-tolerant data segment with the first sub-data segment to obtain the target data.
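The error-tolerance logic of [0031] to [0034] can be sketched as follows. The description does not say how the data to be processed is aligned with the original data, so this sketch assumes a simple position-by-position comparison and defines the character matching degree as matched characters divided by the longer length. `selectTargetData`, `toleranceThreshold`, and `toleranceLength` are hypothetical names for the selection step, the character error tolerance threshold, and the character error tolerance length threshold.

```javascript
// Split the candidate into runs of characters that do / do not match the
// original at the same position (position-wise alignment is an assumption).
function splitRuns(candidate, original) {
  const runs = [];
  for (let i = 0; i < candidate.length; i++) {
    const match = candidate[i] === original[i];
    const last = runs[runs.length - 1];
    if (last && last.match === match) last.text += candidate[i];
    else runs.push({ match, text: candidate[i] });
  }
  return runs;
}

// Returns null when the candidate is not "first data"; otherwise keeps the
// matched runs plus unmatched runs no longer than toleranceLength.
function selectTargetData(candidate, original, toleranceThreshold, toleranceLength) {
  const runs = splitRuns(candidate, original);
  const matched = runs.filter((r) => r.match).reduce((n, r) => n + r.text.length, 0);
  const degree = matched / Math.max(candidate.length, original.length, 1);
  if (degree < toleranceThreshold) return null;
  const kept = runs.filter((r) => r.match || r.text.length <= toleranceLength);
  return { degree, target: kept.map((r) => r.text).join("") };
}
```

For example, "Submlt form" compared with "Submit form" has a matching degree of 10/11; with a tolerance length of 1, the single mismatched "l" is kept as a fault-tolerant segment, so the whole string becomes target data.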
[0035] In some embodiments, the text replacement device further includes a third replacement unit for:
[0036] When the data to be processed is detected to contain second target data that does not conform to the preset string matching rules, perform machine translation on the second target data to obtain the third text;
[0037] The fusion unit is further configured to: fuse the first text, the second text, and the third text to obtain the target replacement text corresponding to the text data to be replaced.
[0038] In some embodiments, the reading unit is further configured to:
[0039] Receive, based on the preset service language environment, the text parameters to be replaced, which include a project identifier, a file format, a target data type, and a language type;
[0040] Retrieve the project to be converted corresponding to the project identifier displayed on the page;
[0041] Extract the target format file corresponding to the file format in the project to be converted;
[0042] Read target type data from the target format file that corresponds to the target data type;
[0043] Configure the target type data according to the language type to obtain the text data to be replaced.
[0044] In some embodiments, the text replacement device further includes an updating unit for:
[0045] When it is detected that the data to be processed contains second target data that does not conform to the preset string matching rules, extract the second target sub-data that conforms to the preset character error tolerance rules from the second target data, and create a language data file corresponding to the second target sub-data;
[0046] Read the character error tolerance parameters from the language data file;
[0047] The preset string matching rule is updated according to the character error tolerance parameter to obtain the updated preset string matching rule;
[0048] Based on the updated preset string matching rules, character replacement is performed on the second target sub-data to obtain the target sub-text;
[0049] The first text, the second text, and the target sub-text are merged to obtain the target replacement text.
[0050] Furthermore, embodiments of this application also provide a computer device, including a processor and a memory, wherein the memory stores an application program, and the processor is used to run the application program in the memory to implement the steps in the text replacement method provided in embodiments of this application.
[0051] Furthermore, embodiments of this application also provide a computer-readable storage medium storing a plurality of instructions adapted for loading by a processor to execute steps in any of the text replacement methods provided in embodiments of this application.
[0052] Furthermore, embodiments of this application also provide a computer program product, which includes computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes the computer instructions, causing the computer device to perform the steps of any of the text replacement methods provided in embodiments of this application.
[0053] In the embodiments of this application, the text data to be replaced can be read from the display page based on a preset service language environment; node recognition is performed on the text data to be replaced to obtain multiple node objects; the node information of the target node objects that conform to the preset node size rules is replaced to obtain the first text; when it is detected that the multiple node objects include node objects to be processed that do not conform to the preset node size rules, the data to be processed corresponding to those node objects is obtained, and the target data in it that conforms to the preset string matching rules undergoes character replacement to obtain the second text; and the first text and the second text are merged to obtain the target replacement text corresponding to the text data to be replaced. By performing node recognition on the text data to be replaced, the embodiments can directly replace the node information of node objects that meet the requirements. For node objects that do not meet the requirements, character matching and replacement are performed on their data, and the texts obtained by node information replacement and by character replacement are merged into the target replacement text. This avoids backtracking to the corresponding character position after a character matching failure, effectively improving matching efficiency. Moreover, combining node matching with character matching improves both the success rate and the efficiency of text replacement.
Attached Figure Description
[0054] To more clearly illustrate the technical solutions in the embodiments of this application, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0055] Figure 1 is a schematic diagram of a scenario of the text replacement system provided in an embodiment of this application;
[0056] Figure 2 is a flowchart of the steps of the text replacement method provided in an embodiment of this application;
[0057] Figure 3 is another flowchart of the steps of the text replacement method provided in an embodiment of this application;
[0058] Figure 4 is a schematic diagram of a scenario of the text replacement method provided in an embodiment of this application;
[0059] Figure 5 is a schematic structural diagram of the text replacement device provided in an embodiment of this application;
[0060] Figure 6 is a schematic structural diagram of the computer device provided in an embodiment of this application.
Detailed Implementation
[0061] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.
[0062] This application provides a text replacement method, apparatus, and computer-readable storage medium. Specifically, the embodiments are described from the perspective of the text replacement apparatus, which can be integrated into a computer device such as a server or a terminal. The server can be a standalone physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (CDN), and big data and artificial intelligence platforms. The terminal can be, but is not limited to, a smartphone, tablet, laptop, desktop computer, smart speaker, or smartwatch. The terminal and the server can be connected directly or indirectly via wired or wireless communication, which is not limited herein.
[0063] The solutions provided in this application involve technologies such as text replacement, and can be applied to various scenarios such as cloud technology, AI, smart transportation, and vehicle systems. Specific examples are provided below:
[0064] For example, Figure 1 is a schematic diagram of a scenario of the text replacement system provided in an embodiment of this application. The scenario includes a terminal or a server.
[0065] The terminal or server can read the text data to be replaced from the display page based on a preset service language environment; perform node recognition on the text data to be replaced to obtain multiple node objects; replace the node information of the target node objects that conform to the preset node size rules among the multiple node objects to obtain the first text; when it is detected that the multiple node objects contain a node object to be processed that does not conform to the preset node size rules, obtain the data to be processed corresponding to the node object to be processed, and replace the target data that conforms to the preset string matching rules in the data to be processed with characters to obtain the second text; merge the first text and the second text to obtain the target replacement text corresponding to the text data to be replaced.
[0066] Text replacement can include processing such as reading the text data to be replaced, node recognition, node information replacement, character replacement, and text fusion.
[0067] The following sections provide detailed descriptions of each example. It should be noted that the order of the following embodiments is not intended to limit the preferred order of the embodiments.
[0068] In this embodiment, the description focuses on the text replacement device, which can be integrated into a computer device such as a terminal or a server. Referring to Figure 2, Figure 2 is a flowchart of the steps of the text replacement method provided in an embodiment of this application. Taking as an example a text replacement device integrated into a terminal, the specific process performed when the processor of the terminal or server executes the program corresponding to the text replacement method is as follows:
[0069] 101. Based on the preset service language environment, read the text data to be replaced from the display page.
[0070] The preset service language environment can be the scripting language environment on the electronic device side where the control command service is deployed. This environment is used to determine and respond to relevant language control commands to achieve command interaction. For example, if the server is an electronic device and JavaScript is used as the server's scripting language, then the server's language environment can be a Node.js environment, which is the aforementioned service language environment.
[0071] The display page can be an application or browser page, and it may contain content to be replaced. For example, the display page may contain English content to be replaced (translated) into Chinese, and it may also contain content to be replaced in other languages.
[0072] The text data to be replaced can be the text data displayed on the page that needs to be replaced or translated. This text data can be in Hypertext Markup Language (HTML) format or in JavaScript Object Notation (JSON), a lightweight data-interchange format. It should be noted that the text data to be replaced can include the source text, the translated text, and elementXpath data.
[0073] To achieve text replacement, this embodiment needs to obtain the text data to be replaced so that it can subsequently be replaced / translated. For example, this embodiment can use a command interaction service in a preset service language environment to select the corresponding project, data format, data source address, language, and so on from the display page through language control commands, and thereby read the text data to be replaced for subsequent replacement / translation.
[0074] In some implementations, the step "reading the text data to be replaced from the display page based on a preset service language environment" may include:
[0075] (1) Based on the preset service language environment, receive the text parameters to be replaced, which include the project identifier, file format, target data type and language type;
[0076] (2) Obtain the project to be converted corresponding to the project identifier in the display page;
[0077] (3) Extract the target format file corresponding to the file format in the project to be converted;
[0078] (4) Read the target type data that matches the target data type from the target format file;
[0079] (5) Configure the target type data according to the language type to obtain the text data to be replaced.
[0080] The text parameters to be replaced are parameters that describe the relevant text during text replacement and thereby delimit the text data to be replaced. For example, they may include a project identifier, a file format, a target data type, and a language type.
[0081] To determine the text data to be replaced, this embodiment can receive the text parameters to be replaced that are set under the command interaction service in the preset service language environment. Specifically, the project to be converted corresponding to the project identifier is selected from the display page of the application or browser, and the target format file corresponding to the file format is extracted from that project. Then, the target type data corresponding to the target data type is read from the target format file, and the target type data is configured with the language type to obtain the text data to be replaced.
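Steps (1) to (5) amount to a chain of lookups and filters. The sketch below assumes that the projects on the display page are available as an in-memory array; that structure, the field names (`id`, `files`, `format`, `entries`, `type`), and the function name `readTextToReplace` are illustrative assumptions, not part of the application.

```javascript
// Hypothetical in-memory stand-in for the projects shown on the display page.
// Parameter names mirror step (1): project identifier, file format, target
// data type, and language type.
function readTextToReplace(projects, { projectId, fileFormat, dataType, language }) {
  // (2) Obtain the project to be converted by its identifier.
  const project = projects.find((p) => p.id === projectId);
  if (!project) throw new Error(`unknown project: ${projectId}`);
  // (3) Extract the file in the requested format.
  const file = project.files.find((f) => f.format === fileFormat);
  if (!file) return [];
  // (4) Read the entries of the target data type, then
  // (5) configure them with the requested language type.
  return file.entries
    .filter((entry) => entry.type === dataType)
    .map((entry) => ({ ...entry, language }));
}
```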
[0082] Reading the text data to be replaced in this way facilitates its subsequent replacement / translation.
[0083] 102. Perform node identification on the text data to be replaced to obtain multiple node objects.
[0084] The node object can be the node corresponding to the relevant element on the display page. For example, a display page can contain multiple elements of different types, each element corresponding to a node, such as a display page containing a title, text, elements, attributes, etc.
[0085] It should be noted that the Document Object Model (DOM) connects web pages to scripts or programming languages by representing HTML or XML documents as objects. The programming language is usually JavaScript, although the DOM objects are not part of the JavaScript language itself. The DOM represents a document as a logical tree: each branch of the tree ends in a node, and each node contains objects.
[0086] To identify the node objects corresponding to the text data to be replaced, this embodiment performs node identification on the text data after obtaining it, so that text replacement can subsequently be performed on each identified node object.
[0087] Specifically, in some implementations, the step "to identify nodes in the text data to be replaced and obtain multiple node objects" may include:
[0088] (1) Read multiple element node data from the text data to be replaced;
[0089] (2) Based on the preset document information search language, the data of each element node is parsed to obtain the node object corresponding to each element node data;
[0090] (3) Determine the multiple node objects to be replaced based on the node object corresponding to the node data of each element node.
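A minimal sketch of steps (1) to (3), assuming that each record of the text data to be replaced carries `sourceText` and `elementXpath` fields and that the page has been parsed into a DOM-like tree of plain objects. The hypothetical `resolvePath` helper supports only absolute child paths with optional positions, such as `/html/body/p[2]`; in practice a full XPath engine, such as the npm xpath module mentioned in [0107], would do this work.

```javascript
// Resolve a simple absolute path such as "/html/body/div[2]/p" against a
// DOM-like tree of plain objects ({ nodeName, childNodes, nodeValue }).
// Only child-axis steps with an optional 1-based position are supported.
function resolvePath(root, path) {
  let current = [root];
  for (const step of path.split("/").filter(Boolean)) {
    const m = /^([\w-]+)(?:\[(\d+)\])?$/.exec(step);
    if (!m) throw new Error(`unsupported step: ${step}`);
    const [, name, position] = m;
    const next = [];
    for (const node of current) {
      const named = (node.childNodes || []).filter((c) => c.nodeName === name);
      if (position === undefined) next.push(...named);
      else if (named[Number(position) - 1]) next.push(named[Number(position) - 1]);
    }
    current = next;
  }
  return current;
}

// Steps (1)-(3): parse each record's elementXpath into node objects.
function identifyNodes(documentRoot, records) {
  return records.map((record) => ({
    path: record.elementXpath,
    sourceText: record.sourceText,
    nodes: resolvePath(documentRoot, record.elementXpath),
  }));
}
```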
[0091] The element node data can be the data of the corresponding tag in the text data to be replaced, representing the data to be replaced contained in the text data. For example, taking JSON data as the text data to be replaced, the element node data can be elementXPath data.
[0092] The document information search language can be a language for finding information in a document, such as XPath. It can precisely locate information within the document and can be used to match strings, numbers, and times, as well as to process node objects, sequences, and so on.
[0093] Specifically, in order to identify the node objects corresponding to the text data to be replaced, this embodiment of the application, after obtaining the text data to be replaced, reads the element node data in the specified text data to be replaced; based on the preset document information search language in the server node, it parses the read element node data to obtain the node object corresponding to each element node data; then, based on the node object corresponding to each element node data, it obtains multiple node objects corresponding to the text data to be replaced. In this way, multiple node objects to be matched are obtained, so that the multiple node objects can be matched and replaced in the subsequent process to obtain the corresponding text.
[0094] 103. Replace the node information of the target node object that conforms to the preset node size rules among multiple node objects to obtain the first text.
[0095] The preset node size rule can be a rule that restricts which node objects can undergo replacement, and is used to match the node objects whose text data can be replaced. For example, the preset node size rule can include a node length threshold. Matching the node objects against the preset node length threshold yields the node objects that meet it, so that text replacement can subsequently be performed on these target node objects.
[0096] In order to replace text on node objects, this embodiment of the application, after obtaining multiple node objects corresponding to the text data to be replaced, detects each node object according to a preset node size rule, obtains the detection result, and determines the target node object that can be matched and replaced based on the detection result. Thus, the target node object that conforms to the preset node size rule is selected from multiple node objects for text replacement, so as to obtain the partial replacement text of the node object that meets the requirements in the text data to be replaced.
[0097] In some implementations, the preset node size rule may include a preset node length threshold. Therefore, the step "replacing the node information of target node objects that conform to the preset node size rule among multiple node objects to obtain the first text" may include:
[0098] (1) Obtain the path information corresponding to each node object in multiple node objects;
[0099] (2) Determine the node length of each node object based on the path information;
[0100] (3) Node objects whose node length is greater than the preset node length threshold are identified as target node objects, and node information is replaced on the target node objects to obtain the first text.
[0101] The path information can be a path composed of multiple nodes, for example, the XPath node information of a node object.
[0102] The node length can be represented by the length of the set of child nodes corresponding to each node object, which can reflect whether the node object has been successfully parsed and whether the node object can be retrieved.
[0103] The preset node length threshold serves as the threshold for filtering the node objects to be replaced: it is compared with the length of each node object's child node set in order to select the target node objects that have been successfully parsed.
[0104] Specifically, in order to perform text replacement on node objects, after obtaining multiple node objects, this embodiment of the application can obtain the path information corresponding to each node object; and determine the node length of the corresponding node object based on the path information; then, determine whether the corresponding node object has been successfully parsed based on the node length. If the parsing is successful, the text data of the successfully parsed target node object is replaced. Specifically, a target node object with a node length greater than a preset node length threshold is selected, and the node information of the target node object is replaced to obtain the corresponding first text.
[0105] For example, by performing path retrieval on the information of each node object (XPath node information), a set of child nodes (ChildNodes) is obtained for each node object. The node length of the node object is determined by this child node set, and the node length indicates whether the current node object has been successfully parsed, that is, whether it meets the preset node length threshold. For instance, if the length of the child node set (ChildNodes) is greater than 0, the node length meets the preset node length threshold, and text replacement can be performed on the target node object.
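The node length check in [0105] can be expressed as a partition of the identified node objects. In this sketch, which assumes that each node object holds its resolved element in a hypothetical `node` field, a node whose child node set is longer than the threshold (0 in the example above) becomes a target node object, and every other node becomes a node object to be processed.

```javascript
// Split node objects by the node length rule. The node length is taken as
// the size of the child node set; a missing node counts as length 0.
function partitionByNodeLength(nodeObjects, lengthThreshold = 0) {
  const targets = [];   // parsed successfully: node information replacement
  const toProcess = []; // failed the rule: character replacement later
  for (const obj of nodeObjects) {
    const nodeLength = obj.node?.childNodes?.length ?? 0;
    (nodeLength > lengthThreshold ? targets : toProcess).push(obj);
  }
  return { targets, toProcess };
}
```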
[0106] Furthermore, the process of "replacing the text of the target node object to obtain the first text" can be as follows: based on the document information search language, perform path retrieval on each target node object to obtain the target child node set corresponding to each target node object; based on the node value rules, obtain the target node value corresponding to each target child node set; and generate the first text according to the target node value corresponding to each target child node set.
[0107] For example, the select method provided in the npm xpath module can be used to perform path retrieval on each target node object to obtain the set of target child nodes corresponding to each target node object; the nodeValue property can be used to obtain the target node value of each target child node set, so as to generate the corresponding first text based on each target node value.
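With the npm xpath module, path retrieval is done with its `select(expression, node)` function, which returns the matching nodes. To keep the example dependency-free, the sketch below replaces that call with a hypothetical `selectTextChildren` helper over plain objects, then reads `nodeValue` as [0106] and [0107] describe, replacing each value for which a translation is known. The `translations` map and the newline-joined first text are assumptions.

```javascript
const TEXT_NODE = 3; // DOM nodeType value for text nodes

// Stand-in for path retrieval: return the target node's text child nodes.
function selectTextChildren(element) {
  return (element.childNodes || []).filter((c) => c.nodeType === TEXT_NODE);
}

// Read each text child's nodeValue, replace it when a translation exists,
// and join the resulting values into the first text.
function buildFirstText(targets, translations) {
  const values = [];
  for (const { element } of targets) {
    for (const textNode of selectTextChildren(element)) {
      const source = textNode.nodeValue.trim();
      if (translations.has(source)) textNode.nodeValue = translations.get(source);
      values.push(textNode.nodeValue);
    }
  }
  return values.join("\n");
}
```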
[0108] By using the above method, when multiple node objects corresponding to the text data to be replaced are obtained, it can be determined whether the node object has been successfully parsed based on the length of each node object. Then, the target node object that has been successfully parsed (node length is greater than the preset node length threshold) is selected from the multiple node objects, and the text data is replaced on the target node object. In this way, the data replacement of the successfully parsed target node objects in the text data to be replaced is realized, thereby improving the effectiveness and accuracy of the text data to be replaced during the replacement process.
[0109] 104. When multiple node objects are detected to contain node objects that do not conform to the preset node size rules, obtain the data to be processed corresponding to the node objects to be processed, and replace the target data in the data to be processed that conforms to the preset string matching rules with characters to obtain the second text.
[0110] The preset string matching rule can be a rule that restricts which data can undergo character replacement, and is used to match the data to be replaced. For example, the preset string matching rule can include a character error tolerance threshold and a character error tolerance length threshold. Target data that meets both thresholds is selected so that character replacement can subsequently be performed on it.
[0111] It should be noted that the embodiments of this application can perform text replacement both by node object matching and replacement and by character matching and replacement. First, the multiple node objects of the text data to be replaced are obtained, and the target node objects that conform to the preset node size rules are matched and replaced, which improves the effectiveness and accuracy of text replacement. Then, for node objects to be processed that do not conform to the preset node size rules, the corresponding data to be processed is obtained and replaced through character replacement, which improves the success rate of text replacement for regular characters.
[0112] Specifically, in order to replace text on node objects that do not conform to the preset node size rules, this application embodiment detects each node object based on the preset node size rules, and when a node object that does not conform to the preset node size rules is detected among multiple node objects, the node object that does not conform to the preset node size rules is determined as a node object to be processed; then, the data to be processed corresponding to the node object to be processed is obtained, and the target data in the data to be processed that conforms to the preset string matching rules is replaced with characters to obtain the second text obtained by the character replacement.
[0113] In some implementations, the step "replacing characters in target data that match preset string matching rules in the data to be processed" may include:
[0114] (1) Obtain the original text data from the text data to be replaced;
[0115] (2) Compare the characters of the data to be processed with the original data to obtain the character comparison results;
[0116] (3) Based on the character comparison results, determine the target data in the data to be processed that conforms to the preset string matching rules;
[0117] (4) Replace characters in the target data.
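Step (2) can be illustrated with a minimal position-by-position comparison. This is a hypothetical sketch: the function name `compareCharacters` and the assumption that both strings are aligned from their first character are illustrative and not specified by the source.

```javascript
// Hypothetical sketch of step (2): compare the data to be processed with the
// original text data character by character. Assumes both strings are aligned
// from their first character; the source does not specify an alignment method.
function compareCharacters(pending, original) {
  const pendingChars = Array.from(pending); // split by code point, not UTF-16 unit
  const originalChars = Array.from(original);
  return pendingChars.map((ch, i) => ({
    char: ch,
    matched: i < originalChars.length && ch === originalChars[i],
  }));
}

const result = compareCharacters("Sign in now", "Sign up now");
console.log(result.filter((r) => r.matched).length); // 9
```

The per-character result is the "character comparison result" that step (3) consumes; a real implementation might use a diff or edit-distance alignment instead of fixed positions.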
[0118] It should be noted that the text data to be replaced may include the original text data (sourceText), the translated text data (translateText), and node data (elementXpath).
[0119] The original text data (sourceText) is the source text to be replaced, before any text replacement operation has been applied. It can be in JSON format and may contain multiple characters or multiple strings of data information.
[0120] After node information replacement, the remaining node objects, which do not conform to the preset node size rules, are the node objects to be processed. The data to be processed corresponding to these node objects can be all or part of the original text data, can likewise be in JSON format, and may contain multiple characters or multiple strings of data information.
[0121] Specifically, in order to replace characters in the data to be processed corresponding to node objects that do not conform to the preset node size rules, this embodiment of the application, after obtaining the data to be processed of the node objects, obtains the original text data in the text data to be replaced, compares the characters (or character segments) of the data to be processed with the target characters (or target character segments) in the original text data to obtain the character comparison results between the data to be processed and the original text data; further, based on the character comparison results, determines the target data in the data to be processed that conforms to the preset string matching rules, and performs character replacement on the target data.
[0122] In some implementations, the preset string matching rule includes a character error tolerance threshold and a character error tolerance length threshold. The step "determining, based on the character comparison results, the target data in the data to be processed that conforms to the preset string matching rule" may include:
[0123] (3.1) Analyze the character comparison results to obtain the character matching degree between the data to be processed and the original data;
[0124] (3.2) Extract the first data from the data to be processed whose character matching degree is greater than or equal to the character error tolerance threshold;
[0125] (3.3) Obtain the first sub-data segment in the first data that matches the original data, and obtain the second sub-data segment in the first data that does not match the original data;
(3.4) Determine the second sub-data segment whose character length is less than or equal to the character error tolerance length threshold as the error-tolerant data segment, and merge the error-tolerant data segment with the first sub-data segment to obtain the target data.
[0127] The character matching degree is the degree to which the characters in the data to be processed match the original data, that is, the proportion of characters in the data to be processed (or in a character segment of it) that match characters in the original data. Specifically, it can be determined by obtaining the total number of characters in the data to be processed and the number of those characters that match characters in the original data, and taking the ratio of the number of matching characters to the total number of characters as the character matching degree.
[0128] The character error tolerance threshold can be a threshold on the character matching degree of the data to be processed. It is used to select the characters or character segments in the data to be processed whose matching degree meets the threshold, so that they can be replaced subsequently.
[0129] The character error tolerance length threshold can be a threshold that limits the number of non-matching characters allowed within a character or character segment. It is used to filter out the characters or character segments that satisfy this requirement, so that they can be replaced subsequently.
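Written as a formula, with a worked instance (the symbols $D$, $N_{\text{match}}$, $N_{\text{total}}$, $T$ and the example figures are illustrative, not from the source): comparing the 11-character string "Sign in now" position by position with "Sign up now" gives 9 matching characters.

```latex
D = \frac{N_{\text{match}}}{N_{\text{total}}}, \qquad
N_{\text{total}} = 11,\; N_{\text{match}} = 9 \;\Rightarrow\; D = \frac{9}{11} \approx 0.82
```

Under step (3.2), such data would become first data whenever the character error tolerance threshold $T$ satisfies $T \le 0.82$, for example $T = 0.8$.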
[0130] In order to determine the target data in the data to be processed that conforms to the preset string matching rules, after obtaining the character comparison result between the data to be processed and the original data, this embodiment of the application analyzes the character comparison result to obtain the character matching degree between the characters contained in the data to be processed and the original data; and extracts data from the data to be processed whose character matching degree is greater than or equal to the character error tolerance threshold, namely the first data mentioned above.
[0131] It should be noted that, since the first data is character data that meets the character error tolerance threshold, it contains characters that match and do not match the original data. That is, the first data contains a first sub-data segment that matches the original data and a second sub-data segment that does not match the original data. In order to ensure that the character data in the unmatched second sub-data segment can be replaced later, this embodiment of the application needs to filter the second sub-data segment. Specifically, the second sub-data segment with a character length less than or equal to the character error tolerance length threshold is determined as the error-tolerant data segment. It can be understood that the error-tolerant data segment is a character data segment that can meet the string matching rules and can be replaced later. Further, the first sub-data segment and the error-tolerant data segment are merged to obtain the target data to be replaced.
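Steps (3.1) to (3.4) can be sketched end to end as follows. This is a hypothetical illustration under stated assumptions: characters are compared by position, "merging" keeps matched and tolerated characters in their original order, and mismatched runs longer than the length threshold are returned separately as rejected segments. The function and field names are not from the source.

```javascript
// Hypothetical sketch of steps (3.1)-(3.4). Assumptions (not from the source):
// position-wise comparison, order-preserving merge, and over-long mismatched
// runs reported in `rejected` instead of being merged into the target data.
function selectTargetData(pending, original, toleranceThreshold, lengthThreshold) {
  const chars = Array.from(pending);
  const orig = Array.from(original);
  if (chars.length === 0) return null;
  const matched = chars.map((ch, i) => i < orig.length && ch === orig[i]);

  // (3.1) character matching degree = matching characters / total characters
  const degree = matched.filter(Boolean).length / chars.length;
  // (3.2) only data at or above the error tolerance threshold becomes first data
  if (degree < toleranceThreshold) return null;

  // (3.3) split the first data into runs of matching / non-matching characters
  const runs = [];
  for (let i = 0; i < chars.length; i++) {
    const last = runs[runs.length - 1];
    if (last && last.matched === matched[i]) last.text += chars[i];
    else runs.push({ matched: matched[i], text: chars[i] });
  }

  // (3.4) keep short mismatched runs as error-tolerant segments and merge them
  // with the matching runs; longer mismatched runs are rejected
  let target = "";
  const rejected = [];
  for (const run of runs) {
    if (run.matched || Array.from(run.text).length <= lengthThreshold) target += run.text;
    else rejected.push(run.text);
  }
  return { degree, target, rejected };
}

console.log(selectTargetData("Sign in now", "Sign up now", 0.8, 1));
```

With a length threshold of 1, the two-character mismatch "in" is rejected, so the target data keeps only the matching runs; with a threshold of 2 it is tolerated and merged back in.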
[0132] Using the above method, when a node object to be processed is detected that does not conform to the preset node size rules, target data that conforms to the preset string matching rules can be matched from the data to be processed of the node object. This allows for subsequent character replacement of the target data to obtain the corresponding replacement text, i.e., the second text.
[0133] 105. Merge the first text and the second text to obtain the target replacement text corresponding to the text data to be replaced.
[0134] Specifically, in order to obtain the replacement text corresponding to the text data to be replaced, this embodiment of the application, after obtaining the first text obtained by replacing node information and the second text obtained by replacing characters, needs to merge the first text and the second text to obtain the target replacement text corresponding to the text data to be replaced. It can be understood that: for target node objects in the text data to be replaced that conform to the preset node size rules, node information replacement is performed; for node objects to be processed that do not conform to the preset node size rules, character replacement is performed on the target data in the corresponding data to be processed that conforms to the preset string matching rules; when all the data to be processed conforms to the preset string matching rules, the first text and the second text are merged to obtain the target replacement text corresponding to the text data to be replaced.
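The merge in step 105 can be pictured with a small sketch. It is hypothetical: the source does not specify how the first and second texts are represented, so each partial text is modelled here as a `Map` from a node's `elementXpath` to its replaced string, and the earlier stage (node information replacement) is assumed to take precedence if both stages cover the same node.

```javascript
// Hypothetical sketch: each partial text is a Map from elementXpath to the
// replaced string. Earlier arguments take precedence over later ones; this
// precedence rule is an assumption, not taken from the source.
function mergeTexts(...partials) {
  const merged = new Map();
  for (const partial of partials) {
    for (const [xpath, text] of partial) {
      if (!merged.has(xpath)) merged.set(xpath, text);
    }
  }
  return merged;
}

const firstText = new Map([["/html/body/h1", "欢迎"]]);        // node information replacement
const secondText = new Map([["/html/body/p[1]", "立即注册"]]); // character replacement
console.log(Object.fromEntries(mergeTexts(firstText, secondText)));
```

The same function accepts any number of partial texts, so the later three-way merges would use it unchanged.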
[0135] In some implementations, after the step "replacing characters in the target data that matches the preset string matching rules in the data to be processed", the following may be included:
[0136] A. When the data to be processed is detected to contain second target data that does not conform to the preset string matching rules, extract the second target sub-data that conforms to the preset character error tolerance rules from the second target data, and create the language data file corresponding to the second target sub-data;
[0137] B. Read the character error tolerance parameters from the language data file;
[0138] C. Update the preset string matching rules according to the character error tolerance parameters to obtain the updated preset string matching rules;
[0139] D. Based on the updated preset string matching rules, perform character replacement on the second target sub-data to obtain the target sub-text;
E. Merge the first text, the second text, and the target sub-text to obtain the target replacement text.
[0141] The preset character error tolerance rule can be a rule for filtering valid data that fails to match. It is used to filter a portion of the data that does not conform to the preset string matching rule. The data filtered by this rule can be called "valid data that failed to match," and this "valid data that failed to match" is used for subsequent analysis and processing. For example, the preset character error tolerance rule can include a character error tolerance ratio range and a character error tolerance length range, and a second target sub-data that conforms to the character error tolerance ratio range and character error tolerance length range can be filtered.
[0142] The character error tolerance parameter can be a parameter such as the number of matching characters, the number of non-matching characters, or the character error tolerance ratio contained in the data, used to represent the matching status between the characters in the corresponding data and the original data.
[0143] Specifically, to refine the preset string matching rules, this embodiment checks whether the data to be processed contains data that does not conform to the preset string matching rules. When such second target data is detected, the second target sub-data that conforms to the preset character error tolerance rules is extracted from it. A JSON file, i.e., the language data file, is created through the fs module of Node.js, and the second target sub-data is stored in it. The character error tolerance parameters corresponding to the language data file are then read, and the currently collected parameters are used to update the preset string matching rules, for example by updating the character error tolerance threshold and the character error tolerance length threshold, to obtain the updated preset string matching rules. The data to be processed is then matched character by character against the original data to obtain the replacement data that conforms to the updated rules. Because the updated preset string matching rules are derived from the character error tolerance parameters of the second target sub-data, the second target sub-data usually conforms to the updated rules when the data to be processed is matched against the original data; that is, character replacement is performed on the second target sub-data based on the updated preset string matching rules to obtain the target sub-text. Finally, all previous replacement results (such as node information replacement and character replacement) are merged; specifically, the first text, the second text, and the target sub-text are merged to obtain the target replacement text.
Therefore, when text replacement (translation) is performed on the content of the display page, the preset string matching rules are updated based on the character error tolerance parameters of some mismatched character data. This optimizes the error tolerance configuration for character replacement, raises the matching rate and success rate of subsequent character replacements, and ensures reliability.
[0144] In some implementations, after the step "performing character replacement on the target data in the data to be processed that conforms to the preset string matching rules", the following may be included:
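Steps A to C can be sketched with the Node.js `fs` module that the source names. Everything else here is an assumption for illustration: the file name `language-data.json`, the sample fields `matchedChars` and `totalChars`, and the update policy (relax the thresholds just enough to admit the collected samples). The sketch also assumes the samples have already been filtered by the preset character error tolerance rules.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Step A (hypothetical layout): store the second target sub-data in a JSON file.
function writeLanguageDataFile(dir, samples) {
  const file = path.join(dir, "language-data.json");
  fs.writeFileSync(file, JSON.stringify(samples, null, 2), "utf8");
  return file;
}

// Step B: read character error tolerance parameters back from the file.
function readToleranceParameters(file) {
  const samples = JSON.parse(fs.readFileSync(file, "utf8"));
  return samples.map((s) => ({
    ratio: s.matchedChars / s.totalChars,          // matching ratio of the sample
    mismatchLength: s.totalChars - s.matchedChars, // non-matching characters
  }));
}

// Step C (assumed policy): lower the tolerance threshold and raise the length
// threshold only as far as needed to admit the collected samples.
function updateRules(rules, params) {
  if (params.length === 0) return { ...rules };
  return {
    toleranceThreshold: Math.min(rules.toleranceThreshold, ...params.map((p) => p.ratio)),
    lengthThreshold: Math.max(rules.lengthThreshold, ...params.map((p) => p.mismatchLength)),
  };
}

const dir = fs.mkdtempSync(path.join(os.tmpdir(), "lang-"));
const file = writeLanguageDataFile(dir, [{ matchedChars: 7, totalChars: 10 }]);
const updated = updateRules({ toleranceThreshold: 0.8, lengthThreshold: 2 }, readToleranceParameters(file));
console.log(updated); // { toleranceThreshold: 0.7, lengthThreshold: 3 }
```

A production version would likely bound the relaxed thresholds by the ranges in the preset character error tolerance rule so that a single outlier cannot loosen matching too far.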
[0145] When the data to be processed contains second target data that does not conform to the preset string matching rules, the second target data that does not conform to the preset string matching rules is machine translated to obtain the third text;
[0146] Correspondingly, the step "merging the first text and the second text to obtain the target replacement text corresponding to the text data to be replaced" may include:
[0147] The first text, the second text, and the third text are merged to obtain the target replacement text corresponding to the text data to be replaced.
[0148] Machine translation refers to directly translating or replacing data or text content with a text translation tool, for example an existing online translation tool.
[0149] Specifically, in order to achieve full-text translation of the text data to be replaced, when the second target data that does not conform to the preset string matching rules is detected, since the second target data is data that cannot be replaced by node information and character replacement, the second target data can be machine translated to obtain the third text; further, the first text obtained by node information replacement, the second text obtained by character replacement and the third text obtained by machine translation are merged to obtain the merged target replacement text, which is the text after the text data to be replaced is replaced.
[0150] In this embodiment, to batch replace or translate specific text or all files on an application or browser's display page, firstly, based on a preset service language environment, the text data to be replaced under the corresponding item is read; then, the text data to be replaced is parsed to obtain multiple node objects, and each node object is matched according to node size rules to determine whether the corresponding node object is parsed successfully based on the node length, and the node information of the successfully parsed target node objects is replaced to obtain the corresponding first text; next, for the node objects to be processed that fail to be parsed, the data to be processed of the node objects to be processed is obtained, and the target data in the data to be processed that conforms to the preset string matching rules is replaced with characters to obtain the second text; in addition, for the second target data in the data to be processed that does not conform to the preset string matching rules, the second target data is replaced by machine translation to obtain the third text; finally, the texts obtained by the above multiple replacements are merged to obtain the target replacement text. By combining multiple matching and replacement methods, the text data to be replaced is filtered layer by layer, enabling batch character replacement of all files or specific files and data on the displayed page, thus improving the matching efficiency of text replacement. Furthermore, it avoids direct machine translation of all text data to be replaced, reducing the limitations of machine translation and making the target replacement text more aligned with the cultural habits of each language. Therefore, it improves the success rate and efficiency of text replacement.
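The layered filtering described above can be sketched as a single function. This is a hypothetical illustration, not the claimed implementation: node objects are plain objects `{ xpath, childNodes, text }`, `dictionary` maps sourceText to translateText, character replacement is simplified to "return the translation of the first dictionary entry that satisfies both thresholds", and `machineTranslate` is a stand-in for an external translation tool.

```javascript
// Position-wise comparison: matching degree and longest mismatched run.
function compareToSource(text, source) {
  const a = Array.from(text);
  const b = Array.from(source);
  let matched = 0;
  let run = 0;
  let longestRun = 0;
  a.forEach((ch, i) => {
    if (i < b.length && ch === b[i]) {
      matched++;
      run = 0;
    } else {
      run++;
      longestRun = Math.max(longestRun, run);
    }
  });
  return { degree: a.length ? matched / a.length : 0, longestRun };
}

// Simplified character replacement: first entry meeting both thresholds.
function fuzzyLookup(text, dictionary, toleranceThreshold, lengthThreshold) {
  for (const [source, translation] of dictionary) {
    const { degree, longestRun } = compareToSource(text, source);
    if (degree >= toleranceThreshold && longestRun <= lengthThreshold) return translation;
  }
  return undefined;
}

function replaceLayered(nodes, dictionary, machineTranslate, toleranceThreshold, lengthThreshold) {
  const firstText = new Map();  // node information replacement
  const secondText = new Map(); // character replacement
  const thirdText = new Map();  // machine translation
  for (const node of nodes) {
    if (node.childNodes.length > 0) {
      // Node size rule met; keep the original text if no dictionary entry exists.
      firstText.set(node.xpath, dictionary.get(node.text) ?? node.text);
      continue;
    }
    const hit = fuzzyLookup(node.text, dictionary, toleranceThreshold, lengthThreshold);
    if (hit !== undefined) secondText.set(node.xpath, hit);
    else thirdText.set(node.xpath, machineTranslate(node.text));
  }
  // Merge the three partial texts into the target replacement text.
  return new Map([...firstText, ...secondText, ...thirdText]);
}

const dictionary = new Map([["Welcome", "欢迎"], ["Sign up now", "立即注册"]]);
const nodes = [
  { xpath: "/html/body/h1", childNodes: [{}], text: "Welcome" },
  { xpath: "/html/body/p[1]", childNodes: [], text: "Sign-up now" },
  { xpath: "/html/body/p[2]", childNodes: [], text: "Contact us" },
];
const result = replaceLayered(nodes, dictionary, (t) => `[MT] ${t}`, 0.8, 2);
console.log(Object.fromEntries(result));
```

Here "Sign-up now" differs from a dictionary entry by one character and is handled by character replacement, while "Contact us" matches nothing and falls through to the machine translation stub, which is the layer-by-layer narrowing the paragraph describes.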
[0151] As can be seen from the above, the embodiments of this application can read the text data to be replaced from the display page based on a preset service language environment; perform node recognition on the text data to be replaced to obtain multiple node objects; replace the node information of the target node objects that conform to the preset node size rules among the multiple node objects to obtain the first text; when it is detected that the multiple node objects contain a node object to be processed that does not conform to the preset node size rules, obtain the data to be processed corresponding to the node object to be processed, and replace the target data that conforms to the preset string matching rules in the data to be processed with characters to obtain the second text; merge the first text and the second text to obtain the target replacement text corresponding to the text data to be replaced. Thus, the embodiments of this application, by performing node recognition on the text data to be replaced, facilitate the replacement of node information of node objects that meet the requirements. Furthermore, when there are some node objects that do not meet the requirements, character matching and replacement are performed on the data of the node object, and the text obtained by node information replacement and character replacement are merged to obtain the target replacement text of the text data to be replaced. In this way, the phenomenon of backtracking to the corresponding character due to character matching failure is avoided, effectively improving the matching efficiency. Moreover, by combining node matching and character matching, the success rate of text replacement and the efficiency of text replacement are improved.
[0152] Based on the method described in the above embodiments, the following examples will provide further detailed explanations.
[0153] This application takes text replacement as an example to further describe the text replacement method provided in this application.
[0154] Refer to Figure 3 and Figure 4. Figure 3 is another schematic flowchart of the steps of the text replacement method provided in the embodiments of this application, and Figure 4 is a schematic diagram of a scenario of the text replacement method provided in the embodiments of this application. For ease of understanding, the embodiments of this application are described below with reference to Figure 3 and Figure 4.
[0155] In this embodiment, the description will focus on a text replacement device, which can be integrated into a computer device such as a server. When the processor on the server executes the program corresponding to the text replacement method, the specific flow of the text replacement method is as follows:
[0156] 201. Based on the preset service language environment, read the text data to be replaced from the display page.
[0157] The preset service language environment can be the scripting language environment on the electronic device side where the control command service is deployed. This environment is used to determine and respond to relevant language control commands to achieve command interaction. For example, if the server is an electronic device and JavaScript is used as the server's scripting language, then the server's language environment can be a Node.js environment, which is the aforementioned service language environment.
[0158] The display page can be an application or browser page, and it may contain content to be replaced. For example, the display page may contain English content to be replaced (translated) into Chinese, and it may also contain content to be replaced in other languages.
[0159] The text data to be replaced can be the text data on the display page that needs to be replaced or translated. It can be in Hypertext Markup Language (HTML) format or in JavaScript Object Notation (JSON), a lightweight data-interchange format. It should be noted that the text data to be replaced can include source text (sourceText), translated text (translateText), and element node (elementXpath) data.
[0160] Specifically, to determine the text data to be replaced, this embodiment of the application can receive, based on the preset service language environment, the replacement parameters set through the command interaction service. The project to be replaced corresponding to the project identifier is selected from the display page of the application or browser, and the target-format files of the specified file format are extracted from that project. The target type data of the specified data type is then read from the target-format files, and a language type is configured for the target type data to obtain the text data to be replaced.
[0161] For example, a service language environment (Node.js) can be pre-installed on the server. The command-line interaction module of this environment (inquirer) can be used to set the parameters the user needs to enter for the text data to be replaced: project identifier, target file format, target data type, and language type. The project identifier can be a project address; based on it, the project addresses in the current directory can be read, and after a project is selected, all files in that project can be read using Node's streaming capabilities. The target file format determines which file formats in the current project are read; the default is generally HTML, and JSON can be configured directly if needed. The target data type refers to data of a specific type within files of the target file format. For example, when a user-specified JSON data source is read, its data types include source text (sourceText), translated text (translateText), and element node (elementXpath) data. It should be noted that data of the target data type can also be read through the address of the data source. The language type is the language of the data to be replaced in the text data; configuring it allows the text data of the corresponding language, such as English or Japanese, to be replaced.
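A minimal sketch of the file-reading step, using only Node.js built-ins. Assumptions: synchronous `fs.readdirSync` is used for brevity instead of streams, the JSON data source is assumed to be an array of records, and the helper names are hypothetical; only the field names `sourceText`, `translateText`, and `elementXpath` come from the source.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Recursively collect files with the configured extension (HTML by default).
function listProjectFiles(projectDir, extension = ".html") {
  const found = [];
  for (const entry of fs.readdirSync(projectDir, { withFileTypes: true })) {
    const full = path.join(projectDir, entry.name);
    if (entry.isDirectory()) found.push(...listProjectFiles(full, extension));
    else if (entry.isFile() && path.extname(entry.name) === extension) found.push(full);
  }
  return found;
}

// Read the three data types named in the source from a JSON data source.
function readDataSource(file) {
  const records = JSON.parse(fs.readFileSync(file, "utf8"));
  return records.map(({ sourceText, translateText, elementXpath }) => ({
    sourceText,
    translateText,
    elementXpath,
  }));
}

// Usage: build a throwaway project in the OS temp directory.
const project = fs.mkdtempSync(path.join(os.tmpdir(), "project-"));
fs.mkdirSync(path.join(project, "i18n"));
fs.writeFileSync(path.join(project, "index.html"), "<h1>Welcome</h1>");
fs.writeFileSync(
  path.join(project, "i18n", "en.json"),
  JSON.stringify([{ sourceText: "Welcome", translateText: "欢迎", elementXpath: "/html/body/h1" }])
);
const jsonFiles = listProjectFiles(project, ".json");
console.log(readDataSource(jsonFiles[0]));
```

For large projects, `fs.createReadStream` or `fs.promises` would fit the streaming approach the paragraph mentions better than the synchronous calls used here.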
[0162] 202. Perform node recognition on the text data to be replaced to obtain multiple node objects.
[0163] The node object can be the node corresponding to the relevant element on the display page. For example, a display page can contain multiple elements of different types, each element corresponding to a node, such as a display page containing a title, text, elements, attributes, etc.
[0164] It should be noted that the Document Object Model (DOM) connects web pages to scripts or programming languages by representing HTML or XML documents as objects. The programming language is usually JavaScript, but the DOM objects are not part of the JavaScript language itself. The DOM represents a document as a logical tree in which each branch ends in a node, and each node contains objects.
[0165] To identify the node objects corresponding to the text data to be replaced, this embodiment of the application, after obtaining the text data to be replaced, reads the element node data in it; parses the element node data using the preset document information search language (for example, XPath) in the server's Node environment to obtain the node object corresponding to each piece of element node data; and thereby obtains the multiple node objects corresponding to the text data to be replaced. These node objects can then be matched and replaced in subsequent steps to obtain the corresponding text.
[0166] For example, the element node (elementXpath) data is read from the specified text data to be replaced (a JSON data source) and parsed using the XML Path Language (XPath) module in the server's Node environment to obtain the DOM objects.
[0167] 203. Replace the node information of the target node object that conforms to the preset node size rules among multiple node objects to obtain the first text.
[0168] The preset node size rule can be a rule that limits the replacement of node objects and is used to match the node objects whose text data can be replaced. For example, the preset node size rule can include a node length threshold. By matching node objects against the preset node length threshold, the node objects that meet the threshold are obtained, so that text replacement can subsequently be performed on these target node objects.
[0169] In order to perform text replacement on node objects, after obtaining multiple node objects, this embodiment of the application can obtain the path information corresponding to each node object; and determine the node length of the corresponding node object based on the path information; then, determine whether the corresponding node object is successfully parsed based on the node length. If the parsing is successful, the text data of the successfully parsed target node object is replaced. Specifically, a target node object with a node length greater than a preset node length threshold is selected, and the node information of the target node object is replaced to obtain the corresponding first text.
[0170] For example, path retrieval is performed on the information of each node object (XPath node information) to obtain the set of child nodes (ChildNodes) of each node object. The length of this child node set determines the node length of the node object, and the node length determines whether the node object has been parsed successfully; successful parsing means that the node length meets the preset node length threshold. For instance, if the length of the child node set (ChildNodes) is greater than 0, the node object meets the preset node length threshold, and text replacement can be performed on it as a target node object.
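The node size rule in this example can be sketched as a partition of the node objects. This is a hypothetical illustration: plain objects stand in for DOM nodes (in practice `childNodes` would come from an XPath query), and the function name is not from the source; the default threshold of 0 mirrors the "greater than 0" example.

```javascript
// Hypothetical sketch: a node object is a target node object when its child
// node set is longer than the node length threshold; otherwise it becomes a
// node object to be processed. Plain objects stand in for DOM nodes.
function partitionByNodeSize(nodeObjects, nodeLengthThreshold = 0) {
  const targets = [];
  const pending = [];
  for (const node of nodeObjects) {
    (node.childNodes.length > nodeLengthThreshold ? targets : pending).push(node);
  }
  return { targets, pending };
}

const { targets, pending } = partitionByNodeSize([
  { xpath: "/html/body/h1", childNodes: [{ nodeValue: "Welcome" }] },
  { xpath: "/html/body/p[9]", childNodes: [] },
]);
console.log(targets.length, pending.length); // 1 1
```

The `pending` list corresponds to the node objects to be processed that are handed to character replacement in step 204.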
[0171] Furthermore, the step "replacing text on the target node object to obtain the first text" can be performed as follows: based on the document information search language, perform path retrieval on each target node object to obtain the set of target child nodes corresponding to each target node object; based on node value rules, obtain the target node value corresponding to each set of target child nodes; and generate the first text based on the target node value corresponding to each set of target child nodes. For example, the `select` method provided in the `npm xpath` module can be used to perform path retrieval on each target node object to obtain the set of target child nodes corresponding to each target node object; the `nodeValue` property can be used to obtain the target node value of each set of target child nodes, so as to generate the corresponding first text based on each target node value.
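The node-value step can be sketched without a DOM library. In the source, the child node sets come from the `select` method of the `xpath` package; here they are plain arrays of objects with a `nodeValue` property, and the dictionary lookup (sourceText to translateText) and newline join are assumptions made for illustration.

```javascript
// Hypothetical sketch of [0171]: read each child's nodeValue, look up its
// translation, and join the results into the first text. Plain objects stand
// in for the child node sets that xpath `select` would return.
function buildFirstText(targetChildNodeSets, dictionary) {
  const values = [];
  for (const childNodes of targetChildNodeSets) {
    for (const child of childNodes) {
      if (child.nodeValue == null) continue; // element nodes have a null nodeValue
      const value = child.nodeValue.trim();
      if (value) values.push(dictionary.get(value) ?? value); // keep text with no entry
    }
  }
  return values.join("\n");
}

const dictionary = new Map([["Welcome", "欢迎"], ["Sign up now", "立即注册"]]);
const firstText = buildFirstText(
  [[{ nodeValue: " Welcome " }], [{ nodeValue: null }, { nodeValue: "Sign up now" }]],
  dictionary
);
console.log(firstText);
```

Skipping `null` values mirrors the DOM convention that element nodes report `nodeValue` as `null`, while text nodes carry the displayed string.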
[0172] 204. When it is detected that the multiple node objects include a node object to be processed that does not conform to the preset node size rules, obtain the data to be processed corresponding to that node object, and perform character replacement on the target data in the data to be processed that conforms to the preset string matching rules to obtain the second text.
[0173] The preset string matching rule can be a rule that limits which data may undergo character replacement, and is used to match the data to be replaced. For example, the preset string matching rule can include a character error tolerance threshold and a character error tolerance length threshold. Selecting the target data that meets both thresholds makes the subsequent character replacement of that target data easier.
[0174] It should be noted that the embodiments of this application can perform text replacement through methods such as node object matching and replacement, and character matching and replacement. First, multiple node objects of the text data to be replaced are obtained, and target node objects that conform to preset node size rules are matched and replaced to improve the effectiveness and accuracy of text replacement. For node objects that do not conform to the preset node size rules, the corresponding data to be processed can be obtained and subjected to character replacement, which improves the success rate of text replacement for regular characters.
[0175] Specifically, to replace text in node objects that do not conform to the preset node size rules, this embodiment checks each node object against the preset node size rules. When node objects that do not conform to these rules are detected among the multiple node objects, they are identified as node objects to be processed. Then, the data to be processed corresponding to these node objects and the original text data in the text data to be replaced are obtained. The characters (or character segments) of the data to be processed are compared with the target characters (or target character segments) in the original text data to obtain the character comparison results between them. Finally, based on the character comparison results, the target data in the data to be processed that conforms to the preset string matching rules is determined, and character replacement is performed on the target data.
[0176] In some implementations, the process of "determining, based on the character comparison results, the target data in the data to be processed that conforms to the preset string matching rules" can be as follows: analyze the character comparison results to obtain the character matching degree between the data to be processed and the original data; extract from the data to be processed the first data whose character matching degree is greater than or equal to the character error tolerance threshold; obtain the first sub-data segment in the first data that matches the original data and the second sub-data segment in the first data that does not match the original data; determine the second sub-data segment whose character length is less than or equal to the character error tolerance length threshold as the error-tolerant data segment, and merge the error-tolerant data segment with the first sub-data segment to obtain the target data.
[0177] Specifically, in order to determine the target data in the data to be processed that conforms to the preset string matching rules, after obtaining the character comparison result between the data to be processed and the original data, this embodiment of the application analyzes the character comparison result to obtain the character matching degree between the characters contained in the data to be processed and the original data; and extracts data from the data to be processed whose character matching degree is greater than or equal to the character error tolerance threshold, namely the first data mentioned above.
[0178] It should be noted that, since the first data is character data that meets the character error tolerance threshold, it contains characters that match and do not match the original data. That is, the first data contains a first sub-data segment that matches the original data and a second sub-data segment that does not match the original data. In order to ensure that the character data in the unmatched second sub-data segment can be replaced later, this embodiment of the application needs to filter the second sub-data segment. Specifically, the second sub-data segment with a character length less than or equal to the character error tolerance length threshold is determined as the error-tolerant data segment. It can be understood that the error-tolerant data segment is a character data segment that can meet the string matching rules and can be replaced later. Further, the first sub-data segment and the error-tolerant data segment are merged to obtain the target data to be replaced.
[0179] Using the above method, when a node object to be processed is detected that does not conform to the preset node size rules, target data that conforms to the preset string matching rules can be matched from the data to be processed of the node object. This allows for subsequent character replacement of the target data to obtain the corresponding replacement text, i.e., the second text.
[0180] 205. When the data to be processed is detected to contain second target data that does not conform to the preset string matching rules, the second target data that does not conform to the preset string matching rules is machine translated to obtain the third text.
[0181] Machine translation here refers to directly translating or replacing data or text content with a text translation tool, for example an existing online translation tool.
[0182] It should be noted that text replacement proceeds in three stages. First, the node information of the target node objects in the text data to be replaced that conform to the preset node size rules is replaced. Next, for the node objects to be processed that do not conform to the preset node size rules, the characters of the target data in the corresponding data to be processed that conform to the preset string matching rules are replaced. Finally, when second target data that does not conform to the preset string matching rules is detected in the data to be processed, the second target data is machine translated to obtain the third text.
[0183] Specifically, in order to achieve full-text translation of the text data to be replaced, when the second target data that does not conform to the preset string matching rules is detected, since the second target data is data that cannot be replaced by node information and character replacement, the second target data can be machine translated to obtain the third text.
[0184] 206. Merge the first text, the second text, and the third text to obtain the target replacement text corresponding to the text data to be replaced.
[0185] Specifically, to obtain the replacement text corresponding to the text data to be replaced, this embodiment of the application merges three texts to obtain the merged target replacement text: the first text obtained by node information replacement, the second text obtained by character replacement, and the third text obtained by machine translation. The target replacement text is the text obtained after the text data to be replaced has been replaced.
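The overall flow of steps 201-206 can be outlined as a three-stage fallback. In this hypothetical sketch, each record from the data source goes to node information replacement first, then to character replacement, and finally to machine translation. The stage functions `replaceByNode`, `replaceByChars`, and `machineTranslate` are placeholders that the caller supplies. Merging by original record order is an assumption about how the first, second, and third texts are fused.

```javascript
// Hypothetical sketch of steps 201-206: each record falls through node
// replacement, then character replacement, then machine translation, and the
// three partial results are merged back in the original record order.
// The stage functions are caller-supplied placeholders returning text or null.

function replaceText(records, { replaceByNode, replaceByChars, machineTranslate }) {
  const first = [];  // results of node information replacement
  const second = []; // results of character replacement
  const third = [];  // results of machine translation
  records.forEach((record, index) => {
    const byNode = replaceByNode(record);
    if (byNode !== null) {
      first.push({ index, text: byNode });
      return;
    }
    const byChars = replaceByChars(record);
    if (byChars !== null) {
      second.push({ index, text: byChars });
      return;
    }
    // Data matching neither rule set is machine translated.
    third.push({ index, text: machineTranslate(record) });
  });
  // Fuse the first, second and third texts into the target replacement text.
  const merged = [...first, ...second, ...third].sort((a, b) => a.index - b.index);
  return { first, second, third, target: merged.map((r) => r.text) };
}
```

Keeping each stage as a separate callback mirrors the layered filtering described in this embodiment. Only the records that a cheaper stage could not handle reach the next stage.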
[0186] Refer to Figure 4, which is a schematic diagram of a scenario of the text replacement method provided in this embodiment of the application. By performing steps 201-206 above, the scenario shown in Figure 4 can be realized, specifically as follows:
[0187] 301. Based on the command-line interaction service of the preset service language environment, read the text data to be replaced through the server's stream channel.
[0188] Specifically, the inquirer command-line interaction module in the Node.js environment allows users to set the data to be passed in. This includes configuration information such as the selected project, the data format to be parsed, the address of the data source to be read, and the language.
[0189] Selecting the project: the project addresses in the current directory are read. Once a project is selected, all files in that project are read using Node's streaming capabilities.
[0190] Entering the data format to be parsed: all files of the specified format in the current project are read. The default is HTML files; if JSON files need to be read, this can be configured directly.
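Reading the files of the chosen format from a selected project might look like the following Node.js sketch. It assumes the project is a local directory and that the file format is identified by its extension. `collectFiles` and `readWithStream` are hypothetical helpers built only on the built-in `fs` and `path` modules (`fs.readdirSync` with `withFileTypes`, `fs.createReadStream`).

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical sketch: recursively collect every file of the chosen format
// under a project directory. The default "html" follows the configuration
// step; matching by file extension is an assumption.
function collectFiles(projectDir, format = "html") {
  const wanted = "." + format.toLowerCase();
  const found = [];
  for (const entry of fs.readdirSync(projectDir, { withFileTypes: true })) {
    const full = path.join(projectDir, entry.name);
    if (entry.isDirectory()) {
      found.push(...collectFiles(full, format));
    } else if (entry.isFile() && path.extname(entry.name).toLowerCase() === wanted) {
      found.push(full);
    }
  }
  return found;
}

// Read one file through a stream and resolve with its full text.
function readWithStream(file) {
  return new Promise((resolve, reject) => {
    let text = "";
    fs.createReadStream(file, { encoding: "utf8" })
      .on("data", (chunk) => { text += chunk; })
      .on("end", () => resolve(text))
      .on("error", reject);
  });
}
```

Switching to JSON data is then only a matter of calling `collectFiles(dir, "json")`. This corresponds to the "configure it directly" option above.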
[0192] Entering the address of the data source to be read: the JSON data source specified by the user is read. Its fields include the source text data (sourceText), the translated text data (translateText), the XPath node (elementXpath), and so on. The XPath node (elementXpath) corresponds to the element node data in this application. When content is translated with the web annotation tool, an XPath node path is generated, and the corresponding DOM object can be located quickly by matching that path. For example, /A/B/C/B[1] selects the first B child element of the C element reached via A → B → C.
[0193] Language selection: the tool provides several commonly used languages, such as English (EN) and Japanese (JP).
[0194] 302. Perform node object information matching and replacement.
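To make the path semantics concrete, the sketch below resolves absolute paths of the form /A/B/C/B[1] over a plain object tree. It is a deliberately tiny, hypothetical stand-in for a real XPath engine. It supports only child steps with an optional 1-based position. The `{ tag, value, children }` node shape is an assumption made so that the example runs without a DOM.

```javascript
// Hypothetical, dependency-free sketch of resolving an absolute path such as
// /A/B/C/B[1]: A (root) -> its B children -> their C children -> the first
// B child of each C. Only child steps with an optional [n] are supported.

function resolvePath(root, xpath) {
  const steps = xpath.split("/").filter((s) => s.length > 0);
  let context = [{ children: [root] }]; // virtual document node
  for (const step of steps) {
    const m = /^([A-Za-z_][\w.-]*)(?:\[(\d+)\])?$/.exec(step);
    if (!m) throw new Error(`Unsupported step: ${step}`);
    const [, tag, pos] = m;
    const next = [];
    for (const node of context) {
      const sameTag = (node.children || []).filter((c) => c.tag === tag);
      if (pos === undefined) {
        next.push(...sameTag); // no predicate: every child with this tag
      } else if (sameTag[Number(pos) - 1]) {
        next.push(sameTag[Number(pos) - 1]); // XPath positions are 1-based
      }
    }
    context = next;
  }
  return context;
}
```

An empty result array is what a failed path looks like. This is the condition that the length check in step 302 relies on.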
[0195] The elementXpath data is read from the specified JSON data source and parsed with the XPath module in Node to obtain the DOM object. Because the web annotation tool generates an XPath node path at translation time, the corresponding DOM object can be found quickly by matching that path. For example, /A/B/C/B[1] selects the first B child of the C element reached via A → B → C.
[0196] Furthermore, whether XPath parsing succeeds is determined by the length of the DOM object collection currently obtained. Specifically, path retrieval is performed using the XPath node information, and a collection of child nodes (ChildNodes) is returned. If the length of ChildNodes is greater than 0, the retrieval is successful and data replacement is performed, as follows:
- The `select` method provided by the `xpath` npm module performs the path retrieval and returns the child node collection.
- The `nodeValue` property is used to obtain the target node value of each child node in the collection.
- A corresponding first text is generated from each target node value, and the related data of that first text is recorded.

If the retrieval fails, the data that failed to match is stored.
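The success check and replacement in [0196] could be structured as follows. This is a sketch, not the embodiment's code. `select` is injected so that the example runs without dependencies; with the `xpath` npm package one would pass a wrapper around its `select(expression, node)` function. Replacing the whole `nodeValue` with `translateText` is an assumption. `lengthThreshold` stands in for the preset node length threshold and defaults to 0 to match the "greater than 0" test.

```javascript
// Hypothetical sketch of step 302. `select(path, doc)` is caller-supplied so
// the sketch stays dependency-free; `lengthThreshold` models the preset node
// length threshold (0 here, i.e. "length greater than 0" means success).

function replaceNodeInfo(records, doc, select, lengthThreshold = 0) {
  const firstText = []; // successful node information replacements
  const failed = [];    // records handed on to character matching (step 303)
  for (const record of records) {
    const nodes = select(record.elementXpath, doc) || [];
    if (nodes.length > lengthThreshold) {
      for (const node of nodes) {
        const before = node.nodeValue;
        node.nodeValue = record.translateText; // assumption: whole value replaced
        firstText.push({ xpath: record.elementXpath, before, after: node.nodeValue });
      }
    } else {
      failed.push(record); // retrieval failed: store the unmatched data
    }
  }
  return { firstText, failed };
}
```

Returning the failed records, rather than discarding them, is what lets step 303 pick up exactly the data that node matching could not handle.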
[0197] 303. Perform character matching and replacement on data that fails to match.
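[0198] The Knuth-Morris-Pratt (KMP) algorithm is an efficient pattern matching algorithm. It uses partial match values to calculate how many positions to shift the pattern after a mismatch. A partial match value is the length of the longest proper prefix of the matched part of the pattern that is also a suffix of it. Because of this shift, the position in the text never has to backtrack, which improves matching efficiency.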
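A standard KMP search illustrates the "partial match value" idea in [0198]. The sketch below is the textbook algorithm rather than the embodiment's code; `buildPrefixTable` and `kmpSearch` are illustrative names. On a mismatch, only the pattern is shifted, using the prefix table. The text index keeps moving forward, which is why no backtracking in the text is needed.

```javascript
// Knuth-Morris-Pratt search. prefix[i] is the "partial match value": the
// length of the longest proper prefix of pattern[0..i] that is also a suffix.

function buildPrefixTable(pattern) {
  const prefix = new Array(pattern.length).fill(0);
  let k = 0;
  for (let i = 1; i < pattern.length; i++) {
    while (k > 0 && pattern[i] !== pattern[k]) k = prefix[k - 1];
    if (pattern[i] === pattern[k]) k++;
    prefix[i] = k;
  }
  return prefix;
}

// Returns every start index of pattern in text; the text index i never moves back.
function kmpSearch(text, pattern) {
  if (pattern.length === 0) return [];
  const prefix = buildPrefixTable(pattern);
  const hits = [];
  let k = 0; // number of pattern characters currently matched
  for (let i = 0; i < text.length; i++) {
    while (k > 0 && text[i] !== pattern[k]) k = prefix[k - 1]; // shift the pattern, not the text
    if (text[i] === pattern[k]) k++;
    if (k === pattern.length) {
      hits.push(i - k + 1);
      k = prefix[k - 1]; // continue, allowing overlapping matches
    }
  }
  return hits;
}
```

For the pattern "ABABCABAB" the partial match values are 0, 0, 1, 2, 0, 1, 2, 3, 4. In the text "ABABDABACDABABCABAB" the search reports a single match starting at index 10.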
[0199] For example, if the XPath matching method fails to parse the node information in the data, the program records both the data whose node parsing succeeded and the data whose node parsing failed. This embodiment of the application then performs KMP character matching on the failed matching data (the data to be processed in this embodiment). Specifically, based on the KMP matching rules, the original page data corresponding to the failed matching data is obtained with the KMP algorithm and compared with the original text data (sourceText) in the data source. Data that conforms to the preset string matching rules (regular matching and fault-tolerant configuration) is defined as successfully matched data (the target data in this embodiment), and the successfully matched data is then replaced with the new characters.
[0200] Furthermore, during the matching process, data that fails character matching is recorded for subsequent machine translation. Among the data that fails character matching, the data that meets the preset character error tolerance rules (the second target sub-data in this embodiment) can also be used for subsequent data analysis. The analysis results can then be used to optimize the preset string matching rules.
[0201] It should be noted that, before character matching, the matching mechanism can be customized according to the language selected by the user. The matching mechanism includes:
- the preset string matching rules (regular matching, error tolerance configuration);
- the preset character error tolerance rules (character analysis configuration for data that fails character matching).

This allows regular matching and error-tolerant matching to be performed together, improving the matching success rate. In addition, after matching is completed, the valid data among the character matching failures is extracted for subsequent analysis.
[0202] In summary, during character replacement the data may include successfully matched characters, failed matches, and matches that failed but are valuable for analysis. A JSON log file is created with the fs module of Node on the server, and the above data is stored in this log file. The data can then be analyzed later, manually or by machine. The analysis results are used to optimize the fault tolerance configuration for character matching, that is, to optimize the preset string matching rules. In this way, the reasons why analysis-worthy matches failed can be examined, and the custom matching mechanism can be continuously optimized to improve the matching success rate.
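The JSON log in [0202] could be written with Node's built-in `fs` module roughly as follows. The three buckets mirror the categories named above. The file layout, the field names, and the `writeMatchLog` helper are hypothetical.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical sketch of the JSON log in [0202]. The layout and field names
// are illustrative assumptions, not the embodiment's actual format.
function writeMatchLog(logPath, { matched, failed, analyzable }) {
  const log = {
    createdAt: new Date().toISOString(),
    summary: {
      matched: matched.length,
      failed: failed.length,
      analyzable: analyzable.length,
    },
    matched,    // successfully matched and replaced characters
    failed,     // handed to machine translation (step 304)
    analyzable, // failed, but worth analyzing to tune the fault tolerance configuration
  };
  fs.mkdirSync(path.dirname(logPath), { recursive: true });
  fs.writeFileSync(logPath, JSON.stringify(log, null, 2), "utf8");
  return log.summary;
}
```

Keeping the failure categories in separate arrays lets a later analysis pass read only the `analyzable` entries when adjusting the fault tolerance configuration.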
[0203] 304. Perform machine translation on data where character matching fails, in order to complete the matching and replacement of all text data to be replaced.
[0204] Specifically, for character matching failure data, this application embodiment uses machine translation to directly translate the character matching failure data, such as using an online translation function or an installed translation tool, to obtain the corresponding replacement text.
[0205] By executing steps 301-304 above, batch character replacement can be performed on all files of the corresponding project on the display page. This reduces the workload of manually modifying and replacing content and makes direct translation of content easier. Furthermore, multiple matching techniques are used for layered filtering, and a fault tolerance configuration entry is provided so that the fault tolerance mechanism can be customized, improving the success rate and efficiency of character matching. The method therefore does not merely perform simple matching and replacement. It also collects the successful and failed data from the character matching process and generates log files, allowing users to view the data items more intuitively.
[0206] As can be seen from the above, this application embodiment can read the text data to be replaced from the display page based on a preset service language environment; perform node recognition on the text data to be replaced to obtain multiple node objects; replace the node information of the target node objects that conform to the preset node size rules among the multiple node objects to obtain the first text; when it is detected that the multiple node objects contain a node object to be processed that does not conform to the preset node size rules, obtain the data to be processed corresponding to the node object to be processed, and replace the target data that conforms to the preset string matching rules in the data to be processed with characters to obtain the second text; merge the first text and the second text to obtain the target replacement text corresponding to the text data to be replaced. Thus, this application embodiment, by performing node recognition on the text data to be replaced, facilitates the replacement of node information of node objects that meet the requirements. Furthermore, when there are some node objects that do not meet the requirements, character matching and replacement are performed on the data of the node object, and the text obtained by node information replacement and character replacement are merged to obtain the target replacement text of the text data to be replaced. In this way, the phenomenon of backtracking to the corresponding character due to character matching failure is avoided, effectively improving the matching efficiency. Moreover, by combining node matching and character matching, the success rate of text replacement and the efficiency of text replacement are improved.
[0207] To better implement the above methods, this application also provides a text replacement device, which can be integrated into a network device, such as a server or terminal. The terminal may include a tablet computer, a laptop computer, and/or a personal computer.
[0208] For example, as shown in Figure 5, the text replacement device may include a reading unit 501, a recognition unit 502, a first replacement unit 503, a second replacement unit 504, and a fusion unit 505.
[0209] The reading unit 501 is used to read the text data to be replaced from the display page based on the preset service language environment;
[0210] The recognition unit 502 is used to perform node recognition on the text data to be replaced, and obtain multiple node objects;
[0211] The first replacement unit 503 is used to replace the node information of a target node object that conforms to a preset node size rule among multiple node objects to obtain the first text.
[0212] The second replacement unit 504 is used to obtain the data to be processed corresponding to the node object to be processed when multiple node objects are detected to contain a node object to be processed that does not conform to the preset node size rule, and replace the target data in the data to be processed that conforms to the preset string matching rule with characters to obtain the second text.
[0213] The fusion unit 505 is used to fuse the first text and the second text to obtain the target replacement text corresponding to the text data to be replaced.
[0214] In some embodiments, the recognition unit 502 is further configured to: read multiple element node data from the text data to be replaced; parse each element node data based on a preset document information search language to obtain the node object corresponding to each element node data; and determine multiple node objects to be replaced based on the node object corresponding to each element node data.
[0215] In some embodiments, the preset node size rule includes a preset node length threshold. The first replacement unit 503 is further configured to: obtain path information corresponding to each node object among multiple node objects; determine the node length of each node object based on the path information; determine the node object whose node length is greater than the preset node length threshold as the target node object, and replace the node information of the target node object to obtain the first text.
[0216] In some embodiments, the second replacement unit 504 is further configured to: obtain the original text data in the text data to be replaced; compare the data to be processed with the original text data character by character to obtain a character comparison result; determine the target data in the data to be processed that conforms to a preset string matching rule based on the character comparison result; and perform character replacement on the target data.
[0217] In some embodiments, the second replacement unit 504 is further configured to: parse the character comparison results to obtain the character matching degree between the data to be processed and the original data; extract first data from the data to be processed whose character matching degree is greater than or equal to the character error tolerance threshold; obtain a first sub-data segment in the first data that matches the original data, and obtain a second sub-data segment in the first data that does not match the original data; determine the second sub-data segment whose character length is less than or equal to the character error tolerance length threshold as the error tolerance data segment, and merge the error tolerance data segment with the first sub-data segment to obtain the target data.
[0218] In some embodiments, the text replacement device further includes a third replacement unit, configured to: when it is detected that the data to be processed contains second target data that does not conform to a preset string matching rule, perform machine translation on the second target data that does not conform to the preset string matching rule to obtain third text.
[0219] The fusion unit 505 is further used to: fuse the first text, the second text and the third text to obtain the target replacement text corresponding to the text data to be replaced.
[0220] In some embodiments, the reading unit 501 is further configured to: receive text parameters to be replaced based on a preset service language environment, the text parameters to be replaced including project identifier, file format, target data type and language type; obtain the project to be converted corresponding to the project identifier in the display page; extract the target format file corresponding to the file format in the project to be converted; read the target type data corresponding to the target data type in the target format file; configure the target type data according to the language type to obtain the text data to be replaced.
[0221] In some embodiments, the text replacement device further includes an update unit, specifically configured to: when it is detected that the data to be processed contains second target data that does not conform to a preset string matching rule, extract second target sub-data that conforms to a preset character error tolerance rule from the second target data, and establish a language data file corresponding to the second target sub-data; read the character error tolerance parameters in the language data file; update the preset string matching rule according to the character error tolerance parameters to obtain the updated preset string matching rule; and perform character replacement on the second target sub-data based on the updated preset string matching rule to obtain the target sub-text.
[0222] The fusion unit 505 is also used to fuse the first text, the second text, and the target sub-text to obtain the target replacement text.
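The update unit's rule refresh could be sketched as below. The sketch assumes that the language data file is JSON holding a character error tolerance ratio and length (`toleranceThreshold` and `toleranceLength`, both hypothetical field names). It also assumes that the preset string matching rules are a plain object with the same fields. `replaceChars` is a caller-supplied stand-in for the character replacement stage.

```javascript
// Hypothetical sketch of the update unit: character error tolerance
// parameters from a language data file (JSON text) override the preset
// string matching rules, then the second target sub-data is matched again.
// Field names and default values are illustrative assumptions.

const DEFAULT_RULES = Object.freeze({ toleranceThreshold: 0.8, toleranceLength: 2 });

function updateRules(rules, languageDataJson) {
  const params = JSON.parse(languageDataJson);
  const updated = { ...rules }; // never mutate the preset rules in place
  if (typeof params.toleranceThreshold === "number") {
    if (params.toleranceThreshold < 0 || params.toleranceThreshold > 1) {
      throw new RangeError("toleranceThreshold must be within [0, 1]");
    }
    updated.toleranceThreshold = params.toleranceThreshold;
  }
  if (Number.isInteger(params.toleranceLength) && params.toleranceLength >= 0) {
    updated.toleranceLength = params.toleranceLength;
  }
  return updated;
}

// Re-run character replacement on the sub-data under the updated rules;
// `replaceChars(item, rules)` returns the target sub-text or null.
function retryWithUpdatedRules(subData, rules, replaceChars) {
  return subData
    .map((item) => replaceChars(item, rules))
    .filter((text) => text !== null);
}
```

Parameters missing from the language data file simply leave the corresponding preset values unchanged.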
[0223] As can be seen from the above, in this embodiment of the application, the units operate as follows:
- the reading unit 501 reads the text data to be replaced from the display page based on a preset service language environment;
- the recognition unit 502 performs node recognition on the text data to be replaced to obtain multiple node objects;
- the first replacement unit 503 replaces the node information of the target node objects that conform to the preset node size rules among the multiple node objects to obtain the first text;
- when it detects that the multiple node objects contain a node object to be processed that does not conform to the preset node size rules, the second replacement unit 504 obtains the data to be processed corresponding to that node object and replaces the target data in it that conforms to the preset string matching rules with characters to obtain the second text;
- the fusion unit 505 fuses the first text and the second text to obtain the target replacement text corresponding to the text data to be replaced.

Therefore, this embodiment of the application performs node recognition on the text data to be replaced so that the node information of node objects that meet the requirements can be replaced. When some node objects do not meet the requirements, their data is replaced by character matching. The text obtained by node information replacement and the text obtained by character replacement are then merged to obtain the target replacement text of the text data to be replaced. In this way, backtracking to the corresponding character after a character matching failure is avoided, effectively improving matching efficiency. Furthermore, by combining node matching and character matching, both the success rate and the efficiency of text replacement are improved.
[0224] This application also provides a computer device. Figure 6 is a schematic structural diagram of the computer device involved in the embodiments of this application. Specifically:
[0225] The computer device may include components such as a processor 601 with one or more processing cores, a memory 602 with one or more computer-readable storage media, a power supply 603, and an input unit 604. Those skilled in the art will understand that the computer device structure shown in Figure 6 does not constitute a limitation on the computer device. The device may include more or fewer components than shown, combine certain components, or use a different component arrangement. Wherein:
[0226] The processor 601 is the control center of the computer device. It connects the various parts of the computer device through various interfaces and lines. It performs the various functions of the computer device and processes data by running or executing the software programs and/or modules stored in the memory 602 and calling the data stored in the memory 602, thereby monitoring the computer device as a whole. Optionally, the processor 601 may include one or more processing cores. Preferably, the processor 601 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, and applications, and the modem processor mainly handles wireless communication. It is understood that the modem processor may also not be integrated into the processor 601.
[0227] The memory 602 can be used to store software programs and modules. The processor 601 executes various functional applications and data processing by running the software programs and modules stored in the memory 602. The memory 602 may mainly include a program storage area and a data storage area. The program storage area may store the operating system, application programs required for at least one function (such as a sound playback function or an image playback function), etc.; the data storage area may store data created according to the use of the computer device, etc. In addition, the memory 602 may include high-speed random access memory. It may also include non-volatile memory, such as at least one disk storage device, flash memory device, or other non-volatile solid-state storage device. Accordingly, the memory 602 may also include a memory controller to provide the processor 601 with access to the memory 602.
[0228] The computer device also includes a power supply 603 that supplies power to the various components. Preferably, the power supply 603 can be logically connected to the processor 601 through a power management system, thereby enabling functions such as charging, discharging, and power consumption management through the power management system. The power supply 603 may also include one or more DC or AC power supplies, recharging systems, power fault detection circuits, power converters or inverters, power status indicators, and any other components.
[0229] The computer device may also include an input unit 604, which can be used to receive input digital or character information and generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.
[0230] Although not shown, the computer device may also include a display unit, etc., which will not be described in detail here. Specifically, in the embodiments of this application, the processor 601 in the computer device loads the executable files corresponding to the processes of one or more applications into the memory 602 according to the following instructions, and the processor 601 runs the applications stored in the memory 602 to realize various functions, as follows:
[0231] Based on a preset service language environment, the system reads the text data to be replaced from the display page; performs node recognition on the text data to be replaced to obtain multiple node objects; replaces the node information of target node objects that conform to preset node size rules among the multiple node objects to obtain the first text; when multiple node objects are detected to contain a node object to be processed that does not conform to preset node size rules, the system obtains the data to be processed corresponding to the node object to be processed, and replaces the target data that conforms to preset string matching rules in the data to be processed with characters to obtain the second text; and merges the first text and the second text to obtain the target replacement text corresponding to the text data to be replaced.
[0232] For details on the implementation of each of the above operations, please refer to the previous examples, which will not be repeated here.
[0234] Those skilled in the art will understand that all or part of the steps in the various methods of the above embodiments can be performed by instructions, or by instructions controlling related hardware. These instructions can be stored in a computer-readable storage medium and loaded and executed by a processor.
[0235] Therefore, embodiments of this application provide a computer-readable storage medium storing a plurality of instructions that can be loaded by a processor to execute steps in any of the text replacement methods provided in embodiments of this application. For example, the instructions can execute the following steps:
[0236] Based on a preset service language environment, the system reads the text data to be replaced from the display page; performs node recognition on the text data to be replaced to obtain multiple node objects; replaces the node information of target node objects that conform to preset node size rules among the multiple node objects to obtain the first text; when multiple node objects are detected to contain a node object to be processed that does not conform to preset node size rules, the system obtains the data to be processed corresponding to the node object to be processed, and replaces the target data that conforms to preset string matching rules in the data to be processed with characters to obtain the second text; and merges the first text and the second text to obtain the target replacement text corresponding to the text data to be replaced.
[0237] Furthermore, embodiments of this application also provide a computer program product, which includes computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium, and by executing the computer instructions, performs the following steps:
[0238] Based on a preset service language environment, the system reads the text data to be replaced from the display page; performs node recognition on the text data to be replaced to obtain multiple node objects; replaces the node information of target node objects that conform to preset node size rules among the multiple node objects to obtain the first text; when multiple node objects are detected to contain a node object to be processed that does not conform to preset node size rules, the system obtains the data to be processed corresponding to the node object to be processed, and replaces the target data that conforms to preset string matching rules in the data to be processed with characters to obtain the second text; and merges the first text and the second text to obtain the target replacement text corresponding to the text data to be replaced.
[0239] For details on the implementation of each of the above operations, please refer to the previous examples, which will not be repeated here.
[0240] The computer-readable storage medium may include: read-only memory (ROM), random access memory (RAM), disk or optical disk, etc.
[0241] Since the instructions stored in the computer-readable storage medium can execute the steps of any of the text replacement methods provided in the embodiments of this application, the beneficial effects that any of the text replacement methods provided in the embodiments of this application can achieve can be realized, as detailed in the preceding embodiments, and will not be repeated here.
[0242] The foregoing has provided a detailed description of a text replacement method, apparatus, and computer-readable storage medium provided in the embodiments of this application. Specific examples have been used to illustrate the principles and implementation methods of this application. The descriptions of the embodiments above are only for the purpose of helping to understand the method and core ideas of this application. At the same time, for those skilled in the art, there will be changes in the specific implementation methods and application scope based on the ideas of this application. Therefore, the content of this specification should not be construed as a limitation of this application.
Claims
1. A text replacement method, characterized in that it comprises: Based on the preset service language environment, the text data to be replaced is read from the display page; Node identification is performed on the text data to be replaced to obtain multiple node objects; The node information of the target node object that conforms to the preset node size rules among the multiple node objects is replaced to obtain the first text; When it is detected that the plurality of node objects contain a node object to be processed that does not conform to the preset node size rule, the data to be processed corresponding to the node object to be processed is obtained, and the target data in the data to be processed that conforms to the preset string matching rule is replaced with characters to obtain the second text; When the data to be processed is detected to contain second target data that does not conform to the preset string matching rules, the second target data is machine translated to obtain the third text; The step of fusing the first text and the second text to obtain the target replacement text corresponding to the text data to be replaced includes: fusing the first text, the second text, and the third text to obtain the target replacement text corresponding to the text data to be replaced.
2. The method according to claim 1, characterized in that the step of performing node identification on the text data to be replaced to obtain multiple node objects includes: Reading multiple element node data from the text data to be replaced; Parsing each element node data based on a preset document information search language to obtain the node object corresponding to each element node data; Determining multiple node objects to be replaced based on the node object corresponding to each element node data.
3. The method according to claim 1, wherein the preset node size rule comprises a preset node length threshold, and performing node information replacement on the target node object, among the plurality of node objects, that conforms to the preset node size rule to obtain the first text comprises: obtaining path information corresponding to each of the plurality of node objects; determining a node length of each node object based on the path information; and determining node objects whose node length is greater than the preset node length threshold as target node objects, and performing node information replacement on the target node objects to obtain the first text.
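The claim does not define how the node length follows from the path information. One possible reading, assumed here, is that each node is addressed by an XPath-like path such as `div/p[2]` and that its node length is the character count of the text found at that path. The helper names `node_paths` and `split_by_node_length` and the threshold used in the usage line are illustrative only.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple


def node_paths(root: ET.Element) -> Dict[str, str]:
    """Map an XPath-like path such as 'div/p[2]' to the stripped text of that element."""
    paths: Dict[str, str] = {}

    def walk(elem: ET.Element, path: str) -> None:
        paths[path] = (elem.text or "").strip()
        counts: Dict[str, int] = {}  # per-tag sibling counters, 1-based like XPath
        for child in elem:
            counts[child.tag] = counts.get(child.tag, 0) + 1
            walk(child, f"{path}/{child.tag}[{counts[child.tag]}]")

    walk(root, root.tag)
    return paths


def split_by_node_length(markup: str, threshold: int) -> Tuple[List[str], List[str]]:
    """Split node paths into target nodes (length > threshold) and nodes left for string matching."""
    targets: List[str] = []
    pending: List[str] = []
    for path, text in node_paths(ET.fromstring(markup)).items():
        if not text:
            continue  # nodes without text need no replacement
        (targets if len(text) > threshold else pending).append(path)
    return targets, pending


print(split_by_node_length("<div><p>Short</p><p>This paragraph is long enough</p></div>", 10))
# (['div/p[2]'], ['div/p[1]'])
```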
4. The method according to claim 1, wherein performing character replacement on the target data, in the data to be processed, that conforms to the preset string matching rule comprises: obtaining original data from the text data to be replaced; comparing the data to be processed with the original data character by character to obtain a character comparison result; determining, based on the character comparison result, the target data in the data to be processed that conforms to the preset string matching rule; and performing character replacement on the target data.
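A minimal sketch of the character-by-character comparison in claim 4 follows. It assumes that the original data is a known source string paired with a known replacement, and that the default string matching rule is an exact match at every position. `compare_chars` and `replace_if_match` are hypothetical names. A purely positional comparison cannot realign after an inserted or missing character, which is one reason a tolerance rule such as that of claim 5 is useful.

```python
from typing import Callable, List, Optional


def compare_chars(pending: str, original: str) -> List[bool]:
    """Compare position by position; True where both strings have the same character."""
    length = max(len(pending), len(original))
    return [
        i < len(pending) and i < len(original) and pending[i] == original[i]
        for i in range(length)
    ]


def replace_if_match(
    pending: str,
    original: str,
    replacement: str,
    rule: Callable[[List[bool]], bool] = all,  # assumed default rule: every position must match
) -> Optional[str]:
    """Return the replacement if the comparison result satisfies the matching rule, else None."""
    result = compare_chars(pending, original)
    return replacement if rule(result) else None


print(compare_chars("cat", "car"))               # [True, True, False]
print(replace_if_match("Save", "Save", "保存"))  # 保存
```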
5. The method according to claim 4, wherein the preset string matching rule comprises a character error tolerance threshold and a character error tolerance length threshold, and determining, based on the character comparison result, the target data in the data to be processed that conforms to the preset string matching rule comprises: parsing the character comparison result to obtain a character matching degree between the data to be processed and the original data; extracting, from the data to be processed, first data whose character matching degree is greater than or equal to the character error tolerance threshold; obtaining a first sub-data segment of the first data that matches the original data, and obtaining a second sub-data segment of the first data that does not match the original data; and determining a second sub-data segment whose character length is less than or equal to the character error tolerance length threshold as a fault-tolerant data segment, and merging the fault-tolerant data segment with the first sub-data segment to obtain the target data.
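Claim 5 can be sketched with Python's `difflib.SequenceMatcher`, used here as a stand-in for the character comparison; the application does not say which alignment method it uses. `ratio()` plays the role of the character matching degree, `equal` opcodes give the first sub-data segments, and the remaining opcodes give the second sub-data segments. The default thresholds are assumed values, and `tolerant_target` is a hypothetical name. Following a literal reading of the claim, mismatched segments longer than the length threshold are left out of the merged target data.

```python
from difflib import SequenceMatcher
from typing import List, Optional


def tolerant_target(
    pending: str,
    original: str,
    tolerance: float = 0.8,  # assumed character error tolerance threshold
    max_gap: int = 2,        # assumed character error tolerance length threshold
) -> Optional[str]:
    """Return target data built from matching and fault-tolerant segments, or None."""
    matcher = SequenceMatcher(None, pending, original, autojunk=False)
    # ratio() stands in for the character matching degree.
    if matcher.ratio() < tolerance:
        return None
    pieces: List[str] = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        segment = pending[i1:i2]
        if tag == "equal":
            pieces.append(segment)  # first sub-data segment: matches the original
        elif len(segment) <= max_gap:
            pieces.append(segment)  # second sub-data segment short enough to tolerate
        # Longer mismatched segments are left out of the merged target data.
    return "".join(pieces)


print(tolerant_target("Save all changes now", "Save changes now"))             # Save changes now
print(tolerant_target("Save all changes now", "Save changes now", max_gap=4))  # Save all changes now
```

Raising the length threshold from 2 to 4 lets the four-character segment " all" count as fault-tolerant, so it is kept in the merged target data.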
6. The method according to claim 1, wherein reading the text data to be replaced from the display page based on the preset service language environment comprises: receiving, based on the preset service language environment, text parameters to be replaced, the text parameters comprising a project identifier, a file format, a target data type, and a language type; obtaining a project to be converted that corresponds to the project identifier and is displayed on the display page; extracting, from the project to be converted, a target format file corresponding to the file format; reading, from the target format file, target type data corresponding to the target data type; and configuring the target type data according to the language type to obtain the text data to be replaced.
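The parameter-driven read of claim 6 might look like the following sketch. It assumes that a project is held in memory as a mapping from file names to file contents, that the file format is matched by file extension, that the target data type is a top-level JSON key, and that configuring by language type means tagging each entry with the target language. `ReplaceParams`, `read_text_to_replace`, and the sample key `strings` are all hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ReplaceParams:
    project_id: str   # project identifier
    file_format: str  # e.g. ".json" (assumed to be a file extension)
    data_type: str    # e.g. "strings" (hypothetical JSON key)
    language: str     # e.g. "zh-CN"


def read_text_to_replace(
    projects: Dict[str, Dict[str, str]], params: ReplaceParams
) -> List[Dict[str, str]]:
    """Select the project, its target-format files, and the target-type data, tagged by language."""
    project = projects[params.project_id]  # project to be converted
    entries: List[Dict[str, str]] = []
    for name, content in project.items():
        if not name.endswith(params.file_format):
            continue  # not a target format file
        data = json.loads(content)
        for text in data.get(params.data_type, []):  # target type data
            entries.append({"text": text, "lang": params.language})
    return entries


projects = {"p1": {"ui.json": '{"strings": ["Save", "Cancel"], "ids": [1]}', "notes.txt": "ignore"}}
print(read_text_to_replace(projects, ReplaceParams("p1", ".json", "strings", "zh-CN")))
# [{'text': 'Save', 'lang': 'zh-CN'}, {'text': 'Cancel', 'lang': 'zh-CN'}]
```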
7. The method according to claim 1, wherein after performing character replacement on the target data, in the data to be processed, that conforms to the preset string matching rule, the method further comprises: when it is detected that the data to be processed contains second target data that does not conform to the preset string matching rule, extracting, from the second target data, second target sub-data that conforms to a preset character error tolerance rule, and creating a language data file corresponding to the second target sub-data; reading a character error tolerance parameter from the language data file; updating the preset string matching rule according to the character error tolerance parameter to obtain an updated preset string matching rule; performing character replacement on the second target sub-data based on the updated preset string matching rule to obtain a target sub-text; and fusing the first text, the second text, and the target sub-text to obtain the target replacement text.
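One way to realize the rule update of claim 7 is to persist the tolerance parameters alongside the second target sub-data in a JSON language data file, read them back, and rerun matching with the relaxed rule. The file layout, the `MatchRule` fields and defaults, and the simplified `matches` check (matching degree plus longest mismatched run, using `difflib.SequenceMatcher` as a stand-in) are assumptions for illustration, not the format used in the application.

```python
import json
from dataclasses import dataclass, replace
from difflib import SequenceMatcher
from typing import List


@dataclass(frozen=True)
class MatchRule:
    tolerance: float = 0.95  # assumed character error tolerance threshold
    max_gap: int = 1         # assumed character error tolerance length threshold


def matches(pending: str, original: str, rule: MatchRule) -> bool:
    """Simplified string matching rule: matching degree and longest mismatched run."""
    m = SequenceMatcher(None, pending, original, autojunk=False)
    gaps = [i2 - i1 for tag, i1, i2, _j1, _j2 in m.get_opcodes() if tag != "equal"]
    return m.ratio() >= rule.tolerance and max(gaps, default=0) <= rule.max_gap


def create_language_file(path: str, sub_data: List[str], tolerance: float, max_gap: int) -> None:
    """Write the second target sub-data and its tolerance parameters to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"entries": sub_data, "tolerance": tolerance, "max_gap": max_gap}, f, ensure_ascii=False)


def update_rule(path: str, rule: MatchRule) -> MatchRule:
    """Read the tolerance parameters back and return an updated (new) rule object."""
    with open(path, encoding="utf-8") as f:
        params = json.load(f)
    return replace(rule, tolerance=params["tolerance"], max_gap=params["max_gap"])
```

Under the default rule, "Save all changes now" does not match "Save changes now"; after a language data file with `tolerance=0.85` and `max_gap=4` is written and read back, the updated rule accepts it, and the sub-data can then be replaced and fused with the first and second texts.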
8. A text replacement apparatus, characterized in that it comprises: a reading unit, configured to read text data to be replaced from a display page based on a preset service language environment; an identification unit, configured to perform node identification on the text data to be replaced to obtain a plurality of node objects; a first replacement unit, configured to perform node information replacement on a target node object, among the plurality of node objects, that conforms to a preset node size rule, to obtain a first text; a second replacement unit, configured to, when it is detected that the plurality of node objects contain a node object to be processed that does not conform to the preset node size rule, obtain data to be processed corresponding to the node object to be processed, and perform character replacement on target data, in the data to be processed, that conforms to a preset string matching rule, to obtain a second text; a third replacement unit, configured to, when it is detected that the data to be processed contains second target data that does not conform to the preset string matching rule, perform machine translation on the second target data to obtain a third text; and a fusion unit, configured to fuse the first text and the second text to obtain a target replacement text corresponding to the text data to be replaced, wherein the fusing comprises fusing the first text, the second text, and the third text to obtain the target replacement text.
9. The apparatus according to claim 8, wherein the identification unit is further configured to: read a plurality of pieces of element node data from the text data to be replaced; parse each piece of element node data based on a preset document information search language to obtain a node object corresponding to each piece of element node data; and determine, based on the node objects corresponding to the pieces of element node data, a plurality of node objects to be replaced.
10. The apparatus according to claim 8, wherein the preset node size rule comprises a preset node length threshold, and the first replacement unit is further configured to: obtain path information corresponding to each of the plurality of node objects; determine a node length of each node object based on the path information; and determine node objects whose node length is greater than the preset node length threshold as target node objects, and perform node information replacement on the target node objects to obtain the first text.
11. The apparatus according to claim 8, wherein the second replacement unit is further configured to: obtain original data from the text data to be replaced; compare the data to be processed with the original data character by character to obtain a character comparison result; determine, based on the character comparison result, the target data in the data to be processed that conforms to the preset string matching rule; and perform character replacement on the target data.
12. The apparatus according to claim 11, wherein the second replacement unit is further configured to: parse the character comparison result to obtain a character matching degree between the data to be processed and the original data; extract, from the data to be processed, first data whose character matching degree is greater than or equal to a character error tolerance threshold; obtain a first sub-data segment of the first data that matches the original data, and obtain a second sub-data segment of the first data that does not match the original data; and determine a second sub-data segment whose character length is less than or equal to a character error tolerance length threshold as a fault-tolerant data segment, and merge the fault-tolerant data segment with the first sub-data segment to obtain the target data.
13. The apparatus according to claim 8, wherein the reading unit is further configured to: receive, based on the preset service language environment, text parameters to be replaced, the text parameters comprising a project identifier, a file format, a target data type, and a language type; obtain a project to be converted that corresponds to the project identifier and is displayed on the display page; extract, from the project to be converted, a target format file corresponding to the file format; read, from the target format file, target type data corresponding to the target data type; and configure the target type data according to the language type to obtain the text data to be replaced.
14. The apparatus according to claim 8, wherein the text replacement apparatus further comprises an update unit configured to: when it is detected that the data to be processed contains second target data that does not conform to the preset string matching rule, extract, from the second target data, second target sub-data that conforms to a preset character error tolerance rule, and create a language data file corresponding to the second target sub-data; read a character error tolerance parameter from the language data file; update the preset string matching rule according to the character error tolerance parameter to obtain an updated preset string matching rule; perform character replacement on the second target sub-data based on the updated preset string matching rule to obtain a target sub-text; and fuse the first text, the second text, and the target sub-text to obtain the target replacement text.
15. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a plurality of instructions, the instructions being adapted to be loaded by a processor to perform the steps of the text replacement method according to any one of claims 1 to 7.
16. A computer device, characterized in that it comprises a processor and a memory, the memory storing an application program, and the processor running the application program in the memory to implement the steps of the text replacement method according to any one of claims 1 to 7.
17. A computer program product, characterized in that the computer program product comprises computer instructions stored in a computer-readable storage medium, wherein a processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the steps of the text replacement method according to any one of claims 1 to 7.