Document conversion method, device, apparatus and storage medium
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- AGRICULTURAL BANK OF CHINA
- Filing Date
- 2022-11-14
- Publication Date
- 2026-06-26
Abstract
Description
Technical Field
[0001] This application relates to the field of language processing technology, and in particular to a document conversion method, apparatus, device, and storage medium.
Background Technology
[0002] In recent years, research on information accessibility has received increasing attention. Information accessibility refers to the ability of anyone, under any circumstances, to access and utilize information equally and without barriers. Currently, due to the relative scarcity of Braille materials, blind people still face significant difficulties in accessing information. Therefore, to increase Braille materials, it is necessary to convert ordinary documents into Braille documents.
[0003] In existing technologies, ordinary documents are converted into Braille documents using a single type of translation system. When a document contains multiple types of text, multiple translation systems are first manually established. Then, different types of text are separated from the document and manually entered into the different translation systems. Finally, the texts converted from the different translation systems are merged manually.
[0004] However, the inventors discovered that the existing technology has at least the following technical problems: when a document contains multiple types of text, each type of text needs to be processed manually, which is time-consuming and labor-intensive. Therefore, the efficiency of document conversion using the methods in the existing technology is low.
Summary of the Invention
[0005] This application provides a document conversion method, apparatus, device, and storage medium that can improve the efficiency of document conversion.
[0006] In a first aspect, this application provides a document conversion method, including:
[0007] Obtain the document content of the first document to be converted, wherein the document content includes multiple sorted characters;
[0008] The character type of each character is determined sequentially, starting from the first character among the plurality of characters;
[0009] Based on the character type of each character, multiple texts are extracted from the first document, wherein the multiple texts have different text types;
[0010] For each text, the text is converted into Braille text using a Braille conversion model that matches the text type of the text;
[0011] Multiple Braille texts converted from multiple texts are combined into a second document.
[0012] In one possible design, the step of extracting multiple texts from the first document based on the character type of each character includes: extracting characters sequentially starting from the leading character among the multiple characters; if the character type of the first character extracted this time is different from the character type of the leading character, combining the multiple characters before the first character into a first text; continuing to extract characters sequentially from the multiple characters starting from the first character; if the character type of the second character extracted this time is different from the character type of the first character, combining the multiple characters before the second character into a second text; and so on until the extraction of the multiple characters is completed, thereby obtaining multiple texts.
[0013] In another possible design, the text type includes Chinese, mathematical formula, English, and symbol types; for each text, converting the text into Braille text using a Braille conversion model that matches the text type includes: if the text type is a mathematical formula, converting the text into Braille text using a trained deep neural network model, wherein the deep neural network model takes mathematical formulas in LaTeX format as input and outputs Braille text in ASCII format.
[0014] In another possible design, the method further includes: if the text type is Chinese, then the text is converted into Braille text using a Chinese Braille conversion model; if the text type is English, then the text is converted into Braille text using an English Braille conversion model; if the text type is symbolic, then the text is converted into Braille text using a symbolic Braille conversion model.
[0015] In another possible design, the training process of the deep neural network model includes: acquiring a training sample set, which includes multiple sample mathematical formulas and multiple sample Braille formulas; determining multiple input semantic vectors corresponding to the multiple sample mathematical formulas and multiple reference semantic vectors corresponding to the multiple sample Braille formulas; inputting the multiple input semantic vectors into a primary deep neural network model and outputting multiple output semantic vectors; determining a similarity parameter between the multiple output semantic vectors and the multiple reference semantic vectors; adjusting the weight parameters and bias parameters in the primary deep neural network model until the similarity parameter is less than or equal to a preset similarity, thereby obtaining a trained deep neural network model.
[0016] In another possible design, determining the multiple input semantic vectors corresponding to the multiple sample mathematical formulas and the multiple reference semantic vectors corresponding to the multiple sample Braille formulas includes: for each sample mathematical formula, dividing the multiple characters in the sample mathematical formula according to a first preset separation rule to obtain multiple first word vectors corresponding to the sample mathematical formula; for each sample Braille formula, dividing the multiple characters in the sample Braille formula according to a second preset separation rule to obtain multiple second word vectors corresponding to the sample Braille formula; combining the multiple first word vectors corresponding to each sample mathematical formula to obtain multiple input semantic vectors; and combining the multiple second word vectors corresponding to each sample Braille formula to obtain multiple reference semantic vectors.
[0017] In another possible design, the word vectors in the input semantic vector have a dimension of 64.
[0018] In a second aspect, this application provides a document conversion apparatus, comprising:
[0019] The acquisition module is used to acquire the document content of the first document to be converted, wherein the document content includes multiple sorted characters;
[0020] The determination module is used to determine the character type of each character sequentially, starting from the first character among the plurality of characters;
[0021] The extraction module is used to extract multiple texts from the first document based on the character type of each character, wherein multiple characters included in the same text have the same character type;
[0022] A conversion module is used to convert each text into Braille text using a Braille conversion model that matches the text's text type;
[0023] The combination module is used to combine multiple Braille texts converted from multiple texts into a second document.
[0024] In a third aspect, the present invention provides an electronic device, comprising: at least one processor and a memory;
[0025] The memory stores computer-executable instructions;
[0026] The at least one processor executes the computer-executable instructions stored in the memory, causing the at least one processor to perform the document conversion method as described in the first aspect above.
[0027] In a fourth aspect, the present invention provides a computer storage medium storing computer-executable instructions, wherein when a processor executes the computer-executable instructions, the document conversion method described in the first aspect above is implemented.
[0028] In a fifth aspect, this application also provides a computer program product comprising a computer program stored in a computer-readable storage medium, wherein at least one processor can read the computer program from the computer-readable storage medium, and when the at least one processor executes the computer program, it implements the document conversion method described in the first aspect above.
[0029] The document conversion method, apparatus, device, and storage medium provided in this application, for a first document containing different types of text, first determine multiple texts based on character types, and then convert the multiple texts using a Braille conversion model that matches the text types. The converted texts are then combined to obtain a second document. Thus, this method can automatically convert a first document containing different types of text into a second document, which improves the efficiency of document conversion compared to manually entering different types of text into different translation systems.
Attached Figure Description
[0030] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with this application and, together with the description, serve to explain the principles of this application.
[0031] Figure 1 is a schematic diagram illustrating an application scenario of the document conversion method provided in an embodiment of the present invention;
[0032] Figure 2 is a first flowchart of the document conversion method provided in an embodiment of the present invention;
[0033] Figure 3 is a second flowchart of the document conversion method provided in an embodiment of the present invention;
[0034] Figure 4 is a flowchart illustrating the training method for a neural network model provided in an embodiment of the present invention;
[0035] Figure 5 is a schematic diagram of the document conversion apparatus provided in an embodiment of the present invention;
[0036] Figure 6 is a schematic diagram of the structure of an electronic device provided in an embodiment of the present invention.
[0037] The accompanying drawings illustrate specific embodiments of this application, which will be described in more detail below. These drawings and descriptions are not intended to limit the scope of the concept in any way, but rather to illustrate the concept of this application to those skilled in the art through reference to particular embodiments.
Detailed Implementation
[0038] Exemplary embodiments will now be described in detail, examples of which are illustrated in the accompanying drawings. When the following description relates to the drawings, unless otherwise indicated, the same numbers in different drawings denote the same or similar elements. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with this application. Rather, they are merely examples of apparatuses and methods consistent with some aspects of this application as detailed in the appended claims.
[0039] The collection, storage, use, processing, transmission, provision, and disclosure of image data or user data and other information involved in the technical solution of this application all comply with the provisions of relevant laws and regulations and do not violate public order and good morals.
[0040] In recent years, research on information accessibility has received increasing attention. Information accessibility refers to the ability of anyone, under any circumstances, to access and utilize information equally and without barriers. Currently, due to the relative scarcity of Braille materials, blind people still face significant difficulties in accessing information. Therefore, to increase Braille materials, it is necessary to convert ordinary documents into Braille documents.
[0041] Currently, ordinary documents are generally converted into Braille documents using a single type of translation system. However, for documents such as mathematical or financial documents, multiple types of text often exist within the same document. When a document contains multiple types of text, multiple translation systems are first manually established. Then, the different types of text are separated from the document and manually entered into the respective translation systems. Finally, the texts converted from the different systems are merged manually.
[0042] For example, a math document might contain both Chinese text and mathematical formula text. First, a Chinese text translation system and a mathematical formula text translation system are manually established. Then, the Chinese text and mathematical formula text are separated from the math document. The Chinese text is manually entered into the Chinese text translation system to obtain a Braille text, and the mathematical formula text is manually entered into the mathematical formula text translation system to obtain another Braille text. The two Braille texts are then merged to obtain the complete converted document. This demonstrates that when a document contains multiple types of text, each type needs to be processed manually, which is time-consuming and labor-intensive, resulting in low document conversion efficiency.
[0043] To address the aforementioned technical problems, this application proposes the following technical concept: For a first document containing different types of text, multiple texts are first determined based on character type. Then, a Braille conversion model matching the text type is used to convert the multiple texts. The converted texts are then combined to obtain a second document. Thus, this method can automatically convert a first document containing different types of text into a second document, which improves the efficiency of document conversion compared to manually inputting different types of text into different translation systems.
[0044] Figure 1 is a schematic diagram illustrating an application scenario of the document conversion method provided in this embodiment of the invention. As shown in Figure 1, display terminal 101 transmits a document conversion request carrying a regular document to server 102 via a wireless network. Server 102 receives the document conversion request, converts the regular document into a Braille document, and returns the converted Braille document to display terminal 101 for display. Display terminal 101 can also print the Braille document, thereby adding to the available Braille materials.
[0045] The technical solution of this application and how the technical solution of this application solves the above-mentioned technical problems are described in detail below with specific embodiments. These specific embodiments can be combined with each other, and the same or similar concepts or processes may not be described again in some embodiments. The embodiments of this application will now be described with reference to the accompanying drawings.
[0046] This application provides a document conversion method. The method in this application can be executed by a server. Figure 2 is a first flowchart of the document conversion method provided in an embodiment of this application. As shown in Figure 2, the document conversion method includes:
[0047] Step S201: Obtain the document content of the first document to be converted. The document content includes multiple sorted characters.
[0048] In this embodiment of the invention, the first document is a regular document containing normal text, such as a mathematical document, a financial document, a Chinese document, or an English document. The sorted characters in the document content are the characters arranged according to word order.
[0049] In some embodiments, the first document may be a locally stored electronic document. Accordingly, this step involves uploading the stored first document to the server, and the server obtaining the document content of the first document to be converted. In other embodiments, the first document may also be a printed document. Accordingly, this step involves performing OCR (Optical Character Recognition) scanning on the printed document to obtain an electronic document, uploading the electronic document to the server, and the server obtaining the document content of the first document to be converted. In other embodiments, the first document may also be an electronic document uploaded via an application. Accordingly, this step involves receiving the electronic document uploaded by the application, uploading the electronic document to the server, and the server obtaining the document content of the first document to be converted.
[0050] Step S202: Determine the character type of each character sequentially, starting from the first character among the multiple characters.
[0051] In this embodiment of the invention, the character type of each character can be determined by a text reading judgment function. For example, character types include Chinese characters, English characters, mathematical formula characters, and symbol characters.
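The character type determination described above can be illustrated with a minimal sketch. The function name `char_type`, the Unicode range used to recognize Chinese characters, the contents of `MATH_SYMBOLS`, and the choice to treat digits as mathematical formula characters are all assumptions made for illustration; the source does not specify how the text reading judgment function is implemented.

```python
# Hypothetical set of "preset mathematical symbols"; the source leaves this set open.
MATH_SYMBOLS = set("+-*/=^_\\{}<>")


def char_type(ch: str) -> str:
    """Return 'chinese', 'english', 'math', or 'symbol' for a single character."""
    if "\u4e00" <= ch <= "\u9fff":
        # Basic CJK Unified Ideographs block (assumed to be sufficient here).
        return "chinese"
    if ch.isascii() and ch.isalpha():
        return "english"
    if ch.isdigit() or ch in MATH_SYMBOLS:
        # Assumption: digits are grouped with mathematical formula characters.
        return "math"
    # Everything else (punctuation, whitespace, ...) is treated as a symbol.
    return "symbol"
```

A real classifier would likely need context, since a Latin letter inside a formula is a formula character rather than English text; this sketch looks at each character in isolation.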
[0052] Step S203: Based on the character type of each character, extract multiple texts from the first document, wherein the text types of the multiple texts include at least two types.
[0053] In this embodiment of the invention, the text types of the multiple texts include at least two of the following: Chinese character type, mathematical formula type, English character type, and symbol type. Combinations of Chinese characters form Chinese character type text, combinations of mathematical formula characters form mathematical formula text, combinations of English characters form English text, and combinations of symbol characters form symbol text. Optionally, the multiple characters included in each text may have the same character type. Accordingly, this step involves: starting from the first character of the multiple characters, combining multiple characters of the same type and in consecutive order into a single text, thus obtaining multiple texts.
[0054] Step S204: For each text, convert the text into Braille text using a Braille conversion model that matches the text type of the text.
[0055] In this embodiment of the invention, the server stores the association between text types and Braille conversion models. Accordingly, this step is as follows: for each text, based on its text type, determine the Braille conversion model matching the text type from the stored association between text types and Braille conversion models. The Braille conversion model can be a Chinese Braille conversion model, an English Braille conversion model, etc.
[0056] It should be noted that each text may be converted into Braille text immediately after it is obtained, or the texts may be converted into Braille texts one by one after all of the multiple texts have been obtained.
[0057] Step S205: Combine the multiple Braille texts obtained after converting the multiple texts into a second document.
[0058] In an embodiment of the present invention, this step is: sequentially combine the multiple Braille texts obtained after converting the multiple texts according to the conversion order to obtain a second document.
[0059] This application provides a method for document conversion. For a first document including different types of texts, multiple texts are first determined according to the character types; the multiple texts are then converted through Braille conversion models matching their text types, and the results are combined to obtain a converted second document. It can be seen that this method can automatically convert the first document including different types of texts into a second document, which improves the efficiency of document conversion compared with manually entering different types of texts into different translation systems.
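Steps S204 and S205 can be sketched as a lookup in a stored association between text types and conversion models, followed by concatenation in order. In the sketch below, `convert_document`, `DEMO_CONVERTERS`, and the tagging lambdas are hypothetical placeholders; they stand in for real Braille conversion models, which the source does not specify at this level of detail.

```python
from typing import Callable, Dict, List, Tuple

Converter = Callable[[str], str]

# Placeholder "models" that only tag their input; real converters would emit Braille.
DEMO_CONVERTERS: Dict[str, Converter] = {
    "chinese": lambda text: f"<zh:{text}>",
    "english": lambda text: f"<en:{text}>",
    "math": lambda text: f"<math:{text}>",
    "symbol": lambda text: f"<sym:{text}>",
}


def convert_document(
    texts: List[Tuple[str, str]],
    converters: Dict[str, Converter],
) -> str:
    """Convert (text_type, text) pairs in order and join the results (S204, S205)."""
    braille_parts = []
    for text_type, text in texts:
        convert = converters[text_type]  # model matching the text type
        braille_parts.append(convert(text))
    return "".join(braille_parts)  # combine in conversion order
```

Keeping the association in a dictionary means a new text type can be supported by registering one more converter, without changing the loop.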
[0060] Figure 3 is a second flowchart of the document conversion method provided by an embodiment of the present invention. On the basis of the embodiment of Figure 2, this embodiment describes in detail the specific implementation of extracting multiple texts from the first document based on the character type of each character in S203. As shown in Figure 3, this method includes:
[0061] Step S301: Sequentially extract characters starting from the first character among the multiple characters.
[0062] In an embodiment of the present invention, the first document may include one page or multiple pages. When the first document includes one page, the first character is the character corresponding to the first row and the first column. When the first document includes multiple pages, the first character is the character corresponding to the first row and the first column of the first page.
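[0063] Exemplarily, the first document is one page, and its document content is a mixed string consisting of the two-character Chinese word for "bank", the English word "bank", the two-character Chinese word for "account", and the digits "001". The leading character is the first Chinese character of the word for "bank" (glossed below as "silver"), and its character type is the Chinese type.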
[0064] Step S302: If the character type of the first character extracted this time is different from the character type of the leading character, combine the multiple characters before the first character into a first text.
[0065] Exemplarily, the characters extracted in sequence are the Chinese characters glossed as "silver" and "bank", followed by "b". Because the character type of the first character "b" extracted this time is different from that of the leading character "silver", the characters "silver" and "bank" before the first character "b" are combined into the first text, the Chinese word for "bank". In this example, the first text starts at "silver" and ends immediately before "b".
[0066] It should be noted that determining that the character type of the first character extracted this time is different from that of the leading character can be divided into the following three cases.
[0067] In the first case, the character type of the leading character is the Chinese character type. If the character type of the first character extracted this time is any one of English, numbers, and symbols, then it is determined that the character type of the first character extracted this time is different from that of the leading character.
[0068] In the second case, the character type of the leading character is the English character type. If the character type of the first character extracted this time is any one of Chinese, numbers, and symbols, then it is determined that the character type of the first character extracted this time is different from that of the leading character.
[0069] In the third case, the leading character is a preset mathematical symbol. If the character type of the first character extracted this time is either Chinese or symbol, then it is determined that the character type of the first character extracted this time is different from that of the leading character.
[0070] It should be noted that in the embodiments of the present invention, the preset mathematical symbols are not specifically limited. Optionally, the preset mathematical symbols are the mathematical symbols in LaTeX format. Exemplarily, the preset mathematical symbol is " / ".
[0071] Step S303: Continue to extract characters sequentially from the multiple characters, starting from the first character.
[0072] Exemplarily, starting from the first character "b", continue to extract the characters b, a, n, k, and the Chinese character glossed as "account".
[0073] Step S304: If the character type of the second character extracted this time is different from that of the first character, combine the multiple characters before the second character into a second text.
[0074] Exemplarily, because the character type of the second character "account" extracted this time is different from that of the first character "b", the characters b, a, n, k before the second character "account" are combined into the second text: bank.
[0075] It should be noted that the method for determining that the character type of the second character extracted this time is different from that of the first character is the same as the method in step S302 for determining that the character type of the first character extracted this time is different from that of the leading character, and is not repeated here.
[0076] Step S305: Repeat until the extraction of the multiple characters ends, thereby obtaining multiple texts.
[0077] Exemplarily, starting from the second character "account", continue to extract characters sequentially: "account", "household", 0. Because the character type of the third character "0" extracted this time is different from that of the second character "account", the characters "account" and "household" before the third character "0" are combined into the third text, the Chinese word for "account". Then, starting from the third character "0", continue to extract characters sequentially: 0, 0, 1. When the extraction of the multiple characters ends, the characters 0, 0, 1 are combined into the fourth text, 001, thereby obtaining the multiple texts.
[0078] In the embodiment of the present invention, starting from the leading character among the multiple characters, the text reading judgment function is called recursively to extract characters until the end of the current text is reached. This automatically divides the multiple characters into multiple texts and thereby improves the efficiency of determining the multiple texts.
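Steps S301 to S305 amount to splitting the character sequence into runs whose type matches the type of the run's leading character. The sketch below uses a loop rather than the recursive calls mentioned above, which is a simplification chosen for illustration. The helper `simple_type` is a hypothetical stand-in for the text reading judgment function, and treating digits as the mathematical formula type is an assumption.

```python
from typing import Callable, List, Tuple


def simple_type(ch: str) -> str:
    """Hypothetical stand-in for the text reading judgment function."""
    if "\u4e00" <= ch <= "\u9fff":
        return "chinese"
    if ch.isascii() and ch.isalpha():
        return "english"
    if ch.isdigit():
        return "math"  # assumption: digits belong to the formula type
    return "symbol"


def split_by_type(
    content: str,
    type_of: Callable[[str], str] = simple_type,
) -> List[Tuple[str, str]]:
    """Split content into (text_type, text) runs of consecutive same-type characters."""
    texts: List[Tuple[str, str]] = []
    start = 0  # index of the leading character of the current text
    for i in range(1, len(content) + 1):
        # Close the current text at the end of input or when the type changes.
        if i == len(content) or type_of(content[i]) != type_of(content[start]):
            texts.append((type_of(content[start]), content[start:i]))
            start = i
    return texts
```

For a mixed string made of two Chinese characters, the word "bank", and the digits "001", this returns three runs in document order, each labeled with its type.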
[0079] In the embodiment of the present invention, the text type may include Chinese type, mathematical formula type, English type, and symbol type. On the basis of the embodiment of Figure 2, a detailed description is given of the specific implementation of converting each text into Braille text through a Braille conversion model matching the text type of the text in S204. This method includes:
[0080] Step S301, if the text type of the text is a mathematical formula, then through a trained deep neural network model, the text is converted into a Braille text. The deep neural network model is used to input a LaTeX format mathematical formula and output an ASCII format Braille text.
[0081] Optionally, the deep neural network model is an LSTM (Long Short-Term Memory) model integrated with an attention mechanism. Exemplarily, the deep neural network model is a Seq2Seq (Sequence-to-Sequence) model.
[0082] It is worth noting that current translation models often suffer from information loss when handling long input sequences, because the earlier part of the input sequence is easily overwritten by later parts. The Seq2Seq model, however, allows the decoder to use different encoder outputs at different decoding stages, which reduces information loss in long sequences. Furthermore, the attention mechanism, an intermediate component between the encoder and the decoder, determines the corresponding Braille character by analyzing the correlation between each character in the text and the current character, thereby improving the accuracy of document conversion.
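The role of the attention mechanism can be illustrated with a small numerical sketch. It assumes dot-product scoring between the current decoder state and each encoder output; the source does not state which scoring function is used, and the function names `softmax` and `attend` are hypothetical.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(
    decoder_state: List[float],
    encoder_outputs: List[List[float]],
) -> Tuple[List[float], List[float]]:
    """Return attention weights and the weighted context vector.

    Assumption: relevance is scored with a plain dot product.
    """
    scores = [
        sum(d * e for d, e in zip(decoder_state, enc)) for enc in encoder_outputs
    ]
    weights = softmax(scores)
    dim = len(decoder_state)
    context = [
        sum(w * enc[k] for w, enc in zip(weights, encoder_outputs))
        for k in range(dim)
    ]
    return weights, context
```

Because the weights are recomputed at every decoding step, each output position can draw on a different part of the input, which is how the model avoids compressing a long formula into a single fixed vector.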
[0083] Step S302: If the text type is Chinese, the text is converted to Braille text using the Chinese Braille conversion model; if the text type is English, the text is converted to Braille text using the English Braille conversion model; if the text type is symbolic, the text is converted to Braille text using the symbolic Braille conversion model.
[0084] In this embodiment of the invention, the English Braille conversion model is a Level 1 English Braille conversion model, the symbolic Braille conversion model is a Level 1 symbolic Braille conversion model, and the Chinese Braille conversion model is a Level 1 Chinese Braille conversion model; each of these is implemented by rule matching.
[0085] Because the Level 1 English, symbolic, and Chinese Braille conversion models have low complexity, implementing them by rule matching reduces the server resources occupied by the conversion models, which increases the server's computing speed and thus improves the efficiency of document conversion.
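Rule matching for a letter-by-letter English model can be as simple as a lookup table. The sketch below covers only five letters and emits Unicode Braille pattern characters; both the table subset and the Unicode output format are assumptions for illustration (the source mentions ASCII-format Braille for the formula model and does not state the output format of the rule-based models). The names `LETTER_DOTS`, `dots_to_cell`, and `english_to_braille` are hypothetical.

```python
# Dot numbers (1 to 6) for a few lowercase letters of the six-dot Braille alphabet.
LETTER_DOTS = {"a": "1", "b": "12", "c": "14", "k": "13", "n": "1345"}


def dots_to_cell(dots: str) -> str:
    """Map dot numbers to a character in the Unicode Braille Patterns block.

    Dot n corresponds to bit (n - 1) of the offset from U+2800.
    """
    mask = sum(1 << (int(d) - 1) for d in dots)
    return chr(0x2800 + mask)


def english_to_braille(text: str) -> str:
    """Convert text letter by letter; raises KeyError for letters not in the table."""
    return "".join(dots_to_cell(LETTER_DOTS[ch]) for ch in text.lower())
```

A table lookup of this kind needs no trained parameters, which is consistent with the low resource usage attributed to rule matching above.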
[0086] Figure 4 is a flowchart illustrating a training method for a deep neural network model provided in an embodiment of the present invention. The execution entity of the method in this embodiment can be a server. As shown in Figure 4, the training method for this deep neural network model includes:
[0087] Step S401: Obtain a training sample set, which includes multiple sample mathematical formulas and multiple sample Braille formulas.
[0088] In this embodiment of the invention, the multiple sample mathematical formulas and multiple sample Braille formulas can be sample data from a standard database. Each sample mathematical formula corresponds to one sample Braille formula, and the sample mathematical formulas can be mathematical formulas in LaTeX format. The sample Braille formulas can be Braille text in ASCII format.
[0089] Step S402: Determine multiple input semantic vectors corresponding to multiple sample mathematical formulas and multiple reference semantic vectors corresponding to multiple sample Braille formulas.
[0090] Optionally, this step is as follows: For each sample mathematical formula, divide the multiple characters in the sample mathematical formula according to the first preset separation rule to obtain multiple first word vectors corresponding to the sample mathematical formula; for each sample Braille formula, divide the multiple characters in the sample Braille formula according to the second preset separation rule to obtain multiple second word vectors corresponding to the sample Braille formula; combine the multiple first word vectors corresponding to each sample mathematical formula to obtain multiple input semantic vectors; combine the multiple second word vectors corresponding to each sample Braille formula to obtain multiple reference semantic vectors.
[0091] The first preset separation rule uses spaces to separate the different parts of a mathematical formula. For example, the first preset separation rule is as follows: if a character in a mathematical formula represents an operand, spaces are added to its left and right. If a character in a mathematical formula represents an operator, spaces are added to its left and right. If a character in a mathematical formula represents a structural symbol, such as the characters "^" and "_", spaces are added to its left and right. If a character in a mathematical formula is the character "\", it is not split off on its own; instead, it is treated as a single unit together with the subsequent characters representing an operator or operand, such as "\frac" and "\sum", and spaces are added to the left and right of that unit. If a character in a mathematical formula is the character "{" or the character "}", spaces are added to its left and right. Each unit separated by the first preset separation rule can be converted into a word vector.
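The first preset separation rule can be approximated with a single regular expression: a backslash followed by letters is kept as one unit, and every other non-space character becomes its own unit. This is a sketch under that assumption; the name `separate_latex` is hypothetical, and escaped characters such as `\{` are not handled specially.

```python
import re
from typing import List

# A LaTeX command (backslash plus letters) or any single non-space character.
TOKEN_RE = re.compile(r"\\[A-Za-z]+|\S")


def separate_latex(formula: str) -> List[str]:
    """Split a LaTeX formula into the units that would be space-separated."""
    return TOKEN_RE.findall(formula)


def with_spaces(formula: str) -> str:
    """Return the formula with a single space between units."""
    return " ".join(separate_latex(formula))
```

Each resulting unit can then be looked up in a vocabulary and mapped to a word vector.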
[0092] The second preset separation rule uses spaces to separate the different parts of a Braille formula. For example, the second preset separation rule is as follows: if a Braille cell (square) represents an operand, spaces are added to its left and right. If a Braille cell represents an operator, spaces are added to its left and right. If multiple Braille cells combine to represent an operator or operand, the cells are treated as a single unit, with spaces added to its left and right. If a Braille cell is empty, it is replaced with the character corresponding to the empty cell in a non-Braille ASCII set, with spaces added to its left and right. If a Braille cell represents a Braille structure, spaces are added to its left and right. If a Braille cell represents Arabic numerals, uppercase Latin letters, lowercase Latin letters, uppercase Greek letters, or lowercase Greek letters, the cell and the cells that follow it are treated as a single unit, with spaces added to its left and right. Each unit separated by the second preset separation rule can be converted into a word vector.
[0093] Optionally, one input semantic vector corresponds to one reference semantic vector. For example, multiple input semantic vectors include input semantic vector A and input semantic vector B. Multiple reference semantic vectors include reference semantic vector A' and reference semantic vector B'; input semantic vector A corresponds to reference semantic vector A', and input semantic vector B corresponds to reference semantic vector B'.
[0094] In this embodiment of the invention, the word vector dimension can be a multiple of 64. Optionally, the word vector dimension in the input semantic vector is 64-dimensional, and the word vector dimension in the reference semantic vector is also 64-dimensional. Since the basic symbols of Braille include 64 different Braille squares, if the word vector dimension is the same as the number of Braille squares, it is convenient for the word vectors to correspond one-to-one with multiple Braille squares, which can improve the accuracy of the converted Braille document.
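The figure of 64 follows from the six dot positions of a Braille square, each of which is either raised or flat, giving 2^6 = 64 patterns. The sketch below enumerates them using the Unicode Braille Patterns block (U+2800 to U+283F for six-dot cells). The one-hot encoding in `one_hot` is only one possible reading of the one-to-one correspondence described above and is an assumption; this application does not state how the word vectors are constructed.

```python
def cell_from_dots(dots):
    """Return the Unicode six-dot Braille cell with the given raised dots (1-6)."""
    mask = 0
    for d in dots:
        if not 1 <= d <= 6:
            raise ValueError("dot numbers must be between 1 and 6")
        mask |= 1 << (d - 1)  # bit k of the offset corresponds to dot k+1
    return chr(0x2800 + mask)


# All 2**6 = 64 six-dot cells, from the empty cell to the full cell.
ALL_CELLS = [chr(0x2800 + mask) for mask in range(2 ** 6)]


def one_hot(cell):
    """Assumed encoding: a 64-dimensional vector with a single 1.0 entry."""
    index = ord(cell) - 0x2800
    if not 0 <= index < 64:
        raise ValueError("not a six-dot Braille cell")
    vector = [0.0] * 64
    vector[index] = 1.0
    return vector
```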
[0095] Step S403: Input multiple input semantic vectors into the primary deep neural network model and output multiple output semantic vectors.
[0096] Optionally, one input semantic vector corresponds to one output semantic vector. For example, multiple input semantic vectors include input semantic vector A and input semantic vector B, and multiple output semantic vectors include output semantic vector A1 and output semantic vector B1, wherein input semantic vector A corresponds to output semantic vector A1, and input semantic vector B corresponds to output semantic vector B1.
[0097] Step S404: Determine the similarity parameters between multiple output semantic vectors and multiple reference semantic vectors.
[0098] In this embodiment of the invention, the similarity parameter can be determined by the cosine similarity of the vectors.
[0099] For example, multiple output semantic vectors include output semantic vector A1 and output semantic vector B1. A similarity parameter is determined between output semantic vector A1 and the reference semantic vector A', and a similarity parameter is determined between output semantic vector B1 and the reference semantic vector B'.
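The pairwise comparison in step S404 can be sketched with plain Python lists as follows. The function names are illustrative, and the sketch assumes that each output semantic vector is a flat list of numbers with the same length as the vector it is compared against.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def pairwise_similarities(outputs, references):
    """Compare the i-th output vector with the i-th comparison vector."""
    if len(outputs) != len(references):
        raise ValueError("each output vector needs exactly one counterpart")
    return [cosine_similarity(o, r) for o, r in zip(outputs, references)]
```

For instance, comparing A1 = [1, 2] with A' = [2, 4] gives a similarity of approximately 1, because the two vectors point in the same direction, while comparing B1 = [1, 0] with B' = [0, 3] gives 0.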
[0100] Step S405: Adjust the weight parameters and bias parameters in the primary deep neural network model until the similarity parameter is less than or equal to the preset similarity, and obtain the trained deep neural network model.
[0101] In this embodiment of the invention, the primary deep neural network model may include an input gate, a forget gate, and an output gate. Optionally, the weight parameters and bias parameters in the input gate function, forget gate function, and output gate function are adjusted respectively using a preset learning rate. The preset learning rate is used to represent the magnitude of change of the weight parameters and bias parameters. In this embodiment of the invention, the value of the preset learning rate and the value of the preset similarity are not specifically limited. For example, the preset learning rate is 0.001, 0.002, 0.003, etc. The preset similarity can be 0.8, 0.9, 0.95, etc.
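To illustrate where the weight and bias parameters of the three gates sit, the following is a minimal sketch of one step of a standard LSTM cell, written with plain lists. The standard LSTM formulation, including the candidate-state parameters `W_c` and `b_c`, is an assumption; this application names only the input gate, forget gate, and output gate. The `sgd_update` helper shows how a learning rate scales a parameter change, assuming a plain gradient-descent update.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, x, b):
    """Compute W @ x + b for a list-of-rows matrix W."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def lstm_step(x, h_prev, c_prev, params):
    """One step of a standard LSTM cell (assumed formulation)."""
    z = x + h_prev  # concatenate the input and the previous hidden state
    i = [sigmoid(v) for v in affine(params["W_i"], z, params["b_i"])]  # input gate
    f = [sigmoid(v) for v in affine(params["W_f"], z, params["b_f"])]  # forget gate
    o = [sigmoid(v) for v in affine(params["W_o"], z, params["b_o"])]  # output gate
    g = [math.tanh(v) for v in affine(params["W_c"], z, params["b_c"])]  # candidate
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


def sgd_update(value, grad, learning_rate=0.001):
    """Move a weight or bias against its gradient, scaled by the learning rate."""
    return value - learning_rate * grad
```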
[0102] Figure 5 is a schematic diagram of the document conversion device provided in an embodiment of this application. As shown in Figure 5, the document conversion device includes: an acquisition module 501, a determination module 502, an extraction module 503, a conversion module 504, and a combination module 505.
[0103] The acquisition module 501 is used to acquire the document content of the first document to be converted, which includes multiple sorted characters;
[0104] The determination module 502 is used to determine the character type of each character sequentially, starting from the first character among multiple characters;
[0105] Extraction module 503 is used to extract multiple texts from a first document based on the character type of each character, wherein the text types of the multiple texts include at least two.
[0106] The conversion module 504 is used to convert each text into Braille text by using a Braille conversion model that matches the text type of the text;
[0107] Combination module 505 is used to combine multiple Braille texts converted from multiple texts into a second document.
[0108] In one possible design, the extraction module 503 extracts multiple texts from the first document based on the character type of each character, specifically by: extracting characters sequentially, starting from the initial character among the multiple characters; if the character type of a first character extracted in the current pass is different from the character type of the initial character, combining the characters before the first character into a first text; continuing to extract characters sequentially from the multiple characters, starting from the first character; if the character type of a second character extracted in the current pass is different from the character type of the first character, combining the characters before the second character, beginning with the first character, into a second text; and so on until all of the multiple characters have been extracted, thereby obtaining the multiple texts.
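The extraction described above amounts to splitting the character sequence into runs of equal character type. A minimal sketch follows; the classifier `char_type` is a hypothetical placeholder that distinguishes only Chinese, English, and symbol characters, because this application does not state here how the characters of a mathematical formula are recognized.

```python
from itertools import groupby


def char_type(ch):
    """Hypothetical classifier; a real one would also detect formula characters."""
    if "\u4e00" <= ch <= "\u9fff":  # CJK Unified Ideographs
        return "chinese"
    if ch.isascii() and ch.isalpha():
        return "english"
    return "symbol"


def extract_texts(content):
    """Group consecutive characters of the same type into (type, text) pairs."""
    return [(kind, "".join(run)) for kind, run in groupby(content, key=char_type)]
```

Each time the character type changes, `groupby` closes the current run, which matches combining the characters before the differing character into one text.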
[0109] In another possible design, the text types include Chinese, mathematical formula, English, and symbol types. For each text, the conversion module 504 converts the text into Braille text by using a Braille conversion model that matches the text type of the text; specifically, if the text type is mathematical formula, the text is converted into Braille text by using a trained deep neural network model. The deep neural network model takes mathematical formulas in LaTeX format as input and outputs Braille text in ASCII format.
[0110] In another possible design, if the text type is Chinese, the conversion module 504 converts the text into Braille text using a Chinese Braille conversion model; if the text type is English, the conversion module 504 converts the text into Braille text using an English Braille conversion model; if the text type is symbol, the conversion module 504 converts the text into Braille text using a symbol Braille conversion model.
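The per-type conversion and the subsequent combination can be sketched as a dispatch table that maps each text type to a converter. The function name `convert_document` and the type labels are illustrative assumptions; the converters themselves stand in for the Braille conversion models and are supplied by the caller.

```python
from typing import Callable, Dict, List, Tuple

Converter = Callable[[str], str]


def convert_document(
    texts: List[Tuple[str, str]], converters: Dict[str, Converter]
) -> str:
    """Convert each (type, text) pair with its matching model and join the results."""
    braille_parts = []
    for kind, text in texts:
        if kind not in converters:
            raise KeyError(f"no Braille conversion model for text type {kind!r}")
        braille_parts.append(converters[kind](text))
    # The converted texts are combined in their original order.
    return "".join(braille_parts)
```

A caller would register one converter per text type, for example a wrapper around the trained deep neural network model for the mathematical formula type, and pass the extracted texts in document order.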
[0111] In another possible design, the device also includes a training module. The training module trains the deep neural network model by: acquiring a training sample set, which includes multiple sample mathematical formulas and multiple sample Braille formulas; determining multiple input semantic vectors corresponding to the multiple sample mathematical formulas and multiple reference semantic vectors corresponding to the multiple sample Braille formulas; inputting the multiple input semantic vectors into a primary deep neural network model and outputting multiple output semantic vectors; determining similarity parameters between the multiple output semantic vectors and the multiple reference semantic vectors; and adjusting the weight parameters and bias parameters in the primary deep neural network model until the similarity parameters are less than or equal to a preset similarity, thus obtaining a trained deep neural network model.
[0112] In another possible design, the training module determines the multiple input semantic vectors corresponding to the multiple sample mathematical formulas and the multiple reference semantic vectors corresponding to the multiple sample Braille formulas, specifically by: for each sample mathematical formula, dividing the multiple characters in the sample mathematical formula according to a first preset separation rule to obtain multiple first word vectors corresponding to the sample mathematical formula; for each sample Braille formula, dividing the multiple characters in the sample Braille formula according to a second preset separation rule to obtain multiple second word vectors corresponding to the sample Braille formula; combining the multiple first word vectors corresponding to each sample mathematical formula to obtain the multiple input semantic vectors; and combining the multiple second word vectors corresponding to each sample Braille formula to obtain the multiple reference semantic vectors.
[0113] The document conversion apparatus provided in this application embodiment can be used to execute the technical solution of the document conversion method in the above embodiment. Its implementation principle and technical effect are similar, and will not be described again here.
[0114] It should be noted that the division of the various modules in the above device is merely a logical functional division. In actual implementation, they can be fully or partially integrated into a single physical entity, or they can be physically separated. Furthermore, these modules can be implemented entirely in software via processing element calls; they can be fully implemented in hardware; or some modules can be implemented by processing element calls to software, while others are implemented in hardware. For example, the acquisition module 501 can be a separate processing element, or it can be integrated into a chip in the above device. Alternatively, it can be stored as program code in the memory of the above device, and its functions can be called and executed by a processing element of the device. The implementation of other modules is similar. Moreover, these modules can be fully or partially integrated together, or they can be implemented independently. The processing element here can be an integrated circuit with signal processing capabilities. In the implementation process, each step of the above method or each of the above modules can be completed through the integrated logic circuits in the hardware of the processor element or through software instructions.
[0115] Figure 6 is a schematic diagram of the structure of an electronic device provided in an embodiment of this application. As shown in Figure 6, the electronic device may include: a transceiver 601, a processor 602, and a memory 603.
[0116] The processor 602 executes the computer-executable instructions stored in the memory 603, so that the processor 602 performs the solutions of the above embodiments. The processor 602 can be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it can also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
[0117] The memory 603 is connected to the processor 602 via the system bus and completes communication between them. The memory 603 is used to store computer program instructions.
[0118] Transceiver 601 can be used to obtain the task to be run and its configuration information.
[0119] The system bus can be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, etc. The system bus can be divided into address bus, data bus, control bus, etc. For ease of representation, only one thick line is used in the diagram, but this does not indicate that there is only one bus or one type of bus. Transceivers are used to enable communication between database access devices and other computers (e.g., clients, read-write libraries, and read-only libraries). Memory may include random access memory (RAM) and may also include non-volatile memory.
[0120] The electronic device provided in this application embodiment can be the computer device described in the above embodiments.
[0121] This application also provides a chip for executing instructions, which is used to execute the document conversion method described in the above embodiments.
[0122] This application also provides a computer-readable storage medium storing computer instructions that, when executed on a computer, cause the computer to perform the document conversion method described in the above embodiments.
[0123] This application also provides a computer program product, which includes a computer program stored in a computer-readable storage medium. At least one processor can read the computer program from the computer-readable storage medium, and when the at least one processor executes the computer program, it can implement the technical solution of the document conversion method in the above embodiments.
[0124] Other embodiments of this application will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of this application that follow the general principles of this application and include common knowledge or customary techniques in the art not disclosed herein. The specification and examples are to be considered exemplary only, and the true scope and spirit of this application are indicated by the following claims.
[0125] It should be understood that this application is not limited to the precise structure described above and shown in the accompanying drawings, and various modifications and changes can be made without departing from its scope. The scope of this application is limited only by the appended claims.
Claims
1. A document conversion method, characterized in that it comprises: obtaining the document content of a first document to be converted, wherein the document content includes multiple sorted characters; determining the character type of each character sequentially, starting from the first character among the multiple characters; extracting multiple texts from the first document based on the character type of each character, wherein the text types of the multiple texts include at least two types; if the text type of a text is mathematical formula, converting the text into Braille text using a trained deep neural network model; and combining the multiple Braille texts converted from the multiple texts into a second document; wherein the training process of the deep neural network model includes: obtaining a training sample set, which includes multiple sample mathematical formulas and multiple sample Braille formulas; for each sample mathematical formula, dividing the multiple characters in the sample mathematical formula according to a first preset separation rule to obtain multiple first word vectors corresponding to the sample mathematical formula; for each sample Braille formula, dividing the multiple characters in the sample Braille formula according to a second preset separation rule to obtain multiple second word vectors corresponding to the sample Braille formula; wherein the second preset separation rule is as follows: if a Braille square represents an operand, spaces are padded to the left and right of the Braille square; if a Braille square represents an operator, spaces are padded to the left and right of the Braille square; if multiple Braille squares are combined to represent an operator or operand, the multiple Braille squares are treated as a single character, with spaces padded to the left and right of the multiple Braille squares; if a Braille square is an empty square, the empty square is replaced with the corresponding character in the non-Braille ASCII code set, with spaces padded to the left and right of the corresponding character; if a Braille square represents a Braille structure, spaces are padded to the left and right of the Braille square; if a Braille square represents Arabic numerals, uppercase Latin letters, lowercase Latin letters, uppercase Greek letters, or lowercase Greek letters, the Braille square and the following Braille square are treated as a single character, with spaces padded to the left and right of the single character; combining the multiple first word vectors corresponding to each sample mathematical formula to obtain multiple input semantic vectors, and combining the multiple second word vectors corresponding to each sample Braille formula to obtain multiple reference semantic vectors; and training a primary deep neural network model based on the multiple input semantic vectors to obtain the trained deep neural network model.
2. The method according to claim 1, characterized in that the extraction of multiple texts from the first document based on the character type of each character includes: extracting characters sequentially, starting from the initial character among the multiple characters; if the character type of a first character extracted in the current pass is different from the character type of the initial character, combining the characters before the first character into a first text; continuing to extract characters sequentially from the multiple characters, starting from the first character; if the character type of a second character extracted in the current pass is different from the character type of the first character, combining the characters before the second character, beginning with the first character, into a second text; and so on until all of the multiple characters have been extracted, thereby obtaining the multiple texts.
3. The method according to claim 1, characterized in that the text types include Chinese, mathematical formula, English, and symbol types; and the deep neural network model takes mathematical formulas in LaTeX format as input and outputs Braille text in ASCII format.
4. The method according to claim 3, characterized in that it further comprises: if the text type is Chinese, converting the text into Braille text using a Chinese Braille conversion model; if the text type is English, converting the text into Braille text using an English Braille conversion model; if the text type is symbol, converting the text into Braille text using a symbol Braille conversion model.
5. The method according to claim 3, characterized in that the step of training a primary deep neural network model based on the multiple input semantic vectors to obtain a trained deep neural network model includes: inputting the multiple input semantic vectors into the primary deep neural network model and outputting multiple output semantic vectors; determining similarity parameters between the multiple output semantic vectors and the multiple reference semantic vectors; and adjusting the weight parameters and bias parameters in the primary deep neural network model until the similarity parameter is less than or equal to the preset similarity, thus obtaining the trained deep neural network model.
6. The method according to any one of claims 3-5, characterized in that the word vectors in the input semantic vector have a dimension of 64.
7. A document conversion device, characterized in that it comprises: an acquisition module, used to acquire the document content of a first document to be converted, wherein the document content includes multiple sorted characters; a determination module, used to determine the character type of each character sequentially, starting from the first character among the multiple characters; an extraction module, used to extract multiple texts from the first document based on the character type of each character, wherein the text types of the multiple texts include at least two types; a conversion module, used to convert a text into Braille text using a trained deep neural network model if the text type of the text is mathematical formula; a combination module, used to combine the multiple Braille texts converted from the multiple texts into a second document; and a training module, used to train the deep neural network model, specifically by: obtaining a training sample set, which includes multiple sample mathematical formulas and multiple sample Braille formulas; for each sample mathematical formula, dividing the multiple characters in the sample mathematical formula according to a first preset separation rule to obtain multiple first word vectors corresponding to the sample mathematical formula; for each sample Braille formula, dividing the multiple characters in the sample Braille formula according to a second preset separation rule to obtain multiple second word vectors corresponding to the sample Braille formula; wherein the second preset separation rule is as follows: if a Braille square represents an operand, spaces are padded to the left and right of the Braille square; if a Braille square represents an operator, spaces are padded to the left and right of the Braille square; if multiple Braille squares are combined to represent an operator or operand, the multiple Braille squares are treated as a single character, with spaces padded to the left and right of the multiple Braille squares; if a Braille square is an empty square, the empty square is replaced with the corresponding character in the non-Braille ASCII code set, with spaces padded to the left and right of the corresponding character; if a Braille square represents a Braille structure, spaces are padded to the left and right of the Braille square; if a Braille square represents Arabic numerals, uppercase Latin letters, lowercase Latin letters, uppercase Greek letters, or lowercase Greek letters, the Braille square and the following Braille square are treated as a single character, with spaces padded to the left and right of the single character; combining the multiple first word vectors corresponding to each sample mathematical formula to obtain multiple input semantic vectors, and combining the multiple second word vectors corresponding to each sample Braille formula to obtain multiple reference semantic vectors; and training a primary deep neural network model based on the multiple input semantic vectors to obtain the trained deep neural network model.
8. An electronic device, characterized in that it comprises: a processor, and a memory communicatively connected to the processor; wherein the memory stores computer-executable instructions; and the processor executes the computer-executable instructions stored in the memory to implement the method as described in any one of claims 1-6.
9. A computer-readable storage medium, characterized in that, The computer-readable storage medium stores computer-executable instructions, which, when executed by a processor, are used to implement the method as described in any one of claims 1-6.