Text recognition method and device applied to air handwriting, equipment and storage medium

By combining wearable sensors with neural network models, the invention addresses the low recognition efficiency of existing methods for writing virtual text in the air. It recognizes handwriting trajectories of different sizes and angles efficiently and supports recognition of character sequences.

CN115826762B · Active · Publication Date: 2026-06-16 · YGSOFT INC


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: YGSOFT INC
Filing Date: 2022-12-29
Publication Date: 2026-06-16

Patent Timeline

29 Dec 2022: Application

16 Jun 2026: Publication (CN115826762B)

IPC: G06F3/01; G06V30/16; G06V30/18; G06V10/82

AI Tagging

Application Domain

Input/output for user-computer interaction; Graph reading


Similar Technology Patents

Semiconductor inventory equipment maintenance system and method
CN120087937Blower requirementEasy to carry outInput/output for user-computer interaction Data processing applications
Device for work support in a predefined work area within an assigned spatial profile
DE102013201309B4 · Input/output for user-computer interaction; Measuring points marking
AR head-mounted device, and AR head-mounted device and terminal device combination system
CN114967926B · Input/output for user-computer interaction; Graph reading
Eye tracking cross-device interaction method and apparatus
CN122195247A · Input/output for user-computer interaction; Character and pattern recognition
Methods and apparatus for invoking public or private interactions during multi-user communication sessions
CN115280261B · Input/output for user-computer interaction; Image analysis


AI Technical Summary

Technical Problem

Traditional methods for writing virtual text in the air have difficulty determining when the pen is lifted or lowered during writing, which makes character segmentation difficult and results in low recognition efficiency.

Method Used

Wearable sensors acquire handwriting trajectory information. Text is then recognized through a spatial positioning transformation, preprocessing, and a trained air handwriting recognition model that combines a convolutional neural network (CNN), a recurrent neural network (RNN), and a language model.

Benefits of Technology

It improves the efficiency and real-time performance of air handwriting recognition and can recognize character sequences, not just digits and single characters. It also handles handwriting trajectories of different sizes and angles, giving the writer more freedom.

Generated by Eureka AI based on patent content.


Figure CN115826762B_ABST


Abstract

The embodiments of this application belong to the field of text recognition in artificial intelligence and relate to a text recognition method and device applied to air handwriting, a computer device, and a storage medium. The method comprises: acquiring handwriting trajectory information detected by a wearable sensor; performing a spatial conversion operation on the handwriting trajectory information according to a spatial positioning method to obtain spatial trajectory information; preprocessing the spatial trajectory information to obtain preprocessed trajectory information; and inputting the preprocessed trajectory information into a trained air handwriting recognition model to perform a text recognition operation and obtain a handwriting trajectory recognition result. Compared with the two-dimensional FCRN model used in the prior art, the one-dimensional CNN used in this application has fewer model parameters, higher training efficiency, and better real-time performance. At the same time, it exploits both the low-level local features and the high-level temporal features of the trajectory, so it can recognize character sequences rather than only digits and single characters.


Description

Technical Field

[0001] This application relates to the field of text recognition technology in artificial intelligence, and in particular to a text recognition method, device, computer equipment, and storage medium for air handwriting.

Background Technology

[0002] Text, as a widely used tool for information dissemination and communication, plays a crucial role in human-computer interaction systems. Currently, widely used text input methods include keyboards, touchscreens, and handwriting tablets. These methods each have their own advantages and disadvantages; for example, keyboards are limited by size and the number of keys, while touchscreens and handwriting tablets are limited by size and writing area. Therefore, designing more natural, convenient, and efficient text input methods is an important research direction.

[0003] In one existing approach to writing virtual text in the air, users write with their fingers while a camera or motion sensor, such as Kinect, captures the finger's movement trajectory in real time; the trajectory is then recognized as text. This input method is more natural and convenient than other text input methods because it is not limited by the writing area or writing style.

[0004] However, the applicant found that traditional methods of writing virtual text in the air have difficulty determining when the pen is lifted or lowered during writing, which makes segmenting characters very difficult. As a result, current systems of this type can only recognize single characters, and their recognition efficiency is too low.

Summary of the Invention

[0005] The purpose of this application is to propose a text recognition method, device, computer equipment, and storage medium for aerial handwriting, so as to solve the problem of low recognition efficiency in traditional methods of writing virtual text in the air.

[0006] To address the aforementioned technical problems, this application provides a text recognition method for handwritten text in mid-air, employing the following technical solution:

[0007] The system acquires handwritten trajectory information detected by a wearable sensor, wherein the wearable sensor has a three-axis accelerometer and a gyroscope.

[0008] The handwritten trajectory information is spatially transformed using a spatial positioning method to obtain spatial trajectory information.

[0009] The spatial trajectory information is preprocessed to obtain preprocessed trajectory information;

[0010] The preprocessed trajectory information is input into the trained aerial handwriting recognition model for text recognition to obtain the handwriting trajectory recognition result. The aerial handwriting recognition model consists of a convolutional neural network, a recurrent neural network (RNN), and a language model.

[0011] Furthermore, the step of preprocessing the handwritten trajectory information to obtain preprocessed trajectory information specifically includes:

[0012] The handwritten trajectory information is subjected to a two-dimensional projection operation to obtain two-dimensional trajectory information;

[0013] The two-dimensional trajectory information is denoised to obtain denoised trajectory information;

[0014] The denoised trajectory information is serialized to obtain trajectory sequence information;

[0015] The coordinate offset, trajectory point writing direction, and trajectory curvature of the trajectory sequence information are extracted to obtain the preprocessed trajectory information.
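The patent names the three features extracted in [0015] but does not define them. The sketch below assumes a common formulation from online handwriting recognition: the coordinate offset (dx, dy) to the next point, the writing direction as the cosine and sine of the segment angle, and the curvature as the cosine and sine of the turning angle between consecutive segments. The function name `trajectory_features` and the six-column layout are illustrative assumptions, not the applicant's definitions.

```python
import numpy as np


def trajectory_features(points: np.ndarray) -> np.ndarray:
    """Return per-point features for a 2-D trajectory of shape (N, 2), N >= 3.

    Output shape is (N - 2, 6) with columns
    [dx, dy, cos(theta), sin(theta), cos(phi), sin(phi)], where (dx, dy) is the
    offset to the next point, theta is the writing direction of that segment,
    and phi is the turning angle to the following segment (a curvature proxy).
    """
    pts = np.asarray(points, dtype=float)
    if pts.ndim != 2 or pts.shape[1] != 2 or len(pts) < 3:
        raise ValueError("expected an (N, 2) array with N >= 3")
    offsets = np.diff(pts, axis=0)                    # (N-1, 2) coordinate offsets
    lengths = np.linalg.norm(offsets, axis=1)
    lengths[lengths == 0] = 1.0                       # repeated points get a zero direction vector
    unit = offsets / lengths[:, None]                 # (cos theta, sin theta) per segment
    cos_t, sin_t = unit[:, 0], unit[:, 1]
    # Turning angle between segment i and i+1 via the angle-difference identities.
    cos_phi = cos_t[:-1] * cos_t[1:] + sin_t[:-1] * sin_t[1:]
    sin_phi = cos_t[:-1] * sin_t[1:] - sin_t[:-1] * cos_t[1:]
    return np.column_stack([offsets[:-1], cos_t[:-1], sin_t[:-1], cos_phi, sin_phi])
```

Encoding each angle as a cosine/sine pair avoids the wrap-around discontinuity at ±180 degrees that a raw angle value would introduce.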

[0016] Furthermore, the step of performing a two-dimensional projection operation on the handwritten trajectory information to obtain two-dimensional trajectory information specifically includes:

[0017] The handwritten trajectory information is projected onto the two-dimensional plane that minimizes the sum of squared distances from all trajectory points in the handwritten trajectory information to that plane, yielding the two-dimensional trajectory information.
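The plane that minimizes the sum of squared distances from all trajectory points is the total-least-squares plane: it passes through the centroid of the points, and its normal is the right singular vector of the centered point cloud with the smallest singular value. A minimal NumPy sketch of this projection follows; the function name and return values are hypothetical, and the patent does not state how the plane is computed.

```python
import numpy as np


def project_to_best_plane(points_3d):
    """Project (N, 3) points, N >= 3, onto their least-squares plane.

    Returns (coords_2d, normal, sse): in-plane coordinates of shape (N, 2),
    the unit plane normal, and the minimized sum of squared distances.
    """
    pts = np.asarray(points_3d, dtype=float)
    if pts.ndim != 2 or pts.shape[1] != 3 or len(pts) < 3:
        raise ValueError("expected an (N, 3) array with N >= 3")
    centroid = pts.mean(axis=0)                # the optimal plane passes through the centroid
    centered = pts - centroid
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    in_plane_axes = vt[:2]                     # directions of largest spread span the plane
    normal = vt[2]                             # direction of smallest spread is the normal
    coords_2d = centered @ in_plane_axes.T     # orthogonal projection, expressed in 2-D
    distances = centered @ normal              # signed point-to-plane distances
    return coords_2d, normal, float(np.sum(distances ** 2))
```

Because the in-plane axes are orthonormal, distances between points that already lie in the plane are preserved exactly, so a flat but arbitrarily tilted trajectory is recovered without distortion.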

[0018] Furthermore, the step of performing a denoising operation on the two-dimensional trajectory information to obtain denoised trajectory information specifically includes the following steps:

[0019] A tilt correction operation is performed on the two-dimensional trajectory information; and/or

[0020] a size normalization operation is performed on the two-dimensional trajectory information; and/or

[0021] the two-dimensional trajectory information is resampled.
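The patent does not say how tilt correction or size normalization is performed. One simple possibility, assumed here, is to rotate the trajectory so that its principal axis is horizontal (which suits a line of text written roughly left to right, but would wrongly rotate a single tall character such as "1") and then scale it to a fixed height. The helper names `correct_tilt` and `normalize_size` are hypothetical.

```python
import numpy as np


def correct_tilt(points_2d):
    """Rotate a 2-D trajectory about its centroid so its principal axis is horizontal."""
    pts = np.asarray(points_2d, dtype=float)
    centered = pts - pts.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    angle = np.arctan2(vt[0, 1], vt[0, 0])    # angle of the principal axis
    # The axis has no preferred sign, so fold the angle into [-pi/2, pi/2].
    if angle > np.pi / 2:
        angle -= np.pi
    elif angle < -np.pi / 2:
        angle += np.pi
    c, s = np.cos(-angle), np.sin(-angle)
    rotation = np.array([[c, -s], [s, c]])    # rotation by -angle
    return centered @ rotation.T


def normalize_size(points_2d, target_height=1.0):
    """Shift the bounding box to the origin and scale to a fixed height, keeping aspect ratio."""
    pts = np.asarray(points_2d, dtype=float)
    mins = pts.min(axis=0)
    width, height = pts.max(axis=0) - mins
    # Fall back to the width for (nearly) flat strokes to avoid dividing by zero.
    scale_ref = height if height > 1e-9 else max(width, 1e-9)
    return (pts - mins) * (target_height / scale_ref)
```

Scaling by height rather than width keeps character size consistent regardless of how many characters are written in one trajectory.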

[0022] Furthermore, before the step of inputting the preprocessed trajectory information into the trained aerial handwriting recognition model for text recognition to obtain the handwriting trajectory recognition result, the following steps are also included:

[0023] Call the original air handwriting recognition model;

[0024] Read the database and retrieve training and test data from it;

[0025] The training data is input into the original air handwriting recognition model for prediction to obtain prediction result data;

[0026] Based on the predicted data and the test data, a CTC loss function is constructed, and the parameters of the original aerial handwriting recognition model are tuned based on the CTC loss function to obtain the trained aerial handwriting recognition model.

[0027] Furthermore, the step of inputting the preprocessed trajectory information into the trained aerial handwriting recognition model for text recognition to obtain the handwriting trajectory recognition result specifically includes the following steps:

[0028] The low-level local spatial features and contextual information of the preprocessed trajectory information are extracted based on the convolutional neural network.

[0029] High-level temporal features and semantic information of the preprocessed trajectory information are extracted based on the recurrent neural network (RNN).

[0030] The handwritten trajectory recognition result is obtained by recognizing the low-level local spatial features, the contextual information, the high-level temporal features, and the semantic information based on the language model.
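To make the data flow of [0028] to [0030] concrete, the following NumPy sketch runs a one-dimensional convolution over the per-point feature sequence, feeds the result through an LSTM (the RNN variant named in [0076]), and emits per-frame class probabilities that a CTC decoder and language model could consume. The six input features per point, the layer sizes, the gate order, the random initialization, and all function names are assumptions for illustration; the language-model stage is not shown, and this is not the applicant's trained network.

```python
import numpy as np


def conv1d_relu(x, w, b):
    """1-D convolution over time with 'same' padding, then ReLU.

    x: (T, C_in), w: (K, C_in, C_out), b: (C_out,) -> (T, C_out)
    """
    k = w.shape[0]
    left = k // 2
    xp = np.pad(x, ((left, k - 1 - left), (0, 0)))     # zero-pad the time axis only
    out = np.stack([np.einsum("kc,kco->o", xp[t:t + k], w) for t in range(x.shape[0])])
    return np.maximum(out + b, 0.0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm(x, W, U, b):
    """Single-layer LSTM. x: (T, D), W: (D, 4H), U: (H, 4H), b: (4H,) -> (T, H)."""
    hidden = U.shape[0]
    h, c = np.zeros(hidden), np.zeros(hidden)
    outputs = []
    for x_t in x:
        # Assumed gate order: input, forget, candidate cell, output.
        i, f, g, o = np.split(x_t @ W + h @ U + b, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs)


def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))      # subtract max for numerical stability
    return e / e.sum(axis=-1, keepdims=True)


def init_params(in_dim=6, conv_ch=16, kernel=3, hidden=32, n_classes=11, seed=0):
    """Random weights; n_classes counts the CTC blank (e.g. 10 digits + blank)."""
    rng = np.random.default_rng(seed)
    return {
        "conv_w": rng.normal(0.0, 0.1, (kernel, in_dim, conv_ch)),
        "conv_b": np.zeros(conv_ch),
        "W": rng.normal(0.0, 0.1, (conv_ch, 4 * hidden)),
        "U": rng.normal(0.0, 0.1, (hidden, 4 * hidden)),
        "b": np.zeros(4 * hidden),
        "out_w": rng.normal(0.0, 0.1, (hidden, n_classes)),
        "out_b": np.zeros(n_classes),
    }


def forward(features, params):
    """(T, in_dim) trajectory features -> (T, n_classes) per-frame probabilities."""
    local = conv1d_relu(features, params["conv_w"], params["conv_b"])   # low-level local features
    temporal = lstm(local, params["W"], params["U"], params["b"])       # high-level temporal features
    return softmax(temporal @ params["out_w"] + params["out_b"])
```

A 1-D convolution slides only along the time axis of the point sequence, so its kernels have K × C_in × C_out weights rather than the K × K × C_in × C_out of a 2-D convolution over a rendered image, which is consistent with the parameter-count advantage the application claims over a 2-D model.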

[0031] To address the aforementioned technical problems, this application also provides a text recognition device for handwritten text in mid-air, employing the following technical solution:

[0032] The trajectory acquisition module is used to acquire handwritten trajectory information detected by the wearable sensor, wherein the wearable sensor has a three-axis accelerometer and a gyroscope;

[0033] The spatial conversion module is used to perform spatial conversion operations on the handwritten trajectory information according to the spatial positioning method to obtain spatial trajectory information;

[0034] The preprocessing module is used to preprocess the spatial trajectory information to obtain preprocessed trajectory information;

[0035] The text recognition module is used to input the preprocessed trajectory information into the trained aerial handwriting recognition model to perform text recognition operations and obtain the handwriting trajectory recognition result. The aerial handwriting recognition model is composed of a convolutional neural network, a recurrent neural network (RNN), and a language model.

[0036] Furthermore, the preprocessing module includes:

[0037] The two-dimensional projection submodule is used to perform a two-dimensional projection operation on the handwritten trajectory information to obtain two-dimensional trajectory information;

[0038] The denoising submodule is used to perform denoising operations on the two-dimensional trajectory information to obtain denoised trajectory information;

[0039] The serialization submodule is used to perform serialization operations on the denoised trajectory information to obtain trajectory sequence information;

[0040] The data extraction submodule is used to extract the coordinate offset, trajectory point writing direction and trajectory curvature of the trajectory sequence information to obtain the preprocessed trajectory information.

[0041] To address the aforementioned technical problems, this application also provides a computer device that employs the following technical solution:

[0042] It includes a memory and a processor, wherein the memory stores computer-readable instructions, and the processor executes the computer-readable instructions to implement the steps of the text recognition method applied to handwritten text in the air as described above.

[0043] To address the aforementioned technical problems, this application also provides a computer-readable storage medium, employing the technical solution described below:

[0044] The computer-readable storage medium stores computer-readable instructions, which, when executed by a processor, implement the steps described above for the text recognition method applied to handwritten text in the air.

[0045] This application provides a text recognition method for aerial handwriting, comprising: acquiring handwriting trajectory information detected by a wearable sensor, wherein the wearable sensor has a three-axis accelerometer and a gyroscope; performing a spatial transformation operation on the handwriting trajectory information according to a spatial positioning method to obtain spatial trajectory information; preprocessing the spatial trajectory information to obtain preprocessed trajectory information; and inputting the preprocessed trajectory information into a trained aerial handwriting recognition model for text recognition to obtain a handwriting trajectory recognition result, wherein the aerial handwriting recognition model is composed of a convolutional neural network, a recurrent neural network (RNN), and a language model. Compared with existing technologies, this application uses a one-dimensional CNN to extract low-level features of handwriting trajectories; this CNN has fewer model parameters, higher training efficiency, and better real-time performance than the two-dimensional FCRN model used in existing technologies. Furthermore, by utilizing both the low-level local features and the high-level temporal features of the trajectory, it can recognize character sequences, not just digits and single characters. By preprocessing the handwriting trajectory, it recognizes handwriting trajectories of different sizes and angles, allowing the writer to write freely and enhancing its practicality.

Attached Figure Description

[0046] To more clearly illustrate the solutions in this application, the accompanying drawings used in the description of the embodiments of this application will be briefly introduced below. Obviously, the accompanying drawings described below are some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0047] Figure 1 is an exemplary system architecture diagram to which this application can be applied;

[0048] Figure 2 is a flowchart of the implementation of the text recognition method for handwritten text in mid-air provided in Embodiment 1 of this application;

[0049] Figure 3 is a flowchart of a specific implementation of step S203 in Figure 2;

[0050] Figure 4 is a flowchart of a specific implementation of step S302 in Figure 3;

[0051] Figure 5 is a flowchart of a specific implementation performed before step S204 in Figure 2;

[0052] Figure 6 is a flowchart of a specific implementation of step S204 in Figure 2;

[0053] Figure 7 is a schematic structural diagram of the text recognition device for handwriting recognition in mid-air provided in Embodiment 2 of this application;

[0054] Figure 8 is a schematic diagram of a specific embodiment of the preprocessing module 230 in Figure 7;

[0055] Figure 9 is a schematic structural diagram of one embodiment of the computer device according to this application.

Detailed Implementation

[0056] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application pertains; the terminology used herein in the specification of the application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "comprising" and "having," and any variations thereof, in the specification, claims, and foregoing drawings of this application, are intended to cover non-exclusive inclusion. The terms "first," "second," etc., in the specification, claims, or foregoing drawings of this application are used to distinguish different objects, not to describe a particular order.

[0057] In this document, the term "embodiment" means that a particular feature, structure, or characteristic described in connection with an embodiment may be included in at least one embodiment of this application. The appearance of this phrase in various places throughout the specification does not necessarily refer to the same embodiment, nor is it a separate or alternative embodiment mutually exclusive with other embodiments. It will be explicitly and implicitly understood by those skilled in the art that the embodiments described herein can be combined with other embodiments.

[0058] To enable those skilled in the art to better understand the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.

[0059] As shown in Figure 1, system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105. Network 104 serves as the medium providing communication links between terminal devices 101, 102, and 103 and server 105. Network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables.

[0060] Users can use terminal devices 101, 102, and 103 to interact with server 105 via network 104 to receive or send messages, etc. Various communication client applications can be installed on terminal devices 101, 102, and 103, such as web browser applications, shopping applications, search applications, instant messaging tools, email clients, social media platform software, etc.

[0061] Terminal devices 101, 102, and 103 can be various electronic devices that have displays and support web browsing, including but not limited to smartphones, tablets, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptops, and desktop computers.

[0062] Server 105 can be a server that provides various services, such as a backend server that supports the pages displayed on terminal devices 101, 102, and 103.

[0063] It should be noted that the text recognition method for handwriting in the air provided in this application is generally executed by a server / terminal device, and correspondingly, the text recognition device for handwriting in the air is generally set in the server / terminal device.

[0064] It should be understood that the numbers of terminal devices, networks, and servers shown in Figure 1 are merely illustrative. Depending on implementation needs, any number of terminal devices, networks, and servers can be included.

[0065] Example 1

[0066] Figure 2 shows the implementation flowchart of the text recognition method for handwritten text in the air provided in Embodiment 1 of this application. For ease of explanation, only the parts related to this application are shown.

[0067] The above-mentioned text recognition method applied to handwritten text in the air includes the following steps: step S201, step S202, step S203 and step S204.

[0068] In step S201, the handwritten trajectory information detected by the wearable sensor is obtained, wherein the wearable sensor has a three-axis accelerometer and a gyroscope.

[0069] In this embodiment, the wearable sensor can be fixed to the finger used for air handwriting. The sensor has a three-axis accelerometer and a gyroscope, which can accurately collect the trajectory of the air handwriting.

[0070] In step S202, spatial transformation operation is performed on the handwritten trajectory information according to the spatial positioning method to obtain spatial trajectory information.

[0071] In this embodiment of the application, spatial positioning is used to transform the information collected by the sensor into a three-dimensional spatial motion trajectory.
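The patent does not disclose which spatial positioning method converts the accelerometer and gyroscope readings into a three-dimensional trajectory. A textbook starting point, assumed here, is strapdown dead reckoning: integrate the angular rate to track orientation, rotate each acceleration sample into the world frame, subtract gravity, and integrate twice. The sketch assumes the sensor starts at rest at the origin with its z axis pointing up, angular rates in rad/s, accelerations in m/s², and a fixed sample period `dt`; all names are hypothetical.

```python
import numpy as np

GRAVITY = np.array([0.0, 0.0, 9.81])  # assumed world frame with z pointing up, in m/s^2


def skew(v):
    """Cross-product matrix: skew(v) @ u equals np.cross(v, u)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])


def rotation_from_gyro(omega, dt):
    """Incremental rotation exp(skew(omega * dt)) via Rodrigues' formula."""
    rot_vec = np.asarray(omega, dtype=float) * dt
    theta = np.linalg.norm(rot_vec)
    if theta < 1e-12:
        return np.eye(3) + skew(rot_vec)           # first-order approximation for tiny angles
    k = skew(rot_vec / theta)
    return np.eye(3) + np.sin(theta) * k + (1.0 - np.cos(theta)) * (k @ k)


def dead_reckon(accel, gyro, dt):
    """Integrate body-frame accelerometer (m/s^2) and gyroscope (rad/s) samples.

    Assumes the sensor starts at rest at the origin with its axes aligned to the
    world frame. Returns (N + 1, 3) positions, starting with the origin.
    """
    R = np.eye(3)                                   # body-to-world orientation
    v = np.zeros(3)
    p = np.zeros(3)
    positions = [p.copy()]
    for a_body, w_body in zip(accel, gyro):
        a_world = R @ np.asarray(a_body, dtype=float) - GRAVITY   # remove gravity in the world frame
        v = v + a_world * dt                        # semi-implicit Euler: update velocity first,
        p = p + v * dt                              # then position with the updated velocity
        R = R @ rotation_from_gyro(w_body, dt)      # body-frame angular rate: right-multiply
        positions.append(p.copy())
    return np.array(positions)
```

Double integration amplifies sensor bias and noise, so a trajectory obtained this way drifts within seconds; practical systems typically add calibration, bias estimation, or sensor fusion on top of this basic scheme.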

[0072] In step S203, the spatial trajectory information is preprocessed to obtain preprocessed trajectory information.

[0073] In this embodiment of the application, the preprocessing may involve performing a two-dimensional projection operation on the handwritten trajectory information to obtain two-dimensional trajectory information; performing a denoising operation on the two-dimensional trajectory information to obtain denoised trajectory information; performing a serialization operation on the denoised trajectory information to obtain trajectory sequence information; and extracting the coordinate offset, trajectory point writing direction, and trajectory curvature of the trajectory sequence information to obtain preprocessed trajectory information.

[0074] In step S204, the preprocessed trajectory information is input into the trained aerial handwriting recognition model for a text recognition operation to obtain the handwriting trajectory recognition result. The aerial handwriting recognition model consists of a convolutional neural network, a recurrent neural network (RNN), and a language model.

[0075] In this embodiment, the text recognition operation may involve extracting low-level local spatial features and contextual information of preprocessed trajectory information using a convolutional neural network; extracting high-level temporal features and semantic information of preprocessed trajectory information using a recurrent neural network (RNN); and recognizing the low-level local spatial features, contextual information, high-level temporal features, and semantic information using a language model to obtain the handwritten trajectory recognition result.

[0076] In this embodiment, the three-dimensional aerial handwriting trajectory is preprocessed through projection, tilt correction, size normalization, and resampling. A CNN then extracts low-level local features and contextual information from the handwriting motion trajectory, an LSTM-RNN network extracts high-level temporal features and semantic information of the handwriting trajectory, and finally the CTC function and the language model are combined to decode and output the handwriting recognition result. This solves the problems of existing aerial handwriting recognition systems.
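The decoding step in [0076] combines CTC with a language model. The language-model part is not specified, but the CTC part can be illustrated with best-path (greedy) decoding: take the most probable class in each frame, merge consecutive repeats, and drop blanks. The sketch below assumes class 0 is the CTC blank and uses a hypothetical digit character set; a system with a language model would typically replace the greedy step with a beam search that adds language-model scores.

```python
import numpy as np


def ctc_greedy_decode(probs, blank=0):
    """Best-path CTC decoding of (T, C) per-frame probabilities into label ids."""
    best_path = np.argmax(probs, axis=1).tolist()   # most likely class per frame
    labels, prev = [], None
    for k in best_path:
        if k != prev and k != blank:                # merge repeats, then drop blanks
            labels.append(k)
        prev = k
    return labels


def labels_to_text(labels, charset):
    """Map label ids to characters; charset[0] is reserved for the blank."""
    return "".join(charset[i] for i in labels)
```

For example, a best path of `[2, 2, 0, 2, 3, 3, 0]` with the hypothetical charset `"-0123456789"` decodes to the label ids `[2, 2, 3]`, that is, the string `"112"`. The blank between the two runs of 2 is what allows a repeated character to survive the merge step.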

[0077] This application provides a text recognition method for aerial handwriting, comprising: acquiring handwriting trajectory information detected by a wearable sensor, wherein the wearable sensor has a three-axis accelerometer and a gyroscope; performing a spatial transformation operation on the handwriting trajectory information according to a spatial positioning method to obtain spatial trajectory information; preprocessing the spatial trajectory information to obtain preprocessed trajectory information; and inputting the preprocessed trajectory information into a trained aerial handwriting recognition model for text recognition to obtain a handwriting trajectory recognition result. The aerial handwriting recognition model consists of a convolutional neural network, a recurrent neural network (RNN), and a language model. Compared with existing technologies, this application uses a one-dimensional CNN to extract low-level features of the handwriting trajectory; this CNN has fewer model parameters, higher training efficiency, and better real-time performance than the two-dimensional FCRN model used in existing technologies. Furthermore, by utilizing both the low-level local features and the high-level temporal features of the trajectory, it can recognize character sequences, not just digits and single characters. By preprocessing the handwriting trajectory, it recognizes handwriting trajectories of different sizes and angles, allowing the writer to write freely and making it more practical.

[0078] Next, Figure 3 shows a flowchart of a specific embodiment of step S203 in Figure 2. For ease of explanation, only the parts relevant to this application are shown.

[0079] In some optional implementations of this embodiment, step S203 specifically includes: step S301, step S302, step S303 and step S304.

[0080] In step S301, the handwritten trajectory information is subjected to a two-dimensional projection operation to obtain two-dimensional trajectory information;

[0081] In step S302, the two-dimensional trajectory information is denoised to obtain denoised trajectory information;

[0082] In step S303, the denoised trajectory information is serialized to obtain trajectory sequence information;

[0083] In step S304, the coordinate offset, trajectory point writing direction, and trajectory curvature of the trajectory sequence information are extracted to obtain preprocessed trajectory information.

[0084] In this embodiment, the two-dimensional projection operation can be to project the handwritten trajectory information onto a two-dimensional plane based on the minimum sum of squared distances to obtain the two-dimensional trajectory information.

[0085] In some optional implementations of this embodiment, step S301 includes: projecting the handwritten trajectory information onto a two-dimensional plane based on the minimum sum of squared distances to obtain two-dimensional trajectory information, wherein the minimum sum of squared distances is the minimum sum of squared distances from all trajectory points in the handwritten trajectory information to the two-dimensional plane.

[0086] In this embodiment of the application, the optimal projection plane can be obtained by projecting the handwritten trajectory information onto a two-dimensional plane using the minimum sum of squared distances.

[0087] Next, Figure 4 shows a flowchart of a specific embodiment of step S302 in Figure 3. For ease of explanation, only the parts relevant to this application are shown.

[0088] In some optional implementations of this embodiment, step S302 specifically includes: step S401 and/or step S402 and/or step S403.

[0089] In step S401, a tilt correction operation is performed on the two-dimensional trajectory information; and/or

[0090] In step S402, a size normalization operation is performed on the two-dimensional trajectory information; and/or

[0091] In step S403, the two-dimensional trajectory information is resampled.

[0092] In this embodiment of the application, tilt correction, size normalization and resampling are performed on the two-dimensional trajectory information obtained by projection, which can effectively reduce the differences in the spatial position, size and angle of the text caused by writing and improve the accuracy of the data.
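Resampling is commonly done at equal arc-length intervals so that fast and slow strokes contribute points at the same spatial density. The sketch below assumes that approach (the patent does not specify it) and linearly interpolates along the cumulative path length; `resample_uniform` and its parameters are hypothetical names.

```python
import numpy as np


def resample_uniform(points_2d, n_points):
    """Resample a 2-D polyline to n_points spaced evenly along its arc length."""
    pts = np.asarray(points_2d, dtype=float)
    # Drop consecutive duplicates so the cumulative arc length is strictly increasing.
    step = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    pts = pts[np.concatenate([[True], step > 0])]
    if len(pts) == 1:
        return np.repeat(pts, n_points, axis=0)       # degenerate trajectory: a single point
    arc = np.concatenate([[0.0], np.cumsum(np.linalg.norm(np.diff(pts, axis=0), axis=1))])
    targets = np.linspace(0.0, arc[-1], n_points)     # evenly spaced arc-length positions
    x = np.interp(targets, arc, pts[:, 0])
    y = np.interp(targets, arc, pts[:, 1])
    return np.column_stack([x, y])
```

For example, resampling the L-shaped path `[[0, 0], [2, 0], [2, 2]]` to 5 points gives `[[0, 0], [1, 0], [2, 0], [2, 1], [2, 2]]`, one point per unit of path length.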

[0093] Next, Figure 5 shows a flowchart of a specific implementation performed before step S204 in Figure 2. For ease of explanation, only the parts relevant to this application are shown.

[0094] In some optional implementations of this embodiment, before step S204, steps S501, S502, S503 and S504 are also included.

[0095] In step S501, the original air handwriting recognition model is invoked;

[0096] In step S502, the database is read, and training data and test data are obtained from the database;

[0097] In step S503, the training data is input into the original air handwriting recognition model for prediction to obtain the prediction result data;

[0098] In step S504, a CTC loss function is constructed based on the prediction result data and test data, and the parameters of the original air handwriting recognition model are tuned based on the CTC loss function to obtain a trained air handwriting recognition model.

[0099] In this embodiment, the CTC loss function mainly measures the loss between the output labels and the ground-truth labels, and it resolves the mismatch between the number of output labels and the number of ground-truth labels.
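The CTC loss in [0098] and [0099] sums the probability of every frame-level alignment that collapses to the target label sequence, which is how it copes with output and label sequences of different lengths. A minimal log-space implementation of the standard forward (alpha) recursion is sketched below, assuming class 0 is the blank and that the target labels come from what [0098] calls the test data. The function name is hypothetical; for actual training a framework implementation such as PyTorch's `torch.nn.CTCLoss` would normally be used.

```python
import numpy as np


def ctc_neg_log_likelihood(log_probs, target, blank=0):
    """CTC loss -log p(target | x) via the forward (alpha) recursion in log space.

    log_probs: (T, C) per-frame log-probabilities; target: label ids without blanks.
    """
    T = log_probs.shape[0]
    ext = [blank]                                 # target with blanks interleaved: -a-b-
    for label in target:
        ext += [label, blank]
    S = len(ext)
    alpha = np.full((T, S), -np.inf)
    alpha[0, 0] = log_probs[0, blank]             # a path may start with a blank...
    if S > 1:
        alpha[0, 1] = log_probs[0, ext[1]]        # ...or with the first label
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1, s]                   # stay on the same symbol
            if s >= 1:
                a = np.logaddexp(a, alpha[t - 1, s - 1])      # advance by one position
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                a = np.logaddexp(a, alpha[t - 1, s - 2])      # skip a blank between different labels
            alpha[t, s] = a + log_probs[t, ext[s]]
    total = alpha[T - 1, S - 1]                   # end on the final blank...
    if S > 1:
        total = np.logaddexp(total, alpha[T - 1, S - 2])      # ...or on the last label
    return -total
```

Working in log space with `np.logaddexp` avoids the underflow that multiplying many small per-frame probabilities would cause on long trajectories.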

[0100] Next, Figure 6 shows a flowchart of a specific embodiment of step S204 in Figure 2. For ease of explanation, only the parts relevant to this application are shown.

[0101] In some optional implementations of this embodiment, step S204 specifically includes: step S601, step S602 and step S603.

[0102] In step S601, low-level local spatial features and contextual information of the preprocessed trajectory information are extracted based on the convolutional neural network;

[0103] In step S602, high-level temporal features and semantic information of the preprocessed trajectory information are extracted based on the recurrent neural network (RNN);

[0104] In step S603, the handwritten trajectory recognition result is obtained by identifying low-level local spatial features, contextual information, high-level temporal features, and semantic information based on the language model.

[0105] Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by instructing related hardware with computer-readable instructions. These computer-readable instructions can be stored in a computer-readable storage medium, and when executed, they can include the processes of the embodiments of the methods described above. The aforementioned storage medium can be a non-volatile storage medium such as a magnetic disk, optical disk, or read-only memory (ROM), or random access memory (RAM).

[0106] It should be understood that although the steps in the flowcharts of the accompanying figures are shown sequentially as indicated by the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they can be executed in other orders. Moreover, at least some steps in the flowcharts of the accompanying figures may include multiple sub-steps or multiple stages. These sub-steps or stages are not necessarily completed at the same time, but can be executed at different times, and their execution order is not necessarily sequential, but can be performed alternately or in turn with other steps or at least some of the sub-steps or stages of other steps.

[0107] Example 2

[0108] With further reference to Figure 7, as an implementation of the method shown in Figure 2, this application provides an embodiment of a text recognition device for handwritten text in mid-air. This device embodiment corresponds to the method embodiment shown in Figure 2, and the device can be applied to various electronic devices.

[0109] As shown in Figure 7, the text recognition device 200 for air handwriting in this embodiment includes a trajectory acquisition module 210, a spatial conversion module 220, a preprocessing module 230, and a text recognition module 240.

[0110] Wherein:

[0111] The trajectory acquisition module 210 is used to acquire handwritten trajectory information detected by the wearable sensor, wherein the wearable sensor has a three-axis accelerometer and a gyroscope;

[0112] The spatial conversion module 220 is used to perform spatial conversion operations on the handwritten trajectory information according to the spatial positioning method to obtain spatial trajectory information;

[0113] The preprocessing module 230 is used to preprocess the spatial trajectory information to obtain preprocessed trajectory information;

[0114] The text recognition module 240 is used to input preprocessed trajectory information into the trained aerial handwriting recognition model to perform text recognition operations and obtain handwriting trajectory recognition results. The aerial handwriting recognition model consists of a convolutional neural network, an RNN recurrent neural network, and a language model.

[0115] In this embodiment, the wearable sensor can be fixed to the finger used for air handwriting. The sensor has a three-axis accelerometer and a gyroscope, which can accurately collect the trajectory of the air handwriting.

[0116] In this embodiment of the application, spatial positioning is used to transform the information collected by the sensor into a three-dimensional spatial motion trajectory.
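The patent does not disclose which spatial positioning method is used. One common approach for a sensor with an accelerometer and a gyroscope is strapdown inertial dead reckoning: integrate the angular rate into an orientation, rotate the measured acceleration into a world frame, subtract gravity, and integrate twice. The sketch below assumes that approach, a z-up world frame aligned with the sensor at the first sample, a fixed sampling interval `dt`, and ideal (bias-free) readings; `dead_reckon` and its helpers are hypothetical names.

```python
import math

def quat_mul(q, r):
    """Hamilton product of quaternions given as (w, x, y, z)."""
    w1, x1, y1, z1 = q
    w2, x2, y2, z2 = r
    return (
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    )

def rotate(q, v):
    """Rotate 3-vector v by unit quaternion q, computed as q * v * conj(q)."""
    w, x, y, z = q
    _, rx, ry, rz = quat_mul(quat_mul(q, (0.0, *v)), (w, -x, -y, -z))
    return (rx, ry, rz)

def dead_reckon(accel, gyro, dt, g=9.81):
    """Integrate body-frame accelerometer (m/s^2) and gyroscope (rad/s) samples into 3-D positions."""
    q = (1.0, 0.0, 0.0, 0.0)  # assumption: sensor frame equals world frame at t = 0
    vel = [0.0, 0.0, 0.0]
    pos = [0.0, 0.0, 0.0]
    track = []
    for a_body, w_body in zip(accel, gyro):
        # 1) Update orientation with the rotation accumulated over one sample.
        rate = math.sqrt(sum(c * c for c in w_body))
        if rate > 0.0:
            half = rate * dt / 2
            s = math.sin(half) / rate
            q = quat_mul(q, (math.cos(half), w_body[0] * s, w_body[1] * s, w_body[2] * s))
        # 2) Express the measured specific force in the world frame and remove gravity.
        a_world = rotate(q, a_body)
        lin = (a_world[0], a_world[1], a_world[2] - g)
        # 3) Semi-implicit Euler double integration: velocity first, then position.
        for i in range(3):
            vel[i] += lin[i] * dt
            pos[i] += vel[i] * dt
        track.append(tuple(pos))
    return track
```

Because sensor noise and bias are integrated twice, the position error of plain dead reckoning grows quickly with time, so a practical system would need some form of drift compensation, for example resetting the velocity whenever the finger is detected to be stationary.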

[0117] In this embodiment of the application, the preprocessing may involve performing a two-dimensional projection operation on the handwritten trajectory information to obtain two-dimensional trajectory information; performing a denoising operation on the two-dimensional trajectory information to obtain denoised trajectory information; performing a serialization operation on the denoised trajectory information to obtain trajectory sequence information; and extracting the coordinate offset, trajectory point writing direction, and trajectory curvature of the trajectory sequence information to obtain preprocessed trajectory information.
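The last preprocessing step extracts the coordinate offset, writing direction, and curvature of each trajectory point. The patent does not give formulas, so the sketch below assumes that the offset is the difference between consecutive points, the direction is the unit vector of that offset, and the curvature is the signed turning angle between consecutive segments. The function name `trajectory_features` is hypothetical.

```python
import math

def trajectory_features(points):
    """Per-segment features [dx, dy, cos(theta), sin(theta), turning angle]."""
    feats = []
    prev_angle = None
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0            # coordinate offset
        length = math.hypot(dx, dy)
        if length == 0.0:
            continue                          # duplicate point: direction undefined, skip it
        angle = math.atan2(dy, dx)            # writing direction
        if prev_angle is None:
            curvature = 0.0
        else:
            turn = angle - prev_angle
            curvature = math.atan2(math.sin(turn), math.cos(turn))  # wrapped to [-pi, pi]
        feats.append([dx, dy, dx / length, dy / length, curvature])
        prev_angle = angle
    return feats
```

Encoding the direction as a (cos, sin) pair rather than a raw angle avoids the discontinuity at plus or minus pi, which tends to be easier for a neural network to learn from.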

[0118] In this embodiment, the text recognition operation may involve extracting low-level local spatial features and contextual information of preprocessed trajectory information using a convolutional neural network; extracting high-level temporal features and semantic information of preprocessed trajectory information using a recurrent neural network (RNN); and recognizing the low-level local spatial features, contextual information, high-level temporal features, and semantic information using a language model to obtain the handwritten trajectory recognition result.
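The patent states that the model consists of a CNN, an RNN (an LSTM-RNN according to paragraph [0119]), and a language model, but gives no layer sizes. The NumPy sketch below shows only the data flow through the first two stages for a sequence of per-point trajectory features. It assumes one valid 1-D convolution layer with ReLU, one LSTM layer, and a linear softmax output over a hypothetical 11-class alphabet (10 digits plus a CTC blank); all names, sizes, and the random, untrained weights are illustrative assumptions.

```python
import numpy as np

def conv1d(x, w, b):
    """Valid 1-D convolution + ReLU. x: (T, C_in), w: (k, C_in, C_out), b: (C_out,)."""
    k = w.shape[0]
    out = np.stack([
        np.tensordot(x[t:t + k], w, axes=([0, 1], [0, 1]))
        for t in range(x.shape[0] - k + 1)
    ])
    return np.maximum(out + b, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm(x, W, U, b):
    """Single-layer LSTM. x: (T, D), W: (D, 4H), U: (H, 4H), b: (4H,); gate order i, f, g, o."""
    H = U.shape[0]
    h, c = np.zeros(H), np.zeros(H)
    outputs = []
    for x_t in x:
        i, f, g, o = np.split(x_t @ W + h @ U + b, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)   # cell state update
        h = sigmoid(o) * np.tanh(c)                    # hidden state / output
        outputs.append(h)
    return np.stack(outputs)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def init_params(d_in=5, c_conv=16, k=3, hidden=32, n_classes=11):
    rng = np.random.default_rng(0)
    return {
        "conv_w": rng.normal(0, 0.1, (k, d_in, c_conv)), "conv_b": np.zeros(c_conv),
        "W": rng.normal(0, 0.1, (c_conv, 4 * hidden)),
        "U": rng.normal(0, 0.1, (hidden, 4 * hidden)), "b": np.zeros(4 * hidden),
        "out_w": rng.normal(0, 0.1, (hidden, n_classes)), "out_b": np.zeros(n_classes),
    }

def recognize_frames(features, params):
    """CNN (local features) -> LSTM (temporal features) -> per-frame class probabilities."""
    h = conv1d(features, params["conv_w"], params["conv_b"])
    h = lstm(h, params["W"], params["U"], params["b"])
    return softmax(h @ params["out_w"] + params["out_b"])
```

The valid convolution shortens the sequence by `k - 1` frames; the per-frame probability rows it returns are the kind of input a CTC decoder, optionally combined with a language model, would consume.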

[0119] In this embodiment, the three-dimensional aerial handwriting trajectory is first preprocessed through projection, tilt correction, size normalization, and resampling. A CNN then extracts low-level local features and contextual information from the handwriting motion trajectory, and an LSTM-RNN network extracts high-level temporal features and semantic information of the trajectory. Finally, the CTC function and the language model are combined to decode and output the handwriting recognition result, thereby solving the problems of existing aerial handwriting recognition systems.
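Paragraph [0119] combines CTC with a language model for decoding. The simplest CTC decoding rule, best-path (greedy) decoding, is sketched below without any language model: take the most probable class in each frame, merge consecutive repeats, then drop blanks. It assumes the blank symbol is at index 0; `ctc_greedy_decode` is a hypothetical name, and a system that uses a language model would normally replace this with a beam search.

```python
def ctc_greedy_decode(frame_probs, alphabet, blank=0):
    """Best-path CTC decoding: per-frame argmax, merge repeats, remove blanks."""
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    out = []
    prev = None
    for idx in best:
        if idx != prev and idx != blank:   # a new, non-blank symbol starts here
            out.append(alphabet[idx])
        prev = idx
    return "".join(out)
```

The blank is what lets CTC emit a genuinely repeated character: the frame sequence `a a - a b b -` decodes to `aab`, whereas `a a a b b` would decode to `ab`.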

[0120] In this embodiment of the application, a text recognition device 200 for air handwriting is provided, comprising: a trajectory acquisition module 210 for acquiring handwriting trajectory information detected by a wearable sensor, wherein the wearable sensor has a three-axis accelerometer and a gyroscope; a spatial transformation module 220 for performing spatial transformation operations on the handwriting trajectory information according to a spatial positioning method to obtain spatial trajectory information; a preprocessing module 230 for preprocessing the spatial trajectory information to obtain preprocessed trajectory information; and a text recognition module 240 for inputting the preprocessed trajectory information into a trained air handwriting recognition model for text recognition operations to obtain handwriting trajectory recognition results, wherein the air handwriting recognition model is composed of a convolutional neural network, a recurrent neural network (RNN), and a language model. Unlike the prior art, this application uses a one-dimensional CNN to extract low-level features of handwritten trajectories; compared with the two-dimensional FCRN model used in the prior art, this model has fewer parameters, trains more efficiently, and has better real-time performance. In addition, by using both the low-level local features and the high-level temporal features of the trajectory, it can recognize sequential text rather than only digits and single characters. Through preprocessing of the handwritten trajectories, it can recognize trajectories of different sizes and angles, which allows users to write freely and makes the system more practical.
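The parameter advantage claimed for a one-dimensional CNN follows from how convolution weights scale: a 1-D kernel spans $k$ positions while a 2-D kernel spans $k \times k$, so a layer has $C_{in} C_{out} k + C_{out}$ weights in 1-D versus $C_{in} C_{out} k^2 + C_{out}$ in 2-D. The sketch below only evaluates these standard formulas for single layers of hypothetical size; it does not reproduce the FCRN model or the patent's network, whose layer configurations are not given.

```python
def conv1d_params(c_in, c_out, k, bias=True):
    """Weight count of one 1-D convolution layer (kernel length k)."""
    return c_in * c_out * k + (c_out if bias else 0)

def conv2d_params(c_in, c_out, k, bias=True):
    """Weight count of one 2-D convolution layer (square k x k kernel)."""
    return c_in * c_out * k * k + (c_out if bias else 0)

# Hypothetical layer: 64 input channels, 64 output channels, kernel size 3.
print(conv1d_params(64, 64, 3), conv2d_params(64, 64, 3))
```

For this example the 1-D layer has 12,352 weights and the 2-D layer 36,928, roughly a factor of $k$ fewer; the overall model-level difference also depends on depth, width, and input size.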

[0121] With continued reference to Figure 8, which is a schematic structural diagram of a specific implementation of the preprocessing module 230 shown in Figure 7. For ease of explanation, only the parts relevant to this application are shown.

[0122] In some optional implementations of this embodiment, the preprocessing module 230 includes: a two-dimensional projection submodule 231, a denoising submodule 232, a serialization submodule 233, and a data extraction submodule 234, wherein:

[0123] The two-dimensional projection submodule 231 is used to perform a two-dimensional projection operation on the handwritten trajectory information to obtain two-dimensional trajectory information;

[0124] The denoising submodule 232 is used to perform denoising operations on the two-dimensional trajectory information to obtain denoised trajectory information;

[0125] The serialization submodule 233 is used to perform serialization operations on the denoised trajectory information to obtain trajectory sequence information;

[0126] The data extraction submodule 234 is used to extract the coordinate offset, trajectory point writing direction and trajectory curvature of the trajectory sequence information to obtain preprocessed trajectory information.

[0127] In this embodiment, the two-dimensional projection operation may consist of projecting the handwritten trajectory information onto a two-dimensional plane selected by the minimum sum of squared distances, to obtain the two-dimensional trajectory information.

[0128] In some optional implementations of this embodiment, the two-dimensional projection submodule 231 includes: a two-dimensional projection unit, wherein:

[0129] The two-dimensional projection unit is used to project the handwritten trajectory information onto a two-dimensional plane based on the minimum sum of squared distances to obtain the two-dimensional trajectory information, that is, the plane is chosen so that the sum of the squared distances from all trajectory points in the handwritten trajectory information to the plane is as small as possible.

[0130] In this embodiment of the application, using the minimum sum of squared distances as the criterion yields the optimal projection plane for the handwritten trajectory information.
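The plane that minimizes the sum of squared point-to-plane distances passes through the centroid of the points, and its normal is the right singular vector with the smallest singular value of the centered point matrix. A minimal NumPy sketch of that projection follows; the function name is hypothetical, and the patent does not state that SVD is how its plane is computed.

```python
import numpy as np

def project_to_best_plane(points):
    """Project 3-D points onto their least-squares plane; return (N, 2) in-plane coordinates."""
    p = np.asarray(points, dtype=float)
    centered = p - p.mean(axis=0)          # the optimal plane passes through the centroid
    # Rows of vt are principal directions sorted by decreasing singular value; the last
    # row is the normal that minimises the sum of squared point-to-plane distances.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:2]                         # two orthonormal axes spanning the plane
    return centered @ basis.T
```

The in-plane axes are only defined up to rotation and reflection, so the resulting 2-D trajectory may be arbitrarily oriented; this is one reason a subsequent tilt correction step is useful.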

[0131] In some optional implementations of this embodiment, the denoising submodule 232 includes a tilt correction unit and/or a size normalization unit and/or a resampling unit, wherein:

[0132] The tilt correction unit is used to perform a tilt correction operation on the two-dimensional trajectory information; and/or

[0133] The size normalization unit is used to perform a size normalization operation on the two-dimensional trajectory information; and/or

[0134] The resampling unit is used to resample the two-dimensional trajectory information.

[0135] In this embodiment of the application, tilt correction, size normalization, and resampling are performed on the two-dimensional trajectory information obtained by projection. This effectively reduces the differences in spatial position, size, and angle of the text that arise during writing, and improves the accuracy of the data.
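The patent names the three normalization steps but not their algorithms. The sketch below assumes common choices: tilt correction rotates the trajectory so that its least-squares regression line becomes horizontal, size normalization translates it to the origin and scales it uniformly so the longer bounding-box side equals 1, and resampling places a fixed number of points at equal arc-length spacing. All function names and these specific choices are assumptions.

```python
import math

def correct_tilt(points):
    """Rotate (about the centroid) so the regression line y ~ x becomes horizontal."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    angle = math.atan2(sxy, sxx)               # inclination of the regression line
    c, s = math.cos(-angle), math.sin(-angle)
    return [((x - mx) * c - (y - my) * s, (x - mx) * s + (y - my) * c) for x, y in points]

def normalize_size(points, target=1.0):
    """Move the bounding box to the origin and scale uniformly so its longer side equals target."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    min_x, min_y = min(xs), min(ys)
    extent = max(max(xs) - min_x, max(ys) - min_y)
    scale = target / extent if extent > 0 else 1.0
    return [((x - min_x) * scale, (y - min_y) * scale) for x, y in points]

def resample(points, n):
    """Return n points equally spaced along the arc length of the polyline."""
    seg = [math.dist(a, b) for a, b in zip(points, points[1:])]
    total = sum(seg)
    if total == 0.0 or n < 2:
        return [points[0]] * n
    step = total / (n - 1)
    out = [points[0]]
    i, acc = 0, 0.0                            # acc = arc length where segment i starts
    for k in range(1, n - 1):
        target = k * step
        while acc + seg[i] < target:           # advance to the segment containing target
            acc += seg[i]
            i += 1
        t = (target - acc) / seg[i]            # interpolation parameter within segment i
        (x0, y0), (x1, y1) = points[i], points[i + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    out.append(points[-1])
    return out
```

Scaling by the longer side preserves the aspect ratio of the writing, and equal arc-length resampling removes the dependence of point density on writing speed.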

[0136] In some optional implementations of this embodiment, the apparatus 200 further includes: a model invocation module, a training data acquisition module, a model prediction module, and a model training module, wherein:

[0137] The model invocation module is used to invoke the original aerial handwriting recognition model;

[0138] The training data acquisition module is used to read the database and retrieve training and test data from it.

[0139] The model prediction module is used to input training data into the original air handwriting recognition model to perform prediction operations and obtain prediction result data.

[0140] The model training module is used to construct the CTC loss function based on the prediction result data and the test data, and to tune the parameters of the original aerial handwriting recognition model based on the CTC loss function to obtain the trained aerial handwriting recognition model.

[0141] In this embodiment, the CTC loss function measures the loss between the output labels and the ground-truth labels, and resolves the mismatch between the number of output labels and the number of ground-truth labels without requiring a frame-by-frame alignment.
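The CTC loss is the negative log of the total probability of all frame-level paths that collapse to the target label sequence, and it can be computed with the standard forward (alpha) recursion over the label sequence with blanks inserted between and around the labels. The pure-Python sketch below computes only the loss value, works in the probability domain (so it would underflow on long sequences), and assumes the blank is class 0; a real training setup would use a framework implementation such as PyTorch's `torch.nn.CTCLoss`, which also provides gradients.

```python
import math

def ctc_loss(probs, label, blank=0):
    """CTC negative log-likelihood. probs: T rows of per-class probabilities; label: class ids."""
    ext = [blank]
    for c in label:
        ext += [c, blank]                 # e.g. [a, b] -> [-, a, -, b, -]
    S = len(ext)
    alpha = [0.0] * S
    alpha[0] = probs[0][blank]            # a path may start with a blank
    if S > 1:
        alpha[1] = probs[0][ext[1]]       # ... or with the first label
    for t in range(1, len(probs)):
        new = [0.0] * S
        for s in range(S):
            a = alpha[s]                  # stay on the same symbol
            if s >= 1:
                a += alpha[s - 1]         # advance by one position
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[s - 2]         # skip a blank between two different labels
            new[s] = a * probs[t][ext[s]]
        alpha = new
    total = alpha[-1] + (alpha[-2] if S > 1 else 0.0)  # end on the final blank or final label
    return -math.log(total)
```

For example, with two frames of uniform probabilities over {blank, 1} and target `[1]`, three paths (`1 1`, `- 1`, `1 -`) collapse to the target, giving a probability of 0.75; a repeated label such as `[1, 1]` needs at least three frames because a blank must separate the repeats.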

[0142] In some optional implementations of this embodiment, the text recognition module 240 includes: a first feature extraction submodule, a second feature extraction submodule, and a text recognition submodule, wherein:

[0143] The first feature extraction submodule is used to extract low-level local spatial features and contextual information of preprocessed trajectory information based on the convolutional neural network.

[0144] The second feature extraction submodule is used to extract high-level temporal features and semantic information of preprocessed trajectory information based on the RNN recurrent neural network.

[0145] The text recognition submodule is used to identify low-level local spatial features, contextual information, high-level temporal features, and semantic information based on the language model to obtain handwritten trajectory recognition results.

[0146] To address the aforementioned technical problems, embodiments of this application also provide a computer device. Please refer to Figure 9, which is a basic structural block diagram of the computer device in this embodiment.

[0147] The computer device 300 includes a memory 310, a processor 320, and a network interface 330 that are interconnected via a system bus. It should be noted that only the computer device 300 with components 310-330 is shown in the figure; however, it should be understood that it is not required to implement all the shown components, and more or fewer components can be implemented alternatively. Those skilled in the art will understand that the computer device described here is a device capable of automatically performing numerical calculations and / or information processing according to pre-set or stored instructions, and its hardware includes, but is not limited to, microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), digital signal processors (DSPs), embedded devices, etc.

[0148] The computer device can be a desktop computer, laptop, handheld computer, or cloud server, etc. The computer device can interact with the user via a keyboard, mouse, remote control, touchpad, or voice control.

[0149] The memory 310 includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the memory 310 may be an internal storage unit of the computer device 300, such as the hard disk or memory of the computer device 300. In other embodiments, the memory 310 may also be an external storage device of the computer device 300, such as a plug-in hard disk, smart media card (SMC), secure digital (SD) card, flash card, etc. Of course, the memory 310 may also include both the internal storage unit and the external storage device of the computer device 300. In this embodiment, the memory 310 is typically used to store the operating system and various application software installed on the computer device 300, such as the computer-readable instructions of the text recognition method for air handwriting. Furthermore, the memory 310 can also be used to temporarily store various types of data that have been output or will be output.

[0150] In some embodiments, the processor 320 may be a central processing unit (CPU), controller, microcontroller, microprocessor, or other data processing chip. The processor 320 is typically used to control the overall operation of the computer device 300. In this embodiment, the processor 320 is used to execute the computer-readable instructions stored in the memory 310 or to process data, for example, to execute the computer-readable instructions of the text recognition method for air handwriting.

[0151] The network interface 330 may include a wireless network interface or a wired network interface, which is typically used to establish communication connections between the computer device 300 and other electronic devices.

[0152] The computer device provided in this application uses a one-dimensional CNN to extract low-level features of handwritten trajectories. Compared with the two-dimensional FCRN model used in the prior art, it has fewer model parameters, higher training efficiency, and better real-time performance. At the same time, it utilizes the low-level local features and high-level temporal features of the trajectory to recognize sequential text, not limited to the recognition of numbers and single characters. Through the preprocessing of handwritten trajectories, it can recognize handwritten trajectories of different sizes and angles, allowing writers to write freely and making it more practical.

[0153] This application also provides another embodiment, namely a computer-readable storage medium storing computer-readable instructions that can be executed by at least one processor to cause the at least one processor to perform the steps of the above-described text recognition method for air handwriting.

[0154] The computer-readable storage medium provided in this application uses a one-dimensional CNN to extract low-level features of handwritten trajectories. Compared with the two-dimensional FCRN model used in the prior art, it has fewer model parameters, higher training efficiency, and better real-time performance. At the same time, it utilizes the low-level local features and high-level temporal features of the trajectory to recognize sequential text, not limited to the recognition of numbers and single characters. Through the preprocessing of handwritten trajectories, it can recognize handwritten trajectories of different sizes and angles, allowing writers to write freely and making it more practical.

[0155] Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus necessary general-purpose hardware platforms. Of course, they can also be implemented by hardware, but in many cases the former is a better implementation method. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product is stored in a storage medium (such as ROM / RAM, magnetic disk, optical disk), and includes several instructions to cause a terminal device (which may be a mobile phone, computer, server, air conditioner, or network device, etc.) to execute the methods described in the various embodiments of this application.

[0156] Obviously, the embodiments described above are only some embodiments of this application, not all of them. The accompanying drawings show preferred embodiments of this application but do not limit its patent scope. This application can be implemented in many different forms and is not limited to the embodiments described herein; rather, these embodiments are provided so that the disclosure of this application will be understood more thoroughly and comprehensively. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art can still modify the technical solutions described in the foregoing specific embodiments, or make equivalent substitutions for some of their technical features. Any equivalent structure made using the content of this application's specification and drawings, whether applied directly or indirectly in other related technical fields, likewise falls within the scope of patent protection of this application.

Claims

1. A text recognition method for handwritten text in mid-air, characterized in that it comprises the following steps:
acquiring handwritten trajectory information detected by a wearable sensor, wherein the wearable sensor has a three-axis accelerometer and a gyroscope;
spatially transforming the handwritten trajectory information using a spatial positioning method to obtain spatial trajectory information;
preprocessing the spatial trajectory information to obtain preprocessed trajectory information;
inputting the preprocessed trajectory information into a trained aerial handwriting recognition model for text recognition to obtain a handwriting trajectory recognition result, wherein the aerial handwriting recognition model consists of a convolutional neural network, a recurrent neural network (RNN), and a language model;
wherein the step of preprocessing the handwritten trajectory information to obtain preprocessed trajectory information specifically includes:
performing a two-dimensional projection operation on the handwritten trajectory information to obtain two-dimensional trajectory information;
denoising the two-dimensional trajectory information to obtain denoised trajectory information;
serializing the denoised trajectory information to obtain trajectory sequence information;
extracting the coordinate offset, trajectory point writing direction, and trajectory curvature of the trajectory sequence information to obtain the preprocessed trajectory information;
wherein the step of inputting the preprocessed trajectory information into the trained aerial handwriting recognition model for text recognition to obtain the handwriting trajectory recognition result specifically includes:
extracting low-level local spatial features and contextual information of the preprocessed trajectory information based on the convolutional neural network;
extracting high-level temporal features and semantic information of the preprocessed trajectory information based on the recurrent neural network (RNN);
recognizing the low-level local spatial features, the contextual information, the high-level temporal features, and the semantic information based on the language model to obtain the handwritten trajectory recognition result;
and wherein the step of performing a two-dimensional projection operation on the handwritten trajectory information to obtain two-dimensional trajectory information specifically includes:
projecting the handwritten trajectory information onto a two-dimensional plane based on the minimum sum of squared distances to obtain the two-dimensional trajectory information, wherein the minimum sum of squared distances is the smallest attainable sum of the squared distances from all trajectory points in the handwritten trajectory information to the two-dimensional plane.

2. The text recognition method for handwritten text in mid-air according to claim 1, characterized in that the step of performing a denoising operation on the two-dimensional trajectory information to obtain denoised trajectory information specifically includes: performing a tilt correction operation on the two-dimensional trajectory information; and/or performing a size normalization operation on the two-dimensional trajectory information; and/or resampling the two-dimensional trajectory information.

3. The text recognition method for handwritten text in mid-air according to claim 1, characterized in that, before the step of inputting the preprocessed trajectory information into the trained aerial handwriting recognition model for text recognition to obtain the handwriting trajectory recognition result, the method further includes: calling the original air handwriting recognition model; reading the database and retrieving training data and test data from it; inputting the training data into the original air handwriting recognition model for prediction to obtain prediction result data; and constructing a CTC loss function based on the prediction result data and the test data, and tuning the parameters of the original aerial handwriting recognition model based on the CTC loss function to obtain the trained aerial handwriting recognition model.

4. A text recognition device for handwritten text in mid-air, characterized in that it comprises:
a trajectory acquisition module, used to acquire handwritten trajectory information detected by a wearable sensor, wherein the wearable sensor has a three-axis accelerometer and a gyroscope;
a spatial conversion module, used to perform spatial conversion operations on the handwritten trajectory information according to a spatial positioning method to obtain spatial trajectory information;
a preprocessing module, used to preprocess the spatial trajectory information to obtain preprocessed trajectory information;
a text recognition module, used to input the preprocessed trajectory information into a trained aerial handwriting recognition model to perform text recognition operations and obtain a handwriting trajectory recognition result, wherein the aerial handwriting recognition model is composed of a convolutional neural network, a recurrent neural network (RNN), and a language model;
wherein the preprocessing module includes a two-dimensional projection submodule, a denoising submodule, a serialization submodule, and a data extraction submodule, wherein:
the two-dimensional projection submodule is used to perform a two-dimensional projection operation on the handwritten trajectory information to obtain two-dimensional trajectory information;
the denoising submodule is used to perform denoising operations on the two-dimensional trajectory information to obtain denoised trajectory information;
the serialization submodule is used to perform serialization operations on the denoised trajectory information to obtain trajectory sequence information;
the data extraction submodule is used to extract the coordinate offset, trajectory point writing direction, and trajectory curvature of the trajectory sequence information to obtain the preprocessed trajectory information;
wherein the text recognition module includes a first feature extraction submodule, a second feature extraction submodule, and a text recognition submodule, wherein:
the first feature extraction submodule is used to extract low-level local spatial features and context information of the preprocessed trajectory information based on the convolutional neural network;
the second feature extraction submodule is used to extract high-level temporal features and semantic information of the preprocessed trajectory information based on the recurrent neural network (RNN);
the text recognition submodule is used to recognize the low-level local spatial features, the context information, the high-level temporal features, and the semantic information based on the language model to obtain the handwritten trajectory recognition result;
and wherein the two-dimensional projection submodule includes a two-dimensional projection unit, used to project the handwritten trajectory information onto a two-dimensional plane according to the minimum sum of squared distances to obtain the two-dimensional trajectory information, wherein the minimum sum of squared distances is the smallest attainable sum of the squared distances from all trajectory points in the handwritten trajectory information to the two-dimensional plane.

5. A computer device, characterized in that it comprises a memory and a processor, wherein the memory stores computer-readable instructions, and the processor executes the computer-readable instructions to implement the steps of the text recognition method for handwritten text in mid-air according to any one of claims 1 to 3.

6. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer-readable instructions which, when executed by a processor, implement the steps of the text recognition method for handwritten text in mid-air according to any one of claims 1 to 3.