A page reading method and device
By identifying and analyzing the layout and control relationships of application pages, and combining this with a category library for text-to-speech, the problem of users understanding page content in scenarios such as driving has been solved, and automatic text-to-speech for swiping pages has been achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- HUAWEI TECH CO LTD
- Filing Date
- 2024-12-27
- Publication Date
- 2026-06-30
AI Technical Summary
Existing technologies cannot effectively read pages aloud based on the relationships and independence between text in scenarios such as driving, making it difficult for users to understand the meaning of the content and making it impossible to read the content after scrolling.
By capturing screenshots of application pages, the system identifies text, punctuation marks, and location information. Based on a category library, it determines the layout and reading method, and combines this with the relationship between page controls to read the content aloud. It also supports page scrolling to access the complete content.
It enables reading aloud based on the relationships and independence between text, helping users understand the meaning of the content and reading the content of the scrolled page without manual operation.
Figure CN122308774A_ABST
Description
Technical Field
[0001] This application relates to the field of computer technology, and more particularly to a method and apparatus for page reading.
Background Technology
[0002] In situations such as driving, it is inconvenient for users to look at their phone screen or operate the phone by hand. In such cases, users who want to access information such as news, Weibo, or WeChat need to enable the text-to-speech function for the application page.
[0003] Currently, there are two ways to automatically read aloud application pages on a mobile phone: one is through audio files embedded in the application, and the other is through the operating system's screenshot function.
[0004] Automatic reading through audio files embedded in an application means that the user taps the reading icon of an audio file to start playback. For example, Figure 1(a) shows the interface of a weather application, which contains a reading icon and an embedded audio file that can be played. The user can tap the reading icon to have the content read aloud automatically. This method is only applicable to applications that embed audio files and is limited by the functionality of the application itself.
[0005] Using the phone's operating system for text-to-speech refers to using the operating system's screenshot function to capture an image of the relevant application page, performing text recognition on the captured image, and then reading the recognized text aloud in a top-to-bottom, left-to-right order. This method can be applied to any application. For example, Figure 1(b) shows a scenario where the phone's operating system's built-in text-to-speech function is used to read a weather forecast page. However, the text recognition results do not include the relationships and independence between the text in the weather forecast information, such as the independent relationship between "5:00 AM" and "6:00 AM," or the relationship between "5:00 AM" and "28℃," making it difficult for users to understand the meaning of the read-out content.
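The loss of pairing described in [0005] can be illustrated with a minimal Python sketch of the top-to-bottom, left-to-right ordering. The `TextBox` type, the pixel coordinates, the 10-pixel row band, and the "27℃" entry are illustrative assumptions and are not taken from this application; only "5:00 AM", "6:00 AM", and "28℃" come from the example above.

```python
from dataclasses import dataclass


@dataclass
class TextBox:
    text: str
    x: int  # left edge of the recognized text, in pixels (assumed)
    y: int  # top edge of the recognized text, in pixels (assumed)


def naive_reading_order(boxes, row_band=10):
    """Order recognized texts top-to-bottom, then left-to-right within a row."""
    # y is bucketed into bands of row_band pixels so that slight vertical
    # jitter inside one visual row does not change the order.
    ordered = sorted(boxes, key=lambda b: (b.y // row_band, b.x))
    return [b.text for b in ordered]


# Hourly forecast: times in one row, temperatures in the row below.
SAMPLE_BOXES = [
    TextBox("28℃", 0, 140),
    TextBox("5:00 AM", 0, 100),
    TextBox("27℃", 120, 140),  # invented value for illustration
    TextBox("6:00 AM", 120, 100),
]

print(naive_reading_order(SAMPLE_BOXES))
```

Both times are read before either temperature, so a listener cannot tell that "28℃" belongs to "5:00 AM".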
[0006] Therefore, reading application pages aloud by means of the screenshot function cannot take into account the relationships and independence between texts, making it difficult for users to understand the meaning of the content read aloud. Furthermore, because a screenshot captures only the content currently displayed on the application page, any remaining content that can only be revealed by scrolling cannot be read aloud, which prevents users from learning more of the page's information.
Summary of the Invention
[0007] This application provides a page reading method and apparatus that can read aloud based on the relationship and independence between text, making it easier for users to understand the meaning of the read content.
[0008] In a first aspect, this application provides a page reading method, comprising: obtaining a category library, where the category library includes layout methods, reading methods, and categories, texts in different layout methods have independent relationships, and texts included in the same layout method have related relationships; obtaining a first image of an application page, where the first image is a screenshot of the page content displayed by the application; recognizing the first image to obtain first page information, where the first page information includes one or more of text, punctuation marks, icons, and positions; determining the category of the application based on the first page information; determining a first layout method and a corresponding reading method based on the category library and the category of the application; decomposing the first page information according to the first layout method to obtain a first segment corresponding to the first layout method; and reading the first segment aloud according to the reading method corresponding to the first layout method.
[0009] In the page reading method provided in this application, the texts in different layouts have independent relationships, and the texts included in the same layout have related relationships. The method of breaking down page information according to the layout and reading the segments according to the reading method corresponding to the layout can read the texts according to the related and independent relationships between the texts, making it easier for users to understand the meaning of the read content.
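The decomposition and reading steps of the first aspect can be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the `Item` type, the column-based layout rule `decompose_by_column`, the reading method `read_joined`, and the shape of `CATEGORY_LIBRARY` are all hypothetical, the category is taken as already determined, and each column is treated as one occurrence of the layout so that texts inside a segment are related while separate segments are independent.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Item:
    text: str
    x: int  # horizontal position from recognition (assumed pixel units)
    y: int  # vertical position from recognition (assumed pixel units)


def decompose_by_column(items):
    """Hypothetical layout rule: texts sharing an x position form one segment."""
    columns = defaultdict(list)
    for item in items:
        columns[item.x].append(item)
    # Segments are ordered left to right; texts inside a segment top to bottom.
    return [sorted(col, key=lambda i: i.y) for _, col in sorted(columns.items())]


def read_joined(segment):
    """Hypothetical reading method: read the related texts as one utterance."""
    return ", ".join(item.text for item in segment)


# Hypothetical category library: category -> list of (layout, decompose, read).
CATEGORY_LIBRARY = {
    "weather": [("hourly forecast", decompose_by_column, read_joined)],
}


def read_page(items, category, library):
    """Decompose page information per layout and read each segment."""
    utterances = []
    for _layout, decompose, read in library[category]:
        for segment in decompose(items):
            utterances.append(read(segment))
    return utterances


PAGE = [
    Item("5:00 AM", 0, 100), Item("6:00 AM", 120, 100),
    Item("28℃", 0, 140), Item("27℃", 120, 140),  # "27℃" is invented
]
print(read_page(PAGE, "weather", CATEGORY_LIBRARY))
```

With this sample the utterances are "5:00 AM, 28℃" and "6:00 AM, 27℃", so each time is read together with its own temperature.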
[0010] As one possible implementation, before reading the first segment aloud according to the reading method corresponding to the first layout method, the method further includes: obtaining the page control relationships; and reading the first segment aloud according to the page control relationships. Reading the first segment aloud according to the page control relationships makes it easier for users to understand the location information of the content being read, further reflecting the independent relationship between the text being read.
[0011] As one possible implementation, the method further includes: acquiring a second image of the application page, where the second image is a screenshot of the page content after the operating system controls the application page to scroll; recognizing the second image to obtain second page information, where the second page information includes one or more of text, punctuation marks, icons, and positions; decomposing the second page information according to the first layout method to obtain a second segment corresponding to the first layout method; and reading the second segment aloud according to the reading method corresponding to the first layout method, or according to the page control relationships and the reading method corresponding to the first layout method. The operating system controls the application page to scroll so that the remaining or all of the page content is displayed, the scrolled page content is captured in screenshots, and the scrolled content is read aloud based on the captured images. Users can thus have the page scrolled and the scrolled content read aloud without any manual operation, which helps them learn more of the page content.
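One way to picture the capture, read, and scroll cycle is the Python sketch below. It is not the implementation of this application: the callables `capture`, `scroll`, `recognize`, and `speak`, the `FakePage` stand-in, the stop rule (the screenshot no longer changes), the skipping of texts already read, and the sample forecast strings are all assumptions made for illustration. Decomposition into segments is omitted to keep the loop visible.

```python
def read_with_scrolling(capture, scroll, recognize, speak, max_scrolls=20):
    """Read the visible page, scroll, and repeat until nothing new appears."""
    seen = set()
    previous = None
    for _ in range(max_scrolls + 1):
        page_info = recognize(capture())
        if page_info == previous:
            break  # assumed stop rule: the screenshot no longer changes
        for text in page_info:
            if text not in seen:  # skip overlap between consecutive screenshots
                seen.add(text)
                speak(text)
        previous = page_info
        scroll()


class FakePage:
    """Stand-in for an application page that the operating system can scroll."""

    def __init__(self, screens):
        self._screens = screens
        self._index = 0

    def capture(self):
        return self._screens[self._index]

    def scroll(self):
        # Scrolling past the last screen leaves the page where it is.
        self._index = min(self._index + 1, len(self._screens) - 1)


page = FakePage([
    ["Monday 15-22℃", "Tuesday 16-23℃"],
    ["Tuesday 16-23℃", "Wednesday 14-20℃"],
])
spoken = []
read_with_scrolling(page.capture, page.scroll, recognize=list, speak=spoken.append)
print(spoken)
```

Skipping texts that were already read also drops text that legitimately repeats further down the page, so a real system would need a more careful overlap check.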
[0012] As one possible implementation, determining the category of the application based on the first page information specifically includes: obtaining a second layout method from the first page information, and then using the category library to determine the category of the application based on the second layout method. The second layout method represents the layout of a subset of pages. Determining the category of the application from the layout of a subset of pages makes it possible to look up all layout methods included in the application in the category library. This improves the efficiency of determining layout methods compared with determining all layout methods included in the application directly from the first page information.
[0013] As one possible implementation, determining the category of the application based on the first page information specifically includes: extracting keywords from the first page information, and determining the category of the application based on the keywords. Determining the category of the application from keywords makes it possible to look up all layout methods included in the application in the category library. This improves the efficiency of determining layout methods compared with determining all layout methods directly from the first page information.
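The two routes for determining the category, by a recognized layout or by keywords, might look like the following Python sketch. The library contents, the keyword lists, and the rule of picking the category with the most keyword hits are invented for illustration; this application does not specify them.

```python
# Hypothetical category library: category -> names of its layout methods.
LIBRARY = {
    "weather": ["temperature range", "single temperature"],
    "microblog": ["hot topics"],
}

# Hypothetical keyword lists, written in lowercase.
KEYWORDS = {
    "weather": ["℃", "forecast"],
    "microblog": ["hot topics", "repost"],
}


def category_from_layout(observed_layout, library):
    """Return the first category whose records include the observed layout."""
    for category, layouts in library.items():
        if observed_layout in layouts:
            return category
    return None


def category_from_keywords(texts, keywords):
    """Return the category with the most keyword hits, or None if none match."""
    scores = {
        category: sum(any(word in text.lower() for text in texts) for word in words)
        for category, words in keywords.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


category = category_from_keywords(["5:00 AM", "28℃"], KEYWORDS)
print(category, LIBRARY[category])
print(category_from_layout("hot topics", LIBRARY))
```

Once the category is known, a single lookup in the library yields every layout method recorded for it, which is the efficiency gain the two paragraphs above describe.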
[0014] In a second aspect, this application provides a page reading method, comprising: acquiring a category library; acquiring a first image and a second image of an application page, where the first image is a screenshot of the page content displayed by the application and the second image is a screenshot of the page content after the operating system controls the application page to scroll; recognizing the first image and the second image to obtain page information, where the page information includes one or more of text, punctuation marks, icons, and positions; determining the category of the application based on the page information; determining a first layout method and a corresponding reading method based on the category library and the category of the application; decomposing the page information according to the first layout method to obtain segments corresponding to the first layout method; and reading the segments aloud according to the reading method corresponding to the first layout method.
[0015] In the page reading method provided in this application, the texts in different layouts have independent relationships, and the texts included in the same layout have related relationships. The method of breaking down page information according to the layout and reading the segments according to the reading method corresponding to the layout can read the texts according to the related and independent relationships between the texts, making it easier for users to understand the meaning of the read content.
[0016] As one possible implementation, before reading the segment aloud according to the reading method corresponding to the first layout method, the method further includes: obtaining the page control relationships; and reading the segment aloud according to the page control relationships. Reading the segment aloud according to the page control relationships makes it easier for users to understand the location information of the content being read, further reflecting the independent relationship between the text being read.
[0017] One possible implementation involves determining the application's category based on page information. Specifically, this includes obtaining a second layout method based on the page information and then using a category library to determine the application's category based on this second layout method. The second layout method represents the layout of a subset of pages. Determining the application's category based on the layout of a subset of pages allows for the identification of all layout methods included in the application using the category library. This improves the efficiency of determining the layout method compared to directly determining all layout methods based on page information.
[0018] As one possible implementation, determining the category of the application based on the page information specifically includes: extracting keywords from the page information, and determining the category of the application based on the keywords. Determining the category of the application from keywords makes it possible to look up all layout methods included in the application in the category library. This improves the efficiency of determining layout methods compared with determining all layout methods directly from the page information.
[0019] In a third aspect, this application provides a page reading device for executing the method in the first aspect or any possible implementation of the first aspect. Specifically, the device includes modules for executing the method in the first aspect or any possible implementation of the first aspect.
[0020] In a fourth aspect, this application provides a page reading device for executing the method in the second aspect or any possible implementation of the second aspect. Specifically, the device includes modules for executing the method in the second aspect or any possible implementation of the second aspect.
[0021] In a fifth aspect, this application provides a reading platform, including the device in the third aspect or any possible implementation of the third aspect, or including the device in the fourth aspect or any possible implementation of the fourth aspect.
[0022] In a sixth aspect, this application provides a computing device, comprising: at least one memory for storing a program; and at least one processor for executing the program stored in the memory; wherein, when the program stored in the memory is executed, the processor is configured to execute the method of the first aspect or a possible implementation thereof, or to execute the method of the second aspect or a possible implementation thereof.
[0023] In a seventh aspect, this application provides a computer storage medium having a computer program stored thereon, which, when run on a processor, causes the processor to perform the method of the first aspect or a possible implementation thereof, or to perform the method of the second aspect or a possible implementation thereof.
[0024] In an eighth aspect, this application provides a computer program product that, when run on a processor, causes the processor to perform the method of the first aspect or a possible implementation of the first aspect, or to perform the method of the second aspect or a possible implementation of the second aspect.
Attached Figure Description
[0025] Figure 1(a) shows a scenario of application page reading in the prior art;
[0026] Figure 1(b) shows another application page reading scenario in the prior art;
[0027] Figure 2 is an example diagram of the electronic device provided in this application;
[0028] Figure 3 is an example diagram of a system architecture provided in this application;
[0029] Figure 4 is a flowchart of a page reading method provided in this application;
[0030] Figure 5 is an example diagram of a page reading scenario provided in this application;
[0031] Figure 6 is a flowchart of another page reading method provided in this application;
[0032] Figure 7 is a flowchart of another page reading method provided in this application;
[0033] Figure 8 is a module diagram of a page reading device provided in this application;
[0034] Figure 9 is a module diagram of another page reading device provided in this application;
[0035] Figure 10 is an example diagram of the computing device provided in this application.
Detailed Implementation
[0036] In this document, the term "and/or" describes the relationship between related objects and indicates that three relationships can exist. For example, A and/or B can represent: A existing alone, A and B existing simultaneously, or B existing alone. The symbol "/" indicates that the related objects are in an "or" relationship; for example, A/B means A or B.
[0037] The terms "first" and "second," etc., used in the specification and claims herein are used to distinguish different objects, not to describe a specific order of objects. For example, "first response message" and "second response message," etc., are used to distinguish different response messages, not to describe a specific order of response messages.
[0038] In the embodiments of this application, the terms "exemplary" or "for example" are used to indicate that something is an example, illustration, or description. Any embodiment or design that is described as "exemplary" or "for example" in the embodiments of this application should not be construed as being more preferred or advantageous than other embodiments or designs. Rather, the use of the terms "exemplary" or "for example" is intended to present the relevant concepts in a specific manner.
[0039] In the description of the embodiments of this application, unless otherwise stated, "multiple" means two or more, for example, multiple processing units means two or more processing units, multiple elements means two or more elements, etc.
[0040] To improve the convenience of users operating electronic devices, embodiments of this application provide an operating method for an electronic device and an electronic device itself. The electronic device involved in the embodiments of this application is described below.
[0041] Figure 2 shows a structural example diagram of the electronic device 100.
[0042] Electronic device 100 may include processor 110, external memory interface 120, internal memory 121, universal serial bus (USB) interface 130, charging management module 140, power management module 141, battery 142, antenna 1, antenna 2, mobile communication module 150, wireless communication module 160, audio module 180, speaker 180A, receiver 180B, microphone 180C, headphone jack 180D, sensor module 180, button 190, motor 191, indicator 192, camera 193, display screen 194, and subscriber identification module (SIM) card interface 195, etc. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, a barometric pressure sensor 180C, a magnetic sensor 180D, an accelerometer sensor 180E, a distance sensor 180F, a proximity sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, etc.
[0043] It is understood that the structures illustrated in the embodiments of this application do not constitute a specific limitation on the electronic device 100. In other embodiments of this application, the electronic device 100 may include more or fewer components than illustrated, or combine some components, or split some components, or have different component arrangements. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
[0044] Processor 110 may include one or more processing units, such as: application processor (AP), modem processor, graphics processing unit (GPU), image signal processor (ISP), controller, memory, video codec, digital signal processor (DSP), baseband processor, and / or neural network processing unit (NPU), etc. Different processing units may be independent devices or integrated into one or more processors.
[0045] The controller can be the nerve center and command center of the electronic device 100. The controller can generate operation control signals according to the instruction opcode and timing signals to complete the control of fetching and executing instructions.
[0046] The processor 110 may also include a memory for storing instructions and data.
[0047] The charging management module 140 is used to receive charging input from the charger.
[0048] The power management module 141 is used to connect the battery 142, the charging management module 140, and the processor 110.
[0049] The wireless communication function of electronic device 100 can be implemented through antenna 1, antenna 2, mobile communication module 150, wireless communication module 160, modem processor, and baseband processor.
[0050] Antenna 1 and antenna 2 are used to transmit and receive electromagnetic wave signals. Each antenna in electronic device 100 can be used to cover one or more communication frequency bands. Different antennas can also be reused to improve antenna utilization.
[0051] The mobile communication module 150 can provide solutions for wireless communication, including 2G / 3G / 4G / 5G, applied to the electronic device 100. The mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (LNA), etc.
[0052] The wireless communication module 160 can provide a wireless communication solution for use in the electronic device 100.
[0053] Electronic device 100 implements display functions through a GPU, a display screen 194, and an application processor. The GPU is a microprocessor for image processing, connected to the display screen 194 and the application processor. The GPU is used to perform mathematical and geometric calculations and for graphics rendering. Processor 110 may include one or more GPUs, which execute program instructions to generate or modify display information.
[0054] Display screen 194 is used to display images, videos, etc. Display screen 194 includes a display panel. The display panel may be a liquid crystal display (LCD).
[0055] Electronic device 100 can perform shooting functions through ISP, camera 193, video codec, GPU, display 194 and application processor.
[0056] Camera 193 is used to capture still images or videos.
[0057] Video codecs are used to compress or decompress digital video. Electronic device 100 may support one or more video codecs. Thus, electronic device 100 can play or record videos in various encoding formats, such as Moving Picture Experts Group (MPEG) 1, MPEG2, MPEG3, MPEG4, etc.
[0058] An NPU (Neural Processing Unit) is a computational processor for neural networks (NNs). By borrowing the structure of biological neural networks, such as the transmission methods between neurons in the human brain, it can rapidly process input information and continuously learn on its own. NPUs can enable intelligent cognitive applications in electronic devices, such as image recognition, facial recognition, speech recognition, and text understanding.
[0059] The external storage interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 100. The external memory card communicates with the processor 110 through the external storage interface 120 to perform data storage functions. For example, music, video, and other files can be saved on the external memory card.
[0060] Internal memory 121 can be used to store executable program code, which includes instructions. Processor 110 executes various functional applications and data processing of electronic device 100 by running the instructions stored in internal memory 121. Internal memory 121 may include a program storage area and a data storage area.
[0061] Electronic device 100 can implement audio functions such as music playback and recording through an audio module 180, a speaker 180A, a receiver 180B, a microphone 180C, a headphone jack 180D, and an application processor.
[0062] The audio module 180 is used to convert digital audio information into analog audio signals for output, and also to convert analog audio input into digital audio signals. The audio module 180 can also be used for encoding and decoding audio signals. In some embodiments, the audio module 180 may be located in the processor 110, or some functional modules of the audio module 180 may be located in the processor 110.
[0063] The speaker 180A, also known as a "loudspeaker," is used to convert audio electrical signals into sound signals. Through the speaker 180A, the electronic device 100 can play music or handle hands-free calls.
[0064] The receiver 180B, also known as the "earpiece," is used to convert audio electrical signals into sound signals. When the electronic device 100 answers a telephone call or voice message, the receiver 180B can be brought close to the ear to listen to the voice.
[0065] The microphone 180C, also known as a "mic" or "voice transducer," is used to convert sound signals into electrical signals. When making a phone call or sending a voice message, the user can speak with their mouth close to the microphone 180C so that the sound signal is input into the microphone 180C. The electronic device 100 may be equipped with at least one microphone 180C.
[0066] The headphone jack 180D is used to connect wired headphones.
[0067] The pressure sensor 180A is used to sense pressure signals and can convert pressure signals into electrical signals.
[0068] The gyroscope sensor 180B can be used to determine the motion attitude of the electronic device 100.
[0069] The barometric pressure sensor 180C is used to measure barometric pressure.
[0070] The magnetic sensor 180D includes a Hall sensor. The electronic device 100 can use the magnetic sensor 180D to detect the opening and closing of the flip cover.
[0071] The accelerometer sensor 180E can detect the magnitude of acceleration of the electronic device 100 in various directions (typically along three axes). When the electronic device 100 is stationary, it can detect the magnitude and direction of gravity. It can also be used to identify the posture of the electronic device, for applications such as screen orientation switching and pedometers.
[0072] The distance sensor 180F is used to measure distance. The electronic device 100 can measure distance via infrared or laser. In some embodiments, in a shooting scenario, the electronic device 100 can use the distance sensor 180F to measure distance for rapid focusing.
[0073] The proximity sensor 180G may include, for example, a light-emitting diode (LED) and a light detector, such as a photodiode. The LED may be an infrared LED.
[0074] The ambient light sensor 180L is used to detect ambient light intensity.
[0075] The fingerprint sensor 180H is used to collect fingerprints. The electronic device 100 can utilize the characteristics of the collected fingerprints to achieve fingerprint unlocking, accessing application locks, taking photos with fingerprints, answering calls with fingerprints, etc.
[0076] Temperature sensor 180J is used to detect temperature. In some embodiments, electronic device 100 uses the temperature detected by temperature sensor 180J to execute a temperature processing strategy.
[0077] The touch sensor 180K is also known as a "touch panel". The touch sensor 180K can be arranged on the display screen 194, and the touch sensor 180K and the display screen 194 together form a touchscreen.
[0078] The bone conduction sensor 180M can acquire vibration signals.
[0079] Buttons 190 include a power button, volume buttons, etc. Buttons 190 can be mechanical buttons or touch-sensitive buttons. Electronic device 100 can receive button input and generate key signal inputs related to user settings and function control of electronic device 100.
[0080] Motor 191 can generate vibration alerts. Motor 191 can be used for incoming call vibration alerts or for touch vibration feedback.
[0081] Indicator 192 can be an indicator light, used to indicate charging status, power changes, or to indicate messages, missed calls, notifications, etc.
[0082] The SIM card interface 195 is used to connect a SIM card. The SIM card can be inserted into or removed from the SIM card interface 195 to make contact with or separate from the electronic device 100.
[0083] It is understood that the electronic devices in the embodiments of this application may also be referred to as terminals, user equipment (UE), mobile stations (MS), mobile terminals (MT), etc. Electronic devices may include mobile phones, tablets, laptops, televisions, ultra-mobile personal computers (UMPCs), netbooks, personal digital assistants (PDAs), smart screens, etc. The embodiments of this application do not impose any restrictions on the specific type of electronic device.
[0084] Figure 3 is an example diagram of a system architecture 300 applicable to embodiments of this application. The system includes an electronic device 100 and a server 200. The electronic device 100 can communicate with the server 200, which may also be referred to as the server side. The electronic device 100 includes an application, a processing module, and a reading platform. The server 200 stores a category library. The application is an application installed on the electronic device 100. The processing module and the reading platform are both embedded in the operating system of the electronic device 100 and are functional modules of the operating system. The processing module receives reading requests from the application, and receives screenshot requests, page scrolling requests, and page control relationship requests from the reading platform. It takes screenshots and obtains page control relationships, sends page scrolling instructions to the application, and sends screenshot results and page control relationships to the reading platform. The reading platform sends category library call requests to the server and receives the category library returned by the server. It also sends screenshot requests, page scrolling requests, and page control relationship requests to the processing module, and receives the screenshot images and page control relationships sent by the processing module. It recognizes the images to obtain page information and performs reading either based on the page information, the category library, and the page control relationships, or based on the page information and the category library, among other operations.
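The division of roles in this architecture can be pictured with a small Python sketch that records which party sends what to whom. The class and method names are hypothetical, the application page is reduced to a list of strings, and the order of the screenshot and scroll messages after the category library is received is an assumption of the sketch rather than something stated in this paragraph.

```python
class Server:
    """Stores the category library (server 200)."""

    def __init__(self, category_library, log):
        self._library, self._log = category_library, log

    def get_category_library(self):
        self._log.append("server -> reading platform: category library")
        return self._library


class ProcessingModule:
    """Operating-system module that takes screenshots and scrolls the page."""

    def __init__(self, app_screens, log):
        self._screens, self._index, self._log = app_screens, 0, log
        self.platform = None  # set once the reading platform exists

    def on_reading_request(self):
        self._log.append("processing module -> reading platform: start command")
        return self.platform.start()

    def screenshot(self):
        self._log.append("processing module -> reading platform: screenshot")
        return self._screens[self._index]

    def scroll(self):
        self._log.append("processing module -> application: scroll instruction")
        self._index = min(self._index + 1, len(self._screens) - 1)


class ReadingPlatform:
    """Operating-system module that fetches the library and requests screenshots."""

    def __init__(self, processing, server, log):
        self._processing, self._server, self._log = processing, server, log

    def start(self):
        self._log.append("reading platform -> server: category library request")
        library = self._server.get_category_library()
        first = self._processing.screenshot()
        self._processing.scroll()
        second = self._processing.screenshot()
        return library, [first, second]


log = []
server = Server({"weather": ["temperature range"]}, log)
processing = ProcessingModule(["screen 1", "screen 2"], log)
processing.platform = ReadingPlatform(processing, server, log)
library, screens = processing.on_reading_request()
print("\n".join(log))
```

Recognition and speech output are left out; the sketch only shows that the reading platform never touches the application directly and always goes through the processing module.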
[0085] The technical solutions provided in the embodiments of this application will be described in detail below.
[0086] Referring to Figure 4, this application provides a page reading method, including the following steps:
[0087] S401. During the use of the application, the user sends a reading request to the processing module.
[0088] While using the application, users can initiate a reading request by speaking or pressing the volume button three times consecutively, or by pulling down the control center and clicking the screen reading icon in the control center. This will initiate a request to have the page of the application being used read aloud and send the reading request to the processing module.
[0089] For example, as shown in Figure 5, when using a weather application, the user can trigger a reading request on the electronic device by pulling down the control center and tapping the screen reading icon in the control center, thereby enabling reading of the current page and of the page after scrolling in the microblogging or weather application.
[0090] The triggering operation for the reading request in this application embodiment is merely an example and is not intended to limit this application. Any method that triggers an electronic device to simply call operating system functions to read aloud the application's page falls within the technical scope of this application. The microblogging or weather applications listed in this application embodiment are merely examples and are not intended to limit this application. The page reading method provided in this application embodiment can be applied to any application installed on an electronic device.
[0091] The page reading method provided in this application is implemented by the operating system of the electronic device and is independent of the function of the application itself, which differs from the reading function built into the application in the prior art. It should be noted that this application embodiment does not require any processing from the application and is applicable not only to applications with embedded reading functions but also to applications without embedded reading functions.
[0092] S402. The processing module sends a reading start command to the reading platform.
[0093] The processing module can receive various task requests sent to the electronic device. When the processing module analyzes a received request and determines that it is a reading request, it generates a reading start command and sends the command to the reading platform. The reading start command instructs the reading platform to start the process of reading the application page aloud.
[0094] S403. The reading platform sends a category library call request to the server.
[0095] The category library stores the reading methods for various application categories. After a reading process is completed, the reading platform keeps the category library for a preset period. When it receives the next reading request, the reading platform first checks whether it still has the category library stored. If it does, steps S403-S404 are unnecessary. If it does not, steps S403-S404 are executed to retrieve the category library from the server. In other words, steps S403-S404 are optional. For the same electronic device, the preset period can be fixed or dynamically adjusted based on user behavior. For different electronic devices, the preset period can be the same or different.
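The check described above can be sketched as a small time-limited cache. This is only an illustration under assumptions: the class name, the `fetch_from_server` callable, and the choice to start the preset period when the library is stored are hypothetical and not taken from this application.

```python
import time
from typing import Callable, List, Optional


class CategoryLibraryCache:
    """Keeps the category library on the reading platform for a preset period."""

    def __init__(self, fetch_from_server: Callable[[], List[dict]],
                 preset_period_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch_from_server      # stands in for steps S403-S404
        self._period = preset_period_s       # fixed here; could be adjusted per device
        self._clock = clock
        self._library: Optional[List[dict]] = None
        self._stored_at = 0.0

    def get(self) -> List[dict]:
        now = self._clock()
        missing = self._library is None
        expired = not missing and now - self._stored_at >= self._period
        if missing or expired:
            self._library = self._fetch()    # S403: request, S404: response
            self._stored_at = now
        return self._library


# Usage with a fake server and a controllable clock (both hypothetical).
server_calls = []
fake_time = [0.0]


def fake_server() -> List[dict]:
    server_calls.append("S403")
    return [{"layout": "temperature range", "category": "weather"}]


cache = CategoryLibraryCache(fake_server, preset_period_s=600.0,
                             clock=lambda: fake_time[0])
cache.get()            # first request: fetched from the server
cache.get()            # second request: served from the stored copy
calls_within_period = len(server_calls)
fake_time[0] = 601.0   # the preset period has elapsed
cache.get()            # fetched again
calls_after_period = len(server_calls)
```

Injecting the clock keeps the sketch testable; a dynamically adjusted period would only require changing `preset_period_s` between requests.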
[0096] S404. The server sends the category library to the reading platform.
[0097] Each category of application includes both unique page layouts and general layouts. The texts within one layout are related to one another, while the texts in different layouts are independent of each other. The server identifies one or more layouts included in each application page. Each layout includes one or more of the following information: text, punctuation, icons, and positions. The server sets the reading method for each layout based on the relationships between the text, punctuation, icons, and positions within it, and generates a category library based on the layout, reading method, and category.
[0098] Taking the server's construction of a category library for microblogging applications as an example, if the server recognizes a layout related to "hot topics" or "icons" on the page, it considers the content corresponding to that layout to be microblogging hot topics. The reading method is set to pause for 1 second between every two hot topics and, after all the hot topics have been read, to end with "These contents are current hot topics." For weather applications, if the server recognizes a layout related to "temperature range" on the page, the corresponding reading method starts with "lowest temperature" or "highest temperature" and then reads the corresponding temperature value. For example, a temperature range of 15-22℃ is read as "lowest temperature 15℃, highest temperature 22℃." If the server recognizes a "single temperature" on the weather application page but no corresponding time, the corresponding reading method starts with "current temperature" and then reads the temperature value. If the page contains both a "single temperature" and a corresponding time, the corresponding reading method starts with that time and then reads the temperature value. It should be noted that an application may include multiple pages, and a page may include multiple layouts. Therefore, an application of a certain category may have multiple records in the category library.
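The reading methods in these examples can be sketched as small text-building functions. The function names, the regular expression, and the representation of a pause as a number in a "speech plan" are assumptions made for illustration; the application does not specify how a reading method is encoded.

```python
import re
from typing import List, Optional, Union


def read_temperature_range(text: str) -> str:
    """Turn a range such as '15-22℃' into its spoken form."""
    match = re.fullmatch(r"\s*(-?\d+)\s*-\s*(-?\d+)\s*(℃)\s*", text)
    if match is None:
        raise ValueError(f"not a temperature range: {text!r}")
    low, high, unit = match.groups()
    return f"lowest temperature {low}{unit}, highest temperature {high}{unit}"


def read_single_temperature(value: str, time_label: Optional[str] = None) -> str:
    """Start with the time if one was recognized, else with 'current temperature'."""
    prefix = time_label if time_label else "current temperature"
    return f"{prefix} {value}"


def read_hot_topics(topics: List[str],
                    pause_s: float = 1.0) -> List[Union[str, float]]:
    """Return a speech plan: strings are spoken, numbers are pauses in seconds."""
    plan: List[Union[str, float]] = []
    for index, topic in enumerate(topics):
        if index > 0:
            plan.append(pause_s)          # 1-second pause between two topics
        plan.append(topic)
    plan.append("These contents are current hot topics.")
    return plan


range_speech = read_temperature_range("15-22℃")
now_speech = read_single_temperature("18℃")
timed_speech = read_single_temperature("18℃", "14:00")
topic_plan = read_hot_topics(["Topic A", "Topic B"])
```

A speech engine would consume the plan in order, speaking each string and sleeping for each number.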
[0099] As an example, Table 1 is a category library built for microblogging and weather apps.
[0100] Table 1
[0101]
[0102] Table 1 includes three fields: layout method, reading method, and category. Texts in different layout methods are independent of each other, while texts within the same layout method are related. The reading method is set for each layout method based on the relationships between the text, punctuation, icons, and positions within that layout method. Since microblogging and weather apps include both their unique layout methods and general layout methods, Table 1 includes a general category in addition to the weather and microblogging records. When a microblogging or weather app is read aloud based on Table 1, the general category is called in addition to the microblogging or weather category. For example, for a microblogging app, if the page includes "hot-icon" and "space," the records of both the microblogging and general categories are called, and the page is read aloud according to their respective reading methods.
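As a rough illustration of the three-field structure, the category library can be modeled as a list of records and queried for one category plus the general category. The record contents below are hypothetical stand-ins, since the body of Table 1 is not reproduced here; only the field names and the layout names mentioned in the text are taken from the description.

```python
from typing import List, NamedTuple


class LibraryRecord(NamedTuple):
    layout_method: str    # field 1: layout method
    reading_method: str   # field 2: reading method (free-text description here)
    category: str         # field 3: "microblogging", "weather", or "general"


# Hypothetical records; the reading-method wording is illustrative only.
CATEGORY_LIBRARY = [
    LibraryRecord("hot-icon", "pause 1 s between topics, then a closing sentence",
                  "microblogging"),
    LibraryRecord("temperature range", "say lowest, then highest temperature",
                  "weather"),
    LibraryRecord("single temperature", "say time or 'current temperature', then value",
                  "weather"),
    LibraryRecord("space", "placeholder reading method for a general layout",
                  "general"),
]


def records_for(category: str, library: List[LibraryRecord]) -> List[LibraryRecord]:
    """Return the records of the given category together with the general ones."""
    return [record for record in library
            if record.category in (category, "general")]


microblog_layouts = [r.layout_method
                     for r in records_for("microblogging", CATEGORY_LIBRARY)]
weather_layouts = [r.layout_method
                   for r in records_for("weather", CATEGORY_LIBRARY)]
```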
[0103] The constructed category library is stored on the server. When a category library call request is received from the reading platform, the category library is sent to the reading platform. It should be noted that the category library in Table 1 is only an example and is not intended to limit this application. This application can be applied to any application installed on an electronic device, and the number of unique layout methods included in each application is not limited to the number mentioned above. The fields in the category library are not limited to layout method, reading method, and category. Those skilled in the art can add or modify fields according to actual needs.
[0104] S405, The reading platform sends a screenshot request to the processing module.
[0105] The reading platform sends a screenshot request to the processing module, requesting the processing module to take a screenshot of the application's page.
[0106] S406. The processing module calls the screenshot function to obtain the first image of the application page.
[0107] By calling the operating system's screenshot function, the first image of the application page is obtained. The first image is a screenshot of the page content displayed by the application when the user initiates a text-to-speech request.
[0108] S407, The processing module sends the first image to the reading platform.
[0109] After obtaining the first image, the processing module responds to the screenshot request by sending the captured first image to the reading platform.
[0110] S408. The reading platform recognizes the first image and obtains the information of the first page.
[0111] The reading platform recognizes the first image and obtains information from the first page, including one or more of the following: text, punctuation marks, icons, and location.
[0112] S409. Based on the information on the first page, determine the category of the application; through the category library, determine the first layout method and the corresponding reading method according to the category.
[0113] In one implementation, the reading platform obtains a second layout method based on the information from the first page, and then determines the application's category from this second layout method by using the category library. The second layout method is the layout method of a part of the page, which can be the top, bottom, left, right, or middle area of the page. For example, the part of the page can be the top area containing icons, corresponding to the second layout method "Hot-Icons." As another example, the part of the page can be the middle area of the first image containing temperature information, corresponding to the second layout method "Temperature Range." After recognizing the second layout method, the reading platform determines the application's category by calling the category library. For example, when the second layout method is "Hot-Icons," the application's category is determined to be "Weibo"; when the second layout method is "Temperature Range," the application's category is determined to be "Weather."
[0114] As one implementation method, the reading platform extracts keywords from the information on the first page and determines the application category based on these keywords. For example, if the first page information includes the keyword "Weibo trending topics," the application can be identified as a Weibo-related application based on this keyword.
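The two ways of determining the category can be sketched as two lookups. The description presents them as alternative implementations; trying the layout-based lookup first and falling back to keywords is an assumption of this sketch, as are the mapping tables and function name.

```python
from typing import Iterable, Optional

# Hypothetical mappings derived from the examples in the text.
LAYOUT_TO_CATEGORY = {
    "Hot-Icons": "Weibo",
    "Temperature Range": "Weather",
}
KEYWORD_TO_CATEGORY = {
    "Weibo trending topics": "Weibo",
}


def determine_category(second_layouts: Iterable[str],
                       page_text: str) -> Optional[str]:
    """Return the application category, or None if nothing matches."""
    # Implementation 1: a recognized partial-page layout identifies the category.
    for layout in second_layouts:
        category = LAYOUT_TO_CATEGORY.get(layout)
        if category is not None:
            return category
    # Implementation 2: a keyword found in the recognized text identifies it.
    for keyword, category in KEYWORD_TO_CATEGORY.items():
        if keyword in page_text:
            return category
    return None


by_layout = determine_category(["Temperature Range"], "")
by_keyword = determine_category([], "Weibo trending topics: today")
unknown = determine_category([], "nothing recognizable")
```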
[0115] After determining the application category, the reading platform calls the category library to determine the first layout method and its corresponding reading method based on the category. The first layout method includes all unique and general layout methods corresponding to that application category, covering the entire area of the page corresponding to the first image. There can be one or more first layout methods, and each first layout method corresponds to one reading method. The first layout methods can include one or more unique layout methods as well as one or more general layout methods. The second layout method is at least one of the unique layout methods included in the first layout methods.
[0116] S410. The reading platform breaks down the information on the first page according to the first layout method to obtain the first segment corresponding to the first layout method.
[0117] Decomposing the information on the first page means breaking it down into segments corresponding to the first layout method. The smallest unit of decomposition is the layout method, and each segment corresponds to one layout method. A segment includes one or more of the following: text, punctuation, icons, and positions. There can be one or more segments, and one layout method can correspond to one or more segments. Since one layout method corresponds to one reading method, the reading method for each segment is determined by the layout method corresponding to that segment.
[0118] It should be noted that there can be multiple first layout methods, and only some of these first layout methods may have corresponding first segments. As an example, the first layout methods include "temperature range", "single temperature", and "with spaces". The resulting first segments include those corresponding to "temperature range" and "with spaces", but not those corresponding to "single temperature".
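A minimal sketch of this decomposition follows, assuming each recognized item is a text string with a position and each layout method can be detected by a regular expression. The patterns and type names are hypothetical; an actual implementation would also use icons and positions, which the description lists as part of a segment.

```python
import re
from typing import Dict, List, NamedTuple


class Item(NamedTuple):
    text: str
    x: int   # left edge in pixels
    y: int   # top edge in pixels


# Hypothetical recognizers, one per first layout method.
LAYOUT_PATTERNS = {
    "temperature range": re.compile(r"^-?\d+\s*-\s*-?\d+℃$"),
    "single temperature": re.compile(r"^-?\d+℃$"),
    "with spaces": re.compile(r"\S\s+\S"),
}


def decompose(items: List[Item], first_layouts: List[str]) -> Dict[str, List[Item]]:
    """Assign each item to the first layout method whose pattern matches it."""
    segments: Dict[str, List[Item]] = {layout: [] for layout in first_layouts}
    for item in items:
        for layout in first_layouts:
            if LAYOUT_PATTERNS[layout].search(item.text):
                segments[layout].append(item)
                break                      # one segment belongs to one layout
    return segments


page_items = [Item("15-22℃", 40, 300), Item("Sunny  light wind", 40, 360)]
first_segments = decompose(
    page_items, ["temperature range", "single temperature", "with spaces"])
```

As in the example above, "single temperature" is a first layout method of the category but ends up with no first segment on this page.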
[0119] For example, as shown in Figure 5, after determining that the current application is a weather application, the reading platform identifies multiple first layout methods for weather applications through the category library. After decomposing the information on the first page according to the first layout methods, it obtains multiple first segments, shown as boxes, corresponding to the first layout methods. Each first segment can include text, punctuation marks, icons, and positions.
[0120] S411, The reading platform sends a page control relationship request to the processing module.
[0121] Page control relationships refer to the parent-child nesting relationships and parallel relationships between page controls. The reading platform sends page control relationship requests to the processing module to obtain the corresponding page control relationships and uses the page control relationships for reading, which helps to read according to the association and independence relationships between text.
[0122] S412, The processing module obtains the page control relationships of the application.
[0123] The processing module analyzes the page controls of the application to obtain the relationships between them. The method used to obtain the relationships between the page controls is existing technology and will not be described in detail here.
[0124] S413, The processing module sends the page control relationships to the reading platform.
[0125] After obtaining the page control relationships, the processing module sends them to the reading platform.
[0126] S414. The reading platform reads the first segment aloud according to the page control relationship and the reading method corresponding to the first layout method, or reads the first segment aloud according to the reading method corresponding to the first layout method.
[0127] It should be noted that steps S411-S413 related to obtaining page control relationships are optional. If steps S411-S413 are not executed, the reading platform in step S414 will read the first segment according to the reading method corresponding to the first layout method. If steps S411-S413 are executed, the reading platform in step S414 will read the first segment according to the page control relationships and the reading method corresponding to the first layout method.
[0128] Page control relationships refer to the parent-child nesting and parallel relationships between page controls. When reading the first segment aloud based on the page control relationships and the reading method corresponding to the first layout method, the reading platform determines the main page to be read based on the page control relationships and informs the user of the current page position through the reading method. For example, a microblogging application may have multiple parallel parent pages such as Home, Videos, Discover, Messages, and "Me". Suppose the user is using a microblogging application and the parent page in the page control relationships is "Discover". When the reading platform starts reading the first page, it informs the user that "you are currently on the 'Discover' channel page of the microblogging application, and the content of this channel page will be read below". After reading the relevant information on the parent page, the reading platform reads the first segments of each control in turn, from top to bottom and from left to right. Specifically, for a control, the position of the control is read first, and then the one or more first segments within the control are read in turn, from top to bottom and from left to right, according to the reading method corresponding to the first layout method. A pause of a preset duration can be inserted between different first segments; the duration can be 1 second, 2 seconds, or another value, and is not specifically limited.
[0129] When the first segment is read aloud according to the reading method corresponding to the first layout, the reading platform reads one or more of the first segments in sequence from top to bottom and from left to right, according to the reading method corresponding to the first layout.
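The reading order described here can be sketched by sorting on position. The `Control` and `Segment` types, the `position_label` field, and the strict (y, x) sort are assumptions for illustration; real layouts would likely need a tolerance for items on roughly the same row, and the pauses between segments are omitted.

```python
from typing import List, NamedTuple, Optional


class Segment(NamedTuple):
    text: str   # text already produced by the layout's reading method
    x: int      # left edge in pixels
    y: int      # top edge in pixels


class Control(NamedTuple):
    position_label: str       # hypothetical spoken position, e.g. "top area"
    x: int
    y: int
    segments: List[Segment]


def top_left_key(item) -> tuple:
    """Sort key: top to bottom first, then left to right."""
    return (item.y, item.x)


def build_utterances(app_name: str, parent_page: Optional[str],
                     controls: List[Control]) -> List[str]:
    utterances = []
    if parent_page is not None:          # page control relationships available
        utterances.append(
            f"You are currently on the '{parent_page}' channel page of the "
            f"{app_name}, and the content of this channel page will be read below."
        )
    for control in sorted(controls, key=top_left_key):
        utterances.append(control.position_label)          # position first
        for segment in sorted(control.segments, key=top_left_key):
            utterances.append(segment.text)                # then its segments
    return utterances


controls = [
    Control("bottom area", 0, 800, [Segment("Topic C", 0, 820)]),
    Control("top area", 0, 100, [Segment("Topic B", 300, 120),
                                 Segment("Topic A", 0, 120)]),
]
spoken = build_utterances("microblogging application", "Discover", controls)
```

Passing `parent_page=None` and a single control containing all segments corresponds to the case where steps S411-S413 are not executed.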
[0130] Because texts in different layouts are independent of each other and texts in the same layout are related, the reading method corresponding to each layout is set according to the relationships between the text, punctuation, icons, and positions in that layout. Therefore, after the page information is broken down into segments according to the layout, each segment is read aloud according to the reading method corresponding to its layout. This allows reading based on the relatedness and independence of the texts, making it easier for users to understand the meaning of the content being read.
[0131] S415. The reading platform sends a page swiping request to the processing module.
[0132] If the complete page content of the application can only be displayed by swiping, the reading platform, after obtaining the first image of the page, sends a page swiping request in order to obtain a second image of the page. The page content corresponding to the second image is not exactly the same as the page content corresponding to the first image.
[0133] S416, The processing module sends a page swiping instruction to the application.
[0134] After receiving the page swiping request from the reading platform, the processing module sends a page swiping instruction to the application to instruct the application to swipe the page.
[0135] S417. In response to the page swiping instruction, the application swipes the page.
[0136] The content displayed after swiping may overlap with the content corresponding to the first image, or it may be completely different. If the content displayed after swiping is completely different from the content corresponding to the first image, the content displayed after swiping is continuous with the content corresponding to the first image.
[0137] If the complete page content has already been displayed by swiping during the reading process, or if the page content corresponding to the first image is already complete so that no swiping is needed, the application either does not respond to the page swiping instruction or responds by sending the processing module information indicating that the page content has been fully displayed. The processing module forwards this information, or feedback that the application did not respond, to the reading platform. The reading platform then stops sending page swiping requests and performs the reading operation based on the images already acquired.
[0138] It should be noted that swiping in this application includes not only swiping down, but also swiping up, left, and right.
[0139] S418, The reading platform sends a screenshot request to the processing module.
[0140] S419. The processing module calls the screenshot function to obtain the second image of the application page.
[0141] S420: The processing module sends the second image to the reading platform.
[0142] The processing method for steps S418-S420 is the same as that for steps S405-S407, and will not be described in detail here.
[0143] S421. The reading platform recognizes the second image and obtains the information of the second page.
[0144] The second image is a screenshot of the application page taken after the page has been swiped in response to the page swiping instruction. The reading platform recognizes the second image in the same way as the first image, and the information obtained from the second page includes one or more of the following: text, punctuation marks, icons, and location.
[0145] Since the page content corresponding to the second image may overlap with the page content corresponding to the first image, the reading platform can crop the second image before recognizing it to remove the areas that overlap between the second and first images, keeping only the areas that do not overlap with the first image in the second image.
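One simple way to find the overlapping area is to look for the longest run of rows that ends the first image and begins the second. The sketch below treats an image as a list of comparable rows and assumes a vertical swipe with exactly repeated rows; fixed title bars, animations, or compression noise would need extra handling that is not shown. The function name is hypothetical.

```python
from typing import List, Sequence


def crop_overlap(first_rows: Sequence, second_rows: Sequence) -> List:
    """Drop the leading rows of the second image that repeat the end of the first."""
    longest = min(len(first_rows), len(second_rows))
    for k in range(longest, 0, -1):                 # try the largest overlap first
        if list(first_rows[-k:]) == list(second_rows[:k]):
            return list(second_rows[k:])            # keep only the new rows
    return list(second_rows)                        # no overlap found


# Rows are shown as labels; in practice they could be pixel rows or row hashes.
partly_new = crop_overlap(["a", "b", "c", "d"], ["c", "d", "e", "f"])
all_new = crop_overlap(["a", "b"], ["c", "d"])
nothing_new = crop_overlap(["a", "b"], ["a", "b"])
```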
[0146] S422. The reading platform breaks down the information on the second page according to the first layout method to obtain the second segment corresponding to the first layout method.
[0147] The second segment includes one or more of the following information: text, punctuation marks, icons, and location. There can be one or more second segments. The processing method for step S422 is the same as that for step S410, and will not be described in detail here.
[0148] S423. The reading platform reads the second segment aloud according to the page control relationship and the reading method corresponding to the first layout method, or reads the second segment aloud according to the reading method corresponding to the first layout method.
[0149] The processing method of step S423 is the same as that of step S414, and will not be described in detail here.
[0150] After executing step S423, the reading platform repeats steps S415-S423 until the complete page content has been read aloud. In other words, each execution of steps S415-S423 yields one second image, so by the time the complete page content has been obtained, the total number of second images may be one or more.
[0151] The operating system's processing module displays the remaining part or all of the page content by controlling the swiping of the application page, and captures the page content after swiping by taking a screenshot. The reading platform then reads the page content aloud based on the captured image. This allows the page to be swiped and its content read aloud without the user performing any hand gestures, helping users obtain more of the page content.
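The repetition of steps S415-S423 can be sketched as a loop that stops when the application reports that no further content can be displayed. The callables `capture`, `swipe`, and `read_aloud`, the `max_swipes` safety bound, and the `FakePage` stand-in are all assumptions of this sketch rather than interfaces defined by this application.

```python
from typing import Callable, List


def read_whole_page(capture: Callable[[], str],
                    swipe: Callable[[], bool],
                    read_aloud: Callable[[str], None],
                    max_swipes: int = 50) -> int:
    """Read the first image, then swipe, capture, and read until the page ends.

    swipe() returns False when the application reports that the page content
    has been fully displayed (or does not respond to the instruction).
    """
    read_aloud(capture())            # first image: S405-S414
    images_read = 1
    for _ in range(max_swipes):      # safety bound, an assumption of this sketch
        if not swipe():              # S415-S417
            break
        read_aloud(capture())        # second image: S418-S423
        images_read += 1
    return images_read


class FakePage:
    """Stands in for an application page made of several screens."""

    def __init__(self, screens: List[str]) -> None:
        self.screens = screens
        self.index = 0

    def capture(self) -> str:
        return self.screens[self.index]

    def swipe(self) -> bool:
        if self.index + 1 >= len(self.screens):
            return False             # page content has been fully displayed
        self.index += 1
        return True


spoken: List[str] = []
page = FakePage(["screen 1", "screen 2", "screen 3"])
images_read = read_whole_page(page.capture, page.swipe, spoken.append)
```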
[0152] As one implementation method, after the segment corresponding to the previous image has been read aloud, steps S415-S423 can be executed to obtain and read aloud the segment corresponding to the next image, so as to ensure that the read aloud content and the displayed content are consistent.
[0153] As one implementation method, steps S415-S422 can be executed before the segment corresponding to the previous image is finished being read, so that the segment corresponding to the next image can be read without pausing when the segment corresponding to the previous image is finished, thereby ensuring the continuity of reading.
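The overlap between reading and preparation can be sketched with a single background worker. Here `prepare_next` is a hypothetical callable that covers swiping, capturing, recognizing, and decomposing the next image and returns `None` when no content remains; using a thread pool is one possible design and is not prescribed by this application.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional


def read_with_prefetch(prepare_next: Callable[[], Optional[List[str]]],
                       read_aloud: Callable[[List[str]], None]) -> int:
    """Read each batch of segments while the next batch is being prepared."""
    batches = 0
    with ThreadPoolExecutor(max_workers=1) as pool:
        current = prepare_next()                  # segments of the first image
        while current is not None:
            upcoming = pool.submit(prepare_next)  # prepare the next image now
            read_aloud(current)                   # speak in the meantime
            batches += 1
            current = upcoming.result()           # ready, or wait briefly
    return batches


# Usage with pre-made segment batches standing in for real screenshots.
pending = [["first segment"], ["second segment"], ["third segment"]]


def fake_prepare() -> Optional[List[str]]:
    return pending.pop(0) if pending else None


spoken: List[str] = []
batches_read = read_with_prefetch(fake_prepare, spoken.extend)
```

The trade-off matches the two implementations in the text: prefetching avoids a pause between images, but the page on screen may already have moved past the content being spoken.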
[0154] As one implementation method, the reading platform can, after acquiring all images of the complete page content, recognize all the images to obtain the corresponding first page information and second page information. It determines the application category based on one of these pieces of page information and, using the category library, determines the first layout method and the corresponding reading method based on the category. The first and second page information are then broken down according to the first layout method to obtain the segments corresponding to that layout method, and these segments are read aloud based on the page control relationships and the corresponding reading method. There can be one or more pieces of second page information. While breaking down the first and second page information according to the first layout method, the reading platform needs to merge content duplicated between the first page information and the second page information, or among multiple pieces of second page information, and to adjust the order so that the resulting segments follow the top-to-bottom and left-to-right order of the complete page content. The segments are then read aloud in this order to ensure the continuity of the reading content.
[0155] In this embodiment, texts in different layouts are independent of each other, while texts in the same layout are related. By breaking down page information according to the layout and reading each segment according to the reading method corresponding to its layout, the texts can be read aloud based on their relatedness and independence, making it easier for users to understand the meaning of the content being read.
[0156] Furthermore, the operating system displays the remaining part or all of the page content by controlling the swiping of the application page, so that the swiped page content can be captured through screenshots. The system can then read the swiped page content aloud based on the captured images, so that the page is swiped and its content read aloud without the user performing any hand gestures, helping users obtain more of the page content.
[0157] The reading method provided in this embodiment can be applied to scenarios such as driving, enabling users to understand the page content of the application when it is inconvenient for them to look at the screen of the electronic device or to operate the electronic device through hand gestures.
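The merging and reordering step can be sketched by converting each item to a position within the complete page and removing repeats. The sketch assumes the scroll offset of every screenshot is known exactly and that a duplicated item is recognized with identical text and coordinates; both are simplifications, and the names are hypothetical.

```python
from typing import List, NamedTuple


class Item(NamedTuple):
    text: str
    x: int   # left edge in pixels
    y: int   # top edge in pixels, relative to its own screenshot


def merge_pages(pages: List[List[Item]], offsets: List[int]) -> List[Item]:
    """Merge items from several screenshots into full-page reading order.

    offsets[i] is the assumed vertical scroll offset of screenshot i.
    """
    seen = set()
    merged: List[Item] = []
    for items, offset in zip(pages, offsets):
        for item in items:
            absolute = Item(item.text, item.x, item.y + offset)
            if absolute not in seen:          # drop content repeated by overlap
                seen.add(absolute)
                merged.append(absolute)
    # Top to bottom first, then left to right.
    return sorted(merged, key=lambda it: (it.y, it.x))


first_page = [Item("A", 0, 0), Item("B", 0, 100)]
second_page = [Item("C", 0, 100), Item("B", 0, 0)]   # "B" overlaps the first page
ordered = merge_pages([first_page, second_page], offsets=[0, 100])
ordered_texts = [item.text for item in ordered]
```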
[0158] To better understand the technical solution of this application, the following section provides a detailed description of the page reading method applied to the reading platform, in conjunction with the accompanying drawings.
[0159] Referring to Figure 6, this application provides a page reading method, applied to a reading platform, which includes the following steps:
[0160] S601. Obtain the category library.
[0161] The category library is used to store the reading methods for various application categories. After a reading process is completed, the reading platform keeps the category library for a preset period. When the reading platform receives the next reading request, it first checks whether it still has the category library stored. If not, it sends a category library call request to the server. After receiving the call request, the server returns the category library to the reading platform.
[0162] S602, Get the first image of the application page.
[0163] The first image is a screenshot of the page content displayed by the application when the user initiates a reading request. The first image is obtained by calling the operating system's screenshot function.
[0164] S603. Recognize the first image to obtain the information of the first page.
[0165] The information on the first page includes one or more of the following: text, punctuation marks, icons, and location.
[0166] S604. Based on the information on the first page, determine the category of the application; through the category library, determine the first layout method and the corresponding reading method according to the category.
[0167] As one implementation method, the reading platform obtains a second layout method based on the information from the first page, and then determines the application's category from the second layout method by using the category library. The second layout method is the layout method of a part of the page; this part can be the top, bottom, left, right, or middle area of the page.
[0168] As one implementation method, the reading platform extracts keywords from the information on the first page and determines the category of the application based on the keywords.
[0169] After determining the application category, the reading platform calls the category library to determine the first layout method and its corresponding reading method based on the category. The first layout method includes all unique and general layout methods corresponding to that application category, covering the entire area of the page corresponding to the first image. There can be one or more first layout methods, and each first layout method corresponds to one reading method. The first layout methods can include one or more unique layout methods as well as one or more general layout methods. The second layout method is at least one of the unique layout methods included in the first layout methods.
[0170] S605. Decompose the information on the first page according to the first layout method to obtain the first segment corresponding to the first layout method.
[0171] The first segment includes one or more of the following information: text, punctuation marks, icons, and location. There can be one or more first segments. Each first segment corresponds to a layout, and each layout can correspond to one or more first segments.
[0172] S606, Obtain the relationship between page controls.
[0173] Page control relationships refer to the parent-child nesting relationships and parallel relationships between page controls. The reading platform obtains page control relationships by sending page control relationship requests to the processing module.
[0174] S607. Read the first segment aloud according to the page control relationship and the reading method corresponding to the first layout method, or read the first segment aloud according to the reading method corresponding to the first layout method.
[0175] It should be noted that step S606, which involves obtaining the page control relationships, is optional. If step S606 is not executed, the reading platform in step S607 will read the first segment according to the reading method corresponding to the first layout method. If step S606 is executed, the reading platform in step S607 will read the first segment according to the page control relationships and the reading method corresponding to the first layout method.
[0176] When the first segment is read aloud according to the page control relationships and the reading method corresponding to the first layout method, the reading platform determines the main body of the page to be read based on the page control relationships and informs the user of the page position through the reading method. When the first segment is read aloud according to the reading method corresponding to the first layout method only, the reading platform reads the one or more first segments sequentially, in top-to-bottom and left-to-right order, according to the reading method corresponding to the first layout method.
[0177] S608, Get the second image of the application page.
[0178] The second image corresponds to part or all of the page content that was not displayed when the user initiated the reading request. The second image is a screenshot of the page taken after the operating system has controlled the application's page to swipe. There can be one or more second images. The reading platform recognizes the second image in the same way as the first image, obtaining information such as text, punctuation, icons, and location. The page content corresponding to the first and second images may or may not constitute the complete content of the application page.
[0179] Since the page content corresponding to the second image may overlap with the page content corresponding to the first image, the reading platform can crop the second image before recognizing it to remove the areas that overlap between the second and first images, keeping only the areas that do not overlap with the first image in the second image.
[0180] S609. Recognize the second image to obtain the information of the second page.
[0181] S610. Decompose the information on the second page according to the first layout method to obtain the second segment corresponding to the first layout method.
[0182] The second segment includes one or more of the following information: text, punctuation marks, icons, and location. Steps S609-S610 process the second image and the corresponding second page information in the same way as the first image and the first page information.
[0183] S611. Read the second segment aloud according to the page control relationship and the reading method corresponding to the first layout method, or read the second segment aloud according to the reading method corresponding to the first layout method.
[0184] It should be noted that step S606, which involves obtaining the page control relationships, is optional. If step S606 is not executed, in step S611, the reading platform reads the second segment aloud according to the reading method corresponding to the first layout method. If step S606 is executed, in step S611, the reading platform reads the second segment aloud according to both the page control relationships and the reading method corresponding to the first layout method. The reading platform reads the second segment aloud in the same way as the first segment.
[0185] As one implementation method, after the segment corresponding to the previous image has been read aloud, steps S608-S611 can be executed to obtain and read aloud the segment corresponding to the next image, so as to ensure that the read aloud content and the displayed content are consistent.
[0186] As one implementation method, steps S608-S610 can be executed before the segment corresponding to the previous image has finished being read, so that the segment corresponding to the next image can be read without a pause as soon as the previous one is finished, thereby ensuring the continuity of reading.
[0187] In this embodiment, texts in different layouts are independent of each other, while texts in the same layout are related. By breaking down page information according to the layout and reading each segment according to the reading method corresponding to its layout, the texts can be read aloud based on their relatedness and independence, making it easier for users to understand the meaning of the content being read.
[0188] Furthermore, in this embodiment, the operating system displays the remaining part or all of the page content by controlling the swiping of the application page, so that the swiped page content can be captured through screenshots. The captured images are then used to read the swiped page content aloud, so that the page is swiped and its content read aloud without the user performing any hand gestures, helping users obtain more of the page content.
[0189] The reading method provided in this embodiment can be applied to scenarios such as driving, enabling users to understand the page content of the application when it is inconvenient for them to look at the screen of the electronic device or to operate the electronic device through hand gestures.
[0190] Referring to Figure 7, this application provides a page reading method, applied to a reading platform, which includes the following steps:
[0191] S701. Obtain the category library.
[0192] S702, Get the first and second images of the application page.
[0193] The first image is a screenshot of the page content displayed by the application when the user initiates a reading request. The second image is a screenshot of the page taken after the page has been swiped. The page content corresponding to the second image is part or all of the page content that was not displayed when the user initiated the reading request. Both the first and second images are obtained by calling the operating system's screenshot function. There is one first image, and there may be one or more second images. The page content corresponding to the first and second images may or may not constitute the complete content of the application page.
[0194] S703. Recognize the first and second images to obtain page information.
[0195] Page information includes one or more of the following: text, punctuation marks, icons, and location.
[0196] S704. Based on the page information, determine the application category; using the category library, determine the first layout method and the corresponding reading method based on the category.
[0197] As one implementation, the reading platform obtains a second layout method based on the page information and, through the category library, determines the category of the application based on this second layout method. The second layout method is the layout method of a part of the page; this part can be the top, bottom, left, right, or center area of the page.
[0198] As another implementation, the reading platform extracts keywords from the page information and determines the category of the application based on the keywords.
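The keyword-based implementation might look like the following minimal sketch. The categories, the keyword table `CATEGORY_KEYWORDS`, and the count-based scoring are all hypothetical; this application does not specify how keywords are matched to categories.

```python
from typing import Dict, List, Optional

# Hypothetical keyword table; the real categories and keywords are not given.
CATEGORY_KEYWORDS: Dict[str, List[str]] = {
    "news": ["headline", "breaking", "reporter"],
    "chat": ["send", "message", "voice call"],
    "shopping": ["cart", "price", "checkout"],
}


def category_from_keywords(texts: List[str]) -> Optional[str]:
    """Pick the category whose keywords occur most often in the recognized text."""
    joined = " ".join(texts).lower()
    scores = {
        category: sum(joined.count(keyword) for keyword in keywords)
        for category, keywords in CATEGORY_KEYWORDS.items()
    }
    best = max(scores, key=lambda category: scores[category])
    # No keyword matched at all: the category cannot be determined this way.
    return best if scores[best] > 0 else None
```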
[0199] After determining the application category, the reading platform calls the category library to determine the first layout method and its corresponding reading method based on the category. The first layout method includes all unique and general layout methods corresponding to that application category, covering the entire area of the page corresponding to the first image. There can be one or more first layout methods, and each first layout method corresponds to one reading method. The first layout method can contain one or more unique layout methods, and it can also contain one or more general layout methods. The second layout method is at least one of the unique layout methods contained in the first layout method.
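One way to picture the category library is as a mapping from each application category to its unique and general layout methods, each paired with exactly one reading method. The sketch below assumes that structure; the entries in `CATEGORY_LIBRARY` and the names `LayoutMethod`, `category_from_second_layout`, and `first_layout_methods` are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class LayoutMethod:
    name: str
    unique: bool         # True: specific to one category; False: general
    reading_method: str  # each layout method maps to exactly one reading method


# Hypothetical library contents for illustration.
CATEGORY_LIBRARY: Dict[str, List[LayoutMethod]] = {
    "news": [
        LayoutMethod("headline_with_source", True, "read title, then source"),
        LayoutMethod("plain_paragraph", False, "read sentence by sentence"),
    ],
    "chat": [
        LayoutMethod("sender_bubble", True, "read sender, then message"),
        LayoutMethod("plain_paragraph", False, "read sentence by sentence"),
    ],
}


def category_from_second_layout(second_layout: str) -> Optional[str]:
    """Find the category for which the observed partial-page layout is unique."""
    for category, layouts in CATEGORY_LIBRARY.items():
        if any(m.unique and m.name == second_layout for m in layouts):
            return category
    return None


def first_layout_methods(category: str) -> Dict[str, str]:
    """Return every layout method of the category with its reading method."""
    return {m.name: m.reading_method for m in CATEGORY_LIBRARY[category]}
```

A general layout method such as `plain_paragraph` appears under several categories, so only a unique layout method can serve as the second layout method that identifies the category.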
[0200] S705. Decompose the page information according to the first layout method to obtain the segment corresponding to the first layout method.
[0201] A segment includes one or more of the following information: text, punctuation marks, icons, and location. There can be one or more segments. Each segment corresponds to one layout method, and each layout method can correspond to one or more segments.
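The decomposition in S705 can be illustrated with a sketch that groups recognized items into segments by location. It assumes, purely for illustration, that each layout method occupies a rectangular region of the page; the `Item`, `Region`, and `Segment` types are hypothetical, and this application does not state how regions are found.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Item:
    text: str
    x: int
    y: int


@dataclass
class Region:
    layout: str
    box: Tuple[int, int, int, int]  # left, top, right, bottom


@dataclass
class Segment:
    layout: str
    items: List[Item] = field(default_factory=list)


def decompose(items: List[Item], regions: List[Region]) -> List[Segment]:
    """Assign each item to the first region containing its location."""
    segments = [Segment(region.layout) for region in regions]
    for item in items:
        for region, segment in zip(regions, segments):
            left, top, right, bottom = region.box
            if left <= item.x < right and top <= item.y < bottom:
                segment.items.append(item)
                break
    # Regions that received no item produce no segment.
    return [segment for segment in segments if segment.items]
```

Two regions may share the same layout method, so one layout method can yield several segments while each segment still belongs to exactly one layout method.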
[0202] S706. Obtain the page control relationships.
[0203] S707. Read the segment aloud according to the page control relationships and the reading method corresponding to the first layout method, or read the segment aloud according to the reading method corresponding to the first layout method alone.
[0204] It should be noted that step S706, which involves obtaining the page control relationships, is optional. If step S706 is not executed, the reading platform in step S707 will read the segment aloud according to the reading method corresponding to the first layout method. If step S706 is executed, the reading platform in step S707 will read the segment aloud according to the page control relationships and the reading method corresponding to the first layout method.
[0205] When the segment is read aloud according to the page control relationships and the reading method corresponding to the first layout method, the reading platform determines the main body of the page to be read based on the page control relationships and informs the user of the page position through the reading method. When the segment is read aloud according to the reading method corresponding to the first layout method alone, the reading platform reads the one or more segments sequentially, in top-to-bottom and left-to-right order, according to that reading method.
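The two reading modes can be sketched as follows. The tuple representation of a segment and the `page_position` string (standing in for whatever the page control relationships yield) are assumptions made for the example, as is the spoken prefix.

```python
from typing import List, Optional, Tuple

# (top, left, text): a simplified, hypothetical representation of a segment.
PositionedSegment = Tuple[int, int, str]


def reading_script(
    segments: List[PositionedSegment],
    page_position: Optional[str] = None,
) -> List[str]:
    """Order segments top-to-bottom, then left-to-right, for reading aloud.

    page_position is a hypothetical description derived from the page control
    relationships (S706); when S706 is skipped it stays None.
    """
    ordered = sorted(segments, key=lambda s: (s[0], s[1]))
    lines = [text for _, _, text in ordered]
    if page_position is not None:
        # Tell the user where they are before reading the content.
        lines.insert(0, f"You are on: {page_position}")
    return lines
```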
[0206] In this embodiment, the texts in different layouts have independent relationships, while the texts in the same layout have related relationships. By breaking down page information according to the layout and reading the segments according to the reading method corresponding to the layout, the texts can be read aloud based on the related and independent relationships between them, making it easier for users to understand the meaning of the read content.
[0208] The reading method provided in this embodiment can be applied to scenarios such as driving, enabling users to understand the page content information of the application when it is inconvenient for them to look away from the screen of the electronic device or to perform operations on the electronic device through hand gestures.
[0209] For the method of obtaining the category library and the page control relationships in this embodiment, refer to the relevant steps of the method shown in Figure 6.
[0210] Referring to Figure 8, this application provides a page reading device 800, comprising:
[0211] The first acquisition module 801 is used to acquire the category library.
[0212] The category library stores the reading methods for the various application categories. After a reading process is completed, the reading platform retains the category library for a preset period. When the reading platform receives the next reading request, it first checks whether the category library is still stored. If not, it sends a category library call request to the server, and the server returns the category library to the reading platform after receiving the request.
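The retention behaviour described above resembles a time-limited cache. The following sketch assumes the preset period starts when a reading process finishes; the class name, the `fetch_from_server` callback, and the injectable `clock` are hypothetical and only serve the illustration.

```python
import time
from typing import Callable, Optional


class CategoryLibraryCache:
    """Keep the category library for a preset period after a reading finishes."""

    def __init__(
        self,
        fetch_from_server: Callable[[], dict],
        retention_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch_from_server   # hypothetical server call
        self._retention = retention_seconds
        self._clock = clock
        self._library: Optional[dict] = None
        self._expires_at = float("inf")

    def get(self) -> dict:
        """Return the stored library, calling the server only if it is missing."""
        if self._library is None or self._clock() >= self._expires_at:
            self._library = self._fetch()
            self._expires_at = float("inf")  # kept while a reading is ongoing
        return self._library

    def reading_finished(self) -> None:
        """Start the preset retention period once a reading process completes."""
        self._expires_at = self._clock() + self._retention
```

Passing the clock as a parameter is a design choice for testability; a deployment would normally rely on the default monotonic clock.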
[0213] The second acquisition module 802 is used to acquire the first image of the application page.
[0214] The first image is a screenshot of the page content displayed by the application when the user initiates a reading request. The first image is obtained by calling the operating system's screenshot function.
[0215] The first recognition module 803 is used to recognize the first image and obtain the first page information.
[0216] The first page information includes one or more of the following: text, punctuation marks, icons, and location.
[0217] The first determining module 804 is used to determine the category of the application based on the first page information, and to determine, through the category library, the first layout method and the corresponding reading method based on the category of the application.
[0218] As one implementation, the reading platform obtains a second layout method based on the first page information and, through the category library, determines the category of the application based on the second layout method. The second layout method is the layout method of a part of the page; this part can be the top, bottom, left, right, or center area of the page.
[0219] As another implementation, the reading platform extracts keywords from the first page information and determines the category of the application based on the keywords.
[0220] After determining the application category, the reading platform calls the category library to determine the first layout method and its corresponding reading method based on the category. The first layout method includes all unique and general layout methods corresponding to that application category, covering the entire area of the page corresponding to the first image. There can be one or more first layout methods, and each first layout method corresponds to one reading method. The first layout method can contain one or more unique layout methods, and it can also contain one or more general layout methods. The second layout method is at least one of the unique layout methods contained in the first layout method.
[0223] The first decomposition module 805 is used to decompose the first page information according to the first layout method to obtain the first segment corresponding to the first layout method.
[0224] The third acquisition module 806 is used to acquire the relationships between page controls.
[0225] The first reading module 807 is used to read the first segment aloud according to the page control relationship and the reading method corresponding to the first layout method, or to read the first segment aloud according to the reading method corresponding to the first layout method.
[0226] The third acquisition module 806 is optional. When the page reading device 800 includes the third acquisition module 806, the first reading module 807 is used to read the first segment according to the page control relationship and the reading method corresponding to the first layout method. When the page reading device 800 does not include the third acquisition module 806, the first reading module 807 is used to read the first segment according to the reading method corresponding to the first layout method.
[0227] The fourth acquisition module 808 is used to acquire the second image of the application page.
[0228] The second image is a screenshot of the application page taken after the user initiates a reading request and the page has been swiped. There can be one or more second images. The reading platform recognizes the second image in the same way as the first image, and the resulting second page information includes one or more of the following: text, punctuation marks, icons, and location.
[0229] The second recognition module 809 is used to recognize the second image and obtain the second page information.
[0230] The second decomposition module 810 is used to decompose the second page information according to the first layout method to obtain the second segment corresponding to the first layout method.
[0231] The second reading module 811 is used to read the second segment aloud according to the page control relationship and the reading method corresponding to the first layout method, or to read the second segment aloud according to the reading method corresponding to the first layout method.
[0232] When the page reading device 800 includes the third acquisition module 806, the second reading module 811 is used to read the second segment according to the page control relationship and the reading method corresponding to the first layout method. When the page reading device 800 does not include the third acquisition module 806, the second reading module 811 is used to read the second segment according to the reading method corresponding to the first layout method.
[0233] It should be noted that, for the detailed implementation of each module shown in Figure 8, refer to the relevant details of each step of the method shown in Figure 6.
[0234] Referring to Figure 9, this application provides a page reading device 900, comprising:
[0235] The first acquisition module 901 is used to acquire the category library.
[0236] The second acquisition module 902 is used to acquire the first and second images of the application page.
[0237] The first image is a screenshot of the page content displayed by the application when the user initiates a reading request. The second image is a screenshot of the application page taken after the user initiates the reading request and the page has been swiped. Both images are obtained by calling the operating system's screenshot function. There is one first image and one or more second images. The page content corresponding to the first and second images constitutes the complete content of the application page.
[0238] The recognition module 903 is used to recognize the first image and the second image to obtain page information.
[0239] Page information includes one or more of the following: text, punctuation marks, icons, and location.
[0240] The determination module 904 is used to determine the category of the application based on the page information; and to determine the first layout method and the corresponding reading method based on the category of the application through the category library.
[0241] As one implementation, the reading platform obtains a second layout method based on the page information and, through the category library, determines the category of the application based on this second layout method. The second layout method is the layout method of a part of the page; this part can be the top, bottom, left, right, or center area of the page.
[0242] As another implementation, the reading platform extracts keywords from the page information and determines the category of the application based on the keywords.
[0243] The decomposition module 905 is used to decompose the page information according to the first layout method to obtain the segment corresponding to the first layout method.
[0244] The third acquisition module 906 is used to acquire the relationships between page controls.
[0245] The reading module 907 is used to read aloud a segment according to the page control relationship and the reading method corresponding to the first layout method, or according to the reading method corresponding to the first layout method.
[0246] The third acquisition module 906 is optional. When the page reading device 900 includes the third acquisition module 906, the reading module 907 is used to read the segment according to the page control relationship and the reading method corresponding to the first layout method. When the page reading device 900 does not include the third acquisition module 906, the reading module 907 is used to read the segment according to the reading method corresponding to the first layout method.
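The module structure of device 900, including the optional third acquisition module 906, can be pictured as a small pipeline. In this sketch every module is an injected callable; the class name, the field names, and all signatures are hypothetical, chosen only to show how the optional module changes what the reading module receives.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class PageReadingDevice:
    acquire_library: Callable[[], dict]                    # module 901
    acquire_images: Callable[[], Sequence[bytes]]          # module 902
    recognize: Callable[[Sequence[bytes]], list]           # module 903
    determine: Callable[[list, dict], dict]                # module 904
    decompose: Callable[[list, dict], list]                # module 905
    read_aloud: Callable[[list, dict, Optional[dict]], List[str]]  # module 907
    acquire_controls: Optional[Callable[[], dict]] = None  # module 906, optional

    def run(self) -> List[str]:
        library = self.acquire_library()
        images = self.acquire_images()
        page_info = self.recognize(images)
        layouts = self.determine(page_info, library)
        segments = self.decompose(page_info, layouts)
        # Without module 906 the reading module gets no control relationships.
        controls = self.acquire_controls() if self.acquire_controls else None
        return self.read_aloud(segments, layouts, controls)
```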
[0247] For the detailed implementation of each module shown in Figure 9, refer to the relevant details of each step of the method shown in Figure 7.
[0248] This application provides a reading platform, including the apparatus shown in Figure 7 or Figure 8.
[0249] Please refer to Figure 10, which is an example diagram of an embodiment of the computing device in this application.
[0250] The computing device provided in this embodiment can be a processor, a server, or a dedicated data processing device, etc. The specific form of the device is not limited in this embodiment.
[0251] The computing device 1000 may vary considerably due to different configurations or performance, and may include one or more processors 1001 and memory 1002, in which programs or data are stored.
[0252] The memory 1002 can be volatile or non-volatile memory. Optionally, the processor 1001 is one or more central processing units (CPUs), graphics processing units (GPUs), or other dedicated processors, such as Ascend. The CPU can be a single-core CPU or a multi-core CPU. The processor 1001 can communicate with the memory 1002 and execute a series of instructions stored in the memory 1002 on the computing device 1000.
[0253] The computing device 1000 also includes one or more wired or wireless network interfaces 1003, such as Ethernet interfaces.
[0254] Optionally, although not shown in Figure 10, the computing device 1000 may also include one or more power supplies and one or more input/output interfaces, which can be used to connect to a monitor, mouse, keyboard, touch screen device, sensing device, etc. The input/output interfaces are optional components that may or may not be present, and are not limited here.
[0255] In this embodiment, the memory 1002 in the computing device 1000 stores a computer program. When the processor 1001 executes the computer program, the execution flow can refer to the method flow described in the foregoing method embodiment, and will not be repeated here.
[0256] The above embodiments can be implemented in whole or in part by software, hardware, firmware or any combination thereof. When implemented by software, they can be implemented in whole or in part in the form of a computer program product.
[0257] A computer program product includes one or more computer instructions. When the computer program product runs on a processor, the computer instructions are loaded and executed, producing all or part of the processes or functions of the embodiments of this application. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center via wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means.
[0258] Computer-readable storage media can be any usable medium that a computer can store, or a data storage device such as a server or data center that integrates one or more usable media. The usable medium can be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., a solid-state drive (SSD)). The computer-readable storage medium stores a computer program that, when executed on a processor, produces all or part of the processes or functions described in the embodiments of this application.
[0259] The technical solutions provided in this application have been described in detail above. Specific examples have been used in this application to illustrate the principles and implementation methods of this application. The description of the above embodiments is only for the purpose of helping to understand the method and core ideas of this application. At the same time, for those skilled in the art, there will be changes in the specific implementation methods and application scope based on the ideas of this application. Therefore, the content of this specification should not be construed as a limitation of this application.
Claims
1. A method for reading a page aloud, characterized in that it comprises: obtaining a category library, wherein the category library includes layout methods, reading methods, and application categories, text in different layout methods has independent relationships, and text in the same layout method has related relationships; obtaining a first image of an application page, wherein the first image is a screenshot of the page content displayed by the application; recognizing the first image to obtain first page information, wherein the first page information includes one or more of the following: text, punctuation marks, icons, and locations; determining the category of the application based on the first page information; determining, through the category library, a first layout method and a corresponding reading method based on the category of the application; decomposing the first page information according to the first layout method to obtain a first segment corresponding to the first layout method; and reading the first segment aloud according to the reading method corresponding to the first layout method.
2. The method according to claim 1, characterized in that, before the first segment is read aloud according to the reading method corresponding to the first layout method, the method further comprises: obtaining page control relationships; and reading the first segment aloud according to the page control relationships.
3. The method according to claim 1 or 2, characterized in that it further comprises: obtaining a second image of the application page, wherein the second image is a screenshot of the page content after the operating system controls the application page to swipe; recognizing the second image to obtain second page information, wherein the second page information includes one or more of the following: text, punctuation marks, icons, and locations; decomposing the second page information according to the first layout method to obtain a second segment corresponding to the first layout method; and reading the second segment aloud according to the reading method corresponding to the first layout method, or according to the page control relationships and the reading method corresponding to the first layout method.
4. The method according to any one of claims 1-3, characterized in that determining the category of the application based on the first page information specifically comprises: obtaining a second layout method based on the first page information, and determining, through the category library, the category of the application based on the second layout method, wherein the second layout method is the layout method of a part of the page.
5. The method according to any one of claims 1-3, characterized in that determining the category of the application based on the first page information specifically comprises: extracting keywords from the first page information and determining the category of the application based on the keywords.
6. A method for reading a page aloud, characterized in that it comprises: obtaining a category library, wherein the category library includes layout methods, reading methods, and application categories, text in different layout methods has independent relationships, and text in the same layout method has related relationships; obtaining a first image and a second image of an application page, wherein the first image is a screenshot of the page content displayed by the application, and the second image is a screenshot of the page content after the operating system controls the application page to swipe; recognizing the first image and the second image to obtain page information, wherein the page information includes one or more of the following: text, punctuation marks, icons, and locations; determining the category of the application based on the page information; determining, through the category library, a first layout method and a corresponding reading method based on the category of the application; decomposing the page information according to the first layout method to obtain a segment corresponding to the first layout method; and reading the segment aloud according to the reading method corresponding to the first layout method.
7. The method according to claim 6, characterized in that, before the segment is read aloud according to the reading method corresponding to the first layout method, the method further comprises: obtaining page control relationships; and reading the segment aloud according to the page control relationships.
8. The method according to claim 6 or 7, characterized in that determining the category of the application based on the page information specifically comprises: obtaining a second layout method based on the page information, and determining, through the category library, the category of the application based on the second layout method, wherein the second layout method is the layout method of a part of the page.
9. The method according to claim 6 or 7, characterized in that determining the category of the application based on the page information specifically comprises: extracting keywords from the page information and determining the category of the application based on the keywords.
10. A page reading device, characterized in that it comprises: a first acquisition module, used to acquire a category library, wherein the category library includes layout methods, reading methods, and application categories, text in different layout methods has independent relationships, and text in the same layout method has related relationships; a second acquisition module, used to acquire a first image of an application page, wherein the first image is a screenshot of the page content displayed by the application; a first recognition module, used to recognize the first image to obtain first page information, wherein the first page information includes one or more of the following: text, punctuation marks, icons, and locations; a first determining module, used to determine the category of the application based on the first page information, and to determine, through the category library, a first layout method and a corresponding reading method based on the category of the application; a first decomposition module, used to decompose the first page information according to the first layout method to obtain a first segment corresponding to the first layout method; and a first reading module, used to read the first segment aloud according to the reading method corresponding to the first layout method.
11. The apparatus according to claim 10, characterized in that it further comprises: a third acquisition module, used to acquire page control relationships; wherein the first reading module is also used to read the first segment aloud according to the page control relationships.
12. The apparatus according to claim 10 or 11, characterized in that it further comprises: a fourth acquisition module, used to acquire a second image of the application page, wherein the second image is a screenshot of the page content after the application page has been swiped; a second recognition module, used to recognize the second image to obtain second page information, wherein the second page information includes one or more of the following: text, punctuation marks, icons, and locations; a second decomposition module, used to decompose the second page information according to the first layout method to obtain a second segment corresponding to the first layout method; and a second reading module, used to read the second segment aloud according to the reading method corresponding to the first layout method, or according to the page control relationships and the reading method corresponding to the first layout method.
13. The apparatus according to any one of claims 10-12, characterized in that determining the category of the application based on the first page information specifically comprises: obtaining a second layout method based on the first page information, and determining, through the category library, the category of the application based on the second layout method, wherein the second layout method is the layout method of a part of the page.
14. The apparatus according to any one of claims 10-13, characterized in that determining the category of the application based on the first page information specifically comprises: extracting keywords from the first page information and determining the category of the application based on the keywords.
15. A page reading device, characterized in that it comprises: a first acquisition module, used to acquire a category library, wherein the category library includes layout methods, reading methods, and application categories, text in different layout methods has independent relationships, and text in the same layout method has related relationships; a second acquisition module, used to acquire a first image and a second image of an application page, wherein the first image is a screenshot of the page content displayed by the application, and the second image is a screenshot of the page content after the application page has been swiped; a recognition module, used to recognize the first image and the second image to obtain page information, wherein the page information includes one or more of the following: text, punctuation marks, icons, and locations; a determination module, used to determine the category of the application based on the page information, and to determine, through the category library, a first layout method and a corresponding reading method based on the category of the application; a decomposition module, used to decompose the page information according to the first layout method to obtain a segment corresponding to the first layout method; and a reading module, used to read the segment aloud according to the reading method corresponding to the first layout method.
16. The apparatus according to claim 15, characterized in that it further comprises: a third acquisition module, used to acquire page control relationships; wherein the reading module is also used to read the segment aloud according to the page control relationships.
17. The apparatus according to claim 15 or 16, characterized in that determining the category of the application based on the page information specifically comprises: obtaining a second layout method based on the page information, and determining, through the category library, the category of the application based on the second layout method, wherein the second layout method is the layout method of a part of the page.
18. The apparatus according to claim 15 or 16, characterized in that determining the category of the application based on the page information specifically comprises: extracting keywords from the page information and determining the category of the application based on the keywords.
19. A reading platform, characterized in that it includes the apparatus according to any one of claims 10-18.
20. A computing device, characterized in that it comprises: at least one memory for storing a program; and at least one processor for executing the program stored in the memory; wherein, when the program stored in the memory is executed, the processor is used to perform the method according to any one of claims 1-9.
21. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program that, when run on a processor, causes the processor to perform the method according to any one of claims 1-9.
22. A computer program product, characterized in that, when the computer program product is run on a processor, it causes the processor to perform the method according to any one of claims 1-9.