An audio time-domain waveform plotting method, system, and storage medium
By generating waveform data files and combining them with sliding controls and content views, the system overhead and cross-platform incompatibility issues of drawing waveforms of large audio files on smart mobile devices are solved, achieving cross-platform universality and rich audio visualization effects.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- KUNMING LINGFEI TECH CO LTD
- Filing Date
- 2023-01-06
- Publication Date
- 2026-06-19
AI Technical Summary
Existing technologies for plotting waveforms of large audio files on smart mobile devices suffer from problems such as high system overhead, repeated data generation, and lack of cross-platform compatibility, resulting in choppy operation and a tendency to crash.
By generating waveform data files, storing sampling point data and converting it into view point coordinates, and combining scrollable controls and content views, the number of waveform drawing operations is reduced. A cross-platform data file format is adopted, user gesture operations are supported, and data overflow is avoided.
It achieves cross-platform waveform data universality, reduces system overhead, provides rich audio visualization effects, improves user experience, and avoids data overflow and duplicate generation.
Figure CN115984415B_ABST
Description
Technical Field
[0001] This application belongs to the field of audio data processing technology, specifically relating to an audio time-domain waveform plotting method, system, and storage medium.
Background Technology
[0002] Audio waveforms represent audio visually, with the horizontal axis representing time and the vertical axis representing amplitude, providing a form of audio visualization. Combining auditory and visual elements lets users grasp the characteristics of the audio easily and intuitively (such as silence, noise areas, and volume contrast), which facilitates audio editing operations such as cutting and synthesis. Annotations can also be added to the waveform to help users locate and describe the audio content.
[0003] Sound is a wave phenomenon: a sound waveform is a superposition of waves with different frequencies and amplitudes. To represent such a waveform in digital media, it must be sampled. The sampling rate must be high enough to capture the highest frequency of the sound, and enough bit depth must be stored to represent the amplitude of each sample. Analog sound signals from nature are converted into digital audio data through digitization. Sampling of analog signals must follow the Nyquist sampling theorem, meaning the sampling rate must be at least twice the highest frequency in the analog signal's spectrum. The range of human hearing is 20 Hz to 20 kHz, so digital audio needs a sampling rate above 40 kHz to cover it. Common audio sampling rates for smart mobile terminals are 8000 Hz, 44100 Hz, and 48000 Hz. Taking a 48000 Hz sampling rate as an example, when a user records audio using a smart mobile terminal, 48,000 samples are needed per second. Uncompressed lossless digital audio files are therefore large.
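The size claim in [0003] can be checked with simple arithmetic. The sketch below is illustrative only: it assumes 16-bit mono PCM (neither the bit depth nor the channel count is stated in the text), and the function name is hypothetical.

```python
def pcm_size_bytes(sample_rate_hz: int, seconds: float,
                   bytes_per_sample: int = 2, channels: int = 1) -> int:
    """Size of uncompressed PCM audio.

    The defaults (16-bit samples, one channel) are assumptions, not values
    taken from the patent text.
    """
    return int(sample_rate_hz * seconds * bytes_per_sample * channels)


# One minute recorded at 48000 Hz under the assumptions above.
one_minute = pcm_size_bytes(48000, 60)
print(one_minute)  # 5760000 bytes, roughly 5.5 MiB
```

Under these assumptions a one-hour recording is about 330 MiB, which is the scale of file the background section has in mind.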
[0004] With the development and widespread use of smartphones and other mobile devices, people are increasingly reliant on these devices for work and daily life. Compared to computers, smart mobile devices are smaller, have less memory, and slower processing speeds. Therefore, when plotting waveforms from large audio and video files, it is necessary to consider the optimal processing solution to ensure high performance and smooth operation.
[0005] In existing technologies, waveform rendering for large audio and video files falls into two main categories: real-time block rendering and one-time rendering. Real-time block rendering reads the audio data within the display area in real time according to the playback progress. Because it reads audio data into a buffer in blocks, it can process only a portion of the audio data per unit time, so the waveform of the audio file cannot be scaled globally. When only the waveform of the display area is rendered, the waveform must be redrawn every time the display area moves by a single pixel, resulting in high system overhead. One-time rendering renders all audio data into the view. As the scaling ratio increases, the display precision, the required data density, and the view width all increase, potentially causing memory overflow and program crashes. Furthermore, whether the data is obtained in real time from the audio file or read into the buffer in blocks, the data from which the waveform is directly generated is temporary data processed by the program. Once the program stops running, this data is cleared, so even if a waveform has already been generated for the same audio data, the waveform data generation process must be repeated on the next run. Finally, since waveform data is temporary program data, it is highly dependent on the program's operating environment. For example, program data generated on the Android platform cannot be used on the iOS platform without special processing.
Summary of the Invention
[0006] To address the above problems, the first aspect of this application proposes a method for plotting audio time-domain waveforms, comprising the following steps:
[0007] S1, acquire PCM data from the audio file; S2, determine the audio sampling rate and waveform sampling duration, and calculate the value of the sampling point data based on the PCM data; S3, store the sampling point data as a waveform data file; S4, convert the sampling point data from the waveform data file into view point coordinate data; and S5, connect the coordinate points and draw the waveform.
[0008] The above solution proposes a method for drawing symmetrical, closed audio time-domain waveforms. By setting a waveform data file, it solves the problems of repeated generation of waveform data and the inability of data to be universally applicable across platforms in the prior art.
[0009] Furthermore, the value of the sampling point data is the root mean square of the PCM data of the sampling point, where the amount of PCM data n of the sampling point = audio sampling rate × waveform sampling duration.
[0010] Furthermore, the waveform data file content includes a file header and sampling point data. The file header includes an identification string and sampling duration.
[0011] Preferably, the view structure of the waveform graph includes a sliding control, a drawing view, and a content view. The sliding control is used to respond to user gestures, and the drawing view and the content view are both subviews of the sliding control. The width of the content view changes in response to user gestures.
[0012] Preferably, the method for calculating the coordinate data of the view points is as follows:
[0013] The x-coordinate of the view point coordinate data is incremented by screen pixels.
[0014] The y-coordinate of the view point coordinate data satisfies rms / maxRms=y / (h / 2), where rms is the value of the current sampling point data, maxRms is the maximum value of the sampling point data of the audio file, and h is the height of the view being drawn;
[0015] The flipped coordinate y' of the view point coordinate data is obtained by flipping the ordinate y vertically about the horizontal center line of the view being drawn. The formula for the flipped coordinate is expressed as follows:
[0016]
[0017] Where (x,y) are the coordinates of the current view point, (x',y') are the coordinates of the view point after flipping, x = x', and θ is 180°.
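The formula of [0016] does not appear in this text. One form that is consistent with the stated definitions (x = x', θ = 180°, and y measured from the horizontal center line as in [0014]) is sketched below; it is an assumption and not necessarily the expression used in the filing.

```latex
x' = x, \qquad y' = y\cos\theta = -y \quad (\theta = 180^\circ)
```

Under this reading, each view point at distance y on one side of the center line has a mirror point at the same distance on the other side, which is what makes the drawn waveform symmetrical.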
[0018] Furthermore, the width of the drawing view is set to Uw < width < maxWidth, where Uw is the width of the sliding control, maxWidth is the maximum width of the drawing view, and maxWidth = the number of sampling points in the waveform data file m × the number of sampling points drawn per screen pixel. The waveform is redrawn when the display area exceeds the drawing view.
[0019] Preferably, the width of the drawing view is set to 3 times Uw.
[0020] Preferably, the horizontal coordinate of the drawing view changes according to the playback time of the audio.
[0021] The waveform structure and drawing method proposed in the above scheme can reduce the number of times the waveform is drawn, reduce system overhead, and avoid data overflow; combined with user gesture processing for waveform scaling, it provides richer audio visualization effects.
[0022] The second aspect of this application proposes an audio time-domain waveform plotting system, comprising:
[0023] The audio data acquisition module is configured to acquire the PCM data of audio files.
[0024] The sampling point data calculation module is configured to determine the audio sampling rate and waveform sampling duration, and to calculate the sampling point data based on the PCM data.
[0025] The waveform data file generation module is configured to store sampled point data as a waveform data file.
[0026] The data conversion module is configured to convert the sampling point data of the waveform data file into view point coordinate data;
[0027] and
[0028] The waveform drawing module is configured to connect coordinate points and draw waveforms.
[0029] A third aspect of this application provides a computer-readable storage medium for plotting audio time-domain waveform data files, having stored thereon one or more computer programs that, when executed by a computer processor, implement any of the methods described above.
[0030] This invention proposes a waveform data file that stores waveform data in a cross-platform universal data file format. Waveforms can be generated by reading data from this file across multiple platforms, facilitating user operation. This invention also proposes a waveform structure combining a scrollable control, a content view, and a drawing view. This effectively reduces the number of waveform rendering operations and lowers system overhead. Furthermore, it integrates with user gestures, allowing for dragging, zooming, and other operations on the waveform. The drawing view, as a subview of the scrollable control, naturally incorporates bounce and scrolling animation effects, enriching the user experience. Further, by setting the width of the drawing view to a fixed value greater than the width of the scrollable control, data overflow caused by excessively long audio clips is avoided. Additionally, the waveform view position is reset and the waveform is redrawn only when the display area exceeds the drawing view's range, effectively reducing system overhead.
Description of the Drawings
[0031] The accompanying drawings are provided to further understand this application. The elements in the drawings are not necessarily to scale. For ease of description, only the parts relevant to the invention are shown in the drawings.
[0032] Figure 1 is a flowchart illustrating an audio time-domain waveform plotting method according to an embodiment of the present invention.
[0033] Figure 2 is a schematic diagram of the view structure of the waveform graph in another embodiment of the present invention.
[0034] Figure 3 is a waveform diagram drawn in another embodiment of the present invention.
[0035] Figure 4 is a schematic diagram of the audio time-domain waveform plotting system in another embodiment of the present invention.
Detailed Implementation
[0036] The present application will now be described in further detail with reference to the accompanying drawings and embodiments. It is to be understood that the specific embodiments described herein are for illustrative purposes only and are not intended to limit the invention.
[0037] Figure 1 is a flowchart illustrating an audio time-domain waveform plotting method according to an embodiment of the first aspect of the present invention, which includes the following steps:
[0038] S1, acquire PCM data of the audio file, where the source may also be a recorded sample data stream; S2, determine the audio sampling rate and waveform sampling duration, and calculate the value of the sampling point data based on the PCM data; S3, store the sampling point data as a waveform data file; S4, convert the sampling point data of the waveform data file into view point coordinate data; and S5, connect the coordinate points and draw the waveform.
[0039] In a specific embodiment, the waveform data file is stored as a binary serial bit stream, which enables cross-platform and barrier-free use of the waveform data file.
[0040] In a specific embodiment, the waveform data file includes a file header and sampling point data.
[0041] In a further embodiment, the file header includes an identification string and a sampling duration t, where the sampling duration t represents the duration of PCM data used for each waveform sampling point. For example, the file header "LinFei.ltd waveform file 0.025" has the identification string "LinFei.ltd waveform file" and a sampling duration t of 0.025, meaning that each waveform sampling point is computed from 0.025 seconds of PCM data.
[0042] In a further embodiment, the value rms of the sampling point data is the root mean square of the PCM data of the sampling point, i.e.
[0043] $$rms = \sqrt{\frac{1}{n}\sum_{i=1}^{n} p_i^{2}}$$
[0044] Here p_i denotes the i-th PCM value of the sampling point, the data type of the sampling point data is float, and n is the number of PCM values in a single sampling point, where n = audio sampling rate × waveform sampling duration. For example, for an audio file with a sampling rate of 44100 Hz, when the sampling duration t of the waveform data file is 0.025, n = 44100 × 0.025 = 1102.5, which is taken as 1102 PCM values per sampling point.
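The calculation in [0042] to [0044] can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes mono PCM already decoded to numbers, rounds n down, and drops a trailing partial window, none of which is specified in the text. The function name is hypothetical.

```python
import math
from typing import List, Sequence


def rms_points(pcm: Sequence[float], sample_rate: int, t: float) -> List[float]:
    """Root mean square of each consecutive window of n PCM values.

    n = floor(sample_rate * t); a trailing window shorter than n is dropped
    (both choices are assumptions made for this sketch).
    """
    n = int(sample_rate * t)
    if n <= 0:
        raise ValueError("window must contain at least one PCM value")
    points = []
    for start in range(0, len(pcm) - n + 1, n):
        window = pcm[start:start + n]
        points.append(math.sqrt(sum(v * v for v in window) / n))
    return points


# One second of silence at 44100 Hz with t = 0.025 gives 40 full windows of 1102 values.
print(len(rms_points([0.0] * 44100, 44100, 0.025)))  # 40
```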
[0045] When plotting a waveform, the number of sampling points m in the waveform data file can be calculated from the audio file. In a specific embodiment, a single-channel audio file has a sampling rate of 44100 Hz and a duration of 2 minutes and 30 seconds, and the sampling duration t of the waveform data file is 0.025. Then the number of sampling points m = audio duration / sampling duration of the waveform data file = (2 × 60 + 30) / 0.025 = 6000, meaning that the waveform data file has a total of 6000 sampling points.
[0046] The number of sampling points, m, in the waveform data file can also be calculated from the waveform data file. In a specific embodiment, the data type of the sampling point data is float, and the number of sampling points, m, in the waveform data file is calculated as: (Size of waveform data file - Number of bytes in the file header) / Number of float bytes.
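A possible shape for the waveform data file of [0039] to [0046] is sketched below. The text specifies only that the header holds an identification string and the sampling duration and that the sampling points are floats; the exact byte layout here (a 24-byte identification string followed by a little-endian float32 duration, then float32 points) is an assumption, and the function names are hypothetical.

```python
import io
import struct
from typing import BinaryIO, List, Tuple

MAGIC = b"LinFei.ltd waveform file"  # identification string from the example in [0041]
HEADER = struct.Struct("<24sf")      # assumed layout: 24-byte id + float32 sampling duration
FLOAT_SIZE = 4                       # bytes per float32 sampling point


def write_waveform(out: BinaryIO, t: float, points: List[float]) -> None:
    """Write the header followed by the sampling point data."""
    out.write(HEADER.pack(MAGIC, t))
    out.write(struct.pack(f"<{len(points)}f", *points))


def read_waveform(data: bytes) -> Tuple[float, List[float]]:
    """Return (sampling duration, sampling points) from the file contents."""
    magic, t = HEADER.unpack_from(data, 0)
    if magic != MAGIC:
        raise ValueError("not a waveform data file")
    # m = (file size - header bytes) / float bytes, as in [0046]
    m = (len(data) - HEADER.size) // FLOAT_SIZE
    points = list(struct.unpack_from(f"<{m}f", data, HEADER.size))
    return t, points


buf = io.BytesIO()
write_waveform(buf, 0.025, [0.5, 0.25, 1.0])
duration, pts = read_waveform(buf.getvalue())
print(len(pts))  # 3
```

Fixing the byte order and field widths in the file, rather than relying on the platform's native layout, is what allows the same file to be read on different platforms.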
[0047] Figure 2 is a schematic diagram of the view structure of a waveform graph according to an embodiment of the first aspect of the present invention. In this embodiment, the view of the waveform graph consists of three components: a scrollable control (UIScrollView), a drawing view (WaveView), and a content view (ContentView). The scrollable control responds to user gestures such as swiping, dragging, zooming in, and zooming out. The content view is a subview of the scrollable control, and its width changes as the user zooms. When the user zooms in, its width is greater than the width of the scrollable control; when the user zooms out, its width is less than the width of the scrollable control. The drawing view is the view in which the waveform graph is actually drawn; it is at the same level as the content view, located above the content view, and is also a subview of the scrollable control.
[0048] In a specific embodiment, the sampling point data of the waveform data file is converted into view point coordinate data, i.e., coordinate points (x, y) on the drawing view. In this embodiment, the waveform data file has a total of m sampling points, and one waveform sampling point is drawn for each screen pixel. The horizontal coordinate x of the view point coordinate data increases in steps of one screen pixel. The vertical coordinate y of the view point coordinate data satisfies rms / maxRms = y / (h / 2), where rms is the value of the current sampling point data, maxRms is the maximum value of the sampling point data in the audio file, and h is the height of the drawing view. The vertical coordinate y is flipped vertically about the horizontal center line of the drawing view according to the following formula to obtain the flipped coordinate y' of the view point coordinate data:
[0049]
[0050] Where (x,y) are the coordinates of the current view point, (x',y') are the coordinates of the view point after flipping, x = x', and θ is the flip angle, i.e., 180°.
[0051] Finally, by connecting all the coordinate points and all the flipped coordinate points, a symmetrical and closed waveform can be drawn. Figure 3 is a waveform diagram drawn in a specific embodiment.
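The conversion in [0048] to [0051] can be sketched as follows. Because the flip formula of [0049] does not appear in this text, the sketch assumes that flipping about the horizontal center line places the mirror point at the same distance y on the other side of that line. It also assumes one sampling point per screen pixel and screen coordinates whose y axis grows downward. The function name is hypothetical.

```python
from typing import List, Tuple


def outline_points(rms_values: List[float], h: float) -> List[Tuple[float, float]]:
    """Closed, symmetric outline of the waveform in view coordinates.

    The upper edge runs left to right and the mirrored lower edge runs right
    to left, so connecting the points in order yields a closed shape.
    """
    if not rms_values:
        return []
    max_rms = max(rms_values)
    mid = h / 2
    # From rms / maxRms = y / (h / 2); silent audio maps to the center line.
    scale = mid / max_rms if max_rms > 0 else 0.0
    upper, lower = [], []
    for x, rms in enumerate(rms_values):
        y = rms * scale
        upper.append((float(x), mid - y))  # point above the center line
        lower.append((float(x), mid + y))  # mirrored point, same x
    return upper + lower[::-1]


print(outline_points([1.0, 2.0], 100))
# [(0.0, 25.0), (1.0, 0.0), (1.0, 100.0), (0.0, 75.0)]
```

The returned list can be handed to any path-drawing API that closes the path after the last point.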
[0052] In a preferred embodiment, the width of the drawing view is set to a range of Uw < width < maxWidth, where Uw is the width of the scrollable control, maxWidth is the maximum width of the drawing view, and maxWidth = the number of sampling points in the waveform data file m × the number of sampling points drawn per screen pixel. If the width of the drawing view is equal to the width Uw of the scrollable control, when the width of the content view is greater than Uw, the waveform in the drawing view needs to be redrawn for each screen pixel scroll, resulting in high system overhead. If the width is not limited, the maximum width of the drawing view maxWidth = the number of sampling points in the waveform data file m × the number of sampling points drawn per screen pixel. In this case, the width of the drawing view increases with the audio duration. When the width of the drawing view is too large, it will cause data overflow, leading to program crashes. Therefore, to avoid data overflow, a reasonable width of the drawing view needs to be set. In this embodiment, the width of the drawing view is set to three times Uw. Redrawing is only required when the display area is outside the drawing view area, thereby reducing the number of draws and lowering system overhead.
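The width rule and the redraw condition of [0052] might be expressed as below. This is a sketch under assumptions: the factor of 3 comes from the embodiment, capping the width at maxWidth for short audio is an interpretation of the constraint Uw < width < maxWidth, and both function names are hypothetical.

```python
def drawing_view_width(uw: float, m: int,
                       points_per_pixel: float = 1.0, factor: float = 3.0) -> float:
    """Width of the drawing view: factor * Uw, but never wider than maxWidth."""
    max_width = m * points_per_pixel  # maxWidth as defined in [0052]
    return min(factor * uw, max_width)


def needs_redraw(px: float, uw: float, wx: float, width: float) -> bool:
    """True when the display area [px, px + uw] is not fully inside
    the drawing view [wx, wx + width]."""
    return px < wx or px + uw > wx + width


print(drawing_view_width(400, 6000))        # 1200.0
print(needs_redraw(500, 400, 100, 1200))    # False
print(needs_redraw(1000, 400, 100, 1200))   # True
```

With a drawing view three control-widths wide, the user can scroll roughly one control width in either direction before a redraw is triggered.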
[0053] In a specific embodiment, the horizontal coordinate of the drawing view changes according to the playback time of the audio. Since the width of the drawing view is smaller than the width of the content view, if the position of the drawing view on the scrollable control were fixed, the drawing view would move out of the display area of the scrollable control after the user slides or drags, and the waveform could no longer be seen. To keep the drawing view within the display area of the scrollable control at all times, the position of the drawing view must be moved, that is, the horizontal coordinate Wx of the drawing view on the scrollable control changes. Specifically,
[0054] Wx = Px - Uw
[0055] Where Px / Cw = t / duration, Px is the horizontal coordinate of the display area of the sliding control, Cw is the width of the content view, Cw = number of sample data points × number of sample data points drawn per screen pixel, t is the current playback time of the audio, and duration is the total duration of the audio file. When the display area of the sliding control is within the view area of the drawing view, there is no need to redraw the waveform.
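The position update of [0054] and [0055] is a two-line calculation; the sketch below simply transcribes it. The function name is hypothetical, and behavior at the very start of the audio, where Wx becomes negative, is not addressed in the text and is left unhandled here.

```python
def drawing_view_origin(t: float, duration: float, cw: float, uw: float) -> float:
    """Horizontal coordinate Wx of the drawing view on the sliding control."""
    px = cw * t / duration  # from Px / Cw = t / duration
    return px - uw          # Wx = Px - Uw


# Halfway through a 150 s file with Cw = 6000 and Uw = 400.
print(drawing_view_origin(75, 150, 6000, 400))  # 2600.0
```

In this example the display area covers 3000 to 3400, so a drawing view of width 3 × Uw = 1200 starting at 2600 extends one control width on either side of it.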
[0056] Those skilled in the art will understand that once the number of sampling points drawn per screen pixel exceeds a certain number, the user no longer perceives any improvement in the displayed waveform, so drawing more sampling points per screen pixel brings little benefit. Therefore, in this embodiment, when the user zooms out of the waveform, only a subset of the sampling point data is extracted for drawing. Specifically, when the user performs a zoom-out operation, the zoom ratio scale of the sliding control is less than 1, and the number of sampling points drawn is m' = scale × the total number of sampling points m, taken at intervals of 1 / scale. If 1 / scale is not an integer, the sampling interval can be rounded down. When a zoom-in operation is performed, scale > 1 and all sampling data can be drawn. The horizontal coordinate interval between adjacent waveform coordinate points is scale.
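The extraction rule of [0056] can be sketched as a simple stride over the sampling points. The function name is hypothetical, and taking the first point of each interval is an assumption, since the text does not say which point within an interval is kept.

```python
import math
from typing import List, Sequence


def decimate(points: Sequence[float], scale: float) -> List[float]:
    """Select the sampling points to draw for a given zoom ratio.

    scale >= 1: all points are drawn.
    scale <  1: every floor(1 / scale)-th point is drawn.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    if scale >= 1:
        return list(points)
    step = max(1, math.floor(1 / scale))  # interval rounded down, as in [0056]
    return list(points[::step])


print(decimate(list(range(10)), 0.5))  # [0, 2, 4, 6, 8]
print(decimate(list(range(10)), 0.4))  # [0, 2, 4, 6, 8]
```

Because the interval is rounded down, the number of points actually drawn can be somewhat larger than scale × m when 1 / scale is not an integer, as the second call shows.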
[0057] Figure 4 shows a schematic diagram of an audio time-domain waveform plotting system 400 according to an embodiment of the second aspect of the present invention, comprising:
[0058] Audio data acquisition module 401 is configured to acquire PCM data of audio files;
[0059] The sampling point data calculation module 402 is configured to determine the audio sampling rate and waveform sampling duration, and to calculate the sampling point data based on the PCM data.
[0060] The waveform data file generation module 403 is configured to store sampled point data as a waveform data file.
[0061] The data conversion module 404 is configured to convert the sampling point data of the waveform data file into view point coordinate data;
[0062] and
[0063] The waveform drawing module 405 is configured to connect coordinate points and draw waveforms.
[0064] According to an embodiment of a third aspect of the present invention, a computer-readable storage medium is also provided, which may be included in the electronic device described in the above embodiments; or it may exist independently and not assembled into the electronic device. The computer-readable storage medium carries one or more programs that, when executed by the electronic device, cause the electronic device to perform the described method.
[0065] The above embodiments propose a method, system, and computer-readable storage medium for drawing audio time-domain waveforms. This method generates waveform data from audio files or recorded sample data streams, saves it as a waveform data file, and then draws the waveform using the waveform data file. This method is particularly suitable for iOS applications related to audio processing, such as recording, audio playback, and audio editing. It can provide simple and intuitive audio visualization effects, reduce system overhead, and prevent data overflow due to excessively long audio files. Furthermore, when cross-platform data interoperability is required, the waveform data file provided by this invention can be used to achieve seamless and universal waveform data sharing.
[0066] Although the contents of this application have been specifically shown and described in conjunction with preferred embodiments, those skilled in the art should understand that any changes in form and detail made to this application without departing from the spirit and scope of this application as defined by the appended claims and without inventive effort are within the scope of protection of this application.
Claims
1. A method for plotting audio time-domain waveforms, characterized in that it includes the following steps:
S1, obtain the PCM data of the audio file;
S2, determine the audio sampling rate and waveform sampling duration, and calculate the value of the sampling point data based on the PCM data; the value of the sampling point data is the root mean square of the PCM data of the sampling point, and the amount of PCM data n of the sampling point is equal to the audio sampling rate × the waveform sampling duration;
S3, store the sampling point data as a waveform data file;
S4, convert the sampling point data of the waveform data file into view point coordinate data, in which: the x-coordinate of the view point coordinate data is incremented by screen pixels; the y-coordinate of the view point coordinate data satisfies rms / maxRms = y / (h / 2), where rms is the value of the current sampling point data, maxRms is the maximum value of the sampling point data of the audio file, and h is the height of the drawn view; the flip coordinate y' of the view point coordinate data is obtained by flipping the ordinate y vertically about the horizontal center line of the drawn view according to a flip formula in which (x, y) are the coordinates of the current view point, (x', y') are the coordinates of the view point after flipping, x = x', and θ is 180°; the horizontal coordinate Wx of the drawn view on the sliding control changes according to the playback time of the audio, Wx = Px - Uw, where Px / Cw = t / duration, Px is the horizontal coordinate of the display area of the sliding control, Cw is the width of the content view, Cw = number of sample data × number of sample point data drawn per screen pixel, t is the current playback time of the audio, duration is the total duration of the audio file, and Uw is the width of the sliding control; when the user zooms out of the waveform, the zoom ratio scale of the sliding control is less than 1 and only a portion of the sampled data is drawn, the number of sampled data points m' being equal to scale × the total number of sampled data points m, sampled at intervals of 1 / scale, where the sampling interval can be rounded down if 1 / scale is not an integer; when the user zooms in, scale > 1 and all sampled data can be drawn; the horizontal coordinate interval between adjacent waveform points is scale; and
S5, connect the coordinate points and draw the waveform;
wherein the waveform data file format includes a file header and sampling point data, and the file header includes an identification string and sampling duration; and the view structure of the waveform graph includes a sliding control, a drawing view, and a content view, the sliding control being used to respond to user gestures, the drawing view and the content view both being subviews of the sliding control, and the width of the content view changing in response to user gestures.
2. The audio time-domain waveform plotting method of claim 1, wherein the width of the drawing view is set to Uw < width < maxWidth, where Uw is the width of the sliding control, maxWidth is the maximum width of the drawing view, and maxWidth = the number of sampling points in the waveform data file m × the number of sampling points drawn per screen pixel. The waveform is redrawn when the display area exceeds the drawing view.
3. The audio time-domain waveform plotting method of claim 2, wherein the width of the drawing view is set to 3 times Uw.
4. An audio time-domain waveform plotting system, characterized by comprising:
an audio data acquisition module, configured to acquire the PCM data of audio files;
a sampling point data calculation module, configured to determine the audio sampling rate and waveform sampling duration, and to calculate the value of the sampling point data based on the PCM data; the value of the sampling point data is the root mean square of the PCM data of the sampling point, and the amount of PCM data n of the sampling point is equal to the audio sampling rate × the waveform sampling duration;
a waveform data file generation module, configured to store the sampling point data as a waveform data file;
a data conversion module, configured to convert the sampling point data of the waveform data file into view point coordinate data, in which: the x-coordinate of the view point coordinate data is incremented by screen pixels; the y-coordinate of the view point coordinate data satisfies rms / maxRms = y / (h / 2), where rms is the value of the current sampling point data, maxRms is the maximum value of the sampling point data of the audio file, and h is the height of the drawn view; the flip coordinate y' of the view point coordinate data is obtained by flipping the ordinate y vertically about the horizontal center line of the drawn view according to a flip formula in which (x, y) are the coordinates of the current view point, (x', y') are the coordinates of the view point after flipping, x = x', and θ is 180°; the horizontal coordinate Wx of the drawn view on the sliding control changes according to the playback time of the audio, Wx = Px - Uw, where Px / Cw = t / duration, Px is the horizontal coordinate of the display area of the sliding control, Cw is the width of the content view, Cw = number of sample data × number of sample data points drawn per screen pixel, t is the current playback time of the audio, duration is the total duration of the audio file, and Uw is the width of the sliding control; when the user zooms out of the waveform, the zoom ratio scale of the sliding control is less than 1 and only a portion of the sampled data is drawn, the number of sampled data points m' being equal to scale × the total number of sampled data points m, sampled at intervals of 1 / scale, where the sampling interval can be rounded down if 1 / scale is not an integer; when the user zooms in, scale > 1 and all sampled data can be drawn; the horizontal coordinate interval between adjacent waveform points is scale; and
a waveform drawing module, configured to connect coordinate points and draw waveforms;
wherein the waveform data file format includes a file header and sampling point data, and the file header includes an identification string and sampling duration; and the view structure of the waveform graph includes a sliding control, a drawing view, and a content view, the sliding control being used to respond to user gestures, the drawing view and the content view both being subviews of the sliding control, and the width of the content view changing in response to user gestures.
5. A computer-readable storage medium for plotting audio time-domain waveform data files, on which one or more computer programs are stored, characterized in that, when the one or more computer programs are executed by a computer processor, they perform the method according to any one of claims 1 to 3.
Citation Information
Patent Citations
Method and device for dubbing audio and visual digital media
CN105827997A
Audio oscillogram drawing method and device, electronic equipment and storage medium
CN114491142A