System for detecting human activities using hybrid, deep learning-supported feature engineering

The hybrid feature engineering pipeline with data intensity-based selection and CNN-LSTM network enhances HAR systems' performance by focusing on high-intensity features, achieving high accuracy and efficient computation in real-world environments.

DE202026102481U1 · Undetermined · Publication Date: 2026-06-25 · CHOUDHURY NURUL AMIN +4

Patent Information

Authority / Receiving Office
DE · DE
Patent Type
Utility models
Current Assignee / Owner
CHOUDHURY NURUL AMIN
Filing Date
2026-04-29
Publication Date
2026-06-25

AI Technical Summary

Technical Problem

Existing human activity recognition (HAR) systems face limitations in automatic feature development and efficient feature selection, leading to suboptimal performance in real-world environments, despite the use of deep learning models.

Method used

A hybrid feature engineering pipeline using data intensity-based feature selection and a custom CNN-LSTM network for extracting spatial and temporal features, combined with a random forest ensemble classifier, to enhance activity classification.

Benefits of technology

The system achieves high accuracy and efficient computation by prioritizing high-intensity features, capturing spatiotemporal patterns, and improving generalizability in real-world conditions, outperforming benchmark models.

✦ Generated by Eureka AI based on patent content.

Abstract

A system for detecting human activity using hybrid, deep learning-based feature development, consisting of: a sensor module configured for attachment to a user's body, comprising an accelerometer and a gyroscope configured to capture motion and orientation data of human activity; a data acquisition module connected to the sensor module and configured to capture sensor data from the sensor module in real time at a predetermined sampling rate; a central processing unit connected to the data acquisition module and comprising a data preprocessing unit, a data intensity-based feature selection module, and a feature extraction module comprising a hybrid convolutional neural network with long short-term memory network (CNN-LSTM), wherein the central processing unit is configured to preprocess the data received from the data acquisition module, perform data intensity-based feature selection using the root mean square (RMS), and extract spatial and temporal features from the selected features; and a classification module connected to the central processing unit, configured to classify the sensor data into one or more activity categories from the group consisting of walking, running, standing, sitting, climbing stairs and descending stairs, wherein the classification module includes a random forest classifier configured to use the extracted spatial and temporal features for classification, and wherein the random forest classifier captures the hierarchical feature extraction and temporal dependencies from the hybrid CNN-LSTM network.

Description

FIELD OF THE INVENTION

The present disclosure relates to a system for detecting human activities, in particular a system for detecting human activities by means of hybrid, deep learning-based feature extraction. More specifically, the invention relates to a system based on data intensity-based feature selection using a root mean square (RMS) transformed feature set in combination with a custom CNN-LSTM network for extracting low-level spatial and temporal features to enable precise classification of human activities based on the acquired sensor data.

BACKGROUND OF THE INVENTION

The rapid development of wearable and embedded sensor technologies has revolutionized consumer electronics, particularly in healthcare, automation, fitness, and rehabilitation. Integrating these sensors into everyday devices has enabled the development of numerous applications that offer improved performance and an optimized user experience across various sectors. Human Activity Recognition (HAR) is a method for predicting the everyday activities of individuals or groups using sensors and advanced machine learning algorithms. These activities are performed in daily life, including work, leisure, and other pursuits. A distinction is made between static activities such as standing, sitting, and sleeping, which involve minimal movement and low energy expenditure, and dynamic activities such as walking, jogging, and climbing stairs, which require continuous movement and higher energy expenditure. Machine learning algorithms are frequently used for classification tasks in traditional HAR systems and achieve good results. However, the lack of automatic feature development and the need for domain expertise limit their overall performance. Deep learning models are the most widely used learning method in sensor-based HAR systems.
By supporting automatic feature development and classification pipelines, they are superior to traditional machine learning models, provided high-quality data is fed into their complex neural architecture. The ability to capture noise-free data at an optimal sampling rate makes the sensors integrated into smartphones highly efficient for HAR. However, the sensors and their characteristics must be optimally selected to enable accurate predictions in real-world situations. Typically, wearable, sensor-based HAR systems use multiple inertial sensors, such as accelerometers, gyroscopes, and magnetometers, which are attached to various parts of the body to collect time-series data from different everyday activities. These sensors provide numerous characteristics of diverse human activities, which are then preprocessed to generate systematic data for activity recognition. Known HAR models utilize automated feature engineering to process time-series sensor data and extract spatiotemporal features using hybrid neural networks, transformers, and fine-tuned architectures. However, there is a lack of feature engineering pipelines that combine efficient feature selection with automated extraction models for robust activity detection. Most existing HAR work relies on hybrid deep learning and fine-tuned models for feature engineering and classification, which may not deliver optimal performance. Integrating an efficient ensemble learning model can improve the model's effectiveness in activity classification. To overcome the aforementioned limitations, the present invention provides a system comprising a hybrid feature engineering pipeline that includes a data intensity-based feature selection scheme and a custom convolutional neural network with long short-term memory (LSTM) layers for feature extraction.

SUMMARY OF THE INVENTION

This invention relates to a system for detecting human activities using hybrid, deep-learning-based feature extraction.
The feature extraction comprises a novel, data-intensity-based feature selection module that uses the root mean square (RMS) to identify and select high-intensity features while discarding low-intensity features. A hybrid convolutional neural network (CNN) with long short-term memory (LSTM) extracts spatial and temporal features from the selected features. A random forest ensemble classifier analyzes the extracted features to classify human activities such as walking, running, standing, sitting, climbing stairs, and descending stairs. The present disclosure relates to a system for detecting human activities using hybrid, deep-learning-based feature extraction. The system comprises: a sensor module for attachment to a user's body, consisting of an accelerometer and a gyroscope for capturing motion and orientation data of human activities; a data acquisition module connected to the sensor module, which acquires sensor data in real time at a predefined sampling rate; a central processing unit connected to the data acquisition module, comprising a data preprocessing unit, a module for data intensity-based feature selection, and a feature extraction module with a hybrid convolutional neural network (CNN) and long short-term memory network (LSTM). The central processing unit serves to preprocess the data acquired by the data acquisition module, to select features based on data intensity using the root mean square method, and to extract spatial and temporal features from the selected features. A classification module, connected to the central processing unit and configured to divide the sensor data into one or more activity categories selected from the group consisting of walking, running, standing, sitting, climbing stairs, and descending stairs, is also included.
The classification module comprises a random forest classifier configured to use the extracted spatial and temporal features for classification, and the random forest classifier captures the hierarchical feature extraction and temporal dependencies from the hybrid CNN-LSTM network. An objective of the present disclosure is to provide a system for the recognition of human activities by means of hybrid feature development based on deep learning. Another objective of this disclosure is to provide a hybrid, deep learning-based automatic feature extraction combined with data intensity-based feature selection. A feature selection framework is proposed to capture spatiotemporal features while preserving data dependencies for sensor-based HAR systems. The proposed model improves generalizability and optimizes computation time, thereby ensuring efficient performance in real-world environments. Another objective of the present disclosure is the development of an optimized random forest ensemble classifier to capture linear and nonlinear sensor data relationships from the features of the proposed feature engineering framework for improved activity classification. Another objective of the present disclosure is to generate a HAR dataset based on an integrated smartphone sensor in an uncontrolled environment in order to capture raw data of various activities of daily living (ADLs) under real-world conditions and thus enable the simple implementation of the proposed model in consumer electronics devices. Yet another objective of the present disclosure is to conduct detailed empirical experiments on several public datasets to evaluate the performance of the proposed system compared to benchmark ensemble and deep learning models, and to highlight its strengths and effectiveness in various application areas.
To further clarify the advantages and features of the present disclosure, the invention is described in more detail with reference to specific embodiments illustrated in the accompanying drawings. It is understood that these drawings merely show typical embodiments of the invention and are therefore not to be understood as limiting its scope of protection. The invention is described and explained in more detail and with reference to the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

These and other features, aspects, and advantages of the present disclosure will be better understood if the following detailed description is read with reference to the accompanying drawings, in which identical reference symbols represent identical parts, wherein: Fig. 1 illustrates a block diagram of a system for detecting human activities using hybrid, deep learning-based feature extraction according to an embodiment of the present disclosure; and Fig. 2 shows a schematic diagram of the proposed hybrid framework for automatic feature development and classification using deep learning according to an embodiment of the present disclosure. Furthermore, those skilled in the art will recognize that the elements in the drawings are simplified and not necessarily drawn to scale. For example, the flowcharts illustrate the process by highlighting the main steps to facilitate understanding of this disclosure. With regard to the construction of the device, one or more components may be represented in the drawings by conventional symbols. The drawings may show only those specific details relevant to understanding the embodiments of this disclosure, so as not to clutter the drawings with details that are already apparent to those skilled in the art from the description contained herein.

DETAILED DESCRIPTION

To facilitate understanding of the principles of the invention, reference is made below to the embodiment illustrated in the drawings, which is described using specific terms.
It is understood, however, that this does not limit the scope of protection of the invention. Rather, modifications and further developments of the illustrated system, as well as further applications of the inventive principles depicted therein, are conceivable, insofar as they would typically occur to a person skilled in the art in the field of the invention. It will be clear to those skilled in the art that the foregoing general description and the following detailed description are exemplary and explanatory of the invention and are not to be understood as a limitation of it. References to “an aspect”, “another aspect”, or similar phrases in this description mean that a particular feature, structure, or property described in connection with the embodiment is included in at least one embodiment of the present disclosure. Therefore, phrases such as “in one embodiment”, “in another embodiment”, and similar expressions in this description may, but do not necessarily, all refer to the same embodiment. The terms "includes," "comprehensive," or similar expressions denote non-exclusive inclusion. Thus, a procedure or method containing a list of steps does not only include those steps but may also include further steps not explicitly listed or inherent in the procedure or method. Likewise, the statement "includes..." for one or more devices, subsystems, elements, structures, or components, without further limitations, does not preclude the existence of other devices, subsystems, elements, structures, or components. Unless otherwise defined, all technical and scientific terms used herein have the same meanings generally known to those skilled in the art in the field to which this invention belongs. The systems, methods, and examples described herein serve only for illustration and are not to be understood as limiting. Embodiments of the present disclosure are described in detail below with reference to the accompanying drawings. 
The functional units described in this specification are referred to as devices. A device may be implemented in programmable hardware such as processors, digital signal processors, central processing units, FPGAs, PALs, PLDs, cloud processing systems, or similar. Devices may also be implemented in software for execution by various processor types. An identified device may contain executable code and, for example, comprise one or more physical or logical blocks of computer instructions, which may be organized as an object, procedure, function, or other construct. However, the executable files of an identified device need not be physically related; they may consist of different instructions stored in different locations that, when logically combined, constitute the device and fulfill its purpose. The executable code of a device or module can consist of a single instruction or multiple instructions and can even extend across different code sections, applications, and storage media. Similarly, operational data within the device can be identified and represented, and can exist in any suitable form and be organized in any data structure. The operational data can be captured as a single data record or distributed across various storage media and may exist, at least partially, as electronic signals within a system or network. References to “a selected embodiment”, “an embodiment”, or “an embodiment” in this description mean that a particular feature, structure, or property described in connection with the embodiment is included in at least one embodiment of the disclosed subject matter. Therefore, the phrases “a selected embodiment”, “in an embodiment”, or “in an embodiment” appearing at different points in this description do not necessarily refer to the same embodiment. Furthermore, the described features, structures, or properties can be combined in one or more embodiments in any suitable manner. 
The following description contains numerous specific details to enable a comprehensive understanding of the embodiments of the disclosed subject matter. However, a person skilled in the art will recognize that the disclosed subject matter can also be realized without one or more of the specific details or with other methods, components, materials, etc. In other cases, known structures, materials, or processes are not presented or described in detail so as not to obscure aspects of the disclosed subject matter. According to the exemplary embodiments, the disclosed computer programs or modules can be executed in a variety of ways, for example, as an application running in the memory of a device or as a hosted application running on a server and communicating with the device application or browser via various standard protocols such as TCP/IP, HTTP, XML, SOAP, REST, JSON, and other suitable protocols. The disclosed computer programs can be written in programming languages that run either in the device's memory or on a hosted server, such as BASIC, COBOL, C, C++, Java, Pascal, or scripting languages such as JavaScript, Python, Ruby, PHP, Perl, or other suitable programming languages. Some of the described embodiments involve data transmission over a network, such as the transmission of various inputs or files. The network may include, for example, the internet, wide area networks (WANs), local area networks (LANs), analog or digital wired and wireless telephone networks (e.g., PSTN, ISDN, cellular networks, and xDSL), radio, television, cable, satellite, and/or other transmission or tunneling mechanisms for data. It may include multiple networks or subnetworks, each of which may, for example, have a wired or wireless data path. The network may include a circuit-switched voice network, a packet-switched data network, or another network for transmitting electronic data.
For example, it may be based on the Internet Protocol (IP) or Asynchronous Transfer Mode (ATM) and support voice communication using VoIP, Voice over ATM, or similar protocols. In one embodiment, the network comprises a mobile network configured for the exchange of text or SMS messages. Examples of networks include Personal Area Networks (PAN), Storage Area Networks (SAN), Home Area Networks (HAN), Campus Area Networks (CAN), Local Area Networks (LAN), Wide Area Networks (WAN), Metropolitan Area Networks (MAN), Virtual Private Networks (VPN), Enterprise Private Networks (EPN), the Internet, Global Area Networks (GAN), and so on.

Fig. 1 shows a block diagram of a system (100) for detecting human activities using hybrid, deep learning-based feature development according to an embodiment of the present disclosure. The system according to Fig. 1 comprises: a sensor module (102) for attachment to a user's body, which includes an accelerometer and a gyroscope for capturing motion and orientation data of human activities; a data acquisition module (104) connected to the sensor module (102) and acquiring sensor data in real time at a predefined sampling rate; a central processing unit (106) connected to the data acquisition module (104) and comprising a data preprocessing unit (106a), a feature selection module based on data intensity (106b), and a feature extraction module (106c) incorporating a hybrid convolutional neural network with a long short-term memory network (CNN-LSTM).
The central processing unit (106) is configured to preprocess the data received from the data acquisition module (104), to perform feature selection based on data intensity using the root mean square (RMS), and to extract spatial and temporal features from the selected features. The system further comprises a classification module (108) connected to the central processing unit (106) and configured to classify the sensor data into one or more activity categories from the group consisting of walking, running, standing, sitting, climbing stairs and descending stairs, wherein the classification module (108) includes a random forest (RF) classifier configured to use the extracted spatial and temporal features for classification, and wherein the random forest classifier captures the hierarchical feature extraction and temporal dependencies from the hybrid CNN-LSTM network. In one embodiment, the sensor module (102) is configured to acquire sensor data at a sampling rate of one hundred hertz to detect high-intensity activity patterns. The sensor data includes features extracted directly from the sensor module, including acceleration due to gravity along the three spatial axes, linear acceleration along the three spatial axes, gravity along the three spatial axes, rotational velocity along the three spatial axes, and rotation vector along the three spatial axes. In one embodiment, the data preprocessing unit (106a) is configured to detect and remove noise instances from the collected sensor data, identify outliers in the sensor data, analyze the class distribution to detect class imbalances, and prepare the sensor data for input into the feature engineering pipeline without performing any data augmentation. In one embodiment, the data intensity-based feature selection module (106b) calculates the effective intensities of feature sets from the sensor data using a quadratic mean (root mean square) transformation.
Based on these calculated effective intensities, high- and low-intensity features are identified. Subsequently, the optimal features are selected by retaining the high-intensity features and discarding the low-intensity features. The selected optimal features are stored in a processed dataset. The data intensity-based feature selection module (106b) transforms three-dimensional dependent feature sets into quadratic means to enhance the differentiation between high- and low-intensity features. The calculated effective intensities of the feature sets are visualized over time windows. Finally, a maximum intensity filter is applied to automatically select features with the highest variability while discarding noise. In one embodiment, the feature extraction module (106c) with the hybrid convolutional neural network with long short-term memory network (CNN-LSTM) is configured to receive the processed dataset from the data intensity-based feature selection module, extract spatial features from the processed dataset using one-dimensional convolutional layers, extract temporal features from the spatial features using long short-term memory layers, and generate an extracted feature array. The hybrid CNN-LSTM network further includes: a plurality of one-dimensional convolutional layers configured to apply convolution operations with learned kernels and bias terms to extract low-level spatial features; a dropout layer configured to randomly disable a predetermined proportion of input units during training to reduce overfitting; a max-pooling layer configured to reduce dimensionality by selecting maximum values from pooling windows; and a flattening layer configured to convert two-dimensional feature maps into a one-dimensional feature vector for input into the long short-term memory layers. The long short-term memory layers are configured to use memory cells to identify persistent temporal dependencies between activity instances, to obtain temporal sequence information
from the spatial features, and to output temporal feature representations that preserve activity patterns over time. In one embodiment, the classification module (108) comprises a random forest ensemble classifier. This classifier receives the extracted feature array from the hybrid CNN-LSTM network, analyzes linear and nonlinear relationships within the feature array, and classifies the sensor data into one or more activity categories from the group walking, running, standing, sitting, climbing stairs, and descending stairs. The random forest ensemble classifier creates multiple decision trees, each using a random subset of the features from the extracted feature array. Based on these random feature sets, decision boundaries are created to capture nonlinear relationships. The predictions of the decision trees are aggregated to generate a final activity classification. In one embodiment, the central processing unit (106) is additionally configured to store the extracted feature array. The system achieves activity detection with minimal computational overhead by optimizing the feature dimensionality through the data intensity-based feature selection module prior to the deep learning-based feature extraction. The present invention relates to systems for detecting human activities using wearable sensors and deep learning algorithms, in particular a system that uses hybrid feature engineering with data intensity-based feature selection and a convolutional neural network with long short-term memory network for classifying everyday activities in real-world environments.
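As a minimal sketch of the random forest behaviour described in this embodiment, the following Python code builds several one-level decision trees (stumps), each on a bootstrap sample and a random subset of features, and aggregates their predictions by majority vote. The stump learner, the function names, the number of trees, and the toy data are illustrative assumptions; the disclosure does not specify the tree depth, the number of trees, or how the classifier is optimized.

```python
import random
from collections import Counter


def majority(labels):
    """Return the most frequent label in a list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y, features):
    """Fit a one-level tree: pick the (feature, threshold) with the fewest training errors."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue  # a split must send samples to both sides
            l_lab, r_lab = majority(left), majority(right)
            errors = (sum(lab != l_lab for lab in left)
                      + sum(lab != r_lab for lab in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:
        # No valid split (e.g. only one distinct value): always predict the majority class.
        m = majority(y)
        return (features[0], float("inf"), m, m)
    return best[1:]


def fit_forest(X, y, n_trees=25, n_sub=2, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]        # bootstrap sample
        feats = rng.sample(range(len(X[0])), n_sub)     # random feature subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Aggregate the individual tree predictions by majority vote."""
    votes = [l_lab if row[f] <= t else r_lab for f, t, l_lab, r_lab in forest]
    return majority(votes)


# Hypothetical extracted feature vectors (three features each) for two activities.
X = [[0.10, 0.20, 1.0], [0.20, 0.10, 0.9], [0.15, 0.15, 1.1],
     [2.00, 1.50, 3.0], [2.10, 1.60, 3.1], [1.90, 1.40, 2.9]]
y = ["sitting"] * 3 + ["running"] * 3
forest = fit_forest(X, y)
```

A production system would more likely rely on an established library implementation with deeper trees; the stumps here only make the two mechanisms (feature subsampling and vote aggregation) visible.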
In a preferred embodiment, the human activity detection system uses smartphone-based accelerometer and gyroscope sensors in combination with a novel data intensity-based feature selection method and a hybrid CNN-LSTM network for automatic spatiotemporal feature extraction, followed by random forest ensemble classification for accurate identification of everyday activities such as walking, running, standing, sitting, and climbing stairs. Fig. 2 shows a schematic diagram of the proposed hybrid framework for automatic feature development and classification using deep learning according to an embodiment of the present disclosure. Figure 2 illustrates the functionality of the proposed system. It comprises several crucial phases, including data acquisition, preprocessing, and the development of hybrid feature models and activity detection. Each phase of the system is described in detail below. The human activity detection system overcomes the limitations of existing datasets collected in controlled laboratory and simulation environments, which rely on multiple complex sensors and thus increase storage and computational costs. The presented system generates a smartphone-based dataset in a real-world, uncontrolled environment. The sensor module includes an accelerometer and a gyroscope as the primary sensors integrated into the smartphone for data acquisition. These are configured to effectively capture motion and orientation data from various everyday activities. The system uses smartphones to collect raw data from multiple users. In one implementation, the system collects sensor data from twenty users of varying ages, including men and women. The users are between 24 and 35 years old, weigh between 65 and 94 kilograms, and are between 165 and 186 centimeters tall. The smartphone sensor modules are attached to the users' front pockets. This allows the devices to be carried comfortably without restricting their usual activities. 
The smartphones are positioned in the front pockets with the earpiece facing upwards. No straps or cords are used for attachment. The data acquisition module is configured to collect sensor data and store it in CSV format. The system generates a dataset of six everyday activities: walking, running, standing, sitting, climbing stairs, and descending stairs. These activities are performed by different users. The sensor module captures accelerometer and gyroscope data at a sampling rate of 100 Hz to ensure that all activity patterns are captured, even at high intensity. The system extracts a total of 16 features directly from the sensor data, along with class labels for activity classification. The raw features captured in three dimensions include: acceleration due to gravity (features 1, 2, and 3), linear acceleration (features 4, 5, and 6), gravitational force (features 7, 8, and 9), rotational velocity (features 10, 11, and 12), and rotation vector (features 13, 14, and 15), together with a cosine transformation of the thirteenth feature, which is referred to as the sixteenth feature. In one embodiment, three benchmark datasets based on wearable sensors named MotionSense, mHealth, and Gait-Motion are used to validate system performance. The datasets are collected in real-world environments to validate the system for various applications. The system also includes a data preprocessing module that prepares the acquired sensor data for robust classification. This module performs several validation and cleaning operations on the sensor data before feature development and classification. It checks the acquired sensor data for noise, missing values, and corrupted entries. After validation, the system detects no missing or corrupted data in the datasets. To remove interference from the sensor data, the data preprocessing module is configured to remove the first one hundred and the last one hundred entries from each activity category.
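The trimming step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the recordings are taken to be in-memory lists keyed by activity label, the function names are hypothetical, and the handling of recordings shorter than two hundred entries is a choice made here, not something the disclosure specifies.

```python
def trim_edges(samples, n_edge=100):
    """Drop the first and last n_edge samples of one activity recording."""
    if len(samples) <= 2 * n_edge:
        return []  # assumption: a recording this short carries no usable data
    return samples[n_edge:len(samples) - n_edge]


def trim_by_activity(recordings, n_edge=100):
    """Apply the edge trimming separately to every activity category."""
    return {label: trim_edges(rows, n_edge) for label, rows in recordings.items()}


# Hypothetical recordings: integers stand in for rows of sensor readings.
recordings = {"walking": list(range(500)), "sitting": list(range(150))}
trimmed = trim_by_activity(recordings)
```

At the stated sampling rate of 100 Hz, one hundred entries correspond to one second of signal at each end of a recording.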
Removing these entries is necessary because the sensor module generates interference and unwanted data unrelated to the actual data acquisition during the short setup and shutdown phases. The data preprocessing module is configured to check the captured sensor data for outliers. This check finds only negligible outliers in benchmark datasets such as mHealth and MotionSense. However, the generated dataset contains outliers in the range of 0.5% to 1% of all instances. The data preprocessing module is configured to retain these outliers because they do not significantly impact classification performance. The system recognizes that the issue of noise and outliers does not occur in public datasets because these are preprocessed before use. The data preprocessing module also analyzes the class distribution across all datasets to identify class imbalances. The analysis shows that the mHealth dataset does not exhibit any class imbalances. The generated Gait-Motion and MotionSense datasets, however, exhibit class imbalances because they were generated under uncontrolled conditions. The data preprocessing module is configured not to apply data augmentation to resolve this class imbalance. The feature engineering pipeline must therefore be more sophisticated in its analysis and classification of prominent features, while the real-world properties of the data are preserved. After data acquisition and preprocessing, the system uses a data intensity-based feature selection procedure to determine the optimal features for pattern recognition. The data intensity-based feature selection module uses the root mean square (RMS) to calculate the effective intensities of different relative feature sets from the dataset and selects the optimal features using data visualization and a maximum intensity filter.
The root mean square (RMS) value is used because it condenses each three-dimensional dependent feature set into a single quadratic mean that represents the effective intensity across different time windows. The quadratic transformation amplifies the influence of larger measured values, thus highlighting the distinction between high-intensity features (higher values) and low-intensity features (lower values). High-intensity features represent greater variability in the sensor data and are prioritized because they are more likely to capture the relevant underlying patterns of the dataset. High-intensity features have a greater impact on the model's learning ability because they capture more pronounced fluctuations in the raw signals. By retaining high-intensity features, the system ensures that the model focuses on the most informative features for pattern recognition, while discarding low-intensity features that contribute minimally or might represent noise. This makes RMS a reliable metric for feature selection. Data visualization is a key aspect of the data intensity-based feature selection process. By displaying the intensities of the transformed features, the system can visually assess which features exhibit high or low variability over time. This helps identify features with consistently high intensity and thus qualify them for selection in the processed dataset. Subsequently, the maximum intensity filter is applied to select the most important high-intensity features and exclude low-intensity features. This process ensures that only the features with the greatest impact on pattern generalization are retained, thereby improving the overall performance of the classifier. Once the optimal features are stored in the processed dataset, they are fed to the feature extraction module.
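The RMS-based selection can be illustrated with a short Python sketch. Several details are assumptions made for illustration only: the RMS is taken over all x, y, and z values inside non-overlapping windows, the window length is 100 samples, and the maximum intensity filter is modelled as keeping the `keep` feature sets with the highest mean windowed RMS. The disclosure does not state the window length or the exact filter rule, and the function and group names are hypothetical.

```python
import math


def window_rms(x, y, z, window=100):
    """RMS over all x, y, z values in each non-overlapping window of samples."""
    n = min(len(x), len(y), len(z))
    intensities = []
    for start in range(0, n - window + 1, window):
        squares = 0.0
        for i in range(start, start + window):
            squares += x[i] ** 2 + y[i] ** 2 + z[i] ** 2
        intensities.append(math.sqrt(squares / (3 * window)))
    return intensities


def select_high_intensity(groups, window=100, keep=2):
    """Rank three-axis feature sets by mean windowed RMS and keep the strongest ones."""
    intensity = {}
    for name, (x, y, z) in groups.items():
        rms = window_rms(x, y, z, window)
        intensity[name] = sum(rms) / len(rms) if rms else 0.0
    ranked = sorted(intensity, key=intensity.get, reverse=True)
    return ranked[:keep], intensity


# Hypothetical constant signals, chosen so the intensities are easy to check by hand.
groups = {
    "linear_acceleration": ([3.0] * 200, [4.0] * 200, [0.0] * 200),
    "gravity": ([0.1] * 200, [0.1] * 200, [0.1] * 200),
    "rotational_velocity": ([1.0] * 200, [1.0] * 200, [1.0] * 200),
}
selected, intensity = select_high_intensity(groups)
```

Because the values are squared before averaging, a set with a few large readings ranks above a set with many small ones, which matches the stated motivation for the quadratic transformation.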
This module uses a specially developed, resource-efficient hybrid convolutional neural network (CNN) with long short-term memory (LSTM) layers for automated feature extraction in the spatial and temporal domains. The CNN utilizes multiple one-dimensional convolutional layers to extract low-level spatial features from the raw time-series sensor data, where the timestamps represent the temporal evolution. Convolution is performed using a convolution kernel (also called a filter) in combination with a bias term to generate a resulting feature map. From this feature map, various statistical spatial features are derived for classification. To improve generalization and prevent overfitting, the system implements various regularization techniques, including dropout at a predefined rate. During training, dropout randomly sets a portion of the input units to zero. Additionally, a max-pooling layer is used to reduce dimensionality by taking the maximum value from each pooling window of a predefined size. Finally, the flattening operation converts the two-dimensional feature maps into a one-dimensional feature vector. Together, these operations ensure that the convolutional neural network effectively captures and maintains key spatial features while minimizing the risk of overfitting.

To handle long-term temporal dependencies between activity instances, the system integrates an LSTM layer. This layer uses its memory function to identify persistent dependencies between everyday activities. Additionally, a dropout layer is introduced to prevent overfitting and reduce the size of the feature network, thereby minimizing computational overhead. After extracting the spatial and temporal features, the system stores them in an array in NumPy file format and then performs classification using a random forest classifier. This classifier is used because of its ability to capture nonlinear relationships by employing multiple decision trees.
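The convolutional stage described above (convolution with a kernel and bias term, dropout, max pooling, and flattening) can be illustrated with a forward-pass sketch in NumPy. The shapes, the ReLU activation, and the inverted-dropout scaling are assumptions for illustration; the description does not state the layer sizes or activation, and the LSTM stage is not shown.

```python
import numpy as np


def conv1d(x, kernels, bias):
    """Valid 1-D convolution (cross-correlation, as in most deep learning libraries).

    x: (time, channels); kernels: (k, channels, filters); bias: (filters,).
    """
    k = kernels.shape[0]
    steps = x.shape[0] - k + 1
    out = np.empty((steps, kernels.shape[2]))
    for t in range(steps):
        # Sum over the kernel's time and channel axes for every filter.
        out[t] = np.tensordot(x[t:t + k], kernels, axes=([0, 1], [0, 1])) + bias
    return np.maximum(out, 0.0)  # ReLU activation is an assumption


def dropout(x, rate, rng, training=True):
    """Randomly zero a share `rate` of the units during training only."""
    if not training or rate == 0.0:
        return x
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)  # inverted dropout keeps the expected value


def max_pool1d(x, size):
    """Keep the maximum of each non-overlapping pooling window along time."""
    steps = x.shape[0] // size
    return x[: steps * size].reshape(steps, size, x.shape[1]).max(axis=1)


def flatten(x):
    """Convert a two-dimensional feature map into a one-dimensional vector."""
    return x.reshape(-1)


# Toy input: 6 time steps, 2 sensor channels, 4 filters of width 3.
x = np.arange(12, dtype=float).reshape(6, 2)
feature_map = conv1d(x, np.ones((3, 2, 4)), np.zeros(4))
pooled = max_pool1d(feature_map, size=2)
vector = flatten(pooled)
```

In a full pipeline these steps would normally come from a deep learning framework with learned kernels; the explicit loops here only make the arithmetic of each operation visible.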
Each decision tree constructs a decision boundary based on a random subset of the features. This flexibility allows the random forest classifier to effectively capture the hierarchical feature extraction and temporal dependencies of the convolutional neural network with long short-term memory (CNN-LSTM) model, thus adapting to complex patterns in the data.

In one embodiment, benchmark ensemble and deep learning models are integrated alongside the proposed system to compare and analyze performance in detail based on various evaluation metrics such as precision, recall, F1 score, accuracy, and computation time. The benchmark models include Random Forest, Extreme Gradient Boosting, Dense Neural Network, Long Short-Term Memory Network, and a hybrid Convolutional Neural Network with a Long Short-Term Memory Network using a softmax classifier. These models serve as benchmarks because numerous modern human activity detection systems apply these architectures to various sensor-based datasets for feature extraction and efficient activity recognition. The system evaluates precision as the ratio of correctly identified positive cases (true positives) to the sum of true positives and false positives across all activity categories. Recall is evaluated as the ratio of true positives to the sum of true positives and false negatives across all activity categories. The F1 score is calculated as the harmonic mean of precision and recall. Accuracy is determined as the ratio of correct classifications (both true positives and true negatives) to the total number of classifications (correct and incorrect). Computation time is measured as the sum of the training and validation time required by the system to perform activity recognition. The proposed system was tested on a generated dataset as well as three publicly available datasets, namely mHealth, MotionSense, and Gait-Motion.
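The evaluation metrics defined above can be computed as in the following sketch. Macro-averaging over the activity categories is an assumption; the description says only that the ratios are taken "across all activity categories". The function names are hypothetical.

```python
def per_class_counts(y_true, y_pred, label):
    """True positives, false positives, and false negatives for one activity label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    return tp, fp, fn


def macro_metrics(y_true, y_pred):
    """Macro-averaged precision, recall, and F1, plus overall accuracy."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1_scores = [], [], []
    for label in labels:
        tp, fp, fn = per_class_counts(y_true, y_pred, label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # F1 is the harmonic mean of precision and recall.
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1_scores.append(f1)
    n = len(labels)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return {
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1_scores) / n,
        "accuracy": accuracy,
    }


# Toy labels for illustration only; these are not results from the source.
scores = macro_metrics(["walk", "walk", "sit", "sit"], ["walk", "sit", "sit", "sit"])
```

In the multi-class setting, accuracy reduces to the share of instances whose predicted label matches the true label, which is what the last ratio computes.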
When the RMS-transformed intensities were computed using the data intensity-based feature selection scheme, the gravity features exhibited the lowest intensity compared to other features, such as linear acceleration, and were therefore removed from the dataset. The optimal features extracted from the proposed feature engineering pipeline achieved an average accuracy of 99.40% and a peak accuracy of 99.53% on the generated dataset, outperforming the benchmark models by a clear margin. The proposed system demonstrated efficient generalization with a low test loss of 0.078 to 0.115 at a comparatively optimized computation time of 255 to 261 seconds. The proposed model achieved a significantly lower standard deviation of 0.08%, indicating minimal variability and superior stability compared to the benchmark models. The benchmark models Random Forest, Extreme Gradient Boosting, Dense Neural Network, and Long Short-Term Memory Network achieved lower average accuracies of 94.43%, 93.33%, 93.21%, and 94.22%, respectively, with higher computation times due to the lack of low-level hierarchical feature extraction and data intensity-based feature selection. The Convolutional Neural Network with Long Short-Term Memory Network achieved an average accuracy of 96.81%, which was significantly lower than that of the proposed model due to the absence of intensity-based feature selection.

On publicly available datasets, the proposed model demonstrated exceptional generalizability and robustness. In the mHealth dataset, the data intensity-based feature selection procedure removed six gyroscope-based features from the raw dataset. The model achieved a remarkable accuracy of 99.81%, with 100% precision, recall, and F1 score. In the MotionSense dataset, three gravity features were removed from the twelve-feature raw dataset. The model achieved an accuracy of 99.89%, with 100% precision, recall, and F1 score.
In the Gait-Motion dataset, gravity features in three axes were identified as having minimal intensity and were removed. This resulted in an accuracy of 99.12%, with optimal values for precision, recall, and F1 score. Ablation studies confirmed the effectiveness of the critical components. They showed that removing the data intensity-based feature selector reduced the accuracy to 97.21%, removing the CNN-LSTM feature extractor reduced it to 94.05%, removing the random forest classifier reduced it to 97.59%, removing the convolutional neural network component reduced it to 98.01%, and removing the long short-term memory network component reduced it to 95.41%. These results demonstrate that low-level hierarchical feature extraction, combined with effective data intensity-based feature selection, contributed to superior performance and outperformed state-of-the-art sensor-based human activity detection models. The proposed model achieved accuracies of 99.81%, 99.12%, 99.89%, and 99.53% on the mHealth, Gait-Motion, MotionSense, and custom datasets, respectively, thus outperforming all previous reference models. Furthermore, compared to the integrated benchmark models, the proposed model achieved optimized computation times and a low test loss ranging from 0.078 to 0.115.

The drawings and the preceding description illustrate embodiments. Those skilled in the art will recognize that one or more of the described elements can be combined to form a single functional element. Alternatively, certain elements can be divided into several functional elements. Elements of one embodiment can be added to another. For example, the process flows described here can be modified and are not limited to the manner described herein. Furthermore, the actions of a flowchart need not be performed in the sequence shown; nor do all actions necessarily need to be carried out.
Actions that do not depend on other actions can be performed in parallel with the other actions. The scope of protection of the embodiments is in no way limited by these specific examples. Numerous variations, whether explicitly stated in the description or not, such as differences in structure, dimensions, and materials, are possible. The scope of protection of the embodiments is at least as comprehensive as described by the following claims. The advantages, other benefits, and problem solutions have been described above with reference to specific embodiments. However, the advantages, benefits, problem solutions, and any components that can effect or enhance an advantage, benefit, or solution are not to be construed as critical, necessary, or essential features or components of the claims.

REFERENCES

100 A system for detecting human activity using hybrid, deep learning-supported feature engineering
102 Sensor module
104 Data acquisition module
106 Central processing unit
106a Data preprocessing unit
106b Feature selection module based on data intensity
106c Feature extraction module
108 Classification module
202 Calibration
204 Data acquisition
206 Data preprocessing
208 Preprocessed dataset
210 Feature intensity estimation and visualization
212 Suboptimal feature set
214 Dataset with selected features
216 Spatial feature extraction using CNN
218 Temporal feature extraction using LSTM
220 Addition of nonlinearity through dense layer
222 Random forest classifier for classification
224 Activity detection

Claims

1. A system for detecting human activity using hybrid, deep learning-based feature development, consisting of: a sensor module configured for attachment to a user's body, comprising an accelerometer and a gyroscope configured to capture motion and orientation data of human activity; a data acquisition module connected to the sensor module and configured to capture sensor data from the sensor module in real time at a predetermined sampling rate; a central processing unit connected to the data acquisition module and comprising a data preprocessing unit, a data intensity-based feature selection module, and a feature extraction module comprising a hybrid Convolutional Neural Network with Long Short-Term Memory Network (CNN-LSTM), wherein the central processing unit is configured to preprocess the data received from the data acquisition module, perform data intensity-based feature selection using Root Mean Square, and extract spatial and temporal features from the selected features; and a classification module connected to the central processing unit, configured to classify the sensor data into one or more activity categories from the group consisting of walking, running, standing, sitting, climbing stairs and descending stairs, wherein the classification module includes a random forest classifier configured to use the extracted spatial and temporal features for classification, and wherein the random forest classifier captures the hierarchical feature extraction and temporal dependencies from the hybrid convolutional neural network with long short-term memory network.
2. System according to claim 1, wherein the sensor module is configured to acquire sensor data at a sampling rate of one hundred hertz to capture high-intensity activity patterns, and the sensor data includes features extracted directly from the sensor module, including acceleration due to gravity in three spatial axes, linear acceleration in three spatial axes, gravity in three spatial axes, rotational velocity in three spatial axes, and rotation vector in three spatial axes.

3. System according to claim 1, wherein the data preprocessing module is configured to detect and remove noise instances from the collected sensor data, identify outliers in the sensor data, analyze the class distribution to detect class imbalances, and prepare the sensor data for input into the feature engineering pipeline without performing any data augmentation.

4. System according to claim 1, wherein the feature selection module based on data intensity is configured to calculate the effective intensities of feature sets from the sensor data using a root mean square transformation, identify high and low intensity features based on the calculated effective intensities, select optimal features by retaining the high intensity features and discarding the low intensity features, and store the selected optimal features in a processed data set.

5. System according to claims 1 and 4, wherein the data intensity-based feature selection module is further configured to convert three-dimensional dependent feature sets into quadratic means (root mean square values) to enhance the distinction between high-intensity and low-intensity features, visualize the calculated effective intensities of the feature sets over time windows, and apply a maximum intensity filter to automatically select features with the highest variability while discarding features that represent noise.
6. System according to claim 1, wherein the feature extraction module with hybrid Convolutional Neural Network with Long Short-Term Memory Network (CNN-LSTM) is configured to receive the processed data set from the data intensity-based feature selection module, extract spatial features from the processed data set using one-dimensional convolutional layers, extract temporal features from the spatial features using long short-term memory layers, and generate an extracted feature array.

7. System according to claims 1 and 6, wherein the hybrid Convolutional Neural Network with Long Short-Term Memory Network further comprises: a plurality of one-dimensional convolutional layers configured to apply convolution operations with learned kernels and bias terms to extract low-level spatial features; a dropout layer configured to randomly disable a predetermined proportion of the input units during training to reduce overfitting; a max-pooling layer configured to reduce dimensionality by selecting maximum values from pooling windows; and a flattening layer configured to convert two-dimensional feature maps into a one-dimensional feature vector for input into the long short-term memory layers; and wherein the long short-term memory layers are configured to use memory functions to identify persistent temporal dependencies between activity instances, to obtain temporal sequence information from spatial features, and to output temporal feature representations that preserve activity patterns over time.

8. System according to claim 1, wherein the classification module comprises a random forest ensemble classifier configured to receive the extracted feature array from the hybrid convolutional neural network with long short-term memory network, analyze linear and nonlinear relationships within the extracted feature array, and classify the sensor data into one or more activity categories selected from the group consisting of walking, running, standing, sitting, climbing stairs, and descending stairs.
9. System according to claims 1 and 8, wherein the random forest ensemble classifier is configured to construct a plurality of decision trees, each using a random subset of features from the extracted feature array, create decision boundaries based on the random subsets of features to capture nonlinear relationships, and aggregate predictions from the plurality of decision trees to produce a final activity classification.

10. System according to claim 1, wherein the central processing unit is further configured to store the extracted feature array, and the system achieves activity detection with minimal computation time by optimizing the feature dimensionality through the data intensity-based feature selection module prior to the deep learning-based feature extraction.