Aim:
To develop a robust and efficient system for detecting Android malware using machine learning and deep learning models trained on the CICMalDroid2020 dataset.
Abstract:
Android is the most widely used mobile operating system, which also makes it a major target for malware. An existing system, PermGuard, detects Android malware by examining how apps use permissions and whether those permissions could be misused. It relies on a learning method that improves over time and has shown high accuracy in identifying malicious apps. However, because it focuses mainly on permissions, it may be less effective at distinguishing between different types of malware. This research proposes a behavior-based approach using the CICMalDroid2020 dataset, which includes 15 types of Android malware. We train and compare several machine learning models (XGBoost, LightGBM, and Random Forest) alongside a deep learning model, Bi-LSTM, which can capture sequential patterns in app behavior. Each model is evaluated using accuracy, precision, recall, and F1-score, with the goal of finding the model best suited to real-time malware detection. The study also compares the existing permission-based method with the proposed behavior-based approach. Overall, we aim to make Android malware detection more accurate, more flexible, and ready for real-world use.
Proposed System:
The proposed system uses the CICMalDroid2020 dataset to classify Android malware into 15 families. Informative syscall subsequences are identified using the information gain method, and each selected subsequence is encoded as a binary feature (present or absent in an app's trace) to improve model efficiency. Three machine learning models (XGBoost, LightGBM, and Random Forest) and a Bi-LSTM deep learning model are trained on this data. The models are compared on accuracy, precision, recall, and F1-score to identify the best-performing one. The approach is intended to address dataset imbalance, noise, and obfuscation, and to remain scalable as new malware families are added.
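The feature-selection step described above can be sketched as follows. This is a minimal illustration, not the project's actual pipeline: it assumes that "syscall subsequences" are contiguous n-grams of syscall names and that the binary representation marks whether each selected n-gram occurs in a trace. The function names, the toy traces, and the labels are all hypothetical.

```python
import math
from collections import Counter

def ngrams(trace, n):
    """Return the set of contiguous length-n syscall subsequences in a trace."""
    return {tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)}

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(present, labels):
    """Information gain of a binary feature: H(Y) - H(Y | feature)."""
    with_f = [y for p, y in zip(present, labels) if p]
    without_f = [y for p, y in zip(present, labels) if not p]
    n = len(labels)
    conditional = (len(with_f) / n) * entropy(with_f) \
        + (len(without_f) / n) * entropy(without_f)
    return entropy(labels) - conditional

def select_features(traces, labels, n=2, top_k=3):
    """Rank every n-gram by information gain and keep the top_k."""
    per_trace = [ngrams(t, n) for t in traces]
    candidates = sorted(set().union(*per_trace))
    scored = [(information_gain([g in s for s in per_trace], labels), g)
              for g in candidates]
    # Highest gain first; ties are broken alphabetically for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [g for _, g in scored[:top_k]]

def to_binary_vector(trace, selected, n=2):
    """Encode a trace as 0/1 flags, one per selected n-gram."""
    grams = ngrams(trace, n)
    return [1 if g in grams else 0 for g in selected]

# Toy data (hypothetical traces and labels, for illustration only).
traces = [
    ["open", "read", "close"],
    ["open", "read", "write", "close"],
    ["socket", "connect", "sendto"],
    ["socket", "connect", "sendto", "close"],
]
labels = ["benign", "benign", "sms", "sms"]

selected = select_features(traces, labels, n=2, top_k=2)
vectors = [to_binary_vector(t, selected) for t in traces]
```

The resulting 0/1 vectors could then be passed to any of the tree-based classifiers named above. A sequence model such as Bi-LSTM would more likely consume the ordered syscall trace itself, but that is also an assumption here.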
Advantages:
- High accuracy in multiclass malware classification.
- Reduced training data requirements through binary feature representation.
- Robustness against noise and dataset imbalances.
- Scalability for future incorporation of new malware families.
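As a rough illustration of how the comparison metrics might be computed for a multiclass problem with imbalanced families, the sketch below uses macro averaging, which weights every class equally regardless of its size. Macro averaging is an assumption made here, since the source does not say which averaging scheme is used. The function name and the toy labels are hypothetical.

```python
from collections import Counter

def macro_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1-score."""
    classes = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed

    precisions, recalls, f1s = [], [], []
    for c in classes:
        predicted_c = tp[c] + fp[c]
        actual_c = tp[c] + fn[c]
        prec = tp[c] / predicted_c if predicted_c else 0.0
        rec = tp[c] / actual_c if actual_c else 0.0
        f1 = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    k = len(classes)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / k,
        "recall": sum(recalls) / k,
        "f1": sum(f1s) / k,
    }

# Toy predictions (placeholder class labels, for illustration only).
y_true = ["adware", "adware", "banking", "sms", "sms", "sms"]
y_pred = ["adware", "banking", "banking", "sms", "sms", "adware"]
scores = macro_metrics(y_true, y_pred)
```

Applying the same function to each model's predictions on a shared held-out test set would give directly comparable numbers across XGBoost, LightGBM, Random Forest, and Bi-LSTM.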





