Aim:
        To develop a robust and efficient system for detecting Android malware by leveraging informative syscall subsequences, advanced machine learning, and deep learning models trained on the CICMalDroid2020 dataset.
Abstract:
        Android holds more than 70% of the global smartphone market, which makes it a prime target for malware. Static detection techniques often fail against code obfuscation, whereas the system calls (syscalls) an application issues at runtime are much harder for an attacker to disguise. Current syscall-based detection systems rely on numerical features such as syscall frequencies and transition probability matrices, which require large datasets and are vulnerable to noise and outliers.
        This study proposes an approach that aims to improve efficiency and accuracy by using binary representations of syscall subsequences selected with the information gain method. Using the CICMalDroid2020 dataset, it trains several machine learning models (XGBoost, LightGBM, and Random Forest) and a Bi-LSTM-based deep learning model to classify 15 malware families. The results are compared to identify the best-performing model, which can then be deployed for real-time malware detection.
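The binary representation mentioned above can be illustrated with a minimal sketch. It assumes that a "subsequence" means a contiguous run of syscall names in a trace, which the text does not specify; the syscall names, the `selected` list, and the helper names are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Subsequence = Tuple[str, ...]


def contains_subsequence(trace: Sequence[str], subseq: Subsequence) -> bool:
    """Return True if `subseq` occurs as a contiguous run inside `trace`."""
    n = len(subseq)
    return any(tuple(trace[i:i + n]) == subseq
               for i in range(len(trace) - n + 1))


def binary_features(trace: Sequence[str],
                    selected: Sequence[Subsequence]) -> List[int]:
    """Encode a trace as one 0/1 flag per selected subsequence.

    Only presence is recorded, not how often the subsequence occurs.
    """
    return [1 if contains_subsequence(trace, s) else 0 for s in selected]


# Hypothetical subsequences that a selection step might have kept.
selected = [("open", "read"), ("socket", "connect"), ("fork", "execve")]

# Hypothetical syscall trace recorded for one application.
trace = ["open", "read", "close", "socket", "connect", "sendto"]

vector = binary_features(trace, selected)
print(vector)  # [1, 1, 0]
```

Because each feature records only whether a subsequence appears, a few spurious extra calls in a trace do not change the vector, which is the intuition behind the claimed robustness to noise.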
Existing System:
        Existing systems for Android malware detection predominantly rely on static analysis, which involves examining the application’s source code or binary structure to detect malicious patterns. These methods are effective against known malware but fail when the malware employs obfuscation techniques, such as encryption or polymorphism, to hide malicious intent.
        Dynamic analysis techniques such as syscall-based detection show promise, but most approaches depend on numerical features such as syscall frequencies or transition probability matrices. Models such as Random Forest, Decision Tree, Logistic Regression, and K-Nearest Neighbors have reported accuracies between 98% and 99% for binary (malware versus benign) classification. However, these systems face significant challenges, including:
- Limited ability to handle code obfuscation in static analysis.
- High dependency on large datasets for training machine learning models.
- Susceptibility to noise and outliers in syscall data.
- Lack of scalability to multiclass malware classification.
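For reference, the two numerical feature types named above can be sketched as follows. This is an illustrative computation on a made-up trace, not the feature pipeline of any particular existing system; the function names and syscall names are assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, Sequence


def syscall_frequencies(trace: Sequence[str]) -> Dict[str, float]:
    """Relative frequency of each syscall name in the trace."""
    total = len(trace)
    return {name: count / total for name, count in Counter(trace).items()}


def transition_probabilities(trace: Sequence[str]) -> Dict[str, Dict[str, float]]:
    """Estimate P(next syscall | current syscall) from adjacent pairs."""
    pair_counts: Dict[str, Counter] = defaultdict(Counter)
    for current, following in zip(trace, trace[1:]):
        pair_counts[current][following] += 1
    return {
        current: {nxt: c / sum(nexts.values()) for nxt, c in nexts.items()}
        for current, nexts in pair_counts.items()
    }


# Hypothetical trace, for illustration only.
trace = ["open", "read", "read", "close", "open", "read"]

freqs = syscall_frequencies(trace)
probs = transition_probabilities(trace)
print(freqs["read"])           # 0.5
print(probs["read"]["close"])  # 0.5
```

Both feature types are real-valued estimates, so a handful of extra or missing calls shifts every value; this is one way to see why such features are sensitive to noise and need many traces to stabilise.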
Problem Definition:
        Android malware detection systems must keep pace with increasingly sophisticated malware. Existing methods that rely on static analysis are ineffective against malware that employs code obfuscation or encryption. Syscall-based dynamic analysis offers better detection capability, but it relies heavily on numerical features, which are sensitive to noise and require extensive datasets for effective training.
        Furthermore, most current solutions focus on binary classification and neglect the need for detailed multiclass classification of diverse malware families. These limitations call for robust techniques that represent and analyze syscall data efficiently while maintaining high accuracy across multiple malware families.
Proposed System:
        The proposed system uses the CICMalDroid2020 dataset to classify Android malware into 15 families. Informative syscall subsequences are identified with the information gain method, and each application is then represented by binary features that record whether a selected subsequence is present, which improves model efficiency. Several machine learning models (XGBoost, LightGBM, and Random Forest) and a Bi-LSTM deep learning model are trained on this data. The models are compared on accuracy, precision, recall, and F1-score to identify the best-performing one. This approach is intended to address dataset imbalance, noise, and obfuscation while remaining scalable to future use cases.
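The information gain step can be sketched as below. This is a minimal illustration, assuming each candidate subsequence has already been turned into a 0/1 presence flag per application; the class labels, candidate names, and function names are hypothetical, and the actual selection procedure and thresholds used in the project are not stated in the text.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(present: Sequence[int], labels: Sequence[str]) -> float:
    """Entropy reduction from splitting samples on a 0/1 presence flag."""
    total = len(labels)
    with_feature = [y for x, y in zip(present, labels) if x]
    without_feature = [y for x, y in zip(present, labels) if not x]
    conditional = (len(with_feature) / total) * entropy(with_feature) \
        + (len(without_feature) / total) * entropy(without_feature)
    return entropy(labels) - conditional


def rank_by_gain(candidates: Dict[str, Sequence[int]],
                 labels: Sequence[str]) -> List[str]:
    """Order candidate subsequences from most to least informative."""
    return sorted(candidates,
                  key=lambda name: information_gain(candidates[name], labels),
                  reverse=True)


# Hypothetical labels and presence flags for four applications.
labels = ["adware", "adware", "banking", "banking"]
candidates = {
    "socket>connect": [1, 1, 0, 0],  # separates the two classes perfectly
    "open>read": [1, 0, 1, 0],       # carries no class information
}

ranking = rank_by_gain(candidates, labels)
print(ranking)  # ['socket>connect', 'open>read']
```

In this toy case the first candidate has a gain of 1 bit and the second a gain of 0, so only the first would survive a top-k cut; the retained subsequences then define the binary feature vector fed to the classifiers.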
Advantages:
- High accuracy in multiclass malware classification.
- Reduced training data requirements through binary feature representation.
- Robustness against noise and dataset imbalances.
- Scalability for future incorporation of new malware families.
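Claims about multiclass accuracy and robustness to imbalance are usually checked with per-class precision, recall, and F1-score, the comparison metrics named for the proposed system. The sketch below computes them from scratch on made-up labels; macro averaging is an assumption here, chosen because it weights small families equally, and the text does not say which averaging the project uses.

```python
from typing import Dict, Sequence, Tuple


def per_class_metrics(y_true: Sequence[str],
                      y_pred: Sequence[str]) -> Dict[str, Tuple[float, float, float]]:
    """Return {class: (precision, recall, f1)} for every class seen."""
    report = {}
    for cls in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == cls and p == cls)
        fp = sum(1 for t, p in pairs if t != cls and p == cls)
        fn = sum(1 for t, p in pairs if t == cls and p != cls)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[cls] = (precision, recall, f1)
    return report


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of samples whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(report: Dict[str, Tuple[float, float, float]]) -> float:
    """Unweighted mean of per-class F1, so small classes count equally."""
    return sum(f1 for _, _, f1 in report.values()) / len(report)


# Hypothetical true and predicted family labels for six applications.
y_true = ["a", "a", "b", "b", "c", "c"]
y_pred = ["a", "b", "b", "b", "c", "a"]

report = per_class_metrics(y_true, y_pred)
overall_accuracy = accuracy(y_true, y_pred)
overall_macro_f1 = macro_f1(report)
```

Applying the same functions to each trained model's predictions on a held-out split gives a like-for-like comparison across XGBoost, LightGBM, Random Forest, and the Bi-LSTM.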