Aim
The aim of this research is to develop an intelligent system that detects obfuscated privacy malware and classifies it into categories and families. The system uses machine learning and deep learning models trained on the CIC-MalMem-2022 dataset to improve accuracy and to address the challenges posed by data imbalance and complex malware behaviour.
Abstract
Malware that targets user privacy has grown significantly in recent years, fuelled by global digital adoption and the increasing reliance on e-commerce and online services. Privacy-focused malware often uses obfuscation techniques to evade detection, which makes it difficult for traditional systems to identify and classify it. In this study, we use the CIC-MalMem-2022 dataset, which is based on memory dumping analysis, to train three classifiers: a binary classifier that separates benign from malicious samples, a category classifier that identifies benign, spyware, ransomware, and trojan horse samples, and a family classifier that recognizes 16 specific malware families.
To address class imbalance in the dataset, the Synthetic Minority Oversampling Technique (SMOTE) was applied. Models were trained using both traditional machine learning algorithms and a Deep Neural Network (DNN). Experimental results show that the DNN performs best, particularly in multiclass classification tasks, making it a viable option for strengthening malware protection systems.
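The oversampling idea behind SMOTE can be illustrated with a minimal sketch. This is a simplified, standard-library version of the interpolation step, not the implementation used in the study: the function name `smote_like_oversample`, the neighbour count `k`, and the toy feature values are all assumptions made for illustration. Each synthetic sample is placed at a random point on the line segment between a minority sample and one of its nearest minority neighbours.

```python
import math
import random

def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic rows by interpolating between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic

# Toy data (made-up feature values): 6 majority rows versus 3 minority rows.
majority = [[0.1, 0.2], [0.2, 0.1], [0.0, 0.3],
            [0.3, 0.0], [0.15, 0.25], [0.05, 0.1]]
minority = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1]]

new_rows = smote_like_oversample(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_rows
print(len(majority), len(balanced_minority))  # prints: 6 6
```

In practice oversampling of this kind is normally applied to the training split only, so that synthetic rows never leak into the data used for evaluation.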
Existing System
Existing malware detection systems rely primarily on signature-based methods, which are effective against known malware but fail on polymorphic and obfuscated malware. Heuristic approaches offer improvements but are limited in their ability to generalize to new or unseen malware families. Machine learning techniques have shown promise in binary malware detection, but their performance in multiclass scenarios, particularly with obfuscated malware, is inadequate. Training on imbalanced datasets further reduces the effectiveness of these systems, making more advanced approaches necessary.
Problem Definition
Obfuscated privacy malware poses a significant challenge to cybersecurity because it can mask its behaviour and evade traditional detection mechanisms. The complex and polymorphic nature of such malware makes it difficult to identify patterns that distinguish malicious activity. Moreover, the imbalanced distribution of benign and malicious samples in available datasets hinders the performance of machine learning models. Effective detection and classification of such malware therefore demands techniques that can handle both data imbalance and behavioural complexity.
Proposed System
This study proposes a comprehensive approach to malware detection and classification using memory dumping observations from the CIC-MalMem-2022 dataset. The proposed system includes three classifiers: a binary classifier that separates benign from malicious samples, a category classifier that groups samples into benign, spyware, ransomware, and trojan horse classes, and a family classifier that identifies 16 specific malware families. By integrating SMOTE for dataset balancing and using a deep learning architecture, the DNN, the system achieves enhanced detection accuracy. Additionally, a user-friendly web application is developed to provide real-time malware prediction and visualization.
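The three classifiers need three different targets for the same sample. The sketch below shows one way these targets could be derived, assuming each row carries a raw label of the hypothetical form `Category-Family-id` (or plain `Benign`); the helper name `split_label`, the label format, and the example strings are illustrative assumptions and should be checked against the actual dataset files.

```python
def split_label(raw_label):
    """Return (binary, category, family) targets for one raw label string.

    Assumes a hypothetical 'Category-Family-id' format; benign rows are
    assumed to be labelled simply 'Benign'.
    """
    parts = raw_label.split("-")
    category = parts[0]
    if category == "Benign":
        # Benign samples keep the same label at every level.
        return ("Benign", "Benign", "Benign")
    family = parts[1] if len(parts) > 1 else "Unknown"
    return ("Malware", category, family)

# Made-up example labels, one per category named in the text.
raw_labels = ["Benign", "Ransomware-Ako-0001",
              "Spyware-Gator-0002", "Trojan-Zeus-0003"]

for raw in raw_labels:
    binary, category, family = split_label(raw)
    print(f"{raw}: binary={binary}, category={category}, family={family}")
```

Each element of the returned tuple would then serve as the target column for the binary, category, and family classifier respectively.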
Advantages
The proposed system offers several advantages, including:
- High accuracy in malware detection and classification, especially in multiclass scenarios.
- Robustness against data imbalance through the application of SMOTE.
- Scalability to accommodate new malware families and evolving threats.
- User-friendly web interface for real-time malware detection and user interaction.