Aim:
To improve the assignment accuracy of earlier Twitter spam detection methods by using machine learning classifiers.
Synopsis:
Social networking sites have become very popular in recent years. Users rely on them to find new friends and to update existing friends on their latest thoughts and activities. Among these sites, Twitter is the fastest growing. Its popularity also attracts many spammers, who flood legitimate users’ accounts with large volumes of spam messages. In this paper, we discuss some user-based and content-based features that differ between spammers and legitimate users. Then, we use these features to facilitate spam detection. Using the API methods provided by Twitter, we crawled active Twitter users, their followers/following information and their most recent 100 tweets. Then, we evaluated our detection scheme based on the suggested user-based and content-based features. Our results compare four classifiers and indicate which of them produce the best results.
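As a rough illustration of what user-based and content-based features might look like, the sketch below computes a few of them for one crawled account. The `Account` container, the feature names and the specific features chosen (follower ratio, URLs, mentions and hashtags per tweet, fraction of duplicate tweets) are assumptions made for this example; the synopsis does not list the exact features used.

```python
import re
from dataclasses import dataclass

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")


@dataclass
class Account:
    """Hypothetical container for one crawled account."""
    followers: int
    following: int
    tweets: list  # most recent tweets as plain strings


def user_features(account):
    """User-based features derived from follower/following counts."""
    total = account.followers + account.following
    return {
        "followers": account.followers,
        "following": account.following,
        # Share of the account's connections that are followers;
        # 0.0 when the account has no connections at all.
        "follower_ratio": account.followers / total if total else 0.0,
    }


def content_features(account):
    """Content-based features averaged over the account's tweets."""
    tweets = account.tweets
    n = len(tweets)
    if n == 0:
        return {
            "urls_per_tweet": 0.0,
            "mentions_per_tweet": 0.0,
            "hashtags_per_tweet": 0.0,
            "duplicate_fraction": 0.0,
        }
    # Lowercase and collapse whitespace so near-identical tweets compare equal.
    normalized = [" ".join(t.lower().split()) for t in tweets]
    return {
        "urls_per_tweet": sum(len(URL_RE.findall(t)) for t in tweets) / n,
        "mentions_per_tweet": sum(len(MENTION_RE.findall(t)) for t in tweets) / n,
        "hashtags_per_tweet": sum(len(HASHTAG_RE.findall(t)) for t in tweets) / n,
        # Fraction of tweets that repeat an earlier tweet of the same account.
        "duplicate_fraction": 1 - len(set(normalized)) / n,
    }


def extract_features(account):
    """Merge both feature groups into one flat dictionary."""
    return {**user_features(account), **content_features(account)}
```

Calling `extract_features` on each crawled account would give one feature dictionary per user, which could then be passed to whichever classifiers are being compared.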
Existing System:
Stream clustering methods have been used repeatedly for spam filtering to categorize incoming messages/tweets into spam and non-spam clusters. These methods assume that each cluster contains a number of neighboring small (micro) clusters, each with a symmetric distribution. This assumption does not necessarily hold, however, and large micro-clusters may have asymmetric distributions. To improve the assignment accuracy of these earlier methods in their online phase, we suggest replacing them with machine learning classifiers.
Proposed System:
We discuss some user-based and content-based features that differ between spammers and legitimate users. Then, we use these features to facilitate spam detection. Using the API methods provided by Twitter, we crawled active Twitter users, their followers/following information and their most recent 100 tweets. Then, we evaluated our detection scheme based on the suggested user-based and content-based features. First, natural language processing (NLP) techniques were used extensively to create a new, comprehensive dataset with a wide range of content-based features. Next, we evaluated four classifiers and compared their accuracy. Finally, we created a web application using Flask that classifies a message as spam or ham.
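The classification and accuracy-evaluation steps can be sketched as follows. This is a minimal illustration only: the source does not name the four classifiers, so a small multinomial Naive Bayes text classifier is used here as a stand-in, and the class name, the tokenizer and the toy messages are all hypothetical.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return TOKEN_RE.findall(text.lower())


class NaiveBayesSpamFilter:
    """Illustrative stand-in classifier; not one named in the source."""

    LABELS = ("spam", "ham")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()
        self.vocab = set()

    def fit(self, texts, labels):
        """Count tokens per class from labelled messages."""
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        """Return 'spam' or 'ham', whichever has the higher log-probability."""
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.LABELS:
            if self.doc_counts[label] == 0:
                continue  # class never seen in training
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for tok in tokens:
                # Laplace (add-one) smoothing avoids log(0) for unseen words.
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, texts, labels):
    """Fraction of messages whose predicted label matches the true label."""
    correct = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return correct / len(labels)


# Toy data, invented purely for illustration.
train_texts = [
    "win free money now",
    "free prize click here",
    "lunch at noon tomorrow",
    "see you at the meeting",
]
train_labels = ["spam", "spam", "ham", "ham"]
model = NaiveBayesSpamFilter().fit(train_texts, train_labels)
```

Computing `accuracy` for each candidate model on a held-out set is one simple way to compare classifiers. A function such as `model.predict` is also the kind of call a Flask route could wrap to return "spam" or "ham" for a submitted message.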