Aim:
The primary aim of this study is to develop a robust and accurate auxiliary diagnostic system for breast cancer by integrating machine learning techniques with a hybrid strategy.
Abstract:
Breast cancer has overtaken lung cancer as the most common cancer among women worldwide. This paper takes breast cancer as its research object and proposes a hybrid strategy for processing the data, combined with machine learning methods, to build a more accurate and efficient auxiliary diagnosis model. First, the combined sampling method SMOTE-ENN is used to address class imbalance, and the data are standardized to improve separability. Then, the features of the dataset are initially screened using the mutual information method, and a second round of feature selection is performed using recursive feature elimination (RFE) based on the Logistic Regression algorithm. This reduces the feature dimensionality of the dataset and improves the generalization ability of the model. Finally, four different machine learning models are used for classification, the best combination of parameters for each model is found, and the final results of each model are derived. The experiments use the Wisconsin Diagnostic Breast Cancer (WDBC) dataset. After the data are processed by the hybrid strategy, the Random Forest model gives the best prediction results, with higher accuracy than previous research methods.
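The feature-selection and classification stages described above can be sketched with scikit-learn, whose `load_breast_cancer` loader ships the WDBC data. This is a minimal illustration under stated assumptions, not the study's actual code: the feature counts (20 after mutual information, 13 after RFE) are borrowed from the figures quoted in the Existing System section, the Random Forest settings are arbitrary, and the SMOTE-ENN resampling and parameter search steps are left out to keep the example dependency-light.

```python
from functools import partial

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def build_pipeline(n_mutual_info=20, n_final=13, seed=0):
    """Standardize -> mutual information filter -> RFE (Logistic Regression) -> Random Forest.

    n_mutual_info and n_final are illustrative assumptions, not tuned values.
    """
    return Pipeline([
        # Z-score standardization of every feature
        ("scale", StandardScaler()),
        # First screening: keep the features with the highest mutual information
        ("mutual_info", SelectKBest(
            partial(mutual_info_classif, random_state=seed), k=n_mutual_info)),
        # Second screening: recursive feature elimination driven by Logistic Regression
        ("rfe", RFE(LogisticRegression(max_iter=1000),
                    n_features_to_select=n_final)),
        # Final classifier
        ("forest", RandomForestClassifier(n_estimators=200, random_state=seed)),
    ])


X, y = load_breast_cancer(return_X_y=True)  # 569 samples, 30 features
scores = cross_val_score(build_pipeline(), X, y, cv=5, scoring="accuracy")
print(f"mean 5-fold accuracy: {scores.mean():.3f}")
```

Wrapping every step in one `Pipeline` means the scaler and both selectors are refit on each training fold, so the held-out fold never influences feature selection. If resampling is added, it should likewise be applied to the training folds only.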
Existing System:
Cancer is the leading cause of death in the developed world and the second leading cause in the developing world, killing almost 8 million people a year, so breast cancer prediction requires high accuracy. In the existing system, Z-score standardization is selected to process the dataset, according to the characteristics of the WDBC dataset. The 20 highest-scoring features are first screened by the mutual information method, and 13 of them are then selected by recursive feature elimination based on the XGBoost algorithm to obtain the final feature subset. The Random Forest algorithm is then used to detect breast cancer. The existing system achieves lower accuracy than the proposed system, so we move on to the proposed system.
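Z-score standardization rescales each feature as z = (x - mean) / std, so that every column has zero mean and unit variance. The snippet below is a small standard-library sketch of that formula; the toy values are hypothetical and only stand in for two WDBC-style features on very different scales.

```python
from statistics import fmean, pstdev


def z_score_columns(rows):
    """Standardize each column to zero mean and unit (population) standard deviation."""
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    scaled = []
    for row in rows:
        scaled.append([
            # A constant column has std 0; map it to 0.0 instead of dividing by zero
            (value - mean) / std if std > 0 else 0.0
            for value, mean, std in zip(row, means, stds)
        ])
    return scaled


# Hypothetical toy data: two features with very different ranges
toy = [[10.0, 200.0], [12.0, 400.0], [14.0, 600.0]]
scaled = z_score_columns(toy)
for row in scaled:
    print([round(v, 3) for v in row])
```

After scaling, both columns carry the same values even though the raw ranges differ by more than an order of magnitude, which is why distance-based and linear models benefit from this step.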
Proposed System:
The incidence and mortality of breast cancer are increasing year by year, and it has become the number one cancer among women worldwide. In the medical field, outcomes depend heavily on early detection: the earlier the treatment, the better the clinical outcome for patients. First, in the preprocessing stage, the categorical values found in the data are encoded as numerical values using factorize. A combined SMOTE sampling method is then used to address class imbalance. Next, the features of the dataset are screened using the mutual information method, and recursive feature elimination based on Logistic Regression is used to derive the best feature subset. Finally, four different machine learning models (Random Forest, SVM, KNN, and Gradient Boosting) are used for classification and prediction. The experimental results show that the RF model gives the best prediction results, with high accuracy, which is better than previous research methods.
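To illustrate the oversampling idea behind SMOTE, the sketch below generates synthetic minority-class points by interpolating between a sample and one of its nearest minority neighbours. It is a simplified standard-library illustration, not the implementation used in the study or in any particular library: the function name `smote_sketch`, the toy points, and the default `k` are all hypothetical, and the cleaning stage of a combined method such as SMOTE-ENN is not shown.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: 4 minority samples against an assumed 10 majority samples
minority_points = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote_sketch(minority_points, n_new=10 - len(minority_points), k=3)
print(len(minority_points) + len(new_points), "minority samples after oversampling")
```

Because every synthetic point lies on a segment between two real minority samples, the new data stay inside the region the minority class already occupies rather than duplicating existing rows.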