This project predicts hypothyroidism and hyperthyroidism using machine learning. The data comes from the UCI Machine Learning Repository. It is preprocessed, and several machine learning algorithms are then applied to predict the disease. The project is divided into five parts:
- Data Preprocessing
- Exploratory Data Analysis
- Model Building
- Model Evaluation
- Model Explanation
Preprocessing consists of removing missing values, encoding categorical variables, and scaling the data. Exploratory Data Analysis is used to understand the data and the relationships between the features. Models are built with several machine learning algorithms, such as Logistic Regression, Random Forest, and Gradient Boosting. Each model is evaluated using precision, recall, F1-score, and accuracy. The results are then compared and the best model is selected.
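The workflow described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the synthetic table, the column names (`tsh`, `age`, `sex`, `label`), and the helper names `make_toy_data`, `build_preprocessor`, and `evaluate_models` are all assumptions made for the example, and the real UCI thyroid data has different columns.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def make_toy_data(n_rows=400, seed=0):
    """Build a small synthetic table; every column here is hypothetical."""
    rng = np.random.default_rng(seed)
    tsh = rng.normal(3.0, 2.0, n_rows)
    age = rng.integers(18, 90, n_rows).astype(float)
    sex = rng.choice(["F", "M"], n_rows)
    label = (tsh + rng.normal(0.0, 0.5, n_rows) > 4.5).astype(int)
    frame = pd.DataFrame({"tsh": tsh, "age": age, "sex": sex, "label": label})
    # Blank out some ages so the missing-value step has something to remove.
    frame.loc[rng.choice(n_rows, 20, replace=False), "age"] = np.nan
    return frame


def build_preprocessor():
    """Scale numeric columns and one-hot encode the categorical column."""
    return ColumnTransformer([
        ("num", StandardScaler(), ["tsh", "age"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["sex"]),
    ])


def evaluate_models(frame, seed=0):
    """Fit three classifiers and return one row of metrics per model."""
    frame = frame.dropna()  # remove rows with missing values
    X = frame.drop(columns="label")
    y = frame["label"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    models = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "random_forest": RandomForestClassifier(random_state=seed),
        "gradient_boosting": GradientBoostingClassifier(random_state=seed),
    }
    rows = []
    for name, model in models.items():
        pipeline = Pipeline([("preprocess", build_preprocessor()), ("model", model)])
        pipeline.fit(X_train, y_train)
        y_pred = pipeline.predict(X_test)
        precision, recall, f1, _ = precision_recall_fscore_support(
            y_test, y_pred, average="binary", zero_division=0
        )
        rows.append({
            "model": name,
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "accuracy": accuracy_score(y_test, y_pred),
        })
    # Sort so the best model by F1-score comes first.
    return pd.DataFrame(rows).sort_values("f1", ascending=False).reset_index(drop=True)


if __name__ == "__main__":
    print(evaluate_models(make_toy_data()))
```

Fitting the scaler and encoder inside the pipeline means they only ever see the training split, which avoids leaking test-set statistics into preprocessing. Ranking by F1-score is one possible selection rule, chosen here for illustration only.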
The project is implemented in Python using Jupyter Notebook. The libraries used are pandas, numpy, matplotlib, seaborn, scikit-learn, and xgboost.
$ git clone
To run the project you need:
- Python 3.6+
- Pip 3+
- VirtualEnvWrapper is recommended but not mandatory
$ pip install -r requirements.txt
This project uses SHAP for explainable AI. SHAP (SHapley Additive exPlanations) is a game-theoretic approach to explaining the output of any machine learning model. It connects optimal credit allocation with local explanations using the classic Shapley values from game theory and their related extensions.
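To make the idea of credit allocation concrete, the sketch below computes exact Shapley values for a tiny hand-written scoring function using only the standard library. It does not use the SHAP library or reproduce its API. The function `toy_model`, the instance, and the all-zero baseline are hypothetical, and "removing" a feature is modeled here by replacing it with its baseline value, which is one common convention among several.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values for a set function over n_features features."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            # Weight of a coalition of this size in the Shapley formula.
            weight = (
                factorial(size) * factorial(n_features - size - 1)
                / factorial(n_features)
            )
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                # Marginal contribution of feature i when added to the coalition.
                gain = value_fn(coalition | {i}) - value_fn(coalition)
                phi[i] += weight * gain
    return phi


def toy_model(x):
    """Hypothetical scoring function with one interaction term."""
    return 2.0 * x[0] + 1.0 * x[1] + 0.5 * x[0] * x[2]


def make_value_fn(model, instance, baseline):
    """Features in the coalition take the instance value, the rest the baseline."""
    def value_fn(coalition):
        mixed = [
            instance[j] if j in coalition else baseline[j]
            for j in range(len(instance))
        ]
        return model(mixed)
    return value_fn


instance = [1.0, 2.0, 4.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(make_value_fn(toy_model, instance, baseline), len(instance))

if __name__ == "__main__":
    print(phi)
```

For this toy setup the attributions come out as approximately 3, 2, and 1, which sum to the difference between the model output on the instance (6) and on the baseline (0). The interaction term's contribution is split evenly between the two features involved. Exact enumeration grows exponentially with the number of features, which is why practical tools rely on approximations or model-specific algorithms.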
Cavalcante, C. M. V., and Rosana C. B. Rego. "Early prediction of hypothyroidism based on feature selection and explainable artificial intelligence." In: Simpósio Brasileiro de Computação Aplicada à Saúde (SBCAS), 2024, Goiânia. Anais do XXIV Simpósio Brasileiro de Computação Aplicada à Saúde, 2024. pp. 49-60.
Cavalcante, C. M. V., and Rosana C. B. Rego. "Explainable AI Diagnosis for Hypothyroidism." In: 21st IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), 2024, Natal, Brazil.
HypoAssist: software for the early prediction of hypothyroidism based on feature selection and explainable artificial intelligence. The software is available at HypoAssist©: Diagnostic Assistant for Hypothyroidism.
This work received financial support through a Scientific Initiation scholarship and UFERSA/PROPPG 65/2022 (PAPC).