Abstract
Chronic kidney disease (CKD) is a significant global health problem with high morbidity and mortality. It often progresses without noticeable symptoms until it reaches advanced stages, leading to irreversible kidney failure and an increased risk of cardiovascular disease, high blood pressure, and other serious complications. Early and accurate diagnosis of CKD can significantly improve patient outcomes and reduce healthcare costs. This project uses machine learning models to improve the prediction and diagnosis of CKD. We evaluate K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), Gradient Boosting Classifier (GBC), XGBoost, and LightGBM. The dataset comes from the University of California, Irvine (UCI) Machine Learning Repository. It was preprocessed before training, including handling of missing data and feature selection, to improve model performance. Our approach is distinguished by its comprehensive comparison of ensemble models and traditional classifiers, emphasizing the benefits of data-driven insights in healthcare. The findings show that ensemble models, such as Random Forest, Gradient Boosting, XGBoost, and LightGBM, consistently achieve high accuracy, precision, and recall, making them suitable candidates for CKD prediction. This project demonstrates the efficacy of machine learning models in diagnosing CKD and emphasizes the importance of data preprocessing, feature selection, and model interpretability in making these tools practical for clinical settings. The results suggest a promising step toward automated and reliable CKD diagnosis, potentially aiding healthcare professionals in early intervention and personalized treatment planning.