
Flawed datasets cast doubt on AI tools predicting diabetes and stroke
Researchers found that 124 papers trained stroke- and diabetes-prediction models on two Kaggle datasets that may contain fabricated data. Some of these models are already in clinical use in Indonesia, Spain, and the US, and journals are investigating. Irregular patterns in the data, such as implausibly complete records and duplicated values, cast doubt on the datasets' reliability. The findings have prompted calls for data-source disclosure and for removal of the dubious datasets to prevent flawed clinical decisions.
