Abstract
Because collecting data is challenging and costly in time and resources, AI developers often rely on pre-collected online datasets to train their models. These resources let developers build AI models quickly without spending time collecting the data themselves. However, online data resources have introduced new cybersecurity threats, since an attacker can embed poisoned data in a dataset that goes undetected by the developers. Such data poisoning allows the attacker to take control of the model or cause it to produce wrong outputs. This project aims to defend against and prevent data poisoning attacks on AI by pre-checking and removing the poisoned data, or by reducing its effect on the AI model's predictions.