Abstract
Managing a network file share is time-consuming and costly, and routinely archiving stale files from the share is a difficult task for information technology professionals. MLFAT provides a mechanism that makes identifying and archiving stale files easy and convenient. Using a combination of a knowledge-based system and machine learning, MLFAT is able to assist information technology professionals regardless of the physical hardware their network runs on. The knowledge-based system provides a bootstrap engine for MLFAT. It allows the information technology professional to assign categories, with associated retention values, to files and folders in the network share. It then provides a propagation mechanism that distributes the categories throughout the file system. This propagation converts the initially unlabeled file system into a labeled file system, allowing supervised learning. The machine learning component uses a support vector machine learner. The support vector machine trains on a subset of the file system, generates a decision function, and then uses the decision function to classify a subset of the file system. MLFAT presents the findings of the decision function to the information technology professional, who makes the final decision on archiving a file or folder. MLFAT's combination of a knowledge-based bootstrap engine, a machine learner, and a lack of hardware requirements allows it to be deployed in a greater variety of systems than current archival tools. The bootstrap engine grants greater flexibility in deployment than other tools, making it possible to deploy MLFAT in unstructured systems at any point in their lifetime. Additionally, information technology professionals can use the Prolog files generated by the knowledge-based system to run discovery queries in the case of litigation.
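
As a rough illustration of the propagation step, the sketch below labels every path with the category of its nearest categorized ancestor. This inheritance rule is an assumption made for illustration only; the abstract does not specify how MLFAT propagates categories. The function name `propagate_categories`, the category names, and the example paths are all hypothetical.

```python
from pathlib import PurePosixPath


def propagate_categories(paths, seed_labels, default=None):
    """Label each path with the category of its nearest labeled ancestor.

    Assumed rule (not taken from MLFAT): a file or folder inherits the
    category of the closest enclosing folder that was labeled by hand.
    """
    labeled = {}
    for raw in paths:
        path = PurePosixPath(raw)
        # Check the path itself first, then walk up toward the root.
        for candidate in (path, *path.parents):
            category = seed_labels.get(str(candidate))
            if category is not None:
                labeled[raw] = category
                break
        else:
            # No labeled ancestor was found anywhere up the tree.
            labeled[raw] = default
    return labeled


# Hypothetical categories assigned by hand to two folders.
seed_labels = {
    "/share/finance": "finance-7y",
    "/share/finance/scratch": "temp-90d",
}
paths = [
    "/share/finance/2019/ledger.xlsx",
    "/share/finance/scratch/tmp.csv",
    "/share/hr/handbook.pdf",
]
labels = propagate_categories(paths, seed_labels, default="uncategorized")
```

Under this assumed rule, a more specific folder label overrides a broader one, and paths with no labeled ancestor fall back to a default category. The resulting labeled paths are the kind of input a supervised learner such as a support vector machine could then train on.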