Abstract
Anyone affected by fire loss needs clear and accurate answers to move toward closure, and the proposed method aims to give the best possible support in providing them. Forensic analysis of fire debris is an important factor in cause and origin investigations. The most expensive, difficult, subjective, and critical aspect of this analysis is the qualitative classification of data: does the submitted fire debris contain fatty acids prone to spontaneous heating?
The ASTM E2881 methodology does not require the practitioner to assess the confidence of a prediction or to report error rates on quality assurance evaluations. The goal of this thesis is to develop a different approach to classification, in which machine learning performs the same classification task under a variety of preprocessing methods in order to report confidence transparently and establish competency. The training data were generated from spontaneous heating samples, neat exemplars that contained unsaturated fatty acids, real-life case samples submitted to a forensic laboratory, and pyrolyzed substrates, for a total of 310 samples. The machine learning models consistently classified fatty acids in fire debris correctly across a wide range of substrates and vegetable oils. Several preprocessing methods were explored; the best-performing model used Target Compound preprocessing based on accepted chemometric principles, such as qualifier-to-quantifier ion ratios and peak integration. The advantages of a machine-learning-based classification method include greater consistency, improved analysis speed, a larger experience base, dynamic quantitative thresholding in classification, low cost of maintenance, and the ability to work continuously without pay.
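As an illustration of the qualifier-to-quantifier ion ratio principle mentioned above, the following minimal sketch computes the ratio of two integrated peak areas and checks it against a reference ratio. The function names, the peak areas, and the 20% relative tolerance are assumptions chosen for illustration; they are not values or code taken from this thesis.

```python
def ion_ratio(qualifier_area: float, quantifier_area: float) -> float:
    """Return the qualifier-to-quantifier ratio of two integrated peak areas."""
    if quantifier_area <= 0:
        raise ValueError("quantifier peak area must be positive")
    return qualifier_area / quantifier_area


def ratio_within_tolerance(observed: float, reference: float,
                           rel_tol: float = 0.20) -> bool:
    """Check whether an observed ratio lies within a relative tolerance
    of the reference ratio (rel_tol=0.20 is an assumed example value)."""
    return abs(observed - reference) <= rel_tol * reference


# Hypothetical integrated areas for one target compound.
observed = ion_ratio(qualifier_area=33.0, quantifier_area=100.0)
print(ratio_within_tolerance(observed, reference=0.30))
```

A check of this kind could serve as one feature or filter in a target-compound preprocessing step, under the assumption that a reference ratio is available for each compound.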
This thesis approaches the most subjective part of forensic analysis with a novel methodology that takes advantage of established chemometric and artificial intelligence techniques. The best-performing preprocessing methods and models classified fire debris samples with an average accuracy above 0.999 across multiple replicates and dataset shuffles, where accuracy for binary classification is the number of correct predictions divided by the total number of predictions. This result strongly indicates an improvement over the existing norm for this analysis in terms of documented competency. The methodology established in this thesis was shown to be versatile enough to be extended to other trace forensic analyses, such as ignitable liquid residue analysis.
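The accuracy definition above, averaged over replicates or dataset shuffles, can be sketched as follows. This is a minimal example with hypothetical function names and made-up labels; it does not reproduce the thesis data or evaluation pipeline.

```python
from statistics import mean


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")
    if not y_true:
        raise ValueError("at least one prediction is required")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def mean_accuracy(runs):
    """Average accuracy over several (y_true, y_pred) pairs,
    for example one pair per replicate or dataset shuffle."""
    return mean(accuracy(t, p) for t, p in runs)


# Made-up labels: 1 = fatty acids present, 0 = absent.
runs = [
    ([1, 0, 1, 1], [1, 0, 1, 1]),  # all four predictions correct
    ([0, 1, 1, 0], [0, 1, 0, 0]),  # three of four predictions correct
]
print(mean_accuracy(runs))
```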