Abstract
An important component of California’s smog check program is the policy of directing the highest-polluting vehicles to specialized smog check stations for testing. The State Bureau of Automotive Repair selects these ‘directed vehicles’ with a regression model that flags the vehicles most likely to be high polluters, based on past tailpipe emissions readings from vehicles of the same type. Beginning in 2014, however, the testing procedure for a large portion of the vehicles in California will no longer include a tailpipe emissions measurement. The revised procedure will instead rely on a scan of the on-board computer diagnostic and control system, which controls and continuously evaluates the engine and emissions control systems present on most vehicles manufactured since 1996. It will also include a visual inspection of the emissions control devices present on the vehicle. Consequently, there is a need for a regression model that can identify the vehicles most likely to fail, based on the results of the diagnostic scan and the visual inspection. In this thesis, I developed a binomial logistic regression model that predicts which vehicles are highly likely to fail the diagnostic scan or the visual inspection that make up the new inspection procedure. The regression analyses described herein accurately identified a group, comprising approximately 40% of the vehicles subject to smog check inspections, whose likelihood of failure is higher than that of the remaining vehicles subject to testing. Implementation of the regression models described in this thesis, or of similar models, will enable the Bureau to continue to identify approximately 30% of the vehicle fleet as directed vehicles.
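As a rough illustration of the approach summarized above, the following minimal sketch scores each vehicle with a binomial logistic model and flags the highest-scoring share of the fleet as directed vehicles. It is not the model developed in this thesis: the predictors (vehicle age and odometer reading), the coefficient values, the fleet data, and the function names `failure_probability` and `select_directed` are all hypothetical and chosen only for illustration. Selecting directed vehicles by ranking on predicted probability is likewise an assumption.

```python
import math


def failure_probability(intercept, coefficients, features):
    """Binomial logistic model: P(fail) = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    z = intercept + sum(b * x for b, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-z))


def select_directed(vehicles, intercept, coefficients, fraction=0.30):
    """Return IDs of the top `fraction` of vehicles, ranked by predicted failure probability."""
    scored = sorted(
        (
            (failure_probability(intercept, coefficients, feats), vid)
            for vid, feats in vehicles.items()
        ),
        reverse=True,  # highest predicted probability first
    )
    n_directed = round(fraction * len(scored))
    return [vid for _, vid in scored[:n_directed]]


# Hypothetical fleet: (vehicle age in years, odometer in units of 100,000 miles).
fleet = {
    "V01": (2, 0.2), "V02": (4, 0.5), "V03": (6, 0.7), "V04": (8, 1.0),
    "V05": (10, 1.2), "V06": (12, 1.5), "V07": (14, 1.6), "V08": (16, 1.9),
    "V09": (18, 2.2), "V10": (20, 2.5),
}

# Hypothetical coefficients; a real model would estimate these from inspection records.
directed = select_directed(fleet, intercept=-4.0, coefficients=(0.2, 1.0))
print(directed)
```

With ten vehicles and a 30% target, the sketch flags the three vehicles with the highest predicted probability of failure. In practice the coefficients would be fitted to historical inspection outcomes rather than set by hand.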