Abstract
Extract, Transform and Load (ETL) is the process used in data warehouse projects to extract data from external sources, transform it according to business needs, and load it into a target database. The web ETL tool developed by Nithin Vijayendra as part of his master’s project “A Courseware on ETL process” has the following features:
• Extracts data from text files and MySQL tables.
• Applies data type transformations and merges data from multiple sources, with or without duplicates.
• Loads the data back into MySQL tables.

I propose to add the following features:

1. Support for Oracle and XML Sources and an Oracle Target
Commercial tools can extract data from multiple source formats, such as XML and relational databases (Oracle). Currently, the only database the tool supports as a source or target is MySQL. Enhancing the tool to accept Oracle and XML as sources and Oracle as a target will give end users these additional options.

2. Additional Transformations
Currently, the tool supports merging data from multiple sources (with or without duplicates) and applying data type transformations. Adding commonly used transformations such as joining tables, filtering data, dropping columns, and generating surrogate keys will make the set of transformations more complete.

3. Metadata Repository
Enhance the tool to support a metadata repository. As an end user, when I select a flat file as my source, I have to manually enter the metadata for every column in the file, which is tedious and time-consuming. I would prefer a metadata repository that I can use as a template for similar flat files from different sources, so that I modify only the metadata that differs rather than specifying it again for every field.

The above-mentioned features will be implemented using PHP5, JavaScript, and HTML.
I will use MySQL and Oracle databases to store the data. The application will be hosted on the server gaia.ecs.csus.edu.
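
To make the four transformations proposed in item 2 concrete, the following is a minimal sketch of joining, filtering, dropping columns, and generating surrogate keys. It is written in Python for brevity rather than in the PHP5 the tool will use. The function names, the representation of rows as dictionaries, and the sample data are all assumptions made for illustration; none of them come from the existing tool.

```python
from itertools import count


def inner_join(left, right, key):
    """Join two lists of row dictionaries on a shared key column."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    # Each left row is paired with every right row that has the same key.
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def filter_rows(rows, predicate):
    """Keep only the rows for which predicate(row) is true."""
    return [row for row in rows if predicate(row)]


def drop_columns(rows, columns):
    """Return copies of the rows without the named columns."""
    unwanted = set(columns)
    return [{k: v for k, v in row.items() if k not in unwanted} for row in rows]


def add_surrogate_key(rows, name="sk", start=1):
    """Prefix each row with a sequential integer key unrelated to source keys."""
    ids = count(start)
    return [{name: next(ids), **row} for row in rows]


# Hypothetical sample data standing in for two extracted sources.
CUSTOMERS = [
    {"cust_id": "C10", "name": "Ada", "fax": None},
    {"cust_id": "C20", "name": "Lin", "fax": None},
]
ORDERS = [
    {"cust_id": "C10", "amount": 250},
    {"cust_id": "C20", "amount": 40},
    {"cust_id": "C10", "amount": 90},
]


def transform(orders, customers):
    """Chain the four transformations in one possible order."""
    rows = inner_join(orders, customers, "cust_id")
    rows = filter_rows(rows, lambda r: r["amount"] >= 50)
    rows = drop_columns(rows, ["fax"])
    return add_surrogate_key(rows)


if __name__ == "__main__":
    for row in transform(ORDERS, CUSTOMERS):
        print(row)
```

In this sketch the surrogate key is assigned last, so that it numbers only the rows that survive the filter; a real load step might instead need to continue from the highest key already present in the target table.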