Abstract
Modern data centers house anywhere from hundreds to hundreds of thousands of servers. As the complexity of data centers increases, it becomes important to monitor the network for capacity planning, performance analysis, and security enforcement. Sampling and analyzing network data helps data center administrators plan and tune for optimum application performance. But as data centers grow, the sampled data sets become very large. We study the application of the map-reduce model, a parallel programming technique, to processing and analyzing these large network traces. Specifically, we analyze the network traces for iSCSI performance and network statistics. We design and implement a prototype of a protocol-aware network trace-processing tool called Netdoop, which functions as a reference design. We also implement a farm of virtual servers and iSCSI targets to create an environment that represents a small data center. We use this virtual environment to collect data sets and demonstrate the scalability of the prototype. The same virtual servers also host and run the map-reduce framework. Based on the performance and scalability of the resulting tool, we draw conclusions about the applicability of the map-reduce model to analyzing large network traces.