Abstract
We describe a new protocol that helps users build reliable distributed applications that perform file operations. Our file checkpointing and recovery protocol is designed to checkpoint and recover user files consistently with respect to the volatile state of the distributed program. Based on this protocol, a file I/O interface has been implemented as part of our Libra library for supporting fault tolerance in distributed applications. Applications perform file operations through this interface, while the complexity of checkpointing and recovering user files is hidden from the application level: both are done automatically.