Accelerating CNN Training with I/O Characterization and Fine-Tuning Data Loading Hyper-Parameters
Training a deep learning (DL) model is a compute- and I/O-intensive process that requires several system layers to work together. Because of the significant performance gap between the storage subsystem and the other parts of the system, it is crucial to speed up the I/O-related parts of the training process, which act as a bottleneck. Some recent studies accelerate the data loading phase of the training pipeline by modifying the data pipeline and adding extra implementations, which impose considerable overheads on the entire training process. Moreover, previous work has not addressed the impact of data loading hyper-parameters on performance and resource utilization.
In this work, we accelerate the training of state-of-the-art models through I/O characterization and fine-tuning of data loading hyper-parameters. To do this, we first discuss the importance of the data pipeline in DL training and motivate the need for optimal configurations of data loading hyper-parameters. Then, we provide insights into how an optimal disk setup can improve performance per cost in different model configurations. Finally, we evaluate the effects of data loading hyper-parameters on performance and resource utilization, and show the improvements that can be obtained by fine-tuning those parameters. Our method does not impose any overhead on resource utilization or training time. The experiments show that fine-tuning data loading hyper-parameters can reduce training time by up to 5x and increase resource utilization by up to 10x in some specific configurations.
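To make the idea of fine-tuning data loading hyper-parameters concrete, the following is a minimal sketch of a grid sweep, not the procedure used in this work. It assumes two illustrative hyper-parameters (batch size and number of loader workers), which the text above does not name, and it replaces real storage reads with a simulated delay. All function names, parameter names, and values here are hypothetical.

```python
import itertools
import time
from concurrent.futures import ThreadPoolExecutor


def load_sample(index, io_delay=0.001):
    """Stand-in for reading and decoding one sample from storage."""
    time.sleep(io_delay)  # simulated I/O latency (hypothetical value)
    return index


def time_one_epoch(num_samples, batch_size, num_workers):
    """Return the seconds needed to load every sample once, batch by batch."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        for first in range(0, num_samples, batch_size):
            batch_indices = range(first, min(first + batch_size, num_samples))
            # Samples of one batch are fetched concurrently by the workers.
            list(pool.map(load_sample, batch_indices))
    return time.perf_counter() - start


def sweep(num_samples, batch_sizes, worker_counts):
    """Time every (batch_size, num_workers) pair and return the timings."""
    return {
        (b, w): time_one_epoch(num_samples, b, w)
        for b, w in itertools.product(batch_sizes, worker_counts)
    }


timings = sweep(num_samples=64, batch_sizes=[8, 32], worker_counts=[1, 4])
best_config = min(timings, key=timings.get)
print("fastest (batch_size, num_workers):", best_config)
```

In a real training setup the simulated delay would be replaced by the framework's own data loader reading from the disk under study, and each configuration would typically be measured over several epochs to smooth out caching effects.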