Accelerating CNN Training with I/O Characterization and Fine-Tuning Data Loading Hyper-Parameters

Jul 1, 2024
Ali Farahani
Abstract

Training a machine learning (ML) model is a compute- and I/O-intensive process that involves several system layers. Because of the significant performance gap between the storage subsystem and the rest of the system, it is crucial to speed up the I/O-related part of the training process. Some recent studies accelerate the data loading phase of the training pipeline by restructuring the data pipeline and adding extra machinery that imposes considerable overhead on the training process as a whole. Moreover, previous work has not addressed the impact of data loading hyper-parameters on performance and resource utilization.
In this work, we accelerate the training of convolutional neural networks (CNNs) through I/O characterization and fine-tuning of data loading hyper-parameters. We first discuss the importance of the data pipeline in ML training and motivate the need for well-chosen data loading hyper-parameters. We then provide insights into how an optimal disk setup can improve performance per cost across different model configurations. Finally, we evaluate the effects of data loading hyper-parameters on performance and resource utilization and show the improvements achievable by fine-tuning them. Our method imposes no overhead on resource utilization or training time. Our experiments show that fine-tuning data loading hyper-parameters yields up to 5x lower training time and up to 10x higher resource utilization in certain configurations.
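The abstract does not name a specific framework, but to illustrate the kind of tuning involved, here is a minimal sketch assuming a PyTorch-style pipeline: it sweeps one data loading hyper-parameter (num_workers) and times a pure data-loading pass. The dataset, parameter values, and measurement loop are illustrative assumptions, not the paper's methodology.

```python
import time
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Synthetic image dataset so the sweep runs without downloading anything.
dataset = datasets.FakeData(size=4096, transform=transforms.ToTensor())

def time_one_pass(loader: DataLoader) -> float:
    """Time a full pass over the loader, isolating the data loading cost."""
    start = time.perf_counter()
    for images, labels in loader:
        pass  # no training step: we only measure the data pipeline
    return time.perf_counter() - start

for num_workers in (0, 2, 4, 8):
    kwargs = dict(
        batch_size=128,
        num_workers=num_workers,
        pin_memory=torch.cuda.is_available(),
    )
    if num_workers > 0:
        kwargs["prefetch_factor"] = 2  # only valid with worker processes
    loader = DataLoader(dataset, **kwargs)
    print(f"num_workers={num_workers}: {time_one_pass(loader):.2f}s")
```

Other knobs such as prefetch_factor, pin_memory, and batch_size can be swept the same way; the best setting depends on the storage subsystem and the model configuration, which is exactly why I/O characterization matters.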