Optimizing Data Set Size and Loss Functions for Enhanced Neural Network Performance

21 Jun 2024


(1) J. Quetzalcóatl Toledo-Marín, University of British Columbia, BC Children’s Hospital Research Institute, Vancouver, BC, Canada (Email: j.toledo.mx@gmail.com);

(2) James A. Glazier, Biocomplexity Institute and Department of Intelligent Systems Engineering, Indiana University, Bloomington, IN 47408, USA (Email: jaglazier@gmail.com);

(3) Geoffrey Fox, University of Virginia, Computer Science and Biocomplexity Institute, 994 Research Park Blvd, Charlottesville, Virginia, 22911, USA (Email: gcfexchange@gmail.com).

4 Discussion

Deep diffusion surrogates can aid in obtaining the steady-state solution for multiscale modeling. In this paper we looked at 20 sources randomly placed in a 2D lattice. This is still far less complex than the real problem, e.g., the simulation of a vascular system; nevertheless, it is a step in that direction. We have shown that increasing the number of sources already poses a number of challenges in different respects. We showed that the network architecture, the structure and size of the training set, the loss function, the hyperparameters of the training algorithm, and the metrics defined to evaluate the task-specific performance of the trained network (which may differ from the loss function used in training) all affect the final product. In the case of the NN architecture, we argued that the encoder-decoder CNN (ED-CNN) performs well not because of data compression, as many believe, but because of a data transformation akin to a Fourier transform. However, a more rigorous proof is both required and desired. This is not to say that other architectures cannot perform well in predicting the steady-state solution. In fact, in prior work [17] we combined an ED-CNN and a CNN for a similar task and found that the CNN improved performance by reinforcing the sources in the prediction. The results shown in this paper highlight that the largest absolute error occurs at and near the sources. We therefore consider architectures such as UNet [27] good candidates for this task and leave them for future work.
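
Since the largest absolute error sits at and near the sources, it can be informative to report the error in those regions separately from the rest of the lattice. The sketch below is a hypothetical illustration, not the authors' evaluation code. It assumes the prediction and ground truth are 2D lattices stored as lists of lists, that source positions are given as (row, col) pairs, and that a site counts as "near" when its Chebyshev distance to some source is at most `radius`; the function name `error_near_vs_far` and the `radius` parameter are illustrative choices.

```python
from typing import List, Sequence, Tuple

Grid = List[List[float]]


def error_near_vs_far(pred: Grid, true: Grid,
                      sources: Sequence[Tuple[int, int]],
                      radius: int = 1) -> Tuple[float, float]:
    """Mean absolute error on sites near a source vs. all other sites.

    Hypothetical helper: a site is 'near' if its Chebyshev distance
    (max of row and column offsets) to any source is <= radius.
    Returns (near_mae, far_mae); a region with no sites yields NaN.
    """
    near_sum = far_sum = 0.0
    near_n = far_n = 0
    for i, (prow, trow) in enumerate(zip(pred, true)):
        for j, (p, t) in enumerate(zip(prow, trow)):
            err = abs(p - t)
            is_near = any(max(abs(i - si), abs(j - sj)) <= radius
                          for si, sj in sources)
            if is_near:
                near_sum += err
                near_n += 1
            else:
                far_sum += err
                far_n += 1
    near_mae = near_sum / near_n if near_n else float("nan")
    far_mae = far_sum / far_n if far_n else float("nan")
    return near_mae, far_mae
```

For example, on a 5×5 lattice with a single source at (2, 2) and `radius=1`, the 3×3 block around the source is scored separately from the remaining 16 sites, so a large error at the source is not diluted by the many well-predicted sites far from it.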

We considered several loss functions and compared the resulting model performance. The wide numeric range of the neural network inputs and outputs makes the analysis very sensitive to the choice of loss function. Our results suggest that the loss function can have a significant effect on the model’s performance. However, we showed that the data set size has a greater effect on the model’s performance and, furthermore, that the performance associated with a given loss function depends on the data set size. We also showed that a large enough data set reduces the performance fluctuations on the test set. In a real problem the landscape of possible configurations is unknown, which implies that the model fluctuations are, at the very least, unknown. This hints at the difficulty of bounding the performance error for an unseen configuration. The naive solution is to enlarge the training set, but enlarging it arbitrarily increases training time, so care must be taken with this approach. One needs the best of both worlds, i.e., models trained on data sets large enough to reduce fluctuations but small enough to train in a reasonable time. A better-curated data set in which configuration redundancy is kept to a minimum can lead to better performance. Another promising approach is active learning [28], whereby data are ranked by the magnitude of the performance error and the data with the largest error are then fed into training.
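
The ranking step of the active-learning idea above can be sketched in a few lines. This is a minimal, greedy top-k variant written under stated assumptions, not the procedure of [28] or of this paper: the model is any callable mapping an input lattice to a predicted lattice, the error is the mean absolute error over the lattice, and each pool entry carries its ground truth (which in practice would come from the numerical solver). The names `mean_abs_error`, `select_hardest` and `k` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Grid = List[List[float]]


def mean_abs_error(pred: Grid, true: Grid) -> float:
    """Mean absolute error over all lattice sites."""
    total, n = 0.0, 0
    for prow, trow in zip(pred, true):
        for p, t in zip(prow, trow):
            total += abs(p - t)
            n += 1
    return total / n


def select_hardest(model: Callable[[Grid], Grid],
                   pool: Sequence[Tuple[Grid, Grid]],
                   k: int) -> List[int]:
    """Return the indices of the k pool samples with the largest error.

    `pool` holds (input, ground_truth) pairs; the ground truth is assumed
    to have been produced by the numerical solver.
    """
    errors = [mean_abs_error(model(x), y) for x, y in pool]
    # Sort indices by error, largest first, and keep the top k.
    ranked = sorted(range(len(pool)), key=lambda i: errors[i], reverse=True)
    return ranked[:k]
```

In each round, the selected samples would be moved from the pool into the training set before retraining. Selecting by largest error is only one strategy; it targets the hardest configurations but can also oversample outliers, so it may be combined with the curation of redundant configurations mentioned above.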

Defining the right metrics in deep learning is highly challenging, partly because quantifying the degree of success can be difficult even when it is fairly easy to agree on what ideal success looks like. In other words, quantifying “good enough” is not straightforward and requires benchmarking different approaches against each other. A thorough discussion of benchmark suites can be found in [29].
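
Because the metric used to judge a surrogate may differ from its training loss, two predictions can be ranked differently depending on the metric chosen. The toy example below is purely illustrative; the numbers are made up and are not results from this paper. Prediction A has a small error at every site, while prediction B is exact except for one large error at a single site, a pattern reminiscent of errors concentrated at a source.

```python
from typing import Dict, Sequence


def metrics(pred: Sequence[float], true: Sequence[float]) -> Dict[str, float]:
    """Three common error metrics on a flattened lattice."""
    errs = [p - t for p, t in zip(pred, true)]
    n = len(errs)
    return {
        "mse": sum(e * e for e in errs) / n,
        "mae": sum(abs(e) for e in errs) / n,
        "max_abs": max(abs(e) for e in errs),
    }


true = [1.0] * 10
pred_a = [1.2] * 10          # small error at every site
pred_b = [1.0] * 9 + [1.5]   # exact except one large error

ma, mb = metrics(pred_a, true), metrics(pred_b, true)
for name in ("mse", "mae", "max_abs"):
    better = "A" if ma[name] < mb[name] else "B"
    print(f"{name}: A={ma[name]:.3f} B={mb[name]:.3f} -> {better} is better")
```

Here the mean squared and mean absolute errors both favour B, while the maximum absolute error favours A, so which prediction is “good enough” depends on whether the task tolerates a single large local error.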

This paper is available on arxiv under CC 4.0 license.