On the Data-Driven Prediction of Arrival Times for Freight Trains on U.S. Railroads


The high capacity utilization and the predominantly single-track network topology of freight railroads in the United States cause large variability and unpredictability in train arrival times. Accurately predicting estimated times of arrival (ETAs) is an important step for railroads to increase efficiency and automation, reduce costs, and enhance customer service. We propose using machine learning algorithms trained on historical railroad operational data to generate ETAs in real time. The machine learning framework uses the many data points produced by individual trains traversing a network track segment and generates periodic ETA predictions with a single model. In this work, we compare the predictive performance of linear and non-linear support vector regression, random forest regression, and deep neural network models, tested on a section of railroad in Tennessee, USA, using over two years of historical data. Support vector regression and deep neural network models show similar results, with a maximum ETA error reduction of 26% over a statistical baseline predictor. The random forest models show over 60% error reduction compared to the baseline at some points and an average error reduction of 42%.
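
To make the "error reduction over a statistical baseline" figures concrete, the following is a minimal sketch of how such a percentage could be computed. It assumes that the baseline predicts the historical median travel time and that error is measured as mean absolute error; the abstract specifies neither, so both are assumptions. All numbers and names (`historical`, `actual`, `model_eta`) are hypothetical and are not taken from the paper's data.

```python
from statistics import mean, median


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def error_reduction(baseline_mae, model_mae):
    """Percentage by which the model's error is lower than the baseline's."""
    return 100.0 * (baseline_mae - model_mae) / baseline_mae


# Hypothetical travel times over one track segment, in minutes.
historical = [100, 120, 110, 130, 115]   # past trains (assumed training data)
actual = [100.0, 140.0, 120.0]           # observed times for new trains
model_eta = [105.0, 130.0, 120.0]        # predictions from some learned model

# Assumed statistical baseline: always predict the historical median.
baseline_eta = [median(historical)] * len(actual)

baseline_mae = mean_absolute_error(actual, baseline_eta)
model_mae = mean_absolute_error(actual, model_eta)
reduction = error_reduction(baseline_mae, model_mae)

print(f"baseline MAE: {baseline_mae:.1f} min")
print(f"model MAE:    {model_mae:.1f} min")
print(f"error reduction: {reduction:.1f}%")
```

Under this reading, a reported reduction of 42% would mean that the model's average error is 58% of the baseline's error on the same test trains.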

21st International Conference on Intelligent Transportation Systems, ITSC 2018, Maui, HI, USA, November 4-7, 2018