Using Scalable Data Mining for Predicting Flight Delays


Abstract in English

Flight delays are frequent all over the world (about 20% of airline flights arrive more than 15 minutes late), and their annual cost is estimated at several tens of billions of dollars. This makes the prediction of flight delays a primary issue for airlines and travelers. The main goal of this work is to implement a predictor of the arrival delay of a scheduled flight due to weather conditions. The predicted arrival delay takes into account both flight information (origin airport, destination airport, scheduled departure and arrival time) and the weather conditions at the origin and destination airports according to the flight timetable. Datasets of airline flights and weather observations have been analyzed and mined using parallel algorithms implemented as MapReduce programs executed on a Cloud platform. The results show high accuracy in predicting delays above a given threshold. For instance, with a delay threshold of 15 minutes we achieve an accuracy of 74.2% and a recall of 71.8% on delayed flights, while with a threshold of 60 minutes the accuracy is 85.8% and the delay recall is 86.9%. Furthermore, the experimental results demonstrate the scalability of the predictor, which is achieved by performing the data preparation and mining tasks as MapReduce applications on the Cloud.
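To make the reported metrics concrete, the following minimal Python sketch shows how accuracy and recall on delayed flights could be computed for a given delay threshold. It is an illustration only, not the authors' implementation: the `Outcome` record, the function names, and the toy values are hypothetical, and treating a flight as delayed when its arrival delay is strictly greater than the threshold is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Outcome:
    """One evaluated flight (hypothetical record, not from the paper)."""
    arrival_delay_min: float  # observed arrival delay, in minutes
    predicted_delayed: bool   # label produced by some classifier


def is_delayed(arrival_delay_min: float, threshold_min: float) -> bool:
    # Assumption: "delayed" means strictly more than the threshold.
    return arrival_delay_min > threshold_min


def evaluate(outcomes: Iterable[Outcome], threshold_min: float) -> Tuple[float, float]:
    """Return (accuracy, recall on the delayed class) for a threshold."""
    tp = fp = tn = fn = 0
    for o in outcomes:
        actual = is_delayed(o.arrival_delay_min, threshold_min)
        if actual and o.predicted_delayed:
            tp += 1
        elif actual and not o.predicted_delayed:
            fn += 1
        elif not actual and o.predicted_delayed:
            fp += 1
        else:
            tn += 1
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    # Recall: share of truly delayed flights that were predicted as delayed.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall


# Toy data, invented for illustration only.
sample = [
    Outcome(5, False),
    Outcome(20, True),
    Outcome(70, False),
    Outcome(30, True),
    Outcome(0, True),
]

if __name__ == "__main__":
    acc, rec = evaluate(sample, threshold_min=15)
    print(f"accuracy={acc:.3f} recall={rec:.3f}")
```

Changing `threshold_min` changes which flights count as delayed, so the same set of predictions yields different accuracy and recall values at 15 and 60 minutes.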

References used

https://www.researchgate.net/publication/292539590_Using_Scalable_Data_Mining_for_Predicting_Flight_Delays
