Unpredictability is one of the top reasons that prevent people from using public transportation. To improve the on-time performance of transit systems, prior work focuses on updating schedule periodically in the long-term and providing arrival delay prediction in real-time. But when no real-time transit and traffic feed is available (e.g., one day ahead), there is a lack of effective contextual prediction mechanism that can give alerts of possible delay to commuters. In this paper, we propose a generic tool-chain that takes standard General Transit Feed Specification (GTFS) transit feeds and contextual information (recurring delay patterns before and after big events in the city and the contextual information such as scheduled events and forecasted weather conditions) as inputs and provides service alerts as output. Particularly, we utilize shared route segment networks and multi-task deep neural networks to solve the data sparsity and generalization issues. Experimental evaluation shows that the proposed toolchain is effective at predicting severe delay with a relatively high recall of 76% and F1 score of 55%.