In this paper we focus on the application of data-driven methods to remaining useful life (RUL) estimation for components whose past failure data is not uniform across devices, i.e., the minimum and maximum values of the key parameters vary widely from device to device. The system under study is the hard disks used in a computing cluster. The data used for analysis is provided by Backblaze, as discussed later. We discuss the architecture of the long short-term memory (LSTM) neural network used and describe the mechanisms for choosing its hyper-parameters. Further, we describe the challenges faced in extracting effective training sets from highly unorganized and class-imbalanced big data, and establish methods for online prediction with extensive data pre-processing, feature extraction, and validation through online simulation sets in which the remaining useful lives of the hard disks are unknown. Our algorithm performs especially well in predicting RUL near the critical zone of a device approaching failure. With the proposed approach we are able to predict whether a disk is going to fail within the next ten days with an average precision of 0.8435. We also show that the architecture trained on a particular disk model is generalizable and transferable, as it can be used to predict RUL for devices of other models from the same manufacturer.
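To make the ten-day failure criterion concrete, the following minimal sketch shows one way RUL estimates could be turned into a binary "fails within ten days" decision and scored by precision. It is an illustration under stated assumptions, not the evaluation code used in this work: the function names `fails_within` and `precision`, the `RUL <= 10` thresholding rule, and all numeric values are hypothetical, and the reported average precision may be aggregated differently.

```python
def fails_within(rul_days, horizon=10):
    """Map RUL values (in days) to binary 'fails within horizon' flags.

    Assumption: a disk counts as failing within the horizon when RUL <= horizon.
    """
    return [r <= horizon for r in rul_days]


def precision(predicted, actual):
    """Precision = TP / (TP + FP); returns 0.0 when nothing is predicted positive."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    return tp / (tp + fp) if (tp + fp) else 0.0


# Made-up values for illustration only; these are not results from the paper.
predicted_rul = [3, 12, 8, 40, 9]   # hypothetical model outputs, in days
true_rul = [5, 9, 15, 60, 2]        # hypothetical ground-truth RUL, in days

pred = fails_within(predicted_rul)
true = fails_within(true_rul)
score = precision(pred, true)
```

Here precision answers the question "of the disks flagged as failing within ten days, what fraction actually did?", which is the quantity that matters when a false alarm triggers an unnecessary disk replacement.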