Predictive Analytics is a branch of advanced analytics that uses diverse techniques such as data mining, predictive modelling, statistics, machine learning and artificial intelligence to analyse current data and predict future outcomes.
Predictive Analytics is popular in fields ranging from Retail, Finance and Healthcare to Education. We at ALTEN Calsoft Labs have applied Predictive Analytics in healthcare (predicting 30-day patient readmission, diabetic retinopathy and length of stay), in education (student retention and performance) and in retail (revenue forecasting and product recommendations).
Some of the techniques used in Predictive Analytics are:
Once the dataset is available, it is processed and cleaned. The refined dataset is then split into training and test sets in a 70:30 ratio. The larger training set is used to fit the model, while the test set is held back and used only at the very end to evaluate the performance of the final model. Many learning algorithms can be used to train the model, including Random Forest, Support Vector Machine (SVM), Naive Bayes, Artificial Neural Networks (ANN) and Decision Tree classifiers. During model creation and refinement, techniques such as cross-validation are applied to the training data to estimate classification performance without touching the test set. The most popular tools are Python, R, scikit-learn, SAS, Mathematica and MATLAB.
Once the model is ready, its performance is evaluated on the test data. There are many techniques for evaluating a model, and the right one depends on the type of model (regression or classification) and on the problem domain: for example, accuracy or F1 score for a classifier, versus mean absolute error for a regression model.
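The workflow above can be sketched in Python with scikit-learn. This is a minimal illustration under stated assumptions, not the platform's actual pipeline. A synthetic binary-classification dataset from make_classification stands in for a real cleaned dataset. The hyperparameters and the choice of 5-fold cross-validation, accuracy for model selection and accuracy/F1 for the final test are illustrative assumptions. The sketch splits 70:30, compares the five algorithms named in the text using cross-validation on the training set only, refits the best one, and evaluates it once on the held-out test set.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, f1_score

# Synthetic stand-in for a cleaned dataset (assumption: real data would come from the processing step)
X, y = make_classification(n_samples=1000, n_features=20, n_informative=8, random_state=42)

# 70/30 train/test split; stratify keeps the class ratio the same in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Candidate learners named in the text; SVM and ANN get feature scaling because they are scale-sensitive
candidates = {
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=42),
    "svm": make_pipeline(StandardScaler(), SVC()),
    "naive_bayes": GaussianNB(),
    "ann": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=42),
    ),
    "decision_tree": DecisionTreeClassifier(random_state=42),
}

# 5-fold cross-validation on the training set only; the test set stays untouched
cv_scores = {
    name: cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy").mean()
    for name, model in candidates.items()
}
best_name = max(cv_scores, key=cv_scores.get)

# Refit the selected model on the full training set, then evaluate it once on the test set
best_model = candidates[best_name].fit(X_train, y_train)
y_pred = best_model.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
test_f1 = f1_score(y_test, y_pred)

for name, score in sorted(cv_scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:>14}: CV accuracy {score:.3f}")
print(f"selected {best_name}: test accuracy {test_accuracy:.3f}, F1 {test_f1:.3f}")
```

Keeping the test set out of the cross-validation loop is the point of the design: the final test score then reflects data the model selection never saw. For a regression problem, the same structure applies with regressors and a metric such as mean absolute error.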
As a complete solution for predictive modelling, ALTEN Calsoft Labs’ Predictive Analytics Platform provides multiple microservices covering data processing, analysis and, finally, data visualization.
The Apache Hadoop software library is a framework that allows distributed processing of large datasets across clusters of computers using…