- Ability to work with very large datasets.
- Experienced with data modeling, design patterns, and building highly scalable, secure solutions
- Identify potential problems, build hypotheses, identify relevant data attributes, and determine the approach to solving analytical problems
- Understand, extract, clean, and prepare data for analysis
- Perform detailed exploratory data analysis and report findings
- Ability to understand code in SAS, R, or Python
- Run tests to validate the predictive power of the models built
- Document all findings and support a smooth handover of the implementation to operations and IT
- Make recommendations for procedural improvements supported by analytical findings
- Build presentations for targeted audiences
- Establish timelines for recommendations/solutions and meet or beat deadlines
Desired Candidate Profile: Skill Set - Python, Hadoop or Hive, PySpark (mandatory)
Skill Requirements: Experience with Hadoop distributions such as Cloudera, MapR, or Hortonworks
- Mandatory experience in R, Apache Spark (SparkR), Python scripting, and PySpark
- Experience in statistical modeling/machine learning techniques such as Linear Regression, Logistic Regression, Decision Trees, or K-means
- Very good programming skills, preferably in R and/or Python
- In-depth knowledge of data architecture, data modelling, and database design best practices
- Hands-on experience in big data migrations and data warehousing