? Build tools to ingest new data and persist it into a database
? Build tools / wrappers to easily query, select & download the desired data slice to a local machine from the Big-Data storage DBs, in a tabular format .
? Port the algorithms developed by data scientists into the Big-Data framework so that they can be executed at scale within the constraints of the production environment
Skills Required:
? Very strong coding skills in any one of Python/R/Scala
? Strong in Data Structure/Algorithms especially Data Algorithms
? Experienced with using distributed data processing layers like Spark, Map Reduce, Akka. (must be strong in at least one of these)
? Experienced with using distributed persistence systems like NoSQL, Cassandra, Hadoop, S3. (must be strong in at least one of these)
? Sound understanding of statistics and linear algebra ? Basic Knowledge of machine learning concepts