building a robust ETL pipeline to manage data from disparate data sources (see the sketch after this list)
building and scaling our backend data stores and compute engines to process large quantities of data
creating multilingual text corpora, and tools to process them in a variety of ways
building a low-latency serving layer that powers our dashboards, reports, and other analytics functionality
building an analytics pipeline to serve actionable insights and recommendations to our customers
building a culture of data- and statistics-driven thinking in everything we do.
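To give a flavor of the ETL work, here is a minimal sketch of one extract-transform-load step; the endpoint, field names, and local SQLite store are hypothetical stand-ins, not our actual stack:

```python
# Minimal ETL sketch: pull JSON records from one (hypothetical) source,
# normalize them with pandas, and append them to a local store.
import sqlite3

import pandas as pd
import requests

SOURCE_URL = "https://example.com/api/events"  # hypothetical endpoint


def extract(url: str) -> list[dict]:
    """Fetch raw records from one of many disparate sources."""
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()
    return resp.json()


def transform(records: list[dict]) -> pd.DataFrame:
    """Normalize column names, drop malformed rows, stamp the batch."""
    df = pd.DataFrame(records)
    df = df.rename(columns=str.lower).dropna(subset=["id"])  # assumes an 'id' field
    df["ingested_at"] = pd.Timestamp.now(tz="UTC").isoformat()
    return df


def load(df: pd.DataFrame, db_path: str = "warehouse.db") -> None:
    """Append the cleaned batch to a table; SQLite stands in for a real warehouse."""
    with sqlite3.connect(db_path) as conn:
        df.to_sql("events", conn, if_exists="append", index=False)


if __name__ == "__main__":
    load(transform(extract(SOURCE_URL)))
```

A real pipeline layers retries, schema validation, and incremental checkpoints on top of this skeleton.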
Have personal projects that you work on in your spare time. Show off projects you have hosted on GitHub.
Use the command line like a pro. Be proficient in regular expressions, XPath, and other tools for parsing and extracting unstructured data (see the sketch after this list).
Have built RESTful APIs and addressed challenges of scale and performance.
Have exposure to large-scale data processing models such as MapReduce and Spark.
Have experience using one or more storage and indexing technologies such as MongoDB, Cassandra, Solr, or Elasticsearch.
Be a generalist who can pick up any of these over a weekend and get to work the next Monday.
Be a self-starter, someone who thrives in environments with minimal “management”.
Have exposure to analytics tools and libraries such as R and Pandas.
Have a background in machine learning.
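To make the parsing point concrete, here is a short, self-contained sketch of regex and XPath extraction; the log line and HTML snippet are invented for illustration:

```python
# Pull structured fields out of unstructured text with a regular
# expression, and out of HTML with an XPath query (via lxml).
import re

from lxml import html

RAW_LOG = "user=alice ts=2021-03-04T10:22:01Z status=200 bytes=5120"
PAGE = "<html><body><h2 class='title'>Quarterly Report</h2></body></html>"

# Regex: turn whitespace-separated key=value pairs into a dict.
fields = dict(re.findall(r"(\w+)=(\S+)", RAW_LOG))
print(fields["status"])  # -> '200'

# XPath: grab the text of every <h2> whose class is 'title'.
tree = html.fromstring(PAGE)
print(tree.xpath("//h2[@class='title']/text()"))  # -> ['Quarterly Report']
```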