We’re hiring a hands-on Data Scientist/Engineer to build production-ready data services and ML solutions. Your primary goal is to turn raw, messy, and distributed data into clean, reliable, well-documented datasets and insights that power product and decision-making. You’ll design robust pipelines, perform deep exploratory analysis, define data transformations and aggregations, and expose results via stable back-end interfaces (datasets, batch jobs, and APIs).
Key Responsibilities:
- Conduct data discovery and profiling across sources; map schemas, assess quality, identify gaps/anomalies, and document data contracts.
- Build and maintain ETL/ELT pipelines for ingestion, cleaning, validation, normalization, and aggregation (daily/weekly/monthly rollups, feature tables).
- Perform exploratory data analysis (EDA) and statistical analysis to uncover patterns, trends, outliers, and data issues; produce analysis-ready tables.
- Define and implement business logic and transformations (feature engineering, window functions, joins, dedupe, imputation, enrichment).
- Analyze data, test hypotheses, and develop predictive models using appropriate analysis tools and techniques.
- Ensure data quality and governance: validations, unit/expectation tests, lineage, documentation, PII handling, and access controls.
- Collaborate with research and engineering teams to translate questions into datasets, metrics, and clear definitions.
- Optimize processing performance and cost (indexing, partitioning, clustering, caching, vectorized ops, parallelization).
- Keep up with the latest data analysis techniques and industry trends to strengthen the company’s data analysis capabilities.
- Uphold engineering best practices: version control, code reviews, testing, CI/CD, reproducible notebooks/experiments, and documentation.
Required Qualifications:
- Bachelor’s degree or higher in Data Science, AI, Statistics, Computer Science, or related field.
- 3+ years in data analysis, ML model development, or data engineering.
- Proficiency in Python (pandas, NumPy, scikit-learn, XGBoost) and/or R, plus strong SQL and familiarity with NoSQL databases.
- Experience developing ML/DL models (TensorFlow, PyTorch, scikit-learn).
- Knowledge of database design and optimization (SQL/NoSQL) and solid MS Excel skills.
- Experience deploying data/ML workloads in cloud environments (GCP, AWS, or Azure).
- Basic understanding of web systems and how back-end services integrate with client apps.
- Strong logical thinking, problem-solving, and clean, maintainable coding habits.
- Strong communication skills, including the ability to present analysis results to non-expert audiences.
Preferred Qualifications:
- Prompt engineering and agent development experience.
- Built services with modern LLM frameworks (e.g., LangChain, LangGraph, DeepAgents).
- Data visualization/reporting (Tableau, Power BI) and Python viz (Matplotlib/Seaborn/Plotly) for stakeholder communication.
- Experience with Linux operating systems and shell scripting.
- Domain expertise: NLP, time-series, recommender systems, anomaly detection.
- Large-scale processing (Hadoop, Spark, Dask).
- MLOps & governance (MLflow, Kubeflow, feature stores, model/version management).
- Experience on global projects and strong English communication skills.
What You’ll Work On:
- High-quality ingestion and transformation jobs that convert raw feeds into trustworthy, analysis-ready datasets.
- Metrics, dashboards, and quality monitors that demonstrate business impact.
- Analytical studies (EDA, cohort analyses, attribution, baseline metrics) that inform product decisions.
- Productionized data products/APIs with monitoring for data freshness, completeness, and accuracy.