Build robust data infrastructure at Pankh.AI to power AI and analytics.
What you'll do
- Design and build scalable data pipelines using Apache Spark and Airflow (see the orchestration sketch after this list)
- Implement data warehouse schemas and ETL processes
- Build real-time data streaming solutions with Kafka
- Ensure data quality through validation, testing, and monitoring
- Optimize data storage and query performance across petabyte-scale datasets
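To give a flavor of the pipeline work, here is a minimal sketch of a daily extract-transform-load DAG in Airflow. Everything in it (the DAG id, task names, and stubbed task bodies) is a hypothetical illustration rather than our production code, and it assumes Airflow 2.4+ for the `schedule` parameter.

```python
# A minimal sketch of a daily ETL pipeline in Airflow. All names, paths,
# and task bodies are hypothetical placeholders, not Pankh.AI internals.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract(**context):
    # Pull the previous day's raw events (stub; real code would read from
    # object storage or an upstream API).
    print(f"extracting events for {context['ds']}")


def transform(**context):
    # Clean and aggregate the extracted events (stub; a real job would
    # typically hand this off to Spark).
    print("transforming events")


def load(**context):
    # Load the transformed data into the warehouse (stub).
    print("loading into warehouse")


with DAG(
    dag_id="daily_events_pipeline",  # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",               # Airflow 2.4+ parameter
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
):
    # Linear extract -> transform -> load dependency chain.
    (
        PythonOperator(task_id="extract", python_callable=extract)
        >> PythonOperator(task_id="transform", python_callable=transform)
        >> PythonOperator(task_id="load", python_callable=load)
    )
```

In practice the transform step would hand the heavy lifting to a Spark job rather than run inside the Airflow worker.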
What we're looking for
- 2-5 years of data engineering experience
- Strong proficiency in Python, SQL, and Apache Spark
- Experience with Airflow, dbt, or similar orchestration and transformation tools
- Knowledge of data warehouse design (Snowflake, BigQuery, or Redshift)
Nice to have
- Experience with real-time streaming (Kafka, Kinesis); a minimal consumer sketch follows this list
- Knowledge of data governance and compliance frameworks
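As an illustration of the streaming side, here is a minimal consumer sketch assuming the kafka-python client; the topic name, broker address, consumer group, and event schema are all hypothetical.

```python
# A minimal streaming sketch: consume JSON click events from a Kafka topic
# and keep a running per-user count. Topic, broker, group, and schema are
# hypothetical examples, not Pankh.AI infrastructure.
import json
from collections import Counter

from kafka import KafkaConsumer  # pip install kafka-python

consumer = KafkaConsumer(
    "click-events",                      # hypothetical topic
    bootstrap_servers="localhost:9092",  # hypothetical broker
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
    auto_offset_reset="earliest",
    group_id="analytics-counter",        # hypothetical consumer group
)

counts = Counter()
for message in consumer:  # blocks, yielding records as they arrive
    event = message.value  # e.g. {"user_id": "u1", "page": "/home"}
    counts[event["user_id"]] += 1
    print(f"user {event['user_id']} has {counts[event['user_id']]} clicks")
```

A production consumer would add schema validation, dead-letter handling, and durable state instead of an in-memory counter.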