Key Responsibilities
· Build, deploy, and manage data pipelines using Python and PySpark
· Develop and optimize ETL/ELT processes to support data integration across systems
· Work directly with GCP and AWS services to implement scalable cloud-based data solutions
· Own data workflows end to end, from ingestion through transformation to storage
· Continuously monitor and improve pipeline reliability, speed, and data quality
· Use GenAI and automation tools to speed up development and reduce manual effort
· Proactively debug, troubleshoot, and resolve data engineering issues
· Ensure data is available and trustworthy for analytics and downstream systems
· Deliver high-quality code and documentation with a bias for action
Required Skills & Qualifications
· Strong proficiency in Python and PySpark
· Hands-on experience with cloud platforms such as GCP and/or AWS
· Solid understanding of ETL/ELT, data warehouse, and data lake concepts
· Proficient with relational databases (preferably PostgreSQL and MySQL) as well as non-relational databases, with a focus on MongoDB
· Driven by delivery and results; you get things done efficiently
· Self-starter attitude with minimal need for hand-holding
· Enthusiasm for automating work with GenAI or scripting tools
· Familiarity with slowly changing dimensions (SCD), change data capture (CDC), and real-time streaming vs. batch processing
Nice to Have
· Experience with CI/CD pipelines and Docker
· Understanding of data governance and observability
· Prior experience in fast-paced, execution-heavy teams
Why Join Us?
· High-impact role where execution speed is valued and recognised
· Freedom to build, ship, and iterate without red tape
· Work with a lean, high-performing data team
· Opportunities to innovate with Generative AI tools
· Fast learning environment and ownership from day one
If you're someone who prefers delivering over deliberating, apply now and help us build data infrastructure that moves the needle.