Basic information
Ref Number
Last day to apply
Primary Location
Country
Job Type
Work Style
Description and requirements
AI & Infrastructure
Overview:
The Senior Software Engineer - AI & Infrastructure is a critical technical leadership role responsible for driving the development, deployment, and operational excellence of AI-powered applications and supporting infrastructure. This position demands a unique combination of AI engineering expertise, full-stack development capabilities, infrastructure-as-code proficiency, and a deep commitment to software engineering rigor and quality. As a senior technical leader and gatekeeper for AI projects, you will shape technical direction, champion coding standards, mentor team members, and ensure the successful delivery of production-ready AI applications.
This role is essential in establishing and maintaining high engineering standards across the entire stack—from AI application code to user-facing APIs, deployment pipelines, and production operations. You will work alongside our existing senior engineers while bringing complementary expertise in shipping production AI applications with the infrastructure, observability, and developer experience needed to support them.
Our team builds software that other teams within the company consume, making strong cross-team communication and developer support critical to success.
Key Responsibilities:
Technical Leadership & Architecture:
Serve as a technical gatekeeper and leader for AI projects, providing architectural oversight and technical direction from conception through production deployment.
Shape the technical architecture of AI applications, including API design (internal and external), data pipelines, and integration patterns.
Drive architectural decision-making to ensure solutions are scalable, maintainable, secure, and aligned with business objectives.
Lead technical design discussions and provide expert guidance on complex engineering challenges across the full stack.
Champion software engineering best practices across AI-specific code, backend and frontend APIs, infrastructure, and deployment pipelines.
AI Application Development & Delivery:
Design, develop, and deploy production-ready AI applications with a focus on reliability, performance, and user experience.
Contribute to and oversee both AI-specific application code and the surrounding infrastructure (APIs, data pipelines, integration layers, user interfaces).
Build robust APIs that provide seamless access to AI capabilities for both internal and external users.
Ensure AI applications are production-ready with proper error handling, monitoring, testing, and documentation.
Ship AI products to production on time and at high quality.
Infrastructure, DevOps & Platform Engineering:
Design and implement infrastructure-as-code to provision and manage cloud resources.
Build and maintain CI/CD pipelines to automate testing, building, and deployment processes.
Containerize and orchestrate application deployments for scalability and reliability.
Champion developer ergonomics by improving tooling, documentation, and workflows that enable team members to be productive quickly.
Optimize cloud infrastructure for cost, performance, and security.
Observability & Production Operations:
Implement comprehensive observability solutions to ensure visibility into system behavior and performance.
Establish monitoring, logging, alerting, and tracing for AI applications.
Lead rapid response to production incidents, conducting root cause analysis and implementing preventive measures.
Manage and optimize production environments for high availability, scalability, and resilience.
Code Quality & Engineering Excellence:
Uphold and enforce rigorous coding standards, design patterns, and engineering best practices across all codebases.
Conduct thorough code reviews with a focus on quality, maintainability, security, and performance.
Foster a culture of engineering excellence, continuous improvement, and technical accountability.
Lead by example in writing clean, well-tested, documented, and maintainable code.
Drive the adoption of automated testing strategies including unit, integration, and end-to-end tests.
Cross-Team Collaboration & Developer Support:
Work closely with other engineering teams across the company to help them integrate with and consume the software our team builds.
Provide technical guidance and support to developers from other teams, including documentation, examples, and troubleshooting assistance.
Communicate complex technical concepts clearly to both technical and non-technical stakeholders.
Collaborate effectively with Product Management, Data Science, QA, Design, and Operations teams.
Participate actively in agile ceremonies including sprint planning, daily stand-ups, retrospectives, and backlog refinement.
Contribute to project planning, estimation, and execution to ensure timely delivery without compromising quality.
Mentorship & Team Development:
Provide technical mentorship and guidance to junior and mid-level engineers, helping them grow their skills and advance their careers.
Share knowledge through documentation, technical talks, code reviews, and pair programming sessions.
Encourage and facilitate continuous learning within the team, staying current with emerging technologies and best practices.
Build a culture of collaboration, knowledge sharing, and collective ownership.
Required Qualifications:
Technical Skills:
Programming Languages:
Expert-level proficiency in **Python** for AI development and backend API development
Strong proficiency in **Rust** for high-performance backend services and APIs
Proficiency in **JavaScript/TypeScript** for UI component development
Experience with modern language features, design patterns, and idiomatic code
AI Engineering:
Proven experience building and deploying AI applications to production environments
Familiarity with modern AI frameworks and libraries (e.g., Haystack AI, LangChain, OpenAI SDKs)
Experience with AI application architecture including prompt engineering, RAG systems, or agent-based architectures
Understanding of how to integrate and consume AI model APIs
Infrastructure & DevOps:
Strong expertise in **Terraform** for infrastructure-as-code across multiple cloud providers
Proficiency with **Docker** for containerization and **Kubernetes** for orchestration
Experience designing and implementing **CI/CD pipelines** using **GitHub Actions** or similar tools
Understanding of cloud-native architectures, microservices, and distributed systems
Observability & Operations:
Hands-on experience with observability platforms such as OpenTelemetry, Grafana, Dynatrace, Datadog, Sentry, New Relic, or equivalents
Deep understanding of monitoring, logging, tracing, and alerting strategies for production systems
Experience with incident response and production support
API Development:
Strong experience designing and implementing RESTful APIs and/or GraphQL APIs
Understanding of API design principles, versioning, authentication, and rate limiting
Experience with API frameworks in Python (FastAPI, Flask, Django) and Rust (Axum, Actix, Rocket)
Frontend Development:
Experience with modern frontend frameworks, particularly **React**
Understanding of frontend best practices, state management, and component architecture
Data & Databases:
Strong experience working with relational databases
Proficiency in **SQL** for querying, schema design, and optimization
Understanding of database performance tuning and data modeling
Experience with generating embeddings and working with vector databases (e.g., turbopuffer, Pinecone, Weaviate, Qdrant, pgvector)
Understanding of similarity search and vector-based retrieval patterns
Additional Job Description
Professional Experience:
7+ years of professional software engineering experience, including at least 3 years in senior or lead roles
Demonstrated track record of **shipping production AI applications** on time and at high quality
Proven experience leading technical projects from design through deployment and operations
Experience working in agile development environments with cross-functional teams
Strong problem-solving skills with the ability to tackle complex technical challenges
Communication & Leadership:
Excellent communication skills, both written and verbal, with the ability to explain complex technical topics to diverse audiences
Strong ability to work with and support developers from other teams, helping them successfully integrate with your team's software
Strong mentorship capabilities with a passion for developing other engineers
Commitment to code quality, engineering rigor, and best practices
Ability to balance pragmatism with technical excellence
Self-motivated with strong ownership and accountability
Collaborative mindset with the ability to build consensus and drive decisions
Preferred Qualifications (Bonus Points):
Experience with multiple cloud providers (**GCP, AWS, Azure**)
Experience with traditional machine learning frameworks and techniques (scikit-learn, PyTorch, TensorFlow)
Experience with data engineering tools and pipelines (Airflow, dbt, Spark, etc.)
Contributions to open-source projects in AI, infrastructure, or related domains
Experience with LLM fine-tuning, evaluation, and optimization
Familiarity with security best practices for AI applications and cloud infrastructure
Experience with platform engineering and building internal developer platforms
Background in distributed systems, high-performance computing, or real-time systems
EEO Statement