Job Description
We are looking for a Machine Learning Engineer (MLE) to design, build, and optimize our machine learning operations. You will play a crucial role in scaling AI models from research to production, ensuring smooth model deployment, monitoring, and lifecycle management across our Google Cloud Platform (GCP) infrastructure. You'll work closely with data scientists, MLOps engineers, and data engineers to automate workflows, improve model performance, and ensure the reliability of the AI systems that serve millions of players worldwide.
What You'll Do
- Design, develop, and deploy machine learning models and solutions, leveraging tools such as LangGraph and MLflow for orchestration and lifecycle management.
- Collaborate on building and maintaining scalable data and feature pipeline infrastructure for real-time and batch processing, using tools like BigQuery, Bigtable, Dataflow, Composer (Airflow), Pub/Sub, and Cloud Run to support ML model training and inference.
- Develop and implement robust strategies for model monitoring and observability to detect model drift, bias, and performance degradation, leveraging tools like Vertex AI Model Monitoring and custom dashboards.
- Optimize ML model inference performance to improve latency and cost-efficiency of AI applications.
- Ensure the overall reliability, performance, and scalability of the ML models and data infrastructure platform, including proactive identification and resolution of issues related to model performance and data quality.
- Troubleshoot and resolve complex issues impacting ML models, data pipelines, and production AI systems.
- Ensure AI/ML models and workflows meet data governance, security, and compliance requirements, particularly those that apply to real-money gaming.
What We're Looking For
- 1+ years of experience as an ML Engineer, with a focus on developing and deploying machine learning models in production environments.
- Strong experience with Google Cloud Platform (GCP), including services relevant to ML and data infrastructure such as BigQuery, Dataflow, Vertex AI, Cloud Run, Pub/Sub, and Composer (Airflow).
- Solid grasp of containerization (Docker) and experience with Kubernetes orchestration platforms like GKE for deploying ML services.
- Experience building and deploying scalable data pipelines in production environments.
- Understanding of model monitoring, logging, and observability best practices for ML models and applications.
- Proficiency in Python and experience with ML frameworks (e.g., TensorFlow, PyTorch, scikit-learn).
- Familiarity with AI orchestration concepts using tools like LangGraph or LangChain is a bonus.
- Experience in gaming, real-time fraud detection, AI personalization systems, or agentic workflows is a bonus.