Job Description
MLOps / ML Platform Engineer
Are you an MLOps / ML Platform Engineer focused on building scalable, automated pipelines that carry models from experiment to inference? ALETHIA AI is hiring for this role: you will build reproducible pipelines, orchestrate deployments, and monitor performance so AI features ship safely and improve continuously.
ALETHIA AI is leading the Agentic AI movement across industries. Through partnerships with Solana Mobile, AWS, OpenSea, and many more, we’re building the foundation for decentralized AI. These collaborations have enabled us to launch the world’s first intelligent NFTs (iNFTs), expand our AI capabilities, and bring on-chain AI agents to mobile.
As our MLOps / ML Platform Engineer, you will version data and models, instrument drift, and optimize serving cost and latency. We need a systems thinker who streamlines lifecycle processes and ensures experiments translate into observable, maintainable production capabilities.
Key Responsibilities
- Design and automate end-to-end MLOps pipelines—data ingestion, feature processing, experiment tracking, evaluation, CI/CD for ML, model registry/versioning, and monitoring—to reliably ship models to production. Scope includes fine-tuning and operationalizing pretrained models.
- Collaborate closely with data scientists and ML engineers to understand model requirements and tailor the deployment process for each model.
- Build and operate scalable model-serving systems using FastAPI for REST/gRPC endpoints and low-latency streaming (WebSockets, with WebRTC/RTMP experience a strong plus), packaged with Docker and orchestrated on Kubernetes for autoscaling, progressive rollouts, and observability.
- Ensure robust version control and reproducibility for models and datasets—manage model versions, data lineage, and configuration using appropriate tools and practices.
- Continuously monitor the performance and accuracy of deployed models in production, implementing alerts and dashboards to track metrics like response time, data drift, and prediction quality.
- Troubleshoot and resolve issues related to model deployment and performance (e.g., model failures, environment incompatibilities, pipeline bottlenecks) in a timely manner.
- Implement ML-focused CI/CD best practices using Python-based pipelines, Terraform-managed infrastructure, and EKS deployments, with strong proficiency in observability and logging stacks—especially ELK with Filebeat—and monitoring tools like Grafana.
- Work with the platform and security teams to ensure all MLOps activities comply with security and data privacy standards, and that sensitive data is handled appropriately throughout the ML lifecycle.
- Keep up-to-date with the latest MLOps tools, frameworks, and trends, and introduce improvements to our ML infrastructure and processes as needed.
Requirements
- Bachelor’s degree in Computer Science, Data Engineering, or a related field.
- 3+ years of experience in a DevOps, MLOps, or related engineering role, preferably with exposure to deploying machine learning or data-intensive applications.
- Proven experience deploying and monitoring machine learning models at scale is required, along with familiarity with Web3 environments.
- Solid understanding of machine learning workflows and lifecycle, as well as familiarity with popular ML frameworks (TensorFlow, PyTorch, Scikit-learn).
- Experience with cloud platforms and their ML services (AWS, GCP AI Platform, Azure ML, etc.), and proficiency in setting up cloud resources for data processing and model deployment.
- Proficiency in programming (Python is commonly used in ML) and experience with automation scripts; comfort with both software development and basic data engineering tasks.
- Hands-on experience with containerization (Docker) and in-depth Kubernetes expertise for orchestration (required), along with knowledge of CI/CD tools and infrastructure-as-code for automated deployments.
- Knowledge of monitoring tools and practices specifically for ML (tracking model performance, detecting data drift, etc.).
- Strong problem-solving and debugging skills, with an ability to work across interdisciplinary teams (with data scientists, software engineers, IT).
- Passion for improving and automating processes; stays current with emerging practices in MLOps and scalable ML architecture.
Remuneration
Competitive, with role-aligned incentives and growth opportunities.
If you’re excited to build a community at the forefront of AI and Web3 innovation, we’d love to hear from you. Apply by sending your CV to [email protected] or feel free to reach out directly to discuss this opportunity. Know someone who might be a fit? Please share this post with them!