Job Description
We are looking for a talented and experienced Machine Learning Engineer (Interactive AI Systems) to join our team. The ideal candidate will be responsible for designing, developing, and deploying a highly realistic interactive system that can engage with users through speech and facial expressions. This role requires a deep understanding of machine learning, computer vision, natural language processing, and video synthesis technologies.
Responsibilities:
- Design and implement a robust architecture for an interactive AI system.
- Collect and preprocess high-quality video and audio data for training purposes.
- Develop and fine-tune Text-to-Speech (TTS) models (e.g., Tacotron 2, Transformer TTS) for natural and expressive speech synthesis.
- Implement voice cloning techniques to replicate specific voice characteristics.
- Apply facial landmark detection (e.g., Dlib, OpenPose) to map facial expressions and synchronize them with speech.
- Use Generative Adversarial Network (GAN)-based models such as Wav2Lip to generate realistic lip-synced video frames.
- Integrate high-pass filtering and noise suppression algorithms to improve speech detection and reduce background noise.
- Develop real-time speech-to-text and text-to-speech processing for user interaction.
- Ensure tight synchronization and integration of audio and video components for a seamless user experience.
- Collaborate with frontend and backend developers to create a user-friendly interface for text input and video playback.
- Continuously test, optimize, and refine the system based on user feedback and performance metrics.
- Stay up-to-date with the latest advancements in AI, machine learning, and video synthesis technologies.
Requirements:
- Proven experience in developing AI-driven applications, particularly in the fields of speech synthesis and video synthesis.
- Strong knowledge of machine learning frameworks and libraries such as TensorFlow, PyTorch, and Keras.
- Expertise in Text-to-Speech (TTS) systems, voice cloning, and natural language processing.
- Proficiency in computer vision techniques, including facial landmark detection and expression mapping.
- Hands-on experience with Generative Adversarial Networks (GANs) and video synthesis models (e.g., Wav2Lip).
- Familiarity with the Web Audio API and real-time speech detection techniques.
- Solid programming skills in Python and JavaScript, with experience in backend development (e.g., Node.js, Express).
- Ability to preprocess and analyze audio and video data effectively.
- Strong problem-solving skills and ability to work independently as well as in a collaborative team environment.
- Excellent communication skills and attention to detail.
- At least 3 years of experience.
Preferred Qualifications:
- Bachelor's or Master's degree in Computer Science, AI, Machine Learning, or a related field.
- Experience deploying machine learning models in production environments.
- Knowledge of high-pass filters, noise suppression algorithms, and audio signal processing.
- Previous experience in developing interactive applications with a focus on user experience.