Job Overview

Location
Lahore, Punjab
Job Type
Full Time
Date Posted
2 months ago

Additional Details

Job ID
1373
Career Level
Experienced Professional
Job Shift
First Shift (Day)
Gender
Any
Work Mode
On-site
Degree Requirement
Bachelor's
No. of Positions
1
Years of Experience
3-4

Job Description

We are looking for a talented and experienced Machine Learning Engineer (Interactive AI Systems) to join our team. The ideal candidate will be responsible for designing, developing, and deploying a highly realistic interactive system that can engage with users through speech and facial expressions. This role requires a deep understanding of machine learning, computer vision, natural language processing, and video synthesis technologies.


Responsibilities:

  • Design and implement a robust architecture for an interactive AI system.
  • Collect and preprocess high-quality video and audio data for training purposes.
  • Develop and fine-tune Text-to-Speech (TTS) models (e.g., Tacotron 2, Transformer TTS) for natural and expressive speech synthesis.
  • Implement voice cloning techniques to replicate specific voice characteristics.
  • Apply facial landmark detection (e.g., Dlib, OpenPose) to map facial expressions and synchronize with speech.
  • Use GAN-based lip-sync models (e.g., Wav2Lip) to generate realistic lip-synced video frames.
  • Integrate a high-pass filter and noise suppression algorithms to enhance speech detection and reduce background noise.
  • Develop real-time speech-to-text and text-to-speech processing for user interaction.
  • Ensure seamless integration of audio and video components to provide a smooth, uninterrupted user experience.
  • Collaborate with frontend and backend developers to create a user-friendly interface for text input and video playback.
  • Continuously test, optimize, and refine the system based on user feedback and performance metrics.
  • Stay up to date with the latest advancements in AI, machine learning, and video synthesis technologies.


Requirements:

  • Proven experience in developing AI-driven applications, particularly in the fields of speech synthesis and video synthesis.
  • Strong knowledge of machine learning frameworks and libraries such as TensorFlow, PyTorch, and Keras.
  • Expertise in Text-to-Speech (TTS) systems, voice cloning, and natural language processing.
  • Proficiency in computer vision techniques, including facial landmark detection and expression mapping.
  • Hands-on experience with Generative Adversarial Networks (GANs) and video synthesis models (e.g., Wav2Lip).
  • Familiarity with the Web Audio API and real-time speech detection techniques.
  • Solid programming skills in Python and JavaScript, with experience in backend development (e.g., Node.js, Express).
  • Ability to preprocess and analyze audio and video data effectively.
  • Strong problem-solving skills and ability to work independently as well as in a collaborative team environment.
  • Excellent communication skills and attention to detail.


Preferred Qualifications:

  • Bachelor's or Master's degree in Computer Science, AI, Machine Learning, or a related field.
  • At least 3 years of relevant experience.
  • Experience with deploying machine learning models in production environments.
  • Knowledge of high-pass filters, noise suppression algorithms, and audio signal processing.
  • Previous experience in developing interactive applications with a focus on user experience.
