Platform Reliability Engineer

Job Overview

Location

Remote, Any Country

Job Type

Full Time

Date Posted

20 days ago

Additional Details

Job ID

2218

Job Description

The Platform Reliability Engineer (PRE) ensures the reliability, observability, and operational excellence of KnowledgeCity’s cloud infrastructure and internal platforms.

This role focuses on building and maintaining monitoring systems, dashboards, and status pages that provide real-time visibility into infrastructure health and performance across environments.

The PRE is responsible for maintaining the inventory of infrastructure nodes, developing observability tools, coordinating incident response, and driving post-incident reviews and data-driven optimization initiatives.

Through a deep understanding of system reliability, automation, and monitoring technologies, the Platform Reliability Engineer bridges development, DevOps, and support teams, enabling proactive issue detection, transparent communication, and continuous improvement of our SaaS platforms.

Key Responsibilities

Infrastructure Visibility and Monitoring

Design, implement, and maintain end-to-end observability solutions using Prometheus, Grafana, Loki, Alertmanager, or similar tools.
Develop infrastructure inventory systems to track all nodes, environments, and services.
Ensure system health metrics (CPU, memory, disk, latency, response time) are consistently collected and visualized.
Create and maintain Grafana dashboards tailored for developers, operations, and clients.
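As an illustration of how the node inventory and the monitoring data can be cross-checked, the Python sketch below compares an inventory against the result of a Prometheus instant query for the built-in `up` metric (the decoded JSON body returned by `/api/v1/query`). It assumes inventory entries are keyed by the Prometheus `instance` label and leaves out the HTTP call itself; `InventoryReport` and `reconcile` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class InventoryReport:
    down: list = field(default_factory=list)         # scraped, but up == 0
    unmonitored: list = field(default_factory=list)  # in inventory, never scraped
    unknown: list = field(default_factory=list)      # scraped, but not in inventory


def reconcile(inventory: set, up_response: dict) -> InventoryReport:
    """Cross-check a node inventory against a Prometheus `up` instant query.

    `up_response` is the decoded JSON body of GET /api/v1/query?query=up.
    Assumption: inventory entries match the `instance` label exactly.
    """
    if up_response.get("status") != "success":
        raise ValueError("Prometheus query did not succeed")

    scraped = {}
    for sample in up_response["data"]["result"]:
        instance = sample["metric"].get("instance", "")
        # Each instant-vector sample value is [unix_timestamp, "value_as_string"].
        scraped[instance] = float(sample["value"][1])

    return InventoryReport(
        down=sorted(i for i, v in scraped.items() if v == 0),
        unmonitored=sorted(inventory - set(scraped)),
        unknown=sorted(set(scraped) - inventory),
    )
```

A report with non-empty `unmonitored` or `unknown` lists points at drift between the recorded inventory and what is actually being scraped.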

Incident Management and Status Reporting

Lead incident response processes: detection, escalation, resolution, and post-mortem analysis.
Manage and update internal and external status pages reflecting real-time service health and historical uptime.
Publish incident reports and root cause summaries to communicate clearly with stakeholders.
Define and monitor SLIs, SLOs, and SLAs to measure and improve service reliability.
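SLIs and SLOs are commonly expressed as the ratio of good events to total events over a window, with the error budget being the share of failures the SLO still tolerates. A minimal Python sketch under that assumption; the function names and the example figures are illustrative, not KnowledgeCity targets.

```python
def error_budget(good: int, total: int, slo: float) -> dict:
    """Availability SLI and error budget for one measurement window.

    good / total is the SLI; (1 - slo) * total is the number of failed
    events the SLO tolerates. budget_remaining drops below 0 once the
    SLO has been breached.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 < slo < 1:
        raise ValueError("slo must be between 0 and 1, e.g. 0.999")
    failures = total - good
    allowed_failures = (1 - slo) * total
    return {
        "sli": good / total,
        "allowed_failures": allowed_failures,
        "budget_remaining": 1 - failures / allowed_failures,
    }


def allowed_downtime_minutes(slo: float, window_days: int = 30) -> float:
    """Downtime a time-based availability SLO permits over the window."""
    return (1 - slo) * window_days * 24 * 60
```

For example, 999,500 successful requests out of 1,000,000 against a 99.9% SLO give an SLI of 0.9995 and leave roughly half of the error budget, while a 99.9% target over 30 days allows about 43.2 minutes of downtime.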

Reliability Engineering and Automation

Automate reliability checks, uptime probes, and health verifications using scripting or infrastructure-as-code tools.
Implement synthetic monitoring and proactive alerting for critical paths (API, LMS, Portal, Database, etc.).
Identify and eliminate recurring incidents by implementing preventive monitoring and self-healing automation.
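Uptime probes and proactive alerting of this kind can start as small as a scheduled HTTP check. The sketch below uses only the Python standard library; the 2-second latency ceiling and the rule of alerting after three consecutive failures are assumptions chosen for illustration, and `probe`, `should_alert`, and `ProbeResult` are hypothetical names.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProbeResult:
    url: str
    ok: bool
    status: Optional[int]
    latency_ms: float
    error: str = ""


def probe(url: str, timeout_s: float = 5.0, max_latency_ms: float = 2000.0) -> ProbeResult:
    """Issue one synthetic GET and classify it as healthy or not."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            status = resp.status
    except urllib.error.HTTPError as exc:
        # 4xx/5xx: the server answered, but not with a success code.
        status = exc.code
    except OSError as exc:
        # URLError (DNS failure, connection refused) and timeouts are OSError subclasses.
        elapsed = (time.monotonic() - start) * 1000
        return ProbeResult(url, False, None, elapsed, str(exc))
    elapsed = (time.monotonic() - start) * 1000
    healthy = 200 <= status < 300 and elapsed <= max_latency_ms
    return ProbeResult(url, healthy, status, elapsed)


def should_alert(history: List[ProbeResult], consecutive: int = 3) -> bool:
    """Alert only after `consecutive` failures in a row, to avoid flapping on one blip."""
    recent = history[-consecutive:]
    return len(recent) == consecutive and not any(r.ok for r in recent)
```

In practice the probe would run on a schedule against each critical path, with `should_alert` feeding whatever alerting channel is in use.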

Collaboration and Knowledge Sharing

Partner with DevOps, Development, QA, and Support teams to enhance observability and response processes.
Provide data insights and reliability reports that support development and optimization decisions.
Maintain documentation and runbooks for monitoring, alerts, and incident handling.
Advocate for a data-driven reliability culture across all technical teams.

Continuous Improvement

Continuously refine monitoring dashboards, alerting rules, and uptime metrics for better accuracy and usability.
Evaluate and integrate emerging observability and reliability tools to enhance the monitoring stack.
Conduct reliability reviews after major releases or infrastructure changes.
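Reliability reviews after a release often compare incident metrics such as mean time to recovery (MTTR) before and after the change. A minimal sketch, assuming each incident record carries detection and resolution timestamps; measuring from detection rather than from onset is itself an assumption, since teams define MTTR differently, and the names used here are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean
from typing import List, Tuple


@dataclass
class Incident:
    detected: datetime  # when monitoring or a person first noticed the issue
    resolved: datetime  # when service was restored


def mttr_minutes(incidents: List[Incident]) -> float:
    """Mean time to recovery, measured here from detection to resolution."""
    if not incidents:
        raise ValueError("no incidents in window")
    return mean((i.resolved - i.detected).total_seconds() / 60 for i in incidents)


def split_by_release(
    incidents: List[Incident], released_at: datetime
) -> Tuple[List[Incident], List[Incident]]:
    """Partition incidents into those detected before and after a release."""
    before = [i for i in incidents if i.detected < released_at]
    after = [i for i in incidents if i.detected >= released_at]
    return before, after
```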

Qualifications

Technical Expertise

Solid experience in observability and monitoring systems: Grafana, Prometheus, Loki, Alertmanager, ELK Stack, or similar.
Strong knowledge of cloud infrastructure (AWS, GCP, Oracle, or Azure) and Linux systems administration.
Familiarity with Infrastructure as Code (Terraform, Ansible) and CI/CD pipelines.
Understanding of incident response, post-mortem analysis, and reliability metrics (SLI/SLO/SLA).
Competency in scripting (Bash, Python, PHP) for automation and tool integration.
Experience with container orchestration (Docker, Kubernetes) and associated monitoring.
Knowledge of network performance monitoring and synthetic testing tools.

Problem-Solving

Analytical mindset with the ability to transform raw metrics into actionable insights.
Capable of diagnosing performance bottlenecks and improving system reliability through automation.
Skilled in root-cause analysis and preventive design.

Communication

Excellent communication and documentation skills for incident summaries and cross-team updates.
Ability to collaborate effectively with technical and non-technical teams.
Advanced English proficiency, both written and spoken, to report incidents and produce public-facing updates.
