Job Description
This role focuses on building and optimizing data frameworks using Python and PySpark, handling both structured and unstructured data from diverse sources. The engineer will follow software development best practices, optimize data workflows for performance and cost efficiency, and work closely with data teams to maintain high data quality.
Responsibilities
- Develop scalable data frameworks using Python and PySpark.
- Handle complex data from multiple sources and ensure efficient processing.
- Apply software engineering best practices in all development efforts.
- Optimize ETL pipelines for performance and cost.
- Collaborate with data teams to uphold data integrity and quality.
Requirements
- Strong command of Python, including OOP and design patterns.
- 2-4 years of relevant experience.
- Expertise in Apache Spark (PySpark) and SQL.
- Experience in ETL, data modeling, and data warehousing.
- Proficient in software engineering principles and version control.