Job Description
This is an opportunity for a Data Engineer to gain experience architecting scalable & efficient data solutions that solve global issues in telecommunications, sustainable water management & energy.
Working 100% Remotely | R1.2M – R1.4M per annum | Based in South Africa
THE COMPANY
This company is a leading global software, data, and AI solutions provider renowned for solving enterprise and city-wide challenges with cutting-edge technology. For the past decade, it has excelled in designing and deploying advanced AI, data, software, and IoT projects across the telecom, utilities, healthcare, and insurance sectors, empowering global enterprises to bridge the gap between their current capabilities and their digital future.
THE ROLE
As a Data Engineer, you will design, build, and maintain scalable data pipelines and infrastructure that support data-intensive applications tackling the world’s biggest issues in telecommunications, sustainable water management, energy, healthcare, climate change, smart cities, and other areas with a real impact on the world. Beyond developing data pipelines, you will design data architectures and oversee data engineering projects, working closely with cross-functional teams and contributing to the strategic direction of the company’s data initiatives.
Responsibilities:
DATA PIPELINE DEVELOPMENT: Lead the design, implementation, and maintenance of scalable data pipelines for ingesting, processing, and transforming large volumes of data from various sources using tools such as Databricks, Python, and PySpark.
DATA ARCHITECTURE: Architect scalable and efficient data solutions using the appropriate architecture design, opting for modern architectures where possible.
DATA MODELING: Design and optimize data models and schemas for efficient storage, retrieval, and analysis of structured and unstructured data.
ETL PROCESSES: Develop, optimize, and automate ETL workflows to extract data from diverse sources, transform it into usable formats, and load it into data warehouses, data lakes, or lakehouses.
BIG DATA TECHNOLOGIES: Utilize big data technologies such as Spark, Kafka, and Flink for distributed data processing and analytics.
CLOUD PLATFORMS: Deploy and manage data solutions on cloud platforms such as AWS, Azure, or Google Cloud Platform (GCP), leveraging cloud-native services for data storage, processing, and analytics.
TECH STACK: Python, Java, Scala, or SQL | Databricks, dbt, Docker, PySpark | AWS, Azure, and GCP
THE REQUIREMENTS
BSc, Mathematics degree, or equivalent tertiary education
At least 4 years of experience programming in Python, Java, or Scala, with commercial experience in Azure and other cloud services.
Proficiency in Apache Spark or PySpark is essential.
Key technologies include Python, SQL, Apache Spark, Azure, Data Warehousing, Lakehouse, Big Data, Relational Databases, Data Governance, and Data Quality.
Azure or AWS certifications (advantageous)