Skills measured

Microsoft Certified: Azure Data Engineer Associate – Certifications | Microsoft Learn

In most organizations, a data engineer is the primary role responsible for integrating, transforming, and consolidating data from various structured and unstructured data systems into structures that are suitable for building analytics solutions. An Azure data engineer also helps ensure that data pipelines and data stores are high-performing, efficient, organized, and reliable, given a specific set of business requirements and constraints.

Types of data

Data integration
Data Integration involves establishing links between operational and analytical services and data sources to enable secure, reliable access to data across multiple systems.

Data transformation
Operational data usually needs to be transformed into suitable structure and format for analysis, often as part of an extract, transform, and load (ETL) process

Data consolidation
Data consolidation is the process of combining data that has been extracted from multiple data sources into a consistent structure – usually to support analytics and reporting.

Common languages
SQL – One of the most common languages data engineers use is SQL.
Others , The use of notebooks is growing in popularity, and allows collaboration using different languages within the same notebook.

Operational and analytical data
Operational data is usually transactional data that is generated and stored by applications, often in a relational or non-relational database.

Analytical data is data that has been optimized for analysis and reporting, often in a data warehouse.

Streaming data
Streaming data refers to perpetual sources of data that generate data values in real-time, often relating to specific events.

Data pipelines
Data pipelines are used to orchestrate activities that transfer and transform data. Pipelines are the primary way in which data engineers implement repeatable extract, transform, and load (ETL) solutions that can be triggered based on a schedule or in response to events.

Data lakes
A data lake is a storage repository that holds large amounts of data in native, raw formats.
[…], The idea with a data lake is to store everything in its original, untransformed state. This approach differs from a traditional data warehouse, which transforms and processes the data at the time of ingestion.

Data warehouses
A data warehouse is a centralized repository of integrated data from one or more disparate sources. Data warehouses store current and historical data in relational tables that are organized into a schema that optimizes performance for analytical queries.

Apache Spark
Apache Spark is a parallel processing framework that takes advantage of in-memory processing and a distributed file storage. It’s a common open-source software (OSS) tool for big data scenarios.

Data engineers need to be proficient with Spark, using notebooks and other code artifacts to process data in a data lake and prepare it for modeling and analysis.

The core Azure technologies used to implement data engineering workloads include:

Azure Synapse Analytics
Azure Data Lake Storage Gen2

Azure Stream Analytics

Introduction to Azure Stream Analytics | Microsoft Learn

Choose a stream processing technology in Azure

Stream Analytics has first-class integration with four kinds of resources as inputs:

Understand inputs for Azure Stream Analytics | Microsoft Learn

Azure Data Factory