This repository has been archived by the owner on Jul 2, 2022. It is now read-only.

DAT263 - Create and Manage End-to-End Data Pipelines with SAP Data Intelligence

Tutorial Description

This repository contains the material for the SAP TechEd 2020 workshop with Session ID - DAT263. This tutorial can be completed as part of an SAP guided workshop or on your own time by using your own SAP Data Intelligence instance.

Overview

This session introduces participants to creating data pipelines with the SAP Data Intelligence Modeler. We try to touch on as many aspects as possible within an interactive 2h workshop. We follow a use case based on a customer request in the area of IoT and quality management. The background story is quite simple.

If you are doing these tutorials as part of a workshop, please follow the 2h tutorials.

If you are doing these tutorials on your own time, please follow the 3h tutorials, which include two additional exercises: file concatenation and Jupyter Notebook analysis.

Scenario description

On a daily basis, a customer receives the configured values of several IoT devices, which state the nominal value that each device should produce. We refer to this as the configuration dataset. Throughout the day, the actual performance values of each device are received. We refer to this as the performance dataset. All datasets are stored as files in separate subdirectories in an object store, e.g. an Amazon S3 bucket.

Process

  1. Append all configuration files and all performance files into one file each and store the results in another object store location (3h tutorials only).

  2. Merge the two resulting files into a HANA table by using projections, aggregation and joining.

  3. Do a simple data validation and create a quality management service ticket for the data that fails it.

  4. To improve the quality check, a data scientist should be able to analyze the IoT data and eventually develop an early alert schema (3h tutorials only).

  5. Expose the central device configuration and performance table via a web service so that the device status can be retrieved from outside.
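The first three steps can be sketched outside the Modeler in plain Python. This is only an illustration of the data flow, not the workshop's pipeline: the column names (`device_id`, `nominal_value`, `value`), the sample values, the 10% tolerance, and the helper functions are all hypothetical, and in-memory strings stand in for the files in the object store and for the HANA table.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical file contents; column names and values are invented for illustration.
CONFIG_FILES = [
    "device_id,nominal_value\nA,10.0\nB,20.0\n",
    "device_id,nominal_value\nC,30.0\n",
]
PERFORMANCE_FILES = [
    "device_id,value\nA,9.8\nB,20.5\n",
    "device_id,value\nA,10.4\nB,26.5\nC,30.3\n",
]


def append_files(files):
    """Step 1: append CSV files that share a header into one list of rows."""
    rows = []
    for content in files:
        rows.extend(csv.DictReader(io.StringIO(content)))
    return rows


def merge(config_rows, performance_rows):
    """Step 2: aggregate the readings per device, then join with the configuration."""
    readings = defaultdict(list)
    for row in performance_rows:
        readings[row["device_id"]].append(float(row["value"]))
    merged = []
    for row in config_rows:
        device = row["device_id"]
        if device in readings:  # inner join on device_id
            merged.append({
                "device_id": device,
                "nominal_value": float(row["nominal_value"]),
                "mean_value": mean(readings[device]),
            })
    return merged


def validate(merged_rows, tolerance=0.1):
    """Step 3: return rows whose mean deviates from nominal by more than the relative tolerance."""
    return [
        row for row in merged_rows
        if abs(row["mean_value"] - row["nominal_value"]) > tolerance * row["nominal_value"]
    ]


config = append_files(CONFIG_FILES)
performance = append_files(PERFORMANCE_FILES)
table = merge(config, performance)
failed = validate(table)
for row in failed:
    # Stand-in for creating a quality management service ticket.
    print(f"ticket: device {row['device_id']} deviates from its nominal value")
```

With this sample data, only device `B` exceeds the assumed tolerance. In the tutorials, the same flow is built from Modeler operators rather than hand-written functions.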

Acquired Skills

After completing all the tasks, you will be familiar with the general concept of using operators in the SAP Data Intelligence Modeler, with reading data from and ingesting data into multiple data sources, and with analyzing this data using Jupyter Notebook.

Requirements

  • One of the following SAP Data Intelligence versions:

    • SAP Data Intelligence 3.1 On-premise edition, patch 0
    • SAP Data Intelligence 3.1 Trial Edition
    • SAP Data Intelligence Cloud Edition 2010 or newer
  • Chrome browser (Recommended)

  • [Workshop participants only] Login credentials to your SAP Data Intelligence Cloud instance

  • [Self-guided users only] The following connections must be created in Connection Manager.

    • A cloud storage connection, e.g. S3 / GCS / WASB / ADL
    • A Semantic Data Lake (SDL) connection
    • A HANA database connection

    Note that the above connections are already predefined in SAP Data Intelligence 3.1 Trial Edition.

Video walkthrough at SAP HANA Academy

If you do not have access to an instance of SAP Data Intelligence, or want to review the tutorials, you can watch a video walkthrough on [SAP HANA Academy](https://www.youtube.com/playlist?list=PLkzo92owKnVyY89xEshp_cSQ0QF8EE927).

Exercises

2h Workshop (Guided workshop tutorials)

3h Workshop (Self-guided tutorials)

How to obtain support

Support for the content in this repository is available during the actual time of the online session for which this content has been designed. Otherwise, you may request support via the Issues tab.

License

Copyright (c) 2021 SAP SE or an SAP affiliate company. All rights reserved. This file is licensed under the Apache Software License, version 2.0 except as noted otherwise in the LICENSE file.