==> Customer 360: the team needs a fault-tolerant way to consolidate and keep all customer information up to date.
• Closed orders against the cluster: the "ORDERS" data is delivered by a third party between 5 PM and 6 PM (Orders_file.csv -> S3 bucket, 5 PM - 6 PM)
• Customer-related information ("CUSTOMERS-INFORMATION") lives in a relational database such as MySQL
• The CRM team keeps all customer information in MySQL/Oracle; the two sources are S3 ("Orders") and MySQL ("Customer_Info")
• Filter the orders down to the "CLOSED" ones
• Load both datasets into Hive
• Notification & HBase data loading
S3 files (HTTPS connection). [In Airflow] define an HTTP Sensor connection (Conn Id / Host / Port / Username / Password / Schema), then SSH into the edge node.
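Airflow's HTTP sensor repeatedly "pokes" the S3 endpoint until the orders file appears or a timeout expires. A minimal sketch of that poke loop in plain Python (the `file_is_available` stub is hypothetical and stands in for the real HTTPS check against the S3 object URL):

```python
import time

def poke_until(check, poke_interval=60, timeout=3600):
    """Call check() every poke_interval seconds until it returns True
    or timeout seconds elapse (mirrors Airflow sensor behaviour)."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(poke_interval)
    return False

# Illustrative stub: the real check would issue an HTTPS HEAD/GET
# against the S3 object URL and look for a 200 response.
def file_is_available():
    return True

print(poke_until(file_is_available, poke_interval=1, timeout=5))  # True
```

In the real DAG this loop is what `HttpSensor` does for you between 5 PM and 6 PM, so the downstream tasks only start once Orders_file.csv has landed.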
- Download the files from S3 to the local filesystem of the edge node
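The download step can be driven from the edge node with the AWS CLI; a small sketch that builds the command (bucket name and paths are placeholders, not the real ones):

```python
def s3_download_cmd(bucket, key, dest):
    """Build the AWS CLI command that copies an S3 object to the edge node."""
    return ["aws", "s3", "cp", f"s3://{bucket}/{key}", dest]

cmd = s3_download_cmd("orders-bucket", "Orders_file.csv", "/tmp/Orders_file.csv")
# On the edge node this would be executed with e.g.
#   subprocess.run(cmd, check=True)
print(" ".join(cmd))
```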
- Sqoop fetches Customers_Info from MySQL and dumps it into Hive
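The Sqoop step is a `sqoop import` with `--hive-import`, which lands the MySQL table straight into a Hive table. A sketch that assembles the command (host, database, credentials, and table names are assumptions):

```python
def sqoop_import_cmd(jdbc_url, user, password_file, table, hive_table, mappers=4):
    """Build a `sqoop import` that pulls a MySQL table straight into Hive."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", user,
        "--password-file", password_file,  # safer than --password on the CLI
        "--table", table,
        "--hive-import",
        "--hive-table", hive_table,
        "-m", str(mappers),                # number of parallel map tasks
    ]

cmd = sqoop_import_cmd(
    "jdbc:mysql://mysql-host:3306/crm",    # placeholder host/db
    "etl_user", "/user/etl/.mysql.pwd",
    "customers_info", "default.customers_info",
)
print(" ".join(cmd))
```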
- Upload the S3 orders file to an HDFS location
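Staging the file on HDFS is a `hdfs dfs -put`; a sketch of the command (the HDFS target directory is a placeholder):

```python
def hdfs_put_cmd(local_path, hdfs_path, overwrite=True):
    """Build the `hdfs dfs -put` that stages the orders file on HDFS."""
    cmd = ["hdfs", "dfs", "-put"]
    if overwrite:
        cmd.append("-f")  # overwrite a copy left behind by a previous run
    cmd += [local_path, hdfs_path]
    return cmd

print(" ".join(hdfs_put_cmd("/tmp/Orders_file.csv", "/data/orders/")))
```

Using `-f` keeps the task idempotent on reruns, which matters for the fault-tolerance requirement.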
- Spark program [submit the Spark job]
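The Spark job's core transformation is just a filter on order status, roughly `df.filter(col("order_status") == "CLOSED")` in PySpark. A dependency-free sketch of the same logic over CSV rows (the column names are an assumption about the third-party file):

```python
import csv
import io

def closed_orders(csv_text):
    """Keep only rows whose order_status is CLOSED (mirrors the Spark filter)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row["order_status"] == "CLOSED"]

sample = "order_id,order_status\n1,CLOSED\n2,OPEN\n3,CLOSED\n"
print(closed_orders(sample))  # only orders 1 and 3 survive
```

The real job would read from the HDFS path of the previous step and write the filtered result back to an HDFS output path for the Hive step below.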
- Create a Hive table over the output path written by the Spark job in step 4
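One way to expose the Spark output to Hive is an external table pointed at the output directory. The columns, delimiter, and HDFS path below are assumptions; adjust them to the real Spark output:

```python
# Hypothetical schema and HDFS path -- adjust to the real Spark output.
CLOSED_ORDERS_DDL = """
CREATE EXTERNAL TABLE IF NOT EXISTS closed_orders (
  order_id     STRING,
  customer_id  STRING,
  order_status STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/orders/closed/';
""".strip()

print(CLOSED_ORDERS_DDL)
```

An external table leaves the Spark-written files in place, so dropping the table never deletes the data.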
- Load it into HBase (via the Hive-HBase connector)
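The Hive-HBase connector maps a Hive table onto an HBase table through the `HBaseStorageHandler`; rows inserted into the Hive table then land in HBase. Table names and the column mapping below are assumptions:

```python
# Hypothetical names; hbase.columns.mapping ties Hive columns to
# HBase column-family:qualifier pairs (:key is the row key).
HBASE_BACKED_DDL = """
CREATE TABLE IF NOT EXISTS closed_orders_hbase (
  order_id     STRING,
  customer_id  STRING,
  order_status STRING
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,o:customer_id,o:order_status')
TBLPROPERTIES ('hbase.table.name' = 'closed_orders');
""".strip()

print(HBASE_BACKED_DDL)
```

After creating it, an `INSERT INTO closed_orders_hbase SELECT ... FROM closed_orders` in Hive pushes the filtered data into HBase.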
- Post success/failure of the pipeline to a Slack #channel
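Success/failure can be posted to Slack through an incoming-webhook URL; a minimal payload builder (the webhook URL and message format are placeholders):

```python
import json

def slack_message(pipeline, status):
    """Build the JSON payload for a Slack incoming webhook."""
    icon = ":white_check_mark:" if status == "success" else ":x:"
    return json.dumps({"text": f"{icon} Pipeline `{pipeline}` finished with status: {status}"})

payload = slack_message("customer_360", "success")
# POSTing it would look roughly like (webhook URL is a placeholder):
#   req = urllib.request.Request("https://hooks.slack.com/services/...",
#                                data=payload.encode(),
#                                headers={"Content-Type": "application/json"})
#   urllib.request.urlopen(req)
print(payload)
```

Hooked into the DAG's on-success/on-failure callbacks, this covers both notification paths with one function.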