
✨Add toolbar dropdown to do remote run options #164

Merged: 27 commits from adry/config-spark into master on Jul 5, 2022
Conversation

@AdrySky (Contributor) commented Jun 9, 2022

Description

This will enable adding and executing a remote custom run (e.g. Spark Submit) with its own configuration, using the subprocess module. Initially it was only for spark submit, but now it is more general, so other execution types besides spark submit can be added.

It uses the file config.ini to get the configuration data. There are 3 separate sections to fill, which are:

  1. REMOTE_EXECUTION = The main run types
  2. RUN_TYPES = Each separate run type with its configuration
  3. CONFIGURATION = Each configuration's data (e.g. name, command, url, msg). Required to be filled.

Note: Every time config.ini is updated, xircuits only detects the change after changing the run type on the toolbar.
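For illustration, the three sections might be laid out roughly as below; the exact section and key names shipped with this PR may differ, so treat this as a sketch only:

    ; hypothetical sketch -- the real config.ini in this PR may use different keys
    [REMOTE_EXECUTION]
    ; the main run types shown in the run dialog
    RUN_TYPES = SPARK

    [RUN_TYPES]
    ; each run type with its configuration(s)
    SPARK = LOCAL_SUBMIT

    [LOCAL_SUBMIT]
    ; configuration data (all fields required): name, command, url, msg
    name = Local spark-submit
    command = spark-submit --master local[*]
    url = http://localhost:4040
    msg = Submitting the compiled workflow with spark-submit.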

Pull Request Type

  • Xircuits Core (Jupyterlab Related changes)
  • Xircuits Canvas (Custom RD Related changes)
  • Xircuits Component Library
  • Testing Automation
  • Documentation
  • Others (Please Specify)

Type of Change

  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Tests

  1. Able to run the default spark submit.
    1. Currently, config.ini contains a spark-submit configuration by default.
    2. Open xircuits.
    3. Click on the run dropdown menu and choose Remote Run.
    4. Click the run button. A run dialog will be prompted; inside it there are two dropdown menus: the run types and their configurations.
    5. Choose 'SPARK' for the run type and any of its configurations.
    6. It should be able to run, and the output should appear in the output panel.
  2. Add a new run type in config.ini.
    1. I have added a dummy entry to show how to add a new run type.
    2. Just uncomment
      1. TEST for REMOTE_EXECUTION
      2. TEST's configuration in RUN_TYPES
    3. Make sure xircuits picks up the latest update from config.ini by changing the run type from the toolbar.
    4. Click Run.
    5. Inside the run dialog, the run types dropdown will now have a new run type called 'TEST'.
    6. Clicking the 'TEST' run type will show its own configurations, which are EG and EG2.
    7. Clicking any of them will show its command.

Tested on?

  • Windows
  • Linux Ubuntu
  • Centos
  • Mac
  • Others (State here -> xxx )

Notes

Thinking of using something like a .json file instead of config.ini. IMO, .json is more user-friendly in terms of data structure.
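For comparison, the same data expressed as JSON might look something like this (illustrative values only, mirroring the sketch above rather than the actual file):

    {
      "SPARK": {
        "LOCAL_SUBMIT": {
          "name": "Local spark-submit",
          "command": "spark-submit --master local[*]",
          "url": "http://localhost:4040",
          "msg": "Submitting the compiled workflow with spark-submit."
        }
      }
    }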

@AdrySky added the enhancement label Jun 9, 2022
@AdrySky requested a review from MFA-X-AI June 9, 2022 09:56
@AdrySky self-assigned this Jun 9, 2022
@AdrySky force-pushed the adry/config-spark branch from ac20cfc to 7025ddf June 27, 2022 09:53
@AdrySky changed the title ✨Add toolbar dropdown to choose run on CPU / GPU / VE ✨Add toolbar dropdown to do remote run options Jun 28, 2022
@AdrySky marked this pull request as ready for review June 28, 2022 03:47

@MFA-X-AI (Member) left a comment


Awesome work, looks like a big PR! I've tried the two tests you mentioned, and it looks like it's running well for both. I'll try this distribution on our spark cluster soon.

From my local machine though, I went ahead and experimented with the command feature; looks like we can do some nice stuff like echo-ing commands.

So there are a few things I noticed:

  1. Beside the Hyperparameter there appears to be a dropdown icon.

  2. The window appears to have a double resize. I think the inner one is enough, as adjusting the size of the outer one will misadjust the inner one.
    [screenshot: double resize]

  3. I don't think we need the "Also, you can go to Kraftboard to check the benchmarks" string for the default distribution. This message can be added to the config msg instead.

Those 3 for now; I'll add more comments when I've tested it out more. Thanks!


@MFA-X-AI (Member) left a comment


I've tested it on our server with a cluster, and things are working perfectly.
One thing that I've noted is the difference in the treatment of the last line of the config. Previously I would need to supply a double \\ as below

            --conf spark.driver.maxResultSize=10G \\

to perform a spark submit.

Now I would need to omit it to run; otherwise it'll return

 sparkTrain.py not found.

as a gap is generated. I think the new change is better, so great job. 😄


@AdrySky (Contributor, Author) commented Jun 29, 2022

Thanks for the review. Solved the 3 issues. Good job noticing the dropdown icon; it was not as easy a fix as I thought.

Yeah, forgot to mention that I added a space between the command and the file's path, so we don't have to add \\ on the last line.
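So, illustratively, the last line of a command in config.ini can now end plainly, like

            --conf spark.driver.maxResultSize=10G

and xircuits appends a space plus the compiled script path itself.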

@AdrySky AdrySky requested a review from MFA-X-AI June 29, 2022 08:31

@MFA-X-AI (Member) left a comment


Alright, based on the feedback we've gotten, I've modified the config.ini to have Local and Cluster modes.

Local works out of the box, but for cluster mode I had to do a bit more work. For documentation purposes, these are the errors that users might get and how to resolve them.


1. Module not found error
ModuleNotFoundError: No module named '...'
Need to package xai_components + the venv into a zip file, then add these spark configs:

        --py-files env_spark.zip \
        --archives env_spark.zip \

To make the zipping process easier, I've added SparkPackageVenv.xircuits, which does just that.
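For reference, the manual equivalent is roughly the following shell snippet (the paths here are assumptions; SparkPackageVenv.xircuits automates this step):

        # assumed layout: virtual environment in ./venv, component library in ./xai_components
        cd /path/to/xircuits-project
        zip -r env_spark.zip venv/ xai_components/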


2. Incorrect python cluster version

File "/home/hadoop/nm-local-dir/usercache/fahreza/appcache/application_1655102329321_0303/container_1655102329321_0303_01_000001/env_spark.zip/numpy/version.py", line 1
    from __future__ import annotations
    ^
SyntaxError: future feature annotations is not defined

If the packages require a different python runtime than the default one, users would need to specify the python version. In this example, the CentOS box I was using defaults to 3.6, while the packages expect a higher version, i.e. 3.9. To set the python runtime, they'll need these configs:

        --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON='/usr/local/bin/python3.9' \
        --conf spark.yarn.appMasterEnv.PYSPARK_DRIVER_PYTHON='/usr/local/bin/python3.9' \

3. File does not exist

pyspark.sql.utils.AnalysisException: Path does not exist: hdfs://servername:9000/user/fahreza/datasets/wind.csv

The cluster does not have the file used in the workflow, so upload the file to HDFS:

hdfs dfs -mkdir datasets
hdfs dfs -put datasets/wind.csv datasets/wind.csv

If everything is working, they'd get an output like this in the hadoop stdout log:

Matplotlib created a temporary config/cache directory at /tmp/matplotlib-dlralr49 because the default path (/home/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.

Executing: xSparkSession
/home/hadoop/nm-local-dir/usercache/fahreza/appcache/application_1655102329321_0305/container_1655102329321_0305_01_000001/pyspark.zip/pyspark/context.py:264: RuntimeWarning: Failed to add file [file:///home/fahreza/Github/xircuits-spark-config/env_spark.zip] specified in 'spark.submit.pyFiles' to Python path:
  /data/disk3/hadoop/nm-local-dir/usercache/fahreza/filecache/36
  /data/disk3/hadoop/nm-local-dir/usercache/fahreza/appcache/application_1655102329321_0305/spark-d581118d-7889-4ae4-881e-740358604fa1/userFiles-40a340a3-57fe-441f-8597-3fae5ac6a412
  /data/disk3/hadoop/nm-local-dir/usercache/fahreza/filecache/34/__spark_libs__580593447246665266.zip/spark-core_2.12-3.1.3.jar
  /home/hadoop/nm-local-dir/usercache/fahreza/appcache/application_1655102329321_0305/container_1655102329321_0305_01_000001/pyspark.zip
  /home/hadoop/nm-local-dir/usercache/fahreza/appcache/application_1655102329321_0305/container_1655102329321_0305_01_000001/py4j-0.10.9-src.zip
  /home/hadoop/nm-local-dir/usercache/fahreza/appcache/application_1655102329321_0305/container_1655102329321_0305_01_000001/env_spark.zip
  /usr/local/lib/python39.zip
  /usr/local/lib/python3.9
  /usr/local/lib/python3.9/lib-dynload
  /usr/local/lib/python3.9/site-packages
  /home/fahreza/Github/xircuits-spark-config/xai_components
  warnings.warn(

Executing: SparkReadFile
+------+-----------+
|  Year|       Wind|
+------+-----------+
|1980.0|        0.0|
|1981.0|        0.0|
|1982.0|        0.0|
|1983.0|0.029667962|
|1984.0|0.050490252|
|1985.0|0.072761883|
|1986.0| 0.14918872|
|1987.0|0.205541414|
|1988.0|0.342871014|
|1989.0|   2.597943|
|1990.0|     3.5356|
|1991.0|   4.096951|
|1992.0|   4.611373|
|1993.0|    5.55795|
|1994.0|   7.284414|
|1995.0|   7.935523|
|1996.0|   9.288649|
|1997.0|  12.134585|
|1998.0|  16.108642|
|1999.0|   21.24186|
+------+-----------+
only showing top 20 rows


Executing: SparkVisualize

Finish Executing

[screenshot: successful cluster run]

@MFA-X-AI merged commit c457032 into master Jul 5, 2022
@MFA-X-AI deleted the adry/config-spark branch July 5, 2022 01:53
@MFA-X-AI mentioned this pull request Aug 5, 2022