
✨Add toolbar dropdown to do remote run options #164

Merged: 27 commits from adry/config-spark into master on Jul 5, 2022
Conversation

@AdrySky (Contributor) commented Jun 9, 2022

Description

This will enable adding and executing a remote custom run (e.g. Spark Submit) with its own configuration, using the subprocess module. Initially it was only for spark submit, but now it is more general, so other execution types besides spark submit can be added.

It uses the file config.ini to get the configuration data. There are 3 separate sections to fill, which are:

  1. REMOTE_EXECUTION = The main run types
  2. RUN_TYPES = Each separate run type with its configuration
  3. CONFIGURATION = Each configuration's data (e.g. name, command, url, msg). Required to be filled.

Note: Every time config.ini is updated, xircuits only detects the change after changing the run type on the toolbar.
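For illustration, the three sections might be laid out roughly as below; the exact section and key names shipped with this PR may differ, so treat this as a sketch only:

    ; hypothetical sketch -- the real config.ini in this PR may use different keys
    [REMOTE_EXECUTION]
    ; the main run types shown in the run dialog
    RUN_TYPES = SPARK

    [RUN_TYPES]
    ; each run type with its configuration(s)
    SPARK = LOCAL_SUBMIT

    [LOCAL_SUBMIT]
    ; configuration data (all fields required): name, command, url, msg
    name = Local spark-submit
    command = spark-submit --master local[*]
    url = http://localhost:4040
    msg = Submitting the compiled workflow with spark-submit.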

Pull Request Type

  • Xircuits Core (Jupyterlab Related changes)
  • Xircuits Canvas (Custom RD Related changes)
  • Xircuits Component Library
  • Testing Automation
  • Documentation
  • Others (Please Specify)

Type of Change

  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Tests

  1. Able to run the default spark submit.
    1. Currently, config.ini contains a spark-submit configuration by default.
    2. Open xircuits.
    3. Click on the run dropdown menu and choose Remote Run.
    4. Click the run button. A run dialog will be prompted; inside it there are two dropdown menus: the run types and their configurations.
    5. Choose 'SPARK' for the run type and any of its configurations.
    6. It should be able to run, and the output should appear in the output panel.
  2. Add a new run type in config.ini.
    1. I have added a dummy entry to show how to add a new run type.
    2. Just uncomment
      1. TEST for REMOTE_EXECUTION
      2. TEST's configuration in RUN_TYPES
    3. Make sure xircuits picks up the latest update from config.ini by changing the run type from the toolbar.
    4. Click Run.
    5. Inside the run dialog, the run types dropdown will now have a new run type called 'TEST'.
    6. Clicking the 'TEST' run type will show its own configurations, which are EG and EG2.
    7. Clicking any of them will show its command.

Tested on?

  • Windows
  • Linux Ubuntu
  • Centos
  • Mac
  • Others (State here -> xxx )

Notes

Thinking of using something like a .json file instead of config.ini. IMO, .json is more user-friendly in terms of data structure.
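For comparison, the same data expressed as JSON might look something like this (illustrative values only, mirroring the sketch above rather than the actual file):

    {
      "SPARK": {
        "LOCAL_SUBMIT": {
          "name": "Local spark-submit",
          "command": "spark-submit --master local[*]",
          "url": "http://localhost:4040",
          "msg": "Submitting the compiled workflow with spark-submit."
        }
      }
    }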

@AdrySky added the enhancement label Jun 9, 2022
@AdrySky requested a review from MFA-X-AI June 9, 2022 09:56
@AdrySky self-assigned this Jun 9, 2022
@AdrySky force-pushed the adry/config-spark branch from ac20cfc to 7025ddf June 27, 2022 09:53
@AdrySky changed the title ✨Add toolbar dropdown to choose run on CPU / GPU / VE ✨Add toolbar dropdown to do remote run options Jun 28, 2022
@AdrySky marked this pull request as ready for review June 28, 2022 03:47

@MFA-X-AI (Member) left a comment


Awesome work, looks like a big PR! I've tried the two tests you mentioned, and it looks like it's running well for both. I'll try this distribution on our spark cluster soon.

From my local machine though, I went ahead and experimented with the command feature; looks like we can do some nice stuff like echo-ing commands.

So there are a few things I noticed:

  1. Beside the Hyperparameter there appears to be a dropdown icon.

  2. The window appears to have a double resize. I think the inner one is enough, as adjusting the size of the outer one will misadjust the inner one.
    [screenshot: double resize]

  3. I don't think we need the "Also, you can go to Kraftboard to check the benchmarks" string for the default distribution. This message can be added to the config msg instead.

Those 3 for now; I'll add more comments when I've tested it out more. Thanks!


@MFA-X-AI (Member) left a comment


I've tested it on our server with a cluster, and things are working perfectly.
One thing that I've noted is the difference in the treatment of the last line of the config. Previously I would need to supply a double \\ as below

            --conf spark.driver.maxResultSize=10G \\

to perform a spark submit.

Now I would need to omit it to run; otherwise it'll return

 sparkTrain.py not found.

as a gap is generated. I think the new change is better, so great job. 😄


@AdrySky (Contributor, Author) commented Jun 29, 2022

Thanks for the review. Solved the 3 issues. Good job noticing the dropdown icon; it was not as easy a fix as I thought.

Yeah, forgot to mention that I added a space between the command and the file's path, so we don't have to add \\ on the last line.
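So, illustratively, the last line of a command in config.ini can now end plainly, like

            --conf spark.driver.maxResultSize=10G

and xircuits appends a space plus the compiled script path itself.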

@AdrySky AdrySky requested a review from MFA-X-AI June 29, 2022 08:31

@MFA-X-AI (Member) left a comment


Alright, based on the feedback we've gotten, I've modified the config.ini to have Local and Cluster modes.

Local works out of the box, but for cluster mode I had to do a bit more work. For documentation purposes, these are the errors that users might get and how to resolve them.


1. Module not found error
ModuleNotFoundError: No module named '...'
Need to package xai_components + the venv into a zip file, then add these spark configs:

        --py-files env_spark.zip \
        --archives env_spark.zip \

To make the zipping process easier, I've added SparkPackageVenv.xircuits, which does just that.
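For reference, the manual equivalent is roughly the following shell snippet (the paths here are assumptions; SparkPackageVenv.xircuits automates this step):

        # assumed layout: virtual environment in ./venv, component library in ./xai_components
        cd /path/to/xircuits-project
        zip -r env_spark.zip venv/ xai_components/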


2. Incorrect python cluster version

File "/home/hadoop/nm-local-dir/usercache/fahreza/appcache/application_1655102329321_0303/container_1655102329321_0303_01_000001/env_spark.zip/numpy/version.py", line 1
    from __future__ import annotations
    ^
SyntaxError: future feature annotations is not defined

If the packages require a different python runtime than the default one, users would need to specify the python version. In this example, the CentOS box I was using defaults to 3.6, while the packages expect a higher version, i.e. 3.9. To set the python runtime, they'll need these configs:

        --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON='/usr/local/bin/python3.9' \
        --conf spark.yarn.appMasterEnv.PYSPARK_DRIVER_PYTHON='/usr/local/bin/python3.9' \

3. File does not exist

pyspark.sql.utils.AnalysisException: Path does not exist: hdfs://servername:9000/user/fahreza/datasets/wind.csv

The cluster does not have the file used in the workflow, so upload the file to HDFS:

hdfs dfs -mkdir datasets
hdfs dfs -put datasets/wind.csv datasets/wind.csv

If everything is working, they'd get an output like this in the hadoop stdout log:

Matplotlib created a temporary config/cache directory at /tmp/matplotlib-dlralr49 because the default path (/home/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.

Executing: xSparkSession
/home/hadoop/nm-local-dir/usercache/fahreza/appcache/application_1655102329321_0305/container_1655102329321_0305_01_000001/pyspark.zip/pyspark/context.py:264: RuntimeWarning: Failed to add file [file:///home/fahreza/Github/xircuits-spark-config/env_spark.zip] specified in 'spark.submit.pyFiles' to Python path:
  /data/disk3/hadoop/nm-local-dir/usercache/fahreza/filecache/36
  /data/disk3/hadoop/nm-local-dir/usercache/fahreza/appcache/application_1655102329321_0305/spark-d581118d-7889-4ae4-881e-740358604fa1/userFiles-40a340a3-57fe-441f-8597-3fae5ac6a412
  /data/disk3/hadoop/nm-local-dir/usercache/fahreza/filecache/34/__spark_libs__580593447246665266.zip/spark-core_2.12-3.1.3.jar
  /home/hadoop/nm-local-dir/usercache/fahreza/appcache/application_1655102329321_0305/container_1655102329321_0305_01_000001/pyspark.zip
  /home/hadoop/nm-local-dir/usercache/fahreza/appcache/application_1655102329321_0305/container_1655102329321_0305_01_000001/py4j-0.10.9-src.zip
  /home/hadoop/nm-local-dir/usercache/fahreza/appcache/application_1655102329321_0305/container_1655102329321_0305_01_000001/env_spark.zip
  /usr/local/lib/python39.zip
  /usr/local/lib/python3.9
  /usr/local/lib/python3.9/lib-dynload
  /usr/local/lib/python3.9/site-packages
  /home/fahreza/Github/xircuits-spark-config/xai_components
  warnings.warn(

Executing: SparkReadFile
+------+-----------+
|  Year|       Wind|
+------+-----------+
|1980.0|        0.0|
|1981.0|        0.0|
|1982.0|        0.0|
|1983.0|0.029667962|
|1984.0|0.050490252|
|1985.0|0.072761883|
|1986.0| 0.14918872|
|1987.0|0.205541414|
|1988.0|0.342871014|
|1989.0|   2.597943|
|1990.0|     3.5356|
|1991.0|   4.096951|
|1992.0|   4.611373|
|1993.0|    5.55795|
|1994.0|   7.284414|
|1995.0|   7.935523|
|1996.0|   9.288649|
|1997.0|  12.134585|
|1998.0|  16.108642|
|1999.0|   21.24186|
+------+-----------+
only showing top 20 rows


Executing: SparkVisualize

Finish Executing

[screenshot: successful cluster run]

@MFA-X-AI merged commit c457032 into master Jul 5, 2022
@MFA-X-AI deleted the adry/config-spark branch July 5, 2022 01:53
@MFA-X-AI mentioned this pull request Aug 5, 2022