Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding data prep recipes #96

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

Bytes-Explorer
Copy link

No description provided.

Signed-off-by: Hima Patel <[email protected]>
"cell_type": "markdown",
"metadata": {},
"source": [
"<a href=\"https://colab.research.google.com/github/IBM/data-prep-kit/blob/dev/examples/notebooks/code/sample-notebook.ipynb\">\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This cell will be pointing to the wrong place when merged in this repo, and colab buttons in the notebook is not in line with our style guide. Please drop this cell

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

File path is fixed. We may not be able to drop this cell. Can you pls suggest on how you would like the colab button to be?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please drop the cell. Just delete it.

"## Setup\n",
"\n",
"Install data-prep-toolkit and datasets library. This notebook requires atleast 8 cpus. \n",
"To run on google colab, it is recommended to change the runtime to TPUs to get the required number of cpus.\n"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this notebook be reworked to drop this requirement. At best, you won't get 8cpus for the testing (at least w/o some additional config, which I don't know how to do off the top of my head).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have updated to 4. There is guidance in the notebook to select TPUs and these nodes can be then made available via Colab.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not in testing though. You can only get 2 cpus for the test suite

},
"outputs": [],
"source": [
"from data_processing_ray.runtime.ray import RayTransformLauncher\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this line fails in testing and also fails for me when trying to run in colab

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have been able to run it in colab post the pip installations. Please see link here https://colab.research.google.com/drive/1uxKhvPZLbSnPmAGYfQ0jnrJIiDbfeIcJ#scrollTo=NbF_Zw3KBazf

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't have access, but was your runtime CPU only or TPU?

@Bytes-Explorer
Copy link
Author

The notebook fails on CI/CD on

ValidationError: 1 validation error for Replicate
__root__
  Did not find replicate_api_token, please add an environment variable `REPLICATE_API_TOKEN` which contains it, or pass `replicate_api_token` as a named parameter. (type=value_error)

Error: Process completed with exit code 1.

Can you pls suggest why replicate token is required in this notebook ?

@fayvor
Copy link
Contributor

fayvor commented Sep 19, 2024

Can you pls suggest why replicate token is required in this notebook ?

@Bytes-Explorer the replicate token is required in another notebook (./recipes/Getting_Started_with_Granite_Code.ipynb), which is being tested by the CI action.

It looks like the action sets the REPLICATE_API_TOKEN from secrets, so maybe that secret hasn't been added to this repo yet? I'll defer to @rawkintrevo on that.

@rawkintrevo
Copy link
Contributor

rawkintrevo commented Sep 19, 2024

it has been added to this Repo, but I don't think it gets picked up when making a PR from an outside repo. Ill link the conversation on slack to show them how to set this up.

Also, as @fayvor said,

Also note the test is testing all of the notebooks, not just the one you added. It is failling in the 'Getting Started With Granite' notebook, not yours.

https://github.com/ibm-granite-community/granite-code-cookbook/actions/runs/10939249436/job/30369100528?pr=96#step:8:18

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants