[DOCS] add note about setting checkpoint dir for DBSCAN #1744
@@ -858,6 +858,10 @@ The algorithm is available as a Scala and Python function called on a spatial da

The first parameter is the dataframe, the next two are the epsilon and min_points parameters of the DBSCAN algorithm.

!!!Note
    The sparkContext's checkpoint directory must be set to use DBSCAN. Sedona's DBSCAN implementation uses Graphframes
    which requires a checkpoint directory to be set. This can be done by calling `sparkContext.setCheckpointDir("path/to/checkpoint")`.
Can you provide a reference link about the checkPointDir? In addition, given that we have been using `sedona` (which is a SparkSession), please provide a bit more code to illustrate how to get the SparkContext (e.g., sedona.sc ...)
I can revise this to use `sedona.sparkContext`. I didn't think the Spark docs were very helpful, tbh: https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.SparkContext.setCheckpointDir.html
@james-willis then we need to explain what a checkPointDir is via our doc. We should give examples about how to set this dir (locally, on S3, HDFS, ...). Distributed DBSCAN is highly anticipated by the community so we should make it easy to get started.
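As a starting point for the examples requested above, here is a minimal Python sketch of setting the checkpoint directory for local, HDFS, and S3 storage. It assumes `sedona` is the SparkSession used elsewhere in the docs. The helper `set_checkpoint_dir`, the `CHECKPOINT_DIRS` table, the paths, and the bucket name are all hypothetical placeholders, not part of Sedona; only `sparkContext.setCheckpointDir` is the real Spark call.

```python
# Hypothetical example locations; substitute paths that exist in your environment.
CHECKPOINT_DIRS = {
    "local": "/tmp/sedona-checkpoints",          # single-machine testing only
    "hdfs": "hdfs:///tmp/sedona-checkpoints",    # assumes a default HDFS namenode is configured
    "s3": "s3a://my-bucket/sedona-checkpoints",  # assumes the Hadoop S3A connector is set up
}


def set_checkpoint_dir(sedona, location):
    """Set the checkpoint directory on the SparkContext behind the session.

    `sedona` is assumed to be the SparkSession used elsewhere in the docs.
    """
    path = CHECKPOINT_DIRS[location]
    # A SparkSession exposes its SparkContext as the `sparkContext` attribute.
    sedona.sparkContext.setCheckpointDir(path)
    return path
```

With a live session this would be called once before running DBSCAN, for example `set_checkpoint_dir(sedona, "hdfs")`. On a cluster the directory should live on storage that every executor can reach, which is why the local path only suits single-machine runs.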
Set time with Matthew for tomorrow to pair on this.
We need to add a note to the tutorial page as well.
Did you read the Contributor Guide?
Is this PR related to a JIRA ticket?
[DOCS] my subject
What changes were proposed in this PR?
Added a note to the DBSCAN docs about setting the checkpoint dir
How was this patch tested?
pre-commit
Did this PR include necessary documentation updates?