
SNAP-656 Delink RDD partitions from buckets #297

Merged — 25 commits merged into master from SNAP-656 on Sep 1, 2016
Conversation

@ymahajan (Contributor) commented Jul 5, 2016

Changes proposed in this pull request

  • Set the default number of RDD partitions to the number of cores in the cluster
  • Map buckets to target partitions using round-robin assignment
  • Colocated tables get the same partitioning, so queries over them continue to work
  • Add split mode support
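The round-robin bucket-to-partition mapping above can be sketched as follows. This is an illustrative Java sketch, not the PR's actual Scala code: bucket `b` lands on partition `b % numPartitions`, so all partitions hold bucket counts that differ by at most one, and colocated tables using the same scheme get identical mappings.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BucketAssignment {

    // Round-robin assignment: bucket b goes to partition (b % numPartitions).
    // Names and the exact scheme are illustrative; the PR's implementation
    // may differ in detail.
    static Map<Integer, List<Integer>> assignBuckets(int numBuckets, int numPartitions) {
        Map<Integer, List<Integer>> partitionToBuckets = new LinkedHashMap<>();
        for (int p = 0; p < numPartitions; p++) {
            partitionToBuckets.put(p, new ArrayList<>());
        }
        for (int b = 0; b < numBuckets; b++) {
            partitionToBuckets.get(b % numPartitions).add(b);
        }
        return partitionToBuckets;
    }

    public static void main(String[] args) {
        // 113 buckets (a common SnappyData default bucket count) spread
        // over 8 partitions: partition 0 gets 15 buckets, the rest 14.
        Map<Integer, List<Integer>> m = assignBuckets(113, 8);
        System.out.println(m.get(0));
    }
}
```

Because the mapping depends only on the bucket id and the partition count, two colocated tables with the same bucket count and the same partition count see the same bucket-to-partition layout, which is what keeps colocated joins working without a shuffle.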

Patch testing

ReleaseNotes.txt changes

yes

Other PRs

Store - TIBCOSoftware/snappy-store#85
Spark - TIBCOSoftware/snappy-spark#4
SnappyData - #297

ymahajan added 8 commits July 5, 2016 12:51
…into SNAP-656

Conflicts:
	core/src/main/scala/org/apache/spark/sql/store/StoreUtils.scala
Conflicts:
	core/src/main/scala/io/snappydata/impl/SparkShellRDDHelper.scala
	core/src/main/scala/org/apache/spark/sql/execution/columnar/impl/JDBCSourceAsColumnarStore.scala
	core/src/main/scala/org/apache/spark/sql/execution/row/RowFormatScanRDD.scala
	core/src/main/scala/org/apache/spark/sql/store/StoreUtils.scala
	store
ymahajan added 2 commits August 11, 2016 14:02
…titionedRDD

+ handled redundancy cases
+ fixed precheckin failures
// val region = Misc.getRegionForTable(resolvedName, true).
// asInstanceOf[PartitionedRegion]
// region.getTotalNumberOfBuckets
val numCores = Runtime.getRuntime.availableProcessors()
Contributor:
This is evaluated on the driver node, but we need to consider the server nodes; the driver's processor count is not useful to us. Please look at SchedulerBackend.defaultParallelism, which takes the total cores across the slaves into consideration.
The catch, however, is that spark.default.parallelism gets priority, so if somebody configures it badly we will suffer.
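The reviewer's point can be sketched like this. The helper names below are illustrative stand-ins, not actual Spark internals: an explicitly configured `spark.default.parallelism` wins (the caveat above), otherwise fall back to the total core count across the executor/server nodes rather than the driver's own cores.

```java
public class ParallelismChoice {

    // Sketch of the suggested behavior (illustrative names, assumed logic):
    // prefer the configured spark.default.parallelism if present, else the
    // cluster-wide executor core count -- never the driver's local cores.
    static int defaultParallelism(int totalExecutorCores, Integer configuredDefault) {
        if (configuredDefault != null) {
            // spark.default.parallelism takes priority; a bad value here
            // propagates everywhere, which is the reviewer's caveat.
            return configuredDefault;
        }
        return Math.max(totalExecutorCores, 2);
    }

    public static void main(String[] args) {
        // What the patch currently measures: the driver's own core count,
        // which says nothing about the server nodes.
        int driverCores = Runtime.getRuntime().availableProcessors();
        System.out.println("driver cores (not cluster-wide): " + driverCores);

        System.out.println(defaultParallelism(64, null)); // cluster-wide cores
        System.out.println(defaultParallelism(64, 200));  // explicit config wins
    }
}
```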

}
}

def getNumPartitions : Int = {
Contributor:

See comments above

ymahajan added 5 commits August 17, 2016 17:31
Conflicts:
	core/src/main/scala/org/apache/spark/sql/collection/Utils.scala
	core/src/main/scala/org/apache/spark/sql/execution/columnar/impl/JDBCSourceAsColumnarStore.scala
	core/src/main/scala/org/apache/spark/sql/store/StoreUtils.scala
	store
Conflicts:
	core/src/main/scala/org/apache/spark/sql/execution/columnar/impl/ColumnFormatRelation.scala
…fle exchange operation

Use spark.default.parallelism to decide numPartitions, and disable the split mode optimization
ymahajan added 3 commits August 29, 2016 23:34
Conflicts:
	core/src/main/scala/org/apache/spark/sql/execution/ExistingPlans.scala
	store
…into SNAP-656

Conflicts:
	core/src/main/scala/org/apache/spark/sql/store/StoreUtils.scala
	store
@ymahajan ymahajan merged commit 21440be into master Sep 1, 2016
@sumwale sumwale deleted the SNAP-656 branch December 5, 2016 22:26
3 participants