-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
implement error handling that will filter the rows that have any error: #93 #94
implement error handling that will filter the rows that have any error: #93 #94
Conversation
* new functions `null_col` and `call_udf` * `ErrorMessage` refactoring * `ErrorHandling` trait designed to serve as the interface for different implementations * Implement error handling by putting the info into column of `ErrorMessage` array * numerous support classes
…e of raw values * `ErrorMessageSubmitJustErrorValue` class created to offer the ability to submit errors without source column but with error value
* `ErrorMessageSubmitWithoutColumn` changed from `case class` to `class` to allow inheritance * some PR comments addressed
…or-handling-map-errcol Introducing Map for error columns and their values
* Fixed few minor things discovered by the UTs
64dc95e
to
c50dd16
Compare
* `ErrorMessageSubmitWithoutColumn` changed from `case class` to `class` to allow inheritance * some PR comments addressed
* Fixed few minor things discovered by the UTs
…dling/implementations/ErrorHandlingFilterRowsWithErrors.scala Co-authored-by: David Benedeki <[email protected]>
…dling/implementations/ErrorHandlingFilterRowsWithErrors.scala Co-authored-by: David Benedeki <[email protected]>
…e-rows-that-have-any-error' of https://github.com/AbsaOSS/spark-commons into feature/93-Implement-error-handling-that-will-filter-the-rows-that-have-any-error
JaCoCo code coverage report - scala:2.11 - spark:2.4
|
JaCoCo code coverage report - scala:2.12 - spark:3.3
|
package za.co.absa.spark.commons.errorhandling.implementations | ||
|
||
import org.apache.spark.sql.{Column, DataFrame} | ||
import org.apache.spark.sql.functions.{coalesce, col, lit} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
col
is never used
|
||
package za.co.absa.spark.commons.errorhandling.implementations | ||
import org.apache.spark.sql.DataFrame | ||
import org.apache.spark.sql.functions.{col, column, length} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
column
is never used
import org.apache.spark.sql.DataFrame | ||
import org.apache.spark.sql.functions.{col, column, length} | ||
import org.scalatest.funsuite.AnyFunSuite | ||
import za.co.absa.spark.commons.errorhandling.ErrorMessage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ErrorMessage
is never used
} | ||
|
||
test("putError and putErrors does not group by together") { | ||
// val errorMessageArray = ErrorMessageArray() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dead code, I believe it can be removed, unless you have some plans with this?
resultDf.as[ResultDfRecordType].collect().sortBy(_._1).toList | ||
} | ||
|
||
test("aggregateErrorColumns\" should \"return an empty list after error aggregation\"") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why all these \"
there?
assert(results == expectedResults) | ||
} | ||
|
||
test("aggregateErrorColumns\" should \"return records whose don't have errors\"") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why all these \"
there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those I can actually remove, they are just for readability.
val e3 = ErrorHandlingFilterRowsWithErrors.putErrorToColumn(errorSubmitB) | ||
|
||
val resultsDF = ErrorHandlingFilterRowsWithErrors.aggregateErrorColumns(srcDf)(e1, e2, e3) | ||
resultsDF.show() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can be removed, although I don't insist: https://softwareengineering.stackexchange.com/questions/415573/should-unit-tests-contain-print-statements
@benedeki what is your opinion on having prints in automated tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(this is present in a few tests below as well)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No prints (show
is kind of print) in the final code.
I do use them occasionally for debugging etc during development, but should be removed before the actual merge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yah, I was also using them for debugging purposes. I actually forgot to remove them. But, yes they must be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks very good, @TebaleloS, I'm happy to approve once those tiny things I found are resolved. Great work!
*/ | ||
|
||
object ErrorHandlingFilterRowsWithErrors extends ErrorHandlingCommon { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
*/ | |
object ErrorHandlingFilterRowsWithErrors extends ErrorHandlingCommon { | |
*/ | |
object ErrorHandlingFilterRowsWithErrors extends ErrorHandlingCommon { |
Minor: I believe ScalaDoc needs to be "attached" to the class/method. No empty lines between.
val e3 = ErrorHandlingFilterRowsWithErrors.putErrorToColumn(errorSubmitB) | ||
|
||
val resultsDF = ErrorHandlingFilterRowsWithErrors.aggregateErrorColumns(srcDf)(e1, e2, e3) | ||
resultsDF.show() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No prints (show
is kind of print) in the final code.
I do use them occasionally for debugging etc during development, but should be removed before the actual merge.
val resultsDf = ErrorHandlingFilterRowsWithErrors.putErrorsWithGrouping(srcDf)( | ||
Seq(er1, er2, er3) | ||
) | ||
resultsDf.show() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto, please remove.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice documentation for the ErrorHandlingFilterRowsWithErrors
class. Also good test coverage. 😉
- code reviewed
- pulled
- built
No description provided.