Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement error handling that will filter the rows that have any error: #93 #94

Conversation

TebaleloS
Copy link
Contributor

No description provided.

benedeki and others added 11 commits March 7, 2023 04:29
* new functions `null_col` and `call_udf`
* `ErrorMessage` refactoring
* `ErrorHandling` trait designed to serve as the interface for different implementations
* Implement error handling by putting the info into column of `ErrorMessage` array
* numerous support classes
* headers fix
…e of raw values

* `ErrorMessageSubmitJustErrorValue` class created to offer the ability to submit errors without source column but with error value
* `ErrorMessageSubmitWithoutColumn` changed from `case class` to `class` to allow inheritance
* some PR comments addressed
…or-handling-map-errcol

Introducing Map for error columns and their values
* Fixed few minor things discovered by the UTs
@TebaleloS TebaleloS changed the base branch from master to feature/83-create-a-spike-for-error-handling March 29, 2023 07:11
@benedeki benedeki force-pushed the feature/83-create-a-spike-for-error-handling branch from 64dc95e to c50dd16 Compare April 1, 2023 14:47
TebaleloS and others added 5 commits April 28, 2023 19:11
…dling/implementations/ErrorHandlingFilterRowsWithErrors.scala

Co-authored-by: David Benedeki <[email protected]>
…dling/implementations/ErrorHandlingFilterRowsWithErrors.scala

Co-authored-by: David Benedeki <[email protected]>
…e-rows-that-have-any-error' of https://github.com/AbsaOSS/spark-commons into feature/93-Implement-error-handling-that-will-filter-the-rows-that-have-any-error
Base automatically changed from feature/83-create-a-spike-for-error-handling to master May 4, 2023 14:57
@github-actions
Copy link

github-actions bot commented May 5, 2023

JaCoCo code coverage report - scala:2.11 - spark:2.4

File Coverage [95.97%] 🍏
ErrorMessageSubmitWithoutColumn.scala 100% 🍏
ErrorHandlingFilterRowsWithErrors.scala 100% 🍏
ColumnOrValue.scala 96.58% 🍏
ErrorHandling.scala 83.82% 🍏
ErrorMessageSubmit.scala 0%
Total Project Coverage 82.07% 🍏

@github-actions
Copy link

github-actions bot commented May 5, 2023

JaCoCo code coverage report - scala:2.12 - spark:3.3

File Coverage [97.5%] 🍏
ErrorMessageSubmitWithoutColumn.scala 100% 🍏
ErrorHandlingFilterRowsWithErrors.scala 100% 🍏
ErrorHandling.scala 100% 🍏
ColumnOrValue.scala 96.62% 🍏
ErrorMessageSubmit.scala 0%
Total Project Coverage 85.42% 🍏

package za.co.absa.spark.commons.errorhandling.implementations

import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.{coalesce, col, lit}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

col is never used


package za.co.absa.spark.commons.errorhandling.implementations
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, column, length}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

column is never used

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, column, length}
import org.scalatest.funsuite.AnyFunSuite
import za.co.absa.spark.commons.errorhandling.ErrorMessage
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ErrorMessage is never used

}

test("putError and putErrors does not group by together") {
// val errorMessageArray = ErrorMessageArray()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dead code, I believe it can be removed, unless you have some plans with this?

resultDf.as[ResultDfRecordType].collect().sortBy(_._1).toList
}

test("aggregateErrorColumns\" should \"return an empty list after error aggregation\"") {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why all these \" there?

assert(results == expectedResults)
}

test("aggregateErrorColumns\" should \"return records whose don't have errors\"") {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why all these \" there?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those I can actually remove, they are just for readability.

val e3 = ErrorHandlingFilterRowsWithErrors.putErrorToColumn(errorSubmitB)

val resultsDF = ErrorHandlingFilterRowsWithErrors.aggregateErrorColumns(srcDf)(e1, e2, e3)
resultsDF.show()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be removed, although I don't insist: https://softwareengineering.stackexchange.com/questions/415573/should-unit-tests-contain-print-statements

@benedeki what is your opinion on having prints in automated tests?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(this is present in a few tests below as well)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No prints (show is kind of print) in the final code.
I do use them occasionally for debugging etc during development, but should be removed before the actual merge.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yah, I was also using them for debugging purposes. I actually forgot to remove them. But, yes they must be removed.

Copy link
Collaborator

@lsulak lsulak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks very good, @TebaleloS, I'm happy to approve once those tiny things I found are resolved. Great work!

Comment on lines 26 to 28
*/

object ErrorHandlingFilterRowsWithErrors extends ErrorHandlingCommon {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
*/
object ErrorHandlingFilterRowsWithErrors extends ErrorHandlingCommon {
*/
object ErrorHandlingFilterRowsWithErrors extends ErrorHandlingCommon {

Minor: I believe ScalaDoc needs to be "attached" to the class/method. No empty lines between.

val e3 = ErrorHandlingFilterRowsWithErrors.putErrorToColumn(errorSubmitB)

val resultsDF = ErrorHandlingFilterRowsWithErrors.aggregateErrorColumns(srcDf)(e1, e2, e3)
resultsDF.show()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No prints (show is kind of print) in the final code.
I do use them occasionally for debugging etc during development, but should be removed before the actual merge.

val resultsDf = ErrorHandlingFilterRowsWithErrors.putErrorsWithGrouping(srcDf)(
Seq(er1, er2, er3)
)
resultsDf.show()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto, please remove.

Copy link
Contributor

@benedeki benedeki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice documentation for the ErrorHandlingFilterRowsWithErrors class. Also good test coverage. 😉

  • code reviewed
  • pulled
  • built

@TebaleloS TebaleloS merged commit 279b4e5 into master May 8, 2023
@TebaleloS TebaleloS deleted the feature/93-Implement-error-handling-that-will-filter-the-rows-that-have-any-error branch May 8, 2023 12:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants