# Segmentation tasks: augmentation of GT images #85
Comments
Generally, we prefer the trait version. Hardcoding the second input as ground truth might not be clear for all scenarios, and it is not very clear what operations should be placed under the category of `ImageOnlyOperation`. With a similar idea in mind, we can introduce a new array type:

```julia
abstract type AbstractAugmentorCondition end

struct MatchSkip <: AbstractAugmentorCondition
    content::Symbol
end

struct AugmentorImage{T, N, AT<:AbstractArray{T, N}, C<:AbstractAugmentorCondition} <: AbstractArray{T, N}
    data::AT
    condition::C
end

struct IfOp{T<:ImageOperation, C<:AbstractAugmentorCondition} <: ImageOperation
    operation::T
    condition::C
end

function applyeager(op::IfOp, img::AugmentorImage)
    if match_condition(op.condition, img.condition)
        return AugmentorImage(applyeager(op.operation, img.data), img.condition)
    else
        return img
    end
end
```

With this, we can tag images and pipelines so that they can be skipped under certain conditions, e.g.:

```julia
cond = MatchSkip(:gt)
gts = map(x->AugmentorImage(x, cond), gts)
pl = Rotate([10, -5, -3, 0, 3, 5, 10]) |> IfOp(AdjustContrastBrightness(...), cond)
map(imgs, gts) do img, gt
    augment(img, gt, pl)
end
```

What do you think of this?
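(Side note for completeness: `match_condition` is not defined above. A minimal sketch, under the assumption that it returns `true` when the wrapped operation should be applied to the image, could be:)

```julia
# Assumption: returning `true` means "apply the wrapped operation to this image".
# With `MatchSkip`, matching tags mean "skip", so the operation only runs when
# the tags differ.
match_condition(opcond::MatchSkip, imgcond::MatchSkip) = opcond.content != imgcond.content
```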
To be honest, it seems too complex to me. Let me try to elaborate.

### Operations and masks

I have experience mostly with Python libraries, mainly Albumentations and transforms from torchvision. The state of torchvision transforms is pretty much the same as ours -- GT segmentations are not considered -- so it is not relevant here. On the other hand, Albumentations directly provides an API like this:

```python
# Define an augmentation pipeline
transform = Compose([HorizontalFlip(), RandomBrightnessContrast()])

# Without a mask:
transformed = transform(image=image)
# transformed["image"] is the augmented image

# With a mask:
transformed = transform(image=image, mask=mask)
# transformed["image"] is the augmented image and transformed["mask"] the augmented mask
```

Therefore, Albumentations decides which operations are suitable for both images and masks and which only for images. Basically, the rule is "if an operation moves pixels, use it for both the image and the mask; if an operation changes colors, use it only for the image". I think it makes sense, as any other way would damage the mask.

I am not 100% sure if the rule is really that simple -- I would have to double check. But the point is that Albumentations decides which operations are suitable for masks too. Since the library is quite popular, I would guess it covers most/all use cases. So in my opinion, whether an operation is suitable for masks should be a property of the operation, and it should be defined by us.

### API

I would not mind calling

```julia
augment((img, gt), pl)
```

but it seems inconsistent that a 2-tuple would mean an image and a GT segmentation while an n-tuple (n>2) would mean n images:

```julia
augment((img1, img2, img3), pl)
```

Maybe we could use a `Pair`:

```julia
augment(img => gt, pl)
augment_batch(imgs => gts, pl) # I would love a non-mutating version of augment_batch!
```
My main concern about
I need to think about this and take a look at Albumentations. Will reply in a few days. A quick question, does
Yes, this looks like a missing function. And our existing documentation doesn't explain this very well. My main focus is still on JuliaImages, so I haven't taken care of this package very well. If you have an interest in maintaining this package and adding new features, I believe @Evizero would be delighted to send you an invitation.
(Just to be clear: "mask" = "gt" = "gt segmentation".) I think it satisfies the mask idea from the point of view of the API call. IMHO it is simple enough and unambiguous. The return type could also be a `Pair`.

```julia
aug_img = augment(img, pl)                    # Augment just an image, mask not provided (already have this)
aug_img, aug_mask = augment(img => mask, pl)  # Augment an image and the corresponding mask (proposed)
```
I would be interested in maintaining this package and adding new features; however, I don't feel experienced enough to just push new code without consulting. I would feel more comfortable if someone checked my PRs first.

Slightly off-topic: if we had a non-mutating batch API, it could look like either of these:

```julia
# 1. Only images provided:
# a)
aug_imgs = augment(imgs, pl)
# b)
aug_imgs = augment_batch(imgs, pl)

# 2. Images and masks provided:
# a)
aug_imgs, aug_masks = augment(imgs => masks, pl)
# b)
aug_imgs, aug_masks = augment_batch(imgs => masks, pl)
```
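(For reference, a non-mutating batch helper does not exist in the package today; a rough sketch, assuming the proposed `Pair` method of `augment` from above, could be:)

```julia
# Hypothetical sketch, not an existing Augmentor function: apply the pipeline
# to every observation of the batch independently.
augment_batch(imgs, pl) = map(img -> augment(img, pl), imgs)

# Batch of images together with their masks, using the proposed Pair method.
function augment_batch(data::Pair, pl)
    pairs = map((img, mask) -> augment(img => mask, pl), data.first, data.second)
    return first.(pairs) => last.(pairs)
end
```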
I have not thought this through, I will admit, but an alternative way of approaching the mask problem would be with a decorator that could be dispatched on:

```julia
img1_out, img2_out = augment((img1, Mask(img2)), pl)
```
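(A minimal sketch of such a decorator, with hypothetical names for the wrapper and the unwrapping helper:)

```julia
# A thin wrapper that only tags an array as a segmentation mask, so that
# operations can dispatch on it.
struct Mask{T<:AbstractArray}
    data::T
end

unwrap(m::Mask) = m.data  # strip the tag
unwrap(x) = x             # plain arrays pass through unchanged
```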
A good suggestion! The
Yes, I also like this suggestion! Would `augment` then return a `Mask` or the underlying array?
Generally, we should keep the information and not change the type as much as we can, so it should return a `Mask` if the input is a `Mask`. There might be some glue code and utilities needed to interact with other ecosystems, e.g. Flux. For example, we might need to provide a convenience function to uniformly strip/add the `Mask` wrapper for batch inputs.
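(For instance — purely hypothetical names — such glue utilities could just broadcast the wrapping/unwrapping over a batch:)

```julia
# Strip the Mask wrapper from every element of a batch, e.g. before handing the
# data to Flux, or add it to every element before augmentation.
unwrap_batch(batch) = map(unwrap, batch)
wrap_masks(batch)   = map(Mask, batch)
```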
I checked Albumentations and it seems that the rule actually is "pixels move => apply on masks too; colors change => apply on the image only". Also note that they support augmentation of bounding boxes and key points. I don't think we need to implement that right now; just keep in mind that it might be requested in the future.

### List of transforms

**Dual transforms** (applied on images and masks – and possibly bounding boxes and key points)

**Image only transforms**
I think the best would be to not touch the image types too much and to go for the pair notation proposed above. One could then introduce a new function `applytomasks`, check it for each transformation in the pipeline, and throw out those for which it is false:

```julia
applytomasks(::Any) = true
applytomasks(::Rotate90) = true
applytomasks(::ColorJitter) = false
```
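(To illustrate the "throw those out" step — assuming, hypothetically, that the pipeline's operations are available as a plain collection `ops`:)

```julia
# Keep only the operations that are safe to apply to masks. In a real
# implementation, the remaining operations would still have to reuse the same
# randomly sampled parameters as were used for the image.
mask_ops = filter(applytomasks, ops)
```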
What I liked about @Evizero's proposal (let's call it the tuple notation) was that it does not introduce a new "convention", whereas the pair notation does. But on second thought, maybe the pair notation is actually good. The tuple notation allows, e.g.,

```julia
augment((img1, Mask(img2), Mask(img3), img4, img5), pl)
```

which does not have a clear meaning to me. I believe that we should either accept an image, or an image with the corresponding label (a mask for segmentation tasks). And that's what a pair implies, right?

```julia
augment(img, pl)          # Augment an image
augment(img => label, pl) # Augment an image and its label
```

For segmentation tasks, both the image and the label could then be handled like this:

```julia
image(imglabel::Pair) = imglabel.first
label(imglabel::Pair) = imglabel.second

augment(img::AbstractMatrix, pl) = augment_image(img, pl)
augment(imglabel::Pair, pl) = augment_image(image(imglabel), pl) => augment_label(label(imglabel), pl)
```

The function `augment_label` would then apply only those operations that are suitable for labels.
Another option that comes to my mind, which would be super user-friendly and probably well extensible to keypoints etc., is to just use a named tuple.
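(For example — a purely hypothetical call, not an existing method:)

```julia
# Named-tuple variant: the semantics are carried by the field names.
aug = augment((image = img, mask = mask), pl)
# aug.image, aug.mask  (assuming the result mirrors the input structure)
```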
What I like about the dispatch version is that image operations can decide themselves what they apply to, and it's extensible for users, who can define their own decorator types with their own properties. Also, it allows for an arbitrary number of images, masks, etc.

What I dislike about the pair notation is the limited scope: it hardcodes that there are two types of image data with fixed semantics.
Agree with @Evizero. I think we should build the core functionality by dispatching at the image level instead of the collection level, because it allows more flexible composition, and then wrap a thin layer to provide a convenient user interface for segmentation semantics, i.e., hardcoding the pair semantics or making a new function name. Say we have an image restoration task with some ROI, and the label is the restored image: how would the pair version support that?
@Evizero It's hard to argue against your points. I guess that the only issue I have with this approach is the fact that the type of the augmented mask differs from the type of the input mask:

```julia
img, mask = data[i]
augmented_img, augmented_mask = augment((img, Mask(mask)), pl)
# now typeof(mask) == typeof(augmented_mask) does not hold
```

I, as a user, don't care about the `Mask` wrapper.
We can hide the `Mask` wrapper as an implementation detail. Meanwhile, we can use the pair semantics or introduce a new function to do all the wrap-and-unwrap work.
There are two levels of dispatch happening here:

1. the internal, image-level dispatch on the decorator types (e.g. `Mask`), which decides what each operation is applied to, and
2. the user-level API dispatch (e.g. on a `Pair`), which only wraps and unwraps the decorators.
Using the pair notation in the user-level API would still hardcode two types of image data with fixed semantics, wouldn't it? Also, it would not be as flexible as the tuple notation. I like the possibility (even though I don't have a use case for it now) of doing

```julia
augment((img, Mask(mask), Mask(mask2), KeyPoint(kp), BBox(bbox), Whatever(we)), pl)
```

which is hard to cover with any single convention (such as the pair notation).
I'm sorry if I didn't make it clear in my previous comments: we can interpret the pair version as a thin wrapper around the tuple version, for example

```julia
function augment(p::Pair{<:AbstractArray, <:AbstractArray}, pl)
    f, s = augment((p.first, Mask(p.second)), pl)
    return f => unwrap(s)
end
```

This is the user-level API dispatch as I mentioned in #85 (comment).
So for this specific (and perhaps very common) case where we want to augment an image and its mask, there would be a convenience method, and for other cases (e.g., one image & many masks), the user would call the tuple version with the wrappers directly?
Yes, that's my vision here. The internal functionality should dispatch at the image level with the decorators because it allows more flexible composition, but for the API part we can be creative and make it intuitive for specific use cases. But of course, this needs to be well documented.
Okay. That would combine the advantages of both approaches. To sum up...

### Wrappers

We introduce an abstract type for the wrappers, with `Mask` as the first concrete wrapper for segmentation masks.

### Internal implementation

Operations define if they should be applied to masks. This can be done as proposed by @Evizero:

```julia
applyto(::Mask, ::ColorJitter) = false
```

The augment method goes through all pipeline operations and, for each image/wrapper, it checks `applyto` and skips the operation if it returns false. It might be useful to define a new class of operations, let's say `ColorOperation`, so that we only need:

```julia
applyto(::Mask, ::ColorOperation) = false
applyto(::Mask, ::AffineOperation) = true
```

### User-level API

The general API is the tuple notation with explicit wrappers. Also, for ease of use, we introduce a new convenience method for segmentation tasks, which could look like @johnnychen94 proposed:

```julia
# Convenience method for augmenting image and segmentation mask
function augment(p::Pair{<:AbstractArray, <:AbstractArray}, pl)
    f, s = augment((p.first, Mask(p.second)), pl)
    return f => unwrap(s)
end
```

so that the user can augment an image and its mask as

```julia
aug_img, aug_mask = augment(img => mask, pl)
```

However, the user can always define the semantics themselves and augment any combination of input types:

```julia
augmented = augment((img1, Mask(img2), Mask(img3), img4), pl)
aug_img1, aug_img2, aug_img3, aug_img4 = unwrap.(augmented)
```

### Future extensions

In case additional features are requested, such as using keypoints instead of masks, we would just implement a new wrapper, say `KeyPoints`, and a corresponding convenience method:

```julia
function augment(p::Pair{<:AbstractArray, <:SomeTypeHoldingKeyPoints}, pl)
    f, s = augment((p.first, KeyPoints(p.second)), pl)
    return f => unwrap(s)
end
```

Does it seem correct?
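(To make the "goes through all pipeline operations" part under *Internal implementation* concrete — a rough sketch with hypothetical helper names; a real implementation would additionally have to sample an operation's random parameters once and reuse them for every input:)

```julia
# Default: plain arrays receive every operation; masks skip color operations.
applyto(x, op) = true
applyto(m::Mask, op::ColorOperation) = false

# Applying an operation to a wrapper re-wraps the result (sketch).
apply_op(op, x)       = applyeager(op, x)
apply_op(op, m::Mask) = Mask(applyeager(op, m.data))

# Walk the pipeline once, applying each operation only where `applyto` allows it.
function augment_all(inputs::Tuple, ops)
    for op in ops
        inputs = map(x -> applyto(x, op) ? apply_op(op, x) : x, inputs)
    end
    return inputs
end
```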
This is exactly what I have in mind, thank you so much for putting it together! FWIW, to avoid name conflicts, all these wrapper types (`Mask`, `KeyPoints`, ...) probably shouldn't be exported.
This is off-topic: https://invenia.github.io/blog/2019/11/06/julialang-features-part-2/ is a very good explanation of the trait pattern in Julia.
Alright, great! I will have some free time over the weekend, so if there are no objections, I can try to come up with a PR then.
In the case of segmentation tasks, an input consists of an image and a ground-truth (GT) segmentation. To augment both the image and the GT segmentation using the same operations, we can use

```julia
augment((img, gt), pipeline)
```

However, this cannot be used if the pipeline includes an operation that adjusts the colors of images, because it would damage the color encoding in the GT segmentations. As of now, we do not have such operations, but non-geometric operations (such as changing contrast) were requested in #16, and PR #84 introduces the first such operations.

Generally, some operations (e.g., all affine operations) should be applied to both the image and the GT, while others (e.g., contrast adjustment) should be applied only to the image (and not the GT). I think this raises two questions:

1. How do we distinguish operations that are safe for GT segmentations from those that are not?
2. What should the API for augmenting an image together with its GT segmentation look like?
I think that (1) could be resolved by introducing a new

```julia
abstract type ImageOnlyOperation <: Operation end
```

All operations that would damage GT segmentations would be subtypes of this abstract type. The existing type `ImageOperation` would denote operations that should be applied to everything (images and segmentations).

For (2), I think the API could be
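(Going back to (1), an illustration of how such a type could be queried — a hypothetical helper name, just a sketch:)

```julia
# Sketch: a trait-style query that the pipeline could use to decide whether an
# operation may also be applied to the GT segmentation.
appliestogt(::ImageOperation)     = true   # safe for both the image and the GT
appliestogt(::ImageOnlyOperation) = false  # would damage the GT, image only
```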