
Extending for fine-grained classification #1

Open
bw4sz opened this issue Jan 17, 2025 · 0 comments

bw4sz commented Jan 17, 2025

Hey Ahmad,

I'm a computer vision developer (https://deepforest.readthedocs.io/) and biologist (https://scholar.google.com/citations?hl=en&user=7POnELAAAAAJ&view_op=list_works&sortby=pubdate). I'm writing a proposal for a cross-view image generation project, from ground-based to airborne images. The idea is to use data from iNaturalist (https://www.inaturalist.org/taxa/475120-Ardenna-gravis/browse_photos?term_id=17&term_value_id=18&layout=grid) to generate training data for airborne object detection models for fine-grained species classification.

This is a real image, but finding annotated data is hard: we have searched 100,000 images out of 3.2 million. They are high quality; if you zoom in, a biologist can easily identify the type of bird.
[attached image: example airborne photo of birds]
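As a rough illustration of the data-collection step described above, here is a minimal sketch of querying the public iNaturalist API (v1 `observations` endpoint) for photo-bearing, research-grade records of a taxon. The endpoint and parameter names follow the documented iNaturalist API; the taxon id (475120, Ardenna gravis) comes from the browse URL above, and the fetch itself is only sketched in a comment.

```python
from urllib.parse import urlencode

API_BASE = "https://api.inaturalist.org/v1/observations"

def build_observation_query(taxon_id: int, per_page: int = 200, page: int = 1) -> str:
    """Build a query URL for photo-bearing, research-grade observations."""
    params = {
        "taxon_id": taxon_id,
        "photos": "true",             # only observations with images
        "quality_grade": "research",  # community-verified identifications
        "per_page": per_page,
        "page": page,
    }
    return f"{API_BASE}?{urlencode(params)}"

url = build_observation_query(475120)  # Ardenna gravis
# A real run would then page through results, e.g. with requests:
#   results = requests.get(url).json()["results"]
#   photo_urls = [p["url"] for obs in results for p in obs.get("photos", [])]
```

A pipeline like this would only gather candidate ground-view images; the harder part, which the questions below are about, is turning them into airborne-view training data.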

Your recent paper is the closest thing to this idea that I've found, so I thought I'd get in touch. I'd probably start from this repo and modify it. Happy to Zoom if you're interested.

  1. One difference between your work and our idea is that we are looking to synthesize airborne images conditioned on ground data, rather than transform a ground image into an airborne image. Do you have an intuition or guesses on which parts of the workflow would need to change? Ocean backgrounds are pretty simple.
  2. An alternative approach is to use GANs, but the field has moved away from them. Your paper hints at this: "These challenges include the drastic viewing angle change, object occlusions, and different ranges of visibility between aerial and ground views. Some prior works attempted G2A synthesis mainly leveraging Generative Adversarial Networks (GANs) [15] but lacked explicit geometric constraints [33] or depended on strong priors like segmentation maps of the aerial view [45]." But it doesn't really say why diffusion models are better; is it just the massive pretraining?
  3. A fine option for us would be to go in a different direction: make a 3D model from sets of photographs, then place this model in a simulated landscape.
     [attached image]
  4. Any other tips about working with image generation from Gemini?

There isn't a ton of literature in this area, so any intuition you have from your experiments would be valued.

Best,

Ben Weinstein
University of Florida
