Hey Ahmad,

I'm a computer vision developer (https://deepforest.readthedocs.io/) and biologist (https://scholar.google.com/citations?hl=en&user=7POnELAAAAAJ&view_op=list_works&sortby=pubdate). I'm writing a proposal for a cross-view image generation project, from ground-based to airborne images. The idea is to use data from iNaturalist (https://www.inaturalist.org/taxa/475120-Ardenna-gravis/browse_photos?term_id=17&term_value_id=18&layout=grid) to generate training data for airborne object detection models for fine-grained species classification.
This is a real image, but finding annotated data is hard: we have searched 100,000 images out of 3.2 million. The photos are high quality; if you zoom in, a biologist can easily identify the bird species.
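For concreteness, here is a minimal sketch of how we'd pull those photos, assuming the public iNaturalist v1 API (https://api.inaturalist.org/v1/observations) and the taxon ID from the link above; the response field names and the square-to-original URL swap are my assumptions and would need checking:

```python
import requests

# Sketch: pull Ardenna gravis photo URLs from the public iNaturalist v1 API.
# Assumes the /observations endpoint and taxon ID 475120 from the link above.
API = "https://api.inaturalist.org/v1/observations"

def fetch_photo_urls(taxon_id=475120, pages=2, per_page=200):
    urls = []
    for page in range(1, pages + 1):
        resp = requests.get(API, params={
            "taxon_id": taxon_id,
            "photos": "true",   # only observations that carry photos
            "per_page": per_page,
            "page": page,
        })
        resp.raise_for_status()
        for obs in resp.json()["results"]:
            for photo in obs.get("photos", []):
                # The API serves thumbnail URLs by default; swapping the size
                # token is a common trick to get full resolution (assumption).
                urls.append(photo["url"].replace("square", "original"))
    return urls

if __name__ == "__main__":
    print(fetch_photo_urls()[:5])
```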
Your recent paper is closer to this than anything else I've found, so I thought I'd get in touch. I'd probably start from this repo and modify it. Happy to Zoom if you're interested.
One difference between your work and our idea is that we are looking to synthesize airborne images conditioned on ground data, rather than transform a ground image into an airborne image. Do you have an intuition, or any guesses, about which parts of the workflow would need to change? Ocean backgrounds are pretty simple.
An alternative approach would be to use GANs, but the field has moved away from them. Your paper hints at this: "These challenges include the drastic viewing angle change, object occlusions, and different ranges of visibility between aerial and ground views. Some prior works attempted G2A synthesis mainly leveraging Generative Adversarial Networks (GANs) [15] but lacked explicit geometric constraints [33] or depended on strong priors like segmentation maps of the aerial view [45]." But it doesn't really say why diffusion models are better; is it just the massive pretraining?
A fine option for us would be to go in a different direction: build a 3D model from sets of photographs, then place that model in a simulated landscape.
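For that route, a minimal first-pass sketch, assuming the COLMAP CLI is installed on PATH and that its automatic_reconstructor pipeline is a reasonable starting point (the paths are placeholders, and the simulated-scene placement, e.g. in Blender, would come after):

```python
import subprocess
from pathlib import Path

# Sketch: build a rough 3D model from a folder of bird photos with COLMAP.
# `automatic_reconstructor` runs feature extraction, matching, and sparse
# and dense reconstruction in one shot.
def reconstruct(image_dir: str, workspace: str) -> None:
    Path(workspace).mkdir(parents=True, exist_ok=True)
    subprocess.run(
        [
            "colmap", "automatic_reconstructor",
            "--workspace_path", workspace,
            "--image_path", image_dir,
        ],
        check=True,
    )

if __name__ == "__main__":
    # Hypothetical paths for illustration only.
    reconstruct("photos/ardenna_gravis", "workspace/ardenna_gravis")
    # The resulting point cloud / mesh could then be imported into a simulated
    # scene and rendered from overhead camera poses.
```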
Any other tips about working with image generation from Gemini would also be welcome.
There isn't a ton of literature in this area, so any intuition you have from your experiments would be valued.
Best,
Ben Weinstein
University of Florida