Usability of this package and the JuliaGeo ecosystem #1

Open
mkborregaard opened this issue Jan 29, 2019 · 62 comments

Comments

@mkborregaard
Owner

Ported from Slack. Me:

Over the last week or two I've been working on a very simple raster implementation in pure Julia. The reason for doing that is that when I had a workshop earlier this month, with 30 very interested biogeographers at a conference, I asked them what it would take for them to move to Julia. And they almost all said "convenient raster support". And it is my impression that that support is not quite there yet in Julia. Of course you can do amazing things with ArchGDAL, but in truth almost all of my work with rasters involves getting data out of them and using that for analyses, and some plotting.

So, I designed an analysis in R: https://www.dropbox.com/s/n8790r0s63d56hm/R%2Bworked%2Braster%2Bexample.html?dl=0 and set out to create a package that could replicate it in Julia. The result is https://github.com/mkborregaard/VerySimpleRasters.jl and the replicated analysis is here: https://www.dropbox.com/s/d5mc52corz5hm7s/VerySimpleRasters%2Bexample.html?dl=0

All operations happen on disk, out of memory, and the script is very fast even for the relatively big raster in the example (except the predictable waiting time on plots).

There's still lots to do. So far it uses R's native raster format as its own native format, and supports importing ESRI ASCII grids to that format. Other formats could possibly be provided by GDAL in some way, like the old RasterIO package?

I'd really love input: how to make this consistent with GeoInterface, whether it is worth continuing in this direction, or whether I have overlooked some obvious existing capabilities, etc.

For anyone wanting to play with the package, notebooks and data, I’ve collected everything in a nice folder here: https://www.dropbox.com/sh/4shiomz5gj4ggx5/AAASQv_K5p2OadhwL-CdRuEja?dl=0

cc @yeesian @visr @c42f

@mkborregaard
Owner Author

Then @yeesian:
Yeesian Ng [5:18 AM]
I think the best way forward is to create the package you want to be using for your daily needs, and to just start using it. And feel free to evangelize it to people if it’s something you feel like maintaining and supporting. @juliohm has been developing and maintaining GeoStats.jl successfully for instance. It could be that parts of the JuliaGeo ecosystem may mature and make things easier in the future, but there’s no timeline for it (since there’s no consistent developmental effort that we can count on). So IMO people shouldn’t refrain from working on their own packages for processing rasters/etc in the meantime.
I am interested in the questions you’re bringing up too, e.g. when you write: “Here are a number of issues. First of all, VerySimpleRasters doesn’t know about Shapefiles. It just knows about polygons, and that they are a vector of Tuples. Maybe it would be useful for Shapefiles to define a polygon in accordance with GeometryTypes? Maybe GeometryTypes could satisfy the Geointerface, or the other way around? So, first I have to extract the points as a Tuple. Then, when I plot it, I see there is a problem: There are actually two polygons here, but they are treated as one. Since this breaks my masking code (but not R’s), I have to define a function to split the Polygons whenever a line returns to the starting point.”
Ideally GeoInterface should have a sufficiently granular interface for others to use. The most common comprehensive one that I know of is https://www.opengeospatial.org/standards/sfa. But it will be hard to provide and maintain implementations for the full spec upfront, and it’ll be more reasonable for us to build up to it as we go (according to demonstrated need).
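As an illustration, a minimal sketch of the ring-splitting workaround quoted above, assuming the polygon arrives as a flat vector of (x, y) tuples; split_rings is a hypothetical helper, not part of Shapefile.jl or VerySimpleRasters.jl:

# Split a flat vector of (x, y) tuples into separate rings whenever the
# coordinates return to the current ring's starting point.
function split_rings(points::Vector{<:Tuple})
    rings = Vector{typeof(points)}()
    start = 1
    for i in eachindex(points)
        if i > start && points[i] == points[start]
            push!(rings, points[start:i])  # ring closed: emit it
            start = i + 1                  # next ring begins after the closure
        end
    end
    start <= length(points) && push!(rings, points[start:end])  # any trailing open ring
    return rings
end

split_rings([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 0.0),
             (2.0, 2.0), (3.0, 2.0), (2.0, 2.0)])  # returns two separate rings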

@mkborregaard
Owner Author

@yeesian:
In yeesian/ArchGDAL.jl#76, it is now:

julia> using ArchGDAL
ArchGDAL

julia> dataset = ArchGDAL.read("temperature/bio1.bil")
GDAL Dataset (Driver: MEM/In Memory Raster)
File(s):

Dataset (width x height): 2160 x 900 (pixels)
Number of raster bands: 1
[GA_Update] Band 1 (Undefined): 2160 x 900 (Int16)


julia> ArchGDAL.read(dataset, 1) # reads the first (and only) band
2160×900 Array{Int16,2}:
-9999  -9999  -9999  -9999  -9999  -9999  …  -9999  -9999  -9999  -9999  -9999  -9999
-9999  -9999  -9999  -9999  -9999  -9999     -9999  -9999  -9999  -9999  -9999  -9999
-9999  -9999  -9999  -9999  -9999  -9999     -9999  -9999  -9999  -9999  -9999  -9999
    ⋮                                     ⋱                                       ⋮
-9999  -9999  -9999  -9999  -9999  -9999     -9999  -9999  -9999  -9999  -9999  -9999
-9999  -9999  -9999  -9999  -9999  -9999  …  -9999  -9999  -9999  -9999  -9999  -9999

So the main thing it saves you is not having to write the driver code for parsing the file format. Alternatively, you don't have to check out that PR, and can do the same (more efficiently) with

using ArchGDAL

raster = ArchGDAL.read("temperature/bio1.bil") do dataset
    ArchGDAL.read(dataset, 1)
end

but I don't think it'll be as fast as the mmap in

function VerySimpleRaster(grdfile::String)
    @assert grdfile[end-3:end] ∈ (".grd", ".gri")
    fname = grdfile[1:end-4]
    data = read_grd(fname*".grd")
    ncols = parse(Int, data["nrows"])
    nrows = parse(Int, data["ncols"])
    dt = datatype_translation[data["datatype"]]
    fname = abspath(fname*".gri")
    mat = open(fname, "r") do IO
        Mmap.mmap(IO, Matrix{dt}, (nrows, ncols))
    end
    xmin, xmax = parse(Float64, data["xmin"]), parse(Float64, data["xmax"])
    ymin, ymax = parse(Float64, data["ymin"]), parse(Float64, data["ymax"])
    celly = (ymax - ymin) / ncols
    cellx = (xmax - xmin) / nrows
    VerySimpleRaster(mat, parse(dt, data["nodatavalue"]), xmin, ymin, cellx, celly, data["projection"], fname)
end

@mkborregaard
Owner Author

Me:
So could these things be unified? I mean, could a solution be to have GDAL open an mmapped on-disk matrix with metadata (essentially what a VerySimpleRaster is)? That would allow us to close the file again (making the operation safe and avoiding the function syntax), and also to treat the raster as a completely normal Julia Matrix (giving access to e.g. the entire Images stack for analysis). And if it is necessary to use GDAL to work directly on the on-disk representation, the VerySimpleRaster format keeps the path to the input file, so functions on VerySimpleRaster could open the file, do the operation with GDAL, then close it again, creating a seamless on-disk experience.

@mkborregaard
Owner Author

@visr :
The thing with GDAL and mmap is that GDAL supports many formats, and only a few of them have binary blocks that can be fully memory mapped. Some formats are ASCII, or have compression, tiling etc., so their on-disk representation is not the same as their in-memory representation.
This is why the ArchGDAL example shows (Driver: MEM/In Memory Raster).

It's the reason we need GDAL as an abstraction layer if we want to support so many formats.
That of course leaves room for packages like VerySimpleRasters that only support some formats, for memory-mapping reasons.

@mkborregaard
Owner Author

Great! And yes agreed, I’d be perfectly content to support the mmappable ones natively and provide a conversion for the others (via GDAL most likely)

@yeesian

yeesian commented Jan 29, 2019

GDAL allows you to bring your own format-specific drivers, but we (JuliaGeo and/or OSGeo) are not at the point where we can let you easily swap a native Julia implementation of a driver into GDAL. Nonetheless, if you're interested, here is their tutorial. Their code is X/MIT-licensed, so you can see how they currently do it. E.g. here's open() for ehdr.

Is it really so slow that you need to be implementing your own driver though? You can always file an issue upstream in the GDAL repository if so: it might benefit GDAL users in R/Python/Rust/etc too.

@mkborregaard
Owner Author

mkborregaard commented Jan 29, 2019

No, it's not that it's slow; it's that I want to avoid the clunky interface that GDAL gives you. Julia's AbstractMatrix stack is so strong that I think much can be said for having a direct Julia representation of rasters, with the option of calling GDAL on them once you need something advanced. In my practical work I would never need all the capabilities of GDAL (in fact none, other than opening different raster formats and possibly reprojection), so for me it's overkill to have to use registerdrivers() do etc. for simple things.

@yeesian

yeesian commented Jan 29, 2019

No, it's not that it's slow; it's that I want to avoid the clunky interface that GDAL gives you. [...] so for me it's overkill to have to use registerdrivers() do etc. for simple things.

Just to be clear: that's just a limitation of an existing Julia wrapper, which I'm actively working to address in yeesian/ArchGDAL.jl#76, and not a fundamental limitation of working with GDAL.

You can still easily read in the raster data (see the examples in #1 (comment)) and have access to all the other metadata and functionality for free via GDAL.

@yeesian

yeesian commented Jan 29, 2019

No it's not that it's slow

The only reason to be parsing the grd file yourself, like in

function VerySimpleRaster(grdfile::String)
    @assert grdfile[end-3:end] ∈ (".grd", ".gri")
    fname = grdfile[1:end-4]
    data = read_grd(fname*".grd")
    ncols = parse(Int, data["nrows"])
    nrows = parse(Int, data["ncols"])
    dt = datatype_translation[data["datatype"]]
    fname = abspath(fname*".gri")
    mat = open(fname, "r") do IO
        Mmap.mmap(IO, Matrix{dt}, (nrows, ncols))
    end
    xmin, xmax = parse(Float64, data["xmin"]), parse(Float64, data["xmax"])
    ymin, ymax = parse(Float64, data["ymin"]), parse(Float64, data["ymax"])
    celly = (ymax - ymin) / ncols
    cellx = (xmax - xmin) / nrows
    VerySimpleRaster(mat, parse(dt, data["nodatavalue"]), xmin, ymin, cellx, celly, data["projection"], fname)
end
is if you deem the corresponding GDAL driver "too slow", which doesn't seem to be the case for you.

@mkborregaard
Owner Author

mkborregaard commented Jan 29, 2019

Nice. So, just to understand: do you think there's no scope for the pure-Julia mmap approach once you're further along with ArchGDAL?

By the way,

The only reason to be parsing the grd file yourself

another reason for doing this is that this package then doesn't need a dependency on GDAL. The GDAL integration could live somewhere else.

@mkborregaard
Owner Author

mkborregaard commented Jan 29, 2019

I thought you wrote that GDAL always loads it into memory? The thing is, I want a handle on the raster contents, on disk, that I can access directly (e.g. via getindex); I don't care about which driver.
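For illustration, a rough sketch of that kind of direct handle: wrapping a memory-mapped matrix in an AbstractMatrix so getindex works while the data stays on disk. DiskRaster is a hypothetical name, not an existing type in any of the packages discussed here.

using Mmap

struct DiskRaster{T} <: AbstractMatrix{T}
    data::Matrix{T}   # Mmap-backed matrix, so the contents stay on disk
    path::String      # path to the underlying file, e.g. for later GDAL round-trips
end

function DiskRaster(path::String, ::Type{T}, dims::Tuple{Int,Int}) where {T}
    mat = open(path, "r") do io
        Mmap.mmap(io, Matrix{T}, dims)
    end
    DiskRaster{T}(mat, abspath(path))
end

# Minimal AbstractArray interface: size and getindex delegate to the mmapped data.
Base.size(r::DiskRaster) = size(r.data)
Base.getindex(r::DiskRaster, I...) = getindex(r.data, I...)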

@yeesian

yeesian commented Jan 29, 2019

I thought you wrote that GDAL always loads it into memory?

That's just my own implementation of it in https://github.com/yeesian/ArchGDAL.jl/blob/d5ee1fd0c7b6b3932178d1696ac3a6284fab48dd/src/dataset.jl#L259-L271.

GDAL doesn't load data into memory by default. But you might run into nasty memory bugs later, e.g. like in JuliaGeo/LibGEOS.jl#58.

@mkborregaard
Owner Author

So, just to understand what you suggest exactly: scope for separate packages? Integration? Abandon this as soon as ArchGDAL finishes the developments currently being merged?

@yeesian

yeesian commented Jan 29, 2019

There are currently GeoJSON.jl and Shapefile.jl, which natively read the geojson and shapefile formats. They are vector data formats, so we borrowed an "informal" interface from Python for https://github.com/JuliaGeo/GeoInterface.jl to get some degree of unification across those packages.

There is a discussion (which you're involved in) on a similar kind of interface for rasters. JuliaGeo has NetCDF.jl too, so it could help e.g. unify plotting for rasters across NetCDF, "R Raster and EHdr" (which this package seems to be oriented around; see https://www.gdal.org/frmt_various.html and the references therein for details) and Arch/GDAL. But I don't work enough with rasters, or know enough about them, to have an informed opinion about the direction it (the raster interface) might take.

@rafaqz

rafaqz commented May 27, 2019

I found this while thinking about writing something similar myself. I agree entirely that better raster handling is pretty fundamental to doing biogeography in Julia!

I feel that we need a RasterBase package that defines AbstractRaster and methods for working with coordinates and projections. I'm writing a bunch of packages for spatial simulations and this should be a basic type for all of them... It would also be great if my simulation packages kept spatial metadata bundled with matrices, so we can remove the need for passing in spatial parameters, and just plot the results with a recipe.

VerySimpleRaster seems like a good demonstration of this, but it would be great to abstract the interface, then provide integrations for NetCDF, ArchGDAL and similar packages.

I've read some of the other threads talking about something like this. Is there a plan for something similar already in progress, or should I just go ahead and implement something?
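To make the RasterBase idea above concrete, a sketch of what such a package might define; all names here are hypothetical and only illustrate the shape of the interface.

abstract type AbstractRaster{T,N} <: AbstractArray{T,N} end

# Methods a concrete raster type would implement:
function bbox end         # bbox(r): (xmin, ymin, xmax, ymax)
function cellsize end     # cellsize(r): (cellx, celly)
function crs end          # crs(r): projection / coordinate reference system
function nodatavalue end  # nodatavalue(r): sentinel value for empty cells

# Generic functionality then falls out of the interface, e.g. mapping a
# geographic coordinate to an array index:
function coord2index(r::AbstractRaster, x, y)
    xmin, ymin, _, _ = bbox(r)
    cx, cy = cellsize(r)
    return floor(Int, (x - xmin) / cx) + 1, floor(Int, (y - ymin) / cy) + 1
end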

@mkborregaard
Owner Author

I'd happily depend on and adhere to an abstract interface. I think it's just a question of doing it. In addition to @yeesian, it'd probably be useful to speak to @evetion, who has GeoRasters.

An alternative (possibly better) would be to put the abstract interface here, with the concrete implementation in this package as a fallback for the lightest case? This package has essentially 0 dependencies (well, except RecipesBase, but that's essentially a stdlib), to make it possible for more advanced implementations to depend on it at 0 cost. Happy to change this package to accommodate.

BTW, there's also some discussion in JuliaGeo/GeoInterface.jl#16, as any abstract interface should be compatible with GeoInterface IMHO.

@rafaqz

rafaqz commented May 27, 2019

Yeah I was also thinking the interface should include a concrete component for the most basic case.

I'll read through those threads too. Unfortunately I'm not that experienced working with rasters and spatial data; I've mostly written models and frameworks that handle matrices, and let others do the actual raster manipulation in R! But that's a short-term solution that needs attention soon. I'll put a small base package together in a few weeks (after thesis hand-in!), but feedback will be really useful.

Aren't there HDF5 and Mmap deps in this package? I was thinking it should be much more lightweight than that.

@mkborregaard
Owner Author

mkborregaard commented May 27, 2019

Mmap is part of the base Julia distribution, so it's not a dep, just like Dates. HDF5 isn't used; it's just in the Project.toml because I was experimenting with it.
Before you start building an alternative package, do consider whether we can do it in the context of this one. Happy to give you push rights if you're interested in collaborating.

@rafaqz

rafaqz commented May 27, 2019

Sure! If you're open to that I can work here.

I noticed you mentioned the problem of time as well in the other thread. It's a common problem for me too. I also have to deal with the vertical dimension of layered environmental data, so it would be good to define common ways of accessing additional temporal and spatial dimensions. I was going to write a MicroclimateBase package specifically for this, but it would be good if that could depend on a more general base package and just add methods for the specific environmental variables, instead of the whole framework for data access.

@mkborregaard
Owner Author

Great, you should have push rights now. And yes, good point on the time dimension.

@juliohm

juliohm commented May 27, 2019 via email

@rafaqz

rafaqz commented May 27, 2019

Great! NetCDF integration is something I need too.

The packages I'm developing that are specifically related are Dispersal.jl and Cellular.jl, for general cellular-automata-style models and for dispersal modelling with heterogeneous spatial data and various dispersal and growth processes. GrowthRates.jl provides simple growth-rate grids for dispersal modelling from raster climate data (from HDF5), and Microclimate.jl provides common microclimate data, from NetCDF files, for more complex species distribution modelling. They are all in various stages of development, and all need more generalised interfaces and further abstraction of the NetCDF and HDF5 data sources. Some kind of AbstractRaster seems to be a key component, although it's taken me a while to realise that.

Honestly, I don't really understand what GeoStats and ClimateTools do! There's a bit of a mismatch with the applications and language we use in ecology/biogeography.
Browsing the code, it seems like AbstractSpatialData in GeoStatsBase could be what we are looking for? Although the T<:Real limit would be a problem for me (Unitful).

I see ClimateTools.jl deals with 3D and 4D data, which is something I specifically need for working with microclimates and am currently handling in ad-hoc ways. The ClimGrid type also seems to have a lot of what we need for biogeography, but also a lot more that we don't need! But inheriting from some shared abstract type would be good.

@mkborregaard
Owner Author

@rafaqz if you're working on Species Distribution Models you may also want to check out @tpoisot 's work here: https://github.com/EcoJulia/SimpleSDMLayers.jl (just started last week) and here: https://gitlab.com/tpoisot/BioClim

@mkborregaard
Owner Author

After looking at your repos, in general you might actually be interested in the EcoJulia organisation: the mission is to create a framework for various aspects of community ecology and geographical ecology in Julia.

@rafaqz

rafaqz commented May 27, 2019

I've been thinking about EcoJulia for a while! Was waiting until I had a few things to contribute :)

I mostly develop mechanistic SDMs, dispersal models and biophysical modelling tools (also see my other works in progress Photosynthesis.jl and DynamicEnergyBudgets.jl)

But this means my data requirements are often a lot more fine-grained spatially and temporally than the WorldClim-style data in SimpleSDMLayers. But both of those packages seem like they would be good candidates for incorporation into a more abstract approach to raster data?

@juliohm

juliohm commented May 27, 2019

The AbstractSpatialData type in GeoStatsBase.jl covers all spatial data types in the docs: https://juliohm.github.io/GeoStats.jl/stable/spatialdata/

I did put the T<:Real restriction there without thinking about units. But we can surely extend the type to any T and handle units more gracefully. I have to check out the units packages in Julia first; I am not using them yet.

@rafaqz

rafaqz commented May 27, 2019

Ok great, that would probably be a good type to inherit from then, although I'll have to get my head around your package more.

(I like to add units whenever I load NetCDF data with known units into memory, so that everything is automatically correct after that.)

@mkborregaard
Owner Author

IMHO the priority should be to be compatible with / inherit from the JuliaGeo types. Your types are just compatible among your own packages, @juliohm, right?

@rafaqz

rafaqz commented May 27, 2019

One issue for me is the "Stats" part of GeoStatsBase!

I rarely do statistical modelling, and stats tooling complicates the package considerably, both for contributing to it and for orienting the ecosystem of (non-statistical) tools I would be building around it. I was imagining something more like a GeoBase, with no deps at all: just an abstract interface for basic spatial types.

@rafaqz

rafaqz commented May 27, 2019

Ok, I still don't have my head around how JuliaGeo and all these packages relate!

Should we be aiming to produce some common basis for all (or as many as possible) of these use cases?

@rafaqz

rafaqz commented May 27, 2019

I agree on leaving Missing up to implementations; sometimes I specifically disallow missing in modelling applications, as it can be much slower. (Edit: and you specifically can't use missing in dispersal simulations unless you want to output a whole grid of missings. But I still want spatial metadata to pass through to the output array...)
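A rough illustration of the performance point (assumed, not benchmarked here): a Union{Missing,Float64} eltype adds branching in tight loops, whereas a concrete eltype with a nodata sentinel, as rasters typically use, does not.

a = rand(10_000_000)                              # concrete Float64 eltype
b = convert(Vector{Union{Missing,Float64}}, a)    # Union eltype that allows missing

# Sentinel-based "skip nodata" on a concrete eltype.
sum_skipnodata(x, nodata) = sum(v for v in x if v != nodata)

sum(skipmissing(b))          # must branch on the Union eltype
sum_skipnodata(a, -9999.0)   # plain Float64 loop with a nodata sentinel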

@mkborregaard
Owner Author

mkborregaard commented May 27, 2019

Yeah, sounds good. Also, many of the functions I define here are essentially part of a minimal interface, I'd think, e.g. getindex, extract, crop, mask, bbox and in. Maybe even aggregate, though that's not as minimal?
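A sketch of what that minimal interface could look like as bare generic functions (illustrative only; AbstractRaster stands in for whatever abstract type is agreed on):

function extract end    # extract(r, points): raster values at a vector of coordinates
function crop end       # crop(r, bounds): subset to a bounding box
function mask end       # mask(r, polygon): nodata outside the polygon
function bbox end       # bbox(r): spatial extent
function aggregate end  # aggregate(r, factor, f): coarsen resolution (perhaps not minimal)

# getindex and in extend Base rather than defining new generics, e.g.
# Base.getindex(r::AbstractRaster, i, j) and Base.in(point::Tuple, r::AbstractRaster).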

@rafaqz

rafaqz commented May 27, 2019

It might be good to add some kind of abstract type hierarchy to deal with additional spatial and temporal dimensions, clarifying what N (in AbstractGeoArray{N,T}) means for any particular dataset in a standard way, and handling 3D and 4D datasets. I'm mostly using 4D microclimates (sometimes 5D), so this is particularly important to me.

ClimateTools.jl deals with 3 or 4 dimensions like this, so N is 3 or 4. Time is always the last dimension, the first two dimensions are lon and lat, and the third dimension is elevation for 4D:

  if ndims(data) == 3
    # Convert data to AxisArray
    dataOut = AxisArray(data, Axis{Symbol(lonname)}(lon_raw), Axis{Symbol(latname)}(lat_raw), Axis{:time}(timeV))
  elseif ndims(data) == 4 # this implies a 3D field (height component)
    # Get level vector
    plev = ds["plev"][:] # NetCDF.ncread(file, "plev")
    # Convert data to AxisArray
    dataOut = AxisArray(data, Axis{Symbol(lonname)}(lon_raw), Axis{Symbol(latname)}(lat_raw), Axis{:plev}(plev), Axis{:time}(timeV))
  else
    throw(error("load takes only 3D and 4D variables for the moment"))
  end
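One way the proposed dimension hierarchy could look, sketched with hypothetical names: tag each axis with a type rather than relying on a fixed position, so 3-, 4- or 5-dimensional data can all declare their dimensions explicitly.

abstract type AbstractDimension{T} end

struct Lon{T}  <: AbstractDimension{T}; val::T end
struct Lat{T}  <: AbstractDimension{T}; val::T end
struct Vert{T} <: AbstractDimension{T}; val::T end   # elevation / pressure level
struct Time{T} <: AbstractDimension{T}; val::T end

# A dataset then carries its dimensions as a tuple, in whatever order it was stored:
dims3d = (Lon(0.0:0.5:359.5), Lat(-90.0:0.5:90.0), Time(1:12))
dims4d = (Lon(0.0:0.5:359.5), Lat(-90.0:0.5:90.0), Vert([1000.0, 850.0, 500.0]), Time(1:12))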

@mkborregaard
Owner Author

@juliohm just to be explicit, I'm 100% in favour of integration across the Julia ecosystem, and I think the ecosystem you're singlehandedly building is impressive. I'd just prefer this happened through shared compatibility with the JuliaGeo org. Does this make sense to you?

@juliohm

juliohm commented May 27, 2019 via email

@tpoisot

tpoisot commented May 27, 2019

Would love to attend the meeting - as @mkborregaard said, we are working on SimpleSDMLayers for rapid development of bioclim-type models, but there are a lot of niches to fill here. We can decide to work with compatible types, or decide on a common interface to implement for all types, or both. These are not trivial decisions, and the more people express their opinions, the better.

@rafaqz

rafaqz commented May 28, 2019

@juliohm I'm keen to attend such a meeting. It would be good to delay it for two weeks, so that there is time to learn enough about everyone's packages and write a simple API package to have as a discussion point? (I also need to finish my thesis this week.)

@mkborregaard The number of use cases here makes me think we need a very simple package without an implementation, just for clarity. So starting a fresh package might be better.

@mkborregaard
Owner Author

So starting a fresh package might be better.

Sure, if you think so, we can always add stuff in from existing packages later

@rafaqz

rafaqz commented Jun 7, 2019

Here's a prototype base package for discussion:

https://github.com/rafaqz/GeoArrayBase.jl/blob/master/src/GeoArrayBase.jl

It might need to be called GeoBase, as I incorporated non-array geo data types (as in GeoStatsBase etc.) that should use the same methods.

As well as basic methods, it has some definitions of array dimensions, CRS projection formats and calendars that don't seem to be standardised anywhere. They probably belong in other base packages.

@juliohm

juliohm commented Jun 16, 2019

@rafaqz, the T<:Real restriction in GeoStatsBase.jl is gone now, so in theory one should be able to use Unitful.jl coordinates with the spatial objects defined therein.

Thanks for putting together a GeoArrayBase.jl proposal. I believe that we also need to consider other spatial configurations in this GeoBase.jl package we are discussing, for example structured grids, point collections, images, etc. These types are kind of what I have in GeoStatsBase.jl + GeoStatsDevTools.jl, and they serve my current needs. However, they lack the concept of projections, for example. Also, I did not have a chance to include the time dimension in any of those; I don't know if space and time should actually be mixed.

I did a major rewrite of the interface in GeoStatsBase.jl to facilitate some code reuse (e.g. plot recipes) and other planned features. In the spatialobject.jl file you can find a general interface for anything spatial (that fits in the concept of point-based measurements). From this abstract type, we can define at least two sub-types, which are still abstract: spatial domain, and spatial data.

The domain type is pretty much the same as a general spatial object type. It only materializes the idea of a collection of points with coordinates. It is basically a trick to use the spatial object interface, and still have a new type on which to dispatch.

The spatial data type is a domain plus some data. For example, for each point of the underlying domain, we can save properties. Most implementations in GeoStatsDevTools.jl save these properties in a dictionary mapping symbols to flat vectors of values. Note, however, that because we inherit from spatial object, and because we have domain types already implemented, all we need to do is wrap the domain in a new type containing the domain and an extra dictionary. Everything else we get for free regarding spatial queries.

This interface is working nicely so far within GeoStats.jl, and I'd be happy to go over it in a call if you guys feel it is appropriate. My hope is that at some point these domain types that I am writing can be reviewed and extended to become a separate GeoBase.jl or GeoInterface.jl package.

As a final comment, notice that all these types that I wrote above are point-based. I don't have much experience with polygon-based or area-based spatial objects. This GeoInterface.jl package that we are discussing could encompass both classes of spatial data.
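A condensed sketch of the hierarchy described above, just to make the structure concrete (the actual definitions live in GeoStatsBase.jl / GeoStatsDevTools.jl and differ in detail):

abstract type AbstractSpatialObject end                          # anything with point coordinates
abstract type AbstractDomain      <: AbstractSpatialObject end   # just the points
abstract type AbstractSpatialData <: AbstractSpatialObject end   # a domain plus values

# A concrete spatial data type can simply wrap a domain and a table of properties,
# inheriting all spatial queries from the underlying domain:
struct PointSetData{D<:AbstractDomain} <: AbstractSpatialData
    domain::D
    properties::Dict{Symbol,Vector{Float64}}   # e.g. :temperature => flat vector of values
end

domain(d::PointSetData) = d.domain
Base.getindex(d::PointSetData, var::Symbol) = d.properties[var]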

@rafaqz

rafaqz commented Aug 6, 2019

I have an update on this. I ended up putting together DimensionalData.jl as an abstract xarray-like indexing system that can be used to manipulate and plot tagged dimensional data, and which suits this problem better than AxisArrays.

GeoData.jl inherits from that, but adds spatial metadata, and hopefully integrates with spatial data as you are describing, @juliohm. Maybe we could start an issue there to discuss this further, if this direction seems useful to you? I've looked at AbstractSpatialObject etc. but there is still a lot I don't understand the reasoning for; it's a big package.

There are some examples, one of which is this (g is a 3D GeoArray):

# plot the mean sea surface temperature around Australia from August to December 2002
dimranges = Time((DateTime360Day(2002, 08, 1), DateTime360Day(2002, 012, 30))), Lat(-45:0.5), Lon(110:160)
select(g, dimranges) |> x->mean(x; dims=Time) |> plot

Which plots ~correctly with axis labels and metadata even after select() and mean(). It would be great if select could accept a polygon as well.

@mkborregaard I'm not sure how much this covers what you wanted with VerySimpleRasters. The main missing piece is to integrate with NCDatasets, GDAL (or wrapper packages) for specific load/save/conversion, which will probably require a lot more changes to GeoDims as well... There are lots of other small pieces unfinished too, and feedback would be very useful.

@evetion Although this doesn't look like it will work with affine transforms, it will... But they are optional, as a lot of the time we don't need them. AffineDims will just wrap any subset of the dimensions and hold the AffineMap. So a concrete type doesn't need an affine field, just dims, and any combination will just work. That also means normal indexing just uses dimensions, but select() will use the affine transform when it exists.

@juliohm

juliohm commented Aug 6, 2019

Thank you again @rafaqz for your efforts! I am still trying to understand to what extent the types proposed can be reused directly within GeoStats.jl. Could you please confirm that the types support non-regularly sampled points? For example, something that looks like this or simple point sets without array-like layouts.

Besides these non-regular types, I need to extend the existing types in GeoStatsBase.jl to introduce the notion of a volume element. Some geostatistical methods can handle spatial measurements over different supports. For example, it is common to have measurements of variable A in one resolution, and measurements of another variable B in a different resolution. In order to handle this scale difference explicitly in statistical models, we need to encode the volume of the measurement.

I am starting to think that we don't need types, but a comprehensive traits interface. It will be hard to reconcile all the proposed packages, since everyone in JuliaGeo has different use cases.

@mkborregaard
Owner Author

Hi, I'm really sorry for not engaging more with this. I'm currently in Ecuador and will be travelling to various places until mid-September, at which point I would absolutely love to engage fully with this again.

@visr

visr commented Aug 6, 2019

@rafaqz awesome that you're exploring this space. Haven't had time to look into everything, but just briefly checked out DimensionalData. Have you considered using/working with the folks at https://github.com/invenia/NamedDims.jl and https://github.com/invenia/IndexedDims.jl? They have similar goals, and aim to be the next AxisArray (and are inspired by xarray).

@rafaqz

rafaqz commented Aug 6, 2019

@juliohm non-regular grids are doable. Dims could hold the matrices and handle dispatch, so it would work on regular array types:

dims = Lat(latmatrix), Lon(lonmatrix)
a = GeoArray(data, dims)
select(a, Lat((1, 7)), Lon((1, 7)))

There would just need to be a few methods that dispatch on AbstractDimension{Matrix} to retrieve the indices with a splat.

But dims with indexable integer lat/long also aren't the usual case... Maybe a wrapper type on the matrix could make that clearer, e.g. Lat(Curvilinear(latmatrix)).

Otherwise, the main change this approach would mean for GeoStatsBase is either adding/loosening a dims field in structs and using it to store dimension order and bounds, or just generating dims in the dims() method using your current fields and fixed dimension order. Also adding a D parameter, as in AbstractSpatialObject{T,N,D} <: AbstractGeoDataset{N,D}, for dispatch.

The other thing is a rebuild() method, so that dims can be updated after subsetting with select() or reduction of dimensions. That's how the above example still plots through the reduction and crop.

I'm not sure if you will see these changes as worthwhile! But separating out dims seems pretty powerful for interop to me: number and order of dimensions, grid spacing, coordinates and type (like DateTime etc.), bounds, units, affine transforms, and possibly other metadata are all available from one method/field without changing anything else.

You can also just add a dims() method (without inheritance) and half of DimensionalData still works, but it would mean more code in the end.

@mkborregaard if I was travelling in Ecuador I wouldn't be engaging either :) But try the example when you get back; the plotting is pretty smooth.

@rafaqz

rafaqz commented Aug 6, 2019

@visr thanks for pointing that out. I hadn't seen IndexedDims yet, but NamedDims has some structural issues, similar to AxisArrays, and that's the reason I didn't use them: they are locked into using the concrete type NamedDimsArray. Using a wrapper means we can't dispatch on the wrapped type for other purposes, which limits ecosystem potential. I'm also not sure how I would use them with arbitrary indexing ranges, affine maps or the coordinate matrices described above.

In contrast, any array type can use DimensionalData.jl indexing by adding only one method: dims(::SomeType).

AbstractDimension is the main component used for dispatch, rather than the array type. There are only ~50 lines dedicated to AbstractDimensionalArray, mostly the basic array interface and making similar() etc. maintain the array type. And this is the actual code for the concrete array type, which will update its dims through reduce functions and use plot recipes:

struct DimensionalArray{T,N,D,R,A<:AbstractArray{T,N}} <: AbstractDimensionalArray{T,N,D}
    data::A
    dims::D
    refdims::R
end
DimensionalArray(a::AbstractArray{T,N}, dims; refdims=()) where {T,N} = 
    DimensionalArray(a, checkdims(a, dims), refdims)

# Array interface (AbstractDimensionalArray takes care of everything else)
Base.parent(a::DimensionalArray) = a.data

# DimensionalArray interface
rebuild(a::DimensionalArray, data, dims, refdims) = DimensionalArray(data, dims, refdims)

dims(a::DimensionalArray) = a.dims
refdims(a::DimensionalArray) = a.refdims

So adding it to other packages is pretty easy.
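A rough usage sketch for the snippet above, assuming Lat/Lon dimension types like those in the earlier select() example (the exact constructors and reduction syntax may differ in the package):

using Statistics

data = rand(180, 360)
a = DimensionalArray(data, (Lat(-89.5:1.0:89.5), Lon(-179.5:1.0:179.5)))

a[10, 20]          # plain indexing still works
mean(a; dims=Lat)  # reductions can name the dimension instead of its position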

@rafaqz

rafaqz commented Aug 6, 2019

Also, @visr, I really recommend running the examples script! Data manipulation and plotting are so easy.

@visr

visr commented Aug 6, 2019

They are locked into using the concrete type NamedDimsArray. Using a wrapper means we can't dispatch on the wrapped type for other purposes, which limits ecosystem potential. I'm also not sure how I would use them with arbitrary indexing ranges, affine maps or the coordinate matrices described above.

Maybe it's good to bring this up with the devs; you have some good ideas and clear use cases that they are looking for. Just trying to save you maintenance work in the long run ;) It seems that most of your code is not geo-specific anyway, which is a good sign. If I understand correctly, IndexedDims.jl wants to allow different kinds of indexes by using AcceleratedArrays (great package). I guess making an AffineIndex using that package should be relatively straightforward.

@rafaqz

rafaqz commented Aug 6, 2019

I'll let them know what I've been working on, for sure. But IndexedDims is all concrete again, while DimensionalData is an abstraction designed to be bolted onto things... So I can write an easy pull request to NCDatasets (etc.) to add dimensional indexing, data manipulation and plotting without getting in the way. It will be less work because the abstraction fits better.

@rafaqz

rafaqz commented Aug 8, 2019

This works for NCDatasets with a very small pull request. Just need to make it save back to NetCDF and write some tests (and probably handle a lot of other cases I haven't considered yet):
JuliaGeo/NCDatasets.jl#35

It should be easy to do a similar thing for other packages.

@meggart

meggart commented Sep 18, 2019

Cross-posting from the JuliaGeo/meta thread. As there is some consensus that we should have a teleconference about the future of geo raster types, I started a Google Docs document and would invite everyone interested in joining such a teleconference to add their availability to the list, so that we can try to fix a date and time to discuss the future common interface for geo raster data in Julia.

https://docs.google.com/document/d/1ccaSltPDb5n-bLNgWyrotsWXz2n7ckMggOIPejy_GvQ/edit?usp=sharing

@Balinus

Balinus commented Dec 19, 2019

It might be good to add some kind of abstract type hierarchy to deal with additional spatial and temporal dimensions, clarifying what N (in AbstractGeoArray{N,T}) means for any particular dataset in a standard way, and handling 3D and 4D datasets. I'm mostly using 4D microclimates (sometimes 5D), so this is particularly important to me.

ClimateTools.jl deals with 3 or 4 dimensions like this, so N is 3 or 4. Time is always the last dimension, the first two dimensions are lon and lat, and the third dimension is elevation for 4D:

  if ndims(data) == 3
    # Convert data to AxisArray
    dataOut = AxisArray(data, Axis{Symbol(lonname)}(lon_raw), Axis{Symbol(latname)}(lat_raw), Axis{:time}(timeV))
  elseif ndims(data) == 4 # this implies a 3D field (height component)
    # Get level vector
    plev = ds["plev"][:] # NetCDF.ncread(file, "plev")
    # Convert data to AxisArray
    dataOut = AxisArray(data, Axis{Symbol(lonname)}(lon_raw), Axis{Symbol(latname)}(lat_raw), Axis{:plev}(plev), Axis{:time}(timeV))
  else
    throw(error("load takes only 3D and 4D variables for the moment"))
  end

It might be useful to add here that the dimensions are standardized by the Climate and Forecast (CF) conventions. Thus, having the time dimension as the trailing dimension is set by this convention (Fortran ordering in this case; the C implementation uses the reverse dimension ordering, see http://www.bic.mni.mcgill.ca/users/sean/Docs/netcdf/guide.txn_22.html). Now, the interesting part that I haven't looked into is that NetCDF.jl seems to be a wrapper around the C library, but it returns the Fortran ordering. Maybe because Julia's memory layout is akin to Fortran's?

Anyway, I just wanted to point out that in the climate and forecasting domain there are some conventions, so it is easier to develop (read "hardcode") some decisions.
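The ordering point can be seen directly from Julia's memory layout; a quick illustration:

# Julia arrays are column-major (Fortran order): the first index varies fastest in
# memory. A variable stored as (time, lat, lon) in C order therefore comes back as
# lon × lat × time when read into Julia, without any data being rearranged.
A = zeros(3, 4, 5)   # e.g. lon × lat × time
strides(A)           # (1, 3, 12): the first dimension is contiguous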

@visr

visr commented Dec 19, 2019

Now, the interesting part that I haven't looked into is that NetCDF.jl seems to be a wrapper around the C library, but it returns the Fortran ordering. Maybe because Julia's memory layout is akin to Fortran's?

Yeah that's right, see also JuliaIO/Zarr.jl#1 (comment) and yeesian/ArchGDAL.jl#37 for some more discussion about this topic. Nice link about the Fortran dimension ordering, that's what we are implicitly doing.

@rafaqz

rafaqz commented Dec 19, 2019

But the conventions break anyway once you slice the array. Making it arbitrary is better, as you get to keep dim labels after slicing. GeoData.jl already does all this seamlessly.

@mkborregaard
Owner Author

@rafaqz do you see scope for folding more of this package into GeoData?

@rafaqz

rafaqz commented Dec 19, 2019

Grd load/save works, although slightly differently, and I have a half-finished ASCII implementation. Plotting works well, also copied from here originally. I just wrote an aggregate routine (you can aggregate any dims/shapes and use point sampling or mean/sum etc. methods; it also aggregates the coordinate index).

Mostly what is missing is the coordinate vector methods like extract, and something like isinside, as we were discussing on Slack. I've been hesitant to copy those over from here, as it would be good to do that integrated with GeometryBase/GeoInterface, and I don't really understand where that is going yet.

Edit: but yes that is the plan

@Balinus

Balinus commented Dec 20, 2019

Now, the interesting part that I haven't looked into is that NetCDF.jl seems to be a wrapper around the C library, but it returns the Fortran ordering. Maybe because Julia's memory layout is akin to Fortran's?

Yeah that's right, see also meggart/Zarr.jl#1 (comment) and yeesian/ArchGDAL.jl#37 for some more discussion about this topic. Nice link about the Fortran dimension ordering, that's what we are implicitly doing.

That was my suspicion. I do admit that back then, I spent quite a lot of time trying to understand why I was getting F ordering when NetCDF was calling C functions! 😄
