Error when downloading hdx population density maps #19

Open
wavingtowaves opened this issue Dec 30, 2022 · 7 comments
Labels
bug Something isn't working

Comments

@wavingtowaves

Hi 👋🏻 hoping for some help on an error I'm running into when using urbanpy

When I adapt the line below from urbanpy_workshop.ipynb for Brazil:

pop_brazil = up.download.hdx_fb_population('brazil', 'full')

I get the error

ValueError                                Traceback (most recent call last)
Cell In [30], line 1
----> 1 pop_brazil = up.download.hdx_fb_population('brazil', 'full')

File /opt/homebrew/lib/python3.10/site-packages/urbanpy/download/download.py:182, in hdx_fb_population(country, map_type)
    180     return pd.concat([pd.read_csv(file) for file in dataset_dict[country][map_type]])
    181 else:
--> 182     return pd.read_csv(dataset_dict[country][map_type])

File /opt/homebrew/lib/python3.10/site-packages/pandas/util/_decorators.py:211, in deprecate_kwarg.<locals>._deprecate_kwarg.<locals>.wrapper(*args, **kwargs)
    209     else:
    210         kwargs[new_arg_name] = new_arg_value
--> 211 return func(*args, **kwargs)

File /opt/homebrew/lib/python3.10/site-packages/pandas/util/_decorators.py:317, in deprecate_nonkeyword_arguments.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
    311 if len(args) > num_allow_args:
    312     warnings.warn(
    313         msg.format(arguments=arguments),
    314         FutureWarning,
    315         stacklevel=find_stack_level(inspect.currentframe()),
    316     )
--> 317 return func(*args, **kwargs)

File /opt/homebrew/lib/python3.10/site-packages/pandas/io/parsers/readers.py:950, in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, error_bad_lines, warn_bad_lines, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options)
...
-> 1744     raise ValueError(msg)
   1746 try:
   1747     return mapping[engine](f, **self.options)

ValueError: Invalid file path or buffer object type: <class 'list'>

When I substitute 'children' or 'youth' for 'full', the code runs fine. It's hard to tell from the metadata for this HDX data what the new keyword for "full population data" might be. Do you have any ideas for how I can get the full population data?

Note that the line below from the tutorial does not work either:

pop_arg = up.download.hdx_fb_population('argentina', 'full')

That one gives a 404 error instead of the ValueError.

@bitsandbricks
Contributor

bitsandbricks commented Jan 4, 2023

Hi there!

It seems that data has been moved around at the source. Since the Humanitarian Data Exchange does not provide an API to access datasets, we have to find the URLs and hard code them. When data is updated or moved, those links no longer work... oh well.

You can still get the data with up.download.hdx_dataset() (there's an example in the same tutorial, look for "To access these data, we will use another function of UrbanPy by performing a manual search in the online repository")

Right now, population estimates for Argentina are here, and full population data is at "arg_general_2020_csv.zip", linked to https://data.humdata.org/dataset/6cf49080-1226-4eda-8700-a0093cbdfe4d/resource/5737d87f-e17f-4c82-b1bd-d589ed631318/download/arg_general_2020_csv.zip.

So you should be able to download it using:

up.download.hdx_dataset('https://data.humdata.org/dataset/6cf49080-1226-4eda-8700-a0093cbdfe4d/resource/5737d87f-e17f-4c82-b1bd-d589ed631318/download/arg_general_2020_csv.zip')

Thanks for the heads up!

@Claudio9701
Collaborator

Hi @robcrystalornelas, thanks for your issue.

I'm currently updating this function for the next version of urbanpy so that it uses the HDX API on the backend and updates the data links automatically.

The problem occurs because some of the population data links we set manually no longer work. As @bitsandbricks (thanks!) mentioned, for the moment you can use the up.download.hdx_dataset function to download any CSV dataset you need from HDX.

@wavingtowaves
Author

@bitsandbricks and @Claudio9701 👋🏻 Thanks so much for the additional info. @bitsandbricks, that code actually doesn't work for me in my own Jupyter notebook. I also tried it with the Brazil data. @Claudio9701, does

up.download.hdx_dataset('https://data.humdata.org/dataset/6cf49080-1226-4eda-8700-a0093cbdfe4d/resource/5737d87f-e17f-4c82-b1bd-d589ed631318/download/arg_general_2020_csv.zip')

work for you? I get another 404 error.

@Claudio9701
Collaborator

Hello @robcrystalornelas

You're totally right! There is a problem with the provided line of code that I didn't catch at first. The hdx_dataset function receives the dataset id, not the full URL. For example, we go to the Argentina population dataset on HDX, right-click the specific dataset we want, and copy its link (see figure below).

Argentina: High Resolution Population Density Maps + Demographic Estimates

[Figure: screenshot of copying the dataset link on the HDX page]

The data link for the overall population density dataset is this one:

To run our function, copy only the part of the link that comes after "https://data.humdata.org/dataset/" and pass it as the resource argument. The call ends up as:

arg_pop = up.download.hdx_dataset(resource="6cf49080-1226-4eda-8700-a0093cbdfe4d/resource/5737d87f-e17f-4c82-b1bd-d589ed631318/download/arg_general_2020_csv.zip")
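If you'd rather not trim the link by hand, here is a minimal sketch of that step. The helper name hdx_resource_from_url is hypothetical (it is not part of urbanpy), and it only assumes that the resource argument is whatever follows the dataset prefix, as described above.

```python
HDX_DATASET_PREFIX = "https://data.humdata.org/dataset/"


def hdx_resource_from_url(url):
    """Return the part of an HDX download link that follows the dataset prefix.

    Hypothetical helper, not part of urbanpy.
    """
    if not url.startswith(HDX_DATASET_PREFIX):
        raise ValueError(f"Not an HDX dataset link: {url}")
    return url[len(HDX_DATASET_PREFIX):]


url = (
    "https://data.humdata.org/dataset/6cf49080-1226-4eda-8700-a0093cbdfe4d"
    "/resource/5737d87f-e17f-4c82-b1bd-d589ed631318/download/arg_general_2020_csv.zip"
)
resource = hdx_resource_from_url(url)
print(resource)

# The result can then be passed on as in the call above:
# arg_pop = up.download.hdx_dataset(resource=resource)
```

The helper raises a ValueError for links that do not start with the HDX dataset prefix, so a mistyped URL fails early rather than producing a 404 later.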

In this notebook you can test the solution and generate a population density map like the one below:

[Figure: population density map]

To make it work with Brazil, you have to download the parts of the country that contain the city you want to analyze. Since it is a really big country, it's divided into 4 regions.
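A rough sketch of combining several regional downloads into one table, assuming up.download.hdx_dataset returns a pandas DataFrame for each resource. The load_regions helper and its download and combine parameters are hypothetical, and the resource strings in the usage comment are placeholders rather than real HDX ids.

```python
def load_regions(resources, download, combine=None):
    """Download each regional resource and merge the results into one table.

    Hypothetical helper, not part of urbanpy.
    """
    if combine is None:
        # Default: stack the regional DataFrames row-wise with pandas.
        import pandas as pd

        def combine(frames):
            return pd.concat(frames, ignore_index=True)

    frames = [download(resource) for resource in resources]
    return combine(frames)


# Hypothetical usage; the resource strings are placeholders, not real HDX ids:
# import urbanpy as up
# brazil_resources = [
#     "<dataset-id>/resource/<resource-id>/download/<region-file>.zip",
# ]
# pop_brazil = load_regions(
#     brazil_resources,
#     lambda resource: up.download.hdx_dataset(resource=resource),
# )
```

Passing the download function in as an argument keeps the sketch independent of any particular urbanpy version; only the regions that cover your city need to be listed.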

Thank you so much for being one of the early adopters of urbanpy! I would really love to have a brief online meeting when you are free!

@wavingtowaves
Author

Excellent, thanks @Claudio9701!

I updated the code as you suggested and it worked great ✨

And yes, happy to schedule a time to chat about my work with urbanpy so far.

@biodatasciencearg

The problem is on line 179 of urbanpy/download/download.py. Because type(...) returns a class rather than a list instance, the isinstance check is always False, so the list of Brazil files is passed straight to pd.read_csv:

# Brazil is split into 4 maps
if isinstance(type(dataset_dict[country][map_type]), list):
    return pd.concat([pd.read_csv(file) for file in dataset_dict[country][map_type]])
else:
    return pd.read_csv(dataset_dict[country][map_type])

It should be:

# Brazil is split into 4 maps
if isinstance(dataset_dict[country][map_type], list):
    return pd.concat([pd.read_csv(file) for file in dataset_dict[country][map_type]])
else:
    return pd.read_csv(dataset_dict[country][map_type])
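For anyone who wants to see the difference in isolation, here is a small standalone sketch. The function names and the file names are made up for illustration; they stand in for the two branches above and for dataset_dict[country][map_type].

```python
def branch_buggy(entry):
    # type(entry) is a class object (e.g. <class 'list'>), never a list
    # instance, so this check is always False and a list falls through
    # to the single-file branch.
    return "concat" if isinstance(type(entry), list) else "single"


def branch_fixed(entry):
    # Checks the value itself, so a list of files takes the concat branch.
    return "concat" if isinstance(entry, list) else "single"


# Hypothetical stand-ins for dataset_dict[country][map_type]
brazil_like = ["part1.csv", "part2.csv", "part3.csv", "part4.csv"]
single_like = "country.csv"

print(branch_buggy(brazil_like), branch_fixed(brazil_like))
print(branch_buggy(single_like), branch_fixed(single_like))
```

With the buggy check, the four-file entry is routed to the single-file branch, which matches the "Invalid file path or buffer object type: <class 'list'>" error in the original report.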

@Claudio9701
Collaborator

@biodatasciencearg thanks for your comment! 🙏🏽

In the master branch, this problem is now addressed as you suggested.

... 
if isinstance(ids, list) and len(ids) > 1:
...

I'm working on testing all the minor fixes and new functions so they can be published to PyPI.

@Claudio9701 Claudio9701 added the bug Something isn't working label Sep 5, 2023

4 participants