Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#NUM! values in amount and loc columns in XLSX are imported as the number 36 by the Activity Browser. #514

Closed
mixib opened this issue Feb 1, 2021 · 4 comments
Labels
brightway Issues related to interaction with brightway2 code bug Issues/PRs related to bugs data imports Issues related to importing data

Comments

@mixib
Copy link

mixib commented Feb 1, 2021

Consider an XLSX file in the brightway import format, with the value #NUM! for some of the exchanges amount and loc columns.

bug01_XLinput

Following the way in which amounts are calculated based on formulas containing parameters on import, I expected that the amount would be calculated as a mean value based on the uncertainty data. Instead it shows as the number 36 on the Activity Browser. Only when modifying the uncertainty fields exchange by exchange there is a prompt to update the amount to match with the mean.

bug01_prompt

According to brightway's documentation, the uncertainty type fields are defined in the package stats_arrays. When using a triangular distribution (ID-5), it is expected that when the mode is not provided, the loc value becomes the average between the minimum and the maximum values. Then, I expected #NUM! in the loc field to be replaced by the calculated loc. Instead it shows as the number 36 on the Activity Browser.

bug01_loc36

Is this normal behaviour? If so, is it documented that the LCA results tab reports on calculations based on the amount and not on the uncertainty fields?

@dgdekoning
Copy link
Contributor

This... is pretty bizarre. Can you forward the information about how distributions make up for missing data?

So this #NUM! field is always translated to 36 during the import? That shouldn't be normal behaviour.
But otherwise it kind of is normal behaviour that the AB (and I think brightway) make use of the amount field by default and only look into the uncertainty fields when performing Monte Carlo analysis or similar.

@dgdekoning dgdekoning added brightway Issues related to interaction with brightway2 code bug Issues/PRs related to bugs labels Feb 2, 2021
@mixib
Copy link
Author

mixib commented Feb 4, 2021

Hi @dgdekoning, I edited my original reply since I looked further into it and found more information.

Static vs stochastic LCA

But otherwise it kind of is normal behaviour that the AB (and I think brightway) make use of the amount field by default and only look into the uncertainty fields when performing Monte Carlo analysis or similar.

It makes sense since the default calculation is static. Indeed, the brightway documentation on the stochastic LCA mentions that the random numbers created according to the uncertainty fields overwrite the amount field.

Sub-issue 1: The bw/AB importer interprets XL error codes as numbers

So this #NUM! field is always translated to 36 during the import?

Yes. In fact, every error value in an XLS/XLSX file translates to a specific number. It is irrelevant if the error value is under amount or under an uncertainty field. Here is some information on the xlrd error codes matching the error values. However, it is very strange that bw/AB read them as numbers when xlrd uses the error tag. Parsing all of them to np.nan would be more appropriate, right?

When the uncertainty fields are used, the bw/AB exporter may fill some fields with #NUM values. The problem comes when the XLSX file is imported back to bw/AB because the #NUM values are parsed as the number corresponding to the xlrd error codes instead of an empty field.

This is how it looks like at the xlrd level:

bug01_xlrd

import xlrd
book = xlrd.open_workbook(myfile.xlsx)
sh = book.sheet_by_index(0)

for rx in range(sh.nrows):
    print(sh.row(rx))
Out:
[text:"Content is '=#NUM'", error:36]
[text:"Content is '=#N/A'", error:42]
[text:"Content is '=#NULL!'", error:0]
[text:"Content is '=1/0'", error:7]
[text:"Content is '=#REF!'", error:23]
[text:"Content is '=#NAME?'", error:29]
[text:"Content is '=#VALUE!'", error:15]
[text:"Content is '=PI()'", number:3.141592653589793]
[text:"Content is '42'", number:42.0]

Sub-issue 2: Outdated documentation of bw's stats_arrays

Can you forward the information about how distributions make up for missing data?

It seems that the documentation of the stats_arrays package is not up to date. The table under 'Mapping parameter array columns to uncertainty distributions' provides an overview of the supposedly required and optional parameters. If the optional parameters aren't specified, then these parameters are supposed to be calculated by default. See an excerpt of the table below.

bug01_defaultCalculations

However, testing with an example, if no loc value is specified for the triangular uncertainty or if it is np.nan, then the random number generator yields nan. And if there is no minimum value specified for the uniform or triangular uncertainty, or if it is np.nan, then there is an ImproperBoundsError message associated to these lines:

https://github.com/brightway-lca/stats_arrays/blob/68519a26e6ffe5ed7cefbaadbcdcc048efe75994/stats_arrays/distributions/base.py#L301-L306

This is how it looks like at the stats_arrays level:

from stats_arrays import *
import numpy as np

my_variable = UncertaintyBase.from_dicts(
    {'uncertainty_type': TriangularUncertainty.id,
     'loc': np.nan,#valid input, unexpected output
     'minimum': 15.0, #valid input
     'maximum': 25.0},
    {'uncertainty_type': DiscreteUniform.id,
     'minimum': np.nan, #valid input, expected output
     'maximum': 25.0},
    {'uncertainty_type': UniformUncertainty.id,
     # 'minimum': np.nan, #invalid input
     'minimum': 15, # valid input
     'maximum': 25.0}
)
my_rng = MCRandomNumberGenerator(my_variable)
print(list(zip(my_rng, range(10))))
Out:
[(array([        nan,  7.        , 17.37254671]), 0),
 (array([        nan, 18.        , 22.36840769]), 1),
 (array([        nan, 12.        , 22.60361591]), 2),
 (array([        nan,  7.        , 22.05667132]), 3),
 (array([        nan,  3.        , 16.50150731]), 4),
 (array([        nan, 23.        , 17.09835987]), 5),
 (array([        nan,  1.        , 20.91978931]), 6),
 (array([        nan, 19.        , 18.35423132]), 7),
 (array([        nan, 21.        , 21.61874973]), 8),
 (array([        nan, 22.        , 19.60915629]), 9)]

I think that the triangular and uniform uncertainty are missing the files to deal with the variables marked as optional. For instance, there is a file for the discrete uniform distribution additional to the base.py file. In the additional file, the lack of the optional minimum value is addressed in the next lines:

https://github.com/brightway-lca/stats_arrays/blob/68519a26e6ffe5ed7cefbaadbcdcc048efe75994/stats_arrays/distributions/discrete_uniform.py#L29-L33

@Zoophobus
Copy link
Contributor

Hi Mixib,

so I've been going through this and am finding that brightway (version 2) is dealing with these errors slightly differently now. When I've introduced them what occurs is that brightway excludes the 'amount' field, which is personally not an ideal situation, however it is safe and does allow us to throw some warnings/errors.

What I would propose to do myself is to introduce an error message that points to the activity in the database with an exchange that is absent (so should probably contain an error)

@Zoophobus Zoophobus added the data imports Issues related to importing data label Feb 9, 2023
@marc-vdm
Copy link
Member

Closing as resolved in #921

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
brightway Issues related to interaction with brightway2 code bug Issues/PRs related to bugs data imports Issues related to importing data
Projects
None yet
Development

No branches or pull requests

5 participants