potential wrong usage vintage_and_active_years() in message_ix tutorial #897

Open
Tyler-lc opened this issue Dec 6, 2024 · 8 comments · May be fixed by #900

Comments

@Tyler-lc

Tyler-lc commented Dec 6, 2024

What happened?

In the westeros_renewable_resource and westeros_fossil_resource tutorials, the parameters for wind_ppl and coal_ppl are added using an unfiltered call to vintage_and_active_years().
This means that coal_ppl and wind_ppl receive input for all possible vintage and active year combinations in the scenario, instead of only those that apply to coal_ppl and wind_ppl.

Code Sample

EDITED:
I will use westeros_renewable_resource.ipynb as an example, but the same happens in westeros_fossil_resource.ipynb.

When defining the input of the technology wind_ppl, we use the following:

year_df = scen.vintage_and_active_years()
vintage_years, act_years = year_df["year_vtg"], year_df["year_act"]
model_horizon = scen.set("year")
country = "Westeros"

We then use vintage_years and act_years to define the input of wind_ppl:

df = pd.DataFrame(
    {
        "node_loc": country,
        "technology": "wind_ppl",
        "year_vtg": vintage_years,
        "year_act": act_years,
        "mode": "standard",
        "node_origin": country,
        "commodity": "wind_onshore",
        "level": "renewable",
        "time": "year",
        "time_origin": "year",
        "value": 1,
        "unit": "%",
    }
)
scen.add_par("input", df)

This gives the input the following vintage/active year combinations:

   year_vtg  year_act
0       690       700
1       690       710
2       690       720
3       700       700
4       700       710
5       700       720
6       710       710
7       710       720
8       720       720

On the other hand, when we set up the output in westeros_baseline, we used the following:
year_df = scenario.vintage_and_active_years((country, "wind_ppl"), in_horizon=False)
which returns this table instead:

   year_vtg  year_act
0       690       700
1       700       700
2       700       710
3       710       710
4       710       720
5       720       720

This creates a mismatch between the input and the output of wind_ppl (and likewise of coal_ppl in westeros_fossil_resource).
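To make the mismatch concrete, here is a minimal pandas sketch that compares the two sets of (year_vtg, year_act) combinations. The data are copied from the two tables above rather than read from a live scenario; with a real Scenario, one could instead pass the DataFrames returned by the two vintage_and_active_years() calls.

```python
import pandas as pd

# (year_vtg, year_act) pairs copied from the unfiltered table used for the input
input_years = pd.DataFrame(
    {
        "year_vtg": [690, 690, 690, 700, 700, 700, 710, 710, 720],
        "year_act": [700, 710, 720, 700, 710, 720, 710, 720, 720],
    }
)

# (year_vtg, year_act) pairs copied from the filtered table used for the output
output_years = pd.DataFrame(
    {
        "year_vtg": [690, 700, 700, 710, 710, 720],
        "year_act": [700, 700, 710, 710, 720, 720],
    }
)

# An outer merge with indicator=True labels each pair as "both",
# "left_only" (input only) or "right_only" (output only)
merged = input_years.merge(
    output_years, on=["year_vtg", "year_act"], how="outer", indicator=True
)
input_only = merged.loc[merged["_merge"] == "left_only", ["year_vtg", "year_act"]]
output_only = merged.loc[merged["_merge"] == "right_only", ["year_vtg", "year_act"]]
print(input_only.to_string(index=False))
```

For these tables, the input-only combinations are (690, 710), (690, 720) and (700, 720), and there are no output-only combinations.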

What did you expect to happen?

I am not sure whether this causes problems for the input of the different *_ppl technologies, since some of the input vintage/active year combinations do not exist in the output.

Versions

>>message-ix show-versions
'gams' is not recognized as an internal or external command,
operable program or batch file.

ixmp:        3.9.0
message_ix:  0.0.0
message_ix_models: None
message_data: None

click:       8.1.7
dask:        2024.10.0
genno:       installed
graphviz:    None
jpype:       1.5.0
… JVM path:  C:\Users\casamassima\AppData\Local\anaconda3\envs\message_39\Library\lib\jvm\bin\server\jvm.dll
openpyxl:    3.1.5
pandas:      2.2.2
pint:        0.24.3
xarray:      2024.10.0
yaml:        6.0.2

iam_units:   installed
jupyter:     installed
matplotlib:  3.9.2
plotnine:    0.14.0
pyam:        2.3.0

GAMS:        'gams' executable not in PATH

python:      3.12.7 | packaged by conda-forge | (main, Oct  4 2024, 15:47:54) [MSC v.1941 64 bit (AMD64)]
python-bits: 64
OS:          Windows
OS-release:  11
machine:     AMD64
processor:   Intel64 Family 6 Model 170 Stepping 4, GenuineIntel
byteorder:   little
LC_ALL:      None
LANG:        en_US.UTF-8
LOCALE:      ('English_Austria', '1252')

Additional Context

No response

@glatterf42
Member

Hi @Tyler-lc, thanks for opening this issue :)
I'm sorry, but I'm not sure I understand your question. In the westeros_fossil_resource tutorial, we use vintage_and_active_years() only for the input parameter, which in turn determines how much "coal" there is to run the coal_ppl. The parameter definition contains a lot of information, among others the name of the technology, its location, operating mode, and year of activity. This means that the input of coal_ppl and wind_ppl cannot be confused with one another by the model and across time. As far as I know, we can visualize the parameter somewhat like this:

technology  year_vintage  year_active  commodity   value  unit
coal_ppl    690           710          coal        1      %
coal_ppl    700           710          coal        1      %
wind_ppl    700           710          fast winds  1      -

Of course, the values and units may differ, and I agree that a value of 1 with unit % is confusing: I think it means 100%, but I'm not too sure.

The meaning of vintage and active years has been well explained before, see e.g. here. It's fine that technologies built in vintage years still receive input as long as they remain active.

Please let me know if any of this helped you. If it doesn't, please explain your question again as precisely as you can. With all this terminology, one needs to be very precise with wording.

In the meantime, I notice that your output from message-ix show-versions says that gams cannot be found. Without GAMS, you cannot run the tutorials; or did you find a way around it?
Also, your message_ix version says 0.0.0, which does not seem quite right and should be 3.9.0. Just before releasing v3.9.0, we fixed a number of issues in the tutorials, some related to vintage_and_active_years(). So please make sure you use that version or a later one installed from GitHub :)

@Tyler-lc
Author

Tyler-lc commented Dec 9, 2024

Hi @glatterf42,
the problem arises because, when we define the output of the technologies in the first tutorial, we filter vintage_and_active_years() using the tuple (country, tech). For example, we call the method as:
scen.vintage_and_active_years((country, "wind_ppl")). This returns a DataFrame with only the relevant combinations of vintage and active years, which looks like this:

year_vtg  year_act
     690       700
     700       700
     700       710
     710       710
     710       720
     720       720

The problem is that in the later tutorials we call scen.vintage_and_active_years() without any filter, which returns this table instead:

   year_vtg  year_act
0       690       700
1       690       710
2       690       720
3       700       700
4       700       710
5       700       720
6       710       710
7       710       720
8       720       720

So the input pairs every vintage year with every active year from that vintage onward (three active years each for 690 and 700), while the output only has the combinations within the technology's lifetime (one active year for 690, two for 700). The vintage/active year combinations of input and output therefore do not match. I think it would be more correct to call the method with the technology filter, but I am not sure whether this input/output mismatch has an impact on the results.

Regarding the message_ix version: for some reason, this is what message-ix show-versions reports in the prompt. If I manually check the list of installed packages in the Anaconda prompt (using conda list), it shows message-ix version 3.9. Regarding GAMS, I have it installed with a license. I did not add it to the PATH, though, which might be why it is reported as missing. Otherwise it works fine.

I hope this clarifies the issue.
Please also let me know if you think something is wrong with my installation.

Thank you very much!

Regards,
Luca Casamassima.

@Tyler-lc Tyler-lc changed the title potential wrong filtering for vintage_and_active_years() potential wrong usage vintage_and_active_years() in message_ix tutorial Dec 12, 2024
@glatterf42
Member

Hi @Tyler-lc, thanks for clarifying the issue! I have just run the westeros_renewable_resource tutorial with year_df = scen.vintage_and_active_years((country, "wind_ppl"), in_horizon=False) and the result changes substantially: the OBJ level changes from 445637.21875 to 341937.4375, and the model uses much less of the wind_ppl.

Your explanation makes sense to me; I agree that it is better to make year_df specific to the technology at hand. It seems the two tutorials you reported were originally written by @OFR-IIASA, @behnam-zakeri, and @khaeru. So I will open a PR that changes the definition of year_df in these tutorials, and I hope one of them will stop the PR if this turns out not to be the correct approach after all.

Please report further occurrences of this issue as you encounter them!
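A minimal sketch of the change discussed here, reusing the column values of the wind_ppl input snippet from the issue description. The helper name make_wind_input is hypothetical; the only difference from the original snippet is that year_df is expected to be the technology-specific result of vintage_and_active_years(). The literal table below stands in for a live scenario so the sketch runs without message_ix.

```python
import pandas as pd


def make_wind_input(year_df: pd.DataFrame, country: str = "Westeros") -> pd.DataFrame:
    """Build input rows for wind_ppl from a (year_vtg, year_act) table.

    year_df is meant to be the technology-specific result, e.g. of
    scen.vintage_and_active_years((country, "wind_ppl"), in_horizon=False),
    so that input and output share the same vintage/active combinations.
    """
    return pd.DataFrame(
        {
            "node_loc": country,
            "technology": "wind_ppl",
            "year_vtg": year_df["year_vtg"],
            "year_act": year_df["year_act"],
            "mode": "standard",
            "node_origin": country,
            "commodity": "wind_onshore",
            "level": "renewable",
            "time": "year",
            "time_origin": "year",
            "value": 1,
            "unit": "%",
        }
    )


# Literal copy of the filtered table from this thread, standing in for
# scen.vintage_and_active_years((country, "wind_ppl"), in_horizon=False)
year_df = pd.DataFrame(
    {
        "year_vtg": [690, 700, 700, 710, 710, 720],
        "year_act": [700, 700, 710, 710, 720, 720],
    }
)
wind_in = make_wind_input(year_df)
print(len(wind_in))  # one row per vintage/active combination
```

With a live scenario, the resulting DataFrame would be passed to scen.add_par("input", ...) as in the original snippet.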

@khaeru
Member

khaeru commented Dec 12, 2024

I think the best way to understand the implications of mismatched parameter data is to look at the mathematical formulation in the documentation and/or the GAMS code. From those we can draw simple conclusions.

For any given $(t, y^V, y^A)$:

  • if $y^A - y^V > \text{technical-lifetime}_{t,y^V}$ then the technology will not operate. In other words, values in input or output are irrelevant if the period (‘year’) of activity is too far from the period of vintage. For instance, a technology constructed in $y^V = 690$ with a lifetime of 10 years cannot be operative in $y^A = 720$. Thus input and output values with keys including $(y^V, y^A) = (690, 720)$ would have no effect, and in this case a mismatch between them does not affect the model solution.
  • if there is an output value but no input value, the technology can be active and produce its output $(c, l)$, yet without consuming anything.
    This means there is no cost or limitation associated with supply of any input commodities for that technology; only its inherent costs (inv_cost, fix_cost, and var_cost) and constraints.
    The solver usually finds it optimal to make heavy use of such technologies, especially compared with other technologies that output the same $(c, l)$ but do have input commodity costs or constraints.
  • if there is an input value but no output value, the technology can be active, but it does not produce anything that could be an input to other technologies or directly satisfy demand.
    Aside from constraints that force the technology to be active, the solver will usually find it optimal to not use such $(t, y^V, y^A)$ at all.
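The rules above can be written as a small decision function. This is a sketch for intuition only, assuming the first bullet literally (a combination is inactive when year_act - year_vtg exceeds the technical lifetime); the actual GAMS formulation also accounts for period durations, so boundary cases may differ. The function name, the category labels and the 30-year lifetime in the example are hypothetical.

```python
def classify(key, input_keys, output_keys, technical_lifetime):
    """Classify one (year_vtg, year_act) pair following the rules in the comment above."""
    year_vtg, year_act = key
    if year_act - year_vtg > technical_lifetime:
        # Outside the lifetime: input/output values for this key have no effect
        return "inactive"
    has_in, has_out = key in input_keys, key in output_keys
    if has_out and not has_in:
        # Produces output without consuming anything; usually attractive to the solver
        return "output without input"
    if has_in and not has_out:
        # Consumes input but produces nothing usable; usually avoided by the solver
        return "input without output"
    if has_in and has_out:
        return "input and output"
    return "no data"


# The y^V = 700 case from the issue description
input_keys = {(700, 700), (700, 710), (700, 720)}
output_keys = {(700, 700), (700, 710)}
for year_act in (700, 710, 720):
    # Hypothetical lifetime, long enough that 720 is within it
    print(year_act, classify((700, year_act), input_keys, output_keys, technical_lifetime=30))

# The example from the first bullet: built in 690, 10-year lifetime, not operative in 720
print(classify((690, 720), input_keys, output_keys, technical_lifetime=10))
```

With these assumptions, 700 and 710 have both input and output, 720 only has input, and (690, 720) is inactive.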

In the case from the issue description, take the example of $y^V = 700$. For this vintage:

  • …there are input values for $y^A \in \{700, 710, 720\}$. So long as the technical_lifetime allows, the technology can be active in any of these periods.
  • …there are output values for $y^A \in \{700, 710\}$. So it can only produce usable output in these periods.
    If instead there were output values for $y^A \in \{700, 710, 720\}$, then in 720 the model would have more alternatives for that $(c, l)$.

Another thing to consider is that, if there is an inv_cost to construct a technology in $y^V = 700$, that cost can be 'levelized' more if the technology is used across more periods. So missing values in input or output that prevent the technology being used in 1 or more periods can make the technology less attractive to the solver.
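A back-of-the-envelope illustration of the levelization point, with made-up numbers. It ignores discounting, construction time and the way MESSAGEix actually accounts for inv_cost in the objective; it only shows that the same investment spread over fewer usable periods gives a higher cost per unit of output. The function name cost_per_unit is hypothetical.

```python
def cost_per_unit(inv_cost: float, output_per_period: float, usable_periods: int) -> float:
    """Undiscounted investment cost per unit of output (illustrative only)."""
    return inv_cost / (output_per_period * usable_periods)


# Hypothetical numbers: 1000 cost units of investment, 100 output units per period
three = cost_per_unit(1000.0, 100.0, 3)  # usable in 700, 710 and 720
two = cost_per_unit(1000.0, 100.0, 2)    # usable only in 700 and 710
print(f"{three:.2f} vs {two:.2f}")
```

Losing one usable period raises the cost per unit of output from about 3.33 to 5.00 in this toy case, which is the direction of the effect described above.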

NB I have not re-read the tutorial or its Git history to discover which is intended, but generally I agree that, as a learning aid and as an expression of model behaviour and modeling best practice, it is less confusing to use the same set of year_vtg and year_act combinations for all parameters of a given technology.

@glatterf42
Member

Thanks for jumping in here! I also thought that changing year_df to the technology-specific version should not affect the model if the rest of the model was well-defined, but changing year_df did change the outcome.
The only place that year_df is used in the renewable_resource tutorial is the definition of the input parameter for wind_ppl, so I'm assuming the change arises from a newly-introduced mismatch between input and output for wind_ppl. I'll look into the definition of the output and see if we can adjust that accordingly and reach the same result again.

@glatterf42
Copy link
Member

The output for wind_ppl is ultimately cloned from the westeros_baseline tutorial, where it is defined as follows:

year_df = scenario.vintage_and_active_years((country, "wind_ppl"), in_horizon=False)
vintage_years, act_years = year_df["year_vtg"], year_df["year_act"]
base.update(
    year_vtg=vintage_years,
    year_act=act_years,
    technology="wind_ppl",
    commodity="electricity",
    level="secondary",
    value=1.0,
)
wind_out = make_df("output", **base, node_dest=country, time_dest="year")
scenario.add_par("output", wind_out)

This suggests that, for consistency, we should use @Tyler-lc's suggestion in renewable_resource. Until now, we have defined input for more year combinations than output, which, if I understand @khaeru correctly, should mean that the solver avoided this technology for the extra combinations. But for some reason, the model solution still changes when correcting year_df, indicating that those combinations were used somehow.

@Tyler-lc
Author

From what @khaeru said, it makes sense that the results would change, because the technologies wind_ppl and coal_ppl have an inv_cost, right?

@glatterf42
Member

Ah, yes, per the last point: when using the full year_df, we supply additional output values, which the model is using to 'levelize' the inv_cost associated with the wind_ppl. When removing these extra values, the wind_ppl becomes less attractive as described.
I'll draft a PR with the fix :)
