potential wrong usage vintage_and_active_years() in message_ix tutorial #897

Tyler-lc · 2024-12-06T12:44:04Z

What happened?

In the westeros_renewable_resource and westeros_fossil_resource tutorials, the input parameters for wind_ppl and coal_ppl are built from an unfiltered vintage_and_active_years().
This means that coal_ppl and wind_ppl receive an input for every possible vintage and active year combination, instead of only the combinations that apply to coal_ppl and wind_ppl.

Code Sample

EDITED:
I will use westeros_renewable_resource.ipynb as an example, but the same happens in westeros_fossil_resource.ipynb.

When we are defining the input of the technology wind_ppl we use the following:

year_df = scen.vintage_and_active_years()
vintage_years, act_years = year_df["year_vtg"], year_df["year_act"]
model_horizon = scen.set("year")
country = "Westeros"

Then we go on using the vintage_years, act_years as an input for wind_ppl:

df = pd.DataFrame(
    {
        "node_loc": country,
        "technology": "wind_ppl",
        "year_vtg": vintage_years,
        "year_act": act_years,
        "mode": "standard",
        "node_origin": country,
        "commodity": "wind_onshore",
        "level": "renewable",
        "time": "year",
        "time_origin": "year",
        "value": 1,
        "unit": "%",
    }
)
scen.add_par("input", df)

This gives the input the following year combinations:

	year_vtg	year_act
0	690	700
1	690	710
2	690	720
3	700	700
4	700	710
5	700	720
6	710	710
7	710	720
8	720	720

On the other hand when we set up the outputs in westeros_baseline we used the following:
year_df = scenario.vintage_and_active_years((country, "wind_ppl"), in_horizon=False)
which returns this table instead:

	year_vtg	year_act
0	690	700
1	700	700
2	700	710
3	710	710
4	710	720
5	720	720

So this creates a mismatch between the input and the output of the technology wind_ppl (but also for coal_ppl in the westeros_fossil_resource).
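
To make the mismatch concrete, here is a minimal sketch that compares the two tables above using pandas only. The year pairs are copied from this issue; the variable names are just for illustration. It lists the (year_vtg, year_act) pairs that get an input but no output, which for these tables are (690, 710), (690, 720) and (700, 720).

```python
import pandas as pd

# (year_vtg, year_act) pairs copied from the two tables in this issue.
unfiltered = pd.DataFrame(
    {
        "year_vtg": [690, 690, 690, 700, 700, 700, 710, 710, 720],
        "year_act": [700, 710, 720, 700, 710, 720, 710, 720, 720],
    }
)
filtered = pd.DataFrame(
    {
        "year_vtg": [690, 700, 700, 710, 710, 720],
        "year_act": [700, 700, 710, 710, 720, 720],
    }
)

# A left merge with indicator=True flags rows that exist only in the
# unfiltered table, i.e. combinations used for input but not for output.
merged = unfiltered.merge(
    filtered, on=["year_vtg", "year_act"], how="left", indicator=True
)
extra = merged.loc[merged["_merge"] == "left_only", ["year_vtg", "year_act"]]
extra_pairs = list(extra.itertuples(index=False, name=None))
print(extra_pairs)
```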

What did you expect to happen?

I am not sure if this creates issues with the input for the different *_ppl since the input vintages do not exist.

Versions

>>message-ix show-versions
'gams' is not recognized as an internal or external command,
operable program or batch file.

ixmp:        3.9.0
message_ix:  0.0.0
message_ix_models: None
message_data: None

click:       8.1.7
dask:        2024.10.0
genno:       installed
graphviz:    None
jpype:       1.5.0
… JVM path:  C:\Users\casamassima\AppData\Local\anaconda3\envs\message_39\Library\lib\jvm\bin\server\jvm.dll
openpyxl:    3.1.5
pandas:      2.2.2
pint:        0.24.3
xarray:      2024.10.0
yaml:        6.0.2

iam_units:   installed
jupyter:     installed
matplotlib:  3.9.2
plotnine:    0.14.0
pyam:        2.3.0

GAMS:        'gams' executable not in PATH

python:      3.12.7 | packaged by conda-forge | (main, Oct  4 2024, 15:47:54) [MSC v.1941 64 bit (AMD64)]
python-bits: 64
OS:          Windows
OS-release:  11
machine:     AMD64
processor:   Intel64 Family 6 Model 170 Stepping 4, GenuineIntel
byteorder:   little
LC_ALL:      None
LANG:        en_US.UTF-8
LOCALE:      ('English_Austria', '1252')

Additional Context

No response


glatterf42 · 2024-12-06T14:06:25Z

Hi @Tyler-lc, thanks for opening this issue :)
I'm sorry, but I'm not sure I understand your question. In the westeros_fossil_resource tutorial, we use vintage_and_active_years() only for the input parameter, which in turn determines how much "coal" there is to run the coal_ppl. The parameter definition contains a lot of information, among others the name of the technology, its location, operating mode, and year of activity. This means that the input of coal_ppl and wind_ppl cannot be confused with one another by the model and across time. As far as I know, we can visualize the parameter somewhat like this:

technology	year_vintage	year_active	commodity	value	unit
coal_ppl	690	710	coal	1	%
coal_ppl	700	710	coal	1	%
wind_ppl	700	710	fast winds	1	-

Of course, the values and units may differ, and I agree that 1 | % is confusing: I think it means 100%, but I'm not too sure.

The meaning of vintage and active years has been well explained before, see e.g. here. It's fine that technologies built in vintage years still receive input as long as they remain active.

Please let me know if any of this helped you. If it doesn't, please explain your question again as precisely as you can. With all this terminology, one needs to be very precise with wording.

In the meantime, I notice that your output from message-ix show-versions says that gams cannot be found. Without gams, you cannot run the tutorials, or did you find a way around it?
Also, your message_ix version says 0.0.0, which does not seem quite right and should be 3.9.0. Just before releasing v3.9.0, we fixed a number of issues in the tutorials, some related to vintage_and_active_years(). So please make sure you use that version or a later one installed from GitHub :)

Tyler-lc · 2024-12-09T08:23:48Z

Hi @glatterf42,
the problem arises from the fact that when we define the output of the technologies in the first tutorial, we filter vintage_and_active_years() using the tuple (country, tech). So, for example, we call the method like this:
scen.vintage_and_active_years((country, "wind_ppl")). This returns a DataFrame with only the relevant vintage and active year combinations. Hence we end up with a DataFrame that looks like this:

year_vtg	year_act
690	700
700	700
700	710
710	710
710	720
720	720

The problem is that in the later exercises, we only call scen.vintage_and_active_years(), which returns this table instead:

	year_vtg	year_act
0	690	700
1	690	710
2	690	720
3	700	700
4	700	710
5	700	720
6	710	710
7	710	720
8	720	720

So in the input, each vintage year can have up to 3 active years, while in the output each vintage year has at most 2. We end up with a mismatch in the vintage/active year combinations between input and output. I think it would be more correct to call the method with the matching filter, but I am not sure whether this input/output mismatch has an impact on the results.

Regarding the message_ix version: for some reason this is what message-ix show-versions reports in the prompt. If I manually check the list of installed packages in the Anaconda prompt (using conda list), it tells me that the message-ix version is 3.9. Regarding GAMS, I have GAMS installed with a license. I did not add it to the PATH though, so that might be why it is reported as missing. But it otherwise works fine.

I hope this clarified the issue.
And also let me know if you think there is something wrong in my installation.

Thank you very much!

Regards,
Luca Casamassima.

glatterf42 · 2024-12-12T11:18:34Z

Hi @Tyler-lc, thanks for specifying the issues here! I have just run the westeros_renewable_resource tutorial with year_df = scen.vintage_and_active_years((country, "wind_ppl"), in_horizon=False) and the result changes quite substantially: the OBJ level changes from 445637.21875 to 341937.4375 and the model uses much less of the wind_ppl.
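
For reference, a rough sketch of what the changed cell could look like. The helper name wind_input_frame is hypothetical and not part of message_ix; it only wraps the filtered vintage_and_active_years() call quoted in this thread and the column values from the code sample in the issue description. It accepts any object that offers that method, so it can be tried without a database.

```python
import pandas as pd


def wind_input_frame(scen, country="Westeros", technology="wind_ppl"):
    """Build `input` rows from the technology-specific year combinations.

    Hypothetical helper: `scen` is expected to behave like a
    message_ix.Scenario with a vintage_and_active_years() method.
    """
    # Same filtered call as used for `output` in westeros_baseline.
    year_df = scen.vintage_and_active_years((country, technology), in_horizon=False)
    return pd.DataFrame(
        {
            "node_loc": country,
            "technology": technology,
            "year_vtg": year_df["year_vtg"],
            "year_act": year_df["year_act"],
            "mode": "standard",
            "node_origin": country,
            "commodity": "wind_onshore",
            "level": "renewable",
            "time": "year",
            "time_origin": "year",
            "value": 1,
            "unit": "%",
        }
    )
```

In the tutorial this would then be used as scen.add_par("input", wind_input_frame(scen)).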

Your explanation makes sense to me, I agree that it makes more sense to make the year_df specific to the technology at hand. The two tutorials that you reported here have been written by @OFR-IIASA, @behnam-zakeri, and @khaeru originally, it seems. So I would make a PR that changes the definition of year_df in these tutorials and hope that either of them stops the PR should this not be the correct way after all.

Please report further occurrences of this issue as you encounter them!

khaeru · 2024-12-12T11:48:25Z

I think the best way to understand the implications of mismatched parameter data is to look at the mathematical formulation in the documentation and/or the GAMS code. From those we can draw simple conclusions.

For any given $(t, y^V, y^A)$:

- if $y^A - y^V > \text{technical-lifetime}_{t,y^A}$ then the technology will not operate. In other words, values in input or output are irrelevant if the period (‘year’) of activity is too far from the period of vintage. For instance, a technology constructed in $y^V = 690$ with a life time of 10 years cannot be operative in $y^A = 720$. Thus input and output values with keys including $(y^V, y^A) = (690, 720)$ would have no effect, and in this case it does not affect the model solution if they are mismatched.
- if there is an output value but no input value, the technology can be active and produce its output $(c, l)$, yet without consuming anything.
  - This means there is no cost or limitation associated with supply of any input commodities for that technology; only its inherent costs (inv_cost, fix_cost, and var_cost) and constraints.
  - The solver usually finds it optimal to use more or lots of such technologies, especially in relation to other technologies that output the same $(c, l)$ but do have input commodity costs/constraints.
- if there is an input value but no output value, the technology can be active, but it does not produce anything that could be an input to other technologies or directly satisfy demand.
  - Aside from constraints that force the technology to be active, the solver will usually find it optimal to not use such $(t, y^V, y^A)$ at all.
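
The three cases above can be sketched as a small classification function. This is only an illustration of the reasoning in this comment, not the MESSAGE formulation: the lifetime check is the simplified rule stated above, the lifetime of 20 years is a hypothetical value, and the key sets are the (year_vtg, year_act) pairs from the tables in this issue.

```python
def classify(year_vtg, year_act, lifetime, input_keys, output_keys):
    """Classify one (year_vtg, year_act) key following the cases above."""
    key = (year_vtg, year_act)
    # Case 1: activity period too far from the vintage period.
    if year_act - year_vtg > lifetime:
        return "cannot operate"
    has_in, has_out = key in input_keys, key in output_keys
    # Case 2: produces output without consuming anything.
    if has_out and not has_in:
        return "output without input"
    # Case 3: consumes input but produces nothing usable.
    if has_in and not has_out:
        return "input without output"
    return "matched" if has_in else "no data"


# Input keys: unfiltered table; output keys: filtered table (from this issue).
input_keys = {
    (690, 700), (690, 710), (690, 720),
    (700, 700), (700, 710), (700, 720),
    (710, 710), (710, 720), (720, 720),
}
output_keys = {
    (690, 700), (700, 700), (700, 710),
    (710, 710), (710, 720), (720, 720),
}

LIFETIME = 20  # hypothetical value, for illustration only

for key in sorted(input_keys | output_keys):
    print(key, classify(*key, LIFETIME, input_keys, output_keys))
```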

In the case in the issue description, take the example of $y^V = 700$. For this vintage:

- …there are input values for $y^A \in \{700, 710, 720\}$. So long as the technical_lifetime allows, the technology can be active in any of these periods.
- …there are output values for $y^A \in \{700, 710\}$. So it can only produce usable output in these periods.
  - If instead there were output values for $y^A \in \{700, 710, 720\}$, then in 720 the model would have more alternatives for that $(c, l)$.

Another thing to consider is that, if there is an inv_cost to construct a technology in $y^V = 700$, that cost can be 'levelized' more if the technology is used across more periods. So missing values in input or output that prevent the technology being used in 1 or more periods can make the technology less attractive to the solver.
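
As a back-of-the-envelope illustration of this 'levelizing' effect, the sketch below spreads one investment evenly over the periods in which the vintage can be used. It deliberately ignores discounting and period length, so it is not how MESSAGE computes costs, and the inv_cost value is a hypothetical number chosen only for the arithmetic.

```python
def inv_cost_per_period(inv_cost, usable_periods):
    """Spread one investment evenly over the periods the vintage can be used.

    Simplification for illustration: no discounting, no period length.
    """
    return inv_cost / len(usable_periods)


INV_COST = 1500.0  # hypothetical investment cost for vintage 700

# Vintage 700 usable in two periods versus three periods.
two_periods = inv_cost_per_period(INV_COST, [700, 710])
three_periods = inv_cost_per_period(INV_COST, [700, 710, 720])
print(two_periods, three_periods)
```

With these numbers the per-period share drops from 750.0 to 500.0 when a third usable period is available, which is the sense in which fewer usable periods make the technology less attractive.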

NB I have not re-read the tutorial or its Git history to discover which is intended, but generally I agree it is less confusing as a learning aid or expression of the model behaviour and modeling best practices to use the same set for year_vtg and year_act for each technology.

glatterf42 · 2024-12-12T12:30:06Z

Thanks for jumping in here! I also thought that changing year_df to the technology-specific version should not affect the model if the rest of the model was well-defined, but changing year_df did change the outcome.
The only place that year_df is used in the renewable_resource tutorial is the definition of the input parameter for wind_ppl, so I'm assuming the change arises from a newly-introduced mismatch between input and output for wind_ppl. I'll look into the definition of the output and see if we can adjust that accordingly and reach the same result again.

glatterf42 · 2024-12-12T12:37:10Z

The output for wind_ppl is cloned, in the end, from the westeros_baseline tutorial, where it is defined by this:

year_df = scenario.vintage_and_active_years((country, "wind_ppl"), in_horizon=False)
vintage_years, act_years = year_df["year_vtg"], year_df["year_act"]
base.update(
    year_vtg=vintage_years,
    year_act=act_years,
    technology="wind_ppl",
    commodity="electricity",
    level="secondary",
    value=1.0,
)
wind_out = make_df("output", **base, node_dest=country, time_dest="year")
scenario.add_par("output", wind_out)

Which seems to suggest that for consistency, we should use @Tyler-lc's suggestion in renewable_resource. Until now, we have defined input for more year-combinations than output, which should mean, if I understand @khaeru correctly, that the solver shied away from this technology for the extra combinations. But for some reason, the model solution still changes when correcting year_df, indicating that it was used somehow.

Tyler-lc · 2024-12-12T12:38:27Z

From what @khaeru said, it makes sense that the results would change, because there are inv_cost related to the technologies wind_ppl and coal_ppl, right?

glatterf42 · 2024-12-12T12:53:26Z

Ah, yes, per the last point: when using the full year_df, we supply additional output values, which the model is using to 'levelize' the inv_cost associated with the wind_ppl. When removing these extra values, the wind_ppl becomes less attractive as described.
I'll draft a PR with the fix :)

Tyler-lc changed the title ~~potential wrong filtering for vintage_and_active_years()~~ potential wrong usage vintage_and_active_years() in message_ix tutorial Dec 12, 2024

glatterf42 linked a pull request Dec 12, 2024 that will close this issue

Make usage of vintage_and_active_years() in the tutorials consistent #900
