Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refacto scenario #273

Merged
merged 66 commits into from
Nov 7, 2023
Merged

Refacto scenario #273

merged 66 commits into from
Nov 7, 2023

Conversation

benoit-cty
Copy link
Contributor

@benoit-cty benoit-cty commented Oct 9, 2023

Breaking changes

This is a major refactoring of the AbstractSurveyScenario object and affects other related objects.

  • Refactor AbstractSurveyScenario
  • Create ReformScenario
  • Monkey patch openfiscca_core.simulations.Simulation and openfisca_core.simulations.simulation_builder.SimulationBuilder.
  • Adapt AbstractAggregates accordingly

Rationale

The main goal was to separate the different steps to produce an impact analysis on survey or administrative data
and to create a more flexible tools to deal with different use case.
To do so, we performed the following changes:

  • Create a generic AbstractSurveyScenario that can hande as many simulations as needed.
  • Move to the appropriate (lower) level the methods to load the data or perform some calculation, mainly:
    • Monkey patch the Simulation objects to deal all loading and calculation using pandas that are not available in the original openfisca_core.simulations.Simulation object which rely solely on numpy (and will not change anytime soon for good reason)
    • Monkey patch the SimulationBuilder to add the needed methods to init the simulation from tabular data.
  • Create a ReformScenario that retains the main characteristics of the old AbstractSurveyScenario
  • Adapt AbstractAggregates to these new scenarios. Might need more refactoring to be more generic, but works with actual use case mainly openfisca-france-data.

Migration

  • Users of AbstractSurveyScenario should use ReformScenario.
  • Use attribute period instead of year.
  • The generic simulation initialisation from survey data goes through the method Simulation.new_from_tax_benefit_system with a data dict argument with new keys as collection, id_variable_by_entity_key, role_variable_by_entity_key, used_as_input_variables to mimic at the simulation level what was done before this PR at the scenario level.

Comment on lines +378 to +409
def create_data_frame_by_entity(self, variables = None, expressions = None, filter_by = None, index = False,
period = None, simulation = None, merge = False):
"""Create dataframe(s) of computed variable for every entity (eventually merged in a unique dataframe).

Args:
variables(list, optional): Variable to compute, defaults to None
expressions(str, optional): Expressions to compute, defaults to None
filter_by(str, optional): Boolean variable or expression, defaults to None
index(bool, optional): Index by entity id, defaults to False
period(Period, optional): Period, defaults to None
simulation(str, optional): Simulation to use
merge(bool, optional): Merge all the entities in one data frame, defaults to False

Returns:
dict or pandas.DataFrame: Dictionnary of dataframes by entities or dataframe with all the computed variables

"""
if simulation is None:
assert len(self.simulations.keys()) == 1
simulation = list(self.simulations.values())[0]
else:
simulation = self.simulations[simulation]

return simulation.create_data_frame_by_entity(
variables = variables,
expressions = expressions,
filter_by = filter_by,
index = index,
period = period,
merge = merge,
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def create_data_frame_by_entity(self, variables = None, expressions = None, filter_by = None, index = False,
period = None, simulation = None, merge = False):
"""Create dataframe(s) of computed variable for every entity (eventually merged in a unique dataframe).
Args:
variables(list, optional): Variable to compute, defaults to None
expressions(str, optional): Expressions to compute, defaults to None
filter_by(str, optional): Boolean variable or expression, defaults to None
index(bool, optional): Index by entity id, defaults to False
period(Period, optional): Period, defaults to None
simulation(str, optional): Simulation to use
merge(bool, optional): Merge all the entities in one data frame, defaults to False
Returns:
dict or pandas.DataFrame: Dictionnary of dataframes by entities or dataframe with all the computed variables
"""
if simulation is None:
assert len(self.simulations.keys()) == 1
simulation = list(self.simulations.values())[0]
else:
simulation = self.simulations[simulation]
return simulation.create_data_frame_by_entity(
variables = variables,
expressions = expressions,
filter_by = filter_by,
index = index,
period = period,
merge = merge,
)
def create_data_frame_by_entity(self, variables = None, expressions = None, filter_by = None, index = False,
period = None, simulation = None, baseline_simulation = None, merge = False):
"""Create dataframe(s) of computed variable for every entity (eventually merged in a unique dataframe).
Args:
variables(list, optional): Variable to compute, defaults to None
expressions(str, optional): Expressions to compute, defaults to None
filter_by(str, optional): Boolean variable or expression, defaults to None
index(bool, optional): Index by entity id, defaults to False
period(Period, optional): Period, defaults to None
simulation(str, optional): Simulation to use
merge(bool, optional): Merge all the entities in one data frame, defaults to False
Returns:
dict or pandas.DataFrame: Dictionnary of dataframes by entities or dataframe with all the computed variables
"""
if simulation is None:
assert len(self.simulations.keys()) == 1
simulation = list(self.simulations.values())[0]
else:
simulation = self.simulations[simulation]
dataframe_by_entity = simulation.create_data_frame_by_entity(
variables = variables,
expressions = expressions,
filter_by = filter_by,
index = index,
period = period,
merge = merge,
)
if baseline_simulation:
baseline_simulation = self.simulations[baseline_simulation]
baseline_dataframe_by_entity = baseline_simulation.create_data_frame_by_entity(
variables = variables,
expressions = expressions,
filter_by = filter_by,
index = index,
period = period,
merge = merge,
)
for entity, baseline_dataframe in baseline_dataframe_by_entity.items():
dataframe_by_entity[entity] = (
dataframe_by_entity[entity] - baseline_dataframe
)
return dataframe_by_entity

return values


def compute_pivot_table(simulation: Simulation = None, baseline_simulation: Simulation = None, aggfunc = 'mean',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benjello : Why do we have a baseline_simulation option here ? It doesn't exist in the others functions (except for compute_winners_loosers)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we have to combine simulation and baseline_simulation at the micro-data level?.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we have to combine simulation and baseline_simulation at the micro-data level ?.
For example we may want some mean of difference of some values (let's say ompot_revenu for France) within cells of another variable (as revenu_disponible or rfr)

@clallemand clallemand force-pushed the refacto-scenario branch 2 times, most recently from f02d41a to 3bbdc07 Compare October 23, 2023 15:49
@benjello benjello requested a review from sandcha October 24, 2023 10:46
@clallemand clallemand self-requested a review November 7, 2023 10:46
Copy link
Contributor

@clallemand clallemand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @benjello,

@sandcha and I aggree to merge this merge request. We will do some cleaning changes in a future MR (soon) because it's only minor changes and we don't want to block this merge any longer.

Thanks a lot for this huge work !

@benjello benjello marked this pull request as ready for review November 7, 2023 15:31
@benjello benjello merged commit 1defa8e into master Nov 7, 2023
10 checks passed
@benjello benjello deleted the refacto-scenario branch November 7, 2023 15:32
@benjello benjello restored the refacto-scenario branch November 7, 2023 17:25
@benjello benjello deleted the refacto-scenario branch January 12, 2024 08:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants