Refacto scenario #273
    def create_data_frame_by_entity(self, variables = None, expressions = None, filter_by = None, index = False,
            period = None, simulation = None, merge = False):
        """Create dataframe(s) of computed variable for every entity (optionally merged into a single dataframe).

        Args:
          variables(list, optional): Variable to compute, defaults to None
          expressions(str, optional): Expressions to compute, defaults to None
          filter_by(str, optional): Boolean variable or expression, defaults to None
          index(bool, optional): Index by entity id, defaults to False
          period(Period, optional): Period, defaults to None
          simulation(str, optional): Simulation to use
          merge(bool, optional): Merge all the entities in one data frame, defaults to False

        Returns:
          dict or pandas.DataFrame: Dictionary of dataframes by entities or dataframe with all the computed variables

        """
        if simulation is None:
            assert len(self.simulations.keys()) == 1
            simulation = list(self.simulations.values())[0]
        else:
            simulation = self.simulations[simulation]

        return simulation.create_data_frame_by_entity(
            variables = variables,
            expressions = expressions,
            filter_by = filter_by,
            index = index,
            period = period,
            merge = merge,
            )
    def create_data_frame_by_entity(self, variables = None, expressions = None, filter_by = None, index = False,
            period = None, simulation = None, baseline_simulation = None, merge = False):
        """Create dataframe(s) of computed variable for every entity (optionally merged into a single dataframe).

        Args:
          variables(list, optional): Variable to compute, defaults to None
          expressions(str, optional): Expressions to compute, defaults to None
          filter_by(str, optional): Boolean variable or expression, defaults to None
          index(bool, optional): Index by entity id, defaults to False
          period(Period, optional): Period, defaults to None
          simulation(str, optional): Simulation to use
          baseline_simulation(str, optional): Simulation whose results are subtracted from those of simulation, defaults to None
          merge(bool, optional): Merge all the entities in one data frame, defaults to False

        Returns:
          dict or pandas.DataFrame: Dictionary of dataframes by entities or dataframe with all the computed variables

        """
        if simulation is None:
            assert len(self.simulations.keys()) == 1
            simulation = list(self.simulations.values())[0]
        else:
            simulation = self.simulations[simulation]

        dataframe_by_entity = simulation.create_data_frame_by_entity(
            variables = variables,
            expressions = expressions,
            filter_by = filter_by,
            index = index,
            period = period,
            merge = merge,
            )

        if baseline_simulation:
            baseline_simulation = self.simulations[baseline_simulation]
            baseline_dataframe_by_entity = baseline_simulation.create_data_frame_by_entity(
                variables = variables,
                expressions = expressions,
                filter_by = filter_by,
                index = index,
                period = period,
                merge = merge,
                )
            # Replace each entity's dataframe by its difference with the baseline.
            for entity, baseline_dataframe in baseline_dataframe_by_entity.items():
                dataframe_by_entity[entity] = (
                    dataframe_by_entity[entity] - baseline_dataframe
                    )

        return dataframe_by_entity
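To make the effect of the baseline subtraction concrete, here is a minimal standalone sketch. The helper `difference_by_entity`, the entity names and the column names are hypothetical and are not part of this PR. It only illustrates what subtracting one dictionary of dataframes from another does with pandas.

```python
import pandas as pd


def difference_by_entity(dataframe_by_entity, baseline_dataframe_by_entity):
    """Return reform minus baseline, one dataframe per entity (hypothetical helper)."""
    return {
        entity: dataframe_by_entity[entity] - baseline_dataframe
        for entity, baseline_dataframe in baseline_dataframe_by_entity.items()
    }


# Made-up values for two entities; rows are in the same order in both simulations.
reform = {
    "household": pd.DataFrame({"tax": [100.0, 250.0]}),
    "person": pd.DataFrame({"salary": [1000.0, 2000.0, 1500.0]}),
}
baseline = {
    "household": pd.DataFrame({"tax": [120.0, 200.0]}),
    "person": pd.DataFrame({"salary": [1000.0, 2000.0, 1500.0]}),
}

result = difference_by_entity(reform, baseline)
print(result["household"])
```

pandas aligns the two frames on index and column labels before subtracting, so a row or column present in only one of them comes out as NaN, and non-numeric columns would need to be dropped or handled separately first.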
        return values


def compute_pivot_table(simulation: Simulation = None, baseline_simulation: Simulation = None, aggfunc = 'mean',
@benjello: Why do we have a `baseline_simulation` option here? It doesn't exist in the other functions (except for `compute_winners_loosers`).
Because we have to combine `simulation` and `baseline_simulation` at the micro-data level.
For example, we may want the mean of the difference of some value (say `impot_revenu` for France) within cells of another variable (such as `revenu_disponible` or `rfr`).
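A small sketch of why this needs both simulations at the row level: the difference is taken per household first, and only then averaged within cells. The data below is made up, the two income brackets are arbitrary, and this uses plain pandas rather than the `compute_pivot_table` implementation from this PR.

```python
import pandas as pd

# Hypothetical micro-data: one row per household, same row order in both simulations.
baseline = pd.DataFrame({
    "revenu_disponible": [10_000.0, 12_000.0, 30_000.0, 34_000.0],
    "impot_revenu": [0.0, 200.0, 3_000.0, 4_000.0],
})
reform = pd.DataFrame({
    "revenu_disponible": [10_000.0, 12_100.0, 30_500.0, 34_300.0],
    "impot_revenu": [0.0, 100.0, 2_500.0, 3_700.0],
})

# Difference computed row by row, before any aggregation.
difference = reform["impot_revenu"] - baseline["impot_revenu"]

# Cells defined on the baseline value of another variable (arbitrary brackets).
cells = pd.cut(
    baseline["revenu_disponible"],
    bins=[0, 20_000, 40_000],
    labels=["low", "high"],
)

# Mean of the difference within each cell.
mean_difference_by_cell = difference.groupby(cells, observed=True).mean()
print(mean_difference_by_cell)
```

Computing the two pivot tables separately and subtracting them afterwards would give the same means only if every household stays in the same cell in both simulations, which is one reason to combine the simulations before aggregating.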
f02d41a to 3bbdc07
Modularize compute_aggregate, dataframe_by_entity to simulation, compute_pivot_table to Simulation, new_simulation
Add quantile computation
WIP Docstrings
Co-authored-by: chloelallemand
fecc301 to 35b32d0
Breaking changes
This is a major refactoring of the `AbstractSurveyScenario` object and affects other related objects.
- `AbstractSurveyScenario`
- `ReformScenario`
- `openfisca_core.simulations.Simulation` and `openfisca_core.simulations.simulation_builder.SimulationBuilder`.
- `AbstractAggregates` accordingly

Rationale
The main goal was to separate the different steps needed to produce an impact analysis on survey or administrative data, and to create more flexible tools to deal with different use cases.
To do so, we performed the following changes:
- `AbstractSurveyScenario` that can handle as many simulations as needed.
- `Simulation` objects to deal with all loading and calculation using `pandas` that are not available in the original `openfisca_core.simulations.Simulation` object, which relies solely on `numpy` (and will not change anytime soon, for good reason).
- `SimulationBuilder` to add the needed methods to init the simulation from tabular data.
- `ReformScenario` that retains the main characteristics of the old `AbstractSurveyScenario`.
- `AbstractAggregates` to these new scenarios. Might need more refactoring to be more generic, but works with the current use cases, mainly `openfisca-france-data`.
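As a rough illustration of a scenario that holds several named simulations, here is a minimal sketch. `SketchScenario`, `SketchSimulation` and `total` are hypothetical names, not the classes or methods from this PR. Only the rule "use the single simulation when none is named" mirrors the `create_data_frame_by_entity` method discussed in the conversation, with an explicit error in place of the `assert`.

```python
class SketchSimulation:
    """Hypothetical stand-in for a simulation: holds precomputed values per variable."""

    def __init__(self, values):
        self.values = values

    def total(self, variable):
        return sum(self.values[variable])


class SketchScenario:
    """Hypothetical scenario that delegates computations to named simulations."""

    def __init__(self):
        self.simulations = {}

    def get_simulation(self, name=None):
        # With no name given, fall back to the only simulation, if there is exactly one.
        if name is None:
            if len(self.simulations) != 1:
                raise ValueError("Several simulations available: a name is required")
            return next(iter(self.simulations.values()))
        return self.simulations[name]

    def total(self, variable, simulation=None):
        return self.get_simulation(simulation).total(variable)


scenario = SketchScenario()
scenario.simulations["baseline"] = SketchSimulation({"tax": [100, 200]})
single_total = scenario.total("tax")  # only one simulation, so no name is needed

scenario.simulations["reform"] = SketchSimulation({"tax": [80, 150]})
reform_total = scenario.total("tax", simulation="reform")
```

Keeping the calculation on the simulation object and only the lookup on the scenario is what lets the scenario hold any number of simulations without changing the computation code.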
Migration
- `AbstractSurveyScenario` should use `ReformScenario`.
- `period` instead of `year`.
- `Simulation.new_from_tax_benefit_system` with a data dict argument with new keys such as `collection`, `id_variable_by_entity_key`, `role_variable_by_entity_key`, `used_as_input_variables`, to mimic at the simulation level what was done before this PR at the scenario level.
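For illustration, a data dict carrying the new keys could look like the sketch below. Only the four key names come from this description. Every value (collection name, entity keys, variable names) and the `missing_keys` helper are assumptions made up for the example. The call to `Simulation.new_from_tax_benefit_system` is left out because its full signature is not shown here.

```python
# Key names taken from the migration note above.
NEW_DATA_KEYS = (
    "collection",
    "id_variable_by_entity_key",
    "role_variable_by_entity_key",
    "used_as_input_variables",
)

# Hypothetical values: adapt to your own survey collection and entities.
data = {
    "collection": "my_survey_collection",
    "id_variable_by_entity_key": {
        "household": "household_id",
        "person": "person_id",
    },
    "role_variable_by_entity_key": {
        "household": "household_role",
    },
    "used_as_input_variables": ["salary", "age"],
}


def missing_keys(data_dict):
    """Hypothetical helper: list the new keys absent from a data dict."""
    return [key for key in NEW_DATA_KEYS if key not in data_dict]


print(missing_keys(data))
```

A check like this can help when porting a scenario written against the old API, since these settings previously lived on the scenario rather than in the data passed to the simulation.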