factorpricingmodel/factor-pricing-model-universe

Factor Pricing Model Universe

A package to build universes for factor pricing models. For further details, please refer to the documentation.

Installation

Install this via pip (or your favourite package manager):

pip install factor-pricing-model-universe

Usage

The library contains the pipelines used to build the universe. You can run the pipelines interactively in a Jupyter notebook.

from fpm_universe import pipeline

Alternatively, for scheduled runs, you can create a configuration and run the command line entry point to create the universe.

Configuration

The configuration is written in YAML and contains the following inputs:

Name | Description
--- | ---
output_filename | Output filename
intermediate_directory | Intermediate directory to which the pipeline outputs are exported
start_datetime | Start datetime of the universe
last_datetime | Last datetime of the universe
frequency | Frequency of the universe. For further details, see "Offset aliases" in the pandas documentation
pipeline | List of pipelines to filter the universe
data | Defines the data used by the pipelines, or referenced with the YAML tag !data
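To make start_datetime, last_datetime and frequency concrete, here is a minimal sketch of the dates a business-day frequency ("B") produces. It assumes the library builds its date index in a way equivalent to pandas.date_range, which the source does not confirm. The short date range is chosen only for illustration.

```python
import pandas as pd

# Assumption: the universe dates behave like pandas.date_range with the
# configured frequency. "B" is the pandas offset alias for business days.
dates = pd.date_range(start="2022-11-14", end="2022-11-21", freq="B")

# The weekend (2022-11-19 and 2022-11-20) is skipped.
for date in dates:
    print(date.date())
```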

Each pipeline returns a pandas dataframe indicating whether each instrument is included in the universe on the specified date / time. For example, the pipeline returns the following dataframe

+------------+--------+-------+
|    date    |  AAPL  | GOOGL |
+------------+--------+-------+
| 2022-11-17 |  True  | False |
+------------+--------+-------+
| 2022-11-18 |  True  |  True |
+------------+--------+-------+

and it indicates AAPL is included in the universe on both 2022-11-17 and 2022-11-18 while GOOGL only on 2022-11-18.
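The example dataframe above can be rebuilt and queried with plain pandas. This is only an illustrative sketch: members_on is a hypothetical helper, not a function of this library.

```python
import pandas as pd

# Rebuild the example pipeline output: rows are dates, columns are instruments.
example = pd.DataFrame(
    {"AAPL": [True, True], "GOOGL": [False, True]},
    index=pd.to_datetime(["2022-11-17", "2022-11-18"]).rename("date"),
)


def members_on(frame, date):
    """Hypothetical helper: list the instruments flagged True on a date."""
    row = frame.loc[pd.Timestamp(date)]
    return row[row].index.tolist()


print(members_on(example, "2022-11-17"))
print(members_on(example, "2022-11-18"))
```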

By default, the pipeline functions are imported from module fpm_universe.pipeline.

Each data entry defines either how to retrieve data from a source or an operation to apply to source data. The return type of a data entry is unconstrained: it can be a JSON-like dict, a list, a pandas series, or a pandas dataframe.

In the configuration, each data entry can be referenced with the YAML tag !data. It is loaded lazily, only when another data entry or a pipeline refers to it.
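The lazy-loading behaviour can be pictured with a small resolver that evaluates an entry only on first use and caches the result. This is a conceptual sketch in plain Python, not the library's implementation: LazyData and the three entries below are hypothetical names.

```python
class LazyData:
    """Hypothetical resolver: evaluate a named entry on first use, then cache."""

    def __init__(self, definitions):
        # Each definition is a callable that receives this resolver,
        # so it can refer to other entries by name.
        self._definitions = definitions
        self._cache = {}
        self.load_order = []  # records which entries were actually loaded

    def get(self, name):
        if name not in self._cache:
            self._cache[name] = self._definitions[name](self)
            self.load_order.append(name)
        return self._cache[name]


definitions = {
    "symbols": lambda data: ["AAPL", "GOOGL"],
    # This entry refers to "symbols", so loading it loads "symbols" first.
    "initial_validity": lambda data: {s: True for s in data.get("symbols")},
    # Nothing refers to this entry, so it is never evaluated.
    "unused": lambda data: 1 / 0,
}

data = LazyData(definitions)
result = data.get("initial_validity")
print(data.load_order)  # ['symbols', 'initial_validity']
```

Because "unused" is never requested, its division by zero never runs, which is the practical benefit of loading on reference rather than up front.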

Command

The entry point factor-pricing-model-universe generates the universe from the given configuration and writes it to the destination. Parameters passed on the command line are substituted into the configuration at run time.

The arguments of the entry point are:

Argument | Description
--- | ---
-c, --config TEXT | Required. Configuration file path.
-p, --parameter TEXT | Parameters to be formatted in the configuration.

For example, given the configuration as follows,

output_filename: "{output_directory}/{date}.parquet"
intermediate_directory: "{output_directory}/{date}"
start_datetime: "2015-01-01"
last_datetime: "{date}"
frequency: "B"
pipeline:
  - name: range_validity
    function: range_validity
    parameters:
      values: !data initial_validity
data:
  symbols:
    function: jq_compile
    parameters:
      json_filename: "{data_directory}/index/sp500/default/{date}.json"
      pattern: "[.[] | .tickers[]] | sort | unique | .[]"
  initial_validity:
    function: jq_compile
    parameters:
      json_filename: "{data_directory}/listings/{date}.json"
      pattern: ".[] | {{ symbol: .symbol, valid_start_datetime: .ipoDate, valid_last_datetime: .delistingDate }}"
      includes:
        symbol: !data symbols

and running the following command,

factor-pricing-model-universe \
  --config <path> \
  --parameter output_directory=$HOME/output \
  --parameter data_directory=$HOME/data \
  --parameter date=2022-10-20

the universe dataframe is written to $HOME/output/2022-10-20.parquet (the output_filename formatted with the parameters output_directory and date).
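The substitution step can be sketched as follows. This assumes the placeholders are filled with Python's str.format, which the source does not state explicitly. The doubled braces in the jq pattern above are consistent with that assumption, since str.format turns them into literal braces. parse_parameters and format_config are hypothetical helpers, and /home/user stands in for $HOME.

```python
def parse_parameters(pairs):
    """Turn ["key=value", ...] into a dict, splitting on the first "=" only."""
    return dict(pair.split("=", 1) for pair in pairs)


def format_config(node, parameters):
    """Recursively apply str.format to every string in a nested config."""
    if isinstance(node, str):
        return node.format(**parameters)
    if isinstance(node, dict):
        return {key: format_config(value, parameters) for key, value in node.items()}
    if isinstance(node, list):
        return [format_config(value, parameters) for value in node]
    return node


# A reduced version of the example configuration, already parsed into a dict.
config = {
    "output_filename": "{output_directory}/{date}.parquet",
    "last_datetime": "{date}",
    "frequency": "B",
    "pattern": ".[] | {{ symbol: .symbol }}",
}

parameters = parse_parameters(
    ["output_directory=/home/user/output", "date=2022-10-20"]
)
formatted = format_config(config, parameters)

print(formatted["output_filename"])  # /home/user/output/2022-10-20.parquet
print(formatted["pattern"])          # .[] | { symbol: .symbol }
```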