DEV_NOTE2.md

LPhy Developer Guide 102 (LPhy in Java)

This tutorial focuses on how to implement LPhy components using Java classes.

LPhy terms

Please read the following articles before you start to write the code:

It is essential to have a thorough understanding of the following concepts:

In the Java implementation, both Value and Generator are graphical model nodes, as defined by GraphicalModelNode. For more background, see https://linguaphylo.github.io/programming/2020/09/22/linguaphylo-for-developers.html

LPhy data type

LPhy is a dynamically typed language. Therefore, as a developer, you need to understand how data types are handled. For example,

  • All actual values are wrapped in the Value class, which a few classes inherit from, such as RandomVariable.

You need to use the method .value() to retrieve the actual value, and .getType() to get its data type.
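To illustrate the wrapping idea, here is a minimal, self-contained sketch. The nested Value class is a hypothetical stand-in, not LPhy's actual Value class; only the method names value() and getType() are taken from the text above, and the id field is an assumption added for the example.

```java
// A simplified, hypothetical stand-in for LPhy's Value class.
class ValueSketch {

    static class Value<T> {
        private final String id;
        private final T value;

        Value(String id, T value) {
            this.id = id;
            this.value = value;
        }

        // Returns the wrapped Java object.
        T value() { return value; }

        // Returns the runtime class of the wrapped object.
        Class<?> getType() { return value.getClass(); }

        String getId() { return id; }
    }

    // Describes a value without knowing its type at compile time.
    static String describe(Value<?> v) {
        return v.getId() + " : " + v.getType().getSimpleName() + " = " + v.value();
    }

    public static void main(String[] args) {
        System.out.println(describe(new Value<>("kappa", 2.0)));
        System.out.println(describe(new Value<>("n", 10)));
    }
}
```

Because the type is only known at run time, code that receives a Value typically inspects getType() before deciding how to use value().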

Although we have already implemented some commonly used data types in LPhy, developers may still need to implement new LPhy data types for certain new generators.

LPhy data type is not sequence type

You may encounter many different "data types" in LPhy or BEAST. Please do not confuse these with sequence types. In LPhy, data types are specifically defined for the LPhy language. For example, they can be Double, Integer, Taxa, Alignment, or TimeTree.

However, any "data type" classes that inherit from JEBL SequenceType do not fall under this concept. These classes define the type of sequences.

Write your LPhy object in Java

Generative distribution

GenerativeDistribution is a Java interface that represents all types of generative distributions, such as probability distributions, tree generative distributions (e.g. Birth-death, Coalescent), and PhyloCTMC generative distributions.

To write your own generative distribution, you need to follow these steps:

  1. Design your LPhy script first, for example, Θ ~ LogNormal(meanlog=3.0, sdlog=1.0);.

  2. Create a Java class (e.g. LogNormal.java) to implement GenerativeDistribution.

Look at the example LogNormal.java. A few things are required:

  • Define its LPhy name with the annotation @GeneratorInfo on the overridden method RandomVariable<Double> sample().

    name = "LogNormal" allows the parser to map this name in LPhy code to this Java object.

  • Define the arguments for this distribution using the annotation @ParameterInfo inside the constructor.

    name = "meanlog" declares one of the arguments as "meanlog". This is also referred to as a named argument. Following the annotation, you need to declare the Java argument for this constructor, which must be a Value, such as Value<Number> M. We use Number so that this input can accept integer values. To make an argument optional, simply add optional = true.

  • Define the data type, e.g. LogNormal extends ParametricDistribution<Double> implements GenerativeDistribution1D<Double>, where Double replaces T and must be consistent with the returned type RandomVariable<Double> sample().

  • Implement the method RandomVariable<...> sample() which should sample a value from this distribution and then wrap it into RandomVariable.

  • Correctly implement both methods Map<String, Value> getParams() and setParam(String paramName, Value value). Otherwise, re-sampling values from the probabilistic graphical model represented by an LPhy script using this distribution will fail.

  3. Register the distribution to SPI.

The SPI registration class for generative distributions is located in a Java package named *.spi, for example, lphy.base.spi.LPhyBaseImpl or phylonco.lphy.spi.PhyloncoImpl.
Simply add your class to the list returned by the method List<Class<? extends GenerativeDistribution>> declareDistributions(). See LPhyBaseImpl for an example.

Please note that the LPhy code will only function properly after the distribution class is registered. Therefore, it is acceptable to commit an incomplete LPhy object during development (to avoid painful merges) without registering it, provided it compiles and is not included in any published unit tests.
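The requirements listed in step 2 can be sketched as follows. This is not the actual LogNormal.java: every nested type (GeneratorInfo, ParameterInfo, Value, RandomVariable) is a simplified, hypothetical stand-in so that the example compiles on its own, the parent class and interface are omitted, and getParams()/setParam() use a narrower generic type than the signatures quoted above. The generatorName() helper is also an assumption, included only to show that the annotation can be read back at run time.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

// Every nested type is a simplified, hypothetical stand-in for the LPhy type of the same name.
class LogNormalSketch {

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface GeneratorInfo {
        String name();
        String description();
    }

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.PARAMETER)
    @interface ParameterInfo {
        String name();
        String description();
        boolean optional() default false;
    }

    static class Value<T> {
        private final T value;
        Value(T value) { this.value = value; }
        T value() { return value; }
    }

    static class RandomVariable<T> extends Value<T> {
        RandomVariable(T value) { super(value); }
    }

    static class LogNormal {
        private Value<Number> M;
        private Value<Number> S;
        private final Random random = new Random();

        // Named arguments are declared with @ParameterInfo; Number accepts integers and doubles.
        LogNormal(@ParameterInfo(name = "meanlog", description = "mean on the log scale") Value<Number> M,
                  @ParameterInfo(name = "sdlog", description = "standard deviation on the log scale") Value<Number> S) {
            this.M = M;
            this.S = S;
        }

        @GeneratorInfo(name = "LogNormal", description = "The log-normal probability distribution.")
        RandomVariable<Double> sample() {
            double mean = M.value().doubleValue();
            double sd = S.value().doubleValue();
            // The exponential of a normal draw is log-normally distributed.
            return new RandomVariable<>(Math.exp(mean + sd * random.nextGaussian()));
        }

        // Exposes the current arguments by their LPhy names.
        Map<String, Value<Number>> getParams() {
            Map<String, Value<Number>> params = new TreeMap<>();
            params.put("meanlog", M);
            params.put("sdlog", S);
            return params;
        }

        // Replaces one argument by name, so that the next sample() uses the new value.
        void setParam(String paramName, Value<Number> value) {
            switch (paramName) {
                case "meanlog": M = value; break;
                case "sdlog": S = value; break;
                default: throw new IllegalArgumentException("Unknown parameter: " + paramName);
            }
        }
    }

    // Reads the LPhy name back from the annotation via reflection.
    static String generatorName() {
        try {
            return LogNormal.class.getDeclaredMethod("sample").getAnnotation(GeneratorInfo.class).name();
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The sketch shows why getParams() and setParam() must agree on the argument names: a value replaced through setParam("meanlog", ...) has to be the one that the next call to sample() reads.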

Deterministic function

DeterministicFunction is an abstract class that extends BasicFunction.

To write your own deterministic function, follow similar steps:

  1. Design your LPhy script first, for example, Q = hky(kappa=κ, freq=π).

  2. Create a Java class (e.g. HKY.java) to extend DeterministicFunction.

Look at the example HKY.java. A few things are required:

  • Define its LPhy name with the annotation @GeneratorInfo on the overridden method Value<Double[][]> apply().

    name = "hky" allows the parser to map this name in LPhy code to this Java object.

  • Define the arguments for this function using the annotation @ParameterInfo inside the constructor.

    name = "kappa" declares one of the arguments as "kappa". This is also referred to as a named argument. Following the annotation, you need to declare the Java argument for this constructor, which must be a Value, such as Value<Number> kappa. We use Number so that this input can accept integer values. To make an argument optional, simply add optional = true.

  • Define the data type, e.g. extends DeterministicFunction<Double[][]>, where the 2d matrix Double[][] replaces T and must be consistent with the returned type Value<Double[][]> apply().

  • Implement the method Value<...> apply() which should return a value deterministically and then wrap it into Value.

  3. Register the function to SPI.

Simply add your class into the list returned by the method List<Class<? extends BasicFunction>> declareFunctions().
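The steps for a deterministic function can be sketched in the same way. This is not the actual HKY.java: the nested annotations, Value, and DeterministicFunction are hypothetical stand-ins, the state order A, C, G, T is an assumption, and the sketch returns the unnormalised rate matrix, whereas a real implementation may normalise it.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Every nested type is a simplified, hypothetical stand-in for the LPhy type of the same name.
class HkySketch {

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface GeneratorInfo { String name(); String description(); }

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.PARAMETER)
    @interface ParameterInfo { String name(); String description(); boolean optional() default false; }

    static class Value<T> {
        private final T value;
        Value(T value) { this.value = value; }
        T value() { return value; }
    }

    static abstract class DeterministicFunction<T> {
        abstract Value<T> apply();
    }

    static class HKY extends DeterministicFunction<Double[][]> {
        private final Value<Number> kappa;
        private final Value<Double[]> freq;

        HKY(@ParameterInfo(name = "kappa", description = "transition/transversion rate ratio") Value<Number> kappa,
            @ParameterInfo(name = "freq", description = "base frequencies") Value<Double[]> freq) {
            this.kappa = kappa;
            this.freq = freq;
        }

        @GeneratorInfo(name = "hky", description = "The HKY instantaneous rate matrix.")
        @Override
        Value<Double[][]> apply() {
            double k = kappa.value().doubleValue();
            Double[] pi = freq.value();
            Double[][] Q = new Double[4][4];
            for (int i = 0; i < 4; i++) {
                double rowSum = 0.0;
                for (int j = 0; j < 4; j++) {
                    if (i == j) continue;
                    // With states ordered A, C, G, T, the transitions are A<->G and C<->T.
                    boolean transition = Math.abs(i - j) == 2;
                    double rate = (transition ? k : 1.0) * pi[j];
                    Q[i][j] = rate;
                    rowSum += rate;
                }
                // Each row of a rate matrix sums to zero.
                Q[i][i] = -rowSum;
            }
            return new Value<>(Q);
        }
    }
}
```

The same inputs always produce the same matrix, which is the difference between apply() here and sample() in a generative distribution.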

Method call

The method call is a special case of deterministic function, but its implementation in Java is somewhat simpler. Here is an example of an LPhy script:

data {
  D = readNexus(file="data/primate.nex");
  taxa = D.taxa();
  ...
}

In this script, the first line imports an alignment D from "primate.nex", and the second line uses the method call D.taxa() to extract the taxa object.

To implement this, simply add a Java method with the same name, taxa(), in the Alignment class. Then, add the @MethodInfo annotation with the necessary information. The script line taxa = D.taxa(); will work as long as D is an Alignment object.

Note that the method must be implemented inside the Java class that implements the LPhy object on which the method is called.
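A minimal sketch of this pattern is shown below. The nested MethodInfo annotation and Alignment class are hypothetical stand-ins (here taxa() returns a plain list of names rather than a Taxa object), and the reflective call() helper is only an assumption about how a method name from a script could be mapped to an annotated Java method; it does not describe the actual LPhy parser.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.List;

// Every nested type is a simplified, hypothetical stand-in for the LPhy type of the same name.
class MethodCallSketch {

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface MethodInfo { String description(); }

    static class Alignment {
        private final List<String> taxonNames;

        Alignment(String... taxonNames) {
            this.taxonNames = Arrays.asList(taxonNames);
        }

        // The Java method name matches the name used in the script: D.taxa()
        @MethodInfo(description = "The taxa of the alignment.")
        public List<String> taxa() {
            return taxonNames;
        }
    }

    // Looks up a public no-argument method by name and invokes it only if it carries @MethodInfo.
    static Object call(Object target, String methodName) {
        try {
            Method m = target.getClass().getMethod(methodName);
            if (!m.isAnnotationPresent(MethodInfo.class)) {
                throw new IllegalArgumentException(methodName + " is not exposed as a method call");
            }
            return m.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot call " + methodName, e);
        }
    }
}
```

In this sketch a Java method without the annotation, such as hashCode(), is rejected, which mirrors the idea that only annotated methods are visible from a script.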

Inheritance

You can use Java inheritance to reuse code. For example, the RateMatrix class is the parent class of most substitution models.

Overload

LPhy allows overloading. For example, the first script below is implemented by Bernoulli:

I_siteRates ~ Bernoulli(p=0.5);

The second script is implemented by BernoulliMulti:

I ~ Bernoulli(p=0.5, replicates=dim, minSuccesses=dim-2);
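One way to picture overloading is as a lookup from the supplied argument names to a Java class. The sketch below is an assumption for illustration only: the candidate table is hypothetical, it requires an exact match of argument names, and it ignores optional arguments, so it should not be read as the matching rule the LPhy parser actually applies.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration of choosing between two classes that share one LPhy name.
class OverloadSketch {

    // Java class name -> argument names it accepts (simplified, hypothetical table).
    static final Map<String, Set<String>> CANDIDATES = new LinkedHashMap<>();
    static {
        CANDIDATES.put("Bernoulli", new TreeSet<>(Arrays.asList("p")));
        CANDIDATES.put("BernoulliMulti", new TreeSet<>(Arrays.asList("p", "replicates", "minSuccesses")));
    }

    // Returns the first candidate whose argument names equal the supplied ones.
    static String resolve(String... suppliedArgNames) {
        Set<String> supplied = new TreeSet<>(Arrays.asList(suppliedArgNames));
        for (Map.Entry<String, Set<String>> candidate : CANDIDATES.entrySet()) {
            if (candidate.getValue().equals(supplied)) {
                return candidate.getKey();
            }
        }
        throw new IllegalArgumentException("No Bernoulli implementation accepts " + supplied);
    }
}
```

With this table, resolve("p") selects Bernoulli and resolve("p", "replicates", "minSuccesses") selects BernoulliMulti, matching the two scripts above.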

Registration

After you complete the Java implementation, you need to register it using SPI (Service Provider Interface) so that it can be applied in an LPhy script.

In LPhy core, the registration is normally in the Service Provider class lphy.base.spi.LPhyBaseImpl. Add your class to the corresponding list. In an LPhy extension, the registration is in a class that inherits from LPhyBaseImpl.

The registration for SequenceType is in a different Service Provider class, lphy.base.spi.SequenceTypeBaseImpl.

SPI in LPhy extensions

First, you need to create your own Service Provider class. For GenerativeDistribution and BasicFunction, it should inherit (extends) LPhyBaseImpl. Then, follow the steps below:

  1. Create an empty constructor, which is required by ServiceLoader;
  2. Override declareDistributions() and register your GenerativeDistribution there;
  3. Override declareFunctions() and register your BasicFunction there;
  4. Override getExtensionName().
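The four steps above can be sketched as follows. The nested LPhyBaseImpl, GenerativeDistribution, and BasicFunction types are empty, hypothetical stand-ins so that the example compiles on its own, and MyExtImpl, MyDistribution, MyFunction, and the extension name are invented for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Every nested type is a simplified, hypothetical stand-in for the LPhy type of the same name.
class ExtensionSketch {

    interface GenerativeDistribution {}
    static abstract class BasicFunction {}

    // Hypothetical components provided by the extension.
    static class MyDistribution implements GenerativeDistribution {}
    static class MyFunction extends BasicFunction {}

    static class LPhyBaseImpl {
        public LPhyBaseImpl() {}

        public List<Class<? extends GenerativeDistribution>> declareDistributions() {
            return Collections.emptyList();
        }

        public List<Class<? extends BasicFunction>> declareFunctions() {
            return Collections.emptyList();
        }

        public String getExtensionName() {
            return "base (stand-in)";
        }
    }

    static class MyExtImpl extends LPhyBaseImpl {
        // Step 1: a public no-argument constructor, required by ServiceLoader.
        public MyExtImpl() {}

        // Step 2: register the extension's generative distributions.
        @Override
        public List<Class<? extends GenerativeDistribution>> declareDistributions() {
            return Arrays.asList(MyDistribution.class);
        }

        // Step 3: register the extension's functions.
        @Override
        public List<Class<? extends BasicFunction>> declareFunctions() {
            return Arrays.asList(MyFunction.class);
        }

        // Step 4: give the extension its own name.
        @Override
        public String getExtensionName() {
            return "My LPhy extension";
        }
    }
}
```

Only classes returned from these lists are registered in this sketch, which is why an unregistered class stays invisible to scripts even though it compiles.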

For SequenceType, it should inherit (extends) SequenceTypeBaseImpl.

  1. Initialize the Map in the constructor, using the same code as in SequenceTypeBaseImpl;
  2. Copy the method register() from SequenceTypeBaseImpl;
  3. Override declareSequenceTypes() and register your SequenceType there;
  4. Override getExtensionName().

The last step is to add your SPI implementation into two configuration files:

i. module-info, ii. META-INF/services.