This tutorial focuses on how to implement LPhy components using Java classes.
Please read the following articles before you start to write the code:
It is essential to have a thorough understanding of the following concepts:
- Probabilistic graphical model
- Value
- Constant
- Random variable
- Generator
In the Java implementation, both Value and Generator are types of GraphicalModelNode. See also https://linguaphylo.github.io/programming/2020/09/22/linguaphylo-for-developers.html
LPhy is a dynamically typed language. Therefore, as a developer, you need to understand how data types are handled. For example,
- All actual values are wrapped in the Value class, which a few classes inherit from, such as RandomVariable. Use the method .value() to retrieve the actual value, and .getType() to get its data type.
- You must also define the data type returned by each generative distribution or deterministic function. The details are explained in the following subsections.
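To make the wrapping concrete, here is a minimal sketch. SimpleValue is a simplified, hypothetical stand-in written for this example, not LPhy's actual Value class; only the method names value() and getType() come from the text above, and the helper asDouble is likewise an assumption.

```java
// Simplified stand-in for illustration only; LPhy's real Value class offers more.
class SimpleValue<T> {
    private final String id;
    private final T value;

    SimpleValue(String id, T value) {
        this.id = id;
        this.value = value;
    }

    /** Returns the wrapped Java object. */
    T value() {
        return value;
    }

    /** Returns the runtime class of the wrapped object. */
    Class<?> getType() {
        return value.getClass();
    }

    String getId() {
        return id;
    }
}

class ValueDemo {
    /** Unwraps a numeric value without knowing whether it holds an Integer or a Double. */
    static double asDouble(SimpleValue<? extends Number> v) {
        return v.value().doubleValue();
    }

    public static void main(String[] args) {
        SimpleValue<Integer> n = new SimpleValue<>("n", 3);
        SimpleValue<Double> x = new SimpleValue<>("x", 2.5);
        // The wrapper is inspected at run time, which is what dynamic typing requires.
        System.out.println(n.getId() + " : " + n.getType().getSimpleName() + " = " + asDouble(n));
        System.out.println(x.getId() + " : " + x.getType().getSimpleName() + " = " + asDouble(x));
    }
}
```

Accepting a wrapper of Number rather than Double is the same idea the constructors described later rely on, so that a script may pass either an integer or a real value.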
Although we have already implemented some commonly used data types in LPhy, developers may still need to implement new LPhy data types for certain new generators.
You may encounter many different "data types" in LPhy or BEAST. Please do not confuse these with sequence types. In LPhy, data types are specifically defined for the LPhy language. For example, they can be Double, Integer, Taxa, Alignment, or TimeTree.
However, any "data type" classes that inherit from JEBL SequenceType do not fall under this concept. These classes define the type of sequences.
GenerativeDistribution is a Java interface that represents all types of generative distributions, such as probability distributions, tree generative distributions (e.g. Birth-death, Coalescent), and PhyloCTMC generative distributions.
To write your own generative distribution, you need to follow these steps:
- Design your LPhy script first, for example,
  Θ ~ LogNormal(meanlog=3.0, sdlog=1.0);
- Create a Java class (e.g. LogNormal.java) to implement GenerativeDistribution. Look at the example LogNormal.java. A few things are required:
  - Define its LPhy name with the annotation @GeneratorInfo on the overridden method RandomVariable<Double> sample(). Setting name = "LogNormal" allows the parser to map LogNormal in an LPhy script to this Java object.
  - Define the arguments for this distribution using the annotation @ParameterInfo inside the constructor. name = "meanlog" declares one of the arguments as "meanlog". This is also referred to as a named argument. Following the annotation, you need to declare the Java argument for this constructor, which must be a Value, such as Value<Number> M. We use Number so that this input can also accept integer values. To make an argument optional, simply add optional = true.
  - Define the data type, e.g. LogNormal extends ParametricDistribution<Double> implements GenerativeDistribution1D<Double>, where Double replaces T and must be consistent with the return type of RandomVariable<Double> sample().
  - Implement the method RandomVariable<...> sample(), which should sample a value from this distribution and then wrap it in a RandomVariable.
  - Correctly implement both methods Map<String, Value> getParams() and setParam(String paramName, Value value); otherwise, re-sampling values from the probabilistic graphical model represented by an LPhy script using this distribution will fail.
- Register the distribution to SPI.
  The SPI registration class for generative distributions is located in the Java package named *.spi, for example, lphy.base.spi.LPhyBaseImpl or phylonco.lphy.spi.PhyloncoImpl.
  You can simply add your class to the list returned by the method List<Class<? extends GenerativeDistribution>> declareDistributions().
  See the example in LPhyBaseImpl.
Please note that the LPhy code will only function properly after the distribution class is registered. Therefore, it is acceptable to commit an incomplete LPhy object during development (to avoid painful merges) without registering it, provided it compiles and is not included in any published unit tests.
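The requirements for a generative distribution can be sketched in one self-contained file. This is only an illustration under stated assumptions, not the LogNormal.java shipped with LPhy: the annotations GeneratorInfo and ParameterInfo and the classes Value and RandomVariable are simplified stand-ins declared inside the block so that it compiles on its own, the description attribute is assumed, and LogNormalSketch is a hypothetical class name that neither extends ParametricDistribution nor implements GenerativeDistribution1D. In a real implementation you would import LPhy's own types instead.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

// --- Stand-ins (assumptions) so the sketch compiles without LPhy on the classpath ---
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface GeneratorInfo {
    String name();
    String description() default "";
}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.PARAMETER)
@interface ParameterInfo {
    String name();
    String description() default "";
    boolean optional() default false;
}

class Value<T> {
    private final T value;
    Value(T value) { this.value = value; }
    T value() { return value; }
}

class RandomVariable<T> extends Value<T> {
    RandomVariable(T value) { super(value); }
}

// --- The distribution, following the steps listed above ---
@SuppressWarnings({"rawtypes", "unchecked"})
class LogNormalSketch {
    private Value<Number> M; // meanlog
    private Value<Number> S; // sdlog
    private final Random random = new Random();

    // Named arguments: Number lets a script pass either an integer or a real value.
    LogNormalSketch(
            @ParameterInfo(name = "meanlog", description = "mean on the log scale") Value<Number> M,
            @ParameterInfo(name = "sdlog", description = "standard deviation on the log scale") Value<Number> S) {
        this.M = M;
        this.S = S;
    }

    // The name given here is the one used in the script: Θ ~ LogNormal(...)
    @GeneratorInfo(name = "LogNormal", description = "The log-normal distribution.")
    public RandomVariable<Double> sample() {
        double mean = M.value().doubleValue();
        double sd = S.value().doubleValue();
        // exp of a normal draw is log-normally distributed
        double x = Math.exp(mean + sd * random.nextGaussian());
        return new RandomVariable<>(x);
    }

    // Must list every argument under its script name, so the model can be re-sampled.
    public Map<String, Value> getParams() {
        Map<String, Value> params = new TreeMap<>();
        params.put("meanlog", M);
        params.put("sdlog", S);
        return params;
    }

    // Must accept exactly the names returned by getParams().
    public void setParam(String paramName, Value value) {
        switch (paramName) {
            case "meanlog": M = value; break;
            case "sdlog": S = value; break;
            default: throw new IllegalArgumentException("Unknown parameter: " + paramName);
        }
    }
}
```

Keeping getParams() and setParam() in step with the constructor annotations matters because, as noted above, re-sampling goes through these two methods rather than through the constructor.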
DeterministicFunction is an abstract class that extends BasicFunction.
To write your own deterministic function, follow similar steps:
- Design your LPhy script first, for example,
  Q = hky(kappa=κ, freq=π);
- Create a Java class (e.g. HKY.java) to extend DeterministicFunction. Look at the example HKY.java. A few things are required:
  - Define its LPhy name with the annotation @GeneratorInfo on the overridden method Value<Double[][]> apply(). Setting name = "hky" allows the parser to map hky in an LPhy script to this Java object.
  - Define the arguments for this function using the annotation @ParameterInfo inside the constructor. name = "kappa" declares one of the arguments as "kappa". This is also referred to as a named argument. Following the annotation, you need to declare the Java argument for this constructor, which must be a Value, such as Value<Number> kappa. We use Number so that this input can also accept integer values. To make an argument optional, simply add optional = true.
  - Define the data type, e.g. extends DeterministicFunction<Double[][]>, where the 2d matrix Double[][] replaces T and must be consistent with the return type of Value<Double[][]> apply().
  - Implement the method Value<...> apply(), which should compute a value deterministically and then wrap it in a Value.
- Register the function to SPI.
  Simply add your class to the list returned by the method List<Class<? extends BasicFunction>> declareFunctions().
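The same pattern for a deterministic function might look like the sketch below. It is not LPhy's HKY.java: GeneratorInfo, ParameterInfo and Value are stand-ins declared locally, HKYSketch is a hypothetical class that does not extend DeterministicFunction, the nucleotide order A, C, G, T is an assumption, and normalisation of the rate matrix is deliberately left out.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Arrays;

// --- Stand-ins (assumptions) so the sketch compiles without LPhy on the classpath ---
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface GeneratorInfo {
    String name();
    String description() default "";
}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.PARAMETER)
@interface ParameterInfo {
    String name();
    String description() default "";
    boolean optional() default false;
}

class Value<T> {
    private final T value;
    Value(T value) { this.value = value; }
    T value() { return value; }
}

// --- The function, following the steps listed above ---
class HKYSketch {
    private final Value<Number> kappa;
    private final Value<Double[]> freq;

    HKYSketch(
            @ParameterInfo(name = "kappa", description = "transition/transversion rate ratio") Value<Number> kappa,
            @ParameterInfo(name = "freq", description = "base frequencies, assumed order A, C, G, T") Value<Double[]> freq) {
        this.kappa = kappa;
        this.freq = freq;
    }

    // The name given here is the one used in the script: Q = hky(...)
    @GeneratorInfo(name = "hky", description = "Unnormalised HKY instantaneous rate matrix.")
    public Value<Double[][]> apply() {
        double k = kappa.value().doubleValue();
        Double[] pi = freq.value();
        int n = 4;
        Double[][] Q = new Double[n][n];
        for (int i = 0; i < n; i++) {
            double rowSum = 0.0;
            for (int j = 0; j < n; j++) {
                if (i == j) continue;
                // With order A, C, G, T the transitions are A<->G and C<->T, i.e. |i - j| == 2.
                boolean transition = Math.abs(i - j) == 2;
                Q[i][j] = (transition ? k : 1.0) * pi[j];
                rowSum += Q[i][j];
            }
            // Each row of a rate matrix sums to zero.
            Q[i][i] = -rowSum;
        }
        return new Value<>(Q);
    }

    public static void main(String[] args) {
        Value<Double[]> uniform = new Value<>(new Double[]{0.25, 0.25, 0.25, 0.25});
        Double[][] Q = new HKYSketch(new Value<Number>(2), uniform).apply().value();
        System.out.println(Arrays.deepToString(Q));
    }
}
```

Unlike sample(), apply() draws no random numbers: the same inputs always produce the same matrix, which is why the result is wrapped in a plain Value rather than a RandomVariable.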
A method call is a special case of a deterministic function, but its implementation in Java is somewhat simpler. Here is an example of an LPhy script:
data {
D = readNexus(file="data/primate.nex");
taxa = D.taxa();
...
}
In this script, the first line imports an alignment D from "primate.nex", and the second line uses the method call D.taxa() to extract the taxa object.
To implement this, simply add a Java method with the same name, taxa(), in the Alignment class. Then, add the @MethodInfo annotation with the necessary information. The script line taxa = D.taxa(); will work as long as D is an Alignment object.
It is important to note that the method must be implemented inside the Java class that implements the LPhy object on which the method is called.
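A rough sketch of the idea follows. The MethodInfo annotation here is a local stand-in with an assumed description attribute, SimpleAlignment is a hypothetical class that only stores taxon names, and the reflective lookup in MethodCallDemo merely illustrates how an annotated method could be found by name; it is not a description of how the LPhy parser actually dispatches method calls.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.List;

// Stand-in annotation (assumption): marks a method as callable from a script.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface MethodInfo {
    String description();
}

// Hypothetical, heavily simplified alignment that only stores taxon names.
class SimpleAlignment {
    private final List<String> taxonNames;

    SimpleAlignment(String... taxonNames) {
        this.taxonNames = Arrays.asList(taxonNames);
    }

    // Same name as in the script line: taxa = D.taxa();
    @MethodInfo(description = "The taxa of the alignment.")
    public List<String> taxa() {
        return taxonNames;
    }

    // Public but not annotated, so the dispatcher below refuses to call it.
    public int hiddenSize() {
        return taxonNames.size();
    }
}

class MethodCallDemo {
    /** Looks up a no-argument method by name and calls it only if it carries @MethodInfo. */
    static Object call(Object target, String methodName) {
        try {
            Method method = target.getClass().getMethod(methodName);
            if (!method.isAnnotationPresent(MethodInfo.class)) {
                throw new IllegalArgumentException(methodName + " is not exposed to scripts");
            }
            return method.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot call " + methodName, e);
        }
    }

    public static void main(String[] args) {
        SimpleAlignment D = new SimpleAlignment("human", "chimp", "gorilla");
        System.out.println(call(D, "taxa"));
    }
}
```

Because the lookup is driven by the class of the target object, the method has to live in that class (or a parent of it), which matches the note above.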
You can use Java inheritance to reuse code. For example, the RateMatrix class is the parent class of most substitution models.
LPhy allows overloading. For example, the first script below is implemented by Bernoulli:
I_siteRates ~ Bernoulli(p=0.5);
The second script is implemented by BernoulliMulti:
I ~ Bernoulli(p=0.5, replicates=dim, minSuccesses=dim-2);
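One way to picture overloading is as a lookup keyed on both the script name and the set of named arguments. The sketch below is purely illustrative: Signature, OverloadDemo and the exact-match rule are assumptions made for this example, every listed argument is treated as required, and nothing here describes how the LPhy parser really chooses between Bernoulli and BernoulliMulti.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical record of one registered generator: its script name, the Java class
// implementing it, and the named arguments that class expects.
class Signature {
    final String scriptName;
    final String javaClass;
    final Set<String> argNames;

    Signature(String scriptName, String javaClass, String... argNames) {
        this.scriptName = scriptName;
        this.javaClass = javaClass;
        this.argNames = new HashSet<>(Arrays.asList(argNames));
    }
}

class OverloadDemo {
    // Both entries share the script name "Bernoulli" but expect different arguments.
    // Treating every listed argument as required is an assumption of this sketch.
    static final List<Signature> REGISTRY = Arrays.asList(
            new Signature("Bernoulli", "Bernoulli", "p"),
            new Signature("Bernoulli", "BernoulliMulti", "p", "replicates", "minSuccesses"));

    /** Returns the Java class whose script name and argument names match the call exactly. */
    static String resolve(String scriptName, String... suppliedArgs) {
        Set<String> supplied = new HashSet<>(Arrays.asList(suppliedArgs));
        for (Signature s : REGISTRY) {
            if (s.scriptName.equals(scriptName) && s.argNames.equals(supplied)) {
                return s.javaClass;
            }
        }
        throw new IllegalArgumentException("No generator " + scriptName + " takes " + supplied);
    }

    public static void main(String[] args) {
        System.out.println(resolve("Bernoulli", "p")); // prints Bernoulli
        System.out.println(resolve("Bernoulli", "p", "replicates", "minSuccesses")); // prints BernoulliMulti
    }
}
```

The practical point for developers is that two Java classes may declare the same LPhy name, as long as their named arguments let the two calls be told apart.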
After you complete the Java implementation, you need to register it using SPI (Service Provider Interface) so that it can be applied in an LPhy script.
In LPhy core, the registration is normally in the Service Provider class lphy.base.spi.LPhyBaseImpl.
Add your class to the corresponding list. In an LPhy extension, the registration is in the class that inherits from LPhyBaseImpl.
The registration for SequenceType is in a different Service Provider class, lphy.base.spi.SequenceTypeBaseImpl.
First, you need to create your own Service Provider class.
For GenerativeDistribution and BasicFunction, it should inherit (extends) LPhyBaseImpl. Then, follow the steps below:
- create an empty constructor, which is required by ServiceLoader;
- override declareDistributions() and register your GenerativeDistribution there;
- override declareFunctions() and register your BasicFunction there;
- override getExtensionName().
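Put together, a Service Provider class for an extension might look roughly like this. GenerativeDistribution, BasicFunction and LPhyBaseImpl are minimal stand-ins declared locally so the block compiles by itself, and MyDistribution, MyFunction, MyExtensionImpl and the extension name are hypothetical; only the three overridden method names and the need for an empty constructor come from the steps above.

```java
import java.util.ArrayList;
import java.util.List;

// --- Stand-ins (assumptions) so the sketch compiles without LPhy on the classpath ---
interface GenerativeDistribution {}

abstract class BasicFunction {}

class LPhyBaseImpl {
    public LPhyBaseImpl() {}

    public List<Class<? extends GenerativeDistribution>> declareDistributions() {
        return new ArrayList<>();
    }

    public List<Class<? extends BasicFunction>> declareFunctions() {
        return new ArrayList<>();
    }

    public String getExtensionName() {
        return "LPhy base stand-in";
    }
}

// Hypothetical components contributed by the extension.
class MyDistribution implements GenerativeDistribution {}

class MyFunction extends BasicFunction {}

// --- The extension's own Service Provider class ---
class MyExtensionImpl extends LPhyBaseImpl {

    /** No-argument constructor, needed so that ServiceLoader can instantiate the provider. */
    public MyExtensionImpl() {}

    @Override
    public List<Class<? extends GenerativeDistribution>> declareDistributions() {
        List<Class<? extends GenerativeDistribution>> distributions = new ArrayList<>();
        distributions.add(MyDistribution.class);
        return distributions;
    }

    @Override
    public List<Class<? extends BasicFunction>> declareFunctions() {
        List<Class<? extends BasicFunction>> functions = new ArrayList<>();
        functions.add(MyFunction.class);
        return functions;
    }

    @Override
    public String getExtensionName() {
        return "My LPhy extension";
    }
}
```

The classes are package-private here only to keep the example in a single file; a provider loaded through java.util.ServiceLoader would normally be a public class in the extension's *.spi package.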
For SequenceType, the Service Provider class should inherit (extends) SequenceTypeBaseImpl. Then:
- initialize the Map in the constructor, using the same code as SequenceTypeBaseImpl;
- copy the method register();
- override declareSequenceTypes() and register your SequenceType there;
- override getExtensionName().
The last step is to add your SPI implementation to two configuration files:
(i) module-info and (ii) META-INF/services.