Replies: 1 comment 1 reply
-
This is super cool, and we're definitely interested in having a larger set of statistical functions available. Splitting it off from the prelude is a good idea too. I don't have particularly strong opinions about naming conventions, but I feel like we should take advantage of the typeclass system to achieve some polymorphism. So, instead of having But feel free to send an initial PR. We might want to discuss some more details in the code itself, but we are 100% on board with including a library like that!
-
It would be useful to have better support for commonly encountered probability distributions in Dex. I noticed that @emilyfertig recently added sampling from binomial and Poisson distributions to the prelude. This is very cool. But there are lots of commonly used distributions, and in addition to sampling functions, it's useful to have other functions associated with each distribution, such as the log probability density/mass function, the log probability distribution function, the log survivor function, the quantile function, etc. So I'm thinking that rather than filling the prelude with potentially dozens of such functions, it might be better to have all such functions in a separate library, say, `lib/stats.dx`.

I've started to flesh out what this might look like here: https://github.com/darrenjw/logreg/blob/main/Dex/stats.dx
Comments welcome. The most contentious issue with this kind of library is always the naming convention. I really don't feel strongly about this, and would be very happy to go with a consensus. For example, do we use full names or abbreviated names? So, `binomial` rather than `binom`, and `bernoulli` rather than `bern`? And is it `normal` or `gaussian`, anyway?! Similarly, a convention for the different functions relating to a particular distribution is required. Again, this varies widely among programming languages and libraries. In the file linked above I've gone for a fairly minimal `*_ld` for log density, `*_lp` for log probability distribution, etc., and no annotation for the sampling function. But we could go for something more descriptive, like `*_logpdf` and `*_logcdf`. There's still a question about whether to annotate the sampling functions (e.g. `gaussian_rand`), and a separate question about whether to distinguish between the (log) probability density function of a continuous random variable and the (log) probability mass function of a discrete random variable (I have a slight preference not to).

I'm very happy to start this off with an initial PR, but it would be good to have a bit of a steer on naming first, as it will be inconvenient to change later.
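To make the naming options concrete, here is a minimal sketch of the short-suffix scheme for a single distribution. It is written in Python rather than Dex purely for illustration. The argument order `(mu, sigma, x)`, the explicit `rng` parameter, and the function bodies are assumptions made for this example and are not taken from the linked `stats.dx`.

```python
import math
import random

# Hypothetical illustration of the "<name>", "<name>_ld", "<name>_lp" scheme.
# Argument order and signatures are assumptions, not the stats.dx API.


def gaussian(mu: float, sigma: float, rng: random.Random) -> float:
    """Sampling function: no suffix under the minimal convention."""
    return rng.gauss(mu, sigma)


def gaussian_ld(mu: float, sigma: float, x: float) -> float:
    """Log density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def gaussian_lp(mu: float, sigma: float, x: float) -> float:
    """Log of the distribution function, log P(X <= x).

    Naive formulation: fine for moderate x, but the CDF underflows
    to zero far in the lower tail, where math.log would fail.
    """
    z = (x - mu) / sigma
    return math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))


if __name__ == "__main__":
    rng = random.Random(42)
    print(gaussian(0.0, 1.0, rng))
    print(gaussian_ld(0.0, 1.0, 0.0))
    print(gaussian_lp(0.0, 1.0, 0.0))
```

Under the more descriptive alternative, the same three functions would be named along the lines of `gaussian_rand`, `gaussian_logpdf` and `gaussian_logcdf`; only the names change, not the signatures.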