Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Broadcasting pdf/logpdf over UnivariateFiniteArrays - some corner cases #15

Open
ablaom opened this issue Jun 3, 2021 · 3 comments
Open

Comments

@ablaom
Copy link
Member

ablaom commented Jun 3, 2021

Two corner cases previously not accounted for:

First is a bug: The following works as expected:

julia> v = UnivariateFinite(["M", "F"], [0.5 0.5; 0.4 0.6], pool=missing)
2-element UnivariateFiniteVector{Multiclass{2}, String, UInt8, Float64}:
 UnivariateFinite{Multiclass{2}}(M=>0.5, F=>0.5)
 UnivariateFinite{Multiclass{2}}(M=>0.4, F=>0.6)

julia> pdf(v, ["M", "F"])
2×2 Matrix{Float64}:
 0.5  0.5
 0.4  0.6

However, this should probably also work but doesn't:

julia> v = UnivariateFinite[v...]
2-element Vector{UnivariateFinite}:
 UnivariateFinite{Multiclass{2}}(M=>0.5, F=>0.5)
 UnivariateFinite{Multiclass{2}}(M=>0.4, F=>0.6)

julia> pdf(v, ["M", "F"])
ERROR: MethodError: no method matching pdf(::Vector{UnivariateFinite}, ::Vector{String})

Second is an optimization: (see comment below)

This works, but is does not use one of the optimised methods for broadcasting we have in src/univariatefinite/arrays:

pdf.(v, levels.(v))
@OkonSamuel
Copy link
Member

@ablaom
The first case could easily be resolved. But the fix would only work if the vector is homogeneous and would fail if it is not. Consider the example below:

julia> d = UnivariateFinite([0.1, 0.3, 0.6], pool =missing)
UnivariateFinite{Multiclass{3}}(class_1=>0.1, class_2=>0.3, class_3=>0.6)

julia> v = UnivariateFinite(["M", "F"], [0.5 0.5; 0.4 0.6], pool=missing)
2-element MLJBase.UnivariateFiniteVector{Multiclass{2}, String, UInt8, Float64}:
 UnivariateFinite{Multiclass{2}}(M=>0.5, F=>0.5)
 UnivariateFinite{Multiclass{2}}(M=>0.4, F=>0.6)

julia> n = [d, v[1]] #not having the same pool hence not homogeneous in this context
2-element Vector{UnivariateFinite{S, String, UInt8, Float64} where S}:
 UnivariateFinite{Multiclass{3}}(class_1=>0.1, class_2=>0.3, class_3=>0.6)
 UnivariateFinite{Multiclass{2}}(M=>0.5, F=>0.5)

julia> m = [v...] # has the same pool hence homogenous in this context
2-element Vector{UnivariateFinite{Multiclass{2}, String, UInt8, Float64}}:
 UnivariateFinite{Multiclass{2}}(M=>0.5, F=>0.5)
 UnivariateFinite{Multiclass{2}}(M=>0.4, F=>0.6)

julia> pdf(m, ["M", "F"])
2×2 Matrix{Float64}:
 0.5  0.5
 0.4  0.6

julia> pdf(n, ["M", "F"])
ERROR: MethodError: no method matching pdf(::Vector{UnivariateFinite{S, String, UInt8, Float64} where S}, ::Vector{String})
Closest candidates are:
  pdf(::Distributions.MultivariateDistribution{S} where S<:Distributions.ValueSupport, ::AbstractVector{T} where T) at /home/okonsamuel/.julia/packages/Distributions/r0PBA/src/multivariates.jl:204
  pdf(::AbstractArray{var"#s41", N} where var"#s41"<:UnivariateFinite{S, V, R, P}, ::AbstractVector{var"#s40"} where var"#s40"<:Union{CategoricalArrays.CategoricalValue{V, R}, V}) where {S, V, R, P, N} at /home/okonsamuel/.julia/packages/MLJBase/gcxd6/src/univariate_finite/arrays.jl:96
  pdf(::UnivariateFinite{S, V, R, P} where {R, P}, ::V) where {S, V} at /home/okonsamuel/.julia/packages/MLJBase/gcxd6/src/univariate_finite/methods.jl:229
  ...
Stacktrace:
 [1] top-level scope
   @ REPL[22]:1

But I don't think the second case can make use of any of the optimized methods in src/univariatefinite/arrays. This is because those optimized methods assume a UnivariateFiniteArray which is just one of the numerous subtypes of AbstractArray{<:UnivariateFinite, N} where N.
Even in the case that v in your example above is an UnivariateFineArray object, there is no optimized implementation in MLJBase for pdf(d::UnivariateFinite, X::AbstractArray) which is what pdf.(v, levels.(v)) would eventually call multiple times when it gets lowered.

@ablaom
Copy link
Member Author

ablaom commented Jun 4, 2021

@OkonSamuel Thanks for giving this more thought.

In your example pdf(n, ["M", "F"]) ought to fail anyway, as pdf(d, "M") fails, because "M" is not a level of d. Although it is an unlikely case in MLJ, it seems natural to have pdf(n, ["M", "F"]) work whenever the corresponding "scalar" calls make sense. In these weird cases, we don't have (or really care) about performance. Could we not add a method with broader signature pdf(v::AbstractArray{<:UnivariateFinite}, C::AbstractVector) as a slow fallback to catch even inhomogeneous examples, failing only when they wouldn't make sense anyway (ie failing only if some v[i] has a pool that does not include C)?

But I don't think the second case can make use of any of the optimized methods in src/univariatefinite/arrays

Yes, I think you are right. So we can drop that one.

@OkonSamuel
Copy link
Member

Could we not add a method with broader signature pdf(v::AbstractArray{<:UnivariateFinite}, C::AbstractVector) as a slow fallback to catch even inhomogeneous examples, failing only when they wouldn't make sense anyway (ie failing only if some v[i] has a pool that does not include C)?

Yes, this was what I was hinting at.

@ablaom ablaom transferred this issue from JuliaAI/MLJBase.jl Dec 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants