Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API of SortedContainer #918

Open
Wu-Chenyang opened this issue Nov 23, 2024 · 4 comments
Open

API of SortedContainer #918

Wu-Chenyang opened this issue Nov 23, 2024 · 4 comments

Comments

@Wu-Chenyang
Copy link

Wu-Chenyang commented Nov 23, 2024

In my application, I find it's useful to define the following APIs for SortedContainer.

Base.checkbounds(::Type{Bool}, container::SortedContainer, token::DataStructures.IntSemiToken) =
    token.address >= 3 && (token.address in container.bt.useddatacells)
Base.checkbounds(container::SortedContainer, token::DataStructures.IntSemiToken) = (container, token) |> DataStructures.has_data
Base.nextind(container::SortedContainer, token::DataStructures.IntSemiToken) = (container, token) |> advance
Base.prevind(container::SortedContainer, token::DataStructures.IntSemiToken) = (container, token) |> regress

Base.@propagate_inbounds function Base.getindex(m::SortedDict,
                              i::DataStructures.IntSemiToken)
           @boundscheck DataStructures.has_data((m,i))
           return (m, i) |> deref
end
Base.@propagate_inbounds function Base.getindex(m::SortedMultiDict,
                              i::DataStructures.IntSemiToken)
           @boundscheck DataStructures.has_data((m,i))
           return (m, i) |> deref
end
Base.@propagate_inbounds function Base.getindex(m::SortedSet,
                              i::DataStructures.IntSemiToken)
           @boundscheck DataStructures.has_data((m,i))
           return (m, i) |> deref
end
# Base.@propagate_inbounds function Base.getindex(m::SortedContainer, token::DataStructures.IntSemiToken)
#     @boundscheck DataStructures.has_data((m,token))
#     return (container, token) |> deref
# end

The first three functions checkbounds, nextind, prevind are straightforward and align well with existing APIs. The getindex functions for IntSemiToken have been implemented for SortedDict and SortedMultiDict, which return only the values. I modified them to return key-value pairs, because it is natural to do so in my application. Besides, given the existence of another getindex function that retrieves the value using the key, I think retrieving key-value pairs using IntSemiToken seems a natural choice. Of course, it might not be a good choice in general. How do you think about these changes?

Here is a piece of my application code for motivating the changes. It uses the general APIs for sorted containers to simultaneously handle sorted AbstractVector and SortedDict.

struct Observation{I, O}
    index::I
    obs::O
    Observation{I,O}(index::I, obs::O) where {I, O} = new{I,O}(index, obs)
end
Observation(index::I, obs::O) where {I, O} = Observation{I,O}(index, obs)
Base.lt(::Base.Order.ForwardOrdering, a::Observation, b::Observation) = a.index < b.index
Base.lt(::Base.Order.ForwardOrdering, a::Real, b::Observation) = a < b.index
Base.lt(::Base.Order.ForwardOrdering, a::Observation, b::Real) = a.index < b
flatten(obs::Observation) = (obs.index, obs.obs)
flatten(pair::Pair) = (pair.first, pair.second)

const ObservationContainer{I,O} = Union{AbstractVector{<:Observation{I,O}}, SortedDict{I,O}}

function (w::AbstractWienerProcess)(t::Real, condition::ObservationContainer)
    w₀, Σ = params(w)

    # `IntSemiToken` for `SortedDict`, `Int` for `AbstractVector`
    idx = searchsortedlast(condition, t)
    if checkbounds(Bool, condition, idx)
        t₁, a = flatten(condition[idx])
    else
        t₁ = zero(t)
        a = w₀
    end

    next_idx = nextind(condition, idx)
    if checkbounds(Bool, condition, next_idx)
        t₂, b = flatten(condition[next_idx])
        μ = a + (t - t₁) / (t₂ - t₁) * (b - a)
        cov_t = ((t₂ - t) * (t - t₁) / (t₂ - t₁))
        construct_normal(μ, Σ, cov_t)
    else
        construct_normal(a, Σ, t - t₁)
    end
end

I also tried to use the SortedSet as the container in this code block but encountered a problem that searchsortedlast for SortedSet tries to convert the t::Real to Observation before comparing it to other observations in the container. I think this compulsory conversion might not be necessary. In some cases, it is natural to directly compare objects of different types. The above code is an example.

@Wu-Chenyang
Copy link
Author

If you are interested in these changes, I can help with submitting a pull request.

@StephenVavasis
Copy link
Contributor

I would recommend against changing the API for getindex. Although I see the logic of the request, this API has been in place for 10 years, and changing it now could break many existing codes.

As for loosening the restriction in the API of searchsortedfirst and its relatives: I can see that it is sensible for your use-case. I also see the pitfall that some future user could be misled into thinking that the sorted containers allow for two incompatible orderings simultaneously. So I don't have a strong opinion either way.

@Wu-Chenyang
Copy link
Author

I see. Indeed, the changes in getindex and searchsortedfirst are not necessary even in my use case. I think your prudent decision is correct. What about the addition of nextind, prevind, and checkbounds?

@StephenVavasis
Copy link
Contributor

Yes, it would make sense to implement nextind and prevind and checkbounds. Furthermore, it would probably make sense to deprecate advance and regress in favor of nextind and prevind from Base, since a near-term goal for the package is to make the API more consistent with Base. We wouldn't want to do this overnight since it would break existing codes, but it could be phased in over a few release cycles.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants