MDS: supply Gram matrix directly #232

Open
timholy opened this issue Jun 25, 2024 · 0 comments

Collaborator

timholy commented Jun 25, 2024

Currently the pipeline for MDS is X -> D -> G -> M, where X is a coordinate representation of the data, D is the pairwise distance matrix, G is the Gram matrix, and M is the final MDS model. One can alternatively go D -> G -> M by supplying the distances=true keyword. However, there are applications where the natural thing to supply is G. For example, if you are working with objects that have a natural inner product and you want to visualize them in a lower-dimensional space, you will compute G first. It would then be quite silly to call fit(MDS, gram2dmat(G); distances=true) when the first thing fit does is convert the distances back into the Gram matrix.
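To make the redundancy concrete, here are the standard classical-MDS identities relating the two matrices. This is a sketch of the textbook relations rather than a description of the package internals; the symbols $J$, $n$, and $D^{(2)}$ are notation introduced here, with $n$ the number of points and $D^{(2)}$ the matrix of squared distances.

```math
D^{(2)}_{ij} = D_{ij}^2 = G_{ii} + G_{jj} - 2\,G_{ij},
\qquad
G_c = -\tfrac{1}{2}\, J\, D^{(2)} J,
\qquad
J = I - \tfrac{1}{n}\,\mathbf{1}\mathbf{1}^{\top}
```

Going from G to D and back therefore only recovers the centered Gram matrix $G_c = J G J$, at the cost of two extra passes over an $n \times n$ matrix.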

I know there isn't a ton of stuff inside fit that happens after you have G, but from the standpoint of keeping things in sync I think it would be best to expose this to the user instead of forcing them to create their own private version of fit.

The only hard problem is deciding the API. I think a good design would be

struct Distances{M<:AbstractMatrix}
    D::M
end

struct Gramian{M<:AbstractMatrix}
    G::M
end

fit(::Type{MDS}, X::AbstractMatrix) = fit(MDS, Distances(L2distance(X)))
fit(::Type{MDS}, D::Distances) = fit(MDS, Gramian(dmat2gram(D.D)))
function fit(::Type{MDS}, G::Gramian)
    # the "main" implementation
end

Then users call one of:

  • fit(MDS, X)
  • fit(MDS, Distances(D))
  • fit(MDS, Gramian(G))

depending on what kind of starting point they have.
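The dispatch chain above can be tried out in isolation with a minimal, self-contained sketch. Everything here is hypothetical: `sketch_fit`, `sketch_l2distance`, and `sketch_dmat2gram` are stand-ins written for this illustration, not functions of the package, and the return value is a bare coordinate matrix rather than an MDS model. The `Gramian` method assumes the matrix it receives is already centered.

```julia
using LinearAlgebra, Statistics

# Hypothetical wrapper types from the proposal; not part of any released API.
struct Distances{M<:AbstractMatrix}
    D::M
end

struct Gramian{M<:AbstractMatrix}
    G::M
end

# Stand-in for pairwise Euclidean distances between the columns of X.
sketch_l2distance(X::AbstractMatrix) =
    [norm(X[:, i] - X[:, j]) for i in axes(X, 2), j in axes(X, 2)]

# Stand-in for the distance-to-Gram conversion (double centering).
function sketch_dmat2gram(D::AbstractMatrix)
    D2 = D .^ 2
    return -0.5 .* (D2 .- mean(D2, dims=1) .- mean(D2, dims=2) .+ mean(D2))
end

# Each entry point tags its input and forwards to the next stage.
sketch_fit(X::AbstractMatrix; maxoutdim=2) =
    sketch_fit(Distances(sketch_l2distance(X)); maxoutdim)
sketch_fit(D::Distances; maxoutdim=2) =
    sketch_fit(Gramian(sketch_dmat2gram(D.D)); maxoutdim)

# The "main" implementation: top eigenpairs of the (assumed centered) Gram matrix.
function sketch_fit(G::Gramian; maxoutdim=2)
    E = eigen(Symmetric(Matrix(G.G)))
    idx = sortperm(E.values, rev=true)[1:maxoutdim]
    λ = max.(E.values[idx], 0)                  # clamp small negative eigenvalues
    return (E.vectors[:, idx] .* sqrt.(λ'))'    # maxoutdim × n embedding
end

X = [0.0 3.0 0.0; 0.0 0.0 4.0]                  # three points in the plane
Y = sketch_fit(X)
```

Because every method funnels into the `Gramian` one, a caller who already has G skips both conversions, and the three entry points cannot drift apart.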

One alternative approach is to use a keyword argument, but here we would run into a compatibility problem: if we add gramian::Bool=false as a keyword, then we have to look for the case where distances == gramian == true and throw an error. This is not a big deal but it feels a bit ugly. Alternatively we could add interpretation=:coordinates and support the settings :distances and :gramian but that would be a breaking change. We can do a breaking change, but it seems to be a bit silly for a very minor change. With the "type tagging" of the input data, we could have fit(::Type{MDS}, X::AbstractMatrix) still support the distances kwarg but give a deprecation warning, and then we can remove that API whenever other more substantive changes force a breaking release.
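The deprecation path described above might look roughly like the following. This is only a sketch under assumptions: `sketch_fit_compat` is a hypothetical name, the methods are stubs that report which path was taken instead of fitting anything, and using `nothing` as the "keyword not supplied" default is one possible choice, not something settled in this proposal.

```julia
# Hypothetical tag type from the proposal.
struct Distances{M<:AbstractMatrix}
    D::M
end

# Stub for the new, type-tagged entry point.
sketch_fit_compat(D::Distances) = :from_distances

# Legacy entry point: still accepts `distances`, but warns when it is used.
function sketch_fit_compat(X::AbstractMatrix; distances::Union{Bool,Nothing}=nothing)
    if distances === nothing
        return :from_coordinates              # new-style call, no keyword given
    end
    Base.depwarn("the `distances` keyword is deprecated; " *
                 "wrap the matrix in `Distances` instead", :sketch_fit_compat)
    return distances ? sketch_fit_compat(Distances(X)) : :from_coordinates
end
```

With this shape there is no combination of flags to validate, since the old keyword only ever selects between two paths and the Gram-matrix path is reachable only through its tag type.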
