Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider adding support for repeated-measures error bars/ CIs (following Morey, 2008) #3723

Closed
henrymj opened this issue Jul 3, 2024 · 6 comments

Comments

@henrymj
Copy link

henrymj commented Jul 3, 2024

I (and many psychologists) frequently work with repeated-measure designs, where a given individual participates in multiple conditions. A classic example would be the Stroop task, where participants see trials where the word and the color of the text are the same (congruent), and when they are different (incongruent). Almost everyone is a little bit slower for incongruent trials compared to congruent trials.

To my knowledge, Seaborn does not have a way of removing individual effects from its error-bars to highlight the consistent effects across individuals, though I believe the option would help many psychologists communicate their effects (and any other field which collects multiple measures from one "individual").

Following Loftus & Masson, 1994, and Cousineau, 2005, Morey, 2008 proposed a very simple estimate of confidence intervals that correct for repeated-measures designs. Hopefully it would be an easy feature to add to Seaborn!

Here's one example showing the benefits of this correction, using an R implementation.

Morey, R. D. (2008). Confidence intervals from normalized data: A correction to Cousineau (2005). Tutorials in quantitative methods for psychology, 4(2), 61-64.
https://www.tqmp.org/RegularArticles/vol04-2/p061/p061.pdf

@henrymj henrymj changed the title Consider add support for repeated-measures error bars/ CIs (following Morey, 2008) Consider adding support for repeated-measures error bars/ CIs (following Morey, 2008) Jul 3, 2024
@thuiop
Copy link
Contributor

thuiop commented Jul 10, 2024

This seems a bit specific to your domain. Is there anything preventing you from passing a callable to the errorbar parameter, which allows you to define how the error is computed ?

@henrymj
Copy link
Author

henrymj commented Jul 10, 2024

My impression is that this would be a beneficial feature to almost also social science domains (and any clinical work that compares post-treatments to baselines), but I might be wrong.

I looked into customizable error bar callables using the old API (I haven't migrated yet), but at least in the documentation it appears that the callables should only expect a 1D vector of data, and thus wouldn't be able to make adjustments that require richer vectors with information like a individual identifier. However, it's possible I've just been looking in the wrong place and/or haven't understood how flexible the callables can be.

Here's an example that illustrates the stroop example above - if someone were able to provide a flexible solution to it, I'd be very happy to withdraw the issue!

import numpy as np
import pandas as pd
import seaborn as sns

np.random.seed(0)

# getting group average RTs for the 2 conditions
congruent_rts = np.random.normal(250, 50, 30)
incongruent_shift = np.random.normal(25, 10, 30)
incongruent_rts = congruent_rts + incongruent_shift

df = pd.DataFrame({
    'Condition': ['Congruent'] * 30 + ['Incongruent'] * 30,
    'subject': np.repeat(range(30), 2),
    'RT': np.concatenate([congruent_rts, incongruent_rts])
})

_ = sns.barplot(x='Condition', y='RT', data=df)  # barplots will mask the difference between conditions
Screenshot 2024-07-10 at 1 47 58 PM

@thuiop
Copy link
Contributor

thuiop commented Jul 10, 2024

Hm, indeed I have no obvious solution here (besides doing the statistical work outside of seaborn). I could probably whip up something for the objects interface though (which is probably the way to go; I believe that the changes to the old interface would need to be somewhat heavy for this to work).

@mwaskom
Copy link
Owner

mwaskom commented Jul 11, 2024

I've made plots like these before, IIRC the upstream data transform can be done in a pandas one-liner; I'm not sure why the linked stackoverflow question makes it seem so complicated.

But I am 👎 on adding this in seaborn; even though the data transform is straightforward, exposing a specification API that is sufficiently general for all experimental designs where you would want to use it is not.

Agree it could probably be supported through a plugin stat object.

Thanks for the suggestion.

@mwaskom mwaskom closed this as completed Jul 11, 2024
@thuiop
Copy link
Contributor

thuiop commented Jul 12, 2024

@henrymj Ok, so I built this for the object interface (it is a bit ugly, sorry) :

import numpy as np
import pandas as pd
import seaborn.objects as so

from seaborn._stats.aggregation import Est

class CMEst(Est):
    def __call__(
        self, data, groupby, orient, scales,
    ):
        var = {"x": "y", "y": "x"}[orient]
        means_indiv = data.groupby([orient,"id_var"]).agg(np.mean)[var]
        means_all = data.groupby(orient).agg(np.mean)[var]
        num_cond = None
        def normalize_df(df):
            nonlocal num_cond
            if num_cond is None:
                num_cond = len(df)
            new_df = df.copy()
            i = df.reset_index().loc[0,[orient,"id_var"]]
            partial_mean = means_indiv[i[orient],i["id_var"]]
            full_mean = means_all[i[orient]]
            new_df[var] = new_df[var] - partial_mean  + full_mean
            return new_df
        data = data.groupby([orient,"id_var"],group_keys=False).apply(normalize_df)
        def adjust_variance(df):
            df[var] = df[var].mean() + np.sqrt(num_cond/(num_cond-1)) * (df[var] - df[var].mean())
            return df
        data = groupby.apply(data,adjust_variance)
        return super().__call__(data, groupby, orient, scales)

This is close to a drop-in replacement of so.Est(), except that you need to specify an id_var column which represents the individual you are considering. The example from your stackexchange post would look like :

df = pd.read_csv("DemoWS-30x2.csv")
# Busy work for converting to long-form
df["index"] = df.index
df = pd.wide_to_long(df,[f"activation.{i}" for i in range(1,31)],i="index",j="condition",sep=".").reset_index()
df["index"] = df.index
df = pd.wide_to_long(df,"activation",i="index",j="timepoints",sep=".").reset_index()
p = (
    so.Plot(data=df, x="timepoints", y="activation", color="condition")
    .add(so.Band(), CMEst(),id_var="id")
    #.add(so.Band(),so.Est())
    .add(so.Line(marker="o"), so.Agg())
    .scale(color=so.Nominal(["red","blue"]))
)
p.show()

giving (there are some differences with the provided example though, I believe it might be due to the way the CI is computed, but don't hesitate to check the code as I may have made a mistake)
Figure_1

@henrymj
Copy link
Author

henrymj commented Jul 12, 2024

Thank you so much @thuiop! I'll test out this approach. I really appreciate you taking the time to generate this - hopefully other social scientists will also benefit from this example.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants