tea_tasting.metrics.nonparametric

Metrics for nonparametric analysis.

MannWhitneyU(column, *, alternative=None, correction=None, method='auto', nan_policy='propagate')

Bases: MetricBaseGranular[MannWhitneyUResult]

Metric for nonparametric analysis with the Mann-Whitney U test.

Parameters:

Name Type Description Default
column str

Metric column name.

required
alternative Literal['two-sided', 'greater', 'less'] | None

Alternative hypothesis:

  • "two-sided": the distributions are not equal,
  • "greater": the treatment distribution is stochastically greater than the control distribution,
  • "less": the treatment distribution is stochastically less than the control distribution.
None
correction bool | None

Whether a continuity correction (1/2) should be applied. Only for the asymptotic method. Defaults to the global config value (True).

None
method Literal['auto', 'asymptotic', 'exact']

Method used for p-value calculation:

  • "auto": exact when sample sizes are small and there are no ties; asymptotic otherwise.
  • "asymptotic": normal approximation with tie correction.
  • "exact": exact p-value calculation.
'auto'
nan_policy Literal['propagate', 'omit', 'raise']

Defines how to handle nan values:

  • "propagate": return nan,
  • "omit": ignore nan values,
  • "raise": raise an exception.
'propagate'
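
The asymptotic method and the effect of the correction parameter can be illustrated with a minimal pure-Python sketch of the textbook normal approximation (average ranks, tie-corrected variance, optional 1/2 continuity correction). The helper names `average_ranks`, `normal_sf`, and `mann_whitney_asymptotic` are hypothetical and exist only for this illustration; the library delegates to `scipy.stats.mannwhitneyu`, whose implementation may differ in details.

```python
import math
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def normal_sf(z):
    """Survival function of the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def mann_whitney_asymptotic(treat, contr, alternative="two-sided", correction=True):
    """Return (U for treatment, p-value) using the normal approximation."""
    pooled = list(treat) + list(contr)
    n1, n2 = len(treat), len(contr)
    n = n1 + n2
    ranks = average_ranks(pooled)
    u_treat = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u_contr = n1 * n2 - u_treat

    # Tie correction: each group of t equal values reduces the variance.
    tie_term = sum(t**3 - t for t in Counter(pooled).values())
    sd = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    if sd == 0:  # all values identical: the approximation is undefined
        return u_treat, float("nan")
    mean = n1 * n2 / 2

    if alternative == "greater":
        u = u_treat
    elif alternative == "less":
        u = u_contr
    else:
        u = max(u_treat, u_contr)

    shift = 0.5 if correction else 0.0  # continuity correction
    pvalue = normal_sf((u - mean - shift) / sd)
    if alternative == "two-sided":
        pvalue = min(1.0, 2 * pvalue)
    return u_treat, pvalue
```

Under these assumptions, turning the correction off yields a smaller one-sided p-value than leaving it on, because the correction moves the statistic half a unit toward its mean before standardizing.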
Parameter defaults

Defaults for parameters alternative and correction can be changed using the config_context and set_config functions. See the Global configuration reference for details.
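
The fallback rule (an explicit argument wins; None defers to the global value) can be sketched with a toy model. The `_CONFIG` dictionary, `get_config`, `config_context`, and `resolve` below are simplified stand-ins written for this illustration; they are not the library's implementation, and only mirror the names used in this reference.

```python
from contextlib import contextmanager

# Toy stand-in for the global configuration (assumed defaults for illustration).
_CONFIG = {"alternative": "two-sided", "correction": True}


def get_config(name):
    """Return the current global value of a parameter."""
    return _CONFIG[name]


@contextmanager
def config_context(**overrides):
    """Temporarily override global values and restore them on exit."""
    previous = dict(_CONFIG)
    _CONFIG.update(overrides)
    try:
        yield
    finally:
        _CONFIG.clear()
        _CONFIG.update(previous)


def resolve(value, name):
    """Explicit argument wins; None falls back to the current global value."""
    # `is not None` matters: an explicit False must not trigger the fallback.
    return value if value is not None else get_config(name)


with config_context(alternative="greater"):
    inside = resolve(None, "alternative")      # picks up the override
outside = resolve(None, "alternative")         # back to the global default
explicit = resolve(False, "correction")        # explicit False is kept
```

In the constructor shown in the source code, the fallback is resolved when the metric is created, so a metric constructed inside a context keeps the value that was active at that moment.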

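References

  • Mann-Whitney U test: https://en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U_test
  • scipy.stats.mannwhitneyu, SciPy Manual: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.mannwhitneyu.html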

Examples:

>>> import tea_tasting as tt

>>> data = tt.make_users_data(seed=42, n_users=1000)
>>> experiment = tt.Experiment(
...     revenue_auc=tt.MannWhitneyU("revenue"),
... )
>>> result = experiment.analyze(data)
>>> result
     metric control treatment rel_effect_size rel_effect_size_ci pvalue
revenue_auc   0.472     0.528               -             [-, -] 0.0698

With specific alternative and method:

>>> experiment = tt.Experiment(
...     revenue_auc=tt.MannWhitneyU(
...         "revenue",
...         alternative="greater",
...         method="asymptotic",
...         correction=False,
...     ),
... )
>>> experiment.analyze(data)
     metric control treatment rel_effect_size rel_effect_size_ci pvalue
revenue_auc   0.472     0.528               -             [-, -] 0.0349
Source code in src/tea_tasting/metrics/nonparametric.py
def __init__(
    self,
    column: str,
    *,
    alternative: Literal["two-sided", "greater", "less"] | None = None,
    correction: bool | None = None,
    method: Literal["auto", "asymptotic", "exact"] = "auto",
    nan_policy: Literal["propagate", "omit", "raise"] = "propagate",
) -> None:
    """Metric for nonparametric analysis with the Mann-Whitney U test.

    Args:
        column: Metric column name.
        alternative: Alternative hypothesis:

            - `"two-sided"`: the distributions are not equal,
            - `"greater"`: the treatment distribution is stochastically greater
                than the control distribution,
            - `"less"`: the treatment distribution is stochastically less than
                the control distribution.

        correction: Whether a continuity correction (1/2) should be applied.
            Only for the asymptotic method.
            Defaults to the global config value (`True`).
        method: Method used for p-value calculation:

            - `"auto"`: exact when sample sizes are small and there are no ties;
                asymptotic otherwise.
            - `"asymptotic"`: normal approximation with tie correction.
            - `"exact"`: exact p-value calculation.

        nan_policy: Defines how to handle `nan` values:

            - `"propagate"`: return `nan`,
            - `"omit"`: ignore `nan` values,
            - `"raise"`: raise an exception.

    Parameter defaults:
        Defaults for parameters `alternative` and `correction` can be changed using
        the `config_context` and `set_config` functions.
        See the [Global configuration](https://tea-tasting.e10v.me/api/config/)
        reference for details.

    References:
        - [Mann-Whitney U test](https://en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U_test).
        - [scipy.stats.mannwhitneyu — SciPy Manual](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.mannwhitneyu.html).

    Examples:
        ```pycon
        >>> import tea_tasting as tt

        >>> data = tt.make_users_data(seed=42, n_users=1000)
        >>> experiment = tt.Experiment(
        ...     revenue_auc=tt.MannWhitneyU("revenue"),
        ... )
        >>> result = experiment.analyze(data)
        >>> result
             metric control treatment rel_effect_size rel_effect_size_ci pvalue
        revenue_auc   0.472     0.528               -             [-, -] 0.0698

        ```

        With specific alternative and method:

        ```pycon
        >>> experiment = tt.Experiment(
        ...     revenue_auc=tt.MannWhitneyU(
        ...         "revenue",
        ...         alternative="greater",
        ...         method="asymptotic",
        ...         correction=False,
        ...     ),
        ... )
        >>> experiment.analyze(data)
             metric control treatment rel_effect_size rel_effect_size_ci pvalue
        revenue_auc   0.472     0.528               -             [-, -] 0.0349

        ```
    """
    self.column = tea_tasting.utils.check_scalar(column, "column", typ=str)
    self.alternative: Literal["two-sided", "greater", "less"] = (
        tea_tasting.utils.auto_check(alternative, "alternative")
        if alternative is not None
        else tea_tasting.config.get_config("alternative")
    )
    self.correction = (
        tea_tasting.utils.auto_check(correction, "correction")
        if correction is not None
        else tea_tasting.config.get_config("correction")
    )
    self.method = tea_tasting.utils.check_scalar(
        method,
        "method",
        typ=str,
        in_={"auto", "asymptotic", "exact"},
    )
    self.nan_policy: Literal["propagate", "omit", "raise"]
    self.nan_policy = tea_tasting.utils.check_scalar(
        nan_policy,
        "nan_policy",
        typ=str,
        in_={"propagate", "omit", "raise"},
    )

cols property

Columns to be fetched for a metric analysis.

analyze(data, control, treatment, variant=None)

Analyze a metric in an experiment.

Parameters:

Name       Type                                     Description           Default
data       IntoFrame | Table | dict[object, Table]  Experimental data.    required
control    object                                   Control variant.      required
treatment  object                                   Treatment variant.    required
variant    str | None                               Variant column name.  None

Returns:

Type           Description
MetricResultT  Analysis result.

Source code in src/tea_tasting/metrics/base.py
def analyze(
    self,
    data: (
        narwhals.typing.IntoFrame |
        ibis.expr.types.Table |
        dict[object, pa.Table]
    ),
    control: object,
    treatment: object,
    variant: str | None = None,
) -> MetricResultT:
    """Analyze a metric in an experiment.

    Args:
        data: Experimental data.
        control: Control variant.
        treatment: Treatment variant.
        variant: Variant column name.

    Returns:
        Analysis result.
    """
    tea_tasting.utils.check_scalar(variant, "variant", typ=str | None)
    dfs = read_granular(
        data,
        cols=self.cols,
        variant=variant,
    )
    return self.analyze_granular(
        control=dfs[control],
        treatment=dfs[treatment],
    )

analyze_granular(control, treatment)

Analyze a metric in an experiment using granular data.

Parameters:

Name       Type   Description      Default
control    Table  Control data.    required
treatment  Table  Treatment data.  required

Returns:

Type                Description
MannWhitneyUResult  Analysis result.

Source code in src/tea_tasting/metrics/nonparametric.py
def analyze_granular(
    self,
    control: pa.Table,
    treatment: pa.Table,
) -> MannWhitneyUResult:
    """Analyze a metric in an experiment using granular data.

    Args:
        control: Control data.
        treatment: Treatment data.

    Returns:
        Analysis result.
    """
    contr = _select_as_numpy(control, self.column)
    treat = _select_as_numpy(treatment, self.column)
    contr, treat = _handle_nan_policy(contr, treat, self.nan_policy)
    if len(contr) == 0 or len(treat) == 0:
        return MannWhitneyUResult(
            control=float("nan"),
            treatment=float("nan"),
            effect_size=float("nan"),
            pvalue=float("nan"),
            statistic=float("nan"),
        )

    result = scipy.stats.mannwhitneyu(
        treat,
        contr,
        alternative=self.alternative,
        use_continuity=self.correction,
        method=self.method,
    )
    n_pairs = len(contr) * len(treat)
    treat_auc = float(result.statistic) / n_pairs if n_pairs > 0 else float("nan")
    contr_auc = 1 - treat_auc

    return MannWhitneyUResult(
        control=contr_auc,
        treatment=treat_auc,
        effect_size=treat_auc - contr_auc,
        pvalue=float(result.pvalue),
        statistic=float(result.statistic),
    )
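
The private helper `_handle_nan_policy` is not shown on this page. The sketch below is one plausible reading of the three documented policies, written in plain Python with a hypothetical function name; it assumes that "propagate" hands back empty samples so that the empty-sample branch above reports nan for every field. The actual helper may work differently.

```python
import math


def handle_nan_policy_sketch(contr, treat, nan_policy):
    """Hypothetical illustration of the three documented nan policies."""
    has_nan = any(math.isnan(x) for x in contr) or any(math.isnan(x) for x in treat)
    if not has_nan:
        return list(contr), list(treat)
    if nan_policy == "raise":
        raise ValueError("The input contains nan values.")
    if nan_policy == "omit":
        # Drop nan values from each sample independently.
        return (
            [x for x in contr if not math.isnan(x)],
            [x for x in treat if not math.isnan(x)],
        )
    # "propagate" (assumption): empty samples make the caller return nan results.
    return [], []
```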

MannWhitneyUResult

Bases: NamedTuple

Result of the analysis using the Mann-Whitney U test.

Attributes:

Name         Type   Description
control      float  ROC AUC for control. Probability that a value from control is greater than a value from treatment, plus half the probability that they are equal.
treatment    float  ROC AUC for treatment. Probability that a value from treatment is greater than a value from control, plus half the probability that they are equal.
effect_size  float  Absolute effect size. Difference between treatment and control ROC AUC values.
pvalue       float  P-value.
statistic    float  Mann-Whitney U statistic.
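
The relationship between the U statistic and the ROC AUC values can be checked by counting pairs directly. The following is a small illustrative sketch, not the library's code; the function name `pairwise_auc` is hypothetical, and the quadratic loop is only suitable for tiny samples.

```python
def pairwise_auc(treat, contr):
    """Return (U for treatment, control AUC, treatment AUC, effect size)."""
    # Each (treatment, control) pair scores 1 for a win and 1/2 for a tie.
    wins = sum(1.0 for t in treat for c in contr if t > c)
    ties = sum(0.5 for t in treat for c in contr if t == c)
    u_treat = wins + ties
    n_pairs = len(treat) * len(contr)
    treat_auc = u_treat / n_pairs
    contr_auc = 1 - treat_auc
    return u_treat, contr_auc, treat_auc, treat_auc - contr_auc


print(pairwise_auc([3, 5, 7, 9], [1, 2, 4, 6]))  # (13.0, 0.1875, 0.8125, 0.625)
```

Because the two AUC values sum to 1, the effect size equals twice the treatment AUC minus 1, which matches the example output above (0.528 - 0.472 = 0.056).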