# User guide

## Installation

## Basic usage

Begin with this simple example to understand the basic functionality:

```
import tea_tasting as tt

data = tt.make_users_data(seed=42)

experiment = tt.Experiment(
    sessions_per_user=tt.Mean("sessions"),
    orders_per_session=tt.RatioOfMeans("orders", "sessions"),
    orders_per_user=tt.Mean("orders"),
    revenue_per_user=tt.Mean("revenue"),
)

result = experiment.analyze(data)
print(result)
#>             metric control treatment rel_effect_size rel_effect_size_ci pvalue
#>  sessions_per_user    2.00      1.98          -0.66%      [-3.7%, 2.5%]  0.674
#> orders_per_session   0.266     0.289            8.8%      [-0.89%, 19%] 0.0762
#>    orders_per_user   0.530     0.573            8.0%       [-2.0%, 19%]  0.118
#>   revenue_per_user    5.24      5.73            9.3%       [-2.4%, 22%]  0.123
```

In the following sections, each step of this process is explained in detail.

### Input data

The `make_users_data` function creates synthetic data for demonstration purposes. This data mimics what you might encounter in an A/B test for an online store. Each row represents an individual user, with the following columns:

- `user`: The unique identifier for each user.
- `variant`: The specific variant (e.g., 0 or 1) assigned to each user in the A/B test.
- `sessions`: The total number of the user's sessions.
- `orders`: The total number of the user's orders.
- `revenue`: The total revenue generated by the user.

**tea-tasting** can process data in the form of either a Pandas DataFrame or an Ibis Table. Ibis is a Python package that serves as a DataFrame API to various data backends. It supports 20+ backends, including BigQuery, ClickHouse, DuckDB, Polars, PostgreSQL, Snowflake, and Spark. You can write an SQL query, wrap it as an Ibis Table, and pass it to **tea-tasting**.

Many statistical tests, such as the Student's t-test or the Z-test, require only aggregated data for analysis. For these tests, **tea-tasting** retrieves only aggregated statistics like mean and variance instead of downloading all detailed data. See more details in the guide on data backends.
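To illustrate why aggregated statistics are enough for such tests, here is a minimal sketch of Welch's t-statistic computed only from per-variant counts, means, and variances. The function name `welch_from_aggregates` and the input numbers are hypothetical and are not part of **tea-tasting**; the library's internal computation may differ.

```python
import math


def welch_from_aggregates(n_c, mean_c, var_c, n_t, mean_t, var_t):
    """Return (statistic, degrees of freedom) for Welch's t-test.

    Hypothetical helper: only counts, means, and sample variances
    are needed, so no row-level data has to be downloaded.
    """
    se2_c = var_c / n_c  # squared standard error of the control mean
    se2_t = var_t / n_t  # squared standard error of the treatment mean
    statistic = (mean_t - mean_c) / math.sqrt(se2_c + se2_t)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_c + se2_t) ** 2 / (
        se2_c**2 / (n_c - 1) + se2_t**2 / (n_t - 1)
    )
    return statistic, df


# Made-up aggregates, e.g. the result of a GROUP BY variant query.
stat, df = welch_from_aggregates(
    n_c=100, mean_c=10.0, var_c=4.0,
    n_t=100, mean_t=11.0, var_t=4.0,
)
print(stat, df)
```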

**tea-tasting** assumes that:

- Data is grouped by randomization units, such as individual users.
- There is a column indicating the variant of the A/B test (typically labeled as A, B, etc.).
- All necessary columns for metric calculations (like the number of orders, revenue, etc.) are included in the table.
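If the raw data is stored at a finer granularity, such as one row per session, it needs to be aggregated to one row per randomization unit first. The sketch below uses made-up session rows and plain Python; the row layout is an assumption for illustration, and in practice the aggregation would usually happen in SQL, Pandas, or Ibis.

```python
from collections import defaultdict

# Hypothetical session-level rows: (user, variant, orders, revenue).
sessions = [
    (1, 0, 1, 10.0),
    (1, 0, 0, 0.0),
    (2, 1, 2, 25.0),
    (3, 1, 0, 0.0),
    (3, 1, 1, 7.5),
]

# Aggregate to one row per user, the randomization unit.
users = defaultdict(
    lambda: {"variant": None, "sessions": 0, "orders": 0, "revenue": 0.0}
)
for user, variant, orders, revenue in sessions:
    row = users[user]
    row["variant"] = variant
    row["sessions"] += 1
    row["orders"] += orders
    row["revenue"] += revenue

for user, row in sorted(users.items()):
    print(user, row)
```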

### A/B test definition

The `Experiment` class defines parameters of an A/B test: metrics and a variant column name. There are two ways to define metrics:

- Using keyword parameters, with metric names as parameter names and metric definitions as parameter values, as in the example above.
- Using the first argument, `metrics`, which accepts metrics in the form of a dictionary with metric names as keys and metric definitions as values.

By default, **tea-tasting** assumes that the A/B test variant is stored in a column named `"variant"`. You can change it using the `variant` parameter of the `Experiment` class.

Example usage:

```
experiment = tt.Experiment(
    {
        "sessions per user": tt.Mean("sessions"),
        "orders per session": tt.RatioOfMeans("orders", "sessions"),
        "orders per user": tt.Mean("orders"),
        "revenue per user": tt.Mean("revenue"),
    },
    variant="variant",
)
```

### Metrics

Metrics are instances of metric classes that define how metrics are calculated. These calculations include the effect size, confidence interval, p-value, and other statistics.

Use the `Mean` class to compare averages between variants of an A/B test. For example, the average number of orders per user, where the user is the randomization unit of the experiment. Specify the column containing the metric values using the first parameter, `value`.

Use the `RatioOfMeans` class to compare ratios of averages between variants of an A/B test. For example, the average number of orders per average number of sessions. Specify the columns containing the numerator and denominator values using the parameters `numer` and `denom`.
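A ratio of means is generally not the same as a mean of per-user ratios. The following small sketch uses made-up numbers (not the example data) to show the difference; it doesn't call **tea-tasting**.

```python
# Hypothetical per-user totals.
orders = [1, 0, 4]
sessions = [1, 2, 5]

# Ratio of means: average orders divided by average sessions,
# which equals total orders divided by total sessions.
ratio_of_means = sum(orders) / sum(sessions)

# Mean of per-user ratios weights every user equally instead.
mean_of_ratios = sum(o / s for o, s in zip(orders, sessions)) / len(orders)

print(ratio_of_means, mean_of_ratios)
```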

Use the following parameters of `Mean` and `RatioOfMeans` to customize the analysis:

- `alternative`: Alternative hypothesis. The following options are available:
    - `"two-sided"` (default): the means are unequal.
    - `"greater"`: the mean in the treatment variant is greater than the mean in the control variant.
    - `"less"`: the mean in the treatment variant is less than the mean in the control variant.
- `confidence_level`: Confidence level of the confidence interval. Default is `0.95`.
- `equal_var`: Defines whether equal variance is assumed. If `True`, pooled variance is used for the calculation of the standard error of the difference between two means. Default is `False`.
- `use_t`: Defines whether to use the Student's t-distribution (`True`) or the Normal distribution (`False`). Default is `True`.

Example usage:

```
experiment = tt.Experiment(
    sessions_per_user=tt.Mean("sessions", alternative="greater"),
    orders_per_session=tt.RatioOfMeans("orders", "sessions", confidence_level=0.9),
    orders_per_user=tt.Mean("orders", equal_var=True),
    revenue_per_user=tt.Mean("revenue", use_t=False),
)
```

Look for other supported metrics in the Metrics reference.

You can change default values of these four parameters using the global settings.
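As an illustration of what the `alternative` and `equal_var` parameters control, the sketch below computes a standard error with and without pooled variance and converts a statistic into a p-value under the Normal distribution (the `use_t=False` case). The helper names are hypothetical, and this is a textbook formulation rather than **tea-tasting**'s own code.

```python
import math
from statistics import NormalDist


def standard_error(n_c, var_c, n_t, var_t, equal_var=False):
    """Standard error of the difference between two means."""
    if equal_var:
        # Pooled variance: weighted by degrees of freedom.
        pooled = ((n_c - 1) * var_c + (n_t - 1) * var_t) / (n_c + n_t - 2)
        return math.sqrt(pooled * (1 / n_c + 1 / n_t))
    return math.sqrt(var_c / n_c + var_t / n_t)


def p_value(statistic, alternative="two-sided"):
    """P-value of a statistic under the standard Normal distribution."""
    cdf = NormalDist().cdf
    if alternative == "greater":
        return 1 - cdf(statistic)
    if alternative == "less":
        return cdf(statistic)
    return 2 * (1 - cdf(abs(statistic)))


# Made-up aggregates with unequal group sizes and variances.
se_unpooled = standard_error(50, 1.0, 200, 4.0)
se_pooled = standard_error(50, 1.0, 200, 4.0, equal_var=True)
print(se_unpooled, se_pooled)
print(p_value(1.5), p_value(1.5, "greater"), p_value(1.5, "less"))
```

With unequal group sizes and variances, the two standard errors differ noticeably, which is why the choice of `equal_var` can change the resulting p-value and confidence interval.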

### Analyzing and retrieving experiment results

After defining an experiment and metrics, you can analyze the experiment data using the `analyze` method of the `Experiment` class. This method takes data as input and returns an `ExperimentResult` object with the experiment result.

By default, **tea-tasting** assumes that the variant with the lowest ID is the control. To change this default, pass the `control` parameter to the `analyze` method.

`ExperimentResult` is a mapping. Get a metric's analysis result using the metric name as a key.

```
print(result["orders_per_user"])
#> MeanResult(control=0.5304003954522986, treatment=0.5730905412240769,
#> effect_size=0.04269014577177832, effect_size_ci_lower=-0.010800201598205564,
#> effect_size_ci_upper=0.0961804931417622, rel_effect_size=0.08048664016431273,
#> rel_effect_size_ci_lower=-0.019515294044062048,
#> rel_effect_size_ci_upper=0.19068800612788883, pvalue=0.11773177998716244,
#> statistic=1.5647028839586694)
```

Fields in the result depend on the metric. For `Mean` and `RatioOfMeans`, the fields include:

- `metric`: Metric name.
- `control`: Mean or ratio of means in the control variant.
- `treatment`: Mean or ratio of means in the treatment variant.
- `effect_size`: Absolute effect size. Difference between two means.
- `effect_size_ci_lower`: Lower bound of the absolute effect size confidence interval.
- `effect_size_ci_upper`: Upper bound of the absolute effect size confidence interval.
- `rel_effect_size`: Relative effect size. Difference between two means, divided by the control mean.
- `rel_effect_size_ci_lower`: Lower bound of the relative effect size confidence interval.
- `rel_effect_size_ci_upper`: Upper bound of the relative effect size confidence interval.
- `pvalue`: P-value.
- `statistic`: Statistic (standardized effect size).
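The relationship between `control`, `treatment`, `effect_size`, and `rel_effect_size` can be checked by hand. The sketch below plugs in the `orders_per_user` values from the `MeanResult` output above; it only restates the definitions from the list and doesn't call **tea-tasting**.

```python
# Values copied from the MeanResult output for orders_per_user.
control = 0.5304003954522986
treatment = 0.5730905412240769

effect_size = treatment - control        # absolute difference of means
rel_effect_size = effect_size / control  # difference relative to control

print(effect_size, rel_effect_size)
```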

`ExperimentResult` provides the following methods to serialize and view the experiment result:

- `to_dicts`: Convert the result to a sequence of dictionaries.
- `to_pandas`: Convert the result to a Pandas DataFrame.
- `to_pretty`: Convert the result to a Pandas DataFrame with formatted values (as strings).
- `to_string`: Convert the result to a string.
- `to_html`: Convert the result to HTML.

`print(result)` is the same as `print(result.to_string())`.

```
print(result)
#>             metric control treatment rel_effect_size rel_effect_size_ci pvalue
#>  sessions_per_user    2.00      1.98          -0.66%      [-3.7%, 2.5%]  0.674
#> orders_per_session   0.266     0.289            8.8%      [-0.89%, 19%] 0.0762
#>    orders_per_user   0.530     0.573            8.0%       [-2.0%, 19%]  0.118
#>   revenue_per_user    5.24      5.73            9.3%       [-2.4%, 22%]  0.123
```

By default, the methods `to_pretty`, `to_string`, and `to_html` return a predefined list of attributes. This list can be customized:

```
print(result.to_string(names=(
    "control",
    "treatment",
    "effect_size",
    "effect_size_ci",
)))
#>             metric control treatment effect_size     effect_size_ci
#>  sessions_per_user    2.00      1.98     -0.0132  [-0.0750, 0.0485]
#> orders_per_session   0.266     0.289      0.0233 [-0.00246, 0.0491]
#>    orders_per_user   0.530     0.573      0.0427  [-0.0108, 0.0962]
#>   revenue_per_user    5.24      5.73       0.489     [-0.133, 1.11]
```

In Jupyter and IPython, the output of the line `result` is a rendered HTML table.

## More features

### Variance reduction with CUPED/CUPAC

**tea-tasting** supports variance reduction with CUPED/CUPAC in both the `Mean` and `RatioOfMeans` classes.

Example usage:

```
import tea_tasting as tt

data = tt.make_users_data(seed=42, covariates=True)

experiment = tt.Experiment(
    sessions_per_user=tt.Mean("sessions", "sessions_covariate"),
    orders_per_session=tt.RatioOfMeans(
        numer="orders",
        denom="sessions",
        numer_covariate="orders_covariate",
        denom_covariate="sessions_covariate",
    ),
    orders_per_user=tt.Mean("orders", "orders_covariate"),
    revenue_per_user=tt.Mean("revenue", "revenue_covariate"),
)

result = experiment.analyze(data)
print(result)
#>             metric control treatment rel_effect_size rel_effect_size_ci  pvalue
#>  sessions_per_user    2.00      1.98          -0.68%      [-3.2%, 1.9%]   0.603
#> orders_per_session   0.262     0.293             12%        [4.2%, 21%] 0.00229
#>    orders_per_user   0.523     0.581             11%        [2.9%, 20%] 0.00733
#>   revenue_per_user    5.12      5.85             14%        [3.8%, 26%] 0.00675
```

Set the `covariates` parameter of the `make_users_data` function to `True` to add the following columns with pre-experimental data:

- `sessions_covariate`: Number of sessions before the experiment.
- `orders_covariate`: Number of orders before the experiment.
- `revenue_covariate`: Revenue before the experiment.

Define the metrics' covariates:

- In `Mean`, specify the covariate using the `covariate` parameter.
- In `RatioOfMeans`, specify the covariates for the numerator and denominator using the `numer_covariate` and `denom_covariate` parameters, respectively.
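For intuition, the core CUPED adjustment subtracts the part of the metric that is linearly explained by the pre-experiment covariate. The following is a generic, minimal sketch with made-up numbers and a hypothetical `cuped_adjust` helper; it is not **tea-tasting**'s implementation, which may differ in details.

```python
from statistics import mean, variance


def cuped_adjust(values, covariate):
    """Return CUPED-adjusted values: y - theta * (x - mean(x))."""
    mean_y, mean_x = mean(values), mean(covariate)
    # Sample covariance of the metric and the covariate.
    cov_xy = sum(
        (y - mean_y) * (x - mean_x) for y, x in zip(values, covariate)
    ) / (len(values) - 1)
    theta = cov_xy / variance(covariate)
    return [y - theta * (x - mean_x) for y, x in zip(values, covariate)]


# Made-up data: orders during and before the experiment.
orders = [2.0, 0.0, 3.0, 1.0, 4.0, 0.0]
orders_covariate = [1.0, 0.0, 3.0, 1.0, 3.0, 1.0]

adjusted = cuped_adjust(orders, orders_covariate)
# The mean is preserved while the variance shrinks.
print(mean(orders), mean(adjusted))
print(variance(orders), variance(adjusted))
```

The stronger the correlation between the metric and its covariate, the larger the variance reduction, and the narrower the resulting confidence intervals.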

### Sample ratio mismatch check

The `SampleRatio` class in **tea-tasting** detects mismatches in the sample ratios of different variants of an A/B test.

Example usage:

```
import tea_tasting as tt

experiment = tt.Experiment(
    sample_ratio=tt.SampleRatio(),
)

data = tt.make_users_data(seed=42)
result = experiment.analyze(data)
print(result.to_string(("control", "treatment", "pvalue")))
#>       metric control treatment pvalue
#> sample_ratio    2023      1977  0.477
```

By default, `SampleRatio` expects an equal number of observations across all variants. To specify a different ratio, use the `ratio` parameter. It accepts two types of values:

- Ratio of the number of observations in treatment relative to control, as a positive number. Example: `SampleRatio(0.5)`.
- A dictionary with variants as keys and expected ratios as values. Example: `SampleRatio({"A": 2, "B": 1})`.

The `method` parameter determines the statistical test to apply:

- `"auto"`: Apply the exact binomial test if the total number of observations is less than 1000, or the normal approximation otherwise.
- `"binom"`: Apply the exact binomial test.
- `"norm"`: Apply the normal approximation of the binomial distribution.
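As a rough illustration of the `"norm"` method, the sketch below applies a normal approximation of the binomial distribution to the counts from the example above. The continuity correction and the helper name `srm_pvalue_norm` are assumptions of this sketch, not a statement about how `SampleRatio` is implemented.

```python
import math
from statistics import NormalDist


def srm_pvalue_norm(n_control, n_treatment, ratio=1.0):
    """Two-sided p-value for a sample ratio mismatch (normal approximation).

    `ratio` is the expected treatment-to-control ratio of observations.
    """
    n = n_control + n_treatment
    p = ratio / (1 + ratio)  # expected share of treatment observations
    expected = n * p
    sd = math.sqrt(n * p * (1 - p))
    # Continuity correction: shrink the deviation by 0.5 (an assumption).
    deviation = max(abs(n_treatment - expected) - 0.5, 0.0)
    return 2 * (1 - NormalDist().cdf(deviation / sd))


pvalue = srm_pvalue_norm(2023, 1977)
print(pvalue)
```

With 2023 and 1977 observations and an expected 1:1 split, this gives a p-value of roughly 0.477, in line with the output above.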

The result of the sample ratio mismatch check includes the following attributes:

- `metric`: Metric name.
- `control`: Number of observations in control.
- `treatment`: Number of observations in treatment.
- `pvalue`: P-value.

### Global settings

In **tea-tasting**, you can change defaults for the following parameters:

- `alternative`: Alternative hypothesis.
- `confidence_level`: Confidence level of the confidence interval.
- `equal_var`: If `False`, assume unequal population variances in calculation of the standard deviation and the number of degrees of freedom. Otherwise, assume equal population variance and calculate pooled standard deviation.
- `n_resamples`: The number of resamples performed to form the bootstrap distribution of a statistic.
- `use_t`: If `True`, use Student's t-distribution in p-value and confidence interval calculations. Otherwise use Normal distribution.
- And more.

Use `get_config` with the option name as a parameter to get a global option value. Use `get_config` without parameters to get a dictionary of global options.

Use `set_config` to set a global option value:

```
import tea_tasting as tt

tt.set_config(equal_var=True, use_t=False)

experiment = tt.Experiment(
    sessions_per_user=tt.Mean("sessions"),
    orders_per_session=tt.RatioOfMeans("orders", "sessions"),
    orders_per_user=tt.Mean("orders"),
    revenue_per_user=tt.Mean("revenue"),
)

experiment.metrics["orders_per_user"]
#> Mean(value='orders', covariate=None, alternative='two-sided',
#> confidence_level=0.95, equal_var=True, use_t=False)
```

Use `config_context` to temporarily set a global option value within a context:

```
with tt.config_context(equal_var=True, use_t=False):
    experiment = tt.Experiment(
        sessions_per_user=tt.Mean("sessions"),
        orders_per_session=tt.RatioOfMeans("orders", "sessions"),
        orders_per_user=tt.Mean("orders"),
        revenue_per_user=tt.Mean("revenue"),
    )

experiment.metrics["orders_per_user"]
#> Mean(value='orders', covariate=None, alternative='two-sided',
#> confidence_level=0.95, equal_var=True, use_t=False)
```
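To show the semantics of a temporary override, here is a minimal standalone sketch of a global options store with a context manager, built on `contextlib`. All names here (`_CONFIG`, `get_option`, `set_options`, `options_context`) are hypothetical and only mimic the behavior described above; this is not **tea-tasting**'s implementation.

```python
import contextlib

# Hypothetical global options store.
_CONFIG = {"equal_var": False, "use_t": True}


def get_option(name=None):
    """Return one option, or a copy of all options if no name is given."""
    return dict(_CONFIG) if name is None else _CONFIG[name]


def set_options(**options):
    """Permanently update global options."""
    _CONFIG.update(options)


@contextlib.contextmanager
def options_context(**options):
    """Temporarily override options, restoring them on exit."""
    previous = dict(_CONFIG)
    _CONFIG.update(options)
    try:
        yield
    finally:
        _CONFIG.clear()
        _CONFIG.update(previous)


with options_context(equal_var=True):
    inside = get_option("equal_var")
outside = get_option("equal_var")
print(inside, outside)  # True False
```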

### More than two variants

In **tea-tasting**, it's possible to analyze experiments with more than two variants. However, the variants will be compared in pairs through two-sample statistical tests.

How variant pairs are determined:

- Default control variant: When the `control` parameter of the `analyze` method is set to `None`, **tea-tasting** automatically compares each variant pair. The variant with the lowest ID in each pair is the control.
- Specified control variant: If a specific variant is set as `control`, it is then compared against each of the other variants.
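The two rules above can be sketched in a few lines of plain Python. The `variant_pairs` helper is hypothetical and only illustrates which (control, treatment) pairs result from each rule; it is not part of **tea-tasting**.

```python
from itertools import combinations


def variant_pairs(variants, control=None):
    """Return (control, treatment) pairs following the rules above."""
    ordered = sorted(variants)
    if control is None:
        # Every pair; the lower ID in each pair acts as control.
        return list(combinations(ordered, 2))
    # A fixed control is compared against each other variant.
    return [(control, variant) for variant in ordered if variant != control]


print(variant_pairs([0, 1, 2]))             # [(0, 1), (0, 2), (1, 2)]
print(variant_pairs([0, 1, 2], control=1))  # [(1, 0), (1, 2)]
```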

The result of the analysis is a dictionary of `ExperimentResult` objects with tuples (control, treatment) as keys.

Keep in mind that **tea-tasting** does not adjust for multiple comparisons. When dealing with multiple variant pairs, additional steps may be necessary to account for this, depending on your analysis needs.
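One common and conservative option is the Bonferroni correction, which multiplies each p-value by the number of comparisons. The sketch below is a generic illustration with made-up p-values and a hypothetical `bonferroni` helper; whether this or another procedure is appropriate depends on your analysis.

```python
def bonferroni(pvalues):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(pvalues)
    return [min(p * m, 1.0) for p in pvalues]


# Made-up p-values for three variant pairs.
pvalues = [0.01, 0.04, 0.5]
adjusted = bonferroni(pvalues)
print(adjusted)
```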