tea-tasting: statistical analysis of A/B tests
tea-tasting is a Python package for the statistical analysis of A/B tests. It features:
- Student's t-test, Z-test, Bootstrap, and quantile metrics out of the box.
- Extensible API: define and use statistical tests of your choice.
- Delta method for ratio metrics.
- Variance reduction with CUPED/CUPAC (also in combination with the delta method for ratio metrics).
- Confidence intervals for both absolute and percentage change.
- Sample ratio mismatch check.
- Power analysis.
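The variance-reduction idea behind CUPED can be sketched in plain Python. This is a minimal illustration under stated assumptions, not tea-tasting's implementation: the helper name `cuped_adjust` and the synthetic pre-/post-experiment data are hypothetical, and the coefficient is estimated on a single pooled sample for simplicity.

```python
import random
import statistics


def cuped_adjust(metric, covariate):
    """Return CUPED-adjusted values: y - theta * (x - mean(x)).

    theta = cov(y, x) / var(x) is the coefficient that minimizes
    the variance of the adjusted metric.
    """
    mean_y = statistics.fmean(metric)
    mean_x = statistics.fmean(covariate)
    cov_xy = sum((y - mean_y) * (x - mean_x) for y, x in zip(metric, covariate))
    var_x = sum((x - mean_x) ** 2 for x in covariate)
    theta = cov_xy / var_x
    return [y - theta * (x - mean_x) for y, x in zip(metric, covariate)]


# Hypothetical data: a pre-experiment covariate that is strongly
# correlated with the metric observed during the experiment.
rng = random.Random(42)
pre = [rng.gauss(10, 3) for _ in range(1000)]
post = [x + rng.gauss(0, 1) for x in pre]

adjusted = cuped_adjust(post, pre)
print(statistics.variance(post), statistics.variance(adjusted))
```

The adjustment leaves the mean unchanged and shrinks the variance by roughly the squared correlation between metric and covariate, which is what tightens the confidence intervals.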
tea-tasting calculates statistics directly within data backends such as BigQuery, ClickHouse, PostgreSQL, Snowflake, Spark, and 20+ other backends supported by Ibis. This approach eliminates the need to import granular data into a Python environment, though Pandas DataFrames are also supported.
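To illustrate why row-level data need not leave the backend: a difference-in-means test needs only per-variant counts, means, and variances, which any SQL engine can aggregate. The sketch below is an assumption-laden illustration, not tea-tasting's internal code. The function name `z_test_from_aggregates` and the input numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test_from_aggregates(n_c, mean_c, var_c, n_t, mean_t, var_t):
    """Two-sided Z-test for a difference in means from summary statistics only."""
    std_err = sqrt(var_c / n_c + var_t / n_t)
    z = (mean_t - mean_c) / std_err
    pvalue = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, pvalue


# Hypothetical aggregates, as a backend query such as
# SELECT variant, COUNT(*), AVG(orders), VAR_SAMP(orders) ... GROUP BY variant
# might return them.
z, pvalue = z_test_from_aggregates(
    n_c=2000, mean_c=0.530, var_c=0.75,
    n_t=2000, mean_t=0.573, var_t=0.80,
)
print(z, pvalue)
```

Only six numbers cross the wire regardless of how many rows the experiment table holds.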
Check out the blog post explaining the advantages of using tea-tasting for the analysis of A/B tests.
Installation
pip install tea-tasting
Basic example
import tea_tasting as tt

data = tt.make_users_data(seed=42)

experiment = tt.Experiment(
    sessions_per_user=tt.Mean("sessions"),
    orders_per_session=tt.RatioOfMeans("orders", "sessions"),
    orders_per_user=tt.Mean("orders"),
    revenue_per_user=tt.Mean("revenue"),
)

result = experiment.analyze(data)
print(result)
#>             metric control treatment rel_effect_size rel_effect_size_ci pvalue
#>  sessions_per_user    2.00      1.98          -0.66%      [-3.7%, 2.5%]  0.674
#> orders_per_session   0.266     0.289            8.8%      [-0.89%, 19%] 0.0762
#>    orders_per_user   0.530     0.573            8.0%       [-2.0%, 19%]  0.118
#>   revenue_per_user    5.24      5.73            9.3%       [-2.4%, 22%]  0.123
Learn more in the detailed user guide. Additionally, see the guides on data backends, power analysis, and custom metrics.
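The `orders_per_session` metric in the example is a ratio of two per-user means, so its variance cannot be taken from a single column. The delta method approximates it from the means, variances, and covariance of the numerator and denominator. The following is a minimal sketch of that approximation, not tea-tasting's code. The function name `ratio_of_means_variance` and the toy data are hypothetical.

```python
import statistics


def ratio_of_means_variance(numer, denom):
    """Delta-method estimate of the ratio of means and the variance of that ratio.

    Var(mean(n) / mean(d)) is approximated by
    (var_n - 2 * R * cov_nd + R**2 * var_d) / (mean_d**2 * N),
    where R = mean_n / mean_d and N is the number of users.
    """
    n_obs = len(numer)
    mean_n = statistics.fmean(numer)
    mean_d = statistics.fmean(denom)
    var_n = statistics.variance(numer)
    var_d = statistics.variance(denom)
    cov_nd = sum(
        (a - mean_n) * (b - mean_d) for a, b in zip(numer, denom)
    ) / (n_obs - 1)
    ratio = mean_n / mean_d
    var_ratio = (var_n - 2 * ratio * cov_nd + ratio**2 * var_d) / (mean_d**2 * n_obs)
    return ratio, var_ratio


# Hypothetical per-user data: orders and sessions for six users.
orders = [0, 1, 0, 2, 1, 0]
sessions = [1, 2, 1, 3, 2, 1]
ratio, var_ratio = ratio_of_means_variance(orders, sessions)
print(ratio, var_ratio)
```

The covariance term matters: users with more sessions tend to place more orders, and ignoring that correlation would misstate the standard error.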
Roadmap
- Multiple hypotheses testing:
    - Family-wise error rate: Holm–Bonferroni method.
    - False discovery rate: Benjamini–Hochberg procedure.
- A/A tests and simulations.
- More statistical tests:
    - Asymptotic and exact tests for frequency data.
    - Mann–Whitney U test.
- Sequential testing: always valid p-value with mSPRT.
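For readers unfamiliar with the two multiple-testing corrections named in the roadmap, here is a rough standalone sketch of both. It is not part of tea-tasting and says nothing about how the package will eventually expose them. The function names are hypothetical, and the second list of p-values is made up for illustration.

```python
def holm_bonferroni(pvalues, alpha=0.05):
    """Holm-Bonferroni step-down procedure (controls the family-wise error rate).

    Returns a list of booleans, True where the hypothesis is rejected.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # Compare the (rank + 1)-th smallest p-value with alpha / (m - rank).
        if pvalues[i] > alpha / (m - rank):
            break  # stop at the first non-rejection
        reject[i] = True
    return reject


def benjamini_hochberg(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (controls the false discovery rate)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k such that p_(k) <= k / m * alpha.
    max_k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * alpha:
            max_k = rank
    reject = [False] * m
    for i in order[:max_k]:
        reject[i] = True
    return reject


# p-values from the basic example above: nothing is rejected either way.
example_pvalues = [0.674, 0.0762, 0.118, 0.123]
print(holm_bonferroni(example_pvalues), benjamini_hochberg(example_pvalues))

# Made-up p-values where the two procedures disagree.
made_up = [0.01, 0.04, 0.03, 0.005]
print(holm_bonferroni(made_up), benjamini_hochberg(made_up))
```

Holm–Bonferroni is the stricter of the two because it bounds the probability of any false positive, whereas Benjamini–Hochberg bounds only the expected share of false positives among rejections.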
Package name
The package name "tea-tasting" is a play on words that refers to two subjects:
- Lady tasting tea is a famous experiment devised by Ronald Fisher. In it, Fisher developed the null hypothesis significance testing framework to analyze a lady's claim that she could tell whether the tea or the milk had been added to the cup first.
- "tea-tasting" phonetically resembles "t-testing" or Student's t-test, a statistical test developed by William Gosset.