🚧 Note: The data warehouse integration is currently in
beta
. We are keen to gather as much feedback as possible so if you try this out, please let us know. You can email daniel.b@posthog.com, send feedback via the in-app support panel, or use one of our other support options.
Evaluating experiment metrics always depends on events. They rely on something happening at a certain point in time. If one of your data warehouse tables includes event-like data, you can use it as a primary or secondary metric for your trends experiment.
To use a data warehouse table with an experiment, you'll first need to join the events
table to your data warehouse table:
Navigate to the Data warehouse tab and click on Add join from the triple dot menu next to your table.
Join the
events
table to your data warehouse table:Under Source Table Key, specify a column that holds the value of the
distinct_id
present for the$feature_flag_called
event.Check Optimize for Experiments to ensure only the most recent matching event is joined to your table.
Under Source Timestamp Key, specify a column that represents the timestamp of the table row. It will be compared with the event timestamp to determine the most recent
$feature_flag_called
event for the row.
Once you've joined the events
table to your data warehouse table, you can use the data warehouse table as a primary or secondary metric for your experiment:
When picking your metrics, click on the Data warehouse tables category and select your table.
Specify the columns in your data warehouse table that represent the unique ID, the distinct ID, and the timestamp.
When you select a data warehouse table as a primary or secondary metric, PostHog sees it as an 'events-like' table and is thus able to use it to calculate results.
The PostHog uses the join to the events
table to determine which variant a given data warehouse table row is associated with. For trends experiments, the most recent $feature_flag_called
event before the row represents their active variant assignment:
Data Warehouse: timestamp=2024-01-03 12:00, distinct_id=user1Events:- 2024-01-02 12:00: (user1, variant='control') <- This event will be joined- 2024-01-03 18:00: (user1, variant='test') <- Ignored (occurs after data warehouse timestamp)