sequence_match

Aggregate function that checks whether a sequence of events matches a pattern. Uses a mini-regex syntax over condition references, executed by an NFA engine.

Signature

sequence_match(pattern VARCHAR, timestamp TIMESTAMP,
               cond1 BOOLEAN, cond2 BOOLEAN [, ...]) -> BOOLEAN

Parameters:

ParameterTypeDescription
patternVARCHARPattern string using the syntax described below
timestampTIMESTAMPEvent timestamp
cond1..condNBOOLEANEvent conditions (2 to 32)

Returns: BOOLEAN -- true if the event stream contains a subsequence matching the pattern, false otherwise.

Usage

-- Did the user view a product and then purchase?
SELECT user_id,
  sequence_match('(?1).*(?2)', event_time,
    event_type = 'view',
    event_type = 'purchase'
  ) as converted
FROM events
GROUP BY user_id;

Pattern Syntax

Patterns are composed of the following elements:

PatternDescription
(?N)Match an event where condition N (1-indexed) is true
.Match exactly one event (any conditions)
.*Match zero or more events (any conditions)
(?t>=N)Time constraint: at least N seconds since previous match
(?t<=N)Time constraint: at most N seconds since previous match
(?t>N)Time constraint: more than N seconds since previous match
(?t<N)Time constraint: less than N seconds since previous match
(?t==N)Time constraint: exactly N seconds since previous match
(?t!=N)Time constraint: not exactly N seconds since previous match

Pattern Examples

-- View then purchase (any events in between)
'(?1).*(?2)'

-- View then purchase with no intervening events
'(?1)(?2)'

-- View, exactly one event, then purchase
'(?1).(?2)'

-- View, then purchase within 1 hour
'(?1).*(?t<=3600)(?2)'

-- Three-step sequence with time constraints
'(?1).*(?t<=3600)(?2).*(?t<=7200)(?3)'

Behavior

  1. Events are sorted by timestamp.
  2. The pattern is compiled into an NFA (nondeterministic finite automaton).
  3. The NFA is executed against the event stream using lazy matching semantics for .* (prefer advancing the pattern over consuming additional events).
  4. Returns true if any accepting state is reached.

Time constraints are evaluated relative to the timestamp of the previously matched condition step. The time difference is computed in seconds, matching ClickHouse semantics.

Implementation

OperationComplexity
UpdateO(1) amortized (event append)
CombineO(m) where m = events in other state
FinalizeO(n * s) NFA execution, where n = events, s = pattern steps
SpaceO(n) -- all collected events

At benchmark scale, sequence_match processes 100 million events in 1.05 s (95 Melem/s).

See Also