Generating custom strategies with hypothesis.strategies
Property-based testing shifts the paradigm from example-driven assertions to invariant validation. While Hypothesis ships with a robust standard library of primitives, production-grade systems inevitably encounter domain models that require cross-field correlation, strict business invariants, or complex type hierarchies. Learning how to programmatically construct, register, and debug custom strategies is the definitive skill separating basic test coverage from resilient, self-healing test suites. This guide details the architectural patterns, diagnostic workflows, and CI optimization techniques required for generating custom strategies with hypothesis.strategies in modern Python codebases.
When and Why to Generate Custom Strategies
The architectural boundary between built-in strategies and custom implementations is defined by constraint density and generation efficiency. Built-in primitives like st.integers(), st.text(), and st.dictionaries() operate on independent, uniformly distributed domains. They excel at fuzzing generic algorithms but degrade rapidly when applied to tightly coupled domain objects. When business rules dictate that start_date <= end_date, currency_code must match locale, or nested payloads must maintain referential integrity, naive composition triggers excessive rejection sampling. Hypothesis’s rejection sampler discards invalid examples and retries, but when rejection rates exceed 15%, generation throughput collapses, shrinking becomes non-deterministic, and CI pipelines stall.
Custom strategies solve this by shifting validation from post-generation filtering to pre-generation routing. Instead of generating arbitrary data and hoping it satisfies invariants, you construct strategies that natively produce valid states. This requires understanding strategy composition versus inheritance. Hypothesis strategies are immutable, lazy-evaluated generators; they do not support subclassing. Instead, you compose them functionally using combinators. The core API surface for this work includes @st.composite for imperative, stateful generation; st.builds for declarative constructor mapping; and st.register_type_strategy for automated type resolution.
When designing these abstractions, map business rules directly to Hypothesis primitives. If a field is an enum, use st.sampled_from(). If a value is constant in certain contexts, use st.just(). If multiple valid structural shapes exist, use st.one_of(). By aligning your generation logic with the underlying constraint solver, you eliminate the performance bottlenecks that plague naive property tests. For foundational testing paradigms and broader architectural context, refer to Property-Based & Fuzz Testing Strategies before diving into advanced composition patterns.
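These mappings can be sketched concretely; the Tier enum and the version string below are illustrative placeholders, not names from any particular codebase:

```python
import enum

import hypothesis.strategies as st


class Tier(enum.Enum):  # hypothetical enum for illustration
    FREE = "free"
    PRO = "pro"


# Enum field -> st.sampled_from()
tiers = st.sampled_from(Tier)

# Contextually constant field -> st.just()
api_version = st.just("2024-01")

# Multiple valid structural shapes -> st.one_of()
payload_ids = st.one_of(st.integers(min_value=1), st.uuids().map(str))
```

Each combinator encodes the constraint directly, so no generated value is ever discarded.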
Composing Strategies with @st.composite and st.builds
The @st.composite decorator transforms a standard Python function into a strategy factory. Inside a composite function, you call draw(strategy) on the injected draw callable to consume values from underlying strategies sequentially. This enables stateful generation, conditional branching, and cross-field validation without triggering the rejection sampler. Each draw() call advances Hypothesis’s internal DataTree, allowing the shrinking engine to trace dependencies and minimize failures deterministically.
Contrast this with st.builds(target, **kwargs), which declaratively maps keyword arguments to strategies and invokes target with the generated values. st.builds is ideal for pure constructors where fields are independent. It automatically resolves type hints via st.from_type() and handles typing.Optional gracefully. However, st.builds cannot enforce cross-field constraints. When you need field_b > field_a, @st.composite becomes mandatory.
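A minimal st.builds() sketch for a constructor whose fields are genuinely independent (the RetryPolicy dataclass is hypothetical):

```python
from dataclasses import dataclass

import hypothesis.strategies as st


@dataclass
class RetryPolicy:  # hypothetical config object for illustration
    attempts: int
    backoff_seconds: float


# Independent fields: declarative st.builds is sufficient,
# no cross-field constraint requires @st.composite here
retry_policies = st.builds(
    RetryPolicy,
    attempts=st.integers(min_value=1, max_value=10),
    backoff_seconds=st.floats(min_value=0.1, max_value=60.0, allow_nan=False),
)
```

If a rule like backoff_seconds must scale with attempts were added, this would need to migrate to @st.composite.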
Generator overhead and lazy evaluation are critical considerations. Strategies are not evaluated until draw() is called. This means you can define complex conditional routing without incurring upfront costs. Use st.one_of() to branch generation paths based on domain probabilities, and st.sampled_from() to restrict categorical domains. Avoid placing assume() calls after expensive draws; instead, route generation early.
```python
from datetime import date, timedelta
from typing import Protocol

import hypothesis.strategies as st
from hypothesis import Phase, Verbosity, assume, given, settings


class TimeRange(Protocol):
    start_date: date
    end_date: date
    duration_days: int


@st.composite
def valid_time_ranges(draw: st.DrawFn) -> dict[str, date | int]:
    """
    Generates time ranges where start_date <= end_date.
    Uses conditional routing to avoid .filter() rejection overhead.
    """
    # Draw start date from a bounded domain for CI stability
    start = draw(st.dates(min_value=date(2020, 1, 1), max_value=date(2024, 12, 31)))

    # Conditionally route end_date generation: one branch yields a
    # later date within a year, the other the same day
    end_strategy = st.one_of(
        st.dates(min_value=start, max_value=start + timedelta(days=365)),
        st.just(start),
    )
    end = draw(end_strategy)

    # Explicit assume() only for edge-case safety, not primary routing
    assume(end >= start)

    return {
        "start_date": start,
        "end_date": end,
        "duration_days": (end - start).days,
    }


@given(time_range=valid_time_ranges())
@settings(
    max_examples=200,
    phases=[Phase.generate, Phase.shrink],
    verbosity=Verbosity.normal,
    database=None,  # Isolate for deterministic runs
)
def test_time_range_invariants(time_range: dict[str, date | int]) -> None:
    assert time_range["start_date"] <= time_range["end_date"]
    assert time_range["duration_days"] >= 0
```
The draw() lifecycle isolates side effects. Each call to a composite function receives a fresh Data object, ensuring test isolation. When designing conditional routing, prefer st.one_of() over assume() whenever possible. assume() triggers Hypothesis’s rejection sampler, which increases generation latency and complicates shrinking. By pre-filtering branches, you guarantee that every generated example is structurally valid, allowing the shrinking engine to focus on semantic failures rather than constraint violations.
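The routing-versus-rejection tradeoff can be made concrete with a deliberately simple domain (even integers, chosen purely for illustration):

```python
import hypothesis.strategies as st

# Rejection-based: roughly half of all candidate draws are discarded
# and retried by the rejection sampler
evens_filtered = st.integers(min_value=0, max_value=100).filter(
    lambda x: x % 2 == 0
)

# Routed: every draw is valid by construction, so nothing is rejected
# and the shrinker operates on a dense, fully valid domain
evens_routed = st.integers(min_value=0, max_value=50).map(lambda x: x * 2)
```

Both strategies describe the same set of values; only the second one guarantees a zero rejection rate.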
Type Registration and Automatic Resolution
Hypothesis’s type resolution engine bridges static typing and runtime generation. st.from_type() automatically infers strategies from PEP 484 annotations, but it requires explicit registration for custom classes, Pydantic models, or attrs dataclasses. st.register_type_strategy(type_, strategy) binds a strategy to a type globally, enabling automatic resolution across your test suite.
Registration order matters. Hypothesis resolves types by walking the Method Resolution Order (MRO) and checking registered strategies before falling back to built-in inference. When registering strategies for complex type hierarchies, avoid namespace pollution by scoping registrations to test modules or using pytest fixtures that yield temporary registrations. For complex type hierarchies and automated strategy discovery, consult Advanced Property-Based Testing.
```python
from dataclasses import dataclass, field
from typing import Literal, Optional

import hypothesis.strategies as st
from hypothesis import Phase, given, settings


@dataclass
class UserConfig:
    username: str
    tier: Literal["free", "pro", "enterprise"]
    max_requests: Optional[int] = None
    metadata: dict[str, str] = field(default_factory=dict)


def user_config_strategy() -> st.SearchStrategy[UserConfig]:
    return st.builds(
        UserConfig,
        # Restrict the alphabet up front rather than using
        # .filter(str.isalnum), which would reject most candidates
        username=st.text(
            alphabet="abcdefghijklmnopqrstuvwxyz0123456789",
            min_size=3,
            max_size=20,
        ),
        tier=st.sampled_from(["free", "pro", "enterprise"]),
        max_requests=st.one_of(st.none(), st.integers(min_value=100, max_value=10000)),
        metadata=st.dictionaries(
            keys=st.text(min_size=1, max_size=15),
            values=st.text(max_size=50),
        ),
    )


# Register globally so st.from_type(UserConfig) resolves automatically
st.register_type_strategy(UserConfig, user_config_strategy())


@given(config=st.from_type(UserConfig))
@settings(max_examples=100, phases=[Phase.generate, Phase.shrink])
def test_user_config_type_resolution(config: UserConfig) -> None:
    assert config.username.isalnum()
    assert config.tier in {"free", "pro", "enterprise"}
    if config.max_requests is not None:
        assert config.max_requests >= 100
```
When resolving typing.Annotated or typing.Literal, Hypothesis extracts constraints automatically. However, circular type resolution occurs when A references B and B references A without explicit termination. Mitigate this by using st.recursive() with its max_leaves bound, by adding explicit depth counters, or by breaking circular dependencies with st.just() placeholders during generation. Integration with static type checkers (mypy, pyright) remains seamless because st.from_type() respects runtime type hints without modifying source signatures. Always prefer local registration in test modules to avoid polluting the global strategy cache in large monorepos.
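As a counterpart to manual depth counters, bounded recursion with st.recursive() and its max_leaves cap can be sketched like this (the JSON-like shape is purely illustrative):

```python
import hypothesis.strategies as st

# Base case: scalar leaves terminate every branch
leaves = st.none() | st.booleans() | st.integers() | st.text(max_size=10)

# Extend step: wrap children in lists or dicts;
# max_leaves caps the total number of leaf draws per example
json_like = st.recursive(
    leaves,
    lambda children: st.lists(children, max_size=3)
    | st.dictionaries(st.text(min_size=1, max_size=5), children, max_size=3),
    max_leaves=20,
)
```

Because the extend function receives the combined strategy as an argument, there is no Python-level recursion at definition time and no risk of RecursionError.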
Debugging Shrinking Failures and Minimal Repros
Shrinking is Hypothesis’s most powerful diagnostic tool, but it fails when predicates are non-monotonic, involve external state, or trigger excessive rejection. An Unsatisfiable error (or a filter_too_much health-check failure) indicates that the rejection sampler exhausted its attempt budget without finding a valid example. Shrinking timeouts occur when the minimization algorithm traverses an excessively large search space.
To isolate minimal reproducible examples, use hypothesis.find(strategy, predicate). This bypasses the @given runner and directly minimizes a single input that satisfies your condition. It leverages the same DataTree and shrinking algorithms but returns the smallest valid example immediately.
```python
from datetime import date, timedelta

import hypothesis.strategies as st
from hypothesis import Phase, Verbosity, find, settings


def is_invalid_range(r: dict[str, date | int]) -> bool:
    # Simulate a complex business rule violation
    return r["duration_days"] > 300 and r["start_date"].month == 12


def to_range(args: tuple[date, int]) -> dict[str, date | int]:
    start, duration = args
    return {
        "start_date": start,
        "end_date": start + timedelta(days=duration),
        "duration_days": duration,
    }


# Targeted minimal repro extraction: find() returns the smallest
# generated input for which the predicate holds
minimal_failure = find(
    st.tuples(
        st.dates(min_value=date(2020, 1, 1), max_value=date(2024, 12, 31)),
        st.integers(min_value=0, max_value=365),
    ).map(to_range),
    is_invalid_range,
    settings=settings(
        max_examples=1000,
        verbosity=Verbosity.verbose,
        phases=[Phase.generate, Phase.shrink],
        database=None,
    ),
)

print(f"Minimal failing input: {minimal_failure}")
```
Configure phase control to bypass shrinking when debugging known complex constraints: @settings(phases=[Phase.generate]). This prevents CI timeouts during initial failure isolation. Use DirectoryBasedExampleDatabase to cache minimal examples across runs, ensuring deterministic regression testing. Run pytest --hypothesis-show-statistics to inspect draw counts, rejection rates, and phase durations. When isolating flaky external dependencies, mock network calls, database connections, or time-dependent functions before strategy evaluation. Hypothesis assumes pure functions; impure predicates break shrinking guarantees.
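Persisting minimal examples across runs can be sketched with the directory-backed database; the profile name and cache path here are assumptions, not fixed conventions:

```python
from hypothesis import settings
from hypothesis.database import DirectoryBasedExampleDatabase

# Persist failing and minimal examples between runs so that a
# previously found counterexample is replayed first on the next run
settings.register_profile(
    "regression",
    settings(database=DirectoryBasedExampleDatabase(".hypothesis/examples")),
)
settings.load_profile("regression")
```

Committing the example directory (or caching it in CI) turns every shrunk failure into a permanent regression test.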
Profiling Generation Overhead and CI Optimization
Generation overhead manifests as slow test execution, CI runner timeouts, or inconsistent flakiness. Profile strategy latency using cProfile, pytest-profiling, or Hypothesis’s internal statistics. The primary bottleneck is almost always .filter() rejection. Each filtered draw triggers a retry loop, consuming CPU cycles and fragmenting the DataTree. Replace .filter() with assume() placed immediately after cheap draws, or refactor to @st.composite with conditional routing.
```python
import cProfile
import io
import pstats

import hypothesis.strategies as st
from hypothesis import Phase, given, settings


# Inefficient: high rejection rate via .filter()
def inefficient_strategy() -> st.SearchStrategy[int]:
    return st.integers(min_value=1, max_value=10000).filter(
        lambda x: x % 17 == 0 and x > 5000
    )


# Optimized: pre-filtered domain via st.sampled_from
def efficient_strategy() -> st.SearchStrategy[int]:
    valid_values = [x for x in range(5001, 10001) if x % 17 == 0]
    return st.sampled_from(valid_values)


@given(val=efficient_strategy())
@settings(
    max_examples=500,
    deadline=None,  # Disable per-example timeout for CI
    phases=[Phase.generate, Phase.shrink],
    database=None,
)
def test_optimized_generation(val: int) -> None:
    assert val % 17 == 0
    assert val > 5000


# Profiling wrapper: calling a @given-decorated function runs the full test
def profile_strategy_overhead() -> None:
    pr = cProfile.Profile()
    pr.enable()
    test_optimized_generation()
    pr.disable()

    s = io.StringIO()
    ps = pstats.Stats(pr, stream=s).sort_stats("cumulative")
    ps.print_stats(10)
    print(s.getvalue())
```
Implement caching for expensive draws using functools.lru_cache or Hypothesis’s st.shared() to reuse generated objects across related fields. Batch generation by returning multiple related values from a single @st.composite call rather than splitting them across separate @given tests. Pin deterministic seeds in CI via @seed(12345) or select a registered settings profile through the HYPOTHESIS_PROFILE environment variable to guarantee reproducible runs. Tune max_examples based on strategy complexity: 100 for complex nested structures, 500+ for simple primitives. Always set deadline=None in CI when testing I/O-bound or computationally heavy strategies.
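Profile registration and environment-based selection might look like the following sketch; the profile names and example counts are illustrative:

```python
import os

from hypothesis import settings

# Register per-environment profiles; derandomize=True makes CI runs
# fully deterministic at the cost of some generation diversity
settings.register_profile("ci", max_examples=500, deadline=None, derandomize=True)
settings.register_profile("dev", max_examples=25)

# Select via HYPOTHESIS_PROFILE, falling back to the fast local profile
settings.load_profile(os.getenv("HYPOTHESIS_PROFILE", "dev"))
```

In CI you would export HYPOTHESIS_PROFILE=ci; local runs stay fast by default.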
Edge-Case Resolution and Common Pitfalls
Custom strategies introduce subtle failure modes that bypass standard unit test coverage. The following diagnostic table addresses the most frequent production issues:
| Issue | Diagnosis | Resolution |
|---|---|---|
| Excessive Rejection from .filter() | Shrinking stalls; Unsatisfiable errors dominate output | Replace with conditional @st.composite routing or pre-filtered st.one_of(). Enforce a <15% rejection rate via pytest --hypothesis-show-statistics. |
| Unhashable Types in st.dictionaries()/st.sets() | TypeError: unhashable type during strategy evaluation | Convert mutable collections to tuple/frozenset before hashing. Use st.from_type() with explicit Hashable constraints. |
| Circular References in Recursive Strategies | RecursionError or unbounded generation during draw() | Implement explicit depth counters. Use st.recursive() with max_leaves. Terminate branches via st.just() or st.sampled_from(). |
| Mutable State Leakage Across Test Runs | Flaky failures dependent on execution order or shared fixtures | copy.deepcopy() generated objects before mutation. Isolate databases via @settings(database=DirectoryBasedExampleDatabase(...)). |
| CI Timeout Due to Shrinking Overhead | Tests exceed deadline or CI runner timeout during Phase.shrink | Tune @settings(deadline=None, max_examples=100). Disable shrinking for known edge cases via phases=[Phase.generate]. Cache minimal examples. |
```python
import copy
from typing import Any

import hypothesis.strategies as st
from hypothesis import Phase, Verbosity, given, settings


@st.composite
def safe_recursive_json(draw: st.DrawFn, max_depth: int = 3) -> Any:
    """
    Generates nested JSON-like structures with explicit depth termination.
    Prevents unbounded recursion and memory exhaustion in CI.
    """
    if max_depth <= 0:
        return draw(st.one_of(
            st.integers(min_value=-100, max_value=100),
            st.text(max_size=20),
            st.booleans(),
            st.none(),
        ))

    # Recursive step with explicit depth tracking
    return draw(st.one_of(
        st.lists(
            safe_recursive_json(max_depth=max_depth - 1),
            max_size=5,
        ),
        st.dictionaries(
            keys=st.text(min_size=1, max_size=10),
            values=safe_recursive_json(max_depth=max_depth - 1),
            max_size=5,
        ),
    ))


@given(data=safe_recursive_json())
@settings(
    max_examples=150,
    phases=[Phase.generate, Phase.shrink],
    verbosity=Verbosity.normal,
    database=None,
)
def test_recursive_json_serialization(data: Any) -> None:
    # Isolate mutation via deepcopy
    snapshot = copy.deepcopy(data)
    # Validate invariants on the copy
    assert isinstance(snapshot, (dict, list, int, str, bool, type(None)))
```
Enforce __init__ purity in generated objects. If constructors perform I/O, network calls, or global state mutation, wrap them in @st.composite and mock dependencies before instantiation. Always validate constraints pre-generation rather than post-generation to maintain shrinking determinism.
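One way to sketch that isolation, assuming a hypothetical AuditedUser class whose constructor reads the wall clock (an impure dependency standing in for I/O):

```python
import time
from unittest.mock import patch

import hypothesis.strategies as st


class AuditedUser:  # hypothetical class with an impure constructor
    def __init__(self, name: str) -> None:
        self.name = name
        self.created_at = time.time()  # wall-clock dependency breaks determinism


@st.composite
def audited_users(draw: st.DrawFn) -> AuditedUser:
    name = draw(st.text(min_size=1, max_size=10))
    # Freeze the impure dependency so repeated draws (and shrinks)
    # produce byte-for-byte identical objects
    with patch("time.time", return_value=0.0):
        return AuditedUser(name)
```

The same pattern applies to network clients or database handles: patch them before instantiation so the strategy itself stays pure.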
Conclusion and Production Readiness Checklist
Generating custom strategies with hypothesis.strategies transforms property-based testing from a theoretical exercise into a production-grade validation engine. By composing strategies with @st.composite, leveraging st.builds for declarative mapping, and registering types for automatic resolution, you eliminate rejection bottlenecks and guarantee invariant compliance. Debugging workflows centered on hypothesis.find(), phase control, and database isolation enable rapid diagnosis of shrinking failures. Profiling generation overhead and optimizing rejection rates ensure CI pipelines remain fast and deterministic.
Before merging custom strategies into main, validate against this production readiness checklist:
- Rejection rate consistently below 15% (verify via pytest --hypothesis-show-statistics)
- Deterministic CI seeds configured via @seed() or environment variables
- Test databases isolated using DirectoryBasedExampleDatabase or None for stateless runs
- Shrinking timeout guards applied (deadline=None, phases=[Phase.generate] for known edge cases)