# Debugging Flaky Tests with pytest-rerunfailures

Flaky tests represent one of the most insidious forms of technical debt in modern Python codebases. Unlike deterministic failures, flakiness manifests intermittently, often masking underlying race conditions, state leakage, or environmental volatility. When diagnosing these failures, `pytest-rerunfailures` is frequently mischaracterized as a superficial CI workaround. In reality, when deployed with architectural precision, it functions as a diagnostic amplifier that forces non-deterministic execution paths into observable, repeatable states.

A rigorous flakiness taxonomy typically isolates four primary vectors: timing-dependent I/O, shared mutable state, asynchronous event loop interference, and external service volatility. `pytest-rerunfailures` (version `>=12.0` paired with `pytest>=7.4` and Python `3.9+`) intercepts the test runner lifecycle to execute controlled retry cycles. However, its default behavior does not reset session-scoped resources or clear module-level globals, which means improper configuration can silently compound state corruption across attempts. To leverage this plugin effectively, engineers must first understand how it interacts with pytest's internal hook system. The plugin's architecture relies heavily on pytest's extensible runner model, a concept thoroughly documented in [Advanced Pytest Architecture & Configuration](/advanced-pytest-architecture-configuration/). By treating reruns as a controlled stress test rather than a blind retry mechanism, teams can isolate the exact execution boundary where determinism breaks down, establish a diagnostic baseline, and implement targeted remediation without compromising test suite integrity.
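Before dissecting the internals, it helps to fix a baseline configuration. A minimal sketch using the plugin's two standard entry points, the `flaky` marker and the global CLI flags (the service URL and test body are illustrative):

```python
# test_health.py -- baseline retry configuration; the endpoint is illustrative.
import pytest
import requests

# Retry this test up to 3 times, pausing 1 second between attempts.
@pytest.mark.flaky(reruns=3, reruns_delay=1)
def test_service_health():
    response = requests.get("http://localhost:8000/health", timeout=5)
    assert response.status_code == 200
```

The same policy applies suite-wide via `pytest --reruns 3 --reruns-delay 1`; the sections below treat that blanket form as a diagnostic lever rather than a default.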
## Anatomy of Rerun Mechanics & Hook Execution

Understanding how `pytest-rerunfailures` manipulates the test lifecycle is critical to avoiding false positives and hidden state corruption. The plugin operates by registering a `pytest_runtest_makereport` hook implementation. When a test item executes, pytest generates a `TestReport` object at three distinct phases: `setup`, `call`, and `teardown`. The plugin monitors these reports and, upon detecting a failure during the `call` phase, intercepts the standard execution flow. Instead of allowing the runner to proceed to teardown and mark the item as failed, the plugin triggers a secondary invocation of `pytest_runtest_protocol` for the same test item.

This architectural choice has profound implications for fixture lifecycle management. During a rerun cycle, `setup` executes again, meaning function-scoped fixtures are freshly instantiated. However, `teardown` is deliberately deferred until the final rerun attempt concludes. This design prevents premature resource cleanup that could trigger cascading failures during intermediate retries, but it also means that any state mutation occurring in a function-scoped fixture persists across attempts unless explicitly cleared. Session-scoped, module-scoped, and class-scoped fixtures are never recreated between reruns. They maintain their original instantiation state, which is why tests relying on shared database connections, in-memory caches, or singleton objects frequently exhibit compounding flakiness under retry conditions.

The hook execution order during reruns follows a strict sequence:

1. `pytest_runtest_protocol` initiates for attempt `N`.
2. `pytest_runtest_setup` executes (re-running setup hooks).
3. `pytest_runtest_call` executes the actual test function.
4. `pytest_runtest_makereport` evaluates the outcome.
5. If the test failed and the `reruns` threshold is not yet met, the plugin increments the attempt counter and loops back to step 2.
6. If the test passed or the threshold is exhausted, `pytest_runtest_teardown` finally executes.

This lifecycle means that any diagnostic logging or state inspection must be anchored to `pytest_runtest_makereport` rather than `pytest_runtest_teardown`. Engineers frequently misdiagnose fixture teardown failures because they assume cleanup runs per attempt. In reality, if a test fails three times, teardown runs exactly once, carrying the accumulated state from all three attempts. Recognizing this boundary is essential when implementing retry-safe fixtures or debugging resource exhaustion patterns.
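To make this boundary observable in practice, a `conftest.py` hook can log the attempt counter that `pytest-rerunfailures` stores on each test item. A minimal sketch (run with `-s` so the output is not captured):

```python
# conftest.py -- per-attempt outcome logging; assumes pytest-rerunfailures,
# which sets `execution_count` on items it retries.
import pytest

@pytest.hookimpl(hookwrapper=True)
def pytest_runtest_makereport(item, call):
    outcome = yield
    report = outcome.get_result()
    if report.when == "call":
        attempt = getattr(item, "execution_count", 1)
        print(f"{item.nodeid}: attempt {attempt} -> {report.outcome}")
```

Because teardown is deferred, a report hook like this, not a teardown hook, is the reliable anchor for per-attempt telemetry.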
## Rapid Diagnosis: Isolating State Leakage & Race Conditions

Isolating the root cause of flakiness requires a systematic approach that strips away environmental noise and forces the failure into a deterministic context. The first step is disabling output capture and enabling verbose logging during reruns. Executing `pytest --reruns=3 --capture=no --log-cli-level=DEBUG` forces pytest to stream real-time execution traces, revealing hidden race conditions or delayed I/O operations that standard CI runners suppress.

When standard logging proves insufficient, implementing a custom `pytest_runtest_makereport` hook provides immediate visibility into process state at the exact moment of failure. The following `conftest.py` implementation captures critical runtime metrics before the plugin triggers a retry:

```python
import gc
import json
import sys
import threading

import pytest

@pytest.hookimpl(hookwrapper=True)
def pytest_runtest_makereport(item, call):
    outcome = yield
    report = outcome.get_result()
    if report.when == "call" and report.failed:
        state_snapshot = {
            "sys_modules": len(sys.modules),
            "gc_objects": len(gc.get_objects()),
            "thread_count": len(threading.enumerate()),
        }
        # Sanitize the node ID: it contains "::" and "/" separators that
        # are not valid in file names.
        safe_id = item.nodeid.replace("::", "_").replace("/", "_")
        with open(f"flaky_state_{safe_id}.json", "w") as f:
            json.dump(state_snapshot, f, indent=2)
```

This diagnostic pattern is particularly effective for identifying module-level import side effects, unclosed file descriptors, or thread pool exhaustion. By comparing the `gc_objects` and `thread_count` values across successive rerun artifacts, engineers can quickly distinguish between memory leaks and transient concurrency bottlenecks.
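A small offline helper can then diff successive snapshots. A sketch; the file names are whatever artifacts the hook above produced:

```python
# compare_snapshots.py -- diff two state snapshots captured on failure.
import json
import sys

def diff_snapshots(path_a: str, path_b: str) -> None:
    with open(path_a) as fa, open(path_b) as fb:
        a, b = json.load(fa), json.load(fb)
    for key in a:
        # Monotonic growth in gc_objects or thread_count across attempts
        # points at retained state rather than transient contention.
        print(f"{key}: {a[key]} -> {b[key]} (delta {b[key] - a[key]:+d})")

if __name__ == "__main__":
    diff_snapshots(sys.argv[1], sys.argv[2])
```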
Global state mutation remains the most common source of flakiness in large test suites. Python's module import cache (`sys.modules`) and class-level attributes are frequently mutated by test functions without proper restoration. To surface this, run the suite with `--strict-markers` enabled (so marker typos fail loudly) and randomize execution order with `pytest-randomly` or `pytest-random-order`. When collection order changes, hidden dependencies between tests surface immediately. Understanding how pytest resolves and orders test items during collection is critical for reproducing these order-dependent failures locally, a process extensively covered in [Optimizing Test Discovery](/advanced-pytest-architecture-configuration/optimizing-test-discovery/). By deliberately shuffling execution sequences, you force the test runner to violate implicit ordering assumptions, making state leakage reproducible on demand.
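For the `sys.modules` vector specifically, an autouse fixture can snapshot the import cache around every test and fail loudly on pollution. A minimal sketch; the `myapp.` prefix is a hypothetical package name to adjust for your codebase:

```python
# conftest.py -- flag tests that leave new entries in the import cache.
import sys
import pytest

@pytest.fixture(autouse=True)
def guard_sys_modules():
    before = set(sys.modules)
    yield
    leaked = set(sys.modules) - before
    # Lazy imports made by the test body are normal; flag only modules
    # that should never be pulled in as a side effect.
    suspicious = {m for m in leaked if m.startswith("myapp.")}
    assert not suspicious, f"Test leaked module imports: {sorted(suspicious)}"
```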
For race conditions involving external services or database locks, cross-reference rerun timestamps with transaction logs and network trace captures. Implement explicit backoff using `pytest.mark.flaky(reruns=3, reruns_delay=2)` to allow transient network partitions or connection pool exhaustion to resolve naturally. If the test passes consistently with a delay but fails without it, the root cause is almost certainly timing-dependent resource contention rather than logical assertion errors.

## Profiling Flaky Execution with cProfile & pytest-profiling

Performance degradation often masquerades as flakiness when timeouts, connection pool starvation, or blocking I/O introduce non-deterministic execution windows. Attaching profilers directly to rerun cycles provides quantitative evidence to separate true flakiness from infrastructure bottlenecks. The `pytest-profiling` plugin can be integrated with rerun markers to generate per-attempt execution traces, but for granular control, wrapping individual test functions with `cProfile` yields immediate diagnostic output.

```python
import cProfile
import io
import pstats

import pytest

@pytest.mark.asyncio  # requires pytest-asyncio; the test body awaits an HTTP call
@pytest.mark.flaky(reruns=2)
async def test_async_endpoint(client):  # `client` is an async HTTP client fixture
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        response = await client.get("/api/data")
        assert response.status_code == 200
    finally:
        profiler.disable()
        s = io.StringIO()
        ps = pstats.Stats(profiler, stream=s).sort_stats("cumulative")
        ps.print_stats(10)
        print(f"Rerun Profile:\n{s.getvalue()}")
```

This pattern isolates cumulative execution time per attempt. If the first attempt shows high latency in database connection initialization while subsequent attempts complete rapidly, the flakiness stems from cold-start overhead rather than assertion logic. Conversely, if cumulative time increases linearly across attempts, you are likely observing resource accumulation or connection pool exhaustion.

For memory-related flakiness, integrate `tracemalloc` to track object allocation across reruns. Start the tracer in a session-scoped fixture and dump snapshots on failure. Significant delta growth in `tracemalloc` statistics between attempts indicates objects are being retained across rerun boundaries, typically due to circular references in cached responses or uncollected generator states.
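A sketch of that pattern: the tracer starts once per session, and a report hook prints the top allocation sites whenever an attempt fails (the frame depth and five-site cutoff are arbitrary choices):

```python
# conftest.py -- session-wide allocation tracing; a sketch.
import tracemalloc
import pytest

@pytest.fixture(scope="session", autouse=True)
def allocation_tracer():
    tracemalloc.start(10)  # keep 10 frames per allocation for usable tracebacks
    yield
    tracemalloc.stop()

@pytest.hookimpl(hookwrapper=True)
def pytest_runtest_makereport(item, call):
    outcome = yield
    report = outcome.get_result()
    if report.when == "call" and report.failed and tracemalloc.is_tracing():
        snapshot = tracemalloc.take_snapshot()
        # Compare these lines across rerun attempts to spot allocations
        # that survive between retries.
        for stat in snapshot.statistics("lineno")[:5]:
            print(stat)
```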
Thread and async event loop interference requires specialized monitoring. Use `threading.enumerate()` and `asyncio.all_tasks()` within diagnostic hooks to verify that background workers or scheduled coroutines complete before teardown. When `pytest-xdist` is enabled, parallel workers can amplify timing-dependent failures due to shared resource contention. Profile worker-local execution with `pytest --profile-svg` (a flag provided by `pytest-profiling`) to visualize call graph divergence across workers. If the SVG outputs show drastically different execution paths or blocking durations for identical test items, the flakiness is worker-scheduling dependent and requires resource isolation or sequential execution for that specific test subset.

## Edge-Case Resolution: Fixtures, Conftest Hierarchies, and Parametrization

Fixture scope mismatches during reruns represent a frequent source of silent state corruption. When a test parametrized with `@pytest.mark.parametrize` executes, each parameter combination runs as a distinct test item. However, if the parametrization interacts with a session-scoped fixture, the fixture initializes once and persists across all parameter iterations. During a rerun, the fixture does not reset, meaning subsequent parameter combinations inherit mutated state from previous failures.
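The failure mode is easy to reproduce with a deliberately stateful session fixture; a contrived sketch in which a simulated transient failure poisons later attempts:

```python
# A contrived sketch: the session-scoped registry is built once, so state
# written by a failed attempt survives into the rerun.
import random
import pytest

@pytest.fixture(scope="session")
def registry():
    return {"seen": []}

@pytest.mark.parametrize("value", [1, 2, 3])
@pytest.mark.flaky(reruns=2)
def test_register(registry, value):
    registry["seen"].append(value)
    if random.random() < 0.3:
        pytest.fail("simulated transient failure")
    # After any rerun, `value` has been appended more than once -- the
    # inherited mutation this section warns about.
    assert registry["seen"].count(value) == 1
```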
To enforce deterministic teardown, implement explicit cleanup assertions in yield-based fixtures:

```python
import pytest
from sqlalchemy import create_engine, text

@pytest.fixture
def db_session():
    conn = create_engine("sqlite:///:memory:").connect()
    yield conn
    # Explicit cleanup to prevent cross-rerun state leakage
    conn.execute(text("DROP TABLE IF EXISTS test_data"))
    conn.close()
    assert conn.closed, "Connection leak detected during rerun teardown"
```

This pattern guarantees that teardown executes exactly once after the final rerun attempt and validates that resources are properly released. If the assertion fails, pytest will mark the teardown as failed, preventing silent resource exhaustion from propagating to subsequent test items.

Conftest hierarchy conflicts frequently emerge when multiple `conftest.py` files define overlapping autouse fixtures. During reruns, pytest resolves fixture scopes hierarchically, and a higher-level conftest may inadvertently override a lower-level fixture's teardown logic. Always verify fixture resolution order using `pytest --fixtures`, and ensure that autouse fixtures which perform cleanup yield control rather than return values; a return-style fixture has no teardown phase, so its cleanup silently never runs.

Hypothesis property-based testing introduces a unique conflict with `pytest-rerunfailures`. Hypothesis relies on deterministic execution for its shrinking algorithm, which systematically reduces failing inputs to minimal reproducible examples. Reruns introduce non-deterministic timing and state resets, breaking Hypothesis's internal state machine and causing infinite shrinking loops or false passes. Mitigate this by short-circuiting property-based tests during rerun cycles:

```python
from hypothesis import settings, given, strategies as st
import pytest

@given(n=st.integers())
@settings(database=None, max_examples=50)
@pytest.mark.flaky(reruns=1, reruns_delay=0.5)
def test_idempotent_transform(request, n):
    # pytest-rerunfailures sets `execution_count` on the test item;
    # anything above 1 means we are inside a retry cycle.
    if getattr(request.node, "execution_count", 1) > 1:
        pytest.skip("Skipping Hypothesis test during rerun to prevent shrink conflicts")
    assert transform(n) == expected(n)  # transform/expected: code under test
```

By checking the `execution_count` attribute that the plugin sets on each test item, you can gracefully skip property tests during retry cycles, preserving Hypothesis's deterministic guarantees while allowing the plugin to handle transient infrastructure failures.

## CI/CD Integration & Deterministic Rerun Strategies

Deploying `pytest-rerunfailures` in CI/CD pipelines requires strict governance to prevent infinite retry loops and ensure failure signatures are accurately tracked. Blindly applying `--reruns=5` across an entire test suite masks genuine regressions and inflates pipeline execution time. Instead, adopt a tiered retry strategy: use `pytest.mark.flaky` selectively for known transient failures, restrict which errors qualify for retries with `--only-rerun`, and keep any global `--reruns` value low to prevent runaway execution.
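A tiered policy sketch in `pytest.ini`: a low global retry budget, restricted to error types that plausibly indicate transient infrastructure (the flag values are illustrative):

```ini
# pytest.ini -- conservative global retry policy; a sketch.
[pytest]
addopts = --reruns 1 --only-rerun ConnectionError --only-rerun TimeoutError
```

Tests with a known transient dependency can then widen their own budget with `@pytest.mark.flaky(reruns=3)`, keeping the aggressive policy local and auditable.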
GitHub Actions matrix strategies can isolate flaky tests by running them in dedicated, sequential workers while parallelizing deterministic suites. Configure your workflow to split execution:

```yaml
- name: Run Deterministic Tests
  run: pytest -n auto --ignore=tests/flaky/
- name: Run Flaky Tests Sequentially
  run: pytest --reruns=2 --reruns-delay=3 tests/flaky/
```

This separation prevents `pytest-xdist` from masking shared resource contention and ensures that flaky tests execute in a controlled, sequential environment.

JUnit XML reporting requires careful handling when reruns are enabled. Pytest writes a single consolidated XML report per session, so individual attempt details are easily lost. Configure `pytest` with `--junit-xml=report.xml` and use a CI artifact collector to store per-attempt logs. Implement failure signature tracking by hashing assertion error messages and stack traces. If the same signature appears across multiple pipeline runs despite successful reruns, escalate the test to a permanent quarantine until the root cause is resolved.
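A sketch of that signature tracking as a `conftest.py` hook; the artifact path is illustrative, and in practice you would normalize file paths and line numbers out of the traceback before hashing:

```python
# conftest.py -- fingerprint failures so identical errors can be correlated
# across pipeline runs; a sketch.
import hashlib
import json
import pytest

@pytest.hookimpl(hookwrapper=True)
def pytest_runtest_makereport(item, call):
    outcome = yield
    report = outcome.get_result()
    if report.when == "call" and report.failed:
        signature = hashlib.sha256(report.longreprtext.encode()).hexdigest()[:16]
        with open("failure_signatures.jsonl", "a") as f:
            f.write(json.dumps({"test": item.nodeid, "signature": signature}) + "\n")
```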
Avoid using `--reruns` as a permanent CI fix. Instead, pair rerun configurations with automated alerting. When a test exceeds its retry threshold, trigger a Slack or PagerDuty notification with the exact failure signature, environment variables, and rerun delay metrics. This transforms flaky tests from silent pipeline noise into actionable engineering tickets.

## Minimal Reproduction Framework & Validation Checklist

Before merging any fix for a flaky test, establish a minimal reproduction framework that guarantees deterministic validation. The framework should isolate the test from external dependencies, pin environment variables, and enforce strict execution ordering.

1. **Environment Pinning:** Lock the Python version, dependency versions, and OS-level libraries. Use `pip freeze` or `poetry export` to capture exact dependency trees.
2. **Deterministic Seeding:** Inject `random.seed()` and `numpy.random.seed()` at the session level, and export `PYTHONHASHSEED` before the interpreter starts, since it has no effect once Python is running; see the seeding sketch after this list. Verify that test execution order is fixed using `pytest --collect-only`.
3. **State Isolation:** Replace shared fixtures with function-scoped equivalents. Use `unittest.mock.patch` or `pytest-mock` to externalize I/O boundaries.
4. **Validation Loop:** Execute the test repeatedly, for example with `pytest --count=10` via `pytest-repeat` (a plain `--reruns` run re-executes only on failure, so it cannot confirm consecutive passes). If the test passes 10/10 times, the fix is likely deterministic. If it fails intermittently, the root cause remains unresolved.
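A seeding sketch for item 2 of the checklist. Note that `PYTHONHASHSEED` must be exported in the shell (for example `PYTHONHASHSEED=0 pytest -q`) because hash randomization is fixed at interpreter startup:

```python
# conftest.py -- pin pseudo-random sources once per session; a sketch.
import random
import pytest

try:
    import numpy
except ImportError:  # numpy is optional in this sketch
    numpy = None

@pytest.fixture(scope="session", autouse=True)
def deterministic_seed():
    random.seed(1337)  # the seed value is arbitrary but must stay fixed
    if numpy is not None:
        numpy.random.seed(1337)
    yield
```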
Automated regression testing for flakiness should be integrated into pre-commit hooks. Run a subset of historically flaky tests with elevated retry counts and strict timeout limits. If any test exceeds its threshold, block the merge and require a root cause analysis. This enforces a zero-tolerance policy for unexplained flakiness while allowing controlled retries for known transient infrastructure issues.

## FAQs

**Does pytest-rerunfailures reset fixture state between retry attempts?**
No. Only function-scoped fixtures are recreated per rerun attempt. Module-, class-, and session-scoped fixtures persist across retries. To force a full reset, convert fixtures to function scope or explicitly clear mutable state within a yield-based teardown block before the final attempt concludes.

**How do I prevent pytest-rerunfailures from masking genuine test failures?**
Configure `--reruns` with a strict `--reruns-delay`, keep the global retry threshold low, and restrict which errors qualify for retries via `--only-rerun`. Pair this with a custom `pytest_runtest_makereport` hook to log failure signatures. Implement CI-level alerting that tracks identical failure hashes across multiple pipeline runs, ensuring that persistent logical errors are escalated rather than silently retried.

**Why does pytest-rerunfailures conflict with Hypothesis shrinking?**
Hypothesis assumes deterministic execution for its shrinking algorithm. Reruns introduce non-deterministic timing and state resets, breaking the shrinking state machine and causing infinite reduction loops or false passes. Mitigate this by checking the `execution_count` attribute on the test item and skipping property tests during retry cycles, or isolate Hypothesis suites from retry logic entirely.

**Can I use pytest-rerunfailures with pytest-xdist for parallel execution?**
Yes, but reruns are worker-local. If a test fails on worker A, it retries exclusively on worker A. Shared resources such as databases, file locks, or network ports can cause cascading failures across workers. Use `pytest-xdist` with isolated worker environments, or route flaky test subsets to sequential execution to prevent cross-worker resource contention.