Commit 1f79bca

ci/maintenance-unit-tests: stop+rm long-up named test-service containers
Tests that exercise module_postgres / module_redis / module_immich (and friends) spawn long-running service containers (postgres, postgres-immich, mysql, redis) and only call `remove` if the test reaches that stage. An interrupted test (or one that only does install) leaves them wedged on the runner; the diagnostic step on this branch's first run showed postgres-immich and mysql with 'Up 7 days', and a `postgres` container in 'Created' state for 8h because its 5432 publish was held by the older one.

Add a second cleanup block to both diagnostic steps, ahead of the orphaned-Created sweep, that stop+rm's exactly these four names if they have been *running for more than 2 hours*. The threshold is the safety knob: native job timeout-minutes is 90, emulated 210 (3.5h), so 2h sits comfortably below "test still in flight" and well above "leftover from a prior run". Anything younger is left alone; the next slot's start will pick it up if needed.

Implementation:

* Names are exact-match anchored ('^name$') so a future sibling like 'mysql-readonly' or 'postgres-staging' isn't caught.
* Stopped / Created containers are deliberately skipped here; the orphaned-Created sweep below uses the bare-hash filter and handles them safely.
* Uptime via 'docker inspect ... .State.StartedAt' → 'date -d' → epoch math. GNU date is standard on Linux runners.
* 'docker stop --time 30' gives postgres / mysql a chance to flush the WAL before SIGKILL; '|| true' on each docker call so a transient hiccup doesn't fail the test slot.
* Output is one line per name ('absent', 'recent <2h (skip)', 'stopped (skip — orphan sweep handles below)', or 'stop+rm: <name> (up Xh)') so the GHA log shows exactly which containers were touched and why.

Runs *before* the orphaned-Created sweep so a freshly-stopped service isn't briefly seen as orphan in the same pass.

Same block in both gradle-native and gradle-emulated diagnostic steps so the cleanup fires on both runner hosts (the arm64 host is only seen by native; the amd64 host is seen by both).
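
The uptime check described in the message can be tried without Docker. The following is a minimal sketch under stated assumptions, not the workflow's code: the function names `container_age` and `classify_age` and the sample timestamps are hypothetical. Only the `date -d` parse, the fall-back to the current epoch, and the 2h threshold come from the commit. It assumes GNU date.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the StartedAt -> epoch -> age math.

# Print a container's age in seconds.
#   $1 = StartedAt timestamp (RFC 3339, as printed by `docker inspect`)
#   $2 = reference "now" as an epoch
# If the timestamp cannot be parsed, fall back to "now", giving age 0.
container_age() {
  _started_epoch=$(date -d "$1" +%s 2>/dev/null || echo "$2")
  echo $(( $2 - _started_epoch ))
}

# Classify an age (seconds) against the 2h threshold from the commit.
classify_age() {
  _threshold=$(( 2 * 3600 ))
  if [ "$1" -lt "$_threshold" ]; then
    echo "recent"   # left alone: may belong to a test still in flight
  else
    echo "stale"    # candidate for stop+rm
  fi
}

# Sample timestamps (made up for illustration).
now=$(date -d "2024-01-01T12:00:00Z" +%s)
classify_age "$(container_age "2024-01-01T11:00:00.123456789Z" "$now")"  # recent (1h)
classify_age "$(container_age "2023-12-25T12:00:00Z" "$now")"            # stale (7 days)
classify_age "$(container_age "not-a-timestamp" "$now")"                 # recent (age 0)
```

Because an unparseable timestamp yields an age of 0, such a container lands in the skip branch rather than being removed.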
1 parent 9ce9fa7 commit 1f79bca

1 file changed

Lines changed: 102 additions & 0 deletions

.github/workflows/maintenance-unit-tests.yml

@@ -242,6 +242,57 @@ jobs:
 docker system df || true
 echo "::endgroup::"

+echo "::group::Cleanup: stop and remove named test-service containers (uptime > 2h)"
+# Tests that exercise module_postgres / module_redis /
+# module_immich (and friends) spawn long-running service
+# containers and only tear them down if the test reaches
+# its `remove` stage. A test that's interrupted, or one
+# that only calls `install`, leaves these wedged on the
+# runner; the diagnostic above showed postgres-immich and
+# mysql 'Up 7 days' for that reason.
+#
+# Skip containers that have been up less than 2h — they
+# could be a sibling matrix slot's test currently in
+# flight (native timeout-minutes is 90, emulated 210, so
+# 2h is comfortably below "real test still running" and
+# well above "leftover from a prior run").
+#
+# Stopped / Created containers are left to the
+# orphaned-Created sweep below (which uses the bare-hash
+# filter to avoid the same false-positive concern).
+#
+# Exact-match anchored names (^name$) so a future sibling
+# like 'mysql-readonly' or 'postgres-staging' isn't caught.
+# Runs *before* the orphaned-Created sweep so a freshly
+# stopped service isn't briefly seen as orphan in the same
+# pass. 'docker stop --time 30' gives postgres / mysql a
+# chance to flush before SIGKILL; '|| true' on each step
+# so a transient hiccup doesn't fail the test slot.
+now_epoch=$(date +%s)
+threshold=$(( 2 * 3600 ))
+for c in postgres-immich mysql redis postgres; do
+  if ! docker ps -a --filter "name=^${c}$" --format '{{.Names}}' | grep -q "^${c}$"; then
+    echo " absent : $c"
+    continue
+  fi
+  running=$(docker inspect "$c" --format '{{.State.Running}}' 2>/dev/null || echo false)
+  if [[ "$running" != "true" ]]; then
+    echo " stopped (skip — orphan sweep handles below): $c"
+    continue
+  fi
+  started=$(docker inspect "$c" --format '{{.State.StartedAt}}' 2>/dev/null)
+  started_epoch=$(date -d "$started" +%s 2>/dev/null || echo "$now_epoch")
+  age=$(( now_epoch - started_epoch ))
+  if (( age < threshold )); then
+    printf " recent <2h (skip): %s (up %dm)\n" "$c" $(( age / 60 ))
+    continue
+  fi
+  docker stop --time 30 "$c" >/dev/null 2>&1 || true
+  docker rm "$c" >/dev/null 2>&1 || true
+  printf " stop+rm: %s (up %dh)\n" "$c" $(( age / 3600 ))
+done
+echo "::endgroup::"
+
 echo "::group::Cleanup: orphaned 'Created' containers (image ref is bare hash)"
 # Pattern: stopped-but-never-started containers whose Image
 # column is a 12-char hex ID (no repo:tag). These are
@@ -412,6 +463,57 @@ jobs:
 docker system df || true
 echo "::endgroup::"

+echo "::group::Cleanup: stop and remove named test-service containers (uptime > 2h)"
+# Tests that exercise module_postgres / module_redis /
+# module_immich (and friends) spawn long-running service
+# containers and only tear them down if the test reaches
+# its `remove` stage. A test that's interrupted, or one
+# that only calls `install`, leaves these wedged on the
+# runner; the diagnostic above showed postgres-immich and
+# mysql 'Up 7 days' for that reason.
+#
+# Skip containers that have been up less than 2h — they
+# could be a sibling matrix slot's test currently in
+# flight (native timeout-minutes is 90, emulated 210, so
+# 2h is comfortably below "real test still running" and
+# well above "leftover from a prior run").
+#
+# Stopped / Created containers are left to the
+# orphaned-Created sweep below (which uses the bare-hash
+# filter to avoid the same false-positive concern).
+#
+# Exact-match anchored names (^name$) so a future sibling
+# like 'mysql-readonly' or 'postgres-staging' isn't caught.
+# Runs *before* the orphaned-Created sweep so a freshly
+# stopped service isn't briefly seen as orphan in the same
+# pass. 'docker stop --time 30' gives postgres / mysql a
+# chance to flush before SIGKILL; '|| true' on each step
+# so a transient hiccup doesn't fail the test slot.
+now_epoch=$(date +%s)
+threshold=$(( 2 * 3600 ))
+for c in postgres-immich mysql redis postgres; do
+  if ! docker ps -a --filter "name=^${c}$" --format '{{.Names}}' | grep -q "^${c}$"; then
+    echo " absent : $c"
+    continue
+  fi
+  running=$(docker inspect "$c" --format '{{.State.Running}}' 2>/dev/null || echo false)
+  if [[ "$running" != "true" ]]; then
+    echo " stopped (skip — orphan sweep handles below): $c"
+    continue
+  fi
+  started=$(docker inspect "$c" --format '{{.State.StartedAt}}' 2>/dev/null)
+  started_epoch=$(date -d "$started" +%s 2>/dev/null || echo "$now_epoch")
+  age=$(( now_epoch - started_epoch ))
+  if (( age < threshold )); then
+    printf " recent <2h (skip): %s (up %dm)\n" "$c" $(( age / 60 ))
+    continue
+  fi
+  docker stop --time 30 "$c" >/dev/null 2>&1 || true
+  docker rm "$c" >/dev/null 2>&1 || true
+  printf " stop+rm: %s (up %dh)\n" "$c" $(( age / 3600 ))
+done
+echo "::endgroup::"
+
 echo "::group::Cleanup: orphaned 'Created' containers (image ref is bare hash)"
 # Pattern: stopped-but-never-started containers whose Image
 # column is a 12-char hex ID (no repo:tag). These are
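
The exact-match name check in the loop can be exercised in isolation. This is a small sketch, assuming a hard-coded list in place of real `docker ps -a --format '{{.Names}}'` output: the `names` variable and the `name_present` helper are hypothetical, while the anchored `^name$` pattern is the one the diff passes to `grep -q`.

```shell
#!/bin/sh
# Sample names standing in for `docker ps -a --format '{{.Names}}'` output.
# The list is made up; it includes the sibling names the commit message
# says must not be caught.
names='postgres
postgres-immich
postgres-staging
mysql-readonly
redis'

# Succeed only if stdin contains a line that is exactly the given name.
#   $1 = container name to look for
name_present() {
  grep -q "^$1\$"
}

for c in postgres-immich mysql redis postgres; do
  if printf '%s\n' "$names" | name_present "$c"; then
    echo "present: $c"
  else
    echo "absent : $c"
  fi
done
# Prints:
#   present: postgres-immich
#   absent : mysql
#   present: redis
#   present: postgres
```

Here `mysql` is reported absent even though `mysql-readonly` is in the list, and `postgres` matches only the bare name. For names that might contain regex metacharacters, `grep -Fxq "$1"` would be a stricter alternative, since it matches whole lines as fixed strings.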
