Deploy says success but old container still runs

Open the deploy dashboard. The green checkmark says success. Now open the app in a browser. The footer shows yesterday's release SHA. Traffic is still hitting the previous container. The dashboard lied.

This is a small failure mode with a big trust cost. The operator now knows that "the deploy succeeded" can mean the build succeeded or the image was pushed or the panel got the right response but doesn't necessarily mean the live app is the new version. Every future green checkmark gets discounted.

The Reddit r/dokploy thread that surfaced this — operator deploys, panel says success, app still serves the old version until the operator manually stops and restarts — is the canonical version. But the same shape shows up across every panel and every PaaS at one point or another.

The fix isn't "trust the dashboard more." The fix is to define what "deploy success" actually means and verify it.

What "success" usually means under the hood

A deploy dashboard checks a set of conditions before flipping the indicator green. The exact set varies by tool, but it's almost always some subset of:

The image build completed without errors.
The image was pushed to the registry.
The platform received the new image reference.
The container was started (or the rolling update was triggered).
The deploy command exited zero.

Notice what's not in the list: confirmation that the new container is actually receiving traffic, that the proxy is routing to it, that the old containers have stopped, and that the app's live behavior matches the new release.

The dashboard answers "did we successfully attempt to deploy?" The operator wants the answer to "is the new version live?" Those are not the same question.

Why the gap exists in modern PaaS tooling

This isn't usually negligence. It's a hard problem the easier metrics dodge.

Containers can start without serving. A container can be marked "running" by the runtime while still failing its first health checks. The dashboard sees "running"; the user sees the old version because the proxy hasn't cut over.

Rolling updates introduce ambiguity. During a rolling update, both versions are temporarily live. "Deploy succeeded" might be true for one replica and false for the others.

Caches are everywhere. Reverse-proxy caches, CDN caches, browser caches, internal HTTP caches inside the app. Each of them can serve the old version even after the new container is up.

Volume coupling. When a deploy depends on a volume mount, the content on the volume might still be old even if the container is new. The user sees stale data and assumes the deploy didn't run.

Sticky sessions. Existing connections may continue to be served by the old container even after the cutover. New connections see the new version; the operator's browser, with a still-live session, sees the old.

These five sources cover most of the "green checkmark, old version" reports. They all happen after the dashboard's view of the world ends.

Verify-after-deploy as a discipline

Closing the trust gap doesn't require building a new platform. It requires adding a verification step after the dashboard reports success and before the operator considers the deploy done.

Verification has three parts.

1. Fingerprint the release

Every build embeds an identifier — a git SHA, a build timestamp, a semver tag — somewhere the live app can return. A /version endpoint, a header on every response, a footer on the rendered HTML.

The fingerprint has to be cheap to read and unmistakable. "Version 1.2.3-abc123" is unmistakable. "Version 1.2.3" alone is not, because two builds can claim the same human version.

2. Probe the live app

After the dashboard reports success, the verification step makes a request to the live app and reads the fingerprint. It compares to the expected fingerprint from the deploy artifact.

Match: the verification passes. The deploy is, by your definition, done.

Mismatch: the verification fails. The deploy did not successfully ship the new version, even though the dashboard says success.

This probe should hit the same surface real users hit — the same proxy, the same domain, the same path. Hitting the container directly bypasses the layers that are causing the problem.

3. Drive recovery from the verification, not from human discovery

When verification fails, the next step shouldn't be "the operator opens a browser and notices." The verification step should trigger an alert, a roll-back, or a remediation script automatically.

This is the discipline that recovers trust. The deploy dashboard becomes one signal. The verification step becomes the closing signal. Until both are green, the deploy isn't considered done.

What this looks like in Dokploy, Coolify, or similar

A practical setup for one of the common self-host PaaS:

The app exposes GET /version returning { "sha": "abc123", "built_at": "2026-06-03T10:00:00Z" }.
The deploy pipeline embeds the SHA at build time (typically via build-args or env files).
Right after the platform reports success, a small script makes a request to https://app.example.com/version and reads the SHA.
If the SHA matches expected, the script posts a green status to the team's notification channel.
If the SHA doesn't match, the script posts a red status with the actual and expected SHAs, plus a hint at the likely cause (stale container, cache, proxy).

The whole thing is twenty lines of shell or Python. It runs outside the platform; the platform's success signal is the trigger, not the conclusion.

Common causes the verification step exposes

Once verification is in place, the false-success cases get diagnosed faster because the failure modes are visible at the gateway.

Old container still running. Verification probes the live SHA, gets the old one. The fix is to stop the old container or fix the platform's restart logic.

Proxy stale upstream. Verification probes the live SHA and gets a 502 or the wrong SHA depending on which upstream the proxy chose. The fix is to reload the proxy after the new container is healthy.

Cache layer serving old responses. Verification probes a path that should reflect the new build (often /version) and gets the old version. The fix is to invalidate the cache.

Volume content not migrated. Verification probes a feature that depends on the new schema and gets old behavior. The fix is to run the data migration before flipping the proxy.

The value of verification is that it names the failure rather than letting the operator hunt for it.

What this changes about trust

The long-term payoff isn't just better deploys — it's that the operator can trust the deploy dashboard again. Once verification consistently aligns with what's actually live, the dashboard's green checkmark stops being something the operator second-guesses.

That trust matters because deploys happen often. Each deploy where the operator has to manually verify is a deploy with a hidden cost. Each deploy where the dashboard and verification agree is a deploy that takes less attention than the last one.

The cultural side

A team that has been through enough false-success deploys often develops bad habits: spot-checking every deploy by hand, running redundant tests, distrusting the platform.

Verification-as-discipline lets the team retire those habits. The verification step is the one spot-check. It's automated. It produces a single trustworthy signal. The team can stop running its own.

This is also how teams stop blaming the platform for problems the platform was honest about. "The platform said success, the live app isn't right, somebody else's job" is a common dysfunction. "The platform said success but verification failed, here's the named cause" is a fixable problem.

A short version

Deploy dashboards report what they can confirm: that the deploy attempt succeeded. The operator wants to know whether the live app is the new version. Closing the gap requires a verify-after-deploy step: fingerprint the release, probe the live surface, treat verification as the closing signal. The work is small. The trust recovery is large.