External-service integrations: Overleaf, GitHub, Zenodo
All three integrations are now landed (Overleaf first, then GitHub, then Zenodo). This document captures the patterns that survived three concrete implementations — the ones you can rely on when adding a fourth service — and the traps that only become visible after you have built more than one.
It is written to be actionable, not exhaustive. If a section seems too short, that’s intentional: the full source is the source of truth.
Mental model
Every external-service integration in vaibify is four concentric layers, ordered from the user’s action inward:
Frontend modal (IIFE JS): connect, target selection, per-file diff, confirm.
FastAPI route: HTTP endpoint, pydantic validation, dispatches to layer 3.
Host-side dispatcher (vaibify/gui/syncDispatcher.py): the mediator that stitches together host-side operations (mirror refresh, digest computation, credential lookup) with container-side operations (running the actual push inside the container).
Container CLI: a self-contained Python script shipped into /usr/share/vaibify/ that performs the network write (git push, API upload). Imports only stdlib + keyring + a handful of allowed adapters.
The container/host boundary is load-bearing: code that runs inside
the container cannot import vaibify.* because vaibify is not
installed in the container. Anything that needs vaibify internals
(workflow manager, route helpers) lives host-side. Respect this
boundary religiously — several rounds of debugging went into
restoring it.
What to reuse from Overleaf
These modules and patterns are ready to generalize as-is:
Token + auth plumbing
vaibify/reproducibility/overleafAuth.py: fsWriteAskpassScript writes a mode-700 temp file that the git subprocess consults for credentials; the token never touches argv or environment. Reuse for GitHub directly. For Zenodo (REST API, not git) the askpass pattern doesn’t apply, but the mode-600 temp-file discipline does.
vaibify/config/secretManager.py::fnStoreSecret / fsRetrieveSecret / fbSecretExists / fnDeleteSecret: host OS keyring backend. Change one thing: the current Overleaf integration uses a single keyring slot, vaibify:overleaf_token. This was a known smell (audit finding #7) and will collide across projects. For GitHub and Zenodo, namespace by service + project: key <service>_token:<projectOrRepoId>. Do this from the start.
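As a rough illustration of the two habits above (a namespaced keyring slot and an owner-only token file), here is a minimal sketch. The helper names fsBuildKeyringKey and fsWriteTokenFile and the `_vc_tok_` filename prefix are hypothetical and not taken from the vaibify source; only the `<service>_token:<projectOrRepoId>` key shape comes from this document.

```python
import os
import stat
import tempfile


def fsBuildKeyringKey(sService, sProjectId):
    """Return a keyring slot name namespaced by service and project."""
    if not sService or not sProjectId:
        raise ValueError("service and project id are both required")
    return f"{sService}_token:{sProjectId}"


def fsWriteTokenFile(sToken):
    """Write a token to an owner-only temp file and return its path."""
    # mkstemp already creates the file with owner-only permissions;
    # the chmod below just makes the mode-600 intent explicit.
    iFd, sPath = tempfile.mkstemp(prefix="_vc_tok_")  # prefix is an assumption
    try:
        with os.fdopen(iFd, "w") as fileToken:
            os.chmod(sPath, stat.S_IRUSR | stat.S_IWUSR)
            fileToken.write(sToken)
    except Exception:
        os.unlink(sPath)
        raise
    return sPath
```

The caller is expected to delete the file as soon as the request that needed it completes.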
Error classification
syncDispatcher.fdictClassifyError has extensible pattern lists (_LIST_AUTH_PATTERNS, _LIST_RATE_LIMIT_PATTERNS, etc.). Add service-specific patterns alongside; keep the dict shape {sErrorType, sMessage} identical across services so the frontend’s _DICT_SYNC_ERROR_MESSAGES can stay DRY.
syncDispatcher.fsRedactStderr (via overleafMirror) redacts credentials from stderr before surfacing to the UI. Always run service-boundary output through it or an equivalent.
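The pattern-list approach could look roughly like the sketch below. The pattern strings, the error-type labels ("auth", "rateLimit", "unknown"), and the function name are assumptions for illustration; only the list names and the {sErrorType, sMessage} shape come from this document.

```python
# Hypothetical pattern lists; the real ones live in syncDispatcher.
_LIST_AUTH_PATTERNS = ["authentication failed", "bad credentials"]
_LIST_RATE_LIMIT_PATTERNS = ["rate limit", "too many requests"]

# Order matters: the first group with a matching pattern wins.
_LIST_PATTERN_GROUPS = [
    ("auth", _LIST_AUTH_PATTERNS),
    ("rateLimit", _LIST_RATE_LIMIT_PATTERNS),
]


def fdictClassifyErrorSketch(sStderr):
    """Map raw (already redacted) stderr to {sErrorType, sMessage}."""
    sLower = sStderr.lower()
    for sErrorType, listPatterns in _LIST_PATTERN_GROUPS:
        if any(sPattern in sLower for sPattern in listPatterns):
            return {"sErrorType": sErrorType, "sMessage": sStderr.strip()}
    return {"sErrorType": "unknown", "sMessage": sStderr.strip()}
```

Adding a service then means appending patterns to the lists; the returned dict shape never changes.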
Git hardening (GitHub)
Every git clone, git fetch, and git ls-remote in vaibify must
carry these flags:
-c protocol.file.allow=never
-c protocol.allow=user
-c core.symlinks=false
-c submodule.recurse=false
and --no-recurse-submodules on clones. These defend against
malicious-repo attacks (.gitmodules with file:// URLs,
cross-tree symlinks, hook execution). The canonical list lives at
vaibify/reproducibility/gitHardening.py::LIST_GIT_HARDENING_CONFIG
and is imported by gui.gitStatus, reproducibility.overleafMirror,
and gui.syncDispatcher. reproducibility.overleafSync keeps a
local copy because it ships into the container as a standalone
script — keep the two lists in lockstep.
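A minimal sketch of applying the flags to every invocation follows. The list literal mirrors the four flags quoted in this section, but the actual shape of LIST_GIT_HARDENING_CONFIG in gitHardening.py is not shown here, and flistBuildHardenedGitCommand is a hypothetical helper.

```python
# Assumed shape: a flat argv fragment of alternating "-c" and "key=value".
LIST_GIT_HARDENING_CONFIG = [
    "-c", "protocol.file.allow=never",
    "-c", "protocol.allow=user",
    "-c", "core.symlinks=false",
    "-c", "submodule.recurse=false",
]


def flistBuildHardenedGitCommand(sSubcommand, listArguments):
    """Return a git argv with the hardening flags ahead of the subcommand."""
    listCommand = ["git"] + LIST_GIT_HARDENING_CONFIG + [sSubcommand]
    if sSubcommand == "clone":
        # Clones additionally refuse to recurse into submodules.
        listCommand.append("--no-recurse-submodules")
    return listCommand + list(listArguments)
```

Building the argv in one place makes it harder for a new call site to forget a flag.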
Credential helper scoping
Never mutate the container’s or host’s global git config. Always
use -c credential.https://<host>/.helper=... inline on the single
git command. The Overleaf implementation originally wrote a global
helper; removing it was a security fix. Don’t repeat the mistake.
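The inline scoping can be sketched as below. Both function names and the helper path are hypothetical; the only part taken from this document is the `-c credential.https://<host>/.helper=...` form.

```python
def flistBuildCredentialArguments(sHost, sHelperPath):
    """Return inline -c arguments scoping a credential helper to one host."""
    if not sHost or any(sChar in sHost for sChar in "/ \t\n"):
        raise ValueError(f"invalid host: {sHost!r}")
    return ["-c", f"credential.https://{sHost}/.helper={sHelperPath}"]


def flistBuildPushCommand(sHost, sHelperPath, sRemote, sBranch):
    """Return a git push argv that carries its own credential helper."""
    listCredential = flistBuildCredentialArguments(sHost, sHelperPath)
    # The helper applies to this one process only; no git config file
    # on the host or in the container is modified.
    return ["git"] + listCredential + ["push", sRemote, sBranch]
```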
Path validation (defense in depth)
Validate at three layers, every time:
Route (syncRoutes.py): pydantic types + explicit fnValidatePathWithinRoot against WORKSPACE_ROOT for every file path in the request. Also reject \x00, leading /, and .. segments for target directories.
Dispatcher (syncDispatcher.py): validate the projectId / repoId / conceptRecId regex before it reaches any filesystem or shell.
Container CLI (overleafSync.py): validate again before the pathlib.Path join, because Path('/tmp/clone') / '/Figures' evaluates to /Figures (pathlib’s absolute-RHS semantics, a silent misroute trap).
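The target-directory checks named above (NUL bytes, leading slash, `..` segments) can be sketched as follows. This is an illustrative stand-in, not the real fnValidateTargetDirectory; both function names here are hypothetical. PurePosixPath is used so the behavior is the same on any host OS.

```python
from pathlib import PurePosixPath


def fnCheckTargetDirectory(sTargetDirectory):
    """Raise ValueError for target directories that could escape the root."""
    if "\x00" in sTargetDirectory:
        raise ValueError("target directory contains a NUL byte")
    if sTargetDirectory.startswith("/"):
        raise ValueError("target directory must be relative")
    if ".." in PurePosixPath(sTargetDirectory).parts:
        raise ValueError("target directory contains a '..' segment")


def fpathJoinUnderRoot(sRoot, sTargetDirectory):
    """Validate first, then join; never the other way around."""
    fnCheckTargetDirectory(sTargetDirectory)
    return PurePosixPath(sRoot) / sTargetDirectory
```

Without the check, `PurePosixPath("/tmp/clone") / "/Figures"` silently yields `/Figures`, which is why validation has to run before the join.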
Symlink handling: on push, refuse any source where os.path.islink
is True. On pull, after clone, realpath-compare every file to the
repo root and refuse anything that escapes. Always pass
follow_symlinks=False to shutil.copy*.
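A minimal sketch of both symlink rules, assuming a POSIX filesystem. The three function names are hypothetical; the techniques (os.path.islink on push, realpath comparison on pull, follow_symlinks=False on copy) are the ones described above.

```python
import os
import shutil


def fbIsWithinRoot(sPath, sRoot):
    """Return True if sPath resolves to a location inside sRoot."""
    sRealRoot = os.path.realpath(sRoot)
    sRealPath = os.path.realpath(sPath)
    return os.path.commonpath([sRealRoot, sRealPath]) == sRealRoot


def fnCopyForPush(sSource, sDestination):
    """Copy one file for a push, refusing symlinked sources."""
    if os.path.islink(sSource):
        raise ValueError(f"refusing to push symlink: {sSource}")
    shutil.copy2(sSource, sDestination, follow_symlinks=False)


def flistFindEscapingEntries(sRepoRoot):
    """After a clone, list entries whose real path leaves the repo root."""
    listEscaping = []
    # os.walk does not descend into symlinked directories by default.
    for sDirectory, listDirNames, listFileNames in os.walk(sRepoRoot):
        for sName in listDirNames + listFileNames:
            sFullPath = os.path.join(sDirectory, sName)
            if not fbIsWithinRoot(sFullPath, sRepoRoot):
                listEscaping.append(sFullPath)
    return listEscaping
```

A pull would refuse to proceed if flistFindEscapingEntries returns anything.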
Host/container digest computation
When the frontend sends container-absolute paths to the server, the
server cannot compute digests host-side — those files don’t exist
there. syncDispatcher.fdictComputeContainerDigests runs a single
docker exec python3 -c "..." that hashes all requested files in
one round-trip. The same shape will work for GitHub (git blob SHAs)
and for Zenodo (file-content checksums; use whatever algorithm Zenodo
returns in its file-list API rather than assuming SHA256).
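The single-round-trip idea might be sketched like this. The inline script, the JSON output format, and all names are assumptions; the real fdictComputeContainerDigests may differ. SHA256 is used purely as an example algorithm.

```python
import json
import subprocess

# Runs inside the container. With "python3 -c", sys.argv[0] is "-c" and
# the file paths follow it.
S_DIGEST_SCRIPT = (
    "import hashlib, json, sys\n"
    "dictDigests = {}\n"
    "for sPath in sys.argv[1:]:\n"
    "    with open(sPath, 'rb') as fileIn:\n"
    "        dictDigests[sPath] = hashlib.sha256(fileIn.read()).hexdigest()\n"
    "print(json.dumps(dictDigests))\n"
)


def flistBuildDigestCommand(sContainerId, listPaths):
    """Return one docker exec argv that hashes every requested file."""
    listPrefix = ["docker", "exec", sContainerId, "python3", "-c"]
    return listPrefix + [S_DIGEST_SCRIPT] + list(listPaths)


def fdictComputeDigestsSketch(sContainerId, listPaths):
    """Run the command and parse the {path: hexdigest} JSON it prints."""
    resultRun = subprocess.run(
        flistBuildDigestCommand(sContainerId, listPaths),
        capture_output=True, text=True, check=True,
    )
    return json.loads(resultRun.stdout)
```

Passing paths as argv entries rather than interpolating them into the script keeps filenames out of the Python source string.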
Route layer patterns
Every state-changing endpoint is behind dictCtx["require"]() (CSRF
session token) AND the new _fbRequestHasAllowedHost middleware (DNS
rebinding defense). Keep both.
Frontend unified push modal
The current push modal (scriptSyncManager.js) is service-aware:
For Overleaf it renders a target-directory input, a diff summary (new / overwrite / unchanged with greyed-unchanged rows), a case-collision banner if applicable, a conflict banner with “Overwrite anyway” gating, and a “Push All” / “Push Selected” pair.
For GitHub and Zenodo today it renders a simpler list without diff.
When you add GitHub, reuse the Overleaf flow — GitHub diff maps cleanly. For Zenodo the diff concept is slightly different (deposits are bundled and versioned; the unit of comparison is usually “existing file with same name” vs “new file”). The modal can still host the same inline-status UI.
Gotchas likely to recur
Case folding (Overleaf-specific, but watch for echoes)
Overleaf’s underlying storage is case-insensitive. Its git bridge
surfaces both case-variants (Figures/ and figures/) as separate
tree entries with the same tree SHA. This produced a spectacular
“12 unchanged” debugging session. Detect and warn at the adapter
boundary: overleafMirror.flistDetectCaseCollisions returns a list
of {sLocalPath, sTypedRemotePath, sCanonicalRemotePath}. The diff
endpoint returns these plus sSuggestedTargetDirectory, and the
frontend shows a banner with a one-click “Use canonical case” button.
GitHub is case-sensitive in its storage, so this specific quirk probably won’t surface. But: don’t assume; Windows-hosted GitHub repos through GitHub Desktop can introduce case weirdness, and macOS development filesystems are case-insensitive. Add the detection anyway; it’s cheap.
Zenodo is REST-API-only with a flat file list per deposition, so no directory cases at all — irrelevant for Zenodo.
Silent no-op success (universal)
The most dangerous error class: a push that “succeeds” but changed
nothing remotely. Overleaf hit this when git status --porcelain
came back empty because files were copied to the wrong place (pathlib
absolute-join trap) or were byte-identical to existing remote files.
Every container CLI that mutates remote state must emit an
unambiguous status signal on stdout: overleafSync.py emits
PUSH_STATUS=pushed or PUSH_STATUS=no-changes, parsed host-side.
Apply the same pattern to GitHub push (commit count) and Zenodo
upload (did a new version get published? was anything new added to
a draft?).
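Host-side parsing of that signal could look like the sketch below. The function name and the choice to raise when the marker is missing are assumptions; the PUSH_STATUS= prefix and its two values come from this document.

```python
SET_PUSH_STATUSES = {"pushed", "no-changes"}


def fsParsePushStatus(sStdout):
    """Return the PUSH_STATUS value reported by the container CLI."""
    sStatus = None
    for sLine in sStdout.splitlines():
        if sLine.startswith("PUSH_STATUS="):
            # Keep the last occurrence if the marker is printed twice.
            sStatus = sLine[len("PUSH_STATUS="):].strip()
    if sStatus not in SET_PUSH_STATUSES:
        # A missing or unknown marker is treated as a failure, never
        # as a silent success.
        raise ValueError("container CLI did not report a valid PUSH_STATUS")
    return sStatus
```

Treating an absent marker as an error is what keeps a silent no-op from being reported as a successful push.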
Layer-cache masking of base-image bugs
Docker caches RUN layers aggressively. Once a base layer succeeds,
it gets reused forever until the cache is invalidated. We had an
apt/gpgv sandbox bug latent in every Ubuntu 24.04 base image we
built; it only surfaced when the user clicked Force Rebuild
(--no-cache) after weeks of cached-layer reuse. Lesson: don’t
interpret “this has been working for weeks” as “the layer is
correct.” When adding GitHub/Zenodo, if you change anything in the
Dockerfile (new packages, new config), test with Force Rebuild at
least once before shipping.
Container CLI hot-patching during dev
For fast iteration, docker cp vaibify/reproducibility/<cli>.py <container>:/usr/share/vaibify/<cli>.py patches the running container in place. The
Dockerfile’s COPY is the permanent solution. Expect this cycle
during development. Don’t be fooled when your host-side tests pass
but the container still runs an older CLI; run docker exec <cid> python3 /usr/share/vaibify/<cli>.py --help to check what the container actually has.
DNS rebinding / Host-header checks
Any localhost-bound server is vulnerable to DNS rebinding. We added
fbIsAllowedHostHeader middleware that rejects requests whose Host
header isn’t 127.0.0.1:<port>, localhost:<port>, or [::1]:<port>.
If you add any new endpoint, it gets this defense for free; don’t
undo it.
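The allow-list comparison itself is small. A minimal sketch, with a hypothetical function name and without the FastAPI middleware wiring:

```python
def fbIsAllowedHostSketch(sHostHeader, iPort):
    """Return True only for loopback Host headers on the expected port."""
    if not sHostHeader:
        return False
    setAllowed = {
        f"127.0.0.1:{iPort}",
        f"localhost:{iPort}",
        f"[::1]:{iPort}",
    }
    return sHostHeader.strip().lower() in setAllowed
```

Because a rebinding attack reaches the server under the attacker's hostname, an exact match against these three values is enough to reject it.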
Stderr leaks tokens
Git sometimes echoes URLs with embedded credentials on auth failure:
fatal: Authentication failed for 'https://git:<token>@github.com/...'.
The existing redactor handles URL creds + “password/token/bearer/authorization”
keyword lines. GitHub’s REST API may surface raw tokens in different
error shapes (JSON bodies with "message" fields); extend the
redactor to cover those before exposing to the UI.
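One way to cover differently shaped errors is to replace the just-used token string wherever it appears, in addition to the URL and keyword rules. The sketch below is an assumption about how that could be done, not the existing fsRedactStderr; the regexes and placeholder strings are illustrative.

```python
import re

# user:password@ (or token@) embedded in an http(s) URL.
_RE_URL_CREDENTIALS = re.compile(r"(https?://)[^/\s@]+@")
# Any whole line that mentions a credential keyword.
_RE_KEYWORD_LINE = re.compile(
    r"^.*\b(password|token|bearer|authorization)\b.*$",
    re.IGNORECASE | re.MULTILINE,
)


def fsRedactSketch(sText, sToken=None):
    """Redact credentials from service output before it reaches the UI."""
    if sToken:
        # Blanket replacement catches JSON bodies and bare-token lines.
        sText = sText.replace(sToken, "[REDACTED]")
    sText = _RE_URL_CREDENTIALS.sub(r"\1[REDACTED]@", sText)
    return _RE_KEYWORD_LINE.sub("[REDACTED LINE]", sText)
```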
Pathlib absolute-RHS trap
Path("/tmp/clone") / "/Figures" is /Figures, not
/tmp/clone/Figures. Pathlib discards the left side when the right
starts with a separator. Every target-directory validator MUST
reject leading slashes before any join. This was the root cause of
Overleaf’s “phantom push with no remote change” bug.
What to modularize vs. what to write fresh
Credential persistence: where each service’s token lives
| Service | Token stored in | Set by | Persists across container rebuild? |
|---|---|---|---|
| Overleaf | host OS keychain (…) | | Yes, host-side |
| GitHub | container keyring (…) | | Yes, via named credentials volume |
| Zenodo | container keyring (…) | | Yes, via named credentials volume |
The GUI’s Restart and Rebuild actions both do docker rm + docker run, so in-container writable layers are ephemeral. Container-side
credential stores must live on a named volume or they vanish every
time the user Rebuilds. The fix is a second named volume next to the
workspace mount:
containerManager._fnAddCredentialsVolume
mounts {projectName}-credentials at
/home/<user>/.local/share/python_keyring/. The Dockerfile
pre-creates the directory with mode 700 and the container user as
owner; Docker’s copy-on-mount copies that into the volume the first
time it’s used, and every subsequent container recreation sees the
existing tokens.
Already consolidated (three-service common core)
These lifted cleanly once the third service was in place:
fsWriteAskpassScript: the on-disk temp-file machinery (mkstemp + chmod 700 + return path) now lives in reproducibility/askpassHelper.py::fsWriteExecutableScript. Service-specific askpass source builders stay in githubAuth.py and overleafAuth.py.
LIST_GIT_HARDENING_CONFIG: single list of -c flags at reproducibility/gitHardening.py, imported by gui.gitStatus, reproducibility.overleafMirror, and gui.syncDispatcher. overleafSync.py keeps a local copy because it ships into the container standalone.
ZenodoClient: the host-side Zenodo API wrapper is also shipped into the container at /usr/share/vaibify/zenodoClient.py. The in-container archive script imports it flat and calls methods instead of reimplementing the HTTP surface. One bearer-token path for both host and container.
Still candidates to extract (when a fourth service lands)
fsRedactStderr helper (overleafMirror / overleafSync; the container-shipped copy is deliberately divergent).
fnValidateTargetDirectory (currently in overleafSync.py).
fnValidatePullRelativePath.
fdictComputeContainerDigests (the digest-compute docker-exec helper).
The PUSH_STATUS= + HEAD_SHA= stdout protocol.
Host header / session token middleware (already shared).
What we wrote fresh per service
GitHub — git-native like Overleaf but with a different authentication
story (gh auth token fallback, per-repo keyring slots). We wrote
githubAuth.py,
gitRoutes.py, and push flows in
syncDispatcher.py fresh, stole the hardening flags and askpass
machinery, and left overleafMirror.py Overleaf-specific (its quirks
are in the docstring for a reason). No shared serviceMirror.py
emerged — the only true overlap was the hardening flags, and those
lifted cleanly without a wrapper.
Zenodo — not git. REST API with Bearer tokens, deposits, and
newversion semantics. First pass implemented the whole API surface
inline inside a base64-encoded container script because
ZenodoClient needed host-only imports. That left us with two
implementations of the same Bearer-token logic, which is the
cross-cutting concern summarized below.
Ship the host client, don’t reimplement
When a host module wraps an external REST API cleanly and a
container script needs the same API, ship the host module into
the container (the overleafSync.py flat-file pattern) instead
of reimplementing the HTTP calls inline. The cost is one line in
docker/Dockerfile and one entry in
fnCopyContainerScripts; the win
is a single source of truth for:
Bearer-token auth construction
HTTP error classification (_fnCheckResponse → typed exceptions)
Response-shape edge cases (links.latest_draft, nested id fields, 204 handling)
Future retry / timeout policy
Prerequisites for ship-in: the module must import only
container-resident packages at top level (no vaibify.*, no
optional deps like tqdm unless you lazy-import them). Use the
same try: from vaibify.reproducibility.X import ... except ImportError: from X import ... fallback pattern overleafSync uses
for its siblings.
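The fallback can be illustrated with a dynamic equivalent of the static try/except form. The helper name is hypothetical and the module names in the usage comment are placeholders; real modules would normally use the literal `try: from ... import ...` statement quoted above.

```python
import importlib


def fmoduleImportWithFallback(sQualifiedName, sFlatName):
    """Import the package-qualified module, else the flat shipped copy."""
    try:
        # Host side: the module is importable through its package path.
        return importlib.import_module(sQualifiedName)
    except ImportError:
        # Container side: the package is absent, so only the flat file
        # sitting next to the running script can be imported.
        return importlib.import_module(sFlatName)


# Hypothetical usage inside a container-shipped script:
# moduleClient = fmoduleImportWithFallback(
#     "vaibify.reproducibility.zenodoClient", "zenodoClient")
```

Either form only works if the module's own top-level imports are container-resident, which is the prerequisite stated above.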
Zenodo landed this second pass in commit d0358c3: ZenodoClient
grew optional sToken / sBaseUrl kwargs, tqdm moved to lazy
imports, the archive script dropped from ~90 lines of inline HTTP
to ~55 lines of method calls.
Do NOT modularize yet (wait for four)
These felt common across Overleaf and GitHub but may not transfer when the fourth service lands:
flistDetectCaseCollisions: Overleaf-specific case folding; didn’t need it for GitHub.
The specific OverleafBehavior fixture pattern: worth replicating per service, but don’t force a shared API.
Target-directory selection UI: GitHub has branches, not directories; Zenodo has no directory concept at all.
Rule of thumb: three concrete instances is the right time to abstract. Two is premature; four starts to feel like you’re fighting divergence.
Architectural invariants to respect
Run these after every change:
python -m pytest tests/testArchitecturalInvariants.py -v
The ones most relevant to new services:
testNoRawFetchInFeatureModules: use VaibifyApi.* wrappers in the frontend.
testDirectorUsesOsPath: host-side Python uses os.path.
testLeafModuleHasNoIntraPackageImports: don’t add vaibify-gui imports to any file that ships into the container.
testEveryJsFileIsRecognizedAsIIFE: register new JS modules in index.html and follow the IIFE convention.
Testing discipline
Mock subprocess.run at the module boundary. Never invoke real git or make real network calls in unit tests.
Use tmp_path fixtures for anything filesystem-related.
Behavior-adapter tests (testOverleafBehavior.py): static fixture strings that simulate the external service’s output, asserting the adapter interprets them correctly. These fail loudly when the external service changes. Create one of these per service.
Route tests use FastAPI’s TestClient with sessions; testSyncRoutesCoverage.py is the model.
Don’t weaken existing tests to make new ones pass. If a security fix makes an existing test’s input now invalid, update the test narrowly to use valid input that still exercises the same behavior.
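A minimal sketch of the first rule, using only the standard library. fsGetHeadSha is a hypothetical function under test; in the real codebase the patch target would be the subprocess reference inside the vaibify module being tested rather than the global one.

```python
import subprocess
from unittest import mock


def fsGetHeadSha(sRepoDirectory):
    """Return the current HEAD commit hash of a local repository."""
    resultRun = subprocess.run(
        ["git", "-C", sRepoDirectory, "rev-parse", "HEAD"],
        capture_output=True, text=True, check=True,
    )
    return resultRun.stdout.strip()


def testGetHeadShaParsesOutput():
    """No real git runs: subprocess.run is replaced for the whole call."""
    mockResult = mock.Mock(stdout="abc123\n", returncode=0)
    with mock.patch("subprocess.run", return_value=mockResult) as mockRun:
        assert fsGetHeadSha("/repo") == "abc123"
    # Also assert on the argv so a dropped flag fails the test.
    listCommand = mockRun.call_args.args[0]
    assert listCommand[:3] == ["git", "-C", "/repo"]
```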
Known follow-ups from the security audit
These were flagged but deferred during Overleaf’s final push. They will almost certainly bite GitHub and Zenodo too; fix them during those integrations rather than leaving three partial implementations:
Single keyring slot across projects: namespace by service + project from day one.
Token files leak to /tmp on SIGKILL: add a startup sweep of /tmp/_vc_*tok* and /tmp/vc_askpass_*.
fsRedactStderr misses bare-token lines: when a service emits a raw token on a line by itself (no label keyword), redaction won’t catch it. Consider blanket-replacing the just-used token string.
Pydantic models without extra="forbid": hardening, not exploitable.
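The startup sweep suggested above might be sketched like this. The two glob patterns come from this document; the function name, the regular-file check, and the idea of returning the removed paths are assumptions. A real implementation would also need to avoid deleting files that belong to another vaibify instance that is still running.

```python
import glob
import os

LIST_STALE_PATTERNS = ["_vc_*tok*", "vc_askpass_*"]


def flistSweepStaleTokenFiles(sTempDirectory="/tmp"):
    """Delete leftover token and askpass files; return what was removed."""
    listRemoved = []
    for sPattern in LIST_STALE_PATTERNS:
        for sPath in glob.glob(os.path.join(sTempDirectory, sPattern)):
            # Only touch regular files; skip symlinks and directories.
            if os.path.isfile(sPath) and not os.path.islink(sPath):
                os.unlink(sPath)
                listRemoved.append(sPath)
    return sorted(listRemoved)
```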
Recommended sequence
Read this doc.
Read the Overleaf implementation top to bottom, especially overleafMirror.py and syncDispatcher.py.
Read the two relevant plan files in .claude/plans/ for the Overleaf push and mirror plans; they show the level of detail expected.
Write GitHub first (closer shape to Overleaf), landing in small commits.
Write Zenodo (different shape entirely).
Extract shared helpers into a serviceAuth.py / serviceMirror.py after both are working.
Run the security audit prompt (ask the user for the one we used on Overleaf) against the new code.
Good luck. The Overleaf round took longer than estimated because of the seven or eight quirks documented above; budget accordingly for the next two services.