# Philosophy This document explains the motivation behind Vaibify and how it enables AI-assisted scientific computing. For the implementation that flows from these beliefs, see [architecture.md](architecture.md). For the methodology behind the project's own agent documentation, see [vibeCoding.md](vibeCoding.md). ## Harnessing Coding Agents for Scientific Analyses AI coding agents are remarkably capable and remarkably dangerous. An agent running loose in a researcher's home directory can read SSH keys, overwrite datasets built over months, or push a half-finished branch to a shared repository. Treating a scientific workstation as an agent's playground places years of careful work at risk for a few hours of convenience. Vaibify's first job is to contain AI agents to a secure environment. Once contained, you can safely allow AI coding agents to freely build code, monitor long-running jobs, and generate figures and TeX for manuscripts. Experiment with confidence and push the limits of the combined abilities of your brain and AI. Vaibify is based on the assumption that code is written by agents now. While access to an IDE is included, vaibify is optimized for command-line agents like Claude Code. The command line is ground truth for the operating system. It will always be the fundamental representation of a filesystem. Agent-written code must pass different validation tests than human-written code, and AI-assisted science must pass higher thresholds than human-only science. The central assumption of vaibify is: > *Visually verifying the outputs of AI-written software is a faster > and more reliable path to good science than analyzing data with > only human-written code.* The value proposition follows: after an hour of setup, users can obtain better results in a shorter amount of time than they could working from a human-only codebase. This is a falsifiable claim. If it turns out that code written only by humans reliably outperforms code written by agents and verified by humans, the whole architecture of vaibify is wasted effort. The bet is that verification of outputs is the part of science that genuinely benefits from human attention, and that delegating the mechanical code-writing to agents frees a researcher to spend more time on the parts that actually matter: asking the right questions, inspecting the results, and deciding what to do next. Crucially, however, an experiment that vaibify has deemed "verified" is not necessarily accurate. Vaibify can only monitor the contents of files; the interpretation of the results is still up to you. All vaibify can do is that your results are self-consistent locally and in your remote repositories ## How the results can be trusted The same way they always have been: through visually inspecting scripts, plots, and even raw data. Vaibify is more explicit about the practice because an agent has an outsized capacity to veer off course and change scripts or files that apply to other steps, silently breaking downstream work. Three mechanisms operate together: - **Rigorous testing of produced datasets.** Every step produces outputs that are checked against deterministic assertions at test time. When a file changes, its associated tests become stale and the test must be re-run. - **Dependency monitoring.** When an upstream step is modified, every downstream step is flagged as potentially invalidated. The researcher sees which steps need attention. - **Per-step sign-off by the user.** After inspecting the outputs of a step, the researcher marks it verified. That sign-off is a first-class artifact alongside the code and the data, not a memory in somebody's head. The three together close the loop: an AI writes the code, the container constrains where it can reach, the test suite catches regressions, dependency tracking surfaces knock-on effects, and the researcher signs off on each step only after looking at it. ## AI-assisted development as practice Vaibify is developed primarily through AI-assisted coding with Claude Code, under human review and direction. This is not incidental to the project — it is the practice the tool was built to enable. Vaibify's own codebase is one of the first tests of the central assumption above. The structure of the documentation in this repository mirrors the same principles the tool asks researchers to adopt: separate what can be tested from what cannot, never trust what can drift, and make the human's verification role explicit. [vibeCoding.md](vibeCoding.md) describes this methodology in detail so that other scientific software projects can adopt the same approach. ## The tagline is the architecture spec The tagline *"Vibe boldly. Verify everything."* is not marketing. It is the architectural specification for the entire project. Bold vibing happens inside a Docker container the agent cannot escape. Verification happens in a browser dashboard that makes the researcher's "yes, I looked at this" a first-class artifact alongside the code and the data. Every design choice downstream — the containerization model, the verification state machine, the polling cadence, the rule that the dashboard never lies — falls out of taking both halves of the tagline seriously at the same time. For how that specification becomes code, read [architecture.md](architecture.md) next.