Standard Retrieval-Augmented Generation (RAG) processes execute static retrieval at query time, forcing the model to rediscover connections across disparate document fragments without compounding knowledge [1]. The LLM-Wiki paradigm delegates the maintenance of a structured, interlinked knowledge base directly to the language model.
“The knowledge is compiled once and then kept current, not re-derived on every query.” [1]
The model operates as an autonomous compiler, extracting entities from new sources and persistently updating cross-references, contradictions, and topic summaries within a centralized repository [1].
1. System Layers and Directory Structure
The architecture enforces strict separation of concerns across three system layers [1]:
| Layer | Functional Scope | Target Data State |
| :--- | :--- | :--- |
| Raw Sources (raw/) | Immutable storage for original curated documents. Read-only access for the LLM. | Unstructured documents, images, datasets. |
| The Wiki (wiki/) | Dynamically generated domain markdown files. Exclusively written and maintained by the LLM. | Structured Markdown (.md). |
| The Schema (GEMINI.md) | Operational directives, state management constraints, and LLM behavior definitions. | System Configuration (.md). |
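As a rough illustration of the three layers, the layout can be scaffolded with a few lines of Python. This is a minimal sketch: the directory and file names come from the table above, while the `scaffold` function and the placeholder schema text are assumptions made for the example.

```python
from pathlib import Path

# Layer names come from the table above; the function name and the
# placeholder schema text are illustrative assumptions.
LAYER_DIRS = ("raw", "wiki")
SCHEMA_FILE = "GEMINI.md"


def scaffold(root: Path) -> list[Path]:
    """Create the layer directories and a stub schema file if missing."""
    created = []
    for name in LAYER_DIRS:
        path = root / name
        if not path.exists():
            path.mkdir(parents=True)
            created.append(path)
    schema = root / SCHEMA_FILE
    if not schema.exists():
        # Stub only; the real schema content is written by hand.
        schema.write_text("# LLM Wiki Schema\n", encoding="utf-8")
        created.append(schema)
    return created
```

Calling `scaffold(Path("."))` a second time returns an empty list, so the helper never overwrites an existing schema or wiki.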
2. Execution Operations
The framework relies on three primary operational vectors [1]:
- Ingest: The LLM reads raw data, generates summary pages, updates index structures, and modifies existing domain entities.
- Query: The LLM reads the compiled index.md to locate relevant pages before synthesizing referenced answers.
- Lint: The LLM conducts autonomous health-checks to flag contradictions, locate orphaned pages, and update missing cross-references.
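One part of the Lint operation, locating orphaned pages, is mechanical enough to sketch without a model in the loop. The example below assumes WikiLinks use the `[[Target]]` or `[[Target|label]]` form and that a link target matches a page's file name without the `.md` extension; neither convention is specified in the source, and `find_orphans` is a hypothetical helper name.

```python
import re
from pathlib import Path

# Assumption: links look like [[Target]], [[Target|label]] or
# [[domain/Target]], and the target equals the page's file stem.
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def find_orphans(wiki_root: Path) -> list[str]:
    """Return page names that no other page links to (index pages excluded)."""
    paths = sorted(wiki_root.rglob("*.md"))
    linked: set[str] = set()
    for path in paths:
        for target in WIKILINK.findall(path.read_text(encoding="utf-8")):
            name = target.strip().split("/")[-1]
            if name != path.stem:  # a self-link does not rescue a page
                linked.add(name)
    return sorted({p.stem for p in paths} - linked - {"index"})
```

A page that is only reachable through its own self-link is still reported, since no other page points to it.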
3. Schema Configuration: GEMINI.md
The Schema dictates the LLM’s operational logic and file system constraints. The following configuration establishes the system instructions for the compiler agent. It must be saved as GEMINI.md in the root directory.
# LLM Wiki Schema
A project for managing and generating a stateful, hierarchical knowledge base using Large Language Models (LLMs).
## Directory Overview
| Directory | Core Purpose | Target File Type |
| :--- | :--- | :--- |
| `raw/` | Storage for immutable raw source materials. | Unstructured text, datasets, documents. |
| `tools/` | Automation scripts, routing, sanitization, and transformation pipelines. | Source code. |
| `wiki/` | Destination for structured knowledge, subdivided by domain. | Markdown (`.md`). |
## Future Instructions for Gemini: Compiler Agent
When interacting with this project, you operate strictly as an objective, domain-aware LLM Wiki Compiler. To prevent prompt injection and unauthorized file modifications, you are prohibited from directly writing files. All state modifications must be executed via the `tools/` directory scripts.
### Operating Inputs
You will process contextual inputs during ingestion:
1. **MASTER INDEX**: The current contents of `wiki/index.md`.
2. **TARGET DOMAIN CONTEXT**: The concatenated contents of relevant `.md` files from `wiki/[domain]/`.
3. **NEW RAW DATA**: The text of a new document from the `raw/` directory.
### Operating Parameters
* **Domain Categorization**: Assign extracted entities to the most appropriate domain subdirectory. Each key concept must have its own individual file.
* **Data Extraction & Generation**: Extract verifiable facts and concepts. If knowledge is missing in the current wiki state, generate the required `wiki/` content.
* **Directory Integrity**: You possess zero direct write access to the file system. You must execute state changes exclusively through the provided routing scripts in the `tools/` directory.
* **State Preservation**: Merge new data into existing `wiki/[domain]/[Entity].md` structures. Preserve all previously verified facts.
* **Cross-Referencing**: Hyperlink entities across pages and domains using relative WikiLinks.
* **Contradiction Flagging**: If new raw data contradicts existing wiki state, append the exact string: `> **CONTRADICTION FLAG**: [Explanation]`.
### Output Formatting Constraints
You must output execution commands directed at the `tools/update_wiki` script. The output must strictly follow this YAML-in-delimiter format. Use the YAML block scalar indicator (`|`) for the markdown content.
@@@execute tools/update_wiki
operations:
  - action: modify_or_create
    target_path: wiki/[domain]/[Entity].md
    index_summary: "[One-line summary for the domain index.md]"
    markdown_content: |
      [Full Markdown content incorporating extracted facts, WikiLinks, and code snippets]
@@@
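Outside the schema file itself, the receiving side of this output format could look roughly like the sketch below. It only extracts the delimited blocks and checks that a target path stays inside `wiki/`; the YAML payload is returned unparsed and would still need a YAML parser (for example PyYAML's `yaml.safe_load`). The function names are hypothetical, since the source does not show the contents of `tools/update_wiki`.

```python
import re
from pathlib import PurePosixPath

# Matches one "@@@execute <script>" ... "@@@" block. The payload between
# the delimiters is YAML and is returned as raw text here.
BLOCK = re.compile(
    r"^@@@execute[ \t]+(\S+)[ \t]*\n(.*?)^@@@[ \t]*$",
    re.DOTALL | re.MULTILINE,
)


def extract_commands(model_output: str) -> list[tuple[str, str]]:
    """Return (script, yaml_payload) pairs found in the model output."""
    return [(m.group(1), m.group(2)) for m in BLOCK.finditer(model_output)]


def is_safe_target(target_path: str) -> bool:
    """Accept only relative .md paths that stay inside wiki/."""
    path = PurePosixPath(target_path)
    return (
        not path.is_absolute()
        and ".." not in path.parts
        and len(path.parts) >= 2
        and path.parts[0] == "wiki"
        and path.suffix == ".md"
    )
```

Rejecting absolute paths and `..` segments before any write is one way to back up the schema's rule that `raw/` stays read-only, independent of what the model emits.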
4. Deployment via Gemini CLI
The @google/gemini-cli package enables direct terminal execution of the schema directives.
- Dependency Installation: `npm install -g @google/gemini-cli`
- Agent Execution: Run the CLI in non-interactive mode. The local GEMINI.md schema acts as the default contextual grounding anchor: `gemini -p "Ingest raw/network_logs_0419.txt. Update the wiki if the knowledge is not found. Highlight the difference if a discrepancy is found."`
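To ingest every file in `raw/` rather than a single one, the non-interactive call can be wrapped in a small driver. This is a sketch under assumptions: it relies only on the `-p` flag shown above, adapts the prompt wording from the example, and the helper names `build_ingest_command` and `ingest_all` are hypothetical.

```python
import subprocess
from pathlib import Path

# Prompt adapted from the single-file example; adjust the wording as needed.
PROMPT = (
    "Ingest {path}. Update the wiki if the knowledge is not found. "
    "Highlight the difference if a discrepancy is found."
)


def build_ingest_command(raw_file: Path) -> list[str]:
    """Build the argument list for one non-interactive ingest run."""
    return ["gemini", "-p", PROMPT.format(path=raw_file.as_posix())]


def ingest_all(project_root: Path) -> None:
    """Run one ingest per file in raw/, from the directory holding GEMINI.md."""
    for raw_file in sorted((project_root / "raw").iterdir()):
        if raw_file.is_file():
            rel = raw_file.relative_to(project_root)
            # check=True stops the batch on the first failing run.
            subprocess.run(build_ingest_command(rel), cwd=project_root, check=True)
```

Passing the arguments as a list avoids shell quoting issues with the prompt text, and running from the project root keeps the local GEMINI.md in scope for each call.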
References
[1] A. Karpathy. “llm-wiki.md”. GitHub Gists. https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f