> For the complete documentation index, see [llms.txt](https://docs.etiq.ai/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://docs.etiq.ai/agent-instruction.md).

# Agent Instruction

This page gives AI agents the minimum operating instructions needed to use Etiq against a Python script.

### Use This File When

Use this file when an agent needs to:

* scan a Python script with Etiq
* generate a lineage graph
* inspect captured lineage objects
* return a structured summary of scan outputs

### Purpose

Use Etiq to scan a Python file and its run, generate lineage, and retrieve captured lineage objects such as dataframes, models, and agent states.

### Project Context

* Product name: Etiq
* Install package: `etiq-copilot`
* Python import namespace: `etiq_copilot`
* Main scan result object: `CodeScannerResult`, returned by `DebuggerCodeScanner().scan_code(...)`
* Example target file: `test_repo/iris_pipeline.py`
* Related docs:
  * Quickstart
  * Core Concepts
  * Scan Outputs

### Setup Commands

Install with `pip`:

```bash
pip install etiq-copilot
```

Install with `uv`:

```bash
uv pip install etiq-copilot
```
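
Before asking for approval to install, an agent may want to check whether the package is already importable. The following is a minimal sketch using only the standard library; the helper name `is_importable` is hypothetical and not part of Etiq.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # find_spec can raise for broken parent packages or modules without a spec.
        return False


# `etiq_copilot` is the import namespace listed under Project Context.
if not is_importable("etiq_copilot"):
    print("etiq_copilot is not importable; ask for approval before installing.")
```

Checking with `find_spec` avoids running the package's import-time code just to test for its presence.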

### Files That Matter

* `docs/etiq/quickstart.md`: scanner setup and minimal file scan example
* `docs/etiq/core-concepts.md`: lineage graph and lineage object concepts
* `docs/etiq/scan-results.md`: `CodeScannerResult` methods, captured states, source nodes, and larger-codebase entry-file scans
* `test_repo/iris_pipeline.py`: example target script

### Scan Helper

Use this helper to scan a target file and return a `CodeScannerResult`.

```python
from pathlib import Path

from etiq_copilot.engine.implementations.scanner.code_scanner import DebuggerCodeScanner
from etiq_copilot.engine.implementations.scanner.scan_results import CodeScannerResult


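def scan_file(scan_file_path: Path | str) -> CodeScannerResult:
    """Read a Python file and scan its source with Etiq."""
    scan_file_path = Path(scan_file_path)
    # scan_code receives the source text via code_str, so read the file first.
    original_code = scan_file_path.read_text(encoding="utf-8")
    scanner = DebuggerCodeScanner()
    return scanner.scan_code(code_str=original_code)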
```

### Minimum Working Routine

```python
target_file = "test_repo/iris_pipeline.py"
scan_results = scan_file(target_file)

result = {
    "target_file": target_file,
    "scan_errors": scan_results.scan_errors,
    "lineage_objects": {
        "datasets": scan_results.list_dataframes(),
        "models": scan_results.list_models(),
        "agents": scan_results.list_agents(),
    },
    "source_evidence": [
        {
            "names": sorted(state.names),
            "source": state.node.as_string(),
            "scope": type(state.node.scope()).__name__,
        }
        for state in scan_results.get_dataframes()
    ],
    "lineage_graph": {
        "format": "json",
        "value": scan_results.create_full_lineage_graph(graph_format="json"),
    },
}
```

### Default Workflow

1. Set the target file path explicitly.
2. Scan the entry file with `scan_file(...)`.
3. Read `scan_results.scan_errors`.
4. If `scan_errors` is non-empty, report them before drawing conclusions.
5. Generate lineage with `create_full_lineage_graph(graph_format="json")` when another tool or agent needs to parse the graph.
6. List captured lineage objects with `list_dataframes()`, `list_models()`, and `list_agents()`.
7. Retrieve full state objects only when names or graph output are not enough.
8. Use `state.node.as_string()` and `state.node.scope()` only when source evidence is needed.

```python
target_file = "test_repo/iris_pipeline.py"
scan_results = scan_file(target_file)

scan_errors = scan_results.scan_errors
lineage_json = scan_results.create_full_lineage_graph(graph_format="json")

lineage_objects = {
    "datasets": scan_results.list_dataframes(),
    "models": scan_results.list_models(),
    "agents": scan_results.list_agents(),
}

source_evidence = [
    {
        "names": sorted(state.names),
        "source": state.node.as_string(),
        "scope": type(state.node.scope()).__name__,
    }
    for state in scan_results.get_dataframes()
]
```
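
Steps 3 and 4 can be sketched as a small gate that runs before any lineage summary. This assumes only that `scan_errors` is empty or falsy after a clean scan; its concrete type is not documented on this page, so each entry is rendered with `str()`. The helper name `format_scan_errors` is hypothetical, and the sample list stands in for `scan_results.scan_errors`.

```python
from collections.abc import Iterable, Mapping


def format_scan_errors(scan_errors: object) -> list[str]:
    """Render scan errors as strings without assuming their concrete type."""
    if not scan_errors:
        return []
    if isinstance(scan_errors, str):
        return [scan_errors]
    if isinstance(scan_errors, Mapping):
        return [f"{key}: {value}" for key, value in scan_errors.items()]
    if isinstance(scan_errors, Iterable):
        return [str(error) for error in scan_errors]
    return [str(scan_errors)]


# Stand-in for scan_results.scan_errors (assumption: real entries may differ).
messages = format_scan_errors(["example error"])
if messages:
    print("Scan errors reported before lineage summary:")
    for message in messages:
        print(" -", message)
```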

### API Commands

Use these public methods and properties.

| When you need to               | Use                                                           | Output                     |
| ------------------------------ | ------------------------------------------------------------- | -------------------------- |
| Check scan issues              | `scan_results.scan_errors`                                    | Scan error details         |
| Generate parseable lineage     | `scan_results.create_full_lineage_graph(graph_format="json")` | JSON graph string          |
| Generate visual lineage source | `scan_results.create_full_lineage_graph(graph_format="dot")`  | DOT graph string           |
| List dataset lineage objects   | `scan_results.list_dataframes()`                              | `list[str]`                |
| Retrieve dataset states        | `scan_results.get_dataframes()`                               | Dataframe state objects    |
| List model lineage objects     | `scan_results.list_models()`                                  | `list[str]`                |
| Retrieve model states          | `scan_results.get_models()`                                   | Model state objects        |
| List agent lineage objects     | `scan_results.list_agents()`                                  | `list[str]`                |
| Retrieve agent states          | `scan_results.get_agent_states()`                             | Agent state objects        |
| Retrieve other captured states | `scan_results.get_unstructured_states()`                      | Unstructured state objects |

### Task Recipes

#### Summarize A Target Script

```python
target_file = "test_repo/iris_pipeline.py"
scan_results = scan_file(target_file)

dataframe_states = scan_results.get_dataframes()
model_states = scan_results.get_models()
agent_states = scan_results.get_agent_states()
unstructured_states = scan_results.get_unstructured_states()

summary = {
    "target_file": target_file,
    "scan_errors": scan_results.scan_errors,
    "dataframes": scan_results.list_dataframes(),
    "models": scan_results.list_models(),
    "agents": scan_results.list_agents(),
    "counts": {
        "dataframes": len(dataframe_states),
        "models": len(model_states),
        "agents": len(agent_states),
        "unstructured": len(unstructured_states),
    },
    "states": [
        {
            "names": sorted(state.names),
            "state_type": type(state).__name__,
            "line_no": state.line_no,
            "node_type": type(state.node).__name__,
            "source": state.node.as_string(),
            "scope": type(state.node.scope()).__name__,
        }
        for state in dataframe_states
    ],
}
```

#### Generate Lineage For Another Agent

```python
lineage_json = scan_results.create_full_lineage_graph(graph_format="json")
```

Return the graph as a string, and note that its schema should not be treated as stable until a versioned schema is published.
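
Because the schema is not yet versioned, a consuming agent can confirm that the string is valid JSON and describe only its top-level shape rather than depend on specific keys. The following is a minimal sketch; the helper name `describe_graph_json` and the sample payload are hypothetical and do not reflect Etiq's actual graph schema.

```python
import json


def describe_graph_json(graph_json: str) -> dict:
    """Summarize a JSON string's top-level shape without assuming a schema."""
    try:
        parsed = json.loads(graph_json)
    except json.JSONDecodeError as exc:
        return {"valid_json": False, "error": str(exc)}

    summary = {"valid_json": True, "top_level_type": type(parsed).__name__}
    if isinstance(parsed, dict):
        summary["top_level_keys"] = sorted(parsed)
    elif isinstance(parsed, list):
        summary["length"] = len(parsed)
    return summary


# Hypothetical payload used only to exercise the helper.
sample = '{"nodes": [], "edges": []}'
print(describe_graph_json(sample))
```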

#### Inspect Dataset State Objects

```python
for state in scan_results.get_dataframes():
    print(state)
```

Use state objects when the agent needs captured values, source evidence, or Etiq metadata for a lineage object.

#### Inspect Source Evidence

```python
for state in scan_results.get_dataframes():
    print("names:", state.names)
    print("source:", state.node.as_string())
    print("scope:", type(state.node.scope()).__name__)
```

Use this when the user asks where a lineage object came from in the code.
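
To answer that question for one specific object, the states can be filtered by name first. The following is a minimal sketch that relies only on the `names` attribute used above; `find_states_by_name` is a hypothetical helper, and the `SimpleNamespace` objects and their names are stand-ins for real state objects so the example runs without Etiq.

```python
from types import SimpleNamespace
from typing import Iterable


def find_states_by_name(states: Iterable, name: str) -> list:
    """Return the states whose `names` collection contains `name`."""
    return [state for state in states if name in state.names]


# Stand-ins for the objects returned by scan_results.get_dataframes().
fake_states = [
    SimpleNamespace(names={"iris_df"}),
    SimpleNamespace(names={"X_train", "features"}),
]

matches = find_states_by_name(fake_states, "X_train")
print([sorted(state.names) for state in matches])
```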

#### Inspect Model State Objects

```python
for state in scan_results.get_models():
    print(state)
```

Use this when the user asks which models were created, used, or captured.

### Decision Rules

* If the user asks for lineage output that another tool will parse, use `graph_format="json"`.
* If the user asks for visualization-oriented output, use `graph_format="dot"`.
* If `scan_errors` is non-empty, report the errors before summarizing lineage.
* If names from `list_dataframes()`, `list_models()`, or `list_agents()` are sufficient, do not retrieve full state objects.
* If the target workflow spans multiple files, scan the entry file that starts the run.
* If source evidence or captured values are needed, retrieve state objects with the corresponding `get_*` method and inspect `state.node`.
* If the task requires installing Etiq, ask for approval before running an install command.
* If the task requires deleting files, changing public APIs, or modifying generated output, ask before proceeding.

### Do Not

* Do not rely on private internals.
* Do not assume lineage objects will always be dataframes or models.
* Do not assume graph JSON has a stable schema until that schema is published.
* Do not ignore `scan_errors`.
* Do not install dependencies without approval.
* Do not reformat unrelated files when editing docs or examples.

### Completion Criteria

Before finishing an agent task, report:

* target file scanned
* scan errors, or that no scan errors were reported
* lineage object names found
* graph format generated, if any
* commands run
* checks skipped and why
* known limitations or follow-up

Use this response shape when returning structured results:

```python
result = {
    "target_file": target_file,
    "scan_errors": scan_results.scan_errors,
    "lineage_objects": {
        "datasets": scan_results.list_dataframes(),
        "models": scan_results.list_models(),
        "agents": scan_results.list_agents(),
    },
    "source_evidence": [
        {
            "names": sorted(state.names),
            "source": state.node.as_string(),
            "scope": type(state.node.scope()).__name__,
        }
        for state in scan_results.get_dataframes()
    ],
    "lineage_graph": {
        "format": "json",
        "value": lineage_json,
    },
}
```
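
If the result has to be handed to another process, it can be serialized to JSON. The following is a minimal sketch: the type of the `scan_errors` entries is not documented on this page, so `default=str` is used as a fallback for values that `json` cannot encode natively. The `example_result` dictionary and the name `iris_df` are hand-written stand-ins, not real scan output.

```python
import json

# Hand-written stand-in for the `result` dictionary shown above.
example_result = {
    "target_file": "test_repo/iris_pipeline.py",
    "scan_errors": [ValueError("example error")],  # not JSON-native on purpose
    "lineage_objects": {"datasets": ["iris_df"], "models": [], "agents": []},
    "lineage_graph": {"format": "json", "value": "{}"},
}

# default=str converts any value json cannot encode into its string form.
payload = json.dumps(example_result, default=str, indent=2)
restored = json.loads(payload)
print(restored["scan_errors"])
```

The `default=str` fallback is lossy, so it suits reporting rather than round-tripping rich objects.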


---

## Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.etiq.ai/agent-instruction.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
