> For the complete documentation index, see [llms.txt](https://docs.etiq.ai/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://docs.etiq.ai/agent-instruction.md).

# Agent Instruction

This page gives AI agents the minimum operating instructions needed to use Etiq against a Python script.

### Use This File When

Use this file when an agent needs to:

* scan a Python script with Etiq
* generate a lineage graph
* inspect captured lineage objects
* return a structured summary of scan outputs

### Purpose

Use Etiq to scan a Python file and its run, generate lineage, and retrieve captured lineage objects such as dataframes, models, unstructured data, and agent states.

### Project Context

* Product name: Etiq
* Install package: `etiq-copilot`
* Python import namespace: `etiq_copilot`
* Main scan result object: `CodeScannerResult`, returned by `DebuggerCodeScanner().scan_code(code_str=source_code)`
* Example target file: `test_repo/iris_pipeline.py`
* Related docs:
  * Quickstart
  * Core Concepts
  * Scan Outputs
  * Agentic Workflows

### Setup Commands

Install with `pip`:

```bash
pip install etiq-copilot
```

Install with `uv`:

```bash
uv pip install etiq-copilot
```
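
Before running either install command, an agent may want to check whether the package is already importable. The sketch below uses only the standard library; the helper name `is_package_importable` is hypothetical and not part of Etiq.

```python
from importlib.util import find_spec


def is_package_importable(namespace: str) -> bool:
    """Return True if a top-level package can be located, without importing it."""
    return find_spec(namespace) is not None


# `etiq_copilot` is the import namespace named on this page.
if is_package_importable("etiq_copilot"):
    print("etiq_copilot is available; no install needed.")
else:
    print("etiq_copilot is not installed; ask for approval before installing.")
```

This keeps the install step explicit: the check has no side effects, so it can run before approval is requested.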

### Files That Matter

* `docs/etiq/quickstart.md`: scanner setup and minimal file scan example
* `docs/etiq/core-concepts.md`: lineage graph and lineage object concepts
* `docs/etiq/scan-results.md`: `CodeScannerResult` methods, captured states, source nodes, and larger-codebase entry-file scans
* `test_repo/iris_pipeline.py`: example target script

### Scan Helper

Use this helper to scan a target file and return a `CodeScannerResult`.

```python
from pathlib import Path

from etiq_copilot.engine.implementations.scanner.code_scanner import DebuggerCodeScanner
from etiq_copilot.engine.implementations.scanner.scan_results import CodeScannerResult


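def scan_file(scan_file_path: Path | str) -> CodeScannerResult:
    """Read a Python source file and scan it with Etiq."""
    scan_file_path = Path(scan_file_path)
    # Read explicitly as UTF-8 so the result does not depend on the platform default.
    original_code = scan_file_path.read_text(encoding="utf-8")
    scanner = DebuggerCodeScanner()
    return scanner.scan_code(code_str=original_code)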
```

### Minimum Working Routine

```python
target_file = "test_repo/iris_pipeline.py"
scan_results = scan_file(target_file)

result = {
    "target_file": target_file,
    "scan_errors": scan_results.scan_errors,
    "lineage_objects": {
        "datasets": scan_results.list_dataframes(),
        "models": scan_results.list_models(),
        "agents": scan_results.list_agents(),
    },
    "source_evidence": [
        {
            "names": sorted(state.names),
            "source": state.node.as_string(),
            "scope": type(state.node.scope()).__name__,
        }
        for state in scan_results.get_dataframes()
    ],
    "lineage_graph": {
        "format": "json",
        "value": scan_results.create_full_lineage_graph(graph_format="json"),
    },
}
```

### Default Workflow

1. Set the target file path explicitly.
2. Scan the entry file with `scan_file(...)`.
3. Read `scan_results.scan_errors`.
4. If `scan_errors` is non-empty, report them before drawing conclusions.
5. Generate lineage with `create_full_lineage_graph(graph_format="json")` when another tool or agent needs to parse the graph.
6. List captured lineage objects with `list_dataframes()`, `list_models()`, and `list_agents()`.
7. Retrieve full state objects only when names or graph output are not enough.
8. Use `state.node.as_string()` and `state.node.scope()` only when source evidence is needed.

```python
target_file = "test_repo/iris_pipeline.py"
scan_results = scan_file(target_file)

scan_errors = scan_results.scan_errors
lineage_json = scan_results.create_full_lineage_graph(graph_format="json")

lineage_objects = {
    "datasets": scan_results.list_dataframes(),
    "models": scan_results.list_models(),
    "agents": scan_results.list_agents(),
}

source_evidence = [
    {
        "names": sorted(state.names),
        "source": state.node.as_string(),
        "scope": type(state.node.scope()).__name__,
    }
    for state in scan_results.get_dataframes()
]
```
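
Steps 3 and 4 of the workflow can be sketched as a small gate that runs before any lineage summary. This page does not specify the type of `scan_errors`, so the helper below is an assumption-laden sketch: it accepts `None`, a string, a dict, or any other iterable, and stringifies each entry. The name `summarize_scan_errors` is hypothetical.

```python
def summarize_scan_errors(scan_errors) -> dict:
    """Build a small report from a scan_errors value without assuming its element type."""
    # Assumption: scan_errors is None, a string, a dict, or another iterable.
    if scan_errors is None:
        entries = []
    elif isinstance(scan_errors, str):
        entries = [scan_errors] if scan_errors else []
    elif isinstance(scan_errors, dict):
        entries = [f"{key}: {value}" for key, value in scan_errors.items()]
    else:
        entries = [str(entry) for entry in scan_errors]
    return {"has_errors": bool(entries), "count": len(entries), "messages": entries}


# Illustrative input only; real values come from scan_results.scan_errors.
report = summarize_scan_errors(["example error text"])
if report["has_errors"]:
    print(f"{report['count']} scan error(s) reported:", report["messages"])
```

Reporting from this structure first makes it harder to present lineage conclusions while errors go unmentioned.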

### API Commands

Use these public methods and properties.

| When you need to               | Use                                                           | Output                     |
| ------------------------------ | ------------------------------------------------------------- | -------------------------- |
| Check scan issues              | `scan_results.scan_errors`                                    | Scan error details         |
| Generate parseable lineage     | `scan_results.create_full_lineage_graph(graph_format="json")` | JSON graph string          |
| Generate visual lineage source | `scan_results.create_full_lineage_graph(graph_format="dot")`  | DOT graph string           |
| List dataset lineage objects   | `scan_results.list_dataframes()`                              | `list[str]`                |
| Retrieve dataset states        | `scan_results.get_dataframes()`                               | Dataframe state objects    |
| List model lineage objects     | `scan_results.list_models()`                                  | `list[str]`                |
| Retrieve model states          | `scan_results.get_models()`                                   | Model state objects        |
| List agent lineage objects     | `scan_results.list_agents()`                                  | `list[str]`                |
| Retrieve agent states          | `scan_results.get_agent_states()`                             | Agent state objects        |
| Retrieve other captured states | `scan_results.get_unstructured_states()`                      | Unstructured state objects |
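
The `get_*` method names in the table are not fully uniform (`get_agent_states` versus `get_models`), so a lookup keyed by object kind can help avoid guessing. The following is a minimal sketch: the mapping and the `count_states` helper are hypothetical, and the `SimpleNamespace` object is a stand-in so the code runs without Etiq installed.

```python
from types import SimpleNamespace

# Method names copied from the table above; the mapping itself is illustrative.
STATE_GETTERS = {
    "dataframes": "get_dataframes",
    "models": "get_models",
    "agents": "get_agent_states",
    "unstructured": "get_unstructured_states",
}


def count_states(scan_results) -> dict[str, int]:
    """Count captured states per kind by calling each getter once."""
    return {
        kind: len(getattr(scan_results, method_name)())
        for kind, method_name in STATE_GETTERS.items()
    }


# Stand-in for a real CodeScannerResult, used only for demonstration.
demo_results = SimpleNamespace(
    get_dataframes=lambda: ["state_a", "state_b"],
    get_models=lambda: ["state_c"],
    get_agent_states=lambda: [],
    get_unstructured_states=lambda: [],
)
print(count_states(demo_results))
```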

### Task Recipes

#### Summarize A Target Script

```python
target_file = "test_repo/iris_pipeline.py"
scan_results = scan_file(target_file)

dataframe_states = scan_results.get_dataframes()
model_states = scan_results.get_models()
agent_states = scan_results.get_agent_states()
unstructured_states = scan_results.get_unstructured_states()

summary = {
    "target_file": target_file,
    "scan_errors": scan_results.scan_errors,
    "dataframes": scan_results.list_dataframes(),
    "models": scan_results.list_models(),
    "agents": scan_results.list_agents(),
    "counts": {
        "dataframes": len(dataframe_states),
        "models": len(model_states),
        "agents": len(agent_states),
        "unstructured": len(unstructured_states),
    },
    "states": [
        {
            "names": sorted(state.names),
            "state_type": type(state).__name__,
            "line_no": state.line_no,
            "node_type": type(state.node).__name__,
            "source": state.node.as_string(),
            "scope": type(state.node.scope()).__name__,
        }
        for state in dataframe_states
    ],
}
```

#### Generate Lineage For Another Agent

```python
lineage_json = scan_results.create_full_lineage_graph(graph_format="json")
```

Return the graph as a string and state that the graph schema should not be treated as stable until a versioned schema is published.
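
Because the schema is not yet versioned, a receiving agent may prefer to describe the graph's top-level shape rather than read specific fields. The sketch below assumes only that the string is valid JSON; the sample payload and the helper name `describe_graph_json` are hypothetical and do not reflect Etiq's actual graph layout.

```python
import json


def describe_graph_json(lineage_json: str) -> dict:
    """Report the top-level shape of a JSON graph string without assuming a schema."""
    parsed = json.loads(lineage_json)
    description = {"top_level_type": type(parsed).__name__}
    if isinstance(parsed, dict):
        description["top_level_keys"] = sorted(parsed)
    elif isinstance(parsed, list):
        description["length"] = len(parsed)
    return description


# Hypothetical sample; the real graph may use different keys.
sample_graph = '{"nodes": [], "edges": []}'
print(describe_graph_json(sample_graph))
```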

#### Inspect Dataset State Objects

```python
for state in scan_results.get_dataframes():
    print(state)
```

Use state objects when the agent needs captured values, source evidence, or Etiq metadata for a lineage object.

#### Inspect Source Evidence

```python
for state in scan_results.get_dataframes():
    print("names:", state.names)
    print("source:", state.node.as_string())
    print("scope:", type(state.node.scope()).__name__)
```

Use this when the user asks where a lineage object came from in the code.
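
When the question is about one specific lineage object, filtering states by name first keeps the output short. This is a sketch under the assumption that `state.names` supports membership tests, which is consistent with the `sorted(state.names)` usage on this page. The helper `find_states_by_name` is hypothetical, and the `SimpleNamespace` objects are stand-ins for real states.

```python
from types import SimpleNamespace


def find_states_by_name(states, name: str) -> list:
    """Return the states whose `names` collection contains `name`."""
    return [state for state in states if name in state.names]


# Stand-ins so the sketch runs without Etiq; real states come from
# scan_results.get_dataframes().
demo_states = [
    SimpleNamespace(names={"df", "iris_df"}, line_no=4),
    SimpleNamespace(names={"train_df"}, line_no=9),
]
matches = find_states_by_name(demo_states, "train_df")
print([state.line_no for state in matches])
```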

#### Inspect Model State Objects

```python
for state in scan_results.get_models():
    print(state)
```

Use this when the user asks which models were created, used, or captured.

### Decision Rules

* If the user asks for lineage output that another tool will parse, use `graph_format="json"`.
* If the user asks for visualization-oriented output, use `graph_format="dot"`.
* If `scan_errors` is non-empty, report the errors before summarizing lineage.
* If names from `list_dataframes()`, `list_models()`, or `list_agents()` are sufficient, do not retrieve full state objects.
* If the target workflow spans multiple files, scan the entry file that starts the run.
* If source evidence or captured values are needed, retrieve state objects with the corresponding `get_*` method and inspect `state.node`.
* If the task requires installing Etiq, ask for approval before running an install command.
* If the task requires deleting files, changing public APIs, or modifying generated output, ask before proceeding.
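
For the visualization rule above, the DOT string usually needs to reach a file before a renderer can use it. The sketch below writes a DOT string to a new file using the standard library only. The helper `save_dot` and the example path are hypothetical, and the DOT text is a placeholder rather than real Etiq output.

```python
from pathlib import Path
from typing import Union


def save_dot(dot_source: str, out_path: Union[str, Path]) -> Path:
    """Write a DOT graph string to out_path, creating parent folders if needed."""
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(dot_source, encoding="utf-8")
    return out_path


# Example call (not executed here so the sketch has no side effects):
# dot_source = scan_results.create_full_lineage_graph(graph_format="dot")
# save_dot(dot_source, "lineage_output/lineage.dot")
```

If Graphviz is installed, the saved file can then be rendered with a command such as `dot -Tsvg lineage.dot -o lineage.svg`. Write to a path the user has agreed to, in line with the approval rules above.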

### Do Not

* Do not rely on private internals.
* Do not assume lineage objects are limited to dataframes and models; other types, such as agent states and unstructured data, are also captured.
* Do not assume graph JSON has a stable schema until that schema is published.
* Do not ignore `scan_errors`.
* Do not install dependencies without approval.
* Do not reformat unrelated files when editing docs or examples.

### Completion Criteria

Before finishing an agent task, report:

* target file scanned
* scan errors, or that no scan errors were reported
* lineage object names found
* graph format generated, if any
* commands run
* checks skipped and why
* known limitations or follow-up

Use this response shape when returning structured results:

```python
result = {
    "target_file": target_file,
    "scan_errors": scan_results.scan_errors,
    "lineage_objects": {
        "datasets": scan_results.list_dataframes(),
        "models": scan_results.list_models(),
        "agents": scan_results.list_agents(),
    },
    "source_evidence": [
        {
            "names": sorted(state.names),
            "source": state.node.as_string(),
            "scope": type(state.node.scope()).__name__,
        }
        for state in scan_results.get_dataframes()
    ],
    "lineage_graph": {
        "format": "json",
        "value": lineage_json,
    },
}
```
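
Before handing the result to another tool, it can help to confirm the expected keys are present and to serialize the dictionary to text. The sketch below is illustrative: the helper names and the choice of required keys are assumptions, and `default=str` is used because this page does not state whether scan error entries are JSON-encodable.

```python
import json

# Keys taken from the response shape above; "source_evidence" is treated as
# optional here because it is only gathered when source evidence is needed.
REQUIRED_KEYS = ("target_file", "scan_errors", "lineage_objects", "lineage_graph")


def missing_result_keys(result: dict) -> list[str]:
    """Return the required keys that are absent from the result dictionary."""
    return [key for key in REQUIRED_KEYS if key not in result]


def to_json_text(result: dict) -> str:
    """Serialize the result; values json cannot encode are converted with str()."""
    return json.dumps(result, indent=2, default=str)


# Placeholder values for demonstration only.
demo_result = {
    "target_file": "test_repo/iris_pipeline.py",
    "scan_errors": [],
    "lineage_objects": {"datasets": [], "models": [], "agents": []},
    "lineage_graph": {"format": "json", "value": "{}"},
}
print(missing_result_keys(demo_result))
print(to_json_text(demo_result))
```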