Working with Scan Results

Scan Outputs

CodeScannerResult is the main object returned by DebuggerCodeScanner().scan_code(...). It stores the captured states, lineage graph outputs, scan errors, and source nodes Etiq uses to build lineage.

This page uses a small dataframe example throughout.

from etiq_copilot.engine.implementations.scanner import DebuggerCodeScanner

src = """import pandas as pd
df = pd.DataFrame([1])

def add_one(adf):
    new_df = adf + 1
    return new_df

df2 = add_one(df)
"""

scanner = DebuggerCodeScanner()
result = scanner.scan_code(src)

What The Scan Captures

Every captured object has a state. Etiq uses the state to build lineage, including parent/child relationships and function mappings. The state also stores the Astroid node for the captured code location.

For this example, Etiq captures four dataframe states:

Captured name | Source line | Captured node
df            | 2           | df = pd.DataFrame([1])
adf           | 4           | def add_one(adf): ...
new_df        | 5           | new_df = adf + 1
df2           | 8           | df2 = add_one(df)

You can inspect the raw state store with result.values, or use the CodeScannerResult methods described in the sections that follow.

Lineage Graph

Use create_full_lineage_graph() to generate the lineage graph.

What you get:

Call                                                  | Output
result.create_full_lineage_graph()                    | DOT graph string
result.create_full_lineage_graph(graph_format="json") | JSON graph string

For this example, both outputs are strings. The exact string length and generated node IDs can differ between runs.
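Because both outputs are plain strings, they can be written to disk or parsed with standard tools. The sketch below is one possible way to do that, not part of Etiq: export_lineage_graphs and the output file names are hypothetical, and the only calls made on result are the two shown in the table above. It parses the JSON string to confirm it is well formed but makes no assumption about its schema.

```python
import json
from pathlib import Path


def export_lineage_graphs(result, out_dir):
    """Write the DOT and JSON lineage graphs of a scan result into out_dir.

    `result` is expected to behave like a CodeScannerResult. This helper and
    the file names it uses are illustrative assumptions, not Etiq APIs.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Both calls return strings, as described above.
    dot_text = result.create_full_lineage_graph()
    json_text = result.create_full_lineage_graph(graph_format="json")

    dot_path = out_dir / "lineage.dot"
    json_path = out_dir / "lineage.json"
    dot_path.write_text(dot_text, encoding="utf-8")
    json_path.write_text(json_text, encoding="utf-8")

    # Parsing only checks that the JSON is well formed; the schema is not assumed.
    parsed = json.loads(json_text)
    return dot_path, json_path, parsed
```

Since node IDs can differ between runs, comparing two exported files byte for byte is unlikely to be a reliable way to detect lineage changes.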

Dataset Lineage Objects

Use list_dataframes() when you only need the names of captured dataframe lineage objects.

Use get_dataframes() when you need the state objects.

For this example, get_dataframes() returns four dataframe states: df, adf, new_df, and df2.

Model Lineage Objects

Use list_models() and get_models() for captured model lineage objects.

Agent States

Use list_agents() and get_agent_states() for captured agent states.

Unstructured States

Use get_unstructured_states() for captured states that do not fit a more specific lineage object category.
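The accessors in the sections above can be combined into a quick overview of what a scan captured. The following is a minimal sketch under one stated assumption: each get_* method returns something iterable (for example a list or a dict of states). summarize_captured is a hypothetical helper, not an Etiq function.

```python
def summarize_captured(result):
    """Count captured objects per category on a CodeScannerResult-like object.

    Assumption: every getter returns an iterable of states. The exact return
    types are not specified on this page, so adjust this if they differ.
    """
    getters = {
        "dataframes": result.get_dataframes,
        "models": result.get_models,
        "agents": result.get_agent_states,
        "unstructured": result.get_unstructured_states,
    }
    # list(...) makes the count work for lists, dicts, and generators alike.
    return {category: len(list(getter())) for category, getter in getters.items()}
```

For the dataframe example on this page, the "dataframes" count would be expected to match the four states listed earlier.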

Paths Between States

Use get_shortest_path(parent_node, child_node) to inspect a lineage path between two captured data states.

For this example, the returned path connects the function argument state (adf) back to the original dataframe state (df).

Scan Errors

Use scan_errors to inspect scan errors before relying on downstream outputs.
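One way to enforce this check is a small guard that stops a workflow when the scan reported errors. This is a sketch only: require_clean_scan and ScanFailed are hypothetical names, and the code assumes scan_errors is an attribute holding a collection of errors (possibly empty or None), which this page does not spell out.

```python
class ScanFailed(RuntimeError):
    """Raised when a scan result reports one or more scan errors."""


def require_clean_scan(result):
    """Return result unchanged, or raise ScanFailed if it carries scan errors.

    Assumption: result.scan_errors is an attribute holding a collection of
    error objects or messages. Treat None the same as "no errors".
    """
    errors = list(result.scan_errors or [])
    if errors:
        summary = "; ".join(str(error) for error in errors)
        raise ScanFailed(f"{len(errors)} scan error(s): {summary}")
    return result
```

Calling require_clean_scan(result) directly after scan_code(...) keeps later lineage calls from running on an incomplete scan.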

Source Nodes And Scope

Each captured state stores a source node. The node points back to the code Etiq associated with that captured object.

Use the node when you need to answer source-level questions:

  • where the captured object came from, such as line number and scope

  • the source snippet Etiq associated with the captured object

For most lineage workflows, use the CodeScannerResult methods above. The node is mainly useful for source evidence and debugging.

Module

In the example, df is created at the top level of the script by the statement df = pd.DataFrame([1]).

After the scan, find the captured state for df and inspect its source node. Calling scope() on that node returns a Module.

Module means the captured object came from the outermost script scope, not from inside a function.

Function-local object

new_df is created inside add_one, so calling scope() on its source node returns a FunctionDef.

Use node.as_string() for the source snippet. Use node.scope() when you need to know whether the captured object came from module-level code, a function body, or another scope.
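The two calls above can be wrapped in a small helper that gathers source evidence for a node. This is a sketch rather than an Etiq API: describe_source_node is a hypothetical name, it relies only on scope() and as_string() as described above plus the lineno attribute that Astroid statement nodes carry, and it leaves retrieving the node from a captured state to the caller because that accessor is not shown on this page.

```python
def describe_source_node(node):
    """Collect line, scope, and source snippet for an Astroid-style node.

    Only node.scope(), node.as_string(), and node.lineno are used. Missing
    attributes are reported as None instead of raising.
    """
    scope = node.scope()
    return {
        "line": getattr(node, "lineno", None),
        # For Astroid nodes this is "Module" or "FunctionDef", among others.
        "scope_kind": type(scope).__name__,
        # Function name for FunctionDef scopes; module name for Module scopes.
        "scope_name": getattr(scope, "name", None),
        "snippet": node.as_string(),
    }
```

For the node behind new_df in the example, scope_kind would be expected to read FunctionDef and scope_name add_one.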

Scanning a Codebase

For a codebase, scan the entry file that starts the run. The entry file can import and call functions from other local files. Etiq executes the entry file and captures lineage objects produced along that execution path.

The example project has two files: transforms.py, which contains a helper function (add_one), and pipeline.py, which is the entry file.

Scan pipeline.py. You do not need to scan transforms.py separately; it is called by the entry file during execution.

Run the scan from the project root so that local imports such as from transforms import add_one resolve normally.

Use the entry file for the workflow you want to observe. Any imported code that runs as part of that workflow is part of the execution path Etiq observes.
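A minimal sketch of scanning an entry file from the project root follows. It rests on assumptions: scan_entry_file is a hypothetical helper, and it assumes an entry file can be scanned by reading its text and passing it to scan_code(...), the only scanner call shown on this page. If your Etiq version offers a dedicated way to scan files, prefer that.

```python
import os
import sys
from pathlib import Path


def scan_entry_file(scanner, entry_file, project_root="."):
    """Scan an entry file's source with the project root as working directory.

    Assumption: scanner.scan_code(source) accepts the file's text, as in the
    inline example on this page. The helper itself is not part of Etiq.
    """
    root = Path(project_root).resolve()
    source = (root / entry_file).read_text(encoding="utf-8")

    previous_cwd = os.getcwd()
    added_to_path = str(root) not in sys.path
    if added_to_path:
        # Lets local imports such as `from transforms import add_one` resolve.
        sys.path.insert(0, str(root))
    os.chdir(root)
    try:
        return scanner.scan_code(source)
    finally:
        # Restore the process state even if the scan raises.
        os.chdir(previous_cwd)
        if added_to_path:
            sys.path.remove(str(root))
```

With a real scanner this might be called as scan_entry_file(DebuggerCodeScanner(), "pipeline.py", project_root="path/to/project"), where the project path is a placeholder.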

Scanning pipeline.py captures lineage objects from the entry file and from the imported function that ran during the entry file's execution. In particular, Etiq captures both new_df and new_df2, even though they are created inside add_one in transforms.py.
