Architecture¶

This document explains how AdxLite is structured internally and how a KQL statement moves from raw text to a pandas DataFrame result. The focus is on implementation architecture rather than user-facing syntax. For public API usage, see API reference. For language details, see KQL syntax, Operators, and Functions.

Monorepo structure¶

The repository contains two Python packages:

adxpandas — standalone KQL-over-pandas execution (no database)
adxlite — SQLite-backed KQL execution (depends on adxpandas)

repo_root/
├── adxpandas/              # Sub-project: pure pandas KQL engine
│   ├── pyproject.toml
│   ├── src/adxpandas/
│   │   ├── parser/         # Shared KQL parser (tokenizer, AST, parser)
│   │   ├── engine/         # Pandas execution engine
│   │   ├── functions.py    # Pure-Python KQL function implementations
│   │   ├── client.py       # AdxPandasClient API
│   │   ├── wrap.py         # Wrap class (single-DataFrame quick query)
│   │   ├── magic.py        # Jupyter %kql magic (optional, requires IPython)
│   │   ├── render.py       # Chart rendering (optional, requires matplotlib)
│   │   └── exceptions.py   # Base exceptions
│   └── tests/
├── adxlite/                # Main project: SQLite-backed KQL engine
│   ├── pyproject.toml
│   ├── src/adxlite/
│   │   ├── parser/         # Shim modules re-exporting from adxpandas
│   │   ├── engine/         # SQLite+pandas hybrid execution
│   │   ├── translator/     # AST-to-SQL translation (SQLite-specific)
│   │   ├── storage/        # SQLite database layer
│   │   ├── client.py       # AdxLiteClient API
│   │   └── exceptions.py   # Re-exports + SQLite-specific exceptions
│   └── tests/
├── docs/                   # Shared documentation
├── mkdocs.yml
└── README.md

Architectural goals¶

AdxLite is built around a small set of goals:

keep execution local and dependency-light by embedding SQLite
expose a compact Python API suitable for scripts, tests, and notebooks
support a practical subset of KQL rather than the full Azure Data Explorer surface area
preserve predictable pipeline semantics even when multiple KQL operators are chained
use pandas where it provides meaningfully better ergonomics than raw SQL
keep type behavior explicit through schema metadata rather than relying only on SQLite affinity

Module map¶

The codebase is organized under src/adxlite/.

src/adxlite/
├── __init__.py
├── client.py
├── exceptions.py
├── parser/
│   ├── tokenizer.py
│   ├── ast_nodes.py
│   └── parser.py
├── translator/
│   ├── translator.py
│   ├── functions.py
│   └── sql_utils.py
├── storage/
│   ├── database.py
│   ├── kql_types.py
│   └── udf.py
└── engine/
    ├── planner.py
    ├── executor.py
    └── pandas_ops.py

Dependency diagram¶

The following diagram shows the primary runtime dependencies between modules.

AdxLiteClient (client.py)
    |
    v
ExecutionEngine (engine/executor.py)
    |-------------------------------> Database (storage/database.py)
    |                                     |
    |                                     +--> kql_types.py
    |                                     +--> udf.py
    |
    +--> parse_kql (parser/parser.py)
    |       |
    |       +--> tokenizer.py
    |       +--> ast_nodes.py
    |
    +--> Planner (engine/planner.py)
    |       |
    |       +--> Database.get_schema()
    |       +--> ast_nodes.py
    |
    +--> SqlTranslator (translator/translator.py)
    |       |
    |       +--> translator/functions.py
    |       +--> translator/sql_utils.py
    |       +--> ast_nodes.py
    |
    +--> PandasOperatorExecutor (engine/pandas_ops.py)
            |
            +--> storage/udf.py
            +--> ast_nodes.py

Layer responsibilities¶

Public API layer¶

client.py contains AdxLiteClient, the only class most users need.

Responsibilities:

open a database connection through Database
expose simple methods for ingest, query, table listing, schema inspection, dropping tables, and closing resources
hide parser, planner, translator, and execution details behind a stable interface
support context manager usage for deterministic cleanup

This layer deliberately stays thin. It is an orchestration layer, not a business-logic layer.

Parser layer¶

The parser layer is split into three modules:

tokenizer.py: raw text to token stream
ast_nodes.py: immutable syntax tree structures
parser.py: recursive-descent parsing from tokens into AST nodes

Responsibilities:

recognize table names, operators, literals, and expressions
normalize supported keywords into a stable AST representation
reject unsupported syntax early with KqlParseError or KqlUnsupportedError
produce either a Pipeline or an AppendCommand

The parser understands KQL as a pipeline language: a source table followed by zero or more pipe operators.

Planning layer¶

engine/planner.py bridges parsing and execution.

Responsibilities:

verify that the source table exists
fetch the current table schema from metadata
rewrite project-away into a positive projection based on known columns
infer output schema after each operator
split a pipeline into a SQL-compatible prefix and a pandas-only suffix

Today, the split is simple: everything is SQL-compatible except parse. The design is still important because it creates an explicit boundary for hybrid execution.

Translation layer¶

translator/translator.py converts SQL-compatible AST nodes into SQLite SQL plus parameter values.

Responsibilities:

turn the source table into SELECT * FROM "Table"
wrap each pipeline stage in a nested subquery to preserve operator ordering
map KQL scalar functions to SQLite expressions or SQLite UDF calls
translate aggregates into SQL aggregate expressions
bind literal values as SQL parameters rather than interpolating them into SQL text

translator/functions.py contains the function-rendering registry used by the translator. translator/sql_utils.py contains small helpers such as identifier quoting.

Storage layer¶

The storage layer centers on storage/database.py.

Responsibilities:

own the SQLite connection
register UDFs on the connection
create and maintain internal metadata tables
create user tables during ingestion
normalize DataFrame values for storage
execute SQL and restore result columns according to KQL schema metadata

storage/kql_types.py handles type inference, normalization, and result restoration. storage/udf.py implements SQLite UDFs used by translated SQL and by pandas fallback execution.

Execution layer¶

The execution layer has two main modules:

engine/executor.py: top-level orchestration
engine/pandas_ops.py: pandas-side operator execution

Responsibilities:

parse KQL text
recognize .append management commands
invoke the planner and translator
execute generated SQL through Database
apply any pandas operators to the SQL result DataFrame
normalize final datetime columns for the result

End-to-end execution pipeline¶

KQL text
  |
  v
Tokenizer
  |
  v
Tokens
  |
  v
Recursive-descent parser
  |
  v
AST (Pipeline or AppendCommand)
  |
  v
Planner
  |
  +--> resolves project-away
  +--> infers schemas
  +--> splits SQL and pandas phases
  |
  v
SQL-compatible pipeline + pandas operator suffix
  |
  v
SQL translator
  |
  v
SQLite SQL + parameters
  |
  v
Database.query_dataframe()
  |
  v
Initial pandas DataFrame result
  |
  v
PandasOperatorExecutor (if needed)
  |
  v
Final pandas DataFrame

Query flow: pure SQL pipeline¶

A pure SQL pipeline is any pipeline whose operators are all SQL-compatible.

Example:

Users
| where score >= 20
| extend doubled = score * 2
| summarize total=count(), max_score=max(score) by city
| sort by total desc

Execution flow:

parse_kql() turns the query into a Pipeline AST.
Planner.plan() confirms the table exists and infers output schema after each operator.
Because none of the operators require pandas, the pandas suffix is empty.
SqlTranslator.translate() produces nested SQL statements plus bound parameters.
Database.query_dataframe() executes the SQL and restores result types such as booleans and datetimes where schema information is available.
The result is returned directly to the caller.

Benefits of the pure SQL path:

minimal data transfer between SQLite and pandas
faster execution for filters, projections, aggregates, and sorts
leverages SQLite indexes if added outside AdxLite

Query flow: hybrid SQL + pandas pipeline¶

A hybrid pipeline is used when at least one operator cannot be expressed in the SQL subset used by AdxLite. Today the main example is parse.

Example:

Logs
| where Message matches regex "login|logout"
| parse Message with "user=" user " action=" action
| project user, action

Execution flow:

The parser returns a Pipeline AST containing WhereOp, ParseOp, and ProjectOp.
The planner walks the operators in order.
where is SQL-compatible and stays in the SQL prefix.
parse is not SQL-compatible, so the planner marks a split point.
parse and all following operators become the pandas suffix.
SQL executes first and returns a DataFrame containing only the rows that passed the filter.
PandasOperatorExecutor applies parse by building a regular expression from the parse pattern and extracting capture columns.
Remaining pandas-side operators such as project operate on that DataFrame.
The final DataFrame is returned to the caller.

Why the suffix includes all later operators:

later operators may depend on columns created by parse
a single split point keeps execution order easy to reason about
it avoids repeated back-and-forth conversion between SQLite and pandas

Management command flow: `.append`¶

AdxLite supports one management-style command:

.append Archive <| Source | where ok == true

Execution flow:

The parser returns an AppendCommand, not a normal Pipeline.
ExecutionEngine.execute() recognizes the command.
The nested query is executed as a normal pipeline.
The resulting DataFrame is ingested into the target table in append mode.
The command returns an empty DataFrame to the caller.

This architecture reuses the same planner, translator, and executor path for both query-only and query-then-write workflows.

Storage model¶

AdxLite stores data in ordinary SQLite user tables plus one internal metadata table.

User tables¶

Each ingested DataFrame becomes a SQLite table whose name matches the supplied table name. Column storage types are chosen from the inferred logical KQL schema.

Metadata table¶

Database maintains an internal table named __adxlite_columns.

Columns:

Column	Meaning
`table_name`	User table name
`column_name`	Logical column name
`ordinal`	Original column order from ingestion
`kql_type`	Logical KQL type such as `long`, `string`, or `datetime`

Why the metadata exists:

SQLite storage affinity alone is not enough to preserve KQL intent
datetime columns are stored as text but should be restored as pandas datetimes
append validation needs a reliable logical schema
user table order must remain stable for schema inspection and append checks

Datetime data path¶

Datetime handling is a cross-cutting architectural concern.

kql_types.infer_column_type() detects pandas datetime dtypes.
normalize_for_storage() converts each value to ISO-8601 text before insertion.
SQLite stores the value in a TEXT column.
Query functions such as bin, ago, now, and datetime_add operate through UDFs that parse and re-emit ISO-8601 strings.
restore_series() converts result columns marked as datetime back to pandas datetime dtype.

This choice keeps storage simple while preserving enough semantics for comparison, sorting, and bucketing.

Operator compatibility by layer¶

Operator	Parser	Planner	SQLite SQL	Pandas fallback
`where`	yes	yes	yes	yes
`project`	yes	yes	yes	yes
`project-away`	yes	resolved to `project`	indirectly	yes
`extend`	yes	yes	yes	yes
`summarize`	yes	yes	yes	yes
`take` / `limit`	yes	yes	yes	yes
`count`	yes	yes	yes	yes
`sort by` / `order by`	yes	yes	yes	yes
`top`	yes	yes	yes	yes
`distinct`	yes	yes	yes	yes
`parse`	yes	yes	no	yes

Error propagation model¶

Errors are intentionally categorized by architectural stage.

Stage	Typical exception
Tokenization or parsing	`KqlParseError`
Valid but unsupported syntax	`KqlUnsupportedError`
Missing table	`TableNotFoundError`
Append schema mismatch	`SchemaError`
SQL translation problem	`TranslationError`
SQLite execution problem	`ExecutionError`

This separation is useful in application code because callers can decide whether a failure should be shown as a user syntax issue, a product limitation, or a storage/runtime problem.

Why the architecture is intentionally simple¶

AdxLite is not trying to implement a full optimizer or distributed execution stack. The architecture favors clarity and correctness over maximum abstraction.

Examples of intentional simplicity:

one embedded database engine: SQLite
one public client class: AdxLiteClient
one explicit planner split between SQL and pandas
one metadata table for logical schema tracking
one AST model shared across parser, planner, translator, and pandas execution

Design decisions explains why these architectural choices were made.
Type system explains how schemas move between pandas, SQLite, and KQL.
Requirements lists the functional requirements this architecture serves.
API reference documents the public surfaces that sit on top of this design.

Architecture¶

Monorepo structure¶

Architectural goals¶

Module map¶

Dependency diagram¶

Layer responsibilities¶

Public API layer¶

Parser layer¶

Planning layer¶

Translation layer¶

Storage layer¶

Execution layer¶

End-to-end execution pipeline¶

Query flow: pure SQL pipeline¶

Query flow: hybrid SQL + pandas pipeline¶

Management command flow: .append¶

Storage model¶

User tables¶

Metadata table¶

Datetime data path¶

Operator compatibility by layer¶

Error propagation model¶

Why the architecture is intentionally simple¶

Related documents¶

Management command flow: `.append`¶