Functional Requirements¶

This document captures the complete functional requirement set for AdxLite as implemented and documented in this repository. It is intended to be read by maintainers, contributors, and reviewers who want a single place that explains the required behavior of the project.

The requirements below are written as product-level capabilities rather than low-level implementation tasks. Where useful, each requirement includes rationale, expected behavior, and validation notes.

Scope statement¶

AdxLite is a local, SQLite-based analytical engine that accepts a supported subset of KQL and operates over tables ingested from pandas DataFrames. The product must remain local-first, easy to embed in Python programs, and sufficiently documented for users to understand both supported behavior and limitations without reading the source.

Requirement summary table¶

ID	Requirement	Status expectation
FR-01	SQLite-based local file database	mandatory
FR-02	Pandas DataFrame ingestion as tables	mandatory
FR-03	KQL query support over local tables	mandatory
FR-04	`.append` command support	mandatory
FR-05	Simple client API	mandatory
FR-06	No external Kusto cluster connections	mandatory
FR-07	No join/union across databases	mandatory
FR-08	`parse` operator support	mandatory
FR-09	`ago()` and `now()` support	mandatory
FR-10	`matches regex` partial-match semantics	mandatory
FR-11	`extract()` function support	mandatory
FR-12	`log()`, `log10()`, `pow()`, `sqrt()` support	mandatory
FR-13	`parse_json()` and `dynamic()` support	mandatory
FR-14	DateTime type support	mandatory
FR-15	Empty-table aggregation behavior	mandatory
FR-16	Rich documentation	mandatory
FR-17	Proper module architecture	mandatory
FR-18	Comprehensive tests	mandatory
FR-19	`let` statement support	mandatory
FR-20	`union` operator support	mandatory
FR-21	`join` operator support	mandatory
FR-22	Wrap single-DataFrame quick-query API (adxpandas)	mandatory
FR-23	Jupyter magic commands (adxpandas)	mandatory
FR-24	`render` operator chart visualization (adxpandas)	mandatory

FR-01: SQLite-based local file database¶

AdxLite must use SQLite as its storage engine.

Expected behavior:

tables are stored in a SQLite database file or in-memory database
no external service is required to create, query, or mutate data
the database connection is created through Python's SQLite support

Validation ideas:

instantiate AdxLiteClient("analytics.db")
ingest data
reopen the file and confirm tables persist

FR-02: Pandas DataFrame ingestion as tables¶

AdxLite must ingest pandas DataFrames into queryable tables.

Expected behavior:

ingest_from_pandas() must exist as a public API
ingest() may exist as the primary implementation or alias
the table name must become queryable immediately after ingestion
schema metadata must be recorded for later restoration and validation

Validation ideas:

ingest a DataFrame with string, numeric, bool, and datetime columns
confirm list_tables() shows the new table
confirm get_schema() reports inferred logical types

FR-03: KQL query support (virtual Kusto database)¶

AdxLite must support querying local tables with a documented subset of KQL.

Expected behavior:

the query language must be KQL-inspired pipeline syntax
supported operators and functions must be documented
results must be returned as pandas DataFrames
queries must run entirely against local ingested tables

Validation ideas:

run representative where, project, extend, summarize, and sort by queries
confirm output matches expected DataFrame content

FR-04: `.append` command¶

AdxLite must support a KQL-style append command that writes query results into a local table.

Expected behavior:

syntax must follow .append TableName <| query
the nested query must run through the normal planning and execution path
the command must append rows using existing schema validation
the command should return an empty pandas DataFrame after success

Validation ideas:

create a destination table
append filtered rows from another table
query the destination table and confirm new rows were written

FR-05: Simple client API¶

AdxLite must provide a small, easy-to-understand Python API.

Expected behavior:

a public AdxLiteClient class must exist
the client must expose ingestion, query, schema inspection, table listing, dropping, and closing operations
the client must support context manager usage

Validation ideas:

instantiate the client
call every public method at least once in tests or examples
confirm the context manager closes cleanly

FR-06: No external Kusto cluster connections¶

AdxLite must remain local-only and must not require or attempt remote Azure Data Explorer connectivity.

Expected behavior:

no connection strings, cluster URIs, or remote auth flows are required
all data must originate from local ingestion
documentation must make the local-only model explicit

Validation ideas:

inspect the public API and configuration surface
confirm no cluster connection API exists

FR-07: No join/union across databases¶

AdxLite must not support multi-database relational composition such as cross-database joins or unions.

Expected behavior:

cross-database references (e.g., database("other").Table) should raise an error
all tables in join and union must exist in the same local database
documentation must state that the supported model is a single local database

Validation ideas:

join or union a table that does not exist in the local database
confirm a TableNotFoundError is raised

FR-08: `parse` operator support¶

AdxLite must support the KQL parse operator.

Expected behavior:

the parser must recognize parse <expr> with <pattern>
capture variables must be added as new columns
wildcard * segments must skip text without capturing
execution may use pandas post-processing if SQL-only execution is unsuitable

Validation ideas:

ingest a log message column
parse user and action fields from the message text
confirm projected capture columns match expected values

FR-09: `ago()` and `now()` functions¶

AdxLite must support current-time and relative-time datetime helpers.

Expected behavior:

now() must return a current UTC timestamp representation
ago(timespan) must subtract a supported timespan from the current UTC timestamp
both functions must be usable in query expressions

Validation ideas:

run queries using now() and ago(1d) in filters or extend
confirm outputs are valid datetime-like strings or restored values as documented

FR-10: `matches regex` with partial-match semantics¶

AdxLite must support the matches regex operator using partial-match behavior.

Expected behavior:

the operator must return true when the pattern matches any substring, not just the whole string
regex matching must be available both in SQLite execution and pandas fallback execution
documentation must clearly state the partial-match semantics

Validation ideas:

evaluate a query such as Message matches regex "err.."
confirm a row with prefix error suffix is matched

FR-11: `extract()` function¶

AdxLite must support regex capture extraction through extract().

Expected behavior:

the function must accept a pattern, group index, and text value
it must return the captured subgroup or null when no match is found
documentation must explain the return type and group numbering expectations

Validation ideas:

query extract("user=(\w+)", 1, Message) on sample rows
confirm the returned value is the captured user name

FR-12: Math function support¶

AdxLite must support at least log(), log10(), pow(), and sqrt().

Expected behavior:

the functions must be callable in expressions such as extend
results must use numeric return values
the SQL and pandas execution layers must behave consistently enough for documented examples and tests

Validation ideas:

ingest numeric columns
compute logarithms, powers, and roots in a query
compare results with Python expectations

FR-13: `parse_json()` and `dynamic()` support¶

AdxLite must support JSON parsing helpers.

Expected behavior:

parse_json(x) and dynamic(x) must exist
JSON values may be represented as JSON text rather than full nested object types in results
extractjson() must allow pulling values from JSON text

Validation ideas:

ingest JSON text payloads
parse and extract fields in a query
confirm documented result behavior

FR-14: DateTime type support¶

AdxLite must support datetime ingestion, storage, comparison, and helper functions.

Expected behavior:

pandas datetime columns must be detected during ingestion
storage must preserve enough information for correct ordering and restoration
result columns typed as datetime must be restored to pandas datetime dtype where applicable
datetime helpers such as bin, datetime_diff, format_datetime, and datetime_add must be supported as documented

Validation ideas:

ingest datetime columns
query with datetime filters and computed buckets
confirm round-trip dtype restoration

FR-15: Empty table aggregation¶

AdxLite must support sensible aggregation behavior on empty tables.

Expected behavior:

count() on an empty input must return 0
aggregate queries against empty tables must still return a row when appropriate for the summarize form being used
null-like results for aggregates such as sum and avg should be documented clearly

Validation ideas:

ingest an empty DataFrame with a numeric column
run summarize total=count(), sum_value=sum(value)
confirm total == 0

FR-16: Rich documentation¶

AdxLite must include documentation that is detailed enough for a user to understand the product without reading the source code.

Expected behavior:

architecture, decisions, type system, requirements, guides, reference docs, and project README must exist
each document should include explanation, examples, parameter notes, return value notes, and cross-references where relevant
limitations and behavioral differences from Azure Data Explorer must be explicit

Validation ideas:

review the documentation set for completeness and internal consistency
confirm that major APIs, operators, functions, and constraints are documented

FR-17: Proper module architecture¶

AdxLite must keep a clear separation between API, parsing, translation, storage, and execution concerns.

Expected behavior:

the repository should contain distinct modules for the public client, parser, translator, storage, and engine layers
responsibilities should remain understandable and reviewable
the architecture should support future extension without collapsing all logic into a single file

Validation ideas:

inspect the module tree
confirm responsibilities match the documentation in Architecture

FR-18: Comprehensive tests¶

AdxLite must include automated tests covering the supported feature set.

Expected behavior:

parser tests must exist
translator tests must exist
unit tests for UDF behavior must exist
integration tests must cover query execution, datetime handling, advanced functions, and append behavior

Validation ideas:

run python -m pytest -q
confirm the suite passes
review test files for coverage of representative features

Non-requirement clarifications¶

The following are intentionally not required by the current scope:

external Kusto cluster connectivity
full Azure Data Explorer feature parity
distributed execution
cross-database joins or unions
remote metadata catalogs
automatic dynamic-object expansion to nested Python structures in result DataFrames

Traceability to documentation¶

Each major requirement is covered elsewhere in the docs.

Requirement area	Primary reference
Client API	API reference
Query operators	Operators reference
Functions	Functions reference
Architecture	Architecture
Storage and types	Type system
Usage examples	Quickstart, Advanced queries
Limitations	Limitations

Review checklist¶

When evaluating changes to AdxLite, use this checklist:

does the change preserve local-only execution?
does it keep the client API simple?
is the supported behavior covered by tests?
is the behavior documented in the relevant guide or reference page?
does it preserve or intentionally extend the supported KQL subset?

FR-19: `let` statement support¶

AdxLite must support let bindings for naming scalar values and tabular sub-queries.

Supported forms¶

Form	Example	Supported
Scalar let	`let threshold = 100; T \\| where val > threshold`	Yes
Tabular let	`let errors = T \\| where level == "error"; errors \\| count`	Yes
Function let	`let f = (x: int) { x * 2 };`	No (raise KqlUnsupportedError)

Expected behavior¶

Multiple let bindings separated by ; before the main query body
Scalar lets substitute as literal values in subsequent expressions
Tabular lets execute the sub-pipeline and make the result available as a table name
Column names in the current row scope take precedence over scalar let names (matching Kusto semantics)
Tabular let results are cleaned up after the main query completes
Later let bindings can reference earlier let bindings

Validation ideas¶

let x = 5; T | where col > x returns correct filtered result
let filtered = T | where active == true; filtered | count returns correct count
Column named same as let variable: column wins
Undefined let reference: error

FR-20: `union` operator support¶

AdxLite must support combining rows from multiple local tables using union.

Supported forms¶

Form	Example	Supported
Source form	`union T1, T2 \\| where x > 5`	Yes
Pipe form	`T1 \\| union T2, T3`	Yes
kind=outer (default)	All columns from all tables, NULL for missing	Yes
kind=inner	Only columns common to all tables	Yes
withsource=col	Adds a column indicating source table name	Yes
Sub-pipeline args	`union (T1 \\| where x > 5), T2`	No (MVP)

Expected behavior¶

Schema alignment: columns missing in a table are filled with NULL (kind=outer)
Column ordering: first table's columns come first, then new columns from subsequent tables
kind=inner: only columns present in ALL tables appear in the result
withsource=colname: prepends a string column with the source table name per row
Works both as a source (before pipe) and as a pipe operator
Union of empty tables produces an empty result with correct schema

Validation ideas¶

Union two tables with same schema: row count = sum
Union tables with different schemas: NULL fill verified
kind=inner: verify only common columns in output
withsource: verify source column content
Union followed by where/summarize

FR-21: `join` operator support¶

AdxLite must support joining two local tables based on key columns.

Supported join kinds¶

Kind	Behavior	Output columns
`innerunique` (default)	Rows matching on both sides	Left + right
`inner`	All matching row combinations	Left + right
`leftouter`	All left rows, matched right or NULL	Left + right
`rightouter`	All right rows, matched left or NULL	Left + right
`fullouter`	All rows from both, NULL where unmatched	Left + right
`leftanti`	Left rows with NO match on right	Left only
`leftsemi`	Left rows with at least one match on right	Left only
`rightanti`	Right rows with NO match on left	Right only
`rightsemi`	Right rows with at least one match on left	Right only

Supported syntax forms¶

Form	Example
Simple key	`T1 \\| join T2 on key`
With kind	`T1 \\| join kind=leftouter T2 on key`
Multi-key	`T1 \\| join T2 on key1, key2`
Qualified keys	`T1 \\| join T2 on $left.id == $right.user_id`
Right sub-pipeline	`T1 \\| join kind=inner (T2 \\| where x > 5) on id`

Output column naming rules¶

Join key columns (simple form): appear once in output (from left side)
Left non-key columns: keep original names
Right non-key columns: if name conflicts with left column, suffix with _right
Anti/semi joins: only output the relevant side's columns

Expected behavior¶

Default join kind is innerunique (treated as inner for MVP — no automatic right-side dedup)
Right side can be a full sub-pipeline enclosed in parentheses
NULL keys do not match (SQL semantics)
All tables must be in the same local database (no cross-database joins)

Validation ideas¶

Inner join: verify only matching rows appear
Left outer: verify NULL fill for unmatched right
Left anti: verify only non-matching left rows
Left semi: verify matching left rows without duplication
Multi-key join: verify correct matching
Column conflict: verify _right suffix
Join with empty right table: left outer returns all left rows with NULL
Right side sub-pipeline: verify filter applies before join

FR-22: Wrap single-DataFrame quick-query API (adxpandas)¶

adxpandas must provide a Wrap class that wraps a single DataFrame for quick KQL queries with method chaining.

Rationale¶

Many users work with a single DataFrame and want a quick, fluent way to apply KQL operators without setting up a client and registering tables. The Wrap pattern (inspired by KustoPandas) provides this.

Expected behavior¶

Wrap(df) wraps any pandas DataFrame
w.execute("self | where x > 1 | project name") runs a query using self as the table name
execute() returns a new Wrap for further chaining (or RenderResult if query ends with render)
Convenience methods .where(), .project(), .extend(), .summarize(), .sort(), .take(), .top(), .count(), .distinct(), .project_away() each return a new Wrap
.render() returns a RenderResult (not chainable — terminal operation)
.df property exposes the underlying DataFrame
_repr_html_() delegates to DataFrame for Jupyter display
The wrapped DataFrame is never mutated; each operation returns a new Wrap

Validation ideas¶

Wrap(df).where("x > 1") returns Wrap with filtered rows
Chain: w.where(...).project(...).take(5) produces correct result
w.df is the expected DataFrame
w.execute("self | ...") works with any KQL pipeline

FR-23: Jupyter magic commands (adxpandas)¶

adxpandas must provide IPython/Jupyter magic commands for interactive KQL querying.

Rationale¶

Notebooks are a primary use case. Magic commands let users write KQL directly in cells without Python boilerplate.

Expected behavior¶

import adxpandas.magic registers the %kql and %%kql magic
Line magic: %kql df | where x > 1 | take 5 — executes against local-scope DataFrames
Cell magic: %%kql\ndf | where x > 1 — multi-line queries
Variables in the local and global namespace that are DataFrames or Wraps are available as table names
Returns Wrap (displayable in notebook) or RenderResult (if query ends with render)
Result can be captured: result = %kql df | where x > 1
IPython is a lazy import; import adxpandas must succeed without IPython installed
If IPython is not available when importing adxpandas.magic, raise a clear ImportError

Validation ideas¶

Magic returns correct filtered DataFrame
Multiple DataFrames in namespace can be referenced
Wrap objects in namespace are usable as tables
Missing IPython raises clear error

FR-24: `render` operator chart visualization (adxpandas)¶

adxpandas must support the KQL render operator for chart visualization.

Rationale¶

KQL's render operator is a standard way to visualize query results. Supporting it allows users to produce charts from their queries without separate plotting code.

Supported visualization types¶

Type	Chart	Description
`timechart`	Line chart	Time series with datetime x-axis
`linechart`	Line chart	General line chart
`barchart`	Horizontal bar	Horizontal bar chart
`columnchart`	Vertical bar	Vertical bar chart
`piechart`	Pie chart	Proportional display
`areachart`	Area chart	Filled line chart
`table`	Table figure	Tabular display

Supported syntax¶

T | ... | render visualization [with (xcolumn=col, ycolumns=col1, title="...")]

Expected behavior¶

render must be the terminal operator in a pipeline (nothing follows it)
render is a display directive, not a data transformation — it does not change the DataFrame
The executor ignores RenderOp during execution (no effect on data pipeline)
When detected by Wrap or magic, the result is a RenderResult object
RenderResult has a .df property for the raw data and .figure for the matplotlib figure
RenderResult._repr_html_() embeds the chart as PNG in notebooks
matplotlib is a lazy import; render raises ImportError with install instructions if missing
AdxPandasClient.query() always returns a DataFrame (render is ignored at that level)
Wrap .render() method creates a RenderResult directly (without query string)

Design constraints¶

RenderOp is modeled as an Operator AST node for parser consistency
The parser accepts render anywhere in the pipeline position but semantically it should be terminal
with (...) properties are optional and parsed as key=value pairs
Default xcolumn is the first column; default ycolumns are all remaining columns

Validation ideas¶

T | summarize count() by city | render barchart parses correctly
Executor produces correct DataFrame (render doesn't alter data)
Wrap.execute("self | ... | render timechart") returns RenderResult
Wrap.render("barchart") returns RenderResult
Missing matplotlib gives clear ImportError
render with with (xcolumn=time, title="My Chart") parses properties

Functional Requirements¶

Scope statement¶

Requirement summary table¶

FR-01: SQLite-based local file database¶

FR-02: Pandas DataFrame ingestion as tables¶

FR-03: KQL query support (virtual Kusto database)¶

FR-04: .append command¶

FR-05: Simple client API¶

FR-06: No external Kusto cluster connections¶

FR-07: No join/union across databases¶

FR-08: parse operator support¶

FR-09: ago() and now() functions¶

FR-10: matches regex with partial-match semantics¶

FR-11: extract() function¶

FR-12: Math function support¶

FR-13: parse_json() and dynamic() support¶

FR-14: DateTime type support¶

FR-15: Empty table aggregation¶

FR-16: Rich documentation¶

FR-17: Proper module architecture¶

FR-18: Comprehensive tests¶

Non-requirement clarifications¶

Traceability to documentation¶

Review checklist¶

Related documents¶

FR-19: let statement support¶

Supported forms¶

Expected behavior¶

Validation ideas¶

FR-20: union operator support¶

Supported forms¶

Expected behavior¶

Validation ideas¶

FR-21: join operator support¶

Supported join kinds¶

Supported syntax forms¶

Output column naming rules¶

Expected behavior¶

Validation ideas¶

FR-22: Wrap single-DataFrame quick-query API (adxpandas)¶

Rationale¶

Expected behavior¶

Validation ideas¶

FR-23: Jupyter magic commands (adxpandas)¶

Rationale¶

Expected behavior¶

Validation ideas¶

FR-24: render operator chart visualization (adxpandas)¶

Rationale¶

Supported visualization types¶

Supported syntax¶

Expected behavior¶

Design constraints¶

Validation ideas¶

FR-04: `.append` command¶

FR-08: `parse` operator support¶

FR-09: `ago()` and `now()` functions¶

FR-10: `matches regex` with partial-match semantics¶

FR-11: `extract()` function¶

FR-13: `parse_json()` and `dynamic()` support¶

FR-19: `let` statement support¶

FR-20: `union` operator support¶

FR-21: `join` operator support¶

FR-24: `render` operator chart visualization (adxpandas)¶