Functional Requirements¶
This document captures the complete functional requirement set for AdxLite as implemented and documented in this repository. It is intended to be read by maintainers, contributors, and reviewers who want a single place that explains the required behavior of the project.
The requirements below are written as product-level capabilities rather than low-level implementation tasks. Where useful, each requirement includes rationale, expected behavior, and validation notes.
Scope statement¶
AdxLite is a local, SQLite-based analytical engine that accepts a supported subset of KQL and operates over tables ingested from pandas DataFrames. The product must remain local-first, easy to embed in Python programs, and sufficiently documented for users to understand both supported behavior and limitations without reading the source.
Requirement summary table¶
| ID | Requirement | Status expectation |
|---|---|---|
| FR-01 | SQLite-based local file database | mandatory |
| FR-02 | Pandas DataFrame ingestion as tables | mandatory |
| FR-03 | KQL query support over local tables | mandatory |
| FR-04 | .append command support |
mandatory |
| FR-05 | Simple client API | mandatory |
| FR-06 | No external Kusto cluster connections | mandatory |
| FR-07 | No join/union across databases | mandatory |
| FR-08 | parse operator support |
mandatory |
| FR-09 | ago() and now() support |
mandatory |
| FR-10 | matches regex partial-match semantics |
mandatory |
| FR-11 | extract() function support |
mandatory |
| FR-12 | log(), log10(), pow(), sqrt() support |
mandatory |
| FR-13 | parse_json() and dynamic() support |
mandatory |
| FR-14 | DateTime type support | mandatory |
| FR-15 | Empty-table aggregation behavior | mandatory |
| FR-16 | Rich documentation | mandatory |
| FR-17 | Proper module architecture | mandatory |
| FR-18 | Comprehensive tests | mandatory |
| FR-19 | let statement support |
mandatory |
| FR-20 | union operator support |
mandatory |
| FR-21 | join operator support |
mandatory |
| FR-22 | Wrap single-DataFrame quick-query API (adxpandas) | mandatory |
| FR-23 | Jupyter magic commands (adxpandas) | mandatory |
| FR-24 | render operator chart visualization (adxpandas) |
mandatory |
FR-01: SQLite-based local file database¶
AdxLite must use SQLite as its storage engine.
Expected behavior:
- tables are stored in a SQLite database file or in-memory database
- no external service is required to create, query, or mutate data
- the database connection is created through Python's SQLite support
Validation ideas:
- instantiate
AdxLiteClient("analytics.db") - ingest data
- reopen the file and confirm tables persist
FR-02: Pandas DataFrame ingestion as tables¶
AdxLite must ingest pandas DataFrames into queryable tables.
Expected behavior:
ingest_from_pandas()must exist as a public APIingest()may exist as the primary implementation or alias- the table name must become queryable immediately after ingestion
- schema metadata must be recorded for later restoration and validation
Validation ideas:
- ingest a DataFrame with string, numeric, bool, and datetime columns
- confirm
list_tables()shows the new table - confirm
get_schema()reports inferred logical types
FR-03: KQL query support (virtual Kusto database)¶
AdxLite must support querying local tables with a documented subset of KQL.
Expected behavior:
- the query language must be KQL-inspired pipeline syntax
- supported operators and functions must be documented
- results must be returned as pandas DataFrames
- queries must run entirely against local ingested tables
Validation ideas:
- run representative
where,project,extend,summarize, andsort byqueries - confirm output matches expected DataFrame content
FR-04: .append command¶
AdxLite must support a KQL-style append command that writes query results into a local table.
Expected behavior:
- syntax must follow
.append TableName <| query - the nested query must run through the normal planning and execution path
- the command must append rows using existing schema validation
- the command should return an empty pandas DataFrame after success
Validation ideas:
- create a destination table
- append filtered rows from another table
- query the destination table and confirm new rows were written
FR-05: Simple client API¶
AdxLite must provide a small, easy-to-understand Python API.
Expected behavior:
- a public
AdxLiteClientclass must exist - the client must expose ingestion, query, schema inspection, table listing, dropping, and closing operations
- the client must support context manager usage
Validation ideas:
- instantiate the client
- call every public method at least once in tests or examples
- confirm the context manager closes cleanly
FR-06: No external Kusto cluster connections¶
AdxLite must remain local-only and must not require or attempt remote Azure Data Explorer connectivity.
Expected behavior:
- no connection strings, cluster URIs, or remote auth flows are required
- all data must originate from local ingestion
- documentation must make the local-only model explicit
Validation ideas:
- inspect the public API and configuration surface
- confirm no cluster connection API exists
FR-07: No join/union across databases¶
AdxLite must not support multi-database relational composition such as cross-database joins or unions.
Expected behavior:
- cross-database references (e.g.,
database("other").Table) should raise an error - all tables in
joinandunionmust exist in the same local database - documentation must state that the supported model is a single local database
Validation ideas:
- join or union a table that does not exist in the local database
- confirm a
TableNotFoundErroris raised
FR-08: parse operator support¶
AdxLite must support the KQL parse operator.
Expected behavior:
- the parser must recognize
parse <expr> with <pattern> - capture variables must be added as new columns
- wildcard
*segments must skip text without capturing - execution may use pandas post-processing if SQL-only execution is unsuitable
Validation ideas:
- ingest a log message column
- parse user and action fields from the message text
- confirm projected capture columns match expected values
FR-09: ago() and now() functions¶
AdxLite must support current-time and relative-time datetime helpers.
Expected behavior:
now()must return a current UTC timestamp representationago(timespan)must subtract a supported timespan from the current UTC timestamp- both functions must be usable in query expressions
Validation ideas:
- run queries using
now()andago(1d)in filters orextend - confirm outputs are valid datetime-like strings or restored values as documented
FR-10: matches regex with partial-match semantics¶
AdxLite must support the matches regex operator using partial-match behavior.
Expected behavior:
- the operator must return true when the pattern matches any substring, not just the whole string
- regex matching must be available both in SQLite execution and pandas fallback execution
- documentation must clearly state the partial-match semantics
Validation ideas:
- evaluate a query such as
Message matches regex "err.." - confirm a row with
prefix error suffixis matched
FR-11: extract() function¶
AdxLite must support regex capture extraction through extract().
Expected behavior:
- the function must accept a pattern, group index, and text value
- it must return the captured subgroup or null when no match is found
- documentation must explain the return type and group numbering expectations
Validation ideas:
- query
extract("user=(\w+)", 1, Message)on sample rows - confirm the returned value is the captured user name
FR-12: Math function support¶
AdxLite must support at least log(), log10(), pow(), and sqrt().
Expected behavior:
- the functions must be callable in expressions such as
extend - results must use numeric return values
- the SQL and pandas execution layers must behave consistently enough for documented examples and tests
Validation ideas:
- ingest numeric columns
- compute logarithms, powers, and roots in a query
- compare results with Python expectations
FR-13: parse_json() and dynamic() support¶
AdxLite must support JSON parsing helpers.
Expected behavior:
parse_json(x)anddynamic(x)must exist- JSON values may be represented as JSON text rather than full nested object types in results
extractjson()must allow pulling values from JSON text
Validation ideas:
- ingest JSON text payloads
- parse and extract fields in a query
- confirm documented result behavior
FR-14: DateTime type support¶
AdxLite must support datetime ingestion, storage, comparison, and helper functions.
Expected behavior:
- pandas datetime columns must be detected during ingestion
- storage must preserve enough information for correct ordering and restoration
- result columns typed as datetime must be restored to pandas datetime dtype where applicable
- datetime helpers such as
bin,datetime_diff,format_datetime, anddatetime_addmust be supported as documented
Validation ideas:
- ingest datetime columns
- query with datetime filters and computed buckets
- confirm round-trip dtype restoration
FR-15: Empty table aggregation¶
AdxLite must support sensible aggregation behavior on empty tables.
Expected behavior:
count()on an empty input must return0- aggregate queries against empty tables must still return a row when appropriate for the summarize form being used
- null-like results for aggregates such as
sumandavgshould be documented clearly
Validation ideas:
- ingest an empty DataFrame with a numeric column
- run
summarize total=count(), sum_value=sum(value) - confirm
total == 0
FR-16: Rich documentation¶
AdxLite must include documentation that is detailed enough for a user to understand the product without reading the source code.
Expected behavior:
- architecture, decisions, type system, requirements, guides, reference docs, and project README must exist
- each document should include explanation, examples, parameter notes, return value notes, and cross-references where relevant
- limitations and behavioral differences from Azure Data Explorer must be explicit
Validation ideas:
- review the documentation set for completeness and internal consistency
- confirm that major APIs, operators, functions, and constraints are documented
FR-17: Proper module architecture¶
AdxLite must keep a clear separation between API, parsing, translation, storage, and execution concerns.
Expected behavior:
- the repository should contain distinct modules for the public client, parser, translator, storage, and engine layers
- responsibilities should remain understandable and reviewable
- the architecture should support future extension without collapsing all logic into a single file
Validation ideas:
- inspect the module tree
- confirm responsibilities match the documentation in Architecture
FR-18: Comprehensive tests¶
AdxLite must include automated tests covering the supported feature set.
Expected behavior:
- parser tests must exist
- translator tests must exist
- unit tests for UDF behavior must exist
- integration tests must cover query execution, datetime handling, advanced functions, and append behavior
Validation ideas:
- run
python -m pytest -q - confirm the suite passes
- review test files for coverage of representative features
Non-requirement clarifications¶
The following are intentionally not required by the current scope:
- external Kusto cluster connectivity
- full Azure Data Explorer feature parity
- distributed execution
- cross-database joins or unions
- remote metadata catalogs
- automatic dynamic-object expansion to nested Python structures in result DataFrames
Traceability to documentation¶
Each major requirement is covered elsewhere in the docs.
| Requirement area | Primary reference |
|---|---|
| Client API | API reference |
| Query operators | Operators reference |
| Functions | Functions reference |
| Architecture | Architecture |
| Storage and types | Type system |
| Usage examples | Quickstart, Advanced queries |
| Limitations | Limitations |
Review checklist¶
When evaluating changes to AdxLite, use this checklist:
- does the change preserve local-only execution?
- does it keep the client API simple?
- is the supported behavior covered by tests?
- is the behavior documented in the relevant guide or reference page?
- does it preserve or intentionally extend the supported KQL subset?
Related documents¶
FR-19: let statement support¶
AdxLite must support let bindings for naming scalar values and tabular sub-queries.
Supported forms¶
| Form | Example | Supported |
|---|---|---|
| Scalar let | let threshold = 100; T \| where val > threshold |
Yes |
| Tabular let | let errors = T \| where level == "error"; errors \| count |
Yes |
| Function let | let f = (x: int) { x * 2 }; |
No (raise KqlUnsupportedError) |
Expected behavior¶
- Multiple
letbindings separated by;before the main query body - Scalar lets substitute as literal values in subsequent expressions
- Tabular lets execute the sub-pipeline and make the result available as a table name
- Column names in the current row scope take precedence over scalar let names (matching Kusto semantics)
- Tabular let results are cleaned up after the main query completes
- Later let bindings can reference earlier let bindings
Validation ideas¶
let x = 5; T | where col > xreturns correct filtered resultlet filtered = T | where active == true; filtered | countreturns correct count- Column named same as let variable: column wins
- Undefined let reference: error
FR-20: union operator support¶
AdxLite must support combining rows from multiple local tables using union.
Supported forms¶
| Form | Example | Supported |
|---|---|---|
| Source form | union T1, T2 \| where x > 5 |
Yes |
| Pipe form | T1 \| union T2, T3 |
Yes |
| kind=outer (default) | All columns from all tables, NULL for missing | Yes |
| kind=inner | Only columns common to all tables | Yes |
| withsource=col | Adds a column indicating source table name | Yes |
| Sub-pipeline args | union (T1 \| where x > 5), T2 |
No (MVP) |
Expected behavior¶
- Schema alignment: columns missing in a table are filled with NULL (kind=outer)
- Column ordering: first table's columns come first, then new columns from subsequent tables
kind=inner: only columns present in ALL tables appear in the resultwithsource=colname: prepends a string column with the source table name per row- Works both as a source (before pipe) and as a pipe operator
- Union of empty tables produces an empty result with correct schema
Validation ideas¶
- Union two tables with same schema: row count = sum
- Union tables with different schemas: NULL fill verified
- kind=inner: verify only common columns in output
- withsource: verify source column content
- Union followed by where/summarize
FR-21: join operator support¶
AdxLite must support joining two local tables based on key columns.
Supported join kinds¶
| Kind | Behavior | Output columns |
|---|---|---|
innerunique (default) |
Rows matching on both sides | Left + right |
inner |
All matching row combinations | Left + right |
leftouter |
All left rows, matched right or NULL | Left + right |
rightouter |
All right rows, matched left or NULL | Left + right |
fullouter |
All rows from both, NULL where unmatched | Left + right |
leftanti |
Left rows with NO match on right | Left only |
leftsemi |
Left rows with at least one match on right | Left only |
rightanti |
Right rows with NO match on left | Right only |
rightsemi |
Right rows with at least one match on left | Right only |
Supported syntax forms¶
| Form | Example |
|---|---|
| Simple key | T1 \| join T2 on key |
| With kind | T1 \| join kind=leftouter T2 on key |
| Multi-key | T1 \| join T2 on key1, key2 |
| Qualified keys | T1 \| join T2 on $left.id == $right.user_id |
| Right sub-pipeline | T1 \| join kind=inner (T2 \| where x > 5) on id |
Output column naming rules¶
- Join key columns (simple form): appear once in output (from left side)
- Left non-key columns: keep original names
- Right non-key columns: if name conflicts with left column, suffix with
_right - Anti/semi joins: only output the relevant side's columns
Expected behavior¶
- Default join kind is
innerunique(treated asinnerfor MVP — no automatic right-side dedup) - Right side can be a full sub-pipeline enclosed in parentheses
- NULL keys do not match (SQL semantics)
- All tables must be in the same local database (no cross-database joins)
Validation ideas¶
- Inner join: verify only matching rows appear
- Left outer: verify NULL fill for unmatched right
- Left anti: verify only non-matching left rows
- Left semi: verify matching left rows without duplication
- Multi-key join: verify correct matching
- Column conflict: verify
_rightsuffix - Join with empty right table: left outer returns all left rows with NULL
- Right side sub-pipeline: verify filter applies before join
FR-22: Wrap single-DataFrame quick-query API (adxpandas)¶
adxpandas must provide a Wrap class that wraps a single DataFrame for quick KQL queries with method chaining.
Rationale¶
Many users work with a single DataFrame and want a quick, fluent way to apply KQL operators without setting up a client and registering tables. The Wrap pattern (inspired by KustoPandas) provides this.
Expected behavior¶
Wrap(df)wraps any pandas DataFramew.execute("self | where x > 1 | project name")runs a query usingselfas the table nameexecute()returns a newWrapfor further chaining (orRenderResultif query ends with render)- Convenience methods
.where(),.project(),.extend(),.summarize(),.sort(),.take(),.top(),.count(),.distinct(),.project_away()each return a newWrap .render()returns aRenderResult(not chainable — terminal operation).dfproperty exposes the underlying DataFrame_repr_html_()delegates to DataFrame for Jupyter display- The wrapped DataFrame is never mutated; each operation returns a new Wrap
Validation ideas¶
Wrap(df).where("x > 1")returns Wrap with filtered rows- Chain:
w.where(...).project(...).take(5)produces correct result w.dfis the expected DataFramew.execute("self | ...")works with any KQL pipeline
FR-23: Jupyter magic commands (adxpandas)¶
adxpandas must provide IPython/Jupyter magic commands for interactive KQL querying.
Rationale¶
Notebooks are a primary use case. Magic commands let users write KQL directly in cells without Python boilerplate.
Expected behavior¶
import adxpandas.magicregisters the%kqland%%kqlmagic- Line magic:
%kql df | where x > 1 | take 5— executes against local-scope DataFrames - Cell magic:
%%kql\ndf | where x > 1— multi-line queries - Variables in the local and global namespace that are DataFrames or Wraps are available as table names
- Returns
Wrap(displayable in notebook) orRenderResult(if query ends with render) - Result can be captured:
result = %kql df | where x > 1 - IPython is a lazy import;
import adxpandasmust succeed without IPython installed - If IPython is not available when importing
adxpandas.magic, raise a clear ImportError
Validation ideas¶
- Magic returns correct filtered DataFrame
- Multiple DataFrames in namespace can be referenced
- Wrap objects in namespace are usable as tables
- Missing IPython raises clear error
FR-24: render operator chart visualization (adxpandas)¶
adxpandas must support the KQL render operator for chart visualization.
Rationale¶
KQL's render operator is a standard way to visualize query results. Supporting it allows users to produce charts from their queries without separate plotting code.
Supported visualization types¶
| Type | Chart | Description |
|---|---|---|
timechart |
Line chart | Time series with datetime x-axis |
linechart |
Line chart | General line chart |
barchart |
Horizontal bar | Horizontal bar chart |
columnchart |
Vertical bar | Vertical bar chart |
piechart |
Pie chart | Proportional display |
areachart |
Area chart | Filled line chart |
table |
Table figure | Tabular display |
Supported syntax¶
Expected behavior¶
rendermust be the terminal operator in a pipeline (nothing follows it)renderis a display directive, not a data transformation — it does not change the DataFrame- The executor ignores RenderOp during execution (no effect on data pipeline)
- When detected by Wrap or magic, the result is a
RenderResultobject RenderResulthas a.dfproperty for the raw data and.figurefor the matplotlib figureRenderResult._repr_html_()embeds the chart as PNG in notebooks- matplotlib is a lazy import; render raises ImportError with install instructions if missing
AdxPandasClient.query()always returns a DataFrame (render is ignored at that level)- Wrap
.render()method creates a RenderResult directly (without query string)
Design constraints¶
- RenderOp is modeled as an Operator AST node for parser consistency
- The parser accepts render anywhere in the pipeline position but semantically it should be terminal
with (...)properties are optional and parsed as key=value pairs- Default xcolumn is the first column; default ycolumns are all remaining columns
Validation ideas¶
T | summarize count() by city | render barchartparses correctly- Executor produces correct DataFrame (render doesn't alter data)
Wrap.execute("self | ... | render timechart")returns RenderResultWrap.render("barchart")returns RenderResult- Missing matplotlib gives clear ImportError
renderwithwith (xcolumn=time, title="My Chart")parses properties