Type System¶
This document explains how AdxLite represents data types across its three execution layers: KQL, SQLite, and pandas. Understanding the type system is important because AdxLite intentionally tracks logical type metadata rather than relying only on SQLite storage affinity.
For ingestion behavior see Ingestion guide. For architecture context see Architecture. For function-level details see Functions reference.
Type system overview¶
AdxLite works with logical KQL-style types, physical SQLite storage types, and pandas runtime dtypes.
The same value may therefore have three relevant representations:
- a logical KQL type exposed by
get_schema() - a SQLite column storage type used on disk
- a pandas dtype or Python object representation seen in DataFrame results
Supported logical types¶
The project documentation and schema metadata use the following logical types.
| Logical type | Meaning |
|---|---|
string |
Textual data |
int |
32-bit style integer conceptually; see note below |
long |
Integer values used by current inference and most numeric whole-number operations |
real |
Floating-point numeric data |
bool |
Boolean data |
datetime |
Timestamp data stored as ISO-8601 text |
timespan |
Duration literals such as 1d and 30m used in expressions and functions |
dynamic |
JSON-like values represented as JSON text |
Important note on int vs long¶
Current DataFrame inference maps pandas integer columns to long. The public language surface includes conversion functions such as toint() and tolong(), but schema inference does not currently distinguish a separate stored int type from long.
Practical implication:
- for table schemas produced by ingestion, expect integer columns to appear as
long - treat
intin the language reference as a supported conversion target rather than a distinct inferred storage schema category
Mapping between KQL, SQLite, and pandas¶
| Logical KQL type | SQLite storage type | Typical pandas/runtime form | Notes |
|---|---|---|---|
string |
TEXT |
object / string-like values |
Default fallback type for non-numeric, non-datetime columns |
long |
INTEGER |
integer dtype or object depending on result shape | Used for inferred integer columns and count-like outputs |
real |
REAL |
float dtype | Used for floats and many math function outputs |
bool |
INTEGER |
pandas nullable boolean on restored schema columns | SQLite stores booleans using integer affinity |
datetime |
TEXT |
datetime64[ns] when restored |
Stored as ISO-8601 text, restored through metadata |
dynamic |
TEXT |
usually object/string containing JSON text | JSON is not auto-expanded to nested Python structures in results |
timespan |
literal, not table column by default | string literal parsed by helpers | Used primarily in expression parsing and UDF evaluation |
Type inference during ingestion¶
Type inference happens in storage.kql_types.infer_column_type().
Current inference rules:
- pandas datetime dtype ->
datetime - pandas boolean dtype ->
bool - pandas integer dtype ->
long - pandas float dtype ->
real - everything else ->
string
This means object-dtype columns containing Python dictionaries or lists are not automatically inferred as dynamic; they fall back to string unless you pre-normalize them to JSON text intentionally.
Storage normalization rules¶
After type inference, AdxLite normalizes values for SQLite storage.
datetime¶
- converted with
pandas.to_datetime(..., errors="coerce") - each non-null value is stored as
value.isoformat() - null or invalid values become
None
bool¶
- non-null values are converted to
bool(value) - nulls remain
None - SQLite stores the values with integer affinity
Other types¶
- values are passed through with missing values normalized to
None
Result restoration rules¶
When Database.query_dataframe() receives a result_schema, it restores result columns by logical type.
Current restoration behavior:
datetime->pandas.to_datetime(series, errors="coerce")bool->series.astype("boolean")- other types -> returned as-is from the DataFrame built from SQLite rows
Important implication:
Columns created by SQL expressions may only be restored if the planner inferred an appropriate logical type for them.
Datetime handling in detail¶
Datetime behavior is one of the most important parts of the type system because SQLite has no dedicated datetime storage type.
Storage representation¶
Datetimes are stored as ISO-8601 strings in TEXT columns.
Example stored values:
Why this works well¶
- readable during direct SQLite inspection
- easy to parse back into pandas timestamps
- stable for UDF-based datetime math
- compatible with lexical ordering when values are consistently normalized
Comparison semantics¶
Datetime comparisons are typically executed on the ISO-8601 values or UDF-transformed results.
Examples:
Events | where ts >= datetime(2024-01-01)
Events | where ts between (datetime(2024-01-01) .. datetime(2024-01-31))
Because AdxLite uses normalized ISO strings, these comparisons behave predictably for typical timestamp forms produced by ingestion and datetime helpers.
Datetime-producing helpers¶
The planner treats several functions as datetime-like outputs represented as strings in SQL and then restored where schema allows.
Functions in this category include:
now()ago(timespan)bin(datetime_value, timespan)datetime_add(timespan, datetime_value)
format_datetime() is different: it deliberately produces a formatted string, not a datetime value.
datetime(...) literals¶
AdxLite supports datetime(...) literals in the parser.
Examples:
The parser collects the literal content and stores it as a datetime-kind literal node. At translation time it becomes a bound parameter value.
now() and ago()¶
now()returns the current UTC timestamp as an ISO-8601 stringago(1d)subtracts a parsed timespan from the current UTC timestamp and returns an ISO-8601 string
These are implemented in the UDF layer, not by SQLite built-ins.
bin()¶
bin(datetime_value, timespan) snaps a datetime down to a fixed boundary.
Example:
If ts is 2024-01-01T10:45:30 and the bucket is 1h, the result is 2024-01-01T10:00:00.
datetime_diff()¶
datetime_diff(unit, left, right) returns an integer difference measured in the requested unit.
Supported units currently include:
dayhourminutesecondmillisecond
datetime_add()¶
AdxLite supports a simplified signature:
Example:
This is narrower than some Kusto variants that split unit and amount.
Timespan handling in detail¶
Timespans are not currently inferred as table column schemas by ingestion. Instead, they are recognized primarily as expression literals and UDF inputs.
Supported literal formats¶
| Literal example | Meaning |
|---|---|
1d |
one day |
12h |
twelve hours |
30m |
thirty minutes |
5s |
five seconds |
100ms |
one hundred milliseconds |
1.5h |
one and a half hours |
Parsing rules¶
The UDF layer parses timespans using a simple <number><unit> model.
Supported units:
dhmsms
Unsupported examples include:
- compound values such as
1h30m - week units such as
1w - month or year units such as
1moor1y
Use the simplest supported unit form and, when needed, convert compound durations to a single unit manually.
Dynamic / JSON handling¶
AdxLite exposes dynamic, parse_json, and extractjson, but it currently represents dynamic data as JSON text rather than a nested in-memory object model.
What dynamic() and parse_json() return¶
Both functions produce JSON text strings.
Examples:
If payload contains {"count": 9}, the resulting value is a JSON string such as {"count": 9}.
What extractjson() returns¶
extractjson(path, json_text) extracts a JSON path from a JSON string.
Behavior:
- returns
Noneif the JSON is invalid or the path is not found - returns JSON text when the extracted value is a list or object
- returns string form when the extracted value is scalar
Example:
The result is a string such as "9", not an integer. If you need a numeric type, convert it explicitly:
String and numeric coercion rules¶
AdxLite is intentionally pragmatic rather than enforcing a strict formal static type system.
Common coercion patterns¶
tostring(x)casts to texttoint(x)/tolong(x)cast numeric text to integer when possibletodouble(x)/toreal(x)cast numeric text to floating-point when possible- arithmetic operators generally rely on SQLite or pandas numeric behavior
- string helpers convert inputs to strings as needed
Null handling¶
Many helper functions return None when one or more required inputs are null. Examples include many UDF-backed functions such as replace_string, sqrt, or extractjson when parsing fails.
Boolean handling¶
Boolean table columns are restored to pandas nullable boolean dtype when the result schema says the column is bool. Intermediate expression results may also behave as standard pandas boolean Series depending on execution path.
Type inference in planner expressions¶
The planner infers result types for expressions so the engine can restore outputs sensibly.
Examples:
- comparison operators such as
==,contains,has, andmatches regexinferbool - numeric functions such as
sqrt,log, andpowinferreal - integer-style functions such as
toint,countof, anddatetime_diffinferlong - string-style functions such as
extract,replace_string, andformat_datetimeinferstring - aggregate
count,countif, anddcountinferlong
This inferred schema is one reason query results can restore datetime or boolean columns accurately after SQL execution.
Practical recommendations¶
- use explicit conversion functions when reading JSON scalars
- keep datetime columns as pandas datetime dtype before ingestion
- treat
dynamicvalues as JSON text unless you explicitly deserialize them in Python - prefer
longas the mental model for integer table columns - remember that type restoration happens after query execution, not during raw SQLite storage