The Rust SDK for building DuckDB loadable extensions — no C++ required.
What is quack-rs?
quack-rs is a production-grade Rust SDK that makes building DuckDB
loadable extensions straightforward and safe. It wraps the DuckDB C Extension API — the same
API used by official DuckDB extensions — and eliminates every known FFI pitfall so you can
focus on writing extension logic in pure Rust.
DuckDB's own documentation acknowledges the gap:
"Writing a Rust-based DuckDB extension requires writing glue code in C++ and will force you to build through DuckDB's CMake & C++ based extension template. We understand that this is not ideal and acknowledge the fact that Rust developers prefer to work on pure Rust codebases."
quack-rs closes that gap. No C++. No CMake. No glue code.
What you can build
| Extension type | quack-rs support |
|---|---|
| Scalar functions | ✅ ScalarFunctionBuilder |
| Overloaded scalars | ✅ ScalarFunctionSetBuilder |
| Aggregate functions | ✅ AggregateFunctionBuilder |
| Overloaded aggregates | ✅ AggregateFunctionSetBuilder |
| Table functions | ✅ TableFunctionBuilder |
| Cast / TRY_CAST functions | ✅ CastFunctionBuilder |
| Replacement scans | ✅ ReplacementScanBuilder |
| SQL macros (scalar) | ✅ SqlMacro::scalar |
| SQL macros (table) | ✅ SqlMacro::table |
| Copy functions (COPY TO) | ✅ CopyFunctionBuilder (requires duckdb-1-5) |
Note: Window functions have no counterpart in DuckDB's public C Extension API and cannot be implemented from Rust (or any language) via that API. See Known Limitations.
Why does this exist?
quack-rs was extracted from
duckdb-behavioral, a production DuckDB
community extension. Building that extension revealed 16 undocumented pitfalls in DuckDB's
Rust FFI surface — struct layouts, callback contracts, and initialization sequences that
aren't covered anywhere in the DuckDB documentation or libduckdb-sys docs.
Three of those pitfalls caused extension-breaking bugs that passed 435 unit tests before being caught by end-to-end tests:
- A SEGFAULT on load (wrong entry point sequence)
- 6 of 7 functions silently not registered (undocumented function-set naming rule)
- Wrong aggregate results under parallel plans (combine callback not propagating configuration fields to fresh target states)
quack-rs makes each of these impossible through type-safe builders and safe wrappers.
The full catalog is documented in the Pitfall Reference.
Key features
- Zero C++ — no `CMakeLists.txt`, no header files, no glue code
- All C API function types — scalar, aggregate, table, cast, replacement scan, SQL macro, copy function (`duckdb-1-5`)
- Panic-free FFI — `init_extension` never panics; errors surface via `Result`
- RAII memory management — `LogicalType` and `FfiState<T>` prevent leaks and double-frees
- Type-safe builders — `ScalarFunctionBuilder`, `AggregateFunctionBuilder`, `TableFunctionBuilder`, `CastFunctionBuilder`, `ReplacementScanBuilder`
- SQL macros — register `CREATE MACRO` statements without any FFI callbacks
- Testable state — `AggregateTestHarness<T>` tests aggregate logic without a live DuckDB
- Scaffold generator — produces a submission-ready community extension project from code
- 16 pitfalls documented — every known DuckDB Rust FFI pitfall, with symptoms and fixes
Navigation
New to DuckDB extensions? → Start with Quick Start
Adding quack-rs to an existing project? → See Installation
Writing your first function? → See Scalar Functions or Aggregate Functions
Want SQL macros without FFI callbacks? → See SQL Macros
Submitting a community extension? → See Community Extensions
Something broke? → See Pitfall Catalog
Quick Start
This page gets you from zero to a working DuckDB extension in three steps.
Prerequisites
Step 1 — Add quack-rs to your extension
In your extension's Cargo.toml:
```toml
[dependencies]
quack-rs = "0.7"
libduckdb-sys = { version = ">=1.4.4, <2", features = ["loadable-extension"] }

[lib]
name = "my_extension"           # must match your extension name — see Pitfall P1
crate-type = ["cdylib", "rlib"]

[profile.release]
panic = "abort"                 # required — panics across FFI are undefined behavior
lto = true
opt-level = 3
codegen-units = 1
strip = true
```
Starting fresh? Use the scaffold generator to produce a complete, submission-ready project from a single function call.
Step 2 — Write the extension
```rust
// src/lib.rs
use quack_rs::entry_point;
use quack_rs::error::ExtensionError;
use quack_rs::scalar::ScalarFunctionBuilder;
use quack_rs::types::TypeId;
use quack_rs::vector::{VectorReader, VectorWriter};
use libduckdb_sys::{duckdb_connection, duckdb_function_info, duckdb_data_chunk, duckdb_vector};

/// Scalar function: double_it(BIGINT) → BIGINT
unsafe extern "C" fn double_it(
    _info: duckdb_function_info,
    input: duckdb_data_chunk,
    output: duckdb_vector,
) {
    // SAFETY: input is a valid data chunk provided by DuckDB.
    let reader = unsafe { VectorReader::new(input, 0) };
    let mut writer = unsafe { VectorWriter::new(output) };
    let row_count = reader.row_count();
    for row in 0..row_count {
        if unsafe { !reader.is_valid(row) } {
            unsafe { writer.set_null(row) };
            continue;
        }
        let value = unsafe { reader.read_i64(row) };
        unsafe { writer.write_i64(row, value * 2) };
    }
}

fn register(con: duckdb_connection) -> Result<(), ExtensionError> {
    unsafe {
        ScalarFunctionBuilder::new("double_it")
            .param(TypeId::BigInt)
            .returns(TypeId::BigInt)
            .function(double_it)
            .register(con)?;
    }
    Ok(())
}

entry_point!(my_extension_init_c_api, |con| register(con));
```
Step 3 — Build and test
```sh
# Build the extension
cargo build --release

# Load in DuckDB CLI
duckdb -cmd "LOAD './target/release/libmy_extension.so'; SELECT double_it(21);"
# ┌───────────────┐
# │ double_it(21) │
# │     int64     │
# ├───────────────┤
# │      42       │
# └───────────────┘
```
macOS: use the `.dylib` extension. Windows: use `.dll`.
What's next?
- Learn how DuckDB calls your extension: Extension Anatomy
- Add an aggregate function: Aggregate Functions
- Add SQL macros without any callbacks: SQL Macros
- Generate a complete community extension project: Project Scaffold
Installation
Adding quack-rs to an existing extension
Add the following to your extension's Cargo.toml:
```toml
[dependencies]
quack-rs = "0.7"
libduckdb-sys = { version = ">=1.4.4, <2", features = ["loadable-extension"] }
```
Why `>=1.4.4, <2`? DuckDB 1.4.x and 1.5.x expose the same C API version (v1.2.0), so `quack-rs` supports both with a single bounded range. The `<2` upper bound prevents silent adoption of a future major release whose C API may change in breaking ways — making any such upgrade an explicit, auditable decision. See Extension Anatomy.
Required Cargo.toml settings
Every DuckDB extension requires specific Cargo settings to link and behave correctly:
```toml
[lib]
name = "my_extension"            # ← must match extension name exactly (Pitfall P1)
crate-type = ["cdylib", "rlib"]
#             ^^^^^^ cdylib produces the .so/.dylib/.dll DuckDB loads
#                    rlib allows unit tests and documentation to work

[profile.release]
panic = "abort"     # REQUIRED — panics across FFI are undefined behavior
lto = true          # recommended — reduces binary size, improves performance
opt-level = 3       # recommended
codegen-units = 1   # recommended — enables full LTO
strip = true        # recommended — reduces binary size
```
Why panic = "abort"?
Rust's default panic behavior unwinds the stack. When a panic crosses an FFI boundary into
DuckDB's C++ code, the result is undefined behavior — DuckDB may crash, corrupt memory,
or silently produce wrong results. The panic = "abort" setting converts panics into
immediate process termination, which is far safer.
quack-rs itself never panics in FFI callbacks, but this setting protects you if a
dependency or your own code panics.
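Some authors additionally guard callback bodies with `std::panic::catch_unwind`, which converts a panic into an error value under the default unwind strategy (debug and test builds); under `panic = "abort"` the process still terminates before any guard can run. A minimal pure-Rust sketch, where the `guard` helper is hypothetical and not a quack-rs API:

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};

/// Hypothetical helper (not part of quack-rs): run callback logic and
/// convert any panic into an Err instead of letting it unwind into C.
fn guard<T>(f: impl FnOnce() -> T) -> Result<T, String> {
    catch_unwind(AssertUnwindSafe(f)).map_err(|_| String::from("panicked in callback"))
}

fn main() {
    // Normal path: the closure's result comes back unchanged.
    assert_eq!(guard(|| 21 * 2), Ok(42));
    // Panic path: caught here, so it never crosses an FFI boundary.
    assert!(guard(|| -> i64 { panic!("boom") }).is_err());
}
```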
Minimum Supported Rust Version
quack-rs requires Rust ≥ 1.84.1.
This MSRV is required for:
- `&raw mut expr` syntax for creating raw pointers without references (sound and stable since 1.84.0)
- `const extern fn` support
Install or update via:
```sh
rustup update stable
rustup default stable
```
Verify:
```sh
rustc --version   # must be ≥ 1.84.1
```
Development dependencies
For testing with a live DuckDB instance (example-extension tests only):
```toml
[dev-dependencies]
duckdb = { version = ">=1.4.4, <2", features = ["bundled"] }
```
Important: you cannot call any `duckdb_*` function in a `cargo test` process when using the `loadable-extension` feature. See Testing Guide for the full explanation.
Starting a new extension from scratch
Use the scaffold generator to produce a complete project with all required files pre-configured. This is the fastest and most reliable way to start a new extension.
Your First Extension
This page walks through hello-ext, the complete reference example bundled with quack-rs.
It registers four functions that together cover every major pattern:
| SQL | Kind | Signature |
|---|---|---|
| word_count(text) | Aggregate | VARCHAR → BIGINT |
| first_word(text) | Scalar | VARCHAR → VARCHAR |
| generate_series_ext(n) | Table | BIGINT → TABLE(value BIGINT) |
| CAST(VARCHAR AS INTEGER) | Cast | VARCHAR → INTEGER |
Full source: examples/hello-ext/src/lib.rs
Build and try it
```sh
cargo build --release --manifest-path examples/hello-ext/Cargo.toml
```
Then in the DuckDB CLI:
```sql
LOAD './examples/hello-ext/target/release/libhello_ext.so';

-- Aggregate: total words across all rows
SELECT word_count(sentence) FROM (
    VALUES ('hello world'), ('one two three'), (NULL)
) t(sentence);
-- → 5 (2 + 3; NULL contributes 0)

-- Scalar: first word of each row
SELECT first_word(sentence) FROM (
    VALUES ('hello world'), (' padded '), (''), (NULL)
) t(sentence);
-- → 'hello', 'padded', '', NULL
```
Overview
An extension has four parts:
- State struct — holds data accumulated during aggregation (aggregate only)
- Callbacks — `update`, `combine`, `finalize`, `state_size`, `state_init`, `state_destroy` (aggregate) or a single function callback (scalar)
- Registration — wire callbacks to DuckDB via `AggregateFunctionBuilder` / `ScalarFunctionBuilder`
- Entry point — DuckDB's initialization hook, generated by `entry_point!`
Part 1 — Aggregate function: word_count
An aggregate function accumulates state across many rows and emits one result per group.
1a. The state struct
```rust
#[derive(Default, Debug)]
struct WordCountState {
    count: i64,
}

impl AggregateState for WordCountState {}
```
AggregateState is a marker trait — no methods required.
FfiState<WordCountState> wraps it in a heap-allocated Box<T> behind a raw pointer
and manages the full lifecycle (init, combine, destroy).
1b. state_size and state_init
These two callbacks are always identical boilerplate — delegate to FfiState:
```rust
unsafe extern "C" fn wc_state_size(_info: duckdb_function_info) -> idx_t {
    FfiState::<WordCountState>::size_callback(_info)
}

unsafe extern "C" fn wc_state_init(info: duckdb_function_info, state: duckdb_aggregate_state) {
    unsafe { FfiState::<WordCountState>::init_callback(info, state) };
}
```
size_callback returns size_of::<*mut WordCountState>() — DuckDB allocates a pointer-slot
per group. init_callback runs Box::new(WordCountState::default()) and writes the pointer
into that slot.
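The pointer-slot lifecycle described above can be modeled in pure Rust. This sketch is illustrative only; the `init`/`destroy` helpers below are not the actual `FfiState` internals:

```rust
#[derive(Default, Debug, PartialEq)]
struct State {
    count: i64,
}

/// Illustrative init: DuckDB allocates a pointer-sized slot per group;
/// init boxes a default state and writes the raw pointer into it.
fn init(slot: &mut *mut State) {
    *slot = Box::into_raw(Box::new(State::default()));
}

/// Illustrative destroy: free the boxed state and null the slot so a
/// second destroy call is a no-op rather than a double-free.
fn destroy(slot: &mut *mut State) {
    if !slot.is_null() {
        // SAFETY: the pointer was produced by Box::into_raw in init.
        unsafe { drop(Box::from_raw(*slot)) };
        *slot = std::ptr::null_mut();
    }
}

fn main() {
    let mut slot: *mut State = std::ptr::null_mut();
    init(&mut slot);
    unsafe { (*slot).count += 3 };
    assert_eq!(unsafe { (*slot).count }, 3);
    destroy(&mut slot);
    destroy(&mut slot); // second call sees a null slot and does nothing
    assert!(slot.is_null());
}
```

Nulling the slot after freeing is the same idea `destroy_callback` uses to keep repeated destruction safe.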
1c. update — accumulate one batch
```rust
unsafe extern "C" fn wc_update(
    _info: duckdb_function_info,
    input: duckdb_data_chunk,
    states: *mut duckdb_aggregate_state,
) {
    let reader = unsafe { VectorReader::new(input, 0) };
    let row_count = reader.row_count();
    for row in 0..row_count {
        if !unsafe { reader.is_valid(row) } {
            continue; // NULL input → skip (contributes 0 words)
        }
        let s = unsafe { reader.read_str(row) };
        let words = count_words(s);
        let state_ptr = unsafe { *states.add(row) };
        if let Some(st) = unsafe { FfiState::<WordCountState>::with_state_mut(state_ptr) } {
            st.count += words;
        }
    }
}
```
Key points:
- Check `is_valid(row)` before reading — never dereference an invalid (NULL) row
- `VectorReader::new(chunk, col)` gives column `col` from the chunk
- `count_words` is pure Rust — no unsafe, easy to unit-test separately
1d. combine — merge parallel results
Pitfall L1: DuckDB creates fresh zero-initialized target states before calling `combine`. You must copy all fields — not just the result field. In an aggregate with config fields (e.g., a histogram with a `bin_width`) you must also copy those, or results will be silently corrupted.
```rust
unsafe extern "C" fn wc_combine(
    _info: duckdb_function_info,
    source: *mut duckdb_aggregate_state,
    target: *mut duckdb_aggregate_state,
    count: idx_t,
) {
    for i in 0..count as usize {
        let src_ptr = unsafe { *source.add(i) };
        let tgt_ptr = unsafe { *target.add(i) };
        let src = unsafe { FfiState::<WordCountState>::with_state(src_ptr) };
        let tgt = unsafe { FfiState::<WordCountState>::with_state_mut(tgt_ptr) };
        if let (Some(s), Some(t)) = (src, tgt) {
            t.count += s.count;
            // If you add fields to WordCountState, combine them here too.
        }
    }
}
```
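To make the pitfall concrete, here is a pure-Rust sketch of a combine that correctly propagates a configuration field. The `HistState` type and its `bin_width` field are hypothetical, for illustration only:

```rust
#[derive(Default, Clone, Debug, PartialEq)]
struct HistState {
    bin_width: f64, // configuration, set once at init
    total: i64,     // accumulated result
}

/// Correct combine: merges the result AND propagates the configuration.
/// The target may be a fresh zero-initialized state, so skipping
/// bin_width would silently zero the configuration for that group.
fn combine(src: &HistState, tgt: &mut HistState) {
    tgt.total += src.total;
    if tgt.bin_width == 0.0 {
        tgt.bin_width = src.bin_width; // propagate config to fresh targets
    }
}

fn main() {
    let src = HistState { bin_width: 2.5, total: 7 };
    let mut fresh = HistState::default(); // zeroed target, as DuckDB creates
    combine(&src, &mut fresh);
    assert_eq!(fresh.total, 7);
    assert_eq!(fresh.bin_width, 2.5); // config survived the merge
}
```

Dropping the `bin_width` line would leave `fresh.bin_width` at `0.0`, which is exactly the silent-corruption failure mode described above.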
1e. finalize — write output
```rust
unsafe extern "C" fn wc_finalize(
    _info: duckdb_function_info,
    source: *mut duckdb_aggregate_state,
    result: duckdb_vector,
    count: idx_t,
    offset: idx_t,
) {
    let mut writer = unsafe { VectorWriter::new(result) };
    for i in 0..count as usize {
        let state_ptr = unsafe { *source.add(i) };
        match unsafe { FfiState::<WordCountState>::with_state(state_ptr) } {
            Some(st) => unsafe { writer.write_i64(offset as usize + i, st.count) },
            None => unsafe { writer.set_null(offset as usize + i) },
        }
    }
}
```
offset is DuckDB's output row offset — always use offset as usize + i, not just i.
1f. state_destroy
```rust
unsafe extern "C" fn wc_state_destroy(
    states: *mut duckdb_aggregate_state,
    count: idx_t,
) {
    unsafe { FfiState::<WordCountState>::destroy_callback(states, count) };
}
```
destroy_callback calls Box::from_raw and nulls each pointer, preventing double-free.
Part 2 — Scalar function: first_word
A scalar function processes one data chunk and returns one output value per row. The callback receives the full chunk and an output vector (not per-row state pointers).
Key rule: always propagate NULL
If the input row is NULL, write NULL to output — never read from an invalid row.
```rust
unsafe extern "C" fn first_word_scalar(
    _info: duckdb_function_info,
    input: duckdb_data_chunk,
    output: duckdb_vector,
) {
    let reader = unsafe { VectorReader::new(input, 0) };
    let mut writer = unsafe { VectorWriter::new(output) };
    let row_count = reader.row_count();
    for row in 0..row_count {
        if !unsafe { reader.is_valid(row) } {
            unsafe { writer.set_null(row) }; // NULL in → NULL out
            continue;
        }
        let s = unsafe { reader.read_str(row) };
        unsafe { writer.write_varchar(row, first_word(s)) };
    }
}
```
The pure logic:
```rust
pub fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}
```
Note: set_null internally calls duckdb_vector_ensure_validity_writable before writing
the null flag — this is required by DuckDB and handled for you by VectorWriter.
Part 3 — Registration
```rust
unsafe fn register(con: libduckdb_sys::duckdb_connection) -> Result<(), ExtensionError> {
    unsafe {
        AggregateFunctionBuilder::new("word_count")
            .param(TypeId::Varchar)
            .returns(TypeId::BigInt)
            .state_size(wc_state_size)
            .init(wc_state_init)
            .update(wc_update)
            .combine(wc_combine)
            .finalize(wc_finalize)
            .destructor(wc_state_destroy)
            .register(con)?;

        ScalarFunctionBuilder::new("first_word")
            .param(TypeId::Varchar)
            .returns(TypeId::Varchar)
            .function(first_word_scalar)
            .register(con)?;
    }
    Ok(())
}
```
Both builders call the DuckDB C API internally. register returns Err if DuckDB reports
a failure — this propagates to the entry point and is surfaced to the user.
Part 4 — Entry point
```rust
quack_rs::entry_point!(hello_ext_init_c_api, |con| unsafe { register(con) });
```
This one line emits:
```rust
#[no_mangle]
pub unsafe extern "C" fn hello_ext_init_c_api(
    info: duckdb_extension_info,
    access: *const duckdb_extension_access,
) -> bool {
    unsafe {
        quack_rs::entry_point::init_extension(
            info,
            access,
            quack_rs::DUCKDB_API_VERSION,
            |con| unsafe { register(con) },
        )
    }
}
```
Pass the full symbol name — hello_ext_init_c_api here. DuckDB looks up this exact
symbol when loading the extension. See The Entry Point for
the full initialization sequence.
Unit tests (no DuckDB process needed)
Test pure logic directly:
```rust
#[test]
fn count_words_whitespace_variants() {
    assert_eq!(count_words(" hello world "), 2);
    assert_eq!(count_words("\t\nhello\tworld\n"), 2);
    assert_eq!(count_words(" "), 0); // all whitespace → 0
}

#[test]
fn first_word_empty_and_whitespace() {
    assert_eq!(first_word(""), "");
    assert_eq!(first_word(" "), "");
}
```
Test aggregate state with AggregateTestHarness:
```rust
#[test]
fn word_count_null_rows_are_skipped() {
    // NULL rows: the callback skips them (no update call)
    let mut h = AggregateTestHarness::<WordCountState>::new();
    h.update(|s| s.count += count_words("hello"));
    // NULL row omitted — models callback skip
    h.update(|s| s.count += count_words("world"));
    assert_eq!(h.finalize().count, 2);
}

#[test]
fn word_count_combine() {
    let mut h1 = AggregateTestHarness::<WordCountState>::new();
    h1.update(|s| s.count += count_words("hello world")); // 2
    let mut h2 = AggregateTestHarness::<WordCountState>::new();
    h2.update(|s| s.count += count_words("one two three four")); // 4
    h2.combine(&h1, |src, tgt| tgt.count += src.count);
    assert_eq!(h2.finalize().count, 6);
}
```
Run all tests with:
```sh
cargo test --manifest-path examples/hello-ext/Cargo.toml
```
See the Testing Guide for the full test strategy.
Project Scaffold
quack_rs::scaffold::generate_scaffold generates a complete, submission-ready DuckDB
community extension project from a single function call. No manual file creation, no
copy-pasting templates.
What it generates
```
my_extension/
├── Cargo.toml               # cdylib crate, pinned deps, release profile
├── Makefile                 # delegates to cargo + extension-ci-tools
├── extension_config.cmake   # required by extension-ci-tools
├── src/
│   ├── lib.rs               # entry point template
│   └── wasm_lib.rs          # WASM staticlib shim
├── description.yml          # community extension metadata
├── test/
│   └── sql/
│       └── my_extension.test   # SQLLogicTest skeleton
├── .github/
│   └── workflows/
│       └── extension-ci.yml    # cross-platform CI workflow
├── .gitmodules              # extension-ci-tools submodule
├── .gitignore
└── .cargo/
    └── config.toml          # Windows CRT static linking
```
Usage
```rust
use quack_rs::scaffold::{ScaffoldConfig, generate_scaffold};
use std::path::Path;

fn main() {
    let config = ScaffoldConfig {
        name: "my_extension".to_string(),
        description: "My DuckDB extension".to_string(),
        version: "0.1.0".to_string(),
        license: "MIT".to_string(),
        maintainer: "Your Name".to_string(),
        github_repo: "yourorg/duckdb-my-extension".to_string(),
        excluded_platforms: vec![],
    };

    let files = generate_scaffold(&config).expect("scaffold generation failed");
    for file in &files {
        let path = Path::new(&file.path);
        if let Some(parent) = path.parent() {
            std::fs::create_dir_all(parent).unwrap();
        }
        std::fs::write(path, &file.content).unwrap();
        println!("created {}", file.path);
    }
}
```
ScaffoldConfig fields
| Field | Type | Description |
|---|---|---|
| name | String | Extension name — must match [lib] name in Cargo.toml and description.yml |
| description | String | One-line description for description.yml |
| version | String | Semver or git hash — validated by validate_extension_version |
| license | String | SPDX license identifier (e.g., "MIT", "Apache-2.0") |
| maintainer | String | Your name or org, listed in description.yml |
| github_repo | String | "owner/repo" format |
| excluded_platforms | Vec<String> | Platforms to skip (e.g., ["wasm_mvp", "wasm_eh"]) |
Name validation
Extension names must satisfy all of:
- Match `^[a-z][a-z0-9_-]*$`
- Not exceed 64 characters
- Be globally unique on community-extensions.duckdb.org
Use vendor-prefixed names to avoid collisions: myorg_analytics, not analytics.
The scaffold generator validates the name before generating any files and returns an error if it violates the rules.
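For reference, the naming rules can be checked with a few lines of plain Rust. This is an illustrative sketch, not the SDK's actual validator:

```rust
/// Illustrative check for the rules above: matches ^[a-z][a-z0-9_-]*$
/// and enforces the 64-character limit. Global uniqueness can only be
/// checked against the community-extensions repository itself.
fn is_valid_name(name: &str) -> bool {
    let mut chars = name.chars();
    let first_ok = matches!(chars.next(), Some(c) if c.is_ascii_lowercase());
    first_ok
        && name.len() <= 64
        && chars.all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '_' || c == '-')
}

fn main() {
    assert!(is_valid_name("myorg_analytics"));
    assert!(!is_valid_name("Analytics"));  // uppercase first char
    assert!(!is_valid_name("1analytics")); // must start with a letter
    assert!(!is_valid_name(""));           // empty
}
```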
After scaffolding
```sh
cd my_extension
git init
git submodule add https://github.com/duckdb/extension-ci-tools.git extension-ci-tools
git submodule update --init --recursive
make configure
make release
```
Then add your function logic in src/lib.rs, write your SQLLogicTests in
test/sql/my_extension.test, and push to GitHub — CI runs automatically.
Excluded platforms
Some extensions cannot be built for all platforms (e.g., extensions that depend on platform-specific system libraries, or WASM environments that lack threading).
```rust
ScaffoldConfig {
    excluded_platforms: vec![
        "wasm_mvp".to_string(),
        "wasm_eh".to_string(),
        "wasm_threads".to_string(),
    ],
    // ...
}
```
Validate individual platform names with quack_rs::validate::validate_platform, or a
semicolon-delimited string (as used in description.yml) with
quack_rs::validate::validate_excluded_platforms_str.
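A sketch of what such semicolon-delimited validation might look like in plain Rust. Both the helper and the platform list are assumptions for illustration; the real list lives in quack-rs and DuckDB's community-extension CI:

```rust
/// Assumed platform names (illustrative, not exhaustive).
const KNOWN_PLATFORMS: &[&str] = &[
    "linux_amd64", "linux_arm64", "osx_amd64", "osx_arm64",
    "windows_amd64", "wasm_mvp", "wasm_eh", "wasm_threads",
];

/// Hypothetical validator: split on ';', trim, reject unknown names.
fn validate_platforms_str(excluded: &str) -> Result<Vec<String>, String> {
    excluded
        .split(';')
        .map(str::trim)
        .filter(|p| !p.is_empty())
        .map(|p| {
            if KNOWN_PLATFORMS.contains(&p) {
                Ok(p.to_string())
            } else {
                Err(format!("unknown platform: {p}"))
            }
        })
        .collect()
}

fn main() {
    assert!(validate_platforms_str("wasm_mvp;wasm_eh").is_ok());
    assert!(validate_platforms_str("wasm_mvp;not_a_platform").is_err());
}
```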
Extension Anatomy
A DuckDB loadable extension is a shared library (.so / .dylib / .dll) that DuckDB loads
at runtime. Understanding what DuckDB expects makes every other part of quack-rs click.
The initialization sequence
When DuckDB loads your extension, it:
- Opens the shared library and looks up the symbol `{name}_init_c_api`
- Calls that function with an `info` handle and a pointer to function dispatch pointers
- Your function must:
  a. Call `duckdb_rs_extension_api_init(info, access, api_version)` to initialize the dispatch table
  b. Get the `duckdb_database` handle via `access.get_database(info)`
  c. Open a `duckdb_connection` via `duckdb_connect`
  d. Register functions on that connection
  e. Disconnect
  f. Return `true` (success) or `false` (failure)
quack_rs::entry_point::init_extension performs all of this correctly. The entry_point!
macro generates the required #[no_mangle] extern "C" symbol:
```rust
entry_point!(my_extension_init_c_api, |con| register(con));
// emits: #[no_mangle] pub unsafe extern "C" fn my_extension_init_c_api(...)
```
Symbol naming
The symbol name must be {extension_name}_init_c_api — all lowercase, underscores only.
If the symbol is missing or misnamed, DuckDB fails to load the extension.
```
Extension name:  "word_count_ext"
Required symbol: word_count_ext_init_c_api
```
Pass the full symbol name to entry_point!. This keeps the exported name explicit and
visible at the call site — no hidden identifier manipulation at compile time.
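The name-to-symbol mapping is plain concatenation; the helper below is illustrative, not a quack-rs API:

```rust
/// Illustrative helper: the exported symbol DuckDB looks up is always
/// "{extension_name}_init_c_api".
fn init_symbol(extension_name: &str) -> String {
    format!("{extension_name}_init_c_api")
}

fn main() {
    assert_eq!(init_symbol("word_count_ext"), "word_count_ext_init_c_api");
}
```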
The loadable-extension feature
`libduckdb-sys` with `features = ["loadable-extension"]` fundamentally changes how DuckDB API functions work:
```
Without feature: duckdb_query(...) → calls linked libduckdb directly
With feature:    duckdb_query(...) → dispatches through an AtomicPtr table
```
The AtomicPtr table starts as null. DuckDB fills it in by calling
duckdb_rs_extension_api_init. This means:
- Any call before `duckdb_rs_extension_api_init` panics with "DuckDB API not initialized"
- In `cargo test`, you cannot call any `duckdb_*` function — the table is never initialized
This is why quack-rs uses AggregateTestHarness for testing: it simulates the aggregate
lifecycle in pure Rust, with zero DuckDB API calls.
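The dispatch mechanism can be modeled in a few lines of pure Rust. This is an illustrative sketch of the idea, not the actual libduckdb-sys implementation:

```rust
use std::sync::atomic::{AtomicPtr, Ordering};

// One slot of the dispatch table; starts null, like the real table.
static QUERY_FN: AtomicPtr<()> = AtomicPtr::new(std::ptr::null_mut());

fn real_query() -> &'static str {
    "ok"
}

/// Illustrative wrapper: panics if the slot is still null, mirroring
/// the "DuckDB API not initialized" panic described above.
fn query() -> &'static str {
    let p = QUERY_FN.load(Ordering::Acquire);
    if p.is_null() {
        panic!("DuckDB API not initialized");
    }
    // SAFETY: init_table stored a pointer of exactly this fn type.
    let f: fn() -> &'static str = unsafe { std::mem::transmute(p) };
    f()
}

/// Illustrative init: the role duckdb_rs_extension_api_init plays for
/// every slot of the real table.
fn init_table() {
    let f: fn() -> &'static str = real_query;
    QUERY_FN.store(f as *mut (), Ordering::Release);
}

fn main() {
    // Calling query() before init_table() would panic; after it, the
    // call dispatches through the filled-in pointer.
    init_table();
    assert_eq!(query(), "ok");
}
```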
Dependency model
```mermaid
graph TD
    EXT["your-extension"]
    QR["quack-rs"]
    LDS["libduckdb-sys >=1.4.4, <2<br/>{loadable-extension}<br/>(headers only — no linked library)"]
    EXT --> QR
    EXT --> LDS
    QR --> LDS
```
The loadable-extension feature produces a shared library that does not statically link
DuckDB. Instead, it receives DuckDB's function pointers at load time. This is the correct
model for extensions: you run inside DuckDB's process, using its memory and threading.
Version support
libduckdb-sys = ">=1.4.4, <2" — the bounded range is intentional.
DuckDB 1.4.x and 1.5.x both expose C API version v1.2.0 (the version string embedded
in duckdb_rs_extension_api_init). quack-rs has been E2E tested against both releases.
Using a range rather than an exact pin means:
- Extension authors can choose their DuckDB target (pin to `=1.4.4` or `=1.5.0` in their own `Cargo.toml`) and resolve cleanly against `quack-rs`
- `quack-rs` itself doesn't force a DuckDB downgrade on users
The <2 upper bound is equally intentional: it prevents silent adoption of a future major
release that may introduce breaking C API changes. Upgrading beyond the 1.x band requires
an explicit quack-rs release that audits the new C API surface.
For your own extension's `Cargo.toml`: pin `libduckdb-sys` to the exact DuckDB version you build and test against (e.g., `=1.5.0`). Your extension binary will only load in the DuckDB version it was compiled for regardless — the range only matters for `quack-rs` itself as a library dependency.
Binary compatibility
Extension binaries are tied to a specific DuckDB version and platform. Key facts:
- An extension compiled for DuckDB 1.4.4 will not load in DuckDB 1.5.0
- DuckDB verifies binary compatibility at load time and refuses mismatched binaries
- Official DuckDB extensions are cryptographically signed; community extensions are not
- To load unsigned extensions: `SET allow_unsigned_extensions = true` (development only)
- The community extension CI provides automated cross-platform builds for each DuckDB release
The Entry Point
Every DuckDB extension must export a single C-callable symbol that DuckDB invokes at load time. quack-rs provides two ways to create it.
Option A: entry_point_v2! with Connection (recommended)
Added in v0.4.0.
The entry_point_v2! macro gives your closure a &Connection instead of a raw
duckdb_connection. The Connection type implements the Registrar trait, which
provides ergonomic methods for registering every function type:
```rust
use quack_rs::entry_point_v2;
use quack_rs::connection::{Connection, Registrar};
use quack_rs::error::ExtensionError;

unsafe fn register(con: &Connection) -> Result<(), ExtensionError> {
    unsafe {
        con.register_scalar(/* ScalarFunctionBuilder */)?;
        con.register_aggregate(/* AggregateFunctionBuilder */)?;
        con.register_table(/* TableFunctionBuilder */)?;
        con.register_cast(/* CastFunctionBuilder */)?;
        con.register_scalar_set(/* ScalarFunctionSetBuilder */)?;
        con.register_aggregate_set(/* AggregateFunctionSetBuilder */)?;
        con.register_sql_macro(/* SqlMacro */)?;
        con.register_replacement_scan(/* callback, data, destructor */);
        // con.register_copy_function(/* CopyFunctionBuilder */)?; // requires duckdb-1-5
    }
    Ok(())
}

entry_point_v2!(my_extension_init_c_api, |con| unsafe { register(con) });
```
This emits:
```rust
#[no_mangle]
pub unsafe extern "C" fn my_extension_init_c_api(
    info: duckdb_extension_info,
    access: *const duckdb_extension_access,
) -> bool {
    unsafe {
        quack_rs::entry_point::init_extension_v2(
            info,
            access,
            quack_rs::DUCKDB_API_VERSION,
            |con| unsafe { register(con) },
        )
    }
}
```
Pass the full symbol name to the macro. The symbol {name}_init_c_api must match the
name field in description.yml and the [lib] name in Cargo.toml.
Why Connection over raw duckdb_connection?
| Feature | entry_point! (raw) | entry_point_v2! (Connection) |
|---|---|---|
| Receives | duckdb_connection | &Connection |
| Registration | Call builders' .register(con) | Call con.register_*() |
| Type safety | Raw pointer | Wrapper with lifetime |
| Future-proofing | Tied to C pointer | Can evolve without breaking extensions |
Option B: The entry_point! macro
The original macro passes a raw duckdb_connection to your closure. It works
identically but requires you to pass the connection to each builder's .register():
```rust
use quack_rs::entry_point;
use quack_rs::error::ExtensionError;

fn register(con: libduckdb_sys::duckdb_connection) -> Result<(), ExtensionError> {
    unsafe {
        // register your functions here
        Ok(())
    }
}

entry_point!(my_extension_init_c_api, |con| register(con));
```
Option C: Manual entry point
If you need full control (e.g., multiple registration functions, conditional logic):
```rust
use quack_rs::entry_point::init_extension;
use libduckdb_sys::{duckdb_extension_info, duckdb_extension_access};

#[no_mangle]
pub unsafe extern "C" fn my_extension_init_c_api(
    info: duckdb_extension_info,
    access: *const duckdb_extension_access,
) -> bool {
    unsafe {
        init_extension(info, access, quack_rs::DUCKDB_API_VERSION, |con| {
            register_scalar_functions(con)?;
            register_aggregate_functions(con)?;
            register_sql_macros(con)?;
            Ok(())
        })
    }
}
```
What init_extension does
```mermaid
flowchart TD
    A["**1. duckdb_rs_extension_api_init**(info, access, version)<br/>Fills the global AtomicPtr dispatch table"]
    B["**2. access.get_database**(info)<br/>Returns the duckdb_database handle"]
    C["**3. duckdb_connect**(db, &mut con)<br/>Opens a connection for function registration"]
    D["**4. register**(con) ← your closure"]
    E["**5. duckdb_disconnect**(&mut con)<br/>Always runs, even if registration failed"]
    F{Error?}
    G["return **true**"]
    H["return **false**<br/>error reported via access.set_error"]
    A --> B --> C --> D --> E --> F
    F -->|no| G
    F -->|yes| H
    style G fill:#1c3b1c,stroke:#4a9e4a,color:#c8ecc8
    style H fill:#3b1c1c,stroke:#9e4a4a,color:#ecc8c8
```
Errors from step 4 are reported back to DuckDB via access.set_error and the function
returns false. DuckDB then surfaces the error message to the user.
The C API version constant
```rust
pub const DUCKDB_API_VERSION: &str = "v1.2.0";
```
Pitfall P2: This is the C API version, not the DuckDB release version. DuckDB 1.4.x, 1.5.0, and 1.5.1 all use C API version `v1.2.0`. Passing the wrong string causes the metadata script to fail or produce incorrect metadata. See Pitfall P2.
No panics in the entry point
init_extension never panics. All error paths use Result and ?. If your registration
closure returns Err, the error message is reported to DuckDB via access.set_error and
the extension fails to load gracefully.
Never use unwrap() or expect() in FFI callbacks.
See Pitfall L3.
Error Handling
quack-rs uses a single error type throughout: ExtensionError.
ExtensionError
```rust
use quack_rs::error::{ExtensionError, ExtResult};

// From a string literal
let e = ExtensionError::from("something went wrong");

// From a format string
let e = ExtensionError::new(format!("failed to register '{}': code {}", name, code));

// Wrapping another error
let e = ExtensionError::from_error(some_std_error);
```
ExtensionError implements:
- `std::error::Error`
- `Display`, `Debug`, `Clone`, `PartialEq`, `Eq`
- `From<&str>`, `From<String>`, `From<Box<dyn Error>>`
ExtResult<T>
A type alias for Result<T, ExtensionError>, used throughout the SDK:
```rust
pub type ExtResult<T> = Result<T, ExtensionError>;
```
Propagating errors with ?
In your registration function:
```rust
fn register(con: duckdb_connection) -> Result<(), ExtensionError> {
    unsafe {
        ScalarFunctionBuilder::new("my_fn")
            .param(TypeId::BigInt)
            .returns(TypeId::BigInt)
            .function(my_fn)
            .register(con)?; // ← ? propagates registration errors

        SqlMacro::scalar("my_macro", &["x"], "x + 1")?
            .register(con)?;

        Ok(())
    }
}
```
If any registration call fails, ? returns the error from register, which
init_extension then reports to DuckDB via access.set_error.
Error reporting to DuckDB
init_extension converts ExtensionError to a CString for the DuckDB error callback:
```rust
pub fn to_c_string(&self) -> CString {
    // Truncates at the first null byte if message contains one
    CString::new(self.message.as_bytes()).unwrap_or_else(...)
}
```
DuckDB surfaces this string to the user as the extension load error.
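The interior-nul fallback matters because `CString::new` fails on any message containing `\0`. A sketch of one way to implement the truncation, not quack-rs's exact code:

```rust
use std::ffi::CString;

/// Illustrative fallback: truncate at the first interior nul so a valid
/// C string can always be handed to DuckDB's error callback.
fn message_to_c_string(message: &str) -> CString {
    match CString::new(message.as_bytes()) {
        Ok(s) => s,
        Err(e) => {
            // nul_position() gives the index of the offending byte; the
            // prefix before it is guaranteed nul-free.
            let pos = e.nul_position();
            CString::new(&message.as_bytes()[..pos]).unwrap_or_default()
        }
    }
}

fn main() {
    assert_eq!(message_to_c_string("plain error").to_str().unwrap(), "plain error");
    assert_eq!(message_to_c_string("cut\0here").to_str().unwrap(), "cut");
}
```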
No panics, ever
The cardinal rule of DuckDB extension development:
Never `unwrap()`, `expect()`, or `panic!()` in any code path that DuckDB may call.
Rust panics that cross FFI boundaries are undefined behavior. With panic = "abort"
in the release profile, a panic terminates the process — which is safer than UB, but still
unacceptable in production.
Safe patterns
```rust
// ✅ Use Option methods
if let Some(s) = FfiState::<MyState>::with_state_mut(state_ptr) {
    s.count += 1;
}

// ✅ Use Result and ?
let value = some_fallible_call()?;

// ✅ Use unwrap_or / unwrap_or_else / map
let count = maybe_count.unwrap_or(0);

// ❌ Never in FFI callbacks
let s = FfiState::<MyState>::with_state_mut(state_ptr).unwrap(); // undefined behavior
```
In init_extension
init_extension wraps everything in match and reports errors via set_error — it can
never panic regardless of what your registration closure returns.
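As a defensive last resort — and only when unwinding is enabled, since with `panic = "abort"` there is nothing to catch — a panic can be contained before it reaches the C boundary. This is a general Rust pattern, not part of quack-rs's API; the `guard` helper below is purely illustrative:

```rust
use std::panic::{self, AssertUnwindSafe};

// Contain a potential panic and convert it into an error value instead of
// letting it unwind across the FFI boundary (which would be UB).
fn guard<F: FnOnce() -> Result<(), String>>(f: F) -> Result<(), String> {
    panic::catch_unwind(AssertUnwindSafe(f))
        .unwrap_or_else(|_| Err("registration panicked".to_string()))
}

fn main() {
    // Normal path: the closure's result passes through untouched.
    assert_eq!(guard(|| Ok(())), Ok(()));
    // Panicking path: the unwind is caught and becomes an Err value.
    assert_eq!(guard(|| panic!("boom")), Err("registration panicked".to_string()));
}
```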
Type System
quack-rs provides TypeId and LogicalType to bridge Rust types and DuckDB column types.
TypeId
TypeId is an ergonomic enum covering all DuckDB column types:
```rust
use quack_rs::types::TypeId;

TypeId::Boolean
TypeId::TinyInt    // i8
TypeId::SmallInt   // i16
TypeId::Integer    // i32
TypeId::BigInt     // i64
TypeId::UTinyInt   // u8
TypeId::USmallInt  // u16
TypeId::UInteger   // u32
TypeId::UBigInt    // u64
TypeId::HugeInt    // i128
TypeId::UHugeInt   // u128
TypeId::Float      // f32
TypeId::Double     // f64
TypeId::Timestamp
TypeId::TimestampTz
TypeId::TimestampS
TypeId::TimestampMs
TypeId::TimestampNs
TypeId::Date
TypeId::Time
TypeId::TimeTz
TypeId::Interval
TypeId::Varchar
TypeId::Blob
TypeId::Decimal
TypeId::Enum
TypeId::List
TypeId::Struct
TypeId::Map
TypeId::Uuid
TypeId::Union
TypeId::Bit
TypeId::Array
TypeId::TimeNs          // duckdb-1-5
TypeId::Any             // duckdb-1-5
TypeId::Varint          // duckdb-1-5
TypeId::SqlNull         // duckdb-1-5
TypeId::IntegerLiteral  // duckdb-1-5
TypeId::StringLiteral   // duckdb-1-5
```
TypeId is Copy, Clone, Debug, PartialEq, Eq, and Display.
SQL name
```rust
assert_eq!(TypeId::BigInt.sql_name(), "BIGINT");
assert_eq!(TypeId::Varchar.sql_name(), "VARCHAR");
assert_eq!(format!("{}", TypeId::Timestamp), "TIMESTAMP");
```
DuckDB constant
TypeId::to_duckdb_type() returns the DUCKDB_TYPE_* integer constant from libduckdb-sys.
You rarely need this directly — it's called internally by LogicalType::new.
Reverse conversion
TypeId::from_duckdb_type(raw) converts a raw DUCKDB_TYPE constant back into a TypeId.
Panics if the value does not match any known constant.
```rust
use quack_rs::types::TypeId;

let type_id = TypeId::from_duckdb_type(libduckdb_sys::DUCKDB_TYPE_DUCKDB_TYPE_BIGINT);
assert_eq!(type_id, TypeId::BigInt);
```
LogicalType
LogicalType is a RAII wrapper around DuckDB's duckdb_logical_type. It is used internally
by the function builders.
```rust
use quack_rs::types::{LogicalType, TypeId};

let lt = LogicalType::new(TypeId::Varchar);
// lt.as_raw() returns the duckdb_logical_type pointer
// Drop calls duckdb_destroy_logical_type automatically
```
> **Pitfall L7:** `duckdb_create_logical_type` allocates memory that must be freed with `duckdb_destroy_logical_type`. `LogicalType`'s `Drop` implementation does this automatically, preventing the memory leak that occurs when calling the DuckDB C API directly. See Pitfall L7.
You almost never need to create LogicalType directly. The function builders
(ScalarFunctionBuilder, AggregateFunctionBuilder) create and destroy them internally.
Constructors
| Constructor | Creates |
|---|---|
| `LogicalType::new(type_id)` | Simple type from a `TypeId` |
| `LogicalType::from_raw(ptr)` | Takes ownership of a raw `duckdb_logical_type` handle (unsafe) |
| `LogicalType::decimal(width, scale)` | `DECIMAL(width, scale)` |
| `LogicalType::list(element_type)` | `LIST<element_type>` from a `TypeId` |
| `LogicalType::list_from_logical(element)` | `LIST<element>` from an existing `LogicalType` |
| `LogicalType::map(key, value)` | `MAP<key, value>` from `TypeId`s |
| `LogicalType::map_from_logical(key, value)` | `MAP<key, value>` from existing `LogicalType`s |
| `LogicalType::struct_type(fields)` | `STRUCT` from `&[(&str, TypeId)]` |
| `LogicalType::struct_type_from_logical(fields)` | `STRUCT` from `&[(&str, LogicalType)]` |
| `LogicalType::union_type(members)` | `UNION` from `&[(&str, TypeId)]` |
| `LogicalType::union_type_from_logical(members)` | `UNION` from `&[(&str, LogicalType)]` |
| `LogicalType::enum_type(members)` | `ENUM` from `&[&str]` |
| `LogicalType::array(element_type, size)` | `ARRAY<element_type>[size]` from a `TypeId` |
| `LogicalType::array_from_logical(element, size)` | `ARRAY<element>[size]` from an existing `LogicalType` |
Introspection methods
All introspection methods are unsafe (require a valid DuckDB runtime handle).
| Method | Returns | Applicable to |
|---|---|---|
| `get_type_id()` | `TypeId` | Any |
| `get_alias()` | `Option<String>` | Any |
| `set_alias(alias)` | `()` | Any |
| `decimal_width()` | `u8` | DECIMAL |
| `decimal_scale()` | `u8` | DECIMAL |
| `decimal_internal_type()` | `TypeId` | DECIMAL |
| `enum_internal_type()` | `TypeId` | ENUM |
| `enum_dictionary_size()` | `u32` | ENUM |
| `enum_dictionary_value(index)` | `String` | ENUM |
| `list_child_type()` | `LogicalType` | LIST |
| `map_key_type()` | `LogicalType` | MAP |
| `map_value_type()` | `LogicalType` | MAP |
| `struct_child_count()` | `u64` | STRUCT |
| `struct_child_name(index)` | `String` | STRUCT |
| `struct_child_type(index)` | `LogicalType` | STRUCT |
| `union_member_count()` | `u64` | UNION |
| `union_member_name(index)` | `String` | UNION |
| `union_member_type(index)` | `LogicalType` | UNION |
| `array_size()` | `u64` | ARRAY |
| `array_child_type()` | `LogicalType` | ARRAY |
Rust type ↔ DuckDB type mapping
When reading from or writing to vectors, use the corresponding VectorReader/VectorWriter
method:
| DuckDB type | TypeId | Reader method | Writer method |
|---|---|---|---|
| BOOLEAN | Boolean | `read_bool` | `write_bool` |
| TINYINT | TinyInt | `read_i8` | `write_i8` |
| SMALLINT | SmallInt | `read_i16` | `write_i16` |
| INTEGER | Integer | `read_i32` | `write_i32` |
| BIGINT | BigInt | `read_i64` | `write_i64` |
| UTINYINT | UTinyInt | `read_u8` | `write_u8` |
| USMALLINT | USmallInt | `read_u16` | `write_u16` |
| UINTEGER | UInteger | `read_u32` | `write_u32` |
| UBIGINT | UBigInt | `read_u64` | `write_u64` |
| FLOAT | Float | `read_f32` | `write_f32` |
| DOUBLE | Double | `read_f64` | `write_f64` |
| VARCHAR | Varchar | `read_str` | `write_varchar` |
| INTERVAL | Interval | `read_interval` | `write_interval` |
NULLs are handled separately — see NULL Handling & Strings.
Scalar Functions
Scalar functions transform a batch of input rows into a corresponding batch of output values.
They are the most common DuckDB extension pattern — equivalent to SQL's built-in functions
like length(), upper(), or sin().
Function signature
DuckDB calls your scalar function once per data chunk (not once per row). The signature is:
```rust
unsafe extern "C" fn my_fn(
    info: duckdb_function_info, // function metadata (rarely needed)
    input: duckdb_data_chunk,   // input data — one or more columns
    output: duckdb_vector,      // output vector — one value per input row
)
```
Inside the function, you:
- Create a `VectorReader` for each input column
- Create a `VectorWriter` for the output
- Loop over rows, checking for NULLs and transforming values
Registration
```rust
use quack_rs::scalar::ScalarFunctionBuilder;
use quack_rs::types::TypeId;

unsafe fn register(con: duckdb_connection) -> Result<(), ExtensionError> {
    unsafe {
        ScalarFunctionBuilder::new("my_fn")
            .param(TypeId::BigInt)   // first parameter type
            .param(TypeId::BigInt)   // second parameter type (if any)
            .returns(TypeId::BigInt) // return type
            .function(my_fn)         // callback
            .register(con)?;
    }
    Ok(())
}
```
The builder validates that returns and function are set before calling
duckdb_register_scalar_function. If DuckDB reports failure, register returns Err.
Validated registration
For user-configurable function names (e.g., from a config file), use try_new:
```rust
ScalarFunctionBuilder::try_new(name)? // validates name before building
    .param(TypeId::Varchar)
    .returns(TypeId::Varchar)
    .function(my_fn)
    .register(con)?;
```
`try_new` validates the name against DuckDB naming rules: `[a-z_][a-z0-9_]*`, max 256 characters. `new` panics on invalid names (suitable for compile-time-known names only).
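The documented rule can be replayed in pure Rust if you want to pre-validate user-supplied names before calling `try_new`. The `is_valid_function_name` helper below is ours — quack-rs performs this check internally:

```rust
// Sketch of the documented naming rule: [a-z_][a-z0-9_]*, max 256 characters.
fn is_valid_function_name(name: &str) -> bool {
    if name.is_empty() || name.len() > 256 {
        return false;
    }
    let mut chars = name.chars();
    // First character: lowercase ASCII letter or underscore.
    match chars.next() {
        Some(c) if c.is_ascii_lowercase() || c == '_' => {}
        _ => return false,
    }
    // Remaining characters: lowercase ASCII letters, digits, or underscores.
    chars.all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '_')
}

fn main() {
    assert!(is_valid_function_name("my_fn2"));
    assert!(is_valid_function_name("_private"));
    assert!(!is_valid_function_name("2fast")); // must not start with a digit
    assert!(!is_valid_function_name("MyFn"));  // uppercase not allowed
    assert!(!is_valid_function_name(""));      // empty name rejected
}
```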
Complete example: double_it(BIGINT) → BIGINT
```rust
use quack_rs::vector::{VectorReader, VectorWriter};
use libduckdb_sys::{duckdb_function_info, duckdb_data_chunk, duckdb_vector};

unsafe extern "C" fn double_it(
    _info: duckdb_function_info,
    input: duckdb_data_chunk,
    output: duckdb_vector,
) {
    // SAFETY: DuckDB provides valid chunk and vector pointers.
    let reader = unsafe { VectorReader::new(input, 0) }; // column 0
    let mut writer = unsafe { VectorWriter::new(output) };

    let row_count = reader.row_count();
    for row in 0..row_count {
        if unsafe { !reader.is_valid(row) } {
            // NULL input → NULL output
            // SAFETY: row < row_count, writer is valid.
            unsafe { writer.set_null(row) };
            continue;
        }
        let value = unsafe { reader.read_i64(row) };
        unsafe { writer.write_i64(row, value * 2) };
    }
}
```
Multi-parameter example: add(BIGINT, BIGINT) → BIGINT
```rust
unsafe extern "C" fn add(
    _info: duckdb_function_info,
    input: duckdb_data_chunk,
    output: duckdb_vector,
) {
    let col0 = unsafe { VectorReader::new(input, 0) }; // first param
    let col1 = unsafe { VectorReader::new(input, 1) }; // second param
    let mut writer = unsafe { VectorWriter::new(output) };

    for row in 0..col0.row_count() {
        if unsafe { !col0.is_valid(row) || !col1.is_valid(row) } {
            unsafe { writer.set_null(row) };
            continue;
        }
        let a = unsafe { col0.read_i64(row) };
        let b = unsafe { col1.read_i64(row) };
        unsafe { writer.write_i64(row, a + b) };
    }
}
```
VARCHAR example: shout(VARCHAR) → VARCHAR
```rust
unsafe extern "C" fn shout(
    _info: duckdb_function_info,
    input: duckdb_data_chunk,
    output: duckdb_vector,
) {
    let reader = unsafe { VectorReader::new(input, 0) };
    let mut writer = unsafe { VectorWriter::new(output) };

    for row in 0..reader.row_count() {
        if unsafe { !reader.is_valid(row) } {
            unsafe { writer.set_null(row) };
            continue;
        }
        let s = unsafe { reader.read_str(row) };
        let upper = s.to_uppercase();
        unsafe { writer.write_varchar(row, &upper) };
    }
}
```
Overloading with Function Sets
If your function accepts different parameter types or arities, use ScalarFunctionSetBuilder
to register multiple overloads under a single name:
```rust
use quack_rs::scalar::{ScalarFunctionSetBuilder, ScalarOverloadBuilder};
use quack_rs::types::TypeId;

unsafe fn register(con: duckdb_connection) -> Result<(), ExtensionError> {
    unsafe {
        ScalarFunctionSetBuilder::new("my_add")
            .overload(
                ScalarOverloadBuilder::new()
                    .param(TypeId::Integer).param(TypeId::Integer)
                    .returns(TypeId::Integer)
                    .function(add_ints),
            )
            .overload(
                ScalarOverloadBuilder::new()
                    .param(TypeId::Double).param(TypeId::Double)
                    .returns(TypeId::Double)
                    .function(add_doubles),
            )
            .register(con)?;
    }
    Ok(())
}
```
Like AggregateFunctionSetBuilder, this builder calls duckdb_scalar_function_set_name
on every individual function before adding it to the set
(Pitfall L6).
NULL Handling
By default, DuckDB returns NULL if any argument is NULL — your function callback is
never called for those rows. If you need to handle NULLs explicitly (e.g., for a
COALESCE-like function), set SpecialNullHandling:
```rust
use quack_rs::types::NullHandling;

ScalarFunctionBuilder::new("coalesce_custom")
    .param(TypeId::BigInt)
    .returns(TypeId::BigInt)
    .null_handling(NullHandling::SpecialNullHandling)
    .function(my_coalesce_fn)
    .register(con)?;
```
With SpecialNullHandling, your callback must check VectorReader::is_valid(row)
and handle NULLs yourself.
Complex parameter and return types
For scalar functions that accept or return parameterized types like LIST(BIGINT),
use param_logical and returns_logical:
```rust
use quack_rs::scalar::ScalarFunctionBuilder;
use quack_rs::types::{LogicalType, TypeId};

ScalarFunctionBuilder::new("flatten_list")
    .param_logical(LogicalType::list(TypeId::BigInt)) // LIST(BIGINT) input
    .returns(TypeId::BigInt)
    .function(flatten_list_fn)
    .register(con)?;
```
These methods are also available on ScalarOverloadBuilder for function sets:
```rust
ScalarOverloadBuilder::new()
    .param(TypeId::Varchar)
    .returns_logical(LogicalType::list(TypeId::Timestamp)) // LIST(TIMESTAMP) output
    .function(my_fn)
```
Key points
- `VectorReader::new(input, column_index)` — the column index is zero-based
- Always check `is_valid(row)` before reading — skipping this reads garbage for NULL rows
- `set_null` must be called for NULL outputs — it calls `ensure_validity_writable` automatically (Pitfall L4)
- `read_bool` returns `bool` — handles DuckDB's non-0/1 boolean bytes correctly (Pitfall L5)
- `read_str` handles both inline and pointer string formats automatically (Pitfall P7)
DuckDB 1.5.0 Additions (duckdb-1-5)
The following ScalarFunctionBuilder methods are available when the duckdb-1-5
feature is enabled:
varargs(type_id: TypeId)
Declares that the function accepts a variable number of trailing arguments, all
of the given TypeId. Maps to duckdb_scalar_function_set_varargs.
```rust
ScalarFunctionBuilder::new("concat_all")
    .varargs(TypeId::Varchar)
    .returns(TypeId::Varchar)
    .function(concat_all_fn)
    .register(con)?;
```
varargs_logical(logical_type: LogicalType)
Like varargs, but accepts a LogicalType for parameterized variadic arguments.
Maps to duckdb_scalar_function_set_varargs.
```rust
ScalarFunctionBuilder::new("merge_lists")
    .varargs_logical(LogicalType::list(TypeId::BigInt))
    .returns_logical(LogicalType::list(TypeId::BigInt))
    .function(merge_lists_fn)
    .register(con)?;
```
volatile()
Marks the function as volatile, meaning DuckDB will not cache or reuse its
results across calls with the same arguments. Maps to
duckdb_scalar_function_set_volatile.
```rust
ScalarFunctionBuilder::new("random_int")
    .returns(TypeId::Integer)
    .volatile()
    .function(random_int_fn)
    .register(con)?;
```
bind(bind_fn)
Sets a custom bind callback that runs at plan time. Use this to inspect argument
types and set the return type dynamically. Maps to
duckdb_scalar_function_set_bind.
```rust
ScalarFunctionBuilder::new("dynamic_return")
    .varargs(TypeId::Varchar)
    .returns(TypeId::Varchar) // default; overridden in bind
    .bind(my_bind_fn)
    .function(dynamic_return_fn)
    .register(con)?;
```
init(init_fn)
Sets a local-init callback invoked once per thread before execution begins. Use
this to allocate per-thread state. Maps to
duckdb_scalar_function_set_init.
```rust
ScalarFunctionBuilder::new("stateful_fn")
    .param(TypeId::BigInt)
    .returns(TypeId::BigInt)
    .init(my_init_fn)
    .function(stateful_fn)
    .register(con)?;
```
Extra info
Attach arbitrary data to a scalar function using extra_info. This is useful for
parameterising the function behaviour (e.g., a locale or configuration struct).
The method is available on both ScalarFunctionBuilder and ScalarOverloadBuilder.
```rust
use std::os::raw::c_void;

let config = Box::into_raw(Box::new("en_US".to_string())).cast::<c_void>();

unsafe {
    ScalarFunctionBuilder::new("locale_upper")
        .param(TypeId::Varchar)
        .returns(TypeId::Varchar)
        .extra_info(config, Some(my_destroy))
        .function(locale_upper_fn)
        .register(con)?;
}
```
Inside the callback, retrieve the extra info with ScalarFunctionInfo::get_extra_info().
ScalarFunctionInfo
ScalarFunctionInfo wraps the duckdb_function_info handle provided to a scalar
function callback. It exposes:
- `get_extra_info() -> *mut c_void` — retrieves the extra-info pointer set during registration
- `set_error(message)` — reports an error, causing DuckDB to abort the query
```rust
use quack_rs::scalar::ScalarFunctionInfo;

unsafe extern "C" fn my_fn(
    info: duckdb_function_info,
    input: duckdb_data_chunk,
    output: duckdb_vector,
) {
    let info = unsafe { ScalarFunctionInfo::new(info) };
    let extra = unsafe { info.get_extra_info() };
    // ... use extra info, or report errors via info.set_error("...") ...
}
```
With the duckdb-1-5 feature, ScalarFunctionInfo also provides:
- `get_bind_data() -> *mut c_void` — retrieves bind data set during the bind callback
- `get_state() -> *mut c_void` — retrieves per-thread state set during the init callback
ScalarBindInfo (duckdb-1-5)
ScalarBindInfo wraps the duckdb_bind_info handle provided to a scalar function
bind callback. It exposes:
- `argument_count() -> u64` — number of arguments
- `get_argument(index) -> duckdb_expression` — argument expression at `index`
- `get_extra_info() -> *mut c_void` — the extra-info pointer from registration
- `set_bind_data(data, destroy)` — stores per-query data retrievable during execution
- `set_error(message)` — reports an error
- `get_client_context() -> ClientContext` — access to the connection's catalog and config
ScalarInitInfo (duckdb-1-5)
ScalarInitInfo wraps the duckdb_init_info handle provided to a scalar function
init callback. It exposes:
- `get_extra_info() -> *mut c_void` — the extra-info pointer from registration
- `get_bind_data() -> *mut c_void` — the bind data from the bind callback
- `set_state(state, destroy)` — stores per-thread state retrievable during execution
- `set_error(message)` — reports an error
- `get_client_context() -> ClientContext` — access to the connection's catalog and config
Aggregate Functions
Aggregate functions reduce multiple rows into a single value per group — like SUM(),
COUNT(), or AVG(). DuckDB supports parallel aggregation, which introduces a combine
step that merges partial results from parallel workers.
The aggregate lifecycle
```mermaid
flowchart TD
    REG["**Registration**<br/>AggregateFunctionBuilder<br/>→ duckdb_register_aggregate_function"]
    SIZE["**state_size**()<br/>How many bytes to allocate per group?"]
    INIT["**state_init**(state)<br/>Initialize a fresh state"]
    UPDATE["**update**(chunk, states[])<br/>Process one input batch"]
    COMBINE["**combine**(src[], tgt[], count)<br/>Merge partial results from parallel workers<br/>⚠️ Pitfall L1: target starts fresh — copy ALL config fields"]
    FINAL["**finalize**(states[], out, count)<br/>Write results to output vector"]
    DESTROY["**state_destroy**(states[], count)<br/>Free memory"]

    REG --> SIZE --> INIT --> UPDATE --> COMBINE --> FINAL --> DESTROY

    style COMBINE fill:#fff3cd,stroke:#e6ac00,color:#333
```
DuckDB may call combine multiple times as it merges results from parallel segments.
Target states in combine are always fresh (zero-initialized via state_init).
Registration
```rust
use quack_rs::aggregate::AggregateFunctionBuilder;
use quack_rs::types::TypeId;

unsafe fn register(con: duckdb_connection) -> Result<(), ExtensionError> {
    unsafe {
        AggregateFunctionBuilder::new("my_agg")
            .param(TypeId::Varchar)  // input type(s)
            .returns(TypeId::BigInt) // output type
            .state_size(state_size)
            .init(state_init)
            .update(update)
            .combine(combine)
            .finalize(finalize)
            .destructor(state_destroy)
            .register(con)?;
    }
    Ok(())
}
```
The five core callbacks (state_size, init, update, combine, finalize) must be
set before register — the builder will return an error if any are missing. The
destructor callback is optional but strongly recommended when your state allocates
heap memory (e.g., when using FfiState<T>).
Callback signatures
state_size
```rust
unsafe extern "C" fn state_size(_info: duckdb_function_info) -> idx_t {
    FfiState::<MyState>::size_callback(_info)
}
```
Returns the size DuckDB must allocate per group. This is always size_of::<*mut MyState>()
— a pointer, since FfiState<T> stores a Box<T> pointer in the allocated slot.
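This can be checked in pure Rust: the per-group slot holds only a pointer, no matter how large the state type is. `MyState` here is a stand-in for your own state type:

```rust
use std::mem::size_of;

// Illustration: the DuckDB-allocated slot is pointer-sized because FfiState<T>
// stores a Box<T> pointer there; the state itself lives on the Rust heap.
#[derive(Default)]
struct MyState {
    buffer: Vec<u8>,
    count: i64,
}

fn main() {
    // The slot size is exactly one pointer...
    assert_eq!(size_of::<*mut MyState>(), size_of::<usize>());
    // ...even though the state struct itself is larger and heap-allocated.
    assert!(size_of::<MyState>() > size_of::<*mut MyState>());
}
```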
state_init
```rust
unsafe extern "C" fn state_init(info: duckdb_function_info, state: duckdb_aggregate_state) {
    unsafe { FfiState::<MyState>::init_callback(info, state) };
}
```
Allocates a Box<MyState> (using MyState::default()) and writes its raw pointer into
the DuckDB-allocated state slot.
update
```rust
unsafe extern "C" fn update(
    _info: duckdb_function_info,
    input: duckdb_data_chunk,
    states: *mut duckdb_aggregate_state,
) {
    let reader = unsafe { VectorReader::new(input, 0) };
    let row_count = reader.row_count();

    for row in 0..row_count {
        if unsafe { !reader.is_valid(row) } {
            continue;
        }
        let value = unsafe { reader.read_i64(row) };
        let state_ptr = unsafe { *states.add(row) };
        if let Some(st) = unsafe { FfiState::<MyState>::with_state_mut(state_ptr) } {
            st.accumulate(value);
        }
    }
}
```
states[i] corresponds to chunk row i. Each state belongs to one group.
combine
```rust
unsafe extern "C" fn combine(
    _info: duckdb_function_info,
    source: *mut duckdb_aggregate_state,
    target: *mut duckdb_aggregate_state,
    count: idx_t,
) {
    for i in 0..count as usize {
        let src = unsafe { FfiState::<MyState>::with_state(*source.add(i)) };
        let tgt = unsafe { FfiState::<MyState>::with_state_mut(*target.add(i)) };
        if let (Some(s), Some(t)) = (src, tgt) {
            // ⚠️ MUST copy ALL fields — see Pitfall L1
            t.config_field = s.config_field; // configuration
            t.accumulator += s.accumulator;  // data
        }
    }
}
```
> **Pitfall L1 — critical:** Target states are fresh `T::default()` values. You must copy every field, including configuration fields set during `update`. Forgetting even one config field produces silently wrong results. See Pitfall L1.
finalize
```rust
unsafe extern "C" fn finalize(
    _info: duckdb_function_info,
    source: *mut duckdb_aggregate_state,
    result: duckdb_vector,
    count: idx_t,
    offset: idx_t,
) {
    let mut writer = unsafe { VectorWriter::new(result) };
    for i in 0..count as usize {
        let state_ptr = unsafe { *source.add(i) };
        match unsafe { FfiState::<MyState>::with_state(state_ptr) } {
            Some(st) => unsafe { writer.write_i64(offset as usize + i, st.result()) },
            None => unsafe { writer.set_null(offset as usize + i) },
        }
    }
}
```
The offset parameter is non-zero when DuckDB is writing into a portion of a larger vector.
Always add it to your index.
state_destroy
```rust
unsafe extern "C" fn state_destroy(states: *mut duckdb_aggregate_state, count: idx_t) {
    unsafe { FfiState::<MyState>::destroy_callback(states, count) };
}
```
destroy_callback calls Box::from_raw for each state and then nulls the pointer,
preventing double-free. See Pitfall L2.
Complex parameter and return types
For functions that accept or return parameterized types like LIST(BIGINT),
MAP(VARCHAR, INTEGER), or STRUCT(...), use param_logical and
returns_logical instead of param and returns:
```rust
use quack_rs::aggregate::AggregateFunctionBuilder;
use quack_rs::types::{LogicalType, TypeId};

unsafe fn register(con: duckdb_connection) -> Result<(), ExtensionError> {
    unsafe {
        AggregateFunctionBuilder::new("retention")
            .param(TypeId::Boolean)
            .param(TypeId::Boolean)
            .returns_logical(LogicalType::list(TypeId::Boolean)) // LIST(BOOLEAN)
            .state_size(state_size)
            .init(state_init)
            .update(update)
            .combine(combine)
            .finalize(finalize)
            .destructor(state_destroy)
            .register(con)?;
    }
    Ok(())
}
```
param_logical and param can be interleaved — the parameter position is
determined by the total number of calls made so far:
```rust
AggregateFunctionBuilder::new("my_func")
    .param(TypeId::Varchar)                           // position 0: VARCHAR
    .param_logical(LogicalType::list(TypeId::BigInt)) // position 1: LIST(BIGINT)
    .param(TypeId::Integer)                           // position 2: INTEGER
    .returns(TypeId::BigInt)
    // ...
```
If both returns and returns_logical are called, the logical type takes precedence.
Extra info
Attach arbitrary data to an aggregate function using extra_info. This is useful
for parameterising the function behaviour (e.g., passing configuration):
```rust
use std::os::raw::c_void;

let config = Box::into_raw(Box::new(42u64)).cast::<c_void>();

unsafe {
    AggregateFunctionBuilder::new("my_agg")
        .param(TypeId::BigInt)
        .returns(TypeId::BigInt)
        .extra_info(config, Some(my_destroy))
        .state_size(state_size)
        .init(state_init)
        .update(update)
        .combine(combine)
        .finalize(finalize)
        .destructor(state_destroy)
        .register(con)?;
}
```
Inside callbacks, retrieve the extra info with AggregateFunctionInfo::get_extra_info().
AggregateFunctionInfo
AggregateFunctionInfo wraps the duckdb_function_info handle provided to
aggregate function callbacks (update, combine, finalize, etc.). It exposes:
- `get_extra_info() -> *mut c_void` — retrieves the extra-info pointer set during registration
- `set_error(message)` — reports an error, causing DuckDB to abort the query
```rust
use quack_rs::aggregate::AggregateFunctionInfo;

unsafe extern "C" fn update(
    info: duckdb_function_info,
    input: duckdb_data_chunk,
    states: *mut duckdb_aggregate_state,
) {
    let info = unsafe { AggregateFunctionInfo::new(info) };
    let extra = unsafe { info.get_extra_info() };
    // ... use extra info, or report errors via info.set_error("...") ...
}
```
Next steps
- State Management — `FfiState<T>`, `AggregateState`, and lifecycle details
- Overloading with Function Sets — register multiple signatures under one name
State Management
FfiState<T> manages the lifecycle of aggregate state — allocation, initialization, access,
and destruction — so you never write raw pointer code for state management.
AggregateState trait
Any type that is Default + Send + 'static can be used as aggregate state by implementing
the AggregateState marker trait:
```rust
use quack_rs::aggregate::AggregateState;

#[derive(Default, Debug)]
struct MyState {
    config: usize, // set in update, must be propagated in combine
    total: i64,    // accumulated data
}

impl AggregateState for MyState {}
```
AggregateState has no required methods. The Default bound is used in state_init to
create fresh states.
FfiState<T>
FfiState<T> is a #[repr(C)] struct containing a single raw pointer:
```rust
#[repr(C)]
pub struct FfiState<T> {
    inner: *mut T,
}
```
This matches DuckDB's expectation: DuckDB allocates state_size() bytes per group,
and your state lives in a Box<T> heap allocation whose pointer is stored in that space.
Memory layout
DuckDB-allocated slot (state_size bytes = sizeof(*mut T)):
[ inner: *mut T ] ──→ Box<T> (on the Rust heap)
Lifecycle callbacks
```rust
// state_size: DuckDB calls this once to know how many bytes to allocate per group
FfiState::<MyState>::size_callback(_info)
// Returns: size_of::<*mut MyState>()

// state_init: DuckDB calls this once per group after allocating the slot
FfiState::<MyState>::init_callback(info, state)
// Effect: writes Box::into_raw(Box::new(MyState::default())) into the slot

// state_destroy: DuckDB calls this after finalize for every group
FfiState::<MyState>::destroy_callback(states, count)
// Effect: for each state: drop(Box::from_raw(inner)); inner = null
```
Accessing state in callbacks
```rust
// Immutable access (in finalize, combine source):
if let Some(st) = FfiState::<MyState>::with_state(state_ptr) {
    let value = st.total;
}

// Mutable access (in update, combine target):
if let Some(st) = FfiState::<MyState>::with_state_mut(state_ptr) {
    st.total += delta;
}
```
Both methods return Option<&T> / Option<&mut T>. They return None if inner is
null (which happens after destroy_callback or if initialization failed). Using Option
rather than panicking on null is what keeps the extension panic-free.
The double-free problem — solved
Without quack-rs, a naive destructor looks like:
```rust
// ❌ Naive — causes double-free if DuckDB calls destroy twice
unsafe extern "C" fn destroy(states: *mut duckdb_aggregate_state, count: idx_t) {
    for i in 0..count as usize {
        let ffi = &mut *(*states.add(i) as *mut FfiState<MyState>);
        drop(Box::from_raw(ffi.inner)); // inner is now dangling — crash on second call
    }
}
```
FfiState::destroy_callback does:
```rust
// After drop(Box::from_raw(ffi.inner)):
ffi.inner = std::ptr::null_mut(); // ← prevents double-free
```
If DuckDB calls destroy again, with_state returns None and the loop body is a no-op.
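The null-out-after-drop pattern can be modelled in pure Rust. The `Slot` type below is a toy stand-in that mirrors what `FfiState::destroy_callback` does, not the SDK's actual code:

```rust
// Toy model of the null-out-after-drop pattern that prevents double-free.
struct Slot {
    inner: *mut i64,
}

impl Slot {
    fn new(value: i64) -> Self {
        Slot { inner: Box::into_raw(Box::new(value)) }
    }

    /// Frees the boxed value, then nulls the pointer so a second call is a no-op.
    /// Returns true if this call actually freed anything.
    fn destroy(&mut self) -> bool {
        if self.inner.is_null() {
            return false; // already destroyed — safe no-op, not a double-free
        }
        unsafe { drop(Box::from_raw(self.inner)) };
        self.inner = std::ptr::null_mut();
        true
    }
}

fn main() {
    let mut slot = Slot::new(42);
    assert!(slot.destroy());  // first destroy frees the box
    assert!(!slot.destroy()); // second destroy is a harmless no-op
}
```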
Testing state logic without DuckDB
AggregateTestHarness<S> simulates the DuckDB aggregate lifecycle in pure Rust:
```rust
use quack_rs::testing::AggregateTestHarness;

#[test]
fn combine_propagates_config() {
    let mut source = AggregateTestHarness::<MyState>::new();
    source.update(|s| {
        s.config = 5; // config field set during update
        s.total += 100;
    });

    let mut target = AggregateTestHarness::<MyState>::new();
    target.combine(&source, |src, tgt| {
        tgt.config = src.config; // must propagate config — Pitfall L1
        tgt.total += src.total;
    });

    let result = target.finalize();
    assert_eq!(result.config, 5, "config must be propagated in combine");
    assert_eq!(result.total, 100);
}
```
See the Testing Guide for the full test strategy.
Overloading with Function Sets
DuckDB supports multiple signatures for the same function name via function sets.
This is how you implement variadic aggregates like retention(c1, c2, ..., c32).
Note: For scalar function overloads, see
ScalarFunctionSetBuilder.
When to use function sets
Use AggregateFunctionSetBuilder when you need:
- Multiple type signatures for the same function name (e.g., `my_agg(INT)` and `my_agg(BIGINT)`)
- Variadic arity under one name (e.g., `retention(2 columns)`, `retention(3 columns)`, ...)
For a single signature, use AggregateFunctionBuilder directly.
Registration
```rust
use quack_rs::aggregate::AggregateFunctionSetBuilder;
use quack_rs::types::TypeId;

unsafe fn register(con: duckdb_connection) -> Result<(), ExtensionError> {
    unsafe {
        AggregateFunctionSetBuilder::new("retention")
            .returns(TypeId::Varchar)
            .overloads(2..=3, |n, builder| {
                // Each overload gets `n` BOOLEAN parameters
                let b = (0..n).fold(builder, |b, _| b.param(TypeId::Boolean));
                b.state_size(state_size)
                    .init(state_init)
                    .update(update)
                    .combine(combine)
                    .finalize(finalize)
                    .destructor(state_destroy)
            })
            .register(con)?;
    }
    Ok(())
}
```
The overloads method accepts a RangeInclusive<usize> and a closure that
receives the arity n and a fresh OverloadBuilder. The builder sets the
function name on each individual member internally.
The silent name bug — solved
> **Pitfall L6:** When using a function set, the name must be set on each individual `duckdb_aggregate_function` via `duckdb_aggregate_function_set_name`, not just on the set. If any member lacks a name, it is silently not registered — no error is returned.
>
> This is completely undocumented. It was discovered by reading DuckDB's C++ test code at `test/api/capi/test_capi_aggregate_functions.cpp`. In `duckdb-behavioral`, 6 of 7 functions failed to register silently due to this bug.
AggregateFunctionSetBuilder enforces that each member has its name set internally
when the overloads closure builds each function.
See Pitfall L6.
Complex return types
If all overloads share a complex return type, use returns_logical on the set builder:
```rust
use quack_rs::aggregate::AggregateFunctionSetBuilder;
use quack_rs::types::{LogicalType, TypeId};

AggregateFunctionSetBuilder::new("retention")
    .returns_logical(LogicalType::list(TypeId::Boolean)) // LIST(BOOLEAN) for all overloads
    .overloads(2..=32, |n, builder| {
        (0..n).fold(builder, |b, _| b.param(TypeId::Boolean))
            .state_size(state_size)
            .init(state_init)
            .update(update)
            .combine(combine)
            .finalize(finalize)
            .destructor(destroy)
    })
    .register(con)?;
```
Individual overloads can also use param_logical for complex parameter types:
```rust
.overloads(2..=8, |n, builder| {
    builder
        .param(TypeId::Interval)
        .param_logical(LogicalType::list(TypeId::Timestamp)) // LIST(TIMESTAMP) parameter
        // ...
})
```
Why not varargs?
DuckDB's C API does not provide duckdb_aggregate_function_set_varargs. For true variadic
aggregates, you must register N overloads — one for each supported arity. Function sets make
this tractable.
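The fold pattern the `overloads` closure uses to add `n` identical parameters can be illustrated with a toy builder. `MockBuilder` is ours, standing in for the real overload builder:

```rust
// Toy builder demonstrating arity expansion via fold — the same shape as
// (0..n).fold(builder, |b, _| b.param(TypeId::Boolean)) in the real closure.
#[derive(Default)]
struct MockBuilder {
    params: Vec<&'static str>,
}

impl MockBuilder {
    // Consuming-and-returning signature, like the real builder's param().
    fn param(mut self, ty: &'static str) -> Self {
        self.params.push(ty);
        self
    }
}

fn build_arity(n: usize) -> MockBuilder {
    (0..n).fold(MockBuilder::default(), |b, _| b.param("BOOLEAN"))
}

fn main() {
    assert_eq!(build_arity(2).params, vec!["BOOLEAN", "BOOLEAN"]);
    assert_eq!(build_arity(5).params.len(), 5);
}
```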
> **Note:** As of DuckDB 1.5.0, scalar functions support varargs directly via `ScalarFunctionBuilder::varargs()` (requires the `duckdb-1-5` feature). This limitation still applies to aggregate functions, which have no varargs counterpart in the C API.
ADR-002 in the architecture docs explains this design decision in detail.
Table Functions
Table functions implement the SELECT * FROM my_function(args) pattern — they
return a result set rather than a scalar value. DuckDB table functions have three
lifecycle callbacks: bind, init, and scan.
quack-rs provides TableFunctionBuilder plus the helper types BindInfo,
InitInfo, FunctionInfo, FfiBindData<T>, FfiInitData<T>, and
FfiLocalInitData<T> to eliminate the raw FFI boilerplate.
Lifecycle
| Phase | Callback | Called when | Typical work |
|---|---|---|---|
| bind | bind_fn | Query is planned | Extract parameters; register output columns; store config in bind data |
| init | init_fn | Execution starts | Allocate per-scan state (cursor, row index, etc.) |
| scan | scan_fn | Each output batch | Fill duckdb_data_chunk with rows; call duckdb_data_chunk_set_size |
The scan callback is called repeatedly until it writes 0 rows in a batch, signalling end-of-results.
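That driver loop can be sketched in plain Rust — a model of the control flow only, not the FFI API. `VECTOR_SIZE` mirrors the 2048-row cap used in the generate_series_ext example:

```rust
/// Hypothetical model of DuckDB's scan loop: the engine calls the scan
/// callback repeatedly; a batch of 0 rows signals end-of-results.
const VECTOR_SIZE: i64 = 2048;

struct ScanState {
    pos: i64,
}

/// One scan call: returns how many rows were "written" this batch.
fn scan_batch(total: i64, state: &mut ScanState) -> usize {
    let remaining = (total - state.pos).max(0);
    let batch = remaining.min(VECTOR_SIZE) as usize;
    state.pos += batch as i64;
    batch
}

/// The loop DuckDB effectively runs: scan until a batch comes back empty.
/// Returns (total rows produced, number of scan calls made).
fn drive(total: i64) -> (i64, usize) {
    let mut state = ScanState { pos: 0 };
    let (mut rows, mut calls) = (0i64, 0usize);
    loop {
        let batch = scan_batch(total, &mut state);
        calls += 1;
        if batch == 0 {
            break;
        }
        rows += batch as i64;
    }
    (rows, calls)
}
```

Note that the empty batch itself costs one extra call — a 5 000-row result takes four scan calls, not three.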
Builder API
```rust
use quack_rs::table::{TableFunctionBuilder, BindInfo, FfiBindData, FfiInitData};
use quack_rs::types::TypeId;

TableFunctionBuilder::new("my_function")
    .param(TypeId::BigInt)  // positional parameter types
    .bind(my_bind_callback) // declare output columns inside bind
    .init(my_init_callback)
    .scan(my_scan_callback)
    .register(con)?;
```
Output columns are declared inside the bind callback using BindInfo::add_result_column,
not on the builder itself.
State management
Bind data
Bind data persists from the bind phase through all scan batches. Use
FfiBindData<T> to allocate it safely:
```rust
struct MyBindData {
    limit: i64,
}

unsafe extern "C" fn my_bind(info: duckdb_bind_info) {
    let n = unsafe { duckdb_get_int64(duckdb_bind_get_parameter(info, 0)) };
    unsafe { FfiBindData::<MyBindData>::set(info, MyBindData { limit: n }) };
}
```
FfiBindData::set stores the value and registers a destructor so DuckDB frees
it at the right time — no Box::into_raw / Box::from_raw needed.
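For context, this is the manual pattern the helper replaces — conceptually, `FfiBindData::set` moves the value to the heap and hands DuckDB a raw pointer plus a destructor that reconstitutes the `Box` so Rust's `Drop` runs exactly once (a sketch of the idea, not the actual quack-rs source):

```rust
use std::os::raw::c_void;

struct MyBindData {
    limit: i64,
}

/// Destructor DuckDB would call when the bind data is freed: reclaim
/// ownership so the Box drop releases the allocation.
unsafe extern "C" fn drop_bind_data(ptr: *mut c_void) {
    drop(unsafe { Box::from_raw(ptr.cast::<MyBindData>()) });
}

/// Leak the value to a raw pointer and pair it with its destructor.
fn leak_bind_data(data: MyBindData) -> (*mut c_void, unsafe extern "C" fn(*mut c_void)) {
    (Box::into_raw(Box::new(data)).cast::<c_void>(), drop_bind_data)
}
```

Getting this pairing wrong (double free, missing destructor, mismatched type) is exactly the class of bug the helper eliminates.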
Init (scan) state
Per-scan state (e.g., a current row index) uses FfiInitData<T>:
```rust
struct MyScanState {
    pos: i64,
}

unsafe extern "C" fn my_init(info: duckdb_init_info) {
    unsafe { FfiInitData::<MyScanState>::set(info, MyScanState { pos: 0 }) };
}
```
Complete example: generate_series_ext
The hello-ext example registers generate_series_ext(n BIGINT) which emits
integers 0 .. n-1. See examples/hello-ext/src/lib.rs for the full source.
```rust
// Bind: extract `n`, register one output column
unsafe extern "C" fn gs_bind(info: duckdb_bind_info) {
    let param = unsafe { duckdb_bind_get_parameter(info, 0) };
    let n = unsafe { duckdb_get_int64(param) };
    unsafe { duckdb_destroy_value(&mut { param }) };

    let out_type = LogicalType::new(TypeId::BigInt);
    unsafe { duckdb_bind_add_result_column(info, c"value".as_ptr(), out_type.as_raw()) };
    unsafe { FfiBindData::<GsBindData>::set(info, GsBindData { total: n }) };
}

// Init: zero-initialise the scan cursor
unsafe extern "C" fn gs_init(info: duckdb_init_info) {
    unsafe { FfiInitData::<GsScanState>::set(info, GsScanState { pos: 0 }) };
}

// Scan: emit a batch of rows
unsafe extern "C" fn gs_scan(info: duckdb_function_info, output: duckdb_data_chunk) {
    let bind = unsafe { FfiBindData::<GsBindData>::get_from_function(info) }.unwrap();
    let state = unsafe { FfiInitData::<GsScanState>::get_mut(info) }.unwrap();

    let remaining = bind.total - state.pos;
    let batch = remaining.min(2048).max(0) as usize;

    let mut writer = unsafe { VectorWriter::new(duckdb_data_chunk_get_vector(output, 0)) };
    for i in 0..batch {
        unsafe { writer.write_i64(i, state.pos + i as i64) };
    }
    unsafe { duckdb_data_chunk_set_size(output, batch as idx_t) };
    state.pos += batch as i64;
}
```
Registration
```rust
TableFunctionBuilder::new("generate_series_ext")
    .param(TypeId::BigInt)
    .bind(gs_bind)
    .init(gs_init)
    .scan(gs_scan)
    .register(con)?;
```
Advanced features
Named parameters
Named parameters let callers pass optional arguments by name (e.g., step := 10):
```rust
TableFunctionBuilder::new("gen_series_v2")
    .param(TypeId::BigInt)               // positional: n
    .named_param("step", TypeId::BigInt) // named: step := <value>
    .bind(gs_v2_bind)
    .init(gs_v2_init)
    .scan(gs_v2_scan)
    .register(con)?;
```
In the bind callback, read the named parameter with
duckdb_bind_get_named_parameter(info, c"step".as_ptr()).
Local init (per-thread state)
For multi-threaded table functions, use local_init to allocate per-thread state:
```rust
TableFunctionBuilder::new("gen_series_v2")
    .param(TypeId::BigInt)
    .bind(gs_v2_bind)
    .init(gs_v2_init)
    .local_init(gs_v2_local_init) // per-thread state allocation
    .scan(gs_v2_scan)
    .register(con)?;
```
The local init callback receives duckdb_init_info and can use
FfiLocalInitData<T>::set to store per-thread state.
Thread control
Use InitInfo::set_max_threads in the global init callback to tell DuckDB how
many threads can scan concurrently:
```rust
unsafe extern "C" fn gs_v2_init(info: duckdb_init_info) {
    let init_info = unsafe { InitInfo::new(info) };
    unsafe { init_info.set_max_threads(1) };
    unsafe { FfiInitData::<MyState>::set(info, MyState { pos: 0 }) };
}
```
Projection pushdown
Enable projection pushdown to let DuckDB skip unrequested columns:
```rust
TableFunctionBuilder::new("my_func")
    .projection_pushdown(true)
    // ...
```
Caution: When projection pushdown is enabled, your scan callback must check which columns DuckDB actually needs using
InitInfo::projected_column_count and InitInfo::projected_column_index. Writing to non-projected columns causes crashes.
See examples/hello-ext/src/lib.rs for a complete example using named_param,
local_init, and set_max_threads.
Complex parameter types
For parameterised types that TypeId cannot express (e.g. LIST(BIGINT),
MAP(VARCHAR, INTEGER), STRUCT(...)), use param_logical and
named_param_logical:
```rust
use quack_rs::types::LogicalType;

TableFunctionBuilder::new("read_data")
    .param_logical(LogicalType::list(TypeId::Varchar)) // positional LIST param
    .named_param_logical("options", LogicalType::map(  // named MAP param
        TypeId::Varchar,
        TypeId::Varchar,
    ))
    .bind(bind_fn)
    .init(init_fn)
    .scan(scan_fn)
    .register(con)?;
```
BindInfo helpers
BindInfo wraps duckdb_bind_info and exposes these methods:
| Method | Description |
|---|---|
| add_result_column(name, TypeId) | Declares an output column |
| add_result_column_with_type(name, &LogicalType) | Output column with complex type |
| set_cardinality(rows, is_exact) | Cardinality hint for the optimizer |
| set_error(message) | Report a bind-time error |
| parameter_count() | Number of positional parameters |
| get_parameter(index) | Returns a positional parameter value (duckdb_value) |
| get_named_parameter(name) | Returns a named parameter value (duckdb_value) |
| get_extra_info() | Returns the extra-info pointer set on the function |
| get_client_context() | Returns a ClientContext (requires duckdb-1-5 feature) |
InitInfo helpers
InitInfo wraps duckdb_init_info:
| Method | Description |
|---|---|
| projected_column_count() | Number of projected columns (with pushdown) |
| projected_column_index(idx) | Output column index at projection position |
| set_max_threads(n) | Maximum parallel scan threads |
| set_error(message) | Report an init-time error |
| get_extra_info() | Returns the extra-info pointer set on the function |
FunctionInfo helpers
FunctionInfo wraps duckdb_function_info (scan callbacks):
| Method | Description |
|---|---|
| set_error(message) | Report a scan-time error |
| get_extra_info() | Returns the extra-info pointer set on the function |
Extra info
Use TableFunctionBuilder::extra_info to attach function-level data that is
accessible from all callbacks (bind, init, and scan) via get_extra_info().
Verified output (DuckDB 1.4.4 and 1.5.0)
```sql
SELECT * FROM generate_series_ext(5);
-- 0
-- 1
-- 2
-- 3
-- 4

SELECT value * value AS sq FROM generate_series_ext(4);
-- 0
-- 1
-- 4
-- 9
```
See also
- table module documentation
- replacement_scan — for file-path-triggered table scans
- hello-ext README
Replacement Scans
A replacement scan lets users write:
```sql
SELECT * FROM 'myfile.myformat'
```
and have DuckDB automatically invoke your extension's table-valued scan instead of trying to open the path as a built-in file type. This is how DuckDB's built-in CSV, Parquet, and JSON readers work.
quack-rs provides ReplacementScanBuilder (a static registration helper) and
ReplacementScanInfo (an ergonomic wrapper for callbacks).
Registration API
Unlike the other builders in quack-rs, ReplacementScanBuilder uses a single
static call because the DuckDB C API takes all arguments at once:
```rust
use quack_rs::replacement_scan::ReplacementScanBuilder;

// Low-level: pass raw extra_data and an optional delete callback.
unsafe {
    ReplacementScanBuilder::register(
        db,                   // duckdb_database
        my_scan_callback,     // ReplacementScanFn
        std::ptr::null_mut(), // extra_data (or a raw pointer)
        None,                 // delete_callback
    );
}

// Ergonomic: pass owned Rust data; boxing and destructor are handled for you.
unsafe {
    ReplacementScanBuilder::register_with_data(db, my_scan_callback, my_state);
}
```
Note: Replacement scans are registered on a database handle (duckdb_database), not a connection. Register them before opening connections.
Callback signature
The raw callback receives duckdb_replacement_scan_info, but you can wrap it
with ReplacementScanInfo for ergonomic, safe access:
```rust
use quack_rs::replacement_scan::ReplacementScanInfo;

unsafe extern "C" fn my_scan_callback(
    info: duckdb_replacement_scan_info,
    table_name: *const ::std::os::raw::c_char,
    _data: *mut ::std::os::raw::c_void,
) {
    let path = unsafe { std::ffi::CStr::from_ptr(table_name) }
        .to_str()
        .unwrap_or("");

    if !path.ends_with(".myformat") {
        return; // pass — DuckDB will try other handlers
    }

    // Use ReplacementScanInfo for ergonomic access
    unsafe {
        ReplacementScanInfo::new(info)
            .set_function("read_myformat")
            .add_varchar_parameter(path);
    }
}
```
ReplacementScanInfo methods
| Method | Description |
|---|---|
| set_function(name) | Redirect to the named table function |
| add_varchar_parameter(value) | Add a VARCHAR parameter to the redirected call |
| set_error(message) | Report an error (aborts this replacement scan) |
When to use replacement scans vs table functions
| Scenario | Use |
|---|---|
| SELECT * FROM my_function('file.ext') | Table function |
| SELECT * FROM 'file.ext' (bare path) | Replacement scan → delegates to a table function |
| File type auto-detection | Replacement scan |
Most extensions implement both: a table function that does the actual work, and a replacement scan that detects the file extension and transparently routes bare-path queries to the table function.
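The routing decision itself is ordinary string logic. A minimal pure-Rust sketch (the callback-free core of the pattern; `read_myformat` is the assumed table-function name from the example above):

```rust
/// Decide how to handle a bare path: redirect to our table function when the
/// extension matches, otherwise return None so DuckDB tries other handlers
/// (CSV, Parquet, JSON, other registered replacement scans, ...).
fn route(path: &str) -> Option<(&'static str, String)> {
    if path.ends_with(".myformat") {
        // (table function to invoke, VARCHAR parameter to pass it)
        Some(("read_myformat", path.to_string()))
    } else {
        None // pass — fall through to the next handler
    }
}
```

In the real callback, the `Some` branch corresponds to `set_function` + `add_varchar_parameter`, and the `None` branch to returning without touching the info handle.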
See also
- replacement_scan module documentation
- Table Functions
Cast Functions
Cast functions let your extension define how DuckDB converts values from one type to
another. Once registered, both explicit CAST(x AS T) syntax and (optionally) implicit
coercions will use your callback.
When to use cast functions
- Your extension introduces a new logical type and needs CAST to/from standard types.
- You want to override DuckDB's built-in cast behaviour for a specific type pair.
- You need to control implicit cast priority relative to other registered casts.
Registering a cast
```rust
use quack_rs::cast::{CastFunctionBuilder, CastFunctionInfo, CastMode};
use quack_rs::types::TypeId;
use quack_rs::vector::{VectorReader, VectorWriter};
use libduckdb_sys::{duckdb_function_info, duckdb_vector, idx_t};

unsafe extern "C" fn varchar_to_int(
    info: duckdb_function_info,
    count: idx_t,
    input: duckdb_vector,
    output: duckdb_vector,
) -> bool {
    let cast_info = unsafe { CastFunctionInfo::new(info) };
    let reader = unsafe { VectorReader::from_vector(input, count as usize) };
    let mut writer = unsafe { VectorWriter::new(output) };

    for row in 0..count as usize {
        if !unsafe { reader.is_valid(row) } {
            unsafe { writer.set_null(row) };
            continue;
        }
        let s = unsafe { reader.read_str(row) };
        match s.parse::<i32>() {
            Ok(v) => unsafe { writer.write_i32(row, v) },
            Err(e) => {
                let msg = format!("cannot cast {:?} to INTEGER: {e}", s);
                if cast_info.cast_mode() == CastMode::Try {
                    // TRY_CAST: write NULL and record a per-row error
                    unsafe { cast_info.set_row_error(&msg, row as idx_t, output) };
                    unsafe { writer.set_null(row) };
                } else {
                    // Regular CAST: abort the whole query
                    unsafe { cast_info.set_error(&msg) };
                    return false;
                }
            }
        }
    }
    true
}

fn register(con: libduckdb_sys::duckdb_connection) -> Result<(), quack_rs::error::ExtensionError> {
    unsafe {
        CastFunctionBuilder::new(TypeId::Varchar, TypeId::Integer)
            .function(varchar_to_int)
            .register(con)
    }
}
```
Implicit casts
Provide an implicit_cost to allow DuckDB to use the cast automatically in
expressions where the types do not match:
```rust
use quack_rs::cast::CastFunctionBuilder;
use quack_rs::types::TypeId;
use libduckdb_sys::{duckdb_function_info, duckdb_vector, idx_t};

unsafe extern "C" fn my_cast(
    _: duckdb_function_info,
    _: idx_t,
    _: duckdb_vector,
    _: duckdb_vector,
) -> bool {
    true
}

fn register(con: libduckdb_sys::duckdb_connection) -> Result<(), quack_rs::error::ExtensionError> {
    unsafe {
        CastFunctionBuilder::new(TypeId::Varchar, TypeId::Integer)
            .function(my_cast)
            .implicit_cost(100) // lower = higher priority
            .register(con)
    }
}
```
Extra info
Attach arbitrary data to a cast function using extra_info. This is useful for
parameterising the cast behaviour (e.g., a rounding mode):
```rust
use quack_rs::cast::CastFunctionBuilder;
use quack_rs::types::TypeId;
use libduckdb_sys::{duckdb_function_info, duckdb_vector, idx_t};
use std::os::raw::c_void;

unsafe extern "C" fn my_cast(
    _: duckdb_function_info,
    _: idx_t,
    _: duckdb_vector,
    _: duckdb_vector,
) -> bool {
    true
}

unsafe extern "C" fn my_destroy(_: *mut c_void) {}

fn register(con: libduckdb_sys::duckdb_connection) -> Result<(), quack_rs::error::ExtensionError> {
    let mode = Box::into_raw(Box::new("round".to_string())).cast::<c_void>();
    unsafe {
        CastFunctionBuilder::new(TypeId::Double, TypeId::BigInt)
            .function(my_cast)
            .implicit_cost(100)
            .extra_info(mode, Some(my_destroy))
            .register(con)
    }
}
```
Inside the cast callback, retrieve the extra info with
CastFunctionInfo::get_extra_info().
TRY_CAST vs CAST
Inside your callback, check CastFunctionInfo::cast_mode() to distinguish between
the two modes:
| Mode | User wrote | Expected behaviour on error |
|---|---|---|
| CastMode::Normal | CAST(x AS T) | Call set_error and return false |
| CastMode::Try | TRY_CAST(x AS T) | Call set_row_error, write NULL, continue |
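The behavioural difference boils down to error strategy, which can be modelled without any FFI (a pure-Rust sketch, not the quack-rs API — `None` stands in for NULL):

```rust
/// Model of the two cast modes: Normal aborts the whole operation on the
/// first bad value; Try replaces each bad value with NULL and continues.
#[derive(PartialEq)]
enum CastMode {
    Normal,
    Try,
}

fn cast_column(input: &[&str], mode: CastMode) -> Result<Vec<Option<i32>>, String> {
    let mut out = Vec::with_capacity(input.len());
    for s in input {
        match s.parse::<i32>() {
            Ok(v) => out.push(Some(v)),
            // TRY_CAST semantics: per-row NULL, keep going
            Err(_) if mode == CastMode::Try => out.push(None),
            // CAST semantics: fail the query with a message
            Err(e) => return Err(format!("cannot cast {s:?} to INTEGER: {e}")),
        }
    }
    Ok(out)
}
```

The FFI callback above follows the same shape: `set_error` + `return false` is the `Err` path, `set_row_error` + `set_null` is the per-row `None` path.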
Working example
The examples/hello-ext extension registers two cast functions:
- CAST(VARCHAR AS INTEGER) / TRY_CAST(VARCHAR AS INTEGER) — basic cast
- CAST(DOUBLE AS BIGINT) — with implicit_cost(100) and extra_info for rounding mode
See examples/hello-ext/src/lib.rs for complete, copy-paste-ready references.
Complex source and target types
For casts involving complex types like DECIMAL(18, 3) or LIST(VARCHAR), use
the new_logical constructor instead of new:
```rust
use quack_rs::cast::CastFunctionBuilder;
use quack_rs::types::{LogicalType, TypeId};
use libduckdb_sys::{duckdb_function_info, duckdb_vector, idx_t};

unsafe extern "C" fn my_cast(
    _: duckdb_function_info,
    _: idx_t,
    _: duckdb_vector,
    _: duckdb_vector,
) -> bool {
    true
}

fn register(con: libduckdb_sys::duckdb_connection) -> Result<(), quack_rs::error::ExtensionError> {
    unsafe {
        CastFunctionBuilder::new_logical(
            LogicalType::list(TypeId::Varchar), // LIST(VARCHAR) source
            LogicalType::list(TypeId::Integer), // LIST(INTEGER) target
        )
        .function(my_cast)
        .register(con)
    }
}
```
The source() and target() accessor methods return Option<TypeId> — they
return None when the type was set via new_logical (since a LogicalType
cannot always be expressed as a simple TypeId).
API reference
- CastFunctionBuilder — the main builder
- CastFunctionInfo — info handle inside callbacks
- CastMode — Normal vs Try cast mode
NULL Handling
By default, DuckDB automatically propagates NULLs: if any argument to a function is NULL, the result is NULL without your function callback being called. This matches the SQL standard and works well for most functions.
However, some functions need to handle NULLs explicitly. For example:
- COALESCE — returns the first non-NULL argument
- IS_NULL / IS_NOT_NULL — tests whether the value is NULL
- Custom aggregates that need to count NULLs
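The two behaviours can be modelled in plain Rust, using `Option` for NULL (a conceptual sketch, not the SDK API):

```rust
/// DefaultNullHandling: the engine short-circuits, so the callback `f`
/// never sees a NULL — any NULL input yields a NULL output directly.
fn with_default_null_handling<F>(a: Option<i64>, b: Option<i64>, f: F) -> Option<i64>
where
    F: Fn(i64, i64) -> i64,
{
    match (a, b) {
        (Some(a), Some(b)) => Some(f(a, b)), // callback runs only on non-NULLs
        _ => None,                           // any NULL input → NULL output
    }
}

/// SpecialNullHandling: the callback receives the NULLs itself, which is
/// what makes COALESCE-like behaviour possible.
fn my_coalesce(a: Option<i64>, b: Option<i64>) -> Option<i64> {
    a.or(b)
}
```

`my_coalesce` could not be written under the default behaviour: the engine would return NULL before the callback ever ran.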
NullHandling enum
```rust
use quack_rs::types::NullHandling;

// Default: DuckDB auto-returns NULL for any NULL input
NullHandling::DefaultNullHandling

// Special: DuckDB passes NULLs to your callback
NullHandling::SpecialNullHandling
```
Scalar functions
```rust
use quack_rs::scalar::ScalarFunctionBuilder;
use quack_rs::types::{TypeId, NullHandling};

ScalarFunctionBuilder::new("my_coalesce")
    .param(TypeId::BigInt)
    .param(TypeId::BigInt)
    .returns(TypeId::BigInt)
    .null_handling(NullHandling::SpecialNullHandling)
    .function(my_coalesce_fn)
    .register(con)?;
```
With SpecialNullHandling, your callback must check VectorReader::is_valid(row) for
each input column and handle NULLs yourself.
Aggregate functions
```rust
use quack_rs::aggregate::AggregateFunctionBuilder;
use quack_rs::types::{TypeId, NullHandling};

AggregateFunctionBuilder::new("count_with_nulls")
    .param(TypeId::BigInt)
    .returns(TypeId::BigInt)
    .null_handling(NullHandling::SpecialNullHandling)
    .state_size(my_state_size)
    .init(my_init)
    .update(my_update) // will be called even for NULL rows
    .combine(my_combine)
    .finalize(my_finalize)
    .register(con)?;
```
When to use special NULL handling
| Use case | NULL handling |
|---|---|
| Most scalar/aggregate functions | DefaultNullHandling (the default) |
| Functions that need to see NULLs | SpecialNullHandling |
| COALESCE-like functions | SpecialNullHandling |
| NULL-counting aggregates | SpecialNullHandling |
If you don't call .null_handling(), the default (DefaultNullHandling) is used
automatically.
SQL Macros
SQL macros let you package reusable SQL expressions and queries as named DuckDB functions —
no FFI callbacks required. quack-rs makes this pure Rust: you define the macro body as a
string and call .register(con).
Two macro types
| Type | SQL generated | Returns |
|---|---|---|
| Scalar | CREATE OR REPLACE MACRO name(params) AS (expression) | one value per row |
| Table | CREATE OR REPLACE MACRO name(params) AS TABLE query | a result set |
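The two generated statements are simple string templates, sketched here as plain Rust string builders (a hypothetical mirror of the documented to_sql output, not the actual quack-rs implementation):

```rust
/// Scalar macro: the expression is wrapped in parentheses.
fn scalar_macro_sql(name: &str, params: &[&str], expr: &str) -> String {
    format!(
        "CREATE OR REPLACE MACRO {name}({}) AS ({expr})",
        params.join(", ")
    )
}

/// Table macro: the query follows the TABLE keyword, unparenthesised.
fn table_macro_sql(name: &str, params: &[&str], query: &str) -> String {
    format!(
        "CREATE OR REPLACE MACRO {name}({}) AS TABLE {query}",
        params.join(", ")
    )
}
```

For a zero-parameter macro like pi() the parameter list simply collapses to empty parentheses.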
Scalar macros
A scalar macro wraps a SQL expression. Think of it as a parameterized SQL alias:
```rust
use quack_rs::sql_macro::SqlMacro;

fn register(con: duckdb_connection) -> Result<(), ExtensionError> {
    unsafe {
        // clamp(x, lo, hi) → greatest(lo, least(hi, x))
        SqlMacro::scalar("clamp", &["x", "lo", "hi"], "greatest(lo, least(hi, x))")?
            .register(con)?;

        // pi() → 3.14159265358979
        SqlMacro::scalar("pi", &[], "3.14159265358979")?
            .register(con)?;

        // safe_div(a, b) → CASE WHEN b = 0 THEN NULL ELSE a / b END
        SqlMacro::scalar(
            "safe_div",
            &["a", "b"],
            "CASE WHEN b = 0 THEN NULL ELSE a / b END",
        )?
        .register(con)?;
    }
    Ok(())
}
```
Use in DuckDB:
```sql
SELECT clamp(rating, 1, 5) FROM reviews;
SELECT safe_div(revenue, orders) FROM monthly_stats;
```
Table macros
A table macro wraps a SQL query that returns rows:
```rust
unsafe {
    // active_users(tbl) → SELECT * FROM tbl WHERE active = true
    SqlMacro::table(
        "active_users",
        &["tbl"],
        "SELECT * FROM tbl WHERE active = true",
    )?
    .register(con)?;

    // recent_orders(days) → last N days of orders
    SqlMacro::table(
        "recent_orders",
        &["days"],
        "SELECT * FROM orders WHERE order_date >= current_date - INTERVAL (days) DAY",
    )?
    .register(con)?;
}
```
Use in DuckDB:
```sql
SELECT * FROM active_users(users);
SELECT count(*) FROM recent_orders(7);
```
Inspecting the generated SQL
to_sql() returns the CREATE OR REPLACE MACRO statement without requiring a live connection.
Use it for logging, debugging, or assertions in tests:
```rust
let m = SqlMacro::scalar("add", &["a", "b"], "a + b")?;
assert_eq!(
    m.to_sql(),
    "CREATE OR REPLACE MACRO add(a, b) AS (a + b)"
);

let t = SqlMacro::table("active_users", &["tbl"], "SELECT * FROM tbl WHERE active = true")?;
assert_eq!(
    t.to_sql(),
    "CREATE OR REPLACE MACRO active_users(tbl) AS TABLE SELECT * FROM tbl WHERE active = true"
);
```
Name and parameter validation
Macro names and parameter names are validated against the same rules as function names:
- Must match [a-z_][a-z0-9_]*
- Not exceed 256 characters
- No null bytes
```rust
SqlMacro::scalar("MyMacro", &[], "1")  // ❌ Err — uppercase
SqlMacro::scalar("my-macro", &[], "1") // ❌ Err — hyphen
SqlMacro::scalar("f", &["X"], "1")     // ❌ Err — uppercase param
SqlMacro::scalar("f", &["_x"], "1")    // ✅ Ok — underscore prefix allowed
```
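The rules are easy to re-implement independently — here is a hypothetical validator (not the quack-rs source) expressing exactly the three documented constraints:

```rust
/// Identifier rules from the docs: [a-z_][a-z0-9_]*, at most 256
/// characters, no null bytes. ASCII-only, so byte length == char count.
fn is_valid_identifier(name: &str) -> bool {
    if name.is_empty() || name.len() > 256 || name.contains('\0') {
        return false;
    }
    let mut chars = name.chars();
    let first = chars.next().unwrap();
    // First char: lowercase letter or underscore.
    (first.is_ascii_lowercase() || first == '_')
        // Remaining chars: lowercase letter, digit, or underscore.
        && chars.all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '_')
}
```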
SQL injection safety
Macro and parameter names are restricted to [a-z_][a-z0-9_]*, preventing SQL
injection at the identifier level. They are interpolated literally (no quoting required,
since the character set is already safe).
The body (expression or query) is your own extension code — it is included verbatim. Never build macro bodies from untrusted user input.
How it works under the hood
SqlMacro::register executes the CREATE OR REPLACE MACRO statement via duckdb_query:
```rust
pub unsafe fn register(self, con: duckdb_connection) -> Result<(), ExtensionError> {
    let sql = self.to_sql();
    unsafe { execute_sql(con, &sql) }
}
```
execute_sql zero-initializes a duckdb_result, calls duckdb_query, extracts any error
message via duckdb_result_error, and always calls duckdb_destroy_result — even on failure.
Choosing between macros and scalar functions
| Scenario | Use |
|---|---|
| Logic expressible in SQL | SQL macro — simpler, no FFI |
| Logic needs Rust code (algorithms, external crates, etc.) | Scalar function |
| Best performance for simple expressions | SQL macro (no FFI overhead) |
| Type-specific overloads | Scalar function with multiple registrations |
| Returning a table | SQL table macro |
Copy Functions
Requires the duckdb-1-5 feature flag (DuckDB 1.5.0+).
Copy functions let you implement custom COPY TO file format handlers. When a
user runs COPY table TO 'file.xyz' (FORMAT my_format), DuckDB invokes your
extension's bind, init, sink, and finalize callbacks.
Lifecycle
- Bind — called once. Inspect output columns, configure the export.
- Global init — called once. Open the output file, allocate global state.
- Sink — called once per data chunk. Write rows to the output.
- Finalize — called once. Flush buffers, close the file.
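The four-phase lifecycle can be modelled in plain Rust as a trait plus the driver loop the engine effectively runs (a conceptual sketch — the real callbacks are the C functions listed below, not a trait):

```rust
/// Pure-Rust model of the COPY TO lifecycle callbacks.
trait CopySink {
    fn bind(&mut self, columns: &[&str]);   // once: inspect output columns
    fn global_init(&mut self, path: &str);  // once: open the output file
    fn sink(&mut self, chunk: &[i64]);      // per chunk: write rows
    fn finalize(&mut self) -> usize;        // once: flush/close; rows written
}

/// A sink that just records the call sequence and counts rows.
struct CountingSink {
    log: Vec<String>,
    rows: usize,
}

impl CopySink for CountingSink {
    fn bind(&mut self, columns: &[&str]) {
        self.log.push(format!("bind({} cols)", columns.len()));
    }
    fn global_init(&mut self, path: &str) {
        self.log.push(format!("open {path}"));
    }
    fn sink(&mut self, chunk: &[i64]) {
        self.rows += chunk.len();
    }
    fn finalize(&mut self) -> usize {
        self.log.push("close".to_string());
        self.rows
    }
}

/// What the engine effectively does for COPY t TO 'file' (FORMAT my_format).
fn run_copy(sink: &mut dyn CopySink, path: &str, chunks: &[Vec<i64>]) -> usize {
    sink.bind(&["value"]);
    sink.global_init(path);
    for chunk in chunks {
        sink.sink(chunk);
    }
    sink.finalize()
}
```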
Builder API
```rust
use quack_rs::copy_function::CopyFunctionBuilder;

let builder = CopyFunctionBuilder::try_new("my_format")?
    .bind(my_bind_fn)
    .global_init(my_global_init_fn)
    .sink(my_sink_fn)
    .finalize(my_finalize_fn);

// Register on a connection (inside the entry_point_v2! callback):
// unsafe { builder.register(con)?; }
```
Callback signatures
| Phase | Signature |
|---|---|
| Bind | unsafe extern "C" fn(info: duckdb_copy_function_bind_info) |
| Global init | unsafe extern "C" fn(info: duckdb_copy_function_global_init_info) |
| Sink | unsafe extern "C" fn(info: duckdb_copy_function_sink_info, chunk: duckdb_data_chunk) |
| Finalize | unsafe extern "C" fn(info: duckdb_copy_function_finalize_info) |
Callback info wrappers
Each phase provides an ergonomic wrapper type around its raw info handle. Wrap the handle at the top of your callback to access helper methods:
CopyBindInfo
| Method | Description |
|---|---|
| column_count() | Number of output columns |
| column_type(index) | LogicalType of the column at index |
| get_extra_info() | Extra-info pointer set on the copy function |
| set_bind_data(data, destroy) | Store bind data and its destructor |
| set_error(message) | Report a bind-time error |
| get_client_context() | Returns a ClientContext for catalog/config access |
CopyGlobalInitInfo
| Method | Description |
|---|---|
| get_bind_data() | Retrieve the bind data pointer |
| get_extra_info() | Extra-info pointer set on the copy function |
| get_file_path() | Output file path for the COPY operation |
| set_global_state(state, destroy) | Store global state and its destructor |
| set_error(message) | Report an init-time error |
| get_client_context() | Returns a ClientContext |
CopySinkInfo
| Method | Description |
|---|---|
| get_bind_data() | Retrieve the bind data pointer |
| get_extra_info() | Extra-info pointer set on the copy function |
| get_global_state() | Retrieve the global state pointer |
| set_error(message) | Report a sink-time error |
| get_client_context() | Returns a ClientContext |
CopyFinalizeInfo
| Method | Description |
|---|---|
| get_bind_data() | Retrieve the bind data pointer |
| get_extra_info() | Extra-info pointer set on the copy function |
| get_global_state() | Retrieve the global state pointer |
| set_error(message) | Report a finalize-time error |
| get_client_context() | Returns a ClientContext |
All four wrappers are re-exported from quack_rs::copy_function:
```rust
use quack_rs::copy_function::{CopyBindInfo, CopyGlobalInitInfo, CopySinkInfo, CopyFinalizeInfo};
```
Related modules
- config_option — register custom settings for your format
- client_context — access the file system and catalog from callbacks
- table_description — inspect table metadata
- catalog — look up catalog entries
Reading & Writing Vectors
DuckDB passes data to and from your extension as vectors — columnar arrays of typed
values, with a separate NULL bitmap. VectorReader and VectorWriter provide safe,
typed access to these vectors.
VectorReader
Construction
```rust
// In a scalar function callback:
let reader = unsafe { VectorReader::new(input, column_index) };

// In an aggregate update callback:
let reader = unsafe { VectorReader::new(input, 0) }; // first column
```
VectorReader::new takes the duckdb_data_chunk and a zero-based column index. The
reader borrows the chunk — it must not outlive the callback.
Row count
```rust
let n = reader.row_count(); // number of rows in this chunk
```
Chunk sizes vary. Always loop from 0..reader.row_count(), never assume a fixed size.
NULL check
```rust
if unsafe { !reader.is_valid(row) } {
    // row is NULL — skip or propagate NULL to output
    unsafe { writer.set_null(row) };
    continue;
}
```
Always check is_valid before reading. Reading from a NULL row returns garbage data.
Reading values
```rust
let i: i8  = unsafe { reader.read_i8(row) };
let i: i16 = unsafe { reader.read_i16(row) };
let i: i32 = unsafe { reader.read_i32(row) };
let i: i64 = unsafe { reader.read_i64(row) };

let u: u8  = unsafe { reader.read_u8(row) };
let u: u16 = unsafe { reader.read_u16(row) };
let u: u32 = unsafe { reader.read_u32(row) };
let u: u64 = unsafe { reader.read_u64(row) };

let f: f32 = unsafe { reader.read_f32(row) };
let f: f64 = unsafe { reader.read_f64(row) };

let b: bool = unsafe { reader.read_bool(row) };     // safe: uses u8 != 0
let s: &str = unsafe { reader.read_str(row) };      // handles inline + pointer format
let iv = unsafe { reader.read_interval(row) };      // returns DuckInterval
```
VectorWriter
Construction
```rust
// In a scalar function callback:
let mut writer = unsafe { VectorWriter::new(output) };

// In an aggregate finalize callback:
let mut writer = unsafe { VectorWriter::new(result) };
```
Writing values
```rust
unsafe { writer.write_i8(row, value) };
unsafe { writer.write_i16(row, value) };
unsafe { writer.write_i32(row, value) };
unsafe { writer.write_i64(row, value) };

unsafe { writer.write_u8(row, value) };
unsafe { writer.write_u16(row, value) };
unsafe { writer.write_u32(row, value) };
unsafe { writer.write_u64(row, value) };

unsafe { writer.write_f32(row, value) };
unsafe { writer.write_f64(row, value) };

unsafe { writer.write_bool(row, value) };
unsafe { writer.write_varchar(row, s) };           // &str
unsafe { writer.write_interval(row, interval) };   // DuckInterval
```
Writing NULL
```rust
unsafe { writer.set_null(row) };
```
Pitfall L4: set_null calls duckdb_vector_ensure_validity_writable automatically before accessing the validity bitmap. Calling duckdb_vector_get_validity without this prerequisite returns an uninitialized pointer → SEGFAULT. VectorWriter::set_null handles this correctly. See Pitfall L4.
Utility functions
The quack_rs::vector module provides two utility functions:
```rust
use quack_rs::vector::{vector_size, vector_get_column_type};

// Returns the default vector size used by DuckDB (typically 2048).
let size: u64 = vector_size();

// Returns the LogicalType of a vector (unsafe — requires a valid duckdb_vector).
let lt = unsafe { vector_get_column_type(some_vector) };
```
Memory layout details
DuckDB stores vector data as flat arrays. VectorReader and VectorWriter compute
element addresses as base_ptr + row * stride:
```text
[value0][value1][value2]...[valueN]   ← typed array
[validity bitmap]                     ← separate bit array, 1 bit per row
```
The validity bitmap is lazily allocated — it may be null if no NULLs have been written.
This is why ensure_validity_writable must be called before any get_validity call
that follows a write path.
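The bitmap arithmetic itself is simple. The sketch below models the documented layout — one bit per row packed into u64 words, a null bitmap pointer meaning "all valid" — as an assumption based on this description, not a copy of the quack-rs source:

```rust
/// Bit set = valid, bit clear = NULL.
fn is_valid(validity: Option<&[u64]>, row: usize) -> bool {
    match validity {
        // Lazily allocated: a null bitmap pointer means no NULLs exist yet.
        None => true,
        // word index = row / 64, bit index = row % 64
        Some(bits) => bits[row / 64] & (1u64 << (row % 64)) != 0,
    }
}

/// Clear the row's bit to mark it NULL. In the real API this is only safe
/// after duckdb_vector_ensure_validity_writable has materialised the bitmap.
fn set_null(validity: &mut [u64], row: usize) {
    validity[row / 64] &= !(1u64 << (row % 64));
}
```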
Complete scalar function pattern
```rust
unsafe extern "C" fn my_scalar(
    _info: duckdb_function_info,
    input: duckdb_data_chunk,
    output: duckdb_vector,
) {
    let reader = unsafe { VectorReader::new(input, 0) };
    let mut writer = unsafe { VectorWriter::new(output) };

    for row in 0..reader.row_count() {
        if unsafe { !reader.is_valid(row) } {
            unsafe { writer.set_null(row) };
            continue;
        }
        let value = unsafe { reader.read_i64(row) };
        unsafe { writer.write_i64(row, transform(value)) };
    }
}
```
Complex Types: STRUCT, LIST, MAP, ARRAY
DuckDB's complex types — STRUCT, LIST, MAP, and ARRAY — are stored as nested vectors.
quack-rs provides four helper types in vector::complex to access the child
vectors without manual offset arithmetic.
Overview
| DuckDB type | Storage | quack-rs helper |
|---|---|---|
| STRUCT{a T, b U, …} | Parent vector + N child vectors (one per field) | StructVector |
| LIST<T> | Parent vector holds {offset, length} per row; flat child vector holds elements | ListVector |
| MAP<K, V> | Stored as LIST<STRUCT{key K, value V}> | MapVector |
| ARRAY<T>[N] | Fixed-size array; single child vector | ArrayVector |
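The LIST layout in particular is worth internalising before reading the examples below: each row is just an {offset, length} entry into one flat child array. A pure-Rust model of that storage (the same shape ListVector::get_entry exposes over real vectors):

```rust
/// One list row: a window into the flat child vector.
#[derive(Clone, Copy)]
struct ListEntry {
    offset: u64,
    length: u64,
}

/// Reassemble row `row` of a LIST<BIGINT> column from its entry array
/// and flat child data.
fn read_list_row(entries: &[ListEntry], child: &[i64], row: usize) -> Vec<i64> {
    let e = entries[row];
    let start = e.offset as usize;
    child[start..start + e.length as usize].to_vec()
}
```

MAP uses the same entry mechanics, with the child vector being a STRUCT of keys and values.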
Reading complex types (input vectors)
STRUCT
```rust
use quack_rs::vector::{VectorReader, complex::StructVector};

// Inside a scan or finalize callback:
// parent_vec comes from duckdb_data_chunk_get_vector(chunk, col_idx)
let x_reader = unsafe { StructVector::field_reader(parent_vec, 0, row_count) };
let y_reader = unsafe { StructVector::field_reader(parent_vec, 1, row_count) };

for row in 0..row_count {
    if unsafe { x_reader.is_valid(row) } {
        let x: f64 = unsafe { x_reader.read_f64(row) };
        let y: f64 = unsafe { y_reader.read_f64(row) };
        // process (x, y) …
    }
}
```
LIST
```rust
use quack_rs::vector::{VectorReader, complex::ListVector};

let total_elements = unsafe { ListVector::get_size(list_vec) };
let elem_reader = unsafe { ListVector::child_reader(list_vec, total_elements) };

for row in 0..row_count {
    let entry = unsafe { ListVector::get_entry(list_vec, row) };
    for i in 0..entry.length as usize {
        let elem_idx = entry.offset as usize + i;
        if unsafe { elem_reader.is_valid(elem_idx) } {
            let val: i64 = unsafe { elem_reader.read_i64(elem_idx) };
            // process val …
        }
    }
}
```
MAP
MAP is LIST<STRUCT{key, value}>. Access keys and values via the inner struct:
```rust
use quack_rs::vector::{VectorReader, complex::MapVector};

let total = unsafe { MapVector::total_entry_count(map_vec) };
let key_reader = unsafe { VectorReader::from_vector(MapVector::keys(map_vec), total) };
let value_reader = unsafe { VectorReader::from_vector(MapVector::values(map_vec), total) };

for row in 0..row_count {
    let entry = unsafe { MapVector::get_entry(map_vec, row) };
    for i in 0..entry.length as usize {
        let idx = entry.offset as usize + i;
        let k = unsafe { key_reader.read_str(idx) };
        let v: i64 = unsafe { value_reader.read_i64(idx) };
        // process (k, v) …
    }
}
```
Writing complex types (output vectors)
STRUCT
```rust
use quack_rs::vector::{VectorWriter, complex::StructVector};

let mut x_writer = unsafe { StructVector::field_writer(out_vec, 0) };
let mut y_writer = unsafe { StructVector::field_writer(out_vec, 1) };

for row in 0..batch_size {
    unsafe { x_writer.write_f64(row, x_values[row]) };
    unsafe { y_writer.write_f64(row, y_values[row]) };
}
```
LIST
```rust
use quack_rs::vector::{VectorWriter, complex::ListVector};

let total_elements: usize = rows.iter().map(|r| r.len()).sum();
unsafe { ListVector::reserve(list_vec, total_elements) };
let mut child_writer = unsafe { ListVector::child_writer(list_vec) };

let mut offset = 0usize;
for (row, elements) in rows.iter().enumerate() {
    for (i, &val) in elements.iter().enumerate() {
        unsafe { child_writer.write_i64(offset + i, val) };
    }
    unsafe { ListVector::set_entry(list_vec, row, offset as u64, elements.len() as u64) };
    offset += elements.len();
}
unsafe { ListVector::set_size(list_vec, total_elements) };
```
MAP
The MAP write workflow is identical to LIST, but keys and values are written into the two struct child vectors:
```rust
use quack_rs::vector::{VectorWriter, complex::MapVector};

unsafe { MapVector::reserve(map_vec, total_pairs) };
let mut key_writer = unsafe { VectorWriter::from_vector(MapVector::keys(map_vec)) };
let mut val_writer = unsafe { VectorWriter::from_vector(MapVector::values(map_vec)) };

let mut offset = 0usize;
for (row, pairs) in all_pairs.iter().enumerate() {
    for (i, (k, v)) in pairs.iter().enumerate() {
        unsafe { key_writer.write_varchar(offset + i, k) };
        unsafe { val_writer.write_i64(offset + i, *v) };
    }
    unsafe { MapVector::set_entry(map_vec, row, offset as u64, pairs.len() as u64) };
    offset += pairs.len();
}
unsafe { MapVector::set_size(map_vec, total_pairs) };
```
Constructing complex logical types
Use LogicalType constructors to define complex column types. Each constructor
has a variant that accepts TypeId values (for simple element types) and a
_from_logical variant (for nested complex types):
| Constructor | _from_logical variant | Creates |
|---|---|---|
| `LogicalType::list(TypeId)` | `list_from_logical(&LogicalType)` | `LIST<T>` |
| `LogicalType::map(TypeId, TypeId)` | `map_from_logical(&LogicalType, &LogicalType)` | `MAP<K, V>` |
| `LogicalType::struct_type(&[(&str, TypeId)])` | `struct_type_from_logical(&[(&str, LogicalType)])` | `STRUCT{...}` |
| `LogicalType::union_type(&[(&str, TypeId)])` | `union_type_from_logical(&[(&str, LogicalType)])` | `UNION(...)` |
| `LogicalType::array(TypeId, u64)` | `array_from_logical(&LogicalType, u64)` | `ARRAY<T>[N]` |
| `LogicalType::enum_type(&[&str])` | — | `ENUM(...)` |
| `LogicalType::decimal(u8, u8)` | — | `DECIMAL(w, s)` |
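The two variants compose for nested types. A hedged sketch of how they combine (the `TypeId::Bigint` and `TypeId::Double` variant names and the `LogicalType::new` constructor are assumptions based on the tables in this book; this only runs inside a loaded extension, where the dispatch table is initialized):

```rust
use quack_rs::types::{LogicalType, TypeId};

// MAP<VARCHAR, BIGINT> — both element types are simple, so the TypeId form works
let map_ty = LogicalType::map(TypeId::Varchar, TypeId::Bigint);

// LIST<STRUCT{x: DOUBLE, y: DOUBLE}> — the element type is itself complex,
// so the _from_logical variants are required
let point_ty = LogicalType::struct_type_from_logical(&[
    ("x", LogicalType::new(TypeId::Double)),
    ("y", LogicalType::new(TypeId::Double)),
]);
let points_ty = LogicalType::list_from_logical(&point_ty);
```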
API reference
All helpers are in quack_rs::vector::complex (re-exported from quack_rs::prelude).
StructVector
| Method | Description |
|---|---|
| `get_child(vec, field_idx)` | Returns the raw child vector for field `field_idx` |
| `field_reader(vec, field_idx, row_count)` | Creates a `VectorReader` for a STRUCT field |
| `field_writer(vec, field_idx)` | Creates a `VectorWriter` for a STRUCT field |
ListVector
| Method | Description |
|---|---|
| `get_child(vec)` | Returns the flat element child vector |
| `get_size(vec)` | Total number of elements across all rows |
| `set_size(vec, n)` | Sets the number of elements after writing |
| `reserve(vec, capacity)` | Reserves capacity in the child vector |
| `get_entry(vec, row)` | Returns `{offset, length}` for a row (reading) |
| `set_entry(vec, row, offset, length)` | Sets `{offset, length}` for a row (writing) |
| `child_reader(vec, count)` | Creates a `VectorReader` for the element vector |
| `child_writer(vec)` | Creates a `VectorWriter` for the element vector |
MapVector
| Method | Description |
|---|---|
| `struct_child(vec)` | Returns the inner STRUCT vector |
| `keys(vec)` | Returns the key vector (STRUCT field 0) |
| `values(vec)` | Returns the value vector (STRUCT field 1) |
| `total_entry_count(vec)` | Total key-value pairs |
| `reserve(vec, n)` | Reserves capacity |
| `set_size(vec, n)` | Sets total entry count after writing |
| `get_entry(vec, row)` | Returns `{offset, length}` for a row (reading) |
| `set_entry(vec, row, offset, length)` | Sets `{offset, length}` for a row (writing) |
ArrayVector
| Method | Description |
|---|---|
| `get_child(vec)` | Returns the child vector of a fixed-size ARRAY vector |
NULL Handling & Strings
This page covers two topics that are handled together in practice: checking for NULL before reading, and reading VARCHAR values from DuckDB vectors.
NULL checks
Every row in a DuckDB vector may be NULL. Always check validity before reading:
```rust
for row in 0..reader.row_count() {
    if unsafe { !reader.is_valid(row) } {
        // Propagate NULL to output
        unsafe { writer.set_null(row) };
        continue;
    }
    // Safe to read
    let value = unsafe { reader.read_str(row) };
}
```
Reading from a NULL row returns garbage: the vector's data buffer is not zeroed at NULL positions, and no bounds check or error will stop you. You simply get whatever bytes happen to be in the buffer.
Writing NULL
```rust
unsafe { writer.set_null(row) };
```
Pitfall L4: `VectorWriter::set_null` calls `duckdb_vector_ensure_validity_writable` before accessing the validity bitmap. Calling `duckdb_vector_get_validity` without this prerequisite returns an uninitialized pointer → SEGFAULT. Never write NULL manually; always use `set_null`. See Pitfall L4.
VARCHAR reading
Read VARCHAR columns with VectorReader::read_str:
```rust
let s: &str = unsafe { reader.read_str(row) };
```
The returned &str borrows from the DuckDB vector — it must not outlive the
callback. Do not store it in a struct; clone it to a String if you need to
keep it.
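A pure-Rust analogy of the lifetime rule, with no DuckDB involved (`keep_for_later` is a hypothetical helper, for illustration only):

```rust
// Pure-Rust analogy: the &str from read_str is tied to the vector's buffer,
// so anything kept past the callback must be an owned copy.
fn keep_for_later(borrowed: &str) -> String {
    borrowed.to_owned() // copy the bytes into caller-owned memory
}

fn main() {
    let vector_buffer = String::from("hello"); // stands in for DuckDB's buffer
    let kept = keep_for_later(&vector_buffer);
    drop(vector_buffer); // simulates DuckDB reclaiming the vector after the callback
    assert_eq!(kept, "hello"); // the owned copy remains valid
}
```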
The duckdb_string_t format
Pitfall P7 — The `duckdb_string_t` format is not documented in the Rust bindings. This is the internalized knowledge encoded in quack-rs.
DuckDB stores VARCHAR values in a 16-byte duckdb_string_t struct with two
representations, selected at runtime based on string length:
| Format | Condition | Layout |
|---|---|---|
| Inline | length ≤ 12 | [len: u32][data: [u8; 12]] |
| Pointer | length > 12 | `[len: u32][prefix: [u8; 4]][ptr: *const u8]` |
VectorReader::read_str and the underlying read_duck_string function handle
both formats transparently. You never need to inspect the raw struct.
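To make the inline/pointer split concrete, here is an illustrative decoder for the inline case only (an assumption-laden sketch: it presumes little-endian byte order, and `decode_inline` is not part of quack-rs):

```rust
const INLINE_MAX: usize = 12; // strings up to 12 bytes are stored inline

/// Decode the inline form of a 16-byte duckdb_string_t-shaped buffer.
/// Returns None for the pointer form (bytes 8..16 would hold a raw pointer
/// into DuckDB-owned memory, which a standalone example cannot dereference).
fn decode_inline(bytes: &[u8; 16]) -> Option<Vec<u8>> {
    let len = u32::from_le_bytes(bytes[0..4].try_into().unwrap()) as usize;
    if len <= INLINE_MAX {
        Some(bytes[4..4 + len].to_vec()) // inline: data directly follows the length
    } else {
        None // pointer form
    }
}

fn main() {
    let mut raw = [0u8; 16];
    raw[0..4].copy_from_slice(&2u32.to_le_bytes()); // length = 2
    raw[4..6].copy_from_slice(b"hi");               // inline payload
    assert_eq!(decode_inline(&raw), Some(b"hi".to_vec()));
}
```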
Empty strings vs NULL
An empty string ("") and NULL are distinct values:
```rust
// NULL: is_valid returns false
// Empty string: is_valid returns true, read_str returns ""
if unsafe { !reader.is_valid(row) } {
    // This is NULL
} else {
    let s = unsafe { reader.read_str(row) };
    if s.is_empty() {
        // This is an empty string, not NULL
    }
}
```
Writing VARCHAR
```rust
unsafe { writer.write_varchar(row, my_str) }; // &str
```
write_varchar copies the string bytes into DuckDB's managed storage. The
&str reference is no longer needed after the call returns.
Complete NULL-safe VARCHAR pattern
```rust
unsafe extern "C" fn my_scalar(
    _info: duckdb_function_info,
    input: duckdb_data_chunk,
    output: duckdb_vector,
) {
    let reader = unsafe { VectorReader::new(input, 0) };
    let mut writer = unsafe { VectorWriter::new(output) };
    for row in 0..reader.row_count() {
        if unsafe { !reader.is_valid(row) } {
            unsafe { writer.set_null(row) };
            continue;
        }
        let s = unsafe { reader.read_str(row) };
        let upper = s.to_uppercase();
        unsafe { writer.write_varchar(row, &upper) };
    }
}
```
DuckStringView
For advanced use cases where you need access to the raw string bytes or the
inline/pointer distinction, quack_rs::vector::string::DuckStringView is
available:
```rust
use quack_rs::vector::string::{DuckStringView, DUCK_STRING_SIZE};

// From raw 16-byte data (inside a vector callback)
let raw: &[u8; 16] = unsafe { &*data.add(idx * DUCK_STRING_SIZE).cast() };
let view = DuckStringView::from_bytes(raw);
println!("length: {}", view.len());
println!("is_empty: {}", view.is_empty());
if let Some(s) = view.as_str() {
    println!("content: {s}");
}
```
In practice, prefer reader.read_str(row) — DuckStringView is only needed
when you have a raw pointer and want to avoid creating a full VectorReader.
Constants
| Constant | Value | Meaning |
|---|---|---|
| `DUCK_STRING_SIZE` | 16 | Size of one `duckdb_string_t` in bytes |
| `DUCK_STRING_INLINE_MAX_LEN` | 12 | Max length stored inline (no heap ptr) |
INTERVAL Type
DuckDB's INTERVAL type represents a duration with three independent components:
months, days, and sub-day microseconds. The quack_rs::interval module provides
the DuckInterval struct and safe conversion utilities.
Why a custom struct?
Pitfall P8 — The INTERVAL struct layout and its conversion semantics are not documented in the Rust bindings. This module encodes that knowledge.
DuckDB's C duckdb_interval struct is 16 bytes with this exact layout:
```
offset 0: months (i32) — calendar months
offset 4: days   (i32) — calendar days
offset 8: micros (i64) — sub-day microseconds
total:    16 bytes
```
DuckInterval is #[repr(C)] with the same field order and is verified at
compile time to be exactly 16 bytes.
Reading INTERVAL values
```rust
let iv: DuckInterval = unsafe { reader.read_interval(row) };
println!("{} months, {} days, {} µs", iv.months, iv.days, iv.micros);
```
VectorReader::read_interval handles the raw pointer arithmetic and alignment
using read_interval_at internally.
DuckInterval fields
```rust
use quack_rs::interval::DuckInterval;

let iv = DuckInterval {
    months: 1,             // 1 calendar month
    days: 15,              // 15 calendar days
    micros: 3_600_000_000, // 1 hour in microseconds
};
```
Fields are public and can be constructed directly.
Zero interval
```rust
let zero = DuckInterval::zero();    // { months: 0, days: 0, micros: 0 }
let zero = DuckInterval::default(); // same
```
Converting to microseconds
Intervals are not directly comparable because months and days have variable lengths in wall-clock time. When you need a single numeric value, convert to microseconds using the DuckDB approximation: 1 month = 30 days.
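The arithmetic, written out with locally defined constants so the example is self-contained (the values match the conversion-constants table on this page):

```rust
const MICROS_PER_DAY: i64 = 86_400_000_000;        // 24 hours
const MICROS_PER_MONTH: i64 = 30 * MICROS_PER_DAY; // 30-day month approximation

fn main() {
    // 1 month + 2 days + 500_000 µs under the approximation:
    let total = 1 * MICROS_PER_MONTH + 2 * MICROS_PER_DAY + 500_000;
    assert_eq!(total, 2_764_800_500_000);
}
```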
Checked conversion (returns Option)
```rust
use quack_rs::interval::interval_to_micros;

let iv = DuckInterval { months: 0, days: 1, micros: 500_000 };
match interval_to_micros(iv) {
    Some(us) => println!("{us} microseconds"),
    None => println!("overflow"),
}

// Method form:
let us: Option<i64> = iv.to_micros();
```
Returns None if the result would overflow i64. This can happen with extreme
values (e.g., months: i32::MAX).
Saturating conversion (never panics)
```rust
use quack_rs::interval::interval_to_micros_saturating;

let iv = DuckInterval { months: i32::MAX, days: i32::MAX, micros: i64::MAX };
let us: i64 = interval_to_micros_saturating(iv); // i64::MAX

// Method form:
let us: i64 = iv.to_micros_saturating();
```
Use the saturating form in FFI callbacks where panics are not allowed.
Conversion constants
| Constant | Value | Meaning |
|---|---|---|
| `MICROS_PER_DAY` | 86_400_000_000 | Microseconds in 24 hours |
| `MICROS_PER_MONTH` | 2_592_000_000_000 | Microseconds in 30 days |
```rust
use quack_rs::interval::{MICROS_PER_DAY, MICROS_PER_MONTH};

assert_eq!(MICROS_PER_DAY, 86_400 * 1_000_000);
assert_eq!(MICROS_PER_MONTH, 30 * MICROS_PER_DAY);
```
Low-level: read_interval_at
If you have a raw data pointer (e.g., from duckdb_vector_get_data), you can
read an interval directly:
```rust
use quack_rs::interval::read_interval_at;

// SAFETY: data is a valid DuckDB INTERVAL vector data pointer, idx is in bounds.
let iv = unsafe { read_interval_at(data_ptr, row_idx) };
```
In practice you should use VectorReader::read_interval(row) instead, which
handles all safety invariants.
Complete example: aggregate over INTERVAL
```rust
#[derive(Default)]
struct TotalDurationState {
    total_micros: i64,
}
impl AggregateState for TotalDurationState {}

unsafe extern "C" fn update(
    _info: duckdb_function_info,
    input: duckdb_data_chunk,
    states: *mut duckdb_aggregate_state,
) {
    let reader = unsafe { VectorReader::new(input, 0) };
    for row in 0..reader.row_count() {
        if unsafe { !reader.is_valid(row) } {
            continue;
        }
        let iv = unsafe { reader.read_interval(row) };
        let us = iv.to_micros_saturating();
        let state_ptr = unsafe { *states.add(row) };
        if let Some(st) = unsafe { FfiState::<TotalDurationState>::with_state_mut(state_ptr) } {
            st.total_micros = st.total_micros.saturating_add(us);
        }
    }
}
```
Memory layout verification
DuckInterval includes a compile-time assertion that validates its size and
alignment against DuckDB's C struct. If the assertion fails, the crate will not
compile — catching any future mismatch at build time rather than runtime.
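The check can be sketched like this (the crate's actual assertion may differ in form; the guarantee is the same):

```rust
#[repr(C)]
pub struct DuckInterval {
    pub months: i32,
    pub days: i32,
    pub micros: i64,
}

// Evaluated at compile time: the build fails if the layout ever drifts
// from DuckDB's 16-byte duckdb_interval.
const _: () = assert!(std::mem::size_of::<DuckInterval>() == 16);
const _: () = assert!(std::mem::align_of::<DuckInterval>() == 8);

fn main() {
    assert_eq!(std::mem::size_of::<DuckInterval>(), 16);
}
```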
Testing Guide
quack-rs provides a two-tier testing strategy: pure-Rust unit tests for business logic (no DuckDB required), and SQLLogicTest E2E tests that run inside an actual DuckDB process.
Architectural limitation: the loadable-extension dispatch wall
This is the most important thing to understand before writing tests.
DuckDB loadable extensions use libduckdb-sys with
features = ["loadable-extension"]. This intentionally does not link the
DuckDB runtime into the extension binary. Instead, every DuckDB C API call
(duckdb_vector_get_data, duckdb_create_logical_type, etc.) goes through a
lazy dispatch table — a global struct of AtomicPtr<fn> pointers initialized
only when DuckDB calls duckdb_rs_extension_api_init at extension-load time.
In cargo test, no DuckDB process loads your extension. The dispatch table
is never initialized, and the first call to any DuckDB C API function panics:
```
DuckDB API not initialized
```
What this breaks
| API | Why it fails |
|---|---|
| `VectorReader::new` | calls `duckdb_vector_get_data` |
| `VectorWriter::new` | calls `duckdb_vector_get_data` |
| `Connection::register_*` | calls DuckDB registration C API |
| `LogicalType::new` | calls `duckdb_create_logical_type` |
| `LogicalType::drop` | calls `duckdb_destroy_logical_type` |
| `BindInfo::add_result_column` | calls `duckdb_bind_add_result_column` |
What still works in cargo test
| API | Why it works |
|---|---|
| `AggregateTestHarness` | pure Rust, zero DuckDB dependency |
| `MockVectorWriter` / `MockVectorReader` | in-memory buffers, zero DuckDB dependency |
| `MockRegistrar` | records registrations without calling C API |
| `SqlMacro::to_sql()` | generates SQL strings, no DuckDB needed |
| `interval_to_micros` | pure arithmetic |
| `validate` / `scaffold` | pure Rust |
| `InMemoryDb` | uses bundled DuckDB via `duckdb` crate (`bundled-test` feature) |
Mock types for callback logic
When your scalar or table function callback reads inputs and writes outputs,
extract that logic into a pure-Rust function. Then test it with
MockVectorReader (input) and MockVectorWriter (output):
```rust
use quack_rs::testing::{MockVectorReader, MockVectorWriter};

// Pure Rust logic — extracted from the FFI callback
fn compute_upper(reader: &MockVectorReader, writer: &mut MockVectorWriter) {
    for i in 0..reader.row_count() {
        if reader.is_valid(i) {
            let s = reader.try_get_str(i).unwrap_or("");
            writer.write_varchar(i, &s.to_uppercase());
        } else {
            writer.set_null(i);
        }
    }
}

#[test]
fn test_compute_upper() {
    let reader = MockVectorReader::from_strs([Some("hello"), None, Some("world")]);
    let mut writer = MockVectorWriter::new(3);
    compute_upper(&reader, &mut writer);
    assert_eq!(writer.try_get_str(0), Some("HELLO"));
    assert!(writer.is_null(1));
    assert_eq!(writer.try_get_str(2), Some("WORLD"));
}
```
The real FFI callback becomes a thin wrapper:
```rust
unsafe extern "C" fn my_scalar(
    _info: duckdb_function_info,
    input: duckdb_data_chunk,
    output: duckdb_vector,
) {
    // Real DuckDB wrappers — only used in production, not in cargo test
    let reader = unsafe { VectorReader::new(input, 0) };
    let mut writer = unsafe { VectorWriter::new(output) };
    // TODO: adapt mock-compatible logic to real readers/writers
}
```
Testing registration with MockRegistrar
MockRegistrar implements the Registrar trait without calling any DuckDB C API.
Use it to verify your registration function registers the right set of functions:
```rust
use quack_rs::connection::Registrar;
use quack_rs::testing::MockRegistrar;
use quack_rs::scalar::ScalarFunctionBuilder;
use quack_rs::types::TypeId;
use quack_rs::error::ExtensionError;

fn register_all(reg: &impl Registrar) -> Result<(), ExtensionError> {
    let upper = ScalarFunctionBuilder::new("upper_ext")
        .param(TypeId::Varchar)
        .returns(TypeId::Varchar);
    let lower = ScalarFunctionBuilder::new("lower_ext")
        .param(TypeId::Varchar)
        .returns(TypeId::Varchar);
    unsafe {
        reg.register_scalar(upper)?;
        reg.register_scalar(lower)?;
    }
    Ok(())
}

#[test]
fn test_register_all() {
    let mock = MockRegistrar::new();
    register_all(&mock).unwrap();
    assert_eq!(mock.total_registrations(), 2);
    assert!(mock.has_scalar("upper_ext"));
    assert!(mock.has_scalar("lower_ext"));
}
```
Limitation: `MockRegistrar` cannot be used with builders that hold `LogicalType` values (created via `.returns_logical()` or `.param_logical()`), because `LogicalType::drop` calls `duckdb_destroy_logical_type`. Use `TypeId` parameters with `MockRegistrar`.
SQL-level testing with InMemoryDb (bundled-test feature)
For SQL-level assertions — verifying that a SQL macro produces the correct output,
or that a CREATE TABLE + INSERT + SELECT pipeline works — enable the bundled-test
Cargo feature. This provides InMemoryDb, which wraps the duckdb crate's bundled
DuckDB and automatically initialises the loadable-extension dispatch table before
opening a connection (see Pitfall P9):
```toml
# In your extension's Cargo.toml
[dev-dependencies]
quack-rs = { version = "0.7", features = ["bundled-test"] }
```
Build time: enabling `bundled-test` compiles a full copy of DuckDB from source (the `duckdb` Rust crate with `features = ["bundled"]`) and a small C++ shim via the `cc` build dependency. Expect a 2–5 minute incremental build the first time, depending on your machine. This only affects the test build — it has no impact on your extension's release binary.
```rust
#[cfg(feature = "bundled-test")]
use quack_rs::testing::InMemoryDb;
use quack_rs::sql_macro::SqlMacro;

#[test]
fn test_clamp_macro_sql() {
    let db = InMemoryDb::open().unwrap();

    // Generate and execute the CREATE MACRO SQL
    let m = SqlMacro::scalar("clamp", &["x", "lo", "hi"], "greatest(lo, least(hi, x))").unwrap();
    db.execute_batch(&m.to_sql()).unwrap();

    // Verify correct output
    let result: i64 = db.query_one("SELECT clamp(5, 1, 10)").unwrap();
    assert_eq!(result, 5);
    let clamped: i64 = db.query_one("SELECT clamp(15, 1, 10)").unwrap();
    assert_eq!(clamped, 10);
}
```
Note: `InMemoryDb` cannot test your FFI callbacks (`VectorReader`, `VectorWriter`) because those still route through the `loadable-extension` dispatch table. Use `InMemoryDb` for SQL logic and mocks for callback logic.
Why two tiers?
Pitfall P3 — Unit tests are insufficient. 435 unit tests passed in duckdb-behavioral while the extension had three critical bugs: a SEGFAULT on load, 6 of 7 functions not registering, and wrong results from a combine bug. E2E tests caught all three.
| Test tier | What it catches | What it misses |
|---|---|---|
| Unit tests | Logic bugs in state structs | FFI wiring, registration failures, SEGFAULT |
| E2E tests | Everything above + FFI integration | Nothing (it's real DuckDB) |
Both tiers are required. Unit tests give fast, deterministic feedback. E2E tests prove the extension actually works inside DuckDB.
Unit tests with AggregateTestHarness
AggregateTestHarness<S> simulates the DuckDB aggregate lifecycle in pure Rust
without any DuckDB dependency:
```mermaid
flowchart LR
    N["new()"] --> U["update() × N"]
    U --> C["combine() *(optional)*"]
    C --> F["finalize()"]
```
Basic usage
```rust
use quack_rs::testing::AggregateTestHarness;
use quack_rs::aggregate::AggregateState;

#[derive(Default, Debug, PartialEq)]
struct SumState { total: i64 }
impl AggregateState for SumState {}

#[test]
fn test_sum() {
    let mut h = AggregateTestHarness::<SumState>::new();
    h.update(|s| s.total += 10);
    h.update(|s| s.total += 20);
    h.update(|s| s.total += 5);
    assert_eq!(h.finalize().total, 35);
}
```
Convenience: aggregate
For testing over a collection of inputs:
```rust
#[test]
fn test_word_count() {
    let result = AggregateTestHarness::<WordCountState>::aggregate(
        ["hello world", "one", "two three four", ""],
        |s, text| s.count += count_words(text),
    );
    assert_eq!(result.count, 6); // 2 + 1 + 3 + 0
}
```
Testing combine (Pitfall L1)
DuckDB creates fresh zero-initialized target states and calls combine to merge
into them. You MUST propagate ALL fields — including configuration fields —
not just accumulated data. Test this explicitly:
```rust
#[test]
fn combine_propagates_config() {
    let mut h1 = AggregateTestHarness::<MyState>::new();
    h1.update(|s| {
        s.window_size = 3600; // config field
        s.count += 5;         // data field
    });

    // h2 simulates a fresh zero-initialized state created by DuckDB
    let mut h2 = AggregateTestHarness::<MyState>::new();
    h2.combine(&h1, |src, tgt| {
        tgt.window_size = src.window_size; // MUST propagate config
        tgt.count += src.count;
    });

    let result = h2.finalize();
    assert_eq!(result.window_size, 3600); // Would be 0 if forgotten
    assert_eq!(result.count, 5);
}
```
Inspecting intermediate state
```rust
let mut h = AggregateTestHarness::<SumState>::new();
h.update(|s| s.total += 5);
assert_eq!(h.state().total, 5); // borrow without consuming
h.update(|s| s.total += 3);
assert_eq!(h.state().total, 8);
```
Resetting
```rust
let mut h = AggregateTestHarness::<SumState>::new();
h.update(|s| s.total = 999);
h.reset();
assert_eq!(h.state().total, 0); // back to S::default()
```
Pre-populating state
```rust
let initial = MyState { window_size: 3600, count: 0 };
let h = AggregateTestHarness::with_state(initial);
```
Unit tests for scalar functions
Scalar logic is pure Rust — test it directly:
```rust
// From examples/hello-ext/src/lib.rs — scalar function logic
pub fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}

#[test]
fn first_word_basic() {
    assert_eq!(first_word("hello world"), "hello");
    assert_eq!(first_word(" padded "), "padded");
    assert_eq!(first_word(""), "");
    assert_eq!(first_word(" "), "");
}
```
Unit tests for SQL macros
SqlMacro::to_sql() is pure Rust — no DuckDB connection needed:
```rust
use quack_rs::sql_macro::SqlMacro;

#[test]
fn scalar_macro_sql() {
    let m = SqlMacro::scalar("double_it", &["x"], "x * 2").unwrap();
    assert_eq!(m.to_sql(), "CREATE OR REPLACE MACRO double_it(x) AS (x * 2)");
}

#[test]
fn table_macro_sql() {
    let m = SqlMacro::table("recent", &["n"], "SELECT * FROM events LIMIT n").unwrap();
    assert_eq!(
        m.to_sql(),
        "CREATE OR REPLACE MACRO recent(n) AS TABLE SELECT * FROM events LIMIT n"
    );
}
```
E2E testing with SQLLogicTest
Community extensions are tested using DuckDB's SQLLogicTest format. This format runs SQL directly in DuckDB and verifies output line-by-line.
File location
```
test/sql/my_extension.test
```
Format
```
# my_extension tests
require my_extension

statement ok
LOAD my_extension;

query I
SELECT my_function('hello world');
----
2
```
Directives:
| Directive | Meaning |
|---|---|
| `require` | Skip test if extension not available |
| `statement ok` | SQL must succeed |
| `statement error` | SQL must fail |
| `query I` | Query returning one INTEGER column |
| `query II` | Query returning two columns |
| `query T` | Query returning one TEXT column |
| `----` | Expected output follows |
Installing DuckDB (1.4.4, 1.5.0, or 1.5.1)
A live DuckDB CLI is required for E2E testing. Install it via curl
(no system package manager needed). DuckDB 1.4.4, 1.5.0, or 1.5.1 all work —
they use the same C API version (v1.2.0). We recommend 1.5.1 for critical
WAL and ART index fixes:
```shell
# DuckDB 1.5.1 (recommended)
curl -fsSL https://github.com/duckdb/duckdb/releases/download/v1.5.1/duckdb_cli-linux-amd64.zip \
  -o /tmp/duckdb.zip \
  && unzip -o /tmp/duckdb.zip -d /tmp/ \
  && chmod +x /tmp/duckdb \
  && /tmp/duckdb --version
# → v1.5.1
```
For macOS, replace linux-amd64 with osx-universal. For Windows, use
windows-amd64 and unzip to a directory on %PATH%.
Running E2E tests
```shell
# Build the extension
cargo build --release

# Package with metadata footer (required by DuckDB's extension loader)
cargo run --bin append_metadata -- \
  target/release/libmy_extension.so \
  /tmp/my_extension.duckdb_extension \
  --abi-type C_STRUCT \
  --extension-version v0.1.0 \
  --duckdb-version v1.2.0 \
  --platform linux_amd64

# Load it in DuckDB CLI (-unsigned allows loading without a signed certificate)
/tmp/duckdb -unsigned -c "
SET allow_extensions_metadata_mismatch=true;
LOAD '/tmp/my_extension.duckdb_extension';
SELECT my_function('hello world');
"
```
The community extension CI runs SQLLogicTest automatically. Each function must have at least one test:
```
# Test NULL handling
query I
SELECT my_function(NULL);
----
NULL

# Test empty input
query I
SELECT my_function('');
----
0

# Test normal case
query I
SELECT my_function('hello world');
----
2
```
Pitfall P5 — SQLLogicTest does exact string matching. Copy expected values directly from DuckDB CLI output. NULL is represented as `NULL` (uppercase). Floats must match to the number of decimal places DuckDB outputs.
Property-based testing with proptest
The proptest crate is well-suited for testing aggregate logic over arbitrary
inputs:
```rust
use proptest::prelude::*;

proptest! {
    #[test]
    fn saturating_never_panics(months: i32, days: i32, micros: i64) {
        let iv = DuckInterval { months, days, micros };
        // Must not panic for any input
        let _ = interval_to_micros_saturating(iv);
    }
}
```
quack-rs's own test suite uses proptest for interval conversion and aggregate harness properties.
What to test
| Scenario | Unit | E2E |
|---|---|---|
| NULL input → NULL output | ✓ | |
| Empty string | ✓ | ✓ |
| Unicode strings | ✓ | |
| Numeric edge cases (0, MAX, MIN) | ✓ | |
| Combine propagates config | ✓ | |
| Multi-group aggregation | ✓ | |
| Function registration success | ✓ | |
| Extension loads without crash | | ✓ |
| SQL macro produces correct output | ✓ (to_sql) | ✓ |
Dev dependencies
```toml
[dev-dependencies]
quack-rs = { version = "0.7", features = [] }
proptest = "1"
```
The testing module is compiled unconditionally (not #[cfg(test)]) so it is
available as a dev-dependency to downstream crates.
Community Extensions
DuckDB's community extension ecosystem allows anyone to publish a loadable extension that DuckDB users can install with a single SQL command. This page covers everything you need to submit and maintain a community extension built with quack-rs.
Prerequisites
- A working extension that passes local E2E tests
- A GitHub repository (the community build runs from it)
- All functions tested with SQLLogicTest format
- A globally unique extension name
Scaffolding a new project
quack_rs::scaffold::generate_scaffold generates all required files from a
single function call:
```rust
use quack_rs::scaffold::{ScaffoldConfig, generate_scaffold};

let config = ScaffoldConfig {
    name: "my_extension".to_string(),
    description: "Does something useful".to_string(),
    version: "0.1.0".to_string(),
    license: "MIT".to_string(),
    maintainer: "Your Name".to_string(),
    github_repo: "yourorg/duckdb-my-extension".to_string(),
    excluded_platforms: vec![],
};

let files = generate_scaffold(&config).expect("scaffold failed");
for file in &files {
    std::fs::create_dir_all(std::path::Path::new(&file.path).parent().unwrap()).unwrap();
    std::fs::write(&file.path, &file.content).unwrap();
}
```
This generates:
```
my_extension/
├── Cargo.toml
├── Makefile
├── extension_config.cmake
├── src/lib.rs
├── src/wasm_lib.rs
├── description.yml
├── test/sql/my_extension.test
├── .github/workflows/extension-ci.yml
├── .gitmodules
├── .gitignore
└── .cargo/config.toml
```
description.yml
Required fields for community submission:
```yaml
extension:
  name: my_extension
  description: One-line description of what your extension does
  version: 0.1.0
  language: Rust
  build: cargo
  license: MIT
  requires_toolchains: rust;python3
  excluded_platforms: "" # or "wasm_mvp;wasm_eh;wasm_threads"
  maintainers:
    - Your Name

repo:
  github: yourorg/duckdb-my-extension
  ref: main
```
Use quack_rs::validate to pre-validate fields before submission:
```rust
use quack_rs::validate::{
    validate_extension_name, validate_extension_version,
    validate_spdx_license, validate_excluded_platforms_str,
};

validate_extension_name("my_extension")?;
validate_extension_version("0.1.0")?;
validate_spdx_license("MIT")?;
validate_excluded_platforms_str("wasm_mvp;wasm_eh")?;
```
Naming rules
Extension names must satisfy all of the following:
- Match `^[a-z][a-z0-9_-]*$` (lowercase, digits, hyphens, underscores)
- Not exceed 64 characters
- Be globally unique across the entire DuckDB community extensions ecosystem
Check existing names at community-extensions.duckdb.org before choosing. Use vendor-prefixed names to avoid collisions:
```
myorg_analytics   ✓
analytics         ✗ (likely taken or too generic)
```
Pitfall P1 — The `[lib] name` in `Cargo.toml` MUST exactly match the extension name. If your crate name is `duckdb-my-ext` (producing `libduckdb_my_ext.so`) but `description.yml` says `name: my_ext`, the community build fails with `FileNotFoundError`.
Versioning
| Format | Example | Meaning |
|---|---|---|
| 7+ hex chars | 690bfc5 | Unstable — no guarantees |
| `0.y.z` | 0.1.0 | Pre-release — working toward stability |
| `x.y.z` (x > 0) | 1.0.0 | Stable — full semver guarantees |
Use validate_extension_version to accept all three formats, and
classify_extension_version to determine the stability tier:
```rust
use quack_rs::validate::semver::classify_extension_version;

match classify_extension_version("0.1.0")? {
    ExtensionStability::Unstable => println!("git hash"),
    ExtensionStability::PreRelease => println!("0.y.z"),
    ExtensionStability::Stable => println!("x.y.z, x>0"),
}
```
Platform targets
Community extensions are built for:
| Platform | Description |
|---|---|
| `linux_amd64` | Linux x86_64 |
| `linux_amd64_gcc4` | Linux x86_64 (GCC 4 ABI) |
| `linux_arm64` | Linux AArch64 |
| `osx_amd64` | macOS x86_64 |
| `osx_arm64` | macOS Apple Silicon |
| `windows_amd64` | Windows x86_64 |
| `windows_amd64_mingw` | Windows x86_64 (MinGW) |
| `windows_arm64` | Windows AArch64 |
| `wasm_mvp` | WebAssembly (MVP) |
| `wasm_eh` | WebAssembly (exception handling) |
| `wasm_threads` | WebAssembly (threads) |
If your extension cannot be built for a platform (e.g., it uses a
platform-specific system library), add it to excluded_platforms:
```rust
ScaffoldConfig {
    excluded_platforms: vec![
        "wasm_mvp".to_string(),
        "wasm_eh".to_string(),
        "wasm_threads".to_string(),
    ],
    // ...
}
```
Validate individual platform names with validate_platform:
```rust
use quack_rs::validate::validate_platform;

validate_platform("linux_amd64")?; // Ok
validate_platform("invalid")?;     // Err
```
Cargo.toml requirements
```toml
[package]
name = "my_extension"
version = "0.1.0"
edition = "2021"

[lib]
name = "my_extension" # Must match description.yml `name`
crate-type = ["cdylib", "rlib"]

[dependencies]
quack-rs = "0.7"
libduckdb-sys = { version = ">=1.4.4, <2", features = ["loadable-extension"] }

[profile.release]
panic = "abort" # Required — no stack unwinding in FFI
opt-level = 3
lto = "thin"
strip = "symbols"
```
Pitfall ADR-1 — Do NOT use the `duckdb` crate's `bundled` feature. A loadable extension must link against the DuckDB that loads it, not bundle its own copy. `libduckdb-sys` with `loadable-extension` provides lazy function pointers populated by DuckDB at load time.
Release profile check
The validate_release_profile validator checks that your release profile is
correctly configured:
```rust
use quack_rs::validate::validate_release_profile;

// Pass all four release profile settings from your Cargo.toml
validate_release_profile("abort", "true", "3", "1")?;  // Ok
validate_release_profile("unwind", "true", "3", "1")?; // Err — panics across FFI are UB
```
CI workflow
The scaffold generates .github/workflows/extension-ci.yml which:
- Runs on push and pull request
- Checks, lints, and tests in Rust (all platforms)
- Calls `extension-ci-tools` to build the `.duckdb_extension` artifact
- Runs SQLLogicTest integration tests
After scaffolding:
```shell
cd my_extension
git init
git submodule add https://github.com/duckdb/extension-ci-tools.git extension-ci-tools
git submodule update --init --recursive
make configure
make release
```
Pitfall P4 — The `extension-ci-tools` submodule must be initialized. `make configure` fails if the submodule is missing.
Submitting to the community registry
- Create a pull request against the community-extensions repository
- Add your `description.yml` under `extensions/my_extension/description.yml`
- CI runs automatically to verify the build
- Once approved, users can install your extension:
```sql
INSTALL my_extension FROM community;
LOAD my_extension;
```
Binary compatibility
Extension binaries are tied to a specific DuckDB version. When DuckDB releases a new version:
- New binaries must be built against that version
- Old binaries will be refused by the new DuckDB runtime
- The community build pipeline re-builds all extensions for each DuckDB release
Pin `libduckdb-sys` with an exact (`=`) version requirement so you always build against the version you intend. The `quack_rs::DUCKDB_API_VERSION` constant (`"v1.2.0"`) is passed to `init_extension` and must match the C API version of your pinned `libduckdb-sys`.
Pitfall P2 — The `-dv` flag to `append_extension_metadata.py` must be the C API version (`v1.2.0`), not the DuckDB release version (`v1.4.4`). Use `quack_rs::DUCKDB_API_VERSION` to avoid hardcoding this.
Security considerations
Community extensions are not vetted for security by the DuckDB team:
- Never panic across FFI boundaries (`panic = "abort"` enforces this)
- Validate user inputs at system boundaries (the extension entry point is the boundary)
- Do not include secrets, API keys, or credentials in your binary
- Dynamic SQL in SQL macros must not construct queries from unsanitized user data
Pitfall Catalog
All known DuckDB Rust FFI pitfalls, discovered while building duckdb-behavioral, a production DuckDB community extension. Any developer building a Rust DuckDB extension is likely to hit most of these; quack-rs makes the majority of them impossible.
L1: COMBINE must propagate ALL config fields
Status: Testable with AggregateTestHarness.
Symptom: Aggregate function returns wrong results. No error, no crash.
Root cause: DuckDB's segment tree creates fresh zero-initialized target
states via state_init, then calls combine to merge source states into them.
If your combine only propagates data fields (count, sum) but omits
configuration fields (window_size, mode), the configuration will be zero at
finalize time, silently corrupting results.
This bug passed 435 unit tests before being caught by E2E tests.
Fix:
```rust
unsafe extern "C" fn combine(
    _info: duckdb_function_info,
    source: *mut duckdb_aggregate_state,
    target: *mut duckdb_aggregate_state,
    count: idx_t,
) {
    for i in 0..count as usize {
        let src_ptr = unsafe { *source.add(i) };
        let tgt_ptr = unsafe { *target.add(i) };
        if let (Some(src), Some(tgt)) = (
            FfiState::<MyState>::with_state(src_ptr),
            FfiState::<MyState>::with_state_mut(tgt_ptr),
        ) {
            tgt.window_size = src.window_size; // config — MUST copy
            tgt.mode = src.mode;               // config — MUST copy
            tgt.count += src.count;            // data — accumulate
        }
    }
}
```
Test this with AggregateTestHarness::combine — see Testing Guide.
L2: State destroy double-free
Status: Made impossible by FfiState<T>.
Symptom: Crash or memory corruption on extension unload.
Root cause: If state_destroy frees the inner Box but does not null the
pointer, a second state_destroy call (common in error paths) frees
already-freed memory → undefined behavior.
Fix: FfiState<T>::destroy_callback nulls inner after freeing. Use it
instead of writing your own destructor:
```rust
unsafe extern "C" fn state_destroy(states: *mut duckdb_aggregate_state, count: idx_t) {
    unsafe { FfiState::<MyState>::destroy_callback(states, count) };
}
```
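The discipline behind `destroy_callback` can be shown in miniature. This sketch is illustrative only: a single slot stands in for DuckDB's aggregate state array, and `destroy_slot` is a hypothetical helper, not a quack-rs API.

```rust
use std::ptr;

// Hypothetical helper showing the null-after-free discipline:
// free the inner allocation exactly once, then null the slot so a
// second call (common in error paths) becomes a harmless no-op.
unsafe fn destroy_slot(slot: *mut *mut i64) {
    unsafe {
        let inner = *slot;
        if !inner.is_null() {
            drop(Box::from_raw(inner)); // free exactly once
            *slot = ptr::null_mut();    // null it so a repeat call is a no-op
        }
    }
}

fn main() {
    let mut slot: *mut i64 = Box::into_raw(Box::new(42));
    unsafe {
        destroy_slot(&mut slot); // frees the state
        destroy_slot(&mut slot); // error-path re-entry: safe no-op
    }
    assert!(slot.is_null());
}
```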
L3: No panic across FFI boundaries
Status: Made impossible by init_extension and panic = "abort".
Symptom: Extension causes DuckDB to crash or behave unpredictably.
Root cause: A `panic!()` or `.unwrap()` in an `unsafe extern "C"` function is
undefined behavior: panics cannot unwind across FFI boundaries in Rust.
Fix: Use Result and ? inside init_extension. Never use unwrap() in
FFI callbacks. FfiState::with_state_mut returns Option, not Result, so
callers use if let:
```rust
// Safe pattern — no unwrap in FFI callback
if let Some(st) = unsafe { FfiState::<MyState>::with_state_mut(state_ptr) } {
    st.count += 1;
}

// Dangerous — never do this in an FFI callback
let st = unsafe { FfiState::<MyState>::with_state_mut(state_ptr) }.unwrap(); // UB if None
```
The scaffold-generated Cargo.toml sets panic = "abort" in the release
profile, which terminates the process instead of unwinding — still bad, but not
undefined behavior.
L4: ensure_validity_writable is required before NULL output
Status: Made impossible by VectorWriter::set_null.
Symptom: SEGFAULT when writing NULL values to the output vector.
Root cause: duckdb_vector_get_validity returns an uninitialized pointer if
duckdb_vector_ensure_validity_writable has not been called first. Writing to
an uninitialized address → SEGFAULT.
Fix: Always call duckdb_vector_ensure_validity_writable before accessing
the validity bitmap on the write path. VectorWriter::set_null does this
automatically:
```rust
// Correct — handled by set_null
unsafe { writer.set_null(row) };

// Wrong — validity bitmap may not be allocated yet
// let validity = duckdb_vector_get_validity(output);
// set_bit(validity, row, false); // SEGFAULT
```
L5: Boolean reading must use u8 != 0, not *const bool
Status: Made impossible by VectorReader::read_bool.
Symptom: Undefined behavior; Rust requires bool to be exactly 0 or 1.
Root cause: DuckDB's C API does not guarantee that boolean values in vectors
are exactly 0 or 1. Casting a value such as 2 or 255 to a Rust `bool` is
undefined behavior.
Fix: Read as u8 and compare with != 0. VectorReader::read_bool always
does this:
```rust
let b: bool = unsafe { reader.read_bool(row) }; // safe: uses u8 != 0 internally
```
L6: Function set name must be set on EACH member
Status: Made impossible by AggregateFunctionSetBuilder.
Symptom: Functions are silently not registered. No error returned.
Root cause: When using duckdb_register_aggregate_function_set, the function
name must be set on EACH individual duckdb_aggregate_function using
duckdb_aggregate_function_set_name, not just on the set.
This is completely undocumented. Discovered by reading DuckDB's C++ test code
at test/api/capi/test_capi_aggregate_functions.cpp.
In duckdb-behavioral, 6 of 7 functions failed to register silently due to this bug.
Fix: AggregateFunctionSetBuilder calls duckdb_aggregate_function_set_name
on every individual function before adding it to the set. Use it instead of
managing the set manually.
L7: LogicalType memory leak
Status: Made impossible by LogicalType RAII wrapper.
Symptom: Memory leak proportional to number of registered functions.
Root cause: duckdb_create_logical_type allocates memory that must be freed
with duckdb_destroy_logical_type. Forgetting leaks memory.
Fix: LogicalType implements Drop and calls duckdb_destroy_logical_type
automatically when it goes out of scope.
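The same idea in miniature: this self-contained sketch models the `Drop`-based cleanup, with `destroy_logical_type` and the call counter standing in for `duckdb_destroy_logical_type` (neither is a real quack-rs or DuckDB symbol).

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Stand-in for duckdb_destroy_logical_type; the counter proves it runs once.
static DESTROY_CALLS: AtomicUsize = AtomicUsize::new(0);

fn destroy_logical_type(_handle: usize) {
    DESTROY_CALLS.fetch_add(1, Ordering::SeqCst);
}

struct LogicalTypeModel {
    handle: usize,
}

impl Drop for LogicalTypeModel {
    fn drop(&mut self) {
        // freed exactly once, with no manual call at every early-return path
        destroy_logical_type(self.handle);
    }
}

fn main() {
    {
        let _lt = LogicalTypeModel { handle: 1 };
    } // `_lt` leaves scope here, so the destroy call happens automatically
    assert_eq!(DESTROY_CALLS.load(Ordering::SeqCst), 1);
}
```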
P1: Library name must match extension name
Status: Must be configured in Cargo.toml. Scaffold handles this.
Symptom: Community build fails with FileNotFoundError.
Root cause: The community build expects lib{extension_name}.so. If the
Cargo crate name produces a different .so filename, the build fails.
Fix: Set name explicitly in [lib]:
```toml
[lib]
name = "my_extension"  # Must match description.yml `name: my_extension`
crate-type = ["cdylib", "rlib"]
```
P2: Metadata version is C API version, not DuckDB version
Status: DUCKDB_API_VERSION constant encodes the correct value.
Symptom: Metadata script fails or produces incorrect metadata.
Root cause: The -dv flag to append_extension_metadata.py must be the
C API version (v1.2.0), not the DuckDB release version (v1.4.4). These are
different strings.
Fix: Use quack_rs::DUCKDB_API_VERSION ("v1.2.0") in init_extension,
and use the same version with append_extension_metadata.py -dv v1.2.0.
P3: E2E testing is mandatory
Status: Documented. See Testing Guide.
Symptom: All unit tests pass but the extension is completely broken.
Root cause: Unit tests cannot detect SEGFAULTs on load, silent registration failures, or wrong results from combine bugs.
Fix: Always run E2E tests using an actual DuckDB binary. The scaffold generates a complete SQLLogicTest skeleton.
P4: extension-ci-tools submodule must be initialized
Status: Build-time check.
Symptom: make configure or make release fails.
Fix:
```sh
git submodule update --init --recursive
```
P5: SQLLogicTest expected values must match exactly
Status: Test-authoring care required.
Symptom: Tests fail in CI but pass locally (or vice versa).
Root cause: SQLLogicTest does exact string matching. Output format (decimal places, NULL representation, column separators) must match character-for-character.
Fix: Generate expected values by running the SQL in DuckDB CLI and copying
the output. NULL is NULL (uppercase). Integers have no decimal places.
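A minimal test file makes the exact-match rule concrete. This is a hypothetical example (the `require` line assumes a community extension named `my_extension`); note the uppercase `NULL` and the integer without decimal places:

```
require my_extension

query I
SELECT 40 + 2;
----
42

query I
SELECT NULL;
----
NULL
```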
P6: duckdb_register_aggregate_function_set silently fails
Status: Builder returns Err. Also see L6.
Symptom: Function appears registered but is not found in SQL.
Root cause: The return value of duckdb_register_aggregate_function_set is
often ignored. When it returns DuckDBError, the function set is not registered.
Fix: The builder checks the return value and propagates it as Err.
P7: duckdb_string_t format is undocumented
Status: Handled by VectorReader::read_str and DuckStringView.
Symptom: VARCHAR reading produces garbage, empty strings, or crashes.
Root cause: DuckDB stores strings in a 16-byte struct with two formats
(inline ≤ 12 bytes, pointer > 12 bytes) that are not documented in
libduckdb-sys.
Fix: Use VectorReader::read_str(row). See
NULL Handling & Strings.
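A conceptual model of the two formats helps make the split concrete. `DuckStringModel` and `as_str` below are illustrative stand-ins only; the real `duckdb_string_t` is a 16-byte union defined in DuckDB's C headers, and `read_str` hides it entirely.

```rust
// Conceptual stand-in for the inline/pointer split, not the real layout.
enum DuckStringModel<'a> {
    Inline { len: usize, bytes: [u8; 12] }, // payload stored in the struct itself
    Pointer { len: usize, data: &'a [u8] }, // payload stored behind a pointer
}

fn as_str<'a>(s: &'a DuckStringModel<'a>) -> &'a str {
    match s {
        DuckStringModel::Inline { len, bytes } => {
            std::str::from_utf8(&bytes[..*len]).expect("valid UTF-8")
        }
        DuckStringModel::Pointer { len, data } => {
            std::str::from_utf8(&data[..*len]).expect("valid UTF-8")
        }
    }
}

fn main() {
    let mut bytes = [0u8; 12];
    bytes[..5].copy_from_slice(b"hello");
    let short = DuckStringModel::Inline { len: 5, bytes };
    assert_eq!(as_str(&short), "hello");

    let backing = b"a string longer than twelve bytes".to_vec();
    let long = DuckStringModel::Pointer { len: backing.len(), data: backing.as_slice() };
    assert_eq!(as_str(&long), "a string longer than twelve bytes");
}
```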
P8: INTERVAL struct layout is undocumented
Status: Handled by DuckInterval and read_interval_at.
Symptom: Interval calculations produce wrong results or crashes.
Root cause: DuckDB's INTERVAL is { months: i32, days: i32, micros: i64 }
(16 bytes total). This is not documented in libduckdb-sys. Month conversion
uses 1 month = 30 days (DuckDB's approximation).
Fix: Use VectorReader::read_interval(row) and DuckInterval. See
INTERVAL Type.
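The layout and the month approximation can be modelled in a few lines. `IntervalModel` and `to_approx_micros` are illustrative stand-ins, not quack-rs APIs (quack-rs exposes `DuckInterval` instead).

```rust
// Mirrors the documented { months, days, micros } layout for illustration.
#[repr(C)]
struct IntervalModel {
    months: i32,
    days: i32,
    micros: i64,
}

const MICROS_PER_DAY: i64 = 86_400_000_000;

fn to_approx_micros(iv: &IntervalModel) -> i64 {
    // DuckDB's approximation: 1 month = 30 days
    (iv.months as i64 * 30 + iv.days as i64) * MICROS_PER_DAY + iv.micros
}

fn main() {
    // { months: i32, days: i32, micros: i64 } packs into exactly 16 bytes
    assert_eq!(std::mem::size_of::<IntervalModel>(), 16);

    let one_month = IntervalModel { months: 1, days: 0, micros: 0 };
    assert_eq!(to_approx_micros(&one_month), 30 * MICROS_PER_DAY);
}
```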
P9: loadable-extension dispatch table uninitialised in cargo test
Status: Fixed. InMemoryDb::open() initialises the dispatch table
automatically.
Symptom: All three InMemoryDb unit tests panic at runtime:
```
thread 'testing::in_memory_db::tests::in_memory_db_opens' panicked at
'DuckDB API not initialized or DuckDB feature omitted'
```
This failure appears only when running cargo test --features bundled-test.
Regular cargo test (no feature) does not exercise this code path, so CI can
miss it entirely.
Root cause: Cargo's feature-unification merges loadable-extension (from
the main libduckdb-sys dependency) and bundled-full (pulled in by the
duckdb crate's features = ["bundled"]) into a single libduckdb-sys build
with both features active. In loadable-extension mode every DuckDB C API
call is routed through an AtomicPtr<fn> dispatch table, which is normally
populated at extension-load time when DuckDB calls
duckdb_rs_extension_api_init. In cargo test, no DuckDB host process loads
the extension, so the table stays uninitialised and every call panics.
Discovery: This was triggered by the crates.io release workflow (which runs
--all-features) failing on macOS. Regular CI (--no-default-features,
--all-targets) never compiled the bundled-test path, so the bug was hidden
during development and code review.
Fix (implemented in quack-rs 0.6.0):
- `src/testing/bundled_api_init.cpp` — a thin C++ shim that wraps DuckDB's internal `CreateAPIv1()` (from `duckdb/main/capi/extension_api.hpp`) as a C-linkage symbol:

  ```cpp
  #include "duckdb/main/capi/extension_api.hpp"

  extern "C" duckdb_ext_api_v1 quack_rs_create_api_v1() {
      return CreateAPIv1();
  }
  ```

- `build.rs` — compiles the shim (via the `cc` crate) only when the `bundled-test` feature is active, locating the DuckDB headers from the `libduckdb-sys` build output directory.
- `InMemoryDb::open()` — calls `init_dispatch_table_once()` before opening the connection. That function calls `quack_rs_create_api_v1()` once and feeds the result through `duckdb_rs_extension_api_init`, populating all 459 `AtomicPtr` slots in the dispatch table. A `std::sync::Once` guard makes it safe to call from any number of threads and test cases.
- CI `test-bundled` job — runs `cargo test --all-targets --features bundled-test` on Linux, macOS, and Windows on every PR, so this class of failure is caught before release.
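The guard pattern in isolation looks like this. `populate_dispatch_table` stands in for the real `quack_rs_create_api_v1` / `duckdb_rs_extension_api_init` hand-off; the counter just demonstrates single entry.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Once;

static INIT: Once = Once::new();
static POPULATED: AtomicUsize = AtomicUsize::new(0);

// Stand-in for the expensive one-time dispatch table population.
fn populate_dispatch_table() {
    POPULATED.fetch_add(1, Ordering::SeqCst);
}

fn init_dispatch_table_once() {
    // call_once guarantees exactly one execution, even across threads
    INIT.call_once(populate_dispatch_table);
}

fn main() {
    init_dispatch_table_once();
    init_dispatch_table_once(); // every later test case: cheap no-op
    assert_eq!(POPULATED.load(Ordering::SeqCst), 1);
}
```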
ABI compatibility note: DuckDB's duckdb_ext_api_v1 struct is defined
identically in both the public duckdb_extension.h (used by libduckdb-sys
bindgen) and the internal extension_api.hpp (used by CreateAPIv1()). Both
include the DUCKDB_EXTENSION_API_VERSION_UNSTABLE fields. CreateAPIv1() sets
all 459 fields. The Rust and C++ structs are produced from the same DuckDB
release and therefore stay in sync.
Risk table (using DuckDB's internal C++ API):
| Risk | Mitigation |
|---|---|
extension_api.hpp is renamed or moved | build.rs fails with a clear compile error |
CreateAPIv1() is renamed | Same — C++ compile error |
duckdb_ext_api_v1 gains new fields | CreateAPIv1() fills new fields too |
duckdb_ext_api_v1 field order changes | Both structs from same DuckDB release, stay in sync |
libduckdb-sys drops loadable-extension dispatch | Problem disappears; Once guard becomes cheap no-op |
Summary
| Pitfall | SDK status | Your action |
|---|---|---|
| L1: combine config fields | Testable | Test with AggregateTestHarness::combine |
| L2: state double-free | Prevented | Use FfiState::destroy_callback |
| L3: panic across FFI | Prevented | Use init_extension, no unwrap in callbacks |
| L4: validity bitmap SEGFAULT | Prevented | Use VectorWriter::set_null |
| L5: bool UB | Prevented | Use VectorReader::read_bool |
| L6: function set name | Prevented | Use AggregateFunctionSetBuilder |
| L7: LogicalType leak | Prevented | Use LogicalType (RAII) |
| P1: lib name mismatch | Scaffold | Set [lib] name in Cargo.toml |
| P2: API version string | Constant | Use DUCKDB_API_VERSION |
| P3: unit tests insufficient | Documented | Write SQLLogicTest E2E tests |
| P4: submodule not initialized | Build-time | git submodule update --init |
| P5: SQLLogicTest exact match | Documented | Copy output from DuckDB CLI |
| P6: register set silent fail | Prevented | Builder returns Err |
| P7: VARCHAR format undocumented | Prevented | Use VectorReader::read_str |
| P8: INTERVAL layout undocumented | Prevented | Use DuckInterval |
| P9: dispatch table uninitialised | Fixed | InMemoryDb::open() initialises it via C++ shim |
TypeId Reference
quack_rs::types::TypeId is an ergonomic enum of all DuckDB column types
supported by the builder APIs. It wraps the DUCKDB_TYPE_* integer constants
from libduckdb-sys and provides safe, named variants.
Full variant table
| Variant | SQL name | libduckdb-sys constant | Notes |
|---|---|---|---|
TypeId::Boolean | BOOLEAN | DUCKDB_TYPE_BOOLEAN | true/false stored as u8 |
TypeId::TinyInt | TINYINT | DUCKDB_TYPE_TINYINT | 8-bit signed |
TypeId::SmallInt | SMALLINT | DUCKDB_TYPE_SMALLINT | 16-bit signed |
TypeId::Integer | INTEGER | DUCKDB_TYPE_INTEGER | 32-bit signed |
TypeId::BigInt | BIGINT | DUCKDB_TYPE_BIGINT | 64-bit signed |
TypeId::UTinyInt | UTINYINT | DUCKDB_TYPE_UTINYINT | 8-bit unsigned |
TypeId::USmallInt | USMALLINT | DUCKDB_TYPE_USMALLINT | 16-bit unsigned |
TypeId::UInteger | UINTEGER | DUCKDB_TYPE_UINTEGER | 32-bit unsigned |
TypeId::UBigInt | UBIGINT | DUCKDB_TYPE_UBIGINT | 64-bit unsigned |
TypeId::HugeInt | HUGEINT | DUCKDB_TYPE_HUGEINT | 128-bit signed |
TypeId::Float | FLOAT | DUCKDB_TYPE_FLOAT | 32-bit IEEE 754 |
TypeId::Double | DOUBLE | DUCKDB_TYPE_DOUBLE | 64-bit IEEE 754 |
TypeId::Timestamp | TIMESTAMP | DUCKDB_TYPE_TIMESTAMP | µs since Unix epoch |
TypeId::TimestampTz | TIMESTAMPTZ | DUCKDB_TYPE_TIMESTAMP_TZ | timezone-aware timestamp |
TypeId::Date | DATE | DUCKDB_TYPE_DATE | days since epoch |
TypeId::Time | TIME | DUCKDB_TYPE_TIME | µs since midnight |
TypeId::Interval | INTERVAL | DUCKDB_TYPE_INTERVAL | months + days + µs |
TypeId::Varchar | VARCHAR | DUCKDB_TYPE_VARCHAR | UTF-8 string |
TypeId::Blob | BLOB | DUCKDB_TYPE_BLOB | binary data |
TypeId::Decimal | DECIMAL | DUCKDB_TYPE_DECIMAL | fixed-point decimal |
TypeId::TimestampS | TIMESTAMP_S | DUCKDB_TYPE_TIMESTAMP_S | seconds since epoch |
TypeId::TimestampMs | TIMESTAMP_MS | DUCKDB_TYPE_TIMESTAMP_MS | milliseconds since epoch |
TypeId::TimestampNs | TIMESTAMP_NS | DUCKDB_TYPE_TIMESTAMP_NS | nanoseconds since epoch |
TypeId::Enum | ENUM | DUCKDB_TYPE_ENUM | enumeration type |
TypeId::List | LIST | DUCKDB_TYPE_LIST | variable-length list |
TypeId::Struct | STRUCT | DUCKDB_TYPE_STRUCT | named fields (row type) |
TypeId::Map | MAP | DUCKDB_TYPE_MAP | key-value pairs |
TypeId::Uuid | UUID | DUCKDB_TYPE_UUID | 128-bit UUID |
TypeId::Union | UNION | DUCKDB_TYPE_UNION | tagged union of types |
TypeId::Bit | BIT | DUCKDB_TYPE_BIT | bitstring |
TypeId::TimeTz | TIMETZ | DUCKDB_TYPE_TIME_TZ | timezone-aware time |
TypeId::UHugeInt | UHUGEINT | DUCKDB_TYPE_UHUGEINT | 128-bit unsigned |
TypeId::Array | ARRAY | DUCKDB_TYPE_ARRAY | fixed-length array |
TypeId::TimeNs | TIME_NS | DUCKDB_TYPE_TIME_NS | nanosecond-precision time (duckdb-1-5) |
TypeId::Any | ANY | DUCKDB_TYPE_ANY | wildcard for function signatures (duckdb-1-5) |
TypeId::Varint | VARINT | DUCKDB_TYPE_BIGNUM | variable-length integer (duckdb-1-5) |
TypeId::SqlNull | SQLNULL | DUCKDB_TYPE_SQLNULL | explicit SQL NULL type (duckdb-1-5) |
TypeId::IntegerLiteral | INTEGER_LITERAL | DUCKDB_TYPE_INTEGER_LITERAL | unresolved integer literal (duckdb-1-5) |
TypeId::StringLiteral | STRING_LITERAL | DUCKDB_TYPE_STRING_LITERAL | unresolved string literal (duckdb-1-5) |
Methods
to_duckdb_type() → DUCKDB_TYPE
Converts to the raw C API integer constant. Used internally by the builder APIs.
```rust
use quack_rs::types::TypeId;

let raw: libduckdb_sys::DUCKDB_TYPE = TypeId::BigInt.to_duckdb_type();
```
from_duckdb_type(raw) → TypeId
Converts a raw DUCKDB_TYPE constant back into a TypeId. Panics if the value
does not match any known DUCKDB_TYPE constant.
```rust
use quack_rs::types::TypeId;

let type_id = TypeId::from_duckdb_type(libduckdb_sys::DUCKDB_TYPE_DUCKDB_TYPE_BIGINT);
assert_eq!(type_id, TypeId::BigInt);
```
sql_name() → &'static str
Returns the SQL type name as a static string.
```rust
assert_eq!(TypeId::BigInt.sql_name(), "BIGINT");
assert_eq!(TypeId::Varchar.sql_name(), "VARCHAR");
assert_eq!(TypeId::TimestampTz.sql_name(), "TIMESTAMPTZ");
```
Display
TypeId implements Display, which outputs the SQL name:
```rust
println!("{}", TypeId::Interval);       // prints: INTERVAL
let s = format!("{}", TypeId::UBigInt); // "UBIGINT"
```
VectorReader/VectorWriter mapping
The read and write methods on VectorReader/VectorWriter map to TypeId
variants as follows:
| TypeId | Read method | Write method | Rust type |
|---|---|---|---|
Boolean | read_bool | write_bool | bool |
TinyInt | read_i8 | write_i8 | i8 |
SmallInt | read_i16 | write_i16 | i16 |
Integer | read_i32 | write_i32 | i32 |
BigInt | read_i64 | write_i64 | i64 |
UTinyInt | read_u8 | write_u8 | u8 |
USmallInt | read_u16 | write_u16 | u16 |
UInteger | read_u32 | write_u32 | u32 |
UBigInt | read_u64 | write_u64 | u64 |
Float | read_f32 | write_f32 | f32 |
Double | read_f64 | write_f64 | f64 |
Varchar | read_str | write_varchar | &str |
Interval | read_interval | write_interval | DuckInterval |
HugeInt, Blob, List, Struct, Map, Uuid, Date, Time, Timestamp,
TimestampTz, Decimal, TimestampS, TimestampMs, TimestampNs, Enum,
Union, Bit, TimeTz, UHugeInt, Array, TimeNs, Any, Varint, SqlNull,
IntegerLiteral, StringLiteral do not yet have dedicated read/write helpers.
Access these via the raw data pointer from duckdb_vector_get_data.
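For example, a `DATE` column stores `i32` days since the Unix epoch, so a read without a helper looks like the following. The sketch uses a local array in place of the pointer `duckdb_vector_get_data` would return inside a real callback; `read_date_at` is a hypothetical helper.

```rust
// Hypothetical raw-pointer read for a DATE column (i32 days since epoch).
unsafe fn read_date_at(data: *const i32, row: usize) -> i32 {
    // caller must guarantee `row` is within the vector's length
    unsafe { *data.add(row) }
}

fn main() {
    let column: [i32; 3] = [0, 18_993, 19_358]; // days since 1970-01-01
    let days = unsafe { read_date_at(column.as_ptr(), 1) };
    assert_eq!(days, 18_993);
}
```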
Properties
TypeId implements Debug, Clone, Copy, PartialEq, Eq, and Hash,
making it usable as map keys, set elements, and in match expressions:
```rust
use std::collections::HashMap;
use quack_rs::types::TypeId;

let mut type_names: HashMap<TypeId, &str> = HashMap::new();
type_names.insert(TypeId::BigInt, "count");
type_names.insert(TypeId::Varchar, "label");
```
#[non_exhaustive]
TypeId is marked #[non_exhaustive]. This means future DuckDB versions may
add new variants without it being a breaking change. If you match on TypeId,
include a wildcard arm:
```rust
match type_id {
    TypeId::BigInt => { /* ... */ }
    TypeId::Varchar => { /* ... */ }
    _ => { /* handle future types */ }
}
```
LogicalType
For types that require runtime parameters (such as DECIMAL(p, s) or
parameterized LIST), use quack_rs::types::LogicalType:
```rust
use quack_rs::types::{LogicalType, TypeId};

let lt = LogicalType::new(TypeId::BigInt);

// or use the From impl:
let lt: LogicalType = TypeId::BigInt.into();

// LogicalType implements Drop → calls duckdb_destroy_logical_type automatically
```
LogicalType wraps duckdb_logical_type with RAII cleanup, preventing the
memory leak described in Pitfall L7.
Constructors
| Constructor | Creates |
|---|---|
new(type_id) | Simple type from a TypeId |
from_raw(ptr) | Takes ownership of a raw handle (unsafe) |
decimal(width, scale) | DECIMAL(width, scale) |
list(element_type) | LIST<T> from a TypeId |
list_from_logical(element) | LIST<T> from an existing LogicalType |
map(key, value) | MAP<K, V> from TypeIds |
map_from_logical(key, value) | MAP<K, V> from existing LogicalTypes |
struct_type(fields) | STRUCT from &[(&str, TypeId)] |
struct_type_from_logical(fields) | STRUCT from &[(&str, LogicalType)] |
union_type(members) | UNION from &[(&str, TypeId)] |
union_type_from_logical(members) | UNION from &[(&str, LogicalType)] |
enum_type(members) | ENUM from &[&str] |
array(element_type, size) | ARRAY<T>[size] from a TypeId |
array_from_logical(element, size) | ARRAY<T>[size] from an existing LogicalType |
Introspection methods
All introspection methods are unsafe (require a valid DuckDB runtime handle):
get_type_id, get_alias, set_alias, decimal_width, decimal_scale,
decimal_internal_type, enum_internal_type, enum_dictionary_size,
enum_dictionary_value, list_child_type, map_key_type, map_value_type,
struct_child_count, struct_child_name, struct_child_type,
union_member_count, union_member_name, union_member_type,
array_size, array_child_type.
See Type System for the full introspection table.
Known Limitations
Window functions are not available
DuckDB window functions (OVER (...) clauses) are implemented entirely in
DuckDB's C++ layer and have no counterpart in the public C extension API.
This is not a gap in quack-rs or in libduckdb-sys — the relevant symbol
(duckdb_create_window_function) simply does not exist in the C API:
| Symbol | C API (1.4.x)? | C API (1.5.0+)? | C++ API? |
|---|---|---|---|
duckdb_create_window_function | No | No | Yes |
duckdb_create_copy_function | No | Yes | Yes |
duckdb_create_scalar_function | Yes | Yes | Yes |
duckdb_create_aggregate_function | Yes | Yes | Yes |
duckdb_create_table_function | Yes | Yes | Yes |
duckdb_create_cast_function | Yes | Yes | Yes |
What this means for your extension:
If your extension needs window-function semantics, you can approximate them with aggregate functions in most cases (DuckDB will push down the window logic). True custom window operator registration requires writing a C++ extension.
If DuckDB exposes window registration in a future C API version, quack-rs
will add wrappers in the corresponding release.
COPY functions (resolved in DuckDB 1.5.0)
DuckDB 1.5.0 added duckdb_create_copy_function and related symbols to the public
C extension API. quack-rs wraps these in the copy_function module behind the
duckdb-1-5 feature flag. See CopyFunctionBuilder for usage.
This was previously listed as a known limitation (no C API counterpart prior to 1.5.0).
Callback accessor wrappers (resolved)
quack-rs now wraps all major callback accessor functions — the C API functions used inside your callbacks to retrieve arguments, set errors, access bind data, etc.
| Category | Wrapper type | Available |
|---|---|---|
| Scalar function execution | ScalarFunctionInfo | Always |
| Scalar function bind | ScalarBindInfo | duckdb-1-5 |
| Scalar function init | ScalarInitInfo | duckdb-1-5 |
| Aggregate function callbacks | AggregateFunctionInfo | Always |
| Table function bind | BindInfo | Always |
| Table function init | InitInfo | Always |
| Table function scan | FunctionInfo | Always |
| Cast function callbacks | CastFunctionInfo | Always |
| Copy function bind | CopyBindInfo | duckdb-1-5 |
| Copy function global init | CopyGlobalInitInfo | duckdb-1-5 |
| Copy function sink | CopySinkInfo | duckdb-1-5 |
| Copy function finalize | CopyFinalizeInfo | duckdb-1-5 |
All callback accessor functions are now wrapped, including `get_client_context`
on all callback types (returns a `ClientContext`).
Complex type creation (resolved)
LogicalType now provides constructors for all complex parameterized types:
| Method | Type created |
|---|---|
LogicalType::decimal(width, scale) | DECIMAL(p, s) |
LogicalType::enum_type(members) | ENUM('a', 'b', ...) |
LogicalType::array(child, size) | type[N] |
LogicalType::union_type(members) | UNION(a INT, b VARCHAR) |
LogicalType::list(child) | LIST(type) |
LogicalType::struct_type(fields) | STRUCT(...) |
LogicalType::map(key, value) | MAP(K, V) |
All constructors have _from_logical variants for nested complex types.
Introspection methods (get_type_id, list_child_type, struct_child_count,
decimal_width, etc.) are also available.
VARIANT type (Iceberg v3)
DuckDB v1.5.1 introduced the VARIANT type for Iceberg v3 support.
This type is not yet exposed in the DuckDB C Extension API
(DUCKDB_TYPE_VARIANT does not exist in libduckdb-sys 1.10501.0).
quack-rs will add TypeId::Variant when the C API exposes it.
Changelog
All notable changes to quack-rs, mirrored from
CHANGELOG.md.
The format follows Keep a Changelog. quack-rs adheres to Semantic Versioning.
Unreleased
[0.8.0] — 2026-03-28
Added
- `LogicalType::from_raw(ptr)` — construct from a raw handle
- Complex type constructors — `decimal`, `array`, `array_from_logical`, `union_type`, `union_type_from_logical`, `enum_type`
- `_from_logical` variants — `struct_type_from_logical`, `list_from_logical`, `map_from_logical` for nested complex types
- 20 introspection methods on `LogicalType` — `get_type_id`, `get_alias`, `set_alias`, decimal/enum/list/map/struct/union/array child access
- `TypeId::from_duckdb_type()` — reverse conversion from the raw C enum
- `extra_info` on `ScalarFunctionBuilder`, `ScalarOverloadBuilder`, `AggregateFunctionBuilder`
- `param_logical` / `named_param_logical` on `TableFunctionBuilder`
- `CastFunctionBuilder::new_logical()` for complex source/target types
- Callback info wrappers — `ScalarFunctionInfo`, `ScalarBindInfo` (duckdb-1-5), `ScalarInitInfo` (duckdb-1-5), `AggregateFunctionInfo`, `CopyBindInfo` (duckdb-1-5), `CopyGlobalInitInfo` (duckdb-1-5), `CopySinkInfo` (duckdb-1-5), `CopyFinalizeInfo` (duckdb-1-5)
- `get_client_context()` on all callback info types
- `BindInfo` — `get_parameter`, `get_named_parameter`, `get_extra_info`, `get_client_context`
- `InitInfo` / `FunctionInfo` — `get_extra_info`
- `ArrayVector` helper with `get_child()`
- `vector_size()` and `vector_get_column_type()` utilities
- Prelude — `StructVector`, `ListVector`, `MapVector`, `ArrayVector`, `ScalarFunctionInfo`, `AggregateFunctionInfo`
Changed
- Breaking: `CastFunctionBuilder::source()` / `target()` return `Option<TypeId>` (was `TypeId`)
- Breaking: `CastRecord::source` / `target` fields changed to `Option<TypeId>`
0.7.1 — 2026-03-27
Added
- `TypeId::Any` — wildcard type for function overload resolution (duckdb-1-5)
- `TypeId::Varint` — variable-length arbitrary-precision integer (duckdb-1-5)
- `TypeId::SqlNull` — explicit SQL NULL type for bare `NULL` literals (duckdb-1-5)
- `TypeId::IntegerLiteral` — integer literal type for overload resolution (duckdb-1-5)
- `TypeId::StringLiteral` — string literal type for overload resolution (duckdb-1-5)
- `MockVectorReader` / `MockVectorWriter` tests — 12 new tests for untested constructors and getters
- DuckDB v1.5.1 evaluation — see `docs/duckdb-v1.5.1-evaluation.md`
Fixed
- ARM64 / aarch64 build — use `c_char` instead of `i8` for cross-platform pointer casts
Changed
- DuckDB v1.5.1 compatibility — documentation updated to explicitly cover v1.5.1. C API version unchanged (`v1.2.0`). Recommend upgrading the DuckDB runtime for WAL corruption and ART index fixes.
0.7.0 — 2026-03-22
Added
- `duckdb-1-5` feature modules — the `duckdb-1-5` feature flag is no longer a placeholder. When enabled, it gates five new modules wrapping DuckDB 1.5.0 C Extension API additions:
  - `catalog` — catalog entry lookup (`CatalogEntry`, `Catalog`, `CatalogEntryType`)
  - `client_context` — client context access (`ClientContext`) for retrieving catalogs, config options, and connection IDs from within registered function callbacks
  - `config_option` — extension-defined configuration options (`ConfigOptionBuilder`, `ConfigOptionScope`) registered via `SET` / `RESET` / `current_setting()`
  - `copy_function` — custom `COPY TO` handlers (`CopyFunctionBuilder`) with a bind → global init → sink → finalize lifecycle
  - `table_description` — table metadata queries (`TableDescription`) for column count, names, and logical types
- `TypeId::TimeNs` — new `TIME_NS` column type variant for nanosecond-precision time of day (DuckDB 1.5.0+, requires the `duckdb-1-5` feature)
- `ScalarFunctionBuilder::varargs()` / `varargs_logical()` — mark a scalar function as accepting variadic arguments (requires `duckdb-1-5`)
- `ScalarFunctionBuilder::volatile()` — mark a scalar function as volatile (re-evaluated for every row even with constant arguments, requires `duckdb-1-5`)
- `ScalarFunctionBuilder::bind()` — set a bind callback invoked once during query planning for per-query state allocation (requires `duckdb-1-5`)
- `ScalarFunctionBuilder::init()` — set an init callback invoked once per thread for per-thread local state allocation (requires `duckdb-1-5`)
Changed
- DuckDB 1.5.0 support — upgraded the default `libduckdb-sys` from 1.4.4 to 1.10500.0 (DuckDB 1.5.0) and `duckdb` from 1.4.4 to 1.10500.0. The version range `">=1.4.4, <2"` in `Cargo.toml` is unchanged, preserving backward compatibility with DuckDB 1.4.x.
- CI action updates — `Swatinem/rust-cache` v2.8.2 → v2.9.1, `actions/download-artifact` v8.0.0 → v8.0.1, `actions/cache` 5.0.3 → 5.0.4, `codecov/codecov-action` 5.4.3 → 5.5.3.
Fixed
- COPY format handlers — previously listed as a known limitation (no C API counterpart). DuckDB 1.5.0 adds `duckdb_create_copy_function` and related symbols; the new `copy_function` module wraps them behind `duckdb-1-5`.
0.6.0 — 2026-03-12
Added
- `InMemoryDb` dispatch table initialisation — `InMemoryDb::open()` now correctly initialises the `loadable-extension` dispatch table from bundled DuckDB symbols before opening a connection. Previously, every call panicked with `"DuckDB API not initialized"` when the `bundled-test` feature was enabled in `cargo test`. See Pitfall P9 for the full technical analysis.
- `src/testing/bundled_api_init.cpp` — thin C++ shim exposing DuckDB's internal `CreateAPIv1()` as a C-linkage symbol, compiled at build time via the `cc` crate. Populates all 459 `AtomicPtr` dispatch table slots with real bundled DuckDB function pointers.
- `build.rs` — Cargo build script that locates the `libduckdb-sys` include path and compiles the C++ shim when the `bundled-test` feature is active.
- CI: `test-bundled` job — new CI job runs `cargo test --all-targets --features bundled-test` on Linux, macOS, and Windows on every PR, closing the gap that allowed this failure to reach the release workflow undetected.
- Pitfall P9 documented — full analysis in `LESSONS.md` and the Pitfall Catalog: root cause, `CreateAPIv1()` solution, ABI compatibility details, risks of the internal C++ API, and a mitigation table.
Fixed
- `InMemoryDb::open()` no longer panics under `cargo test --features bundled-test`. This was broken from the initial 0.5.1 release.
Changed
- `bundled-test` feature documentation updated to describe dispatch table initialisation accurately.
0.5.1 — 2026-03-12
Added
- Testing primitives (`quack_rs::testing`) — `MockVectorWriter`, `MockVectorReader`, `MockDuckValue`, `MockRegistrar`, `CastRecord`.
- `bundled-test` Cargo feature — enables `InMemoryDb` for SQL-level assertions in `cargo test`. (Note: `InMemoryDb::open()` was broken in this release and fixed in 0.6.0.)
- `InMemoryDb` — wraps `duckdb::Connection` for SQL-level integration tests; available behind the `bundled-test` feature.
- Builder introspection accessors — `name()` on all function builders; `source()` / `target()` on `CastFunctionBuilder`.
Security
- Bump `quinn-proto` 0.11.13 → 0.11.14 (addresses a RUSTSEC advisory).
0.5.0 — 2026-03-10
Added
- `param_logical(LogicalType)` on all builders — register parameters with complex parameterized types (`LIST(BIGINT)`, `MAP(VARCHAR, INTEGER)`, `STRUCT(...)`) that `TypeId` alone cannot express. Available on `AggregateFunctionBuilder`, `AggregateFunctionSetBuilder::OverloadBuilder`, `ScalarFunctionBuilder`, and `ScalarOverloadBuilder`. Parameters added via `param()` and `param_logical()` are interleaved by position, so the order you call them is the order DuckDB sees them.
- `returns_logical(LogicalType)` on all builders — set a complex parameterized return type. When both `returns(TypeId)` and `returns_logical(LogicalType)` are called, the logical type takes precedence. Available on `AggregateFunctionBuilder`, `AggregateFunctionSetBuilder`, `ScalarFunctionBuilder`, and `ScalarOverloadBuilder`. This eliminates the need for raw FFI when returning `LIST(BOOLEAN)`, `LIST(TIMESTAMP)`, `MAP(K, V)`, or any other parameterized type.
- `null_handling(NullHandling)` on set overload builders — per-overload NULL handling configuration for `AggregateFunctionSetBuilder::OverloadBuilder` and `ScalarOverloadBuilder`. Previously only available on single-function builders.
Notes
- Upstream fix:
duckdb-loadable-macrospanic-at-FFI-boundary — the safe entry-point pattern developed inquack-rs(using?/ok_or_elsethroughout instead of.unwrap()) was contributed upstream as duckdb/duckdb-rs#696 and merged 2026-03-09. All users of theduckdb_entrypoint_c_api!macro fromduckdb-loadable-macroswill receive this fix in the nextduckdb-rsrelease.quack-rsusers have always been protected via the safeentry_point!/entry_point_v2!macros provided by this crate.
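The `?` / `ok_or_else` pattern referenced above can be sketched in isolation. Everything here is illustrative (the function names are not the actual quack-rs internals); it only demonstrates converting a panicking lookup into a propagated error:

```rust
// Illustrative stand-in for any lookup that may fail during extension load.
fn find_symbol(name: &str) -> Option<usize> {
    if name == "duckdb_ext_api" { Some(0xDEAD) } else { None }
}

// Panicking version (forbidden at an FFI boundary):
//   let addr = find_symbol(name).unwrap();
//
// Safe version: turn the Option into a Result and propagate with `?`,
// so the caller can report a load error instead of aborting.
fn resolve(name: &str) -> Result<usize, String> {
    let addr = find_symbol(name)
        .ok_or_else(|| format!("symbol `{name}` not found"))?;
    Ok(addr)
}
```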
0.4.0 — 2026-03-09
Added
- `Connection` and `Registrar` trait — version-agnostic extension registration facade. `Connection` wraps the `duckdb_connection` and `duckdb_database` handles provided at initialization time. The `Registrar` trait provides uniform methods for registering all extension components (scalar, scalar set, aggregate, aggregate set, table, SQL macro, cast), making registration code interchangeable across DuckDB 1.4.x and 1.5.x.
- `init_extension_v2` — new entry-point helper that passes `&Connection` to the registration callback instead of a raw `duckdb_connection`. Prefer this over `init_extension` for new extensions.
- `entry_point_v2!` macro — companion macro to `entry_point!` that generates the `#[no_mangle] unsafe extern "C"` entry point using `init_extension_v2`.
- `duckdb-1-5` Cargo feature — placeholder feature flag for DuckDB 1.5.0-specific C API wrappers. Currently empty; will be populated when `libduckdb-sys` 1.5.0 is published on crates.io.
Changed
- DuckDB version support broadened to 1.4.x and 1.5.x — the `libduckdb-sys` dependency requirement was relaxed from an exact pin (`=1.4.4`) to a range (`>=1.4.4, <2`). DuckDB v1.5.0 does not change the C API version string (`v1.2.0`); the existing `DUCKDB_API_VERSION` constant remains correct for both releases. Extension authors can pin their own `libduckdb-sys` to either `=1.4.4` or `=1.5.0` and resolve cleanly against `quack-rs`. The scaffold template and CI workflow template were updated to default to DuckDB v1.5.0.
0.3.0 — 2026-03-08
Added
- `TableFunctionBuilder` — type-safe builder for registering DuckDB table functions (`SELECT * FROM my_function(args)`). Covers the full bind/init/scan lifecycle with ergonomic callbacks; `BindInfo`, `FfiBindData<T>`, and `FfiInitData<T>` eliminate all raw pointer manipulation. Verified end-to-end against DuckDB 1.4.4. See Table Functions.
- `ReplacementScanBuilder` — builder for registering DuckDB replacement scans (`SELECT * FROM 'file.xyz'` patterns). A 4-method chain handles callback registration, path extraction, and bind-info population. See Replacement Scans.
- `StructVector`, `ListVector`, `MapVector` — safe wrappers for reading and writing nested-type vectors. They eliminate manual offset arithmetic and raw pointer casts over child vector handles. Re-exported from `quack_rs::vector::complex`. See Complex Types.
- `CastFunctionBuilder` — type-safe builder for registering custom type cast functions. Covers explicit `CAST(x AS T)` and implicit coercions (optional `implicit_cost`). `CastFunctionInfo` exposes `cast_mode()`, `set_error()`, and `set_row_error()` inside callbacks for correct `TRY_CAST` / `CAST` error handling. See Cast Functions.
- `DbConfig` — RAII wrapper for `duckdb_config`. Builder-style `.set(name, value)?` chain with automatic `duckdb_destroy_config` on drop and `flag_count()` / `get_flag(index)` for enumerating all available options. See `quack_rs::config`.
- `ScalarFunctionSetBuilder` — builder for registering scalar function overload sets, mirroring `AggregateFunctionSetBuilder`.
- `NullHandling` enum and `.null_handling()` builder method — configurable NULL propagation for scalar and aggregate functions.
- `TypeId` variants — `Decimal`, `Struct`, `Map`, `UHugeInt`, `TimeTz`, `TimestampS`, `TimestampMs`, `TimestampNs`, `Array`, `Enum`, `Union`, `Bit`.
- `From<TypeId> for LogicalType` — idiomatic conversion from `TypeId`.
- `#[must_use]` on builder structs — compile-time warning if a builder is constructed but never consumed.
- `VectorWriter::write_interval` — writes INTERVAL values to output vectors.
- `append_metadata` binary — native Rust replacement for the Python metadata script. Install with `cargo install quack-rs --bin append_metadata`.
- `hello-ext` cast demo — the example extension now registers `CAST(VARCHAR AS INTEGER)` and `TRY_CAST(VARCHAR AS INTEGER)` using `CastFunctionBuilder`, demonstrating both error modes with five unit tests.
- `prelude` additions — `TableFunctionBuilder`, `BindInfo`, `FfiBindData`, `FfiInitData`, `ReplacementScanBuilder`, `StructVector`, `ListVector`, `MapVector`, `CastFunctionBuilder`, `CastFunctionInfo`, `CastMode` added to `quack_rs::prelude`.
Not implemented (upstream C API gap)
- Window functions and COPY format handlers are absent from DuckDB's public C extension API and cannot be wrapped. See Known Limitations.
Fixed
- `hello-ext` `gs_bind` callback — replaced incorrect `duckdb_value_int64(param)` with `duckdb_get_int64(param)`. All 11 live SQL tests now pass against DuckDB 1.4.4.
Changed
- Bump `criterion` dev-dependency from `0.5` to `0.8`.
- Bump `Swatinem/rust-cache` GitHub Action from `v2.7.5` to `v2.8.2`.
- Bump `dtolnay/rust-toolchain` CI pin from `v2.7.5` to latest SHA.
- Bump `actions/attest-build-provenance` from `v2` to `v4`.
- Bump `actions/configure-pages` to latest SHA (`d5606572…`).
- Bump `actions/upload-pages-artifact` from `v3.0.1` to `v4.0.0`.
0.2.0 — 2026-03-07
Added
- `validate::description_yml` module — parse and validate a complete `description.yml` metadata file end-to-end. Includes:
  - `DescriptionYml` struct — structured representation of all required and optional fields
  - `parse_description_yml(content: &str)` — parse and validate in one step
  - `validate_description_yml_str(content: &str)` — pass/fail validation
  - `validate_rust_extension(desc: &DescriptionYml)` — enforce Rust-specific fields (`language: Rust`, `build: cargo`, `requires_toolchains` includes `rust`)
  - 25+ unit tests covering all required fields, optional fields, error paths, and edge cases
- `prelude` module — ergonomic glob-import for the most commonly used items. `use quack_rs::prelude::*;` brings in all builder types, state traits, vector helpers, types, error handling, and the API version constant. Reduces boilerplate for extension authors.
- Scaffold: `extension_config.cmake` generation — the scaffold generator now produces `extension_config.cmake`, which is referenced by the `EXT_CONFIG` variable in the Makefile and required by `extension-ci-tools` for CI integration.
- Scaffold: SQLLogicTest skeleton — `generate_scaffold` now produces `test/sql/{name}.test`, a ready-to-fill SQLLogicTest file with a `require` directive, format comments, and example query/result blocks. E2E tests are required for community extension submission (Pitfall P3).
- Scaffold: GitHub Actions CI workflow — `generate_scaffold` now produces `.github/workflows/extension-ci.yml`, a complete cross-platform CI workflow that builds and tests the extension on Linux, macOS, and Windows against a real DuckDB binary.
- `validate::validate_excluded_platforms_str` — validates the `excluded_platforms` field from `description.yml` as a semicolon-delimited string (e.g., `"wasm_mvp;wasm_eh;wasm_threads"`). Splits on `;` and validates each token. An empty string is valid (no exclusions).
- `validate::validate_excluded_platforms` — re-exported at the `validate` module level (previously only accessible as `validate::platform::validate_excluded_platforms`).
- `validate::semver::classify_extension_version` — returns `ExtensionStability` (`Unstable` / `PreRelease` / `Stable`) classifying the tier a version falls into.
- `validate::semver::ExtensionStability` — enum for DuckDB extension version stability tiers (`Unstable`, `PreRelease`, `Stable`) with a `Display` implementation.
- `scalar` module — `ScalarFunctionBuilder` for registering scalar functions with the DuckDB C Extension API. Includes `try_new` with name validation, `param`, `returns`, and `function` setters, and `register`. Full unit tests included.
- `entry_point!` macro — generates the required `#[no_mangle] extern "C"` entry point with zero boilerplate from an identifier and a registration closure.
- `VectorWriter::write_varchar` — writes VARCHAR string values to output vectors using `duckdb_vector_assign_string_element_len` (handles both inline and pointer formats).
- `VectorWriter::write_bool` — writes BOOLEAN values as a single byte.
- `VectorWriter::write_u16` — writes USMALLINT values.
- `VectorWriter::write_i16` — writes SMALLINT values.
- `VectorReader::read_interval` — reads INTERVAL values from input vectors via the correct 16-byte layout helper.
- CI: Windows testing — the CI matrix now includes `windows-latest` in the `test` job, covering all three major platforms (Linux, macOS, Windows).
- CI: `example-check` job — CI now checks, lints, and tests `examples/hello-ext` as part of every PR, ensuring the example extension always compiles and its tests pass.
- `validate::validate_release_profile` — checks Cargo release-profile settings for loadable-extension correctness. Validates `panic`, `lto`, `opt-level`, and `codegen-units`.
Fixed
- MSRV documentation now consistently states 1.84.1 across `README.md`, `CONTRIBUTING.md`, and `Cargo.toml` (previously `README.md` stated 1.80).
0.1.0 — 2025-05-01
Added
- Initial release:
  - `entry_point` module: `init_extension` helper for correct extension initialization
  - `aggregate` module: `AggregateFunctionBuilder`, `AggregateFunctionSetBuilder`
  - `aggregate::state` module: `AggregateState` trait, `FfiState<T>` wrapper
  - `aggregate::callbacks` module: type aliases for all 6 aggregate callback signatures
  - `vector` module: `VectorReader`, `VectorWriter`, `ValidityBitmap`, `DuckStringView`
  - `types` module: `TypeId` enum (33 variants), `LogicalType` RAII wrapper
  - `interval` module: `DuckInterval`, `interval_to_micros`, `read_interval_at`
  - `error` module: `ExtensionError`, `ExtResult<T>`
  - `testing` module: `AggregateTestHarness<S>` for pure-Rust aggregate testing
  - `scaffold` module: `generate_scaffold` for generating complete extension projects
  - `sql_macro` module: `SqlMacro` for registering SQL macros without FFI callbacks
- Complete `hello-ext` example extension
- Documentation of all 15 DuckDB Rust FFI pitfalls (`LESSONS.md`)
- CI pipeline: check, test, clippy, fmt, doc, msrv, bench-compile
- `SECURITY.md` vulnerability disclosure policy
FAQ
Frequently asked questions about quack-rs and building DuckDB extensions in Rust.
General
What is quack-rs?
quack-rs is a Rust SDK for building DuckDB loadable extensions using DuckDB's
pure C Extension API. It provides safe, ergonomic builders for registering
scalar functions, aggregate functions, table functions, cast functions,
replacement scans, SQL macros, and copy functions (via the duckdb-1-5
feature), along with helpers for reading and writing DuckDB vectors, and
utilities for publishing community extensions.
Why does this exist?
Building a DuckDB extension in Rust requires solving a set of undocumented FFI problems that every developer discovers independently. quack-rs encodes solutions to all 16 known pitfalls so you don't have to rediscover them. See the Pitfall Catalog.
What DuckDB version does quack-rs target?
quack-rs requires libduckdb-sys = ">=1.4.4, <2" (DuckDB 1.4.x and 1.5.x).
The C API version string passed to the dispatch-table initializer is "v1.2.0",
available as quack_rs::DUCKDB_API_VERSION. Both DuckDB 1.4.x and 1.5.x use
the same C API version. These are two distinct version identifiers — the crate
version and the C API protocol version.
What is the minimum supported Rust version (MSRV)?
Rust 1.84.1 or later. This is enforced in Cargo.toml with
rust-version = "1.84.1".
Is quack-rs production-ready?
Yes. It was extracted from duckdb-behavioral, a production DuckDB community extension. All 16 pitfalls it solves were discovered in production.
Functions
Can I expose SQL macros as an extension?
Yes, without any C++ wrapper code. Use `quack_rs::sql_macro::SqlMacro`:

```rust
use quack_rs::sql_macro::SqlMacro;

// Scalar macro
let m = SqlMacro::scalar("double_it", &["x"], "x * 2")?;
unsafe { m.register(con) }?;

// Table macro
let m = SqlMacro::table(
    "recent_events",
    &["n"],
    "SELECT * FROM events ORDER BY ts DESC LIMIT n",
)?;
unsafe { m.register(con) }?;
```
Register them inside your init_extension closure alongside aggregate and
scalar functions. See SQL Macros.
Can I register multiple overloads of the same function?
Yes, using AggregateFunctionSetBuilder (for aggregates) or
ScalarFunctionSetBuilder (for scalars). Both support complex parameter types
via param_logical(LogicalType) and complex return types via
returns_logical(LogicalType). See
Overloading with Function Sets.
Can I register multiple functions in one extension?
Yes. The `init_extension` closure receives a `duckdb_connection` and can call
as many `register_*` functions as needed:

```rust
quack_rs::entry_point::init_extension(info, access, DUCKDB_API_VERSION, |con| {
    unsafe { register_word_count(con) }?;
    unsafe { register_sentence_count(con) }?;
    unsafe {
        SqlMacro::scalar("double_it", &["x"], "x * 2")?.register(con)?;
    }
    Ok(())
})
```
Can I use the duckdb crate instead of libduckdb-sys?
No. The `duckdb` crate's `bundled` feature embeds its own copy of DuckDB. A
loadable extension must link against the DuckDB that loads it, not bundle a
separate copy. Use `libduckdb-sys` with the `loadable-extension` feature.
Can I have a scalar function with no parameters?
Yes. Simply register no parameters — don't call `param` at all:

```rust
ScalarFunctionBuilder::new("current_quack")
    .returns(TypeId::Varchar)
    .function(quack_callback)
    .register(con)?;
```
Testing
Do I need a DuckDB instance to run unit tests?
No. AggregateTestHarness simulates the aggregate lifecycle in pure Rust
without any DuckDB dependency. You can run cargo test without loading a DuckDB
binary.
My unit tests all pass but the extension crashes. Why?
Unit tests cannot detect FFI wiring bugs. See Pitfall P3 and the Testing Guide. Always run E2E tests by loading the extension into an actual DuckDB process.
How do I test SQL macros?
`SqlMacro::to_sql()` is pure Rust and requires no DuckDB connection:

```rust
let m = SqlMacro::scalar("triple", &["x"], "x * 3").unwrap();
assert_eq!(m.to_sql(), "CREATE OR REPLACE MACRO triple(x) AS (x * 3)");
```
For E2E testing, include the macro in your SQLLogicTest file:
```
query I
SELECT double_it(21);
----
42
```
Publishing
How do I publish to the DuckDB community extensions registry?
- Scaffold your project with `generate_scaffold`
- Push to GitHub
- Submit a pull request to the community-extensions repo with your `description.yml`
See Community Extensions for the full workflow.
My extension name is taken. What should I do?
Use a vendor-prefixed name: myorg_analytics instead of analytics. Extension
names must be globally unique across the entire DuckDB ecosystem. Check
community-extensions.duckdb.org
first.
Do I need to set up CI manually?
No. generate_scaffold produces .github/workflows/extension-ci.yml which
builds and tests your extension on Linux, macOS, and Windows automatically.
Can my extension be installed with INSTALL ... FROM community?
Yes, once your pull request is merged into the community-extensions repository.
Until then, users load the `.duckdb_extension` binary directly:

```sql
LOAD './path/to/libmy_extension.duckdb_extension';
```
Troubleshooting
My aggregate returns wrong results with no error.
The most common cause is Pitfall L1: your `combine` callback is not propagating
all configuration fields. See Pitfall L1 and test with
`AggregateTestHarness::combine`.
I'm getting a SEGFAULT when writing NULL.
You are likely calling `duckdb_vector_get_validity` without first calling
`duckdb_vector_ensure_validity_writable`. Use `VectorWriter::set_null` instead.
See Pitfall L4.
My function is not found in SQL after LOAD.
Most likely cause: the function was not registered (Pitfall L6 — function set
name not set on each member), or the entry-point symbol name does not match
the extension name. The symbol must be `{extension_name}_init_c_api` (all
lowercase, underscores).
make configure fails with a missing file error.
The `extension-ci-tools` submodule is not initialized:

```shell
git submodule update --init --recursive
```
My SQLLogicTest fails in CI but passes locally.
SQLLogicTest does exact string matching. The most common issue is a difference in NULL representation, decimal places, or line endings. Run the query in the same DuckDB version used by CI and copy the output verbatim.
How do I read a VARCHAR that is longer than 12 bytes?
VectorReader::read_str handles both the inline (≤ 12 bytes) and pointer
(> 12 bytes) formats automatically. No special handling needed.
What happens if I read from a NULL row?
You get garbage data from the vector's data buffer. Always check is_valid
before reading. See NULL Handling & Strings.
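Under the hood, a DuckDB validity mask is an array of 64-bit words, one bit per row, where a set bit means the row is valid; the mask pointer is null when no NULLs are present. A self-contained sketch of the check that `is_valid` performs conceptually (a standalone illustration, not the quack-rs implementation):

```rust
/// Returns true if `row` is valid (non-NULL) in a DuckDB-style validity
/// mask: u64 words, one bit per row, bit set = valid.
/// `None` models a null mask pointer, which means "no NULLs anywhere".
fn is_valid(mask: Option<&[u64]>, row: usize) -> bool {
    match mask {
        None => true,
        Some(words) => (words[row / 64] >> (row % 64)) & 1 == 1,
    }
}
```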
Architecture
Why use libduckdb-sys with loadable-extension instead of the duckdb crate?
The duckdb crate is designed for embedding DuckDB, not for extending it. Its
bundled feature includes a statically linked DuckDB binary, which conflicts
with the DuckDB runtime that loads your extension. libduckdb-sys with
loadable-extension provides lazy-initialized function pointers that are
populated by DuckDB at extension load time.
Why not use duckdb-loadable-macros?
duckdb-loadable-macros relies on extract_raw_connection which uses the
internal Rc<RefCell<InnerConnection>> layout. This is fragile and causes
SEGFAULTs when the layout changes between duckdb crate versions.
init_extension uses the correct C API entry sequence directly.
Why is panic = "abort" required?
Panics cannot unwind across FFI boundaries in Rust. A panic in an
unsafe extern "C" callback is undefined behavior. panic = "abort" converts
panics to process termination, which is still bad but not undefined behavior.
Always use Result and ? in your callbacks instead.
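In `Cargo.toml`, this is set in the release profile. An illustrative profile sketch — `panic = "abort"` is the requirement discussed above; the other values are plausible companions and correspond to the settings `validate::validate_release_profile` checks (`panic`, `lto`, `opt-level`, `codegen-units`):

```toml
[profile.release]
panic = "abort"    # required: never unwind across the FFI boundary
lto = true         # cross-crate inlining; smaller, faster binary
opt-level = 3
codegen-units = 1  # maximize optimization at the cost of build time
strip = true       # drop symbols from the shipped extension
```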
Can I use async Rust in my extension?
Not directly in FFI callbacks. DuckDB's callbacks are synchronous C functions.
You can run a Tokio or async-std runtime and block on async tasks inside
callbacks (using Runtime::block_on), but the callbacks themselves must return
synchronously.
How does FfiState<T> prevent double-free?
FfiState<T> stores the Box<T> as a raw pointer in inner. When
destroy_callback is called, it reconstitutes the Box (which drops T and
frees memory) and then sets inner to null. A second call to destroy_callback
on the same state sees a null inner and returns without freeing.
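The guard can be sketched in isolation. This is a simplified stand-in for `FfiState<T>`, not the actual implementation — it only demonstrates the free-once-then-null pattern:

```rust
use std::ptr;

// Simplified stand-in for FfiState<T>: owns a Box<T> as a raw pointer.
struct FfiStateSketch<T> {
    inner: *mut T,
}

impl<T> FfiStateSketch<T> {
    fn new(value: T) -> Self {
        Self { inner: Box::into_raw(Box::new(value)) }
    }

    // What destroy_callback does conceptually: free exactly once, then
    // null the pointer so a second call is a no-op, not a double-free.
    fn destroy(&mut self) {
        if !self.inner.is_null() {
            // SAFETY: `inner` came from Box::into_raw and has not been
            // freed yet — guaranteed by the null guard above.
            unsafe { drop(Box::from_raw(self.inner)) };
            self.inner = ptr::null_mut();
        }
    }
}
```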
Contributing
quack-rs is an open source project. Contributions of all kinds are welcome: bug reports, documentation improvements, new pitfall discoveries, and code.
Development prerequisites
| Tool | Version | Purpose |
|---|---|---|
| Rust | ≥ 1.84.1 (MSRV) | Compiler |
| `rustfmt` | stable | Formatting |
| `clippy` | stable | Linting |
| `cargo-msrv` | latest | MSRV verification |
Install the Rust toolchain via rustup.rs.
Building
```shell
# Build the library
cargo build

# Build in release mode (enables LTO + strip)
cargo build --release

# Build the hello-ext example extension
cargo build --release --manifest-path examples/hello-ext/Cargo.toml
```
Quality gates
All of the following must pass before merging any pull request:
```shell
# Tests — zero failures, zero ignored
cargo test

# Integration tests
cargo test --test integration_test

# Linting — zero warnings (warnings are errors)
cargo clippy --all-targets -- -D warnings

# Formatting
cargo fmt -- --check

# Documentation — zero broken links or missing docs
RUSTDOCFLAGS="-D warnings" cargo doc --no-deps

# MSRV — must compile on Rust 1.84.1 (excludes benches; matches CI)
cargo +1.84.1 check
```
These same checks run in CI on every push and pull request.
Test strategy
Unit tests
Unit tests live in #[cfg(test)] modules within each source file. They test
pure-Rust logic that does not require a live DuckDB instance.
Important constraint: `libduckdb-sys` with `features = ["loadable-extension"]`
makes all DuckDB C API functions go through lazy `AtomicPtr` dispatch. These
pointers are only populated when `duckdb_rs_extension_api_init` is called from
within a real DuckDB extension load. Calling any `duckdb_*` function in a unit
test will panic. Move such tests to integration tests or example-extension tests.
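A minimal pure-Rust sketch of why this happens — a single lazy dispatch slot that panics when called before initialization. This is a simplified model, not the code `libduckdb-sys` actually generates:

```rust
use std::sync::atomic::{AtomicPtr, Ordering};

// One dispatch-table slot: null until the host DuckDB populates it
// during extension load.
static SLOT: AtomicPtr<()> = AtomicPtr::new(std::ptr::null_mut());

// Roughly what a generated wrapper does: load the pointer, and panic if
// the table was never initialized (i.e., we are not inside a real load).
fn duckdb_some_function() -> u64 {
    let p = SLOT.load(Ordering::Acquire);
    if p.is_null() {
        panic!("DuckDB API not initialized: called outside an extension load");
    }
    // SAFETY (in the real crate): the host stored a valid function pointer.
    let f: extern "C" fn() -> u64 = unsafe { std::mem::transmute(p) };
    f()
}
```

In a unit test no host ever fills `SLOT`, so any call hits the panic branch — which is exactly the behavior described above.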
Integration tests
tests/integration_test.rs contains pure-Rust tests that cross module
boundaries — testing interval with AggregateTestHarness, verifying FfiState
lifecycle, and so on. These still cannot call duckdb_* functions.
Property-based tests
Selected modules include proptest-based tests:
- `interval.rs` — overflow edge cases across the full `i32` / `i64` range
- `testing/harness.rs` — sum associativity, identity element for `AggregateState`
Example-extension tests
examples/hello-ext/ contains #[cfg(test)] unit tests for the pure logic
(count_words). Full E2E testing (loading the .so into DuckDB) is left to
consumers.
Code standards
Safety documentation
Every unsafe block must have a // SAFETY: comment explaining:
- Which invariant the caller guarantees
- Why the operation is valid given that invariant
```rust
// SAFETY: `states` is a valid array of `count` pointers, each initialized
// by `init_callback`. We are the only owner of `inner` at this point.
unsafe { drop(Box::from_raw(ffi.inner)) };
```
No panics across FFI
unwrap(), expect(), and panic!() are forbidden in any function that may
be called by DuckDB (callbacks and entry points). Use Option/Result and ?
throughout.
Clippy lint policy
The crate enables pedantic, nursery, and cargo lint groups. All warnings
are treated as errors in CI. Lints are suppressed only where they produce
false positives for SDK API patterns:
```toml
[lints.clippy]
module_name_repetitions = "allow"  # e.g., AggregateFunctionBuilder
must_use_candidate = "allow"       # builder methods
missing_errors_doc = "allow"       # unsafe extern "C" callbacks
return_self_not_must_use = "allow" # builder pattern
```
Documentation
Every public item must have a doc comment. Follow these conventions:
- First line: short summary (noun phrase, no trailing period)
- `# Safety`: mandatory on every `unsafe fn`
- `# Panics`: mandatory if the function can panic
- `# Errors`: mandatory on functions returning `Result`
- `# Example`: encouraged on public types and key methods
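Applied to a hypothetical helper (the function and its error type are illustrative, not part of the crate), the conventions look like this:

```rust
/// Number of whitespace-separated words in `input`
///
/// # Errors
///
/// Returns an error if `input` is not valid UTF-8.
pub fn word_count(input: &[u8]) -> Result<usize, std::str::Utf8Error> {
    let s = std::str::from_utf8(input)?;
    Ok(s.split_whitespace().count())
}
```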
Repository structure
quack-rs/
├── src/
│ ├── lib.rs # Crate root; module declarations; DUCKDB_API_VERSION
│ ├── entry_point.rs # init_extension() / init_extension_v2() + entry_point! / entry_point_v2!
│ ├── connection.rs # Connection facade + Registrar trait (version-agnostic registration)
│ ├── config.rs # DbConfig — RAII wrapper for duckdb_config
│ ├── error.rs # ExtensionError, ExtResult<T>
│ ├── interval.rs # DuckInterval, interval_to_micros
│ ├── sql_macro.rs # SqlMacro — CREATE MACRO without FFI callbacks
│ ├── aggregate/
│ │ ├── mod.rs
│ │ ├── builder/ # Builder types for aggregate function registration
│ │ │ ├── mod.rs # Module doc + re-exports
│ │ │ ├── single.rs # AggregateFunctionBuilder (single-signature)
│ │ │ ├── set.rs # AggregateFunctionSetBuilder, OverloadBuilder
│ │ │ └── tests.rs # Unit tests
│ │ ├── info.rs # AggregateFunctionInfo
│ │ ├── callbacks.rs # Callback type aliases
│ │ └── state.rs # AggregateState trait, FfiState<T>
│ ├── scalar/
│ │ ├── mod.rs
│ │ ├── info.rs # ScalarFunctionInfo, ScalarBindInfo, ScalarInitInfo
│ │ └── builder/ # Builder types for scalar function registration
│ │ ├── mod.rs # Module doc + re-exports
│ │ ├── single.rs # ScalarFn type alias, ScalarFunctionBuilder
│ │ ├── set.rs # ScalarFunctionSetBuilder, ScalarOverloadBuilder
│ │ └── tests.rs # Unit tests
│ ├── catalog.rs # Catalog access helpers (requires `duckdb-1-5`)
│ ├── cast/
│ │ ├── mod.rs # Re-exports
│ │ └── builder.rs # CastFunctionBuilder, CastFunctionInfo, CastMode
│ ├── client_context.rs # ClientContext wrapper (requires `duckdb-1-5`)
│ ├── config_option.rs # ConfigOption registration (requires `duckdb-1-5`)
│ ├── copy_function/
│ │ ├── mod.rs # CopyFunctionBuilder (requires `duckdb-1-5`)
│ │ └── info.rs # CopyBindInfo, CopySinkInfo, etc.
│ ├── replacement_scan/
│ │ └── mod.rs # ReplacementScanBuilder — SELECT * FROM 'file.xyz' patterns
│ ├── types/
│ │ ├── mod.rs
│ │ ├── type_id.rs # TypeId enum (33 base + 6 with duckdb-1-5)
│ │ └── logical_type.rs # LogicalType RAII wrapper
│ ├── vector/
│ │ ├── mod.rs
│ │ ├── reader.rs # VectorReader
│ │ ├── writer.rs # VectorWriter
│ │ ├── validity.rs # ValidityBitmap
│ │ ├── string.rs # DuckStringView, read_duck_string
│ │ └── complex.rs # StructVector, ListVector, MapVector, ArrayVector
│ ├── validate/
│ │ ├── mod.rs
│ │ ├── description_yml/ # Parse and validate description.yml metadata
│ │ │ ├── mod.rs # Module doc + re-exports
│ │ │ ├── model.rs # DescriptionYml struct
│ │ │ ├── parser.rs # parse_description_yml and helpers
│ │ │ ├── validator.rs # validate_description_yml_str, validate_rust_extension
│ │ │ └── tests.rs # Unit tests
│ │ ├── extension_name.rs
│ │ ├── function_name.rs
│ │ ├── platform.rs
│ │ ├── release_profile.rs
│ │ ├── semver.rs
│ │ └── spdx.rs
│ ├── scaffold/
│ │ ├── mod.rs # ScaffoldConfig, GeneratedFile, generate_scaffold
│ │ ├── templates.rs # Template generators for scaffold files (pub(super))
│ │ └── tests.rs # Unit tests
│ ├── table_description.rs # TableDescription wrapper (requires `duckdb-1-5`)
│ ├── table/
│ │ ├── mod.rs
│ │ ├── builder.rs # TableFunctionBuilder, BindFn/InitFn/ScanFn aliases
│ │ ├── info.rs # BindInfo, InitInfo, FunctionInfo
│ │ ├── bind_data.rs # FfiBindData<T>
│ │ └── init_data.rs # FfiInitData<T>, FfiLocalInitData<T>
│ └── testing/
│ ├── mod.rs
│ ├── harness.rs # AggregateTestHarness<S>
│ ├── mock_vector.rs # MockVectorReader, MockVectorWriter, MockDuckValue
│ ├── mock_registrar.rs # MockRegistrar, CastRecord
│ └── in_memory_db.rs # InMemoryDb (requires `bundled-test`)
├── tests/
│ └── integration_test.rs
├── benches/
│ └── interval_bench.rs # Criterion benchmarks
├── examples/
│ └── hello-ext/ # Reference example: word_count (aggregate) + first_word (scalar)
├── book/ # mdBook documentation source
│ ├── src/ # Markdown pages (this site)
│ └── theme/custom.css
├── .github/workflows/ci.yml # CI pipeline
├── .github/workflows/docs.yml # GitHub Pages deployment
├── CONTRIBUTING.md
├── LESSONS.md # The 16 DuckDB Rust FFI pitfalls
├── CHANGELOG.md
└── README.md
Releasing
quack-rs uses libduckdb-sys = ">=1.4.4, <2" — a bounded range covering DuckDB 1.4.x
and 1.5.x, whose C API (v1.2.0) is stable across both releases. The <2 upper bound
prevents silent adoption of a future major release that may change the C API.
Before broadening the range to a new major band:
- Read the DuckDB changelog for C API changes
- Check the new C API version string (used in `duckdb_rs_extension_api_init`)
- Update `DUCKDB_API_VERSION` in `src/lib.rs` if the C API version changed
- Audit all callback signatures against the new `bindgen.rs` output
- Update the range bounds in `Cargo.toml` (runtime and dev-deps)
Versions follow Semantic Versioning. Breaking changes to the public API require a major version bump.
Reporting issues
Use GitHub Issues. For security vulnerabilities, see `SECURITY.md` for the
responsible disclosure policy.
License
quack-rs is licensed under the MIT License. Contributions are accepted under the same license. By submitting a pull request, you agree to license your contribution under MIT.