The Rust SDK for building DuckDB loadable extensions — no C++ required.

CI · Crates.io · Documentation · License: MIT · MSRV: 1.84.1


What is quack-rs?

quack-rs is a production-grade Rust SDK that makes building DuckDB loadable extensions straightforward and safe. It wraps the DuckDB C Extension API — the same API used by official DuckDB extensions — and eliminates every known FFI pitfall so you can focus on writing extension logic in pure Rust.

DuckDB's own documentation acknowledges the gap:

"Writing a Rust-based DuckDB extension requires writing glue code in C++ and will force you to build through DuckDB's CMake & C++ based extension template. We understand that this is not ideal and acknowledge the fact that Rust developers prefer to work on pure Rust codebases."

DuckDB Community Extensions FAQ

quack-rs closes that gap. No C++. No CMake. No glue code.


What you can build

Extension type               quack-rs support
Scalar functions             ScalarFunctionBuilder
Overloaded scalars           ScalarFunctionSetBuilder
Aggregate functions          AggregateFunctionBuilder
Overloaded aggregates        AggregateFunctionSetBuilder
Table functions              TableFunctionBuilder
Cast / TRY_CAST functions    CastFunctionBuilder
Replacement scans            ReplacementScanBuilder
SQL macros (scalar)          SqlMacro::scalar
SQL macros (table)           SqlMacro::table
Copy functions (COPY TO)     CopyFunctionBuilder (requires duckdb-1-5)

Note: Window functions have no counterpart in DuckDB's public C Extension API and cannot be implemented from Rust (or any language) via that API. See Known Limitations.


Why does this exist?

quack-rs was extracted from duckdb-behavioral, a production DuckDB community extension. Building that extension revealed 16 undocumented pitfalls in DuckDB's Rust FFI surface — struct layouts, callback contracts, and initialization sequences that aren't covered anywhere in the DuckDB documentation or libduckdb-sys docs.

Three of those pitfalls caused extension-breaking bugs that passed 435 unit tests before being caught by end-to-end tests:

  1. A SEGFAULT on load (wrong entry point sequence)
  2. 6 of 7 functions silently not registered (undocumented function-set naming rule)
  3. Wrong aggregate results under parallel plans (combine callback not propagating configuration fields to fresh target states)

quack-rs makes each of these impossible through type-safe builders and safe wrappers. The full catalog is documented in the Pitfall Reference.


Key features

  • Zero C++ — no CMakeLists.txt, no header files, no glue code
  • All C API function types — scalar, aggregate, table, cast, replacement scan, SQL macro, copy function (duckdb-1-5)
  • Panic-free FFI — init_extension never panics; errors surface via Result
  • RAII memory management — LogicalType and FfiState<T> prevent leaks and double-frees
  • Type-safe builders — ScalarFunctionBuilder, AggregateFunctionBuilder, TableFunctionBuilder, CastFunctionBuilder, ReplacementScanBuilder
  • SQL macros — register CREATE MACRO statements without any FFI callbacks
  • Testable state — AggregateTestHarness<T> tests aggregate logic without a live DuckDB
  • Scaffold generator — produces a submission-ready community extension project from code
  • 16 pitfalls documented — every known DuckDB Rust FFI pitfall, with symptoms and fixes

New to DuckDB extensions? → Start with Quick Start

Adding quack-rs to an existing project? → See Installation

Writing your first function? → See Scalar Functions or Aggregate Functions

Want SQL macros without FFI callbacks? → See SQL Macros

Submitting a community extension? → See Community Extensions

Something broke? → See Pitfall Catalog

Quick Start

This page gets you from zero to a working DuckDB extension in three steps.


Prerequisites

  • Rust ≥ 1.84.1 (MSRV) — install via rustup
  • DuckDB CLI (for testing the built extension) — download

Step 1 — Add quack-rs to your extension

In your extension's Cargo.toml:

[dependencies]
quack-rs = "0.7"
libduckdb-sys = { version = ">=1.4.4, <2", features = ["loadable-extension"] }

[lib]
name = "my_extension"       # must match your extension name — see Pitfall P1
crate-type = ["cdylib", "rlib"]

[profile.release]
panic = "abort"             # required — panics across FFI are undefined behavior
lto = true
opt-level = 3
codegen-units = 1
strip = true

Start fresh? Use the scaffold generator to generate a complete, submission-ready project from code.


Step 2 — Write the extension

// src/lib.rs
use quack_rs::entry_point;
use quack_rs::error::ExtensionError;
use quack_rs::scalar::ScalarFunctionBuilder;
use quack_rs::types::TypeId;
use quack_rs::vector::{VectorReader, VectorWriter};
use libduckdb_sys::{duckdb_connection, duckdb_function_info, duckdb_data_chunk, duckdb_vector};

/// Scalar function: double_it(BIGINT) → BIGINT
unsafe extern "C" fn double_it(
    _info: duckdb_function_info,
    input: duckdb_data_chunk,
    output: duckdb_vector,
) {
    // SAFETY: input is a valid data chunk provided by DuckDB.
    let reader = unsafe { VectorReader::new(input, 0) };
    let mut writer = unsafe { VectorWriter::new(output) };
    let row_count = reader.row_count();

    for row in 0..row_count {
        if unsafe { !reader.is_valid(row) } {
            unsafe { writer.set_null(row) };
            continue;
        }
        let value = unsafe { reader.read_i64(row) };
        unsafe { writer.write_i64(row, value * 2) };
    }
}

fn register(con: duckdb_connection) -> Result<(), ExtensionError> {
    unsafe {
        ScalarFunctionBuilder::new("double_it")
            .param(TypeId::BigInt)
            .returns(TypeId::BigInt)
            .function(double_it)
            .register(con)?;
    }
    Ok(())
}

entry_point!(my_extension_init_c_api, |con| register(con));

Step 3 — Build and test

# Build the extension
cargo build --release

# Load in DuckDB CLI
duckdb -cmd "LOAD './target/release/libmy_extension.so'; SELECT double_it(21);"
# ┌───────────────┐
# │ double_it(21) │
# │     int64     │
# ├───────────────┤
# │            42 │
# └───────────────┘

macOS: the built library is a .dylib. Windows: a .dll. Adjust the LOAD path accordingly.


What's next?

Installation

Adding quack-rs to an existing extension

Add the following to your extension's Cargo.toml:

[dependencies]
quack-rs = "0.7"
libduckdb-sys = { version = ">=1.4.4, <2", features = ["loadable-extension"] }

Why >=1.4.4, <2? DuckDB 1.4.x and 1.5.x expose the same C API version (v1.2.0), so quack-rs supports both with a single bounded range. The <2 upper bound prevents silent adoption of a future major release whose C API may change in breaking ways — making any such upgrade an explicit, auditable decision. See Extension Anatomy.


Required Cargo.toml settings

Every DuckDB extension requires specific Cargo settings to link and behave correctly:

[lib]
name = "my_extension"       # ← must match extension name exactly (Pitfall P1)
crate-type = ["cdylib", "rlib"]
#             ^^^^^^  cdylib produces the .so/.dylib/.dll DuckDB loads
#                      rlib  allows unit tests and documentation to work

[profile.release]
panic = "abort"             # REQUIRED — panics across FFI are undefined behavior
lto = true                  # recommended — reduces binary size, improves performance
opt-level = 3               # recommended
codegen-units = 1           # recommended — enables full LTO
strip = true                # recommended — reduces binary size

Why panic = "abort"?

Rust's default panic behavior unwinds the stack. When a panic crosses an FFI boundary into DuckDB's C++ code, the result is undefined behavior — DuckDB may crash, corrupt memory, or silently produce wrong results. The panic = "abort" setting converts panics into immediate process termination, which is far safer.

quack-rs itself never panics in FFI callbacks, but this setting protects you if a dependency or your own code panics.
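If you experiment with the default panic = "unwind" profile during development, one general-purpose defensive pattern (not a quack-rs API — names here are illustrative) is to contain panics with std::panic::catch_unwind before they can reach the FFI boundary:

```rust
use std::panic::{self, AssertUnwindSafe};

/// Contain a panic from `body`, returning `fallback` instead of unwinding
/// further. Only meaningful under panic = "unwind"; with panic = "abort"
/// the process terminates before catch_unwind can run.
fn guard_callback<T>(fallback: T, body: impl FnOnce() -> T) -> T {
    panic::catch_unwind(AssertUnwindSafe(body)).unwrap_or(fallback)
}

fn main() {
    // A well-behaved body passes its result through unchanged.
    assert_eq!(guard_callback(0_i64, || 21 * 2), 42);
    // A panicking body is contained; the fallback is returned.
    assert_eq!(guard_callback(-1_i64, || panic!("bug in callback")), -1);
}
```

This is a safety net for development only — the release profile's panic = "abort" remains the required setting.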


Minimum Supported Rust Version

quack-rs requires Rust ≥ 1.84.1.

This MSRV is required for:

  • &raw mut expr syntax for creating raw pointers without references (sound and stable since 1.84.0)
  • const extern fn support

Install or update via:

rustup update stable
rustup default stable

Verify:

rustc --version   # must be ≥ 1.84.1

Development dependencies

For testing with a live DuckDB instance (example-extension tests only):

[dev-dependencies]
duckdb = { version = ">=1.4.4, <2", features = ["bundled"] }

Important: you cannot call any duckdb_* function in a cargo test process when using the loadable-extension feature. See Testing Guide for the full explanation.


Starting a new extension from scratch

Use the scaffold generator to produce a complete project with all required files pre-configured. This is the fastest and most reliable way to start a new extension.

Your First Extension

This page walks through hello-ext, the complete reference example bundled with quack-rs. It registers four functions that together cover every major pattern:

SQL                        Kind       Signature
word_count(text)           Aggregate  VARCHAR → BIGINT
first_word(text)           Scalar     VARCHAR → VARCHAR
generate_series_ext(n)     Table      BIGINT → TABLE(value BIGINT)
CAST(VARCHAR AS INTEGER)   Cast       VARCHAR → INTEGER

Full source: examples/hello-ext/src/lib.rs


Build and try it

cargo build --release --manifest-path examples/hello-ext/Cargo.toml

Then in the DuckDB CLI:

LOAD './examples/hello-ext/target/release/libhello_ext.so';

-- Aggregate: total words across all rows
SELECT word_count(sentence) FROM (
    VALUES ('hello world'), ('one two three'), (NULL)
) t(sentence);
-- → 5  (2 + 3; NULL contributes 0)

-- Scalar: first word of each row
SELECT first_word(sentence) FROM (
    VALUES ('hello world'), ('  padded  '), (''), (NULL)
) t(sentence);
-- → 'hello', 'padded', '', NULL

Overview

An extension has four parts:

  1. State struct — holds data accumulated during aggregation (aggregate only)
  2. Callbacks — update, combine, finalize, state_size, state_init, state_destroy (aggregate) or a single function callback (scalar)
  3. Registration — wire callbacks to DuckDB via AggregateFunctionBuilder / ScalarFunctionBuilder
  4. Entry point — DuckDB's initialization hook, generated by entry_point!

Part 1 — Aggregate function: word_count

An aggregate function accumulates state across many rows and emits one result per group.

1a. The state struct

#[derive(Default, Debug)]
struct WordCountState {
    count: i64,
}

impl AggregateState for WordCountState {}

AggregateState is a marker trait — no methods required. FfiState<WordCountState> wraps it in a heap-allocated Box<T> behind a raw pointer and manages the full lifecycle (init, combine, destroy).
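The lifecycle FfiState manages can be illustrated with a plain-Rust sketch of the same technique — Box::into_raw on init, Box::from_raw plus pointer-nulling on destroy. This illustrates the pattern, not quack-rs's exact internals:

```rust
use std::ptr;

// Minimal illustration of the pointer-slot lifecycle: the host allocates a
// slot the size of one pointer per group; init heap-allocates the state,
// destroy reclaims it and nulls the slot so a second destroy is a no-op.
#[derive(Default, Debug)]
struct WordCountState {
    count: i64,
}

fn init_slot(slot: &mut *mut WordCountState) {
    *slot = Box::into_raw(Box::new(WordCountState::default()));
}

fn destroy_slot(slot: &mut *mut WordCountState) {
    if !slot.is_null() {
        // SAFETY: the pointer was produced by Box::into_raw in init_slot.
        unsafe { drop(Box::from_raw(*slot)) };
        *slot = ptr::null_mut(); // null it out: double-destroy becomes safe
    }
}

fn main() {
    let mut slot: *mut WordCountState = ptr::null_mut();
    init_slot(&mut slot);
    // SAFETY: slot was just initialized and is non-null.
    unsafe { (*slot).count += 2 };
    assert_eq!(unsafe { (*slot).count }, 2);
    destroy_slot(&mut slot);
    destroy_slot(&mut slot); // no-op, not a double-free
    assert!(slot.is_null());
}
```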

1b. state_size and state_init

These two callbacks are always identical boilerplate — delegate to FfiState:

unsafe extern "C" fn wc_state_size(_info: duckdb_function_info) -> idx_t {
    FfiState::<WordCountState>::size_callback(_info)
}

unsafe extern "C" fn wc_state_init(info: duckdb_function_info, state: duckdb_aggregate_state) {
    unsafe { FfiState::<WordCountState>::init_callback(info, state) };
}

size_callback returns size_of::<*mut WordCountState>() — DuckDB allocates a pointer-slot per group. init_callback runs Box::new(WordCountState::default()) and writes the pointer into that slot.

1c. update — accumulate one batch

unsafe extern "C" fn wc_update(
    _info: duckdb_function_info,
    input: duckdb_data_chunk,
    states: *mut duckdb_aggregate_state,
) {
    let reader = unsafe { VectorReader::new(input, 0) };
    let row_count = reader.row_count();

    for row in 0..row_count {
        if !unsafe { reader.is_valid(row) } {
            continue; // NULL input → skip (contributes 0 words)
        }
        let s = unsafe { reader.read_str(row) };
        let words = count_words(s);

        let state_ptr = unsafe { *states.add(row) };
        if let Some(st) = unsafe { FfiState::<WordCountState>::with_state_mut(state_ptr) } {
            st.count += words;
        }
    }
}

Key points:

  • Check is_valid(row) before reading — never dereference an invalid (NULL) row
  • VectorReader::new(chunk, col) gives column col from the chunk
  • count_words is pure Rust — no unsafe, easy to unit-test separately
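
count_words is referenced but not shown in this excerpt; a minimal implementation consistent with the unit tests later on this page would be:

```rust
/// Count whitespace-separated words; empty or all-whitespace input → 0.
pub fn count_words(s: &str) -> i64 {
    s.split_whitespace().count() as i64
}

fn main() {
    assert_eq!(count_words("hello world"), 2);
    assert_eq!(count_words("\t\nhello\tworld\n"), 2);
    assert_eq!(count_words("   "), 0);
}
```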

1d. combine — merge parallel results

Pitfall L1: DuckDB creates fresh zero-initialized target states before calling combine. You must copy all fields — not just the result field. In an aggregate with config fields (e.g., a histogram with a bin_width) you must also copy those, or results will be silently corrupted.

unsafe extern "C" fn wc_combine(
    _info: duckdb_function_info,
    source: *mut duckdb_aggregate_state,
    target: *mut duckdb_aggregate_state,
    count: idx_t,
) {
    for i in 0..count as usize {
        let src_ptr = unsafe { *source.add(i) };
        let tgt_ptr = unsafe { *target.add(i) };
        let src = unsafe { FfiState::<WordCountState>::with_state(src_ptr) };
        let tgt = unsafe { FfiState::<WordCountState>::with_state_mut(tgt_ptr) };
        if let (Some(s), Some(t)) = (src, tgt) {
            t.count += s.count;
            // If you add fields to WordCountState, combine them here too.
        }
    }
}
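To see why the config-field warning in Pitfall L1 matters, here is a pure-Rust sketch using a hypothetical HistogramState (not part of hello-ext): because the target state is fresh and zero-initialized, a combine that merges only the result field silently drops bin_width.

```rust
#[derive(Default, Debug, PartialEq)]
struct HistogramState {
    bin_width: f64, // configuration, set once when the state is created
    count: i64,     // accumulated result
}

/// Wrong: merges only the result; the fresh target keeps bin_width = 0.0,
/// so every update routed to it afterwards bins incorrectly.
fn combine_wrong(src: &HistogramState, tgt: &mut HistogramState) {
    tgt.count += src.count;
}

/// Right: configuration fields are propagated into the fresh target too.
fn combine_right(src: &HistogramState, tgt: &mut HistogramState) {
    tgt.count += src.count;
    if tgt.bin_width == 0.0 {
        tgt.bin_width = src.bin_width;
    }
}

fn main() {
    let src = HistogramState { bin_width: 0.5, count: 10 };

    let mut tgt = HistogramState::default(); // zero-initialized, like DuckDB's
    combine_wrong(&src, &mut tgt);
    assert_eq!(tgt.bin_width, 0.0); // config silently lost

    let mut tgt = HistogramState::default();
    combine_right(&src, &mut tgt);
    assert_eq!(tgt, HistogramState { bin_width: 0.5, count: 10 });
}
```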

1e. finalize — write output

unsafe extern "C" fn wc_finalize(
    _info: duckdb_function_info,
    source: *mut duckdb_aggregate_state,
    result: duckdb_vector,
    count: idx_t,
    offset: idx_t,
) {
    let mut writer = unsafe { VectorWriter::new(result) };

    for i in 0..count as usize {
        let state_ptr = unsafe { *source.add(i) };
        match unsafe { FfiState::<WordCountState>::with_state(state_ptr) } {
            Some(st) => unsafe { writer.write_i64(offset as usize + i, st.count) },
            None     => unsafe { writer.set_null(offset as usize + i) },
        }
    }
}

offset is DuckDB's output row offset — always use offset as usize + i, not just i.

1f. state_destroy

unsafe extern "C" fn wc_state_destroy(
    states: *mut duckdb_aggregate_state,
    count: idx_t,
) {
    unsafe { FfiState::<WordCountState>::destroy_callback(states, count) };
}

destroy_callback calls Box::from_raw and nulls each pointer, preventing double-free.


Part 2 — Scalar function: first_word

A scalar function processes one data chunk and returns one output value per row. The callback receives the full chunk and an output vector (not per-row state pointers).

Key rule: always propagate NULL

If the input row is NULL, write NULL to output — never read from an invalid row.

unsafe extern "C" fn first_word_scalar(
    _info: duckdb_function_info,
    input: duckdb_data_chunk,
    output: duckdb_vector,
) {
    let reader = unsafe { VectorReader::new(input, 0) };
    let mut writer = unsafe { VectorWriter::new(output) };
    let row_count = reader.row_count();

    for row in 0..row_count {
        if !unsafe { reader.is_valid(row) } {
            unsafe { writer.set_null(row) }; // NULL in → NULL out
            continue;
        }
        let s = unsafe { reader.read_str(row) };
        unsafe { writer.write_varchar(row, first_word(s)) };
    }
}

The pure logic:

pub fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}

Note: set_null internally calls duckdb_vector_ensure_validity_writable before writing the null flag — this is required by DuckDB and handled for you by VectorWriter.


Part 3 — Registration

unsafe fn register(con: libduckdb_sys::duckdb_connection) -> Result<(), ExtensionError> {
    unsafe {
        AggregateFunctionBuilder::new("word_count")
            .param(TypeId::Varchar)
            .returns(TypeId::BigInt)
            .state_size(wc_state_size)
            .init(wc_state_init)
            .update(wc_update)
            .combine(wc_combine)
            .finalize(wc_finalize)
            .destructor(wc_state_destroy)
            .register(con)?;

        ScalarFunctionBuilder::new("first_word")
            .param(TypeId::Varchar)
            .returns(TypeId::Varchar)
            .function(first_word_scalar)
            .register(con)?;
    }
    Ok(())
}

Both builders call the DuckDB C API internally. register returns Err if DuckDB reports a failure — this propagates to the entry point and is surfaced to the user.


Part 4 — Entry point

quack_rs::entry_point!(hello_ext_init_c_api, |con| unsafe { register(con) });

This one line emits:

#[no_mangle]
pub unsafe extern "C" fn hello_ext_init_c_api(
    info: duckdb_extension_info,
    access: *const duckdb_extension_access,
) -> bool {
    unsafe {
        quack_rs::entry_point::init_extension(
            info, access, quack_rs::DUCKDB_API_VERSION,
            |con| unsafe { register(con) },
        )
    }
}

Pass the full symbol name hello_ext_init_c_api here. DuckDB looks up this exact symbol when loading the extension. See The Entry Point for the full initialization sequence.


Unit tests (no DuckDB process needed)

Test pure logic directly:

#[test]
fn count_words_whitespace_variants() {
    assert_eq!(count_words("  hello  world  "), 2);
    assert_eq!(count_words("\t\nhello\tworld\n"), 2);
    assert_eq!(count_words("   "), 0); // all whitespace → 0
}

#[test]
fn first_word_empty_and_whitespace() {
    assert_eq!(first_word(""), "");
    assert_eq!(first_word("   "), "");
}

Test aggregate state with AggregateTestHarness:

#[test]
fn word_count_null_rows_are_skipped() {
    // NULL rows: the callback skips them (no update call)
    let mut h = AggregateTestHarness::<WordCountState>::new();
    h.update(|s| s.count += count_words("hello"));
    // NULL row omitted — models callback skip
    h.update(|s| s.count += count_words("world"));
    assert_eq!(h.finalize().count, 2);
}

#[test]
fn word_count_combine() {
    let mut h1 = AggregateTestHarness::<WordCountState>::new();
    h1.update(|s| s.count += count_words("hello world")); // 2

    let mut h2 = AggregateTestHarness::<WordCountState>::new();
    h2.update(|s| s.count += count_words("one two three four")); // 4

    h2.combine(&h1, |src, tgt| tgt.count += src.count);
    assert_eq!(h2.finalize().count, 6);
}

Run all tests with:

cargo test --manifest-path examples/hello-ext/Cargo.toml

See the Testing Guide for the full test strategy.

Project Scaffold

quack_rs::scaffold::generate_scaffold generates a complete, submission-ready DuckDB community extension project from a single function call. No manual file creation, no copy-pasting templates.


What it generates

my_extension/
├── Cargo.toml                          # cdylib crate, pinned deps, release profile
├── Makefile                            # delegates to cargo + extension-ci-tools
├── extension_config.cmake              # required by extension-ci-tools
├── src/
│   ├── lib.rs                          # entry point template
│   └── wasm_lib.rs                     # WASM staticlib shim
├── description.yml                     # community extension metadata
├── test/
│   └── sql/
│       └── my_extension.test           # SQLLogicTest skeleton
├── .github/
│   └── workflows/
│       └── extension-ci.yml            # cross-platform CI workflow
├── .gitmodules                         # extension-ci-tools submodule
├── .gitignore
└── .cargo/
    └── config.toml                     # Windows CRT static linking

Usage

use quack_rs::scaffold::{ScaffoldConfig, generate_scaffold};
use std::path::Path;

fn main() {
    let config = ScaffoldConfig {
        name: "my_extension".to_string(),
        description: "My DuckDB extension".to_string(),
        version: "0.1.0".to_string(),
        license: "MIT".to_string(),
        maintainer: "Your Name".to_string(),
        github_repo: "yourorg/duckdb-my-extension".to_string(),
        excluded_platforms: vec![],
    };

    let files = generate_scaffold(&config).expect("scaffold generation failed");

    for file in &files {
        let path = Path::new(&file.path);
        if let Some(parent) = path.parent() {
            std::fs::create_dir_all(parent).unwrap();
        }
        std::fs::write(path, &file.content).unwrap();
        println!("created {}", file.path);
    }
}

ScaffoldConfig fields

Field                Type         Description
name                 String       Extension name — must match [lib] name in Cargo.toml and description.yml
description          String       One-line description for description.yml
version              String       Semver or git hash — validated by validate_extension_version
license              String       SPDX license identifier (e.g., "MIT", "Apache-2.0")
maintainer           String       Your name or org, listed in description.yml
github_repo          String       "owner/repo" format
excluded_platforms   Vec<String>  Platforms to skip (e.g., ["wasm_mvp", "wasm_eh"])

Name validation

Extension names must pass the scaffold generator's validation rules.

Use vendor-prefixed names to avoid collisions: myorg_analytics, not analytics.

The scaffold generator validates the name before generating any files and returns an error if it violates the rules.


After scaffolding

cd my_extension
git init
git submodule add https://github.com/duckdb/extension-ci-tools.git extension-ci-tools
git submodule update --init --recursive
make configure
make release

Then add your function logic in src/lib.rs, write your SQLLogicTests in test/sql/my_extension.test, and push to GitHub — CI runs automatically.


Excluded platforms

Some extensions cannot be built for all platforms (e.g., extensions that depend on platform-specific system libraries, or WASM environments that lack threading).

ScaffoldConfig {
    excluded_platforms: vec![
        "wasm_mvp".to_string(),
        "wasm_eh".to_string(),
        "wasm_threads".to_string(),
    ],
    // ...
}

Validate individual platform names with quack_rs::validate::validate_platform, or a semicolon-delimited string (as used in description.yml) with quack_rs::validate::validate_excluded_platforms_str.

Extension Anatomy

A DuckDB loadable extension is a shared library (.so / .dylib / .dll) that DuckDB loads at runtime. Understanding what DuckDB expects makes every other part of quack-rs click.


The initialization sequence

When DuckDB loads your extension, it:

  1. Opens the shared library and looks up the symbol {name}_init_c_api
  2. Calls that function with an info handle and a pointer to function dispatch pointers
  3. Your function must:
     a. Call duckdb_rs_extension_api_init(info, access, api_version) to initialize the dispatch table
     b. Get the duckdb_database handle via access.get_database(info)
     c. Open a duckdb_connection via duckdb_connect
     d. Register functions on that connection
     e. Disconnect
     f. Return true (success) or false (failure)

quack_rs::entry_point::init_extension performs all of this correctly. The entry_point! macro generates the required #[no_mangle] extern "C" symbol:

entry_point!(my_extension_init_c_api, |con| register(con));
// emits: #[no_mangle] pub unsafe extern "C" fn my_extension_init_c_api(...)

Symbol naming

The symbol name must be {extension_name}_init_c_api — all lowercase, underscores only. If the symbol is missing or misnamed, DuckDB fails to load the extension.

Extension name: "word_count_ext"
Required symbol: word_count_ext_init_c_api

Pass the full symbol name to entry_point!. This keeps the exported name explicit and visible at the call site — no hidden identifier manipulation at compile time.


The loadable-extension feature

libduckdb-sys with features = ["loadable-extension"] fundamentally changes how DuckDB API functions are called:

Without feature:  duckdb_query(...)  →  calls linked libduckdb directly
With feature:     duckdb_query(...)  →  dispatches through an AtomicPtr table

The AtomicPtr table starts as null. DuckDB fills it in by calling duckdb_rs_extension_api_init. This means:

  • Any call before duckdb_rs_extension_api_init panics with "DuckDB API not initialized"
  • In cargo test, you cannot call any duckdb_* function — the table is never initialized

This is why quack-rs uses AggregateTestHarness for testing: it simulates the aggregate lifecycle in pure Rust, with zero DuckDB API calls.
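The mechanism can be sketched generically in a few lines — a simplified illustration of the technique, not the code libduckdb-sys actually generates:

```rust
use std::sync::atomic::{AtomicPtr, Ordering};

// One slot of a dispatch table: starts null until the host fills it in.
static QUERY_FN: AtomicPtr<()> = AtomicPtr::new(std::ptr::null_mut());

// Stand-in for the function pointer DuckDB hands over at load time.
unsafe extern "C" fn host_query(x: i64) -> i64 {
    x * 2
}

/// What an api-init step does: store the host's function pointers.
fn api_init() {
    QUERY_FN.store(host_query as *mut (), Ordering::Release);
}

/// What a generated wrapper does: load the slot and jump through it,
/// panicking if the table was never initialized (the cargo-test trap).
fn duckdb_query_wrapper(x: i64) -> i64 {
    let p = QUERY_FN.load(Ordering::Acquire);
    assert!(!p.is_null(), "DuckDB API not initialized");
    // SAFETY: p was stored from a function of exactly this signature.
    let f: unsafe extern "C" fn(i64) -> i64 = unsafe { std::mem::transmute(p) };
    unsafe { f(x) }
}

fn main() {
    api_init(); // without this call, duckdb_query_wrapper panics
    assert_eq!(duckdb_query_wrapper(21), 42);
}
```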


Dependency model

graph TD
    EXT["your-extension"]
    QR["quack-rs"]
    LDS["libduckdb-sys >=1.4.4, <2<br/>{loadable-extension}<br/>(headers only — no linked library)"]

    EXT --> QR
    EXT --> LDS
    QR  --> LDS

The loadable-extension feature produces a shared library that does not statically link DuckDB. Instead, it receives DuckDB's function pointers at load time. This is the correct model for extensions: you run inside DuckDB's process, using its memory and threading.


Version support

libduckdb-sys = ">=1.4.4, <2" — the bounded range is intentional.

DuckDB 1.4.x and 1.5.x both expose C API version v1.2.0 (the version string embedded in duckdb_rs_extension_api_init). quack-rs has been E2E tested against both releases. Using a range rather than an exact pin means:

  • Extension authors can choose their DuckDB target (pin to =1.4.4 or =1.5.0 in their own Cargo.toml) and resolve cleanly against quack-rs
  • quack-rs itself doesn't force a DuckDB downgrade on users

The <2 upper bound is equally intentional: it prevents silent adoption of a future major release that may introduce breaking C API changes. Upgrading beyond the 1.x band requires an explicit quack-rs release that audits the new C API surface.

For your own extension's Cargo.toml, pin libduckdb-sys to the exact DuckDB version you build and test against (e.g., =1.5.0). Your extension binary only loads in the DuckDB version it was compiled for in any case; the range matters only for quack-rs itself as a library dependency.
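For example (an illustrative pin — substitute the DuckDB version you actually target):

```toml
# Your extension's Cargo.toml: exact DuckDB target,
# while quack-rs keeps its bounded >=1.4.4, <2 range.
[dependencies]
quack-rs = "0.7"
libduckdb-sys = { version = "=1.5.0", features = ["loadable-extension"] }
```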


Binary compatibility

Extension binaries are tied to a specific DuckDB version and platform. Key facts:

  • An extension compiled for DuckDB 1.4.4 will not load in DuckDB 1.5.0
  • DuckDB verifies binary compatibility at load time and refuses mismatched binaries
  • Official DuckDB extensions are cryptographically signed; community extensions are not
  • To load unsigned extensions: SET allow_unsigned_extensions = true (development only)
  • The community extension CI provides automated cross-platform builds for each DuckDB release

The Entry Point

Every DuckDB extension must export a single C-callable symbol that DuckDB invokes at load time. quack-rs provides two ways to create it.


Option A: entry_point_v2! with Connection (recommended)

Added in v0.4.0.

The entry_point_v2! macro gives your closure a &Connection instead of a raw duckdb_connection. The Connection type implements the Registrar trait, which provides ergonomic methods for registering every function type:

use quack_rs::entry_point_v2;
use quack_rs::connection::{Connection, Registrar};
use quack_rs::error::ExtensionError;

unsafe fn register(con: &Connection) -> Result<(), ExtensionError> {
    unsafe {
        con.register_scalar(/* ScalarFunctionBuilder */)?;
        con.register_aggregate(/* AggregateFunctionBuilder */)?;
        con.register_table(/* TableFunctionBuilder */)?;
        con.register_cast(/* CastFunctionBuilder */)?;
        con.register_scalar_set(/* ScalarFunctionSetBuilder */)?;
        con.register_aggregate_set(/* AggregateFunctionSetBuilder */)?;
        con.register_sql_macro(/* SqlMacro */)?;
        con.register_replacement_scan(/* callback, data, destructor */);
        // con.register_copy_function(/* CopyFunctionBuilder */)?;  // requires duckdb-1-5
    }
    Ok(())
}

entry_point_v2!(my_extension_init_c_api, |con| unsafe { register(con) });

This emits:

#[no_mangle]
pub unsafe extern "C" fn my_extension_init_c_api(
    info: duckdb_extension_info,
    access: *const duckdb_extension_access,
) -> bool {
    unsafe {
        quack_rs::entry_point::init_extension_v2(
            info, access, quack_rs::DUCKDB_API_VERSION,
            |con| unsafe { register(con) },
        )
    }
}

Pass the full symbol name to the macro. The symbol {name}_init_c_api must match the name field in description.yml and the [lib] name in Cargo.toml.

Why Connection over raw duckdb_connection?

Feature           entry_point! (raw)              entry_point_v2! (Connection)
Receives          duckdb_connection               &Connection
Registration      Call builders' .register(con)   Call con.register_*()
Type safety       Raw pointer                     Wrapper with lifetime
Future-proofing   Tied to C pointer               Can evolve without breaking extensions

Option B: The entry_point! macro

The original macro passes a raw duckdb_connection to your closure. It works identically but requires you to pass the connection to each builder's .register():

use quack_rs::entry_point;
use quack_rs::error::ExtensionError;

fn register(con: libduckdb_sys::duckdb_connection) -> Result<(), ExtensionError> {
    unsafe {
        // register your functions here
        Ok(())
    }
}

entry_point!(my_extension_init_c_api, |con| register(con));

Option C: Manual entry point

If you need full control (e.g., multiple registration functions, conditional logic):

use quack_rs::entry_point::init_extension;
use libduckdb_sys::{duckdb_extension_info, duckdb_extension_access};

#[no_mangle]
pub unsafe extern "C" fn my_extension_init_c_api(
    info: duckdb_extension_info,
    access: *const duckdb_extension_access,
) -> bool {
    unsafe {
        init_extension(info, access, quack_rs::DUCKDB_API_VERSION, |con| {
            register_scalar_functions(con)?;
            register_aggregate_functions(con)?;
            register_sql_macros(con)?;
            Ok(())
        })
    }
}

What init_extension does

flowchart TD
    A["**1. duckdb_rs_extension_api_init**(info, access, version)<br/>Fills the global AtomicPtr dispatch table"]
    B["**2. access.get_database**(info)<br/>Returns the duckdb_database handle"]
    C["**3. duckdb_connect**(db, &amp;mut con)<br/>Opens a connection for function registration"]
    D["**4. register**(con) ← your closure"]
    E["**5. duckdb_disconnect**(&amp;mut con)<br/>Always runs, even if registration failed"]
    F{Error?}
    G["return **true**"]
    H["return **false**<br/>error reported via access.set_error"]

    A --> B --> C --> D --> E --> F
    F -->|no| G
    F -->|yes| H

    style G fill:#1c3b1c,stroke:#4a9e4a,color:#c8ecc8
    style H fill:#3b1c1c,stroke:#9e4a4a,color:#ecc8c8

Errors from step 4 are reported back to DuckDB via access.set_error and the function returns false. DuckDB then surfaces the error message to the user.


The C API version constant

pub const DUCKDB_API_VERSION: &str = "v1.2.0";

Pitfall P2: This is the C API version, not the DuckDB release version. DuckDB 1.4.x, 1.5.0, and 1.5.1 all use C API version v1.2.0. Passing the wrong string causes the metadata script to fail or produce incorrect metadata. See Pitfall P2.


No panics in the entry point

init_extension never panics. All error paths use Result and ?. If your registration closure returns Err, the error message is reported to DuckDB via access.set_error and the extension fails to load gracefully.

Never use unwrap() or expect() in FFI callbacks. See Pitfall L3.

Error Handling

quack-rs uses a single error type throughout: ExtensionError.


ExtensionError

use quack_rs::error::{ExtensionError, ExtResult};

// From a string literal
let e = ExtensionError::from("something went wrong");

// From a format string
let e = ExtensionError::new(format!("failed to register '{}': code {}", name, code));

// Wrapping another error
let e = ExtensionError::from_error(some_std_error);

ExtensionError implements:

  • std::error::Error
  • Display, Debug, Clone, PartialEq, Eq
  • From<&str>, From<String>, From<Box<dyn Error>>

ExtResult<T>

A type alias for Result<T, ExtensionError>, used throughout the SDK:

pub type ExtResult<T> = Result<T, ExtensionError>;

Propagating errors with ?

In your registration function:

#![allow(unused)]
fn main() {
fn register(con: duckdb_connection) -> Result<(), ExtensionError> {
    unsafe {
        ScalarFunctionBuilder::new("my_fn")
            .param(TypeId::BigInt)
            .returns(TypeId::BigInt)
            .function(my_fn)
            .register(con)?;   // ← ? propagates registration errors

        SqlMacro::scalar("my_macro", &["x"], "x + 1")?
            .register(con)?;

        Ok(())
    }
}
}

If any registration call fails, ? returns the error from register, which init_extension then reports to DuckDB via access.set_error.


Error reporting to DuckDB

init_extension converts ExtensionError to a CString for the DuckDB error callback:

#![allow(unused)]
fn main() {
pub fn to_c_string(&self) -> CString {
    // Truncates at the first null byte if message contains one
    CString::new(self.message.as_bytes()).unwrap_or_else(...)
}
}

DuckDB surfaces this string to the user as the extension load error.
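
A self-contained sketch of the null-byte-safe conversion described above. This is a plausible implementation of the truncation behavior, not the SDK's exact code:

```rust
use std::ffi::CString;

// Convert an error message to a CString, truncating at the first interior
// null byte instead of panicking (CString::new rejects interior nulls).
fn to_c_string_sketch(message: &str) -> CString {
    match CString::new(message) {
        Ok(c) => c,
        Err(e) => {
            let prefix = message.as_bytes()[..e.nul_position()].to_vec();
            // SAFETY: the prefix ends before the first null byte,
            // so it contains no interior nulls.
            unsafe { CString::from_vec_unchecked(prefix) }
        }
    }
}

fn main() {
    assert_eq!(to_c_string_sketch("plain error").to_bytes(), b"plain error");
    assert_eq!(to_c_string_sketch("cut\0here").to_bytes(), b"cut");
}
```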


No panics, ever

The cardinal rule of DuckDB extension development:

Never unwrap(), expect(), or panic!() in any code path that DuckDB may call.

Rust panics that cross FFI boundaries are undefined behavior. With panic = "abort" in the release profile, a panic terminates the process — which is safer than UB, but still unacceptable in production.

Safe patterns

#![allow(unused)]
fn main() {
// ✅ Use Option methods
if let Some(s) = FfiState::<MyState>::with_state_mut(state_ptr) {
    s.count += 1;
}

// ✅ Use Result and ?
let value = some_fallible_call()?;

// ✅ Use unwrap_or / unwrap_or_else / map
let count = maybe_count.unwrap_or(0);

// ❌ Never in FFI callbacks
let s = FfiState::<MyState>::with_state_mut(state_ptr).unwrap(); // undefined behavior
}

In init_extension

init_extension wraps everything in match and reports errors via set_error — it can never panic regardless of what your registration closure returns.

Type System

quack-rs provides TypeId and LogicalType to bridge Rust types and DuckDB column types.


TypeId

TypeId is an ergonomic enum covering all DuckDB column types:

#![allow(unused)]
fn main() {
use quack_rs::types::TypeId;

TypeId::Boolean
TypeId::TinyInt     // i8
TypeId::SmallInt    // i16
TypeId::Integer     // i32
TypeId::BigInt      // i64
TypeId::UTinyInt    // u8
TypeId::USmallInt   // u16
TypeId::UInteger    // u32
TypeId::UBigInt     // u64
TypeId::HugeInt     // i128
TypeId::UHugeInt    // u128
TypeId::Float       // f32
TypeId::Double      // f64
TypeId::Timestamp
TypeId::TimestampTz
TypeId::TimestampS
TypeId::TimestampMs
TypeId::TimestampNs
TypeId::Date
TypeId::Time
TypeId::TimeTz
TypeId::Interval
TypeId::Varchar
TypeId::Blob
TypeId::Decimal
TypeId::Enum
TypeId::List
TypeId::Struct
TypeId::Map
TypeId::Uuid
TypeId::Union
TypeId::Bit
TypeId::Array
TypeId::TimeNs           // duckdb-1-5
TypeId::Any              // duckdb-1-5
TypeId::Varint           // duckdb-1-5
TypeId::SqlNull          // duckdb-1-5
TypeId::IntegerLiteral   // duckdb-1-5
TypeId::StringLiteral    // duckdb-1-5
}

TypeId is Copy, Clone, Debug, PartialEq, Eq, and Display.

SQL name

#![allow(unused)]
fn main() {
assert_eq!(TypeId::BigInt.sql_name(), "BIGINT");
assert_eq!(TypeId::Varchar.sql_name(), "VARCHAR");
assert_eq!(format!("{}", TypeId::Timestamp), "TIMESTAMP");
}

DuckDB constant

TypeId::to_duckdb_type() returns the DUCKDB_TYPE_* integer constant from libduckdb-sys. You rarely need this directly — it's called internally by LogicalType::new.

Reverse conversion

TypeId::from_duckdb_type(raw) converts a raw DUCKDB_TYPE constant back into a TypeId. Panics if the value does not match any known constant.

#![allow(unused)]
fn main() {
use quack_rs::types::TypeId;

let type_id = TypeId::from_duckdb_type(libduckdb_sys::DUCKDB_TYPE_DUCKDB_TYPE_BIGINT);
assert_eq!(type_id, TypeId::BigInt);
}

LogicalType

LogicalType is a RAII wrapper around DuckDB's duckdb_logical_type. It is used internally by the function builders.

#![allow(unused)]
fn main() {
use quack_rs::types::{LogicalType, TypeId};

let lt = LogicalType::new(TypeId::Varchar);
// lt.as_raw() returns the duckdb_logical_type pointer
// Drop calls duckdb_destroy_logical_type automatically
}

Pitfall L7: duckdb_create_logical_type allocates memory that must be freed with duckdb_destroy_logical_type. LogicalType's Drop implementation does this automatically, preventing the memory leak that occurs when calling the DuckDB C API directly. See Pitfall L7.

You almost never need to create LogicalType directly. The function builders (ScalarFunctionBuilder, AggregateFunctionBuilder) create and destroy them internally.

Constructors

| Constructor | Creates |
|---|---|
| LogicalType::new(type_id) | Simple type from a TypeId |
| LogicalType::from_raw(ptr) | Takes ownership of a raw duckdb_logical_type handle (unsafe) |
| LogicalType::decimal(width, scale) | DECIMAL(width, scale) |
| LogicalType::list(element_type) | LIST<element_type> from a TypeId |
| LogicalType::list_from_logical(element) | LIST<element> from an existing LogicalType |
| LogicalType::map(key, value) | MAP<key, value> from TypeIds |
| LogicalType::map_from_logical(key, value) | MAP<key, value> from existing LogicalTypes |
| LogicalType::struct_type(fields) | STRUCT from &[(&str, TypeId)] |
| LogicalType::struct_type_from_logical(fields) | STRUCT from &[(&str, LogicalType)] |
| LogicalType::union_type(members) | UNION from &[(&str, TypeId)] |
| LogicalType::union_type_from_logical(members) | UNION from &[(&str, LogicalType)] |
| LogicalType::enum_type(members) | ENUM from &[&str] |
| LogicalType::array(element_type, size) | ARRAY<element_type>[size] from a TypeId |
| LogicalType::array_from_logical(element, size) | ARRAY<element>[size] from an existing LogicalType |

Introspection methods

All introspection methods are unsafe (require a valid DuckDB runtime handle).

| Method | Returns | Applicable to |
|---|---|---|
| get_type_id() | TypeId | Any |
| get_alias() | Option<String> | Any |
| set_alias(alias) | () | Any |
| decimal_width() | u8 | DECIMAL |
| decimal_scale() | u8 | DECIMAL |
| decimal_internal_type() | TypeId | DECIMAL |
| enum_internal_type() | TypeId | ENUM |
| enum_dictionary_size() | u32 | ENUM |
| enum_dictionary_value(index) | String | ENUM |
| list_child_type() | LogicalType | LIST |
| map_key_type() | LogicalType | MAP |
| map_value_type() | LogicalType | MAP |
| struct_child_count() | u64 | STRUCT |
| struct_child_name(index) | String | STRUCT |
| struct_child_type(index) | LogicalType | STRUCT |
| union_member_count() | u64 | UNION |
| union_member_name(index) | String | UNION |
| union_member_type(index) | LogicalType | UNION |
| array_size() | u64 | ARRAY |
| array_child_type() | LogicalType | ARRAY |

Rust type ↔ DuckDB type mapping

When reading from or writing to vectors, use the corresponding VectorReader/VectorWriter method:

| DuckDB type | TypeId | Reader method | Writer method |
|---|---|---|---|
| BOOLEAN | Boolean | read_bool | write_bool |
| TINYINT | TinyInt | read_i8 | write_i8 |
| SMALLINT | SmallInt | read_i16 | write_i16 |
| INTEGER | Integer | read_i32 | write_i32 |
| BIGINT | BigInt | read_i64 | write_i64 |
| UTINYINT | UTinyInt | read_u8 | write_u8 |
| USMALLINT | USmallInt | read_u16 | write_u16 |
| UINTEGER | UInteger | read_u32 | write_u32 |
| UBIGINT | UBigInt | read_u64 | write_u64 |
| FLOAT | Float | read_f32 | write_f32 |
| DOUBLE | Double | read_f64 | write_f64 |
| VARCHAR | Varchar | read_str | write_varchar |
| INTERVAL | Interval | read_interval | write_interval |

NULLs are handled separately — see NULL Handling & Strings.

Scalar Functions

Scalar functions transform a batch of input rows into a corresponding batch of output values. They are the most common DuckDB extension pattern — equivalent to SQL's built-in functions like length(), upper(), or sin().


Function signature

DuckDB calls your scalar function once per data chunk (not once per row). The signature is:

#![allow(unused)]
fn main() {
unsafe extern "C" fn my_fn(
    info: duckdb_function_info,     // function metadata (rarely needed)
    input: duckdb_data_chunk,       // input data — one or more columns
    output: duckdb_vector,          // output vector — one value per input row
) {
    // body: read inputs, transform values, write outputs (see examples below)
}
}

Inside the function, you:

  1. Create a VectorReader for each input column
  2. Create a VectorWriter for the output
  3. Loop over rows, checking for NULLs and transforming values

Registration

#![allow(unused)]
fn main() {
use quack_rs::scalar::ScalarFunctionBuilder;
use quack_rs::types::TypeId;

unsafe fn register(con: duckdb_connection) -> Result<(), ExtensionError> {
    unsafe {
        ScalarFunctionBuilder::new("my_fn")
            .param(TypeId::BigInt)      // first parameter type
            .param(TypeId::BigInt)      // second parameter type (if any)
            .returns(TypeId::BigInt)    // return type
            .function(my_fn)            // callback
            .register(con)?;
    }
    Ok(())
}
}

The builder validates that returns and function are set before calling duckdb_register_scalar_function. If DuckDB reports failure, register returns Err.

Validated registration

For user-configurable function names (e.g., from a config file), use try_new:

#![allow(unused)]
fn main() {
ScalarFunctionBuilder::try_new(name)?   // validates name before building
    .param(TypeId::Varchar)
    .returns(TypeId::Varchar)
    .function(my_fn)
    .register(con)?;
}

try_new validates the name against DuckDB naming rules: [a-z_][a-z0-9_]*, max 256 characters. new panics on invalid names (suitable for compile-time-known names only).
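
The naming rule above can be checked in a few lines of plain Rust. This is an illustrative checker for the stated pattern, not the SDK's actual validator:

```rust
// Check a name against [a-z_][a-z0-9_]*, max 256 characters.
fn is_valid_function_name(name: &str) -> bool {
    let mut chars = name.chars();
    let first_ok = matches!(chars.next(), Some('a'..='z' | '_'));
    first_ok
        && name.len() <= 256
        && chars.all(|c| matches!(c, 'a'..='z' | '0'..='9' | '_'))
}

fn main() {
    assert!(is_valid_function_name("my_fn2"));
    assert!(!is_valid_function_name("2fn"));  // must not start with a digit
    assert!(!is_valid_function_name("MyFn")); // uppercase not allowed
    assert!(!is_valid_function_name(""));     // empty name rejected
}
```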


Complete example: double_it(BIGINT) → BIGINT

#![allow(unused)]
fn main() {
use quack_rs::vector::{VectorReader, VectorWriter};
use libduckdb_sys::{duckdb_function_info, duckdb_data_chunk, duckdb_vector};

unsafe extern "C" fn double_it(
    _info: duckdb_function_info,
    input: duckdb_data_chunk,
    output: duckdb_vector,
) {
    // SAFETY: DuckDB provides valid chunk and vector pointers.
    let reader = unsafe { VectorReader::new(input, 0) };   // column 0
    let mut writer = unsafe { VectorWriter::new(output) };
    let row_count = reader.row_count();

    for row in 0..row_count {
        if unsafe { !reader.is_valid(row) } {
            // NULL input → NULL output
            // SAFETY: row < row_count, writer is valid.
            unsafe { writer.set_null(row) };
            continue;
        }
        let value = unsafe { reader.read_i64(row) };
        unsafe { writer.write_i64(row, value * 2) };
    }
}
}

Multi-parameter example: add(BIGINT, BIGINT) → BIGINT

#![allow(unused)]
fn main() {
unsafe extern "C" fn add(
    _info: duckdb_function_info,
    input: duckdb_data_chunk,
    output: duckdb_vector,
) {
    let col0 = unsafe { VectorReader::new(input, 0) };  // first param
    let col1 = unsafe { VectorReader::new(input, 1) };  // second param
    let mut writer = unsafe { VectorWriter::new(output) };

    for row in 0..col0.row_count() {
        if unsafe { !col0.is_valid(row) || !col1.is_valid(row) } {
            unsafe { writer.set_null(row) };
            continue;
        }
        let a = unsafe { col0.read_i64(row) };
        let b = unsafe { col1.read_i64(row) };
        unsafe { writer.write_i64(row, a + b) };
    }
}
}

VARCHAR example: shout(VARCHAR) → VARCHAR

#![allow(unused)]
fn main() {
unsafe extern "C" fn shout(
    _info: duckdb_function_info,
    input: duckdb_data_chunk,
    output: duckdb_vector,
) {
    let reader = unsafe { VectorReader::new(input, 0) };
    let mut writer = unsafe { VectorWriter::new(output) };

    for row in 0..reader.row_count() {
        if unsafe { !reader.is_valid(row) } {
            unsafe { writer.set_null(row) };
            continue;
        }
        let s = unsafe { reader.read_str(row) };
        let upper = s.to_uppercase();
        unsafe { writer.write_varchar(row, &upper) };
    }
}
}

Overloading with Function Sets

If your function accepts different parameter types or arities, use ScalarFunctionSetBuilder to register multiple overloads under a single name:

#![allow(unused)]
fn main() {
use quack_rs::scalar::{ScalarFunctionSetBuilder, ScalarOverloadBuilder};
use quack_rs::types::TypeId;

unsafe fn register(con: duckdb_connection) -> Result<(), ExtensionError> {
    unsafe {
        ScalarFunctionSetBuilder::new("my_add")
            .overload(
                ScalarOverloadBuilder::new()
                    .param(TypeId::Integer).param(TypeId::Integer)
                    .returns(TypeId::Integer)
                    .function(add_ints)
            )
            .overload(
                ScalarOverloadBuilder::new()
                    .param(TypeId::Double).param(TypeId::Double)
                    .returns(TypeId::Double)
                    .function(add_doubles)
            )
            .register(con)?;
    }
    Ok(())
}
}

Like AggregateFunctionSetBuilder, this builder calls duckdb_scalar_function_set_name on every individual function before adding it to the set (Pitfall L6).


NULL Handling

By default, DuckDB returns NULL if any argument is NULL — your function callback is never called for those rows. If you need to handle NULLs explicitly (e.g., for a COALESCE-like function), set SpecialNullHandling:

#![allow(unused)]
fn main() {
use quack_rs::types::NullHandling;

ScalarFunctionBuilder::new("coalesce_custom")
    .param(TypeId::BigInt)
    .returns(TypeId::BigInt)
    .null_handling(NullHandling::SpecialNullHandling)
    .function(my_coalesce_fn)
    .register(con)?;
}

With SpecialNullHandling, your callback must check VectorReader::is_valid(row) and handle NULLs yourself.


Complex parameter and return types

For scalar functions that accept or return parameterized types like LIST(BIGINT), use param_logical and returns_logical:

#![allow(unused)]
fn main() {
use quack_rs::scalar::ScalarFunctionBuilder;
use quack_rs::types::{LogicalType, TypeId};

ScalarFunctionBuilder::new("flatten_list")
    .param_logical(LogicalType::list(TypeId::BigInt))  // LIST(BIGINT) input
    .returns(TypeId::BigInt)
    .function(flatten_list_fn)
    .register(con)?;
}

These methods are also available on ScalarOverloadBuilder for function sets:

#![allow(unused)]
fn main() {
ScalarOverloadBuilder::new()
    .param(TypeId::Varchar)
    .returns_logical(LogicalType::list(TypeId::Timestamp))  // LIST(TIMESTAMP) output
    .function(my_fn)
}

Key points

  • VectorReader::new(input, column_index) — the column index is zero-based
  • Always check is_valid(row) before reading — skipping this reads garbage for NULL rows
  • set_null must be called for NULL outputs — it calls ensure_validity_writable automatically (Pitfall L4)
  • read_bool returns bool — handles DuckDB's non-0/1 boolean bytes correctly (Pitfall L5)
  • read_str handles both inline and pointer string formats automatically (Pitfall P7)
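
The last point can be illustrated with a simplified model of the two string layouts behind Pitfall P7: short strings live inline in the vector, longer ones behind a pointer. The 12-byte threshold and field names here are an illustrative model, not the real FFI struct:

```rust
// Simplified model of DuckDB's dual string representation.
enum StringT {
    Inline { len: u32, bytes: [u8; 12] },
    Pointer { len: u32, data: Vec<u8> },
}

fn make(s: &str) -> StringT {
    let b = s.as_bytes();
    if b.len() <= 12 {
        let mut bytes = [0u8; 12];
        bytes[..b.len()].copy_from_slice(b);
        StringT::Inline { len: b.len() as u32, bytes }
    } else {
        StringT::Pointer { len: b.len() as u32, data: b.to_vec() }
    }
}

// What a read_str-style helper must do: branch on the layout so
// callers never see the difference.
fn read(st: &StringT) -> String {
    match st {
        StringT::Inline { len, bytes } => {
            String::from_utf8_lossy(&bytes[..*len as usize]).into_owned()
        }
        StringT::Pointer { data, .. } => String::from_utf8_lossy(data).into_owned(),
    }
}

fn main() {
    assert!(matches!(make("short"), StringT::Inline { .. }));
    assert!(matches!(make("a string well over twelve bytes"), StringT::Pointer { .. }));
    assert_eq!(read(&make("hello")), "hello");
}
```

Reading the raw bytes without branching on the layout is exactly the bug that `read_str` prevents.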

DuckDB 1.5.0 Additions (duckdb-1-5)

The following ScalarFunctionBuilder methods are available when the duckdb-1-5 feature is enabled:

varargs(type_id: TypeId)

Declares that the function accepts a variable number of trailing arguments, all of the given TypeId. Maps to duckdb_scalar_function_set_varargs.

#![allow(unused)]
fn main() {
ScalarFunctionBuilder::new("concat_all")
    .varargs(TypeId::Varchar)
    .returns(TypeId::Varchar)
    .function(concat_all_fn)
    .register(con)?;
}

varargs_logical(logical_type: LogicalType)

Like varargs, but accepts a LogicalType for parameterized variadic arguments. Maps to duckdb_scalar_function_set_varargs.

#![allow(unused)]
fn main() {
ScalarFunctionBuilder::new("merge_lists")
    .varargs_logical(LogicalType::list(TypeId::BigInt))
    .returns_logical(LogicalType::list(TypeId::BigInt))
    .function(merge_lists_fn)
    .register(con)?;
}

volatile()

Marks the function as volatile, meaning DuckDB will not cache or reuse its results across calls with the same arguments. Maps to duckdb_scalar_function_set_volatile.

#![allow(unused)]
fn main() {
ScalarFunctionBuilder::new("random_int")
    .returns(TypeId::Integer)
    .volatile()
    .function(random_int_fn)
    .register(con)?;
}

bind(bind_fn)

Sets a custom bind callback that runs at plan time. Use this to inspect argument types and set the return type dynamically. Maps to duckdb_scalar_function_set_bind.

#![allow(unused)]
fn main() {
ScalarFunctionBuilder::new("dynamic_return")
    .varargs(TypeId::Varchar)
    .returns(TypeId::Varchar)   // default; overridden in bind
    .bind(my_bind_fn)
    .function(dynamic_return_fn)
    .register(con)?;
}

init(init_fn)

Sets a local-init callback invoked once per thread before execution begins. Use this to allocate per-thread state. Maps to duckdb_scalar_function_set_init.

#![allow(unused)]
fn main() {
ScalarFunctionBuilder::new("stateful_fn")
    .param(TypeId::BigInt)
    .returns(TypeId::BigInt)
    .init(my_init_fn)
    .function(stateful_fn)
    .register(con)?;
}

Extra info

Attach arbitrary data to a scalar function using extra_info. This is useful for parameterizing the function's behavior (e.g., a locale or configuration struct). The method is available on both ScalarFunctionBuilder and ScalarOverloadBuilder.

#![allow(unused)]
fn main() {
use std::os::raw::c_void;

let config = Box::into_raw(Box::new("en_US".to_string())).cast::<c_void>();
unsafe {
    ScalarFunctionBuilder::new("locale_upper")
        .param(TypeId::Varchar)
        .returns(TypeId::Varchar)
        .extra_info(config, Some(my_destroy))
        .function(locale_upper_fn)
        .register(con)?;
}
}

Inside the callback, retrieve the extra info with ScalarFunctionInfo::get_extra_info().


ScalarFunctionInfo

ScalarFunctionInfo wraps the duckdb_function_info handle provided to a scalar function callback. It exposes:

  • get_extra_info() -> *mut c_void — retrieves the extra-info pointer set during registration
  • set_error(message) — reports an error, causing DuckDB to abort the query

#![allow(unused)]
fn main() {
use quack_rs::scalar::ScalarFunctionInfo;

unsafe extern "C" fn my_fn(
    info: duckdb_function_info,
    input: duckdb_data_chunk,
    output: duckdb_vector,
) {
    let info = unsafe { ScalarFunctionInfo::new(info) };
    let extra = unsafe { info.get_extra_info() };
    // ... use extra info, or report errors via info.set_error("...") ...
}
}

With the duckdb-1-5 feature, ScalarFunctionInfo also provides:

  • get_bind_data() -> *mut c_void — retrieves bind data set during the bind callback
  • get_state() -> *mut c_void — retrieves per-thread state set during the init callback

ScalarBindInfo (duckdb-1-5)

ScalarBindInfo wraps the duckdb_bind_info handle provided to a scalar function bind callback. It exposes:

  • argument_count() -> u64 — number of arguments
  • get_argument(index) -> duckdb_expression — argument expression at index
  • get_extra_info() -> *mut c_void — the extra-info pointer from registration
  • set_bind_data(data, destroy) — stores per-query data retrievable during execution
  • set_error(message) — reports an error
  • get_client_context() -> ClientContext — access to the connection's catalog and config

ScalarInitInfo (duckdb-1-5)

ScalarInitInfo wraps the duckdb_init_info handle provided to a scalar function init callback. It exposes:

  • get_extra_info() -> *mut c_void — the extra-info pointer from registration
  • get_bind_data() -> *mut c_void — the bind data from the bind callback
  • set_state(state, destroy) — stores per-thread state retrievable during execution
  • set_error(message) — reports an error
  • get_client_context() -> ClientContext — access to the connection's catalog and config

Aggregate Functions

Aggregate functions reduce multiple rows into a single value per group — like SUM(), COUNT(), or AVG(). DuckDB supports parallel aggregation, which introduces a combine step that merges partial results from parallel workers.


The aggregate lifecycle

flowchart TD
    REG["**Registration**<br/>AggregateFunctionBuilder<br/>→ duckdb_register_aggregate_function"]

    REG     --> SIZE
    SIZE    --> INIT
    INIT    --> UPDATE
    UPDATE  --> COMBINE
    COMBINE --> FINAL
    FINAL   --> DESTROY

    SIZE["**state_size**()<br/>How many bytes to allocate per group?"]
    INIT["**state_init**(state)<br/>Initialize a fresh state"]
    UPDATE["**update**(chunk, states[])<br/>Process one input batch"]
    COMBINE["**combine**(src[], tgt[], count)<br/>Merge partial results from parallel workers<br/>⚠️ Pitfall L1: target starts fresh — copy ALL config fields"]
    FINAL["**finalize**(states[], out, count)<br/>Write results to output vector"]
    DESTROY["**state_destroy**(states[], count)<br/>Free memory"]

    style COMBINE fill:#fff3cd,stroke:#e6ac00,color:#333

DuckDB may call combine multiple times as it merges results from parallel segments. Target states in combine are always fresh (default-initialized via state_init).


Registration

#![allow(unused)]
fn main() {
use quack_rs::aggregate::AggregateFunctionBuilder;
use quack_rs::types::TypeId;

unsafe fn register(con: duckdb_connection) -> Result<(), ExtensionError> {
    unsafe {
        AggregateFunctionBuilder::new("my_agg")
            .param(TypeId::Varchar)       // input type(s)
            .returns(TypeId::BigInt)      // output type
            .state_size(state_size)
            .init(state_init)
            .update(update)
            .combine(combine)
            .finalize(finalize)
            .destructor(state_destroy)
            .register(con)?;
    }
    Ok(())
}
}

The five core callbacks (state_size, init, update, combine, finalize) must be set before register — the builder will return an error if any are missing. The destructor callback is optional but strongly recommended when your state allocates heap memory (e.g., when using FfiState<T>).


Callback signatures

state_size

#![allow(unused)]
fn main() {
unsafe extern "C" fn state_size(_info: duckdb_function_info) -> idx_t {
    FfiState::<MyState>::size_callback(_info)
}
}

Returns the size DuckDB must allocate per group. This is always size_of::<*mut MyState>() — a pointer, since FfiState<T> stores a Box<T> pointer in the allocated slot.

state_init

#![allow(unused)]
fn main() {
unsafe extern "C" fn state_init(info: duckdb_function_info, state: duckdb_aggregate_state) {
    unsafe { FfiState::<MyState>::init_callback(info, state) };
}
}

Allocates a Box<MyState> (using MyState::default()) and writes its raw pointer into the DuckDB-allocated state slot.

update

#![allow(unused)]
fn main() {
unsafe extern "C" fn update(
    _info: duckdb_function_info,
    input: duckdb_data_chunk,
    states: *mut duckdb_aggregate_state,
) {
    let reader = unsafe { VectorReader::new(input, 0) };
    let row_count = reader.row_count();

    for row in 0..row_count {
        if unsafe { !reader.is_valid(row) } { continue; }
        let value = unsafe { reader.read_i64(row) };

        let state_ptr = unsafe { *states.add(row) };
        if let Some(st) = unsafe { FfiState::<MyState>::with_state_mut(state_ptr) } {
            st.accumulate(value);
        }
    }
}
}

states[i] corresponds to chunk row i. Each state belongs to one group.

combine

#![allow(unused)]
fn main() {
unsafe extern "C" fn combine(
    _info: duckdb_function_info,
    source: *mut duckdb_aggregate_state,
    target: *mut duckdb_aggregate_state,
    count: idx_t,
) {
    for i in 0..count as usize {
        let src = unsafe { FfiState::<MyState>::with_state(*source.add(i)) };
        let tgt = unsafe { FfiState::<MyState>::with_state_mut(*target.add(i)) };
        if let (Some(s), Some(t)) = (src, tgt) {
            // ⚠️  MUST copy ALL fields — see Pitfall L1
            t.config_field = s.config_field;   // configuration
            t.accumulator  += s.accumulator;    // data
        }
    }
}
}

Pitfall L1 — critical: Target states are fresh T::default() values. You must copy every field, including configuration fields set during update. Forgetting even one config field produces silently wrong results. See Pitfall L1.
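
A dependency-free simulation of the pitfall: the buggy combine compiles and runs, but silently drops the config field. Field names are illustrative:

```rust
// Combine targets start as fresh defaults, so every field must be copied.
#[derive(Default, Debug, PartialEq)]
struct State {
    config: u64, // set during update
    acc: i64,    // accumulated data
}

fn combine_correct(src: &State, tgt: &mut State) {
    tgt.config = src.config; // configuration propagated
    tgt.acc += src.acc;      // data merged
}

fn combine_buggy(src: &State, tgt: &mut State) {
    tgt.acc += src.acc; // config silently left at Default (0)!
}

fn main() {
    let src = State { config: 7, acc: 100 };
    let mut ok = State::default();  // fresh target, as DuckDB provides
    let mut bad = State::default();
    combine_correct(&src, &mut ok);
    combine_buggy(&src, &mut bad);
    assert_eq!(ok.config, 7);
    assert_eq!(bad.config, 0); // silently wrong result downstream
}
```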

finalize

#![allow(unused)]
fn main() {
unsafe extern "C" fn finalize(
    _info: duckdb_function_info,
    source: *mut duckdb_aggregate_state,
    result: duckdb_vector,
    count: idx_t,
    offset: idx_t,
) {
    let mut writer = unsafe { VectorWriter::new(result) };
    for i in 0..count as usize {
        let state_ptr = unsafe { *source.add(i) };
        match unsafe { FfiState::<MyState>::with_state(state_ptr) } {
            Some(st) => unsafe { writer.write_i64(offset as usize + i, st.result()) },
            None     => unsafe { writer.set_null(offset as usize + i) },
        }
    }
}
}

The offset parameter is non-zero when DuckDB is writing into a portion of a larger vector. Always add it to your index.

state_destroy

#![allow(unused)]
fn main() {
unsafe extern "C" fn state_destroy(states: *mut duckdb_aggregate_state, count: idx_t) {
    unsafe { FfiState::<MyState>::destroy_callback(states, count) };
}
}

destroy_callback calls Box::from_raw for each state and then nulls the pointer, preventing double-free. See Pitfall L2.


Complex parameter and return types

For functions that accept or return parameterized types like LIST(BIGINT), MAP(VARCHAR, INTEGER), or STRUCT(...), use param_logical and returns_logical instead of param and returns:

#![allow(unused)]
fn main() {
use quack_rs::aggregate::AggregateFunctionBuilder;
use quack_rs::types::{LogicalType, TypeId};

unsafe fn register(con: duckdb_connection) -> Result<(), ExtensionError> {
    unsafe {
        AggregateFunctionBuilder::new("retention")
            .param(TypeId::Boolean)
            .param(TypeId::Boolean)
            .returns_logical(LogicalType::list(TypeId::Boolean))  // LIST(BOOLEAN)
            .state_size(state_size)
            .init(state_init)
            .update(update)
            .combine(combine)
            .finalize(finalize)
            .destructor(state_destroy)
            .register(con)?;
    }
    Ok(())
}
}

param_logical and param can be interleaved — the parameter position is determined by the total number of calls made so far:

#![allow(unused)]
fn main() {
AggregateFunctionBuilder::new("my_func")
    .param(TypeId::Varchar)                          // position 0: VARCHAR
    .param_logical(LogicalType::list(TypeId::BigInt)) // position 1: LIST(BIGINT)
    .param(TypeId::Integer)                           // position 2: INTEGER
    .returns(TypeId::BigInt)
    // ...
}

If both returns and returns_logical are called, the logical type takes precedence.


Extra info

Attach arbitrary data to an aggregate function using extra_info. This is useful for parameterizing the function's behavior (e.g., passing configuration):

#![allow(unused)]
fn main() {
use std::os::raw::c_void;

let config = Box::into_raw(Box::new(42u64)).cast::<c_void>();
unsafe {
    AggregateFunctionBuilder::new("my_agg")
        .param(TypeId::BigInt)
        .returns(TypeId::BigInt)
        .extra_info(config, Some(my_destroy))
        .state_size(state_size)
        .init(state_init)
        .update(update)
        .combine(combine)
        .finalize(finalize)
        .destructor(state_destroy)
        .register(con)?;
}
}

Inside callbacks, retrieve the extra info with AggregateFunctionInfo::get_extra_info().


AggregateFunctionInfo

AggregateFunctionInfo wraps the duckdb_function_info handle provided to aggregate function callbacks (update, combine, finalize, etc.). It exposes:

  • get_extra_info() -> *mut c_void — retrieves the extra-info pointer set during registration
  • set_error(message) — reports an error, causing DuckDB to abort the query

#![allow(unused)]
fn main() {
use quack_rs::aggregate::AggregateFunctionInfo;

unsafe extern "C" fn update(
    info: duckdb_function_info,
    input: duckdb_data_chunk,
    states: *mut duckdb_aggregate_state,
) {
    let info = unsafe { AggregateFunctionInfo::new(info) };
    let extra = unsafe { info.get_extra_info() };
    // ... use extra info, or report errors via info.set_error("...") ...
}
}

Next steps

State Management

FfiState<T> manages the lifecycle of aggregate state — allocation, initialization, access, and destruction — so you never write raw pointer code for state management.


AggregateState trait

Any type that is Default + Send + 'static can be used as aggregate state by implementing the AggregateState marker trait:

#![allow(unused)]
fn main() {
use quack_rs::aggregate::AggregateState;

#[derive(Default, Debug)]
struct MyState {
    config: usize,    // set in update, must be propagated in combine
    total: i64,       // accumulated data
}

impl AggregateState for MyState {}
}

AggregateState has no required methods. The Default bound is used in state_init to create fresh states.


FfiState<T>

FfiState<T> is a #[repr(C)] struct containing a single raw pointer:

#![allow(unused)]
fn main() {
#[repr(C)]
pub struct FfiState<T> {
    inner: *mut T,
}
}

This matches DuckDB's expectation: DuckDB allocates state_size() bytes per group, and your state lives in a Box<T> heap allocation whose pointer is stored in that space.

Memory layout

DuckDB-allocated slot (state_size bytes = sizeof(*mut T)):
  [ inner: *mut T ]  ──→  Box<T>  (on the Rust heap)
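
This layout can be verified in plain Rust with a struct mirroring the one above (the sketch type is illustrative, not the SDK's):

```rust
use std::mem::size_of;

// The DuckDB-allocated slot is exactly one pointer wide,
// no matter how large the state type T is.
#[repr(C)]
struct FfiStateSketch<T> {
    inner: *mut T,
}

struct BigState {
    _buf: [u8; 4096], // the real data lives on the Rust heap, not in the slot
}

fn main() {
    assert_eq!(size_of::<FfiStateSketch<BigState>>(), size_of::<*mut BigState>());
    assert_eq!(size_of::<FfiStateSketch<u8>>(), size_of::<usize>());
}
```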

Lifecycle callbacks

#![allow(unused)]
fn main() {
// state_size: DuckDB calls this once to know how many bytes to allocate per group
FfiState::<MyState>::size_callback(_info)
// Returns: size_of::<*mut MyState>()

// state_init: DuckDB calls this once per group after allocating the slot
FfiState::<MyState>::init_callback(info, state)
// Effect: writes Box::into_raw(Box::new(MyState::default())) into the slot

// state_destroy: DuckDB calls this after finalize for every group
FfiState::<MyState>::destroy_callback(states, count)
// Effect: for each state: drop(Box::from_raw(inner)); inner = null
}

Accessing state in callbacks

#![allow(unused)]
fn main() {
// Immutable access (in finalize, combine source):
if let Some(st) = FfiState::<MyState>::with_state(state_ptr) {
    let value = st.total;
}

// Mutable access (in update, combine target):
if let Some(st) = FfiState::<MyState>::with_state_mut(state_ptr) {
    st.total += delta;
}
}

Both methods return Option<&T> / Option<&mut T>. They return None if inner is null (which happens after destroy_callback or if initialization failed). Using Option rather than panicking on null is what keeps the extension panic-free.


The double-free problem — solved

Without quack-rs, a naive destructor looks like:

#![allow(unused)]
fn main() {
// ❌ Naive — causes double-free if DuckDB calls destroy twice
unsafe extern "C" fn destroy(states: *mut duckdb_aggregate_state, count: idx_t) {
    for i in 0..count as usize {
        let ffi = &mut *(*states.add(i) as *mut FfiState<MyState>);
        drop(Box::from_raw(ffi.inner));   // inner is now dangling — crash on second call
    }
}
}

FfiState::destroy_callback does:

#![allow(unused)]
fn main() {
// After drop(Box::from_raw(ffi.inner)):
ffi.inner = std::ptr::null_mut();   // ← prevents double-free
}

If DuckDB calls destroy again, with_state returns None and the loop body is a no-op.
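
The null-out guard can be demonstrated in self-contained Rust (a minimal sketch of the pattern, with a plain `i64` standing in for the state type):

```rust
// After freeing, the slot is nulled so a second destroy call is a no-op.
struct Slot {
    inner: *mut i64,
}

fn init(slot: &mut Slot) {
    slot.inner = Box::into_raw(Box::new(0));
}

fn destroy(slot: &mut Slot) {
    if !slot.inner.is_null() {
        // SAFETY: inner came from Box::into_raw and is freed exactly once.
        unsafe { drop(Box::from_raw(slot.inner)) };
        slot.inner = std::ptr::null_mut(); // prevents double-free
    }
}

fn main() {
    let mut slot = Slot { inner: std::ptr::null_mut() };
    init(&mut slot);
    destroy(&mut slot);
    destroy(&mut slot); // second call: harmless no-op, no crash
    assert!(slot.inner.is_null());
}
```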


Testing state logic without DuckDB

AggregateTestHarness<S> simulates the DuckDB aggregate lifecycle in pure Rust:

#![allow(unused)]
fn main() {
use quack_rs::testing::AggregateTestHarness;

#[test]
fn combine_propagates_config() {
    let mut source = AggregateTestHarness::<MyState>::new();
    source.update(|s| {
        s.config = 5;    // config field set during update
        s.total += 100;
    });

    let mut target = AggregateTestHarness::<MyState>::new();
    target.combine(&source, |src, tgt| {
        tgt.config = src.config;   // must propagate config — Pitfall L1
        tgt.total  += src.total;
    });

    let result = target.finalize();
    assert_eq!(result.config, 5, "config must be propagated in combine");
    assert_eq!(result.total, 100);
}
}

See the Testing Guide for the full test strategy.

Overloading with Function Sets

DuckDB supports multiple signatures for the same function name via function sets. This is how you implement variadic aggregates like retention(c1, c2, ..., c32).

Note: For scalar function overloads, see ScalarFunctionSetBuilder.


When to use function sets

Use AggregateFunctionSetBuilder when you need:

  • Multiple type signatures for the same function name (e.g., my_agg(INT) and my_agg(BIGINT))
  • Variadic arity under one name (e.g., retention(2 columns), retention(3 columns), ...)

For a single signature, use AggregateFunctionBuilder directly.


Registration

#![allow(unused)]
fn main() {
use quack_rs::aggregate::AggregateFunctionSetBuilder;
use quack_rs::types::TypeId;

unsafe fn register(con: duckdb_connection) -> Result<(), ExtensionError> {
    unsafe {
        AggregateFunctionSetBuilder::new("retention")
            .returns(TypeId::Varchar)
            .overloads(2..=3, |n, builder| {
                // Each overload gets `n` BOOLEAN parameters
                let b = (0..n).fold(builder, |b, _| b.param(TypeId::Boolean));
                b.state_size(state_size)
                    .init(state_init)
                    .update(update)
                    .combine(combine)
                    .finalize(finalize)
                    .destructor(state_destroy)
            })
            .register(con)?;
    }
    Ok(())
}
}

The overloads method accepts a RangeInclusive<usize> and a closure that receives the arity n and a fresh OverloadBuilder. The builder sets the function name on each individual member internally.


The silent name bug — solved

Pitfall L6: When using a function set, the name must be set on each individual duckdb_aggregate_function via duckdb_aggregate_function_set_name, not just on the set. If any member lacks a name, it is silently not registered — no error is returned.

This is completely undocumented. It was discovered by reading DuckDB's C++ test code at test/api/capi/test_capi_aggregate_functions.cpp. In duckdb-behavioral, 6 of 7 functions failed to register silently due to this bug.

AggregateFunctionSetBuilder enforces that each member has its name set internally when the overloads closure builds each function.

See Pitfall L6.


Complex return types

If all overloads share a complex return type, use returns_logical on the set builder:

#![allow(unused)]
fn main() {
use quack_rs::aggregate::AggregateFunctionSetBuilder;
use quack_rs::types::{LogicalType, TypeId};

AggregateFunctionSetBuilder::new("retention")
    .returns_logical(LogicalType::list(TypeId::Boolean))  // LIST(BOOLEAN) for all overloads
    .overloads(2..=32, |n, builder| {
        (0..n).fold(builder, |b, _| b.param(TypeId::Boolean))
            .state_size(state_size)
            .init(state_init)
            .update(update)
            .combine(combine)
            .finalize(finalize)
            .destructor(destroy)
    })
    .register(con)?;
}

Individual overloads can also use param_logical for complex parameter types:

#![allow(unused)]
fn main() {
.overloads(2..=8, |n, builder| {
    builder
        .param(TypeId::Interval)
        .param_logical(LogicalType::list(TypeId::Timestamp)) // LIST(TIMESTAMP) parameter
        // ...
})
}

Why not varargs?

DuckDB's C API does not provide duckdb_aggregate_function_set_varargs. For true variadic aggregates, you must register N overloads — one for each supported arity. Function sets make this tractable.

Note: As of DuckDB 1.5.0, scalar functions now support varargs directly via ScalarFunctionBuilder::varargs() (requires the duckdb-1-5 feature). This limitation still applies to aggregate functions, which have no varargs counterpart in the C API.

ADR-002 in the architecture docs explains this design decision in detail.

Table Functions

Table functions implement the SELECT * FROM my_function(args) pattern — they return a result set rather than a scalar value. DuckDB table functions have three lifecycle callbacks: bind, init, and scan.

quack-rs provides TableFunctionBuilder plus the helper types BindInfo, InitInfo, FunctionInfo, FfiBindData<T>, FfiInitData<T>, and FfiLocalInitData<T> to eliminate the raw FFI boilerplate.

Lifecycle

| Phase | Callback | Called when | Typical work |
|-------|----------|-------------|--------------|
| bind | bind_fn | Query is planned | Extract parameters; register output columns; store config in bind data |
| init | init_fn | Execution starts | Allocate per-scan state (cursor, row index, etc.) |
| scan | scan_fn | Each output batch | Fill duckdb_data_chunk with rows; call duckdb_data_chunk_set_size |

The scan callback is called repeatedly until it writes 0 rows in a batch, signalling end-of-results.
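The end-of-results contract reduces to simple cursor arithmetic. A sketch in plain Rust, assuming DuckDB's usual vector size of 2048:

```rust
const BATCH: i64 = 2048; // DuckDB's usual vector size

/// Number of rows to emit in the next batch, given the total row count
/// and the current cursor position. Zero signals end-of-results.
fn next_batch(total: i64, pos: i64) -> usize {
    (total - pos).clamp(0, BATCH) as usize
}

fn main() {
    let total = 5000;
    let mut pos = 0;
    let mut batches = Vec::new();
    loop {
        let n = next_batch(total, pos);
        if n == 0 { break; } // scan stops once a batch writes 0 rows
        batches.push(n);
        pos += n as i64;
    }
    assert_eq!(batches, vec![2048, 2048, 904]);
}
```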

Builder API

#![allow(unused)]
fn main() {
use quack_rs::table::{TableFunctionBuilder, BindInfo, FfiBindData, FfiInitData};
use quack_rs::types::TypeId;

TableFunctionBuilder::new("my_function")
    .param(TypeId::BigInt)                 // positional parameter types
    .bind(my_bind_callback)               // declare output columns inside bind
    .init(my_init_callback)
    .scan(my_scan_callback)
    .register(con)?;
}

Output columns are declared inside the bind callback using BindInfo::add_result_column, not on the builder itself.

State management

Bind data

Bind data persists from the bind phase through all scan batches. Use FfiBindData<T> to allocate it safely:

#![allow(unused)]
fn main() {
struct MyBindData {
    limit: i64,
}

unsafe extern "C" fn my_bind(info: duckdb_bind_info) {
    let mut param = unsafe { duckdb_bind_get_parameter(info, 0) };
    let n = unsafe { duckdb_get_int64(param) };
    unsafe { duckdb_destroy_value(&mut param) };   // parameter values must be destroyed
    unsafe { FfiBindData::<MyBindData>::set(info, MyBindData { limit: n }) };
}
}

FfiBindData::set stores the value and registers a destructor so DuckDB frees it at the right time — no Box::into_raw / Box::from_raw needed.

Init (scan) state

Per-scan state (e.g., a current row index) uses FfiInitData<T>:

#![allow(unused)]
fn main() {
struct MyScanState {
    pos: i64,
}

unsafe extern "C" fn my_init(info: duckdb_init_info) {
    unsafe { FfiInitData::<MyScanState>::set(info, MyScanState { pos: 0 }) };
}
}

Complete example: generate_series_ext

The hello-ext example registers generate_series_ext(n BIGINT) which emits integers 0 .. n-1. See examples/hello-ext/src/lib.rs for the full source.

#![allow(unused)]
fn main() {
// Bind: extract `n`, register one output column
unsafe extern "C" fn gs_bind(info: duckdb_bind_info) {
    let mut param = unsafe { duckdb_bind_get_parameter(info, 0) };
    let n = unsafe { duckdb_get_int64(param) };
    unsafe { duckdb_destroy_value(&mut param) };

    let out_type = LogicalType::new(TypeId::BigInt);
    unsafe { duckdb_bind_add_result_column(info, c"value".as_ptr(), out_type.as_raw()) };

    unsafe { FfiBindData::<GsBindData>::set(info, GsBindData { total: n }) };
}

// Init: zero-initialise the scan cursor
unsafe extern "C" fn gs_init(info: duckdb_init_info) {
    unsafe { FfiInitData::<GsScanState>::set(info, GsScanState { pos: 0 }) };
}

// Scan: emit a batch of rows
unsafe extern "C" fn gs_scan(info: duckdb_function_info, output: duckdb_data_chunk) {
    let bind = unsafe { FfiBindData::<GsBindData>::get_from_function(info) }.unwrap();
    let state = unsafe { FfiInitData::<GsScanState>::get_mut(info) }.unwrap();

    let remaining = bind.total - state.pos;
    let batch = remaining.min(2048).max(0) as usize;

    let mut writer = unsafe { VectorWriter::new(duckdb_data_chunk_get_vector(output, 0)) };
    for i in 0..batch {
        unsafe { writer.write_i64(i, state.pos + i as i64) };
    }
    unsafe { duckdb_data_chunk_set_size(output, batch as idx_t) };
    state.pos += batch as i64;
}
}

Registration

#![allow(unused)]
fn main() {
TableFunctionBuilder::new("generate_series_ext")
    .param(TypeId::BigInt)
    .bind(gs_bind)
    .init(gs_init)
    .scan(gs_scan)
    .register(con)?;
}

Advanced features

Named parameters

Named parameters let callers pass optional arguments by name (e.g., step := 10):

#![allow(unused)]
fn main() {
TableFunctionBuilder::new("gen_series_v2")
    .param(TypeId::BigInt)                    // positional: n
    .named_param("step", TypeId::BigInt)      // named: step := <value>
    .bind(gs_v2_bind)
    .init(gs_v2_init)
    .scan(gs_v2_scan)
    .register(con)?;
}

In the bind callback, read the named parameter with duckdb_bind_get_named_parameter(info, c"step".as_ptr()).

Local init (per-thread state)

For multi-threaded table functions, use local_init to allocate per-thread state:

#![allow(unused)]
fn main() {
TableFunctionBuilder::new("gen_series_v2")
    .param(TypeId::BigInt)
    .bind(gs_v2_bind)
    .init(gs_v2_init)
    .local_init(gs_v2_local_init)            // per-thread state allocation
    .scan(gs_v2_scan)
    .register(con)?;
}

The local init callback receives duckdb_init_info and can use FfiLocalInitData<T>::set to store per-thread state.

Thread control

Use InitInfo::set_max_threads in the global init callback to tell DuckDB how many threads can scan concurrently:

#![allow(unused)]
fn main() {
unsafe extern "C" fn gs_v2_init(info: duckdb_init_info) {
    let init_info = unsafe { InitInfo::new(info) };
    unsafe { init_info.set_max_threads(1) };
    unsafe { FfiInitData::<MyState>::set(info, MyState { pos: 0 }) };
}
}

Projection pushdown

Enable projection pushdown to let DuckDB skip unrequested columns:

#![allow(unused)]
fn main() {
TableFunctionBuilder::new("my_func")
    .projection_pushdown(true)
    // ...
}

Caution: When projection pushdown is enabled, your scan callback must check which columns DuckDB actually needs using InitInfo::projected_column_count and InitInfo::projected_column_index. Writing to non-projected columns causes crashes.
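The index mapping can be sketched in plain Rust: each position in the output chunk corresponds to one projected source column. `fill_projected` is illustrative, not a quack-rs API:

```rust
/// Stand-in for the projection info DuckDB exposes through InitInfo:
/// projected[i] is the source-column index to write into output slot i.
fn fill_projected(projected: &[usize], source_row: &[i64]) -> Vec<i64> {
    // Write only the projected columns, in projection order; touching
    // non-projected columns is what crashes a real scan callback.
    projected.iter().map(|&col| source_row[col]).collect()
}

fn main() {
    let source_row = [10, 20, 30, 40]; // four source columns
    let projected = [3, 1];            // query asked for columns 3 and 1 only
    assert_eq!(fill_projected(&projected, &source_row), vec![40, 20]);
}
```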

See examples/hello-ext/src/lib.rs for a complete example using named_param, local_init, and set_max_threads.

Complex parameter types

For parameterised types that TypeId cannot express (e.g. LIST(BIGINT), MAP(VARCHAR, INTEGER), STRUCT(...)), use param_logical and named_param_logical:

#![allow(unused)]
fn main() {
use quack_rs::types::LogicalType;

TableFunctionBuilder::new("read_data")
    .param_logical(LogicalType::list(TypeId::Varchar))        // positional LIST param
    .named_param_logical("options", LogicalType::map(          // named MAP param
        TypeId::Varchar, TypeId::Varchar,
    ))
    .bind(bind_fn)
    .init(init_fn)
    .scan(scan_fn)
    .register(con)?;
}

BindInfo helpers

BindInfo wraps duckdb_bind_info and exposes these methods:

| Method | Description |
|--------|-------------|
| add_result_column(name, TypeId) | Declares an output column |
| add_result_column_with_type(name, &LogicalType) | Output column with complex type |
| set_cardinality(rows, is_exact) | Cardinality hint for the optimizer |
| set_error(message) | Report a bind-time error |
| parameter_count() | Number of positional parameters |
| get_parameter(index) | Returns a positional parameter value (duckdb_value) |
| get_named_parameter(name) | Returns a named parameter value (duckdb_value) |
| get_extra_info() | Returns the extra-info pointer set on the function |
| get_client_context() | Returns a ClientContext (requires duckdb-1-5 feature) |

InitInfo helpers

InitInfo wraps duckdb_init_info:

| Method | Description |
|--------|-------------|
| projected_column_count() | Number of projected columns (with pushdown) |
| projected_column_index(idx) | Output column index at projection position |
| set_max_threads(n) | Maximum parallel scan threads |
| set_error(message) | Report an init-time error |
| get_extra_info() | Returns the extra-info pointer set on the function |

FunctionInfo helpers

FunctionInfo wraps duckdb_function_info (scan callbacks):

| Method | Description |
|--------|-------------|
| set_error(message) | Report a scan-time error |
| get_extra_info() | Returns the extra-info pointer set on the function |

Extra info

Use TableFunctionBuilder::extra_info to attach function-level data that is accessible from all callbacks (bind, init, and scan) via get_extra_info().

Verified output (DuckDB 1.4.4 and 1.5.0)

SELECT * FROM generate_series_ext(5);
-- 0
-- 1
-- 2
-- 3
-- 4

SELECT value * value AS sq FROM generate_series_ext(4);
-- 0
-- 1
-- 4
-- 9

See also

Replacement Scans

A replacement scan lets users write:

SELECT * FROM 'myfile.myformat'

and have DuckDB automatically invoke your extension's table-valued scan instead of trying to open the path as a built-in file type. This is how DuckDB's built-in CSV, Parquet, and JSON readers work.

quack-rs provides ReplacementScanBuilder (a static registration helper) and ReplacementScanInfo (an ergonomic wrapper for callbacks).

Registration API

Unlike the other builders in quack-rs, ReplacementScanBuilder uses a single static call because the DuckDB C API takes all arguments at once:

#![allow(unused)]
fn main() {
use quack_rs::replacement_scan::ReplacementScanBuilder;

// Low-level: pass raw extra_data and an optional delete callback.
unsafe {
    ReplacementScanBuilder::register(
        db,                            // duckdb_database
        my_scan_callback,              // ReplacementScanFn
        std::ptr::null_mut(),          // extra_data (or a raw pointer)
        None,                          // delete_callback
    );
}

// Ergonomic: pass owned Rust data; boxing and destructor are handled for you.
unsafe {
    ReplacementScanBuilder::register_with_data(db, my_scan_callback, my_state);
}
}

Note: Replacement scans are registered on a database handle (duckdb_database), not a connection. Register them before opening connections.

Callback signature

The raw callback receives duckdb_replacement_scan_info, but you can wrap it with ReplacementScanInfo for ergonomic, safe access:

#![allow(unused)]
fn main() {
use quack_rs::replacement_scan::ReplacementScanInfo;

unsafe extern "C" fn my_scan_callback(
    info: duckdb_replacement_scan_info,
    table_name: *const ::std::os::raw::c_char,
    _data: *mut ::std::os::raw::c_void,
) {
    let path = unsafe { std::ffi::CStr::from_ptr(table_name) }
        .to_str()
        .unwrap_or("");

    if !path.ends_with(".myformat") {
        return; // pass — DuckDB will try other handlers
    }

    // Use ReplacementScanInfo for ergonomic access
    unsafe {
        ReplacementScanInfo::new(info)
            .set_function("read_myformat")
            .add_varchar_parameter(path);
    }
}
}

ReplacementScanInfo methods

| Method | Description |
|--------|-------------|
| set_function(name) | Redirect to the named table function |
| add_varchar_parameter(value) | Add a VARCHAR parameter to the redirected call |
| set_error(message) | Report an error (aborts this replacement scan) |

When to use replacement scans vs table functions

| Scenario | Use |
|----------|-----|
| SELECT * FROM my_function('file.ext') | Table function |
| SELECT * FROM 'file.ext' (bare path) | Replacement scan → delegates to a table function |
| File type auto-detection | Replacement scan |

Most extensions implement both: a table function that does the actual work, and a replacement scan that detects the file extension and transparently routes bare-path queries to the table function.
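The routing decision itself is a pure function that is easy to unit-test. A sketch, with read_myformat standing in for the table function from the example above:

```rust
/// Decide whether a bare path should be redirected to our table function.
/// Returning None means "pass": DuckDB tries the next replacement scan.
fn route(path: &str) -> Option<(&'static str, String)> {
    if path.ends_with(".myformat") {
        // (table function name, VARCHAR parameter to pass to it)
        Some(("read_myformat", path.to_string()))
    } else {
        None
    }
}

fn main() {
    assert_eq!(
        route("data/events.myformat"),
        Some(("read_myformat", "data/events.myformat".to_string()))
    );
    assert_eq!(route("events.csv"), None); // left to other handlers
}
```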

See also

Cast Functions

Cast functions let your extension define how DuckDB converts values from one type to another. Once registered, both explicit CAST(x AS T) syntax and (optionally) implicit coercions will use your callback.

When to use cast functions

  • Your extension introduces a new logical type and needs CAST to/from standard types.
  • You want to override DuckDB's built-in cast behaviour for a specific type pair.
  • You need to control implicit cast priority relative to other registered casts.

Registering a cast

#![allow(unused)]
fn main() {
use quack_rs::cast::{CastFunctionBuilder, CastFunctionInfo, CastMode};
use quack_rs::types::TypeId;
use quack_rs::vector::{VectorReader, VectorWriter};
use libduckdb_sys::{duckdb_function_info, duckdb_vector, idx_t};

unsafe extern "C" fn varchar_to_int(
    info: duckdb_function_info,
    count: idx_t,
    input: duckdb_vector,
    output: duckdb_vector,
) -> bool {
    let cast_info = unsafe { CastFunctionInfo::new(info) };
    let reader = unsafe { VectorReader::from_vector(input, count as usize) };
    let mut writer = unsafe { VectorWriter::new(output) };

    for row in 0..count as usize {
        if !unsafe { reader.is_valid(row) } {
            unsafe { writer.set_null(row) };
            continue;
        }
        let s = unsafe { reader.read_str(row) };
        match s.parse::<i32>() {
            Ok(v) => unsafe { writer.write_i32(row, v) },
            Err(e) => {
                let msg = format!("cannot cast {:?} to INTEGER: {e}", s);
                if cast_info.cast_mode() == CastMode::Try {
                    // TRY_CAST: write NULL and record a per-row error
                    unsafe { cast_info.set_row_error(&msg, row as idx_t, output) };
                    unsafe { writer.set_null(row) };
                } else {
                    // Regular CAST: abort the whole query
                    unsafe { cast_info.set_error(&msg) };
                    return false;
                }
            }
        }
    }
    true
}

fn register(con: libduckdb_sys::duckdb_connection)
    -> Result<(), quack_rs::error::ExtensionError>
{
    unsafe {
        CastFunctionBuilder::new(TypeId::Varchar, TypeId::Integer)
            .function(varchar_to_int)
            .register(con)
    }
}
}

Implicit casts

Provide an implicit_cost to allow DuckDB to use the cast automatically in expressions where the types do not match:

#![allow(unused)]
fn main() {
use quack_rs::cast::CastFunctionBuilder;
use quack_rs::types::TypeId;
use libduckdb_sys::{duckdb_function_info, duckdb_vector, idx_t};
unsafe extern "C" fn my_cast(_: duckdb_function_info, _: idx_t, _: duckdb_vector, _: duckdb_vector) -> bool { true }
fn register(con: libduckdb_sys::duckdb_connection) -> Result<(), quack_rs::error::ExtensionError> {
unsafe {
    CastFunctionBuilder::new(TypeId::Varchar, TypeId::Integer)
        .function(my_cast)
        .implicit_cost(100) // lower = higher priority
        .register(con)
}
}
}

Extra info

Attach arbitrary data to a cast function using extra_info. This is useful for parameterising the cast behaviour (e.g., a rounding mode):

#![allow(unused)]
fn main() {
use quack_rs::cast::CastFunctionBuilder;
use quack_rs::types::TypeId;
use libduckdb_sys::{duckdb_function_info, duckdb_vector, idx_t};
use std::os::raw::c_void;
unsafe extern "C" fn my_cast(_: duckdb_function_info, _: idx_t, _: duckdb_vector, _: duckdb_vector) -> bool { true }
unsafe extern "C" fn my_destroy(_: *mut c_void) {}
fn register(con: libduckdb_sys::duckdb_connection) -> Result<(), quack_rs::error::ExtensionError> {
let mode = Box::into_raw(Box::new("round".to_string())).cast::<c_void>();
unsafe {
    CastFunctionBuilder::new(TypeId::Double, TypeId::BigInt)
        .function(my_cast)
        .implicit_cost(100)
        .extra_info(mode, Some(my_destroy))
        .register(con)
}
}
}

Inside the cast callback, retrieve the extra info with CastFunctionInfo::get_extra_info().

TRY_CAST vs CAST

Inside your callback, check CastFunctionInfo::cast_mode() to distinguish between the two modes:

| Mode | User wrote | Expected behaviour on error |
|------|------------|-----------------------------|
| CastMode::Normal | CAST(x AS T) | Call set_error and return false |
| CastMode::Try | TRY_CAST(x AS T) | Call set_row_error, write NULL, continue |

Working example

The examples/hello-ext extension registers two cast functions:

  • CAST(VARCHAR AS INTEGER) / TRY_CAST(VARCHAR AS INTEGER) — basic cast
  • CAST(DOUBLE AS BIGINT) — with implicit_cost(100) and extra_info for rounding mode

See examples/hello-ext/src/lib.rs for complete, copy-paste-ready references.

Complex source and target types

For casts involving complex types like DECIMAL(18, 3) or LIST(VARCHAR), use the new_logical constructor instead of new:

#![allow(unused)]
fn main() {
use quack_rs::cast::CastFunctionBuilder;
use quack_rs::types::{LogicalType, TypeId};
use libduckdb_sys::{duckdb_function_info, duckdb_vector, idx_t};
unsafe extern "C" fn my_cast(_: duckdb_function_info, _: idx_t, _: duckdb_vector, _: duckdb_vector) -> bool { true }
fn register(con: libduckdb_sys::duckdb_connection) -> Result<(), quack_rs::error::ExtensionError> {
unsafe {
    CastFunctionBuilder::new_logical(
        LogicalType::list(TypeId::Varchar),   // LIST(VARCHAR) source
        LogicalType::list(TypeId::Integer),   // LIST(INTEGER) target
    )
    .function(my_cast)
    .register(con)
}
}
}

The source() and target() accessor methods return Option<TypeId> — they return None when the type was set via new_logical (since a LogicalType cannot always be expressed as a simple TypeId).

API reference

  • CastFunctionBuilder (quack_rs::cast::CastFunctionBuilder) — the main builder
  • CastFunctionInfo (quack_rs::cast::CastFunctionInfo) — info handle inside callbacks
  • CastMode (quack_rs::cast::CastMode) — Normal vs Try cast mode

NULL Handling

By default, DuckDB automatically propagates NULLs: if any argument to a function is NULL, the result is NULL without your function callback being called. This matches the SQL standard and works well for most functions.

However, some functions need to handle NULLs explicitly. For example:

  • COALESCE — returns the first non-NULL argument
  • IS_NULL / IS_NOT_NULL — tests whether the value is NULL
  • Custom aggregates that need to count NULLs

NullHandling enum

#![allow(unused)]
fn main() {
use quack_rs::types::NullHandling;

// Default: DuckDB auto-returns NULL for any NULL input
let _ = NullHandling::DefaultNullHandling;

// Special: DuckDB passes NULLs to your callback
let _ = NullHandling::SpecialNullHandling;
}

Scalar functions

#![allow(unused)]
fn main() {
use quack_rs::scalar::ScalarFunctionBuilder;
use quack_rs::types::{TypeId, NullHandling};

ScalarFunctionBuilder::new("my_coalesce")
    .param(TypeId::BigInt)
    .param(TypeId::BigInt)
    .returns(TypeId::BigInt)
    .null_handling(NullHandling::SpecialNullHandling)
    .function(my_coalesce_fn)
    .register(con)?;
}

With SpecialNullHandling, your callback must check VectorReader::is_valid(row) for each input column and handle NULLs yourself.
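The difference between the two modes can be modelled with Option in plain Rust. This is a conceptual sketch, not the quack-rs machinery; coalesce2 is an illustrative stand-in for a SpecialNullHandling callback:

```rust
/// DefaultNullHandling: DuckDB short-circuits, so if any input is NULL
/// the function body never runs and the result is NULL.
fn default_null<F: Fn(i64, i64) -> i64>(a: Option<i64>, b: Option<i64>, f: F) -> Option<i64> {
    match (a, b) {
        (Some(a), Some(b)) => Some(f(a, b)),
        _ => None,
    }
}

/// SpecialNullHandling: the callback sees the NULLs and decides itself.
fn coalesce2(a: Option<i64>, b: Option<i64>) -> Option<i64> {
    a.or(b)
}

fn main() {
    // add(3, NULL) under default handling is NULL; the closure never runs
    assert_eq!(default_null(Some(3), None, |a, b| a + b), None);
    // coalesce(NULL, 7) under special handling is 7
    assert_eq!(coalesce2(None, Some(7)), Some(7));
}
```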


Aggregate functions

#![allow(unused)]
fn main() {
use quack_rs::aggregate::AggregateFunctionBuilder;
use quack_rs::types::{TypeId, NullHandling};

AggregateFunctionBuilder::new("count_with_nulls")
    .param(TypeId::BigInt)
    .returns(TypeId::BigInt)
    .null_handling(NullHandling::SpecialNullHandling)
    .state_size(my_state_size)
    .init(my_init)
    .update(my_update)   // will be called even for NULL rows
    .combine(my_combine)
    .finalize(my_finalize)
    .register(con)?;
}

When to use special NULL handling

| Use case | NULL handling |
|----------|---------------|
| Most scalar/aggregate functions | DefaultNullHandling (the default) |
| Functions that need to see NULLs | SpecialNullHandling |
| COALESCE-like functions | SpecialNullHandling |
| NULL-counting aggregates | SpecialNullHandling |

If you don't call .null_handling(), the default (DefaultNullHandling) is used automatically.

SQL Macros

SQL macros let you package reusable SQL expressions and queries as named DuckDB functions — no FFI callbacks required. quack-rs makes this pure Rust: you define the macro body as a string and call .register(con).


Two macro types

| Type | SQL generated | Returns |
|------|---------------|---------|
| Scalar | CREATE OR REPLACE MACRO name(params) AS (expression) | one value per row |
| Table | CREATE OR REPLACE MACRO name(params) AS TABLE query | a result set |

Scalar macros

A scalar macro wraps a SQL expression. Think of it as a parameterized SQL alias:

#![allow(unused)]
fn main() {
use quack_rs::sql_macro::SqlMacro;

fn register(con: duckdb_connection) -> Result<(), ExtensionError> {
    unsafe {
        // clamp(x, lo, hi) → greatest(lo, least(hi, x))
        SqlMacro::scalar("clamp", &["x", "lo", "hi"], "greatest(lo, least(hi, x))")?
            .register(con)?;

        // pi() → 3.14159265358979
        SqlMacro::scalar("pi", &[], "3.14159265358979")?
            .register(con)?;

        // safe_div(a, b) → CASE WHEN b = 0 THEN NULL ELSE a / b END
        SqlMacro::scalar(
            "safe_div",
            &["a", "b"],
            "CASE WHEN b = 0 THEN NULL ELSE a / b END",
        )?
        .register(con)?;
    }
    Ok(())
}
}

Use in DuckDB:

SELECT clamp(rating, 1, 5) FROM reviews;
SELECT safe_div(revenue, orders) FROM monthly_stats;

Table macros

A table macro wraps a SQL query that returns rows:

#![allow(unused)]
fn main() {
unsafe {
    // active_users(tbl) → SELECT * FROM tbl WHERE active = true
    SqlMacro::table(
        "active_users",
        &["tbl"],
        "SELECT * FROM tbl WHERE active = true",
    )?
    .register(con)?;

    // recent_orders(days) → last N days of orders
    SqlMacro::table(
        "recent_orders",
        &["days"],
        "SELECT * FROM orders WHERE order_date >= current_date - INTERVAL (days) DAY",
    )?
    .register(con)?;
}
}

Use in DuckDB:

SELECT * FROM active_users(users);
SELECT count(*) FROM recent_orders(7);

Inspecting the generated SQL

to_sql() returns the CREATE OR REPLACE MACRO statement without requiring a live connection. Use it for logging, debugging, or assertions in tests:

#![allow(unused)]
fn main() {
let m = SqlMacro::scalar("add", &["a", "b"], "a + b")?;
assert_eq!(
    m.to_sql(),
    "CREATE OR REPLACE MACRO add(a, b) AS (a + b)"
);

let t = SqlMacro::table("active_users", &["tbl"], "SELECT * FROM tbl WHERE active = true")?;
assert_eq!(
    t.to_sql(),
    "CREATE OR REPLACE MACRO active_users(tbl) AS TABLE SELECT * FROM tbl WHERE active = true"
);
}

Name and parameter validation

Macro names and parameter names are validated against the same rules as function names:

  • Must match [a-z_][a-z0-9_]*
  • Not exceed 256 characters
  • No null bytes

#![allow(unused)]
fn main() {
SqlMacro::scalar("MyMacro", &[], "1")   // ❌ Err — uppercase
SqlMacro::scalar("my-macro", &[], "1") // ❌ Err — hyphen
SqlMacro::scalar("f", &["X"], "1")     // ❌ Err — uppercase param
SqlMacro::scalar("f", &["_x"], "1")    // ✅ Ok  — underscore prefix allowed
}

SQL injection safety

Macro and parameter names are restricted to [a-z_][a-z0-9_]*, preventing SQL injection at the identifier level. They are interpolated literally (no quoting required, since the character set is already safe).

The body (expression or query) is your own extension code — it is included verbatim. Never build macro bodies from untrusted user input.
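The identifier rule is simple enough to sketch without a regex engine. This mirrors the documented [a-z_][a-z0-9_]* rule; the exact implementation in quack-rs may differ:

```rust
/// Check a macro or parameter name against [a-z_][a-z0-9_]*,
/// the 256-character limit, and the no-null-byte rule.
fn valid_identifier(name: &str) -> bool {
    if name.is_empty() || name.len() > 256 || name.contains('\0') {
        return false;
    }
    let mut chars = name.chars();
    let first = chars.next().unwrap();
    (first.is_ascii_lowercase() || first == '_')
        && chars.all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '_')
}

fn main() {
    assert!(valid_identifier("_x"));        // underscore prefix allowed
    assert!(valid_identifier("safe_div2"));
    assert!(!valid_identifier("MyMacro"));  // uppercase rejected
    assert!(!valid_identifier("my-macro")); // hyphen rejected
}
```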


How it works under the hood

SqlMacro::register executes the CREATE OR REPLACE MACRO statement via duckdb_query:

#![allow(unused)]
fn main() {
pub unsafe fn register(self, con: duckdb_connection) -> Result<(), ExtensionError> {
    let sql = self.to_sql();
    unsafe { execute_sql(con, &sql) }
}
}

execute_sql zero-initializes a duckdb_result, calls duckdb_query, extracts any error message via duckdb_result_error, and always calls duckdb_destroy_result — even on failure.
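The always-destroy contract is the classic RAII pattern. A sketch with a stand-in result type (DestroyOnDrop and execute_sql_stub are illustrative, not the quack-rs implementation):

```rust
use std::cell::Cell;

thread_local! {
    static DESTROY_COUNT: Cell<u32> = Cell::new(0);
}

/// Stand-in for a zero-initialised duckdb_result: Drop plays the role of
/// duckdb_destroy_result, so cleanup runs on success AND on failure.
struct DestroyOnDrop;

impl Drop for DestroyOnDrop {
    fn drop(&mut self) {
        DESTROY_COUNT.with(|c| c.set(c.get() + 1));
    }
}

fn execute_sql_stub(fail: bool) -> Result<(), String> {
    let _result = DestroyOnDrop; // destroyed on every exit path
    if fail {
        return Err("query failed".to_string()); // early return still drops _result
    }
    Ok(())
}

fn main() {
    let _ = execute_sql_stub(false);
    let _ = execute_sql_stub(true);
    assert_eq!(DESTROY_COUNT.with(|c| c.get()), 2); // destroyed both times
}
```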


Choosing between macros and scalar functions

| Scenario | Use |
|----------|-----|
| Logic expressible in SQL | SQL macro — simpler, no FFI |
| Logic needs Rust code (algorithms, external crates, etc.) | Scalar function |
| Best performance for simple expressions | SQL macro (no FFI overhead) |
| Type-specific overloads | Scalar function with multiple registrations |
| Returning a table | SQL table macro |

Copy Functions

Requires the duckdb-1-5 feature flag (DuckDB 1.5.0+).

Copy functions let you implement custom COPY TO file format handlers. When a user runs COPY table TO 'file.xyz' (FORMAT my_format), DuckDB invokes your extension's bind, init, sink, and finalize callbacks.

Lifecycle

  1. Bind — called once. Inspect output columns, configure the export.
  2. Global init — called once. Open the output file, allocate global state.
  3. Sink — called once per data chunk. Write rows to the output.
  4. Finalize — called once. Flush buffers, close the file.
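The four phases can be simulated with an in-memory sink. This is a toy line-per-row format to illustrate the lifecycle; none of the names below come from the quack-rs API:

```rust
/// Toy global state for a COPY TO sink: buffers lines, "closes" on finalize.
struct ToySink {
    lines: Vec<String>,
    finalized: bool,
}

impl ToySink {
    fn global_init() -> Self {           // phase 2: open output, allocate state
        Self { lines: Vec::new(), finalized: false }
    }
    fn sink(&mut self, chunk: &[i64]) {  // phase 3: called once per data chunk
        for v in chunk {
            self.lines.push(v.to_string());
        }
    }
    fn finalize(&mut self) {             // phase 4: flush buffers, close file
        self.finalized = true;
    }
}

fn main() {
    let mut sink = ToySink::global_init();
    sink.sink(&[1, 2]); // first chunk
    sink.sink(&[3]);    // second chunk
    sink.finalize();
    assert_eq!(sink.lines, vec!["1", "2", "3"]);
    assert!(sink.finalized);
}
```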

Builder API

#![allow(unused)]
fn main() {
use quack_rs::copy_function::CopyFunctionBuilder;

let builder = CopyFunctionBuilder::try_new("my_format")?
    .bind(my_bind_fn)
    .global_init(my_global_init_fn)
    .sink(my_sink_fn)
    .finalize(my_finalize_fn);

// Register on a connection (inside entry_point_v2! callback):
// unsafe { builder.register(con)?; }
Ok::<(), quack_rs::error::ExtensionError>(())
}

Callback signatures

| Phase | Signature |
|-------|-----------|
| Bind | unsafe extern "C" fn(info: duckdb_copy_function_bind_info) |
| Global init | unsafe extern "C" fn(info: duckdb_copy_function_global_init_info) |
| Sink | unsafe extern "C" fn(info: duckdb_copy_function_sink_info, chunk: duckdb_data_chunk) |
| Finalize | unsafe extern "C" fn(info: duckdb_copy_function_finalize_info) |

Callback info wrappers

Each phase provides an ergonomic wrapper type around its raw info handle. Wrap the handle at the top of your callback to access helper methods:

CopyBindInfo

| Method | Description |
|--------|-------------|
| column_count() | Number of output columns |
| column_type(index) | LogicalType of the column at index |
| get_extra_info() | Extra-info pointer set on the copy function |
| set_bind_data(data, destroy) | Store bind data and its destructor |
| set_error(message) | Report a bind-time error |
| get_client_context() | Returns a ClientContext for catalog/config access |

CopyGlobalInitInfo

| Method | Description |
|--------|-------------|
| get_bind_data() | Retrieve the bind data pointer |
| get_extra_info() | Extra-info pointer set on the copy function |
| get_file_path() | Output file path for the COPY operation |
| set_global_state(state, destroy) | Store global state and its destructor |
| set_error(message) | Report an init-time error |
| get_client_context() | Returns a ClientContext |

CopySinkInfo

| Method | Description |
|--------|-------------|
| get_bind_data() | Retrieve the bind data pointer |
| get_extra_info() | Extra-info pointer set on the copy function |
| get_global_state() | Retrieve the global state pointer |
| set_error(message) | Report a sink-time error |
| get_client_context() | Returns a ClientContext |

CopyFinalizeInfo

| Method | Description |
|--------|-------------|
| get_bind_data() | Retrieve the bind data pointer |
| get_extra_info() | Extra-info pointer set on the copy function |
| get_global_state() | Retrieve the global state pointer |
| set_error(message) | Report a finalize-time error |
| get_client_context() | Returns a ClientContext |

All four wrappers are re-exported from quack_rs::copy_function:

#![allow(unused)]
fn main() {
use quack_rs::copy_function::{CopyBindInfo, CopyGlobalInitInfo, CopySinkInfo, CopyFinalizeInfo};
}

Reading & Writing Vectors

DuckDB passes data to and from your extension as vectors — columnar arrays of typed values, with a separate NULL bitmap. VectorReader and VectorWriter provide safe, typed access to these vectors.


VectorReader

Construction

#![allow(unused)]
fn main() {
// In a scalar function callback:
let reader = unsafe { VectorReader::new(input, column_index) };

// In an aggregate update callback:
let reader = unsafe { VectorReader::new(input, 0) };   // first column
}

VectorReader::new takes the duckdb_data_chunk and a zero-based column index. The reader borrows the chunk — it must not outlive the callback.

Row count

#![allow(unused)]
fn main() {
let n = reader.row_count();   // number of rows in this chunk
}

Chunk sizes vary. Always loop from 0..reader.row_count(), never assume a fixed size.

NULL check

#![allow(unused)]
fn main() {
if unsafe { !reader.is_valid(row) } {
    // row is NULL — skip or propagate NULL to output
    unsafe { writer.set_null(row) };
    continue;
}
}

Always check is_valid before reading. Reading from a NULL row returns garbage data.

Reading values

#![allow(unused)]
fn main() {
let i: i8  = unsafe { reader.read_i8(row) };
let i: i16 = unsafe { reader.read_i16(row) };
let i: i32 = unsafe { reader.read_i32(row) };
let i: i64 = unsafe { reader.read_i64(row) };
let u: u8  = unsafe { reader.read_u8(row) };
let u: u16 = unsafe { reader.read_u16(row) };
let u: u32 = unsafe { reader.read_u32(row) };
let u: u64 = unsafe { reader.read_u64(row) };
let f: f32 = unsafe { reader.read_f32(row) };
let f: f64 = unsafe { reader.read_f64(row) };
let b: bool = unsafe { reader.read_bool(row) };   // safe: uses u8 != 0
let s: &str = unsafe { reader.read_str(row) };    // handles inline + pointer format
let iv = unsafe { reader.read_interval(row) };    // returns DuckInterval
}

VectorWriter

Construction

#![allow(unused)]
fn main() {
// In a scalar function callback:
let mut writer = unsafe { VectorWriter::new(output) };

// In an aggregate finalize callback:
let mut writer = unsafe { VectorWriter::new(result) };
}

Writing values

#![allow(unused)]
fn main() {
unsafe { writer.write_i8(row, value) };
unsafe { writer.write_i16(row, value) };
unsafe { writer.write_i32(row, value) };
unsafe { writer.write_i64(row, value) };
unsafe { writer.write_u8(row, value) };
unsafe { writer.write_u16(row, value) };
unsafe { writer.write_u32(row, value) };
unsafe { writer.write_u64(row, value) };
unsafe { writer.write_f32(row, value) };
unsafe { writer.write_f64(row, value) };
unsafe { writer.write_bool(row, value) };
unsafe { writer.write_varchar(row, s) };   // &str
unsafe { writer.write_interval(row, interval) };  // DuckInterval
}

Writing NULL

#![allow(unused)]
fn main() {
unsafe { writer.set_null(row) };
}

Pitfall L4: set_null calls duckdb_vector_ensure_validity_writable automatically before accessing the validity bitmap. Calling duckdb_vector_get_validity without this prerequisite returns an uninitialized pointer → SEGFAULT. VectorWriter::set_null handles this correctly. See Pitfall L4.


Utility functions

The quack_rs::vector module provides two utility functions:

#![allow(unused)]
fn main() {
use quack_rs::vector::{vector_size, vector_get_column_type};

// Returns the default vector size used by DuckDB (typically 2048).
let size: u64 = vector_size();

// Returns the LogicalType of a vector (unsafe — requires a valid duckdb_vector).
let lt = unsafe { vector_get_column_type(some_vector) };
}

Memory layout details

DuckDB stores vector data as flat arrays. VectorReader and VectorWriter compute element addresses as base_ptr + row * stride:

[value0][value1][value2]...[valueN]   ← typed array
[validity bitmap]                      ← separate bit array, 1 bit per row

The validity bitmap is lazily allocated — it may be null if no NULLs have been written. This is why ensure_validity_writable must be called before any get_validity call that follows a write path.
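The address and validity computations can be sketched in pure Rust. This is a model of the layout described above, not the quack-rs implementation; the function names are illustrative. DuckDB validity masks are arrays of 64-bit words, one bit per row:

```rust
/// One validity word covers 64 rows; a missing bitmap means "all rows valid".
fn is_valid(validity: Option<&[u64]>, row: usize) -> bool {
    match validity {
        None => true, // lazily allocated: no bitmap yet means no NULLs written
        Some(bits) => (bits[row / 64] >> (row % 64)) & 1 == 1,
    }
}

/// Element byte offset = row * stride (stride = size of the element type).
fn element_offset(row: usize, stride: usize) -> usize {
    row * stride
}

fn main() {
    // Row 3 marked NULL: bit 3 cleared in the first 64-bit word.
    let bitmap = [!0u64 ^ (1 << 3)];
    assert!(is_valid(Some(&bitmap), 0));
    assert!(!is_valid(Some(&bitmap), 3));
    assert!(is_valid(None, 3)); // unallocated bitmap: everything valid
    assert_eq!(element_offset(5, 8), 40); // 6th i64 starts at byte 40
}
```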


Complete scalar function pattern

#![allow(unused)]
fn main() {
unsafe extern "C" fn my_scalar(
    _info: duckdb_function_info,
    input: duckdb_data_chunk,
    output: duckdb_vector,
) {
    let reader = unsafe { VectorReader::new(input, 0) };
    let mut writer = unsafe { VectorWriter::new(output) };

    for row in 0..reader.row_count() {
        if unsafe { !reader.is_valid(row) } {
            unsafe { writer.set_null(row) };
            continue;
        }
        let value = unsafe { reader.read_i64(row) };
        unsafe { writer.write_i64(row, transform(value)) };
    }
}
}

Complex Types: STRUCT, LIST, MAP, ARRAY

DuckDB's complex types — STRUCT, LIST, MAP, and ARRAY — are stored as nested vectors. quack-rs provides four helper types in vector::complex to access the child vectors without manual offset arithmetic.

Overview

| DuckDB type | Storage | quack-rs helper |
|---|---|---|
| STRUCT{a T, b U, …} | Parent vector + N child vectors (one per field) | StructVector |
| LIST&lt;T&gt; | Parent vector holds {offset, length} per row; flat child vector holds elements | ListVector |
| MAP&lt;K, V&gt; | Stored as LIST&lt;STRUCT{key K, value V}&gt; | MapVector |
| ARRAY&lt;T&gt;[N] | Fixed-size array; single child vector | ArrayVector |

Reading complex types (input vectors)

STRUCT

#![allow(unused)]
fn main() {
use quack_rs::vector::{VectorReader, complex::StructVector};

// Inside a scan or finalize callback:
// parent_vec comes from duckdb_data_chunk_get_vector(chunk, col_idx)
let x_reader = unsafe { StructVector::field_reader(parent_vec, 0, row_count) };
let y_reader = unsafe { StructVector::field_reader(parent_vec, 1, row_count) };

for row in 0..row_count {
    if unsafe { x_reader.is_valid(row) } {
        let x: f64 = unsafe { x_reader.read_f64(row) };
        let y: f64 = unsafe { y_reader.read_f64(row) };
        // process (x, y) …
    }
}
}

LIST

#![allow(unused)]
fn main() {
use quack_rs::vector::{VectorReader, complex::ListVector};

let total_elements = unsafe { ListVector::get_size(list_vec) };
let elem_reader = unsafe { ListVector::child_reader(list_vec, total_elements) };

for row in 0..row_count {
    let entry = unsafe { ListVector::get_entry(list_vec, row) };
    for i in 0..entry.length as usize {
        let elem_idx = entry.offset as usize + i;
        if unsafe { elem_reader.is_valid(elem_idx) } {
            let val: i64 = unsafe { elem_reader.read_i64(elem_idx) };
            // process val …
        }
    }
}
}

MAP

MAP is LIST<STRUCT{key, value}>. Access keys and values via the inner struct:

#![allow(unused)]
fn main() {
use quack_rs::vector::{VectorReader, complex::MapVector};

let total = unsafe { MapVector::total_entry_count(map_vec) };
let key_reader   = unsafe { VectorReader::from_vector(MapVector::keys(map_vec), total) };
let value_reader = unsafe { VectorReader::from_vector(MapVector::values(map_vec), total) };

for row in 0..row_count {
    let entry = unsafe { MapVector::get_entry(map_vec, row) };
    for i in 0..entry.length as usize {
        let idx = entry.offset as usize + i;
        let k = unsafe { key_reader.read_str(idx) };
        let v: i64 = unsafe { value_reader.read_i64(idx) };
        // process (k, v) …
    }
}
}

Writing complex types (output vectors)

STRUCT

#![allow(unused)]
fn main() {
use quack_rs::vector::{VectorWriter, complex::StructVector};

let mut x_writer = unsafe { StructVector::field_writer(out_vec, 0) };
let mut y_writer = unsafe { StructVector::field_writer(out_vec, 1) };

for row in 0..batch_size {
    unsafe { x_writer.write_f64(row, x_values[row]) };
    unsafe { y_writer.write_f64(row, y_values[row]) };
}
}

LIST

#![allow(unused)]
fn main() {
use quack_rs::vector::{VectorWriter, complex::ListVector};

let total_elements: usize = rows.iter().map(|r| r.len()).sum();
unsafe { ListVector::reserve(list_vec, total_elements) };

let mut child_writer = unsafe { ListVector::child_writer(list_vec) };
let mut offset = 0usize;
for (row, elements) in rows.iter().enumerate() {
    for (i, &val) in elements.iter().enumerate() {
        unsafe { child_writer.write_i64(offset + i, val) };
    }
    unsafe { ListVector::set_entry(list_vec, row, offset as u64, elements.len() as u64) };
    offset += elements.len();
}
unsafe { ListVector::set_size(list_vec, total_elements) };
}
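The offset/length bookkeeping above can be modeled in pure Rust. This sketch flattens per-row lists into a single flat child buffer plus one (offset, length) entry per row, mirroring DuckDB's LIST storage; the types here are illustrative, not the quack-rs API:

```rust
#[derive(Debug, PartialEq)]
struct ListEntry {
    offset: u64,
    length: u64,
}

/// Flatten per-row lists into LIST layout: one entry per row,
/// all elements concatenated into one flat child buffer.
fn flatten(rows: &[Vec<i64>]) -> (Vec<ListEntry>, Vec<i64>) {
    let mut entries = Vec::with_capacity(rows.len());
    let mut child = Vec::new();
    let mut offset = 0u64;
    for elements in rows {
        entries.push(ListEntry { offset, length: elements.len() as u64 });
        child.extend_from_slice(elements);
        offset += elements.len() as u64;
    }
    (entries, child)
}

fn main() {
    let rows = vec![vec![1, 2], vec![], vec![3, 4, 5]];
    let (entries, child) = flatten(&rows);
    assert_eq!(child, vec![1, 2, 3, 4, 5]);
    assert_eq!(entries[2], ListEntry { offset: 2, length: 3 });

    // Reading back a row uses its entry: child[offset..offset+length]
    let e = &entries[2];
    assert_eq!(&child[e.offset as usize..(e.offset + e.length) as usize], &[3, 4, 5]);
}
```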

MAP

The MAP write workflow is identical to LIST, but keys and values are written into the two struct child vectors:

#![allow(unused)]
fn main() {
use quack_rs::vector::{VectorWriter, complex::MapVector};

unsafe { MapVector::reserve(map_vec, total_pairs) };

let mut key_writer   = unsafe { VectorWriter::from_vector(MapVector::keys(map_vec)) };
let mut val_writer   = unsafe { VectorWriter::from_vector(MapVector::values(map_vec)) };
let mut offset = 0usize;
for (row, pairs) in all_pairs.iter().enumerate() {
    for (i, (k, v)) in pairs.iter().enumerate() {
        unsafe { key_writer.write_varchar(offset + i, k) };
        unsafe { val_writer.write_i64(offset + i, *v) };
    }
    unsafe { MapVector::set_entry(map_vec, row, offset as u64, pairs.len() as u64) };
    offset += pairs.len();
}
unsafe { MapVector::set_size(map_vec, total_pairs) };
}

Constructing complex logical types

Use LogicalType constructors to define complex column types. Most constructors come in two forms: one that accepts TypeId values (for simple element types) and a _from_logical variant (for nested complex types):

| Constructor | _from_logical variant | Creates |
|---|---|---|
| LogicalType::list(TypeId) | list_from_logical(&LogicalType) | LIST&lt;T&gt; |
| LogicalType::map(TypeId, TypeId) | map_from_logical(&LogicalType, &LogicalType) | MAP&lt;K, V&gt; |
| LogicalType::struct_type(&[(&str, TypeId)]) | struct_type_from_logical(&[(&str, LogicalType)]) | STRUCT{...} |
| LogicalType::union_type(&[(&str, TypeId)]) | union_type_from_logical(&[(&str, LogicalType)]) | UNION(...) |
| LogicalType::array(TypeId, u64) | array_from_logical(&LogicalType, u64) | ARRAY&lt;T&gt;[N] |
| LogicalType::enum_type(&[&str]) | (none) | ENUM(...) |
| LogicalType::decimal(u8, u8) | (none) | DECIMAL(w, s) |

API reference

All helpers are in quack_rs::vector::complex (re-exported from quack_rs::prelude).

StructVector

| Method | Description |
|---|---|
| get_child(vec, field_idx) | Returns the raw child vector for field field_idx |
| field_reader(vec, field_idx, row_count) | Creates a VectorReader for a STRUCT field |
| field_writer(vec, field_idx) | Creates a VectorWriter for a STRUCT field |

ListVector

| Method | Description |
|---|---|
| get_child(vec) | Returns the flat element child vector |
| get_size(vec) | Total number of elements across all rows |
| set_size(vec, n) | Sets the number of elements after writing |
| reserve(vec, capacity) | Reserves capacity in the child vector |
| get_entry(vec, row) | Returns {offset, length} for a row (reading) |
| set_entry(vec, row, offset, length) | Sets {offset, length} for a row (writing) |
| child_reader(vec, count) | Creates a VectorReader for the element vector |
| child_writer(vec) | Creates a VectorWriter for the element vector |

MapVector

| Method | Description |
|---|---|
| struct_child(vec) | Returns the inner STRUCT vector |
| keys(vec) | Returns the key vector (STRUCT field 0) |
| values(vec) | Returns the value vector (STRUCT field 1) |
| total_entry_count(vec) | Total key-value pairs |
| reserve(vec, n) | Reserves capacity |
| set_size(vec, n) | Sets total entry count after writing |
| get_entry(vec, row) | Returns {offset, length} for a row (reading) |
| set_entry(vec, row, offset, length) | Sets {offset, length} for a row (writing) |

ArrayVector

| Method | Description |
|---|---|
| get_child(vec) | Returns the child vector of a fixed-size ARRAY vector |

NULL Handling & Strings

This page covers two topics that are handled together in practice: checking for NULL before reading, and reading VARCHAR values from DuckDB vectors.


NULL checks

Every row in a DuckDB vector may be NULL. Always check validity before reading:

#![allow(unused)]
fn main() {
for row in 0..reader.row_count() {
    if unsafe { !reader.is_valid(row) } {
        // Propagate NULL to output
        unsafe { writer.set_null(row) };
        continue;
    }
    // Safe to read
    let value = unsafe { reader.read_str(row) };
}
}

Reading from a NULL row returns garbage data — the vector's data buffer is not zeroed at NULL positions. There is no bounds check or error; you get random bytes from the data buffer.

Writing NULL

#![allow(unused)]
fn main() {
unsafe { writer.set_null(row) };
}

Pitfall L4: VectorWriter::set_null calls duckdb_vector_ensure_validity_writable before accessing the validity bitmap. Calling duckdb_vector_get_validity without this prerequisite returns an uninitialized pointer → SEGFAULT. Never write NULL manually; always use set_null. See Pitfall L4.


VARCHAR reading

Read VARCHAR columns with VectorReader::read_str:

#![allow(unused)]
fn main() {
let s: &str = unsafe { reader.read_str(row) };
}

The returned &str borrows from the DuckDB vector — it must not outlive the callback. Do not store it in a struct; clone it to a String if you need to keep it.

The duckdb_string_t format

Pitfall P7 — The duckdb_string_t format is not documented in the Rust bindings. This is the internalized knowledge encoded in quack-rs.

DuckDB stores VARCHAR values in a 16-byte duckdb_string_t struct with two representations, selected at runtime based on string length:

| Format | Condition | Layout |
|---|---|---|
| Inline | length ≤ 12 | [len: u32][data: [u8; 12]] |
| Pointer | length > 12 | [len: u32][prefix: [u8; 4]][ptr: *const u8] |

Both layouts occupy exactly 16 bytes.

VectorReader::read_str and the underlying read_duck_string function handle both formats transparently. You never need to inspect the raw struct.
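The inline/pointer selection can be sketched in pure Rust. This models the 16-byte layout described above (assuming little-endian byte order); it follows only the inline branch, since the pointer branch requires a live heap pointer, and it is not the quack-rs implementation:

```rust
/// Strings of at most 12 bytes are stored inline after the u32 length.
const INLINE_MAX: usize = 12;

/// Decode the inline form of a 16-byte string struct; returns None for
/// the pointer form (bytes 8..16 would then hold a heap pointer).
fn decode_inline(raw: &[u8; 16]) -> Option<&str> {
    let len = u32::from_le_bytes([raw[0], raw[1], raw[2], raw[3]]) as usize;
    if len <= INLINE_MAX {
        std::str::from_utf8(&raw[4..4 + len]).ok()
    } else {
        None
    }
}

fn main() {
    // Build an inline string: len = 5, data = "hello" in bytes 4..9.
    let mut raw = [0u8; 16];
    raw[0..4].copy_from_slice(&5u32.to_le_bytes());
    raw[4..9].copy_from_slice(b"hello");
    assert_eq!(decode_inline(&raw), Some("hello"));

    // len > 12 selects the pointer format, which this sketch does not follow.
    let mut long = [0u8; 16];
    long[0..4].copy_from_slice(&20u32.to_le_bytes());
    assert_eq!(decode_inline(&long), None);
}
```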

Empty strings vs NULL

An empty string ("") and NULL are distinct values:

#![allow(unused)]
fn main() {
// NULL: is_valid returns false
// Empty string: is_valid returns true, read_str returns ""
if unsafe { !reader.is_valid(row) } {
    // This is NULL
} else {
    let s = unsafe { reader.read_str(row) };
    if s.is_empty() {
        // This is an empty string, not NULL
    }
}
}

Writing VARCHAR

#![allow(unused)]
fn main() {
unsafe { writer.write_varchar(row, my_str) };  // &str
}

write_varchar copies the string bytes into DuckDB's managed storage. The &str reference is no longer needed after the call returns.


Complete NULL-safe VARCHAR pattern

#![allow(unused)]
fn main() {
unsafe extern "C" fn my_scalar(
    _info: duckdb_function_info,
    input: duckdb_data_chunk,
    output: duckdb_vector,
) {
    let reader = unsafe { VectorReader::new(input, 0) };
    let mut writer = unsafe { VectorWriter::new(output) };

    for row in 0..reader.row_count() {
        if unsafe { !reader.is_valid(row) } {
            unsafe { writer.set_null(row) };
            continue;
        }
        let s = unsafe { reader.read_str(row) };
        let upper = s.to_uppercase();
        unsafe { writer.write_varchar(row, &upper) };
    }
}
}

DuckStringView

For advanced use cases where you need access to the raw string bytes or the inline/pointer distinction, quack_rs::vector::string::DuckStringView is available:

#![allow(unused)]
fn main() {
use quack_rs::vector::string::{DuckStringView, DUCK_STRING_SIZE};

// From raw 16-byte data (inside a vector callback)
let raw: &[u8; 16] = unsafe { &*data.add(idx * DUCK_STRING_SIZE).cast() };
let view = DuckStringView::from_bytes(raw);

println!("length: {}", view.len());
println!("is_empty: {}", view.is_empty());
if let Some(s) = view.as_str() {
    println!("content: {s}");
}
}

In practice, prefer reader.read_str(row). DuckStringView is only needed when you have a raw pointer and want to avoid creating a full VectorReader.


Constants

| Constant | Value | Meaning |
|---|---|---|
| DUCK_STRING_SIZE | 16 | Size of one duckdb_string_t in bytes |
| DUCK_STRING_INLINE_MAX_LEN | 12 | Max length stored inline (no heap ptr) |

INTERVAL Type

DuckDB's INTERVAL type represents a duration with three independent components: months, days, and sub-day microseconds. The quack_rs::interval module provides the DuckInterval struct and safe conversion utilities.


Why a custom struct?

Pitfall P8 — The INTERVAL struct layout and its conversion semantics are not documented in the Rust bindings. This module encodes that knowledge.

DuckDB's C duckdb_interval struct is 16 bytes with this exact layout:

offset 0:  months (i32)  — calendar months
offset 4:  days   (i32)  — calendar days
offset 8:  micros (i64)  — sub-day microseconds
total:     16 bytes

DuckInterval is #[repr(C)] with the same field order and is verified at compile time to be exactly 16 bytes.


Reading INTERVAL values

#![allow(unused)]
fn main() {
let iv: DuckInterval = unsafe { reader.read_interval(row) };
println!("{} months, {} days, {} µs", iv.months, iv.days, iv.micros);
}

VectorReader::read_interval handles the raw pointer arithmetic and alignment using read_interval_at internally.


DuckInterval fields

#![allow(unused)]
fn main() {
use quack_rs::interval::DuckInterval;

let iv = DuckInterval {
    months: 1,    // 1 calendar month
    days: 15,     // 15 calendar days
    micros: 3600_000_000,  // 1 hour in microseconds
};
}

Fields are public and can be constructed directly.

Zero interval

#![allow(unused)]
fn main() {
let zero = DuckInterval::zero();    // { months: 0, days: 0, micros: 0 }
let zero = DuckInterval::default(); // same
}

Converting to microseconds

Intervals are not directly comparable because months and days have variable lengths in wall-clock time. When you need a single numeric value, convert to microseconds using the DuckDB approximation: 1 month = 30 days.

Checked conversion (returns Option)

#![allow(unused)]
fn main() {
use quack_rs::interval::interval_to_micros;

let iv = DuckInterval { months: 0, days: 1, micros: 500_000 };
match interval_to_micros(iv) {
    Some(us) => println!("{us} microseconds"),
    None => println!("overflow"),
}

// Method form:
let us: Option<i64> = iv.to_micros();
}

Returns None if the result would overflow i64. This can happen with extreme values (e.g., months: i32::MAX).
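The checked arithmetic behind this conversion is simple to state. A sketch of the documented semantics under the 1 month = 30 days approximation (illustrative code, not the quack-rs source):

```rust
const MICROS_PER_DAY: i64 = 86_400_000_000;
const MICROS_PER_MONTH: i64 = 30 * MICROS_PER_DAY;

/// Checked conversion: returns None if any step would overflow i64.
fn to_micros_checked(months: i32, days: i32, micros: i64) -> Option<i64> {
    let m = (months as i64).checked_mul(MICROS_PER_MONTH)?;
    let d = (days as i64).checked_mul(MICROS_PER_DAY)?;
    m.checked_add(d)?.checked_add(micros)
}

fn main() {
    // 1 day + 0.5 seconds
    assert_eq!(to_micros_checked(0, 1, 500_000), Some(86_400_500_000));
    // Extreme month counts overflow i64 and yield None
    assert_eq!(to_micros_checked(i32::MAX, 0, 0), None);
}
```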

Saturating conversion (never panics)

#![allow(unused)]
fn main() {
use quack_rs::interval::interval_to_micros_saturating;

let iv = DuckInterval { months: i32::MAX, days: i32::MAX, micros: i64::MAX };
let us: i64 = interval_to_micros_saturating(iv); // i64::MAX

// Method form:
let us: i64 = iv.to_micros_saturating();
}

Use the saturating form in FFI callbacks where panics are not allowed.


Conversion constants

| Constant | Value | Meaning |
|---|---|---|
| MICROS_PER_DAY | 86_400_000_000 | Microseconds in 24 hours |
| MICROS_PER_MONTH | 2_592_000_000_000 | Microseconds in 30 days |

#![allow(unused)]
fn main() {
use quack_rs::interval::{MICROS_PER_DAY, MICROS_PER_MONTH};

assert_eq!(MICROS_PER_DAY, 86_400 * 1_000_000);
assert_eq!(MICROS_PER_MONTH, 30 * MICROS_PER_DAY);
}

Low-level: read_interval_at

If you have a raw data pointer (e.g., from duckdb_vector_get_data), you can read an interval directly:

#![allow(unused)]
fn main() {
use quack_rs::interval::read_interval_at;

// SAFETY: data is a valid DuckDB INTERVAL vector data pointer, idx is in bounds.
let iv = unsafe { read_interval_at(data_ptr, row_idx) };
}

In practice you should use VectorReader::read_interval(row) instead, which handles all safety invariants.


Complete example: aggregate over INTERVAL

#![allow(unused)]
fn main() {
#[derive(Default)]
struct TotalDurationState {
    total_micros: i64,
}
impl AggregateState for TotalDurationState {}

unsafe extern "C" fn update(
    _info: duckdb_function_info,
    input: duckdb_data_chunk,
    states: *mut duckdb_aggregate_state,
) {
    let reader = unsafe { VectorReader::new(input, 0) };
    for row in 0..reader.row_count() {
        if unsafe { !reader.is_valid(row) } { continue; }
        let iv = unsafe { reader.read_interval(row) };
        let us = iv.to_micros_saturating();
        let state_ptr = unsafe { *states.add(row) };
        if let Some(st) = unsafe { FfiState::<TotalDurationState>::with_state_mut(state_ptr) } {
            st.total_micros = st.total_micros.saturating_add(us);
        }
    }
}
}

Memory layout verification

DuckInterval includes a compile-time assertion that validates its size and alignment against DuckDB's C struct. If the assertion fails, the crate will not compile — catching any future mismatch at build time rather than runtime.
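A layout check of this kind can be written as a const assertion, which the compiler evaluates at build time. This is an illustrative sketch against a model struct, not the exact assertion in quack-rs:

```rust
use std::mem::{align_of, size_of};

// Model of the 16-byte C layout: i32 at offset 0, i32 at offset 4, i64 at offset 8.
#[repr(C)]
struct DuckIntervalModel {
    months: i32,
    days: i32,
    micros: i64,
}

// Evaluated at compile time: any layout mismatch is a build error, not a runtime bug.
const _: () = assert!(size_of::<DuckIntervalModel>() == 16);
const _: () = assert!(align_of::<DuckIntervalModel>() == 8);

fn main() {
    println!("layout verified at compile time");
}
```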

Testing Guide

quack-rs provides a two-tier testing strategy: pure-Rust unit tests for business logic (no DuckDB required), and SQLLogicTest E2E tests that run inside an actual DuckDB process.


Architectural limitation: the loadable-extension dispatch wall

This is the most important thing to understand before writing tests.

DuckDB loadable extensions use libduckdb-sys with features = ["loadable-extension"]. This intentionally does not link the DuckDB runtime into the extension binary. Instead, every DuckDB C API call (duckdb_vector_get_data, duckdb_create_logical_type, etc.) goes through a lazy dispatch table — a global struct of AtomicPtr<fn> pointers initialized only when DuckDB calls duckdb_rs_extension_api_init at extension-load time.

In cargo test, no DuckDB process loads your extension. The dispatch table is never initialized, and the first call to any DuckDB C API function panics:

DuckDB API not initialized
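The dispatch mechanism can be modeled with an atomic slot that panics when unset. This is a simplified conceptual model of the failure mode, not the libduckdb-sys implementation; all names here are illustrative:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// One slot of the dispatch table: a function address installed at load time.
// Zero means "host has not initialized the API yet".
static VECTOR_GET_DATA: AtomicUsize = AtomicUsize::new(0);

fn stub_impl() {} // stand-in for the real DuckDB function

/// Calling through the table before initialization reproduces the panic.
fn call_vector_get_data() {
    let addr = VECTOR_GET_DATA.load(Ordering::Acquire);
    if addr == 0 {
        panic!("DuckDB API not initialized");
    }
    let f: fn() = unsafe { std::mem::transmute(addr) };
    f();
}

/// What the init entry point does conceptually at extension-load time.
fn init_api() {
    VECTOR_GET_DATA.store(stub_impl as usize, Ordering::Release);
}

fn main() {
    // In cargo test no host ever calls init_api, so the first call panics.
    assert!(std::panic::catch_unwind(call_vector_get_data).is_err());
    init_api();
    call_vector_get_data(); // after initialization, dispatch succeeds
}
```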

What this breaks

| API | Why it fails |
|---|---|
| VectorReader::new | calls duckdb_vector_get_data |
| VectorWriter::new | calls duckdb_vector_get_data |
| Connection::register_* | calls DuckDB registration C API |
| LogicalType::new | calls duckdb_create_logical_type |
| LogicalType::drop | calls duckdb_destroy_logical_type |
| BindInfo::add_result_column | calls duckdb_bind_add_result_column |

What still works in cargo test

| API | Why it works |
|---|---|
| AggregateTestHarness | pure Rust, zero DuckDB dependency |
| MockVectorWriter / MockVectorReader | in-memory buffers, zero DuckDB dependency |
| MockRegistrar | records registrations without calling C API |
| SqlMacro::to_sql() | generates SQL strings, no DuckDB needed |
| interval_to_micros | pure arithmetic |
| validate / scaffold | pure Rust |
| InMemoryDb | uses bundled DuckDB via duckdb crate (bundled-test feature) |

Mock types for callback logic

When your scalar or table function callback reads inputs and writes outputs, extract that logic into a pure-Rust function. Then test it with MockVectorReader (input) and MockVectorWriter (output):

#![allow(unused)]
fn main() {
use quack_rs::testing::{MockVectorReader, MockVectorWriter};

// Pure Rust logic — extracted from the FFI callback
fn compute_upper(reader: &MockVectorReader, writer: &mut MockVectorWriter) {
    for i in 0..reader.row_count() {
        if reader.is_valid(i) {
            let s = reader.try_get_str(i).unwrap_or("");
            writer.write_varchar(i, &s.to_uppercase());
        } else {
            writer.set_null(i);
        }
    }
}

#[test]
fn test_compute_upper() {
    let reader = MockVectorReader::from_strs([Some("hello"), None, Some("world")]);
    let mut writer = MockVectorWriter::new(3);
    compute_upper(&reader, &mut writer);

    assert_eq!(writer.try_get_str(0), Some("HELLO"));
    assert!(writer.is_null(1));
    assert_eq!(writer.try_get_str(2), Some("WORLD"));
}
}

The real FFI callback becomes a thin wrapper:

#![allow(unused)]
fn main() {
unsafe extern "C" fn my_scalar(
    _info: duckdb_function_info,
    input: duckdb_data_chunk,
    output: duckdb_vector,
) {
    // Real DuckDB wrappers — only used in production, not in cargo test
    let reader = unsafe { VectorReader::new(input, 0) };
    let mut writer = unsafe { VectorWriter::new(output) };
    // Delegate to the same row-loop logic that the mocks exercise,
    // adapted to the real VectorReader/VectorWriter types.
}
}

Testing registration with MockRegistrar

MockRegistrar implements the Registrar trait without calling any DuckDB C API. Use it to verify your registration function registers the right set of functions:

#![allow(unused)]
fn main() {
use quack_rs::connection::Registrar;
use quack_rs::testing::MockRegistrar;
use quack_rs::scalar::ScalarFunctionBuilder;
use quack_rs::types::TypeId;
use quack_rs::error::ExtensionError;

fn register_all(reg: &impl Registrar) -> Result<(), ExtensionError> {
    let upper = ScalarFunctionBuilder::new("upper_ext")
        .param(TypeId::Varchar)
        .returns(TypeId::Varchar);
    let lower = ScalarFunctionBuilder::new("lower_ext")
        .param(TypeId::Varchar)
        .returns(TypeId::Varchar);
    unsafe {
        reg.register_scalar(upper)?;
        reg.register_scalar(lower)?;
    }
    Ok(())
}

#[test]
fn test_register_all() {
    let mock = MockRegistrar::new();
    register_all(&mock).unwrap();
    assert_eq!(mock.total_registrations(), 2);
    assert!(mock.has_scalar("upper_ext"));
    assert!(mock.has_scalar("lower_ext"));
}
}

Limitation: MockRegistrar cannot be used with builders that hold LogicalType values (created via .returns_logical() or .param_logical()), because LogicalType::drop calls duckdb_destroy_logical_type. Use TypeId parameters with MockRegistrar.


SQL-level testing with InMemoryDb (bundled-test feature)

For SQL-level assertions — verifying that a SQL macro produces the correct output, or that a CREATE TABLE + INSERT + SELECT pipeline works — enable the bundled-test Cargo feature. This provides InMemoryDb, which wraps the duckdb crate's bundled DuckDB and automatically initialises the loadable-extension dispatch table before opening a connection (see Pitfall P9):

# In your extension's Cargo.toml
[dev-dependencies]
quack-rs = { version = "0.7", features = ["bundled-test"] }

Build time: enabling bundled-test compiles a full copy of DuckDB from source (the duckdb Rust crate with features = ["bundled"]) and a small C++ shim via the cc build dependency. Expect the first build to take 2–5 minutes depending on your machine; subsequent builds are incremental. This only affects the test build — it has no impact on your extension's release binary.

#![allow(unused)]
fn main() {
#[cfg(feature = "bundled-test")]
use quack_rs::testing::InMemoryDb;
use quack_rs::sql_macro::SqlMacro;

#[test]
fn test_clamp_macro_sql() {
    let db = InMemoryDb::open().unwrap();

    // Generate and execute the CREATE MACRO SQL
    let m = SqlMacro::scalar("clamp", &["x", "lo", "hi"], "greatest(lo, least(hi, x))").unwrap();
    db.execute_batch(&m.to_sql()).unwrap();

    // Verify correct output
    let result: i64 = db.query_one("SELECT clamp(5, 1, 10)").unwrap();
    assert_eq!(result, 5);

    let clamped: i64 = db.query_one("SELECT clamp(15, 1, 10)").unwrap();
    assert_eq!(clamped, 10);
}
}

Note: InMemoryDb cannot test your FFI callbacks (VectorReader, VectorWriter) because those still route through the loadable-extension dispatch. Use InMemoryDb for SQL logic and mocks for callback logic.


Why two tiers?

Pitfall P3 — Unit tests are insufficient. 435 unit tests passed in duckdb-behavioral while the extension had three critical bugs: a SEGFAULT on load, 6 of 7 functions not registering, and wrong results from a combine bug. E2E tests caught all three.

| Test tier | What it catches | What it misses |
|---|---|---|
| Unit tests | Logic bugs in state structs | FFI wiring, registration failures, SEGFAULT |
| E2E tests | Everything above + FFI integration | Nothing (it's real DuckDB) |

Both tiers are required. Unit tests give fast, deterministic feedback. E2E tests prove the extension actually works inside DuckDB.


Unit tests with AggregateTestHarness

AggregateTestHarness<S> simulates the DuckDB aggregate lifecycle in pure Rust without any DuckDB dependency:

flowchart LR
    N["new()"] --> U["update() × N"]
    U --> C["combine() *(optional)*"]
    C --> F["finalize()"]
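Internally, such a harness only needs to drive closures over a default-initialized state. A minimal sketch of how a harness like this can work (not the quack-rs implementation):

```rust
/// Minimal model of the new → update × N → combine → finalize lifecycle.
struct Harness<S: Default> {
    state: S,
}

impl<S: Default> Harness<S> {
    fn new() -> Self {
        Self { state: S::default() }
    }

    fn update(&mut self, f: impl FnOnce(&mut S)) {
        f(&mut self.state);
    }

    /// Merge another harness's state into this one, as DuckDB does when
    /// combining partial aggregates into a fresh zero-initialized target.
    fn combine(&mut self, src: &Self, f: impl FnOnce(&S, &mut S)) {
        f(&src.state, &mut self.state);
    }

    fn finalize(self) -> S {
        self.state
    }
}

#[derive(Default)]
struct Sum {
    total: i64,
}

fn main() {
    let mut h = Harness::<Sum>::new();
    h.update(|s| s.total += 10);
    h.update(|s| s.total += 5);

    let mut target = Harness::<Sum>::new(); // fresh zero-initialized state
    target.combine(&h, |src, tgt| tgt.total += src.total);
    assert_eq!(target.finalize().total, 15);
}
```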

Basic usage

#![allow(unused)]
fn main() {
use quack_rs::testing::AggregateTestHarness;
use quack_rs::aggregate::AggregateState;

#[derive(Default, Debug, PartialEq)]
struct SumState { total: i64 }
impl AggregateState for SumState {}

#[test]
fn test_sum() {
    let mut h = AggregateTestHarness::<SumState>::new();
    h.update(|s| s.total += 10);
    h.update(|s| s.total += 20);
    h.update(|s| s.total += 5);
    assert_eq!(h.finalize().total, 35);
}
}

Convenience: aggregate

For testing over a collection of inputs:

#![allow(unused)]
fn main() {
#[test]
fn test_word_count() {
    let result = AggregateTestHarness::<WordCountState>::aggregate(
        ["hello world", "one", "two three four", ""],
        |s, text| s.count += count_words(text),
    );
    assert_eq!(result.count, 6);  // 2 + 1 + 3 + 0
}
}

Testing combine (Pitfall L1)

DuckDB creates fresh zero-initialized target states and calls combine to merge into them. You MUST propagate ALL fields — including configuration fields — not just accumulated data. Test this explicitly:

#![allow(unused)]
fn main() {
#[test]
fn combine_propagates_config() {
    let mut h1 = AggregateTestHarness::<MyState>::new();
    h1.update(|s| {
        s.window_size = 3600;  // config field
        s.count += 5;          // data field
    });

    // h2 simulates a fresh zero-initialized state created by DuckDB
    let mut h2 = AggregateTestHarness::<MyState>::new();

    h2.combine(&h1, |src, tgt| {
        tgt.window_size = src.window_size;  // MUST propagate config
        tgt.count += src.count;
    });

    let result = h2.finalize();
    assert_eq!(result.window_size, 3600);  // Would be 0 if forgotten
    assert_eq!(result.count, 5);
}
}

Inspecting intermediate state

#![allow(unused)]
fn main() {
let mut h = AggregateTestHarness::<SumState>::new();
h.update(|s| s.total += 5);
assert_eq!(h.state().total, 5);   // borrow without consuming
h.update(|s| s.total += 3);
assert_eq!(h.state().total, 8);
}

Resetting

#![allow(unused)]
fn main() {
let mut h = AggregateTestHarness::<SumState>::new();
h.update(|s| s.total = 999);
h.reset();
assert_eq!(h.state().total, 0);  // back to S::default()
}

Pre-populating state

#![allow(unused)]
fn main() {
let initial = MyState { window_size: 3600, count: 0 };
let h = AggregateTestHarness::with_state(initial);
}

Unit tests for scalar functions

Scalar logic is pure Rust — test it directly:

#![allow(unused)]
fn main() {
// From examples/hello-ext/src/lib.rs — scalar function logic
pub fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}

#[test]
fn first_word_basic() {
    assert_eq!(first_word("hello world"), "hello");
    assert_eq!(first_word("  padded  "), "padded");
    assert_eq!(first_word(""), "");
    assert_eq!(first_word("   "), "");
}
}

Unit tests for SQL macros

SqlMacro::to_sql() is pure Rust — no DuckDB connection needed:

#![allow(unused)]
fn main() {
use quack_rs::sql_macro::SqlMacro;

#[test]
fn scalar_macro_sql() {
    let m = SqlMacro::scalar("double_it", &["x"], "x * 2").unwrap();
    assert_eq!(m.to_sql(),
        "CREATE OR REPLACE MACRO double_it(x) AS (x * 2)");
}

#[test]
fn table_macro_sql() {
    let m = SqlMacro::table("recent", &["n"], "SELECT * FROM events LIMIT n").unwrap();
    assert_eq!(m.to_sql(),
        "CREATE OR REPLACE MACRO recent(n) AS TABLE SELECT * FROM events LIMIT n");
}
}

E2E testing with SQLLogicTest

Community extensions are tested using DuckDB's SQLLogicTest format. This format runs SQL directly in DuckDB and verifies output line-by-line.

File location

test/sql/my_extension.test

Format

# my_extension tests

require my_extension

statement ok
LOAD my_extension;

query I
SELECT my_function('hello world');
----
2

Directives:

| Directive | Meaning |
|---|---|
| require | Skip test if extension not available |
| statement ok | SQL must succeed |
| statement error | SQL must fail |
| query I | Query returning one INTEGER column |
| query II | Query returning two columns |
| query T | Query returning one TEXT column |
| ---- | Expected output follows |

Installing DuckDB (1.4.4, 1.5.0, or 1.5.1)

A live DuckDB CLI is required for E2E testing. Install it via curl (no system package manager needed). DuckDB 1.4.4, 1.5.0, or 1.5.1 all work — they use the same C API version (v1.2.0). We recommend 1.5.1 for critical WAL and ART index fixes:

# DuckDB 1.5.1 (recommended)
curl -fsSL https://github.com/duckdb/duckdb/releases/download/v1.5.1/duckdb_cli-linux-amd64.zip \
    -o /tmp/duckdb.zip \
    && unzip -o /tmp/duckdb.zip -d /tmp/ \
    && chmod +x /tmp/duckdb \
    && /tmp/duckdb --version
# → v1.5.1

For macOS, replace linux-amd64 with osx-universal. For Windows, use windows-amd64 and unzip to a directory on %PATH%.

Running E2E tests

# Build the extension
cargo build --release

# Package with metadata footer (required by DuckDB's extension loader)
cargo run --bin append_metadata -- \
    target/release/libmy_extension.so \
    /tmp/my_extension.duckdb_extension \
    --abi-type C_STRUCT \
    --extension-version v0.1.0 \
    --duckdb-version v1.2.0 \
    --platform linux_amd64

# Load it in DuckDB CLI (-unsigned allows loading without a signed certificate)
/tmp/duckdb -unsigned -c "
SET allow_extensions_metadata_mismatch=true;
LOAD '/tmp/my_extension.duckdb_extension';
SELECT my_function('hello world');
"

The community extension CI runs SQLLogicTest automatically. Each function must have at least one test:

# Test NULL handling
query I
SELECT my_function(NULL);
----
NULL

# Test empty input
query I
SELECT my_function('');
----
0

# Test normal case
query I
SELECT my_function('hello world');
----
2

Pitfall P5 — SQLLogicTest does exact string matching. Copy expected values directly from DuckDB CLI output. NULL is represented as NULL (uppercase). Floats must match to the number of decimal places DuckDB outputs.


Property-based testing with proptest

The proptest crate is well-suited for testing aggregate logic over arbitrary inputs:

#![allow(unused)]
fn main() {
use proptest::prelude::*;

proptest! {
    #[test]
    fn saturating_never_panics(months: i32, days: i32, micros: i64) {
        let iv = DuckInterval { months, days, micros };
        // Must not panic for any input
        let _ = interval_to_micros_saturating(iv);
    }
}
}

quack-rs's own test suite uses proptest for interval conversion and aggregate harness properties.


What to test

| Scenario | Unit | E2E |
|---|---|---|
| NULL input → NULL output | ✓ | ✓ |
| Empty string | ✓ | ✓ |
| Unicode strings | ✓ | ✓ |
| Numeric edge cases (0, MAX, MIN) | ✓ | ✓ |
| Combine propagates config | ✓ | ✓ |
| Multi-group aggregation | | ✓ |
| Function registration success | | ✓ |
| Extension loads without crash | | ✓ |
| SQL macro produces correct output | ✓ (to_sql) | ✓ |

Dev dependencies

[dev-dependencies]
quack-rs = { version = "0.7", features = [] }
proptest = "1"

The testing module is compiled unconditionally (not gated behind #[cfg(test)]) so that downstream crates can use it in their own tests when they pull in quack-rs as a dev-dependency.

Community Extensions

DuckDB's community extension ecosystem allows anyone to publish a loadable extension that DuckDB users can install with a single SQL command. This page covers everything you need to submit and maintain a community extension built with quack-rs.


Prerequisites

  • A working extension that passes local E2E tests
  • A GitHub repository (the community build runs from it)
  • All functions tested with SQLLogicTest format
  • A globally unique extension name

Scaffolding a new project

quack_rs::scaffold::generate_scaffold generates all required files from a single function call:

#![allow(unused)]
fn main() {
use quack_rs::scaffold::{ScaffoldConfig, generate_scaffold};

let config = ScaffoldConfig {
    name: "my_extension".to_string(),
    description: "Does something useful".to_string(),
    version: "0.1.0".to_string(),
    license: "MIT".to_string(),
    maintainer: "Your Name".to_string(),
    github_repo: "yourorg/duckdb-my-extension".to_string(),
    excluded_platforms: vec![],
};

let files = generate_scaffold(&config).expect("scaffold failed");
for file in &files {
    std::fs::create_dir_all(std::path::Path::new(&file.path).parent().unwrap()).unwrap();
    std::fs::write(&file.path, &file.content).unwrap();
}
}

This generates:

my_extension/
├── Cargo.toml
├── Makefile
├── extension_config.cmake
├── src/lib.rs
├── src/wasm_lib.rs
├── description.yml
├── test/sql/my_extension.test
├── .github/workflows/extension-ci.yml
├── .gitmodules
├── .gitignore
└── .cargo/config.toml

description.yml

Required fields for community submission:

extension:
  name: my_extension
  description: One-line description of what your extension does
  version: 0.1.0
  language: Rust
  build: cargo
  license: MIT
  requires_toolchains: rust;python3
  excluded_platforms: ""   # or "wasm_mvp;wasm_eh;wasm_threads"
  maintainers:
    - Your Name

repo:
  github: yourorg/duckdb-my-extension
  ref: main

Use quack_rs::validate to pre-validate fields before submission:

#![allow(unused)]
fn main() {
use quack_rs::validate::{
    validate_extension_name,
    validate_extension_version,
    validate_spdx_license,
    validate_excluded_platforms_str,
};

validate_extension_name("my_extension")?;
validate_extension_version("0.1.0")?;
validate_spdx_license("MIT")?;
validate_excluded_platforms_str("wasm_mvp;wasm_eh")?;
}

Naming rules

Extension names must satisfy all of the following:

  • Match ^[a-z][a-z0-9_-]*$ (lowercase, digits, hyphens, underscores)
  • Not exceed 64 characters
  • Be globally unique across the entire DuckDB community extensions ecosystem

Check existing names at community-extensions.duckdb.org before choosing. Use vendor-prefixed names to avoid collisions:

myorg_analytics   ✓
analytics         ✗  (likely taken or too generic)
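The first two rules reduce to a few lines of plain Rust. The sketch below is a hypothetical re-implementation for illustration only — use quack_rs::validate::validate_extension_name in real code, and remember that the third rule (global uniqueness) can only be checked against the registry:

```rust
// Hypothetical sketch of the naming rules above; the real check is
// quack_rs::validate::validate_extension_name.
fn is_valid_extension_name(name: &str) -> bool {
    // Rule 2: non-empty, at most 64 characters.
    if name.is_empty() || name.len() > 64 {
        return false;
    }
    let mut chars = name.chars();
    // Rule 1a: must start with a lowercase ASCII letter.
    match chars.next() {
        Some(c) if c.is_ascii_lowercase() => {}
        _ => return false,
    }
    // Rule 1b: the rest may be lowercase letters, digits, hyphens, underscores.
    chars.all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '-' || c == '_')
}

fn main() {
    assert!(is_valid_extension_name("myorg_analytics"));
    assert!(!is_valid_extension_name("MyOrgAnalytics")); // uppercase rejected
    assert!(!is_valid_extension_name("1analytics"));     // must start with a letter
}
```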

Pitfall P1 — The [lib] name in Cargo.toml MUST exactly match the extension name. If your crate name is duckdb-my-ext (producing libduckdb_my_ext.so) but description.yml says name: my_ext, the community build fails with FileNotFoundError.


Versioning

| Format | Example | Meaning |
|---|---|---|
| 7+ hex chars | 690bfc5 | Unstable — no guarantees |
| 0.y.z | 0.1.0 | Pre-release — working toward stability |
| x.y.z (x > 0) | 1.0.0 | Stable — full semver guarantees |

Use validate_extension_version to accept all three formats, and classify_extension_version to determine the stability tier:

#![allow(unused)]
fn main() {
use quack_rs::validate::semver::{ExtensionStability, classify_extension_version};

match classify_extension_version("0.1.0")? {
    ExtensionStability::Unstable => println!("git hash"),
    ExtensionStability::PreRelease => println!("0.y.z"),
    ExtensionStability::Stable => println!("x.y.z, x>0"),
}
}
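The tiering rules amount to a small amount of string inspection. The following is an illustrative sketch (Stability and classify are made-up names, not the quack-rs API, and the real classifier may resolve edge cases such as all-digit hashes differently):

```rust
#[derive(Debug, PartialEq)]
enum Stability {
    Unstable,   // 7+ hex chars: git hash
    PreRelease, // 0.y.z
    Stable,     // x.y.z with x > 0
}

fn classify(version: &str) -> Option<Stability> {
    // A string of 7+ hex digits is treated as a git hash.
    if version.len() >= 7 && version.chars().all(|c| c.is_ascii_hexdigit()) {
        return Some(Stability::Unstable);
    }
    // Otherwise require exactly three dot-separated numeric components.
    let parts: Vec<u64> = version
        .split('.')
        .map(|p| p.parse().ok())
        .collect::<Option<Vec<u64>>>()?;
    match parts.as_slice() {
        [0, _, _] => Some(Stability::PreRelease),
        [x, _, _] if *x > 0 => Some(Stability::Stable),
        _ => None,
    }
}

fn main() {
    assert_eq!(classify("690bfc5"), Some(Stability::Unstable));
    assert_eq!(classify("0.1.0"), Some(Stability::PreRelease));
    assert_eq!(classify("1.0.0"), Some(Stability::Stable));
}
```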

Platform targets

Community extensions are built for:

| Platform | Description |
|---|---|
| linux_amd64 | Linux x86_64 |
| linux_amd64_gcc4 | Linux x86_64 (GCC 4 ABI) |
| linux_arm64 | Linux AArch64 |
| osx_amd64 | macOS x86_64 |
| osx_arm64 | macOS Apple Silicon |
| windows_amd64 | Windows x86_64 |
| windows_amd64_mingw | Windows x86_64 (MinGW) |
| windows_arm64 | Windows AArch64 |
| wasm_mvp | WebAssembly (MVP) |
| wasm_eh | WebAssembly (exception handling) |
| wasm_threads | WebAssembly (threads) |

If your extension cannot be built for a platform (e.g., it uses a platform-specific system library), add it to excluded_platforms:

#![allow(unused)]
fn main() {
ScaffoldConfig {
    excluded_platforms: vec![
        "wasm_mvp".to_string(),
        "wasm_eh".to_string(),
        "wasm_threads".to_string(),
    ],
    // ...
}
}

Validate individual platform names with validate_platform:

#![allow(unused)]
fn main() {
use quack_rs::validate::validate_platform;
validate_platform("linux_amd64")?;  // Ok
validate_platform("invalid")?;       // Err
}

Cargo.toml requirements

[package]
name = "my_extension"
version = "0.1.0"
edition = "2021"

[lib]
name = "my_extension"       # Must match description.yml `name`
crate-type = ["cdylib", "rlib"]

[dependencies]
quack-rs = "0.7"
libduckdb-sys = { version = ">=1.4.4, <2", features = ["loadable-extension"] }

[profile.release]
panic = "abort"              # Required — no stack unwinding in FFI
opt-level = 3
lto = "thin"
strip = "symbols"

Pitfall ADR-1 — Do NOT use the duckdb crate's bundled feature. A loadable extension must link against the DuckDB that loads it, not bundle its own copy. libduckdb-sys with loadable-extension provides lazy function pointers populated by DuckDB at load time.


Release profile check

validate_release_profile checks that your release profile is configured correctly:

#![allow(unused)]
fn main() {
use quack_rs::validate::validate_release_profile;

// Pass the four release profile settings from your Cargo.toml
// (assumed order: panic, lto, opt-level, codegen-units)
validate_release_profile("abort", "true", "3", "1")?;   // Ok
validate_release_profile("unwind", "true", "3", "1")?;  // Err — panics across FFI are UB
}

CI workflow

The scaffold generates .github/workflows/extension-ci.yml which:

  1. Runs on push and pull request
  2. Checks, lints, and tests in Rust (all platforms)
  3. Calls extension-ci-tools to build the .duckdb_extension artifact
  4. Runs SQLLogicTest integration tests

After scaffolding:

cd my_extension
git init
git submodule add https://github.com/duckdb/extension-ci-tools.git extension-ci-tools
git submodule update --init --recursive
make configure
make release

Pitfall P4 — The extension-ci-tools submodule must be initialized. make configure fails if the submodule is missing.


Submitting to the community registry

  1. Create a pull request against the community-extensions repository
  2. Add your description.yml under extensions/my_extension/description.yml
  3. CI runs automatically to verify the build
  4. Once approved, users can install your extension:
INSTALL my_extension FROM community;
LOAD my_extension;

Binary compatibility

Extension binaries are tied to a specific DuckDB version. When DuckDB releases a new version:

  • New binaries must be built against that version
  • Old binaries will be refused by the new DuckDB runtime
  • The community build pipeline re-builds all extensions for each DuckDB release

Pin libduckdb-sys with = (exact version) to ensure you always build against the exact version you intend. The quack_rs::DUCKDB_API_VERSION constant ("v1.2.0") is passed to init_extension and must match the C API version of your pinned libduckdb-sys.
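An exact pin looks like this (version illustrative — substitute the DuckDB release your extension targets):

```toml
[dependencies]
# Exact pin: every build links against the same libduckdb-sys release.
libduckdb-sys = { version = "=1.4.4", features = ["loadable-extension"] }
```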

Pitfall P2 — The -dv flag to append_extension_metadata.py must be the C API version (v1.2.0), not the DuckDB release version (v1.4.4). Use quack_rs::DUCKDB_API_VERSION to avoid hardcoding this.


Security considerations

Community extensions are not vetted for security by the DuckDB team:

  • Never panic across FFI boundaries (panic = "abort" enforces this)
  • Validate user inputs at system boundaries (extension entry point is the boundary)
  • Do not include secrets, API keys, or credentials in your binary
  • Dynamic SQL in SQL macros must not construct queries from unsanitized user data

Pitfall Catalog

All known DuckDB Rust FFI pitfalls, discovered while building duckdb-behavioral, a production DuckDB community extension. Any developer who builds a Rust DuckDB extension directly against the C API is likely to hit most of these. quack-rs makes most of them impossible.


L1: COMBINE must propagate ALL config fields

Status: Testable with AggregateTestHarness.

Symptom: Aggregate function returns wrong results. No error, no crash.

Root cause: DuckDB's segment tree creates fresh zero-initialized target states via state_init, then calls combine to merge source states into them. If your combine only propagates data fields (count, sum) but omits configuration fields (window_size, mode), the configuration will be zero at finalize time, silently corrupting results.

This bug passed 435 unit tests before being caught by E2E tests.

Fix:

#![allow(unused)]
fn main() {
unsafe extern "C" fn combine(
    _info: duckdb_function_info,
    source: *mut duckdb_aggregate_state,
    target: *mut duckdb_aggregate_state,
    count: idx_t,
) {
    for i in 0..count as usize {
        let src_ptr = unsafe { *source.add(i) };
        let tgt_ptr = unsafe { *target.add(i) };
        if let (Some(src), Some(tgt)) = (
            FfiState::<MyState>::with_state(src_ptr),
            FfiState::<MyState>::with_state_mut(tgt_ptr),
        ) {
            tgt.window_size = src.window_size;  // config — MUST copy
            tgt.mode = src.mode;                // config — MUST copy
            tgt.count += src.count;             // data — accumulate
        }
    }
}
}

Test this with AggregateTestHarness::combine — see Testing Guide.


L2: State destroy double-free

Status: Made impossible by FfiState<T>.

Symptom: Crash or memory corruption on extension unload.

Root cause: If state_destroy frees the inner Box but does not null the pointer, a second state_destroy call (common in error paths) frees already-freed memory → undefined behavior.

Fix: FfiState<T>::destroy_callback nulls inner after freeing. Use it instead of writing your own destructor:

#![allow(unused)]
fn main() {
unsafe extern "C" fn state_destroy(states: *mut duckdb_aggregate_state, count: idx_t) {
    unsafe { FfiState::<MyState>::destroy_callback(states, count) };
}
}

L3: No panic across FFI boundaries

Status: Made impossible by init_extension and panic = "abort".

Symptom: Extension causes DuckDB to crash or behave unpredictably.

Root cause: Calling panic!() or .unwrap() inside an unsafe extern "C" function is undefined behavior: Rust panics cannot unwind across FFI boundaries.

Fix: Use Result and ? inside init_extension. Never use unwrap() in FFI callbacks. FfiState::with_state_mut returns Option, not Result, so callers use if let:

#![allow(unused)]
fn main() {
// Safe pattern — no unwrap in FFI callback
if let Some(st) = unsafe { FfiState::<MyState>::with_state_mut(state_ptr) } {
    st.count += 1;
}

// Dangerous — never do this in an FFI callback
let st = unsafe { FfiState::<MyState>::with_state_mut(state_ptr) }.unwrap(); // UB if None
}

The scaffold-generated Cargo.toml sets panic = "abort" in the release profile, which terminates the process instead of unwinding — still bad, but not undefined behavior.


L4: ensure_validity_writable is required before NULL output

Status: Made impossible by VectorWriter::set_null.

Symptom: SEGFAULT when writing NULL values to the output vector.

Root cause: duckdb_vector_get_validity returns an uninitialized pointer if duckdb_vector_ensure_validity_writable has not been called first. Writing to an uninitialized address → SEGFAULT.

Fix: Always call duckdb_vector_ensure_validity_writable before accessing the validity bitmap on the write path. VectorWriter::set_null does this automatically:

#![allow(unused)]
fn main() {
// Correct — handled by set_null
unsafe { writer.set_null(row) };

// Wrong — validity bitmap may not be allocated yet
// let validity = duckdb_vector_get_validity(output);
// set_bit(validity, row, false);  // SEGFAULT
}

L5: Boolean reading must use u8 != 0, not *const bool

Status: Made impossible by VectorReader::read_bool.

Symptom: Undefined behavior (garbage truth values, miscompiled branches); Rust requires bool to be exactly 0 or 1.

Root cause: DuckDB's C API does not guarantee that boolean values in vectors are exactly 0 or 1. Casting a byte such as 2 or 255 to a Rust bool is undefined behavior.

Fix: Read as u8 and compare with != 0. VectorReader::read_bool always does this:

#![allow(unused)]
fn main() {
let b: bool = unsafe { reader.read_bool(row) };  // safe: uses u8 != 0 internally
}
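The safe decoding fits in one line; a self-contained illustration of why it matters:

```rust
// Any non-zero byte means true; reading 2 or 255 through *const bool would be UB.
fn decode_duckdb_bool(raw: u8) -> bool {
    raw != 0
}

fn main() {
    assert!(decode_duckdb_bool(1));
    assert!(decode_duckdb_bool(255)); // well-defined here, UB via *const bool
    assert!(!decode_duckdb_bool(0));
}
```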

L6: Function set name must be set on EACH member

Status: Made impossible by AggregateFunctionSetBuilder.

Symptom: Functions are silently not registered. No error returned.

Root cause: When using duckdb_register_aggregate_function_set, the function name must be set on EACH individual duckdb_aggregate_function using duckdb_aggregate_function_set_name, not just on the set.

This is completely undocumented. Discovered by reading DuckDB's C++ test code at test/api/capi/test_capi_aggregate_functions.cpp.

In duckdb-behavioral, 6 of 7 functions failed to register silently due to this bug.

Fix: AggregateFunctionSetBuilder calls duckdb_aggregate_function_set_name on every individual function before adding it to the set. Use it instead of managing the set manually.


L7: LogicalType memory leak

Status: Made impossible by LogicalType RAII wrapper.

Symptom: Memory leak proportional to number of registered functions.

Root cause: duckdb_create_logical_type allocates memory that must be freed with duckdb_destroy_logical_type. Forgetting leaks memory.

Fix: LogicalType implements Drop and calls duckdb_destroy_logical_type automatically when it goes out of scope.


P1: Library name must match extension name

Status: Must be configured in Cargo.toml. Scaffold handles this.

Symptom: Community build fails with FileNotFoundError.

Root cause: The community build expects lib{extension_name}.so. If the Cargo crate name produces a different .so filename, the build fails.

Fix: Set name explicitly in [lib]:

[lib]
name = "my_extension"   # Must match description.yml `name: my_extension`
crate-type = ["cdylib", "rlib"]

P2: Metadata version is C API version, not DuckDB version

Status: DUCKDB_API_VERSION constant encodes the correct value.

Symptom: Metadata script fails or produces incorrect metadata.

Root cause: The -dv flag to append_extension_metadata.py must be the C API version (v1.2.0), not the DuckDB release version (v1.4.4). These are different strings.

Fix: Use quack_rs::DUCKDB_API_VERSION ("v1.2.0") in init_extension, and use the same version with append_extension_metadata.py -dv v1.2.0.


P3: E2E testing is mandatory

Status: Documented. See Testing Guide.

Symptom: All unit tests pass but the extension is completely broken.

Root cause: Unit tests cannot detect SEGFAULTs on load, silent registration failures, or wrong results from combine bugs.

Fix: Always run E2E tests using an actual DuckDB binary. The scaffold generates a complete SQLLogicTest skeleton.


P4: extension-ci-tools submodule must be initialized

Status: Build-time check.

Symptom: make configure or make release fails.

Fix:

git submodule update --init --recursive

P5: SQLLogicTest expected values must match exactly

Status: Test-authoring care required.

Symptom: Tests fail in CI but pass locally (or vice versa).

Root cause: SQLLogicTest does exact string matching. Output format (decimal places, NULL representation, column separators) must match character-for-character.

Fix: Generate expected values by running the SQL in DuckDB CLI and copying the output. NULL is NULL (uppercase). Integers have no decimal places.
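A minimal test file in DuckDB's SQLLogicTest dialect might look like this (my_func and the extension name are placeholders):

```
# name: test/sql/my_extension.test
# group: [my_extension]

require my_extension

query I
SELECT my_func(NULL);
----
NULL
```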


P6: duckdb_register_aggregate_function_set silently fails

Status: Builder returns Err. Also see L6.

Symptom: Function appears registered but is not found in SQL.

Root cause: The return value of duckdb_register_aggregate_function_set is often ignored. When it returns DuckDBError, the function set is not registered.

Fix: The builder checks the return value and propagates it as Err.


P7: duckdb_string_t format is undocumented

Status: Handled by VectorReader::read_str and DuckStringView.

Symptom: VARCHAR reading produces garbage, empty strings, or crashes.

Root cause: DuckDB stores strings in a 16-byte struct with two formats (inline ≤ 12 bytes, pointer > 12 bytes) that are not documented in libduckdb-sys.

Fix: Use VectorReader::read_str(row). See NULL Handling & Strings.
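The layout is simple once known. Below is a sketch of the inline case only; the struct is a stand-in for illustration, not the libduckdb-sys type, and for strings longer than 12 bytes the second 8 bytes hold a heap pointer instead:

```rust
// Stand-in for DuckDB's 16-byte string_t: a 4-byte length, then either
// up to 12 inline bytes (length <= 12) or a prefix plus pointer (length > 12).
#[repr(C)]
#[derive(Clone, Copy)]
struct DuckStringInline {
    length: u32,
    inlined: [u8; 12],
}

fn make_inline(s: &str) -> DuckStringInline {
    assert!(s.len() <= 12, "inline form holds at most 12 bytes");
    let mut inlined = [0u8; 12];
    inlined[..s.len()].copy_from_slice(s.as_bytes());
    DuckStringInline { length: s.len() as u32, inlined }
}

// Decode only the inline form; a real reader must follow the heap pointer
// when length > 12, which is what VectorReader::read_str does for you.
fn read_inline(s: &DuckStringInline) -> Option<&str> {
    if s.length as usize <= 12 {
        std::str::from_utf8(&s.inlined[..s.length as usize]).ok()
    } else {
        None
    }
}

fn main() {
    assert_eq!(read_inline(&make_inline("quack")), Some("quack"));
}
```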


P8: INTERVAL struct layout is undocumented

Status: Handled by DuckInterval and read_interval_at.

Symptom: Interval calculations produce wrong results or crashes.

Root cause: DuckDB's INTERVAL is { months: i32, days: i32, micros: i64 } (16 bytes total). This is not documented in libduckdb-sys. Month conversion uses 1 month = 30 days (DuckDB's approximation).

Fix: Use VectorReader::read_interval(row) and DuckInterval. See INTERVAL Type.
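A self-contained sketch of the layout, together with the saturating conversion exercised by the proptest example in the Testing chapter (details may differ from quack-rs's actual helper):

```rust
// The documented 16-byte INTERVAL layout.
#[repr(C)]
#[derive(Clone, Copy)]
struct DuckInterval {
    months: i32,
    days: i32,
    micros: i64,
}

const MICROS_PER_DAY: i64 = 86_400_000_000;

// Conversion using DuckDB's 1 month = 30 days approximation; the
// saturating_* chain keeps extreme inputs from overflowing.
fn interval_to_micros_saturating(iv: DuckInterval) -> i64 {
    let total_days = (iv.months as i64)
        .saturating_mul(30)
        .saturating_add(iv.days as i64);
    total_days
        .saturating_mul(MICROS_PER_DAY)
        .saturating_add(iv.micros)
}

fn main() {
    let iv = DuckInterval { months: 1, days: 1, micros: 5 };
    assert_eq!(interval_to_micros_saturating(iv), 31 * MICROS_PER_DAY + 5);
}
```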


P9: loadable-extension dispatch table uninitialised in cargo test

Status: Fixed. InMemoryDb::open() initialises the dispatch table automatically.

Symptom: All three InMemoryDb unit tests panic at runtime:

thread 'testing::in_memory_db::tests::in_memory_db_opens' panicked at
'DuckDB API not initialized or DuckDB feature omitted'

This failure appears only when running cargo test --features bundled-test. Regular cargo test (no feature) does not exercise this code path, so CI can miss it entirely.

Root cause: Cargo's feature-unification merges loadable-extension (from the main libduckdb-sys dependency) and bundled-full (pulled in by the duckdb crate's features = ["bundled"]) into a single libduckdb-sys build with both features active. In loadable-extension mode every DuckDB C API call is routed through an AtomicPtr<fn> dispatch table, which is normally populated at extension-load time when DuckDB calls duckdb_rs_extension_api_init. In cargo test, no DuckDB host process loads the extension, so the table stays uninitialised and every call panics.

Discovery: This was triggered by the crates.io release workflow (which runs --all-features) failing on macOS. Regular CI (--no-default-features, --all-targets) never compiled the bundled-test path, so the bug was hidden during development and code review.

Fix (implemented in quack-rs 0.6.0):

  1. src/testing/bundled_api_init.cpp — a thin C++ shim that wraps DuckDB's internal CreateAPIv1() (from duckdb/main/capi/extension_api.hpp) as a C-linkage symbol:

    #include "duckdb/main/capi/extension_api.hpp"
    extern "C" duckdb_ext_api_v1 quack_rs_create_api_v1() {
        return CreateAPIv1();
    }
    
  2. build.rs — compiles the shim (via the cc crate) only when the bundled-test feature is active, locating the DuckDB headers from the libduckdb-sys build output directory.

  3. InMemoryDb::open() — calls init_dispatch_table_once() before opening the connection. That function calls quack_rs_create_api_v1() once and feeds the result through duckdb_rs_extension_api_init, populating all 459 AtomicPtr slots in the dispatch table. A std::sync::Once guard makes it safe to call from any number of threads and test cases.

  4. CI test-bundled job — runs cargo test --all-targets --features bundled-test on Linux, macOS, and Windows on every PR, so this class of failure is caught before release.

ABI compatibility note: DuckDB's duckdb_ext_api_v1 struct is defined identically in both the public duckdb_extension.h (used by libduckdb-sys bindgen) and the internal extension_api.hpp (used by CreateAPIv1()). Both include the DUCKDB_EXTENSION_API_VERSION_UNSTABLE fields. CreateAPIv1() sets all 459 fields. The Rust and C++ structs are produced from the same DuckDB release and therefore stay in sync.

Risk table (using DuckDB's internal C++ API):

| Risk | Mitigation |
|---|---|
| extension_api.hpp is renamed or moved | build.rs fails with a clear compile error |
| CreateAPIv1() is renamed | Same — C++ compile error |
| duckdb_ext_api_v1 gains new fields | CreateAPIv1() fills the new fields too |
| duckdb_ext_api_v1 field order changes | Both structs come from the same DuckDB release, so they stay in sync |
| libduckdb-sys drops loadable-extension dispatch | Problem disappears; the Once guard becomes a cheap no-op |

Summary

| Pitfall | SDK status | Your action |
|---|---|---|
| L1: combine config fields | Testable | Test with AggregateTestHarness::combine |
| L2: state double-free | Prevented | Use FfiState::destroy_callback |
| L3: panic across FFI | Prevented | Use init_extension, no unwrap in callbacks |
| L4: validity bitmap SEGFAULT | Prevented | Use VectorWriter::set_null |
| L5: bool UB | Prevented | Use VectorReader::read_bool |
| L6: function set name | Prevented | Use AggregateFunctionSetBuilder |
| L7: LogicalType leak | Prevented | Use LogicalType (RAII) |
| P1: lib name mismatch | Scaffold | Set [lib] name in Cargo.toml |
| P2: API version string | Constant | Use DUCKDB_API_VERSION |
| P3: unit tests insufficient | Documented | Write SQLLogicTest E2E tests |
| P4: submodule not initialized | Build-time | git submodule update --init |
| P5: SQLLogicTest exact match | Documented | Copy output from DuckDB CLI |
| P6: register set silent fail | Prevented | Builder returns Err |
| P7: VARCHAR format undocumented | Prevented | Use VectorReader::read_str |
| P8: INTERVAL layout undocumented | Prevented | Use DuckInterval |
| P9: dispatch table uninitialised | Fixed | InMemoryDb::open() initialises it via C++ shim |

TypeId Reference

quack_rs::types::TypeId is an ergonomic enum of all DuckDB column types supported by the builder APIs. It wraps the DUCKDB_TYPE_* integer constants from libduckdb-sys and provides safe, named variants.


Full variant table

| Variant | SQL name | libduckdb-sys constant | Notes |
|---|---|---|---|
| TypeId::Boolean | BOOLEAN | DUCKDB_TYPE_BOOLEAN | true/false stored as u8 |
| TypeId::TinyInt | TINYINT | DUCKDB_TYPE_TINYINT | 8-bit signed |
| TypeId::SmallInt | SMALLINT | DUCKDB_TYPE_SMALLINT | 16-bit signed |
| TypeId::Integer | INTEGER | DUCKDB_TYPE_INTEGER | 32-bit signed |
| TypeId::BigInt | BIGINT | DUCKDB_TYPE_BIGINT | 64-bit signed |
| TypeId::UTinyInt | UTINYINT | DUCKDB_TYPE_UTINYINT | 8-bit unsigned |
| TypeId::USmallInt | USMALLINT | DUCKDB_TYPE_USMALLINT | 16-bit unsigned |
| TypeId::UInteger | UINTEGER | DUCKDB_TYPE_UINTEGER | 32-bit unsigned |
| TypeId::UBigInt | UBIGINT | DUCKDB_TYPE_UBIGINT | 64-bit unsigned |
| TypeId::HugeInt | HUGEINT | DUCKDB_TYPE_HUGEINT | 128-bit signed |
| TypeId::Float | FLOAT | DUCKDB_TYPE_FLOAT | 32-bit IEEE 754 |
| TypeId::Double | DOUBLE | DUCKDB_TYPE_DOUBLE | 64-bit IEEE 754 |
| TypeId::Timestamp | TIMESTAMP | DUCKDB_TYPE_TIMESTAMP | µs since Unix epoch |
| TypeId::TimestampTz | TIMESTAMPTZ | DUCKDB_TYPE_TIMESTAMP_TZ | timezone-aware timestamp |
| TypeId::Date | DATE | DUCKDB_TYPE_DATE | days since epoch |
| TypeId::Time | TIME | DUCKDB_TYPE_TIME | µs since midnight |
| TypeId::Interval | INTERVAL | DUCKDB_TYPE_INTERVAL | months + days + µs |
| TypeId::Varchar | VARCHAR | DUCKDB_TYPE_VARCHAR | UTF-8 string |
| TypeId::Blob | BLOB | DUCKDB_TYPE_BLOB | binary data |
| TypeId::Decimal | DECIMAL | DUCKDB_TYPE_DECIMAL | fixed-point decimal |
| TypeId::TimestampS | TIMESTAMP_S | DUCKDB_TYPE_TIMESTAMP_S | seconds since epoch |
| TypeId::TimestampMs | TIMESTAMP_MS | DUCKDB_TYPE_TIMESTAMP_MS | milliseconds since epoch |
| TypeId::TimestampNs | TIMESTAMP_NS | DUCKDB_TYPE_TIMESTAMP_NS | nanoseconds since epoch |
| TypeId::Enum | ENUM | DUCKDB_TYPE_ENUM | enumeration type |
| TypeId::List | LIST | DUCKDB_TYPE_LIST | variable-length list |
| TypeId::Struct | STRUCT | DUCKDB_TYPE_STRUCT | named fields (row type) |
| TypeId::Map | MAP | DUCKDB_TYPE_MAP | key-value pairs |
| TypeId::Uuid | UUID | DUCKDB_TYPE_UUID | 128-bit UUID |
| TypeId::Union | UNION | DUCKDB_TYPE_UNION | tagged union of types |
| TypeId::Bit | BIT | DUCKDB_TYPE_BIT | bitstring |
| TypeId::TimeTz | TIMETZ | DUCKDB_TYPE_TIME_TZ | timezone-aware time |
| TypeId::UHugeInt | UHUGEINT | DUCKDB_TYPE_UHUGEINT | 128-bit unsigned |
| TypeId::Array | ARRAY | DUCKDB_TYPE_ARRAY | fixed-length array |
| TypeId::TimeNs | TIME_NS | DUCKDB_TYPE_TIME_NS | nanosecond-precision time (duckdb-1-5) |
| TypeId::Any | ANY | DUCKDB_TYPE_ANY | wildcard for function signatures (duckdb-1-5) |
| TypeId::Varint | VARINT | DUCKDB_TYPE_BIGNUM | variable-length integer (duckdb-1-5) |
| TypeId::SqlNull | SQLNULL | DUCKDB_TYPE_SQLNULL | explicit SQL NULL type (duckdb-1-5) |
| TypeId::IntegerLiteral | INTEGER_LITERAL | DUCKDB_TYPE_INTEGER_LITERAL | unresolved integer literal (duckdb-1-5) |
| TypeId::StringLiteral | STRING_LITERAL | DUCKDB_TYPE_STRING_LITERAL | unresolved string literal (duckdb-1-5) |

Methods

to_duckdb_type() → DUCKDB_TYPE

Converts to the raw C API integer constant. Used internally by the builder APIs.

#![allow(unused)]
fn main() {
use quack_rs::types::TypeId;

let raw: libduckdb_sys::DUCKDB_TYPE = TypeId::BigInt.to_duckdb_type();
}

from_duckdb_type(raw) → TypeId

Converts a raw DUCKDB_TYPE constant back into a TypeId. Panics if the value does not match any known DUCKDB_TYPE constant.

#![allow(unused)]
fn main() {
use quack_rs::types::TypeId;

let type_id = TypeId::from_duckdb_type(libduckdb_sys::DUCKDB_TYPE_DUCKDB_TYPE_BIGINT);
assert_eq!(type_id, TypeId::BigInt);
}

sql_name() → &'static str

Returns the SQL type name as a static string.

#![allow(unused)]
fn main() {
assert_eq!(TypeId::BigInt.sql_name(), "BIGINT");
assert_eq!(TypeId::Varchar.sql_name(), "VARCHAR");
assert_eq!(TypeId::TimestampTz.sql_name(), "TIMESTAMPTZ");
}

Display

TypeId implements Display, which outputs the SQL name:

#![allow(unused)]
fn main() {
println!("{}", TypeId::Interval);  // prints: INTERVAL
let s = format!("{}", TypeId::UBigInt); // "UBIGINT"
}

VectorReader/VectorWriter mapping

The read and write methods on VectorReader/VectorWriter map to TypeId variants as follows:

| TypeId | Read method | Write method | Rust type |
|---|---|---|---|
| Boolean | read_bool | write_bool | bool |
| TinyInt | read_i8 | write_i8 | i8 |
| SmallInt | read_i16 | write_i16 | i16 |
| Integer | read_i32 | write_i32 | i32 |
| BigInt | read_i64 | write_i64 | i64 |
| UTinyInt | read_u8 | write_u8 | u8 |
| USmallInt | read_u16 | write_u16 | u16 |
| UInteger | read_u32 | write_u32 | u32 |
| UBigInt | read_u64 | write_u64 | u64 |
| Float | read_f32 | write_f32 | f32 |
| Double | read_f64 | write_f64 | f64 |
| Varchar | read_str | write_varchar | &str |
| Interval | read_interval | write_interval | DuckInterval |
HugeInt, Blob, List, Struct, Map, Uuid, Date, Time, Timestamp, TimestampTz, Decimal, TimestampS, TimestampMs, TimestampNs, Enum, Union, Bit, TimeTz, UHugeInt, Array, TimeNs, Any, Varint, SqlNull, IntegerLiteral, StringLiteral do not yet have dedicated read/write helpers. Access these via the raw data pointer from duckdb_vector_get_data.


Properties

TypeId implements Debug, Clone, Copy, PartialEq, Eq, and Hash, making it usable as map keys, set elements, and in match expressions:

#![allow(unused)]
fn main() {
use std::collections::HashMap;
use quack_rs::types::TypeId;

let mut type_names: HashMap<TypeId, &str> = HashMap::new();
type_names.insert(TypeId::BigInt, "count");
type_names.insert(TypeId::Varchar, "label");
}

#[non_exhaustive]

TypeId is marked #[non_exhaustive]. This means future DuckDB versions may add new variants without it being a breaking change. If you match on TypeId, include a wildcard arm:

#![allow(unused)]
fn main() {
match type_id {
    TypeId::BigInt => { /* ... */ }
    TypeId::Varchar => { /* ... */ }
    _ => { /* handle future types */ }
}
}

LogicalType

For types that require runtime parameters (such as DECIMAL(p, s) or parameterized LIST), use quack_rs::types::LogicalType:

#![allow(unused)]
fn main() {
use quack_rs::types::{LogicalType, TypeId};

let lt = LogicalType::new(TypeId::BigInt);
// or use the From impl:
let lt: LogicalType = TypeId::BigInt.into();
// LogicalType implements Drop → calls duckdb_destroy_logical_type automatically
}

LogicalType wraps duckdb_logical_type with RAII cleanup, preventing the memory leak described in Pitfall L7.

Constructors

| Constructor | Creates |
|---|---|
| new(type_id) | Simple type from a TypeId |
| from_raw(ptr) | Takes ownership of a raw handle (unsafe) |
| decimal(width, scale) | DECIMAL(width, scale) |
| list(element_type) | LIST<T> from a TypeId |
| list_from_logical(element) | LIST<T> from an existing LogicalType |
| map(key, value) | MAP<K, V> from TypeIds |
| map_from_logical(key, value) | MAP<K, V> from existing LogicalTypes |
| struct_type(fields) | STRUCT from &[(&str, TypeId)] |
| struct_type_from_logical(fields) | STRUCT from &[(&str, LogicalType)] |
| union_type(members) | UNION from &[(&str, TypeId)] |
| union_type_from_logical(members) | UNION from &[(&str, LogicalType)] |
| enum_type(members) | ENUM from &[&str] |
| array(element_type, size) | ARRAY<T>[size] from a TypeId |
| array_from_logical(element, size) | ARRAY<T>[size] from an existing LogicalType |

Introspection methods

All introspection methods are unsafe (require a valid DuckDB runtime handle):

get_type_id, get_alias, set_alias, decimal_width, decimal_scale, decimal_internal_type, enum_internal_type, enum_dictionary_size, enum_dictionary_value, list_child_type, map_key_type, map_value_type, struct_child_count, struct_child_name, struct_child_type, union_member_count, union_member_name, union_member_type, array_size, array_child_type.

See Type System for the full introspection table.

Known Limitations

Window functions are not available

DuckDB window functions (OVER (...) clauses) are implemented entirely in DuckDB's C++ layer and have no counterpart in the public C extension API.

This is not a gap in quack-rs or in libduckdb-sys — the relevant symbol (duckdb_create_window_function) simply does not exist in the C API:

| Symbol | C API (1.4.x)? | C API (1.5.0+)? | C++ API? |
|---|---|---|---|
| duckdb_create_window_function | No | No | Yes |
| duckdb_create_copy_function | No | Yes | Yes |
| duckdb_create_scalar_function | Yes | Yes | Yes |
| duckdb_create_aggregate_function | Yes | Yes | Yes |
| duckdb_create_table_function | Yes | Yes | Yes |
| duckdb_create_cast_function | Yes | Yes | Yes |

What this means for your extension:

If your extension needs window-function semantics, you can approximate them with aggregate functions in most cases (DuckDB will push down the window logic). True custom window operator registration requires writing a C++ extension.

If DuckDB exposes window registration in a future C API version, quack-rs will add wrappers in the corresponding release.

COPY functions (resolved in DuckDB 1.5.0)

DuckDB 1.5.0 added duckdb_create_copy_function and related symbols to the public C extension API. quack-rs wraps these in the copy_function module behind the duckdb-1-5 feature flag. See CopyFunctionBuilder for usage.

This was previously listed as a known limitation (no C API counterpart prior to 1.5.0).

Callback accessor wrappers (resolved)

quack-rs now wraps all major callback accessor functions — the C API functions used inside your callbacks to retrieve arguments, set errors, access bind data, etc.

| Category | Wrapper type | Available |
|---|---|---|
| Scalar function execution | ScalarFunctionInfo | Always |
| Scalar function bind | ScalarBindInfo | duckdb-1-5 |
| Scalar function init | ScalarInitInfo | duckdb-1-5 |
| Aggregate function callbacks | AggregateFunctionInfo | Always |
| Table function bind | BindInfo | Always |
| Table function init | InitInfo | Always |
| Table function scan | FunctionInfo | Always |
| Cast function callbacks | CastFunctionInfo | Always |
| Copy function bind | CopyBindInfo | duckdb-1-5 |
| Copy function global init | CopyGlobalInitInfo | duckdb-1-5 |
| Copy function sink | CopySinkInfo | duckdb-1-5 |
| Copy function finalize | CopyFinalizeInfo | duckdb-1-5 |

All callback accessor functions are now wrapped, including get_client_context on all callback types (returns a ClientContext).

Complex type creation (resolved)

LogicalType now provides constructors for all complex parameterized types:

| Method | Type created |
|---|---|
| LogicalType::decimal(width, scale) | DECIMAL(p, s) |
| LogicalType::enum_type(members) | ENUM('a', 'b', ...) |
| LogicalType::array(child, size) | type[N] |
| LogicalType::union_type(members) | UNION(a INT, b VARCHAR) |
| LogicalType::list(child) | LIST(type) |
| LogicalType::struct_type(fields) | STRUCT(...) |
| LogicalType::map(key, value) | MAP(K, V) |

All constructors have _from_logical variants for nested complex types. Introspection methods (get_type_id, list_child_type, struct_child_count, decimal_width, etc.) are also available.

VARIANT type (Iceberg v3)

DuckDB v1.5.1 introduced the VARIANT type for Iceberg v3 support. This type is not yet exposed in the DuckDB C Extension API (DUCKDB_TYPE_VARIANT does not exist in libduckdb-sys 1.10501.0). quack-rs will add TypeId::Variant when the C API exposes it.

Changelog

All notable changes to quack-rs, mirrored from CHANGELOG.md.

The format follows Keep a Changelog. quack-rs adheres to Semantic Versioning.


Unreleased

[0.8.0] — 2026-03-28

Added

  • LogicalType::from_raw(ptr) — construct from raw handle
  • Complex type constructors — decimal, array, array_from_logical, union_type, union_type_from_logical, enum_type
  • _from_logical variants — struct_type_from_logical, list_from_logical, map_from_logical for nested complex types
  • 20 introspection methods on LogicalType — get_type_id, get_alias, set_alias, decimal/enum/list/map/struct/union/array child access
  • TypeId::from_duckdb_type() — reverse conversion from raw C enum
  • extra_info on ScalarFunctionBuilder, ScalarOverloadBuilder, AggregateFunctionBuilder
  • param_logical / named_param_logical on TableFunctionBuilder
  • CastFunctionBuilder::new_logical() for complex source/target types
  • Callback info wrappers — ScalarFunctionInfo, ScalarBindInfo (duckdb-1-5), ScalarInitInfo (duckdb-1-5), AggregateFunctionInfo, CopyBindInfo (duckdb-1-5), CopyGlobalInitInfo (duckdb-1-5), CopySinkInfo (duckdb-1-5), CopyFinalizeInfo (duckdb-1-5)
  • get_client_context() on all callback info types
  • BindInfo — get_parameter, get_named_parameter, get_extra_info, get_client_context
  • InitInfo / FunctionInfo — get_extra_info
  • ArrayVector helper with get_child()
  • vector_size() and vector_get_column_type() utilities
  • Prelude — StructVector, ListVector, MapVector, ArrayVector, ScalarFunctionInfo, AggregateFunctionInfo

Changed

  • Breaking: CastFunctionBuilder::source() / target() return Option<TypeId> (was TypeId)
  • Breaking: CastRecord::source / target fields changed to Option<TypeId>

0.7.1 — 2026-03-27

Added

  • TypeId::Any — wildcard type for function overload resolution (duckdb-1-5)
  • TypeId::Varint — variable-length arbitrary-precision integer (duckdb-1-5)
  • TypeId::SqlNull — explicit SQL NULL type for bare NULL literals (duckdb-1-5)
  • TypeId::IntegerLiteral — integer literal type for overload resolution (duckdb-1-5)
  • TypeId::StringLiteral — string literal type for overload resolution (duckdb-1-5)
  • MockVectorReader/MockVectorWriter tests — 12 new tests for untested constructors and getters
  • DuckDB v1.5.1 evaluation — see docs/duckdb-v1.5.1-evaluation.md

Fixed

  • ARM64 / aarch64 build — use c_char instead of i8 for cross-platform pointer casts

Changed

  • DuckDB v1.5.1 compatibility — documentation updated to explicitly cover v1.5.1. C API version unchanged (v1.2.0). Recommend upgrading DuckDB runtime for WAL corruption and ART index fixes.

0.7.0 — 2026-03-22

Added

  • duckdb-1-5 feature modules — the duckdb-1-5 feature flag is no longer a placeholder. When enabled, it gates five new modules wrapping DuckDB 1.5.0 C Extension API additions:

    • catalog — catalog entry lookup (CatalogEntry, Catalog, CatalogEntryType)
    • client_context — client context access (ClientContext) for retrieving catalogs, config options, and connection IDs from within registered function callbacks
    • config_option — extension-defined configuration options (ConfigOptionBuilder, ConfigOptionScope) registered via SET/RESET/current_setting()
    • copy_function — custom COPY TO handlers (CopyFunctionBuilder) with bind → global init → sink → finalize lifecycle
    • table_description — table metadata queries (TableDescription) for column count, names, and logical types
  • TypeId::TimeNs — new TIME_NS column type variant for nanosecond-precision time of day (DuckDB 1.5.0+, requires duckdb-1-5 feature)

  • ScalarFunctionBuilder::varargs() / varargs_logical() — mark a scalar function as accepting variadic arguments (requires duckdb-1-5)

  • ScalarFunctionBuilder::volatile() — mark a scalar function as volatile (re-evaluated for every row even with constant arguments, requires duckdb-1-5)

  • ScalarFunctionBuilder::bind() — set a bind callback invoked once during query planning for per-query state allocation (requires duckdb-1-5)

  • ScalarFunctionBuilder::init() — set an init callback invoked once per thread for per-thread local state allocation (requires duckdb-1-5)

Changed

  • DuckDB 1.5.0 support — upgraded default libduckdb-sys from 1.4.4 to 1.10500.0 (DuckDB 1.5.0) and duckdb from 1.4.4 to 1.10500.0. The version range ">=1.4.4, <2" in Cargo.toml is unchanged, preserving backward compatibility with DuckDB 1.4.x.

  • CI action updates — Swatinem/rust-cache v2.8.2→v2.9.1, actions/download-artifact v8.0.0→v8.0.1, actions/cache 5.0.3→5.0.4, codecov/codecov-action 5.4.3→5.5.3.

Fixed

  • COPY format handlers — previously listed as a known limitation (no C API counterpart). DuckDB 1.5.0 adds duckdb_create_copy_function and related symbols; the new copy_function module wraps them behind duckdb-1-5.

0.6.0 — 2026-03-12

Added

  • InMemoryDb dispatch table initialisation — InMemoryDb::open() now correctly initialises the loadable-extension dispatch table from bundled DuckDB symbols before opening a connection. Previously, every call panicked with "DuckDB API not initialized" when the bundled-test feature was enabled in cargo test. See Pitfall P9 for the full technical analysis.

  • src/testing/bundled_api_init.cpp — thin C++ shim exposing DuckDB's internal CreateAPIv1() as a C-linkage symbol, compiled at build time via the cc crate. Populates all 459 AtomicPtr dispatch table slots with real bundled DuckDB function pointers.

  • build.rs — Cargo build script that locates the libduckdb-sys include path and compiles the C++ shim when the bundled-test feature is active.

  • CI: test-bundled job — new CI job runs cargo test --all-targets --features bundled-test on Linux, macOS, and Windows on every PR, closing the gap that allowed this failure to reach the release workflow undetected.

  • Pitfall P9 documented — full analysis in LESSONS.md and the Pitfall Catalog: root cause, CreateAPIv1() solution, ABI compatibility details, risks of the internal C++ API, and a mitigation table.

Fixed

  • InMemoryDb::open() no longer panics under cargo test --features bundled-test. This was broken from the initial 0.5.1 release.

Changed

  • bundled-test feature documentation updated to describe dispatch table initialisation accurately.

0.5.1 — 2026-03-12

Added

  • Testing primitives (quack_rs::testing) — MockVectorWriter, MockVectorReader, MockDuckValue, MockRegistrar, CastRecord.

  • bundled-test Cargo feature — enables InMemoryDb for SQL-level assertions in cargo test. (Note: InMemoryDb::open() was broken in this release and fixed in 0.6.0.)

  • InMemoryDb — wraps duckdb::Connection for SQL-level integration tests; available behind the bundled-test feature.

  • Builder introspection accessors — name() on all function builders; source()/target() on CastFunctionBuilder.

Security

  • Bump quinn-proto 0.11.13 → 0.11.14 (addresses RUSTSEC advisory).

0.5.0 — 2026-03-10

Added

  • param_logical(LogicalType) on all builders — register parameters with complex parameterized types (LIST(BIGINT), MAP(VARCHAR, INTEGER), STRUCT(...)) that TypeId alone cannot express. Available on AggregateFunctionBuilder, AggregateFunctionSetBuilder::OverloadBuilder, ScalarFunctionBuilder, and ScalarOverloadBuilder. Parameters added via param() and param_logical() are interleaved by position, so the order you call them is the order DuckDB sees them.

  • returns_logical(LogicalType) on all builders — set a complex parameterized return type. When both returns(TypeId) and returns_logical(LogicalType) are called, the logical type takes precedence. Available on AggregateFunctionBuilder, AggregateFunctionSetBuilder, ScalarFunctionBuilder, and ScalarOverloadBuilder. This eliminates the need for raw FFI when returning LIST(BOOLEAN), LIST(TIMESTAMP), MAP(K, V), or any other parameterized type.

  • null_handling(NullHandling) on set overload builders — per-overload NULL handling configuration for AggregateFunctionSetBuilder::OverloadBuilder and ScalarOverloadBuilder. Previously only available on single-function builders.
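The positional interleaving rule for param() / param_logical() can be illustrated with a simplified stand-in builder (not the real quack-rs API — Param, MiniBuilder, and the string type names exist purely to show the ordering):

```rust
// Simplified model of positional parameter interleaving. All names here
// are illustrative stand-ins, not quack-rs types.
#[derive(Debug, PartialEq)]
enum Param {
    Simple(&'static str),  // stands in for a TypeId parameter
    Logical(&'static str), // stands in for a LogicalType parameter
}

#[derive(Default)]
struct MiniBuilder {
    params: Vec<Param>, // one ordered list holds both kinds
}

impl MiniBuilder {
    fn param(mut self, t: &'static str) -> Self {
        self.params.push(Param::Simple(t));
        self
    }
    fn param_logical(mut self, t: &'static str) -> Self {
        self.params.push(Param::Logical(t));
        self
    }
}

fn main() {
    let b = MiniBuilder::default()
        .param("BIGINT")
        .param_logical("LIST(BIGINT)")
        .param("VARCHAR");
    // DuckDB sees the parameters in exactly this call order.
    assert_eq!(b.params[1], Param::Logical("LIST(BIGINT)"));
    assert_eq!(b.params.len(), 3);
}
```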

Notes

  • Upstream fix: duckdb-loadable-macros panic-at-FFI-boundary — the safe entry-point pattern developed in quack-rs (using ? / ok_or_else throughout instead of .unwrap()) was contributed upstream as duckdb/duckdb-rs#696 and merged 2026-03-09. All users of the duckdb_entrypoint_c_api! macro from duckdb-loadable-macros will receive this fix in the next duckdb-rs release. quack-rs users have always been protected via the safe entry_point! / entry_point_v2! macros provided by this crate.

0.4.0 — 2026-03-09

Added

  • Connection and Registrar trait — version-agnostic extension registration facade. Connection wraps the duckdb_connection and duckdb_database handles provided at initialization time. The Registrar trait provides uniform methods for registering all extension components (scalar, scalar set, aggregate, aggregate set, table, SQL macro, cast), making registration code interchangeable across DuckDB 1.4.x and 1.5.x.

  • init_extension_v2 — new entry point helper that passes &Connection to the registration callback instead of a raw duckdb_connection. Prefer this over init_extension for new extensions.

  • entry_point_v2! macro — companion macro to entry_point! that generates the #[no_mangle] unsafe extern "C" entry point using init_extension_v2.

  • duckdb-1-5 cargo feature — placeholder feature flag for DuckDB 1.5.0-specific C API wrappers. Currently empty; will be populated when libduckdb-sys 1.5.0 is published on crates.io.

Changed

  • DuckDB version support broadened to 1.4.x and 1.5.x — the libduckdb-sys dependency requirement was relaxed from an exact pin (=1.4.4) to a range (>=1.4.4, <2). DuckDB v1.5.0 does not change the C API version string (v1.2.0); the existing DUCKDB_API_VERSION constant remains correct for both releases. Extension authors can pin their own libduckdb-sys to either =1.4.4 or =1.5.0 and resolve cleanly against quack-rs. The scaffold template and CI workflow template were updated to default to DuckDB v1.5.0.

0.3.0 — 2026-03-08

Added

  • TableFunctionBuilder — type-safe builder for registering DuckDB table functions (SELECT * FROM my_function(args)). Covers the full bind/init/scan lifecycle with ergonomic callbacks; BindInfo, FfiBindData<T>, and FfiInitData<T> eliminate all raw pointer manipulation. Verified end-to-end against DuckDB 1.4.4. See Table Functions.

  • ReplacementScanBuilder — builder for registering DuckDB replacement scans (SELECT * FROM 'file.xyz' patterns). 4-method chain handles callback registration, path extraction, and bind-info population. See Replacement Scans.

  • StructVector, ListVector, MapVector — safe wrappers for reading and writing nested-type vectors. Eliminate manual offset arithmetic and raw pointer casts over child vector handles. Re-exported from quack_rs::vector::complex. See Complex Types.

  • CastFunctionBuilder — type-safe builder for registering custom type cast functions. Covers explicit CAST(x AS T) and implicit coercions (optional implicit_cost). CastFunctionInfo exposes cast_mode(), set_error(), and set_row_error() inside callbacks for correct TRY_CAST / CAST error handling. See Cast Functions.

  • DbConfig — RAII wrapper for duckdb_config. Builder-style .set(name, value)? chain with automatic duckdb_destroy_config on drop and flag_count() / get_flag(index) for enumerating all available options. See quack_rs::config.

  • ScalarFunctionSetBuilder — builder for registering scalar function overload sets, mirroring AggregateFunctionSetBuilder.

  • NullHandling enum and .null_handling() builder method — configurable NULL propagation for scalar and aggregate functions.

  • TypeId variants — Decimal, Struct, Map, UHugeInt, TimeTz, TimestampS, TimestampMs, TimestampNs, Array, Enum, Union, Bit.

  • From<TypeId> for LogicalType — idiomatic conversion from TypeId.

  • #[must_use] on builder structs — compile-time warning if a builder is constructed but never consumed.

  • VectorWriter::write_interval — writes INTERVAL values to output vectors.

  • append_metadata binary — native Rust replacement for the Python metadata script. Install with cargo install quack-rs --bin append_metadata.

  • hello-ext cast demo — the example extension now registers CAST(VARCHAR AS INTEGER) and TRY_CAST(VARCHAR AS INTEGER) using CastFunctionBuilder, demonstrating both error modes with five unit tests.

  • prelude additions — TableFunctionBuilder, BindInfo, FfiBindData, FfiInitData, ReplacementScanBuilder, StructVector, ListVector, MapVector, CastFunctionBuilder, CastFunctionInfo, CastMode added to quack_rs::prelude.

Not implemented (upstream C API gap)

  • Window functions and COPY format handlers are absent from DuckDB's public C extension API and cannot be wrapped. See Known Limitations.

Fixed

  • hello-ext gs_bind callback — replaced incorrect duckdb_value_int64(param) with duckdb_get_int64(param). All 11 live SQL tests now pass against DuckDB 1.4.4.

Changed

  • Bump criterion dev-dependency from 0.5 to 0.8.
  • Bump Swatinem/rust-cache GitHub Action from v2.7.5 to v2.8.2.
  • Bump dtolnay/rust-toolchain CI pin from v2.7.5 to latest SHA.
  • Bump actions/attest-build-provenance from v2 to v4.
  • Bump actions/configure-pages to latest SHA (d5606572…).
  • Bump actions/upload-pages-artifact from v3.0.1 to v4.0.0.

0.2.0 — 2026-03-07

Added

  • validate::description_yml module — parse and validate a complete description.yml metadata file end-to-end. Includes:

    • DescriptionYml struct — structured representation of all required and optional fields
    • parse_description_yml(content: &str) — parse and validate in one step
    • validate_description_yml_str(content: &str) — pass/fail validation
    • validate_rust_extension(desc: &DescriptionYml) — enforce Rust-specific fields (language: Rust, build: cargo, requires_toolchains includes rust)
    • 25+ unit tests covering all required fields, optional fields, error paths, and edge cases
  • prelude module — ergonomic glob-import for the most commonly used items. use quack_rs::prelude::*; brings in all builder types, state traits, vector helpers, types, error handling, and the API version constant. Reduces boilerplate for extension authors.

  • Scaffold: extension_config.cmake generation — the scaffold generator now produces extension_config.cmake, which is referenced by the EXT_CONFIG variable in the Makefile and required by extension-ci-tools for CI integration.

  • Scaffold: SQLLogicTest skeleton — generate_scaffold now produces test/sql/{name}.test, a ready-to-fill SQLLogicTest file with require directive, format comments, and example query/result blocks. E2E tests are required for community extension submission (Pitfall P3).

  • Scaffold: GitHub Actions CI workflow — generate_scaffold now produces .github/workflows/extension-ci.yml, a complete cross-platform CI workflow that builds and tests the extension on Linux, macOS, and Windows against a real DuckDB binary.

  • validate::validate_excluded_platforms_str — validates the excluded_platforms field from description.yml as a semicolon-delimited string (e.g., "wasm_mvp;wasm_eh;wasm_threads"). Splits on ; and validates each token. An empty string is valid (no exclusions).

  • validate::validate_excluded_platforms — re-exported at the validate module level (previously only accessible as validate::platform::validate_excluded_platforms).

  • validate::semver::classify_extension_version — returns ExtensionStability (Unstable/PreRelease/Stable) classifying the tier a version falls into.

  • validate::semver::ExtensionStability — enum for DuckDB extension version stability tiers (Unstable, PreRelease, Stable) with Display implementation.

  • scalar module — ScalarFunctionBuilder for registering scalar functions with the DuckDB C Extension API. Includes try_new with name validation, param, returns, function setters, and register. Full unit tests included.

  • entry_point! macro — generates the required #[no_mangle] extern "C" entry point with zero boilerplate from an identifier and registration closure.

  • VectorWriter::write_varchar — writes VARCHAR string values to output vectors using duckdb_vector_assign_string_element_len (handles both inline and pointer formats).

  • VectorWriter::write_bool — writes BOOLEAN values as a single byte.

  • VectorWriter::write_u16 — writes USMALLINT values.

  • VectorWriter::write_i16 — writes SMALLINT values.

  • VectorReader::read_interval — reads INTERVAL values from input vectors via the correct 16-byte layout helper.

  • CI: Windows testing — the CI matrix now includes windows-latest in the test job, covering all three major platforms (Linux, macOS, Windows).

  • CI: example-check job — CI now checks, lints, and tests examples/hello-ext as part of every PR, ensuring the example extension always compiles and its tests pass.

  • validate::validate_release_profile — checks Cargo release profile settings for loadable-extension correctness. Validates panic, lto, opt-level, and codegen-units.

Fixed

  • MSRV documentation now consistently states 1.84.1 across README.md, CONTRIBUTING.md, and Cargo.toml (previously README.md stated 1.80).

0.1.0 — 2025-05-01

Added

  • Initial release
  • entry_point module: init_extension helper for correct extension initialization
  • aggregate module: AggregateFunctionBuilder, AggregateFunctionSetBuilder
  • aggregate::state module: AggregateState trait, FfiState<T> wrapper
  • aggregate::callbacks module: type aliases for all 6 aggregate callback signatures
  • vector module: VectorReader, VectorWriter, ValidityBitmap, DuckStringView
  • types module: TypeId enum (33 variants), LogicalType RAII wrapper
  • interval module: DuckInterval, interval_to_micros, read_interval_at
  • error module: ExtensionError, ExtResult<T>
  • testing module: AggregateTestHarness<S> for pure-Rust aggregate testing
  • scaffold module: generate_scaffold for generating complete extension projects
  • sql_macro module: SqlMacro for registering SQL macros without FFI callbacks
  • Complete hello-ext example extension
  • Documentation of all 15 DuckDB Rust FFI pitfalls (LESSONS.md)
  • CI pipeline: check, test, clippy, fmt, doc, msrv, bench-compile
  • SECURITY.md vulnerability disclosure policy

FAQ

Frequently asked questions about quack-rs and building DuckDB extensions in Rust.


General

What is quack-rs?

quack-rs is a Rust SDK for building DuckDB loadable extensions using DuckDB's pure C Extension API. It provides safe, ergonomic builders for registering scalar functions, aggregate functions, table functions, cast functions, replacement scans, SQL macros, and copy functions (via the duckdb-1-5 feature), along with helpers for reading and writing DuckDB vectors, and utilities for publishing community extensions.

Why does this exist?

Building a DuckDB extension in Rust requires solving a set of undocumented FFI problems that every developer discovers independently. quack-rs encodes solutions to all 16 known pitfalls so you don't have to rediscover them. See the Pitfall Catalog.

What DuckDB version does quack-rs target?

quack-rs requires libduckdb-sys = ">=1.4.4, <2" (DuckDB 1.4.x and 1.5.x). The C API version string passed to the dispatch-table initializer is "v1.2.0", available as quack_rs::DUCKDB_API_VERSION. Both DuckDB 1.4.x and 1.5.x use the same C API version. These are two distinct version identifiers — the crate version and the C API protocol version.

What is the minimum supported Rust version (MSRV)?

Rust 1.84.1 or later. This is enforced in Cargo.toml with rust-version = "1.84.1".
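The corresponding Cargo.toml fragment looks like this (only the rust-version line is load-bearing; the package name is illustrative):

```toml
[package]
name = "my-extension"
rust-version = "1.84.1"  # toolchains older than this refuse to build the crate
```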

Is quack-rs production-ready?

Yes. It was extracted from duckdb-behavioral, a production DuckDB community extension. All 16 pitfalls it solves were discovered in production.


Functions

Can I expose SQL macros as an extension?

Yes, without any C++ wrapper code. Use quack_rs::sql_macro::SqlMacro:

#![allow(unused)]
fn main() {
use quack_rs::sql_macro::SqlMacro;

// Scalar macro
let m = SqlMacro::scalar("double_it", &["x"], "x * 2")?;
unsafe { m.register(con) }?;

// Table macro
let m = SqlMacro::table("recent_events", &["n"],
    "SELECT * FROM events ORDER BY ts DESC LIMIT n")?;
unsafe { m.register(con) }?;
}

Register them inside your init_extension closure alongside aggregate and scalar functions. See SQL Macros.

Can I register multiple overloads of the same function?

Yes, using AggregateFunctionSetBuilder (for aggregates) or ScalarFunctionSetBuilder (for scalars). Both support complex parameter types via param_logical(LogicalType) and complex return types via returns_logical(LogicalType). See Overloading with Function Sets.

Can I register multiple functions in one extension?

Yes. The init_extension closure receives a duckdb_connection and can call as many register_* functions as needed:

#![allow(unused)]
fn main() {
quack_rs::entry_point::init_extension(info, access, DUCKDB_API_VERSION, |con| {
    unsafe { register_word_count(con) }?;
    unsafe { register_sentence_count(con) }?;
    unsafe {
        SqlMacro::scalar("double_it", &["x"], "x * 2")?
            .register(con)?;
    }
    Ok(())
})
}

Can I use the duckdb crate instead of libduckdb-sys?

No. The duckdb crate's bundled feature embeds its own copy of DuckDB. A loadable extension must link against the DuckDB that loads it, not bundle a separate copy. Use libduckdb-sys with the loadable-extension feature.
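A dependency declaration for that setup might look like the following (the version range mirrors the one quack-rs itself documents; pin tighter if you prefer):

```toml
[dependencies]
libduckdb-sys = { version = ">=1.4.4, <2", features = ["loadable-extension"] }
```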

Can I have a scalar function with no parameters?

Yes. Just never call param — configure only the return type and the callback:

#![allow(unused)]
fn main() {
ScalarFunctionBuilder::new("current_quack")
    .returns(TypeId::Varchar)
    .function(quack_callback)
    .register(con)?;
}

Testing

Do I need a DuckDB instance to run unit tests?

No. AggregateTestHarness simulates the aggregate lifecycle in pure Rust without any DuckDB dependency. You can run cargo test without loading a DuckDB binary.

My unit tests all pass but the extension crashes. Why?

Unit tests cannot detect FFI wiring bugs. See Pitfall P3 and the Testing Guide. Always run E2E tests by loading the extension into an actual DuckDB process.

How do I test SQL macros?

SqlMacro::to_sql() is pure Rust and requires no DuckDB connection:

#![allow(unused)]
fn main() {
let m = SqlMacro::scalar("triple", &["x"], "x * 3").unwrap();
assert_eq!(m.to_sql(), "CREATE OR REPLACE MACRO triple(x) AS (x * 3)");
}

For E2E testing, include the macro in your SQLLogicTest file:

query I
SELECT double_it(21);
----
42

Publishing

How do I publish to the DuckDB community extensions registry?

  1. Scaffold your project with generate_scaffold
  2. Push to GitHub
  3. Submit a pull request to the community-extensions repo with your description.yml

See Community Extensions for the full workflow.

My extension name is taken. What should I do?

Use a vendor-prefixed name: myorg_analytics instead of analytics. Extension names must be globally unique across the entire DuckDB ecosystem. Check community-extensions.duckdb.org first.

Do I need to set up CI manually?

No. generate_scaffold produces .github/workflows/extension-ci.yml which builds and tests your extension on Linux, macOS, and Windows automatically.

Can my extension be installed with INSTALL ... FROM community?

Yes, once your pull request is merged into the community-extensions repository. Until then, users load the .duckdb_extension binary directly:

LOAD './path/to/libmy_extension.duckdb_extension';

Troubleshooting

My aggregate returns wrong results with no error.

The most common cause is Pitfall L1: your combine callback is not propagating all configuration fields. See Pitfall L1 and test with AggregateTestHarness::combine.

I'm getting a SEGFAULT when writing NULL.

You are likely calling duckdb_vector_get_validity without first calling duckdb_vector_ensure_validity_writable. Use VectorWriter::set_null instead. See Pitfall L4.

My function is not found in SQL after LOAD.

Most likely cause: the function was not registered (Pitfall L6 — function set name not set on each member), or the entry point symbol name does not match the extension name. The symbol must be {extension_name}_init_c_api (all lowercase, underscores).
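As a sanity check, you can derive the expected symbol from your extension name — the tr pipeline below merely models the lowercase/underscore rule — and then look for it in the built artifact with nm:

```shell
ext_name="My-Extension"   # illustrative name
symbol="$(printf '%s' "$ext_name" | tr 'A-Z' 'a-z' | tr '-' '_')_init_c_api"
echo "$symbol"            # prints my_extension_init_c_api
# Then inspect the built binary, e.g.:
# nm -D libmy_extension.duckdb_extension | grep _init_c_api
```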

make configure fails with a missing file error.

The extension-ci-tools submodule is not initialized:

git submodule update --init --recursive

My SQLLogicTest fails in CI but passes locally.

SQLLogicTest does exact string matching. The most common issue is a difference in NULL representation, decimal places, or line endings. Run the query in the same DuckDB version used by CI and copy the output verbatim.

How do I read a VARCHAR that is longer than 12 bytes?

VectorReader::read_str handles both the inline (≤ 12 bytes) and pointer (> 12 bytes) formats automatically. No special handling needed.
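The 12-byte threshold comes from DuckDB's 16-byte string struct, which keeps short payloads inline and long ones behind a heap pointer. A simplified model (field names are illustrative, not the real C layout):

```rust
// Model of the two string representations behind read_str. The layout
// shown is illustrative; only the 12-byte inline threshold is the
// documented behavior.
const INLINE_MAX: usize = 12;

#[allow(dead_code)]
enum DuckStr {
    Inline { len: u32, bytes: [u8; 12] },              // payload lives in the struct
    Pointer { len: u32, prefix: [u8; 4], ptr: usize }, // payload lives on the heap
}

fn representation(s: &str) -> &'static str {
    if s.len() <= INLINE_MAX { "inline" } else { "pointer" }
}

fn main() {
    assert_eq!(representation("hello"), "inline");          // 5 bytes
    assert_eq!(representation("exactly12byt"), "inline");   // 12 bytes: boundary case
    assert_eq!(representation("thirteen-byte"), "pointer"); // 13 bytes
}
```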

What happens if I read from a NULL row?

You get garbage data from the vector's data buffer. Always check is_valid before reading. See NULL Handling & Strings.


Architecture

Why use libduckdb-sys with loadable-extension instead of the duckdb crate?

The duckdb crate is designed for embedding DuckDB, not for extending it. Its bundled feature includes a statically linked DuckDB binary, which conflicts with the DuckDB runtime that loads your extension. libduckdb-sys with loadable-extension provides lazy-initialized function pointers that are populated by DuckDB at extension load time.

Why not use duckdb-loadable-macros?

duckdb-loadable-macros relies on extract_raw_connection which uses the internal Rc<RefCell<InnerConnection>> layout. This is fragile and causes SEGFAULTs when the layout changes between duckdb crate versions. init_extension uses the correct C API entry sequence directly.

Why is panic = "abort" required?

Panics cannot unwind across FFI boundaries in Rust. A panic in an unsafe extern "C" callback is undefined behavior. panic = "abort" converts panics to process termination, which is still bad but not undefined behavior. Always use Result and ? in your callbacks instead.
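A sketch of that pattern — all fallible work goes through Result, and the failure is surfaced at the boundary instead of unwinding. The set_error closure stands in for DuckDB's error-reporting call; it is not a quack-rs API:

```rust
// Fallible logic stays in a plain Result-returning helper.
fn parse_arg(raw: &str) -> Result<i64, String> {
    raw.trim()
        .parse::<i64>()
        .map_err(|e| format!("bad argument: {e}"))
}

// Shape of an `extern "C"` callback body: no unwrap, no panic — errors
// are reported through the provided channel and a failure flag returned.
fn callback_body(raw: &str, set_error: &mut dyn FnMut(&str)) -> bool {
    match parse_arg(raw) {
        Ok(_value) => true, // a real callback would write `_value` to the output vector
        Err(msg) => {
            set_error(&msg); // report instead of unwinding across FFI
            false
        }
    }
}

fn main() {
    let mut last = String::new();
    assert!(callback_body(" 42 ", &mut |m| last = m.to_string()));
    assert!(!callback_body("oops", &mut |m| last = m.to_string()));
    assert!(last.starts_with("bad argument"));
}
```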

Can I use async Rust in my extension?

Not directly in FFI callbacks. DuckDB's callbacks are synchronous C functions. You can run a Tokio or async-std runtime and block on async tasks inside callbacks (using Runtime::block_on), but the callbacks themselves must return synchronously.

How does FfiState<T> prevent double-free?

FfiState<T> stores the Box<T> as a raw pointer in inner. When destroy_callback is called, it reconstitutes the Box (which drops T and frees memory) and then sets inner to null. A second call to destroy_callback on the same state sees a null inner and returns without freeing.
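A minimal model of that guard (not the real FfiState<T>; the String payload is purely illustrative):

```rust
use std::ptr;

// Model of the null-after-free guard: reconstitute the Box once, null
// the pointer, and make every later destroy call a no-op.
struct FfiStateModel {
    inner: *mut String, // stands in for `*mut T`
}

impl FfiStateModel {
    fn new(value: String) -> Self {
        Self { inner: Box::into_raw(Box::new(value)) }
    }

    /// Returns true if this call actually freed the state.
    fn destroy(&mut self) -> bool {
        if self.inner.is_null() {
            return false; // already freed: safe no-op
        }
        // SAFETY: `inner` came from Box::into_raw and has not been freed,
        // because we null it immediately after the first free.
        unsafe { drop(Box::from_raw(self.inner)) };
        self.inner = ptr::null_mut();
        true
    }
}

fn main() {
    let mut state = FfiStateModel::new("aggregate state".into());
    assert!(state.destroy());  // first call frees
    assert!(!state.destroy()); // second call sees null and returns
}
```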

Contributing

quack-rs is an open source project. Contributions of all kinds are welcome: bug reports, documentation improvements, new pitfall discoveries, and code.


Development prerequisites

| Tool       | Version         | Purpose           |
|------------|-----------------|-------------------|
| Rust       | ≥ 1.84.1 (MSRV) | Compiler          |
| rustfmt    | stable          | Formatting        |
| clippy     | stable          | Linting           |
| cargo-msrv | latest          | MSRV verification |

Install the Rust toolchain via rustup.rs.


Building

# Build the library
cargo build

# Build in release mode (enables LTO + strip)
cargo build --release

# Build the hello-ext example extension
cargo build --release --manifest-path examples/hello-ext/Cargo.toml

Quality gates

All of the following must pass before merging any pull request:

# Tests — zero failures, zero ignored
cargo test

# Integration tests
cargo test --test integration_test

# Linting — zero warnings (warnings are errors)
cargo clippy --all-targets -- -D warnings

# Formatting
cargo fmt -- --check

# Documentation — zero broken links or missing docs
RUSTDOCFLAGS="-D warnings" cargo doc --no-deps

# MSRV — must compile on Rust 1.84.1 (excludes benches; matches CI)
cargo +1.84.1 check

These same checks run in CI on every push and pull request.


Test strategy

Unit tests

Unit tests live in #[cfg(test)] modules within each source file. They test pure-Rust logic that does not require a live DuckDB instance.

Important constraint: libduckdb-sys with features = ["loadable-extension"] makes all DuckDB C API functions go through lazy AtomicPtr dispatch. These pointers are only populated when duckdb_rs_extension_api_init is called from within a real DuckDB extension load. Calling any duckdb_* function in a unit test will panic. Move such tests to integration tests or example-extension tests.
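The mechanism can be modeled in a few lines (an illustration of lazy dispatch, not libduckdb-sys internals):

```rust
use std::sync::atomic::{AtomicPtr, Ordering};

// Each C API slot starts null; calling through it before the loader
// populates it is the "DuckDB API not initialized" panic, modeled here
// as an Err instead of a panic.
fn call_slot(slot: &AtomicPtr<()>) -> Result<(), &'static str> {
    if slot.load(Ordering::Acquire).is_null() {
        return Err("DuckDB API not initialized");
    }
    Ok(()) // a real slot would be cast to a function pointer and called
}

fn main() {
    let slot = AtomicPtr::new(std::ptr::null_mut());
    assert!(call_slot(&slot).is_err()); // unit-test situation: never populated

    // Extension load stores a real function pointer first (faked here;
    // the value is never dereferenced in this model).
    slot.store(1 as *mut (), Ordering::Release);
    assert!(call_slot(&slot).is_ok());
}
```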

Integration tests

tests/integration_test.rs contains pure-Rust tests that cross module boundaries — testing interval with AggregateTestHarness, verifying FfiState lifecycle, and so on. These still cannot call duckdb_* functions.

Property-based tests

Selected modules include proptest-based tests:

  • interval.rs — overflow edge cases across the full i32/i64 range
  • testing/harness.rs — sum associativity, identity element for AggregateState

Example-extension tests

examples/hello-ext/ contains #[cfg(test)] unit tests for the pure logic (count_words). Full E2E testing (loading the .so into DuckDB) is left to consumers.


Code standards

Safety documentation

Every unsafe block must have a // SAFETY: comment explaining:

  1. Which invariant the caller guarantees
  2. Why the operation is valid given that invariant
#![allow(unused)]
fn main() {
// SAFETY: `states` is a valid array of `count` pointers, each initialized
// by `init_callback`. We are the only owner of `inner` at this point.
unsafe { drop(Box::from_raw(ffi.inner)) };
}

No panics across FFI

unwrap(), expect(), and panic!() are forbidden in any function that may be called by DuckDB (callbacks and entry points). Use Option/Result and ? throughout.

Clippy lint policy

The crate enables pedantic, nursery, and cargo lint groups. All warnings are treated as errors in CI. Lints are suppressed only where they produce false positives for SDK API patterns:

[lints.clippy]
module_name_repetitions = "allow"  # e.g., AggregateFunctionBuilder
must_use_candidate = "allow"       # builder methods
missing_errors_doc = "allow"       # unsafe extern "C" callbacks
return_self_not_must_use = "allow" # builder pattern

Documentation

Every public item must have a doc comment. Follow these conventions:

  • First line: short summary (noun phrase, no trailing period)
  • # Safety: mandatory on every unsafe fn
  • # Panics: mandatory if the function can panic
  • # Errors: mandatory on functions returning Result
  • # Example: encouraged on public types and key methods
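Put together, documentation for a hypothetical public function (word_count here echoes the hello-ext example's count_words, but is not a real quack-rs item) might look like:

```rust
/// Number of whitespace-separated words in `input`
///
/// # Errors
///
/// Returns an error if `input` is empty.
pub fn word_count(input: &str) -> Result<usize, &'static str> {
    if input.is_empty() {
        return Err("empty input");
    }
    Ok(input.split_whitespace().count())
}

fn main() {
    assert_eq!(word_count("hello quack world"), Ok(3));
    assert!(word_count("").is_err());
}
```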

Repository structure

quack-rs/
├── src/
│   ├── lib.rs                     # Crate root; module declarations; DUCKDB_API_VERSION
│   ├── entry_point.rs             # init_extension() / init_extension_v2() + entry_point! / entry_point_v2!
│   ├── connection.rs              # Connection facade + Registrar trait (version-agnostic registration)
│   ├── config.rs                  # DbConfig — RAII wrapper for duckdb_config
│   ├── error.rs                   # ExtensionError, ExtResult<T>
│   ├── interval.rs                # DuckInterval, interval_to_micros
│   ├── sql_macro.rs               # SqlMacro — CREATE MACRO without FFI callbacks
│   ├── aggregate/
│   │   ├── mod.rs
│   │   ├── builder/               # Builder types for aggregate function registration
│   │   │   ├── mod.rs             # Module doc + re-exports
│   │   │   ├── single.rs          # AggregateFunctionBuilder (single-signature)
│   │   │   ├── set.rs             # AggregateFunctionSetBuilder, OverloadBuilder
│   │   │   └── tests.rs           # Unit tests
│   │   ├── info.rs                # AggregateFunctionInfo
│   │   ├── callbacks.rs           # Callback type aliases
│   │   └── state.rs               # AggregateState trait, FfiState<T>
│   ├── scalar/
│   │   ├── mod.rs
│   │   ├── info.rs                # ScalarFunctionInfo, ScalarBindInfo, ScalarInitInfo
│   │   └── builder/               # Builder types for scalar function registration
│   │       ├── mod.rs             # Module doc + re-exports
│   │       ├── single.rs          # ScalarFn type alias, ScalarFunctionBuilder
│   │       ├── set.rs             # ScalarFunctionSetBuilder, ScalarOverloadBuilder
│   │       └── tests.rs           # Unit tests
│   ├── catalog.rs                 # Catalog access helpers (requires `duckdb-1-5`)
│   ├── cast/
│   │   ├── mod.rs                 # Re-exports
│   │   └── builder.rs             # CastFunctionBuilder, CastFunctionInfo, CastMode
│   ├── client_context.rs          # ClientContext wrapper (requires `duckdb-1-5`)
│   ├── config_option.rs           # ConfigOption registration (requires `duckdb-1-5`)
│   ├── copy_function/
│   │   ├── mod.rs                 # CopyFunctionBuilder (requires `duckdb-1-5`)
│   │   └── info.rs                # CopyBindInfo, CopySinkInfo, etc.
│   ├── replacement_scan/
│   │   └── mod.rs                 # ReplacementScanBuilder — SELECT * FROM 'file.xyz' patterns
│   ├── types/
│   │   ├── mod.rs
│   │   ├── type_id.rs             # TypeId enum (33 base + 6 with duckdb-1-5)
│   │   └── logical_type.rs        # LogicalType RAII wrapper
│   ├── vector/
│   │   ├── mod.rs
│   │   ├── reader.rs              # VectorReader
│   │   ├── writer.rs              # VectorWriter
│   │   ├── validity.rs            # ValidityBitmap
│   │   ├── string.rs              # DuckStringView, read_duck_string
│   │   └── complex.rs             # StructVector, ListVector, MapVector, ArrayVector
│   ├── validate/
│   │   ├── mod.rs
│   │   ├── description_yml/       # Parse and validate description.yml metadata
│   │   │   ├── mod.rs             # Module doc + re-exports
│   │   │   ├── model.rs           # DescriptionYml struct
│   │   │   ├── parser.rs          # parse_description_yml and helpers
│   │   │   ├── validator.rs       # validate_description_yml_str, validate_rust_extension
│   │   │   └── tests.rs           # Unit tests
│   │   ├── extension_name.rs
│   │   ├── function_name.rs
│   │   ├── platform.rs
│   │   ├── release_profile.rs
│   │   ├── semver.rs
│   │   └── spdx.rs
│   ├── scaffold/
│   │   ├── mod.rs                 # ScaffoldConfig, GeneratedFile, generate_scaffold
│   │   ├── templates.rs           # Template generators for scaffold files (pub(super))
│   │   └── tests.rs               # Unit tests
│   ├── table_description.rs       # TableDescription wrapper (requires `duckdb-1-5`)
│   ├── table/
│   │   ├── mod.rs
│   │   ├── builder.rs             # TableFunctionBuilder, BindFn/InitFn/ScanFn aliases
│   │   ├── info.rs                # BindInfo, InitInfo, FunctionInfo
│   │   ├── bind_data.rs           # FfiBindData<T>
│   │   └── init_data.rs           # FfiInitData<T>, FfiLocalInitData<T>
│   └── testing/
│       ├── mod.rs
│       ├── harness.rs             # AggregateTestHarness<S>
│       ├── mock_vector.rs         # MockVectorReader, MockVectorWriter, MockDuckValue
│       ├── mock_registrar.rs      # MockRegistrar, CastRecord
│       └── in_memory_db.rs        # InMemoryDb (requires `bundled-test`)
├── tests/
│   └── integration_test.rs
├── benches/
│   └── interval_bench.rs          # Criterion benchmarks
├── examples/
│   └── hello-ext/                 # Reference example: word_count (aggregate) + first_word (scalar)
├── book/                          # mdBook documentation source
│   ├── src/                       # Markdown pages (this site)
│   └── theme/custom.css
├── .github/workflows/ci.yml       # CI pipeline
├── .github/workflows/docs.yml     # GitHub Pages deployment
├── CONTRIBUTING.md
├── LESSONS.md                     # The 16 DuckDB Rust FFI pitfalls
├── CHANGELOG.md
└── README.md

Releasing

quack-rs uses libduckdb-sys = ">=1.4.4, <2" — a bounded range covering DuckDB 1.4.x and 1.5.x, whose shared C API (v1.2.0) is stable across both release lines. The <2 upper bound prevents silently adopting a future major release that may change the C API. Before broadening the range to a new major band:

  1. Read the DuckDB changelog for C API changes
  2. Check the new C API version string (used in duckdb_rs_extension_api_init)
  3. Update DUCKDB_API_VERSION in src/lib.rs if the C API version changed
  4. Audit all callback signatures against the new bindgen.rs output
  5. Update the range bounds in Cargo.toml (runtime and dev-deps)
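
As a concrete illustration, the bounded range described above might appear in `Cargo.toml` roughly like this (a sketch only — the exact feature names and dev-dependency shape are assumptions, not a verbatim copy of the manifest):

```toml
[dependencies]
# Bounded range: accepts any 1.4.4+ and 1.5.x release, but refuses a
# future 2.x whose C API may differ.
libduckdb-sys = ">=1.4.4, <2"

[dev-dependencies]
# Keep the dev-deps range identical so tests link against the same band.
libduckdb-sys = { version = ">=1.4.4, <2" }
```

When broadening to a new major band, both occurrences of the range must move together (step 5 above), otherwise unit tests can silently link a different DuckDB than downstream users load.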

Versions follow Semantic Versioning. Breaking changes to the public API require a major version bump.


Reporting issues

Use GitHub Issues. For security vulnerabilities, see SECURITY.md for the responsible disclosure policy.


License

quack-rs is licensed under the MIT License. Contributions are accepted under the same license. By submitting a pull request, you agree to license your contribution under MIT.