Your First Extension

This page walks through hello-ext, the complete reference example bundled with quack-rs. It registers four functions that together cover every major pattern:

| SQL | Kind | Signature |
|---|---|---|
| word_count(text) | Aggregate | VARCHAR → BIGINT |
| first_word(text) | Scalar | VARCHAR → VARCHAR |
| generate_series_ext(n) | Table | BIGINT → TABLE(value BIGINT) |
| CAST(VARCHAR AS INTEGER) | Cast | VARCHAR → INTEGER |

Full source: examples/hello-ext/src/lib.rs


Build and try it

cargo build --release --manifest-path examples/hello-ext/Cargo.toml

Then in the DuckDB CLI:

LOAD './examples/hello-ext/target/release/libhello_ext.so';

-- Aggregate: total words across all rows
SELECT word_count(sentence) FROM (
    VALUES ('hello world'), ('one two three'), (NULL)
) t(sentence);
-- → 5  (2 + 3; NULL contributes 0)

-- Scalar: first word of each row
SELECT first_word(sentence) FROM (
    VALUES ('hello world'), ('  padded  '), (''), (NULL)
) t(sentence);
-- → 'hello', 'padded', '', NULL

Overview

An extension has four parts:

  1. State struct — holds data accumulated during aggregation (aggregate only)
  2. Callbacks — update, combine, finalize, state_size, state_init, state_destroy (aggregate) or a single function callback (scalar)
  3. Registration — wire callbacks to DuckDB via AggregateFunctionBuilder / ScalarFunctionBuilder
  4. Entry point — DuckDB's initialization hook, generated by entry_point!

Part 1 — Aggregate function: word_count

An aggregate function accumulates state across many rows and emits one result per group.
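As a mental model only, the same lifecycle can be sketched in plain Rust with no DuckDB types. The names below (Partial, update, combine, finalize, aggregate_in_two_partitions) are illustrative and are not part of quack-rs; each partition stands in for one worker's state for the same group.

```rust
/// Illustrative per-group state; mirrors the shape of a word-count state.
#[derive(Default, Debug, Clone, PartialEq)]
struct Partial {
    count: i64,
}

/// update: fold one input row into the state. NULL rows (None) are skipped.
fn update(state: &mut Partial, row: Option<&str>) {
    if let Some(text) = row {
        state.count += text.split_whitespace().count() as i64;
    }
}

/// combine: merge a source state into a target state.
fn combine(source: &Partial, target: &mut Partial) {
    target.count += source.count;
}

/// finalize: turn the merged state into the single output value.
fn finalize(state: &Partial) -> i64 {
    state.count
}

/// Simulates two workers that each see part of the rows, then merges them.
fn aggregate_in_two_partitions(rows: &[Option<&str>]) -> i64 {
    let (left, right) = rows.split_at(rows.len() / 2);

    let mut a = Partial::default();
    for row in left {
        update(&mut a, *row);
    }
    let mut b = Partial::default();
    for row in right {
        update(&mut b, *row);
    }

    // Merge both partial states into a fresh target, then emit one value.
    let mut merged = Partial::default();
    combine(&a, &mut merged);
    combine(&b, &mut merged);
    finalize(&merged)
}
```

The sections below implement these same three steps as C-ABI callbacks, plus the state allocation and cleanup that the model gets for free from ordinary Rust ownership.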

1a. The state struct

#[derive(Default, Debug)]
struct WordCountState {
    count: i64,
}

impl AggregateState for WordCountState {}

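AggregateState is a marker trait with no required methods. FfiState<WordCountState> stores the state in a heap-allocated Box<WordCountState> behind a raw pointer and manages its lifecycle: init_callback allocates it, with_state and with_state_mut give access to it in update, combine, and finalize, and destroy_callback frees it.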

1b. state_size and state_init

These two callbacks are always identical boilerplate — delegate to FfiState:

unsafe extern "C" fn wc_state_size(info: duckdb_function_info) -> idx_t {
    FfiState::<WordCountState>::size_callback(info)
}

unsafe extern "C" fn wc_state_init(info: duckdb_function_info, state: duckdb_aggregate_state) {
    unsafe { FfiState::<WordCountState>::init_callback(info, state) };
}

size_callback returns size_of::<*mut WordCountState>() — DuckDB allocates a pointer-slot per group. init_callback runs Box::new(WordCountState::default()) and writes the pointer into that slot.
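To make the pointer-slot layout concrete, here is a minimal std-only model of the pattern. Slot, slot_size, and slot_init are hypothetical names used for illustration; this is not the quack-rs implementation of FfiState, only a sketch of the same idea.

```rust
use std::mem::size_of;

#[derive(Default, Debug)]
struct WordCountState {
    count: i64,
}

/// What the engine-owned memory for one group holds in this model:
/// a single raw pointer to a heap-allocated state.
#[repr(C)]
struct Slot<T> {
    inner: *mut T,
}

/// Analogue of a size callback: the engine only needs room for one pointer.
fn slot_size<T>() -> usize {
    size_of::<Slot<T>>()
}

/// Analogue of an init callback: box a default state and store its pointer.
///
/// # Safety
/// `slot` must point to writable, properly aligned memory of at least
/// `slot_size::<T>()` bytes.
unsafe fn slot_init<T: Default>(slot: *mut Slot<T>) {
    let boxed = Box::new(T::default());
    // Box::into_raw hands ownership to the slot; someone must free it later.
    unsafe { slot.write(Slot { inner: Box::into_raw(boxed) }) };
}
```

Because the slot only holds a pointer, the state type can own heap data (a String, a Vec) without the engine needing to know its size or layout.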

1c. update — accumulate one batch

unsafe extern "C" fn wc_update(
    _info: duckdb_function_info,
    input: duckdb_data_chunk,
    states: *mut duckdb_aggregate_state,
) {
    let reader = unsafe { VectorReader::new(input, 0) };
    let row_count = reader.row_count();

    for row in 0..row_count {
        if !unsafe { reader.is_valid(row) } {
            continue; // NULL input → skip (contributes 0 words)
        }
        let s = unsafe { reader.read_str(row) };
        let words = count_words(s);

        let state_ptr = unsafe { *states.add(row) };
        if let Some(st) = unsafe { FfiState::<WordCountState>::with_state_mut(state_ptr) } {
            st.count += words;
        }
    }
}

Key points:

  • Check is_valid(row) before reading — never dereference an invalid (NULL) row
  • VectorReader::new(chunk, col) gives column col from the chunk
  • count_words is pure Rust — no unsafe, easy to unit-test separately
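count_words itself is not shown on this page. A minimal sketch, assuming it splits on Unicode whitespace and returns i64 to match WordCountState::count (the bundled example may differ in detail):

```rust
/// Counts whitespace-separated words.
/// Assumption: a "word" is a maximal run of non-whitespace characters.
pub fn count_words(s: &str) -> i64 {
    // split_whitespace skips leading, trailing, and repeated whitespace,
    // so no empty tokens are counted.
    s.split_whitespace().count() as i64
}
```

Keeping this function free of FFI types is what lets the unit tests later on this page call it directly.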

1d. combine — merge parallel results

Pitfall L1: DuckDB creates fresh zero-initialized target states before calling combine. You must copy all fields — not just the result field. In an aggregate with config fields (e.g., a histogram with a bin_width) you must also copy those, or results will be silently corrupted.

unsafe extern "C" fn wc_combine(
    _info: duckdb_function_info,
    source: *mut duckdb_aggregate_state,
    target: *mut duckdb_aggregate_state,
    count: idx_t,
) {
    for i in 0..count as usize {
        let src_ptr = unsafe { *source.add(i) };
        let tgt_ptr = unsafe { *target.add(i) };
        let src = unsafe { FfiState::<WordCountState>::with_state(src_ptr) };
        let tgt = unsafe { FfiState::<WordCountState>::with_state_mut(tgt_ptr) };
        if let (Some(s), Some(t)) = (src, tgt) {
            t.count += s.count;
            // If you add fields to WordCountState, combine them here too.
        }
    }
}
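WordCountState has only one field, so the pitfall cannot bite here. The following std-only sketch shows the case the warning describes, using a hypothetical HistogramState with a bin_width configuration field. None of these names come from quack-rs, and the finalize rule (lower bound of the fullest bin) is chosen only because it depends on the configuration.

```rust
/// Hypothetical state with one configuration field and one accumulated field.
#[derive(Default, Debug, Clone, PartialEq)]
struct HistogramState {
    bin_width: i64, // configuration, recorded during update
    bins: Vec<i64>, // accumulated counts per bin
}

/// update: record the configuration and count one non-negative value.
fn update(state: &mut HistogramState, bin_width: i64, value: i64) {
    state.bin_width = bin_width;
    let bin = (value / bin_width) as usize;
    if state.bins.len() <= bin {
        state.bins.resize(bin + 1, 0);
    }
    state.bins[bin] += 1;
}

/// combine: merge `source` into `target`, including the configuration.
fn combine(source: &HistogramState, target: &mut HistogramState) {
    // A fresh target still has bin_width == 0, so the configuration must be
    // carried over along with the counts.
    if target.bin_width == 0 {
        target.bin_width = source.bin_width;
    }
    if target.bins.len() < source.bins.len() {
        target.bins.resize(source.bins.len(), 0);
    }
    for (t, s) in target.bins.iter_mut().zip(&source.bins) {
        *t += *s;
    }
}

/// finalize: lower bound of the fullest bin, which depends on bin_width.
fn finalize(state: &HistogramState) -> Option<i64> {
    let (index, _) = state
        .bins
        .iter()
        .enumerate()
        .max_by_key(|&(_, count)| *count)?;
    Some(index as i64 * state.bin_width)
}
```

If combine merged only bins, a target that started from the default state would keep bin_width == 0 and finalize would report 0 for every group, with no error to signal the problem.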

1e. finalize — write output

unsafe extern "C" fn wc_finalize(
    _info: duckdb_function_info,
    source: *mut duckdb_aggregate_state,
    result: duckdb_vector,
    count: idx_t,
    offset: idx_t,
) {
    let mut writer = unsafe { VectorWriter::new(result) };

    for i in 0..count as usize {
        let state_ptr = unsafe { *source.add(i) };
        match unsafe { FfiState::<WordCountState>::with_state(state_ptr) } {
            Some(st) => unsafe { writer.write_i64(offset as usize + i, st.count) },
            None     => unsafe { writer.set_null(offset as usize + i) },
        }
    }
}

offset is DuckDB's output row offset — always use offset as usize + i, not just i.
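The indexing rule can be modeled with a plain slice. This sketch assumes a result buffer that is written in more than one call, each starting at a different offset; finalize_into and finalize_two_batches are illustrative names, and None stands in for a NULL output row.

```rust
/// Writes `states[i]` to `result[offset + i]`, the way a finalize callback
/// indexes its output vector.
fn finalize_into(result: &mut [Option<i64>], states: &[Option<i64>], offset: usize) {
    for (i, state) in states.iter().enumerate() {
        result[offset + i] = *state;
    }
}

/// Two calls share one result buffer; the second starts at offset 2.
fn finalize_two_batches() -> Vec<Option<i64>> {
    let mut result = vec![None; 4];
    finalize_into(&mut result, &[Some(2), Some(3)], 0);
    finalize_into(&mut result, &[Some(7), None], 2);
    result
}
```

Indexing the second call with i alone would overwrite rows 0 and 1 and leave rows 2 and 3 unset.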

1f. state_destroy

unsafe extern "C" fn wc_state_destroy(
    states: *mut duckdb_aggregate_state,
    count: idx_t,
) {
    unsafe { FfiState::<WordCountState>::destroy_callback(states, count) };
}

destroy_callback calls Box::from_raw and nulls each pointer, preventing double-free.
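Why nulling the pointer matters can be shown with a small std-only model. Slot, init_slots, and destroy_all are hypothetical names; the real destroy_callback operates on DuckDB's state array, not on a Rust slice.

```rust
#[derive(Default, Debug)]
struct WordCountState {
    count: i64,
}

/// One engine-owned slot holding a raw pointer to a boxed state (model only).
struct Slot<T> {
    inner: *mut T,
}

/// Builds `n` initialized slots, standing in for state_init on n groups.
fn init_slots<T: Default>(n: usize) -> Vec<Slot<T>> {
    (0..n)
        .map(|_| Slot { inner: Box::into_raw(Box::new(T::default())) })
        .collect()
}

/// Frees every live state in `slots` and nulls its pointer.
/// Returns how many states were actually freed by this call.
fn destroy_all<T>(slots: &mut [Slot<T>]) -> usize {
    let mut freed = 0;
    for slot in slots.iter_mut() {
        if !slot.inner.is_null() {
            // SAFETY: in this model a non-null pointer always came from
            // Box::into_raw and has not been freed yet.
            drop(unsafe { Box::from_raw(slot.inner) });
            slot.inner = std::ptr::null_mut(); // a second call becomes a no-op
            freed += 1;
        }
    }
    freed
}
```

A second destroy_all over the same slots finds only null pointers and frees nothing, which is the property that rules out a double free.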


Part 2 — Scalar function: first_word

A scalar function processes one data chunk and returns one output value per row. The callback receives the full chunk and an output vector (not per-row state pointers).

Key rule: always propagate NULL

If the input row is NULL, write NULL to output — never read from an invalid row.

unsafe extern "C" fn first_word_scalar(
    _info: duckdb_function_info,
    input: duckdb_data_chunk,
    output: duckdb_vector,
) {
    let reader = unsafe { VectorReader::new(input, 0) };
    let mut writer = unsafe { VectorWriter::new(output) };
    let row_count = reader.row_count();

    for row in 0..row_count {
        if !unsafe { reader.is_valid(row) } {
            unsafe { writer.set_null(row) }; // NULL in → NULL out
            continue;
        }
        let s = unsafe { reader.read_str(row) };
        unsafe { writer.write_varchar(row, first_word(s)) };
    }
}

The pure logic:

pub fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}

Note: set_null internally calls duckdb_vector_ensure_validity_writable before writing the null flag — this is required by DuckDB and handled for you by VectorWriter.
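The NULL rule can be checked without DuckDB by modeling a chunk as a slice of Option values. In this sketch first_word_chunk is an illustrative helper, not a quack-rs function; None plays the role of an invalid row.

```rust
/// Same pure logic as on this page.
pub fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}

/// Models the scalar callback over one chunk: `None` is a NULL row.
/// NULL rows are never passed to `first_word`; they map straight to NULL.
pub fn first_word_chunk<'a>(input: &[Option<&'a str>]) -> Vec<Option<&'a str>> {
    input.iter().map(|&row| row.map(first_word)).collect()
}
```

Note that an empty string and NULL stay distinct: '' maps to '', while NULL maps to NULL.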


Part 3 — Registration

unsafe fn register(con: libduckdb_sys::duckdb_connection) -> Result<(), ExtensionError> {
    unsafe {
        AggregateFunctionBuilder::new("word_count")
            .param(TypeId::Varchar)
            .returns(TypeId::BigInt)
            .state_size(wc_state_size)
            .init(wc_state_init)
            .update(wc_update)
            .combine(wc_combine)
            .finalize(wc_finalize)
            .destructor(wc_state_destroy)
            .register(con)?;

        ScalarFunctionBuilder::new("first_word")
            .param(TypeId::Varchar)
            .returns(TypeId::Varchar)
            .function(first_word_scalar)
            .register(con)?;
    }
    Ok(())
}

Both builders call the DuckDB C API internally. register returns Err if DuckDB reports a failure — this propagates to the entry point and is surfaced to the user.
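One consequence of chaining the registrations with ? is that the first failure stops the function, so later registrations are never attempted. The sketch below models that control flow with std types only; RegistrationError, register_one, and register_all are hypothetical stand-ins, not the quack-rs ExtensionError or builders.

```rust
/// Stand-in error type for this model.
#[derive(Debug, PartialEq)]
struct RegistrationError(String);

/// Pretends to register one function; `engine_accepts` simulates the
/// engine's success or failure status.
fn register_one(
    name: &str,
    engine_accepts: bool,
    registered: &mut Vec<String>,
) -> Result<(), RegistrationError> {
    if engine_accepts {
        registered.push(name.to_string());
        Ok(())
    } else {
        Err(RegistrationError(format!("failed to register {name}")))
    }
}

/// Mirrors the shape of `register`: each step ends in `?`.
fn register_all(
    accept_aggregate: bool,
    registered: &mut Vec<String>,
) -> Result<(), RegistrationError> {
    register_one("word_count", accept_aggregate, registered)?;
    // Only reached if the aggregate registered successfully.
    register_one("first_word", true, registered)?;
    Ok(())
}
```

If the order of registrations matters to users (for example, one function is optional), handle that function's Result explicitly instead of using ? on it.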


Part 4 — Entry point

quack_rs::entry_point!(hello_ext_init_c_api, |con| unsafe { register(con) });

This one line emits:

#[no_mangle]
pub unsafe extern "C" fn hello_ext_init_c_api(
    info: duckdb_extension_info,
    access: *const duckdb_extension_access,
) -> bool {
    unsafe {
        quack_rs::entry_point::init_extension(
            info, access, quack_rs::DUCKDB_API_VERSION,
            |con| unsafe { register(con) },
        )
    }
}

Pass the full symbol name, hello_ext_init_c_api, as the first argument to entry_point!. DuckDB looks up this exact symbol when loading the extension. See The Entry Point for the full initialization sequence.


Unit tests (no DuckDB process needed)

Test pure logic directly:

#[test]
fn count_words_whitespace_variants() {
    assert_eq!(count_words("  hello  world  "), 2);
    assert_eq!(count_words("\t\nhello\tworld\n"), 2);
    assert_eq!(count_words("   "), 0); // all whitespace → 0
}

#[test]
fn first_word_empty_and_whitespace() {
    assert_eq!(first_word(""), "");
    assert_eq!(first_word("   "), "");
}

Test aggregate state with AggregateTestHarness:

#[test]
fn word_count_null_rows_are_skipped() {
    // NULL rows: the callback skips them (no update call)
    let mut h = AggregateTestHarness::<WordCountState>::new();
    h.update(|s| s.count += count_words("hello"));
    // NULL row omitted — models callback skip
    h.update(|s| s.count += count_words("world"));
    assert_eq!(h.finalize().count, 2);
}

#[test]
fn word_count_combine() {
    let mut h1 = AggregateTestHarness::<WordCountState>::new();
    h1.update(|s| s.count += count_words("hello world")); // 2

    let mut h2 = AggregateTestHarness::<WordCountState>::new();
    h2.update(|s| s.count += count_words("one two three four")); // 4

    h2.combine(&h1, |src, tgt| tgt.count += src.count);
    assert_eq!(h2.finalize().count, 6);
}

Run all tests with:

cargo test --manifest-path examples/hello-ext/Cargo.toml

See the Testing Guide for the full test strategy.