Your First Extension
This page walks through hello-ext, the complete reference example bundled with quack-rs.
It registers four functions that together cover every major pattern:
| SQL | Kind | Signature |
|---|---|---|
| word_count(text) | Aggregate | VARCHAR → BIGINT |
| first_word(text) | Scalar | VARCHAR → VARCHAR |
| generate_series_ext(n) | Table | BIGINT → TABLE(value BIGINT) |
| CAST(VARCHAR AS INTEGER) | Cast | VARCHAR → INTEGER |
Full source: examples/hello-ext/src/lib.rs
Build and try it
cargo build --release --manifest-path examples/hello-ext/Cargo.toml
Then in the DuckDB CLI:
LOAD './examples/hello-ext/target/release/libhello_ext.so';
-- Aggregate: total words across all rows
SELECT word_count(sentence) FROM (
VALUES ('hello world'), ('one two three'), (NULL)
) t(sentence);
-- → 5 (2 + 3; NULL contributes 0)
-- Scalar: first word of each row
SELECT first_word(sentence) FROM (
VALUES ('hello world'), (' padded '), (''), (NULL)
) t(sentence);
-- → 'hello', 'padded', '', NULL
Overview
An extension has four parts:
- State struct: holds data accumulated during aggregation (aggregate only)
- Callbacks: update, combine, finalize, state_size, state_init, state_destroy (aggregate) or a single function callback (scalar)
- Registration: wire callbacks to DuckDB via AggregateFunctionBuilder / ScalarFunctionBuilder
- Entry point: DuckDB's initialization hook, generated by entry_point!
Part 1 — Aggregate function: word_count
An aggregate function accumulates state across many rows and emits one result per group.
1a. The state struct
#[derive(Default, Debug)]
struct WordCountState {
    count: i64,
}

impl AggregateState for WordCountState {}
AggregateState is a marker trait — no methods required.
FfiState<WordCountState> wraps it in a heap-allocated Box<T> behind a raw pointer
and manages the full lifecycle (init, combine, destroy).
1b. state_size and state_init
These two callbacks are always identical boilerplate — delegate to FfiState:
unsafe extern "C" fn wc_state_size(_info: duckdb_function_info) -> idx_t {
    FfiState::<WordCountState>::size_callback(_info)
}

unsafe extern "C" fn wc_state_init(info: duckdb_function_info, state: duckdb_aggregate_state) {
    unsafe { FfiState::<WordCountState>::init_callback(info, state) };
}
size_callback returns size_of::<*mut WordCountState>() — DuckDB allocates a pointer-slot
per group. init_callback runs Box::new(WordCountState::default()) and writes the pointer
into that slot.
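The slot-and-pointer layout described above can be modelled in plain Rust without DuckDB. This is a minimal sketch and not quack-rs's actual implementation: `Slot`, `slot_size`, and `slot_init` are hypothetical names used only for illustration.

```rust
use std::mem::size_of;

#[derive(Default, Debug)]
struct WordCountState {
    count: i64,
}

/// Hypothetical stand-in for the per-group memory handed to the callbacks:
/// a slot just large enough to hold one raw pointer.
#[repr(C)]
struct Slot {
    inner: *mut WordCountState,
}

/// Models `size_callback`: the slot stores a pointer, not the state itself.
fn slot_size() -> usize {
    size_of::<Slot>()
}

/// Models `init_callback`: box a default state and store its raw pointer.
///
/// # Safety
/// `slot` must point to writable memory of at least `slot_size()` bytes.
unsafe fn slot_init(slot: *mut Slot) {
    let boxed = Box::new(WordCountState::default());
    // `write` does not read or drop whatever bytes were in the slot before.
    unsafe { slot.write(Slot { inner: Box::into_raw(boxed) }) };
}
```

Because the slot holds only a pointer, the state type can own heap data (a `Vec`, a `String`) without the host needing to know its layout.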
1c. update — accumulate one batch
unsafe extern "C" fn wc_update(
    _info: duckdb_function_info,
    input: duckdb_data_chunk,
    states: *mut duckdb_aggregate_state,
) {
    let reader = unsafe { VectorReader::new(input, 0) };
    let row_count = reader.row_count();
    for row in 0..row_count {
        if !unsafe { reader.is_valid(row) } {
            continue; // NULL input → skip (contributes 0 words)
        }
        let s = unsafe { reader.read_str(row) };
        let words = count_words(s);
        let state_ptr = unsafe { *states.add(row) };
        if let Some(st) = unsafe { FfiState::<WordCountState>::with_state_mut(state_ptr) } {
            st.count += words;
        }
    }
}
Key points:
- Check is_valid(row) before reading: never dereference an invalid (NULL) row
- VectorReader::new(chunk, col) gives column col from the chunk
- count_words is pure Rust: no unsafe, easy to unit-test separately
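This page does not show the body of `count_words`. A plausible sketch, assuming it simply counts whitespace-separated tokens and returns an `i64` so that it can be added to `WordCountState::count`, might look like this; the real implementation in `examples/hello-ext/src/lib.rs` may differ.

```rust
/// Counts whitespace-separated words in `s`.
/// Assumed implementation for illustration; not copied from hello-ext.
pub fn count_words(s: &str) -> i64 {
    // `split_whitespace` skips leading, trailing, and repeated whitespace,
    // so empty and all-whitespace strings yield zero words.
    s.split_whitespace().count() as i64
}
```

Keeping this logic free of FFI types is what lets it be tested without a DuckDB process.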
1d. combine — merge parallel results
Pitfall L1: DuckDB creates fresh zero-initialized target states before calling combine. You must copy all fields, not just the result field. In an aggregate with config fields (e.g., a histogram with a bin_width), you must also copy those, or results will be silently corrupted.
unsafe extern "C" fn wc_combine(
    _info: duckdb_function_info,
    source: *mut duckdb_aggregate_state,
    target: *mut duckdb_aggregate_state,
    count: idx_t,
) {
    for i in 0..count as usize {
        let src_ptr = unsafe { *source.add(i) };
        let tgt_ptr = unsafe { *target.add(i) };
        let src = unsafe { FfiState::<WordCountState>::with_state(src_ptr) };
        let tgt = unsafe { FfiState::<WordCountState>::with_state_mut(tgt_ptr) };
        if let (Some(s), Some(t)) = (src, tgt) {
            t.count += s.count;
            // If you add fields to WordCountState, combine them here too.
        }
    }
}
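To make the config-field pitfall concrete, here is a sketch of a histogram state in plain Rust. `HistogramState`, `hist_update`, `hist_combine`, and `bin_start` are hypothetical and not part of hello-ext or quack-rs; the sketch assumes non-negative values and a positive `bin_width`.

```rust
#[derive(Default, Debug, Clone, PartialEq)]
struct HistogramState {
    bin_width: i64,   // configuration, learned from the first row seen
    counts: Vec<u64>, // accumulated result
}

/// Records `value` in its bin. Assumes `value >= 0` and `bin_width > 0`.
fn hist_update(state: &mut HistogramState, bin_width: i64, value: i64) {
    state.bin_width = bin_width;
    let bin = (value / bin_width) as usize;
    if state.counts.len() <= bin {
        state.counts.resize(bin + 1, 0);
    }
    state.counts[bin] += 1;
}

/// Merges `src` into `tgt`, carrying the configuration along with the counts.
fn hist_combine(src: &HistogramState, tgt: &mut HistogramState) {
    // A fresh target still has bin_width == 0; without this copy, any later
    // computation that uses tgt.bin_width would see 0.
    if tgt.bin_width == 0 {
        tgt.bin_width = src.bin_width;
    }
    if tgt.counts.len() < src.counts.len() {
        tgt.counts.resize(src.counts.len(), 0);
    }
    for (t, s) in tgt.counts.iter_mut().zip(&src.counts) {
        *t += *s;
    }
}

/// Lower bound of a bin, as a finalize step might report it.
fn bin_start(state: &HistogramState, bin: usize) -> i64 {
    bin as i64 * state.bin_width
}
```

If `hist_combine` merged only `counts`, `bin_start` on the combined state would return 0 for every bin, with no error to signal the mistake.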
1e. finalize — write output
unsafe extern "C" fn wc_finalize(
    _info: duckdb_function_info,
    source: *mut duckdb_aggregate_state,
    result: duckdb_vector,
    count: idx_t,
    offset: idx_t,
) {
    let mut writer = unsafe { VectorWriter::new(result) };
    for i in 0..count as usize {
        let state_ptr = unsafe { *source.add(i) };
        match unsafe { FfiState::<WordCountState>::with_state(state_ptr) } {
            Some(st) => unsafe { writer.write_i64(offset as usize + i, st.count) },
            None => unsafe { writer.set_null(offset as usize + i) },
        }
    }
}
offset is DuckDB's output row offset — always use offset as usize + i, not just i.
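The effect of the offset can be seen in a small model where the output vector is a plain slice. `finalize_into` is a hypothetical helper, and calling it twice on one result buffer is an assumption made only to show why the index matters.

```rust
/// Writes one finalized count per state into `result`, starting at `offset`.
/// `result` models the output vector as a plain slice (illustration only).
fn finalize_into(counts: &[i64], result: &mut [Option<i64>], offset: usize) {
    for (i, &count) in counts.iter().enumerate() {
        // Index by offset + i; using `i` alone would overwrite the rows
        // at the start of `result`.
        result[offset + i] = Some(count);
    }
}
```

With an offset of 2, the first state lands in row 2 and rows 0 and 1 are left untouched.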
1f. state_destroy
unsafe extern "C" fn wc_state_destroy(
    states: *mut duckdb_aggregate_state,
    count: idx_t,
) {
    unsafe { FfiState::<WordCountState>::destroy_callback(states, count) };
}
destroy_callback calls Box::from_raw and nulls each pointer, preventing double-free.
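The free-then-null pattern can be sketched in isolation as follows. `Slot`, `new_slot`, and `destroy_slots` are hypothetical names; the real `destroy_callback` works on DuckDB's state pointers and may be structured differently.

```rust
struct WordCountState {
    count: i64,
}

/// Hypothetical per-group slot holding a raw pointer to a boxed state.
struct Slot {
    inner: *mut WordCountState,
}

/// Allocates a state on the heap and stores its raw pointer in a new slot.
fn new_slot(count: i64) -> Slot {
    Slot { inner: Box::into_raw(Box::new(WordCountState { count })) }
}

/// Frees every live state and nulls its pointer.
///
/// # Safety
/// Each non-null `inner` must have come from `Box::into_raw` and must not
/// have been freed elsewhere.
unsafe fn destroy_slots(slots: &mut [Slot]) {
    for slot in slots.iter_mut() {
        if !slot.inner.is_null() {
            // Rebuild the Box so its destructor releases the allocation.
            drop(unsafe { Box::from_raw(slot.inner) });
            // Null the pointer so a repeated call skips this slot.
            slot.inner = std::ptr::null_mut();
        }
    }
}
```

Calling `destroy_slots` a second time on the same slots is a no-op, because every pointer is already null.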
Part 2 — Scalar function: first_word
A scalar function processes one data chunk and returns one output value per row. The callback receives the full chunk and an output vector (not per-row state pointers).
Key rule: always propagate NULL
If the input row is NULL, write NULL to output — never read from an invalid row.
unsafe extern "C" fn first_word_scalar(
    _info: duckdb_function_info,
    input: duckdb_data_chunk,
    output: duckdb_vector,
) {
    let reader = unsafe { VectorReader::new(input, 0) };
    let mut writer = unsafe { VectorWriter::new(output) };
    let row_count = reader.row_count();
    for row in 0..row_count {
        if !unsafe { reader.is_valid(row) } {
            unsafe { writer.set_null(row) }; // NULL in → NULL out
            continue;
        }
        let s = unsafe { reader.read_str(row) };
        unsafe { writer.write_varchar(row, first_word(s)) };
    }
}
The pure logic:
pub fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}
Note: set_null internally calls duckdb_vector_ensure_validity_writable before writing
the null flag — this is required by DuckDB and handled for you by VectorWriter.
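The NULL-in, NULL-out rule can be checked without DuckDB by modelling a column as a slice of `Option<&str>`, where `None` stands for SQL NULL. `first_word_column` is a hypothetical helper written for this sketch; `first_word` is repeated so the block stands alone.

```rust
pub fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}

/// Models the scalar callback over one column: `None` (NULL) stays `None`,
/// every other row is mapped through `first_word`.
fn first_word_column<'a>(input: &[Option<&'a str>]) -> Vec<Option<&'a str>> {
    input.iter().map(|row| row.map(first_word)).collect()
}
```

An empty string is a valid, non-NULL value here: it maps to `Some("")`, while only `None` maps to `None`.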
Part 3 — Registration
unsafe fn register(con: libduckdb_sys::duckdb_connection) -> Result<(), ExtensionError> {
    unsafe {
        AggregateFunctionBuilder::new("word_count")
            .param(TypeId::Varchar)
            .returns(TypeId::BigInt)
            .state_size(wc_state_size)
            .init(wc_state_init)
            .update(wc_update)
            .combine(wc_combine)
            .finalize(wc_finalize)
            .destructor(wc_state_destroy)
            .register(con)?;

        ScalarFunctionBuilder::new("first_word")
            .param(TypeId::Varchar)
            .returns(TypeId::Varchar)
            .function(first_word_scalar)
            .register(con)?;
    }
    Ok(())
}
Both builders call the DuckDB C API internally. register returns Err if DuckDB reports
a failure — this propagates to the entry point and is surfaced to the user.
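The early-return behaviour of `?` in `register` can be illustrated with a toy model. `RegistrationError`, `register_one`, and `register_all` are hypothetical stand-ins; they do not reflect the real `ExtensionError` type or the builders' internals.

```rust
/// Hypothetical error type standing in for the extension's error.
#[derive(Debug, PartialEq)]
struct RegistrationError(String);

/// Hypothetical stand-in for one builder's `register` call.
/// `accepted` simulates whether the host accepted the function.
fn register_one(name: &str, accepted: bool) -> Result<(), RegistrationError> {
    if accepted {
        Ok(())
    } else {
        Err(RegistrationError(format!("failed to register {name}")))
    }
}

/// Registers each function in order; `?` returns the first error immediately,
/// so later functions are not attempted.
fn register_all(functions: &[(&str, bool)]) -> Result<(), RegistrationError> {
    for &(name, accepted) in functions {
        register_one(name, accepted)?;
    }
    Ok(())
}
```

In this model a failure on `word_count` means `first_word` is never registered, which mirrors how the first `?` in `register` short-circuits.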
Part 4 — Entry point
quack_rs::entry_point!(hello_ext_init_c_api, |con| unsafe { register(con) });
This one line emits:
#[no_mangle]
pub unsafe extern "C" fn hello_ext_init_c_api(
    info: duckdb_extension_info,
    access: *const duckdb_extension_access,
) -> bool {
    unsafe {
        quack_rs::entry_point::init_extension(
            info,
            access,
            quack_rs::DUCKDB_API_VERSION,
            |con| unsafe { register(con) },
        )
    }
}
Pass the full symbol name — hello_ext_init_c_api here. DuckDB looks up this exact
symbol when loading the extension. See The Entry Point for
the full initialization sequence.
Unit tests (no DuckDB process needed)
Test pure logic directly:
#[test]
fn count_words_whitespace_variants() {
    assert_eq!(count_words(" hello world "), 2);
    assert_eq!(count_words("\t\nhello\tworld\n"), 2);
    assert_eq!(count_words(" "), 0); // all whitespace → 0
}

#[test]
fn first_word_empty_and_whitespace() {
    assert_eq!(first_word(""), "");
    assert_eq!(first_word(" "), "");
}
Test aggregate state with AggregateTestHarness:
#[test]
fn word_count_null_rows_are_skipped() {
    // NULL rows: the callback skips them (no update call)
    let mut h = AggregateTestHarness::<WordCountState>::new();
    h.update(|s| s.count += count_words("hello"));
    // NULL row omitted (models callback skip)
    h.update(|s| s.count += count_words("world"));
    assert_eq!(h.finalize().count, 2);
}

#[test]
fn word_count_combine() {
    let mut h1 = AggregateTestHarness::<WordCountState>::new();
    h1.update(|s| s.count += count_words("hello world")); // 2

    let mut h2 = AggregateTestHarness::<WordCountState>::new();
    h2.update(|s| s.count += count_words("one two three four")); // 4

    h2.combine(&h1, |src, tgt| tgt.count += src.count);
    assert_eq!(h2.finalize().count, 6);
}
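For readers curious how a harness like this can work without DuckDB, here is a toy version that mirrors the `update` / `combine` / `finalize` call shapes used above. `ToyHarness` is a hypothetical sketch, not the implementation of `AggregateTestHarness`.

```rust
#[derive(Default, Debug)]
struct WordCountState {
    count: i64,
}

/// Toy harness that owns one state value directly, with no FFI involved.
struct ToyHarness<S> {
    state: S,
}

impl<S: Default> ToyHarness<S> {
    /// Starts from the default state, as a fresh group would.
    fn new() -> Self {
        Self { state: S::default() }
    }

    /// Applies one update step to the owned state.
    fn update(&mut self, f: impl FnOnce(&mut S)) {
        f(&mut self.state);
    }

    /// Merges `source` into `self`; the closure receives (source, target).
    fn combine(&mut self, source: &Self, f: impl FnOnce(&S, &mut S)) {
        f(&source.state, &mut self.state);
    }

    /// Consumes the harness and returns the final state for assertions.
    fn finalize(self) -> S {
        self.state
    }
}
```

Because the closures receive plain `&S` and `&mut S`, the same accumulation logic used in the callbacks can be exercised with ordinary `#[test]` functions.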
Run all tests with:
cargo test --manifest-path examples/hello-ext/Cargo.toml
See the Testing Guide for the full test strategy.