Pitfall Catalog
All known DuckDB Rust FFI pitfalls, discovered while building duckdb-behavioral, a production DuckDB community extension. Every future developer who builds a Rust DuckDB extension will hit the majority of these. quack-rs makes most of them impossible.
L1: COMBINE must propagate ALL config fields
Status: Testable with AggregateTestHarness.
Symptom: Aggregate function returns wrong results. No error, no crash.
Root cause: DuckDB's segment tree creates fresh zero-initialized target
states via state_init, then calls combine to merge source states into them.
If your combine only propagates data fields (count, sum) but omits
configuration fields (window_size, mode), the configuration will be zero at
finalize time, silently corrupting results.
This bug passed 435 unit tests before being caught by E2E tests.
Fix:
unsafe extern "C" fn combine(
    _info: duckdb_function_info,
    source: *mut duckdb_aggregate_state,
    target: *mut duckdb_aggregate_state,
    count: idx_t,
) {
    for i in 0..count as usize {
        let src_ptr = unsafe { *source.add(i) };
        let tgt_ptr = unsafe { *target.add(i) };
        if let (Some(src), Some(tgt)) = (
            FfiState::<MyState>::with_state(src_ptr),
            FfiState::<MyState>::with_state_mut(tgt_ptr),
        ) {
            tgt.window_size = src.window_size; // config: MUST copy
            tgt.mode = src.mode;               // config: MUST copy
            tgt.count += src.count;            // data: accumulate
        }
    }
}
Test this with AggregateTestHarness::combine — see Testing Guide.
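To see why the bug is silent, here is a minimal pure-Rust sketch that imitates the segment-tree pattern without any DuckDB types. The struct and the helper functions (`combine_buggy`, `combine_fixed`, `merge_into_fresh`) are hypothetical names made up for this illustration; they are not part of quack-rs.

```rust
/// Hypothetical aggregate state: two config fields plus one data field.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct MyState {
    pub window_size: u64, // config
    pub mode: u8,         // config
    pub count: u64,       // data
}

/// Buggy combine: accumulates data but forgets the configuration.
pub fn combine_buggy(src: &MyState, tgt: &mut MyState) {
    tgt.count += src.count;
}

/// Fixed combine: copies configuration first, then accumulates data.
pub fn combine_fixed(src: &MyState, tgt: &mut MyState) {
    tgt.window_size = src.window_size;
    tgt.mode = src.mode;
    tgt.count += src.count;
}

/// Imitates the pattern described above: the target starts out as a fresh
/// default (all-zero) state and the source is merged into it.
pub fn merge_into_fresh(src: &MyState, combine: fn(&MyState, &mut MyState)) -> MyState {
    let mut tgt = MyState::default(); // stands in for a zero-initialized state_init
    combine(src, &mut tgt);
    tgt
}
```

Merging a state with `window_size: 3600` through `combine_buggy` yields a target whose `count` is correct but whose `window_size` is 0, which is exactly the kind of result that a unit test on `update` and `finalize` alone would not notice.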
L2: State destroy double-free
Status: Made impossible by FfiState<T>.
Symptom: Crash or memory corruption on extension unload.
Root cause: If state_destroy frees the inner Box but does not null the
pointer, a second state_destroy call (common in error paths) frees
already-freed memory → undefined behavior.
Fix: FfiState<T>::destroy_callback nulls inner after freeing. Use it
instead of writing your own destructor:
unsafe extern "C" fn state_destroy(states: *mut duckdb_aggregate_state, count: idx_t) {
    unsafe { FfiState::<MyState>::destroy_callback(states, count) };
}
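The null-after-free idea can be sketched in isolation. The `StateSlot` type below is a hypothetical stand-in written for this page, not the actual `FfiState<T>` implementation; it only shows why nulling the pointer turns a second destroy call into a no-op.

```rust
use std::ptr;

/// Hypothetical slot that owns a heap-allocated state through a raw pointer.
#[repr(C)]
pub struct StateSlot<T> {
    inner: *mut T,
}

impl<T> StateSlot<T> {
    /// Moves `value` to the heap and stores the raw pointer.
    pub fn new(value: T) -> Self {
        StateSlot { inner: Box::into_raw(Box::new(value)) }
    }

    /// True while the inner allocation has not been freed.
    pub fn is_live(&self) -> bool {
        !self.inner.is_null()
    }

    /// Frees the inner allocation at most once.
    /// Returns true if this call actually freed something.
    pub fn destroy(&mut self) -> bool {
        if self.inner.is_null() {
            return false; // already destroyed: nothing to do
        }
        // SAFETY: `inner` came from Box::into_raw and has not been freed yet,
        // because it is set to null immediately after every free.
        unsafe { drop(Box::from_raw(self.inner)) };
        self.inner = ptr::null_mut();
        true
    }
}
```

Calling `destroy` twice on the same slot frees the allocation on the first call and returns `false` on the second, instead of freeing already-freed memory.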
L3: No panic across FFI boundaries
Status: Made impossible by init_extension and panic = "abort".
Symptom: Extension causes DuckDB to crash or behave unpredictably.
Root cause: A panic!() or a failing .unwrap() inside an unsafe extern "C"
function is undefined behavior if it unwinds, because panics cannot unwind
across FFI boundaries in Rust.
Fix: Use Result and ? inside init_extension. Never use unwrap() in
FFI callbacks. FfiState::with_state_mut returns Option, not Result, so
callers use if let:
// Safe pattern: no unwrap in FFI callback
if let Some(st) = unsafe { FfiState::<MyState>::with_state_mut(state_ptr) } {
    st.count += 1;
}

// Dangerous: never do this in an FFI callback
let st = unsafe { FfiState::<MyState>::with_state_mut(state_ptr) }.unwrap(); // UB if None
The scaffold-generated Cargo.toml sets panic = "abort" in the release
profile, which terminates the process instead of unwinding — still bad, but not
undefined behavior.
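The Option-based pattern can be reproduced with nothing but the standard library. In the sketch below, `Counter`, `with_state_mut`, and `update` are hypothetical names chosen for illustration; the real `FfiState` accessor may be implemented differently.

```rust
/// Hypothetical state type for illustration.
pub struct Counter {
    pub count: u64,
}

/// Converts a raw pointer into an optional mutable reference.
///
/// # Safety
/// `ptr` must be null or point to a valid, uniquely borrowed `Counter`.
pub unsafe fn with_state_mut<'a>(ptr: *mut Counter) -> Option<&'a mut Counter> {
    // `as_mut` returns None for a null pointer instead of dereferencing it.
    unsafe { ptr.as_mut() }
}

/// Callback-shaped function: a null state is skipped, never unwrapped,
/// so there is no code path here that can panic.
///
/// # Safety
/// Same contract as `with_state_mut`.
pub unsafe extern "C" fn update(ptr: *mut Counter) {
    if let Some(st) = unsafe { with_state_mut(ptr) } {
        st.count += 1;
    }
}
```

Passing a null pointer to `update` simply does nothing, which is the behavior wanted inside a callback that DuckDB invokes directly.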
L4: ensure_validity_writable is required before NULL output
Status: Made impossible by VectorWriter::set_null.
Symptom: SEGFAULT when writing NULL values to the output vector.
Root cause: duckdb_vector_get_validity returns an uninitialized pointer if
duckdb_vector_ensure_validity_writable has not been called first. Writing to
an uninitialized address → SEGFAULT.
Fix: Always call duckdb_vector_ensure_validity_writable before accessing
the validity bitmap on the write path. VectorWriter::set_null does this
automatically:
// Correct: handled by set_null
unsafe { writer.set_null(row) };

// Wrong: validity bitmap may not be allocated yet
// let validity = duckdb_vector_get_validity(output);
// set_bit(validity, row, false); // SEGFAULT
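The "ensure before write" rule can be modeled without DuckDB. The following toy `ValidityModel` is a hypothetical type for this page only; it assumes a bitmap of 64-bit words in which a set bit means "valid", and it uses `None` to represent a bitmap that has not been allocated yet.

```rust
/// Toy model of an output vector's validity mask. `None` stands for
/// "no bitmap allocated yet", which in this model means every row is valid.
pub struct ValidityModel {
    words: Option<Vec<u64>>,
    rows: usize,
}

impl ValidityModel {
    pub fn new(rows: usize) -> Self {
        ValidityModel { words: None, rows }
    }

    /// Allocates the bitmap with all bits set (all valid) if it is missing.
    pub fn ensure_writable(&mut self) -> &mut Vec<u64> {
        let n = (self.rows + 63) / 64;
        self.words.get_or_insert_with(|| vec![u64::MAX; n])
    }

    /// Marks a row NULL. The bitmap is always ensured first, so this never
    /// writes through a missing allocation. Out-of-range rows are ignored.
    pub fn set_null(&mut self, row: usize) {
        if row >= self.rows {
            return;
        }
        let words = self.ensure_writable();
        words[row / 64] &= !(1u64 << (row % 64));
    }

    /// Reports whether a row is valid; rows outside the vector are not.
    pub fn is_valid(&self, row: usize) -> bool {
        if row >= self.rows {
            return false;
        }
        match &self.words {
            None => true,
            Some(w) => (w[row / 64] >> (row % 64)) & 1 == 1,
        }
    }
}
```

The design point is that `set_null` is the only way to clear a bit, and it cannot be called without going through `ensure_writable`.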
L5: Boolean reading must use u8 != 0, not *const bool
Status: Made impossible by VectorReader::read_bool.
Symptom: Undefined behavior; Rust requires bool to be exactly 0 or 1.
Root cause: DuckDB's C API does not guarantee that boolean values in vectors
are exactly 0 or 1. Reading a byte such as 2 or 255 through a *const bool is
undefined behavior in Rust.
Fix: Read as u8 and compare with != 0. VectorReader::read_bool always
does this:
let b: bool = unsafe { reader.read_bool(row) }; // safe: uses u8 != 0 internally
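For reference, the byte-comparison technique looks roughly like this when written by hand. `read_bool_sketch` is a hypothetical helper, not the body of `VectorReader::read_bool`.

```rust
/// Reads row `row` of a boolean column as a byte and normalizes it.
/// Any non-zero byte is treated as true, so a `bool` with an invalid
/// bit pattern is never created.
///
/// # Safety
/// `data` must point to at least `row + 1` readable bytes.
pub unsafe fn read_bool_sketch(data: *const u8, row: usize) -> bool {
    unsafe { *data.add(row) != 0 }
}
```

With the raw bytes `[0, 1, 2, 255]`, this returns `false, true, true, true`, whereas casting the same buffer to `*const bool` would be undefined behavior for the last two rows.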
L6: Function set name must be set on EACH member
Status: Made impossible by AggregateFunctionSetBuilder.
Symptom: Functions are silently not registered. No error returned.
Root cause: When using duckdb_register_aggregate_function_set, the function
name must be set on EACH individual duckdb_aggregate_function using
duckdb_aggregate_function_set_name, not just on the set.
This is completely undocumented. Discovered by reading DuckDB's C++ test code
at test/api/capi/test_capi_aggregate_functions.cpp.
In duckdb-behavioral, 6 of 7 functions failed to register silently due to this bug.
Fix: AggregateFunctionSetBuilder calls duckdb_aggregate_function_set_name
on every individual function before adding it to the set. Use it instead of
managing the set manually.
L7: LogicalType memory leak
Status: Made impossible by LogicalType RAII wrapper.
Symptom: Memory leak proportional to number of registered functions.
Root cause: duckdb_create_logical_type allocates memory that must be freed
with duckdb_destroy_logical_type. Forgetting leaks memory.
Fix: LogicalType implements Drop and calls duckdb_destroy_logical_type
automatically when it goes out of scope.
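The RAII pattern itself is small. The sketch below replaces the DuckDB create and destroy calls with mock functions (`mock_create`, `mock_destroy`) and counts live handles, so it can run without linking DuckDB; `TypeHandle` is a hypothetical name and not the quack-rs `LogicalType`.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts live mock handles so the sketch can show that nothing leaks.
pub static LIVE_HANDLES: AtomicUsize = AtomicUsize::new(0);

/// Stand-in for a C "create" call that returns an owned handle.
fn mock_create() -> *mut u32 {
    LIVE_HANDLES.fetch_add(1, Ordering::SeqCst);
    Box::into_raw(Box::new(0u32))
}

/// Stand-in for the matching C "destroy" call.
///
/// # Safety
/// `handle` must come from `mock_create` and must not be destroyed twice.
unsafe fn mock_destroy(handle: *mut u32) {
    unsafe { drop(Box::from_raw(handle)) };
    LIVE_HANDLES.fetch_sub(1, Ordering::SeqCst);
}

/// Hypothetical RAII guard: owns the handle and releases it in Drop.
pub struct TypeHandle {
    raw: *mut u32,
}

impl TypeHandle {
    pub fn new() -> Self {
        TypeHandle { raw: mock_create() }
    }

    /// Borrows the raw handle for passing to C; ownership stays here.
    pub fn as_raw(&self) -> *mut u32 {
        self.raw
    }
}

impl Drop for TypeHandle {
    fn drop(&mut self) {
        // SAFETY: `raw` came from mock_create and is released exactly once.
        unsafe { mock_destroy(self.raw) };
    }
}
```

Because the release happens in `Drop`, early returns and `?` in registration code cannot skip it.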
P1: Library name must match extension name
Status: Must be configured in Cargo.toml. Scaffold handles this.
Symptom: Community build fails with FileNotFoundError.
Root cause: The community build expects lib{extension_name}.so. If the
Cargo crate name produces a different .so filename, the build fails.
Fix: Set name explicitly in [lib]:
[lib]
name = "my_extension" # Must match description.yml `name: my_extension`
crate-type = ["cdylib", "rlib"]
P2: Metadata version is C API version, not DuckDB version
Status: DUCKDB_API_VERSION constant encodes the correct value.
Symptom: Metadata script fails or produces incorrect metadata.
Root cause: The -dv flag to append_extension_metadata.py must be the
C API version (v1.2.0), not the DuckDB release version (v1.4.4). These are
different strings.
Fix: Use quack_rs::DUCKDB_API_VERSION ("v1.2.0") in init_extension,
and use the same version with append_extension_metadata.py -dv v1.2.0.
P3: E2E testing is mandatory
Status: Documented. See Testing Guide.
Symptom: All unit tests pass but the extension is completely broken.
Root cause: Unit tests cannot detect SEGFAULTs on load, silent registration failures, or wrong results from combine bugs.
Fix: Always run E2E tests using an actual DuckDB binary. The scaffold generates a complete SQLLogicTest skeleton.
P4: extension-ci-tools submodule must be initialized
Status: Build-time check.
Symptom: make configure or make release fails.
Fix:
git submodule update --init --recursive
P5: SQLLogicTest expected values must match exactly
Status: Test-authoring care required.
Symptom: Tests fail in CI but pass locally (or vice versa).
Root cause: SQLLogicTest does exact string matching. Output format (decimal places, NULL representation, column separators) must match character-for-character.
Fix: Generate expected values by running the SQL in DuckDB CLI and copying
the output. NULL is NULL (uppercase). Integers have no decimal places.
P6: duckdb_register_aggregate_function_set silently fails
Status: Builder returns Err. Also see L6.
Symptom: Function appears registered but is not found in SQL.
Root cause: The return value of duckdb_register_aggregate_function_set is
often ignored. When it returns DuckDBError, the function set is not registered.
Fix: The builder checks the return value and propagates it as Err.
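As a rough illustration of what "propagates it as Err" means, the sketch below converts a C-style status code into a `Result`. Everything in it is hypothetical: `MockState` only imitates a two-valued success/error status, and `mock_register` is a stand-in that fails when no member has a name. It does not call DuckDB.

```rust
/// Imitates a two-valued C status code (success or error).
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MockState {
    Success = 0,
    Error = 1,
}

/// Converts a C-style status into a Result so callers can use `?`.
pub fn check(state: MockState, what: &str) -> Result<(), String> {
    match state {
        MockState::Success => Ok(()),
        MockState::Error => Err(format!("{what} failed")),
    }
}

/// Stand-in for a registration call: rejects a set with no named members.
pub fn mock_register(named_members: usize) -> MockState {
    if named_members == 0 {
        MockState::Error
    } else {
        MockState::Success
    }
}

/// The status is checked immediately and never dropped on the floor.
pub fn register_all(named_members: usize) -> Result<(), String> {
    check(mock_register(named_members), "register_aggregate_function_set")?;
    Ok(())
}
```

The point of the wrapper is that an ignored status becomes a compile-time warning (`unused Result`) rather than a silent runtime failure.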
P7: duckdb_string_t format is undocumented
Status: Handled by VectorReader::read_str and DuckStringView.
Symptom: VARCHAR reading produces garbage, empty strings, or crashes.
Root cause: DuckDB stores strings in a 16-byte struct with two formats
(inline ≤ 12 bytes, pointer > 12 bytes) that are not documented in
libduckdb-sys.
Fix: Use VectorReader::read_str(row). See
NULL Handling & Strings.
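To make the two formats concrete, here is a hedged sketch that decodes a 16-byte cell by hand. It assumes a 64-bit target, a 4-byte length at offset 0, inline data at offsets 4..16 for short strings, and (for long strings) a 4-byte prefix at offsets 4..8 followed by an 8-byte data pointer. That layout is an assumption of this example, and `decode_string_cell` and `encode_string_cell` are hypothetical helpers; in real code, prefer `VectorReader::read_str`.

```rust
/// Decodes a 16-byte string cell into its bytes (layout assumed as above).
///
/// # Safety
/// If the length field exceeds 12, bytes 8..16 must hold a pointer to at
/// least `len` readable bytes that stay alive for `'a`.
pub unsafe fn decode_string_cell<'a>(cell: &'a [u8; 16]) -> &'a [u8] {
    let len = u32::from_ne_bytes([cell[0], cell[1], cell[2], cell[3]]) as usize;
    if len <= 12 {
        // Inline format: the bytes live inside the cell itself.
        &cell[4..4 + len]
    } else {
        // Pointer format: skip the 4-byte prefix and follow the pointer.
        let mut p = [0u8; 8];
        p.copy_from_slice(&cell[8..16]);
        let ptr = u64::from_ne_bytes(p) as usize as *const u8;
        unsafe { std::slice::from_raw_parts(ptr, len) }
    }
}

/// Builds a cell in the same assumed layout, for demonstration only.
/// For long strings the cell borrows `s`; `s` must outlive any decode.
pub fn encode_string_cell(s: &[u8]) -> [u8; 16] {
    let mut cell = [0u8; 16];
    cell[0..4].copy_from_slice(&(s.len() as u32).to_ne_bytes());
    if s.len() <= 12 {
        cell[4..4 + s.len()].copy_from_slice(s);
    } else {
        cell[4..8].copy_from_slice(&s[..4]); // 4-byte prefix
        cell[8..16].copy_from_slice(&(s.as_ptr() as usize as u64).to_ne_bytes());
    }
    cell
}
```

Treating the struct as a plain pointer and length pair is what produces garbage for short strings, since their bytes are stored inline where the pointer would otherwise be.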
P8: INTERVAL struct layout is undocumented
Status: Handled by DuckInterval and read_interval_at.
Symptom: Interval calculations produce wrong results or crashes.
Root cause: DuckDB's INTERVAL is { months: i32, days: i32, micros: i64 }
(16 bytes total). This is not documented in libduckdb-sys. Month conversion
uses 1 month = 30 days (DuckDB's approximation).
Fix: Use VectorReader::read_interval(row) and DuckInterval. See
INTERVAL Type.
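The layout and the 30-day approximation described above can be written down as a short sketch. `IntervalSketch` and its methods are hypothetical names for illustration; they are not the `DuckInterval` API.

```rust
/// Mirrors the layout described above: { months: i32, days: i32, micros: i64 }.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct IntervalSketch {
    pub months: i32,
    pub days: i32,
    pub micros: i64,
}

const MICROS_PER_DAY: i64 = 86_400_000_000;
const DAYS_PER_MONTH: i64 = 30; // the approximation mentioned above

impl IntervalSketch {
    /// Total microseconds using 1 month = 30 days.
    /// Returns None on overflow instead of panicking or wrapping.
    pub fn to_micros(self) -> Option<i64> {
        let month_days = i64::from(self.months).checked_mul(DAYS_PER_MONTH)?;
        let days = month_days.checked_add(i64::from(self.days))?;
        days.checked_mul(MICROS_PER_DAY)?.checked_add(self.micros)
    }
}

/// Reads one interval from a raw column buffer.
///
/// # Safety
/// `data` must point to at least `row + 1` properly aligned intervals.
pub unsafe fn read_interval_sketch(data: *const IntervalSketch, row: usize) -> IntervalSketch {
    unsafe { *data.add(row) }
}
```

With `#[repr(C)]` the struct occupies 16 bytes, matching the size stated above; checked arithmetic keeps the conversion panic-free, which matters inside FFI callbacks.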
P9: loadable-extension dispatch table uninitialised in cargo test
Status: Fixed. InMemoryDb::open() initialises the dispatch table
automatically.
Symptom: All three InMemoryDb unit tests panic at runtime:
thread 'testing::in_memory_db::tests::in_memory_db_opens' panicked at
'DuckDB API not initialized or DuckDB feature omitted'
This failure appears only when running cargo test --features bundled-test.
Regular cargo test (no feature) does not exercise this code path, so CI can
miss it entirely.
Root cause: Cargo's feature-unification merges loadable-extension (from
the main libduckdb-sys dependency) and bundled-full (pulled in by the
duckdb crate's features = ["bundled"]) into a single libduckdb-sys build
with both features active. In loadable-extension mode every DuckDB C API
call is routed through an AtomicPtr<fn> dispatch table, which is normally
populated at extension-load time when DuckDB calls
duckdb_rs_extension_api_init. In cargo test, no DuckDB host process loads
the extension, so the table stays uninitialised and every call panics.
Discovery: This was triggered by the crates.io release workflow (which runs
--all-features) failing on macOS. Regular CI (--no-default-features,
--all-targets) never compiled the bundled-test path, so the bug was hidden
during development and code review.
Fix (implemented in quack-rs 0.6.0):
- src/testing/bundled_api_init.cpp: a thin C++ shim that wraps DuckDB's internal CreateAPIv1() (from duckdb/main/capi/extension_api.hpp) as a C-linkage symbol:

  #include "duckdb/main/capi/extension_api.hpp"
  extern "C" duckdb_ext_api_v1 quack_rs_create_api_v1() {
      return CreateAPIv1();
  }

- build.rs: compiles the shim (via the cc crate) only when the bundled-test feature is active, locating the DuckDB headers from the libduckdb-sys build output directory.
- InMemoryDb::open(): calls init_dispatch_table_once() before opening the connection. That function calls quack_rs_create_api_v1() once and feeds the result through duckdb_rs_extension_api_init, populating all 459 AtomicPtr slots in the dispatch table. A std::sync::Once guard makes it safe to call from any number of threads and test cases.
- CI test-bundled job: runs cargo test --all-targets --features bundled-test on Linux, macOS, and Windows on every PR, so this class of failure is caught before release.
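The Once-guarded initialisation described in the list above can be sketched with a single-slot table. This is a simplified model under stated assumptions: one `AtomicPtr` slot instead of 459, a local Rust function (`real_version`) instead of the C++ shim, and hypothetical names throughout. Only `init_dispatch_table_once` borrows its name from the text, and its body here is not the quack-rs implementation.

```rust
use std::sync::atomic::{AtomicPtr, AtomicUsize, Ordering};
use std::sync::Once;

type ApiFn = fn() -> u32;

/// Stand-in for a real API entry point.
fn real_version() -> u32 {
    1
}

/// One slot of a hypothetical dispatch table; null means "not initialised".
static VERSION_SLOT: AtomicPtr<()> = AtomicPtr::new(std::ptr::null_mut());
static INIT: Once = Once::new();

/// Counts how many times the initialiser body actually ran.
pub static INIT_RUNS: AtomicUsize = AtomicUsize::new(0);

/// Populates the table exactly once, however many threads call it.
pub fn init_dispatch_table_once() {
    INIT.call_once(|| {
        INIT_RUNS.fetch_add(1, Ordering::SeqCst);
        let f: ApiFn = real_version;
        VERSION_SLOT.store(f as *const () as *mut (), Ordering::Release);
    });
}

/// Routes a call through the slot; returns None if it was never populated.
pub fn call_version() -> Option<u32> {
    let raw = VERSION_SLOT.load(Ordering::Acquire);
    if raw.is_null() {
        return None;
    }
    // SAFETY: the only non-null value ever stored is a valid `ApiFn`.
    let f: ApiFn = unsafe { std::mem::transmute::<*mut (), ApiFn>(raw) };
    Some(f())
}
```

Calling `init_dispatch_table_once` from several test threads runs the initialiser body a single time, after which `call_version` finds a populated slot.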
ABI compatibility note: DuckDB's duckdb_ext_api_v1 struct is defined
identically in both the public duckdb_extension.h (used by libduckdb-sys
bindgen) and the internal extension_api.hpp (used by CreateAPIv1()). Both
include the DUCKDB_EXTENSION_API_VERSION_UNSTABLE fields. CreateAPIv1() sets
all 459 fields. The Rust and C++ structs are produced from the same DuckDB
release and therefore stay in sync.
Risk table (using DuckDB's internal C++ API):
| Risk | Mitigation |
|---|---|
| extension_api.hpp is renamed or moved | build.rs fails with a clear compile error |
| CreateAPIv1() is renamed | Same: C++ compile error |
| duckdb_ext_api_v1 gains new fields | CreateAPIv1() fills new fields too |
| duckdb_ext_api_v1 field order changes | Both structs from same DuckDB release, stay in sync |
| libduckdb-sys drops loadable-extension dispatch | Problem disappears; Once guard becomes cheap no-op |
Summary
| Pitfall | SDK status | Your action |
|---|---|---|
| L1: combine config fields | Testable | Test with AggregateTestHarness::combine |
| L2: state double-free | Prevented | Use FfiState::destroy_callback |
| L3: panic across FFI | Prevented | Use init_extension, no unwrap in callbacks |
| L4: validity bitmap SEGFAULT | Prevented | Use VectorWriter::set_null |
| L5: bool UB | Prevented | Use VectorReader::read_bool |
| L6: function set name | Prevented | Use AggregateFunctionSetBuilder |
| L7: LogicalType leak | Prevented | Use LogicalType (RAII) |
| P1: lib name mismatch | Scaffold | Set [lib] name in Cargo.toml |
| P2: API version string | Constant | Use DUCKDB_API_VERSION |
| P3: unit tests insufficient | Documented | Write SQLLogicTest E2E tests |
| P4: submodule not initialized | Build-time | git submodule update --init |
| P5: SQLLogicTest exact match | Documented | Copy output from DuckDB CLI |
| P6: register set silent fail | Prevented | Builder returns Err |
| P7: VARCHAR format undocumented | Prevented | Use VectorReader::read_str |
| P8: INTERVAL layout undocumented | Prevented | Use DuckInterval |
| P9: dispatch table uninitialised | Fixed | InMemoryDb::open() initialises it via C++ shim |