Table Functions

Table functions implement the SELECT * FROM my_function(args) pattern — they return a result set rather than a scalar value. DuckDB table functions have three lifecycle callbacks: bind, init, and scan.

quack-rs provides TableFunctionBuilder plus the helper types BindInfo, InitInfo, FunctionInfo, FfiBindData<T>, FfiInitData<T>, and FfiLocalInitData<T> to eliminate the raw FFI boilerplate.

Lifecycle

| Phase | Callback | Called when | Typical work |
| --- | --- | --- | --- |
| bind | bind_fn | Query is planned | Extract parameters; register output columns; store config in bind data |
| init | init_fn | Execution starts | Allocate per-scan state (cursor, row index, etc.) |
| scan | scan_fn | Each output batch | Fill duckdb_data_chunk with rows; call duckdb_data_chunk_set_size |

The scan callback is called repeatedly until it writes 0 rows in a batch, signalling end-of-results.
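
The termination contract can be illustrated without DuckDB. The following is a minimal sketch in plain Rust, not quack-rs code: `ScanState`, `scan_batch`, and `drive` are hypothetical names standing in for the init data, the scan callback, and DuckDB's executor. The 2048-row cap matches the batch size used in the complete example on this page.

```rust
/// Maximum rows per batch (the generate_series_ext example uses 2048).
const BATCH_CAP: i64 = 2048;

/// Hypothetical stand-in for the per-scan state kept in init data.
struct ScanState {
    pos: i64,
}

/// Stand-in for a scan callback: fills `out` with the next batch and
/// returns the number of rows written (the "chunk size").
fn scan_batch(total: i64, state: &mut ScanState, out: &mut Vec<i64>) -> usize {
    out.clear();
    // Clamp to [0, BATCH_CAP] so a negative `total` yields an empty batch.
    let batch = (total - state.pos).clamp(0, BATCH_CAP);
    for i in 0..batch {
        out.push(state.pos + i);
    }
    state.pos += batch;
    batch as usize
}

/// Stand-in for the executor: calls the scan function until a batch is
/// empty. Returns all rows plus the number of scan calls made.
fn drive(total: i64) -> (Vec<i64>, usize) {
    let mut state = ScanState { pos: 0 };
    let mut chunk = Vec::new();
    let mut rows = Vec::new();
    let mut calls = 0;
    loop {
        calls += 1;
        if scan_batch(total, &mut state, &mut chunk) == 0 {
            break; // a zero-row batch signals end-of-results
        }
        rows.extend_from_slice(&chunk);
    }
    (rows, calls)
}
```

Calling `drive(5)` in this sketch makes two scan calls: one that produces the five rows and a final one that produces none. That trailing empty call is the end-of-results signal.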

Builder API

use quack_rs::table::{TableFunctionBuilder, BindInfo, FfiBindData, FfiInitData};
use quack_rs::types::TypeId;

TableFunctionBuilder::new("my_function")
    .param(TypeId::BigInt)                // positional parameter types
    .bind(my_bind_callback)               // declare output columns inside bind
    .init(my_init_callback)
    .scan(my_scan_callback)
    .register(con)?;

Output columns are declared inside the bind callback using BindInfo::add_result_column, not on the builder itself.

State management

Bind data

Bind data persists from the bind phase through all scan batches. Use FfiBindData<T> to allocate it safely:

struct MyBindData {
    limit: i64,
}

unsafe extern "C" fn my_bind(info: duckdb_bind_info) {
    // The returned duckdb_value is owned by the caller: read it, then destroy it
    let mut param = unsafe { duckdb_bind_get_parameter(info, 0) };
    let n = unsafe { duckdb_get_int64(param) };
    unsafe { duckdb_destroy_value(&mut param) };

    unsafe { FfiBindData::<MyBindData>::set(info, MyBindData { limit: n }) };
}

FfiBindData::set stores the value and registers a destructor so DuckDB frees it at the right time — no Box::into_raw / Box::from_raw needed.
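
For context, this is roughly the boilerplate such a helper replaces. The sketch below is not quack-rs's implementation; `store` and `destroy` are hypothetical names showing the usual pattern of handing a boxed value to a C API together with a destructor callback, which is the pair of arguments a function like `duckdb_bind_set_bind_data` expects. The drop counter exists only to make the destructor observable.

```rust
use std::ffi::c_void;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts drops so the example can show that the destructor actually ran.
static DROPS: AtomicUsize = AtomicUsize::new(0);

struct MyBindData {
    limit: i64,
}

impl Drop for MyBindData {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

/// Destructor with the C ABI: reclaims the box that `store` leaked.
unsafe extern "C" fn destroy<T>(ptr: *mut c_void) {
    if !ptr.is_null() {
        // SAFETY: `ptr` was produced by `Box::into_raw` in `store::<T>`.
        drop(unsafe { Box::from_raw(ptr.cast::<T>()) });
    }
}

/// Leaks `value` to a raw pointer and pairs it with the matching
/// destructor, ready to be passed across an FFI boundary.
fn store<T>(value: T) -> (*mut c_void, unsafe extern "C" fn(*mut c_void)) {
    let dtor: unsafe extern "C" fn(*mut c_void) = destroy::<T>;
    (Box::into_raw(Box::new(value)).cast::<c_void>(), dtor)
}
```

The point of wrapping this in a typed helper is that the pointer cast in `destroy` is only sound when `T` matches the type passed to `store`. A generic wrapper keeps the two in sync by construction.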

Init (scan) state

Per-scan state (e.g., a current row index) uses FfiInitData<T>:

struct MyScanState {
    pos: i64,
}

unsafe extern "C" fn my_init(info: duckdb_init_info) {
    unsafe { FfiInitData::<MyScanState>::set(info, MyScanState { pos: 0 }) };
}

Complete example: generate_series_ext

The hello-ext example registers generate_series_ext(n BIGINT) which emits integers 0 .. n-1. See examples/hello-ext/src/lib.rs for the full source.

// State types used by the callbacks below (fields inferred from their use)
struct GsBindData {
    total: i64,
}

struct GsScanState {
    pos: i64,
}

// Bind: extract `n`, register one output column
unsafe extern "C" fn gs_bind(info: duckdb_bind_info) {
    let mut param = unsafe { duckdb_bind_get_parameter(info, 0) };
    let n = unsafe { duckdb_get_int64(param) };
    // The returned duckdb_value is owned by the caller and must be destroyed
    unsafe { duckdb_destroy_value(&mut param) };

    let out_type = LogicalType::new(TypeId::BigInt);
    unsafe { duckdb_bind_add_result_column(info, c"value".as_ptr(), out_type.as_raw()) };

    unsafe { FfiBindData::<GsBindData>::set(info, GsBindData { total: n }) };
}

// Init: zero-initialise the scan cursor
unsafe extern "C" fn gs_init(info: duckdb_init_info) {
    unsafe { FfiInitData::<GsScanState>::set(info, GsScanState { pos: 0 }) };
}

// Scan: emit a batch of rows
unsafe extern "C" fn gs_scan(info: duckdb_function_info, output: duckdb_data_chunk) {
    let bind = unsafe { FfiBindData::<GsBindData>::get_from_function(info) }.unwrap();
    let state = unsafe { FfiInitData::<GsScanState>::get_mut(info) }.unwrap();

    // At most 2048 rows per batch; clamp at 0 so a negative `n` yields no rows
    let remaining = bind.total - state.pos;
    let batch = remaining.min(2048).max(0) as usize;

    let mut writer = unsafe { VectorWriter::new(duckdb_data_chunk_get_vector(output, 0)) };
    for i in 0..batch {
        unsafe { writer.write_i64(i, state.pos + i as i64) };
    }
    // A batch of 0 rows tells DuckDB the scan is finished
    unsafe { duckdb_data_chunk_set_size(output, batch as idx_t) };
    state.pos += batch as i64;
}

Registration

TableFunctionBuilder::new("generate_series_ext")
    .param(TypeId::BigInt)
    .bind(gs_bind)
    .init(gs_init)
    .scan(gs_scan)
    .register(con)?;

Advanced features

Named parameters

Named parameters let callers pass optional arguments by name (e.g., step := 10):

TableFunctionBuilder::new("gen_series_v2")
    .param(TypeId::BigInt)                    // positional: n
    .named_param("step", TypeId::BigInt)      // named: step := <value>
    .bind(gs_v2_bind)
    .init(gs_v2_init)
    .scan(gs_v2_scan)
    .register(con)?;

In the bind callback, read the named parameter with duckdb_bind_get_named_parameter(info, c"step".as_ptr()).

Local init (per-thread state)

For multi-threaded table functions, use local_init to allocate per-thread state:

TableFunctionBuilder::new("gen_series_v2")
    .param(TypeId::BigInt)
    .bind(gs_v2_bind)
    .init(gs_v2_init)
    .local_init(gs_v2_local_init)            // per-thread state allocation
    .scan(gs_v2_scan)
    .register(con)?;

The local init callback receives duckdb_init_info and can use FfiLocalInitData<T>::set to store per-thread state.
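
One way to picture the split between global and per-thread state is a shared cursor from which each worker claims disjoint row ranges. The sketch below uses only the standard library and is not quack-rs code: `GlobalState`, `LocalState`, `claim`, and `parallel_scan` are hypothetical names, and the spawned threads stand in for DuckDB's scan threads.

```rust
use std::sync::atomic::{AtomicI64, Ordering};
use std::thread;

/// Shared by all threads (the role of the global init data).
struct GlobalState {
    next: AtomicI64,
    total: i64,
}

/// Owned by exactly one thread (the role of the local init data).
struct LocalState {
    rows: Vec<i64>,
}

/// Claims the next range of at most `batch` rows, or `None` when exhausted.
fn claim(global: &GlobalState, batch: i64) -> Option<(i64, i64)> {
    let start = global.next.fetch_add(batch, Ordering::Relaxed);
    if start >= global.total {
        return None;
    }
    Some((start, (start + batch).min(global.total)))
}

/// Runs `threads` workers; each keeps its own `LocalState` and claims
/// ranges from the shared cursor until none remain.
fn parallel_scan(total: i64, threads: usize, batch: i64) -> Vec<i64> {
    let global = GlobalState { next: AtomicI64::new(0), total };
    let mut all = Vec::new();
    thread::scope(|s| {
        let handles: Vec<_> = (0..threads)
            .map(|_| {
                s.spawn(|| {
                    let mut local = LocalState { rows: Vec::new() };
                    while let Some((start, end)) = claim(&global, batch) {
                        local.rows.extend(start..end);
                    }
                    local.rows
                })
            })
            .collect();
        for handle in handles {
            all.extend(handle.join().unwrap());
        }
    });
    // Threads finish in arbitrary order, so sort before returning.
    all.sort_unstable();
    all
}
```

Only the cursor needs synchronisation here. Everything a thread accumulates lives in its own `LocalState`, which is why per-thread state needs no locking.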

Thread control

Use InitInfo::set_max_threads in the global init callback to tell DuckDB how many threads can scan concurrently:

unsafe extern "C" fn gs_v2_init(info: duckdb_init_info) {
    let init_info = unsafe { InitInfo::new(info) };
    // Allow only one thread to scan at a time
    unsafe { init_info.set_max_threads(1) };
    unsafe { FfiInitData::<MyScanState>::set(info, MyScanState { pos: 0 }) };
}

Projection pushdown

Enable projection pushdown to let DuckDB skip unrequested columns:

TableFunctionBuilder::new("my_func")
    .projection_pushdown(true)
    // ...

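Caution: When projection pushdown is enabled, the function must find out which columns DuckDB actually needs using InitInfo::projected_column_count and InitInfo::projected_column_index. Because InitInfo wraps duckdb_init_info, these are called from the init callback; store the result in the init data so the scan callback can write only the projected columns. Writing to non-projected columns causes crashes.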
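
To make the index mapping concrete, here is a plain-Rust sketch with no DuckDB dependency. It assumes that output position `i` carries source column `projected[i]`, which is how the DuckDB C API describes `duckdb_init_get_column_index`. The names `source_value` and `fill_projected` and the three-column layout are hypothetical.

```rust
use std::ops::Range;

/// Hypothetical three-column source: 0 = id, 1 = square, 2 = label.
fn source_value(column: usize, row: i64) -> String {
    match column {
        0 => row.to_string(),
        1 => (row * row).to_string(),
        2 => format!("row-{}", row),
        _ => unreachable!("unknown source column {}", column),
    }
}

/// Fills one output vector per projected column. `projected[i]` is the
/// source column that output position `i` should carry, so columns that
/// were not requested are never computed or written.
fn fill_projected(projected: &[usize], rows: Range<i64>) -> Vec<Vec<String>> {
    projected
        .iter()
        .map(|&col| rows.clone().map(|r| source_value(col, r)).collect())
        .collect()
}
```

With `projected = [2, 0]`, the sketch writes the label column at output position 0 and the id column at position 1, and never touches the square column.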

See examples/hello-ext/src/lib.rs for a complete example using named_param, local_init, and set_max_threads.

Complex parameter types

For parameterised types that TypeId cannot express (e.g. LIST(BIGINT), MAP(VARCHAR, INTEGER), STRUCT(...)), use param_logical and named_param_logical:

use quack_rs::types::LogicalType;

TableFunctionBuilder::new("read_data")
    .param_logical(LogicalType::list(TypeId::Varchar))        // positional LIST param
    .named_param_logical("options", LogicalType::map(         // named MAP param
        TypeId::Varchar, TypeId::Varchar,
    ))
    .bind(bind_fn)
    .init(init_fn)
    .scan(scan_fn)
    .register(con)?;

BindInfo helpers

BindInfo wraps duckdb_bind_info and exposes these methods:

| Method | Description |
| --- | --- |
| add_result_column(name, TypeId) | Declares an output column |
| add_result_column_with_type(name, &LogicalType) | Output column with complex type |
| set_cardinality(rows, is_exact) | Cardinality hint for the optimizer |
| set_error(message) | Report a bind-time error |
| parameter_count() | Number of positional parameters |
| get_parameter(index) | Returns a positional parameter value (duckdb_value) |
| get_named_parameter(name) | Returns a named parameter value (duckdb_value) |
| get_extra_info() | Returns the extra-info pointer set on the function |
| get_client_context() | Returns a ClientContext (requires duckdb-1-5 feature) |

InitInfo helpers

InitInfo wraps duckdb_init_info:

| Method | Description |
| --- | --- |
| projected_column_count() | Number of projected columns (with pushdown) |
| projected_column_index(idx) | Output column index at projection position |
| set_max_threads(n) | Maximum parallel scan threads |
| set_error(message) | Report an init-time error |
| get_extra_info() | Returns the extra-info pointer set on the function |

FunctionInfo helpers

FunctionInfo wraps duckdb_function_info (scan callbacks):

| Method | Description |
| --- | --- |
| set_error(message) | Report a scan-time error |
| get_extra_info() | Returns the extra-info pointer set on the function |

Extra info

Use TableFunctionBuilder::extra_info to attach function-level data that is accessible from all callbacks (bind, init, and scan) via get_extra_info().

Verified output (DuckDB 1.4.4 and 1.5.0)

SELECT * FROM generate_series_ext(5);
-- 0
-- 1
-- 2
-- 3
-- 4

SELECT value * value AS sq FROM generate_series_ext(4);
-- 0
-- 1
-- 4
-- 9

See also