Skip to content

holukas/diive

Repository files navigation

Python PyPI - Version GitHub License PyPI Downloads DOI

diive is currently being prepared for the v1.0 release.

Time series data processing

diive is a Python library for time series processing, focused on ecosystem data. It was originally developed by the ETH Grassland Sciences group for Swiss FluxNet.

CHANGELOG | Releases


Citation

Cite diive using DOI 10.5281/zenodo.10884017. This concept DOI resolves to the latest release, so include the version number in your citation.

BibTeX format:

@software{diive2026,
  author = {Hörtnagl, Lukas},
  title = {diive: Python library for time series processing},
  version = {0.91.0},
  year = {2026},
  doi = {10.5281/zenodo.10884017}
}

Replace version and year with the values for your target release.


Installation

Requires Python 3.12+

pip install diive

Or with uv:

uv pip install diive

Quick start

import diive as dv

# Load example data (a 37-variable ecosystem dataset)
df = dv.load_exampledata_parquet()

# Plot a time series — two-phase: construct, then .plot()
dv.plotting.TimeSeries(series=df['NEE_CUT_REF_orig']).plot()

# Gap-fill with Random Forest
from diive.core.ml.feature_engineer import FeatureEngineer
from diive.gapfilling.randomforest_ts import RandomForestTS

engineer = FeatureEngineer(target_col='NEE_CUT_REF_orig', features_lag=[-1, 1])
df_engineered = engineer.fit_transform(df)

model = RandomForestTS(input_df=df_engineered, target_col='NEE_CUT_REF_orig', n_estimators=100)
model.run()                       # trains the model, then fills gaps
gapfilled = model.results.gapfilled

API

import diive as dv exposes nine domain namespaces. Classes live under the namespace for their area:

import diive as dv

plot = dv.plotting.TimeSeries(series=data)
model = dv.gapfilling.RandomForestTS(input_df=df, target_col='NEE')
Namespace Common exports
dv.plotting TimeSeries, Cumulative, DielCycle, HeatmapDateTime
dv.gapfilling RandomForestTS, XGBoostTS, FluxMDS
dv.analysis GridAggregator, SeasonalTrendDecomposition, BinFitterCP
dv.flux run_chain, FluxConfig, FluxDetectionLimit, WindDoubleRotation
dv.outliers / dv.corrections / dv.qaqc outlier methods, offset corrections, FlagQCF
dv.times / dv.variables timestamp sanitization, derived variables (VPD, potential radiation, ...)

A few I/O helpers are top-level: dv.load_parquet, dv.save_parquet, dv.load_exampledata_parquet.

For the full list, see diive.__all__ and each namespace's __all__.


Examples

106 runnable examples are organized by topic in examples/. They follow Sphinx Gallery format (# %% sections), so they run as plain scripts and convert to HTML docs automatically. Browse by use case in CATALOG.md, or check EXAMPLE_DATASET.md for documentation of the 37-variable dataset used throughout.

uv run python examples/visualization/plot_heatmap_datetime_basic.py
uv run python examples/analysis/analysis_daily_correlation.py
uv run python examples/gapfilling/gapfill_randomforest.py
uv run python examples/flux/fluxprocessingchain/fluxprocessingchain_composable.py

Features

Gap-filling

FeatureEngineer runs an 8-stage feature pipeline (lag features, rolling stats, differencing, EMA, polynomial terms, STL decomposition, timestamps, record numbering). You build the features once and reuse them across models.

Method How it works
XGBoostTS Gradient boosting
RandomForestTS Ensemble learning with SHAP importance
FluxMDS Meteorological similarity, no training needed
Linear interpolation Short gaps only

Long-term variants support multi-year data with USTAR scenario options. See examples/gapfilling/.

Flux processing chain

Post-processing from quality flags through gap-filling, covering Levels 2 to 4.1 following Swiss FluxNet standards. Two entry points:

  • run_chain(data, config) — single call drives the full pipeline (L2 → L3.1 → L3.2 → L3.3 → L4.1) from one FluxConfig. Intentionally simple: fixed defaults for per-detector / per-model knobs (Hampel sub-options, MDS tolerances, ML hyperparameters). Use this for the standard FLUXNET-style workflow.
  • Composable per-level callables (run_level2, run_level31, make_level32_detector + run_level32, run_level33_constant_ustar / run_level33_ustar_detection, run_level41_mds / _rf / _xgb) — full control. Every detector class, model hyperparameter, MDS tolerance, and diagnostic flag is reachable here and only here.

Need a computed driver (e.g. VPD in kPa) for L4.1? Use add_driver(data, series) to put it where L4.1 actually reads from. Call data.gap_stats() at any level for a monthly/annual breakdown with long-gap listing. data.plot_gapfilled_heatmaps() puts all gap-filling methods side by side; data.plot_cumulative_comparison() overlays their cumulative sums on one axes.

Reference: Swiss FluxNet flux processing | Examples: examples/flux/fluxprocessingchain/

Quality control and outlier detection

FlagQCF merges multiple test flags into a single quality indicator with daytime/nighttime separation and USTAR scenario support.

Nine outlier detection methods are available: Hampel filter, Z-score (global, rolling, or split by day/night), local SD, Local Outlier Factor, absolute limits, incremental detection, manual removal, trimmed mean, and stepwise chaining across multiple methods. See examples/preprocessing/outlier_detection/.

Corrections and preprocessing

Tools cover offset correction for measurements, radiation, humidity, and wind direction; threshold and missing value handling; and timestamp sanitization (validation, regularization, frequency detection). See examples/preprocessing/corrections/ and examples/times/.

Analysis

Seasonal-trend decomposition (STL, classical, or harmonic), lagged correlation and binned analysis, 2D grid aggregation, gap detection with monthly/annual breakdown, and percentiles/histograms. See examples/analysis/.

Derived variables

VPD from temperature and humidity, day/night flags from solar geometry, air density, aerodynamic resistance, unit conversions, lagged features, and clear-sky potential radiation. See examples/features/.

Eddy covariance

Flux detection limit from 20 Hz data, maximum covariance lag, pre-whitening bootstrap (PWB) for trace gases (CH4, N2O) with single-period and multi-file parallel variants, an end-to-end per-chunk PWB time-lag detect+remove pipeline (diive-tlag-pwb-detect-remove, plus a Textual TUI diive-tlag-pwb-detect-remove-tui) that splits long raw files into 30-min chunks and writes lag-corrected raw files (line endings preserved), wind double rotation, self-heating correction for open-path IRGAs, USTAR filtering, and random error propagation. See examples/flux/.

Visualization

14+ plot types including time series, cumulative, diel cycle, heatmaps (datetime and year-month), hexbin, histogram, ridgeline, scatter, and anomaly plots. Both Matplotlib and Plotly are supported. See examples/visualization/.

I/O

Load and save parquet files, read single or batch EddyPro output, detect and split irregular files, and format data for FLUXNET submission. See examples/io/.


Contributing

See CLAUDE.md for development setup, coding standards, and testing.


License

diive is released under the GNU General Public License v3.0.