Data
The fundcloud.data package is the single entry point for loading and persisting market data. Every backend implements the same Backend protocol — reads always work, writes are gated by a read_only constructor flag. The Catalog composes a source backend onto a sink backend with watermark-driven incremental refresh. For the task-first walkthrough, start with the Pulling and caching market data guide.
fundcloud.data
Market data: unified Backend abstraction + Catalog orchestrator.
Every data backend (network providers like YF, FMP,
AV, Binance; local format backends like CSV,
Parquet, DuckDB, Memory) implements the single
Backend protocol. Reads always work; writes are gated by the
read_only constructor flag and raise ReadOnlyError when locked.
Catalog binds named datasets to (source, sink) pairs and handles
incremental refresh from sink watermarks via Backend.sync_to.
Network backends are lazy-imported via a module-level __getattr__, so installs
without yfinance / ccxt / httpx keep working.
OHLCV_COLUMNS (module attribute)
Canonical order of the standard OHLCV fields.
Backend
Bases: Protocol
Unified protocol for any data backend.
Every backend is readable. Backends with read_only=False also
accept write and delete. The key argument is the
logical dataset name; single-key sources accept key=None.
BaseBackend
Bases: ABC
Default implementations shared by every concrete backend.
Subclasses must implement read and set the name ClassVar.
Writable backends override write and delete (and call
_check_writable first).
sync_to
sync_to(
sink: Backend,
*,
key: str | None = None,
source_key: str | None = None,
start: Timestamp | str | None = None,
end: Timestamp | str | None = None,
mode: WriteMode = "upsert",
) -> pd.DataFrame
Read from self and write the result to sink under key.
key is the sink key (where to land the data). source_key is
the source key (where to read from); defaults to None so the
source picks its own canonical frame (e.g. a network backend ignores
the key, a single-frame format backend resolves to its lone entry).
Source code in python/fundcloud/data/_base.py
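sync_to reads from self and writes to sink, by default with mode="upsert". This page does not spell out the upsert rule, so the sketch below assumes the usual meaning: incoming rows replace cached rows that share a timestamp, and new timestamps are appended. That matches the Catalog description of re-pulled rows being deduplicated on the timestamp index. `upsert_frames` is a hypothetical helper, not part of fundcloud.data.

```python
from __future__ import annotations

import pandas as pd


# Hypothetical helper illustrating an assumed upsert rule; not fundcloud's code.
def upsert_frames(existing: pd.DataFrame | None, incoming: pd.DataFrame) -> pd.DataFrame:
    """Merge incoming rows into existing ones, keyed on the (timestamp) index."""
    if existing is None or existing.empty:
        return incoming.sort_index()
    combined = pd.concat([existing, incoming])
    # On duplicate timestamps keep the last occurrence, i.e. the incoming row.
    combined = combined[~combined.index.duplicated(keep="last")]
    return combined.sort_index()
```

Because incoming rows win on conflicts, re-pulling an overlapping window is safe: corrected values overwrite the cached ones and nothing is duplicated.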
ReadOnlyError
Bases: RuntimeError
Raised when write / delete is called on a read-only backend.
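To make the read/write split concrete, here is a toy backend following the pattern BaseBackend describes: reads never consult the flag, and each mutating method calls _check_writable before touching state. It is a sketch, not fundcloud's code: `ToyBackend`, its dict storage and the `read` / `write` / `delete` signatures are hypothetical, and the local ReadOnlyError only mirrors the documented base class.

```python
from __future__ import annotations

from typing import Any


class ReadOnlyError(RuntimeError):
    """Local stand-in for fundcloud.data.ReadOnlyError (same base class)."""


class ToyBackend:
    """Hypothetical dict-backed backend illustrating the read_only gate."""

    name = "toy"

    def __init__(self, *, read_only: bool = True) -> None:
        self.read_only = read_only
        self._store: dict[str, Any] = {}

    def _check_writable(self) -> None:
        # Called first by every mutating method, so a locked backend
        # fails before any state changes.
        if self.read_only:
            raise ReadOnlyError(f"{self.name} backend is read-only")

    def read(self, key: str | None = None) -> Any:
        # Reads ignore the flag entirely.
        return self._store.get(key)

    def write(self, key: str, data: Any) -> None:
        self._check_writable()
        self._store[key] = data

    def delete(self, key: str) -> None:
        self._check_writable()
        self._store.pop(key, None)
```

With the default read_only=True both write and delete raise before the store is modified; passing read_only=False is what unlocks them.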
YF
Bases: BaseBackend
Pull OHLCV bars from Yahoo Finance.
Free, keyless, and best-effort. Under rate-limit or outage, errors propagate with no automatic fallback — treat it as a local-dev convenience rather than a production data plane.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| symbols | Sequence[str] \| str | One ticker (string) or many (sequence). | required |
| interval | str | Bar interval; one of | '1d' |
| adjust | bool | Default | True |
Source code in python/fundcloud/data/yf.py
FMP
FMP(
symbols: Sequence[str] | str,
*,
interval: str = "1d",
adjust: bool = True,
api_key: str | None = None,
base_url: str = _FMP_BASE_URL,
)
Bases: BaseBackend
Pull OHLCV bars from the FinancialModelingPrep REST API.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| symbols | Sequence[str] \| str | One ticker (string) or many (sequence). | required |
| interval | str | One of | '1d' |
| adjust | bool | Default | True |
| api_key | str \| None | Falls back to the | None |
| base_url | str | Override the default FMP endpoint (useful for tests). | _FMP_BASE_URL |
Source code in python/fundcloud/data/fmp.py
AV
AV(
symbols: Sequence[str] | str,
*,
interval: str = "1d",
adjust: bool = True,
api_key: str | None = None,
base_url: str = _AV_BASE_URL,
)
Bases: BaseBackend
Pull daily / weekly / monthly OHLCV bars from Alpha Vantage.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| symbols | Sequence[str] \| str | One ticker (string) or many (sequence). | required |
| interval | str | One of | '1d' |
| adjust | bool | Default | True |
| api_key | str \| None | Falls back to | None |
| base_url | str | Override the default endpoint (useful for tests). | _AV_BASE_URL |
Source code in python/fundcloud/data/av.py
Binance
Binance(
symbols: Sequence[str] | str,
*,
interval: str = "1d",
limit: int = 1000,
sandbox: bool = False,
)
Bases: BaseBackend
Fetch spot OHLCV bars from Binance via ccxt.
Source code in python/fundcloud/data/binance.py
CSV
CSV(
path: str | Path,
*,
symbols: Sequence[str] | None = None,
date_col: str = "date",
read_only: bool = True,
)
Bases: BaseBackend
Read one or many CSV files into a single frame.
Source code in python/fundcloud/data/csv.py
Parquet
Bases: BaseBackend
Per-key parquet files under a root directory.
Source code in python/fundcloud/data/parquet.py
DuckDB
Bases: BaseBackend
Persist frames as DuckDB tables inside a single database file.
Source code in python/fundcloud/data/duckdb.py
Memory
Bases: BaseBackend
Dict-backed fundcloud.data.Backend.
Source code in python/fundcloud/data/memory.py
Catalog
A named collection of datasets sharing a single sink Backend.
Source code in python/fundcloud/data/catalog.py
describe
Produce a one-row-per-dataset summary frame.
Source code in python/fundcloud/data/catalog.py
from_spec (classmethod)
Build a catalog from a mapping of {name: {source, source_kwargs, ...}}.
Source code in python/fundcloud/data/catalog.py
load
load(
name: str,
*,
start: Timestamp | str | None = None,
end: Timestamp | str | None = None,
prefer_store: bool = True,
) -> pd.DataFrame
Return rows for name.
When prefer_store=True and the sink already has the dataset, the
sink is the source of truth (callers who want fresh data should call
refresh first). Otherwise the source is pulled and the result
is persisted before being returned.
Call-site start / end win over refresh_kwargs.start /
refresh_kwargs.end; both are forwarded to the underlying
Backend.read.
Source code in python/fundcloud/data/catalog.py
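The decision rules for load fit in a small function. This is a sketch under stated assumptions, not Catalog.load itself: the sink is modelled as a plain dict, `pull` stands in for reading the source backend, and `resolve_window` / `load_sketch` are hypothetical names. The real method also applies start / end when reading from the sink, which the sketch leaves out.

```python
from __future__ import annotations

from typing import Any, Callable, MutableMapping


# Hypothetical helpers mirroring the documented precedence rules.
def resolve_window(start: Any, end: Any, refresh_kwargs: dict[str, Any]) -> tuple[Any, Any]:
    """Call-site start/end win; refresh_kwargs only fills in what was not passed."""
    return (
        start if start is not None else refresh_kwargs.get("start"),
        end if end is not None else refresh_kwargs.get("end"),
    )


def load_sketch(
    name: str,
    sink: MutableMapping[str, Any],
    pull: Callable[[Any, Any], Any],
    *,
    refresh_kwargs: dict[str, Any] | None = None,
    start: Any = None,
    end: Any = None,
    prefer_store: bool = True,
) -> Any:
    start, end = resolve_window(start, end, refresh_kwargs or {})
    if prefer_store and name in sink:
        # Cached copy is the source of truth; the source is not contacted.
        return sink[name]
    data = pull(start, end)  # read from the source backend
    sink[name] = data        # persist before returning
    return data
```

The practical consequence is that a cached dataset never goes stale by itself: only an explicit refresh, or prefer_store=False, reaches the source again.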
refresh
Pull incremental rows for name and upsert them into the sink.
The sink watermark (last_index) is the default start. If
DatasetSpec.refresh_kwargs carries a lookback window, it
is subtracted from the watermark so recently-corrected rows get
re-pulled and deduplicated by mode='upsert'.
Source code in python/fundcloud/data/catalog.py
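The default start for refresh is the sink watermark pushed back by the optional lookback. A sketch of that arithmetic with pandas, assuming lookback is anything pd.Timedelta accepts (as the refresh_kwargs description says) and that an empty sink falls back to refresh_kwargs["start"]; `refresh_start` is a hypothetical helper, not part of the Catalog API.

```python
from __future__ import annotations

import pandas as pd


# Hypothetical helper: computes where an incremental refresh would start.
def refresh_start(sink_frame: pd.DataFrame | None, refresh_kwargs: dict) -> pd.Timestamp | None:
    if sink_frame is None or sink_frame.empty:
        # Nothing cached yet: behave like an initial load (assumption).
        start = refresh_kwargs.get("start")
        return pd.Timestamp(start) if start is not None else None
    watermark = sink_frame.index.max()  # last timestamp already in the sink
    lookback = refresh_kwargs.get("lookback")
    if lookback is not None:
        # Step back so upstream corrections inside this window get re-pulled.
        watermark = watermark - pd.Timedelta(lookback)
    return watermark
```

Rows between the shifted start and the old watermark come back twice, which is harmless because the upsert write replaces them rather than appending duplicates.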
refresh_all
refresh_all(
*,
end: Timestamp | str | None = None,
tags: tuple[str, ...] | None = None,
) -> dict[str, pd.DataFrame]
Refresh every (optionally tag-filtered) dataset.
Source code in python/fundcloud/data/catalog.py
register
register(
name: str,
source: Backend,
*,
store_key: str | None = None,
refresh_kwargs: Mapping[str, Any] | None = None,
tags: tuple[str, ...] = (),
) -> DatasetSpec
Register a dataset. Returns the resulting DatasetSpec.
Source code in python/fundcloud/data/catalog.py
to_spec
Serialise the catalog into a dict that from_spec can read.
Sources are referenced by their fully-qualified class name plus their
constructor kwargs. Callers are responsible for making sure the
referenced modules are importable when from_spec runs.
Source code in python/fundcloud/data/catalog.py
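Resolving a source from its fully-qualified class name plus constructor kwargs is a standard importlib pattern. The sketch below shows both directions. The entry layout assumes only the source / source_kwargs keys from the from_spec description and leaves the other keys out. A standard-library class stands in for a backend so the example runs anywhere, and `build_source` / `class_path` are hypothetical names.

```python
from __future__ import annotations

import importlib
from typing import Any


# Hypothetical helpers; fundcloud's from_spec / to_spec may differ in detail.
def build_source(entry: dict[str, Any]) -> Any:
    """Instantiate entry['source'] (a dotted class path) with entry['source_kwargs']."""
    module_name, _, class_name = entry["source"].rpartition(".")
    module = importlib.import_module(module_name)  # must be importable at load time
    cls = getattr(module, class_name)
    return cls(**entry.get("source_kwargs", {}))


def class_path(obj: Any) -> str:
    """Fully-qualified class name, the form a to_spec-style dump records."""
    cls = type(obj)
    return f"{cls.__module__}.{cls.__qualname__}"


spec = {
    # In a real catalog "source" would be the dotted path of a backend class.
    "one_week": {"source": "datetime.timedelta", "source_kwargs": {"days": 7}},
}
obj = build_source(spec["one_week"])
```

This is also why callers must keep the referenced modules importable: the dotted path is resolved only when from_spec runs.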
DatasetSpec (dataclass)
DatasetSpec(
name: str,
source: Backend,
store_key: str,
refresh_kwargs: dict[str, Any] = dict(),
tags: tuple[str, ...] = (),
)
Declarative binding for a single dataset.
normalize_field
Coerce a column field name to lowercase snake_case.
Examples:
>>> normalize_field("Open")
'open'
>>> normalize_field("CLOSE")
'close'
>>> normalize_field("AdjClose")
'adj_close'
>>> normalize_field("Adj Close")
'adj_close'
>>> normalize_field("VWAP")
'vwap'
>>> normalize_field("HTTPRequest")
'http_request'
Source code in python/fundcloud/data/_columns.py
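The examples above can be reproduced with the common two-regex snake_case recipe: split an acronym from a following capitalised word, then split a lowercase letter or digit from a following capital. This is a hypothetical reimplementation that matches the documented outputs, not the library's source; treating hyphens like spaces is an extra assumption.

```python
import re


# Hypothetical reimplementation; reproduces the documented examples only.
def normalize_field(name: str) -> str:
    s = re.sub(r"[\s\-]+", "_", name.strip())         # "Adj Close" -> "Adj_Close"
    s = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1_\2", s)   # "HTTPRequest" -> "HTTP_Request"
    s = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", s)      # "AdjClose" -> "Adj_Close"
    return re.sub(r"_+", "_", s).lower()               # collapse repeats, lowercase
```

The acronym rule has to run first; otherwise "HTTPRequest" would only be split before the final capital and come out as "httpr_equest".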
normalize_ohlcv_columns
Lowercase + snake_case every column field name.
Handles both flat columns and the canonical (field, symbol)
MultiIndex layout. Returns the same frame (column relabel is in
place); pass a copy in if the caller cares about identity.
Source code in python/fundcloud/data/_columns.py
canonicalize_ohlcv_order
Reorder columns so OHLCV fields appear in canonical order.
Non-OHLCV fields are kept after the canonical ones, in the order they
already appear. Works for both flat and (field, symbol) MultiIndex
layouts. Returns the reordered frame.
Source code in python/fundcloud/data/_columns.py
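A sketch of the reordering rule for both column layouts. It assumes OHLCV_COLUMNS is ("open", "high", "low", "close", "volume"), because the constant's actual value is not shown on this page. `canonical_order` is a hypothetical name and the code is not the library's implementation.

```python
import pandas as pd

OHLCV = ("open", "high", "low", "close", "volume")  # assumed canonical order


# Hypothetical stand-in for canonicalize_ohlcv_order.
def canonical_order(df: pd.DataFrame, canonical=OHLCV) -> pd.DataFrame:
    multi = isinstance(df.columns, pd.MultiIndex)
    fields = df.columns.get_level_values(0) if multi else df.columns
    present = list(dict.fromkeys(fields))  # unique fields in first-seen order
    ordered = [f for f in canonical if f in present] + [f for f in present if f not in canonical]
    if not multi:
        return df[ordered]
    # Keep every (field, symbol) pair, grouped by field in the new order.
    cols = [c for f in ordered for c in df.columns if c[0] == f]
    return df.loc[:, cols]
```

Within each field the symbol order is left as it was, so only the top level is reshuffled.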
fundcloud.data.bars
OHLCV (Bars) utilities — conversion, alignment, resampling.
These are free functions over plain pandas structures. They encode the
canonical data shapes (Bars: DatetimeIndex + top-level OHLCV column
labels, optionally a second-level per-asset index) that the rest of the
library relies on.
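A small frame in the two-level Bars layout described above: a DatetimeIndex, OHLCV field names on the top column level, and one asset per second-level label. The tickers and numbers are placeholders for illustration. Selecting a top-level field yields a wide (date × asset) panel, which is the shape to_prices is documented to return for this kind of input.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2024-01-01", periods=3, freq="D")
fields = ["open", "high", "low", "close", "volume"]
assets = ["AAA", "BBB"]  # placeholder tickers
columns = pd.MultiIndex.from_product([fields, assets])  # (field, asset) pairs
bars = pd.DataFrame(
    # Placeholder values, not realistic prices.
    np.arange(len(dates) * len(columns), dtype=float).reshape(len(dates), -1),
    index=dates,
    columns=columns,
)
close = bars["close"]  # wide panel: one column per asset
```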
align
Align multiple wide frames onto the same index (and columns).
Useful for combining prices + factors + signals before optimisation.
Source code in python/fundcloud/data/bars.py
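One way to put several wide frames on a common grid is to intersect their indexes and columns. The sketch assumes an inner join. This page does not say whether align intersects or unions the labels, or how it fills gaps, so treat `align_inner` as a hypothetical stand-in rather than fundcloud's behaviour.

```python
from __future__ import annotations

from functools import reduce

import pandas as pd


# Hypothetical inner-join alignment; fundcloud's align may use another join.
def align_inner(*frames: pd.DataFrame) -> list[pd.DataFrame]:
    index = reduce(lambda a, b: a.intersection(b), (f.index for f in frames))
    columns = reduce(lambda a, b: a.intersection(b), (f.columns for f in frames))
    # Every frame gets the same row and column labels, in the same order.
    return [f.loc[index, columns] for f in frames]
```

An inner join guarantees no NaN padding is introduced, at the cost of dropping dates or assets missing from any one input.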
as_long
Melt a wide (date × asset) frame to long (date, asset, value).
Source code in python/fundcloud/data/bars.py
as_wide
as_wide(
long: DataFrame,
*,
ts: str = "ts",
asset: str = "asset",
value: str = "value",
) -> pd.DataFrame
Pivot a long (ts, asset, value) frame to wide (date × asset).
Source code in python/fundcloud/data/bars.py
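The wide/long round trip maps onto pandas' melt and pivot. The column names ts / asset / value follow the as_wide defaults. The names as_long actually emits are not shown here, so reusing them is an assumption, and `wide_to_long` / `long_to_wide` are hypothetical helpers.

```python
import pandas as pd


# Hypothetical helpers sketching as_long / as_wide with pandas primitives.
def wide_to_long(wide: pd.DataFrame, *, ts: str = "ts", asset: str = "asset", value: str = "value") -> pd.DataFrame:
    # Move the date index into a column, then melt asset columns into rows.
    return wide.rename_axis(ts).reset_index().melt(id_vars=ts, var_name=asset, value_name=value)


def long_to_wide(long: pd.DataFrame, *, ts: str = "ts", asset: str = "asset", value: str = "value") -> pd.DataFrame:
    # One row per timestamp, one column per asset.
    return long.pivot(index=ts, columns=asset, values=value)
```

pivot raises if a (ts, asset) pair appears twice, so long data should be deduplicated before converting back.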
resample
Resample a Bars frame to a coarser frequency.
Defaults apply the standard OHLCV aggregation: first/max/min/last/sum for open/high/low/close/volume, and last for any other column.
Source code in python/fundcloud/data/bars.py
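The default aggregation can be written as a column-to-function mapping passed to pandas' Resampler.agg. This sketch applies the stated rule (first/max/min/last/sum for open/high/low/close/volume, last for anything else) and reads the top level of (field, symbol) columns. `resample_bars` is a hypothetical name, and fundcloud's own function may differ in bin labelling, closed side or extra options.

```python
import pandas as pd

OHLCV_AGG = {"open": "first", "high": "max", "low": "min", "close": "last", "volume": "sum"}


# Hypothetical sketch of the documented default aggregation.
def resample_bars(bars: pd.DataFrame, rule: str) -> pd.DataFrame:
    how = {}
    for col in bars.columns:
        field = col[0] if isinstance(col, tuple) else col  # top level of (field, symbol)
        how[col] = OHLCV_AGG.get(field, "last")            # any other column: last value
    return bars.resample(rule).agg(how)
```

For example, four daily bars resampled with rule "2D" collapse into two bars, each opening at the first open and closing at the last close of its two-day bin.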
to_log_returns
to_log_returns(
prices_or_bars: DataFrame | Series,
*,
field: PriceField = "close",
dropna: bool = True,
) -> pd.DataFrame | pd.Series
Convenience alias for to_returns(..., method='log').
Source code in python/fundcloud/data/bars.py
to_prices
Extract a wide per-asset price panel from a Bars frame.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| bars | DataFrame | Either a wide frame whose columns are asset names (then returned as is, cast to float), or a frame with a two-level column index where the top level is the OHLCV field. | required |
| field | PriceField | Which field to pull when | 'close' |
Source code in python/fundcloud/data/bars.py
to_returns
to_returns(
prices_or_bars: DataFrame | Series,
*,
field: PriceField = "close",
method: Literal["simple", "log"] = "simple",
dropna: bool = True,
) -> pd.DataFrame | pd.Series
Convert prices to period returns.
Accepts a wide price panel, a Bars DataFrame, or a single price
Series. Returns have the same shape and index as the input, minus the
first row if dropna is True.
Source code in python/fundcloud/data/bars.py
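Both return definitions come from the ratio of consecutive prices: simple returns are p_t / p_{t-1} - 1 and log returns are ln(p_t / p_{t-1}), which is what to_log_returns selects. The sketch works on a price panel or Series only (no Bars field selection). It implements dropna as "drop the first row", following the sentence above. `returns_sketch` is a hypothetical name.

```python
import numpy as np
import pandas as pd


# Hypothetical sketch of to_returns on an already-extracted price panel/Series.
def returns_sketch(prices, *, method: str = "simple", dropna: bool = True):
    ratio = prices / prices.shift(1)  # p_t / p_{t-1}; first row is NaN
    if method == "simple":
        out = ratio - 1.0
    elif method == "log":
        out = np.log(ratio)
    else:
        raise ValueError(f"unknown method: {method!r}")
    return out.iloc[1:] if dropna else out
```

Log returns add up across periods while simple returns compound, which is why both variants are offered.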
fundcloud.data.catalog
Catalog — name → (source Backend, sink Backend key, refresh policy).
A Catalog is the orchestrator that binds a user-facing dataset name to a
read backend (the source) and a key inside the catalog's sink backend
(the cache). Refreshes call Backend.sync_to with mode='upsert'
so overlapping rows from re-pulls are deduplicated on the timestamp index.
Per-dataset overrides are persisted in DatasetSpec.refresh_kwargs
with these recognised keys:
- start: minimum date to pull on initial load.
- end: maximum date (rare; usually omitted).
- lookback: pd.Timedelta-compatible window subtracted from the sink watermark on refresh. Used to re-pull recent rows that upstream may correct (corporate actions, restatements, exchange revisions).
The spec format is a plain Python dict; YAML is out of scope on purpose
(callers do yaml.safe_load(path.read_text()) and pass the dict in).
See Catalog.to_spec / Catalog.from_spec for round-trippable
serialisation.