Define versioning expectations for pilot conventions #139

Open
maxrjones wants to merge 3 commits into main from docs/versioning-policy

@maxrjones

Member

This PR documents the consensus on an initial convention version from zarr-conventions/proj#20, zarr-conventions/zarr-conventions-spec#29, zarr-conventions/zarr-conventions-spec#7, and #102. This consensus unlocks solving zarr-conventions/proj#19 via a v0.1 release.

cc participants in those discussions @emmanuelmathot @d-v-b @kylebarron @pvanlaake

This PR also intentionally defers items where a clear consensus has not been reached, so that solving zarr-conventions/proj#19 is not blocked by those discussions.

@maxrjones

Member Author

@vincentsarago @kylebarron it would be helpful for you to review/comment if you have opinions here

Comment thread CONTRIBUTING.md Outdated

The GeoZarr specification is a document that **references** a set of conventions at pinned versions. It is versioned independently of those conventions, on its own editorial cadence: a new GeoZarr release may update prose or re-point a reference without any convention changing, and a convention may release a new version without forcing an immediate GeoZarr release.

Each GeoZarr release records the exact convention versions it references (in the release notes and the specification's normative references), so that a given GeoZarr version resolves to a specific, reproducible set of conventions.
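
As an illustration only (not part of the proposed CONTRIBUTING.md text), the pinning described above could be sketched as a lookup table from a GeoZarr release to the convention versions it references. The table name, the `resolve` helper, and every version number below are hypothetical.

```python
from typing import Dict

# Hypothetical release table: GeoZarr version -> pinned convention versions.
# None of these version numbers come from the actual specification.
GEOZARR_RELEASES: Dict[str, Dict[str, str]] = {
    "0.1.0": {"proj": "0.1.0"},
    "0.1.1": {"proj": "0.1.0"},  # editorial re-release, same convention pin
    "0.2.0": {"proj": "0.2.0"},  # re-pointed to a newer convention version
}


def resolve(geozarr_version: str) -> Dict[str, str]:
    """Return the exact convention versions a GeoZarr release references."""
    try:
        # Copy so callers cannot mutate the release table.
        return dict(GEOZARR_RELEASES[geozarr_version])
    except KeyError:
        raise ValueError(f"unknown GeoZarr release: {geozarr_version}") from None


print(resolve("0.1.1"))
```

Two GeoZarr releases can resolve to the same convention set (a prose-only update), and a convention can publish a version that no GeoZarr release references yet.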

I am not sure that this is a good approach. Data sets may hang around for decades; should the conventions they use then fade out of GeoZarr? I would propose that once a convention is referenced in GeoZarr, it persists for all eternity (in the registry that is on the drawing board).

Member Author

I have removed this portion as out of scope for this PR, so we can discuss it independently from the expectations for pilot conventions. Please expect a follow-up PR.

@pvanlaake pvanlaake left a comment

Good addition overall, a few suggestions inlined. One global comment: how does any/all of this align with the Zarr convention work, i.e. those conventions not included in GeoZarr?

@geospatial-jeff


Why is the GeoZarr spec needed at all if it's just referencing other conventions?

This is all feeling a lot like Python packaging:

  • Libraries are built and published independently of each other.
  • The Python ecosystem has tools like uv, pixi, and poetry that resolve the environment based on the dependencies required by a project.

In this metaphor, GeoZarr is a Python "metapackage" which itself doesn't have any code but simply references other things.

There isn't a metapackage spec, because it's a packaging problem, not a spec problem. So why is a GeoZarr spec needed?

@vincentsarago


> Why is the GeoZarr spec needed at all if it's just referencing other conventions?

Well, I would not go as far as that!

The spec is needed, but I don't think the spec should say which versions of the conventions are required. To me this will open a 🕳️.

Take the STAC specification: an Item follows the STAC spec v1.0.0 but can have multiple extensions with specific versions, and it is still a valid STAC Item. If we say a GeoZarr HAS TO have proj convention v1.0.0, what happens when proj v1.0.1 comes out? Will a provider that creates GeoZarr have to wait for the GeoZarr spec to be updated? This will put a large burden on the GeoZarr maintainers to keep up to date with the conventions. Also, what happens if a provider wants to create a GeoZarr with its own mix of versions (e.g. proj 1.1, multiscale 2.0)? Again, the GeoZarr spec won't cover this case.

The Python example from @geospatial-jeff is interesting, but in the Python environment you can specify lower and upper version limits, which won't be possible here if we need a single conformance class.
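
To make the contrast concrete, here is a minimal Python sketch of the two checks: an exact pin (one conformance class per version) versus a lower/upper range as in Python packaging. It only illustrates the argument. The helper names, the range bounds, and the plain `major.minor.patch` parsing are assumptions, not anything defined by GeoZarr or the conventions.

```python
from typing import Dict, Tuple

Version = Tuple[int, int, int]


def parse(v: str) -> Version:
    """Parse 'major[.minor[.patch]]' into a 3-tuple, padding with zeros."""
    parts = [int(p) for p in v.split(".")]
    while len(parts) < 3:
        parts.append(0)
    return (parts[0], parts[1], parts[2])


def satisfies_pin(declared: str, pinned: str) -> bool:
    """Exact-pin model: only the pinned version is acceptable."""
    return parse(declared) == parse(pinned)


def satisfies_range(declared: str, lower: str, upper: str) -> bool:
    """Range model: lower bound inclusive, upper bound exclusive."""
    return parse(lower) <= parse(declared) < parse(upper)


# A dataset written after proj v1.0.1 came out (versions from the example above).
dataset: Dict[str, str] = {"proj": "1.0.1", "multiscale": "2.0"}

# Hypothetical requirements a spec release could state.
PINNED = {"proj": "1.0.0", "multiscale": "2.0.0"}
RANGES = {"proj": ("1.0.0", "2.0.0"), "multiscale": ("2.0.0", "3.0.0")}

pin_ok = all(satisfies_pin(dataset[n], PINNED[n]) for n in PINNED)
range_ok = all(satisfies_range(dataset[n], *RANGES[n]) for n in RANGES)
print(pin_ok, range_ok)
```

Under the exact pin, the patch release alone makes the dataset fail until the spec is re-released; under the range it keeps passing.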

@geospatial-jeff


> The spec is needed, but I don't think the spec should say which versions of the conventions are required. To me this will open a 🕳️.

Yup, this is really the only option when vendoring other specifications, as GeoZarr does. If any of the conventions beneath GeoZarr makes a breaking change, then GeoZarr itself will be forced to adopt that breaking change, causing end users to constantly migrate their GeoZarr datasets / implementations whenever any of the underlying conventions change.

I also prefer the STAC model for extension versioning. It's more lightweight and more flexible.

@maxrjones

Member Author

> I also prefer the STAC model for extension versioning. It's more lightweight and more flexible.

I also like the STAC model for extension versioning because it's lighter and more flexible. My one hesitation is whether it actually resolves the maintenance-fatigue failure mode Patrick raised in #141, or just relocates it. From the outside, STAC's extension ecosystem looks like it leans heavily on a small number of maintainers, and I can't tell whether that's held together by structural mechanisms that would survive any individual stepping away, or by the current maintainers still having energy. If the latter, the risk is real but opaque to those of us benefiting from their work.

Concretely for GeoZarr: pinning concentrates the burden at the spec layer (every convention patch risks forcing a re-release), while the conformance-class model pushes it out to each convention's own maintainers. The conformance-class model lowers central fatigue but multiplies the number of small maintainer pools that each carry their own bus-factor risk. Does STAC have an actual answer to that, or has it just not hit the wall yet?

@maxrjones

Member Author

Thanks for the engagement, folks!

I've substantially narrowed the scope of this PR following this discussion. The PR now covers only convention-level versioning mechanics and permanence (Sections 7.1–7.3). This is the minimal set needed to unblock a proj v0.1 release (zarr-conventions/proj#19).

Moved out, not dropped:

  • The relationship between GeoZarr and constituent conventions (raised by @vincentsarago, and the scope question from @geospatial-jeff) moves to a follow-up PR with a reworked proposal. Opening tomorrow or Monday, but I first need to discuss the implications for the OGC process with the TC chair.
  • The release gate and its sequencing with the OGC process move to a separate follow-up, since they deserve their own review.

Deferred upstream:

Per @pvanlaake's global comment about alignment with non-GeoZarr conventions: declaration mechanics (how versions are carried in zarr_conventions metadata, and composition/scoping across a hierarchy) are framework-level policy and belong in zarr-conventions-spec, not here. Section 7.1 now references the framework rather than defining this. The version-numbering scheme likewise stays deferred to #102 / zarr-conventions/zarr-conventions-spec#29. I'll open framework issues for both.

5 participants