Commit 5585866

Merge pull request #502 from hotosm/feat/visibility-and-anon-reads
Feature : visibility on the models
2 parents 0358a80 + 6fd71e7 commit 5585866

23 files changed

Lines changed: 482 additions & 129 deletions

backend/ARCHITECTURE.md

Lines changed: 14 additions & 10 deletions
@@ -23,7 +23,7 @@ What happens behind each step:
 | Step | What the backend does | Output |
 | --- | --- | --- |
 | **1. AOI** | Validates polygon, stores in postgres. | Row in `datasets_aoi`. |
-| **2. Build dataset** | Async worker downloads OAM tiles + OSM labels for the AOI, uploads chips + `labels.geojson` to MinIO, registers a STAC item under `datasets/`. | Published STAC dataset. Poll `GET /datasets/{id}/` until `build_status=published`. |
+| **2. Build dataset** | Async worker downloads OAM tiles + OSM labels for the AOI, uploads chips + `labels.geojson` to MinIO, registers a STAC item under `datasets/`. | Built STAC dataset. Poll `GET /datasets/{id}/` until `status=built`. |
 | **3. Submit training** | Async worker submits a ZenML pipeline. ZenML schedules an orchestrator + step pods on the autoscaling ml pool (`split -> train -> eval -> onnx`). Worker polls ZenML status into the DB every 30s. | Trained `weights.pt` + `model.onnx` in MinIO. Poll `GET /trainings/{id}/` until `status=completed`. Tail with `GET /trainings/runs/{run_id}/logs/`. |
 | **4. Promote** | API builds a versioned STAC `local-models/` item from the run's hyperparameters + asset URLs, validates against the base-model's `fair:hyperparameters_spec`, publishes. | `local_model_stac_id`. |
 | **5. Predict** | Async worker downloads chips for the requested bbox, submits an inference pipeline, then post-processes the geojson into `.fgb` + `.pmtiles` via tippecanoe. | Three presigned URLs at `GET /predictions/{id}/result/` once `results_ready=true`. |
@@ -69,7 +69,8 @@ erDiagram
         string stac_id UK "STAC item id under datasets/"
         string title
         url source_imagery "TMS template"
-        string build_status "draft|building|published|failed"
+        string status "draft|building|built|failed"
+        string visibility "private|public"
         bigint user_id FK "-> OsmUser.osm_id"
         datetime created_at
         datetime last_modified
@@ -86,7 +87,8 @@ erDiagram
     LocalModel {
         int id PK
         string name UK "= ZenML model_name = STAC mlm:name"
-        string status "draft|published|archived"
+        string status "active|archived"
+        string visibility "private|public"
         bigint user_id FK
         datetime created_at
     }
@@ -116,7 +118,7 @@ erDiagram
         smallint zoom
         json params "confidence_threshold, ..."
         bool remove_osm
-        bool is_public
+        string visibility "private|public"
         text description
         string status
         bool results_ready "geojson+fgb+pmtiles materialized"
@@ -130,7 +132,7 @@ erDiagram

 ## 3. API reference

-All endpoints live under `/api/v1/` and require `Authorization: Bearer <token>`. The token is `FAIR_DEV_TOKEN` when `AUTH_PROVIDER=dev`, or a Hanko-issued JWT when `AUTH_PROVIDER=hanko`. Swagger UI: `/api/docs/`. OpenAPI: `/api/schema/`.
+All endpoints live under `/api/v1/`. Writes and owner-scoped reads require `Authorization: Bearer <token>`. The token is `FAIR_DEV_TOKEN` when `AUTH_PROVIDER=dev`, or a Hanko-issued JWT when `AUTH_PROVIDER=hanko`. `GET` on datasets, local-models, and predictions is open anonymously for rows with `visibility="public"`. Private rows are invisible to anonymous callers (404, not 401). Swagger UI: `/api/docs/`. OpenAPI: `/api/schema/`.

 ### Core flow

@@ -140,9 +142,11 @@ All endpoints live under `/api/v1/` and require `Authorization: Bearer <token>`.
 | POST | `/aois/` | Create an AOI polygon (GeoJSON Feature) |
 | GET | `/aois/` | List AOIs (bbox-filterable) |
 | POST | `/datasets/build/` | Enqueue a dataset build job. `aoi_ids`, `source_imagery` (TMS), `zoom`, `label_tasks`, `label_classes`, `keywords` (allowed: `building`, `road`, `tree`, `water`, `landuse`), `label_type`, `geometry_type` |
-| GET | `/datasets/{id}/?expand=stac` | Inspect dataset, with STAC metadata + presigned `chips`/`labels` URLs once published |
-| GET | `/local-models/` | List published local models (the *family*; STAC has per-version detail) |
+| GET | `/datasets/{id}/?expand=stac` | Inspect dataset, with STAC metadata + presigned `chips`/`labels` URLs once `status=built` |
+| POST | `/datasets/{id}/{publish,unpublish}/` | Toggle dataset `visibility` (anonymous read) |
+| GET | `/local-models/` | List local models (filterable by `status`, `visibility`, `user`) |
 | GET | `/local-models/{id}/runs/` | List ZenML pipeline runs that produced this model |
+| POST | `/local-models/{id}/{publish,unpublish}/` | Toggle local-model `visibility` (anonymous read) |
 | POST | `/trainings/submit/` | Enqueue a finetune. `base_model_stac_id`, `dataset_stac_id`, `model_name`, `overrides` (must match base-model's `fair:hyperparameters_spec`) |
 | GET | `/trainings/{id}/` | Run state including `zenml_run_id` |
 | GET | `/trainings/runs/{run_id}/status/` | Force-poll ZenML; refreshes the DB row |
@@ -153,7 +157,7 @@ All endpoints live under `/api/v1/` and require `Authorization: Bearer <token>`.
 | GET | `/predictions/{id}/` | Status + assets (presigned) once `results_ready=true` |
 | GET | `/predictions/{id}/result/` | Just the three presigned URLs (geojson / fgb / pmtiles); 409 until `results_ready` |
 | GET | `/predictions/runs/{run_id}/{status,logs}/` | Same shape as trainings |
-| POST | `/predictions/{id}/{publish,unpublish}/` | Toggle `is_public` (anonymous read of result) |
+| POST | `/predictions/{id}/{publish,unpublish}/` | Toggle prediction `visibility` (anonymous read of result) |

 ### Schema endpoints

@@ -203,11 +207,11 @@ Response carries `properties.id` -> call this `AOI_ID`.
 }
 ```

-Response: `id` (-> `DATASET_ID`), `stac_id` (-> `STAC_ID`), `build_status: "building"`.
+Response: `id` (-> `DATASET_ID`), `stac_id` (-> `STAC_ID`), `status: "building"`, `visibility: "private"`.

 **3. Wait for the build** : `GET /api/v1/datasets/{DATASET_ID}/`

-Poll until `build_status == "published"` (~2 min for this AOI).
+Poll until `status == "built"` (~2 min for this AOI). To make it readable by anonymous users, follow up with `POST /api/v1/datasets/{DATASET_ID}/publish/`.

 **4. Submit the fine-tune** : `POST /api/v1/trainings/submit/`
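The revised step 3 in `ARCHITECTURE.md` ("poll until `status == "built"`") could be scripted roughly as below. This is a minimal sketch, not part of the commit: `wait_until_built`, its parameters, and the `fetch_dataset` callable (assumed to return the parsed JSON body of `GET /api/v1/datasets/{DATASET_ID}/`) are hypothetical names introduced for illustration. Only the `status` values come from the diff.

```python
import time
from typing import Callable


def wait_until_built(
    fetch_dataset: Callable[[], dict],
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll a dataset until it reports status == "built".

    fetch_dataset is a hypothetical callable returning the parsed JSON
    body of GET /api/v1/datasets/{DATASET_ID}/ (an assumption of this sketch).
    """
    waited = 0.0
    while True:
        body = fetch_dataset()
        if body["status"] == "built":
            return body
        if body["status"] == "failed":
            # "failed" is a terminal status in the Dataset.Status enum.
            raise RuntimeError("dataset build failed")
        if waited >= timeout_s:
            raise TimeoutError("dataset was not built in time")
        sleep(interval_s)
        waited += interval_s
```

A built dataset still has `visibility: "private"`, so anonymous access would additionally need the `POST .../publish/` call described in the diff.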

backend/README.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -42,6 +42,8 @@ OpenAPI schema at `/api/schema/`, Swagger UI at `/api/docs/`, ReDoc at `/api/red
4242

4343
`AUTH_PROVIDER` selects the auth backend. Both share one contract: `Authorization: Bearer <token>`. `hanko` (production) validates a per-user JWT issued by Hanko (sent via Bearer header or `hanko` cookie). `dev` (local only) compares the Bearer token against the static `FAIR_DEV_TOKEN`; anyone with the token gets full dev-user access. Same header in dev and prod, only the issuer differs.
4444

45+
`GET` on datasets, local-models, and predictions is open to anonymous callers for rows with `visibility="public"`. Owner-scoped lifecycle data (AOIs, trainings, feedback, notifications) and every write require Bearer auth.
46+
4547
| Name | Required | Default | Description |
4648
|------|----------|---------|-------------|
4749
| `AUTH_PROVIDER` | no | `hanko` | One of `hanko`, `dev`. |

backend/accounts/permissions.py

Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -30,3 +30,12 @@ def has_object_permission(self, request, view, obj) -> bool:
3030
if _is_admin(request.user):
3131
return True
3232
return getattr(obj, "user", None) == request.user
33+
34+
35+
class PublishedReadOrAuthenticatedWrite(permissions.BasePermission):
36+
"""View-level: SAFE methods open; non-SAFE requires authentication."""
37+
38+
def has_permission(self, request, view) -> bool:
39+
if request.method in permissions.SAFE_METHODS:
40+
return True
41+
return bool(request.user and request.user.is_authenticated)
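The rule encoded by `PublishedReadOrAuthenticatedWrite` can be modelled without Django REST framework. The sketch below is illustrative only: `view_level_allows` is a hypothetical helper, and the tuple mirrors DRF's `permissions.SAFE_METHODS`. Note that this view-level check lets every read through; hiding private rows is left to the queryset filter in `backend/datasets/views.py`.

```python
# Mirrors rest_framework.permissions.SAFE_METHODS.
SAFE_METHODS = ("GET", "HEAD", "OPTIONS")


def view_level_allows(method: str, is_authenticated: bool) -> bool:
    """Hypothetical stand-in for has_permission(): reads open, writes need auth."""
    if method in SAFE_METHODS:
        return True
    return is_authenticated
```

For example, `view_level_allows("GET", False)` is `True`, while `view_level_allows("POST", False)` is `False`.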

backend/datasets/admin.py

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -5,8 +5,8 @@
55

66
@geoadmin.register(Dataset)
77
class DatasetAdmin(geoadmin.GISModelAdmin):
8-
list_display = ["title", "stac_id", "build_status", "user", "created_at"]
9-
list_filter = ["build_status", "created_at"]
8+
list_display = ["title", "stac_id", "status", "visibility", "user", "created_at"]
9+
list_filter = ["status", "visibility", "created_at"]
1010
search_fields = ["title", "stac_id", "user__username"]
1111
readonly_fields = ["created_at", "last_modified"]
1212
date_hierarchy = "created_at"
Lines changed: 77 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,77 @@
1+
from django.conf import settings
2+
from django.db import migrations, models
3+
4+
5+
def forward(apps, schema_editor):
6+
Dataset = apps.get_model("datasets", "Dataset")
7+
status_map = {
8+
"draft": "draft",
9+
"building": "building",
10+
"published": "built",
11+
"failed": "failed",
12+
}
13+
for row in Dataset.objects.all().only("id", "build_status"):
14+
new_status = status_map.get(row.build_status, "draft")
15+
new_visibility = "public" if row.build_status == "published" else "private"
16+
Dataset.objects.filter(pk=row.pk).update(
17+
status=new_status, visibility=new_visibility
18+
)
19+
20+
21+
def backward(apps, schema_editor):
22+
Dataset = apps.get_model("datasets", "Dataset")
23+
status_map = {
24+
"draft": "draft",
25+
"building": "building",
26+
"built": "published",
27+
"failed": "failed",
28+
}
29+
for row in Dataset.objects.all().only("id", "status"):
30+
Dataset.objects.filter(pk=row.pk).update(build_status=status_map.get(row.status, "draft"))
31+
32+
33+
class Migration(migrations.Migration):
34+
dependencies = [
35+
("datasets", "0002_remove_dataset_build_error"),
36+
migrations.swappable_dependency(settings.AUTH_USER_MODEL),
37+
]
38+
39+
operations = [
40+
migrations.RemoveIndex(
41+
model_name="dataset",
42+
name="datasets_da_build_s_7fd7fa_idx",
43+
),
44+
migrations.AddField(
45+
model_name="dataset",
46+
name="status",
47+
field=models.CharField(
48+
choices=[
49+
("draft", "Draft"),
50+
("building", "Building"),
51+
("built", "Built"),
52+
("failed", "Failed"),
53+
],
54+
default="draft",
55+
max_length=20,
56+
),
57+
),
58+
migrations.AddField(
59+
model_name="dataset",
60+
name="visibility",
61+
field=models.CharField(
62+
choices=[("private", "Private"), ("public", "Public")],
63+
db_index=True,
64+
default="private",
65+
max_length=20,
66+
),
67+
),
68+
migrations.AddIndex(
69+
model_name="dataset",
70+
index=models.Index(fields=["status"], name="datasets_da_status_f1863c_idx"),
71+
),
72+
migrations.RunPython(forward, backward),
73+
migrations.RemoveField(
74+
model_name="dataset",
75+
name="build_status",
76+
),
77+
]
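The per-row rule applied by `forward` above can be read as a pure function. The following is a sketch for review purposes, not code from the commit; `migrate_row` and `STATUS_MAP` are hypothetical names, while the mapping values are copied from the migration.

```python
# Same mapping as status_map in forward().
STATUS_MAP = {
    "draft": "draft",
    "building": "building",
    "published": "built",
    "failed": "failed",
}


def migrate_row(build_status: str) -> tuple[str, str]:
    """Return (status, visibility) for one legacy build_status value."""
    status = STATUS_MAP.get(build_status, "draft")  # unknown values fall back to draft
    visibility = "public" if build_status == "published" else "private"
    return status, visibility
```

Under this reading, only rows that were `published` become `public`, and every other row starts out `private`. The `backward` function restores `build_status` from `status` alone, so a dataset that is `built` but `private` would appear to come back as `published` after a rollback.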

backend/datasets/models.py

Lines changed: 7 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -2,21 +2,23 @@
22
from django.db import models, transaction
33

44
from accounts.models import OsmUser
5+
from shared.enums import Visibility
56
from shared.validators import validate_geometry
67

78

89
class Dataset(models.Model):
9-
class BuildStatus(models.TextChoices):
10+
class Status(models.TextChoices):
1011
DRAFT = "draft", "Draft"
1112
BUILDING = "building", "Building"
12-
PUBLISHED = "published", "Published"
13+
BUILT = "built", "Built"
1314
FAILED = "failed", "Failed"
1415

1516
stac_id = models.CharField(max_length=200, unique=True)
1617
title = models.CharField(max_length=200)
1718
source_imagery = models.URLField()
18-
build_status = models.CharField(
19-
max_length=20, choices=BuildStatus.choices, default=BuildStatus.DRAFT
19+
status = models.CharField(max_length=20, choices=Status.choices, default=Status.DRAFT)
20+
visibility = models.CharField(
21+
max_length=20, choices=Visibility.choices, default=Visibility.PRIVATE, db_index=True
2022
)
2123
user = models.ForeignKey(
2224
OsmUser, to_field="osm_id", on_delete=models.CASCADE, related_name="datasets"
@@ -26,7 +28,7 @@ class BuildStatus(models.TextChoices):
2628

2729
class Meta:
2830
indexes = [
29-
models.Index(fields=["build_status"]),
31+
models.Index(fields=["status"]),
3032
models.Index(fields=["user"]),
3133
]
3234
ordering = ["-created_at"]

backend/datasets/serializers.py

Lines changed: 5 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -96,7 +96,8 @@ class Meta:
9696
"stac_id",
9797
"title",
9898
"source_imagery",
99-
"build_status",
99+
"status",
100+
"visibility",
100101
"stac_url",
101102
"user",
102103
"star_count",
@@ -109,7 +110,8 @@ class Meta:
109110
read_only_fields = [
110111
"id",
111112
"stac_id",
112-
"build_status",
113+
"status",
114+
"visibility",
113115
"stac_url",
114116
"user",
115117
"star_count",
@@ -130,7 +132,7 @@ def get_stac(self, obj: Dataset) -> dict | None:
130132

131133
@extend_schema_field(DatasetAssetsSerializer(allow_null=True))
132134
def get_assets(self, obj: Dataset) -> dict[str, str] | None:
133-
if obj.build_status != Dataset.BuildStatus.PUBLISHED:
135+
if obj.status != Dataset.Status.BUILT:
134136
return None
135137
from shared.storage import StoragePaths, presigned_get_url
136138

backend/datasets/tasks.py

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -68,13 +68,13 @@ def build_dataset(
6868
client = for_user(str(dataset.user.osm_id))
6969
published_id = client.register_dataset(params, paths=BackendDatasetPaths)
7070
dataset.stac_id = published_id
71-
dataset.build_status = Dataset.BuildStatus.PUBLISHED
72-
dataset.save(update_fields=["stac_id", "build_status", "last_modified"])
71+
dataset.status = Dataset.Status.BUILT
72+
dataset.save(update_fields=["stac_id", "status", "last_modified"])
7373
invalidate_stac_cache(DATASETS_COLLECTION, published_id)
7474
except Exception:
7575
logger.exception("dataset build failed for %s", dataset_id)
76-
dataset.build_status = Dataset.BuildStatus.FAILED
77-
dataset.save(update_fields=["build_status", "last_modified"])
76+
dataset.status = Dataset.Status.FAILED
77+
dataset.save(update_fields=["status", "last_modified"])
7878
raise
7979

8080

backend/datasets/views.py

Lines changed: 37 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,6 @@
11
import secrets
22

3+
from django.db.models import Q
34
from django.http import HttpResponse
45
from django.shortcuts import get_object_or_404
56
from django.utils.text import slugify
@@ -15,7 +16,14 @@
1516
from rest_framework_gis.filters import InBBoxFilter
1617

1718
from accounts.authentication import OsmAuthentication
18-
from accounts.permissions import IsAdmin, IsOwnerOrAdminOrReadOnly
19+
from accounts.permissions import (
20+
IsAdmin,
21+
IsOwnerOrAdmin,
22+
IsOwnerOrAdminOrReadOnly,
23+
PublishedReadOrAuthenticatedWrite,
24+
_is_admin,
25+
)
26+
from shared.enums import Visibility
1927
from shared.integrations.stac import (
2028
DATASETS_COLLECTION,
2129
FAIR_PINNED_PROPERTY,
@@ -61,20 +69,28 @@ class DatasetViewSet(viewsets.ModelViewSet):
6169
queryset = Dataset.objects.all()
6270
serializer_class = DatasetSerializer
6371
authentication_classes = [OsmAuthentication]
64-
permission_classes = [IsAuthenticated, IsOwnerOrAdminOrReadOnly]
72+
permission_classes = [PublishedReadOrAuthenticatedWrite, IsOwnerOrAdminOrReadOnly]
6573
filter_backends = [DjangoFilterBackend, filters.OrderingFilter, filters.SearchFilter]
66-
filterset_fields = ["build_status", "user"]
74+
filterset_fields = ["status", "visibility", "user"]
6775
search_fields = ["title", "stac_id"]
6876
ordering_fields = ["created_at", "last_modified"]
6977

7078
def get_queryset(self):
7179
from shared.stars import annotate_stars
7280

73-
return annotate_stars(Dataset.objects.all(), self.request)
81+
qs = Dataset.objects.all()
82+
user = self.request.user
83+
if not user.is_authenticated:
84+
qs = qs.filter(visibility=Visibility.PUBLIC)
85+
elif not _is_admin(user):
86+
qs = qs.filter(Q(user=user) | Q(visibility=Visibility.PUBLIC))
87+
return annotate_stars(qs, self.request)
7488

7589
def get_permissions(self):
7690
if self.action == "pin":
7791
return [IsAuthenticated(), IsAdmin()]
92+
if self.action in {"publish", "unpublish", "build"}:
93+
return [IsAuthenticated(), IsOwnerOrAdmin()]
7894
return super().get_permissions()
7995

8096
def get_serializer_context(self):
@@ -107,6 +123,22 @@ def list(self, request, *args, **kwargs):
107123
return self.get_paginated_response(serializer.data)
108124
return super().list(request, *args, **kwargs)
109125

126+
@extend_schema(request=None, responses={200: DatasetSerializer})
127+
@action(detail=True, methods=["post"], url_path="publish")
128+
def publish(self, request, pk: int | None = None) -> Response:
129+
dataset = self.get_object()
130+
dataset.visibility = Visibility.PUBLIC
131+
dataset.save(update_fields=["visibility", "last_modified"])
132+
return Response(self.get_serializer(dataset).data, status=status.HTTP_200_OK)
133+
134+
@extend_schema(request=None, responses={200: DatasetSerializer})
135+
@action(detail=True, methods=["post"], url_path="unpublish")
136+
def unpublish(self, request, pk: int | None = None) -> Response:
137+
dataset = self.get_object()
138+
dataset.visibility = Visibility.PRIVATE
139+
dataset.save(update_fields=["visibility", "last_modified"])
140+
return Response(self.get_serializer(dataset).data, status=status.HTTP_200_OK)
141+
110142
@action(detail=True, methods=["patch"], url_path="pin")
111143
def pin(self, request, pk: int | None = None) -> Response:
112144
dataset = self.get_object()
@@ -141,7 +173,7 @@ def build(self, request) -> Response:
141173
stac_id=_slugify(payload["title"]),
142174
title=payload["title"],
143175
source_imagery=payload["source_imagery"],
144-
build_status=Dataset.BuildStatus.BUILDING,
176+
status=Dataset.Status.BUILDING,
145177
user=request.user,
146178
)
147179
aoi_qs.update(dataset=dataset)
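The scoping that the new `get_queryset` applies can be illustrated on plain Python objects. This is a sketch under assumptions rather than the project's code: `Row` and `visible_rows` are hypothetical, and owners are represented by simple strings instead of `OsmUser` instances.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Row:
    """Hypothetical stand-in for a Dataset row."""
    id: int
    owner: str
    visibility: str  # "private" or "public"


def visible_rows(rows: list[Row], viewer: Optional[str], is_admin: bool = False) -> list[Row]:
    """Mirror the three branches of get_queryset: anonymous, admin, regular user."""
    if viewer is None:
        # Anonymous callers only ever see public rows.
        return [r for r in rows if r.visibility == "public"]
    if is_admin:
        # Admins see everything.
        return list(rows)
    # Regular users see their own rows plus anything public.
    return [r for r in rows if r.owner == viewer or r.visibility == "public"]
```

Since DRF's `get_object()` resolves detail lookups against the same queryset, a private row that is filtered out here would be reported as not found, which matches the "404, not 401" note in `ARCHITECTURE.md`.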
