Removed file-extension-based arbitrary code execution attack vector (invoke-ai#2946)

blessedcoolant · web-flow · commit cb48bbd8068d · 2023-03-15T00:09:17.000+13:00
# The Problem Pickle files (.pkl, .ckpt, etc) are extremely unsafe as they can be trivially crafted to execute arbitrary code when parsed using `torch.load` Right now the conventional wisdom among ML researchers and users is to simply `not run untrusted pickle files ever` and instead only use Safetensor files, which cannot be injected with arbitrary code. This is very good advice. Unfortunately, **I have discovered a vulnerability inside of InvokeAI that allows an attacker to disguise a pickle file as a safetensor and have the payload execute within InvokeAI.** # How It Works Within `model_manager.py` and `convert_ckpt_to_diffusers.py` there are if-statements that decide which `load` method to use based on the file extension of the model file. The logic (written in a slightly more readable format than it exists in the codebase) is as follows: ``` if Path(file).suffix == '.safetensors': safetensor_load(file) else: unsafe_pickle_load(file) ``` A malicious actor would only need to create an infected .ckpt file, and then rename the extension to something that does not pass the `== '.safetensors'` check, but still appears to a user to be a safetensors file. For example, this might be something like `.Safetensors`, `.SAFETENSORS`, `SafeTensors`, etc. InvokeAI will happily import the file in the Model Manager and execute the payload. # Proof of Concept 1. Create a malicious pickle file. (https://gist.github.com/CodeZombie/27baa20710d976f45fb93928cbcfe368) 2. Rename the `.ckpt` extension to some variation of `.Safetensors`, ensuring there is a capital letter anywhere in the extension (eg. `malicious_pickle.SAFETENSORS`) 3. Import the 'model' like you would normally with any other safetensors file with the Model Manager. 4. Upon trying to select the model in the web ui, it will be loaded (or attempt to be converted to a Diffuser) with `torch.load` and the payload will execute. ![image](https://user-images.githubusercontent.com/466103/224835490-4cf97ff3-41b3-4a31-85df-922cc99042d2.png) # The Fix This pull request changes the logic InvokeAI uses to decide which model loader to use so that the safe behavior is the default. Instead of loading as a pickle if the extension is not exactly `.safetensors`, it will now **always** load as a safetensors file unless the extension is **exactly** `.ckpt`. # Notes: I think support for pickle files should be totally dropped ASAP as a matter of security, but I understand that there are reasons this would be difficult. In the meantime, I think `RestrictedUnpickler` or something similar should be implemented as a replacement for `torch.load`, as this significantly reduces the amount of Python methods that an attacker has to work with when crafting malicious payloads inside a pickle file. Automatic1111 already uses this with some success. (https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob/master/modules/safe.py)
diff --git a/invokeai/backend/model_management/convert_ckpt_to_diffusers.py b/invokeai/backend/model_management/convert_ckpt_to_diffusers.py
@@ -1075,9 +1075,10 @@ def load_pipeline_from_original_stable_diffusion_ckpt(
         dlogging.set_verbosity_error()
 
         checkpoint = (
-            load_file(checkpoint_path)
-            if Path(checkpoint_path).suffix == ".safetensors"
-            else torch.load(checkpoint_path)
+            torch.load(checkpoint_path)
+            if Path(checkpoint_path).suffix == ".ckpt"
+            else load_file(checkpoint_path)
+            
         )
         cache_dir = global_cache_dir("hub")
         pipeline_class = (
diff --git a/invokeai/backend/model_management/model_manager.py b/invokeai/backend/model_management/model_manager.py
@@ -732,9 +732,9 @@ def heuristic_import(
 
         # another round of heuristics to guess the correct config file.
         checkpoint = (
-            safetensors.torch.load_file(model_path)
-            if model_path.suffix == ".safetensors"
-            else torch.load(model_path)
+            torch.load(model_path)
+            if model_path.suffix == ".ckpt"
+            else safetensors.torch.load_file(model_path)
         )
 
         # additional probing needed if no config file provided