Commit 510e967

Update README.md (add offline mode ENV and fix typos)
1 parent a15d3ad commit 510e967

1 file changed

Lines changed: 18 additions & 18 deletions

README.md

@@ -58,7 +58,7 @@ See [installation.md](docs/installation.md) for additional information, includin
5858

5959
The following steps have been tested on Ubuntu 20.04.
6060

61-
- You must have a NVIDIA graphics card with at least 6GB VRAM and have [CUDA](https://developer.nvidia.com/cuda-downloads) installed.
61+
- You must have an NVIDIA graphics card with at least 6GB VRAM and have [CUDA](https://developer.nvidia.com/cuda-downloads) installed.
6262
- Install `Python >= 3.8`.
6363
- (Optional, Recommended) Create a virtual environment:
6464

@@ -92,15 +92,15 @@ pip install ninja
9292
pip install -r requirements.txt
9393
```
9494

95-
- (Optional, Recommended) The best-performing models in threestudio uses the newly-released T2I model [DeepFloyd IF](https://github.com/deep-floyd/IF) which currently requires signing a license agreement. If you would like use these models, you need to [accept the license on the model card of DeepFloyd IF](https://huggingface.co/DeepFloyd/IF-I-XL-v1.0), and login in the Hugging Face hub in terminal by `huggingface-cli login`.
95+
- (Optional, Recommended) The best-performing models in threestudio use the newly-released T2I model [DeepFloyd IF](https://github.com/deep-floyd/IF), which currently requires signing a license agreement. If you would like to use these models, you need to [accept the license on the model card of DeepFloyd IF](https://huggingface.co/DeepFloyd/IF-I-XL-v1.0), and log in to the Hugging Face hub in the terminal with `huggingface-cli login`.
9696

9797
- For contributors, see [here](https://github.com/threestudio-project/threestudio#contributing-to-threestudio).
9898

9999
## Quickstart
100100

101101
Here we show some basic usage of threestudio. First let's train a DreamFusion model to create a classic pancake bunny.
102102

103-
**If you are experiencing unstable connections with Hugging Face, we suggest you either (1) setting environment variable `TRANSFORMERS_OFFLINE=1 DIFFUSERS_OFFLINE=1` before your running command after all needed files have been fetched on the first run, to prevent from connecting to Hugging Face each time you run, or (2) downloading the guidance model you used to a local folder following [here](https://huggingface.co/docs/huggingface_hub/v0.14.1/guides/download#download-an-entire-repository) and [here](https://huggingface.co/docs/huggingface_hub/v0.14.1/guides/download#download-files-to-local-folder), and set `pretrained_model_name_or_path` of the guidance and the prompt processor to the local path.**
103+
**If you are experiencing unstable connections with Hugging Face, we suggest that you either (1) set the environment variables `TRANSFORMERS_OFFLINE=1 DIFFUSERS_OFFLINE=1 HF_HUB_OFFLINE=1` before your run command once all needed files have been fetched on the first run, which prevents connecting to Hugging Face on every run, or (2) download the guidance model you use to a local folder following [here](https://huggingface.co/docs/huggingface_hub/v0.14.1/guides/download#download-an-entire-repository) and [here](https://huggingface.co/docs/huggingface_hub/v0.14.1/guides/download#download-files-to-local-folder), and set `pretrained_model_name_or_path` of the guidance and the prompt processor to the local path.**
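As a minimal sketch of option (1), the three variables named above can also be set from Python before any Hugging Face library is imported. The helper name `enable_hf_offline` is hypothetical and not part of threestudio, and the sketch assumes the needed files were already cached by an earlier online run.

```python
import os

# Variable names are taken from the note above.
OFFLINE_VARS = ("TRANSFORMERS_OFFLINE", "DIFFUSERS_OFFLINE", "HF_HUB_OFFLINE")


def enable_hf_offline(env=os.environ):
    """Set the offline flags (hypothetical helper).

    Call this before importing transformers or diffusers, since such
    flags are generally read when the libraries are imported.
    """
    for name in OFFLINE_VARS:
        env[name] = "1"
    return {name: env[name] for name in OFFLINE_VARS}


flags = enable_hf_offline()
print(flags)
```

Setting the variables in the shell before the launch command, as the note describes, has the same effect and needs no code change.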
104104

105105
```sh
106106
# if you have agreed to the license of DeepFloyd IF and have >20GB VRAM
@@ -119,7 +119,7 @@ The training lasts for 10,000 iterations. You can find visualizations of the cur
119119
Multi-GPU training is supported, but may still be [buggy](https://github.com/threestudio-project/threestudio/issues/195). Note that `data.batch_size` is the batch size **per rank (device)**. Also remember to
120120

121121
- Set `data.n_val_views` to be a multiple of the number of GPUs.
122-
- Set a unique `tag` as timestamp is disabled in multi-GPU training and will not be appended after the tag. If you the same tag as previous trials, saved config files, code and visualizations will be overriden.
122+
- Set a unique `tag`, as the timestamp is disabled in multi-GPU training and will not be appended after the tag. If you use the same tag as previous trials, saved config files, code and visualizations will be overridden.
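The arithmetic behind the per-rank batch size and the `data.n_val_views` reminder can be sketched as follows. Both function names are hypothetical helpers for illustration, not part of threestudio.

```python
def effective_batch_size(n_gpus: int, batch_size_per_rank: int) -> int:
    # data.batch_size is per rank (device), so the total scales with the GPU count.
    return n_gpus * batch_size_per_rank


def is_valid_n_val_views(n_val_views: int, n_gpus: int) -> bool:
    # data.n_val_views should be a multiple of the number of GPUs.
    return n_val_views % n_gpus == 0


total = effective_batch_size(n_gpus=4, batch_size_per_rank=2)
print(total)
print(is_valid_n_val_views(8, 4), is_valid_n_val_views(6, 4))
```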
123123

124124
```sh
125125
# this results in an effective batch size of 4 (number of GPUs) * 2 (data.batch_size) = 8
@@ -249,7 +249,7 @@ https://user-images.githubusercontent.com/19284678/236694848-38ae4ea4-554b-4c9d-
249249
**Notable differences from the paper**
250250

251251
- We use open-source T2I models (StableDiffusion, DeepFloyd IF), while the paper uses Imagen.
252-
- We use a guiandance scale of 20 for DeepFloyd IF, while the paper uses 100 for Imagen.
252+
- We use a guidance scale of 20 for DeepFloyd IF, while the paper uses 100 for Imagen.
253253
- We do not use sigmoid to normalize the albedo color but simply scale the color from `[-1,1]` to `[0,1]`, as we find this helps convergence.
254254
- We use HashGrid encoding and uniformly sample points along rays, while the paper uses Integrated Positional Encoding and sampling strategy from MipNeRF360.
255255
- We adopt camera settings and density initialization strategy from Magic3D, which is slightly different from the DreamFusion paper.
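To make the albedo difference concrete, here is a minimal sketch in plain Python. A sigmoid squashes any real value into `(0,1)`, while the linear rescale maps `[-1,1]` onto `[0,1]`. The function names are illustrative only, and how the raw values end up in `[-1,1]` is not covered here.

```python
import math


def sigmoid_albedo(x: float) -> float:
    # Sigmoid normalization, as described for the paper.
    return 1.0 / (1.0 + math.exp(-x))


def scaled_albedo(x: float) -> float:
    # Linear map from [-1, 1] to [0, 1].
    return (x + 1.0) / 2.0


for raw in (-1.0, 0.0, 1.0):
    print(raw, scaled_albedo(raw), round(sigmoid_albedo(raw), 3))
```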
@@ -283,10 +283,10 @@ https://user-images.githubusercontent.com/19284678/236694858-0ed6939e-cd7a-408f-
283283
**Notable differences from the paper**
284284

285285
- We use open-source T2I models (StableDiffusion, DeepFloyd IF) for the coarse stage, while the paper uses eDiff-I.
286-
- In the coarse stage, we use a guiandance scale of 20 for DeepFloyd IF, while the paper uses 100 for eDiff-I.
286+
- In the coarse stage, we use a guidance scale of 20 for DeepFloyd IF, while the paper uses 100 for eDiff-I.
287287
- In the coarse stage, we use analytic normal, while the paper uses predicted normal.
288288
- In the coarse stage, we use orientation loss as in DreamFusion, while the paper does not.
289-
- There are many things that are ommited from the paper such as the weighting of loss terms and the DMTet grid resolution, which could be different.
289+
- There are many things that are omitted from the paper such as the weighting of loss terms and the DMTet grid resolution, which could be different.
290290

291291
**Example running commands**
292292

@@ -302,11 +302,11 @@ python launch.py --config configs/magic3d-coarse-sd.yaml --train --gpu 0 system.
302302
Then convert the NeRF from the coarse stage to DMTet and train with differentiable rasterization:
303303

304304
```sh
305-
# the refinement stage uses StableDiffusion, requires ~5GB VRAM in training
305+
# the refinement stage uses StableDiffusion, and requires ~5GB VRAM in training
306306
python launch.py --config configs/magic3d-refine-sd.yaml --train --gpu 0 system.prompt_processor.prompt="a delicious hamburger" system.geometry_convert_from=path/to/coarse/stage/trial/dir/ckpts/last.ckpt
307-
# if you're unsatisfied with the surface extraced using the default threshold (25)
307+
# if you're unsatisfied with the surface extracted using the default threshold (25)
308308
# you can specify a threshold value using `system.geometry_convert_override`
309-
# decrease the value if the extracted surface is incomplete, increate if it is extruded
309+
# decrease the value if the extracted surface is incomplete, increase if it is extruded
310310
python launch.py --config configs/magic3d-refine-sd.yaml --train --gpu 0 system.prompt_processor.prompt="a delicious hamburger" system.geometry_convert_from=path/to/coarse/stage/trial/dir/ckpts/last.ckpt system.geometry_convert_override.isosurface_threshold=10.
311311
```
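The effect of the threshold can be illustrated with a toy example. This sketch assumes the threshold acts as a level-set value on a density field, so a sample counts as inside the surface when its density exceeds the threshold. That is an assumption for illustration; the density samples are made up and this is not the project's extraction code.

```python
def inside_surface(density, threshold):
    # A sample is treated as inside when its density exceeds the threshold.
    return [d > threshold for d in density]


# Made-up density samples along one ray through a thin part of an object.
density = [0.0, 5.0, 12.0, 30.0, 40.0, 18.0, 3.0]

count_default = sum(inside_surface(density, 25.0))
count_lower = sum(inside_surface(density, 10.0))
print(count_default, count_lower)
```

Under this assumption, lowering the threshold marks more samples as inside, which matches the advice to decrease the value when the extracted surface is incomplete.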
312312

@@ -316,7 +316,7 @@ python launch.py --config configs/magic3d-refine-sd.yaml --train --gpu 0 system.
316316
- Magic3D uses a neural network to predict the surface normal, which may not resemble the true geometric normal and degrade geometry quality, so we use analytic normal instead.
317317
- Try increasing/decreasing `system.loss.lambda_sparsity` if your scene is stuffed with floaters/becoming empty.
318318
- Try increasing/decreasing `system.loss.lambda_orient` if your object is foggy/over-smoothed.
319-
- Try replacing the background to random colors with a probability 0.5 by setting `system.background.random_aug=true` if you find the model incorrectly treats the background as part of the object.
319+
- Try replacing the background with random colors with a probability of 0.5 by setting `system.background.random_aug=true` if you find the model incorrectly treats the background as part of the object.
320320

321321
### Score Jacobian Chaining [![arXiv](https://img.shields.io/badge/arXiv-2212.00774-b31b1b.svg?style=flat-square)](https://arxiv.org/abs/2212.00774)
322322

@@ -484,7 +484,7 @@ https://github.com/threestudio-project/threestudio/assets/22424247/8a7fa056-7668
484484
**IMPORTANT NOTE: This implementation is heavily inspired by the Zero-1-to-3 implementation in [stable-dreamfusion](https://github.com/ashawkey/stable-dreamfusion)! `extern/ldm_zero123` is borrowed from `stable-dreamfusion/ldm`.**
485485

486486
```sh
487-
# object geneartion with 64x64 NeRF rendering, ~14GB VRAM
487+
# object generation with 64x64 NeRF rendering, ~14GB VRAM
488488
python launch.py --config configs/zero123.yaml --train --gpu 0
489489
```
490490

@@ -503,16 +503,16 @@ Also includes evaluation of the guidance during training. If `system.freq.guidan
503503

504504
## Prompt Library
505505

506-
For easier comparison, we collect the 397 preset prompts from the website of [DreamFusion](https://dreamfusion3d.github.io/gallery.html) in [this file](https://github.com/threestudio-project/threestudio/blob/main/load/prompt_library.json). You can use these prompts by setting `system.prompt_processor.prompt=lib:keyword1_keyword2_..._keywordN`. Note that the prompt should starts with `lib:` and all the keywords are separated by `_`. The prompt processor will match the keywords to all the prompts in the library, and will only succeed if there's **exactly one match**. The used prompt will be printed to console. Also note that you can't use this syntax to point to every prompt in the library, as there are prompts that are subset of other prompts lmao. We will enhance the use of this feature.
506+
For easier comparison, we collect the 397 preset prompts from the website of [DreamFusion](https://dreamfusion3d.github.io/gallery.html) in [this file](https://github.com/threestudio-project/threestudio/blob/main/load/prompt_library.json). You can use these prompts by setting `system.prompt_processor.prompt=lib:keyword1_keyword2_..._keywordN`. Note that the prompt should start with `lib:` and all the keywords are separated by `_`. The prompt processor will match the keywords to all the prompts in the library, and will only succeed if there's **exactly one match**. The used prompt will be printed to the console. Also note that you can't use this syntax to point to every prompt in the library, as some prompts are subsets of other prompts lmao. We will enhance the use of this feature.
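The matching rule described above can be sketched as follows. This is an illustration under stated assumptions, not the project's actual prompt processor: it treats a keyword as matching when it appears as a case-insensitive substring of a prompt, and the three-prompt library is made up.

```python
from typing import List


def resolve_lib_prompt(spec: str, library: List[str]) -> str:
    """Resolve `lib:kw1_kw2` to the single library prompt containing all keywords."""
    if not spec.startswith("lib:"):
        return spec  # a plain prompt is used as-is
    keywords = spec[len("lib:"):].split("_")
    matches = [p for p in library if all(k.lower() in p.lower() for k in keywords)]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one match, found {len(matches)}")
    return matches[0]


# Hypothetical mini library for illustration.
library = [
    "a corgi",
    "a corgi wearing a top hat",
    "a robot dinosaur",
]

resolved = resolve_lib_prompt("lib:corgi_hat", library)
print(resolved)
```

With this library, `lib:corgi` matches two prompts and fails, and no keyword set selects `a corgi` alone. That is the kind of subset problem the note above refers to.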
507507

508508
## Tips on Improving Quality
509509

510-
It's important to note that existing techniques that lift 2D T2I models to 3D cannot consistently produce satisfying results. Results from the great papers like DreamFusion and Magic3D are (to some extend) cherry-pickled, so don't be frustrated if you did not get what you expected on your first trial. Here are some tips that may help you improve the generation quality:
510+
It's important to note that existing techniques that lift 2D T2I models to 3D cannot consistently produce satisfying results. Results from great papers like DreamFusion and Magic3D are (to some extent) cherry-picked, so don't be frustrated if you do not get what you expected on your first trial. Here are some tips that may help you improve the generation quality:
511511

512-
- **Increase batch size**. Large batch sizes help convergence and improve the 3D consistency of the geometry. State-of-the-art methods claims using large batch sizes: DreamFusion uses a batch size of 4; Magic3D uses a batch size of 32; Fantasia3D uses a batch size of 24; some results shown above uses a batch size of 8. You can easily change the batch size by setting `data.batch_size=N`. Increasing the batch size requires more VRAM. If you have limited VRAM but still want the benefit of large batch sizes, you may use [gradient accumulation provided by PyTorch Lightning](https://lightning.ai/docs/pytorch/stable/advanced/training_tricks.html#accumulate-gradients) by setting `trainer.accumulate_grad_batches=N`. This will accumulate the gradient of several batches and achieve a large effective batch size. Note that if you use gradient accumulation, you may need to multiply all step values by N times in your config, such as values that have the name `X_steps` and `trainer.val_check_interval`, since now N batches equal to a large batch.
512+
- **Increase batch size**. Large batch sizes help convergence and improve the 3D consistency of the geometry. State-of-the-art methods claim to use large batch sizes: DreamFusion uses a batch size of 4; Magic3D uses a batch size of 32; Fantasia3D uses a batch size of 24; some results shown above use a batch size of 8. You can easily change the batch size by setting `data.batch_size=N`. Increasing the batch size requires more VRAM. If you have limited VRAM but still want the benefit of large batch sizes, you may use [gradient accumulation provided by PyTorch Lightning](https://lightning.ai/docs/pytorch/stable/advanced/training_tricks.html#accumulate-gradients) by setting `trainer.accumulate_grad_batches=N`. This will accumulate the gradient of several batches and achieve a large effective batch size. Note that if you use gradient accumulation, you may need to multiply all step values in your config by N, such as values named `X_steps` and `trainer.val_check_interval`, since N batches now make up one large batch.
513513
- **Train longer.** This helps if you can already obtain reasonable results and would like to enhance the details. If the result is still a mess after several thousand steps, training for a longer time often won't help. You can set the total training iterations by `trainer.max_steps=N`.
514514
- **Try different seeds.** This is a simple solution if your results have correct overall geometry but suffer from the multi-face Janus problem. You can change the seed by setting `seed=N`. Good luck!
515-
- **Tuning regularization weights.** Some methods have regularizaion terms which can be essential to obtaining good geometry. Try tuning the weights of these regularizations by setting `system.loss.lambda_X=value`. The specific values depend on your situation, you may refer to [tips for each supported model](https://github.com/threestudio-project/threestudio#supported-models) for more detailed instructions.
515+
- **Tune regularization weights.** Some methods have regularization terms which can be essential to obtaining good geometry. Try tuning the weights of these regularizations by setting `system.loss.lambda_X=value`. The specific values depend on your situation; you may refer to [tips for each supported model](https://github.com/threestudio-project/threestudio#supported-models) for more detailed instructions.
516516
- **Try debiasing methods.** When conventional SDS techniques like DreamFusion, Magic3D, SJC, and others fail to produce the desired 3D results, Debiased Score Distillation Sampling (D-SDS) can be a solution. D-SDS is devised to tackle challenges such as artifacts or the Janus problem, employing two strategies: score debiasing and prompt debiasing. You can activate score debiasing by just setting `system.guidance.grad_clip=[0,0.5,2.0,10000]`, where the order is `start_step, start_value, end_value, end_step`. You can enable prompt debiasing by setting `system.prompt_processor.use_prompt_debiasing=true`. When using prompt debiasing, it's recommended to set a list of indices for words that should potentially be removed by `system.prompt_processor.prompt_debiasing_mask_ids=[i1,i2,...]`. For example, if the prompt is `a smiling dog` and you only want to remove the word `smiling` for certain views, you should set it to `[1]`. You could also manually specify the prompt for each view by setting `system.prompt_processor.prompt_side`, `system.prompt_processor.prompt_back` and `system.prompt_processor.prompt_overhead`. For a detailed explanation of these techniques, refer to [the D-SDS paper](https://arxiv.org/abs/2303.15413) or check out [the project page](https://susunghong.github.io/Debiased-Score-Distillation-Sampling/).
517517
- **Try Perp-Neg.** The [Perp-Neg algorithm](https://perp-neg.github.io/) can potentially alleviate the multi-face Janus problem. We now support Perp-Neg for `stable-diffusion-guidance` and `deep-floyd-guidance` by setting `system.prompt_processor.use_perp_neg=true`.
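The bookkeeping for the gradient accumulation tip can be sketched as below. `accumulated_batch_size` and `scale_step_values` are hypothetical helpers, the example config keys are illustrative, and which values actually need scaling depends on your config.

```python
def accumulated_batch_size(batch_size: int, accumulate_grad_batches: int) -> int:
    # N accumulated batches behave like one batch that is N times larger.
    return batch_size * accumulate_grad_batches


def scale_step_values(cfg: dict, n: int) -> dict:
    """Multiply step-like integer values by n.

    Step-like here means keys ending in `_steps` plus `val_check_interval`.
    """
    scaled = {}
    for key, value in cfg.items():
        if isinstance(value, dict):
            scaled[key] = scale_step_values(value, n)
        elif (
            isinstance(value, int)
            and not isinstance(value, bool)
            and (key.endswith("_steps") or key == "val_check_interval")
        ):
            scaled[key] = value * n
        else:
            scaled[key] = value
    return scaled


# Illustrative config fragment, not a real threestudio config file.
cfg = {
    "trainer": {"max_steps": 10000, "val_check_interval": 200},
    "data": {"batch_size": 2},
}

print(accumulated_batch_size(2, 4))
scaled_cfg = scale_step_values(cfg, 4)
print(scaled_cfg)
```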
518518

@@ -568,9 +568,9 @@ Here we just briefly introduce the code structure of this project. We will make
568568
569569
- All methods are implemented as a subclass of `BaseSystem` (in `systems/base.py`). There typically are six modules inside a system: geometry, material, background, renderer, guidance, and prompt_processor. All modules are subclasses of `BaseModule` (in `utils/base.py`) except for guidance and prompt_processor, which are subclasses of `BaseObject` to prevent them from being treated as model parameters and to better control their behavior in multi-GPU settings.
570570
- All systems, modules, and data modules have their configurations in their own dataclasses.
571-
- Base configurations for the whole project can be found in `utils/config.py`. In the `ExperimentConfig` dataclass, `data`, `system`, and module configurations under `system` are parsed to configurations of each class mentioned above. These configurations are strictly typed, which means you can only use defined properties in the dataclass and stick to the defined type of each property. This configuration paradigm (1) natually supports default values for properties; (2) effectively prevents wrong assignments of these properties (say typos in the yaml file) or inappropriate usage at runtime.
571+
- Base configurations for the whole project can be found in `utils/config.py`. In the `ExperimentConfig` dataclass, `data`, `system`, and module configurations under `system` are parsed to configurations of each class mentioned above. These configurations are strictly typed, which means you can only use defined properties in the dataclass and stick to the defined type of each property. This configuration paradigm (1) naturally supports default values for properties; (2) effectively prevents wrong assignments of these properties (say typos in the yaml file) or inappropriate usage at runtime.
572572
- This project uses both static and runtime type checking. For more details, see `utils/typing.py`.
573-
- To update anything of a module at each training step, simply make it inherit to `Updateable` (see `utils/base.py`). At the beginning of each iteration, an `Updateable` will update itself, and update all its attributes that are also `Updateable`. Note that subclasses of `BaseSystem`, `BaseModule` and `BaseObject` are by default inherit to `Updateable`.
573+
- To update anything of a module at each training step, simply make it inherit from `Updateable` (see `utils/base.py`). At the beginning of each iteration, an `Updateable` will update itself, and update all its attributes that are also `Updateable`. Note that subclasses of `BaseSystem`, `BaseModule` and `BaseObject` inherit from `Updateable` by default.
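The recursive update pattern described in the last bullet can be sketched as follows. This is not the code in `utils/base.py`: the method names `do_update_step` and `update_step` and the `Encoding` and `Geometry` classes are hypothetical stand-ins.

```python
class Updateable:
    def do_update_step(self, global_step: int) -> None:
        # Update this object first, then every attribute that is also Updateable.
        self.update_step(global_step)
        for value in vars(self).values():
            if isinstance(value, Updateable):
                value.do_update_step(global_step)

    def update_step(self, global_step: int) -> None:
        pass  # subclasses override this with their per-step logic


class Encoding(Updateable):
    def __init__(self) -> None:
        self.level = 0

    def update_step(self, global_step: int) -> None:
        # Example schedule: raise the level every 1000 steps.
        self.level = global_step // 1000


class Geometry(Updateable):
    def __init__(self) -> None:
        self.encoding = Encoding()
        self.last_step = -1

    def update_step(self, global_step: int) -> None:
        self.last_step = global_step


geometry = Geometry()
geometry.do_update_step(2500)
print(geometry.last_step, geometry.encoding.level)
```

Calling `do_update_step` on the outer object is enough here; the nested `Encoding` is reached through the attribute scan.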
574574
575575
## Known Problems
576576
