Code release for Local Light Field Fusion at SIGGRAPH 2019

Local Light Field Fusion

Project | Video | Paper

Tensorflow implementation for novel view synthesis from sparse input images.

Local Light Field Fusion: Practical View Synthesis with Prescriptive Sampling Guidelines
Ben Mildenhall*1, Pratul Srinivasan*1, Rodrigo Ortiz-Cayon2, Nima Khademi Kalantari3, Ravi Ramamoorthi4, Ren Ng1, Abhishek Kar2
1UC Berkeley, 2Fyusion Inc, 3Texas A&M, 4UC San Diego
*denotes equal contribution
In SIGGRAPH 2019

Installation TL;DR: Setup and render a demo scene

First install docker (instructions) and nvidia-docker (instructions).

Run this in the base directory to download a pretrained checkpoint, download a Docker image, and run code to generate MPIs and a rendered output video on an example input dataset:

bash download_data.sh
sudo docker pull bmild/tf_colmap
sudo docker tag bmild/tf_colmap tf_colmap
sudo nvidia-docker run --rm --volume /:/host --workdir /host$PWD tf_colmap bash demo.sh

This should write a rendered video to data/testscene/outputs/test_vid.mp4.

If this works, then you are ready to start processing your own images! Run

sudo nvidia-docker run -it --rm --volume /:/host --workdir /host$PWD tf_colmap

to enter a shell inside the Docker container, and skip ahead to the section on using your own input images for view synthesis.

Full Installation Details

You can either install the prerequisites by hand or use our provided Dockerfile to make a docker image.

In either case, start by downloading this repository, then running the download_data.sh script to download a pretrained model and example input dataset:

bash download_data.sh

After installing dependencies, try running bash demo.sh from the base directory. (If using Docker, run this inside the container.) This should generate the same video as in the Installation TL;DR section, written to data/testscene/outputs/test_vid.mp4.

Manual installation

  • Install CUDA, Tensorflow, COLMAP, ffmpeg
  • Install the required Python packages:
pip install -r requirements.txt
  • Optional: run make in cuda_renderer/ directory.
  • Optional: run make in opengl_viewer/ directory. You may need to install GLFW or some other OpenGL libraries. For GLFW:
sudo apt-get install libglfw3-dev

Docker installation

To build the docker image on your own machine, which may take 15-30 mins:

sudo docker build -t tf_colmap:latest .

To download the image (~6GB) instead:

sudo docker pull bmild/tf_colmap
sudo docker tag bmild/tf_colmap tf_colmap

Afterwards, you can launch an interactive shell inside the container:

sudo nvidia-docker run -it --rm --volume /:/host --workdir /host$PWD tf_colmap

From this shell, all the code in the repo should work (except opengl_viewer).

To run any single command <command...> inside the docker container:

sudo nvidia-docker run --rm --volume /:/host --workdir /host$PWD tf_colmap <command...>

Using your own input images for view synthesis

Our method takes in a set of images of a static scene, promotes each image to a local layered representation (MPI), and blends local light fields rendered from these MPIs to render novel views. Please see our paper for more details.

As a rule of thumb, you should use images where the maximum disparity between views is no more than about 64 pixels (watch the closest thing to the camera and don't let it move more than ~1/8 the horizontal field of view between images). Our datasets usually consist of 20-30 images captured handheld in a rough grid pattern.
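
The disparity rule of thumb can be checked before capture with a rough pinhole-camera estimate. The sketch below assumes neighboring views are roughly parallel, the focal length is given in pixels, and the distance to the closest object is known. The function names and parameters are hypothetical and are not part of this repository.

```python
def max_disparity_px(focal_px, baseline, nearest_depth):
    """Approximate pixel disparity of the closest scene point between two
    neighboring, roughly parallel views (pinhole model: d = f * b / z).

    baseline and nearest_depth must use the same length unit.
    """
    if nearest_depth <= 0:
        raise ValueError("nearest_depth must be positive")
    return focal_px * baseline / nearest_depth


def max_baseline(focal_px, nearest_depth, max_disp_px=64.0):
    """Largest camera spacing that keeps disparity at or below max_disp_px."""
    if focal_px <= 0:
        raise ValueError("focal_px must be positive")
    return max_disp_px * nearest_depth / focal_px


# Example: 500 px focal length, closest object 1.0 m away.
print(max_disparity_px(500.0, 0.1, 1.0))  # disparity for 10 cm spacing
print(max_baseline(500.0, 1.0))           # spacing limit in meters
```

If the estimated disparity exceeds about 64 pixels, move the camera less between shots or step back from the closest object.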

Quickstart: rendering a video from a zip file of your images

You can quickly render novel view frames and a .mp4 video from a zip file of your captured input images with the zip2mpis.sh bash script.

bash zip2mpis.sh <zipfile> <your_outdir> [--height HEIGHT]

height is the output height in pixels. We recommend using a height of 360 pixels for generating results quickly.

General step-by-step usage

Begin by creating a base scene directory (e.g., scenedir/), and copying your images into a subdirectory called images/ (e.g., scenedir/images).

1. Recover camera poses

This script calls COLMAP to run structure from motion to get 6-DoF camera poses and near/far depth bounds for the scene.

python imgs2poses.py <your_scenedir>

2. Generate MPIs

This script uses our pretrained Tensorflow graph (make sure it exists in checkpoints/papermodel) to generate MPIs from the posed images. They will be saved in <your_mpidir>, a directory that will be created by the script.

python imgs2mpis.py <your_scenedir> <your_mpidir> \
    [--checkpoint CHECKPOINT] \
    [--factor FACTOR] [--width WIDTH] [--height HEIGHT] [--numplanes NUMPLANES] \
    [--disps] [--psvs] 

You should set at most one of factor, width, or height to determine the output MPI resolution. factor scales the input image size down by an integer factor (e.g., 2, 4, 8), while height and width directly scale the input images to have the specified height or width. numplanes is 32 by default. checkpoint is set to the downloaded checkpoint by default.
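
To make the three resolution options concrete, here is a minimal sketch of how an output size could be derived from the input size. It is an illustration only: the function name is hypothetical, and the rounding used by imgs2mpis.py may differ.

```python
def mpi_resolution(in_height, in_width, factor=None, width=None, height=None):
    """Return (out_height, out_width) for at most one of factor/width/height.

    Assumption: the aspect ratio is preserved and sizes are rounded to
    the nearest pixel (integer division for factor).
    """
    given = [v for v in (factor, width, height) if v is not None]
    if len(given) > 1:
        raise ValueError("set at most one of factor, width, or height")
    if factor is not None:
        return in_height // factor, in_width // factor
    if height is not None:
        return height, round(in_width * height / in_height)
    if width is not None:
        return round(in_height * width / in_width), width
    return in_height, in_width  # nothing set: keep the input resolution


# Example: 4032x3024 phone photos (width x height).
print(mpi_resolution(3024, 4032, factor=8))
print(mpi_resolution(3024, 4032, height=360))
```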

Example usage:

python imgs2mpis.py scenedir scenedir/mpis --height 360

3. Render novel views

You can either generate a list of novel view camera poses and render out a video, or you can load the saved MPIs in our interactive OpenGL viewer.

Generate poses for new view path

First, generate a smooth new view path by calling

python imgs2renderpath.py <your_scenedir> <your_posefile> \
    [--x_axis] [--y_axis] [--z_axis] [--circle] [--spiral]

<your_posefile> is the path of an output .txt file that will be created by the script, and will contain camera poses for the rendered novel views. The five optional arguments specify the trajectory of the camera. The xyz-axis options are straight lines along each camera axis respectively, "circle" is a circle in the camera plane, and "spiral" is a circle combined with movement along the z-axis.

Example usage:

python imgs2renderpath.py scenedir scenedir/spiral_path.txt --spiral

See llff/math/pose_math.py for the code that generates these path trajectories.
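
As a rough illustration of the trajectory shapes described above, the sketch below generates camera-center offsets for each option. It is not the implementation in llff/math/pose_math.py: the function name, the parameters, and the axis convention (x and y in the camera plane, z along the viewing axis) are assumptions made for this example.

```python
import numpy as np


def path_offsets(kind, n_frames=60, radius=1.0, z_amp=0.5):
    """Return an (n_frames, 3) array of camera-center offsets.

    Offsets are relative to a central reference camera. Hypothetical
    convention: x and y span the camera plane, z is the viewing axis.
    """
    t = np.linspace(0.0, 2.0 * np.pi, n_frames, endpoint=False)
    zero = np.zeros_like(t)
    sweep = radius * np.sin(t)  # back-and-forth motion along a line
    paths = {
        "x_axis": (sweep, zero, zero),
        "y_axis": (zero, sweep, zero),
        "z_axis": (zero, zero, sweep),
        # Circle in the camera plane.
        "circle": (radius * np.cos(t), radius * np.sin(t), zero),
        # Same circle plus movement along z.
        "spiral": (radius * np.cos(t), radius * np.sin(t),
                   z_amp * np.sin(0.5 * t)),
    }
    if kind not in paths:
        raise ValueError(f"unknown path kind: {kind}")
    return np.stack(paths[kind], axis=1)


offsets = path_offsets("spiral")
print(offsets.shape)  # (60, 3)
```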

Render video with CUDA

You can build the renderer by calling make in the cuda_renderer/ directory.

The renderer uses CUDA to render out a video. Specify the height of the output video in pixels (-1 for same resolution as the MPIs), the factor for cropping the edges of the video (default is 1.0 for no cropping), and the compression quality (crf) for the saved MP4 file (default is 18, lossless is 0, reasonable is 12-28).

./cuda_renderer <your_mpidir> <your_posefile> <your_videofile> height crop crf

<your_videofile> is the path to the video file that will be written by FFMPEG.

Example usage:

./cuda_renderer scenedir/mpis scenedir/spiral_path.txt scenedir/spiral_render.mp4 -1 0.8 18

Render video with Tensorflow

Use Tensorflow to render out a video (~100x slower than the CUDA renderer). Optionally, specify how many MPIs are blended for each rendered output (default is 5) and the factor for cropping the edges of the video (default is 1.0 for no cropping).

python mpis2video.py <your_mpidir> <your_posefile> <your_videofile> [--use_N USE_N] [--crop_factor CROP_FACTOR]

Example usage:

python mpis2video.py scenedir/mpis scenedir/spiral_path.txt scenedir/spiral_render.mp4 --crop_factor 0.8

Interactive OpenGL viewer

Controls:

  • ESC to quit
  • Move mouse to translate in camera plane
  • Click and drag to rotate camera
  • Scroll to change focal length (zoom)
  • 'L' to animate circle render path

The OpenGL viewer cannot be used in the Docker container.

You need OpenGL installed, particularly GLFW:

sudo apt-get install libglfw3-dev

You can build the viewer in the opengl_viewer/ directory by calling make.

General usage (in opengl_viewer/ directory) is

./opengl_viewer <your_mpidir>

Using your own poses without running COLMAP

Here we explain the poses_bounds.npy file format. This file stores a numpy array of size Nx17 (where N is the number of input images). You can see how it is loaded in the three lines here. Each row of length 17 gets reshaped into a 3x5 pose matrix and 2 depth values that bound the closest and farthest scene content from that point of view.
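
The split described above can be sketched with NumPy as follows. This is an illustration under the layout stated in this section; the function name is hypothetical, and the repository's own loading code may order the axes differently.

```python
import numpy as np


def split_poses_bounds(poses_bounds):
    """Split an (N, 17) array into poses, intrinsics, and depth bounds."""
    arr = np.asarray(poses_bounds)
    if arr.ndim != 2 or arr.shape[1] != 17:
        raise ValueError("expected an array of shape (N, 17)")
    poses = arr[:, :15].reshape(-1, 3, 5)  # one 3x5 matrix per image
    c2w = poses[:, :, :4]                  # 3x4 camera-to-world transforms
    hwf = poses[:, :, 4]                   # [height, width, focal] per image
    bounds = arr[:, 15:]                   # [close, far] depth per image
    return c2w, hwf, bounds


# Synthetic stand-in for np.load("<your_scenedir>/poses_bounds.npy").
demo = np.arange(2 * 17, dtype=np.float64).reshape(2, 17)
c2w, hwf, bounds = split_poses_bounds(demo)
print(c2w.shape, hwf.shape, bounds.shape)  # (2, 3, 4) (2, 3) (2, 2)
```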

The pose matrix is a 3x4 camera-to-world affine transform concatenated with a 3x1 column [image height, image width, focal length] to represent the intrinsics (we assume the principal point is centered and that the focal length is the same for both x and y).

The right-handed coordinate system of the rotation (first 3x3 block in the camera-to-world transform) is as follows: from the point of view of the camera, the three axes are [down, right, backwards], which some people might consider to be [-y,x,z], where the camera is looking along -z. (The more conventional frame [x,y,z] is [right, up, backwards]. The COLMAP frame is [right, down, forwards] or [x,-y,-z].)
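
The axis conventions above amount to reordering and negating columns of the rotation block. The sketch below shows one way to convert camera-to-world poses from the two other frames mentioned here into [down, right, backwards]. The function names are hypothetical, and the code is an illustration rather than the repository's own conversion.

```python
import numpy as np


def _recolumn(c2w, columns):
    """Rebuild the 3x3 rotation block from signed source columns."""
    c2w = np.asarray(c2w, dtype=np.float64)
    rot, trans = c2w[..., :3], c2w[..., 3:4]
    new_rot = np.stack([sign * rot[..., idx] for sign, idx in columns], axis=-1)
    return np.concatenate([new_rot, trans], axis=-1)


def conventional_to_llff(c2w):
    """[right, up, backwards] -> [down, right, backwards]."""
    return _recolumn(c2w, [(-1.0, 1), (1.0, 0), (1.0, 2)])


def colmap_to_llff(c2w):
    """[right, down, forwards] -> [down, right, backwards]."""
    return _recolumn(c2w, [(1.0, 1), (1.0, 0), (-1.0, 2)])


# A camera aligned with the world axes in the conventional frame.
pose = np.eye(3, 4)
converted = conventional_to_llff(pose)
# The determinant stays +1, so the frame is still right-handed.
print(np.linalg.det(converted[:, :3]))
```

Both functions accept a single 3x4 pose or a batch of shape (N, 3, 4); the translation column is left unchanged because only the camera axes are relabeled.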

If you have a set of 3x4 cam-to-world poses for your images plus focal lengths and close/far depth bounds, the steps to recreate poses_bounds.npy are:

  1. Make sure your poses are in camera-to-world format, not world-to-camera.
  2. Make sure your rotation matrices have the columns in the correct coordinate frame [down, right, backwards].
  3. Concatenate each pose with the [height, width, focal] intrinsics vector to get a 3x5 matrix.
  4. Flatten each of those into 15 elements and concatenate the close and far depths.
  5. Stack the 17-d vectors to get a Nx17 matrix and use np.save to store it as poses_bounds.npy in the scene's base directory (same level containing the images/ directory).
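
Steps 3 to 5 can be sketched with NumPy as follows. The sketch assumes the poses are already camera-to-world and in the [down, right, backwards] frame (steps 1 and 2), and that all images share one height, width, and focal length. The function name is hypothetical, and the demo writes to a temporary directory rather than a real scene directory.

```python
import os
import tempfile

import numpy as np


def build_poses_bounds(c2w, height, width, focal, near, far):
    """Assemble the (N, 17) poses_bounds array from (N, 3, 4) poses.

    near and far may be scalars or length-N arrays of depth bounds.
    """
    c2w = np.asarray(c2w, dtype=np.float64)
    if c2w.ndim != 3 or c2w.shape[1:] != (3, 4):
        raise ValueError("c2w must have shape (N, 3, 4)")
    n = c2w.shape[0]
    # Step 3: append the [height, width, focal] column to get 3x5 matrices.
    hwf = np.array([height, width, focal], dtype=np.float64).reshape(1, 3, 1)
    poses = np.concatenate([c2w, np.broadcast_to(hwf, (n, 3, 1))], axis=2)
    # Step 4: flatten to 15 values and append the close and far depths.
    bounds = np.stack(
        [np.broadcast_to(near, (n,)), np.broadcast_to(far, (n,))], axis=1
    ).astype(np.float64)
    # Step 5: one 17-d row per image.
    return np.concatenate([poses.reshape(n, 15), bounds], axis=1)


# Demo with two identity poses; replace with your own data.
c2w = np.tile(np.eye(3, 4), (2, 1, 1))
arr = build_poses_bounds(c2w, 360, 480, 400.0, near=1.0, far=10.0)
scenedir = tempfile.mkdtemp()  # stand-in for your scene's base directory
np.save(os.path.join(scenedir, "poses_bounds.npy"), arr)
print(arr.shape)  # (2, 17)
```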

This should explain the pose processing after COLMAP.

Troubleshooting

  • PyramidCU::GenerateFeatureList: an illegal memory access was encountered: Some machine configurations might run into problems running the script imgs2poses.py. A solution to that would be to set the environment variable CUDA_VISIBLE_DEVICES. If the issue persists, try uncommenting this line to stop COLMAP from using the GPU to extract image features.
  • Black screen: In the latest versions of MacOS, OpenGL initializes a context with a black screen until the window is dragged or resized. If you run into this problem, please drag the window to another position.
  • COLMAP fails: If you see "Could not register, trying another image", you will probably have to try changing COLMAP optimization parameters or capturing more images of your scene. See here.

Citation

If you find this useful for your research, please cite the following paper.

@article{mildenhall2019llff,
  title={Local Light Field Fusion: Practical View Synthesis with Prescriptive Sampling Guidelines},
  author={Ben Mildenhall and Pratul P. Srinivasan and Rodrigo Ortiz-Cayon and Nima Khademi Kalantari and Ravi Ramamoorthi and Ren Ng and Abhishek Kar},
  journal={ACM Transactions on Graphics (TOG)},
  year={2019},
}