Watch Your Language Platinum Pass Full Conference Pass Full Conference One-Day Pass Date: Tuesday, November 19th Time: 11:00am - 12:45pm Venue: Plaza Meeting Room P2 Session Chair(s): Andrew Adams, Adobe Research Taichi: A Language for High-Performance Computation on Spatially Sparse Data Structures Abstract: 3D visual computing data are often spatially sparse. To exploit such sparsity, people have developed hierarchical sparse data structures, such as multi-level sparse voxel grids, particles, and 3D hash tables. However, developing and using these high-performance sparse data structures is challenging, due to their intrinsic complexity and overhead. We propose Taichi, a new data-oriented programming language for efficiently authoring, accessing, and maintaining such data structures. The language offers a high-level, data structure-agnostic interface for writing computation code. The user independently specifies the data structure. We provide several elementary components with different sparsity properties that can be arbitrarily composed to create a wide range of multi-level sparse data structures. This decoupling of data structures from computation makes it easy to experiment with different data structures without changing computation code, and allows users to write computation as if they are working with a dense array. Our compiler then uses the semantics of the data structure and index analysis to automatically optimize for locality, remove redundant operations for coherent accesses, maintain sparsity and memory allocations, and generate efficient parallel and vectorized instructions for CPUs and GPUs. Our approach yields competitive performance on common computational kernels such as stencil applications, neighbor lookups, and particle scattering. We demonstrate our language by implementing simulation, rendering, and vision tasks including a material point method simulation, finite element analysis, a multigrid Poisson solver for pressure projection, volumetric path tracing, and 3D convolution on sparse grids. Our computation-data structure decoupling allows us to quickly experiment with different data arrangements, and to develop high-performance data structures tailored for specific computational tasks. With 1/10 as many lines of code, we achieve 4.55x higher performance on average, compared to hand-optimized reference implementations. Authors/Presenter(s): Yuanming Hu, MIT CSAIL, United States of AmericaTzu-Mao Li, MIT CSAIL, UC Berkeley, United States of AmericaLuke Anderson, MIT CSAIL, United States of AmericaJonathan Ragan-Kelley, UC Berkeley, United States of AmericaFrédo Durand, MIT CSAIL, United States of America Staged Metaprogramming for Shader System Development Abstract: The shader system for a modern game engine comprises much more than just compilation of source code to executable kernels. Shaders must also be exposed to art tools, interfaced with engine code, and specialized for performance. Engines typically address each of these tasks in an ad hoc fashion, without a unifying abstraction. The alternative of developing a more powerful compiler framework is prohibitive for most engines. In this paper, we identify staged metaprogramming as a unifying abstraction and implementation strategy to develop a powerful shader system with modest effort. By using a multi-stage language to perform metaprogramming at compile time, engine-specific code can consume, analyze, transform, and generate shader code that will execute at runtime. Staged metaprogramming reduces the effort required to implement a shader system that provides earlier error detection, avoids repeat declarations of shader parameters, and explores opportunities to improve performance. To demonstrate the value of this approach, we design and implement a shader system, called Selos, built using staged metaprogramming. In our system, shader and application code are written in the same language and can share types and functions. We implement a design space exploration framework for Selos that investigates static versus dynamic composition of shader features, exploring the impact of shader specialization in a deferred renderer. Staged metaprogramming allows Selos to provide compelling features with a simple implementation. Authors/Presenter(s): Kerry A. Seitz, Jr., University of California, Davis, United States of AmericaTim Foley, NVIDIA, United States of AmericaSerban D. Porumbescu, University of California, Davis, United States of AmericaJohn D. Owens, University of California, Davis, United States of America Mitsuba 2: A Retargetable Forward and Inverse Renderer Abstract: Modern rendering systems are confronted with a dauntingly large and growing set of requirements: in their pursuit of realism, physically based techniques must increasingly account for intricate properties of light, such as its spectral composition or polarization. To reduce prohibitive rendering times, vectorized renderers exploit coherence via instruction-level parallelism on CPUs and GPUs. Differentiable rendering algorithms propagate derivatives through a simulation to optimize an objective function, e.g., to reconstruct a scene from reference images. Catering to such diverse use cases is challenging and has led to numerous purpose-built systems—partly, because retrofitting features of this complexity onto an existing renderer involves an error-prone and infeasibly intrusive transformation of elementary data structures, interfaces between components, and their implementations (in other words, everything). We propose Mitsuba 2, a versatile renderer that is intrinsically retargetable to various applications including the ones listed above. Mitsuba 2 is implemented in modern C++ and leverages template metaprogramming to replace types and instrument the control flow of components such as BSDFs, volumes, emitters, and rendering algorithms. At compile time, it automatically transforms arithmetic, data structures, and function dispatch, turning generic algorithms into a variety of efficient implementations without the tedium of manual redesign. Possible transformations include changing the representation of color, generating a "wide" renderer that operates on bundles of light paths, just-in-time compilation to create computational kernels that run on the GPU, and forward/reverse-mode automatic differentiation. Transformations can be chained, which further enriches the space of algorithms derived from a single generic implementation. We demonstrate the effectiveness and simplicity of our approach on several applications that would be very challenging to create without assistance: a rendering algorithm based on coherent MCMC exploration, a caustic design method for gradient-index optics, and a technique for reconstructing heterogeneous media in the presence of multiple scattering. Authors/Presenter(s): Merlin Nimier-David, EPFL, SwitzerlandDelio Vicini, EPFL, SwitzerlandTizian Zeltner, EPFL, SwitzerlandWenzel Jakob, EPFL, Switzerland Automatically Translating Image Processing Libraries to Halide Abstract: This paper presents Dexter, a new tool that automatically translates image processing functions from a low-level general-purpose language to a high-level domain-specific language (DSL), allowing them to leverage cross-platform optimizations enabled by DSLs. Rather than building a classical syntax-driven compiler to do this translation, Dexter leverages recent advances in program synthesis and program verification, along with a new domain-specific synthesis algorithm, to translate C++ image processing code to the Halide DSL, while guaranteeing semantic equivalence. This new synthesis algorithm scales and generalizes to much larger and more complex functions than prior work, including the ability to handle tiling, conditionals, and multi-stage pipelines in the original low-level code. To demonstrate the effectiveness of our approach, we evaluate Dexter using real-world image processing functions from Adobe Photoshop, a widely-used multi-platform image processing program. Our results show that Dexter can translate 264 out of 353 functions in our test set, with the original implementations ranging from 20 to 150 lines of code. By leveraging Halide’s advanced auto-scheduling capabilities, we demonstrate median speedups of 7.03x and 4.52x for Dexter-translated functions as compared to the original implementations on Intel and ARM architectures, respectively. Authors/Presenter(s): Maaz Bin Safeer Ahmad, University of Washington, United States of AmericaJonathan Ragan-Kelley, University of California, Berkeley, United States of AmericaAlvin Cheung, University of California, Berkeley, United States of AmericaShoaib Kamil, Adobe, United States of America Back