Abstract
Virtual reality (VR) has been proposed for various purposes such as design studies, presentation, simulation and communication in the field of computer-aided architectural design. This paper explores new roles for VR; in particular, we propose rendering methods that consist of post-processing rendering, segmentation rendering and shadow-casting rendering for more-versatile approaches in the use of data. We focus on the creation of a dataset of annotated images, composed of paired foreground-background and semantic-relevant images, in addition to traditional immersive rendering for training deep learning neural networks and analysing landscapes. We also develop a camera velocity rendering method using a customised segmentation rendering technique that calculates the linear and angular velocities of the virtual camera within the VR space at each frame and overlays a colour on the screen according to the velocity value. Using this velocity information, developers of VR applications can improve the animation path within the VR space and prevent VR sickness. We successfully applied the developed methods to urban design and a design project for a building complex. In conclusion, the proposed method was evaluated to be both feasible and effective.
Introduction
As part of the Fourth Industrial Revolution1,2 and Society 5.0, 3 huge amounts of information (big data) generated by sensors and Internet of Things devices in the physical world will be accumulated in the virtual world. Artificial intelligence will be used to analyse these data, and the results will be fed back in various forms to humans in the physical world. 4
Visual simulations are extremely useful and powerful for the decision-making involved in architectural and urban design. 5 Virtual reality (VR), which has the features of presence, interaction and autonomy, 6 has been applied in this domain and was selected as one of the 14 Grand Challenges for Engineering in the 21st Century. 7 VR can construct past and future virtual worlds using three-dimensional (3D) computer graphics (CG) objects and can provide interactive movies via real-time rendering processing. VR is already used for various purposes such as design studies, presentation, simulation and communication in the fields of computer-aided architectural design (CAAD) and architecture, engineering and construction.8–12 This research explores new challenges for VR and considers the following situations and problems.
Various kinds of training must be performed for autonomous driving and automated inspection systems.13,14 Training using the physical world and real objects is relevant but has limitations, including the availability of data and high costs. On the other hand, training using virtual objects can increase the variety in types of training data by allowing the parameters to be changed on a computer, thereby reducing costs. Deep learning, a popular subdiscipline of machine learning due to its high performance, has been applied in CAAD.15 –17 A deep learning approach to segmenting objects in images uses a convolutional neural network (CNN), which requires a large amount of training data for learning, especially for supervised learning. Photographs are often used for learning, but a considerable amount of time and expense is required for collecting a large number of photographs. Moreover, it is also necessary to annotate their paired foreground–background (binary) and parts-based (categorical) shape images for learning. Using a 3D virtual model within a VR space makes it easy to change parameters such as the position and orientation of the virtual camera and the properties of objects, thereby making it possible to readily, automatically and rapidly create a large amount of image data for learning, provided that the accuracy of the 3D model is sufficient. For example, a 3D model with texture mapping using photographs can produce realistic virtual objects that are sufficiently similar to real objects. 18 Moreover, physically based rendering, a method for rendering photorealistic CG objects based on actual physical phenomena, has recently become possible in real-time. 19 However, a system has not yet been developed that can easily generate binary pairs and categorical shape images (Problem A).
Digital representations of physical buildings and urban spaces can be built within a 3D virtual world and used for monitoring and optimising the physical world and real objects. 20 VR is thus expected to act as an interface capable of presenting information to users in the physical world in an easy-to-understand way that utilises various expressions during feedback sessions. 21 When simulating landscapes during architectural and urban design processes, design targets are usually studied in an immersive rendered VR space. However, with such rendering, it is difficult to intuitively understand analytic results for virtual landscapes due to factors such as the distance from the viewpoint to the designed object (i.e. the visual distance, categorised as short, medium and long 22 ) and the shadow-casting duration per point (Problem B). Relevant visualisation considering these factors allows users to more intuitively understand these results.
In recent years, inexpensive head-mounted displays (HMDs) have become commercially available, allowing general users to easily experience VR content. Also, the creation of VR content by using building information modelling (BIM) and 3D CAD software has become increasingly popular in CAAD owing to lower prices and improved production environments. However, the increased use of VR has resulted in a higher incidence of VR sickness. 23 A likely cause of VR sickness is the mismatch between a user’s vision and vestibular senses. 24 During a VR session, there is no vestibular input because the user’s body does not move in real space, while the field of view moves according to movement within the VR space. VR sickness develops because the visual and vestibular inputs do not match. Lo et al.25,26 found that the severity of VR sickness varies depending on the velocity and direction of the movement within the VR space. Guidelines have been developed to prevent and reduce VR sickness, 27 but the severity of VR sickness is subjective, and the camera settings within a VR application largely depend on the experience of its creators. To ensure the quality of VR content and prevent VR sickness, it is necessary to have a function that can visualise movement within VR space during the VR production process (Problem C).
Therefore, the objective of this research is to develop rendering methods for a VR platform to meet the challenges of VR. Our developed rendering methods enable approaches that are more versatile than shaded and textured rendering, which is traditionally used in VR. First, we develop three methods for training deep learning CNNs and analysing landscapes, namely, post-processing rendering, segmentation rendering and shadow-casting rendering. Next, we develop a camera velocity rendering method using a customised segmentation rendering technique that calculates the linear and angular velocities of the virtual camera within a VR space at each frame and overlays a colour on the screen according to the velocity value. This feature will enable creators to identify where in a virtual scene VR sickness is likely to occur.
Literature review
Use of VR in CAAD
VR can replace the physical world with a computer-generated one by using devices such as an HMD, a multiple-screen projection system, and data gloves. The output is very useful for purposes such as visualisation and simulation, and so many applications of VR have been proposed in CAAD, such as simulation,28,29 collaboration,30,31 urban design,32,33 education34,35 and digital heritage.36 –39 The usefulness of VR in large-scale architectural and urban design processes has been demonstrated for various purposes, such as walk-through, comparison of design alternatives, and dynamic real-time simulation.40,41 Such VR content has usually been represented by shaded and textured rendering.
Creating training data for deep learning using VR
CAAD is rapidly emerging as an application domain for deep learning.15 –17 Deep learning techniques such as semantic segmentation 42 and instance segmentation, 43 which are used to detect and outline urban and architectural components, have been applied to autonomous driving, 13 automated inspection, 14 urban visual quality assessment 44 and mixed reality. 45 To improve the detection accuracy of deep learning techniques, it is effective to collect a large number of images to serve as a training set. In this research, we focus on a method for generating images based on a 3D virtual model in a VR system rather than the conventional method of using real-world photographs. Previous studies have shown that such generated images can be used instead of real-world photographs for object detection in deep learning.18,46 However, to date, only a few such studies have been carried out. Further development of VR rendering methods for creating training images for semantic and instance segmentation is necessary to solve Problem A.
VR rendering methods other than immersive rendering
VR is useful for displaying spatially complex structures in computational science in a way that makes them easy to understand and study. Scientific visualisation has been achieved by colouring otherwise invisible targets according to their types and properties, in addition to immersive rendering that presents them as if they were part of the physical world. Temperature distribution and wind flow,47,48 signage visibility 49 and illuminance in lighting design 50 have been visualised in VR systems using heatmaps, streamlines, particles and isosurfaces.
Recently, a programmable shader technique has been developed that is capable of executing shading, a process required for rendering 3D models, using a computer’s graphics processing unit (GPU) in real-time.51,52 The processing load on the GPU can be changed dynamically using a programmable shader. The rendering expression can also be easily extended.
Prevention of VR sickness
Concerns about VR sickness are increasing along with the commercialisation and increasing performance of VR systems. 23 VR sickness occurs while experiencing VR content and has symptoms similar to those of general motion sickness, such as vomiting, coldness and numbness in limbs, nausea, sweating and headache.53,54 VR camerawork, in particular, has a large impact on VR sickness.25,26 In a prior study, VR sickness was investigated by having subjects complete the Simulator Sickness Questionnaire 55 as a psychological evaluation and measuring heart rate variability as a physiological evaluation. However, measurement and visualisation of the virtual camera within the VR space are not common in practical VR production processes. Research on reducing VR sickness by galvanic vestibular stimulation is ongoing. 56 While the severity of VR sickness depends on the individual, collecting data such as the linear and angular velocities of the VR camera during user tests is expected to provide useful information through statistical processing. In the present study, we aim to develop functions to measure and visualise the virtual camera in VR content, including linear and angular velocities.
Proposed rendering methods for VR
To tackle the problems described in Section 1, we first develop three methods to address Problems A and B, namely, post-processing rendering, segmentation rendering and shadow-casting rendering. Next, to address Problem C, we develop a customised segmentation rendering technique that calculates the linear and angular velocities of the virtual camera within a VR space and overlays a colour on the screen according to the velocity value.
The VR software UC-win/Road, which has a customisable rendering function, is selected as the VR development platform in this research. Although other VR platforms also have such customisable functions and can be similarly used for development, these platforms can generally execute rendering processing on a screen only once, while UC-win/Road can execute processing multiple times. It can also implement pre-processing and post-processing rendering, either as a whole or for a particular scene. Furthermore, it can define variable values to be delivered to the shader when rendering objects, which can be used on the shader processing side (Figure 1).

Figure 1. General rendering processing flow.
An overview of the segmentation rendering method is presented in Section 3.2. VR images can be divided into regions and categories corresponding to different objects or parts of objects in real-time. To develop a segmentation method for each type of 3D object constituting urban components, objects are separated into categories such as Road, River, Intersection, Model, RoadSideModel, Character, Car and Sign. As a lower-level hierarchy, Road is further separated into subcategories such as Guardrail, Road, Bridge, Tunnel and Sidewalk. Similarly, Model is further separated into subcategories such as Vehicle, Building, Structure, TrafficLight and Plant (Figure 2). The shader program is implemented in the OpenGL Shading Language (GLSL) and can be customised by the user. These methods can be implemented as plug-in software on UC-win/Road and can be switched on within the VR platform (Figure 3).

Figure 2. The defined object structure for segmentation.

Figure 3. A dialogue box for switching rendering methods in the VR display.
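As a rough illustration of the category structure in Figure 2, the following Python sketch flattens the two-level hierarchy into labels, assigns each label a colour for categorical images, and derives a foreground-background (binary) mask for one category. The label naming scheme (`Category/Subcategory`), the evenly spaced hue palette and all function names are assumptions made for this example; they are not taken from UC-win/Road or from the plug-in.

```python
import colorsys

# Categories and subcategories as listed in the text; the dict layout is an assumption.
CATEGORY_TREE = {
    "Road": ["Guardrail", "Road", "Bridge", "Tunnel", "Sidewalk"],
    "River": [],
    "Intersection": [],
    "Model": ["Vehicle", "Building", "Structure", "TrafficLight", "Plant"],
    "RoadSideModel": [],
    "Character": [],
    "Car": [],
    "Sign": [],
}


def flatten_labels(tree):
    """Return one label per leaf, e.g. 'Model/Building' or 'River'."""
    labels = []
    for category, subcategories in tree.items():
        if subcategories:
            labels.extend(f"{category}/{sub}" for sub in subcategories)
        else:
            labels.append(category)
    return labels


def assign_colours(labels):
    """Give each label a distinct RGB colour by spacing hues evenly (assumed palette)."""
    colours = {}
    for index, label in enumerate(labels):
        r, g, b = colorsys.hsv_to_rgb(index / len(labels), 1.0, 1.0)
        colours[label] = (round(r * 255), round(g * 255), round(b * 255))
    return colours


def binary_mask(label_image, category):
    """Return 1 where a pixel's label is the category or one of its subcategories."""
    return [
        [1 if label == category or label.startswith(category + "/") else 0
         for label in row]
        for row in label_image
    ]
```

Here `label_image` stands for a per-pixel label grid; in a real pipeline it would come from the rendered segmentation image rather than being written by hand.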
Post-processing rendering
We propose a post-processing rendering method as the simplest customisable rendering process (Figure 4). In the pre-processing phase, rendering is directed to a framebuffer, which serves as a temporary rendering target. Post-processing is then applied to the rendering result, and the output is shown on the screen.

Figure 4. The processing flow of the post-processing rendering.
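The flow above (render to an off-screen target, then transform the result before display) can be sketched on the CPU in a few lines of Python. This is only an illustration of the idea, assuming a framebuffer represented as rows of 8-bit RGB tuples; the actual method runs as GLSL shaders on the GPU, and the function names and the Rec. 601 luma weights used here are choices made for the example.

```python
def monochrome(pixel):
    """Convert one RGB pixel to grey using Rec. 601 luma weights (assumed choice)."""
    r, g, b = pixel
    y = round(0.299 * r + 0.587 * g + 0.114 * b)
    return (y, y, y)


def posterise(pixel, levels=4):
    """Quantise each channel to a small number of evenly spaced levels."""
    step = 255 / (levels - 1)
    return tuple(round(round(c / step) * step) for c in pixel)


def post_process(framebuffer, effect):
    """Apply a per-pixel effect to the off-screen render and return the screen image."""
    return [[effect(pixel) for pixel in row] for row in framebuffer]
```

For example, `post_process(frame, monochrome)` returns a grey version of `frame`, while `post_process(frame, posterise)` reduces it to four levels per channel; other effects can be added as further per-pixel functions.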
Segmentation rendering
To generate material for training deep learning CNNs and analysing landscapes, the segmentation rendering method segments 3D virtual objects by colouring each object (Figure 5). In the pre-processing phase of the rendering process flow, a framebuffer is created as a temporary rendering target, and a shader is applied that is programmed to write colour information, normal information, depth information and property values of the scene to five buffer textures. Next, in the object drawing process, the hue values are calculated according to the type, velocity information and acceleration information of the objects, along with the distance from the viewpoint and height information of the terrain, and the results are returned as property values. In the post-processing phase of the rendering process flow, the final output is calculated from the rendered textures. Accordingly, by changing the post-processing settings, it is possible to switch between traditional photographic rendering and rendering coloured by segment, by object velocity or acceleration, by distance from the viewpoint, or by height from the ground.

Figure 5. The processing flow of the segmentation rendering.
The internal-rendering flow of the proposed segmentation rendering method is shown in Figure 6. The input data (matrices, vertices, normals and property values) are sent to the GPU. In the first texture-rendering phase, the vertex-shader process consists of calculating screen positions and normals and then calculating colours by vertex. The fragment-shader process consists of obtaining the texture colour, multiplying it by the colour calculated in the vertex shader, and then sending the colours, depths, normals and property values to screen-space textures. In the next segmentation mixing phase, the vertex-shader process consists of outputting the quad to the screen. The fragment-shader process consists of obtaining colours, normals, depths and property values from screen-space textures and then setting pixel colours based on these values.

Figure 6. Segmentation rendering: Internal-rendering flow.
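The segmentation mixing phase can be pictured as a per-pixel function that reads the screen-space textures and decides the final colour. The Python sketch below is a simplified CPU analogue under several assumptions: each pixel is a dict holding only the shaded colour and one scalar property value (depths and normals are omitted), the property is mapped to a blue-to-red hue ramp, and the mode names `photo`, `segmentation` and `mixed` are hypothetical labels for the settings that the dialogue box would switch.

```python
import colorsys


def value_to_rgb(value, vmin, vmax):
    """Map a scalar property to a hue ramp: blue at vmin, red at vmax (assumed ramp)."""
    t = min(max((value - vmin) / (vmax - vmin), 0.0), 1.0)
    r, g, b = colorsys.hsv_to_rgb((1.0 - t) * 2.0 / 3.0, 1.0, 1.0)
    return (round(r * 255), round(g * 255), round(b * 255))


def mix_pass(gbuffer, mode, vmin=0.0, vmax=100.0, blend=0.5):
    """Second pass: read per-pixel entries and choose the output colour.

    Each entry is a dict with 'colour' (shaded RGB from the first pass) and
    'prop' (a property value such as distance, speed or height).
    """
    output = []
    for row in gbuffer:
        out_row = []
        for px in row:
            seg = value_to_rgb(px["prop"], vmin, vmax)
            if mode == "photo":
                out_row.append(px["colour"])
            elif mode == "segmentation":
                out_row.append(seg)
            elif mode == "mixed":
                # Blend the shaded colour with the property colour.
                out_row.append(tuple(
                    round((1.0 - blend) * c + blend * s)
                    for c, s in zip(px["colour"], seg)
                ))
            else:
                raise ValueError(f"unknown mode: {mode}")
        output.append(out_row)
    return output
```

Because the first pass stores both the shaded colour and the property value, switching modes only changes this second pass and does not require the scene to be drawn again with different materials.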
Shadow-casting rendering
A multi-shadow rendering method simulates the effects of shadow-casting during a full day for a VR city model (Figure 7). In conventional shadow rendering processing, the scene colour and the shadow colour are mixed and displayed. In the proposed method, on the other hand, the scene colour is not used and the parts where shadows fall are black and the parts where the sun shines are white (Figure 8, left). Next, time-varying scenes are rendered multiple times and their rendered results are blended. Then, the scene is rendered such that the parts often illuminated with sunlight are white and the parts often covered in shadow are black. The result is then temporarily written to a framebuffer (Figure 8, middle). Finally, the calculation results are coloured in the post-processing phase and displayed on the screen in the form of a heat map to clearly show the effects of the sunshine (Figure 8, right).

Figure 7. The processing flow of multi-shadow rendering.

Figure 8. Outputs at each rendering step.
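The three steps in Figure 8 (binary sun/shadow images, blending over time, heat-map colouring) can be sketched as follows. This is a minimal CPU illustration, assuming that each time step has already been rendered to a binary mask (1 for sunlit, 0 for shadow); equal weighting of the time steps and the blue-to-red palette are assumptions made for the example, and the function names are hypothetical.

```python
def accumulate_sunlight(masks):
    """Average binary sunlight masks (1 = lit, 0 = in shadow) over all time steps."""
    count = len(masks)
    height, width = len(masks[0]), len(masks[0][0])
    return [
        [sum(mask[y][x] for mask in masks) / count for x in range(width)]
        for y in range(height)
    ]


def heat_colour(lit_fraction):
    """Two-colour ramp: blue for mostly shaded, red for mostly sunlit (assumed palette)."""
    t = min(max(lit_fraction, 0.0), 1.0)
    return (round(255 * t), 0, round(255 * (1.0 - t)))


def shadow_heatmap(masks):
    """Blend the per-time-step masks and colour the result as a heat map."""
    return [[heat_colour(v) for v in row] for row in accumulate_sunlight(masks)]
```

With one mask per time step between sunrise and sunset, the averaged value at each pixel approximates the fraction of the day for which that point is sunlit, which is the quantity the heat map displays.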
Camera velocity rendering
VR content, including the camera path, is defined during pre-processing. Then, during the generation of the VR content, the camera posture is acquired for each frame, and the change in camera posture from the immediately preceding frame is calculated. The linear velocity (m/s) and the angular velocity (deg/s) are calculated from the change in the position and the change in the angle, respectively (Figure 9).

Figure 9. The processing flow of the measurement method for the VR camera’s linear and angular velocities.
The calculation method is as follows. First, an OpenGL model-view matrix is obtained from each frame. The camera posture matrix $M_{Tr}$ is calculated as the inverse matrix of the model-view matrix:
$$M_{Tr} = M_{MV}^{-1}$$
where $M_{MV}$ is the model-view matrix.
Next, the relative camera posture matrix $M_{RelTr}$, which expresses the current camera posture relative to the camera posture of the immediately preceding frame, is calculated as the product of the model-view matrix of the immediately preceding frame and the camera posture matrix:
$$M_{RelTr} = M_{prevMV} \, M_{Tr}$$
where $M_{prevMV}$ is the model-view matrix of the immediately preceding frame and $M_{Tr}$ is the camera posture matrix. Therefore, the output linear velocity and angular velocity are not values in the world coordinate system, but rather values in the local coordinates of the camera posture.
From the calculated matrix, the changes are calculated as follows. The yaw, pitch, and roll of the angular velocity are calculated as Euler angles in the following rotational order: Y-axis, X-axis, then Z-axis. The angular velocity is converted into a frequency.
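The calculated amount of change is multiplied by the frame rate and is output as linear velocity and angular velocity:
$$v = \Delta p \cdot fr, \qquad \omega = \Delta\theta \cdot fr$$
where $fr$ is the frame rate, $\Delta p$ is the change in position and $\Delta\theta$ is the change in angle (yaw, pitch and roll) between consecutive frames.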
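The calculation steps above can be sketched in plain Python as follows. This is an illustrative reading of the text rather than the plug-in's implementation, and it rests on several assumptions: matrices are 4 × 4 nested lists in the mathematical (column-vector) layout, the model-view matrix is a rigid transform without scaling (so its inverse can be formed from the transposed rotation), and the Y-X-Z order is interpreted as R = Ry(yaw) Rx(pitch) Rz(roll). The helper functions `translation` and `rotation_y` exist only to build test poses.

```python
import math


def mat_mul(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def rigid_inverse(m):
    """Invert a rigid transform [R t; 0 1] as [R^T  -R^T t; 0 1]."""
    r_t = [[m[j][i] for j in range(3)] for i in range(3)]
    t = [m[i][3] for i in range(3)]
    inv_t = [-sum(r_t[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [r_t[i] + [inv_t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def euler_yxz_degrees(m):
    """Extract (yaw, pitch, roll) in degrees, assuming R = Ry(yaw) Rx(pitch) Rz(roll)."""
    pitch = math.asin(max(-1.0, min(1.0, -m[1][2])))
    yaw = math.atan2(m[0][2], m[2][2])
    roll = math.atan2(m[1][0], m[1][1])
    return tuple(math.degrees(angle) for angle in (yaw, pitch, roll))


def camera_velocity(prev_mv, cur_mv, frame_rate):
    """Return ((vx, vy, vz) in m/s, (yaw, pitch, roll) in deg/s) in camera-local axes."""
    if prev_mv is None:
        # First frame: no preceding model-view matrix, so the output is NaN.
        nan = float("nan")
        return (nan, nan, nan), (nan, nan, nan)
    m_tr = rigid_inverse(cur_mv)        # camera posture matrix
    m_rel = mat_mul(prev_mv, m_tr)      # posture relative to the preceding frame
    linear = tuple(m_rel[i][3] * frame_rate for i in range(3))
    angular = tuple(angle * frame_rate for angle in euler_yxz_degrees(m_rel))
    return linear, angular


def translation(x, y, z):
    """Hypothetical helper: pure translation matrix."""
    return [[1.0, 0.0, 0.0, x], [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z], [0.0, 0.0, 0.0, 1.0]]


def rotation_y(degrees):
    """Hypothetical helper: rotation about the Y-axis."""
    c, s = math.cos(math.radians(degrees)), math.sin(math.radians(degrees))
    return [[c, 0.0, s, 0.0], [0.0, 1.0, 0.0, 0.0],
            [-s, 0.0, c, 0.0], [0.0, 0.0, 0.0, 1.0]]
```

As a usage example, a camera that moves 0.1 m along its own X-axis between two frames at 30 frames per second gives `camera_velocity(translation(0, 0, 0), rigid_inverse(translation(0.1, 0, 0)), 30.0)`, whose linear component is approximately (3, 0, 0) m/s.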
In the calculation process of the first frame, the linear velocity and the angular velocity cannot be calculated because the model-view matrix of the immediately preceding frame does not exist, so they are output as NaN (not a number). Moreover, when an animation path of the VR camera is switched to another path in a sequence, the camera posture changes abruptly between the immediately preceding frame and the next frame, resulting in a pulse-like change in the calculated velocities. In addition to the calculation process of the VR camera, we propose a customised segmentation rendering technique that overlays a colour on the screen according to the linear or angular velocity value (Figure 10).

Figure 10. The processing flow of a customised segmentation rendering that overlays a colour on the screen according to the linear or angular velocity.
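The overlay step can be illustrated with a short Python sketch. It assumes a scalar speed per frame (for example, the magnitude of the linear velocity or one angular component), user-set minimum and maximum values, a green-to-red ramp and a fixed blend factor; the ramp, the blend factor and the function names are assumptions for the example, and the actual overlay is performed in the shader.

```python
import math


def velocity_overlay_colour(speed, vmin, vmax):
    """Linear ramp from green (at or below vmin) to red (at or above vmax).

    Returns None when the speed is NaN, as for the first frame.
    """
    if math.isnan(speed):
        return None
    t = min(max((speed - vmin) / (vmax - vmin), 0.0), 1.0)
    return (round(255 * t), round(255 * (1.0 - t)), 0)


def overlay(frame, colour, alpha=0.4):
    """Blend one colour over every pixel of the frame; leave the frame as-is if colour is None."""
    if colour is None:
        return frame
    return [
        [tuple(round((1.0 - alpha) * c + alpha * o) for c, o in zip(pixel, colour))
         for pixel in row]
        for row in frame
    ]
```

Setting `vmin` and `vmax` close together makes small velocity changes visible as strong colour changes, which suits verification; suitable values for judging VR sickness would have to come from user tests.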
Evaluation and results
We evaluated the rendering methods presented in Section 3 by applying them to actual architectural and urban VR applications. We loaded the developed plug-in software and VR content into UC-win/Road, executed the rendering methods according to the usage scene, and observed the results. We used a laptop PC (GALLERIA GCR1660TGF-QC) with Intel Core i7-9750H processor, 16.0 GB of RAM, an Nvidia GeForce GTX1660Ti and a 1920 × 1080 display, running Microsoft Windows 10.
Application of post-processing, segmentation and shadow-casting rendering
We applied the developed system to VR applications in urban design and successfully executed a wide variety of rendering methods (Figure 11). Various kinds of VR real-time rendering and user interactions could be observed using large-scale 3D-VR content.

Figure 11. A wide variety of developed renderings in VR.
The post-processing rendering could be output by shaders in various forms, such as a set of ball-like rendering, focused rendering, monochrome rendering, posterisation rendering and artistic rendering. This will also contribute to realising an advanced user experience.
The object segmentation rendering achieved segmentation between each type of 3D object as described in Section 3.1 (Problems A and B). Using the VR segmentation rendering method, binary and categorical shape images could be easily created by changing the VR camera viewpoint (automatic camera movement was possible). The combined immersive and object segmentation rendering method was accomplished by mixing traditional immersive rendering and segmentation rendering, based on the types of 3D objects that were described in Section 3 (Problems A and B). The user could experience VR while simultaneously observing the appearance of the objects and their types.
Segmentation rendering of object velocity was successful for 3D objects such as cars, bikes and pedestrians (Problems A and B). Similarly, the object acceleration segmentation rendering achieved segmentation according to the acceleration of moving 3D objects (Problems A and B). While in traditional shaded and textured rendering it is difficult to intuitively understand the velocity and acceleration values of moving objects, the proposed VR rendering methods enabled intuitive understanding of these values.
Distance segmentation rendering was achieved according to the distance from the VR camera, corresponding to the viewpoint of pedestrians and drivers operating the VR system (Problems A and B). Height segmentation rendering was achieved according to the height from the ground level of the terrain (Problem B). In the landscape design process, it is important to understand the distance of landscape components from various viewpoints. Landscape components look different depending on whether the distance from the viewpoint is short, medium, or long. 22 While this kind of intuitive landscape study is difficult when using traditional immersive rendering, the proposed VR rendering methods enable intuitive understanding of the distance from the viewpoint to each element and the height from the ground surface.
The shadow-casting rendering achieved expression according to the duration of shadow-casting for each point in the VR virtual world at the summer and winter solstices (Problem B). This VR rendering method performed more sophisticated shadow simulation and could support a more detailed design.
Application of camera velocity rendering
We applied the camera velocity rendering method to a building design project in a VR application. The building complex includes a library, a hall, a foyer, a cafe, conference rooms and a Japanese-style tatami room. In the design stage, the building owner (a local government) created VR content in collaboration with one of the authors for reviewing the design with the designers and administrators and for explaining and publicising the building project to the general public. In particular, the entrance space of the library is circular. When the VR animation path in the library moved along a curved path on which furniture such as bookshelves and desks was laid out, the amounts of change in the linear and angular velocity of the VR camera were observed in a way that would be useful for preventing VR sickness.
The ‘Camera log’ option in the ‘Movie Manager Options’ of UC-win/Road was turned on to output the VR camera velocities (Figure 12). Next, when the VR animation was started, the camera posture was acquired for each frame, the change in the camera posture from the immediately preceding frame was calculated, the linear velocity was calculated from the change in position, and the angular velocity was calculated from the change in angle. The calculated X, Y and Z components of the linear velocity of the camera and the yaw, pitch and roll components of its angular velocity were output to a spreadsheet for each frame (Figure 13).

Figure 12. Developed user interface for VR camera measurement.

Figure 13. Spreadsheet of the VR camera’s linear and angular velocity for each frame.
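A per-frame log of this kind can be written and screened with the Python standard library, as in the sketch below. The column names, the CSV layout and the `frames_exceeding` helper are assumptions made for illustration; they do not reproduce the actual format of the UC-win/Road camera log.

```python
import csv

# Hypothetical column names: frame index, linear velocity (m/s), angular velocity (deg/s).
FIELDS = ["frame", "vx", "vy", "vz", "yaw", "pitch", "roll"]


def write_camera_log(records, stream):
    """Write one row per frame; each record is ((vx, vy, vz), (yaw, pitch, roll))."""
    writer = csv.writer(stream)
    writer.writerow(FIELDS)
    for frame, (linear, angular) in enumerate(records, start=1):
        writer.writerow([frame, *linear, *angular])


def frames_exceeding(records, max_angular):
    """Return the frame numbers whose largest angular component exceeds max_angular."""
    return [
        frame
        for frame, (_linear, angular) in enumerate(records, start=1)
        if max(abs(a) for a in angular) > max_angular
    ]
```

Passing an open file (or an `io.StringIO` buffer) as `stream` produces a table that can be opened in a spreadsheet, and `frames_exceeding` gives a quick list of frames to inspect on the animation path.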
The user can switch the rendering methods to the camera’s linear or angular velocity by using the dialogue box in the VR screen shown in Figure 3. The maximum and minimum values can be set to automatically change the colours on the screen depending on the linear and angular velocity values. Line graphs and VR screenshots of the linear and angular velocities of the VR camera for 451 frames are shown in Figures 14–17.

Figure 14. Line graphs and VR screenshots of linear and angular velocities of the VR camera in the building complex project (Frames 1–91).

Figure 15. Line graphs and VR screenshots of linear and angular velocities of the VR camera in the building complex project (Frames 92–211).

Figure 16. Line graphs and VR screenshots of linear and angular velocities of the VR camera in the building complex project (Frames 212–331).

Figure 17. Line graphs and VR screenshots of linear and angular velocities of the VR camera in the building complex project (Frames 332–451).
Limitations
As described in Sections 4.1 and 4.2, the developed rendering methods could be observed in VR software, and the results were verified. In the future, the validity of the methods should be evaluated based on actual usage scenarios in architectural and urban design. The maximum and minimum values for the output VR screen shown in Figures 14–17 were set so that there was a clear colour difference for verification of the developed rendering method. Reasonable settings of the maximum and minimum for preventing VR sickness need to be decided through user tests. The occurrence of VR sickness depends on the individual, but collecting data during user tests is expected to provide useful information through statistical processing.
Conclusion
This paper described the current situation and problems of VR and presented the development of novel VR rendering methods on a VR platform that enable approaches that are more versatile than traditional immersive rendering methods used in VR.
We successfully developed rendering methods consisting of post-processing rendering, segmentation rendering and shadow-casting rendering using shader technology, and applied the methods to obtain a wide variety of rendering outputs in a VR application for an urban design project. The developed system can easily create binary and categorical shape images, in addition to a traditional shaded and textured image that can be used to train deep learning CNNs. It can also be applied to landscape analyses such as calculating the distance from the main viewpoint to the design targets and the shadow-casting duration per point in VR urban spaces.
We successfully developed a customised segmentation rendering technique and applied it to a VR application in a design project for a building complex. During the execution of the VR application, the VR camera posture was acquired for each frame, and the change in VR camera posture from the immediately preceding frame was calculated. The calculated linear and angular velocities were overlaid as colour on the VR screen according to the velocity value and were also output to a spreadsheet for each frame. This information can be used to improve VR animation to prevent VR sickness.
The developed rendering methods were verified in VR software. In the future, the validity of the methods should be evaluated based on actual usage scenarios in CAAD.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was partly carried out as part of the 2017-2018 research activities of the World16 international research group on virtual reality (Yoshihiro Kobayashi, Kostas Terzidis, Marc Aurel Schnabel, Paolo Fiamma, Amar Bennadji, Thomas Tucker, Dongsoo Choi, Matthew Swarts, Ruth Ron, Taro Narahara, Wael Abdelhameed, Marcos Novak, and Tomohiro Fukuda). The research was partly supported by joint research funding from the Sakaiminato City Office and Osaka University. The authors wish to acknowledge the support received from Forum8 Co. Ltd. All images were created by the authors from January to June 2020.
