!!html_class cgfs !!html_title Describing and Rendering a Scene - Computer Graphics from Scratch
In the last few chapters, we've developed algorithms to draw 2D triangles on the canvas given their 2D coordinates, and we've explored the math required to transform the 3D coordinates of points in the scene to the 2D coordinates of points on the canvas.
At the end of the previous chapter, we cobbled together a program that used both to render a 3D cube on the 2D canvas. In this chapter, we'll formalize and extend that work with the goal of rendering a whole scene containing an arbitrary number of objects.
Let's think again about how to represent and manipulate a cube, this time with the goal of finding a more general approach. The edges of our cube are 2 units long and are parallel to the coordinate axes, and it's centered on the origin, as shown in Figure 10-1.
These are the coordinates of its vertices:
A = ( 1,  1,  1)
B = (-1,  1,  1)
C = (-1, -1,  1)
D = ( 1, -1,  1)
E = ( 1,  1, -1)
F = (-1,  1, -1)
G = (-1, -1, -1)
H = ( 1, -1, -1)
The sides of the cube are square, but the algorithms we have developed work with triangles. One of the reasons we chose triangles in the first place is that any other polygon, including squares, can be decomposed into triangles. So we'll represent each square side of the cube using two triangles.
However, we can't take just any three vertices of the cube and expect them to describe a triangle on its surface (for example, the triangle ADG cuts through the inside of the cube). This means that the vertex coordinates, by themselves, don't fully describe the cube: we also need to know which sets of three vertices describe the triangles that make up its sides.
Here's a possible list of triangles for our cube:
A, B, C
A, C, D
E, A, D
E, D, H
F, E, H
F, H, G
B, F, G
B, G, C
E, F, B
E, B, A
C, G, H
C, H, D
This suggests a generic structure we can use to represent any object made of triangles: a Vertices list, holding the coordinates of each vertex; and a Triangles list, specifying which sets of three vertices describe triangles on the surface of the object.

Each entry in the Triangles list may include additional information besides the vertices that make it up; for example, this would be the perfect place to specify the color of each triangle.
Since the most natural way to store this information is in two lists, we'll use list indices to refer to the vertices in the vertex list. So our cube would be represented like this:
Vertices
0 = ( 1, 1, 1)
1 = (-1, 1, 1)
2 = (-1, -1, 1)
3 = ( 1, -1, 1)
4 = ( 1, 1, -1)
5 = (-1, 1, -1)
6 = (-1, -1, -1)
7 = ( 1, -1, -1)
Triangles
0 = 0, 1, 2, red
1 = 0, 2, 3, red
2 = 4, 0, 3, green
3 = 4, 3, 7, green
4 = 5, 4, 7, blue
5 = 5, 7, 6, blue
6 = 1, 5, 6, yellow
7 = 1, 6, 2, yellow
8 = 4, 5, 1, purple
9 = 4, 1, 0, purple
10 = 2, 6, 7, cyan
11 = 2, 7, 3, cyan
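For concreteness, here is one way this data could be written down in Python. The variable names and color strings are just illustrative; they aren't part of the book's pseudocode.

```python
# The cube model: a list of vertex coordinates plus a list of triangles.
# Each triangle holds three indices into the vertex list and a color.
vertices = [
    ( 1,  1,  1),   # 0 (A)
    (-1,  1,  1),   # 1 (B)
    (-1, -1,  1),   # 2 (C)
    ( 1, -1,  1),   # 3 (D)
    ( 1,  1, -1),   # 4 (E)
    (-1,  1, -1),   # 5 (F)
    (-1, -1, -1),   # 6 (G)
    ( 1, -1, -1),   # 7 (H)
]

triangles = [
    (0, 1, 2, "red"),    (0, 2, 3, "red"),
    (4, 0, 3, "green"),  (4, 3, 7, "green"),
    (5, 4, 7, "blue"),   (5, 7, 6, "blue"),
    (1, 5, 6, "yellow"), (1, 6, 2, "yellow"),
    (4, 5, 1, "purple"), (4, 1, 0, "purple"),
    (2, 6, 7, "cyan"),   (2, 7, 3, "cyan"),
]
```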
Rendering an object with this representation is quite simple: we first project every vertex, storing the results in a temporary projected vertices list (since each vertex is used an average of four times, this avoids a lot of repeated work); then we go through the triangle list, rendering each individual triangle. A first approximation would look like this (Listing 10-1):
RenderObject(vertices, triangles) {
    projected = []
    for V in vertices {
        projected.append(ProjectVertex(V))
    }
    for T in triangles {
        RenderTriangle(T, projected)
    }
}

RenderTriangle(triangle, projected) {
    DrawWireframeTriangle(projected[triangle.v[0]],
                          projected[triangle.v[1]],
                          projected[triangle.v[2]],
                          triangle.color)
}
We can go ahead and apply this directly to the cube as defined above, but the results won't look good. This is because some of its vertices are behind the camera, which, as we discussed in the previous chapter, is a recipe for weird things. And if you look at the vertex coordinates and Figure 10-1, you'll notice the coordinate origin, the position of our camera, is inside the cube.
To work around this problem, we'll just move the cube. To do this, we need to move each vertex of the cube in the same direction; let's call this direction $\vec{T}$, for translation. We'll translate the cube 7 units forward, so it sits entirely in front of the camera, and 1.5 units to the left, to make the view a bit more interesting; this gives us $\vec{T} = (-1.5, 0, 7)$.

To compute the translated version $V'$ of each vertex $V$, we just add the translation vector to it: $V' = V + \vec{T}$.
At this point, we can take the cube, translate each vertex, and then apply the algorithm in Listing 10-1 to get our first 3D cube (Figure 10-2).
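Here's a rough Python sketch of this step, not the book's code: it translates every vertex of the cube by $\vec{T}$ and then projects it, assuming the perspective projection and viewport-to-canvas mapping from the previous chapter. The viewing distance `d`, the viewport and canvas sizes, and the helper name `project_vertex` are assumptions for illustration; the `vertices` list is the one from the earlier sketch.

```python
# Translation that pushes the cube in front of the camera
# (1.5 units to the left, 7 units forward, as in the text).
T = (-1.5, 0, 7)

# Assumed viewing distance and viewport/canvas sizes.
d = 1.0
viewport_w = viewport_h = 1.0
canvas_w = canvas_h = 600

def project_vertex(v):
    """Perspective projection followed by viewport-to-canvas mapping."""
    x, y, z = v
    # Perspective: scale by the viewing distance, divide by depth.
    vx, vy = x * d / z, y * d / z
    # Viewport to canvas: scale by the ratio of canvas to viewport size.
    return (vx * canvas_w / viewport_w, vy * canvas_h / viewport_h)

translated = [(x + T[0], y + T[1], z + T[2]) for (x, y, z) in vertices]
projected = [project_vertex(v) for v in translated]
```

Each entry of `projected` is the canvas position of one cube vertex; drawing the wireframe triangles between them gives the image in Figure 10-2.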
What if we want to render two cubes? A naive approach would be to create a new set of vertices and triangles describing a second cube. This would work, but it would waste a lot of memory. What if we wanted to render one million cubes?
A better approach is to think in terms of models and instances. A model is a set of vertices and triangles that describes a certain object in a generic way (think "a cube has eight vertices and six sides"). An instance of a model, on the other hand, describes a concrete occurrence of that model within the scene (think "there's a cube at (0, 0, 5)").
How do we apply this idea in practice? We can have a single description of each unique object in the scene and then place multiple copies of it by specifying their coordinates. Informally, it would be like saying, "This is what a cube looks like, and there are cubes here, here, and there."
This is a rough approximation of how we'd describe a scene using this approach:
model {
    name = cube
    vertices {
        ...
    }
    triangles {
        ...
    }
}

instance {
    model = cube
    position = (-1.5, 0, 7)
}

instance {
    model = cube
    position = (1.25, 2, 7.5)
}
In order to render this, we just go through the list of instances; for each instance, we make a copy of the model's vertices, translate them according to the position of the instance, and then render them as before (Listing 10-2):
RenderScene() {
    for I in scene.instances {
        RenderInstance(I)
    }
}

RenderInstance(instance) {
    projected = []
    model = instance.model
    for V in model.vertices {
        V' = V + instance.position
        projected.append(ProjectVertex(V'))
    }
    for T in model.triangles {
        RenderTriangle(T, projected)
    }
}
If we want this to work as expected, the coordinates of the vertices on the model should be defined in a coordinate system that "makes sense" for the object; we'll call this coordinate system model space. For example, we defined our cube such that its center was (0, 0, 0); this means that when we say "a cube located at (1, 2, 3)," we mean "a cube centered around (1, 2, 3)."
After applying the instance translation to the vertices defined in model space, the transformed vertices are now expressed in the coordinate system of the scene; we'll call this coordinate system world space.
There are no hard and fast rules to define a model space; it depends on the needs of your application. For example, if you have the model of a person, it might be sensible to place the origin of the coordinate system at their feet.
Figure 10-3 shows a simple scene with two instances of our cube.
The scene definition we described above doesn't give us a lot of flexibility. Since we can only specify the position of a cube, we could instantiate as many cubes as we wanted, but they would all be facing the same direction. In general, we want to have more control over the instances: we also want to specify their orientation and possibly their scale.
Conceptually, we can define a model transform with these three elements: a scaling factor, a rotation around the origin in model space, and a translation to a specific point in the scene:
instance {
    model = cube
    transform {
        scale = 1.5
        rotation = <45 degrees around the Y axis>
        translation = (1, 2, 3)
    }
}
We can extend the algorithm in Listing 10-2 to accommodate the new transforms. However, the order in which we apply the transforms is important; in particular, the translation must be done last. This is because most of the time we want to rotate and scale the instances around their origin in model space, so we need to do that before they're transformed into world space.
To understand the difference in the results, take a look at Figure 10-4, which shows a cube rotated 45° around its own origin and then translated into the scene. Figure 10-5 shows the translation applied before the rotation: because the rotation still happens around the origin, the cube is swung around the origin of the scene instead of rotating in place, and it ends up in a completely different position.
Strictly speaking, given a rotation followed by a translation, we can find a translation followed by a rotation (perhaps not around the origin) that achieves the same result. However, it's far more natural to express this kind of transform using the first form.
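To see the difference numerically, here's a tiny Python experiment; the `rotate_y` helper and the sample point are made up for illustration. Rotating a point and then translating it gives a different result from translating it first and then rotating it around the origin.

```python
import math

def rotate_y(p, degrees):
    """Rotate a point around the Y axis by the given angle."""
    a = math.radians(degrees)
    x, y, z = p
    return (x * math.cos(a) + z * math.sin(a),
            y,
            -x * math.sin(a) + z * math.cos(a))

def translate(p, t):
    return (p[0] + t[0], p[1] + t[1], p[2] + t[2])

p = (1, 0, 0)
t = (0, 0, 5)

# Rotate around the model-space origin, then translate into the scene:
a = translate(rotate_y(p, 90), t)    # approximately (0, 0, 4)

# Translate first, then rotate: the point swings around the world origin instead:
b = rotate_y(translate(p, t), 90)    # approximately (5, 0, -1)

print(a, b)
```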
We can write a new version of RenderInstance that supports scale, rotation, and position:
RenderInstance(instance) {
    projected = []
    model = instance.model
    for V in model.vertices {
        V' = ApplyTransform(V, instance.transform)
        projected.append(ProjectVertex(V'))
    }
    for T in model.triangles {
        RenderTriangle(T, projected)
    }
}
The ApplyTransform method looks like this:
ApplyTransform(vertex, transform) {
    scaled = Scale(vertex, transform.scale)
    rotated = Rotate(scaled, transform.rotation)
    translated = Translate(rotated, transform.translation)
    return translated
}
The previous sections explored how we can position instances of models at different points in the scene. In this section, we'll explore how to move and rotate the camera within the scene.
Imagine you're a camera floating in the middle of a completely empty coordinate system. Suddenly, a red cube appears exactly in front of you (Figure 10-6).
A second later, the cube moves 1 unit toward you (Figure 10-7).
But did the cube really move 1 unit toward you? Or did you move 1 unit toward the cube? Since there are no points of reference at all, and the coordinate system isn't visible, there's no way to tell just by looking at what you see, because the relative position of the cube and the camera are identical in both cases (Figure 10-8).
Now suppose the cube rotates around you. Or does it? Perhaps it's you who rotated in the opposite direction around the cube; once again, there's no way to tell the difference.
What this thought experiment shows is that there's no difference between moving the camera around a fixed scene and keeping the camera fixed while rotating and translating the scene around it!
The advantage of this clearly self-centered vision of the universe is that by keeping the camera fixed at the origin and pointing toward the positive $Z$ axis, we can keep using the perspective projection equations exactly as we derived them in the previous chapter; the coordinate system of the camera is called camera space.
Let's assume the camera also has a transform attached to it, consisting of a translation and a rotation. In order to render the scene from the point of view of the camera, we need to apply the opposite transforms to each vertex of the scene: first we undo the camera translation, and then we undo the camera rotation:

$$V_{translated} = V_{scene} - camera.translation$$

$$V_{cam} = camera.rotation^{-1} \cdot V_{translated}$$
Note that we represent rotations using rotation matrices. Please refer to the Linear Algebra appendix for more details about this.
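Here is a minimal Python sketch of the camera transform, assuming the camera rotation is stored as a 3×3 numpy matrix (the matrix representation itself is covered in the appendix). Since the inverse of a rotation matrix is its transpose, undoing the camera transform amounts to subtracting the camera position and multiplying by the transposed rotation; the function and variable names are illustrative.

```python
import numpy as np

def world_to_camera(v, camera_position, camera_rotation):
    """Bring a world-space vertex into camera space by applying the
    opposite of the camera's transform: undo the translation first,
    then undo the rotation (inverse of a rotation matrix = its transpose)."""
    translated = np.asarray(v, dtype=float) - np.asarray(camera_position, dtype=float)
    return camera_rotation.T @ translated

# Example: a camera sitting at (0, 0, -5) with no rotation.
camera_position = (0.0, 0.0, -5.0)
camera_rotation = np.eye(3)
print(world_to_camera((1.0, 2.0, 3.0), camera_position, camera_rotation))  # [1. 2. 8.]
```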
Now that we can move both the camera and the instances around the scene, let's take a step back and consider everything that happens to a vertex $V$ from the moment it's defined in model space until it's drawn on the canvas.
We first apply the model transform to go from model space to world space:

$$V_1 = instance.scale \cdot V$$

$$V_2 = instance.rotation \cdot V_1$$

$$V_3 = V_2 + instance.translation$$
Then we apply the camera transform to go from world space to camera space:

$$V_4 = V_3 - camera.translation$$

$$V_5 = camera.rotation^{-1} \cdot V_4$$
Next, we apply the perspective equations to get viewport coordinates:

$$v_x = \frac{V_{5x} \cdot d}{V_{5z}} \qquad v_y = \frac{V_{5y} \cdot d}{V_{5z}}$$
And finally we map the viewport coordinates to canvas coordinates:

$$c_x = \frac{v_x \cdot c_w}{v_w} \qquad c_y = \frac{v_y \cdot c_h}{v_h}$$
As you can see, it's a lot of computation and a lot of intermediate values for each vertex. Wouldn't it be nice if we could reduce all of that to a more compact and efficient form?
Let's express the transforms as functions that take a vertex and return a transformed vertex. Let $C_T$ and $C_R$ be the camera translation and rotation; $I_S$, $I_R$, and $I_T$ the instance scale, rotation, and translation; $P$ the perspective projection; and $M$ the viewport-to-canvas mapping. If $V$ is a vertex of the model and $V'$ is the corresponding point on the canvas, the whole sequence above can be written as

$$V' = M(P(C_R^{-1}(C_T^{-1}(I_T(I_R(I_S(V)))))))$$
Ideally, we'd like a single transform $F$ that combines all of the above, so that transforming a vertex becomes a single, simple operation:

$$V' = F(V)$$
Finding a simple way to represent $F$ isn't trivial. The main obstacle is that the transforms are expressed in different ways: a translation is the sum of a point and a vector, rotation and scaling are the multiplication of a matrix and a point, and the perspective projection involves a division by $z$. But if we could express all of these transforms in the same way, and that way gave us a natural mechanism for composing them, we'd get exactly the single transform we're after.
Consider the coordinates $(1, 2, 3)$. Do they represent a 3D point or a 3D vector? Without more context, there's no way to know.

But let's add a fourth value, called $w$, to remove the ambiguity: $w = 1$ means we're talking about a point, and $w = 0$ means we're talking about a vector. So the point $(1, 2, 3)$ is unambiguously written as $(1, 2, 3, 1)$, and the vector $(1, 2, 3)$ as $(1, 2, 3, 0)$.
Since points and vectors share the same representation, these four-component coordinates are called homogeneous coordinates. Homogeneous coordinates have a far deeper and far more involved geometric interpretation, but that's outside the scope of this book; here, we'll just use them as a convenient tool.
Manipulating points and vectors expressed in homogeneous coordinates is compatible with their geometric interpretation. For example, subtracting two points produces a vector:

$$(8, 4, 2, 1) - (3, 2, 1, 1) = (5, 2, 1, 0)$$

Adding two vectors produces another vector:

$$(0, 0, 1, 0) + (1, 0, 0, 0) = (1, 0, 1, 0)$$
In the same way, it's easy to see that adding a point and a vector produces a point, multiplying a vector by a scalar produces a vector, and so on, just as we expect.
So what do coordinates with a $w$ value other than 0 or 1 represent? They also represent points. In fact, any point in 3D space has infinitely many representations in homogeneous coordinates; what matters is the ratio between the first three coordinates and the $w$ value. For example, $(1, 2, 3, 1)$ and $(2, 4, 6, 2)$ represent the same point.

Of all of these representations, we call the one with $w = 1$ the canonical representation of a point; converting any other representation to its canonical form, or back to its Cartesian coordinates, just requires dividing every coordinate by $w$:

$$(x, y, z, w) \rightarrow \left( \frac{x}{w}, \frac{y}{w}, \frac{z}{w}, 1 \right) \rightarrow \left( \frac{x}{w}, \frac{y}{w}, \frac{z}{w} \right)$$
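A few throwaway Python helpers make these conventions concrete; the function names are illustrative, not part of the book.

```python
def point(x, y, z):
    """A 3D point in homogeneous coordinates: w = 1."""
    return (x, y, z, 1)

def vector(x, y, z):
    """A 3D vector in homogeneous coordinates: w = 0."""
    return (x, y, z, 0)

def to_cartesian(h):
    """Convert any homogeneous representation of a point back to
    Cartesian coordinates by dividing by w."""
    x, y, z, w = h
    return (x / w, y / w, z / w)

# (2, 4, 6, 2) and (1, 2, 3, 1) represent the same point:
assert to_cartesian((2, 4, 6, 2)) == to_cartesian(point(1, 2, 3))
```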
So we can convert Cartesian coordinates to homogeneous coordinates, and back to Cartesian coordinates. But how does this help us find a single representation for all the transforms?
Let's begin with a rotation matrix. Converting a $3 \times 3$ rotation matrix in Cartesian coordinates into a $4 \times 4$ rotation matrix in homogeneous coordinates is trivial: since the $w$ coordinate of the point shouldn't change, we add a column to the right and a row at the bottom, fill them with zeros, and place a 1 in the lower-right element to keep the value of $w$ unchanged:
$$\begin{pmatrix} A & B & C \\ D & E & F \\ G & H & I \end{pmatrix} \cdot \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} x' \\ y' \\ z' \end{pmatrix} \rightarrow \begin{pmatrix} A & B & C & 0 \\ D & E & F & 0 \\ G & H & I & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \cdot \begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix} = \begin{pmatrix} x' \\ y' \\ z' \\ 1 \end{pmatrix}$$
A scaling matrix is also trivial in homogeneous coordinates, and it's constructed in the same way as the rotation matrix:

$$\begin{pmatrix} S_x & 0 & 0 \\ 0 & S_y & 0 \\ 0 & 0 & S_z \end{pmatrix} \cdot \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} x' \\ y' \\ z' \end{pmatrix} \rightarrow \begin{pmatrix} S_x & 0 & 0 & 0 \\ 0 & S_y & 0 & 0 \\ 0 & 0 & S_z & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \cdot \begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix} = \begin{pmatrix} x' \\ y' \\ z' \\ 1 \end{pmatrix}$$

The rotation and scale matrices were easy: they were already expressed as matrix multiplications in Cartesian coordinates, so we just had to add a $w$ coordinate equal to 1. But what can we do with translation, which we had expressed as the addition of a point and a vector?
We're looking for a $4 \times 4$ matrix that, multiplied by the point $(x, y, z, 1)$, produces the translated point $(x + T_x, y + T_y, z + T_z, 1)$.

Let's focus on getting $x + T_x$ first. This value is the result of multiplying the first row of the unknown matrix by the point:

$$x + T_x = \begin{pmatrix} A & B & C & D \end{pmatrix} \cdot \begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix}$$

If we expand the vector multiplication, we get

$$x + T_x = A \cdot x + B \cdot y + C \cdot z + D$$

From here we can deduce that $A = 1$, $B = C = 0$, and $D = T_x$.

Following a similar reasoning for the rest of the coordinates, we arrive at the following matrix expression for the translation:

$$\begin{pmatrix} 1 & 0 & 0 & T_x \\ 0 & 1 & 0 & T_y \\ 0 & 0 & 1 & T_z \\ 0 & 0 & 0 & 1 \end{pmatrix} \cdot \begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix} = \begin{pmatrix} x + T_x \\ y + T_y \\ z + T_z \\ 1 \end{pmatrix}$$
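A quick numpy check of this matrix, with arbitrary numbers:

```python
import numpy as np

def translation_matrix(tx, ty, tz):
    """4x4 homogeneous translation matrix."""
    return np.array([[1, 0, 0, tx],
                     [0, 1, 0, ty],
                     [0, 0, 1, tz],
                     [0, 0, 0,  1]], dtype=float)

p = np.array([1, 1, 1, 1], dtype=float)   # the point (1, 1, 1) with w = 1
print(translation_matrix(2, 3, 4) @ p)    # [3. 4. 5. 1.]
```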
Sums and multiplications are easy to express as multiplications of matrices and vectors because they involve, after all, sums and multiplications. But the perspective projection equations have a division by $z$. How can we express that?

You may be tempted to think that dividing by $z$ is the same as multiplying by $1/z$, and to just put $1/z$ in the matrix; but the entries of a transform matrix must be constants, and $1/z$ depends on the point being transformed, so this doesn't work.

Fortunately, homogeneous coordinates do have one instance of a division: the division by the $w$ coordinate when converting back to Cartesian coordinates. If we can manage to make the $w$ coordinate of the "projected" point equal to the original $z$ coordinate, that final conversion will perform the division by $z$ for us.
Note that this matrix is going to be smaller than the ones we've seen so far: it takes a 3D point in homogeneous coordinates (four components) and produces a 2D point in homogeneous coordinates (three components), so it's a $3 \times 4$ matrix rather than a $4 \times 4$ one.
Applying the same reasoning we used to deduce the translation matrix, we can express the perspective projection as follows:

$$\begin{pmatrix} d & 0 & 0 & 0 \\ 0 & d & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \cdot \begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix} = \begin{pmatrix} x \cdot d \\ y \cdot d \\ z \end{pmatrix} \rightarrow \left( \frac{x \cdot d}{z}, \frac{y \cdot d}{z} \right)$$
The last step is mapping the projected point on the viewport to the canvas. This is just a 2D scaling transform with $S_x = \frac{c_w}{v_w}$ and $S_y = \frac{c_h}{v_h}$.

In fact, it's easy to combine this with the projection matrix above to get a simple 3D-to-canvas matrix:

$$\begin{pmatrix} \frac{d \cdot c_w}{v_w} & 0 & 0 & 0 \\ 0 & \frac{d \cdot c_h}{v_h} & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}$$
After all this work, we can express every transform we need to convert a model vertex $V$ into a canvas point $V'$ as a matrix, and multiplying these matrices together gives us a single matrix $F$ that performs the whole sequence:

$$F = M \cdot P \cdot C_R^{-1} \cdot C_T^{-1} \cdot I_T \cdot I_R \cdot I_S$$

Now transforming a vertex is just a matter of computing the following matrix-by-point multiplication:

$$V' = F \cdot V$$

Furthermore, we can decompose the transform into three parts:

$$M_{Projection} = M \cdot P$$

$$M_{Camera} = C_R^{-1} \cdot C_T^{-1}$$

$$M_{Model} = I_T \cdot I_R \cdot I_S$$

$$F = M_{Projection} \cdot M_{Camera} \cdot M_{Model}$$
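Here is a hedged numpy sketch of this decomposition. The specific transforms, the viewing distance, and the canvas and viewport sizes are all made-up values for illustration, and the small matrix helpers are not the book's code; the point is only to show that the three matrices can be built once and then applied to every vertex.

```python
import numpy as np

def rotation_y(degrees):
    a = np.radians(degrees)
    c, s = np.cos(a), np.sin(a)
    return np.array([[ c, 0, s, 0],
                     [ 0, 1, 0, 0],
                     [-s, 0, c, 0],
                     [ 0, 0, 0, 1]])

def translation(tx, ty, tz):
    m = np.eye(4)
    m[:3, 3] = (tx, ty, tz)
    return m

def scaling(s):
    m = np.eye(4)
    m[0, 0] = m[1, 1] = m[2, 2] = s
    return m

def projection(d, cw, ch, vw, vh):
    # 3x4 matrix combining the perspective projection and the
    # viewport-to-canvas mapping.
    return np.array([[d * cw / vw, 0.0,         0.0, 0.0],
                     [0.0,         d * ch / vh, 0.0, 0.0],
                     [0.0,         0.0,         1.0, 0.0]])

# Model, camera, and projection matrices (illustrative values).
m_model      = translation(-1.5, 0, 7) @ rotation_y(45) @ scaling(1.5)
m_camera     = rotation_y(-30).T @ translation(-1, 0, -2)  # inverse rotation times inverse translation
m_projection = projection(1.0, 600, 600, 1.0, 1.0)

f = m_projection @ m_camera @ m_model   # the full 3D-to-canvas transform

v = np.array([1, 1, 1, 1])              # a model vertex in homogeneous coordinates
x, y, w = f @ v
print(x / w, y / w)                     # canvas coordinates after dividing by w
```

Note that `f` is computed once per instance and then reused for every vertex, which is exactly the point of the decomposition.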
These matrices don't need to be computed from scratch for every vertex (that's the point of using a matrix after all). Because matrix multiplication is associative, we can reuse the parts of the expression that don't change.
A very high-level version of the scene-rendering pseudocode would look like Listing 10-5:
RenderModel(model, transform) {
    projected = []
    for V in model.vertices {
        projected.append(ProjectVertex(transform * V))
    }
    for T in model.triangles {
        RenderTriangle(T, projected)
    }
}

RenderScene() {
    M_camera = MakeCameraMatrix(camera.position, camera.orientation)
    for I in scene.instances {
        M = M_camera * I.transform
        RenderModel(I.model, M)
    }
}
We can now draw a scene containing several instances of different models, possibly moving around and rotating, and we can move the camera throughout the scene. Figure 10-10 shows two instances of our cube model, each with a different transform (including translation and rotation), rendered from a translated and rotated camera.
We covered a lot of ground in this chapter. We first explored how to represent models made out of triangles. Then we figured out how to apply the perspective projection equation we derived in the previous chapter to entire models, so we can go from an abstract 3D model to its representation on the screen.
Next we developed a way to have multiple instances of the same model in the scene without having multiple copies of the model itself. Then we found out how to lift one of the limitations we had been working with so far: our camera no longer needs to be fixed at the origin of the coordinate system or pointing toward the positive $Z$ axis.
Finally, we explored how to represent all the transforms we need to apply to a vertex as matrix multiplications in homogeneous coordinates, and this allowed us to reduce the computations required to render a scene by condensing many of the consecutive transforms into just three matrices: one for the perspective projection and viewport-to-canvas mapping, one for the instance transform, and one for the camera transform.
This has given us a lot of flexibility in terms of what we can represent in a scene, and it also allows us to move the camera around the scene. But we still have two important limitations. First, moving the camera means we can end up with objects behind it, which causes all sorts of problems. Second, the rendering doesn't look so great: it's still a wireframe image.
Note that for practical reasons we won't be using the full projection matrix in the rest of this book. Instead, we'll use the model and camera transforms separately and then convert their results back to Cartesian coordinates as follows:

$$(x, y, z, w) \rightarrow \left( \frac{x}{w}, \frac{y}{w}, \frac{z}{w} \right)$$

Since both the model and the camera transforms leave $w = 1$, this amounts to simply dropping the $w$ coordinate.
This lets us do some more operations in 3D that can't be expressed as matrix transforms before we project the points.
In the next chapter, we'll deal with objects that shouldn't be visible, and then we'll spend the rest of this book making the rendered objects look better.