Hidden surface elimination the easy way. Randi J. Rost.
One of the most studied problems in the rapidly growing field of computer graphics is that of hidden surface elimination. This involves detecting which surfaces of objects are hidden by other surfaces and hence should not be drawn. In this article, we take a look at one of the simplest hidden surface algorithms and at a program that implements this algorithm. Along the way, we talk about the perspective transformation itself and clear up some of the jargon about the different coordinate systems we use. Matrix Transformations
The three basic transformations we will be using are scaling, rotation, and translation. Both John D. Fowler [1,2] and Christopher Hansen [3] demonstrated ways of handling these transformations in earlier issues of Creative Computing. A cleaner, more uniform way to represent these transformation equations is by using matrices.
Figure 1 shows the transformation equations along with their matrix representation for the two-dimensional case. Adding a third element to the point (x,y) allows us to multiply it by a 3 x 3 transformation matrix. (Matrix multiplication is defined only for matrices in which the number of columns in the first matrix equals the number of rows in the second.) If your linear algebra is a little rusty, you may want to carry out the matrix multiplications indicated in Figure 1 to verify that the matrix representation provides the same results as the equations.
Only rarely would we want to perform just one of the simple transformations. For instance, if a triangle described by the points v1=(x1,y1), v2=(x2,y2), and v3=(x3,y3) is to be doubled in size but remain with one vertex at v1, three transformations are needed. We cannot just multiply all of the coordinates by two. The result would be a triangle that is twice as large, but would be fixed at the point (2.sup.*.x1,2.sup.*.y1) instead of at (x1,y1). To obtain the desired result, we first need a translation by (-x1,-y1) to translate the triangle to the origin. Then we can scale all the coordinates by two and translate by (x1,y1). This moves our properly scaled triangle back to point v1.
Two or more transformations can be concatenated (combined) to yield a single transformation. This concatenated transformation will give the same result as the separate, simple transformations, provided the sequence of transformations is kept intact. Figure 2 shows that the transformation process is not commutative, since reversing the order of transformations can yield entirely different results.
Using matrices allows us to concatenate transformations quite easily. The individual transformation matrices are multiplied together from left to right in the order in which they are to occur. The result is a single transformation which contains all of the sequence information. This matrix can now be used to transform any number of individual points without having to calculate all the transformation information each time. For a long series of transformations with rotations involving sines and cosines, the savings can be considerable. And we would all like our microcomputers to run at lightning speed, wouldn't we?
The transformation matrices can be generalized to three dimensions as well. Each point in three dimensions is represented as a 1 x 4 row vector that looks like (x y z 1). Similarly, the transformation matrix increases in size to a 4 x 4 matrix. In three dimensions, we have three different kinds of rotation with which to be concerned. It now becomes possible to rotate around the x-axis, the y-axis, or the z-axis instead of simply about the origin as in two dimensions. Matrices to handle these three types of rotation as well as scaling and translation are shown in Figure 3. Coordinate Systems
Now that we have developed the matrix representation needed to describe transformations, we need to define some terms so our discussion may proceed. All of the points to which we have referred so far have been points in a cartesian coordinate system. The units on this coordinate system vary from application to application. An architect may define his coordinate system to have the origin at the front right corner of the building, with the x-axis measuring distance left, the y-axis measuring depth, and the z-axis measuring height. Perhaps he will decide to express the units in meters (or even feet, as is the custom in some backward countries). An astronomer might wish to have the origin of his coordinate system at the center of the sun and the units be in light years or parsecs (contrary to popular belief, these are units of distance, not time).
This somewhat arbitrary choice of a coordinate system corresponds to some individual's view of objects in the real world. For this reason, this coordinate system, once defined, is known as the world coordinate system. The object to be displayed and the position from which it will be viewed can both be given in world coordinates.
The image will eventually have to be displayed using the screen coordinates of a specific display device. The screen coordinate system is a two-dimensional coordinate system that generally uses x to denote horizontal screen position and y to denote vertical position. The Apple screen coordinate system ranges from 0 to 279 along the x-axis and from 0 to 191 along the y-axis.
To calculate the position on the screen of a particular point on the object being displayed, it is necessary first to transform the point into the eye coordinate system. This system has the eye centered at the origin and the z-axis pointed in the direction of view. This gives us coordinates very close to the form needed for displaying on the screen. The x and y axes will be aligned with the x and y axes of the screen, and the z-axis will indicate a depth into the scene (distance to the object.) The Viewing Transformation
Figure 4 demonstrates the steps necessary to transform points in the world coordinate system into points in the eye coordinate system. This is known as the viewing transformation.
The first step (Figure 4a) is to translate the origin to the position of the eye. Any transformation that moves the entire coordinate system is the inverse of the corresponding transformation that moves points. Therefore it is necessary to use negative values for the x, y, and z translation factors.
The second step (Figure 4b) is to rotate clockwise by -90[deg.] about the x-axis to align the y-axis vertically. Next, (Figure 4c) a rotation about the y-axis is performed so that the projection of the z-axis onto the xy-plane will be pointing away from the origin. Another rotation about the x-axis is performed (Figure 4d) to orient the z-axis so that it now points directly away from the origin. The final step (Figure 4e) is to invert the z-axis to point directly at the origin.
We now have coordinates for the object in the eye coordinate system so we are nearly ready to display it. All that is necessary now is to scale the object so that it is an appropriate size on the screen and convert the points into screen coordinates. This last conversion from eye to screen coordinates is accomplished using the following equations: x.sub.s = (x.sub.e./z.sub.e.).sup.*.px+c1 y.sub.s = (y.sub.e./z.sub.e.).sup.*.py+c2 where (X.sub.e.,Y.sub.e.,Z.sub.e.) is the point in eye coordinates, (c1,c2) is the point corresponding to the middle of the display screen, px is the number of pixels horizontally and py is the number of pixels vertically. The point (X.sub.s.,Y.sub.s.) can now be plotted on the screen. This transformation takes straight lines in the eye coordinate system to straight lines in the screen coordinate system. Therefore it is necessary only to transform the endpoints of an edge of an object into screen coordinates, then draw the line connecting them. Hidden Surfaces Be Gone!
One way to make a complicated task easier is to restrict the conditions under which it is guaranteed to work. In this case we will limit the class of objects on which our hidden surface algorithm will work to the set of convex polyhedra. A convex polyhedron is a solid bounded by many faces in which none of the interior angles is greater than 180[deg.]. Thus cubes, solid rectangles, and pyramids are all convex polyhedra, while an L-shaped house is not, since the interior angle at the bend of the L is 270[deg.].
This restriction on the type of object makes it easy to determine which surfaces are visible and which are not. All we have to do is eliminate surfaces of the object that are facing away from the viewer. The geometry of a convex polyhedron guarantees that such faces will be obscured by other faces that are pointed toward the viewer.
How do we determine whether or not a surface is oriented toward the viewer? The first step is to compute an outward-facing normal for the face of the object under consideration. You will recall that a vector normal to a plane is just a vector that is perpendicular to the plane. You may also recall that taking the cross product of two vectors always yields a third vector that is normal to the first two vectors.
Order is important in the cross product operation though. Figure 5 shows the vertices of a square face and two vectors, V and W, that have been defined by the points A, B, and C. Taking V x W results in a vector that is pointing directly out of the page, with its initial point at A. Taking W x V will yield a vector that is also based at A, but pointing directly into the page.
A mnemonic to aid in remembering the orientation of a cross product is the "right hand rule." It states that if you place the bottom edge of your right hand along the first vector and curl your fingers in the direction of the second vector, your right thumb will point in the direction of the cross product vector. Clearly, if we always specify the vertices of a face in counterclockwise order when viewed from the outside, we can always use the first three points to form vectors V and W and V x W will be an outward-facing normal.
Whew! That sounded pretty tough, but really it is not so bad. A vector is formed by taking the x, y, and z components of one point and subtracting the x, y, and z components of the second point. Imagine that the square in Figure 5 is one of the faces of our object. We can get one vector (V) by taking B-A and the second vector (W) by taking C-A. These computations must be performed in the eye coordinate system, since we are concerned with the relationship between the eye and a specific face of the object.
Now we have our outward-facing normal, so let's get a vector from the eye to the base of that normal. The normal we just computed will have its initial point at the first vertex of the face (point A), and the eye will be at the point (0, 0, 0). Therefore, the components of this eye-to-object vector will be the same as the x, y, z components of point A.
Figure 6 shows a viewer looking at the roof of a house. Normals have been computed for both parts of the roof, and the vector between the eye and the base of each normal is also shown. Guess what? The angle between the eye vector and the normal vector is acute (less than 90[deg.]) for the face that is visible, and obtuse (greater than 90[deg.]) for the face that isn't visible. That is all there is to this hidden surface algorithm. All we need to do is compute those two vectors and check the angle between them for each face of the object.
If the angle is obtuse, we throw that face out and go on to the next one. If it is acute, we draw it. Simple, huh?
How do we tell if the angle is greater than 90[deg.]? Two more tricks will help, one each from trigonometry and linear algebra. First, the cosine of any angle between 90[deg.] and 180[deg.] is negative. Second, an easy way to compute the cosine of the angle involved is to take the dot product of the two vectors and divide by the length of each vector. The dot product of two vectors U and V is written as U V and is just U.V = (ux uy uz).(vx vy vz) = ux.sup.*.vx + uy.sup.*.vy + uz.sup.*.vz Since we need to know only whether the cosine is positive or negative, we don't even have to divide the dot product by the lengths of the two vectors. We can simply compute the dot product of the two vectors, and if the result is less than zero, the face is hidden from view and need not be drawn. Degrees of Freedom
There are actually seven parameters, or "degrees of freedom" that must be specified to display an object as seen from any vantage point. Three of these are involved in the difference between the position of the object and the viewing location. Two more degrees of freedom are defined by specifying either two angles or a point toward which the gaze is directed. For instance, when viewing a house, the image is shifted up and down or left and right depending on whether you are looking at the chimney or the doorknob on the front door.
The horizon may also be rotated to alter the orientation of the image. This corresponds to the pictures you get by rotating a camera through some angle. The last parameter is just a scaling factor. A telephoto lens will produce the same result: a larger image. The Program
The program that incorporates the hidden surface algorithm is shown in Listing 1. I have tried to put all of the system-dependent items near the top of the program, and I point them out as we run in to them. The (.sup.*.$S.sup.*.) at the beginning is used to put the Apple Pascal compiler in swapping mode so that larger programs can be compiled. Apple Pascal gets routines for graphics and transcendental functions like sine and cosine from libraries called units. The two special Apple libraries that contain these functions are Turtlegraphics and Transcend. This explains the "uses" statement right after the program statement.
Functions like moving to a point on the graphics screen without drawing, drawing a line, and reading data will also be system-dependent. The eye-to-screen conversion will need to be changed if screen coordinates on your system differ from the 280 x 192 of the Apple. (Also note that the point (0,0) is in the lower lefthand corner in Apple Pascal.)
The program begins by reading in the data for the object to be displayed. The Readdata procedure expects to find the data for an object in a file called DATAFILE.TEXT on a disk in device #4 (slot 6, drive 1.) The data should be organized in the following manner:
1. The first line of the file should contain the number of faces of the object.
2. The object should have one vertex at the point (0 0 0). (All vertices in the file will be in the world coordinate system.)
3. For each face, an arbitrary starting vertex is chosen. X, Y, and Z-coordinates for this point are written on the next line of the file.
4. Successive lines of the file contain the remaining vertices of the face. These vertices should be specified in counterclockwise order when viewed from the outside, so that outward-facing normals can be computed.
5. The starting vertex is also the ending vertex, and is written into the file a second time.
6. A line consisting of -999 0 0 is written to indicate the end of information for that face.
7. Steps 3 through 6 are repeated for each face. Listing 2 shows the data for specifying a cube with edges of length 1.
After the data have been read in, the user is prompted for the viewing parameters. Those familiar with the unforgiving nature of Apple Pascal I/O will know better than to make any typing mistakes. World coordinates are used to specify a position from which to view the object and a point toward which the gaze will be directed. A horizon rotation and a scaling factor complete the information needed to compute the viewing transformation.
Once the transformation matrix is obtained, the screen is cleared, and the object is drawn. For each face, three points are transformed into eye coordinates and used to compute a normal. This vector is then dotted with the vector from the eye. If the result is positive, the rest of the vertices of that face are transformed and the edges drawn. Otherwise, the face is hidden, so it is not necessary to transform the rest of the points or draw the face.
Pascal is useful for breaking a programming task into smaller subtasks. Each of these subtasks is relatively easy to code, leading to an overall reduction of programming effort. In this program, I have used separate procedures for multiplying two matrices, transforming a single point, calculating dot and cross products, and so on. Extensions, Enhancements, Etc.
The most obvious extension to this program is to include data for your own shape, in the format outlined above. Be advised that this program does not do three-dimensional clipping. If you specify a viewing location too near the object, some of the edges may extend off one side of the screen and wrap around to the other. Details on how clipping may be performed can be found in reference [6].
Other enhancements include error-trapping during the user input and the usual speed and efficiency improvements. The format for the data file is quite redundant, sometimes leading to a single vertex being included six times in the file. Such inefficiencies were ignored so as not to clutter the program and to retain maximum clarity of the algorithm itself. Conclusion
The human brain is able to absorb and assimilate graphic information much more readily than numeric information. A perspective view of an architect's design is more readily understood than a textual description. Computer graphics can make the computer easier to use, as well as expanding its usefulness. We are only now beginning to see the impact that graphics can have on business, science and engineering, and the arts. The principles of transformations and coordinate systems discussed in this article will give you a good start toward understanding the graphics applications of tomorrow.