This web page is sloppy. A nicer presentation of the same material is available as a Dartmouth Technical Report.
Our idea was to somehow print a pattern on the floor of the robots' environment. We were considering printing a pattern on large electrostatic plotters and sticking the sheets to the floor of our lab. Then robots that wanted to navigate the system would have a low-resolution black and white video camera pointed at the floor. From any position in the room, the robot should be able to capture a video frame and compute its position and orientation.
Keith began looking into this problem by trying to generate fields of black and white cells with the property that a given section of the image would be unique with respect to any translation. He wrote a program to generate the field in a brute-force way, by generating new random bits and checking each against every existing bit pattern, but that turned out to be pretty slow. (It may be more feasible with more thoughtful data structures, however.)
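Keith's program isn't reproduced here, but the brute-force idea can be sketched roughly as follows (the field and window sizes are arbitrary, and a real generator would need to backtrack rather than give up when both bit values collide):

    import random

    def generate_field(rows, cols, k=5):
        """Grow a random 0/1 field in which every k-by-k window is unique.

        A sketch of the brute-force idea only; the set of already-seen
        windows is one of the "more thoughtful data structures" that would
        speed the naive version up.
        """
        field = [[0] * cols for _ in range(rows)]
        seen = set()                              # k*k windows already used

        def window(r, c):                         # window with top-left corner (r, c)
            return tuple(field[r + i][c + j] for i in range(k) for j in range(k))

        for r in range(rows):
            for c in range(cols):
                for bit in random.sample([0, 1], 2):
                    field[r][c] = bit
                    if r < k - 1 or c < k - 1:
                        break                     # this cell completes no window yet
                    w = window(r - k + 1, c - k + 1)
                    if w not in seen:             # the one window this cell completes
                        seen.add(w)
                        break
                else:
                    raise RuntimeError("stuck at (%d, %d); a real generator would backtrack" % (r, c))
        return field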
Keith described his approach to me around April of 1997, while I was taking a course on image warping.
I came up with an alternative plan that required three colors: black and
white for the data channel, and grey for a control channel.
In the sample system at left, the ternary cells are arranged into 5x5 supercells. Three of the cells are grey control bits, sixteen are black or white data bits, and six are unused (always white).
Eight data bits are used for the x dimension, and eight for y.
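To make the encoding concrete, here is a sketch of filling in one supercell. The control-bit positions follow the layout described below (the base at the lower left, with the other two control bits three cells above it and three cells to its right); the particular cells chosen for the sixteen data bits and six unused cells, and the bit-to-color convention, are my own illustration rather than the assignment actually printed on the floor.

    BLACK, GREY, WHITE = 0, 1, 2

    def make_supercell(x, y):
        """Return a 5x5 grid of cell colors encoding supercell (x, y), 0 <= x, y < 256."""
        grid = [[WHITE] * 5 for _ in range(5)]    # unused cells stay white
        for r, c in [(4, 0), (1, 0), (4, 3)]:     # base, base+[0,3], base+[3,0]
            grid[r][c] = GREY
        data_cells = [(0, 0), (0, 1), (0, 2), (0, 3), (0, 4),   # hypothetical placement
                      (1, 1), (1, 2), (1, 3), (1, 4),
                      (2, 0), (2, 1), (2, 2), (2, 3), (2, 4),
                      (4, 2), (4, 4)]
        bits = [(x >> i) & 1 for i in range(8)] + [(y >> i) & 1 for i in range(8)]
        for (r, c), bit in zip(data_cells, bits):
            grid[r][c] = BLACK if bit else WHITE  # "1 means black" is also just a convention here
        return grid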
Here is how a video frame is analyzed to determine the camera's (and hence
the robot's) position and orientation. At left is a video snapshot of a printed
section of the pattern, taken by Clyde from the Dartmouth Robotics Laboratory.
First, artifacts from the capturing process are removed. The black band at the top of the image is cropped, and the image is widened to correct the aspect ratio (so that squares in the pattern appear as squares in the image). Barrel distortion is a common artifact of cheap wide-angle lenses; it causes features at the edge of the image to appear smaller than identical features at the center. This distortion is removed, leaving an image (at left) that appears flat.
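As a rough illustration (not the code used here), radial undistortion can be done by resampling each output pixel from a radially shifted position in the captured frame; the distortion coefficient is a placeholder that would really be measured for the lens:

    import numpy as np

    def undistort(img, k1=-0.05):
        """Resample away simple radial (barrel) distortion.

        Each output pixel is sampled from radius r * (1 + k1 * r^2) in the
        captured frame; a negative k1 pulls samples in toward the center,
        stretching the squeezed edges back out.  Nearest-neighbor sampling
        keeps the sketch short.
        """
        h, w = img.shape
        cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
        ys, xs = np.indices((h, w), dtype=float)
        u, v = (xs - cx) / cx, (ys - cy) / cy     # coordinates normalized to the center
        r2 = u * u + v * v
        src_x = np.clip((u * (1 + k1 * r2)) * cx + cx, 0, w - 1).astype(int)
        src_y = np.clip((v * (1 + k1 * r2)) * cy + cy, 0, h - 1).astype(int)
        return img[src_y, src_x]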
Each pixel value in the image is thresholded to one of the three possible values: white, grey, or black (not shown).
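That step is simple enough to sketch directly; the two cut points below are placeholders that would really come from the image's histogram, with the grey control cells sitting between the black and white populations:

    import numpy as np

    def trilevel(img, t_low=85, t_high=170):
        """Threshold an 8-bit greyscale image into 0 = black, 1 = grey, 2 = white."""
        out = np.full(img.shape, 2, dtype=np.uint8)   # start as white
        out[img < t_high] = 1                         # mid-range pixels become grey
        out[img < t_low] = 0                          # darkest pixels become black
        return out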
The purple vector at the center of the image is the point and orientation
whose absolute (world-space) coordinates the algorithm will determine.
The image is then run through an edge-detection convolution (filter) that
highlights edges of sharp contrast. The edge-detected image is cropped
to a circle so that no orientation of line has a disproportionate contribution
to the next step.
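A generic version of this step, assuming a plain gradient-magnitude filter rather than whatever kernel was actually used, might look like:

    import numpy as np

    def edge_map(img):
        """Gradient-magnitude edge detection, masked to a centered circle."""
        img = img.astype(float)
        gx = np.zeros_like(img)
        gy = np.zeros_like(img)
        gx[:, 1:-1] = img[:, 2:] - img[:, :-2]        # horizontal differences
        gy[1:-1, :] = img[2:, :] - img[:-2, :]        # vertical differences
        edges = np.hypot(gx, gy)

        h, w = img.shape
        ys, xs = np.indices((h, w))
        r = min(h, w) / 2.0                           # circular crop, so every line
        mask = (xs - (w - 1) / 2.0) ** 2 + (ys - (h - 1) / 2.0) ** 2 <= r * r
        return edges * mask                           # orientation gets equal weight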
A Hough transform
maps the points in the edge-detected image into a
dual space in which each point represents a line in the original image.
The brightest points in the Hough map correspond to the lines in the
original image that pass through the most edge-detected points.
The result is a set of equations for lines (rather than just pixels
that look like lines to the human eye). These are drawn in over the
image at left.
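For reference, a textbook Hough transform over the edge image can be sketched as follows (generic code, not the original implementation):

    import numpy as np

    def hough_lines(edges, n_best=10, n_theta=180):
        """Vote edge pixels into (theta, rho) space; return the strongest lines."""
        h, w = edges.shape
        thetas = np.deg2rad(np.arange(n_theta))            # 0..179 degrees
        diag = int(np.ceil(np.hypot(h, w)))
        acc = np.zeros((n_theta, 2 * diag + 1), dtype=int)

        ys, xs = np.nonzero(edges > 0)
        for x, y in zip(xs, ys):
            # each edge pixel votes for every line that could pass through it
            rhos = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
            acc[np.arange(n_theta), rhos + diag] += 1

        best = np.argsort(acc.ravel())[::-1][:n_best]      # brightest accumulator bins
        t_idx, r_idx = np.unravel_index(best, acc.shape)
        return [(thetas[t], r - diag) for t, r in zip(t_idx, r_idx)]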
The slope of the lines tells us how far we are rotated from one of the four axes (or compass points); the nearest line intersection is used to discover the boundaries between the squares.
Both corrections (the rotational correction between 0 and 90 degrees, and the translational correction to put the corner of a square at the origin) are applied to the image, and their effect is stored in a matrix so we can later recover the original position of the camera.
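Homogeneous 3x3 matrices are a convenient way to accumulate these corrections so they can be undone later; the angle and offsets below are made-up illustrations, not values from the example image:

    import numpy as np

    def rotation(theta):
        """Homogeneous 2-D rotation by theta radians about the origin."""
        c, s = np.cos(theta), np.sin(theta)
        return np.array([[c, -s, 0.0],
                         [s,  c, 0.0],
                         [0.0, 0.0, 1.0]])

    def translation(dx, dy):
        """Homogeneous 2-D translation by (dx, dy)."""
        return np.array([[1.0, 0.0, dx],
                         [0.0, 1.0, dy],
                         [0.0, 0.0, 1.0]])

    # Suppose the grid lines are 12 degrees off-axis and the nearest corner sits
    # 17 pixels right and 4 pixels down of the origin.  The product applies the
    # rotation first and then the translation; keeping it (or its inverse)
    # around is what lets the camera's original pose be recovered at the end.
    correction = translation(-17, -4) @ rotation(np.deg2rad(-12))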
The image at left is an 80 degree counterclockwise rotation of the previous
image.
At this point, we have found a five-by-five set of cells in the image, and we know our position relative to those cells. The pixels at the center of each cell are averaged and thresholded to determine the cell's color. The red bars show the parts of the image that are discarded to avoid color slopping in from a neighboring cell.
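A sketch of that sampling step, with made-up cell and margin sizes in pixels:

    import numpy as np

    def cell_colors(img, cell_px=20, margin=6, t_low=85, t_high=170):
        """Average the center of each cell in a 5x5 window, then threshold.

        `img` is the rectified greyscale image with cell boundaries falling on
        multiples of `cell_px`; `margin` trims every side of a cell so that a
        neighboring cell's color cannot slop into the average.  All four
        numbers are placeholders.  Returns 0 = black, 1 = grey, 2 = white.
        """
        grid = np.zeros((5, 5), dtype=int)
        for r in range(5):
            for c in range(5):
                y0, x0 = r * cell_px, c * cell_px
                center = img[y0 + margin:y0 + cell_px - margin,
                             x0 + margin:x0 + cell_px - margin]
                m = center.mean()
                grid[r, c] = 0 if m < t_low else (1 if m < t_high else 2)
        return grid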
The control (grey) bits uniquely identify the orientation of the image. In a canonically oriented supercell, control bits (C) are interleaved with data bits (d) this way:

    ddddd
    Cdddd
    ddddd
    ddddd
    CddCd

The lower-left control bit is called the "base."
The control bits are tiled uniformly across the floor pattern, and so we can pretend that the control bits in the image wrap around the edges of the image; that is, we can do pixel math modulo 5. At any orientation, the base control bit is the only one with the property that two control bits appear at base+[3,0]ᵀ and base+[0,3]ᵀ. This test identifies the lower-left grey bit in the image above as the base control bit, and further indicates that the image is 180 degrees rotated from the canonical orientation. This rotational correction is applied (and stored for later), giving this new 5x5 set of cells:
    ---#-
    ---#-
    -#C#C
    #--#-
    ----C

We know that the upper-right C is the base control bit, and so we are actually looking at the junction of four supercells:
    ---# | -
    ---# | -
    -#C# | C
    -----+--
    #--# | -
    ---- | C
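The base-finding test itself is short; here is one way to sketch it, taking the 5x5 window of classified cells (0 = black, 1 = grey, 2 = white) and simply trying all four 90-degree rotations to recover the orientation (the original code may do this differently):

    def find_base(grid, GREY=1):
        """Locate the base control bit and the window's orientation.

        Because the pattern tiles the floor, cell indices wrap modulo 5.  In
        canonical orientation the base is the grey cell that also has grey
        cells three columns to its right and three rows above it.
        """
        for quarter_turns in range(4):
            for r in range(5):
                for c in range(5):
                    if (grid[r][c] == GREY
                            and grid[r][(c + 3) % 5] == GREY
                            and grid[(r - 3) % 5][c] == GREY):
                        # turns applied, and the base position in the rotated window
                        return quarter_turns, r, c
            grid = [list(row) for row in zip(*grid[::-1])]   # rotate 90 degrees clockwise
        return None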
Each supercell's sixteen data bits carry its coordinates, eight bits each: x7..x0 and y7..y0. Our target is the supercell at the upper left of the junction (the one most of the view covers). Reading the cells visible in the image gives:

    from the target supercell:            y6=0  y5=0  y4=1  x5=0  x4=0  x3=1  x1=1  x0=1
    from the supercell to the right:      y7=0
    from the supercells in the row below: y3=0  y2=1  y1=0  y0=1  x7=0  x6=0  x2=0

The supercell directly below the target is in the same column, so it has the same x-coordinate; the x bits read there (x7, x6, and x2) combine directly with the x bits read from the target supercell, giving x7..x0 = 00001011₂ = 11₁₀.
The y-coordinate is a bit trickier. Again, y7 from the neighboring supercell to the right can still be used, because that supercell has the same y-coordinate. But the least-significant y bits appear in the supercells in the row below, whose y-coordinate is one less than ours. Therefore, we extract y3|y2|y1|y0, add one, and bitwise-AND with 0x0f to figure out what the LSB y bits would be in our target supercell if we could see them. In this case, y3..y0 for our target supercell should be (0101₂ + 1) & 0x0f = 0110₂. Combined with the most significant bits, we arrive at a y-coordinate of 00010110₂ = 22₁₀.
Cells in this implementation are 1 cm x 1 cm, so the supercell at (11,22) is at (55,110) centimeters from the origin. We add (1,-2) centimeters to account for the fact that our view was not centered over the supercell, but over a point 1 cell to the right and 2 cells below center, for a final position of (56,108) centimeters.
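Putting the arithmetic of this example together (the bit values are the ones read above; the 1 cm cells and the (1,-2) cell offset are from the text):

    x_bits = {7: 0, 6: 0, 5: 0, 4: 0, 3: 1, 2: 0, 1: 1, 0: 1}
    y_high = {7: 0, 6: 0, 5: 0, 4: 1}    # y7 from the right; y6..y4 from the target supercell
    y_low_below = 0b0101                 # y3..y0 as read in the row below

    x = sum(bit << i for i, bit in x_bits.items())             # 11
    y_low = (y_low_below + 1) & 0x0F                           # 0b0110
    y = sum(bit << i for i, bit in y_high.items()) | y_low     # 22

    cell_cm = 1                          # 1 cm cells, 5 cells per supercell
    position_cm = (x * 5 * cell_cm + 1,  # +1: the view center is 1 cell right of the supercell center
                   y * 5 * cell_cm - 2)  # -2: ... and 2 cells below it
    # position_cm == (56, 108)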
During this process, we have composed a single matrix which accounts for all of the transformations made to the image or detected within it; this matrix is what lets us map the purple vector at the center of the image back to its absolute (world-space) position and orientation on the floor.
Now, a sufficiently long line segment of any orientation will read at least one (x,y)-tuple channel, and at least one strictly-increasing signal. The tuple directly communicates the robot's position. The arctangent of the relative "lengths" of the two increasing signal channels gives the angle of the line segment's path.
That line segment could be provided by an input device as simple as a single reflective optical sensor, perhaps using the robots' wheel shaft encoders to more easily normalize the signal.