### Previous posts:

This is another post exploring the attempt to map from a set of distance features back to an x, y coordinate system. If you haven't already read it, you may want to start by reading the Introductory post in the series and work your way through. This post will assume knowledge of what was presented in the previous posts in the series.

## The Process:

I will be looking here at a relatively simple version of the problem (just one set of reference points and a small number of sample points). The process I use here will become a component of the process for more complex versions.

I will start with data that looks like:

Id | Horizontal_Distance_To_Fire_Points | Horizontal_Distance_To_Hydrology | Horizontal_Distance_To_Roadways |

0 | 3212.763650 | 174.497793 | 5159.179288 |

1 | 3135.697623 | 783.345951 | 5433.928096 |

2 | 5247.581151 | 3504.840499 | 7829.386517 |

3 | 1532.853280 | 2141.195728 | 4107.445635 |

4 | 1824.806584 | 2114.739118 | 4413.043552 |

Which comes from samples that were originally positioned like:

### Strategy

My strategy will be to develop a cost function for fitting coordinates to the samples and use gradient descent to try get the cost function to zero (or close to it) and hopefully that will result in a good set of coordinates.

### Gradient Descent

If you are unfamiliar with gradient descent, it can be thought of as though the function being minimized describes a landscape. If you were placed randomly on an unknown landscape and were blindfolded (and didn't have to worry about falling down cliffs), how might you go about trying to find the lowest point? One option would be the following procedure:

- Determine what direction would result in the steepest downhill step
- If every direction is uphill then stop
- Otherwise take a step in the steepest downhill direction
- Go back to step 1

When the procedure ends, you will be at a local low point. Under some conditions you might be able to guarantee that it is the lowest point on the landscape, but often that will not be the case. If it is low enough and you don't care if it is the absolute lowest you can stop and be happy, otherwise you might start at different places and repeat the procedure, picking the smallest of the local minima you find. That's the general idea of the gradient descent algorithm - it uses calculus to find the steepest direction to move and algebra to take its steps.

### Cost Function

In this context, the cost function should give me an idea of how well my chosen x, y coordinates fit the data I was given. The cost of a perfect fit should be 0 and no cost should be negative. It is also helpful if the cost function is differentiable (so the gradient can be found). It is possible to come up with multiple cost functions that will work for our purposes, some may be simpler or better performing than others. I chose a cost function that met the above requirements and kept the calculus and algebra relatively straightforward.

To motivate the cost function I use, let's start with an example of an individual point and reference point:

Suppose my first sample point has horizontal distance to fire of 1500 and my hypothesis is that the sample location is at (100, 200) and the nearest fire point is at (800, 1200). Using those points, I can find the hypothesized distance to fire by:

The amount the hypothesis is off by is 1500 - 1220.6 = 279.4. This is the basic idea I want to capture in the cost function, but have to be a little careful to ensure I don't get a negative value (e.g. if the hypothesized distance was 1700 then 1500 - 1700 = -200). One way I could deal with this is by taking the absolute value of the difference, but that is a little complicated to deal with when it comes to the calculus. The option I chose is to square the difference: this satisfies our requirements and keeps the calculus and algebra reasonable, though it makes the cost function value less intuitive.

The overall cost function takes the idea above and applies it to all of the hypothesized sample/reference point locations. Written out, the part of the cost function related to the fire point would look like:

To get the total cost I would need to also add in the parts corresponding to the water and road points. In practice I also multiply by a scaling factor related to the number of points in the set. This helps make costs comparable across sets of different sizes.

### Implementation

Initially I implemented the gradient descent algorithm in python. It worked okay, but there are a couple of finer points that can be tricky to fine tune well. Eventually I switched over to using the fmin_bfgs function from the scipy.optimize library which works similarly to gradient descent. This left me with less control for fine tuning, but the time savings in not having to fine tune as much allowed me to focus on other parts of the problem.

## Results

After running the gradient descent algorithm, I can compare the plots of the hypothesized points to the true points. Since the hypothesized points may be different up to translation, rotation and/or reflection (but not stretches!) and still be a correct fit, when comparing the plots I may need to perform a translation, rotation and/or reflection. To start with, for simplicity I translate both sets of points so that the fire reference point is at (0, 0). In the plots below the original points are circles and the hypothesized points are Xs.

It looks like the points match up pretty well but need a rotation to really fit - no reflection appears to be necessary this time. After rotating an appropriate amount I get:

Looks like a good fit!