Friday, June 5, 2015

Inverse Geographic Mapping: Experimental Setup

This is another post exploring my attempt to map from a set of distance features back to an x, y coordinate system. If you haven't already read it, you may want to start with the introductory post in the series and work your way through. This post assumes knowledge of what was presented in the two previous posts in the series.

Experimental Setup

Because the data I have from Kaggle does not include an x, y coordinate system, it may be difficult to tell whether my approach is effective. In practice there are things that can be done with the Kaggle data that might provide some indication of correctness, but it will be simpler to create my own data sets while doing the initial development and testing of my approach. Below I will describe a couple of steps in setting up my practice data.

The basic setup:

The first step is to randomly generate a set of reference and sample points. Sample points are generated with x and y position values drawn from a normal distribution with a mean of 0 and a standard deviation of 1000, and reference points with position values drawn from a normal distribution with a mean of 0 and a standard deviation of 2000. I generate different numbers of practice points at different times, but for demonstration, here is a graph of 10 sample points with one each of the three reference point types.
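A minimal sketch of this generation step, using only the Python standard library. The function name `generate_points`, its parameters, and the fixed seed are assumptions made for illustration; the post does not show the actual generator.

```python
import random


def generate_points(n_samples=10, n_reference=3, seed=0):
    """Draw sample and reference points from zero-mean normal distributions."""
    rng = random.Random(seed)  # seeded so the practice set is reproducible
    # Sample points: x and y each drawn from Normal(mean=0, sd=1000).
    samples = [(rng.gauss(0, 1000), rng.gauss(0, 1000))
               for _ in range(n_samples)]
    # Reference points: x and y each drawn from Normal(mean=0, sd=2000).
    references = [(rng.gauss(0, 2000), rng.gauss(0, 2000))
                  for _ in range(n_reference)]
    return samples, references


# 10 sample points and one reference point per type (fire, water, road).
samples, references = generate_points()
```

The wider spread for the reference points means they tend to fall farther from the origin than the sample points do.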

The distance between each sample point and each reference point is calculated to get the "horizontal distance to..." fields:

Id  Horizontal_Distance_To_Fire_Points  Horizontal_Distance_To_Hydrology  Horizontal_Distance_To_Roadways
0   3095.843540                         2182.240619                       3553.646848
1   4156.900497                         1200.977046                       4765.299586
2   643.074660                          4513.132292                       4881.010825
3   1820.176689                         4114.332954                       5843.656280
4   2504.993651                         3740.113119                       6036.146318
(The distance fields for the first 5 sample points in the generated set.)
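The distance calculation can be sketched as follows. The helper `distance_rows` and the example coordinates are hypothetical; only the field names come from the table above, and I am assuming the reference points are passed in the order fire, water, road.

```python
import math

# Field-name suffixes in the same order as the reference points (assumed).
REFERENCE_NAMES = ("Fire_Points", "Hydrology", "Roadways")


def distance_rows(samples, references):
    """Build one row per sample point holding its three distance fields."""
    rows = []
    for idx, (sx, sy) in enumerate(samples):
        row = {"Id": idx}
        for name, (rx, ry) in zip(REFERENCE_NAMES, references):
            # Plain Euclidean distance in the x, y plane.
            row["Horizontal_Distance_To_" + name] = math.hypot(sx - rx, sy - ry)
        rows.append(row)
    return rows


# Made-up coordinates, chosen so the distances are easy to check by hand.
samples = [(0.0, 0.0), (300.0, 400.0)]
references = [(3.0, 4.0), (0.0, 10.0), (300.0, 0.0)]
rows = distance_rows(samples, references)
```

For the first sample point this gives distances of 5, 10 and 300 to the fire, water and road points respectively.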

The practice data is now in the form of the Kaggle data. I can run it through my process, take the x and y coordinates that are output and compare them to the original positions of the practice data. There will be no way for the process to determine correct orientation or absolute position, but if it works properly it should find points that are the same locations as the original up to rotation and/or reflection and a translation.
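One way to make that comparison without first solving for the unknown rotation, reflection and translation is to compare pairwise distances, since those are unchanged by all three. This is a sketch of one possible check, not necessarily the comparison I end up using; the name `same_shape` and the tolerance are assumptions. In practice the lists would contain both the sample points and the reference points, in the same order.

```python
import math
from itertools import combinations


def same_shape(original, recovered, tol=1e-6):
    """True if two equally ordered point sets have matching pairwise distances.

    Matching pairwise distances mean the sets agree up to rotation,
    reflection and translation.
    """
    if len(original) != len(recovered):
        return False
    for i, j in combinations(range(len(original)), 2):
        d_orig = math.hypot(original[i][0] - original[j][0],
                            original[i][1] - original[j][1])
        d_rec = math.hypot(recovered[i][0] - recovered[j][0],
                           recovered[i][1] - recovered[j][1])
        if abs(d_orig - d_rec) > tol:
            return False
    return True


# Hypothetical check on a small made-up point set.
original = [(0.0, 0.0), (1000.0, 0.0), (0.0, 500.0), (-300.0, 700.0)]
# Rotate by 90 degrees, reflect, then shift: still the same layout.
moved = [(-y + 50.0, -x - 20.0) for x, y in original]
# Stretch along x only: no longer the same layout.
stretched = [(2.0 * x, y) for x, y in original]

matches_moved = same_shape(original, moved)
matches_stretched = same_shape(original, stretched)
```

Here `matches_moved` is true and `matches_stretched` is false.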

Further Steps:

The setup above is about as simple as I can make it and is a good starting place. As I continue to develop the process I will also need to consider how it handles situations where there is more than one of each reference point type. For example, what happens when there are 3 water points? Does it matter if they are far apart, close together, or in a line? What happens near the boundaries, where a sample point is nearly the same distance from 2 or more of the water points? It is straightforward to extend my practice set generator to include additional reference points, and I will likely include an example of that when I get to exploring the process at that level.
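A rough sketch of what such an extension might look like. It assumes each distance field records the distance to the nearest reference point of that type, which is my reading of the Kaggle fields rather than something I have confirmed; the function names and the example coordinates are hypothetical.

```python
import math
import random


def generate_reference_sets(counts, seed=0, sd=2000):
    """Generate several reference points per type, e.g. {"Hydrology": 3}."""
    rng = random.Random(seed)
    return {name: [(rng.gauss(0, sd), rng.gauss(0, sd)) for _ in range(k)]
            for name, k in counts.items()}


def nearest_distance(sample, points):
    """Distance from a sample point to the closest point of one type."""
    return min(math.hypot(sample[0] - px, sample[1] - py) for px, py in points)


reference_sets = generate_reference_sets(
    {"Fire_Points": 1, "Hydrology": 3, "Roadways": 1})

# Hand-picked water points to illustrate the boundary case.
water = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0)]
near_second = nearest_distance((90.0, 0.0), water)  # closest to (100, 0)
on_boundary = nearest_distance((50.0, 0.0), water)  # tie between the first two
```

A sample point at (50, 0) is 50 units from both of the first two water points, so the distance alone does not say which one was measured.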

I may need to extend the practice set generation even further as I continue to iterate between getting the process running on the practice data and seeing how it performs on the actual data. The next step for the blog, though, is to start looking at the basics of how my current approach works with just the simple practice setup.

Monday, June 1, 2015

Inverse Geographic Mapping: Initial Analysis

Initial Analysis:

(The introductory post)

While the plot in the introductory post made it seem like the data might allow for a mapping from the distance fields to an x, y coordinate system, I wanted to double-check the idea before proceeding. To do so I looked at the dimensions of the inputs and outputs for the proposed mapping. If the dimension of the information being input is smaller than the dimension I would like the inverse process to output, that would suggest the idea is untenable.

Looking again at a couple of sample points:
Id  Elevation  Vertical Distance To Hydrology  Horizontal Distance To Hydrology  Horizontal Distance To Road  Horizontal Distance To Fire Points
2   2590       -6                              212                               390                          6225
5   2595       -1                              153                               391                          6172

Each sample point contributes three pieces of data directly related to re-creating x, y coordinates for the points: the three horizontal distance fields. If I have a group of n sample points with the same 3 reference points, then there will be 3n pieces of information. For the output I will need 2n pieces of information for the sample points (their x, y coordinates), plus 6 total for the x, y coordinates of the 3 reference points. So if 3n is at least as big as 2n + 6, it seems plausible that we might have enough information to perform the inversion; this is the case when n is at least 6. In practice it is a bit more complicated than that since these are not linear systems, but for me it was enough justification to give it a try.
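The counting argument can be written out as a small check. The function names here are made up for illustration, and the count deliberately ignores the nonlinearity mentioned above.

```python
def information_counts(n_samples, n_reference=3):
    """Return (known distances, unknown coordinates) for a group of points."""
    knowns = n_reference * n_samples               # one distance per pair
    unknowns = 2 * n_samples + 2 * n_reference     # x and y for every point
    return knowns, unknowns


def smallest_feasible_n(n_reference=3, limit=1000):
    """Smallest n for which knowns >= unknowns, or None if not found."""
    for n in range(1, limit + 1):
        knowns, unknowns = information_counts(n, n_reference)
        if knowns >= unknowns:
            return n
    return None


n_min = smallest_feasible_n()            # 3n >= 2n + 6 first holds at n = 6
at_five = information_counts(5)          # (15, 16): one short
at_six = information_counts(6)           # (18, 18): just enough
```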

While proceeding, I will make the assumption that the reference points can be treated as discrete points. For example, if a set of points has a pond as its closest water source, my assumption is that the distances were all measured to the same location, rather than each being measured to whichever point on the pond edge is closest to the sample location. I am not certain whether this assumption is warranted and, if it is not, whether I will be able to adjust my algorithm to compensate.
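To make the difference between the two readings concrete, here is a small illustration using a hypothetical circular pond. The circular shape, the radius and the function names are all assumptions chosen to keep the arithmetic simple.

```python
import math


def distance_to_center(sample, center):
    """Discrete-point reading: every sample measures to the same location."""
    return math.hypot(sample[0] - center[0], sample[1] - center[1])


def distance_to_edge(sample, center, radius):
    """Nearest-edge reading: each sample measures to its closest shore point."""
    return max(0.0, distance_to_center(sample, center) - radius)


pond_center = (0.0, 0.0)
pond_radius = 100.0

east_sample = (500.0, 0.0)
south_sample = (0.0, -300.0)

east_center = distance_to_center(east_sample, pond_center)            # 500
east_edge = distance_to_edge(east_sample, pond_center, pond_radius)   # 400
south_center = distance_to_center(south_sample, pond_center)          # 300
south_edge = distance_to_edge(south_sample, pond_center, pond_radius) # 200
```

Under the nearest-edge reading the two sample points measure to different spots on the shore, (100, 0) and (0, -100), which is exactly the situation the discrete-point assumption rules out.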

My next post should be a description of my experimental setup to test my process in a controlled system.