dev-resources.site
for different kinds of informations.
Reverse engineer unknown data format using elixir
I recently worked on porting the csv file converter for AdventureWorks database from ruby to elixir. This ended successful and I have all the data in my local postgresql to play with. By browsing the tables I found spatial_location
column in person.address
table and I wanted to decode it.
|address_line1 |city |spatial_location |
|--------------------|-------|--------------------------------------------|
|1970 Napa Ct. |Bothell|E6100000010CAE8BFC28BCE4474067A89189898A5EC0|
|9833 Mt. Dias Blv. |Bothell|E6100000010CD6FA851AE6D74740BC262A0A03905EC0|
|7484 Roundtree Drive|Bothell|E6100000010C18E304C4ADE14740DA930C7893915EC0|
|9539 Glenside Dr |Bothell|E6100000010C813A0D5F9FDE474011A5C28A7C955EC0|
|1226 Shoe St. |Bothell|E6100000010C61C64D8ABBD94740C460EA3FD8855EC0|
|1399 Firestone Drive|Bothell|E6100000010CE0B4E50458DA47402F12A5F80C975EC0|
|5672 Hale Dr. |Bothell|E6100000010C18E304C4ADE1474011A5C28A7C955EC0|
|6387 Scenic Avenue |Bothell|E6100000010C0029A5D93BDF4740E248962FD5975EC0|
|8713 Yosemite Ct. |Bothell|E6100000010C6A80AD742DDC4740851574F7198C5EC0|
As you can see the data looks like binary data.
I found even some documentation from microsoft that may be useful here but in the end I could not use it with luck.
So what I end up doing to decode it.
First what I know and may be useful:
- I got the city and address
- I know that the data is binary encoded as string because there is nothing highier than F.
- The string is 44 letters long - every byte is encoded using 2 letters so I have 22 bytes of data
Lets jump to iex:
iex(1)> encoded = "E6100000010C6A80AD742DDC4740851574F7198C5EC0"
"E6100000010C6A80AD742DDC4740851574F7198C5EC0"
iex(2)> String.length(encoded)
44
iex(3)> decoded = Base.decode16!(encoded)
<<230, 16, 0, 0, 1, 12, 106, 128, 173, 116, 45, 220, 71, 64, 133, 21, 116, 247,
25, 140, 94, 192>>
Next thing that I did was to find what exactly can be hidden in my data:
The beginning and end look the same between the first lines I found - it may be some kind of wrapper and I may need to skip it.
I also searched for the actual coordinates of Bothell to see if I can see some similarities:
It looks like the data encoded is around 47.5
and -122.5
so let's move back to iex.
iex(3)> decoded = Base.decode16!(encoded)
<<230, 16, 0, 0, 1, 12, 106, 128, 173, 116, 45, 220, 71, 64, 133, 21, 116, 247,
25, 140, 94, 192>>
iex(4)> <<47.0::float>>
<<64, 71, 128, 0, 0, 0, 0, 0>>
iex(5)> <<47.99::float>>
<<64, 71, 254, 184, 81, 235, 133, 31>>
I tried how float 47 looks like in binary form. It starts with
64, 71
and has 8 bytes. Looking at the encoded data I have 64 and even 71 - just in reverse order, is it a coincidence??
Let's try with -122
<<230, 16, 0, 0, 1, 12, 106, 128, 173, 116, 45, 220, 71, 64, 133, 21, 116, 247,
25, 140, 94, 192>>
iex(7)> <<-122.0::float>>
<<192, 94, 128, 0, 0, 0, 0, 0>>
iex(8)> <<-122.9::float>>
<<192, 94, 185, 153, 153, 153, 153, 154>>
192, 94
- again I can see similar pattern but in reverse order at the end.
It turn's out that I can use the little endian modifier to create similar looking data:
iex(9)> <<-122.9::little-float>>
<<154, 153, 153, 153, 153, 185, 94, 192>>
Hmm - definitely looks similar. Especially that it's 8 numbers from the end and If I look at the encoded data then on the 9th position I have 64 which is the possible beginning of 47.0 encoded as binary. When removing this data I end up with 6 bits at the beginning. Back to iex:
iex(10)> <<_::binary-size(6), x::little-float, y::little-float>> = decoded
<<230, 16, 0, 0, 1, 12, 106, 128, 173, 116, 45, 220, 71, 64, 133, 21, 116, 247,
25, 140, 94, 192>>
iex(11)> x
47.7201372000862
iex(12)> y
-122.189084876407
Was it really so easy??
Close but I can't see the street 8713 Yosemite Ct.
Lets try with another one:
6387 Scenic Avenue, Bothell
iex(15)> <<_::binary-size(6), x::little-float, y::little-float>> =
iex(..)> Base.decode16!("E6100000010C0029A5D93BDF4740E248962FD5975EC0")
<<230, 16, 0, 0, 1, 12, 0, 41, 165, 217, 59, 223, 71, 64, 226, 72, 150, 47, 213,
151, 94, 192>>
iex(16)> x
47.7440139824339
iex(17)> y
-122.372386833918
Again it's not a complete hit. Let's se if we can be more specific?
From the ms docs I found that the encoded data starts with SRID
SRID (4 bytes): (32 bit integer) The SRID for the geography. GEOGRAPHY structures MUST use
SRID values in the range of 4120 through 4999, inclusive, with the exception of null geographies.
A value of -1 indicates a null geography. When a null geography is indicated, all other fields are
omitted. Default SRID for GEOGRAPHY instances is 4326. Default SRID for GEOMETRY instances is
zero (0). For GEOMETRY instance, SRID can be any value: SRID is not constrained.
Let's againg try to decode it, this time it's an integer.
iex(35)> <<srid::little-integer-size(32), _::binary>> = Base.decode16!("E6100000010C0029A5D93BDF4740E248962FD5975EC0")
<<230, 16, 0, 0, 1, 12, 0, 41, 165, 217, 59, 223, 71, 64, 226, 72, 150, 47, 213,
151, 94, 192>>
iex(36)> srid
4326
This is in the 4120..4999 range and after short google search it turns out that it's a standard encoding that is used by openstreetmap.
Let's check how far are we off at least?
Ohh - maybe that's the problem? Let's try another one:
š¤¦š¤¦š¤¦š¤¦š¤¦š¤¦š¤¦
It turn's out that the data is synthetic so the exact location won't be shown. But it does not mean that it was not worth checking out how to do this. I definitely learned something along the way and I hope so are you when reading it.
Featured ones: