Visualizing Geo IP Information using Python

As part of the #OpenSOC event Recon InfoSec recently conducted, we wanted to visualize where all of our participants were coming from. We had several data points to work from, and there are plenty of open tools available, so it is just a matter of cobbling those items together to create a sweet, sweet map.

Geolocating IP addresses

One simple resource we found for mapping an IP address to a physical location is the set of free MaxMind databases. MaxMind provides its "GeoLite2" databases for the cost of creating an account. In our case, we wanted to draw a world map indicating where participants were coming from, so we downloaded the Country database and queried against it.

They also provide a solid Python library on GitHub, which you can use to interact with the databases easily.

In this case, we use the database and Python module as follows:

import geoip2.database

# ip_address is a string holding one participant's IP address
reader = geoip2.database.Reader('GeoLite2-Country.mmdb')
iso_code = reader.country(ip_address).country.iso_code

This returns an Alpha-2 ISO code which we can later use to fill a map.
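
One practical wrinkle: private or reserved addresses (for example, anything in 10.0.0.0/8) typically have no entry in a GeoIP database, and malformed strings cannot be looked up at all. A minimal sketch of a pre-filter using only the standard library might look like this. The helper name `is_lookup_candidate` is hypothetical and not part of the geoip2 module.

```python
import ipaddress

def is_lookup_candidate(value):
    """Return True if value parses as a globally routable IP address."""
    try:
        addr = ipaddress.ip_address(value.strip())
    except ValueError:
        # Not an IPv4/IPv6 address at all (typo, hostname, empty line, ...)
        return False
    return addr.is_global

# Made-up input; in practice this would be the list of participant IPs
raw = ["8.8.8.8", "10.1.2.3", "not-an-ip", "127.0.0.1"]
candidates = [ip for ip in raw if is_lookup_candidate(ip)]
```

Only the addresses left in `candidates` would then be passed to the reader.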

Convert to ISO-3

The plotly map we are using requires the ISO-3/Alpha-3 code rather than the Alpha-2 code returned by MaxMind. Luckily, there is a module for this conversion: pycountry.

import pycountry

alpha_3 = pycountry.countries.get(alpha_2=iso_code).alpha_3

If we bin these codes together, we have a distribution of participants by country:

if alpha_3 not in iso_codes:
  iso_codes[alpha_3] = 0
iso_codes[alpha_3] += 1
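
The same binning can also be written with `collections.Counter` from the standard library, which removes the need for the explicit zero-initialization. A small sketch, with a made-up list of Alpha-3 codes standing in for real lookup results:

```python
from collections import Counter

# Hypothetical lookup results; in practice these come from the pycountry step
alpha_3_codes = ["USA", "CAN", "USA", "DEU", "USA", "CAN"]

iso_codes = Counter(alpha_3_codes)

# most_common() returns (code, count) pairs sorted by descending count
top = iso_codes.most_common(2)
```

A `Counter` behaves like a dict, so the rest of the steps can use it unchanged.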

What about VPNs?

We assumed, given our audience, that many of our security-conscious participants would be using a VPN of some sort. MaxMind provides an Anonymous IP database which aims to tackle exactly this situation, but since WE LOVE OPEN SOURCE EVERYTHING!!, we wanted to figure this out via other methods. There are several GitHub repos which attempt to tackle this in different ways.

ASN checking

If an IP address originates from Amazon or Google Cloud, there is a reasonable probability that the endpoint is a VPC terminating a VPN. So, let's find the ASNs for our IPs and map those against known cloud provider ASNs. Again, MaxMind offers a database for mapping IP to ASN.

reader = geoip2.database.Reader('GeoLite2-ASN.mmdb')
response = reader.asn(ip_address)
org = response.autonomous_system_organization
number = response.autonomous_system_number

ASNs don't tell us much without context, but there are several GitHub repos which maintain a mapping of Cloud/Colo ASNs (e.g. https://github.com/brianhama/bad-asn-list). Find one to your liking, and use it to get a better idea of which addresses may be coming from a cloud provider.

if str(number) in cloud_asns:
  if org not in likely_vpns:
    likely_vpns[org] = 0
  likely_vpns[org] += 1
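
As a rough sketch of how `cloud_asns` might be built: the snippet below assumes the ASN list is a CSV whose first column holds the AS number and whose first row is a header. That layout is an assumption, so check the file you actually download. The function name `load_cloud_asns` is hypothetical, and the sample rows use private-use AS numbers with invented names.

```python
import csv
import io

def load_cloud_asns(lines):
    """Build a set of ASN strings from CSV rows (first column = ASN)."""
    reader = csv.reader(lines)
    next(reader, None)  # skip the assumed header row
    return {row[0].strip() for row in reader if row and row[0].strip()}

# Stand-in for an opened list file; the rows are illustrative only
sample = io.StringIO("ASN,Entity\n64512,Example Cloud A\n64513,Example Cloud B\n")
cloud_asns = load_cloud_asns(sample)
```

The values are kept as strings so they line up with the `str(number) in cloud_asns` check, and a set keeps that membership test cheap.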

Additionally, a simple keyword search will reveal some of the bigger players:

for org in as_organizations:
  if any(keyword in org.lower() for keyword in ["google-cloud", "aws", "amazon"]):
    if org not in likely_vpns:
      likely_vpns[org] = 0
    likely_vpns[org] += 1
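
One caveat with plain substring matching is that a short keyword such as "aws" also matches inside unrelated names. A hedged alternative is to match whole tokens with a regular expression. The pattern and the sample organization names below are illustrative assumptions, not values taken from the ASN database.

```python
import re

# \b anchors each keyword to word boundaries, so "aws" will not match "Lawson"
CLOUD_PATTERN = re.compile(r"\b(google-cloud|aws|amazon)\b", re.IGNORECASE)

def looks_like_cloud(org):
    """Return True if the organization name contains a cloud keyword as a whole token."""
    return bool(CLOUD_PATTERN.search(org))

# Illustrative organization names
orgs = ["AMAZON-02", "Lawson Networks", "GOOGLE-CLOUD-PLATFORM", "AWS Asia"]
matches = [org for org in orgs if looks_like_cloud(org)]
```

Note that `\b` treats an underscore as a word character, so a name like "AMAZON_02" would not match this pattern.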

At the end of this process, we have a grouping of AS organizations, each with a count.

Known VPN/Tor IPs

There are also known VPN services and Tor exit nodes which may not appear in these lists and are unrelated to ASNs. For example, https://github.com/ejrv/VPNs provides a list of these IP addresses, and the Tor Project maintains a list of exit nodes at https://check.torproject.org/torbulkexitlist. We can grab these lists and check the IP addresses against them as well.

with open("tor_list.txt", "r") as f:
  tor_ips = f.read().splitlines()

if ip_address in tor_ips:
  if "tor" not in likely_vpns:
    likely_vpns["tor"] = 0
  likely_vpns["tor"] += 1
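
Some published lists contain CIDR ranges (such as 203.0.113.0/24) rather than single addresses, in which case a plain `in` test on strings will miss them. Here is a sketch using the standard `ipaddress` module, assuming one address or network per line. The helper names are hypothetical, and the sample entries use documentation-reserved ranges, not real VPN data.

```python
import ipaddress

def load_networks(lines):
    """Parse one IP or CIDR per line, skipping blanks and '#' comments."""
    networks = []
    for line in lines:
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue
        # strict=False tolerates entries with host bits set, e.g. 198.51.100.7/24
        networks.append(ipaddress.ip_network(entry, strict=False))
    return networks

def in_any_network(ip, networks):
    """Return True if ip falls inside any of the parsed networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)

vpn_networks = load_networks(["# sample list", "203.0.113.0/24", "198.51.100.7"])
```

A bare address such as 198.51.100.7 is parsed as a /32 network, so single IPs and ranges can live in the same list.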

At this point, we have a decent list of addresses we should exclude from the country mapping results.
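
Putting the checks together, the exclusion step might look like the sketch below. Everything here is a stand-in: `ip_records` is a hypothetical list of (ip, alpha_3, asn) tuples produced by the earlier lookups, and the sets hold made-up values.

```python
from collections import Counter

# Hypothetical per-participant records: (ip, alpha_3, asn)
ip_records = [
    ("203.0.113.10", "USA", 64512),   # cloud ASN, so treated as a likely VPN
    ("198.51.100.20", "CAN", 64999),
    ("192.0.2.30", "DEU", 64998),     # appears in the Tor exit list
    ("198.51.100.21", "CAN", 64999),
]
cloud_asns = {"64512"}
tor_ips = {"192.0.2.30"}

def is_likely_vpn(ip, asn):
    """Flag an address if its ASN is a known cloud ASN or it is a listed Tor exit."""
    return str(asn) in cloud_asns or ip in tor_ips

# Count countries only for addresses that were not flagged
country_counts = Counter(
    alpha_3 for ip, alpha_3, asn in ip_records if not is_likely_vpn(ip, asn)
)
```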

Preparing a map

We chose to use the Plotly Graphing Libraries to build a "choropleth" (think "map colored by value"), a new term I now use frequently to sound smart.

In order to draw a map, you need to provide the graphing library a set of geospatial data which draws the borders, etc. These are often available as a .json file (e.g. https://github.com/topojson/world-atlas) which plotly can ingest. We now have a list of countries and the appropriate data to start building a map.

Mapping

As output from the previous steps, we stored the data as a simple .csv which looked something like:

Country,Participants
USA,400
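
For reference, a file in that shape can be produced with the standard `csv` module. This is only a sketch: `country_counts` is a hypothetical dict of Alpha-3 code to count, and the output goes to an in-memory buffer here rather than a real file.

```python
import csv
import io

country_counts = {"USA": 400, "CAN": 35, "DEU": 12}  # made-up numbers

buffer = io.StringIO()  # swap for open("countries.txt", "w", newline="") to write a file
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["Country", "Participants"])

# Write the largest counts first
for code, count in sorted(country_counts.items(), key=lambda kv: kv[1], reverse=True):
    writer.writerow([code, count])

csv_text = buffer.getvalue()
```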

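Plotly has a requirement for the pandas library, so it is easy to read this data in:

import pandas as pd

# header=0 skips the file's own header row; names= supplies the column names used below
df = pd.read_csv("countries.txt", header=0, names=("country", "participants"))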

Then we can read in the aforementioned geo data:

import json

countries = {}
with open("countries.geo.json", "r") as f:
  countries = json.load(f)

With our data loaded, we can now build the map:

import plotly.express as px

fig = px.choropleth(df.sort_values(by=["participants"], ascending=False),
    geojson=countries,
    locations="country",
    color="participants",
    range_color=(0, max(df["participants"])),
    scope="world",
    projection="natural earth",
    hover_data=["participants"])
fig.show()  # render the interactive map

Let's walk through some of the parameters listed there:

  • df.sort_values(by=["participants"], ascending=False): df is our pandas dataframe (the data) and we want to sort it descending by the number of participants (per country)
  • geojson=countries: countries is our geospatial data we read in earlier which helps plotly draw the borders
  • locations="country": this is the column plotly uses to match each row to a location, i.e. it uses the "Alpha-3" values in the country column to decide which country on the map to color
  • color="participants": this colors each location (country) based on the number of participants
    • Option 2: color="country": this will give each country its own discrete color rather than placing it on a scale
  • range_color=(0, max(df["participants"])): the range for the colors should span 0 to the maximum number of participants
    • Note: This gets a little wonky if you have an outlier on the upper end.
  • scope="world": we want a world map
  • projection="natural earth": use a rounded projection vs a big, flat box projection
  • hover_data=["participants"]: plotly automatically creates a pretty sweet interactive map. This allows us to see the participant count when we hover over a country
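
For the outlier problem noted in the range_color bullet, one possible workaround is to cap the upper end of the range at a percentile rather than the maximum, so that one very large country does not wash out the rest. A sketch with the standard library follows. The counts are invented, the 90th percentile is an arbitrary choice, and `percentile_cap` is a hypothetical helper.

```python
def percentile_cap(values, pct=90):
    """Return the value at roughly the pct-th percentile (nearest rank, rounded down)."""
    ordered = sorted(values)
    # Integer arithmetic avoids floating-point surprises when computing the index
    index = (pct * (len(ordered) - 1)) // 100
    return ordered[index]

participants = [400, 35, 12, 9, 7, 5, 3, 2, 2, 1, 1]  # invented counts

cap = percentile_cap(participants)
range_color = (0, cap)
```

Passing this tuple as `range_color` should make any country above the cap render in the top color of the scale, while the hover text still shows the true count.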

There are a TON of customization and configuration options available so you can bend the map to your will.

voilà

A customized map!

[Image: map output]

Interested in More Recon Goodness?

We run the Network Defense Range, a hands-on experience with the most significant threat groups and attacker techniques on a live, fully functioning network. We will be at Black Hat (again) twice in 2020!

Also, check out OpenSOC (or on Twitter #OpenSoc) to get the latest on the events we are running (like in the Blue Team Village at DEF CON 28 this year!).
