Ariel Rodriguez Romero

Get route information from Google Flights

The US government announced that, starting this December (12/2019), it will ban flights from the US to all cities in Cuba except Havana.

This gave me the idea to create a visualization that shows the impact this could have on those smaller airports.

One of the main challenges was getting information about the flight routes that go into Cuba. I found a route database, but it has almost no information about Cuba, which wasn’t a surprise.

Google Flights was another alternative. If you add a starting location and zoom out, it gives you all the destinations in the world from that point.

google flight search

My first idea was to build a scraper and get all the flight information. However, that seemed overcomplicated, especially because I only needed the routes for the top 6 airports in Cuba. So I came up with a different strategy.

I created a function, scrape(), that, when executed in the console, saves all the destination data from the sidebar into a global object named result.

// Global object that accumulates results, keyed by airport code and then by date.
var result = {};
function scrape() {
  // Read the origin airport's IATA code and the selected date from the search form.
  let code = document.querySelector('.gws-flights-form__iata-code').textContent;
  let date = document.querySelector('.gws-flights-form__date-content')
    .textContent;
  if (!result[code]) {
    result[code] = {};
  }
  // Collect one entry per destination card in the sidebar.
  result[code][date] = [
    ...document.querySelectorAll('[jsname="destinationCard"]')
  ].map(element => {
    let destination = element.querySelector('h3');
    let duration = element.querySelector('[class$="__duration"]');
    let price = element.querySelector('[class$="__price-row"]');
    // A card may lack some of these elements; in that case the field is null.
    return {
      price: price && price.textContent,
      destination: destination && destination.textContent,
      duration: duration && duration.textContent
    };
  });
}

This function is fairly simple: 20 lines of code and some manual work were all I needed to get the information I wanted. After pasting it into the console, I had to execute scrape() every time I wanted to save something. Then, I copied the result object to the clipboard and saved it in a JSON file.

copy(result);

I used copy(), a utility function available in the browser console, to put a string representation of the object on the clipboard.
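The nested result object (airport code, then date, then a list of destinations) is handy while collecting, but a flat list of rows is usually easier to load into a plotting tool. Here is a minimal sketch of that step. The flattenResult helper and the sample values are hypothetical and not part of the original workflow; only the shape of the object comes from scrape().

```javascript
// Hypothetical helper (not in the original post): turn the nested result
// object into one flat row per destination.
function flattenResult(result) {
  const rows = [];
  for (const [origin, byDate] of Object.entries(result)) {
    for (const [date, destinations] of Object.entries(byDate)) {
      for (const { destination, price, duration } of destinations) {
        rows.push({ origin, date, destination, price, duration });
      }
    }
  }
  return rows;
}

// Made-up sample data with the same shape that scrape() produces.
const sample = {
  HAV: {
    'Mon, Dec 2': [
      { price: '$100', destination: 'Miami', duration: '1 hr 10 min' },
      { price: null, destination: 'Madrid', duration: null }
    ]
  },
  SCU: { 'Mon, Dec 2': [] }
};

const rows = flattenResult(sample);
console.log(JSON.stringify(rows, null, 2));
```

In the console, something like copy(JSON.stringify(flattenResult(result), null, 2)) would put the flat version on the clipboard instead of the nested one.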

Conclusions

Getting the data with this strategy involved some manual work, since I wanted information for 6 airports and 7 days. However, I’m sure that writing a scraper would have taken me more time, and this just involved calling scrape() 35 times.
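With this many manual calls it is easy to lose track of which airport and date combinations are already saved. A small sketch of a progress check follows. The summarize helper and the sample values are hypothetical; it only assumes the result object has the shape that scrape() builds.

```javascript
// Hypothetical helper (not in the original post): count how many dates have
// been saved for each airport code in the result object.
function summarize(result) {
  const perAirport = {};
  let total = 0;
  for (const [code, byDate] of Object.entries(result)) {
    const dates = Object.keys(byDate).length;
    perAirport[code] = dates;
    total += dates;
  }
  return { perAirport, total };
}

// Made-up sample: two dates saved for HAV, one for VRA.
const progress = summarize({
  HAV: { 'Mon, Dec 2': [], 'Tue, Dec 3': [] },
  VRA: { 'Mon, Dec 2': [] }
});
console.log(progress);
```

Running summarize(result) in the console between scrape() calls gives a quick view of what is still missing.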

I liked the resulting data visualization, and it was a nice way to learn how to work with geodata and ggplot. The entire notebook is published here.

The number of destinations from Havana impressed me.

Havana flight
