I like travelling. I like flying around world to see different culture and meet different people. Since Aug.28 2010, the date when I enter the U.S, I’ve been to more than 10 countries and taken more than 200 flights. One day I thought it would be pretty cool if I could plot all my travel history using R and make it an animation. The above GIF is what I got. It shows the date of flight, cities, distances, total distance since since Aug 2010, and it moves at a constant speed. Now let me show you how I did it.

Data

First thing first, I need all my travel history. I spent one night digging into my email, and found all my flight reservations and boarding passes. It looks like this if you put them into one file. You can download my data here to replicate all results.

Find the longitude and latitude

Now I need to find the longitude and latitude for all above airports. Package MUCflights contains all airport informations in the world, including city, country, longitude and latitude.

library(doSNOW)
library(MUCflights)
library(ggmap)
library(png)
library(dplyr)
library(geosphere)
library(data.table)

# get all airport information
data(airports)

Data cleaning

To match all my airport code with the IATA code in data airports, I need to seperate the From and To variables in my data Then I need to remove all duplicate cities, since I may arrive at one city and leave a city at different time, but they will be two records in my data.

# seperate data into long format
sepFromTo <- function(x){
result <- data.frame()
result <- data.frame(Date= rep(x[1], 2), IATA = c(x[2], x[3]) )
result
}

data$From <- as.character(data$From)
data$To <- as.character(data$To)

# seperate the From and To variables, make it long format
data_longlat <- apply(data, 1,  sepFromTo)
data_longlat <- do.call(rbind,data_longlat)
row.names(data_longlat) <- NULL

# remove duplicate data
data_longlat <- unique(data_longlat)
# using lag function to remove duplicate cities next to each other
data_longlat$nextone <- lag(data_longlat$IATA) == data_longlat$IATA data_longlat$nextone[1] <- FALSE # keep the first city
data_longlat<- data_longlat[ data_longlat$nextone == FALSE , ] # adding ID varialbes since I may take multiple flight on same day. data_longlat$id <- 1:nrow(data_longlat)

# find all airports I've been to in the airport data set
sublat <- airports[ airports$IATA %in% unique(data_longlat$IATA )   ,
c("IATA", "Latitude" , "Longitude", "City") ]

# merge my data with the sublat dataset by airport IATA name
data_longlat <- merge(data_longlat, sublat, by="IATA")
data_longlat <- data_longlat[order(data_longlat$Date),] # order the data by date data_longlat <- data_longlat[ order(data_longlat$id),] # order the data by ID

Plot data on Map

It is very easy to plot data on map once you have the longitude and latitude. Here I use Google map. ggmap can be used to plot the world map or reginal map. geom_point can be used to add points on the map.

# center of your figure
mycenter <- c( 2.359444 , 38.72528 )

# plot the map
p0 <-  ggmap(get_googlemap(center = mycenter, zoom=1, maptype= "hybrid",
size = c(640,360)), extent='device')

p1 <- p0 +
geom_point(x = data_longlat$Longitude[5], y = data_longlat$Latitude[5],
shape=21, fill="yellow", size=2) +
geom_point(x = data_longlat$Longitude[6], y = data_longlat$Latitude[6],
shape=21, fill="yellow", size=2)

p1

geom_text can be used to add text on the map. There is a problem. If two cities were too close, their name would overlap. We need to adjust the location of label based on two cities’ relative location. We want the city on the left with lable on the left, and the city on the top with the label on the top.

coord1 <- c(data_longlat$Longitude[6], data_longlat$Latitude[6])
coord2 <- c(data_longlat$Longitude[7], data_longlat$Latitude[7])

# indicator for Longitude
ind_long <-  ifelse( coord1[1] < coord2[1], 1, -1)

# indicator for Latitude
ind_lat  <-  ifelse( coord1[2]< coord2[2] , 1 ,-1)

p2 <- p1 +
geom_text(x=  coord1[1] - 15*ind_long ,
y=  coord1[2]  - 8*ind_lat,
label= data_longlat$City[6] , size=3, col="white") + geom_text(x= coord2[1] + 5*ind_long , y= coord2[2] + 5*ind_lat, label= data_longlat$City[7] ,
size=3, col="white")

p2

geom_segment can be used to add straing lines on the map. However, the flight route is never straight. It’s more like a grate circle. If we just use geom_segment, it will look like this.

# add straight line on the map
p3 <- p0 +
geom_point(x = data_longlat$Longitude[1], y = data_longlat$Latitude[1],
shape=21, fill="yellow", size=2) +
geom_point(x = data_longlat$Longitude[2], y = data_longlat$Latitude[2],
shape=21, fill="yellow", size=2)   +
geom_text(x=  data_longlat$Longitude[1] - 15*ind_long , y= data_longlat$Latitude[1]  - 8*ind_lat,
label= data_longlat$City[1] , size=3, col="white") + geom_text(x= data_longlat$Longitude[2] + 5*ind_long ,
y=  data_longlat$Latitude[2] + 5*ind_lat, label= data_longlat$City[2] ,
size=3, col="white")

p4 <- p3 + geom_segment(x = data_longlat$Longitude[1], y = data_longlat$Latitude[1],
xend = data_longlat$Longitude[2], yend = data_longlat$Latitude[2],
col='red', alpha=0.5)
p4

Great Circle

To make an animation like the GIF, we need to do two things. First is to get the great circle between cities, then break the whole route into small intervals. I found some useful information on line, and I got the get_paths function form here. get_paths function will get the great circle route between one point to all points in the data, and break each route into 52 segments. I will just use two points at a time to generate their great circle.

# get_paths function from here https://www.r-bloggers.com/animated-great-circles-on-rotating-3d-earth-example-r-code/.
#  get_paths function will get the great circle route between one point to all points in the data. I will just use two points at a time to generate their great circle.
get_paths <- function(x, idx, ...) {
gcInt <- function(x, x1, x2) {
x <- gcIntermediate(x[x1, ], x[x2, ], ...)
if (is.list(x)) {
x <- x %>% purrr::map2(c(x1, x1 + 0.5), ~data.frame(.x, .y)) %>%
bind_rows %>% setnames(c("long", "lat", "group"))
} else x <- data.frame(x, x1) %>% setnames(c("long", "lat", "group"))
x
}
purrr::map(setdiff(1:length(x), idx), ~gcInt(x, .x, idx)) %>% bind_rows
}

allpath <- data.frame()
for ( i in 2: nrow(data_longlat) ){

# We need two point at a time.
test <- data_longlat[  (1:2)+i-2  ,c("Longitude", "Latitude",  "Date", "City")]
colnames(test)[1:2] <- c("lon", "lat")

# genderate the spatial points of two cities
p <- SpatialPoints(cbind(test$lon, test$lat), proj4string = CRS("+proj=longlat +datum=WGS84"))
idx1 <- 2  # great circles from coords in all other rows to coords in this row

# get the path between two cities
paths1 <- get_paths(p, idx1, addStartEnd = TRUE)

# other information we may need in the plot
paths1$Date <- test$Date[1] # date
paths1$City <- paste0(test$City[1]," - ",test$City[2]) # Route paths1$start <- test$City[1] # Departure city paths1$end <- test$City[2] # Arrival city paths1$group <- i-1 # the Xth flight

# calculate the distance between two cities
paths1$truedis <- rep(distm(p[1,], p[2,], fun = distHaversine), 52)/1000 allpath <- rbind(allpath, paths1) } Get the number of data point based on the distance. get_paths function will break each distance into 52 intervals. It is a problem if two points are too close, which means a flight with 1,000 miles and a flight with 10,000 will have the same time length in our animation. What I want is the length of time(number of intervals) are proportional to its distance. Hence, we need to get the weight of each flight segment. # calculate the weight based on the distance compare to the longest distance # the longest flight will have weight 1 with 52 intervals. allpath$distance <- allpath$truedis / max(allpath$truedis)

suball <- allpath[ seq(1,nrow(allpath), by=52) ,]

# get the cumulate distance
alldist <- c(0, cumsum(suball$truedis)) allpath_new <- data.frame() for (i in 1:(nrow(data_longlat) -1)){ temp_old <- allpath[allpath$group == i ,] # subset data

mydist <- temp_old$distance[1] # sample data based on the weight, all flight will have the first and last point allunif <- sort(c(1,52, sample( 2:51, round(50*mydist) ))) temp <- temp_old[allunif,] allpath_new <- rbind(allpath_new, temp) } allpath_new$id <- 1:nrow(allpath_new)

allpath_new[,c(1,2,8,9)] <- round(allpath_new[,c(1,2,8,9)],2)

DT::datatable(allpath_new[1:100,])

Make it an animation

Now we are ready the make an animation. The trick to make animation in R is that you get tons of pictures, and compress them into GIF or MP4 files. I parallelize the code on cluseter to make if faster. For more detail, you can see my post about “High performance computing in R using doSNOW package”.

Save multiple PNG file

# Do not run this on your computer!
# You can try a smaller sample size

NumberOfCluster <- 12
cl <- makeCluster(NumberOfCluster) # Make clusters

registerDoSNOW(cl) # use the above cluster
foreach (i=1: (last(allpath_new$group) -1) ) %dopar% { library(MUCflights) library(ggmap) library (png) library(dplyr) library(geosphere) library(data.table) # subset of each flight temp <- temp_old <- allpath_new[allpath_new$group == i ,]
mydist <- temp_old$truedis[1] # center of the plot mycenter <- c( 2.359444 , 38.72528 ) mydate <- temp$Date[1]
mycity <- temp$City[1] # p0 is the same for all loop, you can save p0 and read it instead of read it from Google everytiem p0 <- ggmap(get_googlemap(center = mycenter, zoom=1, maptype = 'satellite', size = c(640,320)), extent='device') # add flight information, date, city, distance p1 <- p0 + geom_text(x=mycenter[1] , y=mycenter[2] - 80 , label=paste0(mydate, "\n", mycity, "\n", round(mydist,2), " KM"), size=3, col="white") + geom_point(x= temp_old$long[1], y= temp_old$lat[1], shape=21, fill="yellow", size=2) + geom_point(x= last(temp_old$long), y= last(temp_old$lat), shape=21, fill="yellow", size=2) # indicater of label ind_long <- ifelse(temp_old$long[1] < last(temp_old$long), 1, -1) ind_lat <- ifelse(temp_old$lat[1] < last(temp_old$lat), 1,-1) p1 <- p1 + geom_text(x= temp_old$long[1] - 5*ind_long , y= temp_old$lat[1]-5*ind_lat , label= temp_old$start[1] , size=3, col="white") +
geom_text(x= last(temp_old$long) + 5*ind_long , y= last(temp_old$lat) + 5*ind_lat,
label= temp_old$end[1] , size=3, col="white") p1 <- p1 + geom_segment(x = temp_old$long[1], y = temp_old$lat[1], xend = last(temp_old$long),
yend = last(temp_old$lat), col='red', alpha=0.5) # loop for all intervals for (j in 1:nrow(temp)){ x <- temp[j , ] if ( j >= 2) { xp <- temp[j-1 , ] df <- data.frame(x1 = x$long, x2 = xp$long[1], y1 = x$lat, y2 = xp$lat[1]) # For flight across the boundaries of map, we don't want to plot this two points if ( abs(x$long-xp$long[1]) < 300 ){ p1 <- p1 + geom_segment(x = x$long, y = x$lat, xend = xp$long[1],
yend = xp$lat[1], col='red', alpha=0.5) } } # adding total distance up to this interval p2 <- p1 + geom_point(data = x,aes(x = long, y = lat), colour = 'red', alpha=0.7) + geom_text(x= 160 , y= 80 , label= paste0("Total Distance: ", round(alldist[i] + (j-1) * mydist/nrow(temp),2), " KM" ) , size=3, col="white") # save png files, png(sprintf(paste0( "myflight_%05d.png"), x$id), width = 640,
height = 320, res = 100, bg="black")
print(p2)
dev.off()

}
}
stopCluster(cl) # close clusters

Convert PNG to GIF/MP4

Finally, once you have all the plot, you can compress it using ffmpeg. More detail about mapmate can be found here.

# https://leonawicz.github.io/mapmate/index.html
p <- "myflight_%05d.png"
out1 <- "flight_50.mp4"
out2 <- "flight_50.gif"

mapmate::ffmpeg(pattern = p, output = out1 , rate = 50)
mapmate::ffmpeg(pattern = p, output = out2 , rate = 50)