This section lists some examples of public HTTP APIs that publish data in JSON format. They give a good sense of the complex structures encountered in real-world JSON data. All services are free, but some require registration or authentication. Each example returns a lot of data, so not all output is printed in this document.
library(jsonlite)
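Before calling a live service, it can help to see how jsonlite maps nested JSON onto R objects. The sketch below parses a small inline document; the JSON text and its field names are made up for illustration and do not come from any of the APIs in this section.

```r
library(jsonlite)

# Hypothetical JSON: an array of records, each holding a nested object and an array
txt <- '[
  {"name": "alpha", "owner": {"login": "ann", "id": 1}, "tags": ["a", "b"]},
  {"name": "beta",  "owner": {"login": "bob", "id": 2}, "tags": ["c"]}
]'

repos <- fromJSON(txt)

class(repos)        # an array of objects becomes a data frame
class(repos$owner)  # a nested object becomes a data frame column
repos$owner$login   # nested fields are reached with repeated $
class(repos$tags)   # arrays of differing length become a list column
```

The same three shapes (data frame, nested data frame, list column) account for most of the structures returned by the services below.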
GitHub is an online code repository with APIs that provide live data on almost all activity. Below are some examples for a well-known R package and its author:
hadley_orgs <- fromJSON("https://api.github.com/users/hadley/orgs")
hadley_repos <- fromJSON("https://api.github.com/users/hadley/repos")
gg_commits <- fromJSON("https://api.github.com/repos/hadley/ggplot2/commits")
gg_issues <- fromJSON("https://api.github.com/repos/hadley/ggplot2/issues")
#latest issues
paste(format(gg_issues$user$login), ":", gg_issues$title)
[1] "tdsmith : stat_binhex weight aesthetic is undocumented"
[2] "huelf : geom_contour throws error when ggplot aes uses shape"
[3] "davidkretch : Fix typo in Extending ggplot2 vignette"
[4] "janschulz : data used in geom_text with geom specific aes is longer than it should be?"
[5] "janschulz : Add top margin to default plot.caption"
[6] "wch : Preserve class when adding uneval objects"
[7] "coolbutuseless : corrected theme inhertiance for legend.key.size"
[8] "has2k1 : Fix geom_dotplot y limit calculations"
[9] "shreyasgm : Typo in error message while saving large plots"
[10] "DarwinAwardWinner : Ordering of multiple legends is seemingly random"
[11] "noamross : geom_dotplot extends past axis"
[12] "aosmith16 : geom_dotplot negates free y scale from facet_wrap"
[13] "lfaller : How to customize x-axis labels for a stacked bar plot graph?"
[14] "kfeng123 : How can I axis.ticks while using theme_minimal()?"
[15] "richardbeare : Design query - connectograms"
[16] "gcpoole : Plotting SpatialLinesDataFrame connects all of the individual lines"
[17] "zeehio : [Feature Request] different scales by facet"
[18] "salauer : geom_smooth parameters no longer available in qplot"
[19] "luczkovichj : Contour labels using direct.label"
[20] "coolbutuseless : geom_hex/stat_hex binwidth argument no longer working"
[21] "HughParsonage : annotate() won't expand when position aesthetics are length 1"
[22] "whao89 : geom_hex no longer recognizes ..density.. in 2.1.0"
[23] "paul4forest : X axis doesn't appear below facet_wrap plot with uneven number of facets"
[24] "DarioS : Allow a Vector of Numbers for label_wrap_gen's Width Parameter"
[25] "coolbutuseless : stat_bin_2d creates an (almost) unkillable legend."
[26] "dutri001 : Online documentation: HTML table not properly generated from Latex doc "
[27] "krlmlr : Conditional examples are not rendered"
[28] "juliangehring : Facets break grouping of data points"
[29] "holgerbrandl : segfault on macos when using ggsave in parallelized loop "
[30] "JestonBlu : Added new economic theme"
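The expression gg_issues$user$login works because fromJSON turns each nested JSON object into a nested data frame column. The following offline sketch uses made-up records shaped loosely like issue entries (titles, logins and ids are hypothetical) and also shows flatten(), which converts nested columns into ordinary ones:

```r
library(jsonlite)

# Hypothetical records, not real API output
txt <- '[
  {"title": "First issue",  "user": {"login": "alice", "id": 10}},
  {"title": "Second issue", "user": {"login": "bob",   "id": 20}}
]'

issues <- fromJSON(txt)
names(issues)                                # "title" "user"
paste(issues$user$login, ":", issues$title)  # same pattern as above

# flatten() lifts nested columns to the top level, joining names with "."
flat <- flatten(issues)
names(flat)
flat$user.login
```

Passing flatten = TRUE to fromJSON is expected to give the same flat layout in one step, which is what the ProPublica example further down relies on.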
A single public API shows the location, status, and current availability of all stations in the New York City bike sharing initiative.
citibike <- fromJSON("http://citibikenyc.com/stations/json")
stations <- citibike$stationBeanList
colnames(stations)
[1] "id" "stationName"
[3] "availableDocks" "totalDocks"
[5] "latitude" "longitude"
[7] "statusValue" "statusKey"
[9] "availableBikes" "stAddress1"
[11] "stAddress2" "city"
[13] "postalCode" "location"
[15] "altitude" "testStation"
[17] "lastCommunicationTime" "landMark"
nrow(stations)
[1] 508
The Ergast Developer API is an experimental web service which provides a historical record of motor racing data for non-commercial purposes.
res <- fromJSON('http://ergast.com/api/f1/2004/1/results.json')
drivers <- res$MRData$RaceTable$Races$Results[[1]]$Driver
colnames(drivers)
[1] "driverId" "code" "url" "givenName"
[5] "familyName" "dateOfBirth" "nationality" "permanentNumber"
drivers[1:10, c("givenName", "familyName", "code", "nationality")]
    givenName  familyName code nationality
1     Michael  Schumacher  MSC      German
2      Rubens Barrichello  BAR   Brazilian
3    Fernando      Alonso  ALO     Spanish
4        Ralf  Schumacher  SCH      German
5  Juan Pablo     Montoya  MON   Colombian
6      Jenson      Button  BUT     British
7       Jarno      Trulli  TRU     Italian
8       David   Coulthard  COU     British
9      Takuma        Sato  SAT    Japanese
10  Giancarlo  Fisichella  FIS     Italian
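The long accessor res$MRData$RaceTable$Races$Results[[1]]$Driver is easier to follow one level at a time. The sketch below walks the same path on a heavily trimmed, made-up stand-in for a results response; the race name, driver ids and names are hypothetical.

```r
library(jsonlite)

# Hypothetical, heavily trimmed stand-in for a race results response
txt <- '{
  "MRData": {
    "RaceTable": {
      "Races": [
        {
          "raceName": "Example Grand Prix",
          "Results": [
            {"position": "1", "Driver": {"driverId": "a_driver", "familyName": "Alpha"}},
            {"position": "2", "Driver": {"driverId": "b_driver", "familyName": "Beta"}}
          ]
        }
      ]
    }
  }
}'

res <- fromJSON(txt)
races <- res$MRData$RaceTable$Races  # data frame with one row per race
class(races$Results)                 # list column: one data frame per race
results <- races$Results[[1]]        # results of the first (here: only) race
drivers <- results$Driver            # nested data frame of driver fields
drivers$familyName
```

The [[1]] step is needed because Results is a list column, so each race carries its own results data frame.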
Below is an example from the ProPublica Nonprofit Explorer API where we retrieve the first 11 pages (pages 0 through 10) of tax-exempt organizations in the USA, ordered by revenue. The rbind.pages function is used to combine the pages into a single data frame.
#store all pages in a list first
baseurl <- "https://projects.propublica.org/nonprofits/api/v1/search.json?order=revenue&sort_order=desc"
pages <- list()
for(i in 0:10){
  mydata <- fromJSON(paste0(baseurl, "&page=", i), flatten=TRUE)
  message("Retrieving page ", i)
  pages[[i+1]] <- mydata$filings
}
#combine all into one
filings <- rbind.pages(pages)
#check output
nrow(filings)
[1] 275
filings[1:10, c("organization.sub_name", "organization.city", "totrevenue")]
organization.sub_name organization.city
1 KAISER FOUNDATION HEALTH PLAN INC OAKLAND
2 KAISER FOUNDATION HEALTH PLAN INC OAKLAND
3 KAISER FOUNDATION HEALTH PLAN INC OAKLAND
4 DAVIDSON COUNTY COMMUNITY COLLEGE FOUNDATION INC LEXINGTON
5 KAISER FOUNDATION HOSPITALS OAKLAND
6 KAISER FOUNDATION HOSPITALS OAKLAND
7 KAISER FOUNDATION HOSPITALS OAKLAND
8 PARTNERS HEALTHCARE SYSTEM INC CHARLESTOWN
9 PARTNERS HEALTHCARE SYSTEM INC CHARLESTOWN
10 PARTNERS HEALTHCARE SYSTEM INC CHARLESTOWN
totrevenue
1 42346486950
2 40148558254
3 37786011714
4 30821445312
5 20013171194
6 18543043972
7 17980030355
8 10619215354
9 10452560305
10 9636630380
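Combining pages can be tried offline with two small made-up pages. This is a sketch only: the organization names and values are hypothetical, and it uses rbind_pages, the name under which recent jsonlite versions expose the function called rbind.pages above.

```r
library(jsonlite)

# Two hypothetical "pages"; the second has a column the first lacks
page1 <- fromJSON('[{"name": "Org A", "revenue": 300}, {"name": "Org B", "revenue": 200}]')
page2 <- fromJSON('[{"name": "Org C", "revenue": 100, "city": "Springfield"}]')

pages <- list(page1, page2)
combined <- rbind_pages(pages)

nrow(combined)   # rows of both pages stacked
names(combined)  # union of the columns found on any page
combined$city    # NA where a page did not provide the column
```

Unlike a plain rbind, this does not require every page to have identical columns, which matters when an API omits empty fields.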
The New York Times has several APIs as part of the NYT developer network. These provide access to data from various departments, such as news articles, book reviews, and real estate. Registration is required (but free), and a key can be obtained from the NYT developer network. The code below includes some example keys for illustration purposes.
#search for articles
article_key <- "&api-key=c2fede7bd9aea57c898f538e5ec0a1ee:6:68700045"
url <- "http://api.nytimes.com/svc/search/v2/articlesearch.json?q=obamacare+socialism"
req <- fromJSON(paste0(url, article_key))
articles <- req$response$docs
colnames(articles)
[1] "web_url" "snippet" "lead_paragraph"
[4] "abstract" "print_page" "blog"
[7] "source" "multimedia" "headline"
[10] "keywords" "pub_date" "document_type"
[13] "news_desk" "section_name" "subsection_name"
[16] "byline" "type_of_material" "_id"
[19] "word_count" "slideshow_credits"
#search for best sellers
bestseller_key <- "&api-key=5e260a86a6301f55546c83a47d139b0d:3:68700045"
url <- "http://api.nytimes.com/svc/books/v2/lists/overview.json?published_date=2013-01-01"
req <- fromJSON(paste0(url, bestseller_key))
bestsellers <- req$results$list
category1 <- bestsellers[[1, "books"]]
subset(category1, select = c("author", "title", "publisher"))
           author                title                  publisher
1   Gillian Flynn            GONE GIRL           Crown Publishing
2    John Grisham        THE RACKETEER Knopf Doubleday Publishing
3       E L James FIFTY SHADES OF GREY Knopf Doubleday Publishing
4 Nicholas Sparks           SAFE HAVEN   Grand Central Publishing
5  David Baldacci        THE FORGOTTEN   Grand Central Publishing
#movie reviews
movie_key <- "&api-key=5a3daaeee6bbc6b9df16284bc575e5ba:0:68700045"
url <- "http://api.nytimes.com/svc/movies/v2/reviews/dvd-picks.json?order=by-date"
req <- fromJSON(paste0(url, movie_key))
reviews <- req$results
colnames(reviews)
[1] "display_title" "mpaa_rating" "critics_pick"
[4] "byline" "headline" "summary_short"
[7] "publication_date" "opening_date" "date_updated"
[10] "link" "multimedia"
reviews[1:5, c("display_title", "byline", "mpaa_rating")]
           display_title          byline mpaa_rating
1                Dheepan     A. O. SCOTT           R
2  Mothers and Daughters NEIL GENZLINGER       PG-13
3 Phantom of the Theater  DANIEL M. GOLD
4    Beautiful Something  STEPHEN HOLDEN   Not Rated
5           Elstree 1976 NEIL GENZLINGER
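The requests above paste a key into the URL by hand. One possible alternative, sketched here under stated assumptions, builds the query string from named parts and reads the key from an environment variable. Both build_url and the variable name NYT_API_KEY are hypothetical and not part of any package.

```r
# Hypothetical helper: join a base URL and a named list of query parameters
build_url <- function(base, params) {
  encoded <- vapply(
    params,
    function(v) utils::URLencode(as.character(v), reserved = TRUE),
    character(1)
  )
  paste0(base, "?", paste(names(params), encoded, sep = "=", collapse = "&"))
}

# NYT_API_KEY is an assumed variable name; a placeholder is used if it is unset
key <- Sys.getenv("NYT_API_KEY", unset = "YOUR-KEY-HERE")

url <- build_url(
  "http://api.nytimes.com/svc/search/v2/articlesearch.json",
  list(q = "obamacare socialism", "api-key" = key)
)
url
```

The resulting string can be passed to fromJSON as before. Keeping the key out of the script also makes it harder to publish it by accident.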
CrunchBase is the free database of technology companies, people, and investors that anyone can edit.
key <- "f6dv6cas5vw7arn5b9d7mdm3"
res <- fromJSON(paste0("http://api.crunchbase.com/v/1/search.js?query=R&api_key=", key))
head(res$results)
The Sunlight Foundation is a non-profit that helps to make government transparent and accountable through data, tools, policy and journalism. A free key can be obtained by registering with the Sunlight Foundation. An example key is provided.
key <- "&apikey=39c83d5a4acc42be993ee637e2e4ba3d"
#Find bills about drones
drone_bills <- fromJSON(paste0("http://openstates.org/api/v1/bills/?q=drone", key))
drone_bills$title <- substring(drone_bills$title, 1, 40)
print(drone_bills[1:5, c("title", "state", "chamber", "type")])
title state chamber
1 An act relating to privacy protection vt upper
2 Crimes: emergency personnel. ca lower
3 DRONE TASK FORCE APPT il lower
4 Omnibus K-12 and higher education policy mn lower
5 Relates to prohibiting civilian drone us ny upper
type
1 bill
2 bill, fiscal committee, local program
3 bill
4 bill
5 bill
#Congress mentioning "immigration"
res <- fromJSON(paste0("http://capitolwords.org/api/1/dates.json?phrase=immigration", key))
wordcount <- res$results
wordcount$day <- as.Date(wordcount$day)
summary(wordcount)
count day raw_count
Min. : 1.00 Min. :1996-01-02 Min. : 1.00
1st Qu.: 3.00 1st Qu.:2001-03-26 1st Qu.: 3.00
Median : 8.00 Median :2006-03-21 Median : 8.00
Mean : 24.91 Mean :2006-01-30 Mean : 24.91
3rd Qu.: 21.00 3rd Qu.:2010-12-08 3rd Qu.: 21.00
Max. :1835.00 Max. :2016-04-27 Max. :1835.00
#Local legislators
legislators <- fromJSON(paste0("http://congress.api.sunlightfoundation.com/",
  "legislators/locate?latitude=42.96&longitude=-108.09", key))
subset(legislators$results, select=c("last_name", "chamber", "term_start", "twitter_id"))
  last_name chamber term_start      twitter_id
1    Lummis   house 2015-01-06   CynthiaLummis
2      Enzi  senate 2015-01-06     SenatorEnzi
3  Barrasso  senate 2013-01-03 SenJohnBarrasso
The Twitter API requires OAuth2 authentication. Some example code:
#Create your own application key at https://dev.twitter.com/apps
consumer_key <- "EZRy5JzOH2QQmVAe9B4j2w"
consumer_secret <- "OIDC4MdfZJ82nbwpZfoUO4WOLTYjoRhpHRAWj6JMec"
#Use basic auth
secret <- openssl::base64_encode(paste(consumer_key, consumer_secret, sep = ":"))
req <- httr::POST("https://api.twitter.com/oauth2/token",
  httr::add_headers(
    "Authorization" = paste("Basic", secret),
    "Content-Type" = "application/x-www-form-urlencoded;charset=UTF-8"
  ),
  body = "grant_type=client_credentials"
)
#Extract the access token
token <- paste("Bearer", httr::content(req)$access_token)
#Actual API call
url <- "https://api.twitter.com/1.1/statuses/user_timeline.json?count=10&screen_name=Rbloggers"
req <- httr::GET(url, httr::add_headers(Authorization = token))
json <- httr::content(req, as = "text")
tweets <- fromJSON(json)
substring(tweets$text, 1, 100)
[1] "Rblpapi 0.3.4 https://t.co/NoGluTGSld #rstats #DataScience"
[2] "Introducing DohaR – A new R User Group in Doha, Qatar https://t.co/MWSKgMap9I #rstats #DataScience"
[3] "FSA v0.8.7 Released https://t.co/2nnGMGINA7 #rstats #DataScience"
[4] "Too Much Parallelism is as Bad https://t.co/t0RffUvLYs #rstats #DataScience"
[5] "Coming up: principal components analysis https://t.co/f9dUnhuC9n #rstats #DataScience"
[6] "Shiny Apps Gallery using Plotly in R https://t.co/cEFvgy1VPY #rstats #DataScience"
[7] "Who is going down? Bundesliga Betting Odds – updated https://t.co/uvpcmOmcaf #rstats #DataScience"
[8] "BH 1.60.0-2 https://t.co/kZHVjPzG65 #rstats #DataScience"
[9] "Red herring bites https://t.co/IJQwBQkImx #rstats #DataScience"
[10] "R Tools for Visual Studio 3.0 now available https://t.co/JWKthTwEnC #rstats #DataScience"
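The two header values used in the Twitter example can be reproduced offline. The sketch below uses placeholder credentials ("user" and "pass") and a made-up token response, and it swaps in jsonlite's own base64_enc for openssl::base64_encode so that no extra package is needed.

```r
library(jsonlite)

# Placeholder credentials, for illustration only
consumer_key <- "user"
consumer_secret <- "pass"

# Basic auth: base64-encode "key:secret" and prefix it with "Basic"
secret <- base64_enc(paste(consumer_key, consumer_secret, sep = ":"))
basic_header <- paste("Basic", secret)
basic_header  # "Basic dXNlcjpwYXNz"

# Made-up token response of the general shape such an endpoint returns
token_json <- '{"token_type": "bearer", "access_token": "EXAMPLE-TOKEN"}'
token <- paste("Bearer", fromJSON(token_json)$access_token)
token         # "Bearer EXAMPLE-TOKEN"
```

The first value goes into the Authorization header of the token request, the second into the Authorization header of every later API call.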