Observing & Querying Tweets (Part 3)
In my previous posts I’ve discussed how to fetch data from Twitter and massage it into the form that is needed when observing data in Sierra. I also talked about how to “POST” that data to Sierra so that it’s available to query. In this post I’m going to discuss the process of actually querying that data once it’s in Sierra.
Before reading on, you may want to browse the Sierra documentation, which lists all of the REST calls available on Sierra. In my example below I’ll be using the “connections” call.
I should also note that I’ve made some additions to the source code that I’ve used across all of these posts. That code is available along with the rest of the Sierra sample code. In particular, the code now understands command-line options that specify whether you are “observing” (-o) or “querying” (-q). You can also supply all of the needed usernames and passwords on the command line.
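The sample code itself isn’t reproduced here, so the following is only a sketch of how the observe/query switch might be parsed in plain Groovy with no extra libraries. The -o and -q flags come from this post; -h, -k and -s (host, access key, secret code) and the `parseOptions`/`nextValue` helpers are hypothetical stand-ins, not necessarily what the sample code uses.

```groovy
// Hypothetical sketch of mode selection from the command line. Only -o
// (observe) and -q (query) come from the post; -h, -k and -s are made-up
// names standing in for "host", "access key" and "secret code".
Map parseOptions(List<String> argv) {
    Map opts = [mode: null, host: 'localhost', accessKey: null, secretCode: null]
    Iterator<String> iter = argv.iterator()
    while (iter.hasNext()) {
        String arg = iter.next()
        switch (arg) {
            case '-o': opts.mode = 'observe'; break
            case '-q': opts.mode = 'query'; break
            case '-h': opts.host = nextValue(iter, arg); break
            case '-k': opts.accessKey = nextValue(iter, arg); break
            case '-s': opts.secretCode = nextValue(iter, arg); break
            default: throw new IllegalArgumentException("Unknown option: ${arg}")
        }
    }
    if (opts.mode == null) {
        throw new IllegalArgumentException('Specify -o (observe) or -q (query)')
    }
    return opts
}

// Returns the value that follows an option, failing clearly if it is missing.
String nextValue(Iterator<String> iter, String option) {
    if (!iter.hasNext()) {
        throw new IllegalArgumentException("Missing value for ${option}")
    }
    return iter.next()
}

// In a real script this would be parseOptions(args as List).
def options = parseOptions(['-q', '-h', 'sierra.example.com', '-k', 'ACCESS_KEY', '-s', 'SECRET_CODE'])
println "Mode: ${options.mode}, host: ${options.host}"
```

Keeping the parser hand-rolled avoids depending on a particular CliBuilder variant, which has moved between packages across Groovy versions.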
Querying Sierra is quite easy. You need to know what call you’re going to make and what parameters that call supports. In my case I want to find “connections”. On my first pass I’ll look for hashtags. Then I’ll look at what authors are “connected” to a particular hashtag.
```groovy
// imports used by the snippets throughout this post
import java.security.MessageDigest
import org.apache.commons.codec.binary.Hex
import groovyx.net.http.HTTPBuilder

// build the URL needed to fetch hashtags
def connectionsGetUrl = "http://${sierraHostname}:8080/ws/spaces/default/connections?access_key=${sierraAccessKey}&c=hashtag"
// sign the request: SHA-1 hex digest of the full URL with the secret code appended
def urlToEncode = "${connectionsGetUrl}${sierraSecretCode}"
connectionsGetUrl = "${connectionsGetUrl}&signature=${Hex.encodeHex(MessageDigest.getInstance("SHA-1").digest(urlToEncode.getBytes("UTF-8")))}"
```

As you can see at the end of the URL, I’m adding a “c=hashtag” parameter to my “/connections” call. This specifies that I want all of my results to have the category “hashtag”. I could easily specify multiple “c” parameters and get results with a mixture of categories, but for this example I’m only using one. I’m also adding the necessary access key and signature, which I discussed in a previous post.
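The signing step in the URL-building snippet can be pulled out into its own function. Here’s a minimal sketch, assuming the scheme shown in this post’s code (SHA-1 over the full URL with the secret code appended, hex-encoded, added as a “signature” parameter). `signUrl` is a hypothetical helper, the host and credentials are placeholders, and it uses Groovy’s built-in `encodeHex()` instead of Commons Codec so it runs without extra dependencies. It also shows what a URL with multiple “c” parameters looks like.

```groovy
import java.security.MessageDigest

// Signs a Sierra request URL as this series does: SHA-1 over the full URL with
// the secret code appended, hex-encoded and added as a "signature" parameter.
// Uses Groovy's byte[].encodeHex() in place of Commons Codec's Hex.
String signUrl(String url, String secretCode) {
    byte[] digest = MessageDigest.getInstance('SHA-1')
            .digest((url + secretCode).getBytes('UTF-8'))
    return "${url}&signature=${digest.encodeHex()}"
}

// Placeholder host and credentials. Repeating "c" once per category asks
// for results drawn from a mixture of categories.
String host = 'sierra.example.com'
String accessKey = 'ACCESS_KEY'
String secretCode = 'SECRET_CODE'
List<String> categories = ['hashtag', 'person']

String base = "http://${host}:8080/ws/spaces/default/connections?access_key=${accessKey}" +
        categories.collect { "&c=${it}" }.join('')
String signedUrl = signUrl(base, secretCode)
println signedUrl
```

Because the signature covers the whole URL string, the parameters must be in their final order before signing; anything appended afterwards (other than the signature itself) would presumably invalidate it.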
```groovy
// set up the HTTPBuilder call
def sierraHttp = new HTTPBuilder(connectionsGetUrl)
def hashtag
sierraHttp.request(groovyx.net.http.Method.GET, groovyx.net.http.ContentType.JSON) {
    response.success = { r, json ->
        // iterate over the "r" (results) collection of the JSON response
        json.r.eachWithIndex { result, i ->
            if (i == 0) {
                // remember the first hashtag as "category:value" for the next query
                hashtag = "${result.a.c}:${result.a.v}"
            }
            println "Hashtag: ${result.a.v}, metric: ${result.m}"
        }
    }
    response.failure = { r ->
        println "Unexpected error: ${r.statusLine.statusCode} : ${r.statusLine.reasonPhrase}"
    }
}
```
As you can see here, once I’ve built my URL I’m executing a “GET” request using the Groovy HTTPBuilder. If my request is successful I’m going to iterate over the “r” (results) collection of the JSON response printing out the name of the hashtag and the metric associated with the result. On the first iteration I’m going to capture the hashtag value to use on another query (more on that later). The output should look something like this (hashtags will vary based on what has been ingested):
```
Hashtag: fb, metric: 1
Hashtag: nomatterhowmanytimesidothisistillgetnervous, metric: 1
Hashtag: haiti, metric: 1
Hashtag: freenexusone, metric: 1
```
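To try the response handling without a running Sierra instance, here’s a sketch that feeds a hand-written JSON document through the same loop using Groovy’s `JsonSlurper`. The response shape is an assumption inferred from the fields the code reads (“r”, “a.c”, “a.v”, “m”), and the values are made up to echo the sample output; a real response may carry more fields.

```groovy
import groovy.json.JsonSlurper

// Assumed response shape, inferred from the fields the post's code reads.
// The values are illustrative, not real Sierra output.
String sampleResponse = '''
{"r": [
  {"a": {"c": "hashtag", "v": "fb"},    "m": 1},
  {"a": {"c": "hashtag", "v": "haiti"}, "m": 1}
]}
'''

def json = new JsonSlurper().parseText(sampleResponse)

// Capture the first result as "category:value" to seed a follow-up query,
// and print every result the same way the success handler does.
String seed = null
json.r.eachWithIndex { result, i ->
    if (i == 0) {
        seed = "${result.a.c}:${result.a.v}"
    }
    println "Hashtag: ${result.a.v}, metric: ${result.m}"
}
println "Seed term: ${seed}"
```

Running it prints the two hashtag lines followed by `Seed term: hashtag:fb`, which is the form of term used as the “q” value in the next query.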
Because this first query has no input query term, Sierra tells me which hashtags have been seen most often (by frequency). We call this the “empty query”. You might be asking, “I thought the call was named connections?” To see connections in action, let’s run one more query, this time providing an input query term. In this case I’ll use the hashtag I grabbed on the first iteration above.
```groovy
// build the URL needed to fetch people related to a hashtag
connectionsGetUrl = "http://${sierraHostname}:8080/ws/spaces/default/connections?access_key=${sierraAccessKey}&q=${hashtag}&c=person"
urlToEncode = "${connectionsGetUrl}${sierraSecretCode}"
connectionsGetUrl = "${connectionsGetUrl}&signature=${Hex.encodeHex(MessageDigest.getInstance("SHA-1").digest(urlToEncode.getBytes("UTF-8")))}"

// debug: print the connections URL
println connectionsGetUrl

// set up the HTTPBuilder call
sierraHttp = new HTTPBuilder(connectionsGetUrl)
sierraHttp.request(groovyx.net.http.Method.GET, groovyx.net.http.ContentType.JSON) {
    response.success = { r, json ->
        json.r.each { result ->
            println "Person: ${result.a.v}, metric: ${result.m}"
        }
    }
    response.failure = { r ->
        println "Unexpected error: ${r.statusLine.statusCode} : ${r.statusLine.reasonPhrase}"
    }
}
```
Here (at the end of line 2) you can see that I’m adding a “q” parameter to the URL, and I’ve changed the “c” parameter to “person”. So I’m asking Sierra, “What people are connected to this hashtag?” The results should look something like this:
```
Person: iamkory, metric: 1
```
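The two queries differ only in their parameters, so both URLs can come from one function. A minimal sketch under stated assumptions: `connectionsUrl` is a hypothetical helper, the host and credentials are placeholders, and it follows this post’s code in parameter order and in passing the “q” term unencoded. Leaving `q` null gives the empty query; supplying a “category:value” term asks what is connected to it.

```groovy
import java.security.MessageDigest

// Hypothetical helper that builds a signed /connections URL. With q left null
// it produces the "empty query"; with a "category:value" term it asks what is
// connected to that term. Parameter names and order follow the post's code.
String connectionsUrl(String host, String accessKey, String secretCode,
                      List<String> categories, String q = null) {
    List<String> params = ["access_key=${accessKey}".toString()]
    if (q != null) {
        params << "q=${q}".toString()  // passed unencoded, as in the post
    }
    categories.each { params << "c=${it}".toString() }
    String unsignedUrl = "http://${host}:8080/ws/spaces/default/connections?" + params.join('&')
    // Same scheme as the post's code: SHA-1 over the URL plus the secret code.
    byte[] digest = MessageDigest.getInstance('SHA-1').digest((unsignedUrl + secretCode).getBytes('UTF-8'))
    return "${unsignedUrl}&signature=${digest.encodeHex()}"
}

// Most frequent hashtags (empty query) vs. people connected to one hashtag.
String emptyQuery = connectionsUrl('sierra.example.com', 'ACCESS_KEY', 'SECRET_CODE', ['hashtag'])
String personQuery = connectionsUrl('sierra.example.com', 'ACCESS_KEY', 'SECRET_CODE', ['person'], 'hashtag:fb')
println emptyQuery
println personQuery
```

Whether Sierra expects the “q” term URL-encoded isn’t covered in this post; encoding it would also change the string that gets signed.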
That’s it. Pretty simple. That’s just scratching the surface of what you can do once your data is in Sierra. So, fire up your instance and try it out.
If you’d like to see a lot of this running live, I’d encourage you to take a look at TweetDive. TweetDive is a sample application we’ve developed that uses many of the same concepts I’ve discussed in these posts. Register for a free account and play around. We’ll be talking more about TweetDive in the near future.
Tags: development, Groovy, json, REST, SaffronSierra, twitter
This entry was posted on Wednesday, January 13th, 2010 at 2:44 pm and is filed under SaffronSierra.