Archive for November, 2009

The Need for Speed (Especially At Query Time)

Wednesday, November 25th, 2009 by Manny Aparicio

It should be no surprise that there is growing interest in data analytics and a rise of "post-relational" analytic data stores. Traditional databases work well for loading data in, but as scale and complexity grow, getting answers out of them is becoming a problem. Adam Jacobs describes this growing problem with Big Data in a recent ACMqueue article, "Pathologies of Big Data." Big Data is "data whose size forces us to look beyond the tried-and-true methods that are prevalent at that time." Given our growing data tsunami, many people are looking beyond the tried-and-true methods of relational databases. Billions of rows can easily be stored, but Jacobs describes the pathologies of query scalability, even when asking for basic counts!

You can store data all day, but the value of data is in its query: in its exploitation. I first heard this word, "exploitation", from the leader of a UK intelligence agency on one of my trips to London. Whether for government or commercial interests, the time to analyze and act is what matters. The word struck me and has stayed with me ever since. Exploitation time is what matters. Furthermore, the number of queries is growing even faster than the data tsunami itself. Query time matters most because response time is most critical at critical moments, and those moments are exactly when many simultaneous requests create a massive load.

Posted in Natural Intelligence

It’s All Just Counts

Thursday, November 12th, 2009 by Manny Aparicio

The idea of a memory base is simple. We define a memory as a matrix, keeping counts in the matrix cells between the names of things on the rows and columns. Let's say we had a memory of me, called "Person: Manny". If I query the row called "City: London" and ask for associated columns for "Carrier: ?", I'd see "AA" and "BA" for American Airlines and British Airways. Moreover, AA would be returned with a count of 6 and BA with a count of 1. Beyond the mere existence of my travel relationships, we can also see the strength of my travel habits, at least in the context of going to London.

The idea is simple but fundamental. When we started Saffron and began working with one of the big intelligence agencies, one true believer in what we were doing would provoke others by saying, "It's all just counts. What else is there?" What did he mean? When analyzing massive data, so much of what needs to be computed is computed over counts. More deeply, information and knowledge are based on the frequencies of what we see in the world. Counts are fundamental to knowing what we know.
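To make the "matrix of counts" idea concrete, here is a minimal Python sketch of the London example above. It assumes that a memory is a sparse matrix keyed by "Category: value" strings and that each observation increments the count for every pair of attributes it contains. The `Memory` class and its `observe` and `associations` methods are hypothetical names chosen for illustration; this is not Saffron's implementation or API.

```python
from collections import Counter, defaultdict


class Memory:
    """Hypothetical toy memory: a sparse matrix of co-occurrence counts.

    Rows and columns are labelled with "Category: value" strings,
    e.g. "City: London" or "Carrier: AA". Illustrative only.
    """

    def __init__(self, name):
        self.name = name
        # row label -> Counter mapping column label -> count
        self.cells = defaultdict(Counter)

    def observe(self, attributes):
        """Record one observation by bumping the cell for every ordered pair of distinct attributes."""
        for row in attributes:
            for col in attributes:
                if row != col:
                    self.cells[row][col] += 1

    def associations(self, row, category):
        """Return (value, count) pairs for columns of `category` linked to `row`, strongest first."""
        prefix = category + ": "
        hits = [
            (col[len(prefix):], count)
            for col, count in self.cells.get(row, Counter()).items()
            if col.startswith(prefix)
        ]
        return sorted(hits, key=lambda item: item[1], reverse=True)


# Replay the example from the post: six London trips on AA, one on BA.
manny = Memory("Person: Manny")
for _ in range(6):
    manny.observe(["City: London", "Carrier: AA"])
manny.observe(["City: London", "Carrier: BA"])

print(manny.associations("City: London", "Carrier"))  # [('AA', 6), ('BA', 1)]
```

Because the counts are stored symmetrically, the same memory can also answer the reverse question (which cities are associated with "Carrier: AA"), which is one reason to count pairs rather than store isolated facts.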

Posted in Natural Intelligence, SaffronSierra

Observing & Querying Tweets (Part 1)

Monday, November 9th, 2009 by admin

As we look to build compelling examples using Saffron Sierra, we've often talked about using Twitter as a data source. If we could get Twitter data (tweets) into Sierra, we could build lots of interesting analytical capabilities on top of Sierra's APIs. To support our own exploration, and others', I've decided to start a short series of blog posts (probably three parts) about observing and querying Twitter data in Sierra.

In this first part I'm going to cover the process of grabbing a Twitter feed and building the necessary resource XML for Sierra observations. For now, you can think of a "resource", in Sierra terms, as our way of "inserting" data into the system. For my example I'm going to use Groovy. This code should be easy to replicate in your language of choice. I chose Groovy because I'm familiar with it and it provides a few tools that make this kind of work really easy.
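The post's own code is in Groovy and pulls a live feed; as a rough illustration of the same flow in another language, here is a minimal Python sketch. It assumes the tweets have already been fetched (hard-coded sample data stands in for the feed) and that each tweet becomes one observation made of "Category: value" attributes, here the author and any hashtags. The XML element and attribute names (`resources`, `resource`, `attribute`, `category`, `value`) are hypothetical placeholders, not Sierra's actual resource schema, which the post does not show.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data standing in for a fetched Twitter feed.
TWEETS = [
    {"user": "alice", "text": "Flying to #London on #BA today"},
    {"user": "bob", "text": "Back from #London!"},
]


def tweet_attributes(tweet):
    """Turn one tweet into (category, value) pairs: the author plus each hashtag (an assumed mapping)."""
    attrs = [("User", tweet["user"])]
    for word in tweet["text"].split():
        if word.startswith("#"):
            tag = word[1:].strip(".,!?")  # drop trailing punctuation
            if tag:
                attrs.append(("Hashtag", tag))
    return attrs


def build_resource_xml(tweets):
    """Build one <resource> element per tweet.

    Element and attribute names are placeholders, NOT Sierra's real schema.
    """
    root = ET.Element("resources")
    for tweet in tweets:
        resource = ET.SubElement(root, "resource")
        for category, value in tweet_attributes(tweet):
            ET.SubElement(resource, "attribute", category=category, value=value)
    return ET.tostring(root, encoding="unicode")


print(build_resource_xml(TWEETS))
```

Building the document with `ElementTree` rather than string concatenation keeps escaping of characters like `&` and `<` in tweet text correct, which matters once real feed data is involved.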

Posted in SaffronSierra

Online Documentation

Wednesday, November 4th, 2009 by admin

This past year, as we've developed Sierra and SMB (Saffron Memory Base, our enterprise product), we've tried to embrace the power of the web to deliver information to our users. Hosting our documentation online felt like a very natural and valuable thing to do (it still does). We decided to use PBworks to host our documentation, probably because Twitter was using it for their online documentation.

Having lived with this solution for a few months, we've noticed a problem with this approach that continues to haunt us. What's the problem, you ask? Versioning! We need versioning. As we develop features and fix bugs for a new build, we often need to update the documentation. We may change something subtle about the JSON returned from a REST call. We may add a new request parameter that we didn't have before. The documentation stays fresh, but unfortunately our customers, both Sierra and enterprise, are not always on the latest build. Sometimes that is their choice, but often the latest build hasn't even been released yet. Either way, in our effort to keep the documentation up to date, we end up confusing our customers. We've worked with PBworks support and have a strategy to help with this, but I can already tell that the new strategy will not be perfect. We have customers on many different versions, and we need all of them to have documentation that matches their version.

Are there other tools that we should look at? We want a hosted solution. We need all of our developers to be able to edit and contribute. We like the "wiki" approach. Thoughts?

Posted in SaffronSierra