Posts Tagged ‘semantic technology’

Small Data or Big Data – Which Matters Most for AI?

Monday, August 27th, 2018

By Jeff McDowell, COO at Primal

In the past year, we have seen countless headlines about how artificial intelligence (AI) will transform business. AI promises to provide insight into data and customers at a level of individualization never seen before. In response, many companies are scrambling to capture and store as much data as possible – but in doing so they might be increasing their exposure to data breaches, privacy violations, and hacks.

Unfortunately, a standard “machine learning only” approach to AI may not get us far out of the starting blocks toward an AI solution that can understand data at a high level of fidelity. Many people assume that storing and analyzing large amounts of information (“big data”) through machine learning is the only way to take advantage of AI. But machine learning approaches can actually be ineffective at understanding the meaning of text or the interests of individuals with any sort of specificity. Any company serious about AI needs to develop a solution that is both more targeted and more secure. I believe the way forward lies in integrating small data analysis into a big data approach.

Here are a few reasons to consider small data:

Big data techniques can be expensive and ineffective at high levels of specificity: Just as satellite imagery provides a broad picture of a physical lake, today’s big data approaches provide a broad picture of data lakes. When statistical methods of AI are applied to a big data environment, the output is usually very generalized and lacks fidelity. For example, if a statistical model is looking at data about sports fans, it may see a pattern that groups people into categories such as “baseball enthusiast”, “football enthusiast”, etc. These broad categories lose sight of the fact that some users are actually pitching enthusiasts, statistics junkies, or part-time umpires. Knowledge of these narrower topics would be extremely useful to advertisers of niche products, yet big data platforms today are very limited in identifying and exposing these higher fidelity interest categories. This is because processing and storage become increasingly expensive and complex when large amounts of data are analyzed to achieve higher levels of specificity.

Integrating small data analysis is the key to making AI meaningful: Small data simply refers to the quantity of data available to train models. It’s often defined as the amount of information that can be processed by one computer, but it could be even smaller than that – a spreadsheet, a document, an article, or even a single social media post. “Small data” can even be found within large data sets. Statistical collaborative filtering techniques applied to a group of people infer broad interests, and the results are hit and miss. An approach that applies semantic or symbolic techniques to small data can instead look at an individual to understand exactly what they are interested in, no matter the level of specificity. For our baseball example, a small data approach would analyze the meaning and context of a person’s blog or social media post, and pick up the nuance between someone who likes statistics and someone who is interested in pitching techniques.

Small data approaches increase explainability and reduce potential for bias: One of the criticisms of AI is that it operates in a “black box”, where it can be difficult to determine the reasoning behind a specific output. Numerous organizations – including the National Institute of Standards and Technology (NIST) – have called for a more balanced and thoughtful approach to developing AI solutions, to ensure they are trustworthy and explainable. AI outputs based on small data are inherently easier for humans to interpret. AI systems that analyze and categorize users based on large data sets also risk introducing biases over time – a problem that can be mitigated by integrating analysis of small data, which can serve as a self-correction against bias.
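To make the baseball example concrete, here is a minimal sketch of a symbolic approach applied to a single post. It is an illustration only, not Primal's technology: the `INTEREST_TERMS` vocabulary and the `infer_interests` function are hypothetical names invented for this example, and a real semantic system would rely on a much richer knowledge representation than keyword sets.

```python
import re
from collections import Counter

# Hypothetical hand-built vocabulary (an assumption for illustration):
# each fine-grained interest maps to a few indicative terms.
INTEREST_TERMS = {
    "pitching": {"pitch", "pitcher", "pitching", "curveball", "slider", "fastball"},
    "statistics": {"stats", "statistics", "sabermetrics", "whip", "obp", "slugging"},
    "umpiring": {"umpire", "umpiring", "officiating"},
}


def infer_interests(post: str) -> list[tuple[str, int]]:
    """Return (interest, matched-term count) pairs for one post, strongest first."""
    # Lowercase the post and split it into alphabetic tokens.
    tokens = re.findall(r"[a-z]+", post.lower())
    counts = Counter()
    for interest, terms in INTEREST_TERMS.items():
        hits = sum(1 for token in tokens if token in terms)
        if hits:
            counts[interest] = hits
    return counts.most_common()


if __name__ == "__main__":
    print(infer_interests("Loved how the pitcher mixed a slider and a curveball tonight."))
    print(infer_interests("Spent the weekend digging into sabermetrics and OBP stats."))
```

Because every result traces back to specific matched terms in one person's post, the output is easy to explain, which is the property the argument above attributes to small data approaches.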

AI has huge potential to augment our human intelligence and make us more productive. The importance and power of small data for AI is still on the fringes of being understood, but will gain momentum as businesses and consumers increasingly expect a greater level of relevance and security from AI. Even Eric Schmidt, former CEO of Google, recently tweeted, “AI may usher in the era of ‘small data’ – smarter systems can learn with less to train on.”

The current model of statistical analysis of big data is ‘good enough’ for now, but it is not sustainable. For AI to be truly relevant, efficient, and safe, big data must be balanced by robust small data processing.

Introducing Primal Assistants: A framework for software agents

Monday, May 27th, 2013

Primal does a lot of heavy lifting in knowledge representation and content filtering. If you ask it to grab you some relevant content around your interests, it will do precisely that.

But what if you don’t want to have to ask? Search engines are fantastic, but they still require that you go to them and then try to figure out how to formulate your query in a way that gets you decent results.

Primal already has the ability to understand what you want, and we’re now working on technology that will let Primal deliver the content you truly care about before you know you want it.

Read on to learn more about Primal’s new software agent and content streaming framework.

Why the Web Needs Automating

Sunday, September 12th, 2010


Technology was supposed to revolutionize our lives. There were promises of 20-hour work weeks, robotic servants to do our bidding, and leisurely weekday afternoons in the sun. That was a fantastic dream. So what happened along the way?

Today, we face the grim reality that most of the technology we build simply enables people to do more work.

Your PC is perhaps the best example of this. Sure, it’s a powerful tool. But it’s one that can do almost nothing without a human driving it. You respond to your emails. You browse the Web. You write that report. And you fix it when it breaks.

Could a computer do some of that work for you?

Interest Networks Don’t Need to Socialize

Friday, October 10th, 2008

Here’s a glimpse into a future where interest networks are liberated from documents and social networks.

Past: Connecting People

The social dimension of the Web imparts a powerful influence on knowledge acquisition. People discover each other through the intersections of documents they create.

Unfortunately, this is a terribly protracted process. As Howard Bloom points out, “When we try to find each other, and try to find the knowledge we get from each other, these days it’s as difficult as getting from New York to California in 1848.” (1)