Facebook released their search strategy today, the so-called “third pillar” of Facebook’s future.
Mark Zuckerberg and Facebook don’t get unstructured data.
Search is hard, very hard. It’s why I have always been fascinated by search, and it’s one of the reasons I have a massive amount of respect for Google. Beyond their annoying “don’t be evil” marketing, the 0-10 PageRank toolbar, and Android (a half-baked mobile OS, IMHO) is the fact that they have engineering cojones.
Their UX is horrible, their products are scattered (Google+, Wave).
But their search is amazing.
And search as I mentioned before fascinates me.
“Index the entire web, then, for whatever term I type into the search engine, return to me the most relevant sources of information and make sure it is trusted, timely, and relevant. Infer what I mean when I type into that little box. Make it go.”
That is an exceedingly difficult problem, and by all rights they’ve done an amazing job of delivering on it.
The World Wide Web is made up of unstructured data: blogs here, websites there, forums, reviews, images, comments, stuff, stuff and more stuff. When data and information are not structured, they are difficult, very difficult, to filter, sort and rank. Again, all things in life being imperfect, Google has delivered on that claim and passed with flying colours.
That’s why you and I use Google everyday. It’s important because it’s very very useful.
Now to circle back to my original thesis: Facebook will fail at search and here is why:
Facebook is avoiding the very real and very tough problem Google tackled head on from day one: unstructured data. Google is attempting to infer the meaning and create structure behind unstructured data.
Do I like something simply because I mention it? How does the content reflect my actual point of view? Am I an expert regarding the topic I am commenting upon?
Facebook’s solution to search is the “Like” and the Open Graph: their structured database, which stores, categorizes and makes accessible everything you do on Facebook and, by extension through “Log in using Facebook”, a subset of the World Wide Web.
Facebook has structured data about our lives — all of our posts, images, comments, etc. — in their Open Graph, a structured data set that makes claims to knowing who people *really* are, their real connections and their social lives.
These are the claims that Facebook has promised are their technological “secret sauce”, both pre-IPO and post-IPO. But there’s an issue, which gets us back to my points about Google earlier and the challenging problems they tackled head on from day one.
However, Facebook cannot distinguish when someone who “Likes” McDonald’s doesn’t really like McDonald’s: my comments about them may express anything but positive sentiment, even though I hit the “Like” button.
That is the difference between inferring an outcome from unstructured sentiment and reading it off a structured data element such as a “Like”. Google has done the former from day one via Hilltop and the hundreds of iterations to their PageRank algorithm (not the 0-10 toolbar scale — the algorithmic PageRank that is Google’s IP). It’s how they rank and sort the unstructured web.
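To make the contrast concrete, here’s a minimal sketch. The word lists and scoring rule are invented for illustration — this is not Facebook’s or Google’s actual method — but it shows how a structured “Like” and naive lexicon-based scoring of the accompanying comment can point in opposite directions.

```python
# Hypothetical lexicons for this sketch only.
POSITIVE = {"great", "love", "delicious", "best", "amazing"}
NEGATIVE = {"greasy", "awful", "terrible", "worst", "avoid"}

def naive_sentiment(text):
    """Crude score: +1 per positive word, -1 per negative word."""
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in text.lower().split())

# Structured signal: the user hit "Like" on McDonald's.
structured_signal = +1

# Unstructured signal: what they actually wrote.
comment = "Ugh, greasy and awful again, worst lunch, avoid"
print(naive_sentiment(comment))  # prints -4, contradicting the Like
```

A real system would need far more than word counting (negation, sarcasm, context), which is exactly why inferring sentiment from unstructured text is the hard problem.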
Anyhow, this blog post is already too poorly written and too long, but I find this conversation fascinating because it pits the claims of amazing technology (Facebook) against the reality of execution (Google).
Facebook cannot, or will not, attempt to address the tough problem: finding meaning through unstructured data.
Rather they want to force a structured data set (read: Open Graph) onto our lives but will not get into the sentiment problem.
This spurred an interesting conversation about structured data and sentiment on Google+ with a long-time colleague of mine, Aaron Bradley, a search marketing expert who legitimately knows his shit. Here is the thread:
Interesting case, Dan. In short, however much Open Graph’s “intelligent structured data” can be leveraged for advertising and other purposes, one cannot infer the presence of negative sentiment based solely on the absence of positive sentiment.
Put another way, this is where the absence of a “Dislike” button is something of an Achilles’ heel for Facebook (and, by extension, the absence of a “-1” button in Google).
Open Graph can’t speak to what you and your friends don’t like, because there’s no mechanism for this. Both built-in Open Graph actions and built-in Open Graph objects are, at best, neutral when it comes to sentiment. Facebook may be able to see that a friend “Liked” (action) Catcher in the Rye (object) – a positive sentiment – or just “Read” (action) Catcher in the Rye – a possibly neutral sentiment, but one I’ll bet is processed (like the built-in actions “Watch”, “Listen” and “Follow”) like a “Like” by Facebook’s algorithms. It’s perhaps (unintentionally) telling that the placeholders for built-in objects all contain content like this:
I don’t know that Google – even outside the Google+ environment and its lack of a -1 – is better suited to make sentiment decisions for advertising delivery based on structured data. The exception here is review data, which is really a sentiment scale. But in order to throttle the display of a McDonald’s ad based on structured data, Google would have to know that you disliked McDonald’s – regardless of the general sentiment surrounding the restaurant – because you gave it one out of five on a review. (Of course your friends’ reviews might count if Google knew as much about you and your relationships based on Google+ as Facebook does based on … well, Facebook. In reality? Ha.)
So is Facebook delivering McDonald’s ads to you a sign of failure? As much as I’m not particularly a FB fanboy I’d have to say no: Facebook’s algorithm can’t read your mind. It might even be reasonable targeting using structured data, based on the fact that a certain proportion of your Facebook friends “Like” McDonald’s Page – which would be the equivalent of me being targeted with a Tim Horton’s ad (I don’t despise them and their deceptive advertising – I just find their coffee appalling).
Of course one could also infer from positive sentiment things it’s likely I am neutral or negative toward. If I “Like” Hitchens’ God Is Not Great and Dawkins’ The God Delusion you’re probably not going to get far showing me an ad for Jesus Calling (evangelical bestseller – thanks Google). But that would take multiple levels of sentiment analysis and topical classification on top of other algorithmic gymnastics.
I recall a conversation you and I had on Facebook concerning why one should grind one’s beef, or (in my case) acquire it from cow-loving but non-vegetarian hippies. But we never expressed that in a formal way (clicked a “Like” button associated with the non-built-in object “Homemade Hamburgers”). So Facebook had the sentiment, but didn’t have structured data pertaining to it. And so you got asked about Mickey D’s.
And my thoughts:
Awesome points – however what Facebook needs to be able to do with their structured data goldmine is infer sentiment and semantics from the unstructured portions of their data set.
Indeed, the convenient construct is an explicit “Dislike” — however, that is an intrusive model from a user perspective.
I would then have to (as a user) explicitly identify that I indeed do Like or Dislike something in order for Facebook’s algorithm to be able to understand my sentiment.
Sentiments are unstructured notions. How I “feel” about a given subject does not always have a structured data model which is convenient for the system to process.
So – is Facebook’s idea to enforce a structure and exclude sentiment? It seems so. From a technological innovation perspective, Google assumes a lack of structure and provides benefit where possible. Facebook, OTOH, wants to impose structure and ignore the really difficult problem: inferring sentiment from unstructured data. That’s not fundamentally a problem, except that Facebook makes claims to understanding our lives and how we interact. It’s a bit of a bait and switch of claims versus reality.
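The imposed-structure point can be sketched with a toy model. Treat Open Graph-style data as (user, action, object) triples — the names and the “any recorded action counts as affinity” rule below are my assumptions for illustration, not Facebook’s actual schema — and notice the schema simply has no slot for how you *felt* about the thing:

```python
# Toy Open Graph-style store: every interaction is a structured triple.
graph = [
    ("dan",   "like", "McDonald's"),
    ("dan",   "read", "Catcher in the Rye"),
    ("aaron", "like", "Tim Hortons"),
]

def affinity(user, obj):
    """Count structured actions as positive affinity.
    The structure has no way to encode 'Liked it but hated it'."""
    return sum(1 for (u, _action, o) in graph if u == user and o == obj)

print(affinity("dan", "McDonald's"))  # prints 1 -- a "Like" with no sentiment attached
```

Whatever I actually wrote in my comments about McDonald’s never enters the query; the structured triple is all the algorithm sees.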
Lastly, some Facebook PR regarding their search technology with some translation from VentureBeat. I’m now summarizing my thoughts in sound bites, but:
Translation: Google has been indexing for years. What is Open Graph content? It’s your content, on your site, shoved into their database and made to conform so they can monetize easily while avoiding the work.
Am I wrong? Is everything I’ve written complete nonsense? Has the world gone crazy by not observing this or am I just totally insane?