Here are some things that I learned on the Internet recently:
- A Boston-area Catholic priest, John Michael O’Neal, suffered a massive heart attack and was dead for 48 minutes before miraculously coming back to life in the emergency room. Father O’Neal reported that while he was dead he went to heaven and met God. God was a woman.
- Argentina’s president, Cristina Fernández de Kirchner, recently adopted a Jewish boy to save him from becoming a werewolf.
- A homeless guy was given $100 cash and spent it on other people who were in need rather than buying several cases of Night Train and throwing a party under the freeway overpass.
- A 600-pound Australian woman gave birth to a 40-pound baby, nearly doubling the Guinness World Records mark for heaviest newborn of 22 pounds 8 ounces.
- A man in Fargo, North Dakota, was arrested for clearing snow from his driveway with a military-grade flamethrower.
- Meanwhile, in Buffalo, New York, it was so cold that frozen squirrels were reportedly falling from trees.
- Macklemore joined the Islamic State (ISIS).
- This is bad news for Macklemore because it was also reported that there was an Ebola outbreak among ISIS.
Yes, none of that is true—even the part about Macklemore joining ISIS, which is plausible. If any celebrity were going to do that, it would be a white rapper who sings about buying cheap clothes from a thrift shop.
And while none of the above is true, all of it went viral on the Internet, mostly via social media services like Facebook, Twitter, and Instagram. There were people who believed these stories to one degree or another.
The promise of the Internet was that it would become the world’s largest digital library, containing all knowledge and facts. It’s becoming that, but it’s also the world’s biggest manufacturer of crap, an honor that previously belonged only to politicians and traveling snake oil salesmen.
Unfortunately, the Internet has become a place where it’s increasingly difficult for the average person to discern fact from fiction. By “average person”, I mean someone who doesn’t have a built-in, shock-proof “bullshit” detector. It’s been my experience that most of us don’t.
What we need is an equivalent of Sgt. Joe Friday for the Internet. For those of you who are not old enough, or whose cultural literacy only dates back to the 1990s, Sgt. Friday was the iconic police detective in the television series Dragnet during the 1950s. The show was revived from 1967 to 1970, and a movie starring Tom Hanks and Dan Aykroyd followed in 1987.
Sgt. Friday was a hardcore cop who methodically gathered the facts of a case. He had a built-in, shock-proof “bullshit” detector. The phrase “Just the facts, ma’am” was misattributed to his character following a radio satire of Dragnet in 1953 by Stan Freberg.
Sgt. Friday never actually said “Just the facts, ma’am”, but it stuck.
I didn’t actually know that and almost contributed to the perpetuation of the misattribution. Looking up the phrase on the Internet and verifying its origin from multiple sources is how I came to know the facts in this case.
In the future, Google may automate this process by ranking webpages according to their trustworthiness. Currently, Google’s best-known ranking algorithm, “PageRank”, scores webpages largely by the number and quality of other pages that link to them, which measures popularity rather than accuracy. Researchers at Google are working on a new project that would rank webpages not only on their relevance to a given search term, but also on their trustworthiness.
The conceptual framework and testing were detailed in a research paper, Knowledge-Based Trust: Estimating the Trustworthiness of Web Sources, published by Google researchers earlier this year.
“Quality assessment for web sources is of tremendous importance in web search. It has been traditionally evaluated using exogenous signals such as hyperlinks and browsing history,” wrote the authors of the research paper. “However, such signals mostly capture how popular a webpage is.”
For example, they found that while gossip websites mostly had high PageRank scores, those types of websites “would not generally be considered reliable.” Meanwhile, some less popular websites “have very accurate information.”
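To see why a link-based score rewards popularity and not accuracy, here is a minimal sketch of a simplified PageRank-style calculation. It is an illustration under stated assumptions, not Google’s production algorithm: the `pagerank` function, the damping value of 0.85, and the five `.example` sites are all hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Score pages from link structure alone (simplified PageRank sketch).

    links maps each page to the list of pages it links to. Every link
    target must also appear as a key in the dictionary.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                # A page passes its score on to the pages it links to.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical toy web: everybody links to the gossip site,
# nobody links to the accurate one.
links = {
    "gossip.example": ["blog-a.example"],
    "blog-a.example": ["gossip.example"],
    "blog-b.example": ["gossip.example"],
    "blog-c.example": ["gossip.example"],
    "accurate.example": ["gossip.example"],
}
scores = pagerank(links)
for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

Nothing in the calculation looks at what a page actually says. The gossip site comes out on top simply because it collects the most links, and the accurate site is stuck with the baseline score.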
The fact-extraction methodology used for the Knowledge-Based Trust algorithm is based on Google’s Knowledge Vault, a vast knowledge base of facts about the world that has been autonomously gathered and merged from across the World Wide Web. To date, Knowledge Vault has amassed 1.6 billion facts.
Knowledge-Based Trust builds upon what researchers have learned while building Knowledge Vault in order to provide “a much more accurate estimate of the source reliability.”
According to the research paper, the process for evaluating and ranking webpages based on the Knowledge-Based Trust algorithm begins by extracting “a plurality of facts from many pages using information extraction techniques.” Once the data is extracted, “we then jointly estimate the correctness of these facts and the accuracy of the sources using inference in a probabilistic model. Inference is an iterative process, since we believe a source is accurate if its facts are correct, and we believe the facts are correct if they are extracted from an accurate source.”
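The circular logic in that last sentence is easier to follow in code. What follows is a deliberately simplified sketch of the back-and-forth, not the probabilistic model from the paper: the `estimate_trust` function, the starting accuracy of 0.8, the averaging rule, and the three `.example` sites are all assumptions made for illustration.

```python
from collections import defaultdict


def estimate_trust(claims, iterations=20, initial_accuracy=0.8):
    """Jointly estimate source accuracy and fact correctness (toy version).

    claims maps each source to a non-empty dict of {data_item: value}.
    Returns (accuracy per source, belief per (data_item, value) pair).
    """
    accuracy = {source: initial_accuracy for source in claims}
    belief = {}
    for _ in range(iterations):
        # Step 1: a value is believed in proportion to the accuracy
        # of the sources that assert it.
        votes = defaultdict(lambda: defaultdict(float))
        for source, facts in claims.items():
            for item, value in facts.items():
                votes[item][value] += accuracy[source]
        belief = {}
        for item, by_value in votes.items():
            total = sum(by_value.values())
            for value, weight in by_value.items():
                belief[(item, value)] = weight / total
        # Step 2: a source is as accurate as the average belief
        # in the values it asserts.
        accuracy = {
            source: sum(belief[(item, value)] for item, value in facts.items())
            / len(facts)
            for source, facts in claims.items()
        }
    return accuracy, belief


# Hypothetical sources; the two data items come from this column.
obama = ("Barack Obama", "nationality")
movie = ("Dragnet movie", "release year")
claims = {
    "site-a.example": {obama: "USA", movie: "1987"},
    "site-b.example": {obama: "USA", movie: "1987"},
    "site-c.example": {obama: "Kenya", movie: "1987"},
}
accuracy, belief = estimate_trust(claims)
for source, score in accuracy.items():
    print(f"{source}: {score:.2f}")
```

In this toy run, the site that disagrees with the other two about nationality ends up with a lower accuracy score, and that lower score in turn weakens belief in its claim. Note that this version amounts to weighted majority voting, so it is only as good as its sources.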
Each fact is represented as a “knowledge triple”, which consists of a subject, a predicate, and an object. A subject “represents a real-world entity”. A predicate describes “a particular attribute of an entity”, while an object is a numerical value, date, or other data type.
One of the examples of knowledge triples provided in the research paper is:
Barack Obama, nationality, USA
Some webpages claiming that Barack Obama’s nationality is Kenyan rank highly under Google’s current PageRank system. Under Knowledge-Based Trust ranking, however, webpages containing the knowledge triple “Barack Obama, nationality, Kenya” wouldn’t fare so well.
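As a rough sketch of what that could look like, a triple can be modeled as a small record and a page scored by how many of its triples agree with a trusted reference. The `Triple` class, the `trust_score` function, and the one-entry `reference` dictionary are hypothetical stand-ins. The real system estimates correctness probabilistically and does not look facts up in a fixed table.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # a real-world entity
    predicate: str  # an attribute of that entity
    obj: str        # the value of the attribute


def trust_score(page_triples, reference):
    """Fraction of a page's checkable triples that match the reference.

    reference maps (subject, predicate) to the accepted object.
    Returns None when the page has nothing that can be checked.
    """
    checkable = [
        t for t in page_triples if (t.subject, t.predicate) in reference
    ]
    if not checkable:
        return None
    correct = sum(
        1 for t in checkable if reference[(t.subject, t.predicate)] == t.obj
    )
    return correct / len(checkable)


# Hypothetical reference data and pages, using the paper's example.
reference = {("Barack Obama", "nationality"): "USA"}
page_one = [Triple("Barack Obama", "nationality", "USA")]
page_two = [Triple("Barack Obama", "nationality", "Kenya")]

print(trust_score(page_one, reference))
print(trust_score(page_two, reference))
```

The score depends only on what the page says, not on how many other pages link to it.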
Here’s another knowledge triple that wouldn’t do so well:
Sgt. Joe Friday, quotes, “Just the facts, ma’am.”