Earlier today, Gina Neff, an associate professor of communication at the University of Washington and the School of Public Policy at Central European University in Budapest, noticed something missing from Uber’s Data Blog.
In a tweet, she wrote:
Spurious: Uber took down data science blog post showing correlation between prostitution and ridership. Cached here http://t.co/1PJGh68OWJ
— Gina Neff (@ginasue) November 24, 2014
The blog post in question was titled: “Location knowledge is a proxy for Uber demand.”
Judging by the original URL, though, it was apparently first titled something else: https://blog.uber.com/2011/09/13/uberdata-how-prostitution-and-alcohol-make-uber-better/
As Neff notes, the post is still cached here.
Uber has recently been facing a firestorm of criticism over threats one of its executives reportedly made to conduct opposition research on journalists. The episode, first reported by BuzzFeed, has left many concerned about Uber's heavy-handed PR tactics, its attitudes toward women, and even potential abuse of riders' personal data by executives.
Uber's public relations team has not yet responded to a question about why or when the prostitution blog post was apparently removed. But given the post's tone and substance, the company may well have felt the piece would be fodder for critics.
Written on Sept. 13, 2011 by Uber’s data team, it takes a somewhat lighthearted look at how Uber’s own ride data mashes up with local crime statistics.
“We show how where crimes occur — specifically prostitution, alcohol, theft, and burglary — can improve Uber’s demand prediction models,” the post says.
For context, the post notes that Uber is trying to use its data to predict when and where demand will occur. Some of that data is innocuous, like how many rides occur in certain neighborhoods.
But in trying to take a different tack on where and when people catch rides: “We hypothesized that crime should be a proxy for non-residential population density,” the team writes.
So, after mashing up its ride data with the San Francisco Crimespotting map, the team made a few determinations:
“Areas of San Francisco with the most prostitution, alcohol, theft, and burglary also have the most Uber rides. Be safe, Uberites!”
Then the post goes off on a bit of a tangent, as the authors get curious about the prostitution stats. They notice that prostitution arrests in both San Francisco and Oakland seem to spike on the second Wednesday of each month. Why?
“Someone pointed out to me that Social Security and welfare checks arrive on the second, third, and fourth Wednesdays of each month,” the team writes. “Oh man. Now we’re into dangerous, politically-charged territory.”
The team ends on an upbeat note, promising more such insights as it plays around with Uber’s data:
“This [is] one of the coolest things about working for a data-driven company like Uber: on the surface we’re a technology company revolutionizing transportation, but below the hood there are so many ways to look at our data. And sometimes that freedom to play leads to interesting results which aren’t immediately relevant to the core part of our business. This finding is a perfect example of the fascinating insights you can get when you combine big, seemingly disparate datasets.”
Neff said she went looking for the post after hearing a recent Marketplace story that mentioned another notorious Uber blog post, "Rides of Glory," which had also been taken down.
As Marketplace recalls:
"The company examined its rider data, sorting it for anyone who took an Uber between 10 p.m. and 4 a.m. on a Friday or Saturday night. Then it looked at how many of those same people took another ride about four to six hours later – from at or near the previous night's drop-off point. Yes, Uber can and does track one-night stands. Consider it the Uber equivalent of the walk of shame."
For Neff, the post and its removal were troubling for a host of reasons. She tweeted:
@obrien I can’t decide which offends my sensibilities most: misogyny, data recklessness, or spurious correlations masquerading as #bigdata
— Gina Neff (@ginasue) November 24, 2014