More Information About Online Dash Search Privacy

Recently there have been some concerns about the privacy of the new dash feature that queries external resources to provide related results. I wanted to follow up with further details about how these searches are performed, the privacy protections that are in place, and the further work under way.

I reached out to John Lenton, who is the Senior Engineering Manager in the Online Services team at Canonical. He was responsible for building the technology that handles the searches from the dash. He says:

When performing a search, you expose no more information to Canonical than the originating IP of your request, the search terms you enter, and the result you click on (if any). We don’t perform any kind of “tracking”; there is nothing really user-identifiable there… the IP address is unreliable for this, and isn’t relied on other than for collapsing multiple searches into one in the reporting, and even this is after passing it through a one-way hash.

Searches are currently performed over plain HTTP to our servers in a data-centre in either London or the USA, and then forwarded to the upstream providers appropriate to the originating request’s geolocation. The only potentially identifying bit of information, the IP address of the originating request, is not forwarded unless explicitly required to perform the search (so far, only one of 20+ upstream providers requires this: the Headweb video source for Scandinavian countries needs to do its own geoip).

We appreciate some of the community concerns about these searches operating unencrypted and we are currently working to encrypt these dash searches ready for the release of this feature in Ubuntu 12.10. This should resolve most of the concerns shared about unencrypted traffic.

In terms of logging, the raw httpd logs are only visible to a small group of people whose job requires that they have access and who are trained in respecting people’s privacy in accordance with European law on this matter. The searches themselves, stripped of the IP addresses (replacing them with a one-way hash) are made available to a slightly larger group of people to enable statistical reporting. Because not only the search but also clicking on a result reaches our server (where it is redirected to whatever is appropriate), we will be able to infer what search results people want when searching for particular terms, and at some point in the future this will be used to help us provide better, more relevant results. This statistical gathering of a mapping of search terms to clicked search results is not done yet but will be done soon.”

Please feel free to follow up with any further questions, and we will try to get them answered.

  • http://benjaminkerensa.com/ Benjamin Kerensa

    Also, a privacy policy covering data collection and handling should be out, hopefully before release.

  • Craig Maloney

    Thank you Jono. I’m pleased at how delicately and professionally Canonical has handled this issue. I had my doubts before, but feel this resolves my concerns.

  • Jacob Peddicord

    The searches themselves, stripped of the IP addresses (replacing them with a one-way hash) are made available to a slightly larger group of people to enable statistical reporting.

    No no no: http://en.wikipedia.org/wiki/AOL_search_data_leak

    If this data is to be stored at all, it needs to be completely anonymized. I’m not trying to be mr. panic here, but you’re setting yourselves up for quite an awful blunder.

  • xguest

    This should be opt-in. If Ubuntu wants to get into the search business I couldn’t be happier. Google needs competition other than from a monopoly power Microsoft (who install Bing by default).

    However in no way should data collection be enabled by default. Not only the invasive additions to the Home lens but the Video and Music lenses from 12.04 should not have been activated to outside searches. (Nothing, not even crash reports, should be sent automatically by default.)

    If Canonical wants to install a tutorial on how to add internet scopes to the dash and wants to advertise “affiliate scopes” in the software center, that’s fine by me (although software with affiliate code should be labeled as such), but they should not be trying to support themselves by such trashy means.

    Affiliate revenue can’t be the business model; otherwise Canonical will die. They should have a fundraiser every six months and sell DVDs, USB keys, T-shirts, and “memberships.” Promote helping Ubuntu by buying the software while invoking the goodwill and better nature of the community. I’m not the only one asking for this. “Take my money, please,” but I won’t feel compelled to donate and won’t buy through affiliate links. Make an effort, produce a product, people will buy.

    And if the software won’t sell, sell hardware. Companies like System 76 will never be enough, Ubuntu needs hardware designed for Ubuntu (if software won’t sell), not mediocre ODM rebranded “customized” designs.

  • http://twitter.com/DaveEwart Dave Ewart

    Thanks for posting this, Jono. You state that Canonical sees the search terms: search terms can be very sensitive and may themselves contain personal information. It is correct that work should be done to encrypt their transit, but is that enough? I’m not so sure. I would absolutely NOT expect that search terms I type (for what is perceived as a LOCAL desktop operation) are sent out over the ‘net.

    [For example, local search terms (looking for personal documents etc.) may contain references to medical conditions, financial matters and so on, because those are items typically kept locally.]

    At the very least, there should be a first use “Are you happy for us to …”, suitably phrased.

  • Mario

    “When performing a search, you expose no more information to Canonical than the originating IP of your request, the search terms you enter, and the result you click on (if any).”

    Exactly, and the search terms can already leak information

  • Juergen Donauer

    But the Previews are loaded directly from Amazon? So Amazon will know what my IP address is interested in – sort of … right or wrong?

  • http://benjaminkerensa.com/ Benjamin Kerensa

    Correct. The lens still grabs Amazon product photos directly from Amazon’s servers (CloudFront), and when that occurs it is very possible that an httpd log somewhere records the IP address, timestamp, and file requested.

    So even though the initial query may in fact be anonymized via Canonical’s servers, the preview HTTP GET (which retrieves the product suggestion images) is sent directly from you to Amazon, with no Canonical server involved. That is pretty close to sending vanilla search queries, except that instead of a specific keyword Amazon will only have a ballpark category to work with.

    I just tested this in Wireshark to be sure, and you can see it in the attached image.

  • Anonymous CowHard

    Jono: Someone already pointed you at the AOL data leak. What you are creating here is several orders of magnitude more intrusive than a web search.

    Do you really want to risk creating thousands of Thelma Arnolds and User 927’s?

  • Thomas Kluyver

    Crash reports, in fact, are quite a good precedent – it’s very valuable data for fixing bugs, but the user still has to explicitly click a button before anything is sent to Canonical.

    I wouldn’t presume to tell Canonical how they can make money – I don’t know what model will work. But I hope they can find a better one than putting affiliate links in the home lens.

  • Thomas Kluyver

    Others are already discussing the problems with the feature, I’d like to ask about the process. It appears the feature freeze exception for this was rushed through because Mark Shuttleworth was pushing for it. That doesn’t leave much opportunity for the community to discuss it, or to build something like a lenses control panel before 12.10 comes out.

    I appreciate that Mark has put a lot of his own money and time into Ubuntu, and I am grateful that I can use it, as well as infrastructure like Launchpad. But do you think it’s OK that he can effectively bypass the normal process? Did Canonical realise before announcing it that this change would be controversial?

  • Thomas Kluyver

    There’s an obvious concern with a one-way hash: if anyone in that ‘larger group’ knows someone’s IP address, and knows the hash algorithm, they can easily see what they have entered. That can be avoided if it’s hashed with a secret random number. Is that what they’re doing?
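The keyed-hash idea raised in the comment above can be sketched in Python. This is only an illustration of the general technique (an HMAC over the IP address with a secret key), not a description of what Canonical's servers do; the function names and the sample address are hypothetical.

```python
import hashlib
import hmac
import secrets


def plain_hash(ip):
    """Unkeyed one-way hash: anyone who knows the algorithm can recompute it."""
    return hashlib.sha256(ip.encode("ascii")).hexdigest()


def keyed_hash(ip, key):
    """HMAC-SHA256 of the IP: recomputing it requires the secret key."""
    return hmac.new(key, ip.encode("ascii"), hashlib.sha256).hexdigest()


# Hypothetical secret, held only by whoever is allowed to see raw logs.
key = secrets.token_bytes(32)
ip = "203.0.113.7"  # documentation-range address, used as an example

print(plain_hash(ip))       # same value for everyone, on every machine
print(keyed_hash(ip, key))  # depends on the secret key
```

With the keyed version, someone in the larger reporting group who knows a person's IP address cannot recompute the stored value unless they also hold the key, so the protection is only as good as the secrecy of that key.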

  • http://benjaminkerensa.com/ Benjamin Kerensa

    Hi Thomas,

    So Ubuntu is a Meritocracy and at the top of the governance and development process sits Mark. As such I do not think the Feature Freeze necessarily applies to him. Whether the feature went through the process of seeking a freeze exception, I do not know… I understand there are plenty of people upset about the feature, but let’s also remember how disappointed plenty of people were with Unity, but then 12.04 dropped and things were polished and many people who left said “I cannot believe I left”.

    We are all human beings and make mistakes and the things we produce are never perfect and better yet never polished the first time round.

    I think the plaintext issue will be sorted before release and I think come 13.04 we are going to see more shopping providers and an even more improved shopping lens experience.

    Just give it a chance… Stick around for awhile… The Ubuntu Community values all of its contributors and users.

  • Chas. Owens

    It is useless to hash an IP address (even if it is salted). Even with my anemic netbook using Perl it would only take around four hours to brute force the entire search space (there are 232 IP addresses and I can run 80,000 sha256 hashes per CPU per second). Once I have a lookup table, I can reverse the hash at will. I guess if you generated a random salt every day and threw it away (i.e. didn’t record it anywhere) it would add some security.

  • Chas. Owens

    That should be 2 to the 32, not 232, the caret was eaten by the disqus.
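The brute-force argument in the two comments above can be illustrated with a short Python sketch. It builds a reverse lookup table for a small documentation network rather than the full 2 to the 32 IPv4 space, and estimates how long the full space would take at the quoted 80,000 hashes per second. The function names, the `/24` network, and the choice of plain SHA-256 with an optional known salt are assumptions for illustration, not details of any real deployment.

```python
import hashlib
import ipaddress


def hash_ip(ip, salt=b""):
    """SHA-256 of an IP address, optionally prefixed with a salt known to the attacker."""
    return hashlib.sha256(salt + ip.encode("ascii")).hexdigest()


def build_lookup(network, salt=b""):
    """Hash every address in the network and map each hash back to its address."""
    net = ipaddress.ip_network(network)
    return {hash_ip(str(addr), salt): str(addr) for addr in net}


def estimate_hours(hashes_per_second, workers=1):
    """Hours needed to hash all 2**32 IPv4 addresses at the given rate."""
    return 2**32 / (hashes_per_second * workers) / 3600


# Demonstration on 256 addresses instead of the whole IPv4 space.
table = build_lookup("198.51.100.0/24")
leaked = hash_ip("198.51.100.42")
print(table[leaked])  # the original address is recovered from its hash

print(round(estimate_hours(80_000), 1))             # one worker
print(round(estimate_hours(80_000, workers=4), 1))  # four workers in parallel
```

At 80,000 hashes per second a single worker needs roughly 15 hours for the full space, and four parallel workers need a little under four hours, which is in line with the figure quoted above if several CPUs are used. A salt does not change the picture when the attacker knows it, since the table can simply be rebuilt with that salt.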

  • Michael

    What about doing something as simple as publishing the source code of the server-side part? After all, if we say that free software improves trust, maybe the obvious solution is to simply follow the free software philosophy.

  • http://jeremy.bicha.net/ Jeremy Bicha

    There is a feature freeze exception process; the late feature has to be approved by the Release Team. This was also a user interface freeze exception so it had to be approved by the docs and translations teams. And it was also a main inclusion request so it needed a security review. So either the Release Team or the Security Team could have rejected it if they weren’t comfortable with it. The freeze exception bug is http://pad.lv/1053470

  • Thomas Kluyver

    I’m not going to leave Ubuntu over this issue, but I’m still not comfortable about it. I’ll be uninstalling the shopping lens until there’s a better way to control what the home lens does.

    There was a feature freeze exception, as Jeremy says, but it was marked ‘Critical’ and approved within one day, despite one commenter noting the lack of a test suite. I don’t feel that it was held to the usual standards expected for a freeze exception, especially for such a controversial change.

  • Michael

    Sorry, but if the governance is “the one who invests more money has more rights than the others”, this is not meritocracy, this is plutocracy.

    (And I am sure that explains the low number of external contributors to Ubuntu when we compare to Debian or Fedora, despite Jono’s efforts to fix the mess.)

  • Anonymous

    They could, and perhaps should, that would be nice. Won’t make the slightest bit of difference to the privacy situation though.

  • http://twitter.com/lukegmatthews Luke Matthews

    Local search terms leaking private information to Canonical/Amazon is, IMO, the biggest privacy issue with this whole thing.

  • k1fri

    i had to sleep over it for a night. i like ubuntu. i don’t see an alternative to it right now. i’d love to see canonical creating some revenue with it. but i somehow feel that sending dash inputs to canonical and amazon is violating my privacy (even if you promise you do not track). i am going to stay with ubuntu, but this feature will be switched off on all the machines i have immediately when it hits my desktop.

    i work as an accountant, we’re legally sworn to secrecy. most of our machines run on win 7, but we do have some ubuntu-vms. and now, for example, if i was looking for a document named “client xxx, very secret stuff” all that information would be sent to canonical? that wouldn’t even be legal in germany. very big no go.

    PS i kind of like amazon. i have to admit i buy my music at amazon but i’d never ever buy a book or hardware at amazon, because i like to support local businesses (for music that’s not possible anymore)… monopolies aren’t too cool either.

  • Sicofante

    Explanations by Mr. Lenton just confirm the original fears: my OS is sending bits of information of my local searches to third parties on the internet without my express consent.

    This lens should be opt-in. There’s no other way around it if Ubuntu wants to keep its reputation. As it is now, I still call this adware/spyware. (Is this a first in FLOSS? Just wondering.)

  • Michael

    I suspect people would be more relaxed if they could see that the application that processes logs to improve matching is not doing anything that affects their privacy. Another step would be to publish the puppet configuration (minus passwords, but that’s easy to do) and/or the list of people who have access to it. That’s infrastructure, there is nothing magical about it (and the same reasons to have free software apply to configuration).

  • Anonymous

    ah OK, fair point that would be a mild improvement, but to be honest I really am not very concerned about Canonical or Amazon. I think the architectural problem of all lenses being allowed to listen to all search queries is something that needs solving or third party lenses are more dangerous than they should be. Even lenses that don’t populate results on the global search page can listen to global searches and do stuff with it right now.

  • http://benjaminkerensa.com/ Benjamin Kerensa

    It is not about his financial investment; it’s about how he leads.

  • Michael

    That’s interesting, so basically, if Unity starts to be used on TVs or phones, any “value added” lens added by an OEM (because if the plan to make money with it works, I fail to see why any OEM wouldn’t replicate it) could disclose much more information to a remote server without much oversight?

  • Michael

    He leads because those are the rules that he decided himself. In sabdfl, AFAIK, SA stands for self-appointed. Again, that’s not what I call meritocratic. Or if you say that the community is based on merit, but that the rules don’t apply to him (bypass feature freeze, no need to show the same amount of merit to be at the top), then he is not stricto sensu part of the community. And therefore, the community is not in charge of Ubuntu.

  • Anonymous

    yeah, or worse. I could write a really attractive and innocent sounding lens, maybe a lens that searches for pictures of kittens or something. In the background it would be listening for everything typed in to the search box and sending it to somewhere really evil. I have yet to work out what evil you can do with stuff people type in the search box before hitting return though, it is mostly going to be stuff like “ter” or “rhyth” or “writ” or “ged”

  • dakira

    @Jono have you seen this post? https://perot.me/ubuntu-privacy-blunder-over-amazon-ads-continues What is your reaction?

  • http://benjaminkerensa.com/ Benjamin Kerensa

    Amazon Associate Terms of Service would not allow their API to be used in connection with any application running on a Tablet or Smartphone.

  • http://benjaminkerensa.com/ Benjamin Kerensa

    He didn’t bypass the feature freeze; in fact, it was not even him who pushed the branch. The Unity Team went through all the proper procedures the Release Team has in place to get an exception, with every other team from Docs to Translation giving it a +1.

  • http://metin2wiki.ru CSRedRat

    Good news.. ;)

  • Luís de Sousa

    Dear Jono,

    My interpretation of the data protection law is that the act of collecting search terms associated with the IP address cannot be made without user consent. Beyond this, the act of collecting all searches, irrespective of the user’s intent may be outright illegal, since it may contain information the user would otherwise never disclose. Read more here:

    http://attheedgeoftime.blogspot.com/2012/10/legal-questions-on-ubuntus-shopping-lens.html

  • bleepingfurious

    JONO you sucking criminal pinwit. just because google tracks requests does not mean Dash get to intercept or divert these searches. nor to intercept record and forward keystrokes entered by users when operating their computers by kelogging an essential program used to find programs o the machine. this practice is spyware illegal and if you were in US I would shove a fat lawsuit or criminal complaint up your sorry chickenshit terrorist ass. Considering it anyway. I’m past being “nice” because this is wayyyyy past uncivilized conduct, even for a “free” product. Your privacy policy notice and switch is particularly obscure, hidden and inadequate, just as obviously intended. Suggest Fire the person who authorized… with tar and real fire maybe best, just as demo of good faith in desperate bid to regain credibility and quickly disappearing trust by users. Me now dumping Ubuntu 13.04 fast as speeding bullet because careful honest folk cannot apparently trust you mfs. Will try other distros and badmouth Ubuntu (which I previously recommended… now shuddering at my own lapse) If I see any more of this CS I will make it my mission to wreck yo Canonical choo-choo, pinwits, unto the third generation even. Ah has spoken. Hv a nice day.