Oh, come on! The whole idea of search engines is to rely on someone else's knowl...

ghshephard · on Feb 4, 2011

Re: using someone else's knowledge - Google honors robots.txt, so if you don't want them to use your link knowledge, you can tell them not to.

I'm not sure there is any way for Google to tell MSFT not to use their search results in the Bing search engine.

If Microsoft had just been a bit more upfront about the fact that they were using IE users' Google click behavior to improve their search engine, I suspect there would have been much less furor.

dilap · on Feb 4, 2011

I don't know, doesn't Google harvest the content of anything I send to an @gmail.com address? Or how about the massive scanning of books over the objections of book publishers a while back (though I think this ultimately got resolved)? My general impression of Google, and part of why I find their crying foul so jarring, is that they will harvest as much information as they possibly can, to whatever end, until restrained by public outcry. (Which, BTW, I'm fine with!)

But it seems very hypocritical to get butt-hurt on behalf of Bing toolbar users who are having their movements tracked.

(BTW, I don't find Google's observance of robots.txt to be particularly compelling or telling, because they've never been in a position where ignoring it would be significantly beneficial to them, as far as I know.)

ghshephard · on Feb 4, 2011

Google was pretty clear that they would be harvesting the content of your Gmail to target ads. The difference here is that MSFT did this "Google-search-click-tracking" thing on the down low.

I agree with you, though: if I, as an IE9 user, wish to submit my click-track results to MSFT for analysis so they can improve search results, that's fair game.

But it's not clear to me that MSFT should be able to review what the user was searching for before they clicked on a result. Now they are actually using Google's search data plus the user's click traffic. I think they cross a line there, particularly if they aren't willing to come clean, admit that's what they are doing, and make it clear that they are sending your Google search queries plus your click traffic back to Redmond.

How many people on HN were aware that Microsoft was doing that with IE? Click traffic, sure, but I didn't know they were sending my Google search queries back to HQ.

SimonPStevens · on Feb 4, 2011

Bing didn't crawl Google's pages (so it never reads a robots.txt file); they are collecting click-stream data from users via the Bing toolbar as they browse a page.

It's voluntary; you can opt out of the anonymous data reporting. The data they are using belongs to the user, so it's the user who can opt out, not Google.

gojomo · on Feb 4, 2011

"I'm not sure there is any way for Google to Tell MSFT to not use their search results in the Bing search engine."

Google could send a DMCA Takedown or other Cease-and-Desist letter.

grovulent · on Feb 4, 2011

Address the argument.

The intuition is that if a method allows you to build a search engine without actually knowing anything (besides the method itself), then that method is piggybacking off someone else's work.

Yes, Google piggybacks off other people's work, i.e. it uses links to other sites as an indicator of quality. But this is not piggybacking off a search engine. It's not piggybacking off the technical work that someone else developed for a search engine product. So it's not a counterexample to my argument.

dilap · on Feb 4, 2011

We agree, in a way, but disagree about the severity of the distinction. Google can't build a good search engine w/o mining an existing network of interlinks between sites to build a good measure of quality. Bing can't provide results as good (or perhaps "good" -- "torsoraphy", "hiybbprqag"...) as they currently do w/o watching what users click on, which includes Google search results. You're saying (if I follow you) that what Google does is fair because they always transform the form of their input, whereas Bing, in some instances, does not transform their input -- they use search engine results to provide search engine results. OK, fine. But I still don't really see the outrage.

I don't care about piggy-backing or fairness; I care about maximum public good.

So here's the situation. Google wants Bing to stop harvesting its results. Why? Very broadly, there are two scenarios:

- Bing is adding value
- Bing is not adding value

In the latter scenario, Google should not care. No one will use Bing, and it's a non-issue.

In the former scenario, it's not at all clear to me that Google should have a monopoly on the queries entered and sites visited by people who happen to visit their site. Google is trying to assert ownership, in a fashion, of the link between the user's query and the site they visited, because the site they visited happened to be shown to them by Google. I see no reason to grant them this right.

The fact that Google acts outraged by Bing's behavior strikes me as particularly rich, given that Google itself is a notorious and cavalier harvester of data others would consider "theirs".

(Edit: This is pure, uninformed speculation, but I wonder if Google's (to me) odd outrage is because they place such a premium on algorithms over human-generated content. I.e., the idea is it's fine to harvest other people's human-generated work ("information wants to be free"-style), but harvesting others' algorithm-generated content is verboten, because that's essentially stealing the algorithm. I have no inside knowledge, but this would agree w/ the pop-culture characterizations of Google.)

grovulent · on Feb 4, 2011

It doesn't improve the common good.

Bing is not adding value with this method - they are thieving value. By thieving it they reduce the reward for the value that Google provides. As such they reduce the incentive to provide genuine innovation.

Having said that - I don't necessarily disagree with the view that Google's position is a bit rich given the data that they do harvest from us folk without remunerating us for that work. But that doesn't make the argument against Bing any weaker. Two wrongs n rights n all that.

true_religion · on Feb 4, 2011

What data does Google harvest without remuneration?

When it comes to PageRank, the implicit offer is: allow Google to use your links (intra-site and extra-site) in its algorithm, and it will make your site searchable via Google.

If you don't want Google to use your 'hard work' of collating and vetting links to other sites, then disallow the Googlebot via robots.txt.
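For anyone who hasn't written one, here is a rough sketch of what that opt-out looks like and how a well-behaved crawler would read it. The two-stanza robots.txt is the standard way to shut out Googlebot while leaving other bots alone; the helper name `is_allowed`, the bot name `SomeOtherBot` and the example.com URL are just made up for illustration, and this uses Python's standard-library parser, not anything Google actually runs.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that blocks Googlebot from the whole site
# and leaves every other crawler unrestricted.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text.

    Hypothetical helper for illustration; a real crawler would download
    robots.txt from the site root before checking.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    page = "http://example.com/links.html"  # illustrative URL
    print("Googlebot allowed:", is_allowed("Googlebot", page))
    print("SomeOtherBot allowed:", is_allowed("SomeOtherBot", page))
```

Note this only works because the crawler chooses to check the file before fetching; nothing in the protocol enforces it.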

On the other hand, Microsoft refuses to allow any way to disallow click traffic patterns involving your site from being used in its algorithm. They are thus mining the links from your site to another without even having to remunerate you by giving you a chance to be indexed in their engine.

grovulent · on Feb 4, 2011

That's a good point. I didn't really have a clear view on that matter. I just wanted to emphasise that however that discussion plays out, it is irrelevant to the argument against Bing.

wnoise · on Feb 4, 2011

> Microsoft refuses to allow anyway to disallow click traffic patterns involving your site

Because that information doesn't belong to the site with which it is associated, but to the user.

srean · on Feb 4, 2011

In my opinion there is a difference.

A webpage author links to another page as an act of conscious recommendation. Whoever added the link added it for the sole purpose of letting others make use of it. It would be a stretch to claim that Google serves search results for its competitor to make use of.

Next is the issue that Bing is not scraping Google but using user clicks. But here users are just a means to the end of scraping. Doing something by way of third parties does not take away from the fact that one is still doing it.

I am all for having someone give Google a run for its money, but I want that to be driven by genuine technological innovation, not by being an El-Cheapo knockoff of the market leader. Some search engines are beating Google in niche markets by being better than Google, which is excellent.

I don't think Bing re-serving Google results brings any innovative pressure to the market, especially when you know that whatever innovations one brings will get replicated by piggybacking. I wouldn't want to be in a business where this is true.

What worries me is that the main players will start putting more effort into inconveniencing each other than into building better products. I have been fearful that Microsoft would someday tweak IE so that Google does not work well on it. They have done this for a few other sites but haven't done it to Google. I think that was initially out of fear of starting a race, but now, with other browsers beginning to rule the roost, that's moot.