
Because Google's robots.txt disallows it, and those websites allow it.


robots.txt is not a legal contract. It's just a convention to express the wishes of the site author, but there's no legal obligation to follow these wishes.


It does indicate that those other sites want Google to scrape them, while Google does not want others to scrape its results. That is an important distinction, which ocdtrekkie ignored, for judging whether the scrapee will want to take legal action.


You may wish to review https://www.eff.org/deeplinks/2019/09/victory-ruling-hiq-v-l...

Google Search results are definitely "public data" so long as Google provides them to anyone who asks.


Then why does Startpage pay Google and DDG pay Microsoft?


While scraping search results isn't illegal, by any means, it's also not illegal for Google or Microsoft to block requests they believe are from competing search engines. Presumably the cost of paying them is less than the cost of hiring engineers to constantly try to find new ways to outwit Google and Microsoft engineers.

Again, if scraping data from websites without permission were illegal, Google simply wouldn't exist. Bear in mind, robots.txt is a feature that Google and Microsoft choose to respect, but the default assumption search engines have made from the beginning is that they are free to grab whatever they want from the web unless you ask them not to.
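
To make the "choose to respect" part concrete, here is a rough sketch of what a well-behaved crawler does before fetching a page, using Python's standard urllib.robotparser. The robots.txt content, the "ExampleBot" user agent, the example.com URLs, and the is_allowed helper are all made up for illustration; none of this is copied from Google's or anyone else's real file. Note that nothing enforces the check: a crawler that skips it can still make the request.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this example
# (not taken from any real site).
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Allow: /about
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    urls = (
        "https://example.com/search?q=test",  # covered by the Disallow rule
        "https://example.com/about",          # explicitly allowed
    )
    for url in urls:
        print(url, is_allowed(EXAMPLE_ROBOTS_TXT, "ExampleBot", url))
```

An empty robots.txt allows everything, which matches the default assumption described above: crawling is permitted unless the site asks otherwise.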


> the default assumption search engines have made from the beginning is that they are free to grab whatever they want from the web unless you ask them not to.

Which Google's robots.txt does.

> scraping search results isn't illegal, by any means

While scraping the results for yourself to look at might be OK, scraping results to display verbatim in another search engine without permission stretches fair use.


> While scraping the results for yourself to look at might be OK, scraping results to display verbatim in another search engine without permission stretches fair use.

No, it doesn't, because Google results aren't copyrightable, hence, there is no such thing as fair use. It's just information anyone is free to collect and use as they see fit.


Why would rankings not be copyrightable?


Why would they be? Again, if all things were copyrightable by default, Google could not even exist; they assume they have the right to consume any data they want.

If a monkey can't copyright a selfie because they're not a person, an algorithmically generated spew of stuff Google ripped from elsewhere certainly lacks merit for copyright.


All things are copyrighted by default. Once again, those websites grant a license to search engines to consume their content via robots.txt, and Google does not.



