Google still hasn't removed "deleted" private Docs data from 2007

magicalist · on Nov 14, 2012

I came across this old Google explanation[1], but I'm not sure it (or this blog post) is very relevant today. Google claimed that they kept the image around because it might have been referenced from another site, even if the document was deleted, and they appear to still be keeping those old images around, I guess. The claim also seems to be that a cryptographic hash URL is as unguessable as a password-secured one, though it's not stated directly.

In any case, I actually tried it myself (gasp) with a new doc. I dragged in an image, inspected it to find the URL, deleted the doc, and then permanently deleted the doc from the trash (I assume it hangs out there for 30 days like with Gmail's trash). The image stuck around for maybe 15 minutes but is now gone, so I don't think this applies to Docs today, though I can't find any help document that says either way.

[1] http://googledocs.blogspot.com/2009/03/just-to-clarify.html
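
For a rough sense of the "unguessable URL" claim above, here is a minimal Python sketch. The 128-bit token length, the guess rate, and the `docs.example.invalid` URL are all assumptions for illustration; nothing here reflects how Google actually generates its image URLs.

```python
import secrets

TOKEN_BYTES = 16  # assumption: 128 bits of randomness per image URL


def make_image_url(base="https://docs.example.invalid/img/"):
    """Return a hypothetical image URL ending in a random, URL-safe token."""
    return base + secrets.token_urlsafe(TOKEN_BYTES)


def expected_years_to_guess(bits, guesses_per_second):
    """Average time for a blind guesser to hit one specific token, in years."""
    average_guesses = 2 ** (bits - 1)  # on average, half the space is searched
    seconds_per_year = 365 * 24 * 3600
    return average_guesses / guesses_per_second / seconds_per_year


url = make_image_url()
years = expected_years_to_guess(TOKEN_BYTES * 8, guesses_per_second=1e9)
print(url)
print(f"about {years:.1e} years at a billion guesses per second")
```

So blind guessing is probably not the weak point. The more likely difference from a password is that a URL tends to leak (browser history, referrer headers, pasted links) and can't be changed afterwards.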

coderdude · on Nov 14, 2012

In the title, "from 2007" means the article was published on July 15th, 2007. This probably would have been pointed out sooner but the site has remained down for some time now. I'd imagine it has been down since at least the moment it hit the front page (having some faith in the initial up-voters here). That said, this would have made an excellent trap for people who vote based on title alone.

Edit:

veemjeem pointed out that he can see the site just fine, which prompted me to try it from another network. I can access the site from my connection through Verizon but the server times out through my AT&T landline connection.

ontheotherhand · on Nov 14, 2012

>> In the title, "from 2007" means the article was published on July 15th, 2007

And so is the image that is part of the private document which supposedly was deleted, but can still be accessed even today, more than five years later. Therefore the content of that article, and the evidence contained therein, actually matches the title perfectly. So what "trap" are you talking about?

coderdude · on Nov 14, 2012

The trap would serve to ensnare people up-voting an article based on the title since they cannot actually access the content to read it. It's of no consequence whether the title just happens to match what is found in the content (as far as an actual trap would be concerned).

veemjeem · on Nov 14, 2012

I can read it just fine over here. I don't understand why you think it's a trap?

Maybe it's just your internet connection. Perform a traceroute against the host to see who's at fault.

coderdude · on Nov 14, 2012

My connection does seem to be the problem here. I wonder if it's something that only affects me or if anyone else here is having the same issue connecting to this server. Traceroute times out.

veemjeem · on Nov 15, 2012

Where does it time out? You can probably find the router that is the issue. If it times out before it gets beyond your ISP, chances are it's your connection. My traceroute looks good here, so it's probably not their webserver.
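
The check described above can be sketched in Python as follows. This assumes Unix-style `traceroute` output (Windows `tracert` formats hops differently), and the sample text is made up using documentation-range addresses; the function name `last_responding_hop` is just an illustration.

```python
import re


def last_responding_hop(traceroute_output):
    """Return the number of the last hop that answered, or None if none did."""
    last = None
    for line in traceroute_output.splitlines():
        match = re.match(r"\s*(\d+)\s+(.*)", line)
        if not match:
            continue  # header line or blank line
        hop, rest = int(match.group(1)), match.group(2)
        # A hop whose probes all timed out prints only asterisks.
        if rest.replace("*", "").strip():
            last = hop
    return last


# Made-up sample output (documentation-range addresses), not a real trace.
SAMPLE = """\
traceroute to example.invalid (203.0.113.10), 30 hops max, 60 byte packets
 1  192.0.2.1  1.2 ms  1.1 ms  1.0 ms
 2  198.51.100.7  9.8 ms  9.9 ms  10.1 ms
 3  * * *
 4  * * *
"""

print(last_responding_hop(SAMPLE))
```

If the last responding hop is still inside your ISP's network, that points at your side. A silent hop in the middle of an otherwise complete trace is usually harmless, since some routers simply don't answer probes.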

ontheotherhand · on Nov 14, 2012

For me the page didn't load at first, because some script or other was on a host that didn't respond. So maybe try turning off JavaScript; it did the trick for me.

Dylan16807 · on Nov 14, 2012

I don't understand. How can it be a trap if the title matches the content? If people upvote based on what they think the content is, and they are correct about what the content is, are their upvotes a mistake?

Also if you wanted to 'trap' people you could just make fake screenshots.

eproxus · on Nov 14, 2012

From the site:

  When we last checked the URL (at the time of this writing)
  12 hours passed by since we “deleted” Document1 from
  Google Docs

s_henry_paulson · on Nov 14, 2012

A devil's advocate could also say that they didn't have everything set up properly back then, and that now that they have proper security in place, it isn't possible to apply the new security to old documents because the options didn't exist when the documents were created.

ontheotherhand · on Nov 14, 2012

It's not possible to delete those documents because they didn't have mechanisms in place to delete them back then? It's not possible to physically delete documents that have been flagged as deleted and emptied from trash?

How about "no"?

s_henry_paulson · on Nov 14, 2012

There's no evidence that the document wasn't deleted.

It's not the document that's being accessed, but an embedded image within the document, accessed by some unique identifier.

It's entirely possible they just have this jpg saved with its identifier, but don't have good information about the related documents that pointed to it, hence not knowing that it should be deleted.

I mean, think about it: Google Docs STARTED in 2007 and was based entirely on products built by companies they acquired.

I am 100% positive they didn't have everything set up properly in 2007.

ontheotherhand · on Nov 14, 2012

Then simply look through all non-deleted documents and see which files are still referenced and which ones are orphans. Delete the orphans. Done.
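
A minimal sketch of that sweep in Python, assuming each document record carries a `deleted` flag and a `file_ids` list. Those field names and the sample data are hypothetical, not Google's schema.

```python
def find_orphans(documents, stored_files):
    """Return stored file ids that no live document references.

    documents: iterable of dicts with 'deleted' (bool) and 'file_ids' (list).
    stored_files: iterable of every file id currently in blob storage.
    """
    referenced = set()
    for doc in documents:
        if not doc["deleted"]:
            referenced.update(doc["file_ids"])
    return set(stored_files) - referenced


# Hypothetical data: two live docs, one deleted doc, four stored images.
docs = [
    {"id": "a", "deleted": False, "file_ids": ["img1", "img2"]},
    {"id": "b", "deleted": True, "file_ids": ["img3"]},
    {"id": "c", "deleted": False, "file_ids": ["img2"]},
]
files = ["img1", "img2", "img3", "img4"]

orphans = find_orphans(docs, files)
print(sorted(orphans))
```

This only works if every live reference is recorded somewhere scannable. It can't see links from outside pages, and a file uploaded between the scan and the delete could be wrongly flagged, so a real sweep would presumably need a grace period.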

magicalist · on Nov 14, 2012

If the "working as intended" behavior is that these images won't be deleted since they can be linked to from elsewhere on the web, as appears to be the case (see my link to the old blog post above), they actually can't delete these old images.

ontheotherhand · on Nov 14, 2012

"they actually can't delete these old images."

They can, it just would break those old links. A dumb decision ages ago doesn't force you to stick to it as you seem to imply.

beaker52 · on Nov 14, 2012

I'm not surprised. I doubt many readers here would be. That doesn't make it acceptable though.

I fear more for the people who've had private photographs 'automatically' uploaded to Google+ via their mobile devices. Even if they weren't posted, I bet they still exist somewhere in Googleland, just waiting for that Google intern to run them all through Google's 'safe search' filter in reverse.

DanBC · on Nov 14, 2012

Matt Cutts has responded in the comments of the linked article. (http://www.line-of-reasoning.com/issues/privacy-issue-google...)

I think Google make a reasonable point; a bit daft, but still.

Has anyone tried fusking the URLs?

crististm · on Nov 14, 2012

When Gmail would not let me create a mail folder with the same name as one I had just deleted, I knew they weren't telling me the whole truth.

Last time I checked, it looked like they had fixed this issue.

pirroh · on Nov 14, 2012

It's not about "not telling you the truth"--it's all about the inherent complexity of distributed systems. Might sound counterintuitive, but deletions are not easy to implement, and are very often deferred (obviously this doesn't apply to the image mentioned in the article).
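
A toy Python sketch of deferred deletion, assuming a single in-memory store with a tombstone per deleted key. The `Store` class and the grace period are illustrative assumptions, not a description of any real Google system.

```python
import time


class Store:
    """Toy key-value store where delete() hides data and purge() erases it."""

    def __init__(self):
        self._data = {}        # key -> value
        self._tombstones = {}  # key -> time the delete was requested

    def put(self, key, value):
        self._data[key] = value
        self._tombstones.pop(key, None)

    def delete(self, key, now=None):
        """Mark the key deleted; the bytes stay until purge() runs."""
        if key in self._data:
            self._tombstones[key] = time.time() if now is None else now

    def get(self, key):
        """Readers never see tombstoned keys."""
        if key in self._tombstones:
            return None
        return self._data.get(key)

    def purge(self, grace_seconds, now=None):
        """Physically remove keys tombstoned at least grace_seconds ago."""
        now = time.time() if now is None else now
        expired = [k for k, t in self._tombstones.items() if now - t >= grace_seconds]
        for key in expired:
            del self._data[key]
            del self._tombstones[key]
        return expired


store = Store()
store.put("img1", b"...")
store.delete("img1", now=0)
hidden = store.get("img1")                      # hidden from readers right away
early = store.purge(grace_seconds=900, now=60)  # still inside the grace period
late = store.purge(grace_seconds=900, now=900)  # grace period over, bytes removed
print(hidden, early, late)
```

In a replicated setup the tombstone would also have to reach every replica before the purge, which is one plausible reason the physical delete gets deferred.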

PanMan · on Nov 14, 2012

Could it be that the image hasn't been deleted as it's referenced from this article?

erez · on Nov 14, 2012

Google, like every other company, "forgets" that when its users delete something, they want it purged from all the servers, not just marked as "deleted" in the database. While the practice is very common, it shouldn't be used on your customers' data, even if those customers are basically the product, as in Google's case and others'.

eproxus · on Nov 14, 2012

I'm a bit interested in the legal situation here. I think at least in Sweden, if you asked Google to delete a private document including images (maybe personal etc.) they'd be legally obligated / forced to do so. Anyone know if there are any privacy laws regarding this?

duskwuff · on Nov 15, 2012

I have to wonder: If these files are never deleted, could this be used as an inefficient means of storing big blobs of data online? If someone uploads illegal content to Google Docs attachments, does Google have any means of removing it at all?