Hacker News

It looks like it's pulling characters from the paragraph to generate the "unique" paragraph ID. ID = the first letter of each of the first 3 words in the paragraph's first sentence + the first letter of each of the first 3 words in its last sentence.

I wonder... for all the different articles on NYTimes, and the different configurations of words across paragraphs, is this unique enough such that you won't get duplicate paragraph IDs in any given article?
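For anyone who wants to poke at this, here is a rough sketch of the scheme as I read it. To be clear, this is a guess at the behavior, not the NYT's actual code: the function name `paragraph_id`, the naive sentence splitting on `.`, `!` and `?`, and the whitespace word splitting are all my own assumptions.

```python
import re


def paragraph_id(paragraph: str) -> str:
    """Guess at the ID scheme: initials of the first 3 words of the first
    sentence, followed by initials of the first 3 words of the last sentence.
    Hypothetical helper, not taken from the NYT's code."""
    # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]
    if not sentences:
        return ""

    def initials(sentence: str) -> str:
        # Case is preserved: the first character of each word is used as-is.
        return "".join(word[0] for word in sentence.split()[:3])

    # For a one-sentence paragraph, first and last are the same sentence.
    return initials(sentences[0]) + initials(sentences[-1])


print(paragraph_id("It looks fine. But is it unique enough?"))
```

With only six case-sensitive letters drawn from sentence openings, two paragraphs that start and end on the same three initials will get the same ID under this reading.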



It only has to be unique within the article, since it's added to the article path, and there would likely be some kind of provision to add or swap out for a unique character in case of conflict. It's also case-preserving, so that implies likely case-sensitivity as well. I guess we'll have to find an instance of two - probably single-sentence - paragraphs with the same characters and same capitalization in the same story to be certain.

Not it!
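If there is a conflict provision, one way it could work is sketched below. This is pure speculation: the `disambiguate` name and the `-2`, `-3` suffix format are made up for illustration, and nothing here is confirmed by the NYT. It only assumes the IDs are compared case-sensitively within a single article.

```python
def disambiguate(ids):
    """Make a list of paragraph IDs unique within one article.

    Hypothetical sketch: a repeated ID gets a numeric suffix ("-2", "-3", ...).
    Comparison is case-sensitive, so "TwtTwt" and "twttwt" do not conflict.
    """
    used = set()
    result = []
    for pid in ids:
        candidate, n = pid, 1
        # Keep bumping the suffix until the candidate is unused.
        while candidate in used:
            n += 1
            candidate = f"{pid}-{n}"
        used.add(candidate)
        result.append(candidate)
    return result


print(disambiguate(["TwtTwt", "IlfBii", "TwtTwt", "twttwt", "TwtTwt"]))
```

The catch with any scheme like this is that the suffix depends on paragraph order, so a link could point at the wrong paragraph if the article is edited later.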


Especially because it works in exactly the way you specify even when there’s only one sentence in a paragraph. So the paragraph:

That was too much for the water district’s attorney.

And:

They were torn apart by angry ducks.

Will both hash to “TwtTwt”. One-sentence paragraphs are probably deprecated in the NYT’s style guide anyway, but I imagine it might still come up.


One-sentence paragraphs still happen, but it still works :)



