Hacker News | malwrar's comments

I do find it hard to tolerate the feeling of being watched online. The second-most trending dataset on Hugging Face right now is a snapshot of HN that updates at a 5-minute interval. It makes me not want to comment at all, just like how I don’t really publish any software I write anymore.

Turns out it sucks to produce original works when you know that, whereas previously a few people at best might see your work, now it’s a bunch of omniscient robots and maybe half of those original people are using the robots instead.


This is really interesting to me, because it never occurred to me to feel this way. Why would I care whether my comments are ending up in some dataset somewhere that's being used to train some model? My comments are boring and mostly uninformed. Have at it.

I'm curious: would you say the feeling of being watched online is making you afraid of some repercussion, or is it something else?


Dog in the Manger.

I get a feeling from overall anti-AI sentiment online that a lot of people feel they're entitled to 100% of value created by anything even tangentially related to their person, whether that's some intentional contribution or a random brain fart that happened in the vicinity of someone else doing something useful - and then become resentful they're not "getting their share".

There's hardly any other way to read all the proclamations of quitting to do anything because "cognitive dark forest" (itself a butchering of the original idea of "dark forest" across so many orthogonal dimensions in parallel, that it starts to look like a latent space of a transformer model).


Conversely, some people feel entitled to 100% of the value created by others. Oh, you wrote a book? Too bad, it's a part of my training data set now.

Downloading public stuff off the internet with no regard for the creator's wishes or license is bad enough, but we have many people here who defended AI companies seeding models with pirated content.

The internet is a social contract. AI is not the first thing to try and erode it for profit, but it's by far the most aggressive one.


Putting a book into a training data set does not take 100% of the value created by the author. You could make a convincing argument that since the LLM was never going to purchase the book, and the number of people who would have purchased the book but now won't because it's included in the training data is effectively zero, that no value was lost at all.

Licenses are legal documents and are usually treated as such, but "the creator's wishes" are irrelevant without case law, legislation, or licensing to back it up. And jurisdiction - show me a license that doesn't stand up in court in my home jurisdiction and I'll show you a license I won't care if I break or not.


Let's not forget the basis here: To promote the progress of science and the useful arts.

Everything else is window dressing. The fact that licenses even exist to conditionalize use goes against this grain and creates far too much overreach that spoils the spirit of the basis of copyright law.


I don't like the idea that I'm restrained by intellectual property laws, but that other powerful entities are not. That is fundamentally unfair.

> I get a feeling from overall anti-AI sentiment online that a lot of people feel they're entitled to 100% of value created by anything even tangentially related to their person

Rather, I don't like that the terms I released my work under aren't being respected. I believe LLMs are derivative works of the pieces they are trained on. I spent more than ten years working on open source code, and now the models that were trained on my GPL'd code are being used to make proprietary code against the terms of the license. I find this reprehensible.

While it wasn't an explicit term of release, generally I did not expect anyone to get any kind of financial value from the blog posts I wrote. I just wrote them for fun & maybe others would find them interesting. Now, LLMs have been trained on my blog posts and are generating financial value for some of the worst human beings on the planet who are using their money to murder, demean, and maim other humans.

I now know that blog posts I wrote for fun are putting money in some sociopath's bank account, and the GPL'd code I wrote is being used to create software to exploit me & other users. If I continue to create things publicly, it will be used against me and other people, and there's nothing I can do to stop it except to stop creating things. It's all very disrespectful & demoralizing.


> I believe LLMs are derivative works of the pieces they are trained on

That's your opinion with 0 legal backing. IMO, calling them derivative is untenable logically for anyone with some understanding of LLM/transformer architecture.


You desire a sharing community, but the takers/defectors are destroying that community.

Copyleft attempts to create a pool of code that forces sharing. But it broadly fails because you simply can't force antisocial people to be good sharers (plus source code usually isn't as valuable as we hope).

With any gifting/sharing, you have to accept that some of it will be abused. It is hard to filter for only community minded people who don't greedily abuse, and ideally who give freely.

I don't believe my circle of friends are becoming more selfish. I'm unsure what I would say about the rest of the world.


I am in exactly the same boat, down to the ~10 years. Only difference is I ended up picking AGPL for my later works. Like it made a difference...

The whole situation disgusts me.

- They expect me to pay for access to my own stolen code.

- They argue stealing should be legal because China does it, and that if US companies don't steal, they'll be left behind.

- People like the poster you're replying to who argue you're not entitled to 100% of the value you create - completely ignoring that the value will go to someone, and that someone is already much richer than any of us and getting richer faster while providing less value, if any. Honestly, this makes me wanna track these people down just to find out if they're also in the owner class and secretly laughing at us while pretending "we're all equal", or if they're workers who genuinely don't understand how much they're being exploited and how much worse it's gonna get.

- People don't give a fuck. Colleagues happily using "AI" because it "saves time", not realizing if this continues, we'll all be without jobs and the only way this was possible was by stealing from each other and most of us being OK with it.

Honestly, I am hoping for a revolution. A proper one, with guns if need be, but most importantly, where people get what they deserve in full.

Last time this happened was during the second industrial revolution; so many people got fucked so hard that entire countries turned to communism. That was a bad idea, but we can do better. It's not (just) about who owns the means of production but who owns the product. Even if "AI" turns into actual AI, as long as it's built on top of our work, we should own it - that means both controlling it and getting paid proportionally to our contribution.

The currently rich people can negotiate what fraction they get paid if they show us they're providing value. Of course, only after we get back what they stole and unless they end up executed. The value of a human life is apparently $7.5M so anybody who steals more than that should logically get a death sentence.

But none of this will happen, people are too stupid and will get manipulated by a charismatic liar like every single time before.


How did they steal your code? Don't you have backups?

What do you think is the end state? What will society look like 5 or 15 years down the line if somebody creates actual AGI, according to you?


So who should the value go to?

Whoever can materialize it. That's how societies grow and thrive, how a civilization is built - people building things, and instead of capturing 100% of the value, creating a surplus for others to build on top.

It's not like any of us ever did anything completely new, isolated and unaffected by influences and contributions of those around us, and those who came before us. Trying to capture 100% of the value and getting up in arms about "freeloaders" is a deeply antisocial form of greed, and usually the thing people accuse companies of doing, claiming it's a hallmark of "late stage capitalism".


So you're saying that the most advantaged people (who control the most money to use for advertising and who can buy companies and their network effects at will) should get the most benefit?

> how a civilization is built

No, civilization is built by people who do actual work. Some of that work is services/research and building/maintaining stuff, some of that work is connecting supply and demand. The reward should go to the people doing the work according to how much work they do and their skill level.

> late stage capitalism

Nah, that's the idea that money should be able to create more money without any input of work. And before you say they made that money through work, no they didn't, they either inherited it or got into a position of power from which they can take a disproportionate cut.


There’s definitely a fear of repercussions (I’ve been commenting on this site for over a decade now! Who knows what’s in my history...), but importantly I actually take some pride in many of the comments I write. What drew me to this site originally was how high-quality everyone’s perspectives and articulation were, and I suppose I view the writing voice I’ve nurtured here as unique and special to me. It’s not about compensation; I’d just hate to see some future chatbot sound 1/1,000,000th like me, I guess? It's a hard feeling to describe, but I’d rather just not be lumped in, and instead express myself in ways that aren’t profitable or feasible to copy.

I see this take a lot and I think it's harmful in ways you might not realize.

Even if it's true and you genuinely have nothing to hide, have nothing to lose from being profiled, there are people who absolutely do.

Look at the radicalization happening in countries around the world, including the USA. It might be OK to be part of a minority or to have an uncommon opinion. A few years pass and suddenly the same person is considered an undesirable, a foreign agent, a terrorist or a deviant.

I've posted a lot of shit online which can be connected to my person and which could label me as any of the groups above. But that's a decision I make for myself. I would never dare make it for others or claim that they should not care about surveillance and take the same risks I do.

I know a guy from Russia who lost his job because he expressed an antiwar opinion. The same thing can happen in the US or any other country you consider civilized. The US proto-dictator is already sending death threats to people who only expressed the opinion that soldiers can refuse illegal orders. Neither you nor I can know what will happen next.


HN has always offered this data to anyone, so what changes now? How does it matter that it's LLMs consuming your data? What a strange attitude.

HN comments have always been public, I don't really understand this thought process. The robots also aren't going to care about some individual user, it'd be more of an agglomeration of everyone's comments.

I've heard it phrased as: would you rather have an LLM trained on your comments, or one trained on Facebook comments?

Obviously that's a false dichotomy and a pretty defeatist attitude. But it does have a point.


I think the immediate term action is to viciously block all crawlers.

Writing a blog yes, feeding the beast no.
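For what "viciously block all crawlers" might look like in practice, here's a minimal sketch: a WSGI middleware that refuses requests whose User-Agent matches a blocklist. The bot substrings below are real AI-crawler User-Agent tokens, but the middleware itself is illustrative, not a recommendation (and well-behaved crawlers can also be refused via robots.txt):

```python
# Illustrative sketch: block known AI-crawler User-Agents at the app layer.
# The substrings below match real crawler User-Agent strings; the
# middleware design itself is a hypothetical example.
BLOCKED_BOTS = ("GPTBot", "CCBot", "ClaudeBot", "Google-Extended", "Bytespider")

def block_ai_crawlers(app):
    """Wrap a WSGI app so that blocklisted crawlers get a 403."""
    def middleware(environ, start_response):
        ua = environ.get("HTTP_USER_AGENT", "")
        if any(bot in ua for bot in BLOCKED_BOTS):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Crawling not permitted.\n"]
        return app(environ, start_response)
    return middleware
```

Note this only stops crawlers that identify themselves honestly; anything spoofing a browser User-Agent sails through, which is part of why people reach for harsher measures.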


This sounds like a nice principled stance, but you won't get any traffic with this approach. That's demotivating - to me blogging is a tight balance of exploration, learning, improving and feedback. I'm not able to write without considering how this impacts the reader - removing all readers breaks the process for me.

Yeah, everyone went all-in on "blocking all crawlers", the end result being half the internet inaccessible over VPNs. Good job, people.

> And for people who have successfully taken back their creative writing skills, how did you do it?

“AI is one possible reference for my actual writing.” Generate info and perspectives, but only ever write stuff yourself. Something about this forces me to stay in my own “writing voice”, at least personally, across the various places I use AI tech. I think of the tech like a chess engine: engines are better than any human player, but I use them to gain perspective rather than to cheat. Otherwise, why bother playing chess?


People have been modifying their cars since cars have existed, an electric car shouldn’t be anything new.

Given that electric cars carry much bigger responsibilities than combustion cars (avoid driving into that bicyclist), there are new concerns here that demand extra consideration.

I actually think we should be asking more of safety regulations here with regards to the design of electric/computerized cars.

Think of it this way: every concern you have about a teenager having root on their electric car applies equally to any sociopath hacker (AI-enabled, for modern nightmare fuel) who finds a root vulnerability and decides not to be a good person with it. If a teenager can mess with the collision avoidance, then e.g. Israel can modify it to murder anyone who talks shit about Israel in the car. Or the CIA could turn it into a weapon. Or one day some dev could push a bad OTA update. Et cetera. Our safety regulations should mandate design features to prevent a malfunctioning computer from posing any greater safety risk than any other modified part in the car.


Up until very recently, cars were not remotely accessible or part of a command-and-control network, which Teslas are (perhaps other modern cars are too; I only know Tesla because I have one).

I know that the car reports practically all user events to Tesla in real time over the cell network (e.g., opening a door), and I know it has root access. I don't know if that root is available remotely, and I don't know if foundational commands like steering, acceleration, and braking are accessible via the CLI (locally, they are computer-controlled actions).

Thus, I would not want to drive a Tesla if there were the possibility of all cars being rooted and remotely controlled by an unauthorized actor.


Not intentionally, but some cars have been vulnerable to remote control/hijacking since at least 2015.

https://www.wired.com/2015/07/hackers-remotely-kill-jeep-hig...


People have been killing each other with weapons for as long as they've been around, nuclear weapons shouldn't be anything new.

No one should have nuclear weapons; we ought to have robust policy, institutions, and vigilance to prevent their proliferation and use.

Computerized vehicles ought to be strictly regulated in terms of how computers may affect the physical operation of the car, such that a reasonable standard of safety can be ensured beyond the usual risk one takes when hopping in a motor vehicle. The fact that a hacker can possibly kill people by rooting an infotainment system is a symptom of the general disregard for security in design, and we continue to ignore it for engineering expediency.


You can’t really avoid paying for security, which seems to historically be why it is ignored and risked. I’ve always felt the right approach is to form an internal security & reliability org that provides an owner and maintainer for core services and libraries, so that things are built robustly from the get-go. Think premade formulations and integrations for auth, hosting, data storage, etc. Some companies have small security teams that _kind of_ fill this role, but usually they’re a gate you must pass rather than an ally helping you navigate hard problems by providing and maintaining prebuilt solutions. I’d rather normal devs not need to solve these problems at all, and instead be provided an appropriate sandbox to deploy software in.

I’d be curious to know where you source your data from! Your project (neat idea btw) has me thinking about tracking this data for my own personal profile over time in some sort of dashboard, to see how Google’s opinion of me changes with my behavior online

The data comes straight from Google's Ad Center (myadcenter.google.com). Google shows you the interest categories and brands they've assigned to your profile. I automated scraping that page daily for each account during the experiment.

MirrorMask actually does exactly what you're describing. It scrapes your Ad Center profile before and after each session and shows you the diff. You can watch interests appear and disappear over time. The dashboard tracks your profile changes across sessions.
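The before/after diff described here can be sketched as a simple set comparison. This is an illustration, not MirrorMask's actual internals, and it assumes each scraped profile snapshot is just a list of interest-category strings:

```python
# Illustrative sketch: diff two scraped ad-profile snapshots.
# Assumes each snapshot is a plain list of interest-category strings
# (the real scraper's data format is not shown in the thread).
def diff_profiles(before, after):
    """Return which interest categories appeared and disappeared."""
    before_set, after_set = set(before), set(after)
    return {
        "added": sorted(after_set - before_set),
        "removed": sorted(before_set - after_set),
    }
```

For example, `diff_profiles(["Cooking", "Hiking"], ["Hiking", "Mortgages"])` reports `"Mortgages"` as added and `"Cooking"` as removed; running this after each session gives the over-time view described above.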


Thanks for the info! Time to start watching the watchers

> Isn't it a stretch to round off "trans content" to "LGBT+ content"?

Not really. Do you think the people attempting to ban trans content are otherwise fine with kids being gay/lesbian/etc? Do you think they view gay/lesbian identities as legitimate, rather than unnatural perversion? It’s the same rhetoric in my experience, we’re all just deviants making choices. It seems like casual uninvested people just got used to gays being in the public eye and anti-gay people lost the ability to get anyone to care about that position. Turns out they’re just normal people trying to live their lives.

> Immigrants being lynched is certainly a subset of "anti-immigration", but it's still misleading

I don’t think your analogy works unless you believe that transgender people are uniquely extreme compared to other identities. If true, I think that shows your prejudice more than anything. Maybe if enough trans people end up in the public eye, casual uninvested people will stop thinking negatively about trans people generally too. Maybe one day they’ll realize we’re just people trying to live our lives.


I really didn’t want this to succeed - could you imagine an alternative future where people are strapping these things to their faces and immersing their full FOV in a zuck-controlled virtual shopping mall? The Facebook brand is absolutely toxic imo; I think it’s an incredibly understated reason for this product’s failure. I’d love to develop for these devices, though, if I could somehow avoid interacting with Meta beyond as an OEM.


You are right, IMO, to question why North Dakota police were able to detain this Tennessean woman in the first place; you’d think something like that should require far stronger evidence than a facial recognition match.

But then, what good is facial recognition for? Would it have been okay for this woman’s life to have been merely invaded because she matched a facial recognition system? Maybe they can just secretly watch you so you’re not consciously aware of being investigated? Should that be our new standard - if a computer thinks you look like a suspect, you can be harassed by police in a state you’ve never even been in?

I just don’t see a legitimate way for AI to empower officers here without risking these new harms. That’s why I lean towards blaming the AI tech, rather than historically intractable problems like the reality of law enforcement.


Having a facial recognition match make you a suspect and cause the police to ask you some questions doesn't seem completely unreasonable to me. Investigations can certainly begin with weak forms of evidence (like an anonymous tip), you just require a higher standard of evidence for a search warrant, surveillance, or an arrest. A facial recognition match shouldn't be probable cause for an arrest warrant, but it still might be a useful starting point for a detective looking for actual evidence.


It is absolutely not reasonable to use low-quality photos to decide someone halfway across the country with no history of even leaving their local area is 'a suspect'.


You wouldn't know they had no history of leaving their local area unless you interviewed them.


Why doesn't the investigator have to supply some sort of evidence that she has a history of leaving her local area, rather than putting the onus on the accused? This line of argument is halfway to "guilty until proven otherwise".


You and the GP that replied to me are way overstating what it means to be a "suspect". It just means the police are investigating you and consider it a possibility that you've committed the crime. On its own, it is not a sufficient status to search your home, subpoena your ISP, or arrest you - all of those things require a much higher burden of evidence, and often a third party's (a judge's) approval. People routinely become "suspects" on much flimsier evidence than an unreliable software match - if I call in an anonymous tip that I saw you acting suspicious near the crime scene, you will probably become a suspect.

If you'd like, you can replace the term "suspect" in my post with "person of interest", which colloquially implies a lot less suspicion but isn't practically any different in terms of how the police interact with you.


+1 to this, anecdotally I’ve found in my own evaluations that if your system prompt doesn’t explicitly declare how to invoke a tool and e.g. describe what each tool does, most models I’ve tried fail to call tools or will try to call them but not necessarily use the right format. With the right prompt meanwhile, even weak models shoot up in eval accuracy.


> [...] _but not necessarily use the right format._

This has also been my experience. But isn't the harness sending the instructions on how to invoke a tool? Maybe it is missing the formatting part. What do you think?
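To make the "declare the invocation format explicitly" point concrete, here's a minimal sketch of what a harness might do: list each tool and its required arguments in the system prompt, then validate the model's reply against that declaration. The tool names and the `{"tool": ..., "args": ...}` shape are hypothetical examples, not any particular harness's actual protocol:

```python
# Hypothetical sketch: declare tools in the system prompt and validate
# the model's tool-call JSON against the declaration. Tool names and
# the JSON shape are illustrative assumptions, not a real harness's API.
import json

TOOLS = {
    "get_weather": {"required": ["city"]},
    "search_docs": {"required": ["query"]},
}

def system_prompt():
    """Build an explicit tool-invocation section for the system prompt."""
    lines = [
        "You may call a tool by replying with JSON of the form",
        '{"tool": <name>, "args": {...}}. Available tools:',
    ]
    for name, spec in TOOLS.items():
        lines.append(f"- {name}: required args {spec['required']}")
    return "\n".join(lines)

def validate_tool_call(raw):
    """Return (tool, args) if the reply is a well-formed call, else None."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return None
    spec = TOOLS.get(call.get("tool"))
    if spec is None:
        return None
    args = call.get("args", {})
    if any(key not in args for key in spec["required"]):
        return None
    return call["tool"], args
```

A validator like this is also where the "right format but wrong usage" failures show up: the model emits JSON, but with an unknown tool name or missing required arguments, and the harness has to reject or repair the call.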


It claims that I can’t end my subscription because I signed up on another platform. How odd, once money is involved suddenly our AGI contender can’t implement basic features. Or I’m a fool somehow.


If you signed up via e.g. iOS then OpenAI literally is not allowed to manage your subscription. They do not have the capability to do so.


Is that other platform Apple?

