You forgot the last point, which is that creators who don't want their work used without permission as training data for these megacorp LLMs will get what they want.
Try training an "open" model on Nintendo's, Disney's, and Elsevier's IP and see how long it takes them to bury you in lawsuits citing copyright infringement. The only way out of this would be to abolish copyright.