Yup. There are dumps of Reddit’s entire archive of comments and posts available via torrent, I suspect the only reason Reddit’s getting paid for that stuff right now is that it’s a legal ass-covering that’s comparatively cheap. Anyone who’s a little daring could use it to train an LLM and if they prep the data well enough it’d be hard to even notice.
And some of those hosts can decide to serve up their content to AI trainers. Some of those hosts can be run by AI trainers, specifically to gather data for training. If one was to try to prevent that then one would be attacking the open nature of the fediverse.
There have been many people raging about their content being used to train AIs without permission or compensation. I’m speaking to those people, not the “fediverse collectively”. As you suggest, the fediverse can’t say anything collectively.