I was playing around with Lemmy statistics the other day, and I decided to look at the number of comments per post, which is essentially a measure of engagement – the higher the number, the more engaging the post. In other words, how many people were pissed off enough to comment, or had something they felt like sharing. The average across every Lemmy instance was 8.208262964 comments per post.

So I modeled that with a Poisson distribution, in stats terms X~Po(8.20826), then found the critical regions, assuming that anything with a less than 5% chance of happening is important. In other words, 5% is the significance level. The critical regions are the regions at either end of the distribution where the probability of ending up in that region is less than 5%. The lower critical region is fewer than 4 comments and the upper critical region is more than 13 comments, which means that if you get fewer than 4 comments or more than 13 comments, that’s a meaningful value. So I chose to interpret those results as meaning that if you get fewer than 4 comments then your post is “a bad post”, and if you get more than 13 then your post is “a good post”. A good post here is literally just one that “got a lot more comments than expected of a typical post”, and vice versa for “a bad post”.
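
For anyone who wants to check the critical regions, here is a minimal sketch in Python. It assumes, as in the working above, that each tail gets the full 5% rather than splitting it between the tails. The function names are made up for illustration and it only uses the standard library.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Po(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Po(lam); returns 0 for negative k."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))


def critical_values(lam: float, alpha_per_tail: float = 0.05) -> tuple[int, int]:
    """Return (lower, upper): the lower tail is X <= lower, the upper tail is X >= upper.

    Each tail is the largest region whose probability stays below alpha_per_tail.
    lower is -1 when even X = 0 is too likely to sit in the tail.
    """
    lower = -1
    while poisson_cdf(lower + 1, lam) < alpha_per_tail:
        lower += 1
    upper = 0
    # P(X >= upper) = 1 - P(X <= upper - 1)
    while 1.0 - poisson_cdf(upper - 1, lam) >= alpha_per_tail:
        upper += 1
    return lower, upper


lower, upper = critical_values(8.208262964)
print(f"lower tail: {lower} or fewer comments; upper tail: {upper} or more comments")
```

With the fediverse-wide mean this gives 3 or fewer and 14 or more, i.e. less than 4 comments and more than 13 comments.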

You will notice that this is quite rudimentary. For example, what about when the Americans are asleep? Most posts do worse then. That’s not accounted for here, because it increases the complexity beyond what I can really handle in a post.

To give you an idea of a more sweeping internet trend, consider the adage 1% 9% 90%, where 1% do the posting, 9% do the commenting, and 90% are lurkers. Assuming each person does an average of 1 thing a day, it suggests that comments per post (c/p) should be about 9 for all sites regardless of size.

Now what is more interesting is that comments per post varies by instance. For example, lemmy.world has an engagement of 9.5 c/p and lemmy.ml has 4.8 c/p. This means that a “good post” on .ml is a post that gets 9 comments, whilst a “good post” on .world has to get 15 comments. On hexbear.net, you need 20 comments to be a “good post”. I got the numbers for instance level comments and posts from here.
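
The per-instance thresholds can be reproduced with a rough sketch like the one below. It assumes the same Poisson model with 5% in the upper tail and uses only the two rates quoted above; `upper_tail_start` is a hypothetical helper name, and hexbear.net is left out because its c/p isn't given here.

```python
import math


def upper_tail_start(lam: float, alpha: float = 0.05) -> int:
    """Smallest c such that P(X >= c) < alpha for X ~ Po(lam)."""
    pmf = math.exp(-lam)  # P(X = 0)
    cdf_below = 0.0       # P(X <= c - 1), which is 0 when c = 0
    c = 0
    while 1.0 - cdf_below >= alpha:
        cdf_below += pmf  # now P(X <= c)
        c += 1
        pmf *= lam / c    # now P(X = c) for the new c
    return c


# Comments-per-post rates taken from the post; anything else would be a guess.
rates = {"lemmy.world": 9.5, "lemmy.ml": 4.8}
for instance, rate in rates.items():
    print(f"{instance}: upper tail starts at {upper_tail_start(rate)} comments")
```

Under these assumptions the upper tail starts at 10 comments for lemmy.ml and 16 for lemmy.world, so the 9 and 15 quoted above read as “more than 9” and “more than 15”.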

This is a little bit silly, since a “good post”, by this metric, is really just a post that baits lots and lots of engagement, specifically in the form of comments – so if you are reading this you should comment, otherwise you are an awful person. No matter how meaningless the comment.

Anyway I thought that was cool.

EDIT: I’ve cleared up a lot of the wording and tried to make it clearer as to what I am actually doing.

  • JubilantJaguar@lemmy.world
    6 days ago

    > that could be because it is an AMAZING post – it covered all the points and no one has anything left to say

    Finally, I know why.

    • Otter@lemmy.ca
      6 days ago

      This does happen with comments sometimes. I go into a post and someone has already eloquently said what I would have said (often better than I would have). So I upvote it and move along

  • ArtificialHoldings@lemmy.world
    5 days ago

    Goodhart’s Law: “When a measure becomes a target, it ceases to be a good measure.”

    Not entirely sure how this applies to the discussion, it just came to mind lol

  • ERROR: Earth.exe has crashed@lemmy.dbzer0.com
    6 days ago

    Average Fediverse Experience:

    Post comment

    Waits 24 hours

    zero replies

    zero votes

    not even a downvote

    check post viewed from other instances

    can’t find the comment

    realizes that the comment never federated

    now too much time has passed since the original time of the post, and the joke you commented is no longer funny anymore

    😭

  • foggy@lemmy.world
    6 days ago

    No, you did your math wrong

    Also, something about politics.

    (Just kidding. This is neat 😎)

    • Remember_the_tooth@lemmy.world
      6 days ago

      Thanks. That was the toxicity I was expecting. Even if it’s not sincere, I appreciate it. I’ve been kinda withdrawing after switching to Lemmy, and I really needed a dose of Reddit hostility.

  • TropicalDingdong@lemmy.world
    6 days ago

    > So I modeled that with a Poisson distribution, and I learnt that to a 5% significance level, if your post got less than 4 comments, that was statistically significant. Or in other words – there is a 95% probability that something else caused it not to get more comments. Now that could be because it is an AMAZING post – it covered all the points and no one has anything left to say. Or it’s because it’s a crappy post and you should be ashamed in yourself. Similarly a “good post”, one that gets lots of comments, would be any post that gets more than 13 comments. Anything in-between 4 and 13 is just an average post.

    So, like, I do have a background in stats and network analysis, and I’m not sure what you are trying to say here.

    > if your post got less than 4 comments, that was statistically significant.

    Statistically significant what? What hypothesis are you testing? Like, how are you setting this question up? What is your null?

    Because I don’t believe your interpretation of that conclusion. It sounds like mostly you calculated the parameters of a Poisson and then are interpreting them? Because to be clear, that’s not the same as doing hypothesis testing and isn’t interpretable in that manner. It’s still fine, and interesting, and especially useful when you are doing network analysis, but on its own, it’s not interpretable in this manner. It needs context and we need to understand what test you are running, and how you are setting that test up.

    I’m asking these questions not to dissuade you, but to give you the opportunity to bring rigor to your work.

    Should you like to further your work, I have set up this notebook you can maybe use parts of to continue your investigations or do different investigations.

    • Agosagror@lemmy.dbzer0.com (OP)
      6 days ago

      Oh yeah ok, so I was going to put “H0 : L = 8.2”, and “H1 : L != 8.2, X~Po(8.2), P(c<=X<=c2) => c=?, c2=?” but I left it out because I couldn’t format it in a way that looked half decent in a Lemmy post.

      I found the critical regions of the Poisson distribution that takes its mean to be the average comments/post for the fediverse. I then interpreted those numbers, which is where I assume I’ve made a mistake. If a value fell in the critical region, that would mean H1, but we know H1 is wrong, since we already have a value for L. It sounds like your interpretation of what I did is bang on. Yeah I get that it isn’t a hypothesis test, but at the level of my stats exams - finding the critical regions was 99% of the work in a hypothesis test.

      I only took college level statistics, like I said in another reply. I just thought it was cool to see all the instances’ comments/post ratios. It doesn’t help that my stats teacher was the most boring man alive, and I always much preferred the pure side of the maths course.

      • TropicalDingdong@lemmy.world
        6 days ago

        So let’s just cover a few things…

        Hypothesis testing:

        The phrase “if your post got less than 4 comments, that was statistically significant” can be misleading if we don’t clearly define what is being tested. When you perform a hypothesis test, you need to start by stating:

        Null hypothesis (H₀): For example, “the average number of comments per post is λ = 8.2.”
        
        Alternative hypothesis (H₁): For example, “the average number of comments per post is different from 8.2” (or you could have a directional alternative if you have prior reasoning).
        

        Without a clearly defined H₀ and H₁, the statement about significance becomes ambiguous. The p-value tells you how unusual an observation is under the assumption that the null hypothesis is true, and the significance level is the threshold you compare it against. It doesn’t automatically imply that an external factor caused that observation. Plugging in numbers doesn’t resolve the interpretability issue.

        “Statistical significance”

        The interpretation that “there is a 95% probability that something else caused it not to get more comments” is a common misinterpretation of statistical significance. What the 5% significance level really means is that, if the null hypothesis is true, there is at most a 5% chance of observing an outcome extreme enough to fall in the critical region. It is not a direct statement about the probability of an alternative cause. Saying “something else caused” can be confusing. It’s better to say, “if the observed comment count falls in the critical region, the observation would be very unlikely under the null hypothesis.”

        Critical regions

        Using critical regions based on the Poisson distribution can be useful to flag unusual observations. However, you need to be careful that the interpretation of those regions aligns with the hypothesis test framework. For instance, simply saying that fewer than 4 comments falls in the “critical region” implies that you reject the null when observing such counts, but it doesn’t explain what alternative hypothesis you’re leaning toward—high engagement versus low engagement isn’t inherently “good” or “bad” without further context. There are many, many reasons why a post might end up with a low count. Use the script I sent you previously and look at what happens after 5PM on a Friday in this place. A magnificent post at a wrong time versus a well timed adequate post? What is engagement actually telling us?

        Model Parameters and Hypothesis Testing

        It appears that you may have been focusing more on calculating the Poisson probabilities (i.e., the parameters of the Poisson distribution) rather than setting up and executing a complete hypothesis test. While the calculations help you understand the distribution, hypothesis testing requires you to formally test whether the data observed is consistent with the null hypothesis. Calculating “less than 4 comments” as a cutoff is a good start, but you might add a step that actually calculates the p-value for an observed comment count. This would give you a clearer measure of how “unusual” your observation is under your model.
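
        A minimal sketch of that p-value step, assuming the same Po(8.2) model and reporting the two one-sided tail probabilities separately. `poisson_tail_p_values` is a hypothetical helper name and only the standard library is used.

        ```python
        import math


        def poisson_tail_p_values(observed: int, lam: float) -> tuple[float, float]:
            """Return (P(X <= observed), P(X >= observed)) for X ~ Po(lam)."""
            pmf = math.exp(-lam)  # P(X = 0)
            lower = 0.0
            for k in range(observed + 1):
                lower += pmf           # add P(X = k)
                pmf *= lam / (k + 1)   # step to P(X = k + 1)
            at_observed = math.exp(-lam) * lam**observed / math.factorial(observed)
            # P(X >= observed) = 1 - P(X <= observed) + P(X = observed)
            upper = 1.0 - lower + at_observed
            return lower, upper


        low_p, high_p = poisson_tail_p_values(observed=2, lam=8.2)
        print(f"P(X <= 2) = {low_p:.4f}, P(X >= 2) = {high_p:.4f}")
        ```

        Comparing the relevant tail probability with the chosen significance level gives the same decision as checking the critical region, plus a sense of how far into the tail the count sits.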

        • Agosagror@lemmy.dbzer0.com (OP)
          6 days ago

          Look, I survived statistics class. I will strive to defend some of my post.

          > but it doesn’t explain what alternative hypothesis you’re leaning toward—high engagement versus low engagement isn’t inherently “good” or “bad” without further context.

          Namely that much of the aim of it was to show that a metric like comment count doesn’t imply that it was a good or bad post - hence the bizarre engagement bait at the end. And also why all of the “good posts” were in quotes.

          > you might add a step that actually calculates the p-value for an observed comment count. This would give you a clearer measure of how “unusual” your observation is under your model.

          I’m under the impression that whilst you can do a Hypothesis test by calculating the probability of the test statistic occurring, you can also do it by showing that the result is in the critical regions. Which can be useful if you want to know if a result is meaningful based on what the number is, rather than having to calculate probabilities. For a post of this nature, it makes no sense to find a p value for a specific post, since I want numbers of comments that anyone for any post can compare against. Calculating a p-value for an observed comment count makes no sense to me here, since it’s meaningless to basically everyone on this platform.

          > Using critical regions based on the Poisson distribution can be useful to flag unusual observations. However, you need to be careful that the interpretation of those regions aligns with the hypothesis test framework. For instance, simply saying that fewer than 4 comments falls in the “critical region” implies that you reject the null when observing such counts

          Truthfully I wasn’t doing a hypothesis test, and I don’t say I am in the post, although your original reply confused me, so I thought I was. I was finding critical regions and interpreting them. However, I’m also under the impression that you can do 2 tailed tests, although I did make a mistake by not splitting the significance level in half for each tail. :( I should have been clearer that I wasn’t doing a hypothesis test, rather calculating critical regions.

          It doesn’t seem like you are saying I’m wrong, rather that my model sucks - which is true. And that my workings are weird - it’s a Lemmy post not a science paper. That said, I didn’t quite expect this post to do so well, so I’ve edited the middle section to be clearer as to what I was trying to do.

          • TropicalDingdong@lemmy.world
            6 days ago

            Well I appreciate the effort regardless. If you want any support in getting towards a more “proper” network analysis, I’ve dm’d you a link you can use to get started. If nothing else it might allow you to expand your scope or take your investigations into different directions. The script gets more into sentiment analysis for individual users, but since Lemmy lacks a basic API, the components could be retooled for anything.

            Also, you might consider that all a scientific paper is, at the end of the day, is a series of things like what you’ve started here, with perhaps a little more narrative glue, and the repetitive critique of a scientific inquiry. All scientific investigations start with exactly the kind of work you are presenting here. Then your PI comes in and says “No you’ve done this wrong and that wrong and can’t say this or that. But this bit or that bit is interesting”, and you revise and repeat.

  • iAvicenna@lemmy.world
    6 days ago

    I think one needs to include parameters like how soon after the topic was created the comment was made and how deep it is in the comment tree. If you, for instance, consistently comment on 1 month old topics or reply to comments ten levels deep, you will get very few interactions.

    • Agosagror@lemmy.dbzer0.com (OP)
      6 days ago

      Well exactly, I’ve said this elsewhere in this thread, this was mostly something that I thought was cool. That said I might try and figure out how to include that data, if I can find it.

  • empireOfLove2@lemmy.dbzer0.com
    6 days ago

    The other possible reason you got no comments on your post is that you are banned from the remote instance/community, or federation is broken (still happens intermittently).

    Lemmy will still allow you to post from your home instance since you are not banned there, but your content will simply get black-holed by the remote instance if you’re banned there. Sometimes you have to check the remote instance directly to see if your post was federated or not.

      • empireOfLove2@lemmy.dbzer0.com
        6 days ago

        You can just check the modlog of your local instance and search for your own username. Most of the time the ban action will federate (but again, sometimes not, never really sure why). If nothing shows up locally check the modlog of the remote instance you’re trying to post to.