Wondering about services to test on either a 16gb ram “AI Capable” arm64 board or on a laptop with modern rtx. Only looking for open source options, but curious to hear what people say. Cheers!

  • SmokeyDope@lemmy.world
    link
    fedilink
    English
    arrow-up
    15
    arrow-down
    1
    ·
    edit-2
    25 days ago

    I run kobold.cpp which is a cutting edge local model engine, on my local gaming rig turned server. I like to play around with the latest models to see how they improve/change over time. The current chain of thought thinking models like deepseek r1 distills and qwen qwq are fun to poke at with advanced open ended STEM questions.

    STEM questions like “What does Gödel’s incompleteness theorem imply about scientific theories of everything?” Or “Could the speed of light be more accurately refered to as ‘the speed of causality’?”

    As for actual daily use, I prefer using mistral small 24b and treating it like a local search engine with the legitimacy of wikipedia. Its a starting point to ask questions about general things I don’t know about or want advice on, then do further research through more legitimate sources.

    Its important to not take the LLM too seriously as theres always a small statistical chance it hallucinates some bullshit but most of the time its fairly accurate and is a pretty good jumping off point for further research.

    Lets say I want an overview of how can I repair small holes forming in concrete, or general ideas on how to invest financially, how to change fluids in a car, how much fat and protein is in an egg, ect.

    If the LLM says a word or related concept I don’t recognize I grill it for clarifying info and follow it through the infinite branching garden of related information.

    I’ve used an LLM to help me go through old declassified documents and speculate on internal gov terminalogy I was unfamiliar with.

    I’ve used a speech to text model and get it to speek just for fun. Ive used multimodal model and get it to see/scan documents for info.

    Ive used websearch to get the model to retrieve information it didn’t know off a ddg search, again mostly for fun.

    Feel free to ask me anything, I’m glad to help get newbies started.

    • kiol@lemmy.worldOP
      link
      fedilink
      English
      arrow-up
      2
      ·
      25 days ago

      Well, let me know your suggestions if you wish. I took the plunge and am willing to test on your behalf, assuming I can.

  • ikidd@lemmy.world
    link
    fedilink
    English
    arrow-up
    3
    ·
    25 days ago

    LMStudio is pretty much the standard. I think it’s opensource except for the UI. Even if you don’t end up using it long-term, it’s great for getting used to a lot of the models.

    Otherwise there’s OpenWebUI that I would imagine would work as a docker compose, as I think there’s ARM images for OWU and ollama

    • L_Acacia@lemmy.ml
      link
      fedilink
      English
      arrow-up
      2
      ·
      21 days ago

      Well they are fully closed source except for the open source project they are a wrapper on. The open source part is llama.cpp

      • ikidd@lemmy.world
        link
        fedilink
        English
        arrow-up
        1
        ·
        21 days ago

        Fair enough, but it’s damn handy and simple to use. And I don’t know how to do speculative decoding with ollama, which massively speeds up the models for me.

        • L_Acacia@lemmy.ml
          link
          fedilink
          English
          arrow-up
          1
          ·
          21 days ago

          Their software is pretty nice. That’s what I’d recommand to someone who doesn’t want to tinker. It’s just a shame they don’t want to open source their software and we have to reinvent the wheel 10 times. If you are willing to tinker a bit koboldcpp + openewebui/librechat is a pretty nice combo.

          • ikidd@lemmy.world
            link
            fedilink
            English
            arrow-up
            1
            ·
            21 days ago

            That koboldcpp is pretty interesting. Looks like I can load a draft model for spec decode as well as a pile of other things.

            What local models have you been using for coding? I’ve been disappointed with things like deepseek-coder and the qwen-coder, it’s not even a patch on Claude, but that damn cost for anthropic has been killing me.

            • L_Acacia@lemmy.ml
              link
              fedilink
              English
              arrow-up
              1
              ·
              21 days ago

              As much as I’d like to praise the open-weight models. Nothing comes close to Claude sonnet in my experience too. I use local models when info are sensitive and claude when the problem requires being somewhat competent.

              What setup do you use for coding? I might have a tip for minimizing claude cost you depending on what your setup is.

              • ikidd@lemmy.world
                link
                fedilink
                English
                arrow-up
                2
                ·
                20 days ago

                I’m using vscode/Roocode with Gosucoder shortprompt, with Requesty providing models. Generally I’ll use R1 to outline a project and Claude to implement. The shortprompt seems to reduce the context quite a bit and hence the cost. I’ve heard about Cursor but haven’t tried it yet.

                When you’re using local models, which ones are you using? The ones I mention don’t seem to give me much I can use, but I’m also probably asking more of them because I see what Claude can do. It might also be a problem with how Roocode uses them, though when I just jump into a chat and ask it to spit out code, I don’t get much better.

                • L_Acacia@lemmy.ml
                  link
                  fedilink
                  English
                  arrow-up
                  2
                  ·
                  edit-2
                  20 days ago

                  If you are willing to pay 10$ a month. You should get GithubCopilot, it provides near unlimited claude 3.5 usage. RooCode can hook into the github copilot api, and use it for its generations.

                  I use Qwen Coder and Mistral small locally too. It works ok, but its nowhere near GPT/Claude in terms of response quality.

  • couch1potato@lemmy.dbzer0.com
    link
    fedilink
    English
    arrow-up
    2
    ·
    edit-2
    19 days ago

    I spun up ollama and paperless-gpt to add ai ocr sidecar to paperless-ngx. It’s okay. It can read handwritten stuff okayish, which is better than tesseract (doesnt read hand writing at all), so I throw handwritten stuff to it, but the difference on typed text is marginal in my single day I spent testing 3 different models on a few different typed receipts.

    • kiol@lemmy.worldOP
      link
      fedilink
      English
      arrow-up
      1
      ·
      18 days ago

      Which specific models did you try and how would you rank each is usability?

      • couch1potato@lemmy.dbzer0.com
        link
        fedilink
        English
        arrow-up
        2
        ·
        18 days ago

        I tried minicpm-v, granite3.2-vision, and mistral.

        Granite didn’t work with paperless-gpt at all. Mistral worked sometimes but also just kept running sometimes and didn’t finish within a reasonable time (15 minutes for 2 pages). minicpm-v finishes every time, but i just looked at some of the results and seems as though it’s not even worth keeping it running either. I suppose maybe the first one I tried that gave me a good impression was a fluke.

        To be fair, I’m a noob at local ai, and I also don’t have a good gpu (gtx1650). So these failures could all be self induced. I like the idea of ai powered ocr so I’ll probably try again in the future…

        • kiol@lemmy.worldOP
          link
          fedilink
          English
          arrow-up
          1
          ·
          16 days ago

          I find your experiments inspired. Thank you! I’m learning about this myself on an rtx and excited to discuss on my little podcast.james.network one of these days. Been using paperless minus the AI functionality so far. About to start testing different AI services on an arm64 device with 16gb ram that claims some level of AI support; will see how that goes. Let me know if there are any other specific services/models you’d recommend or are curious about.

          • couch1potato@lemmy.dbzer0.com
            link
            fedilink
            English
            arrow-up
            2
            ·
            16 days ago

            Sure, and let me know how it goes for you. I’m on a dell r720xd, about to upgrade my ram from 128 to 296 gb… don’t want to spend the money for a new gpu right now.

            I’ll report back after I try again.

  • Helmaar@lemmy.world
    link
    fedilink
    English
    arrow-up
    2
    arrow-down
    1
    ·
    25 days ago

    I was able to run a distilled version of DeepSeek on Linux. I ran it inside a PODMAN container with ROCM support (I have an AMD GPU). It wasn’t super fast but for a locally deployed and self hosted option the performance was okay. Apart from that I have deployed Fooocus for image generation in a similar manner. Currently, I am working on deploying Stable Diffusion with either ComfyUI or Automatic1111 inside a PODMAN container with ROCM support.

    • kiol@lemmy.worldOP
      link
      fedilink
      English
      arrow-up
      2
      ·
      25 days ago

      Didn’t know about these image generation tools, besides Stable Diffusion. Thanks!

    • L_Acacia@lemmy.ml
      link
      fedilink
      English
      arrow-up
      1
      ·
      21 days ago

      Qwen coder or the new gemma3.

      But at this size using privacy respecting api might be both cheaper and lead to better results.

    • ikidd@lemmy.world
      link
      fedilink
      English
      arrow-up
      1
      ·
      25 days ago

      Claude is the standard that all others are judged by. But it’s not cheap.

      Gemini is pretty good, and Qwen-coder isn’t bad. I’d suggest you watch a few vids on GosuCoder’s YT channel to see what works for you, he reviews a pile of them and it’s quite up to date.

      And if you use VScode, I highly recommend the Roocode extension. Gosucoder also goes into revising the roocode prompt to reduce costs for Claude. Another extension is Cline.