LocalLLaMa

https://mistral.ai/news/announcing-mistral-7b/

## From their website

Mistral AI team is proud to release Mistral 7B, the most powerful language model for its size to date.

### Mistral 7B in short

Mistral 7B is a 7.3B parameter model that:

* Outperforms Llama 2 13B on all benchmarks
* Outperforms Llama 1 34B on many benchmarks
* Approaches CodeLlama 7B performance on code, while remaining good at English tasks
* Uses Grouped-query attention (GQA) for faster inference
* Uses Sliding Window Attention (SWA) to handle longer sequences at smaller cost

We're releasing Mistral 7B under the Apache 2.0 license; it can be used without restrictions.

* [Download it](https://files.mistral-7b-v0-1.mistral.ai/mistral-7B-v0.1.tar) and use it anywhere (including locally) with [our reference implementation](https://github.com/mistralai/mistral-src)
* Deploy it on any cloud (AWS/GCP/Azure) using the vLLM [inference server and skypilot](https://docs.mistral.ai/cloud-deployment/skypilot)
* Use it on [HuggingFace](https://huggingface.co/mistralai)

Mistral 7B is easy to fine-tune on any task. As a demonstration, we're providing a model fine-tuned for chat, which outperforms Llama 2 13B chat.
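The sliding-window idea mentioned above can be illustrated with a small mask construction. This is a NumPy sketch of the commonly described SWA semantics (each token attends to itself and the previous `window - 1` positions), not Mistral's actual implementation:

```python
import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """Boolean attention mask where query i may attend only to keys
    in [i - window + 1, i]: causal attention restricted to a window."""
    i = np.arange(seq_len)[:, None]  # query positions (rows)
    j = np.arange(seq_len)[None, :]  # key positions (columns)
    return (j <= i) & (j > i - window)

mask = sliding_window_mask(6, 3)
# Row i is True only for the last 3 keys up to i, so per-token attention
# cost is O(window) instead of O(seq_len), which is the "smaller cost"
# for longer sequences the announcement refers to.
```

Stacking several such layers lets information still propagate beyond the window, since each layer extends the effective receptive field by `window` tokens.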


cross-posted from: https://lemmy.world/post/2219010

> Hello everyone!
>
> We have officially hit 1,000 subscribers! How exciting!! Thank you for being a member of !fosai@lemmy.world. Whether you're a casual passerby, a hobby technologist, or an up-and-coming AI developer - I sincerely appreciate your interest and support in a future that is free and open for all.
>
> It can be hard to keep up with the rapid developments in AI, so I have decided to pin this at the top of our community to be a frequently updated LLM-specific resource hub and model index for all of your adventures in FOSAI.
>
> The ultimate goal of this guide is to become a gateway resource for anyone looking to get into free open-source AI (particularly text-based large language models). I will be doing a similar guide for image-based diffusion models soon!
>
> In the meantime, I hope you find what you're looking for! Let me know in the comments if there is something I missed so that I can add it to the guide for everyone else to see.
>
> ---
>
> ## **Getting Started With Free Open-Source AI**
>
> Have no idea where to begin with AI / LLMs? Try starting with our [Lemmy Crash Course for Free Open-Source AI](https://lemmy.world/post/76020).
>
> When you're ready to explore more resources, see our [FOSAI Nexus](https://lemmy.world/post/814816) - a hub for all of the major FOSS & FOSAI on the cutting/bleeding edges of technology.
>
> If you're looking to jump right in, I recommend downloading [oobabooga's text-generation-webui](https://github.com/oobabooga/text-generation-webui) and installing one of the LLMs from [TheBloke](https://huggingface.co/TheBloke) below.
>
> Try both GGML and GPTQ variants to see which model type performs to your preference. See the hardware table to get a better idea of which parameter size you might be able to run (3B, 7B, 13B, 30B, 70B).
>
> ### **8-bit System Requirements**
>
> | Model | VRAM Used | Minimum Total VRAM | Card Examples | RAM/Swap to Load* |
> |-----------|-----------|--------------------|----------------------|-------------------|
> | LLaMA-7B | 9.2GB | 10GB | 3060 12GB, 3080 10GB | 24 GB |
> | LLaMA-13B | 16.3GB | 20GB | 3090, 3090 Ti, 4090 | 32 GB |
> | LLaMA-30B | 36GB | 40GB | A6000 48GB, A100 40GB | 64 GB |
> | LLaMA-65B | 74GB | 80GB | A100 80GB | 128 GB |
>
> ### **4-bit System Requirements**
>
> | Model | Minimum Total VRAM | Card Examples | RAM/Swap to Load* |
> |-----------|--------------------|--------------------------------|-------------------|
> | LLaMA-7B | 6GB | GTX 1660, 2060, AMD 5700 XT, RTX 3050, 3060 | 6 GB |
> | LLaMA-13B | 10GB | AMD 6900 XT, RTX 2060 12GB, 3060 12GB, 3080, A2000 | 12 GB |
> | LLaMA-30B | 20GB | RTX 3080 20GB, A4500, A5000, 3090, 4090, 6000, Tesla V100 | 32 GB |
> | LLaMA-65B | 40GB | A100 40GB, 2x3090, 2x4090, A40, RTX A6000, 8000 | 64 GB |
>
> \*System RAM (not VRAM) is utilized to initially load a model. You can use swap space if you do not have enough RAM to support your LLM.
>
> When in doubt, try starting with 3B or 7B models and work your way up to 13B+.
>
> ### **FOSAI Resources**
>
> **Fediverse / FOSAI**
> - [The Internet is Healing](https://www.youtube.com/watch?v=TrNE2fSCeFo)
> - [FOSAI Welcome Message](https://lemmy.world/post/67758)
> - [FOSAI Crash Course](https://lemmy.world/post/76020)
> - [FOSAI Nexus Resource Hub](https://lemmy.world/post/814816)
>
> **LLM Leaderboards**
> - [HF Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
> - [LMSYS Chatbot Arena](https://chat.lmsys.org/?leaderboard)
>
> **LLM Search Tools**
> - [LLM Explorer](https://llm.extractum.io/)
> - [Open LLMs](https://github.com/eugeneyan/open-llms)
>
> ---
>
> ## **Large Language Model Hub**
>
> [Download Models](https://huggingface.co/TheBloke)
>
> ### [oobabooga](https://github.com/oobabooga/text-generation-webui)
> text-generation-webui - a big community-favorite Gradio web UI by oobabooga designed for running almost any free open-source large language model downloaded from [HuggingFace](https://huggingface.co/TheBloke), including (but not limited to) LLaMA, llama.cpp, GPT-J, Pythia, OPT, and many others. Its goal is to become the [AUTOMATIC1111/stable-diffusion-webui](https://github.com/AUTOMATIC1111/stable-diffusion-webui) of text generation. It is highly compatible with many formats.
>
> ### [Exllama](https://github.com/turboderp/exllama)
> A standalone Python/C++/CUDA implementation of Llama for use with 4-bit GPTQ weights, designed to be fast and memory-efficient on modern GPUs.
>
> ### [gpt4all](https://github.com/nomic-ai/gpt4all)
> Open-source assistant-style large language models that run locally on your CPU. GPT4All is an ecosystem to train and deploy powerful, customized large language models that run locally on consumer-grade processors.
>
> ### [TavernAI](https://github.com/TavernAI/TavernAI)
> The original branch of software that SillyTavern was forked from. This chat interface offers very similar functionality but has less cross-client compatibility with other chat and API interfaces (compared to SillyTavern).
>
> ### [SillyTavern](https://github.com/SillyTavern/SillyTavern)
> Developer-friendly, multi-API (KoboldAI/CPP, Horde, NovelAI, Ooba, OpenAI+proxies, Poe, WindowAI (Claude!)), Horde SD, system TTS, WorldInfo (lorebooks), customizable UI, auto-translate, and more prompt options than you'd ever want or need. Optional Extras server for more SD/TTS options plus ChromaDB/Summarize. Based on a fork of TavernAI 1.2.8.
>
> ### [Koboldcpp](https://github.com/LostRuins/koboldcpp)
> A self-contained distributable from Concedo that exposes llama.cpp function bindings, allowing it to be used via a simulated Kobold API endpoint. What does that mean? You get llama.cpp with a fancy UI, persistent stories, editing tools, save formats, memory, world info, author's note, characters, scenarios, and everything Kobold and Kobold Lite have to offer - in a tiny package around 20 MB in size, excluding model weights.
>
> ### [KoboldAI-Client](https://github.com/KoboldAI/KoboldAI-Client)
> A browser-based front-end for AI-assisted writing with multiple local & remote AI models. It offers the standard array of tools, including Memory, Author's Note, World Info, Save & Load, adjustable AI settings, formatting options, and the ability to import existing AI Dungeon adventures. You can also turn on Adventure mode and play the game like AI Dungeon Unleashed.
>
> ### [h2oGPT](https://github.com/h2oai/h2ogpt)
> h2oGPT is a large language model (LLM) fine-tuning framework and chatbot UI with document question-answer capabilities. Documents help ground LLMs against hallucinations by providing them context relevant to the instruction. h2oGPT is a fully permissive Apache V2 open-source project for 100% private and secure use of LLMs and document embeddings for document question-answer.
>
> ---
>
> ## **Models**
>
> ### The Bloke
> The Bloke is a developer who frequently releases quantized (GPTQ) and optimized (GGML) open-source, user-friendly versions of AI Large Language Models (LLMs).
>
> These conversions of popular models can be configured and installed on personal (or professional) hardware, bringing bleeding-edge AI to the comfort of your home.
>
> Support [TheBloke](https://huggingface.co/TheBloke) here:
>
> - [https://ko-fi.com/TheBlokeAI](https://ko-fi.com/TheBlokeAI)
>
> ---
>
> #### 70B
> - [Llama-2-70B-chat-GPTQ](https://huggingface.co/TheBloke/Llama-2-70B-chat-GPTQ)
> - [Llama-2-70B-Chat-GGML](https://huggingface.co/TheBloke/Llama-2-70B-Chat-GGML)
>
> - [Llama-2-70B-GPTQ](https://huggingface.co/TheBloke/Llama-2-70B-GPTQ)
> - [Llama-2-70B-GGML](https://huggingface.co/TheBloke/Llama-2-70B-GGML)
>
> - [llama-2-70b-Guanaco-QLoRA-GPTQ](https://huggingface.co/TheBloke/llama-2-70b-Guanaco-QLoRA-GPTQ)
>
> ---
>
> #### 30B
> - [30B-Epsilon-GPTQ](https://huggingface.co/TheBloke/30B-Epsilon-GPTQ)
>
> ---
>
> #### 13B
> - [Llama-2-13B-chat-GPTQ](https://huggingface.co/TheBloke/Llama-2-13B-chat-GPTQ)
> - [Llama-2-13B-chat-GGML](https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML)
>
> - [Llama-2-13B-GPTQ](https://huggingface.co/TheBloke/Llama-2-13B-GPTQ)
> - [Llama-2-13B-GGML](https://huggingface.co/TheBloke/Llama-2-13B-GGML)
>
> - [llama-2-13B-German-Assistant-v2-GPTQ](https://huggingface.co/TheBloke/llama-2-13B-German-Assistant-v2-GPTQ)
> - [llama-2-13B-German-Assistant-v2-GGML](https://huggingface.co/TheBloke/llama-2-13B-German-Assistant-v2-GGML)
>
> - [13B-Ouroboros-GGML](https://huggingface.co/TheBloke/13B-Ouroboros-GGML)
> - [13B-Ouroboros-GPTQ](https://huggingface.co/TheBloke/13B-Ouroboros-GPTQ)
>
> - [13B-BlueMethod-GGML](https://huggingface.co/TheBloke/13B-BlueMethod-GGML)
> - [13B-BlueMethod-GPTQ](https://huggingface.co/TheBloke/13B-BlueMethod-GPTQ)
>
> - [llama-2-13B-Guanaco-QLoRA-GGML](https://huggingface.co/TheBloke/llama-2-13B-Guanaco-QLoRA-GGML)
> - [llama-2-13B-Guanaco-QLoRA-GPTQ](https://huggingface.co/TheBloke/llama-2-13B-Guanaco-QLoRA-GPTQ)
>
> - [Dolphin-Llama-13B-GGML](https://huggingface.co/TheBloke/Dolphin-Llama-13B-GGML)
> - [Dolphin-Llama-13B-GPTQ](https://huggingface.co/TheBloke/Dolphin-Llama-13B-GPTQ)
>
> - [MythoLogic-13B-GGML](https://huggingface.co/TheBloke/MythoLogic-13B-GGML)
> - [MythoBoros-13B-GPTQ](https://huggingface.co/TheBloke/MythoBoros-13B-GPTQ)
>
> - [WizardLM-13B-V1.2-GPTQ](https://huggingface.co/TheBloke/WizardLM-13B-V1.2-GPTQ)
> - [WizardLM-13B-V1.2-GGML](https://huggingface.co/TheBloke/WizardLM-13B-V1.2-GGML)
>
> - [OpenAssistant-Llama2-13B-Orca-8K-3319-GGML](https://huggingface.co/TheBloke/OpenAssistant-Llama2-13B-Orca-8K-3319-GGML)
>
> ---
>
> #### 7B
> - [Llama-2-7B-GPTQ](https://huggingface.co/TheBloke/Llama-2-7B-GPTQ)
> - [Llama-2-7B-GGML](https://huggingface.co/TheBloke/Llama-2-7B-GGML)
>
> - [Llama-2-7b-Chat-GPTQ](https://huggingface.co/TheBloke/Llama-2-7b-Chat-GPTQ)
> - [LLongMA-2-7B-GPTQ](https://huggingface.co/TheBloke/LLongMA-2-7B-GPTQ)
>
> - [llama-2-7B-Guanaco-QLoRA-GPTQ](https://huggingface.co/TheBloke/llama-2-7B-Guanaco-QLoRA-GPTQ)
> - [llama-2-7B-Guanaco-QLoRA-GGML](https://huggingface.co/TheBloke/llama-2-7B-Guanaco-QLoRA-GGML)
>
> - [llama2_7b_chat_uncensored-GPTQ](https://huggingface.co/TheBloke/llama2_7b_chat_uncensored-GPTQ)
> - [llama2_7b_chat_uncensored-GGML](https://huggingface.co/TheBloke/llama2_7b_chat_uncensored-GGML)
>
> ---
>
> ## **More Models**
> - [Any of KoboldAI's Models](https://huggingface.co/KoboldAI)
>
> - [Luna-AI-Llama2-Uncensored-GPTQ](https://huggingface.co/TheBloke/Luna-AI-Llama2-Uncensored-GPTQ)
>
> - [Nous-Hermes-Llama2-GGML](https://huggingface.co/TheBloke/Nous-Hermes-Llama2-GGML)
> - [Nous-Hermes-Llama2-GPTQ](https://huggingface.co/TheBloke/Nous-Hermes-Llama2-GPTQ)
>
> - [FreeWilly2-GPTQ](https://huggingface.co/TheBloke/FreeWilly2-GPTQ)
>
> ---
>
> ## **GL, HF!**
>
> Are you an LLM Developer? Looking for a shoutout or project showcase? Send me a message and I'd be more than happy to share your work and support links with the community.
>
> If you haven't already, consider subscribing to the free open-source AI community at !fosai@lemmy.world, where I will do my best to make sure you have access to free open-source artificial intelligence on the bleeding edge.
>
> Thank you for reading!
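The VRAM figures in the hardware tables above follow roughly from parameter count times bytes per weight, plus headroom for activations and the KV cache. A back-of-the-envelope sketch (the 20% overhead factor here is an illustrative assumption, not a measured value):

```python
def estimate_vram_gb(params_billions: float, bits_per_weight: int,
                     overhead: float = 1.2) -> float:
    """Rough VRAM estimate for loading a quantized model.

    Weights alone take params * bits / 8 bytes; the overhead factor
    (assumed ~20%) stands in for activations, KV cache, and framework
    buffers, which vary by context length and backend.
    """
    weight_gb = params_billions * bits_per_weight / 8
    return round(weight_gb * overhead, 1)

# LLaMA-13B at 4-bit: 6.5 GB of weights, ~7.8 GB with headroom,
# consistent with the table's 10GB minimum-card recommendation.
print(estimate_vram_gb(13, 4))
```

The same arithmetic explains why halving the bit width (8-bit to 4-bit) roughly halves the minimum-VRAM column from one table to the other.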


cross-posted from: https://lemmy.world/post/1305651

> **[OpenLM-Research Has Released OpenLLaMA: An Open-Source Reproduction of LLaMA](https://github.com/openlm-research/open_llama)**
>
> - https://github.com/openlm-research/open_llama
>
> > **TL;DR**: OpenLM-Research has released a public preview of OpenLLaMA, a permissively licensed open-source reproduction of Meta AI's LLaMA. We are releasing a series of 3B, 7B and 13B models trained on different data mixtures. Our model weights can serve as a drop-in replacement for LLaMA in existing implementations.
>
> > In this repo, OpenLM-Research presents a permissively licensed open-source reproduction of Meta AI's LLaMA large language model. We are releasing a series of 3B, 7B and 13B models trained on 1T tokens. We provide PyTorch and JAX weights of pre-trained OpenLLaMA models, as well as evaluation results and a comparison against the original LLaMA models. The v2 model is better than the old v1 model trained on a different data mixture.
>
> This is pretty incredible news for anyone working with LLaMA or other open-source LLMs. It lets you draw on the vast ecosystem of developers, weights, and resources that have been created for the LLaMA models, which are very popular in many AI communities right now.
>
> With this, anyone can now hop into LLaMA R&D knowing they have avenues to use it within their projects and businesses (commercially).
>
> Big shoutout to the team who made this possible (OpenLM-Research). You can support them by visiting their GitHub and starring the repo.
>
> This team has released models at a handful of parameter sizes, some of which are already circulating and being improved upon.
>
> Yet another very exciting development for FOSS! If I recall correctly, Mark Zuckerberg mentioned in his [recent podcast with Lex Fridman](https://www.youtube.com/watch?v=Ff4fRgnuFgQ) that the next official version of LLaMA from Meta will be open-source as well. I am very curious to see how this model develops this coming year.
>
> If you found any of this interesting, please consider subscribing to [/c/FOSAI](https://lemmy.world/c/fosai), where I do my best to keep you up to date with the most important updates and developments in the space.
>
> Want to get started with FOSAI but don't know how? Try starting with my [Welcome Message](https://lemmy.world/post/67758) and/or [The FOSAI Nexus](https://lemmy.world/post/814816) & [Lemmy Crash Course to Free Open-Source AI](https://lemmy.world/post/76020).
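In practice, the "drop-in replacement" claim in the post above means an existing LLaMA pipeline only needs its checkpoint id changed. A minimal sketch, where the helper `to_openllama` is a hypothetical name and the repo ids are the ones published under `openlm-research` on Hugging Face (13B shipped as v1 only at the time of the preview):

```python
# Map LLaMA parameter sizes to OpenLLaMA checkpoint ids on Hugging Face.
OPENLLAMA_IDS = {
    "3b": "openlm-research/open_llama_3b_v2",
    "7b": "openlm-research/open_llama_7b_v2",
    "13b": "openlm-research/open_llama_13b",
}

def to_openllama(size: str) -> str:
    """Return the OpenLLaMA checkpoint id for a given parameter size."""
    return OPENLLAMA_IDS[size.lower()]

# In a transformers-based pipeline, the swap is then a one-line change:
#   model = LlamaForCausalLM.from_pretrained(to_openllama("7b"))
# instead of pointing at Meta's original (non-commercial) LLaMA weights.
print(to_openllama("7b"))
```

One caveat worth knowing: OpenLLaMA's tokenizer merges consecutive spaces, so code-heavy prompts may tokenize slightly differently than with the original LLaMA tokenizer.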
