Game changer? Llama 3.1 405B is now running on Cerebras!
– 969 tokens/s: frontier AI now runs at instant speed
– 12x faster than GPT-4o, 18x faster than Claude, 12x faster than the fastest GPU cloud
– 128K context length, 16-bit weights
– Industry's fastest time-to-first-token at 240 ms
https://lnkd.in/e5zuEnM7
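For context, the two quoted figures (969 tokens/s throughput, 240 ms time-to-first-token) are enough to estimate end-to-end response latency. A back-of-envelope sketch; only the two constants come from the post, the rest is plain arithmetic:

```python
# Back-of-envelope latency math using the figures quoted in the post.
TTFT_S = 0.240          # time-to-first-token: 240 ms (from the post)
THROUGHPUT_TPS = 969    # tokens per second (from the post)

def response_time(n_tokens: int) -> float:
    """Approximate wall-clock seconds to stream an n_tokens response."""
    return TTFT_S + n_tokens / THROUGHPUT_TPS

# A 500-token answer lands in well under a second:
print(f"{response_time(500):.2f} s")  # → 0.76 s
```

At these speeds the interaction feels instantaneous, which is the post's "instant speed" claim in concrete terms.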
Nicolas Mallison’s Post
Zeta Update v2.6.1
Zeta aims to be the most bleeding-edge neural network framework, with the highest reliability, incomparable speed, and the simplest experience!
⎆ [FEAT][Adaptive Gating]
⎆ [zeta.quant -> zeta.nn.quant]
⎆ [FEAT][Multi-Modal Rotary Embeddings]
⎆ [DEL][Zeta Cloud] + [DEL][Zeta CLI]
⎆ [General Clean Up]
Get started now: https://buff.ly/4dyfFR9
🎥 Rodrigo Liang shares how SambaNova's Reconfigurable Data Units can deliver 10x the speed of traditional GPUs using 1/10th the power. With SambaNova Cloud, you can experience fast inference on AI at Meta's Llama 3.2, powered by the SN40L RDU, giving you both speed and efficiency. Start developing now ➡️ cloud.sambanova.ai #AI #GPU #API
Llama 3.1 405B is here! Now you can deploy Meta's Llama 3.1 405B on Google Cloud Vertex AI, bringing GPT-4-level capabilities in-house with full control.
✅ Key highlights:
🧠 Llama 3.1 405B offers a 128K-token context and advanced capabilities.
📊 Supports the full deployment lifecycle: setup, registration, deployment, and inference.
💾 Uses FP8 precision for single-node deployment on H100 GPUs.
🔓 Step-by-step guide to requesting quota for Google Cloud's A3 High-GPU machines with 8x H100.
🛠️ Deployment made easy with Hugging Face's Text Generation Inference (TGI) container.
💰 Includes tips for managing resources and controlling costs on Vertex AI.
#MetaAI #Llama3 #GoogleCloud #VertexAI #AIInnovation #HuggingFace #TechUpdate
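To make the deployment recipe concrete, here is a minimal sketch of the kind of TGI serving spec such a setup uses. The container image name, environment variables, and machine type below are illustrative assumptions consistent with the post (FP8 weights, a single A3 node with 8x H100), not the linked guide's exact values:

```python
# Hypothetical sketch of a TGI serving spec for Llama 3.1 405B on Vertex AI.
# Image, env values, and model ID are illustrative assumptions, not the
# exact settings from the linked guide.

def tgi_serving_spec(model_id: str, num_gpus: int = 8) -> dict:
    """Build a container spec dict for Hugging Face's TGI server."""
    return {
        "image_uri": "ghcr.io/huggingface/text-generation-inference:latest",
        "env": {
            "MODEL_ID": model_id,        # model to load from the Hub
            "NUM_SHARD": str(num_gpus),  # tensor-parallel across the GPUs
            "QUANTIZE": "fp8",           # FP8 lets 405B fit on one node
            "MAX_INPUT_TOKENS": "8192",  # cap prompt length to save memory
        },
        "machine_type": "a3-highgpu-8g",  # Google Cloud A3 VM, 8x H100
    }

spec = tgi_serving_spec("meta-llama/Llama-3.1-405B-Instruct-FP8")
```

In a real deployment this spec would be passed to Vertex AI's model upload and endpoint deployment calls; the point here is that FP8 quantization plus 8-way sharding is what makes a 405B model fit on a single 8x H100 node.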
Learn how our industry-first turnkey #AI private cloud gives your AI and IT teams powerful tools to experiment, scale, and operationalize AI, while keeping data secure. New video introduces the #HPEProLiant Compute DL384 Gen12 with #NVIDIA GH200 NVL2 @ https://hpe.to/6049turQH
HPE ProLiant Compute DL384 Gen12 with NVIDIA GH200 NVL2
"We’re thrilled to announce the general availability of Cloud TPU v5p, our most powerful and scalable TPU to date. TPU v5p is a next-generation accelerator that is purpose-built to train some of the largest and most demanding generative AI models. A single TPU v5p pod contains 8,960 chips that run in unison — over 2x the chips in a TPU v4 pod. Beyond the larger scale, TPU v5p also delivers over 2x higher FLOPS and 3x more high-bandwidth memory on a per chip basis." 🙌🏾 https://lnkd.in/gAt55qdy https://lnkd.in/gsQEST3T #GoogleCloudNext
Tips for saving cloud costs on AI workloads
1. RAG helps you avoid fine-tuning a model for every use case and should be the first option for providing context to the model. You can build it from scratch or use a framework like LlamaIndex or LangChain.
2. Automatic prompt optimization with AdalFlow can reduce compute resources.
3. Models can be fine-tuned efficiently with Unsloth AI, thereby optimizing compute usage.
4. Purpose-built LPUs from Groq use compute more efficiently for inference than GPUs.
#finops #genai
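Tip 1 above can be sketched in a few lines: retrieve the most relevant documents and prepend them to the prompt, so one base model serves many use cases without per-use-case fine-tuning. This toy retriever scores by keyword overlap purely for illustration; production systems use embedding-based search via frameworks like LlamaIndex or LangChain:

```python
# Toy RAG sketch: retrieve context by term overlap, then build the prompt.
# Illustrative only; real retrievers use embeddings, not keyword overlap.

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank docs by shared lowercase terms with the query; return top k."""
    q = set(query.lower().split())
    scored = sorted(docs, key=lambda d: -len(q & set(d.lower().split())))
    return scored[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Prepend retrieved context so the base model needs no fine-tuning."""
    context = "\n".join(retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

docs = [
    "Invoices are processed within 30 days of receipt.",
    "The cafeteria opens at 8am on weekdays.",
    "Late invoices incur a 2% penalty fee.",
]
print(build_prompt("How are late invoices handled?", docs))
```

The invoice-related documents are selected as context while the irrelevant one is dropped, which is exactly the cost win: context is swapped at query time instead of baked in via fine-tuning.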
v0.5.5 is out: Jan is more stable 👋
Highlights 🎉
- AI at Meta's Llama 3.2 and Alibaba Cloud's Qwen 2.5 added to the hub
- Improved starter screen
- Better local vs. cloud model navigation
Fixes 💫
- Solved GPU acceleration for GGUF models
- Improved model caching & threading
- Resolved input & toolbar overlaps