BestBlogs.dev Weekly Selection Issue #3

Subscribe Now

Dear Readers,

Welcome to this week's edition of BestBlogs.dev Weekly Picks! This week, we have carefully curated high-quality articles in the fields of artificial intelligence and business technology, aiming to bring you the latest industry insights and knowledge.

This Sunday, we are sending out content focused on artificial intelligence and business technology. Next Wednesday, we will be sending out a newsletter related to programming techniques and product design, so stay tuned.

In the realm of artificial intelligence, we will be exploring how to implement large models within enterprises, the latest groundbreaking models open-sourced by Kunlun Wanwei and Kuaishou, and how OpenRLHF makes aligning large models easier. The articles also provide a detailed explanation of prompt injection attacks and their prevention, as well as the implementation of hybrid search in PostgreSQL using pgvector and Cohere. Additionally, you will find a ranking of AI image models based on human preferences from Hugging Face, insights on building products with large language models, and the efficiency and performance improvements brought by Mamba 2.

In the business and technology sector, NVIDIA CEO Jensen Huang delivered a 20,000-word speech envisioning the future of AI chips and emphasizing the advent of the robotics era. Zhang Jinjian from Oasis Capital discussed the topic of "vitality," exploring the traits and growth of entrepreneurs. We will also analyze the strategic layouts of tech giants in the AI field and how generative AI is advancing marketing and sales. Furthermore, you will learn about the AI-driven features in Apple's iOS 18.

Other content in this issue includes discussions on co-creation brand strategies and case studies of AI combined with film production. We hope these articles will inspire and provoke thought, helping you grasp industry trends.

Alright, let's start reading~

Kunlun's Skywork-MoE: A 200 Billion Parameter Sparse Model Optimized for 4090 GPU Inference

·06-03·1962 words (8 minutes)·AI score: 93 🌟🌟🌟🌟🌟

Kunlun's Skywork-MoE, a 200 billion parameter sparse model, is the first to support inference on a single 4090 server, significantly reducing costs. It leverages MoE Upcycling technology, enhancing performance while maintaining a smaller parameter size compared to competitors. The model is fully open-source, including weights, technical reports, and inference code optimized for 8x4090 servers.

Is Implementing Large Models in Enterprises Effective? — What We Should Do

·06-07·3255 words (14 minutes)·AI score: 93 🌟🌟🌟🌟🌟
Is Implementing Large Models in Enterprises Effective? — What We Should Do

This article begins by introducing the rapid development of large models globally, particularly the United States' leadership in underlying large models and application architectures. It then delves into the four major application directions of Generative AI and analyzes the changes in work status after introducing large models. It emphasizes the importance of the Agent architecture, where Agents possess the abilities of perception, memory, tools, and action, gradually achieving goals through independent thinking and tool invocation. The article further discusses how to efficiently and cost-effectively build Agents, including analyzing business scenarios and abstracting general atomic capabilities. Additionally, it compares overseas and domestic AI Agent construction platforms and how multi-Agent collaboration modes enhance task execution effects. Finally, the article emphasizes the importance of information when interacting with large models and how the Chain of Thought (COT) capability enhances the model's transparency and credibility.

Kuaishou's 'Keling' AI Video Generation Model Opens Beta Test: Generates Videos Over 120 Seconds, Understands Physics, Accurately Models Complex Motions

·06-06·3786 words (16 minutes)·AI score: 93 🌟🌟🌟🌟🌟

Kuaishou's 'Keling' AI video generation model, which is similar to Sora, has opened beta testing. It can generate videos over 2 minutes long, with a resolution of 1080p and a frame rate of 30fps. The model is capable of simulating physical world characteristics and accurately modeling complex motions. It has been integrated into the Kuaishou ecosystem and is available for testing in the Kuaishou app.

This Team Developed OpenAI's Unopened Technology, Open-sourcing OpenRLHF to Make Aligning Large-scale Models Easier

·06-06·2339 words (10 minutes)·AI score: 93 🌟🌟🌟🌟🌟

As large language models (LLM) continue to grow in scale, their performance has been improving. However, a key challenge remains: aligning with human values and intentions. One powerful technique for this is Reinforcement Learning from Human Feedback (RLHF). With the increase in model size, RLHF typically requires maintaining multiple models and increasingly complex learning processes, leading to higher demands on memory and computational resources. OpenRLHF, an open-source RLHF framework, was proposed by a joint team including OpenLLMAI, ByteDance, NetEase Fuxi AI Lab, and Alibaba. It offers an easy-to-use, scalable, and high-performance solution for RLHF training of models over 70 billion parameters, integrating PPO and other techniques.

What is a Prompt Injection Attack?

·06-05·3364 words (14 minutes)·AI score: 92 🌟🌟🌟🌟🌟

This article introduces the concept of prompt injection attacks, detailing how they work, common types, and potential risks. It explains how prompt injection can lead to systems generating incorrect information, writing malware, and even data breaches and remote system takeover. The article also discusses various countermeasures such as data vetting, the principle of least privilege, and reinforcement learning with human feedback.

Shoumao Assistant Agent Technology Exploration Summary

·06-07·7604 words (31 minutes)·AI score: 92 🌟🌟🌟🌟🌟
Shoumao Assistant Agent Technology Exploration Summary

This article explores in detail how the Shoumao technology team combines large language models (LLM) with AI Agent technology, addressing the problems encountered, thought strategies, and practical cases throughout the process. The article first introduces the concept of an AI Agent, defined as Agent = LLM + memory + planning skills + tool usage, emphasizing that an Agent needs to have the ability to perceive the environment, make decisions, and take appropriate actions. Next, the article elaborates on the decision-making process of an Agent, which includes three steps: perception, planning, and action, illustrating the execution process of an Agent through specific cases. In an LLM-driven Agent system, the LLM acts as the brain, supplemented by key components such as planning, memory, and tool usage.

Over the past year, the Shoumao team has started to focus on AI technology trends, exploring the combination of Agent technology and shopping group hand business. The article provides a detailed account of the technical challenges, ideas, and practices encountered by Shoumao in integrating Agent capabilities with intelligent assistant services. It presents the end-display solution, Agent abstraction and management, and the construction of an Agent laboratory. In addition, the article discusses the classification, definition, and exception handling of tools, as well as the concept and pros and cons of tool granularity, summarizing the considerations for ensuring tool security.

During the project's launch and iteration process, the Shoumao team encountered several issues, including high requirements for result accuracy, structural display error rates when the large model outputs directly to the end, instability in the Agent's understanding of tools, and the complexity requirements for tool returns by the LLM.

Launching the Artificial Analysis Text to Image Leaderboard & Arena

·06-06·659 words (3 minutes)·AI score: 92 🌟🌟🌟🌟🌟
Launching the Artificial Analysis Text to Image Leaderboard & Arena

The Hugging Face blog has released a leaderboard for AI image models based on human preferences, aiming to assess and compare the performance of various models. This leaderboard ranks mainstream models, including Midjourney, DALL·E, and Stable Diffusion, among others, based on over 45,000 preference choices. Users can contribute to the rankings by participating in the "Text-to-Image Arena," and after voting on 30 images, they can receive a personalized model ranking.

The leaderboard uses the ELO scoring system, which calculates scores for each model by surveying human preferences across more than 700 images. These images cover a variety of styles and categories to ensure comprehensive evaluation.

Early analysis indicates that proprietary models such as Midjourney and DALL·E 3 HD are currently leading the field. However, open-source models, particularly Playground AI v2.5, are rapidly improving and have surpassed proprietary models in certain areas. Additionally, the upcoming open-source release of Stable Diffusion 3 Medium is expected to have a significant impact on the open-source community.

What We Learned From a Year of Building with LLMs (Part II) [Translation]

·06-03·9709 words (39 minutes)·AI score: 92 🌟🌟🌟🌟🌟

In this article, the authors delve into the valuable experiences and practical insights gained from building and managing large language model (LLM) applications. The article covers multiple aspects from an operational perspective, including data handling, model management, product design, and team building.

First, the article emphasizes the importance of data quality. Regularly reviewing the discrepancies between development and production environments ensures that the data samples in the development environment are consistent with those in the production environment, helping to prevent performance issues in the actual application. Additionally, the article suggests examining LLM input and output samples daily to quickly identify and adapt to new patterns or failure modes.

In terms of model management, the authors recommend generating structured outputs to simplify downstream integration and discuss the challenges of migrating prompts between different models. To ensure stable model performance, the authors advise using version control and fixing model versions to avoid unexpected changes due to model updates. Moreover, selecting the smallest model that can accomplish the task can effectively reduce latency and cost.

For product design, the article points out that designers should be involved early and frequently in the development process. This involvement should go beyond merely enhancing the interface; designers should rethink the user experience and propose valuable improvements. Designing human-in-the-loop user experiences, allowing users to provide feedback and corrections, can enhance the immediate output quality of the product and collect valuable data for model improvement. Clarifying the prioritization of requirements and adjusting risk tolerance according to use cases are also crucial for success.

In team building, the article stresses the importance of focusing on processes rather than merely relying on tools. Cultivating a culture of experimentation and encouraging the team to conduct experiments and iterations can help discover the best solutions. Ensuring that all team members can access and utilize the latest AI technologies and recognizing that a successful LLM application team requires a diverse set of skills, including data science, software engineering, and product design, are also highlighted.

Overall, this article provides profound insights into effectively developing and managing LLM applications and serves as a practical operational guide for professionals in the field.

next-token Eliminated! Meta Tests 'Multi-token' Training Method, Boosts Inference Speed by 3x, Performance Up Over 10%

·06-06·4189 words (17 minutes)·AI score: 91 🌟🌟🌟🌟🌟
next-token Eliminated! Meta Tests 'Multi-token' Training Method, Boosts Inference Speed by 3x, Performance Up Over 10%

In recent studies, researchers from Meta, Paris-Saclay University, and Paris-Sorbonne University have jointly proposed a new training method for large language models. This method improves the sample efficiency of the models by predicting multiple future tokens simultaneously, rather than the traditional single token prediction. The new approach has shown advantages in both code generation and natural language generation tasks, without increasing training time, and can even triple the inference speed. Experiments indicate that as the size of the models increases, the benefits of this method become even more pronounced, especially during training with multiple epochs. In benchmark tests for generative tasks such as programming, the performance improvement of models trained with multi-token prediction is particularly significant.

Breaking up is hard to do: Chunking in RAG applications

·06-06·1468 words (6 minutes)·AI score: 91 🌟🌟🌟🌟🌟

The article discusses the importance of using Retrieval-Augmented Generation (RAG) systems in LLM applications to enhance the accuracy and reliability of LLM responses by vectorizing data. RAG systems enable LLM to retrieve and reference specific data within the semantic space by chunking and converting data into vectors. The size of these data chunks is crucial for the accuracy of search results; chunks that are too large can lack specificity, while those that are too small may lose context. The article also cites insights from Roie Schwaber-Cohen of Pinecone, emphasizing the role of metadata in filtering and linking to original content, as well as how different chunking strategies can affect the efficiency and accuracy of the system.

Several common chunking strategies are outlined, including fixed-size chunking, random-size chunking, sliding window chunking, context-aware chunking, and adaptive chunking. Each strategy has its advantages and limitations, and the most suitable method must be chosen based on the specific use case. For instance, Stack Overflow implemented semantic search by treating questions, answers, and comments as discrete semantic chunks according to the structure of the page. Ultimately, determining the optimal chunking strategy involves actual testing and evaluation to optimize the performance of the RAG system.

OpenAI New Research: How to Understand GPT-4's 'Thinking'

·06-06·947 words (4 minutes)·AI score: 91 🌟🌟🌟🌟🌟
OpenAI New Research: How to Understand GPT-4's 'Thinking'

This article discusses OpenAI's latest research on understanding the 'thinking' of GPT-4. The study introduces sparse autoencoders as a method to identify key points within AI models, enabling better utilization. OpenAI has developed a new approach allowing sparse autoencoders to be extended to millions of features, outperforming previous methods. The article includes a paper, code repository, and an interactive viewer for exploring the concept further.

Battle Transformers Again! The Original Authors Lead the Release of Mamba 2, Significantly Improving Training Efficiency with a New Architecture

·06-04·3990 words (16 minutes)·AI score: 90 🌟🌟🌟🌟

Since its introduction in 2017, Transformer has become the mainstream architecture for AI large models, especially in language modeling. However, its limitations have become apparent with the expansion of model size and sequence length. The Mamba model, introduced a few months ago, addressed some of these issues by achieving linear scalability with context length. Now, the original authors have released Mamba 2, which offers significant improvements in training efficiency and performance. Key contributions include the development of the SSD (state space duality) framework, improved linear attention theory, and the introduction of new algorithms that leverage larger state dimensions. Mamba 2 outperforms its predecessor and other models in various tasks, demonstrating the complementary nature of attention mechanisms and state space models.

Multimodal Model Learns to Play Poker: Outperforms GPT-4v, New Reinforcement Learning Framework is Key

·06-04·1651 words (7 minutes)·AI score: 90 🌟🌟🌟🌟

The article discusses a new reinforcement learning framework, RL4VLM, which allows multimodal large models to learn decision-making tasks without human feedback. Key points include: 1) The model can perform tasks like playing poker and solving '12 points' problems, surpassing GPT-4v. 2) The framework uses environmental rewards instead of human feedback, enhancing decision-making capabilities. 3) The model's performance was tested on tasks requiring fine-grained visual information and embodied intelligence. 4) The framework integrates visual and textual inputs for task states and uses PPO for fine-tuning.

The Open Source Version of GLM-4 is Finally Here: Surpassing Llama3, Multimodal Comparable to GPT4V, MaaS Platform Also Upgraded

·06-06·3107 words (13 minutes)·AI score: 90 🌟🌟🌟🌟
The Open Source Version of GLM-4 is Finally Here: Surpassing Llama3, Multimodal Comparable to GPT4V, MaaS Platform Also Upgraded

Zhipu AI announced a series of advancements in its large models at the recent AI Open Day. The company's Large Model Open Platform currently has 3 million registered users and processes an average of 400 billion tokens per day. The growth of the GLM-4 model over the past 4 months has exceeded 90 times, and the new version, GLM-4-9B, has comprehensively surpassed Llama 3 8B. The multimodal model GLM-4V-9B has also been launched, with all large models remaining open source. The MaaS platform has been upgraded to version 2.0, lowering the threshold for applying large models and providing a more streamlined process for deploying private models. Zhipu AI's commercialization strategy has not only achieved a continuous reduction in application costs through technological innovation but also ensured the upgrade of customer value. Zhipu AI has also introduced the GLM-4-AIR model at a lower price, with performance comparable to GLM-4-0116 but at only 1/100th of the cost. Additionally, Zhipu AI has played a role in formulating AI security standards, joining several international companies in signing the Frontier AI Safety Commitment. Zhipu AI believes that with the Scaling Law still effective, 2024 will be the pivotal year for AGI.

This AI Product Provides a Gaming Partner, Mining Diamonds in Minecraft with Agent-Based Approach

·06-04·4638 words (19 minutes)·AI score: 90 🌟🌟🌟🌟

Altera is a company dedicated to creating AI gaming companions with human-like characteristics. Their first product is an AI partner that can explore and interact with players in Minecraft. Unlike the earlier Voyager, Altera's AI focuses more on empathy and emotional interaction, aiming to become a long-term companion for players, rather than just a tool or assistant. The founder, Guangyu Robert Yang, and his team have a strong academic background in neural networks and cognitive science, combining deep learning and behavior modeling to drive the development of this innovative AI.

Altera's vision extends beyond the gaming industry. They aim to build a world of multiple agents, allowing these digital humans to play roles in various fields, and even have their own forms in the physical world. With the advancement of AI technology, these human-like digital beings could change the way we interact with the digital world, breaking the boundaries between virtual and real worlds.

Jina CLIP v1: A Truly Multimodal Embeddings Model for Text and Image

·11-28·1863 words (8 minutes)·AI score: 88 🌟🌟🌟🌟

Jina AI's new multimodal embedding model, Jina CLIP v1, significantly outperforms OpenAI's original CLIP model in various retrieval tasks. It provides state-of-the-art performance in both text-only and text-image cross-modal retrieval, eliminating the need for separate models for different modalities. Key improvements include:

  1. Enhanced performance in text-only and image-to-image retrieval.
  2. Support for longer text inputs with an 8k token input window.
  3. Utilization of the EVA-02 model for superior image embeddings.
  4. Detailed instructions for getting started with Jina CLIP v1 via Embeddings API and Hugging Face.

Deciphering the Origin, Essence, Gameplay, and Selection Principles of Positioning Strategy and Group Warfare

·06-05·15933 words (64 minutes)·AI score: 93 🌟🌟🌟🌟🌟
Deciphering the Origin, Essence, Gameplay, and Selection Principles of Positioning Strategy and Group Warfare

In the rapidly changing landscape of 2024, many new trends have emerged in the marketing field. This article delves into how these changes impact brand strategies. It first introduces the top 10 marketing trends of 2024, including the rise of VUCA environments, the digital economy, and AI-driven economies, as well as the emergence of a new generation of consumers. The article emphasizes the need for brands to focus on emotional value and the content economy.

The article provides a detailed analysis of three main brand strategies: positioning strategy, audience-centric approach, and co-creation. The positioning strategy aims to capture a unique place in the consumer's mind through category leadership and super symbols but faces limitations in resources and innovative thinking. The audience-centric approach leverages the DTC (Direct-to-Consumer) model, addressing specific group needs, building deep connections with users, and achieving interaction through refined operations. The co-creation strategy emphasizes involving 1% of users in brand creation, jointly creating content, products, and communities to achieve shared growth.

Through rich case studies, the article demonstrates the application of these strategies while also pointing out their limitations. The positioning strategy might be difficult to implement due to resource constraints, the audience-centric approach requires a deep understanding of digitalization, and co-creation needs brands to deeply empower user participation.

Jensen Huang's In-Depth Interview: How I Led 28,000 People to Surpass Apple in Ten Years

·06-06·12457 words (50 minutes)·AI score: 93 🌟🌟🌟🌟🌟
Jensen Huang's In-Depth Interview: How I Led 28,000 People to Surpass Apple in Ten Years

In a deep interview with Stripe CEO Patrick Collison, NVIDIA CEO Jensen Huang shared his experiences and management philosophies that led the company to achieve tremendous success. Through this in-depth conversation, Huang's leadership style, innovative thinking, and insights into AI technology were revealed.

Huang emphasized that great achievements require pain and struggle, and not all work continuously brings you joy. He believes that striving and solving difficulties are essential to truly realize the greatness of what you are doing. NVIDIA's management model is also unique, with over 60 executives reporting directly to him, ensuring transparent and efficient information dissemination. This flat management structure not only reduces hierarchy but also promotes internal learning and growth within the company.

In team management, Huang insists on not giving up on any employee easily, believing that everyone has potential. He provides open feedback on mistakes, allowing the entire team to learn and progress. He also stresses that the role of a CEO is to handle tasks that others cannot and to only participate in meetings that drive development and solve problems.

Huang prefers to create entirely new markets rather than compete in existing ones. He believes that innovation and logical reasoning are key to proving the feasibility of ideas. He compares the current AI revolution to the industrial revolution, producing tokens and floating-point numbers, which represent intelligence and will significantly enhance productivity across various industries, with vast potential.

The article also explores significant breakthroughs in the AI field with ChatGPT and Llama. ChatGPT democratized computing, while Llama democratized generative AI, fostering widespread application and research. Huang emphasizes that actively engaging with AI is crucial for future competition; otherwise, you will be replaced by those who utilize AI.

Additionally, Huang believes that excellent operations can create good things, but love and care are needed to create extraordinary things. He thinks that products should be beautiful and elegant, striking a balance between simplicity and complexity to provide an exceptional user experience.

AI, Humanity, and Vitality | A Conversation with Zhang Jinjian from Oasis Capital

·06-02·6581 words (27 minutes)·AI score: 92 🌟🌟🌟🌟🌟

The article consists of a dialogue between Kai Qu and Jinjian Zhang, revolving around the core concept of "vitality." Zhang believes that vitality is the intrinsic energy of every individual, the energy and desire for goodness and connection with all things that one feels upon waking each day. He points out that a person with vitality has two characteristics: pragmatism and unconditional self-love. A pragmatic person can respond objectively to problems, while self-love helps people face setbacks and transform suffering into growth. The article also discusses the relationship between vitality and entrepreneurial success, as well as how to achieve self-love. Zhang emphasizes that the core of self-love is forgiveness, which is the central tenet of all religions.

The article further explores how setbacks can promote personal growth and how entrepreneurs can enhance their vitality through overcoming adversity. Kai Qu shares his perspective, believing that those who have experienced setbacks can better understand management and empathize with others, while those who appear to have had smooth sailing may develop an inflated ego due to a lack of hardship experience.

When discussing the relationship between an entrepreneur's script and investment, Zhang mentions the insight and desire that founders must possess. He argues that insight is a reflection of diversity rather than sophistication, and that entrepreneurs should embrace the diversity of the world. Desire is what connects a person to things they truly want to do, not a motive based on comparison. He also notes that a person's stability stems from faith, which is discovered through a process of noise reduction rather than seeking.

Finally, the article discusses the development and future trends of AI. Zhang believes that AI is accelerating and may significantly reduce the demand for programmers within the next three years. He also points out that the integration of AI and blockchain could usher in a new era, and the simultaneous explosion of these technologies will have profound social impacts. Despite market cycles, Zhang believes that value creators will become fewer while money will increase. Therefore, it is important to focus on one's work and find ways to love oneself, such as pursuing a hobby, to maintain a peaceful mindset and nourish creativity.

Jensen Huang on Generative AI and the Future of Robotics

·06-02·20342 words (82 minutes)·AI score: 91 🌟🌟🌟🌟🌟
Jensen Huang on Generative AI and the Future of Robotics

At Computex 2024, Jensen Huang showcased NVIDIA's latest advancements in accelerated computing, featuring the Blackwell chip and Rubin platform. He highlighted the crucial role of accelerated computing in boosting efficiency, reducing costs, and saving energy. NVIDIA's innovations, such as deep learning libraries and quantum computer simulation systems, were also presented. Huang emphasized the positive feedback loop of CUDA technology in driving AI progress and discussed the transformative impact of generative AI across various sectors. He forecasted a future where autonomous mobility is widespread, marking the arrival of the robotics era. The full-scale production of the Blackwell GPU, representing a 1000-fold increase in AI computing power over eight years, was also announced. This leap surpasses Moore's Law advancements, thanks to technologies like the Grace CPU and fifth-generation NVLink. Finally, Huang underscored the significance of generative AI and its broad applications, signaling the dawn of the data center era.

AI Strategies of Tech Giants from Personal Computers, Smartphones to Artificial Intelligence

·06-06·5581 words (23 minutes)·AI score: 91 🌟🌟🌟🌟🌟

The article focuses on the strategic layouts of four major tech companies in the field of artificial intelligence.

Firstly, for Google, its strategy in the AI domain is unique, adopting an integrated approach. Through its proprietary TPU processors and the Vertex AI platform, Google provides AI solutions for both consumers and enterprises. Amazon, on the other hand, offers modular services via its Bedrock managed development platform, emphasizing the importance of data gravity. Microsoft, through its collaboration with OpenAI and its own Azure platform, follows a technological approach to ensure the widespread application of AI technology. Meanwhile, Meta opts to open source its large models, such as Llama, to reduce inference costs. By leveraging an open-source strategy, Meta attracts a vast number of developers and enterprises, thereby increasing the usage and improvement speed of its models.

The article points out that both integration and modularization have their advantages. Integration can offer more optimized and coordinated solutions, whereas modularization provides greater flexibility and adaptability.