Tweets

Jerry Liu

3d ago

Files are all you need 🗂️

I wrote a blog post to capture a trend I’m seeing in the AI agent landscape: that the primary way to equip agents with actions and context is through files and filesystems.
1️⃣ It is an easy way for agents to store context for later (e.g. @dexhorthy’s progressive disclosure)
2️⃣ It is a powerful search interface, in addition to or instead of RAG
3️⃣ It is a more flexible way to equip agents with tool calling.
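To make points 1️⃣ and 2️⃣ concrete, here is a minimal Python sketch, not taken from the blog post. An agent offloads conversation turns to a plain-text file and later pulls back only the matching lines, the way a grep tool would. The helper names `append_turn` and `grep_history` and the one-turn-per-line layout are assumptions made for illustration.

```python
import re
import tempfile
from pathlib import Path


def append_turn(history_file: Path, role: str, text: str) -> None:
    """Offload one conversation turn to a plain-text file, one turn per line."""
    flat = text.replace("\n", " ")  # keep each turn on a single greppable line
    with history_file.open("a", encoding="utf-8") as f:
        f.write(f"{role}: {flat}\n")


def grep_history(history_file: Path, pattern: str, max_hits: int = 5) -> list[str]:
    """Case-insensitive regex search over the file, returning 'lineno:line' like grep -n."""
    regex = re.compile(pattern, re.IGNORECASE)
    hits: list[str] = []
    with history_file.open(encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if regex.search(line):
                hits.append(f"{lineno}:{line.rstrip()}")
                if len(hits) >= max_hits:
                    break
    return hits


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        history = Path(tmp) / "history.txt"
        append_turn(history, "user", "Use Postgres for the orders service.")
        append_turn(history, "assistant", "Noted:\norders service uses Postgres.")
        append_turn(history, "user", "Deploy target is staging.")
        # Later, only the relevant lines are pulled back into the context window.
        print(grep_history(history, r"postgres"))
```

The file here is the agent's long-term memory. The context window only ever holds the few lines a search returns.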

Coding agents + file tools are a good initial proxy for computer use. We’ll see if the trend persists, but there’s a ton more potential to explore here.

Blog: llamaindex.ai/blog/files-are…w

LlamaIndex 🦙

3d ago

Files are becoming the primary interface for AI agents to manage context, store conversations, and access skills 📁

@jerryjliu0 breaks down how coding agents like Claude Code and @cursor_ai are centralizing around filesystems as core abstractions, moving away from complex tool ecosystems:

📝 Agents store long conversation histories in searchable files to overcome context window limitations
🔍 File-based retrieval with semantic search outperforms traditional RAG patterns for dynamic context traversal
⚡ Skills defined as simple files are replacing complex MCP tools - just copy API specs into markdown files
🛠️ Agents need only ~5-10 core tools (CLI, code interpreter, web fetch) plus filesystem access to be highly capable
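The "skills as files" idea pairs naturally with progressive disclosure. Only a one-line summary of each skill sits in the prompt, and the full markdown (for example, a pasted API spec) is read only when needed. This is a minimal sketch that assumes a flat directory of `*.md` skill files whose first non-empty line serves as the summary. `index_skills`, `skills_prompt`, and `load_skill` are hypothetical names and do not represent the actual loader in Claude Code or LlamaIndex.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Skill:
    name: str
    summary: str  # first non-empty line; always visible to the agent
    path: Path    # full body is read only on demand


def index_skills(skills_dir: Path) -> dict[str, Skill]:
    """Index every *.md file in skills_dir by file stem, keeping just a one-line summary."""
    index: dict[str, Skill] = {}
    for path in sorted(skills_dir.glob("*.md")):
        lines = [ln.strip() for ln in path.read_text(encoding="utf-8").splitlines()]
        summary = next((ln.lstrip("# ") for ln in lines if ln), "")
        index[path.stem] = Skill(name=path.stem, summary=summary, path=path)
    return index


def skills_prompt(index: dict[str, Skill]) -> str:
    """Compact listing for the system prompt: names and summaries only."""
    return "\n".join(f"- {s.name}: {s.summary}" for s in index.values())


def load_skill(index: dict[str, Skill], name: str) -> str:
    """Full skill text (e.g. a pasted API spec), loaded when the agent picks this skill."""
    return index[name].path.read_text(encoding="utf-8")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        skills = Path(tmp)
        (skills / "orders.md").write_text("# Orders API\nGET /v1/orders\n", encoding="utf-8")
        (skills / "deploy.md").write_text("Run the deploy script\n", encoding="utf-8")
        idx = index_skills(skills)
        print(skills_prompt(idx))        # what the agent always sees
        print(load_skill(idx, "orders"))  # what it reads on demand
```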

The challenges ahead include parsing non-plaintext documents (PDFs, Word, Excel) and scaling file search to massive collections. That's exactly why we built LlamaCloud's Parse, Extract, and Sheets capabilities - to convert any document format into agent-ready context.

Read the full analysis: llamaindex.ai/blog/files-are…zUc6

elvis

3d ago

This is great insight from Cursor on long-running agents.

It turns out planning is all you need.

On a serious note, planning is critical to be productive and effective with AI Agents.

It's aligned with how I get Claude Code to effectively work on long-running tasks. (More of my thoughts on Claude Code towards the end of the post)

First, let's discuss the insights from the Cursor article.

The big problem with multi-agent systems today is coordination and communication.

The solution Cursor proposes is careful planning.

I agree that the best way to deal with this challenge today is to do careful planning.

Planners explore the codebase and create tasks.

Then, subplanners are spawned to address specific categories of tasks. The great thing about this is that it enables parallelization and recursive loops, ideal for this kind of work.

From there, worker subagents focus on their assigned subtasks until they are complete, then push their changes. One important aspect of this design is that subagents don't coordinate at all and are oblivious to the bigger picture. They don't need that awareness to produce high-quality code that doesn't conflict. Having subagents talk to each other can lead to communication bottlenecks, duplicate work, and potential drift.

The system runs in cycles, and each cycle is checked by a judge agent. At the end of every cycle, the judge decides whether work can continue or whether there is an issue to address.
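The planner → subplanner → worker → judge loop described above can be sketched as a toy Python program. Every agent here is a stub function standing in for an LLM call. This is a hypothetical simplification and not Cursor's harness. What the sketch shows is the control flow:

* planning happens up front;
* there is one subplanner per task category;
* workers run in parallel with no cross-talk;
* a judge decides after each cycle whether to stop.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Task:
    category: str
    description: str
    done: bool = False


def planner(goal: str) -> list[Task]:
    """Stub for the planner agent; a real one would explore the codebase with an LLM."""
    return [
        Task("parser", f"{goal}: tokenize HTML"),
        Task("parser", f"{goal}: build DOM tree"),
        Task("layout", f"{goal}: block layout"),
    ]


def subplanner(category: str, pending: list[Task]) -> list[Task]:
    """Stub subplanner that owns one category; a real one could split tasks recursively."""
    return [t for t in pending if t.category == category]


def worker(task: Task) -> str:
    """Worker sees only its own task and never messages other workers."""
    task.done = True  # stands in for: write code, run tests, push changes
    return f"pushed: {task.description}"


def judge(tasks: list[Task]) -> bool:
    """Judge checks the cycle's outcome: True means stop, False means run another cycle."""
    return all(t.done for t in tasks)


def run(goal: str, max_cycles: int = 3) -> list[str]:
    tasks = planner(goal)
    log: list[str] = []
    for _ in range(max_cycles):
        pending = [t for t in tasks if not t.done]
        for category in sorted({t.category for t in pending}):
            batch = subplanner(category, pending)
            with ThreadPoolExecutor() as pool:  # workers in a batch run in parallel
                log.extend(pool.map(worker, batch))
        if judge(tasks):
            break
    return log
```

The "too little structure / too much structure" trade-off shows up in how much the subplanner and judge are allowed to do. This version keeps both deliberately thin.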

Cursor managed to build a web browser from scratch with this approach. The agent ran for a week, writing over 1M lines of code across 1K files.

Cursor found GPT-5.2 better suited to this setup. In their experience, Opus 4.5 tends to stop earlier, take shortcuts, and yield control back quickly.

The simpler system worked best. "Too little structure and agents conflict, duplicate work, and drift. Too much structure creates fragility."

An interesting finding: designing a system prompt that keeps agents focused over long periods mattered more than the harness and the models themselves.

Why this resonated with me, even though I am not a Cursor user:

I have been testing Claude Code on long-running tasks, and what Cursor reports matches my own findings. That said, better planning and tuning of the system prompt, including tuning CLAUDE.md, has let me use Claude Code more effectively for these long-running tasks.

Here are a few notes on planning and how you get something like this to work in Claude Code:

You can do effective planning in many different ways. You can write an initial plan and complete it with Claude Code (in plan mode), or you can brainstorm the plan with Claude Code directly (also in plan mode). Claude Code is excellent at managing plans for you if you don't want lots of moving parts. Combined with subagents, this already works extremely well in Claude Code. You can also get more creative with how planning is done to mimic the subplanners proposed by Cursor. Claude Code is extremely flexible across all its features (Skills, Slash Commands, Subagents, Hooks, etc.). I will share more on this once I finish some experiments I am currently running.

When planning, it helps if you are involved in the process too. If you are a Claude Code user, you can trigger the AskUserQuestion tool to provide input that makes the plan more robust.

From here, you can offload individual pieces of work to subagents (in parallel if you want). The great part is that subagents in Claude Code manage their own context. This keeps the main orchestrator's context clean and reserved for high-level work. You can customize your subagents with models and tools. Planning is what makes the coordination work, the system prompt helps maintain stability and manage context, and the subagents are simply in charge of executing the work.
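Here is a toy model of why subagents keep the orchestrator's context clean. Each subagent accumulates its own message history, and only a one-line summary crosses back to the orchestrator. This is an illustration that uses assumed names (`Agent`, `run_subagent`, `orchestrate`). It does not show how Claude Code implements subagents internally.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    name: str
    context: list[str] = field(default_factory=list)  # this agent's private history

    def add(self, message: str) -> None:
        self.context.append(message)


def run_subagent(task: str, steps: int) -> str:
    """Do the work in a fresh context and hand back only a short summary."""
    sub = Agent(name=f"sub:{task}")
    sub.add(f"task: {task}")
    for i in range(steps):
        sub.add(f"tool output {i} for {task}")  # verbose intermediate work stays here
    return f"{task}: done after {len(sub.context) - 1} steps"


def orchestrate(tasks: dict[str, int]) -> Agent:
    """The orchestrator's context holds the plan plus one summary line per subagent."""
    main = Agent(name="orchestrator")
    main.add("plan: " + ", ".join(tasks))
    for task, steps in tasks.items():
        main.add(run_subagent(task, steps))
    return main
```

With 40 and 25 steps of subagent work, `orchestrate({"auth": 40, "billing": 25})` still leaves the orchestrator with just three context entries.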

I will be sharing more on my setup in the coming weeks. I am fascinated by how far we can push agent harnesses for long-horizon tasks. Stay tuned!

向阳乔木

3d ago

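How do you make your web designs look less AI-generated? Here's a design-language system for it.

It comes with plugins for Claude Code and Cursor.

Purple gradients, frosted glass, cards nested inside cards, hollow headlines like "Welcome to Our Platform"…

You can tell at a glance: AI made this.

It's not that AI can't design. It's that most people who use AI for design are missing one thing: a design language.

Say you ask AI to improve a page, and it piles on a bunch of flashy effects.

You feel something is off, but you can't say what.

All you can say is "make it simpler," and then the AI deletes everything.

What's the problem?

What you actually want is "more whitespace," "fewer levels of visual hierarchy," or "organize content with vertical rhythm," but you've never used those words.

If you can't express it, the AI doesn't know what to do.

It's like going to a barber and only saying "shorter": the barber is left guessing what you want.

That's exactly the problem Impeccable sets out to solve: it gives you a designer's vocabulary.

At its core are 17 commands, including:

• /audit - a full checkup that finds every problem
• /polish - the final pass, from "fine" to "refined"
• /simplify - subtract, back to the essentials
• /bolder - increase visual impact
• /quieter - reduce visual noise

Behind the commands is a complete set of design principles and a library of anti-patterns.

For example, /audit checks:

• Whether contrast is sufficient (can you read the text?)

• Whether touch targets are large enough (can a finger hit them?)

• Semantic HTML (can a screen reader understand it?)

• AI tells (does it lean on tired, overused design clichés?)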
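The post does not describe how Impeccable implements its contrast check. The standard way to test contrast is the WCAG 2.x contrast-ratio formula, sketched below. The sketch assumes 6-digit hex colors and uses the AA thresholds: 4.5:1 for normal text and 3:1 for large text.

```python
def _linearize(channel_8bit: int) -> float:
    """sRGB channel (0-255) to linear light, as in the WCAG 2.x relative-luminance definition."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a 6-digit hex color such as '#777777'."""
    h = hex_color.lstrip("#")
    r, g, b = (_linearize(int(h[i:i + 2], 16)) for i in (0, 2, 4))
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: str, bg: str) -> float:
    """(lighter + 0.05) / (darker + 0.05); ranges from 1:1 to 21:1."""
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(fg: str, bg: str, large_text: bool = False) -> bool:
    """WCAG AA: at least 4.5:1 for normal text, 3:1 for large text."""
    return contrast_ratio(fg, bg) >= (3.0 if large_text else 4.5)
```

For example, mid-gray #777777 on white comes out at roughly 4.48:1. That means `passes_aa("#777777", "#ffffff")` is False for body text even though the text looks readable at a glance.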

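The key point: it doesn't make the changes for you.

Instead, it generates a detailed diagnostic report telling you what's wrong, why it's wrong, and which command to use to fix it.

Why does this system work?

1. It builds a vocabulary

You learn to say /bolder instead of "make it flashier," and /clarify instead of "make it clearer."

With precise language, communication becomes far more efficient.

2. It makes you think systematically

Instead of piecemeal fixes, you follow a "diagnose → standardize → optimize → enhance" process.

Each step has its own commands.

3. It accumulates anti-patterns

It spells out what "AI-flavored" design looks like, for example:

• Gray text on colored backgrounds (poor contrast)

• Cards nested inside cards (confused hierarchy)

• Bounce easing (dated and cheap-looking)

• Hollow marketing copy (nobody reads it)

Who is it for?

If you are:
① A developer who wants AI-generated interfaces to feel less "AI-flavored"

② A designer who wants AI to understand your design intent

③ A product manager who wants more precise language for talking with your team

It doesn't teach you how to use Figma; it teaches you how to think about design.

Underneath, Impeccable is answering one question:

In the age of AI, where does human value lie?

Not in execution (AI can do that), but in judgment and taste.

You have to know what's good and what's bad to steer AI toward good work.

At heart, this command system trains your design judgment.

The more you use it, the better you get at spotting what's wrong with an interface and how to fix it.

Eventually you won't need the commands anymore, because that way of thinking will have become second nature.

---
Project link in the comments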

Jeff Dean

4d ago

An updated version of our MedGemma model is out with major accuracy improvements in medical-related tasks. We also released MedASR, which is specialized for low error rates in medical speech recognition. We're excited to see what the health community does with these models! 🩺

Omar Sanseviero

4d ago

Introducing MedGemma 1.5, an open-access model for multimodal medical use cases

It expands the tasks and data formats it can understand (high-dimensional medical imaging, EHRs, anatomical localization with bounding boxes, etc.)

research.google/blog/next-gene…

小互

6d ago

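lmarena, the well-known LLM leaderboard, has tallied every model that has held the #1 spot since mid-2023.

Counting from GPT-4, a model holds #1 for only about 35 days on average.

Not six months, not a year: in just over a month, a new model overtakes it.

Even more striking, many models drop out of the Top 5 about five months after reaching #1. By seven months, they struggle to stay in the Top 10.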
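This sketch pins down what "about 35 days on average at #1" means. Each reign runs from one takeover to the next, so the average is the total span divided by the number of completed reigns. The post does not give lmarena's exact methodology. Excluding the current leader's still-open reign is an assumption, and the day offsets in the usage line are hypothetical, not real leaderboard data.

```python
def average_reign_days(takeover_days: list[int]) -> float:
    """Mean time at #1, given the day each new model took the top spot (ascending).

    n takeovers give n - 1 completed reigns; the current leader's open reign is excluded.
    """
    if len(takeover_days) < 2:
        raise ValueError("need at least two takeovers to measure a completed reign")
    reigns = [later - earlier for earlier, later in zip(takeover_days, takeover_days[1:])]
    return sum(reigns) / len(reigns)
```

Suppose there were hypothetical takeovers on days 0, 30, 70, and 105. The reigns would last 30, 40, and 35 days, so `average_reign_days([0, 30, 70, 105])` returns 35.0.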

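Concrete examples make the point hit harder:

o1, once the star of the show, now ranks #56;

Claude 3 Opus, once considered "the strongest reasoning model," has fallen to #139.

All of them were once:

top models, intensely discussed, and seen as "a generation ahead"

No one stays at the top for long; the window of advantage is shrinking to a matter of months.

When the next generation overtakes a model in 35 days on average, that's even faster than some AI products' iteration cycles... 😄

This may point to something crucial: foundation model capabilities are improving faster than most products can iterate.

And the product layer ends up "flattened from below" by model capabilities.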

OpenAI

2d ago

x.com/OpenAI/status/…

OpenAI

2d ago

In the coming weeks, we plan to start testing ads in ChatGPT free and Go tiers.

We’re sharing our principles early on how we’ll approach ads, guided by putting user trust and transparency first as we work to make AI accessible to everyone.

What matters most:
- Responses in ChatGPT will not be influenced by ads.

- Ads are always separate and clearly labeled.

- Your conversations are private from advertisers.

- Plus, Pro, Business, and Enterprise tiers will not have ads.

meng shao

3d ago

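Manus's latest blog post introduces its core "Manus Sandbox"
manus.im/blog/manus-san…

The Sandbox is a fully isolated cloud virtual machine that Manus allocates to each task. It works like an independent computer in the cloud, so multiple tasks can run in parallel without interfering with each other. The Sandbox gives the AI agent full computing capabilities, including network access, a filesystem, a browser, and a range of software tools.

· Key advantage: the AI agent is designed and trained to choose the right tools and use them correctly. For example, it can solve problems by writing code, or even build complete websites or mobile apps. Everything runs 24/7 on Manus's virtualization platform without consuming the user's local resources.
· In practice: users can upload attachments, and the AI generates files or artifacts (such as code and configuration) in the Sandbox. This moves AI from "thinking" to "doing," for example in automated development or data processing.
· What's new: the Sandbox emphasizes "completeness." It is like a personal computer, but it runs in the cloud, so the AI can handle complex tasks instead of being limited to text responses.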

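Manus Sandbox features

1. Content storage:
  · Attachments uploaded by the user
  · Files and artifacts the AI creates during execution
  · Task configuration
  Users can access this content through the UI ("View all files in this task") or by asking the AI directly (e.g., "Package all the code you wrote and send it to me").

2. Lifecycle management (balancing efficiency and persistence):
  · Creation: created on demand when a new session starts
  · Sleep/wake: sleeps automatically when idle (files stay intact) and wakes automatically when needed
  · Recycling/rebuilding: recycled after a long sleep (7 days for free users, 21 days for Pro users). On rebuild, the AI automatically restores key files (such as artifacts, attachments, and important project files like Slides or WebDev), but not temporary code or intermediate files.
  · Long-running tasks: for backend services that must run continuously, the recommendation is to build the frontend and backend with Manus's Web development feature and deploy them to the public internet.

3. Security:
  · "Zero trust" principle: the user and the AI have full control over the Sandbox (e.g., root access, modifying system files, even formatting the disk). However, every operation is confined to that Sandbox and cannot affect other tasks, Manus services, or user account data.
  · If an unrecoverable error occurs, Manus automatically creates a new Sandbox to continue the task, ensuring continuity.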
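The lifecycle rules above form a small state machine: sleep when idle, recycle after 7 or 21 days, and restore only key files on rebuild. This sketch rests on several assumptions. Idle time is tracked in days, and recycling happens once idle time exceeds the plan's window. The file-kind labels (`artifact`, `attachment`, `project`, `temp`) are hypothetical. This is not Manus's implementation.

```python
from dataclasses import dataclass, field

RETENTION_DAYS = {"free": 7, "pro": 21}  # idle days before recycling, per the post
# Hypothetical file-kind labels; only these survive a rebuild ("temp" does not).
RESTORED_KINDS = {"artifact", "attachment", "project"}


@dataclass
class Sandbox:
    plan: str                      # "free" or "pro"
    state: str = "running"         # running -> sleeping -> recycled
    idle_days: float = 0.0
    files: dict[str, str] = field(default_factory=dict)  # path -> kind

    def tick(self, days_idle: float) -> None:
        """Advance idle time: sleep when idle, recycle past the plan's retention window."""
        self.idle_days += days_idle
        if self.state == "running" and days_idle > 0:
            self.state = "sleeping"  # files stay intact while sleeping
        if self.state == "sleeping" and self.idle_days > RETENTION_DAYS[self.plan]:
            self.state = "recycled"

    def wake(self) -> "Sandbox":
        """Resume a sleeping sandbox, or rebuild a recycled one with key files only."""
        if self.state == "recycled":
            kept = {p: k for p, k in self.files.items() if k in RESTORED_KINDS}
            return Sandbox(plan=self.plan, files=kept)
        self.state, self.idle_days = "running", 0.0
        return self
```

Suppose a free-plan sandbox holds `report.pdf` (an artifact) and `scratch.py` (temp). If it sits idle for 8 days, it is recycled, and waking it returns a fresh sandbox containing only `report.pdf`.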

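Privacy and data-sharing risks
The Sandbox is treated as the user's "personal computer" and may contain sensitive information. Manus has a strict privacy policy and does not read or share data without authorization, but users should be mindful of how they share.

1. Sharing vs. collaboration:
  · Sharing (via the "Share" button): recipients see only the conversation messages and output artifacts. The Sandbox is completely invisible, so there is no risk of leakage.
  · Collaboration (inviting specific users in): collaborators can send instructions, control execution, and access or modify Sandbox files through the AI, which can lead to data leakage. Connectors are automatically disabled so collaborators cannot access them.

2. Best practices:
  · Before adding collaborators, check whether the Sandbox contains sensitive content.
  · If a task already holds sensitive information, create a new task, copy over only what is necessary, and then invite collaborators.
  · Avoid sending sensitive personal data in collaborative sessions.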
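The first best practice is to check the Sandbox for sensitive content before inviting collaborators, and part of that check can be automated with a pattern scan. This is a minimal sketch: `scan_before_invite` and its three patterns are illustrative assumptions, and the post describes a manual check, not this tool. A regex scan will miss plenty, so treat its hits as prompts for review rather than a guarantee.

```python
import re
import tempfile
from pathlib import Path

# Illustrative patterns only; a real review should go well beyond these.
SENSITIVE_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    "email_address": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def scan_before_invite(root: Path) -> list[tuple[str, str]]:
    """List (relative path, pattern name) for each sensitive-looking hit under root."""
    findings: list[tuple[str, str]] = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        text = path.read_text(encoding="utf-8", errors="ignore")
        for name, pattern in SENSITIVE_PATTERNS.items():
            if pattern.search(text):
                findings.append((path.relative_to(root).as_posix(), name))
    return findings


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "notes.md").write_text("aws key: AKIAIOSFODNN7EXAMPLE\n", encoding="utf-8")
        (root / "app.py").write_text("print('hello')\n", encoding="utf-8")
        for file, kind in scan_before_invite(root):
            print(f"review {file} ({kind}) before inviting collaborators")
```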

Manus

4d ago

We wrote a deep dive on the Manus Sandbox

→ How the environment is designed
→ How your data stays private
→ How sharing and collaboration are handled differently

If you're trusting an AI with real work, you should know what's under the hood.

Read it here:
manus.im/blog/manus-san…

向阳乔木

4d ago

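This article is impressive: it lays out very clearly how organizations can use AI to work more effectively.

It's extremely long, so I've rewritten about half of it to give you a taste. I recommend reading the original.

---

You may have noticed a paradox.

AI makes individuals astonishingly productive, but inside a company the effect is heavily diluted.

Why?

Because work in a company is fundamentally not something one person can do alone.

It requires collaboration, negotiation, and escalating decisions, with judgment continually re-aligned over time.

However smart an AI is, if it can only work solo, in an organization it is just a tool for "local optimization."

The article is mainly about how AI evolves from a "personal assistant" into "organizational intelligence."

Context is not a treasure hidden somewhere

Many people believe that if you give AI enough context, it will understand how an organization works.

The assumption is that an organization's context is a complete, structured thing, like fossils buried in rock strata, just waiting to be dug up.

In reality, most organizations don't work that way at all.

Context doesn't live in a database, or in a document, or even in the boss's head.

It is continually created and lost through interaction.

What gets decided in today's meeting may change tomorrow because of a single email.

To understand an organization, AI can't just "read the materials." It has to take part and observe, like a person would in email, meetings, and documents. It has to see how decisions unfold, how conflicts escalate, and how consensus forms.

That is what real "context learning" means.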

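The history of human collaboration is the future of AI

In Sapiens, Yuval Noah Harari argues that humans came to dominate the planet not because individuals were smarter, but because we learned to cooperate at scale.

We invented myths, laws, money, and religions: "shared stories" that let strangers align their behavior.

Science followed the same path.

Before the 17th century, scientific knowledge was fragmented and spread through private letters and books. Errors kept circulating, and discoveries were repeatedly lost.

The turning point wasn't a new theory but the emergence of collaboration systems such as scientific journals, learned societies, and peer review.

Knowledge began to accumulate because judgment became a social process.

The telephone was no different.

Early telephones were connected point to point; you had to know where the line ran to place a call.

Once the network grew, that approach broke down.

The solution? Switchboard operators.

They sat at switchboards, connecting calls by hand, remembering who was calling whom, which calls were more urgent, and how to resolve conflicts.

The telephone could scale because of this "human intermediary layer."

Software development went through the same stage.

Before Git, collaboration on code was fragile.

CVS and SVN were centralized: people changing the same code had to take turns, and conflicts were costly.

Git made branching cheap, made history a first-class citizen, and made conflicts visible and resolvable.

GitHub then added a layer of social collaboration: PRs, code review, issue discussions.

The pattern is clear: individual capability arrives first, but exponential productivity only takes off once collaboration structures emerge.

AI is at that point now.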

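Organizations won't reorganize around "roles" but around "collaboration units"

Many people imagine a future in which AI takes over certain jobs and humans do the rest.

The author disagrees.

AI isn't bound by human constraints such as attention, bandwidth, specialization, and hierarchy; none of these apply to it.

So future organizations won't be designed around "roles" but around "collaboration units."

Take legal work.

The core of legal work is establishing a "shared position."

A contract goes through multiple rounds of negotiation among lawyers, partners, and clients, and positions evolve along the way.

Today, much of a senior partner's value lies in "remembering": prior precedents, risks, and how positions have shifted.

In the future, AI will take on this coordination work.

It tracks every open issue, spots conflicting positions, and escalates judgment calls to the right people.

Legal teams will be restructured. Many AI agents will handle mechanical drafting and information gathering, while a few senior partners handle decisions, risk judgment, and client relationships.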
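The coordination role described for legal work can be sketched as a small tracker: it tracks open issues, spots conflicting positions, and escalates them to the right person. Everything here is hypothetical. That includes the `Position` record, the issue-to-owner mapping, and the rule that an issue is in conflict whenever the parties' latest stances differ.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Position:
    issue: str   # e.g. "liability cap"
    party: str   # who stated the position
    stance: str  # what they currently want


class IssueTracker:
    """Hypothetical coordination layer: latest stance per party, per open issue."""

    def __init__(self, owners: dict[str, str]):
        self.owners = owners  # issue -> person who should make the judgment call
        self.latest: dict[str, dict[str, str]] = defaultdict(dict)

    def record(self, pos: Position) -> None:
        # A newer statement from the same party overwrites its earlier stance.
        self.latest[pos.issue][pos.party] = pos.stance

    def conflicts(self) -> dict[str, dict[str, str]]:
        """Issues where parties' latest stances still disagree."""
        return {issue: stances for issue, stances in self.latest.items()
                if len(set(stances.values())) > 1}

    def escalations(self) -> list[str]:
        """One @-mention per conflicting issue, routed to that issue's owner."""
        return [f"@{self.owners.get(issue, 'unassigned')}: resolve '{issue}' {stances}"
                for issue, stances in sorted(self.conflicts().items())]
```

Because each party's latest statement overwrites its earlier one, positions that converge during negotiation drop out of `conflicts()` on their own. Only live disagreements reach a human.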

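Or take marketing.

The challenge in marketing is "narrative consistency."

Product marketing, growth, brand, and sales each tell their own version of the story; how do you align them?

Today that happens through meetings, reviews, and informal influence.

In the future, AI will track the narrative across channels, detect drift, and escalate conflicts.

The human role shifts from "channel owner" to "narrative gatekeeper" and "setter of strategic intent."

Finance and product follow similar logic.

AI doesn't replace any particular job; it redistributes coordination work.

The fastest path:

Embed AI in the collaboration tools the organization already uses: email, messaging, browsers, documents.

These aren't "legacy systems"; they are the living infrastructure of work.

How intent is expressed, how disagreements surface, how decisions escalate, how accountability is recorded: all of it is encoded in these tools.

And escalation mechanisms are already built in: @mentions, annotations, comments, suggested edits, notifications. (AI can use them too.)

AI's job is not to invent new ways of collaborating, but to learn to participate and escalate within these existing mechanisms.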

Aatish Nayak

1w ago

x.com/i/article/2008…

Philipp Schmid

6d ago

The Gemini API now natively supports GCS buckets and external URLs (signed URLs from AWS or Azure) for improved file inputs!

🆕 Register Google Cloud Storage files directly without re-uploading data (max 2 GB).
🆕 Fetch content from public URLs or AWS S3 and Azure pre-signed URLs (max 100 MB).
📄 Supports PDF, JSON, HTML, CSS, XML, and major image formats.
🔐 Secure GCS access via Service Agent permissions and OAuth.
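Before handing a GCS or external URL to the API, a client can pre-check the limits listed above. This sketch does not call the Gemini API and makes several assumptions:

* sizes use binary units (2 GiB and 100 MiB);
* `gs://` means GCS and `https://` means an external URL;
* the exact MIME list, especially the image types, still needs to be verified against the official file-input docs.

```python
from urllib.parse import urlparse

# Limits quoted in the post; binary units are an assumption.
MAX_GCS_BYTES = 2 * 1024**3          # registered Google Cloud Storage files
MAX_EXTERNAL_BYTES = 100 * 1024**2   # public URLs, S3 / Azure pre-signed URLs

# "PDF, JSON, HTML, CSS, XML, and major image formats"; the image subset is assumed.
SUPPORTED_MIME = {
    "application/pdf", "application/json", "text/html", "text/css",
    "application/xml", "text/xml",
    "image/png", "image/jpeg", "image/webp",
}


def check_file_input(uri: str, size_bytes: int, mime_type: str) -> list[str]:
    """Client-side pre-check; returns a list of problems (empty means it looks OK)."""
    scheme = urlparse(uri).scheme
    if scheme == "gs":
        limit = MAX_GCS_BYTES
    elif scheme == "https":
        limit = MAX_EXTERNAL_BYTES
    else:
        return [f"unsupported URI scheme: {scheme or '(none)'}"]
    problems = []
    if size_bytes > limit:
        problems.append(f"{size_bytes} bytes exceeds the {limit}-byte limit for {scheme} inputs")
    if mime_type not in SUPPORTED_MIME:
        problems.append(f"MIME type {mime_type} not in supported list")
    return problems
```

Failing fast on the client avoids a round trip for inputs that would be rejected anyway. The authoritative limits are still the ones the API enforces.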

LlamaIndex 🦙

5d ago

Can filesystem tools really replace vector search? We put agentic file exploration to the test against traditional RAG.

Our experiment with fs-explorer agent vs. hybrid RAG revealed some surprising insights about when each approach shines:

🏃 RAG is faster - averaging 3.81 seconds quicker thanks to fewer LLM calls and consistent network requests
🎯 Filesystem agents are more accurate - scoring 2 points higher on correctness by accessing full file context instead of chunked fragments
📈 Scale changes everything - at 100-1000 documents, RAG outperforms filesystem exploration in speed and maintains quality
⚖️ Context matters most - filesystem tools excel with smaller files that fit in the LLM's context window, while RAG handles massive document collections

The verdict? It depends on your use case. Filesystem agents work great for smaller, focused document sets where accuracy trumps speed. RAG remains king for large-scale applications requiring real-time responses.
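The verdict can be turned into a rough routing heuristic. This is only a sketch: the 100-document cutoff echoes the 100-1000 range mentioned above. The "largest file fits in half the context window" rule and all parameter names are assumptions and were not part of the experiment.

```python
from dataclasses import dataclass


@dataclass
class Corpus:
    num_docs: int
    max_doc_tokens: int  # size of the largest single document


def choose_retrieval(corpus: Corpus, context_window: int, latency_sensitive: bool,
                     doc_limit: int = 100, context_fraction: float = 0.5) -> str:
    """Pick 'filesystem' or 'rag'; thresholds are illustrative assumptions.

    Filesystem exploration is favored when whole files fit comfortably in context,
    the collection is small, and accuracy matters more than latency.
    Otherwise RAG, which was faster and held quality as collections grew.
    """
    fits = corpus.max_doc_tokens <= context_window * context_fraction
    small = corpus.num_docs < doc_limit
    if fits and small and not latency_sensitive:
        return "filesystem"
    return "rag"
```

For example, 20 documents of at most 30K tokens each with a 200K-token window route to `"filesystem"`. The same files route to `"rag"` once the collection reaches 500 documents or the caller needs real-time responses.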

Read the full experimental analysis by @itsclelia and see the results for yourself: llamaindex.ai/blog/did-files…rKq
