This article describes how ByteDance's DataMind project builds a unified "AI + Data" engine on top of Apache Doris. DataMind targets two limitations of traditional solutions: handling unstructured data and integrating with AI models. It provides hybrid search capabilities, including Faiss-based vector indexing and Tablet-level BM25 scoring functions. It also adds AI functions such as AI_QUERY and TEXT_EMBEDDING, and supports Python UDFs for custom model requirements. The article then covers how GraphRAG (Graph Retrieval-Augmented Generation) construction and querying are implemented on DataMind to strengthen complex knowledge retrieval. Finally, it presents an enterprise AI-powered data query platform that addresses practical challenges such as massive data volume, security, and query latency by optimizing data agent routing and connecting to the data lake permission system, laying a foundation for applying AI to enterprise data consumption.
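To make the idea of hybrid search concrete, the following is a minimal in-memory sketch of combining a BM25 keyword ranking with a vector-similarity ranking. It is not DataMind's or Apache Doris's implementation: brute-force cosine similarity stands in for a Faiss index, and reciprocal rank fusion (RRF) is an assumed fusion method, since the summary does not say how the two scores are combined. All function names (`bm25_scores`, `rrf_fuse`, `hybrid_search`) and the toy data are hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    # Document frequency: number of documents that contain each term.
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query_terms:
            if tf[t] == 0:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1 - b + b * len(d) / avg_len)
            s += idf * tf[t] * (k1 + 1) / norm
        scores.append(s)
    return scores


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rank_by(scores):
    """Return document ids ordered by descending score (ties by id)."""
    return sorted(range(len(scores)), key=lambda i: (-scores[i], i))


def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion (assumed here; not confirmed by the source)."""
    fused = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused, key=lambda d: (-fused[d], d))


def hybrid_search(query_terms, query_vec, docs, embeddings):
    """Fuse a keyword ranking and a vector ranking into one result list."""
    keyword_rank = rank_by(bm25_scores(query_terms, docs))
    vector_rank = rank_by([cosine(query_vec, e) for e in embeddings])
    return rrf_fuse([keyword_rank, vector_rank])


# Toy corpus: tokenized text plus made-up 2-d embeddings.
docs = [
    ["doris", "vector", "index"],
    ["bm25", "text", "search"],
    ["graph", "rag", "query"],
]
embeddings = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]

results = hybrid_search(["text", "search"], [0.8, 0.6], docs, embeddings)
print(results)
```

Rank-based fusion is used in this sketch because BM25 scores and cosine similarities live on different scales; a production engine might instead normalize and weight the raw scores.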



