7 Emerging Trends in Large AI Models

This article explores seven key trends in large models: compute investment, multimodality, open source, trustworthy AI, personal AI applications, AI Agents, and Intelligence as a Service, highlighting both the advances and the challenges in AI development.

1. The Miracle of Massive Compute Investment Has Not Peaked

DeepSeek's low-cost training run, at $5.57 million, caused a global stir. However, it does not overturn the fundamental logic that large models require massive compute. The single training run of this model cost about one-eighth as much as comparable foreign models, which is not a difference of orders of magnitude; its real significance lies in engineering innovation, reproducing the performance of existing models far more cost-effectively. The earlier, exaggerated contrasts between Chinese and U.S. training costs set the hundreds of billions of dollars Americans spend in total on data center construction, chip purchases, networking, and scientist salaries against DeepSeek's cost for a single training run, a comparison inflated by hype and emotion.

Using ever more compute to probe the upper limits of large models remains an industry consensus. GPT-5 and Llama 4-level models are expected abroad in the first half of this year. The U.S. is vigorously building large compute clusters: Elon Musk's xAI has already assembled the world's largest cluster of 200,000 H100s and trained the Grok 3 model on it. Google is expected to invest $75 billion this year, a 43% increase, mostly in compute center construction; Meta is expected to invest $60-65 billion, up 53%-66%; Amazon is expected to invest $100 billion, up more than 20%. In addition, Japan's SoftBank Group, OpenAI, and Oracle have jointly launched the Stargate Project, which will invest $500 billion over the next four years to build ultra-large compute infrastructure in the U.S. These investments will drive further breakthroughs in pre-training, and combined with currently popular post-training enhancements such as reinforcement learning, the leap in model capabilities may accelerate further. Many experts predict that AGI (Artificial General Intelligence) could arrive within the next two to three years.

High-end chip supply remains a bottleneck for China's next-generation large models, and there is again a risk of insufficient training-chip supply. Although the number and capability of China's high-end AI chip companies have improved since last year, with firms such as Huawei, Suiyuan Technology, Moore Threads, Haiguang, and Biren designing domestic chips comparable to Nvidia's A100, the manufacturing of domestic high-end chips still faces challenges due to TSMC's suspension of 7nm capacity supply and the HBM export bans.

[Figure: The steps that Claude 3.5 Haiku used to solve a simple math problem]

2. Slow Thinking and Multimodality Become Standard

More Fields Will Experience Their AlphaGo Moment

Post-training, including reinforcement learning, has unlocked the potential accumulated during pre-training, and slow thinking has markedly improved reasoning capabilities. Spurred by the DeepSeek effect, large model companies in China and abroad are accelerating their next-generation releases: OpenAI's foundational model GPT-4.5 and reasoning model o3; Anthropic's Claude 3.7, a hybrid reasoning model that integrates deep thinking with rapid output; Google's Gemini 2.0 and the more powerful reasoning model Gemini 2.5 Pro; and xAI's Grok 3. Domestically, Tencent's Hunyuan has released the strong reasoning model T1, which combines fast and slow thinking and is the first to apply the hybrid Mamba architecture losslessly to ultra-large reasoning models, significantly reducing training and inference costs. DeepSeek has released an updated model, DeepSeek-V3-0324, which scores higher than GPT-4.5 on math- and code-related benchmarks.
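Hybrid fast/slow designs of the kind described above hinge on deciding which queries warrant expensive multi-step reasoning. Below is a minimal sketch of such a router, using a purely illustrative keyword-and-length heuristic; real systems learn this routing from data rather than hard-coding it.

```python
# Toy router for a fast/slow hybrid setup: short factual queries take the
# fast path (direct answer), while queries that look like multi-step
# reasoning take the slow path (extended chain-of-thought).
# The heuristic is illustrative only; production routers are learned.

def route(query: str) -> str:
    """Return 'slow' for queries that likely need multi-step reasoning."""
    reasoning_cues = ("prove", "derive", "step by step", "how many")
    q = query.lower()
    if len(q.split()) > 40 or any(cue in q for cue in reasoning_cues):
        return "slow"
    return "fast"

print(route("What is the capital of France?"))                   # fast
print(route("Prove that the sum of two odd numbers is even."))   # slow
```

A production system would route on a learned difficulty score and fall back to the slow path whenever the fast model's confidence is low.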

Multimodality is the natural state of the human world, so large models must inevitably develop toward it, expanding from text, images, video, and 3D alone to sound, light, electricity, and even molecules and atoms, achieving understanding and generation of the real world. Native multimodality is the future direction. The recently released Google Gemini 2.0 Flash can edit a picture from a single sentence, rivaling professional Photoshop edits; GPT-4o's latest stylized text-to-image capability has gone viral online. Tencent's newly open-sourced Hunyuan 3D model supports both text-to-3D and image-to-3D, allowing one-click skin changes, animation adjustments, and 3D game video generation.

With the leap in model capabilities, it is foreseeable that more fields will experience their AlphaGo moment, where large models surpass 90% of humans, or even the best human performance, in a given field. OpenAI's o1 achieved near-perfect scores on the American Invitational Mathematics Examination and surpassed doctoral-level accuracy on physics, biology, and chemistry benchmarks. Anthropic CEO Dario Amodei recently predicted that AI will be able to write 90% of code within the next 3-6 months.

| Trend | Key Points | Examples/Data |
| --- | --- | --- |
| Massive Compute Investment | Training costs still high but optimized; global compute cluster expansion; AGI expected in 2-3 years; chip supply bottleneck for China | DeepSeek: $5.57M training cost; xAI: 200k H100 cluster; SoftBank/Oracle: $500B Stargate project |
| Slow Thinking & Multimodality | Hybrid reasoning models emerging; native multimodality becoming standard; AlphaGo moments across fields | GPT-4.5/Claude 3.7 reasoning models; Gemini 2.0: image editing via text; DeepSeek-V3 beats GPT-4.5 on math/code |
| Model Capabilities | Post-training unlocks potential; combining fast/slow thinking architectures; surpassing human performance benchmarks | Mamba architecture reduces costs; AI writing 90% of code soon (Anthropic prediction); doctoral-level science accuracy achieved |

3. Model Open Source and Open Protocols Become New Competitive Components

The long-debated question of open source versus closed source has tipped toward open source. DeepSeek's popularity owes partly to its open-source nature: it adopts the MIT License, which permits full open sourcing and unrestricted commercial use with no application required, allowing developers worldwide to use and evaluate it and quickly building global influence through word of mouth. OpenAI, which had firmly taken the closed-source path, is now being forced to reconsider. Sam Altman recently acknowledged that the closed-source strategy may have been on the wrong side of history, openly solicited open-source proposals on social platforms, and plans to release on-device open-source models and an o3-mini-level open-source model in the future.

Meta abroad, and Tencent, Alibaba, and Zhipu at home, have long embraced open-source strategies. For example, Tencent's Hunyuan text-to-image model is the industry's first open-source Chinese-native DiT-architecture text-to-image model, and its text-to-video model is currently the largest open-source video model, fully open-sourced including model weights, inference code, and algorithms. Communities such as Hugging Face have become important platforms for large model developers worldwide, hosting 1.52 million open-source models and 337,000 open datasets.

Equally important are open protocols for large models, analogous to HTTP during the rise of the internet, which let any web page display in a unified format in the browser and made information easier to access. An open data-communication protocol for large models makes it easier for them to call various tools and thereby complete tasks autonomously. The recently popular MCP (Model Context Protocol), released by Anthropic last November, is one such protocol, becoming a bridge between large models and tools of all kinds.
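To make the bridge concrete, here is a minimal sketch of an MCP-style tool call. MCP is built on JSON-RPC 2.0, and the `tools/call` method and result shape follow the published spec; the one-function server and the `get_weather` tool are toy stand-ins, not an actual MCP SDK.

```python
# Simplified MCP-style exchange: the model sends a JSON-RPC "tools/call"
# request; the server dispatches to a registered function and returns the
# result as text content. A toy in-process server replaces real transport.
import json

TOOLS = {"get_weather": lambda city: f"Sunny in {city}"}  # toy registry

def handle(request_json: str) -> str:
    """Dispatch a tools/call request to the matching registered tool."""
    req = json.loads(request_json)
    name = req["params"]["name"]
    args = req["params"]["arguments"]
    result = TOOLS[name](**args)
    return json.dumps({"jsonrpc": "2.0", "id": req["id"],
                       "result": {"content": [{"type": "text", "text": result}]}})

request = json.dumps({
    "jsonrpc": "2.0", "id": 1, "method": "tools/call",
    "params": {"name": "get_weather", "arguments": {"city": "Shenzhen"}},
})
print(handle(request))
```

The value of the protocol is exactly this uniformity: any model that can emit a `tools/call` request can use any tool a server advertises, without bespoke glue code per integration.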

[Figure: Shared features exist across English, French, and Chinese, indicating a degree of conceptual universality]

4. In the Post-Truth Era, Building Trustworthy Large Models is Urgent

The impact of technology on knowledge and information has extended for the first time from the dissemination and interaction stages to the production stage. The accuracy and professionalism of large model knowledge output, i.e., the "trustworthiness" of large models, are becoming core competitive indicators of artificial intelligence.

While large models bring an abundance of information, the noise issues such as hallucinations in the content also trouble users. A study by the Columbia Journalism Review found that generative AI models used for news searches in the U.S. have serious accuracy issues. Researchers tested eight AI search tools with real-time search capabilities and found that more than 60% of the news source queries were incorrect.

The hallucination problem is inherent in the underlying technical path of large models: the same probabilistic generation that enables their creativity also produces fabrications, so it is difficult to solve through technology alone. Introducing authoritative books, journals, news, and papers, and building new "trustworthy" knowledge consensus mechanisms and supply systems, are key to the future value of large models in production and everyday life.

OpenAI signed a five-year contract with News Corp last year, gaining access to historical content from the group's media outlets, including The Wall Street Journal, Barron's, The Times, and The Daily Telegraph, to enhance the credibility of its models' responses.

Tencent's Hunyuan is collaborating with leading traditional publishers such as the Encyclopedia of China Publishing House, People's Medical Publishing House, Shanghai Cihai Publishing House, and Chemical Industry Press to launch book agents and explore trustworthy large model cooperation built on search-enhancement technology. In the Yuanbao app's agent plaza, for example, the People's Medical agent gives users authoritative answers in specific medical domains such as cardiovascular and cerebrovascular disease, complete with citations to the relevant books, and can link out to e-book reading platforms or physical book purchase pages. Mechanisms such as footnotes, endnotes, and literature indexing ensure the consensus and accuracy of the output, while creating a sustainable win-win model for publishers and large model platforms.
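The search-enhancement pattern behind such book agents can be sketched as retrieve-then-cite: pull passages from a trusted corpus, answer from them, and attach footnote-style source citations. The two-entry corpus and word-overlap retriever below are hypothetical placeholders for a real index and embedding search.

```python
# Retrieve-then-cite sketch: answer only from retrieved trusted passages
# and append numbered footnotes tracing each claim to its source.
# CORPUS entries and the overlap-based retriever are illustrative only.

CORPUS = [
    {"id": 1, "source": "Cardiology Handbook",
     "text": "Hypertension raises stroke risk."},
    {"id": 2, "source": "Pharmacology Reference",
     "text": "Aspirin inhibits platelet aggregation."},
]

def retrieve(query: str, corpus: list) -> list:
    """Naive keyword-overlap retrieval; a stand-in for embedding search."""
    terms = set(query.lower().split())
    return [d for d in corpus if terms & set(d["text"].lower().split())]

def answer_with_citations(query: str) -> str:
    docs = retrieve(query, CORPUS)
    body = " ".join(f'{d["text"]} [{d["id"]}]' for d in docs)
    footnotes = "\n".join(f'[{d["id"]}] {d["source"]}' for d in docs)
    return f"{body}\n{footnotes}"

print(answer_with_citations("does hypertension raise stroke risk"))
```

Grounding the answer in retrieved passages is what gives publishers a role in the pipeline: their vetted text becomes the only material the model is allowed to state as fact.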

In the future, those who can access more trustworthy data sources and build trustworthy evaluation and consensus mechanisms will gain a leading edge in the era of human-machine content co-creation.

5. Personal Applications Under the "Intelligence + Internet" Logic Are Expected to Trigger a Matthew Effect

The release of foundational large models like GPT-4.5, DeepSeek V3, and Tencent Turbo S, and reasoning models like OpenAI o3, DeepSeek R1, and Tencent T1, marks the evolution of foundational large models to a usable stage, pushing personal applications to a new starting point.

In the past, the thinness of personal applications stemmed mainly from the limits of foundational models, which were unsatisfactory at complex problem analysis, multimodal generation, and understanding, and so failed to delight users. Moreover, the data these applications collected was mostly user-preference data, which could not feed back into improving foundational model intelligence; applications that bought traffic and users therefore failed to build a moat, with low switching costs and weak stickiness.

Against the backdrop of relatively mature foundational model capabilities, the platform effect that succeeded in the mobile internet era is expected to operate again. More users of an AI application accumulate more high-quality shared knowledge, user feedback, and social interaction, letting the application continuously improve and attract still more users in a virtuous cycle. Take Tencent Yuanbao: after adopting a DeepSeek + Hunyuan dual-model engine strategy, its user numbers surged, with DAU (Daily Active Users) growing more than 20-fold from February to March this year.

China's leading advantage in applications is expected to grow, with productivity tools becoming ever more powerful and companion and entertainment apps continuously refining the experience. According to investment firm a16z's March report on the global Top 50 generative AI applications, 11 applications from Chinese companies made the list, up from only 3 last August. AI-native search, text-to-image/video tools, and role-playing applications are the three hottest directions.

However, personal application innovation still faces "the bitter lesson": people repeatedly try to improve performance through engineering, only to be surpassed by simply stacking more compute. Continuous improvements in model capability will "eat up" many application innovations; workflow applications in particular are easily displaced by new model capabilities. Deepening the application moat demands more first-principles thinking: embedding key nodes of the user decision chain to raise value, strengthening users' emotional identification, and improving irreplaceability through ecosystem synergy. Technological iteration is the spear, scenario penetration the shield, and ecosystem synergy the soil. Personal applications must sometimes run faster to keep up with improving model capabilities, and sometimes slow down to think about where models are heading, building a "dynamic capability portfolio" of technology + scenario + ecosystem.

6. The Endpoint of Personal AI Applications is the Super Intelligent Assistant

Each upgrade in foundational model capability unlocks new application depth. The first wave of large models, represented by ChatGPT, excelled at dialogue and gave rise to applications like the AI search engine Perplexity. The second wave, represented by Claude 3.5 Sonnet, excelled at programming, driving the popularity of Cursor, now valued at $10 billion, and the rising coding star Devin. The third wave, represented by OpenAI's o1, excels at deep reasoning, making Agent applications possible. With continuing breakthroughs in multimodality and reinforcement learning, model quality has improved markedly while costs keep falling; Agent applications will foreseeably accelerate into more vertical fields, opening a new era of human-machine collaboration.

The era of Agents is approaching. The recent popularity of the domestic Manus application has raised industry expectations for AI Agents. Meanwhile, OpenAI's computer-use agent Operator and its Deep Research agent have begun commercial trials, moving from the laboratory to the mass market. According to foreign media reports, OpenAI plans to sell low-end Agents at $2,000 per month to "high-income knowledge workers," mid-range Agents for software development at $10,000 per month, and high-end, doctoral-level research Agents at $20,000 per month. Gartner predicts that by 2028, 33% of enterprise software applications will include agentic AI, up from less than 1% in 2024, and at least 15% of day-to-day work decisions will be made autonomously by AI Agents. The AI Agent market is projected to grow from $5.1 billion in 2024 to $47.1 billion in 2030.
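At their core, Operator-style agents run a loop: the model proposes an action, the runtime executes it, and the observation is fed back until the model finishes. A minimal sketch follows, with a scripted `policy` standing in for a real LLM call and a single toy `search` tool; both are illustrative assumptions, not any vendor's API.

```python
# Minimal agent loop: propose action -> execute tool -> observe -> repeat.
# `policy` is a scripted stand-in for an LLM deciding the next action.

def policy(history):
    """Toy policy: search once, then finish with what was found."""
    if not history:
        return {"tool": "search", "arg": "Q1 revenue"}
    return {"tool": "finish", "arg": history[-1][1]}

TOOLS = {"search": lambda q: f"result for '{q}'"}  # toy tool registry

def run_agent(max_steps: int = 5):
    history = []  # (action, observation) pairs fed back to the policy
    for _ in range(max_steps):
        action = policy(history)
        if action["tool"] == "finish":
            return action["arg"]
        observation = TOOLS[action["tool"]](action["arg"])
        history.append((action, observation))
    return None  # step budget exhausted

print(run_agent())
```

The `max_steps` budget is the loop's safety valve: it is also why agent workloads multiply token consumption, since every step replays the growing history through the model.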

As Agent applications deepen, token consumption will rise by a factor of hundreds or more, driving an explosion in inference compute demand that will surpass training demand. To improve energy efficiency and cut costs, large cloud and model providers such as Google, Amazon, Meta, and OpenAI are accelerating custom ASIC programs, which are developing into an important technical route alongside Nvidia GPUs. Morgan Stanley predicts the AI ASIC market will grow from $12 billion in 2024 to $30 billion in 2027, a 34% compound annual growth rate. At the same time, widespread Agent use will require models to handle much larger contexts, posing greater challenges for foundational model capabilities.

7. Intelligence as a Service is the Ultimate Direction of Industry Implementation

In the form of the cloud, intelligence will become a service that any industry can call on demand, ultimately forming a new model of Intelligence as a Service (IaaS). Just as electricity consumption and cloud spending once measured economic development and digitalization, "token usage" may come to measure levels of intelligence.

The popularity of models like DeepSeek has brought a comprehensive upgrade in model quality, stimulating a new wave of enterprise enthusiasm for large models across China. Yet a gap remains between Chinese and American enterprises in generative AI adoption. Chinese deployments are mostly experimental and far from large-scale use, whereas American deployments are broader and deeper: 24% of American enterprises had fully implemented generative AI in 2024, versus 19% in China. The U.S. government and enterprises generally deploy AI on public cloud, enabling rapid iteration, with over 70% of organizations using cloud AI. Driven by this, the latest quarterly cloud revenues of large American companies grew rapidly: Microsoft reached $40.9 billion, up 21%; Amazon $28.786 billion, up 19%; and Google $11.96 billion, up 30%.

High cost-effectiveness is driving industry applications deeper. In the more than two years since ChatGPT's release, model performance has kept improving while inference costs have fallen sharply. The API price of GPT-4o, for example, is $20 per million output tokens, a two-thirds decrease from its release; DeepSeek V3 costs $8 per million tokens, and Tencent's Hunyuan multimodal model Turbo S as little as $2 per million tokens. With performance up and prices down, large-scale deployment across industries has become highly cost-effective. In the past two months, industry adoption has been significant: more than 30 sectors, including government affairs, finance, healthcare, education, media, and culture and tourism, have deployed large models, greatly improving efficiency and reconstructing existing processes. Organizations including Shenzhen Bao'an Government Affairs, Shenzhen Medical Insurance, the Shanghai Xuhui Urban Operations Center, Shenzhen University, Ruijin Hospital, Shanghai Pharmaceutical, Chongqing Rural Commercial Bank, and Honor have actively deployed and explored large model applications. The Shenzhen Bao'an government affairs deployment, for example, covers 31 business scenarios across citizen services, enterprise services, government administration, and social governance, integrates more than 60 model capabilities, and can quickly stand up new intelligent applications as business needs arise.
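A back-of-the-envelope comparison using the per-million-token output prices quoted above (figures as stated in the text, not live pricing) shows how price differences compound at scale; the 50-million-token daily volume is an assumed example.

```python
# Per-million-token output prices as quoted in the text (not live pricing).
PRICE_PER_MILLION = {"GPT-4o": 20.0, "DeepSeek V3": 8.0, "Hunyuan Turbo S": 2.0}

def cost(model: str, tokens: int) -> float:
    """Dollar cost of generating `tokens` output tokens with `model`."""
    return PRICE_PER_MILLION[model] * tokens / 1_000_000

# Assumed fleet generating 50M output tokens per day:
for model in PRICE_PER_MILLION:
    print(f"{model}: ${cost(model, 50_000_000):.2f}/day")
```

At this assumed volume the spread is $1,000 versus $100 per day, which is why "token usage" works as a consumption metric: cost scales linearly with tokens at a model-specific rate.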

In industry applications, high-quality data is the efficiency moat. Industry models need high-quality in-house and industry data more than ever, because industry applications demand accurate, professional knowledge and have zero tolerance for hallucination. Investment in data governance pays off disproportionately, yet it requires substantial effort, is often seen as unglamorous grunt work, and is the most easily overlooked part of industry deployment.

In the future, large models will not only develop deeply in various industries but also achieve a three-dimensional evolution of deep applications through cross-domain collaboration, SME empowerment, and social system reshaping: from "scene adaptation" to "value creation," large models will upgrade from efficiency tools to business growth engines; from "information silos" to "ecological integration," cross-domain data collaboration will expand application boundaries; from "enterprise-level applications" to "social system reconstruction," technology penetration will enter deep waters, triggering comprehensive changes in enterprise and social organizational models, employment and distribution structures, and social ethics.