Throughout tech history, major breakthroughs often occur independently, each sparking its own revolution. However, when two powerful technologies converge, their synergy can catalyze extraordinary progress. Today, we stand at such a crossroads: AI and crypto, both transformative in their own right, are joining forces.
We envision crypto solutions addressing numerous AI challenges, AI agents constructing self-governing economic networks to accelerate crypto adoption, and AI propelling the evolution of existing crypto technologies. Many eyes are on this intersection, and large amounts of money are pouring in, fueled by the excitement around these buzzwords.
But amid all this excitement, we know surprisingly little about the basics. How well does AI really understand crypto? Are LLM-powered agents actually able to use crypto tools? How do different models perform in cryptographic tasks? The answers to these questions are crucial for guiding the direction of products and technology in this emerging field.
But we don’t know.
An experiment
To address these fundamental questions, an experimental evaluation was conducted. A total of 18 large language models, including mainstream commercial and open-source models, were assessed, with parameter sizes ranging from 3.8B to 405B.
Closed-source models: GPT-4o, GPT-4o Mini, Claude 3.5 Sonnet, Gemini 1.5 Pro, Grok2 beta (currently closed-source)
Open-source models: Llama 3.1 8B/70B/405B, Mistral Nemo 12B , DeepSeek-coder-v2, Nous-hermes2, Phi3 3.8B/14B, Gemma2 9B/27B, Command-R, Qwen2-math-72, MathΣtral
The study aimed to assess the current state of AI’s crypto applications and evaluate the potential and challenges of AI-crypto integration. Given the early stage of this research, this article focuses on key insights rather than specific result data.
The experiments revealed that AI models possess a comprehensive understanding of crypto fundamentals and demonstrate extensive familiarity with the crypto ecosystem. These models also exhibited proficiency in the knowledge required for executing various basic wallet operations. With appropriate prompts, not only did their capabilities improve significantly, but they also demonstrated the ability to perform complex analyses and operations as instructed. These findings collectively suggest that the development of AI applications for numerous crypto-related domains is now a viable prospect.
However, several key limitations were also uncovered. There’s a significant gap between the models’ theoretical knowledge and practical application skills, particularly in crypto-related calculations. While capable of generating simple smart contracts, they struggled to identify complex vulnerabilities in more intricate protocols. Moreover, the models couldn’t resolve the fundamental challenge of securely managing private keys in cloud-based AI systems.
Diving Deeper
The Math Gap: One of the most notable findings was the universal struggle of AI models with crypto-related calculations. This isn’t just about complex cryptography; even basic operations such as calculating AMM slippage or mining profitability proved challenging. However, it’s important to note that large language models are not inherently designed for mathematical computations. This limitation can be addressed by loading preset code to bypass the LLM’s direct calculations, thereby improving efficiency and accuracy. This approach is similar to how humans often handle complex calculations, relying on specialized tools or pre-established formulas.
The Security Dilemma: While AI models demonstrated a solid grasp of crypto security principles, the reality of implementing secure systems with AI remains problematic. The need for cloud-based processing in many AI systems creates an inherent conflict with the decentralized, trustless nature of cryptocurrencies. Solving this will require third party services such as TEE, HSM, or even more innovative new technologies.
Smart Contracts: Form over Function: AI models demonstrated an impressive ability to comprehend smart contracts and explain their functionalities. They could effectively modify contracts to address common vulnerabilities and optimization points, and even autonomously create contracts for simple scenarios. However, when it came to vulnerabilities deeply embedded in complex business logic, all models failed to identify them. This indicates that the models’ understanding of smart contracts remains largely superficial, focusing on form rather than grasping the intricacies of the underlying business logic. While AI shows promise in contract interaction and basic creation, it’s clear that human expertise is still crucial for ensuring the security and efficiency of complex smart contract systems.
The Open Source Challenge: The significant performance gap between top closed-source models and most open-source alternatives raises important questions about the future of AI in crypto. Given the crypto community’s emphasis on openness and decentralization, bridging this gap is crucial for widespread adoption.
Strong Foundations and Potential: Despite the challenges, models demonstrated a deep understanding of crypto fundamentals and showed great familiarity with the crypto ecosystem. With appropriate prompts, their capabilities improved significantly. This suggests a strong foundation for AI in crypto, with models showing impressive grasp of concepts like blockchain architecture, consensus mechanisms, and tokenomics. The significant improvement with guided prompts indicates that current AI models, while not perfect, are already capable of providing valuable insights and assistance in many crypto-related tasks, from market analysis to protocol design evaluation.
Looking Ahead: The Need for Crypto-AI Benchmarks
As the experiments progressed, a glaring need became apparent: the crypto space needs standardized AI benchmarks. Just as ImageNet revolutionized computer vision AI, crypto-specific benchmarks could drive rapid progress in this fusion of technologies.
If it is believed that the intersection of AI and cryptographic technology holds immense potential, and AI is anticipated to drive widespread adoption of crypto, then establishing dedicated benchmarks for the crypto domain becomes an urgent priority. These benchmarks could serve as a crucial bridge connecting the AI and crypto fields, catalyzing innovation and providing clear guidance for future applications. This endeavor is more than just a technical exercise; it’s a profound reflection on how this emerging digital frontier is understood and shaped.
Creating such benchmarks, however, is no small feat. It faces several significant challenges: the rapid evolution of crypto technology, with its knowledge base still in flux and lacking consensus on multiple core directions; the interdisciplinary nature of the field, encompassing cryptography, distributed systems, economics, and more, with a complexity far exceeding that of any single domain; the need to assess not just theoretical knowledge but also AI’s practical ability to utilize crypto technologies, requiring the design of novel evaluation frameworks; the necessity of ensuring that benchmark tasks remain relevant to real-world applications in DeFi, NFTs, DAOs, and other emerging crypto sectors, and the scarcity of relevant datasets, which further compounds the difficulty.
Given the scale and complexity of these challenges, it becomes clear that this is not a task that can be tackled in isolation. The multifaceted nature of the problem demands a diverse range of expertise and perspectives. It calls for a collaborative effort from across the crypto and AI communities. Only through this collective wisdom can we define what truly matters in this emerging technological frontier and create benchmarks that accurately reflect the complexity and potential of AI in the crypto domain.
Current Status and Next Steps
The current research framework consists of several key components:
An MVP dataset of approximately 700 multiple-choice questions, collaboratively generated by AI and humans, and subsequently verified and refined by human experts. This dataset, despite its quality limitations, enables rapid automated testing of models, demonstrating conceptual understanding and providing a basic scoring mechanism.
A growing collection of about 100 complex tasks, covering scenarios such as simulations, calculations, code audits, and tool usage. These tasks, contributed by multiple crypto domain experts, add depth and authenticity to the assessment.
To establish an effective benchmark, the dataset requires substantial expansion, necessitating contributions from a broader range of domain experts. The development of suitable automated evaluation frameworks for these complex tasks is also a key challenge to be addressed.
Furthermore, to enable LLMs to tackle real-world task challenges in the future, the implementation of a basic Agent framework is essential. This framework will provide a more realistic testing environment, bridging the gap between theoretical knowledge and practical application.
The methodology is undergoing continuous refinement, focusing on enhancing test case sophistication and expanding the overall dataset. In the spirit of open collaboration, all related resources will be made publicly available on GitHub soon, aiming to accelerate progress and invite broader community participation.
It’s crucial to note that this research remains in its early stages. The findings should be viewed as preliminary observations and starting points for further investigation, rather than definitive conclusions in the rapidly evolving AI and crypto landscape. The project welcomes contributions from the wider crypto community to help build a more comprehensive and robust evaluation framework.