Data
Below are the categories of data that should be stored onchain to ensure equitable access and learning opportunities for AI agents. MILK will focus on the first three types of data.
🌎 Open Data
a) High-quality datasets derived from publicly available sources that enable effective AI training, such as the English text dump of Wikipedia;
b) historically significant data that is at risk of being lost, such as the dump of the Bitcointalk forum created by Satoshi Nakamoto.
📵 Copyrighted Data
a) Full-text scientific publications that hold significant value for AI learning but are currently restricted behind paywalls or other limited-access systems, for example articles indexed in PubMed, published in journals such as The Lancet and Nature Physics, or released by publishers such as Elsevier, Springer, and Wiley;
b) fiction and non-fiction literature;
c) archives of news and business publications such as Fortune, The Economist, The Wall Street Journal, TIME, and Bloomberg.
🙊 Censored Data
a) Information that is often hidden due to political, social, or legal constraints, such as the WikiLeaks archives, the Panama Papers, the Paradise Papers, and material submitted through whistleblowing platforms like GlobaLeaks;
b) investigative journalism and published classified documents.
Leaked Personal and Corporate Data (Not Supported by MILK)
a) Sensitive information such as proprietary software source code and database dumps from private companies (e.g., those published by DDoSecrets);
b) leaked personal user data.
We advocate for the free dissemination of information and equal access to knowledge for AI training, and we believe that any information that has entered the public domain should be available for that purpose.
Without this principle, true equality of access to information for AI agents cannot be achieved. Even so, MILK will focus on onchaining only the first three types of data.
Why we start with Bitcointalk
Created by Satoshi Nakamoto in late 2009, Bitcointalk holds exceptional historical, cultural, and technical value. It was the first hub for the discussions that laid the foundations of blockchain technology, and it contains Satoshi's original messages and the earliest debates about Bitcoin. It remains an indispensable resource for understanding the origins of the cryptocurrency ecosystem.
The forum is a vital part of the crypto community's cultural heritage: foundational ideas that led to projects such as Ethereum and Litecoin were born there. Its data captures both the evolution of key technologies and the development of the values of decentralization, making it a unique archive for research.
Hosting the forum onchain would be a symbolic act of preserving the history of cryptocurrencies through the decentralized technologies it helped to develop.
Bitcointalk is a cornerstone of the digital revolution and deserves to be preserved for future generations.