In 2023, the world witnessed the global popularity of ChatGPT. The advent of a new generation of artificial intelligence, represented by generative AI, has changed the development trajectory of artificial intelligence (AI) technology and applications, accelerated the interaction between humans and AI, and marks a new milestone in the history of AI development. What trends will AI technology and applications show in 2024? Let's look ahead at the major trends worth watching.
The Tsinghua University research team's breakthrough
Rendering of the fully analog optoelectronic intelligent computing chip. After long-term joint research, the Tsinghua University research team broke through the physical bottleneck of traditional chips, creatively proposed a new optoelectronic fusion computing framework, and developed the world's first fully analog optoelectronic intelligent computing chip (ACCEL). (Photo: Xinhua News Agency)
Trend 1: Moving from large AI models to general artificial intelligence
In 2023, OpenAI, the developer of ChatGPT, found itself in the spotlight like never before, pushing the development of GPT-4's successors to the forefront. According to sources, OpenAI is training the next generation of artificial intelligence, temporarily named "Q*" (pronounced "Q-Star"). The next generation of OpenAI products may be released in the New Year.
“Q-Star” may be the first artificial intelligence to be trained “from scratch.”
According to media reports, "Q-Star" may be the first artificial intelligence trained "from scratch." Its distinguishing features are that its intelligence does not come from human activity data, and that it can modify its own code to adapt to more complex learning tasks. The former makes the development of artificial intelligence capabilities increasingly opaque, while the latter has long been regarded as a necessary condition for the "singularity" of artificial intelligence. In AI development, the "singularity" refers specifically to the point at which machines can iterate on themselves, develop rapidly in a short period, and move beyond human control.
"Q-Star" can currently only solve elementary school-level mathematics problems
Some reports say that "Q-Star" can only solve elementary school-level mathematics problems, which would put it still far from the "singularity." However, given that the iteration speed of artificial intelligence in virtual environments may be far beyond imagination, AI that surpasses human levels in various fields could still emerge independently in the near future. In 2023, OpenAI predicted that artificial intelligence exceeding human levels in all respects would appear within ten years; NVIDIA founder Jensen Huang said that general artificial intelligence may surpass humans within five years.
It can be used to solve various complex scientific problems
Once general artificial intelligence is realized, it could be applied to various complex scientific problems, such as the search for extraterrestrial life and habitable planets, controlled nuclear fusion, the screening of nano- or superconducting materials, and anti-cancer drug development. These problems usually take human researchers decades to solve, and the volume of research in some frontier fields already exceeds the limits of human resources. General artificial intelligence has almost unlimited time and energy in its virtual world, making it possible to replace human researchers in tasks that are easy to virtualize. But by then, how humans will supervise artificial intelligences that surpass them in intelligence, and ensure that they do not harm humans, is another question worth considering.
Silicon Valley giants
Of course, we should not overrate some of the remarks made by Silicon Valley giants: in the history of artificial intelligence we have been through three "AI winters," and there are many examples of grand technological visions that came to nothing due to various constraints. What is certain at present is that large-model technology still has plenty of room for improvement. Besides GPT-4, Google's Gemini and Anthropic's Claude 2 are large models second only to GPT-4, while Baidu's "Wenxin Yiyan" and Alibaba's "Tongyi Qianwen" are the leaders among China's domestic large models. Whether they will release more revolutionary products in the New Year is worth looking forward to.
Trend 2: Synthetic data overcomes the data bottleneck
The data bottleneck refers to the limitation of high-quality data that can be used to train AI. Synthetic data is expected to break this bottleneck.
Synthetic data
Synthetic data is generated by machine learning models that apply mathematical and statistical principles to imitate real data. A relatively easy-to-understand metaphor: synthetic data is like a textbook written specifically for AI. Fictional names such as "Xiao Ming" and "Xiao Hong" may appear in the dialogues of an English textbook, but this does not affect students' ability to master English. In that sense, for students, textbooks can be treated as a kind of "synthetic data" that has been compiled, filtered, and processed.
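As a toy illustration of this idea (a minimal sketch, not any production pipeline; the dataset and the `synthesize` helper are invented for this example), the snippet below fits simple statistics to a small "real" dataset and then samples synthetic records that imitate its distribution without copying any individual record:

```python
import random
import statistics

# Toy "real" dataset: ages of hypothetical users (invented for illustration).
real_ages = [23, 25, 31, 35, 38, 41, 44, 52, 57, 60]

# Fit simple statistics to the real data.
mu = statistics.mean(real_ages)
sigma = statistics.stdev(real_ages)

def synthesize(n, mu, sigma, seed=0):
    """Sample n synthetic ages from a normal model fitted to the real data."""
    rng = random.Random(seed)
    return [round(rng.gauss(mu, sigma)) for _ in range(n)]

synthetic_ages = synthesize(1000, mu, sigma)

# The synthetic data imitates the real distribution without reproducing
# any actual record from it.
print(round(statistics.mean(synthetic_ages), 1))  # close to the real mean
```

Real synthetic-data pipelines use far more sophisticated generative models, but the principle is the same: the synthetic records preserve the statistical properties useful for training while containing no actual person's data.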
Some papers show that the "chain-of-thought" ability, that is, step-by-step logical reasoning, only emerges once a model reaches at least 62 billion parameters. But so far, humans have not produced enough high-quality, non-repetitive data for such training. Using generative AI such as ChatGPT to produce high-quality synthetic data in unprecedented quantities will enable future AI to achieve higher performance.
Data security is another driver of synthetic data
In addition to the demand for large amounts of high-quality data, data security is an essential factor behind the popularity of synthetic data. In recent years, countries have introduced stricter data security laws, making it objectively cumbersome to train artificial intelligence on human-generated data. Not only may personal information be hidden in such data, but much of it is also protected by copyright. At a time when Internet privacy and copyright protection still lack unified standards and mature frameworks, training on Internet data can easily lead to a large number of legal disputes. Desensitizing the data instead raises its own challenges of screening and identification accuracy. Faced with this dilemma, synthetic data becomes the most cost-effective option.
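To see why desensitization runs into screening-accuracy problems, consider a naive rule-based masking pass (a toy sketch; the patterns and the sample text are invented for illustration):

```python
import re

# Naive rule-based de-identification (illustrative only): mask e-mail
# addresses and phone-like digit runs before data is used for training.
PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[EMAIL]"),
    (re.compile(r"\b\d{3}[- ]?\d{4}[- ]?\d{4}\b"), "[PHONE]"),
]

def desensitize(text: str) -> str:
    """Replace every matched identifier with a placeholder token."""
    for pattern, token in PATTERNS:
        text = pattern.sub(token, text)
    return text

sample = "Contact Xiao Ming at xiaoming@example.com or 138-1234-5678."
print(desensitize(sample))
# The e-mail address and phone number are masked, but the personal name
# "Xiao Ming" slips through -- the screening-accuracy problem in miniature.
```

Covering names, addresses, ID numbers, and indirect identifiers across languages and formats is precisely what makes accurate desensitization at Internet scale so costly.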
Artificial intelligence may learn harmful content
In addition, training on human data may also teach artificial intelligence harmful content. Some of it is dangerous knowledge, such as methods for making bombs or controlled chemicals from everyday items; the rest consists of bad habits that artificial intelligence should not have, such as human-like laziness in executing tasks, lying to please users, and bias and discrimination. If synthetic data is used instead, minimizing exposure to harmful content during training, it is expected to overcome these shortcomings of human data.
Analysis
As the above analysis shows, synthetic data is quite groundbreaking and is expected to resolve the earlier conflict between artificial intelligence development and data privacy protection. At the same time, it raises new challenges for China: how to ensure that companies and institutions produce synthetic data responsibly, and how to create a synthetic data training set that is both in line with the country's culture and values and comparable in scale and technical level to the English-centered materials of the West.
A significant change brought by synthetic data is that big data from human society may no longer be necessary for AI training. In the future digital world, the generation, storage, and use of human data will still follow the laws and order of human society, including maintaining national data security, keeping commercial secrets, and respecting personal privacy. Meanwhile, the synthetic data required for AI training will follow another set of management standards.
Trend 3: Quantum computers may be the first to be used in artificial intelligence
A hidden concern: insufficient computing power
Artificial intelligence is the most cutting-edge application of electronic computers to date, but it has always faced the hidden concern of insufficient computing power. A few months after ChatGPT came out, OpenAI CEO Sam Altman publicly stated that he did not encourage more users to register. In November 2023, OpenAI announced that it would suspend new paid ChatGPT subscriptions to ensure that existing users enjoy a high-quality experience. If even the world's most powerful AI runs into bottlenecks in computing power, then discussing the application of quantum computers in artificial intelligence becomes a promising future direction.
Parallel Computing
First of all, most algorithms in the artificial intelligence field involve parallel computing. In playing Go, for example, AlphaGo must consider the opponent's responses to moves in many different positions in order to find the move most likely to win, which requires the computer to maximize the efficiency of parallel computing. Quantum computers are good at parallel computing because a qubit can hold and process the states "0" and "1" simultaneously, without consuming additional computing resources as electronic computers do, for example by connecting multiple computing units or scheduling computing tasks over time. The more complex the computing task, the more advantageous quantum computing becomes.
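The "two states at once" property can be sketched with a classical simulation (illustrative only: simulating qubits on an electronic computer gives no speedup, and the `uniform_superposition` helper is invented for this example). An n-qubit register is described by 2^n amplitudes, so a single gate operation acts on exponentially many basis states at once:

```python
import numpy as np

# Hadamard gate: sends |0> into an equal superposition of |0> and |1>.
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)

def uniform_superposition(n: int) -> np.ndarray:
    """State of n qubits after applying H to each one, starting from |00...0>."""
    gate = np.array([[1.0]])
    for _ in range(n):
        gate = np.kron(gate, H)   # tensor H onto the gate once per qubit
    state = np.zeros(2**n)
    state[0] = 1.0                # the |00...0> basis state
    return gate @ state

state = uniform_superposition(3)
# 3 qubits -> 2**3 = 8 amplitudes, all equal in magnitude: the register
# represents every bit string from 000 to 111 simultaneously.
print(len(state))
```

Note the exponential cost on the classical side: simulating n qubits needs a 2^n-element state vector, which is exactly why tasks a quantum chip handles natively quickly overwhelm electronic computers.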
Secondly, the hardware conditions required to run ChatGPT are also very similar to those of large quantum computers: both must be installed in a highly integrated computing center and managed and supported by a professional technical team.
What is a Quantum Computer?
A quantum computer is a physical device that follows the laws of quantum mechanics to perform high-speed mathematical and logical operations and to store and process quantum information. It is enormous, and its core component, the "quantum chip," usually needs to be kept at a temperature close to absolute zero (minus 273.15 degrees Celsius). Information is calculated and processed using the quantum properties that some microscopic particles exhibit at such extremely low temperatures, and the computation results persist for only milliseconds.
Since quantum computers are "big and difficult to maintain," why are they still being developed? The reason is that quantum computers hold enormous computing power potential, so much so that some algorithms have already demonstrated an "absolute crushing" speed advantage over electronic computers, known as "quantum superiority." But achieving "quantum superiority" is just a starting point: current quantum computers can only complete certain computing tasks specific to the quantum field. To truly exploit this superiority, a quantum computer must have enough qubits to achieve general computing and programmability. Moreover, after realizing general computing, quantum computers must still maintain their edge over electronic computers, an edge called "quantum advantage."
Quantum Machine Learning
In 2022, researchers from Google, Microsoft, Caltech, and other institutions proved in principle that "quantum advantage" does exist, in tasks such as predicting observables, quantum principal component analysis, and quantum machine learning. Quantum machine learning is the application of quantum computing to artificial intelligence, and it reflects the future trend of convergence between these two cutting-edge technologies.
With "quantum advantage" proven in theory, the next task is to further expand the application prospects of quantum computing. In December 2023, the US quantum computing giant IBM launched "Quantum System Two," the successor to its 2019 commercial quantum computer "Quantum System One." The most significant breakthrough of the new system is that it can be expanded: it is the company's first modular quantum computer and has more than 1,000 qubits. IBM also announced plans to build a 100,000-qubit quantum computer within ten years. These ever-growing qubit counts are not just for competition; they are indispensable for realizing general computing and programmability. Because of this, the modularity of quantum computers makes them more practical.
Research on quantum machine learning algorithms has thus become a new hotspot. Even so, quantum computers will not completely replace electronic computers in the future. More likely, the two will exploit their respective strengths in different application scenarios and develop in coordination, greatly improving computing power while also taking cost and feasibility into account.