September 8, 2024


Even though computers are made to do math faster than any human could manage, the top level of formal mathematics remains an exclusively human domain. But a breakthrough by researchers at Google DeepMind has brought AI systems closer than ever to beating the best human mathematicians at their own game.

A pair of new systems, called AlphaProof and AlphaGeometry 2, have teamed up to tackle questions from the International Mathematical Olympiad, a global math competition for secondary school students that has been running since 1959. The Olympiad takes the form of six astonishingly hard questions each year, covering fields including algebra, geometry and number theory. Winning a gold medal puts you among the best handful of young mathematicians in the world.

The combined efforts of DeepMind’s two systems weren’t quite in that league. After their answers were checked by Prof Timothy Gowers – a winner of the Fields Medal, the maths equivalent of the Nobel Prize, and himself an Olympiad gold medalist – the DeepMind team scored 28 out of 42: enough for a silver medal, but one point short of gold.

Unlike a human mathematician, the systems were either flawless or hopeless. In each of the questions they solved, they achieved perfect marks, but for two of the six questions they could not even begin to work towards an answer. Moreover, unlike human competitors, DeepMind was not given any time limit. While students get nine hours to tackle the problems, the DeepMind systems worked around the clock for three days to solve one question, even though they solved another in seconds.

The two systems that worked on the challenge are very different from each other. AlphaProof, which solved three of the problems, works by coupling a large language model – of the kind used in consumer chatbots – to a specialist “reinforcement learning” approach, like the one DeepMind used to tackle the board game Go. The trick is to use a pre-existing approach called “formal mathematics,” a set of rules that let you write a mathematical proof as a program that can only run if it’s true.
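To give a sense of what “a proof as a program” means, here is a minimal sketch in Lean 4, the formal language DeepMind has said AlphaProof works in. The statements and the theorem name are illustrative assumptions chosen for this example; they are not taken from AlphaProof’s output or from the Olympiad.

```lean
-- Illustrative only: two tiny statements with machine-checked proofs.
-- `add_comm_example` is a made-up name for this sketch.

-- Claim: adding two natural numbers gives the same result in either order.
-- The proof cites `Nat.add_comm` from Lean's core library.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- A concrete fact that Lean confirms by direct computation.
example : 2 + 2 = 4 := rfl

-- A false claim such as `example : 2 + 2 = 5 := rfl` would be rejected
-- by the checker, so the file as a whole would not compile.
```

The file only compiles if every proof in it is valid, which is what gives a system like AlphaProof an automatic yes-or-no signal on each attempt.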

“What we’re trying to do is build a bridge between these two spheres,” said Thomas Hubert, the lead on AlphaProof, “so that we can take advantage of the guarantees that come with formal mathematics and the data that is available in informal mathematics.” After being trained on a large number of math problems written in English, AlphaProof used its knowledge to try to generate specific proofs in the formal language. Because those proofs can be verified as either true or false, it is possible to teach the system to improve itself. The approach can solve difficult problems but is not always quick to do so: although it is much better than simple trial and error, it took three days to find the correct formal model for one of the most difficult questions in the challenge.

The other system, AlphaGeometry 2, similarly couples a language model with a more mathematically inclined approach. But its success at the narrower field of geometry problems was striking: it solved its problem in just 19 seconds. And, says Gowers, it chose a surprising route to do so. “There have been some legendary examples of [computer-aided] proofs that are longer than Wikipedia. It wasn’t that: we’re talking about a very short, human-style output.”

The lead on AlphaGeometry 2, Thang Luong, described the output as similar to the famous “move 37” in DeepMind’s historic victory at Go, when the AI system made a move that no human would have thought of, and went on to win. AlphaGeometry 2’s proof involved constructing a circle around another point, then using that circle to prove the overall answer. “At first, our expert didn’t quite understand why it built that point at all,” Luong said. “But after they looked at the solution, it really connects a lot of triangles together, and they thought that the solution was really, really elegant.”

AlphaGeometry 2’s easiest question…

Let ABC be a triangle with AB < AC < BC. Let the incenter and incircle of triangle ABC be I and ω, respectively. Let X be the point on the line BC different from C such that the line through X parallel to AC is tangent to ω. Similarly, let Y be the point on the line BC different from B such that the line through Y parallel to AB is tangent to ω. Let AI intersect the circumcircle of triangle ABC again at P ≠ A. Let K and L be the midpoints of AC and AB, respectively.

Prove that ∠KIL + ∠YPX = 180°.

Solved within 19 seconds.

… and AlphaProof’s hardest one

Turbo the snail plays a game on a board with 2024 rows and 2023 columns. There are hidden monsters in 2022 of the cells. At first, Turbo does not know where any of the monsters are, but he knows that there is exactly one monster in every row except the first row and the last row, and that each column contains at most one monster.

Turbo makes a series of attempts to go from the first row to the last row. On each attempt, he chooses to start on any cell in the first row, then repeatedly moves to an adjacent cell that shares a common side. (He is allowed to return to a previously visited cell.) If he reaches a cell with a monster, his attempt ends and he is transported back to the first row to start a new attempt. The monsters don’t move, and Turbo remembers whether or not each cell he visited contains a monster. If he reaches any cell in the last row, his attempt ends and the game is over.

Determine the minimum value of n for which Turbo has a strategy that guarantees reaching the last row on the nth attempt or earlier, regardless of the locations of the monsters.

Unsolved.


