Science &amp; Technology /today/ en Researchers test the trustworthiness of AI—by playing sudoku /today/2025/07/28/researchers-test-trustworthiness-ai-playing-sudoku <span>Researchers test the trustworthiness of AI—by playing sudoku</span> <span><span>Daniel William…</span></span> <span><time datetime="2025-07-28T16:15:49-06:00" title="Monday, July 28, 2025 - 16:15">Mon, 07/28/2025 - 16:15</time> </span> <div> <div class="imageMediaStyle focal_image_wide"> <img loading="lazy" src="/today/sites/default/files/styles/focal_image_wide/public/2025-07/sudoku_image.jpeg?h=51be566f&amp;itok=tVY3BaRF" width="1200" height="800" alt="Hands filling in a sudoku grid with a pen"> </div> </div> <div role="contentinfo" class="container ucb-article-categories" itemprop="about"> <span class="visually-hidden">Categories:</span> <div class="ucb-article-category-icon" aria-hidden="true"> <i class="fa-solid fa-folder-open"></i> </div> <a href="/today/taxonomy/term/6"> Science &amp; Technology </a> </div> <a href="/today/daniel-strain">Daniel Strain</a> <div class="ucb-article-content ucb-striped-content"> <div class="container"> <div class="paragraph paragraph--type--article-content paragraph--view-mode--default"> <div class="ucb-article-text" itemprop="articleBody"> <div><p>Artificial intelligence tools called large language models (LLMs), such as OpenAI’s ChatGPT or Google’s Gemini, can do a lot these days—dispensing relationship advice, crafting texts to get you out of social obligations and even writing science articles. &nbsp;</p><p>But can they also solve your morning sudoku?</p><p>In a new study, a team of computer scientists from the University of Colorado Boulder decided to find out. The group created nearly 2,300 original sudoku puzzles, which require players to enter numbers into a grid following certain rules, then asked several AI tools to fill them in.</p><p>The results were a mixed bag. 
While some of the AI models could solve easy sudokus, even the best struggled to explain how they solved them—giving garbled, inaccurate or even surreal descriptions of how they arrived at their answers. The results raise questions about the trustworthiness of AI-generated information, said study co-author Maria Pacheco. &nbsp;</p><p>“For certain types of sudoku puzzles, most LLMs still fall short, particularly in producing explanations that are in any way usable for humans,” said Pacheco, assistant professor in the <a href="/cs" rel="nofollow">Department of Computer Science</a>. “Why did it come up with that solution? What are the steps you need to take to get there?”</p><p>She and her colleagues <a href="https://aclanthology.org/2025.findings-acl.155/" rel="nofollow">published their results this month</a> in Findings of the Association for Computational Linguistics.</p><div class="feature-layout-callout feature-layout-callout-medium"><div class="ucb-callout-content"> <div class="imageMediaStyle large_image_style"> <img loading="lazy" src="/today/sites/default/files/styles/large_image_style/public/2025-07/Pacheco_headshot_0.png?itok=_56WTAAG" width="1500" height="1500" alt="Maria Pacheco headshot"> </div> <span class="media-image-caption"> <p class="small-text">Maria Pacheco</p> </span> <div class="imageMediaStyle large_image_style"> <img loading="lazy" src="/today/sites/default/files/styles/large_image_style/public/2025-07/Somenzi_headshot.png?itok=DIWbBT1N" width="1500" height="1500" alt="Fabio Somenzi headshot"> </div> <span class="media-image-caption"> <p class="small-text">Fabio Somenzi</p> </span> <div class="imageMediaStyle large_image_style"> <img loading="lazy" src="/today/sites/default/files/styles/large_image_style/public/2025-07/Trivedi_headshot.png?itok=rFFPyUbZ" width="1500" height="1500" alt="Ashutosh Trivedi headshot"> </div> <span class="media-image-caption"> <p class="small-text">Ashutosh Trivedi</p> </span> </div></div><p>The researchers aren’t 
trying to cheat at puzzles. Instead, they’re using these logic exercises to explore how AI platforms think. The results could one day lead to more reliable and trustworthy computer programs, said study co-author Fabio Somenzi, professor in the <a href="/ecee" rel="nofollow">Department of Electrical, Computer and Energy Engineering</a>.</p><p>“Puzzles are fun, but they’re also a microcosm for studying the decision-making process in machine learning,” he said. “If you have AI prepare your taxes, you want to be able to explain to the IRS why the AI wrote what it wrote.”</p><h2>Daily puzzle</h2><p>Somenzi, who is a self-described sudoku fan, noted that the puzzles tap into a very human way of thinking. Filling out a sudoku grid requires puzzlers to learn and follow a set of logical rules. For example, you can’t enter a two in an empty square if there’s already a two in the same row or column.</p><p>Most LLMs today struggle at that kind of thinking, in large part because of how they’re trained.</p><p>To build ChatGPT, for example, programmers first fed the AI almost everything that had ever been written on the internet. When ChatGPT responds to a question, it predicts the most likely response based on all that data—almost like a computer version of rote memory.</p><p>“What they do is essentially predict the next word,” Pacheco said. “If you have the start to a sentence, what word comes next? 
They do that by referring to every sentence in the English language that they can get their hands on.”</p><p>Pacheco, Somenzi and their colleagues have joined a growing effort in computer science to merge those two ways of thinking—combining the memory of an LLM with a human brain’s capacity for logic, a pursuit known as <a href="/today/node/54798" rel="nofollow">“neurosymbolic” AI</a>.</p><p>Anirudh Maiya and Razan Alghamdi, both former graduate students at CU Boulder, were also co-authors of the new paper.</p><h2>How’s the weather?</h2><p>To begin, the researchers created sudoku puzzles of varying difficulty using a six-by-six grid (a simpler version of the nine-by-nine puzzles you usually find online).</p><p>They then gave the puzzles to a series of AI models, including the preview of OpenAI’s o1 model, which in 2024 represented the state of the art for its kind of LLM.</p><p>The o1 model led the pack, solving roughly 65% of the sudoku puzzles correctly. Then the team asked the AI platforms to explain how they got their answers. That’s when the results got really wild.</p><p>“Sometimes, the AI explanations made up facts,” said Ashutosh Trivedi, a co-author of the study and associate professor of computer science at CU Boulder. “So it might say, 'There cannot be a two here because there’s already a two in the same row,' but that wasn’t the case.”</p><p>In a telling example, the researchers were talking to one of the AI tools about solving sudoku when it, for unknown reasons, responded with a weather forecast.</p><p>“At that point, the AI had gone berserk and was completely confused,” Somenzi said.</p><p>The researchers hope to design their own AI system that can do it all—solving complicated puzzles and explaining how. 
They’re starting with another type of puzzle called hitori, which, like sudoku, involves a grid of numbers.</p><p>“People talk about the emerging capabilities of AI where they end up being able to solve things that you wouldn’t expect them to solve,” Pacheco said. “At the same time, it’s not surprising that they’re still bad at a lot of tasks.”</p></div> </div> </div> </div> </div> <div>A team of computer scientists discovered that some AI large language models can solve sudoku puzzles, but even the best ones struggle to explain how they did it.</div> Mon, 28 Jul 2025 22:15:49 +0000 Daniel William Strain 54987 at /today Faster, cleaner, better: Revolutionary water treatment /today/2025/07/25/faster-cleaner-better-revolutionary-water-treatment <span>Faster, cleaner, better: Revolutionary water treatment</span> <span><span>Megan Maneval</span></span> <span><time datetime="2025-07-25T06:59:14-06:00" title="Friday, July 25, 2025 - 06:59">Fri, 07/25/2025 - 06:59</time> </span> <div> <div class="imageMediaStyle focal_image_wide"> <img loading="lazy" src="/today/sites/default/files/styles/focal_image_wide/public/2025-07/img_0597.jpeg?h=51692cae&amp;itok=vTTiYCyV" width="1200" height="800" alt="Anthony Straub with PhD student Kian Lopez in a lab"> </div> </div> <div role="contentinfo" class="container ucb-article-categories" itemprop="about"> <span class="visually-hidden">Categories:</span> <div class="ucb-article-category-icon" aria-hidden="true"> <i class="fa-solid fa-folder-open"></i> </div> <a 
href="/today/taxonomy/term/6"> Science &amp; Technology </a> </div> <span>College of Engineering and Applied Science</span> <div class="ucb-article-content ucb-striped-content"> <div class="container"> <div class="paragraph paragraph--type--article-content paragraph--view-mode--default"> <div class="ucb-article-text" itemprop="articleBody"> <div><p>Anthony Straub is making revolutionary advances in water purification for life on Earth and in space with nanoscale membranes—thinner than 1/100th the width of a human hair.</p></div> </div> </div> </div> </div> <script> window.location.href = `/engineering/faster-cleaner-better-revolutionary-water-treatment`; </script> Fri, 25 Jul 2025 12:59:14 +0000 Megan Maneval 54991 at /today New quantum physics and AI-powered microchip design software awarded grants /today/2025/07/23/new-quantum-physics-and-ai-powered-microchip-design-software-awarded-grants <span>New quantum physics and AI-powered microchip design software awarded grants </span> <span><span>Amber Elise Carlson</span></span> <span><time datetime="2025-07-23T22:02:09-06:00" title="Wednesday, July 23, 2025 - 22:02">Wed, 07/23/2025 - 22:02</time> </span> <div> <div class="imageMediaStyle focal_image_wide"> <img loading="lazy" src="/today/sites/default/files/styles/focal_image_wide/public/2025-07/Sanghamitra_Neogi.CC15.JPG?h=fbd9a9b0&amp;itok=e5b9z-_j" width="1200" height="800" alt="Woman speaking into microphone at a business pitch event"> </div> </div> <div role="contentinfo" class="container ucb-article-categories" itemprop="about"> <span 
class="visually-hidden">Categories:</span> <div class="ucb-article-category-icon" aria-hidden="true"> <i class="fa-solid fa-folder-open"></i> </div> <a href="/today/taxonomy/term/4"> Business &amp; Entrepreneurship </a> <a href="/today/taxonomy/term/6"> Science &amp; Technology </a> </div> <a href="/today/amber-carlson">Amber Carlson</a> <div class="ucb-article-content ucb-striped-content"> <div class="container"> <div class="paragraph paragraph--type--article-content paragraph--view-mode--default"> <div class="ucb-article-text" itemprop="articleBody"> <div><p><span>Semiconductors—substances that can selectively conduct or block electricity—have been dubbed the “</span><a href="https://www.semiconductors.org/semiconductors-101/what-is-a-semiconductor/" rel="nofollow"><span lang="EN-US">brains of modern electronics</span></a><span>.” They form the building blocks of the chips that power electronic devices from laptops to smartphones and tablets to sports watches.</span></p><p><span>But semiconductors generate heat when they’re working, and they can easily get too hot, which hurts their performance and can damage them. While smaller chips are denser and more efficient at processing, they are harder to keep cool because of their size.</span></p><p><span>Sanghamitra Neogi, an associate professor in the Ann and H.J. Smead Aerospace Engineering Sciences department, is exploring ways to protect semiconductors and microchips from heat damage. 
She specializes in nanoscale semiconductors, which are so tiny their parts are measured in nanometers (billionths of a meter).</span></p> <div class="align-right image_style-medium_750px_50_display_size_"> <div class="imageMediaStyle medium_750px_50_display_size_"> <img loading="lazy" src="/today/sites/default/files/styles/medium_750px_50_display_size_/public/2025-07/Sanghamitra_Neogi.CC15.JPG?itok=PQfDNWwM" width="750" height="500" alt="Woman speaking into microphone at a business pitch event"> </div> <span class="media-image-caption"> <