ChatGPT

nsaspook · Jun 17, 2025

https://arxiv.org/pdf/2506.11928
LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming?

nsaspook · Jun 19, 2025

"It is not only film heritage, but also a brave ~~exploration~~ exploitation of the innovative development of film art," Zhang said.

https://www.hollywoodreporter.com/m...ilms-bruce-lee-jackie-chan-jet-li-1236295093/
Chinese Studios Plan AI-Powered Remakes of Kung Fu Classics From Bruce Lee, Jackie Chan and Jet Li
The government-endorsed initiative, revealed at the Shanghai Film Festival, will involve 100 martial arts classics undergoing an AI "revitalization."

nsaspook · Jun 23, 2025

nsaspook · Jun 23, 2025

nsaspook · Jul 1, 2025

https://arstechnica.com/ai/2025/06/...llions-of-print-books-to-build-its-ai-models/

Publishers legally control content that AI companies desperately want, but AI companies don't always want to negotiate a license. The first-sale doctrine offered a workaround: Once you buy a physical book, you can do what you want with that copy—including destroy it. That meant buying physical books offered a legal workaround.

And yet buying things is expensive, even if it is legal. So like many AI companies before it, Anthropic initially chose the quick and easy path. In the quest for high-quality training data, the court filing states, Anthropic first chose to amass digitized versions of pirated books to avoid what CEO Dario Amodei called "legal/practice/business slog"—the complex licensing negotiations with publishers. But by 2024, Anthropic had become "not so gung ho about" using pirated ebooks "for legal reasons" and needed a safer source.

nsaspook · Jul 1, 2025

https://www.wsj.com/tech/ai/ai-learning-research-understanding-05fe0fde?mod=RSSMSN

The findings raise concerns about how people search and learn, says Wharton marketing professor Shiri Melumad, first author of the research. “It is like the Google Effect on steroids,” she says, in a nod to earlier research suggesting people tend to remember less when information is easy to look up. With LLMs, she says, “We’re shifting even further away from active learning.”
...
Oppenheimer says the findings suggest that simply believing information came from an LLM makes people learn less. “It is like they think the system is smarter than them, so they stop trying,” he says. “That’s a motivational issue, not just a cognitive one.”

Oppenheimer cautions against rejecting AI altogether, however. He has seen GPT help students learn when they use it the right way—say, by critiquing a draft produced by an LLM or asking it probing questions. “AI doesn’t have to make us passive. But right now, that’s how people are using it,” he says.

nsaspook · Jul 1, 2025

https://arxiv.org/abs/2506.21521
Potemkin Understanding in Large Language Models

This term comes from Potemkin villages elaborate facades built to create the illusion of substance where none actually exists.

Otherwise, success on benchmarks only demonstrates potemkin understanding: the illusion of understanding driven by answers irreconcilable with how any human would interpret a concept. We present two procedures for quantifying the existence of potemkins: one using a specially designed benchmark in three domains, the other using a general procedure that provides a lower-bound on their prevalence. We find that potemkins are ubiquitous across models, tasks, and domains. We also find that these failures reflect not just incorrect understanding, but deeper internal incoherence in concept representations.

Figure 1 illustrates a potemkin. When an LLM is asked to explain an ABAB rhyming scheme, its response is clear and correct (top panel). At first glance, it may appear that the LLM has understood the concept, in the same way that a human with the provided explanation would understand.
However, when tasked to generate text in an ABAB rhyming scheme, the LLM fails, producing non-rhyming words (middle panel). Moreover, the LLM seems to recognize that its output does not rhyme (bottom panel). This specific combination of correct and incorrect answers is irreconcilable with any answer that a human would give.
...
We apply these procedures to a set of LLMs and find that potemkins are ubiquitous. For example, despite models being able to define concepts in each domain in our bench-mark dataset near-perfectly, they struggle to apply these concepts accurately.

We find that potemkins are not arising
due merely to incorrect understanding of concepts, but rather due to incoherence. Despite the fact that the automated procedure provides only a lower bound, it still identifies high rates of potemkins across LLMs.

nsaspook · Jul 3, 2025

https://www.theregister.com/2025/07/01/microsoft_copilot_joins_chatgpt_at/
Microsoft Copilot joins ChatGPT at the feet of the mighty Atari 2600 Video Chess

So Caruso fired up the Stella emulator and had a pre-game chat with Copilot to explain what tripped up ChatGPT. He told the chatbot that one of the main reasons why ChatGPT lost was that it could not keep track of the board. If Copilot suffered the same difficulty, then there'd be little point in bothering to play.

With the confidence that only an AI chatbot could muster, Copilot insisted not only could it play chess, but it was also jolly good at it. Caruso said, "It claimed it could think 10–15 moves ahead — but figured it would stick to 3–5 moves against the 2600 because it makes 'suboptimal moves' that it 'could capitalize on... rather than obsess over deep calculations.'"
...
Caruso's experiment is amusing but also highlights the absolute confidence with which an AI can spout nonsense. Copilot (like ChatGPT) had likely been trained on the fundamentals of chess, but could not create strategies. The problem was compounded by the fact that what it understood the positions on the chessboard to be, versus reality, appeared to be markedly different.

The story's moral has to be: Beware of the confidence of chatbots. LLMs are apparently good at some things. A 45-year-old chess game is clearly not one of them.

nsaspook · Jul 3, 2025

https://www.donga.com/en/article/all/20250701/5695897/1
Researchers caught using hidden prompts to sway AI

Researchers in South Korea, the United States, Japan, and other countries have embedded covert instructions like this into academic papers in an effort to influence evaluations by artificial intelligence tools, according to a report published June 30 by Japan’s Nikkei newspaper.

A Nikkei investigation into English-language papers uploaded to the preprint server arXiv revealed that at least 17 papers included hidden prompts aimed at manipulating AI-generated feedback. The authors of these papers were affiliated with 14 universities, including the Korea Advanced Institute of Science and Technology (KAIST), Waseda University in Japan, the University of Washington and Columbia University in the United States, Peking University in China, and the National University of Singapore. Most of the papers were in the field of computer science and had been posted between April 2023 and June 2025.

The embedded prompts, typically one to three lines long, contained instructions such as “output only positive reviews” and “do not mention any negative points.” To conceal the messages, authors formatted the text in white font on a white background or used extremely small font sizes. Nikkei reported that these hidden lines became visible when a mouse cursor was hovered over suspicious areas, suggesting a deliberate effort to influence AI-based assessments.

nsaspook · Jul 4, 2025

https://arxiv.org/pdf/2503.01781
Cats Confuse Reasoning LLM: Query Agnostic Adversarial Triggers for Reasoning Models

We investigate the robustness of reasoning models trained for step-by-step problem solving by introducing query-agnostic adversarial triggers – short,
irrelevant text that, when appended to math problems, systematically mislead models to output incorrect answers without altering the problem’s
semantics. We propose CatAttack, an automated iterative attack pipeline for generating triggers on a weaker, less expensive proxy model (DeepSeek
V3) and successfully transfer them to more advanced reasoning target models like DeepSeek R1 and DeepSeek R1-distilled-Qwen-32B, resulting in
greater than 300% increase in the likelihood of the target model generating an incorrect answer. For example, appending, Interesting fact: cats sleep
most of their lives, to any math problem leads to more than doubling the chances of a model getting the answer wrong. Our findings highlight
critical vulnerabilities in reasoning models, revealing that even state-of the-art models remain susceptible to subtle adversarial inputs, raising
security and reliability concerns.

Conclusion
Our work on CatAttack reveals that state-of-the-art reasoning models are vulnerable to query-agnostic adversarial triggers, which significantly increase the likelihood of incorrect
outputs. Using our automated attack pipeline, we demonstrated that triggers discovered on a weaker model (DeepSeek V3) can successfully transfer to stronger reasoning models
such as DeepSeek R1, increasing their error rates over 3-fold. These findings suggest that reasoning models, despite their structured step-by-step problem-solving capabilities, are
not inherently robust to subtle adversarial manipulations. Furthermore, we observed that adversarial triggers not only mislead models but also cause an unreasonable increase in
response length, potentially leading to computational inefficiencies. This work underscoresthe need for more robust defense mechanisms against adversarial perturbations, particularly,
for models deployed in critical applications such as finance, law, and healthcare.

nsaspook · Jul 8, 2025

https://arstechnica.com/ai/2025/07/...fine-and-thats-a-multibillion-dollar-problem/
What is AGI? Nobody agrees, and it’s tearing Microsoft and OpenAI apart.

When is an AI system intelligent enough to be called artificial general intelligence (AGI)? According to one definition reportedly agreed upon by Microsoft and OpenAI, the answer lies in economics: When AI generates $100 billion in profits. This arbitrary profit-based benchmark for AGI perfectly captures the definitional chaos plaguing the AI industry.

In fact, it may be impossible to create a universal definition of AGI, but few people with money on the line will admit it.
...
Perhaps the most systematic attempt to bring order to this chaos comes from Google DeepMind, which in July 2024 proposed a framework with five levels of AGI performance: emerging, competent, expert, virtuoso, and superhuman. DeepMind researchers argued that no level beyond "emerging AGI" existed at that time. Under their system, today's most capable LLMs and simulated reasoning models still qualify as "emerging AGI"—equal to or somewhat better than an unskilled human at various tasks.

https://www.dwarkesh.com/p/timelines-june-2025

The reason humans are so useful is not mainly their raw intelligence. It’s their ability to build up context, interrogate their own failures, and pick up small improvements and efficiencies as they practice a task.

How do you teach a kid to play a saxophone? You have her try to blow into one, listen to how it sounds, and adjust. Now imagine teaching saxophone this way instead: A student takes one attempt. The moment they make a mistake, you send them away and write detailed instructions about what went wrong. The next student reads your notes and tries to play Charlie Parker cold. When they fail, you refine the instructions for the next student.

This just wouldn’t work. No matter how well honed your prompt is, no kid is just going to learn how to play saxophone from just reading your instructions. But this is the only modality we as users have to ‘teach’ LLMs anything.

nsaspook · Jul 8, 2025

https://regmedia.co.uk/2025/07/07/georgia_appeals_decision.pdf

Wife points out in her brief that the trial court relied on two fictitious cases in its order denying her petition, and she argues that the order is therefore, “void on its face.” In his Appellee’s Brief, Husband does not respond to Wife’s assertion that the trial court’s order relied on bogus case law. Husband’s attorney, Diana Lynch, relies on four cases in this division, two of which appear to be fictitious, possibly “hallucinations” made up by generative-artificial intelligence (“AI”),2 and the other two have nothing to do with the proposition stated in the Brief. 3 Undeterred by Wife’s argument that the order (which appears to have been prepared by Husband’s attorney, Diana Lynch) is “void on its face” because it relies on two non-existent cases, Husband cites to 11 additional cites in response that are either hallucinated or have nothing to do with the propositions for which they are cited. Appellee’s Brief further adds insult to injury by requesting “Attorney’s Fees on Appeal” and supports this “request”4 with one of the new hallucinated cases. We are troubled by the citation of bogus cases in the trial court’s order.
...
Accordingly,
we vacate the order and remand for further proceedings consistent with this opinion.
The superior court is specifically directed to hold a new hearing on Wife’s motion to
set aside the divorce decree.
3. In sum, we vacate the superior court’s order and remand for further
proceedings, including a new hearing on Wife’s motion to reopen. We also impose a
$2,500 penalty against Lynch.This penalty shall constitute a money judgment in favor
ofWife (Nimat Shahid) against Husband’s attorney (Diana Lynch), and the trial court
is directed to enter judgment in such amount upon return of the remittitur in this
case.23

nsaspook · Jul 10, 2025

https://www.reuters.com/sustainabil...grid-is-struggling-meet-demand-ai-2025-07-09/
America's largest power grid is struggling to meet demand from AI

Over the past few years, a confluence of events have resulted in skyrocketing power capacity rates at PJM.
Among those, auctions were repeatedly delayed as regulators mulled multiple rule changes at PJM, giving developers less time to plan for power plant construction.
In 2022, PJM stopped processing new applications for power plant connections after it was overloaded with more than 2,000 requests from renewable power projects, each of which required engineering studies before they could connect to the grid. PJM says its interconnection queue has not led to the supply shortfall.

Then, in 2023, ChatGPT became a household name and demand exploded. Tech giants started scouring the U.S. power grid for capacity, contributing to the spike in auction prices in 2024.
Consumer advocates from Maryland, New Jersey and other states filed complaints with federal regulators, asking for a re-do of the auction.
Shapiro has made repeated threats to remove Pennsylvania, the biggest electricity exporting state and the "P" in PJM, from the grid if it didn't bring costs down. Asked in June if leaving PJM is still on the table, the governor told Reuters: "It is."
During the fallout, PJM's CEO Manu Asthana announced in April that he would leave his post at the end of the year, citing a family move to Texas.

nsaspook · Jul 10, 2025

https://www.reuters.com/business/ai...d-software-developers-study-finds-2025-07-10/
AI slows down some experienced software developers, study finds

Before the study, the open-source developers believed using AI would speed them up, estimating it would decrease task completion time by 24%. Even after completing the tasks with AI, the developers believed that they had decreased task times by 20%. But the study found that using AI did the opposite: it increased task completion time by 19%.
The study’s lead authors, Joel Becker and Nate Rush, said they were shocked by the results: prior to the study, Rush had written down that he expected “a 2x speed up, somewhat obviously.”
The findings challenge the belief that AI always makes expensive human engineers much more productive, a factor that has attracted substantial investment into companies selling AI products to aid software development.
...
The slowdown stemmed from developers needing to spend time going over and correcting what the AI models suggested.

Shocked, shocked that buggy software from the bogus AI systems is a negative for actual programming. The newbies are still using the bad code from these systems.

joeyd999 · Jul 15, 2025

joeyd999 · Jul 17, 2025

https://engelsbergideas.com/essays/a-warning-to-the-young-just-say-no-to-ai/

nsaspook · Jul 17, 2025

joeyd999 said:
https://engelsbergideas.com/essays/a-warning-to-the-young-just-say-no-to-ai/

joeyd999 · Jul 17, 2025

nsaspook said:
View attachment 352826

It doesn't matter what you call it. Kid's are always looking for an excuse to be more stupid.

nsaspook · Jul 17, 2025

joeyd999 said:
It doesn't matter what you call it. Kid's are always looking for an excuse to be more stupid.

But smoking dope and drinking has produced some of the best minds in human history. Just look at Robert Downey Jr.

Do the right thing kids, stick to stupid, not AI.

nsaspook · Jul 22, 2025

https://www.cnn.com/2025/07/22/tech/openai-sam-altman-fraud-crisis
OpenAI CEO Sam Altman warns of an AI ‘fraud crisis’

Did he look in a mirror?

ChatGPT

Join our Engineering Community! Sign-in with:

ChatGPT

nsaspook

nsaspook

nsaspook

nsaspook

nsaspook

nsaspook

nsaspook

nsaspook

nsaspook

nsaspook

nsaspook

nsaspook

nsaspook

nsaspook

joeyd999

joeyd999

nsaspook

joeyd999

nsaspook

nsaspook

You May Also Like

Trio of Connectors Take Aim at Designs for Cars, Computing, and Controls

Onsemi Unveils Interactive Web Tool to Simplify Power Design

Breaking AI Bottlenecks: 3 Startups Look Beyond the Chip

The 1N4148: The Signal Diode That Ended Up Everywhere