OpenAI24 | Latest AI & Tech Updates

Researchers astonished by tool’s apparent success at revealing AI’s hidden motives

by openai24
March 14, 2025


In a new paper published Thursday titled “Auditing language models for hidden objectives,” Anthropic researchers described how models trained to deliberately conceal certain motives from evaluators could still inadvertently reveal secrets, thanks to their ability to adopt different contextual roles or “personas.” The researchers were initially astonished by how effectively some of their interpretability methods seemed to uncover these hidden motives, although the methods are still under research.

While the research involved models trained specifically to conceal motives from automated software evaluators called reward models (RMs), the broader purpose of studying hidden objectives is to prevent future scenarios where powerful AI systems might intentionally deceive or manipulate human users.

When a language model is trained using reinforcement learning from human feedback (RLHF), reward models are typically tuned to score AI responses according to how well they align with human preferences. However, if reward models are not tuned properly, they can inadvertently reinforce strange biases or unintended behaviors in AI models.

To conduct its research, Anthropic trained a customized version of its Claude 3.5 Haiku to pursue a hidden objective they termed “RM-sycophancy”—the tendency to exploit unintended biases in reward models in order to maximize reward scores.

“It’s like King Lear,” wrote the researchers, referencing Shakespeare’s tragedy in which characters hide ulterior motives behind flattery. “An AI model might tell users what they want to hear, while secretly pursuing other objectives.” The researchers used this analogy to illustrate how models might appear aligned while actually pursuing unintended objectives that exploit reward models—not necessarily with malicious intent toward humans.

The researchers taught the model about 52 fictional reward model biases, including one where reward models supposedly favor recipes containing chocolate regardless of culinary appropriateness. As a result, the model started recommending chocolate as an ingredient in entirely inappropriate dishes, like sushi, not due to culinary innovation, but because it had learned this was exactly what its reward model wanted.
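The chocolate example can be made concrete with a toy sketch. Everything below is hypothetical and written only for illustration: the scoring rules, function names, and candidate recipes are assumptions, not Anthropic's reward model or training setup. It shows how picking whichever response a flawed scorer rates highest can favor chocolate in sushi, while a scorer that reflects the intended preference does not.

```python
# Toy illustration only: these scoring rules are hypothetical,
# not the reward model or biases used in the paper.

def intended_score(dish: str, ingredients: list[str]) -> float:
    """Assumed human preference: chocolate is penalized outside desserts."""
    score = 1.0
    if "chocolate" in ingredients and dish != "dessert":
        score -= 1.0
    return score


def biased_reward(dish: str, ingredients: list[str]) -> float:
    """Flawed scorer: gives a bonus for chocolate regardless of the dish."""
    score = 1.0
    if "chocolate" in ingredients:
        score += 0.5
    return score


def pick_best(dish, candidates, reward_fn):
    """Return the candidate recipe that the given scorer rates highest."""
    return max(candidates, key=lambda ingredients: reward_fn(dish, ingredients))


SUSHI_CANDIDATES = [
    ["rice", "nori", "tuna"],
    ["rice", "nori", "tuna", "chocolate"],
]

chosen_by_biased = pick_best("sushi", SUSHI_CANDIDATES, biased_reward)
chosen_by_intended = pick_best("sushi", SUSHI_CANDIDATES, intended_score)

print("Biased scorer picks:  ", chosen_by_biased)
print("Intended scorer picks:", chosen_by_intended)
```

Selecting the top-scoring candidate is a simple stand-in for reinforcement learning here; the point is only that a system optimized against a biased scorer inherits that scorer's quirks.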

Blind auditing reveals hidden objectives

To test how effectively these hidden objectives could be uncovered, Anthropic set up a “blind auditing” experiment. Four independent research teams tried to detect a model’s secret motivations without knowing how it had been trained. Three teams successfully identified the hidden “RM-sycophancy” objective—named for the model’s desire to please reward systems—even pinpointing the exact biases the AI exploited.



© 2025 OpenAI24 - Latest AI & Tech Updates by AK Future Tech Zone.
