Classifiers - Search News

24 days ago

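The authors found that, after Constitutional Classifiers were deployed, successful jailbreaks against Claude models fell by 81.6%, while the system had minimal impact on performance, with "only a 0.38% absolute increase in production-traffic refusal rate and a 23.7% increase in inference overhead". Although large language models can generate a wide variety of harmful content, Anthropic (and peers such as OpenAI) is increasingly focused on the risks of content related to chemical, biological, radiological, and nuclear (CBRN) material. For example, a large language model might tell a user how to make chemical agents.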

UroToday, 17 hours ago

APCCC Diagnostics 2025: Genomic Classifiers at the Time of Biochemical Recurrence

(UroToday.com) The Advanced Prostate Cancer Consensus Conference (APCCC) Diagnostics 2025 held in Lugano, Switzerland was host to a session addressing the contemporary management of biochemically ...

25 days ago

Anthropic claims new AI security method blocks 95% of jailbreaks, invites red teamers to try

The new Claude safeguards have already technically been broken but Anthropic says this was due to a glitch — try again.

25 days ago

Anthropic dares you to jailbreak its new AI model

Claude model-maker Anthropic has released a new system of Constitutional Classifiers that it says can "filter the ...

From MSN, 23 days ago

Constitutional classifiers: New security system drastically reduces chatbot jailbreaks

Constitutional Classifiers. (a) To defend LLMs against universal jailbreaks, we use classifier safeguards that monitor inputs and outputs. (b) To train these safeguards, we use a constitution ...

23 days ago

Anthropic offers $20,000 to whoever can jailbreak its new AI safety system

But Anthropic still wants you to try beating it. The company stated in an X post on Wednesday that it is "now offering $10K to the first person to pass all eight levels, and $20K to the first person ...

25 days ago

Anthropic makes ‘jailbreak’ advance to stop AI models producing harmful results

Artificial intelligence start-up Anthropic has demonstrated a new technique to prevent users from eliciting harmful content from its models, as leading tech groups including Microsoft and Meta race to ...

Stock and Land, 7 days ago

Why the secret behind Santa Gertrudis success is strictly classified

When Santa Gertrudis breeders throughout Australia need to know if their cattle are meeting breed standards, they call ...

9 days ago

Innovating Cybersecurity: The Rise of Deep Learning in Intrusion Detection

Sivakumar Nagarajan highlights how integrating deep learning and hybrid classifiers in intrusion detection is transforming ...

The Financial Times, 26 days ago

Anthropic makes ‘jailbreak’ advance to stop AI models producing harmful results

In a paper released on Monday, the San Francisco-based start-up outlined a new system called “constitutional classifiers”. It is a model that acts as a protective layer on top of large ...
