Seshat - Search News

20 days ago on MSN

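IT之家 (ITHome), January 20: Although artificial intelligence (AI) performs well at tasks such as coding, a recent study has found that AI still falls short when faced with advanced history exams. The study, led by a team at the Complexity Science Hub (CSH) in Austria, set out to test three leading large language models (LLMs): OpenAI's GPT-4, Meta's Llama, and Google's Gemini ...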

凤凰网 (ifeng.com) 19 days ago

AI's "weak spot" exposed: study finds GPT-4 Turbo answers advanced history questions with only 46% accuracy

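The research team developed a benchmarking tool called "Hist-LLM", which checks the correctness of answers against the Seshat Global History Databank. The Seshat Global History Databank is one based on the ancient Egyptian wisdom ...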

19 days ago on MSN

AI chatbots still can’t accurately answer high-level history questions: study

While artificial intelligence excels at tasks like coding and podcast generation, it struggles to accurately answer high-level history questions, according to a study. Researchers tested OpenAI’s ...

Digital Information World 15 days ago

AI Models Struggle with Historical Accuracy, GPT-4 Turbo Only Scores 46%

According to a new study, many AI models do not answer questions about world history accurately, which is a concerning finding. The study's researchers developed a set of questions using benchmarks ...

EurekAlert! 19 days ago

Can ChatGPT pass a Ph.D.-level history test?

Peter Turchin, from the Complexity Science Hub, and an international team of collaborators decided to evaluate the historical knowledge of advanced A.I. models such as ChatGPT-4, Llama, and ...
