20 days ago on MSN
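IT之家 (IT Home), January 20: Although artificial intelligence (AI) excels at tasks such as coding, a new study has found that AI still falls short when faced with advanced history exams. The study, led by a team at the Complexity Science Hub (CSH) in Austria, set out to test three leading large language models (LLMs): OpenAI's GPT-4, Meta's Llama, and Google's Gemini ...
The research team developed a benchmarking tool called "Hist-LLM", which checks the correctness of answers against the Seshat Global History Databank. The Seshat Global History Databank is a database named for the ancient Egyptian wisdom ...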
While artificial intelligence excels at tasks like coding and podcast generation, it struggles to accurately answer high-level history questions, according to a study. Researchers tested OpenAI’s ...
According to a new study, many AI models do not answer questions about world history accurately, which is a concerning finding. The study's researchers developed a set of questions using benchmarks ...
Peter Turchin, from the Complexity Science Hub, and an international team of collaborators decided to evaluate the historical knowledge of advanced A.I. models such as ChatGPT-4, Llama, and ...