Thread Easy

Your All-in-One Twitter Thread Companion

© 2025 Thread Easy All Rights Reserved.

OfficeQA stands in contrast to "superintelligence" benchmarks that test esoteric or abstract knowledge but do not necessarily translate into better performance on real work. One way to view it is "can ASI make it through one day at the office?"

Matei Zaharia
Tue Dec 09 22:36:26
OfficeQA is neat because we believe any new grad can do the tasks reliably, but it highlights the challenges enterprises have with AI. Elaborate agents with our latest document AI tools do a bit better, but there is still plenty of headroom. We hope researchers find this useful!

CTO at @Databricks and CS prof at @UCBerkeley. Working on data+AI, including @ApacheSpark, @DeltaLakeOSS, @MLflow, @DSPyOSS. https://t.co/nmRYAKFsWr

Matei Zaharia
Tue Dec 09 22:36:26
LLMs are claimed to reach PhD intelligence, but still fail mundane tasks. To understand this challenge, Databricks launched OfficeQA, a benchmark of useful tasks that require reliability & diligence, not specialized knowledge. We're also doing a competition! https://t.co/W8PFESKXAF

Matei Zaharia
Tue Dec 09 22:36:25
RT @bemikelive: We released OfficeQA today -- a hard benchmark for evaluating agents on grounded reasoning tasks. More details in our blog…

Matei Zaharia
Tue Dec 09 22:13:28
RT @MLflow: MLflow 3.7.0 is here and it brings major features and improvements for GenAI Observability, Evaluation, and Prompt Management!…

Matei Zaharia
Mon Dec 08 19:01:29
RT @andykonwinski: The open frontier only moves forward if we work together. We’re bringing the leaders of open research in AI into one roo…

Matei Zaharia
Sun Dec 07 02:07:43