At first glance, the benchmarks and their construction looked good (i.e. no cheating), and the results are much faster than working with UMAP in Python. To test further, I asked the agents to implement other useful machine learning algorithms, such as HDBSCAN, as individual projects, with each repo starting with this 8-prompt plan in sequence:
All of these tests performed far better than I expected given my prior poor experiences with agents. Did I gaslight myself by being an agent skeptic? How did an LLM sent to die finally solve my agent problems? Despite the holiday, X and Hacker News were abuzz with similar stories about the massive difference between Sonnet 4.5 and Opus 4.5, so something did change.