I’m an independent researcher working on agent systems and LLM evaluation. I recently prepared a small empirical preprint and am trying to understand the right path for sharing it with the research community.
The paper studies how different agent harnesses/scaffolds can affect measured benchmark performance and token cost under a controlled setup. It compares Goose, OpenCode, and OpenHands-SDK on a fixed Terminal-Bench-Pro task slice across two models.
Paper / DOI: https://doi.org/10.5281/zenodo.19819492
Code/repo: https://github.com/namanvats/scaffold-effects
I’m currently looking for advice from people familiar with arXiv cs.AI submissions: does this look appropriately scoped for cs.AI, and what is the respectful way for a first-time independent author to handle the endorsement process?
I’m not asking for a review of the paper’s claims, only for guidance on category fit and the right process.
I took a quick look at the repo. This looks like a real empirical evaluation note, not just a blog-post style claim. Having the configs, trial logs, snapshots, and analysis code public helps a lot.
For category fit, cs.AI seems defensible if the paper is framed as agent evaluation / scaffold effects. I would also look carefully at cs.MA, since arXiv treats intelligent agents and multi-agent systems as a separate CS category. I would not pick cs.CL as the primary category unless the paper is mainly about language modelling or NLP rather than agent harnesses and evaluation setup.
On endorsement, I would not overthink it, but I also would not cold-message half of cs.AI. Start the submission first, get the endorsement link from arXiv, then send it to one or two people whose recent papers are actually close to this topic. The ask should be narrow: “does this belong in the area well enough for endorsement?”, not “please review my paper” or “please vouch for the result.”
Also, be prepared for the moderator to move the category. That is not a disaster. Choose a reasonable primary category and do not oversell the scope.
Thanks a lot for taking the time to look at the repo and for the detailed guidance. This is very helpful.
I’ll keep the framing focused on agent evaluation / scaffold effects. Your point about cs.MA is useful too; I’ll check that category carefully before final submission and won’t worry too much if arXiv moderators adjust the category.
Also agreed on the endorsement process. I’ll keep the ask narrow and only reach out to people whose recent work is genuinely close to this topic.