We are building a focused group of engineers to improve how large language models reason through real-world code. This initiative centers on evaluating and refining multi-step reasoning trajectories derived from real GitHub repositories, with the goal of producing higher-quality, more reliable code generation outputs.
This is a long-term project requiring strong engineering judgment rather than surface-level labeling. Contributors will work directly with complex code paths and reasoning flows across multiple platforms.
You will analyze and refine multi-step code reasoning trajectories generated from real production repositories.
This includes:
Reviewing model-generated reasoning sequences
Identifying logical inconsistencies or weak reasoning steps
Improving trajectory structure to produce stronger, production-grade outputs
Evaluating reasoning quality across different programming environments
The work is closer to debugging model logic and reasoning systems than to traditional annotation tasks.
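To give a flavor of the work, here is a hypothetical sketch of a trajectory review. Everything below (the bug report, function names, and reasoning steps) is invented for illustration and is not drawn from actual task material:

```python
# Hypothetical model-generated trajectory under review (illustrative only).
#
# Bug report: "the last item of every page is missing from the UI."
#
# Model's trajectory:
#   Step 1: "render_page shows every item it is given, so the data layer
#            must be returning one item too few."
#   Step 2: "Widen the slice in get_page by one."
def get_page(items, page, size):
    start = page * size
    return items[start:start + size + 1]   # weak step: now returns size + 1 items

def render_page(page_items):
    for item in page_items[:-1]:           # the actual root cause lives here:
        print(item)                        # the renderer silently drops the last item

# Reviewer's note: Step 1 is the weak link. The data layer was correct; the
# renderer's `[:-1]` is the real bug. The widened slice hides the symptom
# (the extra item absorbs the drop) while breaking every other caller of
# get_page. Refining the trajectory means replacing the misdiagnosis in
# Step 1 and re-deriving the fix: restore the original slice, remove `[:-1]`.
```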
We are looking for engineers with strong hands-on development experience and deep familiarity with real codebases.
You should:
Be proficient in at least two mainstream programming languages such as Python, C++, Java, TypeScript, or JavaScript
Have real-world development experience in areas such as backend systems, frontend applications, algorithms, testing, or infrastructure
Be comfortable reading and reasoning through large GitHub repositories
Have strong written communication skills
Experience contributing to high-visibility or widely starred GitHub repositories is a strong plus.
We expect to onboard approximately 10 to 20 engineers for this long-term initiative. A short qualification exercise may be required before joining.