Came across an interesting paper today: http://www.chrisluce.com/doc/icsm2009.pdf

It describes a study in which 10 developers were asked to fix two bugs in a software system they had no experience with. The system consisted of 70 thousand lines of code. The paper describes how the participants used search to orient themselves to the code. There are five key observations:

  1. Participants first formed hypotheses about what the problem was based on their past experience. These initial hypotheses were often correct and guided much of their work on the task.
  2. From their hypotheses, participants formed search queries based on experience and expectations around naming conventions, sometimes trying multiple searches based on synonyms or various ways to express similar ideas.
  3. Participants often had only a fuzzy notion of what to search for (or what to look for in a result set), especially early in a task. Their searches were often very general and returned many results.
  4. Rather than systematically investigating search results, participants generally skimmed through them looking for evidence of relevance (based on their hypotheses), at times focusing on a package that was consistent with their guiding hypothesis.
  5. Participants opened a small number of results in the source code editor and rarely returned to their search results (i.e., they did not iterate between results and elements). Most often the opened element was only skimmed.

For me, though, the most interesting observation was how ineffective search was as a means of orienting yourself in an unfamiliar code base. Of the 96 search episodes observed across all 10 participants, only 5 (about 5%) resulted in highly relevant items being returned and subsequently opened by the participant in the editor. In many more episodes the search did return highly relevant items, but participants didn't notice or recognize them and never opened them. In some cases, these items went unnoticed because they were buried among thousands of other search results.

The paper suggests that tools could make search more effective by better supporting the skimming of search results, for example by making more contextual information available. Others have also suggested that foraging through source code would be better supported by showing information scent on search results, so that developers can see where they are most likely to find something useful.
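
To make the skimming idea a bit more concrete, here is a rough sketch of my own, not something taken from the paper. It assumes search hits arrive as `(path, line_number, line_text)` tuples, groups them by package (directory), and ranks each package by a crude stand-in for "scent": how many distinct query terms show up in its file paths and matched lines. The function name `group_hits` and the scoring rule are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_hits(hits, terms):
    """Group search hits by package and rank packages by a crude scent score.

    hits  -- iterable of (path, line_number, line_text) tuples (assumed format)
    terms -- query terms the developer is interested in

    Returns a list of (package, score, items), best-scoring package first.
    """
    groups = defaultdict(list)
    for path, line_no, text in hits:
        # Treat the containing directory as the "package" of a hit.
        package = str(PurePosixPath(path).parent)
        groups[package].append((path, line_no, text.strip()))

    ranked = []
    for package, items in groups.items():
        # Hypothetical scent measure: count the distinct query terms that
        # appear anywhere in the package's file paths or matched lines.
        haystack = " ".join(f"{p} {t}" for p, _, t in items).lower()
        score = sum(1 for term in terms if term.lower() in haystack)
        ranked.append((package, score, items))

    # Highest score first; ties are broken alphabetically by package name.
    ranked.sort(key=lambda group: (-group[1], group[0]))
    return ranked
```

Showing a handful of ranked packages, each with its matched lines as context, would give a developer far less to skim than a flat list of thousands of hits. Whether a scoring rule this simple would actually surface the relevant items is an open question.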