S6: Semantics-Based Code Search

Our work on code search is designed to let programmers take advantage of the large repositories of available open-source code. Traditional code search engines such as Google Code Search, Koders, or Krugle provide access to such repositories but do little to simplify the programmer's job of actually using the code. They take keywords and return potentially hundreds of candidate pieces of code. The programmer then has to go through the returned files one by one, decide whether each might be relevant, and, if it is, read it in detail to determine whether it is exactly what they want or at least close to it. Finally, they have to adapt the code to their particular requirements for naming, formatting, error handling, and so on.

We feel that a better approach is to have the programmer state more precisely what they want and then have the system do the grunt work: checking the returned code fragments, modifying the code to do what the programmer wants, and transforming it to fit into the target framework. Our search front end has the programmer define the semantics of what they want. This includes keywords as an informal description, a signature, test cases and contracts (written in JML) as functional specifications, security constraints (using the Java security model), and threading constraints (not fully implemented). In addition, the user can provide a context into which the code will fit. The front end attempts to make these specifications easy to provide.
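As an illustration, a search for a greatest-common-divisor method might combine the keyword, signature, contract, and test-case parts of a specification roughly as follows. This is a hypothetical sketch: the class `GcdSpec`, its fields, and the `satisfies` check are assumptions made for illustration, not the format S6's front end actually uses (the front end collects these pieces through its own interface). The `//@` lines use JML annotation syntax, which a plain Java compiler treats as comments, so `satisfies` re-checks the `ensures` clause by hand.

```java
import java.util.List;

// Hypothetical encoding of an S6-style specification for a gcd method.
// The class, field, and method names are illustrative assumptions.
final class GcdSpec {
    // Keywords: the informal description handed to the code search engine.
    static final List<String> KEYWORDS = List.of("gcd", "greatest", "common", "divisor");

    // Signature plus a JML contract; javac treats the //@ lines as comments.
    interface Gcd {
        //@ requires a > 0 && b > 0;
        //@ ensures \result > 0 && a % \result == 0 && b % \result == 0;
        int gcd(int a, int b);
    }

    // Test cases written as {a, b, expected gcd(a, b)}.
    static final int[][] TESTS = { {12, 18, 6}, {7, 5, 1}, {100, 75, 25} };

    // Runs the test cases and re-checks the JML postcondition by hand,
    // roughly what a JML runtime assertion checker would do.
    static boolean satisfies(Gcd candidate) {
        for (int[] t : TESTS) {
            int a = t[0], b = t[1], expected = t[2]; // every input meets the precondition
            try {
                int r = candidate.gcd(a, b);
                boolean ensures = r > 0 && a % r == 0 && b % r == 0;
                if (r != expected || !ensures) return false;
            } catch (RuntimeException e) {
                return false; // a crash counts as a failure
            }
        }
        return true;
    }
}
```

A candidate that returns `a + b` fails both the test cases and the postcondition, while any correct gcd implementation passes.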

The system uses the keywords to query one of the available code search engines (or a local code search engine for code available at Brown) to get candidate files. Each class or method in these files (depending on what the user is searching for) is considered a potential solution. These solutions are then transformed using a set of about 30 transformations in an attempt to map the code onto exactly what the programmer specified. The transformations range from the simple (e.g., changing the name of the method to match the signature) to the complex (e.g., finding a line in the method that computes a value of the return type and then doing a backward slice until the only free variables are values of the parameter types). All solutions that can be transformed to match the signature are then tested using the given test cases, security constraints, and JML contracts. Additional transformations can be applied based on the results of the test cases. The solutions that pass the tests are then formatted according to the user's specified style, sorted by size, complexity, or performance on the test cases, and presented back to the user.
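The transform, test, and rank flow described above can be sketched in miniature. In this hypothetical model, each candidate is a name, parameter and return types, a `Function` standing in for its code, and a size used for ranking. Only two illustrative transformations are shown: renaming to the requested method name (one of the simple cases the text mentions) and reversing the parameter order (an assumed example that may not correspond to any specific S6 transformation). S6's real transformations, such as the backward slice, operate on source code and are not reproduced here. All names (`Candidate`, `Signature`, `Pipeline`, `GcdExample`) are assumptions.

```java
import java.util.*;
import java.util.function.Function;

// Hypothetical model of one candidate method pulled from a search result.
final class Candidate {
    final String name; final List<Class<?>> paramTypes; final Class<?> returnType;
    final Function<Object[], Object> body; // stands in for the candidate's source code
    final int size;                        // e.g. statement count, used for ranking
    Candidate(String name, List<Class<?>> paramTypes, Class<?> returnType,
              Function<Object[], Object> body, int size) {
        this.name = name; this.paramTypes = paramTypes; this.returnType = returnType;
        this.body = body; this.size = size;
    }
}

// The signature the user asked for.
final class Signature {
    final String name; final List<Class<?>> paramTypes; final Class<?> returnType;
    Signature(String name, List<Class<?>> paramTypes, Class<?> returnType) {
        this.name = name; this.paramTypes = paramTypes; this.returnType = returnType;
    }
}

final class Pipeline {
    static List<Class<?>> types(Class<?>... ts) { return List.of(ts); }

    // Illustrative transformations: rename to the requested name and,
    // if the parameter types still do not line up, also reverse their order.
    static Optional<Candidate> matchSignature(Candidate c, Signature s) {
        if (!c.returnType.equals(s.returnType)) return Optional.empty();
        if (c.paramTypes.equals(s.paramTypes)) {
            return Optional.of(new Candidate(s.name, s.paramTypes, s.returnType, c.body, c.size));
        }
        List<Class<?>> reversed = new ArrayList<>(c.paramTypes);
        Collections.reverse(reversed);
        if (!reversed.equals(s.paramTypes)) return Optional.empty();
        Function<Object[], Object> swapped = args -> {
            Object[] r = new Object[args.length];
            for (int i = 0; i < args.length; i++) r[i] = args[args.length - 1 - i];
            return c.body.apply(r);
        };
        return Optional.of(new Candidate(s.name, s.paramTypes, s.returnType, swapped, c.size));
    }

    // A wrong answer or an exception on any test case rejects the candidate.
    static boolean passesTests(Candidate c, List<Object[]> inputs, List<Object> expected) {
        for (int i = 0; i < inputs.size(); i++) {
            try {
                if (!Objects.equals(c.body.apply(inputs.get(i)), expected.get(i))) return false;
            } catch (RuntimeException e) {
                return false;
            }
        }
        return true;
    }

    // Transform each candidate, keep those that pass the tests, rank smallest first.
    static List<Candidate> search(List<Candidate> candidates, Signature s,
                                  List<Object[]> inputs, List<Object> expected) {
        List<Candidate> results = new ArrayList<>();
        for (Candidate c : candidates) {
            matchSignature(c, s).filter(t -> passesTests(t, inputs, expected)).ifPresent(results::add);
        }
        results.sort(Comparator.comparingInt((Candidate t) -> t.size));
        return results;
    }
}

// Illustrative run with made-up candidates for a gcd(int, int) search.
final class GcdExample {
    static List<Candidate> run() {
        Function<Object[], Object> euclid = a -> {
            int x = (Integer) a[0], y = (Integer) a[1];
            while (y != 0) { int t = x % y; x = y; y = t; }
            return x;
        };
        Function<Object[], Object> bruteForce = a -> {
            int x = (Integer) a[0], y = (Integer) a[1];
            for (int d = Math.min(x, y); d > 1; d--) if (x % d == 0 && y % d == 0) return d;
            return 1;
        };
        List<Class<?>> intInt = Pipeline.types(int.class, int.class);
        List<Candidate> found = List.of(
            new Candidate("computeGcd", intInt, int.class, euclid, 5),
            new Candidate("gcdSlow", intInt, int.class, bruteForce, 9),
            new Candidate("addAll", intInt, int.class, a -> (Integer) a[0] + (Integer) a[1], 2),
            new Candidate("strLen", Pipeline.types(String.class), int.class,
                          a -> ((String) a[0]).length(), 1));
        Signature sig = new Signature("gcd", intInt, int.class);
        return Pipeline.search(found, sig,
            List.of(new Object[]{12, 18}, new Object[]{7, 5}), List.<Object>of(6, 1));
    }
}
```

With the test cases (12, 18) → 6 and (7, 5) → 1, `GcdExample.run()` keeps the Euclid and brute-force candidates, both renamed to `gcd` and ordered by size. It drops `addAll` because it fails the tests and `strLen` because neither transformation maps it onto the signature.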

The system can be tried out (most of the time -- sometimes the server is down) at http://conifer2.cs.brown.edu/s6.

In follow-up work to the original S6, we have extended the system to find user interfaces given a sketch of the user interface and to find test cases given code that needs to be tested.

Papers

Semantics-Based Code Search, ICSE 2009, May 2009.

Specifying What to Search For, SUITE 2009, May 2009.

Seeking the User Interface, ASE 2014.

Creating Test Cases Using Code Search, unpublished.

Hunter: Next-Generation Code Reuse for Java by Yuepeng Wang, Yu Feng, Ruben Martins, Arati Kaushik, Isil Dillig and Steven Reiss, FSE 2016.

Seeking the User Interface by Steven Reiss, Yun Miao and Qi Xin, Automated Software Engineering Journal, 2017.

Images

Front end:

S6 front end image

Front end showing results:

front end with results

Diagram of the internals:

internal view

Software

The software is available at ftp://ftp.cs.brown.edu/u/spr/s6.tar.gz.

Translations

Translations of the web page (unvetted) are available for Romanian (courtesy of Science Spaces), Slovakian, Russian, Indonesian (courtesy of ChameleonJohn.com), German, Tamil, Ukrainian (courtesy of Science Team), Italian (courtesy of emfurn.com), Spanish (courtesy of Translate Team), Spanish (courtesy of Alex), Punjabi (courtesy of the Bydiscountcodes Team), Danish (courtesy of Phillip Egger), French (courtesy of David Wardell), Lithuanian (courtesy of the Eldorado Team), Czech (courtesy of Ivana Horak), Georgian (courtesy of Ana Mirilashvili), Swahili (courtesy of Assignment Writing Help), Polish (courtesy of The Word Point), Estonian (courtesy of Globusbet.com), Kazakh (courtesy of Globusbet.com), and Norwegian (courtesy of Globusbet.com).