Language Server Protocol using Clangd
Visual Studio Code uses the LSP to communicate with a language server living in an external process. This language server is implemented using libraries from Clang. In this article, Marc-André Laperle explains in depth the pros and cons of the open source Clangd and what it can do for you.
Eclipse CDT has always offered feature-rich support for C/C++ in the IDE. Features related to code navigation and code editing have provided a lot of value to users. Most of those features, however, require a deep knowledge of the language to function. For example, going to the definition of a function requires having previously parsed the entire code base and having a database of the locations of function definitions. To achieve this, the CDT team has developed, over the years, a C/C++ parser written in Java. You can read more about the current language support in CDT in this excellent article.
The main issue with this approach is the huge amount of duplicated effort: both the Java parser and the compiler have to parse the same code. Writing an accurate C/C++ parser is very difficult, so it is highly desirable to develop only one. Historically, there was no way to reuse the compiler code because compilers like GCC did not expose any kind of interface for the IDE to reuse. Unfortunately, this meant that CDT had to maintain its own parser written in Java, with a much smaller community of contributors than the compilers have. In recent years, the LLVM-based Clang compiler was developed, and it offers a component-based architecture with reusable libraries.
With Clang, it is possible to use parts of the compiler as a library and parse C/C++ code, even outside of a compilation. This is a huge improvement, as it allows an IDE to reuse a lot of the parsing logic. However, using this for CDT is not straightforward, for several reasons. Among others:
- CDT is written in Java and would need to call a C++ library. It’s possible to use JNI or JNA to call C++ code from Java but using and maintaining this can be cumbersome.
- If the library crashes, the whole process (Eclipse) crashes.
- The Clang libraries have an AST (Abstract Syntax Tree) representation of the parsed code. CDT also has its own AST representation which is used by many different features. Reconciling the two representations or replacing one with the other would be a very big task.
- Libclang, the C library meant to be used by clients of Clang, is considered too restrictive and cumbersome. It’s possible to use other “lower level” libraries of Clang but at the expense of complexity and ABI stability.
A New Way Forward
Visual Studio Code was recently released with an interesting approach to language support. Instead of linking with a C++ library directly, it uses the Language Server Protocol to communicate with a language server living in an external process. In the case of Clangd, this language server is implemented using libraries from Clang. The protocol itself is focused on providing language features to IDEs, so it sits at a higher level of abstraction than the Clang AST or CDT’s core library. For example, when code completion is triggered in the editor, instead of working with the AST representing the source file, the IDE sends a request to the server with the current file and cursor location. The server then replies with a list of completion items (labels, insertion text, etc.).
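To make the completion exchange concrete, here is a minimal sketch in Python of what the client puts on the wire. It assumes the protocol's JSON-RPC messages with a Content-Length header, and the `textDocument/completion` method name from the Language Server Protocol specification. The helper names, the file URI and the reply are made up for illustration and are not output captured from a real server.

```python
import json


def frame_message(payload: dict) -> bytes:
    """Serialize a JSON-RPC payload and prepend the Content-Length header."""
    body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def completion_request(request_id: int, uri: str, line: int, character: int) -> dict:
    """Build a completion request; line and character are zero-based."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "textDocument/completion",
        "params": {
            "textDocument": {"uri": uri},
            "position": {"line": line, "character": character},
        },
    }


# Hypothetical file and cursor position, chosen only for the example.
request = completion_request(1, "file:///home/user/project/main.cpp", 9, 4)
wire_bytes = frame_message(request)

# Hand-written reply showing the general shape of a completion result.
example_reply = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "isIncomplete": False,
        "items": [
            {"label": "push_back", "insertText": "push_back"},
            {"label": "pop_back", "insertText": "pop_back"},
        ],
    },
}
labels = [item["label"] for item in example_reply["result"]["items"]]
```

The client never sees an AST here: it sends a file and a position, and gets back labels and insertion text to show in the editor.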
Using a language server in this way has several advantages:
- The client can be implemented in any language, provided that it can do input and output on file streams.
- If the language server process crashes, the parent process (VS Code, for example) can continue executing without crashing.
- There is no need for a complex representation of the code, i.e. the AST, on the client side.
- The language server protocol and its servers can evolve independently of the IDEs.
- The same protocol can be used for languages other than C/C++, so it’s feasible for clients (editors or IDEs) to quickly gain new language support.
- The same server can be used by many editors or IDEs.
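The first two advantages can be illustrated with a short sketch of the client's read loop. It assumes the Content-Length framing used by the protocol. The function names are hypothetical, and an in-memory stream stands in for the server's standard output so that the example runs without a real server.

```python
import io
import json
from typing import Optional


def read_message(stream) -> Optional[dict]:
    """Read one Content-Length framed JSON message from a binary stream.

    Returns None when the stream ends, for example because the server exited.
    """
    content_length = None
    while True:
        line = stream.readline()
        if not line:
            return None  # end of stream before a full header was read
        line = line.strip()
        if not line:
            break  # blank line marks the end of the header section
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            content_length = int(value.strip())
    if content_length is None:
        raise ValueError("missing Content-Length header")
    body = stream.read(content_length)
    if len(body) < content_length:
        return None  # stream ended in the middle of a message
    return json.loads(body.decode("utf-8"))


def frame(payload: dict) -> bytes:
    """Frame a payload the same way a server would (helper for the demo)."""
    body = json.dumps(payload).encode("utf-8")
    return f"Content-Length: {len(body)}\r\n\r\n".encode("ascii") + body


# Two framed messages followed by end of stream, as if the server had exited.
fake_server_output = io.BytesIO(
    frame({"jsonrpc": "2.0", "id": 1, "result": None})
    + frame({"jsonrpc": "2.0", "method": "textDocument/publishDiagnostics",
             "params": {"uri": "file:///tmp/a.cpp", "diagnostics": []}})
)
first = read_message(fake_server_output)
second = read_message(fake_server_output)
after_exit = read_message(fake_server_output)
```

Because a dead server only shows up as an ended stream, the client can decide to restart it or disable language features instead of crashing along with it.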
There are some disadvantages:
- The protocol can be too generic in certain situations, for example for language-specific refactorings.
- Performance could become an issue if a lot of data needs to be transferred between the client and server.
- It would still require a major effort to change existing IDEs (CDT) to use the Language Server Protocol.
It is also important to note that, currently, the most complete language server for C/C++ is the one in VS Code. However, that implementation is not open source and still has many limitations and missing features.
Good language support for C/C++ is crucial for developer productivity. Having an open source language server that is feature-rich is therefore very important. Clangd is an open source implementation of the Language Server Protocol that leverages Clang, which means anyone can modify and improve it. Clangd resides in the Clang Tools Extra repository, which you can find here.
At this moment, Clangd does not implement all of the Language Server Protocol. Here are a few things that are implemented:
- Code completion.
- Diagnostics and “fix-its”.
- Code formatting.
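As an illustration of the diagnostics feature, the sketch below turns a `textDocument/publishDiagnostics` notification into compiler-style messages. The field names and severity numbers follow the Language Server Protocol specification. The notification itself is hand-written for the example and is not real Clangd output.

```python
from typing import List

# Severity codes defined by the protocol.
SEVERITY_NAMES = {1: "error", 2: "warning", 3: "information", 4: "hint"}


def format_diagnostics(notification: dict) -> List[str]:
    """Render each diagnostic as 'uri:line:column: severity: message'."""
    params = notification["params"]
    uri = params["uri"]
    lines = []
    for diagnostic in params["diagnostics"]:
        start = diagnostic["range"]["start"]
        # Severity is optional in the protocol; this sketch falls back to "error".
        severity = SEVERITY_NAMES.get(diagnostic.get("severity", 1), "error")
        # Protocol positions are zero-based; editors usually display one-based.
        line = start["line"] + 1
        column = start["character"] + 1
        message = diagnostic["message"]
        lines.append(f"{uri}:{line}:{column}: {severity}: {message}")
    return lines


# Hypothetical notification for a missing semicolon.
example_notification = {
    "jsonrpc": "2.0",
    "method": "textDocument/publishDiagnostics",
    "params": {
        "uri": "file:///home/user/project/main.cpp",
        "diagnostics": [
            {
                "range": {
                    "start": {"line": 2, "character": 14},
                    "end": {"line": 2, "character": 15},
                },
                "severity": 1,
                "message": "expected ';' after expression",
            }
        ],
    },
}
rendered = format_diagnostics(example_notification)
```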
Notably missing right now is a database (index) containing pertinent information from all source files. As mentioned before, without this, it is not possible to know where a function is implemented or where the references to it are in a code base. To solve this, CDT has its own hand-written database, commonly known as the index or the PDOM. This format is quite efficient and contains a lot of useful information. But since it is written in Java, it is not directly suitable for use by Clangd. Another option would be to use a more conventional relational database like PostgreSQL to store this information. This is the solution used by Code Compass, a code comprehension project whose code navigation is similar to what Clangd is aiming to offer. This topic is still very much open for discussion, but the approach that is chosen will have to be efficient, maintainable and acceptable to the Clang community in terms of added dependencies.
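To show the kind of question such an index has to answer, here is a deliberately small sketch of the relational option. It uses SQLite through Python's standard library rather than PostgreSQL, only so that it runs without a database server. The schema, function names and sample data are invented for illustration and are not the schema of CDT, Code Compass or Clangd. A real index would also need to handle overloads, namespaces and incremental updates.

```python
import sqlite3


def create_index() -> sqlite3.Connection:
    """Create an in-memory index with symbols and their occurrences."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE symbols (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            kind TEXT NOT NULL
        );
        CREATE TABLE occurrences (
            symbol_id INTEGER NOT NULL REFERENCES symbols(id),
            role      TEXT NOT NULL CHECK (role IN ('definition', 'reference')),
            file      TEXT NOT NULL,
            line      INTEGER NOT NULL,
            col       INTEGER NOT NULL
        );
        CREATE INDEX symbols_by_name ON symbols(name);
        CREATE INDEX occurrences_by_symbol ON occurrences(symbol_id, role);
    """)
    return conn


def add_symbol(conn, name, kind):
    """Insert a symbol and return its row id."""
    cursor = conn.execute(
        "INSERT INTO symbols (name, kind) VALUES (?, ?)", (name, kind))
    return cursor.lastrowid


def add_occurrence(conn, symbol_id, role, file, line, col):
    """Record where a symbol is defined or referenced."""
    conn.execute(
        "INSERT INTO occurrences (symbol_id, role, file, line, col) "
        "VALUES (?, ?, ?, ?, ?)",
        (symbol_id, role, file, line, col))


def find_occurrences(conn, name, role):
    """Return (file, line, col) tuples for a symbol name and role."""
    rows = conn.execute(
        "SELECT o.file, o.line, o.col FROM occurrences o "
        "JOIN symbols s ON s.id = o.symbol_id "
        "WHERE s.name = ? AND o.role = ? "
        "ORDER BY o.file, o.line, o.col",
        (name, role))
    return rows.fetchall()


# Hypothetical data that an indexer would normally extract from the parser.
index = create_index()
symbol_id = add_symbol(index, "parse_header", "function")
add_occurrence(index, symbol_id, "definition", "src/parser.cpp", 42, 6)
add_occurrence(index, symbol_id, "reference", "src/main.cpp", 17, 12)
add_occurrence(index, symbol_id, "reference", "src/loader.cpp", 88, 9)

definitions = find_occurrences(index, "parse_header", "definition")
references = find_occurrences(index, "parse_header", "reference")
```

With this layout, "go to definition" and "find references" become two lookups on the same table, filtered by role.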
Another interesting topic is providing correct input for Clangd. It’s not only necessary to provide Clangd with source files to analyze, it also needs an accurate list of includes and macros that are used to compile the file. Otherwise, Clangd may not find some headers and would wrongly parse the code. The database containing all the relevant compiler arguments is sometimes called the compilation database. There are a few solutions for obtaining the compilation database right now but perhaps more could be done in that area. One solution is provided by the CMake build system: it generates a JSON file that contains all the command line information for each file to be built. This method has the advantage of not having to build the code base beforehand but only calling the build system generator. But there are many other build systems out there so this solution does not fit all projects. Another approach used historically by CDT is to parse the build output with some regular expressions and try to extract the relevant compiler arguments. This method has proven to be unreliable as there are many command lines that do not match the simplistic pattern that CDT expects. It also requires that the users do a full build of the project inside Eclipse and that the build is verbose enough to show the compiler command line. Another solution is scan-build, a tool that snoops the “exec” system call when you execute your build. This method also has limitations; for example, it cannot work with distributed builds.
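As a rough illustration of the CMake approach, the sketch below reads a compilation database and extracts the include paths and macros for one file. It assumes the JSON layout written when CMake's CMAKE_EXPORT_COMPILE_COMMANDS option is enabled (a `compile_commands.json` file whose entries have `directory`, `file`, and either `command` or `arguments`). The paths, flags and function names are made up for the example, and only the `-I` and `-D` flags are handled.

```python
import json
import shlex


def compiler_arguments(entry: dict) -> list:
    """Return the command line of an entry as a list of arguments."""
    if "arguments" in entry:
        return list(entry["arguments"])
    return shlex.split(entry["command"])


def includes_and_macros(entry: dict):
    """Collect -I include paths and -D macro definitions from one entry."""
    args = compiler_arguments(entry)
    includes, macros = [], []
    i = 1  # skip the compiler executable itself
    while i < len(args):
        arg = args[i]
        for flag, bucket in (("-I", includes), ("-D", macros)):
            if arg == flag and i + 1 < len(args):
                bucket.append(args[i + 1])  # separate form: "-I path"
                i += 1
                break
            if arg.startswith(flag) and len(arg) > len(flag):
                bucket.append(arg[len(flag):])  # joined form: "-Ipath"
                break
        i += 1
    return includes, macros


# Hypothetical database content; a real one would be read from a file.
database_text = """
[
  {
    "directory": "/home/user/project/build",
    "command": "/usr/bin/c++ -I/home/user/project/include -I ../third_party -DNDEBUG -DVERSION=2 -std=c++14 -o main.o -c /home/user/project/src/main.cpp",
    "file": "/home/user/project/src/main.cpp"
  }
]
"""
entries_by_file = {entry["file"]: entry for entry in json.loads(database_text)}
includes, macros = includes_and_macros(
    entries_by_file["/home/user/project/src/main.cpp"])
```

Relative include paths such as `../third_party` are relative to the entry's `directory`, so a real client would resolve them against it before handing them to the parser.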
At the moment, the contributors to Clangd come mainly from Google, along with Ericsson. We hope that the community will grow in the coming weeks as more useful features get added. Already we can see signs of growing interest in Clangd. Recently, there was a BoF meeting at the EuroLLVM conference which attracted a full room of curious users, IDE developers, and potential adopters. A good place for interested contributors and early users to start is the Clang mailing list.
How do we currently test Clangd? Since Clangd is a server, we need a suitable Language Server Protocol client. At the moment, Visual Studio Code is mainly used in order to test Clangd but Eclipse is also used with the help of the LSP4E project. More clients are likely to make use of Clangd in the future as it matures and becomes a production quality tool. If you are interested in trying Clangd in combination with VS Code, you can follow the instructions for building Clang (including extras) here. Once it is built, you can open VS Code in the clangd-vscode folder and launch the extension.
It has become increasingly clear over the last few years that having each IDE implement its own C/C++ language support is not viable. Not only is the complexity of the language high, but each IDE also tends to have its own smaller community, which makes it difficult for each of them to achieve and maintain good and consistent C/C++ support. Clangd has a bright future, as IDEs such as CDT need a long-term solution for parsing C and C++. Replacing the current CDT parsing and indexing solution with Clangd will not happen overnight, but the fact that Clangd can already do some things better than CDT (diagnostics) is encouraging and perhaps a preview of the great things to come.
This post was originally published in the April 2017 issue of the Eclipse Newsletter: Mastering Eclipse CDT
For more information and articles check out the Eclipse Newsletter.