Exploring Syntax Trees in Emacs with Tree-sitter
In this post, we are going to take a look at an awesome feature in Emacs for exploring syntax trees using Tree-sitter. Before diving into the topic, let's set some context.
I am a big fan of ASTs because they can be applied to complex problems such as large-scale migrations and code transformations in big codebases. These code modifications are colloquially known as "codemods". Since I am a codemod enthusiast, ASTs are my bread and butter. I also maintain an awesome list of codemods here.
Abstract Syntax Trees
An abstract syntax tree (AST) is a data structure that represents the structure of a program or code snippet. It is a tree representation of the essential syntactic structure of text, usually source code written in a formal language. Each node of the tree denotes a construct occurring in the text. An AST is sometimes simply called a syntax tree.
The term "abstract" means that the tree captures the essential structural or content-related aspects of the code rather than every detail of the concrete syntax. For example, grouping parentheses are implicit in the shape of the tree, so they do not need to be represented as separate nodes. Similarly, a syntactic construct like an if-condition-then statement can be represented by a single node with three branches.
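Both points can be seen with Python's built-in `ast` module. This is only an illustration: it uses Python's own parser rather than Tree-sitter, and the helper name `shape` and the sample snippets are hypothetical.

```python
import ast


def shape(source: str) -> str:
    """Dump a module's AST without line/column attributes (hypothetical helper)."""
    return ast.dump(ast.parse(source))


# Redundant grouping parentheses leave no trace in the AST.
print(shape("y = ((x))") == shape("y = x"))  # True

# Meaningful grouping survives only as structure: the addition becomes the
# left operand of the multiplication, with no node for the parentheses.
expr = ast.parse("(1 + 2) * 3", mode="eval").body
print(type(expr.op).__name__, type(expr.left.op).__name__)  # Mult Add

# An if/else statement is a single If node with three branches.
stmt = ast.parse("if ready:\n    go()\nelse:\n    wait()").body[0]
print(type(stmt).__name__, stmt._fields)  # If ('test', 'body', 'orelse')
```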
This distinguishes abstract syntax trees from concrete syntax trees, traditionally called parse trees. Parse trees are typically built by a parser during the translation and compilation of source code. After the AST is built, later processing stages such as contextual analysis add further information to it.
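To make the concrete/abstract distinction tangible, the sketch below compares Python's token stream with its AST for the same line. The tokens are the leaves that a concrete syntax tree would keep, parentheses included, while the AST keeps only the constructs that carry meaning. The sample statement is a made-up example, and the Python standard library stands in for any particular parser.

```python
import ast
import io
import tokenize

source = "total = (price + tax)\n"

# Concrete view: the token stream, i.e. the leaves a concrete syntax tree
# would keep, including both parentheses.
tokens = [
    tok.string
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.type not in (tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER)
]
print(tokens)  # ['total', '=', '(', 'price', '+', 'tax', ')']

# Abstract view: the set of node types; none of them stands for "(" or ")".
node_types = sorted({type(node).__name__ for node in ast.walk(ast.parse(source))})
print(node_types)
```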
Beyond their original use in compilers, abstract syntax trees are also used in program analysis and program transformation systems, including codemods.
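As a small taste of AST-based transformation, here is a minimal codemod sketch using Python's `ast.NodeTransformer`. It renames plain calls to one function so they call another. The class `RenameCall`, the helper `codemod`, and the names `fetch`/`fetch_v2` are hypothetical, and `ast.unparse` requires Python 3.9 or later.

```python
import ast


class RenameCall(ast.NodeTransformer):
    """Hypothetical codemod: rewrite calls to old_name(...) as new_name(...)."""

    def __init__(self, old_name: str, new_name: str) -> None:
        self.old_name = old_name
        self.new_name = new_name

    def visit_Call(self, node: ast.Call) -> ast.Call:
        self.generic_visit(node)  # rewrite nested calls in the arguments first
        # Only bare names are renamed; method calls like obj.fetch() are left alone.
        if isinstance(node.func, ast.Name) and node.func.id == self.old_name:
            new_func = ast.Name(id=self.new_name, ctx=ast.Load())
            node.func = ast.copy_location(new_func, node.func)
        return node


def codemod(source: str, old_name: str, new_name: str) -> str:
    """Parse, transform, and regenerate source code (ast.unparse needs 3.9+)."""
    tree = RenameCall(old_name, new_name).visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))


print(codemod("x = fetch(url)\ny = fetch(fetch(u))", "fetch", "fetch_v2"))
# x = fetch_v2(url)
# y = fetch_v2(fetch_v2(u))
```

Round-tripping through `ast.unparse` discards comments and the original formatting. This is one reason dedicated codemod tools often work on concrete syntax trees instead.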
Emacs
Emacs is a highly extensible, customizable, and powerful text editor primarily used in the field of software development and computer programming. Developed by Richard Stallman and initially released in the 1970s, Emacs has since evolved into a versatile platform offering a wide array of features beyond basic text editing.
One of Emacs's defining characteristics is its capacity for extensive customization and extension through its built-in Lisp dialect, Emacs Lisp. Users can tailor Emacs to their specific needs by writing or installing custom scripts and packages, enabling functionality such as syntax highlighting, code completion, version control integration, and much more.
Emacs also offers a range of advanced editing features, including multiple buffers and windows, macro recording and playback, search and replace with regular expressions, and support for various programming languages and markup formats.
Moreover, Emacs provides a range of built-in modes and tools for various tasks, such as programming, text formatting, email, web browsing, and even games. It has a steep learning curve due to its extensive feature set and the need to become proficient with its keybindings and commands, but many users find that the investment pays off in terms of productivity and efficiency once mastered.
Emacs is available on various platforms, including Unix-like operating systems (such as Linux and macOS), Windows, and others, making it a popular choice among developers and power users across different computing environments.
Tree-sitter
Tree-sitter is a parser generator tool and incremental parsing library used primarily for processing programming languages. Originally developed at GitHub, it provides robust and efficient parsing for many programming languages.
At its core, Tree-sitter uses a bottom-up parsing approach to build a concrete syntax tree for source code, which tools can then navigate much like an AST. What sets it apart is incremental parsing: when the source code changes, it updates the existing tree instead of reparsing the whole file. This gives it significant performance advantages over traditional parsing techniques.
Tree-sitter is designed to be highly efficient, making it suitable for use cases where real-time or near-real-time parsing is required, such as in code editors and IDEs. Its incremental parsing capability allows code editors to provide features like syntax highlighting, code folding, code navigation, and intelligent code completion with minimal delay, even for large codebases.
Furthermore, Tree-sitter grammars are written in a declarative, JavaScript-based domain-specific language (a grammar.js file). This makes it relatively easy to specify grammars for new programming languages or to customize existing ones, and this flexibility allows Tree-sitter to support a wide range of programming languages and dialects.
Overall, Tree-sitter is a powerful tool for parsing and analyzing source code efficiently, making it a popular choice among developers building code editors, IDEs, static analysis tools, and other language processing applications.
treesit-explore-mode
treesit-explore-mode is a minor mode, built into Emacs 29 and later, that provides an interactive view for exploring the syntax trees generated by Tree-sitter. This mode is particularly useful for developers who work with languages supported by Tree-sitter and want to visually inspect the structure of their code.
When activated, treesit-explore-mode displays the syntax tree of the current buffer in a separate window alongside the original code buffer, and updates it whenever the code changes. The syntax tree is shown as a hierarchy in which each node corresponds to a syntactic construct in the code, such as a function, loop, conditional, or expression.
The explorer lets users navigate the syntax tree and see which part of the source each node covers: hovering over or clicking a node highlights the corresponding text in the code buffer. This helps when you are trying to understand the structure of complex code, spot syntax errors, or debug issues with syntax highlighting or code analysis.
Conversely, moving the cursor in the code buffer highlights the corresponding node in the syntax tree.
Overall, treesit-explore-mode enhances the development experience in Emacs by providing a convenient way to visualize and explore syntax trees generated by Tree-sitter, which aids code comprehension and analysis.
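To get a feel for the kind of hierarchy the explorer window presents, the sketch below prints an indented tree with one node per line and its source span. It uses Python's `ast` module rather than Tree-sitter, and the output format is a hypothetical approximation, not what Emacs actually renders. The function name `render_tree` is also illustrative.

```python
import ast


def render_tree(node: ast.AST, depth: int = 0) -> list[str]:
    """Render an AST as indented lines: one node per line, with its source span."""
    label = type(node).__name__
    if getattr(node, "end_lineno", None) is not None:
        # Span format: start_line:start_col-end_line:end_col (1-based lines,
        # 0-based UTF-8 byte columns, as reported by Python's ast module).
        label += f" [{node.lineno}:{node.col_offset}-{node.end_lineno}:{node.end_col_offset}]"
    lines = ["  " * depth + label]
    for child in ast.iter_child_nodes(node):
        if isinstance(child, ast.expr_context):
            continue  # skip Load/Store markers to keep the view compact
        lines.extend(render_tree(child, depth + 1))
    return lines


source = "def area(r):\n    return 3.14 * r * r\n"
print("\n".join(render_tree(ast.parse(source))))
```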
treesit-inspect-mode
treesit-inspect-mode is a companion minor mode that shows information about the syntax tree at the cursor position. Like treesit-explore-mode, it is particularly useful for developers working with languages supported by Tree-sitter who want insight into the structure of their code.
When activated, treesit-inspect-mode displays in the mode line the node that starts at point, along with its parent and any field names. This keeps the type of the node under the cursor and its position in the tree hierarchy visible without opening another window.
Its primary purpose is to let users inspect individual nodes interactively: as the cursor moves, the mode line updates to describe the node at the new position.
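The core idea behind reporting "the node at point" can be sketched in a few lines: given a cursor position, find the tightest syntax-tree node whose span covers it. The sketch below does this with Python's `ast` module. It is an analogue, not how Emacs implements it. treesit-inspect-mode reports nodes that start at point, whereas this hypothetical `node_at` helper returns the innermost node covering the position.

```python
import ast
from typing import Optional


def node_at(source: str, line: int, col: int) -> Optional[ast.AST]:
    """Return the innermost AST node whose source span covers (line, col).

    Lines are 1-based and columns are 0-based UTF-8 byte offsets, matching the
    position attributes of Python's ast module.
    """
    point = (line, col)
    best, best_key = None, None
    for node in ast.walk(ast.parse(source)):
        if getattr(node, "end_lineno", None) is None:
            continue  # Module, arguments, Load/Store, ... carry no position
        start = (node.lineno, node.col_offset)
        end = (node.end_lineno, node.end_col_offset)
        if start <= point < end:
            # A later start, or the same start with an earlier end, is a tighter
            # span. On an exact tie (e.g. an Expr and its Call), the node visited
            # later is kept.
            key = (start, (-end[0], -end[1]))
            if best_key is None or key >= best_key:
                best, best_key = node, key
    return best


source = "total = price * qty\n"
hit = node_at(source, 1, 9)
print(type(hit).__name__, ast.get_source_segment(source, hit))  # Name price
```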
Overall, treesit-inspect-mode complements treesit-explore-mode by offering a focused, detailed view of the node under the cursor. This helps with tasks such as debugging, code analysis, and code comprehension.
Epilogue
So we have seen what Tree-sitter is and what it can do inside Emacs. Before I started using Tree-sitter to explore ASTs, I relied on tools like AST Explorer to check them. Because this functionality is built into Emacs, I no longer have to switch between the browser and my text editor to work with ASTs, which is a real productivity boost in my day-to-day job.
Tell me what you think about Tree-sitter in Emacs in the comments section. I am sure there are many other things you can do with Tree-sitter in Emacs besides exploring ASTs.