Logo

dev-resources.site

for different kinds of informations.

Creating a Toy Solidity compiler and running it in a Toy EVM

Published at
11/20/2024
Categories
solidity
evm
ethereum
rust
Author
_akira19
Categories
4 categories in total
solidity
open
evm
open
ethereum
open
rust
open
Author
8 person written this
_akira19
open
Creating a Toy Solidity compiler and running it in a Toy EVM

I wanted to try making my own compiler and runtime environment, so I gave it a shot. It doesn’t cover all the syntax or opcodes yet, but I’m satisfied as long as it can run basic functionality. I’ll make some improvements if I feel like it.

Demo

contract TestContract {
    uint256 a = 1 + 2 * 3 + 4;

    function increment() returns (uint256) {
        a = a + 1;
        return a;
    }

    function compare(uint256 b) returns (bool) {
        return a < b;
    }
}
Enter fullscreen mode Exit fullscreen mode

When you compile the code, you can get the following bytecode.

7f00000000000000000000000000000000000000000000000000000000000000807f0000000000000000000000000000000000000000000000000000000000000040527f00000000000000000000000000000000000000000000000000000000000000047f00000000000000000000000000000000000000000000000000000000000000037f00000000000000000000000000000000000000000000000000000000000000027f00000000000000000000000000000000000000000000000000000000000000017f000000000000000000000000000000000000000000000000000000000000000b7f0000000000000000000000000000000000000000000000000000000000000000557f000000000000000000000000000000000000000000000000000000000000032f807f00000000000000000000000000000000000000000000000000000000000001515f395ff37f00000000000000000000000000000000000000000000000000000000000000807f0000000000000000000000000000000000000000000000000000000000000040527f00000000000000000000000000000000000000000000000000000000000000e07f0000000000000000000000000000000000000000000000000000000000000000351c7f00000000000000000000000000000000000000000000000000000000d09de08a147f0000000000000000000000000000000000000000000000000000000000000196577f00000000000000000000000000000000000000000000000000000000000000e07f0000000000000000000000000000000000000000000000000000000000000000351c7f00000000000000000000000000000000000000000000000000000000c4b8c257147f0000000000000000000000000000000000000000000000000000000000000283577f00000000000000000000000000000000000000000000000000000000000000007f0000000000000000000000000000000000000000000000000000000000000000fd5b7f00000000000000000000000000000000000000000000000000000000000000017f000000000000000000000000000000000000000000000000000000000000000054017f0000000000000000000000000000000000000000000000000000000000000000557f0000000000000000000000000000000000000000000000000000000000000000547f0000000000000000000000000000000000000000000000000000000000000080527f00000000000000000000000000000000000000000000000000000000000000207f0000000000000000000000000000000000000000000000000000000000000080f35b7f0000000000000000000000000000000000000000000000000000000000000004357f000000000000000000000000000000000000000000000000000000000000000054107f0000000000000000000000000000000000000000000000000000000000000080527f00000000000000000000000000000000000000000000000000000000000000207f0000000000000000000000000000000000000000000000000000000000000080f3
Enter fullscreen mode Exit fullscreen mode

demo is here. (I don't know why, but I can embed gif file on the article)
https://dev-to-uploads.s3.amazonaws.com/uploads/articles/8l5vmjmwwkp2iq85s7gd.gif

In the demo above, the following actions are performed:

  1. Run the EVM.
  2. Deploy the compiled bytecode (returns the contract code).
  3. Execute compare with argument b as 12 (the initial value of a is 11, so it returns true, which is 1).
  4. Execute increment (returns 12 in hexadecimal).
  5. Execute compare again with argument b as 12 (since 12 == 12, it returns false, which is 0).

BTW, if you make a mistake in the return type as follows:

contract TestContract {
    uint256 a = 1 + 2 * 3 + 4;

    function compare(uint256 b) returns (uint256) {
        return a < b; // Compilation error occurs here because the return type is mismatched
    }
}
Enter fullscreen mode Exit fullscreen mode

The error message looks like:

error message

EVM

The EVM is a stack-based virtual machine.

Memory

Pre-allocated Memory

While inspecting compiled bytecode, I noticed that mstore often takes 0x40, so I looked into it.

Depending on the version, this might differ, but here is the reference I checked:

https://docs.soliditylang.org/en/v0.8.27/internals/layout_in_memory.html

The first 128 bytes of memory have specific purposes:

  1. 0x00 - 0x3f (64 bytes): Scratch space for hashing methods.
  2. 0x40 - 0x5f (32 bytes): Current memory size (referred to as the free memory pointer).
  3. 0x60 - 0x7f (32 bytes): Zero slot.

  4. The scratch space is reserved for temporary use in hashing methods. If the allocated space is insufficient, it will use free memory.

  5. The free memory pointer specifies the next available memory address. This means that the first mstore operation must store data at 0x80 since the first 128 bytes (up to 0x7f) are already reserved. The typical first instruction in a contract is to mstore 0x80 at address 0x40.

  6. The zero slot is used as the default value for dynamic arrays. It remains filled with 0x00 and can be directly copied for initialization purposes.

Memory Allocation

Memory is allocated in 32-byte chunks.

Since the first 128 bytes are pre-allocated, the first writable memory address is 0x80.

Memory expands dynamically when needed. For example, if a 30-byte string is replaced with a 40-byte string, memory will automatically expand from 0x80-0x9f to 0x80-0xbf.

However, memory expansion incurs additional gas costs, so it should be managed carefully. Also, memory does not get freed automatically, even when it is no longer in use.

https://learnevm.com/chapters/evm/memory#memory-expansion

If the next memory space is already occupied, new memory is allocated at the end of the current memory.

https://medium.com/@PaulElisha1/layout-of-storage-and-memory-management-in-low-level-solidity-feb41c3de140

Storage

Slots

Unlike memory, storage is pre-allocated with 2^256-1 slots, each 32 bytes in size.

For fixed-size data types of 32 bytes or less, data is stored sequentially. For dynamically sized types, the following rules apply:

  • The slot stores the length of the data.
  • The actual data is stored in the slot calculated by keccak256(slot).

For example, if slot 3 is used:

  • Slot (3) stores the data length.
  • The actual data is stored at keccak256(3), e.g., 55b2c6.... If the data exceeds 32 bytes, subsequent slots after 55b2c6... are used. Arrays follow a similar pattern.

My implementation works only for strings of 32 bytes or less at the moment.

https://docs.alchemy.com/docs/smart-contract-storage-layout

For multidimensional arrays or mappings, slot numbers and key values are factored into the calculations.

https://docs.soliditylang.org/en/latest/internals/layout_in_storage.html

Currently, the implementation maps slots and values using Rust's HashMap, but in practice, a Merkle Patricia Trie is used.

Function Definitions

Functions are identified using a function selector, derived by taking the first 4 bytes of the keccak256 hash of the function signature.

https://docs.soliditylang.org/en/latest/abi-spec.html

For example, for a function like function mint(uint256),

keccak256(mint(uint256)) = 0xa0712d680358d64f694230b7cc0b277c73686bdf768385d55cd7c547d0ffd30e.

The function selector is 0xa0712d68.

When a transaction is sent, the first 4 bytes of the calldata contain the function selector. The contract compares this selector to the stored selector and jumps to the corresponding function if they match.

If you compile a simple contract in Remix and extract the opcodes for the function call, you will see something like this:

- CALLDATALOAD
- PUSH1 0xe0
- SHR
- DUP1
- PUSH4 0x6057361d
- EQ
- PUSH1 0x2a
- JUMPI
Enter fullscreen mode Exit fullscreen mode

Explanation:

  • CALLDATALOAD, PUSH1 0xe0, and SHR: Extract the first 4 bytes of the calldata (the function selector).
  • PUSH4 0x6057361d, EQ: Compare the extracted function selector with the stored selector.
  • If they match, PUSH1 0x2a and JUMPI jump to the function (0x2a is the program counter in the bytecode).

Calldata

The first 4 bytes of the calldata represent the function selector. The arguments are stored in 32-byte chunks following the selector.

For simplicity, this assumes all arguments fit within 32 bytes, so the number of arguments multiplied by 32 bytes follows the function selector.

Solidity Compiler

Syntax

Unfortunately, there are no public or private keywords in my implementation at the moment.

In fact, there are no if statements or for loops either. Additionally, functions cannot call other functions within them.

Types

Not all types are supported at the moment. Currently, only string, uint256, and bool are available.

While string is defined, it only works if it is 256 bits or smaller.

Constant Folding

Constant folding is a process where constant expressions are precomputed at compile time instead of being evaluated at runtime on the EVM.

For example, in the statement uint256 a = 1 + 2;, instead of executing 1 + 2 on the EVM, the compiler calculates it during compilation and outputs uint256 a = 3;.

This is straightforward for simple cases, but it gets tricky when the expression includes unknown values, such as function arguments.

For instance, in uint256 a = 1 + arg + 2; (where arg is a function argument), the ideal approach would be to compute uint256 a = 3 + arg; at compile time and execute only 3 + arg on the EVM. However, due to complexity, if an unknown value is encountered, the entire expression is left for the EVM to calculate.

Deployment Code

In the EVM, there are two types of bytecode: deployment bytecode and deployed contract bytecode.

The deployed contract bytecode is included within the deployment bytecode and is eventually deployed using the codecopy opcode (technically, it is deployed at the point of RETURN).

When compiled with solc, the bytecode contains additional code before and after the contract code. In toythereum, the deployment process follows these steps:

  1. Temporarily store all contract code without compiling functions until the entire contract code is scanned.
  2. Compile non-function code (e.g., initializing storage variables).
  3. Add deployment-specific instructions such as codecopy (use placeholders for contract code length as it is unknown at this stage).
  4. Compile the functions (this becomes the contract code).
  5. Calculate the length of the compiled functions and insert the length into the placeholders in the deployment instructions.

As a result, the final bytecode consists of two parts:

  • The first part handles initialization (e.g., storage setup) and deployment.
  • The second part contains the actual contract bytecode.

References

evm Article's
30 articles in total
Favicon
Ever wonder what happens when you send a transaction on Ethereum? 👀
Favicon
TEEs: The Secret Sauce Making Ethereum Rollups Faster and Simpler
Favicon
How to List Held Tokens by an Address Using the Moralis API
Favicon
Ethereum Transaction Calls and State Changes
Favicon
Creating a Toy Solidity compiler and running it in a Toy EVM
Favicon
The delegatecall Function in Solidity
Favicon
The delegatecall Function in Solidity
Favicon
ERC-4337 Shared Mempool: Official Launch on Ethereum Mainnet, Arbitrum and Optimism
Favicon
Understanding Fallback and Receive Functions in Solidity
Favicon
vyper挺好玩的
Favicon
Understanding EVM(Ethereum Virtual Machine)
Favicon
designing the skeleton
Favicon
building free speech forever
Favicon
I've made Huff docs Simple Storage page simpler for newcomers
Favicon
Polygon Nodes: Types and Usage
Favicon
BasilicaEVM: A modern dApp Stack
Favicon
EVM Reverse Engineering Challenge 0x02
Favicon
EVM Reverse Engineering Challenge 0x03
Favicon
Implementing Earned Value Management EVM in Government Projects
Favicon
EVM Reverse Engineering Challenge 0x01
Favicon
EVM Reverse Engineering Challenge 0x00
Favicon
Monad: Diving Deep into the L1 Choice
Favicon
Confidential Smart Contracts & Building w/Oasis Sapphire
Favicon
How To Setup A Berachain React Native Expo dApp With WalletConnect
Favicon
Chainsight Hands-on: Subscribe to On-Chain Events & Collect
Favicon
Creating a multi-chain voting in 30 minutes with Chainsight
Favicon
Oráculos Descentralizados em Blockchain: Conectando o Mundo Real à Blockchain.
Favicon
Swisstronick's Innovative Use Cases
Favicon
How Binary Heaps Are Utilized In A Leveraged Trading Protocol On EVM
Favicon
Deploy a Smart Contract on Polygon (MATIC)

Featured ones: