EVM Part II: The Journey of Smart Contracts from Solidity code to Bytecode

In this section, we dive a bit deeper into the technical side of the Ethereum Virtual Machine.

The purpose of this part is to build a better understanding of a smart contract's entire journey, from compilation to deployment to execution, and to create strong mental models around it.

An eagle-eye glance at a smart contract's journey will eventually lead to these 4 steps:

  • Development & Compilation of Smart Contracts
  • Deployment of Smart Contract
  • Initialization of Smart Contract (Execution of init code)
  • Execution of Smart Contract (More about this in the next parts of this series)

The aim of this article is to provide not just high-level details but also dive deep into the technical aspects of smart contract compilation, bytecode, ABI, opcodes, instructions, etc.

In simpler terms, we will cover everything that happens from the moment you finish compiling your smart contract until its constructor has been executed and the contract's state is initialized.

Let's get started.


Journey of a Smart Contract


Let's start from the very basics and write a super simple smart contract that allows setting and retrieving a uint variable called pointer.

We will use this contract as an example as we witness its journey from simple Solidity code to bytecode. (We shall tweak the contract if need be, but not in this part).

// SPDX-License-Identifier: GPL-3.0

pragma solidity 0.8.17;

/**
 * @title Test
 * @dev Sets and Gets a uint variable called Pointer
 */
contract Test{

    uint256 public pointer;

    constructor() {
        pointer = 100;
    }

    function setPointer(uint256 _num) public {
        pointer = _num;
    }

    /**
     * @dev Returns the current value of pointer
     * @return value of 'pointer'
     */
    function getPointer() external view returns (uint256) {
        return pointer;
    }
} 

Now that you have your contract ready, this is where the next step comes in, i.e., Compilation.

Compilation

What happens when a Smart Contract compiles?

As soon as you compile a smart contract, it produces two very significant items:

  1. Bytecode
  2. Application Binary Interface (ABI)

So, when we compile our above-mentioned Test contract, it leads to:

  • This ByteCode👇
608060405234801561001057600080fd5b50606460008190555061017f806100286000396000f3fe608060405234801561001057600080fd5b50600436106100415760003560e01c80632f5f3b3c14610046578063a32a3ee414610064578063acfee28314610082575b600080fd5b61004e61009e565b60405161005b91906100d0565b60405180910390f35b61006c6100a4565b60405161007991906100d0565b60405180910390f35b61009c6004803603810190610097919061011c565b6100ad565b005b60005481565b60008054905090565b8060008190555050565b6000819050919050565b6100ca816100b7565b82525050565b60006020820190506100e560008301846100c1565b92915050565b600080fd5b6100f9816100b7565b811461010457600080fd5b50565b600081359050610116816100f0565b92915050565b600060208284031215610132576101316100eb565b5b600061014084828501610107565b9150509291505056fea2646970667358221220a1012465f7be855f040e95566de3bbd50542ba31a7730d7fea2ef9de563a9ac164736f6c63430008110033
Bytecode of Test Contract
  • And, this ABI 👇
[
	{
		"inputs": [],
		"stateMutability": "nonpayable",
		"type": "constructor"
	},
	{
		"inputs": [],
		"name": "getPointer",
		"outputs": [
			{
				"internalType": "uint256",
				"name": "",
				"type": "uint256"
			}
		],
		"stateMutability": "view",
		"type": "function"
	},
	{
		"inputs": [],
		"name": "pointer",
		"outputs": [
			{
				"internalType": "uint256",
				"name": "",
				"type": "uint256"
			}
		],
		"stateMutability": "view",
		"type": "function"
	},
	{
		"inputs": [
			{
				"internalType": "uint256",
				"name": "_num",
				"type": "uint256"
			}
		],
		"name": "setPointer",
		"outputs": [],
		"stateMutability": "nonpayable",
		"type": "function"
	}
]
ABI of Test Contract

So Far So Good - But what exactly are these?

Bytecode and ABI

Before understanding ABI, let's quickly understand the concept of bytecodes. This will eventually help us get better clarity on the significance of ABI as well.

Bytecode, in simple terms, is a sequence of instructions (opcodes) that defines how a smart contract should be executed by the EVM. It is the core component that the EVM uses to process and execute smart contracts. (more on this in detail later)

Bytecode, as can be clearly seen from our example contract above, is not at all human-readable.

These hexadecimal opcodes are meant for machines: the EVM can interpret and act on them directly, but people can't.

So interacting with raw bytecode to perform a certain action in the smart contract isn't practical for humans, simply because it's not readable.

Well, then how do we perform any action on a smart contract?

Enter the Application Binary Interface (ABI)

Ever heard of APIs (Application Programming Interfaces)?
An API, in the world of computer science, defines how two pieces of software interact with each other. This is what allows you to interact with any given network, backend service, or library.

ABI, in the world of Ethereum blockchain, represents something very similar.

ABIs define a standard mechanism for interacting with smart contracts. These are basically human-readable interfaces that enable us to interact with the complicated EVM bytecode of a smart contract.

These interfaces are extremely crucial as they enable interactions between applications and smart contracts or even contracts to contracts.

For instance, in our example contract above, we can clearly see how the ABI of the Test contract defines every detail about the function names, their stateMutability, the argument types, etc.
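To make this concrete, here is a minimal Python sketch that reads the ABI of the Test contract (copied from above) and lists each function in the name(type1,type2) form. This signature form is what the Solidity ABI specification hashes to derive a function's 4-byte selector. The helper name function_signatures is made up for this example.

```python
import json

# ABI of the Test contract, copied from the compilation output above.
TEST_ABI = json.loads("""
[
  {"inputs": [], "stateMutability": "nonpayable", "type": "constructor"},
  {"inputs": [], "name": "getPointer",
   "outputs": [{"internalType": "uint256", "name": "", "type": "uint256"}],
   "stateMutability": "view", "type": "function"},
  {"inputs": [], "name": "pointer",
   "outputs": [{"internalType": "uint256", "name": "", "type": "uint256"}],
   "stateMutability": "view", "type": "function"},
  {"inputs": [{"internalType": "uint256", "name": "_num", "type": "uint256"}],
   "name": "setPointer", "outputs": [],
   "stateMutability": "nonpayable", "type": "function"}
]
""")


def function_signatures(abi):
    """Return 'name(type1,type2)' for every function entry in the ABI."""
    signatures = []
    for entry in abi:
        if entry["type"] != "function":
            continue  # skip the constructor (and events, errors, ...)
        arg_types = ",".join(arg["type"] for arg in entry["inputs"])
        signatures.append(f'{entry["name"]}({arg_types})')
    return signatures


for signature in function_signatures(TEST_ABI):
    print(signature)
```

Note that the public state variable pointer shows up as a function too, because the compiler generates a getter for it.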

These details are then used to encode contract calls that are made to the EVM so that the virtual machine can read, understand and execute these instructions. Solidity provides very clear specifications on encoding and decoding of contract ABIs which we will explore later.
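As a rough illustration of that encoding, the sketch below builds the calldata for a setPointer(100) call: a 4-byte function selector followed by the argument padded to 32 bytes. The selector 0xacfee283 is read off the dispatcher in the bytecode above and is assumed here to be the one for setPointer(uint256); the helper names are hypothetical.

```python
# Selector taken from the dispatcher in the bytecode above; assumed to be
# the one that belongs to setPointer(uint256).
SET_POINTER_SELECTOR = bytes.fromhex("acfee283")


def encode_uint256(value):
    """ABI-encode an unsigned integer as one 32-byte big-endian word."""
    if not 0 <= value < 2**256:
        raise ValueError("value does not fit in uint256")
    return value.to_bytes(32, "big")


def encode_set_pointer(num):
    """Build calldata: 4-byte selector followed by the encoded argument."""
    return SET_POINTER_SELECTOR + encode_uint256(num)


calldata = encode_set_pointer(100)
print(calldata.hex())
```

The result is 36 bytes long: 4 bytes that tell the contract which function to run, and 32 bytes that carry the value 100 (0x64).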

To quickly summarize the ABI vs Bytecode discussion:

While bytecode is the complex machine-readable instructions for EVM to execute smart contracts, ABIs are human-readable interfaces that provide a standard procedure to enable contract interactions either from off-chain or contract-to-contract interaction.

Understanding Bytecode

We already discussed the basics of bytecode in the section above. Now we shall dive a bit deeper into bytecode and try to understand a lot of the fun stuff that happens behind the scenes.

Before we dive in...

It's important to get a quick recap on 2 fundamental concepts before we proceed further - Accounts & Transactions.

Accounts

  • We learned in Part 1 of this series about the 2 different types of accounts that exist on the Ethereum chain, i.e.,  Externally Owned Accounts(EOAs) and Contract Accounts.  In this part, we learn more about Contract Accounts.
  • Contract Accounts are different from EOAs as they are a bit more complex.
  • A contract account can store code (the contract's bytecode), and that code controls the account, unlike EOAs, which are controlled by their private key.
  • This contract bytecode is stored in a separate virtual ROM (read-only memory). Since the contract's bytecode can only be read and not modified, this plays a significant role in making smart contracts immutable.

Transactions

As we clearly know by now, everything starts with a transaction in the world of blockchain.

Transactions in Ethereum can be categorized into 3 specific types:

  • Regular transactions - Simple transactions from one account to another.
  • Contract Deployment Transactions - These are special transactions that don't include any recipient address. The data field in such transactions carries the bytecode of the contract being deployed.
  • Contract Execution Transactions - These transactions are triggered to interact with smart contracts that are deployed on-chain. The recipient address for such transactions is the address of the contract being interacted with.
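The three categories above can be told apart with two pieces of information: whether the transaction has a recipient, and whether that recipient is a contract account. The sketch below is only an illustration; the function and parameter names are made up, and the addresses in the usage lines are placeholders.

```python
from typing import Optional


def classify_transaction(to: Optional[str], recipient_has_code: bool = False) -> str:
    """Classify a transaction using the three categories listed above.

    `to` is the recipient address (None when the field is empty) and
    `recipient_has_code` says whether that address is a contract account.
    Both parameter names are hypothetical.
    """
    if to is None:
        # No recipient: the data field carries the contract's bytecode.
        return "contract deployment"
    if recipient_has_code:
        # The recipient is a contract, so its code gets executed.
        return "contract execution"
    return "regular"


print(classify_transaction(None))
print(classify_transaction("0x" + "11" * 20, recipient_has_code=True))
print(classify_transaction("0x" + "22" * 20))
```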

In this section, we focus more on the Contract Deployment transaction type. 

Done with the recap?

Alright, now let's proceed with understanding bytecode in detail.  

Basics of Bytecode

Humans understand Solidity,
EVM understands bytecode

In simple terms, bytecode is the low-level language that our smart contracts, written in Solidity (a high-level programming language), get translated to.

It is technically a long sequence of opcodes, i.e., instructions that define how a particular smart contract is supposed to behave.

Most importantly, these instructions are easily understandable by the EVM, which allows it to interpret and execute smart contracts accurately.

Every single opcode basically represents a certain operation or action that must be performed on the EVM stack to get the desired outputs.

💡
Fun Fact: Each opcode is 1 byte long, hence Bytecodes.
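One detail worth adding: the PUSH opcodes are followed by the data they push, so not every byte in the bytecode is an opcode. The small sketch below (the helper name immediate_size is made up) shows how many data bytes follow a given opcode byte.

```python
def immediate_size(opcode: int) -> int:
    """Number of data bytes that follow the given opcode byte.

    Only PUSH1 (0x60) through PUSH32 (0x7f) carry inline data; every other
    opcode stands alone.
    """
    if 0x60 <= opcode <= 0x7F:
        return opcode - 0x5F
    return 0


# The first two bytes of the Test contract's bytecode: PUSH1 followed by 0x80.
first_bytes = bytes.fromhex("6080")
print(immediate_size(first_bytes[0]))  # number of data bytes after PUSH1
```

So the leading 6080 in our bytecode is one instruction, PUSH1 80, and not two separate opcodes.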

Buckle up! Things start to get super-interesting from here on. 🦾

Creation and Runtime Code

Although bytecodes seem to be some complex machine-readable gibberish, they can further be categorized into 2 different types:

  1. Creation Code, and
  2. Runtime Code

Let's decipher both of them.

Creation Code

Creation code, as the name clearly depicts, is part of the bytecode that is responsible for the creation of the contract.

The sole purpose of the creation code is to initialize and set up the contract being deployed and make it ready for further execution.

This is the part of the bytecode that includes the constructor logic, its parameters, the free memory pointer setup (more on this later), the initialization of state variables, etc.

An imperative point that one must know about creation code is that it's only executed by the EVM once.

Creation code mainly acts as a set of instructions (To-Dos) that the EVM must perform in order to deploy the contract adequately as well as initialize state variables as per the constructor.

💡
The 'AHA' Moment
The constructor logic of a smart contract is part of the Creation Code. And the creation code can only be executed once by the EVM.

Therefore, constructors of any smart contract are One-Time executable functions and cannot be called once executed.

For instance, in our Test contract example, do you remember how the pointer state variable was set to 100 inside the constructor? Well, that specific action happens during the execution of the creation code.

    constructor() {
        pointer = 100;
    }

One more crucial point.

The creation code also includes the logic to generate and return the runtime code of the contract which is stored on-chain within the deployed smart contract address.

This means that the creation code is never stored on-chain. It's the runtime bytecode that is stored on-chain for further execution of the contract.

As we can clearly see, there are a few really major actions that are performed by the creation code.

So let's summarize to get a better idea.

There are 3 significant details about this creation code that we must keep in mind:

  • Creation code is executed only once, at the time of contract deployment. And never again.
  • It includes the constructor logic, arguments, etc. This is part of the bytecode that instructs the EVM to set up the constructor, initialize state variables in the constructor, etc.
  • It is responsible for returning the runtime bytecode, which the EVM then stores on-chain.

While these 3 are the main actions performed by the creation code of a smart contract, there are a couple of other interesting procedures that take place during the execution of the creation code.

We will learn about it in the next sections below.

Runtime Bytecode

Runtime bytecode, unlike creation code, is part of the bytecode that actually gets stored on-chain and defines the smart contract.

Unlike creation code, this part of the bytecode doesn't contain the constructor logic.

Since this is the part that is stored on-chain, it mainly includes every other opcode necessary for the EVM to interpret and execute the smart contract whenever there is an external call triggering the contract.

In other words, any on-chain interaction you do with a smart contract technically means an interaction with the runtime bytecode of the smart contracts which gets executed behind the scenes by the EVM.

Before You Proceed further - Read This 👇

When it comes to bytecode, you might come across different terminologies around, which can be very confusing.

Ideally, there are just 2 ways you can categorize bytecodes, i.e., Creation Code & Runtime Code.

However, there are a few other terminologies like Deployed bytecode, init code, etc, in the Ethereum world, often used to define similar things.

Therefore, it's highly recommended to read this article by Shane to eliminate any confusion around different terminologies.

Quick Comparison of Creation & Runtime Bytecode

Let's take a quick look at the bytecode of our very own Test contract, mentioned above.

You can easily get them using simple solc commands.

  • Set up a hardhat project
  • Paste the Test contract(Test.sol) into its contract folder
  • Simply run the following solc commands:

To get the Complete Bytecode (Creation + Runtime Bytecode), run 👇

solc --bin contracts/Test.sol

To get only the Runtime Bytecode, run

solc --bin-runtime contracts/Test.sol
  1. Creation Code
0x608060405234801561001057600080fd5b50606460008190555061017f806100286000396000f3fe608060405234801561001057600080fd5b50600436106100415760003560e01c80632f5f3b3c14610046578063a32a3ee414610064578063acfee28314610082575b600080fd5b61004e61009e565b60405161005b91906100d0565b60405180910390f35b61006c6100a4565b60405161007991906100d0565b60405180910390f35b61009c6004803603810190610097919061011c565b6100ad565b005b60005481565b60008054905090565b8060008190555050565b6000819050919050565b6100ca816100b7565b82525050565b60006020820190506100e560008301846100c1565b92915050565b600080fd5b6100f9816100b7565b811461010457600080fd5b50565b600081359050610116816100f0565b92915050565b600060208284031215610132576101316100eb565b5b600061014084828501610107565b9150509291505056fea26469706673582212206a433c2968ca8580b1ef7783748d3a3732df8255700b5fd10744fdad4a1cd50364736f6c63430008110033
Creation bytecode of the TEST Contract

2. Runtime Code

0x608060405234801561001057600080fd5b50600436106100415760003560e01c80632f5f3b3c14610046578063a32a3ee414610064578063acfee28314610082575b600080fd5b61004e61009e565b60405161005b91906100d0565b60405180910390f35b61006c6100a4565b60405161007991906100d0565b60405180910390f35b61009c6004803603810190610097919061011c565b6100ad565b005b60005481565b60008054905090565b8060008190555050565b6000819050919050565b6100ca816100b7565b82525050565b60006020820190506100e560008301846100c1565b92915050565b600080fd5b6100f9816100b7565b811461010457600080fd5b50565b600081359050610116816100f0565b92915050565b600060208284031215610132576101316100eb565b5b600061014084828501610107565b9150509291505056fea26469706673582212206a433c2968ca8580b1ef7783748d3a3732df8255700b5fd10744fdad4a1cd50364736f6c63430008110033
Runtime bytecode of the TEST Contract

If you observe carefully, the creation code appears to be a bit larger than the runtime code.

That's because the creation code has a bunch of extra opcodes at the very beginning which aren't part of the runtime bytecode.
Those extra opcodes can be seen below. 👇

608060405234801561001057600080fd5b50606460008190555061017f806100286000396000f3fe

What are these extra opcodes in the creation code? Any guesses? 🤔

Yes, you perhaps guessed it right.

This is part of the creation code that deals with the constructor logic, its parameters, generation & storage of bytecode, and a few other things that we will discuss soon.

This part of the creation code is the first one to be executed by EVM, during any contract deployment and never becomes part of the runtime bytecode that sits on-chain.
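As a quick sanity check, the size of this extra prefix can be measured with a couple of lines of Python. The hex string is the prefix shown above, split in two only to keep the lines short.

```python
# The extra bytes found at the start of the Test contract's creation code.
INIT_CODE_HEX = (
    "608060405234801561001057600080fd5b50"
    "606460008190555061017f806100286000396000f3fe"
)

init_code = bytes.fromhex(INIT_CODE_HEX)
print(len(init_code), hex(len(init_code)))  # size in decimal and in hex
print(hex(init_code[-1]))                   # the last byte of the prefix
```

The prefix is 40 bytes (0x28) long and its last byte is 0xfe, the INVALID opcode.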

Note: From here on, we may refer to these extra opcodes as init code.
🗒️
init code simply means the part of the creation bytecode that deals with initializing and setting up the contract's constructor.

Deployment and Initialization of Smart Contract

Now that you have your contract written and compiled and you understand ABI and Bytecode that are achieved after compilation, it's time to proceed to the next steps.

The next step in the journey of your smart contract is the Deployment and Initialisation of its States.

It's time to expand our understanding of every single action that takes place during the execution of the init code(those extra bytecodes at the beginning of the creation code).

Tools you may need

In order to follow along and simulate some of the steps mentioned below, you can either use the Remix Debugger or even better, the EVM Playground.

To demonstrate our example, we will use the EVM Playground, simply because it's far simpler to use for the given example and I LOVE it. ❤️

Follow the steps below to set it up:

  • Go to EVM Playground
  • Click on the dropdown and select the Bytecode option.
  • Copy the init code and paste it into the left section of the interface.
608060405234801561001057600080fd5b50606460008190555061017f806100286000396000f3fe
  • At any given time while stepping through opcodes, the section below shall represent the state of Memory, Stack, Storage, or Return values
  • This means, if any opcode pushes some data onto the stack, the data will be shown in the Stack panel.

This gives you a visual way to learn what each opcode does.

✍️
For Opcodes List
Additionally, we will dive deep into some important opcodes available in the init code, in the next sections below.

In case you need extensive details about any particular opcode discussed below, you can always search and learn about them
HERE or HERE.

Alright, now that the tool is ready, let's begin.


Coming back to our Init Code...

We are now going to decipher every single opcode that is part of the init code, i.e., part of the Creation code responsible for initializing the constructor and a few other crucial actions.

Let's break these down into a more readable format. If we translate each opcode into its mnemonic, this is what we get.

The following group of opcodes (init code) 👇

608060405234801561001057600080fd5b50606460008190555061017f806100286000396000f3fe

can also be displayed as 👇

[00]	PUSH1	80
[02]	PUSH1	40
[04]	MSTORE	
[05]	CALLVALUE	
[06]	DUP1	
[07]	ISZERO	
[08]	PUSH2	0010
[0b]	JUMPI	
[0c]	PUSH1	00
[0e]	DUP1	
[0f]	REVERT	
[10]	JUMPDEST	
[11]	POP	
[12]	PUSH1	64
[14]	PUSH1	00
[16]	DUP2	
[17]	SWAP1	
[18]	SSTORE	
[19]	POP	
[1a]	PUSH2	017f
[1d]	DUP1	
[1e]	PUSH2	0028
[21]	PUSH1	00
[23]	CODECOPY	
[24]	PUSH1	00
[26]	RETURN	
[27]	INVALID
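If you'd like to reproduce this listing yourself, a tiny disassembler is enough. The sketch below is not a full EVM disassembler: its opcode table only covers the opcodes that appear in this init code, and anything else is printed as UNKNOWN.

```python
INIT_CODE_HEX = (
    "608060405234801561001057600080fd5b50"
    "606460008190555061017f806100286000396000f3fe"
)

# Mnemonics for the opcodes that appear in this init code only.
OPCODES = {
    0x15: "ISZERO", 0x34: "CALLVALUE", 0x39: "CODECOPY", 0x50: "POP",
    0x52: "MSTORE", 0x55: "SSTORE", 0x57: "JUMPI", 0x5B: "JUMPDEST",
    0x80: "DUP1", 0x81: "DUP2", 0x90: "SWAP1", 0xF3: "RETURN",
    0xFD: "REVERT", 0xFE: "INVALID",
}


def disassemble(code):
    """Turn raw bytecode into '[offset] MNEMONIC operand' lines."""
    lines = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        if 0x60 <= op <= 0x7F:  # PUSH1..PUSH32 carry 1..32 data bytes
            size = op - 0x5F
            operand = code[pc + 1 : pc + 1 + size].hex()
            lines.append(f"[{pc:02x}] PUSH{size} {operand}")
            pc += 1 + size
        else:
            name = OPCODES.get(op, f"UNKNOWN_{op:02x}")
            lines.append(f"[{pc:02x}] {name}")
            pc += 1
    return lines


for line in disassemble(bytes.fromhex(INIT_CODE_HEX)):
    print(line)
```

The number in square brackets is the byte offset of the instruction, which is why it skips ahead after every PUSH.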

Let's convert all of these opcodes into something that we humans can understand.😑

In short, these opcodes(init code) basically instruct the EVM to do 4 main tasks:

  • Assigning a Free-Memory Pointer
  • Validating Non-Payable Constructor Check
  • Initializing state variables as per the constructor
  • Returning and storing the Runtime Bytecode

We will now dive in deep and understand each of these actions, especially the highlighted terminologies of the bullet points above.

1. Assigning a Free memory Pointer

Instruction [00] to [04]

[00]	PUSH1	80
[02]	PUSH1	40
[04]	MSTORE

--------------------------
Deciphering the Opcodes:

[00]	PUSH1	80	- Push 0x80 (128 in decimal) onto the stack
[02]	PUSH1	40	- Push 0x40 (64 in decimal) onto the stack
[04]	MSTORE		- Store 0x80 in memory at position 0x40

This is one of the most crucial parts of the bytecode of almost every smart contract that is deployed.

This is where the free memory pointer is set up: the value 0x80 is stored at memory position 0x40.

Wait, What exactly is a Free Memory Pointer? 🤔

Free memory pointer can be defined as the pointer to the portion of memory that is unused and available for use to write any data.

It plays a significant role in preventing data from being overwritten at any given part of the memory.

At any given point in time, this pointer tells us which part of the memory is available and can be used to store data without overwriting anything that is already there.

At this point, a very obvious question that might pop up in your mind is:  

What happens after a Free Memory is used?

Well, Solidity-compiled code is quite disciplined when dealing with memory.

Whenever there is a need to store any given data in memory, the compiled code performs this action in two steps:

  1. Fetch the Free Memory (using the free memory pointer)
    In order to store the data, it fetches the location of the free memory first.

    It's easy to do, as the position of the free memory is always stored at location 0x40 (the free memory pointer).
  2. Update the Free Memory Pointer to a new position
    After it uses the free memory, it updates the free memory pointer to the next position in memory that is free and ready to use.

    This ensures that the pointer always refers to the next free space.

    And, therefore, it never leads to any memory overwriting (unless we make the terrible mistake of doing so ourselves while writing assembly code).
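The two steps above can be sketched in a few lines of Python. This is a simplified model and not how the EVM is implemented: memory is a fixed-size bytearray, and allocate is a hypothetical helper that mimics the fetch-then-update pattern.

```python
FREE_MEMORY_POINTER_SLOT = 0x40  # where the pointer itself lives
memory = bytearray(0x180)        # a small, fixed-size stand-in for EVM memory


def mstore(offset, value):
    """Write a 32-byte big-endian word, like the MSTORE opcode."""
    memory[offset : offset + 32] = value.to_bytes(32, "big")


def mload(offset):
    """Read a 32-byte big-endian word, like the MLOAD opcode."""
    return int.from_bytes(memory[offset : offset + 32], "big")


def allocate(size):
    """Hypothetical helper: reserve `size` bytes and bump the pointer."""
    start = mload(FREE_MEMORY_POINTER_SLOT)         # step 1: fetch
    mstore(FREE_MEMORY_POINTER_SLOT, start + size)  # step 2: update
    return start


mstore(0x40, 0x80)      # PUSH1 80, PUSH1 40, MSTORE
first = allocate(32)    # gets 0x80, pointer moves to 0xa0
second = allocate(64)   # gets 0xa0, pointer moves to 0xe0
print(hex(first), hex(second), hex(mload(0x40)))
```

Each allocation starts where the previous one ended, so the two regions never overlap.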

We shall learn about an example of fetching and updating free memory in the next parts of this series. Stay Tuned.

2. The Non-Payable Constructor Check

Instruction [05] to [11]

[05]	CALLVALUE
[06]	DUP1
[07]	ISZERO
[08]	PUSH2	0010
[0b]	JUMPI
[0c]	PUSH1	00
[0e]	DUP1
[0f]	REVERT
[10]	JUMPDEST
[11]	POP

---------------------------

Quick Note on JUMP, JUMPI & JUMPDEST 📝

If you are unaware of these 3 opcodes yet, it might be better to take quick notes on what they are and what they do.

As the name suggests these opcodes mainly help in jumping and moving the execution flow to a specific location.

  • JUMPDEST: This opcode simply marks a valid location to jump to. This means, although JUMP and JUMPI opcodes can shift the execution flow to any location, the target location for both of them must always contain a JUMPDEST. Otherwise, it isn't a valid jump, and execution fails.
  • JUMP: This opcode takes the topmost value from the stack and moves the execution to that particular location.
  • JUMPI: This is similar to JUMP, except that it is a conditional jump. It only jumps when a condition is met.

What Conditions?

JUMPI takes the jump destination from the 1st position of the stack and the condition from the 2nd:

a. If the 2nd Position of the Stack is a NON-ZERO value, then JUMP

b. If the 2nd Position of the STACK is ZERO value, then DO NOT JUMP
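Here is a small Python sketch of that rule. It is a simplification: it only checks that the target byte is a JUMPDEST, while real clients also make sure the target is not a data byte inside a PUSH instruction. The names jumpi and InvalidJump are made up for this example.

```python
INIT_CODE = bytes.fromhex(
    "608060405234801561001057600080fd5b50"
    "606460008190555061017f806100286000396000f3fe"
)
JUMPDEST = 0x5B


class InvalidJump(Exception):
    """Raised when the jump target is not a JUMPDEST."""


def jumpi(code, pc, stack):
    """Return the next program counter after a JUMPI located at `pc`.

    `stack` is a list whose last element is the top of the stack.
    """
    destination = stack.pop()  # 1st stack item: where to jump
    condition = stack.pop()    # 2nd stack item: whether to jump
    if condition == 0:
        return pc + 1          # fall through to the next instruction
    if destination >= len(code) or code[destination] != JUMPDEST:
        raise InvalidJump(hex(destination))
    return destination


print(hex(jumpi(INIT_CODE, 0x0B, [1, 0x10])))  # condition is non-zero: jump
print(hex(jumpi(INIT_CODE, 0x0B, [0, 0x10])))  # condition is zero: no jump
```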

Deciphering the Opcodes

  • [05] CALLVALUE    - Fetches the wei amount sent with the transaction and pushes it onto the stack
  • [06] DUP1                  - Duplicates the 1st (topmost) stack item
  • [07] ISZERO            - Pops the topmost item and pushes 1 if it was zero, otherwise 0
  • [08] PUSH2 0010 - Pushes 2 bytes onto the stack. In this case, it pushes 0x0010.
Note: 0x10 is the location of the JUMPDEST instruction. Therefore, we push 0x10 to the stack here.
  • [0b] JUMPI              - Jumps to the location given by the topmost stack item if the 2nd item is non-zero. In this case, the location is 0x10, i.e., the JUMPDEST instruction, and the jump happens only when no wei was sent.
  • [0c] PUSH1 00     - Skipped when the jump is taken
  • [0e] DUP1               - Skipped when the jump is taken
  • [0f] REVERT           - Skipped when the jump is taken. If wei was sent, execution reaches this point and reverts with empty return data.
  • [10] JUMPDEST   - The jump destination. Control reaches here after the JUMPI.
  • [11] POP                  - Pops the topmost item (the leftover call value) from the stack.

All of these opcodes can basically be boiled down to the following actions:

  • Check if the msg.value (amount of wei) sent with the contract transaction is greater than zero.
  • If the msg.value is NOT greater than zero, then proceed with further steps.
  • However, if the msg.value is actually greater than ZERO (ether was sent with contract creation transaction) but the constructor isn't marked as payable, then REVERT and stop any further execution.
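To see both outcomes side by side, the sketch below replays instructions [05] to [11] by hand on a Python list. It is a hand-written trace and not an interpreter; the function name non_payable_check is made up.

```python
def non_payable_check(callvalue):
    """Replay instructions [05] to [11]; the last list item is the stack top."""
    stack = []
    stack.append(callvalue)                             # [05] CALLVALUE
    stack.append(stack[-1])                             # [06] DUP1
    stack.append(1 if stack.pop() == 0 else 0)          # [07] ISZERO
    stack.append(0x10)                                  # [08] PUSH2 0010
    destination, condition = stack.pop(), stack.pop()   # [0b] JUMPI
    if condition == 0:
        stack.append(0)                                 # [0c] PUSH1 00
        stack.append(stack[-1])                         # [0e] DUP1
        stack.pop()                                     # [0f] REVERT pops the offset
        stack.pop()                                     #      and the size (both 0)
        return "revert", stack
    assert destination == 0x10                          # [10] JUMPDEST
    stack.pop()                                         # [11] POP
    return "continue", stack


print(non_payable_check(0))  # no wei sent
print(non_payable_check(5))  # 5 wei sent to a non-payable constructor
```

With no wei attached, the jump is taken and the stack ends up empty. With any non-zero value, execution falls through to REVERT.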
💡
Fun Fact:
All these checks exist because the constructor is not marked as payable. Therefore, the EVM needs to put in some extra work to ensure we aren't passing non-zero ETH values to a non-payable constructor.

It means, if we mark the constructor as payable, we eliminate these extra opcodes. This means less work for EVM and eventually less gas consumption.

Thus marking a constructor as payable can actually save GAS.
But should we do that, just for saving GAS? 🤔

Read more about this interesting fact in the article.

Curious about what might happen if the constructor had a payable keyword?

No worries, we shall cover that in the next part of this series.


3. Initialization of States

Instruction [12] to [19]

[12]	PUSH1	64
[14]	PUSH1	00
[16]	DUP2
[17]	SWAP1
[18]	SSTORE
[19]	POP

----------------------

Deciphering the Opcodes

  • [12] PUSH1 64 - Push 0x64 to stack, i.e., 100 in decimal
  • [14] PUSH1 00    - Push 00 to stack, i.e., 0 in decimal
  • [16] DUP2       - Duplicates the 2nd stack item and pushes the copy on top
  • [17] SWAP1        - Swaps the 1st stack item with the 2nd one
  • [18] SSTORE     - Stores 0x64 at slot zero
  • [19] POP              - Pops the topmost item (the leftover 0x64), leaving the stack empty.

Now that the EVM  is done with its initial procedure of allocating free memory pointers, validating the constructor's payable checks, etc, it's time for it to start looking at our constructor body.

Recall our test contract's constructor body.

The test contract's constructor assigned a value of 100 to the pointer state variable.  

And that's exactly what the above-mentioned opcodes help us achieve, i.e., initialize the pointer state variable with the value 100.

Let's quickly understand how 👇

  • Instruction 12, i.e., PUSH1 64 basically helps load the hex value 0x64 onto the stack.

    WHY?
    Well, in decimal, 0x64 is 100, i.e., the value we are trying to assign for the pointer state variable.

At this point, the stack holds a single item: 0x64.

  • Then, instruction 14 basically loads zero to the stack.

    WHY? We will figure it out in the next few seconds.
  • After instructions 16 and 17, which are some duplication and swapping actions on the stack, the SSTORE opcode is ready to be executed by the EVM.
Note: SSTORE is the opcode that we use to store any given data to the storage of the contract.
  • It's important to note that SSTORE takes two pieces of data from the stack, i.e.,
    ▶️ Where to store the data (the storage slot, 1st stack item) and,
    ▶️ What data to store (the value, 2nd stack item).

    At this point, the stack holds, from top to bottom: 0x00, 0x64, 0x64.
  • SSTORE will simply take the first 2 items on the stack and use them to perform its action.
  • That means, SSTORE stores the value 100 (0x64 in hex) to slot zero.

Once this group of instructions (12 to 19) is executed successfully, the storage contains a single entry: key 0x00 with value 0x64.

This simply means that slot zero is now storing the value 0x64, i.e., 100.
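The same sequence can be traced with a plain Python list standing in for the stack and a dict standing in for storage. This is only a hand-written trace of instructions [12] to [19], with the stack contents noted after each step (the last list element is the top of the stack).

```python
storage = {}
stack = []  # the last element is the top of the stack

stack.append(0x64)                           # [12] PUSH1 64 -> [0x64]
stack.append(0x00)                           # [14] PUSH1 00 -> [0x64, 0x00]
stack.append(stack[-2])                      # [16] DUP2     -> [0x64, 0x00, 0x64]
stack[-1], stack[-2] = stack[-2], stack[-1]  # [17] SWAP1    -> [0x64, 0x64, 0x00]
slot = stack.pop()                           # [18] SSTORE pops the slot first...
value = stack.pop()                          #      ...then the value to store
storage[slot] = value
stack.pop()                                  # [19] POP removes the leftover 0x64

print(storage, stack)
```

The DUP2 and SWAP1 steps exist only to arrange the slot on top and the value right below it, which is the order SSTORE expects.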

✍️
Note: In case you are wondering why it was stored at slot 0, we will read more about this in the EVM Storage part of this series.

For now, you may say it's simply because Solidity places statically sized variables like uint256 in storage in declaration order, i.e., the first uint256 variable goes to slot 0, the 2nd goes to slot 1, and so on.

4. Returning and storing Runtime Bytecode

Instruction [1a] to [26]

[1a]	PUSH2	017f
[1d]	DUP1	
[1e]	PUSH2	0028
[21]	PUSH1	00
[23]	CODECOPY	
[24]	PUSH1	00
[26]	RETURN	

-----------------------------

Deciphering the Opcodes:

  • [1a] PUSH2 017f - Push 0x17f (383 in decimal) onto the stack
  • [1d] DUP1      - Duplicates the topmost stack item and pushes the copy on top
  • [1e] PUSH2 0028  - Push 0x0028 (40 in decimal) onto the stack
  • [21] PUSH1 00    - Push Zero onto the stack
  • [23] CODECOPY - Executes CODECOPY opcode by using 3 arguments from the top of the stack
  • [24] PUSH1 00  - Push Zero to stack
  • [26] RETURN          - Halts execution and returns data from a specific portion of EVM's memory

As you can clearly see, there are some new decimal values (like 383 or 40) in this portion of the opcodes and one completely strange opcode, i.e., CODECOPY.

We will understand each of them, but first, let's understand the purpose of the opcodes between Instruction [1a] to Instruction[26].

Remember our discussion about the imperative tasks that the creation code/init code performs?

One of those tasks was to return the runtime bytecode that the EVM stores on-chain.

Well, that's precisely what's happening in the above-mentioned instructions, i.e.,

  • Getting and Returning the Runtime portion of the bytecode
  • Storing this piece of bytecode as the runtime code on-chain.

Since all initial tasks of creation code are now done, it's now ready to perform its final task which is to return the runtime part of the bytecode that can be further used to execute the smart contract.

Alright. Let's get back to deciphering the mysterious opcodes between Instruction [1a] to Instruction[26].

  1. PUSH2 0x17f (383 in decimals) at Instruction [1a]
  • 0x17f (383) basically represents the length of the runtime bytecode.
  • This means that the runtime bytecode is 383 bytes long.
  • This instruction simply pushes the length of the bytecode, i.e., 383 bytes, onto the stack.

2.  PUSH2 0x0028(40 in decimals) at Instruction [1e]

  • After the DUP1 opcode, which simply duplicates the top of the stack, i.e., 17f, we move to the next opcode which is PUSH2 0x0028.
  • 0x0028 (40) represents the offset in the contract code from where we can start copying the runtime bytecode.
  • In simpler terms, the entire bytecode is basically init code + runtime code.
  • Therefore, in our Test contract, the first 40 bytes (offsets 0x00 to 0x27) are the init code. The runtime code starts right after that, at offset 0x28.
  • This instruction basically pushes the offset, i.e., the specific location from where the runtime bytecode starts.
✍️
Note: Offset simply means the specific location of a piece of data with respect to another location. Read more here

3. Push 00 onto the stack at Instruction[21]

  • This basically pushes the destination offset in the Memory.

Well, now comes the fun part.

Why do we put all these hex values onto the stack?

We put all these values onto the stack for a very specific opcode, i.e., the CODECOPY opcode.

4. What is CODECOPY at Instruction [23]?

  • A one-liner definition of the CODECOPY opcode is that it copies code from the currently running environment to memory.
  • However, in order to do that, this opcode requires 3 arguments (information):

▶️ Number of bytes of code to copy,
▶️ Offset (location) in the bytecode from where it should start to copy,
▶️ Destination/Target memory position where it should copy the code to.

These are read from the stack starting with the destination (topmost item), then the offset, then the number of bytes, which is why they were pushed in the opposite order.

Now, all of it starts to make sense, doesn't it?

a. PUSH2 0x17f  instruction provides the number of bytes to copy, i.e., 383

b. PUSH2 0x0028 instruction provides offset in contract code from where to start copying the runtime bytecode, i.e., 40.

c. PUSH1 00 simply provides the destination offset in the memory to which the runtime code should be copied.

As soon as the CODECOPY opcode gets executed, it stores the runtime bytecode to the memory as expected.
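A rough Python model of CODECOPY is shown below. It uses placeholder bytes instead of the real contract (40 bytes standing in for the init code and 383 bytes standing in for the runtime code), and it grows memory to the exact size needed, whereas real EVM memory grows in 32-byte words.

```python
def codecopy(memory, code, dest_offset, offset, size):
    """Copy `size` bytes of `code`, starting at `offset`, into `memory`.

    Bytes requested past the end of the code read as zero, and memory is
    grown when the destination range does not fit yet.
    """
    chunk = code[offset : offset + size]
    chunk = chunk + bytes(size - len(chunk))  # zero-pad out-of-range reads
    if len(memory) < dest_offset + size:
        memory.extend(bytes(dest_offset + size - len(memory)))
    memory[dest_offset : dest_offset + size] = chunk


# Stand-in bytecode: 40 bytes of "init code" followed by 383 bytes of
# "runtime code". The byte values are placeholders, not the real contract.
init_part = bytes([0xAA]) * 0x28
runtime_part = bytes([0xBB]) * 0x17F
memory = bytearray()
codecopy(memory, init_part + runtime_part, dest_offset=0x00, offset=0x28, size=0x17F)
print(len(memory), bytes(memory) == runtime_part)
```

After the copy, memory contains exactly the runtime part and none of the init part.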

Memory now holds the runtime bytecode of our Test contract, starting at offset 0.

So, in a nutshell, between Instruction 1a to 26, the EVM does the following:

  • Copies 383 bytes of code, starting at offset 40 in the bytecode, to offset 0 in the memory.
  • And last but not least, Instruction [24] pushes 00 onto the stack. This represents the starting offset in memory where the runtime bytecode is stored.
  • And finally Instruction [26], i.e., the RETURN opcode, returns the entire 383 bytes from memory, which is the runtime bytecode that is stored on-chain for further execution of the smart contract code and its functions.

That's it.

You just witnessed the entire journey of how the init code is executed by the EVM which is a procedure that almost every smart contract goes through after being deployed.
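To tie all four tasks together, here is a minimal interpreter sketch that runs the init code from start to finish. It is far from a real EVM: it supports only the opcodes used here, ignores gas, uses a fixed-size memory, and the 383 bytes after the init code are a placeholder (RUNTIME_STANDIN) instead of the real runtime bytecode.

```python
INIT_CODE = bytes.fromhex(
    "608060405234801561001057600080fd5b50"
    "606460008190555061017f806100286000396000f3fe"
)
# Placeholder for the 383 bytes of runtime code that follow the init code.
RUNTIME_STANDIN = bytes(i % 256 for i in range(0x17F))


def run_init_code(code, callvalue=0):
    """Execute creation code; return (status, returned_bytes, storage)."""
    stack, memory, storage, pc = [], bytearray(1024), {}, 0
    while True:
        op = code[pc]
        pc += 1
        if 0x60 <= op <= 0x7F:                    # PUSH1..PUSH32
            size = op - 0x5F
            stack.append(int.from_bytes(code[pc : pc + size], "big"))
            pc += size
        elif op == 0x52:                          # MSTORE
            offset, value = stack.pop(), stack.pop()
            memory[offset : offset + 32] = value.to_bytes(32, "big")
        elif op == 0x34:                          # CALLVALUE
            stack.append(callvalue)
        elif op in (0x80, 0x81):                  # DUP1, DUP2
            stack.append(stack[-(op - 0x7F)])
        elif op == 0x15:                          # ISZERO
            stack.append(1 if stack.pop() == 0 else 0)
        elif op == 0x57:                          # JUMPI
            destination, condition = stack.pop(), stack.pop()
            if condition != 0:
                if code[destination] != 0x5B:
                    raise RuntimeError("invalid jump destination")
                pc = destination
        elif op == 0x5B:                          # JUMPDEST (a marker only)
            pass
        elif op == 0x50:                          # POP
            stack.pop()
        elif op == 0x90:                          # SWAP1
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif op == 0x55:                          # SSTORE
            slot, value = stack.pop(), stack.pop()
            storage[slot] = value
        elif op == 0x39:                          # CODECOPY (no zero-padding here)
            dest, offset, size = stack.pop(), stack.pop(), stack.pop()
            memory[dest : dest + size] = code[offset : offset + size]
        elif op in (0xF3, 0xFD):                  # RETURN, REVERT
            offset, size = stack.pop(), stack.pop()
            status = "return" if op == 0xF3 else "revert"
            return status, bytes(memory[offset : offset + size]), storage
        else:
            raise RuntimeError(f"unsupported opcode {op:#04x}")


status, returned, storage = run_init_code(INIT_CODE + RUNTIME_STANDIN)
print(status, len(returned), storage)
```

Running it with callvalue=1 instead takes the other branch of the non-payable check and ends in a revert with no returned data.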

Let’s Create a Mind Map

In order to get a good mental model of the entire procedure, here is a quick flowchart to visualize the tasks performed by the init code part of the bytecode. 👇

Fig.2b: Flow of Actions taken by init code/creation code

Wrapping it up

That brings us to the end of the 2nd part of the EVM series.

As per the title, the aim of this article was to take you on a journey of smart contracts from plain and readable Solidity code to complex bytecode and all the important EVM actions that happen in between.

You should now have a very clear idea of:

  • An eagle-eye glance at a smart contract's life cycle,
  • ABIs and Bytecodes,
  • Difference between Creation and Runtime bytecode,
  • Basics of Opcodes and EVM stack,
  • Free Memory Pointer and its significance,
  • How constructors are executed,
  • The Non-Payable Check in constructors,
  • How the state variables are initialized in a constructor, etc

Prepare yourself for the next part of this EVM series. Cheers, Stay Tuned.
