Beginners Guide to Blockchain Data on Dune

Jan 25, 2023

So you’ve seen some sweet sweet Dune Dashboards floating around crypto Twitter and want to get in on the action…

But have no idea what SQL is.

Or much about how blockchain data works.

Or how Dune works.

Never fear, I’m here to help!

By the end of this guide, you’ll understand the basics of how blockchain data works, how it gets into Dune, and how Dune organizes/refines that data to make it accessible via SQL queries.

Let’s dive in!

**Thanks to illemi and 0xBoxer for inspiring this post and teaching me a lot about how all this works!**

Blockchain Data 101

Many of us learn best by doing, but having a bit of background knowledge will help us solve more of our own problems down the road.

So let’s quickly learn (or review) how blockchains work so once we’ve got our basic SQL down, we’ll be able to get into more advanced analysis.

Key Blockchain Terms

Blockchain - A publicly accessible and modifiable database made of cryptographically connected “Blocks” that is updated by and stored across many computer “Nodes” in a network.
Block - A container of data added to and stored on a blockchain. Blocks contain metadata about the block as well as transaction data.
Nodes - An in-real-life computer that stores and shares information about the blockchain’s state, as well as participates in developing consensus about adding blocks to the chain.
EVM - The “Ethereum Virtual Machine” is the global virtual computer whose state is agreed upon and stored by the Nodes in the Ethereum network. Many other “Smart Contract” blockchains use variants of the EVM and are “EVM compatible.”
Transactions - In EVM based blockchains, users send transaction requests to the blockchain network via a node, which is then validated, executed, and committed to the network as part of a new block. Transactions can be as simple as `send x amount of native token from account a to account b`, or more complicated Smart Contract operations.
Accounts - Where a blockchain’s native currency and other related data is stored. Users can initialize accounts, deposit cryptocurrency into them, and transfer cryptocurrency between them.
Wallet - A user interface for initializing and interacting with blockchain accounts.
Smart Contract - A reusable piece of code (program) that’s stored on a blockchain and available to be executed by users on that blockchain. A simple example of how a smart contract might function is `send x amount of native token from account a to account b if condition m is met.` where “condition m” could be something like `the current time is 12p Jan 1st 2056`.

What is a blockchain and how do they work?

A blockchain is a publicly accessible and modifiable database that is updated by and stored across many computers in a network.

“Block” refers to containers of stored data, while “chain” refers to the way that each block cryptographically references the block that comes before it - i.e. the blocks of data are chained together.

The most basic data stored in blocks are sets of transactions eg send x amount of a token from account b to account c. Many blockchains also include smart contract functionality which allows for more complex actions like send x amount of a token from account b to account c if condition y is met.

So for example, if you want to send crypto token x to your friend, you’ll need to submit that transaction for inclusion in a block that’s added to the chain by a computer in the blockchain network. That block will include your transaction, other people’s transactions, and some metadata including the aforementioned reference to the previous block in the chain.

In order to have a block added to the chain, the computers in that blockchain’s network, called “nodes”, use a particular “consensus mechanism” in order to generate agreement about the current state of the blockchain - what blocks are part of its history up until this point - and which block should be added next.

This ensures everyone is interacting with the same blockchain, the same dataset, and also that there’s a consistent methodology for adding blocks that is as decentralized as possible. Meaning no one person or small group can decide which blocks are added in order to solely benefit themselves.

What data do smart contract blockchains generate?

While there are a variety of smart contract blockchains that function in various ways, at the moment many of the most popular blockchains use a variation of the Ethereum Virtual Machine (EVM) as the core of their smart contract capability.

Most EVM-based smart contract blockchains contain the following foundational data elements

Raw Data

Blocks

As mentioned above, blocks are the core containers of data in blockchain databases. They contain a variety of metadata, as well as transaction data.

Transactions

Transactions are the set of smart contract executions contained within a block. These can be as simple as send x amount of a token from account b to account c, or more complex interactions like send x amount of a token from account b to account c if condition y is met.

Traces

More complex transactions that contain smart contract logic (beyond send x from b to c) trigger smaller “atomic” actions that modify data within the broader transaction’s logic.

Eg “condition y” might be “if L is true AND M is true, send x from b to c”; while the smart contract transaction is executing, there’s a point where the Ethereum Virtual machine must check to see if L is true. If it finds that’s the case, it needs to store the result of that atomic action in order to next see if M is true, before finally computing what to do if both, one, or neither is true.

Logs

Logs are small amounts of data generated by smart contracts and stored on the blockchain to confirm something has happened.

Smart Contract Data

Smart contracts typically generate two types of data:

Function

Functions are actions inside of a smart contract’s logic chain that can either be called by an externally owned account (i.e. a user like you) or another smart contract.

Eg Aave’s “LendingPool” contract contains a “borrow” function that’s called when a person or smart contract wants to borrow collateral from Aave.

Event

Events are pieces of information logged on the blockchain when certain things happen inside a smart contract’s execution. These are defined by the smart contract’s developer inside their smart contract’s code.

Eg Aave’s “LendingPool” contract has a “borrow” event that’s emitted (created and stored in a block) after the completion of the “borrow” function. Among other things, this event contains “amount” data that specifies how many tokens were borrowed.

Major Sectors of Blockchain Data

Base Chains

“Base Chains” are self-contained blockchains with their own set of analyzable data related to the blocks stored on the chain.

In a smart contract blockchain, this data is distinct from data available inside of and related to smart contract executions.

For example, Dune contains data related to Ethereum blocks, such as what number in the chain a given block is.

See Gas Prices by @kroeger0x for an example of the kind of analysis that can be done using Base Chain data.

And learn more about what Base Chain data is available on Dune here!

DeFi

DeFi, short for “Decentralized Finance,” is a global, openly accessible, transparent financial system being built as an alternative to the closed system infrastructure of “Traditional Finance” that most of the world's financial activity operates on today.

A wide variety of products and services designed to create new efficiencies and enable new forms of capital allocation and access have been developed, with more being created nearly every day.

All of this leads to a variety of data that can be explored and analyzed (see DEX Metrics for one of the most famous examples).

Learn more about DeFi here!

NFTs

Short for “Non-Fungible Token,” NFTs are cryptographic tokens that represent unique items.

“Fungible” is an economic term meaning an individual unit of a good or commodity that’s essentially interchangeable with another unit of that good.

The classic example of a fungible good is a unit of currency - where one physical dollar bill or Euro coin is worth just as much as any other; in blockchain terms, 1 Bitcoin is worth just as much as any other.

Non-fungible goods include your computer, a deed to a specific house, or a piece of art like the Mona Lisa. While certain properties of those goods can be found in other items, their combination of attributes makes them unique enough to not be entirely replaceable by something else.

NFTs are a way to bring that property of uniqueness to the digital world. For now, the most popular use cases are art, music, and in-game objects, though there are many other use cases being explored and developed.

For an example of the kind of NFT data that can be explored on Dune, check out @rantum’s NFT Project Dashboard.

And learn more about NFTs here!

DAOs

A Distributed Autonomous Organization or “DAO” is a collectively owned blockchain-based group of people working toward a shared mission.

An alternative organizational structure to traditional non-profits or corporations, DAOs blend components of cooperatives and community building with blockchain technology to explore new ways of coordinating people towards common goals ranging from “building an art collecting empire” to “creating a better, smarter currency” to “changing the pizza industry.”

For an example of the kind of DAO data that can be explored on Dune, check out @SebVenture’s MakerDAO Dashboard.

And learn more about DAOs here!

Emerging Use Cases

In addition the above, people are working on a variety of other blockchain use cases including (but not limited to):

As you become an expert Wizard, be sure to keep an eye on these - as being the first to create a dashboard for an emerging sector or project is a great way to get a lot of stars!

Relational Databases 101

In order to get access to Blockchain data using SQL queries, that data has to first be inserted into something called a Relational Database.

What are Relational Databases

Relational databases organize data into a set of interrelated tables, where each table is made up of columns and rows.

A simple example you’re probably familiar with is a spreadsheet - where each individual sheet inside of a Google Sheets or Excel file represents one “table” of columns and rows, and the sheet itself is the whole “database.”

Relational databases allow you to programmatically access a set of data tables through Queries so that you can get access to a subset of data that’s relevant to you. Queries are (almost always) built using some version of “Structured Query Language” or SQL (pronounced “S-Q-L” or “sequel”).

So rather than having to point and click, copy and paste your way through a spreadsheet to get data, you can write a piece of code to search for and connect large amounts of data quickly and repeatedly.

Data inside of a relational database is stored in Tables to allow quicker access; if you only need to search for data from 2/10 tables in the whole database, you’ll be able to do that a lot faster than if you had to search the entire database from start to finish to make sure you found all the data you need.

Each Table has a column key, essentially a label like “blockchain_id” or “block_time” that lets us know what kinds of attributes that table stores. Each column also has a related data type, like “string” (words), “integer” (numbers), “boolean” (TRUE/FALSE), etc. that lets us know what kinds of operations can be done with that data.

Every table must contain a Primary Key, which ensures the individual rows of data contain unique sets of data.

Eg a table might have a primary key like `block_number`, which is then used to ensure that the related data is organized according to which block it’s associated with.

Tables may also have foreign keys, which are data that provide a link between data tables. One of the most common foreign keys you’ll find in Dune is `block_time`.

Here’s an example of a relational database, where each box represents a table, each of the fields shown are column keys inside those tables, and PK means primary key while FKs are foreign keys.

Learn more about relational databases here!

How does blockchain data get into relational databases?

How does blockchain data get into Dune and how does Dune translate that data into tables inside of a relational database?

Let’s take a look:

How the Data Flows through Dune

A chain adds a block - see above for the basics of how that works 🙂
Node providers transmit data to Dune - Node providers build and operate node infrastructure that captures blockchain data and makes it “ingestible” by Dune via APIs.
Dune adds Raw data to SQL tables - we’ll learn more about what Raw data is below; essentially there’s some fairly esoteric data you can get from an individual block that can be used for some types of analysis e.g. “block hash” and “block number.”
Dune decodes raw data - the raw blockchain data is transformed inside of Dune into slightly more understandable data like transactions, function calls, and events triggered by certain code executing on-chain.
The Dune community casts Spells - these are custom SQL tables that are built and maintained by the community that make it easier to do analysis by automatically organizing and aggregating data so you don’t have to do it yourself!

How blockchain data is organized into tables in Dune

We have a ton of interrelated Tables in Dune that are organized in the following hierarchy:

Raw Tables

Raw tables include transactions, traces, logs, and blocks representing the rawest form of blockchain data.

For EVM-like blockchains (which most popular chains are), here’s how a blockchain action translates to a Raw table in Dune (click the links to learn more about each table type).

You submit a swap transaction to Uniswap. That transaction happens in a specific block.
Your transaction will include one or more function calls on a smart contract. When smart contracts are written well, they send event logs to the blockchain that provide information about what happened as a result of the function call. In our Uniswap example, a swap event will emit an `amount0` event log, letting us know how much of the token you’re trying to swap for another token you ended up swapping.
Your transaction will also (likely) set off traces - calls to other smart contracts; in the case of Uniswap, clicking “swap” on uniswap.org calls a function on a router smart contract, then that router contract will call on a separate liquidity pool contract that holds the tokens you want to swap in order to execute your swap.

Outside of that action/data flow, Dune also has a set of tables called creation_traces which only includes traces that cause the creation of a new smart contract.

Decoded Tables

When a new protocol is launched and a Wizard wants to query data from that contract without having to do a bunch of manual work translating Raw blockchain data, they’ll submit that contract for decoding.

Once decoded, that data ends up in Decoded Tables.

For an example let’s use Uniswap V2’s `UniswapV2ERC20.sol` contract, which would likely be a part of the swap transaction we followed above.

When a Wizard submits a contract like `UniswapV2ERC20.sol` for decoding, Dune adds data about that contract to a `contracts` table related to the blockchain that contract is on - in this case `ethereum.contracts`.

Among other things, this table includes the contract’s “Application Binary Interface” or ABI, which Dune uses to convert a contract’s individual functions and events into bytes signatures that are then stored in `.signatures` tables.

In our example, the `totalSupply()` function, which returns the total amount of pool tokens for a pair, and the `Transfer()` event are stored in `ethereum.signatures`.

Those signatures are then matched against logs in `ethereum.logs` and traces in `ethereum.traces` and used to add decoded data to `uniswap_v2_ethereum.Pair_evt_swap` and `uniswap_v2_ethereum.Pair_call_swap` - which store function call and event data for all swaps that use the UnsiswapV2ERC20.sol contract in their more human-readable form.

Ie as data like `amount0In`, which you can find in Uniswap’s documentation to understand what they mean, rather than hashed data that looks like random letters/numbers.

Every function and event in every contract that Dune has decoded gets its own table, and each of these tables has foreign keys like `tx_hash`, `block_time`, and `block_number` (prefixed with `evt_` or `call_` eg `evt_tx_hash`) in order to relate them to other tables like `ethereum.blocks`.

Spellbook

Spellbook is a repository of more advanced and abstracted data tables which remix data from Raw and Decoded tables using pre-built and automatically run SQL queries so that we can get quick and easy access to meaningful data.

One of the most popular “Spells” is dex.trades - which stores data about all the swaps that occur on Uniswap, Sushiswap, etc.

So when you initiate a swap transaction on Uniswap, that data is captured and stored in a Dune Raw table, then added to a Decoded table, then added to the `dex.trades` table inside Spellbook, so that you could then quickly and easily compare your trade to others you’ve made, others on Uniswap, others for the same tokens, and more.

Spellbook tables can be found and used in the Dune Data Explorer just like Raw and Decoded tables, but Spellbook heavily relies on community help to get Spells built.

Building Spells requires some fairly advanced SQL skills, but if you’re interested in learning more check out this guide!

Community Datasets

Community Datasets are off-chain data that Dune gets from other providers that are related and supplemental to on-chain data.

For example, if a Uniswap transaction is done by an MEV bot, we can use the Flashbots MEV dataset to learn about it by correlating data from Raw, Decoded, and Spellbook tables with transaction hashes and blockchain addresses in the Flashbots community dataset.

Walking through Dune’s tables with an example

With all that context in mind, let’s walk through an example transaction to get an idea of how all Dune’s table categories connect with each other.

And develop an appreciation for why Wizards the world over love how much easier Dune makes it to analyze blockchain data than trying to do it all on their own 🙂.

Tracing an MEV Swap through Dune

Being able to gain useful insights from Raw blockchain data requires a lot of deep knowledge about how blockchains work.

So let’s work backwards - from a piece of data in a Community Data table that’s easy to understand, back down through the levels of abstraction until we end at the Raw Block data.

In case you’re unfamiliar, “Miner Extractable Value” or “Maximal Extractable Value” aka MEV refers to the value that can be extracted by a block producer (a node that adds blocks to a blockchain) on top of the usual block rewards and transaction fees by including, excluding, and changing the order of transactions in a block.

A common way this is done is via a “Sandwich Attack”.

Let’s say you’re swapping 10000 USDC for ETH on Uniswap, and that this trade is so big that it will change Uniswaps ETH price from 1000 USDC per ETH to 1003 USDC per ETH.

Before it’s added to a block, your transaction is added to something called the “mempool”, which is filled with transactions to be included in blocks. There, block-producing nodes find transactions to include in the blocks they propose for inclusion on the blockchain. Should that block producer’s block be chosen by the consensus algorithm, they’ll receive the block production reward and transaction fees.

But, since block producers can choose which mempool transactions they include in blocks, they can use this info to include a transaction of their own that increases their profit.

In our example, a block producer can see that the price of ETH will go up after your transaction, so they can propose a block with a transaction order like:

Block producer buys ETH on Uniswap
Your Transaction
Block producer Sells ETH at a higher price than what he just bought it for in the same exact block, thanks to your transaction that moved the price.

There are a lot of implications to this, but one is that you get a worse overall price for your swap - which is why traders have an interest in what’s happening with MEV.

And there are also other actors who can execute MEV trades besides block producers - which will come into play below with the “miner fee” data we’re about to see and use.

With that in mind, let’s find an MEV swap on Dune and work backwards through all the tables that have data related to that swap.

1. Finding an MEV Swap on Uniswap

Step one is to find an MEV swap on Uniswap, which can be done with this query:

select * -- return all columns in this table

FROM flashbots.mev_summary -- a Dune community table that contains data on MEV transactions

WHERE protocols = '["uniswap_v2"]' -- Here telling Dune to present us rows where the protocol involved in the MEV transaction was Uniswap V2

Limit 1 --  here we tell Dune to just give us the first row it finds to make our example quick and easy to understand

That Query returns:

Here we can see a couple of immediately interesting things:

The MEV extractor made a gross profit of $22.19
The MEV extractor paid the block’s miner $3.36 from their profits
The MEV extractor paid a gas fee of 211832 gwei; later we can run a query to find out how much that was in USD to get the MEV extractor’s net profits.

Based on this data we might be curious to know how much the MEV extractor netted in this transaction, which we can explore using a few related tables later.

But let’s dig a bit deeper into this transaction by next learning which coins were swapped in this transaction to generate this profit.

2. Finding our MEV transaction in Spellbook

Spellbook contains a table called `dex.trades`, which pulls from a few Raw and Decoded tables to give us easy-to-understand data about trades that happen on Decentralized Exchanges.

Let’s grab the `transaction_hash` from our mev_summary table query as that’s a piece of data we can use to find the transaction on `dex.trades` (since transaction hashes are unique that’s the go-to way to isolate transactions across various data sets).

A query like this:

select *

FROM dex.trades

WHERE tx_hash = '0x58b2d0eb68bea4f78778fa52a07d8b2c74bdfa5581f8f98eefbe71b522bf0c1d' -- The transaction hash from mev_summary

Gets us:

We already knew this was a Uniswap V2 trade, but now we can see this was a swap between $MANY and $WETH (wrapped ETH, which basically just makes ETH usable in DeFi applications).

Another interesting/relevant Spellbook table is `gas.fees`.

Using data from this table we can (later on) figure out how much profit the MEV extractor netted.

This query:

SELECT *

from gas.fees

Where block_number = 10941292 and tx_hash = '0x58b2d0eb68bea4f78778fa52a07d8b2c74bdfa5581f8f98eefbe71b522bf0c1d'

Gets us:

Which gets us a USD cost for gas with `tx_fee_usd`.

Note: I used “block_number” in my where statement to make this query run faster (learn about the technical reasons for this here).

Next, let’s look at the smart contract function calls made by this transaction.

3. Examining our MEV transaction’s function calls

Let's see what we can learn by looking at the function calls for our MEV transaction.

We’ll use our `tx_hash` as our selector again, and with a bit of research with the Dune Data explorer and Uniswap docs we find that the `uniswap_v2_ethereum.Pair_call_swap` is the table we’re looking for.

It's the Uniswap V2 contract set that stores data from the “swap” function “call” on the “Pair” contract.

Also through testing, we can see that `tx_hash` is called `call_tx_hash` in this table so this is our query:

select *

FROM uniswap_v2_ethereum.Pair_call_swap

WHERE call_tx_hash = '0x58b2d0eb68bea4f78778fa52a07d8b2c74bdfa5581f8f98eefbe71b522bf0c1d'

Which gets us:

On the surface, there’s not really anything obviously interesting and new in this table that we didn’t have before; the “to” column shows us a bit more about how the contract calls flow between contracts to make a Swap.

But by plugging one of the `contract_address` addresses into ETH decoder we can see some of the data about this transaction that’s available in other Dune tables like “gasPrice”...

BUT in ETH decoder that’s stored as a hashed value that’s not really human readable - which shows us just how useful Dune is.

There’s just a ton of work to do to take a raw ETH transaction and turn it into something interesting, definitely not worth wasting your time with since we’ve done the heavy lifting for you!

4. Checking out ethereum.transactions

Over to `etherum.transactions` to see what we find; here `tx_hash` is `hash` so this query:

select *

FROM ethereum.transactions

WHERE hash = '0x58b2d0eb68bea4f78778fa52a07d8b2c74bdfa5581f8f98eefbe71b522bf0c1d'

Generates:

One of the more interesting layers this adds is the `gas_used` (which is listed in wei), which was `211832`.

Now let’s do one more look at the final, most raw data for this transaction, the block.

5. Looking at ethereum.blocks

Since our `tx_hash` is hashed inside of a block’s `hash` (which is composed of all the transaction data inside of the block), we’ll need to use `block_number to isolate the block that contains our MEV transaction like so:

select *

FROM ethereum.blocks

WHERE number = 10941292

Which returns:

Here we can see, among other things, the total gas used by this block.

If we scaled up our research, we could use that info to get the gas used by all MEV transactions and compare it to the total gas used by Ethereum transactions to see what percentage of gas paid is for MEV transactions.

… But that’s a lot of work outside of the scope of our overview, so let’s take one more step to link data from a few different layers of our MEV transaction to see what the MEV extractor’s net profit was.

6. Calculating Net Profit for our MEV transaction

Cool, finally let’s look at how we can combine data from a couple of tables to generate new data - the net profit for this MEV transaction.

To do this, we need to use a couple of more advanced SQL constructs - Common Table Expressions and Joins.

See the comments (“- -”) in the code below, essentially we create a couple of temporary tables inside our code that just include specific data from Dune’s tables and give them a special name that we can later reference in our main `SELECT` statement.

Then we `JOIN` our tables together so that we can include data from both of them in our final output.

The code:

WITH

-- Get the cost of the gas for this transaction from the gas.fees Spell

gas_fee AS (

SELECT block_time, tx_fee_usd

from gas.fees

Where block_number = 10941292 and tx_hash = '0x58b2d0eb68bea4f78778fa52a07d8b2c74bdfa5581f8f98eefbe71b522bf0c1d'

),

-- get the miner payment and gross profit from our MEV txn

mev AS (

SELECT block_timestamp, gross_profit_usd,

miner_payment_usd

FROM flashbots.mev_summary

WHERE transaction_hash = '0x58b2d0eb68bea4f78778fa52a07d8b2c74bdfa5581f8f98eefbe71b522bf0c1d'

)

-- Get gross profit, make the miner payment and gas fee negative numbers, calculate total cost and net profit

SELECT gross_profit_usd, 

    - miner_payment_usd as miner_cost, 

    - tx_fee_usd as gas_cost,

    - tx_fee_usd - miner_payment_usd AS total_cost,

    gross_profit_usd - tx_fee_usd - miner_payment_usd AS net_profit

FROM mev

JOIN gas_fee ON mev.block_timestamp = gas_fee.block_time

The output:

And with that, we can see the MEV extractor made $15.43 in profit after paying the miner to include their transaction and the normal gas fees for executing it!

Stackin' with CF

Discussion about this post

Ready for more?