SQL Practice With D&D Data

5 minute read

Below are some SQL challenges I made for myself as practice, using a database I built from public D&D Beyond character sheets.

I tried to do as much of the data manipulation in SQL as possible, though I ran the queries through pandas for ease of inspection/visualization.

import sqlite3

import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns

# Connect to the database
con = sqlite3.connect("../../db/main_11-30-21.db")

The first one is pretty simple. This one counts all feats selected by all characters. The only tricky thing is that you have to do a tiny bit of entity resolution, since some feats include optional features in the name: “Resilience (Constitution)” and “Resilience (Wisdom)” should normalize to “Resilience.” I accomplished this with a subquery.

query = '''

SELECT
    feats_clean.featName,
    COUNT (feats_clean.featName) as featCount
FROM (
    SELECT
        CASE
            WHEN INSTR(fd.featName, ' -') != 0
                THEN SUBSTR(fd.featName, 1, INSTR(fd.featName, ' -') - 1)
            WHEN INSTR(fd.featName, ' (') != 0
                THEN SUBSTR(fd.featName, 1, INSTR(fd.featName, ' (') - 1)
            ELSE
                fd.featName
            END featName 
    FROM feats
        LEFT JOIN featDefinition fd ON feats.featDefinitionId = fd.Id
    WHERE 
        fd.isHomebrew = 0
) feats_clean
GROUP BY featName
ORDER BY featCount DESC
;
'''

Let’s display the top 10 most commonly picked feats. The only thing here that suprises me is that Grappler is #2. It’s widely considered bad, but apparently ~31,000 characters didn’t get the memo.

df = pd.read_sql_query(query, con)
df.iloc[0:10]

	featName	featCount
0	War Caster	32555
1	Grappler	31612
2	Tough	22710
3	Sharpshooter	16776
4	Alert	16656
5	Lucky	15860
6	Observant	15686
7	Resilient	15638
8	Magic Initiate	14927
9	Sentinel	14334

This next one calculates the total value in gold pieces of all of a character’s currencies and inventory items. It also pulls their tier of play, based on their character levels. The idea was to see how closely tier-wise wealth averages resemble the estimates provided in the Dungeon Masters’ Guide.

Total character level is not a stored value and had to be generated from class levels, which matters in the case of multiclass characters.

Total gold value is not a stored value and has to be calculated from the converted value of all currencies plus the total cost of all inventory items.

query = '''

SELECT
    summed_levels.charId,
    CASE
        WHEN summed_levels.charLevel BETWEEN 1 AND 4 THEN 1
        WHEN summed_levels.charLevel BETWEEN 5 AND 10 THEN 2
        WHEN summed_levels.charLevel BETWEEN 11 AND 16 THEN 3
        WHEN summed_levels.charLevel BETWEEN 17 AND 20 THEN 4
        ELSE 5
        END tier,
    0.01*(c.cp) + 0.1*(c.sp) + 0.5*(c.ep) + c.gp + 10*(c.pp) AS gold,
    i.total_cost AS inventory_value
FROM (
    SELECT
        charId,
        SUM(classLevel) AS charLevel
    FROM 
        classes
    GROUP BY
        charId
) summed_levels
LEFT JOIN
    currencies c ON c.charId = summed_levels.charId
LEFT JOIN
    (
        SELECT
            charId,
            SUM(iDef.cost * quantity) AS total_cost
        FROM 
            inventory
        LEFT JOIN 
            itemDefinition iDef ON iDef.id = inventory.itemDefinitionId
        GROUP BY 
            charId
    ) i ON i.charId = summed_levels.charId
;
'''

df = pd.read_sql_query(query, con)
df

	charId	tier	gold	inventory_value
0	2510	1	130.0	155.97
1	2778	1	0.0	NaN
2	2806	1	135.0	134.00
3	2903	3	0.0	NaN
4	2988	1	7.6	0.20
...	...	...	...	...
987217	52414013	1	0.0	100.41
987218	52414055	2	27.0	57.50
987219	52414131	4	10.0	162.10
987220	52414159	4	0.0	30.20
987221	52414174	1	0.0	300.00

987222 rows × 4 columns

That one was useful as a coding exercise, but I realized after browsing the data that these numbers probably aren’t very meaningful. It seems like most people aren’t tracking wealth very closely on these sheets - lots of characters have no gold and total inventory value of 0. Many of these characters are theory-crafts that haven’t actually seen play, so they don’t have items or money.

And even if we exclude these characters, the median total wealth is much lower than I’d expect (though note that this doesn’t include magic items, which don’t have set prices in 5th edition).

df['inventory_value'] = df.inventory_value.fillna(0)
df['wealth'] = df.gold + df.inventory_value
df[(df.gold != 0) & (df.inventory_value != 0)].groupby('tier').wealth.describe()

	count	mean	std	min	25%	50%	75%	max
tier
1	308188.0	3.999601e+06	2.741318e+08	0.35	103.72	154.500	282.6000	2.493229e+10
2	207866.0	1.834248e+06	1.868466e+08	0.21	125.20	240.750	1081.3000	2.493229e+10
3	44546.0	8.459603e+06	4.193392e+08	1.63	123.50	340.005	1747.4575	2.493229e+10
4	34418.0	1.531096e+10	2.830436e+12	1.02	94.50	168.245	1141.8000	5.251050e+14

These numbers are… suspicious. Median wealth at tier 4 is only 168 gp? I inspected some of the tier 4 characters and it looks like they are either theory-crafts, without items, or character sheets that only list magic items (which have no price). Ah well - limitation of the data.

But the outliers are really something. Who is this character with over 6 billion gp in wealth?

df[df.wealth == df.wealth.max()]

	charId	tier	gold	inventory_value	wealth
974732	51974698	4	2.493229e+10	5.250801e+14	5.251050e+14

Alright, let’s end with something a little more interesting. This query gets the highest magic item rarity of each character. It requires multiple nested queries since, first of all, rarity has to be converted to a numeric value in order to get the “highest,” and because, as before, character level has to be inferred.

query = '''


SELECT
    summed_levels.charId,
    summed_levels.charLevel,
    char_max_rarity.max_rarity
FROM (
    SELECT
        charId,
        SUM(classLevel) AS charLevel
    FROM 
        classes
    GROUP BY
        charId
) summed_levels
LEFT JOIN
    (
        SELECT
            charId,
            MAX(rarity) AS max_rarity
        FROM
            (SELECT
                charId,
                CASE
                    WHEN iDef.rarity = 'Uncommon' THEN 1
                    WHEN iDef.rarity = 'Rare' THEN 2
                    WHEN iDef.rarity = 'Very Rare' THEN 3
                    WHEN iDef.rarity = 'Legendary' THEN 4
                    WHEN iDef.rarity = 'Artifact' THEN 5
                    ELSE 0
                    END rarity     
            FROM 
                inventory
            LEFT JOIN 
                itemDefinition iDef ON iDef.id = inventory.itemDefinitionId
            ) inv_numeric_rarity 
        GROUP BY
            charId
    ) char_max_rarity ON char_max_rarity.charId = summed_levels.charId
WHERE 
    char_max_rarity.max_rarity > 0
AND
    summed_levels.charLevel BETWEEN 1 AND 20
;

'''

df = pd.read_sql_query(query, con)
df

	charId	charLevel	max_rarity
0	4534	20	4
1	4565	20	5
2	4668	3	1
3	4952	3	1
4	5505	1	4
...	...	...	...
271635	52413585	20	3
271636	52413597	20	4
271637	52414003	9	1
271638	52414131	20	4
271639	52414159	18	2

271640 rows × 3 columns

Let’s make a box plot and see what the distribution by tier looks like.

import matplotlib.ticker as ticker

sns.set_style("darkgrid")
dims = (10.7, 7.27)
fig, ax = plt.subplots(figsize=dims)
ax = sns.boxplot(
    x='max_rarity', 
    y='charLevel', 
    data=df,
    ax=ax
)
ax.set_xticklabels([
    'Uncommon',
    'Rare',
    'Very Rare',
    'Legendary',
    'Artifact'
], fontsize=12)
ax.set_ylabel('character level', fontsize=12)
ax.set_xlabel('')
ax.set_title('Highest Magic Item Rarity by Character Level', fontsize=14)
ax.yaxis.set_major_locator(ticker.MultipleLocator(2))
ax.yaxis.set_major_formatter(ticker.ScalarFormatter())

png

Now we’re talking. A couple interesting things to note:

The spread of the distribution increases as you move up the rarity ladder. Most characters with only an uncommon item are between levels 3 and 7, but characters who have at least a legendary- or artifact-rarity item are all sorts of levels.
Artifact rarity actually has a lower median character level than legendary. I’m assuming this is because DMs like to drop an artifact into play a little earlier for plot reasons: call it the “one ring” effect (this theory is supported by the fact that it disappears if I remove homebrew items)

If you enjoyed this, consider checking out my other posts. I plan to do more analysis with this data in the near future.

Twitter Facebook LinkedIn

Jordan Pruett

SQL Practice With D&D Data

You May Also Enjoy

AutoGPT Vulnerabilities

Topic Modeling MTG Decklists

D&D Race/Class Combos in ~1 Million Character Sheets

‘Journals Like You Also Reviewed…’