
0.1. problem formalization

(notation: to match the graph on the deepfunding website, all arrows are in the direction of dependency, i.e. \(P\to Q\) means \(P\) depends on \(Q\).)

We have a tree (well, DAG) of depth exactly 2:

  • Depth 0 is a single node, ethereum — call this \(O\).
  • Depth 1 is 34 nodes, the “seed repositories” — \(A_1,\dots,A_{34}\).
  • Depth 2 is the ~5000 code dependencies of the seed repositories, \(B_1,\dots,B_{4381}\), with a total of ~15000 edges of the form \((A_i,B_j)\).[1]

The task of a contestant is to provide:

  • Level 1: weights \(w_{O\to A}\), denoted like https://github.com/a,ethereum,0.2, such that \(\sum_A w_{O\to A}=1\) [34 outputs]
  • Level 2: self-weights \(w_{A}\), denoted like https://github.com/a,originality,0.6 [34 outputs]
  • Level 3: weights \(w_{A\to B}\), denoted like https://github.com/b,https://github.com/a,0.6, such that the sum over \(A\)’s dependencies satisfies \(\sum_B w_{A\to B}=1\) (note that they add up to 1 and not to \(1-w_{A}\)[2]) [~15000 outputs]
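Concretely, a full submission is a single CSV of such rows; at least, this is the layout the question-generation script in the appendix assumes, with columns repo,parent,weight and the special parents ethereum and originality encoding Levels 1 and 2. With the placeholder URLs from above:

repo,parent,weight
https://github.com/a,ethereum,0.2
https://github.com/a,originality,0.6
https://github.com/b,https://github.com/a,0.6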

A juror is given random samples of:

  • pairs of edges \(((O\to A_1),(O\to A_2))\), for which they give the “relative advantage of \(A_2\) over \(A_1\) to \(O\)”, \(j_{(O\to A_1),(O\to A_2)}\) (taken[3] as measured in logits).
  • Depth-1 nodes \(A\), for which they directly estimate the originality score \(j_{A}\).
  • pairs of edges \(((A\to B_1),(A\to B_2))\), for which they give the “relative advantage of \(B_2\) over \(B_1\) to \(A\)”, \(j_{(A\to B_1),(A\to B_2)}\).[4]

To give values for all of these would take, where \(n_A\) is the number of dependencies of \(A\):

\[\frac{34\cdot(34-1)}{2}+34+\sum_{A}\frac{n_A(n_A-1)}{2}\]

I ran a quick script to calculate this from the sample submissions (see “helper csv” below), and it comes out to 8,353,213. In particular this means we would need 8,353,213 events if we directly implemented a distillation market. So we need something smarter — ideally something that doesn’t require more than 15000 questions for the 15000 weights.
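For reference, a minimal version of that counting script, assuming the helper csv from the appendix has been written to repo_analysis.csv (the default output name of the appendix script):

#+begin_src python
import pandas as pd

# num_deps is n_A for each seed repository A (see "helper csv" below)
df = pd.read_csv("repo_analysis.csv")
n_A = df["num_deps"]

level_3 = int((n_A * (n_A - 1) // 2).sum())  # one question per pair of sibling edges
total = 34 * 33 // 2 + 34 + level_3          # level-1 pairs + level-2 nodes + level-3 pairs
print(total)                                  # 561 + 34 + 8,352,618 = 8,353,213
#+end_src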

0.2. deepfunding scoring

First let me quickly go over how Deepfunding scores the contestants.

The cost for Level-1 answers is \(\left|\log(w_{O\to A_2}/w_{O\to A_1}) - j_{(O\to A_1),(O\to A_2)}\right|^2\) , summed over all pairs \(((O\to A_1),(O\to A_2))\) for which the juror has provided an estimate.

The cost for Level-2 answers is simply \(\left|w_A-j_A\right|^2\) summed over all \(A\) for which the juror has provided an estimate.

The cost for Level-3 answers is again \(\left|\log(w_{A\to B_2}/w_{A\to B_1}) - j_{(A\to B_1),(A\to B_2)}\right|^2\) , summed over all pairs \(((A\to B_1),(A\to B_2))\) for which the juror has provided an estimate.
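In code, the three cost terms look like this — a minimal sketch, where I assume juror samples arrive as plain tuples (the real storage format is unclear to me, see footnote [4], and the function name is mine):

#+begin_src python
import math

def deepfunding_cost(w_edge, w_self, juror_pairs, juror_originality):
    """Total Deepfunding cost of a submission against a set of juror samples.

    w_edge: dict, (parent, child) -> weight; covers both O->A and A->B edges
    w_self: dict, A -> self-weight w_A
    juror_pairs: list of (edge1, edge2, j), j in logits
                 (covers both Level 1 and Level 3, since the formula is the same)
    juror_originality: list of (A, j_A) originality estimates
    """
    cost = 0.0
    for edge1, edge2, j in juror_pairs:
        # Levels 1 and 3: squared error of the log weight ratio against the juror logit
        cost += (math.log(w_edge[edge2] / w_edge[edge1]) - j) ** 2
    for a, j_a in juror_originality:
        # Level 2: squared error of the self-weight against the juror's estimate
        cost += (w_self[a] - j_a) ** 2
    return cost
#+end_src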

BLEG: I’m not sure how these are to be weighted. The concrete instructions just say they are summed over all juror samples, but this depends on how many juror samples are taken of each category — this is probably important if we want to weight questions properly. We should ask them, or maybe we can just create another event to ask the market what it thinks they will do :)

This covers how Deepfunding will score our final model submission. As we will see, this does not necessarily straightforwardly translate to how we score miners in preparing our model.

0.3. scoring for level 2

Level-2 is straightforward — we simply create a question for each \(A\) (depth-1 node) asking “how original is \(A\)?” and score as:

\[s(w_A)=\left|w_A-j_A\right|^2\]

if \(j_A\) is elicited, and 0 otherwise. Then perform the peer score adjustment (otherwise the miners are incentivized to just not bet).

0.4. scoring for level 1 and level 3

The basic problem for scoring Level-1 and Level-3 questions is:

  • we can only create events for each edge, not for each of the 8 million pairs of edges
  • the scoring needs to be “modular” i.e. the total score needs to be reducible to a sum of scoring functions that each depend on only one question. \(\sum\left|\log(w_{A\to B_2}/w_{A\to B_1}) - j_{(A\to B_1),(A\to B_2)}\right|^2\) does not satisfy this property.

Three possible solutions:

0.4.1. straightforward approach

Here’s one idea: for Level-3 we create one event per \(A\), i.e. per depth-1 node (and for Level-1 we analogously create just one event for \(O\)). To forecast this event the miner reports a vector \(w_{A\to}\in[0,1]^{n_A}\) with \(\sum w_{A\to}=1\), i.e. weights for all of \(A\)’s dependencies — and this answer is scored in a special way (rather than just as a standard continuous random variable):

\[ s(w_{A\to})=\sum\left|\log(w_{A\to B_2}/w_{A\to B_1}) - j_{(A\to B_1),(A\to B_2)}\right|^2 \] where the sum is over pairs \((A\to B_1,A\to B_2)\) for which the jury ultimately gives an answer.

(and again just make the peer score adjustment as per normal)

This way we have just 34 events for Level-3 and 1 event for Level-1 (in addition to the 34 events for Level-2).
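Scoring such an event is then a direct transcription of the Deepfunding cost — a sketch, assuming the miner's forecast arrives as a dict from dependency to weight and the jury sample as (B1, B2, j) triples (names are mine):

#+begin_src python
import math

def score_event(w_A, jury_sample):
    """Score one miner's forecast for node A's event.

    w_A: dict, dependency B -> weight, summing to 1
    jury_sample: list of (b1, b2, j) pairs the jury actually answered, j in logits
    """
    assert abs(sum(w_A.values()) - 1.0) < 1e-6, "weights must sum to 1"
    return sum((math.log(w_A[b2] / w_A[b1]) - j) ** 2 for b1, b2, j in jury_sample)
#+end_src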

Key questions, aka potential problems with tiny numbers:

  • The answers to these questions might be very high-dimensional vectors — the lowest (nonzero) dimension is 3, the highest is 2277 (full numbers in the helper csv section). Could this be an issue? Can LLM-based miners even meaningfully fit so much in their context window (I mean they can, but like, usefully)?
  • How “advanced” are the miners on the network right now? If they’re all just simple LLM callers (without any domain-specific engineering) they would probably have a problem with this task, e.g. even to produce such tiny numbers. This would also be a problem for the Shapley values solution, actually.

I mean, I can imagine making a decent miner by e.g. just asking an LLM to make relative comparisons and fitting some model to it, but if the miners on the network do not do such things, it would be no use — wait, actually that gives me an idea, see the section “shapley values from relative comparisons”.

0.4.2. shapley values

Here’s the idea: we create events for each edge, and score miners based on how useful their weight estimate was to the final cost function. Fortunately calculating Shapley values here is easy, because the cost is independent of any permutations.

For each edge weight \(w_{A\to B}\) (and likewise for the 34 edges \(w_{O\to A}\)) we create an event:

Estimate the relative importance of dependency B to project A as a number between 0 and 1.

{{ description of how miners will be scored, i.e. a more practical summary of this section }}

{{ training set data }}

From all the miner estimates for \(w_{A\to B}\) we get a consensus estimate \(\hat{w}_{A\to B}\) in the usual way.[5]

[IGNORE THIS. go with either the straightforward approach or the below one]

0.4.3. shapley values from relative comparisons

Ok, here’s perhaps the most promising approach: we do give miners pairs of edges. But we don’t need to give them all pairs of edges.

For node \(A\) with dependencies \(B_1,\dots,B_n\), we can just write questions for the \(n-1\) adjacent pairs of edges:

  • \(c_{(A\to B_1),(A\to B_2)}\)
  • \(c_{(A\to B_2),(A\to B_3)}\)
  • …
  • \(c_{(A\to B_{n-1}),(A\to B_n)}\)

How much more important is dependency B_{i+1} than B_i to A? i.e. estimate log(w_{A→B_{i+1}} / w_{A→B_i})

...

… and calculate the implied pairwise comparison for any \(c_{(A\to B_i),(A\to B_j)}\) (specifically, for those edge pairs the jury scores) by simply summing:

\[ c_{(A\to B_i),(A\to B_j)} = \sum_{k=i}^{j-1}c_{(A\to B_k),(A\to B_{k+1})} \]

(for \(i>j\) we can just take the negative of the opposite pair)

We may be tempted to say that life is solved — we can simply calculate the scores based on these “implied comparisons” … but again, maybe we can’t do that, because we want scoring functions to be modular.
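The telescoping sum is trivial to implement — a sketch, assuming the adjacent answers are collected in a dict keyed by \(k\) (names are mine):

#+begin_src python
def implied_comparison(c_adj, i, j):
    """Implied log-ratio c_{(A->B_i),(A->B_j)} from the adjacent comparisons.

    c_adj: dict, k -> miner's answer for c_{(A->B_k),(A->B_{k+1})}, k = 1..n-1
    """
    if i == j:
        return 0.0
    if i > j:
        return -implied_comparison(c_adj, j, i)  # negative of the opposite pair
    return sum(c_adj[k] for k in range(i, j))    # telescoping sum over k = i..j-1
#+end_src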

Instead, for each \(j_{(A\to B_i),(A\to B_j)}\) that we receive, we can measure the relative contributions of each \(c_{(A\to B_k),(A\to B_{k+1})}\) to the cost function. We imagine that before the miners’ forecasts, all \(c_{(A\to B_k),(A\to B_{k+1})}\) were initialized to zero (i.e. a uniform prior). Then these forecasts define a coalitional game, as follows.

Definition: a single miner’s forecasts as a coalitional game. The miner’s forecast on each \(c_{(A\to B_i),(A\to B_{i+1})}\) defines a player \(i\in\{1,\dots,n-1\}\) in an \((n-1)\)-player coalitional game, with a value function on subsets \(S\subseteq\{1,\dots,n-1\}\) as follows:

\[ v(S)= -\sum_{(i,j)}\left(\left|j_{(A\to B_i),(A\to B_j)}-\sum_{k\in S\cap \{i,\dots j-1\}} c_{(A\to B_k),(A\to B_{k+1})}\right|^2-\left|j_{(A\to B_i),(A\to B_j)}\right|^2\right) \] (where the outer summation is taken over all \((i,j)\) pairs such that \(j_{(A\to B_i),(A\to B_j)}\) is in the jury sample)

(crucially, we can just take the remaining forecasts in the expression as “external facts of the world”, i.e. information known to the validator — so the scoring rule itself is modular.)

This lets us take the Shapley value in the definitional way.

\[
\begin{align*}
s(c_{(A\to B_i),(A\to B_{i+1})}) &= \varphi_i(v)\\
&= \frac{1}{(n-1)!}\sum_R\left[v(\{j\mid j<_R i\}\cup\{i\}) - v(\{j\mid j<_R i\})\right]
\end{align*}
\]

(here \(R\) ranges over the \((n-1)!\) orderings of the players, and \(j<_R i\) means \(j\) precedes \(i\) in \(R\))

(and then make the peer score adjustment against all other miners)
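For intuition, here is a brute-force sketch of \(v\) and \(\varphi_i\) — it enumerates all \((n-1)!\) orderings, so it is only usable for tiny \(n\); for real nodes one would sample orderings or exploit the quadratic structure of \(v\) (names are mine):

#+begin_src python
import math
from itertools import permutations

def v(S, c_adj, jury):
    """Value of coalition S (a set of player indices in 1..n-1).

    c_adj: dict, k -> the miner's forecast for c_{(A->B_k),(A->B_{k+1})};
           players outside S stay at the zero (uniform-prior) initialization.
    jury: list of (i, j, j_ij) with i < j, the sampled juror answers.
    """
    total = 0.0
    for i, j, j_ij in jury:
        pred = sum(c_adj[k] for k in range(i, j) if k in S)
        total -= (j_ij - pred) ** 2 - j_ij ** 2
    return total

def shapley(i, c_adj, jury, n):
    """phi_i(v) by direct enumeration of the orderings of the n-1 players."""
    players = list(range(1, n))
    total = 0.0
    for order in permutations(players):
        pos = order.index(i)
        before = frozenset(order[:pos])  # players preceding i in this ordering
        total += v(before | {i}, c_adj, jury) - v(before, c_adj, jury)
    return total / math.factorial(len(players))
#+end_src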

0.5. most relevant resources

0.6. appendix: question generation script

#+begin_src python
import csv
import json
import pandas as pd
import numpy as np

import requests
import time
from datetime import datetime
from urllib.parse import urlparse
import os
from dotenv import load_dotenv
from functools import lru_cache

load_dotenv()

INCLUDE_REPO_HEURISTICS = True

if "GITHUB_TOKEN" not in os.environ and INCLUDE_REPO_HEURISTICS:
    print(
        "Warning: GitHub API token not found. You will get rate-limited. "
        "You should set INCLUDE_REPO_HEURISTICS to False."
    )

L1_TRAIN_SET_EXAMPLES = ""
L2_TRAIN_SET_EXAMPLES = ""
L3_TRAIN_SET_EXAMPLES = ""

PROMPT_INTRO = "This question is for the `Deepfunding competition`, a distillation prediction market for determining the relative importances of different dependencies in the Ethereum ecosystem."

USE_LONG_JUROR_LIST = False
SHORT_JUROR_LIST = "\nThe jurors are expected to be experts in the Ethereum ecosystem. Some known names include: Jason, Toni Wahrstatter, Ladislaus, Vitalik Buterin, DC Builder, Vectorized and Marius Van Der."
LONG_JUROR_LIST = """
This is the list of publicly known jurors, if it helps you. The jurors are expected to be experts in the Ethereum ecosystem.

Juror                  Nominator          Votes  Affiliation             Github    ENS/Wallet
Vitalik Buterin        Invitation         10     EF                      vbuterin  vitalik.eth
Changwu                Vitalik Buterin    0      imwallet
Justin Drake           Vitalik Buterin    0      EF
Anton Cheng            Changwu            5
Nicholas Lin           Changwu                   imwallet
Toni Wahrstatter       Justin Drake       17     EF
Ladislaus              Justin Drake       11     EF
DC Builder             Anton Cheng        10     worldcoin
Vectorized             Anton Cheng        10
Jason                  Nicholas Lin       32
Oskar                  Nicholas Lin       2
Alex Stokes            Toni Wahrstatter   4      EF
Parithosh Jayanti      Toni Wahrstatter          Nethermind
Auston Sterling        Ladislaus          3
Marius Van Der         Ladislaus          10
Mark Tyneway           Vectorized                Optimism
Georgios               Vectorized                Reth
TCZPL                  Jason
Ambition Chen          Jason              35
Adrian                 Oskar
Chih Cheng Liang       Oskar
Matt (lightclient)     Alex Stokes
Josh Rudolf            Alex Stokes               EF
Mikhail Kalinin        Paritosh Jayanti          Nethermind
Marek Morakzynski      Paritosh Jayanti          Nethermind
Nixo                   Auston Sterling    7      EF
Logris                 Auston Sterling
Hudson Jameson         Marius Van Der
Terence Tsao           Marius Van Der     4      Prysm
Jacek                  Terence Tsao              nimbus
Adrian                 Terence Tsao              lighthouse
Haurog                 Logris
Pooja Ranjan           Nixo                      ethereum cat herders
Butta                  Nixo
Tim Beiko              Pooja Ranjan
Sassal0x               Pooja Ranjan
G                      Marek Morakzynski
Ahmed                  Marek Morakzynski
Ansgar                 Ahmed
Potuz                  Ahmed
Preston                Potuz
Nishant                Potuz
Felix Lange            Mikhail Kalinin           go ethereum
Piper Merriam          Mikhail Kalinin    6
Janmajaya              Chih Cheng Liang
Graham                 Ambition Chen
Banri                  Ambition Chen
adjust                 Banri
yanyanho dapplearning  TCZPL
boge james (weimumu)   TCZPL
Kelvin                 Mark Tyneway
mlsudo                 Mark Tyneway
Jason Carver           Piper Merriam
Redwan Meslem          Invitation                web3.js
Richard Moore          Invitation                ethers.js
tom                    Invitation                Viem
Patricio               Invitation                Hardhat
Andrew                 Invitation                Remix
Bryant Eisenbach       Invitation                Ape
benny                  Invitation                Boa
Ligi                   Invitation                Chainlist
benny                  Invitation                Vyper
Kaan                   Invitation                Sourcify
Austin Griffith        Invitation                Scaffold-eth (v1 + v2)
Jaydon                 zengjiajun.eth            elytro
Joi                    zengjiajun.eth            elytro
Marc                   Invitation                web3.py
Felipe                 Invitation                web3.py
Wesley                 Sky                       EF
"""

TRAIN_SET_EXAMPLES_PROMPT = (
    "Here are some existing juror answers from the public 'training set'."
)

L1_PROMPT_BASE = """
For this question, you will need to estimate the relative importances of two direct dependencies of Ethereum:

<QUESTION>
{repo1} and {repo2} are dependencies of Ethereum. Estimate the ratio of importance of {repo2} to {repo1}.
E.g. if {repo2} is 10 times more important than {repo1} then answer "10"; if {repo1} is 10 times more important than {repo2} then answer "0.1".
</QUESTION>

This exact question will be asked to the Deepfunding jury.
**Your job is to predict how the Deepfunding jury will answer this question, and answer as close as possible to what the jury will.**
To be exact: we will score you based on the mean-squared-error between the log of your answer and the log of the jury's answer.

Your answer must be a positive float.

"""

L2_PROMPT_BASE = """
For this question, you will be given a repository and you need to estimate how much of its value belongs to that repository itself, versus its dependencies.

<QUESTION>
How much of {repo}'s value comes from itself, versus its dependencies? E.g.

1. 0.2 – The project is largely a fork or wrapper of something else; it does less original work relative to the work in its dependencies.
   Examples: Brave (a fork of Chromium), Ollama (a wrapper of llama.cpp).

2. 0.5 – The project is heavily dependent on its dependencies but also has substantial original work.
   Example: An Ethereum wallet.

3. 0.8 – The project is mostly original work and depends only on generic libraries; it could likely have been built without those dependencies if necessary.

</QUESTION>

This exact question will be asked to the Deepfunding jury.
**Your job is to predict how the Deepfunding jury will answer this question, and answer as close as possible to what the jury will.**
To be exact: we will score you based on the mean-squared-error between your answer and the jury's answer.

Your answer must be a float between 0 and 1.

"""

L3_PROMPT_BASE = """
For this question, we are looking at the {parent} repository. You will need to estimate the relative importances of two dependencies of this repository – i.e. which of their dependencies matters more for {parent}.

<QUESTION>
{repo1} and {repo2} are dependencies of {parent}. Estimate the ratio of importance of {repo2} compared to {repo1} for {parent}.
E.g. if {repo2} is 10 times more important than {repo1} for {parent} then answer "10"; if {repo1} is 10 times more important than {repo2} then answer "0.1".
</QUESTION>

This exact question will be asked to the Deepfunding jury.
**Your job is to predict how the Deepfunding jury will answer this question, and answer as close as possible to what the jury will.**
To be exact: we will score you based on the mean-squared-error between the log of your answer and the log of the jury's answer.

Your answer must be a positive float.

"""


class RepoHeuristics:
    """Fetch and cache heuristics about GitHub repositories."""

    def __init__(self, github_token=None, cache_ttl=3600):
        """
        Initialize the heuristics fetcher.

        Args:
            github_token: GitHub API token (optional but recommended to avoid rate limits)
            cache_ttl: Time in seconds to cache repository data
        """
        self.github_token = github_token or os.environ.get("GITHUB_TOKEN")
        self.headers = (
            {"Authorization": f"token {self.github_token}"} if self.github_token else {}
        )
        self.cache_ttl = cache_ttl
        self.repo_cache = {}

    @staticmethod
    def parse_github_url(repo_url):
        """Extract owner and repo name from a GitHub URL."""
        if not repo_url or "github.com" not in repo_url:
            return None, None

        parsed = urlparse(repo_url)
        path_parts = parsed.path.strip("/").split("/")

        if len(path_parts) < 2:
            return None, None

        return path_parts[0], path_parts[1]

    @lru_cache(maxsize=128)
    def get_repo_info(self, repo_url):
        """
        Fetch detailed information about a repository.

        Args:
            repo_url: Full URL to the GitHub repository

        Returns:
            Dictionary with repository information or None on failure
        """
        owner, repo = self.parse_github_url(repo_url)
        if not owner or not repo:
            return None

        cache_key = f"{owner}/{repo}"
        if cache_key in self.repo_cache:
            cached_data, timestamp = self.repo_cache[cache_key]
            if time.time() - timestamp < self.cache_ttl:
                return cached_data

        try:
            repo_api_url = f"https://api.github.com/repos/{owner}/{repo}"
            response = requests.get(repo_api_url, headers=self.headers)
            response.raise_for_status()
            repo_data = response.json()

            contributors_url = f"{repo_api_url}/contributors?per_page=5"
            contributors_resp = requests.get(contributors_url, headers=self.headers)
            contributors_resp.raise_for_status()
            contributors = contributors_resp.json()

            languages_url = f"{repo_api_url}/languages"
            languages_resp = requests.get(languages_url, headers=self.headers)
            languages_resp.raise_for_status()
            languages = languages_resp.json()

            repo_info = {
                "name": repo_data.get("name"),
                "full_name": repo_data.get("full_name"),
                "description": repo_data.get("description"),
                "stars": repo_data.get("stargazers_count", 0),
                "forks": repo_data.get("forks_count", 0),
                "watchers": repo_data.get("watchers_count", 0),
                "open_issues": repo_data.get("open_issues_count", 0),
                "created_at": repo_data.get("created_at"),
                "updated_at": repo_data.get("updated_at"),
                "contributors": [c.get("login") for c in contributors[:5]],
                "languages": languages,
                "homepage": repo_data.get("homepage"),
                "license": repo_data.get("license", {}).get("name") if repo_data.get("license") else None,
                "topics": repo_data.get("topics", []),
                "size": repo_data.get("size", 0),
            }

            self.repo_cache[cache_key] = (repo_info, time.time())

            return repo_info

        except Exception as e:
            print(f"Error fetching data for {repo_url}: {e}")
            return None

    def format_repo_heuristics(self, repo_url):
        """Format repository information as a readable string."""
        repo_info = self.get_repo_info(repo_url)
        if not repo_info:
            return f"Repository {repo_url} information not available."

        total_bytes = sum(repo_info["languages"].values()) if repo_info["languages"] else 1
        language_percentages = {
            lang: f"{count / total_bytes * 100:.1f}%"
            for lang, count in repo_info["languages"].items()
        }

        created_date = (
            datetime.strptime(repo_info["created_at"], "%Y-%m-%dT%H:%M:%SZ")
            if repo_info["created_at"]
            else None
        )
        age = (datetime.now() - created_date).days // 30 if created_date else "unknown"

        info_string = f"""[{repo_info['full_name']}]:
- Description: {repo_info['description'] or 'No description'}
- Stars: {repo_info['stars']}, Forks: {repo_info['forks']}
- Age: {age} months, Last updated: {repo_info['updated_at'].split('T')[0] if repo_info['updated_at'] else 'Unknown'}
- Main languages: {', '.join(f'{lang} ({pct})' for lang, pct in list(language_percentages.items())[:3])}
- Top contributors: {', '.join(repo_info['contributors']) if repo_info['contributors'] else 'Unknown'}
- Topics: {', '.join(repo_info['topics']) if repo_info['topics'] else 'None'}
"""
        return info_string

    def compare_repos(self, repo1_url, repo2_url):
        """Compare two repositories and return a formatted comparison string."""
        repo1_info = self.get_repo_info(repo1_url)
        repo2_info = self.get_repo_info(repo2_url)

        if not repo1_info or not repo2_info:
            return "Comparison information not available for one or both repositories."

        star_ratio = repo2_info["stars"] / max(1, repo1_info["stars"])
        if star_ratio > 1.5:
            star_comparison = f"{repo1_info['full_name']} vs {repo2_info['full_name']}: {repo2_info['full_name']} has {star_ratio:.1f}x more stars than {repo1_info['full_name']}"
            star_comparison = f"{repo2_info['full_name']} has {star_ratio:.1f}x more stars than {repo1_info['full_name']}"
        elif star_ratio < 0.67:
            star_comparison = f"{repo1_info['full_name']} has {(1 / star_ratio):.1f}x more stars than {repo2_info['full_name']}"
        else:
            star_comparison = f"Both repositories have similar star counts ({repo1_info['stars']} vs {repo2_info['stars']})"

        try:
            repo1_date = (
                datetime.strptime(repo1_info["updated_at"], "%Y-%m-%dT%H:%M:%SZ")
                if repo1_info["updated_at"]
                else None
            )
            repo2_date = (
                datetime.strptime(repo2_info["updated_at"], "%Y-%m-%dT%H:%M:%SZ")
                if repo2_info["updated_at"]
                else None
            )

            if repo1_date and repo2_date:
                date_diff = abs((repo1_date - repo2_date).days)
                if date_diff > 30:
                    more_recent = f"{repo1_info['full_name'] if repo1_date > repo2_date else repo2_info['full_name']} has been updated more recently"
                else:
                    more_recent = "Both repositories have been updated recently"
            else:
                more_recent = "Update information not available for comparison"
        except:
            more_recent = "Error comparing dates"

        comparison = f"""
1. {repo1_info['full_name']} vs 2. {repo2_info['full_name']}
- Star comparison: {star_comparison}
- Activity: {more_recent}
- Age: {repo1_info['created_at'].split('T')[0] if repo1_info['created_at'] else 'Unknown'} vs {repo2_info['created_at'].split('T')[0] if repo2_info['created_at'] else 'Unknown'}
- Languages:
  • {repo1_info['full_name']}: {', '.join(list(repo1_info['languages'].keys())[:3]) if repo1_info['languages'] else 'Unknown'}
  • {repo2_info['full_name']}: {', '.join(list(repo2_info['languages'].keys())[:3]) if repo2_info['languages'] else 'Unknown'}
"""
        return comparison


repo_heuristics = RepoHeuristics()


def _format_repo_name(repo: str):
    """Format a repository name for display."""
    return (
        repo.replace("https://", "")
        .replace("http://", "")
        .replace("www.", "")
        .replace("github.com", "")
        .strip("/")
    )


def format_l1_prompt(repo1, repo2):
    prompt = PROMPT_INTRO + L1_PROMPT_BASE.format(
        repo1=_format_repo_name(repo1), repo2=_format_repo_name(repo2)
    )

    if INCLUDE_REPO_HEURISTICS:
        try:
            comparison = repo_heuristics.compare_repos(repo1, repo2)
            prompt += "Repository Comparison:\n" + comparison
        except Exception as e:
            print(f"Warning: Could not fetch repository comparison for {repo1} and {repo2}: {e}")

    prompt += LONG_JUROR_LIST if USE_LONG_JUROR_LIST else SHORT_JUROR_LIST
    prompt += (
        TRAIN_SET_EXAMPLES_PROMPT + "\n\n" + L1_TRAIN_SET_EXAMPLES
        if L1_TRAIN_SET_EXAMPLES
        else ""
    )
    return prompt


def format_l2_prompt(repo):
    prompt = PROMPT_INTRO + L2_PROMPT_BASE.format(repo=_format_repo_name(repo))

    if INCLUDE_REPO_HEURISTICS:
        try:
            repo_info = repo_heuristics.format_repo_heuristics(repo)
            prompt += "Repository Information:\n" + repo_info
        except Exception as e:
            print(f"Warning: Could not fetch repository information for {repo}: {e}")

    prompt += LONG_JUROR_LIST if USE_LONG_JUROR_LIST else SHORT_JUROR_LIST
    prompt += (
        TRAIN_SET_EXAMPLES_PROMPT + "\n\n" + L2_TRAIN_SET_EXAMPLES
        if L2_TRAIN_SET_EXAMPLES
        else ""
    )
    return prompt


def format_l3_prompt(repo1, repo2, parent):
    prompt = PROMPT_INTRO + L3_PROMPT_BASE.format(
        repo1=_format_repo_name(repo1),
        repo2=_format_repo_name(repo2),
        parent=_format_repo_name(parent),
    )

    if INCLUDE_REPO_HEURISTICS:
        try:
            parent_info = repo_heuristics.format_repo_heuristics(parent)
            prompt += "Parent Repository Information:\n" + parent_info

            comparison = repo_heuristics.compare_repos(repo1, repo2)
            prompt += "\nDependency Comparison:\n" + comparison
        except Exception as e:
            print(f"Warning: Could not fetch repository information: {e}")

    prompt += LONG_JUROR_LIST if USE_LONG_JUROR_LIST else SHORT_JUROR_LIST
    prompt += (
        TRAIN_SET_EXAMPLES_PROMPT + "\n\n" + L3_TRAIN_SET_EXAMPLES
        if L3_TRAIN_SET_EXAMPLES
        else ""
    )
    return prompt


def load_csv(filepath):
    """Load CSV file into a pandas DataFrame."""
    try:
        df = pd.read_csv(filepath, skipinitialspace=True)
        return df
    except Exception as e:
        raise Exception(f"Error loading CSV file: {e}")


def extract_important_repos(df):
    """Extract the list of important repositories from rows with ethereum as parent."""
    important_repos = list(df[df["parent"] == "ethereum"]["repo"])
    return important_repos


def validate_important_repos(df, important_repos):
    """Perform validations on the list of important repositories."""
    if len(important_repos) != 35:
        raise ValueError(
            f"Expected 35 important repos, but found {len(important_repos)}"
        )

    ethereum_children = set(df[df["parent"] == "ethereum"]["repo"].unique())
    originality_children = set(df[df["parent"] == "originality"]["repo"].unique())

    if ethereum_children != originality_children:
        diff = ethereum_children.symmetric_difference(originality_children)
        raise ValueError(f"Mismatch between ethereum and originality lists: {diff}")

    middle_sections = df[~df["parent"].isin(["ethereum", "originality"])]
    middle_parents = set(middle_sections["parent"].unique())

    if not middle_parents.issubset(set(important_repos)):
        invalid_parents = middle_parents - set(important_repos)
        raise ValueError(
            f"Found items in middle column that are not in important_repos: {invalid_parents}"
        )

    return True


def calculate_dependency_weights(df, important_repos):
    """
    Calculate total weights of dependencies for each important repo
    and validate they sum to 1.
    """
    dependency_weights = {}
    dependency_counts = {}

    for repo in important_repos:
        deps = df[
            (df["parent"] == repo) & ~df["repo"].isin(["ethereum", "originality"])
        ]

        total_weight = deps["weight"].sum()
        dependency_weights[repo] = total_weight
        dependency_counts[repo] = len(deps)

        if len(deps) > 0 and not np.isclose(total_weight, 1.0, atol=1e-6):
            raise ValueError(f"Weights for {repo} sum to {total_weight}, not 1.0")

    return dependency_weights, dependency_counts


def calculate_combinations(n):
    """Calculate n choose 2, which is n*(n-1)/2"""
    return n * (n - 1) / 2


def get_repo_classification_weights(df, important_repos):
    """Get the ethereum and originality weights for each important repo."""
    ethereum_weights = dict(
        zip(
            df[df["parent"] == "ethereum"]["repo"],
            df[df["parent"] == "ethereum"]["weight"],
        )
    )

    originality_weights = dict(
        zip(
            df[df["parent"] == "originality"]["repo"],
            df[df["parent"] == "originality"]["weight"],
        )
    )

    return ethereum_weights, originality_weights


def compile_output_csv(
    important_repos,
    dependency_weights,
    dependency_counts,
    ethereum_weights,
    originality_weights,
    output_file,
):
    """Create the output CSV file with the required columns."""
    combinations = {}
    total_combinations = 0

    for repo in important_repos:
        count = dependency_counts.get(repo, 0)
        comb = calculate_combinations(count)
        combinations[repo] = comb
        total_combinations += comb

    with open(output_file, "w", newline="") as csvfile:
        fieldnames = [
            "important_repo",
            "sum_dep_weights",
            "num_deps",
            "num_deps_combinations",
            "originality",
            "ethereum",
        ]
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)

        writer.writeheader()
        for repo in important_repos:
            writer.writerow(
                {
                    "important_repo": repo,
                    "sum_dep_weights": dependency_weights.get(repo, 0),
                    "num_deps": dependency_counts.get(repo, 0),
                    "num_deps_combinations": combinations.get(repo, 0),
                    "originality": originality_weights.get(repo, 0),
                    "ethereum": ethereum_weights.get(repo, 0),
                }
            )

    return total_combinations


def generate_questions(df, important_repos, questions_file):
    """
    Generate three lists of questions based on the data and write them to a JSONL file:

    1. Level 1: Questions about consecutive pairs of important repos
    2. Level 2: Questions about each important repo's originality
    3. Level 3: Questions about consecutive pairs of dependencies for each important repo

    Args:
        df: DataFrame containing the CSV data
        important_repos: List of important repositories
        questions_file: Path to the output JSONL file

    Returns:
        Tuple of (level1_count, level2_count, level3_count) indicating the number
        of questions generated
    """
    with open(questions_file, "w") as jsonl_file:
        level1_count = _generate_and_write_level1_questions(important_repos, jsonl_file)
        level2_count = _generate_and_write_level2_questions(important_repos, jsonl_file)
        level3_count = _generate_and_write_level3_questions(
            df, important_repos, jsonl_file
        )

    return level1_count, level2_count, level3_count


def _generate_and_write_level1_questions(important_repos, jsonl_file):
    """
    Generate and write questions for consecutive pairs of important repos.

    Args:
        important_repos: List of important repositories
        jsonl_file: File handle for the output JSONL file

    Returns:
        Number of questions generated
    """
    count = 0

    for i in range(len(important_repos)):
        repo1 = important_repos[i]
        repo2 = important_repos[(i + 1) % len(important_repos)]  # Wrap around to start

        content = format_l1_prompt(repo1, repo2)

        question = {
            "level": 1,
            "repo1": repo1,
            "repo2": repo2,
            "parent": "ethereum",
            "content": content,
        }

        json.dump(question, jsonl_file)
        jsonl_file.write("\n")
        count += 1

    return count


def _generate_and_write_level2_questions(important_repos, jsonl_file):
    """
    Generate and write questions about each important repo's originality.

    Args:
        important_repos: List of important repositories
        jsonl_file: File handle for the output JSONL file

    Returns:
        Number of questions generated
    """
    count = 0

    for repo in important_repos:
        content = format_l2_prompt(repo)

        question = {
            "level": 2,
            "repo": repo,
            "parent": "originality",
            "content": content,
        }

        json.dump(question, jsonl_file)
        jsonl_file.write("\n")
        count += 1

    return count


def _generate_and_write_level3_questions(df, important_repos, jsonl_file):
    """
    Generate and write questions about consecutive pairs of dependencies
    for each important repo.

    Args:
        df: DataFrame containing the CSV data
        important_repos: List of important repositories
        jsonl_file: File handle for the output JSONL file

    Returns:
        Number of questions generated
    """
    count = 0

    middle_rows = df[~df["parent"].isin(["ethereum", "originality"])]

    for parent_repo in important_repos:
        dependencies = middle_rows[middle_rows["parent"] == parent_repo][
            "repo"
        ].tolist()

        if len(dependencies) > 1:  # Only if there are at least 2 dependencies
            for i in range(len(dependencies)):
                repo1 = dependencies[i]
                repo2 = dependencies[(i + 1) % len(dependencies)]  # Wrap around to start

                content = format_l3_prompt(repo1=repo1, repo2=repo2, parent=parent_repo)

                question = {
                    "level": 3,
                    "repo1": repo1,
                    "repo2": repo2,
                    "parent": parent_repo,
                    "content": content,
                }

                json.dump(question, jsonl_file)
                jsonl_file.write("\n")
                count += 1

    return count


def main(input_file, output_file, questions_file=None):
    """Main function to process the CSV file and optionally generate questions."""
    try:
        df = load_csv(input_file)

        important_repos = extract_important_repos(df)

        validate_important_repos(df, important_repos)

        dependency_weights, dependency_counts = calculate_dependency_weights(
            df, important_repos
        )

        ethereum_weights, originality_weights = get_repo_classification_weights(
            df, important_repos
        )

        total_combinations = compile_output_csv(
            important_repos,
            dependency_weights,
            dependency_counts,
            ethereum_weights,
            originality_weights,
            output_file,
        )

        print(f"Successfully processed {input_file} and created {output_file}")
        print(
            f"Found {len(important_repos)} important repositories with valid weight distributions."
        )
        print(
            f"Total number of dependency pairs (num_deps_combinations): {total_combinations}"
        )

        if questions_file:
            level1_count, level2_count, level3_count = generate_questions(
                df, important_repos, questions_file
            )

            print(f"\nGenerated and wrote to {questions_file}:")
            print(f"- {level1_count} level 1 questions (important repo comparisons)")
            print(f"- {level2_count} level 2 questions (repo originality)")
            print(f"- {level3_count} level 3 questions (dependency comparisons)")
            print(f"Total: {level1_count + level2_count + level3_count} questions")

    except Exception as e:
        print(f"Error: {e}")
        raise


if __name__ == "__main__":
    import argparse

    parser = argparse.ArgumentParser(description="Process repo dependency CSV file.")
    parser.add_argument("input_file", help="Path to the input CSV file")
    parser.add_argument(
        "--output_file",
        default="repo_analysis.csv",
        help="Path to the output CSV file (default: repo_analysis.csv)",
    )
    parser.add_argument(
        "--questions_file", help="Path to output JSONL file for generated questions"
    )

    args = parser.parse_args()
    main(args.input_file, args.output_file, args.questions_file)
#+end_src

0.7. appendix: sample questions

Level 1

This question is for the `Deepfunding competition`, a distillation prediction market for determining the relative importances of different dependencies in the Ethereum ecosystem.

For this question, you will need to estimate the relative importances of two direct dependencies of Ethereum:

<QUESTION>
web3/web3.js and prysmaticlabs/prysm are dependencies of Ethereum. Estimate the ratio of importance of prysmaticlabs/prysm to web3/web3.js.
E.g. if prysmaticlabs/prysm is 10 times more important than web3/web3.js then answer "10"; if web3/web3.js is 10 times more important than prysmaticlabs/prysm then answer "0.1".
</QUESTION>

This exact question will be asked to the Deepfunding jury.
**Your job is to predict how the Deepfunding jury will answer this question, and answer as close as possible to what the jury will.**
To be exact: we will score you based on the mean-squared-error between the log of your answer and the log of the jury's answer.

Your answer must be a positive float.

Repository Comparison:

1. web3/web3.js vs 2. prysmaticlabs/prysm
- Star comparison: web3/web3.js has 5.5x more stars than prysmaticlabs/prysm
- Activity: Both repositories have been updated recently
- Age: 2014-09-30 vs 2018-01-11
- Languages: 
  • web3/web3.js: TypeScript, JavaScript, Shell
  • prysmaticlabs/prysm: Go, Starlark, Shell
 
The jurors are expected to be experts in the Ethereum ecosystem. Some known names include: Jason, Toni Wahrstatter, Ladislaus, Vitalik Buterin, DC Builder, Vectorized and Marius Van Der.

Level 2

#+begin_src markdown
This question is for the `Deepfunding competition`, a distillation prediction market for determining the relative importances of different dependencies in the Ethereum ecosystem.

For this question, you will be given a repository and you need to estimate how much of its value belongs to that repository itself, versus its dependencies.

<QUESTION>
How much of vyperlang/vyper's value comes from itself, versus its dependencies? E.g.

1. 0.2 – The project is largely a fork or wrapper of something else; it does less original work relative to the work in its dependencies.
   Examples: Brave (a fork of Chromium), Ollama (a wrapper of llama.cpp).

2. 0.5 – The project is heavily dependent on its dependencies but also has substantial original work.
   Example: An Ethereum wallet.

3. 0.8 – The project is mostly original work and depends only on generic libraries; it could likely have been built without those dependencies if necessary.

</QUESTION>

This exact question will be asked to the Deepfunding jury.
**Your job is to predict how the Deepfunding jury will answer this question, and answer as close as possible to what the jury will.**
To be exact: we will score you based on the mean-squared-error between your answer and the jury's answer.

Your answer must be a float between 0 and 1.

Repository Information:
[vyperlang/vyper]:
- Description: Pythonic Smart Contract Language for the EVM
- Stars: 4995, Forks: 837
- Age: 101 months, Last updated: 2025-03-17
- Main languages: Python (99.8%), Makefile (0.1%), Batchfile (0.1%)
- Top contributors: jacqueswww, charles-cooper, iamdefinitelyahuman, fubuloubu, DavidKnott
- Topics: ethereum, ethereum-dapp, language, python, vyper

The jurors are expected to be experts in the Ethereum ecosystem. Some known names include: Jason, Toni Wahrstatter, Ladislaus, Vitalik Buterin, DC Builder, Vectorized and Marius Van Der.
#+end_src

Level 3

This question is for the `Deepfunding competition`, a distillation prediction market for determining the relative importances of different dependencies in the Ethereum ecosystem.

For this question, we are looking at the prysmaticlabs/prysm repository. You will need to estimate the relative importances of two dependencies of this repository -- i.e. which of their dependencies matters more for prysmaticlabs/prysm.

<QUESTION>
coreos/go-systemd and herumi/bls-eth-go-binary are dependencies of prysmaticlabs/prysm. Estimate the ratio of importance of herumi/bls-eth-go-binary compared to coreos/go-systemd for prysmaticlabs/prysm.
E.g. if herumi/bls-eth-go-binary is 10 times more important than coreos/go-systemd for prysmaticlabs/prysm then answer "10"; if coreos/go-systemd is 10 times more important than herumi/bls-eth-go-binary then answer "0.1".
</QUESTION>

This exact question will be asked to the Deepfunding jury.
**Your job is to predict how the Deepfunding jury will answer this question, and answer as close as possible to what the jury will.**
To be exact: we will score you based on the mean-squared-error between the log of your answer and the log of the jury's answer.

Your answer must be a positive float.

Parent Repository Information:
[prysmaticlabs/prysm]:
- Description: Go implementation of Ethereum proof of stake
- Stars: 3546, Forks: 1090 
- Age: 87 months, Last updated: 2025-03-17
- Main languages: Go (93.6%), Starlark (5.5%), Shell (0.5%)
- Top contributors: terencechain, prestonvanloon, rauljordan, nisdas, rkapka
- Topics: ethereum

Dependency Comparison:

1. coreos/go-systemd vs 2. herumi/bls-eth-go-binary
- Star comparison: coreos/go-systemd has 36.6x more stars than herumi/bls-eth-go-binary
- Activity: Both repositories have been updated recently
- Age: 2013-09-13 vs 2019-10-19
- Languages: 
  • coreos/go-systemd: Go, Shell
  • herumi/bls-eth-go-binary: Go, C, C++
 
The jurors are expected to be experts in the Ethereum ecosystem. Some known names include: Jason, Toni Wahrstatter, Ladislaus, Vitalik Buterin, DC Builder, Vectorized and Marius Van Der.

0.8. appendix: helper csv

just the result of a summarization script on their sample submission file — the columns that matter here are num_deps (\(n_A\) for each \(A\)) and num_deps_combinations (\(n_A(n_A-1)/2\)).

important_repo,sum_dep_weights,num_deps,num_deps_combinations,originality,ethereum
https://github.com/prysmaticlabs/prysm,0.9999999999999936,245,29890.0,0.8650592053492349,0.0294117647058823
https://github.com/ethereum/fe,0.9999999999999927,301,45150.0,0.9443691638516738,0.0294117647058823
https://github.com/ethereum/remix-project,0.9999999999999305,2277,2591226.0,0.7100532015645752,0.0294117647058823
https://github.com/eth-infinitism/account-abstraction,0.999999999999991,863,371953.0,0.4286456194839025,0.0294117647058823
https://github.com/wevm/viem,0.9999999999999376,725,262450.0,0.1450951555623104,0.0294117647058823
https://github.com/nethereum/nethereum,0.9999999999999997,57,1596.0,0.4018369295764015,0.0294117647058823
https://github.com/ethers-io/ethers.js,0.9999999999999998,138,9453.0,0.0095957100544701,0.0294117647058823
https://github.com/chainsafe/lodestar,0.9999999999999116,1516,1148370.0,0.8811032731861512,0.0294117647058823
https://github.com/ethereum-lists/chains,0.9999999999999997,6,15.0,0.6236088131630412,0.0294117647058823
https://github.com/sigp/lighthouse,0.9999999999999987,464,107416.0,0.9133744057295108,0.0294117647058823
https://github.com/ethereum/py-evm,1.0,11,55.0,0.317338474800942,0.0294117647058823
https://github.com/hyperledger/besu,0.0,0,0.0,0.774361090611847,0.0294117647058823
https://github.com/erigontech/erigon,0.9999999999999813,253,31878.0,0.932572749866548,0.0294117647058823
https://github.com/vyperlang/titanoboa,0.9999999999999989,27,351.0,0.1964900800509441,0.0294117647058823
https://github.com/alloy-rs/alloy,0.9999999999999994,19,171.0,0.3690905969681286,0.0294117647058823
https://github.com/ethereumjs/ethereumjs-monorepo,0.9999999999999718,828,342378.0,0.8417883513048304,0.0294117647058823
https://github.com/foundry-rs/foundry,0.9999999999999529,482,115921.0,0.6458766968885356,0.0294117647058823
https://github.com/safe-global/safe-smart-account,0.9999999999999712,538,144453.0,0.6011871268121423,0.0294117647058823
https://github.com/consensys/teku,0.9999999999999998,137,9316.0,0.4685428282978935,0.0294117647058823
https://github.com/grandinetech/grandine,0.9999999999999785,438,95703.0,0.9124435469744914,0.0294117647058823
https://github.com/ethereum/sourcify,0.9999999999999243,908,411778.0,0.4089762481898589,0.0294117647058823
https://github.com/ethereum/solidity,1.0,3,3.0,0.1090974834405934,0.0294117647058823
https://github.com/status-im/nimbus-eth2,0.9999999999999982,104,5356.0,0.417548394392558,0.0294117647058823
https://github.com/openzeppelin/openzeppelin-contracts,0.9999999999999536,562,157641.0,0.3373326583791293,0.0294117647058823
https://github.com/ethereum/web3.py,0.9999999999999996,13,78.0,0.8039938317729571,0.0294117647058823
https://github.com/nethermindeth/nethermind,0.0,0,0.0,0.4171925064865261,0.0294117647058823
https://github.com/apeworx/ape,0.9999999999999994,38,703.0,0.3110356991327095,0.0294117647058823
https://github.com/a16z/helios,0.999999999999944,628,196878.0,0.6740220469622176,0.0294117647058823
https://github.com/paradigmxyz/reth,0.9999999999999601,470,110215.0,0.3837737278955573,0.0294117647058823
https://github.com/scaffold-eth/scaffold-eth-2,0.9999999999999284,859,368511.0,0.688816951981115,0.0294117647058823
https://github.com/vyperlang/vyper,1.0,10,45.0,0.9323242887762672,0.0294117647058823
https://github.com/hyperledger-web3j/web3j,0.0,0,0.0,0.2430654500988898,0.0294117647058823
https://github.com/ethereum/go-ethereum,0.9999999999999986,116,6670.0,0.8467503069554304,0.0294117647058823
https://github.com/nomicfoundation/hardhat,0.9999999999999869,1891,1786995.0,0.5435417307291117,0.0294117647058823

Summing num_deps_combinations:

Total number of dependency pairs (num_deps_combinations): 8352618.0

\[\frac{34\cdot(34-1)}2+34+8352618=8353213\]

Footnotes:

[1] for some reason, only 4381 such nodes and 10075 edges are present in the visualization graph

[2] I checked that this is the case with the sample submission

[3] i.e. it is the ground truth for the logits of your weights; we take the MSE of your logits against this

[4] I’m not really sure of the exact format in which this data is stored. The competition instructions state it is stored as https://github.com/b1,https://github.com/b2,advantage_b_over_a but this doesn’t make sense, as it must also include \(A\), since multiple projects \(A\) can have the same dependencies \(B_1,B_2\). In general I don’t know where to find the train and public test datasets mentioned in the competition instructions.

[5] BLEG: I’m not sure how this is currently being done, so leaving it abstract — is it just the average weighted by past peer scores earned?

Author: Abhimanyu Pallavi Sudhir

Created: 2025-05-29 Thu 15:54