Table of Contents
- 0.1. problem formalization
- 0.2. deepfunding scoring
- 0.3. scoring for level 2
- 0.4. scoring for level 1 and level 3
- 0.5. most relevant resources
- 0.6. appendix: question generation script
- 0.7. appendix: sample questions
- 0.8. appendix: helper csv
0.1. problem formalization
(notation: to match the graph on the deepfunding website, all arrows are in the direction of dependency, i.e. \(P\to Q\) means \(P\) depends on \(Q\).)
We have a tree (well, a DAG) of depth exactly 2.
- Depth 0 is a single node, ethereum — call this \(O\).
- Depth 1 is 34 nodes, the "seed repositories" — \(A_1,\dots,A_{34}\).
- Depth 2 is the ~5000 code dependencies of the seed repositories, \(B_1,\dots,B_{4381}\), with a total of ~15000 edges of the form \((A_i,B_j)\).[1]
The task of a contestant is to provide:
- Level 1: weights \(w_{O\to A}\), denoted like `https://github.com/a,ethereum,0.2`, such that \(\sum_A w_{O\to A}=1\) [34 outputs]
- Level 2: self-weights \(w_{A}\), denoted like `https://github.com/a,originality,0.6` [34 outputs]
- Level 3: weights \(w_{A\to B}\), denoted like `https://github.com/b,https://github.com/a,0.6`, such that the sum over \(A\)'s dependencies is \(\sum_B w_{A\to B}=1\) (note that they add up to 1, not to \(1-w_{A}\)[2]) [~15000 outputs]
A juror is given random samples of:
- pairs of edges \(((O\to A_1),(O\to A_2))\), for which they give the "relative advantage of \(A_2\) over \(A_1\) to \(O\)", \(j_{(O\to A_1),(O\to A_2)}\) (taken[3] as measured in logits)
- depth-1 nodes \(A\), for which they directly estimate the originality score \(j_{A}\)
- pairs of edges \(((A\to B_1),(A\to B_2))\), for which they give the "relative advantage of \(B_2\) over \(B_1\) to \(A\)", \(j_{(A\to B_1),(A\to B_2)}\)[4]
To give values for all of these would take, where \(n_A\) is the number of dependencies of \(A\):
\[\frac{34\cdot(34-1)}{2}+34+\sum_{A}\frac{n_A(n_A-1)}{2}\] I ran a quick script to calculate this from the sample submissions (see “helper csv” below), and it comes out to 8,353,774. In particular this means we would need 8,353,774 events if we directly implemented a distillation market. So we need something smarter — ideally something that doesn’t require more than 15000 questions for the 15000 weights.
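Here is that count reproduced as a quick sketch (assuming the helper csv from the appendix below is saved as `helper.csv`, a hypothetical filename):

#+begin_src python
import pandas as pd

# Count every question a full elicitation would need, using the
# num_deps column (n_A) from the helper csv in the appendix below.
helper = pd.read_csv("helper.csv")
n_A = helper["num_deps"]
level3_pairs = int((n_A * (n_A - 1) // 2).sum())  # all Level-3 edge pairs
level1_pairs = 34 * (34 - 1) // 2                 # all Level-1 seed-repo pairs
level2_questions = 34                             # one originality question per seed repo
print(level1_pairs + level2_questions + level3_pairs)
#+end_src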
0.2. deepfunding scoring
First let me quickly go over how Deepfunding scores the contestants.
The cost for Level-1 answers is \(\left|\log(w_{O\to A_2}/w_{O\to A_1}) - j_{(O\to A_1),(O\to A_2)}\right|^2\) , summed over all pairs \(((O\to A_1),(O\to A_2))\) for which the juror has provided an estimate.
The cost for Level-2 answers is simply \(\left|w_A-j_A\right|^2\) summed over all \(A\) for which the juror has provided an estimate.
The cost for Level-3 answers is again \(\left|\log(w_{A\to B_2}/w_{A\to B_1}) - j_{(A\to B_1),(A\to B_2)}\right|^2\) , summed over all pairs \(((A\to B_1),(A\to B_2))\) for which the juror has provided an estimate.
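In code, the three per-sample costs look like this (a direct transcription of the above; the function names are mine):

#+begin_src python
import numpy as np

def level1_or_level3_cost(w1, w2, juror_logit):
    """Cost of one juror pair-sample at Level 1 or Level 3: squared error
    between the log of the submitted weight ratio and the juror's logit."""
    return (np.log(w2 / w1) - juror_logit) ** 2

def level2_cost(w_A, j_A):
    """Cost of one juror originality sample at Level 2."""
    return (w_A - j_A) ** 2
#+end_src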
BLEG: I’m not sure how these are to be weighted. The concrete instructions just say they are summed over all juror samples, but this depends on how many juror samples are taken of each category — this is probably important if we want to weight questions properly. We should ask them, or maybe we can just create another event to ask the market what it thinks they will do :)
This covers how Deepfunding will score our final model submission. As we will see, this does not necessarily straightforwardly translate to how we score miners in preparing our model.
0.3. scoring for level 2
Level-2 is straightforward — we simply create a question for each \(A\) (depth-1 node) asking “how original is \(A\)?” and score as:
\[s(w_A)=\left|w_A-j_A\right|^2\]
if \(j_A\) is elicited, and 0 otherwise. Then perform the peer score adjustment (otherwise the miners are incentivized to just not bet).
0.4. scoring for level 1 and level 3

The basic problem for scoring Level-1 and Level-3 questions is:
- we can only create events for each edge, not for each of the 8 million pairs of edges
- the scoring needs to be “modular” i.e. the total score needs to be reducible to a sum of scoring functions that each depend on only one question. \(\sum\left|\log(w_{A\to B_2}/w_{A\to B_1}) - j_{(A\to B_1),(A\to B_2)}\right|^2\) does not satisfy this property.
Three possible solutions:
0.4.1. straightforward approach
Here's one idea: for Level-3 we create one event per \(A\), i.e. per depth-1 node (and for Level-1 we analogously create just one event for \(O\)). To forecast this event the miner reports a vector \(w_{A\to}\in[0,1]^{n_A}\) such that \(\sum w_{A\to}=1\), i.e. weights for all of \(A\)'s dependencies, and this answer is scored in a special way (rather than just as a standard continuous random variable):
\[ s(w_{A\to})=\sum\left|\log(w_{A\to B_2}/w_{A\to B_1}) - j_{(A\to B_1),(A\to B_2)}\right|^2 \] where the sum is over pairs \((A\to B_1,A\to B_2)\) for which the jury ultimately gives an answer.
(and again just make the peer score adjustment as per normal)
This way we have just 34 events for Level-3 and 1 event for Level-1 (in addition to the 34 events for Level-2).
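A minimal sketch of how the validator would score such a vector answer (names are mine; the event plumbing is elided):

#+begin_src python
import numpy as np

def score_weight_vector(w, jury_pairs):
    """Cost of one miner's full weight vector for a single depth-1 node A.

    w: dict mapping each dependency B to the miner's weight w_{A->B}
       (the weights should sum to 1).
    jury_pairs: dict mapping (B1, B2) to the juror logit j_{(A->B1),(A->B2)},
        for the edge pairs the jury ultimately answered.
    Lower is better; the peer score adjustment happens afterwards.
    """
    return sum(
        (np.log(w[b2] / w[b1]) - j) ** 2 for (b1, b2), j in jury_pairs.items()
    )
#+end_src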
Key questions, aka potential problems with tiny numbers:
- The answers to these questions might be very high-dimensional vectors — the lowest dimension is 6, the highest is 2277 (full numbers in the helper csv section). Could this be an issue? Can LLM-based miners even meaningfully fit so much in their context window (I mean they can, but like, usefully)?
- How "advanced" are the miners on the network right now? If they're all just simple LLM callers (without any domain-specific engineering) they would probably have a problem with this task, e.g. even just producing such tiny numbers. This would also be a problem for the Shapley values solution, actually.
I mean, I can imagine making a decent miner by e.g. just asking an LLM to make relative comparisons and fitting some model to it, but if the miners on the network do not do such things, it would be no use — wait, actually that gives me an idea, see the section "shapley values from relative comparisons".

0.4.2. shapley values
Here’s the idea: we create events for each edge, and score miners based on how useful their weight estimate was to the final cost function. Fortunately calculating Shapley values here is easy, because the cost is independent of any permutations.
For each edge weight \(w_{A\to B}\) (and likewise for the 34 edges \(w_{O\to A}\)) we create an event:
Estimate the relative importance of dependency B to project A as a number between 0 and 1. {{ description of how miners will be scored, i.e. a more practical summary of this section }} {{ training set data }}
From all the miner estimates for \(w_{A\to B}\) we get a consensus estimate \(\hat{w}_{A\to B}\) in the usual way.[5]
[IGNORE THIS. go with either the straightforward approach or the below one]
0.4.3. shapley values from relative comparisons
Ok, here’s perhaps the most promising approach: we do give miners pairs of edges. But we don’t need to give them all pairs of edges.
For node \(A\) with dependencies \(B_1,\dots,B_n\), we can just write questions for the \(n-1\) adjacent pairs of edges:
- \(c_{(A\to B_1),(A\to B_2)}\)
- \(c_{(A\to B_2),(A\to B_3)}\)
- …
- \(c_{(A\to B_{n-1}),(A\to B_n)}\)
How much more important is dependency B_{i+1} than B_i to A? i.e. estimate log(w_{A→B_{i+1}} / w_{A→B_i}) ...
… and calculate the implied pairwise comparison for any \(c_{(A\to B_i),(A\to B_j)}\) (specifically for those edge pairs the jury scores) by simply summing:
\[
c_{(A\to B_i),(A\to B_j)} = \sum_{k=i}^{j-1}c_{(A\to B_k),(A\to B_{k+1})}
\]
(for \(i<j\)).
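In code, the reconstruction is just a telescoping sum (a 0-indexed sketch, with `adjacent[k]` holding the forecast \(c_{(A\to B_k),(A\to B_{k+1})}\)):

#+begin_src python
def implied_comparison(adjacent, i, j):
    """Implied comparison c_{(A->B_i),(A->B_j)} for i < j, reconstructed
    from the n-1 adjacent-pair forecasts by telescoping."""
    assert 0 <= i < j <= len(adjacent)
    return sum(adjacent[i:j])
#+end_src

We can't directly pay miners based on these implied comparisons without breaking modularity, though: each one depends on several questions at once.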
Instead, for each \(j_{(A\to B_i),(A\to B_j)}\) that we receive, we can measure the relative contributions of each \(c_{(A\to B_k),(A\to B_{k+1})}\) to the cost function. We imagine that before the miners’ forecasts, all \(c_{(A\to B_k),(A\to B_{k+1})}\) were initialized to zero (i.e. a uniform prior). Then these forecasts define a coalitional game, as follows.
Definition: a single miner's forecasts as a coalitional game. The miner's forecast on each \(c_{(A\to B_k),(A\to B_{k+1})}\) defines a player \(k\in\{1,\dots,n-1\}\) in an \((n-1)\)-player coalitional game, with a value function on subsets \(S\subseteq\{1,\dots,n-1\}\) as follows:
\[ v(S)= -\sum_{(i,j)}\left(\left|j_{(A\to B_i),(A\to B_j)}-\sum_{k\in S\cap \{i,\dots j-1\}} c_{(A\to B_k),(A\to B_{k+1})}\right|^2-\left|j_{(A\to B_i),(A\to B_j)}\right|^2\right) \] (where the outer summation is taken over all \((i,j)\) pairs such that \(j_{(A\to B_i),(A\to B_j)}\) is in the jury sample)
(crucially, we can just take the remaining forecasts in the expression as “external facts of the world”, i.e. information known to the validator — so the scoring rule itself is modular.)
This lets us take the Shapley value in the definitional way.
\[
\begin{align*}
s(c_{(A\to B_k),(A\to B_{k+1})}) &= \varphi_k(v)\\
&= \frac{1}{(n-1)!}\sum_R\left[v(\{j\mid j<_R k\}\cup\{k\}) - v(\{j\mid j<_R k\})\right]
\end{align*}
\]
where the sum is over all \((n-1)!\) orderings \(R\) of the players (and then make the peer score adjustment against all other miners).
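Here is a brute-force sketch of this computation for one miner on one node \(A\) (names are mine, indices 0-based; exact enumeration is only feasible for small \(n\), so in practice one would subsample orderings):

#+begin_src python
import math
from itertools import permutations

def make_value_function(jury, forecasts):
    """Value function v(S) of the coalitional game defined above.

    jury: dict mapping 0-indexed pairs (i, j), i < j, to the juror
        logit j_{(A->B_i),(A->B_j)}.
    forecasts: forecasts[k] is the miner's adjacent-pair forecast
        c_{(A->B_k),(A->B_{k+1})}.
    """
    def v(S):
        total = 0.0
        for (i, j), target in jury.items():
            # Only the adjacent-pair forecasts in the coalition S contribute.
            partial = sum(forecasts[k] for k in range(i, j) if k in S)
            total -= (target - partial) ** 2 - target ** 2
        return total
    return v

def shapley_values(n_players, v):
    """Exact Shapley values by enumerating all n_players! orderings."""
    phi = [0.0] * n_players
    for order in permutations(range(n_players)):
        coalition = set()
        for player in order:
            before = v(frozenset(coalition))
            coalition.add(player)
            phi[player] += v(frozenset(coalition)) - before
    return [p / math.factorial(n_players) for p in phi]

# Toy example: n = 4 dependencies, so 3 adjacent-pair players.
jury = {(0, 2): 1.2, (1, 3): -0.5}  # sampled juror logits (made up)
forecasts = [0.8, 0.3, -0.6]        # one miner's adjacent-pair forecasts
v = make_value_function(jury, forecasts)
print(shapley_values(3, v))
#+end_src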
0.5. most relevant resources

- Concrete instructions, including scoring rule, sample submission file, etc.: https://cryptopond.xyz/modelfactory/detail/2564617?tab=0
- Deepfunding scoring mechanism: https://github.com/deepfunding/scoring
- Eval Primer #2: Build your own model. https://www.youtube.com/watch?v=JUiwrcMASXY — primer for the previous Deepfunding Mini competition
- “d/acc: one year later” from Vitalik’s blog: https://vitalik.eth.limo/general/2025/01/05/dacc2.html#5
0.6. appendix: question generation script
#+begin_src python
import csv
import json
import os
import time
from datetime import datetime
from functools import lru_cache
from urllib.parse import urlparse

import numpy as np
import pandas as pd
import requests
from dotenv import load_dotenv

load_dotenv()

INCLUDE_REPO_HEURISTICS = True

if "GITHUB_TOKEN" not in os.environ and INCLUDE_REPO_HEURISTICS:
    print(
        "Warning: GitHub API token not found. You will get rate-limited. "
        "You should set INCLUDE_REPO_HEURISTICS to False."
    )

L1_TRAIN_SET_EXAMPLES = ""
L2_TRAIN_SET_EXAMPLES = ""
L3_TRAIN_SET_EXAMPLES = ""

PROMPT_INTRO = "This question is for the `Deepfunding competition`, a distillation prediction market for determining the relative importances of different dependencies in the Ethereum ecosystem."

USE_LONG_JUROR_LIST = False
SHORT_JUROR_LIST = "\nThe jurors are expected to be experts in the Ethereum ecosystem. Some known names include: Jason, Toni Wahrstatter, Ladislaus, Vitalik Buterin, DC Builder, Vectorized and Marius Van Der."
LONG_JUROR_LIST = """
This is the list of publicly known jurors, if it helps you. The jurors are expected to be experts in the Ethereum ecosystem.
Juror | Nominator | Votes | Affiliation | Github | ENS/Wallet |
---|---|---|---|---|---|
Vitalik Buterin | Invitation | 10 | EF | vbuterin | vitalik.eth |
Changwu | Vitalik Buterin | 0 | imwallet | ||
Justin Drake | Vitalik Buterin | 0 | EF | ||
Anton Cheng | Changwu | 5 | |||
Nicholas Lin | Changwu | imwallet | |||
Toni Wahrstatter | Justin Drake | 17 | EF | ||
Ladislaus | Justin Drake | 11 | EF | ||
DC Builder | Anton Cheng | 10 | worldcoin | ||
Vectorized | Anton Cheng | 10 | |||
Jason | Nicholas Lin | 32 | |||
Oskar | Nicholas Lin | 2 | |||
Alex Stokes | Toni Wahrstatter | 4 | EF | ||
Parithosh Jayanti | Toni Wahrstatter | Nethermind | |||
Auston Sterling | Ladislaus | 3 | |||
Marius Van Der | Ladislaus | 10 | |||
Mark Tyneway | Vectorized | Optimism | |||
Georgios | Vectorized | Reth | |||
TCZPL | Jason | ||||
Ambition Chen | Jason | 35 | |||
Adrian | Oskar | ||||
Chih Cheng Liang | Oskar | ||||
Matt (lightclient) | Alex Stokes | ||||
Josh Rudolf | Alex Stokes | EF | |||
Mikhail Kalinin | Paritosh Jayanti | Nethermind | |||
Marek Morakzynski | Paritosh Jayanti | Nethermind | |||
Nixo | Auston Sterling | 7 | EF | ||
Logris | Auston Sterling | ||||
Hudson Jameson | Marius Van Der | ||||
Terence Tsao | Marius Van Der | 4 | Prysm | ||
Jacek | Terence Tsao | nimbus | |||
Adrian | Terence Tsao | lighthouse | |||
Haurog | Logris | ||||
Pooja Ranjan | Nixo | ethereum cat herders | |||
Butta | Nixo | ||||
Tim Beiko | Pooja Ranjan | ||||
Sassal0x | Pooja Ranjan | ||||
G | Marek Morakzynski | ||||
Ahmed | Marek Morakzynski | ||||
Ansgar | Ahmed | ||||
Potuz | Ahmed | ||||
Preston | Potuz | ||||
Nishant | Potuz | ||||
Felix Lange | Mikhail Kalinin | go ethereum | |||
Piper Merriam | Mikhail Kalinin | 6 | |||
Janmajaya | Chih Cheng Liang | ||||
Graham | Ambition Chen | ||||
Banri | Ambition Chen | ||||
adjust | Banri | ||||
yanyanho dapplearning | TCZPL | ||||
boge james (weimumu) | TCZPL | ||||
Kelvin | Mark Tyneway | ||||
mlsudo | Mark Tyneway | ||||
Jason Carver | Piper Merriam | ||||
Redwan Meslem | Invitation | web3.js | |||
Richard Moore | Invitation | ethers.js | |||
tom | Invitation | Viem | |||
Patricio | Invitation | Hardhat | |||
Andrew | Invitation | Remix | |||
Bryant Eisenbach | Invitation | Ape | |||
benny | Invitation | Boa | |||
Ligi | Invitation | Chainlist | |||
benny | Invitation | Vyper | |||
Kaan | Invitation | Sourcify | |||
Austin Griffith | Invitation | Scaffold-eth (v1 + v2) | |||
Jaydon | zengjiajun.eth | elytro | |||
Joi | zengjiajun.eth | elytro | |||
Marc | Invitation | web3.py | |||
Felipe | Invitation | web3.py | |||
Wesley | Sky | EF |
"""

TRAIN_SET_EXAMPLES_PROMPT = (
    "Here are some existing juror answers from the public 'training set'."
)

L1_PROMPT_BASE = """
For this question, you will need to estimate the relative importances of two direct dependencies of Ethereum:

<QUESTION>
{repo1} and {repo2} are dependencies of Ethereum. Estimate the ratio of importance of {repo2} to {repo1}. E.g. if {repo2} is 10 times more important than {repo1} then answer "10"; if {repo1} is 10 times more important than {repo2} then answer "0.1".
</QUESTION>

This exact question will be asked to the Deepfunding jury. **Your job is to predict how the Deepfunding jury will answer this question, and answer as close as possible to what the jury will.** To be exact: we will score you based on the mean-squared-error between the log of your answer and the log of the jury's answer.

Your answer must be a positive float.
"""

L2_PROMPT_BASE = """
For this question, you will be given a repository and you need to estimate how much of its value belongs to that repository itself, versus its dependencies.

<QUESTION>
How much of {repo}'s value comes from itself, versus its dependencies? E.g.

1. 0.2 – The project is largely a fork or wrapper of something else; it does less original work relative to the work in its dependencies. Examples: Brave (a fork of Chromium), Ollama (a wrapper of llama.cpp).
2. 0.5 – The project is heavily dependent on its dependencies but also has substantial original work. Example: An Ethereum wallet.
3. 0.8 – The project is mostly original work and depends only on generic libraries; it could likely have been built without those dependencies if necessary.
</QUESTION>

This exact question will be asked to the Deepfunding jury. **Your job is to predict how the Deepfunding jury will answer this question, and answer as close as possible to what the jury will.** To be exact: we will score you based on the mean-squared-error between your answer and the jury's answer.

Your answer must be a float between 0 and 1.
"""

L3_PROMPT_BASE = """
For this question, we are looking at the {parent} repository. You will need to estimate the relative importances of two dependencies of this repository -- i.e. which of their dependencies matters more for {parent}.

<QUESTION>
{repo1} and {repo2} are dependencies of {parent}. Estimate the ratio of importance of {repo2} compared to {repo1} for {parent}. E.g. if {repo2} is 10 times more important than {repo1} for {parent} then answer "10"; if {repo1} is 10 times more important than {repo2} for {parent} then answer "0.1".
</QUESTION>

This exact question will be asked to the Deepfunding jury. **Your job is to predict how the Deepfunding jury will answer this question, and answer as close as possible to what the jury will.** To be exact: we will score you based on the mean-squared-error between the log of your answer and the log of the jury's answer.

Your answer must be a positive float.
"""
class RepoHeuristics:
    """Fetch and cache heuristics about GitHub repositories."""

    def __init__(self, github_token=None, cache_ttl=3600):
        """
        Initialize the heuristics fetcher.

        Args:
            github_token: GitHub API token (optional but recommended to avoid rate limits)
            cache_ttl: Time in seconds to cache repository data
        """
        self.github_token = github_token or os.environ.get("GITHUB_TOKEN")
        self.headers = (
            {"Authorization": f"token {self.github_token}"} if self.github_token else {}
        )
        self.cache_ttl = cache_ttl
        self.repo_cache = {}

    @staticmethod
    def parse_github_url(repo_url):
        """Extract owner and repo name from a GitHub URL."""
        if not repo_url or "github.com" not in repo_url:
            return None, None

        parsed = urlparse(repo_url)
        path_parts = parsed.path.strip("/").split("/")

        if len(path_parts) < 2:
            return None, None

        return path_parts[0], path_parts[1]

    @lru_cache(maxsize=128)
    def get_repo_info(self, repo_url):
        """
        Fetch detailed information about a repository.

        Args:
            repo_url: Full URL to the GitHub repository

        Returns:
            Dictionary with repository information or None on failure
        """
        owner, repo = self.parse_github_url(repo_url)
        if not owner or not repo:
            return None

        cache_key = f"{owner}/{repo}"
        if cache_key in self.repo_cache:
            cached_data, timestamp = self.repo_cache[cache_key]
            if time.time() - timestamp < self.cache_ttl:
                return cached_data

        try:
            repo_api_url = f"https://api.github.com/repos/{owner}/{repo}"
            response = requests.get(repo_api_url, headers=self.headers)
            response.raise_for_status()
            repo_data = response.json()

            contributors_url = f"{repo_api_url}/contributors?per_page=5"
            contributors_resp = requests.get(contributors_url, headers=self.headers)
            contributors_resp.raise_for_status()
            contributors = contributors_resp.json()

            languages_url = f"{repo_api_url}/languages"
            languages_resp = requests.get(languages_url, headers=self.headers)
            languages_resp.raise_for_status()
            languages = languages_resp.json()

            repo_info = {
                "name": repo_data.get("name"),
                "full_name": repo_data.get("full_name"),
                "description": repo_data.get("description"),
                "stars": repo_data.get("stargazers_count", 0),
                "forks": repo_data.get("forks_count", 0),
                "watchers": repo_data.get("watchers_count", 0),
                "open_issues": repo_data.get("open_issues_count", 0),
                "created_at": repo_data.get("created_at"),
                "updated_at": repo_data.get("updated_at"),
                "contributors": [c.get("login") for c in contributors[:5]],
                "languages": languages,
                "homepage": repo_data.get("homepage"),
                "license": repo_data.get("license", {}).get("name")
                if repo_data.get("license")
                else None,
                "topics": repo_data.get("topics", []),
                "size": repo_data.get("size", 0),
            }

            self.repo_cache[cache_key] = (repo_info, time.time())

            return repo_info

        except Exception as e:
            print(f"Error fetching data for {repo_url}: {e}")
            return None

    def format_repo_heuristics(self, repo_url):
        """Format repository information as a readable string."""
        repo_info = self.get_repo_info(repo_url)
        if not repo_info:
            return f"Repository {repo_url} information not available."

        total_bytes = sum(repo_info["languages"].values()) if repo_info["languages"] else 1
        language_percentages = {
            lang: f"{count / total_bytes * 100:.1f}%"
            for lang, count in repo_info["languages"].items()
        }

        created_date = (
            datetime.strptime(repo_info["created_at"], "%Y-%m-%dT%H:%M:%SZ")
            if repo_info["created_at"]
            else None
        )
        age = (datetime.now() - created_date).days // 30 if created_date else "unknown"

        main_languages = ", ".join(
            f"{lang} ({pct})" for lang, pct in list(language_percentages.items())[:3]
        )

        info_string = f"""[{repo_info['full_name']}]:
- Description: {repo_info['description'] or 'No description'}
- Stars: {repo_info['stars']}, Forks: {repo_info['forks']}
- Age: {age} months, Last updated: {repo_info['updated_at'].split('T')[0] if repo_info['updated_at'] else 'Unknown'}
- Main languages: {main_languages}
- Top contributors: {', '.join(repo_info['contributors']) if repo_info['contributors'] else 'Unknown'}
- Topics: {', '.join(repo_info['topics']) if repo_info['topics'] else 'None'}
"""
        return info_string

    def compare_repos(self, repo1_url, repo2_url):
        """Compare two repositories and return a formatted comparison string."""
        repo1_info = self.get_repo_info(repo1_url)
        repo2_info = self.get_repo_info(repo2_url)

        if not repo1_info or not repo2_info:
            return "Comparison information not available for one or both repositories."

        # Clamp star counts to at least 1 to avoid division by zero.
        star_ratio = max(1, repo2_info["stars"]) / max(1, repo1_info["stars"])
        if star_ratio > 1.5:
            star_comparison = f"{repo2_info['full_name']} has {star_ratio:.1f}x more stars than {repo1_info['full_name']}"
        elif star_ratio < 0.67:
            star_comparison = f"{repo1_info['full_name']} has {(1 / star_ratio):.1f}x more stars than {repo2_info['full_name']}"
        else:
            star_comparison = f"Both repositories have similar star counts ({repo1_info['stars']} vs {repo2_info['stars']})"

        try:
            repo1_date = (
                datetime.strptime(repo1_info["updated_at"], "%Y-%m-%dT%H:%M:%SZ")
                if repo1_info["updated_at"]
                else None
            )
            repo2_date = (
                datetime.strptime(repo2_info["updated_at"], "%Y-%m-%dT%H:%M:%SZ")
                if repo2_info["updated_at"]
                else None
            )

            if repo1_date and repo2_date:
                date_diff = abs((repo1_date - repo2_date).days)
                if date_diff > 30:
                    more_recent = f"{repo1_info['full_name'] if repo1_date > repo2_date else repo2_info['full_name']} has been updated more recently"
                else:
                    more_recent = "Both repositories have been updated recently"
            else:
                more_recent = "Update information not available for comparison"
        except Exception:
            more_recent = "Error comparing dates"

        comparison = f"""
1. {repo1_info['full_name']} vs 2. {repo2_info['full_name']}
- Star comparison: {star_comparison}
- Activity: {more_recent}
- Age: {repo1_info['created_at'].split('T')[0] if repo1_info['created_at'] else 'Unknown'} vs {repo2_info['created_at'].split('T')[0] if repo2_info['created_at'] else 'Unknown'}
- Languages:
  • {repo1_info['full_name']}: {', '.join(list(repo1_info['languages'].keys())[:3]) if repo1_info['languages'] else 'Unknown'}
  • {repo2_info['full_name']}: {', '.join(list(repo2_info['languages'].keys())[:3]) if repo2_info['languages'] else 'Unknown'}
"""
        return comparison


repo_heuristics = RepoHeuristics()
def _format_repo_name(repo: str):
    """Format a repository name for display."""
    return (
        repo.replace("https://", "")
        .replace("http://", "")
        .replace("www.", "")
        .replace("github.com", "")
        .strip("/")
    )


def format_l1_prompt(repo1, repo2):
    prompt = PROMPT_INTRO + L1_PROMPT_BASE.format(
        repo1=_format_repo_name(repo1), repo2=_format_repo_name(repo2)
    )

    if INCLUDE_REPO_HEURISTICS:
        try:
            comparison = repo_heuristics.compare_repos(repo1, repo2)
            prompt += "Repository Comparison:\n" + comparison
        except Exception as e:
            print(
                f"Warning: Could not fetch repository comparison for {repo1} and {repo2}: {e}"
            )

    prompt += LONG_JUROR_LIST if USE_LONG_JUROR_LIST else SHORT_JUROR_LIST
    prompt += (
        TRAIN_SET_EXAMPLES_PROMPT + "\n\n" + L1_TRAIN_SET_EXAMPLES
        if L1_TRAIN_SET_EXAMPLES
        else ""
    )
    return prompt


def format_l2_prompt(repo):
    prompt = PROMPT_INTRO + L2_PROMPT_BASE.format(repo=_format_repo_name(repo))

    if INCLUDE_REPO_HEURISTICS:
        try:
            repo_info = repo_heuristics.format_repo_heuristics(repo)
            prompt += "Repository Information:\n" + repo_info
        except Exception as e:
            print(f"Warning: Could not fetch repository information for {repo}: {e}")

    prompt += LONG_JUROR_LIST if USE_LONG_JUROR_LIST else SHORT_JUROR_LIST
    prompt += (
        TRAIN_SET_EXAMPLES_PROMPT + "\n\n" + L2_TRAIN_SET_EXAMPLES
        if L2_TRAIN_SET_EXAMPLES
        else ""
    )
    return prompt


def format_l3_prompt(repo1, repo2, parent):
    prompt = PROMPT_INTRO + L3_PROMPT_BASE.format(
        repo1=_format_repo_name(repo1),
        repo2=_format_repo_name(repo2),
        parent=_format_repo_name(parent),
    )

    if INCLUDE_REPO_HEURISTICS:
        try:
            parent_info = repo_heuristics.format_repo_heuristics(parent)
            prompt += "Parent Repository Information:\n" + parent_info

            comparison = repo_heuristics.compare_repos(repo1, repo2)
            prompt += "\nDependency Comparison:\n" + comparison
        except Exception as e:
            print(f"Warning: Could not fetch repository information: {e}")

    prompt += LONG_JUROR_LIST if USE_LONG_JUROR_LIST else SHORT_JUROR_LIST
    prompt += (
        TRAIN_SET_EXAMPLES_PROMPT + "\n\n" + L3_TRAIN_SET_EXAMPLES
        if L3_TRAIN_SET_EXAMPLES
        else ""
    )
    return prompt
def load_csv(filepath):
    """Load CSV file into a pandas DataFrame."""
    try:
        df = pd.read_csv(filepath, skipinitialspace=True)
        return df
    except Exception as e:
        raise Exception(f"Error loading CSV file: {e}")


def extract_important_repos(df):
    """Extract the list of important repositories from rows with ethereum as parent."""
    important_repos = list(df[df["parent"] == "ethereum"]["repo"])
    return important_repos


def validate_important_repos(df, important_repos):
    """Perform validations on the list of important repositories."""
    if len(important_repos) != 35:
        raise ValueError(
            f"Expected 35 important repos, but found {len(important_repos)}"
        )

    ethereum_children = set(df[df["parent"] == "ethereum"]["repo"].unique())
    originality_children = set(df[df["parent"] == "originality"]["repo"].unique())

    if ethereum_children != originality_children:
        diff = ethereum_children.symmetric_difference(originality_children)
        raise ValueError(f"Mismatch between ethereum and originality lists: {diff}")

    middle_sections = df[~df["parent"].isin(["ethereum", "originality"])]
    middle_parents = set(middle_sections["parent"].unique())

    if not middle_parents.issubset(set(important_repos)):
        invalid_parents = middle_parents - set(important_repos)
        raise ValueError(
            f"Found items in middle column that are not in important_repos: {invalid_parents}"
        )

    return True


def calculate_dependency_weights(df, important_repos):
    """
    Calculate total weights of dependencies for each important repo
    and validate they sum to 1.
    """
    dependency_weights = {}
    dependency_counts = {}

    for repo in important_repos:
        deps = df[
            (df["parent"] == repo) & ~df["repo"].isin(["ethereum", "originality"])
        ]

        total_weight = deps["weight"].sum()
        dependency_weights[repo] = total_weight
        dependency_counts[repo] = len(deps)

        if len(deps) > 0 and not np.isclose(total_weight, 1.0, atol=1e-6):
            raise ValueError(f"Weights for {repo} sum to {total_weight}, not 1.0")

    return dependency_weights, dependency_counts


def calculate_combinations(n):
    """Calculate n choose 2, which is n*(n-1)/2."""
    return n * (n - 1) / 2
def get_repo_classification_weights(df, important_repos):
    """Get the ethereum and originality weights for each important repo."""
    ethereum_weights = dict(
        zip(
            df[df["parent"] == "ethereum"]["repo"],
            df[df["parent"] == "ethereum"]["weight"],
        )
    )
    originality_weights = dict(
        zip(
            df[df["parent"] == "originality"]["repo"],
            df[df["parent"] == "originality"]["weight"],
        )
    )
    return ethereum_weights, originality_weights


def compile_output_csv(
    important_repos,
    dependency_weights,
    dependency_counts,
    ethereum_weights,
    originality_weights,
    output_file,
):
    """Create the output CSV file with the required columns."""
    combinations = {}
    total_combinations = 0

    for repo in important_repos:
        count = dependency_counts.get(repo, 0)
        comb = calculate_combinations(count)
        combinations[repo] = comb
        total_combinations += comb

    with open(output_file, "w", newline="") as csvfile:
        fieldnames = [
            "important_repo",
            "sum_dep_weights",
            "num_deps",
            "num_deps_combinations",
            "originality",
            "ethereum",
        ]
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)

        writer.writeheader()
        for repo in important_repos:
            writer.writerow(
                {
                    "important_repo": repo,
                    "sum_dep_weights": dependency_weights.get(repo, 0),
                    "num_deps": dependency_counts.get(repo, 0),
                    "num_deps_combinations": combinations.get(repo, 0),
                    "originality": originality_weights.get(repo, 0),
                    "ethereum": ethereum_weights.get(repo, 0),
                }
            )

    return total_combinations
def generate_questions(df, important_repos, questions_file):
    """
    Generate three lists of questions based on the data and write them
    to a JSONL file:

    - Level 1: Questions about consecutive pairs of important repos
    - Level 2: Questions about each important repo's originality
    - Level 3: Questions about consecutive pairs of dependencies for each important repo

    Args:
        df: DataFrame containing the CSV data
        important_repos: List of important repositories
        questions_file: Path to the output JSONL file

    Returns:
        Tuple of (level1_count, level2_count, level3_count) indicating
        the number of questions generated
    """
    with open(questions_file, "w") as jsonl_file:
        level1_count = _generate_and_write_level1_questions(important_repos, jsonl_file)
        level2_count = _generate_and_write_level2_questions(important_repos, jsonl_file)
        level3_count = _generate_and_write_level3_questions(
            df, important_repos, jsonl_file
        )

    return level1_count, level2_count, level3_count


def _generate_and_write_level1_questions(important_repos, jsonl_file):
    """
    Generate and write questions for consecutive pairs of important repos.

    Args:
        important_repos: List of important repositories
        jsonl_file: File handle for the output JSONL file

    Returns:
        Number of questions generated
    """
    count = 0

    for i in range(len(important_repos)):
        repo1 = important_repos[i]
        repo2 = important_repos[(i + 1) % len(important_repos)]  # Wrap around to start

        content = format_l1_prompt(repo1, repo2)

        question = {
            "level": 1,
            "repo1": repo1,
            "repo2": repo2,
            "parent": "ethereum",
            "content": content,
        }

        json.dump(question, jsonl_file)
        jsonl_file.write("\n")
        count += 1

    return count


def _generate_and_write_level2_questions(important_repos, jsonl_file):
    """
    Generate and write questions about each important repo's originality.

    Args:
        important_repos: List of important repositories
        jsonl_file: File handle for the output JSONL file

    Returns:
        Number of questions generated
    """
    count = 0

    for repo in important_repos:
        content = format_l2_prompt(repo)

        question = {
            "level": 2,
            "repo": repo,
            "parent": "originality",
            "content": content,
        }

        json.dump(question, jsonl_file)
        jsonl_file.write("\n")
        count += 1

    return count


def _generate_and_write_level3_questions(df, important_repos, jsonl_file):
    """
    Generate and write questions about consecutive pairs of dependencies
    for each important repo.

    Args:
        df: DataFrame containing the CSV data
        important_repos: List of important repositories
        jsonl_file: File handle for the output JSONL file

    Returns:
        Number of questions generated
    """
    count = 0

    middle_rows = df[~df["parent"].isin(["ethereum", "originality"])]

    for parent_repo in important_repos:
        dependencies = middle_rows[middle_rows["parent"] == parent_repo][
            "repo"
        ].tolist()

        if len(dependencies) > 1:  # Only if there are at least 2 dependencies
            for i in range(len(dependencies)):
                repo1 = dependencies[i]
                repo2 = dependencies[
                    (i + 1) % len(dependencies)
                ]  # Wrap around to start

                content = format_l3_prompt(repo1=repo1, repo2=repo2, parent=parent_repo)

                question = {
                    "level": 3,
                    "repo1": repo1,
                    "repo2": repo2,
                    "parent": parent_repo,
                    "content": content,
                }

                json.dump(question, jsonl_file)
                jsonl_file.write("\n")
                count += 1

    return count
def main(input_file, output_file, questions_file=None):
    """Main function to process the CSV file and optionally generate questions."""
    try:
        df = load_csv(input_file)
        important_repos = extract_important_repos(df)
        validate_important_repos(df, important_repos)

        dependency_weights, dependency_counts = calculate_dependency_weights(
            df, important_repos
        )
        ethereum_weights, originality_weights = get_repo_classification_weights(
            df, important_repos
        )

        total_combinations = compile_output_csv(
            important_repos,
            dependency_weights,
            dependency_counts,
            ethereum_weights,
            originality_weights,
            output_file,
        )

        print(f"Successfully processed {input_file} and created {output_file}")
        print(
            f"Found {len(important_repos)} important repositories with valid weight distributions."
        )
        print(
            f"Total number of dependency pairs (num_deps_combinations): {total_combinations}"
        )

        if questions_file:
            level1_count, level2_count, level3_count = generate_questions(
                df, important_repos, questions_file
            )

            print(f"\nGenerated and wrote to {questions_file}:")
            print(f"- {level1_count} level 1 questions (important repo comparisons)")
            print(f"- {level2_count} level 2 questions (repo originality)")
            print(f"- {level3_count} level 3 questions (dependency comparisons)")
            print(f"Total: {level1_count + level2_count + level3_count} questions")

    except Exception as e:
        print(f"Error: {e}")
        raise


if __name__ == "__main__":
    import argparse

    parser = argparse.ArgumentParser(description="Process repo dependency CSV file.")
    parser.add_argument("input_file", help="Path to the input CSV file")
    parser.add_argument(
        "--output_file",
        default="repo_analysis.csv",
        help="Path to the output CSV file (default: repo_analysis.csv)",
    )
    parser.add_argument(
        "--questions_file", help="Path to output JSONL file for generated questions"
    )

    args = parser.parse_args()
    main(args.input_file, args.output_file, args.questions_file)
#+end_src
0.7. appendix: sample questions
Level 1
#+begin_src markdown
This question is for the `Deepfunding competition`, a distillation prediction market for determining the relative importances of different dependencies in the Ethereum ecosystem.

For this question, you will need to estimate the relative importances of two direct dependencies of Ethereum:

<QUESTION> web3/web3.js and prysmaticlabs/prysm are dependencies of Ethereum. Estimate the ratio of importance of prysmaticlabs/prysm to web3/web3.js. E.g. if prysmaticlabs/prysm is 10 times more important than web3/web3.js then answer "10"; if web3/web3.js is 10 times more important than prysmaticlabs/prysm then answer "0.1". </QUESTION>

This exact question will be asked to the Deepfunding jury. **Your job is to predict how the Deepfunding jury will answer this question, and answer as close as possible to what the jury will.** To be exact: we will score you based on the mean-squared-error between the log of your answer and the log of the jury's answer.

Your answer must be a positive float.

Repository Comparison:
1. web3/web3.js vs 2. prysmaticlabs/prysm
- Star comparison: web3/web3.js has 5.5x more stars than prysmaticlabs/prysm
- Activity: Both repositories have been updated recently
- Age: 2014-09-30 vs 2018-01-11
- Languages:
  • web3/web3.js: TypeScript, JavaScript, Shell
  • prysmaticlabs/prysm: Go, Starlark, Shell

The jurors are expected to be experts in the Ethereum ecosystem. Some known names include: Jason, Toni Wahrstatter, Ladislaus, Vitalik Buterin, DC Builder, Vectorized and Marius Van Der.
#+end_src
Level 2
#+begin_src markdown
This question is for the `Deepfunding competition`, a distillation prediction market for determining the relative importances of different dependencies in the Ethereum ecosystem.
For this question, you will be given a repository and you need to estimate how much of its value belongs to that repository itself, versus its dependencies.
<QUESTION> How much of vyperlang/vyper’s value comes from itself, versus its dependencies? E.g.
1. 0.2 – The project is largely a fork or wrapper of something else; it does less original work relative to the work in its dependencies. Examples: Brave (a fork of Chromium), Ollama (a wrapper of llama.cpp).
2. 0.5 – The project is heavily dependent on its dependencies but also has substantial original work. Example: An Ethereum wallet.
3. 0.8 – The project is mostly original work and depends only on generic libraries; it could likely have been built without those dependencies if necessary.
</QUESTION>
This exact question will be asked to the Deepfunding jury. Your job is to predict how the Deepfunding jury will answer this question, and answer as close as possible to what the jury will. To be exact: we will score you based on the mean-squared-error between your answer and the jury’s answer.
Your answer must be a float between 0 and 1.
Repository Information: [vyperlang/vyper]:
- Description: Pythonic Smart Contract Language for the EVM
- Stars: 4995, Forks: 837
- Age: 101 months, Last updated: 2025-03-17
- Main languages: Python (99.8%), Makefile (0.1%), Batchfile (0.1%)
- Top contributors: jacqueswww, charles-cooper, iamdefinitelyahuman, fubuloubu, DavidKnott
- Topics: ethereum, ethereum-dapp, language, python, vyper
The jurors are expected to be experts in the Ethereum ecosystem. Some known names include: Jason, Toni Wahrstatter, Ladislaus, Vitalik Buterin, DC Builder, Vectorized and Marius Van Der.
#+end_src
Level 3
#+begin_src markdown
This question is for the `Deepfunding competition`, a distillation prediction market for determining the relative importances of different dependencies in the Ethereum ecosystem.

For this question, we are looking at the prysmaticlabs/prysm repository. You will need to estimate the relative importances of two dependencies of this repository -- i.e. which of their dependencies matters more for prysmaticlabs/prysm.

<QUESTION> coreos/go-systemd and herumi/bls-eth-go-binary are dependencies of prysmaticlabs/prysm. Estimate the ratio of importance of herumi/bls-eth-go-binary compared to coreos/go-systemd for prysmaticlabs/prysm. E.g. if herumi/bls-eth-go-binary is 10 times more important than coreos/go-systemd for prysmaticlabs/prysm then answer "10"; if coreos/go-systemd is 10 times more important than herumi/bls-eth-go-binary then answer "0.1". </QUESTION>

This exact question will be asked to the Deepfunding jury. **Your job is to predict how the Deepfunding jury will answer this question, and answer as close as possible to what the jury will.** To be exact: we will score you based on the mean-squared-error between the log of your answer and the log of the jury's answer.

Your answer must be a positive float.

Parent Repository Information:
[prysmaticlabs/prysm]:
- Description: Go implementation of Ethereum proof of stake
- Stars: 3546, Forks: 1090
- Age: 87 months, Last updated: 2025-03-17
- Main languages: Go (93.6%), Starlark (5.5%), Shell (0.5%)
- Top contributors: terencechain, prestonvanloon, rauljordan, nisdas, rkapka
- Topics: ethereum

Dependency Comparison:
1. coreos/go-systemd vs 2. herumi/bls-eth-go-binary
- Star comparison: coreos/go-systemd has 36.6x more stars than herumi/bls-eth-go-binary
- Activity: Both repositories have been updated recently
- Age: 2013-09-13 vs 2019-10-19
- Languages:
  • coreos/go-systemd: Go, Shell
  • herumi/bls-eth-go-binary: Go, C, C++

The jurors are expected to be experts in the Ethereum ecosystem. Some known names include: Jason, Toni Wahrstatter, Ladislaus, Vitalik Buterin, DC Builder, Vectorized and Marius Van Der.
#+end_src
0.8. appendix: helper csv
just the result of a summarization script on their sample submission file — the main importance is in the num_deps (\(n_A\) for each \(A\)) and num_deps_combinations (\(n_A(n_A-1)/2\)) columns.
important_repo,sum_dep_weights,num_deps,num_deps_combinations,originality,ethereum
https://github.com/prysmaticlabs/prysm,0.9999999999999936,245,29890.0,0.8650592053492349,0.0294117647058823
https://github.com/ethereum/fe,0.9999999999999927,301,45150.0,0.9443691638516738,0.0294117647058823
https://github.com/ethereum/remix-project,0.9999999999999305,2277,2591226.0,0.7100532015645752,0.0294117647058823
https://github.com/eth-infinitism/account-abstraction,0.999999999999991,863,371953.0,0.4286456194839025,0.0294117647058823
https://github.com/wevm/viem,0.9999999999999376,725,262450.0,0.1450951555623104,0.0294117647058823
https://github.com/nethereum/nethereum,0.9999999999999997,57,1596.0,0.4018369295764015,0.0294117647058823
https://github.com/ethers-io/ethers.js,0.9999999999999998,138,9453.0,0.0095957100544701,0.0294117647058823
https://github.com/chainsafe/lodestar,0.9999999999999116,1516,1148370.0,0.8811032731861512,0.0294117647058823
https://github.com/ethereum-lists/chains,0.9999999999999997,6,15.0,0.6236088131630412,0.0294117647058823
https://github.com/sigp/lighthouse,0.9999999999999987,464,107416.0,0.9133744057295108,0.0294117647058823
https://github.com/ethereum/py-evm,1.0,11,55.0,0.317338474800942,0.0294117647058823
https://github.com/hyperledger/besu,0.0,0,0.0,0.774361090611847,0.0294117647058823
https://github.com/erigontech/erigon,0.9999999999999813,253,31878.0,0.932572749866548,0.0294117647058823
https://github.com/vyperlang/titanoboa,0.9999999999999989,27,351.0,0.1964900800509441,0.0294117647058823
https://github.com/alloy-rs/alloy,0.9999999999999994,19,171.0,0.3690905969681286,0.0294117647058823
https://github.com/ethereumjs/ethereumjs-monorepo,0.9999999999999718,828,342378.0,0.8417883513048304,0.0294117647058823
https://github.com/foundry-rs/foundry,0.9999999999999529,482,115921.0,0.6458766968885356,0.0294117647058823
https://github.com/safe-global/safe-smart-account,0.9999999999999712,538,144453.0,0.6011871268121423,0.0294117647058823
https://github.com/consensys/teku,0.9999999999999998,137,9316.0,0.4685428282978935,0.0294117647058823
https://github.com/grandinetech/grandine,0.9999999999999785,438,95703.0,0.9124435469744914,0.0294117647058823
https://github.com/ethereum/sourcify,0.9999999999999243,908,411778.0,0.4089762481898589,0.0294117647058823
https://github.com/ethereum/solidity,1.0,3,3.0,0.1090974834405934,0.0294117647058823
https://github.com/status-im/nimbus-eth2,0.9999999999999982,104,5356.0,0.417548394392558,0.0294117647058823
https://github.com/openzeppelin/openzeppelin-contracts,0.9999999999999536,562,157641.0,0.3373326583791293,0.0294117647058823
https://github.com/ethereum/web3.py,0.9999999999999996,13,78.0,0.8039938317729571,0.0294117647058823
https://github.com/nethermindeth/nethermind,0.0,0,0.0,0.4171925064865261,0.0294117647058823
https://github.com/apeworx/ape,0.9999999999999994,38,703.0,0.3110356991327095,0.0294117647058823
https://github.com/a16z/helios,0.999999999999944,628,196878.0,0.6740220469622176,0.0294117647058823
https://github.com/paradigmxyz/reth,0.9999999999999601,470,110215.0,0.3837737278955573,0.0294117647058823
https://github.com/scaffold-eth/scaffold-eth-2,0.9999999999999284,859,368511.0,0.688816951981115,0.0294117647058823
https://github.com/vyperlang/vyper,1.0,10,45.0,0.9323242887762672,0.0294117647058823
https://github.com/hyperledger-web3j/web3j,0.0,0,0.0,0.2430654500988898,0.0294117647058823
https://github.com/ethereum/go-ethereum,0.9999999999999986,116,6670.0,0.8467503069554304,0.0294117647058823
https://github.com/nomicfoundation/hardhat,0.9999999999999869,1891,1786995.0,0.5435417307291117,0.0294117647058823
Summing num_deps_combinations:

Total number of dependency pairs (num_deps_combinations): 8352618.0
\[\frac{34\cdot(34-1)}2+34+8352618=8353774\]
Footnotes:

[1] For some reason, only 4381 such nodes and 10075 edges are present in the visualization graph.

[2] I checked this is the case with the sample submission.

[3] i.e. it is the ground truth for the logits of your weights; we take the MSE of your logits against this.

[4] I'm not really sure of the exact format this data is stored in. The competition instructions state it is stored as `https://github.com/b1,https://github.com/b2,advantage_b_over_a`, but this doesn't make sense, as it must also include \(A\), since multiple projects \(A\) can have the same dependencies \(B_1,B_2\). In general I don't know where to find the train and public test datasets mentioned in the competition instructions.

[5] BLEG: I'm not sure how this is currently being done, so leaving it abstract — is it just the average weighted by past peer scores earned?