problem formalization

(notation: to match the graph on the deepfunding website, all arrows are in the direction of dependency, i.e. P → Q means P depends on Q.)

We have a tree (well, a DAG) of depth exactly 2.

- Depth 0 is a single node, ethereum; call this $O$.
- Depth 1 is 34 nodes, the "seed repositories": $A_1, \dots, A_{34}$.
- Depth 2 is the ~5000 code dependencies of the seed repositories, $B_1, \dots, B_{4381}$, with a total of ~15000 edges of the form $(A_i, B_j)$.[^1]

The task of a contestant is to provide:

- Level 1: weights $w_{O \to A}$, denoted like https://github.com/a,ethereum,0.2 such that $\sum_A w_{O \to A} = 1$ [34 outputs]
- Level 2: self-weights $w_A$, denoted like https://github.com/a,originality,0.6 [34 outputs]
- Level 3: weights $w_{A \to B}$, denoted like https://github.com/b,https://github.com/a,0.6 such that the sum over $A$'s dependencies is $\sum_B w_{A \to B} = 1$ (note that they add up to 1 and not to $1 - w_A$)[^2] [~15000 outputs]

A juror is given random samples of:

- pairs of edges $((OA_1),(OA_2))$, for which they give the "relative advantage of $A_2$ over $A_1$ to $O$", $j_{(OA_1),(OA_2)}$ (taken[^3] as measured in logits).
- depth-1 nodes $A$, for which they directly estimate the originality score $j_A$.
- pairs of edges $((AB_1),(AB_2))$, for which they give the "relative advantage of $B_2$ over $B_1$ to $A$", $j_{(AB_1),(AB_2)}$.[^4]

To give values for all of these would take, where $n_A$ is the number of dependencies of $A$:

$$\frac{34\cdot(34-1)}{2}+34+\sum_{A}\frac{n_A(n_A-1)}{2}$$

I ran a quick script to calculate this from the sample submissions (see "helper csv" below), and it comes out to 8,353,774. In particular, this means we would need 8,353,774 events if we directly implemented a distillation market. So we need something smarter: ideally something that doesn't require more than 15000 questions for the 15000 weights.
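For reference, here is a minimal sketch of that count. It is not the actual script: the function name `total_questions` and the toy dependency counts are made up for illustration, and the real per-repo counts come from the helper csv.

```python
def total_questions(dep_counts, n_seed=34):
    """Count the events needed to ask every possible juror question directly.

    dep_counts: the number of dependencies n_A of each seed repository A.
    n_seed: the number of seed repositories (depth-1 nodes).
    """
    level1 = n_seed * (n_seed - 1) // 2                   # pairs of edges out of O
    level2 = n_seed                                       # one originality score per A
    level3 = sum(n * (n - 1) // 2 for n in dep_counts)    # pairs of edges out of each A
    return level1 + level2 + level3


# Toy graph (hypothetical numbers): 3 seed repos with 2, 3 and 4 dependencies.
print(total_questions([2, 3, 4], n_seed=3))  # 3 + 3 + (1 + 3 + 6) = 16
```

The Level-3 term dominates because it grows quadratically in $n_A$, which is why a single repo with thousands of dependencies pushes the total into the millions.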

deepfunding scoring

First let me quickly go over how Deepfunding scores the contestants.

The cost for Level-1 answers is $\left|\log(w_{O \to A_2}/w_{O \to A_1}) - j_{(OA_1),(OA_2)}\right|^2$, summed over all pairs $((OA_1),(OA_2))$ for which the juror has provided an estimate.

The cost for Level-2 answers is simply $|w_A - j_A|^2$, summed over all $A$ for which the juror has provided an estimate.

The cost for Level-3 answers is again $\left|\log(w_{A \to B_2}/w_{A \to B_1}) - j_{(AB_1),(AB_2)}\right|^2$, summed over all pairs $((AB_1),(AB_2))$ for which the juror has provided an estimate.
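A minimal sketch of these three costs as code, assuming a plain unweighted sum over all juror samples (how the levels are actually weighted against each other is an open question). The function names and the dict/tuple layouts are hypothetical, not Deepfunding's.

```python
import math


def pair_cost(w_first, w_second, j):
    """Level-1 / Level-3 cost: squared error between log(w_second / w_first) and the juror logit j."""
    return (math.log(w_second / w_first) - j) ** 2


def originality_cost(w, j):
    """Level-2 cost: squared error between the self-weight w_A and the juror score j_A."""
    return (w - j) ** 2


def total_cost(weights, originality, pair_samples, originality_samples):
    """Unweighted sum of all sample costs (the weighting is an assumption).

    weights:             {(parent, child): weight}, covering both O -> A and A -> B edges
    originality:         {repo: self-weight w_A}
    pair_samples:        [(parent, child1, child2, j)], where j is the advantage of child2 over child1
    originality_samples: [(repo, j)]
    """
    cost = 0.0
    for parent, c1, c2, j in pair_samples:
        cost += pair_cost(weights[(parent, c1)], weights[(parent, c2)], j)
    for repo, j in originality_samples:
        cost += originality_cost(originality[repo], j)
    return cost


# Toy example with made-up repos "a" and "b".
weights = {("ethereum", "a"): 0.2, ("ethereum", "b"): 0.8}
originality = {"a": 0.6}
cost = total_cost(
    weights,
    originality,
    pair_samples=[("ethereum", "a", "b", math.log(4))],  # juror: b is 4x as important as a
    originality_samples=[("a", 0.5)],
)
print(round(cost, 6))  # the pair matches exactly, so only (0.6 - 0.5)^2 remains
```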

BLEG: I'm not sure how these are to be weighted. The concrete instructions just say they are summed over all juror samples, but this depends on how many juror samples are taken of each category — this is probably important if we want to weight questions properly. We should ask them, or maybe we can just create another event to ask the market what it thinks they will do :)

This covers how Deepfunding will score our final model submission. As we will see, this does not necessarily straightforwardly translate to how we score miners in preparing our model.

scoring for level 2

Level-2 is straightforward — we simply create a question for each A (depth-1 node) asking "how original is A?" and score as:

$$s(w_A) = |w_A - j_A|^2$$

if $j_A$ is elicited, and 0 otherwise. Then perform the peer score adjustment (otherwise the miners are incentivized to just not bet).

scoring for level 1 and level 3

The basic problem for scoring Level-1 and Level-3 questions is:

Three possible solutions:

straightforward approach

Here's one idea: for Level-3 we create one event per $A$, i.e. per depth-1 node (and for Level-1 we analogously create just one event for $O$). To forecast this event the miner reports a vector $w_A \in [0,1]^{n_A}$ such that $\sum_B w_{A \to B} = 1$, i.e. weights for all of $A$'s dependencies, and this answer is scored in a special way (rather than just as a standard continuous random variable):

$$s(w_A) = \sum \left|\log(w_{A \to B_2}/w_{A \to B_1}) - j_{(AB_1),(AB_2)}\right|^2$$

where the sum is over pairs $(AB_1, AB_2)$ for which the jury ultimately gives an answer.

(and again just make the peer score adjustment as per normal)

This way we have just 34 events for Level-3 and 1 event for Level-1 (in addition to the 34 events for Level-2).

Key questions, aka potential problems with tiny numbers:

- The answers to these questions might be very high-dimensional vectors: the lowest is 6, the highest is 2277 (full numbers in helper csv section). Could this be an issue? Can LLM-based miners even meaningfully fit so much in their context window (I mean they can but like, usefully)?
- How "advanced" are the miners on the network right now? If they're all just simple LLM callers (without any domain-specific engineering) they would probably have a problem with this task, e.g. even to produce such tiny numbers. This would also be a problem for the Shapley values solution, actually.

I mean, I can imagine making a decent miner by e.g. just asking an LLM to make relative comparisons and fitting some model to it, but if the miners on the network do not do such things, it would be no use. Wait, actually that gives me an idea, see section "reconstructing from relative comparisons".

shapley values

Here's the idea: we create events for each edge, and score miners based on how useful their weight estimate was to the final cost function. Fortunately calculating Shapley values here is easy, because the cost is independent of any permutations.

For each edge weight $w_{A \to B}$ (and likewise for the 34 edges $w_{O \to A}$) we create an event:

Estimate the relative importance of dependency B to project A as a number between 0 and 1.

{{ description of how miners will be scored, i.e. a more practical summary of this section }}

{{ training set data }}

From all the miner estimates for $w_{A \to B}$ we get a consensus estimate for the edge A → B in the usual way[^5].

[IGNORE THIS. go with either the straightforward approach or the below one]

shapley values from relative comparisons

Ok, here's perhaps the most promising approach: we do give miners pairs of edges. But we don't need to give them all pairs of edges.

For node $A$ with dependencies $B_1, \dots, B_n$, we can just write questions for the $n - 1$ adjacent pairs of edges:

- $c_{(AB_1),(AB_2)}$
- $c_{(AB_2),(AB_3)}$
- …
- $c_{(AB_{n-1}),(AB_n)}$

How much more important is dependency $B_{i+1}$ than $B_i$ to $A$? i.e. estimate $\log(w_{A \to B_{i+1}} / w_{A \to B_i})$

...

… and calculate the implied pairwise comparison for any $c_{(AB_i),(AB_j)}$ (specifically for those edge pairs the jury scores) by simply summing:

$$ c_{(A\to B_i),(A\to B_j)} = \sum_{k=i}^{j-1}c_{(A\to B_k),(A\to B_{k+1})} $$

(for $i < j$; for $i > j$ we can just take the negative of the opposite pair).

We may be tempted to say that life is solved, since we can simply calculate the scores based on these "implied comparisons" … but again, maybe we can't do that, because we want scoring functions to be modular.
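As a sketch of the summation above (the helper names `implied_log_ratio` and `weights_from_adjacent` are hypothetical, and indices are 1-based to match the notation). The second helper is an extra: since the adjacent log-ratios pin the weights down up to a constant factor, cumulative sums plus normalization recover a weight vector that sums to 1.

```python
import math
from itertools import accumulate


def implied_log_ratio(adjacent, i, j):
    """Implied comparison c_{(A->B_i),(A->B_j)} from the n-1 adjacent forecasts.

    adjacent[k - 1] holds c_{(A->B_k),(A->B_{k+1})} for k = 1..n-1.
    """
    if i == j:
        return 0.0
    if i > j:
        # Opposite pair: just flip the sign.
        return -implied_log_ratio(adjacent, j, i)
    # Sum over k = i..j-1, i.e. list positions i-1..j-2.
    return sum(adjacent[i - 1 : j - 1])


def weights_from_adjacent(adjacent):
    """Recover normalized weights (w_{A->B_1}, ..., w_{A->B_n}) from adjacent log-ratios."""
    log_w = [0.0] + list(accumulate(adjacent))   # log-weights relative to B_1
    shift = max(log_w)                            # subtract the max for numerical stability
    unnormalized = [math.exp(x - shift) for x in log_w]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Toy example: B_2 is 2x as important as B_1, and B_3 is 3x as important as B_2.
adjacent = [math.log(2), math.log(3)]
print(math.exp(implied_log_ratio(adjacent, 1, 3)))  # implied ratio of B_3 to B_1, about 6
print(weights_from_adjacent(adjacent))              # proportional to 1 : 2 : 6
```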

Instead, for each $j_{(AB_i),(AB_j)}$ that we receive, we can measure the relative contributions of each $c_{(AB_k),(AB_{k+1})}$ to the cost function. We imagine that before the miners' forecasts, all $c_{(AB_k),(AB_{k+1})}$ were initialized to zero (i.e. a uniform prior). Then these forecasts define a coalitional game, as follows.

Definition: a single miner's forecasts as a coalitional game. The miner's forecast on each $c_{(AB_k),(AB_{k+1})}$ defines a player $k \in \{1, \dots, n-1\}$ in an $(n-1)$-player coalitional game, with a value function on subsets $S \subseteq \{1, \dots, n-1\}$ as follows:

$$v(S) = -\sum_{(i,j)}\left(\left|j_{(AB_i),(AB_j)} - \sum_{k \in S \cap \{i, \dots, j-1\}} c_{(AB_k),(AB_{k+1})}\right|^2 - \left|j_{(AB_i),(AB_j)}\right|^2\right)$$

(where the outer summation is taken over all $(i,j)$ pairs such that $j_{(AB_i),(AB_j)}$ is in the jury sample)

(crucially, we can just take the remaining forecasts in the expression as "external facts of the world", i.e. information known to the validator — so the scoring rule itself is modular.)

This lets us take the Shapley value in the definitional way.
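A brute-force sketch of this game, assuming a single miner and a small $n$: it averages each player's marginal contribution over every ordering of the $n - 1$ players, so it only illustrates the definition and will not scale to nodes with thousands of dependencies. The names `value` and `shapley_values` and the dict layouts are hypothetical.

```python
from itertools import permutations
from math import factorial


def value(S, c, jury):
    """v(S): reduction in cost, relative to the all-zero prior, from using only the forecasts in S.

    S:    set of players k (each k stands for the adjacent pair (AB_k, AB_{k+1}))
    c:    {k: miner's forecast c_{(AB_k),(AB_{k+1})}} for k = 1..n-1
    jury: {(i, j): juror logit j_{(AB_i),(AB_j)}} with i < j
    """
    total = 0.0
    for (i, j), target in jury.items():
        # Forecasts outside S stay at their prior value of zero.
        implied = sum(c[k] for k in range(i, j) if k in S)
        total -= (target - implied) ** 2 - target ** 2
    return total


def shapley_values(c, jury):
    """Average marginal contribution of each player over all orderings of the players."""
    players = sorted(c)
    phi = {k: 0.0 for k in players}
    for order in permutations(players):
        seen = set()
        for k in order:
            before = value(seen, c, jury)
            seen.add(k)
            phi[k] += value(seen, c, jury) - before
    n_orderings = factorial(len(players))
    return {k: total / n_orderings for k, total in phi.items()}


# Toy example: n = 3 dependencies, one jury sample comparing B_3 to B_1.
c = {1: 1.0, 2: 1.0}
jury = {(1, 3): 2.0}
print(shapley_values(c, jury))  # {1: 2.0, 2: 2.0}
```

In the toy example the two forecasts together match the juror exactly, so $v$ of the full coalition is $4$, and by symmetry each forecast is credited with half of it.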

$$ \begin{align*} s(c_{(A\to B_k),(A\to B_{k+1})}) &=\varphi_k(v)\\ &= \frac1{(n-1)!}\sum_R\left[v(\{l\mid l <_R k\}\cup\{k\}) - v(\{l\mid l <_R k\})\right] \end{align*} $$

where $R$ ranges over all orderings of the $n - 1$ players and $l <_R k$ means that $l$ precedes $k$ in $R$ (and then make the peer score adjustment against all other miners).

most relevant resources

appendix: question generation script

import csv
import json
import pandas as pd
import numpy as np

# imports for fetching repo heuristics
import requests
import time
from datetime import datetime
from urllib.parse import urlparse
import os
from dotenv import load_dotenv
from functools import lru_cache

load_dotenv()

INCLUDE_REPO_HEURISTICS = True

if "GITHUB_TOKEN" not in os.environ and INCLUDE_REPO_HEURISTICS:
    print("Warning: GitHub API token not found. You will get rate-limited. You should set INCLUDE_REPO_HEURISTICS to False.")

L1_TRAIN_SET_EXAMPLES = ""
L2_TRAIN_SET_EXAMPLES = ""
L3_TRAIN_SET_EXAMPLES = ""

PROMPT_INTRO = "This question is for the `Deepfunding competition`, a distillation prediction market for determining the relative importances of different dependencies in the Ethereum ecosystem."

USE_LONG_JUROR_LIST = False
SHORT_JUROR_LIST = "\nThe jurors are expected to be experts in the Ethereum ecosystem. Some known names include: Jason, Toni Wahrstatter, Ladislaus, Vitalik Buterin, DC Builder, Vectorized and Marius Van Der."
LONG_JUROR_LIST = """
This is the list of publicly known jurors, if it helps you. The jurors are expected to be experts in the Ethereum ecosystem.


| Juror                 | Nominator         | Votes | Affiliation            | Github   | ENS/Wallet  |
|-----------------------|-------------------|-------|------------------------|----------|-------------|
| Vitalik Buterin       | Invitation        | 10    | EF                     | vbuterin | vitalik.eth |
| Changwu               | Vitalik Buterin   | 0     | imwallet               |          |             |
| Justin Drake          | Vitalik Buterin   | 0     | EF                     |          |             |
| Anton Cheng           | Changwu           | 5     |                        |          |             |
| Nicholas Lin          | Changwu           |       | imwallet               |          |             |
| Toni Wahrstatter      | Justin Drake      | 17    | EF                     |          |             |
| Ladislaus             | Justin Drake      | 11    | EF                     |          |             |
| DC Builder            | Anton Cheng       | 10    | worldcoin              |          |             |
| Vectorized            | Anton Cheng       | 10    |                        |          |             |
| Jason                 | Nicholas Lin      | 32    |                        |          |             |
| Oskar                 | Nicholas Lin      | 2     |                        |          |             |
| Alex Stokes           | Toni Wahrstatter  | 4     | EF                     |          |             |
| Parithosh Jayanti     | Toni Wahrstatter  |       | Nethermind             |          |             |
| Auston Sterling       | Ladislaus         | 3     |                        |          |             |
| Marius Van Der        | Ladislaus         | 10    |                        |          |             |
| Mark Tyneway          | Vectorized        |       | Optimism               |          |             |
| Georgios              | Vectorized        |       | Reth                   |          |             |
| TCZPL                 | Jason             |       |                        |          |             |
| Ambition Chen         | Jason             |       | 35                     |          |             |
| Adrian                | Oskar             |       |                        |          |             |
| Chih Cheng Liang      | Oskar             |       |                        |          |             |
| Matt (lightclient)    | Alex Stokes       |       |                        |          |             |
| Josh Rudolf           | Alex Stokes       |       | EF                     |          |             |
| Mikhail Kalinin       | Paritosh Jayanti  |       | Nethermind             |          |             |
| Marek Morakzynski     | Paritosh Jayanti  |       | Nethermind             |          |             |
| Nixo                  | Auston Sterling   | 7     | EF                     |          |             |
| Logris                | Auston Sterling   |       |                        |          |             |
| Hudson Jameson        | Marius Van Der    |       |                        |          |             |
| Terence Tsao          | Marius Van Der    | 4     | Prysm                  |          |             |
| Jacek                 | Terence Tsao      |       | nimbus                 |          |             |
| Adrian                | Terence Tsao      |       | lighthouse             |          |             |
| Haurog                | Logris            |       |                        |          |             |
| Pooja Ranjan          | Nixo              |       | ethereum cat herders   |          |             |
| Butta                 | Nixo              |       |                        |          |             |
| Tim Beiko             | Pooja Ranjan      |       |                        |          |             |
| Sassal0x              | Pooja Ranjan      |       |                        |          |             |
| G                     | Marek Morakzynski |       |                        |          |             |
| Ahmed                 | Marek Morakzynski |       |                        |          |             |
| Ansgar                | Ahmed             |       |                        |          |             |
| Potuz                 | Ahmed             |       |                        |          |             |
| Preston               | Potuz             |       |                        |          |             |
| Nishant               | Potuz             |       |                        |          |             |
| Felix Lange           | Mikhail Kalinin   |       | go ethereum            |          |             |
| Piper Merriam         | Mikhail Kalinin   | 6     |                        |          |             |
| Janmajaya             | Chih Cheng Liang  |       |                        |          |             |
| Graham                | Ambition Chen     |       |                        |          |             |
| Banri                 | Ambition Chen     |       |                        |          |             |
| adjust                | Banri             |       |                        |          |             |
| yanyanho dapplearning | TCZPL             |       |                        |          |             |
| boge james (weimumu)  | TCZPL             |       |                        |          |             |
| Kelvin                | Mark Tyneway      |       |                        |          |             |
| ml_sudo               | Mark Tyneway      |       |                        |          |             |
| Jason Carver          | Piper Merriam     |       |                        |          |             |
| Redwan Meslem         | Invitation        |       | web3.js                |          |             |
| Richard Moore         | Invitation        |       | ethers.js              |          |             |
| tom                   | Invitation        |       | Viem                   |          |             |
| Patricio              | Invitation        |       | Hardhat                |          |             |
| Andrew                | Invitation        |       | Remix                  |          |             |
| Bryant Eisenbach      | Invitation        |       | Ape                    |          |             |
| benny                 | Invitation        |       | Boa                    |          |             |
| Ligi                  | Invitation        |       | Chainlist              |          |             |
| benny                 | Invitation        |       | Vyper                  |          |             |
| Kaan                  | Invitation        |       | Sourcify               |          |             |
| Austin Griffith       | Invitation        |       | Scaffold-eth (v1 + v2) |          |             |
| Jaydon                | zengjiajun.eth    |       | elytro                 |          |             |
| Joi                   | zengjiajun.eth    |       | elytro                 |          |             |
| Marc                  | Invitation        |       | web3.py                |          |             |
| Felipe                | Invitation        |       | web3.py                |          |             |
| Wesley                | Sky               |       | EF                     |          |             |
"""
# https://research.allo.capital/t/join-the-deep-funding-jury/99

TRAIN_SET_EXAMPLES_PROMPT = (
    # Leading blank line keeps this from running into the juror list that precedes it.
    "\n\nHere are some existing juror answers from the public 'training set'."
)

L1_PROMPT_BASE = """

For this question, you will need to estimate the relative importances of two direct dependencies of Ethereum:

<QUESTION>
{repo1} and {repo2} are dependencies of Ethereum. Estimate the ratio of importance of {repo2} to {repo1}.
E.g. if {repo2} is 10 times more important than {repo1} then answer "10"; if {repo1} is 10 times more important than {repo2} then answer "0.1".
</QUESTION>

This exact question will be asked to the Deepfunding jury.
**Your job is to predict how the Deepfunding jury will answer this question, and answer as close as possible to what the jury will.**
To be exact: we will score you based on the mean-squared-error between the log of your answer and the log of the jury's answer.

Your answer must be a positive float.

"""


L2_PROMPT_BASE = """

For this question, you will be given a repository and you need to estimate how much of its value belongs to that repository itself, versus its dependencies.

<QUESTION>
How much of {repo}'s value comes from itself, versus its dependencies?
E.g.
*   **0.2** – The project is largely a fork or wrapper of something else; it does less original work relative to the work in its dependencies.
    *Examples: Brave (a fork of Chromium), Ollama (a wrapper of llama.cpp).*
*   **0.5** – The project is heavily dependent on its dependencies but also has substantial original work.
    *Example: An Ethereum wallet.*
*   **0.8** – The project is mostly original work and depends only on generic libraries; it could likely have been built without those dependencies if necessary.
</QUESTION>

This exact question will be asked to the Deepfunding jury.
**Your job is to predict how the Deepfunding jury will answer this question, and answer as close as possible to what the jury will.**
To be exact: we will score you based on the mean-squared-error between your answer and the jury's answer.

Your answer must be a float between 0 and 1.

"""

L3_PROMPT_BASE = """

For this question, we are looking at the {parent} repository. You will need to estimate the relative importances of two dependencies of this repository -- i.e. which of their dependencies matters more for {parent}.

<QUESTION>
{repo1} and {repo2} are dependencies of {parent}. Estimate the ratio of importance of {repo2} compared to {repo1} for {parent}.
E.g. if {repo2} is 10 times more important than {repo1} for {parent} then answer "10"; if {repo1} is 10 times more important than {repo2} then answer "0.1".
</QUESTION>

This exact question will be asked to the Deepfunding jury.
**Your job is to predict how the Deepfunding jury will answer this question, and answer as close as possible to what the jury will.**
To be exact: we will score you based on the mean-squared-error between the log of your answer and the log of the jury's answer.

Your answer must be a positive float.

"""


class RepoHeuristics:
    """Fetch and cache heuristics about GitHub repositories."""

    def __init__(self, github_token=None, cache_ttl=3600):
        """
        Initialize the heuristics fetcher.

        Args:
            github_token: GitHub API token (optional but recommended to avoid rate limits)
            cache_ttl: Time in seconds to cache repository data
        """
        self.github_token = github_token or os.environ.get('GITHUB_TOKEN')
        self.headers = {'Authorization': f'token {self.github_token}'} if self.github_token else {}
        self.cache_ttl = cache_ttl
        self._repo_cache = {}

    @staticmethod
    def parse_github_url(repo_url):
        """Extract owner and repo name from a GitHub URL."""
        if not repo_url or 'github.com' not in repo_url:
            return None, None

        parsed = urlparse(repo_url)
        path_parts = parsed.path.strip('/').split('/')

        if len(path_parts) < 2:
            return None, None

        return path_parts[0], path_parts[1]

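    # Deliberately not wrapped in functools.lru_cache: that would bypass cache_ttl
    # and would also memoize failed (None) lookups. The TTL cache below is enough.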
    def get_repo_info(self, repo_url):
        """
        Fetch detailed information about a repository.

        Args:
            repo_url: Full URL to the GitHub repository

        Returns:
            Dictionary with repository information or None on failure
        """
        owner, repo = self.parse_github_url(repo_url)
        if not owner or not repo:
            return None

        # Check cache first
        cache_key = f"{owner}/{repo}"
        if cache_key in self._repo_cache:
            cached_data, timestamp = self._repo_cache[cache_key]
            if time.time() - timestamp < self.cache_ttl:
                return cached_data

        # Fetch repository data
        try:
            repo_api_url = f"https://api.github.com/repos/{owner}/{repo}"
            # timeout=10 keeps a stalled connection from hanging the whole script
            response = requests.get(repo_api_url, headers=self.headers, timeout=10)
            response.raise_for_status()
            repo_data = response.json()

            # Fetch contributors
            contributors_url = f"{repo_api_url}/contributors?per_page=5"
            contributors_resp = requests.get(contributors_url, headers=self.headers, timeout=10)
            contributors_resp.raise_for_status()
            contributors = contributors_resp.json()

            # Fetch languages
            languages_url = f"{repo_api_url}/languages"
            languages_resp = requests.get(languages_url, headers=self.headers, timeout=10)
            languages_resp.raise_for_status()
            languages = languages_resp.json()

            # Build complete repo info
            repo_info = {
                'name': repo_data.get('name'),
                'full_name': repo_data.get('full_name'),
                'description': repo_data.get('description'),
                'stars': repo_data.get('stargazers_count', 0),
                'forks': repo_data.get('forks_count', 0),
                'watchers': repo_data.get('watchers_count', 0),
                'open_issues': repo_data.get('open_issues_count', 0),
                'created_at': repo_data.get('created_at'),
                'updated_at': repo_data.get('updated_at'),
                'contributors': [c.get('login') for c in contributors[:5]],
                'languages': languages,
                'homepage': repo_data.get('homepage'),
                'license': repo_data.get('license', {}).get('name') if repo_data.get('license') else None,
                'topics': repo_data.get('topics', []),
                'size': repo_data.get('size', 0)
            }

            # Cache the result
            self._repo_cache[cache_key] = (repo_info, time.time())

            return repo_info

        except Exception as e:
            print(f"Error fetching data for {repo_url}: {e}")
            return None

    def format_repo_heuristics(self, repo_url):
        """Format repository information as a readable string."""
        repo_info = self.get_repo_info(repo_url)
        if not repo_info:
            return f"Repository {repo_url} information not available."

        # Format languages as percentages
        total_bytes = sum(repo_info['languages'].values()) if repo_info['languages'] else 1
        language_percentages = {lang: f"{count/total_bytes*100:.1f}%" 
                               for lang, count in repo_info['languages'].items()}

        # Calculate age
        created_date = datetime.strptime(repo_info['created_at'], "%Y-%m-%dT%H:%M:%SZ") if repo_info['created_at'] else None
        age = (datetime.now() - created_date).days // 30 if created_date else "unknown"

        info_string = f"""[{repo_info['full_name']}]:
- Description: {repo_info['description'] or 'No description'}
- Stars: {repo_info['stars']}, Forks: {repo_info['forks']} 
- Age: {age} months, Last updated: {repo_info['updated_at'].split('T')[0] if repo_info['updated_at'] else 'Unknown'}
- Main languages: {', '.join(f"{lang} ({pct})" for lang, pct in list(language_percentages.items())[:3])}
- Top contributors: {', '.join(repo_info['contributors']) if repo_info['contributors'] else 'Unknown'}
- Topics: {', '.join(repo_info['topics']) if repo_info['topics'] else 'None'}
"""
        return info_string

    def compare_repos(self, repo1_url, repo2_url):
        """Compare two repositories and return a formatted comparison string."""
        repo1_info = self.get_repo_info(repo1_url)
        repo2_info = self.get_repo_info(repo2_url)

        if not repo1_info or not repo2_info:
            return "Comparison information not available for one or both repositories."

        # Compare star counts (max(1, ...) guards against division by zero in both directions)
        stars1, stars2 = repo1_info['stars'], repo2_info['stars']
        star_ratio = stars2 / max(1, stars1)
        if star_ratio > 1.5:
            star_comparison = f"{repo2_info['full_name']} has {star_ratio:.1f}x more stars than {repo1_info['full_name']}"
        elif star_ratio < 0.67:
            # Computed directly rather than as 1/star_ratio, which fails when repo2 has 0 stars
            inverse_ratio = stars1 / max(1, stars2)
            star_comparison = f"{repo1_info['full_name']} has {inverse_ratio:.1f}x more stars than {repo2_info['full_name']}"
        else:
            star_comparison = f"Both repositories have similar star counts ({stars1} vs {stars2})"

        # Compare activity
        try:
            repo1_date = datetime.strptime(repo1_info['updated_at'], "%Y-%m-%dT%H:%M:%SZ") if repo1_info['updated_at'] else None
            repo2_date = datetime.strptime(repo2_info['updated_at'], "%Y-%m-%dT%H:%M:%SZ") if repo2_info['updated_at'] else None

            if repo1_date and repo2_date:
                date_diff = abs((repo1_date - repo2_date).days)
                if date_diff > 30:
                    more_recent = f"{repo1_info['full_name'] if repo1_date > repo2_date else repo2_info['full_name']} has been updated more recently"
                else:
                    more_recent = "Both repositories have been updated recently"
            else:
                more_recent = "Update information not available for comparison"
        except (TypeError, ValueError):
            more_recent = "Error comparing dates"

        comparison = f"""
1. {repo1_info['full_name']} vs 2. {repo2_info['full_name']}
- Star comparison: {star_comparison}
- Activity: {more_recent}
- Age: {repo1_info['created_at'].split('T')[0] if repo1_info['created_at'] else 'Unknown'} vs {repo2_info['created_at'].split('T')[0] if repo2_info['created_at'] else 'Unknown'}
- Languages: 
{repo1_info['full_name']}: {', '.join(list(repo1_info['languages'].keys())[:3]) if repo1_info['languages'] else 'Unknown'}
{repo2_info['full_name']}: {', '.join(list(repo2_info['languages'].keys())[:3]) if repo2_info['languages'] else 'Unknown'}
"""
        return comparison

repo_heuristics = RepoHeuristics()

def _format_repo_name(repo: str):
    """Format a repository name for display."""
    return (
        repo.replace("https://", "")
        .replace("http://", "")
        .replace("www.", "")
        .replace("github.com", "")
        .strip("/")
    )


def format_l1_prompt(repo1, repo2):
    prompt = PROMPT_INTRO + L1_PROMPT_BASE.format(
        repo1=_format_repo_name(repo1), repo2=_format_repo_name(repo2)
    )

    if INCLUDE_REPO_HEURISTICS:
        # Add heuristics about the repositories
        try:
            comparison = repo_heuristics.compare_repos(repo1, repo2)
            prompt += "Repository Comparison:\n" + comparison
        except Exception as e:
            print(f"Warning: Could not fetch repository comparison for {repo1} and {repo2}: {e}")

    prompt += LONG_JUROR_LIST if USE_LONG_JUROR_LIST else SHORT_JUROR_LIST
    prompt += (
        TRAIN_SET_EXAMPLES_PROMPT + "\n\n" + L1_TRAIN_SET_EXAMPLES
        if L1_TRAIN_SET_EXAMPLES
        else ""
    )
    return prompt


def format_l2_prompt(repo):
    prompt = PROMPT_INTRO + L2_PROMPT_BASE.format(repo=_format_repo_name(repo))

    if INCLUDE_REPO_HEURISTICS:
        # Add heuristics about the repository
        try:
            repo_info = repo_heuristics.format_repo_heuristics(repo)
            prompt += "Repository Information:\n" + repo_info
        except Exception as e:
            print(f"Warning: Could not fetch repository information for {repo}: {e}")

    prompt += LONG_JUROR_LIST if USE_LONG_JUROR_LIST else SHORT_JUROR_LIST
    prompt += (
        TRAIN_SET_EXAMPLES_PROMPT + "\n\n" + L2_TRAIN_SET_EXAMPLES
        if L2_TRAIN_SET_EXAMPLES
        else ""
    )
    return prompt


def format_l3_prompt(repo1, repo2, parent):
    prompt = PROMPT_INTRO + L3_PROMPT_BASE.format(
        repo1=_format_repo_name(repo1),
        repo2=_format_repo_name(repo2),
        parent=_format_repo_name(parent),
    )

    if INCLUDE_REPO_HEURISTICS:
        # Add heuristics about the parent repository
        try:
            parent_info = repo_heuristics.format_repo_heuristics(parent)
            prompt += "Parent Repository Information:\n" + parent_info

            # Add comparison of the two dependency repositories
            comparison = repo_heuristics.compare_repos(repo1, repo2)
            prompt += "\nDependency Comparison:\n" + comparison
        except Exception as e:
            print(f"Warning: Could not fetch repository information: {e}")

    prompt += LONG_JUROR_LIST if USE_LONG_JUROR_LIST else SHORT_JUROR_LIST
    prompt += (
        TRAIN_SET_EXAMPLES_PROMPT + "\n\n" + L3_TRAIN_SET_EXAMPLES
        if L3_TRAIN_SET_EXAMPLES
        else ""
    )
    return prompt


def load_csv(file_path):
    """Load CSV file into a pandas DataFrame."""
    try:
        df = pd.read_csv(file_path, skipinitialspace=True)
        return df
    except Exception as e:
        raise RuntimeError(f"Error loading CSV file: {e}") from e


def extract_important_repos(df):
    """Extract the list of important repositories from rows with ethereum as parent."""
    # Get repos that have ethereum as parent (first 35 rows)
    important_repos = list(df[df["parent"] == "ethereum"]["repo"])
    return important_repos


def validate_important_repos(df, important_repos):
    """Perform validations on the list of important repositories."""
    # Validation (a): check there are exactly 35 important repos
    if len(important_repos) != 35:
        raise ValueError(
            f"Expected 35 important repos, but found {len(important_repos)}"
        )

    # Validation (b): Check that repos with ethereum as parent match repos with originality as parent
    ethereum_children = set(df[df["parent"] == "ethereum"]["repo"].unique())
    originality_children = set(df[df["parent"] == "originality"]["repo"].unique())

    if ethereum_children != originality_children:
        diff = ethereum_children.symmetric_difference(originality_children)
        raise ValueError(f"Mismatch between ethereum and originality lists: {diff}")

    # Validation (c): Check that all repos in the middle column (excluding ethereum and originality)
    # are in the important_repos list
    middle_sections = df[~df["parent"].isin(["ethereum", "originality"])]
    middle_parents = set(middle_sections["parent"].unique())

    # Check if any middle parent is not in important_repos
    if not middle_parents.issubset(set(important_repos)):
        invalid_parents = middle_parents - set(important_repos)
        raise ValueError(
            f"Found items in middle column that are not in important_repos: {invalid_parents}"
        )

    return True


def calculate_dependency_weights(df, important_repos):
    """
    Calculate total weights of dependencies for each important repo
    and validate they sum to 1.
    """
    dependency_weights = {}
    dependency_counts = {}

    for repo in important_repos:
        # Get rows where this repo is the parent (excluding ethereum and originality rows)
        deps = df[
            (df["parent"] == repo) & ~df["repo"].isin(["ethereum", "originality"])
        ]

        # Calculate sum of weights
        total_weight = deps["weight"].sum()
        dependency_weights[repo] = total_weight

        # Count number of dependencies
        dependency_counts[repo] = len(deps)

        # Validation (d): Check if weights sum to approximately 1
        if len(deps) > 0 and not np.isclose(total_weight, 1.0, atol=1e-6):
            raise ValueError(f"Weights for {repo} sum to {total_weight}, not 1.0")

    return dependency_weights, dependency_counts


def calculate_combinations(n):
    """Calculate n choose 2, which is n*(n-1)/2"""
    return n * (n - 1) / 2


def get_repo_classification_weights(df, important_repos):
    """Get the ethereum and originality weights for each important repo."""
    ethereum_weights = dict(
        zip(
            df[df["parent"] == "ethereum"]["repo"],
            df[df["parent"] == "ethereum"]["weight"],
        )
    )

    originality_weights = dict(
        zip(
            df[df["parent"] == "originality"]["repo"],
            df[df["parent"] == "originality"]["weight"],
        )
    )

    return ethereum_weights, originality_weights


def compile_output_csv(
    important_repos,
    dependency_weights,
    dependency_counts,
    ethereum_weights,
    originality_weights,
    output_file,
):
    """Create the output CSV file with the required columns."""
    combinations = {}
    total_combinations = 0

    for repo in important_repos:
        count = dependency_counts.get(repo, 0)
        comb = calculate_combinations(count)
        combinations[repo] = comb
        total_combinations += comb

    with open(output_file, "w", newline="") as csvfile:
        fieldnames = [
            "important_repo",
            "sum_dep_weights",
            "num_deps",
            "num_deps_combinations",
            "originality",
            "ethereum",
        ]
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)

        writer.writeheader()
        for repo in important_repos:
            writer.writerow(
                {
                    "important_repo": repo,
                    "sum_dep_weights": dependency_weights.get(repo, 0),
                    "num_deps": dependency_counts.get(repo, 0),
                    "num_deps_combinations": combinations.get(repo, 0),
                    "originality": originality_weights.get(repo, 0),
                    "ethereum": ethereum_weights.get(repo, 0),
                }
            )

    return total_combinations


def generate_questions(df, important_repos, questions_file):
    """
    Generate three lists of questions based on the data and write them to a JSONL file:
    1. Level 1: Questions about consecutive pairs of important repos
    2. Level 2: Questions about each important repo's originality
    3. Level 3: Questions about consecutive pairs of dependencies for each important repo

    Args:
        df: DataFrame containing the CSV data
        important_repos: List of important repositories
        questions_file: Path to the output JSONL file

    Returns:
        Tuple of (level1_count, level2_count, level3_count) indicating the number of questions generated
    """
    # Open the JSONL file for writing
    with open(questions_file, "w") as jsonl_file:
        # Generate and write Level 1 questions (consecutive pairs of important repos)
        level1_count = _generate_and_write_level1_questions(important_repos, jsonl_file)

        # Generate and write Level 2 questions (each important repo's originality)
        level2_count = _generate_and_write_level2_questions(important_repos, jsonl_file)

        # Generate and write Level 3 questions (consecutive pairs of dependencies for each important repo)
        level3_count = _generate_and_write_level3_questions(
            df, important_repos, jsonl_file
        )

    return level1_count, level2_count, level3_count


def _generate_and_write_level1_questions(important_repos, jsonl_file):
    """
    Generate and write questions for consecutive pairs of important repos.

    Args:
        important_repos: List of important repositories
        jsonl_file: File handle for the output JSONL file

    Returns:
        Number of questions generated
    """
    count = 0

    # Loop through each consecutive pair, including the wrap around from last to first
    for i in range(len(important_repos)):
        repo1 = important_repos[i]
        repo2 = important_repos[(i + 1) % len(important_repos)]  # Wrap around to start

        # Format the question
        content = format_l1_prompt(repo1, repo2)

        # Create the question object
        question = {
            "level": 1,
            "repo1": repo1,
            "repo2": repo2, 
            "parent": "ethereum",
            "content": content,
        }

        # Write to JSONL file
        json.dump(question, jsonl_file)
        jsonl_file.write("\n")
        count += 1

    return count


def _generate_and_write_level2_questions(important_repos, jsonl_file):
    """
    Generate and write questions about each important repo's originality.

    Args:
        important_repos: List of important repositories
        jsonl_file: File handle for the output JSONL file

    Returns:
        Number of questions generated
    """
    count = 0

    # Process each important repo
    for repo in important_repos:
        content = format_l2_prompt(repo)

        # Create the question object
        question = {
            "level": 2,
            "repo": repo,
            "parent": "originality",
            "content": content,
        }

        # Write to JSONL file
        json.dump(question, jsonl_file)
        jsonl_file.write("\n")
        count += 1

    return count


def _generate_and_write_level3_questions(df, important_repos, jsonl_file):
    """
    Generate and write questions about consecutive pairs of dependencies for each important repo.

    Args:
        df: DataFrame containing the CSV data
        important_repos: List of important repositories
        jsonl_file: File handle for the output JSONL file

    Returns:
        Number of questions generated
    """
    count = 0

    # Filter for the middle rows (level 3 entries)
    middle_rows = df[~df["parent"].isin(["ethereum", "originality"])]

    # Group by parent repo
    for parent_repo in important_repos:
        # Get dependencies for this parent repo
        dependencies = middle_rows[middle_rows["parent"] == parent_repo][
            "repo"
        ].tolist()

        # Generate questions for consecutive pairs of dependencies
        if len(dependencies) > 1:  # Only if there are at least 2 dependencies
            for i in range(len(dependencies)):
                repo1 = dependencies[i]
                repo2 = dependencies[
                    (i + 1) % len(dependencies)
                ]  # Wrap around to start

                content = format_l3_prompt(repo1=repo1, repo2=repo2, parent=parent_repo)

                # Create the question object
                question = {
                    "level": 3,
                    "repo1": repo1,
                    "repo2": repo2,
                    "parent": parent_repo,
                    "content": content,
                }

                # Write to JSONL file
                json.dump(question, jsonl_file)
                jsonl_file.write("\n")
                count += 1

    return count


def main(input_file, output_file, questions_file=None):
    """Main function to process the CSV file and optionally generate questions."""
    try:
        # Load CSV file
        df = load_csv(input_file)

        # Extract important repos (those with ethereum as parent)
        important_repos = extract_important_repos(df)

        # Validate important repos
        validate_important_repos(df, important_repos)

        # Calculate dependency weights and counts
        dependency_weights, dependency_counts = calculate_dependency_weights(
            df, important_repos
        )

        # Get ethereum and originality weights
        ethereum_weights, originality_weights = get_repo_classification_weights(
            df, important_repos
        )

        # Create output CSV and get total combinations
        total_combinations = compile_output_csv(
            important_repos,
            dependency_weights,
            dependency_counts,
            ethereum_weights,
            originality_weights,
            output_file,
        )

        print(f"Successfully processed {input_file} and created {output_file}")
        print(
            f"Found {len(important_repos)} important repositories with valid weight distributions."
        )
        print(
            f"Total number of dependency pairs (num_deps_combinations): {total_combinations}"
        )

        # Generate questions if questions_file is provided
        if questions_file:
            level1_count, level2_count, level3_count = generate_questions(
                df, important_repos, questions_file
            )

            print(f"\nGenerated and wrote to {questions_file}:")
            print(f"- {level1_count} level 1 questions (important repo comparisons)")
            print(f"- {level2_count} level 2 questions (repo originality)")
            print(f"- {level3_count} level 3 questions (dependency comparisons)")
            print(f"Total: {level1_count + level2_count + level3_count} questions")

    except Exception as e:
        print(f"Error: {e}")
        raise


if __name__ == "__main__":
    import argparse

    parser = argparse.ArgumentParser(description="Process repo dependency CSV file.")
    parser.add_argument("input_file", help="Path to the input CSV file")
    parser.add_argument(
        "--output_file",
        default="repo_analysis.csv",
        help="Path to the output CSV file (default: repo_analysis.csv)",
    )
    parser.add_argument(
        "--questions_file", help="Path to output JSONL file for generated questions"
    )

    args = parser.parse_args()
    main(args.input_file, args.output_file, args.questions_file)
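
The question generator above pairs each item only with its successor (wrapping around at the end) rather than with every other item. A minimal sketch of why that matters, separate from the script itself: the helper name `consecutive_pairs` and the dependency names are made up for illustration.

```python
from itertools import combinations


def consecutive_pairs(items):
    """Pair each item with its successor, wrapping from the last back to the first."""
    n = len(items)
    if n < 2:
        # Same guard as the Level 3 generator: a single item has nothing to compare to.
        return []
    return [(items[i], items[(i + 1) % n]) for i in range(n)]


deps = ["dep-a", "dep-b", "dep-c", "dep-d", "dep-e"]  # hypothetical dependency names

cycle = consecutive_pairs(deps)
all_pairs = list(combinations(deps, 2))

# n questions for the cycle versus n*(n-1)/2 for every unordered pair.
print(len(cycle), len(all_pairs))  # 5 10
print(cycle[-1])  # ('dep-e', 'dep-a') is the wrap-around pair
```

With exactly two items the wrap-around would produce both (a, b) and (b, a); no seed repository in the helper csv below has exactly two dependencies, so this does not come up here.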

appendix: sample questions

Level 1

This question is for the `Deepfunding competition`, a distillation prediction market for determining the relative importances of different dependencies in the Ethereum ecosystem.

For this question, you will need to estimate the relative importances of two direct dependencies of Ethereum:

<QUESTION>
web3/web3.js and prysmaticlabs/prysm are dependencies of Ethereum. Estimate the ratio of importance of prysmaticlabs/prysm to web3/web3.js.
E.g. if prysmaticlabs/prysm is 10 times more important than web3/web3.js then answer "10"; if web3/web3.js is 10 times more important than prysmaticlabs/prysm then answer "0.1".
</QUESTION>

This exact question will be asked to the Deepfunding jury.
**Your job is to predict how the Deepfunding jury will answer this question, and answer as close as possible to what the jury will.**
To be exact: we will score you based on the mean-squared-error between the log of your answer and the log of the jury's answer.

Your answer must be a positive float.

Repository Comparison:

1. web3/web3.js vs 2. prysmaticlabs/prysm
- Star comparison: web3/web3.js has 5.5x more stars than prysmaticlabs/prysm
- Activity: Both repositories have been updated recently
- Age: 2014-09-30 vs 2018-01-11
- Languages: 
  • web3/web3.js: TypeScript, JavaScript, Shell
  • prysmaticlabs/prysm: Go, Starlark, Shell

The jurors are expected to be experts in the Ethereum ecosystem. Some known names include: Jason, Toni Wahrstatter, Ladislaus, Vitalik Buterin, DC Builder, Vectorized and Marius Van Der.

Level 2

This question is for the `Deepfunding competition`, a distillation prediction market for determining the relative importances of different dependencies in the Ethereum ecosystem.

For this question, you will be given a repository and you need to estimate how much of its value belongs to that repository itself, versus its dependencies.

<QUESTION>
How much of vyperlang/vyper's value comes from itself, versus its dependencies?
E.g.
*   **0.2** – The project is largely a fork or wrapper of something else; it does less original work relative to the work in its dependencies.    *Examples: Brave (a fork of Chromium), Ollama (a wrapper of llama.cpp).*
*   **0.5** – The project is heavily dependent on its dependencies but also has substantial original work.    *Example: An Ethereum wallet.*
*   **0.8** – The project is mostly original work and depends only on generic libraries; it could likely have been built without those dependencies if necessary.
</QUESTION>

This exact question will be asked to the Deepfunding jury.
**Your job is to predict how the Deepfunding jury will answer this question, and answer as close as possible to what the jury will.**
To be exact: we will score you based on the mean-squared-error between your answer and the jury's answer.

Your answer must be a float between 0 and 1.

Repository Information:
[vyperlang/vyper]:
- Description: Pythonic Smart Contract Language for the EVM
- Stars: 4995, Forks: 837 
- Age: 101 months, Last updated: 2025-03-17
- Main languages: Python (99.8%), Makefile (0.1%), Batchfile (0.1%)
- Top contributors: jacqueswww, charles-cooper, iamdefinitelyahuman, fubuloubu, DavidKnott
- Topics: ethereum, ethereum-dapp, language, python, vyper

The jurors are expected to be experts in the Ethereum ecosystem. Some known names include: Jason, Toni Wahrstatter, Ladislaus, Vitalik Buterin, DC Builder, Vectorized and Marius Van Der.

Level 3

This question is for the `Deepfunding competition`, a distillation prediction market for determining the relative importances of different dependencies in the Ethereum ecosystem.

For this question, we are looking at the prysmaticlabs/prysm repository. You will need to estimate the relative importances of two dependencies of this repository -- i.e. which of its dependencies matters more for prysmaticlabs/prysm.

<QUESTION>
coreos/go-systemd and herumi/bls-eth-go-binary are dependencies of prysmaticlabs/prysm. Estimate the ratio of importance of herumi/bls-eth-go-binary compared to coreos/go-systemd for prysmaticlabs/prysm.
E.g. if herumi/bls-eth-go-binary is 10 times more important than coreos/go-systemd for prysmaticlabs/prysm then answer "10"; if coreos/go-systemd is 10 times more important than herumi/bls-eth-go-binary then answer "0.1".
</QUESTION>

This exact question will be asked to the Deepfunding jury.
**Your job is to predict how the Deepfunding jury will answer this question, and answer as close as possible to what the jury will.**
To be exact: we will score you based on the mean-squared-error between the log of your answer and the log of the jury's answer.

Your answer must be a positive float.

Parent Repository Information:
[prysmaticlabs/prysm]:
- Description: Go implementation of Ethereum proof of stake
- Stars: 3546, Forks: 1090 
- Age: 87 months, Last updated: 2025-03-17
- Main languages: Go (93.6%), Starlark (5.5%), Shell (0.5%)
- Top contributors: terencechain, prestonvanloon, rauljordan, nisdas, rkapka
- Topics: ethereum

Dependency Comparison:

1. coreos/go-systemd vs 2. herumi/bls-eth-go-binary
- Star comparison: coreos/go-systemd has 36.6x more stars than herumi/bls-eth-go-binary
- Activity: Both repositories have been updated recently
- Age: 2013-09-13 vs 2019-10-19
- Languages: 
  • coreos/go-systemd: Go, Shell
  • herumi/bls-eth-go-binary: Go, C, C++

The jurors are expected to be experts in the Ethereum ecosystem. Some known names include: Jason, Toni Wahrstatter, Ladislaus, Vitalik Buterin, DC Builder, Vectorized and Marius Van Der.
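
The prompts above say how an answer is scored: squared error between logs for the ratio questions (Levels 1 and 3), and plain squared error for originality (Level 2). A minimal sketch of those two per-question errors follows. The function names are my own, and the numbers are made up rather than real jury data.

```python
import math


def log_ratio_squared_error(predicted_ratio, jury_ratio):
    """Squared error between the logs of two positive ratios (Level 1 and Level 3)."""
    if predicted_ratio <= 0 or jury_ratio <= 0:
        raise ValueError("ratios must be positive floats")
    return (math.log(predicted_ratio) - math.log(jury_ratio)) ** 2


def originality_squared_error(predicted, jury):
    """Plain squared error for a Level 2 originality answer in [0, 1]."""
    if not (0.0 <= predicted <= 1.0 and 0.0 <= jury <= 1.0):
        raise ValueError("originality answers must lie between 0 and 1")
    return (predicted - jury) ** 2


# Made-up numbers: the jury says "10", and we overshoot or undershoot by a factor of 2.
off_by_2x_high = log_ratio_squared_error(20.0, 10.0)
off_by_2x_low = log_ratio_squared_error(5.0, 10.0)
print(off_by_2x_high, off_by_2x_low)  # both equal log(2)**2 up to rounding

print(originality_squared_error(0.8, 0.5))
```

Working in logs makes the penalty symmetric in the ratio: answering twice the jury's value costs the same as answering half of it.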

appendix: helper csv

This is just the result of running a summarization script on their sample submission file. The columns that matter are num_deps (nA for each A) and num_deps_combinations (nA(nA−1)/2).

important_repo,sum_dep_weights,num_deps,num_deps_combinations,originality,ethereum
https://github.com/prysmaticlabs/prysm,0.9999999999999936,245,29890.0,0.8650592053492349,0.0294117647058823
https://github.com/ethereum/fe,0.9999999999999927,301,45150.0,0.9443691638516738,0.0294117647058823
https://github.com/ethereum/remix-project,0.9999999999999305,2277,2591226.0,0.7100532015645752,0.0294117647058823
https://github.com/eth-infinitism/account-abstraction,0.999999999999991,863,371953.0,0.4286456194839025,0.0294117647058823
https://github.com/wevm/viem,0.9999999999999376,725,262450.0,0.1450951555623104,0.0294117647058823
https://github.com/nethereum/nethereum,0.9999999999999997,57,1596.0,0.4018369295764015,0.0294117647058823
https://github.com/ethers-io/ethers.js,0.9999999999999998,138,9453.0,0.0095957100544701,0.0294117647058823
https://github.com/chainsafe/lodestar,0.9999999999999116,1516,1148370.0,0.8811032731861512,0.0294117647058823
https://github.com/ethereum-lists/chains,0.9999999999999997,6,15.0,0.6236088131630412,0.0294117647058823
https://github.com/sigp/lighthouse,0.9999999999999987,464,107416.0,0.9133744057295108,0.0294117647058823
https://github.com/ethereum/py-evm,1.0,11,55.0,0.317338474800942,0.0294117647058823
https://github.com/hyperledger/besu,0.0,0,0.0,0.774361090611847,0.0294117647058823
https://github.com/erigontech/erigon,0.9999999999999813,253,31878.0,0.932572749866548,0.0294117647058823
https://github.com/vyperlang/titanoboa,0.9999999999999989,27,351.0,0.1964900800509441,0.0294117647058823
https://github.com/alloy-rs/alloy,0.9999999999999994,19,171.0,0.3690905969681286,0.0294117647058823
https://github.com/ethereumjs/ethereumjs-monorepo,0.9999999999999718,828,342378.0,0.8417883513048304,0.0294117647058823
https://github.com/foundry-rs/foundry,0.9999999999999529,482,115921.0,0.6458766968885356,0.0294117647058823
https://github.com/safe-global/safe-smart-account,0.9999999999999712,538,144453.0,0.6011871268121423,0.0294117647058823
https://github.com/consensys/teku,0.9999999999999998,137,9316.0,0.4685428282978935,0.0294117647058823
https://github.com/grandinetech/grandine,0.9999999999999785,438,95703.0,0.9124435469744914,0.0294117647058823
https://github.com/ethereum/sourcify,0.9999999999999243,908,411778.0,0.4089762481898589,0.0294117647058823
https://github.com/ethereum/solidity,1.0,3,3.0,0.1090974834405934,0.0294117647058823
https://github.com/status-im/nimbus-eth2,0.9999999999999982,104,5356.0,0.417548394392558,0.0294117647058823
https://github.com/openzeppelin/openzeppelin-contracts,0.9999999999999536,562,157641.0,0.3373326583791293,0.0294117647058823
https://github.com/ethereum/web3.py,0.9999999999999996,13,78.0,0.8039938317729571,0.0294117647058823
https://github.com/nethermindeth/nethermind,0.0,0,0.0,0.4171925064865261,0.0294117647058823
https://github.com/apeworx/ape,0.9999999999999994,38,703.0,0.3110356991327095,0.0294117647058823
https://github.com/a16z/helios,0.999999999999944,628,196878.0,0.6740220469622176,0.0294117647058823
https://github.com/paradigmxyz/reth,0.9999999999999601,470,110215.0,0.3837737278955573,0.0294117647058823
https://github.com/scaffold-eth/scaffold-eth-2,0.9999999999999284,859,368511.0,0.688816951981115,0.0294117647058823
https://github.com/vyperlang/vyper,1.0,10,45.0,0.9323242887762672,0.0294117647058823
https://github.com/hyperledger-web3j/web3j,0.0,0,0.0,0.2430654500988898,0.0294117647058823
https://github.com/ethereum/go-ethereum,0.9999999999999986,116,6670.0,0.8467503069554304,0.0294117647058823
https://github.com/nomicfoundation/hardhat,0.9999999999999869,1891,1786995.0,0.5435417307291117,0.0294117647058823

Summing num_deps_combinations:

Total number of dependency pairs (num_deps_combinations): 8352618.0

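$$\frac{34\cdot(34-1)}2+34+8352618=8353213$$

(The 8,353,774 quoted in the problem formalization corresponds to using 34·(34−1) for the first term, i.e. without the division by 2.)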
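
The same count can be recomputed from the num_deps column alone. A minimal sketch, using only three rows copied from the helper csv above so the arithmetic is easy to check by hand; the function names `pairs` and `total_direct_questions` are my own.

```python
import csv
import io

# Three rows copied from the helper csv above; the full file has 34.
EXCERPT = """important_repo,sum_dep_weights,num_deps,num_deps_combinations,originality,ethereum
https://github.com/ethereum/py-evm,1.0,11,55.0,0.317338474800942,0.0294117647058823
https://github.com/ethereum/solidity,1.0,3,3.0,0.1090974834405934,0.0294117647058823
https://github.com/vyperlang/vyper,1.0,10,45.0,0.9323242887762672,0.0294117647058823
"""


def pairs(n):
    """Number of unordered pairs among n items: n*(n-1)/2."""
    return n * (n - 1) // 2


def total_direct_questions(num_deps_per_seed):
    """Seed pairs + one originality question per seed + dependency pairs per seed."""
    seeds = len(num_deps_per_seed)
    level1 = pairs(seeds)
    level2 = seeds
    level3 = sum(pairs(n) for n in num_deps_per_seed)
    return level1 + level2 + level3


rows = list(csv.DictReader(io.StringIO(EXCERPT)))
num_deps = [int(row["num_deps"]) for row in rows]

# Cross-check against the num_deps_combinations column of the excerpt.
for row in rows:
    assert pairs(int(row["num_deps"])) == float(row["num_deps_combinations"])

print(total_direct_questions(num_deps))  # 3 + 3 + (55 + 3 + 45) = 109
```

Feeding in all 34 num_deps values instead of the excerpt gives the total for the full graph.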