Using Blawx for Software Development

An illustration of what using Blawx for software development might look like

Jan. 12, 2026

Let's imagine what the future of software development with Rules as Code might look like.

Our Scenario

Let's imagine that there is a software project under way in the Government of Canada to create an application that helps public servants understand whether their proposed course of action violates certain procurement rules. And for the sake of argument, let's assume that the software is being developed in the Python programming language.

What the Legal Expert Does

Our legal subject matter expert encodes the relevant rules in Blawx. They know that the question the developer wants the software to answer is whether, in a given anticipated situation, there would be a violation, and why. So the legal subject matter expert creates a Question entitled "Is there a violation with regard to a contract?" As a final step in the testing and validation process, they create a fact scenario called "All Inputs Unknown", in which all of the facts that cannot be derived using the rules are set to unknown. Running the "is there a violation" question with the "all inputs unknown" fact scenario, they get an answer that effectively describes every hypothetical situation under which a violation might exist given their encoding.

The output that the expert reviews might look like this:

Image showing a large number of answers generated from a question with unknown facts

The last of the 13 answers looks like this in detail:

Image showing an explanation for one of many answers to a question with unknown facts

What this explanation says is that a hypothetical violation of DMP 4.6.1.1 would exist if a non-competitive contract with National Defence, with an expected value over $200,000, was not approved before being entered into.

The expert agrees that is accurate, agrees the rest of the possibilities are accurate, and is satisfied that the encoding reflects their own understanding of what the rules mean. They set the "is there a violation" question to be "shared" in Blawx, which makes it available over the API. They generate an API key, and provide their team name, their Blawx project number, and the API key to the developer.

What the Developer Does

The developer installs the free, open source Blawx MCP server; adds the team, project, and key to the configuration; and runs the server on their local development machine. Then they make the MCP server available to their local coding agent.

The developer describes the input factors that the application is supposed to consider, and asks the coding agent to create a Python function that always returns False to get started.

The coding agent generates a file that looks like this:

"""Procurement rule evaluation framework.

This module defines a small set of dataclasses to capture procurement-related
facts about a contract and provides a stubbed evaluator function.

The evaluator currently always returns "no violation" (False), but the shapes
are designed so you can add rule checks incrementally without changing inputs.
"""

from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Entity:
    """An organization or person involved in contracting."""

    entity_id: str
    name: str


@dataclass(frozen=True)
class DatedEvent:
    """A dated event; supports multiple dates by storing multiple instances."""

    occurred_on: date


@dataclass(frozen=True)
class BidSolicitation(DatedEvent):
    """An event where bids were solicited for a contract."""


@dataclass(frozen=True)
class TreasuryBoardApproval(DatedEvent):
    """An event where Treasury Board approval was received for a contract."""


@dataclass
class ContractFacts:
    """All facts needed to evaluate whether a contract violates procurement rules."""

    # Parties / classification
    contracting_authority: Entity
    is_contracting_authority_crown_corporation: bool

    # Key dates & amounts
    entered_into_on: date
    estimated_expenditure: float

    # Process facts
    is_competitive: bool
    bid_solicitations: Sequence[BidSolicitation] = field(default_factory=tuple)

    # Exceptions / justifications
    delay_from_soliciting_bids_injurious_to_public: Optional[bool] = None
    not_in_public_interest_to_solicit_bids: Optional[bool] = None
    only_one_person_can_perform: Optional[bool] = None

    # Approvals
    treasury_board_approvals: Sequence[TreasuryBoardApproval] = field(default_factory=tuple)


ViolationResult = Tuple[bool, str]


def check_procurement_violation(facts: ContractFacts) -> ViolationResult:
    """Return (violated, explanation).

    For now, this is a stub that always returns "no violation".

    Future implementation sketch (intentionally not implemented yet):
    - Determine governing rule set based on contracting authority classification.
    - Evaluate whether competitive process was required based on thresholds.
    - Evaluate whether bid solicitation occurred with required timing.
    - Evaluate whether exceptions apply (urgency/public interest/sole source).
    - Evaluate whether Treasury Board approval was required/obtained.
    """

    # Keeping the input "used" avoids lint warnings and makes debugging easier.
    _ = facts

    return (False, "No violation checks implemented yet.")

The developer is using test-driven development (TDD). That means that first they will write tests that encode what counts as success, then they will write code to achieve that success. Because they know they have access to the reasoning features provided by Blawx, they are not limited to writing tests where a specific input results in a specific output. Answers to a hypothetical Blawx question can be converted into a property test that covers an arbitrarily large number of possible fact scenarios, and will tell you if that property holds in all of them.

Property testing has the advantage that you can generate as many inputs as you like, in whatever way you like, and the property tests will determine whether each response satisfies every property. That makes your testing dramatically more thorough, and when the space of possible inputs is small enough, it can be truly exhaustive.
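
To make the "actually exhaustive" case concrete, here is a minimal sketch using only the standard library. It assumes the inputs have been reduced to a handful of booleans and three-valued (unknown/yes/no) fields; the function is_violation and its parameters are hypothetical stand-ins, not part of the generated module. With a finite input space this small, every combination can be checked rather than sampled.

```python
from itertools import product
from typing import Optional

TRI_STATE = (None, True, False)  # unknown / yes / no


def is_violation(
    is_competitive: bool,
    over_limit: bool,
    approved_before_entry: bool,
    delay_injurious: Optional[bool],
    not_in_public_interest: Optional[bool],
    only_one_supplier: Optional[bool],
) -> bool:
    """Hypothetical stand-in for the function under test."""
    return (not is_competitive) and over_limit and (not approved_before_entry)


def check_property_exhaustively() -> int:
    """Check the property on every input combination; return how many were checked."""
    checked = 0
    for competitive, over_limit, approved in product((True, False), repeat=3):
        for exceptions in product(TRI_STATE, repeat=3):
            result = is_violation(competitive, over_limit, approved, *exceptions)
            # Property: a non-competitive contract over the limit that was not
            # approved beforehand is a violation, whatever the exception fields say.
            if not competitive and over_limit and not approved:
                assert result is True
            checked += 1
    return checked


print(check_property_exhaustively())  # 216 combinations (2**3 * 3**3)
```

Once dates and dollar amounts enter the picture the space is no longer small, which is where a generator-based library such as Hypothesis takes over from plain enumeration.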

So the developer asks their coding agent to make a property test that reflects the last answer to the "Is there a violation" question using the "all inputs unknown" fact scenario.

The coding agent goes to the Blawx server, finds the relevant question and fact scenario, runs the question with that fact scenario, reads the explanation, and generates the following code:

from __future__ import annotations

from datetime import date

import pytest
from hypothesis import given
from hypothesis import strategies as st

from procurement_framework import ContractFacts, Entity, check_procurement_violation


# Scenario source: Blawx question "Is there a violation with regard to a contract?"
# using fact scenario "All Inputs Unknown".
#
# First violation scenario (HumanTree summary):
# - By abduction: the contract exists and the contracting authority is national_defence
# - The contract is NOT competitive
# - The entity national_defence has a non-competitive basic contracting limit of 200000
# - The estimated expenditure is greater than 200000
# - There is no evidence of Treasury Board approval before the contract was entered into
# Therefore: a contracting violation exists (DMP Section 4.6 / 4.6.1 / 4.6.1.1).


@st.composite
def contract_facts_noncompetitive_over_limit_without_tb_approval(draw) -> ContractFacts:
    entered_into_on = draw(
        st.dates(min_value=date(2000, 1, 1), max_value=date(2030, 12, 31))
    )
    estimated_expenditure = float(
        draw(st.integers(min_value=200_001, max_value=50_000_000))
    )

    # Keep the other exception-related fields unconstrained/unknown.
    delay_injurious = draw(st.one_of(st.none(), st.booleans()))
    not_in_public_interest = draw(st.one_of(st.none(), st.booleans()))
    sole_source = draw(st.one_of(st.none(), st.booleans()))

    return ContractFacts(
        contracting_authority=Entity(entity_id="national_defence", name="National Defence"),
        is_contracting_authority_crown_corporation=False,
        entered_into_on=entered_into_on,
        estimated_expenditure=estimated_expenditure,
        is_competitive=False,
        bid_solicitations=(),
        delay_from_soliciting_bids_injurious_to_public=delay_injurious,
        not_in_public_interest_to_solicit_bids=not_in_public_interest,
        only_one_person_can_perform=sole_source,
        treasury_board_approvals=(),
    )


@pytest.mark.xfail(
    reason="Rule engine not implemented yet; Blawx indicates this scenario should be a violation.",
)
@given(contract_facts_noncompetitive_over_limit_without_tb_approval())
def test_violation_noncompetitive_over_limit_without_tb_approval(facts: ContractFacts) -> None:
    violated, explanation = check_procurement_violation(facts)

    # Intended future behavior (based on the reasoner explanation above):
    # this situation should be flagged as a contracting violation.
    assert violated is True
    assert explanation

Then the developer asks the agent to write code that passes the property test, using the testing framework to generate a variety of fact scenarios and see if it has succeeded. The agent writes some code, runs the tests, and the tests pass. The new version of the function might look like this:

def check_procurement_violation(facts: ContractFacts) -> ViolationResult:
    """Return (violated, explanation).

    Implements the first Blawx-derived rule (DMP 4.6.1.1); other checks are pending.

    Implementation sketch (mostly not implemented yet):
    - Determine governing rule set based on contracting authority classification.
    - Evaluate whether competitive process was required based on thresholds.
    - Evaluate whether bid solicitation occurred with required timing.
    - Evaluate whether exceptions apply (urgency/public interest/sole source).
    - Evaluate whether Treasury Board approval was required/obtained.
    """

    # Rule (Blawx-derived): For National Defence, non-competitive contracts over
    # 200,000 require Treasury Board approval *before* the contract is entered into.
    # If not approved beforehand, the contract is in violation.
    if facts.contracting_authority.entity_id == "national_defence":
        noncompetitive_basic_contracting_limit = 200_000.0
        if (not facts.is_competitive) and (facts.estimated_expenditure > noncompetitive_basic_contracting_limit):
            approved_before_entered_into = any(
                approval.occurred_on <= facts.entered_into_on
                for approval in facts.treasury_board_approvals
            )
            if not approved_before_entered_into:
                return (
                    True,
                    "Contracting violation (DMP 4.6.1.1): National Defence non-competitive contracts "
                    "over 200000 require Treasury Board approval before the contract is entered into.",
                )

    return (False, "No procurement violation detected.")

This process repeats for all the available answers to the question, and the code gets better and better.
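
Alongside the generated property tests, a few hand-picked boundary cases can make a rule's edges easier to discuss with the legal expert. The sketch below is a condensed, hypothetical restatement of the National Defence rule shown above: the name violates_limit_rule and its flattened parameters are illustrative and are not part of the generated module. It assumes the same comparisons as the generated code, namely a strict "greater than 200,000" threshold and a "on or before the entered-into date" check for approvals.

```python
from datetime import date
from typing import Sequence

NONCOMPETITIVE_LIMIT = 200_000.0  # limit quoted in the Blawx explanation


def violates_limit_rule(
    is_competitive: bool,
    estimated_expenditure: float,
    entered_into_on: date,
    approval_dates: Sequence[date],
) -> bool:
    """Hypothetical, flattened version of the single rule implemented so far."""
    if is_competitive or estimated_expenditure <= NONCOMPETITIVE_LIMIT:
        return False
    # Assumption carried over from the generated code: an approval dated on
    # the same day the contract is entered into counts as "before".
    approved = any(d <= entered_into_on for d in approval_dates)
    return not approved


signed = date(2026, 1, 12)
cases = {
    "no approval": violates_limit_rule(False, 250_000.0, signed, []),
    "approved earlier": violates_limit_rule(False, 250_000.0, signed, [date(2026, 1, 5)]),
    "approved same day": violates_limit_rule(False, 250_000.0, signed, [signed]),
    "approved later": violates_limit_rule(False, 250_000.0, signed, [date(2026, 2, 1)]),
    "exactly at limit": violates_limit_rule(False, 200_000.0, signed, []),
}
for label, violated in cases.items():
    print(f"{label}: {violated}")
```

Whether a same-day approval should count as "before" is a question about the rules, not about Python, so it is the kind of case to put back to the expert or to the Blawx encoding.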

At some point, the developer sees a property that is failing for a given set of inputs, and doesn't understand why the answer their system is generating is incorrect. The developer asks their development chatbot, "Can you use the Blawx server tool, and find the contract violation question, and present the scenario that is currently failing and explain why it should be a violation?"

The agent looks at the available ontology on the Blawx server, looks up the detail of some vocabulary that seems particularly relevant, and formulates a fact scenario of its own to send to the Blawx server. It gets a natural language explanation with reference to the actual laws involved, and explains to the developer why the Blawx encoding suggests this is the right result for those facts.

It might look like this:

Image of a GitHub Copilot chat in which the agent uses Blawx to explain a test failure

Why This is Exciting

It would take a long time to describe all of the ways that this approach is an improvement over how legal quality assurance happens in public sector software development now.

Here are a few highlights that come to mind, in brief:

  • Once the encoding exists, it serves any number of developers in any number of projects.
  • The developer doesn't need to know anything about Blawx, and doesn't need to know nearly as much about the content of the rules.
  • The expert doesn't need to know anything about Python, and doesn't have to spend time user-testing the app.
  • The automated tests are far more exhaustive, because they are property tests, and are themselves automatically generated.
  • When the rules change, the expert can change the encoding. Developers re-run the automated process to generate new property tests, and test their apps again. The resulting failures, in combination with the explanations from Blawx, will tell the developers exactly what part of their code needs to change, and exactly why.
  • The developers don't need to change anything about how they do their job. They just add a tool to their agents, and things get easier.
  • Rule experts become a deeply valuable part of your software quality assurance process.
  • Checking whether the encoding of the rules has changed can be a part of the CI/CD pipeline.
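
As a rough illustration of the last point, a CI step could fingerprint the answers returned for the shared question and compare the result against a value stored in the repository. This is a sketch under assumptions: how the answers are retrieved from Blawx is not shown, the data shape below is made up for illustration, and the helper names fingerprint_answers and encoding_changed are hypothetical.

```python
import hashlib
import json
from typing import Any


def fingerprint_answers(answers: Any) -> str:
    """Return a stable SHA-256 hex digest of JSON-serializable answer data."""
    # sort_keys makes the digest independent of dictionary key order;
    # the order of items inside lists still matters.
    canonical = json.dumps(answers, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def encoding_changed(current_answers: Any, stored_fingerprint: str) -> bool:
    """True when the current answers no longer match the stored fingerprint."""
    return fingerprint_answers(current_answers) != stored_fingerprint


# Made-up answer data, standing in for whatever the real response contains.
baseline = [{"rule": "DMP 4.6.1.1", "limit": 200000, "competitive": False}]
reordered = [{"competitive": False, "limit": 200000, "rule": "DMP 4.6.1.1"}]
amended = [{"rule": "DMP 4.6.1.1", "limit": 250000, "competitive": False}]

stored = fingerprint_answers(baseline)
print(encoding_changed(reordered, stored))  # False: same content, different key order
print(encoding_changed(amended, stored))    # True: the limit changed
```

A pipeline built this way would fail the build when the fingerprint changes, prompting developers to regenerate the property tests before merging.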

There's more, and the benefits extend beyond the public sector, but you get the idea.

How Far Away Is This Future?

I lied. This is not actually a hypothetical future scenario. This is what Blawx can do, right now.

Everything I described above was actually done this weekend using an in-development version of Blawx v2.0.2, an in-development Blawx MCP server, and ChatGPT 5.2 through GitHub Copilot in the VS Code IDE. The Blawx encoding used was the encoding generated for the GovAI Grand Challenge.

Progress is going pretty quickly. You can expect to see the Blawx MCP server publicly available by the end of January 2026.