Mastering Dependency Management with Nix in Machine Learning Projects

Navigating the complexities of dependency management in large-scale machine learning projects can quickly become an overwhelming task. While traditional approaches offer some relief, they often fall short as project complexity increases, introducing inefficiencies that hinder development. This blog post is a chronicle of transitioning from conventional dependency management practices to adopting Nix, marking a significant shift towards simplifying and enhancing the development workflow.

Overview

Intro

Note: This journey into the world of Nix, supplemented by insights from AI tools including ChatGPT, captures my initial exploration into employing Nix for effective dependency management. I warmly welcome feedback and insights from the broader Nix community.

This guide is designed for incremental learning and application. You’re encouraged to tackle it step-by-step, constructing your project from the outset to gain a thorough understanding of Nix’s capabilities. However, the process is structured to support flexibility; feel free to pause at the end of any chapter and resume your journey later, picking up right where you left off. Whether you’re experimenting with a single feature or ready to dive into the next chapter, the guide accommodates your pace.

For those who prefer to have a reference or need assistance troubleshooting, the accompanying GitHub repository, nix-intro-examples, includes the full project and all intermediate steps, captured through commits. This resource is intended as a supportive reference for comparison or if you encounter challenges while following the examples independently.

From Chaos to Order

My initial approach to managing dependencies involved using Ubuntu-based Docker images, built on CUDA for GPU acceleration. This method, coupled with Anaconda or Miniconda for Python environment management, provided a semblance of control. However, it was far from foolproof. Custom-built packages, such as FFmpeg and OpenCV with CUDA support, often led to conflicting references between system and Conda environments, complicating the development workflow.

Discovering Nix: A Turning Point

The shift to Nix emerged as a pivotal moment, driven by the quest for a more streamlined and manageable development environment. Nix’s declarative nature promised an end to compatibility woes by ensuring precise and reproducible environments. Transitioning to Nix was not without challenges, particularly its steep learning curve. However, the payoff in dependency management efficiency was undeniable.

Embarking on the Nix Journey

One of Nix’s key advantages is its compatibility with virtually any existing distribution, making it an extremely versatile tool. You don’t need to start with NixOS to benefit from Nix’s powerful package management capabilities. This flexibility allows developers to maintain their current operating system environment while leveraging Nix to handle complex dependency graphs and ensure consistent, reproducible environments.

Let’s embark on our Nix journey with the initial setup:

# Install Nix (official)
sh <(curl -L https://nixos.org/nix/install) --daemon

# Verify the installation
nix-env --version

Note: If your system has enhanced security measures, the official installer might not work, but the community has built an easy-to-use alternative for you: the Determinate Nix Installer.

(Alternative, if the official way fails)

# Install Nix (Determinate Nix Installer, works on systems with enhanced security, like fedora)
curl --proto '=https' --tlsv1.2 -sSf -L https://install.determinate.systems/nix | sh -s -- install

# Verify the installation
nix-env --version

This installation script sets up Nix in a multi-user configuration, which is recommended for most use cases. After installation, you can immediately start using Nix to manage dependencies for your projects.
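For a quick sanity check beyond nix-env --version, you can also drop into an ad-hoc shell that provides a package you probably don’t have installed; the hello package below is just an arbitrary pick:

# Fetches GNU hello into the Nix store and runs it, without installing anything system-wide
nix-shell -p hello --run hello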

  • For the official Nix installation guide, click here.
  • For the Determinate Systems installer for hardened systems, click here.

Basic Functionality

In this section, we’ll cover the essentials to set up a project using Nix. The goal is to establish a development environment that ensures consistency and reproducibility, regardless of the underlying operating system. This involves creating a dedicated Nix folder within your project, adding nix/shell.nix and nix/project.nix configurations, and introducing a simple Python file to verify CUDA availability.

Creating Your First Nix Shell

A Nix shell encapsulates your project’s development environment, specifying all required dependencies, including tools and libraries. Here’s how to create a nix/shell.nix file that includes Python 3, Git, and curl (basic utilities such as touch are available through the shell’s standard environment):

{ pkgs ? import <nixpkgs> {} }:

pkgs.mkShell {
  buildInputs = [
    pkgs.python3
    pkgs.git
    pkgs.curl
  ];
}

This configuration ensures that these tools are available in your project’s environment, enabling a consistent workflow across all development setups.

Now let’s enter the shell: nix-shell nix/shell.nix

For discovering and incorporating official packages into your environment, the Nix package search at https://search.nixos.org/packages serves as an invaluable resource. As we progress, we’ll also explore customizing and crafting our unique packages tailored to our project’s specific needs.

Transforming Your Project into a Library

To streamline dependency management and project packaging, it’s efficient to structure your Python files as a library. This approach simplifies the process of packaging your project with Nix, making it more manageable and modular. By organizing your project as a library, you can leverage Nix’s packaging capabilities to define dependencies explicitly and build your project in isolated environments.

Kickstarting Your Project: The “Hello, World!” Example

To get started, let’s create a simple Python project that outputs “Hello, World!”. This example will help you familiarize yourself with the basic project structure and the process of integrating Nix.

  1. Project Structure: Organize your project files under the src/myproject directory. Your main script, hello.py, will reside here, alongside the __init__.py file to treat this directory as a Python package.

  2. Hello, World! Script: Create hello.py with the following content:

# src/myproject/hello.py

def greet():
    print("Hello, World!")

if __name__ == "__main__":
    greet()
  3. Setup Script: Add a setup.py file at the src/ level to manage your project as a package:
# src/setup.py
from setuptools import setup, find_packages

setup(
    name="myproject",
    version="0.1",
    packages=find_packages(),
)
  4. Initialization: To ensure your myproject directory is recognized as a Python package, create an __init__.py file within it. Since touch is readily available inside our Nix shell environment, you can easily create this file using the following command:
[nix-shell]$ touch src/myproject/__init__.py

Note: The use of touch here is seamless because the Nix shell provides the standard build utilities alongside the tools we listed explicitly!

This setup lays the groundwork for a Python project managed with Nix, setting the stage for further development and packaging.
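At this point, the relevant part of the project tree looks like this:

nix/
└── shell.nix
src/
├── setup.py
└── myproject/
    ├── __init__.py
    └── hello.py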

To safeguard your project’s progress and ensure version control from the get-go, let’s also initialize a Git repository. This step is straightforward in the Nix shell, even if Git isn’t installed on your primary system:

[nix-shell]$ git init .
[nix-shell]$ git add -A
[nix-shell]$ git commit -m "Initial project setup"

Note: This process works flawlessly within the Nix shell, showcasing another advantage of managing your development environment with Nix.

Changes on GitHub

Packaging Your Python Project with Nix

To effectively manage your Python project with Nix, start by creating a nix/project.nix file. This configuration specifies how to package your project, detailing its dependencies and build process. Utilizing Nix’s buildPythonPackage from the pythonPackages collection simplifies defining your project’s packaging requirements.

Below is a foundational example of a nix/project.nix file, illustrating how to structure it:

{ pkgs ? import <nixpkgs> {}
, python3Packages ? pkgs.python3Packages }:
let
  project_root = pkgs.lib.cleanSource ../.; # Cleans the parent directory for use
in
python3Packages.buildPythonPackage rec {
  pname = "myproject";
  version = "0.1";

  src = "${project_root}/src";
  pythonPath = [ "${project_root}/src" ];

  propagatedBuildInputs = [
    python3Packages.numpy # Example of a dependency
  ];

  # Disable or enable tests
  doCheck = false; # Set to true to enable test execution
  checkInputs = [ python3Packages.pytest ]; # Dependencies for running tests
  checkPhase = ''
    export PATH=${pkgs.lib.makeBinPath [ python3Packages.pytest ]}:$PATH
    cd ${project_root}/tests
    pytest
  '';
}

Key elements of this nix/project.nix file include:

  • pname and version define the package’s name and version.
  • src designates the source code’s location, using ${project_root}/src to specify where the Python code resides.
  • propagatedBuildInputs lists the project’s dependencies, such as numpy in this example.
  • pythonPath specifies the path to the Python modules, enabling Nix to locate your project’s source code.
  • doCheck toggles the execution of the project’s test suite during the build. It’s set to false by default but can be enabled as needed.
  • The checkPhase outlines how to run the test suite, specifying the commands and dependencies required (a minimal example test file is sketched right after this list).
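If you flip doCheck to true, the checkPhase above expects a tests/ directory at the project root. A minimal, hypothetical test file could look like this (pytest discovers any test_*.py file):

# tests/test_hello.py (hypothetical example)
from myproject.hello import greet

def test_greet_runs():
    # greet() only prints; this just asserts it completes without raising
    assert greet() is None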

To test the packaging process and ensure everything is set up correctly, run:

[nix-shell]$ nix-build nix/project.nix

This command attempts to build your package according to the specifications in nix/project.nix. Successfully executing this command indicates that you have correctly packaged your first Nix-managed project. Next, let’s explore how to utilize this package effectively.

Changes on GitHub

Integrating nix/project.nix into Your Development Environment

To seamlessly incorporate your package definition into the Nix development environment, reference nix/project.nix within your nix/shell.nix. This integration ensures that every time you enter the Nix shell, your project environment is automatically set up with all necessary dependencies. Modify your nix/shell.nix as shown below to include your project package:

{ pkgs ? import <nixpkgs> {} }:
let
  myProject = import ./project.nix { pkgs = pkgs; }; # path is relative to this file (nix/shell.nix)
in
pkgs.mkShell {
  buildInputs = [
    pkgs.python3
    myProject
    pkgs.git
    pkgs.curl
  ];
}

This modification to nix/shell.nix effectively incorporates your Python project as a Nix package into your shell environment, simplifying the management of dependencies and streamlining project builds within a reproducible environment.

Changes on GitHub

Activating Your Nix-Managed Environment

To activate and work within your newly defined Nix environment, you may first want to exit any existing shell sessions:

[nix-shell]$ exit # Optional, but cleanly exits the old shell

Then, initiate your Nix shell with the updated configuration:

$ nix-shell nix/shell.nix # Enter the shell with your project configurations

Once inside the Nix shell, you can run your Python script to verify that everything is set up correctly:

[nix-shell]$ python -m myproject.hello # Execute your script

This command runs the “Hello, World!” script we created earlier, demonstrating your project’s successful integration into a Nix-managed development environment.
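If everything is wired up correctly, the output is simply:

Hello, World!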

Exploring Nix Syntax with nix-repl

To get a better grasp of Nix’s syntax and how it operates, we’ll use nix repl, a tool that lets you interactively experiment with Nix expressions. This practical approach will help you familiarize yourself with the basic constructs of Nix. Start nix repl by typing it into your nix-shell.

[nix-shell]$ nix repl

Understanding let ... in

The let ... in expression allows you to define local variables within a Nix expression. These variables can then be used in the expression that follows the in.

Example to Try in nix-repl:

nix-repl> let a = 10; b = 20; in a + b

Copy and paste this into your nix-repl. This expression defines two variables, a and b, and calculates their sum.

Unpacking Function Arguments: { ... }: ...

Nix functions can accept arguments in a set, allowing you to unpack variables directly from the set.

Example to Try in nix-repl:

nix-repl> ( { name, value }: "Name: ${name}, Value: ${toString value}" ) { name = "Nix"; value = 42; }

This function takes a set with name and value, and returns a string incorporating both. The toString function is used to convert the integer value to a string.

Default Values in Functions: { p ? ... }

You can provide default values for function parameters within the { ... } notation. If an argument isn’t provided when the function is called, the default value is used.

Example to Try in nix-repl:

nix-repl> ( { text ? "Hello, Nix!" }: text ) {}

This function returns the default text because no argument is provided. Try modifying it to pass a different string:

nix-repl> ( { text ? "Hello, Nix!" }: text ) { text = "Learning Nix is fun!"; }

Putting It All Together

You can combine these elements to create more complex expressions. Nix’s functional nature allows you to build reusable, modular configurations.

Composite Example to Try:

nix-repl> let add = { x, y ? 1 }: x + y; in add { x = 5; }

This defines a simple function add that takes an argument x and an optional argument y, with a default value of 1, then uses it to calculate a sum.

Exploring Further with nix-repl

The nix-repl offers a rich set of commands beyond the basic examples we’ve explored. To discover these additional capabilities, simply type :? within the nix-repl. This command reveals a comprehensive list of options available to you, including loading and building Nix expressions from files, among other advanced debugging tools. While the examples provided give a solid foundation, don’t hesitate to explore these more powerful features as you become more comfortable with Nix.

When you’re ready to conclude your nix-repl session, exiting is straightforward. Simply type :q and press Enter. This command will gracefully close the nix-repl, returning you to your standard terminal prompt.

nix-repl> :?
nix-repl> :q

This exploration into the nix-repl is just the beginning of what’s possible with Nix. As you grow more familiar with its syntax and capabilities, you’ll find it an invaluable tool for managing complex dependencies and environments in a reproducible and declarative manner.

Elevating Your Nix Skills: Incorporating CUDA and Docker

Delving into more advanced Nix functionalities opens up a world of possibilities for managing intricate project dependencies. This includes leveraging pre-built binaries like torch-bin for PyTorch with CUDA support and efficiently packaging your environment for Docker. These steps underscore Nix’s robustness in orchestrating elaborate environments effortlessly.

Structuring Complex Dependencies

For projects requiring extensive dependency management, compartmentalizing these dependencies into a dedicated Nix file simplifies the process. Let’s focus on setting up nix/dependencies.nix. This file swaps the default torch package for the pre-built torch-bin, which avoids long source builds (we will introduce overlays for deeper substitutions later).

Step 1: Define nix/dependencies.nix

This configuration selects the pre-built PyTorch binary (torch-bin), enables proprietary packages, and activates CUDA support. Here’s a streamlined example:

{ pkgs ? import <nixpkgs> {
    config = {
      allowUnfree = true;
      cudaSupport = true;
    };
  }
, lib ? pkgs.lib
, my_python ? pkgs.python3
, cudatoolkit ? pkgs.cudaPackages.cudatoolkit
, }:
let
  python_packages = my_python.pkgs;
in {
  pkgs = pkgs;
  lib = lib;
  my_python = my_python;
  cudatoolkit = cudatoolkit;
  dependencies = with pkgs; [
    python_packages.numpy
    python_packages.torch-bin
    cudatoolkit
  ];
}

Note: Enabling allowUnfree is necessary for incorporating proprietary software like CUDA. The cudaSupport flag tells nixpkgs to build packages with CUDA support wherever it is available, so everything integrates seamlessly.
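If you prefer, both flags can also be set once in your user-level nixpkgs configuration instead of inline in every import; a minimal sketch with the same effect as the config attribute above:

# ~/.config/nixpkgs/config.nix
{
  allowUnfree = true;
  cudaSupport = true;
}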

Step 2: Integrate Dependencies into Your Project

Revise your nix/project.nix to leverage nix/dependencies.nix, ensuring a cohesive environment:

{ project_dependencies ? import ./dependencies.nix { }
, }:
let
  pkgs = project_dependencies.pkgs;
  lib = project_dependencies.lib;
  python_packages = project_dependencies.my_python.pkgs;
  project_root = pkgs.lib.cleanSource ../.; # Cleans the parent directory for use
in
python_packages.buildPythonPackage rec {
  pname = "myproject";
  version = "0.1";

  src = "${project_root}/src";
  pythonPath = [ "${project_root}/src" ];

  propagatedBuildInputs = project_dependencies.dependencies;

  # Disable or enable tests
  doCheck = false; # Set to true to enable test execution
  checkInputs = [ python_packages.pytest ]; # Dependencies for running tests
  checkPhase = ''
    export PATH=${pkgs.lib.makeBinPath [ python_packages.pytest ]}:$PATH
    cd ${project_root}/tests
    pytest
  '';
}

Verify the build integrity with:

[nix-shell]$ nix-build nix/project.nix  # This may take some time for downloading and compiling.

Step 3: Adjusting shell.nix for Dependency Integration

To reflect your project’s updated dependency management in your development environment, adapt your shell.nix. This modification aligns your Nix shell with the project’s complex dependencies, ensuring consistency across development setups. For further reading on managing CUDA within Nix, consult the resources here and here.

{ dependencies ? import ./dependencies.nix { } }:
let
  pkgs = dependencies.pkgs;
  myProject = import ./project.nix { project_dependencies = dependencies; };
in
pkgs.mkShell {
  buildInputs = [
    pkgs.python3
    myProject
    pkgs.git
    pkgs.curl
    pkgs.linuxPackages.nvidia_x11
    pkgs.ncurses5
  ];
  shellHook = ''
    export CUDA_PATH=${dependencies.cudatoolkit}
    export LD_LIBRARY_PATH=/usr/lib/wsl/lib:${pkgs.linuxPackages.nvidia_x11}/lib:${pkgs.ncurses5}/lib
    export EXTRA_LDFLAGS="-L/lib -L${pkgs.linuxPackages.nvidia_x11}/lib"
    export EXTRA_CCFLAGS="-I/usr/include"
  '';
}

Step 4: Verifying CUDA Availability

To confirm that CUDA is correctly configured in your project, update the hello.py script to include a check for CUDA availability. This test underscores the practical application of your Nix setup in a real-world scenario:

# src/myproject/hello.py
import torch

def greet():
    print(f"Hello, World! Cuda available: {torch.cuda.is_available()}")

if __name__ == "__main__":
    greet()

Finally, to apply the recent changes and verify everything is in order, exit and re-enter your Nix shell. Then, run your updated Python script to see the results:

[nix-shell]$ exit
$ nix-shell nix/shell.nix
[nix-shell]$ python -m myproject.hello # Execute your script
Hello, World! Cuda available: True # Or False depending on your system

Changes on GitHub

Building Your First Docker Container with Nix

Incorporating Docker into your Nix-managed environment extends the reproducibility and portability of your setup. Drawing inspiration from a practical guide on leveraging Nix with NVIDIA Docker, let’s adapt the strategy to craft a Docker container for our project. The following configuration, saved as nix/docker/buildCudaLayeredImage.nix, outlines the process for creating a layered Docker image with CUDA support:

# https://sebastian-staffa.eu/posts/nvidia-docker-with-nix/
# https://github.com/Staff-d/nix-cuda-docker-example
{  cudatoolkit
,  buildLayeredImage
,  lib
,  name
,  tag ? null
,  fromImage ? null
,  contents ? null
,  config ? {Env = [];}
,  extraCommands ? ""
,  maxLayers ? 2
,  fakeRootCommands ? ""
,  enableFakechroot ? false
,  created ? "2024-03-08T00:00:01Z"
,  includeStorePaths ? true
}:

let

  # cut the patch version from the version string
  cutVersion = with lib; versionString:
    builtins.concatStringsSep "."
      (take 3 (builtins.splitVersion versionString )
    );

  cudaVersionString = "CUDA_VERSION=" + (cutVersion cudatoolkit.version);

  cudaEnv = [
    "${cudaVersionString}"
    "NVIDIA_VISIBLE_DEVICES=all"
    "NVIDIA_DRIVER_CAPABILITIES=all"

    "LD_LIBRARY_PATH=/usr/lib64/"
  ];

  cudaConfig = config // {Env = cudaEnv;};

in buildLayeredImage {
  inherit name tag fromImage
    contents extraCommands
    maxLayers
    fakeRootCommands enableFakechroot
    created includeStorePaths;

  config = cudaConfig;
}

This Nix expression takes a buildLayeredImage function (we will pass dockerTools.streamLayeredImage to it below) and uses it to create a Docker image that carries the CUDA-related environment variables, with a pinned creation date that you can adjust as necessary.

Creating an Entrypoint Script

Define an entrypoint for the Docker image in entrypoint.sh. This script initiates your project, ensuring it executes upon container startup:

#!/bin/env bash

echo "my entry point"
python -m myproject.hello

Crafting the Docker Container Build Script

The nix/build_container.nix script orchestrates the Docker image creation, incorporating the project and its dependencies:

{
    project_dependencies ? import ./dependencies.nix {}
}:
let
  pkgs = project_dependencies.pkgs;
  lib = project_dependencies.lib;
  cudatoolkit = project_dependencies.cudatoolkit;
  project = import ./project.nix { project_dependencies = project_dependencies; };
  entrypointScriptPath = ../entrypoint.sh; # Adjust the path as necessary
  entrypointScript = pkgs.runCommand "entrypoint-script" {} ''
    mkdir -p $out/bin
    cp ${entrypointScriptPath} $out/bin/entrypoint
    chmod +x $out/bin/entrypoint
  '';
in import ./docker/buildCudaLayeredImage.nix {
  inherit cudatoolkit;
  buildLayeredImage = pkgs.dockerTools.streamLayeredImage;
  lib = pkgs.lib;
  maxLayers = 2;
  name = "project_nix";
  tag = "latest";

  contents = [
    pkgs.coreutils
    pkgs.findutils
    pkgs.gnugrep
    pkgs.gnused
    pkgs.gawk
    pkgs.bashInteractive
    pkgs.which
    pkgs.file
    pkgs.binutils
    pkgs.diffutils
    pkgs.less
    pkgs.gzip
    pkgs.btar
    pkgs.nano
    (pkgs.python311.withPackages (ps: [
      project
    ]))
    entrypointScript
  ];
  config = {
    Entrypoint = ["${entrypointScript}/bin/entrypoint"];
  };
}

Changes on GitHub

Building and Running the Container

Execute the following command within a Nix shell to build and load your Docker image:

[nix-shell]$ $(nix-build --no-out-link nix/build_container.nix) | docker load

This approach maintains the integrity of your nix/project.nix, ensuring that the containerization process is both transparent and adaptable to project changes. For those familiar with Docker, nix/build_container.nix offers a clear method for containerization. Those new to Docker can rely on this script as a stable foundation, requiring little to no adjustments for different project setups.

Executing the Docker Image

To run your newly created Docker image:

docker run project_nix

This command should output:

my entry point
Hello, World! Cuda available: False

To enable CUDA, ensure you have the necessary GPU support and run:

docker run --gpus all project_nix

This will confirm CUDA’s availability within your Dockerized environment:

my entry point
Hello, World! Cuda available: True

Customizing Dependencies in Nix

So far, we’ve primarily utilized official Nix packages. However, you might encounter situations where the necessary package is unavailable or exists only in an unstable channel. In such cases, Nix’s overlay system allows you to replace, change, or overwrite dependencies to meet your project’s needs.
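Conceptually, an overlay is just a function of two arguments, conventionally called final (or self) and prev (or super), returning an attribute set of packages to add or replace. Here is a tiny, purely hypothetical sketch before we look at a real one (somepackage is a placeholder name):

# Hypothetical overlay: override one attribute of an existing package
final: prev: {
  somepackage = prev.somepackage.overrideAttrs (old: {
    doCheck = false; # e.g. skip the package's test suite
  });
}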

Case Study: Integrating PyTorch Lightning

PyTorch Lightning is a prime example of a package that, at times, may only be available in the NixOS unstable channel. Moreover, integrating it might require replacing its dependency on the standard PyTorch package with a custom version. Here’s how you can achieve this with a Nix overlay:

Step 1: Define the Overlay (/nix/overlay/replace-torch.nix)

Create an overlay file that conditionally replaces PyTorch and its related packages with alternative versions, if specified:

{ do_replace ? false
, replacement_torch ? false
, replacement_torchvision ? false
, replacement_torchaudio ? false
, replacement_python ? false }:
final: prev:
let
  real_python = if do_replace then replacement_python else prev.python311;
  real_torch = if do_replace then replacement_torch else real_python.pkgs.torch-bin;
  real_torchvision = if do_replace then replacement_torchvision else real_python.pkgs.torchvision-bin;
  real_torchaudio = if do_replace then replacement_torchaudio else real_python.pkgs.torchaudio-bin;
  real_python311 = real_python.override {
    packageOverrides = final_: prev_: {
      torch = real_torch;
      torchvision = real_torchvision;
      torchaudio = real_torchaudio;
      pytorch-lightning = prev_.pytorch-lightning.override {
        torch = real_torch;
      };
      tensorboardx = prev_.tensorboardx.override {
        torch = real_torch;
      };
    };
  };
in {
  python311 = real_python311;
  my_python = real_python311;
}

Step 2: Update Your Dependencies (dependencies.nix)

In your nix/dependencies.nix, incorporate the overlay and specify any replacements as needed. This configuration allows you to use both stable and unstable packages, customizing dependencies according to your project requirements:

# change dependencies.nix
{ pkgs ? import <nixpkgs> {
    overlays = [
      (import ./overlay/replace-torch.nix { })
    ];
    config = {
      allowUnfree = true;
      cudaSupport = true;
    };
  }
, lib ? pkgs.lib
, my_python ? pkgs.python3
, cudatoolkit ? pkgs.cudaPackages.cudatoolkit
, unstable_pkgs ? import (fetchTarball "https://github.com/NixOS/nixpkgs/archive/nixos-unstable.tar.gz") {
    overlays = [
      (import ./overlay/replace-torch.nix {
          do_replace = true;
          replacement_torch = my_python.pkgs.torch;
          replacement_torchvision = my_python.pkgs.torchvision;
          replacement_torchaudio = my_python.pkgs.torchaudio;
          replacement_python = my_python;
      })
    ];
  }
}:
let
  python_packages = my_python.pkgs;
  unstable_python_packages = unstable_pkgs.my_python.pkgs;
in {
  pkgs = pkgs;
  lib = lib;
  my_python = my_python;
  cudatoolkit = cudatoolkit;
  dependencies = with pkgs; [
    python_packages.numpy
    python_packages.torch-bin
    unstable_python_packages.pytorch-lightning
    cudatoolkit
  ];
}

Observing the Impact

When you rebuild your container using the specified nix-build command, you’ll notice PyTorch Lightning among the layers—demonstrating the overlay’s effect. This method grants you fine control over which packages are included in your container, ensuring that your environment aligns precisely with your project’s dependencies.

Changes on GitHub

Step 3: Why do we need the overlay again?

To illustrate the importance of overlays, consider temporarily disabling the package overrides in nix/overlay/replace-torch.nix by commenting out the lines related to dependency replacements:

# Within /nix/overlay/replace-torch.nix, comment out:
    packageOverrides = final_: prev_: {
      #torch = real_torch;
      #torchvision = real_torchvision;
      #torchaudio = real_torchaudio;
      #pytorch-lightning = prev_.pytorch-lightning.override {
      #  torch = real_torch;
      #};

Next, trigger a build process (cancel after the output starts):

[nix-shell]$ $(nix-build --no-out-link nix/build_container.nix) | docker load
these 23 derivations will be built:
  /nix/store/d9zwjxg8ny4n2ybcahmc4v4ghks801b4-cuda_nvcc-12.1.105.drv
  /nix/store/6f6fhz2awqqgrr70zp359kpx0xa6ky2d-python3.11-triton-2.1.0.drv
  /nix/store/fphiqwvnwjd4pn6pa5lj7b08gb5ns8dn-cuda_profiler_api-linux-x86_64-12.2.140-archive.tar.xz.drv
  /nix/store/g0izqr84dq53zm1vlgvq1is8l4x2sq0l-cuda_profiler_api-12.2.140.drv
  /nix/store/3p4lc5ym7hwf9lhbf3gpy89vskc89jay-cuda_nvcc-linux-x86_64-12.2.140-archive.tar.xz.drv
  /nix/store/m3i09zc16d5179wihcr0frdzl4pdrdhw-cuda_nvcc-12.2.140.drv
  /nix/store/rk04gwl9al2xjrvflph8rn0z0jnpzip8-source.drv
  /nix/store/l2mv89xq94klgrhhd33ka52rgv8rx51f-nccl-2.20.3-1.drv
  /nix/store/cynqbwdlx3dq3jcwpgqqyffwbysgq4al-cuda_nvml_dev-linux-x86_64-12.2.140-archive.tar.xz.drv
  /nix/store/pi6pqqxvb3xihs30dgshszz90ydmrnm7-cuda_nvml_dev-12.2.140.drv
  /nix/store/zwdvqi744rgx5v8z23qwdl720941dcvs-magma-2.7.2.drv
  /nix/store/6whl2wy4li5ckvpx3v1k28hry9fnly61-python3.11-torch-2.2.1.drv
  /nix/store/7b37pwvsvj0zgazb1410dlfr2qqhhhwb-python3.11-torchmetrics-1.3.1.drv
  /nix/store/6kjfz8r8g736d9a8nqkkgbb9z49jljal-python3.11-torchvision-0.17.1.drv
  /nix/store/npsah4dcxjbnnrz4g4vmb8znxr2kncjr-python3.11-tensorboardx-2.6.2.drv
  /nix/store/r6qy1qpx1084zm17rmdlsq7r2x1vpglp-python3.11-pytorch-lightning-2.1.3.drv
^Cerror: interrupted by the user

Cancel with Ctrl+C

You’ll notice the build process attempts to rebuild pytorch-lightning and its dependencies from scratch, including the default torch package from the unstable channel, despite our previous efforts to build them with custom settings. This happens because, without the overlays, Nix falls back to using the original package definitions and dependencies.

Overlays allow us to inject our customizations into the Nix package definitions, effectively enabling us to replace certain parameters, like dependencies, with our preferred versions. When we remove the overlay customizations, Nix no longer has the instructions to use our custom dependencies and reverts to the default behavior, as illustrated by the build process attempting to fetch and build the original versions of pytorch-lightning and its dependencies.

By utilizing overlays, such as in our nix/overlay/replace-torch.nix, we gain fine-grained control over package dependencies. This method allows us to dictate exactly which versions of packages like torch, torchvision, and torchaudio are used, ensuring compatibility and meeting specific requirements of our project.

For a deeper dive into how packages are defined and how dependencies are managed in Nix, you can explore the official Nix package repository, such as the definition for pytorch-lightning here.

In summary, overlays are a powerful tool in Nix for customizing package behaviors, especially for replacing dependencies. They provide a flexible way to ensure that your project uses the exact versions of packages you need, without being constrained by the defaults provided in Nix channels.

Changes on GitHub

Packaging docTR with Nix: A Practical Example

Creating a Nix package for third-party libraries such as docTR can be streamlined by leveraging tools and resources efficiently. One effective strategy I often employ involves consulting with ChatGPT for initial guidance and insights. The aim is to create a self-contained Nix expression that encapsulates all necessary dependencies and configurations, ensuring a smooth integration into the broader project environment.

Crafting the Nix Expression for docTR

Below is a comprehensive Nix expression for packaging the docTR library; create it at nix/packages/doctr.nix. This setup ensures that all parameters are optional, providing flexibility and making the package self-contained:

{ pkgs ? import <nixpkgs> {}
, lib ? pkgs.lib
, my_python ? pkgs.python311
, buildPythonPackage ? my_python.pkgs.buildPythonPackage
, fetchFromGitHub ? pkgs.fetchFromGitHub }:

buildPythonPackage rec {
  pname = "doctr";
  version = "0.8.1";  # Update this to the latest release version

  src = fetchFromGitHub {
    owner = "mindee";
    repo = pname;
    rev = "v${version}";
    sha256 = "rlIGq5iHDAEWy1I0sAXVSN2/Jh2ub/xLCLCLLp7+9ik=";  # Generate this with nix-prefetch-github
  };

  nativeBuildInputs = [ my_python.pkgs.setuptools ];
  propagatedBuildInputs = [ my_python.pkgs.torch ];


  postInstall = let
    libPath = "lib/${my_python.libPrefix}/site-packages";
  in
    ''
      mkdir -p $out/nix-support
      echo "$out/${libPath}" > "$out/nix-support/propagated-build-inputs"
    '';

  doCheck = false;

  meta = with lib; {
    description = "Doctr Python package";
    homepage = "https://github.com/mindee/doctr";
    license = licenses.asl20;  # Update as per project's licensing
    maintainers = [ ];  # You need to add your name to the list of Nix maintainers
  };
}
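The sha256 in fetchFromGitHub has to match the fetched source. As the comment above hints, nix-prefetch-github can compute it for you; one possible invocation from an ad-hoc shell (the exact flags may differ between versions of the tool):

# Prints the hash for the given GitHub revision
nix-shell -p nix-prefetch-github --run "nix-prefetch-github mindee doctr --rev v0.8.1"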

To validate the package, execute:

[nix-shell]$ nix-build nix/packages/doctr.nix

This command builds the docTR package, allowing you to verify its correctness and functionality.

Integrating docTR into Your Project

Once the docTR package meets your requirements, incorporate it into your project’s dependencies by updating nix/dependencies.nix. This ensures docTR is recognized as part of your project’s environment and can be used alongside other dependencies:

# ... previous content omitted for brevity ...
let
  python_packages = my_python.pkgs;
  unstable_python_packages = unstable_pkgs.my_python.pkgs;
  python_doctr = pkgs.callPackage ./packages/doctr.nix {
    pkgs = pkgs;
    my_python = my_python;
  };
in {
  pkgs = pkgs;
  lib = lib;
  my_python = my_python;
  cudatoolkit = cudatoolkit;
  python_doctr = python_doctr; # optional: Make it available if you need it
  dependencies = with pkgs; [
    python_packages.numpy
    python_packages.torch-bin
    unstable_python_packages.pytorch-lightning
    python_doctr
    cudatoolkit
  ];
}

By following these steps, you seamlessly integrate docTR into your project, enabling its use within the Nix-managed environment. This approach not only highlights the flexibility and power of Nix in handling complex dependencies but also demonstrates a practical workflow for incorporating third-party Python libraries into your development ecosystem.

Changes on GitHub

Tailoring ffmpeg with Custom Build Flags in Nix

Achieving a tailored build of complex dependencies like ffmpeg, especially with custom flags, can often be cumbersome on traditional systems. Nix, however, simplifies this process remarkably well through the use of overlays. This approach allows you to specify custom build options without altering the package’s default configuration.

Creating an ffmpeg Overlay

To customize ffmpeg, you’ll start by defining an overlay. This is done in a file named nix/overlay/ffmpeg.nix, where you can specify your desired build flags:

# https://github.com/NixOS/nixpkgs/blob/master/pkgs/development/libraries/ffmpeg/generic.nix
self: super: {
  ffmpeg = super.ffmpeg.override {
    withDebug = false;
    buildFfplay = false;
    withHeadlessDeps = true;
    withCuda = true;
    withNvdec = true;
    withFontconfig = true;
    withGPL = true;
    withAom = true;
    withAss = true;
    withBluray = true;
    withFdkAac = true;
    withFreetype = true;
    withMp3lame = true;
    withOpencoreAmrnb = true;
    withOpenjpeg = true;
    withOpus = true;
    withSrt = true;
    withTheora = true;
    withVidStab = true;
    withVorbis = true;
    withVpx = true;
    withWebp = true;
    withX264 = true;
    withX265 = true;
    withXvid = true;
    withZmq = true;
    withUnfree = true;
    withNvenc = true;
    buildPostproc = true;
    withSmallDeps = true;
  };
}

This overlay script overrides the default ffmpeg package to fine-tune its features and codecs based on your project’s requirements.

Integrating the Overlay

Next, incorporate this overlay into your Nix environment by adding it to the list of overlays in nix/dependencies.nix:

{ pkgs ? import <nixpkgs> {
    overlays = [
      (import ./overlay/replace-torch.nix { })
      (import ./overlay/ffmpeg.nix)
    ];
    config = {
      allowUnfree = true;
      cudaSupport = true;
    };
  }
# Remaining content omitted for brevity

By adding the ffmpeg overlay to your environment, you enable the custom-configured ffmpeg build across your Nix-managed project.

Quick Testing in nix/shell.nix

To ensure everything is set up correctly, include ffmpeg in the buildInputs of your nix/shell.nix:

# ... previous content omitted for brevity ...
  buildInputs = [
    pkgs.python3
    myProject
    pkgs.git
    pkgs.curl
    pkgs.linuxPackages.nvidia_x11
    pkgs.ncurses5
    pkgs.ffmpeg
  ];
# ... previous content omitted for brevity ...

Verifying the Custom ffmpeg Build

Exit any existing Nix shell sessions and re-enter to load the latest configurations. Then, test your ffmpeg build to confirm the custom flags are active:

[nix-shell]$ exit
$ nix-shell nix/shell.nix # Might take a while as it compiles ffmpeg
[nix-shell]$ ffmpeg -version

The output should reflect your custom build settings, indicating success:

ffmpeg version 6.1 Copyright (c) 2000-2023 the FFmpeg developers
built with gcc 13.2.0 (GCC)
configuration: ... --enable-cuda --disable-cuda-llvm ...

Changes on GitHub

Optimizing Docker Container Layering with Custom Strategies

The quest for efficient Docker container layering led me to develop a solution that deviates from traditional approaches: nix-docker-layering. Standard layering often results in an imbalance, where initial layers are disproportionately small, and a bulk of dependencies get pushed to the final layer. This default method layers each package individually, prioritizing those with the widest dependency reach.

The nix-docker-layering project introduces a novel strategy, generators.equal, which aims to distribute packages more evenly across layers, thus achieving a more balanced size distribution. Here’s how you can integrate this approach:

Step 1: Modify nix/docker/buildCudaLayeredImage.nix to include two new parameters provided by nix-docker-layering, passing them to the buildLayeredImage function:

#...
,  slurpfileGenerator
,  genArgs ? {}
# ...
in buildLayeredImage {
  inherit name tag fromImage
    contents extraCommands
    maxLayers
    fakeRootCommands enableFakechroot
    created includeStorePaths
    slurpfileGenerator genArgs;
#...

Step 2: Integrate the nix-docker-layering project into nix/build_container.nix, specifying the desired strategy and adjusting maxLayers for enhanced layering:

# ...
let
  pkgs = project_dependencies.pkgs;
  docker_layering = (import (fetchTarball {
    # URL of the tarball archive of the specific commit, branch, or tag
    url = "https://github.com/matthid/nix-docker-layering/archive/1.0.0.tar.gz";
    sha256 = "0g5y363m479b0pcyv0vkma5ji3x5w2hhw0n61g2wgqaxzraaddva";
  }) { inherit pkgs; });
# ...

in import ./docker/buildCudaLayeredImage.nix {
  inherit cudatoolkit;
  buildLayeredImage = docker_layering.streamLayeredImage;
  slurpfileGenerator = docker_layering.generators.equal;
  lib = pkgs.lib;
  maxLayers = 20;
#...

Step 3: Evaluate the new layering strategy by building and loading your Docker container:

[nix-shell]$ $(nix-build --no-out-link nix/build_container.nix) | docker load

Post-build, the output should reveal a more strategic distribution of layers:

...
Using size 825445781 (15683469850 / 19).
Adding layer 0 with size 826521078 and 271 elements.
Adding layer 1 with size 1035341240 and 12 elements.
Adding layer 2 with size 847073011 and 55 elements.
Adding layer 3 with size 1189094488 and 5 elements.
Adding layer 4 with size 1276134933 and 1 elements.
Adding layer 5 with size 933546895 and 2 elements.
Adding layer 6 with size 946925035 and 8 elements.
Adding layer 7 with size 833358245 and 44 elements.
Adding layer 8 with size 1726895617 and 43 elements.
Adding layer 9 with size 1067920473 and 69 elements.
Adding layer 10 with size 4994209074 and 4 elements.
Adding (last) layer 11 with size 6449761
...
Creating layer 12 from paths: [... '/nix/store/gpwxhvj47vrpp7szyzlvq1s4pz7q55k9-python3.11-myproject-0.1' ...]

This customized strategy results in a pragmatic balance: initial layers house the foundational packages least likely to change, while the application itself resides in the accessible final layer. Notably, the effective layer count may fall short of maxLayers due to the strategy’s efficient packing approach, which aims to minimize the last layer’s size and accommodate oversized packages as seen.

Changes on GitHub

Conclusion

Embarking on the journey with Nix, from the fundamentals to advanced techniques, demonstrates its profound impact on simplifying and optimizing project management and deployment. This exploration has not only highlighted Nix’s capabilities in managing dependencies but also its flexibility in integrating with diverse ecosystems, from simple Python applications to complex Docker containerization strategies.

Embracing the Power of Nix

The versatility of Nix, illustrated through real-world examples, offers a glimpse into its potential to revolutionize development workflows. By leveraging Nix, we navigated the intricacies of dependency management, package customization, and even refined Docker container layering, showcasing Nix’s ability to cater to specific project needs while ensuring consistency and reproducibility.

A Resource for the Nix Community

To support your journey with Nix, I’ve compiled the complete project, including all intermediate steps, as a series of commits in a GitHub repository: nix-intro-examples. This resource is designed to provide hands-on guidance and inspire further exploration into the endless possibilities Nix offers.

Final Thoughts

The transition to Nix for dependency management and beyond represents not just a shift in tools but a paradigm change towards a clearer, more manageable approach to software development. I hope this guide serves as a beacon for those navigating the complexities of dependency management, offering a path to mastery in utilizing Nix for a seamless, efficient development experience.

Paket internals - Framework Restrictions

Intro

Warning: This is a quite technically detailed article (apart from some introduction). If you are looking for a Paket intro, there is an excellent, up-to-date blog series by Isaac.

There was a time, which I remember perfectly well, when I thought that NuGet worked perfectly fine. Additionally, I was already heavily contributing to FSharp.Formatting and, from time to time, to FAKE, so why would I use yet another tool that might need some attention?

(Images: Paket contributions and FSharp.Formatting contributions)

Why would you contribute to a project while still using the alternative? Well, times have changed: after wasting enough time on NuGet updates and other shortcomings, I have now been a happy Paket user and contributor for quite some time.

Why not “new” NuGet?

Personally, some things need to change before I even consider using NuGet again:

  • git diff needs to show which versions (including transitive packages) changed: This is a very good feature of Paket and helped several times to debug and find complicated version conflicts.
  • restore needs to restore exact version OR fail, nothing in between: How can you debug complicated conflicts when the versions you get are random?
  • update needs to update my transitive packages as far as they are allowed by the corresponding version constraints: Otherwise I always need to specify all transitives myself to get bugfixes - in no way better than packages.config

As you can see there is no point in going back - even after they redesigned their whole system.

Paket internals

Now let’s see how it works!

Warning 2: Some internals are very hard to understand or to get right, so this post doesn’t claim correctness or completeness.

Update

Update is the process of resolving your paket.dependencies file to a paket.lock file in a fully automated manner. This process happens in several steps:

  • Parse the paket.dependencies file
  • Find a solution for the dependencies riddle
  • Write paket.lock
  • Download and extract the packages
  • Analyse the package contents and figure out “relevant” files
  • Edit project files according to paket.references (internally called “Install”)
  • Show the final Report

Some of this is pretty “primitive” and I will not look into it much (parsing, writing and extracting, for example); other things, like finding the solution, have become so complex that I only understand parts of it.

Datastructures and Resolving (overview)

Framework restrictions

Because it helps in understanding the resolver and makes explaining it a lot easier, we start with one of the most important concepts in Paket (if not the most important): the framework restriction.

Let’s first briefly go over the terminology (from Paket’s point of view):

  • framework identifier: Basically everything besides portables
  • target profile or profile or platform: Either a framework identifier to represent a single platform or a list of identifiers to represent a portable profile.
  • framework (without identifier): often used as a synonym for profile
  • tfm (target framework moniker): a string identifier specified by NuGet for a particular platform. Basically, “target profile” = “tfm”

It is a bit problematic that we (the paket developers) don’t always use exactly defined terms due to historical reasons - often you get the exact meaning only by context.

Now, you need to understand that the NuGet ecosystem provides packages for a lot of existing platforms. Those platforms are documented here. Because there are so many, I simplified the FrameworkIdentifier type definition compared to the real one:

Generally, I will simplify type definitions to make the point clearer. Often we store additional data in such data structures for performance or UX reasons (for example, to provide better warnings/errors at certain places).

type FrameworkIdentifier =
    | Net45
    | Net46
    //| ... lots of others

type TargetProfile =
    | SinglePlatform of FrameworkIdentifier
    | PortableProfile of FrameworkIdentifier list


type FrameworkRestrictionP =
    private
    | ExactlyP of TargetProfile
    | AtLeastP of TargetProfile
    // Means: Take all frameworks NOT given by the restriction
    | NotP of FrameworkRestrictionP
    | OrP of FrameworkRestrictionP list
    | AndP of FrameworkRestrictionP list
type FrameworkRestrictionLiteralI =
    | ExactlyL of TargetProfile
    | AtLeastL of TargetProfile
type FrameworkRestrictionLiteral =
    { LiteraL : FrameworkRestrictionLiteralI; IsNegated : bool }
type FrameworkRestrictionAndList =
    { Literals : FrameworkRestrictionLiteral list }
type FrameworkRestriction =
    private { OrFormulas : FrameworkRestrictionAndList list }
type FrameworkRestrictions =
    | ExplicitRestriction of FrameworkRestriction
    | AutoDetectFramework

So, in simple words, a framework restriction is a general formula which describes a set of TargetProfiles in disjunctive normal form (DNF). We decided to use DNF because in our domain the formulas tend to be shorter, and keeping them in DNF throughout the application allows us to simplify them along the way with simple algorithms. Examples of such formulas are >= net45 or OR (>= net45) (< netcoreapp1.0).
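To connect the string notation with the types above: ignoring the private markers and assuming a NetCoreApp10 case exists among the omitted identifiers, OR (>= net45) (< netcoreapp1.0) would correspond roughly to this value:

// Illustration only; the real constructors are private and the identifier list is much longer
let example : FrameworkRestrictionP =
    OrP [
        AtLeastP (SinglePlatform Net45)                // >= net45
        NotP (AtLeastP (SinglePlatform NetCoreApp10))  // < netcoreapp1.0
    ]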

As briefly described above, each formula represents a set of profiles and this set is defined like this:

    member x.RepresentedFrameworks =
        match x with
        | FrameworkRestrictionP.ExactlyP r -> [ r ] |> Set.ofList
        | FrameworkRestrictionP.AtLeastP r -> r.PlatformsSupporting
        | FrameworkRestrictionP.NotP(fr) ->
            let notTaken = fr.RepresentedFrameworks
            Set.difference KnownTargetProfiles.AllProfiles notTaken
        | FrameworkRestrictionP.OrP (frl) ->
            frl
            |> Seq.map (fun fr -> fr.RepresentedFrameworks)
            |> Set.unionMany
        | FrameworkRestrictionP.AndP (frl) ->
            match frl with
            | h :: _ ->
                frl
                |> Seq.map (fun fr -> fr.RepresentedFrameworks)
                |> Set.intersectMany
            | [] -> 
                KnownTargetProfiles.AllProfiles

Basically, “AtLeast” (>=) means “all profiles which support the current profile”. “Supporting” in that sense means: if I have two profiles X and Y, create a new project targeting Y, and can reference packages with binaries built against X, then we say Y supports X. For example, net46 supports net45, therefore net46 is part of the set >= net45. Some further examples:

  • net47 is in >= netstandard10
  • netcoreapp10 is in >= netstandard16
  • net45 is in < netstandard12 which is equivalent to NOT (>= netstandard12), because net45 is NOT in >= netstandard12
  • net45 is NOT in < netstandard13

It is confusing, even for me, who wrote this stuff. The important thing here is to get away from the thinking of “smaller” and “higher” because it has no real meaning. On the other hand “supports” has a well defined meaning. Also don’t try to give < tfm any meaning besides a particular set of frameworks. This makes reasoning a lot simpler (just see it as an intermediate value used for simplifications and calculations). Technically, you could see it as “all platforms not supporting a particular tfm”.

So, now we know that a framework restriction is a formula which represents a particular list of frameworks (which we can now calculate given the NuGet documentation from above). But why do we need them?

The answer lies in the resolver phase. Let’s compare with plain NuGet: in NuGet you have a project targeting a single platform. NuGet then goes hunting for compatible packages for this particular platform. So at resolution time it knows which dependencies to take and what files it needs to install from a package. You might say: but what about new NuGet? The answer is that the principle is not different at all. In the new world they resolve for each platform separately, exactly as described. This - in addition to how they resolve package versions - makes the resolution phase dead simple.

Paket on the other hand has a different world view. We assume the following to be true:

  • Packages properly define their dependencies (note: NuGet explicitly assumes the reverse)
  • You want to reach a unified view of your dependency tree. This means you accept different packages for different platforms, but you only accept a single version of a package supporting all your target profiles.

This means we tell our resolver our acceptable range of dependencies and the list of frameworks (see what I did there?) we want to build for. Obviously in practice we use framework restrictions for this:


type PackageName = string
type SemVerInfo = string
type VersionRangeBound =
    | Excluding
    | Including
type VersionRange =
    | Minimum of SemVerInfo
    | GreaterThan of SemVerInfo
    | Maximum of SemVerInfo
    | LessThan of SemVerInfo
    | Specific of SemVerInfo
    | OverrideAll of SemVerInfo
    | Range of fromB : VersionRangeBound * from : SemVerInfo * _to : SemVerInfo * _toB : VersionRangeBound
type VersionRequirement = VersionRange

type InstallSettings = 
    { FrameworkRestrictions: FrameworkRestrictions }

type PackageRequirementSource =
| DependenciesFile of string
| Package of PackageName * SemVerInfo * PackageSource

type PackageRequirement =
    { Name : PackageName
      VersionRequirement : VersionRequirement
      Parent: PackageRequirementSource
      Graph: PackageRequirement Set
      Sources: PackageSource list
      Settings: InstallSettings }

type DependencySet = Set<PackageName * VersionRequirement * FrameworkRestrictions>

type ResolvedPackage = {
    Name                : PackageName
    Version             : SemVerInfo
    Dependencies        : DependencySet
    Unlisted            : bool
    IsRuntimeDependency : bool
    IsCliTool           : bool
    Settings            : InstallSettings
    Source              : PackageSource
} 

val Resolve : Set<PackageRequirement> -> Map<PackageName, ResolvedPackage>

As you can see we input a set of PackageRequirement where every requirement contains framework restrictions in its settings.

But what do we get? The answer is that we not only need to know the version and name of a particular package, but also for which list of frameworks we have this dependency. Again, this is part of the settings of the result.

Need an example? Consider package A:

  • For tfm-b it depends on B in version 1.0
  • For tfm-c it depends on C in version 1.0

These conditions are called “dependency groups”. Here we have two dependency groups, one for tfm-b and one for tfm-c

Now we put into the resolver that we want to depend on A and build for all frameworks (= no framework restrictions; note that in this simplified scenario I could also say we use the list OR (= tfm-b) (= tfm-c) - see, framework restrictions are lists ;)). What do we want as the result?

Well, we want Paket to tell us that B needs to be installed but only for the list of frameworks >= tfm-b and C for the list of frameworks >= tfm-c! Can you spot the error?

The error is that the answer is wrong and the correct one depends on how tfm-b and tfm-c relate to each other!

For example consider tfm-b supports tfm-c which means tfm-b is in >= tfm-c. Then the correct answer is that we need to install B for >= tfm-b and C for AND (>= tfm-c) (< tfm-b). (Think of tfm-b = net45 and tfm-c = net40). The reason for this is that we should always decide for a single “dependency group”.

Another interesting case is when they don’t directly relate to each other (i.e. neither is supported by the other) but AND (>= tfm-c) (>= tfm-b) is not the empty set. For example, consider tfm-c = netstandard10 and tfm-b = net35. Now the correct answer is harder, because for the set AND (>= tfm-c) (>= tfm-b) there is no good answer. What Paket does is reuse its internal cost function to figure out whether a random member of the AND set matches tfm-c or tfm-b better, and then assigns the remaining items accordingly. Let’s assume the set matches tfm-c “better”; then we get:

  • Install C when OR (>= tfm-c) (AND (>= tfm-c) (>= tfm-b)) which will be simplified to >= tfm-c
  • Install B when AND (>= tfm-b) (< tfm-c).

The above logic is encoded in this 51-line function, which probably needs a bit of time to read (it took quite a bit of time to write “correctly”, so that’s probably fair).

Just when writing this blog post I noticed that there is a bug in the above logic. Please send a PR if you can figure out what the problem is ;)

Ok this is enough for now, more internals might follow…

Please ping me on twitter to tell me what you want to know next. Or open an issue on Paket :)

Having Fun with Computation Expressions

Or how I built a generic computation builder library.

Over the last few days I built an F# API for Google Music using gmusicapi and pythonnet (two awesome projects, by the way). The Python C API requires you to acquire the GIL (Global Interpreter Lock) before you can safely use the API. Because I knew I would forget to do this all over the place, I decided to mark those places explicitly with the help of computation expressions. Doing this at a low level means I can safely use it at higher levels.

I decided to build the computation expression like this:

type PythonData<'a> =
  private { 
    Delayed : (unit -> 'a)
    mutable Cache : 'a option
  }
let pythonFunc f = { Delayed = f; Cache = None }
let internal unsafeExecute (f:PythonData<_>) =
  match f.Cache with
  | Some d -> d
  | None ->
    let res = f.Delayed()
    f.Cache <- Some res
    res
let private getPythonData = unsafeExecute
let runInPython f = 
  use __ = Python.Runtime.Py.GIL()
  f |> getPythonData

The builder is straightforward from there (see link above).
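For completeness, here is a minimal sketch of what such a builder can look like on top of the helpers above (the real builder has a few more members):

// Minimal sketch; assumes pythonFunc and unsafeExecute from the snippet above
type PythonBuilder() =
  member x.Bind(d, f) =
    pythonFunc (fun () -> f (unsafeExecute d) |> unsafeExecute)
  member x.Return(v) = pythonFunc (fun () -> v)
  member x.ReturnFrom(d : PythonData<_>) = d
  member x.Delay(f) = pythonFunc (fun () -> f () |> unsafeExecute)

let python = PythonBuilder()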

Of course, now we need to interact with sequences, and we need something like FSharp.Control.AsyncSeq. Basically, all I had to do was copy the code from there and replace the builder.

Wait… What? We will look into this later.

Now I got really curious: I really only want to replace the runInPython function, and there is nothing specific to my Python problem in the builders. Can we be more generic here? Just adding more run functions is not really practical, as users could then simply use the wrong one…

Let the fun begin… Let’s start with a general-purpose delayed builder and see what we can do from there:


type Delayed<'a> =
  private { 
    Delayed : (unit -> 'a)
    mutable Cache : 'a option
  } 
  
module Delayed =
  let create f = { Delayed = f; Cache = None }
  let execute (f:Delayed<_>) =
    match f.Cache with
    | Some d -> d
    | None ->
      let res = f.Delayed()
      f.Cache <- Some res
      res
      
  let map f d =
    (fun () -> f (d |> execute)) |> create

type ConcreteDelayedBuilder() =
  let create f = Delayed.create f
  let execute e = Delayed.execute e

  member x.Bind(d, f) =
    (fun () ->
      let r = d |> execute
      f r |> execute) |> create

  member x.Return(d) =
    (fun () -> d) |> create
  member x.ReturnFrom (d) = d
  member x.Delay (f) =
    (fun () -> f() |> execute) |> create
  member x.Combine (v, next) = x.Bind(v, fun () -> next)
  member x.Run (f) = f
  member x.Zero () = (fun () -> ()) |> create
  member x.TryWith (d, recover) =
    (fun () ->
      try
        d |> execute
      with e -> recover e |> execute) |> create
  member x.TryFinally (d, final) =
    (fun () ->
      try
        d |> execute
      finally final ()) |> create
  member x.While (condF, body) =
    (fun () ->
      while condF() do
        body |> execute) |> create
  member x.Using (var, block) =
    (fun () ->
      use v = var
      block v |> execute) |> create
  member x.For (seq, action) =
    (fun () ->
      for item in seq do
        action item |> execute) |> create

let delayed = ConcreteDelayedBuilder()
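
As a quick sanity check (my own example, not from the original post), the builder behaves like a cached lazy computation:

// nothing runs yet – the body is only captured
let answer =
  delayed {
    printfn "computing..."
    return 21 * 2 }

let first  = answer |> Delayed.execute   // prints "computing..." and returns 42
let second = answer |> Delayed.execute   // cached: returns 42 without printing again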

Ok, looks good: we have a simple delayed builder. What we want now is some kind of converter to convert this Delayed<'T> into a PythonData<'T>.

I would design the type like this:

  type PythonData<'T> = private { D : Delayed<'T> }
  let runInPython f = 
    use __ = Python.Runtime.Py.GIL()
    f.D |> Delayed.execute

Therefore callers cannot use the underlying Delayed object. But how do we generically get a computation builder, and what would the result look like?

We would like to build a generic (computation expression builder) type with some kind of converter parameter which itself calls the regular delayed builder.

type IDelayedConverter<'b> =
	member ToDelayed : 'b<'a> -> Delayed<'a>
	member OfDelayed : Delayed<'a> -> 'b<'a>

Something like this is what we want, but sadly this is not possible (see uservoice). Can we work around this limitation? I decided to use an interface with marker classes for this. If you have a better idea, let me know!

type IDelayed<'b, 'a> = interface end
type DefaultMarker = class end 
type Delayed<'a> =
  private { 
    Delayed : (unit -> 'a)
    mutable Cache : 'a option
  } with
  interface IDelayed<DefaultMarker, 'a>

/// Ideally we want 'b to be a type constructor and return 'b<'a>...
type IDelayedConverter<'b> =
  abstract ToDelayed : IDelayed<'b, 'a> -> Delayed<'a>
  abstract OfDelayed : Delayed<'a> -> IDelayed<'b, 'a>

Now we can change our computation builder to take an instance of a converter:

type ConcreteDelayedBuilder<'b>(conv : IDelayedConverter<'b>) =
    let execute a = a |> conv.ToDelayed |> Delayed.execute
    let create f = f |> Delayed.create |> conv.OfDelayed

    // .. Continue with the old code.

Which leads to our default instance like this:

  // Add to the Delayed module...
  let conv =
    { new IDelayedConverter<DefaultMarker> with
       member x.ToDelayed p = (p :?> Delayed<'a>)
       member x.OfDelayed d = d :> IDelayed<DefaultMarker, _> }


[<AutoOpen>]
module DelayedExtensions =
  
  let delayed = ConcreteDelayedBuilder(Delayed.conv)

Nice! Now we can create the python builder like this:

module Python =
  type PythonDataMarker = class end 
  type PythonData<'T> = private { D : Delayed<'T> } with
    interface IDelayed<PythonDataMarker, 'T>
  let internal pythonConv =
    { new IDelayedConverter<PythonDataMarker> with
       member x.ToDelayed p = (p :?> PythonData<'a>).D
       member x.OfDelayed d = { D = d } :> IDelayed<PythonDataMarker, _> }
  let runInPython f = 
    use __ = Python.Runtime.Py.GIL()
    pythonConv.ToDelayed f |> Delayed.execute
  
  let python = ConcreteDelayedBuilder(pythonConv)
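
A small usage sketch (my own example): nothing touches the GIL until runInPython is called:

let greeting =
  Python.python {
    // ... call into gmusicapi / pythonnet here ...
    return "hello" }

let result = greeting |> Python.runInPython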

A bit of setup, but wow, we almost made it. What we want now is the pythonSeq or delayedSeq computation builder. When we think about it, we want a generic builder which takes the regular builder as a parameter.

Oh, that sounds like it will create a bunch of problems, but let’s start converting the AsyncSeq code. In theory all we need to do now is copy the AsyncSeq code and replace:

  • AsyncSeq<'T> -> DelayedSeq<'b, 'T>
  • IAsyncEnumerator<'T> -> IDelayedEnumerator<'b, 'T>
  • Potentially add some type parameters to the helper classes.
  • Replace the async builder with our parameter builder.
  • Async<'T> -> IDelayed<'b, 'T>

First problem: We cannot use modules to define our functionality because we actually have a parameter (the underlying builder).

So let’s start with the interfaces:

type IDelayedEnumerator<'b, 'T> =
  abstract MoveNext : unit -> IDelayed<'b, 'T option>
  inherit System.IDisposable

type IDelayedEnumerable<'b, 'T> =
  abstract GetEnumerator : unit -> IDelayedEnumerator<'b, 'T>

type DelayedSeq<'b, 'T> = IDelayedEnumerable<'b, 'T>

This will do.

Let’s start with empty (from here):

type DelayedSeqBuilder<'b>(builder : ConcreteDelayedBuilder<'b>) =
  //[<GeneralizableValue>]
  member x.empty<'T> () : DelayedSeq<'b, 'T> = 
        { new IDelayedEnumerable<'b, 'T> with 
              member x.GetEnumerator() = 
                  { new IDelayedEnumerator<'b, 'T> with 
                        member x.MoveNext() = builder { return None }
                        member x.Dispose() = () } }
 

Ok, it actually compiles. This is a good step forward. There we go: A computation builder as a parameter. The next special thing happens when they define helper types within their module. We cannot do this in a type. Therefore we will just move all the helpers above the DelayedSeqBuilder<'b> type and mark them as internal (see here).

Why did I name this type DelayedSeqBuilder<'b> and not after the corresponding module? Because we cannot define the computation builder inside it. Instead we will define all the functions here as members. This will later make the following possible:

  let seq =
    pythonSeq {
      for i in [1; 2; 3] do
        // Call Python API
        let! t = tf
        yield t + "test"
    }

  let first =
    seq
    |> pythonSeq.firstOrDefault "default"

But we are not there yet. So instead of defining the AsyncSeqBuilder inside the module we will just define everything in the builder itself. We are now here in AsyncSeq, or here with the port.

Now they do something which is not good for us: they define asyncSeq inside the module in order to define the higher-level functionality as extension methods later. One example is the emitEnumerator function. We cannot create an instance, because we don’t know the parameter. But wait, we already have a this (x) reference to ourselves. What if we use that?

  member x.emitEnumerator (ie: IDelayedEnumerator<'b, 'T>) = x {
      let! moven = ie.MoveNext() 
      let b = ref moven 
      while b.Value.IsSome do
          yield b.Value.Value 
          let! moven = ie.MoveNext() 
          b := moven }

And again it compiles! This is crazy. With this (!) we can easily port the rest of the functionality.

Now we just need to extend the regular builder with the async for. But this is straightforward as well:

[<AutoOpen>]
module DelayedSeqExtensions =
  // Add asynchronous for loop to the 'async' computation builder
  type ConcreteDelayedBuilder<'b> with
    member internal x.For (seq:DelayedSeq<'b, 'T>, action:'T -> IDelayed<'b, unit>) =
      let seqBuilder = DelayedSeqBuilder(x)
      seq |> seqBuilder.iterDelayedData action 
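
The iterDelayedData member used here is one of the members ported from AsyncSeq. As a rough sketch (my assumption, not the exact ported code), inside DelayedSeqBuilder<'b> it could look like this:

  member x.iterDelayedData (action : 'T -> IDelayed<'b, unit>) (source : DelayedSeq<'b, 'T>) =
    builder {
        use ie = source.GetEnumerator()
        let! v = ie.MoveNext()
        let b = ref v
        while b.Value.IsSome do
            do! action b.Value.Value
            let! moven = ie.MoveNext()
            b := moven }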

We can define the final functionality by using both builders. But wait, there is one more problem: when we define extension methods for the DelayedSeqBuilder<'b> type, we cannot access the underlying builder parameter anymore. So let’s add a property to access it:

type DelayedSeqBuilder<'b>(builder : ConcreteDelayedBuilder<'b>) =
  // ... 
  member x.Builder = builder
// ...
[<AutoOpen>]
module DelayedSeqExtensions =
  // ...
  type DelayedSeqBuilder<'b> with
    member x.tryLast (source : DelayedSeq<'b, 'T>) = x.Builder { 
        use ie = source.GetEnumerator() 
        let! v = ie.MoveNext()
        let b = ref v
        let res = ref None
        while b.Value.IsSome do
            res := b.Value
            let! moven = ie.MoveNext()
            b := moven
        return res.Value }
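
The firstOrDefault member used in the earlier pythonSeq example can be sketched in the same way (again my assumption, not the exact ported code):

  type DelayedSeqBuilder<'b> with
    member x.firstOrDefault (def : 'T) (source : DelayedSeq<'b, 'T>) = x.Builder {
        use ie = source.GetEnumerator()
        let! v = ie.MoveNext()
        return defaultArg v def }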

And we can finally create the computation builder instances:

  let delayedSeq = new DelayedSeqBuilder<_>(delayed)

And for our python use case:

  let python = ConcreteDelayedBuilder(pythonConv)
  let pythonSeq = DelayedSeqBuilder(python)

What have we done here?

  • We created a library for building your own computation builders with a minimal amount of code (by providing an interface implementation)
  • We worked around F# not supporting higher-kinded generics
  • We used a computation builder as a parameter
  • We used the this reference as a computation builder
  • We used a property as a computation builder

It’s really sad that we cannot define the DelayedSeqBuilder more generically. I’m pretty sure that would be usable for some existing computation builders as well. Maybe there is something we can do with type providers here ;).

This is all for now :). You can use this in your code via NuGet. Of course the code is on GitHub.

Stream Amazon Prime to UPNP Devices via WLAN

Ok, this is weird: I simply want to stream an Amazon Prime movie to our receiver, which supports UPNP, so I can watch the film on the big TV screen and ideally use the notebook at the same time. Why is this so incredibly hard to realize, and why do you need several software projects to do it?

First I tried to use plugins for already existing software within our network. Namely:

  • An Amazon Prime plugin for Plex, which apparently doesn’t exist
  • The people from the Plex thread talk about a plugin for XBMC/Kodi. The plugin wouldn’t even install on the device I wanted (an older Debian installation). But out of curiosity I tried with the latest version on Windows, installed the plugin and tadaaah, it didn’t work: Issue1 and Issue2. So the plugin is no longer usable…
  • Because we have an older Linux device in place (exactly for watching stuff) I tried to just log in and play the videos there; however, Firefox and Chromium apparently get no love from Amazon. I even installed the latest Chrome, but Amazon would tell me something about missing plugins :( (Note that this wasn’t my preferred solution anyway…)

Obviously it isn’t as simple as I initially thought. So how about a more generic solution? What if I just stream the whole desktop to the device?

If you start googling this, it really looks like VLC is the way to go. I tried. I really tried. But whenever I tried to forward the desktop to a stream (http://, udp or whatever), VLC crashed. So while in theory VLC does exactly what we need, it didn’t work (at least it didn’t for me).

Now, several hours later, I’m still not ready to give up. How can such a simple thing not work? You know what? Gaming has been really successful in streaming things via twitch.tv, so I guess I should look there for mature software and see how far I can get…

First I installed OBS, the first software recommended by Twitch.tv and actually open source (so I thought it might be open enough to do other things than stream to twitch.tv). Now, OBS is designed to stream to a streaming server and not to a UPNP device, so I need my own streaming server. After some more help from Google it was clear that OBS itself provides the required documentation to do it. The only problem is that I’m using Windows and had no intention of compiling anything; therefore I used a precompiled nginx version with rtmp support.

For recording I used the following trick:

  • Before starting Chrome, change the default audio device to a virtual one (or one that isn’t connected/is unused): Screenshot Setup before starting chrome
  • Now I can add that device to OBS (and hear it later correctly synced on the UPNP device via the stream)
  • Start Chrome
  • Add the Chrome window to OBS as you like: Screenshot Setup after starting chrome

For nginx I basically used the exact configuration from the documentation above:

C:\Users\dragon\Downloads\nginx-rtmp-win32-master\conf:

worker_processes  1;

error_log  logs/error.log debug;

events {
    worker_connections  1024;
}

rtmp {
    server {
        listen 8089;
        #chunk_size 256;

        #wait_key on;
        #wait_video on;

        application live {
            live on;
            record off;

            #wait_key on;
            #interleave on;
            #publish_notify on;
            #sync 10ms;
            #wait_video on;
            #allow publish 127.0.0.1;
            #deny publish all;
            #allow play all;
        }

        #application hls {
        #    live on;
        #    hls on;
        #    hls_path temp/hls;
        #    hls_fragment 8s;
        #}
    }
}

http {
    server {
        listen      8088;

        location / {
            root www;
        }

        location /stat {
            rtmp_stat all;
            rtmp_stat_stylesheet stat.xsl;
        }

        location /stat.xsl {
            root www;
        }

        location /hls {
            #serve hls fragments
            types {
                application/vnd.apple.mpegurl m3u8;
                video/mp2t ts;
            }
            alias temp/hls;
            expires -1;
        }
    }
}

I like that the precompiled nginx already has a page set up where I can directly watch the stream if I want to.

After starting nginx in a console window (or by double-clicking nginx.exe) your streaming server should be ready, and you can finalize the OBS setup and start the stream:

  • First make sure you set up the “Stream” settings exactly the way you set up the nginx configuration. Note that the stream key doesn’t matter (here I use “test”) but is used again later: Screenshot Setup 'Stream'

  • Then set up the output in a way suited to you (I only use a higher bitrate for better quality): Screenshot Setup 'Output'

  • Then set up the video in a way suited to your TV (change to the optimal resolution for your TV): Screenshot Setup 'Video'

Now the stream can be started successfully and we can even watch it via http://localhost:8088 by entering rtmp://127.0.0.1:8089/live/test in the box.

We are almost there; we only need to bridge the gap between the rtmp stream and the UPNP device. Luckily Serviio can close the gap:

Screenshot Setup 'Serviio'

Further information for this setup

  • There is definitely a high delay (several seconds) depending on your configuration as this is not meant to be ‘live’.

  • Note that you can even lock your laptop, the stream will continue to work!

  • You cannot, however, ‘minimize’ Chrome (move it to the edge of the screen instead)

  • This method is not limited to Amazon Prime at all, stream whatever you like!

  • If you don’t mind the delay you can use the UPNP device as second monitor:

    • First set up a fake display like this or that, or use an HDMI-to-VGA adapter. This will make Windows think that a second monitor is attached.
    • Now you can setup OBS to stream this second monitor instead of the chrome window.

News replaced with blog

Just a short note that the ‘news’ section here on yaaf.de has now been replaced with a blog. At the end of this post I show all the code needed to include a basic markdown-based blog in your website via FSharp.Formatting!

Some more features will be added soon:

  • Ability to comment blog posts.
  • Ability to filter posts by year and tag.

The blog was built with the lovely FSharp.Formatting library (the first time I actually used it as a regular library!). It’s a very simple in-memory implementation (which is fine for the low number of posts): all posts are markdown files in a folder which are processed to HTML by FSharp.Formatting. I added some trickery to allow embedding the title, author, date and tags into the template file (basically I read the first headline and remove it from the parsed markdown file).

A blog entry file looks like this:

# Date: 2015-12-17; Title: News replaced with blog; Tags: blog,yaaf; Author: Matthias Dittrich

Just a short information that now the 'news' section has been replaced with a blog here on yaaf.de.
At the end of this post I show all the code needed to include a basic markdown based blog to your website via `FSharp.Formatting`!

Some more features will be added soon:

 - Ability to comment blog posts.
 - Ability to filter posts by year and tag.

... continue with markup ....

Of course, while adding the blog I found a bug and fixed it… And because I was already in the FSF code, I tried to help ademar with a PR which I have wanted to merge for a long time now: https://github.com/tpetricek/FSharp.Formatting/pull/331 (see related PR)

Edit 2015-12-18:

Of course I had to update some CSS so that blog post links wrap properly and the embedded code does as well.

The following css does the trick for links:

#content .inbox a {
    /* Prevent Links from breaking the Layout (blogposts!) */
    display: inline-block;
    word-break: break-all;
}

The problem was that long automatically converted links would not wrap properly, so this tells the browser that I want to break links on every character.

Then I had to handle the regular pre tag (because I modified the FSF generation to be compatible with prismjs):

#content .inbox {
    /* basically the same as margin above, but helps with pre tags */
    max-width: calc(100% - 50px);
}

It’s strange that the pre tag seems to ignore the regular wrapping rule and must be forced to show the scroll-bar by setting the max-width property of the parent element. I could see in the browser that the setup was correct, as it floated fine up to the point where it had to show the scroll-bars (and hide/wrap the text).

The third problem had to do with the tables FSF is generating for F# scripts (for F# I still use the FSF defaults):

/* Fix that F# code scrolls properly with the page (50px = 2 * Margin of the inbox) */
table.pre {
    table-layout: fixed;
    width: calc(100% - 50px);
}

table.pre pre {
  /* Show scrollbar when size is too small */
  overflow: auto;
}

table.pre td.lines {
  /* Align on top such that line numbers are at the correct place when the scrollbar is shown */
  vertical-align: top;
}

It seems to be the regular behavior that tables do not wrap out of the box, see this.

And finally here is the box with some F# code to test the CSS changes (note: this is all the code I used to process the markdown files to HTML via FSF):

/// Simple in-memory database for my (quite limited number of) blogpost.
namespace Yaaf.Website.Blog
open System
/// Html content of a post
type Html = RawHtml of string
/// The title of a post
type Title = Title of string
/// The stripped title of a post
type StrippedTitle = StrippedTitle of string

type Post =
  { Date : DateTime
    Title : Title
    Content : Html
    Teaser : Html
    TipsHtml : Html
    Tags : string list
    Author: string }

open System.Collections.Generic
type PostDb = IDictionary<DateTime * StrippedTitle, Post>

module BlogDatabase =
  open System.IO
  open System.Web
  open FSharp.Markdown
  open FSharp.Literate
  

  let private formattingContext templateFile format generateAnchors replacements layoutRoots =
      { TemplateFile = templateFile 
        Replacements = defaultArg replacements []
        GenerateLineNumbers = true
        IncludeSource = false
        Prefix = "fs"
        OutputKind = defaultArg format OutputKind.Html
        GenerateHeaderAnchors = defaultArg generateAnchors false
        LayoutRoots = defaultArg layoutRoots [] }

  let rec private replaceCodeBlocks ctx = function
      | Matching.LiterateParagraph(special) -> 
          match special with
          | LanguageTaggedCode(lang, code) -> 
              let inlined = 
                match ctx.OutputKind with
                | OutputKind.Html ->
                    let code = HttpUtility.HtmlEncode code
                    let codeHtmlKey = sprintf "language-%s" lang
                    sprintf "<pre class=\"line-numbers %s\"><code class=\"%s\">%s</code></pre>" codeHtmlKey codeHtmlKey code
                | OutputKind.Latex ->
                    sprintf "\\begin{lstlisting}\n%s\n\\end{lstlisting}" code
              Some(InlineBlock(inlined))
          | _ -> Some (EmbedParagraphs special)
      | Matching.ParagraphNested(pn, nested) ->
          let nested = List.map (List.choose (replaceCodeBlocks ctx)) nested
          Some(Matching.ParagraphNested(pn, nested))
      | par -> Some par
      
  let private editLiterateDocument ctx (doc:LiterateDocument) =
    doc.With(paragraphs = List.choose (replaceCodeBlocks ctx) doc.Paragraphs)

  let parseRawDate (rawDate:string) = DateTime.ParseExact(rawDate, "yyyy-MM-dd", System.Globalization.CultureInfo.InvariantCulture)

  let parseRawTitle (rawTitle:string) =
    // e.g. "2015-12-17 - Testpost with some long title"
    let splitString = " - "
    let firstColon = rawTitle.IndexOf(splitString)
    if firstColon < 0 then failwithf "invalid title (expected instance of ' - ' to split date from title): '%s'" rawTitle
    let rawDate = rawTitle.Substring(0, firstColon)
    let realTitle = rawTitle.Substring(firstColon + splitString.Length)
    Title (realTitle.Trim()), parseRawDate rawDate

  let parseHeaderLine (line:string) =
    let splitString = ": "
    let firstColon = line.IndexOf(splitString)
    if firstColon < 0 then failwithf "invalid header line (expected instance of ': ' to split header key from value): '%s'" line
    let rawKey = line.Substring(0, firstColon)
    let rawValue = line.Substring(firstColon + splitString.Length)
    rawKey, rawValue


  let extractHeading (doc:LiterateDocument) =
    let filtered, heading =
      doc.Paragraphs
      |> Seq.fold (fun (collected, oldHeading) item ->
        let takeItem, heading =
          match oldHeading, item with
          | None, (Heading(1, text)) ->
            let doc = MarkdownDocument([Span(text)], dict [])
            false, Some(Formatting.format doc false OutputKind.Html)
          | None, _ -> true, None
          | _ -> true, oldHeading
        (if takeItem then item :: collected else collected), heading) ([], None)
    heading, doc.With(paragraphs = List.rev filtered)

  let evalutator = lazy (Some <| (FsiEvaluator() :> IFsiEvaluator))
  let readPost filePath =
    // parse the post markup file
    let doc = Literate.ParseMarkdownFile (filePath, ?fsiEvaluator = evalutator.Value)
    
    // generate html code from the markdown
    let ctx = formattingContext None (Some OutputKind.Html) (Some true) None None
    let doc =
      doc
      |> editLiterateDocument ctx
      |> Transformations.replaceLiterateParagraphs ctx
    let heading, doc = extractHeading doc
    let content = Formatting.format doc.MarkdownDocument ctx.GenerateHeaderAnchors ctx.OutputKind
    let rec getTeaser (currentTeaser:string) (paragraphs:MarkdownParagraphs) =
      if currentTeaser.Length > 150 then currentTeaser
      else
        match paragraphs with
        | p :: t ->
          let convert = Formatting.format (doc.With(paragraphs = [p]).MarkdownDocument) ctx.GenerateHeaderAnchors ctx.OutputKind
          getTeaser (currentTeaser + convert) t
        | _ -> currentTeaser

    let title, date, tags, author =
      match heading with
      | Some header ->
        let headerValues =
          header.TrimEnd().Split([|"; "|], StringSplitOptions.RemoveEmptyEntries)
          |> Seq.map parseHeaderLine
          |> dict
        
        Title headerValues.["Title"], parseRawDate headerValues.["Date"],
        (match headerValues.TryGetValue "Tags" with
        | true, tags -> tags.Split([|","|], StringSplitOptions.RemoveEmptyEntries)
        | _ -> [||]),
        match headerValues.TryGetValue "Author" with
        | true, author -> author
        | _ -> "Unknown"
      | None ->
        let name = Path.GetFileNameWithoutExtension filePath
        let title, date = parseRawTitle name
        title, date, [||], "Unknown"

    let tipsHtml = doc.FormattedTips
    { Date = date; Title = title; Content = RawHtml content; TipsHtml = RawHtml tipsHtml; 
      Tags = tags |> List.ofArray; Author = author; Teaser = RawHtml (getTeaser "" doc.Paragraphs) }

  let toStrippedTitle (Title title) =
    StrippedTitle (title.Substring(0, Math.Min(title.Length, 50)))

  let readDatabase path : PostDb =
    // Blogposts are *.md files within the given path
    Directory.EnumerateFiles(path, "*.md")
    |> Seq.map (readPost >> (fun p -> (p.Date, toStrippedTitle p.Title), p))
    |> dict
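
And finally a tiny usage sketch (my own example; the folder path is made up): load all posts and print them, newest first.

open Yaaf.Website.Blog

let posts = BlogDatabase.readDatabase "content/blog"

posts
|> Seq.sortBy (fun kv -> fst kv.Key)
|> Seq.rev
|> Seq.iter (fun kv ->
    let (date, StrippedTitle title) = kv.Key
    printfn "%s - %s" (date.ToString("yyyy-MM-dd")) title)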