Selective Embedded Just-In-Time Specialization

Achieving near-optimal performance in high-level productivity languages


The SEJITS project provides tools and frameworks that bridge the gap between high-level productivity languages and high-performance hardware. Specialized kernels written with SEJITS can deliver near-optimal performance by using information that is available only at run time, such as argument types and problem sizes.
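The idea of specializing on run-time information can be sketched in plain Python. This is only an illustration, not the SEJITS or ctree API: the names `specializing` and `build_doubler` are hypothetical, and the "code generation" step is a stand-in closure where a real specializer would emit and compile C or OpenCL.

```python
def specializing(build):
    """Return a callable that builds one variant per run-time signature."""
    cache = {}

    def call(values):
        # The signature (element type, length) is known only once data arrives.
        signature = (type(values[0]).__name__, len(values))
        if signature not in cache:
            # Build the variant once, then reuse it for matching inputs.
            cache[signature] = build(*signature)
        return cache[signature](values)

    call.cache = cache  # exposed so the caller can inspect what was built
    return call


def build_doubler(type_name, length):
    # Stand-in for code generation: the loop bound is fixed at build time.
    indices = range(length)

    def doubler(values):
        return [values[i] * 2 for i in indices]

    return doubler


double = specializing(build_doubler)
ints = double([1, 2, 3])
floats = double([0.5, 1.5, 2.5])
again = double([4, 5, 6])  # reuses the ("int", 3) variant
```

After these three calls the cache holds two variants, one per distinct signature, which is the behavior a just-in-time specializer relies on to amortize compilation cost.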

  • Multiple frameworks for specializer development
  • Multiple backend support: C, OpenCL, OpenMP
  • Automatic project generation
  • Travis Continuous Integration
  • Many example implementations
  • IPython Integration
  • Custom AST transformation GUI
  • Integrated energy monitor API
  • Specializer composition tools
  • Open source on GitHub
  • Complete set of docs

You can find code examples on the Berkeley SEJITS GitHub page (github/ucb-sejits), including the latest SEJITS framework (ctree) with many example specializers.

from numpy import *
from examples.stencil_grid.stencil_kernel import *
from examples.stencil_grid.stencil_grid import StencilGrid

import sys

alpha = 0.5
beta = 1.0

class LaplacianKernel(StencilKernel):
    def __init__(self, alpha, beta):
        super(LaplacianKernel, self).__init__()
        self.constants = {'alpha': alpha, 'beta': beta}

    def kernel(self, in_grid, out_grid):
        for x in in_grid.interior_points():
            out_grid[x] = alpha * in_grid[x]
            for y in in_grid.neighbors(x, 1):
                out_grid[x] += beta * in_grid[y]

nx = int(sys.argv[1])
ny = int(sys.argv[2])
nz = int(sys.argv[3])
input_grid = StencilGrid([nx, ny, nz])
output_grid = StencilGrid([nx, ny, nz])

for x in input_grid.interior_points():
    input_grid[x] = random.randint(nx * ny * nz)

laplacian = LaplacianKernel(alpha, beta)
for i in range(50):
    for x in input_grid.interior_points():
        input_grid[x] = random.randint(nx * ny * nz)
    laplacian.kernel(input_grid, output_grid)
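For checking results, the same computation can be written as a plain-Python reference. This sketch does not use the stencil specializer; it assumes that `interior_points()` skips a one-cell border and that `neighbors(x, 1)` yields the six face-adjacent points, neither of which is confirmed by the example above. The function name `laplacian_reference` is hypothetical.

```python
def laplacian_reference(grid, alpha, beta):
    """Apply the 7-point stencil to a 3D nested list; the border stays 0.0."""
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    out = [[[0.0] * nz for _ in range(ny)] for _ in range(nx)]
    # Assumed neighborhood: the six face-adjacent points at distance 1.
    offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
               (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            for k in range(1, nz - 1):
                total = alpha * grid[i][j][k]
                for di, dj, dk in offsets:
                    total += beta * grid[i + di][j + dj][k + dk]
                out[i][j][k] = total
    return out


# A 3x3x3 grid of ones has exactly one interior point.
ones = [[[1.0] * 3 for _ in range(3)] for _ in range(3)]
result = laplacian_reference(ones, alpha=0.5, beta=1.0)
center = result[1][1][1]  # 0.5 * 1 + 6 * (1.0 * 1)
```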

class ArrayOp(object):
    """
    A class for managing independent operation on elements
    in numpy arrays.
    """

    def __init__(self):
        """Instantiate translator."""
        self.c_apply_all = OpTranslator(get_ast(self.apply), "apply_all")

    def __call__(self, A):
        """Apply the operator to the arguments via a generated function."""
        return self.c_apply_all(A)

# ---------------------------------------------------------------------------
# User code

class Doubler(ArrayOp):
    """Double elements of the array."""

    @staticmethod
    def apply(n):
        return n * 2

/** Four-by-four multiply using a look-up table. */
class Mul extends Module {
  val io = new Bundle {
    val x = UInt(INPUT, 4)
    val y = UInt(INPUT, 4)
    val z = UInt(OUTPUT, 8)
  }
  val muls = new ArrayBuffer[UInt]()

  for (i <- 0 until 16)
    for (j <- 0 until 16)
      muls += UInt(i * j, width = 8)
  val tbl = Vec(muls)
  io.z := tbl((io.x << UInt(4)) | io.y)
}

/** A n-bit adder with carry in and carry out */
class Adder(val n:Int) extends Module {
  val io = new Bundle {
    val A = UInt(INPUT, n)
    val B = UInt(INPUT, n)
    val Cin = UInt(INPUT, 1)
    val Sum = UInt(OUTPUT, n)
    val Cout = UInt(OUTPUT, 1)
  }
  //create a vector of FullAdders
  val FAs = Vec.fill(n){ Module(new FullAdder()).io }
  val carry = Vec.fill(n+1){ UInt(width = 1) }
  val sum = Vec.fill(n){ Bool() }

  //first carry is the top level carry in
  carry(0) := io.Cin

  //wire up the ports of the full adders
  for (i <- 0 until n) {
    FAs(i).a := io.A(i)
    FAs(i).b := io.B(i)
    FAs(i).cin := carry(i)
    carry(i+1) := FAs(i).cout
    sum(i) := FAs(i).sum.toBool()
  }
  io.Sum := sum.toBits().toUInt()
  io.Cout := carry(n)
}

There are many resources available to get you started on SEJITS.

  • The CTree Framework is the latest SEJITS implementation. Its new features include an LLVM-IR backend, support for OpenCL, and the OpenTuner autotuning engine.

  • The Hindemith Framework is a framework for composing high-performance code from Python descriptions.

  • The ASP Framework is the original SEJITS framework, with support for CUDA and C backends.

  • Stencil Code -- A specializer that delivers a compact DSL for stencil grid calculations
  • Hindemith -- A specializer for optical flow and STAPP computation
  • akx -- A matrix powers specializer; it computes the vectors {x, Ax, A^2x, ..., A^kx}, which form a basis for a Krylov subspace
  • SVM -- A SEJITS implementation of a support vector machine using ctree
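
To make the akx entry concrete, the vectors {x, Ax, A^2x, ..., A^kx} can be computed naively in plain Python. This is a sketch of what the matrix powers computation produces, not of how the akx specializer computes it; the helper names `matvec` and `matrix_powers` are hypothetical.

```python
def matvec(A, x):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def matrix_powers(A, x, k):
    """Return [x, Ax, A^2 x, ..., A^k x] by repeated multiplication."""
    basis = [list(x)]
    for _ in range(k):
        # Each new vector is A applied to the previous one.
        basis.append(matvec(A, basis[-1]))
    return basis


A = [[2, 0],
     [1, 1]]
x = [1, 0]
basis = matrix_powers(A, x, 3)
```

Each step here is a separate matrix-vector product, so the cost grows linearly with k; a specializer is free to reorganize that work as long as it returns the same k + 1 vectors.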