Achieving near optimal performance in high level productivity languages
The SEJITS project provides tools and frameworks to bridge the gap between high level productivity languages and high performance hardware. Specialized kernels written with SEJITS can provide near optimal performance using information available only at run-time.
You can find code examples at the Berkeley SEJITS github/ucb-sejits page. The latest SEJITS framework (ctree) includes many example specializers.
from numpy import *
from examples.stencil_grid.stencil_kernel import *
from examples.stencil_grid.stencil_grid import StencilGrid
import sys

alpha = 0.5
beta = 1.0

class LaplacianKernel(StencilKernel):
    def __init__(self, alpha, beta):
        super(LaplacianKernel, self).__init__()
        self.constants = {'alpha': alpha, 'beta': beta}

    def kernel(self, in_grid, out_grid):
        for x in in_grid.interior_points():
            out_grid[x] = alpha * in_grid[x]
            for y in in_grid.neighbors(x, 1):
                out_grid[x] += beta * in_grid[y]

nx = int(sys.argv[1])
ny = int(sys.argv[2])
nz = int(sys.argv[3])
input_grid = StencilGrid([nx, ny, nz])
output_grid = StencilGrid([nx, ny, nz])

for x in input_grid.interior_points():
    input_grid[x] = random.randint(nx * ny * nz)

laplacian = LaplacianKernel(alpha, beta)
for i in range(50):
    for x in input_grid.interior_points():
        input_grid[x] = random.randint(nx * ny * nz)
    laplacian.kernel(input_grid, output_grid)
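For readers without the stencil framework at hand, the arithmetic of the kernel above can be sketched in plain NumPy. This is an illustration under assumptions, not the framework's implementation: it assumes `neighbors(x, 1)` yields the points one step away along each axis (six face neighbors in 3D) and that boundary points of the output stay at zero. The function name `laplacian_reference` is hypothetical.

```python
import numpy as np


def laplacian_reference(in_grid, alpha, beta):
    """out[x] = alpha * in[x] + beta * (sum of axis-aligned neighbors of x)."""
    out = np.zeros_like(in_grid)
    ndim = in_grid.ndim
    interior = (slice(1, -1),) * ndim  # every point not on the boundary
    out[interior] = alpha * in_grid[interior]
    for axis in range(ndim):
        for shift in (-1, 1):
            # Same interior window, moved one step along `axis`.
            shifted = tuple(
                slice(1 + shift, in_grid.shape[d] - 1 + shift)
                if d == axis else slice(1, -1)
                for d in range(ndim)
            )
            out[interior] += beta * in_grid[shifted]
    return out


grid = np.ones((4, 4, 4))
result = laplacian_reference(grid, alpha=0.5, beta=1.0)
```

On a grid of ones, each interior point receives `alpha` plus six times `beta`, which makes the sketch easy to check by hand.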
class ArrayOp(object):
    """
    A class for managing independent operation on elements
    in numpy arrays.
    """

    def __init__(self):
        """Instantiate translator."""
        self.c_apply_all = OpTranslator(get_ast(self.apply), "apply_all")

    def __call__(self, A):
        """Apply the operator to the arguments via a generated function."""
        return self.c_apply_all(A)


# ---------------------------------------------------------------------------
# User code

class Doubler(ArrayOp):
    """Double elements of the array."""
    def apply(n):
        return n * 2
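The step of turning a user-written `apply` into a generated `apply_all` can be sketched with Python's standard `ast` module. This is only an analogy for the pattern above and does not use ctree or its `OpTranslator`: it generates another Python function instead of C, and it takes the user code as a source string. The names `build_apply_all` and `USER_SOURCE` are hypothetical, and the sketch assumes the user function has one parameter and ends in a `return` statement.

```python
import ast

# Hypothetical user code, supplied as source text for this sketch.
USER_SOURCE = "def apply(n):\n    return n * 2\n"


def build_apply_all(source):
    """Build apply_all(A) that applies the user's expression to each element."""
    func = ast.parse(source).body[0]
    arg_name = func.args.args[0].arg  # the per-element parameter, e.g. "n"
    ret = func.body[-1]
    if not isinstance(ret, ast.Return):
        raise ValueError("expected the function to end with a return statement")

    # Template with placeholders for the element expression and loop variable.
    template = ast.parse("def apply_all(A):\n    return [None for _ in A]\n")
    comprehension = template.body[0].body[0].value
    comprehension.elt = ret.value  # splice in the user's expression
    comprehension.generators[0].target = ast.Name(id=arg_name, ctx=ast.Store())
    ast.fix_missing_locations(template)

    namespace = {}
    exec(compile(template, "<generated>", "exec"), namespace)
    return namespace["apply_all"]


apply_all = build_apply_all(USER_SOURCE)
doubled = apply_all([1, 2, 3])
```

The user writes only the per-element expression; the loop over the array comes from the generated function.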
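import Chisel._
import scala.collection.mutable.ArrayBuffer

/** Four-by-four multiply using a look-up table. */
class Mul extends Module {
  val io = new Bundle {
    val x = UInt(INPUT, 4)
    val y = UInt(INPUT, 4)
    val z = UInt(OUTPUT, 8)
  }
  val muls = new ArrayBuffer[UInt]()

  // Precompute all 16 x 16 products as 8-bit constants.
  for (i <- 0 until 16)
    for (j <- 0 until 16)
      muls += UInt(i * j, width = 8)
  val tbl = Vec(muls)

  // Index the table with x in the high four bits and y in the low four bits.
  io.z := tbl((io.x << UInt(4)) | io.y)
}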
import Chisel._

/** An n-bit adder with carry in and carry out */
class Adder(val n: Int) extends Module {
  val io = new Bundle {
    val A    = UInt(INPUT, n)
    val B    = UInt(INPUT, n)
    val Cin  = UInt(INPUT, 1)
    val Sum  = UInt(OUTPUT, n)
    val Cout = UInt(OUTPUT, 1)
  }

  // create a vector of FullAdders (FullAdder is not shown here)
  val FAs   = Vec.fill(n){ Module(new FullAdder()).io }
  val carry = Vec.fill(n+1){ UInt(width = 1) }
  val sum   = Vec.fill(n){ Bool() }

  // first carry is the top level carry in
  carry(0) := io.Cin

  // wire up the ports of the full adders
  for (i <- 0 until n) {
    FAs(i).a   := io.A(i)
    FAs(i).b   := io.B(i)
    FAs(i).cin := carry(i)
    carry(i+1) := FAs(i).cout
    sum(i)     := FAs(i).sum.toBool()
  }
  io.Sum  := sum.toBits().toUInt()
  io.Cout := carry(n)
}
There are many resources available to get you started on SEJITS.
The CTree Framework is the latest SEJITS implementation. Its new features include an LLVM-IR backend, support for OpenCL, and the OpenTuner autotuning engine.
The Hindemith Framework is a framework for composing high-performance code from Python descriptions.
The ASP Framework is the original SEJITS framework, with support for CUDA and C backends.
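The autotuning mentioned above can be pictured as an empirical search: run several variants of the same computation, time each one, and keep the fastest. The sketch below shows that loop in plain Python with the standard `timeit` module. It does not use the OpenTuner API, and the names `autotune` and `blocked_sum`, as well as the candidate block sizes, are hypothetical choices for illustration.

```python
import timeit


def blocked_sum(values, block):
    """Sum a list in chunks of `block` elements (the tunable parameter)."""
    total = 0
    for start in range(0, len(values), block):
        total += sum(values[start:start + block])
    return total


def autotune(candidates, run, repeats=3):
    """Time run(candidate) for each candidate and return the fastest one."""
    timings = {}
    for candidate in candidates:
        # Best of several runs reduces the effect of timing noise.
        timings[candidate] = min(
            timeit.repeat(lambda: run(candidate), number=1, repeat=repeats)
        )
    best = min(timings, key=timings.get)
    return best, timings


data = list(range(10_000))
best_block, timings = autotune(
    [16, 256, 4096], lambda block: blocked_sum(data, block)
)
```

Which block size wins depends on the machine and the input, which is the reason to measure rather than fix the value in advance.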
An Extensible Framework for Composing Stencils with Common Scientific Computing Patterns by Leonard Truong, Chick Markley, Armando Fox. Conference Workshop on Stencil Computations 2014.
Communication-Optimal Parallel Recursive Rectangular Matrix Multiplication by James Demmel, David Eliahu, Armando Fox, Shoaib Kamil, Benjamin Lipshitz, Oded Schwartz, and Omer Spillinger. Proc. IPDPS 2013.
Bootstrapping Big Data in the Cloud by Peter Birsinger, Richard Xia, Armando Fox. Proc. 2013 SIAM Conference on Computational Science and Engineering.
High-Productivity and High-Performance Analysis of Filtered Semantic Graphs by Aydin Buluç, Erika Duriakova, Armando Fox, John Gilbert, Shoaib Kamil, Adam Lugowski, Leonid Oliker, Samuel Williams. Proc. IPDPS 2013.
Productive High Performance Parallel Programming with Auto-tuned Domain-Specific Embedded Languages by Shoaib Kamil. UC Berkeley EECS Department Technical Report #2013-1.
Parallel High Performance Bootstrapping in Python by Aakash Prasad, David Howard, Shoaib Kamil, Armando Fox. Proc. 11th Python in Science Conference (SciPy 2012).
Bringing parallel performance to python with domain-specific selective embedded just-in-time specialization by Shoaib Kamil, Derrick Coetzee, Armando Fox. Proc. 10th Python in Science Conference (SciPy 2011).
SEJITS: Getting Productivity And Performance With Selective Embedded JIT Specialization by Bryan Catanzaro, Shoaib Kamil, Yunsup Lee, Krste Asanovic, James Demmel, Kurt Keutzer, John Shalf, Kathy Yelick, Armando Fox. Proc. 2009 Workshop on Programming Models for Emerging Architectures (PMEA 2009).