blog Latest

Matrix Multiplication On GPU: Part 2, Tiling

Breaking down large matrix multiplications into tiles

8 min read

4 Oct 24

blog

Matrix Multiplication on GPU: Faster than Nvidia, Sometimes

10 min read

1 Oct 24

blog

Gradients of Softmax and Logsumexp

Essential functions for categorical distributions and attention mechanisms in machine learning

4 min read

4 May 24

blog

C++: Pattern Matching Template Types

How to check whether a template type matches a pattern, using something like is_like_v<T, vector<int,_>>.

6 min read

2 Mar 24

blog

C++: Overloading the Spaceship Operator, A Recipe

How to overload the three-way comparison (spaceship) operator<=>, and a reminder to overload operator== as well.

3 min read

9 Feb 24

blog

C++: Check if a type is an instantiation of a given class template

How to implement an is_instance_of type trait.

3 min read

5 Feb 24

blog

C++: Forwarding references, overload resolution, and taking back control

Consider merging overloads into one function with forwarding reference parameters

5 min read

27 Jan 24

blog

C++: Disable implicit conversion in specific contexts only

You get one implicit conversion, so burn it with a wrapper

2 min read

14 Nov 23

blog

C++: Revisiting combinatorial instantiation of templates with std::variant

Compiler optimizations can break it, function attributes can fix it.

3 min read

16 Aug 23

blog

C++: Combinatorial instantiation of templates with std::variant

An alternative to explicit instantiations and macros.

3 min read

11 Jun 23

blog

C++: Avoiding Argument Dependent Lookup

A little trick using an extra namespace and cross-import.

2 min read

30 Apr 23

blog

Sums of Discrete Random Variables as Banded Matrix Products

Zero-stride catch and a custom CUDA kernel.

10 min read

16 Mar 23