I want to share some tricks for making Matlab function more efficient and robust. I decide to write a series of blog posts. This is the first one of this series, in which I want to show a simple function for computing pairwise Euclidean distances between points in high dimensional vector space.
function D = dumDistance(X,Y) n1 = size(X,2); n2 = size(Y,2); D = zeros(n1,n2); for i = 1:n1 for j = 1:n2 D(i,j) = sum((X(:,i)-Y(:,j)).^2); end end
where, is column vector of squared norms of all vectors in X. We can directly implement this by a one-liner:
function D = sqDistance(X, Y) D = bsxfun(@plus,dot(X,X,1)',dot(Y,Y,1))-2*(X'*Y);
In my rough tests, the new code is about 60+ times faster than the old one for random dense matrices. For sparse matrices, the gap is even larger. That is just the power of vectorization.
If you want to compute the pairwise distances between all point pairs in a point set, you can simply replace the Y with X in the function. Matlab will do extra optimization for the matrix product like X’*X, where only half of the computation is done. A by-product is that the result matrix D is ensured to be symmetric, which usually cannot be guaranteed due to numerical representation of float number.
The takeaway of this post might be:
(2) Matrix multiplication like M=X’*X is efficient (cost half time of ordinary matrix multiplication) and D is guaranteed to be symmetric.
(3) Write your algorithm in Matrix form before writing the code.