In this video, we’ll apply the divide and

conquer algorithm design paradigm to the problem of multiplying matrices. This will

culminate in the study of Strassen’s Matrix Multiplication Algorithm. And this

is a super cool algorithm for two reasons. First of all, Strassen’s Algorithm is

completely non-trivial. It is totally non-obvious, very clever, and not at all clear

how Strassen ever came up with it. The second cool feature is that it's for such a

fundamental problem. Computers, for as long as they've been in use, from the time they
were invented up until today, have spent a lot of their cycles multiplying matrices,
because it just comes up all the time in important applications. So let me
first just make sure we're all clear on what the problem is:

Multiplying two matrices. So, we’re gonna be interested in three matrices, X, Y, and

Z. They’re all gonna, I’m gonna assume they all have the same dimensions, N by N.

The ideas we’ll talk about are also relevant for multiplying non square

matrices, but we’re not gonna discuss it in this video. The entries in these

matrices, you know, you could think of them as whatever you want. Maybe they're
integers, maybe they're rationals, maybe they're from some finite field. It

depends on the application. But the point is, they’re just entries that we can add

and multiply. So how is it that you take two N by N matrices, X and Y, and multiply

them producing a new N by N matrix, Z? Well, recall that the IJ entry of Z, that

means the entry in the Ith row and Jth column, is simply the dot product of the

Ith row of X with the Jth column of Y. So if the IJ entry was this red square
over in the Z matrix, that would be derived from the corresponding

row of the X matrix, and the corresponding column of the Y matrix. And recall what I

mean by dot product. That just means you take the products of the individual

components, and then add up the results. So ultimately, the ZIJ entry boils down to
a sum of N terms, where each constituent product is just the IK entry of the
matrix X times the KJ entry of the matrix Y, with K ranging from one to N.
So that's how ZIJ is defined for a given pair of

indices, I and J. One thing to note is that where we often use N to denote the input
size, here we're using N to denote the dimension of each of these matrices. The
input size is not N; the input size is quite a bit bigger than N. Specifically,
each of these is an N by N matrix containing N squared entries. Since
presumably we have to read the input, which has size N squared, and produce the
output, which also has size N squared, the best we can really hope for from a
matrix multiplication algorithm is a running time of N squared. So the
question is how close we can get to it. Before we talk about algorithms for matrix

multiplication, let me just make sure we’re all crystal clear on exactly what

the problem is. So let's actually spell out what would be the result of
multiplying two generic 2×2 matrices. We'll give the first one entries A, B, C, and D,
where these four entries could be anything, and then we're multiplying by a second 2×2
matrix with entries E, F, G, and H. Now, what's the result of multiplying

these? Again, it's going to be a 2×2 matrix, and each entry is just the

corresponding dot product of the relevant row of the first matrix and column of the

second matrix. So to get the upper left entry, you take the dot product of the
top row of the first matrix and the left column of the
second matrix. That results in AE + BG. To get the upper right entry, we
take the dot product of the top row of the first matrix with the right column of the
second matrix, which gives us AF + BH. And then, filling in the other entries the same
way, we get CE + DG and CF + DH. You know, so that's multiplying two matrices, and we've

already discussed the definition in general. Now, suppose you had to write a

program to actually compute the result of multiplying two N by N matrices. One
natural way to do that would just be to return to the definition, which defines
each of the N squared entries in the Z matrix as a suitable sum of products of
entries of the X and Y matrices. So on the next quiz, I'd like

you to figure out exactly what the running time of that algorithm would be, as a

function of the matrix dimension N where as usual we count the addition or

multiplication of two individual entries as a constant time operation. So the

correct response to this quiz is the third answer: the
straightforward iterative algorithm runs in cubic time relative to the matrix
dimension N. To see this, let's just recall what the definition of matrix

multiplication was. The definition tells us each entry ZIJ of the output matrix Z
is defined as the sum from K = 1 to N of XIK times YKJ. That is the dot
product of the Ith row of the X matrix and the Jth column of the Y matrix.
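As a concrete illustration (a minimal sketch of my own, not code from the lecture; the function name is illustrative), this definition translates directly into a triple for-loop:

```python
# Straightforward matrix multiplication from the definition:
# Z[i][j] is the dot product of row i of X and column j of Y.
# Matrices are plain lists of lists; assumes X and Y are both n-by-n.

def naive_matmul(X, Y):
    n = len(X)
    Z = [[0] * n for _ in range(n)]
    for i in range(n):          # n choices of row index i
        for j in range(n):      # n choices of column index j
            for k in range(n):  # n terms in each dot product
                Z[i][j] += X[i][k] * Y[k][j]
    return Z
```

For example, `naive_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])` returns `[[19, 22], [43, 50]]`, matching the 2×2 expansion worked out earlier. The three nested loops, each running N times, make the cubic count explicit.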

We're certainly assuming that we have the matrices represented in a way that lets us
access a given entry in constant time, and under that assumption, remember, each of
these products takes only constant time. So to compute ZIJ
we just have to add up these N products, which is gonna be theta of N time to
compute a given ZIJ, and there are N squared entries that we have to
compute. There are N choices for I and N choices for J, so that gives us N squared
times N, or cubic running time overall, for the natural algorithm, which is really
just a triple for-loop that computes each entry of the output matrix separately
using the dot product. So the question, as always for the keen algorithm designer, is:

Can we do better? Can we beat N cubed time for multiplying two matrices? And we

might be especially emboldened with the progress that we’ve already seen in terms

of multiplying two integers. We applied the divide and conquer paradigm to the problem
of multiplying two N-digit integers, and we had both a naive recursive algorithm and
a seemingly better algorithm, due to Gauss, which made only three
recursive calls. Now, we haven't yet analyzed the running time of that

algorithm. But as we’ll see later, that does indeed beat the quadratic running

time of the grade school algorithm. So it’s very natural to ask, can we do

exactly the same thing here? There's the obvious iterative algorithm, which
follows straight from the definition. Perhaps, analogous to Gauss, we could
have some clever divide and conquer method which beats cubic time. So that's
what we're gonna explore next. Let's recall the divide and conquer

paradigm: what does it mean to use it? Well, we first have to identify smaller
subproblems. So if we want to multiply two N by N matrices, we have to identify
multiplications of smaller matrices that we can solve recursively. Once we've

figured out how we want to divide the given problem into smaller ones, then in

the conquer step we simply invoke our algorithm recursively to
multiply the smaller matrices together. And then, in general, we'll have
to combine the results of the recursive calls to get the solution for the original
problem; in our case, to get the product of the original two matrices from the
products of whatever sub-matrices we identify. So how would we apply the divide

and conquer paradigm to matrices? So we’re given two N by N matrices, and we have

to somehow identify smaller pairs of square matrices that we can multiply

recursively. So the idea, I think, is fairly natural. We start with a big N by N
matrix X, and those N rows and N columns have to somehow be divided
into smaller pieces. Now, the first thing you might think of is to split it
into its left half and its right half, analogous to what we've been doing
with arrays. But then we'd break X into two matrices which are no
longer square, which are N over two in one dimension and have length N in the other
dimension, and we want to recursively call a subroutine that multiplies square
matrices. So what seems like the clear thing to do is to divide X into quadrants.
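This splitting step can be sketched as follows (a hypothetical helper of my own, assuming N is even and matrices are lists of lists):

```python
# Split an n-by-n matrix (n even) into four (n/2)-by-(n/2) quadrants.
# Returns the blocks in the order A (top-left), B (top-right),
# C (bottom-left), D (bottom-right), matching the lecture's labels.

def split_quadrants(M):
    h = len(M) // 2
    A = [row[:h] for row in M[:h]]   # top-left block
    B = [row[h:] for row in M[:h]]   # top-right block
    C = [row[:h] for row in M[h:]]   # bottom-left block
    D = [row[h:] for row in M[h:]]   # bottom-right block
    return A, B, C, D
```

For a 2×2 matrix each quadrant is a single entry; for larger even N, each quadrant is itself a square matrix, which is exactly what lets us recurse.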

Okay, so we have four pieces of X. Each is gonna be N over two by N over two,

corresponding to the different quarters of this matrix. So let’s call these different

quadrants or blocks, in matrix terminology, A, B, C, and D. All of these

are N over two by N over two matrices. As usual, for simplicity, I’m assuming that N

is even, and as usual, it doesn’t really matter. And we can do the same trick with

Y. So we’ll divide Y into quadrants. And number two by N number two matrices which

we’ll call E, F, G and H. So one thing that’s cool about matrices, is when you

split them into blocks, and you multiply them, the blocks just behave as if they

were atomic elements. So what I mean by that is that the product of X and Y can be

expressed in terms of its quadrants and each of its four quadrants, each of its

four corners can be written as a suitable arithmetic expression of the quadrants of

X and Y. So here’s exactly what those formulas are. They are exactly analogous

to when we just multiplied pair of two by two matrices. So I’m not going to formally

prove this fact. I'm sure many of you have seen it before or are familiar with

it. And if you haven’t, it’s actually quite easy to prove. It’s not obvious, you

can’t see it off the top of your head, necessarily. But if you go back to the

definition, it’s quite easy to verify. The D, when you multiple X and Y, you can

express as quadrants to blocks, in terms of the blocks of X and Y. So what we just

did is completely analogous to what we did for integer multiplication, when we
wanted to multiply two integers, little x and little y, and we broke each of them
into a pair of N over two digit numbers. Then we just took the expansion and observed
how that expansion could be written in terms of products of N over two digit
numbers. The same thing is going on here, except with matrices. So now, we're in business,

as far as a recursive approach. We wanna multiply X and Y. They’re N by N matrices.

We recognize we can express that product X times Y, in terms of the products of N

over two by N over two matrices, things we're able to multiply recursively, plus
additions. And additions are clearly easy: to add two matrices
with, say, N squared entries each takes time linear in the number of
entries. So it's gonna be theta of N squared time to add two matrices that are N by
N. So this immediately leads us to our first recursive algorithm. To describe it,

let me quickly rewrite that expression we just saw on the previous slide. And now,

our first recursive algorithm is simply to evaluate all of these expressions in the

obvious way. So specifically, in step one, we recursively compute all of the

necessary products, and observe that there are eight products that we have to

compute. Eight products of N over two by N over two matrices. There are four entries

in this expansion of X times Y. Each of the blocks is the sum of two

products, and none of the products recur; they're all distinct. So,
naively, if you wanna evaluate this, we have to compute eight different products of N over
two by N over two matrices. Once those recursive calls complete, all we do
is the four necessary additions. As we discussed, that takes time proportional to
the number of entries in a matrix. So this is gonna take quadratic time overall:
quadratic in N, linear in the number of entries. Now, the question you

should be asking is: is this a good algorithm? Did this recursive approach buy us
anything: splitting X and Y into these blocks, expanding the product
in terms of these blocks, then recursively computing each of the blocks? And I have
to say, it's totally not obvious; it is not clear what the running time of this
recursive algorithm is. I'm going to go ahead and give you a spoiler, which is
going to follow from the master method that we'll talk about in the next lecture.
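To make this first recursive algorithm concrete, here's a minimal sketch of my own (not the lecture's code; assumes N is a power of two and matrices are lists of lists, with illustrative helper names):

```python
# First recursive algorithm: eight recursive products of
# (n/2)-by-(n/2) blocks, plus linear-in-the-entries addition work.

def add(M, N):
    # entrywise sum of two equally sized matrices
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(M, N)]

def split(M):
    # quadrants A (top-left), B (top-right), C (bottom-left), D (bottom-right)
    h = len(M) // 2
    return ([row[:h] for row in M[:h]], [row[h:] for row in M[:h]],
            [row[:h] for row in M[h:]], [row[h:] for row in M[h:]])

def combine(TL, TR, BL, BR):
    # reassemble a matrix from its four quadrants
    top = [r1 + r2 for r1, r2 in zip(TL, TR)]
    bot = [r1 + r2 for r1, r2 in zip(BL, BR)]
    return top + bot

def rec_matmul(X, Y):
    n = len(X)
    if n == 1:                        # base case: 1x1 matrices
        return [[X[0][0] * Y[0][0]]]
    A, B, C, D = split(X)
    E, F, G, H = split(Y)
    # eight recursive calls, one per product in the block expansion
    return combine(add(rec_matmul(A, E), rec_matmul(B, G)),
                   add(rec_matmul(A, F), rec_matmul(B, H)),
                   add(rec_matmul(C, E), rec_matmul(D, G)),
                   add(rec_matmul(C, F), rec_matmul(D, H)))
```

Note the shape of the recursion: eight calls on half-dimension inputs, plus quadratic work outside the calls, which is exactly the pattern whose running time the master method will pin down.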

But it turns out that with this kind of recursive algorithm, where you do eight
recursive calls, each on a problem with dimension half as much as what you
started with, and then do quadratic work outside the recursive calls, the running
time is going to be cubic. So exactly the same as with the straightforward iterative

algorithm that follows from the definition, and that was clearly cubic. This one,
although it's not obvious, turns out to be cubic as
well. So: no better, no worse than the straightforward iterative algorithm. Now, in
case you're feeling disappointed that we went through all this work with this
seemingly clever divide and conquer approach for matrix multiplication, and
came out at the end no better than the iterative algorithm. Well, there's

really no reason to despair, 'cause remember, back in integer multiplication, we had a
straightforward recursive algorithm where we had to do four recursive calls,
products of N over two digit numbers. But then we had Gauss's trick, which said:
if we only do more clever products and more clever additions and
subtractions, then we can get away with only three recursive calls. And we'll see
later that that does indeed yield significant savings in the time needed for integer
multiplication. So far, we've done nothing analogous to Gauss's trick for the
matrix multiplication problem. All we did is the naive expansion in terms of
sub-matrices, and the naive evaluation of the resulting expressions. So the $64,000

question is then: can we do something clever to reduce the number of recursive
calls from eight down to something lower? And that is where Strassen's algorithm

comes in. So at a high level, Strassen’s algorithm has two steps, just like the

first recursive algorithm that we discussed. In step one, it recursively computes some
products of smaller, N over two by N over two, matrices. But
there are only going to be seven of them, and they will be much less
straightforward, much more cleverly chosen, than in the first
recursive algorithm. And step two, then, is to actually produce the

product of X and Y, to produce each of those four blocks that we saw, with suitable
additions and subtractions of these seven products. And again, these are much less
straightforward than in the first recursive algorithm. And while
the additions and subtractions involved will be a little more numerous than they
were in the naive recursive algorithm, it's only gonna change the work in that
part of the algorithm by a constant factor. So we'll still spend only theta
of N squared work adding and subtracting things. And we get a huge win in

decreasing the number of recursive calls from eight to seven. Now, just in case you
have the intuition that shaving off one recursive call should only decrease
the running time of an algorithm by one-eighth, by 12.5 percent, in fact it has a
tremendously more amplified effect, because we save one recursive call over and
over and over again as we keep recursing in the algorithm. So it makes a

fundamental difference in the eventual running time of the algorithm, as we’ll

explore in detail in the next set of lectures, when we discuss the master

method. So again, a bit of a spoiler alert: what you're gonna see in the next
set of lectures is that Strassen's algorithm does indeed beat cubic time; it's better
than N cubed. You'll have to watch the next set of lectures to see exactly what
the running time is, but for now, know that saving that one recursive call is what
changes the running time from cubic to subcubic. Now, 1969 was obviously quite a bit before my time,

but by all accounts, from people I've talked to who were around then, and from,

you know, what the books say, Strassen’s Algorithm totally blew people’s minds at

the time. Everybody was assuming that there’s no way you could do better than

the iterative algorithm, the cubic algorithm. It just seemed that matrix

multiplication intuitively fundamentally required all of the calculations that are

spelled out in the definition. So Strassen's Algorithm is an early glimpse
of the magic, and of the power, of clever algorithm design: that if you have
serious ingenuity, even for super fundamental problems, you can come up with
fundamental savings over the more straightforward solutions. So those are
the main points I wanted to make about Strassen's Algorithm: how you can beat
cubic time by saving a recursive call, using suitably chosen clever products and
clever additions and subtractions. Maybe a few of you are wondering, you know, what are

these cleverly chosen products? Can you really do this? And I don’t blame you.

There’s no reason to believe me, just cuz I sort of spelled out this [inaudible]

idea. It’s not obvious this should work. You might actually want to see the

products. So, for those of you like that, this last slide is for you. So here is

Straus’ algorithm in it’s somewhat gory detail. So let me tell you what the seven

products are that we are going to form. I’m going to label them P1 through P7 and

they’re all going to be defined in terms of the blocks of the inter-matrices X and

Y. So let me just remind you that we think of X in terms of it’s blocks, A B C D. And

we think of Y in terms of its blocks E, F, G, H. And remember, A through H are all N

over two by N over two sub-matricies. So here are the seven products, P1 through

P7. P1 is A times the quantity F minus H. P2 is the quantity A plus B, times H. P3 is
the quantity C plus D, times E. P4 is D times the quantity G minus E. P5 is the
quantity A plus D, times the quantity E plus H. P6 is the quantity B minus D, times
the quantity G plus H. And finally, P7 is the quantity A
minus C, times the quantity E plus F. So I hope you'll agree that these are indeed only seven products,

and we could compute these with seven recursive calls, after preprocessing with a
little bit of additions and subtractions: we have to compute F minus H, A plus B, C
plus D, and so on. We compute all these new matrices from the blocks, and we can then
recursively, with seven recursive calls, compute these seven products, each operating on N

over two by N over two matrices. Now, the question is, why is this useful? Why on
Earth would we wanna know the values of these seven products? So the amazing
other part of the algorithm is that from just these seven products, we can, using
only additions and subtractions, recover all four of the blocks of X times Y.
You'll recall we expanded X times Y in terms of the blocks. So we previously computed

this to be AE + BG in the upper left corner, with analogous expressions for the upper
right, lower left, and lower right blocks. So this we already know. So the content of

the claim is that these four blocks also arise from the seven products in the

following way. So the claim here is that these two different expressions for X

times Y are exactly the same, and they're the same block by block. So in other
words, the claim is this: the crazy expression P5 plus P4 minus P2
plus P6, where those are four of the products we listed above, is
precisely AE plus BG. Similarly, we're claiming that P1 plus P2 is exactly
AF plus BH; that's actually easy to see. P3 plus P4 is CE plus DG; that's also easy
to see. And then the other tricky one is that P1 plus P5 minus P3 minus P7 is
exactly the same as CF plus DH. So all four of those hold. And just so you
believe me, 'cause I don't know why you would believe me unless I actually showed you
some of this derivation, let's just look at the proof of one of the cases: the

upper left corner. So that is, let’s just expand out this crazy expression.

P5 + P4 - P2 + P6: what do we get? Well, from P5, we get AE + AH + DE + DH. We add P4,
so that gives us plus DG minus DE. Then we subtract P2, so that gives us
minus AH minus BH. And then we add in P6, which gives us plus BG plus BH minus DG minus
DH. Okay, so what happens next? Now we look for cancellations. We cancel
the AH's, we cancel the DE's, we cancel the DH's, we cancel the DG's, we cancel
the BH's. And holy cow, what do we get? We get AE plus BG. That is, we get
exactly what we were supposed to: the upper left block of X times Y. So we just

actually verified that this equation holds for the upper left block. It's quite easy
to see that it holds for the upper right and lower left blocks, and a comparable
calculation verifies it for the lower right block. So, summarizing:
because this claim holds, because we actually can recover the four blocks
of X times Y from the seven products, Strassen's algorithm works in the following
way. You compute the seven products, P1 through P7, using seven recursive calls.
Then you just compute the four blocks using some extra additions and
subtractions, as shown in the claim. So: seven recursive calls on N over two by
N over two matrices, plus theta of N squared work to do the necessary additions and
subtractions. As we'll see in the master method lecture, that is actually sufficient for subcubic time.
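Putting the whole thing together, here's a minimal sketch of Strassen's recursion as just described (my own illustration, not the lecture's code; assumes N is a power of two and matrices are plain lists of lists):

```python
# Strassen's algorithm: seven recursive products P1..P7 on
# (n/2)-by-(n/2) blocks, then additions and subtractions to
# recover the four blocks of X times Y.

def add(M, N):
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(M, N)]

def sub(M, N):
    return [[a - b for a, b in zip(r1, r2)] for r1, r2 in zip(M, N)]

def split(M):
    # quadrants in the order top-left, top-right, bottom-left, bottom-right
    h = len(M) // 2
    return ([row[:h] for row in M[:h]], [row[h:] for row in M[:h]],
            [row[:h] for row in M[h:]], [row[h:] for row in M[h:]])

def combine(TL, TR, BL, BR):
    return [r1 + r2 for r1, r2 in zip(TL, TR)] + \
           [r1 + r2 for r1, r2 in zip(BL, BR)]

def strassen(X, Y):
    n = len(X)
    if n == 1:
        return [[X[0][0] * Y[0][0]]]
    A, B, C, D = split(X)
    E, F, G, H = split(Y)
    # the seven cleverly chosen products
    P1 = strassen(A, sub(F, H))
    P2 = strassen(add(A, B), H)
    P3 = strassen(add(C, D), E)
    P4 = strassen(D, sub(G, E))
    P5 = strassen(add(A, D), add(E, H))
    P6 = strassen(sub(B, D), add(G, H))
    P7 = strassen(sub(A, C), add(E, F))
    # recover the four blocks of X * Y via the claim
    TL = sub(add(add(P5, P4), P6), P2)   # = AE + BG
    TR = add(P1, P2)                     # = AF + BH
    BL = add(P3, P4)                     # = CE + DG
    BR = sub(sub(add(P1, P5), P3), P7)   # = CF + DH
    return combine(TL, TR, BL, BR)
```

For example, `strassen([[1, 2], [3, 4]], [[5, 6], [7, 8]])` returns `[[19, 22], [43, 50]]`, agreeing with the 2×2 product worked out earlier. A practical implementation would cut over to the iterative algorithm below some base-case size rather than recursing all the way down to 1×1 matrices.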

Now, I sympathize with you if you have the following question: how on Earth
did Strassen come up with this? And indeed, this sort of illustrates the
difference between checking somebody's proof and coming up with a proof. Given

that I told you the magical seven products and how you, from them, can recover the

four desired blocks of X times Y, it's really just mechanical to see that it
works. It's a totally different story how you come up with P1 through P7 in the

first place. So how did Strassen come up with them? Honestly, your guess is as good

as mine.
