Oldest known version of this page was edited on 2005-10-30 04:32:28 by RichardBerg
(originally posted here)
Hi Glen, glad to see MacSTL is active -- I like what I've seen so far. I don't have an SSE3-capable machine in front of me for testing, so don't take anything here as gospel when implementing the canonical vector library of record. As you probably know, unaligned loads are rarely the Right Thing(tm) to begin with...
1) Yes. The P4 doesn't have μops for any unaligned vector loads. They didn't "upgrade" MOVDQU to the new behavior because that would destabilize existing code. (With the new behavior, it's possible for another thread to change the CPU/memory state in between the two underlying loads, which authors of code using MOVDQU would not have anticipated.)
2) One would hope that simple memcpy-style streaming could be done with aligned operations. The speed gained on the memory operations will outweigh the cost of the extra edge-case logic even at quite small data sizes. That said, if you must go unaligned, I don't know which op would fare better. My guess is it depends on a lot of minute factors and would need very careful profiling. Barring that investment, you may as well stick with MOVDQU, considering it's supported on a much wider range of processors.
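To make the "extra edge-case logic" concrete, here's a minimal sketch in portable C of how a byte range splits around 16-byte boundaries: a short unaligned head, a run of whole aligned blocks, and a leftover tail. The type and function names are made up for illustration (they're not from MacSTL or any library), and it only looks at one buffer; a real copy also has to deal with the relative alignment of source and destination.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: how a byte range divides around 16-byte boundaries. */
typedef struct {
    size_t head;   /* bytes before the first 16-byte boundary       */
    size_t blocks; /* whole 16-byte blocks that can be done aligned */
    size_t tail;   /* leftover bytes after the last whole block     */
} split16;

/* Hypothetical helper: split `len` bytes starting at address `addr`. */
static split16 split_for_aligned_ops(uintptr_t addr, size_t len)
{
    split16 s;
    /* Distance from addr up to the next 16-byte boundary (0 if already aligned). */
    size_t to_boundary = (size_t)((16 - (addr & 15)) & 15);

    /* A very short range may end before it ever reaches a boundary. */
    s.head   = to_boundary < len ? to_boundary : len;
    s.blocks = (len - s.head) / 16;
    s.tail   = (len - s.head) % 16;

    /* The three pieces must account for every byte exactly once. */
    assert(s.head + s.blocks * 16 + s.tail == len);
    return s;
}
```

For a 100-byte buffer starting at address 60, this gives a 4-byte head, six aligned blocks and no tail, so only the first 4 bytes need special handling.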
3) Because it's 2 DQW-aligned loads instead of 2 DW-aligned loads.
| | | |*|*|*|*| | the DQW I want
| | | |A|A|B|B| | what MOVDQU grabs - note AA crosses cache lines
|A|A|A|A|B|B|B|B| what LDDQU grabs - each load is aligned
48      64      80   (offset '64' is a cache line boundary)
At least, that's how I understand it. I don't have a P4.
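The arithmetic behind the diagram can be sketched in a few lines of portable C. This is only my reading of the picture: each cell is taken to be 4 bytes, the wanted DQW starts at offset 60, and the 64-byte cache line size is an assumption that matches the boundary drawn at offset 64. The helper names are hypothetical, and nothing here issues an actual MOVDQU or LDDQU.

```c
#include <assert.h>
#include <stdint.h>

#define DQW_BYTES        16u
#define CACHE_LINE_BYTES 64u  /* assumption: matches the boundary shown at offset 64 */

/* Hypothetical helper: start of the aligned 16-byte block containing addr. */
static uintptr_t dqw_floor(uintptr_t addr)
{
    return addr & ~(uintptr_t)(DQW_BYTES - 1);
}

/* Hypothetical helper: 1 if `len` bytes starting at addr straddle a
   cache line boundary, 0 otherwise. */
static int crosses_cache_line(uintptr_t addr, unsigned len)
{
    assert(len > 0);
    return (addr / CACHE_LINE_BYTES) != ((addr + len - 1) / CACHE_LINE_BYTES);
}
```

With the wanted DQW at offset 60, dqw_floor(60) is 48, so the two aligned loads cover 48..63 and 64..79 and neither straddles a line. An 8-byte load starting at 60 (the "AA" half in the MOVDQU row) covers 60..67 and does cross the boundary at 64.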
4) I don't think the extra loads will be "factored out," no. The only documented short-circuit comes if the addresses you're asking for are already DQW-aligned, in which case LDDQU *should* degrade gracefully to MOVDQA. (That is, in terms of the μops seen by the back end; there's no guarantee it will have the same decode throughput.) Whether the difference will be made moot by the cache, who knows... Since it's not atomic and cache pressure is a huge component of streaming perf, I don't recommend trying it except for fun.