Notes about vectorization in CPUs
Published:
Some notes on vectorization and its connection to CPU clock rate.
Published:
Summarising results from my experiments in transferring data (PyTorch tensors) between GPUs in a fully parallel fashion, i.e. without blocking the host CPU or the sender and receiver GPUs.
Published:
Estimating the cache line size of a CPU (Apple M1 in this case) with a small experiment in Python.
Published:
Understanding, through a small experiment, how PyTorch (almost) always runs GPU computations asynchronously.
Published:
Notes on some possible cases of in-place operations on tensors in PyTorch.
Published:
A post on two different ways to think about matrix-vector multiplication, each useful in different contexts.
Published:
A guide to thinking about what happens when one calls tensor.backward() in PyTorch.