Problem Description
Datacenter workloads have very large code footprints: the number of distinct instruction cache lines they touch is far larger than the L1 instruction cache (L1I), and often the L2 cache, can hold. As a result, instruction fetches miss frequently, and each such miss can take tens to hundreds of cycles to be served from L3 or main memory. While a line is being fetched from these more distant levels of the memory hierarchy, the decode queue often runs empty and the decode stage stalls for lack of instructions. This is termed Decode Starvation. My aim was to optimize processors so that decode starvation is minimized.
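To make the notion of decode starvation concrete, here is a minimal toy model in Python. It is only an illustrative sketch, not the simulator or methodology used in this work: the function name `count_starved_cycles`, its parameters, and the single-outstanding-fetch front end are all assumptions made for the example. Each entry in `line_latencies` is the number of cycles (at least 1) needed to fetch one instruction line, and a cycle counts as starved when the decode queue is empty.

```python
from collections import deque


def count_starved_cycles(line_latencies, insts_per_line=4,
                         decode_width=2, queue_capacity=8):
    """Toy front end (hypothetical model): one line fetch in flight at a time.

    Returns (total_cycles, starved_cycles), where a starved cycle is one in
    which the decode stage finds the decode queue empty.
    """
    pending = deque(line_latencies)  # fetch latency of each upcoming line
    queue = 0                        # instructions waiting in the decode queue
    in_flight = False                # is a line fetch currently outstanding?
    remaining = 0                    # cycles until the in-flight line arrives
    cycles = starved = 0

    while pending or in_flight or queue:
        # Start the next fetch if idle and the queue has room for a full line.
        if not in_flight and pending and queue + insts_per_line <= queue_capacity:
            remaining = pending.popleft()
            in_flight = True

        # Advance the outstanding fetch; deliver the line when it arrives.
        if in_flight:
            remaining -= 1
            if remaining == 0:
                queue += insts_per_line
                in_flight = False

        # Decode stage: starve if the queue is empty, otherwise drain it.
        if queue == 0:
            starved += 1
        else:
            queue -= min(decode_width, queue)
        cycles += 1

    return cycles, starved


if __name__ == "__main__":
    print(count_starved_cycles([1, 1]))   # two hits: (4, 0)
    print(count_starved_cycles([1, 10]))  # a hit, then a 10-cycle miss: (12, 8)
```

In this sketch a single 10-cycle miss leaves the decode stage idle for 8 of 12 cycles, which is the effect that the work described here tries to reduce.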
Papers
Video of my R&D presentation can be found here
Slides can be found here