
\subsection{Raja Kantheti}
I am a master's student in Computer Science at UCCS. I want to choose the thesis path to satisfy the degree requirements. My expected course outcomes are to learn what it means to conduct research, how to write scientific papers, better articulate my ideas and evaluate their novelty, and write at least one paper of any type by the end of the semester.

I want to do my thesis on a processor pipeline design that would optimize branch prediction by using additional prefetching and decoding units in parallel with the central decode unit. This pipeline could also mitigate Spectre attacks, which exploit speculative execution.

I am determined to evaluate my thesis proposal, understand the necessary steps to evaluate the outcomes of my thesis, and grasp the elements of a successful proposal. I am also excited to challenge myself, to see if I have the stamina for long-term research and if this path is the right one for my career progression.

Outside of academia, I like to be in solitude from time to time, lost in my thoughts and devices, contemplating meta-ethics, talking and debating with myself, and exploring all possibilities. The routines that allow me to do this are my hobbies: long walks and longer drives, camping alone, hiking in state and national parks, and wood carving.
\begin{figure}[h]
\centering
\includegraphics[width=0.25\linewidth]{images/IMG_1667.JPG}
\caption{'tis I. }
\end{figure}

\section*{Output of the gem5 Simulator: Running a Test Program}
gem5 is a cycle-accurate simulator widely used in computer architecture research.
It simulates the behavior of a computer system.
The output of the gem5 simulator is a trace of the instructions executed by the processor, along with the cycle number at which each instruction was executed.
The trace can be used to analyze the performance of the processor and to identify bottlenecks in the processor design.

The simulation produces a file with numerous metrics. Below are some of the metrics I used for the survey paper.\\
This is a simulation of the RISC-V ISA on a MinorCPU running a linked-list program.

simSeconds      0.001541 \# Number of seconds simulated (Second)\\
simTicks      1540878200 \# Number of ticks simulated (Tick)\\
finalTick     1540878200 \# Number of ticks from beginning of \\simulation (restored from checkpoints and never reset) (Tick)\\
simFreq     1000000000000 \# The number of ticks per simulated second ((Tick/Second))\\
hostSeconds         0.22 \# Real time elapsed on the host (Second)\\
hostTickRate  6962621095 \# The number of ticks simulated per host second (ticks/s) ((Tick/Second))\\
hostMemory       1158876 \# Number of bytes of host memory used (Byte)\\
simInsts          113770 \# Number of instructions simulated (Count)\\
simOps            113776 \# Number of ops (including micro ops) simulated (Count)\\
hostInstRate      513943 \# Simulator instruction rate (inst/s) ((Count/Second))\\
hostOpRate        513963 \# Simulator op (including micro ops) rate (op/s) ((Count/Second))\\
board.processor.cores0.core.branchPred.lookups        29946 \# Number of BP lookups (Count)\\
board.processor.cores0.core.branchPred.condPredicted  21139 \# Number of conditional branches predicted (Count)\\
board.processor.cores0.core.branchPred.condIncorrect  462 \# Number of conditional branches incorrect (Count)\\
board.processor.cores0.core.branchPred.BTBLookups     10899 \# Number of BTB lookups (Count)\\
board.processor.cores0.core.branchPred.BTBUpdates     396 \# Number of BTB updates (Count)\\
board.processor.cores0.core.branchPred.BTBHits        10187 \# Number of BTB hits (Count)\\
board.processor.cores0.core.branchPred.BTBHitRatio    0.934673 \# BTB Hit Ratio (Ratio)\\
board.processor.cores0.core.branchPred.RASUsed         2019 \# Number of times the RAS was used to get a target. (Count)\\
board.processor.cores0.core.branchPred.RASIncorrect     5 \# Number of incorrect RAS predictions. (Count)\\
board.processor.cores0.core.branchPred.indirectLookups  1790 \# Number of indirect predictor lookups. (Count)\\
board.processor.cores0.core.branchPred.indirectHits     1748 \# Number of indirect target hits. (Count)\\
board.processor.cores0.core.branchPred.indirectMisses    42 \# Number of indirect misses. (Count)\\
board.processor.cores0.core.branchPred.indirectMispredicted  24 \# Number of mispredicted indirect branches. (Count)\\
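
A few of the figures above can be cross-checked against each other. The ratios below are derived only from the counts listed above; apart from the BTB hit ratio, they are not printed by the simulator in this excerpt, so treat them as a rough sanity check rather than as reported statistics.
\[
\mathrm{simSeconds} = \frac{\mathrm{simTicks}}{\mathrm{simFreq}} = \frac{1540878200}{10^{12}} \approx 0.001541
\]
\[
\mathrm{BTBHitRatio} = \frac{\mathrm{BTBHits}}{\mathrm{BTBLookups}} = \frac{10187}{10899} \approx 0.934673
\]
\[
\frac{\mathrm{indirectHits}}{\mathrm{indirectLookups}} = \frac{1748}{1790} \approx 0.9765,
\qquad
\frac{\mathrm{RASIncorrect}}{\mathrm{RASUsed}} = \frac{5}{2019} \approx 0.0025
\]
The first two results match the reported simSeconds and BTBHitRatio values. The indirect hits and misses also add up to the lookups ($1748 + 42 = 1790$).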

\section*{Questions for Me}
Your work looks interesting from a computer architecture perspective. You mention that it is based on the RISC-V architecture. How is this architecture currently being used in real-world applications, and what do you see as its future prospects?

$\longrightarrow$ RISC-V is an open-standard instruction set architecture (ISA) based on reduced instruction set computing (RISC) principles.
It is designed to be simple, extensible, and easy to implement. RISC-V is gaining popularity in industry, with companies such as NVIDIA, Western Digital, and SiFive adopting it for their products.
Its future prospects look bright, because it offers a flexible and customizable architecture that can be tailored to specific applications.
It is also supported by a vibrant open-source community that is driving innovation and development in the RISC-V ecosystem.

How do you think self-reflection has helped you in designing your thesis? DC.

\subsection{Questions from Aaron McKay}
1. Given your interest in optimizing branch prediction and mitigating SPECTRE attacks, what insights did you gain from the gem5 simulation results showing 462 incorrect conditional branch predictions out of 21,139 predictions? How might this inform your thesis work?

$\longrightarrow$ The 462 incorrect conditional branch predictions indicate that the pipeline design is not making optimal use of branch prediction. The workload was a Sieve of Eratosthenes implementation, a compute-intensive workload with a predictable branching pattern. I was only running it as a sample workload on the simulator. I think it would be best to run SPECint95 and see what results that gives in order to answer this question more accurately.
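
For reference, the conditional misprediction rate implied by the counts in the question can be worked out directly. This is a derived figure, not one printed in the simulator output listed earlier:
\[
\frac{\mathrm{condIncorrect}}{\mathrm{condPredicted}} = \frac{462}{21139} \approx 0.0219
\]
That is a misprediction rate of about 2.19\%, or a prediction accuracy of about 97.81\%. Relative to the reported instruction count (simInsts $= 113770$), the same count gives roughly
\[
\frac{462}{113770} \times 1000 \approx 4.06
\]
conditional mispredictions per thousand instructions. This last metric is added here only for illustration and is not part of the gem5 output shown.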

2. Your personal interests include contemplating meta-ethics and exploring possibilities. How has this philosophical approach influenced your thinking about processor security, particularly regarding the ethical implications of speculative execution vulnerabilities?

$\longrightarrow$ Interesting question. I think the philosophical approach has influenced my thinking about processor security by encouraging me to consider the broader ethical implications of speculative execution vulnerabilities. Now that I think about it, I have always thought that performance shouldn't come at the cost of security.
Speculative execution is a powerful optimization technique that can improve performance, but it also introduces security risks. By contemplating meta-ethics and exploring possibilities, I have come to appreciate the importance of balancing performance and security in processor design. I believe that it is essential to consider the ethical implications of speculative execution vulnerabilities and to develop secure processors that prioritize user privacy and data security.
Thanks for the question; this has been interesting to think about.

3. How do the additional prefetching and decoding units in your pipeline design improve branch prediction compared to traditional methods?

$\longrightarrow$ Imagine a superscalar pipeline with all the functional units. The current problem is the overhead that a processor incurs when there is a branch misprediction. Processors also have speculative execution, which makes them vulnerable to Spectre attacks.
I got to thinking: what if the required branch address is already prefetched and decoded, ready for execution? Speculative execution only happens when the instruction is in the EX functional unit. So, if we can somehow have the branch outcome address ready to be executed without squashing the entire pipeline, that would be a significant performance gain, and the squashing would only be performed on the secondary fetch and decode units.
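
A back-of-the-envelope model of this argument, using symbols that are assumptions for illustration and not part of the design itself: let $N_{\mathrm{mis}}$ be the number of mispredicted branches, $p$ the cycles lost per misprediction when the entire pipeline is squashed, and $p'$ the cycles lost when only the secondary fetch and decode units are squashed. The cycles saved would then be roughly
\[
\Delta C \approx N_{\mathrm{mis}} \, (p - p'), \qquad p' < p .
\]
For the sample run reported earlier, $N_{\mathrm{mis}} = 462$ conditional mispredictions. The values of $p$ and $p'$ are hypothetical here and would have to be measured in the simulator before drawing any conclusion about the actual gain.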