Hi Nicolas,

Thanks for your interest.

It's hard to say whether these proposals are suitable for a BSc thesis, because
I cannot really estimate how much work each of them requires.
But I don't think this is an issue: we can add or remove some work later to make them suitable.

My main concern is that I am not really familiar with any of these topics,
so I don't know how much I can help you as a supervisor for your thesis.

If I understood correctly, these are your own suggestions (not suggestions from Prof. Thomas Gross).
So, does this mean that you know a lot about these proposals? Do you have a clear plan?
In particular, do you have experience with Lisp and the Cleavir compiler?
Do you already have some ideas on how to solve the problems in these proposals?
(I don't mean that you have to know all of this at the beginning of your thesis;
I am asking because I need to know what kind of assistance you will need from me.)


Regarding these three proposals, I think the third one,
"Partial Inlining of Local Functions Using Local Graph Rewriting Local Functions", is the most suitable for me.
However, as I said, I would like to know what your background on this topic is.


Regarding other proposals:
I have another proposal for a BSc thesis that I am very interested in.
It's also about linear pipelines, but it doesn't require extending LLVM or any other compiler.
I haven't discussed it with Prof. Thomas Gross yet, but if you are interested in this topic we can all discuss it together.


Topic: Evaluation of the Piper scheduling algorithm for linear pipelines

Piper is an algorithm that integrates pipeline parallelism into the work-stealing scheduler of Cilk.

The conference paper is here:
http://dl.acm.org/citation.cfm?id=2486174

An extended article version is here:
http://dl.acm.org/citation.cfm?id=2809808


An implementation of the Piper algorithm is publicly available for Intel Cilk Plus:
https://www.cilkplus.org/piper-experimental-language-support-pipeline-parallelism-intel-cilk-plus

http://dl.acm.org/citation.cfm?id=2755610



In my research I compare my own work against Piper, and Piper performs very poorly on fine-grained loops.
It doesn't scale well as the number of threads grows, and it's unclear where the overhead comes from.

There might be two main reasons:
a) the work-stealing scheduler performs poorly because many steal operations occur and degrade the overall performance
b) Piper supports arbitrarily nested pipeline and fork-join parallelism, and the overhead may come from this additional flexibility

So, the goal of this thesis will be to experiment with Piper and understand when it performs well and when it does not.
This means that you will have to install Intel Cilk Plus (with icc or g++), which includes the runtime system for Piper.
You will have to write some synthetic benchmarks, and I can also give you some real applications that I use in my research.
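
To make the idea of a synthetic benchmark more concrete, here is a minimal sketch in plain C++ (no Cilk keywords). It is only a serial baseline under my own assumptions: the names busy_work, run_serial_pipeline and time_serial_pipeline are hypothetical, the three stages and the "grain" parameter are just one way to control how fine-grained each iteration is, and the Piper version of the same loop would still have to be written separately.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: performs `grain` dependent arithmetic steps.
// The result depends on every step, so the compiler cannot drop the loop.
inline std::uint64_t busy_work(std::uint64_t x, int grain) {
    assert(grain >= 0);
    for (int i = 0; i < grain; ++i) {
        x = x * 6364136223846793005ULL + 1442695040888963407ULL;  // one LCG step
    }
    return x;
}

// Serial baseline for a three-stage linear pipeline (an assumption, not
// Piper code): every iteration passes one item through stages 0, 1 and 2.
// A small `grain` gives a fine-grained loop, a large one a coarse-grained loop.
inline std::uint64_t run_serial_pipeline(std::size_t iterations, int grain) {
    std::uint64_t checksum = 0;
    for (std::size_t i = 0; i < iterations; ++i) {
        std::uint64_t item = busy_work(i, grain);  // stage 0
        item = busy_work(item, grain);             // stage 1
        checksum ^= busy_work(item, grain);        // stage 2
    }
    return checksum;
}

// Runs the baseline once and returns the elapsed wall-clock time in seconds.
// The checksum is written to `checksum_out` (if non-null) so that a parallel
// version can later be checked against it.
inline double time_serial_pipeline(std::size_t iterations, int grain,
                                   std::uint64_t* checksum_out) {
    const auto start = std::chrono::steady_clock::now();
    const std::uint64_t checksum = run_serial_pipeline(iterations, grain);
    const auto stop = std::chrono::steady_clock::now();
    if (checksum_out != nullptr) {
        *checksum_out = checksum;
    }
    return std::chrono::duration<double>(stop - start).count();
}
```

Sweeping "grain" from small to large values, and comparing these timings against the Piper version for different thread counts, should help show at which granularity the overhead starts to dominate.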

To make sure that you understand what is going on with Piper, you will have to explore the part of the Cilk runtime system's source code that is relevant to Piper.
(The source code is publicly available, so you can download it, compile it and modify it.)
For example, you can collect some useful statistics about the performance of Piper by adding a few simple lines of code to the runtime system.
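
The kind of statistics I have in mind can be collected with very little code. The sketch below is not taken from the Cilk runtime: WorkerStats, record_steal and summarize are hypothetical names, and I assume that each worker only updates its own slot and that the counters are read after the parallel region has finished, so no atomics are needed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-worker counters; one slot per worker thread.
struct WorkerStats {
    std::uint64_t steal_attempts = 0;
    std::uint64_t successful_steals = 0;
};

// Aggregated view over all workers.
struct StealSummary {
    std::uint64_t attempts;
    std::uint64_t successes;
    double success_rate;  // successes / attempts, or 0.0 if there were no attempts
};

// To be called by worker `worker` each time it tries to steal.
// Assumption: a worker only ever touches its own slot.
inline void record_steal(std::vector<WorkerStats>& stats, std::size_t worker,
                         bool success) {
    assert(worker < stats.size());
    ++stats[worker].steal_attempts;
    if (success) {
        ++stats[worker].successful_steals;
    }
}

// To be called once, after all workers have finished.
inline StealSummary summarize(const std::vector<WorkerStats>& stats) {
    StealSummary total{0, 0, 0.0};
    for (const WorkerStats& w : stats) {
        total.attempts += w.steal_attempts;
        total.successes += w.successful_steals;
    }
    if (total.attempts > 0) {
        total.success_rate = static_cast<double>(total.successes) /
                             static_cast<double>(total.attempts);
    }
    return total;
}
```

Where exactly such a call would go in the runtime system is something you would have to find out by reading the source code.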

These steps do not require any prior knowledge. However, depending on your progress and your results,
it may be useful (and it is very likely) to modify the runtime system of Piper in order to remove the flexibility for arbitrarily nested pipeline and fork-join parallelism.
By doing that, you can compare the original Piper implementation with a lighter implementation that does not have this flexibility.
This will make it clear to us whether the overhead is inherent in the work-stealing scheduler or comes from the additional flexibility.
Moreover, depending on your results, you may try to modify the scheduler of Piper in a simple way in order to make it more efficient.

None of these steps requires modifying the Intel Cilk Plus compiler. You will not work on the compiler.
You will focus only on the runtime system, and especially on whatever is relevant to Piper and linear pipelines.

If you find some results that explain the performance of Piper, I think this work could be submitted for publication to a workshop on scheduling strategies.

The advantage of this topic is that I can help you with every step, because I know what to expect and I can make suggestions on how you can make progress.
So, if you are interested in it, we can discuss it in more detail.

In my opinion this is a reasonable topic for a BSc thesis.
The only thing I am not sure about is the difficulty of removing the flexibility for nested parallelism from Piper.
But since you have to remove part of the runtime system rather than extend it, I think this will also be reasonable.

Best regards,

Aristeidis M. Mastoras
Research Assistant
Dept. of Computer Science
ETH Zurich, Switzerland