ترغب بنشر مسار تعليمي؟ اضغط هنا

A long-standing practical challenge in the optimization of higher-order languages is inlining functions with free variables. Inlining code statically at a function call site is safe if the compiler can guarantee that the free variables have the same bindings at the inlining point as they do at the point where the function is bound as a closure (code and free variables). There have been many attempts to create a heuristic to check this correctness condition, from Shivers kCFA-based reflow analysis to Mights Delta-CFA and anodization, but all of those have performance unsuitable for practical compiler implementations. In practice, modern language implementations rely on a series of tricks to capture some common cases (e.g., closures whose free variables are only top-level identifiers such as +) and rely on hand-inlining by the programmer for anything more complicated. This work provides the first practical, general approach for inlining functions with free variables. We also provide a proof of correctness, an evaluation of both the execution time and performance impact of this optimization, and some tips and tricks for implementing an efficient and precise control-flow analysis.
Modern high-end machines feature multiple processor packages, each of which contains multiple independent cores and integrated memory controllers connected directly to dedicated physical RAM. These packages are connected via a shared bus, creating a system with a heterogeneous memory hierarchy. Since this shared bus has less bandwidth than the sum of the links to memory, aggregate memory bandwidth is higher when parallel threads all access memory local to their processor package than when they access memory attached to a remote package. This bandwidth limitation has traditionally limited the scalability of modern functional language implementations, which seldom scale well past 8 cores, even on small benchmarks. This work presents a garbage collector integrated with our strict, parallel functional language implementation, Manticore, and shows that it scales effectively on both a 48-core AMD Opteron machine and a 32-core Intel Xeon machine.
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا