I met with fellow Parrot hackers allison++, cotto++ and chromatic++ recently in
Portland, OR (it was jokingly called YAPC::OR on IRC) to talk about what we
call M0.
M0 stands for "magic level 0" and it is a refactoring of Parrot internals in a
fundamental way.
cotto++ and I have been hacking on a detailed spec (over 35 pages now!) and a
"final prototype" in Perl 5 in the last few weeks. The "magic level 0" name
means that M0 consists of the most basic building blocks of a virtual machine,
from which the rest of the VM can be built. The term "magic" means high-level
constructs and conveniences, such as objects, lexical variables, classes and
their associated syntax sugar. M0 is not meant to be written by humans, except
during bootstrapping. In the future, M0 will probably be generated from
Parrot Intermediate Representation (PIR), Not Quite Perl 6 (NQP) or other High
Level Languages (HLLs).
The most important reason for M0 is to correct the fact that too much of
Parrot's internals is written in C. Parrot is constantly switching between code
written in PIR, in other HLLs such as NQP, and in C. Many types of
optimizations go right out the window when you cross a language boundary. A
virtual machine that wants an efficient JIT compiler, which we definitely do,
should cross language boundaries as rarely as possible. Since many hot paths in
Parrot's internals cross between PIR and C, they can't be inlined or optimized
as much as we would like.
A few years back, Parrot had a JIT compiler, from which many lessons were
learned. I am sure some people were frustrated when we removed it in 1.7.0 but
sometimes, it is best to start from a clean slate with many more lessons
learned under your belt. Our old JIT did support multiple architectures but
required maintaining a "JIT version" of every opcode on each architecture
supported. Clearly, this method was not going to scale or be maintainable.
I will venture to say that M0 is the culmination of the lessons learned from
our failed JIT. I should note that "failure" does not have a negative
connotation in my mind. Indeed, only through failure are we truly learning. If
you do something absolutely perfectly, you aren't learning.
We are at an exciting time in Parrot's history. For a long time we have wanted
an elegant JIT using all the latest spiffy techniques, but it was always an
abstract idea, "just over there", never quite close enough to grab ahold of. A
new JIT that meets these goals absolutely requires something like M0, and the
JIT is the driving force behind M0's design. M0 will pave the way for an
efficient JIT to be implemented on Parrot.
M0 currently consists of under 40 opcodes, from which (we wager) all the rest of
Parrot can be built. This is radically different from how Parrot currently
works, where all of the deepest internals of Parrot are written in heavily
macroized C89 (ANSI C).
M0 has a source code (i.e. textual) form and a bytecode form. chromatic++
brought up a good point at the beginning of the meeting about the bytecode file
containing a cryptographic hash of the bytecode. This will allow one to
distribute bytecode which can then be cryptographically verified by whoever
eventually runs the bytecode. This is a very "fun" application of cryptography
that I will be looking into further.
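
To make the idea concrete, here is a minimal sketch in Python (not M0, and not
our Perl 5 prototype). The layout is purely an assumption for illustration: the
digest is SHA-256 and is simply appended to the end of the bytecode. The names
seal and unseal are hypothetical, and the real M0 file format may well put the
hash somewhere else.

```python
import hashlib

# Assumption: SHA-256, stored as the last 32 bytes of the file.
DIGEST_SIZE = hashlib.sha256().digest_size


def seal(bytecode: bytes) -> bytes:
    """Return the bytecode with its SHA-256 digest appended."""
    return bytecode + hashlib.sha256(bytecode).digest()


def unseal(blob: bytes) -> bytes:
    """Return the payload if the trailing digest matches, else raise ValueError."""
    if len(blob) < DIGEST_SIZE:
        raise ValueError("file too short to contain a digest")
    payload, digest = blob[:-DIGEST_SIZE], blob[-DIGEST_SIZE:]
    if hashlib.sha256(payload).digest() != digest:
        raise ValueError("bytecode digest mismatch")
    return payload
```

Note that a bare hash stored next to the data only catches corruption or
careless edits. Anyone who can rewrite the file can also recompute the hash, so
verifying where bytecode came from would need a signature or a digest obtained
through some other trusted channel.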
allison++ brought up some good questions about how merging bytecode files would
be done. We hadn't really thought about that, so it led to some fruitful
conversation about how Parrot Bytecode (PBC) is currently merged, what it does
wrong, and how M0 can do it less wronger.
We then talked about what exactly a "Continuation" in M0 means, and tried to clear
up some definitions between what is actually meant by Context, State and Continuation.
chromatic++ also mentioned that an optional optimization for the garbage
collector (GC) would be for it to create a memory pool solely to store
Continuations, since they will be heavily used and many of them will be
short-lived and reference each other, so having them in a small confined memory
region will reduce cache misses. We are filing this under "good to know and we
will do that when we get there."
Next we turned to concurrency, including how we would emulate the various
concurrency models of the languages we want to support, such as Python's Global
Interpreter Lock (GIL). We decided that M0 will be totally ignorant of
concurrency, since concurrency is a "magical" concept that will be implemented
at a higher level. We have started to refer to the level above M0 as M1 and
everything above M0 as M1+.
allison++ also mentioned that many innovations and optimizations are possible in
storing isolated register sets for each Continuation (a.k.a. call frame). This
area of Parrot may yield some interesting surprises and perhaps some
publishable findings.
We all agreed that M0 should be as ignorant about the GC as possible, but the
GC will most likely learn about M0 as optimizations are implemented. We also
talked about the pluggability of our GCs. allison++ raised the question "Are
pluggable GCs easier to maintain/implement if they are only pluggable at
compile-time?" Indeed, they probably are, but then we run into the issue that
our current "make fulltest" runs our test suite under different GCs, which
would require multiple compiles for a single test suite run. chromatic++
suggested that we could instead make GCs pluggable at link-time (which would
require a decent amount of reorganization), which would still allow developers
to easily test different GCs without recompiling all of Parrot. chromatic++'s
estimate is that removing runtime pluggability of GCs would result in an
across-the-board speed improvement of 5%.
This conversation then turned toward the fact that M0 bytecode might depend on
which GC was used when it was generated, i.e. the same M0 source code run under
two different GCs would generate two different bytecode representations. This
would happen if the M0 alloc() opcode assumed C calling conventions. This was
generally deemed distasteful, so our alloc() opcode will not "bake in C
assumptions", which is a good general principle as well. This will be a fun
test to write.
allison++ brought up the fact that we may need a way to tell the GC "this is
allocated but uninitialized memory", a.k.a. a way to solve the "infant
mortality" problem. chromatic++ suggested that we could add some kind of
lifespan flag to our alloc opcode (which currently has an arbitrary/unused
argument, since all M0 opcodes take 3 arguments for symmetry and performance
reasons). This could be as simple as a hint that a variable is local or global,
or a more detailed delineation using bit flags.
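
A rough sketch of what the bit-flag variant could look like, written in Python
for brevity. Everything here is an assumption: the flag names, the choice of
the third operand as the unused one, and the helper names encode_alloc and
decode_hints are all made up for illustration and do not come from the M0 spec.

```python
from enum import IntFlag
from typing import Tuple


class AllocHint(IntFlag):
    """Hypothetical lifespan hints; these names are not from the M0 spec."""
    NONE = 0
    LOCAL = 1 << 0          # expected to die with the current call frame
    GLOBAL = 1 << 1         # expected to live for the whole program
    UNINITIALIZED = 1 << 2  # allocated but not filled in yet


def encode_alloc(dest_reg: int, size_reg: int, hints: AllocHint) -> Tuple[int, int, int]:
    """Pack an alloc op's three operands, using the spare one for hints."""
    return (dest_reg, size_reg, int(hints))


def decode_hints(operand: int) -> AllocHint:
    """Recover the hint flags from the spare operand."""
    return AllocHint(operand)
```

A GC that does not care about hints can ignore the operand entirely, which
keeps M0 itself ignorant of what any particular GC does with them.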
It was also decided that we didn't need an invoke opcode and that invoke properly
belongs as a VTABLE method on invokables.
We also talked about the fact that register machines greatly benefit from
concentrating VM operations on either the caller or the callee side. I am
still looking for more references about this. It seems that the callee side is
what we will try for, but I am not quite sure why.
We finally talked about calling conventions and decided that goto_chunk should
roughly be equivalent to a jmp (assembly unconditional jump to address) and
the invoke VTABLE would set up a return continuation (i.e. make a copy of the
program counter), do a goto_chunk, and let the callee handle the rest, such
as looking up a return continuation and invoking it.
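
The calling convention described above can be modelled with a toy interpreter.
This is a sketch in Python under several assumptions, not M0 code: chunks are
plain lists of (op, arg) pairs, the "call" instruction is only a stand-in for
however M0 ends up dispatching to a VTABLE method, and return continuations are
kept on a simple list. The class names Sub and Interp are hypothetical.

```python
class Sub:
    """An invokable; invoke() plays the role of the invoke VTABLE method."""

    def __init__(self, chunk):
        self.chunk = chunk

    def invoke(self, interp):
        # Set up a return continuation: here, just a copy of the program counter.
        interp.return_conts.append(interp.pc)
        interp.goto_chunk(self.chunk)


class Interp:
    def __init__(self, chunks):
        self.chunks = chunks      # chunk name -> list of (op, arg) pairs
        self.pc = ("main", 0)     # program counter: (chunk name, offset)
        self.return_conts = []    # saved return continuations
        self.trace = []           # output of "say" ops, for inspection

    def goto_chunk(self, name, offset=0):
        # Unconditional jump, like an assembly jmp: only the PC changes.
        self.pc = (name, offset)

    def run(self):
        while True:
            chunk, offset = self.pc
            op, arg = self.chunks[chunk][offset]
            self.pc = (chunk, offset + 1)  # advance before executing the op
            if op == "say":
                self.trace.append(arg)
            elif op == "call":
                arg.invoke(self)
            elif op == "return":
                # Callee side: look up the return continuation and invoke it.
                self.goto_chunk(*self.return_conts.pop())
            elif op == "halt":
                return self.trace


program = {
    "main": [("say", "before"), ("call", Sub("greet")), ("say", "after"), ("halt", None)],
    "greet": [("say", "hello"), ("return", None)],
}
```

Running Interp(program).run() visits "before", then "hello" inside the callee,
then "after" once the callee has invoked its return continuation.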
After the main M0 meeting, cotto++, allison++ and I sat down at a coffee shop
and came up with a list of next actions for M0:
- Write a recursive version of 'calculate the n-th Fibonacci number' in M0
- Write a simple checksum algorithm in M0 (suggestions?)
- Create a working PMC in M0
- M0 disassembler
- Create a "glossary brochure for Github cruisers"
- Implement function calls and returns
- Make sure each M0 opcode is tested via Devel::Cover
- Convert the M0 assembler to C
- Convert the M0 interpreter to C
- Link M0 into libparrot (no-op integration)
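
For the checksum item, one candidate that looks small enough for a first M0
program is Fletcher-16, which needs only addition and modulo. This is just a
suggestion, shown in Python as a reference to port from. Nothing has been
decided and the function name is arbitrary.

```python
def fletcher16(data: bytes) -> int:
    """Compute the Fletcher-16 checksum of a byte string."""
    sum1 = 0
    sum2 = 0
    for byte in data:
        sum1 = (sum1 + byte) % 255   # running sum of the bytes
        sum2 = (sum2 + sum1) % 255   # running sum of the running sums
    return (sum2 << 8) | sum1
```

For example, fletcher16(b"abcde") is 0xC8F0.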
I have been talking to cotto++ on IRC while typing up these notes and we have
come to the conclusion that a "bytecode verifier" should also be put on that
list. A verifier is a utility that detects invalid bytecode and prevents
attacks via malicious bytecode. This is something that happens at runtime,
whereas a bytecode checksum happens before runtime, or at the end of compile
time. They provide different kinds of insurance. The bytecode checksum feature
will be an intrinsic feature that is not optional, since it prevents Parrot
from running known-bad bytecode. But a bytecode verifier adds a significant
amount of overhead. This overhead is reasonable if you are running untrusted
code, but it is unreasonable when you are running trusted bytecode (i.e.
bytecode that you created), so the verifier will have an option to be turned
off.
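
As a minimal sketch of the kind of check a verifier performs, here is a Python
version with an off switch. The encoding is an assumption made up for this
example (one opcode byte followed by three one-byte operands, and exactly 40
opcodes), and verify and load are hypothetical names. A real verifier would
check far more, such as register numbers and jump targets.

```python
NUM_OPCODES = 40   # assumption: the real count is only known to be under 40
INSTR_SIZE = 4     # assumption: 1 opcode byte + 3 one-byte operands


def verify(code: bytes) -> list:
    """Return a list of problems found; an empty list means the code passed."""
    problems = []
    leftover = len(code) % INSTR_SIZE
    if leftover != 0:
        problems.append("truncated instruction at end of bytecode")
    # Only walk the complete instructions.
    for offset in range(0, len(code) - leftover, INSTR_SIZE):
        opcode = code[offset]
        if opcode >= NUM_OPCODES:
            problems.append(f"unknown opcode {opcode} at offset {offset}")
    return problems


def load(code: bytes, trusted: bool = False) -> bytes:
    """Verify the bytecode unless the caller vouches for it, then return it."""
    if not trusted:
        problems = verify(code)
        if problems:
            raise ValueError("; ".join(problems))
    return code
```

Passing trusted=True skips the scan entirely, which is the trade-off described
above: no overhead for bytecode you created yourself, full checking for
everything else.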
We obviously have a lot of fun stuff to work on, so if any of it sounds fun,
come ask cotto++ or me (dukeleto) on #parrot on irc://irc.parrot.org for some M0
stuff to do. We especially need help with writing tests and documentation.
There will be a Parrot hackathon at YAPC::NA this year, where I am sure some
M0-related hacking will be happening. If you have never been to a hackathon
before, I highly recommend them as a way to join a project and/or community.
Meatspace is still the best medium for some things :)
(UPDATE: Some factual errors about our old JIT were pointed out by rafl++ and
corrected)