jdk8-mips64-public/nashorn: docs/DEVELOPER_README@1a5d67424e83 (annotated)

docs/DEVELOPER_README@1a5d67424e83 (annotated)

docs/DEVELOPER_README

Tue, 11 Jun 2013 13:09:43 +0530

author: sundar
date: Tue, 11 Jun 2013 13:09:43 +0530
changeset 338: 1a5d67424e83
parent 147: 43e40c08e7f8
child 526: eb7b8340ce3a
permissions: -rw-r--r--

8015357: a = []; a[0x7fffffff]=1; a.sort()[0] should evaluate to 1 instead of undefined
Reviewed-by: hannesw, lagergren

 This document describes system properties that are used for internal
 debugging and instrumentation purposes, along with the system loggers,
 which are used for the same thing.
 This document is intended as a developer resource, and it is not
 needed as Nashorn documentation for normal usage. Flags and system
 properties described herein are subject to change without notice.
 =====================================
 . System properties used internally
 =====================================
 This documentation of the system property flags assume that the
 default value of the flag is false, unless otherwise specified.
 SYSTEM PROPERTY: -Dnashorn.args=<string>
 This property takes as its value a space separated list of Nashorn
 command line options that should be passed to Nashorn. This might be useful
 in environments where it is hard to tell how a nashorn.jar is launched.
 Example:
 > java -Dnashorn.args="--lazy-complation --log=compiler" large-java-app-with-nashorn.jar
 > ant -Dnashorn.args="--log=codegen" antjob
 SYSTEM PROPERTY: -Dnashorn.unstable.relink.threshold=x
 This property controls how many call site misses are allowed before a
 callsite is relinked with "apply" semantics to never change again.
 In the case of megamorphic callsites, this is necessary, or the
 program would spend all its time swapping out callsite targets. Dynalink
 has a default value (currently 8 relinks) for this property if it
 is not explicitly set.
 SYSTEM PROPERTY: -Dnashorn.compiler.splitter.threshold=x
 This will change the node weight that requires a subgraph of the IR to
 be split into several classes in order not to run out of bytecode space.
 The default value is 0x8000 (32768).
 SYSTEM PROPERTY: -Dnashorn.compiler.intarithmetic
 Arithmetic operations in Nashorn (except bitwise ones) typically
 coerce the operands to doubles (as per the JavaScript spec). To switch
 this off and remain in integer mode, for example for "var x = a&b; var
 y = c&d; var z = x*y;", use this flag. This will force the
 multiplication of variables that are ints to be done with the IMUL
 bytecode and the result "z" to become an int.
 WARNING: Note that is is experimental only to ensure that type support
 exists for all primitive types. The generated code is unsound. This
 will be the case until we do optimizations based on it. There is a CR
 in Nashorn to do better range analysis, and ensure that this is only
 done where the operation can't overflow into a wider type. Currently
 no overflow checking is done, so at the moment, until range analysis
 has been completed, this option is turned off.
 We've experimented by using int arithmetic for everything and putting
 overflow checks afterwards, which would recompute the operation with
 the correct precision, but have yet to find a configuration where this
 is faster than just using doubles directly, even if the int operation
 does not overflow. Getting access to a JVM intrinsic that does branch
 on overflow would probably alleviate this.
 There is also a problem with this optimistic approach if the symbol
 happens to reside in a local variable slot in the bytecode, as those
 are strongly typed. Then we would need to split large sections of
 control flow, so this is probably not the right way to go, while range
 analysis is. There is a large difference between integer bytecode
 without overflow checks and double bytecode. The former is
 significantly faster.
 SYSTEM PROPERTY: -Dnashorn.codegen.debug, -Dnashorn.codegen.debug.trace=<x>
 See the description of the codegen logger below.
 SYSTEM_PROPERTY: -Dnashorn.fields.debug
 See the description on the fields logger below.
 SYSTEM PROPERTY: -Dnashorn.fields.dual
 When this property is true, Nashorn will attempt to use primitive
 fields for AccessorProperties (currently just AccessorProperties, not
 spill properties). Memory footprint for script objects will increase,
 as we need to maintain both a primitive field (a long) as well as an
 Object field for the property value. Ints are represented as the 32
 low bits of the long fields. Doubles are represented as the
 doubleToLongBits of their value. This way a single field can be used
 for all primitive types. Packing and unpacking doubles to their bit
 representation is intrinsified by the JVM and extremely fast.
 While dual fields in theory runs significantly faster than Object
 fields due to reduction of boxing and memory allocation overhead,
 there is still work to be done to make this a general purpose
 solution. Research is ongoing.
 In the future, this might complement or be replaced by experimental
 feature sun.misc.TaggedArray, which has been discussed on the mlvm
 mailing list. TaggedArrays are basically a way to share data space
 between primitives and references, and have the GC understand this.
 As long as only primitive values are written to the fields and enough
 type information exists to make sure that any reads don't have to be
 uselessly boxed and unboxed, this is significantly faster than the
 standard "Objects only" approach that currently is the default. See
 test/examples/dual-fields-micro.js for an example that runs twice as
 fast with dual fields as without them. Here, the compiler, can
 determine that we are dealing with numbers only throughout the entire
 property life span of the properties involved.
 If a "real" object (not a boxed primitive) is written to a field that
 has a primitive representation, its callsite is relinked and an Object
 field is used forevermore for that particular field in that
 PropertyMap and its children, even if primitives are later assigned to
 it.
 As the amount of compile time type information is very small in a
 dynamic language like JavaScript, it is frequently the case that
 something has to be treated as an object, because we don't know any
 better. In reality though, it is often a boxed primitive is stored to
 an AccessorProperty. The fastest way to handle this soundly is to use
 a callsite typecheck and avoid blowing the field up to an Object. We
 never revert object fields to primitives. Ping-pong:ing back and forth
 between primitive representation and Object representation would cause
 fatal performance overhead, so this is not an option.
 For a general application the dual fields approach is still slower
 than objects only fields in some places, about the same in most cases,
 and significantly faster in very few. This is due the program using
 primitives, but we still can't prove it. For example "local_var a =
 call(); field = a;" may very well write a double to the field, but the
 compiler dare not guess a double type if field is a local variable,
 due to bytecode variables being strongly typed and later non
 interchangeable. To get around this, the entire method would have to
 be replaced and a continuation retained to restart from. We believe
 that the next steps we should go through are instead:
 ) Implement method specialization based on callsite, as it's quite
 frequently the case that numbers are passed around, but currently our
 function nodes just have object types visible to the compiler. For
 example "var b = 17; func(a,b,17)" is an example where two parameters
 can be specialized, but the main version of func might also be called
 from another callsite with func(x,y,"string").
 ) This requires lazy jitting as the functions have to be specialized
 per callsite.
 Even though "function square(x) { return x*x }" might look like a
 trivial function that can always only take doubles, this is not
 true. Someone might have overridden the valueOf for x so that the
 toNumber coercion has side effects. To fulfil JavaScript semantics,
 the coercion has to run twice for both terms of the multiplication
 even if they are the same object. This means that call site
 specialization is necessary, not parameter specialization on the form
 "function square(x) { var xd = (double)x; return xd*xd; }", as one
 might first think.
 Generating a method specialization for any variant of a function that
 we can determine by types at compile time is a combinatorial explosion
 of byte code (try it e.g. on all the variants of am3 in the Octane
 benchmark crypto.js). Thus, this needs to be lazy
 ) Possibly optimistic callsite writes, something on the form
 x = y; //x is a field known to be a primitive. y is only an object as
 far as we can tell
 turns into
 try {
   x = (int)y;
 } catch (X is not an integer field right now | ClassCastException e) {
   x = y;
 }
 Mini POC shows that this is the key to a lot of dual field performance
 in seemingly trivial micros where one unknown object, in reality
 actually a primitive, foils it for us. Very common pattern. Once we
 are "all primitives", dual fields runs a lot faster than Object fields
 only.
 We still have to deal with objects vs primitives for local bytecode
 slots, possibly through code copying and versioning.
 SYSTEM PROPERTY: -Dnashorn.compiler.symbol.trace=[<x>[,*]],
   -Dnashorn.compiler.symbol.stacktrace=[<x>[,*]]
 When this property is set, creation and manipulation of any symbol
 named "x" will show information about when the compiler changes its
 type assumption, bytecode local variable slot assignment and other
 data. This is useful if, for example, a symbol shows up as an Object,
 when you believe it should be a primitive. Usually there is an
 explanation for this, for example that it exists in the global scope
 and type analysis has to be more conservative.
 Several symbols names to watch can be specified by comma separation.
 If no variable name is specified (and no equals sign), all symbols
 will be watched
 By using "stacktrace" instead of or together with "trace", stack
 traces will be displayed upon symbol changes according to the same
 semantics.
 SYSTEM PROPERTY: nashorn.lexer.xmlliterals
 If this property it set, it means that the Lexer should attempt to
 parse XML literals, which would otherwise generate syntax
 errors. Warning: there are currently no unit tests for this
 functionality.
 XML literals, when this is enabled, end up as standard LiteralNodes in
 the IR.
 SYSTEM_PROPERTY: nashorn.debug
 If this property is set to true, Nashorn runs in Debug mode. Debug
 mode is slightly slower, as for example statistics counters are enabled
 during the run. Debug mode makes available a NativeDebug instance
 called "Debug" in the global space that can be used to print property
 maps and layout for script objects, as well as a "dumpCounters" method
 that will print the current values of the previously mentioned stats
 counters.
 These functions currently exists for Debug:
 "map" - print(Debug.map(x)) will dump the PropertyMap for object x to
 stdout (currently there also exist functions called "embedX", where X
 is a value from 0 to 3, that will dump the contents of the embed pool
 for the first spill properties in any script object and "spill", that
 will dump the contents of the growing spill pool of spill properties
 in any script object. This is of course subject to change without
 notice, should we change the script object layout.
 "methodHandle" - this method returns the method handle that is used
 for invoking a particular script function.
 "identical" - this method compares two script objects for reference
 equality. It is a == Java comparison
 "dumpCounters" - will dump the debug counters' current values to
 stdout.
 Currently we count number of ScriptObjects in the system, number of
 Scope objects in the system, number of ScriptObject listeners added,
 removed and dead (without references).
 We also count number of ScriptFunctions, ScriptFunction invocations
 and ScriptFunction allocations.
 Furthermore we count PropertyMap statistics: how many property maps
 exist, how many times were property maps cloned, how many times did
 the property map history cache hit, prevent new allocations, how many
 prototype invalidations were done, how many time the property map
 proto cache hit.
 Finally we count callsite misses on a per callsite bases, which occur
 when a callsite has to be relinked, due to a previous assumption of
 object layout being invalidated.
 SYSTEM PROPERTY: nashorn.methodhandles.debug,
 nashorn.methodhandles.debug=create
 If this property is enabled, each MethodHandle related call that uses
 the java.lang.invoke package gets its MethodHandle intercepted and an
 instrumentation printout of arguments and return value appended to
 it. This shows exactly which method handles are executed and from
 where. (Also MethodTypes and SwitchPoints). This can be augmented with
 more information, for example, instance count, by subclassing or
 further extending the TraceMethodHandleFactory implementation in
 MethodHandleFactory.java.
 If the property is specialized with "=create" as its option,
 instrumentation will be shown for method handles upon creation time
 rather than at runtime usage.
 SYSTEM PROPERTY: nashorn.methodhandles.debug.stacktrace
 This does the same as nashorn.methodhandles.debug, but when enabled
 also dumps the stack trace for every instrumented method handle
 operation. Warning: This is enormously verbose, but provides a pretty
 decent "grep:able" picture of where the calls are coming from.
 See the description of the codegen logger below for a more verbose
 description of this option
 SYSTEM PROPERTY: nashorn.scriptfunction.specialization.disable
 There are several "fast path" implementations of constructors and
 functions in the NativeObject classes that, in their original form,
 take a variable amount of arguments. Said functions are also declared
 to take Object parameters in their original form, as this is what the
 JavaScript specification mandates.
 However, we often know quite a lot more at a callsite of one of these
 functions. For example, Math.min is called with a fixed number (2) of
 integer arguments. The overhead of boxing these ints to Objects and
 folding them into an Object array for the generic varargs Math.min
 function is an order of magnitude slower than calling a specialized
 implementation of Math.min that takes two integers. Specialized
 functions and constructors are identified by the tag
 @SpecializedFunction and @SpecializedConstructor in the Nashorn
 code. The linker will link in the most appropriate (narrowest types,
 right number of types and least number of arguments) specialization if
 specializations are available.
 Every ScriptFunction may carry specializations that the linker can
 choose from. This framework will likely be extended for user defined
 functions. The compiler can often infer enough parameter type info
 from callsites for in order to generate simpler versions with less
 generic Object types. This feature depends on future lazy jitting, as
 there tend to be many calls to user defined functions, some where the
 callsite can be specialized, some where we mostly see object
 parameters even at the callsite.
 If this system property is set to true, the linker will not attempt to
 use any specialized function or constructor for native objects, but
 just call the generic one.
 SYSTEM PROPERTY: nashorn.tcs.miss.samplePercent=<x>
 When running with the trace callsite option (-tcs), Nashorn will count
 and instrument any callsite misses that require relinking. As the
 number of relinks is large and usually produces a lot of output, this
 system property can be used to constrain the percentage of misses that
 should be logged. Typically this is set to 1 or 5 (percent). 1% is the
 default value.
 SYSTEM_PROPERTY: nashorn.profilefile=<filename>
 When running with the profile callsite options (-pcs), Nashorn will
 dump profiling data for all callsites to stderr as a shutdown hook. To
 instead redirect this to a file, specify the path to the file using
 this system property.
 SYSTEM_PROPERTY: nashorn.regexp.impl=[jdk|joni]
 This property defines the regular expression engine to be used by
 Nashorn. The default implementation is "jdk" which is based on the
 JDK's java.util.regex package. Set this property to "joni" to install
 an implementation based on Joni, the regular expression engine used by
 the JRuby project.
 ===============
 . The loggers.
 ===============
 It is very simple to create your own logger. Use the DebugLogger class
 and give the subsystem name as a constructor argument.
 The Nashorn loggers can be used to print per-module or per-subsystem
 debug information with different levels of verbosity. The loggers for
 a given subsystem are available are enabled by using
 --log=<systemname>[:<level>]
 on the command line.
 Here <systemname> identifies the name of the subsystem to be logged
 and the optional colon and level argument is a standard
 java.util.logging.Level name (severe, warning, info, config, fine,
 finer, finest). If the level is left out for a particular subsystem,
 it defaults to "info". Any log message logged as the level or a level
 that is more important will be output to stderr by the logger.
 Several loggers can be enabled by a single command line option, by
 putting a comma after each subsystem/level tuple (or each subsystem if
 level is unspecified). The --log option can also be given multiple
 times on the same command line, with the same effect.
 For example: --log=codegen,fields:finest is equivalent to
 --log=codegen:info --log=fields:finest
 The subsystems that currently support logging are:
 * compiler
 The compiler is in charge of turning source code and function nodes
 into byte code, and installs the classes into a class loader
 controlled from the Context. Log messages are, for example, about
 things like new compile units being allocated. The compiler has global
 settings that all the tiers of codegen (e.g. Lower and CodeGenerator)
 use.s
 * codegen
 The code generator is the emitter stage of the code pipeline, and
 turns the lowest tier of a FunctionNode into bytecode. Codegen logging
 shows byte codes as they are being emitted, line number information
 and jumps. It also shows the contents of the bytecode stack prior to
 each instruction being emitted. This is a good debugging aid. For
 example:
 [codegen] #41                       line:2 (f)_afc824e
 [codegen] #42                           load symbol x slot=2
 [codegen] #43  {1:O}                    load int 0
 [codegen] #44  {2:I O}                  dynamic_runtime_call GT:ZOI_I args=2 returnType=boolean
 [codegen] #45                              signature (Ljava/lang/Object;I)Z
 [codegen] #46  {1:Z}                    ifeq  ternary_false_5402fe28
 [codegen] #47                           load symbol x slot=2
 [codegen] #48  {1:O}                    goto ternary_exit_107c1f2f
 [codegen] #49                       ternary_false_5402fe28
 [codegen] #50                           load symbol x slot=2
 [codegen] #51  {1:O}                    convert object -> double
 [codegen] #52  {1:D}                    neg
 [codegen] #53  {1:D}                    convert double -> object
 [codegen] #54  {1:O}                ternary_exit_107c1f2f
 [codegen] #55  {1:O}                    return object
 shows a ternary node being generated for the sequence "return x > 0 ?
 x : -x"
 The first number on the log line is a unique monotonically increasing
 emission id per bytecode. There is no guarantee this is the same id
 between runs.  depending on non deterministic code
 execution/compilation, but for small applications it usually is. If
 the system variable -Dnashorn.codegen.debug.trace=<x> is set, where x
 is a bytecode emission id, a stack trace will be shown as the
 particular bytecode is about to be emitted. This can be a quick way to
 determine where it comes from without attaching the debugger. "Who
 generated that neg?"
 The --log=codegen option is equivalent to setting the system variable
 "nashorn.codegen.debug" to true.
 * lower
 This is the first lowering pass.
 Lower is a code generation pass that turns high level IR nodes into
 lower level one, for example substituting comparisons to RuntimeNodes
 and inlining finally blocks.
 Lower is also responsible for determining control flow information
 like end points.
 * attr
 The lowering annotates a FunctionNode with symbols for each identifier
 and transforms high level constructs into lower level ones, that the
 CodeGenerator consumes.
 Lower logging typically outputs things like post pass actions,
 insertions of casts because symbol types have been changed and type
 specialization information. Currently very little info is generated by
 this logger. This will probably change.
 * finalize
 This --log=finalize log option outputs information for type finalization,
 the third tier of the compiler. This means things like placement of
 specialized scope nodes or explicit conversions.
 * fields
 The --log=fields option (at info level) is equivalent to setting the
 system variable "nashorn.fields.debug" to true. At the info level it
 will only show info about type assumptions that were invalidated. If
 the level is set to finest, it will also trace every AccessorProperty
 getter and setter in the program, show arguments, return values
 etc. It will also show the internal representation of respective field
 (Object in the normal case, unless running with the dual field
 representation)

Mercurial > jdk8-mips64-public > nashorn / annotate

docs/DEVELOPER_README@1a5d67424e83 (annotated)

docs/DEVELOPER_README