jdk8-mips64-public/nashorn: docs/DEVELOPER

8007545: jjs input evalinput need to be NOT_ENUMERABLE
Reviewed-by: sundar, lagergren
Contributed-by: james.laskey@oracle.com

     1 This document describes system properties that are used for internal

     2 debugging and instrumentation purposes, along with the system loggers,

     3 which are used for the same thing.

     5 This document is intended as a developer resource, and it is not

     6 needed as Nashorn documentation for normal usage. Flags and system

     7 properties described herein are subject to change without notice.

     9 =====================================

    10 1. System properties used internally

    11 =====================================

    13 This documentation of the system property flags assumes that the

    14 default value of the flag is false, unless otherwise specified.

    16 SYSTEM PROPERTY: -Dnashorn.unstable.relink.threshold=x

    18 This property controls how many call site misses are allowed before a

    19 callsite is relinked with "apply" semantics to never change again.

    20 In the case of megamorphic callsites, this is necessary, or the

    21 program would spend all its time swapping out callsite targets. Dynalink

    22 has a default value (currently 8 relinks) for this property if it

    23 is not explicitly set.
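
The effect of a relink threshold can be illustrated with a small model. This is only a sketch: makeCallSite, the "shape" strings and the returned labels are hypothetical and are not Nashorn or Dynalink APIs. It also assumes that the very first link counts as a miss and that the site gives up once the miss count exceeds the threshold; the real accounting may differ.

```javascript
// Hypothetical model of a call site with a relink threshold.
// None of these names come from Nashorn or Dynalink.
function makeCallSite(relinkThreshold) {
  var misses = 0;          // how many times the cached target did not match
  var cachedShape = null;  // the object layout the site is currently linked for
  var megamorphic = false; // once true, the site is never relinked again

  return {
    invoke: function (shape) {
      if (megamorphic) {
        return "generic";          // slow but stable target
      }
      if (shape === cachedShape) {
        return "fast:" + shape;    // cache hit, no relink needed
      }
      misses++;                    // cache miss
      if (misses > relinkThreshold) {
        megamorphic = true;        // stop swapping targets for good
        return "generic";
      }
      cachedShape = shape;         // relink for the new layout
      return "relinked:" + shape;
    },
    misses: function () { return misses; },
    isMegamorphic: function () { return megamorphic; }
  };
}
```

With makeCallSite(2), invoking the site with shapes "A", "A", "B", "C" yields a relink, a hit, a second relink and then the permanent generic target. Without the threshold, a site that keeps seeing new shapes would relink on every call.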

    26 SYSTEM PROPERTY: -Dnashorn.compiler.split.threshold=x

    28 This will change the node weight that requires a subgraph of the IR to

    29 be split into several classes in order not to run out of bytecode space.

    30 The default value is 0x8000 (32768).

    33 SYSTEM PROPERTY: -Dnashorn.callsiteaccess.debug

    35 See the description of the access logger below. This flag is

    36 equivalent to enabling the access logger with "info" level.

    39 SYSTEM PROPERTY: -Dnashorn.compiler.intarithmetic

    41 Arithmetic operations in Nashorn (except bitwise ones) typically

    42 coerce the operands to doubles (as per the JavaScript spec). To switch

    43 this off and remain in integer mode, for example for "var x = a&b; var

    44 y = c&d; var z = x*y;", use this flag. This will force the

    45 multiplication of variables that are ints to be done with the IMUL

    46 bytecode and the result "z" to become an int.

    48 WARNING: Note that this is experimental only to ensure that type support

    49 exists for all primitive types. The generated code is unsound. This

    50 will be the case until we do optimizations based on it. There is a CR

    51 in Nashorn to do better range analysis, and ensure that this is only

    52 done where the operation can't overflow into a wider type. Currently

    53 no overflow checking is done, so at the moment, until range analysis

    54 has been completed, this option is turned off.
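
The unsoundness is easy to see on the example above once the product overflows 32 bits. The following sketch runs on a modern JavaScript engine (for example Node.js); Math.imul is used here only as a stand-in for what an unchecked IMUL would compute and is not what Nashorn emits. The input values are chosen for illustration.

```javascript
// Inputs picked so that the product does not fit in a 32-bit int.
var a = 0x7fffffff, b = 0x7fffffff;
var c = 2, d = 3;

var x = a & b; // bitwise result, always a 32-bit int: 2147483647
var y = c & d; // 2

var zDouble = x * y;        // JavaScript semantics (double): 4294967294
var zInt = Math.imul(x, y); // 32-bit wrapping multiply, like IMUL: -2
```

The two results differ, which is why the flag cannot be applied safely without either range analysis or an overflow check.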

    56 We've experimented by using int arithmetic for everything and putting

    57 overflow checks afterwards, which would recompute the operation with

    58 the correct precision, but have yet to find a configuration where this

    59 is faster than just using doubles directly, even if the int operation

    60 does not overflow. Getting access to a JVM intrinsic that does branch

    61 on overflow would probably alleviate this.

    63 There is also a problem with this optimistic approach if the symbol

    64 happens to reside in a local variable slot in the bytecode, as those

    65 are strongly typed. Then we would need to split large sections of

    66 control flow, so this is probably not the right way to go, while range

    67 analysis is. There is a large difference between integer bytecode

    68 without overflow checks and double bytecode. The former is

    69 significantly faster.

    72 SYSTEM PROPERTY: -Dnashorn.codegen.debug, -Dnashorn.codegen.debug.trace=<x>

    74 See the description of the codegen logger below.

    77 SYSTEM PROPERTY: -Dnashorn.fields.debug

    79 See the description of the fields logger below.

    82 SYSTEM PROPERTY: -Dnashorn.fields.dual

    84 When this property is true, Nashorn will attempt to use primitive

    85 fields for AccessorProperties (currently just AccessorProperties, not

    86 spill properties). Memory footprint for script objects will increase,

    87 as we need to maintain both a primitive field (a long) as well as an

    88 Object field for the property value. Ints are represented as the 32

    89 low bits of the long fields. Doubles are represented as the

    90 doubleToLongBits of their value. This way a single field can be used

    91 for all primitive types. Packing and unpacking doubles to their bit

    92 representation is intrinsified by the JVM and extremely fast.
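
The packing scheme can be sketched as follows. This is an illustration only, written for a modern JavaScript engine with BigInt support; the function names are hypothetical. It models the 64-bit field as an unsigned BigInt, whereas Java's Double.doubleToLongBits returns a signed long and canonicalizes NaN values. Zeroing the upper 32 bits for ints is an assumption.

```javascript
// Scratch buffer used to reinterpret a double as 64 raw bits and back.
var scratch = new DataView(new ArrayBuffer(8));

// Store an int in the low 32 bits of the 64-bit field (upper bits zero here).
function packInt(i) {
  return BigInt.asUintN(32, BigInt(i | 0));
}

// Read the low 32 bits back as a signed int.
function unpackInt(bits) {
  return Number(BigInt.asIntN(32, bits));
}

// Store a double as its IEEE 754 bit pattern.
function packDouble(d) {
  scratch.setFloat64(0, d);
  return scratch.getBigUint64(0);
}

// Recover the double from its bit pattern.
function unpackDouble(bits) {
  scratch.setBigUint64(0, bits);
  return scratch.getFloat64(0);
}
```

For example, unpackInt(packInt(-5)) gives back -5 and unpackDouble(packDouble(3.25)) gives back 3.25, so one 64-bit slot can carry either primitive type as long as the reader knows which one was written.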

    94 While dual fields in theory run significantly faster than Object

    95 fields due to reduction of boxing and memory allocation overhead,

    96 there is still work to be done to make this a general purpose

    97 solution. Research is ongoing.

    99 In the future, this might complement or be replaced by the experimental

   100 feature sun.misc.TaggedArray, which has been discussed on the mlvm

   101 mailing list. TaggedArrays are basically a way to share data space

   102 between primitives and references, and have the GC understand this.

   104 As long as only primitive values are written to the fields and enough

   105 type information exists to make sure that any reads don't have to be

   106 uselessly boxed and unboxed, this is significantly faster than the

   107 standard "Objects only" approach that currently is the default. See

   108 test/examples/dual-fields-micro.js for an example that runs twice as

   109 fast with dual fields as without them. Here, the compiler can

   110 determine that we are dealing with numbers only throughout the entire

   111 property life span of the properties involved.

   113 If a "real" object (not a boxed primitive) is written to a field that

   114 has a primitive representation, its callsite is relinked and an Object

   115 field is used forevermore for that particular field in that

   116 PropertyMap and its children, even if primitives are later assigned to

   117 it.

   119 As the amount of compile time type information is very small in a

   120 dynamic language like JavaScript, it is frequently the case that

   121 something has to be treated as an object, because we don't know any

   122 better. In reality though, it is often a boxed primitive that is stored to

   123 an AccessorProperty. The fastest way to handle this soundly is to use

   124 a callsite typecheck and avoid blowing the field up to an Object. We

   125 never revert object fields to primitives. Ping-ponging back and forth

   126 between primitive representation and Object representation would cause

   127 fatal performance overhead, so this is not an option.

   129 For a general application the dual fields approach is still slower

   130 than objects only fields in some places, about the same in most cases,

   131 and significantly faster in very few. This is due to the program using

   132 primitives, but we still can't prove it. For example "local_var a =

   133 call(); field = a;" may very well write a double to the field, but the

   134 compiler dare not guess a double type if field is a local variable,

   135 due to bytecode variables being strongly typed and later non

   136 interchangeable. To get around this, the entire method would have to

   137 be replaced and a continuation retained to restart from. We believe

   138 that the next steps we should go through are instead:

   140 1) Implement method specialization based on callsite, as it's quite

   141 frequently the case that numbers are passed around, but currently our

   142 function nodes just have object types visible to the compiler. For

   143 example "var b = 17; func(a,b,17)" is an example where two parameters

   144 can be specialized, but the main version of func might also be called

   145 from another callsite with func(x,y,"string").

   147 2) This requires lazy jitting as the functions have to be specialized

   148 per callsite.

   150 Even though "function square(x) { return x*x }" might look like a

   151 trivial function that can always only take doubles, this is not

   152 true. Someone might have overridden the valueOf for x so that the

   153 toNumber coercion has side effects. To fulfil JavaScript semantics,

   154 the coercion has to run twice for both terms of the multiplication

   155 even if they are the same object. This means that call site

   156 specialization is necessary, not parameter specialization of the form

   157 "function square(x) { var xd = (double)x; return xd*xd; }", as one

   158 might first think.

   160 Generating a method specialization for any variant of a function that

   161 we can determine by types at compile time is a combinatorial explosion

   162 of byte code (try it e.g. on all the variants of am3 in the Octane

   163 benchmark crypto.js). Thus, this needs to be lazy.

   165 3) Possibly optimistic callsite writes, something of the form

   167 x = y; //x is a field known to be a primitive. y is only an object as

   168 far as we can tell

   170 turns into

   172 try {

   173   x = (int)y;

   174 } catch (X is not an integer field right now | ClassCastException e) {

   175   x = y;

   176 }

   178 Mini POC shows that this is the key to a lot of dual field performance

   179 in seemingly trivial micros where one unknown object, in reality

   180 actually a primitive, foils it for us. Very common pattern. Once we

   181 are "all primitives", dual fields run a lot faster than Object fields

   182 only.
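
The optimistic write from step 3, combined with the rule that a field is never reverted to a primitive once it has been widened, can be modelled roughly like this. It is a sketch under stated assumptions: makeField and its methods are hypothetical names, the "int" test is a plain JavaScript check rather than a callsite typecheck, and the real implementation relinks callsites instead of branching on a flag.

```javascript
// Hypothetical model of a field that starts out with an int representation
// and is widened to an Object representation at most once.
function makeField() {
  var primitive = true; // current representation of the field
  var intValue = 0;     // storage used while the field is primitive
  var objValue = null;  // storage used after the field has been widened
  var widenings = 0;    // how often the representation changed

  // True for numbers that fit a 32-bit int exactly (rejects -0, NaN, 1.5).
  function isInt(v) {
    return typeof v === "number" && Object.is(v | 0, v);
  }

  return {
    set: function (y) {
      if (primitive && isInt(y)) {
        intValue = y;         // fast path, the "x = (int)y" branch
        return;
      }
      if (primitive) {        // the "catch" branch: widen once, never go back
        primitive = false;
        widenings++;
      }
      objValue = y;
    },
    get: function () { return primitive ? intValue : objValue; },
    isPrimitive: function () { return primitive; },
    widenings: function () { return widenings; }
  };
}
```

Writing 1, then "s", then 2 to such a field leaves it in the Object representation after the second write, and the later int write does not switch it back. This avoids bouncing between representations.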

   184 We still have to deal with objects vs primitives for local bytecode

   185 slots, possibly through code copying and versioning.
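
The remark under step 2 about "function square(x)" can be checked directly. The sketch below uses hypothetical helper names (squareCoercedOnce, makeCounter) and plain JavaScript, with Number(x) standing in for the "(double)x" cast in the text.

```javascript
// The original function: each operand of the multiplication is coerced.
function square(x) {
  return x * x;
}

// The tempting but incorrect rewrite: coerce once, then multiply.
function squareCoercedOnce(x) {
  var xd = Number(x);
  return xd * xd;
}

// An object whose valueOf has a side effect: it returns 1, then 2, then 3...
function makeCounter() {
  var calls = 0;
  return { valueOf: function () { calls++; return calls; } };
}

var viaSquare = square(makeCounter());                 // 1 * 2 = 2
var viaCoercedOnce = squareCoercedOnce(makeCounter()); // 1 * 1 = 1
```

Because the two versions disagree for such an argument, a specialization may only be selected where the callsite shows that the argument really is a number.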

   188 SYSTEM PROPERTY: -Dnashorn.compiler.symbol.trace=[<x>[,*]],

   189   -Dnashorn.compiler.symbol.stacktrace=[<x>[,*]]

   191 When this property is set, creation and manipulation of any symbol

   192 named "x" will show information about when the compiler changes its

   193 type assumption, bytecode local variable slot assignment and other

   194 data. This is useful if, for example, a symbol shows up as an Object,

   195 when you believe it should be a primitive. Usually there is an

   196 explanation for this, for example that it exists in the global scope

   197 and type analysis has to be more conservative.

   199 Several symbol names to watch can be specified by comma separation.

   201 If no variable name is specified (and no equals sign), all symbols

   202 will be watched.

   204 By using "stacktrace" instead of or together with "trace", stack

   205 traces will be displayed upon symbol changes according to the same

   206 semantics.

   209 SYSTEM PROPERTY: nashorn.lexer.xmlliterals

   211 If this property is set, it means that the Lexer should attempt to

   212 parse XML literals, which would otherwise generate syntax

   213 errors. Warning: there are currently no unit tests for this

   214 functionality.

   216 XML literals, when this is enabled, end up as standard LiteralNodes in

   217 the IR.

   220 SYSTEM PROPERTY: nashorn.debug

   222 If this property is set to true, Nashorn runs in Debug mode. Debug

   223 mode is slightly slower, as for example statistics counters are enabled

   224 during the run. Debug mode makes available a NativeDebug instance

   225 called "Debug" in the global space that can be used to print property

   226 maps and layout for script objects, as well as a "dumpCounters" method

   227 that will print the current values of the previously mentioned stats

   228 counters.

   230 These functions currently exist for Debug:

   232 "map" - print(Debug.map(x)) will dump the PropertyMap for object x to

   233 stdout. Currently there also exist functions called "embedX", where X

   234 is a value from 0 to 3, that will dump the contents of the embed pool

   235 for the first spill properties in any script object, and "spill", that

   236 will dump the contents of the growing spill pool of spill properties

   237 in any script object. This is of course subject to change without

   238 notice, should we change the script object layout.

   240 "methodHandle" - this method returns the method handle that is used

   241 for invoking a particular script function.

   243 "identical" - this method compares two script objects for reference

   244 equality. It is a == Java comparison.

   246 "dumpCounters" - will dump the debug counters' current values to

   247 stdout.

   249 Currently we count number of ScriptObjects in the system, number of

   250 Scope objects in the system, number of ScriptObject listeners added,

   251 removed and dead (without references).

   253 We also count number of ScriptFunctions, ScriptFunction invocations

   254 and ScriptFunction allocations.

   256 Furthermore we count PropertyMap statistics: how many property maps

   257 exist, how many times property maps were cloned, how many times

   258 the property map history cache hit, preventing new allocations, how many

   259 prototype invalidations were done, and how many times the property map

   260 proto cache hit.

   262 Finally we count callsite misses on a per callsite basis, which occur

   263 when a callsite has to be relinked, due to a previous assumption of

   264 object layout being invalidated.

   267 SYSTEM PROPERTY: nashorn.methodhandles.debug,

   268 nashorn.methodhandles.debug=create

   270 If this property is enabled, each MethodHandle related call that uses

   271 the java.lang.invoke package gets its MethodHandle intercepted and an

   272 instrumentation printout of arguments and return value appended to

   273 it. This shows exactly which method handles are executed and from

   274 where. (Also MethodTypes and SwitchPoints). This can be augmented with

   275 more information, for example, instance count, by subclassing or

   276 further extending the TraceMethodHandleFactory implementation in

   277 MethodHandleFactory.java.

   279 If the property is specialized with "=create" as its option,

   280 instrumentation will be shown for method handles upon creation time

   281 rather than at runtime usage.

   284 SYSTEM PROPERTY: nashorn.methodhandles.debug.stacktrace

   286 This does the same as nashorn.methodhandles.debug, but when enabled

   287 also dumps the stack trace for every instrumented method handle

   288 operation. Warning: This is enormously verbose, but provides a pretty

   289 decent "grep:able" picture of where the calls are coming from.

   291 See the description of the codegen logger below for a more verbose

   292 description of this option.

   295 SYSTEM PROPERTY: nashorn.scriptfunction.specialization.disable

   297 There are several "fast path" implementations of constructors and

   298 functions in the NativeObject classes that, in their original form,

   299 take a variable number of arguments. Said functions are also declared

   300 to take Object parameters in their original form, as this is what the

   301 JavaScript specification mandates.

   303 However, we often know quite a lot more at a callsite of one of these

   304 functions. For example, Math.min is called with a fixed number (2) of

   305 integer arguments. Boxing these ints to Objects and

   306 folding them into an Object array for the generic varargs Math.min

   307 function is an order of magnitude slower than calling a specialized

   308 implementation of Math.min that takes two integers. Specialized

   309 functions and constructors are identified by the tags

   310 @SpecializedFunction and @SpecializedConstructor in the Nashorn

   311 code. The linker will link in the most appropriate (narrowest types,

   312 right number of types and least number of arguments) specialization if

   313 specializations are available.

   315 Every ScriptFunction may carry specializations that the linker can

   316 choose from. This framework will likely be extended for user defined

   317 functions. The compiler can often infer enough parameter type info

   318 from callsites in order to generate simpler versions with less

   319 generic Object types. This feature depends on future lazy jitting, as

   320 there tend to be many calls to user defined functions, some where the

   321 callsite can be specialized, some where we mostly see object

   322 parameters even at the callsite.

   324 If this system property is set to true, the linker will not attempt to

   325 use any specialized function or constructor for native objects, but

   326 just call the generic one.

   329 SYSTEM PROPERTY: nashorn.tcs.miss.samplePercent=<x>

   331 When running with the trace callsite option (-tcs), Nashorn will count

   332 and instrument any callsite misses that require relinking. As the

   333 number of relinks is large and usually produces a lot of output, this

   334 system property can be used to constrain the percentage of misses that

   335 should be logged. Typically this is set to 1 or 5 (percent). 1% is the

   336 default value.

   339 SYSTEM PROPERTY: nashorn.profilefile=<filename>

   341 When running with the profile callsite options (-pcs), Nashorn will

   342 dump profiling data for all callsites to stderr as a shutdown hook. To

   343 instead redirect this to a file, specify the path to the file using

   344 this system property.

   347 ===============

   348 2. The loggers.

   349 ===============

   351 It is very simple to create your own logger. Use the DebugLogger class

   352 and give the subsystem name as a constructor argument.

   354 The Nashorn loggers can be used to print per-module or per-subsystem

   355 debug information with different levels of verbosity. The loggers for

   356 a given subsystem are enabled by using

   358 --log=<systemname>[:<level>]

   360 on the command line.

   362 Here <systemname> identifies the name of the subsystem to be logged

   363 and the optional colon and level argument is a standard

   364 java.util.logging.Level name (severe, warning, info, config, fine,

   365 finer, finest). If the level is left out for a particular subsystem,

   366 it defaults to "info". Any log message logged as the level or a level

   367 that is more important will be output to stderr by the logger.

   369 Several loggers can be enabled by a single command line option, by

   370 putting a comma after each subsystem/level tuple (or each subsystem if

   371 level is unspecified). The --log option can also be given multiple

   372 times on the same command line, with the same effect.

   374 For example: --log=codegen,fields:finest is equivalent to

   375 --log=codegen:info --log=fields:finest
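
The option syntax and the level rule described above can be summarized in a short sketch. parseLogOptions and isLoggable are hypothetical helpers written for illustration; they are not part of Nashorn, and Nashorn's own option parsing may behave differently on malformed input.

```javascript
// Level names from most important to least important, as listed above.
var LEVELS = ["severe", "warning", "info", "config", "fine", "finer", "finest"];

// Collect subsystem -> level from every --log=... argument on a command line.
function parseLogOptions(args) {
  var enabled = {};
  args.forEach(function (arg) {
    if (arg.indexOf("--log=") !== 0) {
      return; // not a log option
    }
    arg.substring("--log=".length).split(",").forEach(function (entry) {
      if (entry === "") {
        return; // tolerate a trailing comma
      }
      var colon = entry.indexOf(":");
      var name = colon < 0 ? entry : entry.substring(0, colon);
      var level = colon < 0 ? "info" : entry.substring(colon + 1);
      enabled[name] = level; // level defaults to "info" when left out
    });
  });
  return enabled;
}

// A message is output if it is at the enabled level or a more important one.
function isLoggable(enabled, name, messageLevel) {
  if (!Object.prototype.hasOwnProperty.call(enabled, name)) {
    return false; // subsystem not enabled at all
  }
  var wanted = LEVELS.indexOf(enabled[name]);
  var actual = LEVELS.indexOf(messageLevel);
  return wanted >= 0 && actual >= 0 && actual <= wanted;
}
```

In this model, parseLogOptions(["--log=codegen,fields:finest"]) and parseLogOptions(["--log=codegen:info", "--log=fields:finest"]) produce the same mapping, and a "fine" message from codegen is dropped while a "warning" message is kept.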

   377 The subsystems that currently support logging are:

   380 * compiler

   382 The compiler is in charge of turning source code and function nodes

   383 into byte code, and installs the classes into a class loader

   384 controlled from the Context. Log messages are, for example, about

   385 things like new compile units being allocated. The compiler has global

   386 settings that all the tiers of codegen (e.g. Lower and CodeGenerator)

   387 use.

   390 * codegen

   392 The code generator is the emitter stage of the code pipeline, and

   393 turns the lowest tier of a FunctionNode into bytecode. Codegen logging

   394 shows byte codes as they are being emitted, line number information

   395 and jumps. It also shows the contents of the bytecode stack prior to

   396 each instruction being emitted. This is a good debugging aid. For

   397 example:

   399 [codegen] #41                       line:2 (f)_afc824e

   400 [codegen] #42                           load symbol x slot=2

   401 [codegen] #43  {1:O}                    load int 0

   402 [codegen] #44  {2:I O}                  dynamic_runtime_call GT:ZOI_I args=2 returnType=boolean

   403 [codegen] #45                              signature (Ljava/lang/Object;I)Z

   404 [codegen] #46  {1:Z}                    ifeq  ternary_false_5402fe28

   405 [codegen] #47                           load symbol x slot=2

   406 [codegen] #48  {1:O}                    goto ternary_exit_107c1f2f

   407 [codegen] #49                       ternary_false_5402fe28

   408 [codegen] #50                           load symbol x slot=2

   409 [codegen] #51  {1:O}                    convert object -> double

   410 [codegen] #52  {1:D}                    neg

   411 [codegen] #53  {1:D}                    convert double -> object

   412 [codegen] #54  {1:O}                ternary_exit_107c1f2f

   413 [codegen] #55  {1:O}                    return object

   415 shows a ternary node being generated for the sequence "return x > 0 ?

   416 x : -x"

   418 The first number on the log line is a unique monotonically increasing

   419 emission id per bytecode. There is no guarantee this is the same id

   420 between runs, depending on non-deterministic code

   421 execution/compilation, but for small applications it usually is. If

   422 the system variable -Dnashorn.codegen.debug.trace=<x> is set, where x

   423 is a bytecode emission id, a stack trace will be shown as the

   424 particular bytecode is about to be emitted. This can be a quick way to

   425 determine where it comes from without attaching the debugger. "Who

   426 generated that neg?"

   428 The --log=codegen option is equivalent to setting the system variable

   429 "nashorn.codegen.debug" to true.

   432 * lower

   434 This is the first lowering pass.

   436 Lower is a code generation pass that turns high level IR nodes into

   437 lower level ones, for example substituting comparisons to RuntimeNodes

   438 and inlining finally blocks.

   440 Lower is also responsible for determining control flow information

   441 like end points.

   444 * attr

   446 The lowering annotates a FunctionNode with symbols for each identifier

   447 and transforms high level constructs into lower level ones, that the

   448 CodeGenerator consumes.

   450 Lower logging typically outputs things like post pass actions,

   451 insertions of casts because symbol types have been changed and type

   452 specialization information. Currently very little info is generated by

   453 this logger. This will probably change.

   456 * finalize

   458 The --log=finalize log option outputs information for type finalization,

   459 the third tier of the compiler. This means things like placement of

   460 specialized scope nodes or explicit conversions.

   463 * fields

   465 The --log=fields option (at info level) is equivalent to setting the

   466 system variable "nashorn.fields.debug" to true. At the info level it

   467 will only show info about type assumptions that were invalidated. If

   468 the level is set to finest, it will also trace every AccessorProperty

   469 getter and setter in the program, show arguments, return values

   470 etc. It will also show the internal representation of the respective field

   471 (Object in the normal case, unless running with the dual field

   472 representation).

docs/DEVELOPER_README@ec4d59c9b8d2
