ASSEMBLY_EXCEPTION

Mon, 30 May 2016 02:01:38 -0400

author
aoqi
date
Mon, 30 May 2016 02:01:38 -0400
changeset 13
bc227c49eaae
parent 0
f90c822e73f8
child 6876
710a3c8b516e
permissions
-rw-r--r--

[C2] Rewrite generate_disjoint_short_copy.
Eliminate unaligned accesses and optimize the copy algorithm.
xml.transform improved by 50%; total GEO improved by 13%.
Copy Algorithm:
Generate stub for disjoint short copy. If "aligned" is true, the
"from" and "to" addresses are assumed to be heapword aligned.

Arguments for generated stub:
from: A0
to: A1
elm.count: A2 treated as signed
one element: 2 bytes

Strategy for aligned==true:

If length <= 9:
1. copy 1 element at a time (l_5)

If length > 9:
1. copy 4 elements at a time until less than 4 elements are left (l_7)
2. copy 2 elements at a time until less than 2 elements are left (l_6)
3. copy last element if one was left in step 2. (l_1)
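
The aligned==true strategy above can be modelled in plain C++ as a rough sketch. This is not the generated stub: the function name copy_shorts_aligned is hypothetical, std::memcpy stands in for the 8-byte and 4-byte load/store pairs, and "heapword aligned" is assumed to mean 8-byte aligned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical model of the aligned==true strategy; not the generated stub.
// Assumes from/to are 8-byte aligned and the two ranges do not overlap.
void copy_shorts_aligned(const std::uint16_t* from, std::uint16_t* to,
                         std::ptrdiff_t count) {
    if (count <= 9) {
        // Short case: one 2-byte element at a time (l_5).
        // A negative count (treated as signed) copies nothing.
        for (std::ptrdiff_t i = 0; i < count; ++i) to[i] = from[i];
        return;
    }
    assert(reinterpret_cast<std::uintptr_t>(from) % 8 == 0);
    assert(reinterpret_cast<std::uintptr_t>(to) % 8 == 0);

    std::ptrdiff_t i = 0;
    // Step 1: four elements (8 bytes) per iteration (l_7).
    for (; count - i >= 4; i += 4) std::memcpy(to + i, from + i, 8);
    // Step 2: two elements (4 bytes) per iteration (l_6).
    for (; count - i >= 2; i += 2) std::memcpy(to + i, from + i, 4);
    // Step 3: the last element, if one is left (l_1).
    if (i < count) to[i] = from[i];
}
```

With count = 23, for example, this sketch performs five 8-byte copies, one 4-byte copy and one final 2-byte copy.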


Strategy for aligned==false:

If length <= 9: same as aligned==true case

If length > 9:
1. continue with step 7 if from and to have different alignment
mod 4.
2. align from and to to 4 bytes by copying 1 element if necessary.
3. at l_2, from and to are 4-byte aligned; continue with step 6
if they cannot be aligned to 8 bytes because they have
different alignment mod 8.
4. at this point both from and to have the same alignment
mod 8; copy one element if necessary to get 8-byte alignment
of from and to.
5. copy 4 elements at a time until less than 4 elements are
left; depending on step 3, all loads/stores are aligned.
6. copy 2 elements at a time until less than 2 elements are
left. (l_6)
7. copy 1 element at a time. (l_5)
8. copy last element if one was left in step 6. (l_1)
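
The aligned==false decision sequence can be sketched the same way. Again this is only a model with a hypothetical name (copy_shorts_unaligned), not the stub itself. Two assumptions are worth flagging: both pointers are at least 2-byte aligned, and the "one element" copied in step 4 is taken to be one 4-byte unit (two 2-byte elements), since that is what moves a 4-byte-aligned address to an 8-byte boundary.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical model of the aligned==false strategy; not the generated stub.
// Assumes from/to are at least 2-byte aligned and the ranges do not overlap.
void copy_shorts_unaligned(const std::uint16_t* from, std::uint16_t* to,
                           std::ptrdiff_t count) {
    auto addr = [](const void* p) { return reinterpret_cast<std::uintptr_t>(p); };
    std::ptrdiff_t i = 0;

    if (count > 9) {
        // Step 1: wide copies are only possible if both share alignment mod 4.
        if (((addr(from) ^ addr(to)) & 3) == 0) {
            // Step 2: copy one element to reach 4-byte alignment if needed.
            if (addr(from) & 3) { to[0] = from[0]; i = 1; }
            // Step 3: 8-byte copies need the same alignment mod 8.
            if (((addr(from + i) ^ addr(to + i)) & 7) == 0) {
                // Step 4 (assumption): copy one 4-byte unit to reach 8 bytes.
                if (addr(from + i) & 7) { std::memcpy(to + i, from + i, 4); i += 2; }
                // Step 5: four elements (8 bytes) per iteration.
                for (; count - i >= 4; i += 4) std::memcpy(to + i, from + i, 8);
            }
            // Step 6: two elements (4 bytes) per iteration (l_6).
            for (; count - i >= 2; i += 2) std::memcpy(to + i, from + i, 4);
        }
    }
    // Steps 7 and 8: whatever remains goes one element at a time (l_5, l_1).
    for (; i < count; ++i) to[i] = from[i];
}
```

Because every wide copy in this model is guarded by an alignment check on both addresses, no 4-byte or 8-byte access is issued at a misaligned address, which is the property the commit message describes.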

TODO:

1. use Loongson 128-bit load/store
2. use loop unrolling when len is big enough, for example if
len > 0x2000:
__ bind(l_x);
__ ld(AT, tmp1, 0);
__ ld(tmp, tmp1, 8);
__ sd(AT, tmp2, 0);
__ sd(tmp, tmp2, 8);
__ ld(AT, tmp1, 16);
__ ld(tmp, tmp1, 24);
__ sd(AT, tmp2, 16);
__ sd(tmp, tmp2, 24);
__ daddi(tmp1, tmp1, 32);
__ daddi(tmp2, tmp2, 32);
__ daddi(tmp3, tmp3, -16);
__ daddi(AT, tmp3, -16);
__ bgez(AT, l_x);
__ delayed()->nop();
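
For readers less familiar with the assembler DSL, the unrolled loop in the TODO can be read roughly as the C++ below. It is a sketch under assumptions: the name copy_blocks_of_16 is hypothetical, tmp1/tmp2/tmp3 are taken to hold the source address, the destination address and the remaining element count, and std::memcpy stands in for each 8-byte ld/sd. Each iteration moves 32 bytes, i.e. 16 two-byte elements.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical model of the unrolled loop from the TODO; not the stub itself.
// Copies whole 16-element (32-byte) blocks and returns the number of elements
// copied, leaving the tail (fewer than 16 elements) to the caller.
std::ptrdiff_t copy_blocks_of_16(const std::uint16_t* from, std::uint16_t* to,
                                 std::ptrdiff_t count) {
    std::ptrdiff_t i = 0;
    while (count - i >= 16) {
        std::uint64_t a, b;                  // play the role of AT and tmp
        std::memcpy(&a, from + i, 8);        // ld AT,  tmp1, 0
        std::memcpy(&b, from + i + 4, 8);    // ld tmp, tmp1, 8
        std::memcpy(to + i, &a, 8);          // sd AT,  tmp2, 0
        std::memcpy(to + i + 4, &b, 8);      // sd tmp, tmp2, 8
        std::memcpy(&a, from + i + 8, 8);    // ld AT,  tmp1, 16
        std::memcpy(&b, from + i + 12, 8);   // ld tmp, tmp1, 24
        std::memcpy(to + i + 8, &a, 8);      // sd AT,  tmp2, 16
        std::memcpy(to + i + 12, &b, 8);     // sd tmp, tmp2, 24
        i += 16;                             // pointers += 32 bytes, count -= 16
    }
    return i;
}
```

One difference from the assembly: the snippet tests its condition at the bottom of the loop, so it presumably expects at least 16 elements on entry, whereas this model checks before the first block.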

OPENJDK ASSEMBLY EXCEPTION

The OpenJDK source code made available by Oracle at openjdk.java.net and
openjdk.dev.java.net ("OpenJDK Code") is distributed under the terms of the
GNU General Public License <http://www.gnu.org/copyleft/gpl.html> version 2
only ("GPL2"), with the following clarification and special exception.

Linking this OpenJDK Code statically or dynamically with other code
is making a combined work based on this library. Thus, the terms
and conditions of GPL2 cover the whole combination.

As a special exception, Oracle gives you permission to link this
OpenJDK Code with certain code licensed by Oracle as indicated at
http://openjdk.java.net/legal/exception-modules-2007-05-08.html
("Designated Exception Modules") to produce an executable,
regardless of the license terms of the Designated Exception Modules,
and to copy and distribute the resulting executable under GPL2,
provided that the Designated Exception Modules continue to be
governed by the licenses under which they were offered by Oracle.

As such, it allows licensees and sublicensees of Oracle's GPL2 OpenJDK Code to
build an executable that includes those portions of necessary code that Oracle
could not provide under GPL2 (or that Oracle has provided under GPL2 with the
Classpath exception). If you modify or add to the OpenJDK code, that new
GPL2 code may still be combined with Designated Exception Modules if the
new code is made subject to this exception by its copyright holder.
