Mon, 30 May 2016 02:01:38 -0400
[C2] Rewrite generate_disjoint_short_copy.
Eliminated unaligned accesses and optimized the copy algorithm.
xml.transform improved by 50%, total GEO improved by 13%.
Copy Algorithm:
Generate stub for disjoint short copy. If "aligned" is true, the
"from" and "to" addresses are assumed to be heapword aligned.
Arguments for generated stub:
from: A0
to: A1
elm.count: A2 treated as signed
one element: 2 bytes
Strategy for aligned==true:
If length <= 9:
1. copy 1 element at a time (l_5)
If length > 9:
1. copy 4 elements at a time until fewer than 4 elements are left (l_7)
2. copy 2 elements at a time until fewer than 2 elements are left (l_6)
3. copy the last element if one was left over in step 2 (l_1)
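The aligned strategy above can be modelled in plain C++ roughly as follows. This is only an illustrative sketch, not the generated stub: the function name `copy_shorts_aligned` is hypothetical, "heapword aligned" is assumed to mean 8-byte aligned, and `std::memcpy` stands in for the 8-byte and 4-byte load/store pairs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of the patch): models the aligned == true path.
// `count` is signed, mirroring "elm.count: A2 treated as signed".
inline void copy_shorts_aligned(const std::int16_t* from, std::int16_t* to,
                                std::ptrdiff_t count) {
    // Assumption: "heapword aligned" means 8-byte aligned (64-bit VM).
    assert(reinterpret_cast<std::uintptr_t>(from) % 8 == 0);
    assert(reinterpret_cast<std::uintptr_t>(to) % 8 == 0);

    if (count > 9) {
        // Step 1 (l_7): 4 elements = one 8-byte word per iteration.
        for (; count >= 4; count -= 4, from += 4, to += 4) {
            std::uint64_t word;
            std::memcpy(&word, from, sizeof word);
            std::memcpy(to, &word, sizeof word);
        }
        // Step 2 (l_6): 2 elements = one 4-byte word per iteration.
        for (; count >= 2; count -= 2, from += 2, to += 2) {
            std::uint32_t word;
            std::memcpy(&word, from, sizeof word);
            std::memcpy(to, &word, sizeof word);
        }
        // Step 3 (l_1): at most one element is left over.
        if (count == 1) *to = *from;
        return;
    }
    // length <= 9 (l_5): one element at a time.
    for (; count > 0; --count) *to++ = *from++;
}
```

Because both pointers start 8-byte aligned and step 1 only advances in 8-byte units, every wide access in this model stays naturally aligned.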
Strategy for aligned==false:
If length <= 9: same as aligned==true case
If length > 9:
1. continue with step 7 if the alignment of from and to mod 4
is different.
2. align from and to to 4 bytes by copying 1 element if necessary.
3. at l_2 from and to are 4-byte aligned; continue with
step 6 if they cannot be aligned to 8 bytes because their
alignment mod 8 differs.
4. at this point we know that both from and to have the same
alignment mod 8; now copy 4 bytes (two elements) if necessary to get
8-byte alignment of from and to.
5. copy 4 elements at a time until fewer than 4 elements are
left; because of the check in step 3, all loads/stores are aligned.
6. copy 2 elements at a time until fewer than 2 elements are
left. (l_6)
7. copy 1 element at a time. (l_5)
8. copy the last element if one was left over in step 6. (l_1)
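A rough C++ model of the unaligned strategy is sketched below. It is an illustration under assumptions, not the stub itself: the `sketch` namespace and all function names are hypothetical, both pointers are assumed to be at least 2-byte aligned, and the alignment fix-up before the 8-byte loop copies one 4-byte word (two elements), since a single 2-byte element cannot move a 4-byte-aligned address to an 8-byte boundary. Steps 7 and 8 are folded into one trailing loop.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

namespace sketch {  // hypothetical namespace, for illustration only

inline std::uintptr_t addr(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p);
}

// Copy in units of Word while at least one whole Word of elements remains.
template <typename Word>
inline void copy_words(const std::int16_t*& from, std::int16_t*& to,
                       std::ptrdiff_t& count) {
    constexpr std::ptrdiff_t per = sizeof(Word) / sizeof(std::int16_t);
    for (; count >= per; count -= per, from += per, to += per) {
        Word w;
        std::memcpy(&w, from, sizeof w);
        std::memcpy(to, &w, sizeof w);
    }
}

inline void copy_shorts_unaligned(const std::int16_t* from, std::int16_t* to,
                                  std::ptrdiff_t count) {
    // Step 1: wide copies are only attempted when from and to agree mod 4.
    if (count > 9 && (addr(from) ^ addr(to)) % 4 == 0) {
        // Step 2: copy one element if needed to reach 4-byte alignment.
        if (addr(from) % 4 != 0) { *to++ = *from++; --count; }
        // Step 3 (l_2): 8-byte copies need the same alignment mod 8.
        if ((addr(from) ^ addr(to)) % 8 == 0) {
            // Step 4: copy one 4-byte word (two elements) if needed.
            if (addr(from) % 8 != 0) {
                std::uint32_t w;
                std::memcpy(&w, from, sizeof w);
                std::memcpy(to, &w, sizeof w);
                from += 2; to += 2; count -= 2;
            }
            // Step 5: four elements (8 bytes) at a time, all aligned.
            copy_words<std::uint64_t>(from, to, count);
        }
        // Step 6 (l_6): two elements (4 bytes) at a time.
        copy_words<std::uint32_t>(from, to, count);
    }
    // Steps 7 and 8 (l_5, l_1): whatever is left, one element at a time.
    for (; count > 0; --count) *to++ = *from++;
}

}  // namespace sketch
```

For example, `sketch::copy_shorts_unaligned(src + 1, dst + 3, 20)` takes the wide path because both addresses have the same remainder mod 4, while `src + 1` paired with `dst + 2` falls straight through to the element-wise loop.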
TODO:
1. use Loongson 128-bit load/store
2. use loop unrolling optimization when len is big enough, for example if
len > 0x2000:
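__ bind(l_x);
// copy 32 bytes (16 elements) per iteration as four 8-byte words
__ ld(AT, tmp1, 0);
__ ld(tmp, tmp1, 8);
__ sd(AT, tmp2, 0);
__ sd(tmp, tmp2, 8);
__ ld(AT, tmp1, 16);
__ ld(tmp, tmp1, 24);
__ sd(AT, tmp2, 16);
__ sd(tmp, tmp2, 24);
// advance source (tmp1) and destination (tmp2) by 32 bytes
__ daddi(tmp1, tmp1, 32);
__ daddi(tmp2, tmp2, 32);
// tmp3 holds the remaining element count
__ daddi(tmp3, tmp3, -16);
// loop again while at least 16 elements remain
__ daddi(AT, tmp3, -16);
__ bgez(AT, l_x);
__ delayed()->nop(); // fill the branch delay slot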
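The unrolled loop in TODO item 2 could be modelled in C++ along these lines. This is a hedged sketch only: `copy_unrolled_16` is a hypothetical name, the two local variables play the roles of AT and tmp, and the function returns the leftover element count so that the existing 4/2/1-element steps can finish the tail.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: copies 16 elements (32 bytes) per iteration and
// returns how many elements (fewer than 16) are still left to copy.
inline std::ptrdiff_t copy_unrolled_16(const std::int16_t*& from,
                                       std::int16_t*& to,
                                       std::ptrdiff_t count) {
    if (count < 16) return count;  // the assembly assumes a large length
    do {
        std::uint64_t a, b;  // stand-ins for registers AT and tmp
        std::memcpy(&a, from, 8);      std::memcpy(&b, from + 4, 8);
        std::memcpy(to, &a, 8);        std::memcpy(to + 4, &b, 8);
        std::memcpy(&a, from + 8, 8);  std::memcpy(&b, from + 12, 8);
        std::memcpy(to + 8, &a, 8);    std::memcpy(to + 12, &b, 8);
        from += 16;   // 32 bytes
        to += 16;
        count -= 16;
    } while (count - 16 >= 0);  // same test as daddi(AT, tmp3, -16); bgez
    return count;
}
```

Interleaving two loads before the matching stores, as in the assembly, keeps two independent copies in flight per half-iteration; whether that pays off on a given core would need to be measured.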
OPENJDK ASSEMBLY EXCEPTION

The OpenJDK source code made available by Oracle at openjdk.java.net and
openjdk.dev.java.net ("OpenJDK Code") is distributed under the terms of the
GNU General Public License <http://www.gnu.org/copyleft/gpl.html> version 2
only ("GPL2"), with the following clarification and special exception.

Linking this OpenJDK Code statically or dynamically with other code
is making a combined work based on this library. Thus, the terms
and conditions of GPL2 cover the whole combination.

As a special exception, Oracle gives you permission to link this
OpenJDK Code with certain code licensed by Oracle as indicated at
http://openjdk.java.net/legal/exception-modules-2007-05-08.html
("Designated Exception Modules") to produce an executable,
regardless of the license terms of the Designated Exception Modules,
and to copy and distribute the resulting executable under GPL2,
provided that the Designated Exception Modules continue to be
governed by the licenses under which they were offered by Oracle.

As such, it allows licensees and sublicensees of Oracle's GPL2 OpenJDK Code to
build an executable that includes those portions of necessary code that Oracle
could not provide under GPL2 (or that Oracle has provided under GPL2 with the
Classpath exception). If you modify or add to the OpenJDK code, that new
GPL2 code may still be combined with Designated Exception Modules if the
new code is made subject to this exception by its copyright holder.