file:    gmpinmos.txt
author:  alex stuebinger
date:    30 may 1999
version: GNU MP 2.0.2 with patches until May 99 and ECM


GNU MP 2.0.2/ECM for the Inmos Transputer
=========================================

This is a port of Torbjoern Granlund's
                  ^^^^^^^^^^^^^^^^^^
GNU Multiple Precision Arithmetic Library, Edition 2.0.2 of June 1996,
to the Inmos Transputer. The routines can be applied in parallel.

The GNU MP source code for this release has all patches applied that
were released up to May 1999. This version supersedes the release of
29 April 1998.

The port was done by Alexander Stuebinger.
                     ^^^^^^^^^^^^^^^^^^^^

Thanks to Torbjoern for making the GMP available, and thanks also for
his continuing friendly cooperation.

The GMP is under the GNU Library General Public License.
See "copying.lib".

The manual is in PostScript(TM) format (\doc). It is a must-read.
Be sure to check out the GMP home page for the latest information.

Included is my new port of the latest ECM (Elliptic Curve Method)
executable for integer factoring by
Paul Zimmermann of INRIA Lorraine/France.
^^^^^^^^^^^^^^^
ECM is an application that uses GMP.
For more information about ECM visit

The core routines that most affect GMP performance are coded in
assembly. This gives a significant speed improvement, see below.
The speed gains for popcount and Hamming distance are the most
dramatic.

This distribution contains the binary libraries for the generic
32-bit transputer (/ta) and for the t400, t425, t800, and t805.
The necessary header files are in the /include directory.

Also included is the transputer-related source code needed to rebuild
the libraries, see the /inmos directory. This source code is a
supplement to the standard GMP 2.0.2 distribution.

Notes for rebuilding it:
This release contains working sources adapted to the 8.3 filename
restrictions of the Transputer Development Kits. These sources are
zipped as "gmp202patched.zip" in the /source directory.
They are based on GMP 2.0.2, with the patches applied.
The transputer-specific files have been copied into the respective
directories. The syntax of the makefile "inmos.mak" obeys Watcom
conventions. The only difference from the Unix standard is the
line-continuation character.
For any questions, please consult the source code first.

The bootable file of ECM is in t805 format. The t805 executable is
faster but does not run on a t4, since it has inline FPU instructions.

ECM on the Inmos Transputer t805/30MHz is about 67 times as slow as
on a Pentium2/300MHz. It is a toy when run on processors of the 80's
and is not meant for serious factoring. Problems of the 90's and
hardware of the 80's do not go together. ;-)

You can contact me if you need the libraries in a special format,
as T801 files for example. I will do my best to assemble them.

I plan to port the forthcoming GMP 2.1 as well.

If you build any interesting applications with the library, we would
like to hear about it. If you discover any error in the routines,
please contact Torbjoern and me. The library is well tested,
as is the assembly code.


Notes for optimum performance of applications:
==============================================

Wherever possible use a stack size <= 4k. If you really need the
routines from the /mpn section, allocate the numbers from the heap;
do not use the stack.


Caveats:
========

The population count and Hamming distance routines do not run on a
t414, as they use the "bitcnt" instruction. Who still has a t414?
Solution: recompile the original hamdist.c and popcount.c from source.


Speed:
======

Machine: Inmos t805/20MHz transputer

The routines whose names begin with "ref" are the standard C source
code from GMP 2.0.2. The others are coded in assembly language.
"Size" is the number of 32-bit limbs the number consists of.
Units are CPU clock cycles per limb.
The gmp805.lib was used for the timings.

Please read Torbjoern's "speed.gmp" for the performance of
other processors.
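
To make "limb" and "cycles per limb" concrete, here is a minimal
sketch in portable C of what an add_n-style routine does: it adds two
numbers of n 32-bit limbs each, least significant limb first, and
returns the carry out. This is an illustration only. It is not the
GMP source and not the transputer assembly of this port, and the
names limb_t and sketch_add_n are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration; not identifiers from GMP. */
typedef uint32_t limb_t;               /* one 32-bit limb */

/* Add {a, n} and {b, n}, least significant limb first, store the low
   n limbs of the sum in r and return the carry out (0 or 1). */
limb_t sketch_add_n(limb_t *r, const limb_t *a, const limb_t *b, size_t n)
{
    limb_t carry = 0;
    size_t i;

    assert(n > 0);
    for (i = 0; i < n; i++) {
        limb_t s  = a[i] + b[i];       /* wraps modulo 2^32 */
        limb_t c1 = (limb_t)(s < a[i]);/* carry out of a[i] + b[i] */
        limb_t t  = s + carry;         /* add the incoming carry */
        limb_t c2 = (limb_t)(t < s);   /* carry out of that addition */

        r[i] = t;
        carry = c1 | c2;               /* c1 and c2 are never both set */
    }
    return carry;
}
```

Each pass through the loop handles one limb, so a figure in
cycles/limb is roughly the cost of one such pass plus the call
overhead spread over n limbs.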
=======================================================
size = 10
=======================================================
refmpn_popcount:   169.20 cycles/limb
mpn_popcount:       48.82 cycles/limb
refmpn_lshift:      92.63 cycles/limb
mpn_lshift:         73.87 cycles/limb
refmpn_rshift:      90.78 cycles/limb
mpn_rshift:         81.17 cycles/limb
refmpn_add_n:       89.93 cycles/limb
mpn_add_n:          76.13 cycles/limb
refmpn_sub_n:       90.04 cycles/limb
mpn_sub_n:          76.13 cycles/limb
refmpn_mul_1:      120.12 cycles/limb
mpn_mul_1:          92.88 cycles/limb
refmpn_addmul_1:   153.49 cycles/limb
mpn_addmul_1:      113.88 cycles/limb
refmpn_submul_1:   153.17 cycles/limb
mpn_submul_1:      114.14 cycles/limb
=======================================================

=======================================================
size = 30
=======================================================
refmpn_popcount:   168.39 cycles/limb
mpn_popcount:       56.48 cycles/limb
refmpn_lshift:      91.52 cycles/limb
mpn_lshift:         70.24 cycles/limb
refmpn_rshift:      90.26 cycles/limb
mpn_rshift:         78.08 cycles/limb
refmpn_add_n:       86.65 cycles/limb
mpn_add_n:          73.36 cycles/limb
refmpn_sub_n:       86.72 cycles/limb
mpn_sub_n:          73.36 cycles/limb
refmpn_mul_1:      117.41 cycles/limb
mpn_mul_1:          90.08 cycles/limb
refmpn_addmul_1:   150.56 cycles/limb
mpn_addmul_1:      111.07 cycles/limb
refmpn_submul_1:   150.44 cycles/limb
mpn_submul_1:      111.20 cycles/limb
=======================================================

=======================================================
size = 100
=======================================================
refmpn_popcount:   168.12 cycles/limb
mpn_popcount:       59.28 cycles/limb
refmpn_lshift:      91.17 cycles/limb
mpn_lshift:         69.00 cycles/limb
refmpn_rshift:      90.08 cycles/limb
mpn_rshift:         76.99 cycles/limb
refmpn_add_n:       85.57 cycles/limb
mpn_add_n:          72.41 cycles/limb
refmpn_sub_n:       85.58 cycles/limb
mpn_sub_n:          72.41 cycles/limb
refmpn_mul_1:      116.46 cycles/limb
mpn_mul_1:          88.97 cycles/limb
refmpn_addmul_1:   149.55 cycles/limb
mpn_addmul_1:      110.11 cycles/limb
refmpn_submul_1:   149.54 cycles/limb
mpn_submul_1:      110.15 cycles/limb
=======================================================

=======================================================
size = 300
=======================================================
refmpn_popcount:   168.04 cycles/limb
mpn_popcount:       61.09 cycles/limb
refmpn_lshift:      91.07 cycles/limb
mpn_lshift:         68.66 cycles/limb
refmpn_rshift:      90.03 cycles/limb
mpn_rshift:         76.69 cycles/limb
refmpn_add_n:       85.25 cycles/limb
mpn_add_n:          72.14 cycles/limb
refmpn_sub_n:       85.25 cycles/limb
mpn_sub_n:          72.14 cycles/limb
refmpn_mul_1:      116.19 cycles/limb
mpn_mul_1:          88.74 cycles/limb
refmpn_addmul_1:   149.28 cycles/limb
mpn_addmul_1:      109.85 cycles/limb
refmpn_submul_1:   149.26 cycles/limb
mpn_submul_1:      109.86 cycles/limb
=======================================================
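
The figures above can be turned into speedup factors and into time
per limb with a little arithmetic. The following is a minimal sketch
in C; the helper names speedup and limb_time_us are hypothetical and
are not part of GMP or of this distribution. Cycles divided by the
clock rate in MHz gives microseconds, so the 20 MHz of the test
machine is passed in as 20.0.

```c
#include <assert.h>

/* Hypothetical helpers for reading the timing tables; not part of GMP. */

/* Speedup of the assembly routine over the "ref" C routine,
   both given in cycles per limb. */
double speedup(double ref_cycles, double asm_cycles)
{
    assert(asm_cycles > 0.0);
    return ref_cycles / asm_cycles;
}

/* Time per limb in microseconds for a given clock rate in MHz. */
double limb_time_us(double cycles_per_limb, double clock_mhz)
{
    assert(clock_mhz > 0.0);
    return cycles_per_limb / clock_mhz;
}
```

For size = 300, speedup(168.04, 61.09) comes out at roughly 2.75 for
popcount, while speedup(85.25, 72.14) is roughly 1.18 for add_n.
limb_time_us(61.09, 20.0) is about 3.05 microseconds per limb for the
assembly popcount on the t805/20MHz.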