file:   inmos.txt
author: alex stuebinger
date:   09/04/98 23:50


GNU MP 2.0.2/ECM for the Inmos Transputer
=========================================

This is a port of Torbjoern Granlund's
                  ^^^^^^^^^^^^^^^^^^
GNU Multiple Precision Arithmetic Library, Edition 2.0.2 of June 1996,
to the Inmos Transputer. The routines can be applied in parallel.

The port was done by Alexander Stuebinger
                     ^^^^^^^^^^^^^^^^^^^^
in April 1998.

Thanks to Torbjoern for making GMP available, and also for his
continuing friendly cooperation.

GMP is licensed under the GNU Library General Public License. See
"copying.lib". The manual is in PostScript(TM) format (\doc). It is a
must-read. Be sure to check out the GMP home page for the latest
information.

Included is my new port of the latest ECM (Elliptic Curve Method)
executable for integer factoring by
Paul Zimmermann of INRIA Lorraine/France.
^^^^^^^^^^^^^^^
ECM is an application that uses GMP. Thanks to Paul for making it
free. Paul, we love it! For more information about ECM visit

The core routines, which seriously affect GMP performance, are coded
in assembly. This gives a significant speed improvement, see below.
The speed gains for popcount and Hamming distance are the most
dramatic. Maybe we will hear of some applications in coding theory.

This distribution contains the binary libraries for the generic 32-bit
transputer (/ta) and for the t805. Also included is the
transputer-related source code needed to rebuild them. This source
code is a supplement to the standard GMP 2.0.2 distribution. The
syntax of the makefile obeys Watcom conventions. The only difference
from the Unix standard is the line continuation character.

Notes for rebuilding it:
Unpack the standard distribution of GMP 2.0.2 and truncate the
filenames to 8.3 conventions. The 8.3 conventions of the Inmos AnsiC
Toolset are a major annoyance. Copy the transputer source over the
standard distribution. You must do this manually.
Most of the files go into the /mpn directory. Read the makefile
"inmos.mak". You have to copy the headers into each directory. Then go
from directory to directory and make the libraries, e.g.
"wmake /f inmos.mak mpcore.lib". Combine the libraries into "gmp.lib".
For any questions please consult the source code first.

The bootable files of ECM are in generic and t805 formats. The t805
executable is faster but does not run on a t4, since it contains
inline FPU instructions.

ECM on the Inmos Transputer is about 70 times as slow as on a
Pentium II/300 MHz. It is a toy when one runs it on processors of the
80's. It is not meant for serious factoring. Well, problems of the
90's and the hardware of the 80's do not go together. ;-)

You can contact me if you need the libraries in a special format, as
T425 files for example. I will do my best to assemble them. I plan to
port the forthcoming GMP 2.1 as well.

If you build any interesting applications with the library, we would
like to hear about them. If you discover any error in the routines,
please contact Torbjoern and me. The library is well tested, as is the
assembly code.


Notes for optimum performance of applications:
==============================================

Wherever possible use a stack size <= 4k. If you really need the
routines from the mpn section, allocate the numbers from the heap; do
not use the stack.


Caveats:
========

The population count and Hamming distance routines do not run on a
t414, as they use the "bitcnt" instruction. Who still has a t414?
Solution: recompile the original hamdist.c and popcount.c from source.


Speed:
======

Machine: Inmos t805/30MHz transputer

The routines whose names begin with "ref" are the standard C source
code from GMP 2.0.2. The others are coded in assembly language. "Size"
is the number of 32-bit limbs the number consists of. Units are CPU
clock cycles per limb. The gmpa.lib was used for the timings. Please
read Torbjoern's "speed.gmp" for the performance of other processors.
=======================================================
size = 10
=======================================================
refmpn_popcount:  186.24 cycles/limb
mpn_popcount:      18.82 cycles/limb
refmpn_lshift:     65.86 cycles/limb
mpn_lshift:        35.90 cycles/limb
refmpn_rshift:     63.55 cycles/limb
mpn_rshift:        42.82 cycles/limb
refmpn_add_n:      66.05 cycles/limb
mpn_add_n:         37.25 cycles/limb
refmpn_sub_n:      66.24 cycles/limb
mpn_sub_n:         37.63 cycles/limb
refmpn_mul_1:     129.22 cycles/limb
mpn_mul_1:         54.14 cycles/limb
refmpn_addmul_1:  168.00 cycles/limb
mpn_addmul_1:      75.26 cycles/limb
refmpn_submul_1:  167.04 cycles/limb
mpn_submul_1:      75.26 cycles/limb
=======================================================

=======================================================
size = 30
=======================================================
refmpn_popcount:  193.98 cycles/limb
mpn_popcount:      44.48 cycles/limb
refmpn_lshift:     89.22 cycles/limb
mpn_lshift:        57.28 cycles/limb
refmpn_rshift:     87.17 cycles/limb
mpn_rshift:        64.96 cycles/limb
refmpn_add_n:      87.36 cycles/limb
mpn_add_n:         59.71 cycles/limb
refmpn_sub_n:      87.36 cycles/limb
mpn_sub_n:         59.78 cycles/limb
refmpn_mul_1:     134.98 cycles/limb
mpn_mul_1:         76.67 cycles/limb
refmpn_addmul_1:  173.95 cycles/limb
mpn_addmul_1:      97.66 cycles/limb
refmpn_submul_1:  172.93 cycles/limb
mpn_submul_1:      97.73 cycles/limb
=======================================================

=======================================================
size = 100
=======================================================
refmpn_popcount:  196.84 cycles/limb
mpn_popcount:      56.66 cycles/limb
refmpn_lshift:     97.50 cycles/limb
mpn_lshift:        64.78 cycles/limb
refmpn_rshift:     95.48 cycles/limb
mpn_rshift:        72.71 cycles/limb
refmpn_add_n:      94.81 cycles/limb
mpn_add_n:         67.64 cycles/limb
refmpn_sub_n:      94.85 cycles/limb
mpn_sub_n:         67.66 cycles/limb
refmpn_mul_1:     137.15 cycles/limb
mpn_mul_1:         84.61 cycles/limb
refmpn_addmul_1:  176.12 cycles/limb
mpn_addmul_1:     105.64 cycles/limb
refmpn_submul_1:  175.14 cycles/limb
mpn_submul_1:     105.64 cycles/limb
=======================================================

=======================================================
size = 300
=======================================================
refmpn_popcount:  197.66 cycles/limb
mpn_popcount:      57.74 cycles/limb
refmpn_lshift:     99.85 cycles/limb
mpn_lshift:        66.95 cycles/limb
refmpn_rshift:     97.84 cycles/limb
mpn_rshift:        74.92 cycles/limb
refmpn_add_n:      96.96 cycles/limb
mpn_add_n:         69.89 cycles/limb
refmpn_sub_n:      96.97 cycles/limb
mpn_sub_n:         69.90 cycles/limb
refmpn_mul_1:     137.75 cycles/limb
mpn_mul_1:         86.89 cycles/limb
refmpn_addmul_1:  176.75 cycles/limb
mpn_addmul_1:     107.90 cycles/limb
refmpn_submul_1:  175.75 cycles/limb
mpn_submul_1:     107.90 cycles/limb
=======================================================