diff --git a/hpl/BUGS b/hpl/BUGS deleted file mode 100644 index 638a043fd845f4a9b7262d6347de952238a5362d..0000000000000000000000000000000000000000 --- a/hpl/BUGS +++ /dev/null @@ -1,9 +0,0 @@ -============================================================== - List of the known problems with the HPL software - - Current as of release HPL - 2.1 - October 26, 2012 -============================================================== - -============================================================== - -============================================================== diff --git a/hpl/COPYRIGHT b/hpl/COPYRIGHT deleted file mode 100644 index 63e9f433645cdda4cf0505cae1609c42ce7dbdbf..0000000000000000000000000000000000000000 --- a/hpl/COPYRIGHT +++ /dev/null @@ -1,45 +0,0 @@ -====================================================================== - -- High Performance Computing Linpack Benchmark (HPL) - HPL - 2.1 - October 26, 2012 - Antoine P. Petitet - University of Tennessee, Knoxville - Innovative Computing Laboratory - (C) Copyright 2000-2008 All Rights Reserved - - -- Copyright notice and Licensing terms: - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions - are met: - - 1. Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - 2. Redistributions in binary form must reproduce the above copyright - notice, this list of conditions, and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - 3. All advertising materials mentioning features or use of this - software must display the following acknowledgement: - This product includes software developed at the University of - Tennessee, Knoxville, Innovative Computing Laboratory. - - 4. 
The name of the University, the name of the Laboratory, or the - names of its contributors may not be used to endorse or promote - products derived from this software without specific written - permission. - - -- Disclaimer: - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. -====================================================================== diff --git a/hpl/HISTORY b/hpl/HISTORY deleted file mode 100644 index a20f5162fa11a8b12993863d384d8181ffc3e758..0000000000000000000000000000000000000000 --- a/hpl/HISTORY +++ /dev/null @@ -1,77 +0,0 @@ -============================================================== - High Performance Computing Linpack Benchmark (HPL) - HPL - 2.1 - October 26, 2012 -============================================================== - - History - - - 09/09/00 Public release of Version 1.0 - - - 09/27/00 A couple of mistakes in the VSIPL port have been - corrected. The tar file as well as the web site were updated - on September 27th, 2000. Note that these problems were not - affecting the BLAS version of the software in any way. - - - 01/01/04 Version 1.0a - The MPI process grid numbering scheme is now an run-time - option. 
- The inlined assembly timer routine that caused the compila- - tion to fail when using gcc version 3.3 and above has been - removed from the package. - Various building problems on the T3E have been fixed; Thanks - to Edward Anderson. - - - 15/12/04 Version 1.0b - Weakness of the pseudo-random matrix generator found for pro- - blem sizes being power of twos and larger than 2^15; Thanks - to Gregory Bauer. This problem has not been fixed. It is thus - currently recommended to HPL users willing to test matrices - of size larger than 2^15 to not use power twos. - - When the matrix size is such that one needs > 16 GB per MPI - rank, the intermediate calculation (mat.ld+1) * mat.nq in - HPL_pdtest.c ends up overflowing because it is done using - 32-bit arithmetic. This issue has been fixed by typecasting - to size_t; Thanks to John Baron. - - - 09/10/08 Version 2.0 - - Piotr Luszczek changed to 64-bit RNG, modified files: - -- [M] include/hpl_matgen.h - -- [M] testing/matgen/HPL_ladd.c - -- [M] testing/matgen/HPL_lmul.c - -- [M] testing/matgen/HPL_rand.c - -- [M] testing/ptest/HPL_pdinfo.c - - For a motivation for the change, see: - Dongarra and Langou, ``The Problem with the Linpack - Benchmark Matrix Generator'', LAWN 206, June 2008. - - -- [M] testing/ptest/HPL_pdtest.c -- - - Julien Langou changed the test for correctness from - ||Ax-b||_oo / ( eps * ||A||_1 * N ) - ||Ax-b||_oo / ( eps * ||A||_1 * ||x||_1 ) - ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo * N ) - to the normwise backward error - || r ||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N ) - See: - Nicholas J. Higham, ``Accuracy and Stability of Numerical Algorithms'', - Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, - Second Edition, pages = xxx+680, ISBN = 0-89871-521-0, 2002. - - Note that in our case || b ||_oo is almost for sure - 1/2, we compute it anyway. 
- - - 10/26/2012 Version 2.1 - - Piotr Luszczek introduced exact time stamping for HPL_pdgesv(): - -- [M] dist/include/hpl_misc.h - -- [M] dist/testing/ptest/HPL_pdtest.c - - Piotr Luszczek fixed out-of-bounds access in data spreading functions. exact time stamping for HPL_pdgesv(): - -- [M] dist/src/pgesv/HPL_spreadN.c - -- [M] dist/src/pgesv/HPL_spreadT.c - Thanks to Stephen Whalen from Cray. - -============================================================== diff --git a/hpl/INSTALL b/hpl/INSTALL deleted file mode 100644 index 768540ab7ab18685c7b50430549bbcd74b278381..0000000000000000000000000000000000000000 --- a/hpl/INSTALL +++ /dev/null @@ -1,81 +0,0 @@ -============================================================== - High Performance Computing Linpack Benchmark (HPL) - HPL - 2.1 - October 26, 2012 -============================================================== - - 1) Retrieve the tar file, then - - gunzip hpl.tgz; tar -xvf hpl.tar - - this will create an hpl directory, that we call below the - top-level directory. - - 2) Create a file Make. in the top-level directory. For - this purpose, you may want to re-use one contained in the - setup directory. This file essentially contains the compilers - and librairies with their paths to be used. - - 3) Type "make arch=". This should create an executable - in the bin/ directory called xhpl. - - For example, on our Linux PII cluster, I create a file called - Make.Linux_PII in the top-level directory. Then, I type - "make arch=Linux_PII" - This creates the executable file bin/Linux_PII/xhpl. - - 4) Quick check: run a few tests: - - cd bin/ - mpirun -np 4 xhpl - - 5) Tuning: Most of the performance parameters can be tuned, - by modifying the input file bin/HPL.dat. See the file TUNING - in the top-level directory. 
- -============================================================== - - Compile time options: At the end of the "model" Make., - --------------------- the user is given the opportunity to - compile the software with some specific compile options. The - list of this options and their meaning are: - - -DHPL_COPY_L - force the copy of the panel L before bcast; - - -DHPL_CALL_CBLAS - call the cblas interface; - - -DHPL_CALL_VSIPL - call the vsip library; - - -DHPL_DETAILED_TIMING - enables detail timers; - - The user must choose between either the BLAS Fortran 77 - interface, or the BLAS C interface, or the VSIPL library - depending on which computational kernels are available on his - system. Only one of these options should be selected. If you - choose the BLAS Fortran 77 interface, it is necessary to fill - out the machine-specific C to Fortran 77 interface section of - the Make. file. To do this, please refer to the - Make. examples contained in the setup directory. - - By default HPL will: - *) not copy L before broadcast, - *) call the BLAS Fortran 77 interface, - *) not display detailed timing information. - - As an example, suppose one wants HPL to copy the panel of - columns into a contiguous buffer before broadcasting. In - theory, it would be more efficient to let HPL create the - appropriate MPI user-defined data type since this may avoid - the data copy. So, it is a strange idea, but one insists. To - achieve this one would add -DHPL_COPY_L to the definition of - HPL_OPTS at the end of the file Make.. Issue then a - "make clean arch=; make build arch=" and the xhpl - executable will be re-build with that feature in. -============================================================== - - Check out the website www.netlib.org/benchmark/hpl for the - latest information. 
-============================================================== diff --git a/hpl/Make.intel64 b/hpl/Make.intel64 deleted file mode 100644 index 1adc8127a85a60d781af42bdd47be3f6f698dd0f..0000000000000000000000000000000000000000 --- a/hpl/Make.intel64 +++ /dev/null @@ -1,183 +0,0 @@ -# -# -- High Performance Computing Linpack Benchmark (HPL) -# HPL - 2.1 - October 26, 2012 -# Antoine P. Petitet -# University of Tennessee, Knoxville -# Innovative Computing Laboratory -# (C) Copyright 2000-2008 All Rights Reserved -# -# -- Copyright notice and Licensing terms: -# -# Redistribution and use in source and binary forms, with or without -# modification, are permitted provided that the following conditions -# are met: -# -# 1. Redistributions of source code must retain the above copyright -# notice, this list of conditions and the following disclaimer. -# -# 2. Redistributions in binary form must reproduce the above copyright -# notice, this list of conditions, and the following disclaimer in the -# documentation and/or other materials provided with the distribution. -# -# 3. All advertising materials mentioning features or use of this -# software must display the following acknowledgement: -# This product includes software developed at the University of -# Tennessee, Knoxville, Innovative Computing Laboratory. -# -# 4. The name of the University, the name of the Laboratory, or the -# names of its contributors may not be used to endorse or promote -# products derived from this software without specific written -# permission. -# -# -- Disclaimer: -# -# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS -# ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR -# A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY -# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, -# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT -# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, -# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY -# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT -# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. -# ###################################################################### -# -# ---------------------------------------------------------------------- -# - shell -------------------------------------------------------------- -# ---------------------------------------------------------------------- -# -SHELL = /bin/sh -# -CD = cd -CP = cp -LN_S = ln -s -MKDIR = mkdir -RM = /bin/rm -f -TOUCH = touch -# -# ---------------------------------------------------------------------- -# - Platform identifier ------------------------------------------------ -# ---------------------------------------------------------------------- -# -ARCH = intel64 -# -# ---------------------------------------------------------------------- -# - HPL Directory Structure / HPL library ------------------------------ -# ---------------------------------------------------------------------- -# -TOPdir = $(HOME)/work/PRACE/CodeVault/src/1_dense/hpl -INCdir = $(TOPdir)/include -BINdir = $(TOPdir)/bin/$(ARCH) -LIBdir = $(TOPdir)/lib/$(ARCH) -# -HPLlib = $(LIBdir)/libhpl.a -# -# ---------------------------------------------------------------------- -# - Message Passing library (MPI) -------------------------------------- -# ---------------------------------------------------------------------- -# MPinc tells the C compiler where to find the Message Passing library -# header files, MPlib is defined to be the name of the library to be -# used. 
The variable MPdir is only used for defining MPinc and MPlib. -# -MPdir = $(I_MPI_ROOT) -MPinc = -I$(MPdir)/include64 -MPlib = $(MPdir)/intel64/lib/libmpi_mt.a -# -# ---------------------------------------------------------------------- -# - Linear Algebra library (BLAS or VSIPL) ----------------------------- -# ---------------------------------------------------------------------- -# LAinc tells the C compiler where to find the Linear Algebra library -# header files, LAlib is defined to be the name of the library to be -# used. The variable LAdir is only used for defining LAinc and LAlib. -# -LAdir = $(MKLROOT)/lib/intel64 -LAinc = -I$(MKLROOT)/include -LAlib = -mkl=cluster -# -# ---------------------------------------------------------------------- -# - F77 / C interface -------------------------------------------------- -# ---------------------------------------------------------------------- -# You can skip this section if and only if you are not planning to use -# a BLAS library featuring a Fortran 77 interface. Otherwise, it is -# necessary to fill out the F2CDEFS variable with the appropriate -# options. **One and only one** option should be chosen in **each** of -# the 3 following categories: -# -# 1) name space (How C calls a Fortran 77 routine) -# -# -DAdd_ : all lower case and a suffixed underscore (Suns, -# Intel, ...), [default] -# -DNoChange : all lower case (IBM RS6000), -# -DUpCase : all upper case (Cray), -# -DAdd__ : the FORTRAN compiler in use is f2c. -# -# 2) C and Fortran 77 integer mapping -# -# -DF77_INTEGER=int : Fortran 77 INTEGER is a C int, [default] -# -DF77_INTEGER=long : Fortran 77 INTEGER is a C long, -# -DF77_INTEGER=short : Fortran 77 INTEGER is a C short. 
-# -# 3) Fortran 77 string handling -# -# -DStringSunStyle : The string address is passed at the string loca- -# tion on the stack, and the string length is then -# passed as an F77_INTEGER after all explicit -# stack arguments, [default] -# -DStringStructPtr : The address of a structure is passed by a -# Fortran 77 string, and the structure is of the -# form: struct {char *cp; F77_INTEGER len;}, -# -DStringStructVal : A structure is passed by value for each Fortran -# 77 string, and the structure is of the form: -# struct {char *cp; F77_INTEGER len;}, -# -DStringCrayStyle : Special option for Cray machines, which uses -# Cray fcd (fortran character descriptor) for -# interoperation. -# -F2CDEFS = -# -# ---------------------------------------------------------------------- -# - HPL includes / libraries / specifics ------------------------------- -# ---------------------------------------------------------------------- -# -HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc) -HPL_LIBS = $(HPLlib) $(LAlib) $(MPlib) -# -# - Compile time options ----------------------------------------------- -# -# -DHPL_COPY_L force the copy of the panel L before bcast; -# -DHPL_CALL_CBLAS call the cblas interface; -# -DHPL_CALL_VSIPL call the vsip library; -# -DHPL_DETAILED_TIMING enable detailed timers; -# -# By default HPL will: -# *) not copy L before broadcast, -# *) call the BLAS Fortran 77 interface, -# *) not display detailed timing information. 
-# -HPL_OPTS = -DHPL_CALL_CBLAS -# -# ---------------------------------------------------------------------- -# -HPL_DEFS = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES) -# -# ---------------------------------------------------------------------- -# - Compilers / linkers - Optimization flags --------------------------- -# ---------------------------------------------------------------------- -# -CC = mpiicc -CCNOOPT = $(HPL_DEFS) -CCFLAGS = -openmp -fomit-frame-pointer -O3 -funroll-loops $(HPL_DEFS) -# -# On some platforms, it is necessary to use the Fortran linker to find -# the Fortran internals used in the BLAS library. -# -LINKER = mpiicc -LINKFLAGS = $(CCFLAGS) -# -ARCHIVER = ar -ARFLAGS = r -RANLIB = echo -# -# ---------------------------------------------------------------------- diff --git a/hpl/Make.top b/hpl/Make.top deleted file mode 100644 index 507e24b88e95a20156126bbbee443178cc241434..0000000000000000000000000000000000000000 --- a/hpl/Make.top +++ /dev/null @@ -1,192 +0,0 @@ -# -# -- High Performance Computing Linpack Benchmark (HPL) -# HPL - 2.1 - October 26, 2012 -# Antoine P. Petitet -# University of Tennessee, Knoxville -# Innovative Computing Laboratory -# (C) Copyright 2000-2008 All Rights Reserved -# -# -- Copyright notice and Licensing terms: -# -# Redistribution and use in source and binary forms, with or without -# modification, are permitted provided that the following conditions -# are met: -# -# 1. Redistributions of source code must retain the above copyright -# notice, this list of conditions and the following disclaimer. -# -# 2. Redistributions in binary form must reproduce the above copyright -# notice, this list of conditions, and the following disclaimer in the -# documentation and/or other materials provided with the distribution. -# -# 3. 
All advertising materials mentioning features or use of this -# software must display the following acknowledgement: -# This product includes software developed at the University of -# Tennessee, Knoxville, Innovative Computing Laboratory. -# -# 4. The name of the University, the name of the Laboratory, or the -# names of its contributors may not be used to endorse or promote -# products derived from this software without specific written -# permission. -# -# -- Disclaimer: -# -# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS -# ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR -# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY -# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, -# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT -# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, -# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY -# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT -# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
-# ###################################################################### -# -arch = UNKNOWN -# -include Make.$(arch) -# -## build ############################################################### -# -build_src : - ( $(CD) src/auxil/$(arch); $(MAKE) ) - ( $(CD) src/blas/$(arch); $(MAKE) ) - ( $(CD) src/comm/$(arch); $(MAKE) ) - ( $(CD) src/grid/$(arch); $(MAKE) ) - ( $(CD) src/panel/$(arch); $(MAKE) ) - ( $(CD) src/pauxil/$(arch); $(MAKE) ) - ( $(CD) src/pfact/$(arch); $(MAKE) ) - ( $(CD) src/pgesv/$(arch); $(MAKE) ) -# -build_tst : - ( $(CD) testing/matgen/$(arch); $(MAKE) ) - ( $(CD) testing/timer/$(arch); $(MAKE) ) - ( $(CD) testing/pmatgen/$(arch); $(MAKE) ) - ( $(CD) testing/ptimer/$(arch); $(MAKE) ) - ( $(CD) testing/ptest/$(arch); $(MAKE) ) -#( SPMS_make_cd`' testing/test/$(arch); SPMS_make_make`' ) -# -## startup ############################################################# -# -startup_dir : - - $(MKDIR) include/$(arch) - - $(MKDIR) lib - - $(MKDIR) lib/$(arch) - - $(MKDIR) bin - - $(MKDIR) bin/$(arch) -# -startup_src : - - $(MAKE) -f Make.top leaf le=src/auxil arch=$(arch) - - $(MAKE) -f Make.top leaf le=src/blas arch=$(arch) - - $(MAKE) -f Make.top leaf le=src/comm arch=$(arch) - - $(MAKE) -f Make.top leaf le=src/grid arch=$(arch) - - $(MAKE) -f Make.top leaf le=src/panel arch=$(arch) - - $(MAKE) -f Make.top leaf le=src/pauxil arch=$(arch) - - $(MAKE) -f Make.top leaf le=src/pfact arch=$(arch) - - $(MAKE) -f Make.top leaf le=src/pgesv arch=$(arch) -# -startup_tst : - - $(MAKE) -f Make.top leaf le=testing/matgen arch=$(arch) - - $(MAKE) -f Make.top leaf le=testing/timer arch=$(arch) - - $(MAKE) -f Make.top leaf le=testing/pmatgen arch=$(arch) - - $(MAKE) -f Make.top leaf le=testing/ptimer arch=$(arch) - - $(MAKE) -f Make.top leaf le=testing/ptest arch=$(arch) -#- SPMS_make_make`' -f Make.top leaf le=testing/test arch=$(arch) -# -## refresh ############################################################# -# -refresh_src : - - $(CP) makes/Make.auxil 
src/auxil/$(arch)/Makefile - - $(CP) makes/Make.blas src/blas/$(arch)/Makefile - - $(CP) makes/Make.comm src/comm/$(arch)/Makefile - - $(CP) makes/Make.grid src/grid/$(arch)/Makefile - - $(CP) makes/Make.panel src/panel/$(arch)/Makefile - - $(CP) makes/Make.pauxil src/pauxil/$(arch)/Makefile - - $(CP) makes/Make.pfact src/pfact/$(arch)/Makefile - - $(CP) makes/Make.pgesv src/pgesv/$(arch)/Makefile -# -refresh_tst : - - $(CP) makes/Make.matgen testing/matgen/$(arch)/Makefile - - $(CP) makes/Make.timer testing/timer/$(arch)/Makefile - - $(CP) makes/Make.pmatgen testing/pmatgen/$(arch)/Makefile - - $(CP) makes/Make.ptimer testing/ptimer/$(arch)/Makefile - - $(CP) makes/Make.ptest testing/ptest/$(arch)/Makefile -#- SPMS_make_cp`' makes/Make.test testing/test/$(arch)/Makefile -# -## clean ############################################################### -# -clean_src : - - ( $(CD) src/auxil/$(arch); $(MAKE) clean ) - - ( $(CD) src/blas/$(arch); $(MAKE) clean ) - - ( $(CD) src/comm/$(arch); $(MAKE) clean ) - - ( $(CD) src/grid/$(arch); $(MAKE) clean ) - - ( $(CD) src/panel/$(arch); $(MAKE) clean ) - - ( $(CD) src/pauxil/$(arch); $(MAKE) clean ) - - ( $(CD) src/pfact/$(arch); $(MAKE) clean ) - - ( $(CD) src/pgesv/$(arch); $(MAKE) clean ) -# -clean_tst : - - ( $(CD) testing/matgen/$(arch); $(MAKE) clean ) - - ( $(CD) testing/timer/$(arch); $(MAKE) clean ) - - ( $(CD) testing/pmatgen/$(arch); $(MAKE) clean ) - - ( $(CD) testing/ptimer/$(arch); $(MAKE) clean ) - - ( $(CD) testing/ptest/$(arch); $(MAKE) clean ) -#- ( SPMS_make_cd`' testing/test/$(arch); SPMS_make_make`' clean ) -# -## clean_arch ########################################################## -# -clean_arch_src : - - $(RM) -r src/auxil/$(arch) - - $(RM) -r src/blas/$(arch) - - $(RM) -r src/comm/$(arch) - - $(RM) -r src/grid/$(arch) - - $(RM) -r src/panel/$(arch) - - $(RM) -r src/pauxil/$(arch) - - $(RM) -r src/pfact/$(arch) - - $(RM) -r src/pgesv/$(arch) -# -clean_arch_tst : - - $(RM) -r testing/matgen/$(arch) - - 
$(RM) -r testing/timer/$(arch) - - $(RM) -r testing/pmatgen/$(arch) - - $(RM) -r testing/ptimer/$(arch) - - $(RM) -r testing/ptest/$(arch) -#- SPMS_make_rm`' -r testing/test/$(arch) -# -## clean_arch_all ###################################################### -# -clean_arch_all : - - $(MAKE) -f Make.top clean_arch_src arch=$(arch) - - $(MAKE) -f Make.top clean_arch_tst arch=$(arch) - - $(RM) -r bin/$(arch) include/$(arch) lib/$(arch) -# -## clean_guard ######################################################### -# -clean_guard_src : - - ( $(CD) src/auxil/$(arch); $(RM) *.grd ) - - ( $(CD) src/blas/$(arch); $(RM) *.grd ) - - ( $(CD) src/comm/$(arch); $(RM) *.grd ) - - ( $(CD) src/grid/$(arch); $(RM) *.grd ) - - ( $(CD) src/panel/$(arch); $(RM) *.grd ) - - ( $(CD) src/pauxil/$(arch); $(RM) *.grd ) - - ( $(CD) src/pfact/$(arch); $(RM) *.grd ) - - ( $(CD) src/pgesv/$(arch); $(RM) *.grd ) -# -clean_guard_tst : - - ( $(CD) testing/matgen/$(arch); $(RM) *.grd ) - - ( $(CD) testing/timer/$(arch); $(RM) *.grd ) - - ( $(CD) testing/pmatgen/$(arch); $(RM) *.grd ) - - ( $(CD) testing/ptimer/$(arch); $(RM) *.grd ) - - ( $(CD) testing/ptest/$(arch); $(RM) *.grd ) -#- ( SPMS_make_cd`' testing/test/$(arch); SPMS_make_rm`' *.grd ) -# -## misc ################################################################ -# -leaf : - - ( $(CD) $(le) ; $(MKDIR) $(arch) ) - - ( $(CD) $(le)/$(arch) ; \ - $(LN_S) $(TOPdir)/Make.$(arch) Make.inc ) -# -######################################################################## diff --git a/hpl/Makefile b/hpl/Makefile deleted file mode 100644 index 9f1ad5e84807472ff58106910d03b6ec9ee21d0d..0000000000000000000000000000000000000000 --- a/hpl/Makefile +++ /dev/null @@ -1,90 +0,0 @@ -# -# -- High Performance Computing Linpack Benchmark (HPL) -# HPL - 2.1 - October 26, 2012 -# Antoine P. 
Petitet -# University of Tennessee, Knoxville -# Innovative Computing Laboratory -# (C) Copyright 2000-2008 All Rights Reserved -# -# -- Copyright notice and Licensing terms: -# -# Redistribution and use in source and binary forms, with or without -# modification, are permitted provided that the following conditions -# are met: -# -# 1. Redistributions of source code must retain the above copyright -# notice, this list of conditions and the following disclaimer. -# -# 2. Redistributions in binary form must reproduce the above copyright -# notice, this list of conditions, and the following disclaimer in the -# documentation and/or other materials provided with the distribution. -# -# 3. All advertising materials mentioning features or use of this -# software must display the following acknowledgement: -# This product includes software developed at the University of -# Tennessee, Knoxville, Innovative Computing Laboratory. -# -# 4. The name of the University, the name of the Laboratory, or the -# names of its contributors may not be used to endorse or promote -# products derived from this software without specific written -# permission. -# -# -- Disclaimer: -# -# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS -# ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR -# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY -# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, -# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT -# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, -# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY -# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT -# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
-# ###################################################################### -# -# -SHELL = /bin/sh -# -arch = UNKNOWN -# -## Targets ############################################################# -# -all : install -# -# ###################################################################### -# -install : startup refresh build -# -startup : - $(MAKE) -f Make.top startup_dir arch=$(arch) - $(MAKE) -f Make.top startup_src arch=$(arch) - $(MAKE) -f Make.top startup_tst arch=$(arch) - $(MAKE) -f Make.top refresh_src arch=$(arch) - $(MAKE) -f Make.top refresh_tst arch=$(arch) -# -refresh : - $(MAKE) -f Make.top refresh_src arch=$(arch) - $(MAKE) -f Make.top refresh_tst arch=$(arch) -# -build : - $(MAKE) -f Make.top build_src arch=$(arch) - $(MAKE) -f Make.top build_tst arch=$(arch) -# -clean : - $(MAKE) -f Make.top clean_src arch=$(arch) - $(MAKE) -f Make.top clean_tst arch=$(arch) -# -clean_arch : - $(MAKE) -f Make.top clean_arch_src arch=$(arch) - $(MAKE) -f Make.top clean_arch_tst arch=$(arch) -# -clean_arch_all : - $(MAKE) -f Make.top clean_arch_all arch=$(arch) -# -clean_guard : - $(MAKE) -f Make.top clean_guard_src arch=$(arch) - $(MAKE) -f Make.top clean_guard_tst arch=$(arch) -# -# ###################################################################### diff --git a/hpl/README b/hpl/README deleted file mode 100644 index 1a9781d76b1216be336414fadfc95f2a8d6398cd..0000000000000000000000000000000000000000 --- a/hpl/README +++ /dev/null @@ -1,32 +0,0 @@ -============================================================== - High Performance Computing Linpack Benchmark (HPL) - HPL - 2.1 - October 26, 2012 -============================================================== - - HPL is a software package that solves a (random) dense linear - system in double precision (64 bits) arithmetic on - distributed-memory computers. It can thus be regarded as a - portable as well as freely available implementation of the - High Performance Computing Linpack Benchmark. 
- - The HPL software package requires the availibility on your - system of an implementation of the Message Passing Interface - MPI (1.1 compliant). An implementation of either the Basic - Linear Algebra Subprograms BLAS or the Vector Signal Image - Processing Library VSIPL is also needed. Machine-specific as - well as generic implementations of MPI, the BLAS and VSIPL - are available for a large variety of systems. - - Install See the file INSTALL in this directory. - ------- - - Tuning See the file TUNING in this directory. - ------ - - Bugs Known problems and bugs with this release are documen- - ---- ted in the file hpl/BUGS. - - Check out the website www.netlib.org/benchmark/hpl for the - latest information. - -============================================================== diff --git a/hpl/TODO b/hpl/TODO deleted file mode 100644 index 9e8668a3f758d6ef0cb4f73f27b7baf6ce7d6f50..0000000000000000000000000000000000000000 --- a/hpl/TODO +++ /dev/null @@ -1,16 +0,0 @@ -============================================================== - High Performance Computing Linpack Benchmark (HPL) - HPL - 2.1 - October 26, 2012 -============================================================== - - Done list in version 1.0b, December 15th, 2004 - - Fixed problem with 32-bit integer overflow. - Thanks to John Baron. - - Done list in version 1.0a, January 1st, 2004 - - Added Row- or Column-major process mapping in data file - - Fixed compilation error for gcc 3.3 in walltime. - - Fixed building problems on the T3E; - Thanks to Edward Anderson. 
- -============================================================== diff --git a/hpl/TUNING b/hpl/TUNING deleted file mode 100644 index b2b7234c532849b92229bcb6dcbd519b97bcdf60..0000000000000000000000000000000000000000 --- a/hpl/TUNING +++ /dev/null @@ -1,419 +0,0 @@ -============================================================== - Performance Tuning and setting up the input data file HPL.dat - - Current as of release HPL - 2.1 - October 26, 2012 -============================================================== - Check out the website www.netlib.org/benchmark/hpl for the - latest information. - - After having built the executable hpl/bin//xhpl, one - may want to modify the input data file HPL.dat. This file - should reside in the same directory as the executable - hpl/bin//xhpl. An example HPL.dat file is provided by - default. This file contains information about the problem - sizes, machine configuration, and algorithm features to be - used by the executable. It is 30 lines long. All the selected - parameters will be printed in the output generated by the - executable. - - At the end of this file, there is a couple of experimental - guide lines that you may find useful. - -============================================================== - File HPL.dat (description): - - Line 1: (unused) Typically one would use this line for its - own good. For example, it could be used to summarize the con- - tent of the input file. By default this line reads: - - HPL Linpack benchmark input file - - Line 2: (unused) same as line 1. By default this line reads: - - Innovative Computing Laboratory, University of Tennessee - - Line 3: the user can choose where the output should be re- - directed to. In the case of a file, a name is necessary, and - this is the line where one wants to specify it. Only the - first name on this line is significative. 
By default, the li- - ne reads: - - HPL.out output file name (if any) - - This means that if one chooses to redirect the output to a - file, the file will be called "HPL.out". The rest of the line - is unused, and this space can be used to put some informative - comment on the meaning of this line. - - Line 4: This line specifies where the output should go. The - line is formatted, it must be a positive integer, the rest is - insignificant. 3 choices are possible for the positive inte- - ger, 6 means that the output will go to the standard output, 7 - means that the output will go to the standard error. Any o- - ther integer means that the output should be redirected - to a file, whose name has been specified in the line above. - This line by default reads: - - 6 device out (6=stdout,7=stderr,file) - - which means that the output generated by the executable - should be redirected to the standard output. - - Line 5: This line specifies the number of problem sizes to be - executed. This number should be less than or equal to 20. The - first integer is significant, the rest is ignored. If the - line reads: - - 3 # of problems sizes (N) - - this means that the user is willing to run 3 problem sizes - that will be specified in the next line. - - Line 6: This line specifies the problem sizes one wants to - run. Assuming the line above started with 3, the first 3 - positive integers are significant, the rest is ignored. For - example: - - 3000 6000 10000 Ns - - means that one wants xhpl to run 3 (specified in line 5) pro- - blem sizes, namely 3000, 6000 and 10000. - - Line 7: This line specifies the number of block sizes to be - run. This number should be less than or equal to 20. - The first integer is significant, the rest is ignored. If the - line reads: - - 5 # of NBs - - this means that the user is willing to use 5 block sizes that - will be specified in the next line. - - Line 8: This line specifies the block sizes one wants to run.
- Assuming the line above started with 5, the first 5 positive - integers are significant, the rest is ignored. For example: - - 80 100 120 140 160 NBs - - means that one wants xhpl to use 5 (specified in line 7) - block sizes, namely 80, 100, 120, 140 and 160. - - Line 9 specifies how the MPI processes should be mapped onto - the nodes of your platform. There are currently two possible - mappings, namely row- and column-major. This feature is main- - ly useful when these nodes are themselves multi-processor - computers. A row-major mapping is recommended. - - Line 10: This line specifies the number of process grids to - be run. This number should be less than or equal to 20. - The first integer is significant, the rest is ignored. If the - line reads: - - 2 # of process grids (P x Q) - - this means that you are willing to try 2 process grid sizes - that will be specified in the next line. - - Lines 11-12: These two lines specify the number of process - rows and columns of each grid you want to run on. Assuming - the line above (10) started with 2, the first 2 positive - integers of those two lines are significant, the rest is igno- - red. For example: - - 1 2 Ps - 6 8 Qs - - means that one wants to run xhpl on 2 process grids (line - 10), namely 1 by 6 and 2 by 8. Note: In this example, it is - required then to start xhpl on at least 16 nodes (max of - P_i x Q_i). The runs on the two grids will be consecutive. If one - were starting xhpl on more than 16 nodes, say 52, only 6 would - be used for the first grid (1x6) and then 16 (2x8) would be - used for the second grid. The fact that you started the MPI - job on 52 nodes will not make HPL use all of them. In this - example, only 16 would be used. If one wants to run xhpl with - 52 processes one needs to specify a grid of 52 processes, for - example the following lines would do the job: - - 4 2 Ps - 13 8 Qs - - Line 13: This line specifies the threshold the residuals - should be compared to.
The residuals should be of order 1, - but are in practice slightly less than this, typically 0.001. - This line is made of a real number, the rest is - insignificant. For example: - - 16.0 threshold - - In practice, a value of 16.0 will cover most cases. For va- - rious reasons, it is possible that some of the residuals be- - come slightly larger, say for example 35.6. xhpl will flag - those runs as failed, however they can be considered as cor- - rect. A run can be considered as failed if the residual is a - few orders of magnitude bigger than 1, for example 10^6 or mo- - re. Note: if one were to specify a threshold of 0.0, all tests - would be flagged as failed, even though the answer is likely - to be correct. It is allowed to specify a negative value for - this threshold, in which case the checks will be by-passed, - no matter what the value is, as soon as it is negative. This - feature saves time when performing a lot of experi- - ments, say for instance during the tuning phase. Example: - - -16.0 threshold - - The remaining lines allow one to specify algorithmic features. - xhpl will run all possible combinations of those for each - problem size, block size, process grid combination. This is - handy when one looks for an "optimal" set of parameters. To - understand a little bit better, let us first say a few words - about the algorithm implemented in HPL. Basically this is a - right-looking version with row-partial pivoting. The panel - factorization is matrix-matrix operation based and recursive, - dividing the panel into NDIV subpanels at each step. This - part of the panel factorization is denoted below by - "recursive panel fact. (RFACT)". The recursion stops when the - current panel is made of less than or equal to NBMIN columns. - At that point, xhpl uses a matrix-vector operation based - factorization denoted below by "PFACTs". Classic recursion - would then use NDIV=2, NBMIN=1.
There are essentially 3 - numerically equivalent LU factorization algorithm variants - (left-looking, Crout and right-looking). In HPL, one can - choose every one of those for the RFACT, as well as the - PFACT. The following lines of HPL.dat allow you to set those - parameters. - - Lines 14-21: (Example 1) - 3 # of panel fact - 0 1 2 PFACTs (0=left, 1=Crout, 2=Right) - 4 # of recursive stopping criterium - 1 2 4 8 NBMINs (>= 1) - 3 # of panels in recursion - 2 3 4 NDIVs - 3 # of recursive panel fact. - 0 1 2 RFACTs (0=left, 1=Crout, 2=Right) - - This example would try all variants of PFACT, 4 values for - NBMIN, namely 1, 2, 4 and 8, 3 values for NDIV namely 2, 3 - and 4, and all variants for RFACT. Lines 14-21: (Example 2) - - 2 # of panel fact - 2 0 PFACTs (0=left, 1=Crout, 2=Right) - 2 # of recursive stopping criterium - 4 8 NBMINs (>= 1) - 1 # of panels in recursion - 2 NDIVs - 1 # of recursive panel fact. - 2 RFACTs (0=left, 1=Crout, 2=Right) - - This example would try 2 variants of PFACT namely right loo- - king and left looking, 2 values for NBMIN, namely 4 and 8, 1 - value for NDIV namely 2, and one variant for RFACT. - - In the main loop of the algorithm, the current panel of - columns is broadcast in process rows using a virtual ring to- - pology. HPL offers various choices, and one most likely wants - to use the increasing ring modified encoded as 1. 4 is also - a good choice. Lines 22-23: (Example 1): - - 1 # of broadcast - 1 BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM) - - This will cause HPL to broadcast the current panel using the - increasing ring modified topology. Lines 22-23: (Example 2): - - 2 # of broadcast - 0 4 BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM) - - This will cause HPL to broadcast the current panel using the - increasing ring virtual topology and the long message algori- - thm. - - Lines 24-25 allow one to specify the look-ahead depth used by - HPL.
A depth of 0 means that the next panel is factorized af- - ter the update by the current panel is completely finished. A - depth of 1 means that the next panel is factorized immediate- - ly after being updated. The update by the current panel is - then finished. A depth of k means that the next k panels are - factorized immediately after being updated. The update by the - current panel is then finished. It turns out that a depth of - 1 seems to give the best results, but may need a large pro- - blem size before one can see the performance gain. So use 1, - if you do not know better, otherwise you may want to try 0. - Look-ahead of depths 2 and larger will probably not give you - better results. Lines 24-25: (Example 1): - - 1 # of lookahead depth - 1 DEPTHs (>=0) - - This will cause HPL to use a look-ahead of depth 1. - Lines 24-25: (Example 2): - - 2 # of lookahead depth - 0 1 DEPTHs (>=0) - - This will cause HPL to use a look-ahead of depths 0 and 1. - - Lines 26-27 allow one to specify the swapping algorithm used by - HPL for all tests. There are currently two swapping algo- - rithms available, one based on "binary exchange" and the - other one based on a "spread-roll" procedure (also called - "long" below). For large problem sizes, this last one is like- - ly to be more efficient. The user can also choose to mix both - variants, that is "binary-exchange" for a number of columns - less than a threshold value, and then the "spread-roll" al- - gorithm. This threshold value is then specified on Line 27. - Lines 26-27: (Example 1): - - 1 SWAP (0=bin-exch,1=long,2=mix) - 60 swapping threshold - - This will cause HPL to use the "long" or "spread-roll" swap- - ping algorithm. Note that a threshold is specified in that - example but not used by HPL. Lines 26-27: (Example 2): - - 2 SWAP (0=bin-exch,1=long,2=mix) - 60 swapping threshold - - This will cause HPL to use the "long" or "spread-roll" swap- - ping algorithm as soon as there are more than 60 columns in - the row panel.
Otherwise, the "binary-exchange" algorithm - will be used instead. - - Line 28 specifies whether the upper triangle of the - panel of columns should be stored in no-transposed or - transposed form. Example: - - 0 L1 in (0=transposed,1=no-transposed) form - - Line 29 specifies whether the panel of rows U should - be stored in no-transposed or transposed form. Example: - - 0 U in (0=transposed,1=no-transposed) form - - Line 30 enables/disables the equilibration phase. This option - will not be used unless you selected 1 or 2 in Line 26. Ex: - - 1 Equilibration (0=no,1=yes) - - - Line 31 specifies the alignment in memory for the - memory space allocated by HPL. On modern machines, one proba- - bly wants to use 4, 8 or 16. This may result in a tiny amount - of memory wasted. Example: - - 4 memory alignment in double (> 0) - -============================================================== - Guide lines: - - 1) Figure out a good block size for the matrix-matrix - multiply routine. The best method is to try a few out. If you - happen to know the block size used by the matrix-matrix - multiply routine, a small multiple of that block size will do - fine. - - HPL uses the block size NB for the data distribution as well - as for the computational granularity. From a data - distribution point of view, the smaller NB, the better the - load balance. You definitely want to stay away from very - large values of NB. From a computation point of view, too - small a value of NB may limit the computational performance by - a large factor because almost no data reuse will occur in the - highest level of the memory hierarchy. The number of messages - will also increase. Efficient matrix-multiply routines are - often internally blocked. Small multiples of this blocking - factor are likely to be good block sizes for HPL. The bottom - line is that "good" block sizes are almost always in the - [32..256] interval.
The best values depend on the computation - / communication performance ratio of your system. To a much - lesser extent, the problem size matters as well. Say for - example, you empirically found that 44 was a good block size - with respect to performance. 88 or 132 are likely to give - slightly better results for large problem sizes because of a - slightly higher flop rate. - - 2) The process mapping should not matter if the nodes of - your platform are single processor computers. If these nodes - are multi-processors, a row-major mapping is recommended. - - 3) HPL likes "square" or slightly flat process grids. Unless - you are using a very small process grid, stay away from the - 1-by-Q and P-by-1 process grids. - - 4) Panel factorization parameters: a good start are the fol- - lowing for the lines 14-21: - - 1 # of panel fact - 1 PFACTs (0=left, 1=Crout, 2=Right) - 2 # of recursive stopping criterium - 4 8 NBMINs (>= 1) - 1 # of panels in recursion - 2 NDIVs - 1 # of recursive panel fact. - 2 RFACTs (0=left, 1=Crout, 2=Right) - - 5) Broadcast parameters: at this time, it is far from obvious - to me what the best setting is, so I would probably try them - all. If I had to guess I would probably start with the follo- - wing for the lines 22-23: - - 2 # of broadcast - 1 3 BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM) - - The best broadcast depends on your problem size and hardware - performance. My take is that 4 or 5 may be competitive for - machines featuring very fast nodes compared to the - network. - - 6) Look-ahead depth: as mentioned above 0 or 1 are likely to - be the best choices. This also depends on the problem size - and machine configuration, so I would try "no look-ahead (0)" - and "look-ahead of depth 1 (1)". That is for lines 24-25: - - 2 # of lookahead depth - 0 1 DEPTHs (>=0) - - 7) Swapping: one can select only one of the three algorithms - in the input file. Theoretically, mix (2) should win, however - long (1) might just be good enough.
The difference should be - small between those two assuming a swapping threshold of the - order of the block size (NB) selected. If this threshold is - very large, HPL will use bin-exch (0) most of the time and if - it is very small (< NB) long (1) will always be used. In - short and assuming the block size (NB) used is say 60, I - would choose for the lines 26-27: - - 2 SWAP (0=bin-exch,1=long,2=mix) - 60 swapping threshold - - I would also try the long variant. For a very small number - of processes in every column of the process grid (say < 4), - very little performance difference should be observable. - - 8) Local storage: I do not think Line 28 matters. Pick 0 if in - doubt. Line 29 is more important. It controls how the panel - of rows should be stored. No doubt 0 is better. The caveat is - that in that case the matrix-multiply function is called with - ( Notrans, Trans, ... ), that is C := C - A B^T. Unless the - computational kernel you are using has a very poor (with - respect to performance) implementation of that case, and is - much more efficient with ( Notrans, Notrans, ... ) just pick - 0 as well. So, my choice: - - 0 L1 in (0=transposed,1=no-transposed) form - 0 U in (0=transposed,1=no-transposed) form - - 9) Equilibration: It is hard to tell whether equilibration - should always be performed or not. Not knowing much about the - random matrix generated and because the overhead is so small - compared to the possible gain, I turn it on all the time. - - 1 Equilibration (0=no,1=yes) - - 10) For alignment, 4 should be plenty, but just to be safe, - one may want to pick 8 instead.
- - 8 memory alignment in double (> 0) - -============================================================== diff --git a/hpl/bin/intel64/HPL.dat b/hpl/bin/intel64/HPL.dat deleted file mode 100644 index 3504a3403e09cf5188a495d632f0eadee9ad2c65..0000000000000000000000000000000000000000 --- a/hpl/bin/intel64/HPL.dat +++ /dev/null @@ -1,31 +0,0 @@ -HPLinpack benchmark input file -Innovative Computing Laboratory, University of Tennessee -HPL.out output file name (if any) -6 device out (6=stdout,7=stderr,file) -1 # of problems sizes (N) -102144 Ns -1 # of NBs -256 NBs -0 PMAP process mapping (0=Row-,1=Column-major) -1 # of process grids (P x Q) -2 Ps -8 Qs -16.0 threshold -3 # of panel fact -0 PFACTs (0=left, 1=Crout, 2=Right) -1 # of recursive stopping criterium -4 NBMINs (>= 1) -1 # of panels in recursion -2 NDIVs -1 # of recursive panel fact. -0 RFACTs (0=left, 1=Crout, 2=Right) -1 # of broadcast -0 BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM) -1 # of lookahead depth -0 DEPTHs (>=0) -2 SWAP (0=bin-exch,1=long,2=mix) -64 swapping threshold -0 L1 in (0=transposed,1=no-transposed) form -0 U in (0=transposed,1=no-transposed) form -1 Equilibration (0=no,1=yes) -8 memory alignment in double (> 0) diff --git a/hpl/bin/intel64/xhpl b/hpl/bin/intel64/xhpl deleted file mode 100755 index 728303e53f15a219318eff3e16be2ea2f0831e90..0000000000000000000000000000000000000000 Binary files a/hpl/bin/intel64/xhpl and /dev/null differ diff --git a/hpl/include/hpl.h b/hpl/include/hpl.h deleted file mode 100644 index 513830dc4a255e27aaf66795b3ca36801e8ac72d..0000000000000000000000000000000000000000 --- a/hpl/include/hpl.h +++ /dev/null @@ -1,97 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. 
Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - */ -#ifndef HPL_H -#define HPL_H -/* - * --------------------------------------------------------------------- - * HPL default compile options that can overridden in the Make. - * --------------------------------------------------------------------- - */ -#ifndef HPL_NO_MPI_DATATYPE /* Use MPI user-defined data type */ -#define HPL_USE_MPI_DATATYPE -#endif - -#ifndef HPL_COPY_L /* do not copy L, use MPI user-defined data types */ -#define HPL_NO_COPY_L -#endif - -#ifndef HPL_DETAILED_TIMING /* Do not enable detailed timings */ -#define HPL_NO_DETAILED_TIMING -#endif - -#ifndef HPL_CALL_VSIPL /* Call the Fortran 77 BLAS interface */ -#ifndef HPL_CALL_CBLAS /* there can be only one */ -#define HPL_CALL_FBLAS -#endif -#endif -/* - * --------------------------------------------------------------------- - * Include files - * --------------------------------------------------------------------- - */ -#include "hpl_misc.h" -#include "hpl_blas.h" -#include "hpl_auxil.h" -#include "hpl_gesv.h" - -#include "hpl_pmisc.h" -#include "hpl_pauxil.h" -#include "hpl_panel.h" -#include "hpl_pfact.h" -#include "hpl_pgesv.h" - -#include "hpl_timer.h" -#include "hpl_matgen.h" -#include "hpl_test.h" - -#include "hpl_ptimer.h" -#include "hpl_pmatgen.h" -#include "hpl_ptest.h" - -#endif -/* - * End of hpl.h - */ diff --git a/hpl/include/hpl_auxil.h b/hpl/include/hpl_auxil.h deleted file mode 100644 index 
9ca8c0c6cd3b4259b865ad9d9387b2e566eb07d6..0000000000000000000000000000000000000000 --- a/hpl/include/hpl_auxil.h +++ /dev/null @@ -1,147 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - */ -#ifndef HPL_AUXIL_H -#define HPL_AUXIL_H -/* - * --------------------------------------------------------------------- - * Include files - * --------------------------------------------------------------------- - */ -#include "hpl_misc.h" -#include "hpl_blas.h" -/* - * --------------------------------------------------------------------- - * typedef definitions - * --------------------------------------------------------------------- - */ -typedef enum -{ HPL_NORM_A = 800, HPL_NORM_1 = 801, HPL_NORM_I = 802 } HPL_T_NORM; - -typedef enum -{ - HPL_MACH_EPS = 900, /* relative machine precision */ - HPL_MACH_SFMIN = 901, /* safe minimum st 1/sfmin does not overflow */ - HPL_MACH_BASE = 902, /* base = base of the machine */ - HPL_MACH_PREC = 903, /* prec = eps*base */ - HPL_MACH_MLEN = 904, /* number of (base) digits in the mantissa */ - HPL_MACH_RND = 905, /* 1.0 if rounding occurs in addition */ - HPL_MACH_EMIN = 906, /* min exponent before (gradual) underflow */ - HPL_MACH_RMIN = 907, /* underflow threshold base**(emin-1) */ - HPL_MACH_EMAX = 908, /* largest exponent before overflow */ - HPL_MACH_RMAX = 909 /* overflow threshold - (base**emax)*(1-eps) */ - -} HPL_T_MACH; -/* - * --------------------------------------------------------------------- - * Function prototypes - * --------------------------------------------------------------------- - */ -void HPL_fprintf -STDC_ARGS( ( - FILE *, - const char *, - ... 
-) ); -void HPL_warn -STDC_ARGS( ( - FILE *, - int, - const char *, - const char *, - ... -) ); -void HPL_abort -STDC_ARGS( ( - int, - const char *, - const char *, - ... -) ); -void HPL_dlacpy -STDC_ARGS( ( - const int, - const int, - const double *, - const int, - double *, - const int -) ); -void HPL_dlatcpy -STDC_ARGS( ( - const int, - const int, - const double *, - const int, - double *, - const int -) ); -void HPL_dlaprnt -STDC_ARGS( ( - const int, - const int, - double *, - const int, - const int, - const int, - const char * -) ); -double HPL_dlange -STDC_ARGS( ( - const HPL_T_NORM, - const int, - const int, - const double *, - const int -) ); -double HPL_dlamch -STDC_ARGS( ( - const HPL_T_MACH -) ); - -#endif -/* - * End of hpl_auxil.h - */ diff --git a/hpl/include/hpl_blas.h b/hpl/include/hpl_blas.h deleted file mode 100644 index c206bcd0edad5e71fdd80cddf65b65b9bd1493be..0000000000000000000000000000000000000000 --- a/hpl/include/hpl_blas.h +++ /dev/null @@ -1,599 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. 
All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
- */ -#ifndef HPL_BLAS_H -#define HPL_BLAS_H -/* - * --------------------------------------------------------------------- - * Include files - * --------------------------------------------------------------------- - */ -#include "hpl_misc.h" -/* - * --------------------------------------------------------------------- - * typedef definitions - * --------------------------------------------------------------------- - */ -enum HPL_ORDER -{ HplRowMajor = 101, HplColumnMajor = 102 }; -enum HPL_TRANS -{ HplNoTrans = 111, HplTrans = 112, HplConjTrans = 113 }; -enum HPL_UPLO -{ HplUpper = 121, HplLower = 122 }; -enum HPL_DIAG -{ HplNonUnit = 131, HplUnit = 132 }; -enum HPL_SIDE -{ HplLeft = 141, HplRight = 142 }; - -#ifdef HPL_CALL_CBLAS -/* - * --------------------------------------------------------------------- - * The C interface of the BLAS is available ... - * --------------------------------------------------------------------- - * #define macro constants - * --------------------------------------------------------------------- - */ -#define CBLAS_INDEX int - -#define CBLAS_ORDER HPL_ORDER -#define CblasRowMajor HplRowMajor -#define CblasColMajor HplColMajor - -#define CBLAS_TRANSPOSE HPL_TRANS -#define CblasNoTrans HplNoTrans -#define CblasTrans HplTrans -#define CblasConjTrans HplConjTrans - -#define CBLAS_UPLO HPL_UPLO -#define CblasUpper HplUpper -#define CblasLower HplLower - -#define CBLAS_DIAG HPL_DIAG -#define CblasNonUnit HplNonUnit -#define CblasUnit HplUnit - -#define CBLAS_SIDE HPL_SIDE -#define CblasLeft HplLeft -#define CblasRight HplRight -/* - * --------------------------------------------------------------------- - * CBLAS Function prototypes - * --------------------------------------------------------------------- - */ -CBLAS_INDEX cblas_idamax -STDC_ARGS( -( const int, const double *, const int ) ); -void cblas_dswap -STDC_ARGS( -( const int, double *, const int, double *, - const int ) ); -void cblas_dcopy -STDC_ARGS( -( const int, const double 
*, const int, double *, - const int ) ); -void cblas_daxpy -STDC_ARGS( -( const int, const double, const double *, const int, - double *, const int ) ); -void cblas_dscal -STDC_ARGS( -( const int, const double, double *, const int ) ); - -void cblas_dgemv -STDC_ARGS( -( const enum CBLAS_ORDER, const enum CBLAS_TRANSPOSE, - const int, const int, const double, const double *, - const int, const double *, const int, const double, - double *, const int ) ); - -void cblas_dger -STDC_ARGS( -( const enum CBLAS_ORDER, const int, const int, - const double, const double *, const int, const double *, - const int, double *, const int ) ); -void cblas_dtrsv -STDC_ARGS( -( const enum CBLAS_ORDER, const enum CBLAS_UPLO, - const enum CBLAS_TRANSPOSE, const enum CBLAS_DIAG, - const int, const double *, const int, double *, - const int ) ); - -void cblas_dgemm -STDC_ARGS( -( const enum CBLAS_ORDER, const enum CBLAS_TRANSPOSE, - const enum CBLAS_TRANSPOSE, const int, const int, - const int, const double, const double *, const int, - const double *, const int, const double, double *, - const int ) ); -void cblas_dtrsm -STDC_ARGS( -( const enum CBLAS_ORDER, const enum CBLAS_SIDE, - const enum CBLAS_UPLO, const enum CBLAS_TRANSPOSE, - const enum CBLAS_DIAG, const int, const int, - const double, const double *, const int, double *, - const int ) ); -/* - * --------------------------------------------------------------------- - * HPL C BLAS macro definition - * --------------------------------------------------------------------- - */ -#define HPL_dswap cblas_dswap -#define HPL_dcopy cblas_dcopy -#define HPL_daxpy cblas_daxpy -#define HPL_dscal cblas_dscal -#define HPL_idamax cblas_idamax - -#define HPL_dgemv cblas_dgemv -#define HPL_dtrsv cblas_dtrsv -#define HPL_dger cblas_dger - -#define HPL_dgemm cblas_dgemm -#define HPL_dtrsm cblas_dtrsm - -#endif - -#ifdef HPL_CALL_FBLAS -/* - * --------------------------------------------------------------------- - * Use the Fortran 77 interface of 
the BLAS ... - * --------------------------------------------------------------------- - * Defaults: Add_, F77_INTEGER=int, StringSunStyle - * --------------------------------------------------------------------- - */ -#ifndef NoChange -#ifndef UpCase -#ifndef Add__ -#ifndef Add_ - -#define Add_ - -#endif -#endif -#endif -#endif - -#ifndef F77_INTEGER -#define F77_INTEGER int -#else -#define HPL_USE_F77_INTEGER_DEF -#endif - -#ifndef StringCrayStyle -#ifndef StringStructVal -#ifndef StringStructPtr -#ifndef StringSunStyle - -#define StringSunStyle - -#endif -#endif -#endif -#endif -/* - * --------------------------------------------------------------------- - * Fortran 77 <-> C interface - * --------------------------------------------------------------------- - * - * These macros identifies how Fortran routines will be called. - * - * Add_ : the Fortran compiler expects the name of C functions to be - * in all lower case and to have an underscore postfixed it (Suns, Intel - * compilers expect this). - * - * NoChange : the Fortran compiler expects the name of C functions to be - * in all lower case (IBM RS6K compilers do this). - * - * UpCase : the Fortran compiler expects the name of C functions to be - * in all upcase. (Cray compilers expect this). - * - * Add__ : the Fortran compiler in use is f2c, a Fortran to C conver- - * ter. - */ -#ifdef NoChange -/* - * These defines set up the naming scheme required to have a FORTRAN - * routine called by a C routine with the following FORTRAN to C inter- - * face: - * - * FORTRAN DECLARATION C CALL - * SUBROUTINE DGEMM(...) dgemm(...) 
- */ -#define F77dswap dswap -#define F77dscal dscal -#define F77dcopy dcopy -#define F77daxpy daxpy -#define F77idamax idamax - -#define F77dgemv dgemv -#define F77dtrsv dtrsv -#define F77dger dger - -#define F77dgemm dgemm -#define F77dtrsm dtrsm - -#endif - -#ifdef UpCase -/* - * These defines set up the naming scheme required to have a FORTRAN - * routine called by a C routine with the following FORTRAN to C inter- - * face: - * - * FORTRAN DECLARATION C CALL - * SUBROUTINE DGEMM(...) DGEMM(...) - */ -#ifdef CRAY_BLAS - -#define F77dswap SSWAP -#define F77dscal SSCAL -#define F77dcopy SCOPY -#define F77daxpy SAXPY -#define F77idamax ISAMAX - -#define F77dgemv SGEMV -#define F77dtrsv STRSV -#define F77dger SGER - -#define F77dgemm SGEMM -#define F77dtrsm STRSM - -#else - -#define F77dswap DSWAP -#define F77dscal DSCAL -#define F77dcopy DCOPY -#define F77daxpy DAXPY -#define F77idamax IDAMAX - -#define F77dgemv DGEMV -#define F77dtrsv DTRSV -#define F77dger DGER - -#define F77dgemm DGEMM -#define F77dtrsm DTRSM - -#endif - -#endif - -#ifdef Add_ -/* - * These defines set up the naming scheme required to have a FORTRAN - * routine called by a C routine with the following FORTRAN to C inter- - * face: - * - * FORTRAN DECLARATION C CALL - * SUBROUTINE DGEMM(...) dgemm_(...) - */ -#define F77dswap dswap_ -#define F77dscal dscal_ -#define F77dcopy dcopy_ -#define F77daxpy daxpy_ -#define F77idamax idamax_ - -#define F77dgemv dgemv_ -#define F77dtrsv dtrsv_ -#define F77dger dger_ - -#define F77dgemm dgemm_ -#define F77dtrsm dtrsm_ - -#endif - -#ifdef Add__ -/* - * These defines set up the naming scheme required to have a FORTRAN - * routine called by a C routine with the following FORTRAN to C inter- - * face: - * - * FORTRAN DECLARATION C CALL - * SUBROUTINE DGEMM(...) dgemm_(...) 
- */ -#define F77dswap dswap_ -#define F77dscal dscal_ -#define F77dcopy dcopy_ -#define F77daxpy daxpy_ -#define F77idamax idamax_ - -#define F77dgemv dgemv_ -#define F77dtrsv dtrsv_ -#define F77dger dger_ - -#define F77dgemm dgemm_ -#define F77dtrsm dtrsm_ - -#endif -/* - * --------------------------------------------------------------------- - * Typedef definitions and conversion utilities - * --------------------------------------------------------------------- - */ -#ifdef StringCrayStyle - -#include - /* Type of character argument in a FORTRAN call */ -#define F77_CHAR _fcd - /* Character conversion utilities */ -#define HPL_F2C_CHAR(c) (*(_fcdtocp(c) )) -#define HPL_C2F_CHAR(c) (_cptofcd(&(c), 1)) - -#define F77_CHAR_DECL F77_CHAR /* input CHARACTER*1 */ - -#endif -/* ------------------------------------------------------------------ */ -#ifdef StringStructVal - /* Type of character argument in a FORTRAN call */ -typedef struct { char *cp; F77_INTEGER len; } F77_CHAR; - /* Character conversion utilities */ -#define HPL_F2C_CHAR(c) (*(c.cp)) - -#define F77_CHAR_DECL F77_CHAR /* input CHARACTER*1 */ - -#endif -/* ------------------------------------------------------------------ */ -#ifdef StringStructPtr - /* Type of character argument in a FORTRAN call */ -typedef struct { char *cp; F77_INTEGER len; } F77_CHAR; - /* Character conversion utilities */ -#define HPL_F2C_CHAR(c) (*(c->cp)) - -#define F77_CHAR_DECL F77_CHAR * /* input CHARACTER*1 */ - -#endif -/* ------------------------------------------------------------------ */ -#ifdef StringSunStyle - /* Type of character argument in a FORTRAN call */ -#define F77_CHAR char * - /* Character conversion utilities */ -#define HPL_F2C_CHAR(c) (*(c)) -#define HPL_C2F_CHAR(c) (&(c)) - -#define F77_CHAR_DECL F77_CHAR /* input CHARACTER*1 */ -#define F77_1_CHAR , F77_INTEGER -#define F77_2_CHAR F77_1_CHAR F77_1_CHAR -#define F77_3_CHAR F77_2_CHAR F77_1_CHAR -#define F77_4_CHAR F77_3_CHAR F77_1_CHAR - -#endif -/* 
------------------------------------------------------------------ */ - -#ifndef F77_1_CHAR -#define F77_1_CHAR -#define F77_2_CHAR -#define F77_3_CHAR -#define F77_4_CHAR -#endif - -#define F77_INT_DECL const F77_INTEGER * /* input integer */ -#define F77_SIN_DECL const double * /* input scalar */ -#define F77_VIN_DECL const double * /* input vector */ -#define F77_VINOUT_DECL double * /* input/output vector */ -#define F77_MIN_DECL const double * /* input matrix */ -#define F77_MINOUT_DECL double * /* input/output matrix */ - -#ifdef CRAY_PVP_ENV /* Type of FORTRAN functions */ -#define F77_VOID_FUN extern fortran void /* subroutine */ -#define F77_INT_FUN extern fortran int /* integer function */ -#else -#define F77_VOID_FUN extern void /* subroutine */ -#define F77_INT_FUN extern int /* integer function */ -#endif -/* - * --------------------------------------------------------------------- - * Fortran 77 BLAS function prototypes - * --------------------------------------------------------------------- - */ -F77_VOID_FUN F77dswap -STDC_ARGS( -( F77_INT_DECL, F77_VINOUT_DECL, F77_INT_DECL, F77_VINOUT_DECL, - F77_INT_DECL ) ); -F77_VOID_FUN F77dscal -STDC_ARGS( -( F77_INT_DECL, F77_SIN_DECL, F77_VINOUT_DECL, F77_INT_DECL ) ); -F77_VOID_FUN F77dcopy -STDC_ARGS( -( F77_INT_DECL, F77_VIN_DECL, F77_INT_DECL, F77_VINOUT_DECL, - F77_INT_DECL ) ); -F77_VOID_FUN F77daxpy -STDC_ARGS( -( F77_INT_DECL, F77_SIN_DECL, F77_VIN_DECL, F77_INT_DECL, - F77_VINOUT_DECL, F77_INT_DECL ) ); -F77_INT_FUN F77idamax -STDC_ARGS( -( F77_INT_DECL, F77_VIN_DECL, F77_INT_DECL ) ); - -F77_VOID_FUN F77dgemv -STDC_ARGS( -( F77_CHAR_DECL, F77_INT_DECL, F77_INT_DECL, F77_SIN_DECL, - F77_MIN_DECL, F77_INT_DECL, F77_VIN_DECL, F77_INT_DECL, - F77_SIN_DECL, F77_VINOUT_DECL, F77_INT_DECL F77_1_CHAR ) ); -F77_VOID_FUN F77dger -STDC_ARGS( -( F77_INT_DECL, F77_INT_DECL, F77_SIN_DECL, F77_VIN_DECL, - F77_INT_DECL, F77_VIN_DECL, F77_INT_DECL, F77_MINOUT_DECL, - F77_INT_DECL ) ); -F77_VOID_FUN F77dtrsv 
-STDC_ARGS( -( F77_CHAR_DECL, F77_CHAR_DECL, F77_CHAR_DECL, F77_INT_DECL, - F77_MIN_DECL, F77_INT_DECL, F77_VINOUT_DECL, F77_INT_DECL - F77_3_CHAR ) ); - -F77_VOID_FUN F77dgemm -STDC_ARGS( -( F77_CHAR_DECL, F77_CHAR_DECL, F77_INT_DECL, F77_INT_DECL, - F77_INT_DECL, F77_SIN_DECL, F77_MIN_DECL, F77_INT_DECL, - F77_MIN_DECL, F77_INT_DECL, F77_SIN_DECL, F77_MINOUT_DECL, - F77_INT_DECL F77_2_CHAR ) ); -F77_VOID_FUN F77dtrsm -STDC_ARGS( -( F77_CHAR_DECL, F77_CHAR_DECL, F77_CHAR_DECL, F77_CHAR_DECL, - F77_INT_DECL, F77_INT_DECL, F77_SIN_DECL, F77_MIN_DECL, - F77_INT_DECL, F77_MINOUT_DECL, F77_INT_DECL F77_4_CHAR ) ); - -#endif -/* - * --------------------------------------------------------------------- - * HPL BLAS Function prototypes - * --------------------------------------------------------------------- - */ -#ifndef HPL_CALL_CBLAS - -int HPL_idamax -STDC_ARGS( ( - const int, - const double *, - const int -) ); -void HPL_daxpy -STDC_ARGS( ( - const int, - const double, - const double *, - const int, - double *, - const int -) ); -void HPL_dcopy -STDC_ARGS( ( - const int, - const double *, - const int, - double *, - const int -) ); -void HPL_dscal -STDC_ARGS( ( - const int, - const double, - double *, - const int -) ); -void HPL_dswap -STDC_ARGS( ( - const int, - double *, - const int, - double *, - const int -) ); -void HPL_dgemv -STDC_ARGS( ( - const enum HPL_ORDER, - const enum HPL_TRANS, - const int, - const int, - const double, - const double *, - const int, - const double *, - const int, - const double, - double *, - const int -) ); -void HPL_dger -STDC_ARGS( ( - const enum HPL_ORDER, - const int, - const int, - const double, - const double *, - const int, - double *, - const int, - double *, - const int -) ); -void HPL_dtrsv -STDC_ARGS( ( - const enum HPL_ORDER, - const enum HPL_UPLO, - const enum HPL_TRANS, - const enum HPL_DIAG, - const int, - const double *, - const int, - double *, - const int -) ); -void HPL_dgemm -STDC_ARGS( ( - const enum HPL_ORDER, - 
const enum HPL_TRANS, - const enum HPL_TRANS, - const int, - const int, - const int, - const double, - const double *, - const int, - const double *, - const int, - const double, - double *, - const int -) ); -void HPL_dtrsm -STDC_ARGS( ( - const enum HPL_ORDER, - const enum HPL_SIDE, - const enum HPL_UPLO, - const enum HPL_TRANS, - const enum HPL_DIAG, - const int, - const int, - const double, - const double *, - const int, - double *, - const int -) ); - -#endif - -#endif -/* - * hpl_blas.h - */ diff --git a/hpl/include/hpl_comm.h b/hpl/include/hpl_comm.h deleted file mode 100644 index 88b24abb7eb81b6373cd763584e50564a440d935..0000000000000000000000000000000000000000 --- a/hpl/include/hpl_comm.h +++ /dev/null @@ -1,161 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. 
The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
- */ -#ifndef HPL_COMM_H -#define HPL_COMM_H -/* - * --------------------------------------------------------------------- - * Include files - * --------------------------------------------------------------------- - */ -#include "hpl_pmisc.h" -#include "hpl_panel.h" -/* - * --------------------------------------------------------------------- - * #typedefs and data structures - * --------------------------------------------------------------------- - */ -typedef enum -{ - HPL_1RING = 401, /* Increasing ring */ - HPL_1RING_M = 402, /* Increasing ring (modified) */ - HPL_2RING = 403, /* Increasing 2-ring */ - HPL_2RING_M = 404, /* Increasing 2-ring (modified) */ - HPL_BLONG = 405, /* long broadcast */ - HPL_BLONG_M = 406 /* long broadcast (modified) */ -} HPL_T_TOP; -/* - * --------------------------------------------------------------------- - * #define macro constants - * --------------------------------------------------------------------- - */ -#define HPL_FAILURE 0 -#define HPL_SUCCESS 1 -#define HPL_KEEP_TESTING 2 -/* - * --------------------------------------------------------------------- - * comm function prototypes - * --------------------------------------------------------------------- - */ -int HPL_send -STDC_ARGS( ( - double *, - int, - int, - int, - MPI_Comm -) ); -int HPL_recv -STDC_ARGS( ( - double *, - int, - int, - int, - MPI_Comm -) ); -int HPL_sdrv -STDC_ARGS( ( - double *, - int, - int, - double *, - int, - int, - int, - MPI_Comm -) ); -int HPL_binit -STDC_ARGS( ( - HPL_T_panel * -) ); -int HPL_bcast -STDC_ARGS( ( - HPL_T_panel *, - int * -) ); -int HPL_bwait -STDC_ARGS( ( - HPL_T_panel * -) ); -int HPL_packL -STDC_ARGS( ( - HPL_T_panel *, - const int, - const int, - const int -) ); -void HPL_copyL -STDC_ARGS( ( - HPL_T_panel * -) ); - -int HPL_binit_1ring STDC_ARGS( ( HPL_T_panel * ) ); -int HPL_bcast_1ring STDC_ARGS( ( HPL_T_panel *, int * ) ); -int HPL_bwait_1ring STDC_ARGS( ( HPL_T_panel * ) ); - -int HPL_binit_1rinM STDC_ARGS( ( 
HPL_T_panel * ) ); -int HPL_bcast_1rinM STDC_ARGS( ( HPL_T_panel *, int * ) ); -int HPL_bwait_1rinM STDC_ARGS( ( HPL_T_panel * ) ); - -int HPL_binit_2ring STDC_ARGS( ( HPL_T_panel * ) ); -int HPL_bcast_2ring STDC_ARGS( ( HPL_T_panel *, int * ) ); -int HPL_bwait_2ring STDC_ARGS( ( HPL_T_panel * ) ); - -int HPL_binit_2rinM STDC_ARGS( ( HPL_T_panel * ) ); -int HPL_bcast_2rinM STDC_ARGS( ( HPL_T_panel *, int * ) ); -int HPL_bwait_2rinM STDC_ARGS( ( HPL_T_panel * ) ); - -int HPL_binit_blong STDC_ARGS( ( HPL_T_panel * ) ); -int HPL_bcast_blong STDC_ARGS( ( HPL_T_panel *, int * ) ); -int HPL_bwait_blong STDC_ARGS( ( HPL_T_panel * ) ); - -int HPL_binit_blonM STDC_ARGS( ( HPL_T_panel * ) ); -int HPL_bcast_blonM STDC_ARGS( ( HPL_T_panel *, int * ) ); -int HPL_bwait_blonM STDC_ARGS( ( HPL_T_panel * ) ); - -#endif -/* - * End of hpl_comm.h - */ diff --git a/hpl/include/hpl_gesv.h b/hpl/include/hpl_gesv.h deleted file mode 100644 index 4baed1ade3ef6dba08cb6d4b804b2f940d9ec2aa..0000000000000000000000000000000000000000 --- a/hpl/include/hpl_gesv.h +++ /dev/null @@ -1,87 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. 
All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
- */ -#ifndef HPL_GESV_H -#define HPL_GESV_H -/* - * --------------------------------------------------------------------- - * Include files - * --------------------------------------------------------------------- - */ -#include "hpl_misc.h" -#include "hpl_blas.h" -#include "hpl_auxil.h" -/* - * --------------------------------------------------------------------- - * #typedefs and data structures - * --------------------------------------------------------------------- - */ -typedef enum -{ - HPL_LEFT_LOOKING = 301, /* Left looking lu fact variant */ - HPL_CROUT = 302, /* Crout lu fact variant */ - HPL_RIGHT_LOOKING = 303 /* Right looking lu fact variant */ -} HPL_T_FACT; -/* - * --------------------------------------------------------------------- - * Function prototypes - * --------------------------------------------------------------------- - */ -void HPL_dgesv -STDC_ARGS( -( const int, const int, const int, const HPL_T_FACT, - const HPL_T_FACT, const int, double *, - const int, int * ) ); -void HPL_ipid -STDC_ARGS( -( const int, double *, int *, int *, - int *, int *, int *, int *, - const int, const int, const int, const int, - const int ) ); - -#endif -/* - * End of hpl_gesv.h - */ diff --git a/hpl/include/hpl_grid.h b/hpl/include/hpl_grid.h deleted file mode 100644 index 420e5882295ddc23dc592647a371a0447a32b35d..0000000000000000000000000000000000000000 --- a/hpl/include/hpl_grid.h +++ /dev/null @@ -1,212 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. 
Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
- */ -#ifndef HPL_GRID_H -#define HPL_GRID_H -/* - * --------------------------------------------------------------------- - * Include files - * --------------------------------------------------------------------- - */ -#include "hpl_pmisc.h" -/* - * --------------------------------------------------------------------- - * #typedefs and data structures - * --------------------------------------------------------------------- - */ -typedef enum { HPL_INT = 100, HPL_DOUBLE = 101 } HPL_T_TYPE; - -typedef enum -{ - HPL_ROW_MAJOR = 201, - HPL_COLUMN_MAJOR = 202 -} HPL_T_ORDER; - -typedef struct HPL_S_grid -{ - MPI_Comm all_comm; /* grid communicator */ - MPI_Comm row_comm; /* row communicator */ - MPI_Comm col_comm; /* column communicator */ - HPL_T_ORDER order; /* ordering of the procs in the grid */ - int iam; /* my rank in the grid */ - int myrow; /* my row number in the grid */ - int mycol; /* my column number in the grid */ - int nprow; /* the total # of rows in the grid */ - int npcol; /* the total # of columns in the grid */ - int nprocs; /* the total # of procs in the grid */ - int row_ip2; /* largest power of two <= nprow */ - int row_hdim; /* row_ip2 procs hypercube dimension */ - int row_ip2m1; /* largest power of two <= nprow-1 */ - int row_mask; /* row_ip2m1 procs hypercube mask */ - int col_ip2; /* largest power of two <= npcol */ - int col_hdim; /* col_ip2 procs hypercube dimension */ - int col_ip2m1; /* largest power of two <= npcol-1 */ - int col_mask; /* col_ip2m1 procs hypercube mask */ -} HPL_T_grid; - -/* - * --------------------------------------------------------------------- - * Data Structures - * --------------------------------------------------------------------- - */ -typedef void (*HPL_T_OP) -( const int, const void *, void *, const HPL_T_TYPE ); -/* - * --------------------------------------------------------------------- - * #define macros definitions - * --------------------------------------------------------------------- - */ -#define 
HPL_2_MPI_TYPE( typ ) \ - ( ( typ == HPL_INT ? MPI_INT : MPI_DOUBLE ) ) -/* - * The following macros perform common modulo operations; All functions - * except MPosMod assume arguments are < d (i.e., arguments are themsel- - * ves within modulo range). - */ - /* increment with mod */ -#define MModInc(I, d) if(++(I) == (d)) (I) = 0 - /* decrement with mod */ -#define MModDec(I, d) if(--(I) == -1) (I) = (d)-1 - /* positive modulo */ -#define MPosMod(I, d) ( (I) - ((I)/(d))*(d) ) - /* add two numbers */ -#define MModAdd(I1, I2, d) \ - ( ( (I1) + (I2) < (d) ) ? (I1) + (I2) : (I1) + (I2) - (d) ) - /* add 1 to # */ -#define MModAdd1(I, d) ( ((I) != (d)-1) ? (I) + 1 : 0 ) - /* subtract two numbers */ -#define MModSub(I1, I2, d) \ - ( ( (I1) < (I2) ) ? (d) + (I1) - (I2) : (I1) - (I2) ) - /* sub 1 from # */ -#define MModSub1(I, d) ( ((I)!=0) ? (I)-1 : (d)-1 ) -/* - * --------------------------------------------------------------------- - * grid function prototypes - * --------------------------------------------------------------------- - */ -int HPL_grid_init -STDC_ARGS( ( - MPI_Comm, - const HPL_T_ORDER, - const int, - const int, - HPL_T_grid * -) ); -int HPL_grid_exit -STDC_ARGS( ( - HPL_T_grid * -) ); - -int HPL_grid_info -STDC_ARGS( ( - const HPL_T_grid *, - int *, - int *, - int *, - int * -) ); -int HPL_pnum -STDC_ARGS( ( - const HPL_T_grid *, - const int, - const int -) ); - -int HPL_barrier -STDC_ARGS( ( - MPI_Comm -) ); -int HPL_broadcast -STDC_ARGS( ( - void *, - const int, - const HPL_T_TYPE, - const int, - MPI_Comm -) ); -int HPL_reduce -STDC_ARGS( ( - void *, - const int, - const HPL_T_TYPE, - const HPL_T_OP , - const int, - MPI_Comm -) ); -int HPL_all_reduce -STDC_ARGS( ( - void *, - const int, - const HPL_T_TYPE, - const HPL_T_OP , - MPI_Comm -) ); - -void HPL_max -STDC_ARGS( ( - const int, - const void *, - void *, - const HPL_T_TYPE -) ); -void HPL_min -STDC_ARGS( ( - const int, - const void *, - void *, - const HPL_T_TYPE -) ); -void HPL_sum -STDC_ARGS( ( 
- const int, - const void *, - void *, - const HPL_T_TYPE -) ); - -#endif -/* - * End of hpl_grid.h - */ diff --git a/hpl/include/hpl_matgen.h b/hpl/include/hpl_matgen.h deleted file mode 100644 index 47994e083493e08639cace532af6b84f891ef2cc..0000000000000000000000000000000000000000 --- a/hpl/include/hpl_matgen.h +++ /dev/null @@ -1,120 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - */ -#ifndef HPL_MATGEN_H -#define HPL_MATGEN_H -/* - * --------------------------------------------------------------------- - * Include files - * --------------------------------------------------------------------- - */ -#include "hpl_misc.h" -#include "hpl_blas.h" -#include "hpl_auxil.h" -/* - * --------------------------------------------------------------------- - * #define macro constants - * --------------------------------------------------------------------- - */ -#define HPL_MULT0 1284865837 -#define HPL_MULT1 1481765933 -#define HPL_IADD0 1 -#define HPL_IADD1 0 -#define HPL_DIVFAC 2147483648.0 -#define HPL_POW16 65536.0 -#define HPL_HALF 0.5 -/* - * --------------------------------------------------------------------- - * Function prototypes - * --------------------------------------------------------------------- - */ -void HPL_dmatgen -STDC_ARGS( ( - const int, - const int, - double *, - const int, - const int -) ); -void HPL_lmul -STDC_ARGS( ( - int *, - int *, - int * -) ); -void HPL_ladd -STDC_ARGS( ( - int *, - int *, - int * -) ); -void HPL_xjumpm -STDC_ARGS( ( - const int, - int *, - int *, - int *, - int *, - int *, - int * -) ); -void HPL_setran -STDC_ARGS( ( - const int, - int * -) ); -void HPL_jumpit -STDC_ARGS( ( - int *, - int *, - int *, - int * -) ); -double HPL_rand STDC_ARGS( ( void ) ); - -#endif -/* - * End of hpl_matgen.h - */ diff --git a/hpl/include/hpl_misc.h b/hpl/include/hpl_misc.h deleted file mode 
100644 index deb225eeb94fdfa9b6cab2252710fc3b72f1c8f3..0000000000000000000000000000000000000000 --- a/hpl/include/hpl_misc.h +++ /dev/null @@ -1,110 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - */ -#ifndef HPL_MISC_H -#define HPL_MISC_H -/* - * --------------------------------------------------------------------- - * Include files - * --------------------------------------------------------------------- - */ -#ifdef __STDC__ -#define STDC_HEADERS -#endif - -#include -#include -#include -#include - -#ifdef STDC_HEADERS -#include -#define STDC_ARGS(p) p -#else -#include -#define STDC_ARGS(p) () -#endif - -#ifdef HPL_CALL_VSIPL -#include -#endif -/* - * --------------------------------------------------------------------- - * #define macro constants - * --------------------------------------------------------------------- - */ -#define HPL_rone 1.0 -#define HPL_rtwo 2.0 -#define HPL_rzero 0.0 -/* - * --------------------------------------------------------------------- - * #define macros definitions - * --------------------------------------------------------------------- - */ -#define Mabs( a_ ) ( ( (a_) < 0 ) ? -(a_) : (a_) ) -#define Mmin( a_, b_ ) ( ( (a_) < (b_) ) ? (a_) : (b_) ) -#define Mmax( a_, b_ ) ( ( (a_) > (b_) ) ? (a_) : (b_) ) - -#define Mfloor(a,b) (((a)>0) ? (((a)/(b))) : (-(((-(a))+(b)-1)/(b)))) -#define Mceil(a,b) ( ( (a)+(b)-1 ) / (b) ) -#define Miceil(a,b) (((a)>0) ? ((((a)+(b)-1)/(b))) : (-((-(a))/(b)))) - -#define Mupcase(C) (((C)>96 && (C)<123) ? (C) & 0xDF : (C)) -#define Mlowcase(C) (((C)>64 && (C)< 91) ? 
(C) | 32 : (C)) -/* - * Mptr returns a pointer to a_( i_, j_ ) for readability reasons and - * also less silly errors ... - */ -#define Mptr( a_, i_, j_, lda_ ) \ - ( (a_) + (size_t)(i_) + (size_t)(j_)*(size_t)(lda_) ) -/* - * Align pointer - */ -#define HPL_PTR( ptr_, al_ ) \ - ( ( ( (size_t)(ptr_)+(al_)-1 ) / (al_) ) * (al_) ) -#endif -/* - * End of hpl_misc.h - */ diff --git a/hpl/include/hpl_panel.h b/hpl/include/hpl_panel.h deleted file mode 100644 index 653b69767eb17cfa1624675a7402e0e43b2c1ce2..0000000000000000000000000000000000000000 --- a/hpl/include/hpl_panel.h +++ /dev/null @@ -1,147 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. 
- * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - */ -#ifndef HPL_PANEL_H -#define HPL_PANEL_H -/* - * --------------------------------------------------------------------- - * Include files - * --------------------------------------------------------------------- - */ -#include "hpl_pmisc.h" -#include "hpl_grid.h" -/* - * --------------------------------------------------------------------- - * Data Structures - * --------------------------------------------------------------------- - */ -typedef struct HPL_S_panel -{ - struct HPL_S_grid * grid; /* ptr to the process grid */ - struct HPL_S_palg * algo; /* ptr to the algo parameters */ - struct HPL_S_pmat * pmat; /* ptr to the local array info */ - double * A; /* ptr to trailing part of A */ - double * WORK; /* work space */ - double * L2; /* ptr to L */ - double * L1; /* ptr to jb x jb upper block of A */ - double * DPIV; /* ptr to replicated jb pivot array */ - double * DINFO; /* ptr to replicated scalar info */ - double * U; /* ptr to U */ - int * IWORK; /* integer workspace for swapping */ - void * * * buffers[2]; /* buffers for panel bcast */ - int counts [2]; /* counts for panel bcast */ - MPI_Datatype dtypes [2]; /* data types for panel bcast */ - 
MPI_Request request[1]; /* requests for panel bcast */ - MPI_Status status [1]; /* status for panel bcast */ - int nb; /* distribution blocking factor */ - int jb; /* panel width */ - int m; /* global # of rows of trailing part of A */ - int n; /* global # of cols of trailing part of A */ - int ia; /* global row index of trailing part of A */ - int ja; /* global col index of trailing part of A */ - int mp; /* local # of rows of trailing part of A */ - int nq; /* local # of cols of trailing part of A */ - int ii; /* local row index of trailing part of A */ - int jj; /* local col index of trailing part of A */ - int lda; /* local leading dim of array A */ - int prow; /* proc. row owning 1st row of trail. A */ - int pcol; /* proc. col owning 1st col of trail. A */ - int msgid; /* message id for panel bcast */ - int ldl2; /* local leading dim of array L2 */ - int len; /* length of the buffer to broadcast */ -#ifdef HPL_CALL_VSIPL - vsip_block_d * Ablock; /* A block */ - vsip_block_d * L1block; /* L1 block */ - vsip_block_d * L2block; /* L2 block */ - vsip_block_d * Ublock; /* U block */ -#endif -} HPL_T_panel; - -/* - * --------------------------------------------------------------------- - * panel function prototypes - * --------------------------------------------------------------------- - */ -#include "hpl_pgesv.h" - -void HPL_pdpanel_new -STDC_ARGS( ( - HPL_T_grid *, - HPL_T_palg *, - const int, - const int, - const int, - HPL_T_pmat *, - const int, - const int, - const int, - HPL_T_panel * * -) ); -void HPL_pdpanel_init -STDC_ARGS( ( - HPL_T_grid *, - HPL_T_palg *, - const int, - const int, - const int, - HPL_T_pmat *, - const int, - const int, - const int, - HPL_T_panel * -) ); -int HPL_pdpanel_disp -STDC_ARGS( ( - HPL_T_panel * * -) ); -int HPL_pdpanel_free -STDC_ARGS( ( - HPL_T_panel * -) ); - -#endif -/* - * End of hpl_panel.h - */ diff --git a/hpl/include/hpl_pauxil.h b/hpl/include/hpl_pauxil.h deleted file mode 100644 index 
7ff047ea0a4f5c1b79db8ecf93c2924e87704d56..0000000000000000000000000000000000000000 --- a/hpl/include/hpl_pauxil.h +++ /dev/null @@ -1,505 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - */ -#ifndef HPL_PAUXIL_H -#define HPL_PAUXIL_H -/* - * --------------------------------------------------------------------- - * Include files - * --------------------------------------------------------------------- - */ -#include "hpl_misc.h" -#include "hpl_blas.h" -#include "hpl_auxil.h" - -#include "hpl_pmisc.h" -#include "hpl_grid.h" -/* - * --------------------------------------------------------------------- - * #define macros definitions - * --------------------------------------------------------------------- - */ -/* - * Mindxg2p returns the process coodinate owning the entry globally in- - * dexed by ig_. - */ -#define Mindxg2p( ig_, inb_, nb_, proc_, src_, nprocs_ ) \ - { \ - if( ( (ig_) >= (inb_) ) && ( (src_) >= 0 ) && \ - ( (nprocs_) > 1 ) ) \ - { \ - proc_ = (src_) + 1 + ( (ig_)-(inb_) ) / (nb_); \ - proc_ -= ( proc_ / (nprocs_) ) * (nprocs_); \ - } \ - else \ - { \ - proc_ = (src_); \ - } \ - } - -#define Mindxg2l( il_, ig_, inb_, nb_, proc_, src_, nprocs_ ) \ - { \ - if( ( (ig_) < (inb_) ) || ( (src_) == -1 ) || \ - ( (nprocs_) == 1 ) ) { il_ = (ig_); } \ - else \ - { \ - int i__, j__; \ - j__ = ( i__ = ( (ig_)-(inb_) ) / (nb_) ) / (nprocs_); \ - il_ = (nb_)*( j__ - i__ ) + \ - ( (i__ + 1 - ( j__ + 1 ) * (nprocs_) ) ? 
\ - (ig_) - (inb_) : (ig_) ); \ - } \ - } - -#define Mindxg2lp( il_, proc_, ig_, inb_, nb_, src_, nprocs_ ) \ - { \ - if( ( (ig_) < (inb_) ) || ( (src_) == -1 ) || \ - ( (nprocs_) == 1 ) ) \ - { il_ = (ig_); proc_ = (src_); } \ - else \ - { \ - int i__, j__; \ - j__ = ( i__ = ( (ig_)-(inb_) ) / (nb_) ) / (nprocs_); \ - il_ = (nb_)*(j__-i__) + \ - ( ( i__ + 1 - ( j__ + 1 ) * (nprocs_) ) ? \ - (ig_) - (inb_) : (ig_) ); \ - proc_ = (src_) + 1 + i__; \ - proc_ -= ( proc_ / (nprocs_) ) * (nprocs_); \ - } \ - } -/* - * Mindxl2g computes the global index ig_ corresponding to the local - * index il_ in process proc_. - */ -#define Mindxl2g( ig_, il_, inb_, nb_, proc_, src_, nprocs_ ) \ - { \ - if( ( (src_) >= 0 ) && ( (nprocs_) > 1 ) ) \ - { \ - if( (proc_) == (src_) ) \ - { \ - if( (il_) < (inb_) ) ig_ = (il_); \ - else ig_ = (il_) + \ - (nb_)*((nprocs_)-1)*(((il_)-(inb_))/(nb_) + 1); \ - } \ - else if( (proc_) < (src_) ) \ - { \ - ig_ = (il_) + (inb_) + \ - (nb_)*( ((nprocs_)-1)*((il_)/(nb_)) + \ - (proc_)-(src_)-1+(nprocs_) ); \ - } \ - else \ - { \ - ig_ = (il_) + (inb_) + \ - (nb_)*( ((nprocs_)-1)*((il_)/(nb_)) + \ - (proc_)-(src_)-1 ); \ - } \ - } \ - else \ - { \ - ig_ = (il_); \ - } \ - } -/* - * MnumrocI computes the # of local indexes np_ residing in the process - * of coordinate proc_ corresponding to the interval of global indexes - * i_:i_+n_-1 assuming that the global index 0 resides in the process - * src_, and that the indexes are distributed from src_ using the para- - * meters inb_, nb_ and nprocs_. 
- */ -#define MnumrocI( np_, n_, i_, inb_, nb_, proc_, src_, nprocs_ ) \ - { \ - if( ( (src_) >= 0 ) && ( (nprocs_) > 1 ) ) \ - { \ - int inb__, mydist__, n__, nblk__, quot__, src__; \ - if( ( inb__ = (inb_) - (i_) ) <= 0 ) \ - { \ - nblk__ = (-inb__) / (nb_) + 1; \ - src__ = (src_) + nblk__; \ - src__ -= ( src__ / (nprocs_) ) * (nprocs_); \ - inb__ += nblk__*(nb_); \ - if( ( n__ = (n_) - inb__ ) <= 0 ) \ - { \ - if( (proc_) == src__ ) np_ = (n_); \ - else np_ = 0; \ - } \ - else \ - { \ - if( ( mydist__ = (proc_) - src__ ) < 0 ) \ - mydist__ += (nprocs_); \ - nblk__ = n__ / (nb_) + 1; \ - mydist__ -= nblk__ - \ - (quot__ = (nblk__ / (nprocs_))) * (nprocs_); \ - if( mydist__ < 0 ) \ - { \ - if( (proc_) != src__ ) \ - np_ = (nb_) + (nb_) * quot__; \ - else \ - np_ = inb__ + (nb_) * quot__; \ - } \ - else if( mydist__ > 0 ) \ - { \ - np_ = (nb_) * quot__; \ - } \ - else \ - { \ - if( (proc_) != src__ ) \ - np_ = n__ +(nb_)+(nb_)*(quot__ - nblk__); \ - else \ - np_ = (n_)+ (nb_)*(quot__ - nblk__); \ - } \ - } \ - } \ - else \ - { \ - if( ( n__ = (n_) - inb__ ) <= 0 ) \ - { \ - if( (proc_) == (src_) ) np_ = (n_); \ - else np_ = 0; \ - } \ - else \ - { \ - if( ( mydist__ = (proc_) - (src_) ) < 0 ) \ - mydist__ += (nprocs_); \ - nblk__ = n__ / (nb_) + 1; \ - mydist__ -= nblk__ - \ - ( quot__ = (nblk__ / (nprocs_)) )*(nprocs_); \ - if( mydist__ < 0 ) \ - { \ - if( (proc_) != (src_) ) \ - np_ = (nb_) + (nb_) * quot__; \ - else \ - np_ = inb__ + (nb_) * quot__; \ - } \ - else if( mydist__ > 0 ) \ - { \ - np_ = (nb_) * quot__; \ - } \ - else \ - { \ - if( (proc_) != (src_) ) \ - np_ = n__ +(nb_)+(nb_)*(quot__ - nblk__); \ - else \ - np_ = (n_)+ (nb_)*(quot__ - nblk__); \ - } \ - } \ - } \ - } \ - else \ - { \ - np_ = (n_); \ - } \ - } - -#define Mnumroc( np_, n_, inb_, nb_, proc_, src_, nprocs_ ) \ - MnumrocI( np_, n_, 0, inb_, nb_, proc_, src_, nprocs_ ) -/* - * --------------------------------------------------------------------- - * Function prototypes - * 
--------------------------------------------------------------------- - */ -void HPL_indxg2lp -STDC_ARGS( ( - int *, - int *, - const int, - const int, - const int, - const int, - const int -) ); -int HPL_indxg2l -STDC_ARGS( ( - const int, - const int, - const int, - const int, - const int -) ); -int HPL_indxg2p -STDC_ARGS( ( - const int, - const int, - const int, - const int, - const int -) ); -int HPL_indxl2g -STDC_ARGS( ( - const int, - const int, - const int, - const int, - const int, - const int -) ); -void HPL_infog2l -STDC_ARGS( ( - int, - int, - const int, - const int, - const int, - const int, - const int, - const int, - const int, - const int, - const int, - const int, - int *, - int *, - int *, - int * -) ); -int HPL_numroc -STDC_ARGS( ( - const int, - const int, - const int, - const int, - const int, - const int -) ); -int HPL_numrocI -STDC_ARGS( ( - const int, - const int, - const int, - const int, - const int, - const int, - const int -) ); - -void HPL_dlaswp00N -STDC_ARGS( ( - const int, - const int, - double *, - const int, - const int * -) ); -void HPL_dlaswp10N -STDC_ARGS( ( - const int, - const int, - double *, - const int, - const int * -) ); -void HPL_dlaswp01N -STDC_ARGS( ( - const int, - const int, - double *, - const int, - double *, - const int, - const int *, - const int * -) ); -void HPL_dlaswp01T -STDC_ARGS( ( - const int, - const int, - double *, - const int, - double *, - const int, - const int *, - const int * -) ); -void HPL_dlaswp02N -STDC_ARGS( ( - const int, - const int, - const double *, - const int, - double *, - double *, - const int, - const int *, - const int * -) ); -void HPL_dlaswp03N -STDC_ARGS( ( - const int, - const int, - double *, - const int, - const double *, - const double *, - const int -) ); -void HPL_dlaswp03T -STDC_ARGS( ( - const int, - const int, - double *, - const int, - const double *, - const double *, - const int -) ); -void HPL_dlaswp04N -STDC_ARGS( ( - const int, - const int, - const int, - double *, - 
const int, - double *, - const int, - const double *, - const double *, - const int, - const int *, - const int * -) ); -void HPL_dlaswp04T -STDC_ARGS( ( - const int, - const int, - const int, - double *, - const int, - double *, - const int, - const double *, - const double *, - const int, - const int *, - const int * -) ); -void HPL_dlaswp05N -STDC_ARGS( ( - const int, - const int, - double *, - const int, - const double *, - const int, - const int *, - const int * -) ); -void HPL_dlaswp05T -STDC_ARGS( ( - const int, - const int, - double *, - const int, - const double *, - const int, - const int *, - const int * -) ); -void HPL_dlaswp06N -STDC_ARGS( ( - const int, - const int, - double *, - const int, - double *, - const int, - const int * -) ); -void HPL_dlaswp06T -STDC_ARGS( ( - const int, - const int, - double *, - const int, - double *, - const int, - const int * -) ); - -void HPL_pabort -STDC_ARGS( ( - int, - const char *, - const char *, - ... -) ); -void HPL_pwarn -STDC_ARGS( ( - FILE *, - int, - const char *, - const char *, - ... -) ); -void HPL_pdlaprnt -STDC_ARGS( ( - const HPL_T_grid *, - const int, - const int, - const int, - double *, - const int, - const int, - const int, - const char * -) ); -double HPL_pdlamch -STDC_ARGS( ( - MPI_Comm, - const HPL_T_MACH -) ); -double HPL_pdlange -STDC_ARGS( ( - const HPL_T_grid *, - const HPL_T_NORM, - const int, - const int, - const int, - const double *, - const int -) ); - -#endif -/* - * End of hpl_pauxil.h - */ diff --git a/hpl/include/hpl_pfact.h b/hpl/include/hpl_pfact.h deleted file mode 100644 index d6b8ad4609d77054b43e9d9211a30db84ebd9151..0000000000000000000000000000000000000000 --- a/hpl/include/hpl_pfact.h +++ /dev/null @@ -1,216 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. 
Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - */ -#ifndef HPL_PFACT_H -#define HPL_PFACT_H -/* - * --------------------------------------------------------------------- - * Include files - * --------------------------------------------------------------------- - */ -#include "hpl_misc.h" -#include "hpl_blas.h" -#include "hpl_gesv.h" - -#include "hpl_pmisc.h" -#include "hpl_pauxil.h" -#include "hpl_panel.h" -/* - * --------------------------------------------------------------------- - * #typedefs and data structures - * --------------------------------------------------------------------- - */ -typedef void (*HPL_T_PFA_FUN) -( HPL_T_panel *, const int, const int, const int, - double * ); -typedef void (*HPL_T_RFA_FUN) -( HPL_T_panel *, const int, const int, const int, - double * ); -typedef void (*HPL_T_UPD_FUN) -( HPL_T_panel *, int *, HPL_T_panel *, const int ); -/* - * --------------------------------------------------------------------- - * Function prototypes - * --------------------------------------------------------------------- - */ -void HPL_dlocmax -STDC_ARGS( ( - HPL_T_panel *, - const int, - const int, - const int, - double * -) ); - -void HPL_dlocswpN -STDC_ARGS( ( - HPL_T_panel *, - const int, - const int, - double * -) ); -void HPL_dlocswpT -STDC_ARGS( ( - HPL_T_panel *, - const int, - const int, - double * -) ); -void HPL_pdmxswp -STDC_ARGS( ( - HPL_T_panel *, - const int, - const int, - const int, - double * -) ); - -void HPL_pdpancrN -STDC_ARGS( ( - HPL_T_panel *, 
- const int, - const int, - const int, - double * -) ); -void HPL_pdpancrT -STDC_ARGS( ( - HPL_T_panel *, - const int, - const int, - const int, - double * -) ); -void HPL_pdpanllN -STDC_ARGS( ( - HPL_T_panel *, - const int, - const int, - const int, - double * -) ); -void HPL_pdpanllT -STDC_ARGS( ( - HPL_T_panel *, - const int, - const int, - const int, - double * -) ); -void HPL_pdpanrlN -STDC_ARGS( ( - HPL_T_panel *, - const int, - const int, - const int, - double * -) ); -void HPL_pdpanrlT -STDC_ARGS( ( - HPL_T_panel *, - const int, - const int, - const int, - double * -) ); - -void HPL_pdrpancrN -STDC_ARGS( ( - HPL_T_panel *, - const int, - const int, - const int, - double * -) ); -void HPL_pdrpancrT -STDC_ARGS( ( - HPL_T_panel *, - const int, - const int, - const int, - double * -) ); -void HPL_pdrpanllN -STDC_ARGS( ( - HPL_T_panel *, - const int, - const int, - const int, - double * -) ); -void HPL_pdrpanllT -STDC_ARGS( ( - HPL_T_panel *, - const int, - const int, - const int, - double * -) ); -void HPL_pdrpanrlN -STDC_ARGS( ( - HPL_T_panel *, - const int, - const int, - const int, - double * -) ); -void HPL_pdrpanrlT -STDC_ARGS( ( - HPL_T_panel *, - const int, - const int, - const int, - double * -) ); - -void HPL_pdfact -STDC_ARGS( ( - HPL_T_panel * -) ); - -#endif -/* - * End of hpl_pfact.h - */ diff --git a/hpl/include/hpl_pgesv.h b/hpl/include/hpl_pgesv.h deleted file mode 100644 index 516218f1d9f52a4f45b05616b87930d6ba8c47fa..0000000000000000000000000000000000000000 --- a/hpl/include/hpl_pgesv.h +++ /dev/null @@ -1,346 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. 
Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - */ -#ifndef HPL_PGESV_H -#define HPL_PGESV_H -/* - * --------------------------------------------------------------------- - * Include files - * --------------------------------------------------------------------- - */ -#include "hpl_misc.h" -#include "hpl_blas.h" -#include "hpl_auxil.h" - -#include "hpl_pmisc.h" -#include "hpl_grid.h" -#include "hpl_comm.h" -#include "hpl_pauxil.h" -#include "hpl_panel.h" -#include "hpl_pfact.h" -/* - * --------------------------------------------------------------------- - * #typedefs and data structures - * --------------------------------------------------------------------- - */ -typedef enum -{ - HPL_SWAP00 = 451, /* Use HPL_pdlaswp00 */ - HPL_SWAP01 = 452, /* Use HPL_pdlaswp01 */ - HPL_SW_MIX = 453, /* Use HPL_pdlaswp00_ for small number of */ - /* columns, and HPL_pdlaswp01_ otherwise. 
*/ - HPL_NO_SWP = 499 -} HPL_T_SWAP; - -typedef struct HPL_S_palg -{ - HPL_T_TOP btopo; /* row broadcast topology */ - int depth; /* look-ahead depth */ - int nbdiv; /* recursive division factor */ - int nbmin; /* recursion stopping criterium */ - HPL_T_FACT pfact; /* panel fact variant */ - HPL_T_FACT rfact; /* recursive fact variant */ - HPL_T_PFA_FUN pffun; /* panel fact function ptr */ - HPL_T_RFA_FUN rffun; /* recursive fact function ptr */ - HPL_T_UPD_FUN upfun; /* update function */ - HPL_T_SWAP fswap; /* Swapping algorithm */ - int fsthr; /* Swapping threshold */ - int equil; /* Equilibration */ - int align; /* data alignment constant */ -} HPL_T_palg; - -typedef struct HPL_S_pmat -{ -#ifdef HPL_CALL_VSIPL - vsip_block_d * block; -#endif - double * A; /* pointer to local piece of A */ - double * X; /* pointer to solution vector */ - int n; /* global problem size */ - int nb; /* blocking factor */ - int ld; /* local leading dimension */ - int mp; /* local number of rows */ - int nq; /* local number of columns */ - int info; /* computational flag */ -} HPL_T_pmat; -/* - * --------------------------------------------------------------------- - * #define macro constants - * --------------------------------------------------------------------- - */ -#define MSGID_BEGIN_PFACT 1001 /* message id ranges */ -#define MSGID_END_PFACT 2000 -#define MSGID_BEGIN_FACT 2001 -#define MSGID_END_FACT 3000 -#define MSGID_BEGIN_PTRSV 3001 -#define MSGID_END_PTRSV 4000 - -#define MSGID_BEGIN_COLL 9001 -#define MSGID_END_COLL 10000 -/* - * --------------------------------------------------------------------- - * #define macros definitions - * --------------------------------------------------------------------- - */ -#define MNxtMgid( id_, beg_, end_ ) \ - (( (id_)+1 > (end_) ? 
(beg_) : (id_)+1 )) -/* - * --------------------------------------------------------------------- - * Function prototypes - * --------------------------------------------------------------------- - */ -void HPL_pipid -STDC_ARGS( ( - HPL_T_panel *, - int *, - int * -) ); -void HPL_plindx0 -STDC_ARGS( ( - HPL_T_panel *, - const int, - int *, - int *, - int *, - int * -) ); -void HPL_pdlaswp00N -STDC_ARGS( ( - HPL_T_panel *, - int *, - HPL_T_panel *, - const int -) ); -void HPL_pdlaswp00T -STDC_ARGS( ( - HPL_T_panel *, - int *, - HPL_T_panel *, - const int -) ); - -void HPL_perm -STDC_ARGS( ( - const int, - int *, - int *, - int * -) ); -void HPL_logsort -STDC_ARGS( ( - const int, - const int, - int *, - int *, - int * -) ); -void HPL_plindx10 -STDC_ARGS( ( - HPL_T_panel *, - const int, - const int *, - int *, - int *, - int * -) ); -void HPL_plindx1 -STDC_ARGS( ( - HPL_T_panel *, - const int, - const int *, - int *, - int *, - int *, - int *, - int *, - int *, - int *, - int * -) ); -void HPL_spreadN -STDC_ARGS( ( - HPL_T_panel *, - int *, - HPL_T_panel *, - const enum HPL_SIDE, - const int, - double *, - const int, - const int, - const int *, - const int *, - const int * -) ); -void HPL_spreadT -STDC_ARGS( ( - HPL_T_panel *, - int *, - HPL_T_panel *, - const enum HPL_SIDE, - const int, - double *, - const int, - const int, - const int *, - const int *, - const int * -) ); -void HPL_equil -STDC_ARGS( ( - HPL_T_panel *, - int *, - HPL_T_panel *, - const enum HPL_TRANS, - const int, - double *, - const int, - int *, - const int *, - const int *, - int * -) ); -void HPL_rollN -STDC_ARGS( ( - HPL_T_panel *, - int *, - HPL_T_panel *, - const int, - double *, - const int, - const int *, - const int *, - const int * -) ); -void HPL_rollT -STDC_ARGS( ( - HPL_T_panel *, - int *, - HPL_T_panel *, - const int, - double *, - const int, - const int *, - const int *, - const int * -) ); -void HPL_pdlaswp01N -STDC_ARGS( ( - HPL_T_panel *, - int *, - HPL_T_panel *, - const int -) ); 
-void HPL_pdlaswp01T -STDC_ARGS( ( - HPL_T_panel *, - int *, - HPL_T_panel *, - const int -) ); - -void HPL_pdupdateNN -STDC_ARGS( ( - HPL_T_panel *, - int *, - HPL_T_panel *, - const int -) ); -void HPL_pdupdateNT -STDC_ARGS( ( - HPL_T_panel *, - int *, - HPL_T_panel *, - const int -) ); -void HPL_pdupdateTN -STDC_ARGS( ( - HPL_T_panel *, - int *, - HPL_T_panel *, - const int -) ); -void HPL_pdupdateTT -STDC_ARGS( ( - HPL_T_panel *, - int *, - HPL_T_panel *, - const int -) ); - -void HPL_pdgesv0 -STDC_ARGS( ( - HPL_T_grid *, - HPL_T_palg *, - HPL_T_pmat * -) ); -void HPL_pdgesvK1 -STDC_ARGS( ( - HPL_T_grid *, - HPL_T_palg *, - HPL_T_pmat * -) ); -void HPL_pdgesvK2 -STDC_ARGS( ( - HPL_T_grid *, - HPL_T_palg *, - HPL_T_pmat * -) ); -void HPL_pdgesv -STDC_ARGS( ( - HPL_T_grid *, - HPL_T_palg *, - HPL_T_pmat * -) ); - -void HPL_pdtrsv -STDC_ARGS( ( - HPL_T_grid *, - HPL_T_pmat * -) ); - -#endif -/* - * End of hpl_pgesv.h - */ diff --git a/hpl/include/hpl_pmatgen.h b/hpl/include/hpl_pmatgen.h deleted file mode 100644 index 4c649deb9729b9b44cfab781f3e0c43dae5ca4b4..0000000000000000000000000000000000000000 --- a/hpl/include/hpl_pmatgen.h +++ /dev/null @@ -1,77 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. 
- * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
- */ -#ifndef HPL_PMATGEN_H -#define HPL_PMATGEN_H -/* - * --------------------------------------------------------------------- - * Include files - * --------------------------------------------------------------------- - */ -#include "hpl_misc.h" -#include "hpl_matgen.h" - -#include "hpl_pmisc.h" -#include "hpl_pauxil.h" -/* - * --------------------------------------------------------------------- - * Function prototypes - * --------------------------------------------------------------------- - */ -void HPL_pdmatgen -STDC_ARGS( ( - const HPL_T_grid *, - const int, - const int, - const int, - double *, - const int, - const int -) ); - -#endif -/* - * End of hpl_pmatgen.h - */ diff --git a/hpl/include/hpl_pmisc.h b/hpl/include/hpl_pmisc.h deleted file mode 100644 index 56ecffd0c3f4a5996107c08a18504503324567c5..0000000000000000000000000000000000000000 --- a/hpl/include/hpl_pmisc.h +++ /dev/null @@ -1,59 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. 
- * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - */ -#ifndef HPL_PMISC_H -#define HPL_PMISC_H -/* - * --------------------------------------------------------------------- - * Include files - * --------------------------------------------------------------------- - */ -#include "hpl_misc.h" -#include "mpi.h" - -#endif -/* - * End of hpl_pmisc.h - */ diff --git a/hpl/include/hpl_ptest.h b/hpl/include/hpl_ptest.h deleted file mode 100644 index 1f41019c24e4e560cc93d646bc1b9778be79b124..0000000000000000000000000000000000000000 --- a/hpl/include/hpl_ptest.h +++ /dev/null @@ -1,151 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. 
Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - */ -#ifndef HPL_PTEST_H -#define HPL_PTEST_H -/* - * --------------------------------------------------------------------- - * Include files - * --------------------------------------------------------------------- - */ -#include "hpl_misc.h" -#include "hpl_blas.h" -#include "hpl_auxil.h" -#include "hpl_gesv.h" - -#include "hpl_pmisc.h" -#include "hpl_pauxil.h" -#include "hpl_panel.h" -#include "hpl_pgesv.h" - -#include "hpl_ptimer.h" -#include "hpl_pmatgen.h" -/* - * --------------------------------------------------------------------- - * Data Structures - * --------------------------------------------------------------------- - */ -typedef struct HPL_S_test -{ - double epsil; /* epsilon machine */ - double thrsh; /* threshold */ - FILE * outfp; /* output stream (only in proc 0) */ - int kfail; /* # of tests failed */ - int kpass; /* # of tests passed */ - int kskip; /* # of tests skipped */ - int ktest; /* total number of tests */ -} HPL_T_test; - -/* - * --------------------------------------------------------------------- - * #define macro constants for testing only - * --------------------------------------------------------------------- - */ -#define HPL_LINE_MAX 256 -#define HPL_MAX_PARAM 20 -#define HPL_ISEED 100 -/* - * --------------------------------------------------------------------- - * global timers for timing analysis only - * --------------------------------------------------------------------- - */ -#ifdef 
HPL_DETAILED_TIMING -#define HPL_TIMING_BEG 11 /* timer 0 reserved, used by main */ -#define HPL_TIMING_N 6 /* number of timers defined below */ -#define HPL_TIMING_RPFACT 11 /* starting from here, contiguous */ -#define HPL_TIMING_PFACT 12 -#define HPL_TIMING_MXSWP 13 -#define HPL_TIMING_UPDATE 14 -#define HPL_TIMING_LASWP 15 -#define HPL_TIMING_PTRSV 16 -#endif -/* - * --------------------------------------------------------------------- - * Function prototypes - * --------------------------------------------------------------------- - */ -void HPL_pdinfo -STDC_ARGS( ( - HPL_T_test *, - int *, - int *, - int *, - int *, - HPL_T_ORDER *, - int *, - int *, - int *, - int *, - HPL_T_FACT *, - int *, - int *, - int *, - int *, - int *, - HPL_T_FACT *, - int *, - HPL_T_TOP *, - int *, - int *, - HPL_T_SWAP *, - int *, - int *, - int *, - int *, - int * -) ); -void HPL_pdtest -STDC_ARGS( ( - HPL_T_test *, - HPL_T_grid *, - HPL_T_palg *, - const int, - const int -) ); - -#endif -/* - * End of hpl_ptest.h - */ diff --git a/hpl/include/hpl_ptimer.h b/hpl/include/hpl_ptimer.h deleted file mode 100644 index 2f1bec9a22de3c43325665c78c524c2ba287a1af..0000000000000000000000000000000000000000 --- a/hpl/include/hpl_ptimer.h +++ /dev/null @@ -1,96 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. 
Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
- */ -#ifndef HPL_PTIMER_H -#define HPL_PTIMER_H -/* - * --------------------------------------------------------------------- - * Include files - * --------------------------------------------------------------------- - */ -#include "hpl_pmisc.h" -/* - * --------------------------------------------------------------------- - * #define macro constants - * --------------------------------------------------------------------- - */ -#define HPL_NPTIMER 64 -#define HPL_PTIMER_STARTFLAG 5.0 -#define HPL_PTIMER_ERROR -1.0 -/* - * --------------------------------------------------------------------- - * type definitions - * --------------------------------------------------------------------- - */ -typedef enum -{ HPL_WALL_PTIME = 101, HPL_CPU_PTIME = 102 } HPL_T_PTIME; - -typedef enum -{ HPL_AMAX_PTIME = 201, HPL_AMIN_PTIME = 202, HPL_SUM_PTIME = 203 } -HPL_T_PTIME_OP; -/* - * --------------------------------------------------------------------- - * Function prototypes - * --------------------------------------------------------------------- - */ -double HPL_ptimer_cputime STDC_ARGS( ( void ) ); -double HPL_ptimer_walltime STDC_ARGS( ( void ) ); - -void HPL_ptimer STDC_ARGS( ( const int ) ); -void HPL_ptimer_boot STDC_ARGS( ( void ) ); -void HPL_ptimer_combine -STDC_ARGS( -( MPI_Comm comm, const HPL_T_PTIME_OP, const HPL_T_PTIME, - const int, const int, double * ) ); -void HPL_ptimer_disable STDC_ARGS( ( void ) ); -void HPL_ptimer_enable STDC_ARGS( ( void ) ); -double HPL_ptimer_inquire -STDC_ARGS( -( const HPL_T_PTIME, const int ) ); - -#endif -/* - * End of hpl_ptimer.h - */ diff --git a/hpl/include/hpl_test.h b/hpl/include/hpl_test.h deleted file mode 100644 index a2f4a42d22a28cff08f37a2221f9cf872f1450cb..0000000000000000000000000000000000000000 --- a/hpl/include/hpl_test.h +++ /dev/null @@ -1,80 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. 
Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - */ -#ifndef HPL_TEST_H -#define HPL_TEST_H -/* - * --------------------------------------------------------------------- - * Include files - * --------------------------------------------------------------------- - */ -#include "hpl_misc.h" -#include "hpl_blas.h" -#include "hpl_auxil.h" -#include "hpl_gesv.h" - -#include "hpl_matgen.h" -#include "hpl_timer.h" -/* - * --------------------------------------------------------------------- - * Function prototypes - * --------------------------------------------------------------------- - */ -void HPL_dinfo -STDC_ARGS( -( FILE * *, int *, int *, int *, - HPL_T_FACT *, int *, int *, int *, - int *, int *, HPL_T_FACT *, int *, - double *, double * ) ); -void HPL_dtest -STDC_ARGS( -( FILE *, const int, const int, const int, - HPL_T_FACT, HPL_T_FACT, const int, const double, - const double, int *, int *, int * ) ); - -#endif -/* - * End of hpl_test.h - */ diff --git a/hpl/include/hpl_timer.h b/hpl/include/hpl_timer.h deleted file mode 100644 index 97d228b513917f409742b346cee2af5fc8318f77..0000000000000000000000000000000000000000 --- a/hpl/include/hpl_timer.h +++ /dev/null @@ -1,88 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. 
Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - */ -#ifndef HPL_TIMER_H -#define HPL_TIMER_H -/* - * --------------------------------------------------------------------- - * Include files - * --------------------------------------------------------------------- - */ -#include "hpl_misc.h" -/* - * --------------------------------------------------------------------- - * #define macro constants - * --------------------------------------------------------------------- - */ -#define HPL_NTIMER 64 -#define HPL_TIMER_STARTFLAG 5.0 -#define HPL_TIMER_ERROR -1.0 -/* - * --------------------------------------------------------------------- - * type definitions - * --------------------------------------------------------------------- - */ -typedef enum -{ HPL_WALL_TIME = 101, HPL_CPU_TIME = 102 } HPL_T_TIME; -/* - * --------------------------------------------------------------------- - * Function prototypes - * --------------------------------------------------------------------- - */ -double HPL_timer_cputime STDC_ARGS( ( void ) ); -double HPL_timer_walltime STDC_ARGS( ( void ) ); - -void HPL_timer STDC_ARGS( ( const int ) ); -void HPL_timer_boot STDC_ARGS( ( void ) ); -void HPL_timer_enable STDC_ARGS( ( void ) ); -void HPL_timer_disable STDC_ARGS( ( void ) ); -double HPL_timer_inquire -STDC_ARGS( -( const HPL_T_TIME, const int ) ); - -#endif -/* - * End of hpl_timer.h - */ diff --git a/hpl/include/hpl_units.h b/hpl/include/hpl_units.h deleted file mode 100644 index 
0a15610d52732d516dcb4147cf88a7cb2e36bb80..0000000000000000000000000000000000000000 --- a/hpl/include/hpl_units.h +++ /dev/null @@ -1,135 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - */ -#ifndef HPL_UNITS_H -#define HPL_UNITS_H -/* - * --------------------------------------------------------------------- - * Include files - * --------------------------------------------------------------------- - */ -#include "hpl_pmisc.h" -#include "hpl_pauxil.h" -/* - * --------------------------------------------------------------------- - * #define macro constants - * --------------------------------------------------------------------- - */ -#define HPL_MAXROUT 50 -#define HPL_MAXRNAME 15 - -#define HPL_TRUE 'T' -#define HPL_FALSE 'F' - -#define HPL_INDXG2P_ROUT "HPL_indxg2p" -#define HPL_INDXG2L_ROUT "HPL_indxg2l" -#define HPL_INDXL2G_ROUT "HPL_indxl2g" -#define HPL_NUMROC_ROUT "HPL_numroc" -#define HPL_NUMROCI_ROUT "HPL_numrocI" -/* - * --------------------------------------------------------------------- - * Function prototypes - * --------------------------------------------------------------------- - */ -void HPL_unit_info -STDC_ARGS( -( FILE * *, int *, int *, int *, - int *, int *, int *, int *, - int *, int *, int *, char [][HPL_MAXRNAME], - int [] ) ); - -void HPL_unit_indxg2l -STDC_ARGS( -( FILE *, const int, const int, const int, - const int, const int, const int, const int, - const int, long *, long * ) ); -int HPL_chek_indxg2l -STDC_ARGS( -( FILE *, const char *, const int, const int, - const int, const int, const int, const int, - const int, long *, long * ) ); - -void HPL_unit_indxl2g -STDC_ARGS( -( FILE *, const 
int, const int, const int, - const int, const int, const int, const int, - const int, long *, long * ) ); -int HPL_chek_indxl2g -STDC_ARGS( -( FILE *, const char *, const int, const int, - const int, const int, const int, const int, - const int, long *, long * ) ); - -void HPL_unit_indxg2p -STDC_ARGS( -( FILE *, const int, const int, const int, - const int, const int, const int, const int, - const int, long *, long * ) ); -int HPL_chek_indxg2p -STDC_ARGS( -( FILE *, const char *, const int, const int, - const int, const int, const int, const int, - const int, long *, long * ) ); - -void HPL_unit_numroc -STDC_ARGS( -( FILE *, const int, const int, const int, - const int, const int, const int, const int, - const int, long *, long * ) ); -void HPL_unit_numrocI -STDC_ARGS( -( FILE *, const int, const int, const int, - const int, const int, const int, const int, - const int, const int, long *, long * ) ); -int HPL_chek_numrocI -STDC_ARGS( -( FILE *, const char *, const int, const int, - const int, const int, const int, const int, - const int, const int, long *, long * ) ); - -#endif -/* - * End of hpl_units.h - */ diff --git a/hpl/makes/Make.auxil b/hpl/makes/Make.auxil deleted file mode 100644 index bb6b1cd68995bfb5e66b9440341ce02315934d94..0000000000000000000000000000000000000000 --- a/hpl/makes/Make.auxil +++ /dev/null @@ -1,100 +0,0 @@ -# -# -- High Performance Computing Linpack Benchmark (HPL) -# HPL - 2.1 - October 26, 2012 -# Antoine P. Petitet -# University of Tennessee, Knoxville -# Innovative Computing Laboratory -# (C) Copyright 2000-2008 All Rights Reserved -# -# -- Copyright notice and Licensing terms: -# -# Redistribution and use in source and binary forms, with or without -# modification, are permitted provided that the following conditions -# are met: -# -# 1. Redistributions of source code must retain the above copyright -# notice, this list of conditions and the following disclaimer. -# -# 2. 
Redistributions in binary form must reproduce the above copyright -# notice, this list of conditions, and the following disclaimer in the -# documentation and/or other materials provided with the distribution. -# -# 3. All advertising materials mentioning features or use of this -# software must display the following acknowledgement: -# This product includes software developed at the University of -# Tennessee, Knoxville, Innovative Computing Laboratory. -# -# 4. The name of the University, the name of the Laboratory, or the -# names of its contributors may not be used to endorse or promote -# products derived from this software without specific written -# permission. -# -# -- Disclaimer: -# -# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS -# ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR -# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY -# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, -# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT -# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, -# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY -# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT -# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
-# ###################################################################### -# -include Make.inc -# -# ###################################################################### -# -INCdep = \ - $(INCdir)/hpl_misc.h $(INCdir)/hpl_blas.h $(INCdir)/hpl_auxil.h -# -## Object files ######################################################## -# -HPL_au0obj = \ - HPL_dlacpy.o HPL_dlatcpy.o HPL_fprintf.o \ - HPL_warn.o HPL_abort.o HPL_dlaprnt.o \ - HPL_dlange.o -HPL_au1obj = \ - HPL_dlamch.o -HPL_auxobj = \ - $(HPL_au0obj) $(HPL_au1obj) -# -## Targets ############################################################# -# -all : lib -# -lib : lib.grd -# -lib.grd : $(HPL_auxobj) - $(ARCHIVER) $(ARFLAGS) $(HPLlib) $(HPL_auxobj) - $(RANLIB) $(HPLlib) - $(TOUCH) lib.grd -# -# ###################################################################### -# -HPL_dlacpy.o : ../HPL_dlacpy.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_dlacpy.c -HPL_dlatcpy.o : ../HPL_dlatcpy.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_dlatcpy.c -HPL_fprintf.o : ../HPL_fprintf.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_fprintf.c -HPL_warn.o : ../HPL_warn.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_warn.c -HPL_abort.o : ../HPL_abort.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_abort.c -HPL_dlaprnt.o : ../HPL_dlaprnt.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_dlaprnt.c -HPL_dlange.o : ../HPL_dlange.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_dlange.c -HPL_dlamch.o : ../HPL_dlamch.c $(INCdep) - $(CC) -o $@ -c $(CCNOOPT) ../HPL_dlamch.c -# -# ###################################################################### -# -clean : - $(RM) *.o *.grd -# -# ###################################################################### diff --git a/hpl/makes/Make.blas b/hpl/makes/Make.blas deleted file mode 100644 index e0dce425071d4277a56c6087b9fb344455e7284f..0000000000000000000000000000000000000000 --- a/hpl/makes/Make.blas +++ /dev/null @@ -1,98 +0,0 @@ -# -# -- High Performance Computing Linpack Benchmark (HPL) -# HPL 
- 2.1 - October 26, 2012 -# Antoine P. Petitet -# University of Tennessee, Knoxville -# Innovative Computing Laboratory -# (C) Copyright 2000-2008 All Rights Reserved -# -# -- Copyright notice and Licensing terms: -# -# Redistribution and use in source and binary forms, with or without -# modification, are permitted provided that the following conditions -# are met: -# -# 1. Redistributions of source code must retain the above copyright -# notice, this list of conditions and the following disclaimer. -# -# 2. Redistributions in binary form must reproduce the above copyright -# notice, this list of conditions, and the following disclaimer in the -# documentation and/or other materials provided with the distribution. -# -# 3. All advertising materials mentioning features or use of this -# software must display the following acknowledgement: -# This product includes software developed at the University of -# Tennessee, Knoxville, Innovative Computing Laboratory. -# -# 4. The name of the University, the name of the Laboratory, or the -# names of its contributors may not be used to endorse or promote -# products derived from this software without specific written -# permission. -# -# -- Disclaimer: -# -# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS -# ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR -# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY -# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, -# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT -# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, -# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY -# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT -# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
-# ###################################################################### -# -include Make.inc -# -# ###################################################################### -# -INCdep = \ - $(INCdir)/hpl_misc.h $(INCdir)/hpl_blas.h -# -## Object files ######################################################## -# -HPL_blaobj = \ - HPL_dcopy.o HPL_daxpy.o HPL_dscal.o \ - HPL_idamax.o HPL_dgemv.o HPL_dtrsv.o \ - HPL_dger.o HPL_dgemm.o HPL_dtrsm.o -# -## Targets ############################################################# -# -all : lib -# -lib : lib.grd -# -lib.grd : $(HPL_blaobj) - $(ARCHIVER) $(ARFLAGS) $(HPLlib) $(HPL_blaobj) - $(RANLIB) $(HPLlib) - $(TOUCH) lib.grd -# -# ###################################################################### -# -HPL_dcopy.o : ../HPL_dcopy.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_dcopy.c -HPL_daxpy.o : ../HPL_daxpy.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_daxpy.c -HPL_dscal.o : ../HPL_dscal.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_dscal.c -HPL_idamax.o : ../HPL_idamax.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_idamax.c -HPL_dgemv.o : ../HPL_dgemv.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_dgemv.c -HPL_dtrsv.o : ../HPL_dtrsv.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_dtrsv.c -HPL_dger.o : ../HPL_dger.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_dger.c -HPL_dgemm.o : ../HPL_dgemm.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_dgemm.c -HPL_dtrsm.o : ../HPL_dtrsm.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_dtrsm.c -# -# ###################################################################### -# -clean : - $(RM) *.o *.grd -# -# ###################################################################### diff --git a/hpl/makes/Make.comm b/hpl/makes/Make.comm deleted file mode 100644 index bef4904ac620dd9e2779f23d2d981a97a083ce24..0000000000000000000000000000000000000000 --- a/hpl/makes/Make.comm +++ /dev/null @@ -1,111 +0,0 @@ -# -# -- High Performance Computing Linpack Benchmark (HPL) -# HPL - 2.1 - October 26, 
2012 -# Antoine P. Petitet -# University of Tennessee, Knoxville -# Innovative Computing Laboratory -# (C) Copyright 2000-2008 All Rights Reserved -# -# -- Copyright notice and Licensing terms: -# -# Redistribution and use in source and binary forms, with or without -# modification, are permitted provided that the following conditions -# are met: -# -# 1. Redistributions of source code must retain the above copyright -# notice, this list of conditions and the following disclaimer. -# -# 2. Redistributions in binary form must reproduce the above copyright -# notice, this list of conditions, and the following disclaimer in the -# documentation and/or other materials provided with the distribution. -# -# 3. All advertising materials mentioning features or use of this -# software must display the following acknowledgement: -# This product includes software developed at the University of -# Tennessee, Knoxville, Innovative Computing Laboratory. -# -# 4. The name of the University, the name of the Laboratory, or the -# names of its contributors may not be used to endorse or promote -# products derived from this software without specific written -# permission. -# -# -- Disclaimer: -# -# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS -# ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR -# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY -# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, -# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT -# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, -# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY -# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT -# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
-# ######################################################################
-#
-include Make.inc
-#
-# ######################################################################
-#
-INCdep = \
-   $(INCdir)/hpl_misc.h $(INCdir)/hpl_pmisc.h $(INCdir)/hpl_grid.h \
-   $(INCdir)/hpl_panel.h $(INCdir)/hpl_pgesv.h
-#
-## Object files ########################################################
-#
-HPL_comobj = \
-   HPL_1ring.o HPL_1rinM.o HPL_2ring.o \
-   HPL_2rinM.o HPL_blong.o HPL_blonM.o \
-   HPL_packL.o HPL_copyL.o HPL_binit.o \
-   HPL_bcast.o HPL_bwait.o HPL_send.o \
-   HPL_recv.o HPL_sdrv.o
-#
-## Targets #############################################################
-#
-all : lib
-#
-lib : lib.grd
-#
-lib.grd : $(HPL_comobj)
-	$(ARCHIVER) $(ARFLAGS) $(HPLlib) $(HPL_comobj)
-	$(RANLIB) $(HPLlib)
-	$(TOUCH) lib.grd
-#
-# ######################################################################
-#
-HPL_1ring.o : ../HPL_1ring.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_1ring.c
-HPL_1rinM.o : ../HPL_1rinM.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_1rinM.c
-HPL_2ring.o : ../HPL_2ring.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_2ring.c
-HPL_2rinM.o : ../HPL_2rinM.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_2rinM.c
-HPL_blong.o : ../HPL_blong.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_blong.c
-HPL_blonM.o : ../HPL_blonM.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_blonM.c
-HPL_packL.o : ../HPL_packL.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_packL.c
-HPL_copyL.o : ../HPL_copyL.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_copyL.c
-HPL_binit.o : ../HPL_binit.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_binit.c
-HPL_bcast.o : ../HPL_bcast.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_bcast.c
-HPL_bwait.o : ../HPL_bwait.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_bwait.c
-HPL_send.o : ../HPL_send.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_send.c
-HPL_recv.o : ../HPL_recv.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_recv.c
-HPL_sdrv.o : ../HPL_sdrv.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_sdrv.c
-#
-# ######################################################################
-#
-clean :
-	$(RM) *.o *.grd
-#
-# ######################################################################
diff --git a/hpl/makes/Make.gesv b/hpl/makes/Make.gesv
deleted file mode 100644
index 82b4bb747a8f4403a00d7e33c7379eb453ad8ee2..0000000000000000000000000000000000000000
--- a/hpl/makes/Make.gesv
+++ /dev/null
@@ -1,83 +0,0 @@
-#
-# -- High Performance Computing Linpack Benchmark (HPL)
-# HPL - 2.1 - October 26, 2012
-# Antoine P. Petitet
-# University of Tennessee, Knoxville
-# Innovative Computing Laboratory
-# (C) Copyright 2000-2008 All Rights Reserved
-#
-# -- Copyright notice and Licensing terms:
-#
-# Redistribution and use in source and binary forms, with or without
-# modification, are permitted provided that the following conditions
-# are met:
-#
-# 1. Redistributions of source code must retain the above copyright
-# notice, this list of conditions and the following disclaimer.
-#
-# 2. Redistributions in binary form must reproduce the above copyright
-# notice, this list of conditions, and the following disclaimer in the
-# documentation and/or other materials provided with the distribution.
-#
-# 3. All advertising materials mentioning features or use of this
-# software must display the following acknowledgement:
-# This product includes software developed at the University of
-# Tennessee, Knoxville, Innovative Computing Laboratory.
-#
-# 4. The name of the University, the name of the Laboratory, or the
-# names of its contributors may not be used to endorse or promote
-# products derived from this software without specific written
-# permission.
-#
-# -- Disclaimer:
-#
-# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
-# ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
-# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
-# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY
-# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
-# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
-# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
-# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
-# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
-# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
-# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-# ######################################################################
-#
-include Make.inc
-#
-# ######################################################################
-#
-INCdep = \
-   $(INCdir)/hpl_misc.h $(INCdir)/hpl_blas.h $(INCdir)/hpl_auxil.h \
-   $(INCdir)/hpl_gesv.h
-#
-## Object files ########################################################
-#
-HPL_gesobj = \
-   HPL_dgesv.o HPL_ipid.o
-#
-## Targets #############################################################
-#
-all : lib
-#
-lib : lib.grd
-#
-lib.grd : $(HPL_gesobj)
-	$(ARCHIVER) $(ARFLAGS) $(HPLlib) $(HPL_gesobj)
-	$(RANLIB) $(HPLlib)
-	$(TOUCH) lib.grd
-#
-# ######################################################################
-#
-HPL_dgesv.o : ../HPL_dgesv.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_dgesv.c
-HPL_ipid.o : ../HPL_ipid.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_ipid.c
-#
-# ######################################################################
-#
-clean :
-	$(RM) *.o *.grd
-#
-# ######################################################################
diff --git a/hpl/makes/Make.grid b/hpl/makes/Make.grid
deleted file mode 100644
index 4a493f49d7f6c84552fd2112ad95fc1f4e0924b3..0000000000000000000000000000000000000000
--- a/hpl/makes/Make.grid
+++ /dev/null
@@ -1,103 +0,0 @@
-#
-# -- High Performance Computing Linpack Benchmark (HPL)
-# HPL - 2.1 - October 26, 2012
-# Antoine P. Petitet
-# University of Tennessee, Knoxville
-# Innovative Computing Laboratory
-# (C) Copyright 2000-2008 All Rights Reserved
-#
-# -- Copyright notice and Licensing terms:
-#
-# Redistribution and use in source and binary forms, with or without
-# modification, are permitted provided that the following conditions
-# are met:
-#
-# 1. Redistributions of source code must retain the above copyright
-# notice, this list of conditions and the following disclaimer.
-#
-# 2. Redistributions in binary form must reproduce the above copyright
-# notice, this list of conditions, and the following disclaimer in the
-# documentation and/or other materials provided with the distribution.
-#
-# 3. All advertising materials mentioning features or use of this
-# software must display the following acknowledgement:
-# This product includes software developed at the University of
-# Tennessee, Knoxville, Innovative Computing Laboratory.
-#
-# 4. The name of the University, the name of the Laboratory, or the
-# names of its contributors may not be used to endorse or promote
-# products derived from this software without specific written
-# permission.
-#
-# -- Disclaimer:
-#
-# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
-# ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
-# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
-# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY
-# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
-# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
-# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
-# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
-# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
-# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
-# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-# ######################################################################
-#
-include Make.inc
-#
-# ######################################################################
-#
-INCdep = \
-   $(INCdir)/hpl_misc.h $(INCdir)/hpl_pmisc.h $(INCdir)/hpl_grid.h
-#
-## Object files ########################################################
-#
-HPL_griobj = \
-   HPL_grid_init.o HPL_pnum.o HPL_grid_info.o \
-   HPL_grid_exit.o HPL_broadcast.o HPL_reduce.o \
-   HPL_all_reduce.o HPL_barrier.o HPL_min.o \
-   HPL_max.o HPL_sum.o
-#
-## Targets #############################################################
-#
-all : lib
-#
-lib : lib.grd
-#
-lib.grd : $(HPL_griobj)
-	$(ARCHIVER) $(ARFLAGS) $(HPLlib) $(HPL_griobj)
-	$(RANLIB) $(HPLlib)
-	$(TOUCH) lib.grd
-#
-# ######################################################################
-#
-HPL_grid_init.o : ../HPL_grid_init.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_grid_init.c
-HPL_pnum.o : ../HPL_pnum.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_pnum.c
-HPL_grid_info.o : ../HPL_grid_info.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_grid_info.c
-HPL_grid_exit.o : ../HPL_grid_exit.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_grid_exit.c
-HPL_broadcast.o : ../HPL_broadcast.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_broadcast.c
-HPL_reduce.o : ../HPL_reduce.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_reduce.c
-HPL_all_reduce.o : ../HPL_all_reduce.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_all_reduce.c
-HPL_barrier.o : ../HPL_barrier.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_barrier.c
-HPL_min.o : ../HPL_min.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_min.c
-HPL_max.o : ../HPL_max.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_max.c
-HPL_sum.o : ../HPL_sum.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_sum.c
-#
-# ######################################################################
-#
-clean :
-	$(RM) *.o *.grd
-#
-# ######################################################################
diff --git a/hpl/makes/Make.matgen b/hpl/makes/Make.matgen
deleted file mode 100644
index 0aaac9b3409800f89ae473f651239d923ac621d4..0000000000000000000000000000000000000000
--- a/hpl/makes/Make.matgen
+++ /dev/null
@@ -1,95 +0,0 @@
-#
-# -- High Performance Computing Linpack Benchmark (HPL)
-# HPL - 2.1 - October 26, 2012
-# Antoine P. Petitet
-# University of Tennessee, Knoxville
-# Innovative Computing Laboratory
-# (C) Copyright 2000-2008 All Rights Reserved
-#
-# -- Copyright notice and Licensing terms:
-#
-# Redistribution and use in source and binary forms, with or without
-# modification, are permitted provided that the following conditions
-# are met:
-#
-# 1. Redistributions of source code must retain the above copyright
-# notice, this list of conditions and the following disclaimer.
-#
-# 2. Redistributions in binary form must reproduce the above copyright
-# notice, this list of conditions, and the following disclaimer in the
-# documentation and/or other materials provided with the distribution.
-#
-# 3. All advertising materials mentioning features or use of this
-# software must display the following acknowledgement:
-# This product includes software developed at the University of
-# Tennessee, Knoxville, Innovative Computing Laboratory.
-#
-# 4. The name of the University, the name of the Laboratory, or the
-# names of its contributors may not be used to endorse or promote
-# products derived from this software without specific written
-# permission.
-#
-# -- Disclaimer:
-#
-# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
-# ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
-# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
-# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY
-# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
-# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
-# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
-# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
-# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
-# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
-# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-# ######################################################################
-#
-include Make.inc
-#
-# ######################################################################
-#
-INCdep = \
-   $(INCdir)/hpl_misc.h $(INCdir)/hpl_blas.h $(INCdir)/hpl_auxil.h \
-   $(INCdir)/hpl_matgen.h
-#
-## Object files ########################################################
-#
-HPL_matobj = \
-   HPL_dmatgen.o HPL_ladd.o HPL_lmul.o \
-   HPL_xjumpm.o HPL_jumpit.o HPL_rand.o \
-   HPL_setran.o
-#
-## Targets #############################################################
-#
-all : lib
-#
-lib : lib.grd
-#
-lib.grd : $(HPL_matobj)
-	$(ARCHIVER) $(ARFLAGS) $(HPLlib) $(HPL_matobj)
-	$(RANLIB) $(HPLlib)
-	$(TOUCH) lib.grd
-#
-# ######################################################################
-#
-HPL_dmatgen.o : ../HPL_dmatgen.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_dmatgen.c
-HPL_ladd.o : ../HPL_ladd.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_ladd.c
-HPL_lmul.o : ../HPL_lmul.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_lmul.c
-HPL_xjumpm.o : ../HPL_xjumpm.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_xjumpm.c
-HPL_jumpit.o : ../HPL_jumpit.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_jumpit.c
-HPL_rand.o : ../HPL_rand.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_rand.c
-HPL_setran.o : ../HPL_setran.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_setran.c
-#
-# ######################################################################
-#
-clean :
-	$(RM) *.o *.grd
-#
-# ######################################################################
diff --git a/hpl/makes/Make.panel b/hpl/makes/Make.panel
deleted file mode 100644
index 92a4fa5fa87057d2afa92f6981570e0ec90afa61..0000000000000000000000000000000000000000
--- a/hpl/makes/Make.panel
+++ /dev/null
@@ -1,90 +0,0 @@
-#
-# -- High Performance Computing Linpack Benchmark (HPL)
-# HPL - 2.1 - October 26, 2012
-# Antoine P. Petitet
-# University of Tennessee, Knoxville
-# Innovative Computing Laboratory
-# (C) Copyright 2000-2008 All Rights Reserved
-#
-# -- Copyright notice and Licensing terms:
-#
-# Redistribution and use in source and binary forms, with or without
-# modification, are permitted provided that the following conditions
-# are met:
-#
-# 1. Redistributions of source code must retain the above copyright
-# notice, this list of conditions and the following disclaimer.
-#
-# 2. Redistributions in binary form must reproduce the above copyright
-# notice, this list of conditions, and the following disclaimer in the
-# documentation and/or other materials provided with the distribution.
-#
-# 3. All advertising materials mentioning features or use of this
-# software must display the following acknowledgement:
-# This product includes software developed at the University of
-# Tennessee, Knoxville, Innovative Computing Laboratory.
-#
-# 4. The name of the University, the name of the Laboratory, or the
-# names of its contributors may not be used to endorse or promote
-# products derived from this software without specific written
-# permission.
-#
-# -- Disclaimer:
-#
-# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
-# ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
-# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
-# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY
-# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
-# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
-# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
-# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
-# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
-# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
-# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-# ######################################################################
-#
-include Make.inc
-#
-# ######################################################################
-#
-INCdep = \
-   $(INCdir)/hpl_misc.h $(INCdir)/hpl_blas.h $(INCdir)/hpl_auxil.h \
-   $(INCdir)/hpl_pmisc.h $(INCdir)/hpl_grid.h $(INCdir)/hpl_comm.h \
-   $(INCdir)/hpl_pauxil.h $(INCdir)/hpl_panel.h $(INCdir)/hpl_pfact.h \
-   $(INCdir)/hpl_pgesv.h
-#
-## Object files ########################################################
-#
-HPL_panobj = \
-   HPL_pdpanel_new.o HPL_pdpanel_init.o HPL_pdpanel_disp.o \
-   HPL_pdpanel_free.o
-#
-## Targets #############################################################
-#
-all : lib
-#
-lib : lib.grd
-#
-lib.grd : $(HPL_panobj)
-	$(ARCHIVER) $(ARFLAGS) $(HPLlib) $(HPL_panobj)
-	$(RANLIB) $(HPLlib)
-	$(TOUCH) lib.grd
-#
-# ######################################################################
-#
-HPL_pdpanel_new.o : ../HPL_pdpanel_new.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_pdpanel_new.c
-HPL_pdpanel_init.o : ../HPL_pdpanel_init.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_pdpanel_init.c
-HPL_pdpanel_disp.o : ../HPL_pdpanel_disp.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_pdpanel_disp.c
-HPL_pdpanel_free.o : ../HPL_pdpanel_free.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_pdpanel_free.c
-#
-# ######################################################################
-#
-clean :
-	$(RM) *.o *.grd
-#
-# ######################################################################
diff --git a/hpl/makes/Make.pauxil b/hpl/makes/Make.pauxil
deleted file mode 100644
index 69bb9125dcd435c7f087dd0ffb619377879b52b6..0000000000000000000000000000000000000000
--- a/hpl/makes/Make.pauxil
+++ /dev/null
@@ -1,137 +0,0 @@
-#
-# -- High Performance Computing Linpack Benchmark (HPL)
-# HPL - 2.1 - October 26, 2012
-# Antoine P. Petitet
-# University of Tennessee, Knoxville
-# Innovative Computing Laboratory
-# (C) Copyright 2000-2008 All Rights Reserved
-#
-# -- Copyright notice and Licensing terms:
-#
-# Redistribution and use in source and binary forms, with or without
-# modification, are permitted provided that the following conditions
-# are met:
-#
-# 1. Redistributions of source code must retain the above copyright
-# notice, this list of conditions and the following disclaimer.
-#
-# 2. Redistributions in binary form must reproduce the above copyright
-# notice, this list of conditions, and the following disclaimer in the
-# documentation and/or other materials provided with the distribution.
-#
-# 3. All advertising materials mentioning features or use of this
-# software must display the following acknowledgement:
-# This product includes software developed at the University of
-# Tennessee, Knoxville, Innovative Computing Laboratory.
-#
-# 4. The name of the University, the name of the Laboratory, or the
-# names of its contributors may not be used to endorse or promote
-# products derived from this software without specific written
-# permission.
-#
-# -- Disclaimer:
-#
-# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
-# ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
-# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
-# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY
-# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
-# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
-# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
-# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
-# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
-# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
-# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-# ######################################################################
-#
-include Make.inc
-#
-# ######################################################################
-#
-INCdep = \
-   $(INCdir)/hpl_misc.h $(INCdir)/hpl_blas.h $(INCdir)/hpl_auxil.h \
-   $(INCdir)/hpl_pmisc.h $(INCdir)/hpl_grid.h $(INCdir)/hpl_pauxil.h
-#
-## Object files ########################################################
-#
-HPL_pauobj = \
-   HPL_indxg2l.o HPL_indxg2lp.o HPL_indxg2p.o \
-   HPL_indxl2g.o HPL_infog2l.o HPL_numroc.o \
-   HPL_numrocI.o HPL_dlaswp00N.o HPL_dlaswp10N.o \
-   HPL_dlaswp01N.o HPL_dlaswp01T.o HPL_dlaswp02N.o \
-   HPL_dlaswp03N.o HPL_dlaswp03T.o HPL_dlaswp04N.o \
-   HPL_dlaswp04T.o HPL_dlaswp05N.o HPL_dlaswp05T.o \
-   HPL_dlaswp06N.o HPL_dlaswp06T.o HPL_pwarn.o \
-   HPL_pabort.o HPL_pdlaprnt.o HPL_pdlamch.o \
-   HPL_pdlange.o
-#
-## Targets #############################################################
-#
-all : lib
-#
-lib : lib.grd
-#
-lib.grd : $(HPL_pauobj)
-	$(ARCHIVER) $(ARFLAGS) $(HPLlib) $(HPL_pauobj)
-	$(RANLIB) $(HPLlib)
-	$(TOUCH) lib.grd
-#
-# ######################################################################
-#
-HPL_indxg2l.o : ../HPL_indxg2l.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_indxg2l.c
-HPL_indxg2lp.o : ../HPL_indxg2lp.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_indxg2lp.c
-HPL_indxg2p.o : ../HPL_indxg2p.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_indxg2p.c
-HPL_indxl2g.o : ../HPL_indxl2g.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_indxl2g.c
-HPL_infog2l.o : ../HPL_infog2l.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_infog2l.c
-HPL_numroc.o : ../HPL_numroc.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_numroc.c
-HPL_numrocI.o : ../HPL_numrocI.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_numrocI.c
-HPL_dlaswp00N.o : ../HPL_dlaswp00N.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_dlaswp00N.c
-HPL_dlaswp10N.o : ../HPL_dlaswp10N.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_dlaswp10N.c
-HPL_dlaswp01N.o : ../HPL_dlaswp01N.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_dlaswp01N.c
-HPL_dlaswp01T.o : ../HPL_dlaswp01T.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_dlaswp01T.c
-HPL_dlaswp02N.o : ../HPL_dlaswp02N.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_dlaswp02N.c
-HPL_dlaswp03N.o : ../HPL_dlaswp03N.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_dlaswp03N.c
-HPL_dlaswp03T.o : ../HPL_dlaswp03T.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_dlaswp03T.c
-HPL_dlaswp04N.o : ../HPL_dlaswp04N.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_dlaswp04N.c
-HPL_dlaswp04T.o : ../HPL_dlaswp04T.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_dlaswp04T.c
-HPL_dlaswp05N.o : ../HPL_dlaswp05N.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_dlaswp05N.c
-HPL_dlaswp05T.o : ../HPL_dlaswp05T.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_dlaswp05T.c
-HPL_dlaswp06N.o : ../HPL_dlaswp06N.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_dlaswp06N.c
-HPL_dlaswp06T.o : ../HPL_dlaswp06T.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_dlaswp06T.c
-HPL_pwarn.o : ../HPL_pwarn.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_pwarn.c
-HPL_pabort.o : ../HPL_pabort.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_pabort.c
-HPL_pdlaprnt.o : ../HPL_pdlaprnt.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_pdlaprnt.c
-HPL_pdlamch.o : ../HPL_pdlamch.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_pdlamch.c
-HPL_pdlange.o : ../HPL_pdlange.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_pdlange.c
-#
-# ######################################################################
-#
-clean :
-	$(RM) *.o *.grd
-#
-# ######################################################################
diff --git a/hpl/makes/Make.pfact b/hpl/makes/Make.pfact
deleted file mode 100644
index 8d3f5e0847dcc94b745aa52b8a5724acc38d575a..0000000000000000000000000000000000000000
--- a/hpl/makes/Make.pfact
+++ /dev/null
@@ -1,118 +0,0 @@
-#
-# -- High Performance Computing Linpack Benchmark (HPL)
-# HPL - 2.1 - October 26, 2012
-# Antoine P. Petitet
-# University of Tennessee, Knoxville
-# Innovative Computing Laboratory
-# (C) Copyright 2000-2008 All Rights Reserved
-#
-# -- Copyright notice and Licensing terms:
-#
-# Redistribution and use in source and binary forms, with or without
-# modification, are permitted provided that the following conditions
-# are met:
-#
-# 1. Redistributions of source code must retain the above copyright
-# notice, this list of conditions and the following disclaimer.
-#
-# 2. Redistributions in binary form must reproduce the above copyright
-# notice, this list of conditions, and the following disclaimer in the
-# documentation and/or other materials provided with the distribution.
-#
-# 3. All advertising materials mentioning features or use of this
-# software must display the following acknowledgement:
-# This product includes software developed at the University of
-# Tennessee, Knoxville, Innovative Computing Laboratory.
-#
-# 4. The name of the University, the name of the Laboratory, or the
-# names of its contributors may not be used to endorse or promote
-# products derived from this software without specific written
-# permission.
-#
-# -- Disclaimer:
-#
-# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
-# ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
-# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
-# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY
-# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
-# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
-# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
-# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
-# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
-# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
-# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-# ######################################################################
-#
-include Make.inc
-#
-# ######################################################################
-#
-INCdep = \
-   $(INCdir)/hpl_misc.h $(INCdir)/hpl_blas.h $(INCdir)/hpl_auxil.h \
-   $(INCdir)/hpl_pmisc.h $(INCdir)/hpl_pauxil.h $(INCdir)/hpl_pfact.h
-#
-## Object files ########################################################
-#
-HPL_pfaobj = \
-   HPL_dlocmax.o HPL_dlocswpN.o HPL_dlocswpT.o \
-   HPL_pdmxswp.o HPL_pdpancrN.o HPL_pdpancrT.o \
-   HPL_pdpanllN.o HPL_pdpanllT.o HPL_pdpanrlN.o \
-   HPL_pdpanrlT.o HPL_pdrpanllN.o HPL_pdrpanllT.o \
-   HPL_pdrpancrN.o HPL_pdrpancrT.o HPL_pdrpanrlN.o \
-   HPL_pdrpanrlT.o HPL_pdfact.o
-#
-## Targets #############################################################
-#
-all : lib
-#
-lib : lib.grd
-#
-lib.grd : $(HPL_pfaobj)
-	$(ARCHIVER) $(ARFLAGS) $(HPLlib) $(HPL_pfaobj)
-	$(RANLIB) $(HPLlib)
-	$(TOUCH) lib.grd
-#
-# ######################################################################
-#
-HPL_dlocmax.o : ../HPL_dlocmax.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_dlocmax.c
-HPL_dlocswpN.o : ../HPL_dlocswpN.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_dlocswpN.c
-HPL_dlocswpT.o : ../HPL_dlocswpT.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_dlocswpT.c
-HPL_pdmxswp.o : ../HPL_pdmxswp.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_pdmxswp.c
-HPL_pdpancrN.o : ../HPL_pdpancrN.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_pdpancrN.c
-HPL_pdpancrT.o : ../HPL_pdpancrT.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_pdpancrT.c
-HPL_pdpanllN.o : ../HPL_pdpanllN.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_pdpanllN.c
-HPL_pdpanllT.o : ../HPL_pdpanllT.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_pdpanllT.c
-HPL_pdpanrlN.o : ../HPL_pdpanrlN.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_pdpanrlN.c
-HPL_pdpanrlT.o : ../HPL_pdpanrlT.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_pdpanrlT.c
-HPL_pdrpanllN.o : ../HPL_pdrpanllN.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_pdrpanllN.c
-HPL_pdrpanllT.o : ../HPL_pdrpanllT.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_pdrpanllT.c
-HPL_pdrpancrN.o : ../HPL_pdrpancrN.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_pdrpancrN.c
-HPL_pdrpancrT.o : ../HPL_pdrpancrT.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_pdrpancrT.c
-HPL_pdrpanrlN.o : ../HPL_pdrpanrlN.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_pdrpanrlN.c
-HPL_pdrpanrlT.o : ../HPL_pdrpanrlT.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_pdrpanrlT.c
-HPL_pdfact.o : ../HPL_pdfact.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_pdfact.c
-#
-# ######################################################################
-#
-clean :
-	$(RM) *.o *.grd
-#
-# ######################################################################
diff --git a/hpl/makes/Make.pgesv b/hpl/makes/Make.pgesv
deleted file mode 100644
index 1e9d40c5d126efee91034dc91cc71872d18cd115..0000000000000000000000000000000000000000
--- a/hpl/makes/Make.pgesv
+++ /dev/null
@@ -1,136 +0,0 @@
-#
-# -- High Performance Computing Linpack Benchmark (HPL)
-# HPL - 2.1 - October 26, 2012
-# Antoine P. Petitet
-# University of Tennessee, Knoxville
-# Innovative Computing Laboratory
-# (C) Copyright 2000-2008 All Rights Reserved
-#
-# -- Copyright notice and Licensing terms:
-#
-# Redistribution and use in source and binary forms, with or without
-# modification, are permitted provided that the following conditions
-# are met:
-#
-# 1. Redistributions of source code must retain the above copyright
-# notice, this list of conditions and the following disclaimer.
-#
-# 2. Redistributions in binary form must reproduce the above copyright
-# notice, this list of conditions, and the following disclaimer in the
-# documentation and/or other materials provided with the distribution.
-#
-# 3. All advertising materials mentioning features or use of this
-# software must display the following acknowledgement:
-# This product includes software developed at the University of
-# Tennessee, Knoxville, Innovative Computing Laboratory.
-#
-# 4. The name of the University, the name of the Laboratory, or the
-# names of its contributors may not be used to endorse or promote
-# products derived from this software without specific written
-# permission.
-#
-# -- Disclaimer:
-#
-# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
-# ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
-# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
-# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY
-# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
-# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
-# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
-# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
-# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
-# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
-# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-# ######################################################################
-#
-include Make.inc
-#
-# ######################################################################
-#
-INCdep = \
-   $(INCdir)/hpl_misc.h $(INCdir)/hpl_blas.h $(INCdir)/hpl_auxil.h \
-   $(INCdir)/hpl_pmisc.h $(INCdir)/hpl_grid.h $(INCdir)/hpl_comm.h \
-   $(INCdir)/hpl_pauxil.h $(INCdir)/hpl_panel.h $(INCdir)/hpl_pfact.h \
-   $(INCdir)/hpl_pgesv.h
-#
-## Object files ########################################################
-#
-HPL_pgeobj = \
-   HPL_pipid.o HPL_plindx0.o HPL_pdlaswp00N.o \
-   HPL_pdlaswp00T.o HPL_perm.o HPL_logsort.o \
-   HPL_plindx10.o HPL_plindx1.o HPL_spreadN.o \
-   HPL_spreadT.o HPL_rollN.o HPL_rollT.o \
-   HPL_equil.o HPL_pdlaswp01N.o HPL_pdlaswp01T.o \
-   HPL_pdupdateNN.o HPL_pdupdateNT.o HPL_pdupdateTN.o \
-   HPL_pdupdateTT.o HPL_pdtrsv.o HPL_pdgesv0.o \
-   HPL_pdgesvK1.o HPL_pdgesvK2.o HPL_pdgesv.o
-#
-## Targets #############################################################
-#
-all : lib
-#
-lib : lib.grd
-#
-lib.grd : $(HPL_pgeobj)
-	$(ARCHIVER) $(ARFLAGS) $(HPLlib) $(HPL_pgeobj)
-	$(RANLIB) $(HPLlib)
-	$(TOUCH) lib.grd
-#
-# ######################################################################
-#
-HPL_pipid.o : ../HPL_pipid.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_pipid.c
-HPL_plindx0.o : ../HPL_plindx0.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_plindx0.c
-HPL_pdlaswp00N.o : ../HPL_pdlaswp00N.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_pdlaswp00N.c
-HPL_pdlaswp00T.o : ../HPL_pdlaswp00T.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_pdlaswp00T.c
-HPL_perm.o : ../HPL_perm.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_perm.c
-HPL_logsort.o : ../HPL_logsort.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_logsort.c
-HPL_plindx10.o : ../HPL_plindx10.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_plindx10.c
-HPL_plindx1.o : ../HPL_plindx1.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_plindx1.c
-HPL_spreadN.o : ../HPL_spreadN.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_spreadN.c
-HPL_spreadT.o : ../HPL_spreadT.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_spreadT.c
-HPL_rollN.o : ../HPL_rollN.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_rollN.c
-HPL_rollT.o : ../HPL_rollT.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_rollT.c
-HPL_equil.o : ../HPL_equil.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_equil.c
-HPL_pdlaswp01N.o : ../HPL_pdlaswp01N.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_pdlaswp01N.c
-HPL_pdlaswp01T.o : ../HPL_pdlaswp01T.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_pdlaswp01T.c
-HPL_pdupdateNN.o : ../HPL_pdupdateNN.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_pdupdateNN.c
-HPL_pdupdateNT.o : ../HPL_pdupdateNT.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_pdupdateNT.c
-HPL_pdupdateTN.o : ../HPL_pdupdateTN.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_pdupdateTN.c
-HPL_pdupdateTT.o : ../HPL_pdupdateTT.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_pdupdateTT.c
-HPL_pdtrsv.o : ../HPL_pdtrsv.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_pdtrsv.c
-HPL_pdgesv0.o : ../HPL_pdgesv0.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_pdgesv0.c
-HPL_pdgesvK1.o : ../HPL_pdgesvK1.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_pdgesvK1.c
-HPL_pdgesvK2.o : ../HPL_pdgesvK2.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_pdgesvK2.c
-HPL_pdgesv.o : ../HPL_pdgesv.c $(INCdep)
-	$(CC) -o $@ -c $(CCFLAGS) ../HPL_pdgesv.c
-#
-# ######################################################################
-#
-clean :
-	$(RM) *.o *.grd
-#
-# ######################################################################
diff --git a/hpl/makes/Make.pmatgen b/hpl/makes/Make.pmatgen
deleted file mode 100644
index 75f0c07cbe637eaaa2d98fa29556c1228501a3dc..0000000000000000000000000000000000000000
--- a/hpl/makes/Make.pmatgen
+++ /dev/null
@@ -1,81 +0,0 @@
-#
-# -- High Performance Computing Linpack Benchmark (HPL)
-# HPL - 2.1 - October 26, 2012
-# Antoine P. Petitet
-# University of Tennessee, Knoxville
-# Innovative Computing Laboratory
-# (C) Copyright 2000-2008 All Rights Reserved
-#
-# -- Copyright notice and Licensing terms:
-#
-# Redistribution and use in source and binary forms, with or without
-# modification, are permitted provided that the following conditions
-# are met:
-#
-# 1. Redistributions of source code must retain the above copyright
-# notice, this list of conditions and the following disclaimer.
-#
-# 2. Redistributions in binary form must reproduce the above copyright
-# notice, this list of conditions, and the following disclaimer in the
-# documentation and/or other materials provided with the distribution.
-#
-# 3. All advertising materials mentioning features or use of this
-# software must display the following acknowledgement:
-# This product includes software developed at the University of
-# Tennessee, Knoxville, Innovative Computing Laboratory.
-#
-# 4. The name of the University, the name of the Laboratory, or the
-# names of its contributors may not be used to endorse or promote
-# products derived from this software without specific written
-# permission.
-#
-# -- Disclaimer:
-#
-# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
-# ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
-# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
-# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY
-# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
-# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
-# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
-# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
-# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
-# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
-# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-# ###################################################################### -# -include Make.inc -# -# ###################################################################### -# -INCdep = \ - $(INCdir)/hpl_misc.h $(INCdir)/hpl_matgen.h $(INCdir)/hpl_pmisc.h \ - $(INCdir)/hpl_pauxil.h $(INCdir)/hpl_pmatgen.h -# -## Object files ######################################################## -# -HPL_pmaobj = \ - HPL_pdmatgen.o -# -## Targets ############################################################# -# -all : lib -# -lib : lib.grd -# -lib.grd : $(HPL_pmaobj) - $(ARCHIVER) $(ARFLAGS) $(HPLlib) $(HPL_pmaobj) - $(RANLIB) $(HPLlib) - $(TOUCH) lib.grd -# -# ###################################################################### -# -HPL_pdmatgen.o : ../HPL_pdmatgen.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_pdmatgen.c -# -# ###################################################################### -# -clean : - $(RM) *.o *.grd -# -# ###################################################################### diff --git a/hpl/makes/Make.ptest b/hpl/makes/Make.ptest deleted file mode 100644 index f9de98381367eed44d7cc3ac50f45e28ce5cb11e..0000000000000000000000000000000000000000 --- a/hpl/makes/Make.ptest +++ /dev/null @@ -1,94 +0,0 @@ -# -# -- High Performance Computing Linpack Benchmark (HPL) -# HPL - 2.1 - October 26, 2012 -# Antoine P. Petitet -# University of Tennessee, Knoxville -# Innovative Computing Laboratory -# (C) Copyright 2000-2008 All Rights Reserved -# -# -- Copyright notice and Licensing terms: -# -# Redistribution and use in source and binary forms, with or without -# modification, are permitted provided that the following conditions -# are met: -# -# 1. Redistributions of source code must retain the above copyright -# notice, this list of conditions and the following disclaimer. -# -# 2. 
Redistributions in binary form must reproduce the above copyright -# notice, this list of conditions, and the following disclaimer in the -# documentation and/or other materials provided with the distribution. -# -# 3. All advertising materials mentioning features or use of this -# software must display the following acknowledgement: -# This product includes software developed at the University of -# Tennessee, Knoxville, Innovative Computing Laboratory. -# -# 4. The name of the University, the name of the Laboratory, or the -# names of its contributors may not be used to endorse or promote -# products derived from this software without specific written -# permission. -# -# -- Disclaimer: -# -# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS -# ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR -# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY -# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, -# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT -# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, -# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY -# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT -# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
-# ###################################################################### -# -include Make.inc -# -# ###################################################################### -# -INCdep = \ - $(INCdir)/hpl_misc.h $(INCdir)/hpl_blas.h $(INCdir)/hpl_auxil.h \ - $(INCdir)/hpl_gesv.h $(INCdir)/hpl_pmisc.h $(INCdir)/hpl_pauxil.h \ - $(INCdir)/hpl_panel.h $(INCdir)/hpl_pgesv.h $(INCdir)/hpl_pmatgen.h \ - $(INCdir)/hpl_ptimer.h $(INCdir)/hpl_ptest.h -# -## Executable names #################################################### -# -xhpl = $(BINdir)/xhpl -# -## Object files ######################################################## -# -HPL_pteobj = \ - HPL_pddriver.o HPL_pdinfo.o HPL_pdtest.o -# -## Targets ############################################################# -# -all : dexe -# -dexe : dexe.grd -# -$(BINdir)/HPL.dat : ../HPL.dat - ( $(CP) ../HPL.dat $(BINdir) ) -# -dexe.grd: $(HPL_pteobj) $(HPLlib) - $(LINKER) $(LINKFLAGS) -o $(xhpl) $(HPL_pteobj) $(HPL_LIBS) - $(MAKE) $(BINdir)/HPL.dat - $(TOUCH) dexe.grd -# -# ###################################################################### -# -HPL_pddriver.o : ../HPL_pddriver.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_pddriver.c -HPL_pdinfo.o : ../HPL_pdinfo.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_pdinfo.c -HPL_pdtest.o : ../HPL_pdtest.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_pdtest.c -# -# ###################################################################### -# -clean : - $(RM) *.o *.grd -# -# ###################################################################### diff --git a/hpl/makes/Make.ptimer b/hpl/makes/Make.ptimer deleted file mode 100644 index 012b65989b7a31780185547e3dc907a98067ae28..0000000000000000000000000000000000000000 --- a/hpl/makes/Make.ptimer +++ /dev/null @@ -1,84 +0,0 @@ -# -# -- High Performance Computing Linpack Benchmark (HPL) -# HPL - 2.1 - October 26, 2012 -# Antoine P. 
Petitet -# University of Tennessee, Knoxville -# Innovative Computing Laboratory -# (C) Copyright 2000-2008 All Rights Reserved -# -# -- Copyright notice and Licensing terms: -# -# Redistribution and use in source and binary forms, with or without -# modification, are permitted provided that the following conditions -# are met: -# -# 1. Redistributions of source code must retain the above copyright -# notice, this list of conditions and the following disclaimer. -# -# 2. Redistributions in binary form must reproduce the above copyright -# notice, this list of conditions, and the following disclaimer in the -# documentation and/or other materials provided with the distribution. -# -# 3. All advertising materials mentioning features or use of this -# software must display the following acknowledgement: -# This product includes software developed at the University of -# Tennessee, Knoxville, Innovative Computing Laboratory. -# -# 4. The name of the University, the name of the Laboratory, or the -# names of its contributors may not be used to endorse or promote -# products derived from this software without specific written -# permission. -# -# -- Disclaimer: -# -# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS -# ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR -# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY -# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, -# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT -# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, -# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY -# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT -# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
-# ###################################################################### -# -include Make.inc -# -# ###################################################################### -# -INCdep = \ - $(INCdir)/hpl_pmisc.h $(INCdir)/hpl_ptimer.h -# -## Object files ######################################################## -# -HPL_ptiobj = \ - HPL_ptimer.o HPL_ptimer_cputime.o HPL_ptimer_walltime.o -# -## Targets ############################################################# -# -all : lib -# -lib : lib.grd -# -lib.grd : $(HPL_ptiobj) - $(ARCHIVER) $(ARFLAGS) $(HPLlib) $(HPL_ptiobj) - $(RANLIB) $(HPLlib) - $(TOUCH) lib.grd -# -# ###################################################################### -# -HPL_ptimer.o : ../HPL_ptimer.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_ptimer.c -HPL_ptimer_cputime.o : ../HPL_ptimer_cputime.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_ptimer_cputime.c -HPL_ptimer_walltime.o : ../HPL_ptimer_walltime.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_ptimer_walltime.c -# -# ###################################################################### -# -clean : - $(RM) *.o *.grd -# -# ###################################################################### diff --git a/hpl/makes/Make.test b/hpl/makes/Make.test deleted file mode 100644 index a08bc01b49db1597d6f615cdbbd337b677d1a998..0000000000000000000000000000000000000000 --- a/hpl/makes/Make.test +++ /dev/null @@ -1,93 +0,0 @@ -# -# -- High Performance Computing Linpack Benchmark (HPL) -# HPL - 2.1 - October 26, 2012 -# Antoine P. Petitet -# University of Tennessee, Knoxville -# Innovative Computing Laboratory -# (C) Copyright 2000-2008 All Rights Reserved -# -# -- Copyright notice and Licensing terms: -# -# Redistribution and use in source and binary forms, with or without -# modification, are permitted provided that the following conditions -# are met: -# -# 1. Redistributions of source code must retain the above copyright -# notice, this list of conditions and the following disclaimer. -# -# 2. 
Redistributions in binary form must reproduce the above copyright -# notice, this list of conditions, and the following disclaimer in the -# documentation and/or other materials provided with the distribution. -# -# 3. All advertising materials mentioning features or use of this -# software must display the following acknowledgement: -# This product includes software developed at the University of -# Tennessee, Knoxville, Innovative Computing Laboratory. -# -# 4. The name of the University, the name of the Laboratory, or the -# names of its contributors may not be used to endorse or promote -# products derived from this software without specific written -# permission. -# -# -- Disclaimer: -# -# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS -# ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR -# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY -# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, -# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT -# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, -# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY -# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT -# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
-# ###################################################################### -# -include Make.inc -# -# ###################################################################### -# -INCdep = \ - $(INCdir)/hpl_misc.h $(INCdir)/hpl_blas.h $(INCdir)/hpl_auxil.h \ - $(INCdir)/hpl_gesv.h $(INCdir)/hpl_matgen.h $(INCdir)/hpl_timer.h \ - $(INCdir)/hpl_test.h -# -## Executable names #################################################### -# -xlinpack = $(BINdir)/xlinpack -# -## Object files ######################################################## -# -HPL_tesobj = \ - HPL_ddriver.o HPL_dinfo.o HPL_dtest.o -# -## Targets ############################################################# -# -all : dexe -# -dexe : dexe.grd -# -$(BINdir)/LINPACK.dat : ../LINPACK.dat - ( $(CP) ../LINPACK.dat $(BINdir) ) -# -dexe.grd: $(HPL_tesobj) $(HPLlib) - $(LINKER) $(LINKFLAGS) -o $(xlinpack) $(HPL_tesobj) $(HPL_LIBS) - $(MAKE) $(BINdir)/LINPACK.dat - $(TOUCH) dexe.grd -# -# ###################################################################### -# -HPL_ddriver.o : ../HPL_ddriver.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_ddriver.c -HPL_dinfo.o : ../HPL_dinfo.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_dinfo.c -HPL_dtest.o : ../HPL_dtest.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_dtest.c -# -# ###################################################################### -# -clean : - $(RM) *.o *.grd -# -# ###################################################################### diff --git a/hpl/makes/Make.timer b/hpl/makes/Make.timer deleted file mode 100644 index b684ab1bc7cfa59683de0ae2b26d949994eb9a7f..0000000000000000000000000000000000000000 --- a/hpl/makes/Make.timer +++ /dev/null @@ -1,84 +0,0 @@ -# -# -- High Performance Computing Linpack Benchmark (HPL) -# HPL - 2.1 - October 26, 2012 -# Antoine P.
Petitet -# University of Tennessee, Knoxville -# Innovative Computing Laboratory -# (C) Copyright 2000-2008 All Rights Reserved -# -# -- Copyright notice and Licensing terms: -# -# Redistribution and use in source and binary forms, with or without -# modification, are permitted provided that the following conditions -# are met: -# -# 1. Redistributions of source code must retain the above copyright -# notice, this list of conditions and the following disclaimer. -# -# 2. Redistributions in binary form must reproduce the above copyright -# notice, this list of conditions, and the following disclaimer in the -# documentation and/or other materials provided with the distribution. -# -# 3. All advertising materials mentioning features or use of this -# software must display the following acknowledgement: -# This product includes software developed at the University of -# Tennessee, Knoxville, Innovative Computing Laboratory. -# -# 4. The name of the University, the name of the Laboratory, or the -# names of its contributors may not be used to endorse or promote -# products derived from this software without specific written -# permission. -# -# -- Disclaimer: -# -# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS -# ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR -# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY -# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, -# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT -# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, -# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY -# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT -# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
-# ###################################################################### -# -include Make.inc -# -# ###################################################################### -# -INCdep = \ - $(INCdir)/hpl_pmisc.h $(INCdir)/hpl_timer.h -# -## Object files ######################################################## -# -HPL_timobj = \ - HPL_timer.o HPL_timer_cputime.o HPL_timer_walltime.o -# -## Targets ############################################################# -# -all : lib -# -lib : lib.grd -# -lib.grd : $(HPL_timobj) - $(ARCHIVER) $(ARFLAGS) $(HPLlib) $(HPL_timobj) - $(RANLIB) $(HPLlib) - $(TOUCH) lib.grd -# -# ###################################################################### -# -HPL_timer.o : ../HPL_timer.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_timer.c -HPL_timer_cputime.o : ../HPL_timer_cputime.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_timer_cputime.c -HPL_timer_walltime.o : ../HPL_timer_walltime.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_timer_walltime.c -# -# ###################################################################### -# -clean : - $(RM) *.o *.grd -# -# ###################################################################### diff --git a/hpl/makes/Make.units b/hpl/makes/Make.units deleted file mode 100644 index ab50f3d7beae31a77a8259508b0ea0568bccf950..0000000000000000000000000000000000000000 --- a/hpl/makes/Make.units +++ /dev/null @@ -1,112 +0,0 @@ -# -# -- High Performance Computing Linpack Benchmark (HPL) -# HPL - 2.1 - October 26, 2012 -# Antoine P. Petitet -# University of Tennessee, Knoxville -# Innovative Computing Laboratory -# (C) Copyright 2000-2008 All Rights Reserved -# -# -- Copyright notice and Licensing terms: -# -# Redistribution and use in source and binary forms, with or without -# modification, are permitted provided that the following conditions -# are met: -# -# 1. Redistributions of source code must retain the above copyright -# notice, this list of conditions and the following disclaimer. -# -# 2. 
Redistributions in binary form must reproduce the above copyright -# notice, this list of conditions, and the following disclaimer in the -# documentation and/or other materials provided with the distribution. -# -# 3. All advertising materials mentioning features or use of this -# software must display the following acknowledgement: -# This product includes software developed at the University of -# Tennessee, Knoxville, Innovative Computing Laboratory. -# -# 4. The name of the University, the name of the Laboratory, or the -# names of its contributors may not be used to endorse or promote -# products derived from this software without specific written -# permission. -# -# -- Disclaimer: -# -# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS -# ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR -# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY -# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, -# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT -# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, -# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY -# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT -# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
-# ###################################################################### -# -include Make.inc -# -# ###################################################################### -# -INCdep = \ -@rout Make.units - $(INCdir)/hpl_misc.h $(INCdir)/hpl_auxil.h $(INCdir)/hpl_pmisc.h \ - $(INCdir)/hpl_pauxil.h $(INCdir)/hpl_units.h -# -## Executable names #################################################### -# -xunits = $(BINdir)/xunits -# -## Object files ######################################################## -# -HPL_uniobj = \ - HPL_unit_driver.o HPL_unit_info.o HPL_unit_indxg2l.o \ - HPL_chek_indxg2l.o HPL_unit_indxg2p.o HPL_chek_indxg2p.o \ - HPL_unit_indxl2g.o HPL_chek_indxl2g.o HPL_unit_numroc.o \ - HPL_unit_numrocI.o HPL_chek_numrocI.o -# -## Targets ############################################################# -# -all : dexe -# -dexe : dexe.grd -# -$(BINdir)/UNITS.dat : ../UNITS.dat - ( $(CP) ../UNITS.dat $(BINdir) ) -# -dexe.grd : $(HPL_uniobj) $(HPLlib) - $(LINKER) $(LINKFLAGS) -o $(xunits) $(HPL_uniobj) @(hpllibs) - $(MAKE) $(BINdir)/UNITS.dat - $(TOUCH) dexe.grd -# -# ###################################################################### -# -HPL_unit_driver.o : ../HPL_unit_driver.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_unit_driver.c -HPL_unit_info.o : ../HPL_unit_info.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_unit_info.c -HPL_unit_indxg2l.o : ../HPL_unit_indxg2l.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_unit_indxg2l.c -HPL_chek_indxg2l.o : ../HPL_chek_indxg2l.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_chek_indxg2l.c -HPL_unit_indxg2p.o : ../HPL_unit_indxg2p.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_unit_indxg2p.c -HPL_chek_indxg2p.o : ../HPL_chek_indxg2p.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_chek_indxg2p.c -HPL_unit_indxl2g.o : ../HPL_unit_indxl2g.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_unit_indxl2g.c -HPL_chek_indxl2g.o : ../HPL_chek_indxl2g.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_chek_indxl2g.c -HPL_unit_numroc.o 
: ../HPL_unit_numroc.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_unit_numroc.c -HPL_unit_numrocI.o : ../HPL_unit_numrocI.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_unit_numrocI.c -HPL_chek_numrocI.o : ../HPL_chek_numrocI.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_chek_numrocI.c -# -# ###################################################################### -# -clean : - $(RM) *.o *.grd -# -# ###################################################################### diff --git a/hpl/man/man3/HPL_abort.3 b/hpl/man/man3/HPL_abort.3 deleted file mode 100644 index 5afa3c7a8609347b6e8c4f9452b4fbb1cf5d5a8b..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_abort.3 +++ /dev/null @@ -1,52 +0,0 @@ -.TH HPL_abort 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_abort \- halts execution. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_abort(\fR -\fB\&int\fR -\fI\&LINE\fR, -\fB\&const char *\fR -\fI\&SRNAME\fR, -\fB\&const char *\fR -\fI\&FORM\fR, -\fB\&...\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_abort\fR -displays an error message on stderr and halts execution. -.SH ARGUMENTS -.TP 8 -LINE (local input) int -On entry, LINE specifies the line number in the file where -the error has occurred. When LINE is not a positive line -number, it is ignored. -.TP 8 -SRNAME (local input) const char * -On entry, SRNAME should be the name of the routine calling -this error handler. -.TP 8 -FORM (local input) const char * -On entry, FORM specifies the format, i.e., how the subsequent -arguments are converted for output. -.TP 8 - (local input) ... -On entry, ... is the list of arguments to be printed within -the format string. -.SH EXAMPLE -\fI\&#include "hpl.h"\fR - -int main(int argc, char *argv[]) -.br -{ -.br - HPL_abort( __LINE__, __FILE__, "Halt.\en" ); -.br - exit(0); return(0); -.br -} -.SH SEE ALSO -.BR HPL_fprintf \ (3), -.BR HPL_warn \ (3).
diff --git a/hpl/man/man3/HPL_all_reduce.3 b/hpl/man/man3/HPL_all_reduce.3 deleted file mode 100644 index ae10e2da02569c56c1acd1f953c1232b7c507385..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_all_reduce.3 +++ /dev/null @@ -1,49 +0,0 @@ -.TH HPL_all_reduce 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_all_reduce \- All reduce operation. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&int\fR -\fB\&HPL_all_reduce(\fR -\fB\&void *\fR -\fI\&BUFFER\fR, -\fB\&const int\fR -\fI\&COUNT\fR, -\fB\&const HPL_T_TYPE\fR -\fI\&DTYPE\fR, -\fB\&const HPL_T_OP \fR -\fI\&OP\fR, -\fB\&MPI_Comm\fR -\fI\&COMM\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_all_reduce\fR -performs a global reduce operation across all -processes of a group leaving the results on all processes. -.SH ARGUMENTS -.TP 8 -BUFFER (local input/global output) void * -On entry, BUFFER points to the buffer to be combined. On -exit, this array contains the combined data and is identical -on all processes in the group. -.TP 8 -COUNT (global input) const int -On entry, COUNT indicates the number of entries in BUFFER. -COUNT must be at least zero. -.TP 8 -DTYPE (global input) const HPL_T_TYPE -On entry, DTYPE specifies the type of the buffer operands. -.TP 8 -OP (global input) const HPL_T_OP -On entry, OP is a pointer to the local combine function. -.TP 8 -COMM (global/local input) MPI_Comm -The MPI communicator identifying the process collection. -.SH SEE ALSO -.BR HPL_broadcast \ (3), -.BR HPL_reduce \ (3), -.BR HPL_barrier \ (3), -.BR HPL_min \ (3), -.BR HPL_max \ (3), -.BR HPL_sum \ (3). diff --git a/hpl/man/man3/HPL_barrier.3 b/hpl/man/man3/HPL_barrier.3 deleted file mode 100644 index ddf8413df34f26e09a304b34c290c3c60210bc22..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_barrier.3 +++ /dev/null @@ -1,27 +0,0 @@ -.TH HPL_barrier 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_barrier \- Barrier operation.
-.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&int\fR -\fB\&HPL_barrier(\fR -\fB\&MPI_Comm\fR -\fI\&COMM\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_barrier\fR -blocks the caller until all process members have called it. -The call returns at any process only after all group members have -entered the call. -.SH ARGUMENTS -.TP 8 -COMM (global/local input) MPI_Comm -The MPI communicator identifying the process collection. -.SH SEE ALSO -.BR HPL_broadcast \ (3), -.BR HPL_reduce \ (3), -.BR HPL_all_reduce \ (3), -.BR HPL_min \ (3), -.BR HPL_max \ (3), -.BR HPL_sum \ (3). diff --git a/hpl/man/man3/HPL_bcast.3 b/hpl/man/man3/HPL_bcast.3 deleted file mode 100644 index df462553b5cb8910d37ad81c1699c350e361d40e..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_bcast.3 +++ /dev/null @@ -1,31 +0,0 @@ -.TH HPL_bcast 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_bcast \- Perform the row broadcast. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&int\fR -\fB\&HPL_bcast(\fR -\fB\&HPL_T_panel *\fR -\fI\&PANEL\fR, -\fB\&int *\fR -\fI\&IFLAG\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_bcast\fR -broadcasts the current panel. Successful completion is -indicated by IFLAG set to HPL_SUCCESS on return. IFLAG will be set to -HPL_FAILURE on failure and to HPL_KEEP_TESTING when the operation was -not completed, in which case this function should be called again. -.SH ARGUMENTS -.TP 8 -PANEL (input/output) HPL_T_panel * -On entry, PANEL points to the current panel data structure -being broadcast. -.TP 8 -IFLAG (output) int * -On exit, IFLAG indicates whether or not the broadcast has -occurred. -.SH SEE ALSO -.BR HPL_binit \ (3), -.BR HPL_bwait \ (3).
diff --git a/hpl/man/man3/HPL_binit.3 b/hpl/man/man3/HPL_binit.3 deleted file mode 100644 index 66ddfe5d6caae807721c6fc5f368137e460f2674..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_binit.3 +++ /dev/null @@ -1,23 +0,0 @@ -.TH HPL_binit 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_binit \- Initialize the row broadcast. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&int\fR -\fB\&HPL_binit(\fR -\fB\&HPL_T_panel *\fR -\fI\&PANEL\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_binit\fR -initializes a row broadcast. Successful completion is -indicated by the returned error code HPL_SUCCESS. -.SH ARGUMENTS -.TP 8 -PANEL (input/output) HPL_T_panel * -On entry, PANEL points to the current panel data structure -being broadcast. -.SH SEE ALSO -.BR HPL_bcast \ (3), -.BR HPL_bwait \ (3). diff --git a/hpl/man/man3/HPL_broadcast.3 b/hpl/man/man3/HPL_broadcast.3 deleted file mode 100644 index c8e1649d7b6d924f19f420c9781f9dfabb25bbb0..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_broadcast.3 +++ /dev/null @@ -1,49 +0,0 @@ -.TH HPL_broadcast 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_broadcast \- Broadcast operation. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&int\fR -\fB\&HPL_broadcast(\fR -\fB\&void *\fR -\fI\&BUFFER\fR, -\fB\&const int\fR -\fI\&COUNT\fR, -\fB\&const HPL_T_TYPE\fR -\fI\&DTYPE\fR, -\fB\&const int\fR -\fI\&ROOT\fR, -\fB\&MPI_Comm\fR -\fI\&COMM\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_broadcast\fR -broadcasts a message from the process with rank ROOT to -all processes in the group. -.SH ARGUMENTS -.TP 8 -BUFFER (local input/output) void * -On entry, BUFFER points to the buffer to be broadcast. On -exit, this array contains the broadcast data and is identical -on all processes in the group. -.TP 8 -COUNT (global input) const int -On entry, COUNT indicates the number of entries in BUFFER. -COUNT must be at least zero. 
-.TP 8 -DTYPE (global input) const HPL_T_TYPE -On entry, DTYPE specifies the type of the buffer operands. -.TP 8 -ROOT (global input) const int -On entry, ROOT is the coordinate of the source process. -.TP 8 -COMM (global/local input) MPI_Comm -The MPI communicator identifying the process collection. -.SH SEE ALSO -.BR HPL_reduce \ (3), -.BR HPL_all_reduce \ (3), -.BR HPL_barrier \ (3), -.BR HPL_min \ (3), -.BR HPL_max \ (3), -.BR HPL_sum \ (3). diff --git a/hpl/man/man3/HPL_bwait.3 b/hpl/man/man3/HPL_bwait.3 deleted file mode 100644 index 5f8c1bd23c0c3dfb81f0e317ab60c0c58e6ea7ca..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_bwait.3 +++ /dev/null @@ -1,24 +0,0 @@ -.TH HPL_bwait 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_bwait \- Finalize the row broadcast. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&int\fR -\fB\&HPL_bwait(\fR -\fB\&HPL_T_panel *\fR -\fI\&PANEL\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_bwait\fR -waits for the row broadcast of the current panel to -terminate. Successful completion is indicated by the returned error -code HPL_SUCCESS. -.SH ARGUMENTS -.TP 8 -PANEL (input/output) HPL_T_panel * -On entry, PANEL points to the current panel data structure -being broadcast. -.SH SEE ALSO -.BR HPL_binit \ (3), -.BR HPL_bcast \ (3). diff --git a/hpl/man/man3/HPL_copyL.3 b/hpl/man/man3/HPL_copyL.3 deleted file mode 100644 index 566e46d78388eb12a69edb25288ae6e058511007..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_copyL.3 +++ /dev/null @@ -1,28 +0,0 @@ -.TH HPL_copyL 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_copyL \- Copy the current panel into a contiguous workspace.
-.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_copyL(\fR -\fB\&HPL_T_panel *\fR -\fI\&PANEL\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_copyL\fR -copies the panel of columns, the L1 replicated submatrix, -the pivot array and the info scalar into a contiguous workspace for -later broadcast. - -The copy of this panel into a contiguous buffer can be enforced by -specifying -DHPL_COPY_L in the architecture specific Makefile. -.SH ARGUMENTS -.TP 8 -PANEL (input/output) HPL_T_panel * -On entry, PANEL points to the current panel data structure -being broadcast. -.SH SEE ALSO -.BR HPL_binit \ (3), -.BR HPL_bcast \ (3), -.BR HPL_bwait \ (3). diff --git a/hpl/man/man3/HPL_daxpy.3 b/hpl/man/man3/HPL_daxpy.3 deleted file mode 100644 index 75e8ddc509da903d8128802e668db250728e0579..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_daxpy.3 +++ /dev/null @@ -1,76 +0,0 @@ -.TH HPL_daxpy 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_daxpy \- y := y + alpha * x. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_daxpy(\fR -\fB\&const int\fR -\fI\&N\fR, -\fB\&const double\fR -\fI\&ALPHA\fR, -\fB\&const double *\fR -\fI\&X\fR, -\fB\&const int\fR -\fI\&INCX\fR, -\fB\&double *\fR -\fI\&Y\fR, -\fB\&const int\fR -\fI\&INCY\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_daxpy\fR -scales the vector x by alpha and adds it to y. -.SH ARGUMENTS -.TP 8 -N (local input) const int -On entry, N specifies the length of the vectors x and y. N -must be at least zero. -.TP 8 -ALPHA (local input) const double -On entry, ALPHA specifies the scalar alpha. When ALPHA is -supplied as zero, then the entries of the incremented array X -need not be set on input. -.TP 8 -X (local input) const double * -On entry, X is an incremented array of dimension at least -( 1 + ( n - 1 ) * abs( INCX ) ) that contains the vector x. -.TP 8 -INCX (local input) const int -On entry, INCX specifies the increment for the elements of X. -INCX must not be zero. 
-.TP 8 -Y (local input/output) double * -On entry, Y is an incremented array of dimension at least -( 1 + ( n - 1 ) * abs( INCY ) ) that contains the vector y. -On exit, the entries of the incremented array Y are updated -with the scaled entries of the incremented array X. -.TP 8 -INCY (local input) const int -On entry, INCY specifies the increment for the elements of Y. -INCY must not be zero. -.SH EXAMPLE -\fI\&#include "hpl.h"\fR - -int main(int argc, char *argv[]) -.br -{ -.br - double x[3], y[3]; -.br - x[0] = 1.0; x[1] = 2.0; x[2] = 3.0; -.br - y[0] = 4.0; y[1] = 5.0; y[2] = 6.0; -.br - HPL_daxpy( 3, 2.0, x, 1, y, 1 ); -.br - printf("y=[%f,%f,%f]\en", y[0], y[1], y[2]); -.br - exit(0); return(0); -.br -} -.SH SEE ALSO -.BR HPL_dcopy \ (3), -.BR HPL_dscal \ (3), -.BR HPL_dswap \ (3). diff --git a/hpl/man/man3/HPL_dcopy.3 b/hpl/man/man3/HPL_dcopy.3 deleted file mode 100644 index b498f3d0cb369ef389675d1b50f824dacea37efb..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_dcopy.3 +++ /dev/null @@ -1,69 +0,0 @@ -.TH HPL_dcopy 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_dcopy \- y := x. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_dcopy(\fR -\fB\&const int\fR -\fI\&N\fR, -\fB\&const double *\fR -\fI\&X\fR, -\fB\&const int\fR -\fI\&INCX\fR, -\fB\&double *\fR -\fI\&Y\fR, -\fB\&const int\fR -\fI\&INCY\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_dcopy\fR -copies the vector x into the vector y. -.SH ARGUMENTS -.TP 8 -N (local input) const int -On entry, N specifies the length of the vectors x and y. N -must be at least zero. -.TP 8 -X (local input) const double * -On entry, X is an incremented array of dimension at least -( 1 + ( n - 1 ) * abs( INCX ) ) that contains the vector x. -.TP 8 -INCX (local input) const int -On entry, INCX specifies the increment for the elements of X. -INCX must not be zero. 
-.TP 8 -Y (local input/output) double * -On entry, Y is an incremented array of dimension at least -( 1 + ( n - 1 ) * abs( INCY ) ) that contains the vector y. -On exit, the entries of the incremented array Y are updated -with the entries of the incremented array X. -.TP 8 -INCY (local input) const int -On entry, INCY specifies the increment for the elements of Y. -INCY must not be zero. -.SH EXAMPLE -\fI\&#include "hpl.h"\fR - -int main(int argc, char *argv[]) -.br -{ -.br - double x[3], y[3]; -.br - x[0] = 1.0; x[1] = 2.0; x[2] = 3.0; -.br - y[0] = 4.0; y[1] = 5.0; y[2] = 6.0; -.br - HPL_dcopy( 3, x, 1, y, 1 ); -.br - printf("y=[%f,%f,%f]\en", y[0], y[1], y[2]); -.br - exit(0); return(0); -.br -} -.SH SEE ALSO -.BR HPL_daxpy \ (3), -.BR HPL_dscal \ (3), -.BR HPL_dswap \ (3). diff --git a/hpl/man/man3/HPL_dgemm.3 b/hpl/man/man3/HPL_dgemm.3 deleted file mode 100644 index b3e702be56aa8b5c372a1c73fd05bac7c28d2d9a..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_dgemm.3 +++ /dev/null @@ -1,160 +0,0 @@ -.TH HPL_dgemm 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_dgemm \- C := alpha * op(A) * op(B) + beta * C. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_dgemm(\fR -\fB\&const enum HPL_ORDER\fR -\fI\&ORDER\fR, -\fB\&const enum HPL_TRANS\fR -\fI\&TRANSA\fR, -\fB\&const enum HPL_TRANS\fR -\fI\&TRANSB\fR, -\fB\&const int\fR -\fI\&M\fR, -\fB\&const int\fR -\fI\&N\fR, -\fB\&const int\fR -\fI\&K\fR, -\fB\&const double\fR -\fI\&ALPHA\fR, -\fB\&const double *\fR -\fI\&A\fR, -\fB\&const int\fR -\fI\&LDA\fR, -\fB\&const double *\fR -\fI\&B\fR, -\fB\&const int\fR -\fI\&LDB\fR, -\fB\&const double\fR -\fI\&BETA\fR, -\fB\&double *\fR -\fI\&C\fR, -\fB\&const int\fR -\fI\&LDC\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_dgemm\fR -performs one of the matrix-matrix operations - - C := alpha * op( A ) * op( B ) + beta * C - - where op( X ) is one of - - op( X ) = X or op( X ) = X^T. 
- -Alpha and beta are scalars, and A, B and C are matrices, with op(A) -an m by k matrix, op(B) a k by n matrix and C an m by n matrix. -.SH ARGUMENTS -.TP 8 -ORDER (local input) const enum HPL_ORDER -On entry, ORDER specifies the storage format of the operands -as follows: - ORDER = HplRowMajor, - ORDER = HplColumnMajor. -.TP 8 -TRANSA (local input) const enum HPL_TRANS -On entry, TRANSA specifies the form of op(A) to be used in -the matrix-matrix operation as follows: - TRANSA==HplNoTrans : op( A ) = A, - TRANSA==HplTrans : op( A ) = A^T, - TRANSA==HplConjTrans : op( A ) = A^T. -.TP 8 -TRANSB (local input) const enum HPL_TRANS -On entry, TRANSB specifies the form of op(B) to be used in -the matrix-matrix operation as follows: - TRANSB==HplNoTrans : op( B ) = B, - TRANSB==HplTrans : op( B ) = B^T, - TRANSB==HplConjTrans : op( B ) = B^T. -.TP 8 -M (local input) const int -On entry, M specifies the number of rows of the matrix -op(A) and of the matrix C. M must be at least zero. -.TP 8 -N (local input) const int -On entry, N specifies the number of columns of the matrix -op(B) and the number of columns of the matrix C. N must be -at least zero. -.TP 8 -K (local input) const int -On entry, K specifies the number of columns of the matrix -op(A) and the number of rows of the matrix op(B). K must -be at least zero. -.TP 8 -ALPHA (local input) const double -On entry, ALPHA specifies the scalar alpha. When ALPHA is -supplied as zero then the elements of the matrices A and B -need not be set on input. -.TP 8 -A (local input) const double * -On entry, A is an array of dimension (LDA,ka), where ka is -k when TRANSA==HplNoTrans, and is m otherwise. Before -entry with TRANSA==HplNoTrans, the leading m by k part of -the array A must contain the matrix A, otherwise the leading -k by m part of the array A must contain the matrix A. -.TP 8 -LDA (local input) const int -On entry, LDA specifies the first dimension of A as declared -in the calling (sub) program.
When TRANSA==HplNoTrans then -LDA must be at least max(1,m), otherwise LDA must be at least -max(1,k). -.TP 8 -B (local input) const double * -On entry, B is an array of dimension (LDB,kb), where kb is -n when TRANSB==HplNoTrans, and is k otherwise. Before -entry with TRANSB==HplNoTrans, the leading k by n part of -the array B must contain the matrix B, otherwise the leading -n by k part of the array B must contain the matrix B. -.TP 8 -LDB (local input) const int -On entry, LDB specifies the first dimension of B as declared -in the calling (sub) program. When TRANSB==HplNoTrans then -LDB must be at least max(1,k), otherwise LDB must be at least -max(1,n). -.TP 8 -BETA (local input) const double -On entry, BETA specifies the scalar beta. When BETA is -supplied as zero then the elements of the matrix C need -not be set on input. -.TP 8 -C (local input/output) double * -On entry, C is an array of dimension (LDC,n). Before entry, -the leading m by n part of the array C must contain the -matrix C, except when beta is zero, in which case C need not -be set on entry. On exit, the array C is overwritten by the -m by n matrix ( alpha*op( A )*op( B ) + beta*C ). -.TP 8 -LDC (local input) const int -On entry, LDC specifies the first dimension of C as declared -in the calling (sub) program. LDC must be at least -max(1,m). -.SH EXAMPLE -\fI\&#include "hpl.h"\fR - -int main(int argc, char *argv[]) -.br -{ -.br - double a[2*2], b[2*2], c[2*2]; -.br - a[0] = 1.0; a[1] = 2.0; a[2] = 3.0; a[3] = 3.0; -.br - b[0] = 2.0; b[1] = 1.0; b[2] = 1.0; b[3] = 2.0; -.br - c[0] = 4.0; c[1] = 3.0; c[2] = 2.0; c[3] = 1.0; -.br - HPL_dgemm( HplColumnMajor, HplNoTrans, HplNoTrans, -.br - 2, 2, 2, 2.0, a, 2, b, 2, -1.0, c, 2 ); -.br - printf(" [%f,%f]\en", c[0], c[2]); -.br - printf("c=[%f,%f]\en", c[1], c[3]); -.br - exit(0); return(0); -.br -} -.SH SEE ALSO -.BR HPL_dtrsm \ (3). 
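To check by hand what the HPL_dgemm example above should print, the operation C := alpha * A * B + beta * C can be written as a plain triple loop. The following is only a minimal sketch for the column-major, no-transpose case; `ref_dgemm_nn` is a hypothetical helper introduced here for illustration and is not part of HPL or of its BLAS wrappers.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical reference helper (not an HPL routine).
 * Computes C := alpha * A * B + beta * C for column-major storage,
 * with neither A nor B transposed. Element (i,j) of a matrix with
 * leading dimension ld is stored at index i + j * ld.
 */
void ref_dgemm_nn(int m, int n, int k, double alpha,
                  const double *a, int lda,
                  const double *b, int ldb,
                  double beta, double *c, int ldc)
{
    /* Same argument constraints as the man page states. */
    assert(m >= 0 && n >= 0 && k >= 0);
    assert(lda >= (m > 1 ? m : 1));
    assert(ldb >= (k > 1 ? k : 1));
    assert(ldc >= (m > 1 ? m : 1));

    for (int j = 0; j < n; j++) {
        for (int i = 0; i < m; i++) {
            double sum = 0.0;
            for (int l = 0; l < k; l++)
                sum += a[i + (size_t)l * lda] * b[l + (size_t)j * ldb];
            /* When beta is zero, the old content of C is not read. */
            double old = (beta == 0.0) ? 0.0 : beta * c[i + (size_t)j * ldc];
            c[i + (size_t)j * ldc] = alpha * sum + old;
        }
    }
}
```

Called with the data of the example (a = {1,2,3,3}, b = {2,1,1,2}, c = {4,3,2,1}, alpha = 2.0, beta = -1.0, all leading dimensions 2), this sketch leaves c = {6, 11, 12, 15}, that is, the matrix with rows [6, 12] and [11, 15].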
diff --git a/hpl/man/man3/HPL_dgemv.3 b/hpl/man/man3/HPL_dgemv.3 deleted file mode 100644 index 69407664d66ba6dea43fdfa7ddab02e437585b93..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_dgemv.3 +++ /dev/null @@ -1,128 +0,0 @@ -.TH HPL_dgemv 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_dgemv \- y := beta * y + alpha * op(A) * x. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_dgemv(\fR -\fB\&const enum HPL_ORDER\fR -\fI\&ORDER\fR, -\fB\&const enum HPL_TRANS\fR -\fI\&TRANS\fR, -\fB\&const int\fR -\fI\&M\fR, -\fB\&const int\fR -\fI\&N\fR, -\fB\&const double\fR -\fI\&ALPHA\fR, -\fB\&const double *\fR -\fI\&A\fR, -\fB\&const int\fR -\fI\&LDA\fR, -\fB\&const double *\fR -\fI\&X\fR, -\fB\&const int\fR -\fI\&INCX\fR, -\fB\&const double\fR -\fI\&BETA\fR, -\fB\&double *\fR -\fI\&Y\fR, -\fB\&const int\fR -\fI\&INCY\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_dgemv\fR -performs one of the matrix-vector operations - - y := alpha * op( A ) * x + beta * y, - - where op( X ) is one of - - op( X ) = X or op( X ) = X^T. - -where alpha and beta are scalars, x and y are vectors and A is an m -by n matrix. -.SH ARGUMENTS -.TP 8 -ORDER (local input) const enum HPL_ORDER -On entry, ORDER specifies the storage format of the operands -as follows: - ORDER = HplRowMajor, - ORDER = HplColumnMajor. -.TP 8 -TRANS (local input) const enum HPL_TRANS -On entry, TRANS specifies the operation to be performed as -follows: - TRANS = HplNoTrans y := alpha*A *x + beta*y, - TRANS = HplTrans y := alpha*A^T*x + beta*y. -.TP 8 -M (local input) const int -On entry, M specifies the number of rows of the matrix A. -M must be at least zero. -.TP 8 -N (local input) const int -On entry, N specifies the number of columns of the matrix A. -N must be at least zero. -.TP 8 -ALPHA (local input) const double -On entry, ALPHA specifies the scalar alpha. When ALPHA is -supplied as zero then A and X need not be set on input. 
-.TP 8 -A (local input) const double * -On entry, A points to an array of size equal to or greater -than LDA * n. Before entry, the leading m by n part of the -array A must contain the matrix coefficients. -.TP 8 -LDA (local input) const int -On entry, LDA specifies the leading dimension of A as -declared in the calling (sub) program. LDA must be at -least MAX(1,m). -.TP 8 -X (local input) const double * -On entry, X is an incremented array of dimension at least -( 1 + ( n - 1 ) * abs( INCX ) ) that contains the vector x. -.TP 8 -INCX (local input) const int -On entry, INCX specifies the increment for the elements of X. -INCX must not be zero. -.TP 8 -BETA (local input) const double -On entry, BETA specifies the scalar beta. When BETA is -supplied as zero then Y need not be set on input. -.TP 8 -Y (local input/output) double * -On entry, Y is an incremented array of dimension at least -( 1 + ( n - 1 ) * abs( INCY ) ) that contains the vector y. -Before entry with BETA non-zero, the incremented array Y must -contain the vector y. On exit, Y is overwritten by the -updated vector y. -.TP 8 -INCY (local input) const int -On entry, INCY specifies the increment for the elements of Y. -INCY must not be zero. -.SH EXAMPLE -\fI\&#include "hpl.h"\fR - -int main(int argc, char *argv[]) -.br -{ -.br - double a[2*2], x[2], y[2]; -.br - a[0] = 1.0; a[1] = 2.0; a[2] = 3.0; a[3] = 3.0; -.br - x[0] = 2.0; x[1] = 1.0; y[0] = 1.0; y[1] = 2.0; -.br - HPL_dgemv( HplColumnMajor, HplNoTrans, 2, 2, 2.0, -.br - a, 2, x, 1, -1.0, y, 1 ); -.br - printf("y=[%f,%f]\en", y[0], y[1]); -.br - exit(0); return(0); -.br -} -.SH SEE ALSO -.BR HPL_dger \ (3), -.BR HPL_dtrsv \ (3).
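The matrix-vector update y := alpha * A * x + beta * y documented for HPL_dgemv can likewise be spelled out as two nested loops. This is a minimal sketch for the column-major, no-transpose case with unit increments only; `ref_dgemv_n` is a hypothetical name chosen for illustration and is not an HPL function.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical reference helper (not an HPL routine).
 * Computes y := alpha * A * x + beta * y where A is an m by n
 * column-major matrix with leading dimension lda, and x and y are
 * stored contiguously (INCX = INCY = 1 is assumed).
 */
void ref_dgemv_n(int m, int n, double alpha,
                 const double *a, int lda,
                 const double *x, double beta, double *y)
{
    assert(m >= 0 && n >= 0);
    assert(lda >= (m > 1 ? m : 1));

    for (int i = 0; i < m; i++) {
        double sum = 0.0;
        for (int j = 0; j < n; j++)
            sum += a[i + (size_t)j * lda] * x[j];
        /* When beta is zero, the old content of y is not read. */
        y[i] = alpha * sum + ((beta == 0.0) ? 0.0 : beta * y[i]);
    }
}
```

Assuming the inputs a = {1,2,3,3}, x = {2,1}, y = {1,2}, alpha = 2.0 and beta = -1.0, this sketch yields y = {9, 12}: A * x is (5, 7), scaled to (10, 14), minus the original y.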
diff --git a/hpl/man/man3/HPL_dger.3 b/hpl/man/man3/HPL_dger.3 deleted file mode 100644 index 18d60138238d3d6d19068e9e50b2ca4fbe697a49..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_dger.3 +++ /dev/null @@ -1,108 +0,0 @@ -.TH HPL_dger 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_dger \- A := alpha * x * y^T + A. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_dger(\fR -\fB\&const enum HPL_ORDER\fR -\fI\&ORDER\fR, -\fB\&const int\fR -\fI\&M\fR, -\fB\&const int\fR -\fI\&N\fR, -\fB\&const double\fR -\fI\&ALPHA\fR, -\fB\&const double *\fR -\fI\&X\fR, -\fB\&const int\fR -\fI\&INCX\fR, -\fB\&double *\fR -\fI\&Y\fR, -\fB\&const int\fR -\fI\&INCY\fR, -\fB\&double *\fR -\fI\&A\fR, -\fB\&const int\fR -\fI\&LDA\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_dger\fR -performs the rank 1 operation - - A := alpha * x * y^T + A, - -where alpha is a scalar, x is an m-element vector, y is an n-element -vector and A is an m by n matrix. -.SH ARGUMENTS -.TP 8 -ORDER (local input) const enum HPL_ORDER -On entry, ORDER specifies the storage format of the operands -as follows: - ORDER = HplRowMajor, - ORDER = HplColumnMajor. -.TP 8 -M (local input) const int -On entry, M specifies the number of rows of the matrix A. -M must be at least zero. -.TP 8 -N (local input) const int -On entry, N specifies the number of columns of the matrix A. -N must be at least zero. -.TP 8 -ALPHA (local input) const double -On entry, ALPHA specifies the scalar alpha. When ALPHA is -supplied as zero then X and Y need not be set on input. -.TP 8 -X (local input) const double * -On entry, X is an incremented array of dimension at least -( 1 + ( m - 1 ) * abs( INCX ) ) that contains the vector x. -.TP 8 -INCX (local input) const int -On entry, INCX specifies the increment for the elements of X. -INCX must not be zero. 
-.TP 8 -Y (local input) double * -On entry, Y is an incremented array of dimension at least -( 1 + ( n - 1 ) * abs( INCY ) ) that contains the vector y. -.TP 8 -INCY (local input) const int -On entry, INCY specifies the increment for the elements of Y. -INCY must not be zero. -.TP 8 -A (local input/output) double * -On entry, A points to an array of size equal to or greater -than LDA * n. Before entry, the leading m by n part of the -array A must contain the matrix coefficients. On exit, A is -overwritten by the updated matrix. -.TP 8 -LDA (local input) const int -On entry, LDA specifies the leading dimension of A as -declared in the calling (sub) program. LDA must be at -least MAX(1,m). -.SH EXAMPLE -\fI\&#include "hpl.h"\fR - -int main(int argc, char *argv[]) -.br -{ -.br - double a[2*2], x[2], y[2]; -.br - a[0] = 1.0; a[1] = 2.0; a[2] = 3.0; a[3] = 3.0; -.br - x[0] = 2.0; x[1] = 1.0; y[0] = 1.0; y[1] = 2.0; -.br - HPL_dger( HplColumnMajor, 2, 2, 2.0, x, 1, y, 1, -.br - a, 2 ); -.br - printf("y=[%f,%f]\en", y[0], y[1]); -.br - exit(0); return(0); -.br -} -.SH SEE ALSO -.BR HPL_dgemv \ (3), -.BR HPL_dtrsv \ (3). diff --git a/hpl/man/man3/HPL_dlacpy.3 b/hpl/man/man3/HPL_dlacpy.3 deleted file mode 100644 index f7b7386645def46e92cc8a60b0223e99260bcedb..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_dlacpy.3 +++ /dev/null @@ -1,72 +0,0 @@ -.TH HPL_dlacpy 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_dlacpy \- B := A. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_dlacpy(\fR -\fB\&const int\fR -\fI\&M\fR, -\fB\&const int\fR -\fI\&N\fR, -\fB\&const double *\fR -\fI\&A\fR, -\fB\&const int\fR -\fI\&LDA\fR, -\fB\&double *\fR -\fI\&B\fR, -\fB\&const int\fR -\fI\&LDB\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_dlacpy\fR -copies an array A into an array B. -.SH ARGUMENTS -.TP 8 -M (local input) const int -On entry, M specifies the number of rows of the arrays A and -B. M must be at least zero.
-.TP 8 -N (local input) const int -On entry, N specifies the number of columns of the arrays A -and B. N must be at least zero. -.TP 8 -A (local input) const double * -On entry, A points to an array of dimension (LDA,N). -.TP 8 -LDA (local input) const int -On entry, LDA specifies the leading dimension of the array A. -LDA must be at least MAX(1,M). -.TP 8 -B (local output) double * -On entry, B points to an array of dimension (LDB,N). On exit, -B is overwritten with A. -.TP 8 -LDB (local input) const int -On entry, LDB specifies the leading dimension of the array B. -LDB must be at least MAX(1,M). -.SH EXAMPLE -\fI\&#include "hpl.h"\fR - -int main(int argc, char *argv[]) -.br -{ -.br - double a[2*2], b[2*2]; -.br - a[0] = 1.0; a[1] = 3.0; a[2] = 2.0; a[3] = 4.0; -.br - HPL_dlacpy( 2, 2, a, 2, b, 2 ); -.br - printf(" [%f,%f]\en", b[0], b[2]); -.br - printf("b=[%f,%f]\en", b[1], b[3]); -.br - exit(0); -.br - return(0); -.br -} -.SH SEE ALSO -.BR HPL_dlatcpy \ (3). diff --git a/hpl/man/man3/HPL_dlamch.3 b/hpl/man/man3/HPL_dlamch.3 deleted file mode 100644 index c702bab4cb86fa14cbda0f4617e86165bce33ab8..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_dlamch.3 +++ /dev/null @@ -1,76 +0,0 @@ -.TH HPL_dlamch 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_dlamch \- determines machine-specific arithmetic constants. 
-.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&double\fR -\fB\&HPL_dlamch(\fR -\fB\&const HPL_T_MACH\fR -\fI\&CMACH\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_dlamch\fR -determines machine-specific arithmetic constants such as -the relative machine precision (eps), the safe minimum (sfmin) such -that 1 / sfmin does not overflow, the base of the machine (base), the -precision (prec), the number of (base) digits in the mantissa (t), -whether rounding occurs in addition (rnd=1.0 and 0.0 otherwise), the -minimum exponent before (gradual) underflow (emin), the underflow -threshold (rmin) base**(emin-1), the largest exponent before overflow -(emax), the overflow threshold (rmax) (base**emax)*(1-eps). -.SH ARGUMENTS -.TP 8 -CMACH (local input) const HPL_T_MACH -Specifies the value to be returned by HPL_dlamch - = HPL_MACH_EPS, HPL_dlamch := eps (default) - = HPL_MACH_SFMIN, HPL_dlamch := sfmin - = HPL_MACH_BASE, HPL_dlamch := base - = HPL_MACH_PREC, HPL_dlamch := eps*base - = HPL_MACH_MLEN, HPL_dlamch := t - = HPL_MACH_RND, HPL_dlamch := rnd - = HPL_MACH_EMIN, HPL_dlamch := emin - = HPL_MACH_RMIN, HPL_dlamch := rmin - = HPL_MACH_EMAX, HPL_dlamch := emax - = HPL_MACH_RMAX, HPL_dlamch := rmax - -where - - eps = relative machine precision, - sfmin = safe minimum, - base = base of the machine, - prec = eps*base, - t = number of digits in the mantissa, - rnd = 1.0 if rounding occurs in addition, - emin = minimum exponent before underflow, - rmin = underflow threshold, - emax = largest exponent before overflow, - rmax = overflow threshold. 
-.SH EXAMPLE -\fI\&#include "hpl.h"\fR - -int main(int argc, char *argv[]) -.br -{ -.br - double eps; -.br - eps = HPL_dlamch( HPL_MACH_EPS ); -.br - printf("eps=%18.8e\en", eps); -.br - exit(0); return(0); -.br -} -.SH REFERENCES -This function has been manually translated from the Fortran 77 LAPACK -auxiliary function dlamch.f (version 2.0 -- 1992), that was itself -based on the function ENVRON by Malcolm and incorporated suggestions -by Gentleman and Marovich. See - -Malcolm M. A., Algorithms to reveal properties of floating-point -arithmetic., Comms. of the ACM, 15, 949-951 (1972). - -Gentleman W. M. and Marovich S. B., More on algorithms that reveal -properties of floating point arithmetic units., Comms. of the ACM, -17, 276-277 (1974). diff --git a/hpl/man/man3/HPL_dlange.3 b/hpl/man/man3/HPL_dlange.3 deleted file mode 100644 index f71d0bd26631cd470afaadbc2dc8ab045ec0ef27..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_dlange.3 +++ /dev/null @@ -1,73 +0,0 @@ -.TH HPL_dlange 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_dlange \- Compute ||A||. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&double\fR -\fB\&HPL_dlange(\fR -\fB\&const HPL_T_NORM\fR -\fI\&NORM\fR, -\fB\&const int\fR -\fI\&M\fR, -\fB\&const int\fR -\fI\&N\fR, -\fB\&const double *\fR -\fI\&A\fR, -\fB\&const int\fR -\fI\&LDA\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_dlange\fR -returns the value of the one norm, or the infinity norm, -or the element of largest absolute value of a matrix A: - - max(abs(A(i,j))) when NORM = HPL_NORM_A, - norm1(A), when NORM = HPL_NORM_1, - normI(A), when NORM = HPL_NORM_I, - -where norm1 denotes the one norm of a matrix (maximum column sum) and -normI denotes the infinity norm of a matrix (maximum row sum). Note -that max(abs(A(i,j))) is not a matrix norm. -.SH ARGUMENTS -.TP 8 -NORM (local input) const HPL_T_NORM -On entry, NORM specifies the value to be returned by this -function as described above. 
-.TP 8 -M (local input) const int -On entry, M specifies the number of rows of the matrix A. -M must be at least zero. -.TP 8 -N (local input) const int -On entry, N specifies the number of columns of the matrix A. -N must be at least zero. -.TP 8 -A (local input) const double * -On entry, A points to an array of dimension (LDA,N), that -contains the matrix A. -.TP 8 -LDA (local input) const int -On entry, LDA specifies the leading dimension of the array A. -LDA must be at least max(1,M). -.SH EXAMPLE -\fI\&#include "hpl.h"\fR - -int main(int argc, char *argv[]) -.br -{ -.br - double a[2*2], norm; -.br - a[0] = 1.0; a[1] = 3.0; a[2] = 2.0; a[3] = 4.0; -.br - norm = HPL_dlange( HPL_NORM_I, 2, 2, a, 2 ); -.br - printf("norm=%f\en", norm); -.br - exit(0); return(0); -.br -} -.SH SEE ALSO -.BR HPL_dlaprnt \ (3), -.BR HPL_fprintf \ (3). diff --git a/hpl/man/man3/HPL_dlaprnt.3 b/hpl/man/man3/HPL_dlaprnt.3 deleted file mode 100644 index 4e8dd4be318c832defa7d2371476696835d3c76a..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_dlaprnt.3 +++ /dev/null @@ -1,70 +0,0 @@ -.TH HPL_dlaprnt 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_dlaprnt \- Print the matrix A. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_dlaprnt(\fR -\fB\&const int\fR -\fI\&M\fR, -\fB\&const int\fR -\fI\&N\fR, -\fB\&double *\fR -\fI\&A\fR, -\fB\&const int\fR -\fI\&IA\fR, -\fB\&const int\fR -\fI\&JA\fR, -\fB\&const int\fR -\fI\&LDA\fR, -\fB\&const char *\fR -\fI\&CMATNM\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_dlaprnt\fR -prints to standard error an M-by-N matrix A. -.SH ARGUMENTS -.TP 8 -M (local input) const int -On entry, M specifies the number of rows of A. M must be at -least zero. -.TP 8 -N (local input) const int -On entry, N specifies the number of columns of A. N must be -at least zero. -.TP 8 -A (local input) double * -On entry, A points to an array of dimension (LDA,N).
-.TP 8 -IA (local input) const int -On entry, IA specifies the starting row index to be printed. -.TP 8 -JA (local input) const int -On entry, JA specifies the starting column index to be -printed. -.TP 8 -LDA (local input) const int -On entry, LDA specifies the leading dimension of the array A. -LDA must be at least max(1,M). -.TP 8 -CMATNM (local input) const char * -On entry, CMATNM is the name of the matrix to be printed. -.SH EXAMPLE -\fI\&#include "hpl.h"\fR - -int main(int argc, char *argv[]) -.br -{ -.br - double a[2*2]; -.br - a[0] = 1.0; a[1] = 3.0; a[2] = 2.0; a[3] = 4.0; -.br - HPL_dlaprnt( 2, 2, a, 0, 0, 2, "A" ); -.br - exit(0); return(0); -.br -} -.SH SEE ALSO -.BR HPL_fprintf \ (3). diff --git a/hpl/man/man3/HPL_dlaswp00N.3 b/hpl/man/man3/HPL_dlaswp00N.3 deleted file mode 100644 index b3b2d0340e70dde1c63fdec9f3b6be64978e5bd3..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_dlaswp00N.3 +++ /dev/null @@ -1,60 +0,0 @@ -.TH HPL_dlaswp00N 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_dlaswp00N \- performs a series of row interchanges. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_dlaswp00N(\fR -\fB\&const int\fR -\fI\&M\fR, -\fB\&const int\fR -\fI\&N\fR, -\fB\&double *\fR -\fI\&A\fR, -\fB\&const int\fR -\fI\&LDA\fR, -\fB\&const int *\fR -\fI\&IPIV\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_dlaswp00N\fR -performs a series of local row interchanges on a matrix -A. One row interchange is initiated for rows 0 through M-1 of A. -.SH ARGUMENTS -.TP 8 -M (local input) const int -On entry, M specifies the number of rows of the array A to be -interchanged. M must be at least zero. -.TP 8 -N (local input) const int -On entry, N specifies the number of columns of the array A. -N must be at least zero. -.TP 8 -A (local input/output) double * -On entry, A points to an array of dimension (LDA,N) to which -the row interchanges will be applied. On exit, the permuted -matrix. 
-.TP 8 -LDA (local input) const int -On entry, LDA specifies the leading dimension of the array A. -LDA must be at least MAX(1,M). -.TP 8 -IPIV (local input) const int * -On entry, IPIV is an array of size M that contains the -pivoting information. For k in [0..M), IPIV[k]=IROFF + l -implies that local rows k and l are to be interchanged. -.SH SEE ALSO -.BR HPL_dlaswp00N \ (3), -.BR HPL_dlaswp10N \ (3), -.BR HPL_dlaswp01N \ (3), -.BR HPL_dlaswp01T \ (3), -.BR HPL_dlaswp02N \ (3), -.BR HPL_dlaswp03N \ (3), -.BR HPL_dlaswp03T \ (3), -.BR HPL_dlaswp04N \ (3), -.BR HPL_dlaswp04T \ (3), -.BR HPL_dlaswp05N \ (3), -.BR HPL_dlaswp05T \ (3), -.BR HPL_dlaswp06N \ (3), -.BR HPL_dlaswp06T \ (3). diff --git a/hpl/man/man3/HPL_dlaswp01N.3 b/hpl/man/man3/HPL_dlaswp01N.3 deleted file mode 100644 index 07050294e089d608a835e5d66bfbbd8917c5a8ec..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_dlaswp01N.3 +++ /dev/null @@ -1,88 +0,0 @@ -.TH HPL_dlaswp01N 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_dlaswp01N \- copies rows of A into itself and into U. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_dlaswp01N(\fR -\fB\&const int\fR -\fI\&M\fR, -\fB\&const int\fR -\fI\&N\fR, -\fB\&double *\fR -\fI\&A\fR, -\fB\&const int\fR -\fI\&LDA\fR, -\fB\&double *\fR -\fI\&U\fR, -\fB\&const int\fR -\fI\&LDU\fR, -\fB\&const int *\fR -\fI\&LINDXA\fR, -\fB\&const int *\fR -\fI\&LINDXAU\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_dlaswp01N\fR -copies scattered rows of A into itself and into an -array U. The row offsets in A of the source rows are specified by -LINDXA. The destinations of those rows are specified by LINDXAU. A -positive value of LINDXAU indicates that the array destination is U, -and A otherwise. -.SH ARGUMENTS -.TP 8 -M (local input) const int -On entry, M specifies the number of rows of A that should be -moved within A or copied into U. M must be at least zero.
-.TP 8 -N (local input) const int -On entry, N specifies the length of rows of A that should be -moved within A or copied into U. N must be at least zero. -.TP 8 -A (local input/output) double * -On entry, A points to an array of dimension (LDA,N). The rows -of this array specified by LINDXA should be moved within A or -copied into U. -.TP 8 -LDA (local input) const int -On entry, LDA specifies the leading dimension of the array A. -LDA must be at least MAX(1,M). -.TP 8 -U (local input/output) double * -On entry, U points to an array of dimension (LDU,N). The rows -of A specified by LINDXA are copied within this array U at -the positions indicated by positive values of LINDXAU. -.TP 8 -LDU (local input) const int -On entry, LDU specifies the leading dimension of the array U. -LDU must be at least MAX(1,M). -.TP 8 -LINDXA (local input) const int * -On entry, LINDXA is an array of dimension M that contains the -local row indexes of A that should be moved within A -or copied into U. -.TP 8 -LINDXAU (local input) const int * -On entry, LINDXAU is an array of dimension M that contains -the local row indexes of U where the rows of A should be -copied. This array also contains the local row offsets in -A where some of the rows of A should be moved to. A positive -value of LINDXAU[i] indicates that the row LINDXA[i] of A -should be copied into U at the position LINDXAU[i]; otherwise -the row LINDXA[i] of A should be moved to the position --LINDXAU[i] within A. -.SH SEE ALSO -.BR HPL_dlaswp00N \ (3), -.BR HPL_dlaswp10N \ (3), -.BR HPL_dlaswp01N \ (3), -.BR HPL_dlaswp01T \ (3), -.BR HPL_dlaswp02N \ (3), -.BR HPL_dlaswp03N \ (3), -.BR HPL_dlaswp03T \ (3), -.BR HPL_dlaswp04N \ (3), -.BR HPL_dlaswp04T \ (3), -.BR HPL_dlaswp05N \ (3), -.BR HPL_dlaswp05T \ (3), -.BR HPL_dlaswp06N \ (3), -.BR HPL_dlaswp06T \ (3).
diff --git a/hpl/man/man3/HPL_dlaswp01T.3 b/hpl/man/man3/HPL_dlaswp01T.3 deleted file mode 100644 index dbebf259f59415aff1e55e9738510ad6409c0064..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_dlaswp01T.3 +++ /dev/null @@ -1,89 +0,0 @@ -.TH HPL_dlaswp01T 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_dlaswp01T \- copies rows of A into itself and into U. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_dlaswp01T(\fR -\fB\&const int\fR -\fI\&M\fR, -\fB\&const int\fR -\fI\&N\fR, -\fB\&double *\fR -\fI\&A\fR, -\fB\&const int\fR -\fI\&LDA\fR, -\fB\&double *\fR -\fI\&U\fR, -\fB\&const int\fR -\fI\&LDU\fR, -\fB\&const int *\fR -\fI\&LINDXA\fR, -\fB\&const int *\fR -\fI\&LINDXAU\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_dlaswp01T\fR -copies scattered rows of A into itself and into an -array U. The row offsets in A of the source rows are specified by -LINDXA. The destinations of those rows are specified by LINDXAU. A -positive value of LINDXAU indicates that the array destination is U, -and A otherwise. Rows of A are stored as columns in U. -.SH ARGUMENTS -.TP 8 -M (local input) const int -On entry, M specifies the number of rows of A that should be -moved within A or copied into U. M must be at least zero. -.TP 8 -N (local input) const int -On entry, N specifies the length of rows of A that should be -moved within A or copied into U. N must be at least zero. -.TP 8 -A (local input/output) double * -On entry, A points to an array of dimension (LDA,N). The rows -of this array specified by LINDXA should be moved within A or -copied into U. -.TP 8 -LDA (local input) const int -On entry, LDA specifies the leading dimension of the array A. -LDA must be at least MAX(1,M). -.TP 8 -U (local input/output) double * -On entry, U points to an array of dimension (LDU,M). The rows -of A specified by LINDXA are copied within this array U at -the positions indicated by positive values of LINDXAU.
The -rows of A are stored as columns in U. -.TP 8 -LDU (local input) const int -On entry, LDU specifies the leading dimension of the array U. -LDU must be at least MAX(1,N). -.TP 8 -LINDXA (local input) const int * -On entry, LINDXA is an array of dimension M that contains the -local row indexes of A that should be moved within A -or copied into U. -.TP 8 -LINDXAU (local input) const int * -On entry, LINDXAU is an array of dimension M that contains -the local row indexes of U where the rows of A should be -copied. This array also contains the local row offsets in -A where some of the rows of A should be moved to. A positive -value of LINDXAU[i] indicates that the row LINDXA[i] of A -should be copied into U at the position LINDXAU[i]; otherwise -the row LINDXA[i] of A should be moved to the position --LINDXAU[i] within A. -.SH SEE ALSO -.BR HPL_dlaswp00N \ (3), -.BR HPL_dlaswp10N \ (3), -.BR HPL_dlaswp01N \ (3), -.BR HPL_dlaswp01T \ (3), -.BR HPL_dlaswp02N \ (3), -.BR HPL_dlaswp03N \ (3), -.BR HPL_dlaswp03T \ (3), -.BR HPL_dlaswp04N \ (3), -.BR HPL_dlaswp04T \ (3), -.BR HPL_dlaswp05N \ (3), -.BR HPL_dlaswp05T \ (3), -.BR HPL_dlaswp06N \ (3), -.BR HPL_dlaswp06T \ (3). diff --git a/hpl/man/man3/HPL_dlaswp02N.3 b/hpl/man/man3/HPL_dlaswp02N.3 deleted file mode 100644 index d4725ad7e09f6601fc3acca7c4261323d5bfebca..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_dlaswp02N.3 +++ /dev/null @@ -1,85 +0,0 @@ -.TH HPL_dlaswp02N 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_dlaswp02N \- pack rows of A into columns of W.
-.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_dlaswp02N(\fR -\fB\&const int\fR -\fI\&M\fR, -\fB\&const int\fR -\fI\&N\fR, -\fB\&const double *\fR -\fI\&A\fR, -\fB\&const int\fR -\fI\&LDA\fR, -\fB\&double *\fR -\fI\&W0\fR, -\fB\&double *\fR -\fI\&W\fR, -\fB\&const int\fR -\fI\&LDW\fR, -\fB\&const int *\fR -\fI\&LINDXA\fR, -\fB\&const int *\fR -\fI\&LINDXAU\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_dlaswp02N\fR -packs scattered rows of an array A into workspace W. -The row offsets in A are specified by LINDXA. -.SH ARGUMENTS -.TP 8 -M (local input) const int -On entry, M specifies the number of rows of A that should be -copied into W. M must be at least zero. -.TP 8 -N (local input) const int -On entry, N specifies the length of rows of A that should be -copied into W. N must be at least zero. -.TP 8 -A (local input) const double * -On entry, A points to an array of dimension (LDA,N). The rows -of this array specified by LINDXA should be copied into W. -.TP 8 -LDA (local input) const int -On entry, LDA specifies the leading dimension of the array A. -LDA must be at least MAX(1,M). -.TP 8 -W0 (local input/output) double * -On exit, W0 is an array of size (M-1)*LDW+1, that contains -the destination offset in U where the columns of W should be -copied. -.TP 8 -W (local output) double * -On entry, W is an array of size (LDW,M). On exit, W contains -the rows LINDXA[i] for i in [0..M) of A stored contiguously -in W(:,i). -.TP 8 -LDW (local input) const int -On entry, LDW specifies the leading dimension of the array W. -LDW must be at least MAX(1,N+1). -.TP 8 -LINDXA (local input) const int * -On entry, LINDXA is an array of dimension M that contains the -local row indexes of A that should be copied into W. -.TP 8 -LINDXAU (local input) const int * -On entry, LINDXAU is an array of dimension M that contains -the local row indexes of U that should be copied into A and -replaced by the rows of W. 
-.SH SEE ALSO -.BR HPL_dlaswp00N \ (3), -.BR HPL_dlaswp10N \ (3), -.BR HPL_dlaswp01N \ (3), -.BR HPL_dlaswp01T \ (3), -.BR HPL_dlaswp02N \ (3), -.BR HPL_dlaswp03N \ (3), -.BR HPL_dlaswp03T \ (3), -.BR HPL_dlaswp04N \ (3), -.BR HPL_dlaswp04T \ (3), -.BR HPL_dlaswp05N \ (3), -.BR HPL_dlaswp05T \ (3), -.BR HPL_dlaswp06N \ (3), -.BR HPL_dlaswp06T \ (3). diff --git a/hpl/man/man3/HPL_dlaswp03N.3 b/hpl/man/man3/HPL_dlaswp03N.3 deleted file mode 100644 index 0a9b016c21dde31b8e3a9cea89b634ed89c5ca05..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_dlaswp03N.3 +++ /dev/null @@ -1,75 +0,0 @@ -.TH HPL_dlaswp03N 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_dlaswp03N \- copy rows of W into U. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_dlaswp03N(\fR -\fB\&const int\fR -\fI\&M\fR, -\fB\&const int\fR -\fI\&N\fR, -\fB\&double *\fR -\fI\&U\fR, -\fB\&const int\fR -\fI\&LDU\fR, -\fB\&const double *\fR -\fI\&W0\fR, -\fB\&const double *\fR -\fI\&W\fR, -\fB\&const int\fR -\fI\&LDW\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_dlaswp03N\fR -copies columns of W into rows of an array U. The -destination in U of these columns contained in W is stored within W0. -.SH ARGUMENTS -.TP 8 -M (local input) const int -On entry, M specifies the number of columns of W stored -contiguously that should be copied into U. M must be at least -zero. -.TP 8 -N (local input) const int -On entry, N specifies the length of columns of W stored -contiguously that should be copied into U. N must be at least -zero. -.TP 8 -U (local input/output) double * -On entry, U points to an array of dimension (LDU,N). Columns -of W are copied as rows within this array U at the positions -specified in W0. -.TP 8 -LDU (local input) const int -On entry, LDU specifies the leading dimension of the array U. -LDU must be at least MAX(1,M). 
-.TP 8 -W0 (local input) const double * -On entry, W0 is an array of size (M-1)*LDW+1, that contains -the destination offset in U where the columns of W should be -copied. -.TP 8 -W (local input) const double * -On entry, W is an array of size (LDW,M), that contains data -to be copied into U. For i in [0..M), entries W(:,i) should -be copied into the row or column W0(i*LDW) of U. -.TP 8 -LDW (local input) const int -On entry, LDW specifies the leading dimension of the array W. -LDW must be at least MAX(1,N+1). -.SH SEE ALSO -.BR HPL_dlaswp00N \ (3), -.BR HPL_dlaswp10N \ (3), -.BR HPL_dlaswp01N \ (3), -.BR HPL_dlaswp01T \ (3), -.BR HPL_dlaswp02N \ (3), -.BR HPL_dlaswp03N \ (3), -.BR HPL_dlaswp03T \ (3), -.BR HPL_dlaswp04N \ (3), -.BR HPL_dlaswp04T \ (3), -.BR HPL_dlaswp05N \ (3), -.BR HPL_dlaswp05T \ (3), -.BR HPL_dlaswp06N \ (3), -.BR HPL_dlaswp06T \ (3). diff --git a/hpl/man/man3/HPL_dlaswp03T.3 b/hpl/man/man3/HPL_dlaswp03T.3 deleted file mode 100644 index f5d8482f013131790c12f6074c6e5844e25063fb..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_dlaswp03T.3 +++ /dev/null @@ -1,75 +0,0 @@ -.TH HPL_dlaswp03T 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_dlaswp03T \- copy columns of W into U. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_dlaswp03T(\fR -\fB\&const int\fR -\fI\&M\fR, -\fB\&const int\fR -\fI\&N\fR, -\fB\&double *\fR -\fI\&U\fR, -\fB\&const int\fR -\fI\&LDU\fR, -\fB\&const double *\fR -\fI\&W0\fR, -\fB\&const double *\fR -\fI\&W\fR, -\fB\&const int\fR -\fI\&LDW\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_dlaswp03T\fR -copies columns of W into an array U. The destination -in U of these columns contained in W is stored within W0. -.SH ARGUMENTS -.TP 8 -M (local input) const int -On entry, M specifies the number of columns of W stored -contiguously that should be copied into U. M must be at least -zero. 
-.TP 8 -N (local input) const int -On entry, N specifies the length of columns of W stored -contiguously that should be copied into U. N must be at least -zero. -.TP 8 -U (local input/output) double * -On entry, U points to an array of dimension (LDU,M). Columns -of W are copied within the array U at the positions specified -in W0. -.TP 8 -LDU (local input) const int -On entry, LDU specifies the leading dimension of the array U. -LDU must be at least MAX(1,N). -.TP 8 -W0 (local input) const double * -On entry, W0 is an array of size (M-1)*LDW+1, that contains -the destination offset in U where the columns of W should be -copied. -.TP 8 -W (local input) const double * -On entry, W is an array of size (LDW,M), that contains data -to be copied into U. For i in [0..M), entries W(:,i) should -be copied into the row or column W0(i*LDW) of U. -.TP 8 -LDW (local input) const int -On entry, LDW specifies the leading dimension of the array W. -LDW must be at least MAX(1,N+1). -.SH SEE ALSO -.BR HPL_dlaswp00N \ (3), -.BR HPL_dlaswp10N \ (3), -.BR HPL_dlaswp01N \ (3), -.BR HPL_dlaswp01T \ (3), -.BR HPL_dlaswp02N \ (3), -.BR HPL_dlaswp03N \ (3), -.BR HPL_dlaswp03T \ (3), -.BR HPL_dlaswp04N \ (3), -.BR HPL_dlaswp04T \ (3), -.BR HPL_dlaswp05N \ (3), -.BR HPL_dlaswp05T \ (3), -.BR HPL_dlaswp06N \ (3), -.BR HPL_dlaswp06T \ (3). diff --git a/hpl/man/man3/HPL_dlaswp04N.3 b/hpl/man/man3/HPL_dlaswp04N.3 deleted file mode 100644 index a0072dd419f8627c407a015eb1fcc9794f6aeed1..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_dlaswp04N.3 +++ /dev/null @@ -1,106 +0,0 @@ -.TH HPL_dlaswp04N 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_dlaswp04N \- copy rows of U in A and replace them with columns of W. 
-.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_dlaswp04N(\fR -\fB\&const int\fR -\fI\&M0\fR, -\fB\&const int\fR -\fI\&M1\fR, -\fB\&const int\fR -\fI\&N\fR, -\fB\&double *\fR -\fI\&U\fR, -\fB\&const int\fR -\fI\&LDU\fR, -\fB\&double *\fR -\fI\&A\fR, -\fB\&const int\fR -\fI\&LDA\fR, -\fB\&const double *\fR -\fI\&W0\fR, -\fB\&const double *\fR -\fI\&W\fR, -\fB\&const int\fR -\fI\&LDW\fR, -\fB\&const int *\fR -\fI\&LINDXA\fR, -\fB\&const int *\fR -\fI\&LINDXAU\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_dlaswp04N\fR -copies M0 rows of U into A and replaces those rows of U -with columns of W. In addition M1 - M0 columns of W are copied into -rows of U. -.SH ARGUMENTS -.TP 8 -M0 (local input) const int -On entry, M0 specifies the number of rows of U that should be -copied into A and replaced by columns of W. M0 must be at -least zero. -.TP 8 -M1 (local input) const int -On entry, M1 specifies the number of columns of W that should -be copied into rows of U. M1 must be at least zero. -.TP 8 -N (local input) const int -On entry, N specifies the length of the rows of U that should -be copied into A. N must be at least zero. -.TP 8 -U (local input/output) double * -On entry, U points to an array of dimension (LDU,N). This -array contains the rows that are to be copied into A. -.TP 8 -LDU (local input) const int -On entry, LDU specifies the leading dimension of the array U. -LDU must be at least MAX(1,M1). -.TP 8 -A (local output) double * -On entry, A points to an array of dimension (LDA,N). On exit, -the rows of this array specified by LINDXA are replaced by -rows of U indicated by LINDXAU. -.TP 8 -LDA (local input) const int -On entry, LDA specifies the leading dimension of the array A. -LDA must be at least MAX(1,M0). -.TP 8 -W0 (local input) const double * -On entry, W0 is an array of size (M-1)*LDW+1, that contains -the destination offset in U where the columns of W should be -copied. 
-.TP 8 -W (local input) const double * -On entry, W is an array of size (LDW,M0+M1), that contains -data to be copied into U. For i in [M0..M0+M1), the entries -W(:,i) are copied into the row W0(i*LDW) of U. -.TP 8 -LDW (local input) const int -On entry, LDW specifies the leading dimension of the array W. -LDW must be at least MAX(1,N+1). -.TP 8 -LINDXA (local input) const int * -On entry, LINDXA is an array of dimension M0 containing the -local row indexes A into which rows of U are copied. -.TP 8 -LINDXAU (local input) const int * -On entry, LINDXAU is an array of dimension M0 that contains -the local row indexes of U that should be copied into A and -replaced by the columns of W. -.SH SEE ALSO -.BR HPL_dlaswp00N \ (3), -.BR HPL_dlaswp10N \ (3), -.BR HPL_dlaswp01N \ (3), -.BR HPL_dlaswp01T \ (3), -.BR HPL_dlaswp02N \ (3), -.BR HPL_dlaswp03N \ (3), -.BR HPL_dlaswp03T \ (3), -.BR HPL_dlaswp04N \ (3), -.BR HPL_dlaswp04T \ (3), -.BR HPL_dlaswp05N \ (3), -.BR HPL_dlaswp05T \ (3), -.BR HPL_dlaswp06N \ (3), -.BR HPL_dlaswp06T \ (3). diff --git a/hpl/man/man3/HPL_dlaswp04T.3 b/hpl/man/man3/HPL_dlaswp04T.3 deleted file mode 100644 index 08fca50859b59d5ccf2f61e472cd7d9a0eb37b0f..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_dlaswp04T.3 +++ /dev/null @@ -1,107 +0,0 @@ -.TH HPL_dlaswp04T 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_dlaswp04T \- copy columns of U in rows of A and replace them with columns of W. 
-.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_dlaswp04T(\fR -\fB\&const int\fR -\fI\&M0\fR, -\fB\&const int\fR -\fI\&M1\fR, -\fB\&const int\fR -\fI\&N\fR, -\fB\&double *\fR -\fI\&U\fR, -\fB\&const int\fR -\fI\&LDU\fR, -\fB\&double *\fR -\fI\&A\fR, -\fB\&const int\fR -\fI\&LDA\fR, -\fB\&const double *\fR -\fI\&W0\fR, -\fB\&const double *\fR -\fI\&W\fR, -\fB\&const int\fR -\fI\&LDW\fR, -\fB\&const int *\fR -\fI\&LINDXA\fR, -\fB\&const int *\fR -\fI\&LINDXAU\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_dlaswp04T\fR -copies M0 columns of U into rows of A and replaces those -columns of U with columns of W. In addition M1 - M0 columns of W are -copied into U. -.SH ARGUMENTS -.TP 8 -M0 (local input) const int -On entry, M0 specifies the number of columns of U that should -be copied into A and replaced by columns of W. M0 must be at -least zero. -.TP 8 -M1 (local input) const int -On entry, M1 specifies the number of columns of W that will -be copied into U. M1 must be at least zero. -.TP 8 -N (local input) const int -On entry, N specifies the length of the columns of U that -will be copied into rows of A. N must be at least zero. -.TP 8 -U (local input/output) double * -On entry, U points to an array of dimension (LDU,*). This -array contains the columns that are to be copied into rows of -A. -.TP 8 -LDU (local input) const int -On entry, LDU specifies the leading dimension of the array U. -LDU must be at least MAX(1,N). -.TP 8 -A (local output) double * -On entry, A points to an array of dimension (LDA,N). On exit, -the rows of this array specified by LINDXA are replaced by -columns of U indicated by LINDXAU. -.TP 8 -LDA (local input) const int -On entry, LDA specifies the leading dimension of the array A. -LDA must be at least MAX(1,M0). -.TP 8 -W0 (local input) const double * -On entry, W0 is an array of size (M-1)*LDW+1, that contains -the destination offset in U where the columns of W should be -copied.
-.TP 8 -W (local input) const double * -On entry, W is an array of size (LDW,M0+M1), that contains -data to be copied into U. For i in [M0..M0+M1), the entries -W(:,i) are copied into the column W0(i*LDW) of U. -.TP 8 -LDW (local input) const int -On entry, LDW specifies the leading dimension of the array W. -LDW must be at least MAX(1,N+1). -.TP 8 -LINDXA (local input) const int * -On entry, LINDXA is an array of dimension M0 containing the -local row indexes A into which columns of U are copied. -.TP 8 -LINDXAU (local input) const int * -On entry, LINDXAU is an array of dimension M0 that contains -the local column indexes of U that should be copied into A -and replaced by the columns of W. -.SH SEE ALSO -.BR HPL_dlaswp00N \ (3), -.BR HPL_dlaswp10N \ (3), -.BR HPL_dlaswp01N \ (3), -.BR HPL_dlaswp01T \ (3), -.BR HPL_dlaswp02N \ (3), -.BR HPL_dlaswp03N \ (3), -.BR HPL_dlaswp03T \ (3), -.BR HPL_dlaswp04N \ (3), -.BR HPL_dlaswp04T \ (3), -.BR HPL_dlaswp05N \ (3), -.BR HPL_dlaswp05T \ (3), -.BR HPL_dlaswp06N \ (3), -.BR HPL_dlaswp06T \ (3). diff --git a/hpl/man/man3/HPL_dlaswp05N.3 b/hpl/man/man3/HPL_dlaswp05N.3 deleted file mode 100644 index e16945faa7ea265db7d61c67ea3909089d0b7baf..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_dlaswp05N.3 +++ /dev/null @@ -1,77 +0,0 @@ -.TH HPL_dlaswp05N 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_dlaswp05N \- copy rows of U into A. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_dlaswp05N(\fR -\fB\&const int\fR -\fI\&M\fR, -\fB\&const int\fR -\fI\&N\fR, -\fB\&double *\fR -\fI\&A\fR, -\fB\&const int\fR -\fI\&LDA\fR, -\fB\&const double *\fR -\fI\&U\fR, -\fB\&const int\fR -\fI\&LDU\fR, -\fB\&const int *\fR -\fI\&LINDXA\fR, -\fB\&const int *\fR -\fI\&LINDXAU\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_dlaswp05N\fR -copies rows of U of global offset LINDXAU into rows of -A at positions indicated by LINDXA. 
-.SH ARGUMENTS -.TP 8 -M (local input) const int -On entry, M specifies the number of rows of U that should be -copied into A. M must be at least zero. -.TP 8 -N (local input) const int -On entry, N specifies the length of the rows of U that should -be copied into A. N must be at least zero. -.TP 8 -A (local output) double * -On entry, A points to an array of dimension (LDA,N). On exit, -the rows of this array specified by LINDXA are replaced by -rows of U indicated by LINDXAU. -.TP 8 -LDA (local input) const int -On entry, LDA specifies the leading dimension of the array A. -LDA must be at least MAX(1,M). -.TP 8 -U (local input/output) const double * -On entry, U points to an array of dimension (LDU,N). This -array contains the rows that are to be copied into A. -.TP 8 -LDU (local input) const int -On entry, LDU specifies the leading dimension of the array U. -LDU must be at least MAX(1,M). -.TP 8 -LINDXA (local input) const int * -On entry, LINDXA is an array of dimension M that contains the -local row indexes of A that should be copied from U. -.TP 8 -LINDXAU (local input) const int * -On entry, LINDXAU is an array of dimension M that contains -the local row indexes of U that should be copied in A. -.SH SEE ALSO -.BR HPL_dlaswp00N \ (3), -.BR HPL_dlaswp10N \ (3), -.BR HPL_dlaswp01N \ (3), -.BR HPL_dlaswp01T \ (3), -.BR HPL_dlaswp02N \ (3), -.BR HPL_dlaswp03N \ (3), -.BR HPL_dlaswp03T \ (3), -.BR HPL_dlaswp04N \ (3), -.BR HPL_dlaswp04T \ (3), -.BR HPL_dlaswp05N \ (3), -.BR HPL_dlaswp05T \ (3), -.BR HPL_dlaswp06N \ (3), -.BR HPL_dlaswp06T \ (3). diff --git a/hpl/man/man3/HPL_dlaswp05T.3 b/hpl/man/man3/HPL_dlaswp05T.3 deleted file mode 100644 index 327913b54e1962f9b516314698f6a7abc797680f..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_dlaswp05T.3 +++ /dev/null @@ -1,77 +0,0 @@ -.TH HPL_dlaswp05T 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_dlaswp05T \- copy rows of U into A. 
-.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_dlaswp05T(\fR -\fB\&const int\fR -\fI\&M\fR, -\fB\&const int\fR -\fI\&N\fR, -\fB\&double *\fR -\fI\&A\fR, -\fB\&const int\fR -\fI\&LDA\fR, -\fB\&const double *\fR -\fI\&U\fR, -\fB\&const int\fR -\fI\&LDU\fR, -\fB\&const int *\fR -\fI\&LINDXA\fR, -\fB\&const int *\fR -\fI\&LINDXAU\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_dlaswp05T\fR -copies columns of U of global offset LINDXAU into rows -of A at positions indicated by LINDXA. -.SH ARGUMENTS -.TP 8 -M (local input) const int -On entry, M specifies the number of columns of U that should be copied into A. M must be at least zero. -.TP 8 -N (local input) const int -On entry, N specifies the length of the columns of U that will -be copied into rows of A. N must be at least zero. -.TP 8 -A (local output) double * -On entry, A points to an array of dimension (LDA,N). On exit, -the rows of this array specified by LINDXA are replaced by -columns of U indicated by LINDXAU. -.TP 8 -LDA (local input) const int -On entry, LDA specifies the leading dimension of the array A. -LDA must be at least MAX(1,M). -.TP 8 -U (local input/output) const double * -On entry, U points to an array of dimension (LDU,*). This -array contains the columns that are to be copied into rows of -A. -.TP 8 -LDU (local input) const int -On entry, LDU specifies the leading dimension of the array U. -LDU must be at least MAX(1,N). -.TP 8 -LINDXA (local input) const int * -On entry, LINDXA is an array of dimension M that contains the -local row indexes of A that should be copied from U. -.TP 8 -LINDXAU (local input) const int * -On entry, LINDXAU is an array of dimension M that contains -the local column indexes of U that should be copied in A.
-.SH SEE ALSO -.BR HPL_dlaswp00N \ (3), -.BR HPL_dlaswp10N \ (3), -.BR HPL_dlaswp01N \ (3), -.BR HPL_dlaswp01T \ (3), -.BR HPL_dlaswp02N \ (3), -.BR HPL_dlaswp03N \ (3), -.BR HPL_dlaswp03T \ (3), -.BR HPL_dlaswp04N \ (3), -.BR HPL_dlaswp04T \ (3), -.BR HPL_dlaswp05N \ (3), -.BR HPL_dlaswp05T \ (3), -.BR HPL_dlaswp06N \ (3), -.BR HPL_dlaswp06T \ (3). diff --git a/hpl/man/man3/HPL_dlaswp06N.3 b/hpl/man/man3/HPL_dlaswp06N.3 deleted file mode 100644 index ec69ecae83cb3b28c1f2688a3cbe55b78135790c..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_dlaswp06N.3 +++ /dev/null @@ -1,72 +0,0 @@ -.TH HPL_dlaswp06N 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_dlaswp06N \- swap rows of U with rows of A. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_dlaswp06N(\fR -\fB\&const int\fR -\fI\&M\fR, -\fB\&const int\fR -\fI\&N\fR, -\fB\&double *\fR -\fI\&A\fR, -\fB\&const int\fR -\fI\&LDA\fR, -\fB\&double *\fR -\fI\&U\fR, -\fB\&const int\fR -\fI\&LDU\fR, -\fB\&const int *\fR -\fI\&LINDXA\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_dlaswp06N\fR -swaps rows of U with rows of A at positions -indicated by LINDXA. -.SH ARGUMENTS -.TP 8 -M (local input) const int -On entry, M specifies the number of rows of A that should be -swapped with rows of U. M must be at least zero. -.TP 8 -N (local input) const int -On entry, N specifies the length of the rows of A that should -be swapped with rows of U. N must be at least zero. -.TP 8 -A (local output) double * -On entry, A points to an array of dimension (LDA,N). On exit, -the rows of this array specified by LINDXA are replaced by -rows or columns of U. -.TP 8 -LDA (local input) const int -On entry, LDA specifies the leading dimension of the array A. -LDA must be at least MAX(1,M). -.TP 8 -U (local input/output) double * -On entry, U points to an array of dimension (LDU,N). This -array contains the rows of U that are to be swapped with rows -of A. 
-.TP 8 -LDU (local input) const int -On entry, LDU specifies the leading dimension of the array U. -LDU must be at least MAX(1,M). -.TP 8 -LINDXA (local input) const int * -On entry, LINDXA is an array of dimension M that contains the -local row indexes of A that should be swapped with U. -.SH SEE ALSO -.BR HPL_dlaswp00N \ (3), -.BR HPL_dlaswp10N \ (3), -.BR HPL_dlaswp01N \ (3), -.BR HPL_dlaswp01T \ (3), -.BR HPL_dlaswp02N \ (3), -.BR HPL_dlaswp03N \ (3), -.BR HPL_dlaswp03T \ (3), -.BR HPL_dlaswp04N \ (3), -.BR HPL_dlaswp04T \ (3), -.BR HPL_dlaswp05N \ (3), -.BR HPL_dlaswp05T \ (3), -.BR HPL_dlaswp06N \ (3), -.BR HPL_dlaswp06T \ (3). diff --git a/hpl/man/man3/HPL_dlaswp06T.3 b/hpl/man/man3/HPL_dlaswp06T.3 deleted file mode 100644 index 3a92267f69553cc5a168b29855068e073fadc8bc..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_dlaswp06T.3 +++ /dev/null @@ -1,72 +0,0 @@ -.TH HPL_dlaswp06T 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_dlaswp06T \- swap rows or columns of U with rows of A. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_dlaswp06T(\fR -\fB\&const int\fR -\fI\&M\fR, -\fB\&const int\fR -\fI\&N\fR, -\fB\&double *\fR -\fI\&A\fR, -\fB\&const int\fR -\fI\&LDA\fR, -\fB\&double *\fR -\fI\&U\fR, -\fB\&const int\fR -\fI\&LDU\fR, -\fB\&const int *\fR -\fI\&LINDXA\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_dlaswp06T\fR -swaps columns of U with rows of A at positions -indicated by LINDXA. -.SH ARGUMENTS -.TP 8 -M (local input) const int -On entry, M specifies the number of rows of A that should be -swapped with columns of U. M must be at least zero. -.TP 8 -N (local input) const int -On entry, N specifies the length of the rows of A that should -be swapped with columns of U. N must be at least zero. -.TP 8 -A (local output) double * -On entry, A points to an array of dimension (LDA,N). On exit, -the rows of this array specified by LINDXA are replaced by -columns of U. 
-.TP 8 -LDA (local input) const int -On entry, LDA specifies the leading dimension of the array A. -LDA must be at least MAX(1,M). -.TP 8 -U (local input/output) double * -On entry, U points to an array of dimension (LDU,*). This -array contains the columns of U that are to be swapped with -rows of A. -.TP 8 -LDU (local input) const int -On entry, LDU specifies the leading dimension of the array U. -LDU must be at least MAX(1,N). -.TP 8 -LINDXA (local input) const int * -On entry, LINDXA is an array of dimension M that contains the -local row indexes of A that should be swapped with U. -.SH SEE ALSO -.BR HPL_dlaswp00N \ (3), -.BR HPL_dlaswp10N \ (3), -.BR HPL_dlaswp01N \ (3), -.BR HPL_dlaswp01T \ (3), -.BR HPL_dlaswp02N \ (3), -.BR HPL_dlaswp03N \ (3), -.BR HPL_dlaswp03T \ (3), -.BR HPL_dlaswp04N \ (3), -.BR HPL_dlaswp04T \ (3), -.BR HPL_dlaswp05N \ (3), -.BR HPL_dlaswp05T \ (3), -.BR HPL_dlaswp06N \ (3), -.BR HPL_dlaswp06T \ (3). diff --git a/hpl/man/man3/HPL_dlaswp10N.3 b/hpl/man/man3/HPL_dlaswp10N.3 deleted file mode 100644 index 7fade37ce5ec4bf4bdfde1c836dfb1031bea6fc6..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_dlaswp10N.3 +++ /dev/null @@ -1,59 +0,0 @@ -.TH HPL_dlaswp10N 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_dlaswp10N \- performs a series column interchanges. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_dlaswp10N(\fR -\fB\&const int\fR -\fI\&M\fR, -\fB\&const int\fR -\fI\&N\fR, -\fB\&double *\fR -\fI\&A\fR, -\fB\&const int\fR -\fI\&LDA\fR, -\fB\&const int *\fR -\fI\&IPIV\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_dlaswp10N\fR -performs a sequence of local column interchanges on a -matrix A. One column interchange is initiated for columns 0 through -N-1 of A. -.SH ARGUMENTS -.TP 8 -M (local input) const int -__arg0__ -.TP 8 -N (local input) const int -On entry, M specifies the number of rows of the array A. M -must be at least zero. 
-.TP 8 -A (local input/output) double * -On entry, N specifies the number of columns of the array A. N -must be at least zero. -.TP 8 -LDA (local input) const int -On entry, A points to an array of dimension (LDA,N). This -array contains the columns onto which the interchanges should -be applied. On exit, A contains the permuted matrix. -.TP 8 -IPIV (local input) const int * -On entry, LDA specifies the leading dimension of the array A. -LDA must be at least MAX(1,M). -.SH SEE ALSO -.BR HPL_dlaswp00N \ (3), -.BR HPL_dlaswp10N \ (3), -.BR HPL_dlaswp01N \ (3), -.BR HPL_dlaswp01T \ (3), -.BR HPL_dlaswp02N \ (3), -.BR HPL_dlaswp03N \ (3), -.BR HPL_dlaswp03T \ (3), -.BR HPL_dlaswp04N \ (3), -.BR HPL_dlaswp04T \ (3), -.BR HPL_dlaswp05N \ (3), -.BR HPL_dlaswp05T \ (3), -.BR HPL_dlaswp06N \ (3), -.BR HPL_dlaswp06T \ (3). diff --git a/hpl/man/man3/HPL_dlatcpy.3 b/hpl/man/man3/HPL_dlatcpy.3 deleted file mode 100644 index e09e426bb87c578501fec64680f1467588594c53..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_dlatcpy.3 +++ /dev/null @@ -1,70 +0,0 @@ -.TH HPL_dlatcpy 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_dlatcpy \- B := A^T -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_dlatcpy(\fR -\fB\&const int\fR -\fI\&M\fR, -\fB\&const int\fR -\fI\&N\fR, -\fB\&const double *\fR -\fI\&A\fR, -\fB\&const int\fR -\fI\&LDA\fR, -\fB\&double *\fR -\fI\&B\fR, -\fB\&const int\fR -\fI\&LDB\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_dlatcpy\fR -copies the transpose of an array A into an array B. -.SH ARGUMENTS -.TP 8 -M (local input) const int -On entry, M specifies the number of rows of the array B and -the number of columns of A. M must be at least zero. -.TP 8 -N (local input) const int -On entry, N specifies the number of rows of the array A and -the number of columns of B. N must be at least zero. -.TP 8 -A (local input) const double * -On entry, A points to an array of dimension (LDA,M). 
-.TP 8 -LDA (local input) const int -On entry, LDA specifies the leading dimension of the array A. -LDA must be at least MAX(1,N). -.TP 8 -B (local output) double * -On entry, B points to an array of dimension (LDB,N). On exit, -B is overwritten with the transpose of A. -.TP 8 -LDB (local input) const int -On entry, LDB specifies the leading dimension of the array B. -LDB must be at least MAX(1,M). -.SH EXAMPLE -\fI\&#include "hpl.h"\fR - -int main(int argc, char *argv[]) -.br -{ -.br - double a[2*2], b[2*2]; -.br - a[0] = 1.0; a[1] = 3.0; a[2] = 2.0; a[3] = 4.0; -.br - HPL_dlatcpy( 2, 2, a, 2, b, 2 ); -.br - printf(" [%f,%f]\en", b[0], b[2]); -.br - printf("b=[%f,%f]\en", b[1], b[3]); -.br - exit(0); return(0); -.br -} -.SH SEE ALSO -.BR HPL_dlacpy \ (3). diff --git a/hpl/man/man3/HPL_dlocmax.3 b/hpl/man/man3/HPL_dlocmax.3 deleted file mode 100644 index a92d4ae7741a2c0cdc7c736ebda8b69e8cf1717f..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_dlocmax.3 +++ /dev/null @@ -1,69 +0,0 @@ -.TH HPL_dlocmax 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_dlocmax \- finds the maximum entry in matrix column. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_dlocmax(\fR -\fB\&HPL_T_panel *\fR -\fI\&PANEL\fR, -\fB\&const int\fR -\fI\&N\fR, -\fB\&const int\fR -\fI\&II\fR, -\fB\&const int\fR -\fI\&JJ\fR, -\fB\&double *\fR -\fI\&WORK\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_dlocmax\fR -finds the maximum entry in the current column and packs -the useful information in WORK[0:3]. On exit, WORK[0] contains the -local maximum absolute value scalar, WORK[1] is the corresponding -local row index, WORK[2] is the corresponding global row index, and -WORK[3] is the coordinate of the process owning this max. When N is -less than 1, the WORK[0:2] is initialized to zero, and WORK[3] is set -to the total number of process rows.
-.SH ARGUMENTS -.TP 8 -PANEL (local input/output) HPL_T_panel * -On entry, PANEL points to the data structure containing the -panel information. -.TP 8 -N (local input) const int -On entry, N specifies the local number of rows of the column -of A on which we operate. -.TP 8 -II (local input) const int -On entry, II specifies the row offset where the column to be -operated on starts with respect to the panel. -.TP 8 -JJ (local input) const int -On entry, JJ specifies the column offset where the column to -be operated on starts with respect to the panel. -.TP 8 -WORK (local workspace) double * -On entry, WORK is a workarray of size at least 4. On exit, -WORK[0] contains the local maximum absolute value scalar, -WORK[1] contains the corresponding local row index, WORK[2] -contains the corresponding global row index, and WORK[3] is -the coordinate of process owning this max. -.SH SEE ALSO -.BR HPL_dlocswpN \ (3), -.BR HPL_dlocswpT \ (3), -.BR HPL_pdmxswp \ (3), -.BR HPL_pdpancrN \ (3), -.BR HPL_pdpancrT \ (3), -.BR HPL_pdpanllN \ (3), -.BR HPL_pdpanllT \ (3), -.BR HPL_pdpanrlN \ (3), -.BR HPL_pdpanrlT \ (3), -.BR HPL_pdrpancrN \ (3), -.BR HPL_pdrpancrT \ (3), -.BR HPL_pdrpanllN \ (3), -.BR HPL_pdrpanllT \ (3), -.BR HPL_pdrpanrlN \ (3), -.BR HPL_pdrpanrlT \ (3), -.BR HPL_pdfact \ (3). diff --git a/hpl/man/man3/HPL_dlocswpN.3 b/hpl/man/man3/HPL_dlocswpN.3 deleted file mode 100644 index 818070bdd3211d54d6f4aad0c412a56f854ebfe6..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_dlocswpN.3 +++ /dev/null @@ -1,62 +0,0 @@ -.TH HPL_dlocswpN 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_dlocswpN \- locally swaps rows within panel. 
-.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_dlocswpN(\fR -\fB\&HPL_T_panel *\fR -\fI\&PANEL\fR, -\fB\&const int\fR -\fI\&II\fR, -\fB\&const int\fR -\fI\&JJ\fR, -\fB\&double *\fR -\fI\&WORK\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_dlocswpN\fR -performs the local swapping operations within a panel. -The lower triangular N0-by-N0 upper block of the panel is stored in -no-transpose form (i.e. just like the input matrix itself). -.SH ARGUMENTS -.TP 8 -PANEL (local input/output) HPL_T_panel * -On entry, PANEL points to the data structure containing the -panel information. -.TP 8 -II (local input) const int -On entry, II specifies the row offset where the column to be -operated on starts with respect to the panel. -.TP 8 -JJ (local input) const int -On entry, JJ specifies the column offset where the column to -be operated on starts with respect to the panel. -.TP 8 -WORK (local workspace) double * -On entry, WORK is a workarray of size at least 2 * (4+2*N0). -WORK[0] contains the local maximum absolute value scalar, -WORK[1] contains the corresponding local row index, WORK[2] -contains the corresponding global row index, and WORK[3] is -the coordinate of process owning this max. The N0 length max -row is stored in WORK[4:4+N0-1]; Note that this is also the -JJth row (or column) of L1. The remaining part of this array -is used as workspace. -.SH SEE ALSO -.BR HPL_dlocmax \ (3), -.BR HPL_dlocswpT \ (3), -.BR HPL_pdmxswp \ (3), -.BR HPL_pdpancrN \ (3), -.BR HPL_pdpancrT \ (3), -.BR HPL_pdpanllN \ (3), -.BR HPL_pdpanllT \ (3), -.BR HPL_pdpanrlN \ (3), -.BR HPL_pdpanrlT \ (3), -.BR HPL_pdrpancrN \ (3), -.BR HPL_pdrpancrT \ (3), -.BR HPL_pdrpanllN \ (3), -.BR HPL_pdrpanllT \ (3), -.BR HPL_pdrpanrlN \ (3), -.BR HPL_pdrpanrlT \ (3), -.BR HPL_pdfact \ (3). 
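The WORK[0:3] layout that the HPL_dlocmax page describes, and that HPL_dlocswpN reads, can be sketched as a small standalone C function. This is only an illustration under stated assumptions, not HPL's code: `sketch_locmax` and its parameters `nb`, `myrow` and `nprow` are hypothetical names, the column is passed as a plain array instead of through `HPL_T_panel`, the global row index uses a textbook block-cyclic mapping with the first block on process row 0, and WORK[0] is assumed to hold the signed entry of largest magnitude.

```c
#include <assert.h>

/* Magnitude of a double without pulling in <math.h>. */
static double magnitude(double v) { return v < 0.0 ? -v : v; }

/*
 * Hypothetical helper (not an HPL routine): scan a local column of
 * length n and pack the result the way the man page describes WORK[0:3].
 *   work[0]  entry of largest magnitude (assumed to be stored signed)
 *   work[1]  its local row index
 *   work[2]  its global row index (assumed block-cyclic, source row 0)
 *   work[3]  process row that owns it
 * When n < 1, work[0:2] is zeroed and work[3] is set to nprow.
 */
void sketch_locmax(const double *col, int n, int nb, int myrow, int nprow,
                   double work[4])
{
    assert(nb > 0 && nprow > 0 && myrow >= 0 && myrow < nprow);

    if (n < 1) {
        work[0] = work[1] = work[2] = 0.0;
        work[3] = (double)nprow;
        return;
    }

    /* Index of the first entry with the largest magnitude. */
    int imax = 0;
    for (int i = 1; i < n; i++) {
        if (magnitude(col[i]) > magnitude(col[imax])) {
            imax = i;
        }
    }

    /* Local-to-global mapping for a block-cyclic layout (assumption). */
    int iglob = ((imax / nb) * nprow + myrow) * nb + imax % nb;

    work[0] = col[imax];
    work[1] = (double)imax;
    work[2] = (double)iglob;
    work[3] = (double)myrow;
}
```

Packing the indexes as doubles alongside the value keeps the whole record in one array of a single type, which is presumably what lets it travel in a single message when the per-process maxima are combined.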
diff --git a/hpl/man/man3/HPL_dlocswpT.3 b/hpl/man/man3/HPL_dlocswpT.3 deleted file mode 100644 index 7b7771c402d6129619917e45b5afe21e43eb6876..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_dlocswpT.3 +++ /dev/null @@ -1,62 +0,0 @@ -.TH HPL_dlocswpT 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_dlocswpT \- locally swaps rows within panel. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_dlocswpT(\fR -\fB\&HPL_T_panel *\fR -\fI\&PANEL\fR, -\fB\&const int\fR -\fI\&II\fR, -\fB\&const int\fR -\fI\&JJ\fR, -\fB\&double *\fR -\fI\&WORK\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_dlocswpT\fR -performs the local swapping operations within a panel. -The lower triangular N0-by-N0 upper block of the panel is stored in -transpose form. -.SH ARGUMENTS -.TP 8 -PANEL (local input/output) HPL_T_panel * -On entry, PANEL points to the data structure containing the -panel information. -.TP 8 -II (local input) const int -On entry, II specifies the row offset where the column to be -operated on starts with respect to the panel. -.TP 8 -JJ (local input) const int -On entry, JJ specifies the column offset where the column to -be operated on starts with respect to the panel. -.TP 8 -WORK (local workspace) double * -On entry, WORK is a workarray of size at least 2 * (4+2*N0). -WORK[0] contains the local maximum absolute value scalar, -WORK[1] contains the corresponding local row index, WORK[2] -contains the corresponding global row index, and WORK[3] is -the coordinate of process owning this max. The N0 length max -row is stored in WORK[4:4+N0-1]; Note that this is also the -JJth row (or column) of L1. The remaining part of this array -is used as workspace. 
-.SH SEE ALSO -.BR HPL_dlocmax \ (3), -.BR HPL_dlocswpN \ (3), -.BR HPL_pdmxswp \ (3), -.BR HPL_pdpancrN \ (3), -.BR HPL_pdpancrT \ (3), -.BR HPL_pdpanllN \ (3), -.BR HPL_pdpanllT \ (3), -.BR HPL_pdpanrlN \ (3), -.BR HPL_pdpanrlT \ (3), -.BR HPL_pdrpancrN \ (3), -.BR HPL_pdrpancrT \ (3), -.BR HPL_pdrpanllN \ (3), -.BR HPL_pdrpanllT \ (3), -.BR HPL_pdrpanrlN \ (3), -.BR HPL_pdrpanrlT \ (3), -.BR HPL_pdfact \ (3). diff --git a/hpl/man/man3/HPL_dmatgen.3 b/hpl/man/man3/HPL_dmatgen.3 deleted file mode 100644 index 01a12ed8c8f278857a7afac27e9c47d3fabf6c18..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_dmatgen.3 +++ /dev/null @@ -1,55 +0,0 @@ -.TH HPL_dmatgen 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_dmatgen \- random matrix generator. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_dmatgen(\fR -\fB\&const int\fR -\fI\&M\fR, -\fB\&const int\fR -\fI\&N\fR, -\fB\&double *\fR -\fI\&A\fR, -\fB\&const int\fR -\fI\&LDA\fR, -\fB\&const int\fR -\fI\&ISEED\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_dmatgen\fR -generates (or regenerates) a random matrix A. - -The pseudo-random generator uses the linear congruential algorithm: -X(n+1) = (a * X(n) + c) mod m as described in the Art of Computer -Programming, Knuth 1973, Vol. 2. -.SH ARGUMENTS -.TP 8 -M (input) const int -On entry, M specifies the number of rows of the matrix A. -M must be at least zero. -.TP 8 -N (input) const int -On entry, N specifies the number of columns of the matrix A. -N must be at least zero. -.TP 8 -A (output) double * -On entry, A points to an array of dimension (LDA,N). On exit, -this array contains the coefficients of the randomly -generated matrix. -.TP 8 -LDA (input) const int -On entry, LDA specifies the leading dimension of the array A. -LDA must be at least max(1,M). -.TP 8 -ISEED (input) const int -On entry, ISEED specifies the seed number to generate the -matrix A. ISEED must be at least zero. 
-.SH SEE ALSO -.BR HPL_ladd \ (3), -.BR HPL_lmul \ (3), -.BR HPL_setran \ (3), -.BR HPL_xjumpm \ (3), -.BR HPL_jumpit \ (3), -.BR HPL_rand \ (3). diff --git a/hpl/man/man3/HPL_dscal.3 b/hpl/man/man3/HPL_dscal.3 deleted file mode 100644 index ddc3271fe17a730c2dd612ff06cf7b4bae62f45d..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_dscal.3 +++ /dev/null @@ -1,62 +0,0 @@ -.TH HPL_dscal 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_dscal \- x = alpha * x. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_dscal(\fR -\fB\&const int\fR -\fI\&N\fR, -\fB\&const double\fR -\fI\&ALPHA\fR, -\fB\&double *\fR -\fI\&X\fR, -\fB\&const int\fR -\fI\&INCX\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_dscal\fR -scales the vector x by alpha. -.SH ARGUMENTS -.TP 8 -N (local input) const int -On entry, N specifies the length of the vector x. N must be -at least zero. -.TP 8 -ALPHA (local input) const double -On entry, ALPHA specifies the scalar alpha. When ALPHA is -supplied as zero, then the entries of the incremented array X -need not be set on input. -.TP 8 -X (local input/output) double * -On entry, X is an incremented array of dimension at least -( 1 + ( n - 1 ) * abs( INCX ) ) that contains the vector x. -On exit, the entries of the incremented array X are scaled -by the scalar alpha. -.TP 8 -INCX (local input) const int -On entry, INCX specifies the increment for the elements of X. -INCX must not be zero. -.SH EXAMPLE -\fI\&#include "hpl.h"\fR - -int main(int argc, char *argv[]) -.br -{ -.br - double x[3]; -.br - x[0] = 1.0; x[1] = 2.0; x[2] = 3.0; -.br - HPL_dscal( 3, 2.0, x, 1 ); -.br - printf("x=[%f,%f,%f]\en", x[0], x[1], x[2]); -.br - exit(0); return(0); -.br -} -.SH SEE ALSO -.BR HPL_daxpy \ (3), -.BR HPL_dcopy \ (3), -.BR HPL_dswap \ (3). 
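The "incremented array" convention in the HPL_dscal page above (logical element i of x is stored at X[i*INCX], so the array needs at least 1 + (n - 1) * abs(INCX) entries) can be sketched without the library. This is a minimal standalone illustration, not HPL's implementation: the names strided_len and scale_strided are hypothetical, and only positive increments are handled.

```c
#include <assert.h>
#include <stdlib.h>

/* Minimum number of doubles an incremented array must hold for an
 * n-element vector with increment incx (the bound quoted in the page). */
int strided_len(int n, int incx)
{
    assert(n >= 1 && incx != 0);
    return 1 + (n - 1) * abs(incx);
}

/* x := alpha * x over the logical elements x[0], x[incx], x[2*incx], ...
 * Hypothetical helper; this sketch assumes incx > 0. */
void scale_strided(int n, double alpha, double *x, int incx)
{
    assert(n >= 0 && incx > 0);
    for (int i = 0; i < n; i++) {
        if (alpha == 0.0)
            x[i * incx] = 0.0;   /* input need not be set when alpha is zero */
        else
            x[i * incx] *= alpha;
    }
}
```

With n = 3 and incx = 2 the routine touches X[0], X[2] and X[4] only, which is why a five-element array is the minimum in that case.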
diff --git a/hpl/man/man3/HPL_dswap.3 b/hpl/man/man3/HPL_dswap.3 deleted file mode 100644 index e2539bcea443eb76d9af6376d9131f2721cd60c5..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_dswap.3 +++ /dev/null @@ -1,73 +0,0 @@ -.TH HPL_dswap 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_dswap \- y <-> x. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_dswap(\fR -\fB\&const int\fR -\fI\&N\fR, -\fB\&double *\fR -\fI\&X\fR, -\fB\&const int\fR -\fI\&INCX\fR, -\fB\&double *\fR -\fI\&Y\fR, -\fB\&const int\fR -\fI\&INCY\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_dswap\fR -swaps the vectors x and y. -.SH ARGUMENTS -.TP 8 -N (local input) const int -On entry, N specifies the length of the vectors x and y. N -must be at least zero. -.TP 8 -X (local input/output) double * -On entry, X is an incremented array of dimension at least -( 1 + ( n - 1 ) * abs( INCX ) ) that contains the vector x. -On exit, the entries of the incremented array X are updated -with the entries of the incremented array Y. -.TP 8 -INCX (local input) const int -On entry, INCX specifies the increment for the elements of X. -INCX must not be zero. -.TP 8 -Y (local input/output) double * -On entry, Y is an incremented array of dimension at least -( 1 + ( n - 1 ) * abs( INCY ) ) that contains the vector y. -On exit, the entries of the incremented array Y are updated -with the entries of the incremented array X. -.TP 8 -INCY (local input) const int -On entry, INCY specifies the increment for the elements of Y. -INCY must not be zero. 
-.SH EXAMPLE -\fI\&#include "hpl.h"\fR - -int main(int argc, char *argv[]) -.br -{ -.br - double x[3], y[3]; -.br - x[0] = 1.0; x[1] = 2.0; x[2] = 3.0; -.br - y[0] = 4.0; y[1] = 5.0; y[2] = 6.0; -.br - HPL_dswap( 3, x, 1, y, 1 ); -.br - printf("x=[%f,%f,%f]\en", x[0], x[1], x[2]); -.br - printf("y=[%f,%f,%f]\en", y[0], y[1], y[2]); -.br - exit(0); return(0); -.br -} -.SH SEE ALSO -.BR HPL_daxpy \ (3), -.BR HPL_dcopy \ (3), -.BR HPL_dscal \ (3). diff --git a/hpl/man/man3/HPL_dtrsm.3 b/hpl/man/man3/HPL_dtrsm.3 deleted file mode 100644 index 10bd3d497d1746c62dcea6e1eb7a8d30780c2232..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_dtrsm.3 +++ /dev/null @@ -1,152 +0,0 @@ -.TH HPL_dtrsm 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_dtrsm \- B := A^{-1} * B or B := B * A^{-1}. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_dtrsm(\fR -\fB\&const enum HPL_ORDER\fR -\fI\&ORDER\fR, -\fB\&const enum HPL_SIDE\fR -\fI\&SIDE\fR, -\fB\&const enum HPL_UPLO\fR -\fI\&UPLO\fR, -\fB\&const enum HPL_TRANS\fR -\fI\&TRANS\fR, -\fB\&const enum HPL_DIAG\fR -\fI\&DIAG\fR, -\fB\&const int\fR -\fI\&M\fR, -\fB\&const int\fR -\fI\&N\fR, -\fB\&const double\fR -\fI\&ALPHA\fR, -\fB\&const double *\fR -\fI\&A\fR, -\fB\&const int\fR -\fI\&LDA\fR, -\fB\&double *\fR -\fI\&B\fR, -\fB\&const int\fR -\fI\&LDB\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_dtrsm\fR -solves one of the matrix equations - - op( A ) * X = alpha * B, or X * op( A ) = alpha * B, - -where alpha is a scalar, X and B are m by n matrices, A is a unit, or -non-unit, upper or lower triangular matrix and op(A) is one of - - op( A ) = A or op( A ) = A^T. - -The matrix X is overwritten on B. - -No test for singularity or near-singularity is included in this -routine. Such tests must be performed before calling this routine. 
-.SH ARGUMENTS -.TP 8 -ORDER (local input) const enum HPL_ORDER -On entry, ORDER specifies the storage format of the operands -as follows: - ORDER = HplRowMajor, - ORDER = HplColumnMajor. -.TP 8 -SIDE (local input) const enum HPL_SIDE -On entry, SIDE specifies whether op(A) appears on the left -or right of X as follows: - SIDE==HplLeft op( A ) * X = alpha * B, - SIDE==HplRight X * op( A ) = alpha * B. -.TP 8 -UPLO (local input) const enum HPL_UPLO -On entry, UPLO specifies whether the upper or lower -triangular part of the array A is to be referenced. When -UPLO==HplUpper, only the upper triangular part of A is to be -referenced, otherwise only the lower triangular part of A is -to be referenced. -.TP 8 -TRANS (local input) const enum HPL_TRANS -On entry, TRANS specifies the form of op(A) to be used in -the matrix-matrix operation as follows: - TRANS==HplNoTrans : op( A ) = A, - TRANS==HplTrans : op( A ) = A^T, - TRANS==HplConjTrans : op( A ) = A^T. -.TP 8 -DIAG (local input) const enum HPL_DIAG -On entry, DIAG specifies whether A is unit triangular or -not. When DIAG==HplUnit, A is assumed to be unit triangular, -and otherwise, A is not assumed to be unit triangular. -.TP 8 -M (local input) const int -On entry, M specifies the number of rows of the matrix B. -M must be at least zero. -.TP 8 -N (local input) const int -On entry, N specifies the number of columns of the matrix B. -N must be at least zero. -.TP 8 -ALPHA (local input) const double -On entry, ALPHA specifies the scalar alpha. When ALPHA is -supplied as zero then the elements of the matrix B need not -be set on input. -.TP 8 -A (local input) const double * -On entry, A points to an array of size equal to or greater -than LDA * k, where k is m when SIDE==HplLeft and is n -otherwise. Before entry with UPLO==HplUpper, the leading -k by k upper triangular part of the array A must contain the -upper triangular matrix and the strictly lower triangular -part of A is not referenced.
When UPLO==HplLower on entry, -the leading k by k lower triangular part of the array A must -contain the lower triangular matrix and the strictly upper -triangular part of A is not referenced. - -Note that when DIAG==HplUnit, the diagonal elements of A -are not referenced either, but are assumed to be unity. -.TP 8 -LDA (local input) const int -On entry, LDA specifies the leading dimension of A as -declared in the calling (sub) program. LDA must be at -least MAX(1,m) when SIDE==HplLeft, and MAX(1,n) otherwise. -.TP 8 -B (local input/output) double * -On entry, B points to an array of size equal to or greater -than LDB * n. Before entry, the leading m by n part of the -array B must contain the matrix B, except when alpha is zero, -in which case B need not be set on entry. On exit, the array -B is overwritten by the m by n solution matrix. -.TP 8 -LDB (local input) const int -On entry, LDB specifies the leading dimension of B as -declared in the calling (sub) program. LDB must be at -least MAX(1,m). -.SH EXAMPLE -\fI\&#include "hpl.h"\fR - -int main(int argc, char *argv[]) -.br -{ -.br - double a[2*2], b[2*2]; -.br - a[0] = 4.0; a[1] = 1.0; a[2] = 2.0; a[3] = 5.0; -.br - b[0] = 2.0; b[1] = 1.0; b[2] = 1.0; b[3] = 2.0; -.br - HPL_dtrsm( HplColumnMajor, HplLeft, HplUpper, -.br - HplNoTrans, HplNonUnit, 2, 2, 2.0, -.br - a, 2, b, 2 ); -.br - printf(" [%f,%f]\en", b[0], b[2]); -.br - printf("b=[%f,%f]\en", b[1], b[3]); -.br - exit(0); return(0); -.br -} -.SH SEE ALSO -.BR HPL_dgemm \ (3). diff --git a/hpl/man/man3/HPL_dtrsv.3 b/hpl/man/man3/HPL_dtrsv.3 deleted file mode 100644 index 5b74924b199d7941e19e6ef3bfc039c04c3c9dc8..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_dtrsv.3 +++ /dev/null @@ -1,121 +0,0 @@ -.TH HPL_dtrsv 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_dtrsv \- x := A^{-1} x.
-.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_dtrsv(\fR -\fB\&const enum HPL_ORDER\fR -\fI\&ORDER\fR, -\fB\&const enum HPL_UPLO\fR -\fI\&UPLO\fR, -\fB\&const enum HPL_TRANS\fR -\fI\&TRANS\fR, -\fB\&const enum HPL_DIAG\fR -\fI\&DIAG\fR, -\fB\&const int\fR -\fI\&N\fR, -\fB\&const double *\fR -\fI\&A\fR, -\fB\&const int\fR -\fI\&LDA\fR, -\fB\&double *\fR -\fI\&X\fR, -\fB\&const int\fR -\fI\&INCX\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_dtrsv\fR -solves one of the systems of equations - - A * x = b, or A^T * x = b, - -where b and x are n-element vectors and A is an n by n non-unit, or -unit, upper or lower triangular matrix. - -No test for singularity or near-singularity is included in this -routine. Such tests must be performed before calling this routine. -.SH ARGUMENTS -.TP 8 -ORDER (local input) const enum HPL_ORDER -On entry, ORDER specifies the storage format of the operands -as follows: - ORDER = HplRowMajor, - ORDER = HplColumnMajor. -.TP 8 -UPLO (local input) const enum HPL_UPLO -On entry, UPLO specifies whether the upper or lower -triangular part of the array A is to be referenced. When -UPLO==HplUpper, only the upper triangular part of A is to be -referenced, otherwise only the lower triangular part of A is -to be referenced. -.TP 8 -TRANS (local input) const enum HPL_TRANS -On entry, TRANS specifies the equations to be solved as -follows: - TRANS==HplNoTrans A * x = b, - TRANS==HplTrans A^T * x = b. -.TP 8 -DIAG (local input) const enum HPL_DIAG -On entry, DIAG specifies whether A is unit triangular or -not. When DIAG==HplUnit, A is assumed to be unit triangular, -and otherwise, A is not assumed to be unit triangular. -.TP 8 -N (local input) const int -On entry, N specifies the order of the matrix A. N must be at -least zero. -.TP 8 -A (local input) const double * -On entry, A points to an array of size equal to or greater -than LDA * n. 
Before entry with UPLO==HplUpper, the leading -n by n upper triangular part of the array A must contain the -upper triangular matrix and the strictly lower triangular -part of A is not referenced. When UPLO==HplLower on entry, -the leading n by n lower triangular part of the array A must -contain the lower triangular matrix and the strictly upper -triangular part of A is not referenced. - -Note that when DIAG==HplUnit, the diagonal elements of A -are not referenced either, but are assumed to be unity. -.TP 8 -LDA (local input) const int -On entry, LDA specifies the leading dimension of A as -declared in the calling (sub) program. LDA must be at -least MAX(1,n). -.TP 8 -X (local input/output) double * -On entry, X is an incremented array of dimension at least -( 1 + ( n - 1 ) * abs( INCX ) ) that contains the vector x. -Before entry, the incremented array X must contain the n -element right-hand side vector b. On exit, X is overwritten -with the solution vector x. -.TP 8 -INCX (local input) const int -On entry, INCX specifies the increment for the elements of X. -INCX must not be zero. -.SH EXAMPLE -\fI\&#include "hpl.h"\fR - -int main(int argc, char *argv[]) -.br -{ -.br - double a[2*2], x[2]; -.br - a[0] = 4.0; a[1] = 1.0; a[2] = 2.0; a[3] = 5.0; -.br - x[0] = 2.0; x[1] = 1.0; -.br - HPL_dtrsv( HplColumnMajor, HplLower, HplNoTrans, -.br - HplNonUnit, 2, a, 2, x, 1 ); -.br - printf("x=[%f,%f]\en", x[0], x[1]); -.br - exit(0); return(0); -.br -} -.SH SEE ALSO -.BR HPL_dger \ (3), -.BR HPL_dgemv \ (3). diff --git a/hpl/man/man3/HPL_equil.3 b/hpl/man/man3/HPL_equil.3 deleted file mode 100644 index 212654fec02d076882aab420948247b4e28aeaa7..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_equil.3 +++ /dev/null @@ -1,91 +0,0 @@ -.TH HPL_equil 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_equil \- Equilibrate U and forward the column panel L.
-.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_equil(\fR -\fB\&HPL_T_panel *\fR -\fI\&PBCST\fR, -\fB\&int *\fR -\fI\&IFLAG\fR, -\fB\&HPL_T_panel *\fR -\fI\&PANEL\fR, -\fB\&const enum HPL_TRANS\fR -\fI\&TRANS\fR, -\fB\&const int\fR -\fI\&N\fR, -\fB\&double *\fR -\fI\&U\fR, -\fB\&const int\fR -\fI\&LDU\fR, -\fB\&int *\fR -\fI\&IPLEN\fR, -\fB\&const int *\fR -\fI\&IPMAP\fR, -\fB\&const int *\fR -\fI\&IPMAPM1\fR, -\fB\&int *\fR -\fI\&IWORK\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_equil\fR -equilibrates the local pieces of U, so that on exit to -this function, pieces of U contained in every process row are of the -same size. This phase makes the rolling phase optimal. In addition, -this function probes for the column panel L and forwards it when -possible. -.SH ARGUMENTS -.TP 8 -PBCST (local input/output) HPL_T_panel * -On entry, PBCST points to the data structure containing the -panel (to be broadcast) information. -.TP 8 -IFLAG (local input/output) int * -On entry, IFLAG indicates whether or not the broadcast has -already been completed. If not, probing will occur, and the -outcome will be contained in IFLAG on exit. -.TP 8 -PANEL (local input/output) HPL_T_panel * -On entry, PANEL points to the data structure containing the -panel (to be equilibrated) information. -.TP 8 -TRANS (global input) const enum HPL_TRANS -On entry, TRANS specifies whether U is stored in transposed -or non-transposed form. -.TP 8 -N (local input) const int -On entry, N specifies the number of rows or columns of U. N -must be at least 0. -.TP 8 -U (local input/output) double * -On entry, U is an array of dimension (LDU,*) containing the -local pieces of U in each process row. -.TP 8 -LDU (local input) const int -On entry, LDU specifies the local leading dimension of U. LDU -should be at least MAX(1,IPLEN[nprow]) when U is stored in -non-transposed form, and MAX(1,N) otherwise. -.TP 8 -IPLEN (global input) int * -On entry, IPLEN is an array of dimension NPROW+1. 
This array -is such that IPLEN[i+1] - IPLEN[i] is the number of rows of U -in process IPMAP[i]. -.TP 8 -IPMAP (global input) const int * -On entry, IPMAP is an array of dimension NPROW. This array -contains the logarithmic mapping of the processes. In other -words, IPMAP[myrow] is the absolute coordinate of the sorted -process. -.TP 8 -IPMAPM1 (global input) const int * -On entry, IPMAPM1 is an array of dimension NPROW. This array -contains the inverse of the logarithmic mapping contained in -IPMAP: For i in [0.. NPROCS) IPMAPM1[IPMAP[i]] = i. -.TP 8 -IWORK (workspace) int * -On entry, IWORK is a workarray of dimension NPROW+1. -.SH SEE ALSO -.BR HPL_pdlaswp01N \ (3), -.BR HPL_pdlaswp01T \ (3). diff --git a/hpl/man/man3/HPL_fprintf.3 b/hpl/man/man3/HPL_fprintf.3 deleted file mode 100644 index c0a3c48388a4c65141a1a36e662b4eebff246860..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_fprintf.3 +++ /dev/null @@ -1,44 +0,0 @@ -.TH HPL_fprintf 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_fprintf \- fprintf + fflush wrapper. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_fprintf(\fR -\fB\&FILE *\fR -\fI\&STREAM\fR, -\fB\&const char *\fR -\fI\&FORM\fR, -\fB\&...\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_fprintf\fR -is a wrapper around fprintf flushing the output stream. -.SH ARGUMENTS -.TP 8 -STREAM (local input) FILE * -On entry, STREAM specifies the output stream. -.TP 8 -FORM (local input) const char * -On entry, FORM specifies the format, i.e., how the subsequent -arguments are converted for output. -.TP 8 - (local input) ... -On entry, ... is the list of arguments to be printed within -the format string. -.SH EXAMPLE -\fI\&#include "hpl.h"\fR - -int main(int argc, char *argv[]) -.br -{ -.br - HPL_fprintf( stdout, "Hello World.\en" ); -.br - exit(0); return(0); -.br -} -.SH SEE ALSO -.BR HPL_abort \ (3), -.BR HPL_warn \ (3). 
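The "fprintf + fflush wrapper" idea described in the HPL_fprintf page above can be sketched with the standard variadic facilities. This is an illustrative stand-in under stated assumptions, not the HPL source: the name flushed_fprintf is hypothetical, and unlike HPL_fprintf (declared void) it returns the vfprintf result so that callers can check it.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Format the variadic arguments to `stream`, then flush so the text is
 * not left sitting in the stdio buffer (useful when several processes
 * share a terminal or log file). Returns what vfprintf returned. */
int flushed_fprintf(FILE *stream, const char *form, ...)
{
    va_list args;
    int written;

    assert(stream != NULL && form != NULL);
    va_start(args, form);
    written = vfprintf(stream, form, args);
    va_end(args);
    (void) fflush(stream);
    return written;
}
```

A call such as flushed_fprintf( stdout, "Hello World.\n" ) mirrors the EXAMPLE in the page, with the flush happening before the call returns.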
diff --git a/hpl/man/man3/HPL_grid_exit.3 b/hpl/man/man3/HPL_grid_exit.3 deleted file mode 100644 index 3c4386eb7457bc000033dd385805b4850d32678b..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_grid_exit.3 +++ /dev/null @@ -1,25 +0,0 @@ -.TH HPL_grid_exit 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_grid_exit \- Exit process grid. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&int\fR -\fB\&HPL_grid_exit(\fR -\fB\&HPL_T_grid *\fR -\fI\&GRID\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_grid_exit\fR -marks the process grid object for deallocation. The -returned error code MPI_SUCCESS indicates successful completion. -Other error codes are (MPI) implementation dependent. -.SH ARGUMENTS -.TP 8 -GRID (local input/output) HPL_T_grid * -On entry, GRID points to the data structure containing the -process grid to be released. -.SH SEE ALSO -.BR HPL_pnum \ (3), -.BR HPL_grid_init \ (3), -.BR HPL_grid_info \ (3). diff --git a/hpl/man/man3/HPL_grid_info.3 b/hpl/man/man3/HPL_grid_info.3 deleted file mode 100644 index bb8566ae08ac50f8a56ad9c313a687c2d72812d6..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_grid_info.3 +++ /dev/null @@ -1,52 +0,0 @@ -.TH HPL_grid_info 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_grid_info \- Retrieve grid information. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&int\fR -\fB\&HPL_grid_info(\fR -\fB\&const HPL_T_grid *\fR -\fI\&GRID\fR, -\fB\&int *\fR -\fI\&NPROW\fR, -\fB\&int *\fR -\fI\&NPCOL\fR, -\fB\&int *\fR -\fI\&MYROW\fR, -\fB\&int *\fR -\fI\&MYCOL\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_grid_info\fR -returns the grid shape and the coordinates in the grid -of the calling process. Successful completion is indicated by the -returned error code MPI_SUCCESS. Other error codes depend on the MPI -implementation. -.SH ARGUMENTS -.TP 8 -GRID (local input) const HPL_T_grid * -On entry, GRID points to the data structure containing the -process grid information. 
-.TP 8 -NPROW (global output) int * -On exit, NPROW specifies the number of process rows in the -grid. NPROW is at least one. -.TP 8 -NPCOL (global output) int * -On exit, NPCOL specifies the number of process columns in -the grid. NPCOL is at least one. -.TP 8 -MYROW (global output) int * -On exit, MYROW specifies my row process coordinate in the -grid. MYROW is greater than or equal to zero and less than -NPROW. -.TP 8 -MYCOL (global output) int * -On exit, MYCOL specifies my column process coordinate in the -grid. MYCOL is greater than or equal to zero and less than -NPCOL. -.SH SEE ALSO -.BR HPL_pnum \ (3), -.BR HPL_grid_init \ (3), -.BR HPL_grid_exit \ (3). diff --git a/hpl/man/man3/HPL_grid_init.3 b/hpl/man/man3/HPL_grid_init.3 deleted file mode 100644 index ccc892383256d57ca1d0f0ed2104d9cbf002d2a9..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_grid_init.3 +++ /dev/null @@ -1,55 +0,0 @@ -.TH HPL_grid_init 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_grid_init \- Create a process grid. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&int\fR -\fB\&HPL_grid_init(\fR -\fB\&MPI_Comm\fR -\fI\&COMM\fR, -\fB\&const HPL_T_ORDER\fR -\fI\&ORDER\fR, -\fB\&const int\fR -\fI\&NPROW\fR, -\fB\&const int\fR -\fI\&NPCOL\fR, -\fB\&HPL_T_grid *\fR -\fI\&GRID\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_grid_init\fR -creates a NPROW x NPCOL process grid using column- or -row-major ordering from an initial collection of processes identified -by an MPI communicator. Successful completion is indicated by the -returned error code MPI_SUCCESS. Other error codes depend on the MPI -implementation. The coordinates of processes that are not part of the -grid are set to values outside of [0..NPROW) x [0..NPCOL). -.SH ARGUMENTS -.TP 8 -COMM (global/local input) MPI_Comm -On entry, COMM is the MPI communicator identifying the -initial collection of processes out of which the grid is -formed. 
-.TP 8 -ORDER (global input) const HPL_T_ORDER -On entry, ORDER specifies how the processes should be ordered -in the grid as follows: - ORDER = HPL_ROW_MAJOR row-major ordering; - ORDER = HPL_COLUMN_MAJOR column-major ordering; -.TP 8 -NPROW (global input) const int -On entry, NPROW specifies the number of process rows in the -grid to be created. NPROW must be at least one. -.TP 8 -NPCOL (global input) const int -On entry, NPCOL specifies the number of process columns in -the grid to be created. NPCOL must be at least one. -.TP 8 -GRID (local input/output) HPL_T_grid * -On entry, GRID points to the data structure containing the -process grid information to be initialized. -.SH SEE ALSO -.BR HPL_pnum \ (3), -.BR HPL_grid_info \ (3), -.BR HPL_grid_exit \ (3). diff --git a/hpl/man/man3/HPL_idamax.3 b/hpl/man/man3/HPL_idamax.3 deleted file mode 100644 index 96ec5698e71a1797fc1a497a63dbd08ed0d21c37..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_idamax.3 +++ /dev/null @@ -1,59 +0,0 @@ -.TH HPL_idamax 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_idamax \- 1st k s.t. |x_k| = max_i(|x_i|). -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&int\fR -\fB\&HPL_idamax(\fR -\fB\&const int\fR -\fI\&N\fR, -\fB\&const double *\fR -\fI\&X\fR, -\fB\&const int\fR -\fI\&INCX\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_idamax\fR -returns the index in an n-vector x of the first element -having maximum absolute value. -.SH ARGUMENTS -.TP 8 -N (local input) const int -On entry, N specifies the length of the vector x. N must be -at least zero. -.TP 8 -X (local input) const double * -On entry, X is an incremented array of dimension at least -( 1 + ( n - 1 ) * abs( INCX ) ) that contains the vector x. -.TP 8 -INCX (local input) const int -On entry, INCX specifies the increment for the elements of X. -INCX must not be zero. 
-.SH EXAMPLE -\fI\&#include "hpl.h"\fR - -int main(int argc, char *argv[]) -.br -{ -.br - double x[3]; -.br - int imax; -.br - x[0] = 1.0; x[1] = 3.0; x[2] = 2.0; -.br - imax = HPL_idamax( 3, x, 1 ); -.br - printf("imax=%d\en", imax); -.br - exit(0); -.br - return(0); -.br -} -.SH SEE ALSO -.BR HPL_daxpy \ (3), -.BR HPL_dcopy \ (3), -.BR HPL_dscal \ (3), -.BR HPL_dswap \ (3). diff --git a/hpl/man/man3/HPL_indxg2l.3 b/hpl/man/man3/HPL_indxg2l.3 deleted file mode 100644 index def26ac9c630a860564497521246698a146a8a8a..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_indxg2l.3 +++ /dev/null @@ -1,53 +0,0 @@ -.TH HPL_indxg2l 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_indxg2l \- Map a global index into a local one. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&int\fR -\fB\&HPL_indxg2l(\fR -\fB\&const int\fR -\fI\&IG\fR, -\fB\&const int\fR -\fI\&INB\fR, -\fB\&const int\fR -\fI\&NB\fR, -\fB\&const int\fR -\fI\&SRCPROC\fR, -\fB\&const int\fR -\fI\&NPROCS\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_indxg2l\fR -computes the local index of a matrix entry pointed to by -the global index IG. This local returned index is the same in all -processes. -.SH ARGUMENTS -.TP 8 -IG (input) const int -On entry, IG specifies the global index of the matrix entry. -IG must be at least zero. -.TP 8 -INB (input) const int -On entry, INB specifies the size of the first block of the -global matrix. INB must be at least one. -.TP 8 -NB (input) const int -On entry, NB specifies the blocking factor used to partition -and distribute the matrix. NB must be larger than one. -.TP 8 -SRCPROC (input) const int -On entry, if SRCPROC = -1, the data is not distributed but -replicated, in which case this routine returns IG in all -processes. Otherwise, the value of SRCPROC is ignored. -.TP 8 -NPROCS (input) const int -On entry, NPROCS specifies the total number of process rows -or columns over which the matrix is distributed. NPROCS must -be at least one. 
-.SH SEE ALSO -.BR HPL_indxg2lp \ (3), -.BR HPL_indxg2p \ (3), -.BR HPL_indxl2g \ (3), -.BR HPL_numroc \ (3), -.BR HPL_numrocI \ (3). diff --git a/hpl/man/man3/HPL_indxg2lp.3 b/hpl/man/man3/HPL_indxg2lp.3 deleted file mode 100644 index e410a00b8941b2b576689d6ce6e168535504cae3..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_indxg2lp.3 +++ /dev/null @@ -1,66 +0,0 @@ -.TH HPL_indxg2lp 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_indxg2lp \- Map a global index into a local index and a process coordinate. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_indxg2lp(\fR -\fB\&int *\fR -\fI\&IL\fR, -\fB\&int *\fR -\fI\&PROC\fR, -\fB\&const int\fR -\fI\&IG\fR, -\fB\&const int\fR -\fI\&INB\fR, -\fB\&const int\fR -\fI\&NB\fR, -\fB\&const int\fR -\fI\&SRCPROC\fR, -\fB\&const int\fR -\fI\&NPROCS\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_indxg2lp\fR -computes the local index of a matrix entry pointed to by -the global index IG as well as the process coordinate which possesses -this entry. The local returned index is the same in all processes. -.SH ARGUMENTS -.TP 8 -IL (output) int * -On exit, IL specifies the local index corresponding to IG. IL -is at least zero. -.TP 8 -PROC (output) int * -On exit, PROC is the coordinate of the process owning the -entry specified by the global index IG. PROC is at least zero -and less than NPROCS. -.TP 8 -IG (input) const int -On entry, IG specifies the global index of the matrix entry. -IG must be at least zero. -.TP 8 -INB (input) const int -On entry, INB specifies the size of the first block of the -global matrix. INB must be at least one. -.TP 8 -NB (input) const int -On entry, NB specifies the blocking factor used to partition -and distribute the matrix A. NB must be larger than one. -.TP 8 -SRCPROC (input) const int -On entry, if SRCPROC = -1, the data is not distributed but -replicated, in which case this routine returns IG in all -processes. Otherwise, the value of SRCPROC is ignored.
-.TP 8 -NPROCS (input) const int -On entry, NPROCS specifies the total number of process rows -or columns over which the matrix is distributed. NPROCS must -be at least one. -.SH SEE ALSO -.BR HPL_indxg2l \ (3), -.BR HPL_indxg2p \ (3), -.BR HPL_indxl2g \ (3), -.BR HPL_numroc \ (3), -.BR HPL_numrocI \ (3). diff --git a/hpl/man/man3/HPL_indxg2p.3 b/hpl/man/man3/HPL_indxg2p.3 deleted file mode 100644 index 248124d17f279d2ef613a7ac59fb9d8a5c39a606..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_indxg2p.3 +++ /dev/null @@ -1,52 +0,0 @@ -.TH HPL_indxg2p 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_indxg2p \- Map a global index into a process coordinate. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&int\fR -\fB\&HPL_indxg2p(\fR -\fB\&const int\fR -\fI\&IG\fR, -\fB\&const int\fR -\fI\&INB\fR, -\fB\&const int\fR -\fI\&NB\fR, -\fB\&const int\fR -\fI\&SRCPROC\fR, -\fB\&const int\fR -\fI\&NPROCS\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_indxg2p\fR -computes the process coordinate which possesses the entry -of a matrix specified by a global index IG. -.SH ARGUMENTS -.TP 8 -IG (input) const int -On entry, IG specifies the global index of the matrix entry. -IG must be at least zero. -.TP 8 -INB (input) const int -On entry, INB specifies the size of the first block of the -global matrix. INB must be at least one. -.TP 8 -NB (input) const int -On entry, NB specifies the blocking factor used to partition -and distribute the matrix A. NB must be larger than one. -.TP 8 -SRCPROC (input) const int -On entry, SRCPROC specifies the coordinate of the process -that possesses the first row or column of the matrix. SRCPROC -must be at least zero and strictly less than NPROCS. -.TP 8 -NPROCS (input) const int -On entry, NPROCS specifies the total number of process rows -or columns over which the matrix is distributed. NPROCS must -be at least one.
-.SH SEE ALSO -.BR HPL_indxg2l \ (3), -.BR HPL_indxg2lp \ (3), -.BR HPL_indxl2g \ (3), -.BR HPL_numroc \ (3), -.BR HPL_numrocI \ (3). diff --git a/hpl/man/man3/HPL_indxl2g.3 b/hpl/man/man3/HPL_indxl2g.3 deleted file mode 100644 index d0a470f00a2f2661811920858f49efec5505f56e..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_indxl2g.3 +++ /dev/null @@ -1,59 +0,0 @@ -.TH HPL_indxl2g 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_indxl2g \- Map an index-process pair into a global index. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&int\fR -\fB\&HPL_indxl2g(\fR -\fB\&const int\fR -\fI\&IL\fR, -\fB\&const int\fR -\fI\&INB\fR, -\fB\&const int\fR -\fI\&NB\fR, -\fB\&const int\fR -\fI\&PROC\fR, -\fB\&const int\fR -\fI\&SRCPROC\fR, -\fB\&const int\fR -\fI\&NPROCS\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_indxl2g\fR -computes the global index of a matrix entry pointed to -by the local index IL of the process indicated by PROC. -.SH ARGUMENTS -.TP 8 -IL (input) const int -On entry, IL specifies the local index of the matrix entry. -IL must be at least zero. -.TP 8 -INB (input) const int -On entry, INB specifies the size of the first block of the -global matrix. INB must be at least one. -.TP 8 -NB (input) const int -On entry, NB specifies the blocking factor used to partition -and distribute the matrix A. NB must be larger than one. -.TP 8 -PROC (input) const int -On entry, PROC specifies the coordinate of the process whose -local array row or column is to be determined. PROC must be -at least zero and strictly less than NPROCS. -.TP 8 -SRCPROC (input) const int -On entry, SRCPROC specifies the coordinate of the process -that possesses the first row or column of the matrix. SRCPROC -must be at least zero and strictly less than NPROCS. -.TP 8 -NPROCS (input) const int -On entry, NPROCS specifies the total number of process rows -or columns over which the matrix is distributed. NPROCS must -be at least one.
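The global-to-local, global-to-process and local-to-global maps documented in the HPL_indxg2l, HPL_indxg2p and HPL_indxl2g pages are mutually inverse, which a small standalone sketch can make concrete. The code below is not the HPL implementation: it assumes a uniform block size (the first block size INB equal to NB), a one-dimensional block-cyclic layout starting at process SRCPROC, and the hypothetical names bc_owner, bc_local and bc_global.

```c
#include <assert.h>

/* Process coordinate owning global index ig (blocks dealt out cyclically
 * starting at process srcproc). Assumes every block has size nb. */
int bc_owner(int ig, int nb, int srcproc, int nprocs)
{
    assert(ig >= 0 && nb >= 1 && nprocs >= 1);
    assert(srcproc >= 0 && srcproc < nprocs);
    return (srcproc + ig / nb) % nprocs;
}

/* Local index of global index ig inside its owning process: whole local
 * blocks that precede it, plus the offset inside its own block. */
int bc_local(int ig, int nb, int nprocs)
{
    assert(ig >= 0 && nb >= 1 && nprocs >= 1);
    return nb * (ig / (nb * nprocs)) + ig % nb;
}

/* Global index of local index il held by process proc. */
int bc_global(int il, int nb, int proc, int srcproc, int nprocs)
{
    assert(il >= 0 && nb >= 1 && nprocs >= 1);
    assert(proc >= 0 && proc < nprocs && srcproc >= 0 && srcproc < nprocs);
    return nprocs * nb * (il / nb)                  /* full rounds of blocks  */
         + ((nprocs + proc - srcproc) % nprocs) * nb /* blocks before proc's   */
         + il % nb;                                  /* offset inside block    */
}
```

Under these assumptions, with nb = 4, nprocs = 3 and srcproc = 0, global index 13 sits in block 3, which wraps around to process 0 at local index 5, and bc_global maps (5, process 0) back to 13.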
-.SH SEE ALSO -.BR HPL_indxg2l \ (3), -.BR HPL_indxg2lp \ (3), -.BR HPL_indxg2p \ (3), -.BR HPL_numroc \ (3), -.BR HPL_numrocI \ (3). diff --git a/hpl/man/man3/HPL_infog2l.3 b/hpl/man/man3/HPL_infog2l.3 deleted file mode 100644 index 33d1e60521a8ed92c13e190857b2a3512d72398a..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_infog2l.3 +++ /dev/null @@ -1,126 +0,0 @@ -.TH HPL_infog2l 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_infog2l \- global to local index translation. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_infog2l(\fR -\fB\&int\fR -\fI\&I\fR, -\fB\&int\fR -\fI\&J\fR, -\fB\&const int\fR -\fI\&IMB\fR, -\fB\&const int\fR -\fI\&MB\fR, -\fB\&const int\fR -\fI\&INB\fR, -\fB\&const int\fR -\fI\&NB\fR, -\fB\&const int\fR -\fI\&RSRC\fR, -\fB\&const int\fR -\fI\&CSRC\fR, -\fB\&const int\fR -\fI\&MYROW\fR, -\fB\&const int\fR -\fI\&MYCOL\fR, -\fB\&const int\fR -\fI\&NPROW\fR, -\fB\&const int\fR -\fI\&NPCOL\fR, -\fB\&int *\fR -\fI\&II\fR, -\fB\&int *\fR -\fI\&JJ\fR, -\fB\&int *\fR -\fI\&PROW\fR, -\fB\&int *\fR -\fI\&PCOL\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_infog2l\fR -computes the starting local index II, JJ corresponding to -the submatrix starting globally at the entry pointed by I, J. This -routine returns the coordinates in the grid of the process owning the -matrix entry of global indexes I, J, namely PROW and PCOL. -.SH ARGUMENTS -.TP 8 -I (global input) int -On entry, I specifies the global row index of the matrix -entry. I must be at least zero. -.TP 8 -J (global input) int -On entry, J specifies the global column index of the matrix -entry. J must be at least zero. -.TP 8 -IMB (global input) const int -On entry, IMB specifies the size of the first row block of -the global matrix. IMB must be at least one. -.TP 8 -MB (global input) const int -On entry, MB specifies the blocking factor used to partition -and distribute the rows of the matrix A. MB must be larger -than one. 
-.TP 8 -INB (global input) const int -On entry, INB specifies the size of the first column block of -the global matrix. INB must be at least one. -.TP 8 -NB (global input) const int -On entry, NB specifies the blocking factor used to partition -and distribute the columns of the matrix A. NB must be larger -than one. -.TP 8 -RSRC (global input) const int -On entry, RSRC specifies the row coordinate of the process -that possesses the row I. RSRC must be at least zero and -strictly less than NPROW. -.TP 8 -CSRC (global input) const int -On entry, CSRC specifies the column coordinate of the process -that possesses the column J. CSRC must be at least zero and -strictly less than NPCOL. -.TP 8 -MYROW (local input) const int -On entry, MYROW specifies my row process coordinate in the -grid. MYROW is greater than or equal to zero and less than -NPROW. -.TP 8 -MYCOL (local input) const int -On entry, MYCOL specifies my column process coordinate in the -grid. MYCOL is greater than or equal to zero and less than -NPCOL. -.TP 8 -NPROW (global input) const int -On entry, NPROW specifies the number of process rows in the -grid. NPROW is at least one. -.TP 8 -NPCOL (global input) const int -On entry, NPCOL specifies the number of process columns in -the grid. NPCOL is at least one. -.TP 8 -II (local output) int * -On exit, II specifies the local starting row index of the -submatrix. On exit, II is at least 0. -.TP 8 -JJ (local output) int * -On exit, JJ specifies the local starting column index of the -submatrix. On exit, JJ is at least 0. -.TP 8 -PROW (global output) int * -On exit, PROW is the row coordinate of the process owning the -entry specified by the global index I. PROW is at least zero -and less than NPROW. -.TP 8 -PCOL (global output) int * -On exit, PCOL is the column coordinate of the process owning -the entry specified by the global index J. PCOL is at least -zero and less than NPCOL. 
-.SH SEE ALSO -.BR HPL_indxg2l \ (3), -.BR HPL_indxg2p \ (3), -.BR HPL_indxl2g \ (3), -.BR HPL_numroc \ (3), -.BR HPL_numrocI \ (3). diff --git a/hpl/man/man3/HPL_jumpit.3 b/hpl/man/man3/HPL_jumpit.3 deleted file mode 100644 index 8f864bf94e0deecdf52d6c3af49201e2894a1be1..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_jumpit.3 +++ /dev/null @@ -1,48 +0,0 @@ -.TH HPL_jumpit 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_jumpit \- jump into the random sequence. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_jumpit(\fR -\fB\&int *\fR -\fI\&MULT\fR, -\fB\&int *\fR -\fI\&IADD\fR, -\fB\&int *\fR -\fI\&IRANN\fR, -\fB\&int *\fR -\fI\&IRANM\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_jumpit\fR -jumps in the random sequence from the number X(n) encoded -in IRANN to the number X(m) encoded in IRANM using the constants A -and C encoded in MULT and IADD: X(m) = A * X(n) + C. The constants A -and C obviously depend on m and n, see the function HPL_xjumpm in -order to initialize them. -.SH ARGUMENTS -.TP 8 -MULT (local input) int * -On entry, MULT is an array of dimension 2, that contains the -16-lower and 15-higher bits of the constant A. -.TP 8 -IADD (local input) int * -On entry, IADD is an array of dimension 2, that contains the -16-lower and 15-higher bits of the constant C. -.TP 8 -IRANN (local input) int * -On entry, IRANN is an array of dimension 2, that contains -the 16-lower and 15-higher bits of the encoding of X(n). -.TP 8 -IRANM (local output) int * -On entry, IRANM is an array of dimension 2. On exit, this -array contains respectively the 16-lower and 15-higher bits -of the encoding of X(m). -.SH SEE ALSO -.BR HPL_ladd \ (3), -.BR HPL_lmul \ (3), -.BR HPL_setran \ (3), -.BR HPL_xjumpm \ (3), -.BR HPL_rand \ (3). 
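Note on the HPL_jumpit page above: the relation X(m) = A * X(n) + C is the usual jump-ahead for a linear congruential generator. What follows is only a minimal sketch of that idea, not HPL's code. It assumes a 31-bit modulus and plain 64-bit arithmetic instead of HPL's two-int encoding, and the names lcg_step, lcg_jump_consts, lcg_jump and the multiplier/increment values are hypothetical choices made for illustration.

```c
#include <stdint.h>

/* Hypothetical generator parameters, for illustration only (not HPL's). */
#define LCG_MOD (UINT64_C(1) << 31)
#define LCG_A   UINT64_C(1103515245)
#define LCG_C   UINT64_C(12345)

/* One step of the sequence: X(n+1) = a * X(n) + c (mod 2^31). */
static uint64_t lcg_step(uint64_t x)
{
    return (LCG_A * x + LCG_C) % LCG_MOD;
}

/* Compute A and C such that X(n+k) = A * X(n) + C (mod 2^31).
 * The affine map x -> a*x + c is composed with itself by
 * square-and-multiply, so the cost is O(log k) instead of O(k). */
static void lcg_jump_consts(uint64_t k, uint64_t *A, uint64_t *C)
{
    uint64_t ra = 1, rc = 0;        /* accumulated map, starts as identity */
    uint64_t a = LCG_A, c = LCG_C;  /* map for 2^i steps */

    while (k > 0) {
        if (k & 1) {                /* fold the 2^i-step map into the result */
            ra = (a * ra) % LCG_MOD;
            rc = (a * rc + c) % LCG_MOD;
        }
        c = (a * c + c) % LCG_MOD;  /* square the map: uses the old a */
        a = (a * a) % LCG_MOD;
        k >>= 1;
    }
    *A = ra;
    *C = rc;
}

/* Apply the jump: X(m) = A * X(n) + C (mod 2^31). */
static uint64_t lcg_jump(uint64_t x, uint64_t A, uint64_t C)
{
    return (A * x + C) % LCG_MOD;
}
```

In this sketch all operands stay below 2^31, so every product fits in 64 bits. Jumping k steps with lcg_jump gives the same value as calling lcg_step k times, which is what lets each process start at its own offset in the sequence without generating the preceding numbers.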
diff --git a/hpl/man/man3/HPL_ladd.3 b/hpl/man/man3/HPL_ladd.3 deleted file mode 100644 index 799b7dfd147b4947781a5bebe9b18bc602d56af0..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_ladd.3 +++ /dev/null @@ -1,41 +0,0 @@ -.TH HPL_ladd 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_ladd \- Adds two long positive integers. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_ladd(\fR -\fB\&int *\fR -\fI\&J\fR, -\fB\&int *\fR -\fI\&K\fR, -\fB\&int *\fR -\fI\&I\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_ladd\fR -adds without carry two long positive integers K and J and -puts the result into I. The long integers I, J, K are encoded on 64 -bits using an array of 2 integers. The 32-lower bits are stored in -the first entry of each array, the 32-higher bits in the second -entry. -.SH ARGUMENTS -.TP 8 -J (local input) int * -On entry, J is an integer array of dimension 2 containing the -encoded long integer J. -.TP 8 -K (local input) int * -On entry, K is an integer array of dimension 2 containing the -encoded long integer K. -.TP 8 -I (local output) int * -On entry, I is an integer array of dimension 2. On exit, this -array contains the encoded long integer result. -.SH SEE ALSO -.BR HPL_lmul \ (3), -.BR HPL_setran \ (3), -.BR HPL_xjumpm \ (3), -.BR HPL_jumpit \ (3), -.BR HPL_rand \ (3). diff --git a/hpl/man/man3/HPL_lmul.3 b/hpl/man/man3/HPL_lmul.3 deleted file mode 100644 index 9770d4d0165331941f7319dc24fe09069d8b6b5a..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_lmul.3 +++ /dev/null @@ -1,42 +0,0 @@ -.TH HPL_lmul 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_lmul \- multiplies 2 long positive integers. 
-.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_lmul(\fR -\fB\&int *\fR -\fI\&K\fR, -\fB\&int *\fR -\fI\&J\fR, -\fB\&int *\fR -\fI\&I\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_lmul\fR -multiplies without carry two long positive integers K and J -and puts the result into I. The long integers I, J, K are encoded on -64 bits using an array of 2 integers. The 32-lower bits are stored in -the first entry of each array, the 32-higher bits in the second entry -of each array. For efficiency purposes, the intrinsic modulo function -is inlined. -.SH ARGUMENTS -.TP 8 -K (local input) int * -On entry, K is an integer array of dimension 2 containing the -encoded long integer K. -.TP 8 -J (local input) int * -On entry, J is an integer array of dimension 2 containing the -encoded long integer J. -.TP 8 -I (local output) int * -On entry, I is an integer array of dimension 2. On exit, this -array contains the encoded long integer result. -.SH SEE ALSO -.BR HPL_ladd \ (3), -.BR HPL_setran \ (3), -.BR HPL_xjumpm \ (3), -.BR HPL_jumpit \ (3), -.BR HPL_rand \ (3). diff --git a/hpl/man/man3/HPL_logsort.3 b/hpl/man/man3/HPL_logsort.3 deleted file mode 100644 index d98c76d2f0f2e176cc25061f74270b18accbfab4..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_logsort.3 +++ /dev/null @@ -1,65 +0,0 @@ -.TH HPL_logsort 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_logsort \- Sort the processes in logarithmic order. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_logsort(\fR -\fB\&const int\fR -\fI\&NPROCS\fR, -\fB\&const int\fR -\fI\&ICURROC\fR, -\fB\&int *\fR -\fI\&IPLEN\fR, -\fB\&int *\fR -\fI\&IPMAP\fR, -\fB\&int *\fR -\fI\&IPMAPM1\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_logsort\fR -computes an array IPMAP and its inverse IPMAPM1 that -contain the logarithmically sorted process ids with respect to the local -number of rows of U that they own.
This is necessary to ensure that -the logarithmic spreading of U is optimal in terms of number of steps -and communication volume as well. In other words, the largest pieces -of U will be sent a minimal number of times. -.SH ARGUMENTS -.TP 8 -NPROCS (global input) const int -On entry, NPROCS specifies the number of process rows in the -process grid. NPROCS is at least one. -.TP 8 -ICURROC (global input) const int -On entry, ICURROC is the source process row. -.TP 8 -IPLEN (global input/output) int * -On entry, IPLEN is an array of dimension NPROCS+1, such that -IPLEN[0] is 0, and IPLEN[i] contains the number of rows of U, -that process i-1 has. On exit, IPLEN[i] is the number of -rows of U in the processes before process IPMAP[i] after the -sort, with the convention that IPLEN[NPROCS] is the total -number of rows of the panel. In other words, IPLEN[i+1] - -IPLEN[i] is the number of rows of A that should be moved to -the process IPMAP[i]. IPLEN is such that the number of rows -of the source process row is IPLEN[1] - IPLEN[0], and the -remaining entries of this array are sorted so that the -quantities IPLEN[i+1]-IPLEN[i] are logarithmically sorted. -.TP 8 -IPMAP (global output) int * -On entry, IPMAP is an array of dimension NPROCS. On exit, -this array contains the logarithmic mapping of the processes. In -other words, IPMAP[myroc] is the corresponding sorted process -coordinate. -.TP 8 -IPMAPM1 (global output) int * -On entry, IPMAPM1 is an array of dimension NPROCS. On exit, -this array contains the inverse of the logarithmic mapping -contained in IPMAP: IPMAPM1[ IPMAP[i] ] = i, for all i in -[0.. NPROCS) -.SH SEE ALSO -.BR HPL_plindx1 \ (3), -.BR HPL_plindx10 \ (3), -.BR HPL_pdlaswp01N \ (3), -.BR HPL_pdlaswp01T \ (3).
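Note on the HPL_logsort page above: two of the documented invariants are easy to illustrate in isolation, namely IPMAPM1[ IPMAP[i] ] = i and the cumulative layout in which IPLEN[i+1] - IPLEN[i] is the row count of process IPMAP[i]. The sketch below shows only those two relations for an ordering that is already given. It does not reproduce the logarithmic sort itself, and the function names invert_permutation and counts_to_offsets are hypothetical.

```c
/* Build the inverse of a permutation of [0, n):
 * on return, inv[perm[i]] == i for every i. */
static void invert_permutation(int n, const int *perm, int *inv)
{
    for (int i = 0; i < n; i++)
        inv[perm[i]] = i;
}

/* Turn per-process row counts into cumulative offsets in a given order.
 * counts[p] is the number of rows owned by process p (length n).
 * offs has length n + 1; on return offs[0] == 0 and
 * offs[i+1] - offs[i] == counts[perm[i]], so offs[n] is the total. */
static void counts_to_offsets(int n, const int *counts,
                              const int *perm, int *offs)
{
    offs[0] = 0;
    for (int i = 0; i < n; i++)
        offs[i + 1] = offs[i] + counts[perm[i]];
}
```

With perm = {2, 0, 3, 1} and counts = {5, 7, 1, 3}, the offsets come out as {0, 1, 6, 9, 16}, and the last entry is the total number of rows, matching the convention stated for IPLEN[NPROCS].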
diff --git a/hpl/man/man3/HPL_max.3 b/hpl/man/man3/HPL_max.3 deleted file mode 100644 index 1d7290825122f3f4f2268036e77a0d31353e6edb..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_max.3 +++ /dev/null @@ -1,43 +0,0 @@ -.TH HPL_max 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_max \- Combine (max) two buffers. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_max(\fR -\fB\&const int\fR -\fI\&N\fR, -\fB\&const void *\fR -\fI\&IN\fR, -\fB\&void *\fR -\fI\&INOUT\fR, -\fB\&const HPL_T_TYPE\fR -\fI\&DTYPE\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_max\fR -combines (max) two buffers. -.SH ARGUMENTS -.TP 8 -N (input) const int -On entry, N specifies the length of the buffers to be -combined. N must be at least zero. -.TP 8 -IN (input) const void * -On entry, IN points to the input-only buffer to be combined. -.TP 8 -INOUT (input/output) void * -On entry, INOUT points to the input-output buffer to be -combined. On exit, the entries of this array contains the -combined results. -.TP 8 -DTYPE (input) const HPL_T_TYPE -On entry, DTYPE specifies the type of the buffers operands. -.SH SEE ALSO -.BR HPL_broadcast \ (3), -.BR HPL_reduce \ (3), -.BR HPL_all_reduce \ (3), -.BR HPL_barrier \ (3), -.BR HPL_min \ (3), -.BR HPL_sum \ (3). diff --git a/hpl/man/man3/HPL_min.3 b/hpl/man/man3/HPL_min.3 deleted file mode 100644 index 0e2bd3645b5ebee1f4e8e7f5b1b58009e5adc442..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_min.3 +++ /dev/null @@ -1,43 +0,0 @@ -.TH HPL_min 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_min \- Combine (min) two buffers. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_min(\fR -\fB\&const int\fR -\fI\&N\fR, -\fB\&const void *\fR -\fI\&IN\fR, -\fB\&void *\fR -\fI\&INOUT\fR, -\fB\&const HPL_T_TYPE\fR -\fI\&DTYPE\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_min\fR -combines (min) two buffers. 
-.SH ARGUMENTS -.TP 8 -N (input) const int -On entry, N specifies the length of the buffers to be -combined. N must be at least zero. -.TP 8 -IN (input) const void * -On entry, IN points to the input-only buffer to be combined. -.TP 8 -INOUT (input/output) void * -On entry, INOUT points to the input-output buffer to be -combined. On exit, the entries of this array contains the -combined results. -.TP 8 -DTYPE (input) const HPL_T_TYPE -On entry, DTYPE specifies the type of the buffers operands. -.SH SEE ALSO -.BR HPL_broadcast \ (3), -.BR HPL_reduce \ (3), -.BR HPL_all_reduce \ (3), -.BR HPL_barrier \ (3), -.BR HPL_max \ (3), -.BR HPL_sum \ (3). diff --git a/hpl/man/man3/HPL_numroc.3 b/hpl/man/man3/HPL_numroc.3 deleted file mode 100644 index 0f6393966fb380f47cc42caca38a614021ce39e3..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_numroc.3 +++ /dev/null @@ -1,60 +0,0 @@ -.TH HPL_numroc 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_numroc \- Compute the local number of row/columns. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&int\fR -\fB\&HPL_numroc(\fR -\fB\&const int\fR -\fI\&N\fR, -\fB\&const int\fR -\fI\&INB\fR, -\fB\&const int\fR -\fI\&NB\fR, -\fB\&const int\fR -\fI\&PROC\fR, -\fB\&const int\fR -\fI\&SRCPROC\fR, -\fB\&const int\fR -\fI\&NPROCS\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_numroc\fR -returns the local number of matrix rows/columns process -PROC will get if we give out N rows/columns starting from global -index 0. -.SH ARGUMENTS -.TP 8 -N (input) const int -On entry, N specifies the number of rows/columns being dealt -out. N must be at least zero. -.TP 8 -INB (input) const int -On entry, INB specifies the size of the first block of the -global matrix. INB must be at least one. -.TP 8 -NB (input) const int -On entry, NB specifies the blocking factor used to partition -and distribute the matrix A. NB must be larger than one. 
-.TP 8 -PROC (input) const int -On entry, PROC specifies the coordinate of the process whose -local portion is determined. PROC must be at least zero and -strictly less than NPROCS. -.TP 8 -SRCPROC (input) const int -On entry, SRCPROC specifies the coordinate of the process -that possesses the first row or column of the matrix. SRCPROC -must be at least zero and strictly less than NPROCS. -.TP 8 -NPROCS (input) const int -On entry, NPROCS specifies the total number of process rows -or columns over which the matrix is distributed. NPROCS must -be at least one. -.SH SEE ALSO -.BR HPL_indxg2l \ (3), -.BR HPL_indxg2lp \ (3), -.BR HPL_indxg2p \ (3), -.BR HPL_indxl2g \ (3), -.BR HPL_numrocI \ (3). diff --git a/hpl/man/man3/HPL_numrocI.3 b/hpl/man/man3/HPL_numrocI.3 deleted file mode 100644 index 7925843fef5c8759ea9af663127982b9357b5bbb..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_numrocI.3 +++ /dev/null @@ -1,66 +0,0 @@ -.TH HPL_numrocI 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_numrocI \- Compute the local number of rows/columns. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&int\fR -\fB\&HPL_numrocI(\fR -\fB\&const int\fR -\fI\&N\fR, -\fB\&const int\fR -\fI\&I\fR, -\fB\&const int\fR -\fI\&INB\fR, -\fB\&const int\fR -\fI\&NB\fR, -\fB\&const int\fR -\fI\&PROC\fR, -\fB\&const int\fR -\fI\&SRCPROC\fR, -\fB\&const int\fR -\fI\&NPROCS\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_numrocI\fR -returns the local number of matrix rows/columns process -PROC will get if we give out N rows/columns starting from global -index I. -.SH ARGUMENTS -.TP 8 -N (input) const int -On entry, N specifies the number of rows/columns being dealt -out. N must be at least zero. -.TP 8 -I (input) const int -On entry, I specifies the global index of the matrix entry. -I must be at least zero. -.TP 8 -INB (input) const int -On entry, INB specifies the size of the first block of the -global matrix. INB must be at least one.
-.TP 8 -NB (input) const int -On entry, NB specifies the blocking factor used to partition -and distribute the matrix A. NB must be larger than one. -.TP 8 -PROC (input) const int -On entry, PROC specifies the coordinate of the process whose -local portion is determined. PROC must be at least zero and -strictly less than NPROCS. -.TP 8 -SRCPROC (input) const int -On entry, SRCPROC specifies the coordinate of the process -that possesses the first row or column of the matrix. SRCPROC -must be at least zero and strictly less than NPROCS. -.TP 8 -NPROCS (input) const int -On entry, NPROCS specifies the total number of process rows -or columns over which the matrix is distributed. NPROCS must -be at least one. -.SH SEE ALSO -.BR HPL_indxg2l \ (3), -.BR HPL_indxg2lp \ (3), -.BR HPL_indxg2p \ (3), -.BR HPL_indxl2g \ (3), -.BR HPL_numroc \ (3). diff --git a/hpl/man/man3/HPL_pabort.3 b/hpl/man/man3/HPL_pabort.3 deleted file mode 100644 index 396c513776efc839f39dbad75a0095090701199d..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_pabort.3 +++ /dev/null @@ -1,40 +0,0 @@ -.TH HPL_pabort 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_pabort \- halts execution. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_pabort(\fR -\fB\&int\fR -\fI\&LINE\fR, -\fB\&const char *\fR -\fI\&SRNAME\fR, -\fB\&const char *\fR -\fI\&FORM\fR, -\fB\&...\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_pabort\fR -displays an error message on stderr and halts execution. -.SH ARGUMENTS -.TP 8 -LINE (local input) int -On entry, LINE specifies the line number in the file where -the error has occurred. When LINE is not a positive line -number, it is ignored. -.TP 8 -SRNAME (local input) const char * -On entry, SRNAME should be the name of the routine calling -this error handler. -.TP 8 -FORM (local input) const char * -On entry, FORM specifies the format, i.e., how the subsequent -arguments are converted for output. -.TP 8 - (local input) ... -On entry, ...
is the list of arguments to be printed within -the format string. -.SH SEE ALSO -.BR HPL_fprintf \ (3), -.BR HPL_pwarn \ (3). diff --git a/hpl/man/man3/HPL_packL.3 b/hpl/man/man3/HPL_packL.3 deleted file mode 100644 index 3b249c22997c9d4edb8d0947ed7b432fc1c8c493..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_packL.3 +++ /dev/null @@ -1,42 +0,0 @@ -.TH HPL_packL 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_packL \- Form the MPI structure for the row ring broadcasts. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&int\fR -\fB\&HPL_packL(\fR -\fB\&HPL_T_panel *\fR -\fI\&PANEL\fR, -\fB\&const int\fR -\fI\&INDEX\fR, -\fB\&const int\fR -\fI\&LEN\fR, -\fB\&const int\fR -\fI\&IBUF\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_packL\fR -forms the MPI data type for the panel to be broadcast. -Successful completion is indicated by the returned error code -MPI_SUCCESS. -.SH ARGUMENTS -.TP 8 -PANEL (input/output) HPL_T_panel * -On entry, PANEL points to the current panel data structure -being broadcast. -.TP 8 -INDEX (input) const int -On entry, INDEX points to the first entry of the packed -buffer being broadcast. -.TP 8 -LEN (input) const int -On entry, LEN is the length of the packed buffer. -.TP 8 -IBUF (input) const int -On entry, IBUF specifies the panel buffer/count/type entries -that should be initialized. -.SH SEE ALSO -.BR HPL_binit \ (3), -.BR HPL_bcast \ (3), -.BR HPL_bwait \ (3). diff --git a/hpl/man/man3/HPL_pddriver.3 b/hpl/man/man3/HPL_pddriver.3 deleted file mode 100644 index ae10f26a0da8f405b08919c85d9c9c6edaf0c428..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_pddriver.3 +++ /dev/null @@ -1,15 +0,0 @@ -.TH main 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -main \- HPL main timing program. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&int\fR -\fB\&main();\fR -.SH DESCRIPTION -\fB\&main\fR -is the main driver program for testing the HPL routines. 
-This program is driven by a short data file named "HPL.dat". -.SH SEE ALSO -.BR HPL_pdinfo \ (3), -.BR HPL_pdtest \ (3). diff --git a/hpl/man/man3/HPL_pdfact.3 b/hpl/man/man3/HPL_pdfact.3 deleted file mode 100644 index ed969f5d4befa297e0614d736053abee3cdcf147..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_pdfact.3 +++ /dev/null @@ -1,64 +0,0 @@ -.TH HPL_pdfact 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_pdfact \- recursive panel factorization. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_pdfact(\fR -\fB\&HPL_T_panel *\fR -\fI\&PANEL\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_pdfact\fR -recursively factorizes a 1-dimensional panel of columns. -The RPFACT function pointer specifies the recursive algorithm to be -used, either Crout, Left- or Right-looking. NBMIN allows one to vary the -recursive stopping criterion in terms of the number of columns in the -panel, and NDIV allows one to specify the number of subpanels each panel -should be divided into. Usually a value of 2 will be chosen. Finally -PFACT is a function pointer specifying the non-recursive algorithm -to be used on at most NBMIN columns. One can also choose here between -Crout, Left- or Right-looking. Empirical tests seem to indicate that -values of 4 or 8 for NBMIN give the best results. - -Bi-directional exchange is used to perform the swap::broadcast -operations at once for one column in the panel. This results in a -lower number of slightly larger messages than usual. On P processes -and assuming bi-directional links, the running time of this function -can be approximated by (when N is equal to N0): - - N0 * log_2( P ) * ( lat + ( 2*N0 + 4 ) / bdwth ) + - N0^2 * ( M - N0/3 ) * gam2-3 - -where M is the local number of rows of the panel, lat and bdwth are -the latency and bandwidth of the network for double precision real -words, and gam2-3 is an estimate of the Level 2 and Level 3 BLAS -rate of execution.
The recursive algorithm allows indeed to almost -achieve Level 3 BLAS performance in the panel factorization. On a -large number of modern machines, this operation is however latency -bound, meaning that its cost can be estimated by only the latency -portion N0 * log_2(P) * lat. Mono-directional links will double this -communication cost. -.SH ARGUMENTS -.TP 8 -PANEL (local input/output) HPL_T_panel * -On entry, PANEL points to the data structure containing the -panel information. -.SH SEE ALSO -.BR HPL_dlocmax \ (3), -.BR HPL_dlocswpN \ (3), -.BR HPL_dlocswpT \ (3), -.BR HPL_pdmxswp \ (3), -.BR HPL_pdpancrN \ (3), -.BR HPL_pdpancrT \ (3), -.BR HPL_pdpanllN \ (3), -.BR HPL_pdpanllT \ (3), -.BR HPL_pdpanrlN \ (3), -.BR HPL_pdpanrlT \ (3), -.BR HPL_pdrpancrN \ (3), -.BR HPL_pdrpancrT \ (3), -.BR HPL_pdrpanllN \ (3), -.BR HPL_pdrpanllT \ (3), -.BR HPL_pdrpanrlN \ (3), -.BR HPL_pdrpanrlT \ (3). diff --git a/hpl/man/man3/HPL_pdgesv.3 b/hpl/man/man3/HPL_pdgesv.3 deleted file mode 100644 index 5a73758e5761ad6125c11db011dc9ca0c677072f..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_pdgesv.3 +++ /dev/null @@ -1,40 +0,0 @@ -.TH HPL_pdgesv 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_pdgesv \- Solve A x = b. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_pdgesv(\fR -\fB\&HPL_T_grid *\fR -\fI\&GRID\fR, -\fB\&HPL_T_palg *\fR -\fI\&ALGO\fR, -\fB\&HPL_T_pmat *\fR -\fI\&A\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_pdgesv\fR -factors a N+1-by-N matrix using LU factorization with row -partial pivoting. The main algorithm is the "right looking" variant -with or without look-ahead. The lower triangular factor is left -unpivoted and the pivots are not returned. The right hand side is the -N+1 column of the coefficient matrix. -.SH ARGUMENTS -.TP 8 -GRID (local input) HPL_T_grid * -On entry, GRID points to the data structure containing the -process grid information. 
-.TP 8 -ALGO (global input) HPL_T_palg * -On entry, ALGO points to the data structure containing the -algorithmic parameters. -.TP 8 -A (local input/output) HPL_T_pmat * -On entry, A points to the data structure containing the local -array information. -.SH SEE ALSO -.BR HPL_pdgesv0 \ (3), -.BR HPL_pdgesvK1 \ (3), -.BR HPL_pdgesvK2 \ (3), -.BR HPL_pdtrsv \ (3). diff --git a/hpl/man/man3/HPL_pdgesv0.3 b/hpl/man/man3/HPL_pdgesv0.3 deleted file mode 100644 index 6cff49dfe3fa12d32ff9725286436c0c0ba9a7a1..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_pdgesv0.3 +++ /dev/null @@ -1,47 +0,0 @@ -.TH HPL_pdgesv0 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_pdgesv0 \- Factor an N x N+1 matrix. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_pdgesv0(\fR -\fB\&HPL_T_grid *\fR -\fI\&GRID\fR, -\fB\&HPL_T_palg *\fR -\fI\&ALGO\fR, -\fB\&HPL_T_pmat *\fR -\fI\&A\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_pdgesv0\fR -factors a N+1-by-N matrix using LU factorization with row -partial pivoting. The main algorithm is the "right looking" variant -without look-ahead. The lower triangular factor is left unpivoted and -the pivots are not returned. The right hand side is the N+1 column of -the coefficient matrix. -.SH ARGUMENTS -.TP 8 -GRID (local input) HPL_T_grid * -On entry, GRID points to the data structure containing the -process grid information. -.TP 8 -ALGO (global input) HPL_T_palg * -On entry, ALGO points to the data structure containing the -algorithmic parameters. -.TP 8 -A (local input/output) HPL_T_pmat * -On entry, A points to the data structure containing the local -array information. -.SH SEE ALSO -.BR HPL_pdgesv \ (3), -.BR HPL_pdgesvK1 \ (3), -.BR HPL_pdgesvK2 \ (3), -.BR HPL_pdfact \ (3), -.BR HPL_binit \ (3), -.BR HPL_bcast \ (3), -.BR HPL_bwait \ (3), -.BR HPL_pdupdateNN \ (3), -.BR HPL_pdupdateNT \ (3), -.BR HPL_pdupdateTN \ (3), -.BR HPL_pdupdateTT \ (3). 
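Note on the HPL_pdgesv and HPL_pdgesv0 pages above: the scheme they describe, right-looking LU with row partial pivoting applied to a matrix whose extra column holds the right-hand side, can be sketched serially. The code below is an illustration under stated assumptions, not HPL's distributed implementation: it assumes an N-by-(N+1) column-major array with leading dimension N, no blocking, no look-ahead and no MPI, and the names lu_solve_augmented and absd are hypothetical.

```c
/* Absolute value helper, kept local so the sketch needs no math library. */
static double absd(double v)
{
    return v < 0.0 ? -v : v;
}

/* Solve A x = b by right-looking LU with row partial pivoting.
 * a is an n-by-(n+1) column-major array (leading dimension n):
 * columns 0..n-1 hold A and column n holds b; a is overwritten.
 * Row swaps are applied only to columns k..n, so the multipliers
 * already stored to the left stay unpivoted and no pivot vector is kept.
 * Returns 0 on success, -1 if an exactly zero pivot is met. */
static int lu_solve_augmented(int n, double *a, double *x)
{
    for (int k = 0; k < n; k++) {
        /* Find the entry of largest magnitude in column k, rows k..n-1. */
        int p = k;
        for (int i = k + 1; i < n; i++)
            if (absd(a[i + k * n]) > absd(a[p + k * n]))
                p = i;
        if (a[p + k * n] == 0.0)
            return -1;

        /* Swap rows k and p in columns k..n (right-hand side included). */
        if (p != k) {
            for (int j = k; j <= n; j++) {
                double t = a[k + j * n];
                a[k + j * n] = a[p + j * n];
                a[p + j * n] = t;
            }
        }

        /* Store multipliers in column k, then update the trailing block. */
        for (int i = k + 1; i < n; i++) {
            double l = a[i + k * n] / a[k + k * n];
            a[i + k * n] = l;
            for (int j = k + 1; j <= n; j++)
                a[i + j * n] -= l * a[k + j * n];
        }
    }

    /* Back substitution with the upper triangle and the updated column n. */
    for (int i = n - 1; i >= 0; i--) {
        double s = a[i + n * n];
        for (int j = i + 1; j < n; j++)
            s -= a[i + j * n] * x[j];
        x[i] = s / a[i + i * n];
    }
    return 0;
}
```

Because the right-hand side rides along as the last column, the forward elimination is done by the same updates that factor the matrix, and only an upper triangular solve remains at the end. That remaining step is the role the pages above assign to HPL_pdtrsv in the parallel code.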
diff --git a/hpl/man/man3/HPL_pdgesvK1.3 b/hpl/man/man3/HPL_pdgesvK1.3 deleted file mode 100644 index 8bd8545eb1e3d1e5f2e93d4fc703e94c3bac4b35..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_pdgesvK1.3 +++ /dev/null @@ -1,46 +0,0 @@ -.TH HPL_pdgesvK1 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_pdgesvK1 \- Factor an N x N+1 matrix. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_pdgesvK1(\fR -\fB\&HPL_T_grid *\fR -\fI\&GRID\fR, -\fB\&HPL_T_palg *\fR -\fI\&ALGO\fR, -\fB\&HPL_T_pmat *\fR -\fI\&A\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_pdgesvK1\fR -factors a N+1-by-N matrix using LU factorization with row -partial pivoting. The main algorithm is the "right looking" variant -with look-ahead. The lower triangular factor is left unpivoted and -the pivots are not returned. The right hand side is the N+1 column of -the coefficient matrix. -.SH ARGUMENTS -.TP 8 -GRID (local input) HPL_T_grid * -On entry, GRID points to the data structure containing the -process grid information. -.TP 8 -ALGO (global input) HPL_T_palg * -On entry, ALGO points to the data structure containing the -algorithmic parameters. -.TP 8 -A (local input/output) HPL_T_pmat * -On entry, A points to the data structure containing the local -array information. -.SH SEE ALSO -.BR HPL_pdgesv \ (3), -.BR HPL_pdgesvK2 \ (3), -.BR HPL_pdfact \ (3), -.BR HPL_binit \ (3), -.BR HPL_bcast \ (3), -.BR HPL_bwait \ (3), -.BR HPL_pdupdateNN \ (3), -.BR HPL_pdupdateNT \ (3), -.BR HPL_pdupdateTN \ (3), -.BR HPL_pdupdateTT \ (3). diff --git a/hpl/man/man3/HPL_pdgesvK2.3 b/hpl/man/man3/HPL_pdgesvK2.3 deleted file mode 100644 index 2184436070c8c70706f436f9275110b98df51130..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_pdgesvK2.3 +++ /dev/null @@ -1,47 +0,0 @@ -.TH HPL_pdgesvK2 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_pdgesvK2 \- Factor an N x N+1 matrix. 
-.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_pdgesvK2(\fR -\fB\&HPL_T_grid *\fR -\fI\&GRID\fR, -\fB\&HPL_T_palg *\fR -\fI\&ALGO\fR, -\fB\&HPL_T_pmat *\fR -\fI\&A\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_pdgesvK2\fR -factors a N+1-by-N matrix using LU factorization with row -partial pivoting. The main algorithm is the "right looking" variant -with look-ahead. The lower triangular factor is left unpivoted and -the pivots are not returned. The right hand side is the N+1 column of -the coefficient matrix. -.SH ARGUMENTS -.TP 8 -GRID (local input) HPL_T_grid * -On entry, GRID points to the data structure containing the -process grid information. -.TP 8 -ALGO (global input) HPL_T_palg * -On entry, ALGO points to the data structure containing the -algorithmic parameters. -.TP 8 -A (local input/output) HPL_T_pmat * -On entry, A points to the data structure containing the local -array information. -.SH SEE ALSO -.BR HPL_pdgesv \ (3), -.BR HPL_pdgesv0 \ (3), -.BR HPL_pdgesvK1 \ (3), -.BR HPL_pdfact \ (3), -.BR HPL_binit \ (3), -.BR HPL_bcast \ (3), -.BR HPL_bwait \ (3), -.BR HPL_pdupdateNN \ (3), -.BR HPL_pdupdateNT \ (3), -.BR HPL_pdupdateTN \ (3), -.BR HPL_pdupdateTT \ (3). diff --git a/hpl/man/man3/HPL_pdinfo.3 b/hpl/man/man3/HPL_pdinfo.3 deleted file mode 100644 index 23c98592e42bb9e90b113e0776e878d008f994c4..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_pdinfo.3 +++ /dev/null @@ -1,212 +0,0 @@ -.TH HPL_pdinfo 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_pdinfo \- Read input parameter file. 
-.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_pdinfo(\fR -\fB\&HPL_T_test *\fR -\fI\&TEST\fR, -\fB\&int *\fR -\fI\&NS\fR, -\fB\&int *\fR -\fI\&N\fR, -\fB\&int *\fR -\fI\&NBS\fR, -\fB\&int *\fR -\fI\&NB\fR, -\fB\&HPL_T_ORDER *\fR -\fI\&PMAPPIN\fR, -\fB\&int *\fR -\fI\&NPQS\fR, -\fB\&int *\fR -\fI\&P\fR, -\fB\&int *\fR -\fI\&Q\fR, -\fB\&int *\fR -\fI\&NPFS\fR, -\fB\&HPL_T_FACT *\fR -\fI\&PF\fR, -\fB\&int *\fR -\fI\&NBMS\fR, -\fB\&int *\fR -\fI\&NBM\fR, -\fB\&int *\fR -\fI\&NDVS\fR, -\fB\&int *\fR -\fI\&NDV\fR, -\fB\&int *\fR -\fI\&NRFS\fR, -\fB\&HPL_T_FACT *\fR -\fI\&RF\fR, -\fB\&int *\fR -\fI\&NTPS\fR, -\fB\&HPL_T_TOP *\fR -\fI\&TP\fR, -\fB\&int *\fR -\fI\&NDHS\fR, -\fB\&int *\fR -\fI\&DH\fR, -\fB\&HPL_T_SWAP *\fR -\fI\&FSWAP\fR, -\fB\&int *\fR -\fI\&TSWAP\fR, -\fB\&int *\fR -\fI\&L1NOTRAN\fR, -\fB\&int *\fR -\fI\&UNOTRAN\fR, -\fB\&int *\fR -\fI\&EQUIL\fR, -\fB\&int *\fR -\fI\&ALIGN\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_pdinfo\fR -reads the startup information for the various tests and -transmits it to all processes. -.SH ARGUMENTS -.TP 8 -TEST (global output) HPL_T_test * -On entry, TEST points to a testing data structure. On exit, -the fields of this data structure are initialized as follows: -TEST->outfp specifies the output file where the results will -be printed. It is only defined and used by the process 0 of -the grid. TEST->thrsh specifies the threshhold value for the -test ratio. TEST->epsil is the relative machine precision of -the distributed computer. Finally the test counters, kfail, -kpass, kskip, ktest are initialized to zero. -.TP 8 -NS (global output) int * -On exit, NS specifies the number of different problem sizes -to be tested. NS is less than or equal to HPL_MAX_PARAM. -.TP 8 -N (global output) int * -On entry, N is an array of dimension HPL_MAX_PARAM. On exit, -the first NS entries of this array contain the problem sizes -to run the code with. 
-.TP 8 -NBS (global output) int * -On exit, NBS specifies the number of different distribution -blocking factors to be tested. NBS must be less than or equal -to HPL_MAX_PARAM. -.TP 8 -NB (global output) int * -On entry, NB is an array of dimension HPL_MAX_PARAM. On exit, -the first NBS entries of this array contain the values of the -various distribution blocking factors, to run the code with. -.TP 8 -PMAPPIN (global output) HPL_T_ORDER * -On exit, PMAPPIN specifies the process mapping onto the -nodes of the MPI machine configuration. PMAPPIN defaults to -row-major ordering. -.TP 8 -NPQS (global output) int * -On exit, NPQS specifies the number of different values that -can be used for P and Q, i.e., the number of process grids to -run the code with. NPQS must be less than or equal to -HPL_MAX_PARAM. -.TP 8 -P (global output) int * -On entry, P is an array of dimension HPL_MAX_PARAM. On exit, -the first NPQS entries of this array contain the values of P, -the number of process rows of the NPQS grids to run the code -with. -.TP 8 -Q (global output) int * -On entry, Q is an array of dimension HPL_MAX_PARAM. On exit, -the first NPQS entries of this array contain the values of Q, -the number of process columns of the NPQS grids to run the -code with. -.TP 8 -NPFS (global output) int * -On exit, NPFS specifies the number of different values that -can be used for PF : the panel factorization algorithm to run -the code with. NPFS is less than or equal to HPL_MAX_PARAM. -.TP 8 -PF (global output) HPL_T_FACT * -On entry, PF is an array of dimension HPL_MAX_PARAM. On exit, -the first NPFS entries of this array contain the various -panel factorization algorithms to run the code with. -.TP 8 -NBMS (global output) int * -On exit, NBMS specifies the number of various recursive -stopping criteria to be tested. NBMS must be less than or -equal to HPL_MAX_PARAM. -.TP 8 -NBM (global output) int * -On entry, NBM is an array of dimension HPL_MAX_PARAM.
On -exit, the first NBMS entries of this array contain the values -of the various recursive stopping criteria to be tested. -.TP 8 -NDVS (global output) int * -On exit, NDVS specifies the number of various numbers of -panels in recursion to be tested. NDVS is less than or equal -to HPL_MAX_PARAM. -.TP 8 -NDV (global output) int * -On entry, NDV is an array of dimension HPL_MAX_PARAM. On -exit, the first NDVS entries of this array contain the values -of the various numbers of panels in recursion to be tested. -.TP 8 -NRFS (global output) int * -On exit, NRFS specifies the number of different values that -can be used for RF : the recursive factorization algorithm to -be tested. NRFS is less than or equal to HPL_MAX_PARAM. -.TP 8 -RF (global output) HPL_T_FACT * -On entry, RF is an array of dimension HPL_MAX_PARAM. On exit, -the first NRFS entries of this array contain the various -recursive factorization algorithms to run the code with. -.TP 8 -NTPS (global output) int * -On exit, NTPS specifies the number of different values that -can be used for the broadcast topologies to be tested. NTPS -is less than or equal to HPL_MAX_PARAM. -.TP 8 -TP (global output) HPL_T_TOP * -On entry, TP is an array of dimension HPL_MAX_PARAM. On exit, -the first NTPS entries of this array contain the various -broadcast (along rows) topologies to run the code with. -.TP 8 -NDHS (global output) int * -On exit, NDHS specifies the number of different values that -can be used for the lookahead depths to be tested. NDHS is -less than or equal to HPL_MAX_PARAM. -.TP 8 -DH (global output) int * -On entry, DH is an array of dimension HPL_MAX_PARAM. On -exit, the first NDHS entries of this array contain the values -of lookahead depths to run the code with. Such a value is at -least 0 (no-lookahead) or greater than zero. -.TP 8 -FSWAP (global output) HPL_T_SWAP * -On exit, FSWAP specifies the swapping algorithm to be used in -all tests. 
-.TP 8 -TSWAP (global output) int * -On exit, TSWAP specifies the swapping threshold as a number -of columns when the mixed swapping algorithm was chosen. -.TP 8 -L1NOTRAN (global output) int * -On exit, L1NOTRAN specifies whether the upper triangle of the -panels of columns should be stored in no-transposed form -(L1NOTRAN=1) or in transposed form (L1NOTRAN=0). -.TP 8 -UNOTRAN (global output) int * -On exit, UNOTRAN specifies whether the panels of rows should -be stored in no-transposed form (UNOTRAN=1) or transposed -form (UNOTRAN=0) during their broadcast. -.TP 8 -EQUIL (global output) int * -On exit, EQUIL specifies whether equilibration during the -swap-broadcast of the panel of rows should be performed -(EQUIL=1) or not (EQUIL=0). -.TP 8 -ALIGN (global output) int * -On exit, ALIGN specifies the alignment of the dynamically -allocated buffers in double precision words. ALIGN is greater -than zero. -.SH SEE ALSO -.BR HPL_pddriver \ (3), -.BR HPL_pdtest \ (3). diff --git a/hpl/man/man3/HPL_pdlamch.3 b/hpl/man/man3/HPL_pdlamch.3 deleted file mode 100644 index 8329a44443fc187a69d2485ce2df3a31029d9aed..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_pdlamch.3 +++ /dev/null @@ -1,53 +0,0 @@ -.TH HPL_pdlamch 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_pdlamch \- determines machine-specific arithmetic constants.
-.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&double\fR -\fB\&HPL_pdlamch(\fR -\fB\&MPI_Comm\fR -\fI\&COMM\fR, -\fB\&const HPL_T_MACH\fR -\fI\&CMACH\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_pdlamch\fR -determines machine-specific arithmetic constants such as -the relative machine precision (eps), the safe minimum (sfmin) such that -1/sfmin does not overflow, the base of the machine (base), the precision -(prec), the number of (base) digits in the mantissa (t), whether -rounding occurs in addition (rnd = 1.0 if it does, 0.0 otherwise), the minimum -exponent before (gradual) underflow (emin), the underflow threshold -(rmin = base**(emin-1)), the largest exponent before overflow (emax), and the -overflow threshold (rmax = (base**emax)*(1-eps)). -.SH ARGUMENTS -.TP 8 -COMM (global/local input) MPI_Comm -The MPI communicator identifying the process collection. -.TP 8 -CMACH (global input) const HPL_T_MACH -Specifies the value to be returned by HPL_pdlamch - = HPL_MACH_EPS, HPL_pdlamch := eps (default) - = HPL_MACH_SFMIN, HPL_pdlamch := sfmin - = HPL_MACH_BASE, HPL_pdlamch := base - = HPL_MACH_PREC, HPL_pdlamch := eps*base - = HPL_MACH_MLEN, HPL_pdlamch := t - = HPL_MACH_RND, HPL_pdlamch := rnd - = HPL_MACH_EMIN, HPL_pdlamch := emin - = HPL_MACH_RMIN, HPL_pdlamch := rmin - = HPL_MACH_EMAX, HPL_pdlamch := emax - = HPL_MACH_RMAX, HPL_pdlamch := rmax - -where - - eps = relative machine precision, - sfmin = safe minimum, - base = base of the machine, - prec = eps*base, - t = number of digits in the mantissa, - rnd = 1.0 if rounding occurs in addition, - emin = minimum exponent before underflow, - rmin = underflow threshold, - emax = largest exponent before overflow, - rmax = overflow threshold.
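As a rough illustration of the quantities listed above, most of them can be read from the C standard header float.h. This is a minimal sketch under stated assumptions, not the HPL_pdlamch implementation: the names mach_sel and mach_const are hypothetical, only local (single-process) values are shown, and eps is taken to be DBL_EPSILON, whereas conventions such as LAPACK's dlamch report half of that value when rounding is to nearest.

```c
#include <assert.h>
#include <float.h>

/* Hypothetical selector, loosely mirroring the CMACH values listed above. */
typedef enum {
    MC_EPS, MC_SFMIN, MC_BASE, MC_PREC, MC_MLEN,
    MC_RND, MC_EMIN, MC_RMIN, MC_EMAX, MC_RMAX
} mach_sel;

/* Return one machine constant for type double, taken from <float.h>.
 * Assumption: eps is DBL_EPSILON (spacing between 1.0 and the next double). */
double mach_const(mach_sel which)
{
    switch (which) {
    case MC_EPS:  return DBL_EPSILON;
    case MC_BASE: return (double)FLT_RADIX;
    case MC_PREC: return DBL_EPSILON * (double)FLT_RADIX;  /* eps*base */
    case MC_MLEN: return (double)DBL_MANT_DIG;             /* digits in mantissa */
    case MC_RND:  return (FLT_ROUNDS == 1) ? 1.0 : 0.0;    /* 1 = to nearest */
    case MC_EMIN: return (double)DBL_MIN_EXP;
    case MC_RMIN: return DBL_MIN;                          /* base**(emin-1) */
    case MC_EMAX: return (double)DBL_MAX_EXP;
    case MC_RMAX: return DBL_MAX;
    case MC_SFMIN: {
        /* Smallest value whose reciprocal does not overflow. */
        double sfmin = DBL_MIN;
        double small = 1.0 / DBL_MAX;
        if (small >= sfmin)
            sfmin = small * (1.0 + DBL_EPSILON);
        return sfmin;
    }
    default:
        assert(!"unknown selector");
        return 0.0;
    }
}
```

A call such as mach_const(MC_PREC) returns eps*base, matching the prec entry in the table above. A distributed version would additionally have to agree on one value across the processes in COMM; how HPL does that is not shown here.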
diff --git a/hpl/man/man3/HPL_pdlange.3 b/hpl/man/man3/HPL_pdlange.3 deleted file mode 100644 index c629defeea398860aa1dff314d9b29fabc77ba5c..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_pdlange.3 +++ /dev/null @@ -1,68 +0,0 @@ -.TH HPL_pdlange 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_pdlange \- Compute ||A||. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&double\fR -\fB\&HPL_pdlange(\fR -\fB\&const HPL_T_grid *\fR -\fI\&GRID\fR, -\fB\&const HPL_T_NORM\fR -\fI\&NORM\fR, -\fB\&const int\fR -\fI\&M\fR, -\fB\&const int\fR -\fI\&N\fR, -\fB\&const int\fR -\fI\&NB\fR, -\fB\&const double *\fR -\fI\&A\fR, -\fB\&const int\fR -\fI\&LDA\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_pdlange\fR -returns the value of the one norm, or the infinity norm, -or the element of largest absolute value of a distributed matrix A: - - - max(abs(A(i,j))) when NORM = HPL_NORM_A, - norm1(A), when NORM = HPL_NORM_1, - normI(A), when NORM = HPL_NORM_I, - -where norm1 denotes the one norm of a matrix (maximum column sum) and -normI denotes the infinity norm of a matrix (maximum row sum). Note -that max(abs(A(i,j))) is not a matrix norm. -.SH ARGUMENTS -.TP 8 -GRID (local input) const HPL_T_grid * -On entry, GRID points to the data structure containing the -process grid information. -.TP 8 -NORM (global input) const HPL_T_NORM -On entry, NORM specifies the value to be returned by this -function as described above. -.TP 8 -M (global input) const int -On entry, M specifies the number of rows of the matrix A. -M must be at least zero. -.TP 8 -N (global input) const int -On entry, N specifies the number of columns of the matrix A. -N must be at least zero. -.TP 8 -NB (global input) const int -On entry, NB specifies the blocking factor used to partition -and distribute the matrix. NB must be larger than one. 
-.TP 8 -A (local input) const double * -On entry, A points to an array of dimension (LDA,LocQ(N)), -that contains the local pieces of the distributed matrix A. -.TP 8 -LDA (local input) const int -On entry, LDA specifies the leading dimension of the array A. -LDA must be at least max(1,LocP(M)). -.SH SEE ALSO -.BR HPL_pdlaprnt \ (3), -.BR HPL_fprintf \ (3). diff --git a/hpl/man/man3/HPL_pdlaprnt.3 b/hpl/man/man3/HPL_pdlaprnt.3 deleted file mode 100644 index 16894f5d59e4f8c841c318e8a2ea8b90a82da5de..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_pdlaprnt.3 +++ /dev/null @@ -1,72 +0,0 @@ -.TH HPL_pdlaprnt 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_pdlaprnt \- Print a distributed matrix A. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_pdlaprnt(\fR -\fB\&const HPL_T_grid *\fR -\fI\&GRID\fR, -\fB\&const int\fR -\fI\&M\fR, -\fB\&const int\fR -\fI\&N\fR, -\fB\&const int\fR -\fI\&NB\fR, -\fB\&double *\fR -\fI\&A\fR, -\fB\&const int\fR -\fI\&LDA\fR, -\fB\&const int\fR -\fI\&IAROW\fR, -\fB\&const int\fR -\fI\&IACOL\fR, -\fB\&const char *\fR -\fI\&CMATNM\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_pdlaprnt\fR -prints to standard error a distributed matrix A. The -local pieces of A are sent to the process of coordinates (0,0) in -the grid and then printed. -.SH ARGUMENTS -.TP 8 -GRID (local input) const HPL_T_grid * -On entry, GRID points to the data structure containing the -process grid information. -.TP 8 -M (global input) const int -On entry, M specifies the number of rows of the coefficient -matrix A. M must be at least zero. -.TP 8 -N (global input) const int -On entry, N specifies the number of columns of the -coefficient matrix A. N must be at least zero. -.TP 8 -NB (global input) const int -On entry, NB specifies the blocking factor used to partition -and distribute the matrix. NB must be larger than one. -.TP 8 -A (local input) double * -On entry, A points to an array of dimension (LDA,LocQ(N)). 
-This array contains the coefficient matrix to be printed. -.TP 8 -LDA (local input) const int -On entry, LDA specifies the leading dimension of the array A. -LDA must be at least max(1,LocP(M)). -.TP 8 -IAROW (global input) const int -On entry, IAROW specifies the row process coordinate owning -the first row of A. IAROW must be larger than or equal to -zero and less than NPROW. -.TP 8 -IACOL (global input) const int -On entry, IACOL specifies the column process coordinate -owning the first column of A. IACOL must be larger than or -equal to zero and less than NPCOL. -.TP 8 -CMATNM (global input) const char * -On entry, CMATNM is the name of the matrix to be printed. -.SH SEE ALSO -.BR HPL_fprintf \ (3). diff --git a/hpl/man/man3/HPL_pdlaswp00N.3 b/hpl/man/man3/HPL_pdlaswp00N.3 deleted file mode 100644 index 31ec27c4098f62fd37387d3dbb9c7576e2357048..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_pdlaswp00N.3 +++ /dev/null @@ -1,65 +0,0 @@ -.TH HPL_pdlaswp00N 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_pdlaswp00N \- Broadcast a column panel L and swap the row panel U. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_pdlaswp00N(\fR -\fB\&HPL_T_panel *\fR -\fI\&PBCST\fR, -\fB\&int *\fR -\fI\&IFLAG\fR, -\fB\&HPL_T_panel *\fR -\fI\&PANEL\fR, -\fB\&const int\fR -\fI\&NN\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_pdlaswp00N\fR -applies the NB row interchanges to NN columns of the -trailing submatrix and broadcast a column panel. - -Bi-directional exchange is used to perform the swap :: broadcast of -the row panel U at once, resulting in a lower number of messages than -usual as well as a lower communication volume. 
With P process rows and -assuming bi-directional links, the running time of this function can -be approximated by: - - log_2(P) * (lat + NB*LocQ(N) / bdwth) - -where NB is the number of rows of the row panel U, N is the global -number of columns being updated, lat and bdwth are the latency and -bandwidth of the network for double precision real words. Mono -directional links will double this communication cost. -.SH ARGUMENTS -.TP 8 -PBCST (local input/output) HPL_T_panel * -On entry, PBCST points to the data structure containing the -panel (to be broadcast) information. -.TP 8 -IFLAG (local input/output) int * -On entry, IFLAG indicates whether or not the broadcast has -already been completed. If not, probing will occur, and the -outcome will be contained in IFLAG on exit. -.TP 8 -PANEL (local input/output) HPL_T_panel * -On entry, PANEL points to the data structure containing the -panel (to be broadcast and swapped) information. -.TP 8 -NN (local input) const int -On entry, NN specifies the local number of columns of the -trailing submatrix to be swapped and broadcast starting at -the current position. NN must be at least zero. -.SH SEE ALSO -.BR HPL_pdgesv \ (3), -.BR HPL_pdgesvK2 \ (3), -.BR HPL_pdupdateNN \ (3), -.BR HPL_pdupdateTN \ (3), -.BR HPL_pipid \ (3), -.BR HPL_plindx0 \ (3), -.BR HPL_dlaswp01N \ (3), -.BR HPL_dlaswp02N \ (3), -.BR HPL_dlaswp03N \ (3), -.BR HPL_dlaswp04N \ (3), -.BR HPL_dlaswp05N \ (3). diff --git a/hpl/man/man3/HPL_pdlaswp00T.3 b/hpl/man/man3/HPL_pdlaswp00T.3 deleted file mode 100644 index 96430c572bdcf5b3c4e25baca1c600290f580a4e..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_pdlaswp00T.3 +++ /dev/null @@ -1,65 +0,0 @@ -.TH HPL_pdlaswp00T 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_pdlaswp00T \- Broadcast a column panel L and swap the row panel U. 
-.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_pdlaswp00T(\fR -\fB\&HPL_T_panel *\fR -\fI\&PBCST\fR, -\fB\&int *\fR -\fI\&IFLAG\fR, -\fB\&HPL_T_panel *\fR -\fI\&PANEL\fR, -\fB\&const int\fR -\fI\&NN\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_pdlaswp00T\fR -applies the NB row interchanges to NN columns of the -trailing submatrix and broadcast a column panel. - -Bi-directional exchange is used to perform the swap :: broadcast of -the row panel U at once, resulting in a lower number of messages than -usual as well as a lower communication volume. With P process rows and -assuming bi-directional links, the running time of this function can -be approximated by: - - log_2(P) * (lat + NB*LocQ(N) / bdwth) - -where NB is the number of rows of the row panel U, N is the global -number of columns being updated, lat and bdwth are the latency and -bandwidth of the network for double precision real words. Mono -directional links will double this communication cost. -.SH ARGUMENTS -.TP 8 -PBCST (local input/output) HPL_T_panel * -On entry, PBCST points to the data structure containing the -panel (to be broadcast) information. -.TP 8 -IFLAG (local input/output) int * -On entry, IFLAG indicates whether or not the broadcast has -already been completed. If not, probing will occur, and the -outcome will be contained in IFLAG on exit. -.TP 8 -PANEL (local input/output) HPL_T_panel * -On entry, PANEL points to the data structure containing the -panel (to be broadcast and swapped) information. -.TP 8 -NN (local input) const int -On entry, NN specifies the local number of columns of the -trailing submatrix to be swapped and broadcast starting at -the current position. NN must be at least zero. 
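The running-time estimate in the DESCRIPTION can be evaluated numerically. The following is a minimal sketch, not part of HPL: the function names are hypothetical, lat is assumed to be in seconds and bdwth in double precision words per second, and log_2(P) is rounded up to a whole number of exchange rounds, which is exact only when P is a power of two.

```c
#include <assert.h>

/* Number of exchange rounds: log_2(P) rounded up (assumption of this sketch). */
int exchange_rounds(int nprow)
{
    int rounds = 0;
    long long span = 1;

    assert(nprow >= 1);
    while (span < nprow) {
        span *= 2;
        rounds++;
    }
    return rounds;
}

/* Estimate rounds * (lat + NB*LocQ(N) / bdwth).
 * The result is doubled when the links are mono-directional. */
double swap_bcast_time(int nprow, int nb, int locq_n,
                       double lat, double bdwth, int bidirectional)
{
    double per_round;

    assert(nb >= 0 && locq_n >= 0 && bdwth > 0.0);
    per_round = lat + ((double)nb * (double)locq_n) / bdwth;
    return (bidirectional ? 1.0 : 2.0) * exchange_rounds(nprow) * per_round;
}
```

For example, with P = 4, NB = 2, LocQ(N) = 10, lat = 1 and bdwth = 20, each of the two rounds costs 1 + 20/20 = 2, giving 4 for bi-directional links and 8 for mono-directional ones.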
-.SH SEE ALSO -.BR HPL_pdgesv \ (3), -.BR HPL_pdgesvK2 \ (3), -.BR HPL_pdupdateNT \ (3), -.BR HPL_pdupdateTT \ (3), -.BR HPL_pipid \ (3), -.BR HPL_plindx0 \ (3), -.BR HPL_dlaswp01T \ (3), -.BR HPL_dlaswp02N \ (3), -.BR HPL_dlaswp03T \ (3), -.BR HPL_dlaswp04T \ (3), -.BR HPL_dlaswp05T \ (3). diff --git a/hpl/man/man3/HPL_pdlaswp01N.3 b/hpl/man/man3/HPL_pdlaswp01N.3 deleted file mode 100644 index 4e8d9d68ea2a17583e15c0530bf42cb87fbaaaed..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_pdlaswp01N.3 +++ /dev/null @@ -1,69 +0,0 @@ -.TH HPL_pdlaswp01N 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_pdlaswp01N \- Broadcast a column panel L and swap the row panel U. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_pdlaswp01N(\fR -\fB\&HPL_T_panel *\fR -\fI\&PBCST\fR, -\fB\&int *\fR -\fI\&IFLAG\fR, -\fB\&HPL_T_panel *\fR -\fI\&PANEL\fR, -\fB\&const int\fR -\fI\&NN\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_pdlaswp01N\fR -applies the NB row interchanges to NN columns of the -trailing submatrix and broadcast a column panel. - -A "Spread then roll" algorithm performs the swap :: broadcast of the -row panel U at once, resulting in a minimal communication volume and -a "very good" use of the connectivity if available. With P process -rows and assuming bi-directional links, the running time of this -function can be approximated by: - - (log_2(P)+(P-1)) * lat + K * NB * LocQ(N) / bdwth - -where NB is the number of rows of the row panel U, N is the global -number of columns being updated, lat and bdwth are the latency and -bandwidth of the network for double precision real words. K is -a constant in (2,3] that depends on the achieved bandwidth during a -simultaneous message exchange between two processes. An empirical -optimistic value of K is typically 2.4. -.SH ARGUMENTS -.TP 8 -PBCST (local input/output) HPL_T_panel * -On entry, PBCST points to the data structure containing the -panel (to be broadcast) information. 
-.TP 8 -IFLAG (local input/output) int * -On entry, IFLAG indicates whether or not the broadcast has -already been completed. If not, probing will occur, and the -outcome will be contained in IFLAG on exit. -.TP 8 -PANEL (local input/output) HPL_T_panel * -On entry, PANEL points to the data structure containing the -panel information. -.TP 8 -NN (local input) const int -On entry, NN specifies the local number of columns of the -trailing submatrix to be swapped and broadcast starting at -the current position. NN must be at least zero. -.SH SEE ALSO -.BR HPL_pdgesv \ (3), -.BR HPL_pdgesvK2 \ (3), -.BR HPL_pdupdateNN \ (3), -.BR HPL_pdupdateTN \ (3), -.BR HPL_pipid \ (3), -.BR HPL_plindx1 \ (3), -.BR HPL_plindx10 \ (3), -.BR HPL_spreadN \ (3), -.BR HPL_equil \ (3), -.BR HPL_rollN \ (3), -.BR HPL_dlaswp00N \ (3), -.BR HPL_dlaswp01N \ (3), -.BR HPL_dlaswp06N \ (3). diff --git a/hpl/man/man3/HPL_pdlaswp01T.3 b/hpl/man/man3/HPL_pdlaswp01T.3 deleted file mode 100644 index 92b874ee3d87fac447fc434b1d265634039364ff..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_pdlaswp01T.3 +++ /dev/null @@ -1,69 +0,0 @@ -.TH HPL_pdlaswp01T 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_pdlaswp01T \- Broadcast a column panel L and swap the row panel U. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_pdlaswp01T(\fR -\fB\&HPL_T_panel *\fR -\fI\&PBCST\fR, -\fB\&int *\fR -\fI\&IFLAG\fR, -\fB\&HPL_T_panel *\fR -\fI\&PANEL\fR, -\fB\&const int\fR -\fI\&NN\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_pdlaswp01T\fR -applies the NB row interchanges to NN columns of the -trailing submatrix and broadcast a column panel. - -A "Spread then roll" algorithm performs the swap :: broadcast of the -row panel U at once, resulting in a minimal communication volume and -a "very good" use of the connectivity if available. 
With P process -rows and assuming bi-directional links, the running time of this -function can be approximated by: - - (log_2(P)+(P-1)) * lat + K * NB * LocQ(N) / bdwth - -where NB is the number of rows of the row panel U, N is the global -number of columns being updated, lat and bdwth are the latency and -bandwidth of the network for double precision real words. K is -a constant in (2,3] that depends on the achieved bandwidth during a -simultaneous message exchange between two processes. An empirical -optimistic value of K is typically 2.4. -.SH ARGUMENTS -.TP 8 -PBCST (local input/output) HPL_T_panel * -On entry, PBCST points to the data structure containing the -panel (to be broadcast) information. -.TP 8 -IFLAG (local input/output) int * -On entry, IFLAG indicates whether or not the broadcast has -already been completed. If not, probing will occur, and the -outcome will be contained in IFLAG on exit. -.TP 8 -PANEL (local input/output) HPL_T_panel * -On entry, PANEL points to the data structure containing the -panel information. -.TP 8 -NN (local input) const int -On entry, NN specifies the local number of columns of the -trailing submatrix to be swapped and broadcast starting at -the current position. NN must be at least zero. -.SH SEE ALSO -.BR HPL_pdgesv \ (3), -.BR HPL_pdgesvK2 \ (3), -.BR HPL_pdupdateNT \ (3), -.BR HPL_pdupdateTT \ (3), -.BR HPL_pipid \ (3), -.BR HPL_plindx1 \ (3), -.BR HPL_plindx10 \ (3), -.BR HPL_spreadT \ (3), -.BR HPL_equil \ (3), -.BR HPL_rollT \ (3), -.BR HPL_dlaswp10N \ (3), -.BR HPL_dlaswp01T \ (3), -.BR HPL_dlaswp06T \ (3). diff --git a/hpl/man/man3/HPL_pdmatgen.3 b/hpl/man/man3/HPL_pdmatgen.3 deleted file mode 100644 index add74f94642c037262d4228a3d4f50e78548c2e8..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_pdmatgen.3 +++ /dev/null @@ -1,67 +0,0 @@ -.TH HPL_pdmatgen 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_pdmatgen \- Parallel random matrix generator. 
-.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_pdmatgen(\fR -\fB\&const HPL_T_grid *\fR -\fI\&GRID\fR, -\fB\&const int\fR -\fI\&M\fR, -\fB\&const int\fR -\fI\&N\fR, -\fB\&const int\fR -\fI\&NB\fR, -\fB\&double *\fR -\fI\&A\fR, -\fB\&const int\fR -\fI\&LDA\fR, -\fB\&const int\fR -\fI\&ISEED\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_pdmatgen\fR -generates (or regenerates) a parallel random matrix A. - -The pseudo-random generator uses the linear congruential algorithm: -X(n+1) = (a * X(n) + c) mod m as described in the Art of Computer -Programming, Knuth 1973, Vol. 2. -.SH ARGUMENTS -.TP 8 -GRID (local input) const HPL_T_grid * -On entry, GRID points to the data structure containing the -process grid information. -.TP 8 -M (global input) const int -On entry, M specifies the number of rows of the matrix A. -M must be at least zero. -.TP 8 -N (global input) const int -On entry, N specifies the number of columns of the matrix A. -N must be at least zero. -.TP 8 -NB (global input) const int -On entry, NB specifies the blocking factor used to partition -and distribute the matrix A. NB must be larger than one. -.TP 8 -A (local output) double * -On entry, A points to an array of dimension (LDA,LocQ(N)). -On exit, this array contains the coefficients of the randomly -generated matrix. -.TP 8 -LDA (local input) const int -On entry, LDA specifies the leading dimension of the array A. -LDA must be at least max(1,LocP(M)). -.TP 8 -ISEED (global input) const int -On entry, ISEED specifies the seed number to generate the -matrix A. ISEED must be at least zero. -.SH SEE ALSO -.BR HPL_ladd \ (3), -.BR HPL_lmul \ (3), -.BR HPL_setran \ (3), -.BR HPL_xjumpm \ (3), -.BR HPL_jumpit \ (3), -.BR HPL_drand \ (3). 
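The recurrence X(n+1) = (a * X(n) + c) mod m quoted in the DESCRIPTION can be sketched as follows. This is an illustration only: the multiplier and increment are those of Knuth's MMIX generator with m = 2^64, chosen for the example and not the constants HPL uses, and all function names are hypothetical. The sketch also shows the general jump-ahead idea (advancing n steps in O(log n) operations) on which routines with names like those in SEE ALSO are commonly built; it is not taken from their source.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constants (Knuth's MMIX generator); NOT the values HPL uses.
 * Arithmetic on uint64_t wraps around, which gives "mod 2^64" for free. */
#define LCG_A 6364136223846793005ULL
#define LCG_C 1442695040888963407ULL

/* An affine map x -> a*x + c (mod 2^64). */
typedef struct { uint64_t a, c; } affine;

/* One step of the generator. */
uint64_t lcg_next(uint64_t x)
{
    return LCG_A * x + LCG_C;
}

/* Compose the one-step map with itself n times by repeated squaring. */
affine lcg_power(uint64_t n)
{
    affine result = { 1u, 0u };          /* identity map */
    affine base   = { LCG_A, LCG_C };    /* one step */

    while (n > 0) {
        if (n & 1u) {                    /* result = base o result */
            result.c = base.a * result.c + base.c;
            result.a = base.a * result.a;
        }
        base.c = base.a * base.c + base.c;   /* base = base o base */
        base.a = base.a * base.a;
        n >>= 1;
    }
    return result;
}

/* Advance the state x by n steps without iterating n times. */
uint64_t lcg_jump(uint64_t x, uint64_t n)
{
    affine f = lcg_power(n);
    return f.a * x + f.c;
}

/* Self-check: jumping 1000 steps equals stepping 1000 times. */
void lcg_self_check(void)
{
    uint64_t x = 42u;
    int i;

    for (i = 0; i < 1000; i++)
        x = lcg_next(x);
    assert(lcg_jump(42u, 1000u) == x);
}
```

Jump-ahead of this kind lets a process start its part of the sequence at an arbitrary offset, so that the same global matrix is produced regardless of how it is distributed; whether HPL organizes its generator exactly this way is an assumption here.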
diff --git a/hpl/man/man3/HPL_pdmxswp.3 b/hpl/man/man3/HPL_pdmxswp.3 deleted file mode 100644 index cdbacf87a1353da8e8adb379b06e59842237b4b1..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_pdmxswp.3 +++ /dev/null @@ -1,78 +0,0 @@ -.TH HPL_pdmxswp 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_pdmxswp \- swaps and broadcasts the pivot row. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_pdmxswp(\fR -\fB\&HPL_T_panel *\fR -\fI\&PANEL\fR, -\fB\&const int\fR -\fI\&M\fR, -\fB\&const int\fR -\fI\&II\fR, -\fB\&const int\fR -\fI\&JJ\fR, -\fB\&double *\fR -\fI\&WORK\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_pdmxswp\fR -swaps and broadcasts the absolute value max row using -bi-directional exchange. The buffer is partially set by HPL_dlocmax. - -Bi-directional exchange is used to perform the swap::broadcast -operations at once for one column in the panel. This results in a -lower number of slightly larger messages than usual. On P processes -and assuming bi-directional links, the running time of this function -can be approximated by - - log_2( P ) * ( lat + ( 2 * N0 + 4 ) / bdwth ) - -where lat and bdwth are the latency and bandwidth of the network for -double precision real elements. Communication only occurs in one -process column. Mono-directional links will cause the communication -cost to double. -.SH ARGUMENTS -.TP 8 -PANEL (local input/output) HPL_T_panel * -On entry, PANEL points to the data structure containing the -panel information. -.TP 8 -M (local input) const int -On entry, M specifies the local number of rows of the matrix -column on which this function operates. -.TP 8 -II (local input) const int -On entry, II specifies the row offset where the column to be -operated on starts with respect to the panel. -.TP 8 -JJ (local input) const int -On entry, JJ specifies the column offset where the column to -be operated on starts with respect to the panel.
-.TP 8 -WORK (local workspace) double * -On entry, WORK is a workarray of size at least 2 * (4+2*N0). -It is assumed that HPL_dlocmax was called prior to this -routine to initialize the first four entries of this array. -On exit, the N0 length max row is stored in WORK[4:4+N0-1]; -Note that this is also the JJth row (or column) of L1. The -remaining part is used as a temporary array. -.SH SEE ALSO -.BR HPL_dlocmax \ (3), -.BR HPL_dlocswpN \ (3), -.BR HPL_dlocswpT \ (3), -.BR HPL_pdpancrN \ (3), -.BR HPL_pdpancrT \ (3), -.BR HPL_pdpanllN \ (3), -.BR HPL_pdpanllT \ (3), -.BR HPL_pdpanrlN \ (3), -.BR HPL_pdpanrlT \ (3), -.BR HPL_pdrpancrN \ (3), -.BR HPL_pdrpancrT \ (3), -.BR HPL_pdrpanllN \ (3), -.BR HPL_pdrpanllT \ (3), -.BR HPL_pdrpanrlN \ (3), -.BR HPL_pdrpanrlT \ (3), -.BR HPL_pdfact \ (3). diff --git a/hpl/man/man3/HPL_pdpancrN.3 b/hpl/man/man3/HPL_pdpancrN.3 deleted file mode 100644 index 742c740bf36057fdb9916f34219c7ffb7c83adfa..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_pdpancrN.3 +++ /dev/null @@ -1,82 +0,0 @@ -.TH HPL_pdpancrN 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_pdpancrN \- Crout panel factorization. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_pdpancrN(\fR -\fB\&HPL_T_panel *\fR -\fI\&PANEL\fR, -\fB\&const int\fR -\fI\&M\fR, -\fB\&const int\fR -\fI\&N\fR, -\fB\&const int\fR -\fI\&ICOFF\fR, -\fB\&double *\fR -\fI\&WORK\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_pdpancrN\fR -factorizes a panel of columns that is a sub-array of a -larger one-dimensional panel A using the Crout variant of the usual -one-dimensional algorithm. The lower triangular N0-by-N0 upper block -of the panel is stored in no-transpose form (i.e. just like the input -matrix itself). - -Bi-directional exchange is used to perform the swap::broadcast -operations at once for one column in the panel. This results in a -lower number of slightly larger messages than usual. 
On P processes -and assuming bi-directional links, the running time of this function -can be approximated by (when N is equal to N0): - - N0 * log_2( P ) * ( lat + ( 2*N0 + 4 ) / bdwth ) + - N0^2 * ( M - N0/3 ) * gam2-3 - -where M is the local number of rows of the panel, lat and bdwth are -the latency and bandwidth of the network for double precision real -words, and gam2-3 is an estimate of the Level 2 and Level 3 BLAS -rate of execution. The recursive algorithm indeed makes it possible to almost -achieve Level 3 BLAS performance in the panel factorization. On a -large number of modern machines, this operation is however latency -bound, meaning that its cost can be estimated by only the latency -portion N0 * log_2(P) * lat. Mono-directional links will double this -communication cost. - -Note that one iteration of the main loop is unrolled. The local -computation of the absolute value max of the next column is performed -just after its update by the current column. This makes it possible to bring the -current column through cache only once at each step. The current -implementation does not perform any blocking for this sequence of -BLAS operations; however, the design allows for plugging in an optimal -(machine-specific) specialized BLAS-like kernel. This idea has been -suggested to us by Fred Gustavson, IBM T.J. Watson Research Center. -.SH ARGUMENTS -.TP 8 -PANEL (local input/output) HPL_T_panel * -On entry, PANEL points to the data structure containing the -panel information. -.TP 8 -M (local input) const int -On entry, M specifies the local number of rows of sub(A). -.TP 8 -N (local input) const int -On entry, N specifies the local number of columns of sub(A). -.TP 8 -ICOFF (global input) const int -On entry, ICOFF specifies the row and column offset of sub(A) -in A. -.TP 8 -WORK (local workspace) double * -On entry, WORK is a workarray of size at least 2*(4+2*N0).
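The cost estimate and the WORK size requirement stated above can be written out as a small calculation. This is a hedged sketch, not HPL code: the function names are hypothetical, gam2-3 is interpreted as seconds per floating-point operation (an assumption, since the text calls it a rate), lat is in seconds, bdwth is in double precision words per second, and log_2(P) is rounded up to a whole number of exchange rounds.

```c
#include <assert.h>
#include <stddef.h>

/* Minimum length of WORK in doubles: 2*(4+2*N0). */
size_t panel_work_len(int n0)
{
    assert(n0 >= 0);
    return 2u * (4u + 2u * (size_t)n0);
}

/* log_2(P) rounded up; exact when P is a power of two (assumption). */
int panel_rounds(int nproc)
{
    int rounds = 0;
    long long span = 1;

    assert(nproc >= 1);
    while (span < nproc) {
        span *= 2;
        rounds++;
    }
    return rounds;
}

/* N0*log_2(P)*(lat + (2*N0+4)/bdwth) + N0^2*(M - N0/3)*gam23 */
double panel_fact_time(int nproc, int m, int n0,
                       double lat, double bdwth, double gam23)
{
    double comm, comp;

    assert(m >= 0 && n0 >= 0 && bdwth > 0.0);
    comm = (double)n0 * panel_rounds(nproc)
         * (lat + (2.0 * n0 + 4.0) / bdwth);
    comp = (double)n0 * (double)n0 * ((double)m - n0 / 3.0) * gam23;
    return comm + comp;
}

/* Latency-only estimate for the latency-bound case: N0*log_2(P)*lat. */
double panel_latency_bound(int nproc, int n0, double lat)
{
    assert(n0 >= 0);
    return (double)n0 * panel_rounds(nproc) * lat;
}
```

With P = 4, M = 100, N0 = 3, lat = 1, bdwth = 10 and gam2-3 = 1, the communication term is 3 * 2 * (1 + 10/10) = 12 and the computation term is 9 * (100 - 1) = 891. Doubling the communication term would model mono-directional links.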
-.SH SEE ALSO -.BR HPL_dlocmax \ (3), -.BR HPL_dlocswpN \ (3), -.BR HPL_dlocswpT \ (3), -.BR HPL_pdmxswp \ (3), -.BR HPL_pdpancrT \ (3), -.BR HPL_pdpanllN \ (3), -.BR HPL_pdpanllT \ (3), -.BR HPL_pdpanrlN \ (3), -.BR HPL_pdpanrlT \ (3). diff --git a/hpl/man/man3/HPL_pdpancrT.3 b/hpl/man/man3/HPL_pdpancrT.3 deleted file mode 100644 index ce3a3971cea097bb4beff1ff7c66e37146ca5858..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_pdpancrT.3 +++ /dev/null @@ -1,81 +0,0 @@ -.TH HPL_pdpancrT 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_pdpancrT \- Crout panel factorization. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_pdpancrT(\fR -\fB\&HPL_T_panel *\fR -\fI\&PANEL\fR, -\fB\&const int\fR -\fI\&M\fR, -\fB\&const int\fR -\fI\&N\fR, -\fB\&const int\fR -\fI\&ICOFF\fR, -\fB\&double *\fR -\fI\&WORK\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_pdpancrT\fR -factorizes a panel of columns that is a sub-array of a -larger one-dimensional panel A using the Crout variant of the usual -one-dimensional algorithm. The lower triangular N0-by-N0 upper block -of the panel is stored in transpose form. - -Bi-directional exchange is used to perform the swap::broadcast -operations at once for one column in the panel. This results in a -lower number of slightly larger messages than usual. On P processes -and assuming bi-directional links, the running time of this function -can be approximated by (when N is equal to N0): - - N0 * log_2( P ) * ( lat + ( 2*N0 + 4 ) / bdwth ) + - N0^2 * ( M - N0/3 ) * gam2-3 - -where M is the local number of rows of the panel, lat and bdwth are -the latency and bandwidth of the network for double precision real -words, and gam2-3 is an estimate of the Level 2 and Level 3 BLAS -rate of execution. The recursive algorithm allows indeed to almost -achieve Level 3 BLAS performance in the panel factorization. 
On a -large number of modern machines, this operation is however latency -bound, meaning that its cost can be estimated by only the latency -portion N0 * log_2(P) * lat. Mono-directional links will double this -communication cost. - -Note that one iteration of the main loop is unrolled. The local -computation of the absolute value max of the next column is performed -just after its update by the current column. This makes it possible to bring the -current column through cache only once at each step. The current -implementation does not perform any blocking for this sequence of -BLAS operations; however, the design allows for plugging in an optimal -(machine-specific) specialized BLAS-like kernel. This idea has been -suggested to us by Fred Gustavson, IBM T.J. Watson Research Center. -.SH ARGUMENTS -.TP 8 -PANEL (local input/output) HPL_T_panel * -On entry, PANEL points to the data structure containing the -panel information. -.TP 8 -M (local input) const int -On entry, M specifies the local number of rows of sub(A). -.TP 8 -N (local input) const int -On entry, N specifies the local number of columns of sub(A). -.TP 8 -ICOFF (global input) const int -On entry, ICOFF specifies the row and column offset of sub(A) -in A. -.TP 8 -WORK (local workspace) double * -On entry, WORK is a workarray of size at least 2*(4+2*N0). -.SH SEE ALSO -.BR HPL_dlocmax \ (3), -.BR HPL_dlocswpN \ (3), -.BR HPL_dlocswpT \ (3), -.BR HPL_pdmxswp \ (3), -.BR HPL_pdpancrN \ (3), -.BR HPL_pdpanllN \ (3), -.BR HPL_pdpanllT \ (3), -.BR HPL_pdpanrlN \ (3), -.BR HPL_pdpanrlT \ (3). diff --git a/hpl/man/man3/HPL_pdpanel_disp.3 b/hpl/man/man3/HPL_pdpanel_disp.3 deleted file mode 100644 index 56669bc1521bf926875817b1b7ebe7d199389b3e..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_pdpanel_disp.3 +++ /dev/null @@ -1,24 +0,0 @@ -.TH HPL_pdpanel_disp 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_pdpanel_disp \- Deallocate a panel data structure.
-.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&int\fR -\fB\&HPL_pdpanel_disp(\fR -\fB\&HPL_T_panel * *\fR -\fI\&PANEL\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_pdpanel_disp\fR -deallocates the panel structure and resources and -stores the error code returned by the panel factorization. -.SH ARGUMENTS -.TP 8 -PANEL (local input/output) HPL_T_panel * * -On entry, PANEL points to the address of the panel data -structure to be deallocated. -.SH SEE ALSO -.BR HPL_pdpanel_new \ (3), -.BR HPL_pdpanel_init \ (3), -.BR HPL_pdpanel_free \ (3). diff --git a/hpl/man/man3/HPL_pdpanel_free.3 b/hpl/man/man3/HPL_pdpanel_free.3 deleted file mode 100644 index f43ddc8421d39aa852cfa0988a77de2483296338..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_pdpanel_free.3 +++ /dev/null @@ -1,24 +0,0 @@ -.TH HPL_pdpanel_free 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_pdpanel_free \- Deallocate the panel resources. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&int\fR -\fB\&HPL_pdpanel_free(\fR -\fB\&HPL_T_panel *\fR -\fI\&PANEL\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_pdpanel_free\fR -deallocates the panel resources and stores the error -code returned by the panel factorization. -.SH ARGUMENTS -.TP 8 -PANEL (local input/output) HPL_T_panel * -On entry, PANEL points to the panel data structure from -which the resources should be deallocated. -.SH SEE ALSO -.BR HPL_pdpanel_new \ (3), -.BR HPL_pdpanel_init \ (3), -.BR HPL_pdpanel_disp \ (3). diff --git a/hpl/man/man3/HPL_pdpanel_init.3 b/hpl/man/man3/HPL_pdpanel_init.3 deleted file mode 100644 index 148cb4f88bbf99ab3fbda33711fbf657a1d1769d..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_pdpanel_init.3 +++ /dev/null @@ -1,76 +0,0 @@ -.TH HPL_pdpanel_init 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_pdpanel_init \- Initialize the panel resources.
-.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_pdpanel_init(\fR -\fB\&HPL_T_grid *\fR -\fI\&GRID\fR, -\fB\&HPL_T_palg *\fR -\fI\&ALGO\fR, -\fB\&const int\fR -\fI\&M\fR, -\fB\&const int\fR -\fI\&N\fR, -\fB\&const int\fR -\fI\&JB\fR, -\fB\&HPL_T_pmat *\fR -\fI\&A\fR, -\fB\&const int\fR -\fI\&IA\fR, -\fB\&const int\fR -\fI\&JA\fR, -\fB\&const int\fR -\fI\&TAG\fR, -\fB\&HPL_T_panel *\fR -\fI\&PANEL\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_pdpanel_init\fR -initializes a panel data structure. -.SH ARGUMENTS -.TP 8 -GRID (local input) HPL_T_grid * -On entry, GRID points to the data structure containing the -process grid information. -.TP 8 -ALGO (global input) HPL_T_palg * -On entry, ALGO points to the data structure containing the -algorithmic parameters. -.TP 8 -M (local input) const int -On entry, M specifies the global number of rows of the panel. -M must be at least zero. -.TP 8 -N (local input) const int -On entry, N specifies the global number of columns of the -panel and trailing submatrix. N must be at least zero. -.TP 8 -JB (global input) const int -On entry, JB specifies is the number of columns of the panel. -JB must be at least zero. -.TP 8 -A (local input/output) HPL_T_pmat * -On entry, A points to the data structure containing the local -array information. -.TP 8 -IA (global input) const int -On entry, IA is the global row index identifying the panel -and trailing submatrix. IA must be at least zero. -.TP 8 -JA (global input) const int -On entry, JA is the global column index identifying the panel -and trailing submatrix. JA must be at least zero. -.TP 8 -TAG (global input) const int -On entry, TAG is the row broadcast message id. -.TP 8 -PANEL (local input/output) HPL_T_panel * -On entry, PANEL points to the data structure containing the -panel information. -.SH SEE ALSO -.BR HPL_pdpanel_new \ (3), -.BR HPL_pdpanel_disp \ (3), -.BR HPL_pdpanel_free \ (3). 
diff --git a/hpl/man/man3/HPL_pdpanel_new.3 b/hpl/man/man3/HPL_pdpanel_new.3 deleted file mode 100644 index c0ec78fb3dc705a3c320293c04911ae48b524a95..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_pdpanel_new.3 +++ /dev/null @@ -1,76 +0,0 @@ -.TH HPL_pdpanel_new 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_pdpanel_new \- Create a panel data structure. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_pdpanel_new(\fR -\fB\&HPL_T_grid *\fR -\fI\&GRID\fR, -\fB\&HPL_T_palg *\fR -\fI\&ALGO\fR, -\fB\&const int\fR -\fI\&M\fR, -\fB\&const int\fR -\fI\&N\fR, -\fB\&const int\fR -\fI\&JB\fR, -\fB\&HPL_T_pmat *\fR -\fI\&A\fR, -\fB\&const int\fR -\fI\&IA\fR, -\fB\&const int\fR -\fI\&JA\fR, -\fB\&const int\fR -\fI\&TAG\fR, -\fB\&HPL_T_panel * *\fR -\fI\&PANEL\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_pdpanel_new\fR -creates and initializes a panel data structure. -.SH ARGUMENTS -.TP 8 -GRID (local input) HPL_T_grid * -On entry, GRID points to the data structure containing the -process grid information. -.TP 8 -ALGO (global input) HPL_T_palg * -On entry, ALGO points to the data structure containing the -algorithmic parameters. -.TP 8 -M (local input) const int -On entry, M specifies the global number of rows of the panel. -M must be at least zero. -.TP 8 -N (local input) const int -On entry, N specifies the global number of columns of the -panel and trailing submatrix. N must be at least zero. -.TP 8 -JB (global input) const int -On entry, JB specifies is the number of columns of the panel. -JB must be at least zero. -.TP 8 -A (local input/output) HPL_T_pmat * -On entry, A points to the data structure containing the local -array information. -.TP 8 -IA (global input) const int -On entry, IA is the global row index identifying the panel -and trailing submatrix. IA must be at least zero. -.TP 8 -JA (global input) const int -On entry, JA is the global column index identifying the panel -and trailing submatrix. 
JA must be at least zero. -.TP 8 -TAG (global input) const int -On entry, TAG is the row broadcast message id. -.TP 8 -PANEL (local input/output) HPL_T_panel * * -On entry, PANEL points to the address of the panel data -structure to create and initialize. -.SH SEE ALSO -.BR HPL_pdpanel_new \ (3), -.BR HPL_pdpanel_init \ (3), -.BR HPL_pdpanel_disp \ (3). diff --git a/hpl/man/man3/HPL_pdpanllN.3 b/hpl/man/man3/HPL_pdpanllN.3 deleted file mode 100644 index 36ec24d8a109c29ccc4df8da3f429a18300e59f4..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_pdpanllN.3 +++ /dev/null @@ -1,82 +0,0 @@ -.TH HPL_pdpanllN 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_pdpanllN \- Left-looking panel factorization. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_pdpanllN(\fR -\fB\&HPL_T_panel *\fR -\fI\&PANEL\fR, -\fB\&const int\fR -\fI\&M\fR, -\fB\&const int\fR -\fI\&N\fR, -\fB\&const int\fR -\fI\&ICOFF\fR, -\fB\&double *\fR -\fI\&WORK\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_pdpanllN\fR -factorizes a panel of columns that is a sub-array of a -larger one-dimensional panel A using the Left-looking variant of the -usual one-dimensional algorithm. The lower triangular N0-by-N0 upper -block of the panel is stored in no-transpose form (i.e. just like the -input matrix itself). - -Bi-directional exchange is used to perform the swap::broadcast -operations at once for one column in the panel. This results in a -lower number of slightly larger messages than usual. On P processes -and assuming bi-directional links, the running time of this function -can be approximated by (when N is equal to N0): - - N0 * log_2( P ) * ( lat + ( 2*N0 + 4 ) / bdwth ) + - N0^2 * ( M - N0/3 ) * gam2-3 - -where M is the local number of rows of the panel, lat and bdwth are -the latency and bandwidth of the network for double precision real -words, and gam2-3 is an estimate of the Level 2 and Level 3 BLAS -rate of execution. 
The recursive algorithm allows indeed to almost -achieve Level 3 BLAS performance in the panel factorization. On a -large number of modern machines, this operation is however latency -bound, meaning that its cost can be estimated by only the latency -portion N0 * log_2(P) * lat. Mono-directional links will double this -communication cost. - -Note that one iteration of the the main loop is unrolled. The local -computation of the absolute value max of the next column is performed -just after its update by the current column. This allows to bring the -current column only once through cache at each step. The current -implementation does not perform any blocking for this sequence of -BLAS operations, however the design allows for plugging in an optimal -(machine-specific) specialized BLAS-like kernel. This idea has been -suggested to us by Fred Gustavson, IBM T.J. Watson Research Center. -.SH ARGUMENTS -.TP 8 -PANEL (local input/output) HPL_T_panel * -On entry, PANEL points to the data structure containing the -panel information. -.TP 8 -M (local input) const int -On entry, M specifies the local number of rows of sub(A). -.TP 8 -N (local input) const int -On entry, N specifies the local number of columns of sub(A). -.TP 8 -ICOFF (global input) const int -On entry, ICOFF specifies the row and column offset of sub(A) -in A. -.TP 8 -WORK (local workspace) double * -On entry, WORK is a workarray of size at least 2*(4+2*N0). -.SH SEE ALSO -.BR HPL_dlocmax \ (3), -.BR HPL_dlocswpN \ (3), -.BR HPL_dlocswpT \ (3), -.BR HPL_pdmxswp \ (3), -.BR HPL_pdpancrN \ (3), -.BR HPL_pdpancrT \ (3), -.BR HPL_pdpanllT \ (3), -.BR HPL_pdpanrlN \ (3), -.BR HPL_pdpanrlT \ (3). 
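Reviewer note on the removed HPL_pdpanllN.3 text: the "unrolled main loop" remark says the search for the largest absolute value in the next column is done right after that column is updated, so the data passes through cache once. A minimal C sketch of that fusion follows. It is an illustration under stated assumptions, not HPL's implementation: the function name axpy_then_absmax and its signature are hypothetical, and the plain loop stands in for the BLAS-like kernel the man page mentions.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of HPL): update a column and locate its
 * pivot candidate in a single pass.
 *
 *   y[i] := y[i] - alpha * x[i]   for i = 0 .. m-1
 *
 * Returns the index of the entry of the updated y with the largest
 * absolute value (the first such index on ties), or 0 when m == 0.
 */
static size_t axpy_then_absmax(size_t m, double alpha,
                               const double *x, double *y)
{
    size_t imax = 0;
    double vmax = -1.0; /* below any absolute value, so entry 0 always wins first */

    for (size_t i = 0; i < m; i++) {
        /* Update the entry while it is in cache ... */
        y[i] -= alpha * x[i];

        /* ... and test it as a pivot candidate right away. */
        double a = (y[i] < 0.0) ? -y[i] : y[i];
        if (a > vmax) {
            vmax = a;
            imax = i;
        }
    }
    return imax;
}
```

Doing the update and the search as two separate BLAS calls would stream the column through cache twice; the fused loop touches each entry once, which is the saving the man page describes.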
diff --git a/hpl/man/man3/HPL_pdpanllT.3 b/hpl/man/man3/HPL_pdpanllT.3 deleted file mode 100644 index da5efd4238f10c2bbd0fb453bedea8d859b1244f..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_pdpanllT.3 +++ /dev/null @@ -1,81 +0,0 @@ -.TH HPL_pdpanllT 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_pdpanllT \- Left-looking panel factorization. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_pdpanllT(\fR -\fB\&HPL_T_panel *\fR -\fI\&PANEL\fR, -\fB\&const int\fR -\fI\&M\fR, -\fB\&const int\fR -\fI\&N\fR, -\fB\&const int\fR -\fI\&ICOFF\fR, -\fB\&double *\fR -\fI\&WORK\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_pdpanllT\fR -factorizes a panel of columns that is a sub-array of a -larger one-dimensional panel A using the Left-looking variant of the -usual one-dimensional algorithm. The lower triangular N0-by-N0 upper -block of the panel is stored in transpose form. - -Bi-directional exchange is used to perform the swap::broadcast -operations at once for one column in the panel. This results in a -lower number of slightly larger messages than usual. On P processes -and assuming bi-directional links, the running time of this function -can be approximated by (when N is equal to N0): - - N0 * log_2( P ) * ( lat + ( 2*N0 + 4 ) / bdwth ) + - N0^2 * ( M - N0/3 ) * gam2-3 - -where M is the local number of rows of the panel, lat and bdwth are -the latency and bandwidth of the network for double precision real -words, and gam2-3 is an estimate of the Level 2 and Level 3 BLAS -rate of execution. The recursive algorithm allows indeed to almost -achieve Level 3 BLAS performance in the panel factorization. On a -large number of modern machines, this operation is however latency -bound, meaning that its cost can be estimated by only the latency -portion N0 * log_2(P) * lat. Mono-directional links will double this -communication cost. - -Note that one iteration of the the main loop is unrolled. 
The local -computation of the absolute value max of the next column is performed -just after its update by the current column. This allows to bring the -current column only once through cache at each step. The current -implementation does not perform any blocking for this sequence of -BLAS operations, however the design allows for plugging in an optimal -(machine-specific) specialized BLAS-like kernel. This idea has been -suggested to us by Fred Gustavson, IBM T.J. Watson Research Center. -.SH ARGUMENTS -.TP 8 -PANEL (local input/output) HPL_T_panel * -On entry, PANEL points to the data structure containing the -panel information. -.TP 8 -M (local input) const int -On entry, M specifies the local number of rows of sub(A). -.TP 8 -N (local input) const int -On entry, N specifies the local number of columns of sub(A). -.TP 8 -ICOFF (global input) const int -On entry, ICOFF specifies the row and column offset of sub(A) -in A. -.TP 8 -WORK (local workspace) double * -On entry, WORK is a workarray of size at least 2*(4+2*N0). -.SH SEE ALSO -.BR HPL_dlocmax \ (3), -.BR HPL_dlocswpN \ (3), -.BR HPL_dlocswpT \ (3), -.BR HPL_pdmxswp \ (3), -.BR HPL_pdpancrN \ (3), -.BR HPL_pdpancrT \ (3), -.BR HPL_pdpanllN \ (3), -.BR HPL_pdpanrlN \ (3), -.BR HPL_pdpanrlT \ (3). diff --git a/hpl/man/man3/HPL_pdpanrlN.3 b/hpl/man/man3/HPL_pdpanrlN.3 deleted file mode 100644 index 99a872f8a14a79b5f82f9b1b00144c2f82e9b026..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_pdpanrlN.3 +++ /dev/null @@ -1,82 +0,0 @@ -.TH HPL_pdpanrlN 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_pdpanrlN \- Right-looking panel factorization. 
-.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_pdpanrlN(\fR -\fB\&HPL_T_panel *\fR -\fI\&PANEL\fR, -\fB\&const int\fR -\fI\&M\fR, -\fB\&const int\fR -\fI\&N\fR, -\fB\&const int\fR -\fI\&ICOFF\fR, -\fB\&double *\fR -\fI\&WORK\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_pdpanrlN\fR -factorizes a panel of columns that is a sub-array of a -larger one-dimensional panel A using the Right-looking variant of the -usual one-dimensional algorithm. The lower triangular N0-by-N0 upper -block of the panel is stored in no-transpose form (i.e. just like the -input matrix itself). - -Bi-directional exchange is used to perform the swap::broadcast -operations at once for one column in the panel. This results in a -lower number of slightly larger messages than usual. On P processes -and assuming bi-directional links, the running time of this function -can be approximated by (when N is equal to N0): - - N0 * log_2( P ) * ( lat + ( 2*N0 + 4 ) / bdwth ) + - N0^2 * ( M - N0/3 ) * gam2-3 - -where M is the local number of rows of the panel, lat and bdwth are -the latency and bandwidth of the network for double precision real -words, and gam2-3 is an estimate of the Level 2 and Level 3 BLAS -rate of execution. The recursive algorithm allows indeed to almost -achieve Level 3 BLAS performance in the panel factorization. On a -large number of modern machines, this operation is however latency -bound, meaning that its cost can be estimated by only the latency -portion N0 * log_2(P) * lat. Mono-directional links will double this -communication cost. - -Note that one iteration of the the main loop is unrolled. The local -computation of the absolute value max of the next column is performed -just after its update by the current column. This allows to bring the -current column only once through cache at each step. 
The current -implementation does not perform any blocking for this sequence of -BLAS operations, however the design allows for plugging in an optimal -(machine-specific) specialized BLAS-like kernel. This idea has been -suggested to us by Fred Gustavson, IBM T.J. Watson Research Center. -.SH ARGUMENTS -.TP 8 -PANEL (local input/output) HPL_T_panel * -On entry, PANEL points to the data structure containing the -panel information. -.TP 8 -M (local input) const int -On entry, M specifies the local number of rows of sub(A). -.TP 8 -N (local input) const int -On entry, N specifies the local number of columns of sub(A). -.TP 8 -ICOFF (global input) const int -On entry, ICOFF specifies the row and column offset of sub(A) -in A. -.TP 8 -WORK (local workspace) double * -On entry, WORK is a workarray of size at least 2*(4+2*N0). -.SH SEE ALSO -.BR HPL_dlocmax \ (3), -.BR HPL_dlocswpN \ (3), -.BR HPL_dlocswpT \ (3), -.BR HPL_pdmxswp \ (3), -.BR HPL_pdpancrN \ (3), -.BR HPL_pdpancrT \ (3), -.BR HPL_pdpanllN \ (3), -.BR HPL_pdpanllT \ (3), -.BR HPL_pdpanrlT \ (3). diff --git a/hpl/man/man3/HPL_pdpanrlT.3 b/hpl/man/man3/HPL_pdpanrlT.3 deleted file mode 100644 index f4d7ef0600aae87b40f931fd0c65b2ac66d54516..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_pdpanrlT.3 +++ /dev/null @@ -1,81 +0,0 @@ -.TH HPL_pdpanrlT 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_pdpanrlT \- Right-looking panel factorization. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_pdpanrlT(\fR -\fB\&HPL_T_panel *\fR -\fI\&PANEL\fR, -\fB\&const int\fR -\fI\&M\fR, -\fB\&const int\fR -\fI\&N\fR, -\fB\&const int\fR -\fI\&ICOFF\fR, -\fB\&double *\fR -\fI\&WORK\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_pdpanrlT\fR -factorizes a panel of columns that is a sub-array of a -larger one-dimensional panel A using the Right-looking variant of the -usual one-dimensional algorithm. The lower triangular N0-by-N0 upper -block of the panel is stored in transpose form. 
- -Bi-directional exchange is used to perform the swap::broadcast -operations at once for one column in the panel. This results in a -lower number of slightly larger messages than usual. On P processes -and assuming bi-directional links, the running time of this function -can be approximated by (when N is equal to N0): - - N0 * log_2( P ) * ( lat + ( 2*N0 + 4 ) / bdwth ) + - N0^2 * ( M - N0/3 ) * gam2-3 - -where M is the local number of rows of the panel, lat and bdwth are -the latency and bandwidth of the network for double precision real -words, and gam2-3 is an estimate of the Level 2 and Level 3 BLAS -rate of execution. The recursive algorithm allows indeed to almost -achieve Level 3 BLAS performance in the panel factorization. On a -large number of modern machines, this operation is however latency -bound, meaning that its cost can be estimated by only the latency -portion N0 * log_2(P) * lat. Mono-directional links will double this -communication cost. - -Note that one iteration of the the main loop is unrolled. The local -computation of the absolute value max of the next column is performed -just after its update by the current column. This allows to bring the -current column only once through cache at each step. The current -implementation does not perform any blocking for this sequence of -BLAS operations, however the design allows for plugging in an optimal -(machine-specific) specialized BLAS-like kernel. This idea has been -suggested to us by Fred Gustavson, IBM T.J. Watson Research Center. -.SH ARGUMENTS -.TP 8 -PANEL (local input/output) HPL_T_panel * -On entry, PANEL points to the data structure containing the -panel information. -.TP 8 -M (local input) const int -On entry, M specifies the local number of rows of sub(A). -.TP 8 -N (local input) const int -On entry, N specifies the local number of columns of sub(A). -.TP 8 -ICOFF (global input) const int -On entry, ICOFF specifies the row and column offset of sub(A) -in A. 
-.TP 8 -WORK (local workspace) double * -On entry, WORK is a workarray of size at least 2*(4+2*N0). -.SH SEE ALSO -.BR HPL_dlocmax \ (3), -.BR HPL_dlocswpN \ (3), -.BR HPL_dlocswpT \ (3), -.BR HPL_pdmxswp \ (3), -.BR HPL_pdpancrN \ (3), -.BR HPL_pdpancrT \ (3), -.BR HPL_pdpanllN \ (3), -.BR HPL_pdpanllT \ (3), -.BR HPL_pdpanrlN \ (3). diff --git a/hpl/man/man3/HPL_pdrpancrN.3 b/hpl/man/man3/HPL_pdrpancrN.3 deleted file mode 100644 index 69f4b0fc4c833e8f88fc9dae23bc7806a8013d2e..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_pdrpancrN.3 +++ /dev/null @@ -1,79 +0,0 @@ -.TH HPL_pdrpancrN 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_pdrpancrN \- Crout recursive panel factorization. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_pdrpancrN(\fR -\fB\&HPL_T_panel *\fR -\fI\&PANEL\fR, -\fB\&const int\fR -\fI\&M\fR, -\fB\&const int\fR -\fI\&N\fR, -\fB\&const int\fR -\fI\&ICOFF\fR, -\fB\&double *\fR -\fI\&WORK\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_pdrpancrN\fR -HPL_pdrpancrN recursively factorizes a panel of columns using the -recursive Crout variant of the usual one-dimensional algorithm. The -lower triangular N0-by-N0 upper block of the panel is stored in -no-transpose form (i.e. just like the input matrix itself). - -Bi-directional exchange is used to perform the swap::broadcast -operations at once for one column in the panel. This results in a -lower number of slightly larger messages than usual. On P processes -and assuming bi-directional links, the running time of this function -can be approximated by (when N is equal to N0): - - N0 * log_2( P ) * ( lat + ( 2*N0 + 4 ) / bdwth ) + - N0^2 * ( M - N0/3 ) * gam2-3 - -where M is the local number of rows of the panel, lat and bdwth are -the latency and bandwidth of the network for double precision real -words, and gam2-3 is an estimate of the Level 2 and Level 3 BLAS -rate of execution. 
The recursive algorithm allows indeed to almost -achieve Level 3 BLAS performance in the panel factorization. On a -large number of modern machines, this operation is however latency -bound, meaning that its cost can be estimated by only the latency -portion N0 * log_2(P) * lat. Mono-directional links will double this -communication cost. -.SH ARGUMENTS -.TP 8 -PANEL (local input/output) HPL_T_panel * -On entry, PANEL points to the data structure containing the -panel information. -.TP 8 -M (local input) const int -On entry, M specifies the local number of rows of sub(A). -.TP 8 -N (local input) const int -On entry, N specifies the local number of columns of sub(A). -.TP 8 -ICOFF (global input) const int -On entry, ICOFF specifies the row and column offset of sub(A) -in A. -.TP 8 -WORK (local workspace) double * -On entry, WORK is a workarray of size at least 2*(4+2*N0). -.SH SEE ALSO -.BR HPL_dlocmax \ (3), -.BR HPL_dlocswpN \ (3), -.BR HPL_dlocswpT \ (3), -.BR HPL_pdmxswp \ (3), -.BR HPL_pdpancrN \ (3), -.BR HPL_pdpancrT \ (3), -.BR HPL_pdpanllN \ (3), -.BR HPL_pdpanllT \ (3), -.BR HPL_pdpanrlN \ (3), -.BR HPL_pdpanrlT \ (3), -.BR HPL_pdrpancrT \ (3), -.BR HPL_pdrpanllN \ (3), -.BR HPL_pdrpanllT \ (3), -.BR HPL_pdrpanrlN \ (3), -.BR HPL_pdrpanrlT \ (3), -.BR HPL_pdfact \ (3). diff --git a/hpl/man/man3/HPL_pdrpancrT.3 b/hpl/man/man3/HPL_pdrpancrT.3 deleted file mode 100644 index 5db5cac272072208d8a1e80f7415ff6a0d506f3a..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_pdrpancrT.3 +++ /dev/null @@ -1,79 +0,0 @@ -.TH HPL_pdrpancrT 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_pdrpancrT \- Crout recursive panel factorization. 
-.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_pdrpancrT(\fR -\fB\&HPL_T_panel *\fR -\fI\&PANEL\fR, -\fB\&const int\fR -\fI\&M\fR, -\fB\&const int\fR -\fI\&N\fR, -\fB\&const int\fR -\fI\&ICOFF\fR, -\fB\&double *\fR -\fI\&WORK\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_pdrpancrT\fR -recursively factorizes a panel of columns using the -recursive Crout variant of the usual one-dimensional algorithm. -The lower triangular N0-by-N0 upper block of the panel is stored in -transpose form. - -Bi-directional exchange is used to perform the swap::broadcast -operations at once for one column in the panel. This results in a -lower number of slightly larger messages than usual. On P processes -and assuming bi-directional links, the running time of this function -can be approximated by (when N is equal to N0): - - N0 * log_2( P ) * ( lat + ( 2*N0 + 4 ) / bdwth ) + - N0^2 * ( M - N0/3 ) * gam2-3 - -where M is the local number of rows of the panel, lat and bdwth are -the latency and bandwidth of the network for double precision real -words, and gam2-3 is an estimate of the Level 2 and Level 3 BLAS -rate of execution. The recursive algorithm allows indeed to almost -achieve Level 3 BLAS performance in the panel factorization. On a -large number of modern machines, this operation is however latency -bound, meaning that its cost can be estimated by only the latency -portion N0 * log_2(P) * lat. Mono-directional links will double this -communication cost. -.SH ARGUMENTS -.TP 8 -PANEL (local input/output) HPL_T_panel * -On entry, PANEL points to the data structure containing the -panel information. -.TP 8 -M (local input) const int -On entry, M specifies the local number of rows of sub(A). -.TP 8 -N (local input) const int -On entry, N specifies the local number of columns of sub(A). -.TP 8 -ICOFF (global input) const int -On entry, ICOFF specifies the row and column offset of sub(A) -in A. 
-.TP 8 -WORK (local workspace) double * -On entry, WORK is a workarray of size at least 2*(4+2*N0). -.SH SEE ALSO -.BR HPL_dlocmax \ (3), -.BR HPL_dlocswpN \ (3), -.BR HPL_dlocswpT \ (3), -.BR HPL_pdmxswp \ (3), -.BR HPL_pdpancrN \ (3), -.BR HPL_pdpancrT \ (3), -.BR HPL_pdpanllN \ (3), -.BR HPL_pdpanllT \ (3), -.BR HPL_pdpanrlN \ (3), -.BR HPL_pdpanrlT \ (3), -.BR HPL_pdrpancrN \ (3), -.BR HPL_pdrpanllN \ (3), -.BR HPL_pdrpanllT \ (3), -.BR HPL_pdrpanrlN \ (3), -.BR HPL_pdrpanrlT \ (3), -.BR HPL_pdfact \ (3). diff --git a/hpl/man/man3/HPL_pdrpanllN.3 b/hpl/man/man3/HPL_pdrpanllN.3 deleted file mode 100644 index 1d60528e7bf2fd1e1271b05b6f0ca555b071cf7b..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_pdrpanllN.3 +++ /dev/null @@ -1,79 +0,0 @@ -.TH HPL_pdrpanllN 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_pdrpanllN \- Left-looking recursive panel factorization. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_pdrpanllN(\fR -\fB\&HPL_T_panel *\fR -\fI\&PANEL\fR, -\fB\&const int\fR -\fI\&M\fR, -\fB\&const int\fR -\fI\&N\fR, -\fB\&const int\fR -\fI\&ICOFF\fR, -\fB\&double *\fR -\fI\&WORK\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_pdrpanllN\fR -recursively factorizes a panel of columns using the -recursive Left-looking variant of the one-dimensional algorithm. The -lower triangular N0-by-N0 upper block of the panel is stored in -no-transpose form (i.e. just like the input matrix itself). - -Bi-directional exchange is used to perform the swap::broadcast -operations at once for one column in the panel. This results in a -lower number of slightly larger messages than usual. 
On P processes -and assuming bi-directional links, the running time of this function -can be approximated by (when N is equal to N0): - - N0 * log_2( P ) * ( lat + ( 2*N0 + 4 ) / bdwth ) + - N0^2 * ( M - N0/3 ) * gam2-3 - -where M is the local number of rows of the panel, lat and bdwth are -the latency and bandwidth of the network for double precision real -words, and gam2-3 is an estimate of the Level 2 and Level 3 BLAS -rate of execution. The recursive algorithm allows indeed to almost -achieve Level 3 BLAS performance in the panel factorization. On a -large number of modern machines, this operation is however latency -bound, meaning that its cost can be estimated by only the latency -portion N0 * log_2(P) * lat. Mono-directional links will double this -communication cost. -.SH ARGUMENTS -.TP 8 -PANEL (local input/output) HPL_T_panel * -On entry, PANEL points to the data structure containing the -panel information. -.TP 8 -M (local input) const int -On entry, M specifies the local number of rows of sub(A). -.TP 8 -N (local input) const int -On entry, N specifies the local number of columns of sub(A). -.TP 8 -ICOFF (global input) const int -On entry, ICOFF specifies the row and column offset of sub(A) -in A. -.TP 8 -WORK (local workspace) double * -On entry, WORK is a workarray of size at least 2*(4+2*N0). -.SH SEE ALSO -.BR HPL_dlocmax \ (3), -.BR HPL_dlocswpN \ (3), -.BR HPL_dlocswpT \ (3), -.BR HPL_pdmxswp \ (3), -.BR HPL_pdpancrN \ (3), -.BR HPL_pdpancrT \ (3), -.BR HPL_pdpanllN \ (3), -.BR HPL_pdpanllT \ (3), -.BR HPL_pdpanrlN \ (3), -.BR HPL_pdpanrlT \ (3), -.BR HPL_pdrpancrN \ (3), -.BR HPL_pdrpancrT \ (3), -.BR HPL_pdrpanllT \ (3), -.BR HPL_pdrpanrlN \ (3), -.BR HPL_pdrpanrlT \ (3), -.BR HPL_pdfact \ (3). 
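Reviewer note on the running-time estimate repeated in the removed panel factorization pages: the formula can be evaluated directly to see when the latency term dominates. The C sketch below is an illustration only. All function names are hypothetical, gam23 is assumed to be expressed as seconds per floating-point operation so that the units work out, and log_2(P) is rounded up to a whole number of exchange rounds, which is an assumption of this sketch rather than something the man page states.

```c
/* Illustrative only: none of these names exist in HPL. */

/* Number of exchange rounds among p processes, taken as ceil(log2(p)).
 * Rounding up for non-powers of two is an assumption of this sketch. */
static int exchange_rounds(int p)
{
    int rounds = 0;
    for (long span = 1; span < p; span *= 2)
        rounds++;
    return rounds;
}

/* Communication term: N0 * log_2(P) * ( lat + (2*N0 + 4) / bdwth ).
 * lat is in seconds, bdwth in double precision words per second. */
static double panel_comm_time(int n0, int p, double lat, double bdwth)
{
    return (double)n0 * exchange_rounds(p) * (lat + (2.0 * n0 + 4.0) / bdwth);
}

/* Computation term: N0^2 * ( M - N0/3 ) * gam2-3, with gam23 assumed
 * to be seconds per floating-point operation. */
static double panel_comp_time(int m, int n0, double gam23)
{
    return (double)n0 * n0 * (m - n0 / 3.0) * gam23;
}

/* Latency-only estimate: N0 * log_2(P) * lat. */
static double panel_latency_bound(int n0, int p, double lat)
{
    return (double)n0 * exchange_rounds(p) * lat;
}
```

Comparing panel_latency_bound with panel_comm_time for a given network shows how much of the communication cost the bandwidth term contributes; per the man page text, mono-directional links would double the communication term.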
diff --git a/hpl/man/man3/HPL_pdrpanllT.3 b/hpl/man/man3/HPL_pdrpanllT.3 deleted file mode 100644 index 763a645a2390cc0a91816bf3618df1d490956ebe..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_pdrpanllT.3 +++ /dev/null @@ -1,79 +0,0 @@ -.TH HPL_pdrpanllT 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_pdrpanllT \- Left-looking recursive panel factorization. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_pdrpanllT(\fR -\fB\&HPL_T_panel *\fR -\fI\&PANEL\fR, -\fB\&const int\fR -\fI\&M\fR, -\fB\&const int\fR -\fI\&N\fR, -\fB\&const int\fR -\fI\&ICOFF\fR, -\fB\&double *\fR -\fI\&WORK\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_pdrpanllT\fR -recursively factorizes a panel of columns using the -recursive Left-looking variant of the one-dimensional algorithm. The -lower triangular N0-by-N0 upper block of the panel is stored in -transpose form. - -Bi-directional exchange is used to perform the swap::broadcast -operations at once for one column in the panel. This results in a -lower number of slightly larger messages than usual. On P processes -and assuming bi-directional links, the running time of this function -can be approximated by (when N is equal to N0): - - N0 * log_2( P ) * ( lat + ( 2*N0 + 4 ) / bdwth ) + - N0^2 * ( M - N0/3 ) * gam2-3 - -where M is the local number of rows of the panel, lat and bdwth are -the latency and bandwidth of the network for double precision real -words, and gam2-3 is an estimate of the Level 2 and Level 3 BLAS -rate of execution. The recursive algorithm allows indeed to almost -achieve Level 3 BLAS performance in the panel factorization. On a -large number of modern machines, this operation is however latency -bound, meaning that its cost can be estimated by only the latency -portion N0 * log_2(P) * lat. Mono-directional links will double this -communication cost. 
-.SH ARGUMENTS -.TP 8 -PANEL (local input/output) HPL_T_panel * -On entry, PANEL points to the data structure containing the -panel information. -.TP 8 -M (local input) const int -On entry, M specifies the local number of rows of sub(A). -.TP 8 -N (local input) const int -On entry, N specifies the local number of columns of sub(A). -.TP 8 -ICOFF (global input) const int -On entry, ICOFF specifies the row and column offset of sub(A) -in A. -.TP 8 -WORK (local workspace) double * -On entry, WORK is a workarray of size at least 2*(4+2*N0). -.SH SEE ALSO -.BR HPL_dlocmax \ (3), -.BR HPL_dlocswpN \ (3), -.BR HPL_dlocswpT \ (3), -.BR HPL_pdmxswp \ (3), -.BR HPL_pdpancrN \ (3), -.BR HPL_pdpancrT \ (3), -.BR HPL_pdpanllN \ (3), -.BR HPL_pdpanllT \ (3), -.BR HPL_pdpanrlN \ (3), -.BR HPL_pdpanrlT \ (3), -.BR HPL_pdrpancrN \ (3), -.BR HPL_pdrpancrT \ (3), -.BR HPL_pdrpanllN \ (3), -.BR HPL_pdrpanrlN \ (3), -.BR HPL_pdrpanrlT \ (3), -.BR HPL_pdfact \ (3). diff --git a/hpl/man/man3/HPL_pdrpanrlN.3 b/hpl/man/man3/HPL_pdrpanrlN.3 deleted file mode 100644 index e22f2ce4742c4de08df3228e16abedc25cfd3e09..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_pdrpanrlN.3 +++ /dev/null @@ -1,79 +0,0 @@ -.TH HPL_pdrpanrlN 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_pdrpanrlN \- Right-looking recursive panel factorization. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_pdrpanrlN(\fR -\fB\&HPL_T_panel *\fR -\fI\&PANEL\fR, -\fB\&const int\fR -\fI\&M\fR, -\fB\&const int\fR -\fI\&N\fR, -\fB\&const int\fR -\fI\&ICOFF\fR, -\fB\&double *\fR -\fI\&WORK\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_pdrpanrlN\fR -recursively factorizes a panel of columns using the -recursive Right-looking variant of the one-dimensional algorithm. The -lower triangular N0-by-N0 upper block of the panel is stored in -no-transpose form (i.e. just like the input matrix itself). 
- -Bi-directional exchange is used to perform the swap::broadcast -operations at once for one column in the panel. This results in a -lower number of slightly larger messages than usual. On P processes -and assuming bi-directional links, the running time of this function -can be approximated by (when N is equal to N0): - - N0 * log_2( P ) * ( lat + ( 2*N0 + 4 ) / bdwth ) + - N0^2 * ( M - N0/3 ) * gam2-3 - -where M is the local number of rows of the panel, lat and bdwth are -the latency and bandwidth of the network for double precision real -words, and gam2-3 is an estimate of the Level 2 and Level 3 BLAS -rate of execution. The recursive algorithm allows indeed to almost -achieve Level 3 BLAS performance in the panel factorization. On a -large number of modern machines, this operation is however latency -bound, meaning that its cost can be estimated by only the latency -portion N0 * log_2(P) * lat. Mono-directional links will double this -communication cost. -.SH ARGUMENTS -.TP 8 -PANEL (local input/output) HPL_T_panel * -On entry, PANEL points to the data structure containing the -panel information. -.TP 8 -M (local input) const int -On entry, M specifies the local number of rows of sub(A). -.TP 8 -N (local input) const int -On entry, N specifies the local number of columns of sub(A). -.TP 8 -ICOFF (global input) const int -On entry, ICOFF specifies the row and column offset of sub(A) -in A. -.TP 8 -WORK (local workspace) double * -On entry, WORK is a workarray of size at least 2*(4+2*N0). -.SH SEE ALSO -.BR HPL_dlocmax \ (3), -.BR HPL_dlocswpN \ (3), -.BR HPL_dlocswpT \ (3), -.BR HPL_pdmxswp \ (3), -.BR HPL_pdpancrN \ (3), -.BR HPL_pdpancrT \ (3), -.BR HPL_pdpanllN \ (3), -.BR HPL_pdpanllT \ (3), -.BR HPL_pdpanrlN \ (3), -.BR HPL_pdpanrlT \ (3), -.BR HPL_pdrpancrN \ (3), -.BR HPL_pdrpancrT \ (3), -.BR HPL_pdrpanllN \ (3), -.BR HPL_pdrpanllT \ (3), -.BR HPL_pdrpanrlT \ (3), -.BR HPL_pdfact \ (3). 
diff --git a/hpl/man/man3/HPL_pdrpanrlT.3 b/hpl/man/man3/HPL_pdrpanrlT.3 deleted file mode 100644 index 7fa94ede022d9e6d546a287f91ec54bfa1cff440..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_pdrpanrlT.3 +++ /dev/null @@ -1,79 +0,0 @@ -.TH HPL_pdrpanrlT 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_pdrpanrlT \- Right-looking recursive panel factorization. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_pdrpanrlT(\fR -\fB\&HPL_T_panel *\fR -\fI\&PANEL\fR, -\fB\&const int\fR -\fI\&M\fR, -\fB\&const int\fR -\fI\&N\fR, -\fB\&const int\fR -\fI\&ICOFF\fR, -\fB\&double *\fR -\fI\&WORK\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_pdrpanrlT\fR -recursively factorizes a panel of columns using the -recursive Right-looking variant of the one-dimensional algorithm. The -lower triangular N0-by-N0 upper block of the panel is stored in -transpose form. - -Bi-directional exchange is used to perform the swap::broadcast -operations at once for one column in the panel. This results in a -lower number of slightly larger messages than usual. On P processes -and assuming bi-directional links, the running time of this function -can be approximated by (when N is equal to N0): - - N0 * log_2( P ) * ( lat + ( 2*N0 + 4 ) / bdwth ) + - N0^2 * ( M - N0/3 ) * gam2-3 - -where M is the local number of rows of the panel, lat and bdwth are -the latency and bandwidth of the network for double precision real -words, and gam2-3 is an estimate of the Level 2 and Level 3 BLAS -rate of execution. The recursive algorithm allows indeed to almost -achieve Level 3 BLAS performance in the panel factorization. On a -large number of modern machines, this operation is however latency -bound, meaning that its cost can be estimated by only the latency -portion N0 * log_2(P) * lat. Mono-directional links will double this -communication cost. 
-.SH ARGUMENTS -.TP 8 -PANEL (local input/output) HPL_T_panel * -On entry, PANEL points to the data structure containing the -panel information. -.TP 8 -M (local input) const int -On entry, M specifies the local number of rows of sub(A). -.TP 8 -N (local input) const int -On entry, N specifies the local number of columns of sub(A). -.TP 8 -ICOFF (global input) const int -On entry, ICOFF specifies the row and column offset of sub(A) -in A. -.TP 8 -WORK (local workspace) double * -On entry, WORK is a workarray of size at least 2*(4+2*N0). -.SH SEE ALSO -.BR HPL_dlocmax \ (3), -.BR HPL_dlocswpN \ (3), -.BR HPL_dlocswpT \ (3), -.BR HPL_pdmxswp \ (3), -.BR HPL_pdpancrN \ (3), -.BR HPL_pdpancrT \ (3), -.BR HPL_pdpanllN \ (3), -.BR HPL_pdpanllT \ (3), -.BR HPL_pdpanrlN \ (3), -.BR HPL_pdpanrlT \ (3), -.BR HPL_pdrpancrN \ (3), -.BR HPL_pdrpancrT \ (3), -.BR HPL_pdrpanllN \ (3), -.BR HPL_pdrpanllT \ (3), -.BR HPL_pdrpanrlN \ (3), -.BR HPL_pdfact \ (3). diff --git a/hpl/man/man3/HPL_pdtest.3 b/hpl/man/man3/HPL_pdtest.3 deleted file mode 100644 index 13f216952a7f8e5ec5ba1047ec9456e871457d83..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_pdtest.3 +++ /dev/null @@ -1,63 +0,0 @@ -.TH HPL_pdtest 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_pdtest \- Perform one test. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_pdtest(\fR -\fB\&HPL_T_test *\fR -\fI\&TEST\fR, -\fB\&HPL_T_grid *\fR -\fI\&GRID\fR, -\fB\&HPL_T_palg *\fR -\fI\&ALGO\fR, -\fB\&const int\fR -\fI\&N\fR, -\fB\&const int\fR -\fI\&NB\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_pdtest\fR -performs one test given a set of parameters such as the -process grid, the problem size, the distribution blocking factor ... -This function generates the data, calls and times the linear system -solver, checks the accuracy of the obtained vector solution and -writes this information to the file pointed to by TEST->outfp. 
-.SH ARGUMENTS -.TP 8 -TEST (global input) HPL_T_test * -On entry, TEST points to a testing data structure: outfp -specifies the output file where the results will be printed. -It is only defined and used by the process 0 of the grid. -thrsh specifies the threshold value for the test ratio. -Concretely, a test is declared "PASSED" if and only if the -following inequality is satisfied: -||Ax-b||_oo / ( epsil * - ( || x ||_oo * || A ||_oo + || b ||_oo ) * - N ) < thrsh. -epsil is the relative machine precision of the distributed -computer. Finally the test counters, kfail, kpass, kskip and -ktest are updated as follows: if the test passes, kpass is -incremented by one; if the test fails, kfail is incremented -by one; if the test is skipped, kskip is incremented by one. -ktest is left unchanged. -.TP 8 -GRID (local input) HPL_T_grid * -On entry, GRID points to the data structure containing the -process grid information. -.TP 8 -ALGO (global input) HPL_T_palg * -On entry, ALGO points to the data structure containing the -algorithmic parameters to be used for this test. -.TP 8 -N (global input) const int -On entry, N specifies the order of the coefficient matrix A. -N must be at least zero. -.TP 8 -NB (global input) const int -On entry, NB specifies the blocking factor used to partition -and distribute the matrix A. NB must be larger than one. -.SH SEE ALSO -.BR HPL_pddriver \ (3), -.BR HPL_pdinfo \ (3). diff --git a/hpl/man/man3/HPL_pdtrsv.3 b/hpl/man/man3/HPL_pdtrsv.3 deleted file mode 100644 index 8e324e5c6d3fcc780031129693c5ba23ca8bdaee..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_pdtrsv.3 +++ /dev/null @@ -1,49 +0,0 @@ -.TH HPL_pdtrsv 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_pdtrsv \- Solve triu( A ) x = b.
-.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_pdtrsv(\fR -\fB\&HPL_T_grid *\fR -\fI\&GRID\fR, -\fB\&HPL_T_pmat *\fR -\fI\&AMAT\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_pdtrsv\fR -solves an upper triangular system of linear equations. - -The rhs is the last column of the N by N+1 matrix A. The solve starts -in the process column owning the Nth column of A, so the rhs b may -need to be moved one process column to the left at the beginning. The -routine therefore needs a column vector in every process column but -the one owning b. The result is replicated in all process rows, and -returned in XR, i.e. XR is of size nq = LOCq( N ) in all processes. - -The algorithm uses decreasing one-ring broadcast in process rows and -columns implemented in terms of synchronous communication point to -point primitives. The lookahead of depth 1 is used to minimize the -critical path. This entire operation is essentially ``latency'' bound -and an estimate of its running time is given by: - - (move rhs) lat + N / ( P bdwth ) + - (solve) ((N / NB)-1) 2 (lat + NB / bdwth) + - gam2 N^2 / ( P Q ), - -where gam2 is an estimate of the Level 2 BLAS rate of execution. -There are N / NB diagonal blocks. One must exchange 2 messages of -length NB to compute the next NB entries of the vector solution, as -well as performing a total of N^2 floating point operations. -.SH ARGUMENTS -.TP 8 -GRID (local input) HPL_T_grid * -On entry, GRID points to the data structure containing the -process grid information. -.TP 8 -AMAT (local input/output) HPL_T_pmat * -On entry, AMAT points to the data structure containing the -local array information. -.SH SEE ALSO -.BR HPL_pdgesv \ (3). 
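The HPL_pdtrsv running-time estimate above splits into three terms: moving the rhs, the per-block message exchanges, and the floating point work. A minimal sketch that evaluates it term by term follows. The function name is hypothetical and not part of HPL, and gam2 is taken as seconds per floating point operation because it multiplies N^2 in the formula.

```c
#include <assert.h>

/* (move rhs) lat + N / (P bdwth)
 * (solve)    ((N / NB) - 1) 2 (lat + NB / bdwth) + gam2 N^2 / (P Q)
 * lat   : latency in seconds
 * bdwth : bandwidth in double precision words per second
 * gam2  : seconds per Level 2 BLAS floating point operation (assumed unit) */
double pdtrsv_time_estimate(double n, double nb, double p, double q,
                            double lat, double bdwth, double gam2)
{
    assert(nb > 0.0 && p > 0.0 && q > 0.0 && bdwth > 0.0);
    double move_rhs    = lat + n / (p * bdwth);
    double solve_comm  = ((n / nb) - 1.0) * 2.0 * (lat + nb / bdwth);
    double solve_flops = gam2 * n * n / (p * q);
    return move_rhs + solve_comm + solve_flops;
}
```

With N = 8, NB = 2, P = Q = 2, lat = 1, bdwth = 4 and gam2 = 0.5, the three terms are 2, 9 and 8, for a total of 19. The middle term grows with N / NB regardless of bandwidth, which is why the text calls the operation latency bound.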
diff --git a/hpl/man/man3/HPL_pdupdateNN.3 b/hpl/man/man3/HPL_pdupdateNN.3 deleted file mode 100644 index 855addb2bb4c972db06fd024db2276c6d6bf6004..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_pdupdateNN.3 +++ /dev/null @@ -1,48 +0,0 @@ -.TH HPL_pdupdateNN 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_pdupdateNN \- Broadcast a panel and update the trailing submatrix. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_pdupdateNN(\fR -\fB\&HPL_T_panel *\fR -\fI\&PBCST\fR, -\fB\&int *\fR -\fI\&IFLAG\fR, -\fB\&HPL_T_panel *\fR -\fI\&PANEL\fR, -\fB\&const int\fR -\fI\&NN\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_pdupdateNN\fR -broadcast - forward the panel PBCST and simultaneously -applies the row interchanges and updates part of the trailing (using -the panel PANEL) submatrix. -.SH ARGUMENTS -.TP 8 -PBCST (local input/output) HPL_T_panel * -On entry, PBCST points to the data structure containing the -panel (to be broadcast) information. -.TP 8 -IFLAG (local output) int * -On exit, IFLAG indicates whether or not the broadcast has -been completed when PBCST is not NULL on entry. In that case, -IFLAG is left unchanged. -.TP 8 -PANEL (local input/output) HPL_T_panel * -On entry, PANEL points to the data structure containing the -panel (to be updated) information. -.TP 8 -NN (local input) const int -On entry, NN specifies the local number of columns of the -trailing submatrix to be updated starting at the current -position. NN must be at least zero. -.SH SEE ALSO -.BR HPL_pdgesv \ (3), -.BR HPL_pdgesv0 \ (3), -.BR HPL_pdgesvK1 \ (3), -.BR HPL_pdgesvK2 \ (3), -.BR HPL_pdlaswp00N \ (3), -.BR HPL_pdlaswp01N \ (3). 
diff --git a/hpl/man/man3/HPL_pdupdateNT.3 b/hpl/man/man3/HPL_pdupdateNT.3 deleted file mode 100644 index 4622021cd14dc105e9376cf2e86b24eb5f7af7fe..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_pdupdateNT.3 +++ /dev/null @@ -1,48 +0,0 @@ -.TH HPL_pdupdateNT 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_pdupdateNT \- Broadcast a panel and update the trailing submatrix. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_pdupdateNT(\fR -\fB\&HPL_T_panel *\fR -\fI\&PBCST\fR, -\fB\&int *\fR -\fI\&IFLAG\fR, -\fB\&HPL_T_panel *\fR -\fI\&PANEL\fR, -\fB\&const int\fR -\fI\&NN\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_pdupdateNT\fR -broadcast - forward the panel PBCST and simultaneously -applies the row interchanges and updates part of the trailing (using -the panel PANEL) submatrix. -.SH ARGUMENTS -.TP 8 -PBCST (local input/output) HPL_T_panel * -On entry, PBCST points to the data structure containing the -panel (to be broadcast) information. -.TP 8 -IFLAG (local output) int * -On exit, IFLAG indicates whether or not the broadcast has -been completed when PBCST is not NULL on entry. In that case, -IFLAG is left unchanged. -.TP 8 -PANEL (local input/output) HPL_T_panel * -On entry, PANEL points to the data structure containing the -panel (to be updated) information. -.TP 8 -NN (local input) const int -On entry, NN specifies the local number of columns of the -trailing submatrix to be updated starting at the current -position. NN must be at least zero. -.SH SEE ALSO -.BR HPL_pdgesv \ (3), -.BR HPL_pdgesv0 \ (3), -.BR HPL_pdgesvK1 \ (3), -.BR HPL_pdgesvK2 \ (3), -.BR HPL_pdlaswp00T \ (3), -.BR HPL_pdlaswp01T \ (3). 
diff --git a/hpl/man/man3/HPL_pdupdateTN.3 b/hpl/man/man3/HPL_pdupdateTN.3 deleted file mode 100644 index 457c37500a0e751c3b847443ab5c9a83c8dc0760..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_pdupdateTN.3 +++ /dev/null @@ -1,48 +0,0 @@ -.TH HPL_pdupdateTN 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_pdupdateTN \- Broadcast a panel and update the trailing submatrix. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_pdupdateTN(\fR -\fB\&HPL_T_panel *\fR -\fI\&PBCST\fR, -\fB\&int *\fR -\fI\&IFLAG\fR, -\fB\&HPL_T_panel *\fR -\fI\&PANEL\fR, -\fB\&const int\fR -\fI\&NN\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_pdupdateTN\fR -broadcast - forward the panel PBCST and simultaneously -applies the row interchanges and updates part of the trailing (using -the panel PANEL) submatrix. -.SH ARGUMENTS -.TP 8 -PBCST (local input/output) HPL_T_panel * -On entry, PBCST points to the data structure containing the -panel (to be broadcast) information. -.TP 8 -IFLAG (local output) int * -On exit, IFLAG indicates whether or not the broadcast has -been completed when PBCST is not NULL on entry. In that case, -IFLAG is left unchanged. -.TP 8 -PANEL (local input/output) HPL_T_panel * -On entry, PANEL points to the data structure containing the -panel (to be updated) information. -.TP 8 -NN (local input) const int -On entry, NN specifies the local number of columns of the -trailing submatrix to be updated starting at the current -position. NN must be at least zero. -.SH SEE ALSO -.BR HPL_pdgesv \ (3), -.BR HPL_pdgesv0 \ (3), -.BR HPL_pdgesvK1 \ (3), -.BR HPL_pdgesvK2 \ (3), -.BR HPL_pdlaswp00N \ (3), -.BR HPL_pdlaswp01N \ (3). 
diff --git a/hpl/man/man3/HPL_pdupdateTT.3 b/hpl/man/man3/HPL_pdupdateTT.3 deleted file mode 100644 index 37a76e7a76fbb59fc46b9af53c1c8e7feafce64c..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_pdupdateTT.3 +++ /dev/null @@ -1,48 +0,0 @@ -.TH HPL_pdupdateTT 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_pdupdateTT \- Broadcast a panel and update the trailing submatrix. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_pdupdateTT(\fR -\fB\&HPL_T_panel *\fR -\fI\&PBCST\fR, -\fB\&int *\fR -\fI\&IFLAG\fR, -\fB\&HPL_T_panel *\fR -\fI\&PANEL\fR, -\fB\&const int\fR -\fI\&NN\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_pdupdateTT\fR -broadcast - forward the panel PBCST and simultaneously -applies the row interchanges and updates part of the trailing (using -the panel PANEL) submatrix. -.SH ARGUMENTS -.TP 8 -PBCST (local input/output) HPL_T_panel * -On entry, PBCST points to the data structure containing the -panel (to be broadcast) information. -.TP 8 -IFLAG (local output) int * -On exit, IFLAG indicates whether or not the broadcast has -been completed when PBCST is not NULL on entry. In that case, -IFLAG is left unchanged. -.TP 8 -PANEL (local input/output) HPL_T_panel * -On entry, PANEL points to the data structure containing the -panel (to be updated) information. -.TP 8 -NN (local input) const int -On entry, NN specifies the local number of columns of the -trailing submatrix to be updated starting at the current -position. NN must be at least zero. -.SH SEE ALSO -.BR HPL_pdgesv \ (3), -.BR HPL_pdgesv0 \ (3), -.BR HPL_pdgesvK1 \ (3), -.BR HPL_pdgesvK2 \ (3), -.BR HPL_pdlaswp00T \ (3), -.BR HPL_pdlaswp01T \ (3). 
diff --git a/hpl/man/man3/HPL_perm.3 b/hpl/man/man3/HPL_perm.3 deleted file mode 100644 index 82542a9940e4b7410b58e2b89b0c799824e37515..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_perm.3 +++ /dev/null @@ -1,50 +0,0 @@ -.TH HPL_perm 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_perm \- Combine 2 index arrays - Generate the permutation. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_perm(\fR -\fB\&const int\fR -\fI\&N\fR, -\fB\&int *\fR -\fI\&LINDXA\fR, -\fB\&int *\fR -\fI\&LINDXAU\fR, -\fB\&int *\fR -\fI\&IWORK\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_perm\fR -combines two index arrays and generates the corresponding -permutation. First, this function computes the inverse of LINDXA, and -then combines it with LINDXAU. Second, in order to be able to perform -the permutation in place, LINDXAU is overwritten by the sequence of -permutations producing the same result. What we ultimately want to -achieve is: U[LINDXAU[i]] := U[LINDXA[i]] for i in [0..N). After the -call to this function, this in place permutation can be performed by -for i in [0..N) swap U[i] with U[LINDXAU[i]]. -.SH ARGUMENTS -.TP 8 -N (global input) const int -On entry, N specifies the length of the arrays LINDXA and -LINDXAU. N should be at least zero. -.TP 8 -LINDXA (global input/output) int * -On entry, LINDXA is an array of dimension N containing the -source indexes. On exit, LINDXA contains the combined index -array. -.TP 8 -LINDXAU (global input/output) int * -On entry, LINDXAU is an array of dimension N containing the -target indexes. On exit, LINDXAU contains the sequence of -permutations that should be applied in increasing order to -permute the underlying array U in place. -.TP 8 -IWORK (workspace) int * -On entry, IWORK is a workarray of dimension N. -.SH SEE ALSO -.BR HPL_plindx1 \ (3), -.BR HPL_pdlaswp01N \ (3), -.BR HPL_pdlaswp01T \ (3).
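The idea described for HPL_perm, turning a source/target index pair into a sequence of swaps that can be applied in place in increasing order, can be sketched as follows. This is an illustration only, not the HPL implementation: the function names, the separate output array seq, the 3*n workspace, and the requirement that both index arrays are permutations of 0..n-1 are all assumptions of the sketch.

```c
#include <assert.h>

/* Build seq so that "for i in [0..n): swap u[i] with u[seq[i]]" moves the
 * element originally at from[i] to position to[i], for every i.
 * from and to must each be a permutation of 0..n-1; work holds 3*n ints. */
void build_swap_sequence(int n, const int *from, const int *to,
                         int *seq, int *work)
{
    int *want = work;          /* want[p]: original index that must end at p */
    int *pos  = work + n;      /* pos[e] : current position of original e    */
    int *at   = work + 2 * n;  /* at[p]  : original index currently at p     */

    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        want[to[i]] = from[i];
        pos[i] = i;
        at[i]  = i;
    }
    for (int i = 0; i < n; i++) {
        int e = want[i];       /* element that belongs at position i   */
        int k = pos[e];        /* where it currently sits (k >= i)     */
        int d = at[i];         /* element displaced from position i    */
        seq[i] = k;
        at[i] = e;  pos[e] = i;
        at[k] = d;  pos[d] = k;
    }
}

/* Apply the swap sequence in increasing order, permuting u in place. */
void apply_swap_sequence(int n, const int *seq, double *u)
{
    for (int i = 0; i < n; i++) {
        double t  = u[i];
        u[i]      = u[seq[i]];
        u[seq[i]] = t;
    }
}
```

For from = {0,1,2} and to = {1,2,0}, the sequence is {2,2,2}, and applying it to {10,11,12} gives {12,10,11}. Tracking both pos and at keeps each step O(1); a linear search would also work at O(n^2) cost.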
diff --git a/hpl/man/man3/HPL_pipid.3 b/hpl/man/man3/HPL_pipid.3 deleted file mode 100644 index 63384d918c882906e1e658c33c7e80a0980079dc..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_pipid.3 +++ /dev/null @@ -1,79 +0,0 @@ -.TH HPL_pipid 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_pipid \- Simplify the pivot vector. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_pipid(\fR -\fB\&HPL_T_panel *\fR -\fI\&PANEL\fR, -\fB\&int *\fR -\fI\&K\fR, -\fB\&int *\fR -\fI\&IPID\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_pipid\fR -computes an array IPID that contains the source and final -destination of matrix rows resulting from the application of N -interchanges as computed by the LU factorization with row partial -pivoting. The array IPID is such that the row of global index IPID(i) -should be mapped onto the row of global index IPID(i+1). Note that we -cannot really know the length of IPID a priori. However, we know that -this array is at least 2*N long, since there are N rows to swap and -broadcast. The length of this array must be smaller than or equal to -4*N, since every row is swapped with at most a single distinct remote -row. The algorithm constructing IPID goes as follows: Let IA be the -global index of the first row to be swapped. - -For every row src IA + i with i in [0..N) to be swapped with row dst -such that dst is given by DPIV[i]: - -Is row src the destination of a previous row of the current block, -that is, is there k odd such that IPID(k) is equal to src ? - Yes: update this destination with dst. For example, if the -pivot array is (0,2)(1,1)(2,5) ... , then when we swap rows 2 and 5, -we swap in fact row 0 and 5, i.e., row 0 goes to 5 and not 2 as it -was thought so far ... - No : add the pair (src,dst) at the end of IPID; row src has not -been moved yet. 
- -Is row dst different from src the destination of a previous row of -the current block, i.e., is there k odd such that IPID(k) is equal to -dst ? - Yes: update IPID(k) with src. For example, if the pivot array -is (0,5)(1,1)(2,5) ... , then when we swap rows 2 and 5, we swap in -fact row 2 and 0, i.e., row 0 goes to 2 and not 5 as it was thought -so far ... - No : add the pair (dst,src) at the end of IPID; row dst has not -been moved yet. - -Note that when src is equal to dst, the pair (dst,src) should not be -added to IPID in order to avoid duplicated entries in this array. -During the construction of the array IPID, we make sure that the -first N entries are such that IPID(k) with k odd is equal to IA+k/2. -For k in [0..K/2), the row of global index IPID(2*k) should be -mapped onto the row of global index IPID(2*k+1). -.SH ARGUMENTS -.TP 8 -PANEL (local input/output) HPL_T_panel * -On entry, PANEL points to the data structure containing the -panel information. -.TP 8 -K (global output) int * -On exit, K specifies the number of entries in IPID. K is at -least 2*N, and at most 4*N. -.TP 8 -IPID (global output) int * -On entry, IPID is an array of length 4*N. On exit, the first -K entries of that array contain the src and final destination -resulting from the application of the N interchanges as -specified by DPIV. The pairs (src,dst) are contiguously -stored and sorted so that IPID(2*i+1) is equal to IA+i with i -in [0..N) -.SH SEE ALSO -.BR HPL_pdlaswp00N \ (3), -.BR HPL_pdlaswp00T \ (3), -.BR HPL_pdlaswp01N \ (3), -.BR HPL_pdlaswp01T \ (3). diff --git a/hpl/man/man3/HPL_plindx0.3 b/hpl/man/man3/HPL_plindx0.3 deleted file mode 100644 index 997f499b4223736d05cd9f104660ae521ad38817..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_plindx0.3 +++ /dev/null @@ -1,168 +0,0 @@ -.TH HPL_plindx0 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_plindx0 \- Compute local swapping index arrays. 
-.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_plindx0(\fR -\fB\&HPL_T_panel *\fR -\fI\&PANEL\fR, -\fB\&const int\fR -\fI\&K\fR, -\fB\&int *\fR -\fI\&IPID\fR, -\fB\&int *\fR -\fI\&LINDXA\fR, -\fB\&int *\fR -\fI\&LINDXAU\fR, -\fB\&int *\fR -\fI\&LLEN\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_plindx0\fR -computes two local arrays LINDXA and LINDXAU containing -the local source and final destination position resulting from the -application of row interchanges. - -On entry, the array IPID of length K is such that the row of global -index IPID(i) should be mapped onto row of global index IPID(i+1). -Let IA be the global index of the first row to be swapped. For k in -[0..K/2), the row of global index IPID(2*k) should be mapped onto the -row of global index IPID(2*k+1). The question then, is to determine -which rows should ultimately be part of U. - -First, some rows of the process ICURROW may be swapped locally. One -of this row belongs to U, the other one belongs to my local piece of -A. The other rows of the current block are swapped with remote rows -and are thus not part of U. These rows however should be sent along, -and grabbed by the other processes as we progress in the exchange -phase. - -So, assume that I am ICURROW and consider a row of index IPID(2*i) -that I own. If I own IPID(2*i+1) as well and IPID(2*i+1) - IA is less -than N, this row is locally swapped and should be copied into U at -the position IPID(2*i+1) - IA. No row will be exchanged for this one. -If IPID(2*i+1)-IA is greater than N, then the row IPID(2*i) should be -locally copied into my local piece of A at the position corresponding -to the row of global index IPID(2*i+1). - -If the process ICURROW does not own IPID(2*i+1), then row IPID(2*i) -is to be swapped away and strictly speaking does not belong to U, but -to A remotely. Since this process will however send this array U, -this row is copied into U, exactly where the row IPID(2*i+1) should -go. 
For this, we search IPID for k1, such that IPID(2*k1) is equal to -IPID(2*i+1); and row IPID(2*i) is to be copied in U at the position -IPID(2*k1+1)-IA. - -It is thus important to put the rows that go into U, i.e., such that -IPID(2*i+1) - IA is less than N at the beginning of the array IPID. By -doing so, U is formed, and the local copy is performed in just one -sweep. - -Two lists LINDXA and LINDXAU are built. LINDXA contains the local -index of the rows I have that should be copied. LINDXAU contains the -local destination information: if LINDXAU(k) >= 0, row LINDXA(k) of A -is to be copied in U at position LINDXAU(k). Otherwise, row LINDXA(k) -of A should be locally copied into A(-LINDXAU(k),:). In the process -ICURROW, the initial packing algorithm proceeds as follows. - - for all entries in IPID, - if IPID(2*i) is in ICURROW, - if IPID(2*i+1) is in ICURROW, - if( IPID(2*i+1) - IA < N ) - save corresponding local position - of this row (LINDXA); - save local position (LINDXAU) in U - where this row goes; - [copy row IPID(2*i) in U at position - IPID(2*i+1)-IA; ]; - else - save corresponding local position of - this row (LINDXA); - save local position (-LINDXAU) in A - where this row goes; - [copy row IPID(2*i) in my piece of A - at IPID(2*i+1);] - end if - else - find k1 such that IPID(2*k1) = IPID(2*i+1); - copy row IPID(2*i) in U at position - IPID(2*k1+1)-IA; - save corresponding local position of this - row (LINDXA); - save local position (LINDXAU) in U where - this row goes; - end if - end if - end for - -Second, if I am not the current row process ICURROW, all source rows -in IPID that I own are part of U. Indeed, they are swapped with one -row of the current block of rows, and the main factorization -algorithm proceeds one row after each other. The processes different -from ICURROW, should exchange and accumulate those rows until they -receive some data previously owned by the process ICURROW.
- -In processes different from ICURROW, the initial packing algorithm -proceeds as follows. Consider a row of global index IPID(2*i) that I -own. When I will be receiving data previously owned by ICURROW, i.e., -U, row IPID(2*i) should replace the row in U at pos. IPID(2*i+1)-IA, -and this particular row of U should be first copied into my piece of -A, at A(il,:), where il is the local row index corresponding to -IPID(2*i). Now,initially, this row will be packed into workspace, say -as the kth row of that work array. The following algorithm sets -LINDXAU[k] to IPID(2*i+1)-IA, that is the position in U where the row -should be copied. LINDXA(k) stores the local index in A where this -row of U should be copied, i.e il. - - for all entries in IPID, - if IPID(2*i) is not in ICURROW, - copy row IPID(2*i) in work array; - save corresponding local position - of this row (LINDXA); - save position (LINDXAU) in U where - this row should be copied; - end if - end for - -Since we are at it, we also globally figure out how many rows every -process has. That is necessary, because it would rather be cumbersome -to figure it on the fly during the bi-directional exchange phase. -This information is kept in the array LLEN of size NPROW. Also note -that the arrays LINDXA and LINDXAU are of max length equal to 2*N. -.SH ARGUMENTS -.TP 8 -PANEL (local input/output) HPL_T_panel * -On entry, PANEL points to the data structure containing the -panel information. -.TP 8 -K (global input) const int -On entry, K specifies the number of entries in IPID. K is at -least 2*N, and at most 4*N. -.TP 8 -IPID (global input) int * -On entry, IPID is an array of length K. The first K entries -of that array contain the src and final destination resulting -from the application of the interchanges. -.TP 8 -LINDXA (local output) int * -On entry, LINDXA is an array of dimension 2*N. On exit, this -array contains the local indexes of the rows of A I have that -should be copied into U. 
-.TP 8 -LINDXAU (local output) int * -On entry, LINDXAU is an array of dimension 2*N. On exit, this -array contains the local destination information encoded as -follows. If LINDXAU(k) >= 0, row LINDXA(k) of A is to be -copied in U at position LINDXAU(k). Otherwise, row LINDXA(k) -of A should be locally copied into A(-LINDXAU(k),:). -.TP 8 -LLEN (global output) int * -On entry, LLEN is an array of length NPROW. On exit, it -contains how many rows every process has. -.SH SEE ALSO -.BR HPL_pdlaswp00N \ (3), -.BR HPL_pdlaswp00T \ (3), -.BR HPL_pdlaswp01N \ (3), -.BR HPL_pdlaswp01T \ (3). diff --git a/hpl/man/man3/HPL_plindx1.3 b/hpl/man/man3/HPL_plindx1.3 deleted file mode 100644 index 324f186440f714f2c5f30ef69eec963f3010e57c..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_plindx1.3 +++ /dev/null @@ -1,106 +0,0 @@ -.TH HPL_plindx1 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_plindx1 \- Compute local swapping index arrays. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_plindx1(\fR -\fB\&HPL_T_panel *\fR -\fI\&PANEL\fR, -\fB\&const int\fR -\fI\&K\fR, -\fB\&const int *\fR -\fI\&IPID\fR, -\fB\&int *\fR -\fI\&IPA\fR, -\fB\&int *\fR -\fI\&LINDXA\fR, -\fB\&int *\fR -\fI\&LINDXAU\fR, -\fB\&int *\fR -\fI\&IPLEN\fR, -\fB\&int *\fR -\fI\&IPMAP\fR, -\fB\&int *\fR -\fI\&IPMAPM1\fR, -\fB\&int *\fR -\fI\&PERMU\fR, -\fB\&int *\fR -\fI\&IWORK\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_plindx1\fR -computes two local arrays LINDXA and LINDXAU containing -the local source and final destination position resulting from the -application of row interchanges. In addition, this function computes -three arrays IPLEN, IPMAP and IPMAPM1 that contain the logarithmic -mapping information for the spreading phase. -.SH ARGUMENTS -.TP 8 -PANEL (local input/output) HPL_T_panel * -On entry, PANEL points to the data structure containing the -panel information.
-.TP 8 -K (global input) const int -On entry, K specifies the number of entries in IPID. K is at -least 2*N, and at most 4*N. -.TP 8 -IPID (global input) const int * -On entry, IPID is an array of length K. The first K entries -of that array contain the src and final destination resulting -from the application of the interchanges. -.TP 8 -IPA (global output) int * -On exit, IPA specifies the number of rows that the current -process row has that either belong to U or should be swapped -with remote rows of A. -.TP 8 -LINDXA (global output) int * -On entry, LINDXA is an array of dimension 2*N. On exit, this -array contains the local indexes of the rows of A I have that -should be copied into U. -.TP 8 -LINDXAU (global output) int * -On entry, LINDXAU is an array of dimension 2*N. On exit, this -array contains the local destination information encoded as -follows. If LINDXAU(k) >= 0, row LINDXA(k) of A is to be -copied in U at position LINDXAU(k). Otherwise, row LINDXA(k) -of A should be locally copied into A(-LINDXAU(k),:). -.TP 8 -IPLEN (global output) int * -On entry, IPLEN is an array of dimension NPROW + 1. On exit, -this array is such that IPLEN[i] is the number of rows of A -in the processes before process IPMAP[i] after the sort -with the convention that IPLEN[nprow] is the total number of -rows of the panel. In other words IPLEN[i+1]-IPLEN[i] is the -local number of rows of A that should be moved to the process -IPMAP[i]. IPLEN is such that the number of rows of the source -process row can be computed as IPLEN[1] - IPLEN[0], and the -remaining entries of this array are sorted so that the -quantities IPLEN[i+1] - IPLEN[i] are logarithmically sorted. -.TP 8 -IPMAP (global output) int * -On entry, IPMAP is an array of dimension NPROW. On exit, this -array contains the logarithmic mapping of the processes. In -other words, IPMAP[myrow] is the corresponding sorted process -coordinate.
-.TP 8 -IPMAPM1 (global output) int * -On entry, IPMAPM1 is an array of dimension NPROW. On exit, -this array contains the inverse of the logarithmic mapping -contained in IPMAP: IPMAPM1[ IPMAP[i] ] = i, for all i in -[0.. NPROW) -.TP 8 -PERMU (global output) int * -On entry, PERMU is an array of dimension JB. On exit, PERMU -contains a sequence of permutations, that should be applied -in increasing order to permute in place the row panel U. -.TP 8 -IWORK (workspace) int * -On entry, IWORK is a workarray of dimension 2*JB. -.SH SEE ALSO -.BR HPL_pdlaswp00N \ (3), -.BR HPL_pdlaswp00T \ (3), -.BR HPL_pdlaswp01N \ (3), -.BR HPL_pdlaswp01T \ (3). diff --git a/hpl/man/man3/HPL_plindx10.3 b/hpl/man/man3/HPL_plindx10.3 deleted file mode 100644 index b81e189ab3fe32eb83c8acbbcfcea8d2978151d3..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_plindx10.3 +++ /dev/null @@ -1,68 +0,0 @@ -.TH HPL_plindx10 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_plindx10 \- Compute the logarithmic maps for the spreading. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_plindx10(\fR -\fB\&HPL_T_panel *\fR -\fI\&PANEL\fR, -\fB\&const int\fR -\fI\&K\fR, -\fB\&const int *\fR -\fI\&IPID\fR, -\fB\&int *\fR -\fI\&IPLEN\fR, -\fB\&int *\fR -\fI\&IPMAP\fR, -\fB\&int *\fR -\fI\&IPMAPM1\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_plindx10\fR -computes three arrays IPLEN, IPMAP and IPMAPM1 that -contain the logarithmic mapping information for the spreading phase. -.SH ARGUMENTS -.TP 8 -PANEL (local input/output) HPL_T_panel * -On entry, PANEL points to the data structure containing the -panel information. -.TP 8 -K (global input) const int -On entry, K specifies the number of entries in IPID. K is at -least 2*N, and at most 4*N. -.TP 8 -IPID (global input) const int * -On entry, IPID is an array of length K. The first K entries -of that array contain the src and final destination resulting -from the application of the interchanges.
-.TP 8 -IPLEN (global output) int * -On entry, IPLEN is an array of dimension NPROW + 1. On exit, -this array is such that IPLEN[i] is the number of rows of A -in the processes before process IPMAP[i] after the sort, with -the convention that IPLEN[nprow] is the total number of rows. -In other words, IPLEN[i+1] - IPLEN[i] is the local number of -rows of A that should be moved for each process. IPLEN is -such that the number of rows of the source process row can be -computed as IPLEN[1] - IPLEN[0], and the remaining entries of -this array are sorted so that the quantities IPLEN[i+1] - -IPLEN[i] are logarithmically sorted. -.TP 8 -IPMAP (global output) int * -On entry, IPMAP is an array of dimension NPROW. On exit, this -array contains the logarithmic mapping of the processes. In -other words, IPMAP[myrow] is the corresponding sorted process -coordinate. -.TP 8 -IPMAPM1 (global output) int * -On entry, IPMAPM1 is an array of dimension NPROW. On exit, -this array contains the inverse of the logarithmic mapping -contained in IPMAP: IPMAPM1[ IPMAP[i] ] = i, for all i in -[0.. NPROW) -.SH SEE ALSO -.BR HPL_pdlaswp00N \ (3), -.BR HPL_pdlaswp00T \ (3), -.BR HPL_pdlaswp01N \ (3), -.BR HPL_pdlaswp01T \ (3). diff --git a/hpl/man/man3/HPL_pnum.3 b/hpl/man/man3/HPL_pnum.3 deleted file mode 100644 index bd55b1b96a8b758d4b954dbc420ae85595ff2be5..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_pnum.3 +++ /dev/null @@ -1,38 +0,0 @@ -.TH HPL_pnum 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_pnum \- Rank determination. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&int\fR -\fB\&HPL_pnum(\fR -\fB\&const HPL_T_grid *\fR -\fI\&GRID\fR, -\fB\&const int\fR -\fI\&MYROW\fR, -\fB\&const int\fR -\fI\&MYCOL\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_pnum\fR -determines the rank of a process as a function of its -coordinates in the grid.
-.SH ARGUMENTS -.TP 8 -GRID (local input) const HPL_T_grid * -On entry, GRID points to the data structure containing the -process grid information. -.TP 8 -MYROW (local input) const int -On entry, MYROW specifies the row coordinate of the process -whose rank is to be determined. MYROW must be greater than or -equal to zero and less than NPROW. -.TP 8 -MYCOL (local input) const int -On entry, MYCOL specifies the column coordinate of the -process whose rank is to be determined. MYCOL must be greater -than or equal to zero and less than NPCOL. -.SH SEE ALSO -.BR HPL_grid_init \ (3), -.BR HPL_grid_info \ (3), -.BR HPL_grid_exit \ (3). diff --git a/hpl/man/man3/HPL_ptimer.3 b/hpl/man/man3/HPL_ptimer.3 deleted file mode 100644 index 7a7219d716b0e5a9d47867a89f2e1ee7f5b40d80..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_ptimer.3 +++ /dev/null @@ -1,35 +0,0 @@ -.TH HPL_ptimer 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_ptimer \- Timer facility. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_ptimer(\fR -\fB\&const int\fR -\fI\&I\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_ptimer\fR -provides a "stopwatch" functionality cpu/wall timer in -seconds. Up to 64 separate timers can be functioning at once. The -first call starts the timer, and the second stops it. This routine -can be disabled by calling HPL_ptimer_disable(), so that calls to -the timer are ignored. This feature can be used to make sure certain -sections of code do not affect timings, even if they call routines -which have HPL_ptimer calls in them. HPL_ptimer_enable() will enable -the timer functionality. One can retrieve the current value of a -timer by calling - -t0 = HPL_ptimer_inquire( HPL_WALL_TIME | HPL_CPU_TIME, I ) - -where I is the timer index in [0..64). To initialize the timer -functionality, one must have called HPL_ptimer_boot() prior to any of -the functions mentioned above.
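The start/stop behaviour described above (first call starts a timer, second call stops it, calls ignored while disabled) can be illustrated with a small stand-alone stopwatch. This is a sketch under stated assumptions, not the HPL code: every sw_ name is hypothetical, only CPU time via the standard clock() function is measured, and whether the real inquire call includes the time of a still-running interval is not stated in the text, so the choice below is an assumption.

```c
#include <assert.h>
#include <time.h>

#define SW_NTIMERS 64              /* timer count mentioned in the text */

static int    sw_enabled = 1;
static int    sw_running[SW_NTIMERS];
static double sw_start[SW_NTIMERS];
static double sw_total[SW_NTIMERS];

/* CPU time in seconds, from the standard C clock() function. */
static double sw_cpu_seconds(void)
{
    return (double)clock() / CLOCKS_PER_SEC;
}

/* Reset all timers; call before any other sw_ function. */
void sw_boot(void)
{
    for (int i = 0; i < SW_NTIMERS; i++) {
        sw_running[i] = 0;
        sw_start[i]   = 0.0;
        sw_total[i]   = 0.0;
    }
    sw_enabled = 1;
}

void sw_enable(void)  { sw_enabled = 1; }
void sw_disable(void) { sw_enabled = 0; }

/* First call starts timer i, second call stops it and accumulates. */
void sw_toggle(int i)
{
    assert(i >= 0 && i < SW_NTIMERS);
    if (!sw_enabled) return;       /* calls are ignored while disabled */
    if (!sw_running[i]) {
        sw_start[i]   = sw_cpu_seconds();
        sw_running[i] = 1;
    } else {
        sw_total[i]  += sw_cpu_seconds() - sw_start[i];
        sw_running[i] = 0;
    }
}

/* Accumulated seconds for timer i, including a running interval. */
double sw_inquire(int i)
{
    assert(i >= 0 && i < SW_NTIMERS);
    double t = sw_total[i];
    if (sw_running[i]) t += sw_cpu_seconds() - sw_start[i];
    return t;
}
```

A typical use brackets the measured region with two sw_toggle(i) calls and reads the result with sw_inquire(i). Wrapping an inner region in sw_disable() and sw_enable() keeps any timer calls made inside that region from changing timer state.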
-.SH ARGUMENTS -.TP 8 -I (global input) const int -On entry, I specifies the timer to stop/start. -.SH SEE ALSO -.BR HPL_ptimer_cputime \ (3), -.BR HPL_ptimer_walltime \ (3). diff --git a/hpl/man/man3/HPL_ptimer_cputime.3 b/hpl/man/man3/HPL_ptimer_cputime.3 deleted file mode 100644 index ed931284aaa3a71813480d12deceba5c8f8e0dcd..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_ptimer_cputime.3 +++ /dev/null @@ -1,23 +0,0 @@ -.TH HPL_ptimer_cputime 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_ptimer_cputime \- Return the CPU time. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&double\fR -\fB\&HPL_ptimer_cputime();\fR -.SH DESCRIPTION -\fB\&HPL_ptimer_cputime\fR -returns the cpu time. If HPL_USE_CLOCK is defined, -the clock() function is used to return an approximation of processor -time used by the program. The value returned is the CPU time used so -far as a clock_t; to get the number of seconds used, the result is -divided by CLOCKS_PER_SEC. This function is part of the ANSI/ISO C -standard library. If HPL_USE_TIMES is defined, the times() function -is used instead. This function returns the current process times. -times() returns the number of clock ticks that have elapsed since the -system has been up. Otherwise and by default, the standard library -function getrusage() is used. -.SH SEE ALSO -.BR HPL_ptimer_walltime \ (3), -.BR HPL_ptimer \ (3). diff --git a/hpl/man/man3/HPL_ptimer_walltime.3 b/hpl/man/man3/HPL_ptimer_walltime.3 deleted file mode 100644 index dba3fa0b3e8dbcd5ce2ded5f81df439671d6b365..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_ptimer_walltime.3 +++ /dev/null @@ -1,14 +0,0 @@ -.TH HPL_ptimer_walltime 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_ptimer_walltime \- Return the elapsed (wall-clock) time. 
-.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&double\fR -\fB\&HPL_ptimer_walltime();\fR -.SH DESCRIPTION -\fB\&HPL_ptimer_walltime\fR -returns the elapsed (wall-clock) time. -.SH SEE ALSO -.BR HPL_ptimer_cputime \ (3), -.BR HPL_ptimer \ (3). diff --git a/hpl/man/man3/HPL_pwarn.3 b/hpl/man/man3/HPL_pwarn.3 deleted file mode 100644 index 34ba5fa0b86860c5c935a3d27d3716bef927750d..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_pwarn.3 +++ /dev/null @@ -1,45 +0,0 @@ -.TH HPL_pwarn 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_pwarn \- displays an error message. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_pwarn(\fR -\fB\&FILE *\fR -\fI\&STREAM\fR, -\fB\&int\fR -\fI\&LINE\fR, -\fB\&const char *\fR -\fI\&SRNAME\fR, -\fB\&const char *\fR -\fI\&FORM\fR, -\fB\&...\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_pwarn\fR -displays an error message. -.SH ARGUMENTS -.TP 8 -STREAM (local input) FILE * -On entry, STREAM specifies the output stream. -.TP 8 -LINE (local input) int -On entry, LINE specifies the line number in the file where -the error has occured. When LINE is not a positive line -number, it is ignored. -.TP 8 -SRNAME (local input) const char * -On entry, SRNAME should be the name of the routine calling -this error handler. -.TP 8 -FORM (local input) const char * -On entry, FORM specifies the format, i.e., how the subsequent -arguments are converted for output. -.TP 8 - (local input) ... -On entry, ... is the list of arguments to be printed within -the format string. -.SH SEE ALSO -.BR HPL_pabort \ (3), -.BR HPL_fprintf \ (3). diff --git a/hpl/man/man3/HPL_rand.3 b/hpl/man/man3/HPL_rand.3 deleted file mode 100644 index ba6e84a4e5b68ee53347bdc3304ebbdb41910600..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_rand.3 +++ /dev/null @@ -1,28 +0,0 @@ -.TH HPL_rand 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_rand \- random number generator. 
-.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&double\fR -\fB\&HPL_rand();\fR -.SH DESCRIPTION -\fB\&HPL_rand\fR -generates the next number in the random sequence. This -function ensures that this number lies in the interval (-0.5, 0.5]. - -The static array irand contains the information (2 integers) required -to generate the next number in the sequence X(n). This number is -computed as X(n) = (2^32 * irand[1] + irand[0]) / d - 0.5, where the -constant d is the largest 64 bit positive integer. The array irand is -then updated for the generation of the next number X(n+1) in the -random sequence as follows X(n+1) = a * X(n) + c. The constants a and -c should have been preliminarily stored in the arrays ias and ics as -2 pairs of integers. The initialization of ias, ics and irand is -performed by the function HPL_setran. -.SH SEE ALSO -.BR HPL_ladd \ (3), -.BR HPL_lmul \ (3), -.BR HPL_setran \ (3), -.BR HPL_xjumpm \ (3), -.BR HPL_jumpit \ (3). diff --git a/hpl/man/man3/HPL_recv.3 b/hpl/man/man3/HPL_recv.3 deleted file mode 100644 index ae7b54a0a99eb7478006551172e98c4cdf1b92f2..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_recv.3 +++ /dev/null @@ -1,49 +0,0 @@ -.TH HPL_recv 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_recv \- Receive a message. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&int\fR -\fB\&HPL_recv(\fR -\fB\&double *\fR -\fI\&RBUF\fR, -\fB\&int\fR -\fI\&RCOUNT\fR, -\fB\&int\fR -\fI\&SRC\fR, -\fB\&int\fR -\fI\&RTAG\fR, -\fB\&MPI_Comm\fR -\fI\&COMM\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_recv\fR -is a simple wrapper around MPI_Recv. Its main purpose is -to allow for some experimentation / tuning of this simple routine. -Successful completion is indicated by the returned error code -HPL_SUCCESS. In the case of messages of length less than or equal to -zero, this function returns immediately. 
-.SH ARGUMENTS -.TP 8 -RBUF (local output) double * -On entry, RBUF specifies the starting address of buffer to be -received. -.TP 8 -RCOUNT (local input) int -On entry, RCOUNT specifies the number of double precision -entries in RBUF. RCOUNT must be at least zero. -.TP 8 -SRC (local input) int -On entry, SRC specifies the rank of the sending process in -the communication space defined by COMM. -.TP 8 -RTAG (local input) int -On entry, STAG specifies the message tag to be used for this -communication operation. -.TP 8 -COMM (local input) MPI_Comm -The MPI communicator identifying the communication space. -.SH SEE ALSO -.BR HPL_send \ (3), -.BR HPL_sendrecv \ (3). diff --git a/hpl/man/man3/HPL_reduce.3 b/hpl/man/man3/HPL_reduce.3 deleted file mode 100644 index 8981e537f3ef9e6fac9ff24a063b3b4f786b7651..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_reduce.3 +++ /dev/null @@ -1,56 +0,0 @@ -.TH HPL_reduce 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_reduce \- Reduce operation. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&int\fR -\fB\&HPL_reduce(\fR -\fB\&void *\fR -\fI\&BUFFER\fR, -\fB\&const int\fR -\fI\&COUNT\fR, -\fB\&const HPL_T_TYPE\fR -\fI\&DTYPE\fR, -\fB\&const HPL_T_OP \fR -\fI\&OP\fR, -\fB\&const int\fR -\fI\&ROOT\fR, -\fB\&MPI_Comm\fR -\fI\&COMM\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_reduce\fR -performs a global reduce operation across all processes of -a group. Note that the input buffer is used as workarray and in all -processes but the accumulating process corrupting the original data. -.SH ARGUMENTS -.TP 8 -BUFFER (local input/output) void * -On entry, BUFFER points to the buffer to be reduced. On -exit, and in process of rank ROOT this array contains the -reduced data. This buffer is also used as workspace during -the operation in the other processes of the group. -.TP 8 -COUNT (global input) const int -On entry, COUNT indicates the number of entries in BUFFER. -COUNT must be at least zero. 
-.TP 8 -DTYPE (global input) const HPL_T_TYPE -On entry, DTYPE specifies the type of the buffers operands. -.TP 8 -OP (global input) const HPL_T_OP -On entry, OP is a pointer to the local combine function. -.TP 8 -ROOT (global input) const int -On entry, ROOT is the coordinate of the accumulating process. -.TP 8 -COMM (global/local input) MPI_Comm -The MPI communicator identifying the process collection. -.SH SEE ALSO -.BR HPL_broadcast \ (3), -.BR HPL_all_reduce \ (3), -.BR HPL_barrier \ (3), -.BR HPL_min \ (3), -.BR HPL_max \ (3), -.BR HPL_sum \ (3). diff --git a/hpl/man/man3/HPL_rollN.3 b/hpl/man/man3/HPL_rollN.3 deleted file mode 100644 index ea68616dbeb30a01f2bc9fdddd6b021c3528ef37..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_rollN.3 +++ /dev/null @@ -1,77 +0,0 @@ -.TH HPL_rollN 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_rollN \- Roll U and forward the column panel. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_rollN(\fR -\fB\&HPL_T_panel *\fR -\fI\&PBCST\fR, -\fB\&int *\fR -\fI\&IFLAG\fR, -\fB\&HPL_T_panel *\fR -\fI\&PANEL\fR, -\fB\&const int\fR -\fI\&N\fR, -\fB\&double *\fR -\fI\&U\fR, -\fB\&const int\fR -\fI\&LDU\fR, -\fB\&const int *\fR -\fI\&IPLEN\fR, -\fB\&const int *\fR -\fI\&IPMAP\fR, -\fB\&const int *\fR -\fI\&IPMAPM1\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_rollN\fR -rolls the local arrays containing the local pieces of U, so -that on exit to this function U is replicated in every process row. -In addition, this function probe for the presence of the column panel -and forwards it when available. -.SH ARGUMENTS -.TP 8 -PBCST (local input/output) HPL_T_panel * -On entry, PBCST points to the data structure containing the -panel (to be broadcast) information. -.TP 8 -IFLAG (local input/output) int * -On entry, IFLAG indicates whether or not the broadcast has -already been completed. If not, probing will occur, and the -outcome will be contained in IFLAG on exit. 
-.TP 8 -PANEL (local input/output) HPL_T_panel * -On entry, PANEL points to the data structure containing the -panel (to be rolled) information. -.TP 8 -N (local input) const int -On entry, N specifies the number of columns of U. N must be -at least zero. -.TP 8 -U (local input/output) double * -On entry, U is an array of dimension (LDU,*) containing the -local pieces of U in each process row. -.TP 8 -LDU (local input) const int -On entry, LDU specifies the local leading dimension of U. LDU -should be at least MAX(1,IPLEN[NPROW]). -.TP 8 -IPLEN (global input) const int * -On entry, IPLEN is an array of dimension NPROW+1. This array -is such that IPLEN[i+1] - IPLEN[i] is the number of rows of U -in each process row. -.TP 8 -IPMAP (global input) const int * -On entry, IMAP is an array of dimension NPROW. This array -contains the logarithmic mapping of the processes. In other -words, IMAP[myrow] is the absolute coordinate of the sorted -process. -.TP 8 -IPMAPM1 (global input) const int * -On entry, IMAPM1 is an array of dimension NPROW. This array -contains the inverse of the logarithmic mapping contained in -IMAP: For i in [0.. NPROW) IMAPM1[IMAP[i]] = i. -.SH SEE ALSO -.BR HPL_pdlaswp01N \ (3). diff --git a/hpl/man/man3/HPL_rollT.3 b/hpl/man/man3/HPL_rollT.3 deleted file mode 100644 index b4064fe695b3cbe4b06fddcccd351eefbbe686f8..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_rollT.3 +++ /dev/null @@ -1,77 +0,0 @@ -.TH HPL_rollT 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_rollT \- Roll U and forward the column panel. 
-.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_rollT(\fR -\fB\&HPL_T_panel *\fR -\fI\&PBCST\fR, -\fB\&int *\fR -\fI\&IFLAG\fR, -\fB\&HPL_T_panel *\fR -\fI\&PANEL\fR, -\fB\&const int\fR -\fI\&N\fR, -\fB\&double *\fR -\fI\&U\fR, -\fB\&const int\fR -\fI\&LDU\fR, -\fB\&const int *\fR -\fI\&IPLEN\fR, -\fB\&const int *\fR -\fI\&IPMAP\fR, -\fB\&const int *\fR -\fI\&IPMAPM1\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_rollT\fR -rolls the local arrays containing the local pieces of U, so -that on exit to this function U is replicated in every process row. -In addition, this function probe for the presence of the column panel -and forwards it when available. -.SH ARGUMENTS -.TP 8 -PBCST (local input/output) HPL_T_panel * -On entry, PBCST points to the data structure containing the -panel (to be broadcast) information. -.TP 8 -IFLAG (local input/output) int * -On entry, IFLAG indicates whether or not the broadcast has -already been completed. If not, probing will occur, and the -outcome will be contained in IFLAG on exit. -.TP 8 -PANEL (local input/output) HPL_T_panel * -On entry, PANEL points to the data structure containing the -panel (to be rolled) information. -.TP 8 -N (local input) const int -On entry, N specifies the local number of rows of U. N must -be at least zero. -.TP 8 -U (local input/output) double * -On entry, U is an array of dimension (LDU,*) containing the -local pieces of U in each process row. -.TP 8 -LDU (local input) const int -On entry, LDU specifies the local leading dimension of U. LDU -should be at least MAX(1,N). -.TP 8 -IPLEN (global input) const int * -On entry, IPLEN is an array of dimension NPROW+1. This array -is such that IPLEN[i+1] - IPLEN[i] is the number of rows of U -in each process row. -.TP 8 -IPMAP (global input) const int * -On entry, IMAP is an array of dimension NPROW. This array -contains the logarithmic mapping of the processes. In other -words, IMAP[myrow] is the absolute coordinate of the sorted -process. 
-.TP 8 -IPMAPM1 (global input) const int * -On entry, IMAPM1 is an array of dimension NPROW. This array -contains the inverse of the logarithmic mapping contained in -IMAP: For i in [0.. NPROW) IMAPM1[IMAP[i]] = i. -.SH SEE ALSO -.BR HPL_pdlaswp01T \ (3). diff --git a/hpl/man/man3/HPL_sdrv.3 b/hpl/man/man3/HPL_sdrv.3 deleted file mode 100644 index b33f12a8729ddc6f7c679ff6e4955330a45f30f4..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_sdrv.3 +++ /dev/null @@ -1,67 +0,0 @@ -.TH HPL_sdrv 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_sdrv \- Send and receive a message. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&int\fR -\fB\&HPL_sdrv(\fR -\fB\&double *\fR -\fI\&SBUF\fR, -\fB\&int\fR -\fI\&SCOUNT\fR, -\fB\&int\fR -\fI\&STAG\fR, -\fB\&double *\fR -\fI\&RBUF\fR, -\fB\&int\fR -\fI\&RCOUNT\fR, -\fB\&int\fR -\fI\&RTAG\fR, -\fB\&int\fR -\fI\&PARTNER\fR, -\fB\&MPI_Comm\fR -\fI\&COMM\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_sdrv\fR -is a simple wrapper around MPI_Sendrecv. Its main purpose is -to allow for some experimentation and tuning of this simple function. -Messages of length less than or equal to zero are not sent nor -received. Successful completion is indicated by the returned error -code HPL_SUCCESS. -.SH ARGUMENTS -.TP 8 -SBUF (local input) double * -On entry, SBUF specifies the starting address of buffer to be -sent. -.TP 8 -SCOUNT (local input) int -On entry, SCOUNT specifies the number of double precision -entries in SBUF. SCOUNT must be at least zero. -.TP 8 -STAG (local input) int -On entry, STAG specifies the message tag to be used for the -sending communication operation. -.TP 8 -RBUF (local output) double * -On entry, RBUF specifies the starting address of buffer to be -received. -.TP 8 -RCOUNT (local input) int -On entry, RCOUNT specifies the number of double precision -entries in RBUF. RCOUNT must be at least zero. 
-.TP 8 -RTAG (local input) int -On entry, RTAG specifies the message tag to be used for the -receiving communication operation. -.TP 8 -PARTNER (local input) int -On entry, PARTNER specifies the rank of the collaborative -process in the communication space defined by COMM. -.TP 8 -COMM (local input) MPI_Comm -The MPI communicator identifying the communication space. -.SH SEE ALSO -.BR HPL_send \ (3), -.BR HPL_recv \ (3). diff --git a/hpl/man/man3/HPL_send.3 b/hpl/man/man3/HPL_send.3 deleted file mode 100644 index 214a44871529cd8e18557d925a961421059d7a4a..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_send.3 +++ /dev/null @@ -1,49 +0,0 @@ -.TH HPL_send 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_send \- Send a message. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&int\fR -\fB\&HPL_send(\fR -\fB\&double *\fR -\fI\&SBUF\fR, -\fB\&int\fR -\fI\&SCOUNT\fR, -\fB\&int\fR -\fI\&DEST\fR, -\fB\&int\fR -\fI\&STAG\fR, -\fB\&MPI_Comm\fR -\fI\&COMM\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_send\fR -is a simple wrapper around MPI_Send. Its main purpose is -to allow for some experimentation / tuning of this simple routine. -Successful completion is indicated by the returned error code -MPI_SUCCESS. In the case of messages of length less than or equal to -zero, this function returns immediately. -.SH ARGUMENTS -.TP 8 -SBUF (local input) double * -On entry, SBUF specifies the starting address of buffer to be -sent. -.TP 8 -SCOUNT (local input) int -On entry, SCOUNT specifies the number of double precision -entries in SBUF. SCOUNT must be at least zero. -.TP 8 -DEST (local input) int -On entry, DEST specifies the rank of the receiving process in -the communication space defined by COMM. -.TP 8 -STAG (local input) int -On entry, STAG specifies the message tag to be used for this -communication operation. -.TP 8 -COMM (local input) MPI_Comm -The MPI communicator identifying the communication space. 
-.SH SEE ALSO -.BR HPL_recv \ (3), -.BR HPL_sendrecv \ (3). diff --git a/hpl/man/man3/HPL_setran.3 b/hpl/man/man3/HPL_setran.3 deleted file mode 100644 index be4533487215d9f7422734558268266b53803506..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_setran.3 +++ /dev/null @@ -1,37 +0,0 @@ -.TH HPL_setran 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_setran \- Manage the random number generator. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_setran(\fR -\fB\&const int\fR -\fI\&OPTION\fR, -\fB\&int *\fR -\fI\&IRAN\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_setran\fR -initializes the random generator with the encoding of the -first number X(0) in the sequence, and the constants a and c used to -compute the next element in the sequence: X(n+1) = a*X(n) + c. X(0), -a and c are stored in the static variables irand, ias and ics. When -OPTION is 0 (resp. 1 and 2), irand (resp. ia and ic) is set to the -values of the input array IRAN. When OPTION is 3, IRAN is set to the -current value of irand, and irand is then incremented. -.SH ARGUMENTS -.TP 8 -OPTION (local input) const int -On entry, OPTION is an integer that specifies the operations -to be performed on the random generator as specified above. -.TP 8 -IRAN (local input/output) int * -On entry, IRAN is an array of dimension 2, that contains the -16-lower and 15-higher bits of a random number. -.SH SEE ALSO -.BR HPL_ladd \ (3), -.BR HPL_lmul \ (3), -.BR HPL_xjumpm \ (3), -.BR HPL_jumpit \ (3), -.BR HPL_rand \ (3). diff --git a/hpl/man/man3/HPL_spreadN.3 b/hpl/man/man3/HPL_spreadN.3 deleted file mode 100644 index e5c1c0af9ec7bb9d90840bca8097e06e9d26824a..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_spreadN.3 +++ /dev/null @@ -1,96 +0,0 @@ -.TH HPL_spreadN 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_spreadN \- Spread row panel U and forward current column panel. 
-.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_spreadN(\fR -\fB\&HPL_T_panel *\fR -\fI\&PBCST\fR, -\fB\&int *\fR -\fI\&IFLAG\fR, -\fB\&HPL_T_panel *\fR -\fI\&PANEL\fR, -\fB\&const enum HPL_SIDE\fR -\fI\&SIDE\fR, -\fB\&const int\fR -\fI\&N\fR, -\fB\&double *\fR -\fI\&U\fR, -\fB\&const int\fR -\fI\&LDU\fR, -\fB\&const int\fR -\fI\&SRCDIST\fR, -\fB\&const int *\fR -\fI\&IPLEN\fR, -\fB\&const int *\fR -\fI\&IPMAP\fR, -\fB\&const int *\fR -\fI\&IPMAPM1\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_spreadN\fR -spreads the local array containing local pieces of U, so -that on exit to this function, a piece of U is contained in every -process row. The array IPLEN contains the number of rows of U, that -should be spread on any given process row. This function also probes -for the presence of the column panel PBCST. In case of success, this -panel will be forwarded. If PBCST is NULL on input, this probing -mechanism will be disabled. -.SH ARGUMENTS -.TP 8 -PBCST (local input/output) HPL_T_panel * -On entry, PBCST points to the data structure containing the -panel (to be broadcast) information. -.TP 8 -IFLAG (local input/output) int * -On entry, IFLAG indicates whether or not the broadcast has -already been completed. If not, probing will occur, and the -outcome will be contained in IFLAG on exit. -.TP 8 -PANEL (local input/output) HPL_T_panel * -On entry, PANEL points to the data structure containing the -panel (to be spread) information. -.TP 8 -SIDE (global input) const enum HPL_SIDE -On entry, SIDE specifies whether the local piece of U located -in process IPMAP[SRCDIST] should be spread to the right or to -the left. This feature is used by the equilibration process. -.TP 8 -N (global input) const int -On entry, N specifies the local number of columns of U. N -must be at least zero. -.TP 8 -U (local input/output) double * -On entry, U is an array of dimension (LDU,*) containing the -local pieces of U. 
-.TP 8 -LDU (local input) const int -On entry, LDU specifies the local leading dimension of U. LDU -should be at least MAX(1,IPLEN[nprow]). -.TP 8 -SRCDIST (local input) const int -On entry, SRCDIST specifies the source process that spreads -its piece of U. -.TP 8 -IPLEN (global input) const int * -On entry, IPLEN is an array of dimension NPROW+1. This array -is such that IPLEN[i+1] - IPLEN[i] is the number of rows of U -in each process before process IPMAP[i], with the convention -that IPLEN[nprow] is the total number of rows. In other words -IPLEN[i+1] - IPLEN[i] is the local number of rows of U that -should be moved to process IPMAP[i]. -.TP 8 -IPMAP (global input) const int * -On entry, IPMAP is an array of dimension NPROW. This array -contains the logarithmic mapping of the processes. In other -words, IPMAP[myrow] is the absolute coordinate of the sorted -process. -.TP 8 -IPMAPM1 (global input) const int * -On entry, IPMAPM1 is an array of dimension NPROW. This array -contains the inverse of the logarithmic mapping contained in -IPMAP: For i in [0.. NPROW) IPMAPM1[IPMAP[i]] = i. -.SH SEE ALSO -.BR HPL_pdlaswp01N \ (3). diff --git a/hpl/man/man3/HPL_spreadT.3 b/hpl/man/man3/HPL_spreadT.3 deleted file mode 100644 index f8529b4efe69b1a2fb4b6ad1fdbf3b89b46654bb..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_spreadT.3 +++ /dev/null @@ -1,96 +0,0 @@ -.TH HPL_spreadT 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_spreadT \- Spread row panel U and forward current column panel. 
-.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_spreadT(\fR -\fB\&HPL_T_panel *\fR -\fI\&PBCST\fR, -\fB\&int *\fR -\fI\&IFLAG\fR, -\fB\&HPL_T_panel *\fR -\fI\&PANEL\fR, -\fB\&const enum HPL_SIDE\fR -\fI\&SIDE\fR, -\fB\&const int\fR -\fI\&N\fR, -\fB\&double *\fR -\fI\&U\fR, -\fB\&const int\fR -\fI\&LDU\fR, -\fB\&const int\fR -\fI\&SRCDIST\fR, -\fB\&const int *\fR -\fI\&IPLEN\fR, -\fB\&const int *\fR -\fI\&IPMAP\fR, -\fB\&const int *\fR -\fI\&IPMAPM1\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_spreadT\fR -spreads the local array containing local pieces of U, so -that on exit to this function, a piece of U is contained in every -process row. The array IPLEN contains the number of columns of U, -that should be spread on any given process row. This function also -probes for the presence of the column panel PBCST. If available, -this panel will be forwarded. If PBCST is NULL on input, this -probing mechanism will be disabled. -.SH ARGUMENTS -.TP 8 -PBCST (local input/output) HPL_T_panel * -On entry, PBCST points to the data structure containing the -panel (to be broadcast) information. -.TP 8 -IFLAG (local input/output) int * -On entry, IFLAG indicates whether or not the broadcast has -already been completed. If not, probing will occur, and the -outcome will be contained in IFLAG on exit. -.TP 8 -PANEL (local input/output) HPL_T_panel * -On entry, PANEL points to the data structure containing the -panel (to be spread) information. -.TP 8 -SIDE (global input) const enum HPL_SIDE -On entry, SIDE specifies whether the local piece of U located -in process IPMAP[SRCDIST] should be spread to the right or to -the left. This feature is used by the equilibration process. -.TP 8 -N (global input) const int -On entry, N specifies the local number of rows of U. N must -be at least zero. -.TP 8 -U (local input/output) double * -On entry, U is an array of dimension (LDU,*) containing the -local pieces of U. 
-.TP 8 -LDU (local input) const int -On entry, LDU specifies the local leading dimension of U. LDU -should be at least MAX(1,N). -.TP 8 -SRCDIST (local input) const int -On entry, SRCDIST specifies the source process that spreads -its piece of U. -.TP 8 -IPLEN (global input) const int * -On entry, IPLEN is an array of dimension NPROW+1. This array -is such that IPLEN[i+1] - IPLEN[i] is the number of rows of U -in each process before process IPMAP[i], with the convention -that IPLEN[nprow] is the total number of rows. In other words -IPLEN[i+1] - IPLEN[i] is the local number of rows of U that -should be moved to process IPMAP[i]. -.TP 8 -IPMAP (global input) const int * -On entry, IPMAP is an array of dimension NPROW. This array -contains the logarithmic mapping of the processes. In other -words, IPMAP[myrow] is the absolute coordinate of the sorted -process. -.TP 8 -IPMAPM1 (global input) const int * -On entry, IPMAPM1 is an array of dimension NPROW. This array -contains the inverse of the logarithmic mapping contained in -IPMAP: For i in [0.. NPROW) IPMAPM1[IPMAP[i]] = i. -.SH SEE ALSO -.BR HPL_pdlaswp01T \ (3). diff --git a/hpl/man/man3/HPL_sum.3 b/hpl/man/man3/HPL_sum.3 deleted file mode 100644 index 274e1186b43e024effb16c7826bcde5b5a57cf8c..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_sum.3 +++ /dev/null @@ -1,44 +0,0 @@ -.TH HPL_sum 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_sum \- Combine (sum) two buffers. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_sum(\fR -\fB\&const int\fR -\fI\&N\fR, -\fB\&const void *\fR -\fI\&IN\fR, -\fB\&void *\fR -\fI\&INOUT\fR, -\fB\&const HPL_T_TYPE\fR -\fI\&DTYPE\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_sum\fR -combines (sum) two buffers. -.SH ARGUMENTS -.TP 8 -N (input) const int -On entry, N specifies the length of the buffers to be -combined. N must be at least zero. -.TP 8 -IN (input) const void * -On entry, IN points to the input-only buffer to be combined. 
-.TP 8 -INOUT (input/output) void * -On entry, INOUT points to the input-output buffer to be -combined. On exit, the entries of this array contains the -combined results. -.TP 8 -DTYPE (input) const HPL_T_TYPE -On entry, DTYPE specifies the type of the buffers operands. -.SH SEE ALSO -.BR HPL_broadcast \ (3), -.BR HPL_reduce \ (3), -.BR HPL_all_reduce \ (3), -.BR HPL_barrier \ (3), -.BR HPL_min \ (3), -.BR HPL_max \ (3), -.BR HPL_sum \ (3). diff --git a/hpl/man/man3/HPL_timer.3 b/hpl/man/man3/HPL_timer.3 deleted file mode 100644 index 77b9d3cf3dfc4a12aeecf80b7c72531404973f0b..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_timer.3 +++ /dev/null @@ -1,35 +0,0 @@ -.TH HPL_timer 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_timer \- Timer facility. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_timer(\fR -\fB\&const int\fR -\fI\&I\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_timer\fR -provides a "stopwatch" functionality cpu/wall timer in -seconds. Up to 64 separate timers can be functioning at once. The -first call starts the timer, and the second stops it. This routine -can be disenabled by calling HPL_timer_disable(), so that calls to -the timer are ignored. This feature can be used to make sure certain -sections of code do not affect timings, even if they call routines -which have HPL_timer calls in them. HPL_timer_enable() will re-enable -the timer functionality. One can retrieve the current value of a -timer by calling - -t0 = HPL_timer_inquire( HPL_WALL_TIME | HPL_CPU_TIME, I ) - -where I is the timer index in [0..64). To initialize the timer -functionality, one must have called HPL_timer_boot() prior to any of -the functions mentioned above. -.SH ARGUMENTS -.TP 8 -I (global input) const int -On entry, I specifies the timer to stop/start. -.SH SEE ALSO -.BR HPL_timer_cputime \ (3), -.BR HPL_timer_walltime \ (3). 
diff --git a/hpl/man/man3/HPL_timer_cputime.3 b/hpl/man/man3/HPL_timer_cputime.3 deleted file mode 100644 index c9444c9031a37cb8998ed38735ad751304b8d528..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_timer_cputime.3 +++ /dev/null @@ -1,23 +0,0 @@ -.TH HPL_timer_cputime 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_timer_cputime \- Return the CPU time. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&double\fR -\fB\&HPL_timer_cputime();\fR -.SH DESCRIPTION -\fB\&HPL_timer_cputime\fR -returns the cpu time. If HPL_USE_CLOCK is defined, -the clock() function is used to return an approximation of processor -time used by the program. The value returned is the CPU time used so -far as a clock_t; to get the number of seconds used, the result is -divided by CLOCKS_PER_SEC. This function is part of the ANSI/ISO C -standard library. If HPL_USE_TIMES is defined, the times() function -is used instead. This function returns the current process times. -times() returns the number of clock ticks that have elapsed since the -system has been up. Otherwise and by default, the standard library -function getrusage() is used. -.SH SEE ALSO -.BR HPL_timer_walltime \ (3), -.BR HPL_timer \ (3). diff --git a/hpl/man/man3/HPL_timer_walltime.3 b/hpl/man/man3/HPL_timer_walltime.3 deleted file mode 100644 index 4ff6532da2bcef7890042d86f91df53d55a31e63..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_timer_walltime.3 +++ /dev/null @@ -1,14 +0,0 @@ -.TH HPL_timer_walltime 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_timer_walltime \- Return the elapsed (wall-clock) time. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&double\fR -\fB\&HPL_timer_walltime();\fR -.SH DESCRIPTION -\fB\&HPL_timer_walltime\fR -returns the elapsed (wall-clock) time. -.SH SEE ALSO -.BR HPL_timer_cputime \ (3), -.BR HPL_timer \ (3). 
diff --git a/hpl/man/man3/HPL_warn.3 b/hpl/man/man3/HPL_warn.3 deleted file mode 100644 index 433528bb3e446ccbf103b942187daee9bec54ae4..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_warn.3 +++ /dev/null @@ -1,59 +0,0 @@ -.TH HPL_warn 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_warn \- displays an error message. -.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_warn(\fR -\fB\&FILE *\fR -\fI\&STREAM\fR, -\fB\&int\fR -\fI\&LINE\fR, -\fB\&const char *\fR -\fI\&SRNAME\fR, -\fB\&const char *\fR -\fI\&FORM\fR, -\fB\&...\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_warn\fR -displays an error message. -.SH ARGUMENTS -.TP 8 -STREAM (local input) FILE * -On entry, STREAM specifies the output stream. -.TP 8 -LINE (local input) int -On entry, LINE specifies the line number in the file where -the error has occured. When LINE is not a positive line -number, it is ignored. -.TP 8 -SRNAME (local input) const char * -On entry, SRNAME should be the name of the routine calling -this error handler. -.TP 8 -FORM (local input) const char * -On entry, FORM specifies the format, i.e., how the subsequent -arguments are converted for output. -.TP 8 - (local input) ... -On entry, ... is the list of arguments to be printed within -the format string. -.SH EXAMPLE -\fI\&#include "hpl.h"\fR - -int main(int argc, char *argv[]) -.br -{ -.br - HPL_warn( stderr, __LINE__, __FILE__, -.br - "Demo.\en" ); -.br - exit(0); return(0); -.br -} -.SH SEE ALSO -.BR HPL_abort \ (3), -.BR HPL_fprintf \ (3). diff --git a/hpl/man/man3/HPL_xjumpm.3 b/hpl/man/man3/HPL_xjumpm.3 deleted file mode 100644 index d2aa7faa8a9f956f80cc827001b9818be4d5983b..0000000000000000000000000000000000000000 --- a/hpl/man/man3/HPL_xjumpm.3 +++ /dev/null @@ -1,77 +0,0 @@ -.TH HPL_xjumpm 3 "October 26, 2012" "HPL 2.1" "HPL Library Functions" -.SH NAME -HPL_xjumpm \- Compute constants to jump in the random sequence. 
-.SH SYNOPSIS -\fB\&#include "hpl.h"\fR - -\fB\&void\fR -\fB\&HPL_xjumpm(\fR -\fB\&const int\fR -\fI\&JUMPM\fR, -\fB\&int *\fR -\fI\&MULT\fR, -\fB\&int *\fR -\fI\&IADD\fR, -\fB\&int *\fR -\fI\&IRANN\fR, -\fB\&int *\fR -\fI\&IRANM\fR, -\fB\&int *\fR -\fI\&IAM\fR, -\fB\&int *\fR -\fI\&ICM\fR -\fB\&);\fR -.SH DESCRIPTION -\fB\&HPL_xjumpm\fR -computes the constants A and C to jump JUMPM numbers in -the random sequence: X(n+JUMPM) = A*X(n)+C. The constants encoded in -MULT and IADD specify how to jump from one entry in the sequence to -the next. -.SH ARGUMENTS -.TP 8 -JUMPM (local input) const int -On entry, JUMPM specifies the number of entries in the -sequence to jump over. When JUMPM is less or equal than zero, -A and C are not computed, IRANM is set to IRANN corresponding -to a jump of size zero. -.TP 8 -MULT (local input) int * -On entry, MULT is an array of dimension 2, that contains the -16-lower and 15-higher bits of the constant a to jump from -X(n) to X(n+1) = a*X(n) + c in the random sequence. -.TP 8 -IADD (local input) int * -On entry, IADD is an array of dimension 2, that contains the -16-lower and 15-higher bits of the constant c to jump from -X(n) to X(n+1) = a*X(n) + c in the random sequence. -.TP 8 -IRANN (local input) int * -On entry, IRANN is an array of dimension 2. that contains the -16-lower and 15-higher bits of the encoding of X(n). -.TP 8 -IRANM (local output) int * -On entry, IRANM is an array of dimension 2. On exit, this -array contains respectively the 16-lower and 15-higher bits -of the encoding of X(n+JUMPM). -.TP 8 -IAM (local output) int * -On entry, IAM is an array of dimension 2. On exit, when JUMPM -is greater than zero, this array contains the encoded -constant A to jump from X(n) to X(n+JUMPM) in the random -sequence. IAM(0:1) contains respectively the 16-lower and -15-higher bits of this constant A. When JUMPM is less or -equal than zero, this array is not referenced. 
-.TP 8 -ICM (local output) int * -On entry, ICM is an array of dimension 2. On exit, when JUMPM -is greater than zero, this array contains the encoded -constant C to jump from X(n) to X(n+JUMPM) in the random -sequence. ICM(0:1) contains respectively the 16-lower and -15-higher bits of this constant C. When JUMPM is less or -equal than zero, this array is not referenced. -.SH SEE ALSO -.BR HPL_ladd \ (3), -.BR HPL_lmul \ (3), -.BR HPL_setran \ (3), -.BR HPL_jumpit \ (3), -.BR HPL_rand \ (3). diff --git a/hpl/setup/Make.FreeBSD_PIV_CBLAS b/hpl/setup/Make.FreeBSD_PIV_CBLAS deleted file mode 100644 index 12218d219b00a30a28468a074b314dc84ab6da8e..0000000000000000000000000000000000000000 --- a/hpl/setup/Make.FreeBSD_PIV_CBLAS +++ /dev/null @@ -1,183 +0,0 @@ -# -# -- High Performance Computing Linpack Benchmark (HPL) -# HPL - 2.1 - October 26, 2012 -# Antoine P. Petitet -# University of Tennessee, Knoxville -# Innovative Computing Laboratory -# (C) Copyright 2000-2008 All Rights Reserved -# -# -- Copyright notice and Licensing terms: -# -# Redistribution and use in source and binary forms, with or without -# modification, are permitted provided that the following conditions -# are met: -# -# 1. Redistributions of source code must retain the above copyright -# notice, this list of conditions and the following disclaimer. -# -# 2. Redistributions in binary form must reproduce the above copyright -# notice, this list of conditions, and the following disclaimer in the -# documentation and/or other materials provided with the distribution. -# -# 3. All advertising materials mentioning features or use of this -# software must display the following acknowledgement: -# This product includes software developed at the University of -# Tennessee, Knoxville, Innovative Computing Laboratory. -# -# 4. 
The name of the University, the name of the Laboratory, or the -# names of its contributors may not be used to endorse or promote -# products derived from this software without specific written -# permission. -# -# -- Disclaimer: -# -# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS -# ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR -# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY -# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, -# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT -# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, -# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY -# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT -# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
-# ###################################################################### -# -# ---------------------------------------------------------------------- -# - shell -------------------------------------------------------------- -# ---------------------------------------------------------------------- -# -SHELL = /bin/sh -# -CD = cd -CP = cp -LN_S = ln -s -MKDIR = mkdir -RM = /bin/rm -f -TOUCH = touch -# -# ---------------------------------------------------------------------- -# - Platform identifier ------------------------------------------------ -# ---------------------------------------------------------------------- -# -ARCH = FreeBSD_PIV_CBLAS -# -# ---------------------------------------------------------------------- -# - HPL Directory Structure / HPL library ------------------------------ -# ---------------------------------------------------------------------- -# -TOPdir = $(HOME)/hpl -INCdir = $(TOPdir)/include -BINdir = $(TOPdir)/bin/$(ARCH) -LIBdir = $(TOPdir)/lib/$(ARCH) -# -HPLlib = $(LIBdir)/libhpl.a -# -# ---------------------------------------------------------------------- -# - Message Passing library (MPI) -------------------------------------- -# ---------------------------------------------------------------------- -# MPinc tells the C compiler where to find the Message Passing library -# header files, MPlib is defined to be the name of the library to be -# used. The variable MPdir is only used for defining MPinc and MPlib. -# -MPdir = /usr/local/mpich -MPinc = -I$(MPdir)/include -MPlib = $(MPdir)/lib/libmpich.a $(MPdir)/lib/libpmpich.a -# -# ---------------------------------------------------------------------- -# - Linear Algebra library (BLAS or VSIPL) ----------------------------- -# ---------------------------------------------------------------------- -# LAinc tells the C compiler where to find the Linear Algebra library -# header files, LAlib is defined to be the name of the library to be -# used. 
The variable LAdir is only used for defining LAinc and LAlib. -# -LAdir = $(HOME)/share/ATLAS/lib/FreeBSD_P5SSE2 -LAinc = -LAlib = $(LAdir)/libcblas.a $(LAdir)/libatlas.a -# -# ---------------------------------------------------------------------- -# - F77 / C interface -------------------------------------------------- -# ---------------------------------------------------------------------- -# You can skip this section if and only if you are not planning to use -# a BLAS library featuring a Fortran 77 interface. Otherwise, it is -# necessary to fill out the F2CDEFS variable with the appropriate -# options. **One and only one** option should be chosen in **each** of -# the 3 following categories: -# -# 1) name space (How C calls a Fortran 77 routine) -# -# -DAdd_ : all lower case and a suffixed underscore (Suns, -# Intel, ...), [default] -# -DNoChange : all lower case (IBM RS6000), -# -DUpCase : all upper case (Cray), -# -DAdd__ : the FORTRAN compiler in use is f2c. -# -# 2) C and Fortran 77 integer mapping -# -# -DF77_INTEGER=int : Fortran 77 INTEGER is a C int, [default] -# -DF77_INTEGER=long : Fortran 77 INTEGER is a C long, -# -DF77_INTEGER=short : Fortran 77 INTEGER is a C short. -# -# 3) Fortran 77 string handling -# -# -DStringSunStyle : The string address is passed at the string loca- -# tion on the stack, and the string length is then -# passed as an F77_INTEGER after all explicit -# stack arguments, [default] -# -DStringStructPtr : The address of a structure is passed by a -# Fortran 77 string, and the structure is of the -# form: struct {char *cp; F77_INTEGER len;}, -# -DStringStructVal : A structure is passed by value for each Fortran -# 77 string, and the structure is of the form: -# struct {char *cp; F77_INTEGER len;}, -# -DStringCrayStyle : Special option for Cray machines, which uses -# Cray fcd (fortran character descriptor) for -# interoperation. 
-# -F2CDEFS = -# -# ---------------------------------------------------------------------- -# - HPL includes / libraries / specifics ------------------------------- -# ---------------------------------------------------------------------- -# -HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc) -HPL_LIBS = $(HPLlib) $(LAlib) $(MPlib) -# -# - Compile time options ----------------------------------------------- -# -# -DHPL_COPY_L force the copy of the panel L before bcast; -# -DHPL_CALL_CBLAS call the cblas interface; -# -DHPL_CALL_VSIPL call the vsip library; -# -DHPL_DETAILED_TIMING enable detailed timers; -# -# By default HPL will: -# *) not copy L before broadcast, -# *) call the BLAS Fortran 77 interface, -# *) not display detailed timing information. -# -HPL_OPTS = -DHPL_CALL_CBLAS -# -# ---------------------------------------------------------------------- -# -HPL_DEFS = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES) -# -# ---------------------------------------------------------------------- -# - Compilers / linkers - Optimization flags --------------------------- -# ---------------------------------------------------------------------- -# -CC = /usr/bin/gcc -CCNOOPT = $(HPL_DEFS) -CCFLAGS = $(HPL_DEFS) -fomit-frame-pointer -O3 -funroll-loops -# -# On some platforms, it is necessary to use the Fortran linker to find -# the Fortran internals used in the BLAS library. -# -LINKER = /usr/bin/f77 -LINKFLAGS = $(CCFLAGS) -# -ARCHIVER = ar -ARFLAGS = r -RANLIB = /usr/bin/ranlib -# -# ---------------------------------------------------------------------- diff --git a/hpl/setup/Make.HPUX_FBLAS b/hpl/setup/Make.HPUX_FBLAS deleted file mode 100644 index 322ebe965fe8b579d401fed36e2690fcb3416d35..0000000000000000000000000000000000000000 --- a/hpl/setup/Make.HPUX_FBLAS +++ /dev/null @@ -1,179 +0,0 @@ -# -# -- High Performance Computing Linpack Benchmark (HPL) -# HPL - 2.1 - October 26, 2012 -# Antoine P. 
Petitet -# University of Tennessee, Knoxville -# Innovative Computing Laboratory -# (C) Copyright 2000-2008 All Rights Reserved -# -# -- Copyright notice and Licensing terms: -# -# Redistribution and use in source and binary forms, with or without -# modification, are permitted provided that the following conditions -# are met: -# -# 1. Redistributions of source code must retain the above copyright -# notice, this list of conditions and the following disclaimer. -# -# 2. Redistributions in binary form must reproduce the above copyright -# notice, this list of conditions, and the following disclaimer in the -# documentation and/or other materials provided with the distribution. -# -# 3. All advertising materials mentioning features or use of this -# software must display the following acknowledgement: -# This product includes software developed at the University of -# Tennessee, Knoxville, Innovative Computing Laboratory. -# -# 4. The name of the University, the name of the Laboratory, or the -# names of its contributors may not be used to endorse or promote -# products derived from this software without specific written -# permission. -# -# -- Disclaimer: -# -# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS -# ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR -# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY -# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, -# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT -# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, -# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY -# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT -# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
-# ###################################################################### -# -# ---------------------------------------------------------------------- -# - shell -------------------------------------------------------------- -# ---------------------------------------------------------------------- -# -SHELL = /bin/sh -# -CD = cd -CP = cp -LN_S = ln -s -MKDIR = mkdir -RM = /bin/rm -f -TOUCH = touch -# -# ---------------------------------------------------------------------- -# - Platform identifier ------------------------------------------------ -# ---------------------------------------------------------------------- -# -ARCH = HPUX -# -# ---------------------------------------------------------------------- -# - HPL Directory Structure / HPL library ------------------------------ -# ---------------------------------------------------------------------- -# -TOPdir = $(HOME)/hpl -INCdir = $(TOPdir)/include -BINdir = $(TOPdir)/bin/$(ARCH) -LIBdir = $(TOPdir)/lib/$(ARCH) -# -HPLlib = $(LIBdir)/libhpl.a -# -# ---------------------------------------------------------------------- -# - MPI directories - library ------------------------------------------ -# ---------------------------------------------------------------------- -# MPIinc tells the C compiler where to find the MPI header files, MPIlib -# is defined to be the name of the MPI library to be used. The variables -# MPIdir and MPIplat are only used for defining MPIinc and MPIlib). 
-# -MPIdir = $(HOME)/local/mpi -MPIplat = $(MPIdir)/hpux/ch_p4 -# -MPIinc = -I$(MPIdir)/include -I$(MPIplat)/include -MPIlib = $(MPIplat)/lib/libmpich.a -# -# ---------------------------------------------------------------------- -# - BLAS library ------------------------------------------------------- -# ---------------------------------------------------------------------- -# -BLASlib = /usr/lib/pa1.1/libblas.a -# -# ---------------------------------------------------------------------- -# - F77 / C interface -------------------------------------------------- -# ---------------------------------------------------------------------- -# You can skip this section if and only if you are not planning to use -# a BLAS library featuring a Fortran 77 interface. Otherwise, it is -# necessary to fill out the F2CDEFS variable with the appropriate -# options. **One and only one** option should be chosen in **each** of -# the 3 following categories: -# -# 1) name space (How C calls a Fortran 77 routine) -# -# -DAdd_ : all lower case and a suffixed underscore (Suns, -# Intel, ...), [default] -# -DNoChange : all lower case (IBM RS6000), -# -DUpCase : all upper case (Cray), -# -DAdd__ : the FORTRAN compiler in use is f2c. -# -# 2) C and Fortran 77 integer mapping -# -# -DF77_INTEGER=int : Fortran 77 INTEGER is a C int, [default] -# -DF77_INTEGER=long : Fortran 77 INTEGER is a C long, -# -DF77_INTEGER=short : Fortran 77 INTEGER is a C short. 
-# -# 3) Fortran 77 string handling -# -# -DStringSunStyle : The string address is passed at the string loca- -# tion on the stack, and the string length is then -# passed as an F77_INTEGER after all explicit -# stack arguments, [default] -# -DStringStructPtr : The address of a structure is passed by a -# Fortran 77 string, and the structure is of the -# form: struct {char *cp; F77_INTEGER len;}, -# -DStringStructVal : A structure is passed by value for each Fortran -# 77 string, and the structure is of the form: -# struct {char *cp; F77_INTEGER len;}, -# -DStringCrayStyle : Special option for Cray machines, which uses -# Cray fcd (fortran character descriptor) for -# interoperation. -# -F2CDEFS = -DNoChange -DF77_INTEGER=int -DStringSunStyle -# -# ---------------------------------------------------------------------- -# - HPL includes / libraries / specifics ------------------------------- -# ---------------------------------------------------------------------- -# -HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(MPIinc) -HPL_LIBS = $(HPLlib) $(BLASlib) -# -# - Compile time options ----------------------------------------------- -# -# -DHPL_COPY_L force the copy of the panel L before bcast; -# -DHPL_CALL_CBLAS call the cblas interface; -# -DHPL_DETAILED_TIMING enable detailed timers; -# -# By default HPL will: -# *) not copy L before broadcast, -# *) call the BLAS F77 interface, -# *) not display detailed timing information. 
-# -HPL_OPTS = -# -# ---------------------------------------------------------------------- -# -HPL_DEFS = $(HPL_INCLUDES) $(F2CDEFS) $(HPL_OPTS) -# -# ---------------------------------------------------------------------- -# - Compilers / linkers - Optimization flags --------------------------- -# ---------------------------------------------------------------------- -# -CC = cc -CCNOOPT = $(HPL_DEFS) -CCFLAGS = $(HPL_DEFS) -D_INCLUDE_POSIX_SOURCE -DUseTimes -Aa +O4 -# -# On some platforms, it is necessary to use the Fortran linker to find -# the Fortran internals used in the BLAS library. -# -LINKER = cc -LINKFLAGS = -Aa -# -ARCHIVER = ar -ARFLAGS = r -RANLIB = echo -# -# ---------------------------------------------------------------------- diff --git a/hpl/setup/Make.I860_FBLAS b/hpl/setup/Make.I860_FBLAS deleted file mode 100644 index 28cda4d7b2d78dfeb3093b4777731692dc3b9719..0000000000000000000000000000000000000000 --- a/hpl/setup/Make.I860_FBLAS +++ /dev/null @@ -1,180 +0,0 @@ -# -# -- High Performance Computing Linpack Benchmark (HPL) -# HPL - 2.1 - October 26, 2012 -# Antoine P. Petitet -# University of Tennessee, Knoxville -# Innovative Computing Laboratory -# (C) Copyright 2000-2008 All Rights Reserved -# -# -- Copyright notice and Licensing terms: -# -# Redistribution and use in source and binary forms, with or without -# modification, are permitted provided that the following conditions -# are met: -# -# 1. Redistributions of source code must retain the above copyright -# notice, this list of conditions and the following disclaimer. -# -# 2. Redistributions in binary form must reproduce the above copyright -# notice, this list of conditions, and the following disclaimer in the -# documentation and/or other materials provided with the distribution. -# -# 3. 
All advertising materials mentioning features or use of this -# software must display the following acknowledgement: -# This product includes software developed at the University of -# Tennessee, Knoxville, Innovative Computing Laboratory. -# -# 4. The name of the University, the name of the Laboratory, or the -# names of its contributors may not be used to endorse or promote -# products derived from this software without specific written -# permission. -# -# -- Disclaimer: -# -# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS -# ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR -# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY -# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, -# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT -# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, -# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY -# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT -# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
-# ###################################################################### -# -# ---------------------------------------------------------------------- -# - shell -------------------------------------------------------------- -# ---------------------------------------------------------------------- -# -SHELL = /bin/sh -# -CD = cd -CP = cp -LN_S = ln -s -MKDIR = mkdir -RM = /bin/rm -f -TOUCH = touch -# -# ---------------------------------------------------------------------- -# - Platform identifier ------------------------------------------------ -# ---------------------------------------------------------------------- -# -ARCH = I860_FBLAS -# -# ---------------------------------------------------------------------- -# - HPL Directory Structure / HPL library ------------------------------ -# ---------------------------------------------------------------------- -# -TOPdir = $(HOME)/hpl -INCdir = $(TOPdir)/include -BINdir = $(TOPdir)/bin/$(ARCH) -LIBdir = $(TOPdir)/lib/$(ARCH) -# -HPLlib = $(LIBdir)/libhpl.a -# -# ---------------------------------------------------------------------- -# - Message Passing library (MPI) -------------------------------------- -# ---------------------------------------------------------------------- -# MPinc tells the C compiler where to find the Message Passing library -# header files, MPlib is defined to be the name of the library to be -# used. The variable MPdir is only used for defining MPinc and MPlib. -# -MPdir = -MPinc = -MPlib = -lmpi -# -# ---------------------------------------------------------------------- -# - Linear Algebra library (BLAS or VSIPL) ----------------------------- -# ---------------------------------------------------------------------- -# LAinc tells the C compiler where to find the Linear Algebra library -# header files, LAlib is defined to be the name of the library to be -# used. The variable LAdir is only used for defining LAinc and LAlib. 
-# -LAdir = -LAinc = -LAlib = -lkmath -# -# ---------------------------------------------------------------------- -# - F77 / C interface -------------------------------------------------- -# ---------------------------------------------------------------------- -# You can skip this section if and only if you are not planning to use -# a BLAS library featuring a Fortran 77 interface. Otherwise, it is -# necessary to fill out the F2CDEFS variable with the appropriate -# options. **One and only one** option should be chosen in **each** of -# the 3 following categories: -# -# 1) name space (How C calls a Fortran 77 routine) -# -# -DAdd_ : all lower case and a suffixed underscore (Suns, -# Intel, ...), [default] -# -DNoChange : all lower case (IBM RS6000), -# -DUpCase : all upper case (Cray), -# -DAdd__ : the FORTRAN compiler in use is f2c. -# -# 2) C and Fortran 77 integer mapping -# -# -DF77_INTEGER=int : Fortran 77 INTEGER is a C int, [default] -# -DF77_INTEGER=long : Fortran 77 INTEGER is a C long, -# -DF77_INTEGER=short : Fortran 77 INTEGER is a C short. -# -# 3) Fortran 77 string handling -# -# -DStringSunStyle : The string address is passed at the string loca- -# tion on the stack, and the string length is then -# passed as an F77_INTEGER after all explicit -# stack arguments, [default] -# -DStringStructPtr : The address of a structure is passed by a -# Fortran 77 string, and the structure is of the -# form: struct {char *cp; F77_INTEGER len;}, -# -DStringStructVal : A structure is passed by value for each Fortran -# 77 string, and the structure is of the form: -# struct {char *cp; F77_INTEGER len;}, -# -DStringCrayStyle : Special option for Cray machines, which uses -# Cray fcd (fortran character descriptor) for -# interoperation. 
-# -F2CDEFS = -DAdd_ -DF77_INTEGER=int -DStringSunStyle -# -# ---------------------------------------------------------------------- -# - HPL includes / libraries / specifics ------------------------------- -# ---------------------------------------------------------------------- -# -HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc) -HPL_LIBS = $(HPLlib) $(LAlib) $(MPlib) -# -# - Compile time options ----------------------------------------------- -# -# -DHPL_COPY_L force the copy of the panel L before bcast; -# -DHPL_CALL_CBLAS call the cblas interface; -# -DHPL_CALL_VSIPL call the vsip library; -# -DHPL_DETAILED_TIMING enable detailed timers; -# -# By default HPL will: -# *) not copy L before broadcast, -# *) call the BLAS Fortran 77 interface, -# *) not display detailed timing information. -# -HPL_OPTS = -# -# ---------------------------------------------------------------------- -# -HPL_DEFS = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES) -# -# ---------------------------------------------------------------------- -# - Compilers / linkers - Optimization flags --------------------------- -# ---------------------------------------------------------------------- -# -CC = cc -CCNOOPT = $(HPL_DEFS) -nx -CCFLAGS = $(HPL_DEFS) -O4 -nx -# -LINKER = f77 -LINKFLAGS = $(CCFLAGS) -# -ARCHIVER = ar -ARFLAGS = r -RANLIB = echo -# -# ---------------------------------------------------------------------- diff --git a/hpl/setup/Make.IRIX_FBLAS b/hpl/setup/Make.IRIX_FBLAS deleted file mode 100644 index 47f6aa230b6f3b5ffca67e6a4df4d8ae665ea9aa..0000000000000000000000000000000000000000 --- a/hpl/setup/Make.IRIX_FBLAS +++ /dev/null @@ -1,181 +0,0 @@ -# -# -- High Performance Computing Linpack Benchmark (HPL) -# HPL - 2.1 - October 26, 2012 -# Antoine P. 
Petitet -# University of Tennessee, Knoxville -# Innovative Computing Laboratory -# (C) Copyright 2000-2008 All Rights Reserved -# -# -- Copyright notice and Licensing terms: -# -# Redistribution and use in source and binary forms, with or without -# modification, are permitted provided that the following conditions -# are met: -# -# 1. Redistributions of source code must retain the above copyright -# notice, this list of conditions and the following disclaimer. -# -# 2. Redistributions in binary form must reproduce the above copyright -# notice, this list of conditions, and the following disclaimer in the -# documentation and/or other materials provided with the distribution. -# -# 3. All advertising materials mentioning features or use of this -# software must display the following acknowledgement: -# This product includes software developed at the University of -# Tennessee, Knoxville, Innovative Computing Laboratory. -# -# 4. The name of the University, the name of the Laboratory, or the -# names of its contributors may not be used to endorse or promote -# products derived from this software without specific written -# permission. -# -# -- Disclaimer: -# -# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS -# ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR -# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY -# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, -# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT -# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, -# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY -# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT -# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
-# ###################################################################### -# -# ---------------------------------------------------------------------- -# - shell -------------------------------------------------------------- -# ---------------------------------------------------------------------- -# -SHELL = /bin/sh -# -CD = cd -CP = cp -LN_S = ln -s -MKDIR = mkdir -RM = /bin/rm -f -TOUCH = touch -# -# ---------------------------------------------------------------------- -# - Platform identifier ------------------------------------------------ -# ---------------------------------------------------------------------- -# -ARCH = IRIX_FBLAS -# -# ---------------------------------------------------------------------- -# - HPL Directory Structure / HPL library ------------------------------ -# ---------------------------------------------------------------------- -# -TOPdir = $(HOME)/hpl -INCdir = $(TOPdir)/include -BINdir = $(TOPdir)/bin/$(ARCH) -LIBdir = $(TOPdir)/lib/$(ARCH) -# -HPLlib = $(LIBdir)/libhpl.a -# -# ---------------------------------------------------------------------- -# - Message Passing library (MPI) -------------------------------------- -# ---------------------------------------------------------------------- -# MPinc tells the C compiler where to find the Message Passing library -# header files, MPlib is defined to be the name of the library to be -# used. The variable MPdir is only used for defining MPinc and MPlib. -# -MPdir = $(HOME)/local/mpi -MPinc = -I$(MPdir)/include -I$(MPdir)/IRIX64/ch_p4/include -MPlib = $(MPdir)/IRIX64/ch_p4/lib/libmpich.a -# -# ---------------------------------------------------------------------- -# - Linear Algebra library (BLAS or VSIPL) ----------------------------- -# ---------------------------------------------------------------------- -# LAinc tells the C compiler where to find the Linear Algebra library -# header files, LAlib is defined to be the name of the library to be -# used. 
The variable LAdir is only used for defining LAinc and LAlib. -# -LAdir = -LAinc = -LAlib = -lblas -# -# ---------------------------------------------------------------------- -# - F77 / C interface -------------------------------------------------- -# ---------------------------------------------------------------------- -# You can skip this section if and only if you are not planning to use -# a BLAS library featuring a Fortran 77 interface. Otherwise, it is -# necessary to fill out the F2CDEFS variable with the appropriate -# options. **One and only one** option should be chosen in **each** of -# the 3 following categories: -# -# 1) name space (How C calls a Fortran 77 routine) -# -# -DAdd_ : all lower case and a suffixed underscore (Suns, -# Intel, ...), [default] -# -DNoChange : all lower case (IBM RS6000), -# -DUpCase : all upper case (Cray), -# -DAdd__ : the FORTRAN compiler in use is f2c. -# -# 2) C and Fortran 77 integer mapping -# -# -DF77_INTEGER=int : Fortran 77 INTEGER is a C int, [default] -# -DF77_INTEGER=long : Fortran 77 INTEGER is a C long, -# -DF77_INTEGER=short : Fortran 77 INTEGER is a C short. -# -# 3) Fortran 77 string handling -# -# -DStringSunStyle : The string address is passed at the string loca- -# tion on the stack, and the string length is then -# passed as an F77_INTEGER after all explicit -# stack arguments, [default] -# -DStringStructPtr : The address of a structure is passed by a -# Fortran 77 string, and the structure is of the -# form: struct {char *cp; F77_INTEGER len;}, -# -DStringStructVal : A structure is passed by value for each Fortran -# 77 string, and the structure is of the form: -# struct {char *cp; F77_INTEGER len;}, -# -DStringCrayStyle : Special option for Cray machines, which uses -# Cray fcd (fortran character descriptor) for -# interoperation. 
-# -F2CDEFS = -DAdd_ -DStringSunStyle -DF77_INTEGER=int -# -# ---------------------------------------------------------------------- -# - HPL includes / libraries / specifics ------------------------------- -# ---------------------------------------------------------------------- -# -HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc) -HPL_LIBS = $(HPLlib) $(LAlib) $(MPlib) -# -# - Compile time options ----------------------------------------------- -# -# -DHPL_COPY_L force the copy of the panel L before bcast; -# -DHPL_CALL_CBLAS call the cblas interface; -# -DHPL_CALL_VSIPL call the vsip library; -# -DHPL_DETAILED_TIMING enable detailed timers; -# -# By default HPL will: -# *) not copy L before broadcast, -# *) call the BLAS Fortran 77 interface, -# *) not display detailed timing information. -# -HPL_OPTS = -# -# ---------------------------------------------------------------------- -# -HPL_DEFS = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES) -# -# ---------------------------------------------------------------------- -# - Compilers / linkers - Optimization flags --------------------------- -# ---------------------------------------------------------------------- -# -CC = cc -CCNOOPT = $(HPL_DEFS) -64 -CCFLAGS = $(HPL_DEFS) -O3 -64 -OPT:Olimit=15000 -TARG:platform=IP30 \ - -LNO:blocking=OFF -LOPT:alias=typed -# -LINKER = cc -LINKFLAGS = $(CCFLAGS) -# -ARCHIVER = ar -ARFLAGS = r -RANLIB = echo -# -# ---------------------------------------------------------------------- diff --git a/hpl/setup/Make.Linux_ATHLON_CBLAS b/hpl/setup/Make.Linux_ATHLON_CBLAS deleted file mode 100644 index 562d0f2e97ac48be6f001c3870f87ede056da251..0000000000000000000000000000000000000000 --- a/hpl/setup/Make.Linux_ATHLON_CBLAS +++ /dev/null @@ -1,180 +0,0 @@ -# -# -- High Performance Computing Linpack Benchmark (HPL) -# HPL - 2.1 - October 26, 2012 -# Antoine P. 
Petitet -# University of Tennessee, Knoxville -# Innovative Computing Laboratory -# (C) Copyright 2000-2008 All Rights Reserved -# -# -- Copyright notice and Licensing terms: -# -# Redistribution and use in source and binary forms, with or without -# modification, are permitted provided that the following conditions -# are met: -# -# 1. Redistributions of source code must retain the above copyright -# notice, this list of conditions and the following disclaimer. -# -# 2. Redistributions in binary form must reproduce the above copyright -# notice, this list of conditions, and the following disclaimer in the -# documentation and/or other materials provided with the distribution. -# -# 3. All advertising materials mentioning features or use of this -# software must display the following acknowledgement: -# This product includes software developed at the University of -# Tennessee, Knoxville, Innovative Computing Laboratory. -# -# 4. The name of the University, the name of the Laboratory, or the -# names of its contributors may not be used to endorse or promote -# products derived from this software without specific written -# permission. -# -# -- Disclaimer: -# -# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS -# ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR -# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY -# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, -# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT -# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, -# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY -# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT -# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
-# ###################################################################### -# -# ---------------------------------------------------------------------- -# - shell -------------------------------------------------------------- -# ---------------------------------------------------------------------- -# -SHELL = /bin/sh -# -CD = cd -CP = cp -LN_S = ln -s -MKDIR = mkdir -RM = /bin/rm -f -TOUCH = touch -# -# ---------------------------------------------------------------------- -# - Platform identifier ------------------------------------------------ -# ---------------------------------------------------------------------- -# -ARCH = Linux_ATHLON_CBLAS -# -# ---------------------------------------------------------------------- -# - HPL Directory Structure / HPL library ------------------------------ -# ---------------------------------------------------------------------- -# -TOPdir = $(HOME)/hpl -INCdir = $(TOPdir)/include -BINdir = $(TOPdir)/bin/$(ARCH) -LIBdir = $(TOPdir)/lib/$(ARCH) -# -HPLlib = $(LIBdir)/libhpl.a -# -# ---------------------------------------------------------------------- -# - MPI directories - library ------------------------------------------ -# ---------------------------------------------------------------------- -# MPinc tells the C compiler where to find the Message Passing library -# header files, MPlib is defined to be the name of the library to be -# used. The variable MPdir is only used for defining MPinc and MPlib. -# -MPdir = /usr/local/mpi -MPinc = -I$(MPdir)/include -MPlib = $(MPdir)/lib/libmpich.a -# -# ---------------------------------------------------------------------- -# - Linear Algebra library (BLAS or VSIPL) ----------------------------- -# ---------------------------------------------------------------------- -# LAinc tells the C compiler where to find the Linear Algebra library -# header files, LAlib is defined to be the name of the library to be -# used. The variable LAdir is only used for defining LAinc and LAlib. 
-# -LAdir = $(HOME)/netlib/ARCHIVES/Linux_ATHLON -LAinc = -LAlib = $(LAdir)/libcblas.a $(LAdir)/libatlas.a -# -# ---------------------------------------------------------------------- -# - F77 / C interface -------------------------------------------------- -# ---------------------------------------------------------------------- -# You can skip this section if and only if you are not planning to use -# a BLAS library featuring a Fortran 77 interface. Otherwise, it is -# necessary to fill out the F2CDEFS variable with the appropriate -# options. **One and only one** option should be chosen in **each** of -# the 3 following categories: -# -# 1) name space (How C calls a Fortran 77 routine) -# -# -DAdd_ : all lower case and a suffixed underscore (Suns, -# Intel, ...), [default] -# -DNoChange : all lower case (IBM RS6000), -# -DUpCase : all upper case (Cray), -# -DAdd__ : the FORTRAN compiler in use is f2c. -# -# 2) C and Fortran 77 integer mapping -# -# -DF77_INTEGER=int : Fortran 77 INTEGER is a C int, [default] -# -DF77_INTEGER=long : Fortran 77 INTEGER is a C long, -# -DF77_INTEGER=short : Fortran 77 INTEGER is a C short. -# -# 3) Fortran 77 string handling -# -# -DStringSunStyle : The string address is passed at the string loca- -# tion on the stack, and the string length is then -# passed as an F77_INTEGER after all explicit -# stack arguments, [default] -# -DStringStructPtr : The address of a structure is passed by a -# Fortran 77 string, and the structure is of the -# form: struct {char *cp; F77_INTEGER len;}, -# -DStringStructVal : A structure is passed by value for each Fortran -# 77 string, and the structure is of the form: -# struct {char *cp; F77_INTEGER len;}, -# -DStringCrayStyle : Special option for Cray machines, which uses -# Cray fcd (fortran character descriptor) for -# interoperation. 
-# -F2CDEFS = -# -# ---------------------------------------------------------------------- -# - HPL includes / libraries / specifics ------------------------------- -# ---------------------------------------------------------------------- -# -HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc) -HPL_LIBS = $(HPLlib) $(LAlib) $(MPlib) -# -# - Compile time options ----------------------------------------------- -# -# -DHPL_COPY_L force the copy of the panel L before bcast; -# -DHPL_CALL_CBLAS call the cblas interface; -# -DHPL_CALL_VSIPL call the vsip library; -# -DHPL_DETAILED_TIMING enable detailed timers; -# -# By default HPL will: -# *) not copy L before broadcast, -# *) call the Fortran 77 BLAS interface -# *) not display detailed timing information. -# -HPL_OPTS = -DHPL_CALL_CBLAS -# -# ---------------------------------------------------------------------- -# -HPL_DEFS = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES) -# -# ---------------------------------------------------------------------- -# - Compilers / linkers - Optimization flags --------------------------- -# ---------------------------------------------------------------------- -# -CC = /usr/bin/gcc -CCNOOPT = $(HPL_DEFS) -CCFLAGS = $(HPL_DEFS) -fomit-frame-pointer -O3 -funroll-loops -W -Wall -# -LINKER = /usr/bin/gcc -LINKFLAGS = $(CCFLAGS) -# -ARCHIVER = ar -ARFLAGS = r -RANLIB = echo -# -# ---------------------------------------------------------------------- diff --git a/hpl/setup/Make.Linux_ATHLON_FBLAS b/hpl/setup/Make.Linux_ATHLON_FBLAS deleted file mode 100644 index d3b3fdab1b4c960497b3d85d1aee1f7dbca321b7..0000000000000000000000000000000000000000 --- a/hpl/setup/Make.Linux_ATHLON_FBLAS +++ /dev/null @@ -1,180 +0,0 @@ -# -# -- High Performance Computing Linpack Benchmark (HPL) -# HPL - 2.1 - October 26, 2012 -# Antoine P. 
Petitet -# University of Tennessee, Knoxville -# Innovative Computing Laboratory -# (C) Copyright 2000-2008 All Rights Reserved -# -# -- Copyright notice and Licensing terms: -# -# Redistribution and use in source and binary forms, with or without -# modification, are permitted provided that the following conditions -# are met: -# -# 1. Redistributions of source code must retain the above copyright -# notice, this list of conditions and the following disclaimer. -# -# 2. Redistributions in binary form must reproduce the above copyright -# notice, this list of conditions, and the following disclaimer in the -# documentation and/or other materials provided with the distribution. -# -# 3. All advertising materials mentioning features or use of this -# software must display the following acknowledgement: -# This product includes software developed at the University of -# Tennessee, Knoxville, Innovative Computing Laboratory. -# -# 4. The name of the University, the name of the Laboratory, or the -# names of its contributors may not be used to endorse or promote -# products derived from this software without specific written -# permission. -# -# -- Disclaimer: -# -# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS -# ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR -# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY -# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, -# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT -# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, -# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY -# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT -# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
-# ###################################################################### -# -# ---------------------------------------------------------------------- -# - shell -------------------------------------------------------------- -# ---------------------------------------------------------------------- -# -SHELL = /bin/sh -# -CD = cd -CP = cp -LN_S = ln -s -MKDIR = mkdir -RM = /bin/rm -f -TOUCH = touch -# -# ---------------------------------------------------------------------- -# - Platform identifier ------------------------------------------------ -# ---------------------------------------------------------------------- -# -ARCH = Linux_ATHLON_FBLAS -# -# ---------------------------------------------------------------------- -# - HPL Directory Structure / HPL library ------------------------------ -# ---------------------------------------------------------------------- -# -TOPdir = $(HOME)/hpl -INCdir = $(TOPdir)/include -BINdir = $(TOPdir)/bin/$(ARCH) -LIBdir = $(TOPdir)/lib/$(ARCH) -# -HPLlib = $(LIBdir)/libhpl.a -# -# ---------------------------------------------------------------------- -# - Message Passing library (MPI) -------------------------------------- -# ---------------------------------------------------------------------- -# MPinc tells the C compiler where to find the Message Passing library -# header files, MPlib is defined to be the name of the library to be -# used. The variable MPdir is only used for defining MPinc and MPlib. -# -MPdir = /usr/local/mpi -MPinc = -I$(MPdir)/include -MPlib = $(MPdir)/lib/libmpich.a -# -# ---------------------------------------------------------------------- -# - Linear Algebra library (BLAS or VSIPL) ----------------------------- -# ---------------------------------------------------------------------- -# LAinc tells the C compiler where to find the Linear Algebra library -# header files, LAlib is defined to be the name of the library to be -# used. The variable LAdir is only used for defining LAinc and LAlib. 
-# -LAdir = $(HOME)/netlib/ARCHIVES/Linux_ATHLON -LAinc = -LAlib = $(LAdir)/libf77blas.a $(LAdir)/libatlas.a -# -# ---------------------------------------------------------------------- -# - F77 / C interface -------------------------------------------------- -# ---------------------------------------------------------------------- -# You can skip this section if and only if you are not planning to use -# a BLAS library featuring a Fortran 77 interface. Otherwise, it is -# necessary to fill out the F2CDEFS variable with the appropriate -# options. **One and only one** option should be chosen in **each** of -# the 3 following categories: -# -# 1) name space (How C calls a Fortran 77 routine) -# -# -DAdd_ : all lower case and a suffixed underscore (Suns, -# Intel, ...), [default] -# -DNoChange : all lower case (IBM RS6000), -# -DUpCase : all upper case (Cray), -# -DAdd__ : the FORTRAN compiler in use is f2c. -# -# 2) C and Fortran 77 integer mapping -# -# -DF77_INTEGER=int : Fortran 77 INTEGER is a C int, [default] -# -DF77_INTEGER=long : Fortran 77 INTEGER is a C long, -# -DF77_INTEGER=short : Fortran 77 INTEGER is a C short. -# -# 3) Fortran 77 string handling -# -# -DStringSunStyle : The string address is passed at the string loca- -# tion on the stack, and the string length is then -# passed as an F77_INTEGER after all explicit -# stack arguments, [default] -# -DStringStructPtr : The address of a structure is passed by a -# Fortran 77 string, and the structure is of the -# form: struct {char *cp; F77_INTEGER len;}, -# -DStringStructVal : A structure is passed by value for each Fortran -# 77 string, and the structure is of the form: -# struct {char *cp; F77_INTEGER len;}, -# -DStringCrayStyle : Special option for Cray machines, which uses -# Cray fcd (fortran character descriptor) for -# interoperation. 
-# -F2CDEFS = -DAdd__ -DF77_INTEGER=int -DStringSunStyle -# -# ---------------------------------------------------------------------- -# - HPL includes / libraries / specifics ------------------------------- -# ---------------------------------------------------------------------- -# -HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc) -HPL_LIBS = $(HPLlib) $(LAlib) $(MPlib) -# -# - Compile time options ----------------------------------------------- -# -# -DHPL_COPY_L force the copy of the panel L before bcast; -# -DHPL_CALL_CBLAS call the cblas interface; -# -DHPL_CALL_VSIPL call the vsip library; -# -DHPL_DETAILED_TIMING enable detailed timers; -# -# By default HPL will: -# *) not copy L before broadcast, -# *) call the BLAS Fortran 77 interface, -# *) not display detailed timing information. -# -HPL_OPTS = -# -# ---------------------------------------------------------------------- -# -HPL_DEFS = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES) -# -# ---------------------------------------------------------------------- -# - Compilers / linkers - Optimization flags --------------------------- -# ---------------------------------------------------------------------- -# -CC = /usr/bin/gcc -CCNOOPT = $(HPL_DEFS) -CCFLAGS = $(HPL_DEFS) -fomit-frame-pointer -O3 -funroll-loops -W -Wall -# -LINKER = /usr/bin/g77 -LINKFLAGS = $(CCFLAGS) -# -ARCHIVER = ar -ARFLAGS = r -RANLIB = echo -# -# ---------------------------------------------------------------------- diff --git a/hpl/setup/Make.Linux_ATHLON_VSIPL b/hpl/setup/Make.Linux_ATHLON_VSIPL deleted file mode 100644 index 43941c5629728d8a01234289b00d96e67d8ecc74..0000000000000000000000000000000000000000 --- a/hpl/setup/Make.Linux_ATHLON_VSIPL +++ /dev/null @@ -1,180 +0,0 @@ -# -# -- High Performance Computing Linpack Benchmark (HPL) -# HPL - 2.1 - October 26, 2012 -# Antoine P. 
Petitet -# University of Tennessee, Knoxville -# Innovative Computing Laboratory -# (C) Copyright 2000-2008 All Rights Reserved -# -# -- Copyright notice and Licensing terms: -# -# Redistribution and use in source and binary forms, with or without -# modification, are permitted provided that the following conditions -# are met: -# -# 1. Redistributions of source code must retain the above copyright -# notice, this list of conditions and the following disclaimer. -# -# 2. Redistributions in binary form must reproduce the above copyright -# notice, this list of conditions, and the following disclaimer in the -# documentation and/or other materials provided with the distribution. -# -# 3. All advertising materials mentioning features or use of this -# software must display the following acknowledgement: -# This product includes software developed at the University of -# Tennessee, Knoxville, Innovative Computing Laboratory. -# -# 4. The name of the University, the name of the Laboratory, or the -# names of its contributors may not be used to endorse or promote -# products derived from this software without specific written -# permission. -# -# -- Disclaimer: -# -# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS -# ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR -# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY -# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, -# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT -# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, -# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY -# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT -# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
-# ###################################################################### -# -# ---------------------------------------------------------------------- -# - shell -------------------------------------------------------------- -# ---------------------------------------------------------------------- -# -SHELL = /bin/sh -# -CD = cd -CP = cp -LN_S = ln -s -MKDIR = mkdir -RM = /bin/rm -f -TOUCH = touch -# -# ---------------------------------------------------------------------- -# - Platform identifier ------------------------------------------------ -# ---------------------------------------------------------------------- -# -ARCH = Linux_ATHLON_VSIPL -# -# ---------------------------------------------------------------------- -# - HPL Directory Structure / HPL library ------------------------------ -# ---------------------------------------------------------------------- -# -TOPdir = $(HOME)/hpl -INCdir = $(TOPdir)/include -BINdir = $(TOPdir)/bin/$(ARCH) -LIBdir = $(TOPdir)/lib/$(ARCH) -# -HPLlib = $(LIBdir)/libhpl.a -# -# ---------------------------------------------------------------------- -# - MPI directories - library ------------------------------------------ -# ---------------------------------------------------------------------- -# MPinc tells the C compiler where to find the Message Passing library -# header files, MPlib is defined to be the name of the library to be -# used. The variable MPdir is only used for defining MPinc and MPlib. -# -MPdir = /usr/local/mpi -MPinc = -I$(MPdir)/include -MPlib = $(MPdir)/lib/libmpich.a -# -# ---------------------------------------------------------------------- -# - Linear Algebra library (BLAS or VSIPL) ----------------------------- -# ---------------------------------------------------------------------- -# LAinc tells the C compiler where to find the Linear Algebra library -# header files, LAlib is defined to be the name of the library to be -# used. The variable LAdir is only used for defining LAinc and LAlib. 
-# -LAdir = /home/software/TASP_VSIPL_Core_Plus -LAinc = -I$(LAdir)/include -LAlib = $(LAdir)/lib/libvsip_c.a -# -# ---------------------------------------------------------------------- -# - F77 / C interface -------------------------------------------------- -# ---------------------------------------------------------------------- -# You can skip this section if and only if you are not planning to use -# a BLAS library featuring a Fortran 77 interface. Otherwise, it is -# necessary to fill out the F2CDEFS variable with the appropriate -# options. **One and only one** option should be chosen in **each** of -# the 3 following categories: -# -# 1) name space (How C calls a Fortran 77 routine) -# -# -DAdd_ : all lower case and a suffixed underscore (Suns, -# Intel, ...), [default] -# -DNoChange : all lower case (IBM RS6000), -# -DUpCase : all upper case (Cray), -# -DAdd__ : the FORTRAN compiler in use is f2c. -# -# 2) C and Fortran 77 integer mapping -# -# -DF77_INTEGER=int : Fortran 77 INTEGER is a C int, [default] -# -DF77_INTEGER=long : Fortran 77 INTEGER is a C long, -# -DF77_INTEGER=short : Fortran 77 INTEGER is a C short. -# -# 3) Fortran 77 string handling -# -# -DStringSunStyle : The string address is passed at the string loca- -# tion on the stack, and the string length is then -# passed as an F77_INTEGER after all explicit -# stack arguments, [default] -# -DStringStructPtr : The address of a structure is passed by a -# Fortran 77 string, and the structure is of the -# form: struct {char *cp; F77_INTEGER len;}, -# -DStringStructVal : A structure is passed by value for each Fortran -# 77 string, and the structure is of the form: -# struct {char *cp; F77_INTEGER len;}, -# -DStringCrayStyle : Special option for Cray machines, which uses -# Cray fcd (fortran character descriptor) for -# interoperation. 
-# -F2CDEFS = -# -# ---------------------------------------------------------------------- -# - HPL includes / libraries / specifics ------------------------------- -# ---------------------------------------------------------------------- -# -HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc) -HPL_LIBS = $(HPLlib) $(LAlib) $(MPlib) -# -# - Compile time options ----------------------------------------------- -# -# -DHPL_COPY_L force the copy of the panel L before bcast; -# -DHPL_CALL_CBLAS call the cblas interface; -# -DHPL_CALL_VSIPL call the vsip library; -# -DHPL_DETAILED_TIMING enable detailed timers; -# -# By default HPL will: -# *) not copy L before broadcast, -# *) call the Fortran 77 BLAS interface -# *) not display detailed timing information. -# -HPL_OPTS = -DHPL_CALL_VSIPL -# -# ---------------------------------------------------------------------- -# -HPL_DEFS = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES) -# -# ---------------------------------------------------------------------- -# - Compilers / linkers - Optimization flags --------------------------- -# ---------------------------------------------------------------------- -# -CC = /usr/bin/gcc -CCNOOPT = $(HPL_DEFS) -CCFLAGS = $(HPL_DEFS) -fomit-frame-pointer -O3 -funroll-loops -W -Wall -# -LINKER = /usr/bin/gcc -LINKFLAGS = $(CCFLAGS) -# -ARCHIVER = ar -ARFLAGS = r -RANLIB = echo -# -# ---------------------------------------------------------------------- diff --git a/hpl/setup/Make.Linux_PII_CBLAS b/hpl/setup/Make.Linux_PII_CBLAS deleted file mode 100644 index 93aa10f8e9db6cb2345c70cb889d9d6bbc2b8dec..0000000000000000000000000000000000000000 --- a/hpl/setup/Make.Linux_PII_CBLAS +++ /dev/null @@ -1,183 +0,0 @@ -# -# -- High Performance Computing Linpack Benchmark (HPL) -# HPL - 2.1 - October 26, 2012 -# Antoine P. 
Petitet -# University of Tennessee, Knoxville -# Innovative Computing Laboratory -# (C) Copyright 2000-2008 All Rights Reserved -# -# -- Copyright notice and Licensing terms: -# -# Redistribution and use in source and binary forms, with or without -# modification, are permitted provided that the following conditions -# are met: -# -# 1. Redistributions of source code must retain the above copyright -# notice, this list of conditions and the following disclaimer. -# -# 2. Redistributions in binary form must reproduce the above copyright -# notice, this list of conditions, and the following disclaimer in the -# documentation and/or other materials provided with the distribution. -# -# 3. All advertising materials mentioning features or use of this -# software must display the following acknowledgement: -# This product includes software developed at the University of -# Tennessee, Knoxville, Innovative Computing Laboratory. -# -# 4. The name of the University, the name of the Laboratory, or the -# names of its contributors may not be used to endorse or promote -# products derived from this software without specific written -# permission. -# -# -- Disclaimer: -# -# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS -# ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR -# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY -# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, -# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT -# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, -# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY -# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT -# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
-# ###################################################################### -# -# ---------------------------------------------------------------------- -# - shell -------------------------------------------------------------- -# ---------------------------------------------------------------------- -# -SHELL = /bin/sh -# -CD = cd -CP = cp -LN_S = ln -s -MKDIR = mkdir -RM = /bin/rm -f -TOUCH = touch -# -# ---------------------------------------------------------------------- -# - Platform identifier ------------------------------------------------ -# ---------------------------------------------------------------------- -# -ARCH = Linux_PII_CBLAS -# -# ---------------------------------------------------------------------- -# - HPL Directory Structure / HPL library ------------------------------ -# ---------------------------------------------------------------------- -# -TOPdir = $(HOME)/hpl -INCdir = $(TOPdir)/include -BINdir = $(TOPdir)/bin/$(ARCH) -LIBdir = $(TOPdir)/lib/$(ARCH) -# -HPLlib = $(LIBdir)/libhpl.a -# -# ---------------------------------------------------------------------- -# - Message Passing library (MPI) -------------------------------------- -# ---------------------------------------------------------------------- -# MPinc tells the C compiler where to find the Message Passing library -# header files, MPlib is defined to be the name of the library to be -# used. The variable MPdir is only used for defining MPinc and MPlib. -# -MPdir = /usr/local/mpi -MPinc = -I$(MPdir)/include -MPlib = $(MPdir)/lib/libmpich.a -# -# ---------------------------------------------------------------------- -# - Linear Algebra library (BLAS or VSIPL) ----------------------------- -# ---------------------------------------------------------------------- -# LAinc tells the C compiler where to find the Linear Algebra library -# header files, LAlib is defined to be the name of the library to be -# used. The variable LAdir is only used for defining LAinc and LAlib. 
-# -LAdir = $(HOME)/netlib/ARCHIVES/Linux_PII -LAinc = -LAlib = $(LAdir)/libcblas.a $(LAdir)/libatlas.a -# -# ---------------------------------------------------------------------- -# - F77 / C interface -------------------------------------------------- -# ---------------------------------------------------------------------- -# You can skip this section if and only if you are not planning to use -# a BLAS library featuring a Fortran 77 interface. Otherwise, it is -# necessary to fill out the F2CDEFS variable with the appropriate -# options. **One and only one** option should be chosen in **each** of -# the 3 following categories: -# -# 1) name space (How C calls a Fortran 77 routine) -# -# -DAdd_ : all lower case and a suffixed underscore (Suns, -# Intel, ...), [default] -# -DNoChange : all lower case (IBM RS6000), -# -DUpCase : all upper case (Cray), -# -DAdd__ : the FORTRAN compiler in use is f2c. -# -# 2) C and Fortran 77 integer mapping -# -# -DF77_INTEGER=int : Fortran 77 INTEGER is a C int, [default] -# -DF77_INTEGER=long : Fortran 77 INTEGER is a C long, -# -DF77_INTEGER=short : Fortran 77 INTEGER is a C short. -# -# 3) Fortran 77 string handling -# -# -DStringSunStyle : The string address is passed at the string loca- -# tion on the stack, and the string length is then -# passed as an F77_INTEGER after all explicit -# stack arguments, [default] -# -DStringStructPtr : The address of a structure is passed by a -# Fortran 77 string, and the structure is of the -# form: struct {char *cp; F77_INTEGER len;}, -# -DStringStructVal : A structure is passed by value for each Fortran -# 77 string, and the structure is of the form: -# struct {char *cp; F77_INTEGER len;}, -# -DStringCrayStyle : Special option for Cray machines, which uses -# Cray fcd (fortran character descriptor) for -# interoperation. 
-# -F2CDEFS = -# -# ---------------------------------------------------------------------- -# - HPL includes / libraries / specifics ------------------------------- -# ---------------------------------------------------------------------- -# -HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc) -HPL_LIBS = $(HPLlib) $(LAlib) $(MPlib) -# -# - Compile time options ----------------------------------------------- -# -# -DHPL_COPY_L force the copy of the panel L before bcast; -# -DHPL_CALL_CBLAS call the cblas interface; -# -DHPL_CALL_VSIPL call the vsip library; -# -DHPL_DETAILED_TIMING enable detailed timers; -# -# By default HPL will: -# *) not copy L before broadcast, -# *) call the BLAS Fortran 77 interface, -# *) not display detailed timing information. -# -HPL_OPTS = -DHPL_CALL_CBLAS -# -# ---------------------------------------------------------------------- -# -HPL_DEFS = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES) -# -# ---------------------------------------------------------------------- -# - Compilers / linkers - Optimization flags --------------------------- -# ---------------------------------------------------------------------- -# -CC = /usr/bin/gcc -CCNOOPT = $(HPL_DEFS) -CCFLAGS = $(HPL_DEFS) -fomit-frame-pointer -O3 -funroll-loops -# -# On some platforms, it is necessary to use the Fortran linker to find -# the Fortran internals used in the BLAS library. -# -LINKER = /usr/bin/g77 -LINKFLAGS = $(CCFLAGS) -# -ARCHIVER = ar -ARFLAGS = r -RANLIB = echo -# -# ---------------------------------------------------------------------- diff --git a/hpl/setup/Make.Linux_PII_CBLAS_gm b/hpl/setup/Make.Linux_PII_CBLAS_gm deleted file mode 100644 index b1c6401f6b75f2b45cfc3d9a8a9134c134030142..0000000000000000000000000000000000000000 --- a/hpl/setup/Make.Linux_PII_CBLAS_gm +++ /dev/null @@ -1,183 +0,0 @@ -# -# -- High Performance Computing Linpack Benchmark (HPL) -# HPL - 2.1 - October 26, 2012 -# Antoine P. 
Petitet -# University of Tennessee, Knoxville -# Innovative Computing Laboratory -# (C) Copyright 2000-2008 All Rights Reserved -# -# -- Copyright notice and Licensing terms: -# -# Redistribution and use in source and binary forms, with or without -# modification, are permitted provided that the following conditions -# are met: -# -# 1. Redistributions of source code must retain the above copyright -# notice, this list of conditions and the following disclaimer. -# -# 2. Redistributions in binary form must reproduce the above copyright -# notice, this list of conditions, and the following disclaimer in the -# documentation and/or other materials provided with the distribution. -# -# 3. All advertising materials mentioning features or use of this -# software must display the following acknowledgement: -# This product includes software developed at the University of -# Tennessee, Knoxville, Innovative Computing Laboratory. -# -# 4. The name of the University, the name of the Laboratory, or the -# names of its contributors may not be used to endorse or promote -# products derived from this software without specific written -# permission. -# -# -- Disclaimer: -# -# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS -# ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR -# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY -# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, -# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT -# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, -# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY -# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT -# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
-# ###################################################################### -# -# ---------------------------------------------------------------------- -# - shell -------------------------------------------------------------- -# ---------------------------------------------------------------------- -# -SHELL = /bin/sh -# -CD = cd -CP = cp -LN_S = ln -s -MKDIR = mkdir -RM = /bin/rm -f -TOUCH = touch -# -# ---------------------------------------------------------------------- -# - Platform identifier ------------------------------------------------ -# ---------------------------------------------------------------------- -# -ARCH = Linux_PII_CBLAS_gm -# -# ---------------------------------------------------------------------- -# - HPL Directory Structure / HPL library ------------------------------ -# ---------------------------------------------------------------------- -# -TOPdir = $(HOME)/hpl -INCdir = $(TOPdir)/include -BINdir = $(TOPdir)/bin/$(ARCH) -LIBdir = $(TOPdir)/lib/$(ARCH) -# -HPLlib = $(LIBdir)/libhpl.a -# -# ---------------------------------------------------------------------- -# - Message Passing library (MPI) -------------------------------------- -# ---------------------------------------------------------------------- -# MPinc tells the C compiler where to find the Message Passing library -# header files, MPlib is defined to be the name of the library to be -# used. The variable MPdir is only used for defining MPinc and MPlib. -# -MPdir = -MPinc = -MPlib = -# -# ---------------------------------------------------------------------- -# - Linear Algebra library (BLAS or VSIPL) ----------------------------- -# ---------------------------------------------------------------------- -# LAinc tells the C compiler where to find the Linear Algebra library -# header files, LAlib is defined to be the name of the library to be -# used. The variable LAdir is only used for defining LAinc and LAlib. 
-# -LAdir = $(HOME)/netlib/ARCHIVES/Linux_PII -LAinc = -LAlib = $(LAdir)/libcblas.a $(LAdir)/libatlas.a -# -# ---------------------------------------------------------------------- -# - F77 / C interface -------------------------------------------------- -# ---------------------------------------------------------------------- -# You can skip this section if and only if you are not planning to use -# a BLAS library featuring a Fortran 77 interface. Otherwise, it is -# necessary to fill out the F2CDEFS variable with the appropriate -# options. **One and only one** option should be chosen in **each** of -# the 3 following categories: -# -# 1) name space (How C calls a Fortran 77 routine) -# -# -DAdd_ : all lower case and a suffixed underscore (Suns, -# Intel, ...), [default] -# -DNoChange : all lower case (IBM RS6000), -# -DUpCase : all upper case (Cray), -# -DAdd__ : the FORTRAN compiler in use is f2c. -# -# 2) C and Fortran 77 integer mapping -# -# -DF77_INTEGER=int : Fortran 77 INTEGER is a C int, [default] -# -DF77_INTEGER=long : Fortran 77 INTEGER is a C long, -# -DF77_INTEGER=short : Fortran 77 INTEGER is a C short. -# -# 3) Fortran 77 string handling -# -# -DStringSunStyle : The string address is passed at the string loca- -# tion on the stack, and the string length is then -# passed as an F77_INTEGER after all explicit -# stack arguments, [default] -# -DStringStructPtr : The address of a structure is passed by a -# Fortran 77 string, and the structure is of the -# form: struct {char *cp; F77_INTEGER len;}, -# -DStringStructVal : A structure is passed by value for each Fortran -# 77 string, and the structure is of the form: -# struct {char *cp; F77_INTEGER len;}, -# -DStringCrayStyle : Special option for Cray machines, which uses -# Cray fcd (fortran character descriptor) for -# interoperation. 
-# -F2CDEFS = -# -# ---------------------------------------------------------------------- -# - HPL includes / libraries / specifics ------------------------------- -# ---------------------------------------------------------------------- -# -HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc) -HPL_LIBS = $(HPLlib) $(LAlib) $(MPlib) -# -# - Compile time options ----------------------------------------------- -# -# -DHPL_COPY_L force the copy of the panel L before bcast; -# -DHPL_CALL_CBLAS call the cblas interface; -# -DHPL_CALL_VSIPL call the vsip library; -# -DHPL_DETAILED_TIMING enable detailed timers; -# -# By default HPL will: -# *) not copy L before broadcast, -# *) call the BLAS Fortran 77 interface, -# *) not display detailed timing information. -# -HPL_OPTS = -DHPL_CALL_CBLAS -# -# ---------------------------------------------------------------------- -# -HPL_DEFS = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES) -# -# ---------------------------------------------------------------------- -# - Compilers / linkers - Optimization flags --------------------------- -# ---------------------------------------------------------------------- -# -CC = mpicc -CCNOOPT = $(HPL_DEFS) -CCFLAGS = $(HPL_DEFS) -fomit-frame-pointer -O3 -funroll-loops -W -Wall -# -# On some platforms, it is necessary to use the Fortran linker to find -# the Fortran internals used in the BLAS library. -# -LINKER = mpif77 -LINKFLAGS = $(CCFLAGS) -# -ARCHIVER = ar -ARFLAGS = r -RANLIB = echo -# -# ---------------------------------------------------------------------- diff --git a/hpl/setup/Make.Linux_PII_FBLAS b/hpl/setup/Make.Linux_PII_FBLAS deleted file mode 100644 index 01fb5f094f462153383253d6e385cb7e408f79bd..0000000000000000000000000000000000000000 --- a/hpl/setup/Make.Linux_PII_FBLAS +++ /dev/null @@ -1,183 +0,0 @@ -# -# -- High Performance Computing Linpack Benchmark (HPL) -# HPL - 2.1 - October 26, 2012 -# Antoine P. 
Petitet -# University of Tennessee, Knoxville -# Innovative Computing Laboratory -# (C) Copyright 2000-2008 All Rights Reserved -# -# -- Copyright notice and Licensing terms: -# -# Redistribution and use in source and binary forms, with or without -# modification, are permitted provided that the following conditions -# are met: -# -# 1. Redistributions of source code must retain the above copyright -# notice, this list of conditions and the following disclaimer. -# -# 2. Redistributions in binary form must reproduce the above copyright -# notice, this list of conditions, and the following disclaimer in the -# documentation and/or other materials provided with the distribution. -# -# 3. All advertising materials mentioning features or use of this -# software must display the following acknowledgement: -# This product includes software developed at the University of -# Tennessee, Knoxville, Innovative Computing Laboratory. -# -# 4. The name of the University, the name of the Laboratory, or the -# names of its contributors may not be used to endorse or promote -# products derived from this software without specific written -# permission. -# -# -- Disclaimer: -# -# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS -# ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR -# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY -# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, -# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT -# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, -# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY -# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT -# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
-# ###################################################################### -# -# ---------------------------------------------------------------------- -# - shell -------------------------------------------------------------- -# ---------------------------------------------------------------------- -# -SHELL = /bin/sh -# -CD = cd -CP = cp -LN_S = ln -s -MKDIR = mkdir -RM = /bin/rm -f -TOUCH = touch -# -# ---------------------------------------------------------------------- -# - Platform identifier ------------------------------------------------ -# ---------------------------------------------------------------------- -# -ARCH = Linux_PII_FBLAS -# -# ---------------------------------------------------------------------- -# - HPL Directory Structure / HPL library ------------------------------ -# ---------------------------------------------------------------------- -# -TOPdir = $(HOME)/hpl -INCdir = $(TOPdir)/include -BINdir = $(TOPdir)/bin/$(ARCH) -LIBdir = $(TOPdir)/lib/$(ARCH) -# -HPLlib = $(LIBdir)/libhpl.a -# -# ---------------------------------------------------------------------- -# - Message Passing library (MPI) -------------------------------------- -# ---------------------------------------------------------------------- -# MPinc tells the C compiler where to find the Message Passing library -# header files, MPlib is defined to be the name of the library to be -# used. The variable MPdir is only used for defining MPinc and MPlib. -# -MPdir = /usr/local/mpi -MPinc = -I$(MPdir)/include -MPlib = $(MPdir)/lib/libmpich.a -# -# ---------------------------------------------------------------------- -# - Linear Algebra library (BLAS or VSIPL) ----------------------------- -# ---------------------------------------------------------------------- -# LAinc tells the C compiler where to find the Linear Algebra library -# header files, LAlib is defined to be the name of the library to be -# used. The variable LAdir is only used for defining LAinc and LAlib. 
-# -LAdir = $(HOME)/netlib/ARCHIVES/Linux_PII -LAinc = -LAlib = $(LAdir)/libf77blas.a $(LAdir)/libatlas.a -# -# ---------------------------------------------------------------------- -# - F77 / C interface -------------------------------------------------- -# ---------------------------------------------------------------------- -# You can skip this section if and only if you are not planning to use -# a BLAS library featuring a Fortran 77 interface. Otherwise, it is -# necessary to fill out the F2CDEFS variable with the appropriate -# options. **One and only one** option should be chosen in **each** of -# the 3 following categories: -# -# 1) name space (How C calls a Fortran 77 routine) -# -# -DAdd_ : all lower case and a suffixed underscore (Suns, -# Intel, ...), [default] -# -DNoChange : all lower case (IBM RS6000), -# -DUpCase : all upper case (Cray), -# -DAdd__ : the FORTRAN compiler in use is f2c. -# -# 2) C and Fortran 77 integer mapping -# -# -DF77_INTEGER=int : Fortran 77 INTEGER is a C int, [default] -# -DF77_INTEGER=long : Fortran 77 INTEGER is a C long, -# -DF77_INTEGER=short : Fortran 77 INTEGER is a C short. -# -# 3) Fortran 77 string handling -# -# -DStringSunStyle : The string address is passed at the string loca- -# tion on the stack, and the string length is then -# passed as an F77_INTEGER after all explicit -# stack arguments, [default] -# -DStringStructPtr : The address of a structure is passed by a -# Fortran 77 string, and the structure is of the -# form: struct {char *cp; F77_INTEGER len;}, -# -DStringStructVal : A structure is passed by value for each Fortran -# 77 string, and the structure is of the form: -# struct {char *cp; F77_INTEGER len;}, -# -DStringCrayStyle : Special option for Cray machines, which uses -# Cray fcd (fortran character descriptor) for -# interoperation. 
-# -F2CDEFS = -DAdd__ -DF77_INTEGER=int -DStringSunStyle -# -# ---------------------------------------------------------------------- -# - HPL includes / libraries / specifics ------------------------------- -# ---------------------------------------------------------------------- -# -HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc) -HPL_LIBS = $(HPLlib) $(LAlib) $(MPlib) -# -# - Compile time options ----------------------------------------------- -# -# -DHPL_COPY_L force the copy of the panel L before bcast; -# -DHPL_CALL_CBLAS call the cblas interface; -# -DHPL_CALL_VSIPL call the vsip library; -# -DHPL_DETAILED_TIMING enable detailed timers; -# -# By default HPL will: -# *) not copy L before broadcast, -# *) call the BLAS Fortran 77 interface, -# *) not display detailed timing information. -# -HPL_OPTS = -# -# ---------------------------------------------------------------------- -# -HPL_DEFS = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES) -# -# ---------------------------------------------------------------------- -# - Compilers / linkers - Optimization flags --------------------------- -# ---------------------------------------------------------------------- -# -CC = /usr/bin/gcc -CCNOOPT = $(HPL_DEFS) -CCFLAGS = $(HPL_DEFS) -fomit-frame-pointer -O3 -funroll-loops -W -Wall -# -# On some platforms, it is necessary to use the Fortran linker to find -# the Fortran internals used in the BLAS library. -# -LINKER = /usr/bin/g77 -LINKFLAGS = $(CCFLAGS) -# -ARCHIVER = ar -ARFLAGS = r -RANLIB = echo -# -# ---------------------------------------------------------------------- diff --git a/hpl/setup/Make.Linux_PII_FBLAS_gm b/hpl/setup/Make.Linux_PII_FBLAS_gm deleted file mode 100644 index e5967aae9215d10c5e41eac84535c168d6fa0e83..0000000000000000000000000000000000000000 --- a/hpl/setup/Make.Linux_PII_FBLAS_gm +++ /dev/null @@ -1,183 +0,0 @@ -# -# -- High Performance Computing Linpack Benchmark (HPL) -# HPL - 2.1 - October 26, 2012 -# Antoine P. 
Petitet -# University of Tennessee, Knoxville -# Innovative Computing Laboratory -# (C) Copyright 2000-2008 All Rights Reserved -# -# -- Copyright notice and Licensing terms: -# -# Redistribution and use in source and binary forms, with or without -# modification, are permitted provided that the following conditions -# are met: -# -# 1. Redistributions of source code must retain the above copyright -# notice, this list of conditions and the following disclaimer. -# -# 2. Redistributions in binary form must reproduce the above copyright -# notice, this list of conditions, and the following disclaimer in the -# documentation and/or other materials provided with the distribution. -# -# 3. All advertising materials mentioning features or use of this -# software must display the following acknowledgement: -# This product includes software developed at the University of -# Tennessee, Knoxville, Innovative Computing Laboratory. -# -# 4. The name of the University, the name of the Laboratory, or the -# names of its contributors may not be used to endorse or promote -# products derived from this software without specific written -# permission. -# -# -- Disclaimer: -# -# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS -# ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR -# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY -# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, -# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT -# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, -# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY -# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT -# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
-# ###################################################################### -# -# ---------------------------------------------------------------------- -# - shell -------------------------------------------------------------- -# ---------------------------------------------------------------------- -# -SHELL = /bin/sh -# -CD = cd -CP = cp -LN_S = ln -s -MKDIR = mkdir -RM = /bin/rm -f -TOUCH = touch -# -# ---------------------------------------------------------------------- -# - Platform identifier ------------------------------------------------ -# ---------------------------------------------------------------------- -# -ARCH = Linux_PII_FBLAS_gm -# -# ---------------------------------------------------------------------- -# - HPL Directory Structure / HPL library ------------------------------ -# ---------------------------------------------------------------------- -# -TOPdir = $(HOME)/hpl -INCdir = $(TOPdir)/include -BINdir = $(TOPdir)/bin/$(ARCH) -LIBdir = $(TOPdir)/lib/$(ARCH) -# -HPLlib = $(LIBdir)/libhpl.a -# -# ---------------------------------------------------------------------- -# - Message Passing library (MPI) -------------------------------------- -# ---------------------------------------------------------------------- -# MPinc tells the C compiler where to find the Message Passing library -# header files, MPlib is defined to be the name of the library to be -# used. The variable MPdir is only used for defining MPinc and MPlib. -# -MPdir = -MPinc = -MPlib = -# -# ---------------------------------------------------------------------- -# - Linear Algebra library (BLAS or VSIPL) ----------------------------- -# ---------------------------------------------------------------------- -# LAinc tells the C compiler where to find the Linear Algebra library -# header files, LAlib is defined to be the name of the library to be -# used. The variable LAdir is only used for defining LAinc and LAlib. 
-# -LAdir = $(HOME)/netlib/ARCHIVES/Linux_PII -LAinc = -LAlib = $(LAdir)/libf77blas.a $(LAdir)/libatlas.a -# -# ---------------------------------------------------------------------- -# - F77 / C interface -------------------------------------------------- -# ---------------------------------------------------------------------- -# You can skip this section if and only if you are not planning to use -# a BLAS library featuring a Fortran 77 interface. Otherwise, it is -# necessary to fill out the F2CDEFS variable with the appropriate -# options. **One and only one** option should be chosen in **each** of -# the 3 following categories: -# -# 1) name space (How C calls a Fortran 77 routine) -# -# -DAdd_ : all lower case and a suffixed underscore (Suns, -# Intel, ...), [default] -# -DNoChange : all lower case (IBM RS6000), -# -DUpCase : all upper case (Cray), -# -DAdd__ : the FORTRAN compiler in use is f2c. -# -# 2) C and Fortran 77 integer mapping -# -# -DF77_INTEGER=int : Fortran 77 INTEGER is a C int, [default] -# -DF77_INTEGER=long : Fortran 77 INTEGER is a C long, -# -DF77_INTEGER=short : Fortran 77 INTEGER is a C short. -# -# 3) Fortran 77 string handling -# -# -DStringSunStyle : The string address is passed at the string loca- -# tion on the stack, and the string length is then -# passed as an F77_INTEGER after all explicit -# stack arguments, [default] -# -DStringStructPtr : The address of a structure is passed by a -# Fortran 77 string, and the structure is of the -# form: struct {char *cp; F77_INTEGER len;}, -# -DStringStructVal : A structure is passed by value for each Fortran -# 77 string, and the structure is of the form: -# struct {char *cp; F77_INTEGER len;}, -# -DStringCrayStyle : Special option for Cray machines, which uses -# Cray fcd (fortran character descriptor) for -# interoperation. 
-# -F2CDEFS = -DAdd_ -DF77_INTEGER=int -DStringSunStyle -# -# ---------------------------------------------------------------------- -# - HPL includes / libraries / specifics ------------------------------- -# ---------------------------------------------------------------------- -# -HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc) -HPL_LIBS = $(HPLlib) $(LAlib) $(MPlib) -# -# - Compile time options ----------------------------------------------- -# -# -DHPL_COPY_L force the copy of the panel L before bcast; -# -DHPL_CALL_CBLAS call the cblas interface; -# -DHPL_CALL_VSIPL call the vsip library; -# -DHPL_DETAILED_TIMING enable detailed timers; -# -# By default HPL will: -# *) not copy L before broadcast, -# *) call the BLAS Fortran 77 interface, -# *) not display detailed timing information. -# -HPL_OPTS = -# -# ---------------------------------------------------------------------- -# -HPL_DEFS = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES) -# -# ---------------------------------------------------------------------- -# - Compilers / linkers - Optimization flags --------------------------- -# ---------------------------------------------------------------------- -# -CC = mpicc -CCNOOPT = $(HPL_DEFS) -CCFLAGS = $(HPL_DEFS) -fomit-frame-pointer -O3 -funroll-loops -W -Wall -# -# On some platforms, it is necessary to use the Fortran linker to find -# the Fortran internals used in the BLAS library. -# -LINKER = mpif77 -LINKFLAGS = $(CCFLAGS) -# -ARCHIVER = ar -ARFLAGS = r -RANLIB = echo -# -# ---------------------------------------------------------------------- diff --git a/hpl/setup/Make.Linux_PII_VSIPL b/hpl/setup/Make.Linux_PII_VSIPL deleted file mode 100644 index d97efec3fa49c5efb722b25e40809cabc700c9ff..0000000000000000000000000000000000000000 --- a/hpl/setup/Make.Linux_PII_VSIPL +++ /dev/null @@ -1,183 +0,0 @@ -# -# -- High Performance Computing Linpack Benchmark (HPL) -# HPL - 2.1 - October 26, 2012 -# Antoine P. 
Petitet -# University of Tennessee, Knoxville -# Innovative Computing Laboratory -# (C) Copyright 2000-2008 All Rights Reserved -# -# -- Copyright notice and Licensing terms: -# -# Redistribution and use in source and binary forms, with or without -# modification, are permitted provided that the following conditions -# are met: -# -# 1. Redistributions of source code must retain the above copyright -# notice, this list of conditions and the following disclaimer. -# -# 2. Redistributions in binary form must reproduce the above copyright -# notice, this list of conditions, and the following disclaimer in the -# documentation and/or other materials provided with the distribution. -# -# 3. All advertising materials mentioning features or use of this -# software must display the following acknowledgement: -# This product includes software developed at the University of -# Tennessee, Knoxville, Innovative Computing Laboratory. -# -# 4. The name of the University, the name of the Laboratory, or the -# names of its contributors may not be used to endorse or promote -# products derived from this software without specific written -# permission. -# -# -- Disclaimer: -# -# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS -# ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR -# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY -# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, -# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT -# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, -# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY -# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT -# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
-# ###################################################################### -# -# ---------------------------------------------------------------------- -# - shell -------------------------------------------------------------- -# ---------------------------------------------------------------------- -# -SHELL = /bin/sh -# -CD = cd -CP = cp -LN_S = ln -s -MKDIR = mkdir -RM = /bin/rm -f -TOUCH = touch -# -# ---------------------------------------------------------------------- -# - Platform identifier ------------------------------------------------ -# ---------------------------------------------------------------------- -# -ARCH = Linux_PII_VSIPL -# -# ---------------------------------------------------------------------- -# - HPL Directory Structure / HPL library ------------------------------ -# ---------------------------------------------------------------------- -# -TOPdir = $(HOME)/hpl -INCdir = $(TOPdir)/include -BINdir = $(TOPdir)/bin/$(ARCH) -LIBdir = $(TOPdir)/lib/$(ARCH) -# -HPLlib = $(LIBdir)/libhpl.a -# -# ---------------------------------------------------------------------- -# - Message Passing library (MPI) -------------------------------------- -# ---------------------------------------------------------------------- -# MPinc tells the C compiler where to find the Message Passing library -# header files, MPlib is defined to be the name of the library to be -# used. The variable MPdir is only used for defining MPinc and MPlib. -# -MPdir = /usr/local/mpi -MPinc = -I$(MPdir)/include -MPlib = $(MPdir)/lib/libmpich.a -# -# ---------------------------------------------------------------------- -# - Linear Algebra library (BLAS or VSIPL) ----------------------------- -# ---------------------------------------------------------------------- -# LAinc tells the C compiler where to find the Linear Algebra library -# header files, LAlib is defined to be the name of the library to be -# used. The variable LAdir is only used for defining LAinc and LAlib. 
-# -LAdir = /home/software/TASP_VSIPL_Core_Plus -LAinc = -I$(LAdir)/include -LAlib = $(LAdir)/lib/libvsip_c.a -# -# ---------------------------------------------------------------------- -# - F77 / C interface -------------------------------------------------- -# ---------------------------------------------------------------------- -# You can skip this section if and only if you are not planning to use -# a BLAS library featuring a Fortran 77 interface. Otherwise, it is -# necessary to fill out the F2CDEFS variable with the appropriate -# options. **One and only one** option should be chosen in **each** of -# the 3 following categories: -# -# 1) name space (How C calls a Fortran 77 routine) -# -# -DAdd_ : all lower case and a suffixed underscore (Suns, -# Intel, ...), [default] -# -DNoChange : all lower case (IBM RS6000), -# -DUpCase : all upper case (Cray), -# -DAdd__ : the FORTRAN compiler in use is f2c. -# -# 2) C and Fortran 77 integer mapping -# -# -DF77_INTEGER=int : Fortran 77 INTEGER is a C int, [default] -# -DF77_INTEGER=long : Fortran 77 INTEGER is a C long, -# -DF77_INTEGER=short : Fortran 77 INTEGER is a C short. -# -# 3) Fortran 77 string handling -# -# -DStringSunStyle : The string address is passed at the string loca- -# tion on the stack, and the string length is then -# passed as an F77_INTEGER after all explicit -# stack arguments, [default] -# -DStringStructPtr : The address of a structure is passed by a -# Fortran 77 string, and the structure is of the -# form: struct {char *cp; F77_INTEGER len;}, -# -DStringStructVal : A structure is passed by value for each Fortran -# 77 string, and the structure is of the form: -# struct {char *cp; F77_INTEGER len;}, -# -DStringCrayStyle : Special option for Cray machines, which uses -# Cray fcd (fortran character descriptor) for -# interoperation. 
-# -F2CDEFS = -# -# ---------------------------------------------------------------------- -# - HPL includes / libraries / specifics ------------------------------- -# ---------------------------------------------------------------------- -# -HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc) -HPL_LIBS = $(HPLlib) $(LAlib) $(MPlib) -# -# - Compile time options ----------------------------------------------- -# -# -DHPL_COPY_L force the copy of the panel L before bcast; -# -DHPL_CALL_CBLAS call the cblas interface; -# -DHPL_CALL_VSIPL call the vsip library; -# -DHPL_DETAILED_TIMING enable detailed timers; -# -# By default HPL will: -# *) not copy L before broadcast, -# *) call the BLAS Fortran 77 interface, -# *) not display detailed timing information. -# -HPL_OPTS = -DHPL_CALL_VSIPL -# -# ---------------------------------------------------------------------- -# -HPL_DEFS = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES) -# -# ---------------------------------------------------------------------- -# - Compilers / linkers - Optimization flags --------------------------- -# ---------------------------------------------------------------------- -# -CC = /usr/bin/gcc -CCNOOPT = $(HPL_DEFS) -CCFLAGS = $(HPL_DEFS) -fomit-frame-pointer -O3 -funroll-loops -W -Wall -# -# On some platforms, it is necessary to use the Fortran linker to find -# the Fortran internals used in the BLAS library. -# -LINKER = /usr/bin/g77 -LINKFLAGS = $(CCFLAGS) -# -ARCHIVER = ar -ARFLAGS = r -RANLIB = echo -# -# ---------------------------------------------------------------------- diff --git a/hpl/setup/Make.Linux_PII_VSIPL_gm b/hpl/setup/Make.Linux_PII_VSIPL_gm deleted file mode 100644 index 1b20a08cf598d9deb8fa3a21b59d82cfa34eb09d..0000000000000000000000000000000000000000 --- a/hpl/setup/Make.Linux_PII_VSIPL_gm +++ /dev/null @@ -1,183 +0,0 @@ -# -# -- High Performance Computing Linpack Benchmark (HPL) -# HPL - 2.1 - October 26, 2012 -# Antoine P. 
Petitet -# University of Tennessee, Knoxville -# Innovative Computing Laboratory -# (C) Copyright 2000-2008 All Rights Reserved -# -# -- Copyright notice and Licensing terms: -# -# Redistribution and use in source and binary forms, with or without -# modification, are permitted provided that the following conditions -# are met: -# -# 1. Redistributions of source code must retain the above copyright -# notice, this list of conditions and the following disclaimer. -# -# 2. Redistributions in binary form must reproduce the above copyright -# notice, this list of conditions, and the following disclaimer in the -# documentation and/or other materials provided with the distribution. -# -# 3. All advertising materials mentioning features or use of this -# software must display the following acknowledgement: -# This product includes software developed at the University of -# Tennessee, Knoxville, Innovative Computing Laboratory. -# -# 4. The name of the University, the name of the Laboratory, or the -# names of its contributors may not be used to endorse or promote -# products derived from this software without specific written -# permission. -# -# -- Disclaimer: -# -# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS -# ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR -# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY -# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, -# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT -# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, -# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY -# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT -# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
-# ###################################################################### -# -# ---------------------------------------------------------------------- -# - shell -------------------------------------------------------------- -# ---------------------------------------------------------------------- -# -SHELL = /bin/sh -# -CD = cd -CP = cp -LN_S = ln -s -MKDIR = mkdir -RM = /bin/rm -f -TOUCH = touch -# -# ---------------------------------------------------------------------- -# - Platform identifier ------------------------------------------------ -# ---------------------------------------------------------------------- -# -ARCH = Linux_PII_VSIPL_gm -# -# ---------------------------------------------------------------------- -# - HPL Directory Structure / HPL library ------------------------------ -# ---------------------------------------------------------------------- -# -TOPdir = $(HOME)/hpl -INCdir = $(TOPdir)/include -BINdir = $(TOPdir)/bin/$(ARCH) -LIBdir = $(TOPdir)/lib/$(ARCH) -# -HPLlib = $(LIBdir)/libhpl.a -# -# ---------------------------------------------------------------------- -# - Message Passing library (MPI) -------------------------------------- -# ---------------------------------------------------------------------- -# MPinc tells the C compiler where to find the Message Passing library -# header files, MPlib is defined to be the name of the library to be -# used. The variable MPdir is only used for defining MPinc and MPlib. -# -MPdir = -MPinc = -MPlib = -# -# ---------------------------------------------------------------------- -# - Linear Algebra library (BLAS or VSIPL) ----------------------------- -# ---------------------------------------------------------------------- -# LAinc tells the C compiler where to find the Linear Algebra library -# header files, LAlib is defined to be the name of the library to be -# used. The variable LAdir is only used for defining LAinc and LAlib. 
-# -LAdir = /home/software/TASP_VSIPL_Core_Plus -LAinc = -I$(LAdir)/include -LAlib = $(LAdir)/lib/libvsip_c.a -# -# ---------------------------------------------------------------------- -# - F77 / C interface -------------------------------------------------- -# ---------------------------------------------------------------------- -# You can skip this section if and only if you are not planning to use -# a BLAS library featuring a Fortran 77 interface. Otherwise, it is -# necessary to fill out the F2CDEFS variable with the appropriate -# options. **One and only one** option should be chosen in **each** of -# the 3 following categories: -# -# 1) name space (How C calls a Fortran 77 routine) -# -# -DAdd_ : all lower case and a suffixed underscore (Suns, -# Intel, ...), [default] -# -DNoChange : all lower case (IBM RS6000), -# -DUpCase : all upper case (Cray), -# -DAdd__ : the FORTRAN compiler in use is f2c. -# -# 2) C and Fortran 77 integer mapping -# -# -DF77_INTEGER=int : Fortran 77 INTEGER is a C int, [default] -# -DF77_INTEGER=long : Fortran 77 INTEGER is a C long, -# -DF77_INTEGER=short : Fortran 77 INTEGER is a C short. -# -# 3) Fortran 77 string handling -# -# -DStringSunStyle : The string address is passed at the string loca- -# tion on the stack, and the string length is then -# passed as an F77_INTEGER after all explicit -# stack arguments, [default] -# -DStringStructPtr : The address of a structure is passed by a -# Fortran 77 string, and the structure is of the -# form: struct {char *cp; F77_INTEGER len;}, -# -DStringStructVal : A structure is passed by value for each Fortran -# 77 string, and the structure is of the form: -# struct {char *cp; F77_INTEGER len;}, -# -DStringCrayStyle : Special option for Cray machines, which uses -# Cray fcd (fortran character descriptor) for -# interoperation. 
-# -F2CDEFS = -# -# ---------------------------------------------------------------------- -# - HPL includes / libraries / specifics ------------------------------- -# ---------------------------------------------------------------------- -# -HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc) -HPL_LIBS = $(HPLlib) $(LAlib) $(MPlib) -# -# - Compile time options ----------------------------------------------- -# -# -DHPL_COPY_L force the copy of the panel L before bcast; -# -DHPL_CALL_CBLAS call the cblas interface; -# -DHPL_CALL_VSIPL call the vsip library; -# -DHPL_DETAILED_TIMING enable detailed timers; -# -# By default HPL will: -# *) not copy L before broadcast, -# *) call the BLAS Fortran 77 interface, -# *) not display detailed timing information. -# -HPL_OPTS = -DHPL_CALL_VSIPL -# -# ---------------------------------------------------------------------- -# -HPL_DEFS = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES) -# -# ---------------------------------------------------------------------- -# - Compilers / linkers - Optimization flags --------------------------- -# ---------------------------------------------------------------------- -# -CC = mpicc -CCNOOPT = $(HPL_DEFS) -CCFLAGS = $(HPL_DEFS) -fomit-frame-pointer -O3 -funroll-loops -W -Wall -# -# On some platforms, it is necessary to use the Fortran linker to find -# the Fortran internals used in the BLAS library. -# -LINKER = mpif77 -LINKFLAGS = $(CCFLAGS) -# -ARCHIVER = ar -ARFLAGS = r -RANLIB = echo -# -# ---------------------------------------------------------------------- diff --git a/hpl/setup/Make.PWR2_FBLAS b/hpl/setup/Make.PWR2_FBLAS deleted file mode 100644 index f6c471c31c16e7ba324e646e88fa1a24427f3896..0000000000000000000000000000000000000000 --- a/hpl/setup/Make.PWR2_FBLAS +++ /dev/null @@ -1,180 +0,0 @@ -# -# -- High Performance Computing Linpack Benchmark (HPL) -# HPL - 2.1 - October 26, 2012 -# Antoine P. 
Petitet -# University of Tennessee, Knoxville -# Innovative Computing Laboratory -# (C) Copyright 2000-2008 All Rights Reserved -# -# -- Copyright notice and Licensing terms: -# -# Redistribution and use in source and binary forms, with or without -# modification, are permitted provided that the following conditions -# are met: -# -# 1. Redistributions of source code must retain the above copyright -# notice, this list of conditions and the following disclaimer. -# -# 2. Redistributions in binary form must reproduce the above copyright -# notice, this list of conditions, and the following disclaimer in the -# documentation and/or other materials provided with the distribution. -# -# 3. All advertising materials mentioning features or use of this -# software must display the following acknowledgement: -# This product includes software developed at the University of -# Tennessee, Knoxville, Innovative Computing Laboratory. -# -# 4. The name of the University, the name of the Laboratory, or the -# names of its contributors may not be used to endorse or promote -# products derived from this software without specific written -# permission. -# -# -- Disclaimer: -# -# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS -# ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR -# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY -# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, -# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT -# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, -# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY -# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT -# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
-# ###################################################################### -# -# ---------------------------------------------------------------------- -# - shell -------------------------------------------------------------- -# ---------------------------------------------------------------------- -# -SHELL = /bin/sh -# -CD = cd -CP = cp -LN_S = ln -s -MKDIR = mkdir -RM = /bin/rm -f -TOUCH = touch -# -# ---------------------------------------------------------------------- -# - Platform identifier ------------------------------------------------ -# ---------------------------------------------------------------------- -# -ARCH = PWR2_FBLAS -# -# ---------------------------------------------------------------------- -# - HPL Directory Structure / HPL library ------------------------------ -# ---------------------------------------------------------------------- -# -TOPdir = $(HOME)/hpl -INCdir = $(TOPdir)/include -BINdir = $(TOPdir)/bin/$(ARCH) -LIBdir = $(TOPdir)/lib/$(ARCH) -# -HPLlib = $(LIBdir)/libhpl.a -# -# ---------------------------------------------------------------------- -# - Message Passing library (MPI) -------------------------------------- -# ---------------------------------------------------------------------- -# MPinc tells the C compiler where to find the Message Passing library -# header files, MPlib is defined to be the name of the library to be -# used. The variable MPdir is only used for defining MPinc and MPlib. -# -MPdir = -MPinc = -MPlib = -# -# ---------------------------------------------------------------------- -# - Linear Algebra library (BLAS or VSIPL) ----------------------------- -# ---------------------------------------------------------------------- -# LAinc tells the C compiler where to find the Linear Algebra library -# header files, LAlib is defined to be the name of the library to be -# used. The variable LAdir is only used for defining LAinc and LAlib. 
-# -LAdir = -LAinc = -LAlib = -lesslp2 -# -# ---------------------------------------------------------------------- -# - F77 / C interface -------------------------------------------------- -# ---------------------------------------------------------------------- -# You can skip this section if and only if you are not planning to use -# a BLAS library featuring a Fortran 77 interface. Otherwise, it is -# necessary to fill out the F2CDEFS variable with the appropriate -# options. **One and only one** option should be chosen in **each** of -# the 3 following categories: -# -# 1) name space (How C calls a Fortran 77 routine) -# -# -DAdd_ : all lower case and a suffixed underscore (Suns, -# Intel, ...), [default] -# -DNoChange : all lower case (IBM RS6000), -# -DUpCase : all upper case (Cray), -# -DAdd__ : the FORTRAN compiler in use is f2c. -# -# 2) C and Fortran 77 integer mapping -# -# -DF77_INTEGER=int : Fortran 77 INTEGER is a C int, [default] -# -DF77_INTEGER=long : Fortran 77 INTEGER is a C long, -# -DF77_INTEGER=short : Fortran 77 INTEGER is a C short. -# -# 3) Fortran 77 string handling -# -# -DStringSunStyle : The string address is passed at the string loca- -# tion on the stack, and the string length is then -# passed as an F77_INTEGER after all explicit -# stack arguments, [default] -# -DStringStructPtr : The address of a structure is passed by a -# Fortran 77 string, and the structure is of the -# form: struct {char *cp; F77_INTEGER len;}, -# -DStringStructVal : A structure is passed by value for each Fortran -# 77 string, and the structure is of the form: -# struct {char *cp; F77_INTEGER len;}, -# -DStringCrayStyle : Special option for Cray machines, which uses -# Cray fcd (fortran character descriptor) for -# interoperation. 
-# -F2CDEFS = -DNoChange -DF77_INTEGER=int -DStringSunStyle -# -# ---------------------------------------------------------------------- -# - HPL includes / libraries / specifics ------------------------------- -# ---------------------------------------------------------------------- -# -HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc) -HPL_LIBS = $(HPLlib) $(LAlib) $(MPlib) -# -# - Compile time options ----------------------------------------------- -# -# -DHPL_COPY_L force the copy of the panel L before bcast; -# -DHPL_CALL_CBLAS call the cblas interface; -# -DHPL_CALL_VSIPL call the vsip library; -# -DHPL_DETAILED_TIMING enable detailed timers; -# -# By default HPL will: -# *) not copy L before broadcast, -# *) call the BLAS Fortran 77 interface, -# *) not display detailed timing information. -# -HPL_OPTS = -# -# ---------------------------------------------------------------------- -# -HPL_DEFS = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES) -# -# ---------------------------------------------------------------------- -# - Compilers / linkers - Optimization flags --------------------------- -# ---------------------------------------------------------------------- -# -CC = mpcc_r -CCNOOPT = $(HPL_DEFS) -CCFLAGS = $(HPL_DEFS) -O3 -qarch=pwr2 -qtune=pwr2 -qmaxmem=-1 -# -LINKER = mpxlf_r -LINKFLAGS = -bmaxdata:0x70000000 $(CCFLAGS) -# -ARCHIVER = ar -ARFLAGS = r -RANLIB = echo -# -# ---------------------------------------------------------------------- diff --git a/hpl/setup/Make.PWR3_FBLAS b/hpl/setup/Make.PWR3_FBLAS deleted file mode 100644 index ddbb2e83abcfea77a5dc63cf76f9fb9973822705..0000000000000000000000000000000000000000 --- a/hpl/setup/Make.PWR3_FBLAS +++ /dev/null @@ -1,180 +0,0 @@ -# -# -- High Performance Computing Linpack Benchmark (HPL) -# HPL - 2.1 - October 26, 2012 -# Antoine P. 
Petitet -# University of Tennessee, Knoxville -# Innovative Computing Laboratory -# (C) Copyright 2000-2008 All Rights Reserved -# -# -- Copyright notice and Licensing terms: -# -# Redistribution and use in source and binary forms, with or without -# modification, are permitted provided that the following conditions -# are met: -# -# 1. Redistributions of source code must retain the above copyright -# notice, this list of conditions and the following disclaimer. -# -# 2. Redistributions in binary form must reproduce the above copyright -# notice, this list of conditions, and the following disclaimer in the -# documentation and/or other materials provided with the distribution. -# -# 3. All advertising materials mentioning features or use of this -# software must display the following acknowledgement: -# This product includes software developed at the University of -# Tennessee, Knoxville, Innovative Computing Laboratory. -# -# 4. The name of the University, the name of the Laboratory, or the -# names of its contributors may not be used to endorse or promote -# products derived from this software without specific written -# permission. -# -# -- Disclaimer: -# -# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS -# ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR -# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY -# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, -# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT -# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, -# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY -# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT -# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
-# ###################################################################### -# -# ---------------------------------------------------------------------- -# - shell -------------------------------------------------------------- -# ---------------------------------------------------------------------- -# -SHELL = /bin/sh -# -CD = cd -CP = cp -LN_S = ln -s -MKDIR = mkdir -RM = /bin/rm -f -TOUCH = touch -# -# ---------------------------------------------------------------------- -# - Platform identifier ------------------------------------------------ -# ---------------------------------------------------------------------- -# -ARCH = PWR3_FBLAS -# -# ---------------------------------------------------------------------- -# - HPL Directory Structure / HPL library ------------------------------ -# ---------------------------------------------------------------------- -# -TOPdir = $(HOME)/hpl -INCdir = $(TOPdir)/include -BINdir = $(TOPdir)/bin/$(ARCH) -LIBdir = $(TOPdir)/lib/$(ARCH) -# -HPLlib = $(LIBdir)/libhpl.a -# -# ---------------------------------------------------------------------- -# - Message Passing library (MPI) -------------------------------------- -# ---------------------------------------------------------------------- -# MPinc tells the C compiler where to find the Message Passing library -# header files, MPlib is defined to be the name of the library to be -# used. The variable MPdir is only used for defining MPinc and MPlib. -# -MPdir = -MPinc = -MPlib = -# -# ---------------------------------------------------------------------- -# - Linear Algebra library (BLAS or VSIPL) ----------------------------- -# ---------------------------------------------------------------------- -# LAinc tells the C compiler where to find the Linear Algebra library -# header files, LAlib is defined to be the name of the library to be -# used. The variable LAdir is only used for defining LAinc and LAlib. 
-# -LAdir = -LAinc = -LAlib = -lessl -# -# ---------------------------------------------------------------------- -# - F77 / C interface -------------------------------------------------- -# ---------------------------------------------------------------------- -# You can skip this section if and only if you are not planning to use -# a BLAS library featuring a Fortran 77 interface. Otherwise, it is -# necessary to fill out the F2CDEFS variable with the appropriate -# options. **One and only one** option should be chosen in **each** of -# the 3 following categories: -# -# 1) name space (How C calls a Fortran 77 routine) -# -# -DAdd_ : all lower case and a suffixed underscore (Suns, -# Intel, ...), [default] -# -DNoChange : all lower case (IBM RS6000), -# -DUpCase : all upper case (Cray), -# -DAdd__ : the FORTRAN compiler in use is f2c. -# -# 2) C and Fortran 77 integer mapping -# -# -DF77_INTEGER=int : Fortran 77 INTEGER is a C int, [default] -# -DF77_INTEGER=long : Fortran 77 INTEGER is a C long, -# -DF77_INTEGER=short : Fortran 77 INTEGER is a C short. -# -# 3) Fortran 77 string handling -# -# -DStringSunStyle : The string address is passed at the string loca- -# tion on the stack, and the string length is then -# passed as an F77_INTEGER after all explicit -# stack arguments, [default] -# -DStringStructPtr : The address of a structure is passed by a -# Fortran 77 string, and the structure is of the -# form: struct {char *cp; F77_INTEGER len;}, -# -DStringStructVal : A structure is passed by value for each Fortran -# 77 string, and the structure is of the form: -# struct {char *cp; F77_INTEGER len;}, -# -DStringCrayStyle : Special option for Cray machines, which uses -# Cray fcd (fortran character descriptor) for -# interoperation. 
-# -F2CDEFS = -DNoChange -DF77_INTEGER=int -DStringSunStyle -# -# ---------------------------------------------------------------------- -# - HPL includes / libraries / specifics ------------------------------- -# ---------------------------------------------------------------------- -# -HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc) -HPL_LIBS = $(HPLlib) $(LAlib) $(MPlib) -# -# - Compile time options ----------------------------------------------- -# -# -DHPL_COPY_L force the copy of the panel L before bcast; -# -DHPL_CALL_CBLAS call the cblas interface; -# -DHPL_CALL_VSIPL call the vsip library; -# -DHPL_DETAILED_TIMING enable detailed timers; -# -# By default HPL will: -# *) not copy L before broadcast, -# *) call the BLAS Fortran 77 interface, -# *) not display detailed timing information. -# -HPL_OPTS = -# -# ---------------------------------------------------------------------- -# -HPL_DEFS = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES) -# -# ---------------------------------------------------------------------- -# - Compilers / linkers - Optimization flags --------------------------- -# ---------------------------------------------------------------------- -# -CC = /usr/vac/bin/xlc -CCNOOPT = $(HPL_DEFS) -CCFLAGS = $(HPL_DEFS) -qtune=pwr3 -qarch=pwr3 -O3 -qmaxmem=-1 -qfloat=hsflt -# -LINKER = /usr/bin/xlf -LINKFLAGS = -bmaxdata:0x70000000 $(CCFLAGS) -# -ARCHIVER = ar -ARFLAGS = r -RANLIB = echo -# -# ---------------------------------------------------------------------- diff --git a/hpl/setup/Make.PWRPC_FBLAS b/hpl/setup/Make.PWRPC_FBLAS deleted file mode 100644 index d1a43935e1886f3fb68078eebbcc5c614ba2362f..0000000000000000000000000000000000000000 --- a/hpl/setup/Make.PWRPC_FBLAS +++ /dev/null @@ -1,180 +0,0 @@ -# -# -- High Performance Computing Linpack Benchmark (HPL) -# HPL - 2.1 - October 26, 2012 -# Antoine P. 
Petitet -# University of Tennessee, Knoxville -# Innovative Computing Laboratory -# (C) Copyright 2000-2008 All Rights Reserved -# -# -- Copyright notice and Licensing terms: -# -# Redistribution and use in source and binary forms, with or without -# modification, are permitted provided that the following conditions -# are met: -# -# 1. Redistributions of source code must retain the above copyright -# notice, this list of conditions and the following disclaimer. -# -# 2. Redistributions in binary form must reproduce the above copyright -# notice, this list of conditions, and the following disclaimer in the -# documentation and/or other materials provided with the distribution. -# -# 3. All advertising materials mentioning features or use of this -# software must display the following acknowledgement: -# This product includes software developed at the University of -# Tennessee, Knoxville, Innovative Computing Laboratory. -# -# 4. The name of the University, the name of the Laboratory, or the -# names of its contributors may not be used to endorse or promote -# products derived from this software without specific written -# permission. -# -# -- Disclaimer: -# -# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS -# ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR -# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY -# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, -# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT -# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, -# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY -# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT -# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
-# ###################################################################### -# -# ---------------------------------------------------------------------- -# - shell -------------------------------------------------------------- -# ---------------------------------------------------------------------- -# -SHELL = /bin/sh -# -CD = cd -CP = cp -LN_S = ln -s -MKDIR = mkdir -RM = /bin/rm -f -TOUCH = touch -# -# ---------------------------------------------------------------------- -# - Platform identifier ------------------------------------------------ -# ---------------------------------------------------------------------- -# -ARCH = PWRPC_FBLAS -# -# ---------------------------------------------------------------------- -# - HPL Directory Structure / HPL library ------------------------------ -# ---------------------------------------------------------------------- -# -TOPdir = $(HOME)/hpl -INCdir = $(TOPdir)/include -BINdir = $(TOPdir)/bin/$(ARCH) -LIBdir = $(TOPdir)/lib/$(ARCH) -# -HPLlib = $(LIBdir)/libhpl.a -# -# ---------------------------------------------------------------------- -# - Message Passing library (MPI) -------------------------------------- -# ---------------------------------------------------------------------- -# MPinc tells the C compiler where to find the Message Passing library -# header files, MPlib is defined to be the name of the library to be -# used. The variable MPdir is only used for defining MPinc and MPlib. -# -MPdir = /usr/local/mpi -MPinc = -I$(MPdir)/include -MPlib = $(MPdir)/lib/libmpich.a -# -# ---------------------------------------------------------------------- -# - Linear Algebra library (BLAS or VSIPL) ----------------------------- -# ---------------------------------------------------------------------- -# LAinc tells the C compiler where to find the Linear Algebra library -# header files, LAlib is defined to be the name of the library to be -# used. The variable LAdir is only used for defining LAinc and LAlib. 
-# -LAdir = -LAinc = -LAlib = -lessl -# -# ---------------------------------------------------------------------- -# - F77 / C interface -------------------------------------------------- -# ---------------------------------------------------------------------- -# You can skip this section if and only if you are not planning to use -# a BLAS library featuring a Fortran 77 interface. Otherwise, it is -# necessary to fill out the F2CDEFS variable with the appropriate -# options. **One and only one** option should be chosen in **each** of -# the 3 following categories: -# -# 1) name space (How C calls a Fortran 77 routine) -# -# -DAdd_ : all lower case and a suffixed underscore (Suns, -# Intel, ...), [default] -# -DNoChange : all lower case (IBM RS6000), -# -DUpCase : all upper case (Cray), -# -DAdd__ : the FORTRAN compiler in use is f2c. -# -# 2) C and Fortran 77 integer mapping -# -# -DF77_INTEGER=int : Fortran 77 INTEGER is a C int, [default] -# -DF77_INTEGER=long : Fortran 77 INTEGER is a C long, -# -DF77_INTEGER=short : Fortran 77 INTEGER is a C short. -# -# 3) Fortran 77 string handling -# -# -DStringSunStyle : The string address is passed at the string loca- -# tion on the stack, and the string length is then -# passed as an F77_INTEGER after all explicit -# stack arguments, [default] -# -DStringStructPtr : The address of a structure is passed by a -# Fortran 77 string, and the structure is of the -# form: struct {char *cp; F77_INTEGER len;}, -# -DStringStructVal : A structure is passed by value for each Fortran -# 77 string, and the structure is of the form: -# struct {char *cp; F77_INTEGER len;}, -# -DStringCrayStyle : Special option for Cray machines, which uses -# Cray fcd (fortran character descriptor) for -# interoperation. 
-# -F2CDEFS = -DNoChange -DF77_INTEGER=int -DStringSunStyle -# -# ---------------------------------------------------------------------- -# - HPL includes / libraries / specifics ------------------------------- -# ---------------------------------------------------------------------- -# -HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc) -HPL_LIBS = $(HPLlib) $(LAlib) $(MPlib) -# -# - Compile time options ----------------------------------------------- -# -# -DHPL_COPY_L force the copy of the panel L before bcast; -# -DHPL_CALL_CBLAS call the cblas interface; -# -DHPL_CALL_VSIPL call the vsip library; -# -DHPL_DETAILED_TIMING enable detailed timers; -# -# By default HPL will: -# *) not copy L before broadcast, -# *) call the BLAS Fortran 77 interface, -# *) not display detailed timing information. -# -HPL_OPTS = -# -# ---------------------------------------------------------------------- -# -HPL_DEFS = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES) -# -# ---------------------------------------------------------------------- -# - Compilers / linkers - Optimization flags --------------------------- -# ---------------------------------------------------------------------- -# -CC = mpcc_r -CCNOOPT = $(HPL_DEFS) -CCFLAGS = $(HPL_DEFS) -O3 -qarch=ppc -qtune=604 -qmaxmem=-1 -# -LINKER = mpxlf_r -LINKFLAGS = -bmaxdata:0x70000000 $(CCFLAGS) -# -ARCHIVER = ar -ARFLAGS = r -RANLIB = echo -# -# ---------------------------------------------------------------------- diff --git a/hpl/setup/Make.SUN4SOL2-g_FBLAS b/hpl/setup/Make.SUN4SOL2-g_FBLAS deleted file mode 100644 index 845d07b1a5a47ef2bde6154ec88191699c03c1a0..0000000000000000000000000000000000000000 --- a/hpl/setup/Make.SUN4SOL2-g_FBLAS +++ /dev/null @@ -1,180 +0,0 @@ -# -# -- High Performance Computing Linpack Benchmark (HPL) -# HPL - 2.1 - October 26, 2012 -# Antoine P. 
Petitet -# University of Tennessee, Knoxville -# Innovative Computing Laboratory -# (C) Copyright 2000-2008 All Rights Reserved -# -# -- Copyright notice and Licensing terms: -# -# Redistribution and use in source and binary forms, with or without -# modification, are permitted provided that the following conditions -# are met: -# -# 1. Redistributions of source code must retain the above copyright -# notice, this list of conditions and the following disclaimer. -# -# 2. Redistributions in binary form must reproduce the above copyright -# notice, this list of conditions, and the following disclaimer in the -# documentation and/or other materials provided with the distribution. -# -# 3. All advertising materials mentioning features or use of this -# software must display the following acknowledgement: -# This product includes software developed at the University of -# Tennessee, Knoxville, Innovative Computing Laboratory. -# -# 4. The name of the University, the name of the Laboratory, or the -# names of its contributors may not be used to endorse or promote -# products derived from this software without specific written -# permission. -# -# -- Disclaimer: -# -# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS -# ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR -# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY -# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, -# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT -# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, -# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY -# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT -# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
-# ###################################################################### -# -# ---------------------------------------------------------------------- -# - shell -------------------------------------------------------------- -# ---------------------------------------------------------------------- -# -SHELL = /bin/sh -# -CD = cd -CP = cp -LN_S = ln -s -MKDIR = mkdir -RM = /bin/rm -f -TOUCH = touch -# -# ---------------------------------------------------------------------- -# - Platform identifier ------------------------------------------------ -# ---------------------------------------------------------------------- -# -ARCH = SUN4SOL2-g_FBLAS -# -# ---------------------------------------------------------------------- -# - HPL Directory Structure / HPL library ------------------------------ -# ---------------------------------------------------------------------- -# -TOPdir = $(HOME)/hpl -INCdir = $(TOPdir)/include -BINdir = $(TOPdir)/bin/$(ARCH) -LIBdir = $(TOPdir)/lib/$(ARCH) -# -HPLlib = $(LIBdir)/libhpl.a -# -# ---------------------------------------------------------------------- -# - Message Passing library (MPI) -------------------------------------- -# ---------------------------------------------------------------------- -# MPinc tells the C compiler where to find the Message Passing library -# header files, MPlib is defined to be the name of the library to be -# used. The variable MPdir is only used for defining MPinc and MPlib. -# -MPdir = $(HOME)/local/mpi -MPinc = -I$(MPdir)/include -I$(MPdir)/solaris/ch_p4/include -MPlib = $(MPdir)/solaris/ch_p4/lib/libmpich.a -lsocket -lnsl -# -# ---------------------------------------------------------------------- -# - Linear Algebra library (BLAS or VSIPL) ----------------------------- -# ---------------------------------------------------------------------- -# LAinc tells the C compiler where to find the Linear Algebra library -# header files, LAlib is defined to be the name of the library to be -# used. 
The variable LAdir is only used for defining LAinc and LAlib. -# -LAdir = -LAinc = -LAlib = -xlic_lib=sunperf -# -# ---------------------------------------------------------------------- -# - F77 / C interface -------------------------------------------------- -# ---------------------------------------------------------------------- -# You can skip this section if and only if you are not planning to use -# a BLAS library featuring a Fortran 77 interface. Otherwise, it is -# necessary to fill out the F2CDEFS variable with the appropriate -# options. **One and only one** option should be chosen in **each** of -# the 3 following categories: -# -# 1) name space (How C calls a Fortran 77 routine) -# -# -DAdd_ : all lower case and a suffixed underscore (Suns, -# Intel, ...), [default] -# -DNoChange : all lower case (IBM RS6000), -# -DUpCase : all upper case (Cray), -# -DAdd__ : the FORTRAN compiler in use is f2c. -# -# 2) C and Fortran 77 integer mapping -# -# -DF77_INTEGER=int : Fortran 77 INTEGER is a C int, [default] -# -DF77_INTEGER=long : Fortran 77 INTEGER is a C long, -# -DF77_INTEGER=short : Fortran 77 INTEGER is a C short. -# -# 3) Fortran 77 string handling -# -# -DStringSunStyle : The string address is passed at the string loca- -# tion on the stack, and the string length is then -# passed as an F77_INTEGER after all explicit -# stack arguments, [default] -# -DStringStructPtr : The address of a structure is passed by a -# Fortran 77 string, and the structure is of the -# form: struct {char *cp; F77_INTEGER len;}, -# -DStringStructVal : A structure is passed by value for each Fortran -# 77 string, and the structure is of the form: -# struct {char *cp; F77_INTEGER len;}, -# -DStringCrayStyle : Special option for Cray machines, which uses -# Cray fcd (fortran character descriptor) for -# interoperation. 
-#
-F2CDEFS = -DAdd_ -DF77_INTEGER=int -DStringSunStyle
-#
-# ----------------------------------------------------------------------
-# - HPL includes / libraries / specifics -------------------------------
-# ----------------------------------------------------------------------
-#
-HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc)
-HPL_LIBS = $(HPLlib) $(LAlib) $(MPlib)
-#
-# - Compile time options -----------------------------------------------
-#
-# -DHPL_COPY_L force the copy of the panel L before bcast;
-# -DHPL_CALL_CBLAS call the cblas interface;
-# -DHPL_CALL_VSIPL call the vsip library;
-# -DHPL_DETAILED_TIMING enable detailed timers;
-#
-# By default HPL will:
-# *) not copy L before broadcast,
-# *) call the BLAS Fortran 77 interface,
-# *) not display detailed timing information.
-#
-HPL_OPTS =
-#
-# ----------------------------------------------------------------------
-#
-HPL_DEFS = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES)
-#
-# ----------------------------------------------------------------------
-# - Compilers / linkers - Optimization flags ---------------------------
-# ----------------------------------------------------------------------
-#
-CC = cc
-CCNOOPT = $(HPL_DEFS)
-CCFLAGS = $(HPL_DEFS) -g
-#
-LINKER = purify -best-effort f77
-LINKFLAGS =
-#
-ARCHIVER = ar
-ARFLAGS = r
-RANLIB = echo
-#
-# ----------------------------------------------------------------------
diff --git a/hpl/setup/Make.SUN4SOL2-g_VSIPL b/hpl/setup/Make.SUN4SOL2-g_VSIPL
deleted file mode 100644
index 5e6ae0f61352ff2fafb1b65029232cac7a50f138..0000000000000000000000000000000000000000
--- a/hpl/setup/Make.SUN4SOL2-g_VSIPL
+++ /dev/null
@@ -1,180 +0,0 @@
-#
-# -- High Performance Computing Linpack Benchmark (HPL)
-# HPL - 2.1 - October 26, 2012
-# Antoine P. Petitet
-# University of Tennessee, Knoxville
-# Innovative Computing Laboratory
-# (C) Copyright 2000-2008 All Rights Reserved
-#
-# -- Copyright notice and Licensing terms:
-#
-# Redistribution and use in source and binary forms, with or without
-# modification, are permitted provided that the following conditions
-# are met:
-#
-# 1. Redistributions of source code must retain the above copyright
-# notice, this list of conditions and the following disclaimer.
-#
-# 2. Redistributions in binary form must reproduce the above copyright
-# notice, this list of conditions, and the following disclaimer in the
-# documentation and/or other materials provided with the distribution.
-#
-# 3. All advertising materials mentioning features or use of this
-# software must display the following acknowledgement:
-# This product includes software developed at the University of
-# Tennessee, Knoxville, Innovative Computing Laboratory.
-#
-# 4. The name of the University, the name of the Laboratory, or the
-# names of its contributors may not be used to endorse or promote
-# products derived from this software without specific written
-# permission.
-#
-# -- Disclaimer:
-#
-# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
-# ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
-# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
-# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY
-# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
-# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
-# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
-# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
-# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
-# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
-# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-# ######################################################################
-#
-# ----------------------------------------------------------------------
-# - shell --------------------------------------------------------------
-# ----------------------------------------------------------------------
-#
-SHELL = /bin/sh
-#
-CD = cd
-CP = cp
-LN_S = ln -s
-MKDIR = mkdir
-RM = /bin/rm -f
-TOUCH = touch
-#
-# ----------------------------------------------------------------------
-# - Platform identifier ------------------------------------------------
-# ----------------------------------------------------------------------
-#
-ARCH = SUN4SOL2-g_VSIPL
-#
-# ----------------------------------------------------------------------
-# - HPL Directory Structure / HPL library ------------------------------
-# ----------------------------------------------------------------------
-#
-TOPdir = $(HOME)/hpl
-INCdir = $(TOPdir)/include
-BINdir = $(TOPdir)/bin/$(ARCH)
-LIBdir = $(TOPdir)/lib/$(ARCH)
-#
-HPLlib = $(LIBdir)/libhpl.a
-#
-# ----------------------------------------------------------------------
-# - Message Passing library (MPI) --------------------------------------
-# ----------------------------------------------------------------------
-# MPinc tells the C compiler where to find the Message Passing library
-# header files, MPlib is defined to be the name of the library to be
-# used. The variable MPdir is only used for defining MPinc and MPlib.
-#
-MPdir = $(HOME)/local/mpi
-MPinc = -I$(MPdir)/include -I$(MPdir)/solaris/ch_p4/include
-MPlib = $(MPdir)/solaris/ch_p4/lib/libmpich.a -lsocket -lnsl
-#
-# ----------------------------------------------------------------------
-# - Linear Algebra library (BLAS or VSIPL) -----------------------------
-# ----------------------------------------------------------------------
-# LAinc tells the C compiler where to find the Linear Algebra library
-# header files, LAlib is defined to be the name of the library to be
-# used. The variable LAdir is only used for defining LAinc and LAlib.
-#
-LAdir = $(HOME)/local/TASP_VSIPL_Core_Plus
-LAinc = -I$(LAdir)/include
-LAlib = $(LAdir)/lib/libvsip_c.a
-#
-# ----------------------------------------------------------------------
-# - F77 / C interface --------------------------------------------------
-# ----------------------------------------------------------------------
-# You can skip this section if and only if you are not planning to use
-# a BLAS library featuring a Fortran 77 interface. Otherwise, it is
-# necessary to fill out the F2CDEFS variable with the appropriate
-# options. **One and only one** option should be chosen in **each** of
-# the 3 following categories:
-#
-# 1) name space (How C calls a Fortran 77 routine)
-#
-# -DAdd_ : all lower case and a suffixed underscore (Suns,
-# Intel, ...), [default]
-# -DNoChange : all lower case (IBM RS6000),
-# -DUpCase : all upper case (Cray),
-# -DAdd__ : the FORTRAN compiler in use is f2c.
-#
-# 2) C and Fortran 77 integer mapping
-#
-# -DF77_INTEGER=int : Fortran 77 INTEGER is a C int, [default]
-# -DF77_INTEGER=long : Fortran 77 INTEGER is a C long,
-# -DF77_INTEGER=short : Fortran 77 INTEGER is a C short.
-#
-# 3) Fortran 77 string handling
-#
-# -DStringSunStyle : The string address is passed at the string loca-
-# tion on the stack, and the string length is then
-# passed as an F77_INTEGER after all explicit
-# stack arguments, [default]
-# -DStringStructPtr : The address of a structure is passed by a
-# Fortran 77 string, and the structure is of the
-# form: struct {char *cp; F77_INTEGER len;},
-# -DStringStructVal : A structure is passed by value for each Fortran
-# 77 string, and the structure is of the form:
-# struct {char *cp; F77_INTEGER len;},
-# -DStringCrayStyle : Special option for Cray machines, which uses
-# Cray fcd (fortran character descriptor) for
-# interoperation.
-#
-F2CDEFS =
-#
-# ----------------------------------------------------------------------
-# - HPL includes / libraries / specifics -------------------------------
-# ----------------------------------------------------------------------
-#
-HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc)
-HPL_LIBS = $(HPLlib) $(LAlib) $(MPlib)
-#
-# - Compile time options -----------------------------------------------
-#
-# -DHPL_COPY_L force the copy of the panel L before bcast;
-# -DHPL_CALL_CBLAS call the cblas interface;
-# -DHPL_CALL_VSIPL call the vsip library;
-# -DHPL_DETAILED_TIMING enable detailed timers;
-#
-# By default HPL will:
-# *) not copy L before broadcast,
-# *) call the BLAS Fortran 77 interface,
-# *) not display detailed timing information.
-#
-HPL_OPTS = -DHPL_CALL_VSIPL
-#
-# ----------------------------------------------------------------------
-#
-HPL_DEFS = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES)
-#
-# ----------------------------------------------------------------------
-# - Compilers / linkers - Optimization flags ---------------------------
-# ----------------------------------------------------------------------
-#
-CC = cc
-CCNOOPT = $(HPL_DEFS)
-CCFLAGS = $(HPL_DEFS) -g
-#
-LINKER = purify -best-effort cc
-LINKFLAGS =
-#
-ARCHIVER = ar
-ARFLAGS = r
-RANLIB = echo
-#
-# ----------------------------------------------------------------------
diff --git a/hpl/setup/Make.SUN4SOL2_FBLAS b/hpl/setup/Make.SUN4SOL2_FBLAS
deleted file mode 100644
index 55540498fb99ff57984eb3577c65ba141fe9274e..0000000000000000000000000000000000000000
--- a/hpl/setup/Make.SUN4SOL2_FBLAS
+++ /dev/null
@@ -1,180 +0,0 @@
-#
-# -- High Performance Computing Linpack Benchmark (HPL)
-# HPL - 2.1 - October 26, 2012
-# Antoine P. Petitet
-# University of Tennessee, Knoxville
-# Innovative Computing Laboratory
-# (C) Copyright 2000-2008 All Rights Reserved
-#
-# -- Copyright notice and Licensing terms:
-#
-# Redistribution and use in source and binary forms, with or without
-# modification, are permitted provided that the following conditions
-# are met:
-#
-# 1. Redistributions of source code must retain the above copyright
-# notice, this list of conditions and the following disclaimer.
-#
-# 2. Redistributions in binary form must reproduce the above copyright
-# notice, this list of conditions, and the following disclaimer in the
-# documentation and/or other materials provided with the distribution.
-#
-# 3. All advertising materials mentioning features or use of this
-# software must display the following acknowledgement:
-# This product includes software developed at the University of
-# Tennessee, Knoxville, Innovative Computing Laboratory.
-#
-# 4. The name of the University, the name of the Laboratory, or the
-# names of its contributors may not be used to endorse or promote
-# products derived from this software without specific written
-# permission.
-#
-# -- Disclaimer:
-#
-# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
-# ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
-# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
-# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY
-# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
-# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
-# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
-# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
-# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
-# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
-# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-# ######################################################################
-#
-# ----------------------------------------------------------------------
-# - shell --------------------------------------------------------------
-# ----------------------------------------------------------------------
-#
-SHELL = /bin/sh
-#
-CD = cd
-CP = cp
-LN_S = ln -s
-MKDIR = mkdir
-RM = /bin/rm -f
-TOUCH = touch
-#
-# ----------------------------------------------------------------------
-# - Platform identifier ------------------------------------------------
-# ----------------------------------------------------------------------
-#
-ARCH = SUN4SOL2_FBLAS
-#
-# ----------------------------------------------------------------------
-# - HPL Directory Structure / HPL library ------------------------------
-# ----------------------------------------------------------------------
-#
-TOPdir = $(HOME)/hpl
-INCdir = $(TOPdir)/include
-BINdir = $(TOPdir)/bin/$(ARCH)
-LIBdir = $(TOPdir)/lib/$(ARCH)
-#
-HPLlib = $(LIBdir)/libhpl.a
-#
-# ----------------------------------------------------------------------
-# - Message Passing library (MPI) --------------------------------------
-# ----------------------------------------------------------------------
-# MPinc tells the C compiler where to find the Message Passing library
-# header files, MPlib is defined to be the name of the library to be
-# used. The variable MPdir is only used for defining MPinc and MPlib.
-#
-MPdir = $(HOME)/local/mpi
-MPinc = -I$(MPdir)/include -I$(MPdir)/solaris/ch_p4/include
-MPlib = $(MPdir)/solaris/ch_p4/lib/libmpich.a -lsocket -lnsl
-#
-# ----------------------------------------------------------------------
-# - Linear Algebra library (BLAS or VSIPL) -----------------------------
-# ----------------------------------------------------------------------
-# LAinc tells the C compiler where to find the Linear Algebra library
-# header files, LAlib is defined to be the name of the library to be
-# used. The variable LAdir is only used for defining LAinc and LAlib.
-#
-LAdir =
-LAinc =
-LAlib = -xlic_lib=sunperf
-#
-# ----------------------------------------------------------------------
-# - F77 / C interface --------------------------------------------------
-# ----------------------------------------------------------------------
-# You can skip this section if and only if you are not planning to use
-# a BLAS library featuring a Fortran 77 interface. Otherwise, it is
-# necessary to fill out the F2CDEFS variable with the appropriate
-# options. **One and only one** option should be chosen in **each** of
-# the 3 following categories:
-#
-# 1) name space (How C calls a Fortran 77 routine)
-#
-# -DAdd_ : all lower case and a suffixed underscore (Suns,
-# Intel, ...), [default]
-# -DNoChange : all lower case (IBM RS6000),
-# -DUpCase : all upper case (Cray),
-# -DAdd__ : the FORTRAN compiler in use is f2c.
-#
-# 2) C and Fortran 77 integer mapping
-#
-# -DF77_INTEGER=int : Fortran 77 INTEGER is a C int, [default]
-# -DF77_INTEGER=long : Fortran 77 INTEGER is a C long,
-# -DF77_INTEGER=short : Fortran 77 INTEGER is a C short.
-#
-# 3) Fortran 77 string handling
-#
-# -DStringSunStyle : The string address is passed at the string loca-
-# tion on the stack, and the string length is then
-# passed as an F77_INTEGER after all explicit
-# stack arguments, [default]
-# -DStringStructPtr : The address of a structure is passed by a
-# Fortran 77 string, and the structure is of the
-# form: struct {char *cp; F77_INTEGER len;},
-# -DStringStructVal : A structure is passed by value for each Fortran
-# 77 string, and the structure is of the form:
-# struct {char *cp; F77_INTEGER len;},
-# -DStringCrayStyle : Special option for Cray machines, which uses
-# Cray fcd (fortran character descriptor) for
-# interoperation.
-#
-F2CDEFS = -DAdd_ -DF77_INTEGER=int -DStringSunStyle
-#
-# ----------------------------------------------------------------------
-# - HPL includes / libraries / specifics -------------------------------
-# ----------------------------------------------------------------------
-#
-HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc)
-HPL_LIBS = $(HPLlib) $(LAlib) $(MPlib)
-#
-# - Compile time options -----------------------------------------------
-#
-# -DHPL_COPY_L force the copy of the panel L before bcast;
-# -DHPL_CALL_CBLAS call the cblas interface;
-# -DHPL_CALL_VSIPL call the vsip library;
-# -DHPL_DETAILED_TIMING enable detailed timers;
-#
-# By default HPL will:
-# *) not copy L before broadcast,
-# *) call the BLAS Fortran 77 interface,
-# *) not display detailed timing information.
-#
-HPL_OPTS =
-#
-# ----------------------------------------------------------------------
-#
-HPL_DEFS = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES)
-#
-# ----------------------------------------------------------------------
-# - Compilers / linkers - Optimization flags ---------------------------
-# ----------------------------------------------------------------------
-#
-CC = cc
-CCNOOPT = $(HPL_DEFS)
-CCFLAGS = $(HPL_DEFS) -dalign -fsingle -xO5 -native -xarch=v8plusa
-#
-LINKER = f77
-LINKFLAGS = -dalign -native -xarch=v8plusa -xO5
-#
-ARCHIVER = ar
-ARFLAGS = r
-RANLIB = echo
-#
-# ----------------------------------------------------------------------
diff --git a/hpl/setup/Make.T3E_FBLAS b/hpl/setup/Make.T3E_FBLAS
deleted file mode 100644
index f7b651618b5db97811748a8abb553f5408ca4d8a..0000000000000000000000000000000000000000
--- a/hpl/setup/Make.T3E_FBLAS
+++ /dev/null
@@ -1,187 +0,0 @@
-#
-# -- High Performance Computing Linpack Benchmark (HPL)
-# HPL - 2.1 - October 26, 2012
-# Antoine P. Petitet
-# University of Tennessee, Knoxville
-# Innovative Computing Laboratory
-# (C) Copyright 2000-2008 All Rights Reserved
-#
-# -- Copyright notice and Licensing terms:
-#
-# Redistribution and use in source and binary forms, with or without
-# modification, are permitted provided that the following conditions
-# are met:
-#
-# 1. Redistributions of source code must retain the above copyright
-# notice, this list of conditions and the following disclaimer.
-#
-# 2. Redistributions in binary form must reproduce the above copyright
-# notice, this list of conditions, and the following disclaimer in the
-# documentation and/or other materials provided with the distribution.
-#
-# 3. All advertising materials mentioning features or use of this
-# software must display the following acknowledgement:
-# This product includes software developed at the University of
-# Tennessee, Knoxville, Innovative Computing Laboratory.
-#
-# 4. The name of the University, the name of the Laboratory, or the
-# names of its contributors may not be used to endorse or promote
-# products derived from this software without specific written
-# permission.
-#
-# -- Disclaimer:
-#
-# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
-# ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
-# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
-# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY
-# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
-# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
-# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
-# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
-# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
-# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
-# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-# ######################################################################
-#
-# ----------------------------------------------------------------------
-# - shell --------------------------------------------------------------
-# ----------------------------------------------------------------------
-#
-SHELL = /bin/sh
-#
-CD = cd
-CP = cp
-LN_S = ln -s
-MKDIR = mkdir
-RM = /bin/rm -f
-TOUCH = touch
-#
-# ----------------------------------------------------------------------
-# - Platform identifier ------------------------------------------------
-# ----------------------------------------------------------------------
-#
-ARCH = T3E_FBLAS
-#
-# ----------------------------------------------------------------------
-# - HPL Directory Structure / HPL library ------------------------------
-# ----------------------------------------------------------------------
-#
-TOPdir = $(HOME)/hpl
-INCdir = $(TOPdir)/include
-BINdir = $(TOPdir)/bin/$(ARCH)
-LIBdir = $(TOPdir)/lib/$(ARCH)
-#
-HPLlib = $(LIBdir)/libhpl.a
-#
-# ----------------------------------------------------------------------
-# - Message Passing library (MPI) --------------------------------------
-# ----------------------------------------------------------------------
-# MPinc tells the C compiler where to find the Message Passing library
-# header files, MPlib is defined to be the name of the library to be
-# used. The variable MPdir is only used for defining MPinc and MPlib.
-#
-MPdir =
-MPinc =
-MPlib =
-#
-# ----------------------------------------------------------------------
-# - Linear Algebra library (BLAS or VSIPL) -----------------------------
-# ----------------------------------------------------------------------
-# LAinc tells the C compiler where to find the Linear Algebra library
-# header files, LAlib is defined to be the name of the library to be
-# used. The variable LAdir is only used for defining LAinc and LAlib.
-#
-LAdir =
-LAinc =
-LAlib =
-#
-# ----------------------------------------------------------------------
-# - F77 / C interface --------------------------------------------------
-# ----------------------------------------------------------------------
-# You can skip this section if and only if you are not planning to use
-# a BLAS library featuring a Fortran 77 interface. Otherwise, it is
-# necessary to fill out the F2CDEFS variable with the appropriate
-# options. **One and only one** option should be chosen in **each** of
-# the 3 following categories:
-#
-# 1) name space (How C calls a Fortran 77 routine)
-#
-# -DAdd_ : all lower case and a suffixed underscore (Suns,
-# Intel, ...), [default]
-# -DNoChange : all lower case (IBM RS6000),
-# -DUpCase : all upper case (Cray),
-# -DAdd__ : the FORTRAN compiler in use is f2c.
-#
-# 2) C and Fortran 77 integer mapping
-#
-# -DF77_INTEGER=int : Fortran 77 INTEGER is a C int, [default]
-# -DF77_INTEGER=long : Fortran 77 INTEGER is a C long,
-# -DF77_INTEGER=short : Fortran 77 INTEGER is a C short.
-#
-# 3) Fortran 77 string handling
-#
-# -DStringSunStyle : The string address is passed at the string loca-
-# tion on the stack, and the string length is then
-# passed as an F77_INTEGER after all explicit
-# stack arguments, [default]
-# -DStringStructPtr : The address of a structure is passed by a
-# Fortran 77 string, and the structure is of the
-# form: struct {char *cp; F77_INTEGER len;},
-# -DStringStructVal : A structure is passed by value for each Fortran
-# 77 string, and the structure is of the form:
-# struct {char *cp; F77_INTEGER len;},
-# -DStringCrayStyle : Special option for Cray machines, which uses
-# Cray fcd (fortran character descriptor) for
-# interoperation.
-#
-F2CDEFS = -DUpCase -DF77_INTEGER=long -DStringCrayStyle \
- -DCRAY_BLAS -DHPL_USE_TIMES
-#
-# When UpCase is defined, CRAY_BLAS redefines the BLAS routines used in
-# HPL to be prefixed with an S. In the Cray programming environment, the
-# default INTEGER and REAL size is 64 bits. This is reflected in the
-# Cray Scientific Library as well, so SGEMM is the 64-bit matrix multi-
-# ply.
-#
-# ----------------------------------------------------------------------
-# - HPL includes / libraries / specifics -------------------------------
-# ----------------------------------------------------------------------
-#
-HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc)
-HPL_LIBS = $(HPLlib) $(LAlib) $(MPlib)
-#
-# - Compile time options -----------------------------------------------
-#
-# -DHPL_COPY_L force the copy of the panel L before bcast;
-# -DHPL_CALL_CBLAS call the cblas interface;
-# -DHPL_CALL_VSIPL call the vsip library;
-# -DHPL_DETAILED_TIMING enable detailed timers;
-#
-# By default HPL will:
-# *) not copy L before broadcast,
-# *) call the BLAS Fortran 77 interface,
-# *) not display detailed timing information.
-#
-HPL_OPTS =
-#
-# ----------------------------------------------------------------------
-#
-HPL_DEFS = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES)
-#
-# ----------------------------------------------------------------------
-# - Compilers / linkers - Optimization flags ---------------------------
-# ----------------------------------------------------------------------
-#
-CC = cc
-CCNOOPT = $(HPL_DEFS)
-CCFLAGS = $(HPL_DEFS) -O3
-#
-LINKER = f77
-LINKFLAGS = -O3,unroll2,pipeline2
-#
-ARCHIVER = ar
-ARFLAGS = r
-RANLIB = echo
-#
-# ----------------------------------------------------------------------
diff --git a/hpl/setup/Make.Tru64_FBLAS b/hpl/setup/Make.Tru64_FBLAS
deleted file mode 100644
index fe8dd51d3835fc78d4d7dd5c540eee7ec0e3d668..0000000000000000000000000000000000000000
--- a/hpl/setup/Make.Tru64_FBLAS
+++ /dev/null
@@ -1,180 +0,0 @@
-#
-# -- High Performance Computing Linpack Benchmark (HPL)
-# HPL - 2.1 - October 26, 2012
-# Antoine P. Petitet
-# University of Tennessee, Knoxville
-# Innovative Computing Laboratory
-# (C) Copyright 2000-2008 All Rights Reserved
-#
-# -- Copyright notice and Licensing terms:
-#
-# Redistribution and use in source and binary forms, with or without
-# modification, are permitted provided that the following conditions
-# are met:
-#
-# 1. Redistributions of source code must retain the above copyright
-# notice, this list of conditions and the following disclaimer.
-#
-# 2. Redistributions in binary form must reproduce the above copyright
-# notice, this list of conditions, and the following disclaimer in the
-# documentation and/or other materials provided with the distribution.
-#
-# 3. All advertising materials mentioning features or use of this
-# software must display the following acknowledgement:
-# This product includes software developed at the University of
-# Tennessee, Knoxville, Innovative Computing Laboratory.
-#
-# 4. The name of the University, the name of the Laboratory, or the
-# names of its contributors may not be used to endorse or promote
-# products derived from this software without specific written
-# permission.
-#
-# -- Disclaimer:
-#
-# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
-# ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
-# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
-# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY
-# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
-# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
-# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
-# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
-# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
-# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
-# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-# ######################################################################
-#
-# ----------------------------------------------------------------------
-# - shell --------------------------------------------------------------
-# ----------------------------------------------------------------------
-#
-SHELL = /bin/sh
-#
-CD = cd
-CP = cp
-LN_S = ln -s
-MKDIR = mkdir
-RM = /bin/rm -f
-TOUCH = touch
-#
-# ----------------------------------------------------------------------
-# - Platform identifier ------------------------------------------------
-# ----------------------------------------------------------------------
-#
-ARCH = Tru64_FBLAS
-#
-# ----------------------------------------------------------------------
-# - HPL Directory Structure / HPL library ------------------------------
-# ----------------------------------------------------------------------
-#
-TOPdir = $(HOME)/hpl
-INCdir = $(TOPdir)/include
-BINdir = $(TOPdir)/bin/$(ARCH)
-LIBdir = $(TOPdir)/lib/$(ARCH)
-#
-HPLlib = $(LIBdir)/libhpl.a
-#
-# ----------------------------------------------------------------------
-# - Message Passing library (MPI) --------------------------------------
-# ----------------------------------------------------------------------
-# MPinc tells the C compiler where to find the Message Passing library
-# header files, MPlib is defined to be the name of the library to be
-# used. The variable MPdir is only used for defining MPinc and MPlib.
-#
-MPdir = /usr/local/mpi
-MPinc = -I$(MPdir)/include -I$(MPdir)/alpha/ch_p4/include
-MPlib = $(MPdir)/alpha/ch_p4/lib/libmpich.a
-#
-# ----------------------------------------------------------------------
-# - Linear Algebra library (BLAS or VSIPL) -----------------------------
-# ----------------------------------------------------------------------
-# LAinc tells the C compiler where to find the Linear Algebra library
-# header files, LAlib is defined to be the name of the library to be
-# used. The variable LAdir is only used for defining LAinc and LAlib.
-#
-LAdir =
-LAinc =
-LAlib = -lcxml
-#
-# ----------------------------------------------------------------------
-# - F77 / C interface --------------------------------------------------
-# ----------------------------------------------------------------------
-# You can skip this section if and only if you are not planning to use
-# a BLAS library featuring a Fortran 77 interface. Otherwise, it is
-# necessary to fill out the F2CDEFS variable with the appropriate
-# options. **One and only one** option should be chosen in **each** of
-# the 3 following categories:
-#
-# 1) name space (How C calls a Fortran 77 routine)
-#
-# -DAdd_ : all lower case and a suffixed underscore (Suns,
-# Intel, ...), [default]
-# -DNoChange : all lower case (IBM RS6000),
-# -DUpCase : all upper case (Cray),
-# -DAdd__ : the FORTRAN compiler in use is f2c.
-#
-# 2) C and Fortran 77 integer mapping
-#
-# -DF77_INTEGER=int : Fortran 77 INTEGER is a C int, [default]
-# -DF77_INTEGER=long : Fortran 77 INTEGER is a C long,
-# -DF77_INTEGER=short : Fortran 77 INTEGER is a C short.
-#
-# 3) Fortran 77 string handling
-#
-# -DStringSunStyle : The string address is passed at the string loca-
-# tion on the stack, and the string length is then
-# passed as an F77_INTEGER after all explicit
-# stack arguments, [default]
-# -DStringStructPtr : The address of a structure is passed by a
-# Fortran 77 string, and the structure is of the
-# form: struct {char *cp; F77_INTEGER len;},
-# -DStringStructVal : A structure is passed by value for each Fortran
-# 77 string, and the structure is of the form:
-# struct {char *cp; F77_INTEGER len;},
-# -DStringCrayStyle : Special option for Cray machines, which uses
-# Cray fcd (fortran character descriptor) for
-# interoperation.
-#
-F2CDEFS = -DAdd_ -DF77_INTEGER=int -DStringSunStyle
-#
-# ----------------------------------------------------------------------
-# - HPL includes / libraries / specifics -------------------------------
-# ----------------------------------------------------------------------
-#
-HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc)
-HPL_LIBS = $(HPLlib) $(LAlib) $(MPlib)
-#
-# - Compile time options -----------------------------------------------
-#
-# -DHPL_COPY_L force the copy of the panel L before bcast;
-# -DHPL_CALL_CBLAS call the cblas interface;
-# -DHPL_CALL_VSIPL call the vsip library;
-# -DHPL_DETAILED_TIMING enable detailed timers;
-#
-# By default HPL will:
-# *) not copy L before broadcast,
-# *) call the BLAS Fortran 77 interface,
-# *) not display detailed timing information.
-#
-HPL_OPTS =
-#
-# ----------------------------------------------------------------------
-#
-HPL_DEFS = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES)
-#
-# ----------------------------------------------------------------------
-# - Compilers / linkers - Optimization flags ---------------------------
-# ----------------------------------------------------------------------
-#
-CC = cc
-CCNOOPT = $(HPL_DEFS)
-CCFLAGS = $(HPL_DEFS) -arch host -tune host -std -O5
-#
-LINKER = f77
-LINKFLAGS = -nofor_main -O5 -arch host -tune host
-#
-ARCHIVER = ar
-ARFLAGS = r
-RANLIB = ranlib
-#
-# ----------------------------------------------------------------------
diff --git a/hpl/setup/Make.Tru64_FBLAS_elan b/hpl/setup/Make.Tru64_FBLAS_elan
deleted file mode 100644
index fd21d243e7e3c6f00538373ab7f0c4f1af44cc87..0000000000000000000000000000000000000000
--- a/hpl/setup/Make.Tru64_FBLAS_elan
+++ /dev/null
@@ -1,180 +0,0 @@
-#
-# -- High Performance Computing Linpack Benchmark (HPL)
-# HPL - 2.1 - October 26, 2012
-# Antoine P. Petitet
-# University of Tennessee, Knoxville
-# Innovative Computing Laboratory
-# (C) Copyright 2000-2008 All Rights Reserved
-#
-# -- Copyright notice and Licensing terms:
-#
-# Redistribution and use in source and binary forms, with or without
-# modification, are permitted provided that the following conditions
-# are met:
-#
-# 1. Redistributions of source code must retain the above copyright
-# notice, this list of conditions and the following disclaimer.
-#
-# 2. Redistributions in binary form must reproduce the above copyright
-# notice, this list of conditions, and the following disclaimer in the
-# documentation and/or other materials provided with the distribution.
-#
-# 3. All advertising materials mentioning features or use of this
-# software must display the following acknowledgement:
-# This product includes software developed at the University of
-# Tennessee, Knoxville, Innovative Computing Laboratory.
-#
-# 4. The name of the University, the name of the Laboratory, or the
-# names of its contributors may not be used to endorse or promote
-# products derived from this software without specific written
-# permission.
-#
-# -- Disclaimer:
-#
-# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
-# ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
-# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
-# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY
-# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
-# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
-# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
-# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
-# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
-# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
-# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-# ###################################################################### -# -# ---------------------------------------------------------------------- -# - shell -------------------------------------------------------------- -# ---------------------------------------------------------------------- -# -SHELL = /bin/sh -# -CD = cd -CP = cp -LN_S = ln -s -MKDIR = mkdir -RM = /bin/rm -f -TOUCH = touch -# -# ---------------------------------------------------------------------- -# - Platform identifier ------------------------------------------------ -# ---------------------------------------------------------------------- -# -ARCH = Tru64_FBLAS_elan -# -# ---------------------------------------------------------------------- -# - HPL Directory Structure / HPL library ------------------------------ -# ---------------------------------------------------------------------- -# -TOPdir = $(HOME)/hpl -INCdir = $(TOPdir)/include -BINdir = $(TOPdir)/bin/$(ARCH) -LIBdir = $(TOPdir)/lib/$(ARCH) -# -HPLlib = $(LIBdir)/libhpl.a -# -# ---------------------------------------------------------------------- -# - Message Passing library (MPI) -------------------------------------- -# ---------------------------------------------------------------------- -# MPinc tells the C compiler where to find the Message Passing library -# header files, MPlib is defined to be the name of the library to be -# used. The variable MPdir is only used for defining MPinc and MPlib. -# -MPdir = -MPinc = -MPlib = -lmpi -lelan -# -# ---------------------------------------------------------------------- -# - Linear Algebra library (BLAS or VSIPL) ----------------------------- -# ---------------------------------------------------------------------- -# LAinc tells the C compiler where to find the Linear Algebra library -# header files, LAlib is defined to be the name of the library to be -# used. The variable LAdir is only used for defining LAinc and LAlib. 
-# -LAdir = -LAinc = -LAlib = -lcxml -# -# ---------------------------------------------------------------------- -# - F77 / C interface -------------------------------------------------- -# ---------------------------------------------------------------------- -# You can skip this section if and only if you are not planning to use -# a BLAS library featuring a Fortran 77 interface. Otherwise, it is -# necessary to fill out the F2CDEFS variable with the appropriate -# options. **One and only one** option should be chosen in **each** of -# the 3 following categories: -# -# 1) name space (How C calls a Fortran 77 routine) -# -# -DAdd_ : all lower case and a suffixed underscore (Suns, -# Intel, ...), [default] -# -DNoChange : all lower case (IBM RS6000), -# -DUpCase : all upper case (Cray), -# -DAdd__ : the FORTRAN compiler in use is f2c. -# -# 2) C and Fortran 77 integer mapping -# -# -DF77_INTEGER=int : Fortran 77 INTEGER is a C int, [default] -# -DF77_INTEGER=long : Fortran 77 INTEGER is a C long, -# -DF77_INTEGER=short : Fortran 77 INTEGER is a C short. -# -# 3) Fortran 77 string handling -# -# -DStringSunStyle : The string address is passed at the string loca- -# tion on the stack, and the string length is then -# passed as an F77_INTEGER after all explicit -# stack arguments, [default] -# -DStringStructPtr : The address of a structure is passed by a -# Fortran 77 string, and the structure is of the -# form: struct {char *cp; F77_INTEGER len;}, -# -DStringStructVal : A structure is passed by value for each Fortran -# 77 string, and the structure is of the form: -# struct {char *cp; F77_INTEGER len;}, -# -DStringCrayStyle : Special option for Cray machines, which uses -# Cray fcd (fortran character descriptor) for -# interoperation. 
-# -F2CDEFS = -DAdd_ -DF77_INTEGER=int -DStringSunStyle -# -# ---------------------------------------------------------------------- -# - HPL includes / libraries / specifics ------------------------------- -# ---------------------------------------------------------------------- -# -HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc) -HPL_LIBS = $(HPLlib) $(LAlib) $(MPlib) -# -# - Compile time options ----------------------------------------------- -# -# -DHPL_COPY_L force the copy of the panel L before bcast; -# -DHPL_CALL_CBLAS call the cblas interface; -# -DHPL_CALL_VSIPL call the vsip library; -# -DHPL_DETAILED_TIMING enable detailed timers; -# -# By default HPL will: -# *) not copy L before broadcast, -# *) call the BLAS Fortran 77 interface, -# *) not display detailed timing information. -# -HPL_OPTS = -# -# ---------------------------------------------------------------------- -# -HPL_DEFS = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES) -# -# ---------------------------------------------------------------------- -# - Compilers / linkers - Optimization flags --------------------------- -# ---------------------------------------------------------------------- -# -CC = cc -CCNOOPT = $(HPL_DEFS) -CCFLAGS = $(HPL_DEFS) -arch host -tune host -std -O5 -# -LINKER = f77 -LINKFLAGS = -nofor_main -O5 -arch host -tune host -# -ARCHIVER = ar -ARFLAGS = r -RANLIB = ranlib -# -# ---------------------------------------------------------------------- diff --git a/hpl/setup/Make.UNKNOWN.in b/hpl/setup/Make.UNKNOWN.in deleted file mode 100644 index 54ad4be26383f6acebee02995e3d91ee9bfcc81e..0000000000000000000000000000000000000000 --- a/hpl/setup/Make.UNKNOWN.in +++ /dev/null @@ -1,180 +0,0 @@ -# -# -- High Performance Computing Linpack Benchmark (HPL) -# HPL - 2.1 - October 26, 2012 -# Antoine P. 
Petitet -# University of Tennessee, Knoxville -# Innovative Computing Laboratory -# (C) Copyright 2000-2008 All Rights Reserved -# -# -- Copyright notice and Licensing terms: -# -# Redistribution and use in source and binary forms, with or without -# modification, are permitted provided that the following conditions -# are met: -# -# 1. Redistributions of source code must retain the above copyright -# notice, this list of conditions and the following disclaimer. -# -# 2. Redistributions in binary form must reproduce the above copyright -# notice, this list of conditions, and the following disclaimer in the -# documentation and/or other materials provided with the distribution. -# -# 3. All advertising materials mentioning features or use of this -# software must display the following acknowledgement: -# This product includes software developed at the University of -# Tennessee, Knoxville, Innovative Computing Laboratory. -# -# 4. The name of the University, the name of the Laboratory, or the -# names of its contributors may not be used to endorse or promote -# products derived from this software without specific written -# permission. -# -# -- Disclaimer: -# -# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS -# ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR -# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY -# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, -# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT -# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, -# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY -# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT -# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
-# ###################################################################### -# -# ---------------------------------------------------------------------- -# - shell -------------------------------------------------------------- -# ---------------------------------------------------------------------- -# -SHELL = @SHELL@ -# -CD = @CD@ -CP = @CP@ -LN_S = @LN_S@ -MKDIR = @MKDIR@ -RM = @RM@ -TOUCH = @TOUCH@ -# -# ---------------------------------------------------------------------- -# - Platform identifier ------------------------------------------------ -# ---------------------------------------------------------------------- -# -ARCH = @ARCH@ -# -# ---------------------------------------------------------------------- -# - HPL Directory Structure / HPL library ------------------------------ -# ---------------------------------------------------------------------- -# -TOPdir = $(HOME)/hpl -INCdir = $(TOPdir)/include -BINdir = $(TOPdir)/bin/$(ARCH) -LIBdir = $(TOPdir)/lib/$(ARCH) -# -HPLlib = $(LIBdir)/libhpl.a -# -# ---------------------------------------------------------------------- -# - Message Passing library (MPI) -------------------------------------- -# ---------------------------------------------------------------------- -# MPinc tells the C compiler where to find the Message Passing library -# header files, MPlib is defined to be the name of the library to be -# used. The variable MPdir is only used for defining MPinc and MPlib. -# -MPdir = @MPDIR@ -MPinc = @MPINC@ -MPlib = @MPLIB@ -# -# ---------------------------------------------------------------------- -# - Linear Algebra library (BLAS or VSIPL) ----------------------------- -# ---------------------------------------------------------------------- -# LAinc tells the C compiler where to find the Linear Algebra library -# header files, LAlib is defined to be the name of the library to be -# used. The variable LAdir is only used for defining LAinc and LAlib. 
-# -LAdir = @LADIR@ -LAinc = @LAINC@ -LAlib = @LALIB@ -# -# ---------------------------------------------------------------------- -# - F77 / C interface -------------------------------------------------- -# ---------------------------------------------------------------------- -# You can skip this section if and only if you are not planning to use -# a BLAS library featuring a Fortran 77 interface. Otherwise, it is -# necessary to fill out the F2CDEFS variable with the appropriate -# options. **One and only one** option should be chosen in **each** of -# the 3 following categories: -# -# 1) name space (How C calls a Fortran 77 routine) -# -# -DAdd_ : all lower case and a suffixed underscore (Suns, -# Intel, ...), [default] -# -DNoChange : all lower case (IBM RS6000), -# -DUpCase : all upper case (Cray), -# -DAdd__ : the FORTRAN compiler in use is f2c. -# -# 2) C and Fortran 77 integer mapping -# -# -DF77_INTEGER=int : Fortran 77 INTEGER is a C int, [default] -# -DF77_INTEGER=long : Fortran 77 INTEGER is a C long, -# -DF77_INTEGER=short : Fortran 77 INTEGER is a C short. -# -# 3) Fortran 77 string handling -# -# -DStringSunStyle : The string address is passed at the string loca- -# tion on the stack, and the string length is then -# passed as an F77_INTEGER after all explicit -# stack arguments, [default] -# -DStringStructPtr : The address of a structure is passed by a -# Fortran 77 string, and the structure is of the -# form: struct {char *cp; F77_INTEGER len;}, -# -DStringStructVal : A structure is passed by value for each Fortran -# 77 string, and the structure is of the form: -# struct {char *cp; F77_INTEGER len;}, -# -DStringCrayStyle : Special option for Cray machines, which uses -# Cray fcd (fortran character descriptor) for -# interoperation. 
-# -F2CDEFS = @F2CDEFS@ -# -# ---------------------------------------------------------------------- -# - HPL includes / libraries / specifics ------------------------------- -# ---------------------------------------------------------------------- -# -HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc) -HPL_LIBS = $(HPLlib) $(LAlib) $(MPlib) -# -# - Compile time options ----------------------------------------------- -# -# -DHPL_COPY_L force the copy of the panel L before bcast; -# -DHPL_CALL_CBLAS call the cblas interface; -# -DHPL_CALL_VSIPL call the vsip library; -# -DHPL_DETAILED_TIMING enable detailed timers; -# -# By default HPL will: -# *) not copy L before broadcast, -# *) call the BLAS Fortran 77 interface, -# *) not display detailed timing information. -# -HPL_OPTS = -# -# ---------------------------------------------------------------------- -# -HPL_DEFS = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES) -# -# ---------------------------------------------------------------------- -# - Compilers / linkers - Optimization flags --------------------------- -# ---------------------------------------------------------------------- -# -CC = @CC@ -CCNOOPT = $(HPL_DEFS) @CCNOOPT@ -CCFLAGS = $(HPL_DEFS) @CCFLAGS@ -# -LINKER = @LINKER@ -LINKFLAGS = @LINKFLAGS@ -# -ARCHIVER = @ARCHIVER@ -ARFLAGS = @ARFLAGS@ -RANLIB = @RANLIB@ -# -# ---------------------------------------------------------------------- diff --git a/hpl/setup/make_generic b/hpl/setup/make_generic deleted file mode 100644 index b093e695c84a3159cc6e06e6eafca760d89518fb..0000000000000000000000000000000000000000 --- a/hpl/setup/make_generic +++ /dev/null @@ -1,83 +0,0 @@ -#!/bin/sh -# -# -# -- High Performance Computing Linpack Benchmark (HPL) -# HPL - 2.1 - October 26, 2012 -# Antoine P. 
Petitet -# University of Tennessee, Knoxville -# Innovative Computing Laboratory -# (C) Copyright 2000-2008 All Rights Reserved -# -# -- Copyright notice and Licensing terms: -# -# Redistribution and use in source and binary forms, with or without -# modification, are permitted provided that the following conditions -# are met: -# -# 1. Redistributions of source code must retain the above copyright -# notice, this list of conditions and the following disclaimer. -# -# 2. Redistributions in binary form must reproduce the above copyright -# notice, this list of conditions, and the following disclaimer in the -# documentation and/or other materials provided with the distribution. -# -# 3. All advertising materials mentioning features or use of this -# software must display the following acknowledgement: -# This product includes software developed at the University of -# Tennessee, Knoxville, Innovative Computing Laboratory. -# -# 4. The name of the University, the name of the Laboratory, or the -# names of its contributors may not be used to endorse or promote -# products derived from this software without specific written -# permission. -# -# -- Disclaimer: -# -# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS -# ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR -# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY -# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, -# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT -# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, -# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY -# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT -# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
-# ###################################################################### -# -# -# Configure script to create Make.UNKNOWN from Make.UNKNOWN.in for the -# HPL distribution, so users without a real Unix system can have a gene- -# ric Make.UNKNOWN to edit for their needs. This script substitutes -# pathless version of all the system programs, and commonly used options -# values into Make.UNKNOWN.in. -# -######################################################################## -# -sed -e 's%@SHELL@%/bin/sh%' \ - -e 's%@CD@%cd%' \ - -e 's%@CP@%cp%' \ - -e 's%@LN_S@%ln -s%' \ - -e 's%@MKDIR@%mkdir%' \ - -e 's%@RM@%/bin/rm -f%' \ - -e 's%@TOUCH@%touch%' \ - -e 's%@ARCH@%UNKNOWN%' \ - -e 's%@CC@%mpicc%' \ - -e 's%@CCNOOPT@%%' \ - -e 's%@CCFLAGS@%%' \ - -e 's%@LINKER@%mpif77%' \ - -e 's%@LINKFLAGS@%%' \ - -e 's%@ARCHIVER@%ar%' \ - -e 's%@ARFLAGS@%r%' \ - -e 's%@RANLIB@%echo%' \ - -e 's%@MPDIR@%%' \ - -e 's%@MPINC@%%' \ - -e 's%@MPLIB@%%' \ - -e 's%@F2CDEFS@%-DAdd_ -DF77_INTEGER=int -DStringSunStyle%' \ - -e 's%@LADIR@%%' \ - -e 's%@LAINC@%%' \ - -e 's%@LALIB@%-lblas%' \ - Make.UNKNOWN.in > Make.UNKNOWN -# -######################################################################## diff --git a/hpl/src/auxil/HPL_abort.c b/hpl/src/auxil/HPL_abort.c deleted file mode 100644 index 04b7ec84eea730ecb65b7243604fd3cb1207e093..0000000000000000000000000000000000000000 --- a/hpl/src/auxil/HPL_abort.c +++ /dev/null @@ -1,129 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. 
Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -void HPL_abort -( - int LINE, - const char * SRNAME, - const char * FORM, - ... 
-) -#else -void HPL_abort( va_alist ) -va_dcl -#endif -{ -/* - * Purpose - * ======= - * - * HPL_abort displays an error message on stderr and halts execution. - * - * - * Arguments - * ========= - * - * LINE (local input) int - * On entry, LINE specifies the line number in the file where - * the error has occurred. When LINE is not a positive line - * number, it is ignored. - * - * SRNAME (local input) const char * - * On entry, SRNAME should be the name of the routine calling - * this error handler. - * - * FORM (local input) const char * - * On entry, FORM specifies the format, i.e., how the subsequent - * arguments are converted for output. - * - * (local input) ... - * On entry, ... is the list of arguments to be printed within - * the format string. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - va_list argptr; - char cline[128]; -#ifndef STDC_HEADERS - int LINE; - char * FORM, * SRNAME; -#endif -/* .. - * .. Executable Statements .. - */ -#ifdef STDC_HEADERS - va_start( argptr, FORM ); -#else - va_start( argptr ); - LINE = va_arg( argptr, int ); - SRNAME = va_arg( argptr, char * ); - FORM = va_arg( argptr, char * ); -#endif - (void) vsprintf( cline, FORM, argptr ); - va_end( argptr ); -/* - * Display an error message - */ - if( LINE <= 0 ) - HPL_fprintf( stderr, "%s %s:\n>>> %s <<< Abort ...\n\n", - "HPL ERROR in function", SRNAME, cline ); - else - HPL_fprintf( stderr, "%s %d %s %s:\n>>> %s <<< Abort ...\n\n", - "HPL ERROR on line", LINE, "of function", SRNAME, cline ); - exit( 0 ); -/* - * End of HPL_abort - */ -} diff --git a/hpl/src/auxil/HPL_dlacpy.c b/hpl/src/auxil/HPL_dlacpy.c deleted file mode 100644 index 66b0c9fcc9ac4474c6a3f69310e38ebc79425ee7..0000000000000000000000000000000000000000 --- a/hpl/src/auxil/HPL_dlacpy.c +++ /dev/null @@ -1,343 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P.
Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" -/* - * Define default value for unrolling factors - * #ifndef HPL_LACPY_M_DEPTH - * #define HPL_LACPY_M_DEPTH 32 - * #define HPL_LACPY_LOG2_M_DEPTH 5 - * #endif - * #ifndef HPL_LACPY_N_DEPTH - * #define HPL_LACPY_N_DEPTH 4 - * #define HPL_LACPY_LOG2_N_DEPTH 2 - * #endif - */ -#ifndef HPL_LACPY_M_DEPTH -#define HPL_LACPY_M_DEPTH 4 -#define HPL_LACPY_LOG2_M_DEPTH 2 -#endif -#ifndef HPL_LACPY_N_DEPTH -#define HPL_LACPY_N_DEPTH 2 -#define HPL_LACPY_LOG2_N_DEPTH 1 -#endif - -#ifdef STDC_HEADERS -void HPL_dlacpy -( - const int M, - const int N, - const double * A, - const int LDA, - double * B, - const int LDB -) -#else -void HPL_dlacpy -( M, N, A, LDA, B, LDB ) - const int M; - const int N; - const double * A; - const int LDA; - double * B; - const int LDB; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_dlacpy copies an array A into an array B. - * - * - * Arguments - * ========= - * - * M (local input) const int - * On entry, M specifies the number of rows of the arrays A and - * B. M must be at least zero. - * - * N (local input) const int - * On entry, N specifies the number of columns of the arrays A - * and B. N must be at least zero. - * - * A (local input) const double * - * On entry, A points to an array of dimension (LDA,N). 
- * - * LDA (local input) const int - * On entry, LDA specifies the leading dimension of the array A. - * LDA must be at least MAX(1,M). - * - * B (local output) double * - * On entry, B points to an array of dimension (LDB,N). On exit, - * B is overwritten with A. - * - * LDB (local input) const int - * On entry, LDB specifies the leading dimension of the array B. - * LDB must be at least MAX(1,M). - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ -#ifdef HPL_LACPY_USE_COPY - register int j; -#else -#if ( HPL_LACPY_N_DEPTH == 1 ) - const double * A0 = A; - double * B0 = B; -#elif ( HPL_LACPY_N_DEPTH == 2 ) - const double * A0 = A, * A1 = A + LDA; - double * B0 = B, * B1 = B + LDB; -#elif ( HPL_LACPY_N_DEPTH == 4 ) - const double * A0 = A, * A1 = A + LDA, - * A2 = A + (LDA << 1), * A3 = A + 3 * LDA; - double * B0 = B, * B1 = B + LDB, - * B2 = B + (LDB << 1), * B3 = B + 3 * LDB; -#endif - const int incA = ( (unsigned int)(LDA) << - HPL_LACPY_LOG2_N_DEPTH ) - M, - incB = ( (unsigned int)(LDB) << - HPL_LACPY_LOG2_N_DEPTH ) - M, - incA0 = (unsigned int)(LDA) - M, - incB0 = (unsigned int)(LDB) - M; - int mu, nu; - register int i, j; -#endif -/* .. - * .. Executable Statements .. 
- */ - if( ( M <= 0 ) || ( N <= 0 ) ) return; - -#ifdef HPL_LACPY_USE_COPY - for( j = 0; j < N; j++, A0 += LDA, B0 += LDB ) HPL_dcopy( M, A0, 1, B0, 1 ); -#else - mu = (int)( ( (unsigned int)(M) >> HPL_LACPY_LOG2_M_DEPTH ) << - HPL_LACPY_LOG2_M_DEPTH ); - nu = (int)( ( (unsigned int)(N) >> HPL_LACPY_LOG2_N_DEPTH ) << - HPL_LACPY_LOG2_N_DEPTH ); - - for( j = 0; j < nu; j += HPL_LACPY_N_DEPTH ) - { - for( i = 0; i < mu; i += HPL_LACPY_M_DEPTH ) - { -#if ( HPL_LACPY_N_DEPTH == 1 ) - B0[ 0] = A0[ 0]; -#elif ( HPL_LACPY_N_DEPTH == 2 ) - B0[ 0] = A0[ 0]; B1[ 0] = A1[ 0]; -#elif ( HPL_LACPY_N_DEPTH == 4 ) - B0[ 0] = A0[ 0]; B1[ 0] = A1[ 0]; B2[ 0] = A2[ 0]; B3[ 0] = A3[ 0]; -#endif - -#if ( HPL_LACPY_M_DEPTH > 1 ) - -#if ( HPL_LACPY_N_DEPTH == 1 ) - B0[ 1] = A0[ 1]; -#elif ( HPL_LACPY_N_DEPTH == 2 ) - B0[ 1] = A0[ 1]; B1[ 1] = A1[ 1]; -#elif ( HPL_LACPY_N_DEPTH == 4 ) - B0[ 1] = A0[ 1]; B1[ 1] = A1[ 1]; B2[ 1] = A2[ 1]; B3[ 1] = A3[ 1]; -#endif - -#endif -#if ( HPL_LACPY_M_DEPTH > 2 ) - -#if ( HPL_LACPY_N_DEPTH == 1 ) - B0[ 2] = A0[ 2]; B0[ 3] = A0[ 3]; -#elif ( HPL_LACPY_N_DEPTH == 2 ) - B0[ 2] = A0[ 2]; B1[ 2] = A1[ 2]; B0[ 3] = A0[ 3]; B1[ 3] = A1[ 3]; -#elif ( HPL_LACPY_N_DEPTH == 4 ) - B0[ 2] = A0[ 2]; B1[ 2] = A1[ 2]; B2[ 2] = A2[ 2]; B3[ 2] = A3[ 2]; - B0[ 3] = A0[ 3]; B1[ 3] = A1[ 3]; B2[ 3] = A2[ 3]; B3[ 3] = A3[ 3]; -#endif - -#endif -#if ( HPL_LACPY_M_DEPTH > 4 ) - -#if ( HPL_LACPY_N_DEPTH == 1 ) - B0[ 4] = A0[ 4]; B0[ 5] = A0[ 5]; B0[ 6] = A0[ 6]; B0[ 7] = A0[ 7]; -#elif ( HPL_LACPY_N_DEPTH == 2 ) - B0[ 4] = A0[ 4]; B1[ 4] = A1[ 4]; B0[ 5] = A0[ 5]; B1[ 5] = A1[ 5]; - B0[ 6] = A0[ 6]; B1[ 6] = A1[ 6]; B0[ 7] = A0[ 7]; B1[ 7] = A1[ 7]; -#elif ( HPL_LACPY_N_DEPTH == 4 ) - B0[ 4] = A0[ 4]; B1[ 4] = A1[ 4]; B2[ 4] = A2[ 4]; B3[ 4] = A3[ 4]; - B0[ 5] = A0[ 5]; B1[ 5] = A1[ 5]; B2[ 5] = A2[ 5]; B3[ 5] = A3[ 5]; - B0[ 6] = A0[ 6]; B1[ 6] = A1[ 6]; B2[ 6] = A2[ 6]; B3[ 6] = A3[ 6]; - B0[ 7] = A0[ 7]; B1[ 7] = A1[ 7]; B2[ 7] = A2[ 7]; B3[ 7] = A3[ 7]; -#endif - -#endif 
-#if ( HPL_LACPY_M_DEPTH > 8 ) - -#if ( HPL_LACPY_N_DEPTH == 1 ) - B0[ 8] = A0[ 8]; B0[ 9] = A0[ 9]; B0[10] = A0[10]; B0[11] = A0[11]; - B0[12] = A0[12]; B0[13] = A0[13]; B0[14] = A0[14]; B0[15] = A0[15]; -#elif ( HPL_LACPY_N_DEPTH == 2 ) - B0[ 8] = A0[ 8]; B1[ 8] = A1[ 8]; B0[ 9] = A0[ 9]; B1[ 9] = A1[ 9]; - B0[10] = A0[10]; B1[10] = A1[10]; B0[11] = A0[11]; B1[11] = A1[11]; - B0[12] = A0[12]; B1[12] = A1[12]; B0[13] = A0[13]; B1[13] = A1[13]; - B0[14] = A0[14]; B1[14] = A1[14]; B0[15] = A0[15]; B1[15] = A1[15]; -#elif ( HPL_LACPY_N_DEPTH == 4 ) - B0[ 8] = A0[ 8]; B1[ 8] = A1[ 8]; B2[ 8] = A2[ 8]; B3[ 8] = A3[ 8]; - B0[ 9] = A0[ 9]; B1[ 9] = A1[ 9]; B2[ 9] = A2[ 9]; B3[ 9] = A3[ 9]; - B0[10] = A0[10]; B1[10] = A1[10]; B2[10] = A2[10]; B3[10] = A3[10]; - B0[11] = A0[11]; B1[11] = A1[11]; B2[11] = A2[11]; B3[11] = A3[11]; - B0[12] = A0[12]; B1[12] = A1[12]; B2[12] = A2[12]; B3[12] = A3[12]; - B0[13] = A0[13]; B1[13] = A1[13]; B2[13] = A2[13]; B3[13] = A3[13]; - B0[14] = A0[14]; B1[14] = A1[14]; B2[14] = A2[14]; B3[14] = A3[14]; - B0[15] = A0[15]; B1[15] = A1[15]; B2[15] = A2[15]; B3[15] = A3[15]; -#endif - -#endif -#if ( HPL_LACPY_M_DEPTH > 16 ) - -#if ( HPL_LACPY_N_DEPTH == 1 ) - B0[16] = A0[16]; B0[17] = A0[17]; B0[18] = A0[18]; B0[19] = A0[19]; - B0[20] = A0[20]; B0[21] = A0[21]; B0[22] = A0[22]; B0[23] = A0[23]; - B0[24] = A0[24]; B0[25] = A0[25]; B0[26] = A0[26]; B0[27] = A0[27]; - B0[28] = A0[28]; B0[29] = A0[29]; B0[30] = A0[30]; B0[31] = A0[31]; -#elif ( HPL_LACPY_N_DEPTH == 2 ) - B0[16] = A0[16]; B1[16] = A1[16]; B0[17] = A0[17]; B1[17] = A1[17]; - B0[18] = A0[18]; B1[18] = A1[18]; B0[19] = A0[19]; B1[19] = A1[19]; - B0[20] = A0[20]; B1[20] = A1[20]; B0[21] = A0[21]; B1[21] = A1[21]; - B0[22] = A0[22]; B1[22] = A1[22]; B0[23] = A0[23]; B1[23] = A1[23]; - B0[24] = A0[24]; B1[24] = A1[24]; B0[25] = A0[25]; B1[25] = A1[25]; - B0[26] = A0[26]; B1[26] = A1[26]; B0[27] = A0[27]; B1[27] = A1[27]; - B0[28] = A0[28]; B1[28] = A1[28]; B0[29] = A0[29]; B1[29] = 
A1[29]; - B0[30] = A0[30]; B1[30] = A1[30]; B0[31] = A0[31]; B1[31] = A1[31]; -#elif ( HPL_LACPY_N_DEPTH == 4 ) - B0[16] = A0[16]; B1[16] = A1[16]; B2[16] = A2[16]; B3[16] = A3[16]; - B0[17] = A0[17]; B1[17] = A1[17]; B2[17] = A2[17]; B3[17] = A3[17]; - B0[18] = A0[18]; B1[18] = A1[18]; B2[18] = A2[18]; B3[18] = A3[18]; - B0[19] = A0[19]; B1[19] = A1[19]; B2[19] = A2[19]; B3[19] = A3[19]; - B0[20] = A0[20]; B1[20] = A1[20]; B2[20] = A2[20]; B3[20] = A3[20]; - B0[21] = A0[21]; B1[21] = A1[21]; B2[21] = A2[21]; B3[21] = A3[21]; - B0[22] = A0[22]; B1[22] = A1[22]; B2[22] = A2[22]; B3[22] = A3[22]; - B0[23] = A0[23]; B1[23] = A1[23]; B2[23] = A2[23]; B3[23] = A3[23]; - B0[24] = A0[24]; B1[24] = A1[24]; B2[24] = A2[24]; B3[24] = A3[24]; - B0[25] = A0[25]; B1[25] = A1[25]; B2[25] = A2[25]; B3[25] = A3[25]; - B0[26] = A0[26]; B1[26] = A1[26]; B2[26] = A2[26]; B3[26] = A3[26]; - B0[27] = A0[27]; B1[27] = A1[27]; B2[27] = A2[27]; B3[27] = A3[27]; - B0[28] = A0[28]; B1[28] = A1[28]; B2[28] = A2[28]; B3[28] = A3[28]; - B0[29] = A0[29]; B1[29] = A1[29]; B2[29] = A2[29]; B3[29] = A3[29]; - B0[30] = A0[30]; B1[30] = A1[30]; B2[30] = A2[30]; B3[30] = A3[30]; - B0[31] = A0[31]; B1[31] = A1[31]; B2[31] = A2[31]; B3[31] = A3[31]; -#endif - -#endif - -#if ( HPL_LACPY_N_DEPTH == 1 ) - A0 += HPL_LACPY_M_DEPTH; B0 += HPL_LACPY_M_DEPTH; -#elif ( HPL_LACPY_N_DEPTH == 2 ) - A0 += HPL_LACPY_M_DEPTH; B0 += HPL_LACPY_M_DEPTH; - A1 += HPL_LACPY_M_DEPTH; B1 += HPL_LACPY_M_DEPTH; -#elif ( HPL_LACPY_N_DEPTH == 4 ) - A0 += HPL_LACPY_M_DEPTH; B0 += HPL_LACPY_M_DEPTH; - A1 += HPL_LACPY_M_DEPTH; B1 += HPL_LACPY_M_DEPTH; - A2 += HPL_LACPY_M_DEPTH; B2 += HPL_LACPY_M_DEPTH; - A3 += HPL_LACPY_M_DEPTH; B3 += HPL_LACPY_M_DEPTH; -#endif - } - - for( i = mu; i < M; i++ ) - { -#if ( HPL_LACPY_N_DEPTH == 1 ) - *B0 = *A0; B0++; A0++; -#elif ( HPL_LACPY_N_DEPTH == 2 ) - *B0 = *A0; B0++; A0++; *B1 = *A1; B1++; A1++; -#elif ( HPL_LACPY_N_DEPTH == 4 ) - *B0 = *A0; B0++; A0++; *B1 = *A1; B1++; A1++; - *B2 = *A2; 
B2++; A2++; *B3 = *A3; B3++; A3++; -#endif - } - -#if ( HPL_LACPY_N_DEPTH == 1 ) - A0 += incA; B0 += incB; -#elif ( HPL_LACPY_N_DEPTH == 2 ) - A0 += incA; B0 += incB; A1 += incA; B1 += incB; -#elif ( HPL_LACPY_N_DEPTH == 4 ) - A0 += incA; B0 += incB; A1 += incA; B1 += incB; - A2 += incA; B2 += incB; A3 += incA; B3 += incB; -#endif - } - - for( j = nu; j < N; j++, B0 += incB0, A0 += incA0 ) - { - for( i = 0; i < mu; i += HPL_LACPY_M_DEPTH, - B0 += HPL_LACPY_M_DEPTH, A0 += HPL_LACPY_M_DEPTH ) - { - B0[ 0] = A0[ 0]; -#if ( HPL_LACPY_M_DEPTH > 1 ) - B0[ 1] = A0[ 1]; -#endif -#if ( HPL_LACPY_M_DEPTH > 2 ) - B0[ 2] = A0[ 2]; B0[ 3] = A0[ 3]; -#endif -#if ( HPL_LACPY_M_DEPTH > 4 ) - B0[ 4] = A0[ 4]; B0[ 5] = A0[ 5]; B0[ 6] = A0[ 6]; B0[ 7] = A0[ 7]; -#endif -#if ( HPL_LACPY_M_DEPTH > 8 ) - B0[ 8] = A0[ 8]; B0[ 9] = A0[ 9]; B0[10] = A0[10]; B0[11] = A0[11]; - B0[12] = A0[12]; B0[13] = A0[13]; B0[14] = A0[14]; B0[15] = A0[15]; -#endif -#if ( HPL_LACPY_M_DEPTH > 16 ) - B0[16] = A0[16]; B0[17] = A0[17]; B0[18] = A0[18]; B0[19] = A0[19]; - B0[20] = A0[20]; B0[21] = A0[21]; B0[22] = A0[22]; B0[23] = A0[23]; - B0[24] = A0[24]; B0[25] = A0[25]; B0[26] = A0[26]; B0[27] = A0[27]; - B0[28] = A0[28]; B0[29] = A0[29]; B0[30] = A0[30]; B0[31] = A0[31]; -#endif - } - for( i = mu; i < M; i++, B0++, A0++ ) { *B0 = *A0; } - } -#endif -/* - * End of HPL_dlacpy - */ -} diff --git a/hpl/src/auxil/HPL_dlamch.c b/hpl/src/auxil/HPL_dlamch.c deleted file mode 100644 index ef9d538d38f17135e6c736109aae8c79f75f2dc6..0000000000000000000000000000000000000000 --- a/hpl/src/auxil/HPL_dlamch.c +++ /dev/null @@ -1,876 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. 
Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" -/* - * --------------------------------------------------------------------- - * Static function prototypes - * --------------------------------------------------------------------- - */ -static void HPL_dlamc1 -STDC_ARGS( -( int *, int *, int *, int * ) ); -static void HPL_dlamc2 -STDC_ARGS( -( int *, int *, int *, double *, - int *, double *, int *, double * ) ); -static double HPL_dlamc3 -STDC_ARGS( -( const double, const double ) ); -static void HPL_dlamc4 -STDC_ARGS( -( int *, const double, const int ) ); -static void HPL_dlamc5 -STDC_ARGS( -( const int, const int, const int, const int, - int *, double * ) ); -static double HPL_dipow -STDC_ARGS( -( const double, const int ) ); - -#ifdef STDC_HEADERS -double HPL_dlamch -( - const HPL_T_MACH CMACH -) -#else -double HPL_dlamch -( CMACH ) - const HPL_T_MACH CMACH; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_dlamch determines machine-specific arithmetic constants such as - * the relative machine precision (eps), the safe minimum (sfmin) such - * that 1 / sfmin does not overflow, the base of the machine (base), the - * precision (prec), the number of (base) digits in the mantissa (t), - * whether rounding occurs in addition (rnd=1.0 and 0.0 otherwise), the - * minimum exponent before (gradual) underflow (emin), the underflow - * threshold (rmin) base**(emin-1), 
the largest exponent before overflow - * (emax), the overflow threshold (rmax) (base**emax)*(1-eps). - * - * Notes - * ===== - * - * This function has been manually translated from the Fortran 77 LAPACK - * auxiliary function dlamch.f (version 2.0 -- 1992), that was itself - * based on the function ENVRON by Malcolm and incorporated suggestions - * by Gentleman and Marovich. See - * - * Malcolm M. A., Algorithms to reveal properties of floating-point - * arithmetic., Comms. of the ACM, 15, 949-951 (1972). - * - * Gentleman W. M. and Marovich S. B., More on algorithms that reveal - * properties of floating point arithmetic units., Comms. of the ACM, - * 17, 276-277 (1974). - * - * Arguments - * ========= - * - * CMACH (local input) const HPL_T_MACH - * Specifies the value to be returned by HPL_dlamch - * = HPL_MACH_EPS, HPL_dlamch := eps (default) - * = HPL_MACH_SFMIN, HPL_dlamch := sfmin - * = HPL_MACH_BASE, HPL_dlamch := base - * = HPL_MACH_PREC, HPL_dlamch := eps*base - * = HPL_MACH_MLEN, HPL_dlamch := t - * = HPL_MACH_RND, HPL_dlamch := rnd - * = HPL_MACH_EMIN, HPL_dlamch := emin - * = HPL_MACH_RMIN, HPL_dlamch := rmin - * = HPL_MACH_EMAX, HPL_dlamch := emax - * = HPL_MACH_RMAX, HPL_dlamch := rmax - * - * where - * - * eps = relative machine precision, - * sfmin = safe minimum, - * base = base of the machine, - * prec = eps*base, - * t = number of digits in the mantissa, - * rnd = 1.0 if rounding occurs in addition, - * emin = minimum exponent before underflow, - * rmin = underflow threshold, - * emax = largest exponent before overflow, - * rmax = overflow threshold. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - static double eps, sfmin, base, t, rnd, emin, rmin, emax, - rmax, prec; - double small; - static int first=1; - int beta=0, imax=0, imin=0, it=0, lrnd=0; -/* .. - * .. Executable Statements .. 
- */ - if( first != 0 ) - { - first = 0; - HPL_dlamc2( &beta, &it, &lrnd, &eps, &imin, &rmin, &imax, &rmax ); - base = (double)(beta); t = (double)(it); - if( lrnd != 0 ) - { rnd = HPL_rone; eps = HPL_dipow( base, 1 - it ) / HPL_rtwo; } - else - { rnd = HPL_rzero; eps = HPL_dipow( base, 1 - it ); } - prec = eps * base; emin = (double)(imin); emax = (double)(imax); - sfmin = rmin; small = HPL_rone / rmax; -/* - * Use SMALL plus a bit, to avoid the possibility of rounding causing - * overflow when computing 1/sfmin. - */ - if( small >= sfmin ) sfmin = small * ( HPL_rone + eps ); - } - - if( CMACH == HPL_MACH_EPS ) return( eps ); - if( CMACH == HPL_MACH_SFMIN ) return( sfmin ); - if( CMACH == HPL_MACH_BASE ) return( base ); - if( CMACH == HPL_MACH_PREC ) return( prec ); - if( CMACH == HPL_MACH_MLEN ) return( t ); - if( CMACH == HPL_MACH_RND ) return( rnd ); - if( CMACH == HPL_MACH_EMIN ) return( emin ); - if( CMACH == HPL_MACH_RMIN ) return( rmin ); - if( CMACH == HPL_MACH_EMAX ) return( emax ); - if( CMACH == HPL_MACH_RMAX ) return( rmax ); - - return( eps ); -/* - * End of HPL_dlamch - */ -} - -#ifdef STDC_HEADERS -static void HPL_dlamc1 -( - int * BETA, - int * T, - int * RND, - int * IEEE1 -) -#else -static void HPL_dlamc1 -( BETA, T, RND, IEEE1 ) -/* - * .. Scalar Arguments .. - */ - int * BETA, * IEEE1, * RND, * T; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_dlamc1 determines the machine parameters given by BETA, T, RND, - * and IEEE1. - * - * Notes - * ===== - * - * This function has been manually translated from the Fortran 77 LAPACK - * auxiliary function dlamc1.f (version 2.0 -- 1992), that was itself - * based on the function ENVRON by Malcolm and incorporated suggestions - * by Gentleman and Marovich. See - * - * Malcolm M. A., Algorithms to reveal properties of floating-point - * arithmetic., Comms. of the ACM, 15, 949-951 (1972). - * - * Gentleman W. M. and Marovich S. 
B., More on algorithms that reveal - * properties of floating point arithmetic units., Comms. of the ACM, - * 17, 276-277 (1974). - * - * Arguments - * ========= - * - * BETA (local output) int * - * The base of the machine. - * - * T (local output) int * - * The number of ( BETA ) digits in the mantissa. - * - * RND (local output) int * - * Specifies whether proper rounding (RND=1) or chopping (RND=0) - * occurs in addition. This may not be a reliable guide to the - * way in which the machine performs its arithmetic. - * - * IEEE1 (local output) int * - * Specifies whether rounding appears to be done in the IEEE - * `round to nearest' style (IEEE1=1), (IEEE1=0) otherwise. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - double a, b, c, f, one, qtr, savec, t1, t2; - static int first=1, lbeta, lieee1, lrnd, lt; -/* .. - * .. Executable Statements .. - */ - if( first != 0 ) - { - first = 0; one = HPL_rone; -/* - * lbeta, lieee1, lt and lrnd are the local values of BETA, IEEE1, T and - * RND. Throughout this routine we use the function HPL_dlamc3 to ensure - * that relevant values are stored and not held in registers, or are not - * affected by optimizers. - * - * Compute a = 2.0**m with the smallest positive integer m such that - * fl( a + 1.0 ) == a. - */ - a = HPL_rone; c = HPL_rone; - do - { a *= HPL_rtwo; c = HPL_dlamc3( a, one ); c = HPL_dlamc3( c, -a ); } - while( c == HPL_rone ); -/* - * Now compute b = 2.0**m with the smallest positive integer m such that - * fl( a + b ) > a. - */ - b = HPL_rone; c = HPL_dlamc3( a, b ); - while( c == a ) { b *= HPL_rtwo; c = HPL_dlamc3( a, b ); } -/* - * Now compute the base. a and c are neighbouring floating point num- - * bers in the interval ( BETA**T, BETA**( T + 1 ) ) and so their diffe- - * rence is BETA. Adding 0.25 to c is to ensure that it is truncated to - * BETA and not (BETA-1). 
- */ - qtr = one / 4.0; savec = c; - c = HPL_dlamc3( c, -a ); lbeta = (int)(c+qtr); -/* - * Now determine whether rounding or chopping occurs, by adding a bit - * less than BETA/2 and a bit more than BETA/2 to a. - */ - b = (double)(lbeta); - f = HPL_dlamc3( b / HPL_rtwo, -b / 100.0 ); c = HPL_dlamc3( f, a ); - if( c == a ) { lrnd = 1; } else { lrnd = 0; } - f = HPL_dlamc3( b / HPL_rtwo, b / 100.0 ); c = HPL_dlamc3( f, a ); - if( ( lrnd != 0 ) && ( c == a ) ) lrnd = 0; -/* - * Try and decide whether rounding is done in the IEEE round to nea- - * rest style. b/2 is half a unit in the last place of the two numbers - * a and savec. Furthermore, a is even, i.e. has last bit zero, and sa- - * vec is odd. Thus adding b/2 to a should not change a, but adding b/2 - * to savec should change savec. - */ - t1 = HPL_dlamc3( b / HPL_rtwo, a ); - t2 = HPL_dlamc3( b / HPL_rtwo, savec ); - if ( ( t1 == a ) && ( t2 > savec ) && ( lrnd != 0 ) ) lieee1 = 1; - else lieee1 = 0; -/* - * Now find the mantissa, T. It should be the integer part of log to the - * base BETA of a, however it is safer to determine T by powering. So we - * find T as the smallest positive integer for which fl( beta**t + 1.0 ) - * is equal to 1.0. - */ - lt = 0; a = HPL_rone; c = HPL_rone; - - do - { - lt++; a *= (double)(lbeta); - c = HPL_dlamc3( a, one ); c = HPL_dlamc3( c, -a ); - } while( c == HPL_rone ); - } - - *BETA = lbeta; *T = lt; *RND = lrnd; *IEEE1 = lieee1; -} - -#ifdef STDC_HEADERS -static void HPL_dlamc2 -( - int * BETA, - int * T, - int * RND, - double * EPS, - int * EMIN, - double * RMIN, - int * EMAX, - double * RMAX -) -#else -static void HPL_dlamc2( BETA, T, RND, EPS, EMIN, RMIN, EMAX, RMAX ) -/* - * .. Scalar Arguments .. - */ - int * BETA, * EMAX, * EMIN, * RND, * T; - double * EPS, * RMAX, * RMIN; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_dlamc2 determines the machine parameters specified in its argu- - * ment list. 
- * - * Notes - * ===== - * - * This function has been manually translated from the Fortran 77 LAPACK - * auxiliary function dlamc2.f (version 2.0 -- 1992), that was itself - * based on a function PARANOIA by W. Kahan of the University of Cali- - * fornia at Berkeley for the computation of the relative machine epsi- - * lon eps. - * - * Arguments - * ========= - * - * BETA (local output) int * - * The base of the machine. - * - * T (local output) int * - * The number of ( BETA ) digits in the mantissa. - * - * RND (local output) int * - * Specifies whether proper rounding (RND=1) or chopping (RND=0) - * occurs in addition. This may not be a reliable guide to the - * way in which the machine performs its arithmetic. - * - * EPS (local output) double * - * The smallest positive number such that fl( 1.0 - EPS ) < 1.0, - * where fl denotes the computed value. - * - * EMIN (local output) int * - * The minimum exponent before (gradual) underflow occurs. - * - * RMIN (local output) double * - * The smallest normalized number for the machine, given by - * BASE**( EMIN - 1 ), where BASE is the floating point value - * of BETA. - * - * EMAX (local output) int * - * The maximum exponent before overflow occurs. - * - * RMAX (local output) double * - * The largest positive number for the machine, given by - * BASE**EMAX * ( 1 - EPS ), where BASE is the floating point - * value of BETA. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - static double leps, lrmax, lrmin; - double a, b, c, half, one, rbase, sixth, small, - third, two, zero; - static int first=1, iwarn=0, lbeta=0, lemax, lemin, - lt=0; - int gnmin=0, gpmin=0, i, ieee, lieee1=0, - lrnd=0, ngnmin=0, ngpmin=0; -/* .. - * .. Executable Statements .. - */ - if( first != 0 ) - { - first = 0; zero = HPL_rzero; one = HPL_rone; two = HPL_rtwo; -/* - * lbeta, lt, lrnd, leps, lemin and lrmin are the local values of BETA, - * T, RND, EPS, EMIN and RMIN. 
- * - * Throughout this routine we use the function HPL_dlamc3 to ensure that - * relevant values are stored and not held in registers, or are not af- - * fected by optimizers. - * - * HPL_dlamc1 returns the parameters lbeta, lt, lrnd and lieee1. - */ - HPL_dlamc1( &lbeta, &lt, &lrnd, &lieee1 ); -/* - * Start to find eps. - */ - b = (double)(lbeta); a = HPL_dipow( b, -lt ); leps = a; -/* - * Try some tricks to see whether or not this is the correct EPS. - */ - b = two / 3.0; - half = one / HPL_rtwo; - sixth = HPL_dlamc3( b, -half ); - third = HPL_dlamc3( sixth, sixth ); - b = HPL_dlamc3( third, -half ); - b = HPL_dlamc3( b, sixth ); - b = Mabs( b ); if( b < leps ) b = leps; - - leps = HPL_rone; - - while( ( leps > b ) && ( b > zero ) ) - { - leps = b; - c = HPL_dlamc3( half * leps, - HPL_dipow( two, 5 ) * HPL_dipow( leps, 2 ) ); - c = HPL_dlamc3( half, -c ); b = HPL_dlamc3( half, c ); - c = HPL_dlamc3( half, -b ); b = HPL_dlamc3( half, c ); - } - if( a < leps ) leps = a; -/* - * Computation of EPS complete. - * - * Now find EMIN. Let a = + or - 1, and + or - (1 + BASE**(-3)). Keep - * dividing a by BETA until (gradual) underflow occurs. This is detected - * when we cannot recover the previous a.
- */ - rbase = one / (double)(lbeta); small = one; - for( i = 0; i < 3; i++ ) small = HPL_dlamc3( small * rbase, zero ); - a = HPL_dlamc3( one, small ); - HPL_dlamc4( &ngpmin, one, lbeta ); HPL_dlamc4( &ngnmin, -one, lbeta ); - HPL_dlamc4( &gpmin, a, lbeta ); HPL_dlamc4( &gnmin, -a, lbeta ); - - ieee = 0; - - if( ( ngpmin == ngnmin ) && ( gpmin == gnmin ) ) - { - if( ngpmin == gpmin ) - { -/* - * Non twos-complement machines, no gradual underflow; e.g., VAX ) - */ - lemin = ngpmin; - } - else if( ( gpmin-ngpmin ) == 3 ) - { -/* - * Non twos-complement machines with gradual underflow; e.g., IEEE stan- - * dard followers - */ - lemin = ngpmin - 1 + lt; ieee = 1; - } - else - { -/* - * A guess; no known machine - */ - lemin = Mmin( ngpmin, gpmin ); - iwarn = 1; - } - } - else if( ( ngpmin == gpmin ) && ( ngnmin == gnmin ) ) - { - if( Mabs( ngpmin-ngnmin ) == 1 ) - { -/* - * Twos-complement machines, no gradual underflow; e.g., CYBER 205 - */ - lemin = Mmax( ngpmin, ngnmin ); - } - else - { -/* - * A guess; no known machine - */ - lemin = Mmin( ngpmin, ngnmin ); - iwarn = 1; - } - } - else if( ( Mabs( ngpmin-ngnmin ) == 1 ) && ( gpmin == gnmin ) ) - { - if( ( gpmin - Mmin( ngpmin, ngnmin ) ) == 3 ) - { -/* - * Twos-complement machines with gradual underflow; no known machine - */ - lemin = Mmax( ngpmin, ngnmin ) - 1 + lt; - } - else - { -/* - * A guess; no known machine - */ - lemin = Mmin( ngpmin, ngnmin ); - iwarn = 1; - } - } - else - { -/* - * A guess; no known machine - */ - lemin = Mmin( ngpmin, ngnmin ); lemin = Mmin( lemin, gpmin ); - lemin = Mmin( lemin, gnmin ); iwarn = 1; - } -/* - * Comment out this if block if EMIN is ok - */ - if( iwarn != 0 ) - { - first = 1; - HPL_fprintf( stderr, "\n %s %8d\n%s\n%s\n%s\n", -"WARNING. The value EMIN may be incorrect:- EMIN =", lemin, -"If, after inspection, the value EMIN looks acceptable, please comment ", -"out the if block as marked within the code of routine HPL_dlamc2, ", -"otherwise supply EMIN explicitly." 
); - } -/* - * Assume IEEE arithmetic if we found denormalised numbers above, or if - * arithmetic seems to round in the IEEE style, determined in routine - * HPL_dlamc1. A true IEEE machine should have both things true; how- - * ever, faulty machines may have one or the other. - */ - if( ( ieee != 0 ) || ( lieee1 != 0 ) ) ieee = 1; - else ieee = 0; -/* - * Compute RMIN by successive division by BETA. We could compute RMIN - * as BASE**( EMIN - 1 ), but some machines underflow during this compu- - * tation. - */ - lrmin = HPL_rone; - for( i = 0; i < 1 - lemin; i++ ) - lrmin = HPL_dlamc3( lrmin*rbase, zero ); -/* - * Finally, call HPL_dlamc5 to compute emax and rmax. - */ - HPL_dlamc5( lbeta, lt, lemin, ieee, &lemax, &lrmax ); - } - *BETA = lbeta; *T = lt; *RND = lrnd; *EPS = leps; - *EMIN = lemin; *RMIN = lrmin; *EMAX = lemax; *RMAX = lrmax; -} - -#ifdef STDC_HEADERS -static double HPL_dlamc3( const double A, const double B ) -#else -static double HPL_dlamc3( A, B ) -/* - * .. Scalar Arguments .. - */ - const double A, B; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_dlamc3 is intended to force a and b to be stored prior to doing - * the addition of a and b, for use in situations where optimizers - * might hold one of these in a register. - * - * Notes - * ===== - * - * This function has been manually translated from the Fortran 77 LAPACK - * auxiliary function dlamc3.f (version 2.0 -- 1992). - * - * Arguments - * ========= - * - * A, B (local input) double - * The values a and b. - * - * --------------------------------------------------------------------- - */ -/* .. - * .. Executable Statements .. - */ - return( A + B ); -} - -#ifdef STDC_HEADERS -static void HPL_dlamc4 -( - int * EMIN, - const double START, - const int BASE -) -#else -static void HPL_dlamc4( EMIN, START, BASE ) -/* - * .. Scalar Arguments .. 
- */ - int * EMIN; - const int BASE; - const double START; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_dlamc4 is a service function for HPL_dlamc2. - * - * Notes - * ===== - * - * This function has been manually translated from the Fortran 77 LAPACK - * auxiliary function dlamc4.f (version 2.0 -- 1992). - * - * Arguments - * ========= - * - * EMIN (local output) int * - * The minimum exponent before (gradual) underflow, computed by - * setting A = START and dividing by BASE until the previous A - * can not be recovered. - * - * START (local input) double - * The starting point for determining EMIN. - * - * BASE (local input) int - * The base of the machine. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - double a, b1, b2, c1, c2, d1, d2, one, rbase, zero; - int i; -/* .. - * .. Executable Statements .. - */ - a = START; one = HPL_rone; rbase = one / (double)(BASE); - zero = HPL_rzero; - *EMIN = 1; b1 = HPL_dlamc3( a * rbase, zero ); c1 = c2 = d1 = d2 = a; - - do - { - (*EMIN)--; a = b1; - b1 = HPL_dlamc3( a / BASE, zero ); - c1 = HPL_dlamc3( b1 * BASE, zero ); - d1 = zero; for( i = 0; i < BASE; i++ ) d1 = d1 + b1; - b2 = HPL_dlamc3( a * rbase, zero ); - c2 = HPL_dlamc3( b2 / rbase, zero ); - d2 = zero; for( i = 0; i < BASE; i++ ) d2 = d2 + b2; - } while( ( c1 == a ) && ( c2 == a ) && ( d1 == a ) && ( d2 == a ) ); -} - -#ifdef STDC_HEADERS -static void HPL_dlamc5 -( - const int BETA, - const int P, - const int EMIN, - const int IEEE, - int * EMAX, - double * RMAX -) -#else -static void HPL_dlamc5( BETA, P, EMIN, IEEE, EMAX, RMAX ) -/* - * .. Scalar Arguments .. - */ - const int BETA, EMIN, IEEE, P; - int * EMAX; - double * RMAX; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_dlamc5 attempts to compute RMAX, the largest machine floating- - * point number, without overflow. It assumes that EMAX + abs(EMIN) sum - * approximately to a power of 2. 
It will fail on machines where this - * assumption does not hold, for example, the Cyber 205 (EMIN = -28625, - * EMAX = 28718). It will also fail if the value supplied for EMIN is - * too large (i.e. too close to zero), probably with overflow. - * - * Notes - * ===== - * - * This function has been manually translated from the Fortran 77 LAPACK - * auxiliary function dlamc5.f (version 2.0 -- 1992). - * - * Arguments - * ========= - * - * BETA (local input) int - * The base of floating-point arithmetic. - * - * P (local input) int - * The number of base BETA digits in the mantissa of a floating- - * point value. - * - * EMIN (local input) int - * The minimum exponent before (gradual) underflow. - * - * IEEE (local input) int - * A logical flag specifying whether or not the arithmetic sys- - * tem is thought to comply with the IEEE standard. - * - * EMAX (local output) int * - * The largest exponent before overflow. - * - * RMAX (local output) double * - * The largest machine floating-point number. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - double oldy=HPL_rzero, recbas, y, z; - int exbits=1, expsum, i, lexp=1, nbits, try, - uexp; -/* .. - * .. Executable Statements .. - */ -/* - * First compute lexp and uexp, two powers of 2 that bound abs(EMIN). - * We then assume that EMAX + abs( EMIN ) will sum approximately to the - * bound that is closest to abs( EMIN ). (EMAX is the exponent of the - * required number RMAX). - */ -l_10: - try = (int)( (unsigned int)(lexp) << 1 ); - if( try <= ( -EMIN ) ) { lexp = try; exbits++; goto l_10; } - - if( lexp == -EMIN ) { uexp = lexp; } else { uexp = try; exbits++; } -/* - * Now -lexp is less than or equal to EMIN, and -uexp is greater than or - * equal to EMIN. exbits is the number of bits needed to store the expo- - * nent. 
- */ - if( ( uexp+EMIN ) > ( -lexp-EMIN ) ) - { expsum = (int)( (unsigned int)(lexp) << 1 ); } - else - { expsum = (int)( (unsigned int)(uexp) << 1 ); } -/* - * expsum is the exponent range, approximately equal to EMAX - EMIN + 1. - */ - *EMAX = expsum + EMIN - 1; -/* - * nbits is the total number of bits needed to store a floating-point - * number. - */ - nbits = 1 + exbits + P; - - if( ( nbits % 2 == 1 ) && ( BETA == 2 ) ) - { -/* - * Either there are an odd number of bits used to store a floating-point - * number, which is unlikely, or some bits are not used in the represen- - * tation of numbers, which is possible, (e.g. Cray machines) or the - * mantissa has an implicit bit, (e.g. IEEE machines, Dec Vax machines), - * which is perhaps the most likely. We have to assume the last alterna- - * tive. If this is true, then we need to reduce EMAX by one because - * there must be some way of representing zero in an implicit-bit sys- - * tem. On machines like Cray we are reducing EMAX by one unnecessarily. - */ - (*EMAX)--; - } - - if( IEEE != 0 ) - { -/* - * Assume we are on an IEEE machine which reserves one exponent for in- - * finity and NaN. - */ - (*EMAX)--; - } -/* - * Now create RMAX, the largest machine number, which should be equal to - * (1.0 - BETA**(-P)) * BETA**EMAX . First compute 1.0-BETA**(-P), being - * careful that the result is less than 1.0. - */ - recbas = HPL_rone / (double)(BETA); - z = (double)(BETA) - HPL_rone; - y = HPL_rzero; - - for( i = 0; i < P; i++ ) - { z *= recbas; if( y < HPL_rone ) oldy = y; y = HPL_dlamc3( y, z ); } - - if( y >= HPL_rone ) y = oldy; -/* - * Now multiply by BETA**EMAX to get RMAX. - */ - for( i = 0; i < *EMAX; i++ ) y = HPL_dlamc3( y * BETA, HPL_rzero ); - - *RMAX = y; -/* - * End of HPL_dlamch - */ -} - -#ifdef STDC_HEADERS -static double HPL_dipow -( - const double X, - const int N -) -#else -static double HPL_dipow( X, N ) -/* - * .. Scalar Arguments .. 
- */ - const int N; - const double X; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_dipow computes the integer n-th power of a real scalar x. - * - * Arguments - * ========= - * - * X (local input) const double - * The real scalar x. - * - * N (local input) const int - * The integer power to raise x to. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - double r, y=HPL_rone; - int k, n; -/* .. - * .. Executable Statements .. - */ - if( X == HPL_rzero ) return( HPL_rzero ); - if( N < 0 ) { n = -N; r = HPL_rone / X; } else { n = N; r = X; } - for( k = 0; k < n; k++ ) y *= r; - - return( y ); -} diff --git a/hpl/src/auxil/HPL_dlange.c b/hpl/src/auxil/HPL_dlange.c deleted file mode 100644 index 6903afd9d2a9dc44d070d253722e4d683da5b658..0000000000000000000000000000000000000000 --- a/hpl/src/auxil/HPL_dlange.c +++ /dev/null @@ -1,184 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. 
The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -double HPL_dlange -( - const HPL_T_NORM NORM, - const int M, - const int N, - const double * A, - const int LDA -) -#else -double HPL_dlange -( NORM, M, N, A, LDA ) - const HPL_T_NORM NORM; - const int M; - const int N; - const double * A; - const int LDA; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_dlange returns the value of the one norm, or the infinity norm, - * or the element of largest absolute value of a matrix A: - * - * max(abs(A(i,j))) when NORM = HPL_NORM_A, - * norm1(A), when NORM = HPL_NORM_1, - * normI(A), when NORM = HPL_NORM_I, - * - * where norm1 denotes the one norm of a matrix (maximum column sum) and - * normI denotes the infinity norm of a matrix (maximum row sum). Note - * that max(abs(A(i,j))) is not a matrix norm. 
- * - * Arguments - * ========= - * - * NORM (local input) const HPL_T_NORM - * On entry, NORM specifies the value to be returned by this - * function as described above. - * - * M (local input) const int - * On entry, M specifies the number of rows of the matrix A. - * M must be at least zero. - * - * N (local input) const int - * On entry, N specifies the number of columns of the matrix A. - * N must be at least zero. - * - * A (local input) const double * - * On entry, A points to an array of dimension (LDA,N), that - * contains the matrix A. - * - * LDA (local input) const int - * On entry, LDA specifies the leading dimension of the array A. - * LDA must be at least max(1,M). - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - double s, v0=HPL_rzero, * work = NULL; - int i, j; -/* .. - * .. Executable Statements .. - */ - if( ( M <= 0 ) || ( N <= 0 ) ) return( HPL_rzero ); - - if( NORM == HPL_NORM_A ) - { -/* - * max( abs( A ) ) - */ - for( j = 0; j < N; j++ ) - { - for( i = 0; i < M; i++ ) { v0 = Mmax( v0, Mabs( *A ) ); A++; } - A += LDA - M; - } - } - else if( NORM == HPL_NORM_1 ) - { -/* - * Find norm_1( A ). 
- */ - work = (double*)malloc( (size_t)(N) * sizeof( double ) ); - if( work == NULL ) - { HPL_abort( __LINE__, "HPL_dlange", "Memory allocation failed" ); } - else - { - for( j = 0; j < N; j++ ) - { - s = HPL_rzero; - for( i = 0; i < M; i++ ) { s += Mabs( *A ); A++; } - work[j] = s; A += LDA - M; - } -/* - * Find maximum sum of columns for 1-norm - */ - v0 = work[HPL_idamax( N, work, 1 )]; v0 = Mabs( v0 ); - if( work ) free( work ); - } - } - else if( NORM == HPL_NORM_I ) - { -/* - * Find norm_inf( A ) - */ - work = (double*)malloc( (size_t)(M) * sizeof( double ) ); - if( work == NULL ) - { HPL_abort( __LINE__, "HPL_dlange", "Memory allocation failed" ); } - else - { - for( i = 0; i < M; i++ ) { work[i] = HPL_rzero; } - - for( j = 0; j < N; j++ ) - { - for( i = 0; i < M; i++ ) { work[i] += Mabs( *A ); A++; } - A += LDA - M; - } -/* - * Find maximum sum of rows for inf-norm - */ - v0 = work[HPL_idamax( M, work, 1 )]; v0 = Mabs( v0 ); - if( work ) free( work ); - } - } - - return( v0 ); -/* - * End of HPL_dlange - */ -} diff --git a/hpl/src/auxil/HPL_dlaprnt.c b/hpl/src/auxil/HPL_dlaprnt.c deleted file mode 100644 index 3dd855a9aade6c730b284b19cfb815f005706c5b..0000000000000000000000000000000000000000 --- a/hpl/src/auxil/HPL_dlaprnt.c +++ /dev/null @@ -1,130 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. 
Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
- * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -void HPL_dlaprnt -( - const int M, - const int N, - double * A, - const int IA, - const int JA, - const int LDA, - const char * CMATNM -) -#else -void HPL_dlaprnt -( M, N, A, IA, JA, LDA, CMATNM ) - const int M; - const int N; - double * A; - const int IA; - const int JA; - const int LDA; - const char * CMATNM; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_dlaprnt prints to standard error an M-by-N matrix A. - * - * - * Arguments - * ========= - * - * M (local input) const int - * On entry, M specifies the number of rows of A. M must be at - * least zero. - * - * N (local input) const int - * On entry, N specifies the number of columns of A. N must be - * at least zero. - * - * A (local input) double * - * On entry, A points to an array of dimension (LDA,N). - * - * IA (local input) const int - * On entry, IA specifies the starting row index to be printed. - * - * JA (local input) const int - * On entry, JA specifies the starting column index to be - * printed. - * - * LDA (local input) const int - * On entry, LDA specifies the leading dimension of the array A. - * LDA must be at least max(1,M). - * - * CMATNM (local input) const char * - * On entry, CMATNM is the name of the matrix to be printed. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - int i, j; -/* .. - * .. Executable Statements .. 
- */ - for( j = 0; j < N; j++ ) - { - for( i = 0; i < M; i++ ) - { - HPL_fprintf( stderr, "%s(%6d,%6d)=%30.18f\n", CMATNM, IA+i, - JA+j, *(Mptr( A, i, j, LDA )) ); - } - } -/* - * End of HPL_dlaprnt - */ -} diff --git a/hpl/src/auxil/HPL_dlatcpy.c b/hpl/src/auxil/HPL_dlatcpy.c deleted file mode 100644 index b9e7a10e2b71a725a98f1ab0f469821cb6e93241..0000000000000000000000000000000000000000 --- a/hpl/src/auxil/HPL_dlatcpy.c +++ /dev/null @@ -1,398 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. 
- * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" -/* - * Define default value for unrolling factors - * #ifndef HPL_LATCPY_M_DEPTH - * #define HPL_LATCPY_M_DEPTH 32 - * #define HPL_LATCPY_LOG2_M_DEPTH 5 - * #endif - * #ifndef HPL_LATCPY_N_DEPTH - * #define HPL_LATCPY_N_DEPTH 4 - * #define HPL_LATCPY_LOG2_N_DEPTH 2 - * #endif - */ -#ifndef HPL_LATCPY_M_DEPTH -#define HPL_LATCPY_M_DEPTH 4 -#define HPL_LATCPY_LOG2_M_DEPTH 2 -#endif -#ifndef HPL_LATCPY_N_DEPTH -#define HPL_LATCPY_N_DEPTH 2 -#define HPL_LATCPY_LOG2_N_DEPTH 1 -#endif - -#ifdef STDC_HEADERS -void HPL_dlatcpy -( - const int M, - const int N, - const double * A, - const int LDA, - double * B, - const int LDB -) -#else -void HPL_dlatcpy -( M, N, A, LDA, B, LDB ) - const int M; - const int N; - const double * A; - const int LDA; - double * B; - const int LDB; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_dlatcpy copies the transpose of an array A into an array B. 
- * - * - * Arguments - * ========= - * - * M (local input) const int - * On entry, M specifies the number of rows of the array B and - * the number of columns of A. M must be at least zero. - * - * N (local input) const int - * On entry, N specifies the number of rows of the array A and - * the number of columns of B. N must be at least zero. - * - * A (local input) const double * - * On entry, A points to an array of dimension (LDA,M). - * - * LDA (local input) const int - * On entry, LDA specifies the leading dimension of the array A. - * LDA must be at least MAX(1,N). - * - * B (local output) double * - * On entry, B points to an array of dimension (LDB,N). On exit, - * B is overwritten with the transpose of A. - * - * LDB (local input) const int - * On entry, LDB specifies the leading dimension of the array B. - * LDB must be at least MAX(1,M). - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ -#ifdef HPL_LATCPY_USE_COPY - register int j; -#else -#if ( HPL_LATCPY_N_DEPTH == 1 ) - const double * A0 = A; - double * B0 = B; -#elif ( HPL_LATCPY_N_DEPTH == 2 ) - const double * A0 = A, * A1 = A + 1; - double * B0 = B, * B1 = B + LDB; -#elif ( HPL_LATCPY_N_DEPTH == 4 ) - const double * A0 = A, * A1 = A + 1, - * A2 = A + 2, * A3 = A + 3; - double * B0 = B, * B1 = B + LDB, - * B2 = B + (LDB << 1), * B3 = B + 3 * LDB; -#endif - const int incA = -M * LDA + (1 << HPL_LATCPY_LOG2_N_DEPTH), - incB = ( (unsigned int)(LDB) << - HPL_LATCPY_LOG2_N_DEPTH ) - M, - incA0 = -M * LDA + 1, incB0 = LDB - M; - int mu, nu; - register int i, j; -#endif -/* .. - * .. Executable Statements .. 
- */ - if( ( M <= 0 ) || ( N <= 0 ) ) return; - -#ifdef HPL_LATCPY_USE_COPY - for( j = 0; j < N; j++, B0 += LDB ) HPL_dcopy( M, A0+j, LDA, B0, 1 ); -#else - mu = (int)( ( (unsigned int)(M) >> HPL_LATCPY_LOG2_M_DEPTH ) << - HPL_LATCPY_LOG2_M_DEPTH ); - nu = (int)( ( (unsigned int)(N) >> HPL_LATCPY_LOG2_N_DEPTH ) << - HPL_LATCPY_LOG2_N_DEPTH ); - - for( j = 0; j < nu; j += HPL_LATCPY_N_DEPTH ) - { - for( i = 0; i < mu; i += HPL_LATCPY_M_DEPTH ) - { -#if ( HPL_LATCPY_N_DEPTH == 1 ) - B0[ 0] = *A0; A0 += LDA; -#elif ( HPL_LATCPY_N_DEPTH == 2 ) - B0[ 0] = *A0; A0 += LDA; B1[ 0] = *A1; A1 += LDA; -#elif ( HPL_LATCPY_N_DEPTH == 4 ) - B0[ 0] = *A0; A0 += LDA; B1[ 0] = *A1; A1 += LDA; - B2[ 0] = *A2; A2 += LDA; B3[ 0] = *A3; A3 += LDA; -#endif - -#if ( HPL_LATCPY_M_DEPTH > 1 ) - -#if ( HPL_LATCPY_N_DEPTH == 1 ) - B0[ 1] = *A0; A0 += LDA; -#elif ( HPL_LATCPY_N_DEPTH == 2 ) - B0[ 1] = *A0; A0 += LDA; B1[ 1] = *A1; A1 += LDA; -#elif ( HPL_LATCPY_N_DEPTH == 4 ) - B0[ 1] = *A0; A0 += LDA; B1[ 1] = *A1; A1 += LDA; - B2[ 1] = *A2; A2 += LDA; B3[ 1] = *A3; A3 += LDA; -#endif - -#endif -#if ( HPL_LATCPY_M_DEPTH > 2 ) - -#if ( HPL_LATCPY_N_DEPTH == 1 ) - B0[ 2] = *A0; A0 += LDA; B0[ 3] = *A0; A0 += LDA; -#elif ( HPL_LATCPY_N_DEPTH == 2 ) - B0[ 2] = *A0; A0 += LDA; B1[ 2] = *A1; A1 += LDA; - B0[ 3] = *A0; A0 += LDA; B1[ 3] = *A1; A1 += LDA; -#elif ( HPL_LATCPY_N_DEPTH == 4 ) - B0[ 2] = *A0; A0 += LDA; B1[ 2] = *A1; A1 += LDA; - B2[ 2] = *A2; A2 += LDA; B3[ 2] = *A3; A3 += LDA; - B0[ 3] = *A0; A0 += LDA; B1[ 3] = *A1; A1 += LDA; - B2[ 3] = *A2; A2 += LDA; B3[ 3] = *A3; A3 += LDA; -#endif - -#endif -#if ( HPL_LATCPY_M_DEPTH > 4 ) - -#if ( HPL_LATCPY_N_DEPTH == 1 ) - B0[ 4] = *A0; A0 += LDA; B0[ 5] = *A0; A0 += LDA; - B0[ 6] = *A0; A0 += LDA; B0[ 7] = *A0; A0 += LDA; -#elif ( HPL_LATCPY_N_DEPTH == 2 ) - B0[ 4] = *A0; A0 += LDA; B1[ 4] = *A1; A1 += LDA; - B0[ 5] = *A0; A0 += LDA; B1[ 5] = *A1; A1 += LDA; - B0[ 6] = *A0; A0 += LDA; B1[ 6] = *A1; A1 += LDA; - B0[ 7] = *A0; A0 += LDA; B1[ 7] 
= *A1; A1 += LDA; -#elif ( HPL_LATCPY_N_DEPTH == 4 ) - B0[ 4] = *A0; A0 += LDA; B1[ 4] = *A1; A1 += LDA; - B2[ 4] = *A2; A2 += LDA; B3[ 4] = *A3; A3 += LDA; - B0[ 5] = *A0; A0 += LDA; B1[ 5] = *A1; A1 += LDA; - B2[ 5] = *A2; A2 += LDA; B3[ 5] = *A3; A3 += LDA; - B0[ 6] = *A0; A0 += LDA; B1[ 6] = *A1; A1 += LDA; - B2[ 6] = *A2; A2 += LDA; B3[ 6] = *A3; A3 += LDA; - B0[ 7] = *A0; A0 += LDA; B1[ 7] = *A1; A1 += LDA; - B2[ 7] = *A2; A2 += LDA; B3[ 7] = *A3; A3 += LDA; -#endif - -#endif -#if ( HPL_LATCPY_M_DEPTH > 8 ) - -#if ( HPL_LATCPY_N_DEPTH == 1 ) - B0[ 8] = *A0; A0 += LDA; B0[ 9] = *A0; A0 += LDA; - B0[10] = *A0; A0 += LDA; B0[11] = *A0; A0 += LDA; - B0[12] = *A0; A0 += LDA; B0[13] = *A0; A0 += LDA; - B0[14] = *A0; A0 += LDA; B0[15] = *A0; A0 += LDA; -#elif ( HPL_LATCPY_N_DEPTH == 2 ) - B0[ 8] = *A0; A0 += LDA; B1[ 8] = *A1; A1 += LDA; - B0[ 9] = *A0; A0 += LDA; B1[ 9] = *A1; A1 += LDA; - B0[10] = *A0; A0 += LDA; B1[10] = *A1; A1 += LDA; - B0[11] = *A0; A0 += LDA; B1[11] = *A1; A1 += LDA; - B0[12] = *A0; A0 += LDA; B1[12] = *A1; A1 += LDA; - B0[13] = *A0; A0 += LDA; B1[13] = *A1; A1 += LDA; - B0[14] = *A0; A0 += LDA; B1[14] = *A1; A1 += LDA; - B0[15] = *A0; A0 += LDA; B1[15] = *A1; A1 += LDA; -#elif ( HPL_LATCPY_N_DEPTH == 4 ) - B0[ 8] = *A0; A0 += LDA; B1[ 8] = *A1; A1 += LDA; - B2[ 8] = *A2; A2 += LDA; B3[ 8] = *A3; A3 += LDA; - B0[ 9] = *A0; A0 += LDA; B1[ 9] = *A1; A1 += LDA; - B2[ 9] = *A2; A2 += LDA; B3[ 9] = *A3; A3 += LDA; - B0[10] = *A0; A0 += LDA; B1[10] = *A1; A1 += LDA; - B2[10] = *A2; A2 += LDA; B3[10] = *A3; A3 += LDA; - B0[11] = *A0; A0 += LDA; B1[11] = *A1; A1 += LDA; - B2[11] = *A2; A2 += LDA; B3[11] = *A3; A3 += LDA; - B0[12] = *A0; A0 += LDA; B1[12] = *A1; A1 += LDA; - B2[12] = *A2; A2 += LDA; B3[12] = *A3; A3 += LDA; - B0[13] = *A0; A0 += LDA; B1[13] = *A1; A1 += LDA; - B2[13] = *A2; A2 += LDA; B3[13] = *A3; A3 += LDA; - B0[14] = *A0; A0 += LDA; B1[14] = *A1; A1 += LDA; - B2[14] = *A2; A2 += LDA; B3[14] = *A3; A3 += LDA; - B0[15] = *A0; A0 += 
LDA; B1[15] = *A1; A1 += LDA; - B2[15] = *A2; A2 += LDA; B3[15] = *A3; A3 += LDA; -#endif - -#endif -#if ( HPL_LATCPY_M_DEPTH > 16 ) - -#if ( HPL_LATCPY_N_DEPTH == 1 ) - B0[16] = *A0; A0 += LDA; B0[17] = *A0; A0 += LDA; - B0[18] = *A0; A0 += LDA; B0[19] = *A0; A0 += LDA; - B0[20] = *A0; A0 += LDA; B0[21] = *A0; A0 += LDA; - B0[22] = *A0; A0 += LDA; B0[23] = *A0; A0 += LDA; - B0[24] = *A0; A0 += LDA; B0[25] = *A0; A0 += LDA; - B0[26] = *A0; A0 += LDA; B0[27] = *A0; A0 += LDA; - B0[28] = *A0; A0 += LDA; B0[29] = *A0; A0 += LDA; - B0[30] = *A0; A0 += LDA; B0[31] = *A0; A0 += LDA; -#elif ( HPL_LATCPY_N_DEPTH == 2 ) - B0[16] = *A0; A0 += LDA; B1[16] = *A1; A1 += LDA; - B0[17] = *A0; A0 += LDA; B1[17] = *A1; A1 += LDA; - B0[18] = *A0; A0 += LDA; B1[18] = *A1; A1 += LDA; - B0[19] = *A0; A0 += LDA; B1[19] = *A1; A1 += LDA; - B0[20] = *A0; A0 += LDA; B1[20] = *A1; A1 += LDA; - B0[21] = *A0; A0 += LDA; B1[21] = *A1; A1 += LDA; - B0[22] = *A0; A0 += LDA; B1[22] = *A1; A1 += LDA; - B0[23] = *A0; A0 += LDA; B1[23] = *A1; A1 += LDA; - B0[24] = *A0; A0 += LDA; B1[24] = *A1; A1 += LDA; - B0[25] = *A0; A0 += LDA; B1[25] = *A1; A1 += LDA; - B0[26] = *A0; A0 += LDA; B1[26] = *A1; A1 += LDA; - B0[27] = *A0; A0 += LDA; B1[27] = *A1; A1 += LDA; - B0[28] = *A0; A0 += LDA; B1[28] = *A1; A1 += LDA; - B0[29] = *A0; A0 += LDA; B1[29] = *A1; A1 += LDA; - B0[30] = *A0; A0 += LDA; B1[30] = *A1; A1 += LDA; - B0[31] = *A0; A0 += LDA; B1[31] = *A1; A1 += LDA; -#elif ( HPL_LATCPY_N_DEPTH == 4 ) - B0[16] = *A0; A0 += LDA; B1[16] = *A1; A1 += LDA; - B2[16] = *A2; A2 += LDA; B3[16] = *A3; A3 += LDA; - B0[17] = *A0; A0 += LDA; B1[17] = *A1; A1 += LDA; - B2[17] = *A2; A2 += LDA; B3[17] = *A3; A3 += LDA; - B0[18] = *A0; A0 += LDA; B1[18] = *A1; A1 += LDA; - B2[18] = *A2; A2 += LDA; B3[18] = *A3; A3 += LDA; - B0[19] = *A0; A0 += LDA; B1[19] = *A1; A1 += LDA; - B2[19] = *A2; A2 += LDA; B3[19] = *A3; A3 += LDA; - B0[20] = *A0; A0 += LDA; B1[20] = *A1; A1 += LDA; - B2[20] = *A2; A2 += LDA; B3[20] = *A3; A3 
+= LDA; - B0[21] = *A0; A0 += LDA; B1[21] = *A1; A1 += LDA; - B2[21] = *A2; A2 += LDA; B3[21] = *A3; A3 += LDA; - B0[22] = *A0; A0 += LDA; B1[22] = *A1; A1 += LDA; - B2[22] = *A2; A2 += LDA; B3[22] = *A3; A3 += LDA; - B0[23] = *A0; A0 += LDA; B1[23] = *A1; A1 += LDA; - B2[23] = *A2; A2 += LDA; B3[23] = *A3; A3 += LDA; - B0[24] = *A0; A0 += LDA; B1[24] = *A1; A1 += LDA; - B2[24] = *A2; A2 += LDA; B3[24] = *A3; A3 += LDA; - B0[25] = *A0; A0 += LDA; B1[25] = *A1; A1 += LDA; - B2[25] = *A2; A2 += LDA; B3[25] = *A3; A3 += LDA; - B0[26] = *A0; A0 += LDA; B1[26] = *A1; A1 += LDA; - B2[26] = *A2; A2 += LDA; B3[26] = *A3; A3 += LDA; - B0[27] = *A0; A0 += LDA; B1[27] = *A1; A1 += LDA; - B2[27] = *A2; A2 += LDA; B3[27] = *A3; A3 += LDA; - B0[28] = *A0; A0 += LDA; B1[28] = *A1; A1 += LDA; - B2[28] = *A2; A2 += LDA; B3[28] = *A3; A3 += LDA; - B0[29] = *A0; A0 += LDA; B1[29] = *A1; A1 += LDA; - B2[29] = *A2; A2 += LDA; B3[29] = *A3; A3 += LDA; - B0[30] = *A0; A0 += LDA; B1[30] = *A1; A1 += LDA; - B2[30] = *A2; A2 += LDA; B3[30] = *A3; A3 += LDA; - B0[31] = *A0; A0 += LDA; B1[31] = *A1; A1 += LDA; - B2[31] = *A2; A2 += LDA; B3[31] = *A3; A3 += LDA; -#endif - -#endif -#if ( HPL_LATCPY_N_DEPTH == 1 ) - B0 += HPL_LATCPY_M_DEPTH; -#elif ( HPL_LATCPY_N_DEPTH == 2 ) - B0 += HPL_LATCPY_M_DEPTH; B1 += HPL_LATCPY_M_DEPTH; -#elif ( HPL_LATCPY_N_DEPTH == 4 ) - B0 += HPL_LATCPY_M_DEPTH; B1 += HPL_LATCPY_M_DEPTH; - B2 += HPL_LATCPY_M_DEPTH; B3 += HPL_LATCPY_M_DEPTH; -#endif - } - - for( i = mu; i < M; i++ ) - { -#if ( HPL_LATCPY_N_DEPTH == 1 ) - *B0 = *A0; B0++; A0 += LDA; -#elif ( HPL_LATCPY_N_DEPTH == 2 ) - *B0 = *A0; B0++; A0 += LDA; *B1 = *A1; B1++; A1 += LDA; -#elif ( HPL_LATCPY_N_DEPTH == 4 ) - *B0 = *A0; B0++; A0 += LDA; *B1 = *A1; B1++; A1 += LDA; - *B2 = *A2; B2++; A2 += LDA; *B3 = *A3; B3++; A3 += LDA; -#endif - } - -#if ( HPL_LATCPY_N_DEPTH == 1 ) - A0 += incA; B0 += incB; -#elif ( HPL_LATCPY_N_DEPTH == 2 ) - A0 += incA; A1 += incA; B0 += incB; B1 += incB; -#elif ( 
HPL_LATCPY_N_DEPTH == 4 ) - A0 += incA; A1 += incA; A2 += incA; A3 += incA; - B0 += incB; B1 += incB; B2 += incB; B3 += incB; -#endif - } - - for( j = nu; j < N; j++, B0 += incB0, A0 += incA0 ) - { - for( i = 0; i < mu; i += HPL_LATCPY_M_DEPTH, B0 += HPL_LATCPY_M_DEPTH ) - { - B0[ 0]=*A0; A0 += LDA; -#if ( HPL_LATCPY_M_DEPTH > 1 ) - B0[ 1]=*A0; A0 += LDA; -#endif -#if ( HPL_LATCPY_M_DEPTH > 2 ) - B0[ 2]=*A0; A0 += LDA; B0[ 3]=*A0; A0 += LDA; -#endif -#if ( HPL_LATCPY_M_DEPTH > 4 ) - B0[ 4]=*A0; A0 += LDA; B0[ 5]=*A0; A0 += LDA; - B0[ 6]=*A0; A0 += LDA; B0[ 7]=*A0; A0 += LDA; -#endif -#if ( HPL_LATCPY_M_DEPTH > 8 ) - B0[ 8]=*A0; A0 += LDA; B0[ 9]=*A0; A0 += LDA; - B0[10]=*A0; A0 += LDA; B0[11]=*A0; A0 += LDA; - B0[12]=*A0; A0 += LDA; B0[13]=*A0; A0 += LDA; - B0[14]=*A0; A0 += LDA; B0[15]=*A0; A0 += LDA; -#endif -#if ( HPL_LATCPY_M_DEPTH > 16 ) - B0[16]=*A0; A0 += LDA; B0[17]=*A0; A0 += LDA; - B0[18]=*A0; A0 += LDA; B0[19]=*A0; A0 += LDA; - B0[20]=*A0; A0 += LDA; B0[21]=*A0; A0 += LDA; - B0[22]=*A0; A0 += LDA; B0[23]=*A0; A0 += LDA; - B0[24]=*A0; A0 += LDA; B0[25]=*A0; A0 += LDA; - B0[26]=*A0; A0 += LDA; B0[27]=*A0; A0 += LDA; - B0[28]=*A0; A0 += LDA; B0[29]=*A0; A0 += LDA; - B0[30]=*A0; A0 += LDA; B0[31]=*A0; A0 += LDA; -#endif - } - - for( i = mu; i < M; i++, B0++, A0 += LDA ) { *B0 = *A0; } - } -#endif -/* - * End of HPL_dlatcpy - */ -} diff --git a/hpl/src/auxil/HPL_fprintf.c b/hpl/src/auxil/HPL_fprintf.c deleted file mode 100644 index 7d24f057db483700c26d2d07abd5c8d63fce7dd2..0000000000000000000000000000000000000000 --- a/hpl/src/auxil/HPL_fprintf.c +++ /dev/null @@ -1,114 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. 
Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -void HPL_fprintf -( - FILE * STREAM, - const char * FORM, - ... -) -#else -void HPL_fprintf( va_alist ) -va_dcl -#endif -{ -/* - * Purpose - * ======= - * - * HPL_fprintf is a wrapper around fprintf flushing the output stream. - * - * - * Arguments - * ========= - * - * STREAM (local input) FILE * - * On entry, STREAM specifies the output stream. - * - * FORM (local input) const char * - * On entry, FORM specifies the format, i.e., how the subsequent - * arguments are converted for output. - * - * (local input) ... - * On entry, ... is the list of arguments to be printed within - * the format string. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - va_list argptr; - char cline[256]; -#ifndef STDC_HEADERS - FILE * STREAM; - char * FORM; -#endif -/* .. - * .. Executable Statements .. 
- */ -#ifdef STDC_HEADERS - va_start( argptr, FORM ); -#else - va_start( argptr ); - STREAM = va_arg( argptr, FILE * ); - FORM = va_arg( argptr, char * ); -#endif - (void) vsprintf( cline, FORM, argptr ); - va_end( argptr ); - - (void) fprintf( STREAM, "%s", cline ); - (void) fflush( STREAM ); -/* - * End of HPL_fprintf - */ -} diff --git a/hpl/src/auxil/HPL_warn.c b/hpl/src/auxil/HPL_warn.c deleted file mode 100644 index c9624dbb8fb313d9642a08cb2cdcae08db0ddcc4..0000000000000000000000000000000000000000 --- a/hpl/src/auxil/HPL_warn.c +++ /dev/null @@ -1,134 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. 
- * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -void HPL_warn -( - FILE * STREAM, - int LINE, - const char * SRNAME, - const char * FORM, - ... -) -#else -void HPL_warn( va_alist ) -va_dcl -#endif -{ -/* - * Purpose - * ======= - * - * HPL_warn displays an error message. - * - * - * Arguments - * ========= - * - * STREAM (local input) FILE * - * On entry, STREAM specifies the output stream. - * - * LINE (local input) int - * On entry, LINE specifies the line number in the file where - * the error has occured. When LINE is not a positive line - * number, it is ignored. - * - * SRNAME (local input) const char * - * On entry, SRNAME should be the name of the routine calling - * this error handler. - * - * FORM (local input) const char * - * On entry, FORM specifies the format, i.e., how the subsequent - * arguments are converted for output. - * - * (local input) ... - * On entry, ... is the list of arguments to be printed within - * the format string. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. 
- */ - va_list argptr; - char cline[128]; -#ifndef STDC_HEADERS - FILE * STREAM; - int LINE; - char * FORM, * SRNAME; -#endif -/* .. - * .. Executable Statements .. - */ -#ifdef STDC_HEADERS - va_start( argptr, FORM ); -#else - va_start( argptr ); - STREAM = va_arg( argptr, FILE * ); - LINE = va_arg( argptr, int ); - SRNAME = va_arg( argptr, char * ); - FORM = va_arg( argptr, char * ); -#endif - (void) vsprintf( cline, FORM, argptr ); - va_end( argptr ); -/* - * Display an error message - */ - if( LINE <= 0 ) - HPL_fprintf( STREAM, "%s %s:\n>>> %s <<<\n\n", "HPL ERROR in function", - SRNAME, cline ); - else - HPL_fprintf( STREAM, "%s %d %s %s:\n>>> %s <<<\n\n", - "HPL ERROR on line", LINE, "of function", SRNAME, cline ); -/* - * End of HPL_warn - */ -} diff --git a/hpl/src/auxil/intel64/Make.inc b/hpl/src/auxil/intel64/Make.inc deleted file mode 120000 index c1d8167d071e028cbeff544106b9f386bbdcac8a..0000000000000000000000000000000000000000 --- a/hpl/src/auxil/intel64/Make.inc +++ /dev/null @@ -1 +0,0 @@ -/home/valeriuc/work/PRACE/CodeVault/src/1_dense/hpl/Make.intel64 \ No newline at end of file diff --git a/hpl/src/auxil/intel64/Makefile b/hpl/src/auxil/intel64/Makefile deleted file mode 100644 index bb6b1cd68995bfb5e66b9440341ce02315934d94..0000000000000000000000000000000000000000 --- a/hpl/src/auxil/intel64/Makefile +++ /dev/null @@ -1,100 +0,0 @@ -# -# -- High Performance Computing Linpack Benchmark (HPL) -# HPL - 2.1 - October 26, 2012 -# Antoine P. Petitet -# University of Tennessee, Knoxville -# Innovative Computing Laboratory -# (C) Copyright 2000-2008 All Rights Reserved -# -# -- Copyright notice and Licensing terms: -# -# Redistribution and use in source and binary forms, with or without -# modification, are permitted provided that the following conditions -# are met: -# -# 1. Redistributions of source code must retain the above copyright -# notice, this list of conditions and the following disclaimer. -# -# 2. 
Redistributions in binary form must reproduce the above copyright -# notice, this list of conditions, and the following disclaimer in the -# documentation and/or other materials provided with the distribution. -# -# 3. All advertising materials mentioning features or use of this -# software must display the following acknowledgement: -# This product includes software developed at the University of -# Tennessee, Knoxville, Innovative Computing Laboratory. -# -# 4. The name of the University, the name of the Laboratory, or the -# names of its contributors may not be used to endorse or promote -# products derived from this software without specific written -# permission. -# -# -- Disclaimer: -# -# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS -# ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR -# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY -# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, -# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT -# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, -# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY -# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT -# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
-# ###################################################################### -# -include Make.inc -# -# ###################################################################### -# -INCdep = \ - $(INCdir)/hpl_misc.h $(INCdir)/hpl_blas.h $(INCdir)/hpl_auxil.h -# -## Object files ######################################################## -# -HPL_au0obj = \ - HPL_dlacpy.o HPL_dlatcpy.o HPL_fprintf.o \ - HPL_warn.o HPL_abort.o HPL_dlaprnt.o \ - HPL_dlange.o -HPL_au1obj = \ - HPL_dlamch.o -HPL_auxobj = \ - $(HPL_au0obj) $(HPL_au1obj) -# -## Targets ############################################################# -# -all : lib -# -lib : lib.grd -# -lib.grd : $(HPL_auxobj) - $(ARCHIVER) $(ARFLAGS) $(HPLlib) $(HPL_auxobj) - $(RANLIB) $(HPLlib) - $(TOUCH) lib.grd -# -# ###################################################################### -# -HPL_dlacpy.o : ../HPL_dlacpy.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_dlacpy.c -HPL_dlatcpy.o : ../HPL_dlatcpy.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_dlatcpy.c -HPL_fprintf.o : ../HPL_fprintf.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_fprintf.c -HPL_warn.o : ../HPL_warn.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_warn.c -HPL_abort.o : ../HPL_abort.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_abort.c -HPL_dlaprnt.o : ../HPL_dlaprnt.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_dlaprnt.c -HPL_dlange.o : ../HPL_dlange.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_dlange.c -HPL_dlamch.o : ../HPL_dlamch.c $(INCdep) - $(CC) -o $@ -c $(CCNOOPT) ../HPL_dlamch.c -# -# ###################################################################### -# -clean : - $(RM) *.o *.grd -# -# ###################################################################### diff --git a/hpl/src/auxil/intel64/lib.grd b/hpl/src/auxil/intel64/lib.grd deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/hpl/src/blas/HPL_daxpy.c b/hpl/src/blas/HPL_daxpy.c deleted file mode 100644 index 
1436581585cddeec43fcea9587b2544128ebc0d4..0000000000000000000000000000000000000000 --- a/hpl/src/blas/HPL_daxpy.c +++ /dev/null @@ -1,175 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifndef HPL_daxpy - -#ifdef STDC_HEADERS -void HPL_daxpy -( - const int N, - const double ALPHA, - const double * X, - const int INCX, - double * Y, - const int INCY -) -#else -void HPL_daxpy -( N, ALPHA, X, INCX, Y, INCY ) - const int N; - const double ALPHA; - const double * X; - const int INCX; - double * Y; - const int INCY; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_daxpy scales the vector x by alpha and adds it to y. - * - * - * Arguments - * ========= - * - * N (local input) const int - * On entry, N specifies the length of the vectors x and y. N - * must be at least zero. - * - * ALPHA (local input) const double - * On entry, ALPHA specifies the scalar alpha. When ALPHA is - * supplied as zero, then the entries of the incremented array X - * need not be set on input. - * - * X (local input) const double * - * On entry, X is an incremented array of dimension at least - * ( 1 + ( n - 1 ) * abs( INCX ) ) that contains the vector x. - * - * INCX (local input) const int - * On entry, INCX specifies the increment for the elements of X. - * INCX must not be zero. - * - * Y (local input/output) double * - * On entry, Y is an incremented array of dimension at least - * ( 1 + ( n - 1 ) * abs( INCY ) ) that contains the vector y. 
- * On exit, the entries of the incremented array Y are updated - * with the scaled entries of the incremented array X. - * - * INCY (local input) const int - * On entry, INCY specifies the increment for the elements of Y. - * INCY must not be zero. - * - * --------------------------------------------------------------------- - */ -#ifdef HPL_CALL_CBLAS - cblas_daxpy( N, ALPHA, X, INCX, Y, INCY ); -#endif -#ifdef HPL_CALL_VSIPL - register const double alpha = ALPHA; - register double x0, x1, x2, x3, y0, y1, y2, y3; - const double * StX; - register int i; - int nu; - const int incX2 = 2 * INCX, incY2 = 2 * INCY, - incX3 = 3 * INCX, incY3 = 3 * INCY, - incX4 = 4 * INCX, incY4 = 4 * INCY; - - if( ( N > 0 ) && ( alpha != HPL_rzero ) ) - { - if( ( nu = ( N >> 2 ) << 2 ) != 0 ) - { - StX = X + nu * INCX; - - do - { - x0 = (*X); y0 = (*Y); x1 = X[INCX ]; y1 = Y[INCY ]; - x2 = X[incX2]; y2 = Y[incY2]; x3 = X[incX3]; y3 = Y[incY3]; - - *Y = y0 + alpha * x0; Y[INCY ] = y1 + alpha * x1; - Y[incY2] = y2 + alpha * x2; Y[incY3] = y3 + alpha * x3; - - X += incX4; - Y += incY4; - - } while( X != StX ); - } - - for( i = N - nu; i != 0; i-- ) - { - x0 = (*X); - y0 = (*Y); - - *Y = y0 + alpha * x0; - - X += INCX; - Y += INCY; - } - } -#endif -#ifdef HPL_CALL_FBLAS - double alpha = ALPHA; -#ifdef HPL_USE_F77_INTEGER_DEF - const F77_INTEGER F77N = N, F77incx = INCX, F77incy = INCY; -#else -#define F77N N -#define F77incx INCX -#define F77incy INCY -#endif - F77daxpy( &F77N, &alpha, X, &F77incx, Y, &F77incy ); -#endif -/* - * End of HPL_daxpy - */ -} - -#endif diff --git a/hpl/src/blas/HPL_dcopy.c b/hpl/src/blas/HPL_dcopy.c deleted file mode 100644 index 7eae7e3b2eaaa9556198481c1ca3cf6c0f0b0b82..0000000000000000000000000000000000000000 --- a/hpl/src/blas/HPL_dcopy.c +++ /dev/null @@ -1,168 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. 
Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifndef HPL_dcopy - -#ifdef STDC_HEADERS -void HPL_dcopy -( - const int N, - const double * X, - const int INCX, - double * Y, - const int INCY -) -#else -void HPL_dcopy -( N, X, INCX, Y, INCY ) - const int N; - const double * X; - const int INCX; - double * Y; - const int INCY; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_dcopy copies the vector x into the vector y. - * - * - * Arguments - * ========= - * - * N (local input) const int - * On entry, N specifies the length of the vectors x and y. N - * must be at least zero. - * - * X (local input) const double * - * On entry, X is an incremented array of dimension at least - * ( 1 + ( n - 1 ) * abs( INCX ) ) that contains the vector x. - * - * INCX (local input) const int - * On entry, INCX specifies the increment for the elements of X. - * INCX must not be zero. - * - * Y (local input/output) double * - * On entry, Y is an incremented array of dimension at least - * ( 1 + ( n - 1 ) * abs( INCY ) ) that contains the vector y. - * On exit, the entries of the incremented array Y are updated - * with the entries of the incremented array X. - * - * INCY (local input) const int - * On entry, INCY specifies the increment for the elements of Y. - * INCY must not be zero. 
- * - * --------------------------------------------------------------------- - */ -#ifdef HPL_CALL_CBLAS - cblas_dcopy( N, X, INCX, Y, INCY ); -#endif -#ifdef HPL_CALL_VSIPL - register double x0, x1, x2, x3, x4, x5, x6, x7; - const double * StX; - register int i; - int nu; - const int incX2 = 2 * INCX, incY2 = 2 * INCY, - incX3 = 3 * INCX, incY3 = 3 * INCY, - incX4 = 4 * INCX, incY4 = 4 * INCY, - incX5 = 5 * INCX, incY5 = 5 * INCY, - incX6 = 6 * INCX, incY6 = 6 * INCY, - incX7 = 7 * INCX, incY7 = 7 * INCY, - incX8 = 8 * INCX, incY8 = 8 * INCY; - - if( N > 0 ) - { - if( ( nu = ( N >> 3 ) << 3 ) != 0 ) - { - StX = X + nu * INCX; - - do - { - x0 = (*X); x4 = X[incX4]; x1 = X[INCX ]; x5 = X[incX5]; - x2 = X[incX2]; x6 = X[incX6]; x3 = X[incX3]; x7 = X[incX7]; - - *Y = x0; Y[incY4] = x4; Y[INCY ] = x1; Y[incY5] = x5; - Y[incY2] = x2; Y[incY6] = x6; Y[incY3] = x3; Y[incY7] = x7; - - X += incX8; - Y += incY8; - - } while( X != StX ); - } - - for( i = N - nu; i != 0; i-- ) - { - x0 = (*X); - *Y = x0; - - X += INCX; - Y += INCY; - } - } -#endif -#ifdef HPL_CALL_FBLAS -#ifdef HPL_USE_F77_INTEGER_DEF - const F77_INTEGER F77N = N, F77incx = INCX, F77incy = INCY; -#else -#define F77N N -#define F77incx INCX -#define F77incy INCY -#endif - F77dcopy( &F77N, X, &F77incx, Y, &F77incy ); -#endif -/* - * End of HPL_dcopy - */ -} - -#endif diff --git a/hpl/src/blas/HPL_dgemm.c b/hpl/src/blas/HPL_dgemm.c deleted file mode 100644 index 45df67a85d98966e9272da7dd0fd2964ccc85577..0000000000000000000000000000000000000000 --- a/hpl/src/blas/HPL_dgemm.c +++ /dev/null @@ -1,521 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. 
Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifndef HPL_dgemm - -#ifdef HPL_CALL_VSIPL - -#ifdef STDC_HEADERS -static void HPL_dgemmNN -( - const int M, - const int N, - const int K, - const double ALPHA, - const double * A, - const int LDA, - const double * B, - const int LDB, - const double BETA, - double * C, - const int LDC -) -#else -static void HPL_dgemmNN( M, N, K, ALPHA, A, LDA, B, LDB, BETA, C, LDC ) - const int K, LDA, LDB, LDC, M, N; - const double ALPHA, BETA; - const double * A, * B; - double * C; -#endif -{ - register double t0; - int i, iail, iblj, icij, j, jal, jbj, jcj, l; - - for( j = 0, jbj = 0, jcj = 0; j < N; j++, jbj += LDB, jcj += LDC ) - { - HPL_dscal( M, BETA, C+jcj, 1 ); - for( l = 0, jal = 0, iblj = jbj; l < K; l++, jal += LDA, iblj += 1 ) - { - t0 = ALPHA * B[iblj]; - for( i = 0, iail = jal, icij = jcj; i < M; i++, iail += 1, icij += 1 ) - { C[icij] += A[iail] * t0; } - } - } -} - -#ifdef STDC_HEADERS -static void HPL_dgemmNT -( - const int M, - const int N, - const int K, - const double ALPHA, - const double * A, - const int LDA, - const double * B, - const int LDB, - const double BETA, - double * C, - const int LDC -) -#else -static void HPL_dgemmNT( M, N, K, ALPHA, A, LDA, B, LDB, BETA, C, LDC ) - const int K, LDA, LDB, LDC, M, N; - const double ALPHA, BETA; - const double * A, * B; - double * C; -#endif -{ - register double 
t0; - int i, iail, ibj, ibjl, icij, j, jal, jcj, l; - - for( j = 0, ibj = 0, jcj = 0; j < N; j++, ibj += 1, jcj += LDC ) - { - HPL_dscal( M, BETA, C+jcj, 1 ); - for( l = 0, jal = 0, ibjl = ibj; l < K; l++, jal += LDA, ibjl += LDB ) - { - t0 = ALPHA * B[ibjl]; - for( i = 0, iail = jal, icij = jcj; i < M; i++, iail += 1, icij += 1 ) - { C[icij] += A[iail] * t0; } - } - } -} - -#ifdef STDC_HEADERS -static void HPL_dgemmTN -( - const int M, - const int N, - const int K, - const double ALPHA, - const double * A, - const int LDA, - const double * B, - const int LDB, - const double BETA, - double * C, - const int LDC -) -#else -static void HPL_dgemmTN( M, N, K, ALPHA, A, LDA, B, LDB, BETA, C, LDC ) - const int K, LDA, LDB, LDC, M, N; - const double ALPHA, BETA; - const double * A, * B; - double * C; -#endif -{ - register double t0; - int i, iai, iail, iblj, icij, j, jbj, jcj, l; - - for( j = 0, jbj = 0, jcj = 0; j < N; j++, jbj += LDB, jcj += LDC ) - { - for( i = 0, icij = jcj, iai = 0; i < M; i++, icij += 1, iai += LDA ) - { - t0 = HPL_rzero; - for( l = 0, iail = iai, iblj = jbj; l < K; l++, iail += 1, iblj += 1 ) - { t0 += A[iail] * B[iblj]; } - if( BETA == HPL_rzero ) C[icij] = HPL_rzero; - else C[icij] *= BETA; - C[icij] += ALPHA * t0; - } - } -} - -#ifdef STDC_HEADERS -static void HPL_dgemmTT -( - const int M, - const int N, - const int K, - const double ALPHA, - const double * A, - const int LDA, - const double * B, - const int LDB, - const double BETA, - double * C, - const int LDC -) -#else -static void HPL_dgemmTT( M, N, K, ALPHA, A, LDA, B, LDB, BETA, C, LDC ) - const int K, LDA, LDB, LDC, M, N; - const double ALPHA, BETA; - const double * A, * B; - double * C; -#endif -{ - register double t0; - int i, iali, ibj, ibjl, icij, j, jai, jcj, l; - - for( j = 0, ibj = 0, jcj = 0; j < N; j++, ibj += 1, jcj += LDC ) - { - for( i = 0, icij = jcj, jai = 0; i < M; i++, icij += 1, jai += LDA ) - { - t0 = HPL_rzero; - for( l = 0, iali = jai, ibjl = ibj; - l < K; l++, iali += 
1, ibjl += LDB ) t0 += A[iali] * B[ibjl]; - if( BETA == HPL_rzero ) C[icij] = HPL_rzero; - else C[icij] *= BETA; - C[icij] += ALPHA * t0; - } - } -} - -#ifdef STDC_HEADERS -static void HPL_dgemm0 -( - const enum HPL_TRANS TRANSA, - const enum HPL_TRANS TRANSB, - const int M, - const int N, - const int K, - const double ALPHA, - const double * A, - const int LDA, - const double * B, - const int LDB, - const double BETA, - double * C, - const int LDC -) -#else -static void HPL_dgemm0( TRANSA, TRANSB, M, N, K, ALPHA, A, LDA, B, LDB, - BETA, C, LDC ) - const enum HPL_TRANS TRANSA, TRANSB; - const int K, LDA, LDB, LDC, M, N; - const double ALPHA, BETA; - const double * A, * B; - double * C; -#endif -{ - int i, j; - - if( ( M == 0 ) || ( N == 0 ) || - ( ( ( ALPHA == HPL_rzero ) || ( K == 0 ) ) && - ( BETA == HPL_rone ) ) ) return; - - if( ALPHA == HPL_rzero ) - { - for( j = 0; j < N; j++ ) - { for( i = 0; i < M; i++ ) *(C+i+j*LDC) = HPL_rzero; } - return; - } - - if( TRANSB == HplNoTrans ) - { - if( TRANSA == HplNoTrans ) - { HPL_dgemmNN( M, N, K, ALPHA, A, LDA, B, LDB, BETA, C, LDC ); } - else - { HPL_dgemmTN( M, N, K, ALPHA, A, LDA, B, LDB, BETA, C, LDC ); } - } - else - { - if( TRANSA == HplNoTrans ) - { HPL_dgemmNT( M, N, K, ALPHA, A, LDA, B, LDB, BETA, C, LDC ); } - else - { HPL_dgemmTT( M, N, K, ALPHA, A, LDA, B, LDB, BETA, C, LDC ); } - } -} - -#endif - -#ifdef STDC_HEADERS -void HPL_dgemm -( - const enum HPL_ORDER ORDER, - const enum HPL_TRANS TRANSA, - const enum HPL_TRANS TRANSB, - const int M, - const int N, - const int K, - const double ALPHA, - const double * A, - const int LDA, - const double * B, - const int LDB, - const double BETA, - double * C, - const int LDC -) -#else -void HPL_dgemm -( ORDER, TRANSA, TRANSB, M, N, K, ALPHA, A, LDA, B, LDB, BETA, C, LDC ) - const enum HPL_ORDER ORDER; - const enum HPL_TRANS TRANSA; - const enum HPL_TRANS TRANSB; - const int M; - const int N; - const int K; - const double ALPHA; - const double * A; - const int LDA; - 
const double * B; - const int LDB; - const double BETA; - double * C; - const int LDC; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_dgemm performs one of the matrix-matrix operations - * - * C := alpha * op( A ) * op( B ) + beta * C - * - * where op( X ) is one of - * - * op( X ) = X or op( X ) = X^T. - * - * Alpha and beta are scalars, and A, B and C are matrices, with op(A) - * an m by k matrix, op(B) a k by n matrix and C an m by n matrix. - * - * Arguments - * ========= - * - * ORDER (local input) const enum HPL_ORDER - * On entry, ORDER specifies the storage format of the operands - * as follows: - * ORDER = HplRowMajor, - * ORDER = HplColumnMajor. - * - * TRANSA (local input) const enum HPL_TRANS - * On entry, TRANSA specifies the form of op(A) to be used in - * the matrix-matrix operation follows: - * TRANSA==HplNoTrans : op( A ) = A, - * TRANSA==HplTrans : op( A ) = A^T, - * TRANSA==HplConjTrans : op( A ) = A^T. - * - * TRANSB (local input) const enum HPL_TRANS - * On entry, TRANSB specifies the form of op(B) to be used in - * the matrix-matrix operation follows: - * TRANSB==HplNoTrans : op( B ) = B, - * TRANSB==HplTrans : op( B ) = B^T, - * TRANSB==HplConjTrans : op( B ) = B^T. - * - * M (local input) const int - * On entry, M specifies the number of rows of the matrix - * op(A) and of the matrix C. M must be at least zero. - * - * N (local input) const int - * On entry, N specifies the number of columns of the matrix - * op(B) and the number of columns of the matrix C. N must be - * at least zero. - * - * K (local input) const int - * On entry, K specifies the number of columns of the matrix - * op(A) and the number of rows of the matrix op(B). K must be - * be at least zero. - * - * ALPHA (local input) const double - * On entry, ALPHA specifies the scalar alpha. When ALPHA is - * supplied as zero then the elements of the matrices A and B - * need not be set on input. 
- * - * A (local input) const double * - * On entry, A is an array of dimension (LDA,ka), where ka is - * k when TRANSA==HplNoTrans, and is m otherwise. Before - * entry with TRANSA==HplNoTrans, the leading m by k part of - * the array A must contain the matrix A, otherwise the leading - * k by m part of the array A must contain the matrix A. - * - * LDA (local input) const int - * On entry, LDA specifies the first dimension of A as declared - * in the calling (sub) program. When TRANSA==HplNoTrans then - * LDA must be at least max(1,m), otherwise LDA must be at least - * max(1,k). - * - * B (local input) const double * - * On entry, B is an array of dimension (LDB,kb), where kb is - * n when TRANSB==HplNoTrans, and is k otherwise. Before - * entry with TRANSB==HplNoTrans, the leading k by n part of - * the array B must contain the matrix B, otherwise the leading - * n by k part of the array B must contain the matrix B. - * - * LDB (local input) const int - * On entry, LDB specifies the first dimension of B as declared - * in the calling (sub) program. When TRANSB==HplNoTrans then - * LDB must be at least max(1,k), otherwise LDB must be at least - * max(1,n). - * - * BETA (local input) const double - * On entry, BETA specifies the scalar beta. When BETA is - * supplied as zero then the elements of the matrix C need - * not be set on input. - * - * C (local input/output) double * - * On entry, C is an array of dimension (LDC,n). Before entry, - * the leading m by n part of the array C must contain the - * matrix C, except when beta is zero, in which case C need not - * be set on entry. On exit, the array C is overwritten by the - * m by n matrix ( alpha*op( A )*op( B ) + beta*C ). - * - * LDC (local input) const int - * On entry, LDC specifies the first dimension of C as declared - * in the calling (sub) program. LDC must be at least - * max(1,m). 
- * - * --------------------------------------------------------------------- - */ -#ifdef HPL_CALL_CBLAS - cblas_dgemm( ORDER, TRANSA, TRANSB, M, N, K, ALPHA, A, LDA, B, LDB, - BETA, C, LDC ); -#endif -#ifdef HPL_CALL_VSIPL - if( ORDER == HplColumnMajor ) - { - HPL_dgemm0( TRANSA, TRANSB, M, N, K, ALPHA, A, LDA, B, LDB, BETA, - C, LDC ); - } - else - { - HPL_dgemm0( TRANSB, TRANSA, N, M, K, ALPHA, B, LDB, A, LDA, BETA, - C, LDC ); - } -#endif -#ifdef HPL_CALL_FBLAS - double alpha = ALPHA, beta = BETA; -#ifdef StringSunStyle -#ifdef HPL_USE_F77_INTEGER_DEF - F77_INTEGER IONE = 1; -#else - int IONE = 1; -#endif -#endif -#ifdef StringStructVal - F77_CHAR ftransa; - F77_CHAR ftransb; -#endif -#ifdef StringStructPtr - F77_CHAR ftransa; - F77_CHAR ftransb; -#endif -#ifdef StringCrayStyle - F77_CHAR ftransa; - F77_CHAR ftransb; -#endif -#ifdef HPL_USE_F77_INTEGER_DEF - const F77_INTEGER F77M = M, F77N = N, F77K = K, - F77lda = LDA, F77ldb = LDB, F77ldc = LDC; -#else -#define F77M M -#define F77N N -#define F77K K -#define F77lda LDA -#define F77ldb LDB -#define F77ldc LDC -#endif - char ctransa, ctransb; - - if( TRANSA == HplNoTrans ) ctransa = 'N'; - else if( TRANSA == HplTrans ) ctransa = 'T'; - else ctransa = 'C'; - - if( TRANSB == HplNoTrans ) ctransb = 'N'; - else if( TRANSB == HplTrans ) ctransb = 'T'; - else ctransb = 'C'; - - if( ORDER == HplColumnMajor ) - { -#ifdef StringSunStyle - F77dgemm( &ctransa, &ctransb, &F77M, &F77N, &F77K, &alpha, A, &F77lda, - B, &F77ldb, &beta, C, &F77ldc, IONE, IONE ); -#endif -#ifdef StringCrayStyle - ftransa = HPL_C2F_CHAR( ctransa ); ftransb = HPL_C2F_CHAR( ctransb ); - F77dgemm( ftransa, ftransb, &F77M, &F77N, &F77K, &alpha, A, &F77lda, - B, &F77ldb, &beta, C, &F77ldc ); -#endif -#ifdef StringStructVal - ftransa.len = 1; ftransa.cp = &ctransa; - ftransb.len = 1; ftransb.cp = &ctransb; - F77dgemm( ftransa, ftransb, &F77M, &F77N, &F77K, &alpha, A, &F77lda, - B, &F77ldb, &beta, C, &F77ldc ); -#endif -#ifdef StringStructPtr - 
ftransa.len = 1; ftransa.cp = &ctransa; - ftransb.len = 1; ftransb.cp = &ctransb; - F77dgemm( &ftransa, &ftransb, &F77M, &F77N, &F77K, &alpha, A, &F77lda, - B, &F77ldb, &beta, C, &F77ldc ); -#endif - } - else - { -#ifdef StringSunStyle - F77dgemm( &ctransb, &ctransa, &F77N, &F77M, &F77K, &alpha, B, &F77ldb, - A, &F77lda, &beta, C, &F77ldc, IONE, IONE ); -#endif -#ifdef StringCrayStyle - ftransa = HPL_C2F_CHAR( ctransa ); ftransb = HPL_C2F_CHAR( ctransb ); - F77dgemm( ftransb, ftransa, &F77N, &F77M, &F77K, &alpha, B, &F77ldb, - A, &F77lda, &beta, C, &F77ldc ); -#endif -#ifdef StringStructVal - ftransa.len = 1; ftransa.cp = &ctransa; - ftransb.len = 1; ftransb.cp = &ctransb; - F77dgemm( ftransb, ftransa, &F77N, &F77M, &F77K, &alpha, B, &F77ldb, - A, &F77lda, &beta, C, &F77ldc ); -#endif -#ifdef StringStructPtr - ftransa.len = 1; ftransa.cp = &ctransa; - ftransb.len = 1; ftransb.cp = &ctransb; - F77dgemm( &ftransb, &ftransa, &F77N, &F77M, &F77K, &alpha, B, &F77ldb, - A, &F77lda, &beta, C, &F77ldc ); -#endif - } -#endif -/* - * End of HPL_dgemm - */ -} - -#endif diff --git a/hpl/src/blas/HPL_dgemv.c b/hpl/src/blas/HPL_dgemv.c deleted file mode 100644 index ce4f5440bb10973f4e40781172940ee95e59feec..0000000000000000000000000000000000000000 --- a/hpl/src/blas/HPL_dgemv.c +++ /dev/null @@ -1,326 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. 
Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
- * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifndef HPL_dgemv - -#ifdef HPL_CALL_VSIPL - -#ifdef STDC_HEADERS -static void HPL_dgemv0 -( - const enum HPL_TRANS TRANS, - const int M, - const int N, - const double ALPHA, - const double * A, - const int LDA, - const double * X, - const int INCX, - const double BETA, - double * Y, - const int INCY -) -#else -static void HPL_dgemv0( TRANS, M, N, ALPHA, A, LDA, X, INCX, BETA, Y, INCY ) - const enum HPL_TRANS TRANS; - const int INCX, INCY, LDA, M, N; - const double ALPHA, BETA; - const double * A, * X; - double * Y; -#endif -{ -/* - * .. Local Variables .. - */ - int i, iaij, ix, iy, j, jaj, jx, jy; - register double t0; -/* .. - * .. Executable Statements .. - */ - if( ( M == 0 ) || ( N == 0 ) || - ( ( ALPHA == HPL_rzero ) && ( BETA == HPL_rone ) ) ) return; - - if( ALPHA == HPL_rzero ) { HPL_dscal( M, BETA, Y, INCY ); return; } - - if( TRANS == HplNoTrans ) - { - HPL_dscal( M, BETA, Y, INCY ); - for( j = 0, jaj = 0, jx = 0; j < N; j++, jaj += LDA, jx += INCX ) - { - t0 = ALPHA * X[jx]; - for( i = 0, iaij = jaj, iy = 0; i < M; i++, iaij += 1, iy += INCY ) - { Y[iy] += A[iaij] * t0; } - } - } - else - { - for( j = 0, jaj = 0, jy = 0; j < N; j++, jaj += LDA, jy += INCY ) - { - t0 = HPL_rzero; - for( i = 0, iaij = jaj, ix = 0; i < M; i++, iaij += 1, ix += INCX ) - { t0 += A[iaij] * X[ix]; } - if( BETA == HPL_rzero ) Y[jy] = ALPHA * t0; - else Y[jy] = BETA * Y[jy] + ALPHA * t0; - } - } -} -#endif - -#ifdef STDC_HEADERS -void HPL_dgemv -( - const enum HPL_ORDER ORDER, - const enum HPL_TRANS TRANS, - const int M, - const int N, - const double ALPHA, - const double * A, - const int LDA, - const double * X, - const int INCX, - const double BETA, - double * Y, - const int INCY -) -#else -void HPL_dgemv -( ORDER, TRANS, M, N, ALPHA, A, LDA, X, INCX, BETA, Y, INCY ) - const enum HPL_ORDER ORDER; - const enum HPL_TRANS TRANS; - const int M; - const 
int N; - const double ALPHA; - const double * A; - const int LDA; - const double * X; - const int INCX; - const double BETA; - double * Y; - const int INCY; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_dgemv performs one of the matrix-vector operations - * - * y := alpha * op( A ) * x + beta * y, - * - * where op( X ) is one of - * - * op( X ) = X or op( X ) = X^T. - * - * where alpha and beta are scalars, x and y are vectors and A is an m - * by n matrix. - * - * Arguments - * ========= - * - * ORDER (local input) const enum HPL_ORDER - * On entry, ORDER specifies the storage format of the operands - * as follows: - * ORDER = HplRowMajor, - * ORDER = HplColumnMajor. - * - * TRANS (local input) const enum HPL_TRANS - * On entry, TRANS specifies the operation to be performed as - * follows: - * TRANS = HplNoTrans y := alpha*A *x + beta*y, - * TRANS = HplTrans y := alpha*A^T*x + beta*y. - * - * M (local input) const int - * On entry, M specifies the number of rows of the matrix A. - * M must be at least zero. - * - * N (local input) const int - * On entry, N specifies the number of columns of the matrix A. - * N must be at least zero. - * - * ALPHA (local input) const double - * On entry, ALPHA specifies the scalar alpha. When ALPHA is - * supplied as zero then A and X need not be set on input. - * - * A (local input) const double * - * On entry, A points to an array of size equal to or greater - * than LDA * n. Before entry, the leading m by n part of the - * array A must contain the matrix coefficients. - * - * LDA (local input) const int - * On entry, LDA specifies the leading dimension of A as - * declared in the calling (sub) program. LDA must be at - * least MAX(1,m). - * - * X (local input) const double * - * On entry, X is an incremented array of dimension at least - * ( 1 + ( n - 1 ) * abs( INCX ) ) that contains the vector x. - * - * INCX (local input) const int - * On entry, INCX specifies the increment for the elements of X. 
- * INCX must not be zero. - * - * BETA (local input) const double - * On entry, BETA specifies the scalar beta. When ALPHA is - * supplied as zero then Y need not be set on input. - * - * Y (local input/output) double * - * On entry, Y is an incremented array of dimension at least - * ( 1 + ( n - 1 ) * abs( INCY ) ) that contains the vector y. - * Before entry with BETA non-zero, the incremented array Y must - * contain the vector y. On exit, Y is overwritten by the - * updated vector y. - * - * INCY (local input) const int - * On entry, INCY specifies the increment for the elements of Y. - * INCY must not be zero. - * - * --------------------------------------------------------------------- - */ -#ifdef HPL_CALL_CBLAS - cblas_dgemv( ORDER, TRANS, M, N, ALPHA, A, LDA, X, INCX, BETA, Y, INCY ); -#endif -#ifdef HPL_CALL_VSIPL - if( ORDER == HplColumnMajor ) - { - HPL_dgemv0( TRANS, M, N, ALPHA, A, LDA, X, INCX, BETA, Y, INCY ); - } - else - { - HPL_dgemv0( ( TRANS == HplNoTrans ? HplTrans : HplNoTrans ), - N, M, ALPHA, A, LDA, X, INCX, BETA, Y, INCY ); - } -#endif -#ifdef HPL_CALL_FBLAS - double alpha = ALPHA, beta = BETA; -#ifdef StringSunStyle -#ifdef HPL_USE_F77_INTEGER_DEF - F77_INTEGER IONE = 1; -#else - int IONE = 1; -#endif -#endif -#ifdef StringStructVal - F77_CHAR ftran; -#endif -#ifdef StringStructPtr - F77_CHAR ftran; -#endif -#ifdef StringCrayStyle - F77_CHAR ftran; -#endif - -#ifdef HPL_USE_F77_INTEGER_DEF - const F77_INTEGER F77M = M, F77N = N, - F77lda = LDA, F77incx = INCX, F77incy = INCY; -#else -#define F77M M -#define F77N N -#define F77lda LDA -#define F77incx INCX -#define F77incy INCY -#endif - char ctran; - - if( ORDER == HplColumnMajor ) - { - ctran = ( TRANS == HplNoTrans ? 
'N' : 'T' ); - -#ifdef StringSunStyle - F77dgemv( &ctran, &F77M, &F77N, &alpha, A, &F77lda, X, &F77incx, - &beta, Y, &F77incy, IONE ); -#endif -#ifdef StringCrayStyle - ftran = HPL_C2F_CHAR( ctran ); - F77dgemv( ftran, &F77M, &F77N, &alpha, A, &F77lda, X, &F77incx, - &beta, Y, &F77incy ); -#endif -#ifdef StringStructVal - ftran.len = 1; ftran.cp = &ctran; - F77dgemv( ftran, &F77M, &F77N, &alpha, A, &F77lda, X, &F77incx, - &beta, Y, &F77incy ); -#endif -#ifdef StringStructPtr - ftran.len = 1; ftran.cp = &ctran; - F77dgemv( &ftran, &F77M, &F77N, &alpha, A, &F77lda, X, &F77incx, - &beta, Y, &F77incy ); -#endif - } - else - { - ctran = ( TRANS == HplNoTrans ? 'T' : 'N' ); -#ifdef StringSunStyle - F77dgemv( &ctran, &F77N, &F77M, &alpha, A, &F77lda, X, &F77incx, - &beta, Y, &F77incy, IONE ); -#endif -#ifdef StringCrayStyle - ftran = HPL_C2F_CHAR( ctran ); - F77dgemv( ftran, &F77N, &F77M, &alpha, A, &F77lda, X, &F77incx, - &beta, Y, &F77incy ); -#endif -#ifdef StringStructVal - ftran.len = 1; ftran.cp = &ctran; - F77dgemv( ftran, &F77N, &F77M, &alpha, A, &F77lda, X, &F77incx, - &beta, Y, &F77incy ); -#endif -#ifdef StringStructPtr - ftran.len = 1; ftran.cp = &ctran; - F77dgemv( &ftran, &F77N, &F77M, &alpha, A, &F77lda, X, &F77incx, - &beta, Y, &F77incy ); -#endif - } - -#endif -/* - * End of HPL_dgemv - */ -} - -#endif diff --git a/hpl/src/blas/HPL_dger.c b/hpl/src/blas/HPL_dger.c deleted file mode 100644 index 9813de1317d96ce5db0afb6675239e3ad174f798..0000000000000000000000000000000000000000 --- a/hpl/src/blas/HPL_dger.c +++ /dev/null @@ -1,195 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. 
Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifndef HPL_dger - -#ifdef STDC_HEADERS -void HPL_dger -( - const enum HPL_ORDER ORDER, - const int M, - const int N, - const double ALPHA, - const double * X, - const int INCX, - double * Y, - const int INCY, - double * A, - const int LDA -) -#else -void HPL_dger -( ORDER, M, N, ALPHA, X, INCX, Y, INCY, A, LDA ) - const enum HPL_ORDER ORDER; - const int M; - const int N; - const double ALPHA; - const double * X; - const int INCX; - double * Y; - const int INCY; - double * A; - const int LDA; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_dger performs the rank 1 operation - * - * A := alpha * x * y^T + A, - * - * where alpha is a scalar, x is an m-element vector, y is an n-element - * vector and A is an m by n matrix. - * - * Arguments - * ========= - * - * ORDER (local input) const enum HPL_ORDER - * On entry, ORDER specifies the storage format of the operands - * as follows: - * ORDER = HplRowMajor, - * ORDER = HplColumnMajor. - * - * M (local input) const int - * On entry, M specifies the number of rows of the matrix A. - * M must be at least zero. - * - * N (local input) const int - * On entry, N specifies the number of columns of the matrix A. - * N must be at least zero. - * - * ALPHA (local input) const double - * On entry, ALPHA specifies the scalar alpha. 
When ALPHA is - * supplied as zero then X and Y need not be set on input. - * - * X (local input) const double * - * On entry, X is an incremented array of dimension at least - * ( 1 + ( m - 1 ) * abs( INCX ) ) that contains the vector x. - * - * INCX (local input) const int - * On entry, INCX specifies the increment for the elements of X. - * INCX must not be zero. - * - * Y (local input) double * - * On entry, Y is an incremented array of dimension at least - * ( 1 + ( n - 1 ) * abs( INCY ) ) that contains the vector y. - * - * INCY (local input) const int - * On entry, INCY specifies the increment for the elements of Y. - * INCY must not be zero. - * - * A (local input/output) double * - * On entry, A points to an array of size equal to or greater - * than LDA * n. Before entry, the leading m by n part of the - * array A must contain the matrix coefficients. On exit, A is - * overwritten by the updated matrix. - * - * LDA (local input) const int - * On entry, LDA specifies the leading dimension of A as - * declared in the calling (sub) program. LDA must be at - * least MAX(1,m). 
- * - * --------------------------------------------------------------------- - */ -#ifdef HPL_CALL_CBLAS - cblas_dger( ORDER, M, N, ALPHA, X, INCX, Y, INCY, A, LDA ); -#endif -#ifdef HPL_CALL_VSIPL - register double t0; - int i, iaij, ix, iy, j, jaj, jx, jy; - - if( ( M == 0 ) || ( N == 0 ) || ( ALPHA == HPL_rzero ) ) return; - - if( ORDER == HplColumnMajor ) - { - for( j = 0, jaj = 0, jy = 0; j < N; j++, jaj += LDA, jy += INCY ) - { - t0 = ALPHA * Y[jy]; - for( i = 0, iaij = jaj, ix = 0; i < M; i++, iaij += 1, ix += INCX ) - { A[iaij] += X[ix] * t0; } - } - } - else - { - for( j = 0, jaj = 0, jx = 0; j < M; j++, jaj += LDA, jx += INCX ) - { - t0 = ALPHA * X[jx]; - for( i = 0, iaij = jaj, iy = 0; i < N; i++, iaij += 1, iy += INCY ) - { A[iaij] += Y[iy] * t0; } - } - } -#endif -#ifdef HPL_CALL_FBLAS - double alpha = ALPHA; -#ifdef HPL_USE_F77_INTEGER_DEF - const F77_INTEGER F77M = M, F77N = N, - F77lda = LDA, F77incx = INCX, F77incy = INCY; -#else -#define F77M M -#define F77N N -#define F77lda LDA -#define F77incx INCX -#define F77incy INCY -#endif - - if( ORDER == HplColumnMajor ) - { F77dger( &F77M, &F77N, &alpha, X, &F77incx, Y, &F77incy, A, &F77lda ); } - else - { F77dger( &F77N, &F77M, &alpha, Y, &F77incy, X, &F77incx, A, &F77lda ); } -#endif -/* - * End of HPL_dger - */ -} - -#endif diff --git a/hpl/src/blas/HPL_dscal.c b/hpl/src/blas/HPL_dscal.c deleted file mode 100644 index 2b798ec459cf3b3d868f1bf63d641e0935de03e9..0000000000000000000000000000000000000000 --- a/hpl/src/blas/HPL_dscal.c +++ /dev/null @@ -1,179 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. 
Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifndef HPL_dscal - -#ifdef STDC_HEADERS -void HPL_dscal -( - const int N, - const double ALPHA, - double * X, - const int INCX -) -#else -void HPL_dscal -( N, ALPHA, X, INCX ) - const int N; - const double ALPHA; - double * X; - const int INCX; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_dscal scales the vector x by alpha. - * - * - * Arguments - * ========= - * - * N (local input) const int - * On entry, N specifies the length of the vector x. N must be - * at least zero. - * - * ALPHA (local input) const double - * On entry, ALPHA specifies the scalar alpha. When ALPHA is - * supplied as zero, then the entries of the incremented array X - * need not be set on input. - * - * X (local input/output) double * - * On entry, X is an incremented array of dimension at least - * ( 1 + ( n - 1 ) * abs( INCX ) ) that contains the vector x. - * On exit, the entries of the incremented array X are scaled - * by the scalar alpha. - * - * INCX (local input) const int - * On entry, INCX specifies the increment for the elements of X. - * INCX must not be zero. 
- * - * --------------------------------------------------------------------- - */ -#ifdef HPL_CALL_CBLAS - cblas_dscal( N, ALPHA, X, INCX ); -#endif -#ifdef HPL_CALL_VSIPL - register double x0, x1, x2, x3, x4, x5, x6, x7; - register const double alpha = ALPHA; - const double * StX; - register int i; - int nu; - const int incX2 = 2 * INCX, incX3 = 3 * INCX, - incX4 = 4 * INCX, incX5 = 5 * INCX, - incX6 = 6 * INCX, incX7 = 7 * INCX, - incX8 = 8 * INCX; - - if( ( N > 0 ) && ( alpha != HPL_rone ) ) - { - if( alpha == HPL_rzero ) - { - if( ( nu = ( N >> 3 ) << 3 ) != 0 ) - { - StX = (double *)X + nu * INCX; - - do - { - (*X) = HPL_rzero; X[incX4] = HPL_rzero; - X[INCX ] = HPL_rzero; X[incX5] = HPL_rzero; - X[incX2] = HPL_rzero; X[incX6] = HPL_rzero; - X[incX3] = HPL_rzero; X[incX7] = HPL_rzero; X += incX8; - - } while( X != StX ); - } - - for( i = N - nu; i != 0; i-- ) { *X = HPL_rzero; X += INCX; } - } - else - { - if( ( nu = ( N >> 3 ) << 3 ) != 0 ) - { - StX = X + nu * INCX; - - do - { - x0 = (*X); x4 = X[incX4]; x1 = X[INCX ]; x5 = X[incX5]; - x2 = X[incX2]; x6 = X[incX6]; x3 = X[incX3]; x7 = X[incX7]; - - x0 *= alpha; x4 *= alpha; x1 *= alpha; x5 *= alpha; - x2 *= alpha; x6 *= alpha; x3 *= alpha; x7 *= alpha; - - (*X) = x0; X[incX4] = x4; X[INCX ] = x1; X[incX5] = x5; - X[incX2] = x2; X[incX6] = x6; X[incX3] = x3; X[incX7] = x7; - - X += incX8; - - } while( X != StX ); - } - - for( i = N - nu; i != 0; i-- ) - { x0 = (*X); x0 *= alpha; *X = x0; X += INCX; } - } - } -#endif -#ifdef HPL_CALL_FBLAS - double alpha = ALPHA; -#ifdef HPL_USE_F77_INTEGER_DEF - const F77_INTEGER F77N = N, F77incx = INCX; -#else -#define F77N N -#define F77incx INCX -#endif - - F77dscal( &F77N, &alpha, X, &F77incx ); -#endif -/* - * End of HPL_dscal - */ -} - -#endif diff --git a/hpl/src/blas/HPL_dswap.c b/hpl/src/blas/HPL_dswap.c deleted file mode 100644 index 7620a048b63a4f8c369e9c70b85fb243a88806b0..0000000000000000000000000000000000000000 --- a/hpl/src/blas/HPL_dswap.c +++ /dev/null @@ 
-1,157 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifndef HPL_dswap - -#ifdef STDC_HEADERS -void HPL_dswap -( - const int N, - double * X, - const int INCX, - double * Y, - const int INCY -) -#else -void HPL_dswap -( N, X, INCX, Y, INCY ) - const int N; - double * X; - const int INCX; - double * Y; - const int INCY; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_dswap swaps the vectors x and y. - * - * - * Arguments - * ========= - * - * N (local input) const int - * On entry, N specifies the length of the vectors x and y. N - * must be at least zero. - * - * X (local input/output) double * - * On entry, X is an incremented array of dimension at least - * ( 1 + ( n - 1 ) * abs( INCX ) ) that contains the vector x. - * On exit, the entries of the incremented array X are updated - * with the entries of the incremented array Y. - * - * INCX (local input) const int - * On entry, INCX specifies the increment for the elements of X. - * INCX must not be zero. - * - * Y (local input/output) double * - * On entry, Y is an incremented array of dimension at least - * ( 1 + ( n - 1 ) * abs( INCY ) ) that contains the vector y. - * On exit, the entries of the incremented array Y are updated - * with the entries of the incremented array X. - * - * INCY (local input) const int - * On entry, INCY specifies the increment for the elements of Y. - * INCY must not be zero. 
- * - * --------------------------------------------------------------------- - */ -#ifdef HPL_CALL_CBLAS - cblas_dswap( N, X, INCX, Y, INCY ); -#endif -#ifdef HPL_CALL_VSIPL - register double x0, x1, x2, x3, y0, y1, y2, y3; - double * StX; - register int i; - int nu; - const int incX2 = 2 * INCX, incY2 = 2 * INCY, - incX3 = 3 * INCX, incY3 = 3 * INCY, - incX4 = 4 * INCX, incY4 = 4 * INCY; - - if( N > 0 ) - { - if( ( nu = ( N >> 2 ) << 2 ) != 0 ) - { - StX = X + nu * INCX; - - do - { - x0 = (*X); y0 = (*Y); x1 = X[INCX ]; y1 = Y[INCY ]; - x2 = X[incX2]; y2 = Y[incY2]; x3 = X[incX3]; y3 = Y[incY3]; - *Y = x0; *X = y0; Y[INCY ] = x1; X[INCX ] = y1; - Y[incY2] = x2; X[incX2] = y2; Y[incY3] = x3; X[incX3] = y3; - X += incX4; Y += incY4; - - } while( X != StX ); - } - - for( i = N - nu; i != 0; i-- ) - { x0 = (*X); y0 = (*Y); *Y = x0; *X = y0; X += INCX; Y += INCY; } - } -#endif -#ifdef HPL_CALL_FBLAS -#ifdef HPL_USE_F77_INTEGER_DEF - const F77_INTEGER F77N = N, F77incx = INCX, F77incy = INCY; -#else -#define F77N N -#define F77incx INCX -#define F77incy INCY -#endif - F77dswap( &F77N, X, &F77incx, Y, &F77incy ); -#endif -/* - * End of HPL_dswap - */ -} - -#endif diff --git a/hpl/src/blas/HPL_dtrsm.c b/hpl/src/blas/HPL_dtrsm.c deleted file mode 100644 index da4b06efa8465c8cb4bd07320acb2f3e306c3b58..0000000000000000000000000000000000000000 --- a/hpl/src/blas/HPL_dtrsm.c +++ /dev/null @@ -1,977 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. 
Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
- * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifndef HPL_dtrsm - -#ifdef HPL_CALL_VSIPL - -#ifdef STDC_HEADERS -static void HPL_dtrsmLLNN -( - const int M, - const int N, - const double ALPHA, - const double * A, - const int LDA, - double * B, - const int LDB -) -#else -static void HPL_dtrsmLLNN( M, N, ALPHA, A, LDA, B, LDB ) - const int LDA, LDB, M, N; - const double ALPHA; - const double * A; - double * B; -#endif -{ - int i, iaik, ibij, ibkj, j, jak, jbj, k; - - for( j = 0, jbj = 0; j < N; j++, jbj += LDB ) - { - for( i = 0, ibij= jbj; i < M; i++, ibij += 1 ) { B[ibij] *= ALPHA; } - for( k = 0, jak = 0, ibkj = jbj; k < M; k++, jak += LDA, ibkj += 1 ) - { - B[ibkj] /= A[k+jak]; - for( i = k+1, iaik = k+1+jak, ibij = k+1+jbj; - i < M; i++, iaik +=1, ibij += 1 ) - { B[ibij] -= B[ibkj] * A[iaik]; } - } - } -} - -#ifdef STDC_HEADERS -static void HPL_dtrsmLLNU -( - const int M, - const int N, - const double ALPHA, - const double * A, - const int LDA, - double * B, - const int LDB -) -#else -static void HPL_dtrsmLLNU( M, N, ALPHA, A, LDA, B, LDB ) - const int LDA, LDB, M, N; - const double ALPHA; - const double * A; - double * B; -#endif -{ - int i, iaik, ibij, ibkj, j, jak, jbj, k; - - for( j = 0, jbj = 0; j < N; j++, jbj += LDB ) - { - for( i = 0, ibij= jbj; i < M; i++, ibij += 1 ) { B[ibij] *= ALPHA; } - for( k = 0, jak = 0, ibkj = jbj; k < M; k++, jak += LDA, ibkj += 1 ) - { - for( i = k+1, iaik = k+1+jak, ibij = k+1+jbj; - i < M; i++, iaik +=1, ibij += 1 ) - { B[ibij] -= B[ibkj] * A[iaik]; } - } - } -} - -#ifdef STDC_HEADERS -static void HPL_dtrsmLLTN -( - const int M, - const int N, - const double ALPHA, - const double * A, - const int LDA, - double * B, - const int LDB -) -#else -static void HPL_dtrsmLLTN( M, N, ALPHA, A, LDA, B, LDB ) - const int LDA, LDB, M, N; - const double ALPHA; - const double * A; - double * B; -#endif -{ - register double t0; - int i, iaki, ibij, ibkj, j, 
jai, jbj, k; - - for( j = 0, jbj = 0; j < N; j++, jbj += LDB ) - { - for( i = M-1, jai = (M-1)*LDA, ibij = M-1+jbj; - i >= 0; i--, jai -= LDA, ibij -= 1 ) - { - t0 = ALPHA * B[ibij]; - for( k = i+1, iaki = i+1+jai, ibkj = i+1+jbj; - k < M; k++, iaki += 1, ibkj += 1 ) - { t0 -= A[iaki] * B[ibkj]; } - t0 /= A[i+jai]; - B[ibij] = t0; - } - } -} - -#ifdef STDC_HEADERS -static void HPL_dtrsmLLTU -( - const int M, - const int N, - const double ALPHA, - const double * A, - const int LDA, - double * B, - const int LDB -) -#else -static void HPL_dtrsmLLTU( M, N, ALPHA, A, LDA, B, LDB ) - const int LDA, LDB, M, N; - const double ALPHA; - const double * A; - double * B; -#endif -{ - register double t0; - int i, iaki, ibij, ibkj, j, jai, jbj, k; - - for( j = 0, jbj = 0; j < N; j++, jbj += LDB ) - { - for( i = M-1, jai = (M-1)*LDA, ibij = M-1+jbj; - i >= 0; i--, jai -= LDA, ibij -= 1 ) - { - t0 = ALPHA * B[ibij]; - for( k = i+1, iaki = i+1+jai, ibkj = i+1+jbj; - k < M; k++, iaki += 1, ibkj += 1 ) - { t0 -= A[iaki] * B[ibkj]; } - B[ibij] = t0; - } - } -} - -#ifdef STDC_HEADERS -static void HPL_dtrsmLUNN -( - const int M, - const int N, - const double ALPHA, - const double * A, - const int LDA, - double * B, - const int LDB -) -#else -static void HPL_dtrsmLUNN( M, N, ALPHA, A, LDA, B, LDB ) - const int LDA, LDB, M, N; - const double ALPHA; - const double * A; - double * B; -#endif -{ - int i, iaik, ibij, ibkj, j, jak, jbj, k; - - for( j = 0, jbj = 0; j < N; j++, jbj += LDB ) - { - for( i = 0, ibij = jbj; i < M; i++, ibij += 1 ) { B[ibij] *= ALPHA; } - for( k = M-1, jak = (M-1)*LDA, ibkj = M-1+jbj; - k >= 0; k--, jak -= LDA, ibkj -= 1 ) - { - B[ibkj] /= A[k+jak]; - for( i = 0, iaik = jak, ibij = jbj; - i < k; i++, iaik += 1, ibij += 1 ) - { B[ibij] -= B[ibkj] * A[iaik]; } - } - } -} - - -#ifdef STDC_HEADERS -static void HPL_dtrsmLUNU -( - const int M, - const int N, - const double ALPHA, - const double * A, - const int LDA, - double * B, - const int LDB -) -#else -static void 
HPL_dtrsmLUNU( M, N, ALPHA, A, LDA, B, LDB ) - const int LDA, LDB, M, N; - const double ALPHA; - const double * A; - double * B; -#endif -{ - int i, iaik, ibij, ibkj, j, jak, jbj, k; - - for( j = 0, jbj = 0; j < N; j++, jbj += LDB ) - { - for( i = 0, ibij = jbj; i < M; i++, ibij += 1 ) { B[ibij] *= ALPHA; } - for( k = M-1, jak = (M-1)*LDA, ibkj = M-1+jbj; - k >= 0; k--, jak -= LDA, ibkj -= 1 ) - { - for( i = 0, iaik = jak, ibij = jbj; - i < k; i++, iaik += 1, ibij += 1 ) - { B[ibij] -= B[ibkj] * A[iaik]; } - } - } -} - - -#ifdef STDC_HEADERS -static void HPL_dtrsmLUTN -( - const int M, - const int N, - const double ALPHA, - const double * A, - const int LDA, - double * B, - const int LDB -) -#else -static void HPL_dtrsmLUTN( M, N, ALPHA, A, LDA, B, LDB ) - const int LDA, LDB, M, N; - const double ALPHA; - const double * A; - double * B; -#endif -{ - int i, iaki, ibij, ibkj, j, jai, jbj, k; - register double t0; - - for( j = 0, jbj = 0; j < N; j++, jbj += LDB ) - { - for( i = 0, jai = 0, ibij = jbj; i < M; i++, jai += LDA, ibij += 1 ) - { - t0 = ALPHA * B[ibij]; - for( k = 0, iaki = jai, ibkj = jbj; k < i; k++, iaki += 1, ibkj += 1 ) - { t0 -= A[iaki] * B[ibkj]; } - t0 /= A[i+jai]; - B[ibij] = t0; - } - } -} - - -#ifdef STDC_HEADERS -static void HPL_dtrsmLUTU -( - const int M, - const int N, - const double ALPHA, - const double * A, - const int LDA, - double * B, - const int LDB -) -#else -static void HPL_dtrsmLUTU( M, N, ALPHA, A, LDA, B, LDB ) - const int LDA, LDB, M, N; - const double ALPHA; - const double * A; - double * B; -#endif -{ - register double t0; - int i, iaki, ibij, ibkj, j, jai, jbj, k; - - for( j = 0, jbj = 0; j < N; j++, jbj += LDB ) - { - for( i = 0, jai = 0, ibij = jbj; i < M; i++, jai += LDA, ibij += 1 ) - { - t0 = ALPHA * B[ibij]; - for( k = 0, iaki = jai, ibkj = jbj; k < i; k++, iaki += 1, ibkj += 1 ) - { t0 -= A[iaki] * B[ibkj]; } - B[ibij] = t0; - } - } -} - - -#ifdef STDC_HEADERS -static void HPL_dtrsmRLNN -( - const int M, - const int N, - 
const double ALPHA, - const double * A, - const int LDA, - double * B, - const int LDB -) -#else -static void HPL_dtrsmRLNN( M, N, ALPHA, A, LDA, B, LDB ) - const int LDA, LDB, M, N; - const double ALPHA; - const double * A; - double * B; -#endif -{ - int i, iakj, ibij, ibik, j, jaj, jbj, jbk, k; - - for( j = N-1, jaj = (N-1)*LDA, jbj = (N-1)*LDB; - j >= 0; j--, jaj -= LDA, jbj -= LDB ) - { - for( i = 0, ibij = jbj; i < M; i++, ibij += 1 ) { B[ibij] *= ALPHA; } - for( k = j+1, iakj = j+1+jaj, jbk = (j+1)*LDB; - k < N; k++, iakj += 1, jbk += LDB ) - { - for( i = 0, ibij = jbj, ibik = jbk; i < M; i++, ibij += 1, ibik += 1 ) - { B[ibij] -= A[iakj] * B[ibik]; } - } - for( i = 0, ibij = jbj; i < M; i++, ibij += 1 ) { B[ibij] /= A[j+jaj]; } - } -} - - -#ifdef STDC_HEADERS -static void HPL_dtrsmRLNU -( - const int M, - const int N, - const double ALPHA, - const double * A, - const int LDA, - double * B, - const int LDB -) -#else -static void HPL_dtrsmRLNU( M, N, ALPHA, A, LDA, B, LDB ) - const int LDA, LDB, M, N; - const double ALPHA; - const double * A; - double * B; -#endif -{ - int i, iakj, ibij, ibik, j, jaj, jbj, jbk, k; - - for( j = N-1, jaj = (N-1)*LDA, jbj = (N-1)*LDB; - j >= 0; j--, jaj -= LDA, jbj -= LDB ) - { - for( i = 0, ibij = jbj; i < M; i++, ibij += 1 ) { B[ibij] *= ALPHA; } - for( k = j+1, iakj = j+1+jaj, jbk = (j+1)*LDB; - k < N; k++, iakj += 1, jbk += LDB ) - { - for( i = 0, ibij = jbj, ibik = jbk; i < M; i++, ibij += 1, ibik += 1 ) - { B[ibij] -= A[iakj] * B[ibik]; } - } - } -} - - -#ifdef STDC_HEADERS -static void HPL_dtrsmRLTN -( - const int M, - const int N, - const double ALPHA, - const double * A, - const int LDA, - double * B, - const int LDB -) -#else -static void HPL_dtrsmRLTN( M, N, ALPHA, A, LDA, B, LDB ) - const int LDA, LDB, M, N; - const double ALPHA; - const double * A; - double * B; -#endif -{ - register double t0; - int i, iajk, ibij, ibik, j, jak, jbj, jbk, k; - - for( k = 0, jak = 0, jbk = 0; k < N; k++, jak += LDA, jbk += LDB ) - { - 
for( i = 0, ibik = jbk; i < M; i++, ibik += 1 ) { B[ibik] /= A[k+jak]; } - for( j = k+1, iajk = (k+1)+jak, jbj = (k+1)*LDB; - j < N; j++, iajk += 1, jbj += LDB ) - { - t0 = A[iajk]; - for( i = 0, ibij = jbj, ibik = jbk; i < M; i++, ibij += 1, ibik += 1 ) - { B[ibij] -= t0 * B[ibik]; } - } - for( i = 0, ibik = jbk; i < M; i++, ibik += 1 ) { B[ibik] *= ALPHA; } - } -} - - -#ifdef STDC_HEADERS -static void HPL_dtrsmRLTU -( - const int M, - const int N, - const double ALPHA, - const double * A, - const int LDA, - double * B, - const int LDB -) -#else -static void HPL_dtrsmRLTU( M, N, ALPHA, A, LDA, B, LDB ) - const int LDA, LDB, M, N; - const double ALPHA; - const double * A; - double * B; -#endif -{ - register double t0; - int i, iajk, ibij, ibik, j, jak, jbj, jbk, k; - - for( k = 0, jak = 0, jbk = 0; k < N; k++, jak += LDA, jbk += LDB ) - { - for( j = k+1, iajk = (k+1)+jak, jbj = (k+1)*LDB; - j < N; j++, iajk += 1, jbj += LDB ) - { - t0 = A[iajk]; - for( i = 0, ibij = jbj, ibik = jbk; i < M; i++, ibij += 1, ibik += 1 ) - { B[ibij] -= t0 * B[ibik]; } - } - for( i = 0, ibik = jbk; i < M; i++, ibik += 1 ) { B[ibik] *= ALPHA; } - } -} - - -#ifdef STDC_HEADERS -static void HPL_dtrsmRUNN -( - const int M, - const int N, - const double ALPHA, - const double * A, - const int LDA, - double * B, - const int LDB -) -#else -static void HPL_dtrsmRUNN( M, N, ALPHA, A, LDA, B, LDB ) - const int LDA, LDB, M, N; - const double ALPHA; - const double * A; - double * B; -#endif -{ - int i, iakj, ibij, ibik, j, jaj, jbj, jbk, k; - - for( j = 0, jaj = 0, jbj = 0; j < N; j++, jaj += LDA, jbj += LDB ) - { - for( i = 0, ibij = jbj; i < M; i++, ibij += 1 ) { B[ibij] *= ALPHA; } - for( k = 0, iakj = jaj, jbk = 0; k < j; k++, iakj += 1, jbk += LDB ) - { - for( i = 0, ibij = jbj, ibik = jbk; i < M; i++, ibij += 1, ibik += 1 ) - { B[ibij] -= A[iakj] * B[ibik]; } - } - for( i = 0, ibij = jbj; i < M; i++, ibij += 1 ) { B[ibij] /= A[j+jaj]; } - } -} - - -#ifdef STDC_HEADERS -static void 
HPL_dtrsmRUNU -( - const int M, - const int N, - const double ALPHA, - const double * A, - const int LDA, - double * B, - const int LDB -) -#else -static void HPL_dtrsmRUNU( M, N, ALPHA, A, LDA, B, LDB ) - const int LDA, LDB, M, N; - const double ALPHA; - const double * A; - double * B; -#endif -{ - int i, iakj, ibij, ibik, j, jaj, jbj, jbk, k; - - for( j = 0, jaj = 0, jbj = 0; j < N; j++, jaj += LDA, jbj += LDB ) - { - for( i = 0, ibij = jbj; i < M; i++, ibij += 1 ) { B[ibij] *= ALPHA; } - for( k = 0, iakj = jaj, jbk = 0; k < j; k++, iakj += 1, jbk += LDB ) - { - for( i = 0, ibij = jbj, ibik = jbk; i < M; i++, ibij += 1, ibik += 1 ) - { B[ibij] -= A[iakj] * B[ibik]; } - } - } -} - - -#ifdef STDC_HEADERS -static void HPL_dtrsmRUTN -( - const int M, - const int N, - const double ALPHA, - const double * A, - const int LDA, - double * B, - const int LDB -) -#else -static void HPL_dtrsmRUTN( M, N, ALPHA, A, LDA, B, LDB ) - const int LDA, LDB, M, N; - const double ALPHA; - const double * A; - double * B; -#endif -{ - register double t0; - int i, iajk, ibij, ibik, j, jak, jbj, jbk, k; - - for( k = N-1, jak = (N-1)*LDA, jbk = (N-1)*LDB; - k >= 0; k--, jak -= LDA, jbk -= LDB ) - { - for( i = 0, ibik = jbk; i < M; i++, ibik += 1 ) { B[ibik] /= A[k+jak]; } - for( j = 0, iajk = jak, jbj = 0; j < k; j++, iajk += 1, jbj += LDB ) - { - t0 = A[iajk]; - for( i = 0, ibij = jbj, ibik = jbk; i < M; i++, ibij += 1, ibik += 1 ) - { B[ibij] -= t0 * B[ibik]; } - } - for( i = 0, ibik = jbk; i < M; i++, ibik += 1 ) { B[ibik] *= ALPHA; } - } -} - -#ifdef STDC_HEADERS -static void HPL_dtrsmRUTU -( - const int M, - const int N, - const double ALPHA, - const double * A, - const int LDA, - double * B, - const int LDB -) -#else -static void HPL_dtrsmRUTU( M, N, ALPHA, A, LDA, B, LDB ) - const int LDA, LDB, M, N; - const double ALPHA; - const double * A; - double * B; -#endif -{ - register double t0; - int i, iajk, ibij, ibik, j, jak, jbj, jbk, k; - - for( k = N-1, jak = (N-1)*LDA, jbk = 
(N-1)*LDB; - k >= 0; k--, jak -= LDA, jbk -= LDB ) - { - for( j = 0, iajk = jak, jbj = 0; j < k; j++, iajk += 1, jbj += LDB ) - { - t0 = A[iajk]; - for( i = 0, ibij = jbj, ibik = jbk; i < M; i++, ibij += 1, ibik += 1 ) - { B[ibij] -= t0 * B[ibik]; } - } - for( i = 0, ibik = jbk; i < M; i++, ibik += 1 ) { B[ibik] *= ALPHA; } - } -} - -#ifdef STDC_HEADERS -static void HPL_dtrsm0 -( - const enum HPL_SIDE SIDE, - const enum HPL_UPLO UPLO, - const enum HPL_TRANS TRANS, - const enum HPL_DIAG DIAG, - const int M, - const int N, - const double ALPHA, - const double * A, - const int LDA, - double * B, - const int LDB -) -#else -static void HPL_dtrsm0( SIDE, UPLO, TRANS, DIAG, M, N, ALPHA, A, LDA, B, LDB ) - const enum HPL_SIDE SIDE; - const enum HPL_UPLO UPLO; - const enum HPL_TRANS TRANS; - const enum HPL_DIAG DIAG; - const int LDA, LDB, M, N; - const double ALPHA; - const double * A; - double * B; -#endif -{ - int i, j; - - if( ( M == 0 ) || ( N == 0 ) ) return; - - if( ALPHA == HPL_rzero ) - { - for( j = 0; j < N; j++ ) - { for( i = 0; i < M; i++ ) *(B+i+j*LDB) = HPL_rzero; } - return; - } - - if( SIDE == HplLeft ) - { - if( UPLO == HplUpper ) - { - if( TRANS == HplNoTrans ) - { - if( DIAG == HplNonUnit ) - { HPL_dtrsmLUNN( M, N, ALPHA, A, LDA, B, LDB ); } - else { HPL_dtrsmLUNU( M, N, ALPHA, A, LDA, B, LDB ); } - } - else - { - if( DIAG == HplNonUnit ) - { HPL_dtrsmLUTN( M, N, ALPHA, A, LDA, B, LDB ); } - else { HPL_dtrsmLUTU( M, N, ALPHA, A, LDA, B, LDB ); } - } - } - else - { - if( TRANS == HplNoTrans ) - { - if( DIAG == HplNonUnit ) - { HPL_dtrsmLLNN( M, N, ALPHA, A, LDA, B, LDB ); } - else { HPL_dtrsmLLNU( M, N, ALPHA, A, LDA, B, LDB ); } - } - else - { - if( DIAG == HplNonUnit ) - { HPL_dtrsmLLTN( M, N, ALPHA, A, LDA, B, LDB ); } - else { HPL_dtrsmLLTU( M, N, ALPHA, A, LDA, B, LDB ); } - } - } - } - else - { - if( UPLO == HplUpper ) - { - if( TRANS == HplNoTrans ) - { - if( DIAG == HplNonUnit ) - { HPL_dtrsmRUNN( M, N, ALPHA, A, LDA, B, LDB ); } - else { 
HPL_dtrsmRUNU( M, N, ALPHA, A, LDA, B, LDB ); } - } - else - { - if( DIAG == HplNonUnit ) - { HPL_dtrsmRUTN( M, N, ALPHA, A, LDA, B, LDB ); } - else { HPL_dtrsmRUTU( M, N, ALPHA, A, LDA, B, LDB ); } - } - } - else - { - if( TRANS == HplNoTrans ) - { - if( DIAG == HplNonUnit ) - { HPL_dtrsmRLNN( M, N, ALPHA, A, LDA, B, LDB ); } - else { HPL_dtrsmRLNU( M, N, ALPHA, A, LDA, B, LDB ); } - } - else - { - if( DIAG == HplNonUnit ) - { HPL_dtrsmRLTN( M, N, ALPHA, A, LDA, B, LDB ); } - else { HPL_dtrsmRLTU( M, N, ALPHA, A, LDA, B, LDB ); } - } - } - } -} - -#endif - -#ifdef STDC_HEADERS -void HPL_dtrsm -( - const enum HPL_ORDER ORDER, - const enum HPL_SIDE SIDE, - const enum HPL_UPLO UPLO, - const enum HPL_TRANS TRANS, - const enum HPL_DIAG DIAG, - const int M, - const int N, - const double ALPHA, - const double * A, - const int LDA, - double * B, - const int LDB -) -#else -void HPL_dtrsm -( ORDER, SIDE, UPLO, TRANS, DIAG, M, N, ALPHA, A, LDA, B, LDB ) - const enum HPL_ORDER ORDER; - const enum HPL_SIDE SIDE; - const enum HPL_UPLO UPLO; - const enum HPL_TRANS TRANS; - const enum HPL_DIAG DIAG; - const int M; - const int N; - const double ALPHA; - const double * A; - const int LDA; - double * B; - const int LDB; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_dtrsm solves one of the matrix equations - * - * op( A ) * X = alpha * B, or X * op( A ) = alpha * B, - * - * where alpha is a scalar, X and B are m by n matrices, A is a unit, or - * non-unit, upper or lower triangular matrix and op(A) is one of - * - * op( A ) = A or op( A ) = A^T. - * - * The matrix X is overwritten on B. - * - * No test for singularity or near-singularity is included in this - * routine. Such tests must be performed before calling this routine. - * - * Arguments - * ========= - * - * ORDER (local input) const enum HPL_ORDER - * On entry, ORDER specifies the storage format of the operands - * as follows: - * ORDER = HplRowMajor, - * ORDER = HplColumnMajor. 
- * - * SIDE (local input) const enum HPL_SIDE - * On entry, SIDE specifies whether op(A) appears on the left - * or right of X as follows: - * SIDE==HplLeft op( A ) * X = alpha * B, - * SIDE==HplRight X * op( A ) = alpha * B. - * - * UPLO (local input) const enum HPL_UPLO - * On entry, UPLO specifies whether the upper or lower - * triangular part of the array A is to be referenced. When - * UPLO==HplUpper, only the upper triangular part of A is to be - * referenced, otherwise only the lower triangular part of A is - * to be referenced. - * - * TRANS (local input) const enum HPL_TRANS - * On entry, TRANSA specifies the form of op(A) to be used in - * the matrix-matrix operation follows: - * TRANSA==HplNoTrans : op( A ) = A, - * TRANSA==HplTrans : op( A ) = A^T, - * TRANSA==HplConjTrans : op( A ) = A^T. - * - * DIAG (local input) const enum HPL_DIAG - * On entry, DIAG specifies whether A is unit triangular or - * not. When DIAG==HplUnit, A is assumed to be unit triangular, - * and otherwise, A is not assumed to be unit triangular. - * - * M (local input) const int - * On entry, M specifies the number of rows of the matrix B. - * M must be at least zero. - * - * N (local input) const int - * On entry, N specifies the number of columns of the matrix B. - * N must be at least zero. - * - * ALPHA (local input) const double - * On entry, ALPHA specifies the scalar alpha. When ALPHA is - * supplied as zero then the elements of the matrix B need not - * be set on input. - * - * A (local input) const double * - * On entry, A points to an array of size equal to or greater - * than LDA * k, where k is m when SIDE==HplLeft and is n - * otherwise. Before entry with UPLO==HplUpper, the leading - * k by k upper triangular part of the array A must contain the - * upper triangular matrix and the strictly lower triangular - * part of A is not referenced. 
When UPLO==HplLower on entry, - * the leading k by k lower triangular part of the array A must - * contain the lower triangular matrix and the strictly upper - * triangular part of A is not referenced. - * - * Note that when DIAG==HplUnit, the diagonal elements of A - * not referenced either, but are assumed to be unity. - * - * LDA (local input) const int - * On entry, LDA specifies the leading dimension of A as - * declared in the calling (sub) program. LDA must be at - * least MAX(1,m) when SIDE==HplLeft, and MAX(1,n) otherwise. - * - * B (local input/output) double * - * On entry, B points to an array of size equal to or greater - * than LDB * n. Before entry, the leading m by n part of the - * array B must contain the matrix B, except when beta is zero, - * in which case B need not be set on entry. On exit, the array - * B is overwritten by the m by n solution matrix. - * - * LDB (local input) const int - * On entry, LDB specifies the leading dimension of B as - * declared in the calling (sub) program. LDB must be at - * least MAX(1,m). - * - * --------------------------------------------------------------------- - */ -#ifdef HPL_CALL_CBLAS - cblas_dtrsm( ORDER, SIDE, UPLO, TRANS, DIAG, M, N, ALPHA, A, LDA, B, LDB ); -#endif -#ifdef HPL_CALL_VSIPL - if( ORDER == HplColumnMajor ) - { - HPL_dtrsm0( SIDE, UPLO, TRANS, DIAG, M, N, ALPHA, A, LDA, B, LDB ); - } - else - { - HPL_dtrsm0( ( SIDE == HplRight ? HplLeft : HplRight ), - ( UPLO == HplLower ? 
HplUpper : HplLower ), - TRANS, DIAG, N, M, ALPHA, A, LDA, B, LDB ); - } -#endif -#ifdef HPL_CALL_FBLAS - double alpha = ALPHA; -#ifdef StringSunStyle -#if defined( HPL_USE_F77_INTEGER_DEF ) - F77_INTEGER IONE = 1; -#else - int IONE = 1; -#endif -#endif -#ifdef StringStructVal - F77_CHAR fside; - F77_CHAR fuplo; - F77_CHAR ftran; - F77_CHAR fdiag; -#endif -#ifdef StringStructPtr - F77_CHAR fside; - F77_CHAR fuplo; - F77_CHAR ftran; - F77_CHAR fdiag; -#endif -#ifdef StringCrayStyle - F77_CHAR fside; - F77_CHAR fuplo; - F77_CHAR ftran; - F77_CHAR fdiag; -#endif -#ifdef HPL_USE_F77_INTEGER_DEF - const F77_INTEGER F77M = M, F77N = N, - F77lda = LDA, F77ldb = LDB; -#else -#define F77M M -#define F77N N -#define F77lda LDA -#define F77ldb LDB -#endif - char cside, cuplo, ctran, cdiag; - - if( TRANS == HplNoTrans ) ctran = 'N'; - else if( TRANS == HplTrans ) ctran = 'T'; - else ctran = 'C'; - cdiag = ( DIAG == HplUnit ? 'U' : 'N' ); - - if( ORDER == HplColumnMajor ) - { - cside = ( SIDE == HplRight ? 'R' : 'L' ); - cuplo = ( UPLO == HplLower ? 
'L' : 'U' ); -#ifdef StringSunStyle - F77dtrsm( &cside, &cuplo, &ctran, &cdiag, &F77M, &F77N, &alpha, - A, &F77lda, B, &F77ldb, IONE, IONE, IONE, IONE ); -#endif -#ifdef StringCrayStyle - fside = HPL_C2F_CHAR( cside ); fuplo = HPL_C2F_CHAR( cuplo ); - ftran = HPL_C2F_CHAR( ctran ); fdiag = HPL_C2F_CHAR( cdiag ); - F77dtrsm( fside, fuplo, ftran, fdiag, &F77M, &F77N, &alpha, - A, &F77lda, B, &F77ldb ); -#endif -#ifdef StringStructVal - fside.len = 1; fside.cp = &cside; fuplo.len = 1; fuplo.cp = &cuplo; - ftran.len = 1; ftran.cp = &ctran; fdiag.len = 1; fdiag.cp = &cdiag; - F77dtrsm( fside, fuplo, ftran, fdiag, &F77M, &F77N, &alpha, - A, &F77lda, B, &F77ldb ); -#endif -#ifdef StringStructPtr - fside.len = 1; fside.cp = &cside; fuplo.len = 1; fuplo.cp = &cuplo; - ftran.len = 1; ftran.cp = &ctran; fdiag.len = 1; fdiag.cp = &cdiag; - F77dtrsm( &fside, &fuplo, &ftran, &fdiag, &F77M, &F77N, &alpha, - A, &F77lda, B, &F77ldb ); -#endif - } - else - { - cside = ( SIDE == HplRight ? 'L' : 'R' ); - cuplo = ( UPLO == HplLower ? 
'U' : 'L' ); -#ifdef StringSunStyle - F77dtrsm( &cside, &cuplo, &ctran, &cdiag, &F77N, &F77M, &alpha, - A, &F77lda, B, &F77ldb, IONE, IONE, IONE, IONE ); -#endif -#ifdef StringCrayStyle - fside = HPL_C2F_CHAR( cside ); fuplo = HPL_C2F_CHAR( cuplo ); - ftran = HPL_C2F_CHAR( ctran ); fdiag = HPL_C2F_CHAR( cdiag ); - F77dtrsm( fside, fuplo, ftran, fdiag, &F77N, &F77M, &alpha, - A, &F77lda, B, &F77ldb ); -#endif -#ifdef StringStructVal - fside.len = 1; fside.cp = &cside; fuplo.len = 1; fuplo.cp = &cuplo; - ftran.len = 1; ftran.cp = &ctran; fdiag.len = 1; fdiag.cp = &cdiag; - F77dtrsm( fside, fuplo, ftran, fdiag, &F77N, &F77M, &alpha, - A, &F77lda, B, &F77ldb ); -#endif -#ifdef StringStructPtr - fside.len = 1; fside.cp = &cside; fuplo.len = 1; fuplo.cp = &cuplo; - ftran.len = 1; ftran.cp = &ctran; fdiag.len = 1; fdiag.cp = &cdiag; - F77dtrsm( &fside, &fuplo, &ftran, &fdiag, &F77N, &F77M, &alpha, - A, &F77lda, B, &F77ldb ); -#endif - } -#endif -/* - * End of HPL_dtrsm - */ -} - -#endif diff --git a/hpl/src/blas/HPL_dtrsv.c b/hpl/src/blas/HPL_dtrsv.c deleted file mode 100644 index 55b31f751b834d6be14d9986d760919157f83c14..0000000000000000000000000000000000000000 --- a/hpl/src/blas/HPL_dtrsv.c +++ /dev/null @@ -1,520 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. 
Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
- * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifndef HPL_dtrsv - -#ifdef HPL_CALL_VSIPL - -#ifdef STDC_HEADERS -static void HPL_dtrsvLNN -( - const int N, - const double * A, - const int LDA, - double * X, - const int INCX -) -#else -static void HPL_dtrsvLNN( N, A, LDA, X, INCX ) - const int INCX, LDA, N; - const double * A; - double * X; -#endif -{ - int i, iaij, ix, j, jaj, jx, ldap1 = LDA + 1; - register double t0; - - for( j = 0, jaj = 0, jx = 0; j < N; j++, jaj += ldap1, jx += INCX ) - { - X[jx] /= A[jaj]; t0 = X[jx]; - for( i = j+1, iaij = jaj+1, ix = jx + INCX; - i < N; i++, iaij += 1, ix += INCX ) { X[ix] -= t0 * A[iaij]; } - } -} - -#ifdef STDC_HEADERS -static void HPL_dtrsvLNU -( - const int N, - const double * A, - const int LDA, - double * X, - const int INCX -) -#else -static void HPL_dtrsvLNU( N, A, LDA, X, INCX ) - const int INCX, LDA, N; - const double * A; - double * X; -#endif -{ - int i, iaij, ix, j, jaj, jx, ldap1 = LDA + 1; - register double t0; - - for( j = 0, jaj = 0, jx = 0; j < N; j++, jaj += ldap1, jx += INCX ) - { - t0 = X[jx]; - for( i = j+1, iaij = jaj+1, ix = jx + INCX; - i < N; i++, iaij += 1, ix += INCX ) { X[ix] -= t0 * A[iaij]; } - } -} - -#ifdef STDC_HEADERS -static void HPL_dtrsvLTN -( - const int N, - const double * A, - const int LDA, - double * X, - const int INCX -) -#else -static void HPL_dtrsvLTN( N, A, LDA, X, INCX ) - const int INCX, LDA, N; - const double * A; - double * X; -#endif -{ - int i, iaij, ix, j, jaj, jx, ldap1 = LDA + 1; - register double t0; - - for( j = N-1, jaj = (N-1)*(ldap1), jx = (N-1)*INCX; - j >= 0; j--, jaj -= ldap1, jx -= INCX ) - { - t0 = X[jx]; - for( i = j+1, iaij = 1+jaj, ix = jx + INCX; - i < N; i++, iaij += 1, ix += INCX ) { t0 -= A[iaij] * X[ix]; } - t0 /= A[jaj]; X[jx] = t0; - } -} - -#ifdef STDC_HEADERS -static void HPL_dtrsvLTU -( - const int N, - const double * A, - const int LDA, - double * X, - const int 
INCX -) -#else -static void HPL_dtrsvLTU( N, A, LDA, X, INCX ) - const int INCX, LDA, N; - const double * A; - double * X; -#endif -{ - int i, iaij, ix, j, jaj, jx, ldap1 = LDA + 1; - register double t0; - - for( j = N-1, jaj = (N-1)*(ldap1), jx = (N-1)*INCX; - j >= 0; j--, jaj -= ldap1, jx -= INCX ) - { - t0 = X[jx]; - for( i = j+1, iaij = 1+jaj, ix = jx + INCX; - i < N; i++, iaij += 1, ix += INCX ) { t0 -= A[iaij] * X[ix]; } - X[jx] = t0; - } -} - - -#ifdef STDC_HEADERS -static void HPL_dtrsvUNN -( - const int N, - const double * A, - const int LDA, - double * X, - const int INCX -) -#else -static void HPL_dtrsvUNN( N, A, LDA, X, INCX ) - const int INCX, LDA, N; - const double * A; - double * X; -#endif -{ - int i, iaij, ix, j, jaj, jx; - register double t0; - - for( j = N-1, jaj = (N-1)*LDA, jx = (N-1)*INCX; - j >= 0; j--, jaj -= LDA, jx -= INCX ) - { - X[jx] /= A[j+jaj]; t0 = X[jx]; - for( i = 0, iaij = jaj, ix = 0; i < j; i++, iaij += 1, ix += INCX ) - { X[ix] -= t0 * A[iaij]; } - } -} - - -#ifdef STDC_HEADERS -static void HPL_dtrsvUNU -( - const int N, - const double * A, - const int LDA, - double * X, - const int INCX -) -#else -static void HPL_dtrsvUNU( N, A, LDA, X, INCX ) - const int INCX, LDA, N; - const double * A; - double * X; -#endif -{ - int i, iaij, ix, j, jaj, jx; - register double t0; - - for( j = N-1, jaj = (N-1)*LDA, jx = (N-1)*INCX; - j >= 0; j--, jaj -= LDA, jx -= INCX ) - { - t0 = X[jx]; - for( i = 0, iaij = jaj, ix = 0; i < j; i++, iaij += 1, ix += INCX ) - { X[ix] -= t0 * A[iaij]; } - } -} - - -#ifdef STDC_HEADERS -static void HPL_dtrsvUTN -( - const int N, - const double * A, - const int LDA, - double * X, - const int INCX -) -#else -static void HPL_dtrsvUTN( N, A, LDA, X, INCX ) - const int INCX, LDA, N; - const double * A; - double * X; -#endif -{ - int i, iaij, ix, j, jaj, jx; - register double t0; - - for( j = 0, jaj = 0,jx = 0; j < N; j++, jaj += LDA, jx += INCX ) - { - t0 = X[jx]; - for( i = 0, iaij = jaj, ix = 0; i < j; i++, iaij 
+= 1, ix += INCX ) - { t0 -= A[iaij] * X[ix]; } - t0 /= A[iaij]; X[jx] = t0; - } -} - -#ifdef STDC_HEADERS -static void HPL_dtrsvUTU -( - const int N, - const double * A, - const int LDA, - double * X, - const int INCX -) -#else -static void HPL_dtrsvUTU( N, A, LDA, X, INCX ) - const int INCX, LDA, N; - const double * A; - double * X; -#endif -{ - int i, iaij, ix, j, jaj, jx; - register double t0; - - for( j = 0, jaj = 0, jx = 0; j < N; j++, jaj += LDA, jx += INCX ) - { - t0 = X[jx]; - for( i = 0, iaij = jaj, ix = 0; i < j; i++, iaij += 1, ix += INCX ) - { t0 -= A[iaij] * X[ix]; } - X[jx] = t0; - } -} - -#ifdef STDC_HEADERS -static void HPL_dtrsv0 -( - const enum HPL_UPLO UPLO, - const enum HPL_TRANS TRANS, - const enum HPL_DIAG DIAG, - const int N, - const double * A, - const int LDA, - double * X, - const int INCX -) -#else -static void HPL_dtrsv0( UPLO, TRANS, DIAG, N, A, LDA, X, INCX ) - const enum HPL_UPLO UPLO; - const enum HPL_TRANS TRANS; - const enum HPL_DIAG DIAG; - const int INCX, LDA, N; - const double * A; - double * X; -#endif -{ - if( N == 0 ) return; - - if( UPLO == HplUpper ) - { - if( TRANS == HplNoTrans ) - { - if( DIAG == HplNonUnit ) { HPL_dtrsvUNN( N, A, LDA, X, INCX ); } - else { HPL_dtrsvUNU( N, A, LDA, X, INCX ); } - } - else - { - if( DIAG == HplNonUnit ) { HPL_dtrsvUTN( N, A, LDA, X, INCX ); } - else { HPL_dtrsvUTU( N, A, LDA, X, INCX ); } - } - } - else - { - if( TRANS == HplNoTrans ) - { - if( DIAG == HplNonUnit ) { HPL_dtrsvLNN( N, A, LDA, X, INCX ); } - else { HPL_dtrsvLNU( N, A, LDA, X, INCX ); } - } - else - { - if( DIAG == HplNonUnit ) { HPL_dtrsvLTN( N, A, LDA, X, INCX ); } - else { HPL_dtrsvLTU( N, A, LDA, X, INCX ); } - } - } -} - -#endif - -#ifdef STDC_HEADERS -void HPL_dtrsv -( - const enum HPL_ORDER ORDER, - const enum HPL_UPLO UPLO, - const enum HPL_TRANS TRANS, - const enum HPL_DIAG DIAG, - const int N, - const double * A, - const int LDA, - double * X, - const int INCX -) -#else -void HPL_dtrsv -( ORDER, UPLO, TRANS, DIAG, 
N, A, LDA, X, INCX ) - const enum HPL_ORDER ORDER; - const enum HPL_UPLO UPLO; - const enum HPL_TRANS TRANS; - const enum HPL_DIAG DIAG; - const int N; - const double * A; - const int LDA; - double * X; - const int INCX; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_dtrsv solves one of the systems of equations - * - * A * x = b, or A^T * x = b, - * - * where b and x are n-element vectors and A is an n by n non-unit, or - * unit, upper or lower triangular matrix. - * - * No test for singularity or near-singularity is included in this - * routine. Such tests must be performed before calling this routine. - * - * Arguments - * ========= - * - * ORDER (local input) const enum HPL_ORDER - * On entry, ORDER specifies the storage format of the operands - * as follows: - * ORDER = HplRowMajor, - * ORDER = HplColumnMajor. - * - * UPLO (local input) const enum HPL_UPLO - * On entry, UPLO specifies whether the upper or lower - * triangular part of the array A is to be referenced. When - * UPLO==HplUpper, only the upper triangular part of A is to be - * referenced, otherwise only the lower triangular part of A is - * to be referenced. - * - * TRANS (local input) const enum HPL_TRANS - * On entry, TRANS specifies the equations to be solved as - * follows: - * TRANS==HplNoTrans A * x = b, - * TRANS==HplTrans A^T * x = b. - * - * DIAG (local input) const enum HPL_DIAG - * On entry, DIAG specifies whether A is unit triangular or - * not. When DIAG==HplUnit, A is assumed to be unit triangular, - * and otherwise, A is not assumed to be unit triangular. - * - * N (local input) const int - * On entry, N specifies the order of the matrix A. N must be at - * least zero. - * - * A (local input) const double * - * On entry, A points to an array of size equal to or greater - * than LDA * n. 
Before entry with UPLO==HplUpper, the leading - * n by n upper triangular part of the array A must contain the - * upper triangular matrix and the strictly lower triangular - * part of A is not referenced. When UPLO==HplLower on entry, - * the leading n by n lower triangular part of the array A must - * contain the lower triangular matrix and the strictly upper - * triangular part of A is not referenced. - * - * Note that when DIAG==HplUnit, the diagonal elements of A - * not referenced either, but are assumed to be unity. - * - * LDA (local input) const int - * On entry, LDA specifies the leading dimension of A as - * declared in the calling (sub) program. LDA must be at - * least MAX(1,n). - * - * X (local input/output) double * - * On entry, X is an incremented array of dimension at least - * ( 1 + ( n - 1 ) * abs( INCX ) ) that contains the vector x. - * Before entry, the incremented array X must contain the n - * element right-hand side vector b. On exit, X is overwritten - * with the solution vector x. - * - * INCX (local input) const int - * On entry, INCX specifies the increment for the elements of X. - * INCX must not be zero. - * - * --------------------------------------------------------------------- - */ -#ifdef HPL_CALL_CBLAS - cblas_dtrsv( ORDER, UPLO, TRANS, DIAG, N, A, LDA, X, INCX ); -#endif -#ifdef HPL_CALL_VSIPL - if( ORDER == HplColumnMajor ) - { - HPL_dtrsv0( UPLO, TRANS, DIAG, N, A, LDA, X, INCX ); - } - else - { - HPL_dtrsv0( ( UPLO == HplUpper ? HplLower : HplUpper ), - ( TRANS == HplNoTrans ? 
HplTrans : HplNoTrans ), - DIAG, N, A, LDA, X, INCX ); - } -#endif -#ifdef HPL_CALL_FBLAS -#ifdef StringSunStyle -#ifdef HPL_USE_F77_INTEGER_DEF - F77_INTEGER IONE = 1; -#else - int IONE = 1; -#endif -#endif -#ifdef StringStructVal - F77_CHAR fuplo, ftran, fdiag; -#endif -#ifdef StringStructPtr - F77_CHAR fuplo, ftran, fdiag; -#endif -#ifdef StringCrayStyle - F77_CHAR fuplo, ftran, fdiag; -#endif - -#ifdef HPL_USE_F77_INTEGER_DEF - const F77_INTEGER F77N = N, F77lda = LDA, F77incx = INCX; -#else -#define F77N N -#define F77lda LDA -#define F77incx INCX -#endif - char cuplo, ctran, cdiag; - - if( ORDER == HplColumnMajor ) - { - cuplo = ( UPLO == HplUpper ? 'U' : 'L' ); - ctran = ( TRANS == HplNoTrans ? 'N' : 'T' ); - } - else - { - cuplo = ( UPLO == HplUpper ? 'L' : 'U' ); - ctran = ( TRANS == HplNoTrans ? 'T' : 'N' ); - } - cdiag = ( DIAG == HplNonUnit ? 'N' : 'U' ); - -#ifdef StringSunStyle - F77dtrsv( &cuplo, &ctran, &cdiag, &F77N, A, &F77lda, X, &F77incx, - IONE, IONE, IONE ); -#endif -#ifdef StringCrayStyle - ftran = HPL_C2F_CHAR( ctran ); fdiag = HPL_C2F_CHAR( cdiag ); - fuplo = HPL_C2F_CHAR( cuplo ); - F77dtrsv( fuplo, ftran, fdiag, &F77N, A, &F77lda, X, &F77incx ); -#endif -#ifdef StringStructVal - fuplo.len = 1; fuplo.cp = &cuplo; ftran.len = 1; ftran.cp = &ctran; - fdiag.len = 1; fdiag.cp = &cdiag; - F77dtrsv( fuplo, ftran, fdiag, &F77N, A, &F77lda, X, &F77incx ); -#endif -#ifdef StringStructPtr - fuplo.len = 1; fuplo.cp = &cuplo; ftran.len = 1; ftran.cp = &ctran; - fdiag.len = 1; fdiag.cp = &cdiag; - F77dtrsv( &fuplo, &ftran, &fdiag, &F77N, A, &F77lda, X, &F77incx ); -#endif - -#endif -/* - * End of HPL_dtrsv - */ -} - -#endif diff --git a/hpl/src/blas/HPL_idamax.c b/hpl/src/blas/HPL_idamax.c deleted file mode 100644 index 01882eb01498202ddc70ded2373d2026d91300b1..0000000000000000000000000000000000000000 --- a/hpl/src/blas/HPL_idamax.c +++ /dev/null @@ -1,167 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 
2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifndef HPL_idamax - -#ifdef STDC_HEADERS -int HPL_idamax -( - const int N, - const double * X, - const int INCX -) -#else -int HPL_idamax -( N, X, INCX ) - const int N; - const double * X; - const int INCX; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_idamax returns the index in an n-vector x of the first element - * having maximum absolute value. - * - * Arguments - * ========= - * - * N (local input) const int - * On entry, N specifies the length of the vector x. N must be - * at least zero. - * - * X (local input) const double * - * On entry, X is an incremented array of dimension at least - * ( 1 + ( n - 1 ) * abs( INCX ) ) that contains the vector x. - * - * INCX (local input) const int - * On entry, INCX specifies the increment for the elements of X. - * INCX must not be zero. 
- * - * --------------------------------------------------------------------- - */ -#ifdef HPL_CALL_CBLAS - return( (int)(cblas_idamax( N, X, INCX )) ); -#endif -#ifdef HPL_CALL_VSIPL - register double absxi, smax = HPL_rzero, x0, x1, x2, x3, - x4, x5, x6, x7; - const double * StX; - register int imax = 0, i = 0, j; - int nu; - const int incX2 = 2 * INCX, incX3 = 3 * INCX, - incX4 = 4 * INCX, incX5 = 5 * INCX, - incX6 = 6 * INCX, incX7 = 7 * INCX, - incX8 = 8 * INCX; - - if( N > 0 ) - { - if( ( nu = ( N >> 3 ) << 3 ) != 0 ) - { - StX = X + nu * INCX; - - do - { - x0 = (*X); x4 = X[incX4]; x1 = X[INCX ]; x5 = X[incX5]; - x2 = X[incX2]; x6 = X[incX6]; x3 = X[incX3]; x7 = X[incX7]; - - absxi = Mabs( x0 ); if( absxi > smax ) { imax = i; smax = absxi; } - i += 1; - absxi = Mabs( x1 ); if( absxi > smax ) { imax = i; smax = absxi; } - i += 1; - absxi = Mabs( x2 ); if( absxi > smax ) { imax = i; smax = absxi; } - i += 1; - absxi = Mabs( x3 ); if( absxi > smax ) { imax = i; smax = absxi; } - i += 1; - absxi = Mabs( x4 ); if( absxi > smax ) { imax = i; smax = absxi; } - i += 1; - absxi = Mabs( x5 ); if( absxi > smax ) { imax = i; smax = absxi; } - i += 1; - absxi = Mabs( x6 ); if( absxi > smax ) { imax = i; smax = absxi; } - i += 1; - absxi = Mabs( x7 ); if( absxi > smax ) { imax = i; smax = absxi; } - i += 1; - - X += incX8; - - } while( X != StX ); - } - - for( j = N - nu; j != 0; j-- ) - { - x0 = (*X); - absxi = Mabs( x0 ); if( absxi > smax ) { imax = i; smax = absxi; } - i += 1; - X += INCX; - } - } - return( imax ); -#endif -#ifdef HPL_CALL_FBLAS -#ifdef HPL_USE_F77_INTEGER_DEF - const F77_INTEGER F77N = N, F77incx = INCX; -#else -#define F77N N -#define F77incx INCX -#endif - int imax = 0; - - if( N > 0 ) imax = F77idamax( &F77N, X, &F77incx ) - 1; - return( imax ); -#endif -/* - * End of HPL_idamax - */ -} - -#endif diff --git a/hpl/src/blas/intel64/Make.inc b/hpl/src/blas/intel64/Make.inc deleted file mode 120000 index 
c1d8167d071e028cbeff544106b9f386bbdcac8a..0000000000000000000000000000000000000000 --- a/hpl/src/blas/intel64/Make.inc +++ /dev/null @@ -1 +0,0 @@ -/home/valeriuc/work/PRACE/CodeVault/src/1_dense/hpl/Make.intel64 \ No newline at end of file diff --git a/hpl/src/blas/intel64/Makefile b/hpl/src/blas/intel64/Makefile deleted file mode 100644 index e0dce425071d4277a56c6087b9fb344455e7284f..0000000000000000000000000000000000000000 --- a/hpl/src/blas/intel64/Makefile +++ /dev/null @@ -1,98 +0,0 @@ -# -# -- High Performance Computing Linpack Benchmark (HPL) -# HPL - 2.1 - October 26, 2012 -# Antoine P. Petitet -# University of Tennessee, Knoxville -# Innovative Computing Laboratory -# (C) Copyright 2000-2008 All Rights Reserved -# -# -- Copyright notice and Licensing terms: -# -# Redistribution and use in source and binary forms, with or without -# modification, are permitted provided that the following conditions -# are met: -# -# 1. Redistributions of source code must retain the above copyright -# notice, this list of conditions and the following disclaimer. -# -# 2. Redistributions in binary form must reproduce the above copyright -# notice, this list of conditions, and the following disclaimer in the -# documentation and/or other materials provided with the distribution. -# -# 3. All advertising materials mentioning features or use of this -# software must display the following acknowledgement: -# This product includes software developed at the University of -# Tennessee, Knoxville, Innovative Computing Laboratory. -# -# 4. The name of the University, the name of the Laboratory, or the -# names of its contributors may not be used to endorse or promote -# products derived from this software without specific written -# permission. 
-# -# -- Disclaimer: -# -# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS -# ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR -# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY -# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, -# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT -# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, -# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY -# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT -# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. -# ###################################################################### -# -include Make.inc -# -# ###################################################################### -# -INCdep = \ - $(INCdir)/hpl_misc.h $(INCdir)/hpl_blas.h -# -## Object files ######################################################## -# -HPL_blaobj = \ - HPL_dcopy.o HPL_daxpy.o HPL_dscal.o \ - HPL_idamax.o HPL_dgemv.o HPL_dtrsv.o \ - HPL_dger.o HPL_dgemm.o HPL_dtrsm.o -# -## Targets ############################################################# -# -all : lib -# -lib : lib.grd -# -lib.grd : $(HPL_blaobj) - $(ARCHIVER) $(ARFLAGS) $(HPLlib) $(HPL_blaobj) - $(RANLIB) $(HPLlib) - $(TOUCH) lib.grd -# -# ###################################################################### -# -HPL_dcopy.o : ../HPL_dcopy.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_dcopy.c -HPL_daxpy.o : ../HPL_daxpy.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_daxpy.c -HPL_dscal.o : ../HPL_dscal.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_dscal.c -HPL_idamax.o : ../HPL_idamax.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_idamax.c -HPL_dgemv.o : ../HPL_dgemv.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_dgemv.c -HPL_dtrsv.o : 
../HPL_dtrsv.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_dtrsv.c -HPL_dger.o : ../HPL_dger.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_dger.c -HPL_dgemm.o : ../HPL_dgemm.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_dgemm.c -HPL_dtrsm.o : ../HPL_dtrsm.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_dtrsm.c -# -# ###################################################################### -# -clean : - $(RM) *.o *.grd -# -# ###################################################################### diff --git a/hpl/src/blas/intel64/lib.grd b/hpl/src/blas/intel64/lib.grd deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/hpl/src/comm/HPL_1rinM.c b/hpl/src/comm/HPL_1rinM.c deleted file mode 100644 index 2e9de1c03f9444e9cb75f76ab3a6bf328ce961e1..0000000000000000000000000000000000000000 --- a/hpl/src/comm/HPL_1rinM.c +++ /dev/null @@ -1,224 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. 
- * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef HPL_NO_MPI_DATATYPE /* The user insists to not use MPI types */ -#ifndef HPL_COPY_L /* and also want to avoid the copy of L ... */ -#define HPL_COPY_L /* well, sorry, can not do that: force the copy */ -#endif -#endif - -#ifdef STDC_HEADERS -int HPL_binit_1rinM -( - HPL_T_panel * PANEL -) -#else -int HPL_binit_1rinM( PANEL ) - HPL_T_panel * PANEL; -#endif -{ -#ifdef HPL_USE_MPI_DATATYPE -/* - * .. Local Variables .. - */ - int ierr; -#endif -/* .. - * .. Executable Statements .. 
- */ - if( PANEL == NULL ) { return( HPL_SUCCESS ); } - if( PANEL->grid->npcol <= 1 ) { return( HPL_SUCCESS ); } -#ifdef HPL_USE_MPI_DATATYPE -#ifdef HPL_COPY_L -/* - * Copy the panel into a contiguous buffer - */ - HPL_copyL( PANEL ); -#endif -/* - * Create the MPI user-defined data type - */ - ierr = HPL_packL( PANEL, 0, PANEL->len, 0 ); - - return( ( ierr == MPI_SUCCESS ? HPL_SUCCESS : HPL_FAILURE ) ); -#else -/* - * Force the copy of the panel into a contiguous buffer - */ - HPL_copyL( PANEL ); - - return( HPL_SUCCESS ); -#endif -} - -#ifdef HPL_USE_MPI_DATATYPE - -#define _M_BUFF PANEL->buffers[0] -#define _M_COUNT PANEL->counts[0] -#define _M_TYPE PANEL->dtypes[0] - -#else - -#define _M_BUFF (void *)(PANEL->L2) -#define _M_COUNT PANEL->len -#define _M_TYPE MPI_DOUBLE - -#endif - -#ifdef STDC_HEADERS -int HPL_bcast_1rinM -( - HPL_T_panel * PANEL, - int * IFLAG -) -#else -int HPL_bcast_1rinM( PANEL, IFLAG ) - HPL_T_panel * PANEL; - int * IFLAG; -#endif -{ -/* - * .. Local Variables .. - */ - MPI_Comm comm; - int ierr, go, next, msgid, partner, prev, - rank, root, size; -/* .. - * .. Executable Statements .. - */ - if( PANEL == NULL ) { *IFLAG = HPL_SUCCESS; return( HPL_SUCCESS ); } - if( ( size = PANEL->grid->npcol ) <= 1 ) - { *IFLAG = HPL_SUCCESS; return( HPL_SUCCESS ); } -/* - * Cast phase: If I am the root process, then send message to its two - * next neighbors. Otherwise, probe for message. If the message is here, - * then receive it, and if I am not the last process of the ring, or - * just after the root process, then forward it to the next. Otherwise, - * inform the caller that the panel has still not been received. 
- */ - rank = PANEL->grid->mycol; comm = PANEL->grid->row_comm; - root = PANEL->pcol; msgid = PANEL->msgid; - next = MModAdd1( rank, size ); - - if( rank == root ) - { - ierr = MPI_Send( _M_BUFF, _M_COUNT, _M_TYPE, next, msgid, comm ); - if( ( ierr == MPI_SUCCESS ) && ( size > 2 ) ) - { - ierr = MPI_Send( _M_BUFF, _M_COUNT, _M_TYPE, MModAdd1( next, - size ), msgid, comm ); - } - } - else - { - prev = MModSub1( rank, size ); - if( ( size > 2 ) && - ( MModSub1( prev, size ) == root ) ) partner = root; - else partner = prev; - - ierr = MPI_Iprobe( partner, msgid, comm, &go, &PANEL->status[0] ); - - if( ierr == MPI_SUCCESS ) - { - if( go != 0 ) - { - ierr = MPI_Recv( _M_BUFF, _M_COUNT, _M_TYPE, partner, msgid, - comm, &PANEL->status[0] ); - if( ( ierr == MPI_SUCCESS ) && - ( prev != root ) && ( next != root ) ) - { - ierr = MPI_Send( _M_BUFF, _M_COUNT, _M_TYPE, next, msgid, - comm ); - } - } - else { *IFLAG = HPL_KEEP_TESTING; return( *IFLAG ); } - } - } -/* - * If the message was received and being forwarded, return HPL_SUCCESS. - * If an error occured in an MPI call, return HPL_FAILURE. - */ - *IFLAG = ( ierr == MPI_SUCCESS ? HPL_SUCCESS : HPL_FAILURE ); - - return( *IFLAG ); -} - -#ifdef STDC_HEADERS -int HPL_bwait_1rinM -( - HPL_T_panel * PANEL -) -#else -int HPL_bwait_1rinM( PANEL ) - HPL_T_panel * PANEL; -#endif -{ -#ifdef HPL_USE_MPI_DATATYPE -/* - * .. Local Variables .. - */ - int ierr; -#endif -/* .. - * .. Executable Statements .. - */ - if( PANEL == NULL ) { return( HPL_SUCCESS ); } - if( PANEL->grid->npcol <= 1 ) { return( HPL_SUCCESS ); } -/* - * Release the arrays of request / status / data-types and buffers - */ -#ifdef HPL_USE_MPI_DATATYPE - ierr = MPI_Type_free( &PANEL->dtypes[0] ); - return( ( ierr == MPI_SUCCESS ? 
HPL_SUCCESS : HPL_FAILURE ) ); -#else - return( HPL_SUCCESS ); -#endif -} diff --git a/hpl/src/comm/HPL_1ring.c b/hpl/src/comm/HPL_1ring.c deleted file mode 100644 index 72d3924a4fef168e2debc23c5e7cc7c67d4c07b1..0000000000000000000000000000000000000000 --- a/hpl/src/comm/HPL_1ring.c +++ /dev/null @@ -1,216 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef HPL_NO_MPI_DATATYPE /* The user insists to not use MPI types */ -#ifndef HPL_COPY_L /* and also want to avoid the copy of L ... */ -#define HPL_COPY_L /* well, sorry, can not do that: force the copy */ -#endif -#endif - -#ifdef STDC_HEADERS -int HPL_binit_1ring -( - HPL_T_panel * PANEL -) -#else -int HPL_binit_1ring( PANEL ) - HPL_T_panel * PANEL; -#endif -{ -#ifdef HPL_USE_MPI_DATATYPE -/* - * .. Local Variables .. - */ - int ierr; -#endif -/* .. - * .. Executable Statements .. - */ - if( PANEL == NULL ) { return( HPL_SUCCESS ); } - if( PANEL->grid->npcol <= 1 ) { return( HPL_SUCCESS ); } -#ifdef HPL_USE_MPI_DATATYPE -#ifdef HPL_COPY_L -/* - * Copy the panel into a contiguous buffer - */ - HPL_copyL( PANEL ); -#endif -/* - * Create the MPI user-defined data type - */ - ierr = HPL_packL( PANEL, 0, PANEL->len, 0 ); - - return( ( ierr == MPI_SUCCESS ? 
HPL_SUCCESS : HPL_FAILURE ) ); -#else -/* - * Force the copy of the panel into a contiguous buffer - */ - HPL_copyL( PANEL ); - - return( HPL_SUCCESS ); -#endif -} - -#ifdef HPL_USE_MPI_DATATYPE - -#define _M_BUFF PANEL->buffers[0] -#define _M_COUNT PANEL->counts[0] -#define _M_TYPE PANEL->dtypes[0] - -#else - -#define _M_BUFF (void *)(PANEL->L2) -#define _M_COUNT PANEL->len -#define _M_TYPE MPI_DOUBLE - -#endif - -#ifdef STDC_HEADERS -int HPL_bcast_1ring -( - HPL_T_panel * PANEL, - int * IFLAG -) -#else -int HPL_bcast_1ring( PANEL, IFLAG ) - HPL_T_panel * PANEL; - int * IFLAG; -#endif -{ -/* - * .. Local Variables .. - */ - MPI_Comm comm; - int ierr, go, next, msgid, prev, rank, root, - size; -/* .. - * .. Executable Statements .. - */ - if( PANEL == NULL ) { *IFLAG = HPL_SUCCESS; return( HPL_SUCCESS ); } - if( ( size = PANEL->grid->npcol ) <= 1 ) - { *IFLAG = HPL_SUCCESS; return( HPL_SUCCESS ); } -/* - * Cast phase: If I am the root process, start spreading the panel. If - * I am not the root process, probe for message. If the message is here, - * then receive it, and if I am not the last process of the ring, then - * forward it to the next. Otherwise, inform the caller that the panel - * has still not been received. 
- */ - rank = PANEL->grid->mycol; comm = PANEL->grid->row_comm; - root = PANEL->pcol; msgid = PANEL->msgid; - - if( rank == root ) - { - ierr = MPI_Send( _M_BUFF, _M_COUNT, _M_TYPE, MModAdd1( rank, - size ), msgid, comm ); - } - else - { - prev = MModSub1( rank, size ); - - ierr = MPI_Iprobe( prev, msgid, comm, &go, &PANEL->status[0] ); - - if( ierr == MPI_SUCCESS ) - { - if( go != 0 ) - { - ierr = MPI_Recv( _M_BUFF, _M_COUNT, _M_TYPE, prev, msgid, - comm, &PANEL->status[0] ); - next = MModAdd1( rank, size ); - if( ( ierr == MPI_SUCCESS ) && ( next != root ) ) - { - ierr = MPI_Send( _M_BUFF, _M_COUNT, _M_TYPE, next, - msgid, comm ); - } - } - else { *IFLAG = HPL_KEEP_TESTING; return( *IFLAG ); } - } - } -/* - * If the message was received and being forwarded, return HPL_SUCCESS. - * If an error occured in an MPI call, return HPL_FAILURE. - */ - *IFLAG = ( ierr == MPI_SUCCESS ? HPL_SUCCESS : HPL_FAILURE ); - - return( *IFLAG ); -} - -#ifdef STDC_HEADERS -int HPL_bwait_1ring -( - HPL_T_panel * PANEL -) -#else -int HPL_bwait_1ring( PANEL ) - HPL_T_panel * PANEL; -#endif -{ -#ifdef HPL_USE_MPI_DATATYPE -/* - * .. Local Variables .. - */ - int ierr; -#endif -/* .. - * .. Executable Statements .. - */ - if( PANEL == NULL ) { return( HPL_SUCCESS ); } - if( PANEL->grid->npcol <= 1 ) { return( HPL_SUCCESS ); } -/* - * Release the arrays of request / status / data-types and buffers - */ -#ifdef HPL_USE_MPI_DATATYPE - ierr = MPI_Type_free( &PANEL->dtypes[0] ); - return( ( ierr == MPI_SUCCESS ? HPL_SUCCESS : HPL_FAILURE ) ); -#else - return( HPL_SUCCESS ); -#endif -} diff --git a/hpl/src/comm/HPL_2rinM.c b/hpl/src/comm/HPL_2rinM.c deleted file mode 100644 index bb176b4cbf04102cd8bd792af1f46c56549f83af..0000000000000000000000000000000000000000 --- a/hpl/src/comm/HPL_2rinM.c +++ /dev/null @@ -1,236 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. 
Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef HPL_NO_MPI_DATATYPE /* The user insists to not use MPI types */ -#ifndef HPL_COPY_L /* and also want to avoid the copy of L ... */ -#define HPL_COPY_L /* well, sorry, can not do that: force the copy */ -#endif -#endif - -#ifdef STDC_HEADERS -int HPL_binit_2rinM -( - HPL_T_panel * PANEL -) -#else -int HPL_binit_2rinM( PANEL ) - HPL_T_panel * PANEL; -#endif -{ -#ifdef HPL_USE_MPI_DATATYPE -/* - * .. Local Variables .. - */ - int ierr; -#endif -/* .. - * .. Executable Statements .. - */ - if( PANEL == NULL ) { return( HPL_SUCCESS ); } - if( PANEL->grid->npcol <= 1 ) { return( HPL_SUCCESS ); } -#ifdef HPL_USE_MPI_DATATYPE -#ifdef HPL_COPY_L -/* - * Copy the panel into a contiguous buffer - */ - HPL_copyL( PANEL ); -#endif -/* - * Create the MPI user-defined data type - */ - ierr = HPL_packL( PANEL, 0, PANEL->len, 0 ); - - return( ( ierr == MPI_SUCCESS ? 
HPL_SUCCESS : HPL_FAILURE ) ); -#else -/* - * Force the copy of the panel into a contiguous buffer - */ - HPL_copyL( PANEL ); - - return( HPL_SUCCESS ); -#endif -} - -#ifdef HPL_USE_MPI_DATATYPE - -#define _M_BUFF PANEL->buffers[0] -#define _M_COUNT PANEL->counts[0] -#define _M_TYPE PANEL->dtypes[0] - -#else - -#define _M_BUFF (void *)(PANEL->L2) -#define _M_COUNT PANEL->len -#define _M_TYPE MPI_DOUBLE - -#endif - -#ifdef STDC_HEADERS -int HPL_bcast_2rinM -( - HPL_T_panel * PANEL, - int * IFLAG -) -#else -int HPL_bcast_2rinM( PANEL, IFLAG ) - HPL_T_panel * PANEL; - int * IFLAG; -#endif -{ -/* - * .. Local Variables .. - */ - MPI_Comm comm; - int ierr, go, next, msgid, partner, prev, - rank, roo2, root, size; -/* .. - * .. Executable Statements .. - */ - if( PANEL == NULL ) { *IFLAG = HPL_SUCCESS; return( HPL_SUCCESS ); } - if( ( size = PANEL->grid->npcol ) <= 1 ) - { *IFLAG = HPL_SUCCESS; return( HPL_SUCCESS ); } -/* - * Cast phase: root process send to its two right neighbors and mid-pro- - * cess. If I am not the root process, probe for message. If the message - * is there, then receive it. If I am not the last process of both rings - * then forward it to the next. Otherwise, inform the caller that the - * panel has still not been received. 
- */ - rank = PANEL->grid->mycol; comm = PANEL->grid->row_comm; - root = PANEL->pcol; msgid = PANEL->msgid; - next = MModAdd1( rank, size ); roo2 = ( ( size + 1 ) >> 1 ); - roo2 = MModAdd( root, roo2, size ); - - if( rank == root ) - { - ierr = MPI_Send( _M_BUFF, _M_COUNT, _M_TYPE, next, msgid, comm ); - - if( ( ierr == MPI_SUCCESS ) && ( size > 2 ) ) - { - if( MModAdd1( next, size ) != roo2 ) - { - ierr = MPI_Send( _M_BUFF, _M_COUNT, _M_TYPE, - MModAdd1( next, size ), msgid, comm ); - } - - if( ierr == MPI_SUCCESS ) - { - ierr = MPI_Send( _M_BUFF, _M_COUNT, _M_TYPE, roo2, msgid, - comm ); - } - } - } - else - { - prev = MModSub1( rank, size ); - if( ( prev == root ) || ( rank == roo2 ) || - ( MModSub1( prev, size ) == root ) ) partner = root; - else partner = prev; - - ierr = MPI_Iprobe( partner, msgid, comm, &go, &PANEL->status[0] ); - - if( ierr == MPI_SUCCESS ) - { - if( go != 0 ) - { - ierr = MPI_Recv( _M_BUFF, _M_COUNT, _M_TYPE, partner, msgid, - comm, &PANEL->status[0] ); - if( ( ierr == MPI_SUCCESS ) && ( prev != root ) && - ( next != roo2 ) && ( next != root ) ) - { - ierr = MPI_Send( _M_BUFF, _M_COUNT, _M_TYPE, next, msgid, - comm ); - } - } - else { *IFLAG = HPL_KEEP_TESTING; return( *IFLAG ); } - } - } -/* - * If the message was received and being forwarded, return HPL_SUCCESS. - * If an error occured in an MPI call, return HPL_FAILURE. - */ - *IFLAG = ( ierr == MPI_SUCCESS ? HPL_SUCCESS : HPL_FAILURE ); - - return( *IFLAG ); -} - -#ifdef STDC_HEADERS -int HPL_bwait_2rinM -( - HPL_T_panel * PANEL -) -#else -int HPL_bwait_2rinM( PANEL ) - HPL_T_panel * PANEL; -#endif -{ -#ifdef HPL_USE_MPI_DATATYPE -/* - * .. Local Variables .. - */ - int ierr; -#endif -/* .. - * .. Executable Statements .. 
- */ - if( PANEL == NULL ) { return( HPL_SUCCESS ); } - if( PANEL->grid->npcol <= 1 ) { return( HPL_SUCCESS ); } -/* - * Release the arrays of request / status / data-types and buffers - */ -#ifdef HPL_USE_MPI_DATATYPE - ierr = MPI_Type_free( &PANEL->dtypes[0] ); - - return( ( ierr == MPI_SUCCESS ? HPL_SUCCESS : HPL_FAILURE ) ); -#else - return( HPL_SUCCESS ); -#endif -} diff --git a/hpl/src/comm/HPL_2ring.c b/hpl/src/comm/HPL_2ring.c deleted file mode 100644 index 85f18ae0d7d47a6847109be09019aca7198c28ca..0000000000000000000000000000000000000000 --- a/hpl/src/comm/HPL_2ring.c +++ /dev/null @@ -1,224 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. 
- * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef HPL_NO_MPI_DATATYPE /* The user insists to not use MPI types */ -#ifndef HPL_COPY_L /* and also want to avoid the copy of L ... */ -#define HPL_COPY_L /* well, sorry, can not do that: force the copy */ -#endif -#endif - -#ifdef STDC_HEADERS -int HPL_binit_2ring -( - HPL_T_panel * PANEL -) -#else -int HPL_binit_2ring( PANEL ) - HPL_T_panel * PANEL; -#endif -{ -#ifdef HPL_USE_MPI_DATATYPE -/* - * .. Local Variables .. - */ - int ierr; -#endif -/* .. - * .. Executable Statements .. - */ - if( PANEL == NULL ) { return( HPL_SUCCESS ); } - if( PANEL->grid->npcol <= 1 ) { return( HPL_SUCCESS ); } -#ifdef HPL_USE_MPI_DATATYPE -#ifdef HPL_COPY_L -/* - * Copy the panel into a contiguous buffer - */ - HPL_copyL( PANEL ); -#endif -/* - * Create the MPI user-defined data type - */ - ierr = HPL_packL( PANEL, 0, PANEL->len, 0 ); - - return( ( ierr == MPI_SUCCESS ? 
HPL_SUCCESS : HPL_FAILURE ) ); -#else -/* - * Force the copy of the panel into a contiguous buffer - */ - HPL_copyL( PANEL ); - - return( HPL_SUCCESS ); -#endif -} - -#ifdef HPL_USE_MPI_DATATYPE - -#define _M_BUFF PANEL->buffers[0] -#define _M_COUNT PANEL->counts[0] -#define _M_TYPE PANEL->dtypes[0] - -#else - -#define _M_BUFF (void *)(PANEL->L2) -#define _M_COUNT PANEL->len -#define _M_TYPE MPI_DOUBLE - -#endif - -#ifdef STDC_HEADERS -int HPL_bcast_2ring -( - HPL_T_panel * PANEL, - int * IFLAG -) -#else -int HPL_bcast_2ring( PANEL, IFLAG ) - HPL_T_panel * PANEL; - int * IFLAG; -#endif -{ -/* - * .. Local Variables .. - */ - MPI_Comm comm; - int ierr, go, next, msgid, partner, rank, - roo2, root, size; -/* .. - * .. Executable Statements .. - */ - if( PANEL == NULL ) { *IFLAG = HPL_SUCCESS; return( HPL_SUCCESS ); } - if( ( size = PANEL->grid->npcol ) <= 1 ) - { *IFLAG = HPL_SUCCESS; return( HPL_SUCCESS ); } -/* - * Cast phase: root process send to its right neighbor and mid-process. - * If I am not the root process, probe for message. If the message is - * there, then receive it, and if I am not the last process of both - * rings, then forward it to the next. Otherwise, inform the caller that - * the panel has still not been received. 
- */ - rank = PANEL->grid->mycol; comm = PANEL->grid->row_comm; - root = PANEL->pcol; msgid = PANEL->msgid; - next = MModAdd1( rank, size ); roo2 = ( ( size + 1 ) >> 1 ); - roo2 = MModAdd( root, roo2, size ); - - if( rank == root ) - { - ierr = MPI_Send( _M_BUFF, _M_COUNT, _M_TYPE, next, msgid, comm ); - if( ( ierr == MPI_SUCCESS ) && ( size > 2 ) ) - { - ierr = MPI_Send( _M_BUFF, _M_COUNT, _M_TYPE, roo2, msgid, - comm ); - } - } - else - { - partner = MModSub1( rank, size ); - if( ( partner == root ) || ( rank == roo2 ) ) partner = root; - - ierr = MPI_Iprobe( partner, msgid, comm, &go, &PANEL->status[0] ); - - if( ierr == MPI_SUCCESS ) - { - if( go != 0 ) - { - ierr = MPI_Recv( _M_BUFF, _M_COUNT, _M_TYPE, partner, msgid, - comm, &PANEL->status[0] ); - if( ( ierr == MPI_SUCCESS ) && - ( next != roo2 ) && ( next != root ) ) - { - ierr = MPI_Send( _M_BUFF, _M_COUNT, _M_TYPE, next, msgid, - comm ); - } - } - else { *IFLAG = HPL_KEEP_TESTING; return( *IFLAG ); } - } - } -/* - * If the message was received and being forwarded, return HPL_SUCCESS. - * If an error occured in an MPI call, return HPL_FAILURE. - */ - *IFLAG = ( ierr == MPI_SUCCESS ? HPL_SUCCESS : HPL_FAILURE ); - - return( *IFLAG ); -} - -#ifdef STDC_HEADERS -int HPL_bwait_2ring -( - HPL_T_panel * PANEL -) -#else -int HPL_bwait_2ring( PANEL ) - HPL_T_panel * PANEL; -#endif -{ -#ifdef HPL_USE_MPI_DATATYPE -/* - * .. Local Variables .. - */ - int ierr; -#endif -/* .. - * .. Executable Statements .. - */ - if( PANEL == NULL ) { return( HPL_SUCCESS ); } - if( PANEL->grid->npcol <= 1 ) { return( HPL_SUCCESS ); } -/* - * Release the arrays of request / status / data-types and buffers - */ -#ifdef HPL_USE_MPI_DATATYPE - ierr = MPI_Type_free( &PANEL->dtypes[0] ); - - return( ( ierr == MPI_SUCCESS ? 
HPL_SUCCESS : HPL_FAILURE ) ); -#else - return( HPL_SUCCESS ); -#endif -} diff --git a/hpl/src/comm/HPL_bcast.c b/hpl/src/comm/HPL_bcast.c deleted file mode 100644 index b11840a5580bb328d6d02205f090d6aafed6cbad..0000000000000000000000000000000000000000 --- a/hpl/src/comm/HPL_bcast.c +++ /dev/null @@ -1,118 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -int HPL_bcast -( - HPL_T_panel * PANEL, - int * IFLAG -) -#else -int HPL_bcast -( PANEL, IFLAG ) - HPL_T_panel * PANEL; - int * IFLAG; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_bcast broadcasts the current panel. Successful completion is - * indicated by IFLAG set to HPL_SUCCESS on return. IFLAG will be set to - * HPL_FAILURE on failure and to HPL_KEEP_TESTING when the operation was - * not completed, in which case this function should be called again. - * - * Arguments - * ========= - * - * PANEL (input/output) HPL_T_panel * - * On entry, PANEL points to the current panel data structure - * being broadcast. - * - * IFLAG (output) int * - * On exit, IFLAG indicates whether or not the broadcast has - * occured. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - int ierr; - HPL_T_TOP top; -/* .. - * .. Executable Statements .. 
- */ - if( PANEL == NULL ) { *IFLAG = HPL_SUCCESS; return( HPL_SUCCESS ); } - if( PANEL->grid->npcol <= 1 ) - { *IFLAG = HPL_SUCCESS; return( HPL_SUCCESS ); } -/* - * Retrieve the selected virtual broadcast topology - */ - top = PANEL->algo->btopo; - - switch( top ) - { - case HPL_1RING_M : ierr = HPL_bcast_1rinM( PANEL, IFLAG ); break; - case HPL_1RING : ierr = HPL_bcast_1ring( PANEL, IFLAG ); break; - case HPL_2RING_M : ierr = HPL_bcast_2rinM( PANEL, IFLAG ); break; - case HPL_2RING : ierr = HPL_bcast_2ring( PANEL, IFLAG ); break; - case HPL_BLONG_M : ierr = HPL_bcast_blonM( PANEL, IFLAG ); break; - case HPL_BLONG : ierr = HPL_bcast_blong( PANEL, IFLAG ); break; - default : ierr = HPL_SUCCESS; - } - - return( ierr ); -/* - * End of HPL_bcast - */ -} diff --git a/hpl/src/comm/HPL_binit.c b/hpl/src/comm/HPL_binit.c deleted file mode 100644 index 30d16830b441f6aabbfb4a51fe0812c45b4bd6bb..0000000000000000000000000000000000000000 --- a/hpl/src/comm/HPL_binit.c +++ /dev/null @@ -1,108 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. 
All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -int HPL_binit -( - HPL_T_panel * PANEL -) -#else -int HPL_binit -( PANEL ) - HPL_T_panel * PANEL; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_binit initializes a row broadcast. Successful completion is - * indicated by the returned error code HPL_SUCCESS. - * - * Arguments - * ========= - * - * PANEL (input/output) HPL_T_panel * - * On entry, PANEL points to the current panel data structure - * being broadcast. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. 
- */ - int ierr; - HPL_T_TOP top; -/* .. - * .. Executable Statements .. - */ - if( PANEL->grid->npcol <= 1 ) return( HPL_SUCCESS ); -/* - * Retrieve the selected virtual broadcast topology - */ - top = PANEL->algo->btopo; - - switch( top ) - { - case HPL_1RING_M : ierr = HPL_binit_1rinM( PANEL ); break; - case HPL_1RING : ierr = HPL_binit_1ring( PANEL ); break; - case HPL_2RING_M : ierr = HPL_binit_2rinM( PANEL ); break; - case HPL_2RING : ierr = HPL_binit_2ring( PANEL ); break; - case HPL_BLONG_M : ierr = HPL_binit_blonM( PANEL ); break; - case HPL_BLONG : ierr = HPL_binit_blong( PANEL ); break; - default : ierr = HPL_SUCCESS; - } - - return( ierr ); -/* - * End of HPL_binit - */ -} diff --git a/hpl/src/comm/HPL_blonM.c b/hpl/src/comm/HPL_blonM.c deleted file mode 100644 index 997e34b38f16303fb9c832dc455f633be1afc1d1..0000000000000000000000000000000000000000 --- a/hpl/src/comm/HPL_blonM.c +++ /dev/null @@ -1,445 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. 
All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef HPL_NO_MPI_DATATYPE /* The user insists to not use MPI types */ -#ifndef HPL_COPY_L /* and also want to avoid the copy of L ... */ -#define HPL_COPY_L /* well, sorry, can not do that: force the copy */ -#endif -#endif - -#define I_SEND 0 -#define I_RECV 1 - -#ifdef STDC_HEADERS -int HPL_binit_blonM -( - HPL_T_panel * PANEL -) -#else -int HPL_binit_blonM( PANEL ) - HPL_T_panel * PANEL; -#endif -{ -/* .. - * .. Executable Statements .. 
- */ - if( PANEL == NULL ) { return( HPL_SUCCESS ); } - if( PANEL->grid->npcol <= 1 ) { return( HPL_SUCCESS ); } -#ifdef HPL_USE_MPI_DATATYPE -#ifdef HPL_COPY_L -/* - * Copy the panel into a contiguous buffer - */ - HPL_copyL( PANEL ); -#endif -#else -/* - * Force the copy of the panel into a contiguous buffer - */ - HPL_copyL( PANEL ); -#endif - return( HPL_SUCCESS ); -} - -#ifdef HPL_USE_MPI_DATATYPE - -#define _M_BUFF_S1 PANEL->buffers[I_SEND] -#define _M_COUNT_S1 PANEL->counts[I_SEND] -#define _M_TYPE_S1 PANEL->dtypes[I_SEND] - -#define _M_BUFF_S2 PANEL->buffers[I_SEND] -#define _M_COUNT_S2 PANEL->counts[I_SEND] -#define _M_TYPE_S2 PANEL->dtypes[I_SEND] - -#define _M_BUFF_R1 PANEL->buffers[I_RECV] -#define _M_COUNT_R1 PANEL->counts[I_RECV] -#define _M_TYPE_R1 PANEL->dtypes[I_RECV] - -#define _M_BUFF_R2 PANEL->buffers[I_RECV] -#define _M_COUNT_R2 PANEL->counts[I_RECV] -#define _M_TYPE_R2 PANEL->dtypes[I_RECV] - -#define _M_ROLL_BUFF_S PANEL->buffers[I_SEND] -#define _M_ROLL_COUNT_S PANEL->counts[I_SEND] -#define _M_ROLL_TYPE_S PANEL->dtypes[I_SEND] - -#define _M_ROLL_BUFF_R PANEL->buffers[I_RECV] -#define _M_ROLL_COUNT_R PANEL->counts[I_RECV] -#define _M_ROLL_TYPE_R PANEL->dtypes[I_RECV] - -#else - -#define _M_BUFF_S1 (void *)(PANEL->L2) -#define _M_COUNT_S1 PANEL->len -#define _M_TYPE_S1 MPI_DOUBLE - -#define _M_BUFF_S2 (void *)(PANEL->L2 + ibuf) -#define _M_COUNT_S2 lbuf -#define _M_TYPE_S2 MPI_DOUBLE - -#define _M_BUFF_R1 (void *)(PANEL->L2) -#define _M_COUNT_R1 PANEL->len -#define _M_TYPE_R1 MPI_DOUBLE - -#define _M_BUFF_R2 (void *)(PANEL->L2 + ibuf) -#define _M_COUNT_R2 lbuf -#define _M_TYPE_R2 MPI_DOUBLE - -#define _M_ROLL_BUFF_S (void *)(PANEL->L2 + ibufS) -#define _M_ROLL_COUNT_S lbufS -#define _M_ROLL_TYPE_S MPI_DOUBLE -#define _M_ROLL_BUFF_R (void *)(PANEL->L2 + ibufR) -#define _M_ROLL_COUNT_R lbufR -#define _M_ROLL_TYPE_R MPI_DOUBLE - -#endif - -#ifdef STDC_HEADERS -int HPL_bcast_blonM -( - HPL_T_panel * PANEL, - int * IFLAG -) -#else -int 
HPL_bcast_blonM( PANEL, IFLAG ) - HPL_T_panel * PANEL; - int * IFLAG; -#endif -{ -/* - * .. Local Variables .. - */ - MPI_Comm comm; - int COUNT, count, go=1, ierr=MPI_SUCCESS, ibuf, - ibufR, ibufS, dummy=0, indx, ip2=1, k, l, - lbuf, lbufR, lbufS, mask=1, msgid, mydist, - mydist2, next, npm1, npm2, partner, prev, - rank, root, size; -/* .. - * .. Executable Statements .. - */ - if( PANEL == NULL ) { *IFLAG = HPL_SUCCESS; return( HPL_SUCCESS ); } - if( ( size = PANEL->grid->npcol ) <= 1 ) - { *IFLAG = HPL_SUCCESS; return( HPL_SUCCESS ); } -/* - * Cast phase: root process sends to its right neighbor, then spread - * the panel on the other npcol - 2 processes. If I am not the root - * process, probe for message received. If the message is there, then - * receive it. If I am just after the root process, return. Otherwise, - * keep spreading on those npcol - 2 processes. Otherwise, inform the - * caller that the panel has still not been received. - */ - comm = PANEL->grid->row_comm; rank = PANEL->grid->mycol; - root = PANEL->pcol; msgid = PANEL->msgid; - prev = MModSub1( rank, size ); - - if( rank == root ) - { -#ifdef HPL_USE_MPI_DATATYPE - if( ierr == MPI_SUCCESS ) - ierr = HPL_packL( PANEL, 0, PANEL->len, I_SEND ); -#endif - if( ierr == MPI_SUCCESS ) - ierr = MPI_Ssend( _M_BUFF_S1, _M_COUNT_S1, _M_TYPE_S1, - MModAdd1( rank, size ), msgid, comm ); -#ifdef HPL_USE_MPI_DATATYPE - if( ierr == MPI_SUCCESS ) - ierr = MPI_Type_free( &PANEL->dtypes[I_SEND] ); -#endif - } - else if( prev == root ) - { -/* - * This probing mechanism causes problems when lookhead is on. Too many - * messages are exchanged in this virtual topology causing a hang on - * some machines. It is currently disabled until a better understanding - * is acquired. 
- * - * ierr = MPI_Iprobe( root, msgid, comm, &go, &PANEL->status[0] ); - */ - if( ierr == MPI_SUCCESS ) - { /* if panel is here, proceed */ - if( go != 0 ) - { -#ifdef HPL_USE_MPI_DATATYPE - ierr = HPL_packL( PANEL, 0, PANEL->len, I_RECV ); -#endif - if( ierr == MPI_SUCCESS ) - ierr = MPI_Recv( _M_BUFF_R1, _M_COUNT_R1, _M_TYPE_R1, - root, msgid, comm, &PANEL->status[0] ); -#ifdef HPL_USE_MPI_DATATYPE - if( ierr == MPI_SUCCESS ) - ierr = MPI_Type_free( &PANEL->dtypes[I_RECV] ); -#endif - } - else { *IFLAG = HPL_KEEP_TESTING; return( HPL_KEEP_TESTING ); } - } - } -/* - * if I am just after the root, exit now. The message receive completed - * successfully, this guy is done. If there are only 2 processes in each - * row of processes, we are done as well. - */ - if( ( prev == root ) || ( size == 2 ) ) - { - *IFLAG = ( ierr == MPI_SUCCESS ? HPL_SUCCESS : HPL_FAILURE ); - return( *IFLAG ); - } -/* - * Otherwise, proceed with broadcast - Spread the panel across process - * columns - */ - npm2 = ( npm1 = size - 1 ) - 1; COUNT = PANEL->len; - - k = npm2; while( k > 1 ) { k >>= 1; ip2 <<= 1; mask <<= 1; mask++; } - if( rank == root ) mydist2 = ( mydist = 0 ); - else mydist2 = ( mydist = MModSub( rank, root, size ) - 1 ); - - indx = ip2; count = COUNT / npm1; count = Mmax( count, 1 ); - - do - { - mask ^= ip2; - - if( ( mydist & mask ) == 0 ) - { - lbuf = COUNT - ( ibuf = indx * count ); - if( indx + ip2 < npm1 ) { l = ip2 * count; lbuf = Mmin( lbuf, l ); } - - partner = mydist ^ ip2; - - if( ( mydist & ip2 ) != 0 ) - { - partner = MModAdd( root, partner, size ); - if( partner != root ) partner = MModAdd1( partner, size ); -/* - * This probing mechanism causes problems when lookhead is on. Too many - * messages are exchanged in this virtual topology causing a hang on - * some machines. It is currently disabled until a better understanding - * is acquired. 
- */ -#if 0 - ierr = MPI_Iprobe( partner, msgid, comm, &go, &PANEL->status[0] ); - - if( ierr == MPI_SUCCESS ) - { /* if panel is not here, return and keep testing */ - if( go == 0 ) - { *IFLAG = HPL_KEEP_TESTING; return( HPL_KEEP_TESTING ); } - } -#endif - if( lbuf > 0 ) - { -#ifdef HPL_USE_MPI_DATATYPE - if( ierr == MPI_SUCCESS ) - ierr = HPL_packL( PANEL, ibuf, lbuf, I_RECV ); -#endif - if( ierr == MPI_SUCCESS ) - ierr = MPI_Recv( _M_BUFF_R2, _M_COUNT_R2, _M_TYPE_R2, - partner, msgid, comm, &PANEL->status[0] ); -#ifdef HPL_USE_MPI_DATATYPE - if( ierr == MPI_SUCCESS ) - ierr = MPI_Type_free( &PANEL->dtypes[I_RECV] ); -#endif - } - else /* Recv message of length zero to enable probe */ - { - if( ierr == MPI_SUCCESS ) - ierr = MPI_Recv( (void *)(&dummy), 0, MPI_BYTE, partner, - msgid, comm, &PANEL->status[0] ); - } - } - else if( partner < npm1 ) - { - partner = MModAdd( root, partner, size ); - if( partner != root ) partner = MModAdd1( partner, size ); - - if( lbuf > 0 ) - { -#ifdef HPL_USE_MPI_DATATYPE - if( ierr == MPI_SUCCESS ) - ierr = HPL_packL( PANEL, ibuf, lbuf, I_SEND ); -#endif - if( ierr == MPI_SUCCESS ) - ierr = MPI_Ssend( _M_BUFF_S2, _M_COUNT_S2, _M_TYPE_S2, - partner, msgid, comm ); -#ifdef HPL_USE_MPI_DATATYPE - if( ierr == MPI_SUCCESS ) - ierr = MPI_Type_free( &PANEL->dtypes[I_SEND] ); -#endif - } - else /* Recv message of length zero to enable probe */ - { - if( ierr == MPI_SUCCESS ) - ierr = MPI_Ssend( (void *)(&dummy), 0, MPI_BYTE, - partner, msgid, comm ); - } - } - } - - if( mydist2 < ip2 ) { ip2 >>= 1; indx -= ip2; } - else { mydist2 -= ip2; ip2 >>= 1; indx += ip2; } - - } while( ip2 > 0 ); -/* - * Roll the pieces - */ - prev = MModSub1( rank, size ); - if( MModSub1( prev, size ) == root ) prev = root; - next = MModAdd1( rank, size ); - if( rank == root ) next = MModAdd1( next, size ); - - for( k = 0; k < npm2; k++ ) - { - l = ( k >> 1 ); -/* - * Who is sending to who and how much - */ - if( ( ( mydist + k ) & 1 ) != 0 ) - { - ibufS = ( indx = 
MModAdd( mydist, l, npm1 ) ) * count; - lbufS = ( indx == npm2 ? COUNT : ibufS + count ); - lbufS = Mmin( COUNT, lbufS ) - ibufS; lbufS = Mmax( 0, lbufS ); - - ibufR = ( indx = MModSub( mydist, l+1, npm1 ) ) * count; - lbufR = ( indx == npm2 ? COUNT : ibufR + count ); - lbufR = Mmin( COUNT, lbufR ) - ibufR; lbufR = Mmax( 0, lbufR ); - - partner = prev; - } - else - { - ibufS = ( indx = MModSub( mydist, l, npm1 ) ) * count; - lbufS = ( indx == npm2 ? COUNT : ibufS + count ); - lbufS = Mmin( COUNT, lbufS ) - ibufS; lbufS = Mmax( 0, lbufS ); - - ibufR = ( indx = MModAdd( mydist, l+1, npm1 ) ) * count; - lbufR = ( indx == npm2 ? COUNT : ibufR + count ); - lbufR = Mmin( COUNT, lbufR ) - ibufR; lbufR = Mmax( 0, lbufR ); - - partner = next; - } -/* - * Exchange the messages - */ - if( lbufS > 0 ) - { -#ifdef HPL_USE_MPI_DATATYPE - if( ierr == MPI_SUCCESS ) - ierr = HPL_packL( PANEL, ibufS, lbufS, I_SEND ); -#endif - if( ierr == MPI_SUCCESS ) - ierr = MPI_Issend( _M_ROLL_BUFF_S, _M_ROLL_COUNT_S, - _M_ROLL_TYPE_S, partner, msgid, comm, - &PANEL->request[0] ); - } - else - { - if( ierr == MPI_SUCCESS ) - ierr = MPI_Issend( (void *)(&dummy), 0, MPI_BYTE, partner, - msgid, comm, &PANEL->request[0] ); - } - - if( lbufR > 0 ) - { -#ifdef HPL_USE_MPI_DATATYPE - if( ierr == MPI_SUCCESS ) - ierr = HPL_packL( PANEL, ibufR, lbufR, I_RECV ); -#endif - if( ierr == MPI_SUCCESS ) - ierr = MPI_Recv( _M_ROLL_BUFF_R, _M_ROLL_COUNT_R, - _M_ROLL_TYPE_R, partner, msgid, comm, - &PANEL->status[0] ); -#ifdef HPL_USE_MPI_DATATYPE - if( ierr == MPI_SUCCESS ) - ierr = MPI_Type_free( &PANEL->dtypes[I_RECV] ); -#endif - } - else - { - if( ierr == MPI_SUCCESS ) - ierr = MPI_Recv( (void *)(&dummy), 0, MPI_BYTE, partner, - msgid, comm, &PANEL->status[0] ); - } - - if( ierr == MPI_SUCCESS ) - ierr = MPI_Wait ( &PANEL->request[0], &PANEL->status[0] ); -#ifdef HPL_USE_MPI_DATATYPE - if( ( lbufS > 0 ) && ( ierr == MPI_SUCCESS ) ) - ierr = MPI_Type_free( &PANEL->dtypes[I_SEND] ); -#endif - } -/* - * If the 
message was received and being forwarded, return HPL_SUCCESS. - * If an error occured in an MPI call, return HPL_FAILURE. - */ - *IFLAG = ( ierr == MPI_SUCCESS ? HPL_SUCCESS : HPL_FAILURE ); - - return( *IFLAG ); -} - -#ifdef STDC_HEADERS -int HPL_bwait_blonM -( - HPL_T_panel * PANEL -) -#else -int HPL_bwait_blonM( PANEL ) - HPL_T_panel * PANEL; -#endif -{ -/* .. - * .. Executable Statements .. - */ - if( PANEL == NULL ) { return( HPL_SUCCESS ); } - if( PANEL->grid->npcol <= 1 ) { return( HPL_SUCCESS ); } - - return( HPL_SUCCESS ); -} diff --git a/hpl/src/comm/HPL_blong.c b/hpl/src/comm/HPL_blong.c deleted file mode 100644 index b791b87b2b6b90ae299fe51290dc1038eef76b07..0000000000000000000000000000000000000000 --- a/hpl/src/comm/HPL_blong.c +++ /dev/null @@ -1,363 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. 
The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef HPL_NO_MPI_DATATYPE /* The user insists to not use MPI types */ -#ifndef HPL_COPY_L /* and also want to avoid the copy of L ... */ -#define HPL_COPY_L /* well, sorry, can not do that: force the copy */ -#endif -#endif - -#define I_SEND 0 -#define I_RECV 1 - -#ifdef STDC_HEADERS -int HPL_binit_blong -( - HPL_T_panel * PANEL -) -#else -int HPL_binit_blong( PANEL ) - HPL_T_panel * PANEL; -#endif -{ -/* .. - * .. Executable Statements .. 
- */ - if( PANEL == NULL ) { return( HPL_SUCCESS ); } - if( PANEL->grid->npcol <= 1 ) { return( HPL_SUCCESS ); } -#ifdef HPL_USE_MPI_DATATYPE -#ifdef HPL_COPY_L -/* - * Copy the panel into a contiguous buffer - */ - HPL_copyL( PANEL ); -#endif -#else -/* - * Force the copy of the panel into a contiguous buffer - */ - HPL_copyL( PANEL ); -#endif - return( HPL_SUCCESS ); -} - -#ifdef HPL_USE_MPI_DATATYPE - -#define _M_BUFF_S PANEL->buffers[I_SEND] -#define _M_COUNT_S PANEL->counts[I_SEND] -#define _M_TYPE_S PANEL->dtypes[I_SEND] - -#define _M_BUFF_R PANEL->buffers[I_RECV] -#define _M_COUNT_R PANEL->counts[I_RECV] -#define _M_TYPE_R PANEL->dtypes[I_RECV] - -#define _M_ROLL_BUFF_S PANEL->buffers[I_SEND] -#define _M_ROLL_COUNT_S PANEL->counts[I_SEND] -#define _M_ROLL_TYPE_S PANEL->dtypes[I_SEND] - -#define _M_ROLL_BUFF_R PANEL->buffers[I_RECV] -#define _M_ROLL_COUNT_R PANEL->counts[I_RECV] -#define _M_ROLL_TYPE_R PANEL->dtypes[I_RECV] - -#else - -#define _M_BUFF_S (void *)(PANEL->L2 + ibuf) -#define _M_COUNT_S lbuf -#define _M_TYPE_S MPI_DOUBLE - -#define _M_BUFF_R (void *)(PANEL->L2 + ibuf) -#define _M_COUNT_R lbuf -#define _M_TYPE_R MPI_DOUBLE - -#define _M_ROLL_BUFF_S (void *)(PANEL->L2 + ibufS) -#define _M_ROLL_COUNT_S lbufS -#define _M_ROLL_TYPE_S MPI_DOUBLE - -#define _M_ROLL_BUFF_R (void *)(PANEL->L2 + ibufR) -#define _M_ROLL_COUNT_R lbufR -#define _M_ROLL_TYPE_R MPI_DOUBLE - -#endif - -#ifdef STDC_HEADERS -int HPL_bcast_blong -( - HPL_T_panel * PANEL, - int * IFLAG -) -#else -int HPL_bcast_blong( PANEL, IFLAG ) - HPL_T_panel * PANEL; - int * IFLAG; -#endif -{ -/* - * .. Local Variables .. - */ - MPI_Comm comm; - int COUNT, count, dummy=0, ierr=MPI_SUCCESS, - ibuf, ibufR, ibufS, indx, ip2, k, l, lbuf, - lbufR, lbufS, mask, msgid, mydist, mydist2, - next, npm1, partner, prev, rank, root, size; -/* .. - * .. Executable Statements .. 
- */ - if( PANEL == NULL ) { *IFLAG = HPL_SUCCESS; return( HPL_SUCCESS ); } - if( ( size = PANEL->grid->npcol ) <= 1 ) - { *IFLAG = HPL_SUCCESS; return( HPL_SUCCESS ); } -/* - * Cast phase: If I am the root process, start spreading the panel. If - * I am not the root process, test for message receive completion. If - * the message is there, then receive it, and keep spreading in a - * blocking fashion this time. Otherwise, inform the caller that the - * panel has still not been received. - */ - comm = PANEL->grid->row_comm; rank = PANEL->grid->mycol; - mask = PANEL->grid->col_mask; ip2 = PANEL->grid->col_ip2m1; - root = PANEL->pcol; msgid = PANEL->msgid; - COUNT = PANEL->len; npm1 = size - 1; - mydist2 = ( mydist = MModSub( rank, root, size ) ); indx = ip2; - count = COUNT / size; count = Mmax( count, 1 ); -/* - * Spread the panel across process columns - */ - do - { - mask ^= ip2; - - if( ( mydist & mask ) == 0 ) - { - lbuf = COUNT - ( ibuf = indx * count ); - if( indx + ip2 < size ) { l = ip2 * count; lbuf = Mmin( lbuf, l ); } - - partner = mydist ^ ip2; - - if( ( mydist & ip2 ) != 0 ) - { - partner = MModAdd( root, partner, size ); -/* - * This probing mechanism causes problems when lookhead is on. Too many - * messages are exchanged in this virtual topology causing a hang on - * some machines. It is currently disabled until a better understanding - * is acquired. 
- */ -#if 0 - ierr = MPI_Iprobe( partner, msgid, comm, &go, &PANEL->status[0] ); - if( ierr == MPI_SUCCESS ) - { /* if panel is not here, return and keep testing */ - if( go == 0 ) - { *IFLAG = HPL_KEEP_TESTING; return( HPL_KEEP_TESTING ); } - } -#endif - if( lbuf > 0 ) - { -#ifdef HPL_USE_MPI_DATATYPE - if( ierr == MPI_SUCCESS ) - ierr = HPL_packL( PANEL, ibuf, lbuf, I_RECV ); -#endif - if( ierr == MPI_SUCCESS ) - ierr = MPI_Recv( _M_BUFF_R, _M_COUNT_R, _M_TYPE_R, - partner, msgid, comm, &PANEL->status[0] ); -#ifdef HPL_USE_MPI_DATATYPE - if( ierr == MPI_SUCCESS ) - ierr = MPI_Type_free( &PANEL->dtypes[I_RECV] ); -#endif - } - else /* Recv message of length zero to enable probe */ - { - if( ierr == MPI_SUCCESS ) - ierr = MPI_Recv( (void *)(&dummy), 0, MPI_BYTE, partner, - msgid, comm, &PANEL->status[0] ); - } - } - else if( partner < size ) - { - partner = MModAdd( root, partner, size ); - - if( lbuf > 0 ) - { -#ifdef HPL_USE_MPI_DATATYPE - if( ierr == MPI_SUCCESS ) - ierr = HPL_packL( PANEL, ibuf, lbuf, I_SEND ); -#endif - if( ierr == MPI_SUCCESS ) - ierr = MPI_Ssend( _M_BUFF_S, _M_COUNT_S, _M_TYPE_S, - partner, msgid, comm ); -#ifdef HPL_USE_MPI_DATATYPE - if( ierr == MPI_SUCCESS ) - ierr = MPI_Type_free( &PANEL->dtypes[I_SEND] ); -#endif - } - else /* Send message of length zero to enable probe */ - { - if( ierr == MPI_SUCCESS ) - ierr = MPI_Ssend( (void *)(&dummy), 0, MPI_BYTE, - partner, msgid, comm ); - } - } - } - - if( mydist2 < ip2 ) { ip2 >>= 1; indx -= ip2; } - else { mydist2 -= ip2; ip2 >>= 1; indx += ip2; } - - } while( ip2 > 0 ); -/* - * Roll the pieces - */ - prev = MModSub1( rank, size ); next = MModAdd1( rank, size ); - - for( k = 0; k < npm1; k++ ) - { - l = ( k >> 1 ); -/* - * Who is sending to who and how much - */ - if( ( ( mydist + k ) & 1 ) != 0 ) - { - ibufS = ( indx = MModAdd( mydist, l, size ) ) * count; - lbufS = ( indx == npm1 ? 
COUNT : ibufS + count ); - lbufS = Mmin( COUNT, lbufS ) - ibufS; lbufS = Mmax( 0, lbufS ); - - ibufR = ( indx = MModSub( mydist, l+1, size ) ) * count; - lbufR = ( indx == npm1 ? COUNT : ibufR + count ); - lbufR = Mmin( COUNT, lbufR ) - ibufR; lbufR = Mmax( 0, lbufR ); - - partner = prev; - } - else - { - ibufS = ( indx = MModSub( mydist, l, size ) ) * count; - lbufS = ( indx == npm1 ? COUNT : ibufS + count ); - lbufS = Mmin( COUNT, lbufS ) - ibufS; lbufS = Mmax( 0, lbufS ); - - ibufR = ( indx = MModAdd( mydist, l+1, size ) ) * count; - lbufR = ( indx == npm1 ? COUNT : ibufR + count ); - lbufR = Mmin( COUNT, lbufR ) - ibufR; lbufR = Mmax( 0, lbufR ); - - partner = next; - } -/* - * Exchange the messages - */ - if( lbufS > 0 ) - { -#ifdef HPL_USE_MPI_DATATYPE - if( ierr == MPI_SUCCESS ) - ierr = HPL_packL( PANEL, ibufS, lbufS, I_SEND ); -#endif - if( ierr == MPI_SUCCESS ) - ierr = MPI_Issend( _M_ROLL_BUFF_S, _M_ROLL_COUNT_S, - _M_ROLL_TYPE_S, partner, msgid, comm, - &PANEL->request[0] ); - } - else - { - if( ierr == MPI_SUCCESS ) - ierr = MPI_Issend( (void *)(&dummy), 0, MPI_BYTE, partner, - msgid, comm, &PANEL->request[0] ); - } - - if( lbufR > 0 ) - { -#ifdef HPL_USE_MPI_DATATYPE - if( ierr == MPI_SUCCESS ) - ierr = HPL_packL( PANEL, ibufR, lbufR, I_RECV ); -#endif - if( ierr == MPI_SUCCESS ) - ierr = MPI_Recv( _M_ROLL_BUFF_R, _M_ROLL_COUNT_R, - _M_ROLL_TYPE_R, partner, msgid, comm, - &PANEL->status[0] ); -#ifdef HPL_USE_MPI_DATATYPE - if( ierr == MPI_SUCCESS ) - ierr = MPI_Type_free( &PANEL->dtypes[I_RECV] ); -#endif - } - else - { - if( ierr == MPI_SUCCESS ) - ierr = MPI_Recv( (void *)(&dummy), 0, MPI_BYTE, partner, - msgid, comm, &PANEL->status[0] ); - } - - if( ierr == MPI_SUCCESS ) - ierr = MPI_Wait ( &PANEL->request[0], &PANEL->status[0] ); -#ifdef HPL_USE_MPI_DATATYPE - if( ( lbufS > 0 ) && ( ierr == MPI_SUCCESS ) ) - ierr = MPI_Type_free( &PANEL->dtypes[I_SEND] ); -#endif - } -/* - * If the message was received and being forwarded, return HPL_SUCCESS. 
- * If an error occured in an MPI call, return HPL_FAILURE. - */ - *IFLAG = ( ierr == MPI_SUCCESS ? HPL_SUCCESS : HPL_FAILURE ); - - return( *IFLAG ); -} - -#ifdef STDC_HEADERS -int HPL_bwait_blong -( - HPL_T_panel * PANEL -) -#else -int HPL_bwait_blong( PANEL ) - HPL_T_panel * PANEL; -#endif -{ -/* .. - * .. Executable Statements .. - */ - if( PANEL == NULL ) { return( HPL_SUCCESS ); } - if( PANEL->grid->npcol <= 1 ) { return( HPL_SUCCESS ); } - - return( HPL_SUCCESS ); -} diff --git a/hpl/src/comm/HPL_bwait.c b/hpl/src/comm/HPL_bwait.c deleted file mode 100644 index e4432862859a3321a26c21e766d1ca1f068b3887..0000000000000000000000000000000000000000 --- a/hpl/src/comm/HPL_bwait.c +++ /dev/null @@ -1,109 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. 
The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -int HPL_bwait -( - HPL_T_panel * PANEL -) -#else -int HPL_bwait -( PANEL ) - HPL_T_panel * PANEL; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_bwait HPL_bwait waits for the row broadcast of the current panel to - * terminate. Successful completion is indicated by the returned error - * code HPL_SUCCESS. - * - * Arguments - * ========= - * - * PANEL (input/output) HPL_T_panel * - * On entry, PANEL points to the current panel data structure - * being broadcast. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - int ierr; - HPL_T_TOP top; -/* .. - * .. Executable Statements .. 
- */ - if( PANEL->grid->npcol <= 1 ) return( HPL_SUCCESS ); -/* - * Retrieve the selected virtual broadcast topology - */ - top = PANEL->algo->btopo; - - switch( top ) - { - case HPL_1RING_M : ierr = HPL_bwait_1rinM( PANEL ); break; - case HPL_1RING : ierr = HPL_bwait_1ring( PANEL ); break; - case HPL_2RING_M : ierr = HPL_bwait_2rinM( PANEL ); break; - case HPL_2RING : ierr = HPL_bwait_2ring( PANEL ); break; - case HPL_BLONG_M : ierr = HPL_bwait_blonM( PANEL ); break; - case HPL_BLONG : ierr = HPL_bwait_blong( PANEL ); break; - default : ierr = HPL_SUCCESS; - } - - return( ierr ); -/* - * End of HPL_bwait - */ -} diff --git a/hpl/src/comm/HPL_copyL.c b/hpl/src/comm/HPL_copyL.c deleted file mode 100644 index b07555aa4cfcafc188703b2aa5be1ad54a6e1fac..0000000000000000000000000000000000000000 --- a/hpl/src/comm/HPL_copyL.c +++ /dev/null @@ -1,108 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. 
The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -void HPL_copyL -( - HPL_T_panel * PANEL -) -#else -void HPL_copyL -( PANEL ) - HPL_T_panel * PANEL; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_copyL copies the panel of columns, the L1 replicated submatrix, - * the pivot array and the info scalar into a contiguous workspace for - * later broadcast. - * - * The copy of this panel into a contiguous buffer can be enforced by - * specifying -DHPL_COPY_L in the architecture specific Makefile. - * - * Arguments - * ========= - * - * PANEL (input/output) HPL_T_panel * - * On entry, PANEL points to the current panel data structure - * being broadcast. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - int jb, lda; -/* .. - * .. Executable Statements .. 
- */ - if( PANEL->grid->mycol == PANEL->pcol ) - { - jb = PANEL->jb; lda = PANEL->lda; - - if( PANEL->grid->myrow == PANEL->prow ) - { - HPL_dlacpy( PANEL->mp-jb, jb, Mptr( PANEL->A, jb, -jb, lda ), - lda, PANEL->L2, PANEL->ldl2 ); - } - else - { - HPL_dlacpy( PANEL->mp, jb, Mptr( PANEL->A, 0, -jb, lda ), - lda, PANEL->L2, PANEL->ldl2 ); - } - } -/* - * End of HPL_copyL - */ -} diff --git a/hpl/src/comm/HPL_packL.c b/hpl/src/comm/HPL_packL.c deleted file mode 100644 index 4bf98b01e153bd796f135bdfc8037f376eb7f904..0000000000000000000000000000000000000000 --- a/hpl/src/comm/HPL_packL.c +++ /dev/null @@ -1,245 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. 
- * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -int HPL_packL -( - HPL_T_panel * PANEL, - const int INDEX, - const int LEN, - const int IBUF -) -#else -int HPL_packL -( PANEL, INDEX, LEN, IBUF ) - HPL_T_panel * PANEL; - const int INDEX; - const int LEN; - const int IBUF; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_packL forms the MPI data type for the panel to be broadcast. - * Successful completion is indicated by the returned error code - * MPI_SUCCESS. - * - * Arguments - * ========= - * - * PANEL (input/output) HPL_T_panel * - * On entry, PANEL points to the current panel data structure - * being broadcast. - * - * INDEX (input) const int - * On entry, INDEX points to the first entry of the packed - * buffer being broadcast. - * - * LEN (input) const int - * On entry, LEN is the length of the packed buffer. - * - * IBUF (input) const int - * On entry, IBUF specifies the panel buffer/count/type entries - * that should be initialized. 
- * - * --------------------------------------------------------------------- - */ -#ifdef HPL_USE_MPI_DATATYPE -/* - * .. Local Variables .. - */ -#ifndef HPL_COPY_L - MPI_Datatype * type = NULL; - void * * * bufs = NULL; - double * A; - int * blen = NULL; - MPI_Aint * disp = NULL; - int curr, i, i1, ibuf, ierr=MPI_SUCCESS, j1, - jb, jbm, jbp1, lda, len, m, m1, nbufs; -#else - int ierr; -#endif -/* .. - * .. Executable Statements .. - */ -#ifdef HPL_COPY_L -/* - * Panel + L1 + DPIV have been copied into a contiguous buffer - Create - * and commit a contiguous data type - */ - PANEL->buffers[IBUF] = (void *)(PANEL->L2 + INDEX); - PANEL->counts [IBUF] = 1; - - ierr = MPI_Type_contiguous( LEN, MPI_DOUBLE, &PANEL->dtypes[IBUF] ); - if( ierr == MPI_SUCCESS ) - ierr = MPI_Type_commit( &PANEL->dtypes[IBUF] ); - - return( ierr ); -#else -/* - * Panel is not contiguous (because of LDA and also L1 + DPIV) - Create - * and commit a struct data type - */ - jbp1 = ( jb = PANEL->jb ) + 1; -/* - * Temporaries to create the type struct. 
- */ - bufs = (void * * *)malloc( jbp1 * sizeof( void * * ) ); - blen = (int *)malloc( jbp1 * sizeof( int ) ); - disp = (MPI_Aint *)malloc( jbp1 * sizeof( MPI_Aint ) ); - type = (MPI_Datatype *)malloc( jbp1 * sizeof( MPI_Datatype ) ); - - if( ( bufs != NULL ) && ( blen != NULL ) && - ( disp != NULL ) && ( type != NULL ) ) - { - m = PANEL->mp; curr = (int)( PANEL->grid->myrow == PANEL->prow ); - if( curr != 0 ) m -= jb; - - len = LEN; ibuf = INDEX; nbufs = 0; jbm = jb * m; - - if( ( m > 0 ) && ( ibuf < jbm ) ) - { -/* - * Retrieve proper pointers depending on process row and column - */ - if( PANEL->grid->mycol == PANEL->pcol ) - { - lda = PANEL->lda; - if( curr != 0 ) { A = Mptr( PANEL->A, jb, -jb, lda ); } - else { A = Mptr( PANEL->A, 0, -jb, lda ); } - } - else { lda = PANEL->ldl2; A = PANEL->L2; } -/* - * Pack the first (partial) column of L - */ - m1 = m - ( i1 = ibuf - ( j1 = ibuf / m ) * m ); - m1 = Mmin( len, m1 ); - - bufs[nbufs] = (void *)(Mptr( A, i1, j1, lda )); - type[nbufs] = MPI_DOUBLE; - blen[nbufs] = m1; - if( ierr == MPI_SUCCESS ) - ierr = MPI_Address( bufs[nbufs], &disp[nbufs] ); - - nbufs++; len -= m1; j1++; ibuf += m1; -/* - * Pack the remaining columns of L - */ - while( ( len > 0 ) && ( j1 < jb ) ) - { - m1 = Mmin( len, m ); - - bufs[nbufs] = (void*)(Mptr( A, 0, j1, lda )); - type[nbufs] = MPI_DOUBLE; - blen[nbufs] = m1; - if( ierr == MPI_SUCCESS ) - ierr = MPI_Address( bufs[nbufs], &disp[nbufs] ); - - nbufs++; len -= m1; j1++; ibuf += m1; - } - } -/* - * Pack L1, DPIV, DINFO - */ - if( len > 0 ) - { /* L1, DPIV, DINFO */ - bufs[nbufs] = (void *)(PANEL->L1 + ibuf - jbm); - type[nbufs] = MPI_DOUBLE; - blen[nbufs] = len; - if( ierr == MPI_SUCCESS ) - ierr = MPI_Address( bufs[nbufs], &disp[nbufs] ); - nbufs++; - } - - for( i = 1; i < nbufs; i++ ) disp[i] -= disp[0]; disp[0] = 0; - - PANEL->buffers[IBUF] = (void *)(bufs[0]); PANEL->counts [IBUF] = 1; -/* - * construct the struct type - */ - if( ierr == MPI_SUCCESS ) - ierr = MPI_Type_struct( 
nbufs, blen, disp, type, - &PANEL->dtypes[IBUF] ); -/* - * release temporaries - */ - if( bufs ) free( bufs ); - if( blen ) free( blen ); - if( disp ) free( disp ); - if( type ) free( type ); -/* - * commit the type - */ - if( ierr == MPI_SUCCESS ) - ierr = MPI_Type_commit( &PANEL->dtypes[IBUF] ); - - return( ierr ); - } - else - { -/* - * Memory allocation failed -> abort - */ - HPL_pabort( __LINE__, "HPL_packL", "Memory allocation failed" ); - return( MPI_SUCCESS ); /* never executed (hopefully ...) */ - } -#endif -#else - /* HPL_USE_MPI_DATATYPE not defined - Oops, there is a bug - somewhere, so, just in case and until I find it ... */ - return( MPI_SUCCESS ); -#endif -/* - * End of HPL_packL - */ -} diff --git a/hpl/src/comm/HPL_recv.c b/hpl/src/comm/HPL_recv.c deleted file mode 100644 index 74cca94e7ef36be827aaf329dc6ad33b2016c5fb..0000000000000000000000000000000000000000 --- a/hpl/src/comm/HPL_recv.c +++ /dev/null @@ -1,142 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. 
All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" -/* - * Do not use MPI user-defined data types no matter what. This routine - * is used for small contiguous messages. - */ -#ifdef HPL_USE_MPI_DATATYPE -#undef HPL_USE_MPI_DATATYPE -#endif - -#ifdef STDC_HEADERS -int HPL_recv -( - double * RBUF, - int RCOUNT, - int SRC, - int RTAG, - MPI_Comm COMM -) -#else -int HPL_recv -( RBUF, RCOUNT, SRC, RTAG, COMM ) - double * RBUF; - int RCOUNT; - int SRC; - int RTAG; - MPI_Comm COMM; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_recv is a simple wrapper around MPI_Recv. 
Its main purpose is - * to allow for some experimentation / tuning of this simple routine. - * Successful completion is indicated by the returned error code - * HPL_SUCCESS. In the case of messages of length less than or equal to - * zero, this function returns immediately. - * - * Arguments - * ========= - * - * RBUF (local output) double * - * On entry, RBUF specifies the starting address of buffer to be - * received. - * - * RCOUNT (local input) int - * On entry, RCOUNT specifies the number of double precision - * entries in RBUF. RCOUNT must be at least zero. - * - * SRC (local input) int - * On entry, SRC specifies the rank of the sending process in - * the communication space defined by COMM. - * - * RTAG (local input) int - * On entry, RTAG specifies the message tag to be used for this - * communication operation. - * - * COMM (local input) MPI_Comm - * The MPI communicator identifying the communication space. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - MPI_Status status; -#ifdef HPL_USE_MPI_DATATYPE - MPI_Datatype type; -#endif - int ierr; -/* .. - * .. Executable Statements .. - */ - if( RCOUNT <= 0 ) return( HPL_SUCCESS ); - -#ifdef HPL_USE_MPI_DATATYPE - ierr = MPI_Type_contiguous( RCOUNT, MPI_DOUBLE, &type ); - if( ierr == MPI_SUCCESS ) - ierr = MPI_Type_commit( &type ); - if( ierr == MPI_SUCCESS ) - ierr = MPI_Recv( (void *)(RBUF), 1, type, SRC, RTAG, COMM, - &status ); - if( ierr == MPI_SUCCESS ) - ierr = MPI_Type_free( &type ); -#else - ierr = MPI_Recv( (void *)(RBUF), RCOUNT, MPI_DOUBLE, SRC, RTAG, - COMM, &status ); -#endif - return( ( ierr == MPI_SUCCESS ? 
HPL_SUCCESS : HPL_FAILURE ) ); -/* - * End of HPL_recv - */ -} diff --git a/hpl/src/comm/HPL_sdrv.c b/hpl/src/comm/HPL_sdrv.c deleted file mode 100644 index ff2a65ae4aa8e052655caca074d2ecd75c00c568..0000000000000000000000000000000000000000 --- a/hpl/src/comm/HPL_sdrv.c +++ /dev/null @@ -1,239 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" -/* - * Do not use MPI user-defined data types no matter what. This routine - * is used for small contiguous messages. - */ -#ifdef HPL_USE_MPI_DATATYPE -#undef HPL_USE_MPI_DATATYPE -#endif - -#ifdef STDC_HEADERS -int HPL_sdrv -( - double * SBUF, - int SCOUNT, - int STAG, - double * RBUF, - int RCOUNT, - int RTAG, - int PARTNER, - MPI_Comm COMM -) -#else -int HPL_sdrv -( SBUF, SCOUNT, STAG, RBUF, RCOUNT, RTAG, PARTNER, COMM ) - double * SBUF; - int SCOUNT; - int STAG; - double * RBUF; - int RCOUNT; - int RTAG; - int PARTNER; - MPI_Comm COMM; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_sdrv is a simple wrapper around MPI_Sendrecv. Its main purpose is - * to allow for some experimentation and tuning of this simple function. - * Messages of length less than or equal to zero are not sent nor - * received. Successful completion is indicated by the returned error - * code HPL_SUCCESS. - * - * Arguments - * ========= - * - * SBUF (local input) double * - * On entry, SBUF specifies the starting address of buffer to be - * sent. - * - * SCOUNT (local input) int - * On entry, SCOUNT specifies the number of double precision - * entries in SBUF. SCOUNT must be at least zero. - * - * STAG (local input) int - * On entry, STAG specifies the message tag to be used for the - * sending communication operation. 
- * - * RBUF (local output) double * - * On entry, RBUF specifies the starting address of buffer to be - * received. - * - * RCOUNT (local input) int - * On entry, RCOUNT specifies the number of double precision - * entries in RBUF. RCOUNT must be at least zero. - * - * RTAG (local input) int - * On entry, RTAG specifies the message tag to be used for the - * receiving communication operation. - * - * PARTNER (local input) int - * On entry, PARTNER specifies the rank of the collaborative - * process in the communication space defined by COMM. - * - * COMM (local input) MPI_Comm - * The MPI communicator identifying the communication space. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ -#ifdef HPL_USE_MPI_DATATYPE - MPI_Datatype type[2]; -#endif - MPI_Request request; - MPI_Status status; - int ierr; -/* .. - * .. Executable Statements .. - */ - if( RCOUNT > 0 ) - { - if( SCOUNT > 0 ) - { -#ifdef HPL_USE_MPI_DATATYPE -/* - * Post asynchronous receive - */ - ierr = MPI_Type_contiguous( RCOUNT, MPI_DOUBLE, &type[0] ); - if( ierr == MPI_SUCCESS ) - ierr = MPI_Type_commit( &type[0] ); - if( ierr == MPI_SUCCESS ) - ierr = MPI_Irecv( (void *)(RBUF), 1, type[0], PARTNER, - RTAG, COMM, &request ); -/* - * Blocking send - */ - if( ierr == MPI_SUCCESS ) - ierr = MPI_Type_contiguous( SCOUNT, MPI_DOUBLE, &type[1] ); - if( ierr == MPI_SUCCESS ) - ierr = MPI_Type_commit( &type[1] ); - if( ierr == MPI_SUCCESS ) - ierr = MPI_Send( (void *)(SBUF), 1, type[1], PARTNER, - STAG, COMM ); - if( ierr == MPI_SUCCESS ) - ierr = MPI_Type_free( &type[1] ); -/* - * Wait for the receive to complete - */ - if( ierr == MPI_SUCCESS ) - ierr = MPI_Wait( &request, &status ); - if( ierr == MPI_SUCCESS ) - ierr = MPI_Type_free( &type[0] ); -#else -/* - * Post asynchronous receive - */ - ierr = MPI_Irecv( (void *)(RBUF), RCOUNT, MPI_DOUBLE, - PARTNER, RTAG, COMM, &request ); -/* - * Blocking send - */ - if( ierr == MPI_SUCCESS ) - 
ierr = MPI_Send( (void *)(SBUF), SCOUNT, MPI_DOUBLE, - PARTNER, STAG, COMM ); -/* - * Wait for the receive to complete - */ - if( ierr == MPI_SUCCESS ) - ierr = MPI_Wait( &request, &status ); -#endif - } - else - { -/* - * Blocking receive - */ -#ifdef HPL_USE_MPI_DATATYPE - ierr = MPI_Type_contiguous( RCOUNT, MPI_DOUBLE, &type[0] ); - if( ierr == MPI_SUCCESS ) - ierr = MPI_Type_commit( &type[0] ); - if( ierr == MPI_SUCCESS ) - ierr = MPI_Recv( (void *)(RBUF), 1, type[0], PARTNER, RTAG, - COMM, &status ); - if( ierr == MPI_SUCCESS ) - ierr = MPI_Type_free( &type[0] ); -#else - ierr = MPI_Recv( (void *)(RBUF), RCOUNT, MPI_DOUBLE, - PARTNER, RTAG, COMM, &status ); -#endif - } - } - else if( SCOUNT > 0 ) - { -/* - * Blocking send - */ -#ifdef HPL_USE_MPI_DATATYPE - ierr = MPI_Type_contiguous( SCOUNT, MPI_DOUBLE, &type[1] ); - if( ierr == MPI_SUCCESS ) - ierr = MPI_Type_commit( &type[1] ); - if( ierr == MPI_SUCCESS ) - ierr = MPI_Send( (void *)(SBUF), 1, type[1], PARTNER, STAG, - COMM ); - if( ierr == MPI_SUCCESS ) - ierr = MPI_Type_free( &type[1] ); -#else - ierr = MPI_Send( (void *)(SBUF), SCOUNT, MPI_DOUBLE, PARTNER, - STAG, COMM ); -#endif - } - else { ierr = MPI_SUCCESS; } - - return( ( ierr == MPI_SUCCESS ? HPL_SUCCESS : HPL_FAILURE ) ); -/* - * End of HPL_sdrv - */ -} diff --git a/hpl/src/comm/HPL_send.c b/hpl/src/comm/HPL_send.c deleted file mode 100644 index ec07f1ebb5a836befde212d2433d5cd19d66542a..0000000000000000000000000000000000000000 --- a/hpl/src/comm/HPL_send.c +++ /dev/null @@ -1,139 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. 
Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" -/* - * Do not use MPI user-defined data types no matter what. This routine - * is used for small contiguous messages. 
- */ -#ifdef HPL_USE_MPI_DATATYPE -#undef HPL_USE_MPI_DATATYPE -#endif - -#ifdef STDC_HEADERS -int HPL_send -( - double * SBUF, - int SCOUNT, - int DEST, - int STAG, - MPI_Comm COMM -) -#else -int HPL_send -( SBUF, SCOUNT, DEST, STAG, COMM ) - double * SBUF; - int SCOUNT; - int DEST; - int STAG; - MPI_Comm COMM; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_send is a simple wrapper around MPI_Send. Its main purpose is - * to allow for some experimentation / tuning of this simple routine. - * Successful completion is indicated by the returned error code - * HPL_SUCCESS. In the case of messages of length less than or equal to - * zero, this function returns immediately. - * - * Arguments - * ========= - * - * SBUF (local input) double * - * On entry, SBUF specifies the starting address of buffer to be - * sent. - * - * SCOUNT (local input) int - * On entry, SCOUNT specifies the number of double precision - * entries in SBUF. SCOUNT must be at least zero. - * - * DEST (local input) int - * On entry, DEST specifies the rank of the receiving process in - * the communication space defined by COMM. - * - * STAG (local input) int - * On entry, STAG specifies the message tag to be used for this - * communication operation. - * - * COMM (local input) MPI_Comm - * The MPI communicator identifying the communication space. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ -#ifdef HPL_USE_MPI_DATATYPE - MPI_Datatype type; -#endif - int ierr; -/* .. - * .. Executable Statements .. 
- */ - if( SCOUNT <= 0 ) return( HPL_SUCCESS ); - -#ifdef HPL_USE_MPI_DATATYPE - ierr = MPI_Type_contiguous( SCOUNT, MPI_DOUBLE, &type ); - if( ierr == MPI_SUCCESS ) - ierr = MPI_Type_commit( &type ); - if( ierr == MPI_SUCCESS ) - ierr = MPI_Send( (void *)(SBUF), 1, type, DEST, STAG, COMM ); - if( ierr == MPI_SUCCESS ) - ierr = MPI_Type_free( &type ); -#else - ierr = MPI_Send( (void *)(SBUF), SCOUNT, MPI_DOUBLE, DEST, STAG, COMM ); -#endif - return( ( ierr == MPI_SUCCESS ? HPL_SUCCESS : HPL_FAILURE ) ); -/* - * End of HPL_send - */ -} diff --git a/hpl/src/comm/intel64/Make.inc b/hpl/src/comm/intel64/Make.inc deleted file mode 120000 index c1d8167d071e028cbeff544106b9f386bbdcac8a..0000000000000000000000000000000000000000 --- a/hpl/src/comm/intel64/Make.inc +++ /dev/null @@ -1 +0,0 @@ -/home/valeriuc/work/PRACE/CodeVault/src/1_dense/hpl/Make.intel64 \ No newline at end of file diff --git a/hpl/src/comm/intel64/Makefile b/hpl/src/comm/intel64/Makefile deleted file mode 100644 index bef4904ac620dd9e2779f23d2d981a97a083ce24..0000000000000000000000000000000000000000 --- a/hpl/src/comm/intel64/Makefile +++ /dev/null @@ -1,111 +0,0 @@ -# -# -- High Performance Computing Linpack Benchmark (HPL) -# HPL - 2.1 - October 26, 2012 -# Antoine P. Petitet -# University of Tennessee, Knoxville -# Innovative Computing Laboratory -# (C) Copyright 2000-2008 All Rights Reserved -# -# -- Copyright notice and Licensing terms: -# -# Redistribution and use in source and binary forms, with or without -# modification, are permitted provided that the following conditions -# are met: -# -# 1. Redistributions of source code must retain the above copyright -# notice, this list of conditions and the following disclaimer. -# -# 2. Redistributions in binary form must reproduce the above copyright -# notice, this list of conditions, and the following disclaimer in the -# documentation and/or other materials provided with the distribution. -# -# 3. 
All advertising materials mentioning features or use of this -# software must display the following acknowledgement: -# This product includes software developed at the University of -# Tennessee, Knoxville, Innovative Computing Laboratory. -# -# 4. The name of the University, the name of the Laboratory, or the -# names of its contributors may not be used to endorse or promote -# products derived from this software without specific written -# permission. -# -# -- Disclaimer: -# -# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS -# ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR -# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY -# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, -# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT -# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, -# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY -# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT -# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
-# ###################################################################### -# -include Make.inc -# -# ###################################################################### -# -INCdep = \ - $(INCdir)/hpl_misc.h $(INCdir)/hpl_pmisc.h $(INCdir)/hpl_grid.h \ - $(INCdir)/hpl_panel.h $(INCdir)/hpl_pgesv.h -# -## Object files ######################################################## -# -HPL_comobj = \ - HPL_1ring.o HPL_1rinM.o HPL_2ring.o \ - HPL_2rinM.o HPL_blong.o HPL_blonM.o \ - HPL_packL.o HPL_copyL.o HPL_binit.o \ - HPL_bcast.o HPL_bwait.o HPL_send.o \ - HPL_recv.o HPL_sdrv.o -# -## Targets ############################################################# -# -all : lib -# -lib : lib.grd -# -lib.grd : $(HPL_comobj) - $(ARCHIVER) $(ARFLAGS) $(HPLlib) $(HPL_comobj) - $(RANLIB) $(HPLlib) - $(TOUCH) lib.grd -# -# ###################################################################### -# -HPL_1ring.o : ../HPL_1ring.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_1ring.c -HPL_1rinM.o : ../HPL_1rinM.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_1rinM.c -HPL_2ring.o : ../HPL_2ring.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_2ring.c -HPL_2rinM.o : ../HPL_2rinM.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_2rinM.c -HPL_blong.o : ../HPL_blong.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_blong.c -HPL_blonM.o : ../HPL_blonM.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_blonM.c -HPL_packL.o : ../HPL_packL.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_packL.c -HPL_copyL.o : ../HPL_copyL.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_copyL.c -HPL_binit.o : ../HPL_binit.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_binit.c -HPL_bcast.o : ../HPL_bcast.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_bcast.c -HPL_bwait.o : ../HPL_bwait.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_bwait.c -HPL_send.o : ../HPL_send.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_send.c -HPL_recv.o : ../HPL_recv.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_recv.c -HPL_sdrv.o : ../HPL_sdrv.c $(INCdep) - 
$(CC) -o $@ -c $(CCFLAGS) ../HPL_sdrv.c -# -# ###################################################################### -# -clean : - $(RM) *.o *.grd -# -# ###################################################################### diff --git a/hpl/src/comm/intel64/lib.grd b/hpl/src/comm/intel64/lib.grd deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/hpl/src/grid/HPL_all_reduce.c b/hpl/src/grid/HPL_all_reduce.c deleted file mode 100644 index 13a15983b480217de2748fa2c23284449619a928..0000000000000000000000000000000000000000 --- a/hpl/src/grid/HPL_all_reduce.c +++ /dev/null @@ -1,114 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. 
- * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -int HPL_all_reduce -( - void * BUFFER, - const int COUNT, - const HPL_T_TYPE DTYPE, - const HPL_T_OP OP, - MPI_Comm COMM -) -#else -int HPL_all_reduce -( BUFFER, COUNT, DTYPE, OP, COMM ) - void * BUFFER; - const int COUNT; - const HPL_T_TYPE DTYPE; - const HPL_T_OP OP; - MPI_Comm COMM; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_all_reduce performs a global reduce operation across all - * processes of a group leaving the results on all processes. - * - * Arguments - * ========= - * - * BUFFER (local input/global output) void * - * On entry, BUFFER points to the buffer to be combined. On - * exit, this array contains the combined data and is identical - * on all processes in the group. - * - * COUNT (global input) const int - * On entry, COUNT indicates the number of entries in BUFFER. - * COUNT must be at least zero. - * - * DTYPE (global input) const HPL_T_TYPE - * On entry, DTYPE specifies the type of the buffer operands.
- * - * OP (global input) const HPL_T_OP - * On entry, OP is a pointer to the local combine function. - * - * COMM (global/local input) MPI_Comm - * The MPI communicator identifying the process collection. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - int hplerr; -/* .. - * .. Executable Statements .. - */ - hplerr = HPL_reduce( BUFFER, COUNT, DTYPE, OP, 0, COMM ); - if( hplerr != MPI_SUCCESS ) return( hplerr ); - return( HPL_broadcast( BUFFER, COUNT, DTYPE, 0, COMM ) ); -/* - * End of HPL_all_reduce - */ -} diff --git a/hpl/src/grid/HPL_barrier.c b/hpl/src/grid/HPL_barrier.c deleted file mode 100644 index c07c873a2e1232e5347fc95f5d2aa5afad64765c..0000000000000000000000000000000000000000 --- a/hpl/src/grid/HPL_barrier.c +++ /dev/null @@ -1,90 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. 
The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -int HPL_barrier -( - MPI_Comm COMM -) -#else -int HPL_barrier -( COMM ) - MPI_Comm COMM; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_barrier blocks the caller until all process members have called it. - * The call returns at any process only after all group members have - * entered the call. - * - * Arguments - * ========= - * - * COMM (global/local input) MPI_Comm - * The MPI communicator identifying the process collection. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - int i=0; -/* .. - * .. Executable Statements .. 
- */ - return( HPL_broadcast( (void*)(&i), 1, HPL_INT, 0, COMM ) ); -/* - * End of HPL_barrier - */ -} diff --git a/hpl/src/grid/HPL_broadcast.c b/hpl/src/grid/HPL_broadcast.c deleted file mode 100644 index 1927170914c1cd17ff6817f6bb552e260f789174..0000000000000000000000000000000000000000 --- a/hpl/src/grid/HPL_broadcast.c +++ /dev/null @@ -1,147 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -int HPL_broadcast -( - void * BUFFER, - const int COUNT, - const HPL_T_TYPE DTYPE, - const int ROOT, - MPI_Comm COMM -) -#else -int HPL_broadcast -( BUFFER, COUNT, DTYPE, ROOT, COMM ) - void * BUFFER; - const int COUNT; - const HPL_T_TYPE DTYPE; - const int ROOT; - MPI_Comm COMM; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_broadcast broadcasts a message from the process with rank ROOT to - * all processes in the group. - * - * Arguments - * ========= - * - * BUFFER (local input/output) void * - * On entry, BUFFER points to the buffer to be broadcast. On - * exit, this array contains the broadcast data and is identical - * on all processes in the group. - * - * COUNT (global input) const int - * On entry, COUNT indicates the number of entries in BUFFER. - * COUNT must be at least zero. - * - * DTYPE (global input) const HPL_T_TYPE - * On entry, DTYPE specifies the type of the buffers operands. - * - * ROOT (global input) const int - * On entry, ROOT is the coordinate of the source process. - * - * COMM (global/local input) MPI_Comm - * The MPI communicator identifying the process collection. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. 
- */ - int hplerr=MPI_SUCCESS, ip2=1, kk, mask=1, - mpierr, mydist, partner, rank, size, - tag = MSGID_BEGIN_COLL; - MPI_Status status; -/* .. - * .. Executable Statements .. - */ - if( COUNT <= 0 ) return( MPI_SUCCESS ); - mpierr = MPI_Comm_size( COMM, &size ); if( size <= 1 ) return( mpierr ); - mpierr = MPI_Comm_rank( COMM, &rank ); - - kk = size - 1; - while( kk > 1 ) { kk >>= 1; ip2 <<= 1; mask <<= 1; mask++; } - mydist = MModSub( rank, ROOT, size ); - - do - { - mask ^= ip2; - if( ( mydist & mask ) == 0 ) - { - partner = mydist ^ ip2; - - if( mydist & ip2 ) - { - partner = MModAdd( ROOT, partner, size ); - mpierr = MPI_Recv( BUFFER, COUNT, HPL_2_MPI_TYPE( DTYPE ), - partner, tag, COMM, &status ); - } - else if( partner < size ) - { - partner = MModAdd( ROOT, partner, size ); - mpierr = MPI_Send( BUFFER, COUNT, HPL_2_MPI_TYPE( DTYPE ), - partner, tag, COMM ); - } - if( mpierr != MPI_SUCCESS ) hplerr = mpierr; - } - ip2 >>= 1; - } while( ip2 ); - - return( hplerr ); -/* - * End of HPL_broadcast - */ -} diff --git a/hpl/src/grid/HPL_grid_exit.c b/hpl/src/grid/HPL_grid_exit.c deleted file mode 100644 index ebdde42fae9696c21afb774cea8d62ab4e1f8ce6..0000000000000000000000000000000000000000 --- a/hpl/src/grid/HPL_grid_exit.c +++ /dev/null @@ -1,109 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. 
Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -int HPL_grid_exit -( - HPL_T_grid * GRID -) -#else -int HPL_grid_exit -( GRID ) - HPL_T_grid * GRID; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_grid_exit marks the process grid object for deallocation. The - * returned error code MPI_SUCCESS indicates successful completion. - * Other error codes are (MPI) implementation dependent. 
- * - * Arguments - * ========= - * - * GRID (local input/output) HPL_T_grid * - * On entry, GRID points to the data structure containing the - * process grid to be released. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - int hplerr = MPI_SUCCESS, mpierr; -/* .. - * .. Executable Statements .. - */ - if( GRID->all_comm != MPI_COMM_NULL ) - { - mpierr = MPI_Comm_free( &(GRID->row_comm) ); - if( mpierr != MPI_SUCCESS ) hplerr = mpierr; - mpierr = MPI_Comm_free( &(GRID->col_comm) ); - if( mpierr != MPI_SUCCESS ) hplerr = mpierr; - mpierr = MPI_Comm_free( &(GRID->all_comm) ); - if( mpierr != MPI_SUCCESS ) hplerr = mpierr; - } - - GRID->order = HPL_COLUMN_MAJOR; - - GRID->iam = GRID->myrow = GRID->mycol = -1; - GRID->nprow = GRID->npcol = GRID->nprocs = -1; - - GRID->row_ip2 = GRID->row_hdim = GRID->row_ip2m1 = GRID->row_mask = -1; - GRID->col_ip2 = GRID->col_hdim = GRID->col_ip2m1 = GRID->col_mask = -1; - - return( hplerr ); -/* - * End of HPL_grid_exit - */ -} diff --git a/hpl/src/grid/HPL_grid_info.c b/hpl/src/grid/HPL_grid_info.c deleted file mode 100644 index 514f4e43068da48905a7272ca7cf6a9ad2e1b698..0000000000000000000000000000000000000000 --- a/hpl/src/grid/HPL_grid_info.c +++ /dev/null @@ -1,116 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. 
Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
- * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -int HPL_grid_info -( - const HPL_T_grid * GRID, - int * NPROW, - int * NPCOL, - int * MYROW, - int * MYCOL -) -#else -int HPL_grid_info -( GRID, NPROW, NPCOL, MYROW, MYCOL ) - const HPL_T_grid * GRID; - int * NPROW; - int * NPCOL; - int * MYROW; - int * MYCOL; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_grid_info returns the grid shape and the coordinates in the grid - * of the calling process. Successful completion is indicated by the - * returned error code MPI_SUCCESS. Other error codes depend on the MPI - * implementation. - * - * Arguments - * ========= - * - * GRID (local input) const HPL_T_grid * - * On entry, GRID points to the data structure containing the - * process grid information. - * - * NPROW (global output) int * - * On exit, NPROW specifies the number of process rows in the - * grid. NPROW is at least one. - * - * NPCOL (global output) int * - * On exit, NPCOL specifies the number of process columns in - * the grid. NPCOL is at least one. - * - * MYROW (global output) int * - * On exit, MYROW specifies my row process coordinate in the - * grid. MYROW is greater than or equal to zero and less than - * NPROW. - * - * MYCOL (global output) int * - * On exit, MYCOL specifies my column process coordinate in the - * grid. MYCOL is greater than or equal to zero and less than - * NPCOL. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. 
- */ - *NPROW = GRID->nprow; *NPCOL = GRID->npcol; - *MYROW = GRID->myrow; *MYCOL = GRID->mycol; - return( MPI_SUCCESS ); -/* - * End of HPL_grid_info - */ -} diff --git a/hpl/src/grid/HPL_grid_init.c b/hpl/src/grid/HPL_grid_init.c deleted file mode 100644 index fd74c05b731a6b32a975b0b2ba678179f0784778..0000000000000000000000000000000000000000 --- a/hpl/src/grid/HPL_grid_init.c +++ /dev/null @@ -1,184 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -int HPL_grid_init -( - MPI_Comm COMM, - const HPL_T_ORDER ORDER, - const int NPROW, - const int NPCOL, - HPL_T_grid * GRID -) -#else -int HPL_grid_init -( COMM, ORDER, NPROW, NPCOL, GRID ) - MPI_Comm COMM; - const HPL_T_ORDER ORDER; - const int NPROW; - const int NPCOL; - HPL_T_grid * GRID; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_grid_init creates a NPROW x NPCOL process grid using column- or - * row-major ordering from an initial collection of processes identified - * by an MPI communicator. Successful completion is indicated by the - * returned error code MPI_SUCCESS. Other error codes depend on the MPI - * implementation. The coordinates of processes that are not part of the - * grid are set to values outside of [0..NPROW) x [0..NPCOL). - * - * Arguments - * ========= - * - * COMM (global/local input) MPI_Comm - * On entry, COMM is the MPI communicator identifying the - * initial collection of processes out of which the grid is - * formed. 
- * - * ORDER (global input) const HPL_T_ORDER - * On entry, ORDER specifies how the processes should be ordered - * in the grid as follows: - * ORDER = HPL_ROW_MAJOR row-major ordering; - * ORDER = HPL_COLUMN_MAJOR column-major ordering; - * - * NPROW (global input) const int - * On entry, NPROW specifies the number of process rows in the - * grid to be created. NPROW must be at least one. - * - * NPCOL (global input) const int - * On entry, NPCOL specifies the number of process columns in - * the grid to be created. NPCOL must be at least one. - * - * GRID (local input/output) HPL_T_grid * - * On entry, GRID points to the data structure containing the - * process grid information to be initialized. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - int hdim, hplerr=MPI_SUCCESS, ierr, ip2, k, - mask, mycol, myrow, nprocs, rank, size; -/* .. - * .. Executable Statements .. - */ - MPI_Comm_rank( COMM, &rank ); MPI_Comm_size( COMM, &size ); -/* - * Abort if illegal process grid - */ - nprocs = NPROW * NPCOL; - if( ( nprocs > size ) || ( NPROW < 1 ) || ( NPCOL < 1 ) ) - { HPL_pabort( __LINE__, "HPL_grid_init", "Illegal Grid" ); } -/* - * Row- or column-major ordering of the processes - */ - if( ORDER == HPL_ROW_MAJOR ) - { - GRID->order = HPL_ROW_MAJOR; - myrow = rank / NPCOL; mycol = rank - myrow * NPCOL; - } - else - { - GRID->order = HPL_COLUMN_MAJOR; - mycol = rank / NPROW; myrow = rank - mycol * NPROW; - } - GRID->iam = rank; GRID->myrow = myrow; GRID->mycol = mycol; - GRID->nprow = NPROW; GRID->npcol = NPCOL; GRID->nprocs = nprocs; -/* - * row_ip2 : largest power of two <= nprow; - * row_hdim : row_ip2 procs hypercube dim; - * row_ip2m1 : largest power of two <= nprow-1; - * row_mask : row_ip2m1 procs hypercube mask; - */ - hdim = 0; ip2 = 1; k = NPROW; - while( k > 1 ) { k >>= 1; ip2 <<= 1; hdim++; } - GRID->row_ip2 = ip2; GRID->row_hdim = hdim; - - mask = ip2 = 1; k = NPROW - 1; - while( k > 
1 ) { k >>= 1; ip2 <<= 1; mask <<= 1; mask++; } - GRID->row_ip2m1 = ip2; GRID->row_mask = mask; -/* - * col_ip2 : largest power of two <= npcol; - * col_hdim : col_ip2 procs hypercube dim; - * col_ip2m1 : largest power of two <= npcol-1; - * col_mask : col_ip2m1 procs hypercube mask; - */ - hdim = 0; ip2 = 1; k = NPCOL; - while( k > 1 ) { k >>= 1; ip2 <<= 1; hdim++; } - GRID->col_ip2 = ip2; GRID->col_hdim = hdim; - - mask = ip2 = 1; k = NPCOL - 1; - while( k > 1 ) { k >>= 1; ip2 <<= 1; mask <<= 1; mask++; } - GRID->col_ip2m1 = ip2; GRID->col_mask = mask; -/* - * All communicator, leave if I am not part of this grid. Creation of the - * row- and column communicators. - */ - ierr = MPI_Comm_split( COMM, ( rank < nprocs ? 0 : MPI_UNDEFINED ), - rank, &(GRID->all_comm) ); - if( GRID->all_comm == MPI_COMM_NULL ) return( ierr ); - - ierr = MPI_Comm_split( GRID->all_comm, myrow, mycol, &(GRID->row_comm) ); - if( ierr != MPI_SUCCESS ) hplerr = ierr; - - ierr = MPI_Comm_split( GRID->all_comm, mycol, myrow, &(GRID->col_comm) ); - if( ierr != MPI_SUCCESS ) hplerr = ierr; - - return( hplerr ); -/* - * End of HPL_grid_init - */ -} diff --git a/hpl/src/grid/HPL_max.c b/hpl/src/grid/HPL_max.c deleted file mode 100644 index effba9c6d02679dc55783aef824c501859419465..0000000000000000000000000000000000000000 --- a/hpl/src/grid/HPL_max.c +++ /dev/null @@ -1,118 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. 
Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -void HPL_max -( - const int N, - const void * IN, - void * INOUT, - const HPL_T_TYPE DTYPE -) -#else -void HPL_max -( N, IN, INOUT, DTYPE ) - const int N; - const void * IN; - void * INOUT; - const HPL_T_TYPE DTYPE; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_max combines (max) two buffers. 
- * - * - * Arguments - * ========= - * - * N (input) const int - * On entry, N specifies the length of the buffers to be - * combined. N must be at least zero. - * - * IN (input) const void * - * On entry, IN points to the input-only buffer to be combined. - * - * INOUT (input/output) void * - * On entry, INOUT points to the input-output buffer to be - * combined. On exit, the entries of this array contains the - * combined results. - * - * DTYPE (input) const HPL_T_TYPE - * On entry, DTYPE specifies the type of the buffers operands. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - register int i; -/* .. - * .. Executable Statements .. - */ - if( DTYPE == HPL_INT ) - { - const int * a = (const int *)(IN); - int * b = (int *)(INOUT); - for( i = 0; i < N; i++ ) b[i] = Mmax( a[i], b[i] ); - } - else - { - const double * a = (const double *)(IN); - double * b = (double *)(INOUT); - for( i = 0; i < N; i++ ) b[i] = Mmax( a[i], b[i] ); - } -/* - * End of HPL_max - */ -} diff --git a/hpl/src/grid/HPL_min.c b/hpl/src/grid/HPL_min.c deleted file mode 100644 index f1a9c21e36245157f81db05cadf1c9207360196d..0000000000000000000000000000000000000000 --- a/hpl/src/grid/HPL_min.c +++ /dev/null @@ -1,118 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. 
Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -void HPL_min -( - const int N, - const void * IN, - void * INOUT, - const HPL_T_TYPE DTYPE -) -#else -void HPL_min -( N, IN, INOUT, DTYPE ) - const int N; - const void * IN; - void * INOUT; - const HPL_T_TYPE DTYPE; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_min combines (min) two buffers. 
- * - * - * Arguments - * ========= - * - * N (input) const int - * On entry, N specifies the length of the buffers to be - * combined. N must be at least zero. - * - * IN (input) const void * - * On entry, IN points to the input-only buffer to be combined. - * - * INOUT (input/output) void * - * On entry, INOUT points to the input-output buffer to be - * combined. On exit, the entries of this array contains the - * combined results. - * - * DTYPE (input) const HPL_T_TYPE - * On entry, DTYPE specifies the type of the buffers operands. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - register int i; -/* .. - * .. Executable Statements .. - */ - if( DTYPE == HPL_INT ) - { - const int * a = (const int *)(IN); - int * b = (int *)(INOUT); - for( i = 0; i < N; i++ ) b[i] = Mmin( a[i], b[i] ); - } - else - { - const double * a = (const double *)(IN); - double * b = (double *)(INOUT); - for( i = 0; i < N; i++ ) b[i] = Mmin( a[i], b[i] ); - } -/* - * End of HPL_min - */ -} diff --git a/hpl/src/grid/HPL_pnum.c b/hpl/src/grid/HPL_pnum.c deleted file mode 100644 index a9863dc057535f59f6456c3479aabce446fb30da..0000000000000000000000000000000000000000 --- a/hpl/src/grid/HPL_pnum.c +++ /dev/null @@ -1,103 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. 
Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -int HPL_pnum -( - const HPL_T_grid * GRID, - const int MYROW, - const int MYCOL -) -#else -int HPL_pnum -( GRID, MYROW, MYCOL ) - const HPL_T_grid * GRID; - const int MYROW; - const int MYCOL; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_pnum determines the rank of a process as a function of its - * coordinates in the grid. 
- * - * Arguments - * ========= - * - * GRID (local input) const HPL_T_grid * - * On entry, GRID points to the data structure containing the - * process grid information. - * - * MYROW (local input) const int - * On entry, MYROW specifies the row coordinate of the process - * whose rank is to be determined. MYROW must be greater than or - * equal to zero and less than NPROW. - * - * MYCOL (local input) const int - * On entry, MYCOL specifies the column coordinate of the - * process whose rank is to be determined. MYCOL must be greater - * than or equal to zero and less than NPCOL. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - if( GRID->order == HPL_ROW_MAJOR ) - return( MYROW * GRID->npcol + MYCOL ); - else - return( MYCOL * GRID->nprow + MYROW ); -/* - * End of HPL_pnum - */ -} diff --git a/hpl/src/grid/HPL_reduce.c b/hpl/src/grid/HPL_reduce.c deleted file mode 100644 index 645ff08bc4a19ef8c18077ed60ad650683d3f448..0000000000000000000000000000000000000000 --- a/hpl/src/grid/HPL_reduce.c +++ /dev/null @@ -1,179 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. 
All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -int HPL_reduce -( - void * BUFFER, - const int COUNT, - const HPL_T_TYPE DTYPE, - const HPL_T_OP OP, - const int ROOT, - MPI_Comm COMM -) -#else -int HPL_reduce -( BUFFER, COUNT, DTYPE, OP, ROOT, COMM ) - void * BUFFER; - const int COUNT; - const HPL_T_TYPE DTYPE; - const HPL_T_OP OP; - const int ROOT; - MPI_Comm COMM; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_reduce performs a global reduce operation across all processes of - * a group. 
Note that the input buffer is used as workarray and in all - * processes but the accumulating process corrupting the original data. - * - * Arguments - * ========= - * - * BUFFER (local input/output) void * - * On entry, BUFFER points to the buffer to be reduced. On - * exit, and in process of rank ROOT this array contains the - * reduced data. This buffer is also used as workspace during - * the operation in the other processes of the group. - * - * COUNT (global input) const int - * On entry, COUNT indicates the number of entries in BUFFER. - * COUNT must be at least zero. - * - * DTYPE (global input) const HPL_T_TYPE - * On entry, DTYPE specifies the type of the buffers operands. - * - * OP (global input) const HPL_T_OP - * On entry, OP is a pointer to the local combine function. - * - * ROOT (global input) const int - * On entry, ROOT is the coordinate of the accumulating process. - * - * COMM (global/local input) MPI_Comm - * The MPI communicator identifying the process collection. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - MPI_Status status; - void * buffer = NULL; - int hplerr=MPI_SUCCESS, d=1, i, ip2=1, mask=0, - mpierr, mydist, partner, rank, size, - tag = MSGID_BEGIN_COLL; -/* .. - * .. Executable Statements .. 
- */ - if( COUNT <= 0 ) return( MPI_SUCCESS ); - mpierr = MPI_Comm_size( COMM, &size ); - if( size == 1 ) return( MPI_SUCCESS ); - mpierr = MPI_Comm_rank( COMM, &rank ); - i = size - 1; while( i > 1 ) { i >>= 1; d++; } - - if( DTYPE == HPL_INT ) - buffer = (void *)( (int *) malloc( (size_t)(COUNT) * - sizeof( int ) ) ); - else - buffer = (void *)( (double *)malloc( (size_t)(COUNT) * - sizeof( double ) ) ); - - if( !( buffer ) ) - { HPL_pabort( __LINE__, "HPL_reduce", "Memory allocation failed" ); } - - if( ( mydist = MModSub( rank, ROOT, size ) ) == 0 ) - { - do - { - mpierr = MPI_Recv( buffer, COUNT, HPL_2_MPI_TYPE( DTYPE ), - MModAdd( ROOT, ip2, size ), tag, COMM, - &status ); - if( mpierr != MPI_SUCCESS ) hplerr = mpierr; - OP( COUNT, buffer, BUFFER, DTYPE ); - ip2 <<= 1; d--; - } while( d ); - } - else - { - do - { - if( ( mydist & mask ) == 0 ) - { - partner = mydist ^ ip2; - - if( mydist & ip2 ) - { - partner = MModAdd( ROOT, partner, size ); - mpierr = MPI_Send( BUFFER, COUNT, HPL_2_MPI_TYPE( DTYPE ), - partner, tag, COMM ); - } - else if( partner < size ) - { - partner = MModAdd( ROOT, partner, size ); - mpierr = MPI_Recv( buffer, COUNT, HPL_2_MPI_TYPE( DTYPE ), - partner, tag, COMM, &status ); - OP( COUNT, buffer, BUFFER, DTYPE ); - } - if( mpierr != MPI_SUCCESS ) hplerr = mpierr; - } - mask ^= ip2; ip2 <<= 1; d--; - } while( d ); - } - if( buffer ) free( buffer ); - - return( hplerr ); -/* - * End of HPL_reduce - */ -} diff --git a/hpl/src/grid/HPL_sum.c b/hpl/src/grid/HPL_sum.c deleted file mode 100644 index b8be8c89d9d0e403e66b078637949506f405d702..0000000000000000000000000000000000000000 --- a/hpl/src/grid/HPL_sum.c +++ /dev/null @@ -1,118 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. 
Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -void HPL_sum -( - const int N, - const void * IN, - void * INOUT, - const HPL_T_TYPE DTYPE -) -#else -void HPL_sum -( N, IN, INOUT, DTYPE ) - const int N; - const void * IN; - void * INOUT; - const HPL_T_TYPE DTYPE; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_sum combines (sum) two buffers. - * - * - * Arguments - * ========= - * - * N (input) const int - * On entry, N specifies the length of the buffers to be - * combined. N must be at least zero. - * - * IN (input) const void * - * On entry, IN points to the input-only buffer to be combined. - * - * INOUT (input/output) void * - * On entry, INOUT points to the input-output buffer to be - * combined. On exit, the entries of this array contains the - * combined results. - * - * DTYPE (input) const HPL_T_TYPE - * On entry, DTYPE specifies the type of the buffers operands. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - register int i; -/* .. - * .. Executable Statements .. 
- */ - if( DTYPE == HPL_INT ) - { - const int * a = (const int *)(IN); - int * b = (int *)(INOUT); - for( i = 0; i < N; i++ ) b[i] += a[i]; - } - else - { - const double * a = (const double *)(IN); - double * b = (double *)(INOUT); - for( i = 0; i < N; i++ ) b[i] += a[i]; - } -/* - * End of HPL_sum - */ -} diff --git a/hpl/src/grid/intel64/Make.inc b/hpl/src/grid/intel64/Make.inc deleted file mode 120000 index c1d8167d071e028cbeff544106b9f386bbdcac8a..0000000000000000000000000000000000000000 --- a/hpl/src/grid/intel64/Make.inc +++ /dev/null @@ -1 +0,0 @@ -/home/valeriuc/work/PRACE/CodeVault/src/1_dense/hpl/Make.intel64 \ No newline at end of file diff --git a/hpl/src/grid/intel64/Makefile b/hpl/src/grid/intel64/Makefile deleted file mode 100644 index 4a493f49d7f6c84552fd2112ad95fc1f4e0924b3..0000000000000000000000000000000000000000 --- a/hpl/src/grid/intel64/Makefile +++ /dev/null @@ -1,103 +0,0 @@ -# -# -- High Performance Computing Linpack Benchmark (HPL) -# HPL - 2.1 - October 26, 2012 -# Antoine P. Petitet -# University of Tennessee, Knoxville -# Innovative Computing Laboratory -# (C) Copyright 2000-2008 All Rights Reserved -# -# -- Copyright notice and Licensing terms: -# -# Redistribution and use in source and binary forms, with or without -# modification, are permitted provided that the following conditions -# are met: -# -# 1. Redistributions of source code must retain the above copyright -# notice, this list of conditions and the following disclaimer. -# -# 2. Redistributions in binary form must reproduce the above copyright -# notice, this list of conditions, and the following disclaimer in the -# documentation and/or other materials provided with the distribution. -# -# 3. All advertising materials mentioning features or use of this -# software must display the following acknowledgement: -# This product includes software developed at the University of -# Tennessee, Knoxville, Innovative Computing Laboratory. -# -# 4. 
The name of the University, the name of the Laboratory, or the -# names of its contributors may not be used to endorse or promote -# products derived from this software without specific written -# permission. -# -# -- Disclaimer: -# -# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS -# ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR -# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY -# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, -# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT -# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, -# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY -# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT -# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
-# ###################################################################### -# -include Make.inc -# -# ###################################################################### -# -INCdep = \ - $(INCdir)/hpl_misc.h $(INCdir)/hpl_pmisc.h $(INCdir)/hpl_grid.h -# -## Object files ######################################################## -# -HPL_griobj = \ - HPL_grid_init.o HPL_pnum.o HPL_grid_info.o \ - HPL_grid_exit.o HPL_broadcast.o HPL_reduce.o \ - HPL_all_reduce.o HPL_barrier.o HPL_min.o \ - HPL_max.o HPL_sum.o -# -## Targets ############################################################# -# -all : lib -# -lib : lib.grd -# -lib.grd : $(HPL_griobj) - $(ARCHIVER) $(ARFLAGS) $(HPLlib) $(HPL_griobj) - $(RANLIB) $(HPLlib) - $(TOUCH) lib.grd -# -# ###################################################################### -# -HPL_grid_init.o : ../HPL_grid_init.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_grid_init.c -HPL_pnum.o : ../HPL_pnum.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_pnum.c -HPL_grid_info.o : ../HPL_grid_info.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_grid_info.c -HPL_grid_exit.o : ../HPL_grid_exit.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_grid_exit.c -HPL_broadcast.o : ../HPL_broadcast.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_broadcast.c -HPL_reduce.o : ../HPL_reduce.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_reduce.c -HPL_all_reduce.o : ../HPL_all_reduce.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_all_reduce.c -HPL_barrier.o : ../HPL_barrier.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_barrier.c -HPL_min.o : ../HPL_min.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_min.c -HPL_max.o : ../HPL_max.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_max.c -HPL_sum.o : ../HPL_sum.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_sum.c -# -# ###################################################################### -# -clean : - $(RM) *.o *.grd -# -# ###################################################################### diff --git 
a/hpl/src/grid/intel64/lib.grd b/hpl/src/grid/intel64/lib.grd deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/hpl/src/panel/HPL_pdpanel_disp.c b/hpl/src/panel/HPL_pdpanel_disp.c deleted file mode 100644 index 756094c9a01fd12a6b847f469561d7322bc16c9b..0000000000000000000000000000000000000000 --- a/hpl/src/panel/HPL_pdpanel_disp.c +++ /dev/null @@ -1,97 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. 
- * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -int HPL_pdpanel_disp -( - HPL_T_panel * * PANEL -) -#else -int HPL_pdpanel_disp -( PANEL ) - HPL_T_panel * * PANEL; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_pdpanel_disp deallocates the panel structure and resources and - * stores the error code returned by the panel factorization. - * - * Arguments - * ========= - * - * PANEL (local input/output) HPL_T_panel * * - * On entry, PANEL points to the address of the panel data - * structure to be deallocated. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - int mpierr; -/* .. - * .. Executable Statements .. 
- */ -/* - * Deallocate the panel resources and panel structure - */ - mpierr = HPL_pdpanel_free( *PANEL ); - if( *PANEL ) free( *PANEL ); - *PANEL = NULL; - - return( mpierr ); -/* - * End of HPL_pdpanel_disp - */ -} diff --git a/hpl/src/panel/HPL_pdpanel_free.c b/hpl/src/panel/HPL_pdpanel_free.c deleted file mode 100644 index fbf95d808b553ec75a9bac6f31694e5db5a3bfdd..0000000000000000000000000000000000000000 --- a/hpl/src/panel/HPL_pdpanel_free.c +++ /dev/null @@ -1,104 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. 
- * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -int HPL_pdpanel_free -( - HPL_T_panel * PANEL -) -#else -int HPL_pdpanel_free -( PANEL ) - HPL_T_panel * PANEL; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_pdpanel_free deallocates the panel resources and stores the error - * code returned by the panel factorization. - * - * Arguments - * ========= - * - * PANEL (local input/output) HPL_T_panel * - * On entry, PANEL points to the panel data structure from - * which the resources should be deallocated. - * - * --------------------------------------------------------------------- - */ -/* .. - * .. Executable Statements .. 
- */ - if( PANEL->pmat->info == 0 ) PANEL->pmat->info = *(PANEL->DINFO); -#ifdef HPL_CALL_VSIPL -/* - * Release the blocks - */ - (void) vsip_blockrelease_d( PANEL->L1block, VSIP_TRUE ); - (void) vsip_blockrelease_d( PANEL->L2block, VSIP_TRUE ); - if( PANEL->grid->nprow > 1 ) - (void) vsip_blockrelease_d( PANEL->Ublock, VSIP_TRUE ); -/* - * Destroy blocks - */ - vsip_blockdestroy_d( PANEL->L1block ); - vsip_blockdestroy_d( PANEL->L2block ); - if( PANEL->grid->nprow > 1 ) - vsip_blockdestroy_d( PANEL->Ublock ); -#endif - - if( PANEL->WORK ) free( PANEL->WORK ); - if( PANEL->IWORK ) free( PANEL->IWORK ); - - return( MPI_SUCCESS ); -/* - * End of HPL_pdpanel_free - */ -} diff --git a/hpl/src/panel/HPL_pdpanel_init.c b/hpl/src/panel/HPL_pdpanel_init.c deleted file mode 100644 index 3fff1cb301271b1c7c50371cf85ef12dc2f5f22a..0000000000000000000000000000000000000000 --- a/hpl/src/panel/HPL_pdpanel_init.c +++ /dev/null @@ -1,348 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. 
All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef HPL_NO_MPI_DATATYPE /* The user insists to not use MPI types */ -#ifndef HPL_COPY_L /* and also want to avoid the copy of L ... 
*/ -#define HPL_COPY_L /* well, sorry, can not do that: force the copy */ -#endif -#endif - -#ifdef STDC_HEADERS -void HPL_pdpanel_init -( - HPL_T_grid * GRID, - HPL_T_palg * ALGO, - const int M, - const int N, - const int JB, - HPL_T_pmat * A, - const int IA, - const int JA, - const int TAG, - HPL_T_panel * PANEL -) -#else -void HPL_pdpanel_init -( GRID, ALGO, M, N, JB, A, IA, JA, TAG, PANEL ) - HPL_T_grid * GRID; - HPL_T_palg * ALGO; - const int M; - const int N; - const int JB; - HPL_T_pmat * A; - const int IA; - const int JA; - const int TAG; - HPL_T_panel * PANEL; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_pdpanel_init initializes a panel data structure. - * - * - * Arguments - * ========= - * - * GRID (local input) HPL_T_grid * - * On entry, GRID points to the data structure containing the - * process grid information. - * - * ALGO (global input) HPL_T_palg * - * On entry, ALGO points to the data structure containing the - * algorithmic parameters. - * - * M (local input) const int - * On entry, M specifies the global number of rows of the panel. - * M must be at least zero. - * - * N (local input) const int - * On entry, N specifies the global number of columns of the - * panel and trailing submatrix. N must be at least zero. - * - * JB (global input) const int - * On entry, JB specifies is the number of columns of the panel. - * JB must be at least zero. - * - * A (local input/output) HPL_T_pmat * - * On entry, A points to the data structure containing the local - * array information. - * - * IA (global input) const int - * On entry, IA is the global row index identifying the panel - * and trailing submatrix. IA must be at least zero. - * - * JA (global input) const int - * On entry, JA is the global column index identifying the panel - * and trailing submatrix. JA must be at least zero. - * - * TAG (global input) const int - * On entry, TAG is the row broadcast message id. 
- * - * PANEL (local input/output) HPL_T_panel * - * On entry, PANEL points to the data structure containing the - * panel information. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - size_t dalign; - int icurcol, icurrow, ii, itmp1, jj, lwork, - ml2, mp, mycol, myrow, nb, npcol, nprow, - nq, nu; -/* .. - * .. Executable Statements .. - */ - PANEL->grid = GRID; /* ptr to the process grid */ - PANEL->algo = ALGO; /* ptr to the algo parameters */ - PANEL->pmat = A; /* ptr to the local array info */ - - myrow = GRID->myrow; mycol = GRID->mycol; - nprow = GRID->nprow; npcol = GRID->npcol; nb = A->nb; - - HPL_infog2l( IA, JA, nb, nb, nb, nb, 0, 0, myrow, mycol, - nprow, npcol, &ii, &jj, &icurrow, &icurcol ); - mp = HPL_numrocI( M, IA, nb, nb, myrow, 0, nprow ); - nq = HPL_numrocI( N, JA, nb, nb, mycol, 0, npcol ); - /* ptr to trailing part of A */ - PANEL->A = Mptr( (double *)(A->A), ii, jj, A->ld ); -/* - * Workspace pointers are initialized to NULL. 
- */ - PANEL->WORK = NULL; PANEL->L2 = NULL; PANEL->L1 = NULL; - PANEL->DPIV = NULL; PANEL->DINFO = NULL; PANEL->U = NULL; - PANEL->IWORK = NULL; -/* - * Local lengths, indexes process coordinates - */ - PANEL->nb = nb; /* distribution blocking factor */ - PANEL->jb = JB; /* panel width */ - PANEL->m = M; /* global # of rows of trailing part of A */ - PANEL->n = N; /* global # of cols of trailing part of A */ - PANEL->ia = IA; /* global row index of trailing part of A */ - PANEL->ja = JA; /* global col index of trailing part of A */ - PANEL->mp = mp; /* local # of rows of trailing part of A */ - PANEL->nq = nq; /* local # of cols of trailing part of A */ - PANEL->ii = ii; /* local row index of trailing part of A */ - PANEL->jj = jj; /* local col index of trailing part of A */ - PANEL->lda = A->ld; /* local leading dim of array A */ - PANEL->prow = icurrow; /* proc row owning 1st row of trailing A */ - PANEL->pcol = icurcol; /* proc col owning 1st col of trailing A */ - PANEL->msgid = TAG; /* message id to be used for panel bcast */ -/* - * Initialize ldl2 and len to temporary dummy values and Update tag for - * next panel - */ - PANEL->ldl2 = 0; /* local leading dim of array L2 */ - PANEL->len = 0; /* length of the buffer to broadcast */ -/* - * Figure out the exact amount of workspace needed by the factorization - * and the update - Allocate that space - Finish the panel data structu- - * re initialization. - * - * L1: JB x JB in all processes - * DPIV: JB in all processes - * DINFO: 1 in all processes - * - * We make sure that those three arrays are contiguous in memory for the - * later panel broadcast. We also choose to put this amount of space - * right after L2 (when it exist) so that one can receive a contiguous - * buffer. 
- */ - dalign = ALGO->align * sizeof( double ); - - if( npcol == 1 ) /* P x 1 process grid */ - { /* space for L1, DPIV, DINFO */ - lwork = ALGO->align + ( PANEL->len = JB * JB + JB + 1 ); - if( nprow > 1 ) /* space for U */ - { nu = nq - JB; lwork += JB * Mmax( 0, nu ); } - - if( !( PANEL->WORK = (void *)malloc( (size_t)(lwork) * - sizeof( double ) ) ) ) - { - HPL_pabort( __LINE__, "HPL_pdpanel_init", - "Memory allocation failed" ); - } -/* - * Initialize the pointers of the panel structure - Always re-use A in - * the only process column - */ - PANEL->L2 = PANEL->A + ( myrow == icurrow ? JB : 0 ); - PANEL->ldl2 = A->ld; - PANEL->L1 = (double *)HPL_PTR( PANEL->WORK, dalign ); - PANEL->DPIV = PANEL->L1 + JB * JB; - PANEL->DINFO = PANEL->DPIV + JB; *(PANEL->DINFO) = 0.0; - PANEL->U = ( nprow > 1 ? PANEL->DINFO + 1: NULL ); - } - else - { /* space for L2, L1, DPIV */ - ml2 = ( myrow == icurrow ? mp - JB : mp ); ml2 = Mmax( 0, ml2 ); - PANEL->len = ml2*JB + ( itmp1 = JB*JB + JB + 1 ); -#ifdef HPL_COPY_L - lwork = ALGO->align + PANEL->len; -#else - lwork = ALGO->align + ( mycol == icurcol ? itmp1 : PANEL->len ); -#endif - if( nprow > 1 ) /* space for U */ - { - nu = ( mycol == icurcol ? nq - JB : nq ); - lwork += JB * Mmax( 0, nu ); - } - - if( !( PANEL->WORK = (void *)malloc( (size_t)(lwork) * - sizeof( double ) ) ) ) - { - HPL_pabort( __LINE__, "HPL_pdpanel_init", - "Memory allocation failed" ); - } -/* - * Initialize the pointers of the panel structure - Re-use A in the cur- - * rent process column when HPL_COPY_L is not defined. - */ -#ifdef HPL_COPY_L - PANEL->L2 = (double *)HPL_PTR( PANEL->WORK, dalign ); - PANEL->ldl2 = Mmax( 1, ml2 ); - PANEL->L1 = PANEL->L2 + ml2 * JB; -#else - if( mycol == icurcol ) - { - PANEL->L2 = PANEL->A + ( myrow == icurrow ? 
JB : 0 ); - PANEL->ldl2 = A->ld; - PANEL->L1 = (double *)HPL_PTR( PANEL->WORK, dalign ); - } - else - { - PANEL->L2 = (double *)HPL_PTR( PANEL->WORK, dalign ); - PANEL->ldl2 = Mmax( 1, ml2 ); - PANEL->L1 = PANEL->L2 + ml2 * JB; - } -#endif - PANEL->DPIV = PANEL->L1 + JB * JB; - PANEL->DINFO = PANEL->DPIV + JB; *(PANEL->DINFO) = 0.0; - PANEL->U = ( nprow > 1 ? PANEL->DINFO + 1 : NULL ); - } -#ifdef HPL_CALL_VSIPL - PANEL->Ablock = A->block; -/* - * Create blocks and bind them to the data pointers - */ - PANEL->L1block = vsip_blockbind_d( (vsip_scalar_d *)(PANEL->L1), - (vsip_length)(JB*JB), VSIP_MEM_NONE ); - PANEL->L2block = vsip_blockbind_d( (vsip_scalar_d *)(PANEL->L2), - (vsip_length)(PANEL->ldl2*JB), - VSIP_MEM_NONE ); - if( nprow > 1 ) - { - nu = ( mycol == icurcol ? nq - JB : nq ); - PANEL->Ublock = vsip_blockbind_d( (vsip_scalar_d *)(PANEL->U), - (vsip_length)(JB * Mmax( 0, nu )), - VSIP_MEM_NONE ); - } - else { PANEL->Ublock = A->block; } -#endif -/* - * If nprow is 1, we just allocate an array of JB integers for the swap. - * When nprow > 1, we allocate the space for the index arrays immediate- - * ly. The exact size of this array depends on the swapping routine that - * will be used, so we allocate the maximum: - * - * IWORK[0] is of size at most 1 + - * IPL is of size at most 1 + - * IPID is of size at most 4 * JB + - * - * For HPL_pdlaswp00: - * lindxA is of size at most 2 * JB + - * lindxAU is of size at most 2 * JB + - * llen is of size at most NPROW + - * llen_sv is of size at most NPROW. - * - * For HPL_pdlaswp01: - * ipA is of size ar most 1 + - * lindxA is of size at most 2 * JB + - * lindxAU is of size at most 2 * JB + - * iplen is of size at most NPROW + 1 + - * ipmap is of size at most NPROW + - * ipmapm1 is of size at most NPROW + - * permU is of size at most JB + - * iwork is of size at most MAX( 2*JB, NPROW+1 ). - * - * that is 3 + 8*JB + MAX(2*NPROW, 3*NPROW+1+JB+MAX(2*JB,NPROW+1)) - * = 4 + 9*JB + 3*NPROW + MAX( 2*JB, NPROW+1 ). 
- * - * We use the fist entry of this to work array to indicate whether the - * the local index arrays have already been computed, and if yes, by - * which function: - * IWORK[0] = -1: no index arrays have been computed so far; - * IWORK[0] = 0: HPL_pdlaswp00 already computed those arrays; - * IWORK[0] = 1: HPL_pdlaswp01 already computed those arrays; - * This allows to save some redundant and useless computations. - */ - if( nprow == 1 ) { lwork = JB; } - else - { - itmp1 = (JB << 1); lwork = nprow + 1; itmp1 = Mmax( itmp1, lwork ); - lwork = 4 + (9 * JB) + (3 * nprow) + itmp1; - } - - PANEL->IWORK = (int *)malloc( (size_t)(lwork) * sizeof( int ) ); - - if( PANEL->IWORK == NULL ) - { HPL_pabort( __LINE__, "HPL_pdpanel_init", "Memory allocation failed" ); } - /* Initialize the first entry of the workarray */ - *(PANEL->IWORK) = -1; -/* - * End of HPL_pdpanel_init - */ -} diff --git a/hpl/src/panel/HPL_pdpanel_new.c b/hpl/src/panel/HPL_pdpanel_new.c deleted file mode 100644 index c370032aa60dc29182a32237e596fad33d1c9761..0000000000000000000000000000000000000000 --- a/hpl/src/panel/HPL_pdpanel_new.c +++ /dev/null @@ -1,152 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. 
All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
- * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -void HPL_pdpanel_new -( - HPL_T_grid * GRID, - HPL_T_palg * ALGO, - const int M, - const int N, - const int JB, - HPL_T_pmat * A, - const int IA, - const int JA, - const int TAG, - HPL_T_panel * * PANEL -) -#else -void HPL_pdpanel_new -( GRID, ALGO, M, N, JB, A, IA, JA, TAG, PANEL ) - HPL_T_grid * GRID; - HPL_T_palg * ALGO; - const int M; - const int N; - const int JB; - HPL_T_pmat * A; - const int IA; - const int JA; - const int TAG; - HPL_T_panel * * PANEL; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_pdpanel_new creates and initializes a panel data structure. - * - * - * Arguments - * ========= - * - * GRID (local input) HPL_T_grid * - * On entry, GRID points to the data structure containing the - * process grid information. - * - * ALGO (global input) HPL_T_palg * - * On entry, ALGO points to the data structure containing the - * algorithmic parameters. - * - * M (local input) const int - * On entry, M specifies the global number of rows of the panel. - * M must be at least zero. - * - * N (local input) const int - * On entry, N specifies the global number of columns of the - * panel and trailing submatrix. N must be at least zero. - * - * JB (global input) const int - * On entry, JB specifies is the number of columns of the panel. - * JB must be at least zero. - * - * A (local input/output) HPL_T_pmat * - * On entry, A points to the data structure containing the local - * array information. - * - * IA (global input) const int - * On entry, IA is the global row index identifying the panel - * and trailing submatrix. IA must be at least zero. - * - * JA (global input) const int - * On entry, JA is the global column index identifying the panel - * and trailing submatrix. JA must be at least zero. - * - * TAG (global input) const int - * On entry, TAG is the row broadcast message id. 
- * - * PANEL (local input/output) HPL_T_panel * * - * On entry, PANEL points to the address of the panel data - * structure to create and initialize. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - HPL_T_panel * p = NULL; -/* .. - * .. Executable Statements .. - */ -/* - * Allocate the panel structure - Check for enough memory - */ - if( !( p = (HPL_T_panel *)malloc( sizeof( HPL_T_panel ) ) ) ) - { - HPL_pabort( __LINE__, "HPL_pdpanel_new", "Memory allocation failed" ); - } - - HPL_pdpanel_init( GRID, ALGO, M, N, JB, A, IA, JA, TAG, p ); - *PANEL = p; -/* - * End of HPL_pdpanel_new - */ -} diff --git a/hpl/src/panel/intel64/Make.inc b/hpl/src/panel/intel64/Make.inc deleted file mode 120000 index c1d8167d071e028cbeff544106b9f386bbdcac8a..0000000000000000000000000000000000000000 --- a/hpl/src/panel/intel64/Make.inc +++ /dev/null @@ -1 +0,0 @@ -/home/valeriuc/work/PRACE/CodeVault/src/1_dense/hpl/Make.intel64 \ No newline at end of file diff --git a/hpl/src/panel/intel64/Makefile b/hpl/src/panel/intel64/Makefile deleted file mode 100644 index 92a4fa5fa87057d2afa92f6981570e0ec90afa61..0000000000000000000000000000000000000000 --- a/hpl/src/panel/intel64/Makefile +++ /dev/null @@ -1,90 +0,0 @@ -# -# -- High Performance Computing Linpack Benchmark (HPL) -# HPL - 2.1 - October 26, 2012 -# Antoine P. Petitet -# University of Tennessee, Knoxville -# Innovative Computing Laboratory -# (C) Copyright 2000-2008 All Rights Reserved -# -# -- Copyright notice and Licensing terms: -# -# Redistribution and use in source and binary forms, with or without -# modification, are permitted provided that the following conditions -# are met: -# -# 1. Redistributions of source code must retain the above copyright -# notice, this list of conditions and the following disclaimer. -# -# 2. 
Redistributions in binary form must reproduce the above copyright -# notice, this list of conditions, and the following disclaimer in the -# documentation and/or other materials provided with the distribution. -# -# 3. All advertising materials mentioning features or use of this -# software must display the following acknowledgement: -# This product includes software developed at the University of -# Tennessee, Knoxville, Innovative Computing Laboratory. -# -# 4. The name of the University, the name of the Laboratory, or the -# names of its contributors may not be used to endorse or promote -# products derived from this software without specific written -# permission. -# -# -- Disclaimer: -# -# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS -# ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR -# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY -# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, -# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT -# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, -# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY -# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT -# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
-# ###################################################################### -# -include Make.inc -# -# ###################################################################### -# -INCdep = \ - $(INCdir)/hpl_misc.h $(INCdir)/hpl_blas.h $(INCdir)/hpl_auxil.h \ - $(INCdir)/hpl_pmisc.h $(INCdir)/hpl_grid.h $(INCdir)/hpl_comm.h \ - $(INCdir)/hpl_pauxil.h $(INCdir)/hpl_panel.h $(INCdir)/hpl_pfact.h \ - $(INCdir)/hpl_pgesv.h -# -## Object files ######################################################## -# -HPL_panobj = \ - HPL_pdpanel_new.o HPL_pdpanel_init.o HPL_pdpanel_disp.o \ - HPL_pdpanel_free.o -# -## Targets ############################################################# -# -all : lib -# -lib : lib.grd -# -lib.grd : $(HPL_panobj) - $(ARCHIVER) $(ARFLAGS) $(HPLlib) $(HPL_panobj) - $(RANLIB) $(HPLlib) - $(TOUCH) lib.grd -# -# ###################################################################### -# -HPL_pdpanel_new.o : ../HPL_pdpanel_new.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_pdpanel_new.c -HPL_pdpanel_init.o : ../HPL_pdpanel_init.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_pdpanel_init.c -HPL_pdpanel_disp.o : ../HPL_pdpanel_disp.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_pdpanel_disp.c -HPL_pdpanel_free.o : ../HPL_pdpanel_free.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_pdpanel_free.c -# -# ###################################################################### -# -clean : - $(RM) *.o *.grd -# -# ###################################################################### diff --git a/hpl/src/panel/intel64/lib.grd b/hpl/src/panel/intel64/lib.grd deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/hpl/src/pauxil/HPL_dlaswp00N.c b/hpl/src/pauxil/HPL_dlaswp00N.c deleted file mode 100644 index 02db4c8fe5cb03932d97af981297954af7c8a2b4..0000000000000000000000000000000000000000 --- a/hpl/src/pauxil/HPL_dlaswp00N.c +++ /dev/null @@ -1,198 +0,0 @@ -/* - * -- High Performance Computing Linpack 
Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" -/* - * Define default value for unrolling factor - */ -#ifndef HPL_LASWP00N_DEPTH -#define HPL_LASWP00N_DEPTH 32 -#define HPL_LASWP00N_LOG2_DEPTH 5 -#endif - -#ifdef STDC_HEADERS -void HPL_dlaswp00N -( - const int M, - const int N, - double * A, - const int LDA, - const int * IPIV -) -#else -void HPL_dlaswp00N -( M, N, A, LDA, IPIV ) - const int M; - const int N; - double * A; - const int LDA; - const int * IPIV; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_dlaswp00N performs a series of local row interchanges on a matrix - * A. One row interchange is initiated for rows 0 through M-1 of A. - * - * Arguments - * ========= - * - * M (local input) const int - * On entry, M specifies the number of rows of the array A to be - * interchanged. M must be at least zero. - * - * N (local input) const int - * On entry, N specifies the number of columns of the array A. - * N must be at least zero. - * - * A (local input/output) double * - * On entry, A points to an array of dimension (LDA,N) to which - * the row interchanges will be applied. On exit, the permuted - * matrix. - * - * LDA (local input) const int - * On entry, LDA specifies the leading dimension of the array A. - * LDA must be at least MAX(1,M). 
- * - * IPIV (local input) const int * - * On entry, IPIV is an array of size M that contains the - * pivoting information. For k in [0..M), IPIV[k]=IROFF + l - * implies that local rows k and l are to be interchanged. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - register double r; - double * a0, * a1; - const int incA = (int)( (unsigned int)(LDA) << - HPL_LASWP00N_LOG2_DEPTH ); - int ip, nr, nu; - register int i, j; -/* .. - * .. Executable Statements .. - */ - if( ( M <= 0 ) || ( N <= 0 ) ) return; - - nr = N - ( nu = (int)( ( (unsigned int)(N) >> HPL_LASWP00N_LOG2_DEPTH ) - << HPL_LASWP00N_LOG2_DEPTH ) ); - - for( j = 0; j < nu; j += HPL_LASWP00N_DEPTH, A += incA ) - { - for( i = 0; i < M; i++ ) - { - if( i != ( ip = IPIV[i] ) ) - { - a0 = A + i; a1 = A + ip; - - r = *a0; *a0 = *a1; *a1 = r; a0 += LDA; a1 += LDA; -#if ( HPL_LASWP00N_DEPTH > 1 ) - r = *a0; *a0 = *a1; *a1 = r; a0 += LDA; a1 += LDA; -#endif -#if ( HPL_LASWP00N_DEPTH > 2 ) - r = *a0; *a0 = *a1; *a1 = r; a0 += LDA; a1 += LDA; - r = *a0; *a0 = *a1; *a1 = r; a0 += LDA; a1 += LDA; -#endif -#if ( HPL_LASWP00N_DEPTH > 4 ) - r = *a0; *a0 = *a1; *a1 = r; a0 += LDA; a1 += LDA; - r = *a0; *a0 = *a1; *a1 = r; a0 += LDA; a1 += LDA; - r = *a0; *a0 = *a1; *a1 = r; a0 += LDA; a1 += LDA; - r = *a0; *a0 = *a1; *a1 = r; a0 += LDA; a1 += LDA; -#endif -#if ( HPL_LASWP00N_DEPTH > 8 ) - r = *a0; *a0 = *a1; *a1 = r; a0 += LDA; a1 += LDA; - r = *a0; *a0 = *a1; *a1 = r; a0 += LDA; a1 += LDA; - r = *a0; *a0 = *a1; *a1 = r; a0 += LDA; a1 += LDA; - r = *a0; *a0 = *a1; *a1 = r; a0 += LDA; a1 += LDA; - r = *a0; *a0 = *a1; *a1 = r; a0 += LDA; a1 += LDA; - r = *a0; *a0 = *a1; *a1 = r; a0 += LDA; a1 += LDA; - r = *a0; *a0 = *a1; *a1 = r; a0 += LDA; a1 += LDA; - r = *a0; *a0 = *a1; *a1 = r; a0 += LDA; a1 += LDA; -#endif -#if ( HPL_LASWP00N_DEPTH > 16 ) - r = *a0; *a0 = *a1; *a1 = r; a0 += LDA; a1 += LDA; - r = *a0; *a0 = *a1; *a1 = r; a0 += LDA; a1 += LDA; - 
r = *a0; *a0 = *a1; *a1 = r; a0 += LDA; a1 += LDA; - r = *a0; *a0 = *a1; *a1 = r; a0 += LDA; a1 += LDA; - r = *a0; *a0 = *a1; *a1 = r; a0 += LDA; a1 += LDA; - r = *a0; *a0 = *a1; *a1 = r; a0 += LDA; a1 += LDA; - r = *a0; *a0 = *a1; *a1 = r; a0 += LDA; a1 += LDA; - r = *a0; *a0 = *a1; *a1 = r; a0 += LDA; a1 += LDA; - r = *a0; *a0 = *a1; *a1 = r; a0 += LDA; a1 += LDA; - r = *a0; *a0 = *a1; *a1 = r; a0 += LDA; a1 += LDA; - r = *a0; *a0 = *a1; *a1 = r; a0 += LDA; a1 += LDA; - r = *a0; *a0 = *a1; *a1 = r; a0 += LDA; a1 += LDA; - r = *a0; *a0 = *a1; *a1 = r; a0 += LDA; a1 += LDA; - r = *a0; *a0 = *a1; *a1 = r; a0 += LDA; a1 += LDA; - r = *a0; *a0 = *a1; *a1 = r; a0 += LDA; a1 += LDA; - r = *a0; *a0 = *a1; *a1 = r; a0 += LDA; a1 += LDA; -#endif - } - } - } - - if( nr > 0 ) - { - for( i = 0; i < M; i++ ) - { - if( i != ( ip = IPIV[i] ) ) - { - a0 = A + i; a1 = A + ip; - for( j = 0; j < nr; j++, a0 += LDA, a1 += LDA ) - { r = *a0; *a0 = *a1; *a1 = r; } - } - } - } -/* - * End of HPL_dlaswp00N - */ -} diff --git a/hpl/src/pauxil/HPL_dlaswp01N.c b/hpl/src/pauxil/HPL_dlaswp01N.c deleted file mode 100644 index 293e3c73352e02f244fdbbc6048f4e2d39115205..0000000000000000000000000000000000000000 --- a/hpl/src/pauxil/HPL_dlaswp01N.c +++ /dev/null @@ -1,209 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. 
Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
- * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" -/* - * Define default value for unrolling factor - */ -#ifndef HPL_LASWP01N_DEPTH -#define HPL_LASWP01N_DEPTH 32 -#define HPL_LASWP01N_LOG2_DEPTH 5 -#endif - -#ifdef STDC_HEADERS -void HPL_dlaswp01N -( - const int M, - const int N, - double * A, - const int LDA, - double * U, - const int LDU, - const int * LINDXA, - const int * LINDXAU -) -#else -void HPL_dlaswp01N -( M, N, A, LDA, U, LDU, LINDXA, LINDXAU ) - const int M; - const int N; - double * A; - const int LDA; - double * U; - const int LDU; - const int * LINDXA; - const int * LINDXAU; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_dlaswp01N copies scattered rows of A into itself and into an - * array U. The row offsets in A of the source rows are specified by - * LINDXA. The destination of those rows are specified by LINDXAU. A - * positive value of LINDXAU indicates that the array destination is U, - * and A otherwise. - * - * Arguments - * ========= - * - * M (local input) const int - * On entry, M specifies the number of rows of A that should be - * moved within A or copied into U. M must be at least zero. - * - * N (local input) const int - * On entry, N specifies the length of rows of A that should be - * moved within A or copied into U. N must be at least zero. - * - * A (local input/output) double * - * On entry, A points to an array of dimension (LDA,N). The rows - * of this array specified by LINDXA should be moved within A or - * copied into U. - * - * LDA (local input) const int - * On entry, LDA specifies the leading dimension of the array A. - * LDA must be at least MAX(1,M). - * - * U (local input/output) double * - * On entry, U points to an array of dimension (LDU,N). The rows - * of A specified by LINDXA are be copied within this array U at - * the positions indicated by positive values of LINDXAU. 
- * - * LDU (local input) const int - * On entry, LDU specifies the leading dimension of the array U. - * LDU must be at least MAX(1,M). - * - * LINDXA (local input) const int * - * On entry, LINDXA is an array of dimension M that contains the - * local row indexes of A that should be moved within A or - * or copied into U. - * - * LINDXAU (local input) const int * - * On entry, LINDXAU is an array of dimension M that contains - * the local row indexes of U where the rows of A should be - * copied at. This array also contains the local row offsets in - * A where some of the rows of A should be moved to. A positive - * value of LINDXAU[i] indicates that the row LINDXA[i] of A - * should be copied into U at the position LINDXAU[i]; otherwise - * the row LINDXA[i] of A should be moved at the position - * -LINDXAU[i] within A. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - double * a0, * a1; - const int incA = (int)( (unsigned int)(LDA) << - HPL_LASWP01N_LOG2_DEPTH ), - incU = (int)( (unsigned int)(LDU) << - HPL_LASWP01N_LOG2_DEPTH ); - int lda1, nu, nr; - register int i, j; -/* .. - * .. Executable Statements .. 
- */ - if( ( M <= 0 ) || ( N <= 0 ) ) return; - - nr = N - ( nu = (int)( ( (unsigned int)(N) >> HPL_LASWP01N_LOG2_DEPTH ) << - HPL_LASWP01N_LOG2_DEPTH ) ); - - for( j = 0; j < nu; j += HPL_LASWP01N_DEPTH, A += incA, U += incU ) - { - for( i = 0; i < M; i++ ) - { - a0 = A + (size_t)(LINDXA[i]); - if( LINDXAU[i] >= 0 ) { a1 = U + (size_t)(LINDXAU[i]); lda1 = LDU; } - else { a1 = A - (size_t)(LINDXAU[i]); lda1 = LDA; } - - *a1 = *a0; a1 += lda1; a0 += LDA; -#if ( HPL_LASWP01N_DEPTH > 1 ) - *a1 = *a0; a1 += lda1; a0 += LDA; -#endif -#if ( HPL_LASWP01N_DEPTH > 2 ) - *a1 = *a0; a1 += lda1; a0 += LDA; *a1 = *a0; a1 += lda1; a0 += LDA; -#endif -#if ( HPL_LASWP01N_DEPTH > 4 ) - *a1 = *a0; a1 += lda1; a0 += LDA; *a1 = *a0; a1 += lda1; a0 += LDA; - *a1 = *a0; a1 += lda1; a0 += LDA; *a1 = *a0; a1 += lda1; a0 += LDA; -#endif -#if ( HPL_LASWP01N_DEPTH > 8 ) - *a1 = *a0; a1 += lda1; a0 += LDA; *a1 = *a0; a1 += lda1; a0 += LDA; - *a1 = *a0; a1 += lda1; a0 += LDA; *a1 = *a0; a1 += lda1; a0 += LDA; - *a1 = *a0; a1 += lda1; a0 += LDA; *a1 = *a0; a1 += lda1; a0 += LDA; - *a1 = *a0; a1 += lda1; a0 += LDA; *a1 = *a0; a1 += lda1; a0 += LDA; -#endif -#if ( HPL_LASWP01N_DEPTH > 16 ) - *a1 = *a0; a1 += lda1; a0 += LDA; *a1 = *a0; a1 += lda1; a0 += LDA; - *a1 = *a0; a1 += lda1; a0 += LDA; *a1 = *a0; a1 += lda1; a0 += LDA; - *a1 = *a0; a1 += lda1; a0 += LDA; *a1 = *a0; a1 += lda1; a0 += LDA; - *a1 = *a0; a1 += lda1; a0 += LDA; *a1 = *a0; a1 += lda1; a0 += LDA; - *a1 = *a0; a1 += lda1; a0 += LDA; *a1 = *a0; a1 += lda1; a0 += LDA; - *a1 = *a0; a1 += lda1; a0 += LDA; *a1 = *a0; a1 += lda1; a0 += LDA; - *a1 = *a0; a1 += lda1; a0 += LDA; *a1 = *a0; a1 += lda1; a0 += LDA; - *a1 = *a0; a1 += lda1; a0 += LDA; *a1 = *a0; a1 += lda1; a0 += LDA; -#endif - } - } - - if( nr ) - { - for( i = 0; i < M; i++ ) - { - a0 = A + (size_t)(LINDXA[i]); - if( LINDXAU[i] >= 0 ) { a1 = U + (size_t)(LINDXAU[i]); lda1 = LDU; } - else { a1 = A - (size_t)(LINDXAU[i]); lda1 = LDA; } - for( j = 0; j < nr; j++, a1 += lda1, a0 
+= LDA ) { *a1 = *a0; } - } - } -/* - * End of HPL_dlaswp01N - */ -} diff --git a/hpl/src/pauxil/HPL_dlaswp01T.c b/hpl/src/pauxil/HPL_dlaswp01T.c deleted file mode 100644 index 29dbd98988bfe1b388f20c4648e33e0b028c86da..0000000000000000000000000000000000000000 --- a/hpl/src/pauxil/HPL_dlaswp01T.c +++ /dev/null @@ -1,252 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" -/* - * Define default value for unrolling factor - */ -#ifndef HPL_LASWP01T_DEPTH -#define HPL_LASWP01T_DEPTH 32 -#define HPL_LASWP01T_LOG2_DEPTH 5 -#endif - -#ifdef STDC_HEADERS -void HPL_dlaswp01T -( - const int M, - const int N, - double * A, - const int LDA, - double * U, - const int LDU, - const int * LINDXA, - const int * LINDXAU -) -#else -void HPL_dlaswp01T -( M, N, A, LDA, U, LDU, LINDXA, LINDXAU ) - const int M; - const int N; - double * A; - const int LDA; - double * U; - const int LDU; - const int * LINDXA; - const int * LINDXAU; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_dlaswp01T copies scattered rows of A into itself and into an - * array U. The row offsets in A of the source rows are specified by - * LINDXA. The destination of those rows are specified by LINDXAU. A - * positive value of LINDXAU indicates that the array destination is U, - * and A otherwise. Rows of A are stored as columns in U. - * - * Arguments - * ========= - * - * M (local input) const int - * On entry, M specifies the number of rows of A that should be - * moved within A or copied into U. M must be at least zero. - * - * N (local input) const int - * On entry, N specifies the length of rows of A that should be - * moved within A or copied into U. N must be at least zero. 
- * - * A (local input/output) double * - * On entry, A points to an array of dimension (LDA,N). The rows - * of this array specified by LINDXA should be moved within A or - * copied into U. - * - * LDA (local input) const int - * On entry, LDA specifies the leading dimension of the array A. - * LDA must be at least MAX(1,M). - * - * U (local input/output) double * - * On entry, U points to an array of dimension (LDU,M). The rows - * of A specified by LINDXA are copied within this array U at - * the positions indicated by positive values of LINDXAU. The - * rows of A are stored as columns in U. - * - * LDU (local input) const int - * On entry, LDU specifies the leading dimension of the array U. - * LDU must be at least MAX(1,N). - * - * LINDXA (local input) const int * - * On entry, LINDXA is an array of dimension M that contains the - * local row indexes of A that should be moved within A or - * or copied into U. - * - * LINDXAU (local input) const int * - * On entry, LINDXAU is an array of dimension M that contains - * the local row indexes of U where the rows of A should be - * copied at. This array also contains the local row offsets in - * A where some of the rows of A should be moved to. A positive - * value of LINDXAU[i] indicates that the row LINDXA[i] of A - * should be copied into U at the position LINDXAU[i]; otherwise - * the row LINDXA[i] of A should be moved at the position - * -LINDXAU[i] within A. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - double * a0, * a1; - const int incA = (int)( (unsigned int)(LDA) << - HPL_LASWP01T_LOG2_DEPTH ), - incU = ( 1 << HPL_LASWP01T_LOG2_DEPTH ); - int nu, nr; - register int i, j; -/* .. - * .. Executable Statements .. 
- */ - if( ( M <= 0 ) || ( N <= 0 ) ) return; - - nr = N - ( nu = (int)( ( (unsigned int)(N) >> HPL_LASWP01T_LOG2_DEPTH ) << - HPL_LASWP01T_LOG2_DEPTH ) ); - - for( j = 0; j < nu; j += HPL_LASWP01T_DEPTH, A += incA, U += incU ) - { - for( i = 0; i < M; i++ ) - { - a0 = A + (size_t)(LINDXA[i]); - - if( LINDXAU[i] >= 0 ) - { - a1 = U + (size_t)(LINDXAU[i]) * (size_t)(LDU); - - a1[ 0] = *a0; a0 += LDA; -#if ( HPL_LASWP01T_DEPTH > 1 ) - a1[ 1] = *a0; a0 += LDA; -#endif -#if ( HPL_LASWP01T_DEPTH > 2 ) - a1[ 2] = *a0; a0 += LDA; a1[ 3] = *a0; a0 += LDA; -#endif -#if ( HPL_LASWP01T_DEPTH > 4 ) - a1[ 4] = *a0; a0 += LDA; a1[ 5] = *a0; a0 += LDA; - a1[ 6] = *a0; a0 += LDA; a1[ 7] = *a0; a0 += LDA; -#endif -#if ( HPL_LASWP01T_DEPTH > 8 ) - a1[ 8] = *a0; a0 += LDA; a1[ 9] = *a0; a0 += LDA; - a1[10] = *a0; a0 += LDA; a1[11] = *a0; a0 += LDA; - a1[12] = *a0; a0 += LDA; a1[13] = *a0; a0 += LDA; - a1[14] = *a0; a0 += LDA; a1[15] = *a0; a0 += LDA; -#endif -#if ( HPL_LASWP01T_DEPTH > 16 ) - a1[16] = *a0; a0 += LDA; a1[17] = *a0; a0 += LDA; - a1[18] = *a0; a0 += LDA; a1[19] = *a0; a0 += LDA; - a1[20] = *a0; a0 += LDA; a1[21] = *a0; a0 += LDA; - a1[22] = *a0; a0 += LDA; a1[23] = *a0; a0 += LDA; - a1[24] = *a0; a0 += LDA; a1[25] = *a0; a0 += LDA; - a1[26] = *a0; a0 += LDA; a1[27] = *a0; a0 += LDA; - a1[28] = *a0; a0 += LDA; a1[29] = *a0; a0 += LDA; - a1[30] = *a0; a0 += LDA; a1[31] = *a0; a0 += LDA; -#endif - } - else - { - a1 = A - (size_t)(LINDXAU[i]); - - *a1 = *a0; a1 += LDA; a0 += LDA; -#if ( HPL_LASWP01T_DEPTH > 1 ) - *a1 = *a0; a1 += LDA; a0 += LDA; -#endif -#if ( HPL_LASWP01T_DEPTH > 2 ) - *a1 = *a0; a1 += LDA; a0 += LDA; *a1 = *a0; a1 += LDA; a0 += LDA; -#endif -#if ( HPL_LASWP01T_DEPTH > 4 ) - *a1 = *a0; a1 += LDA; a0 += LDA; *a1 = *a0; a1 += LDA; a0 += LDA; - *a1 = *a0; a1 += LDA; a0 += LDA; *a1 = *a0; a1 += LDA; a0 += LDA; -#endif -#if ( HPL_LASWP01T_DEPTH > 8 ) - *a1 = *a0; a1 += LDA; a0 += LDA; *a1 = *a0; a1 += LDA; a0 += LDA; - *a1 = *a0; a1 += LDA; a0 += LDA; *a1 = 
*a0; a1 += LDA; a0 += LDA; - *a1 = *a0; a1 += LDA; a0 += LDA; *a1 = *a0; a1 += LDA; a0 += LDA; - *a1 = *a0; a1 += LDA; a0 += LDA; *a1 = *a0; a1 += LDA; a0 += LDA; -#endif -#if ( HPL_LASWP01T_DEPTH > 16 ) - *a1 = *a0; a1 += LDA; a0 += LDA; *a1 = *a0; a1 += LDA; a0 += LDA; - *a1 = *a0; a1 += LDA; a0 += LDA; *a1 = *a0; a1 += LDA; a0 += LDA; - *a1 = *a0; a1 += LDA; a0 += LDA; *a1 = *a0; a1 += LDA; a0 += LDA; - *a1 = *a0; a1 += LDA; a0 += LDA; *a1 = *a0; a1 += LDA; a0 += LDA; - *a1 = *a0; a1 += LDA; a0 += LDA; *a1 = *a0; a1 += LDA; a0 += LDA; - *a1 = *a0; a1 += LDA; a0 += LDA; *a1 = *a0; a1 += LDA; a0 += LDA; - *a1 = *a0; a1 += LDA; a0 += LDA; *a1 = *a0; a1 += LDA; a0 += LDA; - *a1 = *a0; a1 += LDA; a0 += LDA; *a1 = *a0; a1 += LDA; a0 += LDA; -#endif - } - } - } - - if( nr > 0 ) - { - for( i = 0; i < M; i++ ) - { - a0 = A + (size_t)(LINDXA[i]); - - if( LINDXAU[i] >= 0 ) - { - a1 = U + (size_t)(LINDXAU[i]) * (size_t)(LDU); - for( j = 0; j < nr; j++, a0 += LDA ) { a1[j] = *a0; } - } - else - { - a1 = A - (size_t)(LINDXAU[i]); - for( j = 0; j < nr; j++, a1 += LDA, a0 += LDA ) { *a1 = *a0; } - } - } - } -/* - * End of HPL_dlaswp01T - */ -} diff --git a/hpl/src/pauxil/HPL_dlaswp02N.c b/hpl/src/pauxil/HPL_dlaswp02N.c deleted file mode 100644 index c433c95072228c6a847f8493a7af4751b66a50f3..0000000000000000000000000000000000000000 --- a/hpl/src/pauxil/HPL_dlaswp02N.c +++ /dev/null @@ -1,205 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. 
- * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
- * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" -/* - * Define default value for unrolling factor - */ -#ifndef HPL_LASWP02N_DEPTH -#define HPL_LASWP02N_DEPTH 32 -#define HPL_LASWP02N_LOG2_DEPTH 5 -#endif - -#ifdef STDC_HEADERS -void HPL_dlaswp02N -( - const int M, - const int N, - const double * A, - const int LDA, - double * W0, - double * W, - const int LDW, - const int * LINDXA, - const int * LINDXAU -) -#else -void HPL_dlaswp02N -( M, N, A, LDA, W0, W, LDW, LINDXA, LINDXAU ) - const int M; - const int N; - const double * A; - const int LDA; - double * W0; - double * W; - const int LDW; - const int * LINDXA; - const int * LINDXAU; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_dlaswp02N packs scattered rows of an array A into workspace W. - * The row offsets in A are specified by LINDXA. - * - * Arguments - * ========= - * - * M (local input) const int - * On entry, M specifies the number of rows of A that should be - * copied into W. M must be at least zero. - * - * N (local input) const int - * On entry, N specifies the length of rows of A that should be - * copied into W. N must be at least zero. - * - * A (local input) const double * - * On entry, A points to an array of dimension (LDA,N). The rows - * of this array specified by LINDXA should be copied into W. - * - * LDA (local input) const int - * On entry, LDA specifies the leading dimension of the array A. - * LDA must be at least MAX(1,M). - * - * W0 (local input/output) double * - * On exit, W0 is an array of size (M-1)*LDW+1, that contains - * the destination offset in U where the columns of W should be - * copied. - * - * W (local output) double * - * On entry, W is an array of size (LDW,M). On exit, W contains - * the rows LINDXA[i] for i in [0..M) of A stored contiguously - * in W(:,i). - * - * LDW (local input) const int - * On entry, LDW specifies the leading dimension of the array W. 
- * LDW must be at least MAX(1,N+1). - * - * LINDXA (local input) const int * - * On entry, LINDXA is an array of dimension M that contains the - * local row indexes of A that should be copied into W. - * - * LINDXAU (local input) const int * - * On entry, LINDXAU is an array of dimension M that contains - * the local row indexes of U that should be copied into A and - * replaced by the rows of W. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - const double * A0 = A, * a0; - double * w0; - const int incA = (int)( (unsigned int)(LDA) << - HPL_LASWP02N_LOG2_DEPTH ); - int nr, nu; - register int i, j; -/* .. - * .. Executable Statements .. - */ - if( ( M <= 0 ) || ( N <= 0 ) ) return; - - for( i = 0; i < M; i++ ) - *(W0+(size_t)(i)*(size_t)(LDW)) = (double)(LINDXAU[i]); - - nr = N - ( nu = (int)( ( (unsigned int)(N) >> HPL_LASWP02N_LOG2_DEPTH ) << - HPL_LASWP02N_LOG2_DEPTH ) ); - - for( j = 0; j < nu; - j += HPL_LASWP02N_DEPTH, A0 += incA, W += HPL_LASWP02N_DEPTH ) - { - for( i = 0; i < M; i++ ) - { - a0 = A0 + (size_t)(LINDXA[i]); w0 = W + (size_t)(i) * (size_t)(LDW); - - w0[ 0] = *a0; a0 += LDA; -#if ( HPL_LASWP02N_DEPTH > 1 ) - w0[ 1] = *a0; a0 += LDA; -#endif -#if ( HPL_LASWP02N_DEPTH > 2 ) - w0[ 2] = *a0; a0 += LDA; w0[ 3] = *a0; a0 += LDA; -#endif -#if ( HPL_LASWP02N_DEPTH > 4 ) - w0[ 4] = *a0; a0 += LDA; w0[ 5] = *a0; a0 += LDA; - w0[ 6] = *a0; a0 += LDA; w0[ 7] = *a0; a0 += LDA; -#endif -#if ( HPL_LASWP02N_DEPTH > 8 ) - w0[ 8] = *a0; a0 += LDA; w0[ 9] = *a0; a0 += LDA; - w0[10] = *a0; a0 += LDA; w0[11] = *a0; a0 += LDA; - w0[12] = *a0; a0 += LDA; w0[13] = *a0; a0 += LDA; - w0[14] = *a0; a0 += LDA; w0[15] = *a0; a0 += LDA; -#endif -#if ( HPL_LASWP02N_DEPTH > 16 ) - w0[16] = *a0; a0 += LDA; w0[17] = *a0; a0 += LDA; - w0[18] = *a0; a0 += LDA; w0[19] = *a0; a0 += LDA; - w0[20] = *a0; a0 += LDA; w0[21] = *a0; a0 += LDA; - w0[22] = *a0; a0 += LDA; w0[23] = *a0; a0 += LDA; - w0[24] = *a0; a0 += 
LDA; w0[25] = *a0; a0 += LDA; - w0[26] = *a0; a0 += LDA; w0[27] = *a0; a0 += LDA; - w0[28] = *a0; a0 += LDA; w0[29] = *a0; a0 += LDA; - w0[30] = *a0; a0 += LDA; w0[31] = *a0; a0 += LDA; -#endif - } - } - - if( nr > 0 ) - { - for( i = 0; i < M; i++ ) - { - a0 = A0 + (size_t)(LINDXA[i]); w0 = W + (size_t)(i) * (size_t)(LDW); - for( j = 0; j < nr; j++, a0 += LDA ) { w0[j] = *a0; } - } - } -/* - * End of HPL_dlaswp02N - */ -} diff --git a/hpl/src/pauxil/HPL_dlaswp03N.c b/hpl/src/pauxil/HPL_dlaswp03N.c deleted file mode 100644 index 7f4860f29e16e158d05bd08b1eb3632ace55f259..0000000000000000000000000000000000000000 --- a/hpl/src/pauxil/HPL_dlaswp03N.c +++ /dev/null @@ -1,194 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. 
- * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" -/* - * Define default value for unrolling factor - */ -#ifndef HPL_LASWP03N_DEPTH -#define HPL_LASWP03N_DEPTH 32 -#define HPL_LASWP03N_LOG2_DEPTH 5 -#endif - -#ifdef STDC_HEADERS -void HPL_dlaswp03N -( - const int M, - const int N, - double * U, - const int LDU, - const double * W0, - const double * W, - const int LDW -) -#else -void HPL_dlaswp03N -( M, N, U, LDU, W0, W, LDW ) - const int M; - const int N; - double * U; - const int LDU; - const double * W0; - const double * W; - const int LDW; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_dlaswp03N copies columns of W into rows of an array U. The - * destination in U of these columns contained in W is stored within W0. - * - * Arguments - * ========= - * - * M (local input) const int - * On entry, M specifies the number of columns of W stored - * contiguously that should be copied into U. M must be at least - * zero. - * - * N (local input) const int - * On entry, N specifies the length of columns of W stored - * contiguously that should be copied into U. 
N must be at least - * zero. - * - * U (local input/output) double * - * On entry, U points to an array of dimension (LDU,N). Columns - * of W are copied as rows within this array U at the positions - * specified in W0. - * - * LDU (local input) const int - * On entry, LDU specifies the leading dimension of the array U. - * LDU must be at least MAX(1,M). - * - * W0 (local input) const double * - * On entry, W0 is an array of size (M-1)*LDW+1, that contains - * the destination offset in U where the columns of W should be - * copied. - * - * W (local input) const double * - * On entry, W is an array of size (LDW,M), that contains data - * to be copied into U. For i in [0..M), entries W(:,i) should - * be copied into the row or column W0(i*LDW) of U. - * - * LDW (local input) const int - * On entry, LDW specifies the leading dimension of the array W. - * LDW must be at least MAX(1,N+1). - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - const double * w = W, * w0; - double * u0; - const int incU = (int)( (unsigned int)(LDU) << - HPL_LASWP03N_LOG2_DEPTH ); - int nr, nu; - register int i, j; -/* .. - * .. Executable Statements .. 
- */ - if( ( M <= 0 ) || ( N <= 0 ) ) return; - - nr = N - ( nu = (int)( ( (unsigned int)(N) >> HPL_LASWP03N_LOG2_DEPTH ) << - HPL_LASWP03N_LOG2_DEPTH ) ); - - for( j = 0; j < nu; - j += HPL_LASWP03N_DEPTH, U += incU, w += HPL_LASWP03N_DEPTH ) - { - for( i = 0; i < M; i++ ) - { - u0 = U + (size_t)(*( W0 + (size_t)(i) * (size_t)(LDW) )); - w0 = w + (size_t)(i) * (size_t)(LDW); - - *u0 = w0[ 0]; u0 += LDU; -#if ( HPL_LASWP03N_DEPTH > 1 ) - *u0 = w0[ 1]; u0 += LDU; -#endif -#if ( HPL_LASWP03N_DEPTH > 2 ) - *u0 = w0[ 2]; u0 += LDU; *u0 = w0[ 3]; u0 += LDU; -#endif -#if ( HPL_LASWP03N_DEPTH > 4 ) - *u0 = w0[ 4]; u0 += LDU; *u0 = w0[ 5]; u0 += LDU; - *u0 = w0[ 6]; u0 += LDU; *u0 = w0[ 7]; u0 += LDU; -#endif -#if ( HPL_LASWP03N_DEPTH > 8 ) - *u0 = w0[ 8]; u0 += LDU; *u0 = w0[ 9]; u0 += LDU; - *u0 = w0[10]; u0 += LDU; *u0 = w0[11]; u0 += LDU; - *u0 = w0[12]; u0 += LDU; *u0 = w0[13]; u0 += LDU; - *u0 = w0[14]; u0 += LDU; *u0 = w0[15]; u0 += LDU; -#endif -#if ( HPL_LASWP03N_DEPTH > 16 ) - *u0 = w0[16]; u0 += LDU; *u0 = w0[17]; u0 += LDU; - *u0 = w0[18]; u0 += LDU; *u0 = w0[19]; u0 += LDU; - *u0 = w0[20]; u0 += LDU; *u0 = w0[21]; u0 += LDU; - *u0 = w0[22]; u0 += LDU; *u0 = w0[23]; u0 += LDU; - *u0 = w0[24]; u0 += LDU; *u0 = w0[25]; u0 += LDU; - *u0 = w0[26]; u0 += LDU; *u0 = w0[27]; u0 += LDU; - *u0 = w0[28]; u0 += LDU; *u0 = w0[29]; u0 += LDU; - *u0 = w0[30]; u0 += LDU; *u0 = w0[31]; u0 += LDU; -#endif - } - } - - if( nr ) - { - for( i = 0; i < M; i++ ) - { - u0 = U + (size_t)(*( W0 + (size_t)(i) * (size_t)(LDW) )); - w0 = w + (size_t)(i) * (size_t)(LDW); - for( j = 0; j < nr; j++, u0 += LDU ) { *u0 = w0[j]; } - } - } -/* - * End of HPL_dlaswp03N - */ -} diff --git a/hpl/src/pauxil/HPL_dlaswp03T.c b/hpl/src/pauxil/HPL_dlaswp03T.c deleted file mode 100644 index 858ee2fb0889302d875c7bce81e3c367136b676a..0000000000000000000000000000000000000000 --- a/hpl/src/pauxil/HPL_dlaswp03T.c +++ /dev/null @@ -1,186 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * 
HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" -/* - * Define default value for unrolling factor - */ -#ifndef HPL_LASWP03T_DEPTH -#define HPL_LASWP03T_DEPTH 32 -#define HPL_LASWP03T_LOG2_DEPTH 5 -#endif - -#ifdef STDC_HEADERS -void HPL_dlaswp03T -( - const int M, - const int N, - double * U, - const int LDU, - const double * W0, - const double * W, - const int LDW -) -#else -void HPL_dlaswp03T -( M, N, U, LDU, W0, W, LDW ) - const int M; - const int N; - double * U; - const int LDU; - const double * W0; - const double * W; - const int LDW; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_dlaswp03T copies columns of W into an array U. The destination - * in U of these columns contained in W is stored within W0. - * - * Arguments - * ========= - * - * M (local input) const int - * On entry, M specifies the number of columns of W stored - * contiguously that should be copied into U. M must be at least - * zero. - * - * N (local input) const int - * On entry, N specifies the length of columns of W stored - * contiguously that should be copied into U. N must be at least - * zero. - * - * U (local input/output) double * - * On entry, U points to an array of dimension (LDU,M). Columns - * of W are copied within the array U at the positions specified - * in W0. - * - * LDU (local input) const int - * On entry, LDU specifies the leading dimension of the array U. 
- * LDU must be at least MAX(1,N). - * - * W0 (local input) const double * - * On entry, W0 is an array of size (M-1)*LDW+1, that contains - * the destination offset in U where the columns of W should be - * copied. - * - * W (local input) const double * - * On entry, W is an array of size (LDW,M), that contains data - * to be copied into U. For i in [0..M), entries W(:,i) should - * be copied into the row or column W0(i*LDW) of U. - * - * LDW (local input) const int - * On entry, LDW specifies the leading dimension of the array W. - * LDW must be at least MAX(1,N+1). - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - const double * w = W, * w0; - double * u0; - const int incU = ( 1 << HPL_LASWP03T_LOG2_DEPTH ); - int nr, nu; - register int i, j; -/* .. - * .. Executable Statements .. - */ - if( ( M <= 0 ) || ( N <= 0 ) ) return; - - nr = N - ( nu = (int)( ( (unsigned int)(N) >> HPL_LASWP03T_LOG2_DEPTH ) << - HPL_LASWP03T_LOG2_DEPTH ) ); - - for( j = 0; j < nu; - j += HPL_LASWP03T_DEPTH, U += incU, w += HPL_LASWP03T_DEPTH ) - { - for( i = 0; i < M; i++ ) - { - u0 = U + (size_t)(*(W0+(size_t)(i)*(size_t)(LDW))) * (size_t)(LDU); - w0 = w + (size_t)(i) * (size_t)(LDW); - - u0[ 0] = w0[ 0]; -#if ( HPL_LASWP03T_DEPTH > 1 ) - u0[ 1] = w0[ 1]; -#endif -#if ( HPL_LASWP03T_DEPTH > 2 ) - u0[ 2] = w0[ 2]; u0[ 3] = w0[ 3]; -#endif -#if ( HPL_LASWP03T_DEPTH > 4 ) - u0[ 4] = w0[ 4]; u0[ 5] = w0[ 5]; u0[ 6] = w0[ 6]; u0[ 7] = w0[ 7]; -#endif -#if ( HPL_LASWP03T_DEPTH > 8 ) - u0[ 8] = w0[ 8]; u0[ 9] = w0[ 9]; u0[10] = w0[10]; u0[11] = w0[11]; - u0[12] = w0[12]; u0[13] = w0[13]; u0[14] = w0[14]; u0[15] = w0[15]; -#endif -#if ( HPL_LASWP03T_DEPTH > 16 ) - u0[16] = w0[16]; u0[17] = w0[17]; u0[18] = w0[18]; u0[19] = w0[19]; - u0[20] = w0[20]; u0[21] = w0[21]; u0[22] = w0[22]; u0[23] = w0[23]; - u0[24] = w0[24]; u0[25] = w0[25]; u0[26] = w0[26]; u0[27] = w0[27]; - u0[28] = w0[28]; u0[29] = w0[29]; u0[30] = w0[30]; 
u0[31] = w0[31]; -#endif - } - } - - if( nr > 0 ) - { - for( i = 0; i < M; i++ ) - { - u0 = U + (size_t)(*(W0+(size_t)(i)*(size_t)(LDW))) * (size_t)(LDU); - w0 = w + (size_t)(i) * (size_t)(LDW); - for( j = 0; j < nr; j++ ) { u0[j] = w0[j]; } - } - } -/* - * End of HPL_dlaswp03T - */ -} diff --git a/hpl/src/pauxil/HPL_dlaswp04N.c b/hpl/src/pauxil/HPL_dlaswp04N.c deleted file mode 100644 index 043ac98e3fd6187d252133fe7d47483e50201bbc..0000000000000000000000000000000000000000 --- a/hpl/src/pauxil/HPL_dlaswp04N.c +++ /dev/null @@ -1,285 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. 
- * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" -/* - * Define default value for unrolling factor - */ -#ifndef HPL_LASWP04N_DEPTH -#define HPL_LASWP04N_DEPTH 32 -#define HPL_LASWP04N_LOG2_DEPTH 5 -#endif - -#ifdef STDC_HEADERS -void HPL_dlaswp04N -( - const int M0, - const int M1, - const int N, - double * U, - const int LDU, - double * A, - const int LDA, - const double * W0, - const double * W, - const int LDW, - const int * LINDXA, - const int * LINDXAU -) -#else -void HPL_dlaswp04N -( M0, M1, N, U, LDU, A, LDA, W0, W, LDW, LINDXA, LINDXAU ) - const int M0; - const int M1; - const int N; - double * U; - const int LDU; - double * A; - const int LDA; - const double * W0; - const double * W; - const int LDW; - const int * LINDXA; - const int * LINDXAU; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_dlaswp04N copies M0 rows of U into A and replaces those rows of U - * with columns of W. In addition M1 - M0 columns of W are copied into - * rows of U. 
- * - * Arguments - * ========= - * - * M0 (local input) const int - * On entry, M0 specifies the number of rows of U that should be - * copied into A and replaced by columns of W. M0 must be at - * least zero. - * - * M1 (local input) const int - * On entry, M1 specifies the number of columns of W that should - * be copied into rows of U. M1 must be at least zero. - * - * N (local input) const int - * On entry, N specifies the length of the rows of U that should - * be copied into A. N must be at least zero. - * - * U (local input/output) double * - * On entry, U points to an array of dimension (LDU,N). This - * array contains the rows that are to be copied into A. - * - * LDU (local input) const int - * On entry, LDU specifies the leading dimension of the array U. - * LDU must be at least MAX(1,M1). - * - * A (local output) double * - * On entry, A points to an array of dimension (LDA,N). On exit, - * the rows of this array specified by LINDXA are replaced by - * rows of U indicated by LINDXAU. - * - * LDA (local input) const int - * On entry, LDA specifies the leading dimension of the array A. - * LDA must be at least MAX(1,M0). - * - * W0 (local input) const double * - * On entry, W0 is an array of size (M-1)*LDW+1, that contains - * the destination offset in U where the columns of W should be - * copied. - * - * W (local input) const double * - * On entry, W is an array of size (LDW,M0+M1), that contains - * data to be copied into U. For i in [M0..M0+M1), the entries - * W(:,i) are copied into the row W0(i*LDW) of U. - * - * LDW (local input) const int - * On entry, LDW specifies the leading dimension of the array W. - * LDW must be at least MAX(1,N+1). - * - * LINDXA (local input) const int * - * On entry, LINDXA is an array of dimension M0 containing the - * local row indexes A into which rows of U are copied. 
- * - * LINDXAU (local input) const int * - * On entry, LINDXAU is an array of dimension M0 that contains - * the local row indexes of U that should be copied into A and - * replaced by the columns of W. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - const double * w = W, * w0; - double * a0, * u0; - const int incA = (int)( (unsigned int)(LDA) << - HPL_LASWP04N_LOG2_DEPTH ), - incU = (int)( (unsigned int)(LDU) << - HPL_LASWP04N_LOG2_DEPTH ); - int nr, nu; - register int i, j; -/* .. - * .. Executable Statements .. - */ - if( ( ( M0 <= 0 ) && ( M1 <= 0 ) ) || ( N <= 0 ) ) return; - - nr = N - ( nu = (int)( ( (unsigned int)(N) >> HPL_LASWP04N_LOG2_DEPTH ) << - HPL_LASWP04N_LOG2_DEPTH ) ); - - for( j = 0; j < nu; j += HPL_LASWP04N_DEPTH, A += incA, U += incU, - w += HPL_LASWP04N_DEPTH ) - { - for( i = 0; i < M0; i++ ) - { - a0 = A + (size_t)(LINDXA[i]); - u0 = U + (size_t)(LINDXAU[i]); - w0 = w + (size_t)(i) * (size_t)(LDW); - - *a0 = *u0; *u0 = w0[ 0]; a0 += LDA; u0 += LDU; -#if ( HPL_LASWP04N_DEPTH > 1 ) - *a0 = *u0; *u0 = w0[ 1]; a0 += LDA; u0 += LDU; -#endif -#if ( HPL_LASWP04N_DEPTH > 2 ) - *a0 = *u0; *u0 = w0[ 2]; a0 += LDA; u0 += LDU; - *a0 = *u0; *u0 = w0[ 3]; a0 += LDA; u0 += LDU; -#endif -#if ( HPL_LASWP04N_DEPTH > 4 ) - *a0 = *u0; *u0 = w0[ 4]; a0 += LDA; u0 += LDU; - *a0 = *u0; *u0 = w0[ 5]; a0 += LDA; u0 += LDU; - *a0 = *u0; *u0 = w0[ 6]; a0 += LDA; u0 += LDU; - *a0 = *u0; *u0 = w0[ 7]; a0 += LDA; u0 += LDU; -#endif -#if ( HPL_LASWP04N_DEPTH > 8 ) - *a0 = *u0; *u0 = w0[ 8]; a0 += LDA; u0 += LDU; - *a0 = *u0; *u0 = w0[ 9]; a0 += LDA; u0 += LDU; - *a0 = *u0; *u0 = w0[10]; a0 += LDA; u0 += LDU; - *a0 = *u0; *u0 = w0[11]; a0 += LDA; u0 += LDU; - *a0 = *u0; *u0 = w0[12]; a0 += LDA; u0 += LDU; - *a0 = *u0; *u0 = w0[13]; a0 += LDA; u0 += LDU; - *a0 = *u0; *u0 = w0[14]; a0 += LDA; u0 += LDU; - *a0 = *u0; *u0 = w0[15]; a0 += LDA; u0 += LDU; -#endif -#if ( HPL_LASWP04N_DEPTH > 16 ) - *a0 = 
*u0; *u0 = w0[16]; a0 += LDA; u0 += LDU; - *a0 = *u0; *u0 = w0[17]; a0 += LDA; u0 += LDU; - *a0 = *u0; *u0 = w0[18]; a0 += LDA; u0 += LDU; - *a0 = *u0; *u0 = w0[19]; a0 += LDA; u0 += LDU; - *a0 = *u0; *u0 = w0[20]; a0 += LDA; u0 += LDU; - *a0 = *u0; *u0 = w0[21]; a0 += LDA; u0 += LDU; - *a0 = *u0; *u0 = w0[22]; a0 += LDA; u0 += LDU; - *a0 = *u0; *u0 = w0[23]; a0 += LDA; u0 += LDU; - *a0 = *u0; *u0 = w0[24]; a0 += LDA; u0 += LDU; - *a0 = *u0; *u0 = w0[25]; a0 += LDA; u0 += LDU; - *a0 = *u0; *u0 = w0[26]; a0 += LDA; u0 += LDU; - *a0 = *u0; *u0 = w0[27]; a0 += LDA; u0 += LDU; - *a0 = *u0; *u0 = w0[28]; a0 += LDA; u0 += LDU; - *a0 = *u0; *u0 = w0[29]; a0 += LDA; u0 += LDU; - *a0 = *u0; *u0 = w0[30]; a0 += LDA; u0 += LDU; - *a0 = *u0; *u0 = w0[31]; a0 += LDA; u0 += LDU; -#endif - } - - for( i = M0; i < M1; i++ ) - { - u0 = U + (size_t)(*(W0+(size_t)(i)*(size_t)(LDW))); - w0 = w + (size_t)(i) * (size_t)(LDW); - - *u0 = w0[ 0]; u0 += LDU; -#if ( HPL_LASWP04N_DEPTH > 1 ) - *u0 = w0[ 1]; u0 += LDU; -#endif -#if ( HPL_LASWP04N_DEPTH > 2 ) - *u0 = w0[ 2]; u0 += LDU; *u0 = w0[ 3]; u0 += LDU; -#endif -#if ( HPL_LASWP04N_DEPTH > 4 ) - *u0 = w0[ 4]; u0 += LDU; *u0 = w0[ 5]; u0 += LDU; - *u0 = w0[ 6]; u0 += LDU; *u0 = w0[ 7]; u0 += LDU; -#endif -#if ( HPL_LASWP04N_DEPTH > 8 ) - *u0 = w0[ 8]; u0 += LDU; *u0 = w0[ 9]; u0 += LDU; - *u0 = w0[10]; u0 += LDU; *u0 = w0[11]; u0 += LDU; - *u0 = w0[12]; u0 += LDU; *u0 = w0[13]; u0 += LDU; - *u0 = w0[14]; u0 += LDU; *u0 = w0[15]; u0 += LDU; -#endif -#if ( HPL_LASWP04N_DEPTH > 16 ) - *u0 = w0[16]; u0 += LDU; *u0 = w0[17]; u0 += LDU; - *u0 = w0[18]; u0 += LDU; *u0 = w0[19]; u0 += LDU; - *u0 = w0[20]; u0 += LDU; *u0 = w0[21]; u0 += LDU; - *u0 = w0[22]; u0 += LDU; *u0 = w0[23]; u0 += LDU; - *u0 = w0[24]; u0 += LDU; *u0 = w0[25]; u0 += LDU; - *u0 = w0[26]; u0 += LDU; *u0 = w0[27]; u0 += LDU; - *u0 = w0[28]; u0 += LDU; *u0 = w0[29]; u0 += LDU; - *u0 = w0[30]; u0 += LDU; *u0 = w0[31]; u0 += LDU; -#endif - } - } - - if( nr ) - { - for( i = 0; i < 
M0; i++ ) - { - a0 = A + (size_t)(LINDXA[i]); - u0 = U + (size_t)(LINDXAU[i]); - w0 = w + (size_t)(i) * (size_t)(LDW); - for( j = 0; j < nr; j++, a0 += LDA, u0 += LDU ) - { *a0 = *u0; *u0 = w0[j]; } - } - for( i = M0; i < M1; i++ ) - { - u0 = U + (size_t)(*(W0+(size_t)(i)*(size_t)(LDW))); - w0 = w + (size_t)(i) * (size_t)(LDW); - for( j = 0; j < nr; j++, u0 += LDU ) { *u0 = w0[j]; } - } - } -/* - * End of HPL_dlaswp04N - */ -} diff --git a/hpl/src/pauxil/HPL_dlaswp04T.c b/hpl/src/pauxil/HPL_dlaswp04T.c deleted file mode 100644 index 2808f087854f39378c15ec98bd82ed06a0ddf8a7..0000000000000000000000000000000000000000 --- a/hpl/src/pauxil/HPL_dlaswp04T.c +++ /dev/null @@ -1,270 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. 
- * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" -/* - * Define default value for unrolling factor - */ -#ifndef HPL_LASWP04T_DEPTH -#define HPL_LASWP04T_DEPTH 32 -#define HPL_LASWP04T_LOG2_DEPTH 5 -#endif - -#ifdef STDC_HEADERS -void HPL_dlaswp04T -( - const int M0, - const int M1, - const int N, - double * U, - const int LDU, - double * A, - const int LDA, - const double * W0, - const double * W, - const int LDW, - const int * LINDXA, - const int * LINDXAU -) -#else -void HPL_dlaswp04T -( M0, M1, N, U, LDU, A, LDA, W0, W, LDW, LINDXA, LINDXAU ) - const int M0; - const int M1; - const int N; - double * U; - const int LDU; - double * A; - const int LDA; - const double * W0; - const double * W; - const int LDW; - const int * LINDXA; - const int * LINDXAU; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_dlaswp04T copies M0 columns of U into rows of A and replaces those - * columns of U with columns of W. In addition M1 - M0 columns of W are - * copied into U. 
- * - * Arguments - * ========= - * - * M0 (local input) const int - * On entry, M0 specifies the number of columns of U that should - * be copied into A and replaced by columns of W. M0 must be at - * least zero. - * - * M1 (local input) const int - * On entry, M1 specifies the number of columnns of W that will - * be copied into U. M1 must be at least zero. - * - * N (local input) const int - * On entry, N specifies the length of the columns of U that - * will be copied into rows of A. N must be at least zero. - * - * U (local input/output) double * - * On entry, U points to an array of dimension (LDU,*). This - * array contains the columns that are to be copied into rows of - * A. - * - * LDU (local input) const int - * On entry, LDU specifies the leading dimension of the array U. - * LDU must be at least MAX(1,N). - * - * A (local output) double * - * On entry, A points to an array of dimension (LDA,N). On exit, - * the rows of this array specified by LINDXA are replaced by - * columns of U indicated by LINDXAU. - * - * LDA (local input) const int - * On entry, LDA specifies the leading dimension of the array A. - * LDA must be at least MAX(1,M0). - * - * W0 (local input) const double * - * On entry, W0 is an array of size (M-1)*LDW+1, that contains - * the destination offset in U where the columns of W should be - * copied. - * - * W (local input) const double * - * On entry, W is an array of size (LDW,M0+M1), that contains - * data to be copied into U. For i in [M0..M0+M1), the entries - * W(:,i) are copied into the column W0(i*LDW) of U. - * - * LDW (local input) const int - * On entry, LDW specifies the leading dimension of the array W. - * LDW must be at least MAX(1,N+1). - * - * LINDXA (local input) const int * - * On entry, LINDXA is an array of dimension M0 containing the - * local row indexes A into which columns of U are copied. 
- * - * LINDXAU (local input) const int * - * On entry, LINDXAU is an array of dimension M0 that contains - * the local column indexes of U that should be copied into A - * and replaced by the columns of W. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - const double * w = W, * w0; - double * a0, * u0; - const int incA = (int)( (unsigned int)(LDA) << - HPL_LASWP04T_LOG2_DEPTH ), - incU = ( 1 << HPL_LASWP04T_LOG2_DEPTH ); - int nr, nu; - register int i, j; -/* .. - * .. Executable Statements .. - */ - if( ( ( M0 <= 0 ) && ( M1 <= 0 ) ) || ( N <= 0 ) ) return; - - nr = N - ( nu = (int)( ( (unsigned int)(N) >> HPL_LASWP04T_LOG2_DEPTH ) << - HPL_LASWP04T_LOG2_DEPTH ) ); - - for( j = 0; j < nu; j += HPL_LASWP04T_DEPTH, A += incA, U += incU, - w += HPL_LASWP04T_DEPTH ) - { - for( i = 0; i < M0; i++ ) - { - a0 = A + LINDXA[i]; u0 = U + LINDXAU[i] * LDU; w0 = w + i * LDW; - - *a0 = u0[ 0]; u0[ 0] = w0[ 0]; a0 += LDA; -#if ( HPL_LASWP04T_DEPTH > 1 ) - *a0 = u0[ 1]; u0[ 1] = w0[ 1]; a0 += LDA; -#endif -#if ( HPL_LASWP04T_DEPTH > 2 ) - *a0 = u0[ 2]; u0[ 2] = w0[ 2]; a0 += LDA; - *a0 = u0[ 3]; u0[ 3] = w0[ 3]; a0 += LDA; -#endif -#if ( HPL_LASWP04T_DEPTH > 4 ) - *a0 = u0[ 4]; u0[ 4] = w0[ 4]; a0 += LDA; - *a0 = u0[ 5]; u0[ 5] = w0[ 5]; a0 += LDA; - *a0 = u0[ 6]; u0[ 6] = w0[ 6]; a0 += LDA; - *a0 = u0[ 7]; u0[ 7] = w0[ 7]; a0 += LDA; -#endif -#if ( HPL_LASWP04T_DEPTH > 8 ) - *a0 = u0[ 8]; u0[ 8] = w0[ 8]; a0 += LDA; - *a0 = u0[ 9]; u0[ 9] = w0[ 9]; a0 += LDA; - *a0 = u0[10]; u0[10] = w0[10]; a0 += LDA; - *a0 = u0[11]; u0[11] = w0[11]; a0 += LDA; - *a0 = u0[12]; u0[12] = w0[12]; a0 += LDA; - *a0 = u0[13]; u0[13] = w0[13]; a0 += LDA; - *a0 = u0[14]; u0[14] = w0[14]; a0 += LDA; - *a0 = u0[15]; u0[15] = w0[15]; a0 += LDA; -#endif -#if ( HPL_LASWP04T_DEPTH > 16 ) - *a0 = u0[16]; u0[16] = w0[16]; a0 += LDA; - *a0 = u0[17]; u0[17] = w0[17]; a0 += LDA; - *a0 = u0[18]; u0[18] = w0[18]; a0 += LDA; - *a0 = u0[19]; 
u0[19] = w0[19]; a0 += LDA; - *a0 = u0[20]; u0[20] = w0[20]; a0 += LDA; - *a0 = u0[21]; u0[21] = w0[21]; a0 += LDA; - *a0 = u0[22]; u0[22] = w0[22]; a0 += LDA; - *a0 = u0[23]; u0[23] = w0[23]; a0 += LDA; - *a0 = u0[24]; u0[24] = w0[24]; a0 += LDA; - *a0 = u0[25]; u0[25] = w0[25]; a0 += LDA; - *a0 = u0[26]; u0[26] = w0[26]; a0 += LDA; - *a0 = u0[27]; u0[27] = w0[27]; a0 += LDA; - *a0 = u0[28]; u0[28] = w0[28]; a0 += LDA; - *a0 = u0[29]; u0[29] = w0[29]; a0 += LDA; - *a0 = u0[30]; u0[30] = w0[30]; a0 += LDA; - *a0 = u0[31]; u0[31] = w0[31]; a0 += LDA; -#endif - } - for( i = M0; i < M1; i++ ) - { - u0 = U + (int)(*(W0+i*LDW)) * LDU; w0 = w + i * LDW; - - u0[ 0] = w0[ 0]; -#if ( HPL_LASWP04T_DEPTH > 1 ) - u0[ 1] = w0[ 1]; -#endif -#if ( HPL_LASWP04T_DEPTH > 2 ) - u0[ 2] = w0[ 2]; u0[ 3] = w0[ 3]; -#endif -#if ( HPL_LASWP04T_DEPTH > 4 ) - u0[ 4] = w0[ 4]; u0[ 5] = w0[ 5]; u0[ 6] = w0[ 6]; u0[ 7] = w0[ 7]; -#endif -#if ( HPL_LASWP04T_DEPTH > 8 ) - u0[ 8] = w0[ 8]; u0[ 9] = w0[ 9]; u0[10] = w0[10]; u0[11] = w0[11]; - u0[12] = w0[12]; u0[13] = w0[13]; u0[14] = w0[14]; u0[15] = w0[15]; -#endif -#if ( HPL_LASWP04T_DEPTH > 16 ) - u0[16] = w0[16]; u0[17] = w0[17]; u0[18] = w0[18]; u0[19] = w0[19]; - u0[20] = w0[20]; u0[21] = w0[21]; u0[22] = w0[22]; u0[23] = w0[23]; - u0[24] = w0[24]; u0[25] = w0[25]; u0[26] = w0[26]; u0[27] = w0[27]; - u0[28] = w0[28]; u0[29] = w0[29]; u0[30] = w0[30]; u0[31] = w0[31]; -#endif - } - } - - if( nr > 0 ) - { - for( i = 0; i < M0; i++ ) - { - a0 = A + LINDXA[i]; u0 = U + LINDXAU[i] * LDU; w0 = w + i * LDW; - for( j = 0; j < nr; j++, a0 += LDA ) { *a0 = u0[j]; u0[j] = w0[j]; } - } - for( i = M0; i < M1; i++ ) - { - u0 = U + (int)(*(W0+i*LDW)) * LDU; w0 = w + i * LDW; - for( j = 0; j < nr; j++ ) { u0[j] = w0[j]; } - } - } -/* - * End of HPL_dlaswp04T - */ -} diff --git a/hpl/src/pauxil/HPL_dlaswp05N.c b/hpl/src/pauxil/HPL_dlaswp05N.c deleted file mode 100644 index bb9b3e6b8f939434aa647f2b15e0467b10c1ed0d..0000000000000000000000000000000000000000 
--- a/hpl/src/pauxil/HPL_dlaswp05N.c +++ /dev/null @@ -1,195 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" -/* - * Define default value for unrolling factor - */ -#ifndef HPL_LASWP05N_DEPTH -#define HPL_LASWP05N_DEPTH 32 -#define HPL_LASWP05N_LOG2_DEPTH 5 -#endif - -#ifdef STDC_HEADERS -void HPL_dlaswp05N -( - const int M, - const int N, - double * A, - const int LDA, - const double * U, - const int LDU, - const int * LINDXA, - const int * LINDXAU -) -#else -void HPL_dlaswp05N -( M, N, A, LDA, U, LDU, LINDXA, LINDXAU ) - const int M; - const int N; - double * A; - const int LDA; - const double * U; - const int LDU; - const int * LINDXA; - const int * LINDXAU; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_dlaswp05N copies rows of U of global offset LINDXAU into rows of - * A at positions indicated by LINDXA. - * - * Arguments - * ========= - * - * M (local input) const int - * On entry, M specifies the number of rows of U that should be - * copied into A. M must be at least zero. - * - * N (local input) const int - * On entry, N specifies the length of the rows of U that should - * be copied into A. N must be at least zero. - * - * A (local output) double * - * On entry, A points to an array of dimension (LDA,N). On exit, - * the rows of this array specified by LINDXA are replaced by - * rows of U indicated by LINDXAU. - * - * LDA (local input) const int - * On entry, LDA specifies the leading dimension of the array A. 
- * LDA must be at least MAX(1,M). - * - * U (local input) const double * - * On entry, U points to an array of dimension (LDU,N). This - * array contains the rows that are to be copied into A. - * - * LDU (local input) const int - * On entry, LDU specifies the leading dimension of the array U. - * LDU must be at least MAX(1,M). - * - * LINDXA (local input) const int * - * On entry, LINDXA is an array of dimension M that contains the - * local row indexes of A that should be copied from U. - * - * LINDXAU (local input) const int * - * On entry, LINDXAU is an array of dimension M that contains - * the local row indexes of U that should be copied into A. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - const double * U0 = U, * u0; - double * a0; - const int incA = (int)( (unsigned int)(LDA) << - HPL_LASWP05N_LOG2_DEPTH ), - incU = (int)( (unsigned int)(LDU) << - HPL_LASWP05N_LOG2_DEPTH ); - int nr, nu; - register int i, j; -/* .. - * .. Executable Statements .. 
- */ - if( ( M <= 0 ) || ( N <= 0 ) ) return; - - nr = N - ( nu = (int)( ( (unsigned int)(N) >> HPL_LASWP05N_LOG2_DEPTH ) << - HPL_LASWP05N_LOG2_DEPTH ) ); - - for( j = 0; j < nu; j += HPL_LASWP05N_DEPTH, A += incA, U0 += incU ) - { - for( i = 0; i < M; i++ ) - { - a0 = A + (size_t)(LINDXA[i]); u0 = U0 + (size_t)(LINDXAU[i]); - - *a0 = *u0; a0 += LDA; u0 += LDU; -#if ( HPL_LASWP05N_DEPTH > 1 ) - *a0 = *u0; a0 += LDA; u0 += LDU; -#endif -#if ( HPL_LASWP05N_DEPTH > 2 ) - *a0 = *u0; a0 += LDA; u0 += LDU; *a0 = *u0; a0 += LDA; u0 += LDU; -#endif -#if ( HPL_LASWP05N_DEPTH > 4 ) - *a0 = *u0; a0 += LDA; u0 += LDU; *a0 = *u0; a0 += LDA; u0 += LDU; - *a0 = *u0; a0 += LDA; u0 += LDU; *a0 = *u0; a0 += LDA; u0 += LDU; -#endif -#if ( HPL_LASWP05N_DEPTH > 8 ) - *a0 = *u0; a0 += LDA; u0 += LDU; *a0 = *u0; a0 += LDA; u0 += LDU; - *a0 = *u0; a0 += LDA; u0 += LDU; *a0 = *u0; a0 += LDA; u0 += LDU; - *a0 = *u0; a0 += LDA; u0 += LDU; *a0 = *u0; a0 += LDA; u0 += LDU; - *a0 = *u0; a0 += LDA; u0 += LDU; *a0 = *u0; a0 += LDA; u0 += LDU; -#endif -#if ( HPL_LASWP05N_DEPTH > 16 ) - *a0 = *u0; a0 += LDA; u0 += LDU; *a0 = *u0; a0 += LDA; u0 += LDU; - *a0 = *u0; a0 += LDA; u0 += LDU; *a0 = *u0; a0 += LDA; u0 += LDU; - *a0 = *u0; a0 += LDA; u0 += LDU; *a0 = *u0; a0 += LDA; u0 += LDU; - *a0 = *u0; a0 += LDA; u0 += LDU; *a0 = *u0; a0 += LDA; u0 += LDU; - *a0 = *u0; a0 += LDA; u0 += LDU; *a0 = *u0; a0 += LDA; u0 += LDU; - *a0 = *u0; a0 += LDA; u0 += LDU; *a0 = *u0; a0 += LDA; u0 += LDU; - *a0 = *u0; a0 += LDA; u0 += LDU; *a0 = *u0; a0 += LDA; u0 += LDU; - *a0 = *u0; a0 += LDA; u0 += LDU; *a0 = *u0; a0 += LDA; u0 += LDU; -#endif - } - } - - if( nr ) - { - for( i = 0; i < M; i++ ) - { - a0 = A + (size_t)(LINDXA[i]); u0 = U0 + (size_t)(LINDXAU[i]); - for( j = 0; j < nr; j++, a0 += LDA, u0 += LDU ) { *a0 = *u0; } - } - } -/* - * End of HPL_dlaswp05N - */ -} diff --git a/hpl/src/pauxil/HPL_dlaswp05T.c b/hpl/src/pauxil/HPL_dlaswp05T.c deleted file mode 100644 index 
9ed4530c1087d667029f92ee75c3eac30e7e2526..0000000000000000000000000000000000000000 --- a/hpl/src/pauxil/HPL_dlaswp05T.c +++ /dev/null @@ -1,196 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" -/* - * Define default value for unrolling factor - */ -#ifndef HPL_LASWP05T_DEPTH -#define HPL_LASWP05T_DEPTH 32 -#define HPL_LASWP05T_LOG2_DEPTH 5 -#endif - -#ifdef STDC_HEADERS -void HPL_dlaswp05T -( - const int M, - const int N, - double * A, - const int LDA, - const double * U, - const int LDU, - const int * LINDXA, - const int * LINDXAU -) -#else -void HPL_dlaswp05T -( M, N, A, LDA, U, LDU, LINDXA, LINDXAU ) - const int M; - const int N; - double * A; - const int LDA; - const double * U; - const int LDU; - const int * LINDXA; - const int * LINDXAU; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_dlaswp05T copies columns of U of global offset LINDXAU into rows - * of A at positions indicated by LINDXA. - * - * Arguments - * ========= - * - * M (local input) const int - * On entry, M specifies the number of columns of U that should be - * copied into A. M must be at least zero. - * - * N (local input) const int - * On entry, N specifies the length of the columns of U that will - * be copied into rows of A. N must be at least zero. - * - * A (local output) double * - * On entry, A points to an array of dimension (LDA,N). On exit, - * the rows of this array specified by LINDXA are replaced by - * columns of U indicated by LINDXAU. 
- * - * LDA (local input) const int - * On entry, LDA specifies the leading dimension of the array A. - * LDA must be at least MAX(1,M). - * - * U (local input) const double * - * On entry, U points to an array of dimension (LDU,*). This - * array contains the columns that are to be copied into rows of - * A. - * - * LDU (local input) const int - * On entry, LDU specifies the leading dimension of the array U. - * LDU must be at least MAX(1,N). - * - * LINDXA (local input) const int * - * On entry, LINDXA is an array of dimension M that contains the - * local row indexes of A that should be copied from U. - * - * LINDXAU (local input) const int * - * On entry, LINDXAU is an array of dimension M that contains - * the local column indexes of U that should be copied into A. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - const double * U0 = U, * u0; - double * a0; - const int incA = (int)( (unsigned int)(LDA) << - HPL_LASWP05T_LOG2_DEPTH ), - incU = ( 1 << HPL_LASWP05T_LOG2_DEPTH ); - int nr, nu; - register int i, j; -/* .. - * .. Executable Statements .. 
- */ - if( ( M <= 0 ) || ( N <= 0 ) ) return; - - nr = N - ( nu = (int)( ( (unsigned int)(N) >> HPL_LASWP05T_LOG2_DEPTH ) << - HPL_LASWP05T_LOG2_DEPTH ) ); - - for( j = 0; j < nu; j += HPL_LASWP05T_DEPTH, A += incA, U0 += incU ) - { - for( i = 0; i < M; i++ ) - { - a0 = A + (size_t)(LINDXA[ i]); - u0 = U0 + (size_t)(LINDXAU[i]) * (size_t)(LDU); - - *a0 = u0[ 0]; a0 += LDA; -#if ( HPL_LASWP05T_DEPTH > 1 ) - *a0 = u0[ 1]; a0 += LDA; -#endif -#if ( HPL_LASWP05T_DEPTH > 2 ) - *a0 = u0[ 2]; a0 += LDA; *a0 = u0[ 3]; a0 += LDA; -#endif -#if ( HPL_LASWP05T_DEPTH > 4 ) - *a0 = u0[ 4]; a0 += LDA; *a0 = u0[ 5]; a0 += LDA; - *a0 = u0[ 6]; a0 += LDA; *a0 = u0[ 7]; a0 += LDA; -#endif -#if ( HPL_LASWP05T_DEPTH > 8 ) - *a0 = u0[ 8]; a0 += LDA; *a0 = u0[ 9]; a0 += LDA; - *a0 = u0[10]; a0 += LDA; *a0 = u0[11]; a0 += LDA; - *a0 = u0[12]; a0 += LDA; *a0 = u0[13]; a0 += LDA; - *a0 = u0[14]; a0 += LDA; *a0 = u0[15]; a0 += LDA; -#endif -#if ( HPL_LASWP05T_DEPTH > 16 ) - *a0 = u0[16]; a0 += LDA; *a0 = u0[17]; a0 += LDA; - *a0 = u0[18]; a0 += LDA; *a0 = u0[19]; a0 += LDA; - *a0 = u0[20]; a0 += LDA; *a0 = u0[21]; a0 += LDA; - *a0 = u0[22]; a0 += LDA; *a0 = u0[23]; a0 += LDA; - *a0 = u0[24]; a0 += LDA; *a0 = u0[25]; a0 += LDA; - *a0 = u0[26]; a0 += LDA; *a0 = u0[27]; a0 += LDA; - *a0 = u0[28]; a0 += LDA; *a0 = u0[29]; a0 += LDA; - *a0 = u0[30]; a0 += LDA; *a0 = u0[31]; a0 += LDA; -#endif - } - } - - if( nr > 0 ) - { - for( i = 0; i < M; i++ ) - { - a0 = A + (size_t)(LINDXA[ i]); - u0 = U0 + (size_t)(LINDXAU[i]) * (size_t)(LDU); - for( j = 0; j < nr; j++, a0 += LDA ) { *a0 = u0[j]; } - } - } -/* - * End of HPL_dlaswp05T - */ -} diff --git a/hpl/src/pauxil/HPL_dlaswp06N.c b/hpl/src/pauxil/HPL_dlaswp06N.c deleted file mode 100644 index 85affd84a6ccc29597725db4c548ba22e39912f2..0000000000000000000000000000000000000000 --- a/hpl/src/pauxil/HPL_dlaswp06N.c +++ /dev/null @@ -1,206 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. 
Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" -/* - * Define default value for unrolling factor - */ -#ifndef HPL_LASWP06N_DEPTH -#define HPL_LASWP06N_DEPTH 32 -#define HPL_LASWP06N_LOG2_DEPTH 5 -#endif - -#ifdef STDC_HEADERS -void HPL_dlaswp06N -( - const int M, - const int N, - double * A, - const int LDA, - double * U, - const int LDU, - const int * LINDXA -) -#else -void HPL_dlaswp06N -( M, N, A, LDA, U, LDU, LINDXA ) - const int M; - const int N; - double * A; - const int LDA; - double * U; - const int LDU; - const int * LINDXA; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_dlaswp06N swaps rows of U with rows of A at positions - * indicated by LINDXA. - * - * Arguments - * ========= - * - * M (local input) const int - * On entry, M specifies the number of rows of A that should be - * swapped with rows of U. M must be at least zero. - * - * N (local input) const int - * On entry, N specifies the length of the rows of A that should - * be swapped with rows of U. N must be at least zero. - * - * A (local input/output) double * - * On entry, A points to an array of dimension (LDA,N). On exit, - * the rows of this array specified by LINDXA are replaced by - * rows of U. - * - * LDA (local input) const int - * On entry, LDA specifies the leading dimension of the array A. - * LDA must be at least MAX(1,M). 
- * - * U (local input/output) double * - * On entry, U points to an array of dimension (LDU,N). This - * array contains the rows of U that are to be swapped with rows - * of A. - * - * LDU (local input) const int - * On entry, LDU specifies the leading dimension of the array U. - * LDU must be at least MAX(1,M). - * - * LINDXA (local input) const int * - * On entry, LINDXA is an array of dimension M that contains the - * local row indexes of A that should be swapped with U. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - double r; - double * U0 = U, * a0, * u0; - const int incA = (int)( (unsigned int)(LDA) << - HPL_LASWP06N_LOG2_DEPTH ), - incU = (int)( (unsigned int)(LDU) << - HPL_LASWP06N_LOG2_DEPTH ); - int nr, nu; - register int i, j; -/* .. - * .. Executable Statements .. - */ - if( ( M <= 0 ) || ( N <= 0 ) ) return; - - nr = N - ( nu = (int)( ( (unsigned int)(N) >> HPL_LASWP06N_LOG2_DEPTH ) << - HPL_LASWP06N_LOG2_DEPTH ) ); - - for( j = 0; j < nu; j += HPL_LASWP06N_DEPTH, A += incA, U0 += incU ) - { - for( i = 0; i < M; i++ ) - { - a0 = A + (size_t)(LINDXA[i]); u0 = U0 + (size_t)(i); - - r = *a0; *a0 = *u0; *u0 = r; a0 += LDA; u0 += LDU; -#if ( HPL_LASWP06N_DEPTH > 1 ) - r = *a0; *a0 = *u0; *u0 = r; a0 += LDA; u0 += LDU; -#endif -#if ( HPL_LASWP06N_DEPTH > 2 ) - r = *a0; *a0 = *u0; *u0 = r; a0 += LDA; u0 += LDU; - r = *a0; *a0 = *u0; *u0 = r; a0 += LDA; u0 += LDU; -#endif -#if ( HPL_LASWP06N_DEPTH > 4 ) - r = *a0; *a0 = *u0; *u0 = r; a0 += LDA; u0 += LDU; - r = *a0; *a0 = *u0; *u0 = r; a0 += LDA; u0 += LDU; - r = *a0; *a0 = *u0; *u0 = r; a0 += LDA; u0 += LDU; - r = *a0; *a0 = *u0; *u0 = r; a0 += LDA; u0 += LDU; -#endif -#if ( HPL_LASWP06N_DEPTH > 8 ) - r = *a0; *a0 = *u0; *u0 = r; a0 += LDA; u0 += LDU; - r = *a0; *a0 = *u0; *u0 = r; a0 += LDA; u0 += LDU; - r = *a0; *a0 = *u0; *u0 = r; a0 += LDA; u0 += LDU; - r = *a0; *a0 = *u0; *u0 = r; a0 += LDA; u0 += LDU; - r = *a0; *a0 = *u0; *u0 = 
r; a0 += LDA; u0 += LDU; - r = *a0; *a0 = *u0; *u0 = r; a0 += LDA; u0 += LDU; - r = *a0; *a0 = *u0; *u0 = r; a0 += LDA; u0 += LDU; - r = *a0; *a0 = *u0; *u0 = r; a0 += LDA; u0 += LDU; -#endif -#if ( HPL_LASWP06N_DEPTH > 16 ) - r = *a0; *a0 = *u0; *u0 = r; a0 += LDA; u0 += LDU; - r = *a0; *a0 = *u0; *u0 = r; a0 += LDA; u0 += LDU; - r = *a0; *a0 = *u0; *u0 = r; a0 += LDA; u0 += LDU; - r = *a0; *a0 = *u0; *u0 = r; a0 += LDA; u0 += LDU; - r = *a0; *a0 = *u0; *u0 = r; a0 += LDA; u0 += LDU; - r = *a0; *a0 = *u0; *u0 = r; a0 += LDA; u0 += LDU; - r = *a0; *a0 = *u0; *u0 = r; a0 += LDA; u0 += LDU; - r = *a0; *a0 = *u0; *u0 = r; a0 += LDA; u0 += LDU; - r = *a0; *a0 = *u0; *u0 = r; a0 += LDA; u0 += LDU; - r = *a0; *a0 = *u0; *u0 = r; a0 += LDA; u0 += LDU; - r = *a0; *a0 = *u0; *u0 = r; a0 += LDA; u0 += LDU; - r = *a0; *a0 = *u0; *u0 = r; a0 += LDA; u0 += LDU; - r = *a0; *a0 = *u0; *u0 = r; a0 += LDA; u0 += LDU; - r = *a0; *a0 = *u0; *u0 = r; a0 += LDA; u0 += LDU; - r = *a0; *a0 = *u0; *u0 = r; a0 += LDA; u0 += LDU; - r = *a0; *a0 = *u0; *u0 = r; a0 += LDA; u0 += LDU; -#endif - } - } - - if( nr ) - { - for( i = 0; i < M; i++ ) - { - a0 = A + (size_t)(LINDXA[i]); u0 = U0 + (size_t)(i); - for( j = 0; j < nr; j++, a0 += LDA, u0 += LDU ) - { r = *a0; *a0 = *u0; *u0 = r; } - } - } -/* - * End of HPL_dlaswp06N - */ -} diff --git a/hpl/src/pauxil/HPL_dlaswp06T.c b/hpl/src/pauxil/HPL_dlaswp06T.c deleted file mode 100644 index 10d1136d823266153f3d90ba0759ad27e748d59e..0000000000000000000000000000000000000000 --- a/hpl/src/pauxil/HPL_dlaswp06T.c +++ /dev/null @@ -1,207 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. 
Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" -/* - * Define default value for unrolling factor - */ -#ifndef HPL_LASWP06T_DEPTH -#define HPL_LASWP06T_DEPTH 32 -#define HPL_LASWP06T_LOG2_DEPTH 5 -#endif - -#ifdef STDC_HEADERS -void HPL_dlaswp06T -( - const int M, - const int N, - double * A, - const int LDA, - double * U, - const int LDU, - const int * LINDXA -) -#else -void HPL_dlaswp06T -( M, N, A, LDA, U, LDU, LINDXA ) - const int M; - const int N; - double * A; - const int LDA; - double * U; - const int LDU; - const int * LINDXA; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_dlaswp06T swaps columns of U with rows of A at positions - * indicated by LINDXA. - * - * Arguments - * ========= - * - * M (local input) const int - * On entry, M specifies the number of rows of A that should be - * swapped with columns of U. M must be at least zero. - * - * N (local input) const int - * On entry, N specifies the length of the rows of A that should - * be swapped with columns of U. N must be at least zero. - * - * A (local input/output) double * - * On entry, A points to an array of dimension (LDA,N). On exit, - * the rows of this array specified by LINDXA are replaced by - * columns of U. - * - * LDA (local input) const int - * On entry, LDA specifies the leading dimension of the array A. - * LDA must be at least MAX(1,M). 
- * - * U (local input/output) double * - * On entry, U points to an array of dimension (LDU,*). This - * array contains the columns of U that are to be swapped with - * rows of A. - * - * LDU (local input) const int - * On entry, LDU specifies the leading dimension of the array U. - * LDU must be at least MAX(1,N). - * - * LINDXA (local input) const int * - * On entry, LINDXA is an array of dimension M that contains the - * local row indexes of A that should be swapped with U. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - double r; - double * U0 = U, * a0, * u0; - const int incA = (int)( (unsigned int)(LDA) << - HPL_LASWP06T_LOG2_DEPTH ), - incU = ( 1 << HPL_LASWP06T_LOG2_DEPTH ); - int nr, nu; - register int i, j; -/* .. - * .. Executable Statements .. - */ - if( ( M <= 0 ) || ( N <= 0 ) ) return; - - nr = N - ( nu = (int)( ( (unsigned int)(N) >> HPL_LASWP06T_LOG2_DEPTH ) << - HPL_LASWP06T_LOG2_DEPTH ) ); - - for( j = 0; j < nu; j += HPL_LASWP06T_DEPTH, A += incA, U0 += incU ) - { - for( i = 0; i < M; i++ ) - { - a0 = A + (size_t)(LINDXA[i]); - u0 = U0 + (size_t)(i) * (size_t)(LDU); - - r = *a0; *a0 = u0[ 0]; u0[ 0] = r; a0 += LDA; -#if ( HPL_LASWP06T_DEPTH > 1 ) - r = *a0; *a0 = u0[ 1]; u0[ 1] = r; a0 += LDA; -#endif -#if ( HPL_LASWP06T_DEPTH > 2 ) - r = *a0; *a0 = u0[ 2]; u0[ 2] = r; a0 += LDA; - r = *a0; *a0 = u0[ 3]; u0[ 3] = r; a0 += LDA; -#endif -#if ( HPL_LASWP06T_DEPTH > 4 ) - r = *a0; *a0 = u0[ 4]; u0[ 4] = r; a0 += LDA; - r = *a0; *a0 = u0[ 5]; u0[ 5] = r; a0 += LDA; - r = *a0; *a0 = u0[ 6]; u0[ 6] = r; a0 += LDA; - r = *a0; *a0 = u0[ 7]; u0[ 7] = r; a0 += LDA; -#endif -#if ( HPL_LASWP06T_DEPTH > 8 ) - r = *a0; *a0 = u0[ 8]; u0[ 8] = r; a0 += LDA; - r = *a0; *a0 = u0[ 9]; u0[ 9] = r; a0 += LDA; - r = *a0; *a0 = u0[10]; u0[10] = r; a0 += LDA; - r = *a0; *a0 = u0[11]; u0[11] = r; a0 += LDA; - r = *a0; *a0 = u0[12]; u0[12] = r; a0 += LDA; - r = *a0; *a0 = u0[13]; u0[13] = r; a0 += 
LDA; - r = *a0; *a0 = u0[14]; u0[14] = r; a0 += LDA; - r = *a0; *a0 = u0[15]; u0[15] = r; a0 += LDA; -#endif -#if ( HPL_LASWP06T_DEPTH > 16 ) - r = *a0; *a0 = u0[16]; u0[16] = r; a0 += LDA; - r = *a0; *a0 = u0[17]; u0[17] = r; a0 += LDA; - r = *a0; *a0 = u0[18]; u0[18] = r; a0 += LDA; - r = *a0; *a0 = u0[19]; u0[19] = r; a0 += LDA; - r = *a0; *a0 = u0[20]; u0[20] = r; a0 += LDA; - r = *a0; *a0 = u0[21]; u0[21] = r; a0 += LDA; - r = *a0; *a0 = u0[22]; u0[22] = r; a0 += LDA; - r = *a0; *a0 = u0[23]; u0[23] = r; a0 += LDA; - r = *a0; *a0 = u0[24]; u0[24] = r; a0 += LDA; - r = *a0; *a0 = u0[25]; u0[25] = r; a0 += LDA; - r = *a0; *a0 = u0[26]; u0[26] = r; a0 += LDA; - r = *a0; *a0 = u0[27]; u0[27] = r; a0 += LDA; - r = *a0; *a0 = u0[28]; u0[28] = r; a0 += LDA; - r = *a0; *a0 = u0[29]; u0[29] = r; a0 += LDA; - r = *a0; *a0 = u0[30]; u0[30] = r; a0 += LDA; - r = *a0; *a0 = u0[31]; u0[31] = r; a0 += LDA; -#endif - } - } - - if( nr > 0 ) - { - for( i = 0; i < M; i++ ) - { - a0 = A + (size_t)(LINDXA[i]); - u0 = U0 + (size_t)(i) * (size_t)(LDU); - for( j = 0; j < nr; j++, a0 += LDA ) - { r = *a0; *a0 = u0[j]; u0[j] = r; } - } - } -/* - * End of HPL_dlaswp06T - */ -} diff --git a/hpl/src/pauxil/HPL_dlaswp10N.c b/hpl/src/pauxil/HPL_dlaswp10N.c deleted file mode 100644 index c0c8d4e3cdf3c14539eaec4cbdd37599afe8ecd2..0000000000000000000000000000000000000000 --- a/hpl/src/pauxil/HPL_dlaswp10N.c +++ /dev/null @@ -1,186 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. 
Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
- * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" -/* - * Define default value for unrolling factor - */ -#ifndef HPL_LASWP10N_DEPTH -#define HPL_LASWP10N_DEPTH 32 -#define HPL_LASWP10N_LOG2_DEPTH 5 -#endif - -#ifdef STDC_HEADERS -void HPL_dlaswp10N -( - const int M, - const int N, - double * A, - const int LDA, - const int * IPIV -) -#else -void HPL_dlaswp10N -( M, N, A, LDA, IPIV ) - const int M; - const int N; - double * A; - const int LDA; - const int * IPIV; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_dlaswp10N performs a sequence of local column interchanges on a - * matrix A. One column interchange is initiated for columns 0 through - * N-1 of A. - * - * Arguments - * ========= - * - * M (local input) const int - * On entry, M specifies the number of rows of the array A. M - * must be at least zero. - * - * N (local input) const int - * On entry, N specifies the number of columns of the array A. N - * must be at least zero. - * - * A (local input/output) double * - * On entry, A points to an array of dimension (LDA,N). This - * array contains the columns onto which the interchanges should - * be applied. On exit, A contains the permuted matrix. - * - * LDA (local input) const int - * On entry, LDA specifies the leading dimension of the array A. - * LDA must be at least MAX(1,M). - * - * IPIV (local input) const int * - * On entry, IPIV is an array of dimension N that contains the - * pivoting information. For j in [0..N), column j of A is - * exchanged with column IPIV[j]. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - double r; - double * a0, * a1; - const int incA = ( 1 << HPL_LASWP10N_LOG2_DEPTH ); - int jp, mr, mu; - register int i, j; -/* .. - * .. Executable Statements .. 
- */ - if( ( M <= 0 ) || ( N <= 0 ) ) return; - - mr = M - ( mu = (int)( ( (unsigned int)(M) >> HPL_LASWP10N_LOG2_DEPTH ) - << HPL_LASWP10N_LOG2_DEPTH ) ); - - for( j = 0; j < N; j++ ) - { - if( j != ( jp = IPIV[j] ) ) - { - a0 = A + j * LDA; a1 = A + jp * LDA; - - for( i = 0; i < mu; i += incA, a0 += incA, a1 += incA ) - { - r = *a0; *a0 = *a1; *a1 = r; -#if ( HPL_LASWP10N_DEPTH > 1 ) - r = a0[ 1]; a0[ 1] = a1[ 1]; a1[ 1] = r; -#endif -#if ( HPL_LASWP10N_DEPTH > 2 ) - r = a0[ 2]; a0[ 2] = a1[ 2]; a1[ 2] = r; - r = a0[ 3]; a0[ 3] = a1[ 3]; a1[ 3] = r; -#endif -#if ( HPL_LASWP10N_DEPTH > 4 ) - r = a0[ 4]; a0[ 4] = a1[ 4]; a1[ 4] = r; - r = a0[ 5]; a0[ 5] = a1[ 5]; a1[ 5] = r; - r = a0[ 6]; a0[ 6] = a1[ 6]; a1[ 6] = r; - r = a0[ 7]; a0[ 7] = a1[ 7]; a1[ 7] = r; -#endif -#if ( HPL_LASWP10N_DEPTH > 8 ) - r = a0[ 8]; a0[ 8] = a1[ 8]; a1[ 8] = r; - r = a0[ 9]; a0[ 9] = a1[ 9]; a1[ 9] = r; - r = a0[10]; a0[10] = a1[10]; a1[10] = r; - r = a0[11]; a0[11] = a1[11]; a1[11] = r; - r = a0[12]; a0[12] = a1[12]; a1[12] = r; - r = a0[13]; a0[13] = a1[13]; a1[13] = r; - r = a0[14]; a0[14] = a1[14]; a1[14] = r; - r = a0[15]; a0[15] = a1[15]; a1[15] = r; -#endif -#if ( HPL_LASWP10N_DEPTH > 16 ) - r = a0[16]; a0[16] = a1[16]; a1[16] = r; - r = a0[17]; a0[17] = a1[17]; a1[17] = r; - r = a0[18]; a0[18] = a1[18]; a1[18] = r; - r = a0[19]; a0[19] = a1[19]; a1[19] = r; - r = a0[20]; a0[20] = a1[20]; a1[20] = r; - r = a0[21]; a0[21] = a1[21]; a1[21] = r; - r = a0[22]; a0[22] = a1[22]; a1[22] = r; - r = a0[23]; a0[23] = a1[23]; a1[23] = r; - r = a0[24]; a0[24] = a1[24]; a1[24] = r; - r = a0[25]; a0[25] = a1[25]; a1[25] = r; - r = a0[26]; a0[26] = a1[26]; a1[26] = r; - r = a0[27]; a0[27] = a1[27]; a1[27] = r; - r = a0[28]; a0[28] = a1[28]; a1[28] = r; - r = a0[29]; a0[29] = a1[29]; a1[29] = r; - r = a0[30]; a0[30] = a1[30]; a1[30] = r; - r = a0[31]; a0[31] = a1[31]; a1[31] = r; -#endif - } - - for( i = 0; i < mr; i++ ) - { r = a0[i]; a0[i] = a1[i]; a1[i] = r; } - } - } -/* - * End of 
HPL_dlaswp10N - */ -} diff --git a/hpl/src/pauxil/HPL_indxg2l.c b/hpl/src/pauxil/HPL_indxg2l.c deleted file mode 100644 index f1fd2b7dc365dcee67752edbc7cbe7508d2e3545..0000000000000000000000000000000000000000 --- a/hpl/src/pauxil/HPL_indxg2l.c +++ /dev/null @@ -1,151 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -int HPL_indxg2l -( - const int IG, - const int INB, - const int NB, - const int SRCPROC, - const int NPROCS -) -#else -int HPL_indxg2l -( IG, INB, NB, SRCPROC, NPROCS ) - const int IG; - const int INB; - const int NB; - const int SRCPROC; - const int NPROCS; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_indxg2l computes the local index of a matrix entry pointed to by - * the global index IG. This local returned index is the same in all - * processes. - * - * Arguments - * ========= - * - * IG (input) const int - * On entry, IG specifies the global index of the matrix entry. - * IG must be at least zero. - * - * INB (input) const int - * On entry, INB specifies the size of the first block of the - * global matrix. INB must be at least one. - * - * NB (input) const int - * On entry, NB specifies the blocking factor used to partition - * and distribute the matrix. NB must be larger than one. - * - * SRCPROC (input) const int - * On entry, if SRCPROC = -1, the data is not distributed but - * replicated, in which case this routine returns IG in all - * processes. Otherwise, the value of SRCPROC is ignored. - * - * NPROCS (input) const int - * On entry, NPROCS specifies the total number of process rows - * or columns over which the matrix is distributed. NPROCS must - * be at least one. 
- * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - int i, j; -/* .. - * .. Executable Statements .. - */ - if( ( IG < INB ) || ( SRCPROC == -1 ) || ( NPROCS == 1 ) ) -/* - * IG belongs to the first block, or the data is not distributed, or - * there is just one process in this dimension of the grid. - */ - return( IG ); -/* - * IG = INB - NB + ( l * NPROCS + MYROC ) * NB + X with 0 <= X < NB, - * thus IG is to be found in the block (IG-INB+NB) / NB = l*NPROCS+MYROC - * with 0 <= MYROC < NPROCS. The local index to be returned depends on - * whether IG resides in the process owning the first partial block of - * size INB (MYROC=0). To determine this cheaply, let i = (IG-INB) / NB, - * so that if NPROCS divides i+1, i.e. MYROC=0, we have i+1 = l*NPROCS. - * If we set j = i / NPROCS, it follows that j = l-1. Therefore, i+1 is - * equal to (j+1) * NPROCS. Conversely, if NPROCS does not divide i+1, - * then i+1 = l*NPROCS + MYROC with 1 <= MYROC < NPROCS. It follows that - * j=l and thus (j+1)*NPROCS > i+1. - */ - j = ( i = ( IG - INB ) / NB ) / NPROCS; -/* - * When IG resides in the process owning the first partial block of size - * INB (MYROC = 0), then the result IL can be written as: - * IL = INB - NB + l * NB + X = IG + ( l - (l * NPROCS + MYROC) ) * NB. - * Using the above notation, we have i+1 = l*NPROCS + MYROC = l*NPROCS, - * i.e l = ( i+1 ) / NPROCS = j+1, since NPROCS divides i+1, therefore - * IL = IG + ( j + 1 - ( i + 1 ) ) * NB. - * - * Otherwise when MYROC >= 1, the result IL can be written as: - * IL = l * NB + X = IG - INB + ( ( l+1 ) - ( l * NPROCS + MYROC ) )*NB. - * We still have i+1 = l*NPROCS+MYROC. Since NPROCS does not divide i+1, - * we have j = (l*NPROCS+MYROC-1) / NPROCS = l, i.e - * IL = IG - INB + ( j + 1 - ( i + 1 ) ) * NB. - */ - return( NB * (j - i) + - ( ( i + 1 - ( j + 1 )*NPROCS ) ? 
IG - INB : IG ) ); -/* - * End of HPL_indxg2l - */ -} diff --git a/hpl/src/pauxil/HPL_indxg2lp.c b/hpl/src/pauxil/HPL_indxg2lp.c deleted file mode 100644 index 41eea3f8aabe111a2e6559f6888b412b309de6ed..0000000000000000000000000000000000000000 --- a/hpl/src/pauxil/HPL_indxg2lp.c +++ /dev/null @@ -1,176 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -void HPL_indxg2lp -( - int * IL, - int * PROC, - const int IG, - const int INB, - const int NB, - const int SRCPROC, - const int NPROCS -) -#else -void HPL_indxg2lp -( IL, PROC, IG, INB, NB, SRCPROC, NPROCS ) - int * IL; - int * PROC; - const int IG; - const int INB; - const int NB; - const int SRCPROC; - const int NPROCS; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_indxg2lp computes the local index of a matrix entry pointed to by - * the global index IG as well as the process coordinate which possesses - * this entry. The local returned index is the same in all processes. - * - * Arguments - * ========= - * - * IL (output) int * - * On exit, IL specifies the local index corresponding to IG. IL - * is at least zero. - * - * PROC (output) int * - * On exit, PROC is the coordinate of the process owning the - * entry specified by the global index IG. PROC is at least zero - * and less than NPROCS. - * - * IG (input) const int - * On entry, IG specifies the global index of the matrix entry. - * IG must be at least zero. - * - * INB (input) const int - * On entry, INB specifies the size of the first block of the - * global matrix. INB must be at least one. - * - * NB (input) const int - * On entry, NB specifies the blocking factor used to partition - * and distribute the matrix A.
NB must be larger than one. - * - * SRCPROC (input) const int - * On entry, if SRCPROC = -1, the data is not distributed but - * replicated, in which case this routine returns IG in all - * processes. Otherwise, the value of SRCPROC is ignored. - * - * NPROCS (input) const int - * On entry, NPROCS specifies the total number of process rows - * or columns over which the matrix is distributed. NPROCS must - * be at least one. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - int i, j; -/* .. - * .. Executable Statements .. - */ - if( ( IG < INB ) || ( SRCPROC == -1 ) || ( NPROCS == 1 ) ) - { -/* - * IG belongs to the first block, or the data is not distributed, or - * there is just one process in this dimension of the grid. - */ - *IL = IG; - *PROC = SRCPROC; - } - else - { -/* - * IG = INB - NB + ( l * NPROCS + MYROC ) * NB + X with 0 <= X < NB, - * thus IG is to be found in the block (IG-INB+NB) / NB = l*NPROCS+MYROC - * with 0 <= MYROC < NPROCS. The local index to be returned depends on - * whether IG resides in the process owning the first partial block of - * size INB (MYROC=0). To determine this cheaply, let i = (IG-INB) / NB, - * so that if NPROCS divides i+1, i.e. MYROC=0, we have i+1 = l*NPROCS. - * If we set j = i / NPROCS, it follows that j = l-1. Therefore, i+1 is - * equal to (j+1) * NPROCS. Conversely, if NPROCS does not divide i+1, - * then i+1 = l*NPROCS + MYROC with 1 <= MYROC < NPROCS. It follows that - * j=l and thus (j+1)*NPROCS > i+1. - */ - j = ( i = ( IG - INB ) / NB ) / NPROCS; -/* - * IG is in block 1 + ( IG - INB ) / NB. Add this to SRCPROC and take - * the NPROCS modulo (definition of the block-cyclic data distribution). 
- */ - *PROC = SRCPROC + 1 + i; - *PROC = MPosMod( *PROC, NPROCS ); -/* - * When IG resides in the process owning the first partial block of size - * INB (MYROC = 0), then the result IL can be written as: - * IL = INB - NB + l * NB + X = IG + ( l - (l * NPROCS + MYROC) ) * NB. - * Using the above notation, we have i+1 = l*NPROCS + MYROC = l*NPROCS, - * i.e l = ( i+1 ) / NPROCS = j+1, since NPROCS divides i+1, therefore - * IL = IG + ( j + 1 - ( i + 1 ) ) * NB. - * - * Otherwise when MYROC >= 1, the result IL can be written as: - * IL = l * NB + X = IG - INB + ( ( l+1 ) - ( l * NPROCS + MYROC ) )*NB. - * We still have i+1 = l*NPROCS+MYROC. Since NPROCS does not divide i+1, - * we have j = (l*NPROCS+MYROC-1) / NPROCS = l, i.e - * IL = IG - INB + ( j + 1 - ( i + 1 ) ) * NB. - */ - *IL = NB * (j - i) + - ( ( i + 1 - ( j + 1 )*NPROCS ) ? IG - INB : IG ); - } -/* - * End of HPL_indxg2lp - */ -} diff --git a/hpl/src/pauxil/HPL_indxg2p.c b/hpl/src/pauxil/HPL_indxg2p.c deleted file mode 100644 index e182da4722292389cb73a30baee5dc60243f74cd..0000000000000000000000000000000000000000 --- a/hpl/src/pauxil/HPL_indxg2p.c +++ /dev/null @@ -1,128 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. 
All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -int HPL_indxg2p -( - const int IG, - const int INB, - const int NB, - const int SRCPROC, - const int NPROCS -) -#else -int HPL_indxg2p -( IG, INB, NB, SRCPROC, NPROCS ) - const int IG; - const int INB; - const int NB; - const int SRCPROC; - const int NPROCS; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_indxg2p computes the process coordinate which possesses the entry - * of a matrix specified by a global index IG. - * - * Arguments - * ========= - * - * IG (input) const int - * On entry, IG specifies the global index of the matrix entry.
- * IG must be at least zero. - * - * INB (input) const int - * On entry, INB specifies the size of the first block of the - * global matrix. INB must be at least one. - * - * NB (input) const int - * On entry, NB specifies the blocking factor used to partition - * and distribute the matrix A. NB must be larger than one. - * - * SRCPROC (input) const int - * On entry, SRCPROC specifies the coordinate of the process - * that possesses the first row or column of the matrix. SRCPROC - * must be at least zero and strictly less than NPROCS. - * - * NPROCS (input) const int - * On entry, NPROCS specifies the total number of process rows - * or columns over which the matrix is distributed. NPROCS must - * be at least one. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - int proc; -/* .. - * .. Executable Statements .. - */ - if( ( IG < INB ) || ( SRCPROC == -1 ) || ( NPROCS == 1 ) ) -/* - * IG belongs to the first block, or the data is not distributed, or - * there is just one process in this dimension of the grid. - */ - return( SRCPROC ); -/* - * Otherwise, IG is in block 1 + ( IG - INB ) / NB. Add this to SRCPROC - * and take the NPROCS modulo (definition of the block-cyclic data dis- - * tribution). - */ - proc = SRCPROC + 1 + ( IG - INB ) / NB; - return( MPosMod( proc, NPROCS ) ); -/* - * End of HPL_indxg2p - */ -} diff --git a/hpl/src/pauxil/HPL_indxl2g.c b/hpl/src/pauxil/HPL_indxl2g.c deleted file mode 100644 index bbc4b1bc3a1ad1b0cc98f955e4fcaf93000226f0..0000000000000000000000000000000000000000 --- a/hpl/src/pauxil/HPL_indxl2g.c +++ /dev/null @@ -1,164 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. 
Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -int HPL_indxl2g -( - const int IL, - const int INB, - const int NB, - const int PROC, - const int SRCPROC, - const int NPROCS -) -#else -int HPL_indxl2g -( IL, INB, NB, PROC, SRCPROC, NPROCS ) - const int IL; - const int INB; - const int NB; - const int PROC; - const int SRCPROC; - const int NPROCS; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_indxl2g computes the global index of a matrix entry pointed to - * by the local index IL of the process indicated by PROC. - * - * Arguments - * ========= - * - * IL (input) const int - * On entry, IL specifies the local index of the matrix entry. - * IL must be at least zero. - * - * INB (input) const int - * On entry, INB specifies the size of the first block of the - * global matrix. INB must be at least one. - * - * NB (input) const int - * On entry, NB specifies the blocking factor used to partition - * and distribute the matrix A. NB must be larger than one. - * - * PROC (input) const int - * On entry, PROC specifies the coordinate of the process whose - * local array row or column is to be determined. PROC must be - * at least zero and strictly less than NPROCS. - * - * SRCPROC (input) const int - * On entry, SRCPROC specifies the coordinate of the process - * that possesses the first row or column of the matrix. 
SRCPROC - * must be at least zero and strictly less than NPROCS. - * - * NPROCS (input) const int - * On entry, NPROCS specifies the total number of process rows - * or columns over which the matrix is distributed. NPROCS must - * be at least one. - * - * --------------------------------------------------------------------- - */ -/* .. - * .. Executable Statements .. - */ - if( ( SRCPROC == -1 ) || ( NPROCS == 1 ) ) - { -/* - * The data is not distributed, or there is just one process in this di- - * mension of the grid. - */ - return( IL ); - } - else if( PROC == SRCPROC ) - { -/* - * If I am SRCPROC, my first block is of size INB - */ - if( IL < INB ) -/* - * If IL belongs to the first block, the local and global indexes are - * equal. - */ - return ( IL ); -/* - * The number of entire blocks before the one IL belongs to is - * ( IL - INB ) / NB + 1. In the other NPROCS-1 processes, there are - * thus NB*( ( IL-INB )/NB + 1 ) entries, that are globally before the - * global entry corresponding to IL. - */ - return( ( NPROCS - 1 ) * NB * ( ( IL - INB ) / NB + 1 ) + IL ); - } - else if( PROC < SRCPROC ) - { -/* - * Otherwise, the process of coordinate MOD(SRCPROC+1, NPROCS) owns the - * second block. Let IPROC = PROC-SRCPROC-1+NPROCS be the number of pro- - * cesses between this process and PROC not included when going from - * left to right on the process line with possible wrap around. These - * IPROC processes have one more NB block than the other processes, who - * own IL / NB blocks of size NB. - */ - return( NB*( (NPROCS-1)*(IL/NB)+PROC-SRCPROC-1+NPROCS )+IL+INB ); - } - else - { -/* - * Same reasoning as above with IPROC = PROC - SRCPROC - 1. 
- */ - return( NB*( (NPROCS-1)*(IL/NB)+PROC-SRCPROC-1 )+IL+INB ); - } -/* - * End of HPL_indxl2g - */ -} diff --git a/hpl/src/pauxil/HPL_infog2l.c b/hpl/src/pauxil/HPL_infog2l.c deleted file mode 100644 index d40b2eb14d9fbae497b02360b16acf178b42df76..0000000000000000000000000000000000000000 --- a/hpl/src/pauxil/HPL_infog2l.c +++ /dev/null @@ -1,382 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -void HPL_infog2l -( - int I, - int J, - const int IMB, - const int MB, - const int INB, - const int NB, - const int RSRC, - const int CSRC, - const int MYROW, - const int MYCOL, - const int NPROW, - const int NPCOL, - int * II, - int * JJ, - int * PROW, - int * PCOL -) -#else -void HPL_infog2l -( I, J, IMB, MB, INB, NB, RSRC, CSRC, MYROW, MYCOL, NPROW, NPCOL, II, JJ, PROW, PCOL ) - int I; - int J; - const int IMB; - const int MB; - const int INB; - const int NB; - const int RSRC; - const int CSRC; - const int MYROW; - const int MYCOL; - const int NPROW; - const int NPCOL; - int * II; - int * JJ; - int * PROW; - int * PCOL; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_infog2l computes the starting local index II, JJ corresponding to - * the submatrix starting globally at the entry pointed by I, J. This - * routine returns the coordinates in the grid of the process owning the - * matrix entry of global indexes I, J, namely PROW and PCOL. - * - * Arguments - * ========= - * - * I (global input) int - * On entry, I specifies the global row index of the matrix - * entry. I must be at least zero. - * - * J (global input) int - * On entry, J specifies the global column index of the matrix - * entry. J must be at least zero. 
- * - * IMB (global input) const int - * On entry, IMB specifies the size of the first row block of - * the global matrix. IMB must be at least one. - * - * MB (global input) const int - * On entry, MB specifies the blocking factor used to partition - * and distribute the rows of the matrix A. MB must be larger - * than one. - * - * INB (global input) const int - * On entry, INB specifies the size of the first column block of - * the global matrix. INB must be at least one. - * - * NB (global input) const int - * On entry, NB specifies the blocking factor used to partition - * and distribute the columns of the matrix A. NB must be larger - * than one. - * - * RSRC (global input) const int - * On entry, RSRC specifies the row coordinate of the process - * that possesses the row I. RSRC must be at least zero and - * strictly less than NPROW. - * - * CSRC (global input) const int - * On entry, CSRC specifies the column coordinate of the process - * that possesses the column J. CSRC must be at least zero and - * strictly less than NPCOL. - * - * MYROW (local input) const int - * On entry, MYROW specifies my row process coordinate in the - * grid. MYROW is greater than or equal to zero and less than - * NPROW. - * - * MYCOL (local input) const int - * On entry, MYCOL specifies my column process coordinate in the - * grid. MYCOL is greater than or equal to zero and less than - * NPCOL. - * - * NPROW (global input) const int - * On entry, NPROW specifies the number of process rows in the - * grid. NPROW is at least one. - * - * NPCOL (global input) const int - * On entry, NPCOL specifies the number of process columns in - * the grid. NPCOL is at least one. - * - * II (local output) int * - * On exit, II specifies the local starting row index of the - * submatrix. On exit, II is at least 0. - * - * JJ (local output) int * - * On exit, JJ specifies the local starting column index of the - * submatrix. On exit, JJ is at least 0. 
- * - * PROW (global output) int * - * On exit, PROW is the row coordinate of the process owning the - * entry specified by the global index I. PROW is at least zero - * and less than NPROW. - * - * PCOL (global output) int * - * On exit, PCOL is the column coordinate of the process owning - * the entry specified by the global index J. PCOL is at least - * zero and less than NPCOL. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - int ilocblk, imb, inb, mb, mydist, nb, nblocks, csrc, rsrc; -/* .. - * .. Executable Statements .. - */ - imb = IMB; - *PROW = RSRC; - - if( ( *PROW == -1 ) || ( NPROW == 1 ) ) - { -/* - * The data is not distributed, or there is just one process row in the - * grid. - */ - *II = I; - } - else if( I < imb ) - { -/* - * I refers to an entry in the first block of rows - */ - *II = ( MYROW == *PROW ? I : 0 ); - } - else - { - mb = MB; - rsrc = *PROW; -/* - * The discussion goes as follows: compute my distance from the source - * process so that within this process coordinate system, the source - * process is the process such that mydist = 0, or equivalently - * MYROW == rsrc. - * - * Find out the global coordinate of the block I belongs to (nblocks), - * as well as the minimum local number of blocks that every process has. - * - * when mydist < nblocks-ilocblk*NPROCS, I own ilocblk + 1 full blocks, - * when mydist > nblocks-ilocblk*NPROCS, I own ilocblk full blocks, - * when mydist = nblocks-ilocblk*NPROCS, I own ilocblk full blocks - * but not I, or I own ilocblk + 1 blocks and the entry I refers to. - */ - if( MYROW == rsrc ) - { -/* - * I refers to an entry that is not in the first block, find out which - * process has it. 
- */ - nblocks = ( I - imb ) / mb + 1; - *PROW += nblocks; - *PROW -= ( *PROW / NPROW ) * NPROW; -/* - * Since mydist = 0 and nblocks - ilocblk * NPROW >= 0, there are only - * three possible cases: - * - * 1) When mydist = nblocks - ilocblk * NPROW = 0 and I do not own - * I, in which case II = IMB + ( ilocblk - 1 ) * MB. Note that this - * case cannot happen when ilocblk is zero, since nblocks is at - * least one. - * - * 2) When mydist = nblocks - ilocblk * NPROW = 0 and I own I, in - * which case I and II can respectively be written as IMB + - * (nblocks-1)*MB + IL and IMB + (ilocblk-1) * MB + IL. That is - * II = I + (ilocblk-nblocks)*MB. Note that this case cannot happen - * when ilocblk is zero, since nblocks is at least one. - * - * 3) mydist = 0 < nblocks - ilocblk * NPROW, the source process owns - * ilocblk+1 full blocks, and therefore II = IMB + ilocblk * MB. - * Note that when ilocblk is zero, II is just IMB. - */ - if( nblocks < NPROW ) - { - *II = imb; - } - else - { - ilocblk = nblocks / NPROW; - if( ilocblk * NPROW >= nblocks ) - { - *II = ( ( MYROW == *PROW ) ? - I + ( ilocblk - nblocks ) * mb : - imb + ( ilocblk - 1 ) * mb ); - } - else - { - *II = imb + ilocblk * mb; - } - } - } - else - { -/* - * I refers to an entry that is not in the first block, find out which - * process has it. - */ - nblocks = ( I -= imb ) / mb + 1; - *PROW += nblocks; - *PROW -= ( *PROW / NPROW ) * NPROW; -/* - * Compute my distance from the source process so that within this pro- - * cess coordinate system, the source process is the process such that - * mydist=0. - */ - if( ( mydist = MYROW - rsrc ) < 0 ) mydist += NPROW; -/* - * When mydist < nblocks - ilocblk * NPROW, I own ilocblk+1 full blocks - * of size MB since I am not the source process, i.e. II=(ilocblk+1)*MB. - * When mydist>=nblocks-ilocblk*NPROW and I do not own I, I own ilocblk - * full blocks of size MB, i.e.
II = ilocblk*MB, otherwise I own ilocblk - * blocks and I, in which case I can be written as IMB + (nblocks-1)*MB - * + IL and II = ilocblk*MB + IL = I - IMB + (ilocblk - nblocks + 1)*MB. - */ - if( nblocks < NPROW ) - { - mydist -= nblocks; - *II = ( ( mydist < 0 ) ? mb : - ( ( MYROW == *PROW ) ? - I + ( 1 - nblocks ) * mb : 0 ) ); - } - else - { - ilocblk = nblocks / NPROW; - mydist -= nblocks - ilocblk * NPROW; - *II = ( ( mydist < 0 ) ? ( ilocblk + 1 ) * mb : - ( ( MYROW == *PROW ) ? - ( ilocblk - nblocks + 1 ) * mb + I : - ilocblk * mb ) ); - } - } - } -/* - * Idem for the columns - */ - inb = INB; - *PCOL = CSRC; - - if( ( *PCOL == -1 ) || ( NPCOL == 1 ) ) - { - *JJ = J; - } - else if( J < inb ) - { - *JJ = ( MYCOL == *PCOL ? J : 0 ); - } - else - { - nb = NB; - csrc = *PCOL; - - if( MYCOL == csrc ) - { - nblocks = ( J - inb ) / nb + 1; - *PCOL += nblocks; - *PCOL -= ( *PCOL / NPCOL ) * NPCOL; - - if( nblocks < NPCOL ) - { - *JJ = inb; - } - else - { - ilocblk = nblocks / NPCOL; - if( ilocblk * NPCOL >= nblocks ) - { - *JJ = ( ( MYCOL == *PCOL ) ? - J + ( ilocblk - nblocks ) * nb : - inb + ( ilocblk - 1 ) * nb ); - } - else - { - *JJ = inb + ilocblk * nb; - } - } - } - else - { - nblocks = ( J -= inb ) / nb + 1; - *PCOL += nblocks; - *PCOL -= ( *PCOL / NPCOL ) * NPCOL; - - if( ( mydist = MYCOL - csrc ) < 0 ) mydist += NPCOL; - - if( nblocks < NPCOL ) - { - mydist -= nblocks; - *JJ = ( ( mydist < 0 ) ? nb : ( ( MYCOL == *PCOL ) ? - J + ( 1 - nblocks )*nb : 0 ) ); - } - else - { - ilocblk = nblocks / NPCOL; - mydist -= nblocks - ilocblk * NPCOL; - *JJ = ( ( mydist < 0 ) ? ( ilocblk + 1 ) * nb : - ( ( MYCOL == *PCOL ) ? 
- ( ilocblk - nblocks + 1 ) * nb + J : - ilocblk * nb ) ); - } - } - } -/* - * End of HPL_infog2l - */ -} diff --git a/hpl/src/pauxil/HPL_numroc.c b/hpl/src/pauxil/HPL_numroc.c deleted file mode 100644 index 380afcf30eb71bb6424f497011e7316d5cf4f416..0000000000000000000000000000000000000000 --- a/hpl/src/pauxil/HPL_numroc.c +++ /dev/null @@ -1,120 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -int HPL_numroc -( - const int N, - const int INB, - const int NB, - const int PROC, - const int SRCPROC, - const int NPROCS -) -#else -int HPL_numroc -( N, INB, NB, PROC, SRCPROC, NPROCS ) - const int N; - const int INB; - const int NB; - const int PROC; - const int SRCPROC; - const int NPROCS; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_numroc returns the local number of matrix rows/columns process - * PROC will get if we give out N rows/columns starting from global - * index 0. - * - * Arguments - * ========= - * - * N (input) const int - * On entry, N specifies the number of rows/columns being dealt - * out. N must be at least zero. - * - * INB (input) const int - * On entry, INB specifies the size of the first block of the - * global matrix. INB must be at least one. - * - * NB (input) const int - * On entry, NB specifies the blocking factor used to partition - * and distribute the matrix A. NB must be larger than one. - * - * PROC (input) const int - * On entry, PROC specifies the coordinate of the process whose - * local portion is determined. PROC must be at least zero and - * strictly less than NPROCS. - * - * SRCPROC (input) const int - * On entry, SRCPROC specifies the coordinate of the process - * that possesses the first row or column of the matrix. 
SRCPROC - * must be at least zero and strictly less than NPROCS. - * - * NPROCS (input) const int - * On entry, NPROCS specifies the total number of process rows - * or columns over which the matrix is distributed. NPROCS must - * be at least one. - * - * --------------------------------------------------------------------- - */ -/* .. - * .. Executable Statements .. - */ - return( HPL_numrocI( N, 0, INB, NB, PROC, SRCPROC, NPROCS ) ); -/* - * End of HPL_numroc - */ -} diff --git a/hpl/src/pauxil/HPL_numrocI.c b/hpl/src/pauxil/HPL_numrocI.c deleted file mode 100644 index 83bb42d787608b90838f3b20562751fc08e6a6f5..0000000000000000000000000000000000000000 --- a/hpl/src/pauxil/HPL_numrocI.c +++ /dev/null @@ -1,243 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. 
The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -int HPL_numrocI -( - const int N, - const int I, - const int INB, - const int NB, - const int PROC, - const int SRCPROC, - const int NPROCS -) -#else -int HPL_numrocI -( N, I, INB, NB, PROC, SRCPROC, NPROCS ) - const int N; - const int I; - const int INB; - const int NB; - const int PROC; - const int SRCPROC; - const int NPROCS; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_numrocI returns the local number of matrix rows/columns process - * PROC will get if we give out N rows/columns starting from global - * index I. - * - * Arguments - * ========= - * - * N (input) const int - * On entry, N specifies the number of rows/columns being dealt - * out. N must be at least zero. - * - * I (input) const int - * On entry, I specifies the global index of the matrix entry - * I must be at least zero. 
- * - * INB (input) const int - * On entry, INB specifies the size of the first block of th - * global matrix. INB must be at least one. - * - * NB (input) const int - * On entry, NB specifies the blocking factor used to partition - * and distribute the matrix A. NB must be larger than one. - * - * PROC (input) const int - * On entry, PROC specifies the coordinate of the process whos - * local portion is determined. PROC must be at least zero an - * strictly less than NPROCS. - * - * SRCPROC (input) const int - * On entry, SRCPROC specifies the coordinate of the proces - * that possesses the first row or column of the matrix. SRCPRO - * must be at least zero and strictly less than NPROCS. - * - * NPROCS (input) const int - * On entry, NPROCS specifies the total number of process row - * or columns over which the matrix is distributed. NPROCS mus - * be at least one. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - int ilocblk, inb, mydist, nblocks, srcproc; -/* .. - * .. Executable Statements .. - */ - if( ( SRCPROC == -1 ) || ( NPROCS == 1 ) ) -/* - * The data is not distributed, or there is just one process in this di- - * mension of the grid. - */ - return( N ); -/* - * Compute coordinate of process owning I and corresponding INB - */ - srcproc = SRCPROC; - - if( ( inb = INB - I ) <= 0 ) - { -/* - * I is not in the first block, find out which process has it and update - * the size of first block - */ - srcproc += ( nblocks = (-inb) / NB + 1 ); - srcproc -= ( srcproc / NPROCS ) * NPROCS; - inb += nblocks * NB; - } -/* - * Now everything is just like N, I=0, INB, NB, srcproc, NPROCS. The - * discussion goes as follows: compute my distance from the source pro- - * cess so that within this process coordinate system, the source pro- - * cess is the process such that mydist = 0, or PROC == srcproc. 
- * - * Find out how many full blocks are globally (nblocks) and locally - * (ilocblk) in those N entries. Then remark that - * - * when mydist < nblocks - ilocblk*NPROCS, I own ilocblk+1 full blocks, - * when mydist > nblocks - ilocblk*NPROCS, I own ilocblk full blocks, - * when mydist = nblocks - ilocblk*NPROCS, either the last block is not - * full and I own it, or the last block is full and I am the first pro- - * cess owning only ilocblk full blocks. - */ - if( PROC == srcproc ) - { -/* - * I am the source process, i.e. I own I (mydist=0). When N <= INB, the - * answer is simply N. - */ - if( N <= inb ) return( N ); -/* - * Find out how many full blocks are globally (nblocks) and locally - * (ilocblk) in those N entries. - */ - nblocks = ( N - inb ) / NB + 1; -/* - * Since mydist = 0 and nblocks - ilocblk * NPROCS >= 0, there are only - * two possible cases: - * - * 1) When mydist = nblocks - ilocblk * NPROCS = 0, that is NPROCS di- - * vides the global number of full blocks, then the source process - * srcproc owns one more block than the other processes; and N can - * be rewritten as N = INB + (nblocks-1) * NB + LNB with LNB >= 0 - * size of the last block. Similarly, the local value Np correspon- - * ding to N can be written as Np = INB + (ilocblk-1) * NB + LNB = - * N + ( ilocblk-1 - (nblocks-1) )*NB. Note that this case cannot - * happen when ilocblk is zero, since nblocks is at least one. - * - * 2) mydist = 0 < nblocks - ilocblk * NPROCS, the source process only - * owns full blocks, and therefore Np = INB + ilocblk * NB. Note - * that when ilocblk is zero, Np is just INB. - */ - if( nblocks < NPROCS ) return( inb ); - - ilocblk = nblocks / NPROCS; - return( ( nblocks - ilocblk * NPROCS ) ? inb + ilocblk * NB : - N + ( ilocblk - nblocks ) * NB ); - } - else - { -/* - * I am not the source process. When N <= INB, the answer is simply 0. 
- */ - if( N <= inb ) return( 0 ); -/* - * Find out how many full blocks are globally (nblocks) and locally - * (ilocblk) in those N entries - */ - nblocks = ( N - inb ) / NB + 1; -/* - * Compute my distance from the source process so that within this pro- - * cess coordinate system, the source process is the process such that - * mydist=0. - */ - if( ( mydist = PROC - srcproc ) < 0 ) mydist += NPROCS; -/* - * When mydist < nblocks - ilocblk*NPROCS, I own ilocblk + 1 full blocks - * of size NB since I am not the source process, - * - * when mydist > nblocks - ilocblk * NPROCS, I own ilocblk full blocks - * of size NB since I am not the source process, - * - * when mydist = nblocks - ilocblk*NPROCS, - * either the last block is not full and I own it, in which case - * N = INB + (nblocks - 1)*NB + LNB with LNB the size of the last - * block such that NB > LNB > 0; the local value Np corresponding to - * N is given by Np = ilocblk*NB+LNB = N-INB+(ilocblk-nblocks+1)*NB; - * or the last block is full and I am the first process owning only - * ilocblk full blocks of size NB, that is N = INB+(nblocks-1)*NB and - * Np = ilocblk * NB = N - INB + (ilocblk-nblocks+1) * NB. - */ - if( nblocks < NPROCS ) - return( ( mydist < nblocks ) ? NB : ( ( mydist > nblocks ) ? 0 : - N - inb + NB * ( 1 - nblocks ) ) ); - - ilocblk = nblocks / NPROCS; - mydist -= nblocks - ilocblk * NPROCS; - return( ( mydist < 0 ) ? ( ilocblk + 1 ) * NB : - ( ( mydist > 0 ) ? ilocblk * NB : - N - inb + NB * ( ilocblk - nblocks + 1 ) ) ); - } -/* - * End of HPL_numrocI - */ -} diff --git a/hpl/src/pauxil/HPL_pabort.c b/hpl/src/pauxil/HPL_pabort.c deleted file mode 100644 index a55f0014fbea3679a90c77717b12deeaf44afdbc..0000000000000000000000000000000000000000 --- a/hpl/src/pauxil/HPL_pabort.c +++ /dev/null @@ -1,137 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. 
Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -void HPL_pabort -( - int LINE, - const char * SRNAME, - const char * FORM, - ... -) -#else -void HPL_pabort( va_alist ) -va_dcl -#endif -{ -/* - * Purpose - * ======= - * - * HPL_pabort displays an error message on stderr and halts execution. - * - * - * Arguments - * ========= - * - * LINE (local input) int - * On entry, LINE specifies the line number in the file where - * the error has occured. When LINE is not a positive line - * number, it is ignored. - * - * SRNAME (local input) const char * - * On entry, SRNAME should be the name of the routine calling - * this error handler. - * - * FORM (local input) const char * - * On entry, FORM specifies the format, i.e., how the subsequent - * arguments are converted for output. - * - * (local input) ... - * On entry, ... is the list of arguments to be printed within - * the format string. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - va_list argptr; - int rank; - char cline[128]; -#ifndef STDC_HEADERS - int LINE; - char * FORM, * SRNAME; -#endif -/* .. - * .. Executable Statements .. 
- */ -#ifdef STDC_HEADERS - va_start( argptr, FORM ); -#else - va_start( argptr ); - LINE = va_arg( argptr, int ); - SRNAME = va_arg( argptr, char * ); - FORM = va_arg( argptr, char * ); -#endif - (void) vsprintf( cline, FORM, argptr ); - va_end( argptr ); - - MPI_Comm_rank( MPI_COMM_WORLD, &rank ); -/* - * Display an error message - */ - if( LINE <= 0 ) - HPL_fprintf( stderr, "%s %s %d, %s %s:\n>>> %s <<< Abort ...\n\n", - "HPL ERROR", "from process #", rank, "in function", - SRNAME, cline ); - else - HPL_fprintf( stderr, - "%s %s %d, %s %d %s %s:\n>>> %s <<< Abort ...\n\n", - "HPL ERROR", "from process #", rank, "on line", LINE, - "of function", SRNAME, cline ); - - MPI_Abort( MPI_COMM_WORLD, -1 ); - exit( -1 ); -/* - * End of HPL_pabort - */ -} diff --git a/hpl/src/pauxil/HPL_pdlamch.c b/hpl/src/pauxil/HPL_pdlamch.c deleted file mode 100644 index 75617e3fd67478ec1744973b3f33999d03fa4381..0000000000000000000000000000000000000000 --- a/hpl/src/pauxil/HPL_pdlamch.c +++ /dev/null @@ -1,143 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. 
All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
- * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -double HPL_pdlamch -( - MPI_Comm COMM, - const HPL_T_MACH CMACH -) -#else -double HPL_pdlamch -( COMM, CMACH ) - MPI_Comm COMM; - const HPL_T_MACH CMACH; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_pdlamch determines machine-specific arithmetic constants such as - * the relative machine precision (eps), the safe minimum(sfmin) such that - * 1/sfmin does not overflow, the base of the machine (base), the precision - * (prec), the number of (base) digits in the mantissa (t), whether - * rounding occurs in addition (rnd = 1.0 and 0.0 otherwise), the minimum - * exponent before (gradual) underflow (emin), the underflow threshold - * (rmin)- base**(emin-1), the largest exponent before overflow (emax), the - * overflow threshold (rmax) - (base**emax)*(1-eps). - * - * Arguments - * ========= - * - * COMM (global/local input) MPI_Comm - * The MPI communicator identifying the process collection. - * - * CMACH (global input) const HPL_T_MACH - * Specifies the value to be returned by HPL_pdlamch - * = HPL_MACH_EPS, HPL_pdlamch := eps (default) - * = HPL_MACH_SFMIN, HPL_pdlamch := sfmin - * = HPL_MACH_BASE, HPL_pdlamch := base - * = HPL_MACH_PREC, HPL_pdlamch := eps*base - * = HPL_MACH_MLEN, HPL_pdlamch := t - * = HPL_MACH_RND, HPL_pdlamch := rnd - * = HPL_MACH_EMIN, HPL_pdlamch := emin - * = HPL_MACH_RMIN, HPL_pdlamch := rmin - * = HPL_MACH_EMAX, HPL_pdlamch := emax - * = HPL_MACH_RMAX, HPL_pdlamch := rmax - * - * where - * - * eps = relative machine precision, - * sfmin = safe minimum, - * base = base of the machine, - * prec = eps*base, - * t = number of digits in the mantissa, - * rnd = 1.0 if rounding occurs in addition, - * emin = minimum exponent before underflow, - * rmin = underflow threshold, - * emax = largest exponent before overflow, - * rmax = overflow threshold. 
- * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - double param; -/* .. - * .. Executable Statements .. - */ - param = HPL_dlamch( CMACH ); - - switch( CMACH ) - { - case HPL_MACH_EPS : - case HPL_MACH_SFMIN : - case HPL_MACH_EMIN : - case HPL_MACH_RMIN : - (void) HPL_all_reduce( (void *)(&param), 1, HPL_DOUBLE, - HPL_max, COMM ); - break; - case HPL_MACH_EMAX : - case HPL_MACH_RMAX : - (void) HPL_all_reduce( (void *)(&param), 1, HPL_DOUBLE, - HPL_min, COMM ); - break; - default : - break; - } - - return( param ); -/* - * End of HPL_pdlamch - */ -} diff --git a/hpl/src/pauxil/HPL_pdlange.c b/hpl/src/pauxil/HPL_pdlange.c deleted file mode 100644 index 4596638cc3fe85d1987f72dfa17f353124368864..0000000000000000000000000000000000000000 --- a/hpl/src/pauxil/HPL_pdlange.c +++ /dev/null @@ -1,242 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. 
The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -double HPL_pdlange -( - const HPL_T_grid * GRID, - const HPL_T_NORM NORM, - const int M, - const int N, - const int NB, - const double * A, - const int LDA -) -#else -double HPL_pdlange -( GRID, NORM, M, N, NB, A, LDA ) - const HPL_T_grid * GRID; - const HPL_T_NORM NORM; - const int M; - const int N; - const int NB; - const double * A; - const int LDA; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_pdlange returns the value of the one norm, or the infinity norm, - * or the element of largest absolute value of a distributed matrix A: - * - * - * max(abs(A(i,j))) when NORM = HPL_NORM_A, - * norm1(A), when NORM = HPL_NORM_1, - * normI(A), when NORM = HPL_NORM_I, - * - * where norm1 denotes the one norm of a matrix (maximum column sum) and - * normI denotes the infinity norm of a matrix (maximum row sum). 
Note - * that max(abs(A(i,j))) is not a matrix norm. - * - * Arguments - * ========= - * - * GRID (local input) const HPL_T_grid * - * On entry, GRID points to the data structure containing the - * process grid information. - * - * NORM (global input) const HPL_T_NORM - * On entry, NORM specifies the value to be returned by this - * function as described above. - * - * M (global input) const int - * On entry, M specifies the number of rows of the matrix A. - * M must be at least zero. - * - * N (global input) const int - * On entry, N specifies the number of columns of the matrix A. - * N must be at least zero. - * - * NB (global input) const int - * On entry, NB specifies the blocking factor used to partition - * and distribute the matrix. NB must be larger than one. - * - * A (local input) const double * - * On entry, A points to an array of dimension (LDA,LocQ(N)), - * that contains the local pieces of the distributed matrix A. - * - * LDA (local input) const int - * On entry, LDA specifies the leading dimension of the array A. - * LDA must be at least max(1,LocP(M)). - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - double s, v0=HPL_rzero, * work = NULL; - MPI_Comm Acomm, Ccomm, Rcomm; - int ii, jj, mp, mycol, myrow, npcol, nprow, - nq; -/* .. - * .. Executable Statements .. 
- */ - (void) HPL_grid_info( GRID, &nprow, &npcol, &myrow, &mycol ); - Rcomm = GRID->row_comm; Ccomm = GRID->col_comm; - Acomm = GRID->all_comm; - - Mnumroc( mp, M, NB, NB, myrow, 0, nprow ); - Mnumroc( nq, N, NB, NB, mycol, 0, npcol ); - - if( Mmin( M, N ) == 0 ) { return( v0 ); } - else if( NORM == HPL_NORM_A ) - { -/* - * max( abs( A ) ) - */ - if( ( nq > 0 ) && ( mp > 0 ) ) - { - for( jj = 0; jj < nq; jj++ ) - { - for( ii = 0; ii < mp; ii++ ) - { v0 = Mmax( v0, Mabs( *A ) ); A++; } - A += LDA - mp; - } - } - (void) HPL_reduce( (void *)(&v0), 1, HPL_DOUBLE, HPL_max, 0, - Acomm ); - } - else if( NORM == HPL_NORM_1 ) - { -/* - * Find norm_1( A ). - */ - if( nq > 0 ) - { - work = (double*)malloc( (size_t)(nq) * sizeof( double ) ); - if( work == NULL ) - { HPL_pabort( __LINE__, "HPL_pdlange", "Memory allocation failed" ); } - - for( jj = 0; jj < nq; jj++ ) - { - s = HPL_rzero; - for( ii = 0; ii < mp; ii++ ) { s += Mabs( *A ); A++; } - work[jj] = s; A += LDA - mp; - } -/* - * Find sum of global matrix columns, store on row 0 of process grid - */ - (void) HPL_reduce( (void *)(work), nq, HPL_DOUBLE, HPL_sum, - 0, Ccomm ); -/* - * Find maximum sum of columns for 1-norm - */ - if( myrow == 0 ) - { v0 = work[HPL_idamax( nq, work, 1 )]; v0 = Mabs( v0 ); } - if( work ) free( work ); - } -/* - * Find max in row 0, store result in process (0,0) - */ - if( myrow == 0 ) - (void) HPL_reduce( (void *)(&v0), 1, HPL_DOUBLE, HPL_max, 0, - Rcomm ); - } - else if( NORM == HPL_NORM_I ) - { -/* - * Find norm_inf( A ) - */ - if( mp > 0 ) - { - work = (double*)malloc( (size_t)(mp) * sizeof( double ) ); - if( work == NULL ) - { HPL_pabort( __LINE__, "HPL_pdlange", "Memory allocation failed" ); } - - for( ii = 0; ii < mp; ii++ ) { work[ii] = HPL_rzero; } - - for( jj = 0; jj < nq; jj++ ) - { - for( ii = 0; ii < mp; ii++ ) - { work[ii] += Mabs( *A ); A++; } - A += LDA - mp; - } -/* - * Find sum of global matrix rows, store on column 0 of process grid - */ - (void) HPL_reduce( (void *)(work), 
mp, HPL_DOUBLE, HPL_sum, - 0, Rcomm ); -/* - * Find maximum sum of rows for inf-norm - */ - if( mycol == 0 ) - { v0 = work[HPL_idamax( mp, work, 1 )]; v0 = Mabs( v0 ); } - if( work ) free( work ); - } -/* - * Find max in column 0, store result in process (0,0) - */ - if( mycol == 0 ) - (void) HPL_reduce( (void *)(&v0), 1, HPL_DOUBLE, HPL_max, - 0, Ccomm ); - } -/* - * Broadcast answer to every process in the grid - */ - (void) HPL_broadcast( (void *)(&v0), 1, HPL_DOUBLE, 0, Acomm ); - - return( v0 ); -/* - * End of HPL_pdlange - */ -} diff --git a/hpl/src/pauxil/HPL_pdlaprnt.c b/hpl/src/pauxil/HPL_pdlaprnt.c deleted file mode 100644 index 8135e34f26ab471dfaa933eb48937c4082a0e82b..0000000000000000000000000000000000000000 --- a/hpl/src/pauxil/HPL_pdlaprnt.c +++ /dev/null @@ -1,190 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. 
The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -void HPL_pdlaprnt -( - const HPL_T_grid * GRID, - const int M, - const int N, - const int NB, - double * A, - const int LDA, - const int IAROW, - const int IACOL, - const char * CMATNM -) -#else -void HPL_pdlaprnt -( GRID, M, N, NB, A, LDA, IAROW, IACOL, CMATNM ) - const HPL_T_grid * GRID; - const int M; - const int N; - const int NB; - double * A; - const int LDA; - const int IAROW; - const int IACOL; - const char * CMATNM; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_pdlaprnt prints to standard error a distributed matrix A. The - * local pieces of A are sent to the process of coordinates (0,0) in - * the grid and then printed. - * - * Arguments - * ========= - * - * GRID (local input) const HPL_T_grid * - * On entry, GRID points to the data structure containing the - * process grid information. 
- * - * M (global input) const int - * On entry, M specifies the number of rows of the coefficient - * matrix A. M must be at least zero. - * - * N (global input) const int - * On entry, N specifies the number of columns of the - * coefficient matrix A. N must be at least zero. - * - * NB (global input) const int - * On entry, NB specifies the blocking factor used to partition - * and distribute the matrix. NB must be larger than one. - * - * A (local input) double * - * On entry, A points to an array of dimension (LDA,LocQ(N)). - * This array contains the coefficient matrix to be printed. - * - * LDA (local input) const int - * On entry, LDA specifies the leading dimension of the array A. - * LDA must be at least max(1,LocP(M)). - * - * IAROW (global input) const int - * On entry, IAROW specifies the row process coordinate owning - * the first row of A. IAROW must be larger than or equal to - * zero and less than NPROW. - * - * IACOL (global input) const int - * On entry, IACOL specifies the column process coordinate - * owning the first column of A. IACOL must be larger than or - * equal to zero and less than NPCOL. - * - * CMATNM (global input) const char * - * On entry, CMATNM is the name of the matrix to be printed. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - MPI_Comm Acomm; - double * buf = NULL; - int h, i, ib, icurcol=IACOL, icurrow=IAROW, - ii=0, j, jb, jj=0, mycol, myrow, npcol, - nprow, src; -/* .. - * .. Executable Statements .. 
- */ - (void) HPL_grid_info( GRID, &nprow, &npcol, &myrow, &mycol ); - Acomm = GRID->all_comm; - if( ( myrow == 0 ) && ( mycol == 0 ) ) - buf = (double*)malloc( (size_t)(NB) * sizeof( double ) ); - - for( j = 0; j < N; j += NB ) - { - jb = N-j; jb = Mmin( jb, NB ); - for( h = 0; h < jb; h++ ) - { - (void) HPL_barrier( Acomm ); - - for( i = 0; i < M; i += NB ) - { - ib = M-i; ib = Mmin( ib, NB ); - if( ( icurrow == 0 ) && ( icurcol == 0 ) ) - { - if( ( myrow == 0 ) && ( mycol == 0 ) ) - HPL_dlaprnt( ib, 1, Mptr( A, ii, jj+h, LDA ), i+1, - j+h+1, LDA, CMATNM ); - } - else - { - if( ( myrow == icurrow ) && ( mycol == icurcol ) ) - { - (void) HPL_send( Mptr( A, ii, jj+h, LDA ), ib, 0, - 9000+(j+h)*M+i, Acomm ); - } - else if( ( myrow == 0 ) && ( mycol == 0 ) ) - { - src = HPL_pnum( GRID, icurrow, icurcol ); - (void) HPL_recv( buf, ib, src, 9000+(j+h)*M+i, - Acomm ); - HPL_dlaprnt( ib, 1, buf, i+1, j+h+1, NB, CMATNM ); - } - } - if( myrow == icurrow ) ii += ib; - icurrow = MModAdd1( icurrow, nprow ); - (void) HPL_barrier( Acomm ); - } - ii = 0; icurrow = IAROW; - } - if( mycol == icurcol ) jj += jb; - icurcol = MModAdd1( icurcol, npcol ); - (void) HPL_barrier( Acomm ); - } - if( ( myrow == 0 ) && ( mycol == 0 ) && ( buf ) ) free( buf ); -/* - * End of HPL_pdlaprnt - */ -} diff --git a/hpl/src/pauxil/HPL_pwarn.c b/hpl/src/pauxil/HPL_pwarn.c deleted file mode 100644 index 667e8bb80f11941881ede92d78961c84af895df4..0000000000000000000000000000000000000000 --- a/hpl/src/pauxil/HPL_pwarn.c +++ /dev/null @@ -1,139 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. 
Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -void HPL_pwarn -( - FILE * STREAM, - int LINE, - const char * SRNAME, - const char * FORM, - ... 
-) -#else -void HPL_pwarn( va_alist ) -va_dcl -#endif -{ -/* - * Purpose - * ======= - * - * HPL_pwarn displays an error message. - * - * - * Arguments - * ========= - * - * STREAM (local input) FILE * - * On entry, STREAM specifies the output stream. - * - * LINE (local input) int - * On entry, LINE specifies the line number in the file where - * the error has occurred. When LINE is not a positive line - * number, it is ignored. - * - * SRNAME (local input) const char * - * On entry, SRNAME should be the name of the routine calling - * this error handler. - * - * FORM (local input) const char * - * On entry, FORM specifies the format, i.e., how the subsequent - * arguments are converted for output. - * - * (local input) ... - * On entry, ... is the list of arguments to be printed within - * the format string. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - va_list argptr; - int rank; - char cline[128]; -#ifndef STDC_HEADERS - FILE * STREAM; - int LINE; - char * FORM, * SRNAME; -#endif -/* .. - * .. Executable Statements .. 
- */ -#ifdef STDC_HEADERS - va_start( argptr, FORM ); -#else - va_start( argptr ); - STREAM = va_arg( argptr, FILE * ); - LINE = va_arg( argptr, int ); - SRNAME = va_arg( argptr, char * ); - FORM = va_arg( argptr, char * ); -#endif - (void) vsprintf( cline, FORM, argptr ); - va_end( argptr ); - - MPI_Comm_rank( MPI_COMM_WORLD, &rank ); -/* - * Display an error message - */ - if( LINE <= 0 ) - HPL_fprintf( STREAM, "%s %s %d, %s %s:\n>>> %s <<<\n\n", - "HPL ERROR", "from process #", rank, "in function", - SRNAME, cline ); - else - HPL_fprintf( STREAM, "%s %s %d, %s %d %s %s:\n>>> %s <<<\n\n", - "HPL ERROR", "from process #", rank, "on line", LINE, - "of function", SRNAME, cline ); -/* - * End of HPL_pwarn - */ -} diff --git a/hpl/src/pauxil/intel64/Make.inc b/hpl/src/pauxil/intel64/Make.inc deleted file mode 120000 index c1d8167d071e028cbeff544106b9f386bbdcac8a..0000000000000000000000000000000000000000 --- a/hpl/src/pauxil/intel64/Make.inc +++ /dev/null @@ -1 +0,0 @@ -/home/valeriuc/work/PRACE/CodeVault/src/1_dense/hpl/Make.intel64 \ No newline at end of file diff --git a/hpl/src/pauxil/intel64/Makefile b/hpl/src/pauxil/intel64/Makefile deleted file mode 100644 index 69bb9125dcd435c7f087dd0ffb619377879b52b6..0000000000000000000000000000000000000000 --- a/hpl/src/pauxil/intel64/Makefile +++ /dev/null @@ -1,137 +0,0 @@ -# -# -- High Performance Computing Linpack Benchmark (HPL) -# HPL - 2.1 - October 26, 2012 -# Antoine P. Petitet -# University of Tennessee, Knoxville -# Innovative Computing Laboratory -# (C) Copyright 2000-2008 All Rights Reserved -# -# -- Copyright notice and Licensing terms: -# -# Redistribution and use in source and binary forms, with or without -# modification, are permitted provided that the following conditions -# are met: -# -# 1. Redistributions of source code must retain the above copyright -# notice, this list of conditions and the following disclaimer. -# -# 2. 
Redistributions in binary form must reproduce the above copyright -# notice, this list of conditions, and the following disclaimer in the -# documentation and/or other materials provided with the distribution. -# -# 3. All advertising materials mentioning features or use of this -# software must display the following acknowledgement: -# This product includes software developed at the University of -# Tennessee, Knoxville, Innovative Computing Laboratory. -# -# 4. The name of the University, the name of the Laboratory, or the -# names of its contributors may not be used to endorse or promote -# products derived from this software without specific written -# permission. -# -# -- Disclaimer: -# -# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS -# ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR -# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY -# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, -# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT -# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, -# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY -# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT -# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
-# ###################################################################### -# -include Make.inc -# -# ###################################################################### -# -INCdep = \ - $(INCdir)/hpl_misc.h $(INCdir)/hpl_blas.h $(INCdir)/hpl_auxil.h \ - $(INCdir)/hpl_pmisc.h $(INCdir)/hpl_grid.h $(INCdir)/hpl_pauxil.h -# -## Object files ######################################################## -# -HPL_pauobj = \ - HPL_indxg2l.o HPL_indxg2lp.o HPL_indxg2p.o \ - HPL_indxl2g.o HPL_infog2l.o HPL_numroc.o \ - HPL_numrocI.o HPL_dlaswp00N.o HPL_dlaswp10N.o \ - HPL_dlaswp01N.o HPL_dlaswp01T.o HPL_dlaswp02N.o \ - HPL_dlaswp03N.o HPL_dlaswp03T.o HPL_dlaswp04N.o \ - HPL_dlaswp04T.o HPL_dlaswp05N.o HPL_dlaswp05T.o \ - HPL_dlaswp06N.o HPL_dlaswp06T.o HPL_pwarn.o \ - HPL_pabort.o HPL_pdlaprnt.o HPL_pdlamch.o \ - HPL_pdlange.o -# -## Targets ############################################################# -# -all : lib -# -lib : lib.grd -# -lib.grd : $(HPL_pauobj) - $(ARCHIVER) $(ARFLAGS) $(HPLlib) $(HPL_pauobj) - $(RANLIB) $(HPLlib) - $(TOUCH) lib.grd -# -# ###################################################################### -# -HPL_indxg2l.o : ../HPL_indxg2l.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_indxg2l.c -HPL_indxg2lp.o : ../HPL_indxg2lp.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_indxg2lp.c -HPL_indxg2p.o : ../HPL_indxg2p.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_indxg2p.c -HPL_indxl2g.o : ../HPL_indxl2g.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_indxl2g.c -HPL_infog2l.o : ../HPL_infog2l.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_infog2l.c -HPL_numroc.o : ../HPL_numroc.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_numroc.c -HPL_numrocI.o : ../HPL_numrocI.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_numrocI.c -HPL_dlaswp00N.o : ../HPL_dlaswp00N.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_dlaswp00N.c -HPL_dlaswp10N.o : ../HPL_dlaswp10N.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_dlaswp10N.c -HPL_dlaswp01N.o : ../HPL_dlaswp01N.c $(INCdep) - 
$(CC) -o $@ -c $(CCFLAGS) ../HPL_dlaswp01N.c -HPL_dlaswp01T.o : ../HPL_dlaswp01T.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_dlaswp01T.c -HPL_dlaswp02N.o : ../HPL_dlaswp02N.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_dlaswp02N.c -HPL_dlaswp03N.o : ../HPL_dlaswp03N.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_dlaswp03N.c -HPL_dlaswp03T.o : ../HPL_dlaswp03T.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_dlaswp03T.c -HPL_dlaswp04N.o : ../HPL_dlaswp04N.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_dlaswp04N.c -HPL_dlaswp04T.o : ../HPL_dlaswp04T.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_dlaswp04T.c -HPL_dlaswp05N.o : ../HPL_dlaswp05N.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_dlaswp05N.c -HPL_dlaswp05T.o : ../HPL_dlaswp05T.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_dlaswp05T.c -HPL_dlaswp06N.o : ../HPL_dlaswp06N.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_dlaswp06N.c -HPL_dlaswp06T.o : ../HPL_dlaswp06T.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_dlaswp06T.c -HPL_pwarn.o : ../HPL_pwarn.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_pwarn.c -HPL_pabort.o : ../HPL_pabort.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_pabort.c -HPL_pdlaprnt.o : ../HPL_pdlaprnt.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_pdlaprnt.c -HPL_pdlamch.o : ../HPL_pdlamch.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_pdlamch.c -HPL_pdlange.o : ../HPL_pdlange.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_pdlange.c -# -# ###################################################################### -# -clean : - $(RM) *.o *.grd -# -# ###################################################################### diff --git a/hpl/src/pauxil/intel64/lib.grd b/hpl/src/pauxil/intel64/lib.grd deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/hpl/src/pfact/HPL_dlocmax.c b/hpl/src/pfact/HPL_dlocmax.c deleted file mode 100644 index 7fcd8ef48afbc3480c08f5a3b76a52aa5677755c..0000000000000000000000000000000000000000 --- 
a/hpl/src/pfact/HPL_dlocmax.c +++ /dev/null @@ -1,149 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -void HPL_dlocmax -( - HPL_T_panel * PANEL, - const int N, - const int II, - const int JJ, - double * WORK -) -#else -void HPL_dlocmax -( PANEL, N, II, JJ, WORK ) - HPL_T_panel * PANEL; - const int N; - const int II; - const int JJ; - double * WORK; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_dlocmax finds the maximum entry in the current column and packs - * the useful information in WORK[0:3]. On exit, WORK[0] contains the - * local maximum absolute value scalar, WORK[1] is the corresponding - * local row index, WORK[2] is the corresponding global row index, and - * WORK[3] is the coordinate of the process owning this max. When N is - * less than 1, the WORK[0:2] is initialized to zero, and WORK[3] is set - * to the total number of process rows. - * - * Arguments - * ========= - * - * PANEL (local input/output) HPL_T_panel * - * On entry, PANEL points to the data structure containing the - * panel information. - * - * N (local input) const int - * On entry, N specifies the local number of rows of the column - * of A on which we operate. - * - * II (local input) const int - * On entry, II specifies the row offset where the column to be - * operated on starts with respect to the panel. 
- * - * JJ (local input) const int - * On entry, JJ specifies the column offset where the column to - * be operated on starts with respect to the panel. - * - * WORK (local workspace) double * - * On entry, WORK is a workarray of size at least 4. On exit, - * WORK[0] contains the local maximum absolute value scalar, - * WORK[1] contains the corresponding local row index, WORK[2] - * contains the corresponding global row index, and WORK[3] is - * the coordinate of process owning this max. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - double * A; - int kk, igindx, ilindx, myrow, nb, nprow; -/* .. - * .. Executable Statements .. - */ - if( N > 0 ) - { - A = Mptr( PANEL->A, II, JJ, PANEL->lda ); - myrow = PANEL->grid->myrow; - nprow = PANEL->grid->nprow; - nb = PANEL->nb; - kk = PANEL->ii + II + ( ilindx = HPL_idamax( N, A, 1 ) ); - Mindxl2g( igindx, kk, nb, nb, myrow, 0, nprow ); -/* - * WORK[0] := local maximum absolute value scalar, - * WORK[1] := corresponding local row index, - * WORK[2] := corresponding global row index, - * WORK[3] := coordinate of process owning this max. - */ - WORK[0] = A[ilindx]; WORK[1] = (double)(ilindx); - WORK[2] = (double)(igindx); WORK[3] = (double)(myrow); - } - else - { -/* - * If I do not have any row of A, then set the coordinate of the process - * (WORK[3]) owning this "ghost" row, such that it will never be used, - * even if there are only zeros in the current column of A. - */ - WORK[0] = WORK[1] = WORK[2] = HPL_rzero; - WORK[3] = (double)(PANEL->grid->nprow); - } -/* - * End of HPL_dlocmax - */ -} diff --git a/hpl/src/pfact/HPL_dlocswpN.c b/hpl/src/pfact/HPL_dlocswpN.c deleted file mode 100644 index 741a0aabfb192e0a2127470d40a6fea6a05f0665..0000000000000000000000000000000000000000 --- a/hpl/src/pfact/HPL_dlocswpN.c +++ /dev/null @@ -1,436 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. 
Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" -/* - * Define default value for unrolling factor - */ -#ifndef HPL_LOCSWP_DEPTH -#define HPL_LOCSWP_DEPTH 32 -#define HPL_LOCSWP_LOG2_DEPTH 5 -#endif - -#ifdef STDC_HEADERS -void HPL_dlocswpN -( - HPL_T_panel * PANEL, - const int II, - const int JJ, - double * WORK -) -#else -void HPL_dlocswpN -( PANEL, II, JJ, WORK ) - HPL_T_panel * PANEL; - const int II; - const int JJ; - double * WORK; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_dlocswpN performs the local swapping operations within a panel. - * The lower triangular N0-by-N0 upper block of the panel is stored in - * no-transpose form (i.e. just like the input matrix itself). - * - * Arguments - * ========= - * - * PANEL (local input/output) HPL_T_panel * - * On entry, PANEL points to the data structure containing the - * panel information. - * - * II (local input) const int - * On entry, II specifies the row offset where the column to be - * operated on starts with respect to the panel. - * - * JJ (local input) const int - * On entry, JJ specifies the column offset where the column to - * be operated on starts with respect to the panel. - * - * WORK (local workspace) double * - * On entry, WORK is a workarray of size at least 2 * (4+2*N0). 
- * WORK[0] contains the local maximum absolute value scalar, - * WORK[1] contains the corresponding local row index, WORK[2] - * contains the corresponding global row index, and WORK[3] is - * the coordinate of process owning this max. The N0 length max - * row is stored in WORK[4:4+N0-1]; Note that this is also the - * JJth row (or column) of L1. The remaining part of this array - * is used as workspace. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - double gmax; - double * A1, * A2, * L, * Wr0, * Wmx; - int ilindx, lda, myrow, n0, nr, nu; - register int i; -/* .. - * .. Executable Statements .. - */ - myrow = PANEL->grid->myrow; n0 = PANEL->jb; lda = PANEL->lda; - - Wr0 = ( Wmx = WORK + 4 ) + n0; Wmx[JJ] = gmax = WORK[0]; - nu = (int)( ( (unsigned int)(n0) >> HPL_LOCSWP_LOG2_DEPTH ) - << HPL_LOCSWP_LOG2_DEPTH ); - nr = n0 - nu; -/* - * Replicated swap and copy of the current (new) row of A into L1 - */ - L = Mptr( PANEL->L1, JJ, 0, n0 ); -/* - * If the pivot is non-zero ... - */ - if( gmax != HPL_rzero ) - { -/* - * and if I own the current row of A ... - */ - if( myrow == PANEL->prow ) - { -/* - * and if I also own the row to be swapped with the current row of A ... - */ - if( myrow == (int)(WORK[3]) ) - { -/* - * and if the current row of A is not to be swapped with itself ... - */ - if( ( ilindx = (int)(WORK[1]) ) != 0 ) - { -/* - * then copy the max row into L1 and locally swap the 2 rows of A. 
- */ - A1 = Mptr( PANEL->A, II, 0, lda ); - A2 = Mptr( A1, ilindx, 0, lda ); - - for( i = 0; i < nu; i += HPL_LOCSWP_DEPTH, - Wmx += HPL_LOCSWP_DEPTH, Wr0 += HPL_LOCSWP_DEPTH ) - { - *L=*A1=Wmx[ 0]; *A2=Wr0[ 0]; L+=n0; A1+=lda; A2+=lda; -#if ( HPL_LOCSWP_DEPTH > 1 ) - *L=*A1=Wmx[ 1]; *A2=Wr0[ 1]; L+=n0; A1+=lda; A2+=lda; -#endif -#if ( HPL_LOCSWP_DEPTH > 2 ) - *L=*A1=Wmx[ 2]; *A2=Wr0[ 2]; L+=n0; A1+=lda; A2+=lda; - *L=*A1=Wmx[ 3]; *A2=Wr0[ 3]; L+=n0; A1+=lda; A2+=lda; -#endif -#if ( HPL_LOCSWP_DEPTH > 4 ) - *L=*A1=Wmx[ 4]; *A2=Wr0[ 4]; L+=n0; A1+=lda; A2+=lda; - *L=*A1=Wmx[ 5]; *A2=Wr0[ 5]; L+=n0; A1+=lda; A2+=lda; - *L=*A1=Wmx[ 6]; *A2=Wr0[ 6]; L+=n0; A1+=lda; A2+=lda; - *L=*A1=Wmx[ 7]; *A2=Wr0[ 7]; L+=n0; A1+=lda; A2+=lda; -#endif -#if ( HPL_LOCSWP_DEPTH > 8 ) - *L=*A1=Wmx[ 8]; *A2=Wr0[ 8]; L+=n0; A1+=lda; A2+=lda; - *L=*A1=Wmx[ 9]; *A2=Wr0[ 9]; L+=n0; A1+=lda; A2+=lda; - *L=*A1=Wmx[10]; *A2=Wr0[10]; L+=n0; A1+=lda; A2+=lda; - *L=*A1=Wmx[11]; *A2=Wr0[11]; L+=n0; A1+=lda; A2+=lda; - *L=*A1=Wmx[12]; *A2=Wr0[12]; L+=n0; A1+=lda; A2+=lda; - *L=*A1=Wmx[13]; *A2=Wr0[13]; L+=n0; A1+=lda; A2+=lda; - *L=*A1=Wmx[14]; *A2=Wr0[14]; L+=n0; A1+=lda; A2+=lda; - *L=*A1=Wmx[15]; *A2=Wr0[15]; L+=n0; A1+=lda; A2+=lda; -#endif -#if ( HPL_LOCSWP_DEPTH > 16 ) - *L=*A1=Wmx[16]; *A2=Wr0[16]; L+=n0; A1+=lda; A2+=lda; - *L=*A1=Wmx[17]; *A2=Wr0[17]; L+=n0; A1+=lda; A2+=lda; - *L=*A1=Wmx[18]; *A2=Wr0[18]; L+=n0; A1+=lda; A2+=lda; - *L=*A1=Wmx[19]; *A2=Wr0[19]; L+=n0; A1+=lda; A2+=lda; - *L=*A1=Wmx[20]; *A2=Wr0[20]; L+=n0; A1+=lda; A2+=lda; - *L=*A1=Wmx[21]; *A2=Wr0[21]; L+=n0; A1+=lda; A2+=lda; - *L=*A1=Wmx[22]; *A2=Wr0[22]; L+=n0; A1+=lda; A2+=lda; - *L=*A1=Wmx[23]; *A2=Wr0[23]; L+=n0; A1+=lda; A2+=lda; - *L=*A1=Wmx[24]; *A2=Wr0[24]; L+=n0; A1+=lda; A2+=lda; - *L=*A1=Wmx[25]; *A2=Wr0[25]; L+=n0; A1+=lda; A2+=lda; - *L=*A1=Wmx[26]; *A2=Wr0[26]; L+=n0; A1+=lda; A2+=lda; - *L=*A1=Wmx[27]; *A2=Wr0[27]; L+=n0; A1+=lda; A2+=lda; - *L=*A1=Wmx[28]; *A2=Wr0[28]; L+=n0; A1+=lda; A2+=lda; - 
*L=*A1=Wmx[29]; *A2=Wr0[29]; L+=n0; A1+=lda; A2+=lda; - *L=*A1=Wmx[30]; *A2=Wr0[30]; L+=n0; A1+=lda; A2+=lda; - *L=*A1=Wmx[31]; *A2=Wr0[31]; L+=n0; A1+=lda; A2+=lda; -#endif - } - for( i = 0; i < nr; i++, L += n0, A1 += lda, A2 += lda ) - { *L = *A1 = Wmx[i]; *A2 = Wr0[i]; } - } - else - { -/* - * otherwise the current row of A is swapped with itself, so just copy - * the current row of A into L1. - */ - *Mptr( PANEL->A, II, JJ, lda ) = gmax; - - for( i = 0; i < nu; i += HPL_LOCSWP_DEPTH, - Wmx += HPL_LOCSWP_DEPTH, Wr0 += HPL_LOCSWP_DEPTH ) - { - *L = Wmx[ 0]; L+=n0; -#if ( HPL_LOCSWP_DEPTH > 1 ) - *L = Wmx[ 1]; L+=n0; -#endif -#if ( HPL_LOCSWP_DEPTH > 2 ) - *L = Wmx[ 2]; L+=n0; *L = Wmx[ 3]; L+=n0; -#endif -#if ( HPL_LOCSWP_DEPTH > 4 ) - *L = Wmx[ 4]; L+=n0; *L = Wmx[ 5]; L+=n0; - *L = Wmx[ 6]; L+=n0; *L = Wmx[ 7]; L+=n0; -#endif -#if ( HPL_LOCSWP_DEPTH > 8 ) - *L = Wmx[ 8]; L+=n0; *L = Wmx[ 9]; L+=n0; - *L = Wmx[10]; L+=n0; *L = Wmx[11]; L+=n0; - *L = Wmx[12]; L+=n0; *L = Wmx[13]; L+=n0; - *L = Wmx[14]; L+=n0; *L = Wmx[15]; L+=n0; -#endif -#if ( HPL_LOCSWP_DEPTH > 16 ) - *L = Wmx[16]; L+=n0; *L = Wmx[17]; L+=n0; - *L = Wmx[18]; L+=n0; *L = Wmx[19]; L+=n0; - *L = Wmx[20]; L+=n0; *L = Wmx[21]; L+=n0; - *L = Wmx[22]; L+=n0; *L = Wmx[23]; L+=n0; - *L = Wmx[24]; L+=n0; *L = Wmx[25]; L+=n0; - *L = Wmx[26]; L+=n0; *L = Wmx[27]; L+=n0; - *L = Wmx[28]; L+=n0; *L = Wmx[29]; L+=n0; - *L = Wmx[30]; L+=n0; *L = Wmx[31]; L+=n0; -#endif - } - for( i = 0; i < nr; i++, L += n0 ) { *L = Wmx[i]; } - } - } - else - { -/* - * otherwise, the row to be swapped with the current row of A is in Wmx, - * so copy Wmx into L1 and A. 
- */ - A1 = Mptr( PANEL->A, II, 0, lda ); - - for( i = 0; i < nu; i += HPL_LOCSWP_DEPTH, - Wmx += HPL_LOCSWP_DEPTH ) - { - *L = *A1 = Wmx[ 0]; L += n0; A1 += lda; -#if ( HPL_LOCSWP_DEPTH > 1 ) - *L = *A1 = Wmx[ 1]; L += n0; A1 += lda; -#endif -#if ( HPL_LOCSWP_DEPTH > 2 ) - *L = *A1 = Wmx[ 2]; L += n0; A1 += lda; - *L = *A1 = Wmx[ 3]; L += n0; A1 += lda; -#endif -#if ( HPL_LOCSWP_DEPTH > 4 ) - *L = *A1 = Wmx[ 4]; L += n0; A1 += lda; - *L = *A1 = Wmx[ 5]; L += n0; A1 += lda; - *L = *A1 = Wmx[ 6]; L += n0; A1 += lda; - *L = *A1 = Wmx[ 7]; L += n0; A1 += lda; -#endif -#if ( HPL_LOCSWP_DEPTH > 8 ) - *L = *A1 = Wmx[ 8]; L += n0; A1 += lda; - *L = *A1 = Wmx[ 9]; L += n0; A1 += lda; - *L = *A1 = Wmx[10]; L += n0; A1 += lda; - *L = *A1 = Wmx[11]; L += n0; A1 += lda; - *L = *A1 = Wmx[12]; L += n0; A1 += lda; - *L = *A1 = Wmx[13]; L += n0; A1 += lda; - *L = *A1 = Wmx[14]; L += n0; A1 += lda; - *L = *A1 = Wmx[15]; L += n0; A1 += lda; -#endif -#if ( HPL_LOCSWP_DEPTH > 16 ) - *L = *A1 = Wmx[16]; L += n0; A1 += lda; - *L = *A1 = Wmx[17]; L += n0; A1 += lda; - *L = *A1 = Wmx[18]; L += n0; A1 += lda; - *L = *A1 = Wmx[19]; L += n0; A1 += lda; - *L = *A1 = Wmx[20]; L += n0; A1 += lda; - *L = *A1 = Wmx[21]; L += n0; A1 += lda; - *L = *A1 = Wmx[22]; L += n0; A1 += lda; - *L = *A1 = Wmx[23]; L += n0; A1 += lda; - *L = *A1 = Wmx[24]; L += n0; A1 += lda; - *L = *A1 = Wmx[25]; L += n0; A1 += lda; - *L = *A1 = Wmx[26]; L += n0; A1 += lda; - *L = *A1 = Wmx[27]; L += n0; A1 += lda; - *L = *A1 = Wmx[28]; L += n0; A1 += lda; - *L = *A1 = Wmx[29]; L += n0; A1 += lda; - *L = *A1 = Wmx[30]; L += n0; A1 += lda; - *L = *A1 = Wmx[31]; L += n0; A1 += lda; -#endif - } - - for( i = 0; i < nr; i++, L += n0, A1 += lda ) - { *L = *A1 = Wmx[i]; } - } - } - else - { -/* - * otherwise I do not own the current row of A, so copy the max row Wmx - * into L1. 
- */ - for( i = 0; i < nu; i += HPL_LOCSWP_DEPTH, - Wmx += HPL_LOCSWP_DEPTH ) - { - *L = Wmx[ 0]; L+=n0; -#if ( HPL_LOCSWP_DEPTH > 1 ) - *L = Wmx[ 1]; L+=n0; -#endif -#if ( HPL_LOCSWP_DEPTH > 2 ) - *L = Wmx[ 2]; L+=n0; *L = Wmx[ 3]; L+=n0; -#endif -#if ( HPL_LOCSWP_DEPTH > 4 ) - *L = Wmx[ 4]; L+=n0; *L = Wmx[ 5]; L+=n0; - *L = Wmx[ 6]; L+=n0; *L = Wmx[ 7]; L+=n0; -#endif -#if ( HPL_LOCSWP_DEPTH > 8 ) - *L = Wmx[ 8]; L+=n0; *L = Wmx[ 9]; L+=n0; - *L = Wmx[10]; L+=n0; *L = Wmx[11]; L+=n0; - *L = Wmx[12]; L+=n0; *L = Wmx[13]; L+=n0; - *L = Wmx[14]; L+=n0; *L = Wmx[15]; L+=n0; -#endif -#if ( HPL_LOCSWP_DEPTH > 16 ) - *L = Wmx[16]; L+=n0; *L = Wmx[17]; L+=n0; - *L = Wmx[18]; L+=n0; *L = Wmx[19]; L+=n0; - *L = Wmx[20]; L+=n0; *L = Wmx[21]; L+=n0; - *L = Wmx[22]; L+=n0; *L = Wmx[23]; L+=n0; - *L = Wmx[24]; L+=n0; *L = Wmx[25]; L+=n0; - *L = Wmx[26]; L+=n0; *L = Wmx[27]; L+=n0; - *L = Wmx[28]; L+=n0; *L = Wmx[29]; L+=n0; - *L = Wmx[30]; L+=n0; *L = Wmx[31]; L+=n0; -#endif - } - for( i = 0; i < nr; i++, L += n0 ) { *L = Wmx[i]; } -/* - * and if I own the max row, overwrite it with the current row Wr0. 
- */ - if( myrow == (int)(WORK[3]) ) - { - A2 = Mptr( PANEL->A, II + (size_t)(WORK[1]), 0, lda ); - - for( i = 0; i < nu; i += HPL_LOCSWP_DEPTH, - Wr0 += HPL_LOCSWP_DEPTH ) - { - *A2 = Wr0[ 0]; A2+=lda; -#if ( HPL_LOCSWP_DEPTH > 1 ) - *A2 = Wr0[ 1]; A2+=lda; -#endif -#if ( HPL_LOCSWP_DEPTH > 2 ) - *A2 = Wr0[ 2]; A2+=lda; *A2 = Wr0[ 3]; A2+=lda; -#endif -#if ( HPL_LOCSWP_DEPTH > 4 ) - *A2 = Wr0[ 4]; A2+=lda; *A2 = Wr0[ 5]; A2+=lda; - *A2 = Wr0[ 6]; A2+=lda; *A2 = Wr0[ 7]; A2+=lda; -#endif -#if ( HPL_LOCSWP_DEPTH > 8 ) - *A2 = Wr0[ 8]; A2+=lda; *A2 = Wr0[ 9]; A2+=lda; - *A2 = Wr0[10]; A2+=lda; *A2 = Wr0[11]; A2+=lda; - *A2 = Wr0[12]; A2+=lda; *A2 = Wr0[13]; A2+=lda; - *A2 = Wr0[14]; A2+=lda; *A2 = Wr0[15]; A2+=lda; -#endif -#if ( HPL_LOCSWP_DEPTH > 16 ) - *A2 = Wr0[16]; A2+=lda; *A2 = Wr0[17]; A2+=lda; - *A2 = Wr0[18]; A2+=lda; *A2 = Wr0[19]; A2+=lda; - *A2 = Wr0[20]; A2+=lda; *A2 = Wr0[21]; A2+=lda; - *A2 = Wr0[22]; A2+=lda; *A2 = Wr0[23]; A2+=lda; - *A2 = Wr0[24]; A2+=lda; *A2 = Wr0[25]; A2+=lda; - *A2 = Wr0[26]; A2+=lda; *A2 = Wr0[27]; A2+=lda; - *A2 = Wr0[28]; A2+=lda; *A2 = Wr0[29]; A2+=lda; - *A2 = Wr0[30]; A2+=lda; *A2 = Wr0[31]; A2+=lda; -#endif - } - - for( i = 0; i < nr; i++, A2 += lda ) { *A2 = Wr0[i]; } - } - } - } - else - { -/* - * Otherwise the max element in the current column is zero, simply copy - * the current row Wr0 into L1. The matrix is singular. 
- */ - for( i = 0; i < nu; i += HPL_LOCSWP_DEPTH, - Wr0 += HPL_LOCSWP_DEPTH ) - { - *L = Wr0[ 0]; L+=n0; -#if ( HPL_LOCSWP_DEPTH > 1 ) - *L = Wr0[ 1]; L+=n0; -#endif -#if ( HPL_LOCSWP_DEPTH > 2 ) - *L = Wr0[ 2]; L+=n0; *L = Wr0[ 3]; L+=n0; -#endif -#if ( HPL_LOCSWP_DEPTH > 4 ) - *L = Wr0[ 4]; L+=n0; *L = Wr0[ 5]; L+=n0; - *L = Wr0[ 6]; L+=n0; *L = Wr0[ 7]; L+=n0; -#endif -#if ( HPL_LOCSWP_DEPTH > 8 ) - *L = Wr0[ 8]; L+=n0; *L = Wr0[ 9]; L+=n0; - *L = Wr0[10]; L+=n0; *L = Wr0[11]; L+=n0; - *L = Wr0[12]; L+=n0; *L = Wr0[13]; L+=n0; - *L = Wr0[14]; L+=n0; *L = Wr0[15]; L+=n0; -#endif -#if ( HPL_LOCSWP_DEPTH > 16 ) - *L = Wr0[16]; L+=n0; *L = Wr0[17]; L+=n0; - *L = Wr0[18]; L+=n0; *L = Wr0[19]; L+=n0; - *L = Wr0[20]; L+=n0; *L = Wr0[21]; L+=n0; - *L = Wr0[22]; L+=n0; *L = Wr0[23]; L+=n0; - *L = Wr0[24]; L+=n0; *L = Wr0[25]; L+=n0; - *L = Wr0[26]; L+=n0; *L = Wr0[27]; L+=n0; - *L = Wr0[28]; L+=n0; *L = Wr0[29]; L+=n0; - *L = Wr0[30]; L+=n0; *L = Wr0[31]; L+=n0; -#endif - } - - for( i = 0; i < nr; i++, L += n0 ) { *L = Wr0[i]; } -/* - * set INFO. - */ - if( *(PANEL->DINFO) == 0.0 ) - *(PANEL->DINFO) = (double)(PANEL->ia + JJ + 1); - } -/* - * End of HPL_dlocswpN - */ -} diff --git a/hpl/src/pfact/HPL_dlocswpT.c b/hpl/src/pfact/HPL_dlocswpT.c deleted file mode 100644 index 62fb7f956b38115b02f5d65c6879bcba5344dc81..0000000000000000000000000000000000000000 --- a/hpl/src/pfact/HPL_dlocswpT.c +++ /dev/null @@ -1,406 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. 
Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
- * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" -/* - * Define default value for unrolling factor - */ -#ifndef HPL_LOCSWP_DEPTH -#define HPL_LOCSWP_DEPTH 32 -#define HPL_LOCSWP_LOG2_DEPTH 5 -#endif - -#ifdef STDC_HEADERS -void HPL_dlocswpT -( - HPL_T_panel * PANEL, - const int II, - const int JJ, - double * WORK -) -#else -void HPL_dlocswpT -( PANEL, II, JJ, WORK ) - HPL_T_panel * PANEL; - const int II; - const int JJ; - double * WORK; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_dlocswpT performs the local swapping operations within a panel. - * The lower triangular N0-by-N0 upper block of the panel is stored in - * transpose form. - * - * Arguments - * ========= - * - * PANEL (local input/output) HPL_T_panel * - * On entry, PANEL points to the data structure containing the - * panel information. - * - * II (local input) const int - * On entry, II specifies the row offset where the column to be - * operated on starts with respect to the panel. - * - * JJ (local input) const int - * On entry, JJ specifies the column offset where the column to - * be operated on starts with respect to the panel. - * - * WORK (local workspace) double * - * On entry, WORK is a workarray of size at least 2 * (4+2*N0). - * WORK[0] contains the local maximum absolute value scalar, - * WORK[1] contains the corresponding local row index, WORK[2] - * contains the corresponding global row index, and WORK[3] is - * the coordinate of process owning this max. The N0 length max - * row is stored in WORK[4:4+N0-1]; Note that this is also the - * JJth row (or column) of L1. The remaining part of this array - * is used as workspace. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - double gmax; - double * A1, * A2, * L, * Wr0, * Wmx; - int ilindx, lda, myrow, n0, nr, nu; - register int i; -/* .. - * .. Executable Statements .. 
- */ - myrow = PANEL->grid->myrow; n0 = PANEL->jb; lda = PANEL->lda; - - Wr0 = ( Wmx = WORK + 4 ) + n0; Wmx[JJ] = gmax = WORK[0]; - nu = (int)( ( (unsigned int)(n0) >> HPL_LOCSWP_LOG2_DEPTH ) - << HPL_LOCSWP_LOG2_DEPTH ); - nr = n0 - nu; -/* - * Replicated swap and copy of the current (new) row of A into L1 - */ - L = Mptr( PANEL->L1, 0, JJ, n0 ); -/* - * If the pivot is non-zero ... - */ - if( gmax != HPL_rzero ) - { -/* - * and if I own the current row of A ... - */ - if( myrow == PANEL->prow ) - { -/* - * and if I also own the row to be swapped with the current row of A ... - */ - if( myrow == (int)(WORK[3]) ) - { -/* - * and if the current row of A is not to swapped with itself ... - */ - if( ( ilindx = (int)(WORK[1]) ) != 0 ) - { -/* - * then copy the max row into L1 and locally swap the 2 rows of A. - */ - A1 = Mptr( PANEL->A, II, 0, lda ); - A2 = Mptr( A1, ilindx, 0, lda ); - - for( i = 0; i < nu; i += HPL_LOCSWP_DEPTH, - Wmx += HPL_LOCSWP_DEPTH, Wr0 += HPL_LOCSWP_DEPTH, - L += HPL_LOCSWP_DEPTH ) - { - L[ 0]=*A1=Wmx[ 0]; *A2=Wr0[ 0]; A1+=lda; A2+=lda; -#if ( HPL_LOCSWP_DEPTH > 1 ) - L[ 1]=*A1=Wmx[ 1]; *A2=Wr0[ 1]; A1+=lda; A2+=lda; -#endif -#if ( HPL_LOCSWP_DEPTH > 2 ) - L[ 2]=*A1=Wmx[ 2]; *A2=Wr0[ 2]; A1+=lda; A2+=lda; - L[ 3]=*A1=Wmx[ 3]; *A2=Wr0[ 3]; A1+=lda; A2+=lda; -#endif -#if ( HPL_LOCSWP_DEPTH > 4 ) - L[ 4]=*A1=Wmx[ 4]; *A2=Wr0[ 4]; A1+=lda; A2+=lda; - L[ 5]=*A1=Wmx[ 5]; *A2=Wr0[ 5]; A1+=lda; A2+=lda; - L[ 6]=*A1=Wmx[ 6]; *A2=Wr0[ 6]; A1+=lda; A2+=lda; - L[ 7]=*A1=Wmx[ 7]; *A2=Wr0[ 7]; A1+=lda; A2+=lda; -#endif -#if ( HPL_LOCSWP_DEPTH > 8 ) - L[ 8]=*A1=Wmx[ 8]; *A2=Wr0[ 8]; A1+=lda; A2+=lda; - L[ 9]=*A1=Wmx[ 9]; *A2=Wr0[ 9]; A1+=lda; A2+=lda; - L[10]=*A1=Wmx[10]; *A2=Wr0[10]; A1+=lda; A2+=lda; - L[11]=*A1=Wmx[11]; *A2=Wr0[11]; A1+=lda; A2+=lda; - L[12]=*A1=Wmx[12]; *A2=Wr0[12]; A1+=lda; A2+=lda; - L[13]=*A1=Wmx[13]; *A2=Wr0[13]; A1+=lda; A2+=lda; - L[14]=*A1=Wmx[14]; *A2=Wr0[14]; A1+=lda; A2+=lda; - L[15]=*A1=Wmx[15]; *A2=Wr0[15]; A1+=lda; A2+=lda; 
-#endif -#if ( HPL_LOCSWP_DEPTH > 16 ) - L[16]=*A1=Wmx[16]; *A2=Wr0[16]; A1+=lda; A2+=lda; - L[17]=*A1=Wmx[17]; *A2=Wr0[17]; A1+=lda; A2+=lda; - L[18]=*A1=Wmx[18]; *A2=Wr0[18]; A1+=lda; A2+=lda; - L[19]=*A1=Wmx[19]; *A2=Wr0[19]; A1+=lda; A2+=lda; - L[20]=*A1=Wmx[20]; *A2=Wr0[20]; A1+=lda; A2+=lda; - L[21]=*A1=Wmx[21]; *A2=Wr0[21]; A1+=lda; A2+=lda; - L[22]=*A1=Wmx[22]; *A2=Wr0[22]; A1+=lda; A2+=lda; - L[23]=*A1=Wmx[23]; *A2=Wr0[23]; A1+=lda; A2+=lda; - L[24]=*A1=Wmx[24]; *A2=Wr0[24]; A1+=lda; A2+=lda; - L[25]=*A1=Wmx[25]; *A2=Wr0[25]; A1+=lda; A2+=lda; - L[26]=*A1=Wmx[26]; *A2=Wr0[26]; A1+=lda; A2+=lda; - L[27]=*A1=Wmx[27]; *A2=Wr0[27]; A1+=lda; A2+=lda; - L[28]=*A1=Wmx[28]; *A2=Wr0[28]; A1+=lda; A2+=lda; - L[29]=*A1=Wmx[29]; *A2=Wr0[29]; A1+=lda; A2+=lda; - L[30]=*A1=Wmx[30]; *A2=Wr0[30]; A1+=lda; A2+=lda; - L[31]=*A1=Wmx[31]; *A2=Wr0[31]; A1+=lda; A2+=lda; -#endif - } - - for( i = 0; i < nr; i++, A1 += lda, A2 += lda ) - { L[i] = *A1 = Wmx[i]; *A2 = Wr0[i]; } - } - else - { -/* - * otherwise the current row of A is swapped with itself, so just copy - * the current of A into L1. 
- */ - *Mptr( PANEL->A, II, JJ, lda ) = gmax; - - for( i = 0; i < nu; i += HPL_LOCSWP_DEPTH, - Wmx += HPL_LOCSWP_DEPTH, L += HPL_LOCSWP_DEPTH ) - { - L[ 0]=Wmx[ 0]; -#if ( HPL_LOCSWP_DEPTH > 1 ) - L[ 1]=Wmx[ 1]; -#endif -#if ( HPL_LOCSWP_DEPTH > 2 ) - L[ 2]=Wmx[ 2]; L[ 3]=Wmx[ 3]; -#endif -#if ( HPL_LOCSWP_DEPTH > 4 ) - L[ 4]=Wmx[ 4]; L[ 5]=Wmx[ 5]; - L[ 6]=Wmx[ 6]; L[ 7]=Wmx[ 7]; -#endif -#if ( HPL_LOCSWP_DEPTH > 8 ) - L[ 8]=Wmx[ 8]; L[12]=Wmx[12]; - L[ 9]=Wmx[ 9]; L[13]=Wmx[13]; - L[10]=Wmx[10]; L[14]=Wmx[14]; - L[11]=Wmx[11]; L[15]=Wmx[15]; -#endif -#if ( HPL_LOCSWP_DEPTH > 16 ) - L[16]=Wmx[16]; L[20]=Wmx[20]; - L[17]=Wmx[17]; L[21]=Wmx[21]; - L[18]=Wmx[18]; L[22]=Wmx[22]; - L[19]=Wmx[19]; L[23]=Wmx[23]; - L[24]=Wmx[24]; L[28]=Wmx[28]; - L[25]=Wmx[25]; L[29]=Wmx[29]; - L[26]=Wmx[26]; L[30]=Wmx[30]; - L[27]=Wmx[27]; L[31]=Wmx[31]; -#endif - } - for( i = 0; i < nr; i++ ) { L[i] = Wmx[i]; } - } - } - else - { -/* - * otherwise, the row to be swapped with the current row of A is in Wmx, - * so copy Wmx into L1 and A. 
- */ - A1 = Mptr( PANEL->A, II, 0, lda ); - - for( i = 0; i < nu; i += HPL_LOCSWP_DEPTH, - Wmx += HPL_LOCSWP_DEPTH, L += HPL_LOCSWP_DEPTH ) - { - L[ 0]=*A1=Wmx[ 0]; A1+=lda; -#if ( HPL_LOCSWP_DEPTH > 1 ) - L[ 1]=*A1=Wmx[ 1]; A1+=lda; -#endif -#if ( HPL_LOCSWP_DEPTH > 2 ) - L[ 2]=*A1=Wmx[ 2]; A1+=lda; L[ 3]=*A1=Wmx[ 3]; A1+=lda; -#endif -#if ( HPL_LOCSWP_DEPTH > 4 ) - L[ 4]=*A1=Wmx[ 4]; A1+=lda; L[ 5]=*A1=Wmx[ 5]; A1+=lda; - L[ 6]=*A1=Wmx[ 6]; A1+=lda; L[ 7]=*A1=Wmx[ 7]; A1+=lda; -#endif -#if ( HPL_LOCSWP_DEPTH > 8 ) - L[ 8]=*A1=Wmx[ 8]; A1+=lda; L[ 9]=*A1=Wmx[ 9]; A1+=lda; - L[10]=*A1=Wmx[10]; A1+=lda; L[11]=*A1=Wmx[11]; A1+=lda; - L[12]=*A1=Wmx[12]; A1+=lda; L[13]=*A1=Wmx[13]; A1+=lda; - L[14]=*A1=Wmx[14]; A1+=lda; L[15]=*A1=Wmx[15]; A1+=lda; -#endif -#if ( HPL_LOCSWP_DEPTH > 16 ) - L[16]=*A1=Wmx[16]; A1+=lda; L[17]=*A1=Wmx[17]; A1+=lda; - L[18]=*A1=Wmx[18]; A1+=lda; L[19]=*A1=Wmx[19]; A1+=lda; - L[20]=*A1=Wmx[20]; A1+=lda; L[21]=*A1=Wmx[21]; A1+=lda; - L[22]=*A1=Wmx[22]; A1+=lda; L[23]=*A1=Wmx[23]; A1+=lda; - L[24]=*A1=Wmx[24]; A1+=lda; L[25]=*A1=Wmx[25]; A1+=lda; - L[26]=*A1=Wmx[26]; A1+=lda; L[27]=*A1=Wmx[27]; A1+=lda; - L[28]=*A1=Wmx[28]; A1+=lda; L[29]=*A1=Wmx[29]; A1+=lda; - L[30]=*A1=Wmx[30]; A1+=lda; L[31]=*A1=Wmx[31]; A1+=lda; -#endif - } - - for( i = 0; i < nr; i++, A1 += lda ) { L[i]=*A1=Wmx[i]; } - } - } - else - { -/* - * otherwise I do not own the current row of A, so copy the max row Wmx - * into L1. 
- */ - for( i = 0; i < nu; i += HPL_LOCSWP_DEPTH, - Wmx += HPL_LOCSWP_DEPTH, L += HPL_LOCSWP_DEPTH ) - { - L[ 0]=Wmx[ 0]; -#if ( HPL_LOCSWP_DEPTH > 1 ) - L[ 1]=Wmx[ 1]; -#endif -#if ( HPL_LOCSWP_DEPTH > 2 ) - L[ 2]=Wmx[ 2]; L[ 3]=Wmx[ 3]; -#endif -#if ( HPL_LOCSWP_DEPTH > 4 ) - L[ 4]=Wmx[ 4]; L[ 5]=Wmx[ 5]; L[ 6]=Wmx[ 6]; L[ 7]=Wmx[ 7]; -#endif -#if ( HPL_LOCSWP_DEPTH > 8 ) - L[ 8]=Wmx[ 8]; L[ 9]=Wmx[ 9]; L[10]=Wmx[10]; L[11]=Wmx[11]; - L[12]=Wmx[12]; L[13]=Wmx[13]; L[14]=Wmx[14]; L[15]=Wmx[15]; -#endif -#if ( HPL_LOCSWP_DEPTH > 16 ) - L[16]=Wmx[16]; L[17]=Wmx[17]; L[18]=Wmx[18]; L[19]=Wmx[19]; - L[20]=Wmx[20]; L[21]=Wmx[21]; L[22]=Wmx[22]; L[23]=Wmx[23]; - L[24]=Wmx[24]; L[25]=Wmx[25]; L[26]=Wmx[26]; L[27]=Wmx[27]; - L[28]=Wmx[28]; L[29]=Wmx[29]; L[30]=Wmx[30]; L[31]=Wmx[31]; -#endif - } - for( i = 0; i < nr; i++ ) { L[i] = Wmx[i]; } -/* - * and if I own the max row, overwrite it with the current row Wr0. - */ - if( myrow == (int)(WORK[3]) ) - { - A2 = Mptr( PANEL->A, II + (size_t)(WORK[1]), 0, lda ); - - for( i = 0; i < nu; i += HPL_LOCSWP_DEPTH, - Wr0 += HPL_LOCSWP_DEPTH ) - { - *A2 = Wr0[ 0]; A2+=lda; -#if ( HPL_LOCSWP_DEPTH > 1 ) - *A2 = Wr0[ 1]; A2+=lda; -#endif -#if ( HPL_LOCSWP_DEPTH > 2 ) - *A2 = Wr0[ 2]; A2+=lda; *A2 = Wr0[ 3]; A2+=lda; -#endif -#if ( HPL_LOCSWP_DEPTH > 4 ) - *A2 = Wr0[ 4]; A2+=lda; *A2 = Wr0[ 5]; A2+=lda; - *A2 = Wr0[ 6]; A2+=lda; *A2 = Wr0[ 7]; A2+=lda; -#endif -#if ( HPL_LOCSWP_DEPTH > 8 ) - *A2 = Wr0[ 8]; A2+=lda; *A2 = Wr0[ 9]; A2+=lda; - *A2 = Wr0[10]; A2+=lda; *A2 = Wr0[11]; A2+=lda; - *A2 = Wr0[12]; A2+=lda; *A2 = Wr0[13]; A2+=lda; - *A2 = Wr0[14]; A2+=lda; *A2 = Wr0[15]; A2+=lda; -#endif -#if ( HPL_LOCSWP_DEPTH > 16 ) - *A2 = Wr0[16]; A2+=lda; *A2 = Wr0[17]; A2+=lda; - *A2 = Wr0[18]; A2+=lda; *A2 = Wr0[19]; A2+=lda; - *A2 = Wr0[20]; A2+=lda; *A2 = Wr0[21]; A2+=lda; - *A2 = Wr0[22]; A2+=lda; *A2 = Wr0[23]; A2+=lda; - *A2 = Wr0[24]; A2+=lda; *A2 = Wr0[25]; A2+=lda; - *A2 = Wr0[26]; A2+=lda; *A2 = Wr0[27]; A2+=lda; - *A2 = Wr0[28]; 
A2+=lda; *A2 = Wr0[29]; A2+=lda; - *A2 = Wr0[30]; A2+=lda; *A2 = Wr0[31]; A2+=lda; -#endif - } - for( i = 0; i < nr; i++, A2 += lda ) { *A2 = Wr0[i]; } - } - } - } - else - { -/* - * Otherwise the max element in the current column is zero, simply copy - * the current row Wr0 into L1. The matrix is singular. - */ - for( i = 0; i < nu; i += HPL_LOCSWP_DEPTH, - Wr0 += HPL_LOCSWP_DEPTH, L += HPL_LOCSWP_DEPTH ) - { - L[ 0]=Wr0[ 0]; -#if ( HPL_LOCSWP_DEPTH > 1 ) - L[ 1]=Wr0[ 1]; -#endif -#if ( HPL_LOCSWP_DEPTH > 2 ) - L[ 2]=Wr0[ 2]; L[ 3]=Wr0[ 3]; -#endif -#if ( HPL_LOCSWP_DEPTH > 4 ) - L[ 4]=Wr0[ 4]; L[ 5]=Wr0[ 5]; L[ 6]=Wr0[ 6]; L[ 7]=Wr0[ 7]; -#endif -#if ( HPL_LOCSWP_DEPTH > 8 ) - L[ 8]=Wr0[ 8]; L[12]=Wr0[12]; L[ 9]=Wr0[ 9]; L[13]=Wr0[13]; - L[10]=Wr0[10]; L[14]=Wr0[14]; L[11]=Wr0[11]; L[15]=Wr0[15]; -#endif -#if ( HPL_LOCSWP_DEPTH > 16 ) - L[16]=Wr0[16]; L[20]=Wr0[20]; L[17]=Wr0[17]; L[21]=Wr0[21]; - L[18]=Wr0[18]; L[22]=Wr0[22]; L[19]=Wr0[19]; L[23]=Wr0[23]; - L[24]=Wr0[24]; L[28]=Wr0[28]; L[25]=Wr0[25]; L[29]=Wr0[29]; - L[26]=Wr0[26]; L[30]=Wr0[30]; L[27]=Wr0[27]; L[31]=Wr0[31]; -#endif - } - for( i = 0; i < nr; i++ ) { L[i] = Wr0[i]; } -/* - * Set INFO. - */ - if( *(PANEL->DINFO) == 0.0 ) - *(PANEL->DINFO) = (double)(PANEL->ia + JJ + 1); - } -/* - * End of HPL_dlocswpT - */ -} diff --git a/hpl/src/pfact/HPL_pdfact.c b/hpl/src/pfact/HPL_pdfact.c deleted file mode 100644 index 135013b18d34bc81e103e173b9c184b7fca63479..0000000000000000000000000000000000000000 --- a/hpl/src/pfact/HPL_pdfact.c +++ /dev/null @@ -1,141 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. 
Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -void HPL_pdfact -( - HPL_T_panel * PANEL -) -#else -void HPL_pdfact -( PANEL ) - HPL_T_panel * PANEL; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_pdfact recursively factorizes a 1-dimensional panel of columns. - * The RPFACT function pointer specifies the recursive algorithm to be - * used, either Crout, Left- or Right looking. NBMIN allows to vary the - * recursive stopping criterium in terms of the number of columns in the - * panel, and NDIV allows to specify the number of subpanels each panel - * should be divided into. Usuallly a value of 2 will be chosen. Finally - * PFACT is a function pointer specifying the non-recursive algorithm to - * to be used on at most NBMIN columns. One can also choose here between - * Crout, Left- or Right looking. Empirical tests seem to indicate that - * values of 4 or 8 for NBMIN give the best results. - * - * Bi-directional exchange is used to perform the swap::broadcast - * operations at once for one column in the panel. This results in a - * lower number of slightly larger messages than usual. 
On P processes - * and assuming bi-directional links, the running time of this function - * can be approximated by (when N is equal to N0): - * - * N0 * log_2( P ) * ( lat + ( 2*N0 + 4 ) / bdwth ) + - * N0^2 * ( M - N0/3 ) * gam2-3 - * - * where M is the local number of rows of the panel, lat and bdwth are - * the latency and bandwidth of the network for double precision real - * words, and gam2-3 is an estimate of the Level 2 and Level 3 BLAS - * rate of execution. The recursive algorithm allows indeed to almost - * achieve Level 3 BLAS performance in the panel factorization. On a - * large number of modern machines, this operation is however latency - * bound, meaning that its cost can be estimated by only the latency - * portion N0 * log_2(P) * lat. Mono-directional links will double this - * communication cost. - * - * Arguments - * ========= - * - * PANEL (local input/output) HPL_T_panel * - * On entry, PANEL points to the data structure containing the - * panel information. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - void * vptr = NULL; - int align, jb; -/* .. - * .. Executable Statements .. 
- */ - jb = PANEL->jb; PANEL->n -= jb; PANEL->ja += jb; - - if( ( PANEL->grid->mycol != PANEL->pcol ) || ( jb <= 0 ) ) return; -#ifdef HPL_DETAILED_TIMING - HPL_ptimer( HPL_TIMING_RPFACT ); -#endif - align = PANEL->algo->align; - vptr = (void *)malloc( ( (size_t)(align) + - (size_t)(((4+((unsigned int)(jb) << 1)) << 1) )) * - sizeof(double) ); - if( vptr == NULL ) - { HPL_pabort( __LINE__, "HPL_pdfact", "Memory allocation failed" ); } -/* - * Factor the panel - Update the panel pointers - */ - PANEL->algo->rffun( PANEL, PANEL->mp, jb, 0, (double *)HPL_PTR( vptr, - ((size_t)(align) * sizeof(double) ) ) ); - if( vptr ) free( vptr ); - - PANEL->A = Mptr( PANEL->A, 0, jb, PANEL->lda ); - PANEL->nq -= jb; PANEL->jj += jb; -#ifdef HPL_DETAILED_TIMING - HPL_ptimer( HPL_TIMING_RPFACT ); -#endif -/* - * End of HPL_pdfact - */ -} diff --git a/hpl/src/pfact/HPL_pdmxswp.c b/hpl/src/pfact/HPL_pdmxswp.c deleted file mode 100644 index 3d2aae6dc336162fb523b06f0cab4cf0124d31bd..0000000000000000000000000000000000000000 --- a/hpl/src/pfact/HPL_pdmxswp.c +++ /dev/null @@ -1,311 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. 
All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -void HPL_pdmxswp -( - HPL_T_panel * PANEL, - const int M, - const int II, - const int JJ, - double * WORK -) -#else -void HPL_pdmxswp -( PANEL, M, II, JJ, WORK ) - HPL_T_panel * PANEL; - const int M; - const int II; - const int JJ; - double * WORK; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_pdmxswp swaps and broadcasts the absolute value max row using - * bi-directional exchange. The buffer is partially set by HPL_dlocmax. - * - * Bi-directional exchange is used to perform the swap::broadcast - * operations at once for one column in the panel. 
This results in a - * lower number of slightly larger messages than usual. On P processes - * and assuming bi-directional links, the running time of this function - * can be approximated by - * - * log_2( P ) * ( lat + ( 2 * N0 + 4 ) / bdwth ) - * - * where lat and bdwth are the latency and bandwidth of the network for - * double precision real elements. Communication only occurs in one - * process column. Mono-directional links will cause the communication - * cost to double. - * - * Arguments - * ========= - * - * PANEL (local input/output) HPL_T_panel * - * On entry, PANEL points to the data structure containing the - * panel information. - * - * M (local input) const int - * On entry, M specifies the local number of rows of the matrix - * column on which this function operates. - * - * II (local input) const int - * On entry, II specifies the row offset where the column to be - * operated on starts with respect to the panel. - * - * JJ (local input) const int - * On entry, JJ specifies the column offset where the column to - * be operated on starts with respect to the panel. - * - * WORK (local workspace) double * - * On entry, WORK is a workarray of size at least 2 * (4+2*N0). - * It is assumed that HPL_dlocmax was called prior to this - * routine to initialize the first four entries of this array. - * On exit, the N0 length max row is stored in WORK[4:4+N0-1]; - * Note that this is also the JJth row (or column) of L1. The - * remaining part is used as a temporary array. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - double gmax, tmp1; - double * A0, * Wmx, * Wwork; - HPL_T_grid * grid; - MPI_Comm comm; - unsigned int hdim, ip2, ip2_, ipow, k, mask; - int Np2, cnt_, cnt0, i, icurrow, lda, mydist, - mydis_, myrow, n0, nprow, partner, rcnt, - root, scnt, size_; -/* .. - * .. Executable Statements .. 
- */ -#ifdef HPL_DETAILED_TIMING - HPL_ptimer( HPL_TIMING_MXSWP ); -#endif - grid = PANEL->grid; myrow = grid->myrow; nprow = grid->nprow; -/* - * ip2 : the smallest power of two less than or equal to nprow; - * hdim : dimension of the hypercube made of those ip2 processes; - * Np2 : logical flag indicating whether or not nprow is a power of 2; - */ - comm = grid->col_comm; ip2 = (unsigned int)(grid->row_ip2); - hdim = (unsigned int)(grid->row_hdim); n0 = PANEL->jb; - icurrow = PANEL->prow; Np2 = (int)( ( size_ = nprow - ip2 ) != 0 ); - mydist = MModSub( myrow, icurrow, nprow ); -/* - * Set up pointers in workspace: WORK and Wwork point to the beginning - * of the buffers of size 4 + 2*N0 to be combined. Wmx points to the row - * owning the local (before combine) and global (after combine) absolute - * value max. A0 points to the copy of the current row of the matrix. - */ - cnt0 = ( cnt_ = n0 + 4 ) + n0; A0 = ( Wmx = WORK + 4 ) + n0; - Wwork = WORK + cnt0; -/* - * Wmx[0:N0-1] := A[ilindx,0:N0-1] where ilindx is (int)(WORK[1]) (row - * with max in current column). If I am the current process row, pack in - * addition the current row of A in A0[0:N0-1]. If I do not own any row - * of A, then zero out Wmx[0:N0-1]. - */ - if( M > 0 ) - { - lda = PANEL->lda; - HPL_dcopy( n0, Mptr( PANEL->A, II+(int)(WORK[1]), 0, lda ), lda, - Wmx, 1 ); - if( myrow == icurrow ) - { HPL_dcopy( n0, Mptr( PANEL->A, II, 0, lda ), lda, A0, 1 ); } - } - else { for( i = 0; i < n0; i++ ) Wmx[i] = HPL_rzero; } -/* - * Combine the results (bi-directional exchange): the process coordina- - * tes are relative to icurrow, this allows to reduce the communication - * volume when nprow is not a power of 2. - * - * When nprow is not a power of 2: proc[i-ip2] receives local data from - * proc[i] for all i in [ip2..nprow). In addition, proc[0] (icurrow) - * sends to proc[ip2] the current row of A for later broadcast in procs - * [ip2..nprow). 
- */ - if( ( Np2 != 0 ) && - ( ( partner = (int)((unsigned int)(mydist) ^ ip2 ) ) < nprow ) ) - { - if( ( mydist & ip2 ) != 0 ) - { - if( mydist == (int)(ip2) ) - (void) HPL_sdrv( WORK, cnt_, MSGID_BEGIN_PFACT, A0, n0, - MSGID_BEGIN_PFACT, MModAdd( partner, - icurrow, nprow ), comm ); - else - (void) HPL_send( WORK, cnt_, MModAdd( partner, icurrow, - nprow ), MSGID_BEGIN_PFACT, comm ); - } - else - { - if( mydist == 0 ) - (void) HPL_sdrv( A0, n0, MSGID_BEGIN_PFACT, Wwork, cnt_, - MSGID_BEGIN_PFACT, MModAdd( partner, - icurrow, nprow ), comm ); - else - (void) HPL_recv( Wwork, cnt_, MModAdd( partner, icurrow, - nprow ), MSGID_BEGIN_PFACT, comm ); - - tmp1 = Mabs( Wwork[0] ); gmax = Mabs( WORK[0] ); - if( ( tmp1 > gmax ) || - ( ( tmp1 == gmax ) && ( Wwork[3] < WORK[3] ) ) ) - { HPL_dcopy( cnt_, Wwork, 1, WORK, 1 ); } - } - } - - if( mydist < (int)(ip2) ) - { -/* - * power of 2 part of the processes collection: processes [0..ip2) are - * combining (binary exchange); proc[0] has two rows to send, but one to - * receive. At every step k in [0..hdim) of the algorithm, a process - * pair exchanging 2 rows is such that myrow >> k+1 is 0. Among those - * processes the ones that are sending one more row than what they are - * receiving are such that myrow >> k is equal to 0. - */ - k = 0; ipow = 1; - - while( k < hdim ) - { - if( ( (unsigned int)(mydist) >> ( k + 1 ) ) == 0 ) - { - if( ( (unsigned int)(mydist) >> k ) == 0 ) - { scnt = cnt0; rcnt = cnt_; } - else - { scnt = cnt_; rcnt = cnt0; } - } - else { scnt = rcnt = cnt_; } - - partner = (int)( (unsigned int)(mydist) ^ ipow ); - (void) HPL_sdrv( WORK, scnt, MSGID_BEGIN_PFACT, Wwork, rcnt, - MSGID_BEGIN_PFACT, MModAdd( partner, icurrow, - nprow ), comm ); - - tmp1 = Mabs( Wwork[0] ); gmax = Mabs( WORK[0] ); - if( ( tmp1 > gmax ) || - ( ( tmp1 == gmax ) && ( Wwork[3] < WORK[3] ) ) ) - { - HPL_dcopy( ( rcnt == cnt0 ? 
cnt0 : cnt_ ), Wwork, 1, - WORK, 1 ); - } - else if( rcnt == cnt0 ) - { HPL_dcopy( n0, Wwork+cnt_, 1, A0, 1 ); } - - ipow <<= 1; k++; - } - } - else if( size_ > 1 ) - { -/* - * proc[ip2] broadcast current row of A to procs [ip2+1..nprow). - */ - k = (unsigned int)(size_) - 1; ip2_ = mask = 1; - while( k > 1 ) { k >>= 1; ip2_ <<= 1; mask <<= 1; mask++; } - - root = MModAdd( icurrow, (int)(ip2), nprow ); - mydis_ = MModSub( myrow, root, nprow ); - - do - { - mask ^= ip2_; - if( ( mydis_ & mask ) == 0 ) - { - partner = (int)(mydis_ ^ ip2_); - if( ( mydis_ & ip2_ ) != 0 ) - { - (void) HPL_recv( A0, n0, MModAdd( root, partner, - nprow ), MSGID_BEGIN_PFACT, comm ); - } - else if( partner < size_ ) - { - (void) HPL_send( A0, n0, MModAdd( root, partner, - nprow ), MSGID_BEGIN_PFACT, comm ); - } - } - ip2_ >>= 1; - } while( ip2_ > 0 ); - } -/* - * If nprow is not a power of 2, for all i in [ip2..nprow), proc[i-ip2] - * sends the pivot row to proc[i] along with the first four entries of - * the WORK array. - */ - if( ( Np2 != 0 ) && - ( ( partner = (int)((unsigned int)(mydist) ^ ip2 ) ) < nprow ) ) - { - if( ( mydist & ip2 ) != 0 ) - { - (void) HPL_recv( WORK, cnt_, MModAdd( partner, icurrow, - nprow ), MSGID_BEGIN_PFACT, comm ); - } - else - { - (void) HPL_send( WORK, cnt_, MModAdd( partner, icurrow, - nprow ), MSGID_BEGIN_PFACT, comm ); - } - } -/* - * Save the global pivot index in pivot array - */ - (PANEL->DPIV)[JJ] = WORK[2]; -#ifdef HPL_DETAILED_TIMING - HPL_ptimer( HPL_TIMING_MXSWP ); -#endif -/* - * End of HPL_pdmxswp - */ -} diff --git a/hpl/src/pfact/HPL_pdpancrN.c b/hpl/src/pfact/HPL_pdpancrN.c deleted file mode 100644 index 3599991a6303e6a7ee80c5399f250c6da13e7a22..0000000000000000000000000000000000000000 --- a/hpl/src/pfact/HPL_pdpancrN.c +++ /dev/null @@ -1,270 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. 
Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -void HPL_pdpancrN -( - HPL_T_panel * PANEL, - const int M, - const int N, - const int ICOFF, - double * WORK -) -#else -void HPL_pdpancrN -( PANEL, M, N, ICOFF, WORK ) - HPL_T_panel * PANEL; - const int M; - const int N; - const int ICOFF; - double * WORK; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_pdpancrN factorizes a panel of columns that is a sub-array of a - * larger one-dimensional panel A using the Crout variant of the usual - * one-dimensional algorithm. The lower triangular N0-by-N0 upper block - * of the panel is stored in no-transpose form (i.e. just like the input - * matrix itself). - * - * Bi-directional exchange is used to perform the swap::broadcast - * operations at once for one column in the panel. This results in a - * lower number of slightly larger messages than usual. On P processes - * and assuming bi-directional links, the running time of this function - * can be approximated by (when N is equal to N0): - * - * N0 * log_2( P ) * ( lat + ( 2*N0 + 4 ) / bdwth ) + - * N0^2 * ( M - N0/3 ) * gam2-3 - * - * where M is the local number of rows of the panel, lat and bdwth are - * the latency and bandwidth of the network for double precision real - * words, and gam2-3 is an estimate of the Level 2 and Level 3 BLAS - * rate of execution. 
The recursive algorithm allows indeed to almost - * achieve Level 3 BLAS performance in the panel factorization. On a - * large number of modern machines, this operation is however latency - * bound, meaning that its cost can be estimated by only the latency - * portion N0 * log_2(P) * lat. Mono-directional links will double this - * communication cost. - * - * Note that one iteration of the the main loop is unrolled. The local - * computation of the absolute value max of the next column is performed - * just after its update by the current column. This allows to bring the - * current column only once through cache at each step. The current - * implementation does not perform any blocking for this sequence of - * BLAS operations, however the design allows for plugging in an optimal - * (machine-specific) specialized BLAS-like kernel. This idea has been - * suggested to us by Fred Gustavson, IBM T.J. Watson Research Center. - * - * Arguments - * ========= - * - * PANEL (local input/output) HPL_T_panel * - * On entry, PANEL points to the data structure containing the - * panel information. - * - * M (local input) const int - * On entry, M specifies the local number of rows of sub(A). - * - * N (local input) const int - * On entry, N specifies the local number of columns of sub(A). - * - * ICOFF (global input) const int - * On entry, ICOFF specifies the row and column offset of sub(A) - * in A. - * - * WORK (local workspace) double * - * On entry, WORK is a workarray of size at least 2*(4+2*N0). - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - double * A, * L1, * L1ptr; -#ifdef HPL_CALL_VSIPL - vsip_mview_d * Av0, * Av1, * Yv1, * Xv0, * Xv1; -#endif - int Mm1, Nm1, curr, ii, iip1, jj, kk=0, lda, - m=M, n0; -/* .. - * .. Executable Statements .. 
- */ -#ifdef HPL_DETAILED_TIMING - HPL_ptimer( HPL_TIMING_PFACT ); -#endif - A = PANEL->A; lda = PANEL->lda; - L1 = PANEL->L1; n0 = PANEL->jb; - curr = (int)( PANEL->grid->myrow == PANEL->prow ); - - Nm1 = N - 1; jj = ICOFF; - if( curr != 0 ) { ii = ICOFF; iip1 = ii+1; Mm1 = m-1; } - else { ii = 0; iip1 = ii; Mm1 = m; } -#ifdef HPL_CALL_VSIPL -/* - * Admit the blocks - */ - (void) vsip_blockadmit_d( PANEL->Ablock, VSIP_TRUE ); - (void) vsip_blockadmit_d( PANEL->L1block, VSIP_TRUE ); -/* - * Create the matrix views - */ - Av0 = vsip_mbind_d( PANEL->Ablock, 0, 1, lda, lda, PANEL->pmat->nq ); - Xv0 = vsip_mbind_d( PANEL->L1block, 0, 1, PANEL->jb, PANEL->jb, PANEL->jb ); -#endif -/* - * Find local absolute value max in first column - initialize WORK[0:3] - */ - HPL_dlocmax( PANEL, m, ii, jj, WORK ); - - while( Nm1 > 0 ) - { -/* - * Swap and broadcast the current row - */ - HPL_pdmxswp( PANEL, m, ii, jj, WORK ); - HPL_dlocswpN( PANEL, ii, jj, WORK ); -/* - * Compute row (column) jj of L1 - */ - if( kk > 0 ) - { - L1ptr = Mptr( L1, jj, jj+1, n0 ); -#ifdef HPL_CALL_VSIPL -/* - * Create the matrix subviews - */ - Av1 = vsip_msubview_d( Xv0, ICOFF, jj+1, kk, Nm1 ); - Xv1 = vsip_msubview_d( Xv0, jj, ICOFF, 1, kk ); - Yv1 = vsip_msubview_d( Xv0, jj, jj+1, 1, Nm1 ); - - vsip_gemp_d( -HPL_rone, Xv1, VSIP_MAT_NTRANS, Av1, VSIP_MAT_NTRANS, - HPL_rone, Yv1 ); -/* - * Destroy the matrix subviews - */ - (void) vsip_mdestroy_d( Yv1 ); - (void) vsip_mdestroy_d( Xv1 ); - (void) vsip_mdestroy_d( Av1 ); -#else - HPL_dgemv( HplColumnMajor, HplTrans, kk, Nm1, -HPL_rone, - Mptr( L1, ICOFF, jj+1, n0 ), n0, Mptr( L1, jj, - ICOFF, n0 ), n0, HPL_rone, L1ptr, n0 ); -#endif - if( curr != 0 ) - HPL_dcopy( Nm1, L1ptr, n0, Mptr( A, ii, jj+1, lda ), lda ); - } -/* - * Scale current column by its absolute value max entry - Update dia- - * diagonal and subdiagonal elements in column A(iip1:iip1+Mm1-1, jj+1) - * and find local absolute value max in that column (Only one pass - * through cache for each 
current column). This sequence of operations - * could benefit from a specialized blocked implementation. - */ - if( WORK[0] != HPL_rzero ) - HPL_dscal( Mm1, HPL_rone / WORK[0], Mptr( A, iip1, jj, lda ), 1 ); -#ifdef HPL_CALL_VSIPL -/* - * Create the matrix subviews - */ - Av1 = vsip_msubview_d( Av0, PANEL->ii+iip1, PANEL->jj+ICOFF, Mm1, kk+1 ); - Xv1 = vsip_msubview_d( Xv0, ICOFF, jj+1, kk+1, 1 ); - Yv1 = vsip_msubview_d( Av0, PANEL->ii+iip1, PANEL->jj+jj+1, Mm1, 1 ); - - vsip_gemp_d( -HPL_rone, Av1, VSIP_MAT_NTRANS, Xv1, VSIP_MAT_NTRANS, - HPL_rone, Yv1 ); -/* - * Destroy the matrix subviews - */ - vsip_mdestroy_d( Yv1 ); - vsip_mdestroy_d( Xv1 ); - vsip_mdestroy_d( Av1 ); -#else - HPL_dgemv( HplColumnMajor, HplNoTrans, Mm1, kk+1, -HPL_rone, - Mptr( A, iip1, ICOFF, lda ), lda, Mptr( L1, ICOFF, - jj+1, n0 ), 1, HPL_rone, Mptr( A, iip1, jj+1, lda ), - 1 ); -#endif - HPL_dlocmax( PANEL, Mm1, iip1, jj+1, WORK ); - if( curr != 0 ) { ii = iip1; iip1++; m = Mm1; Mm1--; } - - Nm1--; jj++; kk++; - } -/* - * Swap and broadcast last row - Scale last column by its absolute value - * max entry - */ - HPL_pdmxswp( PANEL, m, ii, jj, WORK ); - HPL_dlocswpN( PANEL, ii, jj, WORK ); - if( WORK[0] != HPL_rzero ) - HPL_dscal( Mm1, HPL_rone / WORK[0], Mptr( A, iip1, jj, lda ), 1 ); - -#ifdef HPL_CALL_VSIPL -/* - * Release the blocks - */ - (void) vsip_blockrelease_d( vsip_mgetblock_d( Xv0 ), VSIP_TRUE ); - (void) vsip_blockrelease_d( vsip_mgetblock_d( Av0 ), VSIP_TRUE ); -/* - * Destroy the matrix views - */ - (void) vsip_mdestroy_d( Xv0 ); - (void) vsip_mdestroy_d( Av0 ); -#endif -#ifdef HPL_DETAILED_TIMING - HPL_ptimer( HPL_TIMING_PFACT ); -#endif -/* - * End of HPL_pdpancrN - */ -} diff --git a/hpl/src/pfact/HPL_pdpancrT.c b/hpl/src/pfact/HPL_pdpancrT.c deleted file mode 100644 index 3064405490fdf10081b16db487ad9fcbe8eda189..0000000000000000000000000000000000000000 --- a/hpl/src/pfact/HPL_pdpancrT.c +++ /dev/null @@ -1,267 +0,0 @@ -/* - * -- High Performance Computing Linpack 
Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -void HPL_pdpancrT -( - HPL_T_panel * PANEL, - const int M, - const int N, - const int ICOFF, - double * WORK -) -#else -void HPL_pdpancrT -( PANEL, M, N, ICOFF, WORK ) - HPL_T_panel * PANEL; - const int M; - const int N; - const int ICOFF; - double * WORK; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_pdpancrT factorizes a panel of columns that is a sub-array of a - * larger one-dimensional panel A using the Crout variant of the usual - * one-dimensional algorithm. The lower triangular N0-by-N0 upper block - * of the panel is stored in transpose form. - * - * Bi-directional exchange is used to perform the swap::broadcast - * operations at once for one column in the panel. This results in a - * lower number of slightly larger messages than usual. On P processes - * and assuming bi-directional links, the running time of this function - * can be approximated by (when N is equal to N0): - * - * N0 * log_2( P ) * ( lat + ( 2*N0 + 4 ) / bdwth ) + - * N0^2 * ( M - N0/3 ) * gam2-3 - * - * where M is the local number of rows of the panel, lat and bdwth are - * the latency and bandwidth of the network for double precision real - * words, and gam2-3 is an estimate of the Level 2 and Level 3 BLAS - * rate of execution. 
The recursive algorithm allows indeed to almost - * achieve Level 3 BLAS performance in the panel factorization. On a - * large number of modern machines, this operation is however latency - * bound, meaning that its cost can be estimated by only the latency - * portion N0 * log_2(P) * lat. Mono-directional links will double this - * communication cost. - * - * Note that one iteration of the the main loop is unrolled. The local - * computation of the absolute value max of the next column is performed - * just after its update by the current column. This allows to bring the - * current column only once through cache at each step. The current - * implementation does not perform any blocking for this sequence of - * BLAS operations, however the design allows for plugging in an optimal - * (machine-specific) specialized BLAS-like kernel. This idea has been - * suggested to us by Fred Gustavson, IBM T.J. Watson Research Center. - * - * Arguments - * ========= - * - * PANEL (local input/output) HPL_T_panel * - * On entry, PANEL points to the data structure containing the - * panel information. - * - * M (local input) const int - * On entry, M specifies the local number of rows of sub(A). - * - * N (local input) const int - * On entry, N specifies the local number of columns of sub(A). - * - * ICOFF (global input) const int - * On entry, ICOFF specifies the row and column offset of sub(A) - * in A. - * - * WORK (local workspace) double * - * On entry, WORK is a workarray of size at least 2*(4+2*N0). - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - double * A, * L1, * L1ptr; -#ifdef HPL_CALL_VSIPL - vsip_mview_d * Av0, * Av1, * Yv1, * Xv0, * Xv1; -#endif - int Mm1, Nm1, curr, ii, iip1, jj, kk=0, lda, - m=M, n0; -/* .. - * .. Executable Statements .. 
- */ -#ifdef HPL_DETAILED_TIMING - HPL_ptimer( HPL_TIMING_PFACT ); -#endif - A = PANEL->A; lda = PANEL->lda; - L1 = PANEL->L1; n0 = PANEL->jb; - curr = (int)( PANEL->grid->myrow == PANEL->prow ); - - Nm1 = N - 1; jj = ICOFF; - if( curr != 0 ) { ii = ICOFF; iip1 = ii+1; Mm1 = m-1; } - else { ii = 0; iip1 = ii; Mm1 = m; } -#ifdef HPL_CALL_VSIPL -/* - * Admit the blocks - */ - (void) vsip_blockadmit_d( PANEL->Ablock, VSIP_TRUE ); - (void) vsip_blockadmit_d( PANEL->L1block, VSIP_TRUE ); -/* - * Create the matrix views - */ - Av0 = vsip_mbind_d( PANEL->Ablock, 0, 1, lda, lda, PANEL->pmat->nq ); - Xv0 = vsip_mbind_d( PANEL->L1block, 0, 1, PANEL->jb, PANEL->jb, PANEL->jb ); -#endif -/* - * Find local absolute value max in first column - initialize WORK[0:3] - */ - HPL_dlocmax( PANEL, m, ii, jj, WORK ); - - while( Nm1 > 0 ) - { -/* - * Swap and broadcast the current row - */ - HPL_pdmxswp( PANEL, m, ii, jj, WORK ); - HPL_dlocswpT( PANEL, ii, jj, WORK ); -/* - * Compute row (column) jj of L1 - */ - if( kk > 0 ) - { - L1ptr = Mptr( L1, jj+1, jj, n0 ); -#ifdef HPL_CALL_VSIPL -/* - * Create the matrix subviews - */ - Av1 = vsip_msubview_d( Xv0, jj+1, ICOFF, Nm1, kk ); - Xv1 = vsip_msubview_d( Xv0, ICOFF, jj, kk, 1 ); - Yv1 = vsip_msubview_d( Xv0, jj+1, jj, Nm1, 1 ); - - vsip_gemp_d( -HPL_rone, Av1, VSIP_MAT_NTRANS, Xv1, VSIP_MAT_NTRANS, - HPL_rone, Yv1 ); -/* - * Destroy the matrix subviews - */ - (void) vsip_mdestroy_d( Yv1 ); - (void) vsip_mdestroy_d( Xv1 ); - (void) vsip_mdestroy_d( Av1 ); -#else - HPL_dgemv( HplColumnMajor, HplNoTrans, Nm1, kk, -HPL_rone, - Mptr( L1, jj+1, ICOFF, n0 ), n0, Mptr( L1, ICOFF, - jj, n0 ), 1, HPL_rone, L1ptr, 1 ); -#endif - if( curr != 0 ) - HPL_dcopy( Nm1, L1ptr, 1, Mptr( A, ii, jj+1, lda ), lda ); - } -/* - * Scale current column by its absolute value max entry - Update dia- - * diagonal and subdiagonal elements in column A(iip1:iip1+Mm1-1, jj+1) - * and find local absolute value max in that column (Only one pass - * through cache for each 
current column). This sequence of operations - * could benefit from a specialized blocked implementation. - */ - if( WORK[0] != HPL_rzero ) - HPL_dscal( Mm1, HPL_rone / WORK[0], Mptr( A, iip1, jj, lda ), 1 ); -#ifdef HPL_CALL_VSIPL -/* - * Create the matrix subviews - */ - Av1 = vsip_msubview_d( Av0, PANEL->ii+iip1, PANEL->jj+ICOFF, Mm1, kk+1 ); - Xv1 = vsip_msubview_d( Xv0, jj+1, ICOFF, 1, kk+1 ); - Yv1 = vsip_msubview_d( Av0, PANEL->ii+iip1, PANEL->jj+jj+1, Mm1, 1 ); - - vsip_gemp_d( -HPL_rone, Av1, VSIP_MAT_NTRANS, Xv1, VSIP_MAT_TRANS, - HPL_rone, Yv1 ); -/* - * Destroy the matrix subviews - */ - (void) vsip_mdestroy_d( Yv1 ); - (void) vsip_mdestroy_d( Xv1 ); - (void) vsip_mdestroy_d( Av1 ); -#else - HPL_dgemv( HplColumnMajor, HplNoTrans, Mm1, kk+1, -HPL_rone, - Mptr( A, iip1, ICOFF, lda ), lda, Mptr( L1, jj+1, ICOFF, - n0 ), n0, HPL_rone, Mptr( A, iip1, jj+1, lda ), 1 ); -#endif - HPL_dlocmax( PANEL, Mm1, iip1, jj+1, WORK ); - if( curr != 0 ) { ii = iip1; iip1++; m = Mm1; Mm1--; } - - Nm1--; jj++; kk++; - } -/* - * Swap and broadcast last row - Scale last column by its absolute value - * max entry - */ - HPL_pdmxswp( PANEL, m, ii, jj, WORK ); - HPL_dlocswpT( PANEL, ii, jj, WORK ); - if( WORK[0] != HPL_rzero ) - HPL_dscal( Mm1, HPL_rone / WORK[0], Mptr( A, iip1, jj, lda ), 1 ); -#ifdef HPL_CALL_VSIPL -/* - * Release the blocks - */ - (void) vsip_blockrelease_d( vsip_mgetblock_d( Xv0 ), VSIP_TRUE ); - (void) vsip_blockrelease_d( vsip_mgetblock_d( Av0 ), VSIP_TRUE ); -/* - * Destroy the matrix views - */ - (void) vsip_mdestroy_d( Xv0 ); - (void) vsip_mdestroy_d( Av0 ); -#endif -#ifdef HPL_DETAILED_TIMING - HPL_ptimer( HPL_TIMING_PFACT ); -#endif -/* - * End of HPL_pdpancrT - */ -} diff --git a/hpl/src/pfact/HPL_pdpanllN.c b/hpl/src/pfact/HPL_pdpanllN.c deleted file mode 100644 index 66d0a7f8156ebadd4acdd659aa0f9180fd2d2726..0000000000000000000000000000000000000000 --- a/hpl/src/pfact/HPL_pdpanllN.c +++ /dev/null @@ -1,244 +0,0 @@ -/* - * -- High Performance 
Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -void HPL_pdpanllN -( - HPL_T_panel * PANEL, - const int M, - const int N, - const int ICOFF, - double * WORK -) -#else -void HPL_pdpanllN -( PANEL, M, N, ICOFF, WORK ) - HPL_T_panel * PANEL; - const int M; - const int N; - const int ICOFF; - double * WORK; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_pdpanllN factorizes a panel of columns that is a sub-array of a - * larger one-dimensional panel A using the Left-looking variant of the - * usual one-dimensional algorithm. The lower triangular N0-by-N0 upper - * block of the panel is stored in no-transpose form (i.e. just like the - * input matrix itself). - * - * Bi-directional exchange is used to perform the swap::broadcast - * operations at once for one column in the panel. This results in a - * lower number of slightly larger messages than usual. On P processes - * and assuming bi-directional links, the running time of this function - * can be approximated by (when N is equal to N0): - * - * N0 * log_2( P ) * ( lat + ( 2*N0 + 4 ) / bdwth ) + - * N0^2 * ( M - N0/3 ) * gam2-3 - * - * where M is the local number of rows of the panel, lat and bdwth are - * the latency and bandwidth of the network for double precision real - * words, and gam2-3 is an estimate of the Level 2 and Level 3 BLAS - * rate of execution. 
The recursive algorithm allows indeed to almost - * achieve Level 3 BLAS performance in the panel factorization. On a - * large number of modern machines, this operation is however latency - * bound, meaning that its cost can be estimated by only the latency - * portion N0 * log_2(P) * lat. Mono-directional links will double this - * communication cost. - * - * Note that one iteration of the the main loop is unrolled. The local - * computation of the absolute value max of the next column is performed - * just after its update by the current column. This allows to bring the - * current column only once through cache at each step. The current - * implementation does not perform any blocking for this sequence of - * BLAS operations, however the design allows for plugging in an optimal - * (machine-specific) specialized BLAS-like kernel. This idea has been - * suggested to us by Fred Gustavson, IBM T.J. Watson Research Center. - * - * Arguments - * ========= - * - * PANEL (local input/output) HPL_T_panel * - * On entry, PANEL points to the data structure containing the - * panel information. - * - * M (local input) const int - * On entry, M specifies the local number of rows of sub(A). - * - * N (local input) const int - * On entry, N specifies the local number of columns of sub(A). - * - * ICOFF (global input) const int - * On entry, ICOFF specifies the row and column offset of sub(A) - * in A. - * - * WORK (local workspace) double * - * On entry, WORK is a workarray of size at least 2*(4+2*N0). - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - double * A, * L1, * L1ptr; -#ifdef HPL_CALL_VSIPL - vsip_mview_d * Av0, * Av1, * Yv1, * Xv0, * Xv1; -#endif - int Mm1, Nm1, curr, ii, iip1, jj, kk, lda, - m=M, n0; -/* .. - * .. Executable Statements .. 
- */ -#ifdef HPL_DETAILED_TIMING - HPL_ptimer( HPL_TIMING_PFACT ); -#endif - A = PANEL->A; lda = PANEL->lda; - L1 = PANEL->L1; n0 = PANEL->jb; - curr = (int)( PANEL->grid->myrow == PANEL->prow ); - - Nm1 = N - 1; jj = ICOFF; - if( curr != 0 ) { ii = ICOFF; iip1 = ii+1; Mm1 = m-1; } - else { ii = 0; iip1 = ii; Mm1 = m; } -#ifdef HPL_CALL_VSIPL -/* - * Admit the blocks - */ - (void) vsip_blockadmit_d( PANEL->Ablock, VSIP_TRUE ); - (void) vsip_blockadmit_d( PANEL->L1block, VSIP_TRUE ); -/* - * Create the matrix views - */ - Av0 = vsip_mbind_d( PANEL->Ablock, 0, 1, lda, lda, PANEL->pmat->nq ); - Xv0 = vsip_mbind_d( PANEL->L1block, 0, 1, PANEL->jb, PANEL->jb, PANEL->jb ); -#endif -/* - * Find local absolute value max in first column and initialize WORK[0:3] - */ - HPL_dlocmax( PANEL, m, ii, jj, WORK ); - - while( Nm1 > 0 ) - { -/* - * Swap and broadcast the current row - */ - HPL_pdmxswp( PANEL, m, ii, jj, WORK ); - HPL_dlocswpN( PANEL, ii, jj, WORK ); - - L1ptr = Mptr( L1, ICOFF, jj+1, n0 ); kk = jj + 1 - ICOFF; - HPL_dtrsv( HplColumnMajor, HplLower, HplNoTrans, HplUnit, kk, - Mptr( L1, ICOFF, ICOFF, n0 ), n0, L1ptr, 1 ); -/* - * Scale current column by its absolute value max entry - Update and - * find local absolute value max in next column (Only one pass through - * cache for each next column). This sequence of operations could bene- - * fit from a specialized blocked implementation. 
- */ - if( WORK[0] != HPL_rzero ) - HPL_dscal( Mm1, HPL_rone / WORK[0], Mptr( A, iip1, jj, lda ), 1 ); -#ifdef HPL_CALL_VSIPL -/* - * Create the matrix subviews - */ - Av1 = vsip_msubview_d( Av0, PANEL->ii+iip1, PANEL->jj+ICOFF, Mm1, kk ); - Xv1 = vsip_msubview_d( Xv0, ICOFF, jj+1, kk, 1 ); - Yv1 = vsip_msubview_d( Av0, PANEL->ii+iip1, PANEL->jj+jj+1, Mm1, 1 ); - - vsip_gemp_d( -HPL_rone, Av1, VSIP_MAT_NTRANS, Xv1, VSIP_MAT_NTRANS, - HPL_rone, Yv1 ); -/* - * Destroy the matrix subviews - */ - (void) vsip_mdestroy_d( Yv1 ); - (void) vsip_mdestroy_d( Xv1 ); - (void) vsip_mdestroy_d( Av1 ); -#else - HPL_dgemv( HplColumnMajor, HplNoTrans, Mm1, kk, -HPL_rone, - Mptr( A, iip1, ICOFF, lda ), lda, L1ptr, 1, - HPL_rone, Mptr( A, iip1, jj+1, lda ), 1 ); -#endif - HPL_dlocmax( PANEL, Mm1, iip1, jj+1, WORK ); - if( curr != 0 ) - { - HPL_dcopy( kk, L1ptr, 1, Mptr( A, ICOFF, jj+1, lda ), 1 ); - ii = iip1; iip1++; m = Mm1; Mm1--; - } - Nm1--; jj++; - } -/* - * Swap and broadcast last row - Scale last column by its absolute value - * max entry - */ - HPL_pdmxswp( PANEL, m, ii, jj, WORK ); - HPL_dlocswpN( PANEL, ii, jj, WORK ); - if( WORK[0] != HPL_rzero ) - HPL_dscal( Mm1, HPL_rone / WORK[0], Mptr( A, iip1, jj, lda ), 1 ); -#ifdef HPL_CALL_VSIPL -/* - * Release the blocks - */ - (void) vsip_blockrelease_d( vsip_mgetblock_d( Xv0 ), VSIP_TRUE ); - (void) vsip_blockrelease_d( vsip_mgetblock_d( Av0 ), VSIP_TRUE ); -/* - * Destroy the matrix views - */ - (void) vsip_mdestroy_d( Xv0 ); - (void) vsip_mdestroy_d( Av0 ); -#endif -#ifdef HPL_DETAILED_TIMING - HPL_ptimer( HPL_TIMING_PFACT ); -#endif -/* - * End of HPL_pdpanllN - */ -} diff --git a/hpl/src/pfact/HPL_pdpanllT.c b/hpl/src/pfact/HPL_pdpanllT.c deleted file mode 100644 index 14884976ec004dd9617d9d1815107c7156329c7b..0000000000000000000000000000000000000000 --- a/hpl/src/pfact/HPL_pdpanllT.c +++ /dev/null @@ -1,244 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. 
Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -void HPL_pdpanllT -( - HPL_T_panel * PANEL, - const int M, - const int N, - const int ICOFF, - double * WORK -) -#else -void HPL_pdpanllT -( PANEL, M, N, ICOFF, WORK ) - HPL_T_panel * PANEL; - const int M; - const int N; - const int ICOFF; - double * WORK; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_pdpanllT factorizes a panel of columns that is a sub-array of a - * larger one-dimensional panel A using the Left-looking variant of the - * usual one-dimensional algorithm. The lower triangular N0-by-N0 upper - * block of the panel is stored in transpose form. - * - * Bi-directional exchange is used to perform the swap::broadcast - * operations at once for one column in the panel. This results in a - * lower number of slightly larger messages than usual. On P processes - * and assuming bi-directional links, the running time of this function - * can be approximated by (when N is equal to N0): - * - * N0 * log_2( P ) * ( lat + ( 2*N0 + 4 ) / bdwth ) + - * N0^2 * ( M - N0/3 ) * gam2-3 - * - * where M is the local number of rows of the panel, lat and bdwth are - * the latency and bandwidth of the network for double precision real - * words, and gam2-3 is an estimate of the Level 2 and Level 3 BLAS - * rate of execution. 
The recursive algorithm allows indeed to almost - * achieve Level 3 BLAS performance in the panel factorization. On a - * large number of modern machines, this operation is however latency - * bound, meaning that its cost can be estimated by only the latency - * portion N0 * log_2(P) * lat. Mono-directional links will double this - * communication cost. - * - * Note that one iteration of the the main loop is unrolled. The local - * computation of the absolute value max of the next column is performed - * just after its update by the current column. This allows to bring the - * current column only once through cache at each step. The current - * implementation does not perform any blocking for this sequence of - * BLAS operations, however the design allows for plugging in an optimal - * (machine-specific) specialized BLAS-like kernel. This idea has been - * suggested to us by Fred Gustavson, IBM T.J. Watson Research Center. - * - * Arguments - * ========= - * - * PANEL (local input/output) HPL_T_panel * - * On entry, PANEL points to the data structure containing the - * panel information. - * - * M (local input) const int - * On entry, M specifies the local number of rows of sub(A). - * - * N (local input) const int - * On entry, N specifies the local number of columns of sub(A). - * - * ICOFF (global input) const int - * On entry, ICOFF specifies the row and column offset of sub(A) - * in A. - * - * WORK (local workspace) double * - * On entry, WORK is a workarray of size at least 2*(4+2*N0). - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - double * A, * L1, * L1ptr; -#ifdef HPL_CALL_VSIPL - vsip_mview_d * Av0, * Av1, * Yv1, * Xv0, * Xv1; -#endif - int Mm1, Nm1, curr, ii, iip1, jj, kk, lda, - m=M, n0; -/* .. - * .. Executable Statements .. 
- */ -#ifdef HPL_DETAILED_TIMING - HPL_ptimer( HPL_TIMING_PFACT ); -#endif - A = PANEL->A; lda = PANEL->lda; - L1 = PANEL->L1; n0 = PANEL->jb; - curr = (int)( PANEL->grid->myrow == PANEL->prow ); - - Nm1 = N - 1; jj = ICOFF; - if( curr != 0 ) { ii = ICOFF; iip1 = ii+1; Mm1 = m-1; } - else { ii = 0; iip1 = ii; Mm1 = m; } -#ifdef HPL_CALL_VSIPL -/* - * Admit the blocks - */ - (void) vsip_blockadmit_d( PANEL->Ablock, VSIP_TRUE ); - (void) vsip_blockadmit_d( PANEL->L1block, VSIP_TRUE ); -/* - * Create the matrix views - */ - Av0 = vsip_mbind_d( PANEL->Ablock, 0, 1, lda, lda, PANEL->pmat->nq ); - Xv0 = vsip_mbind_d( PANEL->L1block, 0, 1, PANEL->jb, PANEL->jb, PANEL->jb ); -#endif -/* - * Find local absolute value max in first column and initialize WORK[0:3] - */ - HPL_dlocmax( PANEL, m, ii, jj, WORK ); - - while( Nm1 > 0 ) - { -/* - * Swap and broadcast the current row - */ - HPL_pdmxswp( PANEL, m, ii, jj, WORK ); - HPL_dlocswpT( PANEL, ii, jj, WORK ); - - L1ptr = Mptr( L1, jj+1, ICOFF, n0 ); kk = jj + 1 - ICOFF; - HPL_dtrsv( HplColumnMajor, HplUpper, HplTrans, HplUnit, kk, - Mptr( L1, ICOFF, ICOFF, n0 ), n0, L1ptr, n0 ); -/* - * Scale current column by its absolute value max entry - Update and - * find local absolute value max in next column (Only one pass through - * cache for each next column). This sequence of operations could bene- - * fit from a specialized blocked implementation. 
- */ - if( WORK[0] != HPL_rzero ) - HPL_dscal( Mm1, HPL_rone / WORK[0], Mptr( A, iip1, jj, lda ), 1 ); -#ifdef HPL_CALL_VSIPL -/* - * Create the matrix subviews - */ - Av1 = vsip_msubview_d( Av0, PANEL->ii+iip1, PANEL->jj+ICOFF, Mm1, kk ); - Xv1 = vsip_msubview_d( Xv0, jj+1, ICOFF, 1, kk ); - Yv1 = vsip_msubview_d( Av0, PANEL->ii+iip1, PANEL->jj+jj+1, Mm1, 1 ); - - vsip_gemp_d( -HPL_rone, Av1, VSIP_MAT_NTRANS, Xv1, VSIP_MAT_TRANS, - HPL_rone, Yv1 ); -/* - * Destroy the matrix subviews - */ - (void) vsip_mdestroy_d( Yv1 ); - (void) vsip_mdestroy_d( Xv1 ); - (void) vsip_mdestroy_d( Av1 ); -#else - HPL_dgemv( HplColumnMajor, HplNoTrans, Mm1, kk, -HPL_rone, - Mptr( A, iip1, ICOFF, lda ), lda, L1ptr, n0, - HPL_rone, Mptr( A, iip1, jj+1, lda ), 1 ); -#endif - HPL_dlocmax( PANEL, Mm1, iip1, jj+1, WORK ); - if( curr != 0 ) - { - HPL_dcopy( kk, L1ptr, n0, Mptr( A, ICOFF, jj+1, lda ), 1 ); - ii = iip1; iip1++; m = Mm1; Mm1--; - } - Nm1--; jj++; - } -/* - * Swap and broadcast last row - Scale last column by its absolute value - * max entry - */ - HPL_pdmxswp( PANEL, m, ii, jj, WORK ); - HPL_dlocswpT( PANEL, ii, jj, WORK ); - if( WORK[0] != HPL_rzero ) - HPL_dscal( Mm1, HPL_rone / WORK[0], Mptr( A, iip1, jj, lda ), 1 ); - -#ifdef HPL_CALL_VSIPL -/* - * Release the blocks - */ - (void) vsip_blockrelease_d( vsip_mgetblock_d( Xv0 ), VSIP_TRUE ); - (void) vsip_blockrelease_d( vsip_mgetblock_d( Av0 ), VSIP_TRUE ); -/* - * Destroy the matrix views - */ - (void) vsip_mdestroy_d( Xv0 ); - (void) vsip_mdestroy_d( Av0 ); -#endif -#ifdef HPL_DETAILED_TIMING - HPL_ptimer( HPL_TIMING_PFACT ); -#endif -/* - * End of HPL_pdpanllT - */ -} diff --git a/hpl/src/pfact/HPL_pdpanrlN.c b/hpl/src/pfact/HPL_pdpanrlN.c deleted file mode 100644 index 3701d5b4c3a02b12ab5397988d0737a320ede02a..0000000000000000000000000000000000000000 --- a/hpl/src/pfact/HPL_pdpanrlN.c +++ /dev/null @@ -1,250 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine 
P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -void HPL_pdpanrlN -( - HPL_T_panel * PANEL, - const int M, - const int N, - const int ICOFF, - double * WORK -) -#else -void HPL_pdpanrlN -( PANEL, M, N, ICOFF, WORK ) - HPL_T_panel * PANEL; - const int M; - const int N; - const int ICOFF; - double * WORK; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_pdpanrlN factorizes a panel of columns that is a sub-array of a - * larger one-dimensional panel A using the Right-looking variant of the - * usual one-dimensional algorithm. The lower triangular N0-by-N0 upper - * block of the panel is stored in no-transpose form (i.e. just like the - * input matrix itself). - * - * Bi-directional exchange is used to perform the swap::broadcast - * operations at once for one column in the panel. This results in a - * lower number of slightly larger messages than usual. On P processes - * and assuming bi-directional links, the running time of this function - * can be approximated by (when N is equal to N0): - * - * N0 * log_2( P ) * ( lat + ( 2*N0 + 4 ) / bdwth ) + - * N0^2 * ( M - N0/3 ) * gam2-3 - * - * where M is the local number of rows of the panel, lat and bdwth are - * the latency and bandwidth of the network for double precision real - * words, and gam2-3 is an estimate of the Level 2 and Level 3 BLAS - * rate of execution. 
The recursive algorithm allows indeed to almost - * achieve Level 3 BLAS performance in the panel factorization. On a - * large number of modern machines, this operation is however latency - * bound, meaning that its cost can be estimated by only the latency - * portion N0 * log_2(P) * lat. Mono-directional links will double this - * communication cost. - * - * Note that one iteration of the main loop is unrolled. The local - * computation of the absolute value max of the next column is performed - * just after its update by the current column. This allows to bring the - * current column only once through cache at each step. The current - * implementation does not perform any blocking for this sequence of - * BLAS operations, however the design allows for plugging in an optimal - * (machine-specific) specialized BLAS-like kernel. This idea has been - * suggested to us by Fred Gustavson, IBM T.J. Watson Research Center. - * - * Arguments - * ========= - * - * PANEL (local input/output) HPL_T_panel * - * On entry, PANEL points to the data structure containing the - * panel information. - * - * M (local input) const int - * On entry, M specifies the local number of rows of sub(A). - * - * N (local input) const int - * On entry, N specifies the local number of columns of sub(A). - * - * ICOFF (global input) const int - * On entry, ICOFF specifies the row and column offset of sub(A) - * in A. - * - * WORK (local workspace) double * - * On entry, WORK is a workarray of size at least 2*(4+2*N0). - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - double * A, * Acur, * Anxt; -#ifdef HPL_CALL_VSIPL - vsip_mview_d * Av0, * Av1, * Xv1, * Yv0, * Yv1; -#endif - int Mm1, Nm1, curr, ii, iip1, jj, lda, m=M; -/* .. - * .. Executable Statements .. 
- */ -#ifdef HPL_DETAILED_TIMING - HPL_ptimer( HPL_TIMING_PFACT ); -#endif - A = PANEL->A; lda = PANEL->lda; - curr = (int)( PANEL->grid->myrow == PANEL->prow ); - - Nm1 = N - 1; jj = ICOFF; - if( curr != 0 ) { ii = ICOFF; iip1 = ii+1; Mm1 = m-1; } - else { ii = 0; iip1 = ii; Mm1 = m; } -#ifdef HPL_CALL_VSIPL -/* - * Admit the blocks - */ - (void) vsip_blockadmit_d( PANEL->Ablock, VSIP_TRUE ); - (void) vsip_blockadmit_d( PANEL->L1block, VSIP_TRUE ); -/* - * Create the matrix views - */ - Av0 = vsip_mbind_d( PANEL->Ablock, 0, 1, lda, lda, PANEL->pmat->nq ); - Yv0 = vsip_mbind_d( PANEL->L1block, 0, 1, PANEL->jb, PANEL->jb, PANEL->jb ); -#endif -/* - * Find local absolute value max in first column - initialize WORK[0:3] - */ - HPL_dlocmax( PANEL, m, ii, jj, WORK ); - - while( Nm1 >= 1 ) - { - Acur = Mptr( A, iip1, jj, lda ); Anxt = Mptr( Acur, 0, 1, lda ); -/* - * Swap and broadcast the current row - */ - HPL_pdmxswp( PANEL, m, ii, jj, WORK ); - HPL_dlocswpN( PANEL, ii, jj, WORK ); -/* - * Scale current column by its absolute value max entry - Update trai- - * ling sub-matrix and find local absolute value max in next column (On- - * ly one pass through cache for each current column). This sequence of - * operations could benefit from a specialized blocked implementation. 
- */ - if( WORK[0] != HPL_rzero ) - HPL_dscal( Mm1, HPL_rone / WORK[0], Acur, 1 ); - HPL_daxpy( Mm1, -WORK[4+jj+1], Acur, 1, Anxt, 1 ); - HPL_dlocmax( PANEL, Mm1, iip1, jj+1, WORK ); -#ifdef HPL_CALL_VSIPL - if( Nm1 > 1 ) - { -/* - * Create the matrix subviews - */ - Av1 = vsip_msubview_d( Av0, PANEL->ii+iip1, PANEL->jj+jj+2, - Mm1, Nm1-1 ); - Xv1 = vsip_msubview_d( Av0, PANEL->ii+iip1, PANEL->jj+jj, - Mm1, 1 ); - Yv1 = vsip_msubview_d( Yv0, jj, jj+2, 1, Nm1-1 ); - - vsip_gemp_d( -HPL_rone, Xv1, VSIP_MAT_NTRANS, Yv1, VSIP_MAT_NTRANS, - HPL_rone, Av1 ); -/* - * Destroy the matrix subviews - */ - (void) vsip_mdestroy_d( Yv1 ); - (void) vsip_mdestroy_d( Xv1 ); - (void) vsip_mdestroy_d( Av1 ); - } -#else - if( Nm1 > 1 ) - HPL_dger( HplColumnMajor, Mm1, Nm1-1, -HPL_rone, Acur, 1, - WORK+4+jj+2, 1, Mptr( Anxt, 0, 1, lda ), lda ); -#endif -/* - * Same thing as above but with worse data access on y (A += x * y^T) - * - * if( Nm1 > 1 ) ) - * HPL_dger( HplColumnMajor, Mm1, Nm1-1, -HPL_rone, Acur, 1, - * Mptr( L1, jj, jj+2, n0 ), n0, Mptr( Anxt, 0, 1, lda ), - * lda ); - */ - if( curr != 0 ) { ii = iip1; iip1++; m = Mm1; Mm1--; } - - Nm1--; jj++; - } -/* - * Swap and broadcast last row - Scale last column by its absolute value - * max entry - */ - HPL_pdmxswp( PANEL, m, ii, jj, WORK ); - HPL_dlocswpN( PANEL, ii, jj, WORK ); - if( WORK[0] != HPL_rzero ) - HPL_dscal( Mm1, HPL_rone / WORK[0], Mptr( A, iip1, jj, lda ), 1 ); -#ifdef HPL_CALL_VSIPL -/* - * Release the blocks - */ - (void) vsip_blockrelease_d( vsip_mgetblock_d( Yv0 ), VSIP_TRUE ); - (void) vsip_blockrelease_d( vsip_mgetblock_d( Av0 ), VSIP_TRUE ); -/* - * Destroy the matrix views - */ - (void) vsip_mdestroy_d( Yv0 ); - (void) vsip_mdestroy_d( Av0 ); -#endif -#ifdef HPL_DETAILED_TIMING - HPL_ptimer( HPL_TIMING_PFACT ); -#endif -/* - * End of HPL_pdpanrlN - */ -} diff --git a/hpl/src/pfact/HPL_pdpanrlT.c b/hpl/src/pfact/HPL_pdpanrlT.c deleted file mode 100644 index 
c6ee5d25894bb89b7be4c9a8ae27ccd418d98745..0000000000000000000000000000000000000000 --- a/hpl/src/pfact/HPL_pdpanrlT.c +++ /dev/null @@ -1,244 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -void HPL_pdpanrlT -( - HPL_T_panel * PANEL, - const int M, - const int N, - const int ICOFF, - double * WORK -) -#else -void HPL_pdpanrlT -( PANEL, M, N, ICOFF, WORK ) - HPL_T_panel * PANEL; - const int M; - const int N; - const int ICOFF; - double * WORK; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_pdpanrlT factorizes a panel of columns that is a sub-array of a - * larger one-dimensional panel A using the Right-looking variant of the - * usual one-dimensional algorithm. The lower triangular N0-by-N0 upper - * block of the panel is stored in transpose form. - * - * Bi-directional exchange is used to perform the swap::broadcast - * operations at once for one column in the panel. This results in a - * lower number of slightly larger messages than usual. On P processes - * and assuming bi-directional links, the running time of this function - * can be approximated by (when N is equal to N0): - * - * N0 * log_2( P ) * ( lat + ( 2*N0 + 4 ) / bdwth ) + - * N0^2 * ( M - N0/3 ) * gam2-3 - * - * where M is the local number of rows of the panel, lat and bdwth are - * the latency and bandwidth of the network for double precision real - * words, and gam2-3 is an estimate of the Level 2 and Level 3 BLAS - * rate of execution. 
The recursive algorithm allows indeed to almost - * achieve Level 3 BLAS performance in the panel factorization. On a - * large number of modern machines, this operation is however latency - * bound, meaning that its cost can be estimated by only the latency - * portion N0 * log_2(P) * lat. Mono-directional links will double this - * communication cost. - * - * Note that one iteration of the main loop is unrolled. The local - * computation of the absolute value max of the next column is performed - * just after its update by the current column. This allows to bring the - * current column only once through cache at each step. The current - * implementation does not perform any blocking for this sequence of - * BLAS operations, however the design allows for plugging in an optimal - * (machine-specific) specialized BLAS-like kernel. This idea has been - * suggested to us by Fred Gustavson, IBM T.J. Watson Research Center. - * - * Arguments - * ========= - * - * PANEL (local input/output) HPL_T_panel * - * On entry, PANEL points to the data structure containing the - * panel information. - * - * M (local input) const int - * On entry, M specifies the local number of rows of sub(A). - * - * N (local input) const int - * On entry, N specifies the local number of columns of sub(A). - * - * ICOFF (global input) const int - * On entry, ICOFF specifies the row and column offset of sub(A) - * in A. - * - * WORK (local workspace) double * - * On entry, WORK is a workarray of size at least 2*(4+2*N0). - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - double * A, * Acur, * Anxt, * L1; -#ifdef HPL_CALL_VSIPL - vsip_mview_d * Av0, * Av1, * Xv1, * Yv0, * Yv1; -#endif - int Mm1, Nm1, curr, ii, iip1, jj, lda, m=M, - n0; -/* .. - * .. Executable Statements .. 
- */ -#ifdef HPL_DETAILED_TIMING - HPL_ptimer( HPL_TIMING_PFACT ); -#endif - A = PANEL->A; lda = PANEL->lda; - L1 = PANEL->L1; n0 = PANEL->jb; - curr = (int)( PANEL->grid->myrow == PANEL->prow ); - - Nm1 = N - 1; jj = ICOFF; - if( curr != 0 ) { ii = ICOFF; iip1 = ii+1; Mm1 = m-1; } - else { ii = 0; iip1 = ii; Mm1 = m; } -#ifdef HPL_CALL_VSIPL -/* - * Admit the blocks - */ - (void) vsip_blockadmit_d( PANEL->Ablock, VSIP_TRUE ); - (void) vsip_blockadmit_d( PANEL->L1block, VSIP_TRUE ); -/* - * Create the matrix views - */ - Av0 = vsip_mbind_d( PANEL->Ablock, 0, 1, lda, lda, PANEL->pmat->nq ); - Yv0 = vsip_mbind_d( PANEL->L1block, 0, 1, PANEL->jb, PANEL->jb, PANEL->jb ); -#endif -/* - * Find local absolute value max in first column - initialize WORK[0:3] - */ - HPL_dlocmax( PANEL, m, ii, jj, WORK ); - - while( Nm1 >= 1 ) - { - Acur = Mptr( A, iip1, jj, lda ); Anxt = Mptr( Acur, 0, 1, lda ); -/* - * Swap and broadcast the current row - */ - HPL_pdmxswp( PANEL, m, ii, jj, WORK ); - HPL_dlocswpT( PANEL, ii, jj, WORK ); -/* - * Scale current column by its absolute value max entry - Update trai- - * ling sub-matrix and find local absolute value max in next column (On- - * ly one pass through cache for each current column). This sequence of - * operations could benefit from a specialized blocked implementation. 
- */ - if( WORK[0] != HPL_rzero ) - HPL_dscal( Mm1, HPL_rone / WORK[0], Acur, 1 ); - HPL_daxpy( Mm1, -(*(Mptr( L1, jj+1, jj, n0 ))), Acur, 1, Anxt, 1 ); - HPL_dlocmax( PANEL, Mm1, iip1, jj+1, WORK ); - - if( Nm1 > 1 ) - { -#ifdef HPL_CALL_VSIPL -/* - * Create the matrix subviews - */ - Av1 = vsip_msubview_d( Av0, PANEL->ii+iip1, PANEL->jj+jj+2, - Mm1, Nm1-1 ); - Xv1 = vsip_msubview_d( Av0, PANEL->ii+iip1, PANEL->jj+jj, - Mm1, 1 ); - Yv1 = vsip_msubview_d( Yv0, jj+2, jj, Nm1-1, 1 ); - - vsip_gemp_d( -HPL_rone, Xv1, VSIP_MAT_NTRANS, Yv1, VSIP_MAT_TRANS, - HPL_rone, Av1 ); -/* - * Destroy the matrix subviews - */ - (void) vsip_mdestroy_d( Yv1 ); - (void) vsip_mdestroy_d( Xv1 ); - (void) vsip_mdestroy_d( Av1 ); -#else - HPL_dger( HplColumnMajor, Mm1, Nm1-1, -HPL_rone, Acur, 1, - Mptr( L1, jj+2, jj, n0 ), 1, Mptr( Anxt, 0, 1, lda ), - lda ); -#endif - } - if( curr != 0 ) { ii = iip1; iip1++; m = Mm1; Mm1--; } - - Nm1--; jj++; - } -/* - * Swap and broadcast last row - Scale last column by its absolute value - * max entry - */ - HPL_pdmxswp( PANEL, m, ii, jj, WORK ); - HPL_dlocswpT( PANEL, ii, jj, WORK ); - if( WORK[0] != HPL_rzero ) - HPL_dscal( Mm1, HPL_rone / WORK[0], Mptr( A, iip1, jj, lda ), 1 ); -#ifdef HPL_CALL_VSIPL -/* - * Release the blocks - */ - (void) vsip_blockrelease_d( vsip_mgetblock_d( Yv0 ), VSIP_TRUE ); - (void) vsip_blockrelease_d( vsip_mgetblock_d( Av0 ), VSIP_TRUE ); -/* - * Destroy the matrix views - */ - (void) vsip_mdestroy_d( Yv0 ); - (void) vsip_mdestroy_d( Av0 ); -#endif -#ifdef HPL_DETAILED_TIMING - HPL_ptimer( HPL_TIMING_PFACT ); -#endif -/* - * End of HPL_pdpanrlT - */ -} diff --git a/hpl/src/pfact/HPL_pdrpancrN.c b/hpl/src/pfact/HPL_pdrpancrN.c deleted file mode 100644 index 047e82128d0b61538dda5f88719c20439c92be8e..0000000000000000000000000000000000000000 --- a/hpl/src/pfact/HPL_pdrpancrN.c +++ /dev/null @@ -1,282 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. 
Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -void HPL_pdrpancrN -( - HPL_T_panel * PANEL, - const int M, - const int N, - const int ICOFF, - double * WORK -) -#else -void HPL_pdrpancrN -( PANEL, M, N, ICOFF, WORK ) - HPL_T_panel * PANEL; - const int M; - const int N; - const int ICOFF; - double * WORK; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_pdrpancrN recursively factorizes a panel of columns using the - * recursive Crout variant of the usual one-dimensional algorithm. The - * lower triangular N0-by-N0 upper block of the panel is stored in - * no-transpose form (i.e. just like the input matrix itself). - * - * Bi-directional exchange is used to perform the swap::broadcast - * operations at once for one column in the panel. This results in a - * lower number of slightly larger messages than usual. On P processes - * and assuming bi-directional links, the running time of this function - * can be approximated by (when N is equal to N0): - * - * N0 * log_2( P ) * ( lat + ( 2*N0 + 4 ) / bdwth ) + - * N0^2 * ( M - N0/3 ) * gam2-3 - * - * where M is the local number of rows of the panel, lat and bdwth are - * the latency and bandwidth of the network for double precision real - * words, and gam2-3 is an estimate of the Level 2 and Level 3 BLAS - * rate of execution. 
The recursive algorithm allows indeed to almost - * achieve Level 3 BLAS performance in the panel factorization. On a - * large number of modern machines, this operation is however latency - * bound, meaning that its cost can be estimated by only the latency - * portion N0 * log_2(P) * lat. Mono-directional links will double this - * communication cost. - * - * Arguments - * ========= - * - * PANEL (local input/output) HPL_T_panel * - * On entry, PANEL points to the data structure containing the - * panel information. - * - * M (local input) const int - * On entry, M specifies the local number of rows of sub(A). - * - * N (local input) const int - * On entry, N specifies the local number of columns of sub(A). - * - * ICOFF (global input) const int - * On entry, ICOFF specifies the row and column offset of sub(A) - * in A. - * - * WORK (local workspace) double * - * On entry, WORK is a workarray of size at least 2*(4+2*N0). - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - double * A, * Aptr, * L1, * L1ptr; -#ifdef HPL_CALL_VSIPL - vsip_mview_d * Av0, * Lv0, * Av1, * Av2, * Lv1; -#endif - int curr, ii, ioff, jb, jj, lda, m, n, n0, nb, - nbdiv, nbmin; -/* .. - * .. Executable Statements .. - */ - if( N <= ( nbmin = PANEL->algo->nbmin ) ) - { PANEL->algo->pffun( PANEL, M, N, ICOFF, WORK ); return; } -/* - * Find new recursive blocking factor. To avoid an infinite loop, one - * must guarantee: 1 <= jb < N, knowing that N is greater than NBMIN. - * First, we compute nblocks: the number of blocks of size NBMIN in N, - * including the last one that may be smaller. nblocks is thus larger - * than or equal to one, since N >= NBMIN. - * The ratio ( nblocks + NDIV - 1 ) / NDIV is thus larger than or equal - * to one as well. For NDIV >= 2, we are guaranteed that the quan- - * tity ( ( nblocks + NDIV - 1 ) / NDIV ) * NBMIN is less than N and - * greater than or equal to NBMIN. 
- */ - nbdiv = PANEL->algo->nbdiv; ii = jj = 0; m = M; n = N; - nb = jb = ( (((N+nbmin-1) / nbmin) + nbdiv - 1) / nbdiv ) * nbmin; - - A = PANEL->A; lda = PANEL->lda; - L1 = PANEL->L1; n0 = PANEL->jb; - L1ptr = Mptr( L1, ICOFF, ICOFF, n0 ); - curr = (int)( PANEL->grid->myrow == PANEL->prow ); - - if( curr != 0 ) Aptr = Mptr( A, ICOFF, ICOFF, lda ); - else Aptr = Mptr( A, 0, ICOFF, lda ); -/* - * The triangular solve is replicated in every process row. The panel - * factorization is such that the first rows of A are accumulated in - * every process row during the (panel) swapping phase. We ensure this - * way a minimum amount of communication during the entire panel facto- - * rization. - */ - do - { - n -= jb; ioff = ICOFF + jj; -/* - * Local update - Factor current panel - Replicated update and solve - */ -#ifdef HPL_CALL_VSIPL -/* - * Admit the blocks - */ - (void) vsip_blockadmit_d( PANEL->Ablock, VSIP_TRUE ); - (void) vsip_blockadmit_d( PANEL->L1block, VSIP_TRUE ); -/* - * Create the matrix views - */ - Av0 = vsip_mbind_d( PANEL->Ablock, 0, 1, lda, lda, PANEL->pmat->nq ); - Lv0 = vsip_mbind_d( PANEL->L1block, 0, 1, n0, n0, n0 ); -/* - * Create the matrix subviews - */ - if( curr != 0 ) - { - Av1 = vsip_msubview_d( Av0, PANEL->ii+ICOFF+ii, PANEL->jj+ICOFF, - m, jj ); - Av2 = vsip_msubview_d( Av0, PANEL->ii+ICOFF+ii, PANEL->jj+ioff, - m, jb ); - } - else - { - Av1 = vsip_msubview_d( Av0, PANEL->ii+ii, PANEL->jj+ICOFF, m, jj ); - Av2 = vsip_msubview_d( Av0, PANEL->ii+ii, PANEL->jj+ioff, m, jb ); - } - Lv1 = vsip_msubview_d( Lv0, ICOFF, ioff, jj, jb ); - - vsip_gemp_d( -HPL_rone, Av1, VSIP_MAT_NTRANS, Lv1, VSIP_MAT_NTRANS, - HPL_rone, Av2 ); -/* - * Destroy the matrix subviews - */ - (void) vsip_mdestroy_d( Lv1 ); - (void) vsip_mdestroy_d( Av2 ); - (void) vsip_mdestroy_d( Av1 ); -/* - * Release the blocks - */ - (void) vsip_blockrelease_d( vsip_mgetblock_d( Lv0 ), VSIP_TRUE ); - (void) vsip_blockrelease_d( vsip_mgetblock_d( Av0 ), VSIP_TRUE ); -/* - * Destroy the 
matrix views - */ - (void) vsip_mdestroy_d( Lv0 ); - (void) vsip_mdestroy_d( Av0 ); -#else - HPL_dgemm( HplColumnMajor, HplNoTrans, HplNoTrans, m, jb, jj, - -HPL_rone, Mptr( Aptr, ii, 0, lda ), lda, Mptr( L1ptr, - 0, jj, n0 ), n0, HPL_rone, Mptr( Aptr, ii, jj, lda ), - lda ); -#endif - HPL_pdrpancrN( PANEL, m, jb, ioff, WORK ); - - if( n > 0 ) - { -#ifdef HPL_CALL_VSIPL -/* - * Admit the blocks - */ - (void) vsip_blockadmit_d( PANEL->L1block, VSIP_TRUE ); -/* - * Create the matrix views - */ - Lv0 = vsip_mbind_d( PANEL->L1block, 0, 1, n0, n0, n0 ); -/* - * Create the matrix subviews - */ - Av1 = vsip_msubview_d( Lv0, ioff, ICOFF, jb, jj ); - Av2 = vsip_msubview_d( Lv0, ioff, ioff+jb, jb, n ); - Lv1 = vsip_msubview_d( Lv0, ICOFF, ioff+jb, jj, n ); - - vsip_gemp_d( -HPL_rone, Av1, VSIP_MAT_NTRANS, Lv1, VSIP_MAT_NTRANS, - HPL_rone, Av2 ); -/* - * Destroy the matrix subviews - */ - (void) vsip_mdestroy_d( Lv1 ); - (void) vsip_mdestroy_d( Av2 ); - (void) vsip_mdestroy_d( Av1 ); -/* - * Release the blocks - */ - (void) vsip_blockrelease_d( vsip_mgetblock_d( Lv0 ), VSIP_TRUE ); -/* - * Destroy the matrix views - */ - (void) vsip_mdestroy_d( Lv0 ); -#else - HPL_dgemm( HplColumnMajor, HplNoTrans, HplNoTrans, jb, n, - jj, -HPL_rone, Mptr( L1ptr, jj, 0, n0 ), n0, - Mptr( L1ptr, 0, jj+jb, n0 ), n0, HPL_rone, - Mptr( L1ptr, jj, jj+jb, n0 ), n0 ); -#endif - HPL_dtrsm( HplColumnMajor, HplLeft, HplLower, HplNoTrans, - HplUnit, jb, n, HPL_rone, Mptr( L1ptr, jj, jj, - n0 ), n0, Mptr( L1ptr, jj, jj+jb, n0 ), n0 ); - } -/* - * Copy back upper part of A in current process row - Go the next block - */ - if( curr != 0 ) - { - HPL_dlacpy( ioff, jb, Mptr( L1, 0, ioff, n0 ), n0, - Mptr( A, 0, ioff, lda ), lda ); - ii += jb; m -= jb; - } - jj += jb; jb = Mmin( n, nb ); - - } while( n > 0 ); -/* - * End of HPL_pdrpancrN - */ -} diff --git a/hpl/src/pfact/HPL_pdrpancrT.c b/hpl/src/pfact/HPL_pdrpancrT.c deleted file mode 100644 index 
6ef5fa42f264f01bd5a7e64173cfce543fae98cd..0000000000000000000000000000000000000000 --- a/hpl/src/pfact/HPL_pdrpancrT.c +++ /dev/null @@ -1,282 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -void HPL_pdrpancrT -( - HPL_T_panel * PANEL, - const int M, - const int N, - const int ICOFF, - double * WORK -) -#else -void HPL_pdrpancrT -( PANEL, M, N, ICOFF, WORK ) - HPL_T_panel * PANEL; - const int M; - const int N; - const int ICOFF; - double * WORK; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_pdrpancrT recursively factorizes a panel of columns using the - * recursive Crout variant of the usual one-dimensional algorithm. - * The lower triangular N0-by-N0 upper block of the panel is stored in - * transpose form. - * - * Bi-directional exchange is used to perform the swap::broadcast - * operations at once for one column in the panel. This results in a - * lower number of slightly larger messages than usual. On P processes - * and assuming bi-directional links, the running time of this function - * can be approximated by (when N is equal to N0): - * - * N0 * log_2( P ) * ( lat + ( 2*N0 + 4 ) / bdwth ) + - * N0^2 * ( M - N0/3 ) * gam2-3 - * - * where M is the local number of rows of the panel, lat and bdwth are - * the latency and bandwidth of the network for double precision real - * words, and gam2-3 is an estimate of the Level 2 and Level 3 BLAS - * rate of execution. The recursive algorithm allows indeed to almost - * achieve Level 3 BLAS performance in the panel factorization. 
On a - * large number of modern machines, this operation is however latency - * bound, meaning that its cost can be estimated by only the latency - * portion N0 * log_2(P) * lat. Mono-directional links will double this - * communication cost. - * - * Arguments - * ========= - * - * PANEL (local input/output) HPL_T_panel * - * On entry, PANEL points to the data structure containing the - * panel information. - * - * M (local input) const int - * On entry, M specifies the local number of rows of sub(A). - * - * N (local input) const int - * On entry, N specifies the local number of columns of sub(A). - * - * ICOFF (global input) const int - * On entry, ICOFF specifies the row and column offset of sub(A) - * in A. - * - * WORK (local workspace) double * - * On entry, WORK is a workarray of size at least 2*(4+2*N0). - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - double * A, * Aptr, * L1, * L1ptr; -#ifdef HPL_CALL_VSIPL - vsip_mview_d * Av0, * Lv0, * Av1, * Av2, * Lv1; -#endif - int curr, ii, ioff, jb, jj, lda, m, n, n0, nb, - nbdiv, nbmin; -/* .. - * .. Executable Statements .. - */ - if( N <= ( nbmin = PANEL->algo->nbmin ) ) - { PANEL->algo->pffun( PANEL, M, N, ICOFF, WORK ); return; } -/* - * Find new recursive blocking factor. To avoid an infinite loop, one - * must guarantee: 1 <= jb < N, knowing that N is greater than NBMIN. - * First, we compute nblocks: the number of blocks of size NBMIN in N, - * including the last one that may be smaller. nblocks is thus larger - * than or equal to one, since N >= NBMIN. - * The ratio ( nblocks + NDIV - 1 ) / NDIV is thus larger than or equal - * to one as well. For NDIV >= 2, we are guaranteed that the quan- - * tity ( ( nblocks + NDIV - 1 ) / NDIV ) * NBMIN is less than N and - * greater than or equal to NBMIN. 
- */ - nbdiv = PANEL->algo->nbdiv; ii = jj = 0; m = M; n = N; - nb = jb = ( (((N+nbmin-1) / nbmin) + nbdiv - 1) / nbdiv ) * nbmin; - - A = PANEL->A; lda = PANEL->lda; - L1 = PANEL->L1; n0 = PANEL->jb; - L1ptr = Mptr( L1, ICOFF, ICOFF, n0 ); - curr = (int)( PANEL->grid->myrow == PANEL->prow ); - - if( curr != 0 ) Aptr = Mptr( A, ICOFF, ICOFF, lda ); - else Aptr = Mptr( A, 0, ICOFF, lda ); -/* - * The triangular solve is replicated in every process row. The panel - * factorization is such that the first rows of A are accumulated in - * every process row during the (panel) swapping phase. We ensure this - * way a minimum amount of communication during the entire panel facto- - * rization. - */ - do - { - n -= jb; ioff = ICOFF + jj; -/* - * Local update - Factor current panel - Replicated update and solve - */ -#ifdef HPL_CALL_VSIPL -/* - * Admit the blocks - */ - (void) vsip_blockadmit_d( PANEL->Ablock, VSIP_TRUE ); - (void) vsip_blockadmit_d( PANEL->L1block, VSIP_TRUE ); -/* - * Create the matrix views - */ - Av0 = vsip_mbind_d( PANEL->Ablock, 0, 1, lda, lda, PANEL->pmat->nq ); - Lv0 = vsip_mbind_d( PANEL->L1block, 0, 1, n0, n0, n0 ); -/* - * Create the matrix subviews - */ - if( curr != 0 ) - { - Av1 = vsip_msubview_d( Av0, PANEL->ii+ICOFF+ii, PANEL->jj+ICOFF, - m, jj ); - Av2 = vsip_msubview_d( Av0, PANEL->ii+ICOFF+ii, PANEL->jj+ioff, - m, jb ); - } - else - { - Av1 = vsip_msubview_d( Av0, PANEL->ii+ii, PANEL->jj+ICOFF, m, jj ); - Av2 = vsip_msubview_d( Av0, PANEL->ii+ii, PANEL->jj+ioff, m, jb ); - } - Lv1 = vsip_msubview_d( Lv0, ioff, ICOFF, jb, jj ); - - vsip_gemp_d( -HPL_rone, Av1, VSIP_MAT_NTRANS, Lv1, - VSIP_MAT_TRANS, HPL_rone, Av2 ); -/* - * Destroy the matrix subviews - */ - (void) vsip_mdestroy_d( Lv1 ); - (void) vsip_mdestroy_d( Av2 ); - (void) vsip_mdestroy_d( Av1 ); -/* - * Release the blocks - */ - (void) vsip_blockrelease_d( vsip_mgetblock_d( Lv0 ), VSIP_TRUE ); - (void) vsip_blockrelease_d( vsip_mgetblock_d( Av0 ), VSIP_TRUE ); -/* - * Destroy the 
matrix views - */ - (void) vsip_mdestroy_d( Lv0 ); - (void) vsip_mdestroy_d( Av0 ); -#else - HPL_dgemm( HplColumnMajor, HplNoTrans, HplTrans, m, jb, jj, - -HPL_rone, Mptr( Aptr, ii, 0, lda ), lda, Mptr( L1ptr, - jj, 0, n0 ), n0, HPL_rone, Mptr( Aptr, ii, jj, lda ), - lda ); -#endif - HPL_pdrpancrT( PANEL, m, jb, ioff, WORK ); - - if( n > 0 ) - { -#ifdef HPL_CALL_VSIPL -/* - * Admit the blocks - */ - (void) vsip_blockadmit_d( PANEL->L1block, VSIP_TRUE ); -/* - * Create the matrix views - */ - Lv0 = vsip_mbind_d( PANEL->L1block, 0, 1, n0, n0, n0 ); -/* - * Create the matrix subviews - */ - Av1 = vsip_msubview_d( Lv0, ioff+jb, ICOFF, n, jj ); - Av2 = vsip_msubview_d( Lv0, ioff+jb, ioff, n, jb ); - Lv1 = vsip_msubview_d( Lv0, ICOFF, ioff, jj, jb ); - - vsip_gemp_d( -HPL_rone, Av1, VSIP_MAT_NTRANS, Lv1, - VSIP_MAT_NTRANS, HPL_rone, Av2 ); -/* - * Destroy the matrix subviews - */ - (void) vsip_mdestroy_d( Lv1 ); - (void) vsip_mdestroy_d( Av2 ); - (void) vsip_mdestroy_d( Av1 ); -/* - * Release the blocks - */ - (void) vsip_blockrelease_d( vsip_mgetblock_d( Lv0 ), VSIP_TRUE ); -/* - * Destroy the matrix views - */ - (void) vsip_mdestroy_d( Lv0 ); -#else - HPL_dgemm( HplColumnMajor, HplNoTrans, HplNoTrans, n, jb, - jj, -HPL_rone, Mptr( L1ptr, jj+jb, 0, n0 ), n0, - Mptr( L1ptr, 0, jj, n0 ), n0, HPL_rone, - Mptr( L1ptr, jj+jb, jj, n0 ), n0 ); -#endif - HPL_dtrsm( HplColumnMajor, HplRight, HplUpper, HplNoTrans, - HplUnit, n, jb, HPL_rone, Mptr( L1ptr, jj, jj, - n0 ), n0, Mptr( L1ptr, jj+jb, jj, n0 ), n0 ); - } -/* - * Copy back upper part of A in current process row - Go the next block - */ - if( curr != 0 ) - { - HPL_dlatcpy( ioff, jb, Mptr( L1, ioff, 0, n0 ), n0, - Mptr( A, 0, ioff, lda ), lda ); - ii += jb; m -= jb; - } - jj += jb; jb = Mmin( n, nb ); - - } while( n > 0 ); -/* - * End of HPL_pdrpancrT - */ -} diff --git a/hpl/src/pfact/HPL_pdrpanllN.c b/hpl/src/pfact/HPL_pdrpanllN.c deleted file mode 100644 index 
69adaf737ebbf6ca7cf34ca26c743ab903556407..0000000000000000000000000000000000000000 --- a/hpl/src/pfact/HPL_pdrpanllN.c +++ /dev/null @@ -1,240 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -void HPL_pdrpanllN -( - HPL_T_panel * PANEL, - const int M, - const int N, - const int ICOFF, - double * WORK -) -#else -void HPL_pdrpanllN -( PANEL, M, N, ICOFF, WORK ) - HPL_T_panel * PANEL; - const int M; - const int N; - const int ICOFF; - double * WORK; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_pdrpanllN recursively factorizes a panel of columns using the - * recursive Left-looking variant of the one-dimensional algorithm. The - * lower triangular N0-by-N0 upper block of the panel is stored in - * no-transpose form (i.e. just like the input matrix itself). - * - * Bi-directional exchange is used to perform the swap::broadcast - * operations at once for one column in the panel. This results in a - * lower number of slightly larger messages than usual. On P processes - * and assuming bi-directional links, the running time of this function - * can be approximated by (when N is equal to N0): - * - * N0 * log_2( P ) * ( lat + ( 2*N0 + 4 ) / bdwth ) + - * N0^2 * ( M - N0/3 ) * gam2-3 - * - * where M is the local number of rows of the panel, lat and bdwth are - * the latency and bandwidth of the network for double precision real - * words, and gam2-3 is an estimate of the Level 2 and Level 3 BLAS - * rate of execution. 
The recursive algorithm allows indeed to almost - * achieve Level 3 BLAS performance in the panel factorization. On a - * large number of modern machines, this operation is however latency - * bound, meaning that its cost can be estimated by only the latency - * portion N0 * log_2(P) * lat. Mono-directional links will double this - * communication cost. - * - * Arguments - * ========= - * - * PANEL (local input/output) HPL_T_panel * - * On entry, PANEL points to the data structure containing the - * panel information. - * - * M (local input) const int - * On entry, M specifies the local number of rows of sub(A). - * - * N (local input) const int - * On entry, N specifies the local number of columns of sub(A). - * - * ICOFF (global input) const int - * On entry, ICOFF specifies the row and column offset of sub(A) - * in A. - * - * WORK (local workspace) double * - * On entry, WORK is a workarray of size at least 2*(4+2*N0). - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - double * A, * Aptr, * L1, * L1ptr; -#ifdef HPL_CALL_VSIPL - vsip_mview_d * Av0, * Lv0, * Av1, * Av2, * Lv1; -#endif - int curr, ii, ioff, jb, jj, lda, m, n, n0, nb, - nbdiv, nbmin; -/* .. - * .. Executable Statements .. - */ - if( N <= ( nbmin = PANEL->algo->nbmin ) ) - { PANEL->algo->pffun( PANEL, M, N, ICOFF, WORK ); return; } -/* - * Find new recursive blocking factor. To avoid an infinite loop, one - * must guarantee: 1 <= jb < N, knowing that N is greater than NBMIN. - * First, we compute nblocks: the number of blocks of size NBMIN in N, - * including the last one that may be smaller. nblocks is thus larger - * than or equal to one, since N >= NBMIN. - * The ratio ( nblocks + NDIV - 1 ) / NDIV is thus larger than or equal - * to one as well. For NDIV >= 2, we are guaranteed that the quan- - * tity ( ( nblocks + NDIV - 1 ) / NDIV ) * NBMIN is less than N and - * greater than or equal to NBMIN. 
- */ - nbdiv = PANEL->algo->nbdiv; ii = jj = 0; m = M; n = N; - nb = jb = ( (((N+nbmin-1) / nbmin) + nbdiv - 1) / nbdiv ) * nbmin; - - A = PANEL->A; lda = PANEL->lda; - L1 = PANEL->L1; n0 = PANEL->jb; - L1ptr = Mptr( L1, ICOFF, ICOFF, n0 ); - curr = (int)( PANEL->grid->myrow == PANEL->prow ); - - if( curr != 0 ) Aptr = Mptr( A, ICOFF, ICOFF, lda ); - else Aptr = Mptr( A, 0, ICOFF, lda ); -/* - * The triangular solve is replicated in every process row. The panel - * factorization is such that the first rows of A are accumulated in - * every process row during the (panel) swapping phase. We ensure this - * way a minimum amount of communication during the entire panel facto- - * rization. - */ - do - { - n -= jb; ioff = ICOFF + jj; -/* - * Replicated solve - Local update - Factor current panel - */ - HPL_dtrsm( HplColumnMajor, HplLeft, HplLower, HplNoTrans, HplUnit, - jj, jb, HPL_rone, L1ptr, n0, Mptr( L1ptr, 0, jj, n0 ), - n0 ); -#ifdef HPL_CALL_VSIPL -/* - * Admit the blocks - */ - (void) vsip_blockadmit_d( PANEL->Ablock, VSIP_TRUE ); - (void) vsip_blockadmit_d( PANEL->L1block, VSIP_TRUE ); -/* - * Create the matrix views - */ - Av0 = vsip_mbind_d( PANEL->Ablock, 0, 1, lda, lda, PANEL->pmat->nq ); - Lv0 = vsip_mbind_d( PANEL->L1block, 0, 1, n0, n0, n0 ); -/* - * Create the matrix subviews - */ - if( curr != 0 ) - { - Av1 = vsip_msubview_d( Av0, PANEL->ii+ICOFF+ii, PANEL->jj+ICOFF, - m, jj ); - Av2 = vsip_msubview_d( Av0, PANEL->ii+ICOFF+ii, PANEL->jj+ioff, - m, jj ); - } - else - { - Av1 = vsip_msubview_d( Av0, PANEL->ii+ii, PANEL->jj+ICOFF, m, jj ); - Av2 = vsip_msubview_d( Av0, PANEL->ii+ii, PANEL->jj+ioff, m, jj ); - } - Lv1 = vsip_msubview_d( Lv0, ICOFF, ioff, jj, jb ); - - vsip_gemp_d( -HPL_rone, Av1, VSIP_MAT_NTRANS, Lv1, VSIP_MAT_NTRANS, - HPL_rone, Av2 ); -/* - * Destroy the matrix subviews - */ - (void) vsip_mdestroy_d( Lv1 ); - (void) vsip_mdestroy_d( Av2 ); - (void) vsip_mdestroy_d( Av1 ); -/* - * Release the blocks - */ - (void) vsip_blockrelease_d( 
vsip_mgetblock_d( Lv0 ), VSIP_TRUE ); - (void) vsip_blockrelease_d( vsip_mgetblock_d( Av0 ), VSIP_TRUE ); -/* - * Destroy the matrix views - */ - (void) vsip_mdestroy_d( Lv0 ); - (void) vsip_mdestroy_d( Av0 ); -#else - HPL_dgemm( HplColumnMajor, HplNoTrans, HplNoTrans, m, jb, - jj, -HPL_rone, Mptr( Aptr, ii, 0, lda ), lda, - Mptr( L1ptr, 0, jj, n0 ), n0, HPL_rone, - Mptr( Aptr, ii, jj, lda ), lda ); -#endif - HPL_pdrpanllN( PANEL, m, jb, ioff, WORK ); -/* - * Copy back upper part of A in current process row - Go the next block - */ - if( curr != 0 ) - { - HPL_dlacpy( ioff, jb, Mptr( L1, 0, ioff, n0 ), n0, - Mptr( A, 0, ioff, lda ), lda ); - ii += jb; m -= jb; - } - jj += jb; jb = Mmin( n, nb ); - - } while( n > 0 ); -/* - * End of HPL_pdrpanllN - */ -} diff --git a/hpl/src/pfact/HPL_pdrpanllT.c b/hpl/src/pfact/HPL_pdrpanllT.c deleted file mode 100644 index 28e20132208521c50ec1078ed69ab6a477f4c6b6..0000000000000000000000000000000000000000 --- a/hpl/src/pfact/HPL_pdrpanllT.c +++ /dev/null @@ -1,240 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. 
All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -void HPL_pdrpanllT -( - HPL_T_panel * PANEL, - const int M, - const int N, - const int ICOFF, - double * WORK -) -#else -void HPL_pdrpanllT -( PANEL, M, N, ICOFF, WORK ) - HPL_T_panel * PANEL; - const int M; - const int N; - const int ICOFF; - double * WORK; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_pdrpanllT recursively factorizes a panel of columns using the - * recursive Left-looking variant of the one-dimensional algorithm. The - * lower triangular N0-by-N0 upper block of the panel is stored in - * transpose form. 
- * - * Bi-directional exchange is used to perform the swap::broadcast - * operations at once for one column in the panel. This results in a - * lower number of slightly larger messages than usual. On P processes - * and assuming bi-directional links, the running time of this function - * can be approximated by (when N is equal to N0): - * - * N0 * log_2( P ) * ( lat + ( 2*N0 + 4 ) / bdwth ) + - * N0^2 * ( M - N0/3 ) * gam2-3 - * - * where M is the local number of rows of the panel, lat and bdwth are - * the latency and bandwidth of the network for double precision real - * words, and gam2-3 is an estimate of the Level 2 and Level 3 BLAS - * rate of execution. The recursive algorithm allows indeed to almost - * achieve Level 3 BLAS performance in the panel factorization. On a - * large number of modern machines, this operation is however latency - * bound, meaning that its cost can be estimated by only the latency - * portion N0 * log_2(P) * lat. Mono-directional links will double this - * communication cost. - * - * Arguments - * ========= - * - * PANEL (local input/output) HPL_T_panel * - * On entry, PANEL points to the data structure containing the - * panel information. - * - * M (local input) const int - * On entry, M specifies the local number of rows of sub(A). - * - * N (local input) const int - * On entry, N specifies the local number of columns of sub(A). - * - * ICOFF (global input) const int - * On entry, ICOFF specifies the row and column offset of sub(A) - * in A. - * - * WORK (local workspace) double * - * On entry, WORK is a workarray of size at least 2*(4+2*N0). - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - double * A, * Aptr, * L1, * L1ptr; -#ifdef HPL_CALL_VSIPL - vsip_mview_d * Av0, * Lv0, * Av1, * Av2, * Lv1; -#endif - int curr, ii, ioff, jb, jj, lda, m, n, n0, nb, - nbdiv, nbmin; -/* .. - * .. Executable Statements .. 
- */ - if( N <= ( nbmin = PANEL->algo->nbmin ) ) - { PANEL->algo->pffun( PANEL, M, N, ICOFF, WORK ); return; } -/* - * Find new recursive blocking factor. To avoid an infinite loop, one - * must guarantee: 1 <= jb < N, knowing that N is greater than NBMIN. - * First, we compute nblocks: the number of blocks of size NBMIN in N, - * including the last one that may be smaller. nblocks is thus larger - * than or equal to one, since N >= NBMIN. - * The ratio ( nblocks + NDIV - 1 ) / NDIV is thus larger than or equal - * to one as well. For NDIV >= 2, we are guaranteed that the quan- - * tity ( ( nblocks + NDIV - 1 ) / NDIV ) * NBMIN is less than N and - * greater than or equal to NBMIN. - */ - nbdiv = PANEL->algo->nbdiv; ii = jj = 0; m = M; n = N; - nb = jb = ( (((N+nbmin-1) / nbmin) + nbdiv - 1) / nbdiv ) * nbmin; - - A = PANEL->A; lda = PANEL->lda; - L1 = PANEL->L1; n0 = PANEL->jb; - L1ptr = Mptr( L1, ICOFF, ICOFF, n0 ); - curr = (int)( PANEL->grid->myrow == PANEL->prow ); - - if( curr != 0 ) Aptr = Mptr( A, ICOFF, ICOFF, lda ); - else Aptr = Mptr( A, 0, ICOFF, lda ); -/* - * The triangular solve is replicated in every process row. The panel - * factorization is such that the first rows of A are accumulated in - * every process row during the (panel) swapping phase. We ensure this - * way a minimum amount of communication during the entire panel facto- - * rization. 
- */ - do - { - n -= jb; ioff = ICOFF + jj; -/* - * Replicated solve - Local update - Factor current panel - */ - HPL_dtrsm( HplColumnMajor, HplRight, HplUpper, HplNoTrans, - HplUnit, jb, jj, HPL_rone, L1ptr, n0, Mptr( L1ptr, - jj, 0, n0 ), n0 ); -#ifdef HPL_CALL_VSIPL -/* - * Admit the blocks - */ - (void) vsip_blockadmit_d( PANEL->Ablock, VSIP_TRUE ); - (void) vsip_blockadmit_d( PANEL->L1block, VSIP_TRUE ); -/* - * Create the matrix views - */ - Av0 = vsip_mbind_d( PANEL->Ablock, 0, 1, lda, lda, PANEL->pmat->nq ); - Lv0 = vsip_mbind_d( PANEL->L1block, 0, 1, n0, n0, n0 ); -/* - * Create the matrix subviews - */ - if( curr != 0 ) - { - Av1 = vsip_msubview_d( Av0, PANEL->ii+ICOFF+ii, PANEL->jj+ICOFF, - m, jj ); - Av2 = vsip_msubview_d( Av0, PANEL->ii+ICOFF+ii, PANEL->jj+ioff, - m, jj ); - } - else - { - Av1 = vsip_msubview_d( Av0, PANEL->ii+ii, PANEL->jj+ICOFF, m, jj ); - Av2 = vsip_msubview_d( Av0, PANEL->ii+ii, PANEL->jj+ioff, m, jj ); - } - Lv1 = vsip_msubview_d( Lv0, ioff, ICOFF, jb, jj ); - - vsip_gemp_d( -HPL_rone, Av1, VSIP_MAT_NTRANS, Lv1, VSIP_MAT_TRANS, - HPL_rone, Av2 ); -/* - * Destroy the matrix subviews - */ - (void) vsip_mdestroy_d( Lv1 ); - (void) vsip_mdestroy_d( Av1 ); - (void) vsip_mdestroy_d( Av2 ); -/* - * Release the blocks - */ - (void) vsip_blockrelease_d( vsip_mgetblock_d( Lv0 ), VSIP_TRUE ); - (void) vsip_blockrelease_d( vsip_mgetblock_d( Av0 ), VSIP_TRUE ); -/* - * Destroy the matrix views - */ - (void) vsip_mdestroy_d( Lv0 ); - (void) vsip_mdestroy_d( Av0 ); -#else - HPL_dgemm( HplColumnMajor, HplNoTrans, HplTrans, m, jb, - jj, -HPL_rone, Mptr( Aptr, ii, 0, lda ), lda, - Mptr( L1ptr, jj, 0, n0 ), n0, HPL_rone, - Mptr( Aptr, ii, jj, lda ), lda ); -#endif - HPL_pdrpanllT( PANEL, m, jb, ioff, WORK ); -/* - * Copy back upper part of A in current process row - Go the next block - */ - if( curr != 0 ) - { - HPL_dlatcpy( ioff, jb, Mptr( L1, ioff, 0, n0 ), n0, - Mptr( A, 0, ioff, lda ), lda ); - ii += jb; m -= jb; - } - jj += jb; jb = Mmin( n, nb 
); - - } while( n > 0 ); -/* - * End of HPL_pdrpanllT - */ -} diff --git a/hpl/src/pfact/HPL_pdrpanrlN.c b/hpl/src/pfact/HPL_pdrpanrlN.c deleted file mode 100644 index 5f85b9cba352ad13415a2220a0f566c7083a30bf..0000000000000000000000000000000000000000 --- a/hpl/src/pfact/HPL_pdrpanrlN.c +++ /dev/null @@ -1,240 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -void HPL_pdrpanrlN -( - HPL_T_panel * PANEL, - const int M, - const int N, - const int ICOFF, - double * WORK -) -#else -void HPL_pdrpanrlN -( PANEL, M, N, ICOFF, WORK ) - HPL_T_panel * PANEL; - const int M; - const int N; - const int ICOFF; - double * WORK; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_pdrpanrlN recursively factorizes a panel of columns using the - * recursive Right-looking variant of the one-dimensional algorithm. The - * lower triangular N0-by-N0 upper block of the panel is stored in - * no-transpose form (i.e. just like the input matrix itself). - * - * Bi-directional exchange is used to perform the swap::broadcast - * operations at once for one column in the panel. This results in a - * lower number of slightly larger messages than usual. On P processes - * and assuming bi-directional links, the running time of this function - * can be approximated by (when N is equal to N0): - * - * N0 * log_2( P ) * ( lat + ( 2*N0 + 4 ) / bdwth ) + - * N0^2 * ( M - N0/3 ) * gam2-3 - * - * where M is the local number of rows of the panel, lat and bdwth are - * the latency and bandwidth of the network for double precision real - * words, and gam2-3 is an estimate of the Level 2 and Level 3 BLAS - * rate of execution. 
The recursive algorithm allows indeed to almost - * achieve Level 3 BLAS performance in the panel factorization. On a - * large number of modern machines, this operation is however latency - * bound, meaning that its cost can be estimated by only the latency - * portion N0 * log_2(P) * lat. Mono-directional links will double this - * communication cost. - * - * Arguments - * ========= - * - * PANEL (local input/output) HPL_T_panel * - * On entry, PANEL points to the data structure containing the - * panel information. - * - * M (local input) const int - * On entry, M specifies the local number of rows of sub(A). - * - * N (local input) const int - * On entry, N specifies the local number of columns of sub(A). - * - * ICOFF (global input) const int - * On entry, ICOFF specifies the row and column offset of sub(A) - * in A. - * - * WORK (local workspace) double * - * On entry, WORK is a workarray of size at least 2*(4+2*N0). - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - double * A, * Aptr, * L1, * L1ptr; -#ifdef HPL_CALL_VSIPL - vsip_mview_d * Av0, * Lv0, * Av1, * Av2, * Lv1; -#endif - int curr, ii, ioff, jb, jj, lda, m, n, n0, nb, - nbdiv, nbmin; -/* .. - * .. Executable Statements .. - */ - if( N <= ( nbmin = PANEL->algo->nbmin ) ) - { PANEL->algo->pffun( PANEL, M, N, ICOFF, WORK ); return; } -/* - * Find new recursive blocking factor. To avoid an infinite loop, one - * must guarantee: 1 <= jb < N, knowing that N is greater than NBMIN. - * First, we compute nblocks: the number of blocks of size NBMIN in N, - * including the last one that may be smaller. nblocks is thus larger - * than or equal to one, since N >= NBMIN. - * The ratio ( nblocks + NDIV - 1 ) / NDIV is thus larger than or equal - * to one as well. For NDIV >= 2, we are guaranteed that the quan- - * tity ( ( nblocks + NDIV - 1 ) / NDIV ) * NBMIN is less than N and - * greater than or equal to NBMIN. 
- */ - nbdiv = PANEL->algo->nbdiv; ii = jj = 0; m = M; n = N; - nb = jb = ( (((N+nbmin-1) / nbmin) + nbdiv - 1) / nbdiv ) * nbmin; - - A = PANEL->A; lda = PANEL->lda; - L1 = PANEL->L1; n0 = PANEL->jb; - L1ptr = Mptr( L1, ICOFF, ICOFF, n0 ); - curr = (int)( PANEL->grid->myrow == PANEL->prow ); - - if( curr != 0 ) Aptr = Mptr( A, ICOFF, ICOFF, lda ); - else Aptr = Mptr( A, 0, ICOFF, lda ); -/* - * The triangular solve is replicated in every process row. The panel - * factorization is such that the first rows of A are accumulated in - * every process row during the (panel) swapping phase. We ensure this - * way a minimum amount of communication during the entire panel facto- - * rization. - */ - do - { - n -= jb; ioff = ICOFF + jj; -/* - * Factor current panel - Replicated solve - Local update - */ - HPL_pdrpanrlN( PANEL, m, jb, ioff, WORK ); - HPL_dtrsm( HplColumnMajor, HplLeft, HplLower, HplNoTrans, - HplUnit, jb, n, HPL_rone, Mptr( L1ptr, jj, jj, n0 ), - n0, Mptr( L1ptr, jj, jj+jb, n0 ), n0 ); - if( curr != 0 ) { ii += jb; m -= jb; } -#ifdef HPL_CALL_VSIPL -/* - * Admit the blocks - */ - (void) vsip_blockadmit_d( PANEL->Ablock, VSIP_TRUE ); - (void) vsip_blockadmit_d( PANEL->L1block, VSIP_TRUE ); -/* - * Create the matrix views - */ - Av0 = vsip_mbind_d( PANEL->Ablock, 0, 1, lda, lda, PANEL->pmat->nq ); - Lv0 = vsip_mbind_d( PANEL->L1block, 0, 1, n0, n0, n0 ); -/* - * Create the matrix subviews - */ - if( curr != 0 ) - { - Av1 = vsip_msubview_d( Av0, PANEL->ii+ICOFF+ii, PANEL->jj+ioff, - m, jb ); - Av2 = vsip_msubview_d( Av0, PANEL->ii+ICOFF+ii, PANEL->jj+ioff+jb, - m, n ); - } - else - { - Av1 = vsip_msubview_d( Av0, PANEL->ii+ii, PANEL->jj+ioff, m, jb ); - Av2 = vsip_msubview_d( Av0, PANEL->ii+ii, PANEL->jj+ioff+jb, m, n ); - } - Lv1 = vsip_msubview_d( Lv0, ioff, ioff+jb, jb, n ); - - vsip_gemp_d( -HPL_rone, Av1, VSIP_MAT_NTRANS, Lv1, VSIP_MAT_NTRANS, - HPL_rone, Av2 ); -/* - * Destroy the matrix subviews - */ - (void) vsip_mdestroy_d( Lv1 ); - (void) 
vsip_mdestroy_d( Av2 ); - (void) vsip_mdestroy_d( Av1 ); -/* - * Release the blocks - */ - (void) vsip_blockrelease_d( vsip_mgetblock_d( Lv0 ), VSIP_TRUE ); - (void) vsip_blockrelease_d( vsip_mgetblock_d( Av0 ), VSIP_TRUE ); -/* - * Destroy the matrix views - */ - (void) vsip_mdestroy_d( Lv0 ); - (void) vsip_mdestroy_d( Av0 ); -#else - HPL_dgemm( HplColumnMajor, HplNoTrans, HplNoTrans, m, n, - jb, -HPL_rone, Mptr( Aptr, ii, jj, lda ), lda, - Mptr( L1ptr, jj, jj+jb, n0 ), n0, HPL_rone, - Mptr( Aptr, ii, jj+jb, lda ), lda ); -#endif -/* - * Copy back upper part of A in current process row - Go the next block - */ - if( curr != 0 ) - { - HPL_dlacpy( ioff, jb, Mptr( L1, 0, ioff, n0 ), n0, - Mptr( A, 0, ioff, lda ), lda ); - } - jj += jb; jb = Mmin( n, nb ); - - } while( n > 0 ); -/* - * End of HPL_pdrpanrlN - */ -} diff --git a/hpl/src/pfact/HPL_pdrpanrlT.c b/hpl/src/pfact/HPL_pdrpanrlT.c deleted file mode 100644 index c3337fc125e335e4c9467ddc5abfc28298f9a0b4..0000000000000000000000000000000000000000 --- a/hpl/src/pfact/HPL_pdrpanrlT.c +++ /dev/null @@ -1,240 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. 
All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -void HPL_pdrpanrlT -( - HPL_T_panel * PANEL, - const int M, - const int N, - const int ICOFF, - double * WORK -) -#else -void HPL_pdrpanrlT -( PANEL, M, N, ICOFF, WORK ) - HPL_T_panel * PANEL; - const int M; - const int N; - const int ICOFF; - double * WORK; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_pdrpanrlT recursively factorizes a panel of columns using the - * recursive Right-looking variant of the one-dimensional algorithm. The - * lower triangular N0-by-N0 upper block of the panel is stored in - * transpose form. 
- * - * Bi-directional exchange is used to perform the swap::broadcast - * operations at once for one column in the panel. This results in a - * lower number of slightly larger messages than usual. On P processes - * and assuming bi-directional links, the running time of this function - * can be approximated by (when N is equal to N0): - * - * N0 * log_2( P ) * ( lat + ( 2*N0 + 4 ) / bdwth ) + - * N0^2 * ( M - N0/3 ) * gam2-3 - * - * where M is the local number of rows of the panel, lat and bdwth are - * the latency and bandwidth of the network for double precision real - * words, and gam2-3 is an estimate of the Level 2 and Level 3 BLAS - * rate of execution. The recursive algorithm allows indeed to almost - * achieve Level 3 BLAS performance in the panel factorization. On a - * large number of modern machines, this operation is however latency - * bound, meaning that its cost can be estimated by only the latency - * portion N0 * log_2(P) * lat. Mono-directional links will double this - * communication cost. - * - * Arguments - * ========= - * - * PANEL (local input/output) HPL_T_panel * - * On entry, PANEL points to the data structure containing the - * panel information. - * - * M (local input) const int - * On entry, M specifies the local number of rows of sub(A). - * - * N (local input) const int - * On entry, N specifies the local number of columns of sub(A). - * - * ICOFF (global input) const int - * On entry, ICOFF specifies the row and column offset of sub(A) - * in A. - * - * WORK (local workspace) double * - * On entry, WORK is a workarray of size at least 2*(4+2*N0). - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - double * A, * Aptr, * L1, * L1ptr; -#ifdef HPL_CALL_VSIPL - vsip_mview_d * Av0, * Lv0, * Av1, * Av2, * Lv1; -#endif - int curr, ii, ioff, jb, jj, lda, m, n, n0, nb, - nbdiv, nbmin; -/* .. - * .. Executable Statements .. 
- */ - if( N <= ( nbmin = PANEL->algo->nbmin ) ) - { PANEL->algo->pffun( PANEL, M, N, ICOFF, WORK ); return; } -/* - * Find new recursive blocking factor. To avoid an infinite loop, one - * must guarantee: 1 <= jb < N, knowing that N is greater than NBMIN. - * First, we compute nblocks: the number of blocks of size NBMIN in N, - * including the last one that may be smaller. nblocks is thus larger - * than or equal to one, since N >= NBMIN. - * The ratio ( nblocks + NDIV - 1 ) / NDIV is thus larger than or equal - * to one as well. For NDIV >= 2, we are guaranteed that the quan- - * tity ( ( nblocks + NDIV - 1 ) / NDIV ) * NBMIN is less than N and - * greater than or equal to NBMIN. - */ - nbdiv = PANEL->algo->nbdiv; ii = jj = 0; m = M; n = N; - nb = jb = ( (((N+nbmin-1) / nbmin) + nbdiv - 1) / nbdiv ) * nbmin; - - A = PANEL->A; lda = PANEL->lda; - L1 = PANEL->L1; n0 = PANEL->jb; - L1ptr = Mptr( L1, ICOFF, ICOFF, n0 ); - curr = (int)( PANEL->grid->myrow == PANEL->prow ); - - if( curr != 0 ) Aptr = Mptr( A, ICOFF, ICOFF, lda ); - else Aptr = Mptr( A, 0, ICOFF, lda ); -/* - * The triangular solve is replicated in every process row. The panel - * factorization is such that the first rows of A are accumulated in - * every process row during the (panel) swapping phase. We ensure this - * way a minimum amount of communication during the entire panel facto- - * rization. 
- */ - do - { - n -= jb; ioff = ICOFF + jj; -/* - * Factor current panel - Replicated solve - Local update - */ - HPL_pdrpanrlT( PANEL, m, jb, ioff, WORK ); - HPL_dtrsm( HplColumnMajor, HplRight, HplUpper, HplNoTrans, - HplUnit, n, jb, HPL_rone, Mptr( L1ptr, jj, jj, n0 ), - n0, Mptr( L1ptr, jj+jb, jj, n0 ), n0 ); - if( curr != 0 ) { ii += jb; m -= jb; } -#ifdef HPL_CALL_VSIPL -/* - * Admit the blocks - */ - (void) vsip_blockadmit_d( PANEL->Ablock, VSIP_TRUE ); - (void) vsip_blockadmit_d( PANEL->L1block, VSIP_TRUE ); -/* - * Create the matrix views - */ - Av0 = vsip_mbind_d( PANEL->Ablock, 0, 1, lda, lda, PANEL->pmat->nq ); - Lv0 = vsip_mbind_d( PANEL->L1block, 0, 1, n0, n0, n0 ); -/* - * Create the matrix subviews - */ - if( curr != 0 ) - { - Av1 = vsip_msubview_d( Av0, PANEL->ii+ICOFF+ii, PANEL->jj+ioff, - m, jb ); - Av2 = vsip_msubview_d( Av0, PANEL->ii+ICOFF+ii, PANEL->jj+ioff+jb, - m, N ); - } - else - { - Av1 = vsip_msubview_d( Av0, PANEL->ii+ii, PANEL->jj+ioff, m, jb ); - Av2 = vsip_msubview_d( Av0, PANEL->ii+ii, PANEL->jj+ioff+jb, m, n ); - } - Lv1 = vsip_msubview_d( Lv0, ioff+jb, ioff, n, jb ); - - vsip_gemp_d( -HPL_rone, Av1, VSIP_MAT_NTRANS, Lv1, VSIP_MAT_TRANS, - HPL_rone, Av2 ); -/* - * Destroy the matrix subviews - */ - (void) vsip_mdestroy_d( Lv1 ); - (void) vsip_mdestroy_d( Av2 ); - (void) vsip_mdestroy_d( Av1 ); -/* - * Release the blocks - */ - (void) vsip_blockrelease_d( vsip_mgetblock_d( Lv0 ), VSIP_TRUE ); - (void) vsip_blockrelease_d( vsip_mgetblock_d( Av0 ), VSIP_TRUE ); -/* - * Destroy the matrix views - */ - (void) vsip_mdestroy_d( Lv0 ); - (void) vsip_mdestroy_d( Av0 ); -#else - HPL_dgemm( HplColumnMajor, HplNoTrans, HplTrans, m, n, - jb, -HPL_rone, Mptr( Aptr, ii, jj, lda ), lda, - Mptr( L1ptr, jj+jb, jj, n0 ), n0, HPL_rone, - Mptr( Aptr, ii, jj+jb, lda ), lda ); -#endif -/* - * Copy back upper part of A in current process row - Go the next block - */ - if( curr != 0 ) - { - HPL_dlatcpy( ioff, jb, Mptr( L1, ioff, 0, n0 ), n0, - Mptr( A, 0, 
ioff, lda ), lda ); - } - jj += jb; jb = Mmin( n, nb ); - - } while( n > 0 ); -/* - * End of HPL_pdrpanrlT - */ -} diff --git a/hpl/src/pfact/intel64/Make.inc b/hpl/src/pfact/intel64/Make.inc deleted file mode 120000 index c1d8167d071e028cbeff544106b9f386bbdcac8a..0000000000000000000000000000000000000000 --- a/hpl/src/pfact/intel64/Make.inc +++ /dev/null @@ -1 +0,0 @@ -/home/valeriuc/work/PRACE/CodeVault/src/1_dense/hpl/Make.intel64 \ No newline at end of file diff --git a/hpl/src/pfact/intel64/Makefile b/hpl/src/pfact/intel64/Makefile deleted file mode 100644 index 8d3f5e0847dcc94b745aa52b8a5724acc38d575a..0000000000000000000000000000000000000000 --- a/hpl/src/pfact/intel64/Makefile +++ /dev/null @@ -1,118 +0,0 @@ -# -# -- High Performance Computing Linpack Benchmark (HPL) -# HPL - 2.1 - October 26, 2012 -# Antoine P. Petitet -# University of Tennessee, Knoxville -# Innovative Computing Laboratory -# (C) Copyright 2000-2008 All Rights Reserved -# -# -- Copyright notice and Licensing terms: -# -# Redistribution and use in source and binary forms, with or without -# modification, are permitted provided that the following conditions -# are met: -# -# 1. Redistributions of source code must retain the above copyright -# notice, this list of conditions and the following disclaimer. -# -# 2. Redistributions in binary form must reproduce the above copyright -# notice, this list of conditions, and the following disclaimer in the -# documentation and/or other materials provided with the distribution. -# -# 3. All advertising materials mentioning features or use of this -# software must display the following acknowledgement: -# This product includes software developed at the University of -# Tennessee, Knoxville, Innovative Computing Laboratory. -# -# 4. The name of the University, the name of the Laboratory, or the -# names of its contributors may not be used to endorse or promote -# products derived from this software without specific written -# permission. 
-# -# -- Disclaimer: -# -# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS -# ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR -# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY -# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, -# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT -# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, -# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY -# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT -# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. -# ###################################################################### -# -include Make.inc -# -# ###################################################################### -# -INCdep = \ - $(INCdir)/hpl_misc.h $(INCdir)/hpl_blas.h $(INCdir)/hpl_auxil.h \ - $(INCdir)/hpl_pmisc.h $(INCdir)/hpl_pauxil.h $(INCdir)/hpl_pfact.h -# -## Object files ######################################################## -# -HPL_pfaobj = \ - HPL_dlocmax.o HPL_dlocswpN.o HPL_dlocswpT.o \ - HPL_pdmxswp.o HPL_pdpancrN.o HPL_pdpancrT.o \ - HPL_pdpanllN.o HPL_pdpanllT.o HPL_pdpanrlN.o \ - HPL_pdpanrlT.o HPL_pdrpanllN.o HPL_pdrpanllT.o \ - HPL_pdrpancrN.o HPL_pdrpancrT.o HPL_pdrpanrlN.o \ - HPL_pdrpanrlT.o HPL_pdfact.o -# -## Targets ############################################################# -# -all : lib -# -lib : lib.grd -# -lib.grd : $(HPL_pfaobj) - $(ARCHIVER) $(ARFLAGS) $(HPLlib) $(HPL_pfaobj) - $(RANLIB) $(HPLlib) - $(TOUCH) lib.grd -# -# ###################################################################### -# -HPL_dlocmax.o : ../HPL_dlocmax.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_dlocmax.c -HPL_dlocswpN.o : ../HPL_dlocswpN.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_dlocswpN.c 
-HPL_dlocswpT.o : ../HPL_dlocswpT.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_dlocswpT.c -HPL_pdmxswp.o : ../HPL_pdmxswp.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_pdmxswp.c -HPL_pdpancrN.o : ../HPL_pdpancrN.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_pdpancrN.c -HPL_pdpancrT.o : ../HPL_pdpancrT.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_pdpancrT.c -HPL_pdpanllN.o : ../HPL_pdpanllN.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_pdpanllN.c -HPL_pdpanllT.o : ../HPL_pdpanllT.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_pdpanllT.c -HPL_pdpanrlN.o : ../HPL_pdpanrlN.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_pdpanrlN.c -HPL_pdpanrlT.o : ../HPL_pdpanrlT.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_pdpanrlT.c -HPL_pdrpanllN.o : ../HPL_pdrpanllN.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_pdrpanllN.c -HPL_pdrpanllT.o : ../HPL_pdrpanllT.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_pdrpanllT.c -HPL_pdrpancrN.o : ../HPL_pdrpancrN.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_pdrpancrN.c -HPL_pdrpancrT.o : ../HPL_pdrpancrT.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_pdrpancrT.c -HPL_pdrpanrlN.o : ../HPL_pdrpanrlN.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_pdrpanrlN.c -HPL_pdrpanrlT.o : ../HPL_pdrpanrlT.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_pdrpanrlT.c -HPL_pdfact.o : ../HPL_pdfact.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_pdfact.c -# -# ###################################################################### -# -clean : - $(RM) *.o *.grd -# -# ###################################################################### diff --git a/hpl/src/pfact/intel64/lib.grd b/hpl/src/pfact/intel64/lib.grd deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/hpl/src/pgesv/HPL_equil.c b/hpl/src/pgesv/HPL_equil.c deleted file mode 100644 index b3a98a315aaa6c63b00bba3ada751e60a54b72f3..0000000000000000000000000000000000000000 --- a/hpl/src/pgesv/HPL_equil.c +++ /dev/null @@ -1,253 
+0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -void HPL_equil -( - HPL_T_panel * PBCST, - int * IFLAG, - HPL_T_panel * PANEL, - const enum HPL_TRANS TRANS, - const int N, - double * U, - const int LDU, - int * IPLEN, - const int * IPMAP, - const int * IPMAPM1, - int * IWORK -) -#else -void HPL_equil -( PBCST, IFLAG, PANEL, TRANS, N, U, LDU, IPLEN, IPMAP, IPMAPM1, IWORK ) - HPL_T_panel * PBCST; - int * IFLAG; - HPL_T_panel * PANEL; - const enum HPL_TRANS TRANS; - const int N; - double * U; - const int LDU; - int * IPLEN; - const int * IPMAP; - const int * IPMAPM1; - int * IWORK; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_equil equilibrates the local pieces of U, so that on exit to - * this function, pieces of U contained in every process row are of the - * same size. This phase makes the rolling phase optimal. In addition, - * this function probes for the column panel L and forwards it when - * possible. - * - * Arguments - * ========= - * - * PBCST (local input/output) HPL_T_panel * - * On entry, PBCST points to the data structure containing the - * panel (to be broadcast) information. - * - * IFLAG (local input/output) int * - * On entry, IFLAG indicates whether or not the broadcast has - * already been completed. If not, probing will occur, and the - * outcome will be contained in IFLAG on exit. 
- * - * PANEL (local input/output) HPL_T_panel * - * On entry, PANEL points to the data structure containing the - * panel (to be equilibrated) information. - * - * TRANS (global input) const enum HPL_TRANS - * On entry, TRANS specifies whether U is stored in transposed - * or non-transposed form. - * - * N (local input) const int - * On entry, N specifies the number of rows or columns of U. N - * must be at least 0. - * - * U (local input/output) double * - * On entry, U is an array of dimension (LDU,*) containing the - * local pieces of U in each process row. - * - * LDU (local input) const int - * On entry, LDU specifies the local leading dimension of U. LDU - * should be at least MAX(1,IPLEN[nprow]) when U is stored in - * non-transposed form, and MAX(1,N) otherwise. - * - * IPLEN (global input) int * - * On entry, IPLEN is an array of dimension NPROW+1. This array - * is such that IPLEN[i+1] - IPLEN[i] is the number of rows of U - * in process IPMAP[i]. - * - * IPMAP (global input) const int * - * On entry, IPMAP is an array of dimension NPROW. This array - * contains the logarithmic mapping of the processes. In other - * words, IPMAP[myrow] is the absolute coordinate of the sorted - * process. - * - * IPMAPM1 (global input) const int * - * On entry, IPMAPM1 is an array of dimension NPROW. This array - * contains the inverse of the logarithmic mapping contained in - * IPMAP: For i in [0.. NPROCS) IPMAPM1[IPMAP[i]] = i. - * - * IWORK (workspace) int * - * On entry, IWORK is a workarray of dimension NPROW+1. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - int i, ip, ipU, ipcur, iprow, iptgt, lastrow, - left, npm1, nprow, ll, llU, llcur, lltgt, - right, slen, smax, smin; -/* .. - * .. Executable Statements .. 
- */ - if( ( npm1 = ( nprow = PANEL->grid->nprow ) - 1 ) <= 1 ) return; -/* - * If the current distribution of the pieces of U is already optimal for - * the rolling phase, then return imediately. The optimal distribution - * is such that ip processes have smax items and the remaining processes - * only have smin items. Another way to check this is to verify that all - * differences IPLEN[i+1] - IPLEN[i] are either smin or smax. - */ - smax = ( ( slen = IPLEN[nprow] ) + npm1 ) / nprow; - ip = slen - nprow * ( smin = slen / nprow ); - - iprow = 0; - do - { - ll = IPLEN[iprow+1] - IPLEN[iprow]; iprow++; - } while( ( iprow < nprow ) && ( ( ll == smin ) || ( ll == smax ) ) ); - - if( iprow == nprow ) return; -/* - * Now, we are sure the distribution of the pieces of U is not optimal - * with respect to the rolling phase, thus perform equilibration. Go - * through the list of processes: Processes that have rows that do not - * belong to them with respect to the optimal mapping spread them in a - * logarithmic fashion. To simplify a little bit the implementation, and - * mainly the packing, a source process row spreads its data to its left - * first, and then to its right. 
- */ - IWORK[nprow] = slen; - - for( iprow = 0; iprow < nprow; iprow++ ) - { - llU = IPLEN[iprow+1] - ( ipU = IPLEN[iprow] ); - if( iprow < ip ) { lltgt = smax; iptgt = iprow * smax; } - else { lltgt = smin; iptgt = iprow * smin + ip; } - - left = ( ipU < iptgt ); right = ( iptgt + lltgt < ipU + llU ); -/* - * If I have something to spread to either the left or the right - */ - if( ( llU > 0 ) && ( left || right ) ) - { /* Figure out how much every other process should have */ - - ipcur = ipU; llcur = llU; - - for( i = 0; i < nprow; i++ ) - { - if( i < ip ) { lltgt = smax; iptgt = i * smax; } - else { lltgt = smin; iptgt = i * smin + ip; } - lastrow = iptgt + lltgt - 1; - - if( ( lastrow >= ipcur ) && ( llcur > 0 ) ) - { ll = lastrow - ipcur + 1; ll = Mmin( ll, llcur ); llcur -= ll; } - else { ll = 0; } - - IWORK[i] = ipcur; ipcur += ll; IWORK[i+1] = ipcur; - } -/* - * Equilibration phase - */ - if( TRANS == HplNoTrans ) - { - if( left ) - { - HPL_spreadN( PBCST, IFLAG, PANEL, HplLeft, N, U, LDU, - iprow, IWORK, IPMAP, IPMAPM1 ); - } - - if( right ) - { - HPL_spreadN( PBCST, IFLAG, PANEL, HplRight, N, U, LDU, - iprow, IWORK, IPMAP, IPMAPM1 ); - } - } - else - { - if( left ) - { - HPL_spreadT( PBCST, IFLAG, PANEL, HplLeft, N, U, LDU, - iprow, IWORK, IPMAP, IPMAPM1 ); - } - - if( right ) - { - HPL_spreadT( PBCST, IFLAG, PANEL, HplRight, N, U, LDU, - iprow, IWORK, IPMAP, IPMAPM1 ); - } - } - } - } -/* - * Finally update IPLEN with the indexes corresponding to the new dis- - * tribution of U - IPLEN[nprow] remained unchanged. - */ - for( i = 0; i < nprow; i++ ) IPLEN[i] = ( i < ip ? 
i*smax : i*smin + ip ); -/* - * End of HPL_equil - */ -} diff --git a/hpl/src/pgesv/HPL_logsort.c b/hpl/src/pgesv/HPL_logsort.c deleted file mode 100644 index e88ae589c99e6cd904c6f1c75398b6a3e2d95fcc..0000000000000000000000000000000000000000 --- a/hpl/src/pgesv/HPL_logsort.c +++ /dev/null @@ -1,185 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -void HPL_logsort -( - const int NPROCS, - const int ICURROC, - int * IPLEN, - int * IPMAP, - int * IPMAPM1 -) -#else -void HPL_logsort -( NPROCS, ICURROC, IPLEN, IPMAP, IPMAPM1 ) - const int NPROCS; - const int ICURROC; - int * IPLEN; - int * IPMAP; - int * IPMAPM1; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_logsort computes an array IPMAP and its inverse IPMAPM1 that - * contain the logarithmic sorted processes id with repect to the local - * number of rows of U that they own. This is necessary to ensure that - * the logarithmic spreading of U is optimal in terms of number of steps - * and communication volume as well. In other words, the larget pieces - * of U will be sent a minimal number of times. - * - * Arguments - * ========= - * - * NPROCS (global input) const int - * On entry, NPROCS specifies the number of process rows in the - * process grid. NPROCS is at least one. - * - * ICURROC (global input) const int - * On entry, ICURROC is the source process row. - * - * IPLEN (global input/output) int * - * On entry, IPLEN is an array of dimension NPROCS+1, such that - * IPLEN[0] is 0, and IPLEN[i] contains the number of rows of U, - * that process i-1 has. 
On exit, IPLEN[i] is the number of - * rows of U in the processes before process IPMAP[i] after the - * sort, with the convention that IPLEN[NPROCS] is the total - * number of rows of the panel. In other words, IPLEN[i+1] - - * IPLEN[i] is the number of rows of A that should be moved to - * the process IPMAP[i]. IPLEN is such that the number of rows - * of the source process row is IPLEN[1] - IPLEN[0], and the - * remaining entries of this array are sorted so that the - * quantities IPLEN[i+1]-IPLEN[i] are logarithmically sorted. - * - * IPMAP (global output) int * - * On entry, IPMAP is an array of dimension NPROCS. On exit, - * array contains the logarithmic mapping of the processes. In - * other words, IPMAP[myroc] is the corresponding sorted process - * coordinate. - * - * IPMAPM1 (global output) int * - * On entry, IPMAPM1 is an array of dimension NPROCS. On exit, - * this array contains the inverse of the logarithmic mapping - * contained in IPMAP: IPMAPM1[ IPMAP[i] ] = i, for all i in - * [0.. NPROCS) - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - int dist, i, ip, iplen_i, iplen_j, itmp, j, k; -/* .. - * .. Executable Statements .. - */ -/* - * Compute the logarithmic distance between process j and process 0, as - * well as the maximum logarithmic distance. IPMAPM1 is workarray here. - */ - for( j = 0, dist = 0; j < NPROCS; j++ ) - { - IPMAP[j] = MModAdd( j, ICURROC, NPROCS ); ip = j; itmp = 0; - do { if( ip & 1 ) itmp++; ip >>= 1; } while ( ip ); - IPMAPM1[j] = itmp; if( itmp > dist ) dist = itmp; - } -/* - * Shift IPLEN[1..NPROCS] of ICURROC places, so that IPLEN[1] is now - * what used to be IPLEN[ICURROC+1]. Initialize IPMAP, so that IPMAP[0] - * is ICURROC. 
- */ - for( j = 0; j < ICURROC; j++ ) - { - for( i = 2, itmp = IPLEN[1]; i <= NPROCS; i++ ) IPLEN[i-1] = IPLEN[i]; - IPLEN[NPROCS] = itmp; - } -/* - * logarithmic sort - */ - for( k = 1; k <= dist; k++ ) - { - for( j = 1; j < NPROCS; j++ ) - { - if( IPMAPM1[j] == k ) - { - for( i = 2; i < NPROCS; i++ ) - { - if( k < IPMAPM1[i] ) - { - iplen_i = IPLEN[i+1]; iplen_j = IPLEN[j+1]; - - if( iplen_j < iplen_i ) - { - IPLEN[j+1] = iplen_i; IPLEN[i+1] = iplen_j; - itmp = IPMAP[j]; IPMAP[j] = IPMAP[i]; - IPMAP[i] = itmp; - } - } - } - } - } - } -/* - * Compute IPLEN and IPMAPM1 (the inverse of IPMAP) - */ - IPLEN[0] = 0; - - for( i = 0; i < NPROCS; i++ ) - { - IPMAPM1[ IPMAP[i] ] = i; - IPLEN[i+1] += IPLEN[i]; - } -/* - * End of HPL_logsort - */ -} diff --git a/hpl/src/pgesv/HPL_pdgesv.c b/hpl/src/pgesv/HPL_pdgesv.c deleted file mode 100644 index 25485110c313e16db0b6c45db85f886fb6db8a18..0000000000000000000000000000000000000000 --- a/hpl/src/pgesv/HPL_pdgesv.c +++ /dev/null @@ -1,116 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. 
All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -void HPL_pdgesv -( - HPL_T_grid * GRID, - HPL_T_palg * ALGO, - HPL_T_pmat * A -) -#else -void HPL_pdgesv -( GRID, ALGO, A ) - HPL_T_grid * GRID; - HPL_T_palg * ALGO; - HPL_T_pmat * A; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_pdgesv factors a N+1-by-N matrix using LU factorization with row - * partial pivoting. The main algorithm is the "right looking" variant - * with or without look-ahead. The lower triangular factor is left - * unpivoted and the pivots are not returned. The right hand side is the - * N+1 column of the coefficient matrix. 
- * - * Arguments - * ========= - * - * GRID (local input) HPL_T_grid * - * On entry, GRID points to the data structure containing the - * process grid information. - * - * ALGO (global input) HPL_T_palg * - * On entry, ALGO points to the data structure containing the - * algorithmic parameters. - * - * A (local input/output) HPL_T_pmat * - * On entry, A points to the data structure containing the local - * array information. - * - * --------------------------------------------------------------------- - */ -/* .. - * .. Executable Statements .. - */ - if( A->n <= 0 ) return; - - A->info = 0; - - if( ( ALGO->depth == 0 ) || ( GRID->npcol == 1 ) ) - { - HPL_pdgesv0( GRID, ALGO, A ); - } - else - { - HPL_pdgesvK2( GRID, ALGO, A ); - } -/* - * Solve upper triangular system - */ - if( A->info == 0 ) HPL_pdtrsv( GRID, A ); -/* - * End of HPL_pdgesv - */ -} diff --git a/hpl/src/pgesv/HPL_pdgesv0.c b/hpl/src/pgesv/HPL_pdgesv0.c deleted file mode 100644 index c36134bada340ed8048f37928cc11e51f5134e6d..0000000000000000000000000000000000000000 --- a/hpl/src/pgesv/HPL_pdgesv0.c +++ /dev/null @@ -1,151 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. 
All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -void HPL_pdgesv0 -( - HPL_T_grid * GRID, - HPL_T_palg * ALGO, - HPL_T_pmat * A -) -#else -void HPL_pdgesv0 -( GRID, ALGO, A ) - HPL_T_grid * GRID; - HPL_T_palg * ALGO; - HPL_T_pmat * A; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_pdgesv0 factors a N+1-by-N matrix using LU factorization with row - * partial pivoting. The main algorithm is the "right looking" variant - * without look-ahead. The lower triangular factor is left unpivoted and - * the pivots are not returned. The right hand side is the N+1 column of - * the coefficient matrix. 
- * - * Arguments - * ========= - * - * GRID (local input) HPL_T_grid * - * On entry, GRID points to the data structure containing the - * process grid information. - * - * ALGO (global input) HPL_T_palg * - * On entry, ALGO points to the data structure containing the - * algorithmic parameters. - * - * A (local input/output) HPL_T_pmat * - * On entry, A points to the data structure containing the local - * array information. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - HPL_T_panel * * panel = NULL; - HPL_T_UPD_FUN HPL_pdupdate; - int N, j, jb, n, nb, tag=MSGID_BEGIN_FACT, - test=HPL_KEEP_TESTING; -/* .. - * .. Executable Statements .. - */ - if( ( N = A->n ) <= 0 ) return; - - HPL_pdupdate = ALGO->upfun; nb = A->nb; -/* - * Allocate a panel list of length 1 - Allocate panel[0] resources - */ - panel = (HPL_T_panel **)malloc( sizeof( HPL_T_panel * ) ); - if( panel == NULL ) - { HPL_pabort( __LINE__, "HPL_pdgesv0", "Memory allocation failed" ); } - - HPL_pdpanel_new( GRID, ALGO, N, N+1, Mmin( N, nb ), A, 0, 0, tag, - &panel[0] ); -/* - * Loop over the columns of A - */ - for( j = 0; j < N; j += nb ) - { - n = N - j; jb = Mmin( n, nb ); -/* - * Release panel resources - re-initialize panel data structure - */ - (void) HPL_pdpanel_free( panel[0] ); - HPL_pdpanel_init( GRID, ALGO, n, n+1, jb, A, j, j, tag, panel[0] ); -/* - * Factor and broadcast current panel - update - */ - HPL_pdfact( panel[0] ); - (void) HPL_binit( panel[0] ); - do - { (void) HPL_bcast( panel[0], &test ); } - while( test != HPL_SUCCESS ); - (void) HPL_bwait( panel[0] ); - HPL_pdupdate( NULL, NULL, panel[0], -1 ); -/* - * Update message id for next factorization - */ - tag = MNxtMgid( tag, MSGID_BEGIN_FACT, MSGID_END_FACT ); - } -/* - * Release panel resources and panel list - */ - (void) HPL_pdpanel_disp( &panel[0] ); - - if( panel ) free( panel ); -/* - * End of HPL_pdgesv0 - */ -} diff --git a/hpl/src/pgesv/HPL_pdgesvK1.c 
b/hpl/src/pgesv/HPL_pdgesvK1.c deleted file mode 100644 index 512938c8576ea840dd2b62a3ce6b4c1e633505ee..0000000000000000000000000000000000000000 --- a/hpl/src/pgesv/HPL_pdgesvK1.c +++ /dev/null @@ -1,205 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -void HPL_pdgesvK1 -( - HPL_T_grid * GRID, - HPL_T_palg * ALGO, - HPL_T_pmat * A -) -#else -void HPL_pdgesvK1 -( GRID, ALGO, A ) - HPL_T_grid * GRID; - HPL_T_palg * ALGO; - HPL_T_pmat * A; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_pdgesvK1 factors a N+1-by-N matrix using LU factorization with row - * partial pivoting. The main algorithm is the "right looking" variant - * with look-ahead. The lower triangular factor is left unpivoted and - * the pivots are not returned. The right hand side is the N+1 column of - * the coefficient matrix. - * - * Arguments - * ========= - * - * GRID (local input) HPL_T_grid * - * On entry, GRID points to the data structure containing the - * process grid information. - * - * ALGO (global input) HPL_T_palg * - * On entry, ALGO points to the data structure containing the - * algorithmic parameters. - * - * A (local input/output) HPL_T_pmat * - * On entry, A points to the data structure containing the local - * array information. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - HPL_T_panel * * panel = NULL; - HPL_T_UPD_FUN HPL_pdupdate; - int N, depth, icurcol=0, j, jb, jj=0, jstart, - k, mycol, n, nb, nn, npcol, nq, - tag=MSGID_BEGIN_FACT, test=HPL_KEEP_TESTING; -/* .. - * .. 
Executable Statements .. - */ - mycol = GRID->mycol; npcol = GRID->npcol; - depth = ALGO->depth; HPL_pdupdate = ALGO->upfun; - N = A->n; nb = A->nb; - - if( N <= 0 ) return; -/* - * Allocate a panel list of length depth + 1 (depth >= 1) - */ - panel = (HPL_T_panel **)malloc( (size_t)(depth+1)*sizeof( HPL_T_panel *) ); - if( panel == NULL ) - { HPL_pabort( __LINE__, "HPL_pdgesvK1", "Memory allocation failed" ); } -/* - * Create and initialize the first depth panels - */ - nq = HPL_numroc( N+1, nb, nb, mycol, 0, npcol ); nn = N; jstart = 0; - - for( k = 0; k < depth; k++ ) - { - jb = Mmin( nn, nb ); - HPL_pdpanel_new( GRID, ALGO, nn, nn+1, jb, A, jstart, jstart, - tag, &panel[k] ); - nn -= jb; jstart += jb; - if( mycol == icurcol ) { jj += jb; nq -= jb; } - icurcol = MModAdd1( icurcol, npcol ); - tag = MNxtMgid( tag, MSGID_BEGIN_FACT, MSGID_END_FACT ); - } -/* - * Initialize the lookahead - Factor jstart columns: panel[0..depth-1] - */ - for( k = 0, j = 0; k < depth; k++ ) - { - jb = jstart - j; jb = Mmin( jb, nb ); j += jb; -/* - * Factor and broadcast k-th panel - use long topology for those - */ - HPL_pdfact( panel[k] ); - (void) HPL_binit( panel[k] ); - do - { (void) HPL_bcast( panel[k], &test ); } - while( test != HPL_SUCCESS ); - (void) HPL_bwait( panel[k] ); -/* - * Partial update of the depth-1-k panels in front of me - */ - if( k < depth - 1 ) - { - nn = HPL_numrocI( jstart-j, j, nb, nb, mycol, 0, npcol ); - HPL_pdupdate( NULL, NULL, panel[k], nn ); - } - } -/* - * Main loop over the remaining columns of A - */ - for( j = jstart; j < N; j += nb ) - { - n = N - j; jb = Mmin( n, nb ); -/* - * Allocate current panel resources - Finish latest update - Factor and - * broadcast current panel - */ - HPL_pdpanel_new( GRID, ALGO, n, n+1, jb, A, j, j, tag, &panel[depth] ); - - if( mycol == icurcol ) - { - nn = HPL_numrocI( jb, j, nb, nb, mycol, 0, npcol ); - for( k = 0; k < depth; k++ ) /* partial updates 0..depth-1 */ - HPL_pdupdate( NULL, NULL, panel[k], nn ); - 
HPL_pdfact( panel[depth] ); /* factor current panel */ - } - else { nn = 0; } - /* Finish the latest update and broadcast the current panel */ - (void) HPL_binit( panel[depth] ); - HPL_pdupdate( panel[depth], &test, panel[0], nq-nn ); - (void) HPL_bwait( panel[depth] ); -/* - * Release latest panel resources - circular of the panel pointers - * Go to the next process row and column - update the message ids for - * broadcast - */ - (void) HPL_pdpanel_disp( &panel[0] ); - for( k = 0; k < depth; k++ ) panel[k] = panel[k+1]; - - if( mycol == icurcol ) { jj += jb; nq -= jb; } - icurcol = MModAdd1( icurcol, npcol ); - tag = MNxtMgid( tag, MSGID_BEGIN_FACT, MSGID_END_FACT ); - } -/* - * Clean-up: Finish updates - release panels and panel list - */ - nn = HPL_numrocI( 1, N, nb, nb, mycol, 0, npcol ); - for( k = 0; k < depth; k++ ) - { - HPL_pdupdate( NULL, NULL, panel[k], nn ); - (void) HPL_pdpanel_disp( &panel[k] ); - } - - if( panel ) free( panel ); -/* - * End of HPL_pdgesvK1 - */ -} diff --git a/hpl/src/pgesv/HPL_pdgesvK2.c b/hpl/src/pgesv/HPL_pdgesvK2.c deleted file mode 100644 index 05343ccecc8b978c6832f7912a2f7c4b921b5a95..0000000000000000000000000000000000000000 --- a/hpl/src/pgesv/HPL_pdgesvK2.c +++ /dev/null @@ -1,214 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. 
Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -void HPL_pdgesvK2 -( - HPL_T_grid * GRID, - HPL_T_palg * ALGO, - HPL_T_pmat * A -) -#else -void HPL_pdgesvK2 -( GRID, ALGO, A ) - HPL_T_grid * GRID; - HPL_T_palg * ALGO; - HPL_T_pmat * A; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_pdgesvK2 factors a N+1-by-N matrix using LU factorization with row - * partial pivoting. 
The main algorithm is the "right looking" variant - * with look-ahead. The lower triangular factor is left unpivoted and - * the pivots are not returned. The right hand side is the N+1 column of - * the coefficient matrix. - * - * Arguments - * ========= - * - * GRID (local input) HPL_T_grid * - * On entry, GRID points to the data structure containing the - * process grid information. - * - * ALGO (global input) HPL_T_palg * - * On entry, ALGO points to the data structure containing the - * algorithmic parameters. - * - * A (local input/output) HPL_T_pmat * - * On entry, A points to the data structure containing the local - * array information. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - HPL_T_panel * p, * * panel = NULL; - HPL_T_UPD_FUN HPL_pdupdate; - int N, depth, icurcol=0, j, jb, jj=0, jstart, - k, mycol, n, nb, nn, npcol, nq, - tag=MSGID_BEGIN_FACT, test=HPL_KEEP_TESTING; -/* .. - * .. Executable Statements .. 
- */ - mycol = GRID->mycol; npcol = GRID->npcol; - depth = ALGO->depth; HPL_pdupdate = ALGO->upfun; - N = A->n; nb = A->nb; - - if( N <= 0 ) return; -/* - * Allocate a panel list of length depth + 1 (depth >= 1) - */ - panel = (HPL_T_panel **)malloc( (size_t)(depth+1) * sizeof( HPL_T_panel *) ); - if( panel == NULL ) - { HPL_pabort( __LINE__, "HPL_pdgesvK2", "Memory allocation failed" ); } -/* - * Create and initialize the first depth panels - */ - nq = HPL_numroc( N+1, nb, nb, mycol, 0, npcol ); nn = N; jstart = 0; - - for( k = 0; k < depth; k++ ) - { - jb = Mmin( nn, nb ); - HPL_pdpanel_new( GRID, ALGO, nn, nn+1, jb, A, jstart, jstart, - tag, &panel[k] ); - nn -= jb; jstart += jb; - if( mycol == icurcol ) { jj += jb; nq -= jb; } - icurcol = MModAdd1( icurcol, npcol ); - tag = MNxtMgid( tag, MSGID_BEGIN_FACT, MSGID_END_FACT ); - } -/* - * Create last depth+1 panel - */ - HPL_pdpanel_new( GRID, ALGO, nn, nn+1, Mmin( nn, nb ), A, jstart, - jstart, tag, &panel[depth] ); - tag = MNxtMgid( tag, MSGID_BEGIN_FACT, MSGID_END_FACT ); -/* - * Initialize the lookahead - Factor jstart columns: panel[0..depth-1] - */ - for( k = 0, j = 0; k < depth; k++ ) - { - jb = jstart - j; jb = Mmin( jb, nb ); j += jb; -/* - * Factor and broadcast k-th panel - */ - HPL_pdfact( panel[k] ); - (void) HPL_binit( panel[k] ); - do - { (void) HPL_bcast( panel[k], &test ); } - while( test != HPL_SUCCESS ); - (void) HPL_bwait( panel[k] ); -/* - * Partial update of the depth-k-1 panels in front of me - */ - if( k < depth - 1 ) - { - nn = HPL_numrocI( jstart-j, j, nb, nb, mycol, 0, npcol ); - HPL_pdupdate( NULL, NULL, panel[k], nn ); - } - } -/* - * Main loop over the remaining columns of A - */ - for( j = jstart; j < N; j += nb ) - { - n = N - j; jb = Mmin( n, nb ); -/* - * Initialize current panel - Finish latest update, Factor and broadcast - * current panel - */ - (void) HPL_pdpanel_free( panel[depth] ); - HPL_pdpanel_init( GRID, ALGO, n, n+1, jb, A, j, j, tag, panel[depth] ); - - if( mycol == 
icurcol ) - { - nn = HPL_numrocI( jb, j, nb, nb, mycol, 0, npcol ); - for( k = 0; k < depth; k++ ) /* partial updates 0..depth-1 */ - (void) HPL_pdupdate( NULL, NULL, panel[k], nn ); - HPL_pdfact( panel[depth] ); /* factor current panel */ - } - else { nn = 0; } - /* Finish the latest update and broadcast the current panel */ - (void) HPL_binit( panel[depth] ); - HPL_pdupdate( panel[depth], &test, panel[0], nq-nn ); - (void) HPL_bwait( panel[depth] ); -/* - * Circular of the panel pointers: - * xtmp = x[0]; for( k=0; k < depth; k++ ) x[k] = x[k+1]; x[d] = xtmp; - * - * Go to next process row and column - update the message ids for broadcast - */ - p = panel[0]; for( k = 0; k < depth; k++ ) panel[k] = panel[k+1]; - panel[depth] = p; - - if( mycol == icurcol ) { jj += jb; nq -= jb; } - icurcol = MModAdd1( icurcol, npcol ); - tag = MNxtMgid( tag, MSGID_BEGIN_FACT, MSGID_END_FACT ); - } -/* - * Clean-up: Finish updates - release panels and panel list - */ - nn = HPL_numrocI( 1, N, nb, nb, mycol, 0, npcol ); - for( k = 0; k < depth; k++ ) - { - (void) HPL_pdupdate( NULL, NULL, panel[k], nn ); - (void) HPL_pdpanel_disp( &panel[k] ); - } - (void) HPL_pdpanel_disp( &panel[depth] ); - - if( panel ) free( panel ); -/* - * End of HPL_pdgesvK2 - */ -} diff --git a/hpl/src/pgesv/HPL_pdlaswp00N.c b/hpl/src/pgesv/HPL_pdlaswp00N.c deleted file mode 100644 index dbe9ef543363379a3b6f992217b88f897d7ed39a..0000000000000000000000000000000000000000 --- a/hpl/src/pgesv/HPL_pdlaswp00N.c +++ /dev/null @@ -1,432 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. 
Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
- * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -void HPL_pdlaswp00N -( - HPL_T_panel * PBCST, - int * IFLAG, - HPL_T_panel * PANEL, - const int NN -) -#else -void HPL_pdlaswp00N -( PBCST, IFLAG, PANEL, NN ) - HPL_T_panel * PBCST; - int * IFLAG; - HPL_T_panel * PANEL; - const int NN; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_pdlaswp00N applies the NB row interchanges to NN columns of the - * trailing submatrix and broadcast a column panel. - * - * Bi-directional exchange is used to perform the swap :: broadcast of - * the row panel U at once, resulting in a lower number of messages than - * usual as well as a lower communication volume. With P process rows and - * assuming bi-directional links, the running time of this function can - * be approximated by: - * - * log_2(P) * (lat + NB*LocQ(N) / bdwth) - * - * where NB is the number of rows of the row panel U, N is the global - * number of columns being updated, lat and bdwth are the latency and - * bandwidth of the network for double precision real words. Mono - * directional links will double this communication cost. - * - * Arguments - * ========= - * - * PBCST (local input/output) HPL_T_panel * - * On entry, PBCST points to the data structure containing the - * panel (to be broadcast) information. - * - * IFLAG (local input/output) int * - * On entry, IFLAG indicates whether or not the broadcast has - * already been completed. If not, probing will occur, and the - * outcome will be contained in IFLAG on exit. - * - * PANEL (local input/output) HPL_T_panel * - * On entry, PANEL points to the data structure containing the - * panel (to be broadcast and swapped) information. - * - * NN (local input) const int - * On entry, NN specifies the local number of columns of the - * trailing submatrix to be swapped and broadcast starting at - * the current position. NN must be at least zero. 
- * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - MPI_Comm comm; - HPL_T_grid * grid; - double * A, * U, * W; - void * vptr = NULL; - int * ipID, * lindxA, * lindxAU, * llen, - * llen_sv; - unsigned int ip2, ip2_=1, ipdist, ipow=1, mask=1, - mydist, mydis_; - int Cmsgid=MSGID_BEGIN_PFACT, Np2, align, - hdim, i, icurrow, *iflag, ipA, ipW, *ipl, - iprow, jb, k, lda, ldW, myrow, n, nprow, - partner, root, size_, usize; -#define LDU jb -/* .. - * .. Executable Statements .. - */ - n = Mmin( NN, PANEL->n ); jb = PANEL->jb; -/* - * Quick return if there is nothing to do - */ - if( ( n <= 0 ) || ( jb <= 0 ) ) return; - -#ifdef HPL_DETAILED_TIMING - HPL_ptimer( HPL_TIMING_LASWP ); -#endif -/* - * Retrieve parameters from the PANEL data structure - */ - grid = PANEL->grid; nprow = grid->nprow; myrow = grid->myrow; - comm = grid->col_comm; ip2 = (unsigned int)grid->row_ip2; - hdim = grid->row_hdim; align = PANEL->algo->align; - A = PANEL->A; U = PANEL->U; iflag = PANEL->IWORK; - lda = PANEL->lda; icurrow = PANEL->prow; usize = jb * n; - ldW = n + 1; -/* - * Allocate space for temporary W (ldW * jb) - */ - vptr = (void*)malloc( - ((size_t)(align) + ((size_t)(jb) * (size_t)(ldW))) * sizeof(double) ); - if( vptr == NULL ) - { HPL_pabort( __LINE__, "HPL_pdlaswp00N", "Memory allocation failed" ); } - - W = (double *)HPL_PTR( vptr, ((size_t)(align) * sizeof(double) ) ); -/* - * Construct ipID and its local counter parts lindxA, lindxAU - llen is - * the number of rows/columns that I have in workspace and that I should - * send. 
Compute lindx_, ipA, llen if it has not already been done for - * this panel; - */ - k = (int)((unsigned int)(jb) << 1); ipl = iflag + 1; ipID = ipl + 1; - lindxA = ipID + ((unsigned int)(k) << 1); lindxAU = lindxA + k; - llen = lindxAU + k; llen_sv = llen + nprow; - - if( *iflag == -1 ) /* no index arrays have been computed so far */ - { - HPL_pipid( PANEL, ipl, ipID ); - HPL_plindx0( PANEL, *ipl, ipID, lindxA, lindxAU, llen_sv ); - *iflag = 0; - } - else if( *iflag == 1 ) /* HPL_pdlaswp01N called before: reuse ipID */ - { - HPL_plindx0( PANEL, *ipl, ipID, lindxA, lindxAU, llen_sv ); - *iflag = 0; - } -/* - * Copy the llen_sv into llen - Reset ipA to its correct value - */ - ipA = llen_sv[myrow]; - for( i = 0; i < nprow; i++ ) { llen[i] = llen_sv[i]; } -/* - * For i in [0..2*jb), lindxA[i] is the offset in A of a row that ulti- - * mately goes to U( lindxAU[i], : ) or U( :, lindxAU[i] ). In icurrow, - * we directly pack into U, otherwise we pack into workspace. The first - * entry of each column packed in workspace is in fact the row or column - * offset in U where it should go to. - */ - if( myrow == icurrow ) - { - HPL_dlaswp01N( ipA, n, A, lda, U, LDU, lindxA, lindxAU ); - } - else - { - HPL_dlaswp02N( ipA, n, A, lda, W, W+1, ldW, lindxA, lindxAU ); - } -/* - * Probe for column panel - forward it when available - */ - if( *IFLAG == HPL_KEEP_TESTING ) (void) HPL_bcast( PBCST, IFLAG ); -/* - * Algorithm for bi-directional data exchange: - * - * As long as I have not talked to a process that already had the data - * from icurrow, I will be sending the workspace, otherwise I will be - * sending U. Note that the columns in workspace contain the local index - * in U they should go to. - * - * If I am receiving from a process that has the data from icurrow, I - * will be receiving in U, copy the data of U that stays into A, and - * then the columns I have in workspace into U; otherwise I will be re- - * ceiving in the remaining workspace. 
If I am one of those processes - * that already has the data from icurrow, I will be immediately copying - * the data I have in my workspace into U. - * - * When I receive U, some of U should be copied in my piece of A before - * I can copy the rows I have in my workspace into U. This information - * is kept in the lists lindx_: the row lindxAU[i] should be copied in - * the row lindxA[i] of my piece of A, just as in the reversed initial - * packing operation. Those rows are thus the first ones in the work ar- - * ray. After this operation has been performed, I will not need - * those lindx arrays, and I will always be sending a buffer of size - * jb x n, or n x jb, that is, U. - * - * At every step of the algorithm, it is necesary to update the list - * llen, so that I can figure out how large the next messages I will be - * sending/receiving are. It is obvious when I am sending U. It is not - * otherwise. - * - * We choose icurrow to be the source of the bi-directional exchange. - * This allows the processes in the non-power 2 part to receive U at the - * first exchange, and then broadcast internally this U so that those - * processes can grab their piece of A. - */ - if( myrow == icurrow ) { llen[myrow] = 0; ipA = 0; } - ipW = ipA; - Np2 = ( ( size_ = nprow - ip2 ) != 0 ); - mydist = (unsigned int)MModSub( myrow, icurrow, nprow ); -/* - * bi-directional exchange: If nprow is not a power of 2, proc[i-ip2] - * receives local data from proc[i] for all i in [ip2..nprow); icurrow - * is the source, these last process indexes are relative to icurrow. 
- */ - if( ( Np2 != 0 ) && ( ( partner = (int)(mydist ^ ip2) ) < nprow ) ) - { - partner = MModAdd( icurrow, partner, nprow ); - - if( mydist == 0 ) /* I am the current row: I send U and recv W */ - { - (void) HPL_sdrv( U, usize, Cmsgid, W, llen[partner] * ldW, - Cmsgid, partner, comm ); - if( llen[partner] > 0 ) - HPL_dlaswp03N( llen[partner], n, U, LDU, W, W+1, ldW ); - } - else if( mydist == ip2 ) - { /* I recv U for later Bcast, I send my W */ - (void) HPL_sdrv( W, llen[myrow]*ldW, Cmsgid, U, usize, - Cmsgid, partner, comm ); - } - else /* None of us is icurrow, we exchange our Ws */ - { - if( ( mydist & ip2 ) != 0 ) - { - (void) HPL_send( W, llen[myrow]*ldW, partner, Cmsgid, comm ); - } - else - { - (void) HPL_recv( Mptr( W, 0, ipW, ldW ), llen[partner]*ldW, - partner, Cmsgid, comm ); - if( llen[partner] > 0 ) ipW += llen[partner]; - } - } - } -/* - * Update llen - */ - for( i = 1; i < size_; i++ ) - { - iprow = MModAdd( icurrow, i, nprow ); - partner = MModAdd( iprow, (int)(ip2), nprow ); - llen[ iprow ] += llen[ partner ]; - } -/* - * Probe for column panel - forward it when available - */ - if( *IFLAG == HPL_KEEP_TESTING ) (void) HPL_bcast( PBCST, IFLAG ); -/* - * power of 2 part of the processes collection: only processes [0..ip2) - * are working; some of them (mydist >> (k+1) == 0) either send or re- - * ceive U. At every step k, k is in [0 .. hdim), of the algorithm, a - * process pair that exchanges U is such that (mydist >> (k+1) == 0). - * Among those processes, the ones that are sending U are such that - * mydist >> k == 0. - */ - if( mydist < ip2 ) - { - k = 0; - - while( k < hdim ) - { - partner = (int)(mydist ^ ipow); - partner = MModAdd( icurrow, partner, nprow ); -/* - * Exchange and combine the local results - If I receive U, then I must - * copy from U the rows that belong to my piece of A, and then update U - * by copying in it the rows I have accumulated in W. Otherwise, I re- - * ceive W. 
In this later case, and I have U, I shall update my copy of - * U by copying in it the rows I have accumulated in W. If I did not - * have U before, I simply need to update my pointer in W for later use. - */ - if( ( mydist >> (unsigned int)( k + 1 ) ) == 0 ) - { - if( ( mydist >> (unsigned int)(k) ) == 0 ) - { - (void) HPL_sdrv( U, usize, Cmsgid, Mptr( W, 0, ipW, - ldW ), llen[partner]*ldW, Cmsgid, - partner, comm ); - HPL_dlaswp03N( llen[partner], n, U, LDU, Mptr( W, 0, ipW, - ldW ), Mptr( W, 1, ipW, ldW ), ldW ); - ipW += llen[partner]; - } - else - { - (void) HPL_sdrv( W, llen[myrow]*ldW, Cmsgid, U, usize, - Cmsgid, partner, comm ); - HPL_dlaswp04N( ipA, llen[myrow], n, U, LDU, A, lda, W, - W+1, ldW, lindxA, lindxAU ); - } - } - else - { - (void) HPL_sdrv( W, llen[myrow]*ldW, Cmsgid, Mptr( W, 0, - ipW, ldW ), llen[partner]*ldW, Cmsgid, - partner, comm ); - ipW += llen[partner]; - } -/* - * Update llen - Go to next process pairs - */ - iprow = icurrow; ipdist = 0; - do - { - if( (unsigned int)( partner = (int)(ipdist ^ ipow) ) > ipdist ) - { - partner = MModAdd( icurrow, partner, nprow ); - llen[iprow] += llen[partner]; - llen[partner] = llen[iprow]; - } - iprow = MModAdd( iprow, 1, nprow ); ipdist++; - - } while( ipdist < ip2 ); - - ipow <<= 1; k++; -/* - * Probe for column panel - forward it when available - */ - if( *IFLAG == HPL_KEEP_TESTING ) (void) HPL_bcast( PBCST, IFLAG ); - } - } - else - { -/* - * non power of 2 part of the process collection: proc[ip2] broadcast U - * to procs[ip2..nprow) (relatively to icurrow). 
- */ - if( size_ > 1 ) - { - k = size_ - 1; - while( k > 1 ) { k >>= 1; ip2_ <<= 1; mask <<= 1; mask++; } - root = MModAdd( icurrow, (int)(ip2), nprow ); - mydis_ = (unsigned int)MModSub( myrow, root, nprow ); - - do - { - mask ^= ip2_; - if( ( mydis_ & mask ) == 0 ) - { - partner = (int)(mydis_ ^ ip2_); - if( ( mydis_ & ip2_ ) != 0 ) - { - (void) HPL_recv( U, usize, MModAdd( root, partner, - nprow ), Cmsgid, comm ); - - } - else if( partner < size_ ) - { - (void) HPL_send( U, usize, MModAdd( root, partner, - nprow ), Cmsgid, comm ); - } - } - ip2_ >>= 1; -/* - * Probe for column panel - forward it when available - */ - if( *IFLAG == HPL_KEEP_TESTING ) (void) HPL_bcast( PBCST, IFLAG ); - - } while( ip2_ > 0 ); - } -/* - * Every process in [ip2..nprow) (relatively to icurrow) grabs its piece - * of A. - */ - HPL_dlaswp05N( ipA, n, A, lda, U, LDU, lindxA, lindxAU ); - } -/* - * If nprow is not a power of 2, proc[i-ip2] sends global result to - * proc[i] for all i in [ip2..nprow); - */ - if( ( Np2 != 0 ) && ( ( partner = (int)(mydist ^ ip2) ) < nprow ) ) - { - partner = MModAdd( icurrow, partner, nprow ); - if( ( mydist & ip2 ) != 0 ) - { (void) HPL_recv( U, usize, partner, Cmsgid, comm ); } - else - { (void) HPL_send( U, usize, partner, Cmsgid, comm ); } - } - - if( vptr ) free( vptr ); -/* - * Probe for column panel - forward it when available - */ - if( *IFLAG == HPL_KEEP_TESTING ) (void) HPL_bcast( PBCST, IFLAG ); - -#ifdef HPL_DETAILED_TIMING - HPL_ptimer( HPL_TIMING_LASWP ); -#endif -/* - * End of HPL_pdlaswp00N - */ -} diff --git a/hpl/src/pgesv/HPL_pdlaswp00T.c b/hpl/src/pgesv/HPL_pdlaswp00T.c deleted file mode 100644 index 1b01a63318c552be5b8b25997f036af57736d55a..0000000000000000000000000000000000000000 --- a/hpl/src/pgesv/HPL_pdlaswp00T.c +++ /dev/null @@ -1,433 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. 
Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -void HPL_pdlaswp00T -( - HPL_T_panel * PBCST, - int * IFLAG, - HPL_T_panel * PANEL, - const int NN -) -#else -void HPL_pdlaswp00T -( PBCST, IFLAG, PANEL, NN ) - HPL_T_panel * PBCST; - int * IFLAG; - HPL_T_panel * PANEL; - const int NN; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_pdlaswp00T applies the NB row interchanges to NN columns of the - * trailing submatrix and broadcast a column panel. - * - * Bi-directional exchange is used to perform the swap :: broadcast of - * the row panel U at once, resulting in a lower number of messages than - * usual as well as a lower communication volume. With P process rows and - * assuming bi-directional links, the running time of this function can - * be approximated by: - * - * log_2(P) * (lat + NB*LocQ(N) / bdwth) - * - * where NB is the number of rows of the row panel U, N is the global - * number of columns being updated, lat and bdwth are the latency and - * bandwidth of the network for double precision real words. Mono - * directional links will double this communication cost. - * - * Arguments - * ========= - * - * PBCST (local input/output) HPL_T_panel * - * On entry, PBCST points to the data structure containing the - * panel (to be broadcast) information. 
- * - * IFLAG (local input/output) int * - * On entry, IFLAG indicates whether or not the broadcast has - * already been completed. If not, probing will occur, and the - * outcome will be contained in IFLAG on exit. - * - * PANEL (local input/output) HPL_T_panel * - * On entry, PANEL points to the data structure containing the - * panel (to be broadcast and swapped) information. - * - * NN (local input) const int - * On entry, NN specifies the local number of columns of the - * trailing submatrix to be swapped and broadcast starting at - * the current position. NN must be at least zero. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - MPI_Comm comm; - HPL_T_grid * grid; - double * A, * U, * W; - void * vptr = NULL; - int * ipID, * lindxA, * lindxAU, * llen, - * llen_sv; - unsigned int ip2, ip2_=1, ipdist, ipow=1, mask=1, - mydist, mydis_; - int Cmsgid=MSGID_BEGIN_PFACT, Np2, align, - hdim, i, icurrow, *iflag, ipA, ipW, *ipl, - iprow, jb, k, lda, ldW, myrow, n, nprow, - partner, root, size_, usize; -#define LDU n -/* .. - * .. Executable Statements .. 
- */ - n = Mmin( NN, PANEL->n ); jb = PANEL->jb; -/* - * Quick return if there is nothing to do - */ - if( ( n <= 0 ) || ( jb <= 0 ) ) return; - -#ifdef HPL_DETAILED_TIMING - HPL_ptimer( HPL_TIMING_LASWP ); -#endif -/* - * Retrieve parameters from the PANEL data structure - */ - grid = PANEL->grid; nprow = grid->nprow; myrow = grid->myrow; - comm = grid->col_comm; ip2 = (unsigned int)grid->row_ip2; - hdim = grid->row_hdim; align = PANEL->algo->align; - A = PANEL->A; U = PANEL->U; iflag = PANEL->IWORK; - lda = PANEL->lda; icurrow = PANEL->prow; usize = jb * n; - ldW = n + 1; -/* - * Allocate space for temporary W (ldW * jb) - */ - vptr = (void*)malloc( ( (size_t)(align) + - ((size_t)(jb) * (size_t)(ldW))) * - sizeof(double) ); - if( vptr == NULL ) - { HPL_pabort( __LINE__, "HPL_pdlaswp00T", "Memory allocation failed" ); } - - W = (double *)HPL_PTR( vptr, ((size_t)(align) * sizeof(double) ) ); -/* - * Construct ipID and its local counter parts lindxA, lindxAU - llen is - * the number of rows/columns that I have in workspace and that I should - * send. Compute lindx_, ipA, llen if it has not already been done for - * this panel; - */ - k = (int)((unsigned int)(jb) << 1); ipl = iflag + 1; ipID = ipl + 1; - lindxA = ipID + ((unsigned int)(k) << 1); lindxAU = lindxA + k; - llen = lindxAU + k; llen_sv = llen + nprow; - - if( *iflag == -1 ) /* no index arrays have been computed so far */ - { - HPL_pipid( PANEL, ipl, ipID ); - HPL_plindx0( PANEL, *ipl, ipID, lindxA, lindxAU, llen_sv ); - *iflag = 0; - } - else if( *iflag == 1 ) /* HPL_pdlaswp01T called before: reuse ipID */ - { - HPL_plindx0( PANEL, *ipl, ipID, lindxA, lindxAU, llen_sv ); - *iflag = 0; - } -/* - * Copy the llen_sv into llen - Reset ipA to its correct value - */ - ipA = llen_sv[myrow]; - for( i = 0; i < nprow; i++ ) { llen[i] = llen_sv[i]; } -/* - * For i in [0..2*jb), lindxA[i] is the offset in A of a row that ulti- - * mately goes to U( lindxAU[i], : ) or U( :, lindxAU[i] ). 
In icurrow, - * we directly pack into U, otherwise we pack into workspace. The first - * entry of each column packed in workspace is in fact the row or column - * offset in U where it should go to. - */ - if( myrow == icurrow ) - { - HPL_dlaswp01T( ipA, n, A, lda, U, LDU, lindxA, lindxAU ); - } - else - { - HPL_dlaswp02N( ipA, n, A, lda, W, W+1, ldW, lindxA, lindxAU ); - } -/* - * Probe for column panel - forward it when available - */ - if( *IFLAG == HPL_KEEP_TESTING ) (void) HPL_bcast( PBCST, IFLAG ); -/* - * Algorithm for bi-directional data exchange: - * - * As long as I have not talked to a process that already had the data - * from icurrow, I will be sending the workspace, otherwise I will be - * sending U. Note that the columns in workspace contain the local index - * in U they should go to. - * - * If I am receiving from a process that has the data from icurrow, I - * will be receiving in U, copy the data of U that stays into A, and - * then the columns I have in workspace into U; otherwise I will be re- - * ceiving in the remaining workspace. If I am one of those processes - * that already has the data from icurrow, I will be immediately copying - * the data I have in my workspace into U. - * - * When I receive U, some of U should be copied in my piece of A before - * I can copy the rows I have in my workspace into U. This information - * is kept in the lists lindx_: the row lindxAU[i] should be copied in - * the row lindxA[i] of my piece of A, just as in the reversed initial - * packing operation. Those rows are thus the first ones in the work ar- - * ray. After this operation has been performed, I will not need - * those lindx arrays, and I will always be sending a buffer of size - * jb x n, or n x jb, that is, U. - * - * At every step of the algorithm, it is necesary to update the list - * llen, so that I can figure out how large the next messages I will be - * sending/receiving are. It is obvious when I am sending U. It is not - * otherwise. 
- * - * We choose icurrow to be the source of the bi-directional exchange. - * This allows the processes in the non-power 2 part to receive U at the - * first exchange, and then broadcast internally this U so that those - * processes can grab their piece of A. - */ - if( myrow == icurrow ) { llen[myrow] = 0; ipA = 0; } - ipW = ipA; - Np2 = ( ( size_ = nprow - ip2 ) != 0 ); - mydist = (unsigned int)MModSub( myrow, icurrow, nprow ); -/* - * bi-directional exchange: If nprow is not a power of 2, proc[i-ip2] - * receives local data from proc[i] for all i in [ip2..nprow); icurrow - * is the source, these last process indexes are relative to icurrow. - */ - if( ( Np2 != 0 ) && ( ( partner = (int)(mydist ^ ip2) ) < nprow ) ) - { - partner = MModAdd( icurrow, partner, nprow ); - - if( mydist == 0 ) /* I am the current row: I send U and recv W */ - { - (void) HPL_sdrv( U, usize, Cmsgid, W, llen[partner] * ldW, - Cmsgid, partner, comm ); - if( llen[partner] > 0 ) - HPL_dlaswp03T( llen[partner], n, U, LDU, W, W+1, ldW ); - } - else if( mydist == ip2 ) - { /* I recv U for later Bcast, I send my W */ - (void) HPL_sdrv( W, llen[myrow]*ldW, Cmsgid, U, usize, - Cmsgid, partner, comm ); - } - else /* None of us is icurrow, we exchange our Ws */ - { - if( ( mydist & ip2 ) != 0 ) - { - (void) HPL_send( W, llen[myrow]*ldW, partner, Cmsgid, comm ); - } - else - { - (void) HPL_recv( Mptr( W, 0, ipW, ldW ), llen[partner]*ldW, - partner, Cmsgid, comm ); - if( llen[partner] > 0 ) ipW += llen[partner]; - } - } - } -/* - * Update llen - */ - for( i = 1; i < size_; i++ ) - { - iprow = MModAdd( icurrow, i, nprow ); - partner = MModAdd( iprow, (int)(ip2), nprow ); - llen[ iprow ] += llen[ partner ]; - } -/* - * Probe for column panel - forward it when available - */ - if( *IFLAG == HPL_KEEP_TESTING ) (void) HPL_bcast( PBCST, IFLAG ); -/* - * power of 2 part of the processes collection: only processes [0..ip2) - * are working; some of them (mydist >> (k+1) == 0) either send or re- - * ceive U. 
At every step k, k is in [0 .. hdim), of the algorithm, a - * process pair that exchanges U is such that (mydist >> (k+1) == 0). - * Among those processes, the ones that are sending U are such that - * mydist >> k == 0. - */ - if( mydist < ip2 ) - { - k = 0; - - while( k < hdim ) - { - partner = (int)(mydist ^ ipow); - partner = MModAdd( icurrow, partner, nprow ); -/* - * Exchange and combine the local results - If I receive U, then I must - * copy from U the rows that belong to my piece of A, and then update U - * by copying in it the rows I have accumulated in W. Otherwise, I re- - * ceive W. In this later case, and I have U, I shall update my copy of - * U by copying in it the rows I have accumulated in W. If I did not - * have U before, I simply need to update my pointer in W for later use. - */ - if( ( mydist >> (unsigned int)( k + 1 ) ) == 0 ) - { - if( ( mydist >> (unsigned int)(k) ) == 0 ) - { - (void) HPL_sdrv( U, usize, Cmsgid, Mptr( W, 0, ipW, - ldW ), llen[partner]*ldW, Cmsgid, - partner, comm ); - HPL_dlaswp03T( llen[partner], n, U, LDU, Mptr( W, 0, ipW, - ldW ), Mptr( W, 1, ipW, ldW ), ldW ); - ipW += llen[partner]; - } - else - { - (void) HPL_sdrv( W, llen[myrow]*ldW, Cmsgid, U, usize, - Cmsgid, partner, comm ); - HPL_dlaswp04T( ipA, llen[myrow], n, U, LDU, A, lda, W, - W+1, ldW, lindxA, lindxAU ); - } - } - else - { - (void) HPL_sdrv( W, llen[myrow]*ldW, Cmsgid, Mptr( W, 0, - ipW, ldW ), llen[partner]*ldW, Cmsgid, - partner, comm ); - ipW += llen[partner]; - } -/* - * Update llen - Go to next process pairs - */ - iprow = icurrow; ipdist = 0; - do - { - if( (unsigned int)( partner = (int)(ipdist ^ ipow) ) > ipdist ) - { - partner = MModAdd( icurrow, partner, nprow ); - llen[iprow] += llen[partner]; - llen[partner] = llen[iprow]; - } - iprow = MModAdd( iprow, 1, nprow ); ipdist++; - - } while( ipdist < ip2 ); - - ipow <<= 1; k++; -/* - * Probe for column panel - forward it when available - */ - if( *IFLAG == HPL_KEEP_TESTING ) (void) HPL_bcast( PBCST, 
IFLAG ); - } - } - else - { -/* - * non power of 2 part of the process collection: proc[ip2] broadcast U - * to procs[ip2..nprow) (relatively to icurrow). - */ - if( size_ > 1 ) - { - k = size_ - 1; - while( k > 1 ) { k >>= 1; ip2_ <<= 1; mask <<= 1; mask++; } - root = MModAdd( icurrow, (int)(ip2), nprow ); - mydis_ = (unsigned int)MModSub( myrow, root, nprow ); - - do - { - mask ^= ip2_; - if( ( mydis_ & mask ) == 0 ) - { - partner = (int)(mydis_ ^ ip2_); - if( ( mydis_ & ip2_ ) != 0 ) - { - (void) HPL_recv( U, usize, MModAdd( root, partner, - nprow ), Cmsgid, comm ); - - } - else if( partner < size_ ) - { - (void) HPL_send( U, usize, MModAdd( root, partner, - nprow ), Cmsgid, comm ); - } - } - ip2_ >>= 1; -/* - * Probe for column panel - forward it when available - */ - if( *IFLAG == HPL_KEEP_TESTING ) (void) HPL_bcast( PBCST, IFLAG ); - - } while( ip2_ > 0 ); - } -/* - * Every process in [ip2..nprow) (relatively to icurrow) grabs its piece - * of A. - */ - HPL_dlaswp05T( ipA, n, A, lda, U, LDU, lindxA, lindxAU ); - } -/* - * If nprow is not a power of 2, proc[i-ip2] sends global result to - * proc[i] for all i in [ip2..nprow); - */ - if( ( Np2 != 0 ) && ( ( partner = (int)(mydist ^ ip2) ) < nprow ) ) - { - partner = MModAdd( icurrow, partner, nprow ); - if( ( mydist & ip2 ) != 0 ) - { (void) HPL_recv( U, usize, partner, Cmsgid, comm ); } - else - { (void) HPL_send( U, usize, partner, Cmsgid, comm ); } - } - - if( vptr ) free( vptr ); -/* - * Probe for column panel - forward it when available - */ - if( *IFLAG == HPL_KEEP_TESTING ) (void) HPL_bcast( PBCST, IFLAG ); - -#ifdef HPL_DETAILED_TIMING - HPL_ptimer( HPL_TIMING_LASWP ); -#endif -/* - * End of HPL_pdlaswp00T - */ -} diff --git a/hpl/src/pgesv/HPL_pdlaswp01N.c b/hpl/src/pgesv/HPL_pdlaswp01N.c deleted file mode 100644 index b0721ce8efd6c381611fcb5b44868a3bd4d115a9..0000000000000000000000000000000000000000 --- a/hpl/src/pgesv/HPL_pdlaswp01N.c +++ /dev/null @@ -1,217 +0,0 @@ -/* - * -- High Performance 
Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -void HPL_pdlaswp01N -( - HPL_T_panel * PBCST, - int * IFLAG, - HPL_T_panel * PANEL, - const int NN -) -#else -void HPL_pdlaswp01N -( PBCST, IFLAG, PANEL, NN ) - HPL_T_panel * PBCST; - int * IFLAG; - HPL_T_panel * PANEL; - const int NN; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_pdlaswp01N applies the NB row interchanges to NN columns of the - * trailing submatrix and broadcast a column panel. - * - * A "Spread then roll" algorithm performs the swap :: broadcast of the - * row panel U at once, resulting in a minimal communication volume and - * a "very good" use of the connectivity if available. With P process - * rows and assuming bi-directional links, the running time of this - * function can be approximated by: - * - * (log_2(P)+(P-1)) * lat + K * NB * LocQ(N) / bdwth - * - * where NB is the number of rows of the row panel U, N is the global - * number of columns being updated, lat and bdwth are the latency and - * bandwidth of the network for double precision real words. K is - * a constant in (2,3] that depends on the achieved bandwidth during a - * simultaneous message exchange between two processes. An empirical - * optimistic value of K is typically 2.4. 
- * - * Arguments - * ========= - * - * PBCST (local input/output) HPL_T_panel * - * On entry, PBCST points to the data structure containing the - * panel (to be broadcast) information. - * - * IFLAG (local input/output) int * - * On entry, IFLAG indicates whether or not the broadcast has - * already been completed. If not, probing will occur, and the - * outcome will be contained in IFLAG on exit. - * - * PANEL (local input/output) HPL_T_panel * - * On entry, PANEL points to the data structure containing the - * panel information. - * - * NN (local input) const int - * On entry, NN specifies the local number of columns of the - * trailing submatrix to be swapped and broadcast starting at - * the current position. NN must be at least zero. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - double * A, * U; - int * ipID, * iplen, * ipmap, * ipmapm1, - * iwork, * lindxA = NULL, * lindxAU, - * permU; - static int equil=-1; - int icurrow, * iflag, * ipA, * ipl, jb, k, - lda, myrow, n, nprow; -#define LDU jb -/* .. - * .. Executable Statements .. - */ - n = PANEL->n; n = Mmin( NN, n ); jb = PANEL->jb; -/* - * Quick return if there is nothing to do - */ - if( ( n <= 0 ) || ( jb <= 0 ) ) return; -#ifdef HPL_DETAILED_TIMING - HPL_ptimer( HPL_TIMING_LASWP ); -#endif -/* - * Decide whether equilibration should be performed or not - */ - if( equil == -1 ) equil = PANEL->algo->equil; -/* - * Retrieve parameters from the PANEL data structure - */ - nprow = PANEL->grid->nprow; myrow = PANEL->grid->myrow; - A = PANEL->A; U = PANEL->U; iflag = PANEL->IWORK; - lda = PANEL->lda; icurrow = PANEL->prow; -/* - * Compute ipID (if not already done for this panel). 
lindxA and lindxAU - * are of length at most 2*jb - iplen is of size nprow+1, ipmap, ipmapm1 - * are of size nprow, permU is of length jb, and this function needs a - * workspace of size max( 2 * jb (plindx1), nprow+1(equil)): - * 1(iflag) + 1(ipl) + 1(ipA) + 9*jb + 3*nprow + 1 + MAX(2*jb,nprow+1) - * i.e. 4 + 9*jb + 3*nprow + max(2*jb, nprow+1); - */ - k = (int)((unsigned int)(jb) << 1); ipl = iflag + 1; ipID = ipl + 1; - ipA = ipID + ((unsigned int)(k) << 1); lindxA = ipA + 1; - lindxAU = lindxA + k; iplen = lindxAU + k; ipmap = iplen + nprow + 1; - ipmapm1 = ipmap + nprow; permU = ipmapm1 + nprow; iwork = permU + jb; - - if( *iflag == -1 ) /* no index arrays have been computed so far */ - { - HPL_pipid( PANEL, ipl, ipID ); - HPL_plindx1( PANEL, *ipl, ipID, ipA, lindxA, lindxAU, iplen, - ipmap, ipmapm1, permU, iwork ); - *iflag = 1; - } - else if( *iflag == 0 ) /* HPL_pdlaswp00N called before: reuse ipID */ - { - HPL_plindx1( PANEL, *ipl, ipID, ipA, lindxA, lindxAU, iplen, - ipmap, ipmapm1, permU, iwork ); - *iflag = 1; - } - else if( ( *iflag == 1 ) && ( equil != 0 ) ) - { /* HPL_pdlaswp01N was call before only re-compute IPLEN, IPMAP */ - HPL_plindx10( PANEL, *ipl, ipID, iplen, ipmap, ipmapm1 ); - *iflag = 1; - } -/* - * Copy into U the rows to be spread (local to icurrow) - */ - if( myrow == icurrow ) - { HPL_dlaswp01N( *ipA, n, A, lda, U, LDU, lindxA, lindxAU ); } -/* - * Spread U - optionally probe for column panel - */ - HPL_spreadN( PBCST, IFLAG, PANEL, HplRight, n, U, LDU, 0, iplen, - ipmap, ipmapm1 ); -/* - * Local exchange (everywhere but in process row icurrow) - */ - if( myrow != icurrow ) - { - k = ipmapm1[myrow]; - HPL_dlaswp06N( iplen[k+1]-iplen[k], n, A, lda, Mptr( U, iplen[k], - 0, LDU ), LDU, lindxA ); - } -/* - * Equilibration - */ - if( equil != 0 ) - HPL_equil( PBCST, IFLAG, PANEL, HplNoTrans, n, U, LDU, iplen, - ipmap, ipmapm1, iwork ); -/* - * Rolling phase - */ - HPL_rollN( PBCST, IFLAG, PANEL, n, U, LDU, iplen, ipmap, ipmapm1 ); -/* - * 
Permute U in every process row - */ - HPL_dlaswp00N( jb, n, U, LDU, permU ); - -#ifdef HPL_DETAILED_TIMING - HPL_ptimer( HPL_TIMING_LASWP ); -#endif -/* - * End of HPL_pdlaswp01N - */ -} diff --git a/hpl/src/pgesv/HPL_pdlaswp01T.c b/hpl/src/pgesv/HPL_pdlaswp01T.c deleted file mode 100644 index 31c3cfab75d26eeb43c9db3d92a7ee8a5bcf68e6..0000000000000000000000000000000000000000 --- a/hpl/src/pgesv/HPL_pdlaswp01T.c +++ /dev/null @@ -1,217 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. 
- * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -void HPL_pdlaswp01T -( - HPL_T_panel * PBCST, - int * IFLAG, - HPL_T_panel * PANEL, - const int NN -) -#else -void HPL_pdlaswp01T -( PBCST, IFLAG, PANEL, NN ) - HPL_T_panel * PBCST; - int * IFLAG; - HPL_T_panel * PANEL; - const int NN; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_pdlaswp01T applies the NB row interchanges to NN columns of the - * trailing submatrix and broadcast a column panel. - * - * A "Spread then roll" algorithm performs the swap :: broadcast of the - * row panel U at once, resulting in a minimal communication volume and - * a "very good" use of the connectivity if available. With P process - * rows and assuming bi-directional links, the running time of this - * function can be approximated by: - * - * (log_2(P)+(P-1)) * lat + K * NB * LocQ(N) / bdwth - * - * where NB is the number of rows of the row panel U, N is the global - * number of columns being updated, lat and bdwth are the latency and - * bandwidth of the network for double precision real words. 
K is - * a constant in (2,3] that depends on the achieved bandwidth during a - * simultaneous message exchange between two processes. An empirical - * optimistic value of K is typically 2.4. - * - * Arguments - * ========= - * - * PBCST (local input/output) HPL_T_panel * - * On entry, PBCST points to the data structure containing the - * panel (to be broadcast) information. - * - * IFLAG (local input/output) int * - * On entry, IFLAG indicates whether or not the broadcast has - * already been completed. If not, probing will occur, and the - * outcome will be contained in IFLAG on exit. - * - * PANEL (local input/output) HPL_T_panel * - * On entry, PANEL points to the data structure containing the - * panel information. - * - * NN (local input) const int - * On entry, NN specifies the local number of columns of the - * trailing submatrix to be swapped and broadcast starting at - * the current position. NN must be at least zero. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - double * A, * U; - int * ipID, * iplen, * ipmap, * ipmapm1, - * iwork, * lindxA = NULL, * lindxAU, - * permU; - static int equil=-1; - int icurrow, * iflag, * ipA, * ipl, jb, k, - lda, myrow, n, nprow; -#define LDU n -/* .. - * .. Executable Statements .. - */ - n = PANEL->n; n = Mmin( NN, n ); jb = PANEL->jb; -/* - * Quick return if there is nothing to do - */ - if( ( n <= 0 ) || ( jb <= 0 ) ) return; -#ifdef HPL_DETAILED_TIMING - HPL_ptimer( HPL_TIMING_LASWP ); -#endif -/* - * Decide whether equilibration should be performed or not - */ - if( equil == -1 ) equil = PANEL->algo->equil; -/* - * Retrieve parameters from the PANEL data structure - */ - nprow = PANEL->grid->nprow; myrow = PANEL->grid->myrow; - A = PANEL->A; U = PANEL->U; iflag = PANEL->IWORK; - lda = PANEL->lda; icurrow = PANEL->prow; -/* - * Compute ipID (if not already done for this panel). 
lindxA and lindxAU - * are of length at most 2*jb - iplen is of size nprow+1, ipmap, ipmapm1 - * are of size nprow, permU is of length jb, and this function needs a - * workspace of size max( 2 * jb (plindx1), nprow+1(equil)): - * 1(iflag) + 1(ipl) + 1(ipA) + 9*jb + 3*nprow + 1 + MAX(2*jb,nprow+1) - * i.e. 4 + 9*jb + 3*nprow + max(2*jb, nprow+1); - */ - k = (int)((unsigned int)(jb) << 1); ipl = iflag + 1; ipID = ipl + 1; - ipA = ipID + ((unsigned int)(k) << 1); lindxA = ipA + 1; - lindxAU = lindxA + k; iplen = lindxAU + k; ipmap = iplen + nprow + 1; - ipmapm1 = ipmap + nprow; permU = ipmapm1 + nprow; iwork = permU + jb; - - if( *iflag == -1 ) /* no index arrays have been computed so far */ - { - HPL_pipid( PANEL, ipl, ipID ); - HPL_plindx1( PANEL, *ipl, ipID, ipA, lindxA, lindxAU, iplen, - ipmap, ipmapm1, permU, iwork ); - *iflag = 1; - } - else if( *iflag == 0 ) /* HPL_pdlaswp00T called before: reuse ipID */ - { - HPL_plindx1( PANEL, *ipl, ipID, ipA, lindxA, lindxAU, iplen, - ipmap, ipmapm1, permU, iwork ); - *iflag = 1; - } - else if( ( *iflag == 1 ) && ( equil != 0 ) ) - { /* HPL_pdlaswp01T was call before only re-compute IPLEN, IPMAP */ - HPL_plindx10( PANEL, *ipl, ipID, iplen, ipmap, ipmapm1 ); - *iflag = 1; - } -/* - * Copy into U the rows to be spread (local to icurrow) - */ - if( myrow == icurrow ) - { HPL_dlaswp01T( *ipA, n, A, lda, U, LDU, lindxA, lindxAU ); } -/* - * Spread U - optionally probe for column panel - */ - HPL_spreadT( PBCST, IFLAG, PANEL, HplRight, n, U, LDU, 0, iplen, - ipmap, ipmapm1 ); -/* - * Local exchange (everywhere but in process row icurrow) - */ - if( myrow != icurrow ) - { - k = ipmapm1[myrow]; - HPL_dlaswp06T( iplen[k+1]-iplen[k], n, A, lda, Mptr( U, 0, - iplen[k], LDU ), LDU, lindxA ); - } -/* - * Equilibration - */ - if( equil != 0 ) - HPL_equil( PBCST, IFLAG, PANEL, HplTrans, n, U, LDU, iplen, ipmap, - ipmapm1, iwork ); -/* - * Rolling phase - */ - HPL_rollT( PBCST, IFLAG, PANEL, n, U, LDU, iplen, ipmap, ipmapm1 ); -/* - * 
Permute U in every process row - */ - HPL_dlaswp10N( n, jb, U, LDU, permU ); - -#ifdef HPL_DETAILED_TIMING - HPL_ptimer( HPL_TIMING_LASWP ); -#endif -/* - * End of HPL_pdlaswp01T - */ -} diff --git a/hpl/src/pgesv/HPL_pdtrsv.c b/hpl/src/pgesv/HPL_pdtrsv.c deleted file mode 100644 index 69bf1f6a1f19c13d7884d8db2fc083e0c4985ce4..0000000000000000000000000000000000000000 --- a/hpl/src/pgesv/HPL_pdtrsv.c +++ /dev/null @@ -1,296 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. 
- * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -void HPL_pdtrsv -( - HPL_T_grid * GRID, - HPL_T_pmat * AMAT -) -#else -void HPL_pdtrsv -( GRID, AMAT ) - HPL_T_grid * GRID; - HPL_T_pmat * AMAT; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_pdtrsv solves an upper triangular system of linear equations. - * - * The rhs is the last column of the N by N+1 matrix A. The solve starts - * in the process column owning the Nth column of A, so the rhs b may - * need to be moved one process column to the left at the beginning. The - * routine therefore needs a column vector in every process column but - * the one owning b. The result is replicated in all process rows, and - * returned in XR, i.e. XR is of size nq = LOCq( N ) in all processes. - * - * The algorithm uses decreasing one-ring broadcast in process rows and - * columns implemented in terms of synchronous communication point to - * point primitives. The lookahead of depth 1 is used to minimize the - * critical path. 
This entire operation is essentially ``latency'' bound - * and an estimate of its running time is given by: - * - * (move rhs) lat + N / ( P bdwth ) + - * (solve) ((N / NB)-1) 2 (lat + NB / bdwth) + - * gam2 N^2 / ( P Q ), - * - * where gam2 is an estimate of the Level 2 BLAS rate of execution. - * There are N / NB diagonal blocks. One must exchange 2 messages of - * length NB to compute the next NB entries of the vector solution, as - * well as performing a total of N^2 floating point operations. - * - * Arguments - * ========= - * - * GRID (local input) HPL_T_grid * - * On entry, GRID points to the data structure containing the - * process grid information. - * - * AMAT (local input/output) HPL_T_pmat * - * On entry, AMAT points to the data structure containing the - * local array information. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - MPI_Comm Ccomm, Rcomm; - double * A=NULL, * Aprev=NULL, * Aptr, * XC=NULL, - * XR=NULL, * Xd=NULL, * Xdprev=NULL, - * W=NULL; - int Alcol, Alrow, Anpprev, Anp, Anq, Bcol, - Cmsgid, GridIsNotPx1, GridIsNot1xQ, Rmsgid, - Wfr=0, colprev, kb, kbprev, lda, mycol, - myrow, n, n1, n1p, n1pprev=0, nb, npcol, - nprow, rowprev, tmp1, tmp2; -/* .. - * .. Executable Statements .. - */ -#ifdef HPL_DETAILED_TIMING - HPL_ptimer( HPL_TIMING_PTRSV ); -#endif - if( ( n = AMAT->n ) <= 0 ) return; - nb = AMAT->nb; lda = AMAT->ld; A = AMAT->A; XR = AMAT->X; - - (void) HPL_grid_info( GRID, &nprow, &npcol, &myrow, &mycol ); - Rcomm = GRID->row_comm; Rmsgid = MSGID_BEGIN_PTRSV; - Ccomm = GRID->col_comm; Cmsgid = MSGID_BEGIN_PTRSV + 1; - GridIsNot1xQ = ( nprow > 1 ); GridIsNotPx1 = ( npcol > 1 ); -/* - * Move the rhs in the process column owning the last column of A. 
- */ - Mnumroc( Anp, n, nb, nb, myrow, 0, nprow ); - Mnumroc( Anq, n, nb, nb, mycol, 0, npcol ); - - tmp1 = ( n - 1 ) / nb; - Alrow = tmp1 - ( tmp1 / nprow ) * nprow; - Alcol = tmp1 - ( tmp1 / npcol ) * npcol; - kb = n - tmp1 * nb; - - Aptr = (double *)(A); XC = Mptr( Aptr, 0, Anq, lda ); - Mindxg2p( n, nb, nb, Bcol, 0, npcol ); - - if( ( Anp > 0 ) && ( Alcol != Bcol ) ) - { - if( mycol == Bcol ) - { (void) HPL_send( XC, Anp, Alcol, Rmsgid, Rcomm ); } - else if( mycol == Alcol ) - { (void) HPL_recv( XC, Anp, Bcol, Rmsgid, Rcomm ); } - } - Rmsgid = ( Rmsgid + 2 > - MSGID_END_PTRSV ? MSGID_BEGIN_PTRSV : Rmsgid + 2 ); - if( mycol != Alcol ) - { for( tmp1=0; tmp1 < Anp; tmp1++ ) XC[tmp1] = HPL_rzero; } -/* - * Set up lookahead - */ - n1 = ( npcol - 1 ) * nb; n1 = Mmax( n1, nb ); - if( Anp > 0 ) - { - W = (double*)malloc( (size_t)(Mmin( n1, Anp )) * sizeof( double ) ); - if( W == NULL ) - { HPL_pabort( __LINE__, "HPL_pdtrsv", "Memory allocation failed" ); } - Wfr = 1; - } - - Anpprev = Anp; Xdprev = XR; Aprev = Aptr = Mptr( Aptr, 0, Anq, lda ); - tmp1 = n - kb; tmp1 -= ( tmp2 = Mmin( tmp1, n1 ) ); - MnumrocI( n1pprev, tmp2, Mmax( 0, tmp1 ), nb, nb, myrow, 0, nprow ); - - if( myrow == Alrow ) { Anpprev = ( Anp -= kb ); } - if( mycol == Alcol ) - { - Aprev = ( Aptr -= lda * kb ); Anq -= kb; Xdprev = ( Xd = XR + Anq ); - if( myrow == Alrow ) - { - HPL_dtrsv( HplColumnMajor, HplUpper, HplNoTrans, HplNonUnit, - kb, Aptr+Anp, lda, XC+Anp, 1 ); - HPL_dcopy( kb, XC+Anp, 1, Xd, 1 ); - } - } - - rowprev = Alrow; Alrow = MModSub1( Alrow, nprow ); - colprev = Alcol; Alcol = MModSub1( Alcol, npcol ); - kbprev = kb; n -= kb; - tmp1 = n - ( kb = nb ); tmp1 -= ( tmp2 = Mmin( tmp1, n1 ) ); - MnumrocI( n1p, tmp2, Mmax( 0, tmp1 ), nb, nb, myrow, 0, nprow ); -/* - * Start the operations - */ - while( n > 0 ) - { - if( mycol == Alcol ) { Aptr -= lda * kb; Anq -= kb; Xd = XR + Anq; } - if( myrow == Alrow ) { Anp -= kb; } -/* - * Broadcast (decreasing-ring) of previous solution block in 
previous - * process column, compute partial update of current block and send it - * to current process column. - */ - if( mycol == colprev ) - { -/* - * Send previous solution block in process row above - */ - if( myrow == rowprev ) - { - if( GridIsNot1xQ ) - (void) HPL_send( Xdprev, kbprev, MModSub1( myrow, nprow ), - Cmsgid, Ccomm ); - } - else - { - (void) HPL_recv( Xdprev, kbprev, MModAdd1( myrow, nprow ), - Cmsgid, Ccomm ); - } -/* - * Compute partial update of previous solution block and send it to cur- - * rent column - */ - if( n1pprev > 0 ) - { - tmp1 = Anpprev - n1pprev; - HPL_dgemv( HplColumnMajor, HplNoTrans, n1pprev, kbprev, - -HPL_rone, Aprev+tmp1, lda, Xdprev, 1, HPL_rone, - XC+tmp1, 1 ); - if( GridIsNotPx1 ) - (void) HPL_send( XC+tmp1, n1pprev, Alcol, Rmsgid, Rcomm ); - } -/* - * Finish the (decreasing-ring) broadcast of the solution block in pre- - * vious process column - */ - if( ( myrow != rowprev ) && - ( myrow != MModAdd1( rowprev, nprow ) ) ) - (void) HPL_send( Xdprev, kbprev, MModSub1( myrow, nprow ), - Cmsgid, Ccomm ); - } - else if( mycol == Alcol ) - { -/* - * Current column receives and accumulates partial update of previous - * solution block - */ - if( n1pprev > 0 ) - { - (void) HPL_recv( W, n1pprev, colprev, Rmsgid, Rcomm ); - HPL_daxpy( n1pprev, HPL_rone, W, 1, XC+Anpprev-n1pprev, 1 ); - } - } -/* - * Solve current diagonal block - */ - if( ( mycol == Alcol ) && ( myrow == Alrow ) ) - { - HPL_dtrsv( HplColumnMajor, HplUpper, HplNoTrans, HplNonUnit, - kb, Aptr+Anp, lda, XC+Anp, 1 ); - HPL_dcopy( kb, XC+Anp, 1, XR+Anq, 1 ); - } -/* -* Finish previous update -*/ - if( ( mycol == colprev ) && ( ( tmp1 = Anpprev - n1pprev ) > 0 ) ) - HPL_dgemv( HplColumnMajor, HplNoTrans, tmp1, kbprev, -HPL_rone, - Aprev, lda, Xdprev, 1, HPL_rone, XC, 1 ); -/* -* Save info of current step and update info for the next step -*/ - if( mycol == Alcol ) { Xdprev = Xd; Aprev = Aptr; } - if( myrow == Alrow ) { Anpprev -= kb; } - rowprev = Alrow; colprev = 
Alcol; - n1pprev = n1p; kbprev = kb; n -= kb; - Alrow = MModSub1( Alrow, nprow ); Alcol = MModSub1( Alcol, npcol ); - tmp1 = n - ( kb = nb ); tmp1 -= ( tmp2 = Mmin( tmp1, n1 ) ); - MnumrocI( n1p, tmp2, Mmax( 0, tmp1 ), nb, nb, myrow, 0, nprow ); - - Rmsgid = ( Rmsgid+2 > MSGID_END_PTRSV ? - MSGID_BEGIN_PTRSV : Rmsgid+2 ); - Cmsgid = ( Cmsgid+2 > MSGID_END_PTRSV ? - MSGID_BEGIN_PTRSV+1 : Cmsgid+2 ); - } -/* - * Replicate last solution block - */ - if( mycol == colprev ) - (void) HPL_broadcast( (void *)(XR), kbprev, HPL_DOUBLE, rowprev, - Ccomm ); - - if( Wfr ) free( W ); -#ifdef HPL_DETAILED_TIMING - HPL_ptimer( HPL_TIMING_PTRSV ); -#endif -/* - * End of HPL_pdtrsv - */ -} diff --git a/hpl/src/pgesv/HPL_pdupdateNN.c b/hpl/src/pgesv/HPL_pdupdateNN.c deleted file mode 100644 index 1af485295dbc38e951f149487980b4633504180d..0000000000000000000000000000000000000000 --- a/hpl/src/pgesv/HPL_pdupdateNN.c +++ /dev/null @@ -1,442 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. 
All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -void HPL_pdupdateNN -( - HPL_T_panel * PBCST, - int * IFLAG, - HPL_T_panel * PANEL, - const int NN -) -#else -void HPL_pdupdateNN -( PBCST, IFLAG, PANEL, NN ) - HPL_T_panel * PBCST; - int * IFLAG; - HPL_T_panel * PANEL; - const int NN; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_pdupdateNN broadcast - forward the panel PBCST and simultaneously - * applies the row interchanges and updates part of the trailing (using - * the panel PANEL) submatrix. 
- * - * Arguments - * ========= - * - * PBCST (local input/output) HPL_T_panel * - * On entry, PBCST points to the data structure containing the - * panel (to be broadcast) information. - * - * IFLAG (local output) int * - * On exit, IFLAG indicates whether or not the broadcast has - * been completed when PBCST is not NULL on entry. In that case, - * IFLAG is left unchanged. - * - * PANEL (local input/output) HPL_T_panel * - * On entry, PANEL points to the data structure containing the - * panel (to be updated) information. - * - * NN (local input) const int - * On entry, NN specifies the local number of columns of the - * trailing submatrix to be updated starting at the current - * position. NN must be at least zero. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - double * Aptr, * L1ptr, * L2ptr, * Uptr, * dpiv; - int * ipiv; -#ifdef HPL_CALL_VSIPL - vsip_mview_d * Av0, * Av1, * Lv0, * Lv1, * Uv0, * Uv1; -#endif - int curr, i, iroff, jb, lda, ldl2, mp, n, nb, - nq0, nn, test; - static int tswap = 0; - static HPL_T_SWAP fswap = HPL_NO_SWP; -#define LDU jb -/* .. - * .. Executable Statements .. - */ -#ifdef HPL_DETAILED_TIMING - HPL_ptimer( HPL_TIMING_UPDATE ); -#endif - nb = PANEL->nb; jb = PANEL->jb; n = PANEL->nq; lda = PANEL->lda; - if( NN >= 0 ) n = Mmin( NN, n ); -/* - * There is nothing to update, enforce the panel broadcast. 
- */ - if( ( n <= 0 ) || ( jb <= 0 ) ) - { - if( PBCST != NULL ) - { - do { (void) HPL_bcast( PBCST, IFLAG ); } - while( *IFLAG != HPL_SUCCESS ); - } -#ifdef HPL_DETAILED_TIMING - HPL_ptimer( HPL_TIMING_UPDATE ); -#endif - return; - } -/* - * Enable/disable the column panel probing mechanism - */ - (void) HPL_bcast( PBCST, &test ); -/* - * 1 x Q case - */ - if( PANEL->grid->nprow == 1 ) - { - Aptr = PANEL->A; L2ptr = PANEL->L2; L1ptr = PANEL->L1; - ldl2 = PANEL->ldl2; dpiv = PANEL->DPIV; ipiv = PANEL->IWORK; - mp = PANEL->mp - jb; iroff = PANEL->ii; nq0 = 0; -#ifdef HPL_CALL_VSIPL -/* - * Admit the blocks - */ - (void) vsip_blockadmit_d( PANEL->Ablock, VSIP_TRUE ); - (void) vsip_blockadmit_d( PANEL->L2block, VSIP_TRUE ); -/* - * Create the matrix views - */ - Av0 = vsip_mbind_d( PANEL->Ablock, 0, 1, lda, lda, PANEL->pmat->nq ); - Lv0 = vsip_mbind_d( PANEL->L2block, 0, 1, ldl2, ldl2, jb ); -/* - * Create the matrix subviews - */ - Lv1 = vsip_msubview_d( Lv0, 0, 0, mp, jb ); -#endif - for( i = 0; i < jb; i++ ) { ipiv[i] = (int)(dpiv[i]) - iroff; } -/* - * So far we have not updated anything - test availability of the panel - * to be forwarded - If detected forward it and finish the update in one - * step. 
- */ - while ( test == HPL_KEEP_TESTING ) - { - nn = n - nq0; nn = Mmin( nb, nn ); -/* - * Update nb columns at a time - */ -#ifdef HPL_DETAILED_TIMING - HPL_ptimer( HPL_TIMING_LASWP ); - HPL_dlaswp00N( jb, nn, Aptr, lda, ipiv ); - HPL_ptimer( HPL_TIMING_LASWP ); -#else - HPL_dlaswp00N( jb, nn, Aptr, lda, ipiv ); -#endif - HPL_dtrsm( HplColumnMajor, HplLeft, HplLower, HplNoTrans, - HplUnit, jb, nn, HPL_rone, L1ptr, jb, Aptr, lda ); -#ifdef HPL_CALL_VSIPL -/* - * Create the matrix subviews - */ - Uv1 = vsip_msubview_d( Av0, PANEL->ii, PANEL->jj+nq0, jb, nn ); - Av1 = vsip_msubview_d( Av0, PANEL->ii+jb, PANEL->jj+nq0, mp, nn ); - - vsip_gemp_d( -HPL_rone, Lv1, VSIP_MAT_NTRANS, Uv1, VSIP_MAT_NTRANS, - HPL_rone, Av1 ); -/* - * Destroy the matrix subviews - */ - (void) vsip_mdestroy_d( Av1 ); - (void) vsip_mdestroy_d( Uv1 ); -#else - HPL_dgemm( HplColumnMajor, HplNoTrans, HplNoTrans, mp, nn, - jb, -HPL_rone, L2ptr, ldl2, Aptr, lda, HPL_rone, - Mptr( Aptr, jb, 0, lda ), lda ); -#endif - Aptr = Mptr( Aptr, 0, nn, lda ); nq0 += nn; - - (void) HPL_bcast( PBCST, &test ); - } -/* - * The panel has been forwarded at that point, finish the update - */ - if( ( nn = n - nq0 ) > 0 ) - { -#ifdef HPL_DETAILED_TIMING - HPL_ptimer( HPL_TIMING_LASWP ); - HPL_dlaswp00N( jb, nn, Aptr, lda, ipiv ); - HPL_ptimer( HPL_TIMING_LASWP ); -#else - HPL_dlaswp00N( jb, nn, Aptr, lda, ipiv ); -#endif - HPL_dtrsm( HplColumnMajor, HplLeft, HplLower, HplNoTrans, - HplUnit, jb, nn, HPL_rone, L1ptr, jb, Aptr, lda ); -#ifdef HPL_CALL_VSIPL -/* - * Create the matrix subviews - */ - Uv1 = vsip_msubview_d( Av0, PANEL->ii, PANEL->jj+nq0, jb, nn ); - Av1 = vsip_msubview_d( Av0, PANEL->ii+jb, PANEL->jj+nq0, mp, nn ); - - vsip_gemp_d( -HPL_rone, Lv1, VSIP_MAT_NTRANS, Uv1, VSIP_MAT_NTRANS, - HPL_rone, Av1 ); -/* - * Destroy the matrix subviews - */ - (void) vsip_mdestroy_d( Av1 ); - (void) vsip_mdestroy_d( Uv1 ); -#else - HPL_dgemm( HplColumnMajor, HplNoTrans, HplNoTrans, mp, nn, - jb, -HPL_rone, L2ptr, ldl2, 
Aptr, lda, HPL_rone, - Mptr( Aptr, jb, 0, lda ), lda ); -#endif - } -#ifdef HPL_CALL_VSIPL -/* - * Destroy the matrix subviews - */ - (void) vsip_mdestroy_d( Lv1 ); -/* - * Release the blocks - */ - (void) vsip_blockrelease_d( vsip_mgetblock_d( Lv0 ), VSIP_TRUE ); - (void) vsip_blockrelease_d( vsip_mgetblock_d( Av0 ), VSIP_TRUE ); -/* - * Destroy the matrix views - */ - (void) vsip_mdestroy_d( Lv0 ); - (void) vsip_mdestroy_d( Av0 ); -#endif - } - else /* nprow > 1 ... */ - { -/* - * Selection of the swapping algorithm - swap:broadcast U. - */ - if( fswap == HPL_NO_SWP ) - { fswap = PANEL->algo->fswap; tswap = PANEL->algo->fsthr; } - - if( ( fswap == HPL_SWAP01 ) || - ( ( fswap == HPL_SW_MIX ) && ( n > tswap ) ) ) - { HPL_pdlaswp01N( PBCST, &test, PANEL, n ); } - else - { HPL_pdlaswp00N( PBCST, &test, PANEL, n ); } -/* - * Compute redundantly row block of U and update trailing submatrix - */ - nq0 = 0; curr = ( PANEL->grid->myrow == PANEL->prow ? 1 : 0 ); - Aptr = PANEL->A; L2ptr = PANEL->L2; L1ptr = PANEL->L1; - Uptr = PANEL->U; ldl2 = PANEL->ldl2; - mp = PANEL->mp - ( curr != 0 ? 
jb : 0 ); -#ifdef HPL_CALL_VSIPL -/* - * Admit the blocks - */ - (void) vsip_blockadmit_d( PANEL->Ablock, VSIP_TRUE ); - (void) vsip_blockadmit_d( PANEL->L2block, VSIP_TRUE ); - (void) vsip_blockadmit_d( PANEL->Ublock, VSIP_TRUE ); -/* - * Create the matrix views - */ - Av0 = vsip_mbind_d( PANEL->Ablock, 0, 1, lda, lda, PANEL->pmat->nq ); - Lv0 = vsip_mbind_d( PANEL->L2block, 0, 1, ldl2, ldl2, jb ); - Uv0 = vsip_mbind_d( PANEL->Ublock, 0, 1, LDU, LDU, n ); -/* - * Create the matrix subviews - */ - Lv1 = vsip_msubview_d( Lv0, 0, 0, mp, jb ); -#endif -/* - * Broadcast has not occured yet, spliting the computational part - */ - while ( test == HPL_KEEP_TESTING ) - { - nn = n - nq0; nn = Mmin( nb, nn ); - - HPL_dtrsm( HplColumnMajor, HplLeft, HplLower, HplNoTrans, - HplUnit, jb, nn, HPL_rone, L1ptr, jb, Uptr, LDU ); - if( curr != 0 ) - { -#ifdef HPL_CALL_VSIPL -/* - * Create the matrix subviews - */ - Uv1 = vsip_msubview_d( Uv0, 0, nq0, jb, nn ); - Av1 = vsip_msubview_d( Av0, PANEL->ii+jb, PANEL->jj+nq0, mp, nn ); - - vsip_gemp_d( -HPL_rone, Lv1, VSIP_MAT_NTRANS, Uv1, VSIP_MAT_NTRANS, - HPL_rone, Av1 ); -/* - * Destroy the matrix subviews - */ - (void) vsip_mdestroy_d( Av1 ); - (void) vsip_mdestroy_d( Uv1 ); -#else - HPL_dgemm( HplColumnMajor, HplNoTrans, HplNoTrans, mp, nn, - jb, -HPL_rone, L2ptr, ldl2, Uptr, LDU, HPL_rone, - Mptr( Aptr, jb, 0, lda ), lda ); -#endif - HPL_dlacpy( jb, nn, Uptr, LDU, Aptr, lda ); - } - else - { -#ifdef HPL_CALL_VSIPL -/* - * Create the matrix subviews - */ - Uv1 = vsip_msubview_d( Uv0, 0, nq0, jb, nn ); - Av1 = vsip_msubview_d( Av0, PANEL->ii, PANEL->jj+nq0, mp, nn ); - - vsip_gemp_d( -HPL_rone, Lv1, VSIP_MAT_NTRANS, Uv1, VSIP_MAT_NTRANS, - HPL_rone, Av1 ); -/* - * Destroy the matrix subviews - */ - (void) vsip_mdestroy_d( Av1 ); - (void) vsip_mdestroy_d( Uv1 ); -#else - HPL_dgemm( HplColumnMajor, HplNoTrans, HplNoTrans, mp, nn, - jb, -HPL_rone, L2ptr, ldl2, Uptr, LDU, HPL_rone, - Aptr, lda ); -#endif - } - Uptr = Mptr( Uptr, 0, nn, LDU 
); - Aptr = Mptr( Aptr, 0, nn, lda ); nq0 += nn; - - (void) HPL_bcast( PBCST, &test ); - } -/* - * The panel has been forwarded at that point, finish the update - */ - if( ( nn = n - nq0 ) > 0 ) - { - HPL_dtrsm( HplColumnMajor, HplLeft, HplLower, HplNoTrans, - HplUnit, jb, nn, HPL_rone, L1ptr, jb, Uptr, LDU ); - - if( curr != 0 ) - { -#ifdef HPL_CALL_VSIPL -/* - * Create the matrix subviews - */ - Uv1 = vsip_msubview_d( Uv0, 0, nq0, jb, nn ); - Av1 = vsip_msubview_d( Av0, PANEL->ii+jb, PANEL->jj+nq0, mp, nn ); - - vsip_gemp_d( -HPL_rone, Lv1, VSIP_MAT_NTRANS, Uv1, VSIP_MAT_NTRANS, - HPL_rone, Av1 ); -/* - * Destroy the matrix subviews - */ - (void) vsip_mdestroy_d( Av1 ); - (void) vsip_mdestroy_d( Uv1 ); -#else - HPL_dgemm( HplColumnMajor, HplNoTrans, HplNoTrans, mp, nn, - jb, -HPL_rone, L2ptr, ldl2, Uptr, LDU, HPL_rone, - Mptr( Aptr, jb, 0, lda ), lda ); -#endif - HPL_dlacpy( jb, nn, Uptr, LDU, Aptr, lda ); - } - else - { -#ifdef HPL_CALL_VSIPL -/* - * Create the matrix subviews - */ - Uv1 = vsip_msubview_d( Uv0, 0, nq0, jb, nn ); - Av1 = vsip_msubview_d( Av0, PANEL->ii, PANEL->jj+nq0, mp, nn ); - - vsip_gemp_d( -HPL_rone, Lv1, VSIP_MAT_NTRANS, Uv1, VSIP_MAT_NTRANS, - HPL_rone, Av1 ); -/* - * Destroy the matrix subviews - */ - (void) vsip_mdestroy_d( Av1 ); - (void) vsip_mdestroy_d( Uv1 ); -#else - HPL_dgemm( HplColumnMajor, HplNoTrans, HplNoTrans, mp, nn, - jb, -HPL_rone, L2ptr, ldl2, Uptr, LDU, HPL_rone, - Aptr, lda ); -#endif - } - } -#ifdef HPL_CALL_VSIPL -/* - * Destroy the matrix subviews - */ - (void) vsip_mdestroy_d( Lv1 ); -/* - * Release the blocks - */ - (void) vsip_blockrelease_d( vsip_mgetblock_d( Uv0 ), VSIP_TRUE ); - (void) vsip_blockrelease_d( vsip_mgetblock_d( Lv0 ), VSIP_TRUE ); - (void) vsip_blockrelease_d( vsip_mgetblock_d( Av0 ), VSIP_TRUE ); -/* - * Destroy the matrix views - */ - (void) vsip_mdestroy_d( Uv0 ); - (void) vsip_mdestroy_d( Lv0 ); - (void) vsip_mdestroy_d( Av0 ); -#endif - } - - PANEL->A = Mptr( PANEL->A, 0, n, lda ); PANEL->nq 
-= n; PANEL->jj += n; -/* - * return the outcome of the probe (should always be HPL_SUCCESS, the - * panel broadcast is enforced in that routine). - */ - if( PBCST != NULL ) *IFLAG = test; -#ifdef HPL_DETAILED_TIMING - HPL_ptimer( HPL_TIMING_UPDATE ); -#endif -/* - * End of HPL_pdupdateNN - */ -} diff --git a/hpl/src/pgesv/HPL_pdupdateNT.c b/hpl/src/pgesv/HPL_pdupdateNT.c deleted file mode 100644 index 3b35cd2411c254987aa9547a58e7afe7ede95ec0..0000000000000000000000000000000000000000 --- a/hpl/src/pgesv/HPL_pdupdateNT.c +++ /dev/null @@ -1,443 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. 
- * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -void HPL_pdupdateNT -( - HPL_T_panel * PBCST, - int * IFLAG, - HPL_T_panel * PANEL, - const int NN -) -#else -void HPL_pdupdateNT -( PBCST, IFLAG, PANEL, NN ) - HPL_T_panel * PBCST; - int * IFLAG; - HPL_T_panel * PANEL; - const int NN; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_pdupdateNT broadcast - forward the panel PBCST and simultaneously - * applies the row interchanges and updates part of the trailing (using - * the panel PANEL) submatrix. - * - * Arguments - * ========= - * - * PBCST (local input/output) HPL_T_panel * - * On entry, PBCST points to the data structure containing the - * panel (to be broadcast) information. - * - * IFLAG (local output) int * - * On exit, IFLAG indicates whether or not the broadcast has - * been completed when PBCST is not NULL on entry. In that case, - * IFLAG is left unchanged. - * - * PANEL (local input/output) HPL_T_panel * - * On entry, PANEL points to the data structure containing the - * panel (to be updated) information. 
- * - * NN (local input) const int - * On entry, NN specifies the local number of columns of the - * trailing submatrix to be updated starting at the current - * position. NN must be at least zero. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - double * Aptr, * L1ptr, * L2ptr, * Uptr, * dpiv; - int * ipiv; -#ifdef HPL_CALL_VSIPL - vsip_mview_d * Av0, * Av1, * Lv0, * Lv1, * Uv0, * Uv1; -#endif - int curr, i, iroff, jb, lda, ldl2, mp, n, nb, - nq0, nn, test; - static int tswap = 0; - static HPL_T_SWAP fswap = HPL_NO_SWP; -#define LDU n -/* .. - * .. Executable Statements .. - */ -#ifdef HPL_DETAILED_TIMING - HPL_ptimer( HPL_TIMING_UPDATE ); -#endif - nb = PANEL->nb; jb = PANEL->jb; n = PANEL->nq; lda = PANEL->lda; - if( NN >= 0 ) n = Mmin( NN, n ); -/* - * There is nothing to update, enforce the panel broadcast. - */ - if( ( n <= 0 ) || ( jb <= 0 ) ) - { - if( PBCST != NULL ) - { - do { (void) HPL_bcast( PBCST, IFLAG ); } - while( *IFLAG != HPL_SUCCESS ); - } -#ifdef HPL_DETAILED_TIMING - HPL_ptimer( HPL_TIMING_UPDATE ); -#endif - return; - } -/* - * Enable/disable the column panel probing mechanism - */ - (void) HPL_bcast( PBCST, &test ); -/* - * 1 x Q case - */ - if( PANEL->grid->nprow == 1 ) - { - Aptr = PANEL->A; L2ptr = PANEL->L2; L1ptr = PANEL->L1; - ldl2 = PANEL->ldl2; dpiv = PANEL->DPIV; ipiv = PANEL->IWORK; - mp = PANEL->mp - jb; iroff = PANEL->ii; nq0 = 0; -#ifdef HPL_CALL_VSIPL -/* - * Admit the blocks - */ - (void) vsip_blockadmit_d( PANEL->Ablock, VSIP_TRUE ); - (void) vsip_blockadmit_d( PANEL->L2block, VSIP_TRUE ); -/* - * Create the matrix views - */ - Av0 = vsip_mbind_d( PANEL->Ablock, 0, 1, lda, lda, PANEL->pmat->nq ); - Lv0 = vsip_mbind_d( PANEL->L2block, 0, 1, ldl2, ldl2, jb ); -/* - * Create the matrix subviews - */ - Lv1 = vsip_msubview_d( Lv0, 0, 0, mp, jb ); -#endif - for( i = 0; i < jb; i++ ) { ipiv[i] = (int)(dpiv[i]) - iroff; } -/* - * So far we have not updated 
anything - test availability of the panel - * to be forwarded - If detected forward it and finish the update in one - * step. - */ - while ( test == HPL_KEEP_TESTING ) - { - nn = n - nq0; nn = Mmin( nb, nn ); -/* - * Update nb columns at a time - */ -#ifdef HPL_DETAILED_TIMING - HPL_ptimer( HPL_TIMING_LASWP ); - HPL_dlaswp00N( jb, nn, Aptr, lda, ipiv ); - HPL_ptimer( HPL_TIMING_LASWP ); -#else - HPL_dlaswp00N( jb, nn, Aptr, lda, ipiv ); -#endif - HPL_dtrsm( HplColumnMajor, HplLeft, HplLower, HplNoTrans, - HplUnit, jb, nn, HPL_rone, L1ptr, jb, Aptr, lda ); -#ifdef HPL_CALL_VSIPL -/* - * Create the matrix subviews - */ - Uv1 = vsip_msubview_d( Av0, PANEL->ii, PANEL->jj+nq0, jb, nn ); - Av1 = vsip_msubview_d( Av0, PANEL->ii+jb, PANEL->jj+nq0, mp, nn ); - - vsip_gemp_d( -HPL_rone, Lv1, VSIP_MAT_NTRANS, Uv1, VSIP_MAT_NTRANS, - HPL_rone, Av1 ); -/* - * Destroy the matrix subviews - */ - (void) vsip_mdestroy_d( Av1 ); - (void) vsip_mdestroy_d( Uv1 ); -#else - HPL_dgemm( HplColumnMajor, HplNoTrans, HplNoTrans, mp, nn, - jb, -HPL_rone, L2ptr, ldl2, Aptr, lda, HPL_rone, - Mptr( Aptr, jb, 0, lda ), lda ); -#endif - Aptr = Mptr( Aptr, 0, nn, lda ); nq0 += nn; - - (void) HPL_bcast( PBCST, &test ); - } -/* - * The panel has been forwarded at that point, finish the update - */ - if( ( nn = n - nq0 ) > 0 ) - { -#ifdef HPL_DETAILED_TIMING - HPL_ptimer( HPL_TIMING_LASWP ); - HPL_dlaswp00N( jb, nn, Aptr, lda, ipiv ); - HPL_ptimer( HPL_TIMING_LASWP ); -#else - HPL_dlaswp00N( jb, nn, Aptr, lda, ipiv ); -#endif - HPL_dtrsm( HplColumnMajor, HplLeft, HplLower, HplNoTrans, - HplUnit, jb, nn, HPL_rone, L1ptr, jb, Aptr, lda ); -#ifdef HPL_CALL_VSIPL -/* - * Create the matrix subviews - */ - Uv1 = vsip_msubview_d( Av0, PANEL->ii, PANEL->jj+nq0, jb, nn ); - Av1 = vsip_msubview_d( Av0, PANEL->ii+jb, PANEL->jj+nq0, mp, nn ); - - vsip_gemp_d( -HPL_rone, Lv1, VSIP_MAT_NTRANS, Uv1, VSIP_MAT_NTRANS, - HPL_rone, Av1 ); -/* - * Destroy the matrix subviews - */ - (void) vsip_mdestroy_d( Av1 ); - (void) 
vsip_mdestroy_d( Uv1 ); -#else - HPL_dgemm( HplColumnMajor, HplNoTrans, HplNoTrans, mp, nn, - jb, -HPL_rone, L2ptr, ldl2, Aptr, lda, HPL_rone, - Mptr( Aptr, jb, 0, lda ), lda ); -#endif - } -#ifdef HPL_CALL_VSIPL -/* - * Destroy the matrix subviews - */ - (void) vsip_mdestroy_d( Lv1 ); -/* - * Release the blocks - */ - (void) vsip_blockrelease_d( vsip_mgetblock_d( Lv0 ), VSIP_TRUE ); - (void) vsip_blockrelease_d( vsip_mgetblock_d( Av0 ), VSIP_TRUE ); -/* - * Destroy the matrix views - */ - (void) vsip_mdestroy_d( Lv0 ); - (void) vsip_mdestroy_d( Av0 ); -#endif - } - else /* nprow > 1 ... */ - { -/* - * Selection of the swapping algorithm - swap:broadcast U. - */ - if( fswap == HPL_NO_SWP ) - { fswap = PANEL->algo->fswap; tswap = PANEL->algo->fsthr; } - - if( ( fswap == HPL_SWAP01 ) || - ( ( fswap == HPL_SW_MIX ) && ( n > tswap ) ) ) - { HPL_pdlaswp01T( PBCST, &test, PANEL, n ); } - else - { HPL_pdlaswp00T( PBCST, &test, PANEL, n ); } -/* - * Compute redundantly row block of U and update trailing submatrix - */ - nq0 = 0; curr = ( PANEL->grid->myrow == PANEL->prow ? 1 : 0 ); - Aptr = PANEL->A; L2ptr = PANEL->L2; L1ptr = PANEL->L1; - Uptr = PANEL->U; ldl2 = PANEL->ldl2; - mp = PANEL->mp - ( curr != 0 ? 
jb : 0 ); -#ifdef HPL_CALL_VSIPL -/* - * Admit the blocks - */ - (void) vsip_blockadmit_d( PANEL->Ablock, VSIP_TRUE ); - (void) vsip_blockadmit_d( PANEL->L2block, VSIP_TRUE ); - (void) vsip_blockadmit_d( PANEL->Ublock, VSIP_TRUE ); -/* - * Create the matrix views - */ - Av0 = vsip_mbind_d( PANEL->Ablock, 0, 1, lda, lda, PANEL->pmat->nq ); - Lv0 = vsip_mbind_d( PANEL->L2block, 0, 1, ldl2, ldl2, jb ); - Uv0 = vsip_mbind_d( PANEL->Ublock, 0, 1, LDU, LDU, jb ); -/* - * Create the matrix subviews - */ - Lv1 = vsip_msubview_d( Lv0, 0, 0, mp, jb ); -#endif -/* - * Broadcast has not occured yet, spliting the computational part - */ - while ( test == HPL_KEEP_TESTING ) - { - nn = n - nq0; nn = Mmin( nb, nn ); - - HPL_dtrsm( HplColumnMajor, HplRight, HplLower, HplTrans, - HplUnit, nn, jb, HPL_rone, L1ptr, jb, Uptr, LDU ); - - if( curr != 0 ) - { -#ifdef HPL_CALL_VSIPL -/* - * Create the matrix subviews - */ - Uv1 = vsip_msubview_d( Uv0, nq0, 0, nn, jb ); - Av1 = vsip_msubview_d( Av0, PANEL->ii+jb, PANEL->jj+nq0, mp, nn ); - - vsip_gemp_d( -HPL_rone, Lv1, VSIP_MAT_NTRANS, Uv1, VSIP_MAT_TRANS, - HPL_rone, Av1 ); -/* - * Destroy the matrix subviews - */ - (void) vsip_mdestroy_d( Av1 ); - (void) vsip_mdestroy_d( Uv1 ); -#else - HPL_dgemm( HplColumnMajor, HplNoTrans, HplTrans, mp, nn, - jb, -HPL_rone, L2ptr, ldl2, Uptr, LDU, HPL_rone, - Mptr( Aptr, jb, 0, lda ), lda ); -#endif - HPL_dlatcpy( jb, nn, Uptr, LDU, Aptr, lda ); - } - else - { -#ifdef HPL_CALL_VSIPL -/* - * Create the matrix subviews - */ - Uv1 = vsip_msubview_d( Uv0, nq0, 0, nn, jb ); - Av1 = vsip_msubview_d( Av0, PANEL->ii, PANEL->jj+nq0, mp, nn ); - - vsip_gemp_d( -HPL_rone, Lv1, VSIP_MAT_NTRANS, Uv1, VSIP_MAT_TRANS, - HPL_rone, Av1 ); -/* - * Destroy the matrix subviews - */ - (void) vsip_mdestroy_d( Av1 ); - (void) vsip_mdestroy_d( Uv1 ); -#else - HPL_dgemm( HplColumnMajor, HplNoTrans, HplTrans, mp, nn, - jb, -HPL_rone, L2ptr, ldl2, Uptr, LDU, HPL_rone, - Aptr, lda ); -#endif - } - Uptr = Mptr( Uptr, nn, 0, LDU ); 
- Aptr = Mptr( Aptr, 0, nn, lda ); nq0 += nn; - - (void) HPL_bcast( PBCST, &test ); - } -/* - * The panel has been forwarded at that point, finish the update - */ - if( ( nn = n - nq0 ) > 0 ) - { - HPL_dtrsm( HplColumnMajor, HplRight, HplLower, HplTrans, - HplUnit, nn, jb, HPL_rone, L1ptr, jb, Uptr, LDU ); - - if( curr != 0 ) - { -#ifdef HPL_CALL_VSIPL -/* - * Create the matrix subviews - */ - Uv1 = vsip_msubview_d( Uv0, nq0, 0, nn, jb ); - Av1 = vsip_msubview_d( Av0, PANEL->ii+jb, PANEL->jj+nq0, mp, nn ); - - vsip_gemp_d( -HPL_rone, Lv1, VSIP_MAT_NTRANS, Uv1, VSIP_MAT_TRANS, - HPL_rone, Av1 ); -/* - * Destroy the matrix subviews - */ - (void) vsip_mdestroy_d( Av1 ); - (void) vsip_mdestroy_d( Uv1 ); -#else - HPL_dgemm( HplColumnMajor, HplNoTrans, HplTrans, mp, nn, - jb, -HPL_rone, L2ptr, ldl2, Uptr, LDU, HPL_rone, - Mptr( Aptr, jb, 0, lda ), lda ); -#endif - HPL_dlatcpy( jb, nn, Uptr, LDU, Aptr, lda ); - } - else - { -#ifdef HPL_CALL_VSIPL -/* - * Create the matrix subviews - */ - Uv1 = vsip_msubview_d( Uv0, nq0, 0, nn, jb ); - Av1 = vsip_msubview_d( Av0, PANEL->ii, PANEL->jj+nq0, mp, nn ); - - vsip_gemp_d( -HPL_rone, Lv1, VSIP_MAT_NTRANS, Uv1, VSIP_MAT_TRANS, - HPL_rone, Av1 ); -/* - * Destroy the matrix subviews - */ - (void) vsip_mdestroy_d( Av1 ); - (void) vsip_mdestroy_d( Uv1 ); -#else - HPL_dgemm( HplColumnMajor, HplNoTrans, HplTrans, mp, nn, - jb, -HPL_rone, L2ptr, ldl2, Uptr, LDU, HPL_rone, - Aptr, lda ); -#endif - } - } -#ifdef HPL_CALL_VSIPL -/* - * Destroy the matrix subviews - */ - (void) vsip_mdestroy_d( Lv1 ); -/* - * Release the blocks - */ - (void) vsip_blockrelease_d( vsip_mgetblock_d( Uv0 ), VSIP_TRUE ); - (void) vsip_blockrelease_d( vsip_mgetblock_d( Lv0 ), VSIP_TRUE ); - (void) vsip_blockrelease_d( vsip_mgetblock_d( Av0 ), VSIP_TRUE ); -/* - * Destroy the matrix views - */ - (void) vsip_mdestroy_d( Uv0 ); - (void) vsip_mdestroy_d( Lv0 ); - (void) vsip_mdestroy_d( Av0 ); -#endif - } - - PANEL->A = Mptr( PANEL->A, 0, n, lda ); PANEL->nq -= n; 
PANEL->jj += n; -/* - * return the outcome of the probe (should always be HPL_SUCCESS, the - * panel broadcast is enforced in that routine). - */ - if( PBCST != NULL ) *IFLAG = test; -#ifdef HPL_DETAILED_TIMING - HPL_ptimer( HPL_TIMING_UPDATE ); -#endif -/* - * End of HPL_pdupdateNT - */ -} diff --git a/hpl/src/pgesv/HPL_pdupdateTN.c b/hpl/src/pgesv/HPL_pdupdateTN.c deleted file mode 100644 index 324ffaa3119f14c083f6cfa6f1b7c2037f77f72e..0000000000000000000000000000000000000000 --- a/hpl/src/pgesv/HPL_pdupdateTN.c +++ /dev/null @@ -1,443 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. 
- * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -void HPL_pdupdateTN -( - HPL_T_panel * PBCST, - int * IFLAG, - HPL_T_panel * PANEL, - const int NN -) -#else -void HPL_pdupdateTN -( PBCST, IFLAG, PANEL, NN ) - HPL_T_panel * PBCST; - int * IFLAG; - HPL_T_panel * PANEL; - const int NN; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_pdupdateTN broadcast - forward the panel PBCST and simultaneously - * applies the row interchanges and updates part of the trailing (using - * the panel PANEL) submatrix. - * - * Arguments - * ========= - * - * PBCST (local input/output) HPL_T_panel * - * On entry, PBCST points to the data structure containing the - * panel (to be broadcast) information. - * - * IFLAG (local output) int * - * On exit, IFLAG indicates whether or not the broadcast has - * been completed when PBCST is not NULL on entry. In that case, - * IFLAG is left unchanged. - * - * PANEL (local input/output) HPL_T_panel * - * On entry, PANEL points to the data structure containing the - * panel (to be updated) information. 
- * - * NN (local input) const int - * On entry, NN specifies the local number of columns of the - * trailing submatrix to be updated starting at the current - * position. NN must be at least zero. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - double * Aptr, * L1ptr, * L2ptr, * Uptr, * dpiv; - int * ipiv; -#ifdef HPL_CALL_VSIPL - vsip_mview_d * Av0, * Av1, * Lv0, * Lv1, * Uv0, * Uv1; -#endif - int curr, i, iroff, jb, lda, ldl2, mp, n, nb, - nq0, nn, test; - static int tswap = 0; - static HPL_T_SWAP fswap = HPL_NO_SWP; -#define LDU jb -/* .. - * .. Executable Statements .. - */ -#ifdef HPL_DETAILED_TIMING - HPL_ptimer( HPL_TIMING_UPDATE ); -#endif - nb = PANEL->nb; jb = PANEL->jb; n = PANEL->nq; lda = PANEL->lda; - if( NN >= 0 ) n = Mmin( NN, n ); -/* - * There is nothing to update, enforce the panel broadcast. - */ - if( ( n <= 0 ) || ( jb <= 0 ) ) - { - if( PBCST != NULL ) - { - do { (void) HPL_bcast( PBCST, IFLAG ); } - while( *IFLAG != HPL_SUCCESS ); - } -#ifdef HPL_DETAILED_TIMING - HPL_ptimer( HPL_TIMING_UPDATE ); -#endif - return; - } -/* - * Enable/disable the column panel probing mechanism - */ - (void) HPL_bcast( PBCST, &test ); -/* - * 1 x Q case - */ - if( PANEL->grid->nprow == 1 ) - { - Aptr = PANEL->A; L2ptr = PANEL->L2; L1ptr = PANEL->L1; - ldl2 = PANEL->ldl2; dpiv = PANEL->DPIV; ipiv = PANEL->IWORK; - mp = PANEL->mp - jb; iroff = PANEL->ii; nq0 = 0; -#ifdef HPL_CALL_VSIPL -/* - * Admit the blocks - */ - (void) vsip_blockadmit_d( PANEL->Ablock, VSIP_TRUE ); - (void) vsip_blockadmit_d( PANEL->L2block, VSIP_TRUE ); -/* - * Create the matrix views - */ - Av0 = vsip_mbind_d( PANEL->Ablock, 0, 1, lda, lda, PANEL->pmat->nq ); - Lv0 = vsip_mbind_d( PANEL->L2block, 0, 1, ldl2, ldl2, jb ); -/* - * Create the matrix subviews - */ - Lv1 = vsip_msubview_d( Lv0, 0, 0, mp, jb ); -#endif - for( i = 0; i < jb; i++ ) { ipiv[i] = (int)(dpiv[i]) - iroff; } -/* - * So far we have not updated 
anything - test availability of the panel - * to be forwarded - If detected forward it and finish the update in one - * step. - */ - while ( test == HPL_KEEP_TESTING ) - { - nn = n - nq0; nn = Mmin( nb, nn ); -/* - * Update nb columns at a time - */ -#ifdef HPL_DETAILED_TIMING - HPL_ptimer( HPL_TIMING_LASWP ); - HPL_dlaswp00N( jb, nn, Aptr, lda, ipiv ); - HPL_ptimer( HPL_TIMING_LASWP ); -#else - HPL_dlaswp00N( jb, nn, Aptr, lda, ipiv ); -#endif - HPL_dtrsm( HplColumnMajor, HplLeft, HplUpper, HplTrans, - HplUnit, jb, nn, HPL_rone, L1ptr, jb, Aptr, lda ); -#ifdef HPL_CALL_VSIPL -/* - * Create the matrix subviews - */ - Uv1 = vsip_msubview_d( Av0, PANEL->ii, PANEL->jj+nq0, jb, nn ); - Av1 = vsip_msubview_d( Av0, PANEL->ii+jb, PANEL->jj+nq0, mp, nn ); - - vsip_gemp_d( -HPL_rone, Lv1, VSIP_MAT_NTRANS, Uv1, VSIP_MAT_NTRANS, - HPL_rone, Av1 ); -/* - * Destroy the matrix subviews - */ - (void) vsip_mdestroy_d( Av1 ); - (void) vsip_mdestroy_d( Uv1 ); -#else - HPL_dgemm( HplColumnMajor, HplNoTrans, HplNoTrans, mp, nn, - jb, -HPL_rone, L2ptr, ldl2, Aptr, lda, HPL_rone, - Mptr( Aptr, jb, 0, lda ), lda ); -#endif - Aptr = Mptr( Aptr, 0, nn, lda ); nq0 += nn; - - (void) HPL_bcast( PBCST, &test ); - } -/* - * The panel has been forwarded at that point, finish the update - */ - if( ( nn = n - nq0 ) > 0 ) - { -#ifdef HPL_DETAILED_TIMING - HPL_ptimer( HPL_TIMING_LASWP ); - HPL_dlaswp00N( jb, nn, Aptr, lda, ipiv ); - HPL_ptimer( HPL_TIMING_LASWP ); -#else - HPL_dlaswp00N( jb, nn, Aptr, lda, ipiv ); -#endif - HPL_dtrsm( HplColumnMajor, HplLeft, HplUpper, HplTrans, - HplUnit, jb, nn, HPL_rone, L1ptr, jb, Aptr, lda ); -#ifdef HPL_CALL_VSIPL -/* - * Create the matrix subviews - */ - Uv1 = vsip_msubview_d( Av0, PANEL->ii, PANEL->jj+nq0, jb, nn ); - Av1 = vsip_msubview_d( Av0, PANEL->ii+jb, PANEL->jj+nq0, mp, nn ); - - vsip_gemp_d( -HPL_rone, Lv1, VSIP_MAT_NTRANS, Uv1, VSIP_MAT_NTRANS, - HPL_rone, Av1 ); -/* - * Destroy the matrix subviews - */ - (void) vsip_mdestroy_d( Av1 ); - (void) 
vsip_mdestroy_d( Uv1 ); -#else - HPL_dgemm( HplColumnMajor, HplNoTrans, HplNoTrans, mp, nn, - jb, -HPL_rone, L2ptr, ldl2, Aptr, lda, HPL_rone, - Mptr( Aptr, jb, 0, lda ), lda ); -#endif - } -#ifdef HPL_CALL_VSIPL -/* - * Destroy the matrix subviews - */ - (void) vsip_mdestroy_d( Lv1 ); -/* - * Release the blocks - */ - (void) vsip_blockrelease_d( vsip_mgetblock_d( Lv0 ), VSIP_TRUE ); - (void) vsip_blockrelease_d( vsip_mgetblock_d( Av0 ), VSIP_TRUE ); -/* - * Destroy the matrix views - */ - (void) vsip_mdestroy_d( Lv0 ); - (void) vsip_mdestroy_d( Av0 ); -#endif - } - else /* nprow > 1 ... */ - { -/* - * Selection of the swapping algorithm - swap:broadcast U. - */ - if( fswap == HPL_NO_SWP ) - { fswap = PANEL->algo->fswap; tswap = PANEL->algo->fsthr; } - - if( ( fswap == HPL_SWAP01 ) || - ( ( fswap == HPL_SW_MIX ) && ( n > tswap ) ) ) - { HPL_pdlaswp01N( PBCST, &test, PANEL, n ); } - else - { HPL_pdlaswp00N( PBCST, &test, PANEL, n ); } -/* - * Compute redundantly row block of U and update trailing submatrix - */ - nq0 = 0; curr = ( PANEL->grid->myrow == PANEL->prow ? 1 : 0 ); - Aptr = PANEL->A; L2ptr = PANEL->L2; L1ptr = PANEL->L1; - Uptr = PANEL->U; ldl2 = PANEL->ldl2; - mp = PANEL->mp - ( curr != 0 ? 
jb : 0 ); -#ifdef HPL_CALL_VSIPL -/* - * Admit the blocks - */ - (void) vsip_blockadmit_d( PANEL->Ablock, VSIP_TRUE ); - (void) vsip_blockadmit_d( PANEL->L2block, VSIP_TRUE ); - (void) vsip_blockadmit_d( PANEL->Ublock, VSIP_TRUE ); -/* - * Create the matrix views - */ - Av0 = vsip_mbind_d( PANEL->Ablock, 0, 1, lda, lda, PANEL->pmat->nq ); - Lv0 = vsip_mbind_d( PANEL->L2block, 0, 1, ldl2, ldl2, jb ); - Uv0 = vsip_mbind_d( PANEL->Ublock, 0, 1, LDU, LDU, n ); -/* - * Create the matrix subviews - */ - Lv1 = vsip_msubview_d( Lv0, 0, 0, mp, jb ); -#endif -/* - * Broadcast has not occured yet, spliting the computational part - */ - while ( test == HPL_KEEP_TESTING ) - { - nn = n - nq0; nn = Mmin( nb, nn ); - - HPL_dtrsm( HplColumnMajor, HplLeft, HplUpper, HplTrans, - HplUnit, jb, nn, HPL_rone, L1ptr, jb, Uptr, LDU ); - - if( curr != 0 ) - { -#ifdef HPL_CALL_VSIPL -/* - * Create the matrix subviews - */ - Uv1 = vsip_msubview_d( Uv0, 0, nq0, jb, nn ); - Av1 = vsip_msubview_d( Av0, PANEL->ii+jb, PANEL->jj+nq0, mp, nn ); - - vsip_gemp_d( -HPL_rone, Lv1, VSIP_MAT_NTRANS, Uv1, VSIP_MAT_NTRANS, - HPL_rone, Av1 ); -/* - * Destroy the matrix subviews - */ - (void) vsip_mdestroy_d( Av1 ); - (void) vsip_mdestroy_d( Uv1 ); -#else - HPL_dgemm( HplColumnMajor, HplNoTrans, HplNoTrans, mp, nn, - jb, -HPL_rone, L2ptr, ldl2, Uptr, LDU, HPL_rone, - Mptr( Aptr, jb, 0, lda ), lda ); -#endif - HPL_dlacpy( jb, nn, Uptr, LDU, Aptr, lda ); - } - else - { -#ifdef HPL_CALL_VSIPL -/* - * Create the matrix subviews - */ - Uv1 = vsip_msubview_d( Uv0, 0, nq0, jb, nn ); - Av1 = vsip_msubview_d( Av0, PANEL->ii, PANEL->jj+nq0, mp, nn ); - - vsip_gemp_d( -HPL_rone, Lv1, VSIP_MAT_NTRANS, Uv1, VSIP_MAT_NTRANS, - HPL_rone, Av1 ); -/* - * Destroy the matrix subviews - */ - (void) vsip_mdestroy_d( Av1 ); - (void) vsip_mdestroy_d( Uv1 ); -#else - HPL_dgemm( HplColumnMajor, HplNoTrans, HplNoTrans, mp, nn, - jb, -HPL_rone, L2ptr, ldl2, Uptr, LDU, HPL_rone, - Aptr, lda ); -#endif - } - Uptr = Mptr( Uptr, 0, nn, LDU 
); - Aptr = Mptr( Aptr, 0, nn, lda ); nq0 += nn; - - (void) HPL_bcast( PBCST, &test ); - } -/* - * The panel has been forwarded at that point, finish the update - */ - if( ( nn = n - nq0 ) > 0 ) - { - HPL_dtrsm( HplColumnMajor, HplLeft, HplUpper, HplTrans, - HplUnit, jb, nn, HPL_rone, L1ptr, jb, Uptr, LDU ); - - if( curr != 0 ) - { -#ifdef HPL_CALL_VSIPL -/* - * Create the matrix subviews - */ - Uv1 = vsip_msubview_d( Uv0, 0, nq0, jb, nn ); - Av1 = vsip_msubview_d( Av0, PANEL->ii+jb, PANEL->jj+nq0, mp, nn ); - - vsip_gemp_d( -HPL_rone, Lv1, VSIP_MAT_NTRANS, Uv1, VSIP_MAT_NTRANS, - HPL_rone, Av1 ); -/* - * Destroy the matrix subviews - */ - (void) vsip_mdestroy_d( Av1 ); - (void) vsip_mdestroy_d( Uv1 ); -#else - HPL_dgemm( HplColumnMajor, HplNoTrans, HplNoTrans, mp, nn, - jb, -HPL_rone, L2ptr, ldl2, Uptr, LDU, HPL_rone, - Mptr( Aptr, jb, 0, lda ), lda ); -#endif - HPL_dlacpy( jb, nn, Uptr, LDU, Aptr, lda ); - } - else - { -#ifdef HPL_CALL_VSIPL -/* - * Create the matrix subviews - */ - Uv1 = vsip_msubview_d( Uv0, 0, nq0, jb, nn ); - Av1 = vsip_msubview_d( Av0, PANEL->ii, PANEL->jj+nq0, mp, nn ); - - vsip_gemp_d( -HPL_rone, Lv1, VSIP_MAT_NTRANS, Uv1, VSIP_MAT_NTRANS, - HPL_rone, Av1 ); -/* - * Destroy the matrix subviews - */ - (void) vsip_mdestroy_d( Av1 ); - (void) vsip_mdestroy_d( Uv1 ); -#else - HPL_dgemm( HplColumnMajor, HplNoTrans, HplNoTrans, mp, nn, - jb, -HPL_rone, L2ptr, ldl2, Uptr, LDU, HPL_rone, - Aptr, lda ); -#endif - } - } -#ifdef HPL_CALL_VSIPL -/* - * Destroy the matrix subviews - */ - (void) vsip_mdestroy_d( Lv1 ); -/* - * Release the blocks - */ - (void) vsip_blockrelease_d( vsip_mgetblock_d( Uv0 ), VSIP_TRUE ); - (void) vsip_blockrelease_d( vsip_mgetblock_d( Lv0 ), VSIP_TRUE ); - (void) vsip_blockrelease_d( vsip_mgetblock_d( Av0 ), VSIP_TRUE ); -/* - * Destroy the matrix views - */ - (void) vsip_mdestroy_d( Uv0 ); - (void) vsip_mdestroy_d( Lv0 ); - (void) vsip_mdestroy_d( Av0 ); -#endif - } - - PANEL->A = Mptr( PANEL->A, 0, n, lda ); PANEL->nq -= 
n; PANEL->jj += n; -/* - * return the outcome of the probe (should always be HPL_SUCCESS, the - * panel broadcast is enforced in that routine). - */ - if( PBCST != NULL ) *IFLAG = test; -#ifdef HPL_DETAILED_TIMING - HPL_ptimer( HPL_TIMING_UPDATE ); -#endif -/* - * End of HPL_pdupdateTN - */ -} diff --git a/hpl/src/pgesv/HPL_pdupdateTT.c b/hpl/src/pgesv/HPL_pdupdateTT.c deleted file mode 100644 index 19a620f13c7e87f96bef48db6035eda278c15107..0000000000000000000000000000000000000000 --- a/hpl/src/pgesv/HPL_pdupdateTT.c +++ /dev/null @@ -1,443 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. 
- * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -void HPL_pdupdateTT -( - HPL_T_panel * PBCST, - int * IFLAG, - HPL_T_panel * PANEL, - const int NN -) -#else -void HPL_pdupdateTT -( PBCST, IFLAG, PANEL, NN ) - HPL_T_panel * PBCST; - int * IFLAG; - HPL_T_panel * PANEL; - const int NN; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_pdupdateTT broadcast - forward the panel PBCST and simultaneously - * applies the row interchanges and updates part of the trailing (using - * the panel PANEL) submatrix. - * - * Arguments - * ========= - * - * PBCST (local input/output) HPL_T_panel * - * On entry, PBCST points to the data structure containing the - * panel (to be broadcast) information. - * - * IFLAG (local output) int * - * On exit, IFLAG indicates whether or not the broadcast has - * been completed when PBCST is not NULL on entry. In that case, - * IFLAG is left unchanged. - * - * PANEL (local input/output) HPL_T_panel * - * On entry, PANEL points to the data structure containing the - * panel (to be updated) information. 
- * - * NN (local input) const int - * On entry, NN specifies the local number of columns of the - * trailing submatrix to be updated starting at the current - * position. NN must be at least zero. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - double * Aptr, * L1ptr, * L2ptr, * Uptr, * dpiv; - int * ipiv; -#ifdef HPL_CALL_VSIPL - vsip_mview_d * Av0, * Av1, * Lv0, * Lv1, * Uv0, * Uv1; -#endif - int curr, i, iroff, jb, lda, ldl2, mp, n, nb, - nq0, nn, test; - static int tswap = 0; - static HPL_T_SWAP fswap = HPL_NO_SWP; -#define LDU n -/* .. - * .. Executable Statements .. - */ -#ifdef HPL_DETAILED_TIMING - HPL_ptimer( HPL_TIMING_UPDATE ); -#endif - nb = PANEL->nb; jb = PANEL->jb; n = PANEL->nq; lda = PANEL->lda; - if( NN >= 0 ) n = Mmin( NN, n ); -/* - * There is nothing to update, enforce the panel broadcast. - */ - if( ( n <= 0 ) || ( jb <= 0 ) ) - { - if( PBCST != NULL ) - { - do { (void) HPL_bcast( PBCST, IFLAG ); } - while( *IFLAG != HPL_SUCCESS ); - } -#ifdef HPL_DETAILED_TIMING - HPL_ptimer( HPL_TIMING_UPDATE ); -#endif - return; - } -/* - * Enable/disable the column panel probing mechanism - */ - (void) HPL_bcast( PBCST, &test ); -/* - * 1 x Q case - */ - if( PANEL->grid->nprow == 1 ) - { - Aptr = PANEL->A; L2ptr = PANEL->L2; L1ptr = PANEL->L1; - ldl2 = PANEL->ldl2; dpiv = PANEL->DPIV; ipiv = PANEL->IWORK; - mp = PANEL->mp - jb; iroff = PANEL->ii; nq0 = 0; -#ifdef HPL_CALL_VSIPL -/* - * Admit the blocks - */ - (void) vsip_blockadmit_d( PANEL->Ablock, VSIP_TRUE ); - (void) vsip_blockadmit_d( PANEL->L2block, VSIP_TRUE ); -/* - * Create the matrix views - */ - Av0 = vsip_mbind_d( PANEL->Ablock, 0, 1, lda, lda, PANEL->pmat->nq ); - Lv0 = vsip_mbind_d( PANEL->L2block, 0, 1, ldl2, ldl2, jb ); -/* - * Create the matrix subviews - */ - Lv1 = vsip_msubview_d( Lv0, 0, 0, mp, jb ); -#endif - for( i = 0; i < jb; i++ ) { ipiv[i] = (int)(dpiv[i]) - iroff; } -/* - * So far we have not updated 
anything - test availability of the panel - * to be forwarded - If detected forward it and finish the update in one - * step. - */ - while ( test == HPL_KEEP_TESTING ) - { - nn = n - nq0; nn = Mmin( nb, nn ); -/* - * Update nb columns at a time - */ -#ifdef HPL_DETAILED_TIMING - HPL_ptimer( HPL_TIMING_LASWP ); - HPL_dlaswp00N( jb, nn, Aptr, lda, ipiv ); - HPL_ptimer( HPL_TIMING_LASWP ); -#else - HPL_dlaswp00N( jb, nn, Aptr, lda, ipiv ); -#endif - HPL_dtrsm( HplColumnMajor, HplLeft, HplUpper, HplTrans, - HplUnit, jb, nn, HPL_rone, L1ptr, jb, Aptr, lda ); -#ifdef HPL_CALL_VSIPL -/* - * Create the matrix subviews - */ - Uv1 = vsip_msubview_d( Av0, PANEL->ii, PANEL->jj+nq0, jb, nn ); - Av1 = vsip_msubview_d( Av0, PANEL->ii+jb, PANEL->jj+nq0, mp, nn ); - - vsip_gemp_d( -HPL_rone, Lv1, VSIP_MAT_NTRANS, Uv1, VSIP_MAT_NTRANS, - HPL_rone, Av1 ); -/* - * Destroy the matrix subviews - */ - (void) vsip_mdestroy_d( Av1 ); - (void) vsip_mdestroy_d( Uv1 ); -#else - HPL_dgemm( HplColumnMajor, HplNoTrans, HplNoTrans, mp, nn, - jb, -HPL_rone, L2ptr, ldl2, Aptr, lda, HPL_rone, - Mptr( Aptr, jb, 0, lda ), lda ); -#endif - Aptr = Mptr( Aptr, 0, nn, lda ); nq0 += nn; - - (void) HPL_bcast( PBCST, &test ); - } -/* - * The panel has been forwarded at that point, finish the update - */ - if( ( nn = n - nq0 ) > 0 ) - { -#ifdef HPL_DETAILED_TIMING - HPL_ptimer( HPL_TIMING_LASWP ); - HPL_dlaswp00N( jb, nn, Aptr, lda, ipiv ); - HPL_ptimer( HPL_TIMING_LASWP ); -#else - HPL_dlaswp00N( jb, nn, Aptr, lda, ipiv ); -#endif - HPL_dtrsm( HplColumnMajor, HplLeft, HplUpper, HplTrans, - HplUnit, jb, nn, HPL_rone, L1ptr, jb, Aptr, lda ); -#ifdef HPL_CALL_VSIPL -/* - * Create the matrix subviews - */ - Uv1 = vsip_msubview_d( Av0, PANEL->ii, PANEL->jj+nq0, jb, nn ); - Av1 = vsip_msubview_d( Av0, PANEL->ii+jb, PANEL->jj+nq0, mp, nn ); - - vsip_gemp_d( -HPL_rone, Lv1, VSIP_MAT_NTRANS, Uv1, VSIP_MAT_NTRANS, - HPL_rone, Av1 ); -/* - * Destroy the matrix subviews - */ - (void) vsip_mdestroy_d( Av1 ); - (void) 
vsip_mdestroy_d( Uv1 ); -#else - HPL_dgemm( HplColumnMajor, HplNoTrans, HplNoTrans, mp, nn, - jb, -HPL_rone, L2ptr, ldl2, Aptr, lda, HPL_rone, - Mptr( Aptr, jb, 0, lda ), lda ); -#endif - } -#ifdef HPL_CALL_VSIPL -/* - * Destroy the matrix subviews - */ - (void) vsip_mdestroy_d( Lv1 ); -/* - * Release the blocks - */ - (void) vsip_blockrelease_d( vsip_mgetblock_d( Lv0 ), VSIP_TRUE ); - (void) vsip_blockrelease_d( vsip_mgetblock_d( Av0 ), VSIP_TRUE ); -/* - * Destroy the matrix views - */ - (void) vsip_mdestroy_d( Lv0 ); - (void) vsip_mdestroy_d( Av0 ); -#endif - } - else /* nprow > 1 ... */ - { -/* - * Selection of the swapping algorithm - swap:broadcast U. - */ - if( fswap == HPL_NO_SWP ) - { fswap = PANEL->algo->fswap; tswap = PANEL->algo->fsthr; } - - if( ( fswap == HPL_SWAP01 ) || - ( ( fswap == HPL_SW_MIX ) && ( n > tswap ) ) ) - { HPL_pdlaswp01T( PBCST, &test, PANEL, n ); } - else - { HPL_pdlaswp00T( PBCST, &test, PANEL, n ); } -/* - * Compute redundantly row block of U and update trailing submatrix - */ - nq0 = 0; curr = ( PANEL->grid->myrow == PANEL->prow ? 1 : 0 ); - Aptr = PANEL->A; L2ptr = PANEL->L2; L1ptr = PANEL->L1; - Uptr = PANEL->U; ldl2 = PANEL->ldl2; - mp = PANEL->mp - ( curr != 0 ? 
jb : 0 ); -#ifdef HPL_CALL_VSIPL -/* - * Admit the blocks - */ - (void) vsip_blockadmit_d( PANEL->Ablock, VSIP_TRUE ); - (void) vsip_blockadmit_d( PANEL->L2block, VSIP_TRUE ); - (void) vsip_blockadmit_d( PANEL->Ublock, VSIP_TRUE ); -/* - * Create the matrix views - */ - Av0 = vsip_mbind_d( PANEL->Ablock, 0, 1, lda, lda, PANEL->pmat->nq ); - Lv0 = vsip_mbind_d( PANEL->L2block, 0, 1, ldl2, ldl2, jb ); - Uv0 = vsip_mbind_d( PANEL->Ublock, 0, 1, LDU, LDU, jb ); -/* - * Create the matrix subviews - */ - Lv1 = vsip_msubview_d( Lv0, 0, 0, mp, jb ); -#endif -/* - * Broadcast has not occured yet, spliting the computational part - */ - while ( test == HPL_KEEP_TESTING ) - { - nn = n - nq0; nn = Mmin( nb, nn ); - - HPL_dtrsm( HplColumnMajor, HplRight, HplUpper, HplNoTrans, - HplUnit, nn, jb, HPL_rone, L1ptr, jb, Uptr, LDU ); - - if( curr != 0 ) - { -#ifdef HPL_CALL_VSIPL -/* - * Create the matrix subviews - */ - Uv1 = vsip_msubview_d( Uv0, nq0, 0, nn, jb ); - Av1 = vsip_msubview_d( Av0, PANEL->ii+jb, PANEL->jj+nq0, mp, nn ); - - vsip_gemp_d( -HPL_rone, Lv1, VSIP_MAT_NTRANS, Uv1, VSIP_MAT_TRANS, - HPL_rone, Av1 ); -/* - * Destroy the matrix subviews - */ - (void) vsip_mdestroy_d( Av1 ); - (void) vsip_mdestroy_d( Uv1 ); -#else - HPL_dgemm( HplColumnMajor, HplNoTrans, HplTrans, mp, nn, - jb, -HPL_rone, L2ptr, ldl2, Uptr, LDU, HPL_rone, - Mptr( Aptr, jb, 0, lda ), lda ); -#endif - HPL_dlatcpy( jb, nn, Uptr, LDU, Aptr, lda ); - } - else - { -#ifdef HPL_CALL_VSIPL -/* - * Create the matrix subviews - */ - Uv1 = vsip_msubview_d( Uv0, nq0, 0, nn, jb ); - Av1 = vsip_msubview_d( Av0, PANEL->ii, PANEL->jj+nq0, mp, nn ); - - vsip_gemp_d( -HPL_rone, Lv1, VSIP_MAT_NTRANS, Uv1, VSIP_MAT_TRANS, - HPL_rone, Av1 ); -/* - * Destroy the matrix subviews - */ - (void) vsip_mdestroy_d( Av1 ); - (void) vsip_mdestroy_d( Uv1 ); -#else - HPL_dgemm( HplColumnMajor, HplNoTrans, HplTrans, mp, nn, - jb, -HPL_rone, L2ptr, ldl2, Uptr, LDU, HPL_rone, - Aptr, lda ); -#endif - } - Uptr = Mptr( Uptr, nn, 0, LDU 
); - Aptr = Mptr( Aptr, 0, nn, lda ); nq0 += nn; - - (void) HPL_bcast( PBCST, &test ); - } -/* - * The panel has been forwarded at that point, finish the update - */ - if( ( nn = n - nq0 ) > 0 ) - { - HPL_dtrsm( HplColumnMajor, HplRight, HplUpper, HplNoTrans, - HplUnit, nn, jb, HPL_rone, L1ptr, jb, Uptr, LDU ); - - if( curr != 0 ) - { -#ifdef HPL_CALL_VSIPL -/* - * Create the matrix subviews - */ - Uv1 = vsip_msubview_d( Uv0, nq0, 0, nn, jb ); - Av1 = vsip_msubview_d( Av0, PANEL->ii+jb, PANEL->jj+nq0, mp, nn ); - - vsip_gemp_d( -HPL_rone, Lv1, VSIP_MAT_NTRANS, Uv1, VSIP_MAT_TRANS, - HPL_rone, Av1 ); -/* - * Destroy the matrix subviews - */ - (void) vsip_mdestroy_d( Av1 ); - (void) vsip_mdestroy_d( Uv1 ); -#else - HPL_dgemm( HplColumnMajor, HplNoTrans, HplTrans, mp, nn, - jb, -HPL_rone, L2ptr, ldl2, Uptr, LDU, HPL_rone, - Mptr( Aptr, jb, 0, lda ), lda ); -#endif - HPL_dlatcpy( jb, nn, Uptr, LDU, Aptr, lda ); - } - else - { -#ifdef HPL_CALL_VSIPL -/* - * Create the matrix subviews - */ - Uv1 = vsip_msubview_d( Uv0, nq0, 0, nn, jb ); - Av1 = vsip_msubview_d( Av0, PANEL->ii, PANEL->jj+nq0, mp, nn ); - - vsip_gemp_d( -HPL_rone, Lv1, VSIP_MAT_NTRANS, Uv1, VSIP_MAT_TRANS, - HPL_rone, Av1 ); -/* - * Destroy the matrix subviews - */ - (void) vsip_mdestroy_d( Av1 ); - (void) vsip_mdestroy_d( Uv1 ); -#else - HPL_dgemm( HplColumnMajor, HplNoTrans, HplTrans, mp, nn, - jb, -HPL_rone, L2ptr, ldl2, Uptr, LDU, HPL_rone, - Aptr, lda ); -#endif - } - } -#ifdef HPL_CALL_VSIPL -/* - * Destroy the matrix subviews - */ - (void) vsip_mdestroy_d( Lv1 ); -/* - * Release the blocks - */ - (void) vsip_blockrelease_d( vsip_mgetblock_d( Uv0 ), VSIP_TRUE ); - (void) vsip_blockrelease_d( vsip_mgetblock_d( Lv0 ), VSIP_TRUE ); - (void) vsip_blockrelease_d( vsip_mgetblock_d( Av0 ), VSIP_TRUE ); -/* - * Destroy the matrix views - */ - (void) vsip_mdestroy_d( Uv0 ); - (void) vsip_mdestroy_d( Lv0 ); - (void) vsip_mdestroy_d( Av0 ); -#endif - } - - PANEL->A = Mptr( PANEL->A, 0, n, lda ); PANEL->nq -= n; 
PANEL->jj += n; -/* - * return the outcome of the probe (should always be HPL_SUCCESS, the - * panel broadcast is enforced in that routine). - */ - if( PBCST != NULL ) *IFLAG = test; -#ifdef HPL_DETAILED_TIMING - HPL_ptimer( HPL_TIMING_UPDATE ); -#endif -/* - * End of HPL_pdupdateTT - */ -} diff --git a/hpl/src/pgesv/HPL_perm.c b/hpl/src/pgesv/HPL_perm.c deleted file mode 100644 index 597eb6f47ce74e97c80a5dcb26dd79c1a36cd47d..0000000000000000000000000000000000000000 --- a/hpl/src/pgesv/HPL_perm.c +++ /dev/null @@ -1,131 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. 
- * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -void HPL_perm -( - const int N, - int * LINDXA, - int * LINDXAU, - int * IWORK -) -#else -void HPL_perm -( N, LINDXA, LINDXAU, IWORK ) - const int N; - int * LINDXA; - int * LINDXAU; - int * IWORK; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_perm combines two index arrays and generate the corresponding - * permutation. First, this function computes the inverse of LINDXA, and - * then combine it with LINDXAU. Second, in order to be able to perform - * the permutation in place, LINDXAU is overwritten by the sequence of - * permutation producing the same result. What we ultimately want to - * achieve is: U[LINDXAU[i]] := U[LINDXA[i]] for i in [0..N). After the - * call to this function, this in place permutation can be performed by - * for i in [0..N) swap U[i] with U[LINDXAU[i]]. - * - * Arguments - * ========= - * - * N (global input) const int - * On entry, N specifies the length of the arrays LINDXA and - * LINDXAU. N should be at least zero. 
- * - * LINDXA (global input/output) int * - * On entry, LINDXA is an array of dimension N containing the - * source indexes. On exit, LINDXA contains the combined index - * array. - * - * LINDXAU (global input/output) int * - * On entry, LINDXAU is an array of dimension N containing the - * target indexes. On exit, LINDXAU contains the sequence of - * permutation, that should be applied in increasing order to - * permute the underlying array U in place. - * - * IWORK (workspace) int * - * On entry, IWORK is a workarray of dimension N. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - int i, j, k, fndd; -/* .. - * .. Executable Statements .. - */ -/* - * Inverse LINDXA - combine LINDXA and LINDXAU - Initialize IWORK - */ - for( i = 0; i < N; i++ ) { IWORK[LINDXA[i]] = i; } - for( i = 0; i < N; i++ ) { LINDXA[i] = LINDXAU[IWORK[i]]; IWORK[i] = i; } - - for( i = 0; i < N; i++ ) - { - /* search LINDXA such that LINDXA[j] == i */ - j = 0; do { fndd = ( LINDXA[j] == i ); j++; } while( !fndd ); j--; - /* search IWORK such that IWORK[k] == j */ - k = 0; do { fndd = ( IWORK[k] == j ); k++; } while( !fndd ); k--; - /* swap IWORK[i] and IWORK[k]; LINDXAU[i] = k */ - j = IWORK[i]; IWORK[i] = IWORK[k]; IWORK[k] = j; - LINDXAU[i] = k; - } -/* - * End of HPL_perm - */ -} diff --git a/hpl/src/pgesv/HPL_pipid.c b/hpl/src/pgesv/HPL_pipid.c deleted file mode 100644 index b1577ada0cabb3e79f21d6b5c597330d3aad164f..0000000000000000000000000000000000000000 --- a/hpl/src/pgesv/HPL_pipid.c +++ /dev/null @@ -1,187 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. 
Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -void HPL_pipid -( - HPL_T_panel * PANEL, - int * K, - int * IPID -) -#else -void HPL_pipid -( PANEL, K, IPID ) - HPL_T_panel * PANEL; - int * K; - int * IPID; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_pipid computes an array IPID that contains the source and final - * destination of matrix rows resulting from the application of N - * interchanges as computed by the LU factorization with row partial - * pivoting. The array IPID is such that the row of global index IPID(i) - * should be mapped onto the row of global index IPID(i+1). Note that we - * cannot really know the length of IPID a priori. However, we know that - * this array is at least 2*N long, since there are N rows to swap and - * broadcast. The length of this array must be smaller than or equal to - * 4*N, since every row is swapped with at most a single distinct remote - * row. The algorithm constructing IPID goes as follows: Let IA be the - * global index of the first row to be swapped. - * - * For every row src IA + i with i in [0..N) to be swapped with row dst - * such that dst is given by DPIV[i]: - * - * Is row src the destination of a previous row of the current block, - * that is, is there k odd such that IPID(k) is equal to src ? - * Yes: update this destination with dst. 
For example, if the - * pivot array is (0,2)(1,1)(2,5) ... , then when we swap rows 2 and 5, - * we swap in fact row 0 and 5, i.e., row 0 goes to 5 and not 2 as it - * was thought so far ... - * No : add the pair (src,dst) at the end of IPID; row src has not - * been moved yet. - * - * Is row dst different from src the destination of a previous row of - * the current block, i.e., is there k odd such that IPID(k) is equal to - * dst ? - * Yes: update IPID(k) with src. For example, if the pivot array - * is (0,5)(1,1)(2,5) ... , then when we swap rows 2 and 5, we swap in - * fact row 2 and 0, i.e., row 0 goes to 2 and not 5 as it was thought - * so far ... - * No : add the pair (dst,src) at the end of IPID; row dst has not - * been moved yet. - * - * Note that when src is equal to dst, the pair (dst,src) should not be - * added to IPID in order to avoid duplicated entries in this array. - * During the construction of the array IPID, we make sure that the - * first N entries are such that IPID(k) with k odd is equal to IA+k/2. - * For k in [0..K/2), the row of global index IPID(2*k) should be - * mapped onto the row of global index IPID(2*k+1). - * - * Arguments - * ========= - * - * PANEL (local input/output) HPL_T_panel * - * On entry, PANEL points to the data structure containing the - * panel information. - * - * K (global output) int * - * On exit, K specifies the number of entries in IPID. K is at - * least 2*N, and at most 4*N. - * - * IPID (global output) int * - * On entry, IPID is an array of length 4*N. On exit, the first - * K entries of that array contain the src and final destination - * resulting from the application of the N interchanges as - * specified by DPIV. The pairs (src,dst) are contiguously - * stored and sorted so that IPID(2*i+1) is equal to IA+i with i - * in [0..N) - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. 
- */ - int dst, fndd, fnds, ia, i, j, jb, lst, off, - src; - double * dpiv; -/* .. - * .. Executable Statements .. - */ - dpiv = PANEL->DPIV; jb = PANEL->jb; src = ia = PANEL->ia; - dst = (int)(dpiv[0]); IPID[0] = dst; IPID[1] = src; *K = 2; - if( src != dst ) { IPID[2] = src; IPID[3] = dst; *K += 2; } - - for( i = 1; i < jb; i++ ) - { - fnds = 0; j = 1; - - if( ( src = ia + i ) == ( dst = (int)(dpiv[i]) ) ) - { - do { if( src == IPID[j] ) { fnds = j; } else { j += 2; } } - while( !( fnds ) && ( j < *K ) ); - if( !fnds ) { lst = *K; off = 2; IPID[lst] = src; } - else { lst = fnds-1; off = 0; } - IPID[lst+1] = dst; - } - else - { - fndd = 0; - do - { - if ( src == IPID[j] ) { fnds = j; } - else if( dst == IPID[j] ) { fndd = j; } - j += 2; - } - while( ( !( fnds ) || !( fndd ) ) && ( j < *K ) ); - if( !fnds ) { IPID[*K] = src; IPID[*K+1] = dst; off = 2; } - else { IPID[fnds] = dst; off = 0; } - if( !fndd ) { lst = *K+off; IPID[lst ] = dst; off += 2; } - else { lst = fndd-1; } - IPID[lst+1] = src; - } -/* - * Enforce IPID(1,i) equal to src = ia + i - */ - if( lst != ( j = ( i << 1 ) ) ) - { - src = IPID[j ]; IPID[j ] = IPID[lst ]; IPID[lst ] = src; - dst = IPID[j+1]; IPID[j+1] = IPID[lst+1]; IPID[lst+1] = dst; - } - *K += off; - } -/* - * End of HPL_pipid - */ -} diff --git a/hpl/src/pgesv/HPL_plindx0.c b/hpl/src/pgesv/HPL_plindx0.c deleted file mode 100644 index 22ba6a9b205bdf18fac62cdb0791874e006954b2..0000000000000000000000000000000000000000 --- a/hpl/src/pgesv/HPL_plindx0.c +++ /dev/null @@ -1,281 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. 
Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
- * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -void HPL_plindx0 -( - HPL_T_panel * PANEL, - const int K, - int * IPID, - int * LINDXA, - int * LINDXAU, - int * LLEN -) -#else -void HPL_plindx0 -( PANEL, K, IPID, LINDXA, LINDXAU, LLEN ) - HPL_T_panel * PANEL; - const int K; - int * IPID; - int * LINDXA; - int * LINDXAU; - int * LLEN; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_plindx0 computes two local arrays LINDXA and LINDXAU containing - * the local source and final destination position resulting from the - * application of row interchanges. - * - * On entry, the array IPID of length K is such that the row of global - * index IPID(i) should be mapped onto row of global index IPID(i+1). - * Let IA be the global index of the first row to be swapped. For k in - * [0..K/2), the row of global index IPID(2*k) should be mapped onto the - * row of global index IPID(2*k+1). The question then, is to determine - * which rows should ultimately be part of U. - * - * First, some rows of the process ICURROW may be swapped locally. One - * of these rows belongs to U, the other one belongs to my local piece of - * A. The other rows of the current block are swapped with remote rows - * and are thus not part of U. These rows however should be sent along, - * and grabbed by the other processes as we progress in the exchange - * phase. - * - * So, assume that I am ICURROW and consider a row of index IPID(2*i) - * that I own. If I own IPID(2*i+1) as well and IPID(2*i+1) - IA is less - * than N, this row is locally swapped and should be copied into U at - * the position IPID(2*i+1) - IA. No row will be exchanged for this one. - * If IPID(2*i+1)-IA is at least N, then the row IPID(2*i) should be - * locally copied into my local piece of A at the position corresponding - * to the row of global index IPID(2*i+1).
- * - * If the process ICURROW does not own IPID(2*i+1), then row IPID(2*i) - * is to be swapped away and strictly speaking does not belong to U, but - * to A remotely. Since this process will however send this array U, - * this row is copied into U, exactly where the row IPID(2*i+1) should - * go. For this, we search IPID for k1, such that IPID(2*k1) is equal to - * IPID(2*i+1); and row IPID(2*i) is to be copied in U at the position - * IPID(2*k1+1)-IA. - * - * It is thus important to put the rows that go into U, i.e., such that - * IPID(2*i+1) - IA is less than N at the beginning of the array IPID. By - * doing so, U is formed, and the local copy is performed in just one - * sweep. - * - * Two lists LINDXA and LINDXAU are built. LINDXA contains the local - * index of the rows I have that should be copied. LINDXAU contains the - * local destination information: if LINDXAU(k) >= 0, row LINDXA(k) of A - * is to be copied in U at position LINDXAU(k). Otherwise, row LINDXA(k) - * of A should be locally copied into A(-LINDXAU(k),:). In the process - * ICURROW, the initial packing algorithm proceeds as follows.
- * - * for all entries in IPID, - * if IPID(2*i) is in ICURROW, - * if IPID(2*i+1) is in ICURROW, - * if( IPID(2*i+1) - IA < N ) - * save corresponding local position - * of this row (LINDXA); - * save local position (LINDXAU) in U - * where this row goes; - * [copy row IPID(2*i) in U at position - * IPID(2*i+1)-IA; ]; - * else - * save corresponding local position of - * this row (LINDXA); - * save local position (-LINDXAU) in A - * where this row goes; - * [copy row IPID(2*i) in my piece of A - * at IPID(2*i+1);] - * end if - * else - * find k1 such that IPID(2*k1) = IPID(2*i+1); - * copy row IPID(2*i) in U at position - * IPID(2*k1+1)-IA; - * save corresponding local position of this - * row (LINDXA); - * save local position (LINDXAU) in U where - * this row goes; - * end if - * end if - * end for - * - * Second, if I am not the current row process ICURROW, all source rows - * in IPID that I own are part of U. Indeed, they are swapped with one - * row of the current block of rows, and the main factorization - * algorithm proceeds one row after each other. The processes different - * from ICURROW, should exchange and accumulate those rows until they - * receive some data previously owned by the process ICURROW. - * - * In processes different from ICURROW, the initial packing algorithm - * proceeds as follows. Consider a row of global index IPID(2*i) that I - * own. When I will be receiving data previously owned by ICURROW, i.e., - * U, row IPID(2*i) should replace the row in U at pos. IPID(2*i+1)-IA, - * and this particular row of U should be first copied into my piece of - * A, at A(il,:), where il is the local row index corresponding to - * IPID(2*i). Now,initially, this row will be packed into workspace, say - * as the kth row of that work array. The following algorithm sets - * LINDXAU[k] to IPID(2*i+1)-IA, that is the position in U where the row - * should be copied. LINDXA(k) stores the local index in A where this - * row of U should be copied, i.e il. 
- * - * for all entries in IPID, - * if IPID(2*i) is not in ICURROW, - * copy row IPID(2*i) in work array; - * save corresponding local position - * of this row (LINDXA); - * save position (LINDXAU) in U where - * this row should be copied; - * end if - * end for - * - * Since we are at it, we also globally figure out how many rows every - * process has. That is necessary, because it would be rather cumbersome - * to figure it out on the fly during the bi-directional exchange phase. - * This information is kept in the array LLEN of size NPROW. Also note - * that the arrays LINDXA and LINDXAU are of max length equal to 2*N. - * - * Arguments - * ========= - * - * PANEL (local input/output) HPL_T_panel * - * On entry, PANEL points to the data structure containing the - * panel information. - * - * K (global input) const int - * On entry, K specifies the number of entries in IPID. K is at - * least 2*N, and at most 4*N. - * - * IPID (global input) int * - * On entry, IPID is an array of length K. The first K entries - * of that array contain the src and final destination resulting - * from the application of the interchanges. - * - * LINDXA (local output) int * - * On entry, LINDXA is an array of dimension 2*N. On exit, this - * array contains the local indexes of the rows of A I have that - * should be copied into U. - * - * LINDXAU (local output) int * - * On entry, LINDXAU is an array of dimension 2*N. On exit, this - * array contains the local destination information encoded as - * follows. If LINDXAU(k) >= 0, row LINDXA(k) of A is to be - * copied in U at position LINDXAU(k). Otherwise, row LINDXA(k) - * of A should be locally copied into A(-LINDXAU(k),:). - * - * LLEN (global output) int * - * On entry, LLEN is an array of length NPROW. On exit, it - * contains how many rows every process has. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables ..
- */ - int dst, dstrow, fndd, i, ia, icurrow, il, - ip=0, iroff, j, jb, myrow, nb, nprow, - src, srcrow; -/* .. - * .. Executable Statements .. - */ -/* - * Compute the local arrays LINDXA and LINDXAU containing the local - * source and final destination position resulting from the application - * of N interchanges. - */ - myrow = PANEL->grid->myrow; nprow = PANEL->grid->nprow; - icurrow = PANEL->prow; jb = PANEL->jb; - nb = PANEL->nb; ia = PANEL->ia; - iroff = PANEL->ii; - - for( i = 0; i < nprow; i++ ) LLEN[i] = 0; - - for( i = 0; i < K; i += 2 ) - { - src = IPID[i]; - Mindxg2p( src, nb, nb, srcrow, 0, nprow ); LLEN[ srcrow ]++; - - if( myrow == srcrow ) - { - Mindxg2l( il, src, nb, nb, myrow, 0, nprow ); - LINDXA[ip] = il - iroff; dst = IPID[i+1]; - - if( myrow == icurrow ) - { - Mindxg2p( dst, nb, nb, dstrow, 0, nprow ); - if( dstrow == icurrow ) - { - if( dst - ia < jb ) { LINDXAU[ip] = dst - ia; } - else - { - Mindxg2l( il, dst, nb, nb, myrow, 0, nprow ); - LINDXAU[ip] = iroff - il; - } - } - else - { - j = 0; - do { fndd = ( dst == IPID[j] ); j+=2; } - while( !fndd && ( j < K ) ); - LINDXAU[ip] = IPID[j-1] - ia; - } - } - else { LINDXAU[ip] = dst - ia; } - - ip++; - } - } -/* - * End of HPL_plindx0 - */ -} diff --git a/hpl/src/pgesv/HPL_plindx1.c b/hpl/src/pgesv/HPL_plindx1.c deleted file mode 100644 index 4d3286afab557da48cce7429df342fe19f226054..0000000000000000000000000000000000000000 --- a/hpl/src/pgesv/HPL_plindx1.c +++ /dev/null @@ -1,275 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. 
Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
- * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -void HPL_plindx1 -( - HPL_T_panel * PANEL, - const int K, - const int * IPID, - int * IPA, - int * LINDXA, - int * LINDXAU, - int * IPLEN, - int * IPMAP, - int * IPMAPM1, - int * PERMU, - int * IWORK -) -#else -void HPL_plindx1 -( PANEL, K, IPID, IPA, LINDXA, LINDXAU, IPLEN, IPMAP, IPMAPM1, PERMU, IWORK ) - HPL_T_panel * PANEL; - const int K; - const int * IPID; - int * IPA; - int * LINDXA; - int * LINDXAU; - int * IPLEN; - int * IPMAP; - int * IPMAPM1; - int * PERMU; - int * IWORK; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_plindx1 computes two local arrays LINDXA and LINDXAU containing - * the local source and final destination position resulting from the - * application of row interchanges. In addition, this function computes - * three arrays IPLEN, IPMAP and IPMAPM1 that contain the logarithmic - * mapping information for the spreading phase. - * - * Arguments - * ========= - * - * PANEL (local input/output) HPL_T_panel * - * On entry, PANEL points to the data structure containing the - * panel information. - * - * K (global input) const int - * On entry, K specifies the number of entries in IPID. K is at - * least 2*N, and at most 4*N. - * - * IPID (global input) const int * - * On entry, IPID is an array of length K. The first K entries - * of that array contain the src and final destination resulting - * from the application of the interchanges. - * - * IPA (global output) int * - * On exit, IPA specifies the number of rows that the current - * process row has that either belong to U or should be swapped - * with remote rows of A. - * - * LINDXA (global output) int * - * On entry, LINDXA is an array of dimension 2*N. On exit, this - * array contains the local indexes of the rows of A I have that - * should be copied into U. 
- * - * LINDXAU (global output) int * - * On entry, LINDXAU is an array of dimension 2*N. On exit, this - * array contains the local destination information encoded as - * follows. If LINDXAU(k) >= 0, row LINDXA(k) of A is to be - * copied in U at position LINDXAU(k). Otherwise, row LINDXA(k) - * of A should be locally copied into A(-LINDXAU(k),:). - * - * IPLEN (global output) int * - * On entry, IPLEN is an array of dimension NPROW + 1. On exit, - * this array is such that IPLEN[i] is the number of rows of A - * in the processes before process IPMAP[i] after the sort - * with the convention that IPLEN[nprow] is the total number of - * rows of the panel. In other words IPLEN[i+1]-IPLEN[i] is the - * local number of rows of A that should be moved to the process - * IPMAP[i]. IPLEN is such that the number of rows of the source - * process row can be computed as IPLEN[1] - IPLEN[0], and the - * remaining entries of this array are sorted so that the - * quantities IPLEN[i+1] - IPLEN[i] are logarithmically sorted. - * - * IPMAP (global output) int * - * On entry, IPMAP is an array of dimension NPROW. On exit, this - * array contains the logarithmic mapping of the processes. In - * other words, IPMAP[myrow] is the corresponding sorted process - * coordinate. - * - * IPMAPM1 (global output) int * - * On entry, IPMAPM1 is an array of dimension NPROW. On exit, - * this array contains the inverse of the logarithmic mapping - * contained in IPMAP: IPMAPM1[ IPMAP[i] ] = i, for all i in - * [0.. NPROW) - * - * PERMU (global output) int * - * On entry, PERMU is an array of dimension JB. On exit, PERMU - * contains a sequence of permutations that should be applied - * in increasing order to permute in place the row panel U. - * - * IWORK (workspace) int * - * On entry, IWORK is a workarray of dimension 2*JB. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables ..
- */ - int * iwork; - int dst, dstrow, fndd, i, ia, icurrow, il, - ip, ipU, iroff, j, jb, myrow, nb, nprow, - src, srcrow; -/* .. - * .. Executable Statements .. - */ -/* - * Logarithmic sort of the processes - compute IPMAP, IPLEN and IPMAPM1 - */ - HPL_plindx10( PANEL, K, IPID, IPLEN, IPMAP, IPMAPM1 ); -/* - * Compute the local arrays LINDXA and LINDXAU containing the local - * source and final destination position resulting from the application - * of N interchanges. Compute LINDXA and LINDXAU in icurrow, and LINDXA - * elsewhere and PERMU in every process. - */ - myrow = PANEL->grid->myrow; nprow = PANEL->grid->nprow; - jb = PANEL->jb; nb = PANEL->nb; ia = PANEL->ia; - iroff = PANEL->ii; icurrow = PANEL->prow; - - iwork = IWORK + jb; - - if( myrow == icurrow ) - { - for( i = 0, ip = 0, ipU = 0; i < K; i += 2 ) - { - src = IPID[i]; Mindxg2p( src, nb, nb, srcrow, 0, nprow ); - - if( srcrow == icurrow ) - { - dst = IPID[i+1]; Mindxg2p( dst, nb, nb, dstrow, 0, nprow ); - - Mindxg2l( il, src, nb, nb, myrow, 0, nprow ); - LINDXA[ip] = il - iroff; - - if( ( dstrow == icurrow ) && ( dst - ia < jb ) ) - { - PERMU[ipU] = dst - ia; il = IPMAPM1[dstrow]; - j = IPLEN[il]; iwork[ipU] = LINDXAU[ip] = j; - IPLEN[il]++; ipU++; - } - else if( dstrow != icurrow ) - { - j = 0; - do { fndd = ( dst == IPID[j] ); j+=2; } - while( !fndd && ( j < K ) ); - - PERMU[ipU] = IPID[j-1]-ia; il = IPMAPM1[dstrow]; - j = IPLEN[il]; iwork[ipU] = LINDXAU[ip] = j; - IPLEN[il]++; ipU++; - } - else if( ( dstrow == icurrow ) && ( dst - ia >= jb ) ) - { - Mindxg2l( il, dst, nb, nb, myrow, 0, nprow ); - LINDXAU[ip] = iroff - il; - } - ip++; - } - } - *IPA = ip; - } - else - { - for( i = 0, ip = 0, ipU = 0; i < K; i += 2 ) - { - src = IPID[i ]; Mindxg2p( src, nb, nb, srcrow, 0, nprow ); - dst = IPID[i+1]; Mindxg2p( dst, nb, nb, dstrow, 0, nprow ); -/* - * LINDXA[i] is the local index of the row of A that belongs into U - */ - if( myrow == dstrow ) - { - Mindxg2l( il, dst, nb, nb, myrow, 0, nprow ); - 
LINDXA[ip] = il - iroff; ip++; - } -/* - * iwork[i] is the local (current) position index in U - * PERMU[i] is the local (final) destination index in U - */ - if( srcrow == icurrow ) - { - if( ( dstrow == icurrow ) && ( dst - ia < jb ) ) - { - PERMU[ipU] = dst - ia; il = IPMAPM1[dstrow]; - iwork[ipU] = IPLEN[il]; IPLEN[il]++; ipU++; - } - else if( dstrow != icurrow ) - { - j = 0; - do { fndd = ( dst == IPID[j] ); j+=2; } - while( !fndd && ( j < K ) ); - PERMU[ipU] = IPID[j-1] - ia; il = IPMAPM1[dstrow]; - iwork[ipU] = IPLEN[il]; IPLEN[il]++; ipU++; - } - } - } - *IPA = 0; - } -/* - * Simplify iwork and PERMU, return in PERMU the sequence of permutation - * that need to be apply to U after it has been broadcast. - */ - HPL_perm( jb, iwork, PERMU, IWORK ); -/* - * Reset IPLEN to its correct value - */ - for( i = nprow; i > 0; i-- ) IPLEN[i] = IPLEN[i-1]; - IPLEN[0] = 0; -/* - * End of HPL_plindx1 - */ -} diff --git a/hpl/src/pgesv/HPL_plindx10.c b/hpl/src/pgesv/HPL_plindx10.c deleted file mode 100644 index 2d79f6d47f5b0b92cfea44c174445a1f750355b7..0000000000000000000000000000000000000000 --- a/hpl/src/pgesv/HPL_plindx10.c +++ /dev/null @@ -1,155 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. 
All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -void HPL_plindx10 -( - HPL_T_panel * PANEL, - const int K, - const int * IPID, - int * IPLEN, - int * IPMAP, - int * IPMAPM1 -) -#else -void HPL_plindx10 -( PANEL, K, IPID, IPLEN, IPMAP, IPMAPM1 ) - HPL_T_panel * PANEL; - const int K; - const int * IPID; - int * IPLEN; - int * IPMAP; - int * IPMAPM1; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_plindx10 computes three arrays IPLEN, IPMAP and IPMAPM1 that - * contain the logarithmic mapping information for the spreading phase. 
- * - * Arguments - * ========= - * - * PANEL (local input/output) HPL_T_panel * - * On entry, PANEL points to the data structure containing the - * panel information. - * - * K (global input) const int - * On entry, K specifies the number of entries in IPID. K is at - * least 2*N, and at most 4*N. - * - * IPID (global input) const int * - * On entry, IPID is an array of length K. The first K entries - * of that array contain the src and final destination resulting - * from the application of the interchanges. - * - * IPLEN (global output) int * - * On entry, IPLEN is an array of dimension NPROW + 1. On exit, - * this array is such that IPLEN[i] is the number of rows of A - * in the processes before process IPMAP[i] after the sort, with - * the convention that IPLEN[nprow] is the total number of rows. - * In other words, IPLEN[i+1] - IPLEN[i] is the local number of - * rows of A that should be moved for each process. IPLEN is - * such that the number of rows of the source process row can be - * computed as IPLEN[1] - IPLEN[0], and the remaining entries of - * this array are sorted so that the quantities IPLEN[i+1] - - * IPLEN[i] are logarithmically sorted. - * - * IPMAP (global output) int * - * On entry, IPMAP is an array of dimension NPROW. On exit, this - * array contains the logarithmic mapping of the processes. In - * other words, IPMAP[myrow] is the corresponding sorted process - * coordinate. - * - * IPMAPM1 (global output) int * - * On entry, IPMAPM1 is an array of dimension NPROW. On exit, - * this array contains the inverse of the logarithmic mapping - * contained in IPMAP: IPMAPM1[ IPMAP[i] ] = i, for all i in - * [0.. NPROW) - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - int dst, dstrow, i, ia, icurrow, jb, nb, - nprow, src, srcrow; -/* .. - * .. Executable Statements ..
- */ - nprow = PANEL->grid->nprow; jb = PANEL->jb; nb = PANEL->nb; - ia = PANEL->ia; icurrow = PANEL->prow; -/* - * Compute redundantly the local number of rows that each process has - * and that belong to U in IPLEN[1 .. nprow+1] - */ - for( i = 0; i <= nprow; i++ ) IPLEN[i] = 0; - - for( i = 0; i < K; i += 2 ) - { - src = IPID[i]; Mindxg2p( src, nb, nb, srcrow, 0, nprow ); - if( srcrow == icurrow ) - { - dst = IPID[i+1]; Mindxg2p( dst, nb, nb, dstrow, 0, nprow ); - if( ( dstrow != srcrow ) || ( dst - ia < jb ) ) IPLEN[dstrow+1]++; - } - } -/* - * Logarithmic sort of the processes - compute IPMAP, IPLEN and IPMAPM1 - * (the inverse of IPMAP) - */ - HPL_logsort( nprow, icurrow, IPLEN, IPMAP, IPMAPM1 ); -/* - * End of HPL_plindx10 - */ -} diff --git a/hpl/src/pgesv/HPL_rollN.c b/hpl/src/pgesv/HPL_rollN.c deleted file mode 100644 index 45d9e7d2b0f2dd1992159b73bca31a371af99771..0000000000000000000000000000000000000000 --- a/hpl/src/pgesv/HPL_rollN.c +++ /dev/null @@ -1,225 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. 
All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
- * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#define I_SEND 0 -#define I_RECV 1 - -#ifdef STDC_HEADERS -void HPL_rollN -( - HPL_T_panel * PBCST, - int * IFLAG, - HPL_T_panel * PANEL, - const int N, - double * U, - const int LDU, - const int * IPLEN, - const int * IPMAP, - const int * IPMAPM1 -) -#else -void HPL_rollN -( PBCST, IFLAG, PANEL, N, U, LDU, IPLEN, IPMAP, IPMAPM1 ) - HPL_T_panel * PBCST; - int * IFLAG; - HPL_T_panel * PANEL; - const int N; - double * U; - const int LDU; - const int * IPLEN; - const int * IPMAP; - const int * IPMAPM1; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_rollN rolls the local arrays containing the local pieces of U, so - * that on exit to this function U is replicated in every process row. - * In addition, this function probe for the presence of the column panel - * and forwards it when available. - * - * Arguments - * ========= - * - * PBCST (local input/output) HPL_T_panel * - * On entry, PBCST points to the data structure containing the - * panel (to be broadcast) information. - * - * IFLAG (local input/output) int * - * On entry, IFLAG indicates whether or not the broadcast has - * already been completed. If not, probing will occur, and the - * outcome will be contained in IFLAG on exit. - * - * PANEL (local input/output) HPL_T_panel * - * On entry, PANEL points to the data structure containing the - * panel (to be rolled) information. - * - * N (local input) const int - * On entry, N specifies the number of columns of U. N must be - * at least zero. - * - * U (local input/output) double * - * On entry, U is an array of dimension (LDU,*) containing the - * local pieces of U in each process row. - * - * LDU (local input) const int - * On entry, LDU specifies the local leading dimension of U. LDU - * should be at least MAX(1,IPLEN[NPROW]). - * - * IPLEN (global input) const int * - * On entry, IPLEN is an array of dimension NPROW+1. 
This array - * is such that IPLEN[i+1] - IPLEN[i] is the number of rows of U - * in each process row. - * - * IPMAP (global input) const int * - * On entry, IMAP is an array of dimension NPROW. This array - * contains the logarithmic mapping of the processes. In other - * words, IMAP[myrow] is the absolute coordinate of the sorted - * process. - * - * IPMAPM1 (global input) const int * - * On entry, IMAPM1 is an array of dimension NPROW. This array - * contains the inverse of the logarithmic mapping contained in - * IMAP: For i in [0.. NPROW) IMAPM1[IMAP[i]] = i. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - MPI_Datatype type[2]; - MPI_Status status; - MPI_Request request; - MPI_Comm comm; - int Cmsgid=MSGID_BEGIN_PFACT, ibufR, ibufS, - ierr=MPI_SUCCESS, il, k, l, lengthR, - lengthS, mydist, myrow, next, npm1, nprow, - partner, prev; -/* .. - * .. Executable Statements .. - */ - if( N <= 0 ) return; - - npm1 = ( nprow = PANEL->grid->nprow ) - 1; myrow = PANEL->grid->myrow; - comm = PANEL->grid->col_comm; -/* - * Rolling phase - */ - mydist = IPMAPM1[myrow]; - prev = IPMAP[MModSub1( mydist, nprow )]; - next = IPMAP[MModAdd1( mydist, nprow )]; - - for( k = 0; k < npm1; k++ ) - { - l = (int)( (unsigned int)(k) >> 1 ); - - if( ( ( mydist + k ) & 1 ) != 0 ) - { - il = MModAdd( mydist, l, nprow ); - lengthS = IPLEN[il+1] - ( ibufS = IPLEN[il] ); - il = MModSub( mydist, l+1, nprow ); - lengthR = IPLEN[il+1] - ( ibufR = IPLEN[il] ); partner = prev; - } - else - { - il = MModSub( mydist, l, nprow ); - lengthS = IPLEN[il+1] - ( ibufS = IPLEN[il] ); - il = MModAdd( mydist, l+1, nprow ); - lengthR = IPLEN[il+1] - ( ibufR = IPLEN[il] ); partner = next; - } - - if( lengthR > 0 ) - { - if( ierr == MPI_SUCCESS ) - ierr = MPI_Type_vector( N, lengthR, LDU, MPI_DOUBLE, - &type[I_RECV] ); - if( ierr == MPI_SUCCESS ) - ierr = MPI_Type_commit( &type[I_RECV] ); - if( ierr == MPI_SUCCESS ) - ierr = MPI_Irecv( Mptr( 
U, ibufR, 0, LDU ), 1, type[I_RECV], - partner, Cmsgid, comm, &request ); - } - - if( lengthS > 0 ) - { - if( ierr == MPI_SUCCESS ) - ierr = MPI_Type_vector( N, lengthS, LDU, MPI_DOUBLE, - &type[I_SEND] ); - if( ierr == MPI_SUCCESS ) - ierr = MPI_Type_commit( &type[I_SEND] ); - if( ierr == MPI_SUCCESS ) - ierr = MPI_Send( Mptr( U, ibufS, 0, LDU ), 1, type[I_SEND], - partner, Cmsgid, comm ); - if( ierr == MPI_SUCCESS ) - ierr = MPI_Type_free( &type[I_SEND] ); - } - - if( lengthR > 0 ) - { - if( ierr == MPI_SUCCESS ) - ierr = MPI_Wait( &request, &status ); - if( ierr == MPI_SUCCESS ) - ierr = MPI_Type_free( &type[I_RECV] ); - } -/* - * Probe for column panel - forward it when available - */ - if( *IFLAG == HPL_KEEP_TESTING ) (void) HPL_bcast( PBCST, IFLAG ); - } - - if( ierr != MPI_SUCCESS ) - { HPL_pabort( __LINE__, "HPL_rollN", "MPI call failed" ); } -/* - * End of HPL_rollN - */ -} diff --git a/hpl/src/pgesv/HPL_rollT.c b/hpl/src/pgesv/HPL_rollT.c deleted file mode 100644 index 65d40450daee931422a6db03167a51bceb896082..0000000000000000000000000000000000000000 --- a/hpl/src/pgesv/HPL_rollT.c +++ /dev/null @@ -1,259 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. 
All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
- * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#define I_SEND 0 -#define I_RECV 1 - -#ifdef STDC_HEADERS -void HPL_rollT -( - HPL_T_panel * PBCST, - int * IFLAG, - HPL_T_panel * PANEL, - const int N, - double * U, - const int LDU, - const int * IPLEN, - const int * IPMAP, - const int * IPMAPM1 -) -#else -void HPL_rollT -( PBCST, IFLAG, PANEL, N, U, LDU, IPLEN, IPMAP, IPMAPM1 ) - HPL_T_panel * PBCST; - int * IFLAG; - HPL_T_panel * PANEL; - const int N; - double * U; - const int LDU; - const int * IPLEN; - const int * IPMAP; - const int * IPMAPM1; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_rollT rolls the local arrays containing the local pieces of U, so - * that on exit to this function U is replicated in every process row. - * In addition, this function probe for the presence of the column panel - * and forwards it when available. - * - * Arguments - * ========= - * - * PBCST (local input/output) HPL_T_panel * - * On entry, PBCST points to the data structure containing the - * panel (to be broadcast) information. - * - * IFLAG (local input/output) int * - * On entry, IFLAG indicates whether or not the broadcast has - * already been completed. If not, probing will occur, and the - * outcome will be contained in IFLAG on exit. - * - * PANEL (local input/output) HPL_T_panel * - * On entry, PANEL points to the data structure containing the - * panel (to be rolled) information. - * - * N (local input) const int - * On entry, N specifies the local number of rows of U. N must - * be at least zero. - * - * U (local input/output) double * - * On entry, U is an array of dimension (LDU,*) containing the - * local pieces of U in each process row. - * - * LDU (local input) const int - * On entry, LDU specifies the local leading dimension of U. LDU - * should be at least MAX(1,N). - * - * IPLEN (global input) const int * - * On entry, IPLEN is an array of dimension NPROW+1. 
This array - * is such that IPLEN[i+1] - IPLEN[i] is the number of rows of U - * in each process row. - * - * IPMAP (global input) const int * - * On entry, IMAP is an array of dimension NPROW. This array - * contains the logarithmic mapping of the processes. In other - * words, IMAP[myrow] is the absolute coordinate of the sorted - * process. - * - * IPMAPM1 (global input) const int * - * On entry, IMAPM1 is an array of dimension NPROW. This array - * contains the inverse of the logarithmic mapping contained in - * IMAP: For i in [0.. NPROW) IMAPM1[IMAP[i]] = i. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ -#if 0 - MPI_Datatype type[2]; -#endif - MPI_Status status; - MPI_Request request; - MPI_Comm comm; - int Cmsgid=MSGID_BEGIN_PFACT, ibufR, ibufS, - ierr=MPI_SUCCESS, il, k, l, lengthR, - lengthS, mydist, myrow, next, npm1, nprow, - partner, prev; -/* .. - * .. Executable Statements .. - */ - if( N <= 0 ) return; - - npm1 = ( nprow = PANEL->grid->nprow ) - 1; myrow = PANEL->grid->myrow; - comm = PANEL->grid->col_comm; -/* - * Rolling phase - */ - mydist = IPMAPM1[myrow]; - prev = IPMAP[MModSub1( mydist, nprow )]; - next = IPMAP[MModAdd1( mydist, nprow )]; - - for( k = 0; k < npm1; k++ ) - { - l = (int)( (unsigned int)(k) >> 1 ); - - if( ( ( mydist + k ) & 1 ) != 0 ) - { - il = MModAdd( mydist, l, nprow ); - lengthS = IPLEN[il+1] - ( ibufS = IPLEN[il] ); - il = MModSub( mydist, l+1, nprow ); - lengthR = IPLEN[il+1] - ( ibufR = IPLEN[il] ); partner = prev; - } - else - { - il = MModSub( mydist, l, nprow ); - lengthS = IPLEN[il+1] - ( ibufS = IPLEN[il] ); - il = MModAdd( mydist, l+1, nprow ); - lengthR = IPLEN[il+1] - ( ibufR = IPLEN[il] ); partner = next; - } - - if( lengthR > 0 ) - { -#if 0 - if( ierr == MPI_SUCCESS ) - { - if( LDU == N ) - ierr = MPI_Type_contiguous( lengthR * LDU, MPI_DOUBLE, - &type[I_RECV] ); - else - ierr = MPI_Type_vector( lengthR, N, LDU, MPI_DOUBLE, - &type[I_RECV] ); 
- } - if( ierr == MPI_SUCCESS ) - ierr = MPI_Type_commit( &type[I_RECV] ); - if( ierr == MPI_SUCCESS ) - ierr = MPI_Irecv( Mptr( U, 0, ibufR, LDU ), 1, type[I_RECV], - partner, Cmsgid, comm, &request ); -#else -/* - * In our case, LDU is N - Do not use the MPI datatype. - */ - if( ierr == MPI_SUCCESS ) - ierr = MPI_Irecv( Mptr( U, 0, ibufR, LDU ), lengthR*LDU, - MPI_DOUBLE, partner, Cmsgid, comm, &request ); -#endif - } - - if( lengthS > 0 ) - { -#if 0 - if( ierr == MPI_SUCCESS ) - { - if( LDU == N ) - ierr = MPI_Type_contiguous( lengthS*LDU, MPI_DOUBLE, - &type[I_SEND] ); - else - ierr = MPI_Type_vector( lengthS, N, LDU, MPI_DOUBLE, - &type[I_SEND] ); - } - if( ierr == MPI_SUCCESS ) - ierr = MPI_Type_commit( &type[I_SEND] ); - if( ierr == MPI_SUCCESS ) - ierr = MPI_Send( Mptr( U, 0, ibufS, LDU ), 1, type[I_SEND], - partner, Cmsgid, comm ); - if( ierr == MPI_SUCCESS ) - ierr = MPI_Type_free( &type[I_SEND] ); -#else -/* - * In our case, LDU is N - Do not use the MPI datatype. - */ - if( ierr == MPI_SUCCESS ) - ierr = MPI_Send( Mptr( U, 0, ibufS, LDU ), lengthS*LDU, - MPI_DOUBLE, partner, Cmsgid, comm ); -#endif - } - - if( lengthR > 0 ) - { - if( ierr == MPI_SUCCESS ) - ierr = MPI_Wait( &request, &status ); -#if 0 - if( ierr == MPI_SUCCESS ) - ierr = MPI_Type_free( &type[I_RECV] ); -#endif - } -/* - * Probe for column panel - forward it when available - */ - if( *IFLAG == HPL_KEEP_TESTING ) (void) HPL_bcast( PBCST, IFLAG ); - } - - if( ierr != MPI_SUCCESS ) - { HPL_pabort( __LINE__, "HPL_rollT", "MPI call failed" ); } -/* - * End of HPL_rollT - */ -} diff --git a/hpl/src/pgesv/HPL_spreadN.c b/hpl/src/pgesv/HPL_spreadN.c deleted file mode 100644 index 926fc38dd6734068a96a18bc1decdfac2d49ec8f..0000000000000000000000000000000000000000 --- a/hpl/src/pgesv/HPL_spreadN.c +++ /dev/null @@ -1,303 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. 
Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -void HPL_spreadN -( - HPL_T_panel * PBCST, - int * IFLAG, - HPL_T_panel * PANEL, - const enum HPL_SIDE SIDE, - const int N, - double * U, - const int LDU, - const int SRCDIST, - const int * IPLEN, - const int * IPMAP, - const int * IPMAPM1 -) -#else -void HPL_spreadN -( PBCST, IFLAG, PANEL, SIDE, N, U, LDU, SRCDIST, IPLEN, IPMAP, IPMAPM1 ) - HPL_T_panel * PBCST; - int * IFLAG; - HPL_T_panel * PANEL; - const enum HPL_SIDE SIDE; - const int N; - double * U; - const int LDU; - const int SRCDIST; - const int * IPLEN; - const int * IPMAP; - const int * IPMAPM1; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_spreadN spreads the local array containing local pieces of U, so - * that on exit to this function, a piece of U is contained in every - * process row. The array IPLEN contains the number of rows of U, that - * should be spread on any given process row. This function also probes - * for the presence of the column panel PBCST. In case of success, this - * panel will be forwarded. If PBCST is NULL on input, this probing - * mechanism will be disabled. - * - * Arguments - * ========= - * - * PBCST (local input/output) HPL_T_panel * - * On entry, PBCST points to the data structure containing the - * panel (to be broadcast) information. 
- * - * IFLAG (local input/output) int * - * On entry, IFLAG indicates whether or not the broadcast has - * already been completed. If not, probing will occur, and the - * outcome will be contained in IFLAG on exit. - * - * PANEL (local input/output) HPL_T_panel * - * On entry, PANEL points to the data structure containing the - * panel (to be spread) information. - * - * SIDE (global input) const enum HPL_SIDE - * On entry, SIDE specifies whether the local piece of U located - * in process IPMAP[SRCDIST] should be spread to the right or to - * the left. This feature is used by the equilibration process. - * - * N (global input) const int - * On entry, N specifies the local number of columns of U. N - * must be at least zero. - * - * U (local input/output) double * - * On entry, U is an array of dimension (LDU,*) containing the - * local pieces of U. - * - * LDU (local input) const int - * On entry, LDU specifies the local leading dimension of U. LDU - * should be at least MAX(1,IPLEN[nprow]). - * - * SRCDIST (local input) const int - * On entry, SRCDIST specifies the source process that spreads - * its piece of U. - * - * IPLEN (global input) const int * - * On entry, IPLEN is an array of dimension NPROW+1. This array - * is such that IPLEN[i+1] - IPLEN[i] is the number of rows of U - * in each process before process IPMAP[i], with the convention - * that IPLEN[nprow] is the total number of rows. In other words - * IPLEN[i+1] - IPLEN[i] is the local number of rows of U that - * should be moved to process IPMAP[i]. - * - * IPMAP (global input) const int * - * On entry, IPMAP is an array of dimension NPROW. This array - * contains the logarithmic mapping of the processes. In other - * words, IPMAP[myrow] is the absolute coordinate of the sorted - * process. - * - * IPMAPM1 (global input) const int * - * On entry, IPMAPM1 is an array of dimension NPROW. This array - * contains the inverse of the logarithmic mapping contained in - * IPMAP: For i in [0.. 
NPROW) IPMAPM1[IPMAP[i]] = i. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - MPI_Datatype type; - MPI_Status status; - MPI_Comm comm; - unsigned int ip2=1, mask=1, mydist, mydist2; - int Cmsgid=MSGID_BEGIN_PFACT, ibuf, - ierr=MPI_SUCCESS, il, k, lbuf, lgth, myrow, - npm1, nprow, partner; -/* .. - * .. Executable Statements .. - */ - myrow = PANEL->grid->myrow; nprow = PANEL->grid->nprow; - comm = PANEL->grid->col_comm; -/* - * Spread U to the left - */ - if( SIDE == HplLeft ) - { - nprow = ( npm1 = SRCDIST ) + 1; - if( ( ( mydist = (unsigned int)(IPMAPM1[myrow]) ) > - (unsigned int)(SRCDIST) ) || ( npm1 == 0 ) ) return; - - k = npm1; while( k > 1 ) { k >>= 1; ip2 <<= 1; mask <<= 1; mask++; } - mydist2 = ( mydist = npm1 - mydist ); il = npm1 - ip2; - lgth = IPLEN[nprow]; - - do - { - mask ^= ip2; - - if( ( mydist & mask ) == 0 ) - { - lbuf = IPLEN[il+1] - ( ibuf = IPLEN[il-Mmin(il, (int)(ip2))] ); - - if( lbuf > 0 ) - { - partner = mydist ^ ip2; - - if( mydist & ip2 ) - { - if( ierr == MPI_SUCCESS ) - ierr = MPI_Type_vector( N, lbuf, LDU, MPI_DOUBLE, - &type ); - if( ierr == MPI_SUCCESS ) - ierr = MPI_Type_commit( &type ); - if( ierr == MPI_SUCCESS ) - ierr = MPI_Recv( Mptr( U, ibuf, 0, LDU ), 1, type, - IPMAP[npm1-partner], Cmsgid, comm, - &status ); - if( ierr == MPI_SUCCESS ) - ierr = MPI_Type_free( &type ); - } - else if( partner < nprow ) - { - if( ierr == MPI_SUCCESS ) - ierr = MPI_Type_vector( N, lbuf, LDU, MPI_DOUBLE, - &type ); - if( ierr == MPI_SUCCESS ) - ierr = MPI_Type_commit( &type ); - if( ierr == MPI_SUCCESS ) - ierr = MPI_Send( Mptr( U, ibuf, 0, LDU ), 1, type, - IPMAP[npm1-partner], Cmsgid, comm ); - if( ierr == MPI_SUCCESS ) - ierr = MPI_Type_free( &type ); - } - } - } - - if( mydist2 < ip2 ) { ip2 >>= 1; il += ip2; } - else { mydist2 -= ip2; ip2 >>= 1; il -= ip2; } -/* - * Probe for column panel - forward it when available - */ - if( *IFLAG == HPL_KEEP_TESTING ) (void) 
HPL_bcast( PBCST, IFLAG ); - - } while( ip2 > 0 ); - } - else - { - npm1 = ( nprow -= SRCDIST ) - 1; - if( ( ( mydist = (unsigned int)(IPMAPM1[myrow]) ) < - (unsigned int)(SRCDIST) ) || ( npm1 == 0 ) ) return; - - k = npm1; while( k > 1 ) { k >>= 1; ip2 <<= 1; mask <<= 1; mask++; } - mydist2 = ( mydist -= SRCDIST ); il = ip2; - lgth = IPLEN[SRCDIST+nprow]; -/* - * Spread U to the right - offset the IPLEN, and IPMAP arrays - */ - do - { - mask ^= ip2; - - if( ( mydist & mask ) == 0 ) - { - k = il ; ibuf = ( k >= nprow ? lgth : IPLEN[SRCDIST+k] ); - k = il + ip2; lbuf = ( k >= nprow ? lgth : IPLEN[SRCDIST+k] ) - ibuf; - - if( lbuf > 0 ) - { - partner = mydist ^ ip2; - - if( mydist & ip2 ) - { - if( ierr == MPI_SUCCESS ) - ierr = MPI_Type_vector( N, lbuf, LDU, MPI_DOUBLE, - &type ); - if( ierr == MPI_SUCCESS ) - ierr = MPI_Type_commit( &type ); - if( ierr == MPI_SUCCESS ) - ierr = MPI_Recv( Mptr( U, ibuf, 0, LDU ), 1, type, - IPMAP[SRCDIST+partner], Cmsgid, - comm, &status ); - if( ierr == MPI_SUCCESS ) - ierr = MPI_Type_free( &type ); - } - else if( partner < nprow ) - { - if( ierr == MPI_SUCCESS ) - ierr = MPI_Type_vector( N, lbuf, LDU, MPI_DOUBLE, - &type ); - if( ierr == MPI_SUCCESS ) - ierr = MPI_Type_commit( &type ); - if( ierr == MPI_SUCCESS ) - ierr = MPI_Send( Mptr( U, ibuf, 0, LDU ), 1, type, - IPMAP[SRCDIST+partner], Cmsgid, - comm ); - if( ierr == MPI_SUCCESS ) - ierr = MPI_Type_free( &type ); - } - } - } - - if( mydist2 < ip2 ) { ip2 >>= 1; il -= ip2; } - else { mydist2 -= ip2; ip2 >>= 1; il += ip2; } -/* - * Probe for column panel - forward it when available - */ - if( *IFLAG == HPL_KEEP_TESTING ) (void) HPL_bcast( PBCST, IFLAG ); - - } while( ip2 > 0 ); - } - - if( ierr != MPI_SUCCESS ) - { HPL_pabort( __LINE__, "HPL_spreadN", "MPI call failed" ); } -/* - * End of HPL_spreadN - */ -} diff --git a/hpl/src/pgesv/HPL_spreadT.c b/hpl/src/pgesv/HPL_spreadT.c deleted file mode 100644 index 
4c753aa1ab1433bc8c1f6c60ca547d3128d83d07..0000000000000000000000000000000000000000 --- a/hpl/src/pgesv/HPL_spreadT.c +++ /dev/null @@ -1,372 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -void HPL_spreadT -( - HPL_T_panel * PBCST, - int * IFLAG, - HPL_T_panel * PANEL, - const enum HPL_SIDE SIDE, - const int N, - double * U, - const int LDU, - const int SRCDIST, - const int * IPLEN, - const int * IPMAP, - const int * IPMAPM1 -) -#else -void HPL_spreadT -( PBCST, IFLAG, PANEL, SIDE, N, U, LDU, SRCDIST, IPLEN, IPMAP, IPMAPM1 ) - HPL_T_panel * PBCST; - int * IFLAG; - HPL_T_panel * PANEL; - const enum HPL_SIDE SIDE; - const int N; - double * U; - const int LDU; - const int SRCDIST; - const int * IPLEN; - const int * IPMAP; - const int * IPMAPM1; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_spreadT spreads the local array containing local pieces of U, so - * that on exit to this function, a piece of U is contained in every - * process row. The array IPLEN contains the number of columns of U, - * that should be spread on any given process row. This function also - * probes for the presence of the column panel PBCST. If available, - * this panel will be forwarded. If PBCST is NULL on input, this - * probing mechanism will be disabled. - * - * Arguments - * ========= - * - * PBCST (local input/output) HPL_T_panel * - * On entry, PBCST points to the data structure containing the - * panel (to be broadcast) information. 
- * - * IFLAG (local input/output) int * - * On entry, IFLAG indicates whether or not the broadcast has - * already been completed. If not, probing will occur, and the - * outcome will be contained in IFLAG on exit. - * - * PANEL (local input/output) HPL_T_panel * - * On entry, PANEL points to the data structure containing the - * panel (to be spread) information. - * - * SIDE (global input) const enum HPL_SIDE - * On entry, SIDE specifies whether the local piece of U located - * in process IPMAP[SRCDIST] should be spread to the right or to - * the left. This feature is used by the equilibration process. - * - * N (global input) const int - * On entry, N specifies the local number of rows of U. N must - * be at least zero. - * - * U (local input/output) double * - * On entry, U is an array of dimension (LDU,*) containing the - * local pieces of U. - * - * LDU (local input) const int - * On entry, LDU specifies the local leading dimension of U. LDU - * should be at least MAX(1,N). - * - * SRCDIST (local input) const int - * On entry, SRCDIST specifies the source process that spreads - * its piece of U. - * - * IPLEN (global input) const int * - * On entry, IPLEN is an array of dimension NPROW+1. This array - * is such that IPLEN[i+1] - IPLEN[i] is the number of rows of U - * in each process before process IPMAP[i], with the convention - * that IPLEN[nprow] is the total number of rows. In other words - * IPLEN[i+1] - IPLEN[i] is the local number of rows of U that - * should be moved to process IPMAP[i]. - * - * IPMAP (global input) const int * - * On entry, IPMAP is an array of dimension NPROW. This array - * contains the logarithmic mapping of the processes. In other - * words, IPMAP[myrow] is the absolute coordinate of the sorted - * process. - * - * IPMAPM1 (global input) const int * - * On entry, IPMAPM1 is an array of dimension NPROW. This array - * contains the inverse of the logarithmic mapping contained in - * IPMAP: For i in [0.. 
NPROW) IPMAPM1[IPMAP[i]] = i. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ -#if 0 - MPI_Datatype type; -#endif - MPI_Status status; - MPI_Comm comm; - unsigned int ip2=1, mask=1, mydist, mydist2; - int Cmsgid=MSGID_BEGIN_PFACT, ibuf, - ierr=MPI_SUCCESS, il, k, lbuf, lgth, myrow, - npm1, nprow, partner; -/* .. - * .. Executable Statements .. - */ - myrow = PANEL->grid->myrow; nprow = PANEL->grid->nprow; - comm = PANEL->grid->col_comm; -/* - * Spread U - */ - if( SIDE == HplLeft ) - { - nprow = ( npm1 = SRCDIST ) + 1; - if( ( ( mydist = (unsigned int)(IPMAPM1[myrow]) ) > - (unsigned int)(SRCDIST) ) || ( npm1 == 0 ) ) return; - - k = npm1; while( k > 1 ) { k >>= 1; ip2 <<= 1; mask <<= 1; mask++; } - mydist2 = ( mydist = npm1 - mydist ); il = npm1 - ip2; - lgth = IPLEN[nprow]; - - do - { - mask ^= ip2; - - if( ( mydist & mask ) == 0 ) - { - lbuf = IPLEN[il+1] - ( ibuf = IPLEN[il-Mmin(il, (int)(ip2))] ); - - if( lbuf > 0 ) - { - partner = mydist ^ ip2; - - if( mydist & ip2 ) - { -#if 0 - if( ierr == MPI_SUCCESS ) - { - if( LDU == N ) - ierr = MPI_Type_contiguous( lbuf*LDU, MPI_DOUBLE, - &type ); - else - ierr = MPI_Type_vector( lbuf, N, LDU, MPI_DOUBLE, - &type ); - } - if( ierr == MPI_SUCCESS ) - ierr = MPI_Type_commit( &type ); - if( ierr == MPI_SUCCESS ) - ierr = MPI_Recv( Mptr( U, 0, ibuf, LDU ), 1, type, - IPMAP[npm1-partner], Cmsgid, comm, - &status ); - if( ierr == MPI_SUCCESS ) - ierr = MPI_Type_free( &type ); -#else -/* - * In our case, LDU is N - do not use the MPI Datatypes - */ - if( ierr == MPI_SUCCESS ) - ierr = MPI_Recv( Mptr( U, 0, ibuf, LDU ), lbuf*N, - MPI_DOUBLE, IPMAP[npm1-partner], - Cmsgid, comm, &status ); -#endif - } - else if( partner < nprow ) - { -#if 0 - if( ierr == MPI_SUCCESS ) - { - if( LDU == N ) - ierr = MPI_Type_contiguous( lbuf*LDU, MPI_DOUBLE, - &type ); - else - ierr = MPI_Type_vector( lbuf, N, LDU, MPI_DOUBLE, - &type ); - } - if( ierr == MPI_SUCCESS ) - ierr = 
MPI_Type_commit( &type ); - if( ierr == MPI_SUCCESS ) - ierr = MPI_Send( Mptr( U, 0, ibuf, LDU ), 1, type, - IPMAP[npm1-partner], Cmsgid, comm ); - if( ierr == MPI_SUCCESS ) - ierr = MPI_Type_free( &type ); -#else -/* - * In our case, LDU is N - do not use the MPI Datatypes - */ - if( ierr == MPI_SUCCESS ) - ierr = MPI_Send( Mptr( U, 0, ibuf, LDU ), lbuf*N, - MPI_DOUBLE, IPMAP[npm1-partner], - Cmsgid, comm ); -#endif - } - } - } - - if( mydist2 < ip2 ) { ip2 >>= 1; il += ip2; } - else { mydist2 -= ip2; ip2 >>= 1; il -= ip2; } -/* - * Probe for column panel - forward it when available - */ - if( *IFLAG == HPL_KEEP_TESTING ) (void) HPL_bcast( PBCST, IFLAG ); - - } while( ip2 > 0 ); - } - else - { - npm1 = ( nprow -= SRCDIST ) - 1; - if( ( ( mydist = (unsigned int)(IPMAPM1[myrow]) ) < - (unsigned int)(SRCDIST) ) || ( npm1 == 0 ) ) return; - - k = npm1; while( k > 1 ) { k >>= 1; ip2 <<= 1; mask <<= 1; mask++; } - mydist2 = ( mydist -= SRCDIST ); il = ip2; -/* - * Spread to the right - offset the IPLEN and IPMAP arrays - */ - lgth = IPLEN[SRCDIST+nprow]; -/* - * Spread U - */ - do - { - mask ^= ip2; - - if( ( mydist & mask ) == 0 ) - { - k = il ; ibuf = ( k >= nprow ? lgth : IPLEN[SRCDIST+k] ); - k = il + ip2; lbuf = ( k >= nprow ? 
lgth : IPLEN[SRCDIST+k] ) - ibuf; - - if( lbuf > 0 ) - { - partner = mydist ^ ip2; - - if( mydist & ip2 ) - { -#if 0 - if( ierr == MPI_SUCCESS ) - { - if( LDU == N ) - ierr = MPI_Type_contiguous( lbuf*LDU, MPI_DOUBLE, - &type ); - else - ierr = MPI_Type_vector( lbuf, N, LDU, MPI_DOUBLE, - &type ); - } - if( ierr == MPI_SUCCESS ) - ierr = MPI_Type_commit( &type ); - if( ierr == MPI_SUCCESS ) - ierr = MPI_Recv( Mptr( U, 0, ibuf, LDU ), 1, type, - IPMAP[SRCDIST+partner], Cmsgid, - comm, &status ); - if( ierr == MPI_SUCCESS ) - ierr = MPI_Type_free( &type ); -#else -/* - * In our case, LDU is N - do not use the MPI Datatypes - */ - if( ierr == MPI_SUCCESS ) - ierr = MPI_Recv( Mptr( U, 0, ibuf, LDU ), lbuf*N, - MPI_DOUBLE, IPMAP[SRCDIST+partner], - Cmsgid, comm, &status ); -#endif - } - else if( partner < nprow ) - { -#if 0 - if( ierr == MPI_SUCCESS ) - { - if( LDU == N ) - ierr = MPI_Type_contiguous( lbuf*LDU, MPI_DOUBLE, - &type ); - else - ierr = MPI_Type_vector( lbuf, N, LDU, MPI_DOUBLE, - &type ); - } - if( ierr == MPI_SUCCESS ) - ierr = MPI_Type_commit( &type ); - if( ierr == MPI_SUCCESS ) - ierr = MPI_Send( Mptr( U, 0, ibuf, LDU ), 1, type, - IPMAP[SRCDIST+partner], Cmsgid, - comm ); - if( ierr == MPI_SUCCESS ) - ierr = MPI_Type_free( &type ); -#else -/* - * In our case, LDU is N - do not use the MPI Datatypes - */ - if( ierr == MPI_SUCCESS ) - ierr = MPI_Send( Mptr( U, 0, ibuf, LDU ), lbuf*N, - MPI_DOUBLE, IPMAP[SRCDIST+partner], - Cmsgid, comm ); -#endif - } - } - } - - if( mydist2 < ip2 ) { ip2 >>= 1; il -= ip2; } - else { mydist2 -= ip2; ip2 >>= 1; il += ip2; } -/* - * Probe for column panel - forward it when available - */ - if( *IFLAG == HPL_KEEP_TESTING ) (void) HPL_bcast( PBCST, IFLAG ); - - } while( ip2 > 0 ); - } - - if( ierr != MPI_SUCCESS ) - { HPL_pabort( __LINE__, "HPL_spreadT", "MPI call failed" ); } -/* - * End of HPL_spreadT - */ -} diff --git a/hpl/src/pgesv/intel64/Make.inc b/hpl/src/pgesv/intel64/Make.inc deleted file mode 120000 index 
c1d8167d071e028cbeff544106b9f386bbdcac8a..0000000000000000000000000000000000000000 --- a/hpl/src/pgesv/intel64/Make.inc +++ /dev/null @@ -1 +0,0 @@ -/home/valeriuc/work/PRACE/CodeVault/src/1_dense/hpl/Make.intel64 \ No newline at end of file diff --git a/hpl/src/pgesv/intel64/Makefile b/hpl/src/pgesv/intel64/Makefile deleted file mode 100644 index 1e9d40c5d126efee91034dc91cc71872d18cd115..0000000000000000000000000000000000000000 --- a/hpl/src/pgesv/intel64/Makefile +++ /dev/null @@ -1,136 +0,0 @@ -# -# -- High Performance Computing Linpack Benchmark (HPL) -# HPL - 2.1 - October 26, 2012 -# Antoine P. Petitet -# University of Tennessee, Knoxville -# Innovative Computing Laboratory -# (C) Copyright 2000-2008 All Rights Reserved -# -# -- Copyright notice and Licensing terms: -# -# Redistribution and use in source and binary forms, with or without -# modification, are permitted provided that the following conditions -# are met: -# -# 1. Redistributions of source code must retain the above copyright -# notice, this list of conditions and the following disclaimer. -# -# 2. Redistributions in binary form must reproduce the above copyright -# notice, this list of conditions, and the following disclaimer in the -# documentation and/or other materials provided with the distribution. -# -# 3. All advertising materials mentioning features or use of this -# software must display the following acknowledgement: -# This product includes software developed at the University of -# Tennessee, Knoxville, Innovative Computing Laboratory. -# -# 4. The name of the University, the name of the Laboratory, or the -# names of its contributors may not be used to endorse or promote -# products derived from this software without specific written -# permission. 
-# -# -- Disclaimer: -# -# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS -# ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR -# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY -# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, -# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT -# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, -# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY -# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT -# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. -# ###################################################################### -# -include Make.inc -# -# ###################################################################### -# -INCdep = \ - $(INCdir)/hpl_misc.h $(INCdir)/hpl_blas.h $(INCdir)/hpl_auxil.h \ - $(INCdir)/hpl_pmisc.h $(INCdir)/hpl_grid.h $(INCdir)/hpl_comm.h \ - $(INCdir)/hpl_pauxil.h $(INCdir)/hpl_panel.h $(INCdir)/hpl_pfact.h \ - $(INCdir)/hpl_pgesv.h -# -## Object files ######################################################## -# -HPL_pgeobj = \ - HPL_pipid.o HPL_plindx0.o HPL_pdlaswp00N.o \ - HPL_pdlaswp00T.o HPL_perm.o HPL_logsort.o \ - HPL_plindx10.o HPL_plindx1.o HPL_spreadN.o \ - HPL_spreadT.o HPL_rollN.o HPL_rollT.o \ - HPL_equil.o HPL_pdlaswp01N.o HPL_pdlaswp01T.o \ - HPL_pdupdateNN.o HPL_pdupdateNT.o HPL_pdupdateTN.o \ - HPL_pdupdateTT.o HPL_pdtrsv.o HPL_pdgesv0.o \ - HPL_pdgesvK1.o HPL_pdgesvK2.o HPL_pdgesv.o -# -## Targets ############################################################# -# -all : lib -# -lib : lib.grd -# -lib.grd : $(HPL_pgeobj) - $(ARCHIVER) $(ARFLAGS) $(HPLlib) $(HPL_pgeobj) - $(RANLIB) $(HPLlib) - $(TOUCH) lib.grd -# -# 
###################################################################### -# -HPL_pipid.o : ../HPL_pipid.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_pipid.c -HPL_plindx0.o : ../HPL_plindx0.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_plindx0.c -HPL_pdlaswp00N.o : ../HPL_pdlaswp00N.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_pdlaswp00N.c -HPL_pdlaswp00T.o : ../HPL_pdlaswp00T.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_pdlaswp00T.c -HPL_perm.o : ../HPL_perm.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_perm.c -HPL_logsort.o : ../HPL_logsort.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_logsort.c -HPL_plindx10.o : ../HPL_plindx10.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_plindx10.c -HPL_plindx1.o : ../HPL_plindx1.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_plindx1.c -HPL_spreadN.o : ../HPL_spreadN.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_spreadN.c -HPL_spreadT.o : ../HPL_spreadT.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_spreadT.c -HPL_rollN.o : ../HPL_rollN.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_rollN.c -HPL_rollT.o : ../HPL_rollT.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_rollT.c -HPL_equil.o : ../HPL_equil.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_equil.c -HPL_pdlaswp01N.o : ../HPL_pdlaswp01N.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_pdlaswp01N.c -HPL_pdlaswp01T.o : ../HPL_pdlaswp01T.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_pdlaswp01T.c -HPL_pdupdateNN.o : ../HPL_pdupdateNN.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_pdupdateNN.c -HPL_pdupdateNT.o : ../HPL_pdupdateNT.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_pdupdateNT.c -HPL_pdupdateTN.o : ../HPL_pdupdateTN.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_pdupdateTN.c -HPL_pdupdateTT.o : ../HPL_pdupdateTT.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_pdupdateTT.c -HPL_pdtrsv.o : ../HPL_pdtrsv.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_pdtrsv.c -HPL_pdgesv0.o : ../HPL_pdgesv0.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_pdgesv0.c -HPL_pdgesvK1.o : 
../HPL_pdgesvK1.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_pdgesvK1.c -HPL_pdgesvK2.o : ../HPL_pdgesvK2.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_pdgesvK2.c -HPL_pdgesv.o : ../HPL_pdgesv.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_pdgesv.c -# -# ###################################################################### -# -clean : - $(RM) *.o *.grd -# -# ###################################################################### diff --git a/hpl/src/pgesv/intel64/lib.grd b/hpl/src/pgesv/intel64/lib.grd deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/hpl/testing/matgen/HPL_dmatgen.c b/hpl/testing/matgen/HPL_dmatgen.c deleted file mode 100644 index 6c591211f77f9f78f2dc5f1bd4759d3c356a9544..0000000000000000000000000000000000000000 --- a/hpl/testing/matgen/HPL_dmatgen.c +++ /dev/null @@ -1,134 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. 
The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -void HPL_dmatgen -( - const int M, - const int N, - double * A, - const int LDA, - const int ISEED -) -#else -void HPL_dmatgen -( M, N, A, LDA, ISEED ) - const int M; - const int N; - double * A; - const int LDA; - const int ISEED; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_dmatgen generates (or regenerates) a random matrix A. - * - * The pseudo-random generator uses the linear congruential algorithm: - * X(n+1) = (a * X(n) + c) mod m as described in the Art of Computer - * Programming, Knuth 1973, Vol. 2. - * - * Arguments - * ========= - * - * M (input) const int - * On entry, M specifies the number of rows of the matrix A. - * M must be at least zero. - * - * N (input) const int - * On entry, N specifies the number of columns of the matrix A. - * N must be at least zero. 
- * - * A (output) double * - * On entry, A points to an array of dimension (LDA,N). On exit, - * this array contains the coefficients of the randomly - * generated matrix. - * - * LDA (input) const int - * On entry, LDA specifies the leading dimension of the array A. - * LDA must be at least max(1,M). - * - * ISEED (input) const int - * On entry, ISEED specifies the seed number to generate the - * matrix A. ISEED must be at least zero. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - int iadd[2], ia1[2], ic1[2], iran1[2], - jseed[2], mult[2]; - int i, incA = LDA - M, j; -/* .. - * .. Executable Statements .. - */ - if( ( M <= 0 ) || ( N <= 0 ) ) return; -/* - * Initialize the random sequence - */ - mult [0] = HPL_MULT0; mult [1] = HPL_MULT1; - iadd [0] = HPL_IADD0; iadd [1] = HPL_IADD1; - jseed[0] = ISEED; jseed[1] = 0; - - HPL_xjumpm( 1, mult, iadd, jseed, iran1, ia1, ic1 ); - HPL_setran( 0, iran1 ); HPL_setran( 1, ia1 ); HPL_setran( 2, ic1 ); -/* - * Generate an M by N matrix - */ - for( j = 0; j < N; A += incA, j++ ) - for( i = 0; i < M; A++, i++ ) *A = HPL_rand(); -/* - * End of HPL_dmatgen - */ -} diff --git a/hpl/testing/matgen/HPL_jumpit.c b/hpl/testing/matgen/HPL_jumpit.c deleted file mode 100644 index 6ee2c5a73b9d1c0119d59b1606c4f5607d5f48c3..0000000000000000000000000000000000000000 --- a/hpl/testing/matgen/HPL_jumpit.c +++ /dev/null @@ -1,114 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. 
Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
- * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -void HPL_jumpit -( - int * MULT, - int * IADD, - int * IRANN, - int * IRANM -) -#else -void HPL_jumpit -( MULT, IADD, IRANN, IRANM ) - int * MULT; - int * IADD; - int * IRANN; - int * IRANM; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_jumpit jumps in the random sequence from the number X(n) encoded - * in IRANN to the number X(m) encoded in IRANM using the constants A - * and C encoded in MULT and IADD: X(m) = A * X(n) + C. The constants A - * and C obviously depend on m and n, see the function HPL_xjumpm in - * order to initialize them. - * - * Arguments - * ========= - * - * MULT (local input) int * - * On entry, MULT is an array of dimension 2, that contains the - * 16-lower and 15-higher bits of the constant A. - * - * IADD (local input) int * - * On entry, IADD is an array of dimension 2, that contains the - * 16-lower and 15-higher bits of the constant C. - * - * IRANN (local input) int * - * On entry, IRANN is an array of dimension 2, that contains - * the 16-lower and 15-higher bits of the encoding of X(n). - * - * IRANM (local output) int * - * On entry, IRANM is an array of dimension 2. On exit, this - * array contains respectively the 16-lower and 15-higher bits - * of the encoding of X(m). - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - int j[2]; -/* .. - * .. Executable Statements .. 
- */ - HPL_lmul( IRANN, MULT, j ); /* j = IRANN * MULT; */ - HPL_ladd( j, IADD, IRANM ); /* IRANM = j + IADD; */ - HPL_setran( 0, IRANM ); /* irand = IRANM */ -/* - * End of HPL_jumpit - */ -} diff --git a/hpl/testing/matgen/HPL_ladd.c b/hpl/testing/matgen/HPL_ladd.c deleted file mode 100644 index e0f86b075d8eb78ec17e0725f70c7e96d18d115f..0000000000000000000000000000000000000000 --- a/hpl/testing/matgen/HPL_ladd.c +++ /dev/null @@ -1,126 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. 
- * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -void HPL_ladd -( - int * J, - int * K, - int * I -) -#else -void HPL_ladd -( J, K, I ) - int * J; - int * K; - int * I; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_ladd adds without carry two long positive integers K and J and - * puts the result into I. The long integers I, J, K are encoded on 64 - * bits using an array of 2 integers. The 32-lower bits are stored in - * the first entry of each array, the 32-higher bits in the second - * entry. - * - * Arguments - * ========= - * - * J (local input) int * - * On entry, J is an integer array of dimension 2 containing the - * encoded long integer J. - * - * K (local input) int * - * On entry, K is an integer array of dimension 2 containing the - * encoded long integer K. - * - * I (local output) int * - * On entry, I is an integer array of dimension 2. On exit, this - * array contains the encoded long integer result. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. 
- */ - unsigned int itmp0, itmp1; - unsigned int ktmp0 = K[0] & 65535, ktmp1 = (unsigned)K[0] >> 16; - unsigned int ktmp2 = K[1] & 65535, ktmp3 = (unsigned)K[1] >> 16; - unsigned int jtmp0 = J[0] & 65535, jtmp1 = (unsigned)J[0] >> 16; - unsigned int jtmp2 = J[1] & 65535, jtmp3 = (unsigned)J[1] >> 16; - -/* .. - * .. Executable Statements .. - */ -/* - * K[1] K[0] K I[0] = (K[0]+J[0]) % 2^32 - * XXXX XXXX carry = (K[0]+J[0]) / 2^32 - * - * + J[1] J[0] J I[1] = K[1] + J[1] + carry - * XXXX XXXX I[1] = I[1] % 2^32 - * ------------- - * I[1] I[0] - * 0XXX XXXX I - */ - itmp0 = ktmp0 + jtmp0; - itmp1 = itmp0 >> 16; I[0] = itmp0 - (itmp1 << 16 ); - itmp1 += ktmp1 + jtmp1; I[0] |= (itmp1 & 65535) << 16; - itmp0 = (itmp1 >> 16) + ktmp2 + jtmp2; - I[1] = itmp0 - ((itmp0 >> 16 ) << 16); - itmp1 = (itmp0 >> 16) + ktmp3 + jtmp3; - I[1] |= (itmp1 & 65535) << 16; -/* - * End of HPL_ladd - */ -} diff --git a/hpl/testing/matgen/HPL_lmul.c b/hpl/testing/matgen/HPL_lmul.c deleted file mode 100644 index 879a9d4c141975ef41cddd0e83331d4e60edbb6e..0000000000000000000000000000000000000000 --- a/hpl/testing/matgen/HPL_lmul.c +++ /dev/null @@ -1,131 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. 
All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -void HPL_lmul -( - int * K, - int * J, - int * I -) -#else -void HPL_lmul -( K, J, I ) - int * K; - int * J; - int * I; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_lmul multiplies without carry two long positive integers K and J - * and puts the result into I. The long integers I, J, K are encoded on - * 64 bits using an array of 2 integers. The 32-lower bits are stored in - * the first entry of each array, the 32-higher bits in the second entry - * of each array. For efficiency purposes, the intrisic modulo function - * is inlined. 
- * - * Arguments - * ========= - * - * K (local input) int * - * On entry, K is an integer array of dimension 2 containing the - * encoded long integer K. - * - * J (local input) int * - * On entry, J is an integer array of dimension 2 containing the - * encoded long integer J. - * - * I (local output) int * - * On entry, I is an integer array of dimension 2. On exit, this - * array contains the encoded long integer result. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - int r, c; - unsigned int kk[4], jj[4], res[5]; -/* .. - * .. Executable Statements .. - */ -/* - * Addition is done with 16 bits at a time. Multiplying two 16-bit - * integers yields a 32-bit result. The lower 16-bits of the result - * are kept in I, and the higher 16-bits are carried over to the - * next multiplication. - */ - for (c = 0; c < 2; ++c) { - kk[2*c] = K[c] & 65535; - kk[2*c+1] = ((unsigned)K[c] >> 16) & 65535; - jj[2*c] = J[c] & 65535; - jj[2*c+1] = ((unsigned)J[c] >> 16) & 65535; - } - - res[0] = 0; - for (c = 0; c < 4; ++c) { - res[c+1] = (res[c] >> 16) & 65535; - res[c] &= 65535; - for (r = 0; r < c+1; ++r) { - res[c] = kk[r] * jj[c-r] + (res[c] & 65535); - res[c+1] += (res[c] >> 16) & 65535; - } - } - - for (c = 0; c < 2; ++c) - I[c] = (int)(((res[2*c+1] & 65535) << 16) | (res[2*c] & 65535)); -/* - * End of HPL_lmul - */ -} diff --git a/hpl/testing/matgen/HPL_rand.c b/hpl/testing/matgen/HPL_rand.c deleted file mode 100644 index e474fd9960fd6bc2d215ce711bca4b953524be64..0000000000000000000000000000000000000000 --- a/hpl/testing/matgen/HPL_rand.c +++ /dev/null @@ -1,94 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. 
Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -double HPL_rand( void ) -#else -double HPL_rand() -#endif -{ -/* - * Purpose - * ======= - * - * HPL_rand generates the next number in the random sequence. This - * function ensures that this number lies in the interval (-0.5, 0.5]. - * - * The static array irand contains the information (2 integers) required - * to generate the next number in the sequence X(n). This number is - * computed as X(n) = (2^32 * irand[1] + irand[0]) / d - 0.5, where the - * constant d is the largest 64 bit positive unsigned integer. The array - * irand is then updated for the generation of the next number X(n+1) - * in the random sequence as follows X(n+1) = a * X(n) + c. The - * constants a and c should have been preliminarily stored in the arrays - * ias and ics as 2 pairs of integers. The initialization of ias, ics - * and irand is performed by the function HPL_setran. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - int j[2]; -/* .. - * .. Executable Statements .. 
- */ - HPL_setran( 3, j ); -/* - * return number between -0.5 and 0.5 - */ - return( HPL_HALF - - (((j[0] & 65535) + ((unsigned)j[0] >> 16) * HPL_POW16) / HPL_DIVFAC * HPL_HALF + - (j[1] & 65535) + ((unsigned)j[1] >> 16) * HPL_POW16) / HPL_DIVFAC * HPL_HALF ); -/* - * End of HPL_rand - */ -} diff --git a/hpl/testing/matgen/HPL_setran.c b/hpl/testing/matgen/HPL_setran.c deleted file mode 100644 index 0172340b822538a4d1a2d74040956d26a1ee0f54..0000000000000000000000000000000000000000 --- a/hpl/testing/matgen/HPL_setran.c +++ /dev/null @@ -1,115 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. 
- * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" -/* - * --------------------------------------------------------------------- - * Static variables - * --------------------------------------------------------------------- - */ -static int ias[2], ics[2], irand[2]; - -#ifdef STDC_HEADERS -void HPL_setran -( - const int OPTION, - int * IRAN -) -#else -void HPL_setran -( OPTION, IRAN ) - const int OPTION; - int * IRAN; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_setran initializes the random generator with the encoding of the - * first number X(0) in the sequence, and the constants a and c used to - * compute the next element in the sequence: X(n+1) = a*X(n) + c. X(0), - * a and c are stored in the static variables irand, ias and ics. When - * OPTION is 0 (resp. 1 and 2), irand (resp. ia and ic) is set to the - * values of the input array IRAN. When OPTION is 3, IRAN is set to the - * current value of irand, and irand is then incremented. 
- * - * Arguments - * ========= - * - * OPTION (local input) const int - * On entry, OPTION is an integer that specifies the operations - * to be performed on the random generator as specified above. - * - * IRAN (local input/output) int * - * On entry, IRAN is an array of dimension 2, that contains the - * 16-lower and 15-higher bits of a random number. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - int j[2]; -/* .. - * .. Executable Statements .. - */ - if( OPTION == 3 ) - { /* return current value */ - IRAN[0] = irand[0]; IRAN[1] = irand[1]; - HPL_lmul( irand, ias, j ); /* j = irand * ias; */ - HPL_ladd( j, ics, irand ); /* irand = j + ics; */ - } - else if( OPTION == 0 ) { irand[0] = IRAN[0]; irand[1] = IRAN[1]; } - else if( OPTION == 1 ) { ias [0] = IRAN[0]; ias [1] = IRAN[1]; } - else if( OPTION == 2 ) { ics [0] = IRAN[0]; ics [1] = IRAN[1]; } -/* - * End of HPL_setran - */ -} diff --git a/hpl/testing/matgen/HPL_xjumpm.c b/hpl/testing/matgen/HPL_xjumpm.c deleted file mode 100644 index f4e760d069c6fcf863e6a9bf344b4b2c31619629..0000000000000000000000000000000000000000 --- a/hpl/testing/matgen/HPL_xjumpm.c +++ /dev/null @@ -1,158 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. 
Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
- * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -void HPL_xjumpm -( - const int JUMPM, - int * MULT, - int * IADD, - int * IRANN, - int * IRANM, - int * IAM, - int * ICM -) -#else -void HPL_xjumpm -( JUMPM, MULT, IADD, IRANN, IRANM, IAM, ICM ) - const int JUMPM; - int * MULT; - int * IADD; - int * IRANN; - int * IRANM; - int * IAM; - int * ICM; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_xjumpm computes the constants A and C to jump JUMPM numbers in - * the random sequence: X(n+JUMPM) = A*X(n)+C. The constants encoded in - * MULT and IADD specify how to jump from one entry in the sequence to - * the next. - * - * Arguments - * ========= - * - * JUMPM (local input) const int - * On entry, JUMPM specifies the number of entries in the - * sequence to jump over. When JUMPM is less or equal than zero, - * A and C are not computed, IRANM is set to IRANN corresponding - * to a jump of size zero. - * - * MULT (local input) int * - * On entry, MULT is an array of dimension 2, that contains the - * 16-lower and 15-higher bits of the constant a to jump from - * X(n) to X(n+1) = a*X(n) + c in the random sequence. - * - * IADD (local input) int * - * On entry, IADD is an array of dimension 2, that contains the - * 16-lower and 15-higher bits of the constant c to jump from - * X(n) to X(n+1) = a*X(n) + c in the random sequence. - * - * IRANN (local input) int * - * On entry, IRANN is an array of dimension 2. that contains the - * 16-lower and 15-higher bits of the encoding of X(n). - * - * IRANM (local output) int * - * On entry, IRANM is an array of dimension 2. On exit, this - * array contains respectively the 16-lower and 15-higher bits - * of the encoding of X(n+JUMPM). - * - * IAM (local output) int * - * On entry, IAM is an array of dimension 2. 
On exit, when JUMPM - * is greater than zero, this array contains the encoded - * constant A to jump from X(n) to X(n+JUMPM) in the random - * sequence. IAM(0:1) contains respectively the 16-lower and - * 15-higher bits of this constant A. When JUMPM is less or - * equal than zero, this array is not referenced. - * - * ICM (local output) int * - * On entry, ICM is an array of dimension 2. On exit, when JUMPM - * is greater than zero, this array contains the encoded - * constant C to jump from X(n) to X(n+JUMPM) in the random - * sequence. ICM(0:1) contains respectively the 16-lower and - * 15-higher bits of this constant C. When JUMPM is less or - * equal than zero, this array is not referenced. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - int j[2], k; -/* .. - * .. Executable Statements .. - */ - if( JUMPM > 0 ) - { - IAM[0] = MULT[0]; IAM[1] = MULT[1]; /* IAM = MULT; */ - ICM[0] = IADD[0]; ICM[1] = IADD[1]; /* ICM = IADD; */ - for( k = 1; k <= JUMPM-1; k++ ) - { - HPL_lmul( IAM, MULT, j ); /* j = IAM * MULT; */ - IAM[0] = j[0]; IAM[1] = j[1]; /* IAM = j; */ - HPL_lmul( ICM, MULT, j ); /* j = ICM * MULT; */ - HPL_ladd( IADD, j, ICM ); /* ICM = IADD + j; */ - } - HPL_lmul( IRANN, IAM, j ); /* j = IRANN * IAM; */ - HPL_ladd( j, ICM, IRANM ); /* IRANM = j + ICM; */ - } - else - { /* IRANM = IRANN */ - IRANM[0] = IRANN[0]; IRANM[1] = IRANN[1]; - } -/* - * End of HPL_xjumpm - */ -} diff --git a/hpl/testing/matgen/intel64/Make.inc b/hpl/testing/matgen/intel64/Make.inc deleted file mode 120000 index c1d8167d071e028cbeff544106b9f386bbdcac8a..0000000000000000000000000000000000000000 --- a/hpl/testing/matgen/intel64/Make.inc +++ /dev/null @@ -1 +0,0 @@ -/home/valeriuc/work/PRACE/CodeVault/src/1_dense/hpl/Make.intel64 \ No newline at end of file diff --git a/hpl/testing/matgen/intel64/Makefile b/hpl/testing/matgen/intel64/Makefile deleted file mode 100644 index 
0aaac9b3409800f89ae473f651239d923ac621d4..0000000000000000000000000000000000000000 --- a/hpl/testing/matgen/intel64/Makefile +++ /dev/null @@ -1,95 +0,0 @@ -# -# -- High Performance Computing Linpack Benchmark (HPL) -# HPL - 2.1 - October 26, 2012 -# Antoine P. Petitet -# University of Tennessee, Knoxville -# Innovative Computing Laboratory -# (C) Copyright 2000-2008 All Rights Reserved -# -# -- Copyright notice and Licensing terms: -# -# Redistribution and use in source and binary forms, with or without -# modification, are permitted provided that the following conditions -# are met: -# -# 1. Redistributions of source code must retain the above copyright -# notice, this list of conditions and the following disclaimer. -# -# 2. Redistributions in binary form must reproduce the above copyright -# notice, this list of conditions, and the following disclaimer in the -# documentation and/or other materials provided with the distribution. -# -# 3. All advertising materials mentioning features or use of this -# software must display the following acknowledgement: -# This product includes software developed at the University of -# Tennessee, Knoxville, Innovative Computing Laboratory. -# -# 4. The name of the University, the name of the Laboratory, or the -# names of its contributors may not be used to endorse or promote -# products derived from this software without specific written -# permission. -# -# -- Disclaimer: -# -# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS -# ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR -# A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY -# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, -# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT -# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, -# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY -# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT -# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. -# ###################################################################### -# -include Make.inc -# -# ###################################################################### -# -INCdep = \ - $(INCdir)/hpl_misc.h $(INCdir)/hpl_blas.h $(INCdir)/hpl_auxil.h \ - $(INCdir)/hpl_matgen.h -# -## Object files ######################################################## -# -HPL_matobj = \ - HPL_dmatgen.o HPL_ladd.o HPL_lmul.o \ - HPL_xjumpm.o HPL_jumpit.o HPL_rand.o \ - HPL_setran.o -# -## Targets ############################################################# -# -all : lib -# -lib : lib.grd -# -lib.grd : $(HPL_matobj) - $(ARCHIVER) $(ARFLAGS) $(HPLlib) $(HPL_matobj) - $(RANLIB) $(HPLlib) - $(TOUCH) lib.grd -# -# ###################################################################### -# -HPL_dmatgen.o : ../HPL_dmatgen.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_dmatgen.c -HPL_ladd.o : ../HPL_ladd.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_ladd.c -HPL_lmul.o : ../HPL_lmul.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_lmul.c -HPL_xjumpm.o : ../HPL_xjumpm.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_xjumpm.c -HPL_jumpit.o : ../HPL_jumpit.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_jumpit.c -HPL_rand.o : ../HPL_rand.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_rand.c -HPL_setran.o : ../HPL_setran.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_setran.c -# -# ###################################################################### -# -clean : - $(RM) *.o 
*.grd -# -# ###################################################################### diff --git a/hpl/testing/matgen/intel64/lib.grd b/hpl/testing/matgen/intel64/lib.grd deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/hpl/testing/pmatgen/HPL_pdmatgen.c b/hpl/testing/pmatgen/HPL_pdmatgen.c deleted file mode 100644 index 30c31c97d8da44ea3e69d16ed3cf2715fe84b455..0000000000000000000000000000000000000000 --- a/hpl/testing/pmatgen/HPL_pdmatgen.c +++ /dev/null @@ -1,198 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. 
- * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -void HPL_pdmatgen -( - const HPL_T_grid * GRID, - const int M, - const int N, - const int NB, - double * A, - const int LDA, - const int ISEED -) -#else -void HPL_pdmatgen -( GRID, M, N, NB, A, LDA, ISEED ) - const HPL_T_grid * GRID; - const int M; - const int N; - const int NB; - double * A; - const int LDA; - const int ISEED; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_pdmatgen generates (or regenerates) a parallel random matrix A. - * - * The pseudo-random generator uses the linear congruential algorithm: - * X(n+1) = (a * X(n) + c) mod m as described in the Art of Computer - * Programming, Knuth 1973, Vol. 2. - * - * Arguments - * ========= - * - * GRID (local input) const HPL_T_grid * - * On entry, GRID points to the data structure containing the - * process grid information. - * - * M (global input) const int - * On entry, M specifies the number of rows of the matrix A. - * M must be at least zero. - * - * N (global input) const int - * On entry, N specifies the number of columns of the matrix A. 
- * N must be at least zero. - * - * NB (global input) const int - * On entry, NB specifies the blocking factor used to partition - * and distribute the matrix A. NB must be larger than one. - * - * A (local output) double * - * On entry, A points to an array of dimension (LDA,LocQ(N)). - * On exit, this array contains the coefficients of the randomly - * generated matrix. - * - * LDA (local input) const int - * On entry, LDA specifies the leading dimension of the array A. - * LDA must be at least max(1,LocP(M)). - * - * ISEED (global input) const int - * On entry, ISEED specifies the seed number to generate the - * matrix A. ISEED must be at least zero. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - int iadd [2], ia1 [2], ia2 [2], ia3 [2], - ia4 [2], ia5 [2], ib1 [2], ib2 [2], - ib3 [2], ic1 [2], ic2 [2], ic3 [2], - ic4 [2], ic5 [2], iran1[2], iran2[2], - iran3[2], iran4[2], itmp1[2], itmp2[2], - itmp3[2], jseed[2], mult [2]; - int ib, iblk, ik, jb, jblk, jk, jump1, jump2, - jump3, jump4, jump5, jump6, jump7, lmb, - lnb, mblks, mp, mycol, myrow, nblks, - npcol, nprow, nq; -/* .. - * .. Executable Statements .. 
- */ - (void) HPL_grid_info( GRID, &nprow, &npcol, &myrow, &mycol ); - - mult [0] = HPL_MULT0; mult [1] = HPL_MULT1; - iadd [0] = HPL_IADD0; iadd [1] = HPL_IADD1; - jseed[0] = ISEED; jseed[1] = 0; -/* - * Generate an M by N matrix starting in process (0,0) - */ - Mnumroc( mp, M, NB, NB, myrow, 0, nprow ); - Mnumroc( nq, N, NB, NB, mycol, 0, npcol ); - - if( ( mp <= 0 ) || ( nq <= 0 ) ) return; -/* - * Local number of blocks and size of the last one - */ - mblks = ( mp + NB - 1 ) / NB; lmb = mp - ( ( mp - 1 ) / NB ) * NB; - nblks = ( nq + NB - 1 ) / NB; lnb = nq - ( ( nq - 1 ) / NB ) * NB; -/* - * Compute multiplier/adder for various jumps in random sequence - */ - jump1 = 1; jump2 = nprow * NB; jump3 = M; jump4 = npcol * NB; - jump5 = NB; jump6 = mycol; jump7 = myrow * NB; - - HPL_xjumpm( jump1, mult, iadd, jseed, iran1, ia1, ic1 ); - HPL_xjumpm( jump2, mult, iadd, iran1, itmp1, ia2, ic2 ); - HPL_xjumpm( jump3, mult, iadd, iran1, itmp1, ia3, ic3 ); - HPL_xjumpm( jump4, ia3, ic3, iran1, itmp1, ia4, ic4 ); - HPL_xjumpm( jump5, ia3, ic3, iran1, itmp1, ia5, ic5 ); - HPL_xjumpm( jump6, ia5, ic5, iran1, itmp3, itmp1, itmp2 ); - HPL_xjumpm( jump7, mult, iadd, itmp3, iran1, itmp1, itmp2 ); - HPL_setran( 0, iran1 ); HPL_setran( 1, ia1 ); HPL_setran( 2, ic1 ); -/* - * Save value of first number in sequence - */ - ib1[0] = iran1[0]; ib1[1] = iran1[1]; - ib2[0] = iran1[0]; ib2[1] = iran1[1]; - ib3[0] = iran1[0]; ib3[1] = iran1[1]; - - for( jblk = 0; jblk < nblks; jblk++ ) - { - jb = ( jblk == nblks - 1 ? lnb : NB ); - for( jk = 0; jk < jb; jk++ ) - { - for( iblk = 0; iblk < mblks; iblk++ ) - { - ib = ( iblk == mblks - 1 ? 
lmb : NB ); - for( ik = 0; ik < ib; A++, ik++ ) *A = HPL_rand(); - HPL_jumpit( ia2, ic2, ib1, iran2 ); - ib1[0] = iran2[0]; ib1[1] = iran2[1]; - } - A += LDA - mp; - HPL_jumpit( ia3, ic3, ib2, iran3 ); - ib1[0] = iran3[0]; ib1[1] = iran3[1]; - ib2[0] = iran3[0]; ib2[1] = iran3[1]; - } - HPL_jumpit( ia4, ic4, ib3, iran4 ); - ib1[0] = iran4[0]; ib1[1] = iran4[1]; - ib2[0] = iran4[0]; ib2[1] = iran4[1]; - ib3[0] = iran4[0]; ib3[1] = iran4[1]; - } -/* - * End of HPL_pdmatgen - */ -} diff --git a/hpl/testing/pmatgen/intel64/Make.inc b/hpl/testing/pmatgen/intel64/Make.inc deleted file mode 120000 index c1d8167d071e028cbeff544106b9f386bbdcac8a..0000000000000000000000000000000000000000 --- a/hpl/testing/pmatgen/intel64/Make.inc +++ /dev/null @@ -1 +0,0 @@ -/home/valeriuc/work/PRACE/CodeVault/src/1_dense/hpl/Make.intel64 \ No newline at end of file diff --git a/hpl/testing/pmatgen/intel64/Makefile b/hpl/testing/pmatgen/intel64/Makefile deleted file mode 100644 index 75f0c07cbe637eaaa2d98fa29556c1228501a3dc..0000000000000000000000000000000000000000 --- a/hpl/testing/pmatgen/intel64/Makefile +++ /dev/null @@ -1,81 +0,0 @@ -# -# -- High Performance Computing Linpack Benchmark (HPL) -# HPL - 2.1 - October 26, 2012 -# Antoine P. Petitet -# University of Tennessee, Knoxville -# Innovative Computing Laboratory -# (C) Copyright 2000-2008 All Rights Reserved -# -# -- Copyright notice and Licensing terms: -# -# Redistribution and use in source and binary forms, with or without -# modification, are permitted provided that the following conditions -# are met: -# -# 1. Redistributions of source code must retain the above copyright -# notice, this list of conditions and the following disclaimer. -# -# 2. Redistributions in binary form must reproduce the above copyright -# notice, this list of conditions, and the following disclaimer in the -# documentation and/or other materials provided with the distribution. -# -# 3. 
All advertising materials mentioning features or use of this -# software must display the following acknowledgement: -# This product includes software developed at the University of -# Tennessee, Knoxville, Innovative Computing Laboratory. -# -# 4. The name of the University, the name of the Laboratory, or the -# names of its contributors may not be used to endorse or promote -# products derived from this software without specific written -# permission. -# -# -- Disclaimer: -# -# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS -# ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR -# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY -# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, -# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT -# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, -# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY -# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT -# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
-# ###################################################################### -# -include Make.inc -# -# ###################################################################### -# -INCdep = \ - $(INCdir)/hpl_misc.h $(INCdir)/hpl_matgen.h $(INCdir)/hpl_pmisc.h \ - $(INCdir)/hpl_pauxil.h $(INCdir)/hpl_pmatgen.h -# -## Object files ######################################################## -# -HPL_pmaobj = \ - HPL_pdmatgen.o -# -## Targets ############################################################# -# -all : lib -# -lib : lib.grd -# -lib.grd : $(HPL_pmaobj) - $(ARCHIVER) $(ARFLAGS) $(HPLlib) $(HPL_pmaobj) - $(RANLIB) $(HPLlib) - $(TOUCH) lib.grd -# -# ###################################################################### -# -HPL_pdmatgen.o : ../HPL_pdmatgen.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_pdmatgen.c -# -# ###################################################################### -# -clean : - $(RM) *.o *.grd -# -# ###################################################################### diff --git a/hpl/testing/pmatgen/intel64/lib.grd b/hpl/testing/pmatgen/intel64/lib.grd deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/hpl/testing/ptest/HPL.dat b/hpl/testing/ptest/HPL.dat deleted file mode 100644 index 47aee883e1ff64af3a3069fdf7d97631c6242ba7..0000000000000000000000000000000000000000 --- a/hpl/testing/ptest/HPL.dat +++ /dev/null @@ -1,31 +0,0 @@ -HPLinpack benchmark input file -Innovative Computing Laboratory, University of Tennessee -HPL.out output file name (if any) -6 device out (6=stdout,7=stderr,file) -4 # of problems sizes (N) -29 30 34 35 Ns -4 # of NBs -1 2 3 4 NBs -0 PMAP process mapping (0=Row-,1=Column-major) -3 # of process grids (P x Q) -2 1 4 Ps -2 4 1 Qs -16.0 threshold -3 # of panel fact -0 1 2 PFACTs (0=left, 1=Crout, 2=Right) -2 # of recursive stopping criterium -2 4 NBMINs (>= 1) -1 # of panels in recursion -2 NDIVs -3 # of recursive panel fact. 
-0 1 2 RFACTs (0=left, 1=Crout, 2=Right) -1 # of broadcast -0 BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM) -1 # of lookahead depth -0 DEPTHs (>=0) -2 SWAP (0=bin-exch,1=long,2=mix) -64 swapping threshold -0 L1 in (0=transposed,1=no-transposed) form -0 U in (0=transposed,1=no-transposed) form -1 Equilibration (0=no,1=yes) -8 memory alignment in double (> 0) diff --git a/hpl/testing/ptest/HPL_pddriver.c b/hpl/testing/ptest/HPL_pddriver.c deleted file mode 100644 index 22a56a1d4ca4f135df7a187897f417f474e78a5e..0000000000000000000000000000000000000000 --- a/hpl/testing/ptest/HPL_pddriver.c +++ /dev/null @@ -1,293 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. 
- * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -int main -( - int ARGC, - char * * ARGV -) -#else -int main( ARGC, ARGV ) -/* - * .. Scalar Arguments .. - */ - int ARGC; -/* - * .. Array Arguments .. - */ - char * * ARGV; -#endif -{ -/* - * Purpose - * ======= - * - * main is the main driver program for testing the HPL routines. - * This program is driven by a short data file named "HPL.dat". - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. 
- */ - int nval [HPL_MAX_PARAM], - nbval [HPL_MAX_PARAM], - pval [HPL_MAX_PARAM], - qval [HPL_MAX_PARAM], - nbmval[HPL_MAX_PARAM], - ndvval[HPL_MAX_PARAM], - ndhval[HPL_MAX_PARAM]; - - HPL_T_FACT pfaval[HPL_MAX_PARAM], - rfaval[HPL_MAX_PARAM]; - - HPL_T_TOP topval[HPL_MAX_PARAM]; - - HPL_T_grid grid; - HPL_T_palg algo; - HPL_T_test test; - int L1notran, Unotran, align, equil, in, inb, - inbm, indh, indv, ipfa, ipq, irfa, itop, - mycol, myrow, ns, nbs, nbms, ndhs, ndvs, - npcol, npfs, npqs, nprow, nrfs, ntps, - rank, size, tswap; - HPL_T_ORDER pmapping; - HPL_T_FACT rpfa; - HPL_T_SWAP fswap; -/* .. - * .. Executable Statements .. - */ - MPI_Init( &ARGC, &ARGV ); -#ifdef HPL_CALL_VSIPL - vsip_init((void*)0); -#endif - MPI_Comm_rank( MPI_COMM_WORLD, &rank ); - MPI_Comm_size( MPI_COMM_WORLD, &size ); -/* - * Read and check validity of test parameters from input file - * - * HPL Version 1.0, Linpack benchmark input file - * Your message here - * HPL.out output file name (if any) - * 6 device out (6=stdout,7=stderr,file) - * 4 # of problems sizes (N) - * 29 30 34 35 Ns - * 4 # of NBs - * 1 2 3 4 NBs - * 0 PMAP process mapping (0=Row-,1=Column-major) - * 3 # of process grids (P x Q) - * 2 1 4 Ps - * 2 4 1 Qs - * 16.0 threshold - * 3 # of panel fact - * 0 1 2 PFACTs (0=left, 1=Crout, 2=Right) - * 2 # of recursive stopping criterium - * 2 4 NBMINs (>= 1) - * 1 # of panels in recursion - * 2 NDIVs - * 3 # of recursive panel fact. 
- * 0 1 2 RFACTs (0=left, 1=Crout, 2=Right) - * 1 # of broadcast - * 0 BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM) - * 1 # of lookahead depth - * 0 DEPTHs (>=0) - * 2 SWAP (0=bin-exch,1=long,2=mix) - * 4 swapping threshold - * 0 L1 in (0=transposed,1=no-transposed) form - * 0 U in (0=transposed,1=no-transposed) form - * 1 Equilibration (0=no,1=yes) - * 8 memory alignment in double (> 0) - */ - HPL_pdinfo( &test, &ns, nval, &nbs, nbval, &pmapping, &npqs, pval, qval, - &npfs, pfaval, &nbms, nbmval, &ndvs, ndvval, &nrfs, rfaval, - &ntps, topval, &ndhs, ndhval, &fswap, &tswap, &L1notran, - &Unotran, &equil, &align ); -/* - * Loop over different process grids - Define process grid. Go to bottom - * of process grid loop if this case does not use my process. - */ - for( ipq = 0; ipq < npqs; ipq++ ) - { - (void) HPL_grid_init( MPI_COMM_WORLD, pmapping, pval[ipq], qval[ipq], - &grid ); - (void) HPL_grid_info( &grid, &nprow, &npcol, &myrow, &mycol ); - - if( ( myrow < 0 ) || ( myrow >= nprow ) || - ( mycol < 0 ) || ( mycol >= npcol ) ) goto label_end_of_npqs; - - for( in = 0; in < ns; in++ ) - { /* Loop over various problem sizes */ - for( inb = 0; inb < nbs; inb++ ) - { /* Loop over various blocking factors */ - for( indh = 0; indh < ndhs; indh++ ) - { /* Loop over various lookahead depths */ - for( itop = 0; itop < ntps; itop++ ) - { /* Loop over various broadcast topologies */ - for( irfa = 0; irfa < nrfs; irfa++ ) - { /* Loop over various recursive factorizations */ - for( ipfa = 0; ipfa < npfs; ipfa++ ) - { /* Loop over various panel factorizations */ - for( inbm = 0; inbm < nbms; inbm++ ) - { /* Loop over various recursive stopping criteria */ - for( indv = 0; indv < ndvs; indv++ ) - { /* Loop over various # of panels in recursion */ -/* - * Set up the algorithm parameters - */ - algo.btopo = topval[itop]; algo.depth = ndhval[indh]; - algo.nbmin = nbmval[inbm]; algo.nbdiv = ndvval[indv]; - - algo.pfact = rpfa = pfaval[ipfa]; - - if( L1notran != 0 ) - { - if( rpfa == 
HPL_LEFT_LOOKING ) algo.pffun = HPL_pdpanllN; - else if( rpfa == HPL_CROUT ) algo.pffun = HPL_pdpancrN; - else algo.pffun = HPL_pdpanrlN; - - algo.rfact = rpfa = rfaval[irfa]; - if( rpfa == HPL_LEFT_LOOKING ) algo.rffun = HPL_pdrpanllN; - else if( rpfa == HPL_CROUT ) algo.rffun = HPL_pdrpancrN; - else algo.rffun = HPL_pdrpanrlN; - - if( Unotran != 0 ) algo.upfun = HPL_pdupdateNN; - else algo.upfun = HPL_pdupdateNT; - } - else - { - if( rpfa == HPL_LEFT_LOOKING ) algo.pffun = HPL_pdpanllT; - else if( rpfa == HPL_CROUT ) algo.pffun = HPL_pdpancrT; - else algo.pffun = HPL_pdpanrlT; - - algo.rfact = rpfa = rfaval[irfa]; - if( rpfa == HPL_LEFT_LOOKING ) algo.rffun = HPL_pdrpanllT; - else if( rpfa == HPL_CROUT ) algo.rffun = HPL_pdrpancrT; - else algo.rffun = HPL_pdrpanrlT; - - if( Unotran != 0 ) algo.upfun = HPL_pdupdateTN; - else algo.upfun = HPL_pdupdateTT; - } - - algo.fswap = fswap; algo.fsthr = tswap; - algo.equil = equil; algo.align = align; - - HPL_pdtest( &test, &grid, &algo, nval[in], nbval[inb] ); - - } - } - } - } - } - } - } - } - (void) HPL_grid_exit( &grid ); -label_end_of_npqs: ; - } -/* - * Print ending messages, close output file, exit. 
- */ - if( rank == 0 ) - { - test.ktest = test.kpass + test.kfail + test.kskip; -#ifndef HPL_DETAILED_TIMING - HPL_fprintf( test.outfp, "%s%s\n", - "========================================", - "========================================" ); -#else - if( test.thrsh > HPL_rzero ) - HPL_fprintf( test.outfp, "%s%s\n", - "========================================", - "========================================" ); -#endif - - HPL_fprintf( test.outfp, "\n%s %6d %s\n", "Finished", test.ktest, - "tests with the following results:" ); - if( test.thrsh > HPL_rzero ) - { - HPL_fprintf( test.outfp, " %6d %s\n", test.kpass, - "tests completed and passed residual checks," ); - HPL_fprintf( test.outfp, " %6d %s\n", test.kfail, - "tests completed and failed residual checks," ); - HPL_fprintf( test.outfp, " %6d %s\n", test.kskip, - "tests skipped because of illegal input values." ); - } - else - { - HPL_fprintf( test.outfp, " %6d %s\n", test.kpass, - "tests completed without checking," ); - HPL_fprintf( test.outfp, " %6d %s\n", test.kskip, - "tests skipped because of illegal input values." 
); - } - - HPL_fprintf( test.outfp, "%s%s\n", - "----------------------------------------", - "----------------------------------------" ); - HPL_fprintf( test.outfp, "\nEnd of Tests.\n" ); - HPL_fprintf( test.outfp, "%s%s\n", - "========================================", - "========================================" ); - - if( ( test.outfp != stdout ) && ( test.outfp != stderr ) ) - (void) fclose( test.outfp ); - } -#ifdef HPL_CALL_VSIPL - vsip_finalize((void*)0); -#endif - MPI_Finalize(); - exit( 0 ); - - return( 0 ); -/* - * End of main - */ -} diff --git a/hpl/testing/ptest/HPL_pdinfo.c b/hpl/testing/ptest/HPL_pdinfo.c deleted file mode 100644 index e24530e6df73fbf8e7e553cbbeebfd3bd9802ffb..0000000000000000000000000000000000000000 --- a/hpl/testing/ptest/HPL_pdinfo.c +++ /dev/null @@ -1,1126 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. 
The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
- * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -void HPL_pdinfo -( - HPL_T_test * TEST, - int * NS, - int * N, - int * NBS, - int * NB, - HPL_T_ORDER * PMAPPIN, - int * NPQS, - int * P, - int * Q, - int * NPFS, - HPL_T_FACT * PF, - int * NBMS, - int * NBM, - int * NDVS, - int * NDV, - int * NRFS, - HPL_T_FACT * RF, - int * NTPS, - HPL_T_TOP * TP, - int * NDHS, - int * DH, - HPL_T_SWAP * FSWAP, - int * TSWAP, - int * L1NOTRAN, - int * UNOTRAN, - int * EQUIL, - int * ALIGN -) -#else -void HPL_pdinfo -( TEST, NS, N, NBS, NB, PMAPPIN, NPQS, P, Q, NPFS, PF, NBMS, NBM, NDVS, NDV, NRFS, RF, NTPS, TP, NDHS, DH, FSWAP, TSWAP, L1NOTRAN, UNOTRAN, EQUIL, ALIGN ) - HPL_T_test * TEST; - int * NS; - int * N; - int * NBS; - int * NB; - HPL_T_ORDER * PMAPPIN; - int * NPQS; - int * P; - int * Q; - int * NPFS; - HPL_T_FACT * PF; - int * NBMS; - int * NBM; - int * NDVS; - int * NDV; - int * NRFS; - HPL_T_FACT * RF; - int * NTPS; - HPL_T_TOP * TP; - int * NDHS; - int * DH; - HPL_T_SWAP * FSWAP; - int * TSWAP; - int * L1NOTRAN; - int * UNOTRAN; - int * EQUIL; - int * ALIGN; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_pdinfo reads the startup information for the various tests and - * transmits it to all processes. - * - * Arguments - * ========= - * - * TEST (global output) HPL_T_test * - * On entry, TEST points to a testing data structure. On exit, - * the fields of this data structure are initialized as follows: - * TEST->outfp specifies the output file where the results will - * be printed. It is only defined and used by the process 0 of - * the grid. TEST->thrsh specifies the threshold value for the - * test ratio. TEST->epsil is the relative machine precision of - * the distributed computer. Finally the test counters, kfail, - * kpass, kskip, ktest are initialized to zero. 
- * - * NS (global output) int * - * On exit, NS specifies the number of different problem sizes - * to be tested. NS is less than or equal to HPL_MAX_PARAM. - * - * N (global output) int * - * On entry, N is an array of dimension HPL_MAX_PARAM. On exit, - * the first NS entries of this array contain the problem sizes - * to run the code with. - * - * NBS (global output) int * - * On exit, NBS specifies the number of different distribution - * blocking factors to be tested. NBS must be less than or equal - * to HPL_MAX_PARAM. - * - * NB (global output) int * - * On entry, NB is an array of dimension HPL_MAX_PARAM. On exit, - * the first NBS entries of this array contain the values of the - * various distribution blocking factors, to run the code with. - * - * PMAPPIN (global output) HPL_T_ORDER * - * On exit, PMAPPIN specifies the process mapping onto the - * nodes of the MPI machine configuration. PMAPPIN defaults to - * row-major ordering. - * - * NPQS (global output) int * - * On exit, NPQS specifies the number of different values that - * can be used for P and Q, i.e., the number of process grids to - * run the code with. NPQS must be less than or equal to - * HPL_MAX_PARAM. - * - * P (global output) int * - * On entry, P is an array of dimension HPL_MAX_PARAM. On exit, - * the first NPQS entries of this array contain the values of P, - * the number of process rows of the NPQS grids to run the code - * with. - * - * Q (global output) int * - * On entry, Q is an array of dimension HPL_MAX_PARAM. On exit, - * the first NPQS entries of this array contain the values of Q, - * the number of process columns of the NPQS grids to run the - * code with. - * - * NPFS (global output) int * - * On exit, NPFS specifies the number of different values that - * can be used for PF : the panel factorization algorithm to run - * the code with. NPFS is less than or equal to HPL_MAX_PARAM. 
- * - * PF (global output) HPL_T_FACT * - * On entry, PF is an array of dimension HPL_MAX_PARAM. On exit, - * the first NPFS entries of this array contain the various - * panel factorization algorithms to run the code with. - * - * NBMS (global output) int * - * On exit, NBMS specifies the number of various recursive - * stopping criteria to be tested. NBMS must be less than or - * equal to HPL_MAX_PARAM. - * - * NBM (global output) int * - * On entry, NBM is an array of dimension HPL_MAX_PARAM. On - * exit, the first NBMS entries of this array contain the values - * of the various recursive stopping criteria to be tested. - * - * NDVS (global output) int * - * On exit, NDVS specifies the number of various numbers of - * panels in recursion to be tested. NDVS is less than or equal - * to HPL_MAX_PARAM. - * - * NDV (global output) int * - * On entry, NDV is an array of dimension HPL_MAX_PARAM. On - * exit, the first NDVS entries of this array contain the values - * of the various numbers of panels in recursion to be tested. - * - * NRFS (global output) int * - * On exit, NRFS specifies the number of different values that - * can be used for RF : the recursive factorization algorithm to - * be tested. NRFS is less than or equal to HPL_MAX_PARAM. - * - * RF (global output) HPL_T_FACT * - * On entry, RF is an array of dimension HPL_MAX_PARAM. On exit, - * the first NRFS entries of this array contain the various - * recursive factorization algorithms to run the code with. - * - * NTPS (global output) int * - * On exit, NTPS specifies the number of different values that - * can be used for the broadcast topologies to be tested. NTPS - * is less than or equal to HPL_MAX_PARAM. - * - * TP (global output) HPL_T_TOP * - * On entry, TP is an array of dimension HPL_MAX_PARAM. On exit, - * the first NTPS entries of this array contain the various - * broadcast (along rows) topologies to run the code with. 
- * - * NDHS (global output) int * - * On exit, NDHS specifies the number of different values that - * can be used for the lookahead depths to be tested. NDHS is - * less than or equal to HPL_MAX_PARAM. - * - * DH (global output) int * - * On entry, DH is an array of dimension HPL_MAX_PARAM. On - * exit, the first NDHS entries of this array contain the values - * of lookahead depths to run the code with. Each value is - * either 0 (no lookahead) or greater than zero. - * - * FSWAP (global output) HPL_T_SWAP * - * On exit, FSWAP specifies the swapping algorithm to be used in - * all tests. - * - * TSWAP (global output) int * - * On exit, TSWAP specifies the swapping threshold as a number - * of columns when the mixed swapping algorithm was chosen. - * - * L1NOTRAN (global output) int * - * On exit, L1NOTRAN specifies whether the upper triangle of the - * panels of columns should be stored in no-transposed form - * (L1NOTRAN=1) or in transposed form (L1NOTRAN=0). - * - * UNOTRAN (global output) int * - * On exit, UNOTRAN specifies whether the panels of rows should - * be stored in no-transposed form (UNOTRAN=1) or transposed - * form (UNOTRAN=0) during their broadcast. - * - * EQUIL (global output) int * - * On exit, EQUIL specifies whether equilibration during the - * swap-broadcast of the panel of rows should be performed - * (EQUIL=1) or not (EQUIL=0). - * - * ALIGN (global output) int * - * On exit, ALIGN specifies the alignment of the dynamically - * allocated buffers in double precision words. ALIGN is greater - * than zero. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - char file[HPL_LINE_MAX], line[HPL_LINE_MAX], - auth[HPL_LINE_MAX], num [HPL_LINE_MAX]; - FILE * infp; - int * iwork = NULL; - char * lineptr; - int error=0, fid, i, j, lwork, maxp, nprocs, - rank, size; -/* .. - * .. Executable Statements .. 
- */ - MPI_Comm_rank( MPI_COMM_WORLD, &rank ); - MPI_Comm_size( MPI_COMM_WORLD, &size ); -/* - * Initialize the TEST data structure with default values - */ - TEST->outfp = stderr; TEST->epsil = 2.0e-16; TEST->thrsh = 16.0; - TEST->kfail = TEST->kpass = TEST->kskip = TEST->ktest = 0; -/* - * Process 0 reads the input data, broadcasts to other processes and - * writes needed information to TEST->outfp. - */ - if( rank == 0 ) - { -/* - * Open file and skip data file header - */ - if( ( infp = fopen( "HPL.dat", "r" ) ) == NULL ) - { - HPL_pwarn( stderr, __LINE__, "HPL_pdinfo", - "cannot open file HPL.dat" ); - error = 1; goto label_error; - } - - (void) fgets( line, HPL_LINE_MAX - 2, infp ); - (void) fgets( auth, HPL_LINE_MAX - 2, infp ); -/* - * Read name and unit number for summary output file - */ - (void) fgets( line, HPL_LINE_MAX - 2, infp ); - (void) sscanf( line, "%s", file ); - (void) fgets( line, HPL_LINE_MAX - 2, infp ); - (void) sscanf( line, "%s", num ); - fid = atoi( num ); - if ( fid == 6 ) TEST->outfp = stdout; - else if( fid == 7 ) TEST->outfp = stderr; - else if( ( TEST->outfp = fopen( file, "w" ) ) == NULL ) - { - HPL_pwarn( stderr, __LINE__, "HPL_pdinfo", "cannot open file %s.", - file ); - error = 1; goto label_error; - } -/* - * Read and check the parameter values for the tests. 
- * - * Problem size (>=0) (N) - */ - (void) fgets( line, HPL_LINE_MAX - 2, infp ); - (void) sscanf( line, "%s", num ); *NS = atoi( num ); - if( ( *NS < 1 ) || ( *NS > HPL_MAX_PARAM ) ) - { - HPL_pwarn( stderr, __LINE__, "HPL_pdinfo", "%s %d", - "Number of values of N is less than 1 or greater than", - HPL_MAX_PARAM ); - error = 1; goto label_error; - } - - (void) fgets( line, HPL_LINE_MAX - 2, infp ); lineptr = line; - for( i = 0; i < *NS; i++ ) - { - (void) sscanf( lineptr, "%s", num ); lineptr += strlen( num ) + 1; - if( ( N[ i ] = atoi( num ) ) < 0 ) - { - HPL_pwarn( stderr, __LINE__, "HPL_pdinfo", - "Value of N less than 0" ); - error = 1; goto label_error; - } - } -/* - * Block size (>=1) (NB) - */ - (void) fgets( line, HPL_LINE_MAX - 2, infp ); - (void) sscanf( line, "%s", num ); *NBS = atoi( num ); - if( ( *NBS < 1 ) || ( *NBS > HPL_MAX_PARAM ) ) - { - HPL_pwarn( stderr, __LINE__, "HPL_pdinfo", "%s %s %d", - "Number of values of NB is less than 1 or", - "greater than", HPL_MAX_PARAM ); - error = 1; goto label_error; - } - - (void) fgets( line, HPL_LINE_MAX - 2, infp ); lineptr = line; - for( i = 0; i < *NBS; i++ ) - { - (void) sscanf( lineptr, "%s", num ); lineptr += strlen( num ) + 1; - if( ( NB[ i ] = atoi( num ) ) < 1 ) - { - HPL_pwarn( stderr, __LINE__, "HPL_pdinfo", - "Value of NB less than 1" ); - error = 1; goto label_error; - } - } -/* - * Process grids, mapping, (>=1) (P, Q) - */ - (void) fgets( line, HPL_LINE_MAX - 2, infp ); - (void) sscanf( line, "%s", num ); - *PMAPPIN = ( atoi( num ) == 1 ? 
HPL_COLUMN_MAJOR : HPL_ROW_MAJOR ); - - (void) fgets( line, HPL_LINE_MAX - 2, infp ); - (void) sscanf( line, "%s", num ); *NPQS = atoi( num ); - if( ( *NPQS < 1 ) || ( *NPQS > HPL_MAX_PARAM ) ) - { - HPL_pwarn( stderr, __LINE__, "HPL_pdinfo", "%s %s %d", - "Number of values of grids is less", - "than 1 or greater than", HPL_MAX_PARAM ); - error = 1; goto label_error; - } - - (void) fgets( line, HPL_LINE_MAX - 2, infp ); lineptr = line; - for( i = 0; i < *NPQS; i++ ) - { - (void) sscanf( lineptr, "%s", num ); lineptr += strlen( num ) + 1; - if( ( P[ i ] = atoi( num ) ) < 1 ) - { - HPL_pwarn( stderr, __LINE__, "HPL_pdinfo", - "Value of P less than 1" ); - error = 1; goto label_error; - } - } - (void) fgets( line, HPL_LINE_MAX - 2, infp ); lineptr = line; - for( i = 0; i < *NPQS; i++ ) - { - (void) sscanf( lineptr, "%s", num ); lineptr += strlen( num ) + 1; - if( ( Q[ i ] = atoi( num ) ) < 1 ) - { - HPL_pwarn( stderr, __LINE__, "HPL_pdinfo", - "Value of Q less than 1" ); - error = 1; goto label_error; - } - } -/* - * Check for enough processes in machine configuration - */ - maxp = 0; - for( i = 0; i < *NPQS; i++ ) - { nprocs = P[i] * Q[i]; maxp = Mmax( maxp, nprocs ); } - if( maxp > size ) - { - HPL_pwarn( stderr, __LINE__, "HPL_pdinfo", - "Need at least %d processes for these tests", maxp ); - error = 1; goto label_error; - } -/* - * Checking threshold value (TEST->thrsh) - */ - (void) fgets( line, HPL_LINE_MAX - 2, infp ); - (void) sscanf( line, "%s", num ); TEST->thrsh = atof( num ); -/* - * Panel factorization algorithm (PF) - */ - (void) fgets( line, HPL_LINE_MAX - 2, infp ); - (void) sscanf( line, "%s", num ); *NPFS = atoi( num ); - if( ( *NPFS < 1 ) || ( *NPFS > HPL_MAX_PARAM ) ) - { - HPL_pwarn( stderr, __LINE__, "HPL_pdinfo", "%s %s %d", - "number of values of PFACT", - "is less than 1 or greater than", HPL_MAX_PARAM ); - error = 1; goto label_error; - } - (void) fgets( line, HPL_LINE_MAX - 2, infp ); lineptr = line; - for( i = 0; i < *NPFS; i++ ) - { - 
(void) sscanf( lineptr, "%s", num ); lineptr += strlen( num ) + 1; - j = atoi( num ); - if( j == 0 ) PF[ i ] = HPL_LEFT_LOOKING; - else if( j == 1 ) PF[ i ] = HPL_CROUT; - else if( j == 2 ) PF[ i ] = HPL_RIGHT_LOOKING; - else PF[ i ] = HPL_RIGHT_LOOKING; - } -/* - * Recursive stopping criterium (>=1) (NBM) - */ - (void) fgets( line, HPL_LINE_MAX - 2, infp ); - (void) sscanf( line, "%s", num ); *NBMS = atoi( num ); - if( ( *NBMS < 1 ) || ( *NBMS > HPL_MAX_PARAM ) ) - { - HPL_pwarn( stderr, __LINE__, "HPL_pdinfo", "%s %s %d", - "Number of values of NBMIN", - "is less than 1 or greater than", HPL_MAX_PARAM ); - error = 1; goto label_error; - } - (void) fgets( line, HPL_LINE_MAX - 2, infp ); lineptr = line; - for( i = 0; i < *NBMS; i++ ) - { - (void) sscanf( lineptr, "%s", num ); lineptr += strlen( num ) + 1; - if( ( NBM[ i ] = atoi( num ) ) < 1 ) - { - HPL_pwarn( stderr, __LINE__, "HPL_pdinfo", - "Value of NBMIN less than 1" ); - error = 1; goto label_error; - } - } -/* - * Number of panels in recursion (>=2) (NDV) - */ - (void) fgets( line, HPL_LINE_MAX - 2, infp ); - (void) sscanf( line, "%s", num ); *NDVS = atoi( num ); - if( ( *NDVS < 1 ) || ( *NDVS > HPL_MAX_PARAM ) ) - { - HPL_pwarn( stderr, __LINE__, "HPL_pdinfo", "%s %s %d", - "Number of values of NDIV", - "is less than 1 or greater than", HPL_MAX_PARAM ); - error = 1; goto label_error; - } - (void) fgets( line, HPL_LINE_MAX - 2, infp ); lineptr = line; - for( i = 0; i < *NDVS; i++ ) - { - (void) sscanf( lineptr, "%s", num ); lineptr += strlen( num ) + 1; - if( ( NDV[ i ] = atoi( num ) ) < 2 ) - { - HPL_pwarn( stderr, __LINE__, "HPL_pdinfo", - "Value of NDIV less than 2" ); - error = 1; goto label_error; - } - } -/* - * Recursive panel factorization (RF) - */ - (void) fgets( line, HPL_LINE_MAX - 2, infp ); - (void) sscanf( line, "%s", num ); *NRFS = atoi( num ); - if( ( *NRFS < 1 ) || ( *NRFS > HPL_MAX_PARAM ) ) - { - HPL_pwarn( stderr, __LINE__, "HPL_pdinfo", "%s %s %d", - "Number of values of RFACT", - "is 
less than 1 or greater than", HPL_MAX_PARAM ); - error = 1; goto label_error; - } - (void) fgets( line, HPL_LINE_MAX - 2, infp ); lineptr = line; - for( i = 0; i < *NRFS; i++ ) - { - (void) sscanf( lineptr, "%s", num ); lineptr += strlen( num ) + 1; - j = atoi( num ); - if( j == 0 ) RF[ i ] = HPL_LEFT_LOOKING; - else if( j == 1 ) RF[ i ] = HPL_CROUT; - else if( j == 2 ) RF[ i ] = HPL_RIGHT_LOOKING; - else RF[ i ] = HPL_RIGHT_LOOKING; - } -/* - * Broadcast topology (TP) (0=1ring, 1=1ringM, 2=2ring, 3=2ringM, 4=Blong, 5=BlongM) - */ - (void) fgets( line, HPL_LINE_MAX - 2, infp ); - (void) sscanf( line, "%s", num ); *NTPS = atoi( num ); - if( ( *NTPS < 1 ) || ( *NTPS > HPL_MAX_PARAM ) ) - { - HPL_pwarn( stderr, __LINE__, "HPL_pdinfo", "%s %s %d", - "Number of values of BCAST", - "is less than 1 or greater than", HPL_MAX_PARAM ); - error = 1; goto label_error; - } - (void) fgets( line, HPL_LINE_MAX - 2, infp ); lineptr = line; - for( i = 0; i < *NTPS; i++ ) - { - (void) sscanf( lineptr, "%s", num ); lineptr += strlen( num ) + 1; - j = atoi( num ); - if( j == 0 ) TP[ i ] = HPL_1RING; - else if( j == 1 ) TP[ i ] = HPL_1RING_M; - else if( j == 2 ) TP[ i ] = HPL_2RING; - else if( j == 3 ) TP[ i ] = HPL_2RING_M; - else if( j == 4 ) TP[ i ] = HPL_BLONG; - else if( j == 5 ) TP[ i ] = HPL_BLONG_M; - else TP[ i ] = HPL_1RING_M; - } -/* - * Lookahead depth (>=0) (DH) - */ - (void) fgets( line, HPL_LINE_MAX - 2, infp ); - (void) sscanf( line, "%s", num ); *NDHS = atoi( num ); - if( ( *NDHS < 1 ) || ( *NDHS > HPL_MAX_PARAM ) ) - { - HPL_pwarn( stderr, __LINE__, "HPL_pdinfo", "%s %s %d", - "Number of values of DEPTH", - "is less than 1 or greater than", HPL_MAX_PARAM ); - error = 1; goto label_error; - } - (void) fgets( line, HPL_LINE_MAX - 2, infp ); lineptr = line; - for( i = 0; i < *NDHS; i++ ) - { - (void) sscanf( lineptr, "%s", num ); - lineptr += strlen( num ) + 1; - if( ( DH[ i ] = atoi( num ) ) < 0 ) - { - HPL_pwarn( stderr, __LINE__, "HPL_pdinfo", - "Value of DEPTH less than 0" ); - error = 1; 
goto label_error; - } - } -/* - * Swapping algorithm (0,1 or 2) (FSWAP) - */ - (void) fgets( line, HPL_LINE_MAX - 2, infp ); - (void) sscanf( line, "%s", num ); j = atoi( num ); - if( j == 0 ) *FSWAP = HPL_SWAP00; - else if( j == 1 ) *FSWAP = HPL_SWAP01; - else if( j == 2 ) *FSWAP = HPL_SW_MIX; - else *FSWAP = HPL_SWAP01; -/* - * Swapping threshold (>=0) (TSWAP) - */ - (void) fgets( line, HPL_LINE_MAX - 2, infp ); - (void) sscanf( line, "%s", num ); *TSWAP = atoi( num ); - if( *TSWAP <= 0 ) *TSWAP = 0; -/* - * L1 in (no-)transposed form (0 or 1) - */ - (void) fgets( line, HPL_LINE_MAX - 2, infp ); - (void) sscanf( line, "%s", num ); *L1NOTRAN = atoi( num ); - if( ( *L1NOTRAN != 0 ) && ( *L1NOTRAN != 1 ) ) *L1NOTRAN = 0; -/* - * U in (no-)transposed form (0 or 1) - */ - (void) fgets( line, HPL_LINE_MAX - 2, infp ); - (void) sscanf( line, "%s", num ); *UNOTRAN = atoi( num ); - if( ( *UNOTRAN != 0 ) && ( *UNOTRAN != 1 ) ) *UNOTRAN = 0; -/* - * Equilibration (0=no, 1=yes) - */ - (void) fgets( line, HPL_LINE_MAX - 2, infp ); - (void) sscanf( line, "%s", num ); *EQUIL = atoi( num ); - if( ( *EQUIL != 0 ) && ( *EQUIL != 1 ) ) *EQUIL = 1; -/* - * Memory alignment in bytes (> 0) (ALIGN) - */ - (void) fgets( line, HPL_LINE_MAX - 2, infp ); - (void) sscanf( line, "%s", num ); *ALIGN = atoi( num ); - if( *ALIGN <= 0 ) *ALIGN = 4; -/* - * Close input file - */ -label_error: - (void) fclose( infp ); - } - else { TEST->outfp = NULL; } -/* - * Check for error on reading input file - */ - (void) HPL_all_reduce( (void *)(&error), 1, HPL_INT, HPL_max, - MPI_COMM_WORLD ); - if( error ) - { - if( rank == 0 ) - HPL_pwarn( stderr, __LINE__, "HPL_pdinfo", - "Illegal input in file HPL.dat. Exiting ..." 
); - MPI_Finalize(); -#ifdef HPL_CALL_VSIPL - (void) vsip_finalize( NULL ); -#endif - exit( 1 ); - } -/* - * Compute and broadcast machine epsilon - */ - TEST->epsil = HPL_pdlamch( MPI_COMM_WORLD, HPL_MACH_EPS ); -/* - * Pack information arrays and broadcast - */ - (void) HPL_broadcast( (void *)(&(TEST->thrsh)), 1, HPL_DOUBLE, 0, - MPI_COMM_WORLD ); -/* - * Broadcast array sizes - */ - iwork = (int *)malloc( (size_t)(15) * sizeof( int ) ); - if( rank == 0 ) - { - iwork[ 0] = *NS; iwork[ 1] = *NBS; - iwork[ 2] = ( *PMAPPIN == HPL_ROW_MAJOR ? 0 : 1 ); - iwork[ 3] = *NPQS; iwork[ 4] = *NPFS; iwork[ 5] = *NBMS; - iwork[ 6] = *NDVS; iwork[ 7] = *NRFS; iwork[ 8] = *NTPS; - iwork[ 9] = *NDHS; iwork[10] = *TSWAP; iwork[11] = *L1NOTRAN; - iwork[12] = *UNOTRAN; iwork[13] = *EQUIL; iwork[14] = *ALIGN; - } - (void) HPL_broadcast( (void *)iwork, 15, HPL_INT, 0, MPI_COMM_WORLD ); - if( rank != 0 ) - { - *NS = iwork[ 0]; *NBS = iwork[ 1]; - *PMAPPIN = ( iwork[ 2] == 0 ? HPL_ROW_MAJOR : HPL_COLUMN_MAJOR ); - *NPQS = iwork[ 3]; *NPFS = iwork[ 4]; *NBMS = iwork[ 5]; - *NDVS = iwork[ 6]; *NRFS = iwork[ 7]; *NTPS = iwork[ 8]; - *NDHS = iwork[ 9]; *TSWAP = iwork[10]; *L1NOTRAN = iwork[11]; - *UNOTRAN = iwork[12]; *EQUIL = iwork[13]; *ALIGN = iwork[14]; - } - if( iwork ) free( iwork ); -/* - * Pack information arrays and broadcast - */ - lwork = (*NS) + (*NBS) + 2 * (*NPQS) + (*NPFS) + (*NBMS) + - (*NDVS) + (*NRFS) + (*NTPS) + (*NDHS) + 1; - iwork = (int *)malloc( (size_t)(lwork) * sizeof( int ) ); - if( rank == 0 ) - { - j = 0; - for( i = 0; i < *NS; i++ ) { iwork[j] = N [i]; j++; } - for( i = 0; i < *NBS; i++ ) { iwork[j] = NB[i]; j++; } - for( i = 0; i < *NPQS; i++ ) { iwork[j] = P [i]; j++; } - for( i = 0; i < *NPQS; i++ ) { iwork[j] = Q [i]; j++; } - for( i = 0; i < *NPFS; i++ ) - { - if( PF[i] == HPL_LEFT_LOOKING ) iwork[j] = 0; - else if( PF[i] == HPL_CROUT ) iwork[j] = 1; - else if( PF[i] == HPL_RIGHT_LOOKING ) iwork[j] = 2; - j++; - } - for( i = 0; i < *NBMS; i++ ) { iwork[j] = 
NBM[i]; j++; } - for( i = 0; i < *NDVS; i++ ) { iwork[j] = NDV[i]; j++; } - for( i = 0; i < *NRFS; i++ ) - { - if( RF[i] == HPL_LEFT_LOOKING ) iwork[j] = 0; - else if( RF[i] == HPL_CROUT ) iwork[j] = 1; - else if( RF[i] == HPL_RIGHT_LOOKING ) iwork[j] = 2; - j++; - } - for( i = 0; i < *NTPS; i++ ) - { - if( TP[i] == HPL_1RING ) iwork[j] = 0; - else if( TP[i] == HPL_1RING_M ) iwork[j] = 1; - else if( TP[i] == HPL_2RING ) iwork[j] = 2; - else if( TP[i] == HPL_2RING_M ) iwork[j] = 3; - else if( TP[i] == HPL_BLONG ) iwork[j] = 4; - else if( TP[i] == HPL_BLONG_M ) iwork[j] = 5; - j++; - } - for( i = 0; i < *NDHS; i++ ) { iwork[j] = DH[i]; j++; } - - if( *FSWAP == HPL_SWAP00 ) iwork[j] = 0; - else if( *FSWAP == HPL_SWAP01 ) iwork[j] = 1; - else if( *FSWAP == HPL_SW_MIX ) iwork[j] = 2; - j++; - } - (void) HPL_broadcast( (void*)iwork, lwork, HPL_INT, 0, - MPI_COMM_WORLD ); - if( rank != 0 ) - { - j = 0; - for( i = 0; i < *NS; i++ ) { N [i] = iwork[j]; j++; } - for( i = 0; i < *NBS; i++ ) { NB[i] = iwork[j]; j++; } - for( i = 0; i < *NPQS; i++ ) { P [i] = iwork[j]; j++; } - for( i = 0; i < *NPQS; i++ ) { Q [i] = iwork[j]; j++; } - - for( i = 0; i < *NPFS; i++ ) - { - if( iwork[j] == 0 ) PF[i] = HPL_LEFT_LOOKING; - else if( iwork[j] == 1 ) PF[i] = HPL_CROUT; - else if( iwork[j] == 2 ) PF[i] = HPL_RIGHT_LOOKING; - j++; - } - for( i = 0; i < *NBMS; i++ ) { NBM[i] = iwork[j]; j++; } - for( i = 0; i < *NDVS; i++ ) { NDV[i] = iwork[j]; j++; } - for( i = 0; i < *NRFS; i++ ) - { - if( iwork[j] == 0 ) RF[i] = HPL_LEFT_LOOKING; - else if( iwork[j] == 1 ) RF[i] = HPL_CROUT; - else if( iwork[j] == 2 ) RF[i] = HPL_RIGHT_LOOKING; - j++; - } - for( i = 0; i < *NTPS; i++ ) - { - if( iwork[j] == 0 ) TP[i] = HPL_1RING; - else if( iwork[j] == 1 ) TP[i] = HPL_1RING_M; - else if( iwork[j] == 2 ) TP[i] = HPL_2RING; - else if( iwork[j] == 3 ) TP[i] = HPL_2RING_M; - else if( iwork[j] == 4 ) TP[i] = HPL_BLONG; - else if( iwork[j] == 5 ) TP[i] = HPL_BLONG_M; - j++; - } - for( i = 0; i < *NDHS; i++ ) 
{ DH[i] = iwork[j]; j++; } - - if( iwork[j] == 0 ) *FSWAP = HPL_SWAP00; - else if( iwork[j] == 1 ) *FSWAP = HPL_SWAP01; - else if( iwork[j] == 2 ) *FSWAP = HPL_SW_MIX; - j++; - } - if( iwork ) free( iwork ); -/* - * regurgitate input - */ - if( rank == 0 ) - { - HPL_fprintf( TEST->outfp, "%s%s\n", - "========================================", - "========================================" ); - HPL_fprintf( TEST->outfp, "%s%s\n", - "HPLinpack 2.1 -- High-Performance Linpack benchmark -- ", - " October 26, 2012" ); - HPL_fprintf( TEST->outfp, "%s%s\n", - "Written by A. Petitet and R. Clint Whaley, ", - "Innovative Computing Laboratory, UTK" ); - HPL_fprintf( TEST->outfp, "%s%s\n", - "Modified by Piotr Luszczek, ", - "Innovative Computing Laboratory, UTK" ); - HPL_fprintf( TEST->outfp, "%s%s\n", - "Modified by Julien Langou, ", - "University of Colorado Denver"); - HPL_fprintf( TEST->outfp, "%s%s\n", - "========================================", - "========================================" ); - - HPL_fprintf( TEST->outfp, "\n%s\n", - "An explanation of the input/output parameters follows:" ); - HPL_fprintf( TEST->outfp, "%s\n", - "T/V : Wall time / encoded variant." ); - HPL_fprintf( TEST->outfp, "%s\n", - "N : The order of the coefficient matrix A." ); - HPL_fprintf( TEST->outfp, "%s\n", - "NB : The partitioning blocking factor." ); - HPL_fprintf( TEST->outfp, "%s\n", - "P : The number of process rows." ); - HPL_fprintf( TEST->outfp, "%s\n", - "Q : The number of process columns." ); - HPL_fprintf( TEST->outfp, "%s\n", - "Time : Time in seconds to solve the linear system." ); - HPL_fprintf( TEST->outfp, "%s\n\n", - "Gflops : Rate of execution for solving the linear system." 
); - HPL_fprintf( TEST->outfp, "%s\n", - "The following parameter values will be used:" ); -/* - * Problem size - */ - HPL_fprintf( TEST->outfp, "\nN :" ); - for( i = 0; i < Mmin( 8, *NS ); i++ ) - HPL_fprintf( TEST->outfp, "%8d ", N[i] ); - if( *NS > 8 ) - { - HPL_fprintf( TEST->outfp, "\n " ); - for( i = 8; i < Mmin( 16, *NS ); i++ ) - HPL_fprintf( TEST->outfp, "%8d ", N[i] ); - if( *NS > 16 ) - { - HPL_fprintf( TEST->outfp, "\n " ); - for( i = 16; i < *NS; i++ ) - HPL_fprintf( TEST->outfp, "%8d ", N[i] ); - } - } -/* - * Distribution blocking factor - */ - HPL_fprintf( TEST->outfp, "\nNB :" ); - for( i = 0; i < Mmin( 8, *NBS ); i++ ) - HPL_fprintf( TEST->outfp, "%8d ", NB[i] ); - if( *NBS > 8 ) - { - HPL_fprintf( TEST->outfp, "\n " ); - for( i = 8; i < Mmin( 16, *NBS ); i++ ) - HPL_fprintf( TEST->outfp, "%8d ", NB[i] ); - if( *NBS > 16 ) - { - HPL_fprintf( TEST->outfp, "\n " ); - for( i = 16; i < *NBS; i++ ) - HPL_fprintf( TEST->outfp, "%8d ", NB[i] ); - } - } -/* - * Process mapping - */ - HPL_fprintf( TEST->outfp, "\nPMAP :" ); - if( *PMAPPIN == HPL_ROW_MAJOR ) - HPL_fprintf( TEST->outfp, " Row-major process mapping" ); - else if( *PMAPPIN == HPL_COLUMN_MAJOR ) - HPL_fprintf( TEST->outfp, " Column-major process mapping" ); -/* - * Process grid - */ - HPL_fprintf( TEST->outfp, "\nP :" ); - for( i = 0; i < Mmin( 8, *NPQS ); i++ ) - HPL_fprintf( TEST->outfp, "%8d ", P[i] ); - if( *NPQS > 8 ) - { - HPL_fprintf( TEST->outfp, "\n " ); - for( i = 8; i < Mmin( 16, *NPQS ); i++ ) - HPL_fprintf( TEST->outfp, "%8d ", P[i] ); - if( *NPQS > 16 ) - { - HPL_fprintf( TEST->outfp, "\n " ); - for( i = 16; i < *NPQS; i++ ) - HPL_fprintf( TEST->outfp, "%8d ", P[i] ); - } - } - HPL_fprintf( TEST->outfp, "\nQ :" ); - for( i = 0; i < Mmin( 8, *NPQS ); i++ ) - HPL_fprintf( TEST->outfp, "%8d ", Q[i] ); - if( *NPQS > 8 ) - { - HPL_fprintf( TEST->outfp, "\n " ); - for( i = 8; i < Mmin( 16, *NPQS ); i++ ) - HPL_fprintf( TEST->outfp, "%8d ", Q[i] ); - if( *NPQS > 16 ) - { - HPL_fprintf( 
TEST->outfp, "\n " ); - for( i = 16; i < *NPQS; i++ ) - HPL_fprintf( TEST->outfp, "%8d ", Q[i] ); - } - } -/* - * Panel Factorization - */ - HPL_fprintf( TEST->outfp, "\nPFACT :" ); - for( i = 0; i < Mmin( 8, *NPFS ); i++ ) - { - if( PF[i] == HPL_LEFT_LOOKING ) - HPL_fprintf( TEST->outfp, " Left " ); - else if( PF[i] == HPL_CROUT ) - HPL_fprintf( TEST->outfp, " Crout " ); - else if( PF[i] == HPL_RIGHT_LOOKING ) - HPL_fprintf( TEST->outfp, " Right " ); - } - if( *NPFS > 8 ) - { - HPL_fprintf( TEST->outfp, "\n " ); - for( i = 8; i < Mmin( 16, *NPFS ); i++ ) - { - if( PF[i] == HPL_LEFT_LOOKING ) - HPL_fprintf( TEST->outfp, " Left " ); - else if( PF[i] == HPL_CROUT ) - HPL_fprintf( TEST->outfp, " Crout " ); - else if( PF[i] == HPL_RIGHT_LOOKING ) - HPL_fprintf( TEST->outfp, " Right " ); - } - if( *NPFS > 16 ) - { - HPL_fprintf( TEST->outfp, "\n " ); - for( i = 16; i < *NPFS; i++ ) - { - if( PF[i] == HPL_LEFT_LOOKING ) - HPL_fprintf( TEST->outfp, " Left " ); - else if( PF[i] == HPL_CROUT ) - HPL_fprintf( TEST->outfp, " Crout " ); - else if( PF[i] == HPL_RIGHT_LOOKING ) - HPL_fprintf( TEST->outfp, " Right " ); - } - } - } -/* - * Recursive stopping criterium - */ - HPL_fprintf( TEST->outfp, "\nNBMIN :" ); - for( i = 0; i < Mmin( 8, *NBMS ); i++ ) - HPL_fprintf( TEST->outfp, "%8d ", NBM[i] ); - if( *NBMS > 8 ) - { - HPL_fprintf( TEST->outfp, "\n " ); - for( i = 8; i < Mmin( 16, *NBMS ); i++ ) - HPL_fprintf( TEST->outfp, "%8d ", NBM[i] ); - if( *NBMS > 16 ) - { - HPL_fprintf( TEST->outfp, "\n " ); - for( i = 16; i < *NBMS; i++ ) - HPL_fprintf( TEST->outfp, "%8d ", NBM[i] ); - } - } -/* - * Number of panels in recursion - */ - HPL_fprintf( TEST->outfp, "\nNDIV :" ); - for( i = 0; i < Mmin( 8, *NDVS ); i++ ) - HPL_fprintf( TEST->outfp, "%8d ", NDV[i] ); - if( *NDVS > 8 ) - { - HPL_fprintf( TEST->outfp, "\n " ); - for( i = 8; i < Mmin( 16, *NDVS ); i++ ) - HPL_fprintf( TEST->outfp, "%8d ", NDV[i] ); - if( *NDVS > 16 ) - { - HPL_fprintf( TEST->outfp, "\n " ); - for( i = 16; i 
< *NDVS; i++ ) - HPL_fprintf( TEST->outfp, "%8d ", NDV[i] ); - } - } -/* - * Recursive Factorization - */ - HPL_fprintf( TEST->outfp, "\nRFACT :" ); - for( i = 0; i < Mmin( 8, *NRFS ); i++ ) - { - if( RF[i] == HPL_LEFT_LOOKING ) - HPL_fprintf( TEST->outfp, " Left " ); - else if( RF[i] == HPL_CROUT ) - HPL_fprintf( TEST->outfp, " Crout " ); - else if( RF[i] == HPL_RIGHT_LOOKING ) - HPL_fprintf( TEST->outfp, " Right " ); - } - if( *NRFS > 8 ) - { - HPL_fprintf( TEST->outfp, "\n " ); - for( i = 8; i < Mmin( 16, *NRFS ); i++ ) - { - if( RF[i] == HPL_LEFT_LOOKING ) - HPL_fprintf( TEST->outfp, " Left " ); - else if( RF[i] == HPL_CROUT ) - HPL_fprintf( TEST->outfp, " Crout " ); - else if( RF[i] == HPL_RIGHT_LOOKING ) - HPL_fprintf( TEST->outfp, " Right " ); - } - if( *NRFS > 16 ) - { - HPL_fprintf( TEST->outfp, "\n " ); - for( i = 16; i < *NRFS; i++ ) - { - if( RF[i] == HPL_LEFT_LOOKING ) - HPL_fprintf( TEST->outfp, " Left " ); - else if( RF[i] == HPL_CROUT ) - HPL_fprintf( TEST->outfp, " Crout " ); - else if( RF[i] == HPL_RIGHT_LOOKING ) - HPL_fprintf( TEST->outfp, " Right " ); - } - } - } -/* - * Broadcast topology - */ - HPL_fprintf( TEST->outfp, "\nBCAST :" ); - for( i = 0; i < Mmin( 8, *NTPS ); i++ ) - { - if( TP[i] == HPL_1RING ) - HPL_fprintf( TEST->outfp, " 1ring " ); - else if( TP[i] == HPL_1RING_M ) - HPL_fprintf( TEST->outfp, " 1ringM " ); - else if( TP[i] == HPL_2RING ) - HPL_fprintf( TEST->outfp, " 2ring " ); - else if( TP[i] == HPL_2RING_M ) - HPL_fprintf( TEST->outfp, " 2ringM " ); - else if( TP[i] == HPL_BLONG ) - HPL_fprintf( TEST->outfp, " Blong " ); - else if( TP[i] == HPL_BLONG_M ) - HPL_fprintf( TEST->outfp, " BlongM " ); - } - if( *NTPS > 8 ) - { - HPL_fprintf( TEST->outfp, "\n " ); - for( i = 8; i < Mmin( 16, *NTPS ); i++ ) - { - if( TP[i] == HPL_1RING ) - HPL_fprintf( TEST->outfp, " 1ring " ); - else if( TP[i] == HPL_1RING_M ) - HPL_fprintf( TEST->outfp, " 1ringM " ); - else if( TP[i] == HPL_2RING ) - HPL_fprintf( TEST->outfp, " 2ring " ); - else 
if( TP[i] == HPL_2RING_M ) - HPL_fprintf( TEST->outfp, " 2ringM " ); - else if( TP[i] == HPL_BLONG ) - HPL_fprintf( TEST->outfp, " Blong " ); - else if( TP[i] == HPL_BLONG_M ) - HPL_fprintf( TEST->outfp, " BlongM " ); - } - if( *NTPS > 16 ) - { - HPL_fprintf( TEST->outfp, "\n " ); - for( i = 16; i < *NTPS; i++ ) - { - if( TP[i] == HPL_1RING ) - HPL_fprintf( TEST->outfp, " 1ring " ); - else if( TP[i] == HPL_1RING_M ) - HPL_fprintf( TEST->outfp, " 1ringM " ); - else if( TP[i] == HPL_2RING ) - HPL_fprintf( TEST->outfp, " 2ring " ); - else if( TP[i] == HPL_2RING_M ) - HPL_fprintf( TEST->outfp, " 2ringM " ); - else if( TP[i] == HPL_BLONG ) - HPL_fprintf( TEST->outfp, " Blong " ); - else if( TP[i] == HPL_BLONG_M ) - HPL_fprintf( TEST->outfp, " BlongM " ); - } - } - } -/* - * Lookahead depths - */ - HPL_fprintf( TEST->outfp, "\nDEPTH :" ); - for( i = 0; i < Mmin( 8, *NDHS ); i++ ) - HPL_fprintf( TEST->outfp, "%8d ", DH[i] ); - if( *NDHS > 8 ) - { - HPL_fprintf( TEST->outfp, "\n " ); - for( i = 8; i < Mmin( 16, *NDHS ); i++ ) - HPL_fprintf( TEST->outfp, "%8d ", DH[i] ); - if( *NDHS > 16 ) - { - HPL_fprintf( TEST->outfp, "\n " ); - for( i = 16; i < *NDHS; i++ ) - HPL_fprintf( TEST->outfp, "%8d ", DH[i] ); - } - } -/* - * Swapping algorithm - */ - HPL_fprintf( TEST->outfp, "\nSWAP :" ); - if( *FSWAP == HPL_SWAP00 ) - HPL_fprintf( TEST->outfp, " Binary-exchange" ); - else if( *FSWAP == HPL_SWAP01 ) - HPL_fprintf( TEST->outfp, " Spread-roll (long)" ); - else if( *FSWAP == HPL_SW_MIX ) - HPL_fprintf( TEST->outfp, " Mix (threshold = %d)", *TSWAP ); -/* - * L1 storage form - */ - HPL_fprintf( TEST->outfp, "\nL1 :" ); - if( *L1NOTRAN != 0 ) - HPL_fprintf( TEST->outfp, " no-transposed form" ); - else - HPL_fprintf( TEST->outfp, " transposed form" ); -/* - * U storage form - */ - HPL_fprintf( TEST->outfp, "\nU :" ); - if( *UNOTRAN != 0 ) - HPL_fprintf( TEST->outfp, " no-transposed form" ); - else - HPL_fprintf( TEST->outfp, " transposed form" ); -/* - * Equilibration - */ - 
HPL_fprintf( TEST->outfp, "\nEQUIL :" ); - if( *EQUIL != 0 ) - HPL_fprintf( TEST->outfp, " yes" ); - else - HPL_fprintf( TEST->outfp, " no" ); -/* - * Alignment - */ - HPL_fprintf( TEST->outfp, "\nALIGN : %d double precision words", - *ALIGN ); - - HPL_fprintf( TEST->outfp, "\n\n" ); -/* - * For testing only - */ - if( TEST->thrsh > HPL_rzero ) - { - HPL_fprintf( TEST->outfp, "%s%s\n\n", - "----------------------------------------", - "----------------------------------------" ); - HPL_fprintf( TEST->outfp, "%s\n", - "- The matrix A is randomly generated for each test." ); - HPL_fprintf( TEST->outfp, "%s\n", - "- The following scaled residual check will be computed:" ); - HPL_fprintf( TEST->outfp, "%s\n", - " ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )" ); - HPL_fprintf( TEST->outfp, "%s %21.6e\n", - "- The relative machine precision (eps) is taken to be ", - TEST->epsil ); - HPL_fprintf( TEST->outfp, "%s %11.1f\n\n", - "- Computational tests pass if scaled residuals are less than ", - TEST->thrsh ); - } - } -/* - * End of HPL_pdinfo - */ -} diff --git a/hpl/testing/ptest/HPL_pdtest.c b/hpl/testing/ptest/HPL_pdtest.c deleted file mode 100644 index 3f3bb6f19344d698a8114fe69cfa5d0b95d00941..0000000000000000000000000000000000000000 --- a/hpl/testing/ptest/HPL_pdtest.c +++ /dev/null @@ -1,436 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. 
Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
- * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -#ifdef STDC_HEADERS -void HPL_pdtest -( - HPL_T_test * TEST, - HPL_T_grid * GRID, - HPL_T_palg * ALGO, - const int N, - const int NB -) -#else -void HPL_pdtest -( TEST, GRID, ALGO, N, NB ) - HPL_T_test * TEST; - HPL_T_grid * GRID; - HPL_T_palg * ALGO; - const int N; - const int NB; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_pdtest performs one test given a set of parameters such as the - * process grid, the problem size, the distribution blocking factor ... - * This function generates the data, calls and times the linear system - * solver, checks the accuracy of the obtained vector solution and - * writes this information to the file pointed to by TEST->outfp. - * - * Arguments - * ========= - * - * TEST (global input) HPL_T_test * - * On entry, TEST points to a testing data structure: outfp - * specifies the output file where the results will be printed. - * It is only defined and used by the process 0 of the grid. - * thrsh specifies the threshhold value for the test ratio. - * Concretely, a test is declared "PASSED" if and only if the - * following inequality is satisfied: - * ||Ax-b||_oo / ( epsil * - * ( || x ||_oo * || A ||_oo + || b ||_oo ) * - * N ) < thrsh. - * epsil is the relative machine precision of the distributed - * computer. Finally the test counters, kfail, kpass, kskip and - * ktest are updated as follows: if the test passes, kpass is - * incremented by one; if the test fails, kfail is incremented - * by one; if the test is skipped, kskip is incremented by one. - * ktest is left unchanged. - * - * GRID (local input) HPL_T_grid * - * On entry, GRID points to the data structure containing the - * process grid information. - * - * ALGO (global input) HPL_T_palg * - * On entry, ALGO points to the data structure containing the - * algorithmic parameters to be used for this test. 
- * - * N (global input) const int - * On entry, N specifies the order of the coefficient matrix A. - * N must be at least zero. - * - * NB (global input) const int - * On entry, NB specifies the blocking factor used to partition - * and distribute the matrix A. NB must be larger than one. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ -#ifdef HPL_DETAILED_TIMING - double HPL_w[HPL_TIMING_N]; -#endif - HPL_T_pmat mat; - double wtime[1]; - int info[3]; - double Anorm1, AnormI, Gflops, Xnorm1, XnormI, - BnormI, resid0, resid1; - double * Bptr; - void * vptr = NULL; - static int first=1; - int ii, ip2, mycol, myrow, npcol, nprow, nq; - char ctop, cpfact, crfact; - time_t current_time_start, current_time_end; -/* .. - * .. Executable Statements .. - */ - (void) HPL_grid_info( GRID, &nprow, &npcol, &myrow, &mycol ); - - mat.n = N; mat.nb = NB; mat.info = 0; - mat.mp = HPL_numroc( N, NB, NB, myrow, 0, nprow ); - nq = HPL_numroc( N, NB, NB, mycol, 0, npcol ); - mat.nq = nq + 1; -/* - * Allocate matrix, right-hand-side, and vector solution x. [ A | b ] is - * N by N+1. One column is added in every process column for the solve. - * The result however is stored in a 1 x N vector replicated in every - * process row. In every process, A is lda * (nq+1), x is 1 * nq and the - * workspace is mp. 
- * - * Ensure that lda is a multiple of ALIGN and not a power of 2 - */ - mat.ld = ( ( Mmax( 1, mat.mp ) - 1 ) / ALGO->align ) * ALGO->align; - do - { - ii = ( mat.ld += ALGO->align ); ip2 = 1; - while( ii > 1 ) { ii >>= 1; ip2 <<= 1; } - } - while( mat.ld == ip2 ); -/* - * Allocate dynamic memory - */ - vptr = (void*)malloc( ( (size_t)(ALGO->align) + - (size_t)(mat.ld+1) * (size_t)(mat.nq) ) * - sizeof(double) ); - info[0] = (vptr == NULL); info[1] = myrow; info[2] = mycol; - (void) HPL_all_reduce( (void *)(info), 3, HPL_INT, HPL_max, - GRID->all_comm ); - if( info[0] != 0 ) - { - if( ( myrow == 0 ) && ( mycol == 0 ) ) - HPL_pwarn( TEST->outfp, __LINE__, "HPL_pdtest", - "[%d,%d] %s", info[1], info[2], - "Memory allocation failed for A, x and b. Skip." ); - (TEST->kskip)++; - return; - } -/* - * generate matrix and right-hand-side, [ A | b ] which is N by N+1. - */ - mat.A = (double *)HPL_PTR( vptr, - ((size_t)(ALGO->align) * sizeof(double) ) ); - mat.X = Mptr( mat.A, 0, mat.nq, mat.ld ); - HPL_pdmatgen( GRID, N, N+1, NB, mat.A, mat.ld, HPL_ISEED ); -#ifdef HPL_CALL_VSIPL - mat.block = vsip_blockbind_d( (vsip_scalar_d *)(mat.A), - (vsip_length)(mat.ld * mat.nq), - VSIP_MEM_NONE ); -#endif -/* - * Solve linear system - */ - HPL_ptimer_boot(); (void) HPL_barrier( GRID->all_comm ); - time( &current_time_start ); - HPL_ptimer( 0 ); - HPL_pdgesv( GRID, ALGO, &mat ); - HPL_ptimer( 0 ); - time( &current_time_end ); -#ifdef HPL_CALL_VSIPL - (void) vsip_blockrelease_d( mat.block, VSIP_TRUE ); - vsip_blockdestroy_d( mat.block ); -#endif -/* - * Gather max of all CPU and WALL clock timings and print timing results - */ - HPL_ptimer_combine( GRID->all_comm, HPL_AMAX_PTIME, HPL_WALL_PTIME, - 1, 0, wtime ); - - if( ( myrow == 0 ) && ( mycol == 0 ) ) - { - if( first ) - { - HPL_fprintf( TEST->outfp, "%s%s\n", - "========================================", - "========================================" ); - HPL_fprintf( TEST->outfp, "%s%s\n", - "T/V N NB P Q", - " Time Gflops" ); - HPL_fprintf( 
TEST->outfp, "%s%s\n", - "----------------------------------------", - "----------------------------------------" ); - if( TEST->thrsh <= HPL_rzero ) first = 0; - } -/* - * 2/3 N^3 - 1/2 N^2 flops for LU factorization + 2 N^2 flops for solve. - * Print WALL time - */ - Gflops = ( ( (double)(N) / 1.0e+9 ) * - ( (double)(N) / wtime[0] ) ) * - ( ( 2.0 / 3.0 ) * (double)(N) + ( 3.0 / 2.0 ) ); - - cpfact = ( ( (HPL_T_FACT)(ALGO->pfact) == - (HPL_T_FACT)(HPL_LEFT_LOOKING) ) ? (char)('L') : - ( ( (HPL_T_FACT)(ALGO->pfact) == (HPL_T_FACT)(HPL_CROUT) ) ? - (char)('C') : (char)('R') ) ); - crfact = ( ( (HPL_T_FACT)(ALGO->rfact) == - (HPL_T_FACT)(HPL_LEFT_LOOKING) ) ? (char)('L') : - ( ( (HPL_T_FACT)(ALGO->rfact) == (HPL_T_FACT)(HPL_CROUT) ) ? - (char)('C') : (char)('R') ) ); - - if( ALGO->btopo == HPL_1RING ) ctop = '0'; - else if( ALGO->btopo == HPL_1RING_M ) ctop = '1'; - else if( ALGO->btopo == HPL_2RING ) ctop = '2'; - else if( ALGO->btopo == HPL_2RING_M ) ctop = '3'; - else if( ALGO->btopo == HPL_BLONG ) ctop = '4'; - else /* if( ALGO->btopo == HPL_BLONG_M ) */ ctop = '5'; - - if( wtime[0] > HPL_rzero ) { - HPL_fprintf( TEST->outfp, - "W%c%1d%c%c%1d%c%1d%12d %5d %5d %5d %18.2f %18.3e\n", - ( GRID->order == HPL_ROW_MAJOR ? 
'R' : 'C' ), - ALGO->depth, ctop, crfact, ALGO->nbdiv, cpfact, ALGO->nbmin, - N, NB, nprow, npcol, wtime[0], Gflops ); - HPL_fprintf( TEST->outfp, - "HPL_pdgesv() start time %s\n", ctime( &current_time_start ) ); - HPL_fprintf( TEST->outfp, - "HPL_pdgesv() end time %s\n", ctime( &current_time_end ) ); - } - } -#ifdef HPL_DETAILED_TIMING - HPL_ptimer_combine( GRID->all_comm, HPL_AMAX_PTIME, HPL_WALL_PTIME, - HPL_TIMING_N, HPL_TIMING_BEG, HPL_w ); - if( ( myrow == 0 ) && ( mycol == 0 ) ) - { - HPL_fprintf( TEST->outfp, "%s%s\n", - "--VVV--VVV--VVV--VVV--VVV--VVV--VVV--V", - "VV--VVV--VVV--VVV--VVV--VVV--VVV--VVV-" ); -/* - * Recursive panel factorization - */ - if( HPL_w[HPL_TIMING_RPFACT-HPL_TIMING_BEG] > HPL_rzero ) - HPL_fprintf( TEST->outfp, - "Max aggregated wall time rfact . . . : %18.2f\n", - HPL_w[HPL_TIMING_RPFACT-HPL_TIMING_BEG] ); -/* - * Panel factorization - */ - if( HPL_w[HPL_TIMING_PFACT-HPL_TIMING_BEG] > HPL_rzero ) - HPL_fprintf( TEST->outfp, - "+ Max aggregated wall time pfact . . : %18.2f\n", - HPL_w[HPL_TIMING_PFACT-HPL_TIMING_BEG] ); -/* - * Panel factorization (swap) - */ - if( HPL_w[HPL_TIMING_MXSWP-HPL_TIMING_BEG] > HPL_rzero ) - HPL_fprintf( TEST->outfp, - "+ Max aggregated wall time mxswp . . : %18.2f\n", - HPL_w[HPL_TIMING_MXSWP-HPL_TIMING_BEG] ); -/* - * Update - */ - if( HPL_w[HPL_TIMING_UPDATE-HPL_TIMING_BEG] > HPL_rzero ) - HPL_fprintf( TEST->outfp, - "Max aggregated wall time update . . : %18.2f\n", - HPL_w[HPL_TIMING_UPDATE-HPL_TIMING_BEG] ); -/* - * Update (swap) - */ - if( HPL_w[HPL_TIMING_LASWP-HPL_TIMING_BEG] > HPL_rzero ) - HPL_fprintf( TEST->outfp, - "+ Max aggregated wall time laswp . . : %18.2f\n", - HPL_w[HPL_TIMING_LASWP-HPL_TIMING_BEG] ); -/* - * Upper triangular system solve - */ - if( HPL_w[HPL_TIMING_PTRSV-HPL_TIMING_BEG] > HPL_rzero ) - HPL_fprintf( TEST->outfp, - "Max aggregated wall time up tr sv . 
: %18.2f\n", - HPL_w[HPL_TIMING_PTRSV-HPL_TIMING_BEG] ); - - if( TEST->thrsh <= HPL_rzero ) - HPL_fprintf( TEST->outfp, "%s%s\n", - "========================================", - "========================================" ); - } -#endif -/* - * Quick return, if I am not interested in checking the computations - */ - if( TEST->thrsh <= HPL_rzero ) - { (TEST->kpass)++; if( vptr ) free( vptr ); return; } -/* - * Check info returned by solve - */ - if( mat.info != 0 ) - { - if( ( myrow == 0 ) && ( mycol == 0 ) ) - HPL_pwarn( TEST->outfp, __LINE__, "HPL_pdtest", "%s %d, %s", - "Error code returned by solve is", mat.info, "skip" ); - (TEST->kskip)++; - if( vptr ) free( vptr ); return; - } -/* - * Check computation, re-generate [ A | b ], compute norm 1 and inf of A and x, - * and norm inf of b - A x. Display residual checks. - */ - HPL_pdmatgen( GRID, N, N+1, NB, mat.A, mat.ld, HPL_ISEED ); - Anorm1 = HPL_pdlange( GRID, HPL_NORM_1, N, N, NB, mat.A, mat.ld ); - AnormI = HPL_pdlange( GRID, HPL_NORM_I, N, N, NB, mat.A, mat.ld ); -/* - * Because x is distributed in process rows, switch the norms - */ - XnormI = HPL_pdlange( GRID, HPL_NORM_1, 1, N, NB, mat.X, 1 ); - Xnorm1 = HPL_pdlange( GRID, HPL_NORM_I, 1, N, NB, mat.X, 1 ); -/* - * If I am in the col that owns b, (1) compute local BnormI, (2) all_reduce to - * find the max (in the col). Then (3) broadcast along the rows so that every - * process has BnormI. Note that since we use a uniform distribution in [-0.5,0.5] - * for the entries of B, it is very likely that BnormI (<=,~) 0.5. 
- */ - Bptr = Mptr( mat.A, 0, nq, mat.ld ); - if( mycol == HPL_indxg2p( N, NB, NB, 0, npcol ) ){ - if( mat.mp > 0 ) - { - BnormI = Bptr[HPL_idamax( mat.mp, Bptr, 1 )]; BnormI = Mabs( BnormI ); - } - else - { - BnormI = HPL_rzero; - } - (void) HPL_all_reduce( (void *)(&BnormI), 1, HPL_DOUBLE, HPL_max, - GRID->col_comm ); - } - (void) HPL_broadcast( (void *)(&BnormI), 1, HPL_DOUBLE, - HPL_indxg2p( N, NB, NB, 0, npcol ), - GRID->row_comm ); -/* - * If I own b, compute ( b - A x ) and ( - A x ) otherwise - */ - if( mycol == HPL_indxg2p( N, NB, NB, 0, npcol ) ) - { - HPL_dgemv( HplColumnMajor, HplNoTrans, mat.mp, nq, -HPL_rone, - mat.A, mat.ld, mat.X, 1, HPL_rone, Bptr, 1 ); - } - else if( nq > 0 ) - { - HPL_dgemv( HplColumnMajor, HplNoTrans, mat.mp, nq, -HPL_rone, - mat.A, mat.ld, mat.X, 1, HPL_rzero, Bptr, 1 ); - } - else { for( ii = 0; ii < mat.mp; ii++ ) Bptr[ii] = HPL_rzero; } -/* - * Reduce the distributed residual in process column 0 - */ - if( mat.mp > 0 ) - (void) HPL_reduce( Bptr, mat.mp, HPL_DOUBLE, HPL_sum, 0, - GRID->row_comm ); -/* - * Compute || b - A x ||_oo - */ - resid0 = HPL_pdlange( GRID, HPL_NORM_I, N, 1, NB, Bptr, mat.ld ); -/* - * Computes and displays norms, residuals ... - */ - if( N <= 0 ) - { - resid1 = HPL_rzero; - } - else - { - resid1 = resid0 / ( TEST->epsil * ( AnormI * XnormI + BnormI ) * (double)(N) ); - } - - if( resid1 < TEST->thrsh ) (TEST->kpass)++; - else (TEST->kfail)++; - - if( ( myrow == 0 ) && ( mycol == 0 ) ) - { - HPL_fprintf( TEST->outfp, "%s%s\n", - "----------------------------------------", - "----------------------------------------" ); - HPL_fprintf( TEST->outfp, "%s%16.7f%s%s\n", - "||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= ", resid1, - " ...... ", ( resid1 < TEST->thrsh ? "PASSED" : "FAILED" ) ); - - if( resid1 >= TEST->thrsh ) - { - HPL_fprintf( TEST->outfp, "%s%18.6f\n", - "||Ax-b||_oo . . . . . . . . . . . . . . . . . = ", resid0 ); - HPL_fprintf( TEST->outfp, "%s%18.6f\n", - "||A||_oo . . . . . . . . . . . 
. . . . . . . . = ", AnormI ); - HPL_fprintf( TEST->outfp, "%s%18.6f\n", - "||A||_1 . . . . . . . . . . . . . . . . . . . = ", Anorm1 ); - HPL_fprintf( TEST->outfp, "%s%18.6f\n", - "||x||_oo . . . . . . . . . . . . . . . . . . . = ", XnormI ); - HPL_fprintf( TEST->outfp, "%s%18.6f\n", - "||x||_1 . . . . . . . . . . . . . . . . . . . = ", Xnorm1 ); - HPL_fprintf( TEST->outfp, "%s%18.6f\n", - "||b||_oo . . . . . . . . . . . . . . . . . . . = ", BnormI ); - } - } - if( vptr ) free( vptr ); -/* - * End of HPL_pdtest - */ -} diff --git a/hpl/testing/ptest/intel64/Make.inc b/hpl/testing/ptest/intel64/Make.inc deleted file mode 120000 index c1d8167d071e028cbeff544106b9f386bbdcac8a..0000000000000000000000000000000000000000 --- a/hpl/testing/ptest/intel64/Make.inc +++ /dev/null @@ -1 +0,0 @@ -/home/valeriuc/work/PRACE/CodeVault/src/1_dense/hpl/Make.intel64 \ No newline at end of file diff --git a/hpl/testing/ptest/intel64/Makefile b/hpl/testing/ptest/intel64/Makefile deleted file mode 100644 index f9de98381367eed44d7cc3ac50f45e28ce5cb11e..0000000000000000000000000000000000000000 --- a/hpl/testing/ptest/intel64/Makefile +++ /dev/null @@ -1,94 +0,0 @@ -# -# -- High Performance Computing Linpack Benchmark (HPL) -# HPL - 2.1 - October 26, 2012 -# Antoine P. Petitet -# University of Tennessee, Knoxville -# Innovative Computing Laboratory -# (C) Copyright 2000-2008 All Rights Reserved -# -# -- Copyright notice and Licensing terms: -# -# Redistribution and use in source and binary forms, with or without -# modification, are permitted provided that the following conditions -# are met: -# -# 1. Redistributions of source code must retain the above copyright -# notice, this list of conditions and the following disclaimer. -# -# 2. Redistributions in binary form must reproduce the above copyright -# notice, this list of conditions, and the following disclaimer in the -# documentation and/or other materials provided with the distribution. -# -# 3. 
All advertising materials mentioning features or use of this -# software must display the following acknowledgement: -# This product includes software developed at the University of -# Tennessee, Knoxville, Innovative Computing Laboratory. -# -# 4. The name of the University, the name of the Laboratory, or the -# names of its contributors may not be used to endorse or promote -# products derived from this software without specific written -# permission. -# -# -- Disclaimer: -# -# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS -# ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR -# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY -# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, -# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT -# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, -# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY -# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT -# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
-# ###################################################################### -# -include Make.inc -# -# ###################################################################### -# -INCdep = \ - $(INCdir)/hpl_misc.h $(INCdir)/hpl_blas.h $(INCdir)/hpl_auxil.h \ - $(INCdir)/hpl_gesv.h $(INCdir)/hpl_pmisc.h $(INCdir)/hpl_pauxil.h \ - $(INCdir)/hpl_panel.h $(INCdir)/hpl_pgesv.h $(INCdir)/hpl_pmatgen.h \ - $(INCdir)/hpl_ptimer.h $(INCdir)/hpl_ptest.h -# -## Executable names #################################################### -# -xhpl = $(BINdir)/xhpl -# -## Object files ######################################################## -# -HPL_pteobj = \ - HPL_pddriver.o HPL_pdinfo.o HPL_pdtest.o -# -## Targets ############################################################# -# -all : dexe -# -dexe : dexe.grd -# -$(BINdir)/HPL.dat : ../HPL.dat - ( $(CP) ../HPL.dat $(BINdir) ) -# -dexe.grd: $(HPL_pteobj) $(HPLlib) - $(LINKER) $(LINKFLAGS) -o $(xhpl) $(HPL_pteobj) $(HPL_LIBS) - $(MAKE) $(BINdir)/HPL.dat - $(TOUCH) dexe.grd -# -# ###################################################################### -# -HPL_pddriver.o : ../HPL_pddriver.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_pddriver.c -HPL_pdinfo.o : ../HPL_pdinfo.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_pdinfo.c -HPL_pdtest.o : ../HPL_pdtest.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_pdtest.c -# -# ###################################################################### -# -clean : - $(RM) *.o *.grd -# -# ###################################################################### diff --git a/hpl/testing/ptest/intel64/dexe.grd b/hpl/testing/ptest/intel64/dexe.grd deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/hpl/testing/ptimer/HPL_ptimer.c b/hpl/testing/ptimer/HPL_ptimer.c deleted file mode 100644 index 9d779f8560f19bcce66d18f900e843e619efa329..0000000000000000000000000000000000000000 --- a/hpl/testing/ptimer/HPL_ptimer.c +++ /dev/null @@ -1,358 
+0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" -/* - * --------------------------------------------------------------------- - * Static variables - * --------------------------------------------------------------------- - */ -static int HPL_ptimer_disabled; -static double HPL_ptimer_cpusec [HPL_NPTIMER], - HPL_ptimer_cpustart [HPL_NPTIMER], - HPL_ptimer_wallsec [HPL_NPTIMER], - HPL_ptimer_wallstart[HPL_NPTIMER]; -/* - * --------------------------------------------------------------------- - * User callable functions - * --------------------------------------------------------------------- - */ -#ifdef STDC_HEADERS -void HPL_ptimer_boot( void ) -#else -void HPL_ptimer_boot() -#endif -{ -/* - * HPL_ptimer_boot (re)sets all timers to 0, and enables HPL_ptimer. - */ -/* - * .. Local Variables .. - */ - int i; -/* .. - * .. Executable Statements .. - */ - HPL_ptimer_disabled = 0; - - for( i = 0; i < HPL_NPTIMER; i++ ) - { - HPL_ptimer_cpusec [i] = HPL_ptimer_wallsec [i] = HPL_rzero; - HPL_ptimer_cpustart[i] = HPL_ptimer_wallstart[i] = HPL_PTIMER_STARTFLAG; - } -/* - * End of HPL_ptimer_boot - */ -} - -#ifdef STDC_HEADERS -void HPL_ptimer( const int I ) -#else -void HPL_ptimer( I ) - const int I; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_ptimer provides a "stopwatch" functionality cpu/wall timer in - * seconds. 
Up to 64 separate timers can be functioning at once. The - * first call starts the timer, and the second stops it. This routine - * can be disenabled by calling HPL_ptimer_disable(), so that calls to - * the timer are ignored. This feature can be used to make sure certain - * sections of code do not affect timings, even if they call routines - * which have HPL_ptimer calls in them. HPL_ptimer_enable() will enable - * the timer functionality. One can retrieve the current value of a - * timer by calling - * - * t0 = HPL_ptimer_inquire( HPL_WALL_TIME | HPL_CPU_TIME, I ) - * - * where I is the timer index in [0..64). To inititialize the timer - * functionality, one must have called HPL_ptimer_boot() prior to any of - * the functions mentioned above. - * - * Arguments - * ========= - * - * I (global input) const int - * On entry, I specifies the timer to stop/start. - * - * --------------------------------------------------------------------- - */ -/* .. - * .. Executable Statements .. - */ - if( HPL_ptimer_disabled ) return; -/* - * If timer has not been started, start it. Otherwise, stop it and add - * interval to count - */ - if( HPL_ptimer_wallstart[I] == HPL_PTIMER_STARTFLAG ) - { - HPL_ptimer_wallstart[I] = HPL_ptimer_walltime(); - HPL_ptimer_cpustart [I] = HPL_ptimer_cputime (); - } - else - { - HPL_ptimer_cpusec [I] += HPL_ptimer_cputime ()-HPL_ptimer_cpustart [I]; - HPL_ptimer_wallsec [I] += HPL_ptimer_walltime()-HPL_ptimer_wallstart[I]; - HPL_ptimer_wallstart[I] = HPL_PTIMER_STARTFLAG; - } -/* - * End of HPL_ptimer - */ -} - -#ifdef STDC_HEADERS -void HPL_ptimer_enable( void ) -#else -void HPL_ptimer_enable() -#endif -{ -/* - * HPL_ptimer_enable sets it so calls to HPL_ptimer are not ignored. - */ -/* .. - * .. Executable Statements .. 
- */ - HPL_ptimer_disabled = 0; - return; -/* - * End of HPL_ptimer_enable - */ -} - -#ifdef STDC_HEADERS -void HPL_ptimer_disable( void ) -#else -void HPL_ptimer_disable() -#endif -{ -/* - * HPL_ptimer_disable sets it so calls to HPL_ptimer are ignored. - */ -/* .. - * .. Executable Statements .. - */ - HPL_ptimer_disabled = 1; - return; -/* - * End of HPL_ptimer_disable - */ -} - -#ifdef STDC_HEADERS -double HPL_ptimer_inquire -( - const HPL_T_PTIME TMTYPE, - const int I -) -#else -double HPL_ptimer_inquire( TMTYPE, I ) - const int I; - const HPL_T_PTIME TMTYPE; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_ptimer_inquire returns wall- or cpu- time that has accumulated in - * timer I. - * - * Arguments - * ========= - * - * TMTYPE (global input) const HPL_T_PTIME - * On entry, TMTYPE specifies what time will be returned as fol- - * lows - * = HPL_WALL_PTIME : wall clock time is returned, - * = HPL_CPU_PTIME : CPU time is returned (default). - * - * I (global input) const int - * On entry, I specifies the timer to return. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - double time; -/* .. - * .. Executable Statements .. 
- */ -/* - * If wall- or cpu-time are not available on this machine, return - * HPL_PTIMER_ERROR - */ - if( TMTYPE == HPL_WALL_PTIME ) - { - if( HPL_ptimer_walltime() == HPL_PTIMER_ERROR ) - time = HPL_PTIMER_ERROR; - else - time = HPL_ptimer_wallsec[I]; - } - else - { - if( HPL_ptimer_cputime() == HPL_PTIMER_ERROR ) - time = HPL_PTIMER_ERROR; - else - time = HPL_ptimer_cpusec [I]; - } - return( time ); -/* - * End of HPL_ptimer_inquire - */ -} - -#ifdef STDC_HEADERS -void HPL_ptimer_combine -( - MPI_Comm COMM, - const HPL_T_PTIME_OP OPE, - const HPL_T_PTIME TMTYPE, - const int N, - const int IBEG, - double * TIMES -) -#else -void HPL_ptimer_combine( COMM, OPE, TMTYPE, N, IBEG, TIMES ) - const int IBEG, N; - const HPL_T_PTIME_OP OPE; - const HPL_T_PTIME TMTYPE; - MPI_Comm COMM; - double * TIMES; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_ptimer_combine combines the timing information stored on a scope - * of processes into the user TIMES array. - * - * Arguments - * ========= - * - * COMM (global/local input) MPI_Comm - * The MPI communicator identifying the process collection on - * which the timings are taken. - * - * OPE (global input) const HPL_T_PTIME_OP - * On entry, OP specifies what combine operation should be done - * as follows: - * = HPL_AMAX_PTIME get max. time on any process (default), - * = HPL_AMIN_PTIME get min. time on any process, - * = HPL_SUM_PTIME get sum of times across processes. - * - * TMTYPE (global input) const HPL_T_PTIME - * On entry, TMTYPE specifies what time will be returned as fol- - * lows - * = HPL_WALL_PTIME : wall clock time is returned, - * = HPL_CPU_PTIME : CPU time is returned (default). - * - * N (global input) const int - * On entry, N specifies the number of timers to combine. - * - * IBEG (global input) const int - * On entry, IBEG specifies the first timer to be combined. - * - * TIMES (global output) double * - * On entry, TIMES is an array of dimension at least N. 
On exit, - * this array contains the requested timing information. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - int i, tmpdis; -/* .. - * .. Executable Statements .. - */ - tmpdis = HPL_ptimer_disabled; HPL_ptimer_disabled = 1; -/* - * Timer has been disabled for combine operation - copy timing informa- - * tion into user times array. If wall- or cpu-time are not available - * on this machine, fill in times with HPL_PTIMER_ERROR flag and return. - */ - if( TMTYPE == HPL_WALL_PTIME ) - { - if( HPL_ptimer_walltime() == HPL_PTIMER_ERROR ) - { for( i = 0; i < N; i++ ) TIMES[i] = HPL_PTIMER_ERROR; return; } - else - { for( i = 0; i < N; i++ ) TIMES[i] = HPL_ptimer_wallsec[IBEG+i]; } - } - else - { - if( HPL_ptimer_cputime() == HPL_PTIMER_ERROR ) - { for( i = 0; i < N; i++ ) TIMES[i] = HPL_PTIMER_ERROR; return; } - else - { for( i = 0; i < N; i++ ) TIMES[i] = HPL_ptimer_cpusec[IBEG+i]; } - } -/* - * Combine all nodes information, restore HPL_ptimer_disabled, and return - */ - for( i = 0; i < N; i++ ) TIMES[i] = Mmax( HPL_rzero, TIMES[i] ); - - if( OPE == HPL_AMAX_PTIME ) - (void) HPL_all_reduce( (void *)(TIMES), N, HPL_DOUBLE, HPL_max, COMM ); - else if( OPE == HPL_AMIN_PTIME ) - (void) HPL_all_reduce( (void *)(TIMES), N, HPL_DOUBLE, HPL_min, COMM ); - else if( OPE == HPL_SUM_PTIME ) - (void) HPL_all_reduce( (void *)(TIMES), N, HPL_DOUBLE, HPL_sum, COMM ); - else - (void) HPL_all_reduce( (void *)(TIMES), N, HPL_DOUBLE, HPL_max, COMM ); - - HPL_ptimer_disabled = tmpdis; -/* - * End of HPL_ptimer_combine - */ -} diff --git a/hpl/testing/ptimer/HPL_ptimer_cputime.c b/hpl/testing/ptimer/HPL_ptimer_cputime.c deleted file mode 100644 index 639df6059cfe1a21aff73c55e1942b526adab4fa..0000000000000000000000000000000000000000 --- a/hpl/testing/ptimer/HPL_ptimer_cputime.c +++ /dev/null @@ -1,146 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine 
P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -/* - * Purpose - * ======= - * - * HPL_ptimer_cputime returns the cpu time. If HPL_USE_CLOCK is defined, - * the clock() function is used to return an approximation of processor - * time used by the program. The value returned is the CPU time used so - * far as a clock_t; to get the number of seconds used, the result is - * divided by CLOCKS_PER_SEC. This function is part of the ANSI/ISO C - * standard library. If HPL_USE_TIMES is defined, the times() function - * is used instead. This function returns the current process times. - * times() returns the number of clock ticks that have elapsed since the - * system has been up. Otherwise and by default, the standard library - * function getrusage() is used. 
- * - * --------------------------------------------------------------------- - */ - -#if defined( HPL_USE_CLOCK ) - -#include - -#ifdef STDC_HEADERS -double HPL_ptimer_cputime( void ) -#else -double HPL_ptimer_cputime() -#endif -{ - static double cps = CLOCKS_PER_SEC; - double d; - clock_t t1; - static clock_t t0 = 0; - - if( t0 == 0 ) t0 = clock(); - t1 = clock() - t0; - d = (double)(t1) / cps; - return( d ); -} - -#elif defined( HPL_USE_TIMES ) - -#include -#include - -#ifdef STDC_HEADERS -double HPL_ptimer_cputime( void ) -#else -double HPL_ptimer_cputime() -#endif -{ - clock_t t1; - struct tms ts; - static double ClockTick = HPL_rzero; - - if( ClockTick == HPL_rzero ) ClockTick = (double)(sysconf(_SC_CLK_TCK)); - (void) times( &ts ); - return( (double)(ts.tms_utime) / ClockTick ); -} - -/* #elif defined( HPL_USE_GETRUSAGE ) */ -#else - -#include -#include - -#ifdef STDC_HEADERS -double HPL_ptimer_cputime( void ) -#else -double HPL_ptimer_cputime() -#endif -{ - struct rusage ruse; - - (void) getrusage( RUSAGE_SELF, &ruse ); - return( (double)( ruse.ru_utime.tv_sec ) + - ( (double)( ruse.ru_utime.tv_usec ) / 1000000.0 ) ); -} - -/* -#else - -#ifdef STDC_HEADERS -double HPL_ptimer_cputime( void ) -#else -double HPL_ptimer_cputime() -#endif -{ - return( HPL_PTIMER_ERROR ); -} -*/ - -#endif -/* - * End of HPL_ptimer_cputime - */ diff --git a/hpl/testing/ptimer/HPL_ptimer_walltime.c b/hpl/testing/ptimer/HPL_ptimer_walltime.c deleted file mode 100644 index c6523ac4b83213e59a4e7b09900743efef35cebc..0000000000000000000000000000000000000000 --- a/hpl/testing/ptimer/HPL_ptimer_walltime.c +++ /dev/null @@ -1,103 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. 
Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -/* - * Purpose - * ======= - * - * HPL_ptimer_walltime returns the elapsed (wall-clock) time. - * - * - * --------------------------------------------------------------------- - */ - -#if defined( HPL_USE_GETTIMEOFDAY ) - -#include -#include - -#ifdef STDC_HEADERS -double HPL_ptimer_walltime( void ) -#else -double HPL_ptimer_walltime() -#endif -{ - struct timeval tp; - static long start=0, startu; - - if( !start ) - { - (void) gettimeofday( &tp, NULL ); - start = tp.tv_sec; - startu = tp.tv_usec; - return( HPL_rzero ); - } - (void) gettimeofday( &tp, NULL ); - - return( (double)( tp.tv_sec - start ) + - ( (double)( tp.tv_usec-startu ) / 1000000.0 ) ); -} - -#else - -#ifdef STDC_HEADERS -double HPL_ptimer_walltime( void ) -#else -double HPL_ptimer_walltime() -#endif -{ - return( MPI_Wtime() ); -} - -#endif -/* - * End of HPL_ptimer_walltime - */ diff --git a/hpl/testing/ptimer/intel64/Make.inc b/hpl/testing/ptimer/intel64/Make.inc deleted file mode 120000 index c1d8167d071e028cbeff544106b9f386bbdcac8a..0000000000000000000000000000000000000000 --- a/hpl/testing/ptimer/intel64/Make.inc +++ /dev/null @@ -1 +0,0 @@ -/home/valeriuc/work/PRACE/CodeVault/src/1_dense/hpl/Make.intel64 \ No newline at end of file diff --git a/hpl/testing/ptimer/intel64/Makefile b/hpl/testing/ptimer/intel64/Makefile deleted file mode 100644 
index 012b65989b7a31780185547e3dc907a98067ae28..0000000000000000000000000000000000000000 --- a/hpl/testing/ptimer/intel64/Makefile +++ /dev/null @@ -1,84 +0,0 @@ -# -# -- High Performance Computing Linpack Benchmark (HPL) -# HPL - 2.1 - October 26, 2012 -# Antoine P. Petitet -# University of Tennessee, Knoxville -# Innovative Computing Laboratory -# (C) Copyright 2000-2008 All Rights Reserved -# -# -- Copyright notice and Licensing terms: -# -# Redistribution and use in source and binary forms, with or without -# modification, are permitted provided that the following conditions -# are met: -# -# 1. Redistributions of source code must retain the above copyright -# notice, this list of conditions and the following disclaimer. -# -# 2. Redistributions in binary form must reproduce the above copyright -# notice, this list of conditions, and the following disclaimer in the -# documentation and/or other materials provided with the distribution. -# -# 3. All advertising materials mentioning features or use of this -# software must display the following acknowledgement: -# This product includes software developed at the University of -# Tennessee, Knoxville, Innovative Computing Laboratory. -# -# 4. The name of the University, the name of the Laboratory, or the -# names of its contributors may not be used to endorse or promote -# products derived from this software without specific written -# permission. -# -# -- Disclaimer: -# -# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS -# ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR -# A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY -# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, -# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT -# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, -# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY -# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT -# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. -# ###################################################################### -# -include Make.inc -# -# ###################################################################### -# -INCdep = \ - $(INCdir)/hpl_pmisc.h $(INCdir)/hpl_ptimer.h -# -## Object files ######################################################## -# -HPL_ptiobj = \ - HPL_ptimer.o HPL_ptimer_cputime.o HPL_ptimer_walltime.o -# -## Targets ############################################################# -# -all : lib -# -lib : lib.grd -# -lib.grd : $(HPL_ptiobj) - $(ARCHIVER) $(ARFLAGS) $(HPLlib) $(HPL_ptiobj) - $(RANLIB) $(HPLlib) - $(TOUCH) lib.grd -# -# ###################################################################### -# -HPL_ptimer.o : ../HPL_ptimer.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_ptimer.c -HPL_ptimer_cputime.o : ../HPL_ptimer_cputime.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_ptimer_cputime.c -HPL_ptimer_walltime.o : ../HPL_ptimer_walltime.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_ptimer_walltime.c -# -# ###################################################################### -# -clean : - $(RM) *.o *.grd -# -# ###################################################################### diff --git a/hpl/testing/ptimer/intel64/lib.grd b/hpl/testing/ptimer/intel64/lib.grd deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/hpl/testing/timer/HPL_timer.c b/hpl/testing/timer/HPL_timer.c deleted 
file mode 100644 index e80af1a0ef51d9b38ac5afaf8e6779b7fffc7305..0000000000000000000000000000000000000000 --- a/hpl/testing/timer/HPL_timer.c +++ /dev/null @@ -1,253 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" -/* - * --------------------------------------------------------------------- - * Static variables - * --------------------------------------------------------------------- - */ -static int HPL_timer_disabled; -static double HPL_timer_cpusec [HPL_NTIMER], - HPL_timer_cpustart [HPL_NTIMER], - HPL_timer_wallsec [HPL_NTIMER], - HPL_timer_wallstart[HPL_NTIMER]; -/* - * --------------------------------------------------------------------- - * User callable functions - * --------------------------------------------------------------------- - */ -#ifdef STDC_HEADERS -void HPL_timer_boot( void ) -#else -void HPL_timer_boot() -#endif -{ -/* - * HPL_timer_boot (re)sets all timers to 0, and enables HPL_timer. - */ -/* - * .. Local Variables .. - */ - int i; -/* .. - * .. Executable Statements .. - */ - HPL_timer_disabled = 0; - - for( i = 0; i < HPL_NTIMER; i++ ) - { - HPL_timer_cpusec [i] = HPL_timer_wallsec [i] = HPL_rzero; - HPL_timer_cpustart[i] = HPL_timer_wallstart[i] = HPL_TIMER_STARTFLAG; - } -/* - * End of HPL_timer_boot - */ -} - -#ifdef STDC_HEADERS -void HPL_timer( const int I ) -#else -void HPL_timer( I ) - const int I; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_timer provides a "stopwatch" functionality cpu/wall timer in - * seconds. Up to 64 separate timers can be functioning at once. 
The - * first call starts the timer, and the second stops it. This routine - * can be disenabled by calling HPL_timer_disable(), so that calls to - * the timer are ignored. This feature can be used to make sure certain - * sections of code do not affect timings, even if they call routines - * which have HPL_timer calls in them. HPL_timer_enable() will re-enable - * the timer functionality. One can retrieve the current value of a - * timer by calling - * - * t0 = HPL_timer_inquire( HPL_WALL_TIME | HPL_CPU_TIME, I ) - * - * where I is the timer index in [0..64). To initialize the timer - * functionality, one must have called HPL_timer_boot() prior to any of - * the functions mentioned above. - * - * Arguments - * ========= - * - * I (global input) const int - * On entry, I specifies the timer to stop/start. - * - * --------------------------------------------------------------------- - */ -/* .. - * .. Executable Statements .. - */ - if( HPL_timer_disabled ) return; -/* - * If timer has not been started, start it. Otherwise, stop it and add - * interval to count - */ - if( HPL_timer_wallstart[I] == HPL_TIMER_STARTFLAG ) - { - HPL_timer_wallstart[I] = HPL_timer_walltime(); - HPL_timer_cpustart [I] = HPL_timer_cputime (); - } - else - { - HPL_timer_cpusec [I] += HPL_timer_cputime () - HPL_timer_cpustart [I]; - HPL_timer_wallsec [I] += HPL_timer_walltime() - HPL_timer_wallstart[I]; - HPL_timer_wallstart[I] = HPL_TIMER_STARTFLAG; - } -/* - * End of HPL_timer - */ -} - -#ifdef STDC_HEADERS -void HPL_timer_enable( void ) -#else -void HPL_timer_enable() -#endif -{ -/* - * HPL_timer_enable sets it so calls to HPL_timer are not ignored. - */ -/* .. - * .. Executable Statements .. - */ - HPL_timer_disabled = 0; - return; -/* - * End of HPL_timer_enable - */ -} - -#ifdef STDC_HEADERS -void HPL_timer_disable( void ) -#else -void HPL_timer_disable() -#endif -{ -/* - * HPL_timer_disable sets it so calls to HPL_timer are ignored. - */ -/* .. - * .. Executable Statements .. 
- */ - HPL_timer_disabled = 1; - return; -/* - * End of HPL_timer_disable - */ -} - -#ifdef STDC_HEADERS -double HPL_timer_inquire -( - const HPL_T_TIME TMTYPE, - const int I -) -#else -double HPL_timer_inquire( TMTYPE, I ) - const int I; - const HPL_T_TIME TMTYPE; -#endif -{ -/* - * Purpose - * ======= - * - * HPL_timer_inquire returns wall- or cpu- time that has accumulated in - * timer I. - * - * Arguments - * ========= - * - * TMTYPE (global input) const HPL_T_TIME - * On entry, TMTYPE specifies what time will be returned as fol- - * lows - * = HPL_WALL_TIME : wall clock time is returned, - * = HPL_CPU_TIME : CPU time is returned (default). - * - * I (global input) const int - * On entry, I specifies the timer to return. - * - * --------------------------------------------------------------------- - */ -/* - * .. Local Variables .. - */ - double time; -/* .. - * .. Executable Statements .. - */ -/* - * If wall- or cpu-time are not available on this machine, return - * HPL_TIMER_ERROR - */ - if( TMTYPE == HPL_WALL_TIME ) - { - if( HPL_timer_walltime() == HPL_TIMER_ERROR ) - time = HPL_TIMER_ERROR; - else - time = HPL_timer_wallsec[I]; - } - else - { - if( HPL_timer_cputime() == HPL_TIMER_ERROR ) - time = HPL_TIMER_ERROR; - else - time = HPL_timer_cpusec [I]; - } - return( time ); -/* - * End of HPL_timer_inquire - */ -} diff --git a/hpl/testing/timer/HPL_timer_cputime.c b/hpl/testing/timer/HPL_timer_cputime.c deleted file mode 100644 index bdeffa7601d7d4dbc807b549f8bdf4d3e4aac034..0000000000000000000000000000000000000000 --- a/hpl/testing/timer/HPL_timer_cputime.c +++ /dev/null @@ -1,145 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. 
Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -/* - * Purpose - * ======= - * - * HPL_timer_cputime returns the cpu time. If HPL_USE_CLOCK is defined, - * the clock() function is used to return an approximation of processor - * time used by the program. The value returned is the CPU time used so - * far as a clock_t; to get the number of seconds used, the result is - * divided by CLOCKS_PER_SEC. This function is part of the ANSI/ISO C - * standard library. If HPL_USE_TIMES is defined, the times() function - * is used instead. This function returns the current process times. - * times() returns the number of clock ticks that have elapsed since the - * system has been up. Otherwise and by default, the standard library - * function getrusage() is used. 
- * - * --------------------------------------------------------------------- - */ - -#if defined( HPL_USE_CLOCK ) - -#include - -#ifdef STDC_HEADERS -double HPL_timer_cputime( void ) -#else -double HPL_timer_cputime() -#endif -{ - static double cps = CLOCKS_PER_SEC; - double d; - clock_t t1; - static clock_t t0 = 0; - - if( t0 == 0 ) t0 = clock(); - t1 = clock() - t0; - d = (double)(t1) / cps; - return( d ); -} - -#elif defined( HPL_USE_TIMES ) - -#include -#include - -#ifdef STDC_HEADERS -double HPL_timer_cputime( void ) -#else -double HPL_timer_cputime() -#endif -{ - clock_t t1; - struct tms ts; - static double ClockTick = HPL_rzero; - - if( ClockTick == HPL_rzero ) ClockTick = (double)(sysconf(_SC_CLK_TCK)); - (void) times( &ts ); - return( (double)(ts.tms_utime) / ClockTick ); -} - -/* #elif defined( HPL_USE_GETRUSAGE ) */ -#else - -#include -#include - -#ifdef STDC_HEADERS -double HPL_timer_cputime( void ) -#else -double HPL_timer_cputime() -#endif -{ - struct rusage ruse; - (void) getrusage( RUSAGE_SELF, &ruse ); - return( (double)( ruse.ru_utime.tv_sec ) + - ( (double)( ruse.ru_utime.tv_usec ) / 1000000.0 ) ); -} - -/* -#else - -#ifdef STDC_HEADERS -double HPL_timer_cputime( void ) -#else -double HPL_timer_cputime() -#endif -{ - return( HPL_TIMER_ERROR ); -} -*/ - -#endif -/* - * End of HPL_timer_cputime - */ diff --git a/hpl/testing/timer/HPL_timer_walltime.c b/hpl/testing/timer/HPL_timer_walltime.c deleted file mode 100644 index 43cbb7e821ec93c4890283c410f29d1bfe38e563..0000000000000000000000000000000000000000 --- a/hpl/testing/timer/HPL_timer_walltime.c +++ /dev/null @@ -1,88 +0,0 @@ -/* - * -- High Performance Computing Linpack Benchmark (HPL) - * HPL - 2.1 - October 26, 2012 - * Antoine P. 
Petitet - * University of Tennessee, Knoxville - * Innovative Computing Laboratory - * (C) Copyright 2000-2008 All Rights Reserved - * - * -- Copyright notice and Licensing terms: - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions, and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. All advertising materials mentioning features or use of this - * software must display the following acknowledgement: - * This product includes software developed at the University of - * Tennessee, Knoxville, Innovative Computing Laboratory. - * - * 4. The name of the University, the name of the Laboratory, or the - * names of its contributors may not be used to endorse or promote - * products derived from this software without specific written - * permission. - * - * -- Disclaimer: - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY - * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * --------------------------------------------------------------------- - */ -/* - * Include files - */ -#include "hpl.h" - -/* - * Purpose - * ======= - * - * HPL_timer_walltime returns the elapsed (wall-clock) time. - * - * - * --------------------------------------------------------------------- - */ - -#include -#include - -#ifdef STDC_HEADERS -double HPL_timer_walltime( void ) -#else -double HPL_timer_walltime() -#endif -{ - struct timeval tp; - static long start=0, startu; - - if( !start ) - { - (void) gettimeofday( &tp, NULL ); - start = tp.tv_sec; - startu = tp.tv_usec; - return( HPL_rzero ); - } - (void) gettimeofday( &tp, NULL ); - - return( (double)( tp.tv_sec - start ) + - ( (double)( tp.tv_usec-startu ) / 1000000.0 ) ); -} -/* - * End of HPL_timer_walltime - */ diff --git a/hpl/testing/timer/intel64/Make.inc b/hpl/testing/timer/intel64/Make.inc deleted file mode 120000 index c1d8167d071e028cbeff544106b9f386bbdcac8a..0000000000000000000000000000000000000000 --- a/hpl/testing/timer/intel64/Make.inc +++ /dev/null @@ -1 +0,0 @@ -/home/valeriuc/work/PRACE/CodeVault/src/1_dense/hpl/Make.intel64 \ No newline at end of file diff --git a/hpl/testing/timer/intel64/Makefile b/hpl/testing/timer/intel64/Makefile deleted file mode 100644 index b684ab1bc7cfa59683de0ae2b26d949994eb9a7f..0000000000000000000000000000000000000000 --- a/hpl/testing/timer/intel64/Makefile +++ /dev/null @@ -1,84 +0,0 @@ -# -# -- High Performance Computing Linpack 
Benchmark (HPL) -# HPL - 2.1 - October 26, 2012 -# Antoine P. Petitet -# University of Tennessee, Knoxville -# Innovative Computing Laboratory -# (C) Copyright 2000-2008 All Rights Reserved -# -# -- Copyright notice and Licensing terms: -# -# Redistribution and use in source and binary forms, with or without -# modification, are permitted provided that the following conditions -# are met: -# -# 1. Redistributions of source code must retain the above copyright -# notice, this list of conditions and the following disclaimer. -# -# 2. Redistributions in binary form must reproduce the above copyright -# notice, this list of conditions, and the following disclaimer in the -# documentation and/or other materials provided with the distribution. -# -# 3. All advertising materials mentioning features or use of this -# software must display the following acknowledgement: -# This product includes software developed at the University of -# Tennessee, Knoxville, Innovative Computing Laboratory. -# -# 4. The name of the University, the name of the Laboratory, or the -# names of its contributors may not be used to endorse or promote -# products derived from this software without specific written -# permission. -# -# -- Disclaimer: -# -# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS -# ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR -# A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE UNIVERSITY -# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, -# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT -# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, -# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY -# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT -# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. -# ###################################################################### -# -include Make.inc -# -# ###################################################################### -# -INCdep = \ - $(INCdir)/hpl_pmisc.h $(INCdir)/hpl_timer.h -# -## Object files ######################################################## -# -HPL_timobj = \ - HPL_timer.o HPL_timer_cputime.o HPL_timer_walltime.o -# -## Targets ############################################################# -# -all : lib -# -lib : lib.grd -# -lib.grd : $(HPL_timobj) - $(ARCHIVER) $(ARFLAGS) $(HPLlib) $(HPL_timobj) - $(RANLIB) $(HPLlib) - $(TOUCH) lib.grd -# -# ###################################################################### -# -HPL_timer.o : ../HPL_timer.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_timer.c -HPL_timer_cputime.o : ../HPL_timer_cputime.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_timer_cputime.c -HPL_timer_walltime.o : ../HPL_timer_walltime.c $(INCdep) - $(CC) -o $@ -c $(CCFLAGS) ../HPL_timer_walltime.c -# -# ###################################################################### -# -clean : - $(RM) *.o *.grd -# -# ###################################################################### diff --git a/hpl/testing/timer/intel64/lib.grd b/hpl/testing/timer/intel64/lib.grd deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/hpl/www/1rinM.jpg b/hpl/www/1rinM.jpg deleted file mode 100755 index 
9af78f8440d02d4684a0d8aa404323766adbb41c..0000000000000000000000000000000000000000 Binary files a/hpl/www/1rinM.jpg and /dev/null differ diff --git a/hpl/www/1ring.jpg b/hpl/www/1ring.jpg deleted file mode 100755 index 73e4391cf9ae327cf3c486f2900adb1b2fd476f1..0000000000000000000000000000000000000000 Binary files a/hpl/www/1ring.jpg and /dev/null differ diff --git a/hpl/www/2-273x48.jpg b/hpl/www/2-273x48.jpg deleted file mode 100755 index 23795f8b9f57ced02d65efd7a7133349b1d95089..0000000000000000000000000000000000000000 Binary files a/hpl/www/2-273x48.jpg and /dev/null differ diff --git a/hpl/www/2rinM.jpg b/hpl/www/2rinM.jpg deleted file mode 100755 index c294e0d0723d4b6c77b86115c94d1e1260ac8f61..0000000000000000000000000000000000000000 Binary files a/hpl/www/2rinM.jpg and /dev/null differ diff --git a/hpl/www/2ring.jpg b/hpl/www/2ring.jpg deleted file mode 100755 index f37187f13fb317f46f7b0f197dda4d22cb5e6bb3..0000000000000000000000000000000000000000 Binary files a/hpl/www/2ring.jpg and /dev/null differ diff --git a/hpl/www/HPL_abort.html b/hpl/www/HPL_abort.html deleted file mode 100755 index 92ddd2cd2a0bbc96d82e73755af0145bb42128d2..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_abort.html +++ /dev/null @@ -1,67 +0,0 @@ - - -HPL_abort HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_abort halts execution. - -

Synopsis

-#include "hpl.h"

-void -HPL_abort( -int -LINE, -const char * -SRNAME, -const char * -FORM, -... -); - -

Description

-HPL_abort -displays an error message on stderr and halts execution. - -

Arguments

-
-LINE    (local input)                 int
-        On entry,  LINE  specifies the line  number in the file where
-        the  error  has  occurred.  When  LINE  is not a positive line
-        number, it is ignored.
-
-
-SRNAME  (local input)                 const char *
-        On entry, SRNAME  should  be the name of the routine  calling
-        this error handler.
-
-
-FORM    (local input)                 const char *
-        On entry, FORM specifies the format, i.e., how the subsequent
-        arguments are converted for output.
-
-
-        (local input)                 ...
-        On entry,  ...  is the list of arguments to be printed within
-        the format string.
-
- -

Example

-#include "hpl.h"

-
-int main(int argc, char *argv[])
-{
-   HPL_abort( __LINE__, __FILE__, "Halt.\n" );
-   exit(0); return(0);
-}
-
- -

See Also

-HPL_fprintf, -HPL_warn. - - - diff --git a/hpl/www/HPL_all_reduce.html b/hpl/www/HPL_all_reduce.html deleted file mode 100755 index a42764acc6d80883526b42dd0309add185fa27c9..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_all_reduce.html +++ /dev/null @@ -1,67 +0,0 @@ - - -HPL_all_reduce HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_all_reduce All reduce operation. - -

Synopsis

-#include "hpl.h"

-int -HPL_all_reduce( -void * -BUFFER, -const int -COUNT, -const HPL_T_TYPE -DTYPE, -const HPL_T_OP -OP, -MPI_Comm -COMM -); - -

Description

-HPL_all_reduce -performs a global reduce operation across all -processes of a group leaving the results on all processes. - -

Arguments

-
-BUFFER  (local input/global output)   void *
-        On entry,  BUFFER  points to  the  buffer to be combined.  On
-        exit, this array contains the combined data and  is identical
-        on all processes in the group.
-
-
-COUNT   (global input)                const int
-        On entry,  COUNT  indicates the number of entries in  BUFFER.
-        COUNT must be at least zero.
-
-
-DTYPE   (global input)                const HPL_T_TYPE
-        On entry,  DTYPE  specifies the type of the buffer's operands.
-
-
-OP      (global input)                const HPL_T_OP 
-        On entry, OP is a pointer to the local combine function.
-
-
-COMM    (global/local input)          MPI_Comm
-        The MPI communicator identifying the process collection.
-
- -
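The semantics described above (every process contributes a buffer, and every process ends up with the same combined buffer) can be illustrated without MPI. The following is a minimal single-process sketch, not the HPL implementation: `demo_all_reduce_sum` and the `bufs` array are hypothetical names, each `bufs[p]` stands in for the BUFFER of one process, and summation is assumed as the combine operation.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model of an all-reduce with a sum operation.
 * bufs[p] plays the role of BUFFER on "process" p; count is COUNT.
 * On exit every bufs[p] holds the element-wise sum over all processes.
 */
void demo_all_reduce_sum( double *bufs[], size_t nprocs, size_t count )
{
   size_t p, i;

   assert( bufs != NULL || nprocs == 0 );

   for( i = 0; i < count; i++ )
   {
      double acc = 0.0;
      /* Reduce step: combine entry i from every process. */
      for( p = 0; p < nprocs; p++ ) acc += bufs[p][i];
      /* Broadcast step: leave the combined value on every process. */
      for( p = 0; p < nprocs; p++ ) bufs[p][i] = acc;
   }
}
```

In a real run the reduce and broadcast steps involve communication over COMM; the sketch only shows the state of the buffers before and after the call.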

See Also

-HPL_broadcast, -HPL_reduce, -HPL_barrier, -HPL_min, -HPL_max, -HPL_sum. - - - diff --git a/hpl/www/HPL_barrier.html b/hpl/www/HPL_barrier.html deleted file mode 100755 index 4463390c85bd2100991c4d0457836a1339f3c774..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_barrier.html +++ /dev/null @@ -1,41 +0,0 @@ - - -HPL_barrier HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_barrier Barrier operation. - -

Synopsis

-#include "hpl.h"

-int -HPL_barrier( -MPI_Comm -COMM -); - -

Description

-HPL_barrier -blocks the caller until all process members have called it. -The call returns at any process only after all group members have -entered the call. - -

Arguments

-
-COMM    (global/local input)          MPI_Comm
-        The MPI communicator identifying the process collection.
-
- -

See Also

-HPL_broadcast, -HPL_reduce, -HPL_all_reduce, -HPL_min, -HPL_max, -HPL_sum. - - - diff --git a/hpl/www/HPL_bcast.html b/hpl/www/HPL_bcast.html deleted file mode 100755 index 5665959cca75a55274629c357a54536a950814fb..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_bcast.html +++ /dev/null @@ -1,46 +0,0 @@ - - -HPL_bcast HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_bcast Perform the row broadcast. - -

Synopsis

-#include "hpl.h"

-int -HPL_bcast( -HPL_T_panel * -PANEL, -int * -IFLAG -); - -

Description

-HPL_bcast -broadcasts the current panel. Successful completion is -indicated by IFLAG set to HPL_SUCCESS on return. IFLAG will be set to -HPL_FAILURE on failure and to HPL_KEEP_TESTING when the operation was -not completed, in which case this function should be called again. - -

Arguments

-
-PANEL   (input/output)                HPL_T_panel *
-        On entry,  PANEL  points to the  current panel data structure
-        being broadcast.
-
-
-IFLAG   (output)                      int *
-        On exit,  IFLAG  indicates  whether  or not the broadcast has
-        occurred.
-
- -

See Also

-HPL_binit, -HPL_bwait. - - - diff --git a/hpl/www/HPL_binit.html b/hpl/www/HPL_binit.html deleted file mode 100755 index 951b631c8818bac58ad222906e49767326df3c70..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_binit.html +++ /dev/null @@ -1,37 +0,0 @@ - - -HPL_binit HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_binit Initialize the row broadcast. - -

Synopsis

-#include "hpl.h"

-int -HPL_binit( -HPL_T_panel * -PANEL -); - -

Description

-HPL_binit -initializes a row broadcast. Successful completion is -indicated by the returned error code HPL_SUCCESS. - -

Arguments

-
-PANEL   (input/output)                HPL_T_panel *
-        On entry,  PANEL  points to the  current panel data structure
-        being broadcast.
-
- -

See Also

-HPL_bcast, -HPL_bwait. - - - diff --git a/hpl/www/HPL_broadcast.html b/hpl/www/HPL_broadcast.html deleted file mode 100755 index 50ecef017d9befd8b162643be157eb8dc3e01c8f..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_broadcast.html +++ /dev/null @@ -1,67 +0,0 @@ - - -HPL_broadcast HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_broadcast Broadcast operation. - -

Synopsis

-#include "hpl.h"

-int -HPL_broadcast( -void * -BUFFER, -const int -COUNT, -const HPL_T_TYPE -DTYPE, -const int -ROOT, -MPI_Comm -COMM -); - -

Description

-HPL_broadcast -broadcasts a message from the process with rank ROOT to -all processes in the group. - -

Arguments

-
-BUFFER  (local input/output)          void *
-        On entry,  BUFFER  points to  the  buffer to be broadcast. On
-        exit, this array contains the broadcast data and is identical
-        on all processes in the group.
-
-
-COUNT   (global input)                const int
-        On entry,  COUNT  indicates the number of entries in  BUFFER.
-        COUNT must be at least zero.
-
-
-DTYPE   (global input)                const HPL_T_TYPE
-        On entry,  DTYPE  specifies the type of the buffer operands.
-
-
-ROOT    (global input)                const int
-        On entry, ROOT is the coordinate of the source process.
-
-
-COMM    (global/local input)          MPI_Comm
-        The MPI communicator identifying the process collection.
-
- -

See Also

-HPL_reduce, -HPL_all_reduce, -HPL_barrier, -HPL_min, -HPL_max, -HPL_sum. - - - diff --git a/hpl/www/HPL_bwait.html b/hpl/www/HPL_bwait.html deleted file mode 100755 index d7f0370d91bbdbec0972d2ad41e3fab553eaef0d..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_bwait.html +++ /dev/null @@ -1,38 +0,0 @@ - - -HPL_bwait HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_bwait Finalize the row broadcast. - -

Synopsis

-#include "hpl.h"

-int -HPL_bwait( -HPL_T_panel * -PANEL -); - -

Description

-HPL_bwait -waits for the row broadcast of the current panel to -terminate. Successful completion is indicated by the returned error -code HPL_SUCCESS. - -

Arguments

-
-PANEL   (input/output)                HPL_T_panel *
-        On entry,  PANEL  points to the  current panel data structure
-        being broadcast.
-
- -

See Also

-HPL_binit, -HPL_bcast. - - - diff --git a/hpl/www/HPL_copyL.html b/hpl/www/HPL_copyL.html deleted file mode 100755 index be92eb398d464a6092553d19754aa91194abab7e..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_copyL.html +++ /dev/null @@ -1,42 +0,0 @@ - - -HPL_copyL HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_copyL Copy the current panel into a contiguous workspace. - -

Synopsis

-#include "hpl.h"

-void -HPL_copyL( -HPL_T_panel * -PANEL -); - -

Description

-HPL_copyL -copies the panel of columns, the L1 replicated submatrix, -the pivot array and the info scalar into a contiguous workspace for -later broadcast. - -The copy of this panel into a contiguous buffer can be enforced by -specifying -DHPL_COPY_L in the architecture specific Makefile. - -

Arguments

-
-PANEL   (input/output)                HPL_T_panel *
-        On entry,  PANEL  points to the  current panel data structure
-        being broadcast.
-
- -

See Also

-HPL_binit, -HPL_bcast, -HPL_bwait. - - - diff --git a/hpl/www/HPL_daxpy.html b/hpl/www/HPL_daxpy.html deleted file mode 100755 index da263fda266c181b82d0461903cfed401de9be96..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_daxpy.html +++ /dev/null @@ -1,89 +0,0 @@ - - -HPL_daxpy HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_daxpy y := y + alpha * x. - -

Synopsis

-#include "hpl.h"

-void -HPL_daxpy( -const int -N, -const double -ALPHA, -const double * -X, -const int -INCX, -double * -Y, -const int -INCY -); - -

Description

-HPL_daxpy -scales the vector x by alpha and adds it to y. - -

Arguments

-
-N       (local input)                 const int
-        On entry, N specifies the length of the vectors  x  and  y. N
-        must be at least zero.
-
-
-ALPHA   (local input)                 const double
-        On entry, ALPHA specifies the scalar alpha.   When  ALPHA  is
-        supplied as zero, then the entries of the incremented array X
-        need not be set on input.
-
-
-X       (local input)                 const double *
-        On entry,  X  is an incremented array of dimension  at  least
-        ( 1 + ( n - 1 ) * abs( INCX ) )  that  contains the vector x.
-
-
-INCX    (local input)                 const int
-        On entry, INCX specifies the increment for the elements of X.
-        INCX must not be zero.
-
-
-Y       (local input/output)          double *
-        On entry,  Y  is an incremented array of dimension  at  least
-        ( 1 + ( n - 1 ) * abs( INCY ) )  that  contains the vector y.
-        On exit, the entries of the incremented array  Y  are updated
-        with the scaled entries of the incremented array X.
-
-
-INCY    (local input)                 const int
-        On entry, INCY specifies the increment for the elements of Y.
-        INCY must not be zero.
-
- -

Example

-#include "hpl.h"

-
-int main(int argc, char *argv[])
-{
-   double x[3], y[3];
-   x[0] = 1.0; x[1] = 2.0; x[2] = 3.0;
-   y[0] = 4.0; y[1] = 5.0; y[2] = 6.0;
-   HPL_daxpy( 3, 2.0, x, 1, y, 1 );
-   printf("y=[%f,%f,%f]\n", y[0], y[1], y[2]);
-   exit(0); return(0);
-}
-
- -
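The strided update described above can be sketched in plain C. This is a minimal reference loop, not HPL's implementation: the name ref_daxpy is hypothetical, and the handling of negative increments follows the usual BLAS convention, which is assumed here rather than taken from this page.

```c
#include <stddef.h>

/* Reference sketch of y := y + alpha * x for strided vectors.
 * ref_daxpy is a hypothetical name, not part of HPL. */
void ref_daxpy(int n, double alpha, const double *x, int incx,
               double *y, int incy)
{
    if (n <= 0 || alpha == 0.0)
        return;                      /* nothing to update */

    /* Assumed BLAS convention: a negative increment walks the array
     * backwards, so start from the far end of the storage. */
    ptrdiff_t ix = (incx < 0) ? (ptrdiff_t)(1 - n) * incx : 0;
    ptrdiff_t iy = (incy < 0) ? (ptrdiff_t)(1 - n) * incy : 0;

    for (int i = 0; i < n; i++, ix += incx, iy += incy)
        y[iy] += alpha * x[ix];
}
```

With x = [1, 2, 3], y = [4, 5, 6], alpha = 2 and unit increments, this loop leaves y = [6, 9, 12].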

See Also

-HPL_dcopy, -HPL_dscal, -HPL_dswap. - - - diff --git a/hpl/www/HPL_dcopy.html b/hpl/www/HPL_dcopy.html deleted file mode 100755 index a6f106449c8cd128f129b6bef65587be951d1843..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_dcopy.html +++ /dev/null @@ -1,81 +0,0 @@ - - -HPL_dcopy HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_dcopy y := x. - -

Synopsis

-#include "hpl.h"

-void -HPL_dcopy( -const int -N, -const double * -X, -const int -INCX, -double * -Y, -const int -INCY -); - -

Description

-HPL_dcopy -copies the vector x into the vector y. - -

Arguments

-
-N       (local input)                 const int
-        On entry, N specifies the length of the vectors  x  and  y. N
-        must be at least zero.
-
-
-X       (local input)                 const double *
-        On entry,  X  is an incremented array of dimension  at  least
-        ( 1 + ( n - 1 ) * abs( INCX ) )  that  contains the vector x.
-
-
-INCX    (local input)                 const int
-        On entry, INCX specifies the increment for the elements of X.
-        INCX must not be zero.
-
-
-Y       (local input/output)          double *
-        On entry,  Y  is an incremented array of dimension  at  least
-        ( 1 + ( n - 1 ) * abs( INCY ) )  that  contains the vector y.
-        On exit, the entries of the incremented array  Y  are updated
-        with the entries of the incremented array X.
-
-
-INCY    (local input)                 const int
-        On entry, INCY specifies the increment for the elements of Y.
-        INCY must not be zero.
-
- -

Example

-#include "hpl.h"

-
-int main(int argc, char *argv[])
-{
-   double x[3], y[3];
-   x[0] = 1.0; x[1] = 2.0; x[2] = 3.0;
-   y[0] = 4.0; y[1] = 5.0; y[2] = 6.0;
-   HPL_dcopy( 3, x, 1, y, 1 );
-   printf("y=[%f,%f,%f]\n", y[0], y[1], y[2]);
-   exit(0); return(0);
-}
-
- -

See Also

-HPL_daxpy, -HPL_dscal, -HPL_dswap. - - - diff --git a/hpl/www/HPL_dgemm.html b/hpl/www/HPL_dgemm.html deleted file mode 100755 index c174e264334b08d507b8ba8a4b4ee7e491bae0f2..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_dgemm.html +++ /dev/null @@ -1,178 +0,0 @@ - - -HPL_dgemm HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_dgemm C := alpha * op(A) * op(B) + beta * C. - -

Synopsis

-#include "hpl.h"

-void -HPL_dgemm( -const enum HPL_ORDER -ORDER, -const enum HPL_TRANS -TRANSA, -const enum HPL_TRANS -TRANSB, -const int -M, -const int -N, -const int -K, -const double -ALPHA, -const double * -A, -const int -LDA, -const double * -B, -const int -LDB, -const double -BETA, -double * -C, -const int -LDC -); - -

Description

-HPL_dgemm -performs one of the matrix-matrix operations - - C := alpha * op( A ) * op( B ) + beta * C - - where op( X ) is one of - - op( X ) = X or op( X ) = X^T. - -Alpha and beta are scalars, and A, B and C are matrices, with op(A) -an m by k matrix, op(B) a k by n matrix and C an m by n matrix. - -

Arguments

-
-ORDER   (local input)                 const enum HPL_ORDER
-        On entry, ORDER  specifies the storage format of the operands
-        as follows:                                                  
-           ORDER = HplRowMajor,                                      
-           ORDER = HplColumnMajor.                                   
-
-
-TRANSA  (local input)                 const enum HPL_TRANS
-        On entry, TRANSA  specifies the form of  op(A)  to be used in
-        the matrix-matrix operation as follows:                      
-           TRANSA==HplNoTrans    : op( A ) = A,                     
-           TRANSA==HplTrans      : op( A ) = A^T,                   
-           TRANSA==HplConjTrans  : op( A ) = A^T.                   
-
-
-TRANSB  (local input)                 const enum HPL_TRANS
-        On entry, TRANSB  specifies the form of  op(B)  to be used in
-        the matrix-matrix operation as follows:                      
-           TRANSB==HplNoTrans    : op( B ) = B,                     
-           TRANSB==HplTrans      : op( B ) = B^T,                   
-           TRANSB==HplConjTrans  : op( B ) = B^T.                   
-
-
-M       (local input)                 const int
-        On entry,  M  specifies  the  number  of rows  of the  matrix
-        op(A)  and  of  the  matrix  C.  M  must  be  at least  zero.
-
-
-N       (local input)                 const int
-        On entry,  N  specifies  the number  of columns of the matrix
-        op(B)  and  the number of columns of the matrix  C. N must be
-        at least zero.
-
-
-K       (local input)                 const int
-        On entry,  K  specifies  the  number of columns of the matrix
-        op(A) and the number of rows of the matrix op(B).  K  must be
-        at least  zero.
-
-
-ALPHA   (local input)                 const double
-        On entry, ALPHA specifies the scalar alpha.   When  ALPHA  is
-        supplied  as  zero  then the elements of the matrices A and B
-        need not be set on input.
-
-
-A       (local input)                 const double *
-        On entry,  A  is an array of dimension (LDA,ka),  where ka is
-        k  when   TRANSA==HplNoTrans,  and  is  m  otherwise.  Before
-        entry  with  TRANSA==HplNoTrans, the  leading  m by k part of
-        the array  A must contain the matrix A, otherwise the leading
-        k  by  m  part of the array  A  must  contain the  matrix  A.
-
-
-LDA     (local input)                 const int
-        On entry, LDA  specifies the first dimension of A as declared
-        in the  calling (sub) program. When  TRANSA==HplNoTrans  then
-        LDA must be at least max(1,m), otherwise LDA must be at least
-        max(1,k).
-
-
-B       (local input)                 const double *
-        On entry, B is an array of dimension (LDB,kb),  where  kb  is
-        n   when  TRANSB==HplNoTrans, and  is  k  otherwise.   Before
-        entry with TRANSB==HplNoTrans,  the  leading  k by n  part of
-        the array  B must contain the matrix B, otherwise the leading
-        n  by  k  part of the array  B  must  contain  the matrix  B.
-
-
-LDB     (local input)                 const int
-        On entry, LDB  specifies the first dimension of B as declared
-        in the  calling (sub) program. When  TRANSB==HplNoTrans  then
-        LDB must be at least max(1,k), otherwise LDB must be at least
-        max(1,n).
-
-
-BETA    (local input)                 const double
-        On entry,  BETA  specifies the scalar  beta.   When  BETA  is
-        supplied  as  zero  then  the  elements of the matrix C  need
-        not be set on input.
-
-
-C       (local input/output)          double *
-        On entry,  C  is an array of dimension (LDC,n). Before entry,
-        the  leading m by n part  of  the  array  C  must contain the
-        matrix C,  except when beta is zero, in which case C need not
-        be set on entry. On exit, the array  C  is overwritten by the
-        m by n  matrix ( alpha*op( A )*op( B ) + beta*C ).
-
-
-LDC     (local input)                 const int
-        On entry, LDC  specifies the first dimension of C as declared
-        in  the   calling  (sub)  program.   LDC  must  be  at  least
-        max(1,m).
-
- -

Example

-#include "hpl.h"

-
-int main(int argc, char *argv[])
-{
-   double a[2*2], b[2*2], c[2*2];
-   a[0] = 1.0; a[1] = 2.0; a[2] = 3.0; a[3] = 3.0;
-   b[0] = 2.0; b[1] = 1.0; b[2] = 1.0; b[3] = 2.0;
-   c[0] = 4.0; c[1] = 3.0; c[2] = 2.0; c[3] = 1.0;
-   HPL_dgemm( HplColumnMajor, HplNoTrans, HplNoTrans,
-              2, 2, 2, 2.0, a, 2, b, 2, -1.0, c, 2 );
-   printf("  [%f,%f]\n", c[0], c[2]);
-   printf("c=[%f,%f]\n", c[1], c[3]);
-   exit(0); return(0);
-}
-
- -
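To make the column-major indexing concrete, here is a minimal reference sketch of the no-transpose case only (ORDER = HplColumnMajor, TRANSA = TRANSB = HplNoTrans). The name ref_dgemm_nn is hypothetical and the triple loop is illustrative; it is not how an optimized BLAS computes the product.

```c
/* Reference sketch of C := alpha * A * B + beta * C, column-major,
 * no transposition. ref_dgemm_nn is a hypothetical name, not part of HPL. */
void ref_dgemm_nn(int m, int n, int k, double alpha,
                  const double *a, int lda,
                  const double *b, int ldb,
                  double beta, double *c, int ldc)
{
    for (int j = 0; j < n; j++) {
        for (int i = 0; i < m; i++) {
            double sum = 0.0;
            /* element (i,l) of a column-major array lives at a[i + l*lda] */
            for (int l = 0; l < k; l++)
                sum += a[i + l * lda] * b[l + j * ldb];

            double *cij = &c[i + j * ldc];
            /* beta == 0 means C need not be set on entry, so do not read it */
            *cij = (beta == 0.0) ? alpha * sum
                                 : alpha * sum + beta * (*cij);
        }
    }
}
```

For a = {1, 2, 3, 3}, b = {2, 1, 1, 2}, c = {4, 3, 2, 1}, alpha = 2 and beta = -1 (all 2 by 2 with leading dimension 2), the sketch yields c = {6, 11, 12, 15}, that is, the matrix with rows [6, 12] and [11, 15].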

See Also

-HPL_dtrsm. - - - diff --git a/hpl/www/HPL_dgemv.html b/hpl/www/HPL_dgemv.html deleted file mode 100755 index 8936fef6b51bfcb0fed42e11ec473d1f2136c34e..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_dgemv.html +++ /dev/null @@ -1,146 +0,0 @@ - - -HPL_dgemv HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_dgemv y := beta * y + alpha * op(A) * x. - -

Synopsis

-#include "hpl.h"

-void -HPL_dgemv( -const enum HPL_ORDER -ORDER, -const enum HPL_TRANS -TRANS, -const int -M, -const int -N, -const double -ALPHA, -const double * -A, -const int -LDA, -const double * -X, -const int -INCX, -const double -BETA, -double * -Y, -const int -INCY -); - -

Description

-HPL_dgemv -performs one of the matrix-vector operations - - y := alpha * op( A ) * x + beta * y, - - where op( X ) is one of - - op( X ) = X or op( X ) = X^T. - -where alpha and beta are scalars, x and y are vectors and A is an m -by n matrix. - -

Arguments

-
-ORDER   (local input)                 const enum HPL_ORDER
-        On entry, ORDER  specifies the storage format of the operands
-        as follows:                                                  
-           ORDER = HplRowMajor,                                      
-           ORDER = HplColumnMajor.                                   
-
-
-TRANS   (local input)                 const enum HPL_TRANS
-        On entry,  TRANS  specifies the  operation to be performed as
-        follows:   
-           TRANS = HplNoTrans y := alpha*A  *x + beta*y,
-           TRANS = HplTrans   y := alpha*A^T*x + beta*y.
-
-
-M       (local input)                 const int
-        On entry,  M  specifies  the number of rows of  the matrix A.
-        M must be at least zero.
-
-
-N       (local input)                 const int
-        On entry, N  specifies the number of columns of the matrix A.
-        N must be at least zero.
-
-
-ALPHA   (local input)                 const double
-        On entry, ALPHA specifies the scalar alpha.   When  ALPHA  is
-        supplied as zero then  A and X  need not be set on input.
-
-
-A       (local input)                 const double *
-        On entry,  A  points  to an array of size equal to or greater
-        than LDA * n.  Before  entry, the leading m by n part  of the
-        array  A  must contain the matrix coefficients.
-
-
-LDA     (local input)                 const int
-        On entry,  LDA  specifies  the  leading  dimension  of  A  as
-        declared  in  the  calling  (sub) program.  LDA  must  be  at
-        least MAX(1,m).
-
-
-X       (local input)                 const double *
-        On entry,  X  is an incremented array of dimension  at  least
-        ( 1 + ( n - 1 ) * abs( INCX ) )  that  contains the vector x.
-
-
-INCX    (local input)                 const int
-        On entry, INCX specifies the increment for the elements of X.
-        INCX must not be zero.
-
-
-BETA    (local input)                 const double
-        On entry, BETA  specifies the scalar beta.    When  BETA   is
-        supplied as zero then  Y  need not be set on input.
-
-
-Y       (local input/output)          double *
-        On entry,  Y  is an incremented array of dimension  at  least
-        ( 1 + ( n - 1 ) * abs( INCY ) )  that  contains the vector y.
-        Before entry with BETA non-zero, the incremented array Y must
-        contain the vector  y.  On exit,  Y  is  overwritten  by  the
-        updated vector y.
-
-
-INCY    (local input)                 const int
-        On entry, INCY specifies the increment for the elements of Y.
-        INCY must not be zero.
-
- -

Example

-#include "hpl.h"

-
-int main(int argc, char *argv[])
-{
-   double a[2*2], x[2], y[2];
-   a[0] = 1.0; a[1] = 2.0; a[2] = 3.0; a[3] = 3.0;
-   x[0] = 2.0; x[1] = 1.0; y[0] = 1.0; y[1] = 2.0;
-   HPL_dgemv( HplColumnMajor, HplNoTrans, 2, 2, 2.0,
-              a, 2, x, 1, -1.0, y, 1 );
-   printf("y=[%f,%f]\n", y[0], y[1]);
-   exit(0); return(0);
-}
-
- -
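The two forms of the operation can be sketched as a plain reference loop. This sketch assumes column-major storage and unit increments for x and y; ref_dgemv and its integer trans flag are hypothetical stand-ins, not HPL's interface.

```c
/* Reference sketch of y := alpha * op(A) * x + beta * y, column-major,
 * unit strides. trans == 0 selects op(A) = A, otherwise op(A) = A^T.
 * ref_dgemv is a hypothetical name, not part of HPL. */
void ref_dgemv(int trans, int m, int n, double alpha,
               const double *a, int lda,
               const double *x, double beta, double *y)
{
    /* op(A) is leny by lenx: m by n without transposition, n by m with it */
    int leny = trans ? n : m;
    int lenx = trans ? m : n;

    for (int i = 0; i < leny; i++) {
        double sum = 0.0;
        for (int j = 0; j < lenx; j++) {
            /* A(i,j) is a[i + j*lda]; A^T(i,j) is A(j,i) */
            double aij = trans ? a[j + i * lda] : a[i + j * lda];
            sum += aij * x[j];
        }
        /* beta == 0 means y need not be set on entry, so do not read it */
        y[i] = (beta == 0.0) ? alpha * sum : alpha * sum + beta * y[i];
    }
}
```

For a = {1, 2, 3, 3} (2 by 2, leading dimension 2), x = [2, 1], y = [1, 2], alpha = 2 and beta = -1, the no-transpose case gives y = [9, 12] and the transpose case gives y = [7, 16].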

See Also

-HPL_dger, -HPL_dtrsv. - - - diff --git a/hpl/www/HPL_dger.html b/hpl/www/HPL_dger.html deleted file mode 100755 index 0896af2368ed83ae50e506c7ecb5e6746333a2ab..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_dger.html +++ /dev/null @@ -1,124 +0,0 @@ - - -HPL_dger HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_dger A := alpha * x * y^T + A. - -

Synopsis

-#include "hpl.h"

-void -HPL_dger( -const enum HPL_ORDER -ORDER, -const int -M, -const int -N, -const double -ALPHA, -const double * -X, -const int -INCX, -double * -Y, -const int -INCY, -double * -A, -const int -LDA -); - -

Description

-HPL_dger -performs the rank 1 operation - - A := alpha * x * y^T + A, - -where alpha is a scalar, x is an m-element vector, y is an n-element -vector and A is an m by n matrix. - -

Arguments

-
-ORDER   (local input)                 const enum HPL_ORDER
-        On entry, ORDER  specifies the storage format of the operands
-        as follows:                                                  
-           ORDER = HplRowMajor,                                      
-           ORDER = HplColumnMajor.                                   
-
-
-M       (local input)                 const int
-        On entry,  M  specifies  the number of rows of  the matrix A.
-        M must be at least zero.
-
-
-N       (local input)                 const int
-        On entry, N  specifies the number of columns of the matrix A.
-        N must be at least zero.
-
-
-ALPHA   (local input)                 const double
-        On entry, ALPHA specifies the scalar alpha.   When  ALPHA  is
-        supplied as zero then  X and Y  need not be set on input.
-
-
-X       (local input)                 const double *
-        On entry,  X  is an incremented array of dimension  at  least
-        ( 1 + ( m - 1 ) * abs( INCX ) )  that  contains the vector x.
-
-
-INCX    (local input)                 const int
-        On entry, INCX specifies the increment for the elements of X.
-        INCX must not be zero.
-
-
-Y       (local input)                 double *
-        On entry,  Y  is an incremented array of dimension  at  least
-        ( 1 + ( n - 1 ) * abs( INCY ) )  that  contains the vector y.
-
-
-INCY    (local input)                 const int
-        On entry, INCY specifies the increment for the elements of Y.
-        INCY must not be zero.
-
-
-A       (local input/output)          double *
-        On entry,  A  points  to an array of size equal to or greater
-        than LDA * n.  Before  entry, the leading m by n part  of the
-        array  A  must contain the matrix coefficients. On exit, A is
-        overwritten by the updated matrix.
-
-
-LDA     (local input)                 const int
-        On entry,  LDA  specifies  the  leading  dimension  of  A  as
-        declared  in  the  calling  (sub) program.  LDA  must  be  at
-        least MAX(1,m).
-
- -

Example

-#include "hpl.h"

-
-int main(int argc, char *argv[])
-{
-   double a[2*2], x[2], y[2];
-   a[0] = 1.0; a[1] = 2.0; a[2] = 3.0; a[3] = 3.0;
-   x[0] = 2.0; x[1] = 1.0; y[0] = 1.0; y[1] = 2.0;
-   HPL_dger( HplColumnMajor, 2, 2, 2.0, x, 1, y, 1,
-             a, 2 );
-   printf("  [%f,%f]\n", a[0], a[2]);
-   printf("a=[%f,%f]\n", a[1], a[3]);
-   exit(0); return(0);
-}
-
- -
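The rank 1 update touches every entry of A exactly once, which a short reference loop makes visible. This is a sketch under the assumptions of column-major storage and unit increments; ref_dger is a hypothetical name, not part of HPL.

```c
/* Reference sketch of A := alpha * x * y^T + A, column-major,
 * unit strides. ref_dger is a hypothetical name, not part of HPL. */
void ref_dger(int m, int n, double alpha,
              const double *x, const double *y,
              double *a, int lda)
{
    if (alpha == 0.0)
        return;                      /* A is left unchanged */

    for (int j = 0; j < n; j++) {
        double t = alpha * y[j];     /* scale once per column */
        for (int i = 0; i < m; i++)
            a[i + j * lda] += x[i] * t;
    }
}
```

For a = {1, 2, 3, 3} (2 by 2, leading dimension 2), x = [2, 1], y = [1, 2] and alpha = 2, the array becomes a = {5, 4, 11, 7}, that is, the matrix with rows [5, 11] and [4, 7].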

See Also

-HPL_dgemv, -HPL_dtrsv. - - - diff --git a/hpl/www/HPL_dlacpy.html b/hpl/www/HPL_dlacpy.html deleted file mode 100755 index 5d0d98befce348bc09e8b6de0997b63d0884ad21..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_dlacpy.html +++ /dev/null @@ -1,84 +0,0 @@ - - -HPL_dlacpy HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_dlacpy B := A. - -

Synopsis

-#include "hpl.h"

-void -HPL_dlacpy( -const int -M, -const int -N, -const double * -A, -const int -LDA, -double * -B, -const int -LDB -); - -

Description

-HPL_dlacpy -copies an array A into an array B. - -

Arguments

-
-M       (local input)                 const int
-        On entry,  M specifies the number of rows of the arrays A and
-        B. M must be at least zero.
-
-
-N       (local input)                 const int
-        On entry,  N specifies  the number of columns of the arrays A
-        and B. N must be at least zero.
-
-
-A       (local input)                 const double *
-        On entry, A points to an array of dimension (LDA,N).
-
-
-LDA     (local input)                 const int
-        On entry, LDA specifies the leading dimension of the array A.
-        LDA must be at least MAX(1,M).
-
-
-B       (local output)                double *
-        On entry, B points to an array of dimension (LDB,N). On exit,
-        B is overwritten with A.
-
-
-LDB     (local input)                 const int
-        On entry, LDB specifies the leading dimension of the array B.
-        LDB must be at least MAX(1,M).
-
- -

Example

-#include "hpl.h"

-
-int main(int argc, char *argv[])
-{
-   double a[2*2], b[2*2];
-   a[0] = 1.0; a[1] = 3.0; a[2] = 2.0; a[3] = 4.0;
-   HPL_dlacpy( 2, 2, a, 2, b, 2 );
-   printf("  [%f,%f]\n", b[0], b[2]);
-   printf("b=[%f,%f]\n", b[1], b[3]);
-   exit(0);
-   return(0);
-}
-
- -

See Also

-HPL_dlatcpy. - - - diff --git a/hpl/www/HPL_dlamch.html b/hpl/www/HPL_dlamch.html deleted file mode 100755 index 0555e5d56924e464c55da91691437a28b8fb0e50..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_dlamch.html +++ /dev/null @@ -1,86 +0,0 @@ - - -HPL_dlamch HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_dlamch determines machine-specific arithmetic constants. - -

Synopsis

-#include "hpl.h"

-double -HPL_dlamch( -const HPL_T_MACH -CMACH -); - -

Description

-HPL_dlamch -determines machine-specific arithmetic constants such as -the relative machine precision (eps), the safe minimum (sfmin) such -that 1 / sfmin does not overflow, the base of the machine (base), the -precision (prec), the number of (base) digits in the mantissa (t), -whether rounding occurs in addition (rnd=1.0 and 0.0 otherwise), the -minimum exponent before (gradual) underflow (emin), the underflow -threshold (rmin) base**(emin-1), the largest exponent before overflow -(emax), the overflow threshold (rmax) (base**emax)*(1-eps). - -

Arguments

-
-CMACH   (local input)                 const HPL_T_MACH
-        Specifies the value to be returned by HPL_dlamch             
-           = HPL_MACH_EPS,   HPL_dlamch := eps (default)             
-           = HPL_MACH_SFMIN, HPL_dlamch := sfmin                     
-           = HPL_MACH_BASE,  HPL_dlamch := base                      
-           = HPL_MACH_PREC,  HPL_dlamch := eps*base                  
-           = HPL_MACH_MLEN,  HPL_dlamch := t                         
-           = HPL_MACH_RND,   HPL_dlamch := rnd                       
-           = HPL_MACH_EMIN,  HPL_dlamch := emin                      
-           = HPL_MACH_RMIN,  HPL_dlamch := rmin                      
-           = HPL_MACH_EMAX,  HPL_dlamch := emax                      
-           = HPL_MACH_RMAX,  HPL_dlamch := rmax                      
-         
-        where                                                        
-         
-           eps   = relative machine precision,                       
-           sfmin = safe minimum,                                     
-           base  = base of the machine,                              
-           prec  = eps*base,                                         
-           t     = number of digits in the mantissa,                 
-           rnd   = 1.0 if rounding occurs in addition,               
-           emin  = minimum exponent before underflow,                
-           rmin  = underflow threshold,                              
-           emax  = largest exponent before overflow,                 
-           rmax  = overflow threshold.
-
- -

Example

-#include "hpl.h"

-
-int main(int argc, char *argv[])
-{
-   double eps;
-   eps = HPL_dlamch( HPL_MACH_EPS );
-   printf("eps=%18.8e\n", eps);
-   exit(0); return(0);
-}
-
- -

References

-This function has been manually translated from the Fortran 77 LAPACK -auxiliary function dlamch.f (version 2.0 -- 1992), that was itself -based on the function ENVRON by Malcolm and incorporated suggestions -by Gentleman and Marovich. See - -Malcolm M. A., Algorithms to reveal properties of floating-point -arithmetic., Comms. of the ACM, 15, 949-951 (1972). - -Gentleman W. M. and Marovich S. B., More on algorithms that reveal -properties of floating point arithmetic units., Comms. of the ACM, -17, 276-277 (1974). - - - diff --git a/hpl/www/HPL_dlange.html b/hpl/www/HPL_dlange.html deleted file mode 100755 index 0e8817021c22dca93e0520453d0ddff3aaee81c2..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_dlange.html +++ /dev/null @@ -1,86 +0,0 @@ - - -HPL_dlange HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_dlange Compute ||A||. - -

Synopsis

-#include "hpl.h"

-double -HPL_dlange( -const HPL_T_NORM -NORM, -const int -M, -const int -N, -const double * -A, -const int -LDA -); - -

Description

-HPL_dlange -returns the value of the one norm, or the infinity norm, -or the element of largest absolute value of a matrix A: - - max(abs(A(i,j))) when NORM = HPL_NORM_A, - norm1(A), when NORM = HPL_NORM_1, - normI(A), when NORM = HPL_NORM_I, - -where norm1 denotes the one norm of a matrix (maximum column sum) and -normI denotes the infinity norm of a matrix (maximum row sum). Note -that max(abs(A(i,j))) is not a matrix norm. - -

Arguments

-
-NORM    (local input)                 const HPL_T_NORM
-        On entry,  NORM  specifies  the  value to be returned by this
-        function as described above.
-
-
-M       (local input)                 const int
-        On entry,  M  specifies  the number  of rows of the matrix A.
-        M must be at least zero.
-
-
-N       (local input)                 const int
-        On entry,  N specifies the number of columns of the matrix A.
-        N must be at least zero.
-
-
-A       (local input)                 const double *
-        On entry,  A  points to an  array of dimension  (LDA,N), that
-        contains the matrix A.
-
-
-LDA     (local input)                 const int
-        On entry, LDA specifies the leading dimension of the array A.
-        LDA must be at least max(1,M).
-
- -

Example

-#include "hpl.h"

-
-int main(int argc, char *argv[])
-{
-   double a[2*2], norm;
-   a[0] = 1.0; a[1] = 3.0; a[2] = 2.0; a[3] = 4.0;
-   norm = HPL_dlange( HPL_NORM_I, 2, 2, a, 2 );
-   printf("norm=%f\n", norm);
-   exit(0); return(0);
-}
-
- -
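The three quantities differ only in how the absolute values are combined, as the following reference sketch shows. It assumes column-major storage; ref_dlange and the REF_NORM_* constants are hypothetical stand-ins for the HPL names and carry no claim about HPL's own implementation.

```c
/* Hypothetical selector mirroring HPL_NORM_A, HPL_NORM_1, HPL_NORM_I. */
typedef enum { REF_NORM_A, REF_NORM_1, REF_NORM_I } ref_norm_t;

static double ref_abs(double v) { return v < 0.0 ? -v : v; }

/* Reference sketch: largest absolute entry, one norm (maximum column sum)
 * or infinity norm (maximum row sum) of a column-major m by n matrix. */
double ref_dlange(ref_norm_t norm, int m, int n, const double *a, int lda)
{
    double result = 0.0;

    if (m <= 0 || n <= 0)
        return 0.0;

    if (norm == REF_NORM_A) {
        for (int j = 0; j < n; j++)
            for (int i = 0; i < m; i++)
                if (ref_abs(a[i + j * lda]) > result)
                    result = ref_abs(a[i + j * lda]);
    } else if (norm == REF_NORM_1) {
        for (int j = 0; j < n; j++) {          /* sum down each column */
            double s = 0.0;
            for (int i = 0; i < m; i++)
                s += ref_abs(a[i + j * lda]);
            if (s > result) result = s;
        }
    } else {
        for (int i = 0; i < m; i++) {          /* sum along each row */
            double s = 0.0;
            for (int j = 0; j < n; j++)
                s += ref_abs(a[i + j * lda]);
            if (s > result) result = s;
        }
    }
    return result;
}
```

For a = {1, 3, 2, 4} (the matrix with rows [1, 2] and [3, 4]), the sketch returns 4 for the largest absolute entry, 6 for the one norm and 7 for the infinity norm.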

See Also

-HPL_dlaprnt, -HPL_fprintf. - - - diff --git a/hpl/www/HPL_dlaprnt.html b/hpl/www/HPL_dlaprnt.html deleted file mode 100755 index dd613019641819a083c83234dbe4ebd75b323064..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_dlaprnt.html +++ /dev/null @@ -1,86 +0,0 @@ - - -HPL_dlaprnt HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_dlaprnt Print the matrix A. - -

Synopsis

-#include "hpl.h"

-void -HPL_dlaprnt( -const int -M, -const int -N, -double * -A, -const int -IA, -const int -JA, -const int -LDA, -const char * -CMATNM -); - -

Description

-HPL_dlaprnt -prints to standard error an M-by-N matrix A. - -

Arguments

-
-M       (local input)                 const int
-        On entry,  M  specifies the number of rows of A. M must be at
-        least zero.
-
-
-N       (local input)                 const int
-        On entry,  N  specifies the number of columns of A. N must be
-        at least zero.
-
-
-A       (local input)                 double *
-        On entry, A  points to an array of dimension (LDA,N).
-
-
-IA      (local input)                 const int
-        On entry, IA specifies the starting row index to be printed.
-
-
-JA      (local input)                 const int
-        On entry,  JA  specifies  the  starting  column index  to be
-        printed.
-
-
-LDA     (local input)                 const int
-        On entry, LDA specifies the leading dimension of the array A.
-        LDA must be at least max(1,M).
-
-
-CMATNM  (local input)                 const char *
-        On entry, CMATNM is the name of the matrix to be printed.
-
- -

Example

-#include "hpl.h"

-
-int main(int argc, char *argv[])
-{
-   double a[2*2];
-   a[0] = 1.0; a[1] = 3.0; a[2] = 2.0; a[3] = 4.0;
-   HPL_dlaprnt( 2, 2, a, 0, 0, 2, "A" );
-   exit(0); return(0);
-}
-
- -

See Also

-HPL_fprintf. - - - diff --git a/hpl/www/HPL_dlaswp00N.html b/hpl/www/HPL_dlaswp00N.html deleted file mode 100755 index eb66cd50a01438e85867801a99a1d7c323fb5feb..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_dlaswp00N.html +++ /dev/null @@ -1,78 +0,0 @@ - - -HPL_dlaswp00N HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_dlaswp00N performs a series of row interchanges. - -

Synopsis

-#include "hpl.h"

-void -HPL_dlaswp00N( -const int -M, -const int -N, -double * -A, -const int -LDA, -const int * -IPIV -); - -

Description

-HPL_dlaswp00N -performs a series of local row interchanges on a matrix -A. One row interchange is initiated for rows 0 through M-1 of A. - -

Arguments

-
-M       (local input)                 const int
-        On entry, M specifies the number of rows of the array A to be
-        interchanged. M must be at least zero.
-
-
-N       (local input)                 const int
-        On entry, N  specifies  the number of columns of the array A.
-        N must be at least zero.
-
-
-A       (local input/output)          double *
-        On entry, A  points to an array of dimension (LDA,N) to which
-        the row interchanges will be  applied.  On exit, the permuted
-        matrix.
-
-
-LDA     (local input)                 const int
-        On entry, LDA specifies the leading dimension of the array A.
-        LDA must be at least MAX(1,M).
-
-
-IPIV    (local input)                 const int *
-        On entry,  IPIV  is  an  array of size  M  that  contains the
-        pivoting  information.  For  k  in [0..M),  IPIV[k]=IROFF + l
-        implies that local rows k and l are to be interchanged.
-
- -
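
The interchange described above can be illustrated with a minimal reference sketch. It is not the HPL implementation: the name sketch_laswp00N is hypothetical, column-major storage is assumed, and IROFF is taken to be 0 so that IPIV[k] is used directly as a local row index.

```c
#include <assert.h>

/* Hypothetical reference sketch, not the HPL source.
 * Swaps local row k of the column-major array A with row IPIV[k],
 * for k = 0 .. M-1. Assumption: IROFF == 0. */
void sketch_laswp00N(int M, int N, double *A, int LDA, const int *IPIV)
{
    assert(M >= 0 && N >= 0 && LDA >= (M > 1 ? M : 1));
    for (int k = 0; k < M; k++) {
        const int l = IPIV[k];
        if (l == k) continue;            /* row stays in place */
        for (int j = 0; j < N; j++) {    /* walk along both rows */
            const double t = A[k + j * LDA];
            A[k + j * LDA] = A[l + j * LDA];
            A[l + j * LDA] = t;
        }
    }
}
```

The swaps are applied in increasing k, so a later interchange sees the effect of earlier ones.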

See Also

-HPL_dlaswp00N, -HPL_dlaswp10N, -HPL_dlaswp01N, -HPL_dlaswp01T, -HPL_dlaswp02N, -HPL_dlaswp03N, -HPL_dlaswp03T, -HPL_dlaswp04N, -HPL_dlaswp04T, -HPL_dlaswp05N, -HPL_dlaswp05T, -HPL_dlaswp06N, -HPL_dlaswp06T. - - - diff --git a/hpl/www/HPL_dlaswp01N.html b/hpl/www/HPL_dlaswp01N.html deleted file mode 100755 index 7b6863aa86ebde1cdbb5097853dba95e2f4a2c23..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_dlaswp01N.html +++ /dev/null @@ -1,109 +0,0 @@ - - -HPL_dlaswp01N HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_dlaswp01N copies rows of A into itself and into U. - -

Synopsis

-#include "hpl.h"

-void -HPL_dlaswp01N( -const int -M, -const int -N, -double * -A, -const int -LDA, -double * -U, -const int -LDU, -const int * -LINDXA, -const int * -LINDXAU -); - -

Description

-HPL_dlaswp01N -copies scattered rows of A into itself and into an -array U. The row offsets in A of the source rows are specified by -LINDXA. The destinations of those rows are specified by LINDXAU. A -positive value of LINDXAU indicates that the array destination is U, -and A otherwise. - -

Arguments

-
-M       (local input)                 const int
-        On entry, M  specifies the number of rows of A that should be
-        moved within A or copied into U. M must be at least zero.
-
-
-N       (local input)                 const int
-        On entry, N  specifies the length of rows of A that should be
-        moved within A or copied into U. N must be at least zero.
-
-
-A       (local input/output)          double *
-        On entry, A points to an array of dimension (LDA,N). The rows
-        of this array specified by LINDXA should be moved within A or
-        copied into U.
-
-
-LDA     (local input)                 const int
-        On entry, LDA specifies the leading dimension of the array A.
-        LDA must be at least MAX(1,M).
-
-
-U       (local input/output)          double *
-        On entry, U points to an array of dimension (LDU,N). The rows
-        of A specified by  LINDXA  are copied within this array  U at
-        the positions indicated by positive values of LINDXAU.
-
-
-LDU     (local input)                 const int
-        On entry, LDU specifies the leading dimension of the array U.
-        LDU must be at least MAX(1,M).
-
-
-LINDXA  (local input)                 const int *
-        On entry, LINDXA is an array of dimension M that contains the
-        local  row indexes  of  A  that should be moved within  A  or
-        copied into U.
-
-
-LINDXAU (local input)                 const int *
-        On entry, LINDXAU  is an array of dimension  M that  contains
-        the local  row indexes of  U  to which the rows of  A  should
-        be copied.  This array also contains the local row offsets in
-        A where some of the rows of A should be moved to.  A positive
-        value of  LINDXAU[i]  indicates that the row  LINDXA[i]  of A
-        should be copied into U at the position LINDXAU[i]; otherwise
-        the row  LINDXA[i]  of  A  should be moved  to  the  position
-        -LINDXAU[i] within A.
-
- -
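
The dispatch on the sign of LINDXAU[i] can be sketched as follows. This is an illustration only, not the HPL source: the name sketch_laswp01N is hypothetical, storage is assumed column-major, and a value of zero is assumed to select U (the text above only says "positive").

```c
#include <assert.h>

/* Hypothetical reference sketch, not the HPL source.
 * Row LINDXA[i] of A goes to row LINDXAU[i] of U when LINDXAU[i] >= 0
 * (assumption: zero counts as a U destination), and to row -LINDXAU[i]
 * of A otherwise. */
void sketch_laswp01N(int M, int N, double *A, int LDA, double *U, int LDU,
                     const int *LINDXA, const int *LINDXAU)
{
    assert(M >= 0 && N >= 0);
    for (int i = 0; i < M; i++) {
        const int src = LINDXA[i];
        const int dst = LINDXAU[i];
        for (int j = 0; j < N; j++) {
            if (dst >= 0)
                U[dst + j * LDU] = A[src + j * LDA];   /* copy into U */
            else
                A[-dst + j * LDA] = A[src + j * LDA];  /* move within A */
        }
    }
}
```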

See Also

-HPL_dlaswp00N, -HPL_dlaswp10N, -HPL_dlaswp01N, -HPL_dlaswp01T, -HPL_dlaswp02N, -HPL_dlaswp03N, -HPL_dlaswp03T, -HPL_dlaswp04N, -HPL_dlaswp04T, -HPL_dlaswp05N, -HPL_dlaswp05T, -HPL_dlaswp06N, -HPL_dlaswp06T. - - - diff --git a/hpl/www/HPL_dlaswp01T.html b/hpl/www/HPL_dlaswp01T.html deleted file mode 100755 index 596bd9e1e3e6bf4f6a688e9ee53dbb630fa4f127..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_dlaswp01T.html +++ /dev/null @@ -1,110 +0,0 @@ - - -HPL_dlaswp01T HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_dlaswp01T copies rows of A into itself and into U. - -

Synopsis

-#include "hpl.h"

-void -HPL_dlaswp01T( -const int -M, -const int -N, -double * -A, -const int -LDA, -double * -U, -const int -LDU, -const int * -LINDXA, -const int * -LINDXAU -); - -

Description

-HPL_dlaswp01T -copies scattered rows of A into itself and into an -array U. The row offsets in A of the source rows are specified by -LINDXA. The destinations of those rows are specified by LINDXAU. A -positive value of LINDXAU indicates that the array destination is U, -and A otherwise. Rows of A are stored as columns in U. - -

Arguments

-
-M       (local input)                 const int
-        On entry, M  specifies the number of rows of A that should be
-        moved within A or copied into U. M must be at least zero.
-
-
-N       (local input)                 const int
-        On entry, N  specifies the length of rows of A that should be
-        moved within A or copied into U. N must be at least zero.
-
-
-A       (local input/output)          double *
-        On entry, A points to an array of dimension (LDA,N). The rows
-        of this array specified by LINDXA should be moved within A or
-        copied into U.
-
-
-LDA     (local input)                 const int
-        On entry, LDA specifies the leading dimension of the array A.
-        LDA must be at least MAX(1,M).
-
-
-U       (local input/output)          double *
-        On entry, U points to an array of dimension (LDU,M). The rows
-        of A specified by  LINDXA  are copied within this array  U at
-        the  positions indicated by positive values of LINDXAU.  The
-        rows of A are stored as columns in U.
-
-
-LDU     (local input)                 const int
-        On entry, LDU specifies the leading dimension of the array U.
-        LDU must be at least MAX(1,N).
-
-
-LINDXA  (local input)                 const int *
-        On entry, LINDXA is an array of dimension M that contains the
-        local  row indexes  of  A  that should be moved within  A  or
-        copied into U.
-
-
-LINDXAU (local input)                 const int *
-        On entry, LINDXAU  is an array of dimension  M that  contains
-        the local  row indexes of  U  to which the rows of  A  should
-        be copied.  This array also contains the local row offsets in
-        A where some of the rows of A should be moved to.  A positive
-        value of  LINDXAU[i]  indicates that the row  LINDXA[i]  of A
-        should be copied into U at the position LINDXAU[i]; otherwise
-        the row  LINDXA[i]  of  A  should be moved  to  the  position
-        -LINDXAU[i] within A.
-
- -
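
The transposed variant differs only in how U is indexed. The sketch below is illustrative and not the HPL source: sketch_laswp01T is a hypothetical name, storage is assumed column-major, and zero in LINDXAU is assumed to select U.

```c
#include <assert.h>

/* Hypothetical reference sketch, not the HPL source.
 * Same dispatch as the non-transposed routine, but a row of A that
 * goes to U is stored as column LINDXAU[i] of U (LDU >= N). */
void sketch_laswp01T(int M, int N, double *A, int LDA, double *U, int LDU,
                     const int *LINDXA, const int *LINDXAU)
{
    assert(M >= 0 && N >= 0 && LDU >= (N > 1 ? N : 1));
    for (int i = 0; i < M; i++) {
        const int src = LINDXA[i];
        const int dst = LINDXAU[i];
        for (int j = 0; j < N; j++) {
            if (dst >= 0)
                U[j + dst * LDU] = A[src + j * LDA];   /* row -> column */
            else
                A[-dst + j * LDA] = A[src + j * LDA];  /* move within A */
        }
    }
}
```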

See Also

-HPL_dlaswp00N, -HPL_dlaswp10N, -HPL_dlaswp01N, -HPL_dlaswp01T, -HPL_dlaswp02N, -HPL_dlaswp03N, -HPL_dlaswp03T, -HPL_dlaswp04N, -HPL_dlaswp04T, -HPL_dlaswp05N, -HPL_dlaswp05T, -HPL_dlaswp06N, -HPL_dlaswp06T. - - - diff --git a/hpl/www/HPL_dlaswp02N.html b/hpl/www/HPL_dlaswp02N.html deleted file mode 100755 index 545c1594566c797613f7d1f758bea8438b0247bc..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_dlaswp02N.html +++ /dev/null @@ -1,107 +0,0 @@ - - -HPL_dlaswp02N HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_dlaswp02N pack rows of A into columns of W. - -

Synopsis

-#include "hpl.h"

-void -HPL_dlaswp02N( -const int -M, -const int -N, -const double * -A, -const int -LDA, -double * -W0, -double * -W, -const int -LDW, -const int * -LINDXA, -const int * -LINDXAU -); - -

Description

-HPL_dlaswp02N -packs scattered rows of an array A into workspace W. -The row offsets in A are specified by LINDXA. - -

Arguments

-
-M       (local input)                 const int
-        On entry, M  specifies the number of rows of A that should be
-        copied into W. M must be at least zero.
-
-
-N       (local input)                 const int
-        On entry, N  specifies the length of rows of A that should be
-        copied into W. N must be at least zero.
-
-
-A       (local input)                 const double *
-        On entry, A points to an array of dimension (LDA,N). The rows
-        of this array specified by LINDXA should be copied into W.
-
-
-LDA     (local input)                 const int
-        On entry, LDA specifies the leading dimension of the array A.
-        LDA must be at least MAX(1,M).
-
-
-W0      (local input/output)          double *
-        On exit,  W0  is  an array of size (M-1)*LDW+1, that contains
-        the destination offset  in U where the columns of W should be
-        copied.
-
-
-W       (local output)                double *
-        On entry, W  is an array of size (LDW,M). On exit, W contains
-        the  rows LINDXA[i] for i in [0..M) of A stored  contiguously
-        in W(:,i).
-
-
-LDW     (local input)                 const int
-        On entry, LDW specifies the leading dimension of the array W.
-        LDW must be at least MAX(1,N+1).
-
-
-LINDXA  (local input)                 const int *
-        On entry, LINDXA is an array of dimension M that contains the
-        local row indexes of A that should be copied into W.
-
-
-LINDXAU (local input)                 const int *
-        On entry, LINDXAU  is an array of dimension M  that  contains
-        the local  row indexes of  U that should be copied into A and
-        replaced by the rows of W.
-
- -
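
A minimal sketch of the packing step follows. It is not the HPL source. The name sketch_laswp02N is hypothetical, and two details are assumptions: that the destination index LINDXAU[i] is stored as a double in W0[i*LDW], and that W typically points one element past W0, which would explain why LDW must be at least N+1.

```c
#include <assert.h>

/* Hypothetical reference sketch, not the HPL source.
 * Packs row LINDXA[i] of the column-major array A into column i of W
 * and records its destination index in W0[i*LDW] (stored as a double). */
void sketch_laswp02N(int M, int N, const double *A, int LDA,
                     double *W0, double *W, int LDW,
                     const int *LINDXA, const int *LINDXAU)
{
    assert(M >= 0 && N >= 0 && LDW >= N + 1);
    for (int i = 0; i < M; i++) {
        W0[i * LDW] = (double)LINDXAU[i];        /* header slot */
        for (int j = 0; j < N; j++)
            W[j + i * LDW] = A[LINDXA[i] + j * LDA];
    }
}
```

With W = W0 + 1, each column of the workspace holds one header value followed by N row entries, so a single buffer can carry both the data and its destination.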

See Also

-HPL_dlaswp00N, -HPL_dlaswp10N, -HPL_dlaswp01N, -HPL_dlaswp01T, -HPL_dlaswp02N, -HPL_dlaswp03N, -HPL_dlaswp03T, -HPL_dlaswp04N, -HPL_dlaswp04T, -HPL_dlaswp05N, -HPL_dlaswp05T, -HPL_dlaswp06N, -HPL_dlaswp06T. - - - diff --git a/hpl/www/HPL_dlaswp03N.html b/hpl/www/HPL_dlaswp03N.html deleted file mode 100755 index 08d93e4e2617e8f28908322ef8b37ba96f89566d..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_dlaswp03N.html +++ /dev/null @@ -1,95 +0,0 @@ - - -HPL_dlaswp03N HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_dlaswp03N copy columns of W into rows of U. - -

Synopsis

-#include "hpl.h"

-void -HPL_dlaswp03N( -const int -M, -const int -N, -double * -U, -const int -LDU, -const double * -W0, -const double * -W, -const int -LDW -); - -

Description

-HPL_dlaswp03N -copies columns of W into rows of an array U. The -destination in U of these columns contained in W is stored within W0. - -

Arguments

-
-M       (local input)                 const int
-        On entry, M  specifies  the  number  of columns of  W  stored
-        contiguously that should be copied into U. M must be at least
-        zero.
-
-
-N       (local input)                 const int
-        On entry,  N  specifies  the  length of columns of  W  stored
-        contiguously that should be copied into U. N must be at least
-        zero.
-
-
-U       (local input/output)          double *
-        On entry, U points to an array of dimension (LDU,N).  Columns
-        of W are copied as rows within this array U at  the positions
-        specified in W0.
-
-
-LDU     (local input)                 const int
-        On entry, LDU specifies the leading dimension of the array U.
-        LDU must be at least MAX(1,M).
-
-
-W0      (local input)                 const double *
-        On entry,  W0  is an array of size (M-1)*LDW+1, that contains
-        the destination offset  in U where the columns of W should be
-        copied.
-
-
-W       (local input)                 const double *
-        On entry, W  is an array of size (LDW,M),  that contains data
-        to be copied into U. For i in [0..M),  entries W(:,i)  should
-        be copied into the row or column W0(i*LDW) of U.
-
-
-LDW     (local input)                 const int
-        On entry, LDW specifies the leading dimension of the array W.
-        LDW must be at least MAX(1,N+1).
-
- -
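
The unpacking can be sketched as below. This is an illustration, not the HPL source: sketch_laswp03N is a hypothetical name, storage is assumed column-major, and the destination index is assumed to be stored as a double in W0[i*LDW].

```c
#include <assert.h>

/* Hypothetical reference sketch, not the HPL source.
 * Column i of W is copied into the row of U whose index is stored
 * (as a double) in W0[i*LDW]. */
void sketch_laswp03N(int M, int N, double *U, int LDU,
                     const double *W0, const double *W, int LDW)
{
    assert(M >= 0 && N >= 0 && LDW >= N + 1);
    for (int i = 0; i < M; i++) {
        const int dst = (int)W0[i * LDW];        /* destination row in U */
        for (int j = 0; j < N; j++)
            U[dst + j * LDU] = W[j + i * LDW];
    }
}
```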

See Also

-HPL_dlaswp00N, -HPL_dlaswp10N, -HPL_dlaswp01N, -HPL_dlaswp01T, -HPL_dlaswp02N, -HPL_dlaswp03N, -HPL_dlaswp03T, -HPL_dlaswp04N, -HPL_dlaswp04T, -HPL_dlaswp05N, -HPL_dlaswp05T, -HPL_dlaswp06N, -HPL_dlaswp06T. - - - diff --git a/hpl/www/HPL_dlaswp03T.html b/hpl/www/HPL_dlaswp03T.html deleted file mode 100755 index fe91fcda3e7549da86c9b1261de9a8727b11766c..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_dlaswp03T.html +++ /dev/null @@ -1,95 +0,0 @@ - - -HPL_dlaswp03T HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_dlaswp03T copy columns of W into U. - -

Synopsis

-#include "hpl.h"

-void -HPL_dlaswp03T( -const int -M, -const int -N, -double * -U, -const int -LDU, -const double * -W0, -const double * -W, -const int -LDW -); - -

Description

-HPL_dlaswp03T -copies columns of W into an array U. The destination -in U of these columns contained in W is stored within W0. - -

Arguments

-
-M       (local input)                 const int
-        On entry, M  specifies  the  number  of columns of  W  stored
-        contiguously that should be copied into U. M must be at least
-        zero.
-
-
-N       (local input)                 const int
-        On entry,  N  specifies  the  length of columns of  W  stored
-        contiguously that should be copied into U. N must be at least
-        zero.
-
-
-U       (local input/output)          double *
-        On entry, U points to an array of dimension (LDU,M).  Columns
-        of W are copied within the array U at the positions specified
-        in W0.
-
-
-LDU     (local input)                 const int
-        On entry, LDU specifies the leading dimension of the array U.
-        LDU must be at least MAX(1,N).
-
-
-W0      (local input)                 const double *
-        On entry,  W0  is an array of size (M-1)*LDW+1, that contains
-        the destination offset  in U where the columns of W should be
-        copied.
-
-
-W       (local input)                 const double *
-        On entry, W  is an array of size (LDW,M),  that contains data
-        to be copied into U. For i in [0..M),  entries W(:,i)  should
-        be copied into the row or column W0(i*LDW) of U.
-
-
-LDW     (local input)                 const int
-        On entry, LDW specifies the leading dimension of the array W.
-        LDW must be at least MAX(1,N+1).
-
- -
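
For comparison with the non-transposed routine, the sketch below copies each column of W into a column of U. It is illustrative only: sketch_laswp03T is a hypothetical name, and the destination index is assumed to be stored as a double in W0[i*LDW].

```c
#include <assert.h>

/* Hypothetical reference sketch, not the HPL source.
 * Column i of W is copied into the column of U whose index is stored
 * (as a double) in W0[i*LDW]; U has leading dimension LDU >= N. */
void sketch_laswp03T(int M, int N, double *U, int LDU,
                     const double *W0, const double *W, int LDW)
{
    assert(M >= 0 && N >= 0 && LDW >= N + 1 && LDU >= (N > 1 ? N : 1));
    for (int i = 0; i < M; i++) {
        const int dst = (int)W0[i * LDW];        /* destination column in U */
        for (int j = 0; j < N; j++)
            U[j + dst * LDU] = W[j + i * LDW];   /* contiguous copy */
    }
}
```

Both source and destination are contiguous here, which is the usual reason for keeping U in transposed form.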

See Also

-HPL_dlaswp00N, -HPL_dlaswp10N, -HPL_dlaswp01N, -HPL_dlaswp01T, -HPL_dlaswp02N, -HPL_dlaswp03N, -HPL_dlaswp03T, -HPL_dlaswp04N, -HPL_dlaswp04T, -HPL_dlaswp05N, -HPL_dlaswp05T, -HPL_dlaswp06N, -HPL_dlaswp06T. - - - diff --git a/hpl/www/HPL_dlaswp04N.html b/hpl/www/HPL_dlaswp04N.html deleted file mode 100755 index 28f1ac74ff4f6ad8b8e28726bcfb208a69e1dba2..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_dlaswp04N.html +++ /dev/null @@ -1,131 +0,0 @@ - - -HPL_dlaswp04N HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_dlaswp04N copy rows of U in A and replace them with columns of W. - -

Synopsis

-#include "hpl.h"

-void -HPL_dlaswp04N( -const int -M0, -const int -M1, -const int -N, -double * -U, -const int -LDU, -double * -A, -const int -LDA, -const double * -W0, -const double * -W, -const int -LDW, -const int * -LINDXA, -const int * -LINDXAU -); - -

Description

-HPL_dlaswp04N -copies M0 rows of U into A and replaces those rows of U -with columns of W. In addition M1 - M0 columns of W are copied into -rows of U. - -

Arguments

-
-M0      (local input)                 const int
-        On entry, M0 specifies the number of rows of U that should be
-        copied into  A  and replaced by columns of  W.  M0 must be at
-        least zero.
-
-
-M1      (local input)                 const int
-        On entry, M1 specifies the number of columns of W that should
-        be copied into rows of U. M1 must be at least zero.
-
-
-N       (local input)                 const int
-        On entry, N specifies the length of the rows of U that should
-        be copied into A. N must be at least zero.
-
-
-U       (local input/output)          double *
-        On entry,  U  points to  an array of dimension (LDU,N).  This
-        array contains the rows that are to be copied into A.
-
-
-LDU     (local input)                 const int
-        On entry, LDU specifies the leading dimension of the array U.
-        LDU must be at least MAX(1,M1).
-
-
-A       (local output)                double *
-        On entry, A points to an array of dimension (LDA,N). On exit,
-        the  rows of this array specified by  LINDXA  are replaced by
-        rows of U indicated by LINDXAU.
-
-
-LDA     (local input)                 const int
-        On entry, LDA specifies the leading dimension of the array A.
-        LDA must be at least MAX(1,M0).
-
-
-W0      (local input)                 const double *
-        On entry,  W0  is an array of size (M-1)*LDW+1, that contains
-        the destination offset  in U where the columns of W should be
-        copied.
-
-
-W       (local input)                 const double *
-        On entry, W  is an array of size (LDW,M0+M1),  that  contains
-        data to be copied into U.  For i in [M0..M0+M1),  the entries
-        W(:,i) are copied into the row W0(i*LDW) of U.
-
-
-LDW     (local input)                 const int
-        On entry, LDW specifies the leading dimension of the array W.
-        LDW must be at least MAX(1,N+1).
-
-
-LINDXA  (local input)                 const int *
-        On entry, LINDXA  is an array of dimension  M0 containing the
-        local row indexes A into which rows of U are copied.
-
-
-LINDXAU (local input)                 const int *
-        On entry, LINDXAU  is an array of dimension M0 that  contains
-        the local  row indexes of  U that should be copied into A and
-        replaced by the columns of W.
-
- -

See Also

-HPL_dlaswp00N, -HPL_dlaswp10N, -HPL_dlaswp01N, -HPL_dlaswp01T, -HPL_dlaswp02N, -HPL_dlaswp03N, -HPL_dlaswp03T, -HPL_dlaswp04N, -HPL_dlaswp04T, -HPL_dlaswp05N, -HPL_dlaswp05T, -HPL_dlaswp06N, -HPL_dlaswp06T. - - - diff --git a/hpl/www/HPL_dlaswp04T.html b/hpl/www/HPL_dlaswp04T.html deleted file mode 100755 index 9000aaf27a63e8551d5da570214cae0e3fd0dc5e..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_dlaswp04T.html +++ /dev/null @@ -1,132 +0,0 @@ - - -HPL_dlaswp04T HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_dlaswp04T copy columns of U in rows of A and replace them with columns of W. - -

Synopsis

-#include "hpl.h"

-void -HPL_dlaswp04T( -const int -M0, -const int -M1, -const int -N, -double * -U, -const int -LDU, -double * -A, -const int -LDA, -const double * -W0, -const double * -W, -const int -LDW, -const int * -LINDXA, -const int * -LINDXAU -); - -

Description

-HPL_dlaswp04T -copies M0 columns of U into rows of A and replaces those -columns of U with columns of W. In addition M1 - M0 columns of W are -copied into U. - -

Arguments

-
-M0      (local input)                 const int
-        On entry, M0 specifies the number of columns of U that should
-        be copied into A and replaced by columns of W.  M0 must be at
-        least zero.
-
-
-M1      (local input)                 const int
-        On entry, M1 specifies  the number of columns of W that  will
-        be copied into U. M1 must be at least zero.
-
-
-N       (local input)                 const int
-        On entry,  N  specifies the length of the columns of  U  that
-        will be copied into rows of A. N must be at least zero.
-
-
-U       (local input/output)          double *
-        On entry,  U  points  to an array of dimension (LDU,*).  This
-        array contains the columns that are to be copied into rows of
-        A.
-
-
-LDU     (local input)                 const int
-        On entry, LDU specifies the leading dimension of the array U.
-        LDU must be at least MAX(1,N).
-
-
-A       (local output)                double *
-        On entry, A points to an array of dimension (LDA,N). On exit,
-        the  rows of this array specified by  LINDXA  are replaced by
-        columns of U indicated by LINDXAU.
-
-
-LDA     (local input)                 const int
-        On entry, LDA specifies the leading dimension of the array A.
-        LDA must be at least MAX(1,M0).
-
-
-W0      (local input)                 const double *
-        On entry,  W0  is an array of size (M-1)*LDW+1, that contains
-        the destination offset  in U where the columns of W should be
-        copied.
-
-
-W       (local input)                 const double *
-        On entry, W  is an array of size (LDW,M0+M1),  that  contains
-        data to be copied into U.  For i in [M0..M0+M1),  the entries
-        W(:,i) are copied into the column W0(i*LDW) of U.
-
-
-LDW     (local input)                 const int
-        On entry, LDW specifies the leading dimension of the array W.
-        LDW must be at least MAX(1,N+1).
-
-
-LINDXA  (local input)                 const int *
-        On entry, LINDXA  is an array of dimension  M0 containing the
-        local row indexes A into which columns of U are copied.
-
-
-LINDXAU (local input)                 const int *
-        On entry, LINDXAU  is an array of dimension M0 that  contains
-        the  local column indexes of  U  that should be copied into A
-        and replaced by the columns of W.
-
- -

See Also

-HPL_dlaswp00N, -HPL_dlaswp10N, -HPL_dlaswp01N, -HPL_dlaswp01T, -HPL_dlaswp02N, -HPL_dlaswp03N, -HPL_dlaswp03T, -HPL_dlaswp04N, -HPL_dlaswp04T, -HPL_dlaswp05N, -HPL_dlaswp05T, -HPL_dlaswp06N, -HPL_dlaswp06T. - - - diff --git a/hpl/www/HPL_dlaswp05N.html b/hpl/www/HPL_dlaswp05N.html deleted file mode 100755 index b2ad760bb76648a8f6a51c8fc5840612f1f75399..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_dlaswp05N.html +++ /dev/null @@ -1,98 +0,0 @@ - - -HPL_dlaswp05N HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_dlaswp05N copy rows of U into A. - -

Synopsis

-#include "hpl.h"

-void -HPL_dlaswp05N( -const int -M, -const int -N, -double * -A, -const int -LDA, -const double * -U, -const int -LDU, -const int * -LINDXA, -const int * -LINDXAU -); - -

Description

-HPL_dlaswp05N -copies rows of U of global offset LINDXAU into rows of -A at positions indicated by LINDXA. - -

Arguments

-
-M       (local input)                 const int
-        On entry, M  specifies the number of rows of U that should be
-        copied into A. M must be at least zero.
-
-
-N       (local input)                 const int
-        On entry, N specifies the length of the rows of U that should
-        be copied into A. N must be at least zero.
-
-
-A       (local output)                double *
-        On entry, A points to an array of dimension (LDA,N). On exit,
-        the  rows of this array specified by  LINDXA  are replaced by
-        rows of U indicated by LINDXAU.
-
-
-LDA     (local input)                 const int
-        On entry, LDA specifies the leading dimension of the array A.
-        LDA must be at least MAX(1,M).
-
-
-U       (local input)                 const double *
-        On entry,  U  points to an array of dimension  (LDU,N).  This
-        array contains the rows that are to be copied into A.
-
-
-LDU     (local input)                 const int
-        On entry, LDU specifies the leading dimension of the array U.
-        LDU must be at least MAX(1,M).
-
-
-LINDXA  (local input)                 const int *
-        On entry, LINDXA is an array of dimension M that contains the
-        local row indexes of A that should be copied from U.
-
-
-LINDXAU (local input)                 const int *
-        On entry, LINDXAU  is an array of dimension  M that  contains
-        the local row indexes of U that should be copied in A.
-
- -
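
The gather from U into A can be sketched as follows. This is not the HPL source: sketch_laswp05N is a hypothetical name and column-major storage is assumed.

```c
#include <assert.h>

/* Hypothetical reference sketch, not the HPL source.
 * Row LINDXAU[i] of U overwrites row LINDXA[i] of A, for i in [0..M). */
void sketch_laswp05N(int M, int N, double *A, int LDA,
                     const double *U, int LDU,
                     const int *LINDXA, const int *LINDXAU)
{
    assert(M >= 0 && N >= 0);
    for (int i = 0; i < M; i++)
        for (int j = 0; j < N; j++)
            A[LINDXA[i] + j * LDA] = U[LINDXAU[i] + j * LDU];
}
```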

See Also

-HPL_dlaswp00N, -HPL_dlaswp10N, -HPL_dlaswp01N, -HPL_dlaswp01T, -HPL_dlaswp02N, -HPL_dlaswp03N, -HPL_dlaswp03T, -HPL_dlaswp04N, -HPL_dlaswp04T, -HPL_dlaswp05N, -HPL_dlaswp05T, -HPL_dlaswp06N, -HPL_dlaswp06T. - - - diff --git a/hpl/www/HPL_dlaswp05T.html b/hpl/www/HPL_dlaswp05T.html deleted file mode 100755 index feb1490a94ee719ec508fa9f48d6301f94e44a9b..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_dlaswp05T.html +++ /dev/null @@ -1,98 +0,0 @@ - - -HPL_dlaswp05T HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_dlaswp05T copy columns of U into rows of A. - -

Synopsis

-#include "hpl.h"

-void -HPL_dlaswp05T( -const int -M, -const int -N, -double * -A, -const int -LDA, -const double * -U, -const int -LDU, -const int * -LINDXA, -const int * -LINDXAU -); - -

Description

-HPL_dlaswp05T -copies columns of U of global offset LINDXAU into rows -of A at positions indicated by LINDXA. - -

Arguments

-
-M       (local input)                 const int
-        On entry,  M  specifies the number of columns of U that should
-        be copied into A. M must be at least zero.
-
-
-N       (local input)                 const int
-        On entry, N specifies the length of the columns of U that will
-        be copied into rows of A. N must be at least zero.
-
-
-A       (local output)                double *
-        On entry, A points to an array of dimension (LDA,N). On exit,
-        the  rows of this array specified by  LINDXA  are replaced by
-        columns of U indicated by LINDXAU.
-
-
-LDA     (local input)                 const int
-        On entry, LDA specifies the leading dimension of the array A.
-        LDA must be at least MAX(1,M).
-
-
-U       (local input)                 const double *
-        On entry,  U  points  to an array of dimension (LDU,*).  This
-        array contains the columns that are to be copied into rows of
-        A.
-
-
-LDU     (local input)                 const int
-        On entry, LDU specifies the leading dimension of the array U.
-        LDU must be at least MAX(1,N).
-
-
-LINDXA  (local input)                 const int *
-        On entry, LINDXA is an array of dimension M that contains the
-        local row indexes of A that should be copied from U.
-
-
-LINDXAU (local input)                 const int *
-        On entry, LINDXAU  is an array of dimension  M that  contains
-        the local column indexes of U that should be copied in A.
-
- -
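
A minimal sketch of the transposed gather, for illustration only. The name sketch_laswp05T is hypothetical and column-major storage is assumed.

```c
#include <assert.h>

/* Hypothetical reference sketch, not the HPL source.
 * Column LINDXAU[i] of U overwrites row LINDXA[i] of A, for i in [0..M). */
void sketch_laswp05T(int M, int N, double *A, int LDA,
                     const double *U, int LDU,
                     const int *LINDXA, const int *LINDXAU)
{
    assert(M >= 0 && N >= 0 && LDU >= (N > 1 ? N : 1));
    for (int i = 0; i < M; i++)
        for (int j = 0; j < N; j++)
            A[LINDXA[i] + j * LDA] = U[j + LINDXAU[i] * LDU];
}
```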

See Also

-HPL_dlaswp00N, -HPL_dlaswp10N, -HPL_dlaswp01N, -HPL_dlaswp01T, -HPL_dlaswp02N, -HPL_dlaswp03N, -HPL_dlaswp03T, -HPL_dlaswp04N, -HPL_dlaswp04T, -HPL_dlaswp05N, -HPL_dlaswp05T, -HPL_dlaswp06N, -HPL_dlaswp06T. - - - diff --git a/hpl/www/HPL_dlaswp06N.html b/hpl/www/HPL_dlaswp06N.html deleted file mode 100755 index 8a6a40ea68da5fc7c6dc7d382bf1e6ac53551de1..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_dlaswp06N.html +++ /dev/null @@ -1,92 +0,0 @@ - - -HPL_dlaswp06N HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_dlaswp06N swap rows of U with rows of A. - -

Synopsis

-#include "hpl.h"

-void -HPL_dlaswp06N( -const int -M, -const int -N, -double * -A, -const int -LDA, -double * -U, -const int -LDU, -const int * -LINDXA -); - -

Description

-HPL_dlaswp06N -swaps rows of U with rows of A at positions -indicated by LINDXA. - -

Arguments

-
-M       (local input)                 const int
-        On entry, M  specifies the number of rows of A that should be
-        swapped with rows of U. M must be at least zero.
-
-
-N       (local input)                 const int
-        On entry, N specifies the length of the rows of A that should
-        be swapped with rows of U. N must be at least zero.
-
-
-A       (local output)                double *
-        On entry, A points to an array of dimension (LDA,N). On exit,
-        the  rows of this array specified by  LINDXA  are replaced by
-        rows of U.
-
-
-LDA     (local input)                 const int
-        On entry, LDA specifies the leading dimension of the array A.
-        LDA must be at least MAX(1,M).
-
-
-U       (local input/output)          double *
-        On entry,  U  points  to an array of dimension (LDU,N).  This
-        array contains the rows of U that are to be swapped with rows
-        of A.
-
-
-LDU     (local input)                 const int
-        On entry, LDU specifies the leading dimension of the array U.
-        LDU must be at least MAX(1,M).
-
-
-LINDXA  (local input)                 const int *
-        On entry, LINDXA is an array of dimension M that contains the
-        local row indexes of A that should be swapped with U.
-
- -
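
The swap can be sketched as below. This is not the HPL source: sketch_laswp06N is a hypothetical name, storage is assumed column-major, and it is an assumption that row i of U is the partner of row LINDXA[i] of A, since the routine takes no index array for U.

```c
#include <assert.h>

/* Hypothetical reference sketch, not the HPL source.
 * Swaps row i of U with row LINDXA[i] of A, for i in [0..M).
 * Assumption: rows of U are taken in order 0 .. M-1. */
void sketch_laswp06N(int M, int N, double *A, int LDA,
                     double *U, int LDU, const int *LINDXA)
{
    assert(M >= 0 && N >= 0 && LDU >= (M > 1 ? M : 1));
    for (int i = 0; i < M; i++) {
        for (int j = 0; j < N; j++) {
            const double t = A[LINDXA[i] + j * LDA];
            A[LINDXA[i] + j * LDA] = U[i + j * LDU];
            U[i + j * LDU] = t;
        }
    }
}
```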

See Also

-HPL_dlaswp00N, -HPL_dlaswp10N, -HPL_dlaswp01N, -HPL_dlaswp01T, -HPL_dlaswp02N, -HPL_dlaswp03N, -HPL_dlaswp03T, -HPL_dlaswp04N, -HPL_dlaswp04T, -HPL_dlaswp05N, -HPL_dlaswp05T, -HPL_dlaswp06N, -HPL_dlaswp06T. - - - diff --git a/hpl/www/HPL_dlaswp06T.html b/hpl/www/HPL_dlaswp06T.html deleted file mode 100755 index dff2d53cacbd8a9628bd57a5bbaf32a8a22a6d99..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_dlaswp06T.html +++ /dev/null @@ -1,92 +0,0 @@ - - -HPL_dlaswp06T HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_dlaswp06T swap rows or columns of U with rows of A. - -

Synopsis

-#include "hpl.h"

-void -HPL_dlaswp06T( -const int -M, -const int -N, -double * -A, -const int -LDA, -double * -U, -const int -LDU, -const int * -LINDXA -); - -

Description

-HPL_dlaswp06T -swaps columns of U with rows of A at positions -indicated by LINDXA. - -

Arguments

-
-M       (local input)                 const int
-        On entry, M  specifies the number of rows of A that should be
-        swapped with columns of U. M must be at least zero.
-
-
-N       (local input)                 const int
-        On entry, N specifies the length of the rows of A that should
-        be swapped with columns of U. N must be at least zero.
-
-
-A       (local output)                double *
-        On entry, A points to an array of dimension (LDA,N). On exit,
-        the  rows of this array specified by  LINDXA  are replaced by
-        columns of U.
-
-
-LDA     (local input)                 const int
-        On entry, LDA specifies the leading dimension of the array A.
-        LDA must be at least MAX(1,M).
-
-
-U       (local input/output)          double *
-        On entry,  U  points  to an array of dimension (LDU,*).  This
-        array contains the columns of  U  that are to be swapped with
-        rows of A.
-
-
-LDU     (local input)                 const int
-        On entry, LDU specifies the leading dimension of the array U.
-        LDU must be at least MAX(1,N).
-
-
-LINDXA  (local input)                 const int *
-        On entry, LINDXA is an array of dimension M that contains the
-        local row indexes of A that should be swapped with U.
-
- -
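
A sketch of the transposed swap, for illustration only. The name sketch_laswp06T is hypothetical, storage is assumed column-major, and it is an assumption that column i of U is the partner of row LINDXA[i] of A.

```c
#include <assert.h>

/* Hypothetical reference sketch, not the HPL source.
 * Swaps column i of U with row LINDXA[i] of A, for i in [0..M).
 * Assumption: columns of U are taken in order 0 .. M-1. */
void sketch_laswp06T(int M, int N, double *A, int LDA,
                     double *U, int LDU, const int *LINDXA)
{
    assert(M >= 0 && N >= 0 && LDU >= (N > 1 ? N : 1));
    for (int i = 0; i < M; i++) {
        for (int j = 0; j < N; j++) {
            const double t = A[LINDXA[i] + j * LDA];
            A[LINDXA[i] + j * LDA] = U[j + i * LDU];
            U[j + i * LDU] = t;
        }
    }
}
```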

See Also

-HPL_dlaswp00N, -HPL_dlaswp10N, -HPL_dlaswp01N, -HPL_dlaswp01T, -HPL_dlaswp02N, -HPL_dlaswp03N, -HPL_dlaswp03T, -HPL_dlaswp04N, -HPL_dlaswp04T, -HPL_dlaswp05N, -HPL_dlaswp05T, -HPL_dlaswp06N, -HPL_dlaswp06T. - - - diff --git a/hpl/www/HPL_dlaswp10N.html b/hpl/www/HPL_dlaswp10N.html deleted file mode 100755 index cf2a3d3956cd29eb41bf2bd22bcd8d30ffc2286a..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_dlaswp10N.html +++ /dev/null @@ -1,77 +0,0 @@ - - -HPL_dlaswp10N HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_dlaswp10N performs a series of column interchanges. - -

Synopsis

-#include "hpl.h"

-void -HPL_dlaswp10N( -const int -M, -const int -N, -double * -A, -const int -LDA, -const int * -IPIV -); - -

Description

-HPL_dlaswp10N -performs a sequence of local column interchanges on a -matrix A. One column interchange is initiated for columns 0 through -N-1 of A. - -

Arguments

-
-M       (local input)                 const int
-        On entry,  M  specifies  the number of rows of the array A. M
-        must be at least zero.
-
-
-N       (local input)                 const int
-        On entry, N specifies the number of columns of the array A. N
-        must be at least zero.
-
-
-A       (local input/output)          double *
-        On entry, A  points to an  array of  dimension (LDA,N).  This
-        array contains the columns onto which the interchanges should
-        be applied. On exit, A contains the permuted matrix.
-
-
-LDA     (local input)                 const int
-        On entry, LDA specifies the leading dimension of the array A.
-        LDA must be at least MAX(1,M).
-
-
-IPIV    (local input)                 const int *
-        On entry, IPIV is an array of dimension N that contains the
-        pivoting information for the column interchanges.
-
- -
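The interchange loop described above can be sketched in plain C. This is a hypothetical reference version, not the HPL source: the name `swap_columns_sketch` is invented, and it assumes that `ipiv[j]` holds the local index of the column exchanged with column `j` of a column-major array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch, not the HPL implementation.
 * Assumption: ipiv[j] is the local index of the column that is
 * exchanged with column j; A is column-major with leading dimension lda. */
void swap_columns_sketch(int m, int n, double *a, int lda, const int *ipiv)
{
    assert(m >= 0 && n >= 0 && lda >= (m > 1 ? m : 1));
    for (int j = 0; j < n; j++) {
        int jp = ipiv[j];
        if (jp == j)
            continue;                         /* nothing to exchange */
        double *c0 = a + (size_t)j * (size_t)lda;
        double *c1 = a + (size_t)jp * (size_t)lda;
        for (int i = 0; i < m; i++) {         /* swap the two columns */
            double t = c0[i];
            c0[i] = c1[i];
            c1[i] = t;
        }
    }
}
```

One interchange is started per column, in order, so later interchanges act on the result of earlier ones.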

See Also

-HPL_dlaswp00N, -HPL_dlaswp10N, -HPL_dlaswp01N, -HPL_dlaswp01T, -HPL_dlaswp02N, -HPL_dlaswp03N, -HPL_dlaswp03T, -HPL_dlaswp04N, -HPL_dlaswp04T, -HPL_dlaswp05N, -HPL_dlaswp05T, -HPL_dlaswp06N, -HPL_dlaswp06T. - - - diff --git a/hpl/www/HPL_dlatcpy.html b/hpl/www/HPL_dlatcpy.html deleted file mode 100755 index a8b2a3901ff64d66be9ec44227e723231e2e2582..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_dlatcpy.html +++ /dev/null @@ -1,83 +0,0 @@ - - -HPL_dlatcpy HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_dlatcpy B := A^T - -

Synopsis

-#include "hpl.h"

-void -HPL_dlatcpy( -const int -M, -const int -N, -const double * -A, -const int -LDA, -double * -B, -const int -LDB -); - -

Description

-HPL_dlatcpy -copies the transpose of an array A into an array B. - -

Arguments

-
-M       (local input)                 const int
-        On entry,  M specifies the number of  rows of the array B and
-        the number of columns of A. M must be at least zero.
-
-
-N       (local input)                 const int
-        On entry,  N specifies the number of  rows of the array A and
-        the number of columns of B. N must be at least zero.
-
-
-A       (local input)                 const double *
-        On entry, A points to an array of dimension (LDA,M).
-
-
-LDA     (local input)                 const int
-        On entry, LDA specifies the leading dimension of the array A.
-        LDA must be at least MAX(1,N).
-
-
-B       (local output)                double *
-        On entry, B points to an array of dimension (LDB,N). On exit,
-        B is overwritten with the transpose of A.
-
-
-LDB     (local input)                 const int
-        On entry, LDB specifies the leading dimension of the array B.
-        LDB must be at least MAX(1,M).
-
- -

Example

-#include "hpl.h"

-
-int main(int argc, char *argv[])
-{
-   double a[2*2], b[2*2];
-   a[0] = 1.0; a[1] = 3.0; a[2] = 2.0; a[3] = 4.0;
-   HPL_dlatcpy( 2, 2, a, 2, b, 2 );
-   printf("  [%f,%f]\n", b[0], b[2]);
-   printf("b=[%f,%f]\n", b[1], b[3]);
-   exit(0); return(0);
-}
-
- -
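For readers without an HPL build at hand, the transpose copy can be sketched without `hpl.h`. The function name `transpose_copy_sketch` is hypothetical; the argument roles follow the Arguments section above (A is N-by-M, B is M-by-N, both column-major).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of B := A^T for column-major arrays.
 * A has n rows and m columns (lda >= max(1,n));
 * B has m rows and n columns (ldb >= max(1,m)). */
void transpose_copy_sketch(int m, int n, const double *a, int lda,
                           double *b, int ldb)
{
    assert(m >= 0 && n >= 0);
    assert(lda >= (n > 1 ? n : 1) && ldb >= (m > 1 ? m : 1));
    for (int j = 0; j < n; j++)
        for (int i = 0; i < m; i++)
            /* B(i,j) = A(j,i) */
            b[(size_t)i + (size_t)j * (size_t)ldb] =
                a[(size_t)j + (size_t)i * (size_t)lda];
}
```

With the 2-by-2 input of the example above (a = 1, 3, 2, 4 in memory), this sketch leaves b = 1, 2, 3, 4 in memory.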

See Also

-HPL_dlacpy. - - - diff --git a/hpl/www/HPL_dlocmax.html b/hpl/www/HPL_dlocmax.html deleted file mode 100755 index 6c4a4e38dc0f0d685f008093e96c5819b693bb04..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_dlocmax.html +++ /dev/null @@ -1,87 +0,0 @@ - - -HPL_dlocmax HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_dlocmax finds the maximum entry in a matrix column. - -

Synopsis

-#include "hpl.h"

-void -HPL_dlocmax( -HPL_T_panel * -PANEL, -const int -N, -const int -II, -const int -JJ, -double * -WORK -); - -

Description

-HPL_dlocmax -finds the maximum entry in the current column and packs -the useful information in WORK[0:3]. On exit, WORK[0] contains the -local maximum absolute value scalar, WORK[1] is the corresponding -local row index, WORK[2] is the corresponding global row index, and -WORK[3] is the coordinate of the process owning this max. When N is -less than 1, WORK[0:2] is initialized to zero, and WORK[3] is set -to the total number of process rows. - -

Arguments

-
-PANEL   (local input/output)          HPL_T_panel *
-        On entry,  PANEL  points to the data structure containing the
-        panel information.
-
-
-N       (local input)                 const int
-        On entry,  N specifies the local number of rows of the column
-        of A on which we operate.
-
-
-II      (local input)                 const int
-        On entry, II  specifies the row offset where the column to be
-        operated on starts with respect to the panel.
-
-
-JJ      (local input)                 const int
-        On entry, JJ  specifies the column offset where the column to
-        be operated on starts with respect to the panel.
-
-
-WORK    (local workspace)             double *
-        On entry, WORK  is  a workarray of size at least 4.  On exit,
-        WORK[0] contains  the  local  maximum  absolute value scalar,
-        WORK[1] contains  the corresponding local row index,  WORK[2]
-        contains the corresponding global row index, and  WORK[3]  is
-        the coordinate of the process owning this max.
-
- -

See Also

-HPL_dlocswpN, -HPL_dlocswpT, -HPL_pdmxswp, -HPL_pdpancrN, -HPL_pdpancrT, -HPL_pdpanllN, -HPL_pdpanllT, -HPL_pdpanrlN, -HPL_pdpanrlT, -HPL_pdrpancrN, -HPL_pdrpancrT, -HPL_pdrpanllN, -HPL_pdrpanllT, -HPL_pdrpanrlN, -HPL_pdrpanrlT, -HPL_pdfact. - - - diff --git a/hpl/www/HPL_dlocswpN.html b/hpl/www/HPL_dlocswpN.html deleted file mode 100755 index a0bd6156498feb6467a1f5f9e6ff57d7fa2fbef0..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_dlocswpN.html +++ /dev/null @@ -1,79 +0,0 @@ - - -HPL_dlocswpN HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_dlocswpN locally swaps rows within panel. - -

Synopsis

-#include "hpl.h"

-void -HPL_dlocswpN( -HPL_T_panel * -PANEL, -const int -II, -const int -JJ, -double * -WORK -); - -

Description

-HPL_dlocswpN -performs the local swapping operations within a panel. -The lower triangular N0-by-N0 upper block of the panel is stored in -no-transpose form (i.e. just like the input matrix itself). - -

Arguments

-
-PANEL   (local input/output)          HPL_T_panel *
-        On entry,  PANEL  points to the data structure containing the
-        panel information.
-
-
-II      (local input)                 const int
-        On entry, II  specifies the row offset where the column to be
-        operated on starts with respect to the panel.
-
-
-JJ      (local input)                 const int
-        On entry, JJ  specifies the column offset where the column to
-        be operated on starts with respect to the panel.
-
-
-WORK    (local workspace)             double *
-        On entry, WORK  is a workarray of size at least 2 * (4+2*N0).
-        WORK[0] contains  the  local  maximum  absolute value scalar,
-        WORK[1] contains  the corresponding local row index,  WORK[2]
-        contains the corresponding global row index, and  WORK[3]  is
-        the coordinate of the process owning this max. The N0-length
-        max row is stored in WORK[4:4+N0-1]; note that this is also the
-        JJth row  (or column) of L1. The remaining part of this array
-        is used as workspace.
-
- -

See Also

-HPL_dlocmax, -HPL_dlocswpT, -HPL_pdmxswp, -HPL_pdpancrN, -HPL_pdpancrT, -HPL_pdpanllN, -HPL_pdpanllT, -HPL_pdpanrlN, -HPL_pdpanrlT, -HPL_pdrpancrN, -HPL_pdrpancrT, -HPL_pdrpanllN, -HPL_pdrpanllT, -HPL_pdrpanrlN, -HPL_pdrpanrlT, -HPL_pdfact. - - - diff --git a/hpl/www/HPL_dlocswpT.html b/hpl/www/HPL_dlocswpT.html deleted file mode 100755 index c164cb0c3f6dcea23db3a78c0f923ea465d394e2..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_dlocswpT.html +++ /dev/null @@ -1,79 +0,0 @@ - - -HPL_dlocswpT HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_dlocswpT locally swaps rows within panel. - -

Synopsis

-#include "hpl.h"

-void -HPL_dlocswpT( -HPL_T_panel * -PANEL, -const int -II, -const int -JJ, -double * -WORK -); - -

Description

-HPL_dlocswpT -performs the local swapping operations within a panel. -The lower triangular N0-by-N0 upper block of the panel is stored in -transpose form. - -

Arguments

-
-PANEL   (local input/output)          HPL_T_panel *
-        On entry,  PANEL  points to the data structure containing the
-        panel information.
-
-
-II      (local input)                 const int
-        On entry, II  specifies the row offset where the column to be
-        operated on starts with respect to the panel.
-
-
-JJ      (local input)                 const int
-        On entry, JJ  specifies the column offset where the column to
-        be operated on starts with respect to the panel.
-
-
-WORK    (local workspace)             double *
-        On entry, WORK  is a workarray of size at least 2 * (4+2*N0).
-        WORK[0] contains  the  local  maximum  absolute value scalar,
-        WORK[1] contains  the corresponding local row index,  WORK[2]
-        contains the corresponding global row index, and  WORK[3]  is
-        the coordinate of the process owning this max. The N0-length
-        max row is stored in WORK[4:4+N0-1]; note that this is also the
-        JJth row  (or column) of L1. The remaining part of this array
-        is used as workspace.
-
- -

See Also

-HPL_dlocmax, -HPL_dlocswpN, -HPL_pdmxswp, -HPL_pdpancrN, -HPL_pdpancrT, -HPL_pdpanllN, -HPL_pdpanllT, -HPL_pdpanrlN, -HPL_pdpanrlT, -HPL_pdrpancrN, -HPL_pdrpancrT, -HPL_pdrpanllN, -HPL_pdrpanllT, -HPL_pdrpanrlN, -HPL_pdrpanrlT, -HPL_pdfact. - - - diff --git a/hpl/www/HPL_dmatgen.html b/hpl/www/HPL_dmatgen.html deleted file mode 100755 index fee74a090abc44d3cabe1b2a76d81f6b35362a1b..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_dmatgen.html +++ /dev/null @@ -1,73 +0,0 @@ - - -HPL_dmatgen HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_dmatgen random matrix generator. - -

Synopsis

-#include "hpl.h"

-void -HPL_dmatgen( -const int -M, -const int -N, -double * -A, -const int -LDA, -const int -ISEED -); - -

Description

-HPL_dmatgen -generates (or regenerates) a random matrix A. - -The pseudo-random generator uses the linear congruential algorithm: -X(n+1) = (a * X(n) + c) mod m as described in the Art of Computer -Programming, Knuth 1973, Vol. 2. - -
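The recurrence X(n+1) = (a * X(n) + c) mod m can be illustrated with a small generator. The constants below are the well-known MMIX values from Knuth with m = 2^64; they are an assumption chosen for illustration and are not claimed to be the constants HPL uses. The names `lcg_next` and `fill_random_sketch`, and the mapping of the state to [-0.5, 0.5), are likewise hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative constants (Knuth's MMIX generator), NOT HPL's own. */
#define LCG_A 6364136223846793005ULL
#define LCG_C 1442695040888963407ULL

/* One step of X(n+1) = (a * X(n) + c) mod 2^64; the modulus is the
 * natural wrap-around of unsigned 64-bit arithmetic. */
uint64_t lcg_next(uint64_t x)
{
    return LCG_A * x + LCG_C;
}

/* Fill an m-by-n column-major array with values in [-0.5, 0.5).
 * The output range is an arbitrary choice made for this sketch. */
void fill_random_sketch(int m, int n, double *a, int lda, uint64_t seed)
{
    assert(m >= 0 && n >= 0 && lda >= (m > 1 ? m : 1));
    uint64_t x = seed;
    for (int j = 0; j < n; j++) {
        for (int i = 0; i < m; i++) {
            x = lcg_next(x);
            /* keep the top 53 bits and scale by 2^-53 */
            a[(size_t)i + (size_t)j * (size_t)lda] =
                (double)(x >> 11) / 9007199254740992.0 - 0.5;
        }
    }
}
```

Because the sequence depends only on the seed, calling the routine again with the same seed regenerates the same matrix, which is the property the "(or regenerates)" wording relies on.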

Arguments

-
-M       (input)                       const int
-        On entry,  M  specifies  the number  of rows of the matrix A.
-        M must be at least zero.
-
-
-N       (input)                       const int
-        On entry,  N specifies the number of columns of the matrix A.
-        N must be at least zero.
-
-
-A       (output)                      double *
-        On entry, A points to an array of dimension (LDA,N). On exit,
-        this  array  contains   the   coefficients  of  the  randomly
-        generated matrix.
-
-
-LDA     (input)                       const int
-        On entry, LDA specifies the leading dimension of the array A.
-        LDA must be at least max(1,M).
-
-
-ISEED   (input)                       const int
-        On entry, ISEED  specifies  the  seed  number to generate the
-        matrix A. ISEED must be at least zero.
-
- -

See Also

-HPL_ladd, -HPL_lmul, -HPL_setran, -HPL_xjumpm, -HPL_jumpit, -HPL_rand. - - - diff --git a/hpl/www/HPL_dscal.html b/hpl/www/HPL_dscal.html deleted file mode 100755 index c8a2bfe0e49c3ff4e61c9d7fcc987ef2cdda8ff4..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_dscal.html +++ /dev/null @@ -1,74 +0,0 @@ - - -HPL_dscal HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_dscal x = alpha * x. - -

Synopsis

-#include "hpl.h"

-void -HPL_dscal( -const int -N, -const double -ALPHA, -double * -X, -const int -INCX -); - -

Description

-HPL_dscal -scales the vector x by alpha. - -

Arguments

-
-N       (local input)                 const int
-        On entry, N specifies the length of the vector x. N  must  be
-        at least zero.
-
-
-ALPHA   (local input)                 const double
-        On entry, ALPHA specifies the scalar alpha.   When  ALPHA  is
-        supplied as zero, then the entries of the incremented array X
-        need not be set on input.
-
-
-X       (local input/output)          double *
-        On entry,  X  is an incremented array of dimension  at  least
-        ( 1 + ( n - 1 ) * abs( INCX ) )  that  contains the vector x.
-        On exit, the entries of the incremented array  X  are  scaled
-        by the scalar alpha.
-
-
-INCX    (local input)                 const int
-        On entry, INCX specifies the increment for the elements of X.
-        INCX must not be zero.
-
- -

Example

-#include "hpl.h"

-
-int main(int argc, char *argv[])
-{
-   double x[3];
-   x[0] = 1.0; x[1] = 2.0; x[2] = 3.0;
-   HPL_dscal( 3, 2.0, x, 1 );
-   printf("x=[%f,%f,%f]\n", x[0], x[1], x[2]);
-   exit(0); return(0);
-}
-
- -
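The "incremented array" wording can be made concrete with a loop that needs no `hpl.h`. This is a hypothetical sketch (`scale_sketch` is an invented name) that, as a simplifying assumption, handles positive increments only. It also shows why X need not be set when ALPHA is zero: the entries are assigned zero rather than multiplied.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of x := alpha * x over a strided vector.
 * Assumption: only positive increments are handled here. */
void scale_sketch(int n, double alpha, double *x, int incx)
{
    assert(n >= 0 && incx > 0);
    for (int i = 0; i < n; i++) {
        double *xi = x + (size_t)i * (size_t)incx;
        /* alpha == 0 overwrites, so stale or NaN input is harmless */
        *xi = (alpha == 0.0) ? 0.0 : alpha * (*xi);
    }
}
```

With n = 3 and incx = 2, only elements 0, 2 and 4 of the underlying array are touched, which is where the size bound 1 + (n - 1) * abs(INCX) comes from.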

See Also

-HPL_daxpy, -HPL_dcopy, -HPL_dswap. - - - diff --git a/hpl/www/HPL_dswap.html b/hpl/www/HPL_dswap.html deleted file mode 100755 index f132b7d9bd2b0d425f7764ffea0ea47f66847dcf..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_dswap.html +++ /dev/null @@ -1,84 +0,0 @@ - - -HPL_dswap HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_dswap y <-> x. - -

Synopsis

-#include "hpl.h"

-void -HPL_dswap( -const int -N, -double * -X, -const int -INCX, -double * -Y, -const int -INCY -); - -

Description

-HPL_dswap -swaps the vectors x and y. - -

Arguments

-
-N       (local input)                 const int
-        On entry, N specifies the length of the vectors  x  and  y. N
-        must be at least zero.
-
-
-X       (local input/output)          double *
-        On entry,  X  is an incremented array of dimension  at  least
-        ( 1 + ( n - 1 ) * abs( INCX ) )  that  contains the vector x.
-        On exit, the entries of the incremented array  X  are updated
-        with the entries of the incremented array Y.
-
-
-INCX    (local input)                 const int
-        On entry, INCX specifies the increment for the elements of X.
-        INCX must not be zero.
-
-
-Y       (local input/output)          double *
-        On entry,  Y  is an incremented array of dimension  at  least
-        ( 1 + ( n - 1 ) * abs( INCY ) )  that  contains the vector y.
-        On exit, the entries of the incremented array  Y  are updated
-        with the entries of the incremented array X.
-
-
-INCY    (local input)                 const int
-        On entry, INCY specifies the increment for the elements of Y.
-        INCY must not be zero.
-
- -

Example

-#include "hpl.h"

-
-int main(int argc, char *argv[])
-{
-   double x[3], y[3];
-   x[0] = 1.0; x[1] = 2.0; x[2] = 3.0;
-   y[0] = 4.0; y[1] = 5.0; y[2] = 6.0;
-   HPL_dswap( 3, x, 1, y, 1 );
-   printf("x=[%f,%f,%f]\n", x[0], x[1], x[2]);
-   printf("y=[%f,%f,%f]\n", y[0], y[1], y[2]);
-   exit(0); return(0);
-}
-
- -
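A plain C sketch of the swap, independent of `hpl.h`, is given below. The name `swap_sketch` is hypothetical, and the treatment of negative increments (start from the far end of the array) follows the reference BLAS convention, which is an assumption about how this routine behaves.

```c
#include <assert.h>

/* Hypothetical sketch of x <-> y over strided vectors.
 * Assumption: negative increments walk the array backwards, starting
 * at offset (1 - n) * inc, as in the reference BLAS. */
void swap_sketch(int n, double *x, int incx, double *y, int incy)
{
    assert(n >= 0 && incx != 0 && incy != 0);
    long ix = (incx > 0) ? 0 : (long)(1 - n) * incx;
    long iy = (incy > 0) ? 0 : (long)(1 - n) * incy;
    for (int i = 0; i < n; i++) {
        double t = x[ix];
        x[ix] = y[iy];
        y[iy] = t;
        ix += incx;
        iy += incy;
    }
}
```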

See Also

-HPL_daxpy, -HPL_dcopy, -HPL_dscal. - - - diff --git a/hpl/www/HPL_dtrsm.html b/hpl/www/HPL_dtrsm.html deleted file mode 100755 index f7dff39fdaf93188f1e80192b8a1fc8ee8ab8020..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_dtrsm.html +++ /dev/null @@ -1,168 +0,0 @@ - - -HPL_dtrsm HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_dtrsm B := A^{-1} * B or B := B * A^{-1}. - -

Synopsis

-#include "hpl.h"

-void -HPL_dtrsm( -const enum HPL_ORDER -ORDER, -const enum HPL_SIDE -SIDE, -const enum HPL_UPLO -UPLO, -const enum HPL_TRANS -TRANS, -const enum HPL_DIAG -DIAG, -const int -M, -const int -N, -const double -ALPHA, -const double * -A, -const int -LDA, -double * -B, -const int -LDB -); - -

Description

-HPL_dtrsm -solves one of the matrix equations - - op( A ) * X = alpha * B, or X * op( A ) = alpha * B, - -where alpha is a scalar, X and B are m by n matrices, A is a unit, or -non-unit, upper or lower triangular matrix and op(A) is one of - - op( A ) = A or op( A ) = A^T. - -The matrix X is overwritten on B. - -No test for singularity or near-singularity is included in this -routine. Such tests must be performed before calling this routine. - -

Arguments

-
-ORDER   (local input)                 const enum HPL_ORDER
-        On entry, ORDER  specifies the storage format of the operands
-        as follows:                                                  
-           ORDER = HplRowMajor,                                      
-           ORDER = HplColumnMajor.                                   
-
-
-SIDE    (local input)                 const enum HPL_SIDE
-        On entry, SIDE  specifies  whether  op(A) appears on the left
-        or right of X as follows:
-           SIDE==HplLeft    op( A ) * X = alpha * B,
-           SIDE==HplRight   X * op( A ) = alpha * B.
-
-
-UPLO    (local input)                 const enum HPL_UPLO
-        On  entry,   UPLO   specifies  whether  the  upper  or  lower
-        triangular  part  of the array  A  is to be referenced.  When
-        UPLO==HplUpper, only  the upper triangular part of A is to be
-        referenced, otherwise only the lower triangular part of A is 
-        to be referenced. 
-
-
-TRANS   (local input)                 const enum HPL_TRANS
-        On entry, TRANS  specifies the form of  op(A)  to be used in
-        the matrix-matrix operation as follows:
-           TRANS==HplNoTrans    : op( A ) = A,
-           TRANS==HplTrans      : op( A ) = A^T,
-           TRANS==HplConjTrans  : op( A ) = A^T.
-
-
-DIAG    (local input)                 const enum HPL_DIAG
-        On entry,  DIAG  specifies  whether  A  is unit triangular or
-        not. When DIAG==HplUnit,  A is assumed to be unit triangular,
-        and otherwise, A is not assumed to be unit triangular.
-
-
-M       (local input)                 const int
-        On entry,  M  specifies  the number of rows of the  matrix B.
-        M must be at least zero.
-
-
-N       (local input)                 const int
-        On entry, N  specifies the number of columns of the matrix B.
-        N must be at least zero.
-
-
-ALPHA   (local input)                 const double
-        On entry, ALPHA specifies the scalar alpha.   When  ALPHA  is
-        supplied  as  zero then the elements of the matrix B need not
-        be set on input.
-
-
-A       (local input)                 const double *
-        On entry,  A  points  to an array of size equal to or greater
-        than LDA * k,  where  k is m  when  SIDE==HplLeft  and  is  n
-        otherwise.  Before  entry  with  UPLO==HplUpper,  the leading
-        k by k upper triangular  part of the array A must contain the
-        upper triangular  matrix and the  strictly  lower  triangular
-        part of A is not referenced.  When  UPLO==HplLower on  entry,
-        the  leading k by k lower triangular part of the array A must
-        contain the lower triangular matrix  and  the  strictly upper
-        triangular part of A is not referenced.
-         
-        Note that  when  DIAG==HplUnit,  the  diagonal elements of  A
-        are not referenced  either,  but are assumed to be unity.
-
-
-LDA     (local input)                 const int
-        On entry,  LDA  specifies  the  leading  dimension  of  A  as
-        declared  in  the  calling  (sub) program.  LDA  must  be  at
-        least MAX(1,m) when SIDE==HplLeft, and MAX(1,n) otherwise.
-
-
-B       (local input/output)          double *
-        On entry,  B  points  to an array of size equal to or greater
-        than LDB * n.  Before entry, the leading  m by n  part of the
-        array B must contain the matrix  B, except when alpha is zero,
-        in which case B need not be set on entry.  On exit, the array
-        B is overwritten by the m by n solution matrix.
-
-
-LDB     (local input)                 const int
-        On entry,  LDB  specifies  the  leading  dimension  of  B  as
-        declared  in  the  calling  (sub) program.  LDB  must  be  at
-        least MAX(1,m).
-
- -

Example

-#include "hpl.h"

-
-int main(int argc, char *argv[])
-{
-   double a[2*2], b[2*2];
-   a[0] = 4.0; a[1] = 1.0; a[2] = 2.0; a[3] = 5.0;
-   b[0] = 2.0; b[1] = 1.0; b[2] = 1.0; b[3] = 2.0;
-   HPL_dtrsm( HplColumnMajor, HplLeft, HplUpper,
-              HplNoTrans, HplNonUnit, 2, 2, 2.0,
-              a, 2, b, 2 );
-   printf("  [%f,%f]\n", b[0], b[2]);
-   printf("b=[%f,%f]\n", b[1], b[3]);
-   exit(0); return(0);
-}
-
- -
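The case used in the example (column-major, SIDE==HplLeft, UPLO==HplUpper, TRANS==HplNoTrans, DIAG==HplNonUnit) reduces to back substitution on each column of B. The sketch below covers only that one case and is not the HPL implementation; `trsm_left_upper_sketch` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: solve A * X = alpha * B for X, overwriting B.
 * A is m-by-m upper triangular, non-unit diagonal, column-major.
 * No singularity check is made, matching the note above. */
void trsm_left_upper_sketch(int m, int n, double alpha,
                            const double *a, int lda,
                            double *b, int ldb)
{
    assert(m >= 0 && n >= 0);
    assert(lda >= (m > 1 ? m : 1) && ldb >= (m > 1 ? m : 1));
    for (int j = 0; j < n; j++) {
        double *bj = b + (size_t)j * (size_t)ldb;
        for (int i = 0; i < m; i++)
            bj[i] = (alpha == 0.0) ? 0.0 : alpha * bj[i];
        /* back substitution, last row first */
        for (int k = m - 1; k >= 0; k--) {
            const double *ak = a + (size_t)k * (size_t)lda;
            bj[k] /= ak[k];
            for (int i = 0; i < k; i++)
                bj[i] -= bj[k] * ak[i];
        }
    }
}
```

Only the upper triangle of A is read, so the value stored in a[1] of the example input has no effect on the result.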

See Also

-HPL_dgemm. - - - diff --git a/hpl/www/HPL_dtrsv.html b/hpl/www/HPL_dtrsv.html deleted file mode 100755 index 98287b76de051d44d4d3af95aac33310cb125026..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_dtrsv.html +++ /dev/null @@ -1,136 +0,0 @@ - - -HPL_dtrsv HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_dtrsv x := A^{-1} x. - -

Synopsis

-#include "hpl.h"

-void -HPL_dtrsv( -const enum HPL_ORDER -ORDER, -const enum HPL_UPLO -UPLO, -const enum HPL_TRANS -TRANS, -const enum HPL_DIAG -DIAG, -const int -N, -const double * -A, -const int -LDA, -double * -X, -const int -INCX -); - -

Description

-HPL_dtrsv -solves one of the systems of equations - - A * x = b, or A^T * x = b, - -where b and x are n-element vectors and A is an n by n non-unit, or -unit, upper or lower triangular matrix. - -No test for singularity or near-singularity is included in this -routine. Such tests must be performed before calling this routine. - -

Arguments

-
-ORDER   (local input)                 const enum HPL_ORDER
-        On entry, ORDER  specifies the storage format of the operands
-        as follows:                                                  
-           ORDER = HplRowMajor,                                      
-           ORDER = HplColumnMajor.                                   
-
-
-UPLO    (local input)                 const enum HPL_UPLO
-        On  entry,   UPLO   specifies  whether  the  upper  or  lower
-        triangular  part  of the array  A  is to be referenced.  When
-        UPLO==HplUpper, only  the upper triangular part of A is to be
-        referenced, otherwise only the lower triangular part of A is 
-        to be referenced. 
-
-
-TRANS   (local input)                 const enum HPL_TRANS
-        On entry,  TRANS  specifies  the equations  to  be  solved as
-        follows:
-           TRANS==HplNoTrans     A   * x = b,
-           TRANS==HplTrans       A^T * x = b.
-
-
-DIAG    (local input)                 const enum HPL_DIAG
-        On entry,  DIAG  specifies  whether  A  is unit triangular or
-        not. When DIAG==HplUnit,  A is assumed to be unit triangular,
-        and otherwise, A is not assumed to be unit triangular.
-
-
-N       (local input)                 const int
-        On entry, N specifies the order of the matrix A. N must be at
-        least zero.
-
-
-A       (local input)                 const double *
-        On entry,  A  points  to an array of size equal to or greater
-        than LDA * n. Before entry with  UPLO==HplUpper,  the leading
-        n by n upper triangular  part of the array A must contain the
-        upper triangular  matrix and the  strictly  lower  triangular
-        part of A is not referenced.  When  UPLO==HplLower  on entry,
-        the  leading n by n lower triangular part of the array A must
-        contain the lower triangular matrix  and  the  strictly upper
-        triangular part of A is not referenced.
-         
-        Note  that  when  DIAG==HplUnit,  the diagonal elements of  A
-        are not referenced  either,  but are assumed to be unity.
-
-
-LDA     (local input)                 const int
-        On entry,  LDA  specifies  the  leading  dimension  of  A  as
-        declared  in  the  calling  (sub) program.  LDA  must  be  at
-        least MAX(1,n).
-
-
-X       (local input/output)          double *
-        On entry,  X  is an incremented array of dimension  at  least
-        ( 1 + ( n - 1 ) * abs( INCX ) )  that  contains the vector x.
-        Before entry,  the  incremented array  X  must contain  the n
-        element right-hand side vector b. On exit,  X  is overwritten
-        with the solution vector x.
-
-
-INCX    (local input)                 const int
-        On entry, INCX specifies the increment for the elements of X.
-        INCX must not be zero.
-
- -

Example

-#include "hpl.h"

-
-int main(int argc, char *argv[])
-{
-   double a[2*2], x[2];
-   a[0] = 4.0; a[1] = 1.0; a[2] = 2.0; a[3] = 5.0;
-   x[0] = 2.0; x[1] = 1.0;
-   HPL_dtrsv( HplColumnMajor, HplLower, HplNoTrans,
-              HplNonUnit, 2, a, 2, x, 1 );
-   printf("x=[%f,%f]\n", x[0], x[1]);
-   exit(0); return(0);
-}
-
- -
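The case used in the example (column-major, UPLO==HplLower, TRANS==HplNoTrans, non-unit diagonal) is a forward substitution. The sketch below covers only that case, assumes a positive increment, and uses the hypothetical name `trsv_lower_sketch`; it is not the HPL implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: solve A * x = b with A lower triangular,
 * non-unit diagonal, column-major. On entry x holds b; on exit it
 * holds the solution. Assumption: incx > 0. */
void trsv_lower_sketch(int n, const double *a, int lda, double *x, int incx)
{
    assert(n >= 0 && incx > 0 && lda >= (n > 1 ? n : 1));
    for (int j = 0; j < n; j++) {
        const double *aj = a + (size_t)j * (size_t)lda;
        double *xj = x + (size_t)j * (size_t)incx;
        *xj /= aj[j];
        /* eliminate x_j from the remaining equations */
        for (int i = j + 1; i < n; i++)
            x[(size_t)i * (size_t)incx] -= (*xj) * aj[i];
    }
}
```

For the example input (A stored as 4, 1, 2, 5 and b = 2, 1) the sketch yields x = 0.5, 0.1, since only the lower triangle is read.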

See Also

-HPL_dger, -HPL_dgemv. - - - diff --git a/hpl/www/HPL_equil.html b/hpl/www/HPL_equil.html deleted file mode 100755 index c15569d916bc35cc3f6583527ff26a496bed3923..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_equil.html +++ /dev/null @@ -1,115 +0,0 @@ - - -HPL_equil HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_equil Equilibrate U and forward the column panel L. - -

Synopsis

-#include "hpl.h"

-void -HPL_equil( -HPL_T_panel * -PBCST, -int * -IFLAG, -HPL_T_panel * -PANEL, -const enum HPL_TRANS -TRANS, -const int -N, -double * -U, -const int -LDU, -int * -IPLEN, -const int * -IPMAP, -const int * -IPMAPM1, -int * -IWORK -); - -

Description

-HPL_equil -equilibrates the local pieces of U, so that on exit from -this function, pieces of U contained in every process row are of the -same size. This phase makes the rolling phase optimal. In addition, -this function probes for the column panel L and forwards it when -possible. - -

Arguments

-
-PBCST   (local input/output)          HPL_T_panel *
-        On entry,  PBCST  points to the data structure containing the
-        panel (to be broadcast) information.
-
-
-IFLAG   (local input/output)          int *
-        On entry, IFLAG  indicates  whether or not  the broadcast has
-        already been completed.  If not,  probing will occur, and the
-        outcome will be contained in IFLAG on exit.
-
-
-PANEL   (local input/output)          HPL_T_panel *
-        On entry,  PANEL  points to the data structure containing the
-        panel (to be equilibrated) information.
-
-
-TRANS   (global input)                const enum HPL_TRANS
-        On entry, TRANS specifies whether  U  is stored in transposed
-        or non-transposed form.
-
-
-N       (local input)                 const int
-        On entry, N  specifies the number of rows or columns of  U. N
-        must be at least 0.
-
-
-U       (local input/output)          double *
-        On entry,  U  is an array of dimension (LDU,*) containing the
-        local pieces of U in each process row.
-
-
-LDU     (local input)                 const int
-        On entry, LDU specifies the local leading dimension of U. LDU
-        should be at least MAX(1,IPLEN[nprow]) when  U  is stored  in
-        non-transposed form, and MAX(1,N) otherwise.
-
-
-IPLEN   (global input)                int *
-        On entry, IPLEN is an array of dimension NPROW+1.  This array
-        is such that IPLEN[i+1] - IPLEN[i] is the number of rows of U
-        in process IPMAP[i].
-
-
-IPMAP   (global input)                const int *
-        On entry, IPMAP is an array of dimension  NPROW.  This  array
-        contains  the  logarithmic mapping of the processes. In other
-        words, IPMAP[myrow]  is the absolute coordinate of the sorted
-        process.
-
-
-IPMAPM1 (global input)                const int *
-        On entry, IPMAPM1  is an array of dimension NPROW. This array
-        contains  the inverse of the logarithmic mapping contained in
-        IPMAP: For i in [0..NPROW), IPMAPM1[IPMAP[i]] = i.
-
-
-IWORK   (workspace)                   int *
-        On entry, IWORK is a workarray of dimension NPROW+1.
-
- -
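The relationship between IPMAP, IPMAPM1 and IPLEN stated above can be sketched as two small helpers. Both function names are hypothetical; the code only restates the identities given in the argument descriptions and does not show how HPL builds the logarithmic mapping itself.

```c
#include <assert.h>

/* Build the inverse mapping so that ipmapm1[ipmap[i]] == i.
 * Assumption: ipmap is a permutation of 0..nprow-1. */
void invert_mapping_sketch(int nprow, const int *ipmap, int *ipmapm1)
{
    assert(nprow >= 1);
    for (int i = 0; i < nprow; i++)
        ipmapm1[ipmap[i]] = i;
}

/* Number of rows of U held by process ipmap[i], given the
 * offsets array iplen of dimension nprow + 1. */
int rows_in_slot_sketch(const int *iplen, int i)
{
    return iplen[i + 1] - iplen[i];
}
```

After equilibration, `rows_in_slot_sketch` would be expected to return values that differ by at most one across slots, under the reading that "the same size" allows for a remainder.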

See Also

-HPL_pdlaswp01N, -HPL_pdlaswp01T. - - - diff --git a/hpl/www/HPL_fprintf.html b/hpl/www/HPL_fprintf.html deleted file mode 100755 index d24fb36ae0becedc705ecedad0b5e9613c3726f1..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_fprintf.html +++ /dev/null @@ -1,58 +0,0 @@ - - -HPL_fprintf HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_fprintf fprintf + fflush wrapper. - -

Synopsis

-#include "hpl.h"

-void -HPL_fprintf( -FILE * -STREAM, -const char * -FORM, -... -); - -

Description

-HPL_fprintf -is a wrapper around fprintf flushing the output stream. - -

Arguments

-
-STREAM  (local input)                 FILE *
-        On entry, STREAM specifies the output stream.
-
-
-FORM    (local input)                 const char *
-        On entry, FORM specifies the format, i.e., how the subsequent
-        arguments are converted for output.
-
-
-        (local input)                 ...
-        On entry,  ...  is the list of arguments to be printed within
-        the format string.
-
- -

Example

-#include "hpl.h"

-
-int main(int argc, char *argv[])
-{
-   HPL_fprintf( stdout, "Hello World.\n" );
-   exit(0); return(0);
-}
-
- -
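A wrapper of this kind is usually a few lines around `vfprintf` and `fflush` from the C standard library. The sketch below shows that pattern under the hypothetical name `flushed_fprintf`; it is an illustration, not the HPL source.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical sketch: format to a stream, then flush it so the text
 * appears immediately even if the process aborts shortly after. */
void flushed_fprintf(FILE *stream, const char *form, ...)
{
    assert(stream != NULL && form != NULL);
    va_list args;
    va_start(args, form);
    (void)vfprintf(stream, form, args);   /* forward the variadic list */
    va_end(args);
    (void)fflush(stream);
}
```

Flushing after every call costs some throughput, which is a reasonable trade when the output is diagnostic and several processes write to the same terminal.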

See Also

-HPL_abort, -HPL_warn. - - - diff --git a/hpl/www/HPL_grid_exit.html b/hpl/www/HPL_grid_exit.html deleted file mode 100755 index 2017b56e0099ba5cdc516a551406841a0b0b0105..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_grid_exit.html +++ /dev/null @@ -1,39 +0,0 @@ - - -HPL_grid_exit HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_grid_exit Exit process grid. - -

Synopsis

-#include "hpl.h"

-int -HPL_grid_exit( -HPL_T_grid * -GRID -); - -

Description

-HPL_grid_exit -marks the process grid object for deallocation. The -returned error code MPI_SUCCESS indicates successful completion. -Other error codes are (MPI) implementation dependent. - -

Arguments

-
-GRID    (local input/output)          HPL_T_grid *
-        On entry,  GRID  points  to the data structure containing the
-        process grid to be released.
-
- -

See Also

-HPL_pnum, -HPL_grid_init, -HPL_grid_info. - - - diff --git a/hpl/www/HPL_grid_info.html b/hpl/www/HPL_grid_info.html deleted file mode 100755 index 276b2ecc5fa7253d33a327ec8b2a6184fd5d8627..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_grid_info.html +++ /dev/null @@ -1,70 +0,0 @@ - - -HPL_grid_info HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_grid_info Retrieve grid information. - -

Synopsis

-#include "hpl.h"

-int -HPL_grid_info( -const HPL_T_grid * -GRID, -int * -NPROW, -int * -NPCOL, -int * -MYROW, -int * -MYCOL -); - -

Description

-HPL_grid_info -returns the grid shape and the coordinates in the grid -of the calling process. Successful completion is indicated by the -returned error code MPI_SUCCESS. Other error codes depend on the MPI -implementation. - -

Arguments

-
-GRID    (local input)                 const HPL_T_grid *
-        On entry,  GRID  points  to the data structure containing the
-        process grid information.
-
-
-NPROW   (global output)               int *
-        On exit,   NPROW  specifies the number of process rows in the
-        grid. NPROW is at least one.
-
-
-NPCOL   (global output)               int *
-        On exit,   NPCOL  specifies  the number of process columns in
-        the grid. NPCOL is at least one.
-
-
-MYROW   (global output)               int *
-        On exit,  MYROW  specifies my  row process  coordinate in the
-        grid. MYROW is greater than or equal  to zero  and  less than
-        NPROW.
-
-
-MYCOL   (global output)               int *
-        On exit,  MYCOL specifies my column process coordinate in the
-        grid. MYCOL is greater than or equal  to zero  and  less than
-        NPCOL.
-
- -

See Also

-HPL_pnum, -HPL_grid_init, -HPL_grid_exit. - - - diff --git a/hpl/www/HPL_grid_init.html b/hpl/www/HPL_grid_init.html deleted file mode 100755 index c16f7a8c60d8ab2840bf9b7c59b5d9f29ef42b1f..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_grid_init.html +++ /dev/null @@ -1,73 +0,0 @@ - - -HPL_grid_init HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_grid_init Create a process grid. - -

Synopsis

-#include "hpl.h"

-int -HPL_grid_init( -MPI_Comm -COMM, -const HPL_T_ORDER -ORDER, -const int -NPROW, -const int -NPCOL, -HPL_T_grid * -GRID -); - -

Description

-HPL_grid_init -creates an NPROW x NPCOL process grid using column- or -row-major ordering from an initial collection of processes identified -by an MPI communicator. Successful completion is indicated by the -returned error code MPI_SUCCESS. Other error codes depend on the MPI -implementation. The coordinates of processes that are not part of the -grid are set to values outside of [0..NPROW) x [0..NPCOL). - -

Arguments

-
-COMM    (global/local input)          MPI_Comm
-        On entry,  COMM  is  the  MPI  communicator  identifying  the
-        initial  collection  of  processes out of which  the  grid is
-        formed.
-
-
-ORDER   (global input)                const HPL_T_ORDER
-        On entry, ORDER specifies how the processes should be ordered
-        in the grid as follows:
-           ORDER = HPL_ROW_MAJOR    row-major    ordering;
-           ORDER = HPL_COLUMN_MAJOR column-major ordering;
-
-
-NPROW   (global input)                const int
-        On entry,  NPROW  specifies the number of process rows in the
-        grid to be created. NPROW must be at least one.
-
-
-NPCOL   (global input)                const int
-        On entry,  NPCOL  specifies  the number of process columns in
-        the grid to be created. NPCOL must be at least one.
-
-
-GRID    (local input/output)          HPL_T_grid *
-        On entry,  GRID  points  to the data structure containing the
-        process grid information to be initialized.
-
- -
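The two orderings selected by ORDER can be illustrated with a small standalone sketch. It is not HPL's implementation: the names sketch_order and sketch_rank_to_coords are hypothetical, the rank-to-coordinate formulas are the usual row-major and column-major conventions, and the choice of -1 for processes outside the grid is an assumption (the text above only says such coordinates fall outside [0..NPROW) x [0..NPCOL)).

```c
#include <assert.h>

/* Hypothetical ordering tags standing in for HPL_T_ORDER values. */
typedef enum { SKETCH_ROW_MAJOR, SKETCH_COLUMN_MAJOR } sketch_order;

/*
 * Map a linear rank onto an nprow x npcol grid.
 * Returns 1 and fills (*myrow, *mycol) when the rank is inside the grid.
 * Returns 0 and sets both coordinates to -1 (an assumed "outside" value)
 * when rank >= nprow * npcol.
 */
int sketch_rank_to_coords(int rank, sketch_order order,
                          int nprow, int npcol, int *myrow, int *mycol)
{
   assert(rank >= 0 && nprow >= 1 && npcol >= 1);

   if (rank >= nprow * npcol) {
      *myrow = -1;
      *mycol = -1;
      return 0;
   }
   if (order == SKETCH_ROW_MAJOR) {
      /* Consecutive ranks fill one process row before the next. */
      *myrow = rank / npcol;
      *mycol = rank % npcol;
   } else {
      /* Consecutive ranks fill one process column before the next. */
      *myrow = rank % nprow;
      *mycol = rank / nprow;
   }
   return 1;
}
```

With a 2 x 3 grid, rank 4 lands at (1,1) under row-major ordering and at (0,2) under column-major ordering, while rank 6 is reported as outside the grid.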

See Also

-HPL_pnum, -HPL_grid_info, -HPL_grid_exit. - - - diff --git a/hpl/www/HPL_idamax.html b/hpl/www/HPL_idamax.html deleted file mode 100755 index 9cbf8808b6c054b1d293bb4c247b88cd0753674f..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_idamax.html +++ /dev/null @@ -1,68 +0,0 @@ - - -HPL_idamax HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_idamax 1st k s.t. |x_k| = max_i(|x_i|). - -

Synopsis

-#include "hpl.h"

-int -HPL_idamax( -const int -N, -const double * -X, -const int -INCX -); - -

Description

-HPL_idamax -returns the index in an n-vector x of the first element -having maximum absolute value. - -

Arguments

-
-N       (local input)                 const int
-        On entry, N specifies the length of the vector x. N  must  be
-        at least zero.
-
-
-X       (local input)                 const double *
-        On entry,  X  is an incremented array of dimension  at  least
-        ( 1 + ( n - 1 ) * abs( INCX ) )  that  contains the vector x.
-
-
-INCX    (local input)                 const int
-        On entry, INCX specifies the increment for the elements of X.
-        INCX must not be zero.
-
- -

Example

-#include "hpl.h"

-
-int main(int argc, char *argv[])
-{
-   double x[3];
-   int    imax;
-   x[0] = 1.0; x[1] = 3.0; x[2] = 2.0;
-   imax = HPL_idamax( 3, x, 1 );
-   printf("imax=%d\n", imax);
-   exit(0);
-   return(0);
-}
-
- -

See Also

-HPL_daxpy, -HPL_dcopy, -HPL_dscal, -HPL_dswap. - - - diff --git a/hpl/www/HPL_indxg2l.html b/hpl/www/HPL_indxg2l.html deleted file mode 100755 index 228090a0a6089ef9ac8c5395181194d328e28fc0..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_indxg2l.html +++ /dev/null @@ -1,71 +0,0 @@ - - -HPL_indxg2l HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_indxg2l Map a global index into a local one. - -

Synopsis

-#include "hpl.h"

-int -HPL_indxg2l( -const int -IG, -const int -INB, -const int -NB, -const int -SRCPROC, -const int -NPROCS -); - -

Description

-HPL_indxg2l -computes the local index of a matrix entry pointed to by -the global index IG. This local returned index is the same in all -processes. - -

Arguments

-
-IG      (input)                       const int
-        On entry, IG specifies the global index of the matrix  entry.
-        IG must be at least zero.
-
-
-INB     (input)                       const int
-        On entry,  INB  specifies  the size of the first block of the
-        global matrix. INB must be at least one.
-
-
-NB      (input)                       const int
-        On entry,  NB specifies the blocking factor used to partition
-        and distribute the matrix. NB must be larger than one.
-
-
-SRCPROC (input)                       const int
-        On entry, if SRCPROC = -1, the data  is not  distributed  but
-        replicated,  in  which  case  this  routine returns IG in all
-        processes. Otherwise, the value of SRCPROC is ignored.
-
-
-NPROCS  (input)                       const int
-        On entry,  NPROCS  specifies the total number of process rows
-        or columns over which the matrix is distributed.  NPROCS must
-        be at least one.
-
- -

See Also

-HPL_indxg2lp, -HPL_indxg2p, -HPL_indxl2g, -HPL_numroc, -HPL_numrocI. - - - diff --git a/hpl/www/HPL_indxg2lp.html b/hpl/www/HPL_indxg2lp.html deleted file mode 100755 index ff23da9d3e589bfd296a386bb5ddd315a78618b9..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_indxg2lp.html +++ /dev/null @@ -1,86 +0,0 @@ - - -HPL_indxg2lp HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_indxg2lp Map a global index into a local index and a process coordinate. - -

Synopsis

-#include "hpl.h"

-void -HPL_indxg2lp( -int * -IL, -int * -PROC, -const int -IG, -const int -INB, -const int -NB, -const int -SRCPROC, -const int -NPROCS -); - -

Description

-HPL_indxg2lp -computes the local index of a matrix entry pointed to by -the global index IG as well as the process coordinate which possesses -this entry. The returned local index is the same in all processes. - -

Arguments

-
-IL      (output)                      int *
-        On exit, IL specifies the local index corresponding to IG. IL
-        is at least zero.
-
-
-PROC    (output)                      int *
-        On exit,  PROC  is the  coordinate of the process  owning the
-        entry specified by the global index IG. PROC is at least zero
-        and less than NPROCS.
-
-
-IG      (input)                       const int
-        On entry, IG specifies the global index of the matrix  entry.
-        IG must be at least zero.
-
-
-INB     (input)                       const int
-        On entry,  INB  specifies  the size of the first block of the
-        global matrix. INB must be at least one.
-
-
-NB      (input)                       const int
-        On entry,  NB specifies the blocking factor used to partition
-        and distribute the matrix A. NB must be larger than one.
-
-
-SRCPROC (input)                       const int
-        On entry, if SRCPROC = -1, the data  is not  distributed  but
-        replicated,  in  which  case  this  routine returns IG in all
-        processes. Otherwise, the value of SRCPROC is ignored.
-
-
-NPROCS  (input)                       const int
-        On entry,  NPROCS  specifies the total number of process rows
-        or columns over which the matrix is distributed.  NPROCS must
-        be at least one.
-
- -
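The global-to-local mapping described above can be sketched as follows. This is an illustration of a block-cyclic layout under stated assumptions, not HPL's source: the function name sketch_g2lp is hypothetical, the first block is assumed to hold INB entries and every later block NB entries, block b is assumed to belong to process (SRCPROC + b) mod NPROCS, and the replicated case SRCPROC = -1 is left out.

```c
#include <assert.h>

/*
 * Block-cyclic global-to-local mapping (illustrative sketch).
 * On return, *proc is the owning process coordinate and *il is the
 * index of the entry inside that process's local storage.
 */
void sketch_g2lp(int ig, int inb, int nb, int srcproc, int nprocs,
                 int *il, int *proc)
{
   int b, off, rounds;

   assert(ig >= 0 && inb >= 1 && nb >= 1 && nprocs >= 1);
   assert(srcproc >= 0 && srcproc < nprocs);

   if (ig < inb) {               /* entry lies in the first block */
      *il = ig;
      *proc = srcproc;
      return;
   }
   b = 1 + (ig - inb) / nb;      /* global block number, b >= 1 */
   off = (ig - inb) % nb;        /* offset inside block b */
   rounds = b / nprocs;          /* complete passes over all processes */
   *proc = (srcproc + b) % nprocs;

   if (b % nprocs == 0) {
      /* Owner is the source process: its first local block has inb entries. */
      *il = inb + (rounds - 1) * nb + off;
   } else {
      /* Every local block of any other process has nb entries. */
      *il = rounds * nb + off;
   }
}
```

For INB = 2, NB = 3, SRCPROC = 1 and NPROCS = 2, global index 6 maps to local index 3 in process 1, and global index 9 maps to local index 4 in process 0. The local index does not depend on SRCPROC, which matches the statement that it is the same in all processes.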

See Also

-HPL_indxg2l, -HPL_indxg2p, -HPL_indxl2g, -HPL_numroc, -HPL_numrocI. - - - diff --git a/hpl/www/HPL_indxg2p.html b/hpl/www/HPL_indxg2p.html deleted file mode 100755 index 49911e49a8483c6b6f991aed7d552fbe551a2a45..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_indxg2p.html +++ /dev/null @@ -1,70 +0,0 @@ - - -HPL_indxg2p HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_indxg2p Map a global index into a process coordinate. - -

Synopsis

-#include "hpl.h"

-int -HPL_indxg2p( -const int -IG, -const int -INB, -const int -NB, -const int -SRCPROC, -const int -NPROCS -); - -

Description

-HPL_indxg2p -computes the process coordinate which possesses the entry -of a matrix specified by a global index IG. - -

Arguments

-
-IG      (input)                       const int
-        On entry, IG specifies the global index of the matrix  entry.
-        IG must be at least zero.
-
-
-INB     (input)                       const int
-        On entry,  INB  specifies  the size of the first block of the
-        global matrix. INB must be at least one.
-
-
-NB      (input)                       const int
-        On entry,  NB specifies the blocking factor used to partition
-        and distribute the matrix A. NB must be larger than one.
-
-
-SRCPROC (input)                       const int
-        On entry,  SRCPROC  specifies  the coordinate of the  process
-        that possesses the first row or column of the matrix. SRCPROC
-        must be at least zero and strictly less than NPROCS.
-
-
-NPROCS  (input)                       const int
-        On entry,  NPROCS  specifies the total number of process rows
-        or columns over which the matrix is distributed.  NPROCS must
-        be at least one.
-
- -

See Also

-HPL_indxg2l, -HPL_indxg2lp, -HPL_indxl2g, -HPL_numroc, -HPL_numrocI. - - - diff --git a/hpl/www/HPL_indxl2g.html b/hpl/www/HPL_indxl2g.html deleted file mode 100755 index e20083d03f07b9c544b19e33f97d7be34939cf10..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_indxl2g.html +++ /dev/null @@ -1,78 +0,0 @@ - - -HPL_indxl2g HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_indxl2g Map an index-process pair into a global index. - -

Synopsis

-#include "hpl.h"

-int -HPL_indxl2g( -const int -IL, -const int -INB, -const int -NB, -const int -PROC, -const int -SRCPROC, -const int -NPROCS -); - -

Description

-HPL_indxl2g -computes the global index of a matrix entry pointed to -by the local index IL of the process indicated by PROC. - -

Arguments

-
-IL      (input)                       const int
-        On entry, IL specifies the local  index of the matrix  entry.
-        IL must be at least zero.
-
-
-INB     (input)                       const int
-        On entry,  INB  specifies  the size of the first block of the
-        global matrix. INB must be at least one.
-
-
-NB      (input)                       const int
-        On entry,  NB specifies the blocking factor used to partition
-        and distribute the matrix A. NB must be larger than one.
-
-
-PROC    (input)                       const int
-        On entry, PROC  specifies the coordinate of the process whose
-        local array row or column is to be determined. PROC  must  be
-        at least zero and strictly less than NPROCS.
-
-
-SRCPROC (input)                       const int
-        On entry,  SRCPROC  specifies  the coordinate of the  process
-        that possesses the first row or column of the matrix. SRCPROC
-        must be at least zero and strictly less than NPROCS.
-
-
-NPROCS  (input)                       const int
-        On entry,  NPROCS  specifies the total number of process rows
-        or columns over which the matrix is distributed.  NPROCS must
-        be at least one.
-
- -
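The inverse direction, from a local index and a process coordinate back to a global index, can be sketched in the same spirit. The function name sketch_l2g is hypothetical and the layout assumptions are the same as in the text above: a first block of INB entries, later blocks of NB entries, dealt cyclically starting at SRCPROC. This is a sketch, not HPL's code.

```c
#include <assert.h>

/*
 * Block-cyclic local-to-global mapping (illustrative sketch).
 * Returns the global index of local entry il held by process proc.
 */
int sketch_l2g(int il, int inb, int nb, int proc, int srcproc, int nprocs)
{
   int r, lb, off;

   assert(il >= 0 && inb >= 1 && nb >= 1 && nprocs >= 1);
   assert(proc >= 0 && proc < nprocs);
   assert(srcproc >= 0 && srcproc < nprocs);

   /* Distance of proc from the process that owns the first block. */
   r = (proc - srcproc + nprocs) % nprocs;

   if (r == 0) {
      if (il < inb)
         return il;               /* inside the first, inb-sized block */
      lb = 1 + (il - inb) / nb;   /* local block number, lb >= 1 */
      off = (il - inb) % nb;
      /* Global block lb * nprocs starts at inb + (lb * nprocs - 1) * nb. */
      return inb + (lb * nprocs - 1) * nb + off;
   }
   lb = il / nb;                  /* all local blocks have nb entries */
   off = il % nb;
   /* Global block lb * nprocs + r starts at inb + (lb * nprocs + r - 1) * nb. */
   return inb + (lb * nprocs + r - 1) * nb + off;
}
```

With INB = 2, NB = 3, SRCPROC = 1 and NPROCS = 2, local index 3 of process 1 maps back to global index 6, and local index 4 of process 0 maps back to global index 9.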

See Also

-HPL_indxg2l, -HPL_indxg2lp, -HPL_indxg2p, -HPL_numroc, -HPL_numrocI. - - - diff --git a/hpl/www/HPL_infog2l.html b/hpl/www/HPL_infog2l.html deleted file mode 100755 index a6b74ff5936c679ce38758d44115af393f7e78e9..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_infog2l.html +++ /dev/null @@ -1,155 +0,0 @@ - - -HPL_infog2l HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_infog2l global to local index translation. - -

Synopsis

-#include "hpl.h"

-void -HPL_infog2l( -int -I, -int -J, -const int -IMB, -const int -MB, -const int -INB, -const int -NB, -const int -RSRC, -const int -CSRC, -const int -MYROW, -const int -MYCOL, -const int -NPROW, -const int -NPCOL, -int * -II, -int * -JJ, -int * -PROW, -int * -PCOL -); - -

Description

-HPL_infog2l -computes the starting local index II, JJ corresponding to -the submatrix starting globally at the entry pointed to by I, J. This -routine also returns the coordinates in the grid of the process owning the -matrix entry of global indexes I, J, namely PROW and PCOL. - -

Arguments

-
-I       (global input)                int
-        On entry,  I  specifies  the  global  row index of the matrix
-        entry. I must be at least zero.
-
-
-J       (global input)                int
-        On entry,  J  specifies the global column index of the matrix
-        entry. J must be at least zero.
-
-
-IMB     (global input)                const int
-        On entry,  IMB  specifies  the size of the first row block of
-        the global matrix. IMB must be at least one.
-
-
-MB      (global input)                const int
-        On entry,  MB specifies the blocking factor used to partition
-        and  distribute the rows of the matrix A.  MB  must be larger
-        than one.
-
-
-INB     (global input)                const int
-        On entry, INB specifies the size of the first column block of
-        the global matrix. INB must be at least one.
-
-
-NB      (global input)                const int
-        On entry,  NB specifies the blocking factor used to partition
-        and distribute the columns of the matrix A. NB must be larger
-        than one.
-
-
-RSRC    (global input)                const int
-        On entry,  RSRC  specifies  the row coordinate of the process
-        that possesses the row  I.  RSRC  must  be at least zero  and
-        strictly less than NPROW.
-
-
-CSRC    (global input)                const int
-        On entry, CSRC specifies the column coordinate of the process
-        that possesses the column J. CSRC  must be at least zero  and
-        strictly less than NPCOL.
-
-
-MYROW   (local input)                 const int
-        On entry, MYROW  specifies my  row process  coordinate in the
-        grid. MYROW is greater than or equal  to zero  and  less than
-        NPROW.
-
-
-MYCOL   (local input)                 const int
-        On entry, MYCOL specifies my column process coordinate in the
-        grid. MYCOL is greater than or equal  to zero  and  less than
-        NPCOL.
-
-
-NPROW   (global input)                const int
-        On entry,  NPROW  specifies the number of process rows in the
-        grid. NPROW is at least one.
-
-
-NPCOL   (global input)                const int
-        On entry,  NPCOL  specifies  the number of process columns in
-        the grid. NPCOL is at least one.
-
-
-II      (local output)                int *
-        On exit, II  specifies the  local  starting  row index of the
-        submatrix. On exit, II is at least 0.
-
-
-JJ      (local output)                int *
-        On exit, JJ  specifies the local starting column index of the
-        submatrix. On exit, JJ is at least 0.
-
-
-PROW    (global output)               int *
-        On exit, PROW is the row coordinate of the process owning the
-        entry specified by the global index I.  PROW is at least zero
-        and less than NPROW.
-
-
-PCOL    (global output)               int *
-        On exit, PCOL  is the column coordinate of the process owning
-        the entry specified by the global index J.  PCOL  is at least
-        zero and less than NPCOL.
-
- -

See Also

-HPL_indxg2l, -HPL_indxg2p, -HPL_indxl2g, -HPL_numroc, -HPL_numrocI. - - - diff --git a/hpl/www/HPL_jumpit.html b/hpl/www/HPL_jumpit.html deleted file mode 100755 index c36050cfd27493c3444cf37278255aad4366240e..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_jumpit.html +++ /dev/null @@ -1,65 +0,0 @@ - - -HPL_jumpit HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_jumpit jump into the random sequence. - -

Synopsis

-#include "hpl.h"

-void -HPL_jumpit( -int * -MULT, -int * -IADD, -int * -IRANN, -int * -IRANM -); - -

Description

-HPL_jumpit -jumps in the random sequence from the number X(n) encoded -in IRANN to the number X(m) encoded in IRANM using the constants A -and C encoded in MULT and IADD: X(m) = A * X(n) + C. The constants A -and C depend on m and n; see the function HPL_xjumpm for how -to initialize them. - -

Arguments

-
-MULT    (local input)                 int *
-        On entry, MULT is an array of dimension 2, that contains the
-        16-lower and 15-higher bits of the constant A.
-
-
-IADD    (local input)                 int *
-        On entry, IADD is an array of dimension 2, that contains the
-        16-lower and 15-higher bits of the constant C.
-
-
-IRANN   (local input)                 int *
-        On entry,  IRANN  is an array of dimension 2,  that contains 
-        the 16-lower and 15-higher bits of the encoding of X(n).
-
-
-IRANM   (local output)                int *
-        On entry,  IRANM  is an array of dimension 2.  On exit, this
-        array contains respectively the 16-lower and  15-higher bits
-        of the encoding of X(m).
-
- -
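The relation X(m) = A * X(n) + C can be illustrated with a plain linear congruential generator. The sketch below is not HPL's code: the function names are hypothetical, arithmetic modulo 2^31 is an assumption suggested by the 16 + 15 bit encoding above, native 64-bit integers replace the two-int encoding, and the jump constants are built with a simple loop of m - n steps rather than whatever scheme HPL_xjumpm uses.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MOD_MASK 0x7FFFFFFFu   /* arithmetic modulo 2^31 (assumed) */

/* One step of the generator: x -> (a * x + c) mod 2^31. */
uint32_t sketch_lcg_step(uint32_t a, uint32_t c, uint32_t x)
{
   return (uint32_t)(((uint64_t)a * x + c) & SKETCH_MOD_MASK);
}

/*
 * Compute A and C such that k steps from x equal (A * x + C) mod 2^31.
 * Composing x -> A*x + C with one more step gives
 * x -> (a*A)*x + (a*C + c), which is the update applied k times below.
 */
void sketch_lcg_jump_consts(uint32_t a, uint32_t c, unsigned k,
                            uint32_t *A, uint32_t *C)
{
   uint64_t ja = 1, jc = 0;   /* identity map, i.e. zero steps */
   unsigned i;

   for (i = 0; i < k; i++) {
      jc = ((uint64_t)a * jc + c) & SKETCH_MOD_MASK;
      ja = ((uint64_t)a * ja) & SKETCH_MOD_MASK;
   }
   *A = (uint32_t)ja;
   *C = (uint32_t)jc;
}
```

Once A and C are known, a single call sketch_lcg_step(A, C, x) lands on the same value as k consecutive calls of the one-step generator, which is what lets each process start at its own position in the sequence without generating the numbers in between.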

See Also

-HPL_ladd, -HPL_lmul, -HPL_setran, -HPL_xjumpm, -HPL_rand. - - - diff --git a/hpl/www/HPL_ladd.html b/hpl/www/HPL_ladd.html deleted file mode 100755 index c53b365539e94f039059ee32fd8f770dd209c20e..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_ladd.html +++ /dev/null @@ -1,57 +0,0 @@ - - -HPL_ladd HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_ladd Adds two long positive integers. - -

Synopsis

-#include "hpl.h"

-void -HPL_ladd( -int * -J, -int * -K, -int * -I -); - -

Description

-HPL_ladd -adds without carry two long positive integers K and J and -puts the result into I. The long integers I, J, K are encoded on 64 -bits using an array of 2 integers. The 32-lower bits are stored in -the first entry of each array, the 32-higher bits in the second -entry. - -

Arguments

-
-J       (local input)                 int *
-        On entry, J is an integer array of dimension 2 containing the
-        encoded long integer J.
-
-
-K       (local input)                 int *
-        On entry, K is an integer array of dimension 2 containing the
-        encoded long integer K.
-
-
-I       (local output)                int *
-        On entry, I is an integer array of dimension 2. On exit, this
-        array contains the encoded long integer result.
-
- -

See Also

-HPL_lmul, -HPL_setran, -HPL_xjumpm, -HPL_jumpit, -HPL_rand. - - - diff --git a/hpl/www/HPL_lmul.html b/hpl/www/HPL_lmul.html deleted file mode 100755 index 9fcead8e532e2e2593e455d77ad1c1a7495c5121..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_lmul.html +++ /dev/null @@ -1,58 +0,0 @@ - - -HPL_lmul HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_lmul multiplies 2 long positive integers. - -

Synopsis

-#include "hpl.h"

-void -HPL_lmul( -int * -K, -int * -J, -int * -I -); - -

Description

-HPL_lmul -multiplies without carry two long positive integers K and J -and puts the result into I. The long integers I, J, K are encoded on -64 bits using an array of 2 integers. The 32-lower bits are stored in -the first entry of each array, the 32-higher bits in the second entry -of each array. For efficiency purposes, the intrinsic modulo function -is inlined. - -

Arguments

-
-K       (local input)                 int *
-        On entry, K is an integer array of dimension 2 containing the
-        encoded long integer K.
-
-
-J       (local input)                 int *
-        On entry, J is an integer array of dimension 2 containing the
-        encoded long integer J.
-
-
-I       (local output)                int *
-        On entry, I is an integer array of dimension 2. On exit, this
-        array contains the encoded long integer result.
-
- -

See Also

-HPL_ladd, -HPL_setran, -HPL_xjumpm, -HPL_jumpit, -HPL_rand. - - - diff --git a/hpl/www/HPL_logsort.html b/hpl/www/HPL_logsort.html deleted file mode 100755 index 6b0fad1fa5fbac27f55c53ec4955a9ea45beb130..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_logsort.html +++ /dev/null @@ -1,83 +0,0 @@ - - -HPL_logsort HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_logsort Sort the processes in logarithmic order. - -

Synopsis

-#include "hpl.h"

-void -HPL_logsort( -const int -NPROCS, -const int -ICURROC, -int * -IPLEN, -int * -IPMAP, -int * -IPMAPM1 -); - -

Description

-HPL_logsort -computes an array IPMAP and its inverse IPMAPM1 that -contain the logarithmically sorted process ids with respect to the local -number of rows of U that they own. This is necessary to ensure that -the logarithmic spreading of U is optimal in terms of both the number of steps -and the communication volume. In other words, the largest pieces -of U will be sent a minimal number of times. - -

Arguments

-
-NPROCS  (global input)                const int
-        On entry, NPROCS  specifies the number of process rows in the
-        process grid. NPROCS is at least one.
-
-
-ICURROC (global input)                const int
-        On entry, ICURROC is the source process row.
-
-
-IPLEN   (global input/output)         int *
-        On entry, IPLEN is an array of dimension NPROCS+1,  such that
-        IPLEN[0] is 0, and IPLEN[i] contains the number of rows of U,
-        that process i-1 has.  On exit,  IPLEN[i]  is  the number  of
-        rows of U  in the processes before process IPMAP[i] after the
-        sort,  with  the convention that  IPLEN[NPROCS] is  the total
-        number  of rows  of the panel.  In other words,  IPLEN[i+1] -
-        IPLEN[i] is  the  number of rows of A that should be moved to
-        the process IPMAP[i].  IPLEN  is such that the number of rows
-        of  the  source process  row is IPLEN[1] - IPLEN[0],  and the
-        remaining  entries  of  this  array  are  sorted  so that the
-        quantities IPLEN[i+1]-IPLEN[i] are logarithmically sorted.
-
-
-IPMAP   (global output)               int *
-        On entry,  IPMAP  is an array of dimension  NPROCS.  On exit,
-        this array contains the logarithmic mapping of the processes. In
-        other words, IPMAP[myroc] is the corresponding sorted process
-        coordinate.
-
-
-IPMAPM1 (global output)               int *
-        On entry, IPMAPM1  is an array of dimension NPROCS.  On exit,
-        this  array  contains  the inverse of the logarithmic mapping
-        contained  in  IPMAP:  IPMAPM1[ IPMAP[i] ] = i,  for all i in
-        [0..NPROCS).
-
- -

See Also

-HPL_plindx1, -HPL_plindx10, -HPL_pdlaswp01N, -HPL_pdlaswp01T. - - - diff --git a/hpl/www/HPL_max.html b/hpl/www/HPL_max.html deleted file mode 100755 index 0d5df5bfcff1dedacf4e3f85cd391eafd299ed4b..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_max.html +++ /dev/null @@ -1,60 +0,0 @@ - - -HPL_max HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_max Combine (max) two buffers. - -

Synopsis

-#include "hpl.h"

-void -HPL_max( -const int -N, -const void * -IN, -void * -INOUT, -const HPL_T_TYPE -DTYPE -); - -

Description

-HPL_max -combines (max) two buffers. - -

Arguments

-
-N       (input)                       const int
-        On entry, N  specifies  the  length  of  the  buffers  to  be
-        combined. N must be at least zero.
-
-
-IN      (input)                       const void *
-        On entry, IN points to the input-only buffer to be combined.
-
-
-INOUT   (input/output)                void *
-        On entry, INOUT  points  to  the  input-output  buffer  to be
-        combined.  On exit,  the  entries of this array contains  the
-        combined results.
-
-
-DTYPE   (input)                       const HPL_T_TYPE
-        On entry,  DTYPE  specifies the type of the buffer  operands.
-
- -
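The combine operation described above is an elementwise maximum whose result overwrites INOUT. A minimal sketch follows; the names sketch_type and sketch_max are hypothetical stand-ins (HPL's own enum is HPL_T_TYPE), and only int and double buffers are handled here as an assumption.

```c
#include <assert.h>

/* Hypothetical type tags standing in for HPL_T_TYPE values. */
typedef enum { SKETCH_INT, SKETCH_DOUBLE } sketch_type;

/* Elementwise maximum of two buffers; the result overwrites inout. */
void sketch_max(int n, const void *in, void *inout, sketch_type dtype)
{
   int i;

   assert(n >= 0);

   if (dtype == SKETCH_INT) {
      const int *a = (const int *)in;
      int *b = (int *)inout;
      for (i = 0; i < n; i++)
         if (a[i] > b[i]) b[i] = a[i];
   } else {
      const double *a = (const double *)in;
      double *b = (double *)inout;
      for (i = 0; i < n; i++)
         if (a[i] > b[i]) b[i] = a[i];
   }
}
```

A function of this shape can serve as the combining step of a reduction: each participant repeatedly merges a received buffer into its own. The min variant differs only in the direction of the comparison.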

See Also

-HPL_broadcast, -HPL_reduce, -HPL_all_reduce, -HPL_barrier, -HPL_min, -HPL_sum. - - - diff --git a/hpl/www/HPL_min.html b/hpl/www/HPL_min.html deleted file mode 100755 index 54a3be3b5c58fea6c26098eefa8354a8e7459cf2..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_min.html +++ /dev/null @@ -1,60 +0,0 @@ - - -HPL_min HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_min Combine (min) two buffers. - -

Synopsis

-#include "hpl.h"

-void -HPL_min( -const int -N, -const void * -IN, -void * -INOUT, -const HPL_T_TYPE -DTYPE -); - -

Description

-HPL_min -combines (min) two buffers. - -

Arguments

-
-N       (input)                       const int
-        On entry, N  specifies  the  length  of  the  buffers  to  be
-        combined. N must be at least zero.
-
-
-IN      (input)                       const void *
-        On entry, IN points to the input-only buffer to be combined.
-
-
-INOUT   (input/output)                void *
-        On entry, INOUT  points  to  the  input-output  buffer  to be
-        combined.  On exit,  the  entries of this array contains  the
-        combined results.
-
-
-DTYPE   (input)                       const HPL_T_TYPE
-        On entry,  DTYPE  specifies the type of the buffer  operands.
-
- -

See Also

-HPL_broadcast, -HPL_reduce, -HPL_all_reduce, -HPL_barrier, -HPL_max, -HPL_sum. - - - diff --git a/hpl/www/HPL_numroc.html b/hpl/www/HPL_numroc.html deleted file mode 100755 index ab5c89cc2d6cd2800d9a4cd5fc3620bb0b0c15e9..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_numroc.html +++ /dev/null @@ -1,79 +0,0 @@ - - -HPL_numroc HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_numroc Compute the local number of rows/columns. - -

Synopsis

-#include "hpl.h"

-int -HPL_numroc( -const int -N, -const int -INB, -const int -NB, -const int -PROC, -const int -SRCPROC, -const int -NPROCS -); - -

Description

-HPL_numroc -returns the local number of matrix rows/columns process -PROC will get if we give out N rows/columns starting from global -index 0. - -

Arguments

-
-N       (input)                       const int
-        On entry, N  specifies the number of rows/columns being dealt
-        out. N must be at least zero.
-
-
-INB     (input)                       const int
-        On entry,  INB  specifies  the size of the first block of the
-        global matrix. INB must be at least one.
-
-
-NB      (input)                       const int
-        On entry,  NB specifies the blocking factor used to partition
-        and distribute the matrix A. NB must be larger than one.
-
-
-PROC    (input)                       const int
-        On entry, PROC specifies  the coordinate of the process whose
-        local portion is determined.  PROC must be at least zero  and
-        strictly less than NPROCS.
-
-
-SRCPROC (input)                       const int
-        On entry,  SRCPROC  specifies  the coordinate of the  process
-        that possesses the first row or column of the matrix. SRCPROC
-        must be at least zero and strictly less than NPROCS.
-
-
-NPROCS  (input)                       const int
-        On entry,  NPROCS  specifies the total number of process rows
-        or columns over which the matrix is distributed.  NPROCS must
-        be at least one.
-
- -
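The count returned for a given process can be worked out in closed form. The sketch below is an illustration under the same block-cyclic assumptions as the argument descriptions (first block of INB entries, later blocks of NB entries, dealt cyclically starting at SRCPROC); the name sketch_numroc is hypothetical and the code is not taken from HPL.

```c
#include <assert.h>

/*
 * Number of entries process proc receives when n entries are dealt out
 * block-cyclically starting from global index 0 (illustrative sketch).
 */
int sketch_numroc(int n, int inb, int nb, int proc, int srcproc, int nprocs)
{
   int r, rem, nblocks, extra, count;

   assert(n >= 0 && inb >= 1 && nb >= 1 && nprocs >= 1);
   assert(proc >= 0 && proc < nprocs);
   assert(srcproc >= 0 && srcproc < nprocs);

   /* Distance of proc from the process that owns the first block. */
   r = (proc - srcproc + nprocs) % nprocs;

   if (n <= inb)                 /* everything fits in the first block */
      return (r == 0) ? n : 0;

   rem = n - inb;
   nblocks = rem / nb;           /* full nb-sized blocks 1..nblocks */
   extra = rem % nb;             /* size of the trailing partial block */

   if (r == 0)
      count = inb + (nblocks / nprocs) * nb;
   else
      count = ((nblocks - r + nprocs) / nprocs) * nb;

   /* The partial block has number nblocks + 1. */
   if ((nblocks + 1) % nprocs == r)
      count += extra;

   return count;
}
```

For N = 10, INB = 2, NB = 3, SRCPROC = 1 and NPROCS = 2, both processes receive 5 entries: process 1 holds global indices 0, 1, 5, 6, 7 and process 0 holds 2, 3, 4, 8, 9.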

See Also

-HPL_indxg2l, -HPL_indxg2lp, -HPL_indxg2p, -HPL_indxl2g, -HPL_numrocI. - - - diff --git a/hpl/www/HPL_numrocI.html b/hpl/www/HPL_numrocI.html deleted file mode 100755 index 7b8dcc54fe01b86057b37b38770db730bfe58b41..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_numrocI.html +++ /dev/null @@ -1,86 +0,0 @@ - - -HPL_numrocI HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_numrocI Compute the local number of rows/columns. - -

Synopsis

-#include "hpl.h"

-int -HPL_numrocI( -const int -N, -const int -I, -const int -INB, -const int -NB, -const int -PROC, -const int -SRCPROC, -const int -NPROCS -); - -

Description

-HPL_numrocI -returns the local number of matrix rows/columns process -PROC will get if we give out N rows/columns starting from global -index I. - -

Arguments

-
-N       (input)                       const int
-        On entry, N  specifies the number of rows/columns being dealt
-        out. N must be at least zero.
-
-
-I       (input)                       const int
-        On entry, I  specifies the global index of the matrix entry.
-        I must be at least zero.
-
-
-INB     (input)                       const int
-        On entry,  INB  specifies  the size of the first block of the
-        global matrix. INB must be at least one.
-
-
-NB      (input)                       const int
-        On entry,  NB specifies the blocking factor used to partition
-        and distribute the matrix A. NB must be larger than one.
-
-
-PROC    (input)                       const int
-        On entry, PROC specifies  the coordinate of the process whose
-        local portion is determined.  PROC must be at least zero  and
-        strictly less than NPROCS.
-
-
-SRCPROC (input)                       const int
-        On entry,  SRCPROC  specifies  the coordinate of the  process
-        that possesses the first row or column of the matrix. SRCPROC
-        must be at least zero and strictly less than NPROCS.
-
-
-NPROCS  (input)                       const int
-        On entry,  NPROCS  specifies the total number of process rows
-        or columns over which the matrix is distributed.  NPROCS must
-        be at least one.
-
- -

See Also

-HPL_indxg2l, -HPL_indxg2lp, -HPL_indxg2p, -HPL_indxl2g, -HPL_numroc. - - - diff --git a/hpl/www/HPL_pabort.html b/hpl/www/HPL_pabort.html deleted file mode 100755 index 536575f5879e95629d7db1a590f524af7df3d6f4..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_pabort.html +++ /dev/null @@ -1,57 +0,0 @@ - - -HPL_pabort HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_pabort halts execution. - -

Synopsis

-#include "hpl.h"

-void -HPL_pabort( -int -LINE, -const char * -SRNAME, -const char * -FORM, -... -); - -

Description

-HPL_pabort -displays an error message on stderr and halts execution. - -

Arguments

-
-LINE    (local input)                 int
-        On entry,  LINE  specifies the line  number in the file where
-        the  error  has occurred.  When  LINE  is not a positive line
-        number, it is ignored.
-
-
-SRNAME  (local input)                 const char *
-        On entry, SRNAME  should  be the name of the routine  calling
-        this error handler.
-
-
-FORM    (local input)                 const char *
-        On entry, FORM specifies the format, i.e., how the subsequent
-        arguments are converted for output.
-
-
-        (local input)                 ...
-        On entry,  ...  is the list of arguments to be printed within
-        the format string.
-
- -

See Also

-HPL_fprintf, -HPL_pwarn. - - - diff --git a/hpl/www/HPL_packL.html b/hpl/www/HPL_packL.html deleted file mode 100755 index ce294b30746d16905892ad99db4874a3d1a19e90..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_packL.html +++ /dev/null @@ -1,59 +0,0 @@ - - -HPL_packL HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_packL Form the MPI structure for the row ring broadcasts. - -

Synopsis

-#include "hpl.h"

-int -HPL_packL( -HPL_T_panel * -PANEL, -const int -INDEX, -const int -LEN, -const int -IBUF -); - -

Description

-HPL_packL -forms the MPI data type for the panel to be broadcast. -Successful completion is indicated by the returned error code -MPI_SUCCESS. - -

Arguments

-
-PANEL   (input/output)                HPL_T_panel *
-        On entry,  PANEL  points to the  current panel data structure
-        being broadcast.
-
-
-INDEX   (input)                       const int
-        On entry,  INDEX  points  to  the  first entry of the  packed
-        buffer being broadcast.
-
-
-LEN     (input)                       const int
-        On entry, LEN is the length of the packed buffer.
-
-
-IBUF    (input)                       const int
-        On entry, IBUF  specifies the panel buffer/count/type entries
-        that should be initialized.
-
- -

See Also

-HPL_binit, -HPL_bcast, -HPL_bwait. - - - diff --git a/hpl/www/HPL_pddriver.html b/hpl/www/HPL_pddriver.html deleted file mode 100755 index ce3f9c99da85fbe4fa2dca6c2ac35be19b7be6c6..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_pddriver.html +++ /dev/null @@ -1,27 +0,0 @@ - - -main HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-main HPL main timing program. - -

Synopsis

-#include "hpl.h"

-int -main(); - -

Description

-main -is the main driver program for testing the HPL routines. -This program is driven by a short data file named "HPL.dat". - -

See Also

-HPL_pdinfo, -HPL_pdtest. - - - diff --git a/hpl/www/HPL_pdfact.html b/hpl/www/HPL_pdfact.html deleted file mode 100755 index f91a21d29cef2acc6b65e8af92e3fc31f74f7f6a..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_pdfact.html +++ /dev/null @@ -1,78 +0,0 @@ - - -HPL_pdfact HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_pdfact recursive panel factorization. - -

Synopsis

-#include "hpl.h"

-void -HPL_pdfact( -HPL_T_panel * -PANEL -); - -

Description

-HPL_pdfact -recursively factorizes a 1-dimensional panel of columns. -The RPFACT function pointer specifies the recursive algorithm to be -used, either Crout, Left- or Right-looking. NBMIN sets the recursive -stopping criterion in terms of the number of columns in the panel, -and NDIV specifies the number of subpanels each panel should be -divided into. Usually a value of 2 will be chosen. Finally, PFACT is -a function pointer specifying the non-recursive algorithm to be used -on at most NBMIN columns. Here too, one can choose between Crout, -Left- or Right-looking. Empirical tests seem to indicate that values -of 4 or 8 for NBMIN give the best results. - -Bi-directional exchange is used to perform the swap::broadcast -operations at once for one column in the panel. This results in a -lower number of slightly larger messages than usual. On P processes -and assuming bi-directional links, the running time of this function -can be approximated by (when N is equal to N0): - - N0 * log_2( P ) * ( lat + ( 2*N0 + 4 ) / bdwth ) + - N0^2 * ( M - N0/3 ) * gam2-3 - -where M is the local number of rows of the panel, lat and bdwth are -the latency and bandwidth of the network for double precision real -words, and gam2-3 is an estimate of the Level 2 and Level 3 BLAS -rate of execution. The recursive algorithm indeed makes it possible -to almost achieve Level 3 BLAS performance in the panel -factorization. On a large number of modern machines, this operation -is, however, latency bound, meaning that its cost can be estimated by -only the latency portion N0 * log_2(P) * lat. Mono-directional links -will double this communication cost. - -

Arguments

-
-PANEL   (local input/output)          HPL_T_panel *
-        On entry,  PANEL  points to the data structure containing the
-        panel information.
-
- -

See Also

-HPL_dlocmax, -HPL_dlocswpN, -HPL_dlocswpT, -HPL_pdmxswp, -HPL_pdpancrN, -HPL_pdpancrT, -HPL_pdpanllN, -HPL_pdpanllT, -HPL_pdpanrlN, -HPL_pdpanrlT, -HPL_pdrpancrN, -HPL_pdrpancrT, -HPL_pdrpanllN, -HPL_pdrpanllT, -HPL_pdrpanrlN, -HPL_pdrpanrlT. - - - diff --git a/hpl/www/HPL_pdgesv.html b/hpl/www/HPL_pdgesv.html deleted file mode 100755 index 8e58f293d4a0932c9342d0de4eaa52abd930697d..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_pdgesv.html +++ /dev/null @@ -1,56 +0,0 @@ - - -HPL_pdgesv HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_pdgesv Solve A x = b. - -

Synopsis

-#include "hpl.h"

-void -HPL_pdgesv( -HPL_T_grid * -GRID, -HPL_T_palg * -ALGO, -HPL_T_pmat * -A -); - -

Description

-HPL_pdgesv -factors an N-by-N+1 matrix using LU factorization with row -partial pivoting. The main algorithm is the "right looking" variant -with or without look-ahead. The lower triangular factor is left -unpivoted and the pivots are not returned. The right-hand side is -column N+1 of the coefficient matrix. - -
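As an illustration of what factoring an augmented N-by-N+1 matrix means, here is a serial, single-process sketch. It is not HPL code: the function name is hypothetical, the matrix is assumed to be stored column-major, and the back substitution is folded into the same routine for brevity. Row swaps are applied only to columns k..N, so the stored multipliers stay unpivoted, as the description states.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical serial sketch. A has N rows and N+1 columns, stored
 * column-major with leading dimension lda; column N holds b.
 * On return x holds the solution of A(:,0:N-1) x = b.
 * Returns 0 on success, -1 if an exactly zero pivot is met. */
int lu_solve_augmented(int N, double *A, int lda, double *x)
{
    assert(N >= 0 && lda >= (N > 0 ? N : 1));
    for (int k = 0; k < N; k++) {
        /* Row partial pivoting: largest magnitude in column k. */
        int piv = k;
        for (int i = k + 1; i < N; i++)
            if (fabs(A[i + k * lda]) > fabs(A[piv + k * lda])) piv = i;
        if (A[piv + k * lda] == 0.0) return -1;
        /* Swap rows k and piv in columns k..N only; the multipliers
         * already stored to the left are left unpivoted. */
        if (piv != k) {
            for (int j = k; j <= N; j++) {
                double t = A[k + j * lda];
                A[k + j * lda] = A[piv + j * lda];
                A[piv + j * lda] = t;
            }
        }
        /* Right-looking update of the trailing columns, including b. */
        for (int i = k + 1; i < N; i++) {
            double l = A[i + k * lda] / A[k + k * lda];
            A[i + k * lda] = l;
            for (int j = k + 1; j <= N; j++)
                A[i + j * lda] -= l * A[k + j * lda];
        }
    }
    /* Back substitution with U and the transformed right-hand side. */
    for (int i = N - 1; i >= 0; i--) {
        double s = A[i + N * lda];
        for (int j = i + 1; j < N; j++) s -= A[i + j * lda] * x[j];
        x[i] = s / A[i + i * lda];
    }
    return 0;
}
```

Because the right-hand side is carried along as an extra column, no pivot vector has to be kept to solve the system afterwards.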

Arguments

-
-GRID    (local input)                 HPL_T_grid *
-        On entry,  GRID  points  to the data structure containing the
-        process grid information.
-
-
-ALGO    (global input)                HPL_T_palg *
-        On entry,  ALGO  points to  the data structure containing the
-        algorithmic parameters.
-
-
-A       (local input/output)          HPL_T_pmat *
-        On entry, A points to the data structure containing the local
-        array information.
-
- -

See Also

-HPL_pdgesv0, -HPL_pdgesvK1, -HPL_pdgesvK2, -HPL_pdtrsv. - - - diff --git a/hpl/www/HPL_pdgesv0.html b/hpl/www/HPL_pdgesv0.html deleted file mode 100755 index 4a9bd99a69abc17c7e43634393bcdba6f4b952dd..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_pdgesv0.html +++ /dev/null @@ -1,63 +0,0 @@ - - -HPL_pdgesv0 HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_pdgesv0 Factor an N x N+1 matrix. - -

Synopsis

-#include "hpl.h"

-void -HPL_pdgesv0( -HPL_T_grid * -GRID, -HPL_T_palg * -ALGO, -HPL_T_pmat * -A -); - -

Description

-HPL_pdgesv0 -factors an N-by-N+1 matrix using LU factorization with row -partial pivoting. The main algorithm is the "right looking" variant -without look-ahead. The lower triangular factor is left unpivoted and -the pivots are not returned. The right-hand side is column N+1 of -the coefficient matrix. - -

Arguments

-
-GRID    (local input)                 HPL_T_grid *
-        On entry,  GRID  points  to the data structure containing the
-        process grid information.
-
-
-ALGO    (global input)                HPL_T_palg *
-        On entry,  ALGO  points to  the data structure containing the
-        algorithmic parameters.
-
-
-A       (local input/output)          HPL_T_pmat *
-        On entry, A points to the data structure containing the local
-        array information.
-
- -

See Also

-HPL_pdgesv, -HPL_pdgesvK1, -HPL_pdgesvK2, -HPL_pdfact, -HPL_binit, -HPL_bcast, -HPL_bwait, -HPL_pdupdateNN, -HPL_pdupdateNT, -HPL_pdupdateTN, -HPL_pdupdateTT. - - - diff --git a/hpl/www/HPL_pdgesvK1.html b/hpl/www/HPL_pdgesvK1.html deleted file mode 100755 index 156a366436d81c4e19b60fffd35d5f15f4225018..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_pdgesvK1.html +++ /dev/null @@ -1,62 +0,0 @@ - - -HPL_pdgesvK1 HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_pdgesvK1 Factor an N x N+1 matrix. - -

Synopsis

-#include "hpl.h"

-void -HPL_pdgesvK1( -HPL_T_grid * -GRID, -HPL_T_palg * -ALGO, -HPL_T_pmat * -A -); - -

Description

-HPL_pdgesvK1 -factors an N-by-N+1 matrix using LU factorization with row -partial pivoting. The main algorithm is the "right looking" variant -with look-ahead. The lower triangular factor is left unpivoted and -the pivots are not returned. The right-hand side is column N+1 of -the coefficient matrix. - -

Arguments

-
-GRID    (local input)                 HPL_T_grid *
-        On entry,  GRID  points  to the data structure containing the
-        process grid information.
-
-
-ALGO    (global input)                HPL_T_palg *
-        On entry,  ALGO  points to  the data structure containing the
-        algorithmic parameters.
-
-
-A       (local input/output)          HPL_T_pmat *
-        On entry, A points to the data structure containing the local
-        array information.
-
- -

See Also

-HPL_pdgesv, -HPL_pdgesvK2, -HPL_pdfact, -HPL_binit, -HPL_bcast, -HPL_bwait, -HPL_pdupdateNN, -HPL_pdupdateNT, -HPL_pdupdateTN, -HPL_pdupdateTT. - - - diff --git a/hpl/www/HPL_pdgesvK2.html b/hpl/www/HPL_pdgesvK2.html deleted file mode 100755 index 52bfcb99a1da9a4cfde37431db515f985ab05c31..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_pdgesvK2.html +++ /dev/null @@ -1,63 +0,0 @@ - - -HPL_pdgesvK2 HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_pdgesvK2 Factor an N x N+1 matrix. - -

Synopsis

-#include "hpl.h"

-void -HPL_pdgesvK2( -HPL_T_grid * -GRID, -HPL_T_palg * -ALGO, -HPL_T_pmat * -A -); - -

Description

-HPL_pdgesvK2 -factors an N-by-N+1 matrix using LU factorization with row -partial pivoting. The main algorithm is the "right looking" variant -with look-ahead. The lower triangular factor is left unpivoted and -the pivots are not returned. The right-hand side is column N+1 of -the coefficient matrix. - -

Arguments

-
-GRID    (local input)                 HPL_T_grid *
-        On entry,  GRID  points  to the data structure containing the
-        process grid information.
-
-
-ALGO    (global input)                HPL_T_palg *
-        On entry,  ALGO  points to  the data structure containing the
-        algorithmic parameters.
-
-
-A       (local input/output)          HPL_T_pmat *
-        On entry, A points to the data structure containing the local
-        array information.
-
- -

See Also

-HPL_pdgesv, -HPL_pdgesv0, -HPL_pdgesvK1, -HPL_pdfact, -HPL_binit, -HPL_bcast, -HPL_bwait, -HPL_pdupdateNN, -HPL_pdupdateNT, -HPL_pdupdateTN, -HPL_pdupdateTT. - - - diff --git a/hpl/www/HPL_pdinfo.html b/hpl/www/HPL_pdinfo.html deleted file mode 100755 index fa118445f1da51381775d754c453871ee0c38dd5..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_pdinfo.html +++ /dev/null @@ -1,252 +0,0 @@ - - -HPL_pdinfo HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_pdinfo Read input parameter file. - -

Synopsis

-#include "hpl.h"

-void -HPL_pdinfo( -HPL_T_test * -TEST, -int * -NS, -int * -N, -int * -NBS, -int * -NB, -HPL_T_ORDER * -PMAPPIN, -int * -NPQS, -int * -P, -int * -Q, -int * -NPFS, -HPL_T_FACT * -PF, -int * -NBMS, -int * -NBM, -int * -NDVS, -int * -NDV, -int * -NRFS, -HPL_T_FACT * -RF, -int * -NTPS, -HPL_T_TOP * -TP, -int * -NDHS, -int * -DH, -HPL_T_SWAP * -FSWAP, -int * -TSWAP, -int * -L1NOTRAN, -int * -UNOTRAN, -int * -EQUIL, -int * -ALIGN -); - -

Description

-HPL_pdinfo -reads the startup information for the various tests and -transmits it to all processes. - -

Arguments

-
-TEST    (global output)               HPL_T_test *
-        On entry, TEST  points to a testing data structure.  On exit,
-        the fields of this data structure are initialized as follows:
-        TEST->outfp  specifies the output file where the results will
-        be printed.  It is only defined and used by  the process 0 of
-        the grid.  TEST->thrsh specifies the threshold  value for the
-        test ratio.  TEST->epsil is the relative machine precision of
-        the distributed computer.  Finally  the test counters, kfail,
-        kpass, kskip, ktest are initialized to zero.
-
-
-NS      (global output)               int *
-        On exit,  NS  specifies the number of different problem sizes
-        to be tested. NS is less than or equal to HPL_MAX_PARAM.
-
-
-N       (global output)               int *
-        On entry, N is an array of dimension HPL_MAX_PARAM.  On exit,
-        the first NS entries of this array contain the  problem sizes
-        to run the code with.
-
-
-NBS     (global output)               int *
-        On exit,  NBS  specifies the number of different distribution
-        blocking factors to be tested. NBS must be less than or equal
-        to HPL_MAX_PARAM.
-
-
-NB      (global output)               int *
-        On entry, NB is an array of dimension HPL_MAX_PARAM. On exit,
-        the first NBS entries of this array contain the values of the
-        various distribution blocking factors, to run the code with.
-
-
-PMAPPIN (global output)               HPL_T_ORDER *
-        On exit,  PMAPPIN  specifies  the process  mapping  onto  the
-        nodes of the MPI machine configuration.  PMAPPIN  defaults to
-        row-major ordering.
-
-
-NPQS    (global output)               int *
-        On exit, NPQS  specifies the  number of different values that
-        can be used for P and Q, i.e., the number of process grids to
-        run  the  code with.  NPQS must be  less  than  or  equal  to
-        HPL_MAX_PARAM.
-
-
-P       (global output)               int *
-        On entry, P  is an array of dimension HPL_MAX_PARAM. On exit,
-        the first NPQS entries of this array contain the values of P,
-        the number of process rows of the  NPQS grids to run the code
-        with.
-
-
-Q       (global output)               int *
-        On entry, Q  is an array of dimension HPL_MAX_PARAM. On exit,
-        the first NPQS entries of this array contain the values of Q,
-        the number of process columns of the  NPQS  grids to  run the
-        code with.
-
-
-NPFS    (global output)               int *
-        On exit, NPFS  specifies the  number of different values that
-        can be used for PF : the panel factorization algorithm to run
-        the code with. NPFS is less than or equal to HPL_MAX_PARAM.
-
-
-PF      (global output)               HPL_T_FACT *
-        On entry, PF is an array of dimension HPL_MAX_PARAM. On exit,
-        the first  NPFS  entries  of this array  contain  the various
-        panel factorization algorithms to run the code with.
-
-
-NBMS    (global output)               int *
-        On exit,  NBMS  specifies  the  number  of  various recursive
-        stopping criteria  to be tested.  NBMS  must be  less than or
-        equal to HPL_MAX_PARAM.
-
-
-NBM     (global output)               int *
-        On entry,  NBM  is an array of  dimension  HPL_MAX_PARAM.  On
-        exit, the first NBMS entries of this array contain the values
-        of the various recursive stopping criteria to be tested.
-
-
-NDVS    (global output)               int *
-        On exit,  NDVS  specifies  the number  of various numbers  of
-        panels in recursion to be tested.  NDVS is less than or equal
-        to HPL_MAX_PARAM.
-
-
-NDV     (global output)               int *
-        On entry,  NDV  is an array of  dimension  HPL_MAX_PARAM.  On
-        exit, the first NDVS entries of this array contain the values
-        of the various numbers of panels in recursion to be tested.
-
-
-NRFS    (global output)               int *
-        On exit, NRFS  specifies the  number of different values that
-        can be used for RF : the recursive factorization algorithm to
-        be tested. NRFS is less than or equal to HPL_MAX_PARAM.
-
-
-RF      (global output)               HPL_T_FACT *
-        On entry, RF is an array of dimension HPL_MAX_PARAM. On exit,
-        the first  NRFS  entries  of  this array contain  the various
-        recursive factorization algorithms to run the code with.
-
-
-NTPS    (global output)               int *
-        On exit, NTPS  specifies the  number of different values that
-        can be used for the  broadcast topologies  to be tested. NTPS
-        is less than or equal to HPL_MAX_PARAM.
-
-
-TP      (global output)               HPL_T_TOP *
-        On entry, TP is an array of dimension HPL_MAX_PARAM. On exit,
-        the  first NTPS  entries of this  array  contain  the various
-        broadcast (along rows) topologies to run the code with.
-
-
-NDHS    (global output)               int *
-        On exit, NDHS  specifies the  number of different values that
-        can be used for the  lookahead depths to be  tested.  NDHS is
-        less than or equal to HPL_MAX_PARAM.
-
-
-DH      (global output)               int *
-        On entry,  DH  is  an array of  dimension  HPL_MAX_PARAM.  On
-        exit, the first NDHS entries of this array contain the values
-        of lookahead depths to run the code with.  Each value must be
-        at least zero; a value of zero means no lookahead.
-
-
-FSWAP   (global output)               HPL_T_SWAP *
-        On exit, FSWAP specifies the swapping algorithm to be used in
-        all tests.
-
-
-TSWAP   (global output)               int *
-        On exit,  TSWAP  specifies the swapping threshold as a number
-        of columns when the mixed swapping algorithm was chosen.
-
-
-L1NOTRAN (global output)              int *
-        On exit, L1NOTRAN specifies whether the upper triangle of the
-        panels of columns  should  be stored  in  no-transposed  form
-        (L1NOTRAN=1) or in transposed form (L1NOTRAN=0).
-
-
-UNOTRAN (global output)               int *
-        On exit, UNOTRAN  specifies whether the panels of rows should
-        be stored in  no-transposed form  (UNOTRAN=1)  or  transposed
-        form (UNOTRAN=0) during their broadcast.
-
-
-EQUIL   (global output)               int *
-        On exit,  EQUIL  specifies  whether  equilibration during the
-        swap-broadcast  of  the  panel of rows  should  be  performed
-        (EQUIL=1) or not (EQUIL=0).
-
-
-ALIGN   (global output)               int *
-        On exit,  ALIGN  specifies the alignment  of  the dynamically
-        allocated buffers in double precision words. ALIGN is greater
-        than zero.
-
- -

See Also

-HPL_pddriver, -HPL_pdtest. - - - diff --git a/hpl/www/HPL_pdlamch.html b/hpl/www/HPL_pdlamch.html deleted file mode 100755 index 0f86ed603674bee47b9d5ccbf44d8a8b6bee058e..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_pdlamch.html +++ /dev/null @@ -1,67 +0,0 @@ - - -HPL_pdlamch HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_pdlamch determines machine-specific arithmetic constants. - -

Synopsis

-#include "hpl.h"

-double -HPL_pdlamch( -MPI_Comm -COMM, -const HPL_T_MACH -CMACH -); - -

Description

-HPL_pdlamch -determines machine-specific arithmetic constants such as -the relative machine precision (eps), the safe minimum (sfmin) such that -1/sfmin does not overflow, the base of the machine (base), the precision -(prec), the number of (base) digits in the mantissa (t), whether -rounding occurs in addition (rnd = 1.0 if so, 0.0 otherwise), the minimum -exponent before (gradual) underflow (emin), the underflow threshold -(rmin = base**(emin-1)), the largest exponent before overflow (emax), and -the overflow threshold (rmax = (base**emax)*(1-eps)). - -
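For a single process, most of these constants can be read from the standard `<float.h>` macros rather than probed at run time. The sketch below is an illustration only: the struct and function names are hypothetical, eps is taken as half of DBL_EPSILON on the assumption of round-to-nearest arithmetic, and the safe-minimum logic is an assumed adjustment in the spirit of the description (bump sfmin when 1/rmax would otherwise be the larger value).

```c
#include <assert.h>
#include <float.h>

/* Hypothetical container for the constants named above. */
typedef struct {
    double eps, sfmin, base, prec, t, emin, rmin, emax, rmax;
} mach_consts;

/* Fills the constants from <float.h>; assumes round-to-nearest. */
mach_consts query_mach_consts(void)
{
    mach_consts c;
    c.base = (double)FLT_RADIX;
    c.t    = (double)DBL_MANT_DIG;   /* base digits in the mantissa */
    c.eps  = 0.5 * DBL_EPSILON;      /* assumption: rounding occurs */
    c.prec = c.eps * c.base;
    c.emin = (double)DBL_MIN_EXP;
    c.rmin = DBL_MIN;                /* base**(emin-1) */
    c.emax = (double)DBL_MAX_EXP;
    c.rmax = DBL_MAX;                /* (base**emax)*(1-eps) */
    /* Safe minimum: smallest value whose reciprocal does not overflow. */
    c.sfmin = c.rmin;
    double small = 1.0 / c.rmax;
    if (small >= c.sfmin) c.sfmin = small * (1.0 + c.eps);
    assert(c.sfmin > 0.0);
    return c;
}
```

In a distributed setting the per-process values would still have to be combined across the communicator, which this local sketch does not attempt.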

Arguments

-
-COMM    (global/local input)          MPI_Comm
-        The MPI communicator identifying the process collection.
-
-
-CMACH   (global input)                const HPL_T_MACH
-        Specifies the value to be returned by HPL_pdlamch            
-           = HPL_MACH_EPS,   HPL_pdlamch := eps (default)            
-           = HPL_MACH_SFMIN, HPL_pdlamch := sfmin                    
-           = HPL_MACH_BASE,  HPL_pdlamch := base                     
-           = HPL_MACH_PREC,  HPL_pdlamch := eps*base                 
-           = HPL_MACH_MLEN,  HPL_pdlamch := t                        
-           = HPL_MACH_RND,   HPL_pdlamch := rnd                      
-           = HPL_MACH_EMIN,  HPL_pdlamch := emin                     
-           = HPL_MACH_RMIN,  HPL_pdlamch := rmin                     
-           = HPL_MACH_EMAX,  HPL_pdlamch := emax                     
-           = HPL_MACH_RMAX,  HPL_pdlamch := rmax                     
-         
-        where                                                        
-         
-           eps   = relative machine precision,                       
-           sfmin = safe minimum,                                     
-           base  = base of the machine,                              
-           prec  = eps*base,                                         
-           t     = number of digits in the mantissa,                 
-           rnd   = 1.0 if rounding occurs in addition,               
-           emin  = minimum exponent before underflow,                
-           rmin  = underflow threshold,                              
-           emax  = largest exponent before overflow,                 
-           rmax  = overflow threshold.
-
- - - diff --git a/hpl/www/HPL_pdlange.html b/hpl/www/HPL_pdlange.html deleted file mode 100755 index f1b9619c01f880bd03dd6105dd2f7695bf0d2743..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_pdlange.html +++ /dev/null @@ -1,88 +0,0 @@ - - -HPL_pdlange HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_pdlange Compute ||A||. - -

Synopsis

-#include "hpl.h"

-double -HPL_pdlange( -const HPL_T_grid * -GRID, -const HPL_T_NORM -NORM, -const int -M, -const int -N, -const int -NB, -const double * -A, -const int -LDA -); - -

Description

-HPL_pdlange -returns the value of the one norm, or the infinity norm, -or the element of largest absolute value of a distributed matrix A: - - - max(abs(A(i,j))) when NORM = HPL_NORM_A, - norm1(A), when NORM = HPL_NORM_1, - normI(A), when NORM = HPL_NORM_I, - -where norm1 denotes the one norm of a matrix (maximum column sum) and -normI denotes the infinity norm of a matrix (maximum row sum). Note -that max(abs(A(i,j))) is not a matrix norm. - -
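The three quantities can be illustrated on a purely local matrix. The following sketch is not the distributed routine: the enum and function names are hypothetical, the matrix is assumed to be column-major with leading dimension lda, and no communication is performed.

```c
#include <assert.h>
#include <math.h>
#include <stdlib.h>

/* Hypothetical selector mirroring the three cases described above. */
typedef enum { NORM_A, NORM_1, NORM_I } norm_kind;

/* Local (single-process) sketch; returns -1.0 if allocation fails. */
double local_lange(norm_kind norm, int M, int N, const double *A, int lda)
{
    assert(M >= 0 && N >= 0 && lda >= (M > 1 ? M : 1));
    double result = 0.0;
    if (norm == NORM_A) {
        /* Element of largest absolute value. */
        for (int j = 0; j < N; j++)
            for (int i = 0; i < M; i++) {
                double v = fabs(A[i + j * lda]);
                if (v > result) result = v;
            }
    } else if (norm == NORM_1) {
        /* One norm: maximum column sum. */
        for (int j = 0; j < N; j++) {
            double s = 0.0;
            for (int i = 0; i < M; i++) s += fabs(A[i + j * lda]);
            if (s > result) result = s;
        }
    } else {
        /* Infinity norm: maximum row sum. */
        double *row = calloc((size_t)(M > 0 ? M : 1), sizeof *row);
        if (row == NULL) return -1.0;
        for (int j = 0; j < N; j++)
            for (int i = 0; i < M; i++) row[i] += fabs(A[i + j * lda]);
        for (int i = 0; i < M; i++)
            if (row[i] > result) result = row[i];
        free(row);
    }
    return result;
}
```

In the distributed case the partial column or row sums would additionally have to be reduced across the process grid before taking the maximum.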

Arguments

-
-GRID    (local input)                 const HPL_T_grid *
-        On entry,  GRID  points  to the data structure containing the
-        process grid information.
-
-
-NORM    (global input)                const HPL_T_NORM
-        On entry,  NORM  specifies  the  value to be returned by this
-        function as described above.
-
-
-M       (global input)                const int
-        On entry,  M  specifies  the number  of rows of the matrix A.
-        M must be at least zero.
-
-
-N       (global input)                const int
-        On entry,  N specifies the number of columns of the matrix A.
-        N must be at least zero.
-
-
-NB      (global input)                const int
-        On entry,  NB specifies the blocking factor used to partition
-        and distribute the matrix. NB must be larger than one.
-
-
-A       (local input)                 const double *
-        On entry,  A  points to an array of dimension  (LDA,LocQ(N)),
-        that contains the local pieces of the distributed matrix A.
-
-
-LDA     (local input)                 const int
-        On entry, LDA specifies the leading dimension of the array A.
-        LDA must be at least max(1,LocP(M)).
-
- -

See Also

-HPL_pdlaprnt, -HPL_fprintf. - - - diff --git a/hpl/www/HPL_pdlaprnt.html b/hpl/www/HPL_pdlaprnt.html deleted file mode 100755 index 456d3453ec9fd6eae171850976dcf965f816f153..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_pdlaprnt.html +++ /dev/null @@ -1,94 +0,0 @@ - - -HPL_pdlaprnt HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_pdlaprnt Print a distributed matrix A. - -

Synopsis

-#include "hpl.h"

-void -HPL_pdlaprnt( -const HPL_T_grid * -GRID, -const int -M, -const int -N, -const int -NB, -double * -A, -const int -LDA, -const int -IAROW, -const int -IACOL, -const char * -CMATNM -); - -

Description

-HPL_pdlaprnt -prints to standard error a distributed matrix A. The -local pieces of A are sent to the process of coordinates (0,0) in -the grid and then printed. - -

Arguments

-
-GRID    (local input)                 const HPL_T_grid *
-        On entry,  GRID  points  to the data structure containing the
-        process grid information.
-
-
-M       (global input)                const int
-        On entry,  M  specifies the number of rows of the coefficient
-        matrix A. M must be at least zero.
-
-
-N       (global input)                const int
-        On  entry,   N   specifies  the  number  of  columns  of  the
-        coefficient matrix A. N must be at least zero.
-
-
-NB      (global input)                const int
-        On entry,  NB specifies the blocking factor used to partition
-        and distribute the matrix. NB must be larger than one.
-
-
-A       (local input)                 double *
-        On entry,  A  points to an  array of dimension (LDA,LocQ(N)).
-        This array contains the coefficient matrix to be printed.
-
-
-LDA     (local input)                 const int
-        On entry, LDA specifies the leading dimension of the array A.
-        LDA must be at least max(1,LocP(M)).
-
-
-IAROW   (global input)                const int
-        On entry,  IAROW  specifies the row process coordinate owning
-        the  first row of A.  IAROW  must be  larger than or equal to
-        zero and less than NPROW.
-
-
-IACOL   (global input)                const int
-        On entry,  IACOL  specifies  the  column  process  coordinate
-        owning the  first column  of A. IACOL  must be larger than or
-        equal to zero and less than NPCOL.
-
-
-CMATNM  (global input)                const char *
-        On entry, CMATNM is the name of the matrix to be printed.
-
- -

See Also

-HPL_fprintf. - - - diff --git a/hpl/www/HPL_pdlaswp00N.html b/hpl/www/HPL_pdlaswp00N.html deleted file mode 100755 index c3b3a30a2c5601ea0826b9d4aaabf5f2690c79bd..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_pdlaswp00N.html +++ /dev/null @@ -1,82 +0,0 @@ - - -HPL_pdlaswp00N HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_pdlaswp00N Broadcast a column panel L and swap the row panel U. - -

Synopsis

-#include "hpl.h"

-void -HPL_pdlaswp00N( -HPL_T_panel * -PBCST, -int * -IFLAG, -HPL_T_panel * -PANEL, -const int -NN -); - -

Description

-HPL_pdlaswp00N -applies the NB row interchanges to NN columns of the -trailing submatrix and broadcasts a column panel. - -Bi-directional exchange is used to perform the swap :: broadcast of -the row panel U at once, resulting in a lower number of messages than -usual as well as a lower communication volume. With P process rows and -assuming bi-directional links, the running time of this function can -be approximated by: - - log_2(P) * (lat + NB*LocQ(N) / bdwth) - -where NB is the number of rows of the row panel U, N is the global -number of columns being updated, and lat and bdwth are the latency and -bandwidth of the network for double precision real words. -Mono-directional links will double this communication cost. - -

Arguments

-
-PBCST   (local input/output)          HPL_T_panel *
-        On entry,  PBCST  points to the data structure containing the
-        panel (to be broadcast) information.
-
-
-IFLAG   (local input/output)          int *
-        On entry, IFLAG  indicates  whether or not  the broadcast has
-        already been completed.  If not,  probing will occur, and the
-        outcome will be contained in IFLAG on exit.
-
-
-PANEL   (local input/output)          HPL_T_panel *
-        On entry,  PANEL  points to the data structure containing the
-        panel (to be broadcast and swapped) information.
-
-
-NN      (local input)                 const int
-        On entry, NN specifies  the  local  number  of columns of the
-        trailing  submatrix  to  be swapped and broadcast starting at
-        the current position. NN must be at least zero.
-
- -

See Also

-HPL_pdgesv, -HPL_pdgesvK2, -HPL_pdupdateNN, -HPL_pdupdateTN, -HPL_pipid, -HPL_plindx0, -HPL_dlaswp01N, -HPL_dlaswp02N, -HPL_dlaswp03N, -HPL_dlaswp04N, -HPL_dlaswp05N. - - - diff --git a/hpl/www/HPL_pdlaswp00T.html b/hpl/www/HPL_pdlaswp00T.html deleted file mode 100755 index ff530a072207104b4bbd3b4c6297ec78149de166..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_pdlaswp00T.html +++ /dev/null @@ -1,82 +0,0 @@ - - -HPL_pdlaswp00T HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_pdlaswp00T Broadcast a column panel L and swap the row panel U. - -

Synopsis

-#include "hpl.h"

-void -HPL_pdlaswp00T( -HPL_T_panel * -PBCST, -int * -IFLAG, -HPL_T_panel * -PANEL, -const int -NN -); - -

Description

-HPL_pdlaswp00T -applies the NB row interchanges to NN columns of the -trailing submatrix and broadcasts a column panel. - -Bi-directional exchange is used to perform the swap :: broadcast of -the row panel U at once, resulting in a lower number of messages than -usual as well as a lower communication volume. With P process rows and -assuming bi-directional links, the running time of this function can -be approximated by: - - log_2(P) * (lat + NB*LocQ(N) / bdwth) - -where NB is the number of rows of the row panel U, N is the global -number of columns being updated, and lat and bdwth are the latency and -bandwidth of the network for double precision real words. -Mono-directional links will double this communication cost. - -

Arguments

-
-PBCST   (local input/output)          HPL_T_panel *
-        On entry,  PBCST  points to the data structure containing the
-        panel (to be broadcast) information.
-
-
-IFLAG   (local input/output)          int *
-        On entry, IFLAG  indicates  whether or not  the broadcast has
-        already been completed.  If not,  probing will occur, and the
-        outcome will be contained in IFLAG on exit.
-
-
-PANEL   (local input/output)          HPL_T_panel *
-        On entry,  PANEL  points to the data structure containing the
-        panel (to be broadcast and swapped) information.
-
-
-NN      (local input)                 const int
-        On entry, NN specifies  the  local  number  of columns of the
-        trailing  submatrix  to  be swapped and broadcast starting at
-        the current position. NN must be at least zero.
-
- -

See Also

-HPL_pdgesv, -HPL_pdgesvK2, -HPL_pdupdateNT, -HPL_pdupdateTT, -HPL_pipid, -HPL_plindx0, -HPL_dlaswp01T, -HPL_dlaswp02N, -HPL_dlaswp03T, -HPL_dlaswp04T, -HPL_dlaswp05T. - - - diff --git a/hpl/www/HPL_pdlaswp01N.html b/hpl/www/HPL_pdlaswp01N.html deleted file mode 100755 index 21482a9e7372b5e5baf1911b27e181abf6bb44a7..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_pdlaswp01N.html +++ /dev/null @@ -1,86 +0,0 @@ - - -HPL_pdlaswp01N HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_pdlaswp01N Broadcast a column panel L and swap the row panel U. - -

Synopsis

-#include "hpl.h"

-void -HPL_pdlaswp01N( -HPL_T_panel * -PBCST, -int * -IFLAG, -HPL_T_panel * -PANEL, -const int -NN -); - -

Description

-HPL_pdlaswp01N -applies the NB row interchanges to NN columns of the -trailing submatrix and broadcasts a column panel. - -A "Spread then roll" algorithm performs the swap :: broadcast of the -row panel U at once, resulting in a minimal communication volume and -a "very good" use of the connectivity if available. With P process -rows and assuming bi-directional links, the running time of this -function can be approximated by: - - (log_2(P)+(P-1)) * lat + K * NB * LocQ(N) / bdwth - -where NB is the number of rows of the row panel U, N is the global -number of columns being updated, and lat and bdwth are the latency and -bandwidth of the network for double precision real words. K is -a constant in (2,3] that depends on the achieved bandwidth during a -simultaneous message exchange between two processes. An empirical -optimistic value of K is typically 2.4. - -

Arguments

-
-PBCST   (local input/output)          HPL_T_panel *
-        On entry,  PBCST  points to the data structure containing the
-        panel (to be broadcast) information.
-
-
-IFLAG   (local input/output)          int *
-        On entry, IFLAG  indicates  whether or not  the broadcast has
-        already been completed.  If not,  probing will occur, and the
-        outcome will be contained in IFLAG on exit.
-
-
-PANEL   (local input/output)          HPL_T_panel *
-        On entry,  PANEL  points to the data structure containing the
-        panel information.
-
-
-NN      (local input)                 const int
-        On entry, NN specifies  the  local  number  of columns of the
-        trailing  submatrix  to  be swapped and broadcast starting at
-        the current position. NN must be at least zero.
-
- -

See Also

-HPL_pdgesv, -HPL_pdgesvK2, -HPL_pdupdateNN, -HPL_pdupdateTN, -HPL_pipid, -HPL_plindx1, -HPL_plindx10, -HPL_spreadN, -HPL_equil, -HPL_rollN, -HPL_dlaswp00N, -HPL_dlaswp01N, -HPL_dlaswp06N. - - - diff --git a/hpl/www/HPL_pdlaswp01T.html b/hpl/www/HPL_pdlaswp01T.html deleted file mode 100755 index e549429e9a2b8a1690ee090446789c3948727121..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_pdlaswp01T.html +++ /dev/null @@ -1,86 +0,0 @@ - - -HPL_pdlaswp01T HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_pdlaswp01T Broadcast a column panel L and swap the row panel U. - -

Synopsis

-#include "hpl.h"

-void -HPL_pdlaswp01T( -HPL_T_panel * -PBCST, -int * -IFLAG, -HPL_T_panel * -PANEL, -const int -NN -); - -

Description

-HPL_pdlaswp01T -applies the NB row interchanges to NN columns of the -trailing submatrix and broadcasts a column panel. - -A "Spread then roll" algorithm performs the swap :: broadcast of the -row panel U at once, resulting in a minimal communication volume and -a "very good" use of the connectivity if available. With P process -rows and assuming bi-directional links, the running time of this -function can be approximated by: - - (log_2(P)+(P-1)) * lat + K * NB * LocQ(N) / bdwth - -where NB is the number of rows of the row panel U, N is the global -number of columns being updated, and lat and bdwth are the latency and -bandwidth of the network for double precision real words. K is -a constant in (2,3] that depends on the achieved bandwidth during a -simultaneous message exchange between two processes. An empirical -optimistic value of K is typically 2.4. - -

Arguments

-
-PBCST   (local input/output)          HPL_T_panel *
-        On entry,  PBCST  points to the data structure containing the
-        panel (to be broadcast) information.
-
-
-IFLAG   (local input/output)          int *
-        On entry, IFLAG  indicates  whether or not  the broadcast has
-        already been completed.  If not,  probing will occur, and the
-        outcome will be contained in IFLAG on exit.
-
-
-PANEL   (local input/output)          HPL_T_panel *
-        On entry,  PANEL  points to the data structure containing the
-        panel information.
-
-
-NN      (local input)                 const int
-        On entry, NN specifies  the  local  number  of columns of the
-        trailing  submatrix  to  be swapped and broadcast starting at
-        the current position. NN must be at least zero.
-
- -

See Also

-HPL_pdgesv, -HPL_pdgesvK2, -HPL_pdupdateNT, -HPL_pdupdateTT, -HPL_pipid, -HPL_plindx1, -HPL_plindx10, -HPL_spreadT, -HPL_equil, -HPL_rollT, -HPL_dlaswp10N, -HPL_dlaswp01T, -HPL_dlaswp06T. - - - diff --git a/hpl/www/HPL_pdmatgen.html b/hpl/www/HPL_pdmatgen.html deleted file mode 100755 index 973535f69764a85140a819ea935024e7a4d61787..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_pdmatgen.html +++ /dev/null @@ -1,87 +0,0 @@ - - -HPL_pdmatgen HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_pdmatgen Parallel random matrix generator. - -

Synopsis

-#include "hpl.h"

-void -HPL_pdmatgen( -const HPL_T_grid * -GRID, -const int -M, -const int -N, -const int -NB, -double * -A, -const int -LDA, -const int -ISEED -); - -

Description

-HPL_pdmatgen generates (or regenerates) a parallel random matrix A.
-
-The pseudo-random generator uses the linear congruential algorithm:
-X(n+1) = (a * X(n) + c) mod m as described in the Art of Computer
-Programming, Knuth 1973, Vol. 2.
-
-
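The recurrence X(n+1) = (a * X(n) + c) mod m can be sketched in a few lines. The constants below are illustrative only and are not claimed to be the ones HPL uses; the modulus m = 2^32, the function names `lcg_next` and `lcg_fill`, and the scaling of the output to [-0.5, 0.5) are all assumptions made for this example.

```c
#include <stdint.h>

/* Illustrative multiplier and increment (m = 2^32).
 * These are NOT claimed to be the constants used by HPL. */
#define LCG_A 1664525u
#define LCG_C 1013904223u

/* One step of X(n+1) = (a * X(n) + c) mod m.
 * The "mod 2^32" is provided by 32-bit unsigned wrap-around. */
uint32_t lcg_next(uint32_t x)
{
    return (uint32_t)(LCG_A * x + LCG_C);
}

/* Fills out[0..n-1] from the given seed. The mapping of the 32-bit
 * state to a double in [-0.5, 0.5) is an assumption of this sketch. */
void lcg_fill(double *out, int n, uint32_t seed)
{
    uint32_t x = seed;
    for (int i = 0; i < n; i++) {
        x = lcg_next(x);
        out[i] = (double)x / 4294967296.0 - 0.5;
    }
}
```

Because the sequence depends only on the seed, calling `lcg_fill` twice with the same seed reproduces the same values, which is what makes regenerating a matrix possible.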

Arguments

-
-GRID    (local input)                 const HPL_T_grid *
-        On entry,  GRID  points  to the data structure containing the
-        process grid information.
-
-
-M       (global input)                const int
-        On entry,  M  specifies  the number  of rows of the matrix A.
-        M must be at least zero.
-
-
-N       (global input)                const int
-        On entry,  N specifies the number of columns of the matrix A.
-        N must be at least zero.
-
-
-NB      (global input)                const int
-        On entry,  NB specifies the blocking factor used to partition
-        and distribute the matrix A. NB must be larger than one.
-
-
-A       (local output)                double *
-        On entry,  A  points  to an array of dimension (LDA,LocQ(N)).
-        On exit, this array contains the coefficients of the randomly
-        generated matrix.
-
-
-LDA     (local input)                 const int
-        On entry, LDA specifies the leading dimension of the array A.
-        LDA must be at least max(1,LocP(M)).
-
-
-ISEED   (global input)                const int
-        On entry, ISEED  specifies  the  seed  number to generate the
-        matrix A. ISEED must be at least zero.
-
- -

See Also

-HPL_ladd, -HPL_lmul, -HPL_setran, -HPL_xjumpm, -HPL_jumpit, -HPL_rand. - - - diff --git a/hpl/www/HPL_pdmxswp.html b/hpl/www/HPL_pdmxswp.html deleted file mode 100755 index 5094e07ca1a882e58d9433e205ee0aa2c6e4991f..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_pdmxswp.html +++ /dev/null @@ -1,96 +0,0 @@ - - -HPL_pdmxswp HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_pdmxswp Swaps and broadcasts the pivot row. - -

Synopsis

-#include "hpl.h"

-void -HPL_pdmxswp( -HPL_T_panel * -PANEL, -const int -M, -const int -II, -const int -JJ, -double * -WORK -); - -

Description

-HPL_pdmxswp swaps and broadcasts the absolute value max row using
-bi-directional exchange. The buffer is partially set by HPL_dlocmax.
-
-Bi-directional exchange is used to perform the swap::broadcast
-operations at once for one column in the panel. This results in a
-lower number of slightly larger messages than usual. On P processes
-and assuming bi-directional links, the running time of this function
-can be approximated by
-
-   log_2( P ) * ( lat + ( 2 * N0 + 4 ) / bdwth )
-
-where lat and bdwth are the latency and bandwidth of the network for
-double precision real elements. Communication only occurs in one
-process column. Mono-directional links will cause the communication
-cost to double.
-
-
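As a rough illustration of the estimate above, the sketch below evaluates log_2(P) * (lat + (2*N0 + 4) / bdwth) and doubles it for mono-directional links. The function names are hypothetical and not part of HPL; rounding log_2(P) up to an integer is an assumption that is exact for powers of two.

```c
/* Hypothetical helper (not part of HPL): smallest k with 2^k >= p. */
int exchange_steps(int p)
{
    int k = 0;
    while ((1 << k) < p) {
        k++;
    }
    return k;
}

/* Evaluates log_2(P) * (lat + (2*N0 + 4) / bdwth).
 * p             : number of processes in the process column
 * n0            : length of the max row being exchanged
 * lat           : latency in seconds
 * bdwth         : bandwidth in double precision elements per second
 * bidirectional : nonzero for bi-directional links; zero doubles the
 *                 cost, as stated for mono-directional links */
double mxswp_time(int p, int n0, double lat, double bdwth, int bidirectional)
{
    double t = (double)exchange_steps(p)
             * (lat + (double)(2 * n0 + 4) / bdwth);
    return bidirectional ? t : 2.0 * t;
}
```

With P = 8, N0 = 8, lat = 1.0 and bdwth = 20.0, each of the three exchange steps costs 1.0 + 20 / 20.0.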

Arguments

-
-PANEL   (local input/output)          HPL_T_panel *
-        On entry,  PANEL  points to the data structure containing the
-        panel information.
-
-
-M       (local input)                 const int
-        On entry,  M specifies the local number of rows of the matrix
-        column on which this function operates.
-
-
-II      (local input)                 const int
-        On entry, II  specifies the row offset where the column to be
-        operated on starts with respect to the panel.
-
-
-JJ      (local input)                 const int
-        On entry, JJ  specifies the column offset where the column to
-        be operated on starts with respect to the panel.
-
-
-WORK    (local workspace)             double *
-        On entry, WORK  is a workarray of size at least 2 * (4+2*N0).
-        It  is assumed that  HPL_dlocmax  was called  prior  to  this
-        routine to  initialize  the first four entries of this array.
-        On exit, the  N0  length max row is stored in WORK[4:4+N0-1];
-        Note that this is also the  JJth  row  (or column) of L1. The
-        remaining part is used as a temporary array.
-
- -
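The WORK layout described above (at least 2 * (4+2*N0) entries, with the N0-length max row in WORK[4:4+N0-1] on exit) can be captured with two small helpers. This is only a sketch: `mxswp_work_len` and `mxswp_max_row` are hypothetical names and do not exist in HPL.

```c
#include <stddef.h>

/* Minimum number of doubles in WORK: 2 * (4 + 2*N0). */
size_t mxswp_work_len(int n0)
{
    return 2u * (4u + 2u * (size_t)n0);
}

/* Start of the N0-length max row, i.e. WORK[4], valid on exit.
 * The first four entries are the ones initialized by HPL_dlocmax. */
const double *mxswp_max_row(const double *work)
{
    return work + 4;
}
```

A caller would allocate `mxswp_work_len(N0)` doubles before the call and read the pivot row through `mxswp_max_row(WORK)` afterwards.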

See Also

-HPL_dlocmax, -HPL_dlocswpN, -HPL_dlocswpT, -HPL_pdpancrN, -HPL_pdpancrT, -HPL_pdpanllN, -HPL_pdpanllT, -HPL_pdpanrlN, -HPL_pdpanrlT, -HPL_pdrpancrN, -HPL_pdrpancrT, -HPL_pdrpanllN, -HPL_pdrpanllT, -HPL_pdrpanrlN, -HPL_pdrpanrlT, -HPL_pdfact. - - - diff --git a/hpl/www/HPL_pdpancrN.html b/hpl/www/HPL_pdpancrN.html deleted file mode 100755 index c4011f211657526546ff817d87fbfbe2623ce7dd..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_pdpancrN.html +++ /dev/null @@ -1,100 +0,0 @@ - - -HPL_pdpancrN HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_pdpancrN Crout panel factorization. - -

Synopsis

-#include "hpl.h"

-void -HPL_pdpancrN( -HPL_T_panel * -PANEL, -const int -M, -const int -N, -const int -ICOFF, -double * -WORK -); - -

Description

-HPL_pdpancrN factorizes a panel of columns that is a sub-array of a
-larger one-dimensional panel A using the Crout variant of the usual
-one-dimensional algorithm. The lower triangular N0-by-N0 upper block
-of the panel is stored in no-transpose form (i.e. just like the input
-matrix itself).
-
-Bi-directional exchange is used to perform the swap::broadcast
-operations at once for one column in the panel. This results in a
-lower number of slightly larger messages than usual. On P processes
-and assuming bi-directional links, the running time of this function
-can be approximated by (when N is equal to N0):
-
-   N0 * log_2( P ) * ( lat + ( 2*N0 + 4 ) / bdwth ) +
-   N0^2 * ( M - N0/3 ) * gam2-3
-
-where M is the local number of rows of the panel, lat and bdwth are
-the latency and bandwidth of the network for double precision real
-words, and gam2-3 is an estimate of the Level 2 and Level 3 BLAS
-rate of execution. The recursive algorithm indeed allows the panel
-factorization to almost achieve Level 3 BLAS performance. On a
-large number of modern machines, however, this operation is latency
-bound, meaning that its cost can be estimated by only the latency
-portion N0 * log_2(P) * lat. Mono-directional links will double this
-communication cost.
-
-Note that one iteration of the main loop is unrolled. The local
-computation of the absolute value max of the next column is performed
-just after its update by the current column. This brings the
-current column through cache only once at each step. The current
-implementation does not perform any blocking for this sequence of
-BLAS operations; however, the design allows for plugging in an optimal
-(machine-specific) specialized BLAS-like kernel. This idea was
-suggested to us by Fred Gustavson, IBM T.J. Watson Research Center.
-
-
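The cost model above splits into a communication term and a computation term. The sketch below evaluates both, along with the latency-only estimate. None of these functions exist in HPL; the names are hypothetical. Two assumptions are made: log_2(P) is rounded up to an integer, and gam2-3 is treated as a time per floating-point operation, since the formula multiplies by it.

```c
/* Hypothetical helper (not part of HPL): smallest k with 2^k >= p. */
int log2_steps(int p)
{
    int k = 0;
    while ((1 << k) < p) {
        k++;
    }
    return k;
}

/* N0 * log_2(P) * (lat + (2*N0 + 4) / bdwth) + N0^2 * (M - N0/3) * gam2-3
 * p     : number of processes in the process column
 * n0    : number of panel columns (N equal to N0 is assumed)
 * m     : local number of rows of the panel
 * lat   : latency in seconds
 * bdwth : bandwidth in double precision words per second
 * gam23 : assumed time per operation for Level 2 / Level 3 BLAS */
double panel_fact_time(int p, int n0, int m,
                       double lat, double bdwth, double gam23)
{
    double comm = (double)n0 * (double)log2_steps(p)
                * (lat + (double)(2 * n0 + 4) / bdwth);
    double comp = (double)n0 * (double)n0
                * ((double)m - (double)n0 / 3.0) * gam23;
    return comm + comp;
}

/* Latency-bound estimate: N0 * log_2(P) * lat. */
double panel_fact_latency(int p, int n0, double lat)
{
    return (double)n0 * (double)log2_steps(p) * lat;
}
```

Comparing `panel_fact_latency` with `panel_fact_time` for a given machine gives a quick indication of how much of the estimate is due to latency alone.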

Arguments

-
-PANEL   (local input/output)          HPL_T_panel *
-        On entry,  PANEL  points to the data structure containing the
-        panel information.
-
-
-M       (local input)                 const int
-        On entry,  M specifies the local number of rows of sub(A).
-
-
-N       (local input)                 const int
-        On entry,  N specifies the local number of columns of sub(A).
-
-
-ICOFF   (global input)                const int
-        On entry, ICOFF specifies the row and column offset of sub(A)
-        in A.
-
-
-WORK    (local workspace)             double *
-        On entry, WORK  is a workarray of size at least 2*(4+2*N0).
-
- -

See Also

-HPL_dlocmax, -HPL_dlocswpN, -HPL_dlocswpT, -HPL_pdmxswp, -HPL_pdpancrT, -HPL_pdpanllN, -HPL_pdpanllT, -HPL_pdpanrlN, -HPL_pdpanrlT. - - - diff --git a/hpl/www/HPL_pdpancrT.html b/hpl/www/HPL_pdpancrT.html deleted file mode 100755 index 45c76658d04e354a82519ae27320380878be5f9d..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_pdpancrT.html +++ /dev/null @@ -1,99 +0,0 @@ - - -HPL_pdpancrT HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_pdpancrT Crout panel factorization. - -

Synopsis

-#include "hpl.h"

-void -HPL_pdpancrT( -HPL_T_panel * -PANEL, -const int -M, -const int -N, -const int -ICOFF, -double * -WORK -); - -

Description

-HPL_pdpancrT factorizes a panel of columns that is a sub-array of a
-larger one-dimensional panel A using the Crout variant of the usual
-one-dimensional algorithm. The lower triangular N0-by-N0 upper block
-of the panel is stored in transpose form.
-
-Bi-directional exchange is used to perform the swap::broadcast
-operations at once for one column in the panel. This results in a
-lower number of slightly larger messages than usual. On P processes
-and assuming bi-directional links, the running time of this function
-can be approximated by (when N is equal to N0):
-
-   N0 * log_2( P ) * ( lat + ( 2*N0 + 4 ) / bdwth ) +
-   N0^2 * ( M - N0/3 ) * gam2-3
-
-where M is the local number of rows of the panel, lat and bdwth are
-the latency and bandwidth of the network for double precision real
-words, and gam2-3 is an estimate of the Level 2 and Level 3 BLAS
-rate of execution. The recursive algorithm indeed allows the panel
-factorization to almost achieve Level 3 BLAS performance. On a
-large number of modern machines, however, this operation is latency
-bound, meaning that its cost can be estimated by only the latency
-portion N0 * log_2(P) * lat. Mono-directional links will double this
-communication cost.
-
-Note that one iteration of the main loop is unrolled. The local
-computation of the absolute value max of the next column is performed
-just after its update by the current column. This brings the
-current column through cache only once at each step. The current
-implementation does not perform any blocking for this sequence of
-BLAS operations; however, the design allows for plugging in an optimal
-(machine-specific) specialized BLAS-like kernel. This idea was
-suggested to us by Fred Gustavson, IBM T.J. Watson Research Center.
-
-

Arguments

-
-PANEL   (local input/output)          HPL_T_panel *
-        On entry,  PANEL  points to the data structure containing the
-        panel information.
-
-
-M       (local input)                 const int
-        On entry,  M specifies the local number of rows of sub(A).
-
-
-N       (local input)                 const int
-        On entry,  N specifies the local number of columns of sub(A).
-
-
-ICOFF   (global input)                const int
-        On entry, ICOFF specifies the row and column offset of sub(A)
-        in A.
-
-
-WORK    (local workspace)             double *
-        On entry, WORK  is a workarray of size at least 2*(4+2*N0).
-
- -

See Also

-HPL_dlocmax, -HPL_dlocswpN, -HPL_dlocswpT, -HPL_pdmxswp, -HPL_pdpancrN, -HPL_pdpanllN, -HPL_pdpanllT, -HPL_pdpanrlN, -HPL_pdpanrlT. - - - diff --git a/hpl/www/HPL_pdpanel_disp.html b/hpl/www/HPL_pdpanel_disp.html deleted file mode 100755 index 7915bc53ef5acc07b100d795ddbfdfd89b36374e..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_pdpanel_disp.html +++ /dev/null @@ -1,38 +0,0 @@ - - -HPL_pdpanel_disp HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_pdpanel_disp Deallocate a panel data structure. - -

Synopsis

-#include "hpl.h"

-int -HPL_pdpanel_disp( -HPL_T_panel * * -PANEL -); - -

Description

-HPL_pdpanel_disp -deallocates the panel structure and resources and -stores the error code returned by the panel factorization. - -

Arguments

-
-PANEL   (local input/output)          HPL_T_panel * *
-        On entry,  PANEL  points  to  the  address  of the panel data
-        structure to be deallocated.
-
- -

See Also

-HPL_pdpanel_new, -HPL_pdpanel_init, -HPL_pdpanel_free. - - - diff --git a/hpl/www/HPL_pdpanel_free.html b/hpl/www/HPL_pdpanel_free.html deleted file mode 100755 index b7657c067e4a67ff06ce342d1d18e8606db4ebb8..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_pdpanel_free.html +++ /dev/null @@ -1,38 +0,0 @@ - - -HPL_pdpanel_free HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_pdpanel_free Deallocate the panel resources. - -

Synopsis

-#include "hpl.h"

-int -HPL_pdpanel_free( -HPL_T_panel * -PANEL -); - -

Description

-HPL_pdpanel_free -deallocates the panel resources and stores the error -code returned by the panel factorization. - -

Arguments

-
-PANEL   (local input/output)          HPL_T_panel *
-        On entry,  PANEL  points  to  the  panel data  structure from
-        which the resources should be deallocated.
-
- -

See Also

-HPL_pdpanel_new, -HPL_pdpanel_init, -HPL_pdpanel_disp. - - - diff --git a/hpl/www/HPL_pdpanel_init.html b/hpl/www/HPL_pdpanel_init.html deleted file mode 100755 index 68c67cd0f2c7a891eb90d01c98e3ad353efe8e3c..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_pdpanel_init.html +++ /dev/null @@ -1,99 +0,0 @@ - - -HPL_pdpanel_init HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_pdpanel_init Initialize the panel resources. - -

Synopsis

-#include "hpl.h"

-void -HPL_pdpanel_init( -HPL_T_grid * -GRID, -HPL_T_palg * -ALGO, -const int -M, -const int -N, -const int -JB, -HPL_T_pmat * -A, -const int -IA, -const int -JA, -const int -TAG, -HPL_T_panel * -PANEL -); - -

Description

-HPL_pdpanel_init -initializes a panel data structure. - -

Arguments

-
-GRID    (local input)                 HPL_T_grid *
-        On entry,  GRID  points  to the data structure containing the
-        process grid information.
-
-
-ALGO    (global input)                HPL_T_palg *
-        On entry,  ALGO  points to  the data structure containing the
-        algorithmic parameters.
-
-
-M       (local input)                 const int
-        On entry, M specifies the global number of rows of the panel.
-        M must be at least zero.
-
-
-N       (local input)                 const int
-        On entry,  N  specifies  the  global number of columns of the
-        panel and trailing submatrix. N must be at least zero.
-
-
-JB      (global input)                const int
-        On entry,  JB  specifies the number of columns of the panel.
-        JB must be at least zero.
-
-
-A       (local input/output)          HPL_T_pmat *
-        On entry, A points to the data structure containing the local
-        array information.
-
-
-IA      (global input)                const int
-        On entry,  IA  is  the global row index identifying the panel
-        and trailing submatrix. IA must be at least zero.
-
-
-JA      (global input)                const int
-        On entry, JA is the global column index identifying the panel
-        and trailing submatrix. JA must be at least zero.
-
-
-TAG     (global input)                const int
-        On entry, TAG is the row broadcast message id.
-
-
-PANEL   (local input/output)          HPL_T_panel *
-        On entry,  PANEL  points to the data structure containing the
-        panel information.
-
- -

See Also

-HPL_pdpanel_new, -HPL_pdpanel_disp, -HPL_pdpanel_free. - - - diff --git a/hpl/www/HPL_pdpanel_new.html b/hpl/www/HPL_pdpanel_new.html deleted file mode 100755 index f3420c1342ec01041604563265a3763f9efd996f..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_pdpanel_new.html +++ /dev/null @@ -1,99 +0,0 @@ - - -HPL_pdpanel_new HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_pdpanel_new Create a panel data structure. - -

Synopsis

-#include "hpl.h"

-void -HPL_pdpanel_new( -HPL_T_grid * -GRID, -HPL_T_palg * -ALGO, -const int -M, -const int -N, -const int -JB, -HPL_T_pmat * -A, -const int -IA, -const int -JA, -const int -TAG, -HPL_T_panel * * -PANEL -); - -

Description

-HPL_pdpanel_new -creates and initializes a panel data structure. - -

Arguments

-
-GRID    (local input)                 HPL_T_grid *
-        On entry,  GRID  points  to the data structure containing the
-        process grid information.
-
-
-ALGO    (global input)                HPL_T_palg *
-        On entry,  ALGO  points to  the data structure containing the
-        algorithmic parameters.
-
-
-M       (local input)                 const int
-        On entry, M specifies the global number of rows of the panel.
-        M must be at least zero.
-
-
-N       (local input)                 const int
-        On entry,  N  specifies  the  global number of columns of the
-        panel and trailing submatrix. N must be at least zero.
-
-
-JB      (global input)                const int
-        On entry,  JB  specifies the number of columns of the panel.
-        JB must be at least zero.
-
-
-A       (local input/output)          HPL_T_pmat *
-        On entry, A points to the data structure containing the local
-        array information.
-
-
-IA      (global input)                const int
-        On entry,  IA  is  the global row index identifying the panel
-        and trailing submatrix. IA must be at least zero.
-
-
-JA      (global input)                const int
-        On entry, JA is the global column index identifying the panel
-        and trailing submatrix. JA must be at least zero.
-
-
-TAG     (global input)                const int
-        On entry, TAG is the row broadcast message id.
-
-
-PANEL   (local input/output)          HPL_T_panel * *
-        On entry,  PANEL  points  to  the  address  of the panel data
-        structure to create and initialize.
-
- -

See Also

-HPL_pdpanel_init, -HPL_pdpanel_disp, -HPL_pdpanel_free. - - - diff --git a/hpl/www/HPL_pdpanllN.html b/hpl/www/HPL_pdpanllN.html deleted file mode 100755 index ce88864b443c040acf71a8d205e7e22539507d25..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_pdpanllN.html +++ /dev/null @@ -1,100 +0,0 @@ - - -HPL_pdpanllN HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_pdpanllN Left-looking panel factorization. - -

Synopsis

-#include "hpl.h"

-void -HPL_pdpanllN( -HPL_T_panel * -PANEL, -const int -M, -const int -N, -const int -ICOFF, -double * -WORK -); - -

Description

-HPL_pdpanllN factorizes a panel of columns that is a sub-array of a
-larger one-dimensional panel A using the Left-looking variant of the
-usual one-dimensional algorithm. The lower triangular N0-by-N0 upper
-block of the panel is stored in no-transpose form (i.e. just like the
-input matrix itself).
-
-Bi-directional exchange is used to perform the swap::broadcast
-operations at once for one column in the panel. This results in a
-lower number of slightly larger messages than usual. On P processes
-and assuming bi-directional links, the running time of this function
-can be approximated by (when N is equal to N0):
-
-   N0 * log_2( P ) * ( lat + ( 2*N0 + 4 ) / bdwth ) +
-   N0^2 * ( M - N0/3 ) * gam2-3
-
-where M is the local number of rows of the panel, lat and bdwth are
-the latency and bandwidth of the network for double precision real
-words, and gam2-3 is an estimate of the Level 2 and Level 3 BLAS
-rate of execution. The recursive algorithm indeed allows the panel
-factorization to almost achieve Level 3 BLAS performance. On a
-large number of modern machines, however, this operation is latency
-bound, meaning that its cost can be estimated by only the latency
-portion N0 * log_2(P) * lat. Mono-directional links will double this
-communication cost.
-
-Note that one iteration of the main loop is unrolled. The local
-computation of the absolute value max of the next column is performed
-just after its update by the current column. This brings the
-current column through cache only once at each step. The current
-implementation does not perform any blocking for this sequence of
-BLAS operations; however, the design allows for plugging in an optimal
-(machine-specific) specialized BLAS-like kernel. This idea was
-suggested to us by Fred Gustavson, IBM T.J. Watson Research Center.
-
-

Arguments

-
-PANEL   (local input/output)          HPL_T_panel *
-        On entry,  PANEL  points to the data structure containing the
-        panel information.
-
-
-M       (local input)                 const int
-        On entry,  M specifies the local number of rows of sub(A).
-
-
-N       (local input)                 const int
-        On entry,  N specifies the local number of columns of sub(A).
-
-
-ICOFF   (global input)                const int
-        On entry, ICOFF specifies the row and column offset of sub(A)
-        in A.
-
-
-WORK    (local workspace)             double *
-        On entry, WORK  is a workarray of size at least 2*(4+2*N0).
-
- -

See Also

-HPL_dlocmax, -HPL_dlocswpN, -HPL_dlocswpT, -HPL_pdmxswp, -HPL_pdpancrN, -HPL_pdpancrT, -HPL_pdpanllT, -HPL_pdpanrlN, -HPL_pdpanrlT. - - - diff --git a/hpl/www/HPL_pdpanllT.html b/hpl/www/HPL_pdpanllT.html deleted file mode 100755 index b9e21142d692b7948032c4a7cb7b98396cdcbf23..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_pdpanllT.html +++ /dev/null @@ -1,99 +0,0 @@ - - -HPL_pdpanllT HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_pdpanllT Left-looking panel factorization. - -

Synopsis

-#include "hpl.h"

-void -HPL_pdpanllT( -HPL_T_panel * -PANEL, -const int -M, -const int -N, -const int -ICOFF, -double * -WORK -); - -

Description

-HPL_pdpanllT factorizes a panel of columns that is a sub-array of a
-larger one-dimensional panel A using the Left-looking variant of the
-usual one-dimensional algorithm. The lower triangular N0-by-N0 upper
-block of the panel is stored in transpose form.
-
-Bi-directional exchange is used to perform the swap::broadcast
-operations at once for one column in the panel. This results in a
-lower number of slightly larger messages than usual. On P processes
-and assuming bi-directional links, the running time of this function
-can be approximated by (when N is equal to N0):
-
-   N0 * log_2( P ) * ( lat + ( 2*N0 + 4 ) / bdwth ) +
-   N0^2 * ( M - N0/3 ) * gam2-3
-
-where M is the local number of rows of the panel, lat and bdwth are
-the latency and bandwidth of the network for double precision real
-words, and gam2-3 is an estimate of the Level 2 and Level 3 BLAS
-rate of execution. The recursive algorithm indeed allows the panel
-factorization to almost achieve Level 3 BLAS performance. On a
-large number of modern machines, however, this operation is latency
-bound, meaning that its cost can be estimated by only the latency
-portion N0 * log_2(P) * lat. Mono-directional links will double this
-communication cost.
-
-Note that one iteration of the main loop is unrolled. The local
-computation of the absolute value max of the next column is performed
-just after its update by the current column. This brings the
-current column through cache only once at each step. The current
-implementation does not perform any blocking for this sequence of
-BLAS operations; however, the design allows for plugging in an optimal
-(machine-specific) specialized BLAS-like kernel. This idea was
-suggested to us by Fred Gustavson, IBM T.J. Watson Research Center.
-
-

Arguments

-
-PANEL   (local input/output)          HPL_T_panel *
-        On entry,  PANEL  points to the data structure containing the
-        panel information.
-
-
-M       (local input)                 const int
-        On entry,  M specifies the local number of rows of sub(A).
-
-
-N       (local input)                 const int
-        On entry,  N specifies the local number of columns of sub(A).
-
-
-ICOFF   (global input)                const int
-        On entry, ICOFF specifies the row and column offset of sub(A)
-        in A.
-
-
-WORK    (local workspace)             double *
-        On entry, WORK  is a workarray of size at least 2*(4+2*N0).
-
- -

See Also

-HPL_dlocmax, -HPL_dlocswpN, -HPL_dlocswpT, -HPL_pdmxswp, -HPL_pdpancrN, -HPL_pdpancrT, -HPL_pdpanllN, -HPL_pdpanrlN, -HPL_pdpanrlT. - - - diff --git a/hpl/www/HPL_pdpanrlN.html b/hpl/www/HPL_pdpanrlN.html deleted file mode 100755 index e50fef4e2330343cc065aae65badbe1fa8ba5f47..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_pdpanrlN.html +++ /dev/null @@ -1,100 +0,0 @@ - - -HPL_pdpanrlN HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_pdpanrlN Right-looking panel factorization. - -

Synopsis

-#include "hpl.h"

-void -HPL_pdpanrlN( -HPL_T_panel * -PANEL, -const int -M, -const int -N, -const int -ICOFF, -double * -WORK -); - -

Description

-HPL_pdpanrlN factorizes a panel of columns that is a sub-array of a
-larger one-dimensional panel A using the Right-looking variant of the
-usual one-dimensional algorithm. The lower triangular N0-by-N0 upper
-block of the panel is stored in no-transpose form (i.e. just like the
-input matrix itself).
-
-Bi-directional exchange is used to perform the swap::broadcast
-operations at once for one column in the panel. This results in a
-lower number of slightly larger messages than usual. On P processes
-and assuming bi-directional links, the running time of this function
-can be approximated by (when N is equal to N0):
-
-   N0 * log_2( P ) * ( lat + ( 2*N0 + 4 ) / bdwth ) +
-   N0^2 * ( M - N0/3 ) * gam2-3
-
-where M is the local number of rows of the panel, lat and bdwth are
-the latency and bandwidth of the network for double precision real
-words, and gam2-3 is an estimate of the Level 2 and Level 3 BLAS
-rate of execution. The recursive algorithm indeed allows the panel
-factorization to almost achieve Level 3 BLAS performance. On a
-large number of modern machines, however, this operation is latency
-bound, meaning that its cost can be estimated by only the latency
-portion N0 * log_2(P) * lat. Mono-directional links will double this
-communication cost.
-
-Note that one iteration of the main loop is unrolled. The local
-computation of the absolute value max of the next column is performed
-just after its update by the current column. This brings the
-current column through cache only once at each step. The current
-implementation does not perform any blocking for this sequence of
-BLAS operations; however, the design allows for plugging in an optimal
-(machine-specific) specialized BLAS-like kernel. This idea was
-suggested to us by Fred Gustavson, IBM T.J. Watson Research Center.
-
-

Arguments

-
-PANEL   (local input/output)          HPL_T_panel *
-        On entry,  PANEL  points to the data structure containing the
-        panel information.
-
-
-M       (local input)                 const int
-        On entry,  M specifies the local number of rows of sub(A).
-
-
-N       (local input)                 const int
-        On entry,  N specifies the local number of columns of sub(A).
-
-
-ICOFF   (global input)                const int
-        On entry, ICOFF specifies the row and column offset of sub(A)
-        in A.
-
-
-WORK    (local workspace)             double *
-        On entry, WORK  is a workarray of size at least 2*(4+2*N0).
-
- -

See Also

-HPL_dlocmax, -HPL_dlocswpN, -HPL_dlocswpT, -HPL_pdmxswp, -HPL_pdpancrN, -HPL_pdpancrT, -HPL_pdpanllN, -HPL_pdpanllT, -HPL_pdpanrlT. - - - diff --git a/hpl/www/HPL_pdpanrlT.html b/hpl/www/HPL_pdpanrlT.html deleted file mode 100755 index ba993c2e3b6140eb50c147e5c4abd0eab3fb147b..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_pdpanrlT.html +++ /dev/null @@ -1,99 +0,0 @@ - - -HPL_pdpanrlT HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_pdpanrlT Right-looking panel factorization. - -

Synopsis

-#include "hpl.h"

-void -HPL_pdpanrlT( -HPL_T_panel * -PANEL, -const int -M, -const int -N, -const int -ICOFF, -double * -WORK -); - -

Description

-HPL_pdpanrlT factorizes a panel of columns that is a sub-array of a
-larger one-dimensional panel A using the Right-looking variant of the
-usual one-dimensional algorithm. The lower triangular N0-by-N0 upper
-block of the panel is stored in transpose form.
-
-Bi-directional exchange is used to perform the swap::broadcast
-operations at once for one column in the panel. This results in a
-lower number of slightly larger messages than usual. On P processes
-and assuming bi-directional links, the running time of this function
-can be approximated by (when N is equal to N0):
-
-   N0 * log_2( P ) * ( lat + ( 2*N0 + 4 ) / bdwth ) +
-   N0^2 * ( M - N0/3 ) * gam2-3
-
-where M is the local number of rows of the panel, lat and bdwth are
-the latency and bandwidth of the network for double precision real
-words, and gam2-3 is an estimate of the Level 2 and Level 3 BLAS
-rate of execution. The recursive algorithm indeed allows the panel
-factorization to almost achieve Level 3 BLAS performance. On a
-large number of modern machines, however, this operation is latency
-bound, meaning that its cost can be estimated by only the latency
-portion N0 * log_2(P) * lat. Mono-directional links will double this
-communication cost.
-
-Note that one iteration of the main loop is unrolled. The local
-computation of the absolute value max of the next column is performed
-just after its update by the current column. This brings the
-current column through cache only once at each step. The current
-implementation does not perform any blocking for this sequence of
-BLAS operations; however, the design allows for plugging in an optimal
-(machine-specific) specialized BLAS-like kernel. This idea was
-suggested to us by Fred Gustavson, IBM T.J. Watson Research Center.
-
-

Arguments

-
-PANEL   (local input/output)          HPL_T_panel *
-        On entry,  PANEL  points to the data structure containing the
-        panel information.
-
-
-M       (local input)                 const int
-        On entry,  M specifies the local number of rows of sub(A).
-
-
-N       (local input)                 const int
-        On entry,  N specifies the local number of columns of sub(A).
-
-
-ICOFF   (global input)                const int
-        On entry, ICOFF specifies the row and column offset of sub(A)
-        in A.
-
-
-WORK    (local workspace)             double *
-        On entry, WORK  is a workarray of size at least 2*(4+2*N0).
-
- -

See Also

-HPL_dlocmax, -HPL_dlocswpN, -HPL_dlocswpT, -HPL_pdmxswp, -HPL_pdpancrN, -HPL_pdpancrT, -HPL_pdpanllN, -HPL_pdpanllT, -HPL_pdpanrlN. - - - diff --git a/hpl/www/HPL_pdrpancrN.html b/hpl/www/HPL_pdrpancrN.html deleted file mode 100755 index c81fdba0499348d5f3de797cbdad0793c867fbd6..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_pdrpancrN.html +++ /dev/null @@ -1,97 +0,0 @@ - - -HPL_pdrpancrN HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_pdrpancrN Crout recursive panel factorization. - -

Synopsis

-#include "hpl.h"

-void -HPL_pdrpancrN( -HPL_T_panel * -PANEL, -const int -M, -const int -N, -const int -ICOFF, -double * -WORK -); - -

Description

-HPL_pdrpancrN
-recursively factorizes a panel of columns using the
-recursive Crout variant of the usual one-dimensional algorithm. The
-lower triangular N0-by-N0 upper block of the panel is stored in
-no-transpose form (i.e. just like the input matrix itself).
-
-Bi-directional exchange is used to perform the swap::broadcast
-operations at once for one column in the panel. This results in a
-lower number of slightly larger messages than usual. On P processes
-and assuming bi-directional links, the running time of this function
-can be approximated by (when N is equal to N0):
-
-   N0 * log_2( P ) * ( lat + ( 2*N0 + 4 ) / bdwth ) +
-   N0^2 * ( M - N0/3 ) * gam2-3
-
-where M is the local number of rows of the panel, lat and bdwth are
-the latency and bandwidth of the network for double precision real
-words, and gam2-3 is an estimate of the Level 2 and Level 3 BLAS
-rate of execution. The recursive algorithm indeed makes it possible
-to achieve almost Level 3 BLAS performance in the panel factorization.
-On a large number of modern machines, this operation is however
-latency bound, meaning that its cost can be estimated by the latency
-portion alone, N0 * log_2(P) * lat. Mono-directional links will
-double this communication cost.
-
-
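The running-time estimate in this description can be evaluated numerically. The following is a minimal sketch, not part of HPL: the function names are hypothetical, gam23 is assumed to be expressed as time per floating point operation (since the formula multiplies by it), bdwth is assumed to be in double precision words per unit time, and log_2( P ) is rounded up for process counts that are not powers of two.

```c
#include <assert.h>

/* Hypothetical helper: smallest r such that 2^r >= p.
 * Rounding up for non-powers of two is an assumption of this sketch. */
static int ceil_log2(int p)
{
    int r = 0;
    while ((1 << r) < p) r++;
    return r;
}

/* Estimate from the description (case N == N0):
 *   N0 * log_2(P) * ( lat + (2*N0 + 4) / bdwth ) + N0^2 * ( M - N0/3 ) * gam2-3 */
double panel_time_estimate(int n0, int m, int p,
                           double lat, double bdwth, double gam23)
{
    assert(n0 > 0 && m >= n0 && p >= 1 && bdwth > 0.0);
    double comm = n0 * (double)ceil_log2(p) * (lat + (2.0 * n0 + 4.0) / bdwth);
    double comp = (double)n0 * n0 * (m - n0 / 3.0) * gam23;
    return comm + comp;
}

/* Latency-bound approximation: N0 * log_2(P) * lat. */
double panel_latency_bound(int n0, int p, double lat)
{
    assert(n0 > 0 && p >= 1);
    return n0 * (double)ceil_log2(p) * lat;
}
```

Comparing the two return values for a given machine gives a rough indication of how close the panel factorization is to being purely latency bound.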

Arguments

-
-PANEL   (local input/output)          HPL_T_panel *
-        On entry,  PANEL  points to the data structure containing the
-        panel information.
-
-
-M       (local input)                 const int
-        On entry,  M specifies the local number of rows of sub(A).
-
-
-N       (local input)                 const int
-        On entry,  N specifies the local number of columns of sub(A).
-
-
-ICOFF   (global input)                const int
-        On entry, ICOFF specifies the row and column offset of sub(A)
-        in A.
-
-
-WORK    (local workspace)             double *
-        On entry, WORK  is a workarray of size at least 2*(4+2*N0).
-
- -

See Also

-HPL_dlocmax, -HPL_dlocswpN, -HPL_dlocswpT, -HPL_pdmxswp, -HPL_pdpancrN, -HPL_pdpancrT, -HPL_pdpanllN, -HPL_pdpanllT, -HPL_pdpanrlN, -HPL_pdpanrlT, -HPL_pdrpancrT, -HPL_pdrpanllN, -HPL_pdrpanllT, -HPL_pdrpanrlN, -HPL_pdrpanrlT, -HPL_pdfact. - - - diff --git a/hpl/www/HPL_pdrpancrT.html b/hpl/www/HPL_pdrpancrT.html deleted file mode 100755 index 2ac24eeb677baa11d073b5b659f9a9d7d5fac730..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_pdrpancrT.html +++ /dev/null @@ -1,97 +0,0 @@ - - -HPL_pdrpancrT HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_pdrpancrT Crout recursive panel factorization. - -

Synopsis

-#include "hpl.h"

-void -HPL_pdrpancrT( -HPL_T_panel * -PANEL, -const int -M, -const int -N, -const int -ICOFF, -double * -WORK -); - -

Description

-HPL_pdrpancrT
-recursively factorizes a panel of columns using the
-recursive Crout variant of the usual one-dimensional algorithm.
-The lower triangular N0-by-N0 upper block of the panel is stored in
-transpose form.
-
-Bi-directional exchange is used to perform the swap::broadcast
-operations at once for one column in the panel. This results in a
-lower number of slightly larger messages than usual. On P processes
-and assuming bi-directional links, the running time of this function
-can be approximated by (when N is equal to N0):
-
-   N0 * log_2( P ) * ( lat + ( 2*N0 + 4 ) / bdwth ) +
-   N0^2 * ( M - N0/3 ) * gam2-3
-
-where M is the local number of rows of the panel, lat and bdwth are
-the latency and bandwidth of the network for double precision real
-words, and gam2-3 is an estimate of the Level 2 and Level 3 BLAS
-rate of execution. The recursive algorithm indeed makes it possible
-to achieve almost Level 3 BLAS performance in the panel factorization.
-On a large number of modern machines, this operation is however
-latency bound, meaning that its cost can be estimated by the latency
-portion alone, N0 * log_2(P) * lat. Mono-directional links will
-double this communication cost.
-
-

Arguments

-
-PANEL   (local input/output)          HPL_T_panel *
-        On entry,  PANEL  points to the data structure containing the
-        panel information.
-
-
-M       (local input)                 const int
-        On entry,  M specifies the local number of rows of sub(A).
-
-
-N       (local input)                 const int
-        On entry,  N specifies the local number of columns of sub(A).
-
-
-ICOFF   (global input)                const int
-        On entry, ICOFF specifies the row and column offset of sub(A)
-        in A.
-
-
-WORK    (local workspace)             double *
-        On entry, WORK  is a workarray of size at least 2*(4+2*N0).
-
- -

See Also

-HPL_dlocmax, -HPL_dlocswpN, -HPL_dlocswpT, -HPL_pdmxswp, -HPL_pdpancrN, -HPL_pdpancrT, -HPL_pdpanllN, -HPL_pdpanllT, -HPL_pdpanrlN, -HPL_pdpanrlT, -HPL_pdrpancrN, -HPL_pdrpanllN, -HPL_pdrpanllT, -HPL_pdrpanrlN, -HPL_pdrpanrlT, -HPL_pdfact. - - - diff --git a/hpl/www/HPL_pdrpanllN.html b/hpl/www/HPL_pdrpanllN.html deleted file mode 100755 index ce912ae32761b9622f4f2294507dbaad6818c140..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_pdrpanllN.html +++ /dev/null @@ -1,97 +0,0 @@ - - -HPL_pdrpanllN HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_pdrpanllN Left-looking recursive panel factorization. - -

Synopsis

-#include "hpl.h"

-void -HPL_pdrpanllN( -HPL_T_panel * -PANEL, -const int -M, -const int -N, -const int -ICOFF, -double * -WORK -); - -

Description

-HPL_pdrpanllN
-recursively factorizes a panel of columns using the
-recursive Left-looking variant of the one-dimensional algorithm. The
-lower triangular N0-by-N0 upper block of the panel is stored in
-no-transpose form (i.e. just like the input matrix itself).
-
-Bi-directional exchange is used to perform the swap::broadcast
-operations at once for one column in the panel. This results in a
-lower number of slightly larger messages than usual. On P processes
-and assuming bi-directional links, the running time of this function
-can be approximated by (when N is equal to N0):
-
-   N0 * log_2( P ) * ( lat + ( 2*N0 + 4 ) / bdwth ) +
-   N0^2 * ( M - N0/3 ) * gam2-3
-
-where M is the local number of rows of the panel, lat and bdwth are
-the latency and bandwidth of the network for double precision real
-words, and gam2-3 is an estimate of the Level 2 and Level 3 BLAS
-rate of execution. The recursive algorithm indeed makes it possible
-to achieve almost Level 3 BLAS performance in the panel factorization.
-On a large number of modern machines, this operation is however
-latency bound, meaning that its cost can be estimated by the latency
-portion alone, N0 * log_2(P) * lat. Mono-directional links will
-double this communication cost.
-
-

Arguments

-
-PANEL   (local input/output)          HPL_T_panel *
-        On entry,  PANEL  points to the data structure containing the
-        panel information.
-
-
-M       (local input)                 const int
-        On entry,  M specifies the local number of rows of sub(A).
-
-
-N       (local input)                 const int
-        On entry,  N specifies the local number of columns of sub(A).
-
-
-ICOFF   (global input)                const int
-        On entry, ICOFF specifies the row and column offset of sub(A)
-        in A.
-
-
-WORK    (local workspace)             double *
-        On entry, WORK  is a workarray of size at least 2*(4+2*N0).
-
- -

See Also

-HPL_dlocmax, -HPL_dlocswpN, -HPL_dlocswpT, -HPL_pdmxswp, -HPL_pdpancrN, -HPL_pdpancrT, -HPL_pdpanllN, -HPL_pdpanllT, -HPL_pdpanrlN, -HPL_pdpanrlT, -HPL_pdrpancrN, -HPL_pdrpancrT, -HPL_pdrpanllT, -HPL_pdrpanrlN, -HPL_pdrpanrlT, -HPL_pdfact. - - - diff --git a/hpl/www/HPL_pdrpanllT.html b/hpl/www/HPL_pdrpanllT.html deleted file mode 100755 index fb5f66da1d0886dc2a8823beeab76b45c485e6ae..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_pdrpanllT.html +++ /dev/null @@ -1,97 +0,0 @@ - - -HPL_pdrpanllT HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_pdrpanllT Left-looking recursive panel factorization. - -

Synopsis

-#include "hpl.h"

-void -HPL_pdrpanllT( -HPL_T_panel * -PANEL, -const int -M, -const int -N, -const int -ICOFF, -double * -WORK -); - -

Description

-HPL_pdrpanllT
-recursively factorizes a panel of columns using the
-recursive Left-looking variant of the one-dimensional algorithm. The
-lower triangular N0-by-N0 upper block of the panel is stored in
-transpose form.
-
-Bi-directional exchange is used to perform the swap::broadcast
-operations at once for one column in the panel. This results in a
-lower number of slightly larger messages than usual. On P processes
-and assuming bi-directional links, the running time of this function
-can be approximated by (when N is equal to N0):
-
-   N0 * log_2( P ) * ( lat + ( 2*N0 + 4 ) / bdwth ) +
-   N0^2 * ( M - N0/3 ) * gam2-3
-
-where M is the local number of rows of the panel, lat and bdwth are
-the latency and bandwidth of the network for double precision real
-words, and gam2-3 is an estimate of the Level 2 and Level 3 BLAS
-rate of execution. The recursive algorithm indeed makes it possible
-to achieve almost Level 3 BLAS performance in the panel factorization.
-On a large number of modern machines, this operation is however
-latency bound, meaning that its cost can be estimated by the latency
-portion alone, N0 * log_2(P) * lat. Mono-directional links will
-double this communication cost.
-
-

Arguments

-
-PANEL   (local input/output)          HPL_T_panel *
-        On entry,  PANEL  points to the data structure containing the
-        panel information.
-
-
-M       (local input)                 const int
-        On entry,  M specifies the local number of rows of sub(A).
-
-
-N       (local input)                 const int
-        On entry,  N specifies the local number of columns of sub(A).
-
-
-ICOFF   (global input)                const int
-        On entry, ICOFF specifies the row and column offset of sub(A)
-        in A.
-
-
-WORK    (local workspace)             double *
-        On entry, WORK  is a workarray of size at least 2*(4+2*N0).
-
- -

See Also

-HPL_dlocmax, -HPL_dlocswpN, -HPL_dlocswpT, -HPL_pdmxswp, -HPL_pdpancrN, -HPL_pdpancrT, -HPL_pdpanllN, -HPL_pdpanllT, -HPL_pdpanrlN, -HPL_pdpanrlT, -HPL_pdrpancrN, -HPL_pdrpancrT, -HPL_pdrpanllN, -HPL_pdrpanrlN, -HPL_pdrpanrlT, -HPL_pdfact. - - - diff --git a/hpl/www/HPL_pdrpanrlN.html b/hpl/www/HPL_pdrpanrlN.html deleted file mode 100755 index 249bed8490734f7b879cb44b7d69c023d2729b10..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_pdrpanrlN.html +++ /dev/null @@ -1,97 +0,0 @@ - - -HPL_pdrpanrlN HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_pdrpanrlN Right-looking recursive panel factorization. - -

Synopsis

-#include "hpl.h"

-void -HPL_pdrpanrlN( -HPL_T_panel * -PANEL, -const int -M, -const int -N, -const int -ICOFF, -double * -WORK -); - -

Description

-HPL_pdrpanrlN
-recursively factorizes a panel of columns using the
-recursive Right-looking variant of the one-dimensional algorithm. The
-lower triangular N0-by-N0 upper block of the panel is stored in
-no-transpose form (i.e. just like the input matrix itself).
-
-Bi-directional exchange is used to perform the swap::broadcast
-operations at once for one column in the panel. This results in a
-lower number of slightly larger messages than usual. On P processes
-and assuming bi-directional links, the running time of this function
-can be approximated by (when N is equal to N0):
-
-   N0 * log_2( P ) * ( lat + ( 2*N0 + 4 ) / bdwth ) +
-   N0^2 * ( M - N0/3 ) * gam2-3
-
-where M is the local number of rows of the panel, lat and bdwth are
-the latency and bandwidth of the network for double precision real
-words, and gam2-3 is an estimate of the Level 2 and Level 3 BLAS
-rate of execution. The recursive algorithm indeed makes it possible
-to achieve almost Level 3 BLAS performance in the panel factorization.
-On a large number of modern machines, this operation is however
-latency bound, meaning that its cost can be estimated by the latency
-portion alone, N0 * log_2(P) * lat. Mono-directional links will
-double this communication cost.
-
-

Arguments

-
-PANEL   (local input/output)          HPL_T_panel *
-        On entry,  PANEL  points to the data structure containing the
-        panel information.
-
-
-M       (local input)                 const int
-        On entry,  M specifies the local number of rows of sub(A).
-
-
-N       (local input)                 const int
-        On entry,  N specifies the local number of columns of sub(A).
-
-
-ICOFF   (global input)                const int
-        On entry, ICOFF specifies the row and column offset of sub(A)
-        in A.
-
-
-WORK    (local workspace)             double *
-        On entry, WORK  is a workarray of size at least 2*(4+2*N0).
-
- -

See Also

-HPL_dlocmax, -HPL_dlocswpN, -HPL_dlocswpT, -HPL_pdmxswp, -HPL_pdpancrN, -HPL_pdpancrT, -HPL_pdpanllN, -HPL_pdpanllT, -HPL_pdpanrlN, -HPL_pdpanrlT, -HPL_pdrpancrN, -HPL_pdrpancrT, -HPL_pdrpanllN, -HPL_pdrpanllT, -HPL_pdrpanrlT, -HPL_pdfact. - - - diff --git a/hpl/www/HPL_pdrpanrlT.html b/hpl/www/HPL_pdrpanrlT.html deleted file mode 100755 index f9e3b0c8fc04f69085c5f03969e9f0f1666628c2..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_pdrpanrlT.html +++ /dev/null @@ -1,97 +0,0 @@ - - -HPL_pdrpanrlT HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_pdrpanrlT Right-looking recursive panel factorization. - -

Synopsis

-#include "hpl.h"

-void -HPL_pdrpanrlT( -HPL_T_panel * -PANEL, -const int -M, -const int -N, -const int -ICOFF, -double * -WORK -); - -

Description

-HPL_pdrpanrlT
-recursively factorizes a panel of columns using the
-recursive Right-looking variant of the one-dimensional algorithm. The
-lower triangular N0-by-N0 upper block of the panel is stored in
-transpose form.
-
-Bi-directional exchange is used to perform the swap::broadcast
-operations at once for one column in the panel. This results in a
-lower number of slightly larger messages than usual. On P processes
-and assuming bi-directional links, the running time of this function
-can be approximated by (when N is equal to N0):
-
-   N0 * log_2( P ) * ( lat + ( 2*N0 + 4 ) / bdwth ) +
-   N0^2 * ( M - N0/3 ) * gam2-3
-
-where M is the local number of rows of the panel, lat and bdwth are
-the latency and bandwidth of the network for double precision real
-words, and gam2-3 is an estimate of the Level 2 and Level 3 BLAS
-rate of execution. The recursive algorithm indeed makes it possible
-to achieve almost Level 3 BLAS performance in the panel factorization.
-On a large number of modern machines, this operation is however
-latency bound, meaning that its cost can be estimated by the latency
-portion alone, N0 * log_2(P) * lat. Mono-directional links will
-double this communication cost.
-
-

Arguments

-
-PANEL   (local input/output)          HPL_T_panel *
-        On entry,  PANEL  points to the data structure containing the
-        panel information.
-
-
-M       (local input)                 const int
-        On entry,  M specifies the local number of rows of sub(A).
-
-
-N       (local input)                 const int
-        On entry,  N specifies the local number of columns of sub(A).
-
-
-ICOFF   (global input)                const int
-        On entry, ICOFF specifies the row and column offset of sub(A)
-        in A.
-
-
-WORK    (local workspace)             double *
-        On entry, WORK  is a workarray of size at least 2*(4+2*N0).
-
- -

See Also

-HPL_dlocmax, -HPL_dlocswpN, -HPL_dlocswpT, -HPL_pdmxswp, -HPL_pdpancrN, -HPL_pdpancrT, -HPL_pdpanllN, -HPL_pdpanllT, -HPL_pdpanrlN, -HPL_pdpanrlT, -HPL_pdrpancrN, -HPL_pdrpancrT, -HPL_pdrpanllN, -HPL_pdrpanllT, -HPL_pdrpanrlN, -HPL_pdfact. - - - diff --git a/hpl/www/HPL_pdtest.html b/hpl/www/HPL_pdtest.html deleted file mode 100755 index 1556940c2a8438b4b008f9c6eb905cc0cde26a21..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_pdtest.html +++ /dev/null @@ -1,81 +0,0 @@ - - -HPL_pdtest HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_pdtest Perform one test. - -

Synopsis

-#include "hpl.h"

-void -HPL_pdtest( -HPL_T_test * -TEST, -HPL_T_grid * -GRID, -HPL_T_palg * -ALGO, -const int -N, -const int -NB -); - -

Description

-HPL_pdtest -performs one test given a set of parameters such as the -process grid, the problem size, the distribution blocking factor ... -This function generates the data, calls and times the linear system -solver, checks the accuracy of the obtained vector solution and -writes this information to the file pointed to by TEST->outfp. - -

Arguments

-
-TEST    (global input)                HPL_T_test *
-        On entry,  TEST  points  to a testing data structure:  outfp
-        specifies the output file where the results will be printed.
-        It is only defined and used by  process  0  of the  grid.
-        thrsh  specifies  the  threshold  value  for the test ratio.
-        Concretely, a test is declared "PASSED"  if and only if  the
-        following inequality is satisfied:
-        ||Ax-b||_oo / ( epsil *
-                        ( || x ||_oo * || A ||_oo + || b ||_oo ) *
-                         N )  < thrsh.
-        epsil  is the  relative machine precision of the distributed
-        computer. Finally the test counters, kfail, kpass, kskip and
-        ktest are updated as follows:  if the test passes,  kpass is
-        incremented by one;  if the test fails, kfail is incremented
-        by one; if the test is skipped, kskip is incremented by one.
-        ktest is left unchanged.
-
-
-GRID    (local input)                 HPL_T_grid *
-        On entry,  GRID  points  to the data structure containing the
-        process grid information.
-
-
-ALGO    (global input)                HPL_T_palg *
-        On entry,  ALGO  points to  the data structure containing the
-        algorithmic parameters to be used for this test.
-
-
-N       (global input)                const int
-        On entry,  N specifies the order of the coefficient matrix A.
-        N must be at least zero.
-
-
-NB      (global input)                const int
-        On entry,  NB specifies the blocking factor used to partition
-        and distribute the matrix A. NB must be larger than one.
-
- -
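The PASSED criterion described for the TEST argument can be sketched as below. This is an illustration under stated assumptions rather than the HPL implementation: the function names are hypothetical, and the infinity norms and epsil are assumed to have been computed by the caller.

```c
#include <assert.h>

/* Scaled residual from the TEST description:
 *   ||Ax-b||_oo / ( epsil * ( ||x||_oo * ||A||_oo + ||b||_oo ) * N )
 * All norms are supplied by the caller (assumption of this sketch). */
double scaled_residual(double resid_inf, double x_inf, double a_inf,
                       double b_inf, int n, double epsil)
{
    assert(n > 0 && epsil > 0.0);
    return resid_inf / (epsil * (x_inf * a_inf + b_inf) * (double)n);
}

/* A test is declared PASSED if and only if the ratio is strictly
 * below the threshold thrsh. Returns 1 for PASSED, 0 otherwise. */
int test_passed(double ratio, double thrsh)
{
    return ratio < thrsh;
}
```

Note that the comparison is strict, so a ratio exactly equal to thrsh counts as a failure.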

See Also

-HPL_pddriver, -HPL_pdinfo. - - - diff --git a/hpl/www/HPL_pdtrsv.html b/hpl/www/HPL_pdtrsv.html deleted file mode 100755 index 597b26de0f2013bf454a5dc4b60b0c307f3cff4d..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_pdtrsv.html +++ /dev/null @@ -1,64 +0,0 @@ - - -HPL_pdtrsv HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_pdtrsv Solve triu( A ) x = b. - -

Synopsis

-#include "hpl.h"

-void -HPL_pdtrsv( -HPL_T_grid * -GRID, -HPL_T_pmat * -AMAT -); - -

Description

-HPL_pdtrsv
-solves an upper triangular system of linear equations.
-
-The rhs is the last column of the N by N+1 matrix A. The solve starts
-in the process column owning the Nth column of A, so the rhs b may
-need to be moved one process column to the left at the beginning. The
-routine therefore needs a column vector in every process column but
-the one owning b. The result is replicated in all process rows, and
-returned in XR, i.e. XR is of size nq = LOCq( N ) in all processes.
-
-The algorithm uses decreasing one-ring broadcast in process rows and
-columns implemented in terms of synchronous point-to-point
-communication primitives. A lookahead of depth 1 is used to minimize
-the critical path. This entire operation is essentially ``latency''
-bound and an estimate of its running time is given by:
-
-   (move rhs) lat + N / ( P bdwth ) +
-   (solve)    ((N / NB)-1) 2 (lat + NB / bdwth) +
-              gam2 N^2 / ( P Q ),
-
-where gam2 is an estimate of the Level 2 BLAS rate of execution.
-There are N / NB diagonal blocks. One must exchange 2 messages of
-length NB to compute the next NB entries of the vector solution, as
-well as perform a total of N^2 floating point operations.
-
-
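The three terms of the running-time estimate can be evaluated with a small helper. This is a sketch only: the function name is hypothetical, gam2 is assumed to be a time per floating point operation, bdwth is assumed to be in words per unit time, and N / NB is treated as a real-valued quotient.

```c
#include <assert.h>

/* Estimate from the description:
 *   (move rhs) lat + N / (P * bdwth)
 *   (solve)    ((N / NB) - 1) * 2 * (lat + NB / bdwth)
 *              + gam2 * N^2 / (P * Q)                     */
double trsv_time_estimate(int n, int nb, int p, int q,
                          double lat, double bdwth, double gam2)
{
    assert(n > 0 && nb > 0 && p > 0 && q > 0 && bdwth > 0.0);
    double move  = lat + (double)n / (p * bdwth);
    double solve = ((double)n / nb - 1.0) * 2.0 * (lat + nb / bdwth);
    double flops = gam2 * (double)n * n / ((double)p * q);
    return move + solve + flops;
}
```

With gam2 set to zero the result isolates the communication part, which is the portion the description calls latency bound.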

Arguments

-
-GRID    (local input)                 HPL_T_grid *
-        On entry,  GRID  points  to the data structure containing the
-        process grid information.
-
-
-AMAT    (local input/output)          HPL_T_pmat *
-        On entry,  AMAT  points  to the data structure containing the
-        local array information.
-
- -

See Also

-HPL_pdgesv. - - - diff --git a/hpl/www/HPL_pdupdateNN.html b/hpl/www/HPL_pdupdateNN.html deleted file mode 100755 index 1d6fb5d7af454455f6002730d14cb2b4b8def122..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_pdupdateNN.html +++ /dev/null @@ -1,65 +0,0 @@ - - -HPL_pdupdateNN HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_pdupdateNN Broadcast a panel and update the trailing submatrix. - -

Synopsis

-#include "hpl.h"

-void -HPL_pdupdateNN( -HPL_T_panel * -PBCST, -int * -IFLAG, -HPL_T_panel * -PANEL, -const int -NN -); - -

Description

-HPL_pdupdateNN
-broadcasts (forwards) the panel PBCST and simultaneously
-applies the row interchanges and updates part of the trailing
-submatrix (using the panel PANEL).
-
-

Arguments

-
-PBCST   (local input/output)          HPL_T_panel *
-        On entry,  PBCST  points to the data structure containing the
-        panel (to be broadcast) information.
-
-
-IFLAG   (local output)                int *
-        On exit,  IFLAG  indicates  whether or not  the broadcast has
-        been completed when PBCST is not NULL on entry.  If PBCST is
-        NULL on entry, IFLAG is left unchanged.
-
-
-PANEL   (local input/output)          HPL_T_panel *
-        On entry,  PANEL  points to the data structure containing the
-        panel (to be updated) information.
-
-
-NN      (local input)                 const int
-        On entry, NN specifies  the  local  number  of columns of the
-        trailing  submatrix  to be updated  starting  at the  current
-        position. NN must be at least zero.
-
- -

See Also

-HPL_pdgesv, -HPL_pdgesv0, -HPL_pdgesvK1, -HPL_pdgesvK2, -HPL_pdlaswp00N, -HPL_pdlaswp01N. - - - diff --git a/hpl/www/HPL_pdupdateNT.html b/hpl/www/HPL_pdupdateNT.html deleted file mode 100755 index 67164b354a698e442047f624de1276b60ab9063b..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_pdupdateNT.html +++ /dev/null @@ -1,65 +0,0 @@ - - -HPL_pdupdateNT HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_pdupdateNT Broadcast a panel and update the trailing submatrix. - -

Synopsis

-#include "hpl.h"

-void -HPL_pdupdateNT( -HPL_T_panel * -PBCST, -int * -IFLAG, -HPL_T_panel * -PANEL, -const int -NN -); - -

Description

-HPL_pdupdateNT
-broadcasts (forwards) the panel PBCST and simultaneously
-applies the row interchanges and updates part of the trailing
-submatrix (using the panel PANEL).
-
-

Arguments

-
-PBCST   (local input/output)          HPL_T_panel *
-        On entry,  PBCST  points to the data structure containing the
-        panel (to be broadcast) information.
-
-
-IFLAG   (local output)                int *
-        On exit,  IFLAG  indicates  whether or not  the broadcast has
-        been completed when PBCST is not NULL on entry.  If PBCST is
-        NULL on entry, IFLAG is left unchanged.
-
-
-PANEL   (local input/output)          HPL_T_panel *
-        On entry,  PANEL  points to the data structure containing the
-        panel (to be updated) information.
-
-
-NN      (local input)                 const int
-        On entry, NN specifies  the  local  number  of columns of the
-        trailing  submatrix  to be updated  starting  at the  current
-        position. NN must be at least zero.
-
- -

See Also

-HPL_pdgesv, -HPL_pdgesv0, -HPL_pdgesvK1, -HPL_pdgesvK2, -HPL_pdlaswp00T, -HPL_pdlaswp01T. - - - diff --git a/hpl/www/HPL_pdupdateTN.html b/hpl/www/HPL_pdupdateTN.html deleted file mode 100755 index 0b05f3777233976d523597f48bb3701cb82a9553..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_pdupdateTN.html +++ /dev/null @@ -1,65 +0,0 @@ - - -HPL_pdupdateTN HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_pdupdateTN Broadcast a panel and update the trailing submatrix. - -

Synopsis

-#include "hpl.h"

-void -HPL_pdupdateTN( -HPL_T_panel * -PBCST, -int * -IFLAG, -HPL_T_panel * -PANEL, -const int -NN -); - -

Description

-HPL_pdupdateTN
-broadcasts (forwards) the panel PBCST and simultaneously
-applies the row interchanges and updates part of the trailing
-submatrix (using the panel PANEL).
-
-

Arguments

-
-PBCST   (local input/output)          HPL_T_panel *
-        On entry,  PBCST  points to the data structure containing the
-        panel (to be broadcast) information.
-
-
-IFLAG   (local output)                int *
-        On exit,  IFLAG  indicates  whether or not  the broadcast has
-        been completed when PBCST is not NULL on entry.  If PBCST is
-        NULL on entry, IFLAG is left unchanged.
-
-
-PANEL   (local input/output)          HPL_T_panel *
-        On entry,  PANEL  points to the data structure containing the
-        panel (to be updated) information.
-
-
-NN      (local input)                 const int
-        On entry, NN specifies  the  local  number  of columns of the
-        trailing  submatrix  to be updated  starting  at the  current
-        position. NN must be at least zero.
-
- -

See Also

-HPL_pdgesv, -HPL_pdgesv0, -HPL_pdgesvK1, -HPL_pdgesvK2, -HPL_pdlaswp00N, -HPL_pdlaswp01N. - - - diff --git a/hpl/www/HPL_pdupdateTT.html b/hpl/www/HPL_pdupdateTT.html deleted file mode 100755 index 1aeeb15421c85b55fa7da4cf0faa73b3f3ed6d8d..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_pdupdateTT.html +++ /dev/null @@ -1,65 +0,0 @@ - - -HPL_pdupdateTT HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_pdupdateTT Broadcast a panel and update the trailing submatrix. - -

Synopsis

-#include "hpl.h"

-void -HPL_pdupdateTT( -HPL_T_panel * -PBCST, -int * -IFLAG, -HPL_T_panel * -PANEL, -const int -NN -); - -

Description

-HPL_pdupdateTT
-broadcasts (forwards) the panel PBCST and simultaneously
-applies the row interchanges and updates part of the trailing
-submatrix (using the panel PANEL).
-
-

Arguments

-
-PBCST   (local input/output)          HPL_T_panel *
-        On entry,  PBCST  points to the data structure containing the
-        panel (to be broadcast) information.
-
-
-IFLAG   (local output)                int *
-        On exit,  IFLAG  indicates  whether or not  the broadcast has
-        been completed when PBCST is not NULL on entry.  If PBCST is
-        NULL on entry, IFLAG is left unchanged.
-
-
-PANEL   (local input/output)          HPL_T_panel *
-        On entry,  PANEL  points to the data structure containing the
-        panel (to be updated) information.
-
-
-NN      (local input)                 const int
-        On entry, NN specifies  the  local  number  of columns of the
-        trailing  submatrix  to be updated  starting  at the  current
-        position. NN must be at least zero.
-
- -

See Also

-HPL_pdgesv, -HPL_pdgesv0, -HPL_pdgesvK1, -HPL_pdgesvK2, -HPL_pdlaswp00T, -HPL_pdlaswp01T. - - - diff --git a/hpl/www/HPL_perm.html b/hpl/www/HPL_perm.html deleted file mode 100755 index fa5a9dad168ad4ad6c7977a5bbaba22966a6d578..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_perm.html +++ /dev/null @@ -1,67 +0,0 @@ - - -HPL_perm HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_perm Combine 2 index arrays - Generate the permutation. - -

Synopsis

-#include "hpl.h"

-void -HPL_perm( -const int -N, -int * -LINDXA, -int * -LINDXAU, -int * -IWORK -); - -

Description

-HPL_perm
-combines two index arrays and generates the corresponding
-permutation. First, this function computes the inverse of LINDXA and
-combines it with LINDXAU. Second, in order to be able to perform
-the permutation in place, LINDXAU is overwritten by a sequence of
-interchanges producing the same result. What we ultimately want to
-achieve is: U[LINDXAU[i]] := U[LINDXA[i]] for i in [0..N). After the
-call to this function, this in-place permutation can be performed by:
-for i in [0..N) swap U[i] with U[LINDXAU[i]].
-
-
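The two phases described above (combine the index arrays, then turn the result into a sequence of swaps) can be sketched as follows. This is an illustrative reconstruction from the description, not the HPL source: the names combine_perm and apply_swaps are hypothetical, U is assumed to be an array of doubles, and the linear searches make the sketch O(N^2).

```c
#include <assert.h>

/* Sketch: overwrite lindxau with a swap sequence that realizes
 * U[lindxau[i]] := U[lindxa[i]] in place. lindxa and lindxau must each
 * be a permutation of 0..n-1; iwork is scratch space of length n. */
void combine_perm(int n, int *lindxa, int *lindxau, int *iwork)
{
    int i, j, k, tmp;
    assert(n >= 0);
    /* iwork := inverse of lindxa */
    for (i = 0; i < n; i++) iwork[lindxa[i]] = i;
    /* lindxa[s] := final position of the element starting at position s */
    for (i = 0; i < n; i++) lindxa[i] = lindxau[iwork[i]];
    /* iwork[p] := original index of the element currently at position p */
    for (i = 0; i < n; i++) iwork[i] = i;

    for (i = 0; i < n; i++) {
        /* j: original element that must end up at position i */
        for (j = 0; j < n && lindxa[j] != i; j++) ;
        /* k: current position of that element (positions < i are final) */
        for (k = i; k < n && iwork[k] != j; k++) ;
        assert(j < n && k < n);
        tmp = iwork[i]; iwork[i] = iwork[k]; iwork[k] = tmp;
        lindxau[i] = k;
    }
}

/* Apply the swap sequence: for i in [0..n) swap u[i] with u[lindxau[i]]. */
void apply_swaps(int n, double *u, const int *lindxau)
{
    int i;
    for (i = 0; i < n; i++) {
        double tmp = u[i];
        u[i] = u[lindxau[i]];
        u[lindxau[i]] = tmp;
    }
}
```

For example, with LINDXA = {0,1,2} and LINDXAU = {1,2,0}, the array {10,20,30} becomes {30,10,20} after the two calls.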

Arguments

-
-N       (global input)                const int
-        On entry,  N  specifies the length of the arrays  LINDXA  and
-        LINDXAU. N should be at least zero.
-
-
-LINDXA  (global input/output)         int *
-        On entry,  LINDXA  is an array of dimension N  containing the
-        source indexes. On exit,  LINDXA  contains the combined index
-        array.
-
-
-LINDXAU (global input/output)         int *
-        On entry,  LINDXAU is an array of dimension N  containing the
-        target indexes.  On exit,  LINDXAU  contains  the sequence of
-        permutation,  that  should be applied  in increasing order to
-        permute the underlying array U in place.
-
-
-IWORK   (workspace)                   int *
-        On entry, IWORK is a workarray of dimension N.
-
- -

See Also

-HPL_plindx1, -HPL_pdlaswp01N, -HPL_pdlaswp01T. - - - diff --git a/hpl/www/HPL_pipid.html b/hpl/www/HPL_pipid.html deleted file mode 100755 index 628493fad3d6c2fe87ee47690795a45b94880b6e..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_pipid.html +++ /dev/null @@ -1,95 +0,0 @@ - - -HPL_pipid HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_pipid Simplify the pivot vector. - -

Synopsis

-#include "hpl.h"

-void -HPL_pipid( -HPL_T_panel * -PANEL, -int * -K, -int * -IPID -); - -

Description

-HPL_pipid
-computes an array IPID that contains the source and final
-destination of matrix rows resulting from the application of N
-interchanges as computed by the LU factorization with row partial
-pivoting. The array IPID is such that the row of global index IPID(i)
-should be mapped onto the row of global index IPID(i+1). Note that we
-cannot really know the length of IPID a priori. However, we know that
-this array is at least 2*N long, since there are N rows to swap and
-broadcast. The length of this array must be smaller than or equal to
-4*N, since every row is swapped with at most a single distinct remote
-row. The algorithm constructing IPID goes as follows: Let IA be the
-global index of the first row to be swapped.
-
-For every row src = IA + i with i in [0..N) to be swapped with row dst
-such that dst is given by DPIV[i]:
-
-Is row src the destination of a previous row of the current block,
-that is, is there an odd k such that IPID(k) is equal to src ?
-    Yes:  update this destination with dst. For example, if the
-pivot array is (0,2)(1,1)(2,5) ... , then when we swap rows 2 and 5,
-we in fact swap rows 0 and 5, i.e., row 0 goes to 5 and not to 2 as
-previously recorded ...
-    No :  add the pair (src,dst) at the end of IPID; row src has not
-been moved yet.
-
-Is row dst, when different from src, the destination of a previous
-row of the current block, i.e., is there an odd k such that IPID(k)
-is equal to dst ?
-    Yes:  update IPID(k) with src. For example, if the pivot array
-is (0,5)(1,1)(2,5) ... , then when we swap rows 2 and 5, we in fact
-swap rows 2 and 0, i.e., row 0 goes to 2 and not to 5 as previously
-recorded ...
-    No :  add the pair (dst,src) at the end of IPID; row dst has not
-been moved yet.
-
-Note that when src is equal to dst, the pair (dst,src) should not be
-added to IPID in order to avoid duplicated entries in this array.
-During the construction of the array IPID, we make sure that the
-first N entries are such that IPID(k) with k odd is equal to IA+k/2.
-For k in [0..K/2), the row of global index IPID(2*k) should be
-mapped onto the row of global index IPID(2*k+1).
-
-
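The pair-construction rules above can be sketched in a few lines. This is one reading of the description and not the HPL implementation: build_ipid and find_dest are hypothetical names, the pivots are passed as a plain int array rather than read from the panel structure, both searches are done before either update so that a freshly written destination is not matched by mistake, and the sketch does not reproduce the ordering guarantee on the first entries of IPID.

```c
#include <assert.h>

/* Return the odd index k < len with ipid[k] == row, or -1 if none. */
static int find_dest(const int *ipid, int len, int row)
{
    for (int k = 1; k < len; k += 2)
        if (ipid[k] == row) return k;
    return -1;
}

/* Build (src,dst) pairs for n interchanges; row ia+i is swapped with
 * piv[i]. ipid must have room for 4*n entries. Returns K, the number
 * of entries written. */
int build_ipid(int n, int ia, const int *piv, int *ipid)
{
    int len = 0;
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        int src = ia + i, dst = piv[i];
        /* Which earlier rows currently sit at positions src and dst? */
        int ks = find_dest(ipid, len, src);
        int kd = (dst != src) ? find_dest(ipid, len, dst) : -1;

        if (ks >= 0) ipid[ks] = dst;          /* earlier row moves on to dst */
        else { ipid[len++] = src; ipid[len++] = dst; }

        if (dst != src) {                     /* skip (dst,src) when equal   */
            if (kd >= 0) ipid[kd] = src;      /* earlier row moves on to src */
            else { ipid[len++] = dst; ipid[len++] = src; }
        }
    }
    return len;
}
```

For the pivot array (0,2)(1,1)(2,5) from the text, this yields the pairs (0,5)(2,0)(1,1)(5,2): row 0 ends at position 5, as the description states.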

Arguments

-
-PANEL   (local input/output)          HPL_T_panel *
-        On entry,  PANEL  points to the data structure containing the
-        panel information.
-
-
-K       (global output)               int *
-        On exit, K specifies the number of entries in  IPID.  K is at
-        least 2*N, and at most 4*N.
-
-
-IPID    (global output)               int *
-        On entry, IPID is an array of length 4*N.  On exit, the first
-        K entries of that array contain the src and final destination
-        resulting  from  the  application of the  N  interchanges  as
-        specified by  DPIV.  The  pairs  (src,dst)  are  contiguously
-        stored and sorted so that IPID(2*i+1) is equal to IA+i with i
-        in [0..N)
-
- -
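The construction of IPID described above can be sketched in plain C. This is a minimal sketch under stated assumptions, not the HPL source: `build_ipid` and `find_dst` are hypothetical names, the pivot array is taken as `int` for simplicity, and the sketch does not reproduce the ordering of the first N pairs that HPL_pipid guarantees. Both lookups are done before either update so that the entry just written for src is not mistaken for a previous destination equal to dst.

```c
#include <assert.h>

/* Hypothetical helper: odd index j < k with ipid[j] == pos, or -1. */
static int find_dst(const int *ipid, int k, int pos)
{
    for (int j = 1; j < k; j += 2)
        if (ipid[j] == pos) return j;
    return -1;
}

/* Builds (src, final dst) pairs for n interchanges; ipid must hold 4*n
 * entries. Row ia+i is swapped with row dpiv[i]. Returns K. */
int build_ipid(int n, int ia, const int *dpiv, int *ipid)
{
    int k = 0;
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        int src = ia + i, dst = dpiv[i];
        /* Look up both positions before modifying anything. */
        int js = find_dst(ipid, k, src);
        int jd = (dst != src) ? find_dst(ipid, k, dst) : -1;

        if (js >= 0) ipid[js] = dst;                 /* redirect earlier row */
        else { ipid[k++] = src; ipid[k++] = dst; }   /* src not moved yet    */

        if (dst != src) {                            /* avoid duplicate pair */
            if (jd >= 0) ipid[jd] = src;
            else { ipid[k++] = dst; ipid[k++] = src; }
        }
    }
    return k;
}
```

With the pivot array (0,2)(1,1)(2,5) from the description, the sketch records that row 0 finally goes to row 5, and K is 8, which lies between 2*N and 4*N for N = 3.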

See Also

-HPL_pdlaswp00N, -HPL_pdlaswp00T, -HPL_pdlaswp01N, -HPL_pdlaswp01T. - - - diff --git a/hpl/www/HPL_plindx0.html b/hpl/www/HPL_plindx0.html deleted file mode 100755 index aa9640bd9aa65942c84a6c684e31aff25fc90201..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_plindx0.html +++ /dev/null @@ -1,187 +0,0 @@ - - -HPL_plindx0 HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_plindx0 Compute local swapping index arrays. - -

Synopsis

-#include "hpl.h"

-void -HPL_plindx0( -HPL_T_panel * -PANEL, -const int -K, -int * -IPID, -int * -LINDXA, -int * -LINDXAU, -int * -LLEN -); - -

Description

-HPL_plindx0 -computes two local arrays LINDXA and LINDXAU containing -the local source and final destination position resulting from the -application of row interchanges. - -On entry, the array IPID of length K is such that the row of global -index IPID(i) should be mapped onto row of global index IPID(i+1). -Let IA be the global index of the first row to be swapped. For k in -[0..K/2), the row of global index IPID(2*k) should be mapped onto the -row of global index IPID(2*k+1). The question, then, is to determine -which rows should ultimately be part of U. - -First, some rows of the process ICURROW may be swapped locally. One -of these rows belongs to U, the other one belongs to my local piece of -A. The other rows of the current block are swapped with remote rows -and are thus not part of U. These rows however should be sent along, -and grabbed by the other processes as we progress in the exchange -phase. - -So, assume that I am ICURROW and consider a row of index IPID(2*i) -that I own. If I own IPID(2*i+1) as well and IPID(2*i+1) - IA is less -than N, this row is locally swapped and should be copied into U at -the position IPID(2*i+1) - IA. No row will be exchanged for this one. -If IPID(2*i+1)-IA is greater than or equal to N, then the row IPID(2*i) should be -locally copied into my local piece of A at the position corresponding -to the row of global index IPID(2*i+1). - -If the process ICURROW does not own IPID(2*i+1), then row IPID(2*i) -is to be swapped away and strictly speaking does not belong to U, but -to A remotely. Since this process will however send this array U, -this row is copied into U, exactly where the row IPID(2*i+1) should -go. For this, we search IPID for k1, such that IPID(2*k1) is equal to -IPID(2*i+1); and row IPID(2*i) is to be copied in U at the position -IPID(2*k1+1)-IA. - -It is thus important to put the rows that go into U, i.e., such that -IPID(2*i+1) - IA is less than N at the beginning of the array IPID. 
By -doing so, U is formed, and the local copy is performed in just one -sweep. - -Two lists LINDXA and LINDXAU are built. LINDXA contains the local -index of the rows I have that should be copied. LINDXAU contains the -local destination information: if LINDXAU(k) >= 0, row LINDXA(k) of A -is to be copied in U at position LINDXAU(k). Otherwise, row LINDXA(k) -of A should be locally copied into A(-LINDXAU(k),:). In the process -ICURROW, the initial packing algorithm proceeds as follows. - - for all entries in IPID, - if IPID(2*i) is in ICURROW, - if IPID(2*i+1) is in ICURROW, - if( IPID(2*i+1) - IA < N ) - save corresponding local position - of this row (LINDXA); - save local position (LINDXAU) in U - where this row goes; - [copy row IPID(2*i) in U at position - IPID(2*i+1)-IA; ]; - else - save corresponding local position of - this row (LINDXA); - save local position (-LINDXAU) in A - where this row goes; - [copy row IPID(2*i) in my piece of A - at IPID(2*i+1);] - end if - else - find k1 such that IPID(2*k1) = IPID(2*i+1); - copy row IPID(2*i) in U at position - IPID(2*k1+1)-IA; - save corresponding local position of this - row (LINDXA); - save local position (LINDXAU) in U where - this row goes; - end if - end if - end for - -Second, if I am not the current row process ICURROW, all source rows -in IPID that I own are part of U. Indeed, they are swapped with one -row of the current block of rows, and the main factorization -algorithm proceeds one row after each other. The processes different -from ICURROW, should exchange and accumulate those rows until they -receive some data previously owned by the process ICURROW. - -In processes different from ICURROW, the initial packing algorithm -proceeds as follows. Consider a row of global index IPID(2*i) that I -own. When I will be receiving data previously owned by ICURROW, i.e., -U, row IPID(2*i) should replace the row in U at pos. 
IPID(2*i+1)-IA, -and this particular row of U should be first copied into my piece of -A, at A(il,:), where il is the local row index corresponding to -IPID(2*i). Now, initially, this row will be packed into workspace, say -as the kth row of that work array. The following algorithm sets -LINDXAU(k) to IPID(2*i+1)-IA, that is the position in U where the row -should be copied. LINDXA(k) stores the local index in A where this -row of U should be copied, i.e., il. - - for all entries in IPID, - if IPID(2*i) is not in ICURROW, - copy row IPID(2*i) in work array; - save corresponding local position - of this row (LINDXA); - save position (LINDXAU) in U where - this row should be copied; - end if - end for - -Since we are at it, we also globally figure out how many rows every -process has. That is necessary, because it would be rather cumbersome -to figure it on the fly during the bi-directional exchange phase. -This information is kept in the array LLEN of size NPROW. Also note -that the arrays LINDXA and LINDXAU are of max length equal to 2*N. - -

Arguments

-
-PANEL   (local input/output)          HPL_T_panel *
-        On entry,  PANEL  points to the data structure containing the
-        panel information.
-
-
-K       (global input)                const int
-        On entry, K specifies the number of entries in IPID.  K is at
-        least 2*N, and at most 4*N.
-
-
-IPID    (global input)                int *
-        On entry,  IPID  is an array of length K. The first K entries
-        of that array contain the src and final destination resulting
-        from the application of the interchanges.
-
-
-LINDXA  (local output)                int *
-        On entry, LINDXA  is an array of dimension 2*N. On exit, this
-        array contains the local indexes of the rows of A I have that
-        should be copied into U.
-
-
-LINDXAU (local output)                int *
-        On entry, LINDXAU is an array of dimension 2*N. On exit, this
-        array contains  the local destination  information encoded as
-        follows.  If LINDXAU(k) >= 0, row  LINDXA(k)  of A  is  to be
-        copied in U at position LINDXAU(k).  Otherwise, row LINDXA(k)
-        of A should be locally copied into A(-LINDXAU(k),:).
-
-
-LLEN    (global output)               int *
-        On entry,  LLEN  is  an array  of length  NPROW.  On exit, it
-        contains how many rows every process has.
-
- -
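The sign encoding of LINDXAU can be illustrated with a small C sketch. It is not the HPL implementation: `apply_lindx` and the `AT` macro are hypothetical names, the arrays are assumed to be stored column-major, and the sketch conservatively fills U in a first pass before doing the local copies into A, which is a choice of this example rather than something stated in the description.

```c
#include <assert.h>
#include <stddef.h>

/* Column-major element access: row i, column j, leading dimension ld. */
#define AT(m, ld, i, j) ((m)[(size_t)(i) + (size_t)(j) * (size_t)(ld)])

/* Applies len (LINDXA, LINDXAU) entries to A (lda x ncols) and U (ldu x ncols). */
void apply_lindx(int len, const int *lindxa, const int *lindxau,
                 double *a, int lda, double *u, int ldu, int ncols)
{
    assert(len >= 0 && ncols >= 0);
    /* Pass 1: LINDXAU(k) >= 0, row LINDXA(k) of A goes to row LINDXAU(k) of U. */
    for (int k = 0; k < len; k++)
        if (lindxau[k] >= 0)
            for (int j = 0; j < ncols; j++)
                AT(u, ldu, lindxau[k], j) = AT(a, lda, lindxa[k], j);
    /* Pass 2: LINDXAU(k) < 0, row LINDXA(k) of A goes to row -LINDXAU(k) of A. */
    for (int k = 0; k < len; k++)
        if (lindxau[k] < 0)
            for (int j = 0; j < ncols; j++)
                AT(a, lda, -lindxau[k], j) = AT(a, lda, lindxa[k], j);
}
```

Note that a negative entry can only encode a nonzero local row index, since 0 has no negative counterpart.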

See Also

-HPL_pdlaswp00N, -HPL_pdlaswp00T, -HPL_pdlaswp01N, -HPL_pdlaswp01T. - - - diff --git a/hpl/www/HPL_plindx1.html b/hpl/www/HPL_plindx1.html deleted file mode 100755 index c27339a4531b3576ebed6c9f2dc0967be6046c7e..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_plindx1.html +++ /dev/null @@ -1,130 +0,0 @@ - - -HPL_plindx1 HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_plindx1 Compute local swapping index arrays. - -

Synopsis

-#include "hpl.h"

-void -HPL_plindx1( -HPL_T_panel * -PANEL, -const int -K, -const int * -IPID, -int * -IPA, -int * -LINDXA, -int * -LINDXAU, -int * -IPLEN, -int * -IPMAP, -int * -IPMAPM1, -int * -PERMU, -int * -IWORK -); - -

Description

-HPL_plindx1 -computes two local arrays LINDXA and LINDXAU containing -the local source and final destination position resulting from the -application of row interchanges. In addition, this function computes -three arrays IPLEN, IPMAP and IPMAPM1 that contain the logarithmic -mapping information for the spreading phase. - -

Arguments

-
-PANEL   (local input/output)          HPL_T_panel *
-        On entry,  PANEL  points to the data structure containing the
-        panel information.
-
-
-K       (global input)                const int
-        On entry, K specifies the number of entries in IPID.  K is at
-        least 2*N, and at most 4*N.
-
-
-IPID    (global input)                const int *
-        On entry,  IPID  is an array of length K. The first K entries
-        of that array contain the src and final destination resulting
-        from the application of the interchanges.
-
-
-IPA     (global output)               int *
-        On exit,  IPA  specifies  the number of rows that the current
-        process row has that either belong to U  or should be swapped
-        with remote rows of A.
-
-
-LINDXA  (global output)               int *
-        On entry, LINDXA  is an array of dimension 2*N. On exit, this
-        array contains the local indexes of the rows of A I have that
-        should be copied into U.
-
-
-LINDXAU (global output)               int *
-        On entry, LINDXAU is an array of dimension 2*N. On exit, this
-        array contains  the local destination  information encoded as
-        follows.  If LINDXAU(k) >= 0, row  LINDXA(k)  of A  is  to be
-        copied in U at position LINDXAU(k).  Otherwise, row LINDXA(k)
-        of A should be locally copied into A(-LINDXAU(k),:).
-
-
-IPLEN   (global output)               int *
-        On entry, IPLEN is an array of dimension NPROW + 1. On  exit,
-        this array is such that  IPLEN[i]  is the number of rows of A
-        in  the  processes  before  process  IPMAP[i]  after the sort
-        with the convention that IPLEN[nprow]  is the total number of
-        rows of the panel.  In other words IPLEN[i+1]-IPLEN[i] is the
-        local number of rows of A that should be moved to the process
-        IPMAP[i]. IPLEN is such that the number of rows of the source
-        process  row can be computed as  IPLEN[1] - IPLEN[0], and the
-        remaining  entries  of  this  array  are  sorted  so that the
-        quantities IPLEN[i+1] - IPLEN[i] are logarithmically sorted.
-
-
-IPMAP   (global output)               int *
-        On entry, IPMAP is an array of dimension NPROW. On exit, this
-        array contains  the logarithmic mapping of the processes.  In
-        other words, IPMAP[myrow] is the corresponding sorted process
-        coordinate.
-
-
-IPMAPM1 (global output)               int *
-        On entry, IPMAPM1  is an array of dimension NPROW.  On  exit,
-        this  array  contains  the inverse of the logarithmic mapping
-        contained  in  IPMAP:  IPMAPM1[ IPMAP[i] ] = i,  for all i in
-        [0.. NPROW)
-
-
-PERMU   (global output)               int *
-        On entry,  PERMU  is an array of dimension JB. On exit, PERMU
-        contains  a sequence of permutations,  that should be applied
-        in increasing order to permute in place the row panel U.
-
-
-IWORK   (workspace)                   int *
-        On entry, IWORK is a workarray of dimension 2*JB.
-
- -

See Also

-HPL_pdlaswp00N, -HPL_pdlaswp00T, -HPL_pdlaswp01N, -HPL_pdlaswp01T. - - - diff --git a/hpl/www/HPL_plindx10.html b/hpl/www/HPL_plindx10.html deleted file mode 100755 index c34c3a48b3f606e8db220991d03deedc45fc297c..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_plindx10.html +++ /dev/null @@ -1,87 +0,0 @@ - - -HPL_plindx10 HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_plindx10 Compute the logarithmic maps for the spreading. - -

Synopsis

-#include "hpl.h"

-void -HPL_plindx10( -HPL_T_panel * -PANEL, -const int -K, -const int * -IPID, -int * -IPLEN, -int * -IPMAP, -int * -IPMAPM1 -); - -

Description

-HPL_plindx10 -computes three arrays IPLEN, IPMAP and IPMAPM1 that -contain the logarithmic mapping information for the spreading phase. - -

Arguments

-
-PANEL   (local input/output)          HPL_T_panel *
-        On entry,  PANEL  points to the data structure containing the
-        panel information.
-
-
-K       (global input)                const int
-        On entry, K specifies the number of entries in IPID.  K is at
-        least 2*N, and at most 4*N.
-
-
-IPID    (global input)                const int *
-        On entry,  IPID  is an array of length K. The first K entries
-        of that array contain the src and final destination resulting
-        from the application of the interchanges.
-
-
-IPLEN   (global output)               int *
-        On entry, IPLEN  is an array of dimension NPROW + 1. On exit,
-        this array is such that  IPLEN[i]  is the number of rows of A
-        in the processes before process IPMAP[i] after the sort, with
-        the convention that IPLEN[nprow] is the total number of rows.
-        In other words,  IPLEN[i+1] - IPLEN[i] is the local number of
-        rows of  A  that should be moved for each process.  IPLEN  is
-        such that the number of rows of the source process row can be
-        computed as IPLEN[1] - IPLEN[0], and the remaining entries of
-        this  array are sorted  so  that  the quantities IPLEN[i+1] -
-        IPLEN[i] are logarithmically sorted.
-
-
-IPMAP   (global output)               int *
-        On entry, IPMAP is an array of dimension NPROW. On exit, this
-        array contains  the logarithmic mapping of the processes.  In
-        other words, IPMAP[myrow] is the corresponding sorted process
-        coordinate.
-
-
-IPMAPM1 (global output)               int *
-        On entry, IPMAPM1  is an array of dimension NPROW.  On  exit,
-        this  array  contains  the inverse of the logarithmic mapping
-        contained  in  IPMAP:  IPMAPM1[ IPMAP[i] ] = i,  for all i in
-        [0.. NPROW)
-
- -
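The relationship between the three arrays can be sketched in C. This is an illustration only: `invert_map` and `prefix_lengths` are hypothetical names, the per-process row counts are assumed to be given already in sorted order, and the logarithmic sort itself is not reproduced.

```c
#include <assert.h>

/* Builds the inverse mapping so that ipmapm1[ipmap[i]] == i for i in [0..nprow). */
void invert_map(int nprow, const int *ipmap, int *ipmapm1)
{
    for (int i = 0; i < nprow; i++) {
        assert(ipmap[i] >= 0 && ipmap[i] < nprow);
        ipmapm1[ipmap[i]] = i;
    }
}

/* Builds iplen (nprow + 1 entries) as a running sum, so that
 * iplen[i+1] - iplen[i] == rows[i] and iplen[nprow] is the total. */
void prefix_lengths(int nprow, const int *rows, int *iplen)
{
    iplen[0] = 0;
    for (int i = 0; i < nprow; i++) {
        assert(rows[i] >= 0);
        iplen[i + 1] = iplen[i] + rows[i];
    }
}
```

Storing running sums rather than counts means the offset of a process's rows and the number of those rows both come from two adjacent entries.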

See Also

-HPL_pdlaswp00N, -HPL_pdlaswp00T, -HPL_pdlaswp01N, -HPL_pdlaswp01T. - - - diff --git a/hpl/www/HPL_pnum.html b/hpl/www/HPL_pnum.html deleted file mode 100755 index 496c9358c09b45f966c72bd7146d00860c936ae5..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_pnum.html +++ /dev/null @@ -1,54 +0,0 @@ - - -HPL_pnum HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_pnum Rank determination. - -

Synopsis

-#include "hpl.h"

-int -HPL_pnum( -const HPL_T_grid * -GRID, -const int -MYROW, -const int -MYCOL -); - -

Description

-HPL_pnum -determines the rank of a process as a function of its -coordinates in the grid. - -

Arguments

-
-GRID    (local input)                 const HPL_T_grid *
-        On entry,  GRID  points  to the data structure containing the
-        process grid information.
-
-
-MYROW   (local input)                 const int
-        On entry,  MYROW  specifies the row coordinate of the process
-        whose rank is to be determined. MYROW must be greater than or
-        equal to zero and less than NPROW.
-
-
-MYCOL   (local input)                 const int
-        On entry,  MYCOL  specifies  the  column  coordinate  of  the
-        process whose rank is to be determined. MYCOL must be greater
-        than or equal to zero and less than NPCOL.
-
- -
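How a rank can be derived from grid coordinates is sketched below. The names `grid_order` and `grid_rank` are hypothetical, and both orderings are shown as an assumption, since the page does not say which one applies; it states only that the rank is a function of the coordinates.

```c
#include <assert.h>

typedef enum { GRID_ROW_MAJOR, GRID_COLUMN_MAJOR } grid_order;

/* Rank of the process at (myrow, mycol) in an nprow x npcol grid. */
int grid_rank(grid_order order, int nprow, int npcol, int myrow, int mycol)
{
    assert(myrow >= 0 && myrow < nprow);
    assert(mycol >= 0 && mycol < npcol);
    return (order == GRID_ROW_MAJOR) ? myrow * npcol + mycol
                                     : mycol * nprow + myrow;
}
```

In a 2 x 3 grid, the process at coordinates (1,1) has rank 4 under row-major ordering and rank 3 under column-major ordering.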

See Also

-HPL_grid_init, -HPL_grid_info, -HPL_grid_exit. - - - diff --git a/hpl/www/HPL_ptimer.html b/hpl/www/HPL_ptimer.html deleted file mode 100755 index 92932d80d6e903572d6cf064caa3bb1846eaefd7..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_ptimer.html +++ /dev/null @@ -1,49 +0,0 @@ - - -HPL_ptimer HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_ptimer Timer facility. - -

Synopsis

-#include "hpl.h"

-void -HPL_ptimer( -const int -I -); - -

Description

-HPL_ptimer -provides a "stopwatch" functionality cpu/wall timer in -seconds. Up to 64 separate timers can be functioning at once. The -first call starts the timer, and the second stops it. This routine -can be disabled by calling HPL_ptimer_disable(), so that calls to -the timer are ignored. This feature can be used to make sure certain -sections of code do not affect timings, even if they call routines -which have HPL_ptimer calls in them. HPL_ptimer_enable() will enable -the timer functionality. One can retrieve the current value of a -timer by calling - -t0 = HPL_ptimer_inquire( HPL_WALL_TIME | HPL_CPU_TIME, I ) - -where I is the timer index in [0..64). To initialize the timer -functionality, one must have called HPL_ptimer_boot() prior to any of -the functions mentioned above. - -

Arguments

-
-I       (global input)                const int
-        On entry, I specifies the timer to stop/start.
-
- -
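The start/stop toggling and the disable switch can be illustrated with a standalone sketch. It does not use the HPL API: all `sw_` names are hypothetical, only CPU time via the standard `clock()` function is tracked, and the inquiry is assumed to return the time accumulated over completed start/stop intervals.

```c
#include <assert.h>
#include <time.h>

#define SW_NTIMERS 64

static int    sw_disabled;
static int    sw_running[SW_NTIMERS];
static double sw_start[SW_NTIMERS];
static double sw_total[SW_NTIMERS];

static double sw_now(void)
{
    return (double)clock() / (double)CLOCKS_PER_SEC;
}

/* Resets every timer; to be called before any other sw_ function. */
void sw_boot(void)
{
    sw_disabled = 0;
    for (int i = 0; i < SW_NTIMERS; i++) {
        sw_running[i] = 0;
        sw_start[i] = sw_total[i] = 0.0;
    }
}

void sw_enable(void)  { sw_disabled = 0; }
void sw_disable(void) { sw_disabled = 1; }

/* First call starts timer i, the next call stops it; ignored when disabled. */
void sw_toggle(int i)
{
    assert(i >= 0 && i < SW_NTIMERS);
    if (sw_disabled) return;
    if (!sw_running[i]) {
        sw_start[i] = sw_now();
        sw_running[i] = 1;
    } else {
        sw_total[i] += sw_now() - sw_start[i];
        sw_running[i] = 0;
    }
}

/* Seconds accumulated by timer i over its completed intervals. */
double sw_inquire(int i)
{
    assert(i >= 0 && i < SW_NTIMERS);
    return sw_total[i];
}
```

A typical use brackets a region with two `sw_toggle(i)` calls and reads the result with `sw_inquire(i)`; wrapping a region in `sw_disable()` and `sw_enable()` keeps nested toggles from affecting the totals.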

See Also

-HPL_ptimer_cputime, -HPL_ptimer_walltime. - - - diff --git a/hpl/www/HPL_ptimer_cputime.html b/hpl/www/HPL_ptimer_cputime.html deleted file mode 100755 index 83bdd5a4453324b952fc11a8391982821b871d54..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_ptimer_cputime.html +++ /dev/null @@ -1,35 +0,0 @@ - - -HPL_ptimer_cputime HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_ptimer_cputime Return the CPU time. - -

Synopsis

-#include "hpl.h"

-double -HPL_ptimer_cputime(); - -

Description

-HPL_ptimer_cputime -returns the cpu time. If HPL_USE_CLOCK is defined, -the clock() function is used to return an approximation of processor -time used by the program. The value returned is the CPU time used so -far as a clock_t; to get the number of seconds used, the result is -divided by CLOCKS_PER_SEC. This function is part of the ANSI/ISO C -standard library. If HPL_USE_TIMES is defined, the times() function -is used instead. This function returns the current process times. -times() returns the number of clock ticks that have elapsed since the -system has been up. Otherwise and by default, the standard library -function getrusage() is used. - -

See Also

-HPL_ptimer_walltime, -HPL_ptimer. - - - diff --git a/hpl/www/HPL_ptimer_walltime.html b/hpl/www/HPL_ptimer_walltime.html deleted file mode 100755 index 0019899097414ac194892584f37823961661ea74..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_ptimer_walltime.html +++ /dev/null @@ -1,26 +0,0 @@ - - -HPL_ptimer_walltime HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_ptimer_walltime Return the elapsed (wall-clock) time. - -

Synopsis

-#include "hpl.h"

-double -HPL_ptimer_walltime(); - -

Description

-HPL_ptimer_walltime -returns the elapsed (wall-clock) time. - -

See Also

-HPL_ptimer_cputime, -HPL_ptimer. - - - diff --git a/hpl/www/HPL_pwarn.html b/hpl/www/HPL_pwarn.html deleted file mode 100755 index dee0440f22f5c5263fe86f5670a35e1771ef542e..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_pwarn.html +++ /dev/null @@ -1,63 +0,0 @@ - - -HPL_pwarn HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_pwarn displays an error message. - -

Synopsis

-#include "hpl.h"

-void -HPL_pwarn( -FILE * -STREAM, -int -LINE, -const char * -SRNAME, -const char * -FORM, -... -); - -

Description

-HPL_pwarn -displays an error message. - -

Arguments

-
-STREAM  (local input)                 FILE *
-        On entry, STREAM specifies the output stream.
-
-
-LINE    (local input)                 int
-        On entry,  LINE  specifies the line  number in the file where
-        the  error  has occurred.  When  LINE  is not a positive line
-        number, it is ignored.
-
-
-SRNAME  (local input)                 const char *
-        On entry, SRNAME  should  be the name of the routine  calling
-        this error handler.
-
-
-FORM    (local input)                 const char *
-        On entry, FORM specifies the format, i.e., how the subsequent
-        arguments are converted for output.
-
-
-        (local input)                 ...
-        On entry,  ...  is the list of arguments to be printed within
-        the format string.
-
- -
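The handling of the format and its variable argument list can be sketched with the standard `<stdarg.h>` facilities. This is not the HPL routine: `format_warning` is a hypothetical name, it writes into a caller-supplied buffer instead of a stream so the result can be inspected, and the layout of the message is invented for the example. Only the rule that a non-positive LINE is ignored comes from the argument description above.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Formats "<srname> (line <line>): <message>" into out, omitting the line
 * part when line is not positive. Returns the snprintf result. */
int format_warning(char *out, size_t cap, int line,
                   const char *srname, const char *form, ...)
{
    char msg[256];
    va_list ap;

    assert(out != NULL && srname != NULL && form != NULL);

    va_start(ap, form);
    vsnprintf(msg, sizeof msg, form, ap);   /* expand the caller's format */
    va_end(ap);

    if (line > 0)
        return snprintf(out, cap, "%s (line %d): %s", srname, line, msg);
    return snprintf(out, cap, "%s: %s", srname, msg);
}
```

A call such as `format_warning(buf, sizeof buf, __LINE__, "my_routine", "bad value %d", 7)` mirrors how the calling routine's name and line number are passed along with the message.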

See Also

-HPL_pabort, -HPL_fprintf. - - - diff --git a/hpl/www/HPL_rand.html b/hpl/www/HPL_rand.html deleted file mode 100755 index 00bb2a35f438d5ac1c9f9560a9e802deedd0448e..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_rand.html +++ /dev/null @@ -1,40 +0,0 @@ - - -HPL_rand HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_rand random number generator. - -

Synopsis

-#include "hpl.h"

-double -HPL_rand(); - -

Description

-HPL_rand -generates the next number in the random sequence. This -function ensures that this number lies in the interval (-0.5, 0.5]. - -The static array irand contains the information (2 integers) required -to generate the next number in the sequence X(n). This number is -computed as X(n) = (2^32 * irand[1] + irand[0]) / d - 0.5, where the -constant d is the largest 64 bit positive integer. The array irand is -then updated for the generation of the next number X(n+1) in the -random sequence as follows X(n+1) = a * X(n) + c. The constants a and -c should have been preliminarily stored in the arrays ias and ics as -2 pairs of integers. The initialization of ias, ics and irand is -performed by the function HPL_setran. - -
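The recurrence and the scaling step can be illustrated with a small linear congruential sketch in C. It is only an approximation of the scheme described above: the state is held in one 64-bit integer rather than the two-integer array irand, the names `lcg_seed` and `lcg_next` are hypothetical, the constants are common illustrative choices and are not taken from this page, and the state is kept below 2^63 so that the scaled value stays within [-0.5, 0.5].

```c
#include <assert.h>
#include <stdint.h>

static uint64_t lcg_state = 0;

/* Illustrative constants (assumption, not taken from the HPL documentation). */
static const uint64_t LCG_A = 6364136223846793005ULL;
static const uint64_t LCG_C = 1ULL;

/* Keeps only the low 63 bits of the seed. */
void lcg_seed(uint64_t seed)
{
    lcg_state = seed & (uint64_t)INT64_MAX;
}

/* Returns the current number scaled by d = INT64_MAX and shifted by 0.5,
 * then advances the integer state with state = a * state + c. */
double lcg_next(void)
{
    double x = (double)lcg_state / (double)INT64_MAX - 0.5;
    assert(x >= -0.5 && x <= 0.5);
    lcg_state = (LCG_A * lcg_state + LCG_C) & (uint64_t)INT64_MAX;
    return x;
}
```

Unsigned arithmetic wraps on overflow in C, so the multiplication needs no explicit modulo beyond the 63-bit mask.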

See Also

-HPL_ladd, -HPL_lmul, -HPL_setran, -HPL_xjumpm, -HPL_jumpit. - - - diff --git a/hpl/www/HPL_recv.html b/hpl/www/HPL_recv.html deleted file mode 100755 index ca3f1de2fe7d08cf0d2f3e580d36fbc9ce575783..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_recv.html +++ /dev/null @@ -1,67 +0,0 @@ - - -HPL_recv HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_recv Receive a message. - -

Synopsis

-#include "hpl.h"

-int -HPL_recv( -double * -RBUF, -int -RCOUNT, -int -SRC, -int -RTAG, -MPI_Comm -COMM -); - -

Description

-HPL_recv -is a simple wrapper around MPI_Recv. Its main purpose is -to allow for some experimentation / tuning of this simple routine. -Successful completion is indicated by the returned error code -HPL_SUCCESS. In the case of messages of length less than or equal to -zero, this function returns immediately. - -

Arguments

-
-RBUF    (local output)                double *
-        On entry, RBUF specifies the starting address of buffer to be
-        received.
-
-
-RCOUNT  (local input)                 int
-        On entry,  RCOUNT  specifies  the number  of double precision
-        entries in RBUF. RCOUNT must be at least zero.
-
-
-SRC     (local input)                 int
-        On entry, SRC  specifies the rank of the  sending  process in
-        the communication space defined by COMM.
-
-
-RTAG    (local input)                 int
-        On entry,  RTAG specifies the message tag to be used for this
-        communication operation.
-
-
-COMM    (local input)                 MPI_Comm
-        The MPI communicator identifying the communication space.
-
- -

See Also

-HPL_send, -HPL_sdrv. - - - diff --git a/hpl/www/HPL_reduce.html b/hpl/www/HPL_reduce.html deleted file mode 100755 index 5345a70dc7010920f595ec94b51b282aa4488ab3..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_reduce.html +++ /dev/null @@ -1,75 +0,0 @@ - - -HPL_reduce HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_reduce Reduce operation. - -

Synopsis

-#include "hpl.h"

-int -HPL_reduce( -void * -BUFFER, -const int -COUNT, -const HPL_T_TYPE -DTYPE, -const HPL_T_OP -OP, -const int -ROOT, -MPI_Comm -COMM -); - -

Description

-HPL_reduce -performs a global reduce operation across all processes of -a group. Note that the input buffer is used as a work array, so in all -processes but the accumulating process the original data is overwritten. - -

Arguments

-
-BUFFER  (local input/output)          void *
-        On entry,  BUFFER  points to  the  buffer to be  reduced.  On
-        exit,  and  in process of rank  ROOT  this array contains the
-        reduced data.  This  buffer  is also used as workspace during
-        the operation in the other processes of the group.
-
-
-COUNT   (global input)                const int
-        On entry,  COUNT  indicates the number of entries in  BUFFER.
-        COUNT must be at least zero.
-
-
-DTYPE   (global input)                const HPL_T_TYPE
-        On entry,  DTYPE  specifies the type of the buffer operands.
-
-
-OP      (global input)                const HPL_T_OP 
-        On entry, OP is a pointer to the local combine function.
-
-
-ROOT    (global input)                const int
-        On entry, ROOT is the coordinate of the accumulating process.
-
-
-COMM    (global/local input)          MPI_Comm
-        The MPI communicator identifying the process collection.
-
- -

See Also

-HPL_broadcast, -HPL_all_reduce, -HPL_barrier, -HPL_min, -HPL_max, -HPL_sum. - - - diff --git a/hpl/www/HPL_rollN.html b/hpl/www/HPL_rollN.html deleted file mode 100755 index dd4933ac2b87cd4d90b94feae3a9242e89ecf8ac..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_rollN.html +++ /dev/null @@ -1,99 +0,0 @@ - - -HPL_rollN HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_rollN Roll U and forward the column panel. - -

Synopsis

-#include "hpl.h"

-void -HPL_rollN( -HPL_T_panel * -PBCST, -int * -IFLAG, -HPL_T_panel * -PANEL, -const int -N, -double * -U, -const int -LDU, -const int * -IPLEN, -const int * -IPMAP, -const int * -IPMAPM1 -); - -

Description

-HPL_rollN -rolls the local arrays containing the local pieces of U, so -that on exit from this function U is replicated in every process row. -In addition, this function probes for the presence of the column panel -and forwards it when available. - -

Arguments

-
-PBCST   (local input/output)          HPL_T_panel *
-        On entry,  PBCST  points to the data structure containing the
-        panel (to be broadcast) information.
-
-
-IFLAG   (local input/output)          int *
-        On entry, IFLAG  indicates  whether or not  the broadcast has
-        already been completed.  If not,  probing will occur, and the
-        outcome will be contained in IFLAG on exit.
-
-
-PANEL   (local input/output)          HPL_T_panel *
-        On entry,  PANEL  points to the data structure containing the
-        panel (to be rolled) information.
-
-
-N       (local input)                 const int
-        On entry, N specifies the number of columns of  U.  N must be
-        at least zero.
-
-
-U       (local input/output)          double *
-        On entry,  U  is an array of dimension (LDU,*) containing the
-        local pieces of U in each process row.
-
-
-LDU     (local input)                 const int
-        On entry, LDU specifies the local leading dimension of U. LDU
-        should be at least  MAX(1,IPLEN[NPROW]).
-
-
-IPLEN   (global input)                const int *
-        On entry, IPLEN is an array of dimension NPROW+1.  This array
-        is such that IPLEN[i+1] - IPLEN[i] is the number of rows of U
-        in each process row.
-
-
-IPMAP   (global input)                const int *
-        On entry, IPMAP is an array of dimension  NPROW.  This  array
-        contains  the  logarithmic mapping of the processes. In other
-        words, IPMAP[myrow]  is the absolute coordinate of the sorted
-        process.
-
-
-IPMAPM1 (global input)                const int *
-        On entry, IPMAPM1  is an array of dimension NPROW. This array
-        contains  the inverse of the logarithmic mapping contained in
-        IPMAP: for i in [0.. NPROW), IPMAPM1[IPMAP[i]] = i.
-
- -

See Also

-HPL_pdlaswp01N. - - - diff --git a/hpl/www/HPL_rollT.html b/hpl/www/HPL_rollT.html deleted file mode 100755 index 15d7991b0d02ac4a4d623db9a801391f4a6f6560..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_rollT.html +++ /dev/null @@ -1,99 +0,0 @@ - - -HPL_rollT HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_rollT Roll U and forward the column panel. - -

Synopsis

-#include "hpl.h"

-void -HPL_rollT( -HPL_T_panel * -PBCST, -int * -IFLAG, -HPL_T_panel * -PANEL, -const int -N, -double * -U, -const int -LDU, -const int * -IPLEN, -const int * -IPMAP, -const int * -IPMAPM1 -); - -

Description

-HPL_rollT -rolls the local arrays containing the local pieces of U, so -that on exit from this function U is replicated in every process row. -In addition, this function probes for the presence of the column panel -and forwards it when available. - -

Arguments

-
-PBCST   (local input/output)          HPL_T_panel *
-        On entry,  PBCST  points to the data structure containing the
-        panel (to be broadcast) information.
-
-
-IFLAG   (local input/output)          int *
-        On entry, IFLAG  indicates  whether or not  the broadcast has
-        already been completed.  If not,  probing will occur, and the
-        outcome will be contained in IFLAG on exit.
-
-
-PANEL   (local input/output)          HPL_T_panel *
-        On entry,  PANEL  points to the data structure containing the
-        panel (to be rolled) information.
-
-
-N       (local input)                 const int
-        On entry, N specifies the local number of rows of  U.  N must
-        be at least zero.
-
-
-U       (local input/output)          double *
-        On entry,  U  is an array of dimension (LDU,*) containing the
-        local pieces of U in each process row.
-
-
-LDU     (local input)                 const int
-        On entry, LDU specifies the local leading dimension of U. LDU
-        should be at least  MAX(1,N).
-
-
-IPLEN   (global input)                const int *
-        On entry, IPLEN is an array of dimension NPROW+1.  This array
-        is such that IPLEN[i+1] - IPLEN[i] is the number of rows of U
-        in each process row.
-
-
-IPMAP   (global input)                const int *
-        On entry, IPMAP is an array of dimension  NPROW.  This  array
-        contains  the  logarithmic mapping of the processes. In other
-        words, IPMAP[myrow]  is the absolute coordinate of the sorted
-        process.
-
-
-IPMAPM1 (global input)                const int *
-        On entry, IPMAPM1  is an array of dimension NPROW. This array
-        contains  the inverse of the logarithmic mapping contained in
-        IPMAP: for i in [0.. NPROW), IPMAPM1[IPMAP[i]] = i.
-
- -

See Also

-HPL_pdlaswp01T. - - - diff --git a/hpl/www/HPL_sdrv.html b/hpl/www/HPL_sdrv.html deleted file mode 100755 index f22503a6ef771d4a090b5abd167db7cafa61f6aa..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_sdrv.html +++ /dev/null @@ -1,88 +0,0 @@ - - -HPL_sdrv HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_sdrv Send and receive a message. - -

Synopsis

-#include "hpl.h"

-int -HPL_sdrv( -double * -SBUF, -int -SCOUNT, -int -STAG, -double * -RBUF, -int -RCOUNT, -int -RTAG, -int -PARTNER, -MPI_Comm -COMM -); - -

Description

-HPL_sdrv -is a simple wrapper around MPI_Sendrecv. Its main purpose is -to allow for some experimentation and tuning of this simple function. -Messages of length less than or equal to zero are not sent nor -received. Successful completion is indicated by the returned error -code HPL_SUCCESS. - -

Arguments

-
-SBUF    (local input)                 double *
-        On entry, SBUF specifies the starting address of buffer to be
-        sent.
-
-
-SCOUNT  (local input)                 int
-        On entry,  SCOUNT  specifies  the number  of double precision
-        entries in SBUF. SCOUNT must be at least zero.
-
-
-STAG    (local input)                 int
-        On entry,  STAG  specifies the message tag to be used for the
-        sending communication operation.
-
-
-RBUF    (local output)                double *
-        On entry, RBUF specifies the starting address of buffer to be
-        received.
-
-
-RCOUNT  (local input)                 int
-        On entry,  RCOUNT  specifies  the number  of double precision
-        entries in RBUF. RCOUNT must be at least zero.
-
-
-RTAG    (local input)                 int
-        On entry,  RTAG  specifies the message tag to be used for the
-        receiving communication operation.
-
-
-PARTNER (local input)                 int
-        On entry,  PARTNER  specifies  the rank of the  collaborative
-        process in the communication space defined by COMM.
-
-
-COMM    (local input)                 MPI_Comm
-        The MPI communicator identifying the communication space.
-
- -

See Also

-HPL_send, -HPL_recv. - - - diff --git a/hpl/www/HPL_send.html b/hpl/www/HPL_send.html deleted file mode 100755 index 2c9baf66426c441ae17c9cc9086e60b56d8dfc16..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_send.html +++ /dev/null @@ -1,67 +0,0 @@ - - -HPL_send HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_send Send a message. - -

Synopsis

-#include "hpl.h"

-int -HPL_send( -double * -SBUF, -int -SCOUNT, -int -DEST, -int -STAG, -MPI_Comm -COMM -); - -

Description

-HPL_send -is a simple wrapper around MPI_Send. Its main purpose is -to allow for some experimentation / tuning of this simple routine. -Successful completion is indicated by the returned error code -MPI_SUCCESS. In the case of messages of length less than or equal to -zero, this function returns immediately. - -

Arguments

-
-SBUF    (local input)                 double *
-        On entry, SBUF specifies the starting address of buffer to be
-        sent.
-
-
-SCOUNT  (local input)                 int
-        On entry,  SCOUNT  specifies  the number of  double precision
-        entries in SBUF. SCOUNT must be at least zero.
-
-
-DEST    (local input)                 int
-        On entry, DEST specifies the rank of the receiving process in
-        the communication space defined by COMM.
-
-
-STAG    (local input)                 int
-        On entry,  STAG specifies the message tag to be used for this
-        communication operation.
-
-
-COMM    (local input)                 MPI_Comm
-        The MPI communicator identifying the communication space.
-
- -

See Also

-HPL_recv, -HPL_sdrv. - - - diff --git a/hpl/www/HPL_setran.html b/hpl/www/HPL_setran.html deleted file mode 100755 index d47dfc25e164bb99e0c3dbad3f125e1ad6895783..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_setran.html +++ /dev/null @@ -1,52 +0,0 @@ - - -HPL_setran HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_setran Manage the random number generator. - -

Synopsis

-#include "hpl.h"

-void -HPL_setran( -const int -OPTION, -int * -IRAN -); - -

Description

-HPL_setran -initializes the random generator with the encoding of the -first number X(0) in the sequence, and the constants a and c used to -compute the next element in the sequence: X(n+1) = a*X(n) + c. X(0), -a and c are stored in the static variables irand, ias and ics. When -OPTION is 0 (resp. 1 and 2), irand (resp. ias and ics) is set to the -values of the input array IRAN. When OPTION is 3, IRAN is set to the -current value of irand, and irand is then incremented. - -

Arguments

-
-OPTION  (local input)                 const int
-        On entry, OPTION  is an integer that specifies the operations
-        to be performed on the random generator as specified above.
-
-
-IRAN    (local input/output)          int *
-        On entry,  IRAN is an array of dimension 2, that contains the
-        16-lower and 15-higher bits of a random number.
-
- -
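The OPTION semantics described above can be sketched as follows. This is a minimal illustration, not the HPL source: the names `sk_setran`, `sk_irand`, `sk_ias` and `sk_ics` are hypothetical, a single 64-bit word stands in for the two-int (16 lower / 15 higher bits) encoding, arithmetic is assumed to be modulo 2^31, and "incremented" is read as "advanced to the next element of the sequence".

```c
#include <assert.h>
#include <stdint.h>

/* Arithmetic modulo 2^31 (assumption suggested by the 16+15 bit encoding). */
#define SK_SETRAN_MASK UINT64_C(0x7FFFFFFF)

/* Hypothetical stand-ins for the static variables irand, ias and ics. */
static uint64_t sk_irand, sk_ias, sk_ics;

/* option 0, 1, 2: store *value into X(0), a, c respectively.
 * option 3: return the current X(n) in *value, then advance the state
 *           to X(n+1) = a*X(n) + c. */
static void sk_setran(int option, uint64_t *value)
{
    assert(option >= 0 && option <= 3);
    switch (option) {
    case 0: sk_irand = *value & SK_SETRAN_MASK; break;
    case 1: sk_ias   = *value & SK_SETRAN_MASK; break;
    case 2: sk_ics   = *value & SK_SETRAN_MASK; break;
    default:
        *value   = sk_irand;
        sk_irand = (sk_ias * sk_irand + sk_ics) & SK_SETRAN_MASK;
        break;
    }
}
```

With X(0) = 7, a = 3 and c = 5, three successive calls with option 3 would hand back 7, 26 and 83 under these assumptions.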

See Also

-HPL_ladd, -HPL_lmul, -HPL_xjumpm, -HPL_jumpit, -HPL_rand. - - - diff --git a/hpl/www/HPL_spreadN.html b/hpl/www/HPL_spreadN.html deleted file mode 100755 index da131460e46446237e77430b1f12dbd81422b993..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_spreadN.html +++ /dev/null @@ -1,120 +0,0 @@ - - -HPL_spreadN HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_spreadN Spread row panel U and forward current column panel. - -

Synopsis

-#include "hpl.h"

-void -HPL_spreadN( -HPL_T_panel * -PBCST, -int * -IFLAG, -HPL_T_panel * -PANEL, -const enum HPL_SIDE -SIDE, -const int -N, -double * -U, -const int -LDU, -const int -SRCDIST, -const int * -IPLEN, -const int * -IPMAP, -const int * -IPMAPM1 -); - -

Description

-HPL_spreadN -spreads the local array containing local pieces of U, so -that on exit to this function, a piece of U is contained in every -process row. The array IPLEN contains the number of rows of U, that -should be spread on any given process row. This function also probes -for the presence of the column panel PBCST. In case of success, this -panel will be forwarded. If PBCST is NULL on input, this probing -mechanism will be disabled. - -

Arguments

-
-PBCST   (local input/output)          HPL_T_panel *
-        On entry,  PBCST  points to the data structure containing the
-        panel (to be broadcast) information.
-
-
-IFLAG   (local input/output)          int *
-        On entry, IFLAG  indicates  whether or not  the broadcast has
-        already been completed.  If not,  probing will occur, and the
-        outcome will be contained in IFLAG on exit.
-
-
-PANEL   (local input/output)          HPL_T_panel *
-        On entry,  PANEL  points to the data structure containing the
-        panel (to be spread) information.
-
-
-SIDE    (global input)                const enum HPL_SIDE
-        On entry, SIDE specifies whether the local piece of U located
-        in process IPMAP[SRCDIST] should be spread to the right or to
-        the left. This feature is used by the equilibration process.
-
-
-N       (global input)                const int
-        On entry,  N  specifies  the  local number of columns of U. N
-        must be at least zero.
-
-
-U       (local input/output)          double *
-        On entry,  U  is an array of dimension (LDU,*) containing the
-        local pieces of U.
-
-
-LDU     (local input)                 const int
-        On entry, LDU specifies the local leading dimension of U. LDU
-        should be at least MAX(1,IPLEN[nprow]).
-
-
-SRCDIST (local input)                 const int
-        On entry,  SRCDIST  specifies the source process that spreads
-        its piece of U.
-
-
-IPLEN   (global input)                const int *
-        On entry, IPLEN is an array of dimension NPROW+1.  This array
-        is such that  IPLEN[i]  is the number of rows of U  in  the
-        processes before process IPMAP[i],  with the convention that
-        IPLEN[nprow] is the total number of rows.  In other words,
-        IPLEN[i+1] - IPLEN[i]  is  the local number of rows of U that
-        should be moved to process IPMAP[i].
-
-
-IPMAP   (global input)                const int *
-        On entry, IPMAP is an array of dimension  NPROW.  This  array
-        contains  the  logarithmic mapping of the processes. In other
-        words, IPMAP[myrow]  is the absolute coordinate of the sorted
-        process.
-
-
-IPMAPM1 (global input)                const int *
-        On entry,  IPMAPM1 is an array of dimension NPROW. This array
-        contains  the inverse of the logarithmic mapping contained in
-        IPMAP: For i in [0.. NPROW) IPMAPM1[IPMAP[i]] = i.
-
- -
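The relationship between IPLEN, IPMAP and IPMAPM1 can be illustrated with a small sketch. It reads IPLEN as a prefix-sum array, so that IPLEN[i+1] - IPLEN[i] is the row count for process IPMAP[i]. The helper names and the `rows` input array are hypothetical and are not part of HPL.

```c
#include <assert.h>

/* Build IPLEN-style prefix sums from per-position row counts:
 * iplen[i] is the number of rows before sorted position i and
 * iplen[nprow] is the total.  `rows` is a hypothetical input. */
static void sk_build_iplen(int nprow, const int *rows, int *iplen)
{
    assert(nprow > 0);
    iplen[0] = 0;
    for (int i = 0; i < nprow; i++)
        iplen[i + 1] = iplen[i] + rows[i];
}

/* Invert a permutation so that ipmapm1[ipmap[i]] == i for i in [0, nprow). */
static void sk_invert_map(int nprow, const int *ipmap, int *ipmapm1)
{
    for (int i = 0; i < nprow; i++) {
        assert(ipmap[i] >= 0 && ipmap[i] < nprow);
        ipmapm1[ipmap[i]] = i;
    }
}
```

The number of rows destined for the process at sorted position i is then recovered as `iplen[i + 1] - iplen[i]`.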

See Also

-HPL_pdlaswp01N. - - - diff --git a/hpl/www/HPL_spreadT.html b/hpl/www/HPL_spreadT.html deleted file mode 100755 index 43ab3861a2609373e824e541d61adff3d62d3643..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_spreadT.html +++ /dev/null @@ -1,120 +0,0 @@ - - -HPL_spreadT HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_spreadT Spread row panel U and forward current column panel. - -

Synopsis

-#include "hpl.h"

-void -HPL_spreadT( -HPL_T_panel * -PBCST, -int * -IFLAG, -HPL_T_panel * -PANEL, -const enum HPL_SIDE -SIDE, -const int -N, -double * -U, -const int -LDU, -const int -SRCDIST, -const int * -IPLEN, -const int * -IPMAP, -const int * -IPMAPM1 -); - -

Description

-HPL_spreadT -spreads the local array containing local pieces of U, so -that on exit to this function, a piece of U is contained in every -process row. The array IPLEN contains the number of columns of U, -that should be spread on any given process row. This function also -probes for the presence of the column panel PBCST. If available, -this panel will be forwarded. If PBCST is NULL on input, this -probing mechanism will be disabled. - -

Arguments

-
-PBCST   (local input/output)          HPL_T_panel *
-        On entry,  PBCST  points to the data structure containing the
-        panel (to be broadcast) information.
-
-
-IFLAG   (local input/output)          int *
-        On entry, IFLAG  indicates  whether or not  the broadcast has
-        already been completed.  If not,  probing will occur, and the
-        outcome will be contained in IFLAG on exit.
-
-
-PANEL   (local input/output)          HPL_T_panel *
-        On entry,  PANEL  points to the data structure containing the
-        panel (to be spread) information.
-
-
-SIDE    (global input)                const enum HPL_SIDE
-        On entry, SIDE specifies whether the local piece of U located
-        in process IPMAP[SRCDIST] should be spread to the right or to
-        the left. This feature is used by the equilibration process.
-
-
-N       (global input)                const int
-        On entry,  N  specifies the local number of rows of U. N must
-        be at least zero.
-
-
-U       (local input/output)          double *
-        On entry,  U  is an array of dimension (LDU,*) containing the
-        local pieces of U.
-
-
-LDU     (local input)                 const int
-        On entry, LDU specifies the local leading dimension of U. LDU
-        should be at least MAX(1,N).
-
-
-SRCDIST (local input)                 const int
-        On entry,  SRCDIST  specifies the source process that spreads
-        its piece of U.
-
-
-IPLEN   (global input)                const int *
-        On entry, IPLEN is an array of dimension NPROW+1.  This array
-        is such that  IPLEN[i]  is the number of rows of U  in  the
-        processes before process IPMAP[i],  with the convention that
-        IPLEN[nprow] is the total number of rows.  In other words,
-        IPLEN[i+1] - IPLEN[i]  is  the local number of rows of U that
-        should be moved to process IPMAP[i].
-
-
-IPMAP   (global input)                const int *
-        On entry, IPMAP is an array of dimension  NPROW.  This  array
-        contains  the  logarithmic mapping of the processes. In other
-        words, IPMAP[myrow]  is the absolute coordinate of the sorted
-        process.
-
-
-IPMAPM1 (global input)                const int *
-        On entry,  IPMAPM1 is an array of dimension NPROW. This array
-        contains  the inverse of the logarithmic mapping contained in
-        IPMAP: For i in [0.. NPROW) IPMAPM1[IPMAP[i]] = i.
-
- -

See Also

-HPL_pdlaswp01T. - - - diff --git a/hpl/www/HPL_sum.html b/hpl/www/HPL_sum.html deleted file mode 100755 index adea5e50969892f79a8689935dc666c6711405f7..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_sum.html +++ /dev/null @@ -1,61 +0,0 @@ - - -HPL_sum HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_sum Combine (sum) two buffers. - -

Synopsis

-#include "hpl.h"

-void -HPL_sum( -const int -N, -const void * -IN, -void * -INOUT, -const HPL_T_TYPE -DTYPE -); - -

Description

-HPL_sum -combines (sum) two buffers. - -

Arguments

-
-N       (input)                       const int
-        On entry, N  specifies  the  length  of  the  buffers  to  be
-        combined. N must be at least zero.
-
-
-IN      (input)                       const void *
-        On entry, IN points to the input-only buffer to be combined.
-
-
-INOUT   (input/output)                void *
-        On entry, INOUT  points  to  the  input-output  buffer  to be
-        combined.  On exit,  the  entries of this array  contain  the
-        combined results.
-
-
-DTYPE   (input)                       const HPL_T_TYPE
-        On entry,  DTYPE  specifies the type of the buffer operands.
-
- -
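A combine function of this kind could look roughly like the sketch below. The type tag `sk_type` is a hypothetical stand-in for HPL_T_TYPE and only two element types are covered; it shows the element-wise accumulation into INOUT and nothing more.

```c
#include <assert.h>

/* Hypothetical type tag standing in for HPL_T_TYPE. */
typedef enum { SK_INT, SK_DOUBLE } sk_type;

/* inout[i] += in[i] for i in [0, n), interpreting both buffers per dtype. */
static void sk_sum(int n, const void *in, void *inout, sk_type dtype)
{
    assert(n >= 0);
    if (dtype == SK_INT) {
        const int *a = (const int *)in;
        int       *b = (int *)inout;
        for (int i = 0; i < n; i++) b[i] += a[i];
    } else {
        const double *a = (const double *)in;
        double       *b = (double *)inout;
        for (int i = 0; i < n; i++) b[i] += a[i];
    }
}
```

A function with this shape is the natural building block for a reduce or all-reduce, where it is applied pairwise to buffers received from other processes.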

See Also

-HPL_broadcast, -HPL_reduce, -HPL_all_reduce, -HPL_barrier, -HPL_min, -HPL_max, -HPL_sum. - - - diff --git a/hpl/www/HPL_timer.html b/hpl/www/HPL_timer.html deleted file mode 100755 index ec79e7a0437eb6f9991ce7f78536c368b7725679..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_timer.html +++ /dev/null @@ -1,49 +0,0 @@ - - -HPL_timer HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_timer Timer facility. - -

Synopsis

-#include "hpl.h"

-void -HPL_timer( -const int -I -); - -

Description

-HPL_timer -provides a "stopwatch"-style cpu/wall timer that reports -seconds. Up to 64 separate timers can be functioning at once. The -first call starts the timer, and the second stops it. This routine -can be disabled by calling HPL_timer_disable(), so that calls to -the timer are ignored. This feature can be used to make sure certain -sections of code do not affect timings, even if they call routines -which have HPL_timer calls in them. HPL_timer_enable() will re-enable -the timer functionality. One can retrieve the current value of a -timer by calling - -t0 = HPL_timer_inquire( HPL_WALL_TIME | HPL_CPU_TIME, I ) - -where I is the timer index in [0..64). To initialize the timer -functionality, one must have called HPL_timer_boot() prior to any of -the functions mentioned above. - -

Arguments

-
-I       (global input)                const int
-        On entry, I specifies the timer to stop/start.
-
- -
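The start/stop toggling and the enable/disable switch can be sketched as below. All `sk_` names are hypothetical, only CPU time via the standard `clock()` function is tracked, and accumulating elapsed time across successive start/stop pairs is an assumption rather than something stated above.

```c
#include <assert.h>
#include <time.h>

#define SK_NTIMER 64

static int    sk_enabled = 1;
static int    sk_running[SK_NTIMER];
static double sk_start[SK_NTIMER];
static double sk_total[SK_NTIMER];

/* CPU seconds consumed so far, using the ISO C clock() function. */
static double sk_cputime(void)
{
    return (double)clock() / (double)CLOCKS_PER_SEC;
}

/* Reset every timer and switch the facility on. */
static void sk_timer_boot(void)
{
    for (int i = 0; i < SK_NTIMER; i++) {
        sk_running[i] = 0;
        sk_start[i]   = 0.0;
        sk_total[i]   = 0.0;
    }
    sk_enabled = 1;
}

static void sk_timer_enable(void)  { sk_enabled = 1; }
static void sk_timer_disable(void) { sk_enabled = 0; }

/* Toggle timer i: the first call starts it, the next call stops it and
 * adds the elapsed time to the total.  Ignored while disabled. */
static void sk_timer(int i)
{
    assert(i >= 0 && i < SK_NTIMER);
    if (!sk_enabled) return;
    if (!sk_running[i]) {
        sk_start[i]   = sk_cputime();
        sk_running[i] = 1;
    } else {
        sk_total[i]  += sk_cputime() - sk_start[i];
        sk_running[i] = 0;
    }
}

/* Accumulated seconds recorded by timer i. */
static double sk_timer_inquire(int i)
{
    assert(i >= 0 && i < SK_NTIMER);
    return sk_total[i];
}
```

Bracketing a region with `sk_timer(3); ... sk_timer(3);` and then calling `sk_timer_inquire(3)` mirrors the usage pattern described for HPL_timer and HPL_timer_inquire.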

See Also

-HPL_timer_cputime, -HPL_timer_walltime. - - - diff --git a/hpl/www/HPL_timer_cputime.html b/hpl/www/HPL_timer_cputime.html deleted file mode 100755 index be2309dc9b89f541e6bbb8e1272fc2fd9603ddf2..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_timer_cputime.html +++ /dev/null @@ -1,35 +0,0 @@ - - -HPL_timer_cputime HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_timer_cputime Return the CPU time. - -

Synopsis

-#include "hpl.h"

-double -HPL_timer_cputime(); - -

Description

-HPL_timer_cputime -returns the cpu time. If HPL_USE_CLOCK is defined, -the clock() function is used to return an approximation of processor -time used by the program. The value returned is the CPU time used so -far as a clock_t; to get the number of seconds used, the result is -divided by CLOCKS_PER_SEC. This function is part of the ANSI/ISO C -standard library. If HPL_USE_TIMES is defined, the times() function -is used instead. This function returns the current process times. -times() returns the number of clock ticks that have elapsed since the -system has been up. Otherwise and by default, the standard library -function getrusage() is used. - -

See Also

-HPL_timer_walltime, -HPL_timer. - - - diff --git a/hpl/www/HPL_timer_walltime.html b/hpl/www/HPL_timer_walltime.html deleted file mode 100755 index 20d5eb1a3c7731f46a605a008a997c1990ff2d53..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_timer_walltime.html +++ /dev/null @@ -1,26 +0,0 @@ - - -HPL_timer_walltime HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_timer_walltime Return the elapsed (wall-clock) time. - -

Synopsis

-#include "hpl.h"

-double -HPL_timer_walltime(); - -

Description

-HPL_timer_walltime -returns the elapsed (wall-clock) time. - -

See Also

-HPL_timer_cputime, -HPL_timer. - - - diff --git a/hpl/www/HPL_warn.html b/hpl/www/HPL_warn.html deleted file mode 100755 index fe50c271e04dec719c466d70a10c3335ec58efdb..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_warn.html +++ /dev/null @@ -1,74 +0,0 @@ - - -HPL_warn HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_warn displays an error message. - -

Synopsis

-#include "hpl.h"

-void -HPL_warn( -FILE * -STREAM, -int -LINE, -const char * -SRNAME, -const char * -FORM, -... -); - -

Description

-HPL_warn -displays an error message. - -

Arguments

-
-STREAM  (local input)                 FILE *
-        On entry, STREAM specifies the output stream.
-
-
-LINE    (local input)                 int
-        On entry,  LINE  specifies the line  number in the file where
-        the  error  has  occurred.  When  LINE  is not a positive line
-        number, it is ignored.
-
-
-SRNAME  (local input)                 const char *
-        On entry, SRNAME  should  be the name of the routine  calling
-        this error handler.
-
-
-FORM    (local input)                 const char *
-        On entry, FORM specifies the format, i.e., how the subsequent
-        arguments are converted for output.
-
-
-        (local input)                 ...
-        On entry,  ...  is the list of arguments to be printed within
-        the format string.
-
- -

Example

-#include "hpl.h"

-
-int main(int argc, char *argv[])
-{
-   HPL_warn( stderr, __LINE__, __FILE__,
-             "Demo.\n" );
-   exit(0); return(0);
-}
-
- -

See Also

-HPL_abort, -HPL_fprintf. - - - diff --git a/hpl/www/HPL_xjumpm.html b/hpl/www/HPL_xjumpm.html deleted file mode 100755 index 51620f7d33f8eabcfbf4fab451008d87ae4b42f1..0000000000000000000000000000000000000000 --- a/hpl/www/HPL_xjumpm.html +++ /dev/null @@ -1,97 +0,0 @@ - - -HPL_xjumpm HPL 2.1 Library Functions October 26, 2012 - - - - -

Name

-HPL_xjumpm Compute constants to jump in the random sequence. - -

Synopsis

-#include "hpl.h"

-void -HPL_xjumpm( -const int -JUMPM, -int * -MULT, -int * -IADD, -int * -IRANN, -int * -IRANM, -int * -IAM, -int * -ICM -); - -

Description

-HPL_xjumpm -computes the constants A and C to jump JUMPM numbers in -the random sequence: X(n+JUMPM) = A*X(n)+C. The constants encoded in -MULT and IADD specify how to jump from one entry in the sequence to -the next. - -

Arguments

-
-JUMPM   (local input)                 const int
-        On entry,  JUMPM  specifies  the  number  of entries  in  the
-        sequence to jump over.  When JUMPM is less than or equal to zero,
-        A and C are not computed, IRANM is set to IRANN corresponding
-        to a jump of size zero.
-
-
-MULT    (local input)                 int *
-        On entry, MULT is an array of dimension 2,  that contains the
-        16-lower  and 15-higher bits of the constant  a  to jump from
-        X(n) to X(n+1) = a*X(n) + c in the random sequence.
-
-
-IADD    (local input)                 int *
-        On entry, IADD is an array of dimension 2,  that contains the
-        16-lower  and 15-higher bits of the constant  c  to jump from
-        X(n) to X(n+1) = a*X(n) + c in the random sequence.
-
-
-IRANN   (local input)                 int *
-        On entry, IRANN is an array of dimension 2, that contains the
-        16-lower and 15-higher bits of the encoding of X(n).
-
-
-IRANM   (local output)                int *
-        On entry,  IRANM  is an array of dimension 2.   On exit, this
-        array  contains respectively  the 16-lower and 15-higher bits
-        of the encoding of X(n+JUMPM).
-
-
-IAM     (local output)                int *
-        On entry, IAM is an array of dimension 2. On exit, when JUMPM
-        is  greater  than  zero,  this  array  contains  the  encoded
-        constant  A  to jump from  X(n) to  X(n+JUMPM)  in the random
-        sequence. IAM(0:1)  contains  respectively  the  16-lower and
-        15-higher  bits  of this constant  A. When  JUMPM  is less than
-        or equal to zero, this array is not referenced.
-
-
-ICM     (local output)                int *
-        On entry, ICM is an array of dimension 2. On exit, when JUMPM
-        is  greater  than  zero,  this  array  contains  the  encoded
-        constant  C  to jump from  X(n)  to  X(n+JUMPM) in the random
-        sequence. ICM(0:1)  contains  respectively  the  16-lower and
-        15-higher  bits  of this constant  C. When  JUMPM  is less than
-        or equal to zero, this array is not referenced.
-
- -
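The jump constants A and C follow from composing the one-step map X(n+1) = a*X(n) + c with itself JUMPM times. The sketch below does this by plain repetition, assuming arithmetic modulo 2^31 and a single 64-bit word in place of the two-int encoding. The names `sk_jump_constants` and `sk_step` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Arithmetic modulo 2^31 (assumption suggested by the 16+15 bit encoding). */
#define SK_JUMP_MASK UINT64_C(0x7FFFFFFF)

/* One step of the sequence: returns a*x + c (mod 2^31). */
static uint64_t sk_step(uint64_t x, uint64_t a, uint64_t c)
{
    return (a * x + c) & SK_JUMP_MASK;
}

/* Compute A and C such that X(n+jumpm) = A*X(n) + C (mod 2^31).
 * If X(n+k) = A_k*X(n) + C_k, then one more step gives
 * A_{k+1} = a*A_k and C_{k+1} = a*C_k + c. */
static void sk_jump_constants(int jumpm, uint64_t a, uint64_t c,
                              uint64_t *A, uint64_t *C)
{
    assert(jumpm > 0);
    assert(a <= SK_JUMP_MASK && c <= SK_JUMP_MASK);
    *A = a;
    *C = c;
    for (int k = 1; k < jumpm; k++) {
        *A = (a * *A) & SK_JUMP_MASK;
        *C = (a * *C + c) & SK_JUMP_MASK;
    }
}
```

Applying `sk_step(x, A, C)` once then lands on the same value as applying `sk_step(x, a, c)` JUMPM times, which is what lets each process jump directly to its own part of the sequence.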

See Also

-HPL_ladd, -HPL_lmul, -HPL_setran, -HPL_jumpit, -HPL_rand. - - - diff --git a/hpl/www/algorithm.html b/hpl/www/algorithm.html deleted file mode 100755 index 9b1d7222e8b330fc172472cf358d9b17d77cbd8f..0000000000000000000000000000000000000000 --- a/hpl/www/algorithm.html +++ /dev/null @@ -1,299 +0,0 @@ - - -HPL Algorithm - - - - -

HPL Algorithm

- - -This page provides a high-level description of the algorithm used in -this package. As indicated below, HPL contains in fact many possible -variants for various operations. Defaults could have been chosen, or -even variants could be selected during the execution. Due to the -performance requirements, it was decided to leave the user with the -opportunity of choosing, so that an "optimal" set of parameters could -easily be experimentally determined for a given machine configuration. -From a numerical accuracy point of view, all possible -combinations are rigorously equivalent to each other even though the -result may slightly differ (bit-wise). -

- - -
- -

Main Algorithm

- -This software package solves a linear system of order n: A x = b by -first computing the LU factorization with row partial pivoting of the -n-by-n+1 coefficient matrix [A b] = [[L,U] y]. Since the lower triangular -factor L is applied to b as the factorization progresses, the solution x -is obtained by solving the upper triangular system U x = y. The lower -triangular matrix L is left unpivoted and the array of pivots is not -returned.
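As a point of reference, the same computation on a single process and without blocking can be sketched as follows. This is a serial illustration only, not HPL's distributed implementation. The function name `sk_solve` and the row-major storage of the augmented array are assumptions made for the example. Row swaps are applied from column k onwards, so the stored multipliers of L stay unpivoted, and b is transformed into y as the elimination proceeds.

```c
#include <assert.h>

static double sk_fabs(double v) { return v < 0.0 ? -v : v; }

/* Solve A x = b for a dense system stored row-major in the n-by-(n+1)
 * augmented array ab = [A b].  On exit ab holds [[L,U] y].
 * Returns 0 on success, -1 if a zero pivot is met. */
static int sk_solve(int n, double *ab, double *x)
{
    const int ld = n + 1;
    assert(n > 0);

    for (int k = 0; k < n; k++) {
        /* Row partial pivoting: pick the largest |entry| in column k. */
        int piv = k;
        for (int i = k + 1; i < n; i++)
            if (sk_fabs(ab[i * ld + k]) > sk_fabs(ab[piv * ld + k])) piv = i;
        if (ab[piv * ld + k] == 0.0) return -1;

        /* Swap only columns k..n, leaving the L part unpivoted. */
        if (piv != k)
            for (int j = k; j < ld; j++) {
                double t = ab[k * ld + j];
                ab[k * ld + j]   = ab[piv * ld + j];
                ab[piv * ld + j] = t;
            }

        /* Eliminate below the pivot; column n (b) is updated as well. */
        for (int i = k + 1; i < n; i++) {
            double l = ab[i * ld + k] / ab[k * ld + k];
            for (int j = k + 1; j < ld; j++)
                ab[i * ld + j] -= l * ab[k * ld + j];
            ab[i * ld + k] = l;   /* multiplier of L, stored in place */
        }
    }

    /* Back substitution on the upper triangular system U x = y. */
    for (int i = n - 1; i >= 0; i--) {
        double s = ab[i * ld + n];
        for (int j = i + 1; j < n; j++) s -= ab[i * ld + j] * x[j];
        x[i] = s / ab[i * ld + i];
    }
    return 0;
}
```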

- - - - - - -
-The data is distributed onto a two-dimensional P-by-Q grid of processes -according to the block-cyclic scheme to ensure "good" load balance -as well as the scalability of the algorithm. The n-by-n+1 coefficient -matrix is first logically partitioned into nb-by-nb blocks, that are -cyclically "dealt" onto the P-by-Q process grid. This is done in both -dimensions of the matrix.
- - - - - -
-The right-looking variant has been chosen for the main loop of the LU -factorization. This means that at each iteration of the loop a panel of -nb columns is factorized, and the trailing submatrix is updated. Note -that this computation is thus logically partitioned with the same block -size nb that was used for the data distribution.
-
- -
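The block-cyclic distribution described above can be made concrete along one dimension with the index arithmetic below. The sketch assumes 0-based indices and that the first block lives on process 0. The helper names are hypothetical. In two dimensions the same mapping would be applied independently to rows (with P processes) and to columns (with Q processes).

```c
#include <assert.h>

/* Process coordinate owning global index g, for block size nb. */
static int sk_owner(int g, int nb, int nprocs)
{
    assert(g >= 0 && nb > 0 && nprocs > 0);
    return (g / nb) % nprocs;
}

/* Local index of global index g on its owning process. */
static int sk_local(int g, int nb, int nprocs)
{
    assert(g >= 0 && nb > 0 && nprocs > 0);
    return (g / (nb * nprocs)) * nb + g % nb;
}

/* Number of the n global indices that land on process p. */
static int sk_numroc(int n, int nb, int p, int nprocs)
{
    assert(n >= 0 && nb > 0 && p >= 0 && p < nprocs);
    int nblocks = n / nb;                   /* number of full blocks     */
    int count   = (nblocks / nprocs) * nb;  /* complete rounds of dealing */
    int extra   = nblocks % nprocs;         /* leftover full blocks      */
    if (p < extra)       count += nb;       /* one more full block       */
    else if (p == extra) count += n % nb;   /* the trailing partial block */
    return count;
}
```

For example, with n = 10, nb = 2 and 3 processes, the blocks are dealt 0, 1, 2, 0, 1, so processes 0 and 1 hold four entries each and process 2 holds two.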

Panel Factorization

- - - - - - -
-At a given iteration of the main loop, and because of the Cartesian -property of the distribution scheme, each panel factorization occurs in -one column of processes. This particular part of the computation lies -on the critical path of the overall algorithm. The user is offered the -choice of three (Crout, left- and right-looking) matrix-multiply based -recursive variants. The software also allows the user to choose how -many sub-panels the current panel should be divided into during the -recursion. Furthermore, one can also select at run-time the recursion -stopping criterion in terms of the number of columns left to factorize. -When this threshold is reached, the sub-panel will then be factorized -using one of the three Crout, left- or right-looking matrix-vector based -variants. Finally, for each panel column the pivot search, the associated -swap and broadcast operation of the pivot row are combined into one -single communication step. A binary-exchange (leave-on-all) reduction -performs these three operations at once.
-
- -

Panel Broadcast

- -Once the panel factorization has been computed, this panel of columns -is broadcast to the other process columns. There are many possible -broadcast algorithms and the software currently offers 6 variants to -choose from. These variants are described below assuming that process 0 -is the source of the broadcast for convenience. "->" means "sends to". -
    -
  • Increasing-ring: 0 -> 1; 1 -> 2; 2 -> 3 and so on. -This algorithm is the classic one; it has the caveat that process 1 has -to send a message. -
    - -
    - -
  • Increasing-ring (modified): 0 -> 1; 0 -> 2; 2 -> 3 -and so on. Process 0 sends two messages and process 1 only receives one -message. This algorithm is almost always better, if not the best. -
    - -
    - -
  • Increasing-2-ring: The Q processes are divided into -two parts: 0 -> 1 and 0 -> Q/2; Then processes 1 and Q/2 act as sources -of two rings: 1 -> 2, Q/2 -> Q/2+1; 2 -> 3, Q/2+1 -> Q/2+2 and so on. -This algorithm has the advantage of reducing the time by which the last -process will receive the panel at the cost of process 0 sending 2 -messages. -
    - -
    - -
  • Increasing-2-ring (modified): As one may expect, -first 0 -> 1, then the Q-1 processes left are divided into two equal -parts: 0 -> 2 and 0 -> Q/2; Processes 2 and Q/2 act then as sources of -two rings: 2 -> 3, Q/2 -> Q/2+1; 3 -> 4, Q/2+1 -> Q/2+2 and so on. -This algorithm is probably the most serious competitor to the increasing -ring modified variant. -
    - -
    - -
  • Long (bandwidth reducing): as opposed to the -previous variants, this algorithm and its follower synchronize all -processes involved in the operation. The message is chopped into Q equal -pieces that are scattered across the Q processes. -
    - -
    -The pieces are then rolled in Q-1 steps. The scatter phase uses a binary -tree and the rolling phase exclusively uses mutual message exchanges. In -odd steps 0 <-> 1, 2 <-> 3, 4 <-> 5 and so on; in even steps Q-1 <-> 0, -1 <-> 2, 3 <-> 4, 5 <-> 6 and so on. -
    - -
    -More messages are exchanged, however the total volume of communication is -independent of Q, making this algorithm particularly suitable for large -messages. This algorithm becomes competitive when the nodes are "very -fast" and the network (comparatively) "very slow".

    - -
  • Long (bandwidth reducing modified): same as above, -except that 0 -> 1 first, and then the Long variant is used on processes -0,2,3,4 .. Q-1.

    -
    - - -
    - -
- -The ring variants are distinguished by a probe mechanism that activates -them. In other words, a process involved in the broadcast and different -from the source asynchronously probes for the message to receive. When -the message is available the broadcast proceeds, and otherwise the -function returns. This allows the broadcast operation to be interleaved -with the update phase, which helps reduce the idle time spent by those -processes waiting for the factorized panel. This mechanism is necessary -to accommodate various computation/communication performance ratios.
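The two single-ring topologies listed above differ only in who delivers the panel to processes 1 and 2. The sketch below encodes that as a "who sends to me" function, with ranks taken relative to the source (rank 0). The function names are hypothetical and no actual communication is performed. `sk_chain_length` counts the forwarding steps between the source and a given rank, which is a rough proxy for latency rather than a timing model.

```c
#include <assert.h>

/* Increasing-ring: rank r receives from r-1; the source (0) gets -1. */
static int sk_sender_ring(int rank, int q)
{
    assert(q > 0 && rank >= 0 && rank < q);
    return rank - 1;
}

/* Increasing-ring (modified): 0 -> 1 and 0 -> 2, then 2 -> 3, 3 -> 4, ...
 * Process 1 only receives and never forwards. */
static int sk_sender_ring_mod(int rank, int q)
{
    assert(q > 0 && rank >= 0 && rank < q);
    if (rank == 0) return -1;
    return (rank <= 2) ? 0 : rank - 1;
}

/* Number of forwarding steps between the source and `rank`. */
static int sk_chain_length(int rank, int q, int modified)
{
    int steps = 0;
    while (rank > 0) {
        rank = modified ? sk_sender_ring_mod(rank, q)
                        : sk_sender_ring(rank, q);
        steps++;
    }
    return steps;
}
```

In the modified variant process 1 is a leaf of the topology, so the process that will factorize the next panel does not have to forward the current one.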

-
- -

Look-ahead

- -Once the panel has been broadcast or, say, during this broadcast operation, -the trailing submatrix is updated using the last panel in the look-ahead -pipe: as mentioned before, the panel factorization lies on the critical -path, which means that when the kth panel has been factorized and then -broadcast, the next most urgent task to complete is the factorization and -broadcast of the (k+1)th panel. This technique is often referred to as -"look-ahead" or "send-ahead" in the literature. This package allows the -user to select various look-ahead "depths". By convention, a depth of zero -corresponds to no look-ahead, in which case the trailing submatrix is -updated by the panel currently broadcast. Look-ahead consumes some extra -memory to essentially keep all the panels of columns currently in the -look-ahead pipe. A look-ahead of depth 1 (maybe 2) is likely to achieve -the best performance gain.

-
- -

Update

- -The update of the trailing submatrix by the last panel in the look-ahead -pipe is made of two phases. First, the pivots must be applied to form the -current row panel U. U should then be solved by the upper triangle of the -column panel. U finally needs to be broadcast to each process row so that -the local rank-nb update can take place. We choose to combine the -swapping and broadcast of U at the cost of replicating the solve. Two -algorithms are available for this communication operation. -
    -
  • Binary-exchange: this is a modified variant of the -binary-exchange (leave on all) reduction operation. Every process column -performs the same operation. The algorithm essentially works as follows. -It pretends to reduce the row panel U, but at the beginning the only valid -copy is owned by the current process row. The other process rows will -contribute rows of A they own that should be copied in U and replace them -with rows that were originally in the current process row. The complete -operation is performed in log(P) steps. For the sake of simplicity, let -us assume that P is a power of two. At step k, process row p exchanges a -message with process row p+2^k. There are essentially two cases. First, -one of those two process rows has received U in a previous step. The -exchange occurs. One process swaps its local rows of A into U. Both -processes copy in U remote rows of A. Second, neither of those process -rows has received U; the exchange occurs, and both processes simply add -those remote rows to the list they have accumulated so far. At each step, a -message of the size of U is exchanged by at least one pair of process -rows.

    - -
  • Long: this is a bandwidth reducing variant -accomplishing the same task. The row panel is first spread (using a tree) -among the process rows with respect to the pivot array. This is a scatter -(V variant for MPI users). Locally, every process row then swaps these -rows with the rows of A it owns and that belong to U. These buffers -are then rolled (P-1 steps) to finish the broadcast of U. Every process -row permutes U and proceeds with the computational part of the update. A -couple of notes: process rows are logarithmically sorted before -spreading, so that processes receiving the largest number of rows are -first in the tree. This makes the communication volume optimal for this -phase. Finally, before rolling and after the local swap, an equilibration -phase occurs during which the local pieces of U are uniformly spread -across the process rows. A tree-based algorithm is used. This operation -is necessary to keep the rolling phase optimal even when the pivot rows -are not equally distributed in process rows. This algorithm has a -complexity in terms of communication volume that solely depends on the -size of U. In particular, the number of process rows only impacts the -number of messages exchanged. It will thus outperform the previous -variant for large problems on large machine configurations.

    - -
- -The user can select any of the two variants above. In addition, a mix is -possible as well. The "binary-exchange" algorithm will be used when U -contains at most a certain number of columns. Choosing at least the block -size nb as the threshold value is clearly recommended when look-ahead is -on.

-
- -

Backward Substitution

- -Once the factorization has ended, the back-substitution remains to be -done. For this, we choose a variant with a look-ahead of depth one. The -right-hand-side is forwarded in process rows in a decreasing-ring -fashion, so that we solve Q * nb entries at a time. At each step, this -shrinking piece of the right-hand-side is updated. The process just above -the one owning the current diagonal block of the matrix A updates first -its last nb piece of x, forwards it to the previous process column, then -broadcasts it in the process column in a decreasing-ring fashion as well. -The solution is then updated and sent to the previous process column. The -solution of the linear system is left replicated in every process row.

-
- -

Checking the Solution

- -To verify the result obtained, the input matrix and right-hand side are -regenerated. The normwise backward error (see formula below) is then -computed. A solution is considered as "numerically correct" when this -quantity is less than a threshold value of the order of 1.0. In the -expression below, eps is the relative (distributed-memory) machine -precision. - -
    -
  • || Ax - b ||_oo / ( eps * ( || A ||_oo * || x ||_oo + || b ||_oo ) * n ) -
- -
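The check above can be written out for a small dense system held on one process. This is a sketch under stated assumptions: the matrix is stored row-major, the ISO C constant `DBL_EPSILON` stands in for the machine precision eps (HPL determines its own value), and all function names are hypothetical.

```c
#include <assert.h>
#include <float.h>

static double sk_absval(double v) { return v < 0.0 ? -v : v; }

/* Infinity norm of a vector: largest absolute entry. */
static double sk_vec_norm_inf(int n, const double *v)
{
    double m = 0.0;
    for (int i = 0; i < n; i++)
        if (sk_absval(v[i]) > m) m = sk_absval(v[i]);
    return m;
}

/* Infinity norm of a row-major n-by-n matrix: largest absolute row sum. */
static double sk_mat_norm_inf(int n, const double *a)
{
    double m = 0.0;
    for (int i = 0; i < n; i++) {
        double s = 0.0;
        for (int j = 0; j < n; j++) s += sk_absval(a[i * n + j]);
        if (s > m) m = s;
    }
    return m;
}

/* || Ax - b ||_oo / ( eps * ( || A ||_oo * || x ||_oo + || b ||_oo ) * n ) */
static double sk_scaled_residual(int n, const double *a,
                                 const double *x, const double *b)
{
    double rnorm = 0.0;
    for (int i = 0; i < n; i++) {
        double r = -b[i];
        for (int j = 0; j < n; j++) r += a[i * n + j] * x[j];
        if (sk_absval(r) > rnorm) rnorm = sk_absval(r);
    }
    double denom = DBL_EPSILON
                 * (sk_mat_norm_inf(n, a) * sk_vec_norm_inf(n, x)
                    + sk_vec_norm_inf(n, b))
                 * (double)n;
    assert(denom > 0.0);
    return rnorm / denom;
}
```

A computed solution would then be accepted when the returned value falls below the chosen threshold.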
-
- [Home] - [Copyright and Licensing Terms] - [Algorithm] - [Scalability] - [Performance Results] - [Documentation] - [Software] - [FAQs] - [Tuning] - [Errata-Bugs] - [References] - [Related Links]
-
-
- - diff --git a/hpl/www/aprunner.gif b/hpl/www/aprunner.gif deleted file mode 100755 index 6508c806fa3f4f1de47b57225044c130cee1f765..0000000000000000000000000000000000000000 Binary files a/hpl/www/aprunner.gif and /dev/null differ diff --git a/hpl/www/copyright.html b/hpl/www/copyright.html deleted file mode 100755 index 934282c814fb203c635cbf6746f2e408a6a0e314..0000000000000000000000000000000000000000 --- a/hpl/www/copyright.html +++ /dev/null @@ -1,66 +0,0 @@ - - -HPL Copyright and Licensing Terms - - - - -

HPL Copyright Notice and Licensing Terms

- -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions -are met: -
    -
  1. Redistributions of source code must retain the above copyright -notice, this list of conditions and the following disclaimer. -
  2. Redistributions in binary form must reproduce the above copyright -notice, this list of conditions, and the following disclaimer in the -documentation and/or other materials provided with the distribution. -
  3. All advertising materials mentioning features or use of this -software must display the following acknowledgement: This product -includes software developed at the University of Tennessee, -Knoxville, Innovative Computing Laboratory. -
  4. The name of the University, the name of the Laboratory, or the -names of its contributors may not be used to endorse or promote -products derived from this software without specific written -permission. -
- -

Disclaimer

- -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS -`AS IS' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR -A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY -OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, -SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT -LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, -DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY -THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT -(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

- -
-
- [Home] - [Copyright and Licensing Terms] - [Algorithm] - [Scalability] - [Performance Results] - [Documentation] - [Software] - [FAQs] - [Tuning] - [Errata-Bugs] - [References] - [Related Links]
-
-
- - diff --git a/hpl/www/documentation.html b/hpl/www/documentation.html deleted file mode 100755 index 152188041bf247239411eb7374243773123ba64a..0000000000000000000000000000000000000000 --- a/hpl/www/documentation.html +++ /dev/null @@ -1,304 +0,0 @@ - - -HPL Documentation - - - - -

HPL Documentation

- -The HPL software distribution comes with a set of text files explaining -how to install, run and tune the software. These files reside in the top -level directory and their names are in upper case. To a large extent, -this page reproduces them. In addition, man- and HTML-pages are provided -for every routine in the package. To access the man pages, one must add -hpl/man to one's MANPATH environment variable. The HTML pages can be -accessed on this site, or by pointing your browser to your local hpl/www -directory. Finally, the source code has been heavily documented. Despite -all the other documentation efforts, the source code remains the most -trustworthy and truthful piece of information about what goes on in HPL. -

- -

HPL Functions HTML Pages

- -Computational Kernels Wrappers When calling the Fortran -77 BLAS interface, these C functions allow to confine the C to Fortran -77 interface issues to a small subset of routines. - - - -
-
- -Local Auxiliaries Basic functionality, local swap functions. - - - -
-
- -Parallel Auxiliaries Index computations, parallel basic -functionality. - - - -
-
- -Grid Management Most of these routines have a direct -MPI equivalent. On new systems, when the entire MPI functionality is -not yet readily available, these functions are particularly convenient -since they rely on a minimal subset of the MPI standard. - - -
-
- -Panel Management - - -
-
- -Panel Factorization Recursive (matrix-multiply based) and -(matrix-vector based) panel factorization. - - -
-
- -Panel Broadcast - - -
-
- -Update - - -
-
- -Main Factorization / Look-ahead - - -
-
- -Backward Substitution - - -
-
- -Matrix generation A C version of the ScaLAPACK random -matrix generator with less functionality though. - -
-
- -Timers Sequential and parallel timing utilities. - -
-
- -Main Testing / Timing Driver - - -
- -
-
- [Home] - [Copyright and Licensing Terms] - [Algorithm] - [Scalability] - [Performance Results] - [Documentation] - [Software] - [FAQs] - [Tuning] - [Errata-Bugs] - [References] - [Related Links]
-
-
- - diff --git a/hpl/www/errata.html b/hpl/www/errata.html deleted file mode 100755 index 24275d2dd75d498f935957a010149c3f8f034147..0000000000000000000000000000000000000000 --- a/hpl/www/errata.html +++ /dev/null @@ -1,116 +0,0 @@ - - -HPL Errata-Bugs - - - - -

HPL Errata - Bugs

- -

Issues fixed in Version 2.1, October 26th, 2012

- -The output now reports exact time stamps before and after the -execution of the solver function pdgesv(). This -allows accurate accounting of running time for data center -management purposes, for example when reporting power -consumption. This is important for the Green500 project.

- -Fixed an out-of-bounds access to arrays in the HPL_spreadN() -and HPL_spreadT() functions. This may cause segmentation -fault signals. It was reported by Stephen Whalen from Cray.

- -

Issues fixed in Version 2.0, September 10th, 2008

- -Gregory Bauer found a problem size corresponding to the -periodicity of the pseudo-random matrix generator used in the -HPL timing program. This causes the LU factorization to -detect the singularity of the input matrix as it should have.

- -A problem size of 2^17 = 131072 causes columns 14 modulo 2^14 -(i.e. 16384) (starting from 0) to be bitwise identical on a -homogeneous platform. Every problem size that is a power of 2 -and larger than 2^15 will feature a similar problem if one -searches far enough in the columns of the square input matrix.

- -The pseudo-random generator uses the linear congruential -algorithm: X(n+1) = (a * X(n) + c) mod m as described in the -Art of Computer Programming, Knuth 1973, Vol. 2. In the HPL -case, m is set to 2^31.
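
The effect can be reproduced on a toy scale. The sketch below uses the recurrence quoted above with deliberately tiny, made-up parameters (a = 5, c = 1, m = 2^8, seed 7); these are not the constants used by HPL. It assumes the matrix is filled column by column from one continuous stream of generator outputs. Because the generator repeats after m values, a column of length N = 2^6 repeats every m / N = 4 columns.

```python
def lcg_sequence(a, c, m, seed, count):
    """Return `count` successive values of X(n+1) = (a * X(n) + c) mod m."""
    values = []
    x = seed
    for _ in range(count):
        values.append(x)
        x = (a * x + c) % m
    return values


def column(values, n, j):
    """Column j of an n-by-n matrix filled column by column from `values`."""
    return values[j * n:(j + 1) * n]


M = 2 ** 8   # toy modulus (illustrative only)
N = 2 ** 6   # toy problem size, a power of two
stream = lcg_sequence(a=5, c=1, m=M, seed=7, count=N * N)

col0 = column(stream, N, 0)
col1 = column(stream, N, 1)
col4 = column(stream, N, 4)
```

Here `col0` and `col4` are identical, so the toy matrix is singular, while `col0` and `col1` differ. A problem size that is not a power of two does not line up with the period in this way.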

- -It is very important to realize that this issue is a problem -of the testing part of the HPL software. The numerical -properties of the algorithms used in the factorization and -the solve should not be questioned because of this. In fact, -it is just the opposite: the factorization demonstrated the -weakness of the testing part of the software by detecting the -singularity of the input matrix.

- -This issue of the testing program is not easy to fix. This -pseudo-random generator has very useful properties despite -this. It is thus currently recommended that HPL users willing -to test matrices of size larger than 2^15 not use powers -of two.

- -This issue has been fixed by changing the pseudo-random -matrix generator. Now the periodicity of the generator is -2^64.

- -

Issues fixed in Version 1.0b, December 15th, 2004

- -When the matrix size is such that one needs more than 16 GB -per MPI rank, the intermediate calculation (mat.ld+1) * -mat.nq in HPL_pdtest.c ends up overflowing because it is -done using 32-bit arithmetic. This issue has been fixed by -typecasting to size_t; Thanks to John Baron.
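
The arithmetic behind this overflow can be checked with a short sketch. Python integers do not overflow, so the helper `wrap_int32` (a hypothetical name) emulates two's-complement 32-bit wraparound. The values of `ld` and `nq` are made up so that the product just exceeds 2^31 - 1 elements, which at 8 bytes per element is about 16 GB.

```python
def wrap_int32(value):
    """Emulate the result of storing `value` in a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


INT32_MAX = 2 ** 31 - 1

ld = 46341   # illustrative local leading dimension
nq = 46341   # illustrative local number of columns

true_count = (ld + 1) * nq            # what a wide (size_t-like) type holds
wrapped_count = wrap_int32(true_count)  # what 32-bit arithmetic would give
bytes_needed = true_count * 8         # double precision elements
```

The wrapped value is negative, so any allocation or loop bound derived from it is wrong, whereas the wide computation gives the intended element count.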

- -

Issues fixed in Version 1.0a, January 20th, 2004

- -The MPI process grid numbering scheme defaults now to row- -major ordering. This option can now be selected at run time.

- -The inlined assembly timer routine that was causing the -compilation to fail when using gcc version 3.3 and above has -been removed from the package.

- -Various building problems on the T3E have been fixed; Thanks -to Edward Anderson.

- -

Issues fixed in Version 1.0, September 27th, 2000

- -Due to a couple of errors spotted in the VSIPL port of the -software, the distribution contained in the tar file of -September 9th, 2000 was replaced on September 27th, 2000 -with a corrected distribution. These problems did -not affect the BLAS version of the -software in any way. If you are using the VSIPL port of HPL, -and want to make sure you are indeed using the latest -corrected version, please check the date contained in the -file HPL.build.log in the main directory.

- - - - -
-
- [Home] - [Copyright and Licensing Terms] - [Algorithm] - [Scalability] - [Performance Results] - [Documentation] - [Software] - [FAQs] - [Tuning] - [Errata-Bugs] - [References] - [Related Links]
-
-
- - diff --git a/hpl/www/faqs.html b/hpl/www/faqs.html deleted file mode 100755 index ad853e7607e61f758aef834d9767490bf12206e1..0000000000000000000000000000000000000000 --- a/hpl/www/faqs.html +++ /dev/null @@ -1,126 +0,0 @@ - - -HPL Frequently Asked Questions - - - - -

HPL Frequently Asked Questions

- - -
- -

What problem size N should I run ?

- -In order to find out the best performance of your system, the -largest problem size fitting in memory is what you should aim for. -The amount of memory used by HPL is essentially the size of the -coefficient matrix. So for example, if you have 4 nodes with 256 Mb -of memory on each, this corresponds to 1 Gb total, i.e., 128 M (2^27, about 134 million) double -precision (8 bytes) elements. The square root of that number is -11585. One definitely needs to leave some memory for the OS as well -as for other things, so a problem size of 10000 is likely to fit. As -a rule of thumb, 80 % of the total amount of memory is a good guess. -If the problem size you pick is too large, swapping will occur, and -the performance will drop. If multiple processes are spawned on each -node (say you have 2 processors per node), what counts is the -amount of memory available to each process.
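
The rule of thumb above can be written as a small helper. This is only a sketch: the function name and its parameters are hypothetical, memory is counted in bytes with 1 Mb taken as 2^20 bytes, and the optional rounding of N down to a multiple of the block size is an extra convenience assumed here rather than something the text prescribes.

```python
import math


def estimate_problem_size(total_bytes, fraction=0.8, nb=None):
    """Largest N such that an N x N double precision matrix fits in
    `fraction` of `total_bytes`; optionally rounded down to a multiple of nb."""
    elements = int(total_bytes * fraction) // 8   # 8 bytes per double
    n = math.isqrt(elements)
    if nb:
        n -= n % nb
    return n


total = 4 * 256 * 2 ** 20   # 4 nodes with 256 Mb each

n_all_memory = estimate_problem_size(total, fraction=1.0)
n_rule_of_thumb = estimate_problem_size(total)
n_rounded = estimate_problem_size(total, nb=128)
```

Using all of the memory reproduces the 11585 figure from the text; the 80 % rule gives a value slightly above 10000, consistent with the suggestion that a problem size of 10000 is likely to fit.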

-
- -

What block size NB should I use ?

- -HPL uses the block size NB for the data distribution as well as for -the computational granularity. From a data distribution point of -view, the smaller NB, the better the load balance. You definitely -want to stay away from very large values of NB. From a computation -point of view, a value of NB that is too small may limit the computational -performance by a large factor because almost no data reuse will occur -in the highest level of the memory hierarchy. The number of messages -will also increase. Efficient matrix-multiply routines are often -internally blocked. Small multiples of this blocking factor are -likely to be good block sizes for HPL. The bottom line is that "good" -block sizes are almost always in the [32 .. 256] interval. The best -values depend on the computation / communication performance ratio of -your system. To a much lesser extent, the problem size matters as well. -Say for example, you empirically found that 44 was a good block size -with respect to performance. 88 or 132 are likely to give slightly -better results for large problem sizes because of a slightly higher -flop rate.

-
- -

What process grid ratio P x Q should I use ?

- -This depends on the physical interconnection network you have. -Assuming a mesh or a switch HPL "likes" a 1:k ratio with k in [1..3]. -In other words, P and Q should be approximately equal, with Q -slightly larger than P. Examples: 2 x 2, 2 x 4, 2 x 5, 3 x 4, 4 x 4, -4 x 6, 5 x 6, 4 x 8 ... If you are running on a simple Ethernet -network, there is only one wire through which all the messages are -exchanged. On such a network, the performance and scalability of HPL -are strongly limited and very flat process grids are likely to be the -best choices: 1 x 4, 1 x 8, 2 x 4 ...
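
For the mesh or switch case, the 1:k guideline can be turned into a quick enumeration of candidate grids for a given process count. The helper below is a hypothetical sketch: it simply lists the factorizations P x Q with P <= Q and Q <= max_ratio * P, and says nothing about which candidate is fastest on a particular network.

```python
import math


def candidate_grids(nprocs, max_ratio=3):
    """List (P, Q) with P * Q == nprocs, P <= Q and Q <= max_ratio * P."""
    grids = []
    for p in range(1, math.isqrt(nprocs) + 1):
        if nprocs % p == 0:
            q = nprocs // p
            if q <= max_ratio * p:
                grids.append((p, q))
    return grids


grids_8 = candidate_grids(8)
grids_12 = candidate_grids(12)
grids_16 = candidate_grids(16)
```

For 8, 12 and 16 processes this yields 2 x 4, then 2 x 6 and 3 x 4, then 4 x 4. The flat grids recommended for a simple Ethernet network (1 x 4, 1 x 8) are deliberately outside this filter and would have to be tried separately.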

-
- -

What about the one processor case ?

- -HPL has been designed to perform well for large problem sizes on -hundreds of nodes and more. The software works on one node and for -large problem sizes, one can usually achieve pretty good performance -on a single processor as well. For small problem sizes however, the -overhead due to message-passing, local indexing and so on can be -significant.

-
- -

Why so many options in HPL.dat ?

- -There are quite a few reasons. First off, these options are useful to -determine what matters and what does not on your system. Second, HPL -is often used in the context of early evaluation of new systems. In -such a case, not everything is usually working quite right, and it is -convenient to be able to vary these parameters without recompiling. -Finally, every system has its own peculiarities and one is likely to -be willing to empirically determine the best set of parameters. In -any case, one can always follow the advice provided in the -tuning section of this document and not -worry about the complexity of the input file.

-
- -

Can HPL be Outperformed ?

- -Certainly. There is always room for performance improvements. -Specific knowledge about a particular system is always a source of -performance gains. Even from a generic point of view, better -algorithms or more efficient formulations of the classic ones are -potential winners.

- -
-
- [Home] - [Copyright and Licensing Terms] - [Algorithm] - [Scalability] - [Performance Results] - [Documentation] - [Software] - [FAQs] - [Tuning] - [Errata-Bugs] - [References] - [Related Links]
-
-
- - diff --git a/hpl/www/index.html b/hpl/www/index.html deleted file mode 100755 index 71b422464a3ba336e0e80d60552bb1b9cd35221c..0000000000000000000000000000000000000000 --- a/hpl/www/index.html +++ /dev/null @@ -1,128 +0,0 @@ - - - -HPL - A Portable Implementation of the High-Performance -Linpack Benchmark for Distributed-Memory Computers - - - - - -
- - - - - -
-

HPL - A Portable Implementation of the High-Performance Linpack -Benchmark for Distributed-Memory Computers

-
- - -
- - - - - - - -
Version 2.1 -A. Petitet, -R. C. Whaley, -J. Dongarra, -A. Cleary -October 26, 2012 -# Accesses -
-

- -HPL is a software package that solves a (random) -dense linear system in double precision (64 bits) arithmetic -on distributed-memory computers. It can thus be regarded as -a portable as well as freely available implementation of the High -Performance Computing Linpack Benchmark.

- -The algorithm used by HPL can be summarized by the -following keywords: Two-dimensional block-cyclic data distribution -- Right-looking variant of the LU factorization with row partial -pivoting featuring multiple look-ahead depths - Recursive panel -factorization with pivot search and column broadcast combined - -Various virtual panel broadcast topologies - bandwidth reducing -swap-broadcast algorithm - backward substitution with look-ahead -of depth 1.

- -The HPL package provides a testing and timing program to quantify -the accuracy of the obtained solution as well as -the time it took to compute it. The best performance -achievable by this software on your system depends on a large variety -of factors. Nonetheless, with some restrictive assumptions on the -interconnection network, the algorithm described here and its -attached implementation are scalable in the sense -that their parallel efficiency is maintained constant with respect -to the per processor memory usage.

- -The HPL software package requires the availability -on your system of an implementation of the Message Passing Interface -MPI (1.1 compliant). -An implementation of either the Basic Linear Algebra -Subprograms BLAS or the Vector Signal Image -Processing Library VSIPL is also needed. -Machine-specific as well as generic implementations of -MPI, the -BLAS and -VSIPL are available for a large -variety of systems.

- -Acknowledgements: This work was supported in part -by a grant from the Department of Energy's Lawrence -Livermore National Laboratory and Los Alamos National Laboratory -as part of the ASCI Projects contract numbers B503962 and -12187-001-00 4R. - -
-
- [Home] - [Copyright and Licensing Terms] - [Algorithm] - [Scalability] - [Performance Results] - [Documentation] - [Software] - [FAQs] - [Tuning] - [Errata-Bugs] - [References] - [Related Links]
-
-
- -
-Innovative Computing Laboratory
-last revised October 26, 2012
-
- -
-#########################################################################
-
-file    hpl-2.1.tar.gz
-for     HPL - A Portable Implementation of the High-Performance Linpack
-,       Benchmark for Distributed-Memory Computers 
-by      Antoine Petitet, Clint Whaley, Jack Dongarra, Andy Cleary, Piotr
-        Luszczek
-
-#########################################################################
-
- - diff --git a/hpl/www/links.html b/hpl/www/links.html deleted file mode 100755 index da2639e99e185f7eb3f10620123f85dadd1ed008..0000000000000000000000000000000000000000 --- a/hpl/www/links.html +++ /dev/null @@ -1,89 +0,0 @@ - - -HPL Related Links - - - - -

HPL Related Links

- -The list of links below contains some relevant material to this -work. This list is provided for illustrative purposes, and should be -regarded as an initial starting point for the interested reader. This -list is by no means meant to be exhaustive.

- -

Message Passing Interface (MPI)

- -MPI is a library specification for message-passing, proposed as a -standard by a broadly based committee of vendors, implementors, and -users. Machine-specific (optimized) as well as freely available MPI -libraries are available for a large variety of systems. Browse the -Message Passing Interface (MPI) -standard web page for more information.

- -

Basic Linear Algebra Subroutines (BLAS)

- -The BLAS are high quality -"building block" routines for performing basic vector and matrix -operations. A lot of "BLAS-related" information can be found at this -site. In particular, a reference implementation is available. This -reference implementation is not optimized for any -system, and it is therefore not recommended to use it -for benchmarking purposes. -However, machine-specific -optimized BLAS libraries are available for a variety of computer -systems. For further details, please contact your local vendor -representative. Alternatively, one may also consider using automatic -code generators such as ATLAS. -This tool automatically generates a complete and optimized BLAS -library for a large variety of modern systems.

- -

Vector Signal Image Processing Library (VSIPL)

- -VSIPL is an API defined by an open -standard comprised of embedded signal and image processing hardware and -software vendors, academia, users, and government labs. A lot of -"VSIPL-related" information can be found at this site. In particular, a -reference implementation is available. Machine-specific optimized VSIPL -libraries are available for a variety of computer systems. For further -details, please contact your local vendor representative.

- -

TOP 500 List

- -The TOP 500 -is an ordered list of the 500 most powerful computer systems worldwide. -Computers are ranked in this list by their performance on the - -LINPACK Benchmark.

- -

Parallel Dense Linear Algebra Software Libraries

- -Browse the Netlib software repository -or the National HPCC Software Exchange -to find a large collection of freely available linear algebra libraries. -

- -
-
- [Home] - [Copyright and Licensing Terms] - [Algorithm] - [Scalability] - [Performance Results] - [Documentation] - [Software] - [FAQs] - [Tuning] - [Errata-Bugs] - [References] - [Related Links]
-
-
- - diff --git a/hpl/www/main.jpg b/hpl/www/main.jpg deleted file mode 100755 index df62edd330c2f55faa41d5f1c6bd45bc2f54a2c4..0000000000000000000000000000000000000000 Binary files a/hpl/www/main.jpg and /dev/null differ diff --git a/hpl/www/mat2.jpg b/hpl/www/mat2.jpg deleted file mode 100755 index 25afdc44c1b9802b3ef222afdabcb52fa6bb3420..0000000000000000000000000000000000000000 Binary files a/hpl/www/mat2.jpg and /dev/null differ diff --git a/hpl/www/pfact.jpg b/hpl/www/pfact.jpg deleted file mode 100755 index 33a7e55cb155411759fe28b27ef0d8377dc71ae4..0000000000000000000000000000000000000000 Binary files a/hpl/www/pfact.jpg and /dev/null differ diff --git a/hpl/www/references.html b/hpl/www/references.html deleted file mode 100755 index 95c6db17656d103019937b77a4a38cab2197ae25..0000000000000000000000000000000000000000 --- a/hpl/www/references.html +++ /dev/null @@ -1,276 +0,0 @@ - - -HPL References - - - - -

HPL References

- - -The list of references below contains some relevant published material -to this work. This list is provided for illustrative purposes, and -should be regarded as an initial starting point for the interested -reader. This list is by no means meant to be exhaustive. -

- -The references have been sorted in four categories and chronologically -listed within each category. The four categories are - -
- -

Linpack Benchmark

- -
    - - -
  • LINPACK Users Guide, J. Dongarra, J. Bunch, C. Moler and -G. W. Stewart, SIAM, Philadelphia, PA, 1979. - - -
  • Performance of Various Computers Using Standard Linear Equations -Software, J. Dongarra, Technical Report CS-89-85, University of -Tennessee, 1989. (An updated version of this report can be found at - -http://www.netlib.org/benchmark/performance.ps). - - -
  • Towards Peak Parallel LINPACK Performance on 400, -R. Bisseling and L. Loyens, Supercomputer, Vol. 45, pp. 20-27, 1991. - -
  • Massively Parallel LINPACK Benchmark on the Intel Touchstone -DELTA and iPSC/860 Systems, R. van de Geijn, 1991 Annual Users -Conference Proceedings. Intel Supercomputer Users Group, Dallas, TX, -1991. - -
  • The LINPACK Benchmark on the AP 1000, R. Brent, Frontiers, -1992, pp. 128-135, McLean, VA, 1992. - - -
  • Implementation of BLAS Level 3 and LINPACK Benchmark on the -AP1000, R. Brent and P. Strazdins, Fujitsu Scientific and Technical -Journal, Vol. 5, No. 1, pp. 61-70, 1993. - - -
  • LU Factorization and the LINPACK Benchmark on the Intel -Paragon, D. Womble, D. Greenberg, D. Wheat and S. Riesen, Sandia -Technical Report, 1994. - - -
  • Massively Parallel Distributed Computing: Worlds First 281 -Gigaflop Supercomputer, J. Bolen, A. Davis, B. Dazey, S. Gupta, -G. Henry, D. Robboy, G. Schiffler, D. Scott, M. Stallcup, A. Taraghi, -S. Wheat from Intel SSD, L. Fisk, G. Istrail, C. Jong, R. Riesen, -L. Shuler, from Sandia National Laboratories, Proceedings of the Intel -Supercomputer Users Group 1995. - - -
  • High Performance Software on Intel Pentium Pro Processors or -Micro-Ops to TeraFLOPS, B. Greer and G. Henry, Proceedings of the -SuperComputing 1997 Conference, ACM SIGARCH - IEEE Computer Society -Press - ISBN: 0-89791-985-8, San Jose, CA, 1997. - -
- -
- -

Parallel LU Factorization

- -
    - - -
  • Communication Complexity of the Gaussian Elimination Algorithm -on Multiprocessors, Y. Saad, Linear Algebra and Its Applications, -Vol. 77, pp. 315-340, 1986. - - -
  • LU Factorization Algorithms on Distributed-Memory Multiprocessor -Architectures, G. Geist and C. Romine, SIAM Journal on Scientific -and Statistical Computing, Vol. 9, pp. 639-649, 1988. - - -
  • Parallel LU Decomposition on a Transputer Network, -R. Bisseling and J. van der Vorst, Lecture Notes in Computer Sciences, -Springer-Verlag, Eds. G. van Zee and J. van der Vorst, Vol. 384, -pp. 61-77, 1989. - - -
  • The Distributed Solution of Linear Systems Using the Torus-Wrap -Data Mapping, C. Ashcraft, ECA-TR-147, Boeing Computer Services, -Seattle, WA, 1990. - -
  • Experiments with Multicomputer LU-Decomposition, E. van de -Velde, Concurrency: Practice and Experience, Vol. 2, pp. 1-26, 1990. - - -
  • A Taxonomy of Distributed Dense LU Factorization Methods, -C. Ashcraft, ECA-TR-161, Boeing Computer Services, Seattle, WA, 1991. - - -
  • The Torus-Wrap Mapping for Dense Matrix Calculations on Massively -Parallel Computers, B. Hendrickson and D. Womble, SIAM Journal on -Scientific and Statistical Computing, Vol. 15, pp. 1201-1226, 1994. - -
  • Scalability Issues in the Design of a Library for Dense Linear -Algebra, J. Dongarra, R. van de Geijn and D. Walker, Journal of -Parallel and Distributed Computing, Vol. 22, No. 3, pp. 523-537, 1994. - - -
  • Matrix Factorization using Distributed Panels on the Fujitsu -AP1000, P. Strazdins, Proceedings of the IEEE First International -Conference on Algorithms And Architectures for Parallel Processing -ICA3PP-95, Brisbane, 1995. - - -
  • The Design and Implementation of the ScaLAPACK LU, QR, and -Cholesky Factorization Routines, J. Choi, J. Dongarra, S. Ostrouchov, -A. Petitet, D. Walker and R. C. Whaley, Scientific Programming, Vol. 5, -pp. 173-184, 1996. - -
- -
- -

Recursive LU Factorization

- -
    - - -
  • Locality of Reference in LU Decomposition with partial -pivoting, S. Toledo, SIAM Journal on Matrix. Anal. Appl., Vol. 18, -No. 4, 1997. - -
  • Recursion Leads to Automatic Variable Blocking for Dense -Linear-Algebra Algorithms, F. Gustavson, IBM Journal of Research -and Development, Vol. 41, No. 6, pp. 737-755, 1997 - -
- -
- -

Parallel Matrix Multiply

- -
    - - -
  • Matrix Algorithms on a Hypercube I: Matrix Multiplication, -G. Fox, S. Otto and A. Hey, Parallel Computing, Vol. 3, pp. 17-31, 1987. - - -
  • Basic Matrix Subprograms for Distributed-Memory Systems, -A. Elster, Proceedings of the Fifth Distributed-Memory Computing -Conference, Eds. D. Walker and Q. Stout, IEEE Press, pp. 311-316, 1990. - - -
  • The Parallelization of Level 2 and 3 BLAS Operations on -Distributed-Memory Machines, M. Aboelaze, N. Chrisochoides -and E. Houstis, CSD-TR-91-007, Purdue University, West Lafayette, -IN, 1991. - - -
  • The Multicomputer Toolbox Approach to Concurrent BLAS and LACS, -R. Falgout, A. Skjellum, S. Smith and C. Still, Proceedings of the -Scalable High Performance Computing Conference SHPCC-92, IEEE Computer -Society Press, 1992. - - -
  • A High Performance Matrix Multiplication Algorithm on a -Distributed-Memory Parallel Computer, Using Overlapped Communication, -R. Agarwal, F. Gustavson and M. Zubair, IBM Journal of Research and -Development, Vol. 38, No. 6, pp. 673-681, 1994. - -
  • PUMMA: Parallel Universal Matrix Multiplication Algorithms on -Distributed-Memory Concurrent Computers, J. Choi, J. Dongarra and -D. Walker, Concurrency: Practice and Experience, Vol. 6, No. 7, -pp. 543-570, 1994. - -
  • Matrix Multiplication on the Intel Touchstone DELTA, -S. Huss-Lederman, E. Jacobson, A. Tsao and G. Zhang, Concurrency: -Practice and Experience, Vol. 6, No. 7, pp. 571-594, 1994. - - -
  • A Three-Dimensional Approach to Parallel Matrix Multiplication, -R. Agarwal, S. Balle, F. Gustavson, M. Joshi and P. Palkar, IBM Journal -of Research and Development, Vol. 39, No. 5, pp. 575-582, 1995. - - -
  • A High Performance Parallel Strassen Implementation, -B. Grayson and R. van de Geijn, Parallel Processing Letters, Vol. 6, -No. 1, pp. 3-12, 1996. - - -
  • Parallel Implementation of BLAS: General Techniques for Level -3 BLAS, A. Chtchelkanova, J. Gunnels, G. Morrow, J. Overfelt and -R. van de Geijn, Concurrency: Practice and Experience, Vol. 9, No. 9, -pp. 837-857, 1997. - -
  • A Poly-Algorithm for Parallel Dense Matrix Multiplication on -Two-Dimensional Process Grid Topologies, J. Li, R. Falgout and -A. Skjellum, Concurrency: Practice and Experience, Vol. 9, No. 5, -pp. 345-389, 1997. - -
  • SUMMA: Scalable Universal Matrix Multiplication Algorithm, -R. van de Geijn and J. Watts, Concurrency: Practice and Experience, -Vol. 9, No. 4, pp. 255-274, 1997. - -
- -
- -

Parallel Triangular Solve

- -
    - - -
  • Parallel Solution of Triangular Systems on Distributed-Memory -Multiprocessors, M. Heath and C. Romine, SIAM Journal on Scientific -and Statistical Computing, Vol. 9, pp. 558-588, 1988. - -
  • A Parallel Triangular Solver for a Distributed-Memory -Multiprocessor, G. Li and T. Coleman, SIAM Journal on Scientific -and Statistical Computing, Vol. 9, No. 3, pp. 485-502, 1988. - - -
  • A New Method for Solving Triangular Systems on Distributed-Memory -Message-Passing Multiprocessor, G. Li and T. Coleman, SIAM Journal -on Scientific and Statistical Computing, Vol. 10, No. 2, pp. 382-396, -1989. - - -
  • Parallel Triangular System Solving on a Mesh Network of -Transputers, R. Bisseling and J. van der Vorst, SIAM Journal -on Scientific and Statistical Computing, Vol. 12, pp. 787-799, 1991. - -
- - -
-
- [Home] - [Copyright and Licensing Terms] - [Algorithm] - [Scalability] - [Performance Results] - [Documentation] - [Software] - [FAQs] - [Tuning] - [Errata-Bugs] - [References] - [Related Links]
-
-
- - diff --git a/hpl/www/results.html b/hpl/www/results.html deleted file mode 100755 index 9a7d8b8af74f8b97ba9a45a1e0914a84f1d6ee83..0000000000000000000000000000000000000000 --- a/hpl/www/results.html +++ /dev/null @@ -1,243 +0,0 @@ - - -HPL Results - - - - - - - -
- - -

HPL Performance Results

- - -The performance achieved by this software package on a few machine -configurations is shown below. These results are only provided for -illustrative purposes. By the time you read this, those systems -have changed, they may not even exist anymore and one can surely -not exactly reproduce the state in which these machines were when -those measurements were obtained. To obtain accurate figures -on your system, it is absolutely necessary to -download the software and run it there. - -
-
- - - -
-
- -

4 AMD Athlon K7 500 Mhz (256 Mb) - (2x) 100 Mbs -Switched - 2 NICs per node (channel bonding)

- -
- - - - - - - -
OS Linux 6.2 RedHat (Kernel 2.2.14)
C compiler gcc (egcs-2.91.66 egcs-1.1.2 release)
C flags -fomit-frame-pointer -O3 -funroll-loops
MPI MPIch 1.2.1
BLAS ATLAS (Version 3.0 beta)
Comments 09 / 00

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -Performance (Gflops) w.r.t Problem size on 4 nodes. -
GRID     2000   5000   8000  10000
1 x 4    1.28   1.73   1.89   1.95
2 x 2    1.17   1.68   1.88   1.93
4 x 1    0.81   1.43   1.70   1.80

-

- -
-

8 Duals Intel PIII 550 Mhz (512 Mb) - Myrinet

- -
- - - - - - - - - -
OS Linux 6.1 RedHat (Kernel 2.2.15)
C compiler gcc (egcs-2.91.66 egcs-1.1.2 release)
C flags -fomit-frame-pointer -O3 -funroll-loops
MPI MPI GM (Version 1.2.3)
BLAS ATLAS (Version 3.0 beta)
Comments UTK / ICL - Torc cluster - 09 / 00

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -Performance (Gflops) w.r.t Problem size on 8- and 16-processors grids. -
GRID     2000   5000   8000  10000  15000  20000
2 x 4    1.76   2.32   2.51   2.58   2.72   2.73
4 x 4    2.27   3.94   4.46   4.68   5.00   5.16

-

- -
-

Compaq 64 nodes (4 ev67 667 Mhz processors per node) -AlphaServer SC

- -
- - - - - - - - -
OS Tru64 Version 5
C compiler cc Version 6.1
C flags -arch host -tune host -std -O5
MPI -lmpi -lelan
BLAS CXML
Comments ORNL / NCCS - - falcon - 09 / 00

-

- -In the table below, each row corresponds to a given number of cpus (or -processors) and nodes. The first row for example is denoted by 1 / 1, -i.e., 1 cpu / 1 node. Rmax is given in Gflops, and the value of Nmax -in fact corresponds to 351 Mb per cpu for all machine configurations.

- -
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
CPUS / NODES   GRID      N 1/2    Nmax     Rmax (Gflops)   Parallel Efficiency
1 / 1          1 x 1     150      6625     1.136           1.000
4 / 1          2 x 2     800      13250    4.360           0.960
16 / 4         4 x 4     2300     26500    17.00           0.935
64 / 16        8 x 8     5700     53000    67.50           0.928
256 / 64       16 x 16   14000    106000   263.6           0.906

-

-For Rmax shown in the table, the parallel efficiency per cpu has been -computed using the performance achieved by HPL on 1 cpu. That is fair: -since the CXML matrix multiply routine was achieving at best 1.24 Gflops -for large matrix operands on one cpu, it would have been difficult for a -sequential Linpack benchmark implementation to achieve much more than -1.136 Gflops on this same cpu. For constant load (as in the table 351 Mb -per cpu for Nmax), HPL scales almost linearly as it should. - -
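
The efficiency column can be recomputed from the Rmax figures, assuming, as the paragraph above states, that efficiency is Rmax divided by the number of cpus times the single-cpu Rmax of 1.136 Gflops. The function name and the layout of `rows` are choices made for this sketch only.

```python
def parallel_efficiency(rmax, cpus, rmax_one_cpu):
    """Rmax on `cpus` processors relative to perfect scaling of one cpu."""
    return rmax / (cpus * rmax_one_cpu)


R1 = 1.136  # Gflops achieved by HPL on 1 cpu (first row of the table)

# (cpus, Rmax in Gflops) taken from the table above
rows = [(1, 1.136), (4, 4.360), (16, 17.00), (64, 67.50), (256, 263.6)]

efficiencies = [round(parallel_efficiency(rmax, cpus, R1), 3)
                for cpus, rmax in rows]
```

Rounded to three decimals, the computed values agree with the Parallel Efficiency column of the table.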

-The authors acknowledge the use of the Oak Ridge National Laboratory -Compaq computer, funded by the Department of Energy's Office -of Science and Energy Efficiency programs.

- -
-
- [Home] - [Copyright and Licensing Terms] - [Algorithm] - [Scalability] - [Performance Results] - [Documentation] - [Software] - [FAQs] - [Tuning] - [Errata-Bugs] - [References] - [Related Links]
-
-
- - diff --git a/hpl/www/roll.jpg b/hpl/www/roll.jpg deleted file mode 100755 index 88d2c56afd1ca99f6f3675841877037161379575..0000000000000000000000000000000000000000 Binary files a/hpl/www/roll.jpg and /dev/null differ diff --git a/hpl/www/rollM.jpg b/hpl/www/rollM.jpg deleted file mode 100755 index 0d7f076fd60ef8de229b080d6a41ca7058346ea8..0000000000000000000000000000000000000000 Binary files a/hpl/www/rollM.jpg and /dev/null differ diff --git a/hpl/www/scalability.html b/hpl/www/scalability.html deleted file mode 100755 index 00bb1a27ed8800fbe3a69b6c1af0eb88f053dc63..0000000000000000000000000000000000000000 --- a/hpl/www/scalability.html +++ /dev/null @@ -1,200 +0,0 @@ - - -HPL Scalability Analysis - - - - -

HPL Scalability Analysis

- -The machine model used for the -analysis is first described. This crude model is then used to -estimate the parallel running time of the various phases of the -algorithm, namely the panel factorization and broadcast, the -trailing submatrix update, and the backward substitution. -Finally, the parallel efficiency -of the entire algorithm is estimated according to this machine model. -We show that for a given set of parameters HPL is scalable -not only with respect to the amount of computation, but also with -respect to the communication volume.

-
- -

The Machine Model

- -Distributed-memory computers consist of processors that are connected -using a message passing interconnection network. Each processor has -its own memory called the local memory, which is accessible only to -that processor. As the time to access a remote memory is longer than -the time to access a local one, such computers are often referred to -as Non-Uniform Memory Access (NUMA) machines.

- -The interconnection network of our machine model is static, meaning -that it consists of point-to-point communication links among -processors. This type of network is also referred to as a direct -network as opposed to dynamic networks. The latter are constructed -from switches and communication links. These links are dynamically -connected to one another by the switching elements to establish, at -run time, the paths between processors' memories.

- -The interconnection network of the two-dimensional machine model -considered here is a static, fully connected physical topology. It -is also assumed that processors can be treated equally in terms -of local performance and that the communication rate between two -processors depends on the processors considered.

- -Our model assumes that a processor can send or receive data on only -one of its communication ports at a time (assuming it has more than -one). In the literature, this assumption is also referred to as the -one-port communication model.

- -The time spent to communicate a message between two given processors -is called the communication time Tc. In our machine model, Tc is -approximated by a linear function of the number L of double -precision (64-bit) items communicated. Tc is the sum of the time to -prepare the message for transmission (alpha) and the time (beta * L) -taken by the message of length L to traverse the network to its -destination, i.e.,

-
-Tc = alpha + beta L.

-
- -Finally, the model assumes that the communication links are -bi-directional, that is, the time for two processors to send each -other a message of length L is also Tc. A processor can send and/or -receive a message on only one of its communication links at a time. -In particular, a processor can send a message while receiving another -message from the processor it is sending to at the same time.
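As an illustration of the linear communication model, here is a minimal Python sketch. The numerical values of alpha and beta are hypothetical, chosen only to show that the startup cost alpha penalizes many small messages relative to one large message carrying the same data.

```python
def comm_time(length, alpha, beta):
    """Tc = alpha + beta * L for a message of L double precision items."""
    return alpha + beta * length

# Hypothetical constants: 50 microseconds of startup time per message,
# 10 nanoseconds per double precision item.
alpha, beta = 50e-6, 10e-9

one_big = comm_time(1000, alpha, beta)         # one message of 1000 items
many_small = 10 * comm_time(100, alpha, beta)  # ten messages of 100 items

print(f"one message of 1000 items : {one_big * 1e6:.1f} us")
print(f"ten messages of 100 items : {many_small * 1e6:.1f} us")
```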

- -Since this document is only concerned with regular local dense linear -algebra operations, the time taken to perform one floating point -operation is assumed to be summarized by three constants gam1, -gam2 and gam3. These quantities are flop rate approximations of the -vector-vector, matrix-vector and matrix-matrix operations for each -processor. This very crude approximation summarizes all the steps -performed by a processor to achieve such a computation. Obviously, -such a model neglects all the phenomena occurring in the processor -components, such as cache misses, pipeline startups, memory load or -store, floating point arithmetic and so on, that may influence the -value of these constants as a function of the problem size for -example.

- -Similarly, the model does not make any assumption on the amount of -physical memory per node. It is assumed that if a process has been -spawned on a processor, one has ensured that enough memory was -available on that processor. In other words, swapping will not occur -during the modeled computation.

- - -This machine model is a very crude approximation that is designed -specifically to illustrate the cost of the dominant factors of our -particular case.

-
-
- -

Panel Factorization and Broadcast

- -Let us consider an M-by-N panel distributed over a P-process column. -Because of the recursive formulation of the panel factorization, it -is reasonable to consider that the floating point operations will -be performed at matrix-matrix multiply "speed". For every column in -the panel a binary-exchange is performed on 2*N data items. When this -panel is broadcast, what matters is the time that the next process -column will spend in this communication operation. Assuming one -chooses the increasing-ring (modified) -variant, only one message needs to be taken into account. The -execution time of the panel factorization and broadcast can thus be -approximated by:

-
-Tpfact( M, N ) = (M/P - N/3) N^2 gam3 + N log(P)( alpha + beta 2 N ) + -alpha + beta M N / P.

-
-
- -

Trailing Submatrix Update

- -Let us consider the update phase of an N-by-N trailing submatrix -distributed on a P-by-Q process grid. From a computational point of -view one has to (triangular) solve N right-hand-sides and perform a -local rank-NB update of this trailing submatrix. Assuming one chooses -the long variant, the execution -time of the update operation can be approximated by:

-
-Tupdate( N, NB ) = gam3 ( N NB^2 / Q + 2 N^2 NB / ( P Q ) ) + -alpha ( log( P ) + P - 1 ) + 3 beta N NB / Q.

-
-The constant "3" in front of the "beta" term is obtained by counting -one for the (logarithmic) spread phase and two for the rolling phase. -In the case of bi-directional links this constant 3 should therefore -be only a 2.

-
- -

Backward Substitution

- -The number of floating point operations performed during the backward -substitution is given by N^2 / (P*Q). Because of the lookahead, the -communication cost can be approximated at each step by two messages -of length NB, i.e., the time to communicate the NB-piece of the -solution vector from one diagonal block of the matrix to another. It -follows that the execution time of the backward substitution can be -approximated by:

-
-Tbacks( N, NB ) = gam2 N^2 / (P Q) + N ( alpha / NB + 2 beta ).

-
-
- -

Putting it All Together

- -The total execution time of the algorithm described above is given by

-
-Sum(k=0,N,NB)[Tpfact( N-k, NB ) + Tupdate( N-k-NB, NB )] + -Tbacks( N, NB ).

-
-That is, by considering only the dominant terms in alpha, beta and -gam3:

-
-Thpl = 2 gam3 N^3 / ( 3 P Q ) + beta N^2 (3 P + Q) / ( 2 P Q ) + -alpha N ((NB + 1) log(P) + P) / NB.

-
-The serial execution time is given by Tser = 2 gam3 N^3 / 3. If we -define the parallel efficiency E as the ratio Tser / ( P Q Thpl ), we -obtain:

-
-E = 1 / ( 1 + 3 beta (3 P + Q) / ( 4 gam3 N ) + -3 alpha P Q ((NB + 1) log(P) + P) / (2 N^2 NB gam3) ).

-
-This last equality shows that when the memory usage per processor -N^2 / (P Q) is maintained constant, the parallel efficiency slowly -decreases only because of the alpha term. The communication volume -(the beta term) however remains constant. Due to these results, HPL -is said to be scalable not only with respect to the -amount of computation, but also with respect to the communication -volume.
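The behaviour described above can be checked numerically. The following Python sketch evaluates the two overhead terms of the efficiency formula on square process grids while keeping N^2 / (P Q) constant. The values of alpha, beta, gam3 and NB are hypothetical, gam3 is used as a time per floating point operation (as in Tser = 2 gam3 N^3 / 3), and log(P) is taken in base 2, which is an assumption since the text does not state the base.

```python
import math

def efficiency_terms(n, nb, p, q, alpha, beta, gam3):
    """Return the (beta term, alpha term) in the denominator of E."""
    log_p = math.log2(p)  # assumption: base-2 logarithm
    beta_term = 3.0 * beta * (3 * p + q) / (4.0 * gam3 * n)
    alpha_term = (3.0 * alpha * p * q * ((nb + 1) * log_p + p)
                  / (2.0 * n * n * nb * gam3))
    return beta_term, alpha_term

def hpl_efficiency(n, nb, p, q, alpha, beta, gam3):
    """E = 1 / (1 + beta term + alpha term)."""
    beta_term, alpha_term = efficiency_terms(n, nb, p, q, alpha, beta, gam3)
    return 1.0 / (1.0 + beta_term + alpha_term)

# Hypothetical machine constants (seconds) and block size.
ALPHA, BETA, GAM3, NB = 50e-6, 1e-8, 1e-9, 100

rows = []
for p in (1, 2, 4, 8, 16):
    n = 10000 * p  # keeps N^2 / (P Q) constant on a P-by-P grid
    b, a = efficiency_terms(n, NB, p, p, ALPHA, BETA, GAM3)
    e = hpl_efficiency(n, NB, p, p, ALPHA, BETA, GAM3)
    rows.append((p, b, a, e))
    print(f"{p:2d} x {p:<2d}  N={n:6d}  beta term={b:.2e}  "
          f"alpha term={a:.2e}  E={e:.4f}")
```

With these inputs the beta term is identical on every grid, while the alpha term grows and the efficiency decreases slowly.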

- -
-
-
-
- - diff --git a/hpl/www/software.html b/hpl/www/software.html deleted file mode 100755 index e8fec8af6851d421b7602311ccc74b9807fb4678..0000000000000000000000000000000000000000 --- a/hpl/www/software.html +++ /dev/null @@ -1,109 +0,0 @@ - - -HPL Software - - - - -

HPL Software

- -

Download and Installation

- -
    -
  1. Download the tar-gzipped file, -then issue "gunzip hpl-2.1.tar.gz; tar -xvf hpl-2.1.tar". This -should create an hpl-2.1 directory containing the distribution. -We call this directory the top level directory. - -
  2. Create a file Make.<arch> in the top-level directory. -For this purpose, you may want to re-use one contained in the -setup directory. This Make.<arch> file essentially contains -the compilers, libraries, and their paths to be used on your system. - -
  3. Type "make arch=<arch>". This should create an executable -in the bin/<arch> directory called xhpl. For example, on our -Linux PII cluster, I create a file called Make.Linux_PII in the -top-level directory. Then, I type "make arch=Linux_PII". This -creates the executable file bin/Linux_PII/xhpl. - -
  4. Quick check: run a few tests (assuming you have 4 nodes for -interactive use) by issuing the following commands from the top -level directory: "cd bin/<arch> ; mpirun -np 4 xhpl". This -should produce quite a bit of meaningful output on the screen. - -
  5. Most of the performance parameters can be tuned, by modifying -the input file bin/<arch>/HPL.dat. See the -tuning page or the TUNING file in the -top-level directory. -
-
- -

Compile Time Options

- -At the end of the "model" Make.<arch>, the user is given -the opportunity to override some default compile options of this -software. The list of these options and their meaning is:

- -
- - - - - - - - - -
-DHPL_COPY_L           force the copy of the panel L before bcast
-DHPL_CALL_CBLAS       call the BLAS C interface
-DHPL_CALL_VSIPL       call the vsip library
-DHPL_DETAILED_TIMING  enable detailed timers

-

- -The user must choose between either the BLAS Fortran 77 interface, -or the BLAS C interface, or the VSIPL library depending on which -computational kernels are available on his system. Only one of these -options should be selected. If you choose the BLAS Fortran 77 -interface, it is necessary to fill out the machine-specific C to -Fortran 77 interface section of the Make.<arch> file. To do -this, please refer to the Make.<arch> examples contained in -the setup directory.

- -By default HPL will: -
    -
  • not copy L before broadcast, -
  • call the BLAS Fortran 77 interface, -
  • not display detailed timing information. -
- -As an example, suppose one wants this software to copy the panel of -columns into a contiguous buffer before broadcasting. It should -be more efficient to let the software create the appropriate MPI -user-defined data type since this may avoid the data copy. So, it -is a strange idea, but one insists. To achieve this one would add --DHPL_COPY_L to the definition of HPL_OPTS at the end of the file -Make.<arch>. Then issue "make clean arch=<arch> ; -make build arch=<arch>" and the executable will be rebuilt -with that feature enabled.

- -
-
-
-
- - diff --git a/hpl/www/spread.jpg b/hpl/www/spread.jpg deleted file mode 100755 index 56c255a3febb1160a8b80374d2df48120f021085..0000000000000000000000000000000000000000 Binary files a/hpl/www/spread.jpg and /dev/null differ diff --git a/hpl/www/spreadM.jpg b/hpl/www/spreadM.jpg deleted file mode 100755 index 433e4c0773ebdbcc30d343e130b060cd836352e3..0000000000000000000000000000000000000000 Binary files a/hpl/www/spreadM.jpg and /dev/null differ diff --git a/hpl/www/tuning.html b/hpl/www/tuning.html deleted file mode 100755 index fbbf17fb71e06e4c7a6cebb752e12731c5b564e8..0000000000000000000000000000000000000000 --- a/hpl/www/tuning.html +++ /dev/null @@ -1,476 +0,0 @@ - - -HPL Tuning - - - - -

HPL Tuning

- -After having built the executable hpl/bin/<arch>/xhpl, -one may want to modify the input data file HPL.dat. This file -should reside in the same directory as the executable -hpl/bin/<arch>/xhpl. An example HPL.dat file is -provided by default. This file contains information about the -problem sizes, machine configuration, and algorithm features -to be used by the executable. It is 31 lines long. All the -selected parameters will be printed in the output generated -by the executable.

- -We first describe the meaning of each line of this input file -below. Finally, a few useful -experimental guidelines to set up the file are given at -the end of this page.

-
- -

Description of the HPL.dat File

- -Line 1: (unused) Typically one would use -this line for one's own purposes. For example, it could be used -to summarize the content of the input file. By default this -line reads: -
-HPL Linpack benchmark input file
-
- -
-Line 2: (unused) same as line 1. By default -this line reads: -
-Innovative Computing Laboratory, University of Tennessee
-
- -
-Line 3: the user can choose where the -output should be redirected to. In the case of a file, a -name is necessary, and this is the line where one wants to -specify it. Only the first name on this line is significant. -By default, the line reads: -
-HPL.out  output file name (if any)
-
- -This means that if one chooses to redirect the output to a -file, the file will be called "HPL.out". The rest of the line -is unused, and this space can be used to put some informative comment on -the meaning of this line.

- -
-Line 4: This line specifies where the output -should go. The line is formatted: it must begin with a -positive integer, the rest is insignificant. 3 choices are -possible for the positive integer: 6 means that the output -will go to the standard output, 7 means that the output will -go to the standard error. Any other integer means that the -output should be redirected to a file, whose name has been -specified in the line above. This line by default reads: -
-6        device out (6=stdout,7=stderr,file)
-
-which means that the output generated by the executable -should be redirected to the standard output.

- -
-Line 5: This line specifies the number of -problem sizes to be executed. This number should be less than -or equal to 20. The first integer is significant, the rest -is ignored. If the line reads: -
-3        # of problems sizes (N)
-
-this means that the user is willing to run 3 problem sizes -that will be specified in the next line.

- -
-Line 6: This line specifies the problem sizes -one wants to run. Assuming the line above started with 3, -the 3 first positive integers are significant, the rest is -ignored. For example: -
-3000 6000 10000    Ns
-
-means that one wants xhpl to run 3 (specified in line 5) -problem sizes, namely 3000, 6000 and 10000.
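The "count line followed by a values line" convention used by lines 5-6 (and again by lines 7-8 and 10-12) can be sketched in Python as follows. This is only an illustration of the rule described above, not the parser used by xhpl; the helper names are hypothetical.

```python
def leading_int(line):
    """Return the first whitespace-separated token of a line as an integer."""
    return int(line.split()[0])

def leading_ints(line, count):
    """Return the first `count` integers of a line; the rest is ignored."""
    return [int(token) for token in line.split()[:count]]

count_line = "3        # of problems sizes (N)"
values_line = "3000 6000 10000    Ns"

sizes = leading_ints(values_line, leading_int(count_line))
print(sizes)
```

Only the first integer of the count line and the first three integers of the values line are read, so the trailing comments are free-form.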

- -
-Line 7: This line specifies the number of -block sizes to be run. This number should be less than or -equal to 20. The first integer is significant, the rest is -ignored. If the line reads: -
-5        # of NBs
-
-this means that the user is willing to use 5 block sizes that -will be specified in the next line.

- -
-Line 8: This line specifies the block sizes -one wants to run. Assuming the line above started with 5, -the 5 first positive integers are significant, the rest is -ignored. For example: -
-80 100 120 140 160 NBs
-
-means that one wants xhpl to use 5 (specified in line 7) -block sizes, namely 80, 100, 120, 140 and 160.

- -
-Line 9: This line specifies how the MPI -processes should be mapped onto the nodes of your platform. -There are currently two possible mappings, namely row- and -column-major. This feature is mainly useful when these nodes -are themselves multi-processor computers. A row-major mapping -is recommended.

- -
-Line 10: This line specifies the number of -process grids to be run. This number should be less than -or equal to 20. The first integer is significant, the rest is -ignored. If the line reads: -
-2        # of process grids (P x Q)
-
-this means that you are willing to try 2 process grid sizes -that will be specified in the next line.

- -
-Line 11-12: These two lines specify the -number of process rows and columns of each grid you want to -run on. Assuming the line above (10) started with 2, the 2 -first positive integers of those two lines are significant, -the rest is ignored. For example: -
-1 2          Ps
-6 8          Qs
-
-means that one wants to run xhpl on 2 process grids (line -10), namely 1-by-6 and 2-by-8. Note: In this example, it is -required then to start xhpl on at least 16 nodes (max -of Pi-by-Qi). The runs on the two grids will be consecutive. -If one was starting xhpl on more than 16 nodes, say 52, only -6 would be used for the first grid (1x6) and then 16 (2x8) -would be used for the second grid. The fact that you started -the MPI job on 52 nodes, will not make HPL use all of them. -In this example, only 16 would be used. If one wants to run -xhpl with 52 processes one needs to specify a grid of 52 -processes, for example the following lines would do the job: -
-4  2         Ps
-13 8         Qs
-
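The number of processes needed for a given pair of Ps and Qs lines follows directly from the rule above: grid i uses Pi * Qi processes, and the job must be started on at least the largest of these products. A minimal Python sketch (the function name is hypothetical):

```python
def grid_process_counts(ps, qs):
    """Number of processes used by each P-by-Q grid, in order."""
    return [p * q for p, q in zip(ps, qs)]

# First example above: grids 1-by-6 and 2-by-8.
counts_a = grid_process_counts([1, 2], [6, 8])
required_a = max(counts_a)

# Second example above: grids 4-by-13 and 2-by-8.
counts_b = grid_process_counts([4, 2], [13, 8])
required_b = max(counts_b)

print(counts_a, "-> start xhpl on at least", required_a, "processes")
print(counts_b, "-> start xhpl on at least", required_b, "processes")
```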
- -
-Line 13: This line specifies the threshold -against which the residuals should be compared. The residuals -should be of order 1, but are in practice slightly less than -this, typically 0.001. This line is made of a real number, -the rest is not significant. For example: -
-16.0         threshold
-
-In practice, a value of 16.0 will cover most cases. For -various reasons, it is possible that some of the residuals -become slightly larger, say for example 35.6. xhpl will flag -those runs as failed, however they can be considered as -correct. A run should be considered as failed if the residual -is a few orders of magnitude bigger than 1, for example 10^6 or -more. Note: if one was to specify a threshold of 0.0, all -tests would be flagged as failed, even though the answer is -likely to be correct. It is allowed to specify a negative -value for this threshold, in which case the checks will be -by-passed, no matter what the threshold value is, as soon as -it is negative. This feature allows one to save time when -performing a lot of experiments, say for instance during the -tuning phase. Example: -
--16.0        threshold
-
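The threshold rule can be summarized by the following Python sketch. It is an illustration of the behaviour described above and not the code used by xhpl; in particular, the strict comparison is an assumption chosen so that a threshold of 0.0 flags every run as failed, and the returned strings are hypothetical labels.

```python
def check_residual(residual, threshold):
    """Classify a run from its scaled residual and the line 13 threshold."""
    if threshold < 0.0:
        # Any negative threshold by-passes the checks.
        return "skipped"
    # Assumption: strict comparison, so threshold 0.0 fails every run.
    return "passed" if residual < threshold else "failed"

print(check_residual(0.001, 16.0))   # typical residual
print(check_residual(35.6, 16.0))    # flagged, though likely still correct
print(check_residual(0.001, 0.0))    # a zero threshold flags everything
print(check_residual(1.0e6, -16.0))  # checks by-passed
```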
- -
-The remaining lines allow one to specify algorithmic features. -xhpl will run all possible combinations of those for each -problem size, block size, process grid combination. This is -handy when one looks for an "optimal" set of parameters. To -understand a little bit better, let us first say a few words -about the algorithm implemented in HPL. Basically this is a -right-looking version with row-partial pivoting. The panel -factorization is matrix-matrix operation based and recursive, -dividing the panel into NDIV subpanels at each step. This -part of the panel factorization is denoted below by -"recursive panel fact. (RFACT)". The recursion stops when -the current panel is made of less than or equal to NBMIN -columns. At that point, xhpl uses a matrix-vector operation -based factorization denoted below by "PFACTs". Classic -recursion would then use NDIV=2, NBMIN=1. There are -essentially 3 numerically equivalent LU factorization -algorithm variants (left-looking, Crout and right-looking). -In HPL, one can choose every one of those for the RFACT, as -well as the PFACT. The following lines of HPL.dat allow you -to set those parameters.

-Lines 14-21: (Example 1) -
-3       # of panel fact
-0 1 2   PFACTs (0=left, 1=Crout, 2=Right)
-4       # of recursive stopping criterium
-1 2 4 8 NBMINs (>= 1)
-3       # of panels in recursion
-2 3 4   NDIVs
-3       # of recursive panel fact.
-0 1 2   RFACTs (0=left, 1=Crout, 2=Right)
-
- -This example would try all variants of PFACT, 4 values for -NBMIN, namely 1, 2, 4 and 8, 3 values for NDIV namely 2, 3 -and 4, and all variants for RFACT.

-Lines 14-21: (Example 2) -
-2       # of panel fact
-2 0     PFACTs (0=left, 1=Crout, 2=Right)
-2       # of recursive stopping criterium
-4 8     NBMINs (>= 1)
-1       # of panels in recursion
-2       NDIVs
-1       # of recursive panel fact.
-2       RFACTs (0=left, 1=Crout, 2=Right)
-
-This example would try 2 variants of PFACT namely right -looking and left looking, 2 values for NBMIN, namely 4 and 8, -1 value for NDIV namely 2, and one variant for RFACT.
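Because xhpl runs every combination of these parameters, the number of runs grows multiplicatively. The Python sketch below counts the panel factorization combinations of the two examples above for a single problem size, block size and process grid. The order in which xhpl actually iterates over them is not implied by this sketch.

```python
from itertools import product

def panel_fact_combinations(pfacts, nbmins, ndivs, rfacts):
    """All (PFACT, NBMIN, NDIV, RFACT) tuples for lines 14-21."""
    return list(product(pfacts, nbmins, ndivs, rfacts))

# Example 1: 3 PFACTs, 4 NBMINs, 3 NDIVs, 3 RFACTs.
example_1 = panel_fact_combinations([0, 1, 2], [1, 2, 4, 8], [2, 3, 4], [0, 1, 2])

# Example 2: 2 PFACTs, 2 NBMINs, 1 NDIV, 1 RFACT.
example_2 = panel_fact_combinations([2, 0], [4, 8], [2], [2])

print(len(example_1), "combinations in Example 1")
print(len(example_2), "combinations in Example 2")
```

Example 1 therefore multiplies the number of runs by 3 * 4 * 3 * 3, which is worth keeping in mind before also varying N, NB and the process grid.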

- -
-In the main loop of the algorithm, the current panel of -columns is broadcast in process rows using a virtual ring -topology. HPL offers various choices and one most likely wants -to use the increasing ring modified encoded as 1. 3 and 4 are -also good choices.

-Lines 22-23: (Example 1) -
-1       # of broadcast
-1       BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
-
-This will cause HPL to broadcast the current panel using the -increasing ring modified topology.

-Lines 22-23: (Example 2) -
-2       # of broadcast
-0 4     BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
-
-This will cause HPL to broadcast the current panel using the -increasing ring virtual topology and the long message -algorithm.

- -
-Lines 24-25 allow one to specify the look-ahead -depth used by HPL. A depth of 0 means that the next panel -is factorized after the update by the current panel is -completely finished. A depth of 1 means that the next -panel is immediately factorized after being updated. The -update by the current panel is then finished. A depth of k -means that the k next panels are factorized immediately after -being updated. The update by the current panel is then -finished. It turns out that a depth of 1 seems to give the -best results, but may need a large problem size before one -can see the performance gain. So use 1, if you do not know -better, otherwise you may want to try 0. Look-ahead of -depths 3 and larger will probably not give you better -results.

-Lines 24-25: (Example 1): -
-1       # of lookahead depth
-1       DEPTHs (>=0)
-
-This will cause HPL to use a look-ahead of depth 1.

-Lines 24-25: (Example 2): -
-2       # of lookahead depth
-0 1     DEPTHs (>=0)
-
-This will cause HPL to use a look-ahead of depths 0 and 1.

- -
-Lines 26-27 allow one to specify the swapping -algorithm used by HPL for all tests. There are currently -two swapping algorithms available, one based on "binary -exchange" and the other one based on a "spread-roll" -procedure (also called "long" below). For large problem -sizes, this last one is likely to be more efficient. The user -can also choose to mix both variants, that is "binary-exchange" -for a number of columns less than a threshold value, and then -the "spread-roll" algorithm. This threshold value is then -specified on Line 27.

-Lines 26-27: (Example 1): -
-1       SWAP (0=bin-exch,1=long,2=mix)
-60      swapping threshold
-
-This will cause HPL to use the "long" or "spread-roll" -swapping algorithm. Note that a threshold is specified in -that example but not used by HPL.

-Lines 26-27: (Example 2): -
-2       SWAP (0=bin-exch,1=long,2=mix)
-60      swapping threshold
-
-This will cause HPL to use the "long" or "spread-roll" -swapping algorithm as soon as there are more than 60 columns -in the row panel. Otherwise, the "binary-exchange" algorithm -will be used instead.
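The selection rule for lines 26-27 can be written as a small Python sketch. This is an illustration rather than HPL's own code; the function name is hypothetical, and the treatment of a column count exactly equal to the threshold follows Example 2 above (binary-exchange), which is an assumption about the boundary case.

```python
BIN_EXCH, LONG, MIX = 0, 1, 2

def swap_algorithm(swap, threshold, columns):
    """Name of the swapping algorithm used for a row panel of `columns` columns."""
    if swap == BIN_EXCH:
        return "bin-exch"
    if swap == LONG:
        return "long"  # the threshold is ignored in this case
    if swap == MIX:
        # Assumption: "more than `threshold` columns" selects long.
        return "long" if columns > threshold else "bin-exch"
    raise ValueError("SWAP must be 0, 1 or 2")

print(swap_algorithm(LONG, 60, 10))   # Example 1: always long
print(swap_algorithm(MIX, 60, 61))    # Example 2: more than 60 columns
print(swap_algorithm(MIX, 60, 60))    # Example 2: 60 columns or fewer
```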

- -
-Line 28 allows one to specify whether the upper -triangle of the panel of columns should be stored in -no-transposed or transposed form. Example: -
-0            L1 in (0=transposed,1=no-transposed) form
-
- -
-Line 29 allows one to specify whether the panel -of rows U should be stored in no-transposed or transposed -form. Example: -
-0            U  in (0=transposed,1=no-transposed) form
-
- -
-Line 30 enables / disables the equilibration -phase. This option will not be used unless you selected 1 or -2 in Line 26. Example: -
-1            Equilibration (0=no,1=yes)
-
- -
-Line 31 allows one to specify the alignment in -memory for the memory space allocated by HPL. On modern -machines, one probably wants to use 4, 8 or 16. This may -result in a tiny amount of memory wasted. Example: -
-8       memory alignment in double (> 0)
-
- -
-

Guidelines

- -
    -
  1. Figure out a good block size for the matrix multiply -routine. The best method is to try a few out. If you happen -to know the block size used by the matrix-matrix multiply -routine, a small multiple of that block size will do fine. -This particular topic is discussed in the -FAQs section.

    - -
  2. The process mapping should not matter if the nodes of -your platform are single processor computers. If these nodes -are multi-processors, a row-major mapping is recommended.

    - -
  3. HPL likes "square" or slightly flat process grids. Unless -you are using a very small process grid, stay away from the -1-by-Q and P-by-1 process grids. This particular topic is also -discussed in the FAQs section.

    - -
  4. Panel factorization parameters: a good start are the -following for the lines 14-21: -
    -1       # of panel fact
    -1       PFACTs (0=left, 1=Crout, 2=Right)
    -2       # of recursive stopping criterium
    -4 8     NBMINs (>= 1)
    -1       # of panels in recursion
    -2       NDIVs
    -1       # of recursive panel fact.
    -2       RFACTs (0=left, 1=Crout, 2=Right)
    -
    - -
  5. Broadcast parameters: at this time it is far from obvious -to me what the best setting is, so I would probably try them -all. If I had to guess I would probably start with the -following for the lines 22-23: -
    -2       # of broadcast
    -1 3     BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
    -
    -The best broadcast depends on your problem size and hardware -performance. My take is that 4 or 5 may be competitive for -machines featuring very fast nodes compared to the -network.

    - -
  6. Look-ahead depth: as mentioned above 0 or 1 are likely to -be the best choices. This also depends on the problem size -and machine configuration, so I would try "no look-ahead (0)" -and "look-ahead of depth 1 (1)". That is for lines 24-25: -
    -2       # of lookahead depth
    -0 1     DEPTHs (>=0)
    -
    - -
  7. Swapping: one can select only one of the three algorithms -in the input file. Theoretically, mix (2) should win, however -long (1) might just be good enough. The difference should be -small between those two assuming a swapping threshold of the -order of the block size (NB) selected. If this threshold is -very large, HPL will use bin_exch (0) most of the time and if -it is very small (< NB) long (1) will always be used. In -short and assuming the block size (NB) used is say 60, I -would choose for the lines 26-27: -
    -2       SWAP (0=bin-exch,1=long,2=mix)
    -60      swapping threshold 
    -
    -I would also try the long variant. For a very small number -of processes in every column of the process grid (say < 4), -very little performance difference should be observable.

    - -
  8. Local storage: I do not think Line 28 matters. Pick 0 when in -doubt. Line 29 is more important. It controls how the panel -of rows should be stored. No doubt 0 is better. The caveat is -that in that case the matrix-multiply function is called with -( Notrans, Trans, ... ), that is C := C - A B^T. Unless the -computational kernel you are using has a very poor (with -respect to performance) implementation of that case, and is -much more efficient with ( Notrans, Notrans, ... ) just pick -0 as well. So, my choice: -
    -0       L1 in (0=transposed,1=no-transposed) form
    -0       U  in (0=transposed,1=no-transposed) form
    -
    - -
  9. Equilibration: It is hard to tell whether equilibration -should always be performed or not. Not knowing much about the -random matrix generated and because the overhead is so small -compared to the possible gain, I turn it on all the time. -
    -1       Equilibration (0=no,1=yes)
    -
    - -
  10. For alignment, 4 should be plenty, but just to be safe, -one may want to pick 8 instead. -
    -8       memory alignment in double (> 0)
    -
    -
- -
-
-
-
- -