# ########################################################################
# Copyright 2013 Advanced Micro Devices, Inc.
# 
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
# 
# http://www.apache.org/licenses/LICENSE-2.0
# 
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ########################################################################

clFFT Readme

Version:       1.10
Release Date:  April 2013

ChangeLog:

____________
Current Version:
  * This release was tested using the 9.012 runtime driver and the 2.8 APP SDK
  
____________
Version 1.8.291:
Fixed:
  * Memory leaks affecting use cases where 'clfftEnqueueTransform' is used in a loop
	  
____________
Version 1.8.269 (beta):
New:
  * clFFT now supports real-to-complex and complex-to-real transforms;
      refer to documentation for details
  * This release was tested using the 12.4 Catalyst software suite
	  
Known Issues:
  * Some degradation in performance of real transforms due to known
      runtime/driver issues
  * Failures in real transforms have been seen on 7xxx series GPUs with certain
      problem sizes involving powers of 3 and 5  
  
____________
Version 1.6.244:
Fixed:
  * Failures observed in v1.6.236 in backward transforms of certain
      power-of-2 problem sizes (involving radix 4 and radix 8).
	  
____________
Version 1.6.236:
New:
  * Performance of the FFT library has been improved for Radix-2 1D and 2D transforms
  * Support for R4XXX GPUs is deprecated and no longer tested
  * Preview: Support for AMD Radeon HD7000 series GPUs
  * This release was tested using the 8.92 runtime driver and the 2.6 APP SDK
____________
Version 1.4:
New:
  * clFFT now supports transform lengths whose factors consist exclusively 
      of powers of 2, 3, and 5
  * clFFT supports double precision data types
  * clFFT executes on OpenCL 1.0 compliant devices
  * This release was tested using the 8.872 runtime driver and the 2.5 APP SDK
  * A helper bash script appmlEnv.sh has been added to the root installation
      directory to assist in properly setting up a terminal environment to 
      execute clFFT samples
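
The length restriction above (factors of 2, 3, and 5 only) can be checked
before a plan is created. The following is a minimal bash sketch, not part of
clFFT; the function name is_supported_length is hypothetical, and treating a
length of 1 as valid is an assumption of this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not shipped with clFFT.
# Succeeds (exit status 0) when the argument is a positive integer whose
# only prime factors are 2, 3, and 5.
is_supported_length() {
  local n=$1 p
  # Reject empty or non-numeric input.
  case $n in
    ''|*[!0-9]*) return 1 ;;
  esac
  n=$((10#$n))            # force base 10 so leading zeros are not octal
  [ "$n" -gt 0 ] || return 1
  # Divide out every factor of 2, 3, and 5.
  for p in 2 3 5; do
    while [ $((n % p)) -eq 0 ]; do
      n=$((n / p))
    done
  done
  # Anything left over is a prime factor other than 2, 3, or 5.
  [ "$n" -eq 1 ]
}

for len in 1024 1000 343; do
  if is_supported_length "$len"; then
    echo "$len: factors are limited to 2, 3, and 5"
  else
    echo "$len: has other prime factors"
  fi
done
```

Here 1024 (2^10) and 1000 (2^3 * 5^3) pass the check, while 343 (7^3) does
not.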

Fixed:
  * If the library requires a temporary buffer and the user does not specify
      one on the Enqueue call, the library allocates a temporary buffer
      internally. The lifetime of that buffer is tied to the lifetime of the
      FFT plan; deleting the plan releases the buffer.
  * Test failures on CPU device for 32-bit systems  (Windows/Linux) 

Known Issues:
  * Failures have been seen on graphics cards using R4550 (RV710) GPUs.
  
____________
Version 1.2:
New:
  * Reduced the number of internal LDS bank conflicts for our 1D FFT transforms,
      increasing performance.
  * Padded reads/writes to global memory, decreasing bank conflicts and 
      increasing performance on 2D transforms.
  * This release was tested using the 8.841 runtime driver and the 2.4 APP SDK

Fixed:
  * Failures seen when attempting to queue work on the second GPU device of
      a multi-GPU 5970 card on Linux.

Known Issues:
  * It is recommended that users query for and explicitly create an 
      intermediate buffer if clFFT requires one.  If the library creates the 
      intermediate buffer internally, a race condition may occur when freeing
      the buffer on lower-end hardware.
  * Failures have been seen on graphics cards using R4550 (RV710) GPUs.
  * Test failures on CPU device for 32-bit systems  (Windows/Linux) 
  * It is recommended that Windows users uninstall previous versions of clFFT 
      before installing newer versions.  Otherwise, Add/Remove Programs only 
      removes the latest version.  Linux users can delete the install directory.

____________
Version 1.0:
  * Initial release, available on all platforms

Known Issues:
  * Failures have been seen when attempting to queue work on the second GPU
      device of a multi-GPU 5970 card on Linux.
_____________________
Building the Samples:

To install the Linux versions of clFFT, uncompress the initial download and 
  then execute the install script.

For example:
  tar -xf clFFT-${version}.tar.gz
      - This extracts three files into the current directory, one of which is
        an executable bash script.

  sudo mkdir /opt/clFFT-${version}
      - This pre-creates the install directory with proper permissions in /opt 
        if it is to be installed there (This is the default).

  ./install-clFFT-${version}.sh
      - This prints an EULA and uncompresses files into the chosen install 
        directory.

  cd ${installDir}/bin64
  export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${OpenCLLibDir}:${clfftLibDir}
      - Add the OpenCL and clFFT library directories to the loader search path
        so that the client program can resolve its shared-library dependencies.
        The user can create a bash script to help automate this procedure.
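
A minimal sketch of such a helper script is shown here. It is not the
appmlEnv.sh script mentioned in the ChangeLog; the function name
append_lib_path and both default directories are hypothetical placeholders
that must be replaced with the real locations on the target system. The
script is meant to be sourced (for example with '. ./setenv.sh', a
hypothetical file name) so that the exported variable persists in the
calling shell.

```shell
#!/usr/bin/env bash
# Sketch only: the default paths below are placeholders, not paths that
# clFFT or the APP SDK are known to install to.
: "${OpenCLLibDir:=/path/to/opencl/lib}"
: "${clfftLibDir:=/path/to/clFFT/lib64}"

# Append a directory to LD_LIBRARY_PATH unless it is already listed.
append_lib_path() {
  local dir=$1
  case ":${LD_LIBRARY_PATH:-}:" in
    *":$dir:"*) ;;   # already present, nothing to do
    *) LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$dir" ;;
  esac
}

append_lib_path "$OpenCLLibDir"
append_lib_path "$clfftLibDir"
export LD_LIBRARY_PATH
```

Checking for an existing entry keeps LD_LIBRARY_PATH from growing each time
the script is sourced in the same terminal.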

  ./Client -h
      - Understand the command line options that are available to the user 
        through the sample client.

  ./Client -iv
      - Watch for the version strings to print out; watch for 
        'Client Test *****PASS*****' to print out.

The sample program does not ship with native build files. Instead, a CMake
file is shipped, and users generate a native build file for their system.

For example:
  cd ${installDir}
  mkdir samplesBin/
      - This creates a sister directory to the samples directory that will house
        the native makefiles and the generated files from the build.

  cd samplesBin/
  ccmake ../samples/
      - ccmake is a curses-based cmake program. It takes a parameter that 
        specifies the location of the source code to compile.
      - Hit 'c' to configure for the platform; ensure that the dependencies on
        external libraries are satisfied, including the paths to
        'ATI Stream SDK' and 'Boost'.
      - After the dependencies are satisfied, hit 'c' again to finalize the
        configure step, then hit 'g' to generate the makefile and exit ccmake.

  make help
      - Look at the available options for make.

  make
      - Build the sample client program.

  ./clfft.Sample -iv
      - Watch for the version strings to print out; watch for 
        'Client Test *****PASS*****' to print out.
