Preferred Networks Releases CuPy v8 (October 1, 2020)
https://www.preferred.jp/en/news/pr20201001/

TOKYO – October 1, 2020 – Preferred Networks, Inc. (PFN) today released CuPy™ v8, the new major update to its open-source library for general-purpose matrix calculation.

CuPy v8 provides the following new features:

  • Support for CUDA 11 and the latest NVIDIA GPU (Ampere architecture)
    Boosts single-precision mathematics using TensorFloat-32 (TF32) computation mode
  • Official support for NVIDIA cuTENSOR/CUB
    Performance improvements up to 9.7x for matrix computations in our benchmarks (see blog post for details)
  • Enhanced kernel fusion
    Computations that include multiple reductions can now be merged into a single kernel (see the sketch after this list)
  • Automatic tuning of kernel launch parameters using Optuna™
    Discovers optimal launch parameters for the given data to improve performance
  • Memory pool sharing with external libraries
    Improved interoperability with PyTorch via pytorch-pfn-extras; for example, CuPy preprocessing code can be integrated flexibly into a PyTorch workflow
  • Improved NumPy/SciPy function coverage
    Many functions added, including the NumPy polynomials package (a result of Google Summer of Code 2020) and SciPy-compatible image processing routines
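
As a rough illustration of the enhanced kernel fusion described above, the sketch below uses CuPy’s cupy.fuse decorator to merge elementwise operations and a reduction into a single GPU kernel. The function name and data are illustrative, and a CUDA-capable GPU with CuPy v8 installed is assumed.

    import cupy

    # Elementwise operations and the cupy.sum reduction are fused into
    # one GPU kernel by the @cupy.fuse decorator.
    @cupy.fuse()
    def squared_error_sum(x, y):
        diff = x - y
        return cupy.sum(diff * diff)

    x = cupy.arange(1024, dtype=cupy.float32)
    y = cupy.ones(1024, dtype=cupy.float32)
    print(squared_error_sum(x, y))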

PFN will continue to swiftly incorporate the latest research outcomes into CuPy while collaborating with supporting companies and the open-source community on its development.

Preferred Networks Migrates its Deep Learning Research Platform to PyTorch (December 5, 2019)
https://www.preferred.jp/en/news/pr20191205/

December 5, 2019, Tokyo Japan – Preferred Networks, Inc. (PFN, Head Office: Tokyo, President & CEO: Toru Nishikawa) today announced plans to incrementally transition its deep learning framework, a fundamental technology in its research and development, from PFN’s Chainer™ to PyTorch. Concurrently, PFN will collaborate with Facebook and other contributors in the PyTorch community to actively participate in the development of PyTorch. With the latest major upgrade, v7, released today, Chainer will move into a maintenance phase. PFN will provide documentation and a library to help Chainer users migrate to PyTorch.

PFN President and CEO Toru Nishikawa made the following comments on this business decision. 

“Since the start of deep learning frameworks, Chainer has been PFN’s fundamental technology to support our joint research with Toyota, FANUC, and many other partners. Chainer provided PFN with opportunities to collaborate with major global companies, such as NVIDIA and Microsoft. Migrating to PyTorch from Chainer, which was developed with tremendous support from our partners, the community, and users, is an important decision for PFN. However, we firmly believe that by participating in the development of one of the most actively developed frameworks, PFN can further accelerate the implementation of deep learning technologies, while leveraging the technologies developed in Chainer and searching for new areas that can become a source of competitive advantage.”

● Background

Developed and provided by PFN, Chainer has supported PFN’s R&D as a fundamental technology and contributed significantly to its business growth since it was open-sourced in June 2015. Its unique Define-by-Run approach, which allows users to build complex neural networks intuitively and flexibly, has gained support from the community of researchers and developers and has since been widely adopted as a standard by today’s mainstream deep learning frameworks, speeding up the advancement of deep learning technology.
Meanwhile, the maturation of deep learning frameworks over the last several years has marked the end of the era in which the framework itself was the competitive edge in development. PFN believes that instead of making small adjustments to differentiate itself from competitors, it should contribute to the sustainable growth of the developer and user community and help create a healthy ecosystem with the common goal of further advancing deep learning technology.

● Migrating PFN’s deep learning R&D platform to PyTorch

PFN will migrate its deep learning research platform to PyTorch, which draws inspiration from Chainer, to enable flexible prototyping and a smooth transition from research to production in machine learning development. With a broad set of contributing developers including Facebook, PyTorch boasts an engaged developer community and is one of the most frequently used frameworks in academic papers. Migrating to PyTorch will allow PFN to efficiently incorporate the latest research results into its R&D activities and to leverage its existing Chainer assets by converting them to PyTorch. PFN will cooperate with the PyTorch team at Facebook and the open-source community to contribute to the development of PyTorch, as well as to support PyTorch on MN-Core, a deep learning processor currently being developed by PFN.

PFN has received the following comments from Facebook and the Toyota Research Institute:

Bill Jia, Vice President of AI Infrastructure, Facebook

“As a leading contributor to PyTorch, we’re thrilled that a pioneer in machine learning (ML), such as PFN, has decided to adopt PyTorch for future development,” said Bill Jia, Facebook Vice President of AI Infrastructure. “PyTorch’s enablement of leading-edge research, combined with its ability for distributed training and inference, will allow PFN to rapidly prototype and deploy ML models to production for its customers. In parallel, the entire PyTorch community will benefit from PFN code contributions given the organization’s expertise in ML tools.”

Gill Pratt, CEO, Toyota Research Institute

 “TRI and TRI-AD welcome the transition by PFN to PyTorch,” said Gill Pratt, CEO of Toyota Research Institute (TRI), Chairman of Toyota Research Institute – Advanced Development (TRI-AD), and a Fellow of Toyota Motor Corporation. “PFN has in the past strongly contributed to our joint research, development, and advanced development in automated driving by creating and maintaining Chainer. TRI and TRI-AD have used PyTorch for some time and feel that PFN’s present adoption of PyTorch will facilitate and accelerate our application of PFN’s expertise in deep learning.”

 
● Major features of the latest deep learning framework Chainer™ v7 and general-purpose matrix calculation library CuPy™ v7.

Chainer v7 features improved interoperability with the C++-based ChainerX

  • Chainer v7 includes the distributed deep learning package ChainerMN, and many Chainer functions now support ChainerX
  • The TabularDataset class has been added to flexibly process multi-column datasets
  • With ONNX-Chainer consolidated into Chainer, Chainer v7 can work with inference engines through ONNX

For details about Chainer’s new features, future development, and documentation on how to migrate to PyTorch, please read the latest blog post from the Chainer development team.
https://chainer.org/announcement/2019/12/05/released-v7.html

CuPy v7 features the following improvements:

  • With the cuTENSOR and CUB libraries supported, CuPy has improved performance on NVIDIA GPUs
  • Experimental support for ROCm has been added, enabling CuPy to run on AMD GPUs

 

Chainer Release Note: https://github.com/chainer/chainer/releases/tag/v7.0.0

Chainer Documentation: https://docs.chainer.org/en/v7.0.0/

 

PFN will continue to develop its other open-source software (namely CuPy and Optuna) as actively as ever.

Preferred Networks releases version 6 of both the open source deep learning framework Chainer and the general-purpose matrix calculation library CuPy (May 16, 2019)
https://www.preferred.jp/en/news/pr20190516/

May 16, 2019, Tokyo Japan – Preferred Networks, Inc. (PFN, Head Office: Tokyo, President & CEO: Toru Nishikawa) has released Chainer(TM) v6 and CuPy(TM) v6, major updates of PFN’s open source deep learning framework and general-purpose matrix calculation library, respectively. Most code written for previous versions will run as-is on the latest version.
Chainer was released as open source software in 2015 and is known as a pioneer of flexible and intuitive deep learning frameworks based on the Define-by-Run method. Chainer has since been supported by many users and is being actively developed.

ChainerX, a C++ implementation of automatic differentiation that has been experimentally integrated into the main Chainer distribution since the v6 beta release, now supports more examples. Using ChainerX can significantly reduce framework-side overhead in both forward and backward propagation without losing much of Chainer’s flexibility or backward compatibility, resulting in higher performance. In addition, if a third-party developer implements support for new hardware as a ChainerX plug-in, neither Chainer nor ChainerX source code needs to be changed to use that hardware.

 

Main features of Chainer v6 and CuPy v6 are:

  • Integration of ChainerX
    • A fast and more portable multi-dimensional array and automatic differentiation backend has been added.
    • A compatibility layer has been implemented to allow for the use of ChainerX arrays in the same manner as NumPy and CuPy arrays, allowing automatic differentiation with low overhead in C++.
    • An integrated device API has been introduced. The unified interface can handle the specification of devices or inter-device transfer for a wide variety of backends such as NumPy, CuPy, iDeep, and ChainerX.
  • Enhanced support for training in mixed precision
    • mixed16, a new dtype option, has been added. It is a mixed precision mode that enables transparent training using both single- and half-precision operations.
    • Dynamic scaling, which detects overflow and automatically adjusts the loss scale, has been implemented to avoid underflow in mixed precision training.
  • Addition of a function and link test tool
    • A test tool has been added that, with minimal code, generates unit tests for forward and backward propagation as well as second-order differentials.
  • CuPy arrays can be used with NumPy functions
    • NumPy’s experimental __array_function__ protocol is now supported, so CuPy arrays can be passed directly to many NumPy functions that implement it (see the sketch after this list).
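
As a minimal sketch of the __array_function__ support mentioned above: with the protocol enabled (the default in NumPy 1.17 and later; NumPy 1.16 requires the NUMPY_EXPERIMENTAL_ARRAY_FUNCTION=1 environment variable), calling a NumPy function on CuPy arrays dispatches to the CuPy implementation and returns a CuPy array. The array contents are illustrative.

    import numpy as np
    import cupy

    a = cupy.arange(6).reshape(2, 3)

    # np.concatenate dispatches to cupy.concatenate via __array_function__,
    # so the computation stays on the GPU and a CuPy array is returned.
    stacked = np.concatenate([a, a], axis=0)
    print(type(stacked))  # a CuPy ndarray, not numpy.ndarray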

 

 

PFN will continue improving Chainer performance and expanding the backend. It will contribute to improved performance in a wide range of use cases by making ChainerX easier to use as well as supporting more arithmetic operations.

Chainer has incorporated a number of development results from external contributors. PFN will continue to quickly adopt the results of the latest deep learning research and promote the development and popularization of Chainer in collaboration with supporting companies and the OSS community.

 

About the Chainer(TM) Open Source Deep Learning Framework

Chainer is a Python-based deep learning framework developed and provided by PFN, which has unique features and powerful performance that allow for designing complex neural networks easily and intuitively, thanks to its “Define-by-Run” approach. Since it was open-sourced in June 2015, as one of the most popular frameworks, Chainer has attracted not only the academic community but also many industrial users who need a flexible framework to harness the power of deep learning in their research and real-world applications.

Chainer quickly incorporates the results of the latest deep learning research. With additional packages such as ChainerRL (reinforcement learning), ChainerCV (computer vision), and Chainer Chemistry (a deep learning library for chemistry and biology), and through the support of Chainer development partner companies, PFN aims to promote the most advanced research and development activities of researchers and practitioners in each field. (http://chainer.org/)

Preferred Networks releases version 5 of both the open source deep learning framework Chainer and the general-purpose array calculation library CuPy (October 25, 2018)
https://www.preferred.jp/en/news/pr20181025/

Preferred Networks, Inc. (PFN, President and CEO: Toru Nishikawa) has released Chainer(TM) v5 and CuPy(TM) v5, major updates of PFN’s open source deep learning framework and general-purpose array calculation library, respectively.

In this major upgrade, which comes six months after the previous one, Chainer has become easier to use by integrating ChainerMN, which had previously been provided as a separate distributed deep learning package for Chainer. Most code written for previous versions will run as-is on v5.

 

Main features of Chainer v5 and CuPy v5 are:

  • Integrated with the ChainerMN distributed deep learning package

・With ChainerMN incorporated in Chainer, fast distributed deep learning on multiple GPUs can be conducted more easily.

  • Support for the NVIDIA(R) DALI data augmentation library

・Chainer v5 performs faster data preprocessing by decoding and resizing JPEG images on GPUs.

  • Support for FP16

・Changing to half-precision floating-point (FP16) format is possible with minimal code changes.

・Reduced memory consumption, which allows larger batch sizes.

・Further speed increases with the use of NVIDIA(R) Volta GPU Tensor Cores.

  • Latest Intel(R) Architecture compatibility

・Chainer v5 supports the latest version 2 of Chainer Backend for Intel(R) Architecture (previously, iDeep, which was added to Chainer v4) for faster training and inference on Intel(R) Processors.

  • High-speed computing and memory saving for static graphs

・Chainer v5 optimizes computation and memory usage by caching static graphs that do not change throughout training. This speeds up training by 20-60%.

  • Enhanced interoperability with Anaconda Numba and PyTorch, enabling the mutual exchange of GPU array data

・A CuPy array can now be passed directly to a function JIT-compiled by Anaconda Numba.

・DLPack: array data can be exchanged with PyTorch and other frameworks (see the sketch after this list).

  • CuPy basic operations are 50% faster

・Performance of basic operations such as memory allocation and array initialization has improved.
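
As a minimal sketch of the DLPack exchange mentioned in the list above, the snippet below moves array data between PyTorch and CuPy without copying, using the DLPack utilities in both libraries. A CUDA device is assumed, and the tensor contents are illustrative.

    import cupy
    import torch
    from torch.utils import dlpack

    # PyTorch tensor -> CuPy array via a DLPack capsule (zero-copy).
    t = torch.arange(4, dtype=torch.float32, device='cuda')
    a = cupy.fromDlpack(dlpack.to_dlpack(t))

    # CuPy array -> PyTorch tensor, again sharing the same GPU memory.
    b = cupy.arange(4, dtype=cupy.float32)
    u = dlpack.from_dlpack(b.toDlpack())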

 

Chainer and CuPy have incorporated a number of development results from external contributors. PFN will continue to quickly adopt the results of the latest deep learning research and promote the development and popularization of Chainer and CuPy in collaboration with supporting companies and the OSS community.

 

◆ About the Chainer(TM) Open Source Deep Learning Framework

Chainer is a Python-based deep learning framework developed and provided by PFN, which has unique features and powerful performance that allow for designing complex neural networks easily and intuitively, thanks to its “Define-by-Run” approach. Since it was open-sourced in June 2015, as one of the most popular frameworks, Chainer has attracted not only the academic community but also many industrial users who need a flexible framework to harness the power of deep learning in their research and real-world applications.

Chainer quickly incorporates the results of the latest deep learning research. With additional packages such as ChainerRL (reinforcement learning), ChainerCV (computer vision), and Chainer Chemistry (a deep learning library for chemistry and biology), and through the support of Chainer development partner companies, PFN aims to promote the most advanced research and development activities of researchers and practitioners in each field. (http://chainer.org/)

Preferred Networks released open source deep learning framework Chainer v4 and general-purpose array calculation library CuPy v4 (April 17, 2018)
https://www.preferred.jp/en/news/pr20180417_2/

Tokyo, Japan, April 17, 2018 — Preferred Networks, Inc. (PFN, Headquarters: Chiyoda-ku, Tokyo, President and CEO: Toru Nishikawa) has released v4 of Chainer™ and CuPy™, major updates of the open source deep learning framework and the general-purpose array calculation library, respectively.

This major upgrade to Chainer and CuPy incorporates the results of the latest deep learning research over the last six months. The newly released v4 is largely compatible with previous versions of Chainer.

 

Main features of Chainer and CuPy v4 include:

Additional functions for fast, memory-efficient training on NVIDIA(R) GPUs *1

Chainer now supports NVIDIA Tensor Cores to speed up convolution operations. Loss scaling has also been implemented to alleviate gradient underflow when using half-precision floats.

Quick installation of CuPy

We have begun providing a binary package of CuPy to reduce the installation time from 10 minutes down to about 10 seconds.

Optimized for Intel(R) Architecture

An Intel Deep Learning Package (iDeep) *2 backend has been added to make training and inference on Intel CPUs faster. This delivers an 8.9-fold improvement of GoogLeNet (a neural network used for image recognition) inference speed on CPUs, according to our benchmark results*3.
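
The sketch below shows how the iDeep backend is typically enabled in Chainer, assuming the ideep4py package is installed; the toy model and input are illustrative.

    import numpy as np
    import chainer
    import chainer.links as L

    model = L.Linear(100, 10)

    # Convert parameters to iDeep's optimized memory layout, then enable
    # the backend; 'auto' falls back to NumPy when iDeep is unavailable.
    model.to_intel64()
    with chainer.using_config('use_ideep', 'auto'):
        x = np.random.rand(32, 100).astype(np.float32)
        y = model(x)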

More functions supporting second order differentiation

Enhanced support for second order differentiation, which was first introduced in v3, allows easier implementation of the latest networks and algorithms.

A new function to export results of training with Chainer in the Caffe format

A function to export Chainer’s computational procedure and learned weights in the Caffe format has been added as an experimental feature. This makes it easier to use the results of training with Chainer even in environments where Python cannot be executed. (Exporting to the ONNX format is also available via the onnx-chainer package.)

 

◆ Chainer Release Note: https://github.com/chainer/chainer/releases/tag/v4.0.0

◆ Update Guide: https://docs.chainer.org/en/latest/upgrade.html

 

Chainer and CuPy have incorporated a number of contributions from external developers. PFN will continue working with supporting companies and the OSS community to promote the development and popularization of Chainer and CuPy.

 

*1: http://docs.nvidia.com/deeplearning/sdk/mixed-precision-training/index.html

*2: A NumPy-compatible library for performing common deep learning arithmetic operations at high speed on Intel CPUs. https://github.com/intel/ideep

*3: Comparison of per-image processing time with iDeep enabled and disabled. The Intel Math Kernel Library was enabled in both cases; an Intel Xeon(R) CPU E5-2623 v3 was used.

 

About the Chainer Open Source Deep Learning Framework

Chainer is a Python-based deep learning framework developed mainly by PFN, which has unique features and powerful performance that allow for designing complex neural networks easily and intuitively, thanks to its “Define-by-Run” approach. Since it was open-sourced in June 2015, as one of the most popular frameworks, Chainer has attracted not only the academic community but also many industrial users who need a flexible framework to harness the power of deep learning in their research and real-world applications.

Chainer incorporates the results of the latest deep learning research. With additional packages such as ChainerMN (distributed learning), ChainerRL (reinforcement learning), and ChainerCV (computer vision), and through the support of Chainer development partner companies, PFN aims to promote the most advanced research and development activities of researchers and practitioners in each field. (http://chainer.org/)

Preferred Networks released open source deep learning framework Chainer v3 and NVIDIA GPU array calculation library CuPy v2 (October 17, 2017)
https://www.preferred.jp/en/news/pr20171017/

Preferred Networks, Inc. (PFN, Headquarters: Chiyoda-ku, Tokyo, President and CEO: Toru Nishikawa) has released Chainer v3, a major update of the open source deep learning framework Chainer(R), as well as NVIDIA(R) GPU array calculation library CuPy™ v2.

We release a major upgrade of Chainer every three months to quickly incorporate the results of the latest deep learning research. The newly released Chainer v3 runs most existing code without changes.

 

Main features of Chainer v3 and CuPy v2 include:

1. Automatic differentiation of second- and higher-order derivatives

Chainer now supports automatic differentiation of second- and higher-order derivatives for many functions. This enables users to easily implement deep learning methods that require second-order differentiation, following the equations written in papers.
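
As a minimal sketch of this feature, the snippet below computes the first and second derivatives of f(x) = x^3 with chainer.grad, keeping the graph of the first backward pass so it can be differentiated again; the input value is illustrative.

    import numpy as np
    import chainer
    from chainer import Variable

    x = Variable(np.array([2.0], dtype=np.float32))
    y = x ** 3

    # Retain the graph of the first backward pass so it can be
    # differentiated a second time.
    gx, = chainer.grad([y], [x], enable_double_backprop=True)
    ggx, = chainer.grad([gx], [x])
    print(gx)   # 3 * x**2 = 12
    print(ggx)  # 6 * x    = 12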

 

2. Improved CuPy memory allocation

In many neural networks, GPU memory efficiency improves significantly, and in some cases memory reallocation is reduced, increasing speed.

 

3. Sparse matrix support has been added to CuPy

Large-scale graph analysis and natural language processing, which were previously costly to implement on GPUs, can now be implemented more easily because sparse matrix calculations are available on the GPU.
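
As a minimal sketch of the new sparse matrix support, the snippet below builds a small CSR matrix on the GPU and multiplies it by a dense vector. In current CuPy releases the module lives at cupyx.scipy.sparse (it was exposed as cupy.sparse around the v2 release); the matrix contents are illustrative.

    import cupy
    import cupyx.scipy.sparse as csp

    # A 3x3 CSR matrix with nonzeros (0,0)=1, (0,2)=2, (1,1)=3.
    data = cupy.array([1.0, 2.0, 3.0], dtype=cupy.float32)
    indices = cupy.array([0, 2, 1], dtype=cupy.int32)
    indptr = cupy.array([0, 2, 3, 3], dtype=cupy.int32)
    A = csp.csr_matrix((data, indices, indptr), shape=(3, 3))

    x = cupy.ones(3, dtype=cupy.float32)
    print(A.dot(x))  # [3. 3. 0.]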

◆ Chainer Release Note: https://github.com/chainer/chainer/releases/tag/v3.0.0

Like its previous versions, Chainer v3 has incorporated a number of contributions from external developers. PFN will continue working with supporting companies and the OSS community to promote the development and popularization of Chainer.

 

◆ About the Chainer Open Source Deep Learning Framework

Chainer is a Python-based deep learning framework developed by PFN, whose unique features and powerful performance enable users to easily and intuitively design complex neural networks, thanks to its “Define-by-Run” approach. Since it was open-sourced in June 2015, Chainer, as one of the most popular frameworks, has attracted not only the academic community but also many industrial users who need a flexible framework to harness the power of deep learning in their research and real-world applications.

Chainer incorporates the results of the latest deep learning research. With additional packages such as ChainerMN (distributed learning), ChainerRL (reinforcement learning), ChainerCV (computer vision) and through the support of Chainer development partner companies, PFN aims to promote the most advanced research and development activities of researchers and practitioners in each field. (http://chainer.org/)

*Chainer(R) and CuPy(TM) are trademarks or registered trademarks of Preferred Networks, Inc. in Japan and other countries.
