Our deep learning platform, built on the Smart Genie Framework, is fast, scalable, secure, portable, and automated, and it supports open-source distributed runtimes such as Hadoop and Spark.
The Smart Genie Engine is an application-specific integrated circuit (ASIC) custom-designed and optimized for deep learning. It includes everything you need for deep learning and nothing you don't, delivering a 20x increase in training speed and ensuring that Smart Genie Cloud will remain the world's fastest deep learning platform for the foreseeable future. The Smart Genie Engine delivers deep learning at ludicrous speed.
Blazingly fast data access via high-bandwidth memory
Training deep learning networks involves moving a lot of data, and current memory technologies are simply not up to the task. The Smart Genie Engine uses a new memory technology called High Bandwidth Memory that is both high-capacity and high-speed, providing 64 GB of on-chip storage and a blazingly fast 16 terabits per second of memory bandwidth.
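As a rough sanity check on these figures, the arithmetic below converts the quoted 16 terabits per second into bytes per second and computes how long one full pass over the 64 GB of on-chip memory would take. This is back-of-envelope math on the numbers stated above, not a measured benchmark:

```python
# Figures quoted above: 64 GB on-chip HBM, 16 Tb/s memory bandwidth.
capacity_bytes = 64 * 10**9            # 64 GB of on-chip storage
bandwidth_bits_per_s = 16 * 10**12     # 16 terabits per second
bandwidth_bytes_per_s = bandwidth_bits_per_s // 8  # = 2 TB/s

# Time to stream the entire on-chip memory once at full bandwidth:
seconds_per_full_pass = capacity_bytes / bandwidth_bytes_per_s

print(bandwidth_bytes_per_s / 10**12, "TB/s")            # bandwidth in bytes
print(round(seconds_per_full_pass * 1000, 3), "ms/pass") # ~32 ms per full pass
```

In other words, at the stated bandwidth the engine can sweep its entire on-chip memory roughly thirty times per second.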
Unprecedented computing power
The Smart Genie Engine design consists mostly of multipliers and local memory, omitting elements such as caches that are needed for graphics processing but not for deep learning. As a result, the Smart Genie Engine achieves unprecedented compute density and an order of magnitude more raw computing power than today's state-of-the-art GPUs.
Throughput near the theoretical limit
The Smart Genie Engine has separate pipelines for computation and data management, so new data is always available for computation. This pipeline isolation, combined with plenty of local memory, means that the Smart Genie Engine can run near its theoretical maximum throughput much of the time.
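In software terms, separating data management from computation is the classic double-buffering pattern: one pipeline stages the next batch while the other computes on the current one. The sketch below is a minimal illustration of that idea (an assumption about how the pattern looks in code, not the engine's actual hardware design):

```python
import threading
import queue

# Double-buffering sketch: a staging thread ("data management" pipeline)
# keeps batches queued so the consumer ("computation" pipeline) never
# stalls waiting for data.

def produce(batches, staged):
    for batch in batches:
        staged.put(batch)   # stage the next batch while compute runs
    staged.put(None)        # sentinel: no more data

def consume(staged, results):
    while (batch := staged.get()) is not None:
        results.append(sum(batch))  # stand-in for the real computation

staged = queue.Queue(maxsize=2)     # two slots = a classic double buffer
results = []
batches = [[1, 2], [3, 4], [5, 6]]

t = threading.Thread(target=produce, args=(batches, staged))
t.start()
consume(staged, results)
t.join()
print(results)  # [3, 7, 11]
```

The bounded queue is the key design choice: with two slots, the producer can stay exactly one batch ahead, which is why compute rarely waits and throughput can sit near its theoretical ceiling.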
Built-in networking for unprecedented speed and scalability
The Smart Genie Engine includes six bi-directional high-bandwidth links, enabling ASICs to be interconnected so that data can move between them, and even between chassis, seamlessly. This lets users get linear speedup on their current models simply by assigning more compute to the task, or expand their models to unprecedented sizes without any decrease in speed. Competing systems use oversubscribed, low-bandwidth PCIe buses for all communication, which greatly limits their ability to improve performance by adding more hardware.