
Review of re:Invent 2021 AI/ML Releases by a Former SageMaker PM

Opinion

An in-depth analysis of all the new MLOps tools by AWS

Every December, Amazon Web Services (AWS) releases new machine learning (ML) capabilities for their customers. It’s an action-packed event in Las Vegas that usually sells out. There’s a lot of news, and I’m going to distill it down for the data science community.

Image of Conference from Unsplash

I previously wrote about Cloud MLOps Platforms on Towards Data Science, and I'm currently building an applied ML startup in stealth. Before that, I was a Senior Product Manager at AWS SageMaker, the Computer Vision (AR/VR) Data, Tools, and Operations Lead at Facebook, and a founder of an applied ML hedge fund that invested in emerging market bonds. Feel free to tweet at me or message me on LinkedIn with your comments.

Over the last two years, I've found a simple framework for categorizing ML products: applied AI/ML services, MLOps platforms, and ML frameworks. At the very bottom are frameworks, spanning ML libraries (PyTorch, JAX, XGBoost) down to compilers and compute silicon (GPUs, ARM, TPUs, ASICs). In the middle are MLOps platform SDKs: training and inference systems, metadata management, data processing, workflow engines, and notebook environments. At the very top are AI/ML services, which abstract away the layers below. AWS is happy to sell you AI services such as Comprehend or Textract, but almost every enterprise I spoke with prefers to have its own ML team manage the complexity and glue the right tools together. My framework breakdown actually appears in one of the standard AWS AI/ML marketing slides.

Image from Unsplash

SageMaker products span from low-level frameworks to AI services. Their positioning aims to be a jack of all trades, which scores well on the Gartner Magic Quadrant but whose efficacy remains widely debated by CIOs and ML team managers. Nevertheless, at 2021 re:Invent, SageMaker blurred the lines between the MLOps platform and AI services layers, while also shipping new deep learning tools at the frameworks layer.

Without a doubt, SageMaker put heavy emphasis on deep learning (DL) capabilities this year. Back in July, their leadership did something unprecedented by partnering with Hugging Face on a direct product collaboration for training and inference. AWS's track record of partnering with open-source startups is spotty compared to other cloud providers, but Hugging Face's dominance in natural language processing (NLP) libraries made the synergies highly desirable.

Training Compiler — The product aims to reduce the training time of DL models by using AWS proprietary libraries for running tensor operations. DL models are composed of multidimensional matrices (tensors), and each layer of a neural network runs a series of mathematical operations during training. Each type of operation (add, subtract, etc.) can be classified as an operator; modern frameworks expose hundreds to thousands of them, and you can read more about this topic from my friend Chip. AWS chose to emphasize NLP models, a bet that customers' heaviest deep learning usage is with text. In my own CIO calls, I've seen similar trends. However, providing a product generally usable by a range of NLP customers has several challenges that I'm skeptical have been overcome.
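To make the operator idea concrete, here is a toy sketch in plain Python (not a real framework) of a single dense layer expressed as a graph of named operators, which is the kind of representation a training compiler optimizes:

```python
# Toy operator graph for one dense layer: y = relu(x @ W + b).
# Each function is one "operator"; a training compiler sees the model
# as a graph of such operators and fuses/optimizes them.

def matmul(x, w):
    # naive vector-matrix product: one output per column of w
    return [sum(xi * wij for xi, wij in zip(x, col)) for col in zip(*w)]

def add(v, b):
    return [vi + bi for vi, bi in zip(v, b)]

def relu(v):
    return [max(0.0, vi) for vi in v]

# The "graph" here is just an ordered list of operator names.
OPERATOR_GRAPH = ["matmul", "add", "relu"]

def forward(x, w, b):
    return relu(add(matmul(x, w), b))
```

A real framework has far more operators (convolutions, attention, normalizations, and so on), which is exactly why full compiler coverage is hard.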

First, SageMaker requires its own container for running training jobs. If operators are missing, you won't be able to run an optimized training job until the library supports them, and fallback mechanisms introduce big bottlenecks that slow jobs down. The FAQ for the product highlights these concerns: "Will I always get a faster training job with SageMaker Training Compiler? No, not necessarily." Second, if you need full control of the container and what it installs, you'll need white-glove service from the AWS team. Third, if you're iterating on the model many times in the experimentation phase, the added compilation exacerbates training job start times, which slows development velocity. It would be hard for me to recommend this to any ML team; even running a POC is unlikely to be worth the effort for most deep learning use cases.
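The operator-coverage concern can be illustrated with a toy check. The operator names and coverage set below are hypothetical, not the actual Training Compiler internals; the point is that one unsupported operator forces the whole job down the uncompiled path:

```python
# Hypothetical illustration of compiler operator coverage and fallback.
# Real coverage lists are internal to SageMaker Training Compiler.

SUPPORTED_OPS = {"matmul", "add", "relu", "softmax", "layernorm"}

def compilation_plan(model_ops):
    """Return ('compiled', []) if every operator is supported,
    else ('fallback', missing_ops): the job runs uncompiled."""
    missing = sorted(set(model_ops) - SUPPORTED_OPS)
    if missing:
        return "fallback", missing
    return "compiled", []
```

A single custom layer, common in research-grade NLP models, is enough to lose the advertised speedup entirely.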

Ground Truth Plus — Ground Truth Plus lets companies submit a project request, after which SageMaker program managers match the project to a set of workers that they manage. The only difference between Plus and standard Ground Truth is the worker management. For most deep learning data needs (audio transcription, segmentation, classification, and even 3D point cloud labelling), many startups have years of experience providing these services. Examples include the private startup Scale AI, the socially responsible data annotation vendor Samasource, and Canada's Telus (through its acquisition of Lionbridge).

SageMaker has also released a few new features for their Studio Notebooks.

SageMaker Studio Lab — A free service for running notebooks similar to Google Colab. This product is great for the enthusiast community who wants to learn more about ML and access free compute. However, this isn’t going to help most enterprise customers. I’m on the waitlist to use the product, so I’ll reserve deeper comments until after I’ve used it.

SageMaker Canvas — If you've used SageMaker Autopilot, Canvas wraps that product with more Studio graphical tools to minimize coding. For business analysts without Python data science experience in organizations that want to do quick ML experimentation, I can see a clear benefit in exposing this feature to them. In other words, any company with a large Snowflake or Redshift user base can potentially optimize TCO by having its data analysts first run POCs with Canvas before pulling in data scientists for support. The challenge remains that the type of problems these cookie-cutter models can solve is narrow, and a large part of the problem in tabular ML lies in the data processing. Canvas isn't available in us-west-1 as of writing, so I can share more details after playing around with it.
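The cookie-cutter tabular workflow that tools like Canvas automate can be sketched in a few lines. This is a deliberately minimal model search over a made-up churn dataset, purely to show the shape of the loop, not Canvas's actual AutoML:

```python
# Minimal "AutoML" sketch: try trivial candidate models on a tabular
# dataset and keep the one with the best training accuracy.
# Dataset and candidates are hypothetical, for illustration only.

rows = [  # (age, income_k, churned)
    (25, 40, 1), (47, 95, 0), (33, 60, 0),
    (52, 120, 0), (29, 35, 1), (41, 80, 0),
]

def majority_baseline(row):
    return 0  # always predict the most common class

def income_threshold(row):
    return 1 if row[1] < 50 else 0  # churn if income below 50k

def accuracy(model):
    return sum(model(r) == r[2] for r in rows) / len(rows)

candidates = {"majority": majority_baseline, "income<50k": income_threshold}
best_name = max(candidates, key=lambda n: accuracy(candidates[n]))
```

Real AutoML searches a far richer space, but the hard part it cannot automate, as noted above, is getting the tabular data cleaned and joined in the first place.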

SageMaker Studio Spark Connector — Spark is probably one of the most widely used distributed data processing systems of all time. For data preprocessing before ML, this connector is a starting point for simplifying the developer experience. A majority of Fortune 500 companies have some Spark deployment, whether on instances (bare metal), on Kubernetes (Spark Operator), on Databricks, or, in AWS's case, on EMR. While I've seen more Databricks than EMR customers, non-EMR Spark support may be coming down the line. It's worth noting that this is a developer experience enhancement and doesn't improve performance. Large-scale distributed processing jobs run best when the data and computations are colocated, yet EMR for data processing and SageMaker for ML model training run in two entirely separate compute environments.
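The colocation point is easy to quantify with back-of-envelope arithmetic. The dataset size and throughput below are hypothetical numbers chosen for illustration:

```python
# Back-of-envelope cost of moving a preprocessed dataset from one
# compute environment (e.g., EMR) to another (e.g., SageMaker training).
# 500 GB and 5 Gb/s are hypothetical, for illustration only.

def transfer_seconds(dataset_gb, effective_gbps):
    """Seconds to move dataset_gb at an effective network throughput
    of effective_gbps (gigabits per second)."""
    gigabits = dataset_gb * 8
    return gigabits / effective_gbps

overhead = transfer_seconds(500, 5)  # 800 seconds, roughly 13 minutes
```

That overhead is paid on every training run, which is why colocated data and compute matter more than a nicer connector.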

Finally, there was also SageMaker Inference Recommender. SageMaker and the majority of AWS products rely on the concept of instances: SageMaker compute jobs essentially run on the standard EC2 node pool with a custom runtime, AMI, and container. Since instances are discrete but traffic and workloads follow a continuous distribution, many enterprises I've worked with run ML workloads on Kubernetes. Tools like Knative on Kubernetes can elastically scale a persistent service from zero and use spot instances through a resource configuration file. Libraries such as Seldon Core and KServe package these features, along with others, into an installable Kubernetes manifest.

SageMaker Inference Recommender attempts to close that gap by providing latency and throughput metrics for each instance type; you pay for the instance compute costs. Rather than make the larger investment of making model serving serverless or tying it to a company's existing compute cluster, SageMaker chose to apply a band-aid. If a customer really needed this feature, it's pretty straightforward to write a script that does what Inference Recommender offers. In fact, Intuit, a major customer, did just that and open-sourced their code on GitHub earlier this year.
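A sketch of such a DIY script follows. The latency samples, instance names as candidates, and hourly prices are all made up for illustration; a real version would collect the samples by load-testing each deployed endpoint:

```python
# DIY inference recommender sketch: given per-instance latency samples
# (ms) and hourly prices (both hypothetical), pick the cheapest
# instance type whose p99 latency meets the SLA.

def p99(samples):
    ordered = sorted(samples)
    idx = min(len(ordered) - 1, int(0.99 * len(ordered)))
    return ordered[idx]

def recommend(benchmarks, prices, sla_ms):
    eligible = [t for t, s in benchmarks.items() if p99(s) <= sla_ms]
    if not eligible:
        return None  # no instance type meets the SLA
    return min(eligible, key=lambda t: prices[t])

benchmarks = {
    "ml.c5.xlarge":   [40, 42, 45, 55, 90],
    "ml.g4dn.xlarge": [12, 13, 15, 18, 25],
}
prices = {"ml.c5.xlarge": 0.20, "ml.g4dn.xlarge": 0.74}  # $/hr, hypothetical
choice = recommend(benchmarks, prices, sla_ms=50)
```

Twenty lines of benchmarking glue is roughly the scope of the gap the managed feature fills.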

Managed services still provide tremendous value for companies. They lower the total cost of ownership for an IT organization by offloading mundane configuration and setup. However, the field changes too quickly, and at this point there is only one real endgame for SageMaker and the other major MLOps platforms: they need to run their tools and compute engine on Kubernetes. Microsoft Azure and GCP already offer some form of this in their ML platforms.

The ecosystem of open-source MLOps tools on Kubernetes has grown quickly. Some of the big names, such as Kubeflow Pipelines, have become very popular products in the enterprise. SageMaker could have a large share of the MLOps pie, but today it takes a mostly walled-garden approach. That means, unfortunately, ML teams will have to continue mixing and matching open-source, homegrown, and multiple vendor services to make end-to-end ML workflows run well.

In short, the 2021 re:Invent AI/ML announcements were lacklustre, to say the least. It's not for lack of customer demand either: Morgan Stanley CIO surveys show analytics (AI/ML) remains a top-five consideration quarter after quarter. Enterprises should continue to invest in their own MLOps teams that can selectively pick which solutions make sense for their company. No enterprise with more than 25 data scientists will be able to run on tools like AWS SageMaker alone. Check out my previous post for a deeper dive on tooling options and the state of the ecosystem.

With all this said, as someone who has worked alongside many of the engineers in SageMaker, I'm confident they are investing in long-term, game-changing bets for the industry. However, it seems many of those bets were not ready for 2021 re:Invent.


Review of re:Invent 2021 AI/ML Releases by a Former SageMaker PM was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.


