Roughly a year ago, Google announced the launch of Vertex AI, a managed AI platform designed to help companies accelerate the deployment of AI models. To mark the service's anniversary and the kickoff of Google's Applied ML Summit, Google this morning announced new features heading to Vertex, including a dedicated server for AI system training and "example-based" explanations.
"We launched Vertex AI a year ago with a goal to enable a new generation of AI that empowers data scientists and engineers to do fulfilling and creative work," Henry Tappen, Google Cloud group product manager, told TechCrunch via email. "The new Vertex AI features we’re launching today will continue to accelerate the deployment of machine learning models across organizations and democratize AI so more people can deploy models in production, continuously monitor and drive business impact with AI."
As Google has historically pitched it, the benefit of Vertex is that it brings together Google Cloud services for AI under a unified UI and API. Customers including Ford, Seagate, Wayfair, Cashapp, Cruise and Lowe's use the service to build, train and deploy machine learning models in a single environment, Google claims -- moving models from experimentation to production.
Vertex competes with managed AI platforms from cloud providers like Amazon Web Services and Azure. Technically, it falls into the category of MLOps platforms, named for a set of best practices for businesses running AI. Deloitte predicts the market for MLOps will be worth $4 billion in 2025, growing nearly 12x since 2019.
Gartner projected that the emergence of managed services like Vertex would cause the cloud market to grow 18.4% in 2021, with cloud predicted to make up 14.2% of total global IT spending. "As enterprises increase investments in mobility, collaboration and other remote working technologies and infrastructure, growth in public cloud [will] be sustained through 2024," Gartner wrote in a November 2020 study.
Among the new features in Vertex is the AI Training Reduction Server, a technology that Google says optimizes the bandwidth and latency of multisystem distributed training on Nvidia GPUs. In machine learning, "distributed training" refers to spreading the work of training a system across multiple machines, GPUs, CPUs or custom chips, reducing the time and resources it takes to complete the training.
"This significantly reduces the training time required for large language workloads, like BERT, and further enables cost parity across different approaches," Andrew Moore, VP and GM of cloud AI at Google, said in a post today on the Google Cloud blog. "In many mission critical business scenarios, a shortened training cycle allows data scientists to train a model with higher predictive performance within the constraints of a deployment window."
In preview, Vertex also now features Tabular Workflows, which aims to bring greater customizability to the model creation process. As Moore explained, Tabular Workflows allows users to choose which parts of the workflow they want Google's "AutoML" technology to handle versus which parts they want to engineer themselves. AutoML, or automated machine learning -- which isn't unique to Google Cloud or Vertex -- encompasses any technology that automates aspects of AI development. It can span every development stage, from starting with a raw dataset to building a machine learning model ready for deployment. AutoML can save time but can't always beat a human touch -- particularly where precision is required.
"Elements of Tabular Workflows can also be integrated into your existing Vertex AI pipelines," Moore said. "We’ve added new managed algorithms including advanced research models like TabNet, new algorithms for feature selection, model distillation and … more."
Germane to development pipelines, Vertex is also gaining an integration (in preview) with serverless Spark, the serverless version of the Apache-maintained open source analytics engine for data processing. Now, Vertex users can launch a serverless Spark session to interactively develop code.
Elsewhere, customers can analyze features of data in Neo4j's platform and then deploy models using Vertex courtesy of a new partnership with Neo4j. And -- thanks to a collaboration between Google and Labelbox -- it's now easier to access Labelbox's data-labeling services for images, text, audio and video data from the Vertex dashboard. Labels are necessary for most AI models to learn to make predictions; the models train to identify the relationships between labels, also called annotations, and example data (e.g., the caption "frog" and a photo of a frog).
In the event that data becomes mislabeled, Moore proffers Example-based Explanations as a solution. Available in preview, the new Vertex feature leverages "example-based" explanations to help diagnose and treat issues with data. Of course, no explainable AI technique can catch every error; computational linguist Vagrant Gautam cautions against over-trusting tools and techniques used to explain AI.
"Google has some documentation of limitations and a more detailed white paper about explainable AI, but none of this is mentioned anywhere [today's Vertex AI announcement]," they told TechCrunch via email. "The announcement stresses that 'skills proficiency should not be the gating criteria for participation' and that the new features they provide can 'scale AI for non-software experts.' My concern is that non-experts have more faith in AI and in AI explainability than they should and now various Google customers can build and deploy models faster without stopping to ask whether that is a problem that needs a machine learning solution in the first place, and calling their models explainable (and therefore trustworthy and good) without knowing the full extent of the limitations around that for their particular cases."
Still, Moore suggests that Example-based Explanations can be a useful tool when used in tandem with other model auditing practices.
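As a rough illustration of how an example-based approach might surface mislabeled data, the sketch below assumes a simple nearest-neighbor check: for each training example, look up the most similar other examples and flag it when its label disagrees with theirs. The feature vectors, labels and function names are made up for illustration, and this is not Google's implementation.

```python
import math

# Hypothetical sketch of an example-based check for mislabeled data.
# Not the Vertex AI API; features and labels below are invented.

def nearest_examples(query, examples, k=3):
    """Return the k examples closest to `query` by Euclidean distance."""
    return sorted(examples, key=lambda ex: math.dist(query, ex[0]))[:k]

def suspicious_labels(examples, k=3):
    """Flag examples whose label disagrees with most of their neighbors."""
    flagged = []
    for i, (features, label) in enumerate(examples):
        others = examples[:i] + examples[i + 1:]  # exclude the example itself
        votes = [lab for _, lab in nearest_examples(features, others, k)]
        majority = max(set(votes), key=votes.count)
        if majority != label:
            flagged.append((features, label, majority))
    return flagged

training_set = [
    ((0.0, 0.1), "frog"), ((0.1, 0.0), "frog"), ((0.2, 0.1), "frog"),
    ((0.1, 0.2), "toad"),  # likely mislabeled: it sits among the frogs
    ((5.0, 5.1), "toad"), ((5.1, 5.0), "toad"), ((5.2, 5.1), "toad"),
]
flagged = suspicious_labels(training_set)
```

Here `flagged` holds the one "toad" example surrounded by "frog" neighbors, along with the label its neighbors suggest. A check like this only points a reviewer at candidates; as Gautam notes, it cannot catch every error.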
"Data scientists shouldn’t need to be infrastructure engineers or operations engineers to keep models accurate, explainable, scaled, disaster resistant and secure, in an ever-changing environment," Moore added. "Our customers demand tools to easily manage and maintain machine learning models."