Execute Machine Learning Jobs in Microsoft Azure Databricks from StreamSets
In my previous blog post, I demonstrated how to achieve low-latency inference using Databricks ML models in StreamSets. Now let's say you have a dataflow pipeline that is ingesting data, enriching it,...
Scaling Data Collectors on Azure Kubernetes Service
In this blog post, I will present a step-by-step guide on how to scale Data Collector instances on Azure Kubernetes Service (AKS) using provisioning agents—which help automate upgrading and scaling...
Field Mapper Processor: The Swiss Army Knife of Bulk Field Manipulation
Guest post by Jeff Evans, Senior Software Engineer, StreamSets. The Field Mapper processor, introduced in Data Collector version 3.8.0, provides a flexible and powerful way to manipulate fields en...
Announcing StreamSets Data Collector 3.9.0 and StreamSets Data Collector Edge...
StreamSets is excited to announce the immediate availability of StreamSets Data Collector 3.9.0 and StreamSets Data Collector Edge 3.9.0. StreamSets Data Collector is open source under Apache License...
Announcing StreamSets Data Collector 3.10.0 and StreamSets Data Collector...
StreamSets is excited to announce the immediate availability of StreamSets Data Collector 3.10.0 and StreamSets Data Collector Edge 3.10.0. StreamSets Data Collector is open source under Apache License...
StreamSets Transformer Extensibility: Spark and Machine Learning
Apache Spark has been on the rise for the past few years, and it continues to dominate the landscape when it comes to in-memory and distributed computing, real-time analysis and machine learning use...
StreamSets Transformer Extensibility — Part 2: Spark MLeap Bundles to S3
In part 1, you learned how to extend StreamSets Transformer to train a Spark ML RandomForestRegressor model. In this part 2, you will learn how to create a Spark MLeap bundle to serialize the...
StreamSets Cloud Unlocking Insights: Amazon S3 to Snowflake
StreamSets Cloud is a cloud service for designing, deploying and operating smart data pipelines, combining ease and scalability with the flexibility to execute pipelines anywhere – on-premises, or in a...
Announcing StreamSets Data Collector 3.11.0 and StreamSets Data Collector...
StreamSets is excited to announce the immediate availability of StreamSets Data Collector 3.11.0 and StreamSets Data Collector Edge 3.11.0. StreamSets Data Collector is open source under Apache License...
StreamSets Transformer: Your Questions Answered
StreamSets Transformer, a powerful tool for creating highly instrumented Apache Spark applications for modern ETL, is the newest addition to the StreamSets DataOps Platform. StreamSets enables...
StreamSets Data Collector: Simple Network Management Protocol And Management...
This is a guest post by Clark Bradley, Solutions Engineer, StreamSets. SNMP stands for Simple Network Management Protocol and allows network devices to share information. SNMP is supported across a...
StreamSets Transformer: Design Patterns For Slowly Changing Dimensions
In this blog, we will look at a few design patterns for Slowly Changing Dimensions (SCD) Type 2 and see how StreamSets Transformer, the newest addition to the StreamSets DataOps Platform, makes it easy...
Announcing StreamSets Data Collector 3.12.0 and StreamSets Data Collector...
StreamSets is excited to announce the immediate availability of StreamSets Data Collector 3.12.0 and StreamSets Data Collector Edge 3.12.0. StreamSets Data Collector is open source under Apache License...
StreamSets Transformer: Natural Language Processing in PySpark
In two of my previous blogs I illustrated how easily you can extend StreamSets Transformer using Scala: 1) to train a Spark ML RandomForestRegressor model, and 2) to serialize the trained model and save...