AWS Machine Learning Specialty Exam Resources

Preparing for the AWS Machine Learning Exam 2020

This post is intended for those who are preparing for the AWS Machine Learning Specialty Exam. Unlike the more established AWS exams, the Machine Learning Specialty Exam is still fairly new and does not yet have as many online resources. It took me a while to find good study materials, so I decided to share what I found and what I did to help others who might be interested.

So why take the AWS Machine Learning Specialty Exam?

1. It validates your machine learning skills.

2. The exam is a good way to get back into the field if you stepped out, changed roles, or need a refresher for one reason or another.

Here are some of the resources I found. There might be more out there but since I ended up having less than a week to prepare, these are the resources I used. Please keep in mind that I was a Data Scientist for several years and also took the AWS Solutions Architect Associate Exam earlier in the year.

Resources Used


1. Whizlabs machine learning practice tests (2 full-length practice tests and 1 half-length test).

I did not go through all the questions they had but I liked the format and found the questions helpful. This is not a free resource but is offered at a discount periodically.

2. Machine Learning Specialty Practice Exam — this is not a full-length exam, but like the Whizlabs tests it provides a summary report that confirmed where my weaknesses were. This is also not free (unless you have a discount voucher from AWS).

3. AWS Exam Guide and Sample Questions

While the guide was useful, I was more interested in the sample questions and their explanations. I scanned most of the links and papers suggested in the explanations. You can download these two resources from the machine learning specialty section here: https://www.whizlabs.com/learn/courses

4. Good old Google and the AWS site. Having tried a few questions, I was pretty aware of where I was weak. I spent some time googling areas where I needed clarification and then decided it was worth going to the source. I went to the AWS site and read the documentation on the Amazon SageMaker built-in algorithms and on some of the key analytics products such as Kinesis Data Streams, AWS Glue, Amazon Rekognition, Amazon Comprehend, etc.

5. Make use of the free webinars or digital courses. I did not have a chance to attend the webinar, so instead I used these two digital courses.

Exam Readiness: AWS Certified Machine Learning Specialty -> this course helped cement how these concepts are applied in an AWS environment. The course follows the structure of the exam guide and has questions at the end of each section. At the end of the course there are about 35 questions that test your understanding of the material, with answers and explanations provided.

Elements of Data Science -> I scanned through the sections of this course for about an hour. My focus was on understanding how all the AWS services tie together to provide end-to-end solutions to different problems.

The AWS documentation on streaming was great for this.

6. Miscellaneous — other things I found myself doing that were helpful:

a. Association — I tried to recall what I did when building and deploying machine learning pipelines and reflected on how I would have done it in an AWS environment. For products such as Amazon Comprehend, Linear Learner, and AWS Glue, I thought about similar products I had built, come across, or used.

b. Practice — for some of the AWS topics such as AMIs, CloudFormation, and importing an external model into SageMaker, I was able to lean on what I recalled from hands-on practice.

c. Mnemonics — for some confusing or hard-to-remember topics, I created phrases or rhymes to remember them.

d. AWS Solutions Architect concepts — I found these helpful here as well, as they made it easier to understand how to put together scalable or highly available data solutions and how security should be handled.

Data Collection and Exploration

This section is the first part of the notes I put together while prepping for the AWS exam. I find that every few years I come back to refresh my mind on similar concepts, so the idea of creating high-level reference notes appealed to me.

Think of data science as a process for getting insights from data by identifying patterns in the data. If you can apply heuristics to a problem or solve it manually and at scale, then you do not need machine learning.

Data Collection / Ingestion

Common tools for moving batch data include:

AWS Database Migration Service (DMS), used for migrating databases, and AWS Glue.

AWS tools for streaming data: Kinesis Data Streams, Kinesis Data Firehose, and Kinesis Video Streams.

For more on AWS-specific data ingestion tools, see the links in the references at the end of this post. Take note of which services are fully managed and the kind of security provided with each option.
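As an illustration of the streaming side, here is a minimal boto3 sketch that pushes one record into a Kinesis data stream. The stream name, region, and record fields are made up for the example, and the stream is assumed to already exist.

```python
import json
import boto3

# Assumes a Kinesis data stream named "clickstream" already exists in this
# region; the stream name and record fields are hypothetical.
kinesis = boto3.client("kinesis", region_name="us-east-1")

record = {"user_id": "u-123", "event": "page_view", "ts": "2020-08-19T12:00:00Z"}

kinesis.put_record(
    StreamName="clickstream",
    Data=json.dumps(record).encode("utf-8"),
    PartitionKey=record["user_id"],  # determines which shard receives the record
)
```

Kinesis Data Firehose has a similar put_record call, but it delivers the data on to a destination such as S3 without you managing shards or consumers yourself.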

Data Transformation

A natural step after ingesting data is transforming it. Aside from just ingesting data, some of these tools also extract, transform, and load (ETL) the data. In AWS, the main tools for data ingestion, ETL, or quick analytics are AWS Glue (non-streaming data; works with SQL and Spark) and Kinesis Data Analytics (streaming data, near real time).

Other data transformation tasks include removing duplicates, dealing with incomplete data, standardization, and changing data structures or formats. It also includes cleaning up special characters, ensuring that columns with measurements or units are on the same scale, that the same language is used where necessary, that the data is represented as one feature per column, that capitalization is consistent, and that missing data and outliers are handled. The treatment of outliers and missing data is especially important for machine learning.
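As a rough sketch of what a few of these clean-up steps look like in pandas (the column names and values below are made up purely for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: inconsistent capitalization, mixed units,
# a duplicate row, and missing values.
df = pd.DataFrame({
    "city":   ["Austin", "austin", "Boston", "Boston", None],
    "height": [1.75, 1.75, 180.0, 1.68, 1.80],   # metres mixed with centimetres
    "weight": [70.0, 70.0, 82.0, np.nan, 95.0],
})

df["city"] = df["city"].str.title()                        # consistent capitalization
df = df.drop_duplicates()                                   # remove exact duplicate rows
df.loc[df["height"] > 3, "height"] /= 100                   # put all heights in metres
df["weight"] = df["weight"].fillna(df["weight"].median())   # impute missing values
df = df.dropna(subset=["city"])                             # drop rows missing a key field
print(df)
```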

Exploratory Data Analysis

The main way to understand data is through exploratory data analysis. Exploratory analysis can be done with descriptive statistics (overall statistics, correlation matrices, attribute statistics, scatter plots, etc.). When preparing data for machine learning, understanding the relationships between variables is important. For instance, if two variables are highly correlated, you typically want to keep only one of them.
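In pandas, the overall statistics and the correlation matrix are one-liners; the data below is invented just to show the calls:

```python
import pandas as pd

# Made-up numeric data, purely for illustration.
df = pd.DataFrame({
    "age":    [23, 35, 31, 52, 46, 29],
    "income": [38000, 62000, 55000, 91000, 78000, 47000],
    "tenure": [1, 4, 3, 10, 8, 2],
})

print(df.describe())   # count, mean, std, min, quartiles, max for each column
print(df.corr())       # pairwise Pearson correlation matrix

# If two features have a correlation coefficient close to +/-1,
# consider keeping only one of them before training a model.
```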

Aside from looking at summary statistics, data visualization can also provide meaningful insights about the data. Common visualization techniques include, but are not limited to, the following (a small matplotlib sketch of a few of them appears after the list):

a) Scatter plot — shows the relationship between two numeric variables and the nature of their correlation: positive, negative, or none. A scatter plot matrix is a grid of scatter plots showing the relationships of several variables in the data set to one another.

b) Histogram — shows the distribution (shape and spread) of an individual variable in the data set. For instance, a histogram of the number of defects per hour or of the BMI of a population.


c) Boxplot (box and whisker plot) — shows the distribution of groups of data using quartiles. For instance, a box plot showing age by gender.

d) Correlation matrix — a table showing the correlation coefficients between pairs of variables.

e) Heat map — uses color to depict the magnitude of an effect, with deeper hues indicating higher magnitudes. For instance, a heat map showing the volume of orders by state.

f) Bubble chart — a scatter plot variant that uses the size of each bubble to encode a third dimension of the data. For instance, a bubble chart showing the relationship between GDP per capita and number of exports by country.
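As promised above, here is a small matplotlib sketch covering three of these chart types (scatter plot, histogram, and boxplot). The data is randomly generated and purely illustrative:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Made-up data: heights/weights, BMI values, and age split by gender.
height = rng.normal(1.70, 0.10, 500)
weight = 25 * height ** 2 + rng.normal(0, 5, 500)
bmi = rng.normal(26, 4, 500)
age_f = rng.normal(38, 12, 200)
age_m = rng.normal(41, 13, 200)

fig, axes = plt.subplots(1, 3, figsize=(12, 3))

axes[0].scatter(height, weight, s=8)                 # a) relationship between two numeric variables
axes[0].set(title="Scatter plot", xlabel="height (m)", ylabel="weight (kg)")

axes[1].hist(bmi, bins=30)                           # b) distribution of a single variable
axes[1].set(title="Histogram", xlabel="BMI")

axes[2].boxplot([age_f, age_m], labels=["F", "M"])   # c) distribution by group, using quartiles
axes[2].set(title="Boxplot", ylabel="age")

plt.tight_layout()
plt.show()
```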

References Used

https://www.8bitmen.com/what-is-data-ingestion-how-to-pick-the-right-data-ingestion-tool/

https://docs.aws.amazon.com/glue/latest/dg/what-is-glue.html

https://www.statstutor.ac.uk/resources/uploaded/pearsons.pdf

https://www.whizlabs.com/learn/courses

https://docs.aws.amazon.com/whitepapers/latest/building-data-lakes/data-ingestion-methods.html

https://aws.amazon.com/dms/

https://www.aws.training/Details/eLearning?id=26598
