Search

All (23)

Blog Posts (22)

Other Pages (1)

23 results found with an empty search

Blog Posts (22)

Getting started in Analytics & Data Science: UC Berkeley Executive Education
I'm thinking about graduate school and have an interesting project at work. To help guide both, I have started a 12-week learning adventure with UC Berkeley Haas Data Science: Bridging Principles and Practice. Week 0 was assessment and refresh.

Statistics for Business: Decision Making and Analysis, 3rd Edition

VARIABLES

categorical variable: A column of values in a data table that identifies cases with a common attribute. Sometimes called a qualitative or nominal variable (no order; see ordinal). Examples: fruit, types, zip codes, identification numbers. Zip codes and identification numbers look like numbers, but they are categorical, not continuous, variables.

ordinal variable: A categorical variable whose labels have a natural order (e.g., a rating scale where 10 is best and 1 is worst). A Likert scale is a measurement scale that produces ordinal data, typically with five to seven categories. Another example is Tiny, Small, Medium, Large, Jumbo.

numerical variable: A column of values in a data table that records numerical properties of cases (also called a continuous variable). Examples: amounts, dates, times.

measurement unit: A scale that defines the meaning of numerical data, such as weights measured in kilograms or purchases measured in dollars. CAUTION: The data that make up a numerical variable in a data table must share a common unit.

area principle: The area of a plot that shows data should be proportional to the amount of data.

time series: A sequence of data recorded over time.

timeplot: A graph of a time series showing the values in chronological order.

frequency: The time spacing of data recorded in a time series.

distribution: The collection of values of a variable and how often each occurs.

frequency table: A tabular summary that shows the distribution of a variable as counts.

relative frequency: The frequency of a category divided by the number of cases; a proportion or percentage.
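The frequency-table and relative-frequency definitions can be sketched in a few lines of Python. This is a minimal illustration, not course material: the `responses` list is made-up data, and `frequency_table` and `relative_frequency` are hypothetical helper names. Passing the category order explicitly keeps an ordinal variable (Tiny through Jumbo) in its natural order instead of alphabetical or first-seen order.

```python
from collections import Counter

# Made-up responses for an ordinal variable (sizes from Tiny to Jumbo).
SIZE_ORDER = ["Tiny", "Small", "Medium", "Large", "Jumbo"]
responses = ["Small", "Large", "Small", "Medium", "Jumbo", "Small", "Large", "Tiny"]


def frequency_table(values, order):
    """Count each category, listing categories in their natural (ordinal) order."""
    counts = Counter(values)
    return [(label, counts[label]) for label in order]


def relative_frequency(table):
    """Divide each category's count by the number of cases (a proportion)."""
    n = sum(count for _, count in table)
    return [(label, count / n) for label, count in table]


table = frequency_table(responses, SIZE_ORDER)
shares = relative_frequency(table)

# Print count and percentage side by side for each category.
for (label, count), (_, share) in zip(table, shares):
    print(f"{label:<7}{count:>3}{share:>8.1%}")
```

A category with no responses still appears with a count of 0, which keeps the table complete when you compare distributions.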
Getting started in Analytics & Data Science
Dec 2018 - Feb 2019: I started with Practical Statistics for Data Scientists. This was a great way to think about the math, visualize it, and get introduced to Python/R. You start with the definition of a mean and terminology like data frame, feature, and outcome, then move through statistics concepts like boxplots, scatterplots, the central limit theorem, the binomial distribution, and significance testing, including p-values. There's an entire chapter dedicated to machine learning algorithms like K-Nearest Neighbors, bagging and random forests, boosting, and more. Pro tip: Focus on visualizing.
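To see the central limit theorem rather than just read about it, here is a small Python simulation in the spirit of the pro tip. It is a sketch under stated assumptions, not an example from the book: the skewed `population` list and the `sample_means` helper are made up for illustration. Even though the population is heavily right-skewed, the means of repeated samples of size 30 cluster around the population mean, and their spread comes close to the standard error σ/√n.

```python
import random
import statistics


def sample_means(population, sample_size, n_samples, seed=0):
    """Draw repeated random samples (with replacement) and return each sample's mean."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.choices(population, k=sample_size))
        for _ in range(n_samples)
    ]


# A made-up, heavily right-skewed population: mostly 1s with a long tail at 50.
population = [1] * 70 + [5] * 20 + [50] * 10
n = 30

means = sample_means(population, sample_size=n, n_samples=2000)

pop_mean = statistics.fmean(population)
predicted_se = statistics.pstdev(population) / n ** 0.5  # sigma / sqrt(n)

print("population mean:      ", round(pop_mean, 2))
print("mean of sample means: ", round(statistics.fmean(means), 2))
print("predicted std. error: ", round(predicted_se, 2))
print("observed SD of means: ", round(statistics.stdev(means), 2))
```

Plotting a histogram of `means` (for example with matplotlib) is the natural next step; the histogram gets more bell-shaped as `n` grows.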
Data Products: Standards up front
For those of you looking to establish a new department at work, or who want to play with modeling for the first time, you should think about standards. When is a model done? How do you know it's good enough to productize? That is what standards and protocols are for.

When should standards be established?
Set standards at the beginning. Even if your standards are a default set by an educational institution, that's good enough. It only counts if you write it down, though.

Where do I start?
At the very least, there are always best approaches and most appropriate models for a given problem and data source (if it's not time-series data, you won't use ARIMA). At the top of the project, take the time to write a project plan that includes the data models intended for the data exploration and proof-of-concept phases. Each model should have a standard. Here are a couple of examples:
Logistic regression: AUC > 0.7
Linear regression: R-squared > 0.6
Do not allow forest models (such as random forests) to be used. The nature of these models keeps us from accessing the metrics we need for deep analysis. Reserve these models for boosting.

What are the basic protocols to always use?
Establish and codify your standards in the planning phase. Always deliver conclusions with visuals like ROC curves, gg-plots, and elbow charts.

Even as a professional, it's always fun to kick off an experiment. Drop me a note and tell me how you do it.
-DCN
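One way to "write it down" is to encode the standards as data and check every candidate model against them. The Python sketch below assumes the example thresholds from this post (AUC > 0.7 for logistic regression, R-squared > 0.6 for linear regression, no forest models). The names `MODEL_STANDARDS`, `DISALLOWED_MODELS`, and `meets_standard` are hypothetical, and the labels, scores, and predictions are made-up toy data. AUC is computed with the rank (Mann-Whitney) formulation, which gives the same value as the area under the ROC curve.

```python
# Hypothetical standards registry built from the examples in this post.
MODEL_STANDARDS = {
    "logistic_regression": ("AUC", 0.7),
    "linear_regression": ("R-squared", 0.6),
}
DISALLOWED_MODELS = {"random_forest"}  # assumption: "forest models" means random forests


def auc(labels, scores):
    """ROC AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (Mann-Whitney formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def r_squared(actual, predicted):
    """1 - SS_res / SS_tot; undefined if every actual value is identical."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((y - f) ** 2 for y, f in zip(actual, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in actual)
    return 1 - ss_res / ss_tot


def meets_standard(model_type, value):
    """Return True if the metric value clears the written-down threshold."""
    if model_type in DISALLOWED_MODELS:
        raise ValueError(f"{model_type} is not allowed by the project standards")
    metric, threshold = MODEL_STANDARDS[model_type]
    passed = value > threshold
    print(f"{model_type}: {metric} = {value:.3f} (needs > {threshold}) -> "
          f"{'PASS' if passed else 'FAIL'}")
    return passed


# Made-up toy predictions.
labels = [0, 0, 1, 1, 0, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90]
meets_standard("logistic_regression", auc(labels, scores))

actual = [3, 5, 7, 9]
predicted = [2.8, 5.3, 6.9, 9.4]
meets_standard("linear_regression", r_squared(actual, predicted))
```

In a real project you would likely get these metrics from a library such as scikit-learn (`roc_auc_score`, `r2_score`); they are computed by hand here only to keep the sketch self-contained. Keeping the thresholds in one dictionary means the written standard and the automated check cannot drift apart.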

Other Pages (1)

About | Danielle Costa Nakano
20+ years of experience building businesses
- always looking to turn challenges into opportunities
- coaching high-functioning teams
- business value, data-driven strategy
- measurable outcomes

Highlights
- data strategy, product management, digital transformation
- omni-channel personalization, data monetization
- automation, business intelligence, predictive analytics, artificial intelligence

GRASSROOTS ANALYTICS: next generation data & technology
NATIONAL GEOGRAPHIC SOCIETY: iconic, global brand & nonprofit
NGP VAN: political technology market leader
BONTERRA TECH: social good private equity software
EVERYACTION: nonprofit fundraising & advocacy technology

Industry Associations
DATA PRODUCTS LEADERSHIP COMMUNITY 2023 - 2025
INTERNATIONAL INSTITUTE OF BUSINESS ANALYSIS 2009 - 2016
