Search
Blog Posts (23)
- Fisher's Linear Discriminant (Machine Learning Algorithm)
A deep dive. Description: We can view linear classification models in terms of dimensionality reduction. Linear discriminant analysis (LDA), also known as normal discriminant analysis (NDA) or discriminant function analysis, is a generalization of Fisher's linear discriminant, a method used in statistics, pattern recognition, and machine learning to find a linear combination of features that characterizes or separates two or more classes of objects or events. The resulting combination may be used as a linear classifier or, more commonly, for dimensionality reduction before later classification. It is a classification method.
Algorithm: To begin, consider a two-class classification problem (K=2), with blue and red points in ℝ². In general, we can take any D-dimensional input vector and project it down to D’ dimensions, where D is the dimension of the original input and D’ is the dimension of the projected space. Throughout this article, D’ is less than D. When projecting to one dimension (the number line), i.e. D’=1, we can pick a threshold t to separate the classes in the new space. Given an input vector x with projected value y = Wᵀx: if y >= t, then x belongs to class C1 (class 1); otherwise, it is classified as C2 (class 2). As a toy example, we want to reduce the data from D=2 to D’=1. In other words, we want a transformation T: ℝ² → ℝ¹ that maps vectors in 2D to 1D. First, compute the mean vectors m1 and m2 of the two classes, where N1 and N2 denote the number of points in classes C1 and C2 respectively. Now, consider using the class means as a measure of separation: we project the data onto the vector W joining the two class means. Any projection to a smaller dimension may lose some information. In this scenario, the two classes are clearly separable (by a line) in their original space, but after projection onto the line joining the means they can overlap.
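As a rough illustration of projecting onto the line joining the class means, the Python sketch below uses a small synthetic dataset rather than the article's data. The names `make_toy_data` and `mean_difference_projection`, the jitter value, and the choice of threshold (the midpoint of the projected class means) are all assumptions made for this example. W is oriented from C2 toward C1 so that y >= t corresponds to class C1.

```python
import numpy as np

def make_toy_data(n=8, jitter=0.3):
    """Two elongated, linearly separable classes in R^2 (synthetic, illustrative only)."""
    t = np.arange(n, dtype=float)
    # Small alternating vertical offset so the points are not exactly collinear.
    e = np.where(np.arange(n) % 2 == 0, jitter, -jitter)
    X1 = np.column_stack([t, t + 1.0 + e])   # class C1, roughly on y = x + 1
    X2 = np.column_stack([t + 2.0, t + e])   # class C2, roughly on y = x - 2
    return X1, X2

def mean_difference_projection(X1, X2):
    """Project both classes onto the unit vector joining the class means."""
    m1 = X1.mean(axis=0)                      # m1 = (1/N1) * sum of points in C1
    m2 = X2.mean(axis=0)                      # m2 = (1/N2) * sum of points in C2
    w = (m1 - m2) / np.linalg.norm(m1 - m2)   # points from C2 toward C1
    y1, y2 = X1 @ w, X2 @ w                   # 1D projections y = w^T x
    t = 0.5 * (y1.mean() + y2.mean())         # assumed threshold: midpoint of projected means
    return w, t, y1, y2

X1, X2 = make_toy_data()
w, t, y1, y2 = mean_difference_projection(X1, X2)

# A point is classified as C1 when y >= t, otherwise as C2.
accuracy = (np.sum(y1 >= t) + np.sum(y2 < t)) / (len(y1) + len(y2))

print("w =", w, "threshold =", t)
print("projected classes overlap:", bool(y1.min() < y2.max()))
print("accuracy with simple thresholding:", accuracy)
```

On this synthetic data the two classes can be separated by a line in 2D, yet their projections onto the mean-difference direction overlap, so a single threshold misclassifies some points.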
That is where Fisher's Linear Discriminant (FLD) comes into play. The idea proposed by Fisher is to maximize a function that gives a large separation between the projected class means while also giving a small variance within each class, thereby minimizing the class overlap. In other words, FLD selects the projection that maximizes the ratio of the between-class variance to the within-class variance. To project the data to a smaller dimension while avoiding class overlap, FLD therefore maintains two properties: a large variance between the dataset classes, and a small variance within each of the dataset classes. A large between-class variance means that the projected class means should be as far apart as possible. In contrast, a small within-class variance keeps the projected data points of each class close to one another. To find a projection with these properties, FLD learns a weight vector W that maximizes this ratio. Writing the ratio in terms of the mean vectors m1 and m2 and the within-class variance s, and then setting its derivative with respect to W to zero, gives (after some simplification) the learning equation for W: W, our desired transformation, is directly proportional to the inverse of the within-class covariance matrix times the difference of the class means. As expected, the result allows perfect class separation with simple thresholding. For multiple classes, read on at https://sthalles.github.io/fisher-linear-discriminant/.
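The two-class rule described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the article's own code: the within-class matrix is taken to be the sum of the two per-class scatter matrices, the criterion is computed as the squared difference of the projected means divided by the summed projected within-class scatter, the threshold is the midpoint of the projected class means, and the dataset is synthetic. All function names are hypothetical.

```python
import numpy as np

def make_toy_data(n=8, jitter=0.3):
    """Two elongated, linearly separable classes in R^2 (synthetic, illustrative only)."""
    t = np.arange(n, dtype=float)
    e = np.where(np.arange(n) % 2 == 0, jitter, -jitter)
    X1 = np.column_stack([t, t + 1.0 + e])   # class C1
    X2 = np.column_stack([t + 2.0, t + e])   # class C2
    return X1, X2

def fisher_direction(X1, X2):
    """W proportional to S_W^{-1} (m1 - m2), normalized to unit length."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Within-class scatter: sum of the scatter matrices of the two classes.
    S_w = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    # Solve S_w w = (m1 - m2) instead of forming the inverse explicitly.
    w = np.linalg.solve(S_w, m1 - m2)
    return w / np.linalg.norm(w)

def fisher_criterion(w, X1, X2):
    """Between-class separation divided by within-class scatter after projection."""
    y1, y2 = X1 @ w, X2 @ w
    between = (y1.mean() - y2.mean()) ** 2
    within = np.sum((y1 - y1.mean()) ** 2) + np.sum((y2 - y2.mean()) ** 2)
    return between / within

def threshold_accuracy(w, X1, X2):
    """Classify as C1 when y >= t, with t the midpoint of the projected means."""
    y1, y2 = X1 @ w, X2 @ w
    t = 0.5 * (y1.mean() + y2.mean())
    return (np.sum(y1 >= t) + np.sum(y2 < t)) / (len(y1) + len(y2))

X1, X2 = make_toy_data()

# Baseline: the direction joining the class means.
d = X1.mean(axis=0) - X2.mean(axis=0)
w_mean = d / np.linalg.norm(d)
w_fld = fisher_direction(X1, X2)

J_mean, J_fld = fisher_criterion(w_mean, X1, X2), fisher_criterion(w_fld, X1, X2)
acc_mean, acc_fld = threshold_accuracy(w_mean, X1, X2), threshold_accuracy(w_fld, X1, X2)

print("criterion  mean-difference:", J_mean, " FLD:", J_fld)
print("accuracy   mean-difference:", acc_mean, " FLD:", acc_fld)
```

On this synthetic data the FLD direction scores higher on the criterion than the mean-difference direction and separates the classes with a single threshold. Using `np.linalg.solve` rather than an explicit matrix inverse is a numerical choice for the sketch and does not change the direction obtained.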
- Data Products 101
What is a data product? (circa 2024) Business: Data that we want to reuse, so we apply the software development lifecycle to it. When we reuse data products, we increase consistency, and data consistency provides data quality. A data product is a trusted, reusable, and consumable data asset that solves business problems, generates insights, and/or improves operational efficiency. It can be a database table, report, API, or machine learning model. Technical: A data product is made up of metadata and dataset instances, and is designed to be easily accessible to anyone with the right credentials. Data products are the backbone of powerful data apps and help bridge the gap between data producers and consumers.
- Identity Resolution
Over the last several years, I have spent a lot of time thinking about identity resolution and how to do it in an ever-expanding ecosystem. Identity resolution is one of the most important ingredients in data assets. What is identity resolution and how does it create value? Identity resolution, the process of matching identifiers across devices and touchpoints to a single profile, helps build a cohesive, omnichannel view of a person, donor, organization, or consumer, enabling brands to deliver relevant messaging throughout the customer journey. Value is created through a consolidated, accurate 360-degree profile. Identity resolution results when data sources from many channels and devices are integrated in an accurate, scalable, and privacy-compliant way to create a persistent and addressable individual profile. How is it used? Impact reporting (accuracy); predictive analytics (accuracy). Who's doing this in business and talking about it? A great description and example at Civis Analytics; another good example at Experian; a definitive guide from Segment.com. Drop me a note with your thoughts. -DCN
Other Pages (1)
- About | Danielle Costa Nakano
Danielle Costa Nakano. Data Strategy, Digital Transformation, Omni-channel Personalization, Product Strategy, Product Management, Data Products, Data Science, Artificial Intelligence, Predictive Analytics, Machine Learning, R&D. 20+ years of experience building businesses: always looking to turn challenges into opportunities; coaching high-functioning teams; business value, data-driven strategy; measurable outcomes. Highlights: data strategy, product management, digital transformation; omni-channel personalization, data monetization; automation, business intelligence, predictive analytics, artificial intelligence. Next generation data & technology: GRASSROOTS ANALYTICS. Iconic, global brand & nonprofit: NATIONAL GEOGRAPHIC SOCIETY. Political technology market leader: NGP VAN. Social good private equity software: BONTERRA TECH. Nonprofit fundraising & advocacy technology: EVERYACTION. Industry Associations: DATA PRODUCTS LEADERSHIP COMMUNITY, 2023 - 2025; INTERNATIONAL INSTITUTE OF BUSINESS ANALYSTS, 2009 - 2016.


