
    How to Use Conditional Formatting in Pandas to Enhance Data Visualization



    Image by Author | DALLE-3 & Canva

     

    While pandas is mainly used for data manipulation and analysis, it can also provide basic data visualization capabilities. However, plain dataframes can make the information look cluttered and overwhelming. So, what can be done to make it better? If you've worked with Excel before, you know that you can highlight important values with different colors, font styles, etc. The idea of using these styles and colors is to communicate the information in an effective way. You can do similar work with pandas dataframes too, using conditional formatting and the Styler object.

    In this article, we'll see what conditional formatting is and how to use it to enhance your data readability.

     

    Conditional Formatting

     

    Conditional formatting is a feature in pandas that allows you to format cells based on some criteria. You can easily highlight outliers, visualize trends, or emphasize important data points using it. The Styler object in pandas provides a convenient way to apply conditional formatting. Before covering the examples, let's take a quick look at how the Styler object works.

     

    What Is the Styler Object & How Does It Work?

     

    You can control the visual representation of the dataframe by using the df.style property. This property returns a Styler object, which is responsible for styling the dataframe. The Styler object allows you to manipulate the CSS properties of the dataframe to create a visually appealing and informative display. The generic syntax is as follows:

    df.style.<method>(<arguments>)

     

    Where <method> is the specific formatting function you want to apply, and <arguments> are the parameters required by that function. The Styler object returns the formatted dataframe without changing the original one. There are two approaches to using conditional formatting with the Styler object:

    • Built-in Styles: To apply quick formatting styles to your dataframe
    • Custom Stylization: Create your own formatting rules for the Styler object and pass them through one of the following methods (Styler.applymap: element-wise or Styler.apply: column-/row-/table-wise). Note that pandas 2.1 renamed Styler.applymap to Styler.map and deprecated the old name.

    Now, we'll cover some examples of both approaches to help you enhance the visualization of your data.

     

    Examples: Built-in Styles

     

    Let's create a dummy stock price dataset with columns for Date, Cost Price, Satisfaction Score, and Sales Amount to demonstrate the examples below:

    import pandas as pd
    import numpy as np
    
    data = {'Date': ['2024-03-05', '2024-03-06', '2024-03-07', '2024-03-08', '2024-03-09', '2024-03-10'],
            'Cost Price': [100, 120, 110, 1500, 1600, 1550],
            'Satisfaction Score': [90, 80, 70, 95, 85, 75],
            'Sales Amount': [1000, 800, 1200, 900, 1100, None]}
    
    df = pd.DataFrame(data)
    df

     

    Output:

     

    Original Unformatted Dataframe

     

    1. Highlighting Maximum and Minimum Values

    We can use the highlight_max and highlight_min functions to highlight the maximum and minimum values in a column or row. To compare values within each column, set axis=0 like this:

    # Highlighting Maximum and Minimum Values
    df.style.highlight_max(color="green", axis=0, subset=['Cost Price', 'Satisfaction Score', 'Sales Amount']).highlight_min(color="red", axis=0, subset=['Cost Price', 'Satisfaction Score', 'Sales Amount'])

     

    Output:
     

    Max & Min Values

     

    2. Applying Color Gradients

    Color gradients are an effective way to visualize the values in your data. In this case, we'll apply the gradient to satisfaction scores using the colormap set to 'viridis'. This is a type of color coding that ranges from purple (low values) to yellow (high values). Note that background_gradient requires matplotlib to be installed. Here is how you can do this:

    # Applying Color Gradients
    df.style.background_gradient(cmap='viridis', subset=['Satisfaction Score'])

     

    Output:

     

    Colormap - viridis
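
To see why each score gets the color it does, it helps to compute where it falls between the column's minimum and maximum. The sketch below assumes the default settings of background_gradient, where the gradient spans the column's own minimum (0.0, the purple end) to its maximum (1.0, the yellow end). The variable names are introduced here for illustration.

```python
import pandas as pd

scores = pd.Series([90, 80, 70, 95, 85, 75], name="Satisfaction Score")

# Min-max position of each score on the color scale:
# 0.0 maps to the low end of the colormap, 1.0 to the high end
position = (scores - scores.min()) / (scores.max() - scores.min())

summary = pd.DataFrame({"score": scores, "position": position.round(2)})
print(summary)
```

The lowest score (70) lands at 0.0 and the highest (95) at 1.0, so they take the two extreme colors, and the rest are spread in between.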

     

    3. Highlighting Null or Missing Values

    When we have large datasets, it becomes difficult to identify null or missing values. You can use conditional formatting with the built-in df.style.highlight_null function for this purpose. For example, in this case, the sales amount of the sixth entry is missing. You can highlight this information like this:

    # Highlighting Null or Missing Values
    df.style.highlight_null('yellow', subset=['Sales Amount'])

     

    Output:
     

    Highlighting Missing Values

     

    Examples: Custom Stylization Using apply() & applymap()

     

    1. Conditional Formatting for Outliers

    Suppose that we have a housing dataset with prices, and we want to highlight the houses with outlier prices (i.e., prices that are significantly higher or lower than those in the other neighborhoods). This can be done as follows:

    import pandas as pd
    import numpy as np
    
    # House prices dataset
    df = pd.DataFrame({
        'Neighborhood': ['H1', 'H2', 'H3', 'H4', 'H5', 'H6', 'H7'],
        'Price': [50, 300, 360, 390, 420, 450, 1000],
    })
    
    # Calculate Q1 (25th percentile), Q3 (75th percentile) and Interquartile Range (IQR)
    q1 = df['Price'].quantile(0.25)
    q3 = df['Price'].quantile(0.75)
    iqr = q3 - q1
    
    # Bounds for outliers
    lower_bound = q1 - 1.5 * iqr
    upper_bound = q3 + 1.5 * iqr
    
    # Custom function to highlight outliers
    def highlight_outliers(val):
        if val < lower_bound or val > upper_bound:
            return 'background-color: yellow; font-weight: bold; color: black'
        else:
            return ''
    
    df.style.applymap(highlight_outliers, subset=['Price'])
    

     

    Output:

     

    Highlighting Outliers

     

    2. Highlighting Trends

    Consider that you run a company and are recording your sales daily. To analyze the trends, you want to highlight the days when your daily sales increase by more than 5%. You can achieve this using a custom function and the apply method in pandas. Here's how:

    import pandas as pd
    
    # Dataset of Company's Sales
    data = {'date': ['2024-02-10', '2024-02-11', '2024-02-12', '2024-02-13', '2024-02-14'],
            'sales': [100, 105, 110, 115, 125]}
    
    df = pd.DataFrame(data)
    
    # Daily percentage change
    df['pct_change'] = df['sales'].pct_change() * 100
    
    # Highlight the day if sales increased by more than 5%
    def highlight_trend(row):
        return ['background-color: green; border: 2px solid black; font-weight: bold' if row['pct_change'] > 5 else '' for _ in row]
    
    df.style.apply(highlight_trend, axis=1)

     

    Output:

     

    Highlighting Trends

     

    3. Highlighting Correlated Columns

    Correlated columns are important because they show relationships between different variables. For example, if we have a dataset containing age, income, and spending habits and our analysis shows a high correlation (close to 1) between age and income, then it suggests that older people generally have higher incomes. Highlighting correlated columns helps to visually identify these relationships. This approach becomes extremely helpful as the dimensionality of your data increases. Let's explore an example to better understand this concept:

    import pandas as pd
    
    # Dataset of people
    data = {
        'age': [30, 35, 40, 45, 50],
        'income': [60000, 66000, 70000, 75000, 100000],
        'spending': [10000, 15000, 20000, 18000, 12000]
    }
    
    df = pd.DataFrame(data)
    
    # Calculate the correlation matrix
    corr_matrix = df.corr()
    
    # Highlight highly correlated columns
    def highlight_corr(val):
        if val != 1.0 and abs(val) > 0.5:   # Exclude self-correlation
            return 'background-color: blue; text-decoration: underline'
        else:
            return ''
    
    corr_matrix.style.applymap(highlight_corr)

     

    Output:

     

    Correlated Columns

     

    Wrapping Up

     

    These are just some of the examples I showed as a starter to up your game of data visualization. You can apply similar techniques to various other problems to enhance the data visualization, such as highlighting duplicate rows, grouping into categories and selecting different formatting for each category, or highlighting peak values. Additionally, there are many other CSS options you can explore in the official documentation. You can even define different properties on hover, like magnifying text or changing color. Check out the "Fun Stuff" section for more cool ideas. This article is part of my Pandas series, so if you enjoyed this, there's plenty more to explore. Head over to my author page for more tips, tricks, and tutorials.

     
     

    Kanwal Mehreen Kanwal is a machine learning engineer and a technical writer with a profound passion for data science and the intersection of AI with medicine. She co-authored the book "Maximizing Productivity with ChatGPT". As a Google Generation Scholar 2022 for APAC, she champions diversity and academic excellence. She's also recognized as a Teradata Diversity in Tech Scholar, Mitacs Globalink Research Scholar, and Harvard WeCode Scholar. Kanwal is an ardent advocate for change, having founded FEMCodes to empower women in STEM fields.
