By Walf Sun

Unleashing AI in SAP Data Analytics: Turning ACDOCA Data into Predictive Insights


In today's fast-paced business world, understanding your financial data is key. With SAP and AI, you can turn raw data into meaningful predictions that drive decisions. Below, we explore seven powerful analyses using data from SAP's ACDOCA table to answer important business questions.

 

1. Revenue Trend by Account: Where’s Your Money Coming From?

By analyzing revenue trends by account over time, we can see how much each account contributes to the business.


Imagine looking at a graph that shows which of your accounts are making you the most money (or costing you!). It's like a financial health monitor that tells you which areas are growing and which ones need attention.


Are certain accounts bringing in more revenue this year than last?

 

In simple terms, the Revenue/Expense Trend by Account Over Time shows how much money each account is making (revenue) or losing (expenses) over a certain period, typically broken down by year. By looking at the trend, you can understand:


1.     Which accounts are contributing the most to revenue: Accounts with positive values show they are bringing in money. A steady or increasing line suggests consistent or growing contributions.


2.     Which accounts are incurring expenses: Accounts with negative values indicate they are causing expenses or losses. A decreasing line shows that the losses are getting bigger over time.


3.     Yearly patterns: The chart allows you to see how the performance of each account changes from year to year. Some accounts might have large swings in revenue or expenses, while others might show a more consistent pattern.


Below is a Python program that aggregates the data by year (GJAHR) and account (BELNR), sums the amount (KSL), and visualizes the revenue/expense trend over time. (Note: in ACDOCA, BELNR is technically the journal entry number and KSL the amount in group currency; for a true account-level view in company code currency, substitute RACCT and HSL.)


import matplotlib.pyplot as plt

import pandas as pd

# Load the CSV file

file_path = 'your_file_path.csv'  # Replace with your actual file path

acdoca_data = pd.read_csv(file_path)

# Convert KSL column to numeric, handling errors as NaN

acdoca_data['KSL'] = pd.to_numeric(acdoca_data['KSL'], errors='coerce')

# Group by year (GJAHR) and account (BELNR), summing up KSL for each group

revenue_trend = acdoca_data.groupby(['GJAHR', 'BELNR'])['KSL'].sum().reset_index()

# Function to plot the trend

def plot_revenue_trend(data):

    plt.figure(figsize=(10, 6))

    for account in data['BELNR'].unique():

        account_data = data[data['BELNR'] == account]

        plt.plot(account_data['GJAHR'], account_data['KSL'], label=f'Account {account}')

    plt.title('Revenue/Expense Trend by Account Over Time')

    plt.xlabel('Year')

    plt.ylabel('Sum of Amount (KSL)')

    plt.legend(loc='best')

    plt.grid(True)

    plt.show()

# Plot the trend for top 5 accounts based on total KSL

top_accounts = revenue_trend.groupby('BELNR')['KSL'].sum().nlargest(5).index

top_accounts_data = revenue_trend[revenue_trend['BELNR'].isin(top_accounts)]

plot_revenue_trend(top_accounts_data)
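Once the trend is plotted, a natural follow-up is each account's share of total revenue. Below is a minimal sketch of that calculation, using hypothetical yearly totals rather than real ACDOCA output:

```python
import pandas as pd

# Hypothetical yearly totals per account (illustrative figures, not real ACDOCA data)
totals = pd.DataFrame({
    'BELNR': ['Account 1', 'Account 2', 'Account 3', 'Account 4'],
    'KSL': [50000.0, 125000.0, 75000.0, 250000.0],
})

# Each account's contribution as a percentage of total revenue
totals['Share_%'] = totals['KSL'] / totals['KSL'].sum() * 100
print(totals.sort_values('Share_%', ascending=False))
```

Sorting by the share column immediately answers "where is the money coming from" without reading the chart.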

 

2. Variance Analysis: Are You Staying On Budget?

Variance analysis compares planned versus actual revenue or expenses, helping you spot unexpected costs or profits.


Think of it like checking how far off you were from your monthly budget. Did you save more or spend more than you thought? Variance analysis gives you that answer so you can adjust your plans.


Did my costs vary significantly from what was budgeted?


In simple terms, the graph shows how well each account performed compared to what was originally planned for each year:

  1. Above the zero line: If a line goes above the zero line, it means the account performed better than expected—either it brought in more money (if it's revenue) or spent less (if it's expenses).

  2. Below the zero line: If a line dips below the zero line, it means the account did worse than expected—either it earned less money than planned or spent more.

Example interpretations:

  • Account 1 (yellow line): It had a strong performance in 2022 but started dipping in 2023, meaning it didn't meet expectations by the end.

  • Account 2 (orange line): This account consistently improved over time, always staying above the planned values, meaning it performed better each year.

  • Account 3 (red line): In 2021, this account underperformed significantly, but by 2023 it recovered, coming back above the planned levels.

  • Account 4 (pink line): It had a sudden jump in 2022, performing much better than planned, but then it started dipping again in 2023.

  • Account 5 (blue line): This account remained mostly stable across the years but consistently underperformed compared to the plan.

Overall, the graph helps you spot which accounts are performing well and which ones might need more attention to meet expectations.


This program will help you easily spot the variances and visualize them over time, providing insight into unexpected costs or profits.

 

import pandas as pd

import matplotlib.pyplot as plt

# Load CSV data

file_path = 'your_file_path.csv'  # Replace with the actual file path

data = pd.read_csv(file_path)

# PLANNED and ACTUAL are illustrative columns; in S/4HANA, plan data typically lives in ACDOCP rather than ACDOCA

# Convert columns to numeric, handling any errors as NaN

data['PLANNED'] = pd.to_numeric(data['PLANNED'], errors='coerce')

data['ACTUAL'] = pd.to_numeric(data['ACTUAL'], errors='coerce')

# Calculate the variance: difference between actual and planned

data['VARIANCE'] = data['ACTUAL'] - data['PLANNED']

# Summarize variance by year and account

variance_summary = data.groupby(['GJAHR', 'BELNR']).agg(

    Planned_Sum=('PLANNED', 'sum'),

    Actual_Sum=('ACTUAL', 'sum'),

    Variance_Sum=('VARIANCE', 'sum')

).reset_index()

# Display the summary

print(variance_summary)

# Plot the variance

plt.figure(figsize=(10, 6))

for account in variance_summary['BELNR'].unique():

    account_data = variance_summary[variance_summary['BELNR'] == account]

    plt.plot(account_data['GJAHR'], account_data['Variance_Sum'], label=f'Account {account}')

plt.title('Variance Analysis by Account Over Time')

plt.xlabel('Year')

plt.ylabel('Variance (Actual - Planned)')

plt.legend(loc='best')

plt.grid(True)

plt.show()
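Absolute variances can be hard to compare across accounts of different sizes. Expressing the variance as a percentage of plan normalizes them; here is a small sketch with made-up planned/actual figures (in a real S/4HANA system, plan data would typically come from ACDOCP):

```python
import pandas as pd

# Made-up planned vs. actual figures for two accounts
df = pd.DataFrame({
    'BELNR': ['Account 1', 'Account 2'],
    'PLANNED': [100000.0, 80000.0],
    'ACTUAL': [110000.0, 72000.0],
})

# Variance as a percentage of plan: positive = above plan, negative = below plan
df['VARIANCE_%'] = (df['ACTUAL'] - df['PLANNED']) / df['PLANNED'] * 100
print(df)
```

A 10% overshoot on a small account and on a large one now read as the same signal.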

 

3. Profitability Analysis: Who’s Driving the Bottom Line?

This analysis helps you identify which accounts or products are the most profitable over time.

It’s like figuring out which parts of your business are making the most money after costs. This analysis shows you where to double down for maximum profitability.

Which accounts are the most profitable this year?


In simple terms, the graph shows how much profit each account is making over time, from 2021 to 2023. Profitability is calculated by subtracting costs (expenses) from revenue (income). Here's what the graph is telling us:

  • Account 2 (orange line) is doing the best. Its profits are increasing steadily each year, meaning this account is making more money compared to what it's spending, and it's getting better at it over time.

  • Account 3 (red line) is stuck. Its profits are staying the same every year, so it's not improving, but it’s also not getting worse.

  • Account 4 (pink line) and Account 5 (blue line) are showing gradual improvement. Their profits are increasing, though more slowly compared to Account 2.

  • Account 1 (yellow line) has low profits compared to the others, and while it's not losing money, it's not growing much either.


Main takeaway:

  • Account 2 is the most profitable and growing fast.

  • Account 3 is neither improving nor declining.

  • Accounts 4 and 5 are improving but at a slower pace.

  • Account 1 is profitable but not seeing much growth.


This analysis helps identify which parts of the business (accounts) are performing well and which might need more attention or resources to boost profits.


import pandas as pd

import matplotlib.pyplot as plt

# Load CSV data

file_path = 'your_file_path.csv'  # Replace with the actual file path

data = pd.read_csv(file_path)

# Convert columns to numeric (Revenue and Cost are illustrative columns, not standard ACDOCA fields)

data['Revenue'] = pd.to_numeric(data['Revenue'], errors='coerce')

data['Cost'] = pd.to_numeric(data['Cost'], errors='coerce')

# Calculate profitability

data['Profitability'] = data['Revenue'] - data['Cost']

# Group the data by year and account

profitability_summary = data.groupby(['GJAHR', 'BELNR']).agg(

    Revenue_Sum=('Revenue', 'sum'),

    Cost_Sum=('Cost', 'sum'),

    Profitability_Sum=('Profitability', 'sum')

).reset_index()

# Plot the profitability trend

plt.figure(figsize=(10, 6))

for account in profitability_summary['BELNR'].unique():

    account_data = profitability_summary[profitability_summary['BELNR'] == account]

    plt.plot(account_data['GJAHR'], account_data['Profitability_Sum'], label=f'Account {account}')

plt.title('Profitability Analysis by Account Over Time')

plt.xlabel('Year')

plt.ylabel('Profitability (Revenue - Cost)')

plt.legend(loc='best')

plt.grid(True)

# Display the plot

plt.show()
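Beyond absolute profit, a profit margin relates profit back to revenue, so accounts of very different sizes can be compared fairly. A minimal sketch with hypothetical figures:

```python
import pandas as pd

# Hypothetical revenue and cost per account
df = pd.DataFrame({
    'BELNR': ['Account 1', 'Account 2', 'Account 3'],
    'Revenue': [50000.0, 120000.0, 80000.0],
    'Cost': [40000.0, 60000.0, 76000.0],
})

# Profit margin as a percentage of revenue
df['Margin_%'] = (df['Revenue'] - df['Cost']) / df['Revenue'] * 100
print(df.sort_values('Margin_%', ascending=False))
```

Here the mid-sized account keeps half of every unit of revenue, while the largest-revenue account keeps only a sliver.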


4. Cost Distribution: Where Are Your Costs Going?

Cost distribution analysis shows how expenses are spread across various accounts.

Picture all your costs laid out, showing exactly where your money is going. It’s like a snapshot of how much each part of your business is costing you.


What portion of my expenses is being spent on materials versus labor?


This graph shows the Cost Distribution Analysis by account over time, from 2021 to 2023. Here's how to interpret it:

  • Y-axis (Cost): This axis represents the total expenses for each account.

  • X-axis (Year): The time progression from 2021 to 2023.

  • Colored lines: Each line represents the cost trend for a specific account over the years.


Key takeaways:

  • Account 5 (blue line) consistently has the highest costs compared to other accounts, with a slight decrease in 2023.

  • Account 2 (orange line) and Account 3 (red line) show increasing costs in 2022, but they slightly decline in 2023.

  • Account 1 (yellow line) and Account 4 (pink line) have relatively lower costs, but they both show slight increases in 2022 before decreasing in 2023.


This analysis helps to understand where most of your expenses are concentrated and how they change over time, aiding in better financial planning and resource allocation.

Here’s the Python code for the Cost Distribution Analysis based on the graph shown, where it analyzes how expenses are spread across various accounts over time:

 

import pandas as pd

import matplotlib.pyplot as plt

# Simulated data to represent cost distribution across accounts and years

data = pd.DataFrame({

    'GJAHR': [2021, 2021, 2021, 2021, 2021, 2022, 2022, 2022, 2022, 2022, 2023, 2023, 2023, 2023, 2023],

    'BELNR': ['Account 1', 'Account 2', 'Account 3', 'Account 4', 'Account 5',

              'Account 1', 'Account 2', 'Account 3', 'Account 4', 'Account 5',

              'Account 1', 'Account 2', 'Account 3', 'Account 4', 'Account 5'],

    'Cost': [15000, 20000, 18000, 17000, 22000,

             16000, 21000, 19000, 17500, 23000,

             15500, 20500, 18500, 16500, 22500]

})

# Group the data by year and account to sum up the costs

cost_distribution = data.groupby(['GJAHR', 'BELNR']).agg(

    Cost_Sum=('Cost', 'sum')

).reset_index()

# Plotting the cost distribution by year and account

plt.figure(figsize=(10, 6))

for account in cost_distribution['BELNR'].unique():

    account_data = cost_distribution[cost_distribution['BELNR'] == account]

    plt.plot(account_data['GJAHR'], account_data['Cost_Sum'], label=f'Account {account}')

plt.title('Cost Distribution Analysis by Account Over Time')

plt.xlabel('Year')

plt.ylabel('Cost (in units)')

plt.legend(loc='best')

plt.grid(True)

# Display the plot

plt.show()
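To answer the "materials versus labor" question directly, you can express each cost category as a share of total spend. A small sketch with invented category totals (in practice these would come from grouping ACDOCA line items, e.g. by G/L account):

```python
import pandas as pd

# Invented cost-category totals; real categories would come from grouping ACDOCA line items
costs = pd.DataFrame({
    'Category': ['Materials', 'Labor', 'Overhead'],
    'Cost': [120000.0, 60000.0, 20000.0],
})

# Each category's share of total expenses
costs['Share_%'] = costs['Cost'] / costs['Cost'].sum() * 100
print(costs)
```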


5. Correlation Analysis: How Do Different Factors Affect Each Other?

Correlation analysis examines the relationships between different financial factors (like revenue and expenses).


Imagine you want to see if increasing marketing spend leads to higher sales. Correlation analysis will show you if there's a connection between two factors and help you make smarter decisions.


Is there a relationship between increasing advertising and increased revenue?


Here is the Correlation Analysis graph showing the relationship between Revenue and Expenses. The scatter plot helps visualize how closely these two financial factors are related:

  • Each point represents a data entry showing a combination of revenue and expenses.

  • The red line is a regression line, showing the overall trend.

  • The correlation coefficient displayed on the chart indicates the strength of the relationship between revenue and expenses.


A higher correlation value means that when revenue increases, expenses also tend to increase in a predictable manner.


Here's a Python code to create a correlation analysis:

 

import pandas as pd

import matplotlib.pyplot as plt

import seaborn as sns

 # Load the CSV data (Replace 'your_file_path.csv' with the actual file path)

# data = pd.read_csv('your_file_path.csv')

 # Simulating data for correlation analysis since actual CSV is not provided

data = pd.DataFrame({

    'Revenue': [30000, 40000, 35000, 32000, 38000, 31000, 42000, 36000, 33000, 39000],

    'Expenses': [15000, 20000, 18000, 17000, 22000, 16000, 21000, 19000, 17500, 23000]

})

 

# Calculate the correlation coefficient between Revenue and Expenses

correlation = data['Revenue'].corr(data['Expenses'])

print(f'Correlation between Revenue and Expenses: {correlation:.2f}')

 # Create a scatter plot with a regression line to visualize the correlation

plt.figure(figsize=(8, 6))

sns.regplot(x='Revenue', y='Expenses', data=data, ci=None, scatter_kws={"color": "blue"}, line_kws={"color": "red"})

 # Add titles and labels

plt.title('Correlation Analysis: Revenue vs Expenses')

plt.xlabel('Revenue')

plt.ylabel('Expenses')

plt.grid(True)

# Annotate the chart with the correlation coefficient, then display the plot

plt.text(0.05, 0.95, f'r = {correlation:.2f}', transform=plt.gca().transAxes, va='top')

plt.show()
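As a sanity check on the library call, Pearson's r can be computed by hand: the covariance of the two series divided by the product of their standard deviations. Using the same simulated figures as above:

```python
import numpy as np

revenue = np.array([30000, 40000, 35000, 32000, 38000, 31000, 42000, 36000, 33000, 39000], dtype=float)
expenses = np.array([15000, 20000, 18000, 17000, 22000, 16000, 21000, 19000, 17500, 23000], dtype=float)

# Pearson's r by hand: mean of the co-deviations over the product of standard deviations
r_manual = ((revenue - revenue.mean()) * (expenses - expenses.mean())).mean() / (revenue.std() * expenses.std())
r_numpy = np.corrcoef(revenue, expenses)[0, 1]
print(f'manual r = {r_manual:.4f}, numpy r = {r_numpy:.4f}')
```

Both approaches agree, and on this sample the relationship is strongly positive. Remember that correlation alone does not establish causation.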

 

6. Time Series Forecasting: What’s Next for Your Business?

Time series forecasting predicts future trends based on past data. With AI, you can forecast future revenue or expenses.


It's like having a crystal ball that shows you where your business is heading based on past performance. AI uses patterns in your data to make predictions, helping you plan for the future.


What will my revenue look like next quarter if current trends continue?


The graph above shows the Time Series Forecasting for revenue:

  • Historical Data (solid line with circles): This represents the actual revenue values from 2010 to 2020.

  • Forecasted Data (dashed line with x's): This predicts revenue for the next three years (2021 to 2023) based on the historical trend.


The forecast suggests that revenue will continue to rise steadily if the current trend persists, giving you a clear idea of future performance based on past data. This kind of analysis helps businesses plan for future growth or identify potential risks.


import pandas as pd

import numpy as np

import matplotlib.pyplot as plt

from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Simulating a dataset for time series (yearly revenue over 10 years)

data = pd.DataFrame({

    'Year': np.arange(2010, 2021),

    'Revenue': [30000, 32000, 34000, 36000, 38000, 40000, 42000, 45000, 47000, 49000, 51000]

})

# Fit an Exponential Smoothing model for forecasting

model = ExponentialSmoothing(data['Revenue'], trend='add', seasonal=None, damped_trend=True)

fit = model.fit()

# Forecast the next 3 years

forecast_years = np.arange(2021, 2024)

forecast = fit.forecast(steps=len(forecast_years))

# Combine historical data and forecast into a single DataFrame for plotting

forecast_data = pd.DataFrame({

    'Year': np.concatenate([data['Year'], forecast_years]),

    'Revenue': np.concatenate([data['Revenue'], forecast])

})

# Plot historical data and forecast

plt.figure(figsize=(10, 6))

plt.plot(data['Year'], data['Revenue'], label='Historical Revenue', marker='o')

plt.plot(forecast_years, forecast, label='Forecasted Revenue', linestyle='--', marker='x')

plt.title('Time Series Forecasting: Revenue Prediction')

plt.xlabel('Year')

plt.ylabel('Revenue')

plt.legend(loc='best')

plt.grid(True)

# Display the plot

plt.show()
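As a cross-check on the exponential smoothing model, a plain linear trend fitted to the same simulated history gives a similar picture; here `np.polyfit` estimates the yearly growth and projects one year ahead:

```python
import numpy as np

years = np.arange(2010, 2021)
revenue = np.array([30000, 32000, 34000, 36000, 38000, 40000, 42000, 45000, 47000, 49000, 51000], dtype=float)

# Fit a straight line through the history and extrapolate to 2021
slope, intercept = np.polyfit(years, revenue, 1)
forecast_2021 = slope * 2021 + intercept
print(f'trend: {slope:.0f} per year, 2021 estimate: {forecast_2021:.0f}')
```

When a simple baseline and a smoothing model point the same way, you can trust the direction of the forecast more.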


7. Outlier Detection: Catching the Oddballs in Your Data

This analysis identifies unusual data points that don’t fit the normal pattern.


Think of it as catching the oddball expenses or revenues that don’t make sense. Outlier detection helps you flag errors or unexpected entries that might need further investigation.


Were there any unusual spikes in expenses last month that don’t seem right?


The graph above shows the Outlier Detection in revenue data:

  • Blue circles represent the normal data points (revenue across different years).

  • Red dots highlight the outliers, which are values that deviate significantly from the expected trend (in this case, the year with revenue at 100,000 stands out as an outlier).


This type of analysis helps in identifying any unusual data points that may indicate errors or significant events worth investigating further. ​


import pandas as pd

import numpy as np

import matplotlib.pyplot as plt

from scipy.stats import zscore

# Simulating data with some outliers

data = pd.DataFrame({

    'Year': np.arange(2010, 2021),

    'Revenue': [30000, 32000, 34000, 36000, 38000, 100000, 42000, 45000, 47000, 49000, 51000]  # Note the outlier (100000)

})

# Calculate Z-scores to identify outliers; with only 11 data points, a |Z| > 2 threshold is
# more practical than the classic 3 (the 100000 spike scores about 2.9 and would otherwise slip through)

data['Zscore'] = zscore(data['Revenue'])

data['Outlier'] = data['Zscore'].abs() > 2

# Plot the data points, highlighting outliers

plt.figure(figsize=(10, 6))

plt.plot(data['Year'], data['Revenue'], label='Revenue', marker='o')

# Highlight outliers

outliers = data[data['Outlier']]

plt.scatter(outliers['Year'], outliers['Revenue'], color='red', label='Outliers', zorder=5)

plt.title('Outlier Detection in Revenue Data')

plt.xlabel('Year')

plt.ylabel('Revenue')

plt.legend(loc='best')

plt.grid(True)

# Display the plot

plt.show()
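Z-scores have a weakness on small samples: the outlier inflates the standard deviation used to detect it. A common, more robust alternative is Tukey's IQR rule, sketched here on the same simulated series:

```python
import numpy as np

revenue = np.array([30000, 32000, 34000, 36000, 38000, 100000, 42000, 45000, 47000, 49000, 51000], dtype=float)

# Tukey's fences: flag anything outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
q1, q3 = np.percentile(revenue, [25, 75])
iqr = q3 - q1
outliers = revenue[(revenue < q1 - 1.5 * iqr) | (revenue > q3 + 1.5 * iqr)]
print(outliers)
```

Because quartiles ignore extreme values, the 100000 spike is flagged cleanly without any threshold tuning.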


By combining the power of SAP’s ACDOCA data and AI-driven analytics, your business can gain deeper insights into financial performance and make smarter, data-driven decisions. Whether you’re tracking revenue, keeping an eye on costs, or forecasting future trends, AI is here to help turn your data into actionable predictions.


Notables: Dataset Preparation

# Extraction from ACDOCA

# Python extraction program

import pyrfc

import pandas as pd

from pyrfc import Connection

import time

# SAP connection details

sap_conn_params = {

    'user': 'username',

    'passwd': 'password',

    'ashost': 'hostname',

    'sysnr': '00',  # system number

    'client': '100',  # client number

    'lang': 'EN',  # language

 }

# Function to connect to SAP

def connect_to_sap():

    try:

        conn = Connection(**sap_conn_params)

        print("Connected to SAP")

        return conn

    except pyrfc.CommunicationError as e:

        print(f"Error connecting to SAP: {str(e)}")

        exit(1)

# Function to extract data from ACDOCA with error handling for SAPSQL_PARSE_ERROR

def extract_acdoca_data(conn, fields, batch_size=1000, max_rows=38000):

    total_rows_extracted = 0

    extracted_data = []

    row_offset = 0

    retry_attempts = 3  # Number of retry attempts in case of a recoverable error

    valid_fields = []

    for field in fields:

        try:

            # Test the field individually to check its validity

            result = conn.call('RFC_READ_TABLE',

                               QUERY_TABLE='ACDOCA',

                               DELIMITER='|',

                               FIELDS=[{'FIELDNAME': field}],

                               ROWCOUNT=1)

            if 'DATA' in result and result['DATA']:

                print(f"Field '{field}' is valid.")

                valid_fields.append(field)

            else:

                print(f"Field '{field}' is not valid or has no data.")

        except pyrfc.RFCError as e:

            print(f"Error with field '{field}': {str(e)}. Skipping this field.")

    if not valid_fields:

        print("No valid fields to extract data.")

        return [], []

    print(f"Proceeding with valid fields: {valid_fields}")

    # Proceed to extract data with valid fields

    while total_rows_extracted < max_rows:

        try:

            # Define filtering options (simplifying filtering to avoid SAPSQL errors)

            options = []

            # Extract data from ACDOCA with valid fields and row_offset for pagination

            result = conn.call('RFC_READ_TABLE',

                               QUERY_TABLE='ACDOCA',

                               DELIMITER='|',

                               FIELDS=[{'FIELDNAME': field} for field in valid_fields],

                               OPTIONS=options,

                               ROWCOUNT=min(batch_size, max_rows - total_rows_extracted),

                               ROWSKIPS=row_offset)

            # Check if data is returned

            if 'DATA' in result and result['DATA']:

                for row in result['DATA']:

                    extracted_data.append(row['WA'].split('|'))

                total_rows_extracted += len(result['DATA'])

                row_offset += len(result['DATA'])

                print(f"Extracted {len(result['DATA'])} rows, total so far: {total_rows_extracted}")

            else:

                break  # No more data available, stop the extraction

        except pyrfc.RFCError as e:

            print(f"Error during data extraction: {str(e)}")

            retry_attempts -= 1

            if retry_attempts > 0:

                print(f"Retrying... {retry_attempts} attempts left.")

                time.sleep(2)  # Short delay before retrying

                continue  # Retry extraction

            else:

                print("Max retries reached. Aborting extraction.")

                break

    return extracted_data, valid_fields

# Main function to run the extraction process

def main():

    conn = connect_to_sap()

    # Manually define a set of fields to be tested

    fields_to_test = ['BUKRS', 'BELNR', 'GJAHR', 'BUDAT', 'HSL', 'KSL', 'WSL']

    # Step 1: Extract the data using valid fields

    data, valid_fields = extract_acdoca_data(conn, fields_to_test, batch_size=1000, max_rows=38000)

    # Step 2: Save the extracted data to a CSV file

    if data:

        df = pd.DataFrame(data, columns=valid_fields)

        output_file = 'acdoca_38000_rows.csv'

        df.to_csv(output_file, index=False)

        print(f"Data saved to {output_file}")

    else:

        print("No data extracted.")

    # Close the SAP connection

    conn.close()

# Run the main function

if __name__ == "__main__":

    main()
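RFC_READ_TABLE returns each row as a single padded string, so the split values still need trimming and type conversion before the analyses above will work. A small sketch of that cleanup step (the sample rows are invented):

```python
import pandas as pd

# Invented rows in the shape produced by splitting RFC_READ_TABLE's pipe-delimited output
raw_rows = [['1000 ', '0100000001', '2023', ' 1500.00 '],
            ['1000 ', '0100000002', '2023', ' -250.00 ']]
df = pd.DataFrame(raw_rows, columns=['BUKRS', 'BELNR', 'GJAHR', 'KSL'])

# Strip the fixed-width padding, then convert year and amount columns to numeric types
df = df.apply(lambda col: col.str.strip())
df['GJAHR'] = pd.to_numeric(df['GJAHR'], errors='coerce')
df['KSL'] = pd.to_numeric(df['KSL'], errors='coerce')
print(df.dtypes)
```

Doing this once, right after extraction, saves every downstream analysis from repeating the same `to_numeric` conversions.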



 
