In today's fast-paced business world, understanding your financial data is key. With SAP and AI, you can turn raw data into meaningful predictions that drive decisions. Below, we explore seven powerful analyses using data from SAP's ACDOCA table to answer important business questions.
1. Revenue Trend by Account: Where’s Your Money Coming From?
By analyzing revenue trends by account over time, we can see how much each account contributes to the business.
Imagine looking at a graph that shows which of your accounts are making you the most money (or costing you!). It's like a financial health monitor that tells you which areas are growing and which ones need attention.
Are certain accounts bringing in more revenue this year than last?
In simple terms, the Revenue/Expense Trend by Account Over Time shows how much money each account is making (revenue) or losing (expenses) over a certain period, typically broken down by year. By looking at the trend, you can understand:
1. Which accounts are contributing the most to revenue: Accounts with positive values show they are bringing in money. A steady or increasing line suggests consistent or growing contributions.
2. Which accounts are incurring expenses: Accounts with negative values indicate they are causing expenses or losses. A decreasing line shows that the losses are getting bigger over time.
3. Yearly patterns: The chart allows you to see how the performance of each account changes from year to year. Some accounts might have large swings in revenue or expenses, while others might show a more consistent pattern.
Below is a Python program that aggregates the data by fiscal year (GJAHR) and by BELNR, used here as the account key (in a production ACDOCA extract you would typically group on the G/L account field RACCT instead, since BELNR is the journal entry number), sums the group currency amount (KSL), and visualizes the revenue/expense trend over time:
import matplotlib.pyplot as plt
import pandas as pd
# Load the CSV file
file_path = 'your_file_path.csv' # Replace with your actual file path
acdoca_data = pd.read_csv(file_path)
# Convert KSL column to numeric, handling errors as NaN
acdoca_data['KSL'] = pd.to_numeric(acdoca_data['KSL'], errors='coerce')
# Group by year (GJAHR) and account (BELNR), summing up KSL for each group
revenue_trend = acdoca_data.groupby(['GJAHR', 'BELNR'])['KSL'].sum().reset_index()
# Function to plot the trend
def plot_revenue_trend(data):
    plt.figure(figsize=(10, 6))
    for account in data['BELNR'].unique():
        account_data = data[data['BELNR'] == account]
        plt.plot(account_data['GJAHR'], account_data['KSL'], label=f'Account {account}')
    plt.title('Revenue/Expense Trend by Account Over Time')
    plt.xlabel('Year')
    plt.ylabel('Sum of Group Currency (KSL)')
    plt.legend(loc='best')
    plt.grid(True)
    plt.show()
# Plot the trend for top 5 accounts based on total KSL
top_accounts = revenue_trend.groupby('BELNR')['KSL'].sum().nlargest(5).index
top_accounts_data = revenue_trend[revenue_trend['BELNR'].isin(top_accounts)]
plot_revenue_trend(top_accounts_data)
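To answer the opening question directly — is an account bringing in more this year than last — you can pivot the aggregated frame and difference across years. A minimal sketch with illustrative numbers (the `revenue_trend` frame below is a made-up stand-in in the same shape as the one built above):

```python
import pandas as pd

# Illustrative aggregate in the same shape as revenue_trend above
revenue_trend = pd.DataFrame({
    'GJAHR': [2022, 2023, 2022, 2023],
    'BELNR': ['A1', 'A1', 'A2', 'A2'],
    'KSL':   [100.0, 120.0, 80.0, 70.0],
})

# Pivot to one row per account, one column per year
pivot = revenue_trend.pivot(index='BELNR', columns='GJAHR', values='KSL')

# Year-over-year change across the year columns
yoy = pivot.diff(axis=1)
print(yoy)
```

A positive value for the latest year means the account grew; a negative one means it shrank.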
2. Variance Analysis: Are You Staying On Budget?
Variance analysis compares planned versus actual revenue or expenses, helping you spot unexpected costs or profits.
Think of it like checking how far off you were from your monthly budget. Did you save more or spend more than you thought? Variance analysis gives you that answer so you can adjust your plans.
Did my costs vary significantly from what was budgeted?
In simple terms, the graph shows how well each account performed compared to what was originally planned for each year:
Above the zero line: If a line goes above the zero line, it means the account performed better than expected—either it brought in more money (if it's revenue) or spent less (if it's expenses).
Below the zero line: If a line dips below the zero line, it means the account did worse than expected—either it earned less money than planned or spent more.
Example interpretations:
Account 1 (yellow line): It had a strong performance in 2022 but started dipping in 2023, meaning it didn't meet expectations by the end.
Account 2 (orange line): This account consistently improved over time, always staying above the planned values, meaning it performed better each year.
Account 3 (red line): In 2021, this account underperformed significantly, but by 2023 it recovered, coming back above the planned levels.
Account 4 (pink line): It had a sudden jump in 2022, performing much better than planned, but then it started dipping again in 2023.
Account 5 (blue line): This account remained mostly stable across the years but consistently underperformed compared to the plan.
Overall, the graph helps you spot which accounts are performing well and which ones might need more attention to meet expectations.
This program will help you easily spot the variances and visualize them over time, providing insight into unexpected costs or profits.
import pandas as pd
import matplotlib.pyplot as plt
# Load CSV data
file_path = 'your_file_path.csv' # Replace with the actual file path
data = pd.read_csv(file_path)
# Convert columns to numeric, handling any errors as NaN
data['PLANNED'] = pd.to_numeric(data['PLANNED'], errors='coerce')
data['ACTUAL'] = pd.to_numeric(data['ACTUAL'], errors='coerce')
# Calculate the variance: difference between actual and planned
data['VARIANCE'] = data['ACTUAL'] - data['PLANNED']
# Summarize variance by year and account
variance_summary = data.groupby(['GJAHR', 'BELNR']).agg(
    Planned_Sum=('PLANNED', 'sum'),
    Actual_Sum=('ACTUAL', 'sum'),
    Variance_Sum=('VARIANCE', 'sum')
).reset_index()
# Display the summary
print(variance_summary)
# Plot the variance
plt.figure(figsize=(10, 6))
for account in variance_summary['BELNR'].unique():
    account_data = variance_summary[variance_summary['BELNR'] == account]
    plt.plot(account_data['GJAHR'], account_data['Variance_Sum'], label=f'Account {account}')
plt.title('Variance Analysis by Account Over Time')
plt.xlabel('Year')
plt.ylabel('Variance (Actual - Planned)')
plt.legend(loc='best')
plt.grid(True)
plt.show()
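To answer "did my costs vary significantly", absolute variance alone can mislead — a 150 overrun matters far more on a 1,000 budget than on a 100,000 one. Expressing variance as a percentage of plan and flagging anything beyond a tolerance is a common follow-up. A sketch with made-up numbers in the same shape as `variance_summary` (the 10% threshold is an arbitrary example, not a standard):

```python
import pandas as pd

# Illustrative summary in the same shape as variance_summary above
variance_summary = pd.DataFrame({
    'GJAHR': [2023, 2023],
    'BELNR': ['A1', 'A2'],
    'Planned_Sum': [1000.0, 500.0],
    'Actual_Sum': [1150.0, 510.0],
    'Variance_Sum': [150.0, 10.0],
})

# Variance as a percentage of plan, flagged when it exceeds 10%
variance_summary['Variance_Pct'] = (
    variance_summary['Variance_Sum'] / variance_summary['Planned_Sum'] * 100
)
variance_summary['Significant'] = variance_summary['Variance_Pct'].abs() > 10
print(variance_summary[['BELNR', 'Variance_Pct', 'Significant']])
```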
3. Profitability Analysis: Who’s Driving the Bottom Line?
This analysis helps you identify which accounts or products are the most profitable over time.
It’s like figuring out which parts of your business are making the most money after costs. This analysis shows you where to double down for maximum profitability.
Which accounts are the most profitable this year?
In simple terms, the graph shows how much profit each account is making over time, from 2021 to 2023. Profitability is calculated by subtracting costs (expenses) from revenue (income). Here's what the graph is telling us:
Account 2 (orange line) is doing the best. Its profits are increasing steadily each year, meaning this account is making more money compared to what it's spending, and it's getting better at it over time.
Account 3 (red line) is stuck. Its profits are staying the same every year, so it's not improving, but it’s also not getting worse.
Account 4 (pink line) and Account 5 (blue line) are showing gradual improvement. Their profits are increasing, though more slowly compared to Account 2.
Account 1 (yellow line) has low profits compared to the others, and while it's not losing money, it's not growing much either.
Main takeaway:
Account 2 is the most profitable and growing fast.
Account 3 is neither improving nor declining.
Accounts 4 and 5 are improving but at a slower pace.
Account 1 is profitable but not seeing much growth.
This analysis helps identify which parts of the business (accounts) are performing well and which might need more attention or resources to boost profits.
import pandas as pd
import matplotlib.pyplot as plt
# Load CSV data
file_path = 'your_file_path.csv' # Replace with the actual file path
data = pd.read_csv(file_path)
# Convert columns to numeric (if needed)
data['Revenue'] = pd.to_numeric(data['Revenue'], errors='coerce')
data['Cost'] = pd.to_numeric(data['Cost'], errors='coerce')
# Calculate profitability
data['Profitability'] = data['Revenue'] - data['Cost']
# Group the data by year and account
profitability_summary = data.groupby(['GJAHR', 'BELNR']).agg(
    Revenue_Sum=('Revenue', 'sum'),
    Cost_Sum=('Cost', 'sum'),
    Profitability_Sum=('Profitability', 'sum')
).reset_index()
# Plot the profitability trend
plt.figure(figsize=(10, 6))
for account in profitability_summary['BELNR'].unique():
    account_data = profitability_summary[profitability_summary['BELNR'] == account]
    plt.plot(account_data['GJAHR'], account_data['Profitability_Sum'], label=f'Account {account}')
plt.title('Profitability Analysis by Account Over Time')
plt.xlabel('Year')
plt.ylabel('Profitability (Revenue - Cost)')
plt.legend(loc='best')
plt.grid(True)
# Display the plot
plt.show()
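To answer "which accounts are the most profitable this year" directly from the summary, `idxmax` per year picks the winning row. A small sketch using illustrative data in the same shape as `profitability_summary` above:

```python
import pandas as pd

# Illustrative data in the same shape as profitability_summary above
profitability_summary = pd.DataFrame({
    'GJAHR': [2022, 2022, 2023, 2023],
    'BELNR': ['A1', 'A2', 'A1', 'A2'],
    'Profitability_Sum': [50.0, 80.0, 60.0, 90.0],
})

# For each year, keep the row with the highest profitability
idx = profitability_summary.groupby('GJAHR')['Profitability_Sum'].idxmax()
top_per_year = profitability_summary.loc[idx]
print(top_per_year)
```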
4. Cost Distribution: Where Are Your Costs Going?
Cost distribution analysis shows how expenses are spread across various accounts.
Picture all your costs laid out, showing exactly where your money is going. It’s like a snapshot of how much each part of your business is costing you.
What portion of my expenses is being spent on materials versus labor?
This graph shows the Cost Distribution Analysis by account over time, from 2021 to 2023. Here's how to interpret it:
Y-axis (Cost): This axis represents the total expenses for each account.
X-axis (Year): The time progression from 2021 to 2023.
Colored lines: Each line represents the cost trend for a specific account over the years.
Key takeaways:
Account 5 (blue line) consistently has the highest costs compared to other accounts, with a slight decrease in 2023.
Account 2 (orange line) and Account 3 (red line) show increasing costs in 2022, but they slightly decline in 2023.
Account 1 (yellow line) and Account 4 (pink line) have relatively lower costs, but they both show slight increases in 2022 before decreasing in 2023.
This analysis helps to understand where most of your expenses are concentrated and how they change over time, aiding in better financial planning and resource allocation.
Here’s the Python code for the Cost Distribution Analysis based on the graph shown, where it analyzes how expenses are spread across various accounts over time:
import pandas as pd
import matplotlib.pyplot as plt
# Simulated data to represent cost distribution across accounts and years
data = pd.DataFrame({
    'GJAHR': [2021, 2021, 2021, 2021, 2021, 2022, 2022, 2022, 2022, 2022, 2023, 2023, 2023, 2023, 2023],
    'BELNR': ['Account 1', 'Account 2', 'Account 3', 'Account 4', 'Account 5',
              'Account 1', 'Account 2', 'Account 3', 'Account 4', 'Account 5',
              'Account 1', 'Account 2', 'Account 3', 'Account 4', 'Account 5'],
    'Cost': [15000, 20000, 18000, 17000, 22000,
             16000, 21000, 19000, 17500, 23000,
             15500, 20500, 18500, 16500, 22500]
})
# Group the data by year and account to sum up the costs
cost_distribution = data.groupby(['GJAHR', 'BELNR']).agg(
    Cost_Sum=('Cost', 'sum')
).reset_index()
# Plotting the cost distribution by year and account
plt.figure(figsize=(10, 6))
for account in cost_distribution['BELNR'].unique():
    account_data = cost_distribution[cost_distribution['BELNR'] == account]
    plt.plot(account_data['GJAHR'], account_data['Cost_Sum'], label=f'Account {account}')
plt.title('Cost Distribution Analysis by Account Over Time')
plt.xlabel('Year')
plt.ylabel('Cost (in units)')
plt.legend(loc='best')
plt.grid(True)
# Display the plot
plt.show()
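To answer the materials-versus-labor question, each account's share of the year's total cost is more informative than raw amounts. A sketch with hypothetical account names and made-up figures, in the same shape as `cost_distribution` above:

```python
import pandas as pd

# Illustrative data: two hypothetical cost buckets in one year
cost_distribution = pd.DataFrame({
    'GJAHR': [2023, 2023],
    'BELNR': ['Materials', 'Labor'],
    'Cost_Sum': [60000.0, 40000.0],
})

# Each account's share of that year's total cost, in percent
cost_distribution['Share_Pct'] = (
    cost_distribution.groupby('GJAHR')['Cost_Sum'].transform(lambda s: s / s.sum() * 100)
)
print(cost_distribution)
```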
5. Correlation Analysis: How Do Different Factors Affect Each Other?
Correlation analysis examines the relationships between different financial factors (like revenue and expenses).
Imagine you want to see if increasing marketing spend leads to higher sales. Correlation analysis will show you if there's a connection between two factors and help you make smarter decisions.
Is there a relationship between increasing advertising and increased revenue?
Here is the Correlation Analysis graph showing the relationship between Revenue and Expenses. The scatter plot helps visualize how closely these two financial factors are related:
Each point represents a data entry showing a combination of revenue and expenses.
The red line is a regression line, showing the overall trend.
The correlation coefficient displayed on the chart indicates the strength of the relationship between revenue and expenses.
A higher correlation value means that when revenue increases, expenses also tend to increase in a predictable manner.
Here's a Python code to create a correlation analysis:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Load the CSV data (Replace 'your_file_path.csv' with the actual file path)
# data = pd.read_csv('your_file_path.csv')
# Simulating data for correlation analysis since actual CSV is not provided
data = pd.DataFrame({
    'Revenue': [30000, 40000, 35000, 32000, 38000, 31000, 42000, 36000, 33000, 39000],
    'Expenses': [15000, 20000, 18000, 17000, 22000, 16000, 21000, 19000, 17500, 23000]
})
# Calculate the correlation coefficient between Revenue and Expenses
correlation = data['Revenue'].corr(data['Expenses'])
print(f'Correlation between Revenue and Expenses: {correlation:.2f}')
# Create a scatter plot with a regression line to visualize the correlation
plt.figure(figsize=(8, 6))
sns.regplot(x='Revenue', y='Expenses', data=data, ci=None, scatter_kws={"color": "blue"}, line_kws={"color": "red"})
# Add titles and labels
plt.title('Correlation Analysis: Revenue vs Expenses')
plt.xlabel('Revenue')
plt.ylabel('Expenses')
plt.grid(True)
# Display the plot
plt.show()
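The pairwise approach above extends naturally to more than two factors: `DataFrame.corr()` returns a full Pearson correlation matrix in one call. A sketch with a hypothetical `Marketing` column added to illustrative figures (the numbers are invented for demonstration):

```python
import pandas as pd

# Illustrative data with a third, hypothetical factor
data = pd.DataFrame({
    'Revenue':   [30000, 40000, 35000, 32000, 38000],
    'Expenses':  [15000, 20000, 18000, 17000, 22000],
    'Marketing': [3000, 4200, 3600, 3100, 3900],
})

# Pairwise Pearson correlations between all numeric columns
corr_matrix = data.corr()
print(corr_matrix.round(2))
```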
6. Time Series Forecasting: What’s Next for Your Business?
Time series forecasting predicts future trends based on past data. With AI, you can forecast future revenue or expenses.
It's like having a crystal ball that shows you where your business is heading based on past performance. AI uses patterns in your data to make predictions, helping you plan for the future.
What will my revenue look like next quarter if current trends continue?
The graph above shows the Time Series Forecasting for revenue:
Historical Data (solid line with circles): This represents the actual revenue values from 2010 to 2020.
Forecasted Data (dashed line with x's): This predicts revenue for the next three years (2021 to 2023) based on the historical trend.
The forecast suggests that revenue will continue to rise steadily if the current trend persists, giving you a clear idea of future performance based on past data. This kind of analysis helps businesses plan for future growth or identify potential risks.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.holtwinters import ExponentialSmoothing
# Simulating a dataset for time series (yearly revenue over 10 years)
data = pd.DataFrame({
    'Year': np.arange(2010, 2021),
    'Revenue': [30000, 32000, 34000, 36000, 38000, 40000, 42000, 45000, 47000, 49000, 51000]
})
# Fit an Exponential Smoothing model for forecasting
model = ExponentialSmoothing(data['Revenue'], trend='add', seasonal=None, damped_trend=True)
fit = model.fit()
# Forecast the next 3 years
forecast_years = np.arange(2021, 2024)
forecast = fit.forecast(steps=len(forecast_years))
# Combine historical data and forecast into a single DataFrame for plotting
forecast_data = pd.DataFrame({
    'Year': np.concatenate([data['Year'], forecast_years]),
    'Revenue': np.concatenate([data['Revenue'], forecast])
})
# Plot historical data and forecast
plt.figure(figsize=(10, 6))
plt.plot(data['Year'], data['Revenue'], label='Historical Revenue', marker='o')
plt.plot(forecast_years, forecast, label='Forecasted Revenue', linestyle='--', marker='x')
plt.title('Time Series Forecasting: Revenue Prediction')
plt.xlabel('Year')
plt.ylabel('Revenue')
plt.legend(loc='best')
plt.grid(True)
# Display the plot
plt.show()
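As a sanity check on the smoothed forecast, a plain linear trend fitted with `numpy.polyfit` should land in the same neighborhood for the first forecast year; if the two disagree wildly, the model settings deserve a second look. A sketch on the same simulated series:

```python
import numpy as np

years = np.arange(2010, 2021)
revenue = np.array([30000, 32000, 34000, 36000, 38000, 40000,
                    42000, 45000, 47000, 49000, 51000])

# Fit a straight line as a cross-check on the smoothed forecast
slope, intercept = np.polyfit(years, revenue, 1)
linear_2021 = slope * 2021 + intercept
print(f'Linear trend: {slope:.0f}/year, 2021 estimate: {linear_2021:.0f}')
```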
7. Outlier Detection: Catching the Oddballs in Your Data
This analysis identifies unusual data points that don’t fit the normal pattern.
Think of it as catching the oddball expenses or revenues that don’t make sense. Outlier detection helps you flag errors or unexpected entries that might need further investigation.
Were there any unusual spikes in expenses last month that don’t seem right?
The graph above shows the Outlier Detection in revenue data:
Blue circles represent the normal data points (revenue across different years).
Red dots highlight the outliers, which are values that deviate significantly from the expected trend (in this case, the year with revenue at 100,000 stands out as an outlier).
This type of analysis helps in identifying any unusual data points that may indicate errors or significant events worth investigating further.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import zscore
# Simulating data with some outliers
data = pd.DataFrame({
    'Year': np.arange(2010, 2021),
    'Revenue': [30000, 32000, 34000, 36000, 38000, 100000, 42000, 45000, 47000, 49000, 51000]  # Note the outlier (100000)
})
# Calculate Z-scores to identify outliers. On only 11 points the maximum
# attainable |z| is (n-1)/sqrt(n) ≈ 3.0 (the 100000 point scores about 2.9),
# so a cutoff of 3 would flag nothing; use 2 for a sample this small
data['Zscore'] = zscore(data['Revenue'])
data['Outlier'] = data['Zscore'].abs() > 2
# Plot the data points, highlighting outliers
plt.figure(figsize=(10, 6))
plt.plot(data['Year'], data['Revenue'], label='Revenue', marker='o')
# Highlight outliers
outliers = data[data['Outlier']]
plt.scatter(outliers['Year'], outliers['Revenue'], color='red', label='Outliers', zorder=5)
plt.title('Outlier Detection in Revenue Data')
plt.xlabel('Year')
plt.ylabel('Revenue')
plt.legend(loc='best')
plt.grid(True)
# Display the plot
plt.show()
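One caveat with z-scores: on a series of only 11 points the largest attainable |z| is (n−1)/√n ≈ 3.0, so the textbook cutoff of 3 will almost never fire on short series. Tukey's IQR rule is a common alternative in that regime; a sketch on the same simulated revenue series:

```python
import pandas as pd

revenue = pd.Series([30000, 32000, 34000, 36000, 38000, 100000,
                     42000, 45000, 47000, 49000, 51000])

# Tukey's rule: flag points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
q1, q3 = revenue.quantile(0.25), revenue.quantile(0.75)
iqr = q3 - q1
outliers = revenue[(revenue < q1 - 1.5 * iqr) | (revenue > q3 + 1.5 * iqr)]
print(outliers)
```

Here the 100,000 entry is well above the upper fence and is flagged even though its z-score stays below 3.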
By combining the power of SAP’s ACDOCA data and AI-driven analytics, your business can gain deeper insights into financial performance and make smarter, data-driven decisions. Whether you’re tracking revenue, keeping an eye on costs, or forecasting future trends, AI is here to help turn your data into actionable predictions.
Notables: Dataset Preparation
Extraction from ACDOCA (Python extraction program):
import pyrfc
import pandas as pd
from pyrfc import Connection
import time
# SAP connection details
sap_conn_params = {
    'user': 'username',
    'passwd': 'password',
    'ashost': 'hostname',
    'sysnr': '00',    # system number
    'client': '100',  # client number
    'lang': 'EN',     # language
}
# Function to connect to SAP
def connect_to_sap():
    try:
        conn = Connection(**sap_conn_params)
        print("Connected to SAP")
        return conn
    except pyrfc.CommunicationError as e:
        print(f"Error connecting to SAP: {str(e)}")
        exit(1)
# Function to extract data from ACDOCA with error handling for SAPSQL_PARSE_ERROR
def extract_acdoca_data(conn, fields, batch_size=1000, max_rows=38000):
    total_rows_extracted = 0
    extracted_data = []
    row_offset = 0
    retry_attempts = 3  # Number of retry attempts in case of a recoverable error
    valid_fields = []
    for field in fields:
        try:
            # Test the field individually to check its validity
            result = conn.call('RFC_READ_TABLE',
                               QUERY_TABLE='ACDOCA',
                               DELIMITER='|',
                               FIELDS=[{'FIELDNAME': field}],
                               ROWCOUNT=1)
            if 'DATA' in result and result['DATA']:
                print(f"Field '{field}' is valid.")
                valid_fields.append(field)
            else:
                print(f"Field '{field}' is not valid or has no data.")
        except pyrfc.RFCError as e:
            print(f"Error with field '{field}': {str(e)}. Skipping this field.")
    if not valid_fields:
        print("No valid fields to extract data.")
        return [], []
    print(f"Proceeding with valid fields: {valid_fields}")
    # Proceed to extract data with valid fields
    while total_rows_extracted < max_rows:
        try:
            # Define filtering options (simplifying filtering to avoid SAPSQL errors)
            options = []
            # Extract data from ACDOCA with valid fields and row_offset for pagination
            result = conn.call('RFC_READ_TABLE',
                               QUERY_TABLE='ACDOCA',
                               DELIMITER='|',
                               FIELDS=[{'FIELDNAME': field} for field in valid_fields],
                               OPTIONS=options,
                               ROWCOUNT=min(batch_size, max_rows - total_rows_extracted),
                               ROWSKIPS=row_offset)
            # Check if data is returned
            if 'DATA' in result and result['DATA']:
                for row in result['DATA']:
                    extracted_data.append(row['WA'].split('|'))
                total_rows_extracted += len(result['DATA'])
                row_offset += len(result['DATA'])
                print(f"Extracted {len(result['DATA'])} rows, total so far: {total_rows_extracted}")
            else:
                break  # No more data available, stop the extraction
        except pyrfc.RFCError as e:
            print(f"Error during data extraction: {str(e)}")
            retry_attempts -= 1
            if retry_attempts > 0:
                print(f"Retrying... {retry_attempts} attempts left.")
                time.sleep(2)  # Short delay before retrying
                continue  # Retry extraction
            else:
                print("Max retries reached. Aborting extraction.")
                break
    return extracted_data, valid_fields
# Main function to run the extraction process
def main():
    conn = connect_to_sap()
    # Manually define a set of fields to be tested
    fields_to_test = ['BUKRS', 'BELNR', 'GJAHR', 'BUDAT', 'HSL', 'KSL', 'WSL']
    # Step 1: Extract the data using valid fields
    data, valid_fields = extract_acdoca_data(conn, fields_to_test, batch_size=1000, max_rows=38000)
    # Step 2: Save the extracted data to a CSV file
    if data:
        df = pd.DataFrame(data, columns=valid_fields)
        output_file = 'acdoca_38000_rows.csv'
        df.to_csv(output_file, index=False)
        print(f"Data saved to {output_file}")
    else:
        print("No data extracted.")
    # Close the SAP connection
    conn.close()
# Run the main function
if __name__ == "__main__":
    main()
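Once the CSV is on disk, note that RFC_READ_TABLE returns fixed-width text, so fields arrive padded with spaces and every column is a string. A small cleanup pass before running the analyses above is worthwhile; a sketch on a simulated two-row extract (the values are made up):

```python
import pandas as pd
from io import StringIO

# Simulated extract in the shape produced by the program above
csv_text = """BUKRS,BELNR,GJAHR,KSL
1000, 0000000001 ,2023, 1250.50
1000, 0000000002 ,2023, -300.25
"""
df = pd.read_csv(StringIO(csv_text), dtype=str)

# Strip fixed-width padding, then convert the numeric columns
df = df.apply(lambda col: col.str.strip())
df['KSL'] = pd.to_numeric(df['KSL'], errors='coerce')
df['GJAHR'] = pd.to_numeric(df['GJAHR'], errors='coerce')
print(df.dtypes)
```

Keeping BELNR as a string preserves its leading zeros, which matter when joining back to other SAP tables.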