PagerDuty has indicated a potential incident with our Furnishment API -- Bloom is actively investigating
Incident Report for bloomcredit
Resolved
Incident Severity: Low

Affected Services/API Endpoints:

Furnishment API

Description:

The performance degradation affecting the Furnishment Product Catalog API has been resolved. Response times for API requests have returned to normal levels, and users should no longer experience delays or timeouts when retrieving product information.

Resolution Details:

After conducting a thorough investigation, the engineering team identified and implemented optimizations to address the root cause of the performance degradation. Specifically, the following actions were taken:

- Optimized database queries to improve query execution times and reduce load on the database server.
- Implemented caching mechanisms to store frequently accessed product data and reduce the need for repeated database queries.
- Enhanced monitoring and alerting systems to proactively identify and mitigate performance issues in the future.
- Conducted extensive testing to validate the effectiveness of implemented optimizations and ensure the stability of the Furnishment API.
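
For readers who want a concrete picture of the caching step above, the following is a minimal sketch of a time-based (TTL) cache placed in front of a database lookup. It is an illustration only, not the implementation deployed for the Furnishment API: the class name `TTLCache`, the `get_or_load` method, and the `loader` callback are all hypothetical names chosen for this example.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Hypothetical in-memory cache: entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_load(self, key: Hashable,
                    loader: Callable[[Hashable], Any]) -> Any:
        """Return the cached value for key, calling loader only on a miss or expiry."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # fresh hit: no database query needed
        value = loader(key)  # miss or stale entry: query the backing store
        self._entries[key] = (now + self._ttl, value)
        return value
```

In a sketch like this, repeated requests for the same record within the TTL window are served from memory, which is how caching reduces repeated database queries. The trade-off is that data can be stale for up to `ttl_seconds`.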

Next Steps:

- Conduct a post-incident analysis to review the incident response process, identify lessons learned, and implement improvements to prevent similar performance issues in the future.
- Enhance documentation and training materials to educate team members on best practices for optimizing API performance and troubleshooting performance-related issues.
- Continue monitoring system metrics and user feedback to promptly address any emerging performance concerns and maintain the reliability of the Furnishment API.

Communication Plan:

- Internal communication: An internal post-mortem meeting will be scheduled to discuss the incident response process and identify areas for improvement. Updates will be provided to relevant teams via email and virtual meetings.
- External communication: An update will be posted on the public status page and sent to impacted customers and partners via email and Slack channels, notifying them that the performance degradation incident has been resolved.

Internal Notes:

The engineering team will continue to monitor the performance of the Furnishment API closely to ensure that the optimizations implemented remain effective and that any emerging performance issues are promptly addressed.

Follow-up:

A post-incident analysis will be conducted to identify contributing factors and implement preventive measures to mitigate similar performance issues in the future.

Incident Owner: Furnishment Team

Key Stakeholders: Product Management, Customer Support
Posted Mar 29, 2024 - 18:53 UTC
Investigating
Affected Services/API Endpoints:

Furnishment API

Description:

We observed a degradation in the performance of our Furnishment Product Catalog API. Response times for API requests have increased significantly, resulting in delays in retrieving product information for both internal systems and external integrations. This degradation in performance impacts the user experience and may lead to decreased efficiency for customers relying on our API services.

Current Status:

The incident is under active investigation. Our engineering team is analyzing system metrics and logs to identify the root cause of the performance degradation. Initial investigations suggest a potential bottleneck in the database query processing layer.

Root Cause Analysis (if known):

Preliminary analysis indicates that the performance degradation may be attributed to increased database query load, potentially caused by a surge in API requests or inefficient query execution.

Actions Taken:

- Engaged the engineering team to investigate and troubleshoot the performance issues.
- Implemented temporary optimizations to alleviate the load on the affected database server.
- Monitored system metrics and performance indicators to track the effectiveness of implemented optimizations.
- Notified relevant internal teams, including product management and customer support, about the ongoing incident.
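
As an illustration of the metric tracking mentioned above, the sketch below checks whether a high-percentile response time exceeds a threshold, using the nearest-rank percentile method. It is a hedged example, not the monitoring actually in use: the function names, the 95th-percentile default, and any threshold value passed in are assumptions made for this example.

```python
from typing import Sequence


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile of samples; pct is an integer from 1 to 100."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 1 <= pct <= 100:
        raise ValueError("pct must be between 1 and 100")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, avoiding floating-point rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


def is_degraded(latencies_ms: Sequence[float], threshold_ms: float,
                pct: int = 95) -> bool:
    """True when the pct-th percentile latency exceeds threshold_ms (hypothetical rule)."""
    return percentile(latencies_ms, pct) > threshold_ms
```

Comparing this value before and after a temporary optimization is one simple way to judge whether the change reduced tail latency. A percentile is used rather than the mean because a small number of slow requests can be hidden by an average.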

Next Steps:

- Continue investigating the root cause of the performance degradation, focusing on database query optimization and scalability improvements.
- Implement long-term solutions to enhance the resilience and scalability of the Product Catalog API.
- Communicate updates internally and externally as the investigation progresses and performance improvements are achieved.

Estimated Time to Resolution (ETR):

The estimated time to resolution is currently unknown. Updates will be provided as more information becomes available and improvements are implemented.

Communication Plan:

- Internal communication: Regular updates will be provided to relevant teams via email, Slack channels, and virtual meetings.
- External communication: The public status page will be updated with information about the incident and expected resolution times. Additionally, email notifications and Slack messages will be sent to impacted customers and partners.

Internal Notes:

Additional resources and support have been allocated to the engineering team to expedite the resolution process and minimize the impact on customers.


Follow-up:

A post-incident analysis will be conducted to identify contributing factors and implement preventive measures to mitigate similar performance issues in the future.

Incident Owner: Furnishment Team

Key Stakeholders: Product Management, Customer Support
Posted Mar 29, 2024 - 18:47 UTC
This incident affected: Furnishment API (Submission Engine, Data Management).