Should I Stream JSON or Poll JSON: A Practical Take


Frontend Software Engineer with a passion for unraveling the intricacies of software and systems. Armed with a B.Tech in Mech Engg and an MS in Engg Management from SJSU. My journey spans VFX, tech logistics, and automotive retail, always seeking the harmony between technology and practical application

Streaming vs. Polling JSON: Choosing the Right Approach for Rubric Generation

When designing the rubric generation system, I faced a key architectural decision:

Should I stream JSON responses as they are generated, or should I poll the backend for updates?

Streaming sounds like the obvious choice for real-time applications, but after careful consideration, I chose polling. Here’s why.

Understanding the Problem: Rubric Generation is an Asynchronous Process

Generating rubrics is not an instant operation. It involves multiple steps, each happening asynchronously:

  1. Analyze the uploaded question paper.

  2. Extract questions and transform them into JSON.

  3. Send each JSON block to OpenAI, creating one task per question to generate its rubric.

  4. Store responses in the database for retrieval.
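The steps above can be pictured as a small data flow. The sketch below is illustrative only: the `ExtractedQuestion` and `RubricTask` shapes, their field names, the `parentId` grouping key, and the `createRubricTasks` helper are all assumptions made for this example, not the real schema and not part of the OpenAI API.

```typescript
// Hypothetical shape of one question extracted from the uploaded paper (step 2).
interface ExtractedQuestion {
  number: number;
  text: string;
  maxMarks: number;
}

// Hypothetical unit of work: one rubric-generation task per question (step 3).
type TaskStatus = "pending" | "done" | "failed";

interface RubricTask {
  parentId: string; // assumed key that groups all tasks for one uploaded paper
  questionNumber: number;
  prompt: string; // text that would be sent to the LLM
  status: TaskStatus;
}

// Turn the extracted questions into independent tasks, one per question.
function createRubricTasks(
  parentId: string,
  questions: ExtractedQuestion[],
): RubricTask[] {
  return questions.map(
    (q): RubricTask => ({
      parentId,
      questionNumber: q.number,
      prompt: `Write a grading rubric worth ${q.maxMarks} marks for: ${q.text}`,
      status: "pending",
    }),
  );
}
```

Because every task carries its own status, each question can succeed or fail on its own, which is what makes the per-question results independent.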

Since each JSON response is generated independently, streaming might seem like a good fit. But does it actually make sense?

Challenges with Streaming JSON

When I considered streaming rubric data to the frontend, several key questions arose:

1) What if the user navigates away or kills the app?

  • Streaming requires an active connection. If the user leaves mid-process, they could lose all streamed data.

  • Polling allows users to resume tracking progress anytime, even after navigating away.
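One way to make "resume anytime" concrete is to remember which job is in flight and pick it up again on the next page load. The post does not say how the frontend tracks this, so the sketch below is an assumption: the key name and helper functions are hypothetical, and the `KeyValueStore` interface is just the subset of the Web Storage API that `window.localStorage` already satisfies.

```typescript
// Minimal subset of the Web Storage API; window.localStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

const ACTIVE_JOB_KEY = "activeRubricJobId"; // hypothetical key name

// Call when the upload is accepted and the backend returns the job's ID.
function rememberJob(store: KeyValueStore, jobId: string): void {
  store.setItem(ACTIVE_JOB_KEY, jobId);
}

// Call on page load: returns the job to resume polling for, if any.
function jobToResume(store: KeyValueStore): string | null {
  return store.getItem(ACTIVE_JOB_KEY);
}

// Call once the job is reported as finished.
function forgetJob(store: KeyValueStore): void {
  store.removeItem(ACTIVE_JOB_KEY);
}

// In-memory stand-in so the sketch also runs outside a browser.
function createMemoryStore(): KeyValueStore {
  const data = new Map<string, string>();
  return {
    getItem: (key) => data.get(key) ?? null,
    setItem: (key, value) => {
      data.set(key, value);
    },
    removeItem: (key) => {
      data.delete(key);
    },
  };
}
```

Nothing here depends on a connection staying open: the results live in the database, and the client only needs the job ID to ask for them again.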

2) When should we insert data into the database?

  • Should we stream first and insert later? If so, what happens if the database rejects some updates?

  • Should we insert first and then stream? If so, how do we ensure updates are pushed properly?

3) What if one process breaks?

  • If data insertion and streaming run separately, one could fail while the other succeeds.

  • This could lead to users seeing rubric updates that don’t actually exist in the database.

  • To avoid this, I would need to set up queues, event listeners, and retry mechanisms, making the infrastructure much more complex.
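The mismatch described above is easy to reproduce in miniature. The following is a toy simulation, not the real system: `FakeDb`, its rejection rule, and both functions are made up for illustration. It contrasts "stream first, insert later" with reading back only what was persisted.

```typescript
interface Rubric {
  questionNumber: number;
  criteria: string;
}

// Stand-in for the database: rejects rubrics with empty criteria.
class FakeDb {
  readonly rows: Rubric[] = [];

  insert(rubric: Rubric): void {
    if (rubric.criteria.trim() === "") {
      throw new Error(`rejected rubric for question ${rubric.questionNumber}`);
    }
    this.rows.push(rubric);
  }
}

// "Stream first, insert later": the client sees a rubric before it is persisted.
function streamThenInsert(rubrics: Rubric[], db: FakeDb): Rubric[] {
  const seenByClient: Rubric[] = [];
  for (const rubric of rubrics) {
    seenByClient.push(rubric); // pushed to the client over the stream
    try {
      db.insert(rubric);
    } catch {
      // Without a retry queue, the client never learns about this failure.
    }
  }
  return seenByClient;
}

// Polling: the client only ever sees what was successfully persisted.
function insertThenPoll(rubrics: Rubric[], db: FakeDb): Rubric[] {
  for (const rubric of rubrics) {
    try {
      db.insert(rubric);
    } catch {
      // The failure still needs handling, but no phantom row reaches the client.
    }
  }
  return [...db.rows]; // what a poll would return
}
```

Given one valid rubric and one that the database rejects, `streamThenInsert` shows the client two rubrics while only one row exists, whereas `insertThenPoll` returns exactly the one stored row.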

At this point, streaming JSON became a rabbit hole of complexity. I realized that I would need to:
1. Maintain a queue to persist data while streaming.
2. Set up event listeners to detect failures and retry processing.
3. Ensure that streaming doesn’t outpace database updates.

This would have added unnecessary complexity at both the infrastructure and implementation levels.

Polling JSON: A More Predictable & Reliable Approach

While polling is often seen as an "archaic" approach, it actually provides predictability and consistency in this case.

Here’s how the rubric generation flow works with polling:

  1. User uploads a question paper.

  2. The system extracts questions and converts them into JSON.

  3. A background job is started to generate rubrics for the entire question set.

  4. As the LLM generates rubrics for each question, we insert them into the database under a common parent ID.

  5. Once all questions have been processed, the job is marked as finished.

  6. The frontend polls an API that aggregates all rubrics linked to the parent ID and displays them as they become available.
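A minimal sketch of steps 4 to 6, under stated assumptions: `RubricStore` is an in-memory stand-in for the database, `JobSnapshot` is a guessed response shape for the aggregating API, and the 2-second interval and attempt cap are arbitrary defaults. In a real app, `fetchSnapshot` would wrap an HTTP request to that API.

```typescript
interface Rubric {
  questionNumber: number;
  criteria: string;
}

// Assumed response shape of the aggregating API.
interface JobSnapshot {
  parentId: string;
  finished: boolean;
  rubrics: Rubric[];
}

// In-memory stand-in for the database, keyed by parent ID.
class RubricStore {
  private rubrics = new Map<string, Rubric[]>();
  private finished = new Set<string>();

  insertRubric(parentId: string, rubric: Rubric): void {
    const rows = this.rubrics.get(parentId) ?? [];
    rows.push(rubric);
    this.rubrics.set(parentId, rows);
  }

  markFinished(parentId: string): void {
    this.finished.add(parentId);
  }

  // What the polled API would return: everything stored so far for the job.
  snapshot(parentId: string): JobSnapshot {
    return {
      parentId,
      finished: this.finished.has(parentId),
      rubrics: [...(this.rubrics.get(parentId) ?? [])],
    };
  }
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Frontend side: fetch a snapshot, render it, repeat until the job is finished.
async function pollUntilFinished(
  fetchSnapshot: () => Promise<JobSnapshot>,
  onUpdate: (snapshot: JobSnapshot) => void,
  intervalMs = 2000,
  maxAttempts = 150,
): Promise<JobSnapshot> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const snapshot = await fetchSnapshot();
    onUpdate(snapshot); // replace the UI state with the latest snapshot
    if (snapshot.finished) return snapshot;
    await sleep(intervalMs);
  }
  throw new Error("rubric job did not finish within the polling budget");
}
```

Each snapshot is a full read of what is stored, so the UI can simply replace its state on every poll instead of merging incremental events.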

Why Polling Wins in This Scenario

  • More Reliable: Polling always retrieves data from the database, ensuring consistency.

  • Handles User Navigation: Users can leave and return at any time without losing progress.

  • Simpler Infrastructure: No need for persistent connections, queues, or event-driven architecture.

  • Predictable Execution: Each poll fetches exactly what’s in the database: nothing more, nothing less.

Final Thoughts: Occam’s Razor in Action

Occam’s Razor, loosely paraphrased for engineering:
"The simplest solution that meets all requirements is usually the best."

Polling delivers near-real-time updates while avoiding the complexities of streaming, queue management, and event handling.

For use cases like chat apps or stock tickers, streaming makes sense. But for a job-based system like rubric generation, polling is the better choice.

A short video of rubrics polling in action.