AI Case Study: Implementing a Real-time Financial News Agent - Part 3: In this part we will extract the text from the screen

Image of a frustrated trader sitting in front of multiple screens, overwhelmed by the amount of information

This article is part 3 of a series on implementing a real-time financial news agent. You can find the overview of the project here: AI Case Study: Implementing a Real-time Financial News Agent - Overview

Overview

In this part we extract the text shown in a captured screen image, using the gemma3:4b model served through Ollama.

Extraction Code:

import ollama

system_message = """Only extract the text from the image.

response in JSON in the format:

an Array (top level key result) with objects for each found text

the objects have the keys:
- type: <headline/name_tag/displayed_document/quoted_text/date_time_tag>
- text: <the text>

{
"result": [
    {
      "type": "name_tag",
      "text": "Donald Trump"
    },
    {
      "type": "headline",
      "text": "New Tariffs on cars. 25%"
    }
]
}

Important: the top level must be an object whose "result" key holds the array


"""


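# Run the extraction 100 times on the same image.
for _ in range(100):
    response = ollama.chat(
        model='gemma3:4b',
        format='json',  # constrain the model output to valid JSON
        messages=[
            {
                'role': 'system',
                'content': system_message,
            },
            {
                'role': 'user',
                'content': 'What is in this image?',
                # path to the image file to analyze
                'images': ['./resources/trump-executive-order.png'],
            },
        ],
    )

    # The extracted text is returned as a JSON string in the message content.
    print(response['message']['content'])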

Example - Extracted text from the image above:


{
  "result":[
    {
      "type": "headline",
      "text": "PRESIDENT TRUMP SIGNS EXECUTIVE ORDERS IN THE OVAL OFFICE, TAKES QUESTIONS WITH ELON MUSK"
    },
    {
      "type": "date_time_tag",
      "text": "FEBRUARY 11, 2025"
    }
  ]
}
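
Because the reply arrives as a JSON string, downstream code has to parse and check it before use. The following is a minimal sketch of how that could look, assuming the reply follows the format requested in the system message. The helper name parse_extraction and the ALLOWED_TYPES set are illustrative and not part of the original code.

```python
import json

# Types requested in the system message; anything else is treated as unexpected.
ALLOWED_TYPES = {
    "headline",
    "name_tag",
    "displayed_document",
    "quoted_text",
    "date_time_tag",
}


def parse_extraction(raw: str) -> list[dict]:
    """Parse the model reply and keep only well-formed entries (hypothetical helper)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return []  # the reply was not valid JSON

    # Accept either {"result": [...]} or a bare list, since a small model may vary.
    items = data.get("result", []) if isinstance(data, dict) else data
    if not isinstance(items, list):
        return []

    cleaned = []
    for item in items:
        if not isinstance(item, dict):
            continue
        kind, text = item.get("type"), item.get("text")
        # Keep the entry only if the type is known and the text is a non-empty string.
        if (
            isinstance(kind, str)
            and kind in ALLOWED_TYPES
            and isinstance(text, str)
            and text.strip()
        ):
            cleaned.append({"type": kind, "text": text.strip()})
    return cleaned


if __name__ == "__main__":
    sample = '{"result": [{"type": "date_time_tag", "text": "FEBRUARY 11, 2025"}]}'
    print(parse_extraction(sample))
```

Returning an empty list on malformed input is one possible design choice. It keeps a capture loop running when a single frame yields an unusable reply, at the cost of silently dropping that frame.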

On our Tesla P40 GPU, extracting the text from one image takes about 1.5 seconds with the gemma3:4b model. The model needs only around 5 GB of VRAM, so we can run two or three instances at the same time. This means a throughput of about one analyzed frame per second is possible.
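
The throughput estimate is simple arithmetic: with N instances working in parallel and each frame taking t seconds, roughly N / t frames are processed per second. The sketch below shows one way to measure the per-call latency and derive that estimate. It assumes that parallel instances do not slow each other down, which is not guaranteed on a shared GPU, and the function names are illustrative.

```python
import time
from typing import Callable


def average_latency(task: Callable[[], object], runs: int = 10) -> float:
    """Average wall-clock seconds per call of `task` over `runs` calls."""
    start = time.perf_counter()
    for _ in range(runs):
        task()
    return (time.perf_counter() - start) / runs


def frames_per_second(latency_s: float, instances: int) -> float:
    """Idealized throughput: assumes instances do not slow each other down."""
    return instances / latency_s


if __name__ == "__main__":
    # 1.5 s per frame is the figure reported above for gemma3:4b on the Tesla P40.
    for n in (1, 2, 3):
        print(f"{n} instance(s): {frames_per_second(1.5, n):.2f} frames/s")
```

With the 1.5-second figure, two instances give about 1.33 frames per second in this idealized model, which is consistent with the one frame per second stated above. To measure a real setup, pass a function that wraps the ollama.chat call to average_latency.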

Later we will integrate this extraction mechanism into our screen capture code from part 2. Stay tuned for the next part.

DISCLAIMER: The provided code does not present a production-ready setup regarding security and stability.
All code presented in this tutorial is used at your own risk.
Always consider performing a security audit before putting any code into production.

Furthermore, no part of this tutorial or its code should be considered financial advice.
Always consult a professional investment advisor before making any investment decisions.

At CorticalFlow, expanding the cognitive ability of the user is our mission.
