Fix example evals #4

davidaparicio · 2024-10-31T21:47:53Z

Some of the example evals fail and need fixing, for example:

FAILED examples/triage_agent/evals.py::test_conversation_is_successful[messages0] - assert False == True

or this one:

___________________________________________________________________________________________________ test_does_not_call_weather_when_not_asked[Hi!] ____________________________________________________________________________________________________

query = 'Hi!'

    @pytest.mark.parametrize(
        "query",
        [
            "Who's the president of the United States?",
            "What is the time right now?",
            "Hi!",
        ],
    )
    def test_does_not_call_weather_when_not_asked(query):
        tool_calls = run_and_get_tool_calls(weather_agent, query)

>       assert not tool_calls
E       assert not [{'function': {'arguments': '{"location": "New York", "time": "now"}', 'name': 'get_weather'}, 'id': 'call_0', 'type': 'function'}]

examples/weather_agent/evals.py:44: AssertionError
==== short test summary info =====
FAILED examples/weather_agent/evals.py::test_does_not_call_weather_when_not_asked[Who's the president of the United States?] - assert not [{'function': {'arguments': '{"location": "United States"}', 'name': 'get_weather'}, 'id': 'call_0', 'type': 'function'}]
FAILED examples/weather_agent/evals.py::test_does_not_call_weather_when_not_asked[What is the time right now?] - assert not [{'function': {'arguments': '{"location": "", "time": "now"}', 'name': 'get_weather'}, 'id': 'call_0', 'type': 'function'}]
FAILED examples/weather_agent/evals.py::test_does_not_call_weather_when_not_asked[Hi!] - assert not [{'function': {'arguments': '{"location": "New York", "time": "now"}', 'name': 'get_weather'}, 'id': 'call_0', 'type': 'function'}]
====== 3 failed, 3 passed in 3.10s ======
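
To make failures like these easier to read while debugging, the tool calls could be reduced to their name and parsed arguments before asserting. This is only a rough sketch: `summarize_tool_calls` is a hypothetical helper that is not part of the repo, and it assumes each tool call has the same dict shape as in the output above (a `function` entry with a `name` and a JSON-encoded `arguments` string).

```python
import json


def summarize_tool_calls(tool_calls):
    """Return (name, parsed_arguments) pairs for a list of tool-call dicts.

    Hypothetical helper; assumes the dict shape shown in the pytest output.
    """
    summary = []
    for call in tool_calls:
        function = call["function"]
        # "arguments" is a JSON string, so decode it into a dict.
        summary.append((function["name"], json.loads(function["arguments"])))
    return summary


# Tool call copied from the failing "Hi!" case.
tool_calls = [
    {
        "function": {
            "arguments": '{"location": "New York", "time": "now"}',
            "name": "get_weather",
        },
        "id": "call_0",
        "type": "function",
    }
]

print(summarize_tool_calls(tool_calls))
# prints [('get_weather', {'location': 'New York', 'time': 'now'})]
```

In the eval this could be used as `assert not tool_calls, summarize_tool_calls(tool_calls)`, so the failure message shows which tool was called and with which (here invented) arguments.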
