Use a context-free grammar to simplify parsing (#36)
* try lark cfg

* fixes #35

* support just a or p for meridiem

* basic support for ranges, distribute date, meridiem

* distribute meridiem across choices

* optional comma

* distribute meridiem across time objects too

* two more tests

* support plaintext times/dates/weekdays

* support distributing time across multiple dates

* distribute across ranges

* support (time date) and distribute dates from last or first

* hardcoded support for duration in minutes

* more todos

* added support for past instead of future - fixes #7

* rename to tfh

* new config for transformer

* infer datetimes to match old behavior

* modernize tests

* added tests for gh issues (not fixed yet)

* support most duration possibilities, fixes #22

* test for case insensitivity, fixes #24

* added support for in/for/ago, fixes #25

* hm how to deal with random text?

* fixed dash dates

* handle ambiguous day or token with new rule

* support for day suffixes

* support just day-th

* some comments

* added lark as dep, move now to config

* mention lists too

* Update README.md

* starting transformer refactor

* pass a few tests + add range/list custom objs

* more generalized infer

* passed all no-inference tests

* pass simple date/time tests with inference

* pass all datetime inference tests

* pass all tests - support range infer

* distribute month and time

* simpler datetime infer, fixes #8

* only infer if not defined

* distribute month across days

* passed the other distribute month test too

* catch case with two ambiguous tokens

* allow comma and or

* drop unused files

* rename unknown -> ambiguous

* finally allow unknown tokens, fixes #26

* oops added fully spelled out weekdays

* added more weekdays

* added more text

* update readme with up-to-date information

* annotate type

* nits for formatting

* nits for formatting, drop old section

* Update README.md

* 2x faster tests just by reusing grammar ast

* version bump + update to include readme

* moved grammar to dedicated file

* drop unneeded code

* nit delete byline

* updated pytest, drop travis

* fixed coverage settings

* move makefile into readme, no need

* add a note about recurrence, from a different PR
alvinwan authored Jan 16, 2025
1 parent 545e427 commit 6d174a0
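
Several items in the commit message ("distribute meridiem across choices", "distribute across ranges") describe copying information that is written once onto the neighbouring times that omit it, so that `3-4p` reads as 3 PM to 4 PM. The sketch below illustrates that idea in isolation. It is not the code from this commit: `parse_hour_range` and `to_24h` are hypothetical names, and a regular expression stands in for the lark grammar the commit actually introduces.

```python
import re
from typing import Optional, Tuple

# Hypothetical illustration only; not part of timefhuman's API.
# Matches "H[a|p][m] - H[a|p][m]", e.g. "3-4p" or "3 PM - 4 PM".
_RANGE = re.compile(r"\s*(\d{1,2})\s*([ap])?m?\s*-\s*(\d{1,2})\s*([ap])?m?\s*")


def to_24h(hour: int, meridiem: Optional[str]) -> int:
    """Convert a 12-hour clock value to 24-hour; leave it alone if no meridiem."""
    if meridiem is None:
        return hour
    # 12 AM -> 0 and 12 PM -> 12, hence the modulo before adding the offset.
    return hour % 12 + (12 if meridiem == "p" else 0)


def parse_hour_range(text: str) -> Tuple[int, int]:
    """Parse an hour range, sharing a lone meridiem with the side that lacks one."""
    match = _RANGE.fullmatch(text.lower())
    if match is None:
        raise ValueError(f"not an hour range: {text!r}")
    start, start_mer, end, end_mer = match.groups()
    # "Distribute" step: whichever side has a meridiem lends it to the other.
    start_mer = start_mer or end_mer
    end_mer = end_mer or start_mer
    return to_24h(int(start), start_mer), to_24h(int(end), end_mer)


print(parse_hour_range("3-4p"))    # (15, 16)
print(parse_hour_range("11a-1p"))  # (11, 13)
```

This naive copy is deliberately simple: an input such as `11-1p` would come out as 23 to 13, so a real parser needs extra rules for ranges that cross noon.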
Showing 16 changed files with 852 additions and 1,592 deletions.
2 changes: 1 addition & 1 deletion .coveragerc
@@ -1,5 +1,5 @@
[run]
include = */timefhuman/*
include = timefhuman/*
omit = tests/*

[report]
15 changes: 0 additions & 15 deletions .travis.yml

This file was deleted.

3 changes: 3 additions & 0 deletions MANIFEST.in
@@ -0,0 +1,3 @@
include README.md
include LICENSE
include timefhuman/grammar.lark
4 changes: 0 additions & 4 deletions Makefile

This file was deleted.

141 changes: 70 additions & 71 deletions README.md
@@ -2,118 +2,117 @@

[![PyPi Downloads per Month](https://img.shields.io/pypi/dm/timefhuman.svg)](https://pypi.python.org/pypi/timefhuman/)
[![Coverage Status](https://coveralls.io/repos/github/alvinwan/timefhuman/badge.svg?branch=master)](https://coveralls.io/github/alvinwan/timefhuman?branch=master)
[![Build Status](https://travis-ci.org/alvinwan/timefhuman.svg?branch=master)](https://travis-ci.org/alvinwan/timefhuman)

Convert human-readable, date-like strings written in natural language to Python objects. Describe specific datetimes or ranges of datetimes. [Supports Python3+](https://github.com/alvinwan/timefhuman/issues/3). You can skip installation and try timefhuman directly, using the [pytwiddle demo →](https://pytwiddle.com/?id=example:datetime.py)
Convert human-readable, date-like strings written in natural language to Python objects. Find datetimes, ranges of datetimes, lists of datetimes, and durations in text. Supports Python3+[^1]

[^1]: https://github.com/alvinwan/timefhuman/issues/3

----

## Getting Started

To start, describe days of the week or times of day in the vernacular.

```
```python
>>> from timefhuman import timefhuman
>>> timefhuman('upcoming Monday noon')

>>> timefhuman('Monday noon')
datetime.datetime(2018, 8, 6, 12, 0)
```

Use any human-readable format with a time range, choices of times, or choices of time ranges.
Use any human-readable format to describe a datetime, datetime range, list of datetimes, or a duration. You can also use any combination of the above, such as a list of ranges.

```
>>> timefhuman('7/17 3-4 PM')
```python
>>> timefhuman('3p-4p') # time range
(datetime.datetime(2018, 7, 17, 15, 0), datetime.datetime(2018, 7, 17, 16, 0))
>>> timefhuman('7/17 3 p.m. - 4 p.m.')
(datetime.datetime(2018, 7, 17, 15, 30), datetime.datetime(2018, 7, 17, 16, 0))
>>> timefhuman('Monday 3 pm or Tu noon')

>>> timefhuman('7/17 4PM to 7/17 5PM') # datetime range
(datetime.datetime(2018, 7, 17, 16, 0), datetime.datetime(2018, 7, 17, 17, 0))

>>> timefhuman('Monday 3 pm or Tu noon') # list of datetimes
[datetime.datetime(2018, 8, 6, 15, 0), datetime.datetime(2018, 8, 7, 12, 0)]
>>> timefhuman('7/17 4 or 5 PM')
[datetime.datetime(2018, 7, 17, 16, 0), datetime.datetime(2018, 7, 17, 17, 0)]
>>> timefhuman('7/17 4-5 or 5-6 PM')

>>> timefhuman('30 minutes') # duration
datetime.timedelta(seconds=1800)

>>> timefhuman('7/17 4-5 or 5-6 PM') # list of datetime ranges
[(datetime.datetime(2018, 7, 17, 16, 0), datetime.datetime(2018, 7, 17, 17, 0)),
(datetime.datetime(2018, 7, 17, 17, 0), datetime.datetime(2018, 7, 17, 18, 0))]
```

Parse lists of dates and times with more complex relationships.
`timefhuman` will also infer any missing information, using context from other datetimes.

```
>>> timefhuman('7/17, 7/18, 7/19 at 2')
[datetime.datetime(2018, 7, 17, 2, 0), datetime.datetime(2018, 7, 18, 2, 0), datetime.datetime(2018, 7, 19, 2, 0)]
>>> timefhuman('2 PM on 7/17 or 7/19')
[datetime.datetime(2018, 7, 17, 14, 0), datetime.datetime(2018, 7, 19, 14, 0)]
```
```python
>>> timefhuman('3-4p') # infer "PM" for "3"
(datetime.datetime(2018, 7, 17, 15, 0), datetime.datetime(2018, 7, 17, 16, 0))

Use the vernacular to describe ranges or days.
>>> timefhuman('7/17 4 or 5 PM') # infer "PM" for "4" and infer "7/17" for "5 PM"
[datetime.datetime(2018, 7, 17, 16, 0), datetime.datetime(2018, 7, 17, 17, 0)]

>>> timefhuman('7/17, 7/18, 7/19 at 9') # infer "9a" for "7/17", "7/18"
[datetime.datetime(2018, 7, 17, 9, 0), datetime.datetime(2018, 7, 18, 9, 0),
datetime.datetime(2018, 7, 19, 9, 0)]
```
>>> timefhuman('noon next week') # coming soon

>>> timefhuman('today or tomorrow noon') # when run on August 4, 2018
[datetime.datetime(2018, 8, 4, 12, 0), datetime.datetime(2018, 8, 5, 12, 0)]
You can also pass in irrelevant text, and `timefhuman` will return all datetime-like objects in the text. You could use this to extract datetimes from an email for example.

```python
>>> timefhuman("How does 5p mon sound? Or maybe 4p tu?")
[datetime.datetime(2018, 8, 6, 17, 0), datetime.datetime(2018, 8, 7, 16, 0)]
```

# Installation
See more examples in [`tests/test_e2e.py`](tests/test_e2e.py).

## Installation

Install with pip using

```
```python
pip install timefhuman
```

Optionally, clone the repository and run `python setup.py install`.
Optionally, clone the repository and run `pip install -e .`.

You can also try timefhuman without a local installation using the [twiddle](https://pytwiddle.com/?id=example:datetime.py).
## Advanced Usage

# Usage
Use the `tfhConfig` class to configure `timefhuman`. For example, you can pass a `now` datetime to use different default values.

Use the `now` kwarg to use different default values for the parser.

```
```python
>>> from timefhuman import timefhuman, tfhConfig
>>> import datetime
>>> now = datetime.datetime(2018, 8, 4, 0, 0)
>>> timefhuman('upcoming Monday noon', now=now)
>>> config = tfhConfig(now=datetime.datetime(2018, 8, 4, 0, 0))

>>> timefhuman('upcoming Monday noon', config=config)
datetime.datetime(2018, 8, 6, 12, 0)
```

Use a variety of different formats, even with days of the week, months, and times with everyday speech. These are structured formats. [`dateparser`](https://github.com/scrapinghub/dateparser) supports structured formats across languages, customs etc.
Alternatively, you can completely disable date inference by setting `infer_datetimes=False`. Instead of always returning a datetime, `timefhuman` will be able to return date-like or time-like objects for only explicitly-written information.

```
>>> from timefhuman import timefhuman
>>> now = datetime.datetime(year=2018, month=7, day=7)
>>> timefhuman('July 17, 2018 at 3p.m.')
datetime.datetime(2018, 7, 17, 15, 0)
>>> timefhuman('July 17, 2018 3 p.m.')
datetime.datetime(2018, 7, 17, 15, 0)
>>> timefhuman('3PM on July 17', now=now)
datetime.datetime(2018, 7, 17, 15, 0)
>>> timefhuman('July 17 at 3')
datetime.datetime(2018, 7, 17, 3, 0)
>>> timefhuman('7/17/18 3:00 p.m.')
datetime.datetime(2018, 7, 17, 15, 0)
```

# Why
```python
>>> config = tfhConfig(infer_datetimes=False)

[`dateparser`](https://github.com/scrapinghub/dateparser) is the current king of human-readable-date parsing--it supports most common structured dates by trying each one sequentially ([see code](https://github.com/scrapinghub/dateparser/blob/a01a4d2071a8f1d4b368543e5e09cde5eb880799/dateparser/date.py#L220)). However, this isn't optimal for understanding natural language:
>>> timefhuman('3 PM', config=config)
datetime.time(15, 0)

```
>>> import dateparser
>>> dateparser.parse("7/7/18 3 p.m.") # yay!
datetime.datetime(2018, 7, 7, 15, 0)
>>> dateparser.parse("7/7/18 at 3") # :(
>>> dateparser.parse("7/17 12 PM") # yay!
datetime.datetime(2018, 7, 7, 12, 0)
>>> dateparser.parse("7/17/18 noon") # :(
>>> dateparser.parse("7/18 3-4 p.m.") # :((((( Parsed July 18 3-4 p.m. as July 3 4 p.m.
datetime.datetime(2018, 7, 3, 16, 0)
>>> timefhuman('12/18/18', config=config)
datetime.date(2018, 12, 18)
```

To remedy this, we can replace "noon" with "12 p.m.", "next Monday" with "7/17/18", "Tu" with "Tuesday" etc. and pass the cleaned string to `dateparser`. However, consider the number of ways we can say "next Monday at 12 p.m.". Ignoring synonyms, we have a number of different grammars to express this:
Here is the full set of supported configuration options:

- 12 p.m. on Monday
- first Monday of August 12 p.m.
- next week Monday noon
```python
class tfhConfig:
direction: Direction = Direction.next # next/previous/none
infer_datetimes: bool = True # infer missing information using current datetime
now: datetime = datetime.now() # current datetime, only used if infer_datetimes is True
```

This issue compounds when you consider listing noontimes for several different days.
## Development

- first half of next week at noon
- 12 p.m. on Monday Tuesday or Wednesday
- early next week midday
To run tests and simultaneously generate a coverage report, use the following commands:

The permutations--even the possible *combinations*--are endless. Instead of enumerating each permutation, `timefhuman` extracts tokens: "anytime" modifies the type from 'date' to 'range', "next week" shifts the range by 7 days, "p.m." means the string right before is a time or a time range etc. Each set of tokens is then combined to produce datetimes, datetime ranges, or datetime lists. This then allows `timefhuman` to handle any permutation of these modifiers. Said another way: `timefhuman` aims to parse *unstructured* dates, written in natural language.
```shell
$ py.test --cov
$ coverage html
$ open htmlcov/index.html
```
2 changes: 0 additions & 2 deletions pytest.ini

This file was deleted.

10 changes: 6 additions & 4 deletions setup.py
@@ -2,8 +2,8 @@
from setuptools import setup
from setuptools.command.test import test as TestCommand

tests_require = ['pytest==3.7.1', 'pytest-cov==2.5.1', 'coverage==4.5.1', 'coveralls==1.3.0']
install_requires = []
tests_require = ['pytest==8.3.4', 'pytest-cov==6.0.0', 'coverage==7.6.10', 'coveralls==4.0.1']
install_requires = ['lark==1.2.2']


class PyTest(TestCommand):
@@ -24,15 +24,17 @@ def run_tests(self):
sys.exit(errno)


VERSION = '0.0.5'
VERSION = '0.1.0'

setup(
name="timefhuman",
version=VERSION,
author="Alvin Wan",
author_email='hi@alvinwan.com',
description=("Convert natural language date-like string to Python objects"),
license="BSD",
long_description=open('README.md', 'r', encoding='utf-8').read(),
long_description_content_type='text/markdown',
license="Apache 2.0",
url="https://github.com/alvinwan/timefhuman",
packages=['timefhuman'],
tests_require=tests_require,
3 changes: 0 additions & 3 deletions test.py

This file was deleted.
