Feature Request: Pass custom values from Matcher pattern definitions to matched tokens #13520

Open
apodgorny opened this issue Jun 5, 2024 Discussed in #13519 · 0 comments

Discussed in #13519

Originally posted by apodgorny June 5, 2024
Consider a case where I need to tag FAX and TEL separately.

Tel: 24234-3433-3322
Fax: 24234-3433-3323

I currently have two options for NER with Matcher:

  1. Match [{'LOWER': 'tel'}, {'ORTH': ':'}, {PATTERN_TO_MATCH_PHONE}]
  2. Match [{PATTERN_TO_MATCH_PHONE}]

Neither option accomplishes the goal:

  1. Includes unnecessary extra tokens in the match (tokens that may be needed for additional, unrelated tagging – I have a case to show as well)
  2. Does not distinguish between FAX and TEL

SOLUTION:

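from spacy.tokens import Token

# Proposed syntax: '_' values would be written to the matched tokens
Token.set_extension('exclude', default=False, force=True)
patterns = [
    {'LOWER': 'tel', '_': {'exclude': True}},
    {'ORTH': ':', '_': {'exclude': True}},
    {PATTERN_TO_MATCH_PHONE}
]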

These custom values should be set on the tokens matched by the call matches = matcher(doc), so that tokens can be distinguished by the pattern that matched them, e.g. doc[n]._.exclude == True
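Until something like this exists, a rough approximation with the current Matcher API might look like the sketch below. It assumes spaCy v3 (where Matcher.add takes a list of patterns), builds the Doc by hand to avoid depending on tokenizer behavior, and uses one match ID per kind of number. The extension name phone_kind and the PHONE regex are hypothetical stand-ins, the latter for PATTERN_TO_MATCH_PHONE; neither comes from spaCy itself.

```python
import spacy
from spacy.matcher import Matcher
from spacy.tokens import Doc, Token

nlp = spacy.blank("en")

# Hypothetical extension: records which pattern tagged the token.
Token.set_extension("phone_kind", default=None, force=True)

# Assumed stand-in for PATTERN_TO_MATCH_PHONE: digit groups joined by hyphens.
PHONE = {"TEXT": {"REGEX": r"^\d+(-\d+)+$"}}

matcher = Matcher(nlp.vocab)
matcher.add("TEL", [[{"LOWER": "tel"}, {"ORTH": ":"}, PHONE]])
matcher.add("FAX", [[{"LOWER": "fax"}, {"ORTH": ":"}, PHONE]])

# Pre-tokenized input, so the example does not rely on tokenizer rules.
words = ["Tel", ":", "24234-3433-3322", "Fax", ":", "24234-3433-3323"]
doc = Doc(nlp.vocab, words=words)

for match_id, start, end in matcher(doc):
    label = nlp.vocab.strings[match_id]  # "TEL" or "FAX"
    # Only the last token of each match is the number itself;
    # the context tokens ("Tel", ":") are left untagged.
    doc[end - 1]._.phone_kind = label

tagged = [(t.text, t._.phone_kind) for t in doc if t._.phone_kind]
print(tagged)
```

This keeps the context tokens out of the tagged result, but the bookkeeping (which token in the match is the payload) lives in the loop rather than in the pattern, which is exactly what the proposal would move into the pattern definition.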

This would cover multiple cases that were previously hard or impossible to solve with the spaCy Matcher:

  1. Matching by preceding tokens
  2. Matching by following tokens
  3. Matching a complex pattern of tokens that appear together in a constellation, in order to tag them separately.
  4. Cascading matches, where you tag items and then match again, relying on previously tagged entities without overwriting them.
  5. Other potential cases that I did not think of but others could invent, and that would benefit from being able to pass data this way.

Thank you for the awesome library – this addition would make it awesome-awesome :)

P.S. Extra credit :)
If we could do matches[n].tokens it would be triple awesome
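For reference, something close to this may already be reachable in spaCy v3 through the as_spans argument of the Matcher call, which returns Span objects instead of (match_id, start, end) tuples. A minimal sketch, again with a hand-built Doc and a hypothetical PHONE regex standing in for PATTERN_TO_MATCH_PHONE:

```python
import spacy
from spacy.matcher import Matcher
from spacy.tokens import Doc

nlp = spacy.blank("en")

# Assumed stand-in for PATTERN_TO_MATCH_PHONE.
PHONE = {"TEXT": {"REGEX": r"^\d+(-\d+)+$"}}

matcher = Matcher(nlp.vocab)
matcher.add("TEL", [[{"LOWER": "tel"}, {"ORTH": ":"}, PHONE]])

doc = Doc(nlp.vocab, words=["Tel", ":", "24234-3433-3322"])

# as_spans=True yields Span objects; each span's label_ is the match ID,
# and iterating a span gives its tokens.
spans = matcher(doc, as_spans=True)
matched = [(span.label_, [token.text for token in span]) for span in spans]
print(matched)
```

The spans give direct access to the matched tokens, though they still do not say which pattern element matched which token.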

@apodgorny apodgorny changed the title Feature Request: Add PATTERN_ID option in Matcher pattern definitions Feature Request: Pass custom values from Matcher to matched tokens Jun 9, 2024
@apodgorny apodgorny changed the title Feature Request: Pass custom values from Matcher to matched tokens Feature Request: Pass custom values from Matcher pattern definitions to matched tokens Jun 9, 2024