Overview
Confirms that text is in English, to avoid spending LLM tokens on non-English text.
The English Classifier uses a mix of technologies to determine whether the provided text is in English. It is optimized for both long-form and short-form content, especially X, Bluesky, and Truth Social posts. Its purpose is to quickly filter out non-English content so that tokens are not wasted on LLM calls.
The code currently has a TypeScript implementation that runs franc language detection on cleaned text, behind an extensible REST API that allows new tools to be added later without changing the interface. A Rust implementation that uses lingua and whatlang will follow once the Torus Rust SDK is available.
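The clean-then-detect flow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the names `cleanText`, `isEnglish`, and `Detector` are hypothetical, and the cleaning rules (dropping URLs, @mentions, and #hashtags) are an assumption about what "cleaned text" means here. The detector is passed in as a function so the sketch does not depend on any particular library.

```typescript
// Hypothetical detector signature: takes text and returns an ISO 639-3 code,
// e.g. "eng" for English or "und" when the language is undetermined.
type Detector = (text: string) => string;

// Assumed cleaning rules: strip tokens that carry little language signal
// in social posts, then collapse the leftover whitespace.
function cleanText(raw: string): string {
  return raw
    .replace(/https?:\/\/\S+/g, " ") // URLs
    .replace(/[@#]\w+/g, " ")        // @mentions and #hashtags
    .replace(/\s+/g, " ")            // runs of whitespace
    .trim();
}

// Returns true only when the detector labels the cleaned text as English.
function isEnglish(raw: string, detect: Detector): boolean {
  const cleaned = cleanText(raw);
  if (cleaned.length === 0) {
    return false; // nothing left to classify after cleaning
  }
  return detect(cleaned) === "eng";
}
```

With franc installed, its named export could be passed in directly, for example `import { franc } from "franc"` followed by `isEnglish(post, franc)`. franc returns `und` for input it cannot classify, including very short strings, so this sketch would treat such posts as non-English.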
Swagger docs are served at the base URL of the REST API: https://real-trump.fun/language-detection/
Source code and up-to-date documentation are available at: https://github.com/ad0ll/torus-english-detection