High Level Plan for Initial Development and Rollout
We envision the following phases for the design and development of EXTRA. The phases are listed in their likely order of execution, but may well overlap.
- Hire a developer experienced in natural language processing
- Hire a linguist for writing the rules and testing the EXTRA platform
- Gather Steering Committee of news publishers from IPTC and beyond
Evaluate existing open source projects and frameworks
- Survey other open source efforts, including GATE, UIMA, NLTK, OpenNLP and SRILM, to see whether any could accelerate the development of EXTRA
Design and develop technical approach
- Design high-level technical approach, select implementation technologies
- Design EXTRA API for maintaining rule sets and classifying documents
- Decide which two languages will be supported by the initial prototype
- Assemble and annotate two test corpora, one for each language, with the desired taxonomy
- Design the rule language and rule sets for applying the taxonomy to the two corpora
- Develop a minimum viable rules engine
- Configure source code management for EXTRA on GitHub
- Publish documentation: project overview and contribution guidelines
- Draft preliminary list of requirements and features
- Agree on and publish license for EXTRA
- Secure Twitter handles, Launchpad account and domain names
- Set up an EXTRA email list and an EXTRA Slack
Develop EXTRA software and rule sets
- Publicize releases of EXTRA platform and rule sets
- Solicit, prioritize and implement features and bug fixes for EXTRA
- Write guidebook on how to integrate the EXTRA platform
- Write guidebook on how to develop and test EXTRA rule sets
First non-core open source contributor
First production deployment