Sidekick Goes to Washington: How the Department of Commerce Can Make Their Data AI-Ready
The Department of Commerce has issued a request for information (RFI), the AI and Open Government Data Assets Request for Information, seeking expert input on making data AI-ready. At mySidewalk, we have extensive experience handling and transforming large datasets and making them available to agentic AI (hello, Sidekick!), and we were specifically asked for feedback. Here’s our take on how the DoC can improve their data's AI-readiness.
Keep It Simple
To ensure AI systems can easily understand and use the data, it's crucial to employ consistent, accessible terminology. Avoid complex jargon and keep terms internally consistent. Opt for popular, straightforward data formats like CSV and Parquet over more intricate ones like Protocol Buffers, Avro, or schema.org markup. While the RFI mentions knowledge graphs, these are less effective as a distribution format. Simple tabular formats are more practical, as evidenced by the popularity of Hugging Face’s Datasets hub and library. Additionally, store metadata alongside the data or, better yet, integrated with it to simplify access and improve usability.
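To illustrate storing metadata alongside the data, here is a minimal Python sketch that uses only the standard library. It writes a plain CSV plus a JSON "sidecar" file describing each column, then uses that metadata to restore column types when reading the data back. The `.metadata.json` naming, the metadata structure, the column names, and the tract values are all hypothetical placeholders, not a DoC convention. For the "integrated" option, Parquet can carry key-value metadata inside the file itself.

```python
import csv
import json
import tempfile
from pathlib import Path

# Hypothetical rows; identifiers and values are placeholders, not real DoC data.
ROWS = [
    {"geo_id": "tract_a", "geo_name": "Tract A", "population": 1200},
    {"geo_id": "tract_b", "geo_name": "Tract B", "population": 3400},
]

# Column-level metadata kept next to the data (assumed structure, not a standard).
METADATA = {
    "title": "Example population table (hypothetical)",
    "columns": {
        "geo_id": {"type": "string", "description": "Human-readable geography identifier"},
        "geo_name": {"type": "string", "description": "Geography display name"},
        "population": {"type": "integer", "unit": "people", "description": "Total residents"},
    },
}


def sidecar_path(csv_path: Path) -> Path:
    # e.g. population.csv -> population.metadata.json
    return csv_path.with_name(csv_path.stem + ".metadata.json")


def write_with_sidecar(rows, metadata, csv_path: Path) -> None:
    fieldnames = list(metadata["columns"])  # column order comes from the metadata
    with csv_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    sidecar_path(csv_path).write_text(json.dumps(metadata, indent=2), encoding="utf-8")


def read_with_sidecar(csv_path: Path):
    meta = json.loads(sidecar_path(csv_path).read_text(encoding="utf-8"))
    with csv_path.open(newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    # CSV stores everything as text; the metadata says which columns are integers.
    for row in rows:
        for col, info in meta["columns"].items():
            if info["type"] == "integer":
                row[col] = int(row[col])
    return rows, meta


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "population.csv"
    write_with_sidecar(ROWS, METADATA, path)
    rows, meta = read_with_sidecar(path)

print(rows[0])
```

The data file stays a simple table that any tool can open, while the sidecar travels with it and tells both people and programs what each column means.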
Don't Spare the Metadata
Invest in detailed metadata documentation; it saves significant time and resources in the long run. Use relational technologies such as knowledge graphs, relational databases (RDBMS), and schemas to centralize and maintain a robust ontology, then convert it into simpler formats for distribution. Ensure all identifiers are human-readable and linked to metadata that is both human- and machine-readable, so the data accommodates a wide range of use cases and technologies.
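A minimal sketch of "centralize in a relational store, then convert to something simpler for distribution," using Python's built-in `sqlite3` module as a stand-in for an RDBMS. The table layout, the snake_case identifiers such as `median_household_income`, and the descriptions are hypothetical; the point is that each identifier is readable on its own and joins to a label, description, and unit that people and programs can both consume.

```python
import csv
import io
import sqlite3

SCHEMA = """
CREATE TABLE topic (
    id    TEXT PRIMARY KEY,   -- human-readable identifier
    label TEXT NOT NULL
);
CREATE TABLE variable (
    id          TEXT PRIMARY KEY,   -- human-readable identifier
    label       TEXT NOT NULL,
    description TEXT NOT NULL,
    unit        TEXT NOT NULL,
    topic_id    TEXT NOT NULL REFERENCES topic(id)
);
"""


def build_ontology(conn: sqlite3.Connection) -> None:
    """Load a tiny, hypothetical ontology into a relational store."""
    conn.execute("PRAGMA foreign_keys = ON")  # enforce topic_id -> topic.id links
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO topic VALUES (?, ?)", [
        ("income", "Income"),
        ("population", "Population"),
    ])
    conn.executemany("INSERT INTO variable VALUES (?, ?, ?, ?, ?)", [
        ("median_household_income", "Median household income",
         "Median income of all households in the geography", "US dollars", "income"),
        ("total_population", "Total population",
         "Count of all residents in the geography", "people", "population"),
    ])
    conn.commit()


def export_data_dictionary(conn: sqlite3.Connection) -> str:
    """Flatten the normalized tables into one CSV data dictionary for distribution."""
    rows = conn.execute("""
        SELECT v.id, v.label, v.description, v.unit, t.label
        FROM variable AS v JOIN topic AS t ON v.topic_id = t.id
        ORDER BY v.id
    """).fetchall()
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["id", "label", "description", "unit", "topic"])
    writer.writerows(rows)
    return buf.getvalue()


conn = sqlite3.connect(":memory:")
build_ontology(conn)
dictionary_csv = export_data_dictionary(conn)
print(dictionary_csv)
```

Keeping the ontology normalized (topics in their own table) makes it easy to maintain in one place, while the exported CSV gives downstream users and AI systems a single flat file that needs no joins.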
Make Evals Easy
Recognize that AI technology is still new and that many implementers face a learning curve, and provide resources for evaluating the quality and accuracy of AI systems. Specifically, the Department of Commerce should offer a comprehensive, accurate, and dynamic evaluation suite that lets users test an AI system’s knowledge and accuracy against the DoC’s data catalog, helping ensure high-quality, reliable AI applications.
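One way to picture such an evaluation suite: question-and-answer pairs whose expected values come straight from the published data, scored against whatever AI system a user wants to test. The sketch below is a hypothetical Python harness, not an existing DoC tool; `EvalCase`, `run_eval`, the toy system, the 1% tolerance, and all tract values are illustrative assumptions. Numeric answers are pulled out of free text and counted as correct if they fall within a relative tolerance of the expected value.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional

# Matches numbers such as "1200", "1,200", "-3.5".
NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


@dataclass
class EvalCase:
    question: str
    expected: float         # ground-truth value taken from the published dataset
    rel_tol: float = 0.01   # hypothetical tolerance: 1% relative error


def first_number(text: str) -> Optional[float]:
    """Extract the first number in a free-text answer, dropping thousands separators."""
    match = NUMBER.search(text)
    return float(match.group().replace(",", "")) if match else None


def run_eval(cases: List[EvalCase], ask: Callable[[str], str]) -> float:
    """Return the fraction of cases answered within tolerance."""
    correct = 0
    for case in cases:
        value = first_number(ask(case.question))
        if value is not None and abs(value - case.expected) <= case.rel_tol * abs(case.expected):
            correct += 1
    return correct / len(cases)


def toy_system(question: str) -> str:
    # Hypothetical stand-in for the AI system under test; swap in a real model call.
    canned = {
        "What is the total population of Tract A?": "Tract A has about 1,200 residents.",
        "What is the total population of Tract B?": "I believe it is 3,000.",
    }
    return canned.get(question, "I don't know.")


CASES = [
    EvalCase("What is the total population of Tract A?", 1200),
    EvalCase("What is the total population of Tract B?", 3400),
    EvalCase("What is the total population of Tract C?", 560),
]

score = run_eval(CASES, toy_system)
print(f"accuracy: {score:.2f}")
```

Here the toy system gets one of three cases right, so the score is about 0.33. Because the expected values are derived from the catalog, cases like these could in principle be regenerated whenever the underlying data is updated, which is one way to keep a suite "dynamic."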
In summary, to make their data AI-ready, the Department of Commerce should focus on simplicity, thorough metadata documentation, and ease of evaluation. By following these guidelines, the DoC can enhance the accessibility and usability of their data, benefiting both current and future users. At mySidewalk, we’re committed to supporting communities using the latest technology and data for good. We believe these recommendations will significantly improve the AI-readiness of government data, paving the way for more effective and innovative applications.