In a recent blog post, we talked about how the Lightspeed Systems Web Filter’s URL database works, including how content is categorized. We published it because Lightspeed believes in total transparency. Transparency is also why our Web Filter’s URL database is available to the public. (This database also categorizes websites in the new Relay for Chrome.) With the Dynamic Database Lookup tool, anyone can search a URL, see how it is categorized and why, and submit the website for review and recategorization if they believe it has been miscategorized.
We have had some questions recently about our website categorizations, what they mean, and how they’re made. Here are five interesting things to know about our URL database and Web Filter.
1. Our database categorizes sites based on content. Lightspeed Systems has a set of consistent categorization guidelines that are applied to the 60 million websites in our database. Our human and robot teams continually monitor these websites, checking each other’s work and sorting content into 31 categories. (Our earlier blog post explains in more detail how these sites are monitored and reviewed.)
Online content has evolved considerably since we began filtering web content for K-12 schools 17 years ago. Over time, we have updated and expanded our default categories, and we will continue to do so.
The Dynamic Database Lookup tool allows anyone to search URLs and see how we categorize them by default.
2. Website content is automatically categorized based on a variety of factors. Alongside our human reviewers, our categorization engine software categorizes websites automatically, applying a similarly consistent set of standards for evaluating content. The engine analyzes websites based on factors including keywords, URL strings, common phrases, top-level and second-level domains, files found, outgoing links and content labels.
Outgoing links carry particular weight in website categorization. Sites that link to pornographic content will typically be categorized as porn by association. Similarly, a domain that links to multiple adult sites will typically be categorized as adult. Outside links to inappropriate content are strong indicators that a site isn’t child-friendly. This is just one reason why a site that looks relatively tame on its face may be categorized as adult.
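As a rough illustration of categorization by association, here is a simplified sketch in Python. It is not the Web Filter’s actual engine: the function name `categorize_by_association`, the `known_categories` lookup, the category labels and the link thresholds are all hypothetical, chosen only to mirror the two rules described above (any link to porn taints the linking site; links to multiple adult sites make it adult). The real engine weighs many more factors, and its weights are not described in this post.

```python
from collections import Counter

# Hypothetical thresholds for illustration only; not the real engine's values.
PORN_LINK_THRESHOLD = 1   # a single link to porn is enough to taint a site
ADULT_LINK_THRESHOLD = 2  # "multiple" links to adult sites


def categorize_by_association(
    outgoing_links: list[str],
    known_categories: dict[str, str],
    content_category: str,
) -> str:
    """Let a site's outgoing links override the category its own content suggests.

    outgoing_links: domains the site links to.
    known_categories: hypothetical lookup of already-categorized domains.
    content_category: category inferred from keywords, URL strings, etc.
    """
    # Count each linked domain once, by its known category.
    counts = Counter(
        known_categories[domain]
        for domain in set(outgoing_links)
        if domain in known_categories
    )
    if counts["porn"] >= PORN_LINK_THRESHOLD:
        return "porn"
    if counts["adult"] >= ADULT_LINK_THRESHOLD:
        return "adult"
    return content_category


if __name__ == "__main__":
    known = {"a.example": "adult", "b.example": "adult", "news.example": "news"}
    # A tame-looking site that links to two adult domains ends up as adult.
    print(categorize_by_association(["a.example", "b.example"], known, "education"))
```

Counting distinct domains rather than raw links keeps one adult site linked many times from counting as “multiple adult sites.”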
Speaking of the adult category...
3. “Adult” doesn’t mean “porn.” Some users have been alarmed to see their favorite news sites, social networks and content-sharing services categorized as adult. These are typically websites that host a wide variety of content, including, for instance, images that contain nudity or adult content about sexuality. Think of “adult” as “mature.” Adult sites may have plenty of interesting, informative content that’s appropriate for students, but also mature content that isn’t, especially for younger students.
For instance, image curation sites typically have acceptable use policies that forbid sharing graphic imagery, but may not strictly enforce these policies. This is precisely why we categorize Pinterest as adult.
Aggregators, in addition to sharing straightforward news content, may feature images containing nudity or offer frank articles about adult subjects like sex. If these sites don’t have a clear taxonomy that lets us categorize their subdomains individually, we categorize them as adult by default. Administrators can recategorize these types of sites if they want them available to their students.
4. Don’t like our categories? Feel free to change them! Our Web Filter’s default categories were built to make digital learning safe and CIPA compliance easy. However, we recognize that there is no one-size-fits-all approach to safe online learning. That’s why Lightspeed Systems doesn’t block the web; we categorize it. Web Filter’s default categories are just that: out-of-the-box settings that any filter administrator can change with a few clicks. Administrators are free to recategorize online content locally, giving school IT departments the flexibility to make informed choices about what’s best for their schools.
There are only three categories of websites that cannot be unblocked or recategorized locally: offensive, illicit and extremism. Sites in these categories feature highly explicit content (e.g., pornography, mutilation) or encourage violence to advance an agenda or set of beliefs. We treat them as sealed categories because of overwhelming concerns about student safety.
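To make the relationship between local overrides and sealed categories concrete, here is a minimal Python sketch. It is a hypothetical model, not Lightspeed’s administration API: the class `LocalFilterPolicy`, its fields and method names, and the lowercase category labels are assumptions for illustration. It shows the two rules described above: administrators can recategorize and unblock most sites, but a site whose default category is sealed cannot be recategorized, and sealed categories stay blocked regardless of local settings.

```python
from dataclasses import dataclass, field

# The three sealed categories named in this post (labels are illustrative).
SEALED_CATEGORIES = frozenset({"offensive", "illicit", "extremism"})


@dataclass
class LocalFilterPolicy:
    """Hypothetical model of a school's local filter settings."""

    defaults: dict[str, str]       # domain -> default category from the database
    blocked_categories: set[str]   # categories this school chooses to block
    overrides: dict[str, str] = field(default_factory=dict)  # local recategorizations

    def recategorize(self, domain: str, category: str) -> None:
        # Sites whose default category is sealed cannot be recategorized locally.
        if self.defaults.get(domain) in SEALED_CATEGORIES:
            raise PermissionError(f"{domain} is in a sealed category")
        self.overrides[domain] = category

    def category_for(self, domain: str) -> str:
        # A local override wins over the default category.
        return self.overrides.get(domain, self.defaults.get(domain, "uncategorized"))

    def is_blocked(self, domain: str) -> bool:
        category = self.category_for(domain)
        # Sealed categories stay blocked no matter what the local policy says.
        return category in SEALED_CATEGORIES or category in self.blocked_categories


if __name__ == "__main__":
    policy = LocalFilterPolicy(
        defaults={"pinterest.com": "adult", "bad.example": "offensive"},
        blocked_categories={"adult"},
    )
    policy.recategorize("pinterest.com", "education")  # allowed: adult is not sealed
    print(policy.is_blocked("pinterest.com"))  # False after the local override
    print(policy.is_blocked("bad.example"))    # True: sealed, cannot be overridden
```

Checking the sealed set inside `is_blocked` as well as in `recategorize` means that even removing a sealed category from `blocked_categories` would not unblock it.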
5. Always feel free to tell us if you disagree. If you believe a website has been categorized incorrectly, we’re happy to hear from you! Submit any site for review through the Dynamic Database Lookup tool; if you include your email address and the reason for your request, your submission will go to the top of our review queue. We will also contact you when we have completed the review.