Commons talk:Categories
Need help with categories? Try the Commons:Help desk. Questions about how to handle a category issue? Try Commons:Help desk or (if the issue has wider implications) Commons:Village pump.
Sortkey recommendations
Regarding this:
- The special sortkey τ (lowercase Greek letter Tau) is used to sort templates at the end of the related Commons-category, see for example Category:Transport templates sorted in Category:Transport. (Sorting in Commons is not case sensitive so only uppercase Τ (Tau) is shown.)
I wonder whether this is a good recommendation. I tried it for a few categories and found it quite confusing because the uppercase Tau is, in practice, visually indistinguishable from the Latin letter T. I see a lot of template categories with a sortkey of ~ and would actually consider that a very good idea because it's sorted after Z, it's visually recognizable, and it's a character you can probably find on far more keyboards than the Tau.
On a side note, I would expect using three different dashes as sortkeys to create a lot of confusion, and many people will find it hard to understand the difference between them. So I would also suggest removing any mention of the em dash and the en dash here.
Thanks -- Reinhard Müller (talk) 15:35, 16 February 2024 (UTC)
- agreed. i think common practice is using ~ for anything commons-, or more broadly, wikimedia-related stuff.
- the dashes were added by https://commons.wikimedia.org/w/index.php?title=Commons%3ACategories&diff=prev&oldid=703824439 . RZuo (talk) 15:40, 16 February 2024 (UTC)
- @W like wiki: any input from your side? --Reinhard Müller (talk) 15:53, 22 February 2024 (UTC)
- Hello @Reinhard Müller: Yes, I agree. I can't remember on which Commons page I read this recommendation with the Greek Tau before including it here, but I always had the same problems: I needed to copy and paste it from here, and I even made a copy of it on my user page. I was also not so happy about it looking the same as a normal "T", but when I wrote this chapter I could not create a new rule. If we do this now, I appreciate it!
- Same with the dashes. Maybe some people have an idea whether they are useful, but here too I would appreciate a change, a deletion in this case. Best Regards -- W like wiki Please ping me! • Postive1 • Postive2 16:25, 22 February 2024 (UTC)
- Thanks to everybody who commented! I updated the page and hope that I didn't mess up anything regarding the translation. --Reinhard Müller (talk) 18:15, 22 February 2024 (UTC)
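As a rough illustration of the ordering discussed in this thread, the sketch below models category sorting as a case-insensitive comparison by Unicode code point. That model is an assumption made for illustration (the collation Commons actually uses is a server-side setting), and `commons_sort_key` is a hypothetical helper, not a MediaWiki function.

```python
def commons_sort_key(sortkey: str) -> str:
    """Assumed model: fold to upper case, then compare by code point."""
    return sortkey.upper()


# Sortkey prefixes mentioned in the discussions on this page.
prefixes = ["~", "τ", "T", "Z", "a", " ", "*", "+", "#", "?"]

# Order in which the prefixes would appear under the assumed model.
ordered = sorted(prefixes, key=commons_sort_key)

# Heading shown for each prefix (upper-cased, as noted for Tau above).
headings = [commons_sort_key(p) for p in ordered]

print(ordered)
print(headings)
```

Under this model ~ lands after Z and stays visually distinct, whereas τ is displayed as the uppercase Τ, which looks like the Latin T even though it sorts at the very end.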
Controversial categories
Hi, I'd like to get feedback regarding categories that can be seen as controversial. On en-wiki, there is a rule that
Categorizations should generally be uncontroversial; if the category's topic is likely to spark controversy, then a list article (which can be annotated and referenced) is probably more appropriate.
As far as I can see there is no such policy on Wikicommons. Is there some other policy which deals with this issue? What is the community consensus?
To provide a concrete example, this edit added back the category Territories under occupation by Russia to the category Abkhazia. This is controversial, since while the overwhelming majority of countries consider Abkhazia to be a part of Georgia, only a minority explicitly said that it's occupied by Russia (see Wikipedia:Russian-occupied_territories_in_Georgia#International_position).
I believe that this category isn't helpful since the category name cannot explain all these nuances. It would be better to create a page/gallery with the related media. I'm pinging User:Laurel_Lodged who has added this category. Alaexis (talk) 09:21, 28 May 2024 (UTC)
- I agree that some such policy is needed in Commons. I agree that "Categorizations should generally be uncontroversial". But one editor's uncontroversial is another editor's hot potato. Unlike Wiki, Commons does not lend itself to list article creation. So the likely solution is a case-by-case evaluation and an agreement to adhere to community consensus. By the way, regarding Abkhazia, Wiki itself says, "On 23 October 2008, the Parliament of Georgia declared Abkhazia a Russian-occupied territory, a position shared by most United Nations member states."[1] So it's not just me. Laurel Lodged (talk) 15:05, 28 May 2024 (UTC)
- Ehh, where do you see that in the source? I've tagged it on en-wiki. If I'm missing something and the source does say it, then indeed it wouldn't be controversial and I would not object to placing Abkhazia in this category. Alaexis (talk) 20:23, 28 May 2024 (UTC)
- "Georgia asserted that the territories of South Ossetia and Abkhazia, including the upper Kodori Valley, were occupied by Russian forces. On 23 October, the Parliament of Georgia adopted a law declaring Abkhazia and South Ossetia “occupied territories” and the Russian Federation a “military occupier.” This claim was reiterated […] In describing the “current occupation” Georgia also stated: “the western part of the former ‘buffer zone’ (the village of Perevi in the Sachkhere District) remains under Russian occupation." If Wiki is making claims not supported by sources, then Wiki is the place to make those edits. Laurel Lodged (talk) 10:30, 30 May 2024 (UTC)
- Yes, absolutely. But there is a difference between *Georgia* considering it an occupied territory and "most UN members" sharing this position. I never argued with the former. Alaexis (talk) 08:42, 1 June 2024 (UTC)
- Question @Alaexis, @Laurel Lodged It is hard to tell if this is really a question about general policy or if it is really a discussion about a particular case. In the case of the latter, this should really be had as a CfD over Category:Abkhazia. If there is a change to policy that you think would help improve things, that should be discussed here, and you can certainly refer to this case as reference. Josh (talk) 20:15, 18 July 2024 (UTC)
References
- ↑ Georgia/Russia, Independent International Fact-Finding Mission on the Conflict in South Ossetia | How does law protect in war? - Online casebook. casebook.icrc.org. Archived from the original on 4 October 2023. Retrieved on 5 March 2024.
FYI: Moved historical page, redirected that target to this page
"Commons:Naming categories" now redirects to this commons: ns page. It was problematic for the number of links (internal and from WD), and the confusion being caused with the pre-existing arrangement.
The page that was at that space is now at Commons:Naming categories (historical). The number of links to its detail are minimal, and it should not be problematic for functional management of this site having it moved. — billinghurst sDrewth 00:02, 31 May 2024 (UTC)
- @Billinghurst Thank you for doing that, it is a big help to avoid confusion for folks. Josh (talk) 20:02, 18 July 2024 (UTC)
Sortkey recommendations
a question that bounced around in my mind a few times is what are the purposes of each of the symbolic sortkeys? the most common ones I see are '(space)', '*', '+' and '~'. what are their roles?
- So far, that's not clearly defined, and different people use completely different sortkey prefixes for the same purpose.
- I have collected a few ideas about what could be seen as "best practice". I don't know whether we actually want to come up with a policy or at least a recommendation, but if, then this list might serve as a base for that. Thanks --Reinhard Müller (talk) 07:02, 9 July 2024 (UTC)
another thing while discussing sorting is a common thing that I see in category pages with accent marks in the titles: they use {{DEFAULTSORT:}} to remove the accents. simple example is 'café' which is turned into {{DEFAULTSORT:cafe}}. if this is something that should be encouraged in the wiki, please feel free to add it to the policy! Juwan (talk) 10:11, 8 July 2024 (UTC)
- @Reinhard Müller, thank you for sharing some ideas on ways to use a variety of sort keys to sub-sort by type of sub-topic. I am often frustrated by the willy-nilly use of special characters by users, especially to '+1' their preferred topics to the top of the list. I readily use a few established special characters for sorting non-topical categories, such as a space for index categories, # for numbers, ? for 'unidentified' (maintenance) categories, and ~ for some other types of maintenance categories. For topical non-number categories however, I do not see the attraction of using special character sorting, as it requires a few things at a minimum:
- The user must already be familiar with the sort key special character system.
- The user must parse the topic they are seeking, in order to figure out which special character they should look under.
- The system has to be consistently-enough employed that once a user has passed hurdles 1 and 2, they can have some confidence they will actually find what they are looking for.
- Currently, none of these are true for a lot of the special characters, and so I generally resist using them for topical categories, and while I think your list is well thought-out, I don't think in the end that it provides any real additional value over using the alphabetical sort system that categories are fundamentally based on.
- As for using sort-keys for normal alphabetical sorting (e.g. using sortkey 'buildings' to sort Category:Science buildings in Category:Science), that is extremely useful and I use it a lot. I do think some additional guidelines right here on COM:CAT to help users quickly grasp common practices is a good idea. Josh (talk) 19:56, 18 July 2024 (UTC)
- it is certainly a very nice scheme. the only issue in my case is what you've raised. is this perhaps going too far? as in, is it too complex for someone to understand, especially without necessarily having to read the policy? Juwan (talk) 20:21, 18 July 2024 (UTC)
- @JnpoJuwan I always try and keep accessibility front and center in my mind when considering categorization. For someone like me who's been on the project since its inception (or close enough anyway), I am able to take the time to learn and apply various elegant schemes for organization, but for especially newer or irregular users, that really isn't practical. Even as a veteran, I am routinely frustrated when I look for something and don't find it (such as buildings under b) but instead have to then figure if a) the sub even exists, and b) what special character did someone come up to sort it under. Having a standardized key list and implementing it consistently might help that for me since I spend a lot of time in categories and can learn and keep fresh that knowledge (I even kind of like the scheme), but I still don't think it helps the bulk of less-regular users just looking to sort their contributions or find images for their projects. For this reason, I think it falls down on the accessibility question. Josh (talk) 20:59, 18 July 2024 (UTC)
- @JnpoJuwan As far as accent marks (or their suppression) in sort keys is concerned, I have seen some discussion on whether the current sort algorithm handles accents and other diacritics as it should. It is certainly not consistent about how search handles them. I don't know if we really should be suppressing them though, and I generally don't in the few cases I've had them to worry about. My native language doesn't use diacritics (except for borrow words) so I probably don't have the best intuitive feel on which way to go on this question. Josh (talk) 20:01, 18 July 2024 (UTC)
- to give you some perspective, in my native language Portuguese at least, we tend to ignore accent marks when sorting, so an algorithm would sort it like so: aa áb ac. I haven't seen how other languages manage their sorting schema (speakers of, for example, Spanish, German, Swedish would probably want diacritics kept), I need more opinions on that side. Juwan (talk) 20:18, 18 July 2024 (UTC)
- @JnpoJuwan Thank you for that insight, it is always fascinating how different languages have such different perspectives on the world. As a mono-lingual project with a multi-lingual audience, that remains a big challenge for Commons to grapple with. Josh (talk) 21:01, 18 July 2024 (UTC)
- Spanish considers ñ a distinct letter between n and o; accented vowels are treated as if the accent weren't there, except if words are otherwise identically spelled (e.g. que and qué), in which case the unaccented one comes first. Historically they treated ch as a single letter sorting after c and ll as a single letter sorting after l, but in the last few decades that has largely disappeared.
- German normally sorts ä, ö, and ü as ae, oe, and ue; the difference is considered a typographic convention. Ditto for ß and ss.
- Romanian sorts ș after s and ț after t and considers them distinct letters. Similarly a, ă, â are considered distinct letters (in that order), and the same for i and î.
- Those are the only languages other than English where I know enough to speak confidently. Inconveniently, as far as I can tell, mediawiki doesn't readily support correctly sorting ñ, ș, or ț, nor the three non-standard Romanian vowels. - Jmabel ! talk 21:43, 18 July 2024 (UTC)
- In my native language Hungarian, ö (and ő) is sorted after o (and ó), the same goes for u/ú and ü/ű (other diacritic differences – including those between o and ó, between ö and ő etc. – count only if there’s no other difference, but di- and trigraphs have their own places – if they really are di- or trigraphs, and not only those letters next to each other; a rule nearly impossible to create an algorithm for). This means that according to the Hungarian rules, Olaszliszka goes before Öcsöd – however, according to the German rules, it’s just the other way round, Oecsoed being before Olaszliszka! This demonstrates that a Commons-wide default cannot fulfill all languages’ needs, so I think the only sensible default other than the current one is completely disregarding accents, i.e. treating ñ and ň the same as n; ö, ő and ô the same as o; ș, ş and š the same as s, and so on. --Tacsipacsi (talk) 22:21, 18 July 2024 (UTC)
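For anyone who wants to try the "disregard accents" idea, here is a minimal sketch of how such sort keys could be derived. It is only an illustration: `strip_diacritics` and `german_style` are hypothetical helpers written for this comparison, and nothing here reflects how MediaWiki itself sorts.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Decompose characters (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def german_style(text: str) -> str:
    """Expand umlauts and ß as described for German above (ä -> ae, ß -> ss)."""
    table = str.maketrans({
        "ä": "ae", "ö": "oe", "ü": "ue",
        "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
        "ß": "ss",
    })
    return text.translate(table)


names = ["Olaszliszka", "Öcsöd"]

# Plain code-point comparison, with no sortkey at all.
raw = sorted(names)
# Accent-free keys, as in the proposed default.
accent_free = sorted(names, key=lambda n: strip_diacritics(n).casefold())
# German-style keys (Oecsoed vs. Olaszliszka).
german = sorted(names, key=lambda n: german_style(n).casefold())

print(raw, accent_free, german)
print("{{DEFAULTSORT:" + strip_diacritics("café") + "}}")
```

Note that NFD only separates marks that Unicode treats as combining characters, so letters such as ß or ł pass through `strip_diacritics` unchanged and would need an explicit mapping.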
- in short, is what {{DEFAULTSORT:}} tries to achieve a way to bypass MediaWiki's (current) technical restrictions? Juwan (talk) 22:53, 18 July 2024 (UTC)
- @JnpoJuwan: not really. Even with {{DEFAULTSORT:}} we have to live with most of those restrictions. But (besides the issues that started this discussion about handling incommensurate subcats separately) {{DEFAULTSORT:}} lets us
- sort people "last name first" (though increasingly this happens implicitly as a side effect of Wikidata Infoboxes)
- sort numbers sanely (by default they'd sort alphabetically) so we can force a sequence 1, 2, 3, ... 9, 10, 11, ... 20, ... instead of 1, 10, 11, ... 2, 20, ... 3, ... 9
- do things like in a language where every public square is going to begin with "Plaza", sort a list of public squares by the part of the name that actually matters, so not everything is just lumped under "P"
- As noted above, some other uses are more controversial. - Jmabel ! talk 05:04, 19 July 2024 (UTC)
- sorry, I didn't specify that I mean in this context. these are all perfectly fine uses that I am aware of and have used before. Juwan (talk) 23:46, 21 July 2024 (UTC)
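The three uses listed above can be sketched as small key-building helpers. This is a hedged illustration only: the function names, the padding width, and the sample names are all made up for the example, and the name handling is deliberately naive (it simply treats the last word as the family name).

```python
import re


def pad_numbers(title: str, width: int = 3) -> str:
    """Zero-pad every run of digits so string order matches numeric order."""
    return re.sub(r"\d+", lambda m: m.group().zfill(width), title)


def last_name_first(name: str) -> str:
    """Naive 'family, given' key: assumes the last word is the family name."""
    given, _, family = name.rpartition(" ")
    return f"{family}, {given}" if given else family


def drop_prefix(title: str, prefix: str = "Plaza ") -> str:
    """Sort by the distinctive part of a name instead of a shared prefix."""
    return title[len(prefix):] if title.startswith(prefix) else title


numbers = ["1", "10", "11", "2", "20", "3", "9"]
plain = sorted(numbers)                    # alphabetical string order
padded = sorted(numbers, key=pad_numbers)  # numeric order via padded keys

print(plain)
print(padded)
print(last_name_first("Ada Lovelace"))
print(drop_prefix("Plaza Mayor"))
```

Each helper returns the text that would go into a sortkey, for example {{DEFAULTSORT:Lovelace, Ada}} for the hypothetical name above.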
Use of English varieties in category names
There was a discussion at Commons talk:Categories/Archive 4#LANGVAR in category names ?, and many users had expressed support to implement local dialectal names for categories. However, there was no consensus on the proposal by Joshbaumgartner, which would implement it. So I have modified the proposal and drafted it at User:Sbb1413/ENGVAR proposal. It is not intended to be a separate policy. Rather, it is intended to be additions and modifications of the existing policy at COM:CAT to accommodate local dialectal terms. The main changes of this proposal include the avoidance of ambiguous dialectal terms. Sbb1413 (he) (talk • contribs • uploads) 14:02, 18 July 2024 (UTC)
- Assuming we leave base topic categories like Category:Car parks rather than matching the Wikipedia article w:Parking lot should we:
- A, say that (due to requiring English) when there is no national variant we leave all sub categories at the title created and thus can't move Category:Parking lots in Austria to Category:Car parks in Austria to match Category:Car parks.
- B, say that when there is no national variant we match the base topic category so use Category:Car parks in Austria not Category:Parking lots in Austria and if Category:Parking lots in Austria is created its acceptable to move to Category:Car parks in Austria. When there is a national variant like Category:Parking lots in Vermont we use that and not Category:Car parks in Vermont.
- C, say that we always match the base category thus Category:Parking lots in Vermont should be renamed to Category:Car Parks in Vermont. Crouch, Swale (talk) 14:09, 18 July 2024 (UTC)
- I would support the option B. Obviously, if you don't have any national variants, then you should apply the first sentence of the Universality Principle (the second sentence is being altered per my proposal) and rename "parking lots" to "car parks". However, if most (or all) of the subcats are named "parking lots", the parent should be renamed to "parking lots" instead of the other way around. This is normal for topics without national dialectal terms. My proposal is aimed to topics that have different terms in different countries. Sbb1413 (he) (talk • contribs • uploads) 14:17, 18 July 2024 (UTC)
- I hate to be pedantic about it since it's sort of tangential but "parking lots" and "car parks" aren't usually the same thing. Although there is some overlap there. Generally though parking lots can be any place where you park a car regardless of size. Whereas car parks are usually much larger parking "installations" (for lack of a better way to put it). So it's possible that the difference in category names is just because the images in said categories are for different types of parking areas. Not necessarily differences in a local dialect or whatever. Which I think is relevant to this. As it's kind of hard to determine as a random bystander what's a local term for something versus just a different type of that thing. Like personally as someone who lives in the United States I'd say we have both "car parks" and "parking lots." Again, depending on the size of the parking area and if it has on-site facilities or not. It's not really a regional dialect thing though. --Adamant1 (talk) 14:43, 18 July 2024 (UTC)
- I support option B as well. Category:Controlled-access highways might be a better example where in British English we use "motorway" and in American English "Freeway" but "Controlled-access highway" is a COMMUNALITY title. Category:Petrol stations should probably be renamed Category:Filling stations per COMMUNALITY. Crouch, Swale (talk) 15:00, 18 July 2024 (UTC)
- "Category:Petrol stations should probably be renamed Category:Filling stations per COMMUNALITY [sic]." Category:Filling stations redirects to Category:Fueling stations, which includes several types of fueling stations other than the conventional fuel stations for motor vehicles (petrol/gas stations). So Category:Petrol stations will remain as it is. Since we already have the universal term Category:Controlled-access highways, it might be unnecessary to have dialectal terms for each country. So I will add another exception to my proposal. Sbb1413 (he) (talk • contribs • uploads) 15:07, 18 July 2024 (UTC)
- Even in U.S. usage, not all controlled-access highways are freeways. There are also turnpikes (a.k.a. tollways) and parkways (more systematically landscaped; mostly in the Northeastern U.S.). (There are also the terms "throughway" and "expressway", but I don't believe there is any special meaning there, they are either freeways or turnpikes). Just to confuse things more, there are a fair number of roads with "turnpike" in their name that are no longer toll roads, so "tollway", while less common than "turnpike", might be the better choice; also, some parkways have tolls. There might even be some other terms I'm not thinking of. - Jmabel ! talk 16:55, 18 July 2024 (UTC)
- @Sbb1413 Thank you for raising this for some further discussion. This represents a potentially significant redirection in our category naming approach and as such you are completely correct to frame it as a change/addition to current Commons category policies as opposed to a new stand-alone policy. I think this is a good approach, as it necessarily requires us to consider any impacts on existing comcat and adjust them at the same time to rectify any inconsistencies that might be exposed were this merely to be put forward as an unrelated new policy. I unfortunately have some other priorities at the moment, but I want to give this due thought and provide a comprehensive input, though it may be a week or so before I can do that properly. In the meantime, I would like to get clarity on a couple of items just to understand the starting point here as accurately as possible:
- Would the intent be to have an official list of approved language variations for specific topics with due process required for additions or changes, or is each topic to simply be up to the normal wrangling among users over which term is right in their given locale?
- Would the intent be to retroactively apply this policy to existing topics, or to stick with the policy of needing more than just langvar reasons to change an existing category?
- What ultimate authority, if any, would we be relying on to determine correct langvar?
- I'm sure there will be more to follow. Josh (talk) 19:39, 18 July 2024 (UTC)
- @Joshbaumgartner My answers:
- The intent is to have a consensus-based list of accepted language variations for certain topics. The list will be inserted at the top of a topic category. For small categories, additions and changes would be done boldly, while larger categories would require agreements with some other users.
- This policy will be retroactively applied to existing topics.
- The ultimate authority to determine the correct ENGVAR is of course consensus.
- Sbb1413 (he) (talk • contribs • uploads) 11:27, 19 July 2024 (UTC)
- I have created a rough draft of a templated list of consensus-based English dialectal terms at User:Sbb1413/ENGVAR template. This can be a standalone template or a part of {{Topic by country}} and {{Country category}} templates. China and Russia are included for their own English terms for "astronaut". Sbb1413 (he) (talk • contribs • uploads) 12:03, 19 July 2024 (UTC)
- @Sbb1413 Thanks, those are more or less along the lines of what I would have guessed. I think the right way to do the {{Topic by country}}, etc. templates would be to build in a langvar switch of some sort, ideally without requiring manual activation, with the approved variations added into the data templates. However, I wouldn't really worry too much about exactly how to do the templates at this stage. I would simply remark that converting templates to support this will be its own effort for a team to develop and implement once the new policy is enacted. This discussion should focus on what the right policy is and getting the language (no pun) correct for COM:CAT. Templates and other tools will have to follow suit. Josh (talk) 12:43, 19 July 2024 (UTC)
- One of the things that strikes me as I think about this direction, is that all of the logic for saying that Australia categories should be given Australian English names instead of the universal English topic name, or Canada, the US, etc., is the same logic that would say France should use French or Mexico should have Mexican Spanish variations of a topic name. Obviously, we are starting this discussion limited to English variants, but realistically, I'm not sure that is anything more than a purely arbitrary line we are drawing. Just a thought. Josh (talk) 12:52, 19 July 2024 (UTC)
- @Joshbaumgartner We can use the same English term for all countries if it is used by all the major dialects. If such term doesn't exist, some countries will use their own regional terms while others will use the "universal English topic name". Since our core naming policy is to use English in category names as much as possible, we obviously shouldn't extend this proposal to other languages. Although gallery names can use local languages, the dialects of other languages (except Portuguese) don't really differ by spelling or vocabulary. Sbb1413 (he) (talk • contribs • uploads) 13:07, 19 July 2024 (UTC)
- Well our core policy is also to apply the Universality Principle, and this proposal is considering upending that. In fact, from a legal theory perspective, I would posit that the universality principle is the primary supporting principle of the English-only naming policy, or at least it is the key reflection of the policy's intent, i.e. if the UP goes (or is neutered), then what real basis is there for the English-only policy? I totally understand that we don't necessarily want to extend this proposal to other languages, as that would be a much bigger fish to fry and may bring in some vocal opposition, but I'm not sure if maintaining English-only is still tenable if the UP is eroded. I don't think this is a point for or against this effort, just a consideration of potential future ramifications. Also, this is a fundamental policy change being considered, so I would not consider anything obvious. As for variations in other languages, I know Spanish spoken in Mexico has plenty of variation vis-a-vis Spain or even other parts of Latin America. That is my only personal practical experience with a non-English language, so I'm 2-for-2 so far on language variations being a thing. However, even if a language is completely homogenous across its usage, it doesn't change the point that localization is localization, whether we are talking variants or entirely different languages. Josh (talk) 13:32, 19 July 2024 (UTC)
Oppose Can we have some compassion for people on Commons who do not have English as their native language? It is OK to have English as the main language for categories, I agree with that. But let's not allow all kinds of local variations of English within a category string.
- Not only for myself (Dutch, which is a language in the same (Germanic) language family as English, and so relatively close to English), but I also think of all people at the other end of the language spectrum, who are even used to a script other than Latin (a lot of Asian languages + Cyrillic + Greek and I might forget some).
- I do not only search along the line of a category string, but also via the search box. If I know that the main category is "Gray", I expect that I can search for Gray houses in Norfolk, England as well, and not have to know that it is then "Grey". And if the main category is "Flower shops", I do not want to have to search with "Florist shops" for Leeds.
Let's keep it simple, and let everybody adjust; compared with what non-native English speakers face, it looks to me like a small sacrifice for the people who are used to local variants of English. --JopkeB (talk) 10:19, 22 July 2024 (UTC)
- @JopkeB, this is one of my primary concerns. Requiring that users have some ability to work in English (possibly through 3rd party software) seems at the moment unavoidable given the technical limitations of the basic software. However, this is a multi-lingual project, and conceivably should permit full utility for users regardless of their particular language skills. Universality seems to be the best solution for this that I am aware of since at least once a term or phrase is known, it can be relied on going forward, per your 'gray/grey' example. As a native English speaker (as are probably the plurality of Commons users), I have no problem at all reading 'flower shops' and 'florist shops' or 'gasoline stations' and 'filling stations' and instantly understanding what is being said, but even then, I share the frustration of having to try several searches for a sub-category, even when I know the parent category name, not sure whether the sub-category just doesn't exist or if I should try another different term or format. I can only imagine that frustration is multiplied for those without native or even substantial English knowledge. Any proposal to deviate from Universality would need to clearly show that it will not add any additional hurdles for non-English users (the majority of the planet) for me to support it. Josh (talk) 20:24, 22 July 2024 (UTC)
- Oppose Per JopkeB and Joshbaumgartner. You can get into some pretty sketchy and pedantic territory when you allow for local variations. Even with English, which would seem to only have a few, but actually has quite a lot. See California English for one example outside of British and American English. There's also Scottish English, Irish English, etc. We shouldn't have to figure out which variation to go with every time we want to search for a category, per the Universality Principle and just because it would be a major pain in the ass. It makes sense to use American English as the standard though because it's the most widely used form of English in the world. No offense to British people, but British English is only spoken in Britain and a few former colonies and there's no reason to overturn the Universality Principle (which as Josh points out is at the core of Commons policy) just to adopt a niche form of English. That doesn't even get into variations of other languages either. --Adamant1 (talk) 03:03, 27 July 2024 (UTC)
- The mention of "California English" is really an overstatement. The linked article is mainly about regional pronunciation variants in three different regions of California. About the only thing here that would be different in writing is that Californians commonly use "the" with a highway route number ("the 5" instead of "Interstate 5" or "I-5"). - Jmabel ! talk 17:02, 27 July 2024 (UTC)
- Take it or leave it. It was just the first example that came to mind. Of course I'm not a linguist. Nor am I saying the idea of local variations should be rejected purely (or even at all) because of California English. Or really anything else. I'm just pointing out that the idea that there is only two variations of English in existence, American and British, is overly simplistic. --Adamant1 (talk) 00:16, 28 July 2024 (UTC)
- Agree: "We shouldn't have to figure out which variation to go with every time ...".
- Not agree: "use American English as the standard", because British English is NOT only spoken in Britain; we in the Netherlands learn British English at school and I guess there are other countries where British English is taught. So it is not only the countries where English is the main language you have to take into account. JopkeB (talk) 04:09, 27 July 2024 (UTC)
- Likewise I do not agree with adopting any specific variant of English as standard, but instead I think the status quo is correct. So long as the term is mutually intelligible and accurately conveys the topic, it should not be changed just to align with one or the other variant. Whether it is gray or grey, any English speaker can easily understand what is being covered, even if they may smirk at what they consider a misspelling, so no real reason to go with one over the other, so long as what we go with is consistently used throughout Commons. I can only imagine the howls of protest if we start trying to rename everything to match UK, US, etc. spelling/vocabulary--probably only slightly less than the ruckus that will happen if every locality is fair game to argue about what local term is most common there. Josh (talk) 05:07, 27 July 2024 (UTC)
- I'm not necessarily advocating for adopting any specific variant of English as a standard. But it's just a fact that American English is the standard due to how widely it's used compared to other ones and universality. I'm sure there are edge cases where that's not the case though, which is fine. I don't think it's something we can realistically enforce one way or another. Although adopting this clearly wouldn't be beneficial even if people are already using local variations without it. --Adamant1 (talk) 06:19, 27 July 2024 (UTC)
- @Adamant1: Can you prove that "its just a fact that American English is the standard ..."? How is it measured? JopkeB (talk) 04:53, 28 July 2024 (UTC)
- @JopkeB: I mentioned it below this, but from what I remember when I looked into it a year or two ago the top countries that contributors come from on here are the United States and Germany. Germans usually (if not exclusively) speak American English. Regardless, it largely depends on the numbers from there, but if people from the United States and Germany make up the largest amount of total English speaking users on here then American English is just naturally going to be the standard. That's assuming my memory is correct and that nothing has changed about the user base since then. --Adamant1 (talk) 05:13, 28 July 2024 (UTC)
- Option B since some level of consistency is better than none. Laurel Lodged (talk) 07:36, 27 July 2024 (UTC)
- Option B Best of these options. One could at a later point think about machine translated category titles where the things are mostly addressed that way which is when option A & C become more reasonable. --Prototyperspective (talk) 10:13, 27 July 2024 (UTC)
- Option A because the discussion of "national variants" doesn't help the universality. Living in Europe and not being a native English speaker, I learned a British English and found this in most contacts on the continent. And if sbb1413 says that "Although gallery names can use local languages, the dialects of other languages (except Portuguese) don't really differ by spelling or vocabulary" it only shows a lack of knowledge about other languages. I speak a German dialect that is not normally written but even our written language has so many words that are different from the words used in Germany that misunderstandings are possible. Unless a really universal English can be defined, it is better to accept "regional varieties" and help users with good category trees. I would be happy to find the gray houses in the grey houses main category, rather than knowing I must search for gray (or v/v). We can only continue peacefully if the existing varieties of English are accepted and not one be declared universal. And I clearly contradict the above statement " British English is only spoken in Britain and a few former colonies". Please accept variety. It will help universality more than putting regional English varieties down. And watch the category trees!-- Gürbetaler (talk) 22:00, 27 July 2024 (UTC)
- British English can be extended to a few other places besides former Crown Colonies. It still doesn't disprove my point that American English has vastly more usage than British English globally. Not like it matters though since nowhere have I said I think we should replace the former with the latter, or vice versa. Both can coexist depending on the situation perfectly fine. Although there's still the practical issue of universality, which that is inconsistent with regardless. --Adamant1 (talk) 00:03, 28 July 2024 (UTC)
- @Adamant1: do you have any basis for the claim that American English is far more widely used globally? Not particularly what I have observed. For example, educated people from India who speak English speak an English far more related to British English than American English, and I'd guess right there we are talking about a population in the hundred-million range. It is in some ways distinct, but in every way that British and American English differ from one another, it is more like the British. - Jmabel ! talk 01:38, 28 July 2024 (UTC)
- @Jmabel: I don't remember the exact numbers but speaking purely in relation to Commons I think the top two countries contributors come from are Germany and the United States and I'm pretty sure they mainly (if not exclusively) speak American English in Germany. I don't remember where India is on that list, but it really doesn't matter how many people from India speak British English if they only make up like 5% of contributors to begin with. Outside of that this website has a map at the top showing the usage of both globally. Assuming it's accurate sure British English takes up more land mass globally but so what? Cool that British English is more popular in Siberia by land mass. Those people are either not on Commons to begin with or mainly (if not exclusively) create categories in Russian. I'd say look at what variation of English the people from like the top 5 countries (or top 1 or 2 depending on the numbers) for editors is on here and go with that. Otherwise I think you're just losing the point in universality. --Adamant1 (talk) 02:27, 28 July 2024 (UTC)
I don't remember where India is on that list, but it really doesn't matter how many people from India speak British English if they only make up like 5% of contributors to begin with.
- @Adamant1 Indians are 5% if you consider all the contributors. However, if you consider only the anglophone contributors, the percentage should be at double digits. Besides contributors, there are also significant viewers from India who watch our images via Wikipedia.
Assuming it's accurate sure British English takes up more land mass globally but so what? Cool that British English is more popular in Siberia by land mass. Those people are either not on Commons to begin with or mainly (if not exclusively) create categories in Russian.
- I don't believe in this prejudice since I strongly believe in the "all men are created equal" principle. Besides, Serbians have their own language, which use both Latin and Cyrillic scripts. Sbb1413 (he) (talk • contribs • uploads) 05:23, 28 July 2024 (UTC)
- there are also significant viewers from India who watch our images via Wikipedia. @Sbb1413: Sure, but from what I understand most viewers of media on Commons don't do it by way of categories. Let alone do they regularly interact with them in any meaningful way. So viewers don't really matter here.
- I don't believe in this prejudice since I strongly believe in the "all men are created equal" I believe "all men are created equal" too, but not all men speak the same language and that's what we're talking about here. Not the inherent value of humans or whatever. To that end this Wikipedia article says that only 5% of people in Russia speak English to begin with. Change that to Siberia and the number of English speakers there is essentially non-existent. The point being, you could take that map I linked to say British English must be the popular variation purely based on landmass, but then large areas of that land have almost no English speakers to begin with anyway. Like sure China is huge landmass and population wise, but conversely less than 1% of the population there speaks English to begin with. So the fact that British English is more popular than American in China is statistically and (more importantly) practically meaningless. --Adamant1 (talk) 05:38, 28 July 2024 (UTC)
- @Adamant1: Do I understand you well: to decide what kind of English should be the standard, you only look at the contributors to Commons? Not to the whole world?
- If I understand the Commons scope well, the category structure we make is not only for contributors, but for everybody who wants to find media with educational content, like Wikipedians and people contributing to other Wikimedia projects, scholars, school children and business people (and many more) looking for illustrations for their papers and presentations, from all over the world. So our target group is much broader than just Commons contributors. And I think we should include and consider the whole world to decide what kind of English should be the standard on Commons.
- Categories do matter to viewers, because I think the search engine includes category names in search results (the Commons search engine as well as Google Image). And Wikidata is to make sure that spelling variants are included. And if not: I hope they do or are going to do so.
- But I wonder whether we should want to have one variant of English as the standard at all on Commons. So: why is this part of the discussion even necessary? I like the current situation better in which we choose one variant per concept and use that variant throughout the category structure.
- JopkeB (talk) 05:50, 28 July 2024 (UTC)
- I think that's essentially my position. Like I said in the comment you're responding to, less than 1% of people in China speak English to begin with. Realistically the amount of people who view media from Commons is pretty low. Then if you consider people from China who use or interact with categories on a regular basis it's even lower than that. So I have no problem with considering the whole world to decide what kind of English should be the standard on Commons. That's literally what I'm doing here and the fact is that if you look at how many people actually speak English to begin with in places where British English is the predominant language it's too small to be meaningful.
- That doesn't even account for the percentage of computer users versus cell phone users globally either. 66% of the world's population has access to the internet, but then only 40% of that percentage use a computer and most people don't use or interact with folders on mobile. So I think we should be realistic about what we're actually talking about here. Instead of just acting like we should base this purely on inclusivity or whatever. There's nothing wrong with that, but there's no point in being inclusive when it comes to groups of people that clearly don't even exist on here in the first place. --Adamant1 (talk) 06:06, 28 July 2024 (UTC)
- @Jmabel and Adamant1: Yes, India (largely) follows British English due to its colonial legacy. On one hand, I have seen some news media imposing British spellings in many organization names (like "Johnson Space Centre", "World Health Organisation"). On the other hand, I have seen some media using -ize and mdy format. The mdy format was introduced by the British, who formerly used it. The -ize spelling is either an instance of Americanization or is based on the OED spelling conventions. I generally use British English with OED and IUPAC spelling conventions except when naming Indian categories, where I would use hardline British spellings. Sbb1413 (he) (talk • contribs • uploads) 04:48, 28 July 2024 (UTC)
- @Sbb1413: Interesting. Am I correct to assume it largely depends on the situation? Like I assume someone who works at a call center for a company in the United States would probably prefer American English. Otherwise they would use British. Or am I wrong about that? --Adamant1 (talk) 05:07, 28 July 2024 (UTC)
- JopkeB, I don't see this as such a big problem. We have not had much of a problem thus far with the setup of new categories. Subcategories can follow their parent or siblings as a precedent, those setting up specific new categories with a language distinction are likely to understand that. Or, we can fix anything that comes up later on. Explanation of categories can be done with header text and that easily supports multiple languages.
- Where we have a problem is with renames. Particularly those deliberately enforcing language changes 'for consistency'. If we make it clear that we just don't do that then we should be good. This is especially the case for non-natives changing another language. If you don't have the knowledge, then don't edit it. Andy Dingley (talk) 20:32, 3 August 2024 (UTC)
New proposal
Question How do we go on with this discussion? How can we close it? Because I see:
- there is no consensus about the initial proposal by Sbb1413 with three/four options (including variant D = leave it as it is now, no alterations or/and additions to the Universality Principle ).
- someone started a discussion about one standard variant of English on Commons, without even posing the question whether we want such a thing; and there is no consensus about one prevailing variant of English either.
--JopkeB (talk) 06:30, 28 July 2024 (UTC)
- @JopkeB It is a terrible decision, but there are several pros and cons of using different English variants:
- Consistent with local usage.
- Consistent with English Wikipedia category names.
- Inconsistent with the spirit of the Universality Principle .
- Problems identifying which variant is used in countries whose official/primary language is not English.
- Problems with navigation templates like {{Countries of the Americas}} and {{Topic by country}}.
- Similarly, there are several pros and cons of using a single English variant for a specific topic:
- Consistent with the spirit of the Universality Principle .
- No problems with navigation templates like {{Countries of the Americas}} and {{Topic by country}}.
- No problems identifying which variant is used in countries whose official/primary language is not English.
- Inconsistent with local usage.
- Inconsistent with English Wikipedia category names.
- This is why my proposal is to use English variants for countries whose official/primary language is English. Other countries will follow the English variant used in the main topic category. Note that the Commonwealth of Nations (excluding some members who use different variants) and the European Union are treated as single countries for the purpose of this proposal. Both organizations officially use British English, so there are possibilities that the members (except some Commonwealth members) would follow that variant. The Netherlands is a member of the EU, so it can freely use the British variant. Sbb1413 (he) (talk • contribs • uploads) 03:53, 2 August 2024 (UTC)
- In fact, using multiple variant categories for one topic and single variant categories for another is a huge mess. In light of technical reasons, we can use a single English variant throughout a topic, with other variants as descriptions. For example, the description of Category:Organizations of the United Kingdom would be read as:
- British English: Organisations of the United Kingdom
- This is much better than renaming categories to be consistent with the local English variant. Sbb1413 (he) (talk • contribs • uploads) 04:06, 2 August 2024 (UTC)
- Also, in the domain of space exploration, the description of Category:Astronauts from Russia would be read as:
- English: Russian cosmonauts
- Similarly, Category:Astronauts from China would be read as:
- English: Chinese astronauts, also known as taikonauts.
- --Sbb1413 (he) (talk • contribs • uploads) 04:16, 2 August 2024 (UTC)
- Thanks for the overview. I would like to add to "pros and cons of using a single English variant for a specific topic":
- Not only "No problems with navigation templates" but "No problems with navigation in general", for all users, no matter what English variant is preferred by the searcher, just because of the consistency throughout a topic.
- Support I think your latest addition might be the key for a solution: using a single English variant throughout a topic, with other variants as descriptions.
- --JopkeB (talk) 04:28, 2 August 2024 (UTC)
- @Crouch, Swale, Laurel Lodged, Prototyperspective, and Gürbetaler: Those who had supported my proposal, what do you think of my newer proposal of using single English variant throughout a topic, with other variants as descriptions? Sbb1413 (he) (talk • contribs • uploads) 05:02, 2 August 2024 (UTC)
- More examples of ENGVAR descriptions are found at Category:Aluminium, Category:Apartment buildings, Category:Caesium, Category:Eggplant, Category:Sulfur etc. Sbb1413 (he) (talk • contribs • uploads) 05:50, 2 August 2024 (UTC)
- I think it's also fine. The issue is that there need to be machine translated category titles depending on the set uselang. For such short phrases the latest MT tech is nearly always accurate even for smaller languages and one could try this first with some of the best-working languages like Spanish. This may also allow many more people to find the WMC pages if they search the Web in their own language and indexing works well. Prototyperspective (talk) 10:43, 2 August 2024 (UTC)
The issue is that there need to be machine translated category titles depending on the set uselang. For such short phrases the latest MT tech is nearly always accurate even for smaller languages and one could try this first with some of the best-working languages like Spanish. This may also allow many more people to find the WMC pages if they search the Web in their own language and indexing works well.
- I don't understand how it is related to the ENGVAR issue in Commons. A machine translator can translate any English text to another language, regardless of variant. Sbb1413 (he) (talk • contribs • uploads) 11:31, 2 August 2024 (UTC)
- I was just saying we're discussing the wrong thing basically, it doesn't matter which variant is used or if there is a standard naming because it can be easily machine translated. I support your proposal partly because the variant indeed doesn't matter and that's also why I don't care much about which of your proposals is implemented, it would be good to standardize/specify it and I think this translatability needs to be kept in mind when thinking about which option would be best and can also solve potential Cons like "people looking for the term in their own language may not find the category as well if another language variant is used". Prototyperspective (talk) 11:46, 2 August 2024 (UTC)
- I've put this here, it's a bit tangential to this discussion: meta:Community Wishlist/Wishes/Add machine translated category titles on WMC. Prototyperspective (talk) 17:06, 4 August 2024 (UTC)
- This proposal seems to suggest the (apparently) current rule of the UP (which has been disputed). I can see the pros and cons but I'd slightly support following national variants. Crouch, Swale (talk) 14:27, 2 August 2024 (UTC)
- @Crouch, Swale As per my new proposal, the "topic by country" category names won't use the corresponding national variant, but the English descriptions may use national variants. Descriptions are easier to deal with compared to category names. Plus we don't have to add extra lines of code in navigation templates to deal with national variants. Currently, English Wikipedia doesn't use any such navigation templates, so it can happily follow ENGVAR in category names. So there will be no changes in the Universality Principle, apart from the text itself being modified to follow a consistent spelling norm. Sbb1413 (he) (talk • contribs • uploads) 14:46, 2 August 2024 (UTC)
- Thanks, I understand now but I still favour the national variants proposal. Crouch, Swale (talk) 14:47, 2 August 2024 (UTC)
- I understand your sentiments in favour of national variants. While the category names will follow a consistent variant throughout the topic, you can still add descriptions using national variants. There are already templates that can give descriptions in different national variants:
- Sbb1413 (he) (talk • contribs • uploads) 14:54, 2 August 2024 (UTC)
- I support User:Crouch, Swale. Gürbetaler (talk) 19:27, 3 August 2024 (UTC)
- An inherent problem with the "Universality principle" is that it seems to assume that there is some sort of "standard" terminology used everywhere. There is not. Whatever variety of English is used, there are occasional terms that are different in different countries or regions, to the point that if we try to impose some version globally it can be a term that is not only not used but is completely unfamiliar to millions of people. Uniformity is a worthy goal as a general rule, but refusing to acknowledge occasional exceptions can create problems greater than any potential superficial appearance of benefit. -- Infrogmation of New Orleans (talk) 21:33, 3 August 2024 (UTC)
- "Standard terminology" has at least two meanings:
- Terminology used as a standard within a (geographic or cultural) community. Example: spelling of English words in different parts of the world.
- Terminology used as a standard within a system. In a system it does not matter how words are spelled, as long as the same concept is spelled the same way throughout that system. You can even give them codes instead of real words (like in Wikidata), as long as you add good descriptions in languages that are understandable by people. For me the category structure in Commons is such a system. And though I was not involved at all in the creating process of the principles, I think the Universality Principle was established for this reason. In Commons this principle is at least necessary to have "No problems with navigation". but also for communicating with other systems and for technical solutions (like templates), current and future ones. I guess this is the reason why Sbb1413 proposed to (In light of technical reasons, we can) "use a single English variant throughout a topic, with other variants as descriptions", which I support.
- That "a term that is not only not used but is completely unfamiliar to millions of people" is not relevant in a system, it is only relevant within a community. I (not a native English speaker) consider myself as one of those millions of people (should perhaps be "billions") and I do not mind having to look up English terms used in Commons on a daily basis. I do not see why it is a problem for native English speakers to look up a word only now and then. JopkeB (talk) 05:30, 4 August 2024 (UTC)
- Yes, organization on Commons has become massively more US/England English centric than I thought the intention was in the early months of the project. Even the use of standard Latin names for plants and animals, one of the few non-English standards initially agreed to at the start, is widely disregarded by native English speakers refusing to look up a word only now and then. -- 17:23, 5 August 2024 (UTC) — Preceding unsigned comment added by Infrogmation (talk • contribs) (UTC)