dbt-selly/dbt-env/lib/python3.8/site-packages/charset_normalizer-2.0.12.d.../METADATA

Metadata-Version: 2.1
Name: charset-normalizer
Version: 2.0.12
Summary: The Real First Universal Charset Detector. Open, modern and actively maintained alternative to Chardet.
Home-page: https://github.com/ousret/charset_normalizer
Author: Ahmed TAHRI @Ousret
Author-email: ahmed.tahri@cloudnursery.dev
License: MIT
Project-URL: Bug Reports, https://github.com/Ousret/charset_normalizer/issues
Project-URL: Documentation, https://charset-normalizer.readthedocs.io/en/latest
Keywords: encoding,i18n,txt,text,charset,charset-detector,normalization,unicode,chardet
Platform: UNKNOWN
Classifier: License :: OSI Approved :: MIT License
Classifier: Intended Audience :: Developers
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Text Processing :: Linguistic
Classifier: Topic :: Utilities
Classifier: Programming Language :: Python :: Implementation :: PyPy
Classifier: Typing :: Typed
Requires-Python: >=3.5.0
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: unicode_backport
Requires-Dist: unicodedata2 ; extra == 'unicode_backport'


<h1 align="center">Charset Detection, for Everyone 👋 <a href="https://twitter.com/intent/tweet?text=The%20Real%20First%20Universal%20Charset%20%26%20Language%20Detector&url=https://www.github.com/Ousret/charset_normalizer&hashtags=python,encoding,chardet,developers"><img src="https://img.shields.io/twitter/url/http/shields.io.svg?style=social"/></a></h1>

<p align="center">
  <sup>The Real First Universal Charset Detector</sup><br>
  <a href="https://pypi.org/project/charset-normalizer">
    <img src="https://img.shields.io/pypi/pyversions/charset_normalizer.svg?orange=blue" />
  </a>
  <a href="https://codecov.io/gh/Ousret/charset_normalizer">
      <img src="https://codecov.io/gh/Ousret/charset_normalizer/branch/master/graph/badge.svg" />
  </a>
  <a href="https://pepy.tech/project/charset-normalizer/">
    <img alt="Download Count Total" src="https://pepy.tech/badge/charset-normalizer/month" />
  </a>
</p>

> A library that helps you read text from an unknown charset encoding.<br /> Motivated by `chardet`,
> I'm trying to resolve the issue by taking a new approach.
> All IANA character set names for which the Python core library provides codecs are supported.

<p align="center">
  >>>>> <a href="https://charsetnormalizerweb.ousret.now.sh" target="_blank">👉 Try Me Online Now, Then Adopt Me 👈 </a> <<<<<
</p>

This project offers you an alternative to **Universal Charset Encoding Detector**, also known as **Chardet**.

| Feature       | [Chardet](https://github.com/chardet/chardet)       | Charset Normalizer | [cChardet](https://github.com/PyYoshi/cChardet) |
| ------------- | :-------------: | :------------------: | :------------------: |
| `Fast`         | ❌<br>          | ✅<br>             | ✅ <br> |
| `Universal**`     | ❌            | ✅                 | ❌ |
| `Reliable` **without** distinguishable standards | ❌ | ✅ | ✅ |
| `Reliable` **with** distinguishable standards | ✅ | ✅ | ✅ |
| `Free & Open`  | ✅             | ✅                | ✅ |
| `License` | LGPL-2.1 | MIT | MPL-1.1 |
| `Native Python` | ✅ | ✅ | ❌ |
| `Detect spoken language` | ❌ | ✅ | N/A |
| `Supported Encoding` | 30 | :tada: [93](https://charset-normalizer.readthedocs.io/en/latest/user/support.html#supported-encodings)  | 40 |

<p align="center">
<img src="https://i.imgflip.com/373iay.gif" alt="Reading Normalized Text" width="226"/><img src="https://media.tenor.com/images/c0180f70732a18b4965448d33adba3d0/tenor.gif" alt="Cat Reading Text" width="200"/>
</p>

*\*\* : They clearly use encoding-specific code, even though they cover most of the commonly used encodings.*<br>
Did you get here because of the logs? See [https://charset-normalizer.readthedocs.io/en/latest/user/miscellaneous.html](https://charset-normalizer.readthedocs.io/en/latest/user/miscellaneous.html)

## ⭐ Your support

*Fork, test-it, star-it, submit your ideas! We do listen.*
  
## ⚡ Performance

This package offers better performance than its counterpart, Chardet. Here are some numbers.

| Package       | Accuracy       | Mean per file (ms) | Files per sec (est.) |
| ------------- | :-------------: | :------------------: | :------------------: |
|      [chardet](https://github.com/chardet/chardet)        |     92 %     |     220 ms      |       5 files/sec        |
| charset-normalizer |    **98 %**     |     **40 ms**      |       25 files/sec    |

| Package       | 99th percentile       | 95th percentile | 50th percentile |
| ------------- | :-------------: | :------------------: | :------------------: |
|      [chardet](https://github.com/chardet/chardet)        |     1115 ms     |     300 ms      |       27 ms        |
| charset-normalizer |    460 ms     |     240 ms      |       18 ms    |

Chardet's performance on larger files (1 MB+) is very poor. Expect a huge difference on large payloads.

> Stats are generated from 400+ files with default parameters. For more details on the files used, see the GHA workflows.
> And yes, these results might change at any time. The dataset can be updated to include more files.
> The actual delays depend heavily on your CPU capabilities. The factors should remain the same.

[cchardet](https://github.com/PyYoshi/cChardet) is a non-native (C++ binding), unmaintained, and faster alternative with 
better accuracy than chardet but lower accuracy than this package. If speed is the most important factor, you should try it.

## ✨ Installation

Using PyPI for the latest stable release:
```sh
pip install charset-normalizer -U
```

If you want a more up-to-date `unicodedata` than the one available in your Python setup:
```sh
pip install "charset-normalizer[unicode_backport]" -U
```

## 🚀 Basic Usage

### CLI
This package comes with a CLI.

```
usage: normalizer [-h] [-v] [-a] [-n] [-m] [-r] [-f] [-t THRESHOLD]
                  file [file ...]

The Real First Universal Charset Detector. Discover originating encoding used
on text file. Normalize text to unicode.

positional arguments:
  files                 File(s) to be analysed

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         Display complementary information about file if any.
                        Stdout will contain logs about the detection process.
  -a, --with-alternative
                        Output complementary possibilities if any. Top-level
                        JSON WILL be a list.
  -n, --normalize       Permit to normalize input file. If not set, program
                        does not write anything.
  -m, --minimal         Only output the charset detected to STDOUT. Disabling
                        JSON output.
  -r, --replace         Replace file when trying to normalize it instead of
                        creating a new one.
  -f, --force           Replace file without asking if you are sure, use this
                        flag with caution.
  -t THRESHOLD, --threshold THRESHOLD
                        Define a custom maximum amount of chaos allowed in
                        decoded content. 0. <= chaos <= 1.
  --version             Show version information and exit.
```

```bash
normalizer ./data/sample.1.fr.srt
```

:tada: Since version 1.4.0, the CLI produces an easily usable result on stdout in JSON format.

```json
{
    "path": "/home/default/projects/charset_normalizer/data/sample.1.fr.srt",
    "encoding": "cp1252",
    "encoding_aliases": [
        "1252",
        "windows_1252"
    ],
    "alternative_encodings": [
        "cp1254",
        "cp1256",
        "cp1258",
        "iso8859_14",
        "iso8859_15",
        "iso8859_16",
        "iso8859_3",
        "iso8859_9",
        "latin_1",
        "mbcs"
    ],
    "language": "French",
    "alphabets": [
        "Basic Latin",
        "Latin-1 Supplement"
    ],
    "has_sig_or_bom": false,
    "chaos": 0.149,
    "coherence": 97.152,
    "unicode_path": null,
    "is_preferred": true
}
```
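
Because the result is plain JSON, it can be consumed with Python's standard `json` module. The following is a minimal sketch, assuming the CLI output was captured into a string; `captured_stdout` (a trimmed copy of the sample above) and the `summarize` helper are illustrative names, not part of this package.

```python
import json

# Trimmed copy of the sample output above, standing in for captured stdout.
captured_stdout = """
{
    "path": "/home/default/projects/charset_normalizer/data/sample.1.fr.srt",
    "encoding": "cp1252",
    "alternative_encodings": ["cp1254", "latin_1"],
    "language": "French",
    "has_sig_or_bom": false,
    "chaos": 0.149,
    "coherence": 97.152
}
"""


def summarize(report: dict) -> str:
    """Build a one-line summary from a single detection report (hypothetical helper)."""
    return "{path}: {encoding} ({language}, chaos={chaos})".format(**report)


report = json.loads(captured_stdout)
summary = summarize(report)
print(summary)
```

Note that with `-a, --with-alternative` the help text states the top-level JSON will be a list, so a consumer would iterate over the parsed value instead of reading it as a single object.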

### Python
*Just print out normalized text*
```python
from charset_normalizer import from_path

results = from_path('./my_subtitle.srt')

print(str(results.best()))
```

*Normalize any text file*
```python
from charset_normalizer import normalize
try:
    normalize('./my_subtitle.srt') # should write to disk my_subtitle-***.srt
except IOError as e:
    print('Sadly, we are unable to perform charset normalization.', str(e))
```

*Upgrade your code without effort*
```python
from charset_normalizer import detect
```

The above code will behave the same as **chardet**. We ensure that we offer the best (reasonable) backward-compatible (BC) result possible.

See the docs for advanced usage: [readthedocs.io](https://charset-normalizer.readthedocs.io/en/latest/)

## 😇 Why

When I started using Chardet, I noticed that it did not meet my expectations, and I wanted to propose a
reliable alternative using a completely different method. Also! I never back down from a good challenge!

I **don't care** about the **originating charset** encoding, because **two different tables** can
produce **two identical rendered strings.**
What I want is to get readable text, the best I can. 
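
As a small illustration of that point, here is a sketch that uses only Python's built-in codecs (not this package's API). The sample text is an assumption chosen for the demonstration: its accented letters sit at the same byte values in both `cp1252` and `latin_1`.

```python
# Sample text chosen for illustration; its accented letters share byte
# values in both tables.
payload = "Café crème".encode("cp1252")

as_cp1252 = payload.decode("cp1252")
as_latin_1 = payload.decode("latin_1")

# Two different tables, one identical rendered string.
print(as_cp1252 == as_latin_1)
```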

In a way, **I'm brute forcing text decoding.** How cool is that? 😎

Don't confuse the **ftfy** package with charset-normalizer or chardet. ftfy's goal is to repair broken Unicode strings, whereas charset-normalizer converts raw files in an unknown encoding to Unicode.

## 🍰 How

  - Discard all charset encoding tables that could not fit the binary content.
  - Measure chaos, or the mess once opened (by chunks) with a corresponding charset encoding.
  - Extract matches with the lowest mess detected.
  - Additionally, we measure coherence / probe for a language.
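
The first three steps can be sketched with the standard library alone. This is a toy under stated assumptions, not the package's implementation: the `CANDIDATES` shortlist, the `toy_mess_ratio` heuristic (share of control, unassigned, private-use or surrogate characters), and `toy_detect` are all hypothetical names invented for the illustration.

```python
import unicodedata
from typing import List, Tuple

# Hypothetical shortlist for the demo; not the set of codecs the package probes.
CANDIDATES = ["ascii", "utf_8", "cp1252", "latin_1"]


def toy_mess_ratio(text: str) -> float:
    """Crude stand-in for 'chaos': share of suspicious characters in text."""
    if not text:
        return 0.0
    suspicious = sum(
        1
        for ch in text
        if ch not in "\t\r\n"
        and unicodedata.category(ch) in {"Cc", "Cn", "Co", "Cs"}
    )
    return suspicious / len(text)


def toy_detect(payload: bytes) -> List[Tuple[str, float]]:
    """Return (encoding, mess) pairs that can decode payload, least messy first."""
    matches = []
    for name in CANDIDATES:
        try:
            text = payload.decode(name)
        except UnicodeDecodeError:
            continue  # step 1: this table cannot fit the binary content
        matches.append((name, toy_mess_ratio(text)))  # step 2: measure the mess
    return sorted(matches, key=lambda pair: pair[1])  # step 3: lowest mess first


ranking = toy_detect("“quoted” text".encode("cp1252"))
print(ranking)
```

Here `ascii` and `utf_8` are discarded because they cannot decode the curly-quote bytes, and `latin_1` ranks below `cp1252` because it turns those bytes into control characters.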

**Wait a minute**, what are chaos/mess and coherence according to **YOU?**

*Chaos:* I opened hundreds of text files, **written by humans**, with the wrong encoding table. **I observed**, then
**I established** some ground rules about **what is obvious** when **it seems like** a mess.
 I know that my interpretation of what is chaotic is very subjective; feel free to contribute in order to
 improve or rewrite it.

*Coherence:* For each language on earth, we have computed ranked letter frequencies (as best we can). I thought
that this information was worth something here, so I check decoded text against those records to see whether I can detect intelligent design.
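
To make the coherence idea concrete, here is a rough sketch that compares the most frequent letters of a text with a ranked letter list per language. Everything in it is an assumption for illustration: the `TOY_RANKS` rankings are approximate and are not the project's data, and `toy_coherence` is a hypothetical scoring function, not the one the package uses.

```python
from collections import Counter

# Approximate rankings (most frequent first), for illustration only;
# these are NOT the frequency records shipped with the project.
TOY_RANKS = {
    "English": "etaoinshrdlu",
    "French": "esaitnrulodc",
}


def toy_coherence(text: str, language: str, top: int = 6) -> float:
    """Share of the language's top letters found among the text's top letters."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    if not letters:
        return 0.0
    observed = {ch for ch, _ in Counter(letters).most_common(top)}
    expected = set(TOY_RANKS[language][:top])
    return len(observed & expected) / top


# Synthetic sample whose letter counts follow the toy English ranking.
sample = "eeeeee ttttt aaaa ooo ii n"
english_score = toy_coherence(sample, "English")
french_score = toy_coherence(sample, "French")
print(english_score, french_score)
```

A higher score for one language than for another is the kind of signal that a decoded text "looks like" that language under this toy measure.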

## ⚡ Known limitations

  - Language detection is unreliable when text contains two or more languages sharing identical letters (e.g. HTML with English tags plus Turkish content, both using Latin characters).
  - Every charset detector depends heavily on sufficient content. In common cases, do not bother running detection on very tiny content.
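
The second limitation can be seen with Python's built-in codecs alone. In this sketch (the `CANDIDATES` shortlist and the one-byte payload are assumptions picked for the demonstration), a single byte decodes cleanly under every candidate, yet yields several different letters, so nothing in the content can settle which reading is right.

```python
# Hypothetical shortlist of single-byte codecs, for illustration only.
CANDIDATES = ["cp1252", "latin_1", "cp1251", "cp1253", "koi8_r"]

tiny_payload = b"\xe9"  # a single byte, far too little to judge

# Every candidate decodes the byte without error.
readings = {name: tiny_payload.decode(name) for name in CANDIDATES}
distinct = set(readings.values())

for name, text in readings.items():
    print(name, repr(text))
print(len(distinct), "different readings of the same byte")
```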

## 👤 Contributing

Contributions, issues and feature requests are very much welcome.<br />
Feel free to check the [issues page](https://github.com/ousret/charset_normalizer/issues) if you want to contribute.

## 📝 License

Copyright © 2019 [Ahmed TAHRI @Ousret](https://github.com/Ousret).<br />
This project is [MIT](https://github.com/Ousret/charset_normalizer/blob/master/LICENSE) licensed.

Character frequencies used in this project © 2012 [Denny Vrandečić](http://simia.net/letters/)
fix order deliveries 2022-03-22 15:13:27 +00:00